Scala 2.8的for表達(dá)式:性能與運(yùn)行順序的改進(jìn)
本文來(lái)自EastSun的博客,原文標(biāo)題為《Scala2.8探秘之for表達(dá)式》。
51CTO編輯推薦:Scala編程語(yǔ)言專題
雖然Scala2.8還在持續(xù)跳票中,但目前Nightly Build版本的可用性已經(jīng)很高了,其中Scala2.8中主要特性都已經(jīng)實(shí)現(xiàn)。毫無(wú)疑問(wèn),Scala2.8的發(fā)布將會(huì)是Scala發(fā)展中的一個(gè)重大里程碑。在這個(gè)版本中,不僅包括了許多Scala社區(qū)期待已久的特性,如命名參數(shù)、類型特殊化等等,詳細(xì)的信息可以參看http://eastsun.javaeye.com/blog/373710;而且包含了一些對(duì)之前Scala中設(shè)計(jì)不合理的地方的改進(jìn),例如Scala中對(duì)String以及數(shù)組的處理,在之前的版本中Scala編譯器在編譯的時(shí)候?qū)@兩種類型進(jìn)行了一些比較特殊(魔法)的處理,這樣做雖然某些程度上使得這兩種類型在Scala中更加易用,但同時(shí)也破壞了Scala中的一致性,并隨之產(chǎn)生了一些離奇的問(wèn)題。舉個(gè)例子,在Scala2.7.7 final中我們可以看到如下結(jié)果:
- //Scala2.7.7 final
- scala> val array = Array("String Array")
- array: Array[java.lang.String] = Array(String Array)
- scala> println(array.toString())
- Array(String Array)
- scala> println(array)
- [Ljava.lang.String;@139491b
- scala> "WOW" == "WOW".reverse
- res4: Boolean = false
- scala> "abcdefg" map { _.toUpperCase }
- res5: Seq[Char] = ArrayBufferRO(A, B, C, D, E, F, G) //注意,得到的是個(gè)Seq[Char]而不是String
顯然這樣的運(yùn)行結(jié)果有違我們的直覺(jué),即使是對(duì)Scala有一定了解的人也未必能馬上明白其中的奧妙。
在Scala2.8中,這些問(wèn)題都得到了一定的解決——可能有些解決方式會(huì)讓你覺(jué)得退步了,但是保持了Scala的一致性,并且消除了編譯器在背后做的那些魔法,使事情變得簡(jiǎn)單——雖然這樣破壞了Scala的向后兼容性,但我認(rèn)為這樣做事非常值得的:這為Scala成為一門(mén)成功的語(yǔ)言墊下了基礎(chǔ)。下面我們看看在Scala 2.8.0.r20117-b20091213020323中上面代碼的運(yùn)行結(jié)果:
- scala> val array = Array("String Array")
- array: Array[java.lang.String] = Array(String Array)
- scala> println(array.toString())
- [Ljava.lang.String;@335053
- scala> println(array)
- [Ljava.lang.String;@335053
- scala> "WOW" == "WOW".reverse
- res2: Boolean = true
- scala> "abcdefg" map { _.toUpperCase }
- res3: String = ABCDEFG
可以看到,運(yùn)行結(jié)果就如我們所預(yù)想的那樣——固然其中數(shù)組的toString結(jié)果變得丑陋了,和Java中一樣丑陋,但保持一致了。
OK,String與數(shù)組的討論就此為止,以后如果有時(shí)間我再來(lái)詳細(xì)解釋一下其背后的故事;現(xiàn)在我們轉(zhuǎn)向本文要討論的東東:Scala中的for表達(dá)式。注意:在本文中使用了兩種不同的Scala版本:Scala2.7.7final與2.8.0.r20117-b20091213020323進(jìn)行對(duì)比。
1.Scala2.8之前的for表達(dá)式
在Scala中,通常有以下幾種使用方式:
- for (p <- e) e'
- for (p <- e if g) e'
- for (p <- e; p' <- e' ...) e''
以及相應(yīng)的
- for (p <- e) yield e'
- for (p <- e if g) yield e'
- for (p <- e; p' <- e' ...) yield e''
其中p,p'為Scala中的Pattern;e,e',e''為表達(dá)式;g為Boolean表達(dá)式。
根據(jù)《The Scala Language Specification Version 2.7》,上面的for表達(dá)式將在編譯階段展開(kāi)為下面的形式(沒(méi)有考慮p為比較復(fù)雜的Pattern時(shí)的情形):
- for (p <- e) e' => e.foreach { case p => e' }
- for (p <- e if g) e' => for (p <- e.filter{ (x1,...,xn) => g }) e' => ..
- for (p <- e; p' <- e' ...) e'' => e.foreach{ case p => for (p' <- e' ...) e'' }
以及相應(yīng)的
- for (p <- e) yield e' => e.map { case p => e' }
- for (p <- e if g) yield e' => for (p <- e.filter{ (x1,...,xn) => g }) yield e'
- for (p <- e; p' <- e' ...) yield e'' => e.flatmap { case p => for (p' <- e' ...) yield e'' }
注意的是,這個(gè)轉(zhuǎn)換發(fā)生在類型檢查之前。也就是說(shuō),對(duì)map,filter,flatMap以及foreach這四個(gè)方法的方法簽名沒(méi)有任何其它限制,只需要滿足展開(kāi)后for語(yǔ)句的類型檢查。通常情況下,對(duì)于一個(gè)具有類型參數(shù)A的類C——一般表示某種數(shù)據(jù)結(jié)構(gòu)(集合)——下面的定義方式是比較自然的:
- class C[A] {
- def map[B](f: A => B): C[B]
- def flatMap[B](f: A => C[B]): C[B]
- def filter(p: A => Boolean): C[A]
- def foreach(b: A => Unit): Unit
- }
相對(duì)Java1.5中的for語(yǔ)句,Scala的實(shí)現(xiàn)更加靈活,并且以一種輕量級(jí)的方式實(shí)現(xiàn)了List comprehension。舉個(gè)例子,下面幾行代碼實(shí)現(xiàn)了求一個(gè)List的全排列:
- scala> def perm[T](ls: List[T]): List[List[T]] = ls match {
- | case Nil => List(Nil)
- | case xs => for(x <- xs;ys <- perm(xs-x)) yield x::ys
- | }
- perm: [T](List[T])List[List[T]]
- scala> perm(1::2::3::Nil)
- res2: List[List[Int]] = List(List(1, 2, 3), List(1, 3, 2), List(2, 1, 3),
- List(2, 3, 1), List(3, 1, 2), List(3, 2, 1))
但是這個(gè)轉(zhuǎn)換規(guī)則還不甚完美。當(dāng)轉(zhuǎn)換后的表達(dá)式含有filter方法的時(shí)候,會(huì)產(chǎn)生幾個(gè)問(wèn)題。
(a)性能問(wèn)題
以一個(gè)簡(jiǎn)單的問(wèn)題為例:求1~999中所有偶數(shù)之和。
下面是兩段類似的解決代碼:
- val set = 1 until 1000
- var sum = 0
- for(num <- set;if(num%2 == 0)) sum += num
或者把if語(yǔ)句移到括號(hào)外面:
- val set = 1 until 1000
- var sum = 0
- for(num <- set) if(num%2 == 0) sum += num
這兩段代碼功能上應(yīng)該是等價(jià)的,但是運(yùn)行效率如何呢?下面首先寫(xiě)一個(gè)粗略的測(cè)試函數(shù):
- /**
- 計(jì)算count次調(diào)用call所需的時(shí)間,單位:毫秒
- */
- def time(call : => Unit,count: Int): Long = {
- var cnt = count
- val start = System.currentTimeMillis
- while(cnt > 0) {
- call
- cnt -= 1
- }
- System.currentTimeMillis - start
- }
先在Scala2.7.7final中將每段代碼各運(yùn)行十萬(wàn)次:
- scala> val set = 1 until 1000
- set: Range = Range(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, ...
- scala> time({
- | var sum = 0
- | for(num <- set;if(num%2 == 0)) sum += num
- | },100000)
- res3: Long = 47390
- scala>
- scala> time({
- | var sum = 0
- | for(num <- set) if(num%2 == 0) sum += num
- | },100000)
- res4: Long = 3344
- scala>
測(cè)試結(jié)果很出乎意料:兩段類似的代碼,性能竟相差一個(gè)數(shù)量級(jí)!
之所以會(huì)有這么大的差異,根據(jù)上述的轉(zhuǎn)換規(guī)則,第一段代碼將會(huì)轉(zhuǎn)換為下面的實(shí)際執(zhí)行代碼:
- val set = 1 until 1000
- var sum = 0
- set.filter{ num => num%2 == 0) }.foreach{ case num => sum += num }
而第二段代碼實(shí)際執(zhí)行的是:
- val set = 1 until 1000
- var sum = 0
- set.foreach{ case num => if(num%2 == 0) sum += num }
相對(duì)而言,第一段代碼會(huì)調(diào)用filter方法,創(chuàng)建一個(gè)新的集合類,而這個(gè)集合類包含了1~999中所有的偶數(shù)。顯然這個(gè)過(guò)程是比較昂貴的,也是不必要的。
(b)運(yùn)行順序
以
- for (p <- e if g) e'
為例,實(shí)際運(yùn)行的代碼為:
- e.filter{ (x1,...,xn) => g }.foreach{ case p => e'}
可以看到,雖然直觀上if g與e'在遍歷的時(shí)候應(yīng)該是依次循環(huán)執(zhí)行;但事實(shí)上轉(zhuǎn)換后if g整體先于e'執(zhí)行。當(dāng)g與e'中同時(shí)包含一個(gè)變量v,并且在g中對(duì)變量v進(jìn)行改動(dòng)時(shí),實(shí)際運(yùn)行結(jié)果可能和我們所預(yù)想的不一致。可能問(wèn)題描述的不是很清楚,下面引用fineqtbull同學(xué)在for語(yǔ)句中內(nèi)嵌if語(yǔ)句的副作用一文中的代碼作為例子來(lái)說(shuō)明:實(shí)現(xiàn)compress方法,其功能是將一個(gè)list中連續(xù)相同的元素刪減至一個(gè)。比如compress(List(1,1,2,3,3,1,1,4)) == List(1,2,3,1,4),下面是兩段類似的實(shí)現(xiàn)代碼,咋一看都沒(méi)問(wèn)題,但運(yùn)行結(jié)果卻不一樣。
- Welcome to Scala version 2.7.7.final (Java HotSpot(TM) Client VM, Java 1.6.0_17).
- scala> def compress1[T](ls: List[T]): List[T] = {
- | var res = List(ls.first)
- | for(x <- ls) if(x != res.last) res = res:::List(x)
- | res
- | }
- compress1: [T](List[T])List[T]
- scala> def compress2[T](ls: List[T]): List[T] = {
- | var res = List(ls.first)
- | for(x <- ls;if(x != res.last)) res = res:::List(x)
- | res
- | }
- compress2: [T](List[T])List[T]
- scala> compress1(List(1,1,2,3,3,1,1,4))
- res0: List[Int] = List(1, 2, 3, 1, 4)
- scala> compress2(List(1,1,2,3,3,1,1,4))
- res1: List[Int] = List(1, 2, 3, 3, 4)
- scala>
有了之前的說(shuō)明,我們不難發(fā)現(xiàn)其原因所在。但這樣的結(jié)果顯然違反了C/Java中習(xí)慣用法,很容易讓一個(gè)剛接觸Scala的人產(chǎn)生困惑。
2.Scala2.8中的for表達(dá)式
剛才已經(jīng)提到了Scala2.8之前for表達(dá)式所存在的兩個(gè)問(wèn)題,那么在Scala2.8中這兩個(gè)問(wèn)題有沒(méi)有得到解決呢?下面將之前的代碼在2.8.0.r20117-b20091213020323重新運(yùn)行一次試試:
- Welcome to Scala version 2.8.0.r20117-b20091213020323 (Java HotSpot(TM) Client VM, Java 1.6.0_17).
- scala> def time(call : => Unit,count: Int): Long = {
- | var cnt = count
- | val start = System.currentTimeMillis
- | while(cnt > 0) {
- | call
- | cnt -= 1
- | }
- | System.currentTimeMillis - start
- | }
- time: (call: => Unit,count: Int)Long
- scala> val set = 1 until 1000
- set: scala.collection.immutable.Range ...
- scala> time({
- | var sum = 0
- | for(num <- set;if(num%2 == 0)) sum += num
- | },100000)
- res0: Long = 6906
- scala> time({
- | var sum = 0
- | for(num <- set) if(num%2 == 0) sum += num
- | },100000)
- res1: Long = 4312
- scala> def compress1[T](ls: List[T]): List[T] = {
- | var res = List(ls.first)
- | for(x <- ls) if(x != res.last) res = res:::List(x)
- | res
- | }
- compress1: [T](ls: List[T])List[T]
- scala> def compress2[T](ls: List[T]): List[T] = {
- | var res = List(ls.first)
- | for(x <- ls;if(x != res.last)) res = res:::List(x)
- | res
- | }
- compress2: [T](ls: List[T])List[T]
- scala> compress1(List(1,1,2,3,3,1,1,4))
- res2: List[Int] = List(1, 2, 3, 1, 4)
- scala> compress2(List(1,1,2,3,3,1,1,4))
- res3: List[Int] = List(1, 2, 3, 1, 4)
- scala>
#T#可以看到Scala2.8中已經(jīng)很好的解決了這兩個(gè)問(wèn)題。對(duì)于一門(mén)語(yǔ)言,要想對(duì)已經(jīng)存在的問(wèn)題進(jìn)行改進(jìn)是很困難的,因?yàn)槭紫冗@些改進(jìn)要盡可能少的影響已經(jīng)存在的舊有代碼,另一方面不能帶來(lái)新的問(wèn)題。幸運(yùn)的是,Martin想到了一個(gè)簡(jiǎn)單而優(yōu)雅的方法成功做到了這些。下面簡(jiǎn)單的敘述一下Martin的解決方法,感興趣的同學(xué)可以去看Martin的原文Rethinking filter。
首先,Martin引入了一個(gè)新的方法withFilter,這個(gè)方法與filter方法一樣以一個(gè)條件函數(shù)A => Boolean作為參數(shù)。與filter方法不同的是,withFilter方法是lazy的。它并不會(huì)創(chuàng)建一個(gè)新的包含符合條件的元素所組成的集合類,而是返回一個(gè)代理類WithFilter,這個(gè)類具有foreach、map、flatMap以及withFilter等方法,并且所有這些方法的調(diào)用是與原來(lái)?xiàng)l件組合的結(jié)果。下面是TraversableLike中WithFilter的實(shí)現(xiàn),Scala2.8中所有的集合類都繼承了TraversableLike:
- class WithFilter(p: A => Boolean) {
- /** Builds a new collection by applying a function to all elements of the
- * outer $coll containing this `WithFilter` instance that satisfy predicate `p`.
- *
- * @param f the function to apply to each element.
- * @tparam B the element type of the returned collection.
- * @tparam That $thatinfo
- * @param bf $bfinfo
- * @return a new collection of type `That` resulting from applying the given function
- * `f` to each element of the outer $coll that satisfies predicate `p`
- * and collecting the results.
- *
- * @usecase def map[B](f: A => B): $Coll[B]
- *
- * @return a new $coll resulting from applying the given function
- * `f` to each element of the outer $coll that satisfies predicate `p`
- * and collecting the results.
- */
- def map[B, That](f: A => B)(implicit bf: CanBuildFrom[Repr, B, That]): That = {
- val b = bf(repr)
- for (x <- self)
- if (p(x)) b += f(x)
- b.result
- }
- /** Builds a new collection by applying a function to all elements of the
- * outer $coll containing this `WithFilter` instance that satisfy predicate `p` and concatenating the results.
- *
- * @param f the function to apply to each element.
- * @tparam B the element type of the returned collection.
- * @tparam That $thatinfo
- * @param bf $bfinfo
- * @return a new collection of type `That` resulting from applying the given collection-valued function
- * `f` to each element of the outer $coll that satisfies predicate `p` and concatenating the results.
- *
- * @usecase def flatMap[B](f: A => Traversable[B]): $Coll[B]
- *
- * @return a new $coll resulting from applying the given collection-valued function
- * `f` to each element of the outer $coll that satisfies predicate `p` and concatenating the results.
- */
- def flatMap[B, That](f: A => Traversable[B])(implicit bf: CanBuildFrom[Repr, B, That]): That = {
- val b = bf(repr)
- for (x <- self)
- if (p(x)) b ++= f(x)
- b.result
- }
- /** Applies a function `f` to all elements of the outer $coll containing this `WithFilter` instance
- * that satisfy predicate `p`.
- *
- * @param f the function that is applied for its side-effect to every element.
- * The result of function `f` is discarded.
- *
- * @tparam U the type parameter describing the result of function `f`.
- * This result will always be ignored. Typically `U` is `Unit`,
- * but this is not necessary.
- *
- * @usecase def foreach(f: A => Unit): Unit
- */
- def foreach[U](f: A => U): Unit =
- for (x <- self)
- if (p(x)) f(x)
- /** Further refines the filter for this $coll.
- *
- * @param q the predicate used to test elements.
- * @return an object of class `WithFilter`, which supports
- * `map`, `flatMap`, `foreach`, and `withFilter` operations.
- * All these operations apply to those elements of this $coll which
- * satify the predicate `q` in addition to the predicate `p`.
- */
- def withFilter(q: A => Boolean): WithFilter =
- new WithFilter(x => p(x) && q(x))
- }
在Scala2.8中,for表達(dá)式的轉(zhuǎn)換方式大體保持不變,只是將以前使用filter的地方全部替換為withFilter方法。
結(jié)語(yǔ):可以看到Scala2.8成功解決了之前Scala中存在的一些問(wèn)題,使得Scala語(yǔ)言變得更加優(yōu)雅、強(qiáng)大。你還在等待Java7的閉包嗎?趕緊去嘗試Scala2.8吧^_^