List views, flatMap, and forcing


#1

Consider this SO thread.
It is not easy to understand how things work (or why they do this way) in these two cases:

def compute(n: Int) = { println("Computing "+n); n * 2 }

// Example 0
val v0 = List.range(1, 4).view.flatMap(n => List(1,2).map(k => compute(n)))
v0(0) // Computing 1 
      // Computing 1
      // Computing 2
      // Computing 2
      // Computing 3
      // Computing 3
      // Computing 1
      // Computing 1

// Example 2
val v2 = List.range(1, 4).view.flatMap(n => List(1,2).view.map(k => compute(n)))
v2(0) // Computing 1

The result of example 0 is somewhat unexpected (as the SeqView nature is preserved on flatmap, even though the flatmap’s production is not a view). I also think that the scala-docs page on views should include some information about this (no info about SeqViewN, the type of values resulting from flatmapping a view, seems to be available on the web).

Can somebody explain what is happening under-the-hood? It is not easy to trace the calls (and the intended semantics) along List, SeqLike, SeqViewLike, TraversableViewLike and the like.


#2

Tracing the calls isn’t too bad but then my eyes have bled plenty of times over code like this.

In Example 0 there are two things which are important. First,

List.range(1, 4) === List(1, 2, 3)

Second, the implementation for indexing into a List first checks to see if the index provided is greater than the length of the list. Applying v0(0) first drops the first 0 elements (a no-op) and then since 0 is not < 0 it checks if what is remaining after the drop .isEmpty. A smarter view (like one that is views all the way down [See Example 2] could tell that the result is .nonEmpty after a single result but it seems Example 0 needs to compute the entire list before deciding it is .nonEmpty.

Without a view being involved v0 would equal

List(List(2, 2), List(4, 4), List(6, 6)).flatten

With the view involved v0 is a computation that will calculate a list if you perform actions on it that will force the view (such as indexing into it). I think these examples are also instructive

def compute(n: Int) = { println("Computing "+n); n * 2 }

scala> val vs = List.range(1, 4).view.flatMap{n => println(";;"+n); List(1,2).map(k => compute(n))}
vs: scala.collection.SeqView[Int,Seq[_]] = SeqViewN(...)

scala> vs.take(2)
res0: scala.collection.SeqView[Int,Seq[_]] = SeqViewNS(...)

scala> vs.take(2).toList
;;1
Computing 1
Computing 1
res1: List[Int] = List(2, 2)

scala> vs.take(3).toList
;;1
Computing 1
Computing 1
;;2
Computing 2
Computing 2
res2: List[Int] = List(2, 2, 4)

When we want only two elements of the list the view only needs to process the first range element producing the two desired elements. If we want three elements the view needs to process the first two range elements producing four elements of which we then only take three.

In Example 2 the views go all the way down and so we can produce our desired single element by only running the computation until a single element is produced. The lazier you make things the less work has to be done to generate the result you desire.

I realize that was a more anecdotal coverage than under-the-hood but I hope it gives you some idea of what’s going on.


#3

Re-reading the original question you might have been asking for how views work under the hood rather than how the example code was working. In Scala a view basically builds a data structure that overrides certain aspects of the principal to do the work of being more lazy. I’ll say that again another way because it’s key. A view doesn’t calculate a transformation. It’s a data structure that can be manipulated by surrounding with other view data structures which keeps things as lazy as possible. Only at the end when you need a final result do any transformations happen and usually they can happen on as little data as possible.

 List.range(0,10000)   .map{_ * 2}   .filter{ _ % 5 == 0}   .take(3)
| 10k values             | 10k values  | 2k values              | 3 values

List.range(0,10000).view   .map{_ * 2}        .filter{_ % 5 == 0}           .take(3)
| datastruct A                  |  datastruct B(A) | datastruct C(B(A))       | 3 values *

* take "wants" 3 values so it first "needs" a value.  Asks C for a value.
C needs a value so asks B for a value.  B asks A.
A gives 0.
B gives 0 * 2.
C gives 0.
take "needs" another value.  Asks C for a value.
C needs a value so asks B for a value.  B asks A.
A gives 1.
B gives 1 * 2.
C filters out 1 * 2 but has to give a value to take so asks B for a value.
B asks A.
A gives 2.
B gives 2 * 2.
C filters out 2 * 2 ...
...
A gives 5.
B gives 2 * 5.
C gives 10.  take now has 2 values and keeps going until it has the 3 desired (or runs off the end of the view).

For Seq that’s SeqView which works by fiddling with how length() and apply() work. The goal of a view is to do only as much work as is needed for whatever you are trying to do and not more. That’s going to meet with more or less success depending on what you are trying to do and how you have gone about trying to do it (Example 0 vs Example 2 above) but in general if I would normally need to calculate 10k values because of a .flatMap and a view can get me by only calculating 1k of them it’s still a win.

Pretty terrible description but without a whiteboard and a nice IDE to go bouncing around in code it’s tough (for me) to do better in so little space.


#4

That’s ok, but I’ll try to be more precise: why indexed access to a view flatmapped with a non-view requires computing length while it is not the case when flatmapped to another view? Once it is clear the principle for which the behaviour in the two examples makes sense and is indeed expected, how are Scala collections organised to support such idea?


#5

An excellent question and one I didn’t code dive to figue out. Looks SeqViewLike trait FlatMapped is doing the heavy lifting. It wants to pull in something from TraversableView. List apply just needs to check if something isEmpty. It looks like the FlatMapped trait is making use of length which you won’t know without calculating all the values to count them. Conceptually you only need to calculate the first two elements (required to calculate them both because the closure give to flatMap isn’t itself using a LIst.view) to know the remainder is nonEmpty but that doesn’t seem to be how it was implemented.