I am implementing a wrapper that manages iterators within it. Essentially I keep iterators in the wrapper
that represent columns of data. While testing I found that my wrapper is not lazy. So I made this very
simple test:
val counter1 = Stream.from(1).toIterable // No need to call toIterable
val r1 = counter1.map( _ * 2).take(25).toList // terminates
println(r1)
The infinite stream above can be processed lazily. So now I implement my own Iterable and expect it
to behave lazily too just as above. I have the following minimal example:
val data = List(Stream.from(1).toIterator)
val counter3 = new Iterable[List[Int]] {
override def iterator: Iterator[List[Int]] = new AbstractIterator[List[Int]] {
override def hasNext: Boolean = data.forall( p => p.hasNext)
override def next(): List[Int] = data.map { v => v.next() }
}
}
//val r3 = counter3.toIterable.map( _.map(_ * 2) ).take(25).toList // does not terminate
val r3 = counter3.map( _.map(_ * 2) ).take(25).toList // does not terminate
//val r3 = counter3.toIterator.map( _.map(_ * 2) ).take(25).toList // terminates
//val r3 = counter3.iterator.map( _.map(_ * 2) ).take(25).toList // terminates
println(r3)
If I use the Iterable’s map directly, the behavior is not lazy. I am obligated to access
the iterator first. What am I doing wrong?
The default implementations of collection methods up to Scala 2.12 are strict. You have to override basically everything if you want to implement a lazy collection. This changes with the new design in 2.13.0-M4 that makes the default implementations lazy and provides mix-in traits for more efficient strict implementations.
If you are specifically referring to the Iterable and Iterator implementations this seems odd. Note that I am using the Stream’s iteratable, so Iterable is lazy. Because I use that Iterable, my Iterable should also be lazy (baring some error I am making). Note that if I access the Stream’s iterator via the very same Iterable, then it is lazy. Must be more to it than meets the eye.
Same behavior on 2.13, maybe because of using List here? Just guessing from browsing. I got lost among the factories.
scala> val data = List(Stream.from(1).toIterator)
^
warning: method toIterator in trait IterableOnceOps is deprecated (since 2.13.0): Use .iterator() instead of .toIterator
^
warning: value Stream in package scala is deprecated (since 2.13.0): Use LazyList instead of Stream
data: List[Iterator[Int]] = List(non-empty iterator)
scala> def vs = new Iterable[List[Int]] { def iterator = new Iterator[List[Int]] { def hasNext = data.forall(_.hasNext); def next() = data.map(_.next()) }}
vs: Iterable[List[Int]]
scala> vs.map(_.map(_*2)).take(25).toList // bye
Note that what is built by Iterable.map has nothing to do with anything else, though one could imagine that an Iterable[SomethingLazy] might try to be lazy on 2.12. Also guessing.
The list is not an issue, but you are correct - the example is not clear.
I have made the following simpler example:
case class Wrapper(r:Iterable[Int])
val counter2 = Wrapper(Stream.from(1))
val r2 = counter2.r.map( _ * 2).take(25).toList // terminates
println(r2)
val data = Wrapper(Stream.from(1))
val counter3 = new Iterable[Int] {
override def iterator: Iterator[Int] = new AbstractIterator[Int] {
val iter = data.r.toIterator
override def hasNext: Boolean = iter.hasNext
override def next(): Int = iter.next()
}
}
//val r3 = counter3.toIterable.map( _ * 2 ).take(25).toList // does not terminate
//val r3 = counter3.map( _ * 2 ).take(25).toList // does not terminate
val r3 = counter3.toIterator.map( _ * 2 ).take(25).toList // terminates
//val r3 = counter3.iterator.map( _ * 2 ).take(25).toList // terminates
println(r3)
In the first “section” I use the Stream’s Iterable directly with no problem.
In the second case I simply wrap this Iterable in my own. Note that it does
nothing useful here - just delegates to the underlying iterator. But even so
it is not lazy when I process the stream via the Iterable and not the underlying
iterator.
Here is the “same” code that you can paste into scastie:
object Main {
case class Wrapper(r:Iterable[Int])
def main(args: Array[String]): Unit = {}
val counter2 = Wrapper(Stream.from(1))
val r2 = counter2.r.map( _ * 2).take(25).toList // terminates
println(r2)
val data = Wrapper(Stream.from(1))
val counter3 = new Iterable[Int] {
override def iterator: Iterator[Int] = new Iterator[Int] {
val iter = data.r.toIterator
override def hasNext: Boolean = iter.hasNext
override def next(): Int = iter.next()
}
}
//val r3 = counter3.toIterable.map( _ * 2 ).take(25).toList // does not terminate
val r3 = counter3.map( _ * 2 ).take(25).toList // does not terminate
//val r3 = counter3.toIterator.map( _ * 2 ).take(25).toList // terminates
//val r3 = counter3.iterator.map( _ * 2 ).take(25).toList // terminates
println(r3)
}
Same result, it hangs processing an infinite loop.
Hmm… now how do I cancel the process in scastie’s sever?
I see what you did there. But I had to take another look anyway.
The problem is indeed what map builds. In 2.12 map is strict so you’d have to override it with a lazy implementation and provide a sensible CanBuildFrom instance, too. In 2.13 map supports lazy building but the default Iterable, as @som-snytt points out, is List, so it’s not lazy. It works (in 2.13.0-M4) if you override the iterableFactory to build a lazy collection:
val counter3 = new Iterable[Int] {
override def iterator: Iterator[Int] = ...
override def iterableFactory = Stream
}
Ok. Finally get the gist of it. I am using 2.12 so I have one last question: the underlying CanBuildFrom generates a List, however the following statement succeeds:
val r3 = counter3.iterator.map( _ * 2 ).take(25).toList
Am I correct in assuming it is lazy? Reason I ask is because I am using this as a kludge to “solve” the problem.
Yes, you are correct. Iterator goes to great lengths to be lazy.
I see the latest commit on 2.12 Iterator is, “Iterator.scanLeft is lazy at initial value.” Apparently, it was my contribution, but I’d have to get out a napkin and a pencil to verify how it works. Another was, “GroupedIterator is also lazy when padded.” I no longer understand the helpful comments I added to the code four years ago.
Those iterators are like the Monday crossword.
I’m not sure why you see your solution as a kludge. Nothing says lazy like iterator, so you have made it clear that you want laziness, the way par makes it clear you want parallelism.