Streams and LazyLists in 2.12 vs 2.13

Hi All,

I am trying to update some code that uses Streams from 2.12 to 2.13. Stream has been deprecated in 2.13 and replaced by LazyList. My question is: why has memory performance regressed in 2.13 (with either the old Stream or the new LazyList)?

For example, in the code below I am trying to enumerate bigger and bigger trees, which Scala 2.12 can cope with up to a depth of 4 (resulting in 599882556 trees). But the same code in 2.13 aborts with an OutOfMemoryError. Is this to be expected? Is there anything one can do about it? Essentially I want to generate test cases and then test a property on them for as long as my memory and patience last.

Thanks,
Christian

In 2.12 the code below prints the size for the depth-4 case, but in 2.13 the last line results in an OutOfMemoryError. This is independent of whether Stream or LazyList is used.

sealed abstract class Tree
case object Empty extends Tree
case class Leaf(c: Char) extends Tree
case class Node(t1: Tree, t2: Tree) extends Tree

// Enumerates all trees up to depth n with leaves drawn from cs:
// the trees up to depth n - 1, followed by all pairings of them.
def enum(n: Int, cs: List[Char]): Stream[Tree] = n match {
  case 0 => Empty #:: cs.toStream.map(Leaf)
  case n =>
    val ts = enum(n - 1, cs)
    ts #:::
    (for (t1 <- ts; t2 <- ts) yield Node(t1, t2))
}

println(enum(2, ('a' to 'b').toList).size) // 156
println(enum(3, ('a' to 'b').toList).size) // 24492
println(enum(4, ('a' to 'b').toList).size) // 599882556; OOMs in 2.13
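
For reference, a direct port to 2.13's LazyList (a sketch; the name enumLL is mine) fails in the same way:

// The same enumeration ported to LazyList for 2.13; as noted
// above, it runs out of memory just like the Stream version.
def enumLL(n: Int, cs: List[Char]): LazyList[Tree] = n match {
  case 0 => Empty #:: cs.to(LazyList).map(Leaf)
  case n =>
    val ts = enumLL(n - 1, cs)
    ts #:::
    (for (t1 <- ts; t2 <- ts) yield Node(t1, t2))
}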

Can you open a ticket for this? I wasn’t able to reproduce the exact failure (which would depend on the right JVM settings), but the new Cons encoding is indeed bad. A Cons in 2.12 is 24 bytes vs 32 bytes in 2.13, because the lazy val wastes space on an initialization-state bitmap instead of cleverly reusing the tlGen variable as the manual scheme in 2.12 does. Probably even worse: since the lazy val encoding never overwrites its equivalent of tlGen, the memory for the initializer cannot be freed even after it has run.
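
To make the difference concrete, here is a simplified sketch of the two tail encodings (hypothetical classes, not the actual library source; thread safety omitted):

// Simplified sketch, not the real library code.

// 2.12-style manual scheme: tlGen doubles as the "not yet forced"
// flag and is nulled once the tail has been computed, so the thunk
// and everything it captures become collectable.
final class ConsManual[A](val head: A, tl: => ConsManual[A]) {
  private[this] var tlGen: () => ConsManual[A] = () => tl
  private[this] var tlVal: ConsManual[A] = _
  def tail: ConsManual[A] = {
    if (tlGen ne null) { tlVal = tlGen(); tlGen = null }
    tlVal
  }
}

// 2.13-style: a compiler-generated lazy val adds a separate
// initialization flag and never nulls the captured by-name thunk,
// so each cell is larger and retains the initializer after it runs.
final class ConsLazy[A](val head: A, tl: => ConsLazy[A]) {
  lazy val tail: ConsLazy[A] = tl
}

Assuming compressed oops, the manual cell is a 12-byte header plus three references, i.e. the 24 bytes mentioned above; the lazy val cell needs an extra flag field plus padding on top of that.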

Is node size even relevant at all? I mean, on the one hand enum(4, ('a' to 'b').toList).size forces the Stream, but on the other hand there’s no need to keep computed elements in memory after they have been traversed. WDYT?

The stream is built breadth first, by appending to the stream of sub-trees all the nodes built from them. The head of the stream (and therefore the entire stream) doesn’t go out of scope until it has been built in its entirety.
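
In other words (a schematic illustration, not code from the thread):

// Schematic: any live reference to the head while the stream is
// being forced keeps every forced cell reachable.
val s = enum(4, ('a' to 'b').toList) // `s` pins the head
println(s.size)                      // forces ~600m cells while
                                     //  the head is pinned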

I don’t see why the whole stream needs to be kept in memory. Consider only the outermost invocation:

  case 4 => {  // 4 instead of n, because we care only about
               //  the most memory consuming recursion level
    val ts = enum(n - 1, cs) // assume `ts` has size of 10k
                             // after forcing evaluation
    ts #::: // here we're lazily recreating `ts` to form
            //  a new prefix of the returned collection
    (for (t1 <- ts; t2 <- ts) yield Node(t1, t2)) // here the tail
                                                  // (of size 100m = 10k * 10k)
                                                  // is lazily created
  }
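
Desugared, the returned expression is (shown for illustration):

  ts #:::                                      // prefix: a thunk closing over `ts`
  ts.flatMap(t1 => ts.map(t2 => Node(t1, t2))) // tail: closures also capture `ts`

Both the prefix and the tail thunks close over `ts`, so `ts` is pinned, but nothing pins the cells of the 100m tail behind the traversal point.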

In the above scenario only `ts` (of size 10k) needs to be kept fully evaluated in memory; the resulting stream of size 10k + 100m can be evaluated lazily. Is that correct?
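
One way to check this empirically (a hedged sketch: the capped heap and the manual counting loop are my own suggestion, not something from this thread):

// Hypothetical experiment: count elements with a loop in which only
// `rest` advances along the stream, so cells behind it can be
// collected. Run with a capped heap (e.g. scala -J-Xmx512m) on both
// 2.12 and 2.13 and see whether the depth-4 case completes.
def count(ts0: Stream[Tree]): Long = {
  var rest = ts0
  var n = 0L
  while (rest.nonEmpty) { n += 1; rest = rest.tail }
  n
}

println(count(enum(4, ('a' to 'b').toList))) // 599882556 if it completes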