Lazy values in arbitrary collections

I’m currently learning Scala, coming from Haskell, and I’m a bit confused about how Scala handles lazy values. I would expect writing the following code to be possible.

val fib : Array[LazyInt] = Array.tabulate(100) { case 0 => 0; case 1 => 1; case n => fib(n - 1) + fib(n - 2) }

But it won’t compile. Is laziness simply much more limited in Scala?

Well, to begin with, laziness isn’t automatic in Scala. The language defaults to strictness, so you have to say lazy val if you want laziness. (That won’t solve your problem here, but it’s the first step.)

More generally, since the language isn’t lazy-centric the way Haskell is, you will usually need more ceremony for that sort of thing than you would over there. Scala just doesn’t have this kind of concise generator for lazy sequences built into the language itself – that sort of thing generally comes from libraries.

Keep in mind, Haskell’s a bit of an outlier. Most languages are strict; there aren’t that many that default to laziness.

2 Likes

Welcome to Scala.

Yes, laziness is much more limited.
Haskell is lazy by default; Scala has eager evaluation by default.
You can learn Scala’s evaluation strategy in detail, straight from the language creator, in the Functional Programming in Scala course, Week 1.

I’m not sure what LazyInt is in your code…
OK, I looked it up, and it is in the standard library.
But it lives in the scala.runtime package, which is compiler-support territory – it isn’t intended for everyday use.
(I’ve never seen it in my 4 years of Scala usage.)

So it’s not a good idea to dive straight in with trial and error before learning the fundamentals of the language properly… rather than relying on Google searches or API lookups to figure it out on your own, use a proper resource to learn.

In Scala you can declare a value to be lazy with:

lazy val x = ??? // can be any type of value

Otherwise, it’s eagerly evaluated.
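
A tiny sketch to make the evaluation order visible (the println is only there for illustration):

lazy val x = { println("initializing x"); 42 } // nothing printed yet
val a = x + 1 // prints "initializing x" on first access, then caches 42
val b = x + 2 // nothing printed; the cached value is reused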

There are also “by-name parameters”, which are a bit like “lazy parameters”: they are only evaluated if needed:

scala> def loop: Int = loop
1 warning found
-- Warning: --------------------------------------------------------
1 |def loop: Int = loop
  |^^^^^^^^^^^^^^^^^^^^
  |Infinite recursive call
def loop: Int

scala> def spam(x: => Int) = 42 // by-name parameter
def spam(x: => Int): Int

scala> spam(loop)
val res0: Int = 42

Otherwise, all parameters are eagerly evaluated. By-name parameters are also explained in detail in Week 1 of the online course (linked above).

There are also “lazy collections” called views. Scala lets you switch back and forth between lazy and eager collections as you like: the .view method gives you a lazily evaluated view, and methods like .toList or .to(...) force it back into a strict collection.
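
For example:

val xs = (1 to 1000000).view.map(_ * 2).filter(_ % 3 == 0) // nothing computed yet
xs.head // evaluates only as far as the first element that passes the filter
xs.take(5).toList // .toList forces it back into a strict collection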

The only “special lazy data structure” built into the standard library (to my knowledge) is LazyList. You can use it to do your Fibonacci thing here. LazyList is covered in detail in the second Scala course, where one of the programming assignments uses it heavily.
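
For example, the classic Haskell one-liner translates almost directly:

// An infinite, memoizing sequence of Fibonacci numbers; elements are
// computed on demand and then cached.
val fibs: LazyList[BigInt] =
  BigInt(0) #:: BigInt(1) #:: fibs.zip(fibs.tail).map { case (a, b) => a + b }

fibs(99) // forces (and caches) the first 100 elements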

2 Likes

IIUC, you want a strict collection of fixed size, but whose elements are lazily computed?
You can get something like that using cats Eval: Scastie - An interactive playground for Scala.
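
The Scastie won’t paste inline here, but the idea is roughly the following, assuming cats is on the classpath (the details are mine, not necessarily the Scastie’s):

import cats.Eval

// A strict, fixed-size array whose elements are computed lazily and
// memoized. Eval.defer keeps the recursive references by-name, and
// .memoize caches each cell so every subproblem is computed only once.
lazy val fib: Array[Eval[BigInt]] = Array.tabulate(100) {
  case 0 => Eval.now(BigInt(0))
  case 1 => Eval.now(BigInt(1))
  case n =>
    Eval.defer(for {
      a <- fib(n - 1)
      b <- fib(n - 2)
    } yield a + b).memoize
}

fib(99).value // forces exactly the cells it depends on, stack-safely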

1 Like

Thanks, this is exactly what I was looking for, though I would definitely prefer an approach with less boilerplate.

For those interested in why I’m asking this question: it’s so I can write code like the examples here
https://jelv.is/blog/Lazy-Dynamic-Programming/

1 Like

Yeah, it seems cats.Eval is what you are looking for.
cats.effect.IO or fs2 may come in handy as well.

1 Like

Expanding on that a little:

Keep in mind that, whereas Haskell is All FP, All the Time, Scala is intentionally broader and multi-paradigm. (That was part of why it was created in the first place.) It allows various programming idioms, but they’re not all built directly in, the way they are in Haskell.

So working in Scala usually entails choosing an idiom, and then choosing libraries that support that idiom. In the case of baseline functional programming, the most central library is typically Cats, which provides a lot of the core utilities for FP.

On top of that, cats-effect provides the IO monad (as does ZIO); fs2 provides FP-friendly streaming machinery, which is lazy in the way you’re looking for.
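
To give a flavour of fs2 (a hypothetical sketch, assuming fs2-core is on the classpath; the names here are mine):

import fs2.Stream

// A pure, lazily evaluated stream of Fibonacci numbers; nothing is
// computed until the stream is pulled.
val fibs = Stream.iterate((BigInt(0), BigInt(1))) { case (a, b) => (b, a + b) }.map(_._1)

fibs.take(10).toList // List(0, 1, 1, 2, 3, 5, 8, 13, 21, 34)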

2 Likes

Thanks - that may be pertinent to something I’m working on: kineticMerge/src/main/scala/com/sageserpent/kineticmerge/core/LongestCommonSubsequence.scala at 27e4be3c12d7ac52e8287bca59a4dff4f4f101c9 · sageserpent-open/kineticMerge · GitHub. That code has gone through a lot of refactorings from one approach to another; it is currently a rather nasty hybrid of a pure functional dynamic programming algorithm with some imperative array use. Performance and stack safety are much better these days, so that’s the trade-off.

Read it and weep…

In the meantime, sticking with Fibonacci numbers and memoization (as opposed to strict dynamic programming), you might find this amusing from an earlier discussion: Scastie - An interactive playground for Scala.
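
(The Scastie won’t reproduce inline, but the gist is plain top-down memoization, roughly along these lines; the names are mine, not necessarily the Scastie’s:)

import scala.collection.mutable

val memo = mutable.Map.empty[Int, BigInt]

// Top-down Fibonacci: each subproblem is computed at most once, on demand.
def fib(n: Int): BigInt = memo.get(n) match {
  case Some(cached) => cached
  case None =>
    val result = if (n < 2) BigInt(n) else fib(n - 1) + fib(n - 2)
    memo(n) = result
    result
}

fib(90) // fast: 91 subproblems, each evaluated exactly once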

Enjoy Scala!

1 Like

One thing about that linked Haskell code: given that it is a true dynamic programming problem, I wonder whether a very large fraction of the total number of subproblems ends up being evaluated anyway when you ask for the top-level solution.

This was what I found when going down this route previously.

I don’t doubt that it will find the correct solution; I’m just not so sure about the efficiency aspect, beyond simple memoization. It will avoid traversing the same subproblems again and again, which is great, but that doesn’t thin out the subproblem tree all that much.

I would love to be proven wrong on this, if anyone can demonstrate it.

1 Like

I’m not sure what exactly you mean by “doesn’t thin out the subproblem tree all that much”, but naive lazy solutions like the ones linked above are, when you force the “top” value, equivalent to top-down recursion. A neat thing about the lazy DP approach, though, is that you can choose your own evaluation strategy if the default one is not efficient enough (the default results in a lot of recursive calls).

There are multiple aspects to this:

  1. You want memoisation as a cheap and cheerful way of making top-down recursion emulate dynamic programming; so the same subproblems aren’t revisited by multiple paths through the recursion tree. Dynamic programming systematically evaluates the subproblems up from the bottom, so each subproblem gets evaluated once, but…
  2. A nice thing that can happen in theory with the top-down recursion is that not all the subproblems evaluated by a dynamic programming approach need to be evaluated by a recursive approach - that is what I meant by the ‘thinning-out’. I did some investigation of this for a three-way longest common subsequence algorithm and found that yes, there is some slight thinning on average when using recursion-with-memoization, but not all that much.

Some numbers taken from a property-based test for a longest common subsequence algorithm; in each pair, the first number is the count of subproblem evaluations via top-down recursion with memoization, and the second is via dynamic programming:

(85139,85140)
(5481,5508)
(1612,1620)
(12990,12992)
(19998,20000)
(6894,44772)
(16606,18676)
(52285,52316)
(40497,40500)
(44456,44462)
(25165,25168)
(5513,36504)
(1193,1200)
(16976,42398)
(21793,21840)
(1343,5984)
(63165,63168)
(22388,75348)
(446,8800)
(7149,7150)
(3862,3900)
(22617,22620)
(32637,32640)
(15282,15283)
(15707,15708)
(21915,21945)
(27986,28014)
(98788,98838)
(32719,47040)
(19962,19992)
(80353,80401)

Note that there are occasional big wins, but overall the recursive strategy doesn’t optimise all that much. Furthermore…

  3. There is the question of stack space when recursing. Even in a language with non-strict evaluation as a possibility, the subproblems require strict evaluation…

…or so I found. Again, I would like to be proven wrong here with a demonstration, because I like the clean recursive approach.

EDIT: what am I blathering on about? I ended up using Cats to unroll the recursion via continuations, IIRC (the trampolining idea is sketched after point 4 below), but it was incredibly slow.

  4. There is also the question of heap space for the cached subproblems; this is not a trivial amount. It is possible to organise the dynamic programming algorithm to work in a bounded heap space requirement (what I call a ‘swathe’), but I couldn’t find a way around this for the top-down recursive memoized approach. Here is my mumbling to myself on this topic before I gave up: Revisit `LongestCommonSubsequence.of` to use a top-down recursive approach. · Issue #107 · sageserpent-open/kineticMerge · GitHub.
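
For what it’s worth, the trampolining mentioned in point 3 looks roughly like this with cats.Eval (a minimal sketch of the mechanism, not the actual kineticMerge code):

import cats.Eval

// Eval.defer trades stack for heap: this would blow the JVM stack as a
// plain recursive def, but runs fine on Eval's trampoline.
def countDown(n: Long): Eval[Long] =
  if (n == 0) Eval.now(0L)
  else Eval.defer(countDown(n - 1))

countDown(10000000L).value // returns 0 without a StackOverflowError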

Not trying to put you off this at all; please do experiment. I just found the topic interesting.

Have fun with it.