StackOverflowError on tail-recursive function

gclv · October 28, 2022, 5:46pm

Why would a function that is tail-recursive (or at least one that compiles with the @tailrec annotation) stack overflow?

I am attempting to write a continuation-passing style rightFold. Here’s what I’ve got so far:

def rightFold[T, U](base: U)(fn: (T, => U) => U)(list: Seq[T]): U = {
  @scala.annotation.tailrec
  def go(f: U => U, xs: Seq[T]): U = xs match {
    case Nil => f(base)
    case head +: next => go(u => fn(head, f(u)), next)
  }

  go(identity, list)
}

CPS is still a bit mind-bending to me, so I wouldn’t be surprised if the implementation wasn’t correct. Still, at least it compiles and yields correct results for trivial instances:

rightFold[Int, Int](0)(_ + _)(Array(1, 2, 3, 4).toIndexedSeq) // : Int = 10 <- expected

val shortList = LazyList.continually(10).take(256)
rightFold[Int, Int](0) { _ + _}(shortList) // : Int = 2560 <- expected

Now, for the actual question. What is being allocated on the stack here? What does this implementation overflow it? Is the implementation incorrect? Am I doing something silly?

val longList = LazyList.continually(10).take(100_000)
rightFold[Int, Int](0) { _ + _}(longList) // <-- Stack overflow!!

javax-swing · October 28, 2022, 6:09pm

I think the problem here is that your are building up a nested function, with 3 values in the list you would have f(1, f(2, f(3)))

when you have 100000 elements you have that many nest function calls

gclv · October 28, 2022, 6:40pm

Huh, I suppose that makes sense. I guess I haven’t fully grokked continuation-passing style, though, since I thought that was inevitable to implement a right fold.

Any chance anybody could point me to a good implementation of this?

Thanks a lot!

spamegg1 · October 29, 2022, 6:14am

I believe it’s not possible to make foldRight tail recursive (simply because of the way the binary operation f itself has to be applied, not any other reason), so the Scala Library uses while loops to implement it (or reverse the list and use foldLeft which is tail-recursive).

gclv · October 31, 2022, 12:42pm

Here’s an implementation that works:

def foldRightCPS[T, U](base: U)(f: (T, => U) => U)(as: Seq[T]): U = {
  import scala.util.control.TailCalls._

  @annotation.tailrec
  def _foldRightCPS(fu: U => TailRec[U])(as: Seq[T]): TailRec[U] = {
    as match {
      case Nil => fu(base)
      case head +: next => _foldRightCPS(u => tailcall(fu(f(head, u))))(next)
    }
  }
    
  _foldRightCPS(u => done(u))(as).result
}

It uses scala.util.control.TailCalls. (docs)

There’s a lot to this library that I don’t understand. But could someone explain to me what it is about such an implementation that doesn’t make it subject to the same types of problems?

SethTisue · October 31, 2022, 2:07pm

This might help: Trampoline (computing) - Wikipedia

After each function call, control returns to the trampoline (the result method in TailRec), so stack isn’t consumed. Each individual unit of work completes before we proceed to the next unit of work.

gclv · October 31, 2022, 5:23pm

Thanks. I understand the concept of a trampoline for TCO. But surely tail calls are optimized to some extent by the Scala compiler already – whether via a trampoline, or by some other means. So what is it about this implementation – other than the different interface – that makes it capable of performing what the other one can’t?

Maybe my question will be answered by reading the paper on this implementation.

SethTisue · October 31, 2022, 5:38pm

Someone in SO is poking around in the same thing. I initially assumed it was you: fold - FoldRight over Infinite Structures in Scala using Trampolines - Stack Overflow

SethTisue · October 31, 2022, 5:47pm

The Scala compiler never introduces trampolining. If you want trampolining you have to explicitly use TailCalls (or some hand-coded equivalent of it — TailCalls is just ordinary Scala code, without special compiler support).

Self-recursive tail calls (not tail calls in general!) are optimized by the Scala compiler by replacing them with jumps, so they don’t consume stack. The emitted bytecode resembles that would be emitted by an equivalent while loop. @tailrec fails compilation if this optimization cannot be performed.

Reading the paper you linked to should definitely help you understand how TailCalls works.