This started with a discussion related to Clojure's loop/recur construct. I believe I have captured all elements of the discussion without any of the particulars. Originally, I had thought that non-final methods would require a few extra opcodes to stack-optimize tail-recursive functions. I will show that optimizing stack usage with tail recursion should produce identical code regardless of the method's "final" status.
Summary:
Currently the Scala compiler will only optimize tail recursion for final methods. There is no reason for it to limit the optimization.
Discussion:
In this summary I will use the words "method" and "function." When I use the word "function" I mean an executable subroutine. Methods may be functions or abstract. An abstract method defines a contract, but no usable subroutine. In Scala and other Smalltalk-derived languages, there is an implicit variable called "this", "self", or something similar. I will make this variable explicit in my examples, e.g.
trait X { def fn():Unit }
declares a method in X which, with the receiver made explicit, has type
X => Unit
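Making the receiver explicit can be sketched directly in Scala: the method fn eta-expands to a function value of type X => Unit. The names below (Receivers, fnAsFunction) are illustrative, not from the original discussion.

```scala
object Receivers {
  trait X { def fn(): Unit }

  // The method fn, with its implicit receiver `this` made explicit,
  // becomes an ordinary function value of type X => Unit.
  val fnAsFunction: X => Unit = (self: X) => self.fn()
}
```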
Consider the functions "a", "b", and "d", all with the same type
(X[T], Seq[T]) => Seq[T]
Let the function "b" be defined by the first concrete class of type X. The function "b" iterates over the given sequence and returns a sequence. In Scala there are two ways to create this function.
trait X[T] { def b(tsx: Seq[T]): Seq[T] = ??? }
or
trait X[T] { val b: (X[T], Seq[T]) => Seq[T] = ??? }
Until later, I will not make a distinction between the two.
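To make the two encodings concrete, here is a minimal sketch; the reversing body is a stand-in for the real algorithm, and the trait names are mine, not the original's:

```scala
object TwoEncodings {
  // Encoding 1: an ordinary method. The receiver `this` is implicit,
  // and the method participates in virtual dispatch (it can be overridden).
  trait XMethod[T] { def b(tsx: Seq[T]): Seq[T] = tsx.reverse }

  // Encoding 2: a function-valued member. The receiver is an explicit
  // parameter, and the val is plain data rather than a virtual method.
  trait XVal[T] { val b: (XVal[T], Seq[T]) => Seq[T] = (_, tsx) => tsx.reverse }
}
```

The difference that matters later in this note is exactly the dispatch: only the method form can be overridden by a subclass.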
The function “a” needs to consume the function “b”. Function “a” calls “b”
a(x,tsx) = { b(x,tsx) }
The consumer "a" in general knows nothing about the implementation of "b". The function "b" could iterate over every element in tsx using tail recursion, iterate with a do/while construct, or do something else entirely. Unless "a" depends on an implementation artifact of "b", it shouldn't presume anything about the implementation.
[[Skip over the use case if you see the problem already.]]
Use case 1:
<<<System version 1 — year 1>>>
The trait X is in an installed system that has been running for several years. The method "b" is rarely called. When it is called, it is with very short sequences. The HotSpot compiler never compiles the subroutine, as it is never "hot."
.b(x,tsx) // Called very rarely.
The programmer chose an n^2 algorithm. It was trivial to implement and review, and there wasn't any performance reason to use a more sophisticated algorithm. It uses tail recursion for looping, but the Scala compiler does not optimize the tail recursion: every iteration performs a subroutine call to itself.
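A hedged sketch of what such a "b" might look like (the body, duplicate removal, is my placeholder, not the original's algorithm): it is tail-recursive, and it is n^2 because filterNot rescans the remaining sequence on every step. Because the method is public and overridable, scalac leaves the self-call in place, so each iteration consumes a stack frame.

```scala
class NaiveB[T] {
  // Illustrative n^2 body: tail-recursive duplicate removal.
  // The recursive call is in tail position, but the method is not
  // final, so the compiler does not rewrite it into a loop.
  def b(tsx: Seq[T], acc: Seq[T] = Seq.empty): Seq[T] =
    if (tsx.isEmpty) acc
    else b(tsx.tail.filterNot(_ == tsx.head), acc :+ tsx.head)
}
```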
<<<System version 2 — year 2>>>
One year later, a new capability is added to the system. The function "a" uses the function "b", but "a" is rarely used. The function "b" is now called a lot, and its n^2 algorithm is no longer acceptable. The algorithm is changed to n*log(n). The function still uses tail recursion, and it remains unoptimized. All of the tsx are short.
.a(x,tsx) // Called rarely; when it is called, it calls "b".
.b(x,tsx) // Now called frequently. The length of "tsx" never exceeds 5.
Unknown to the engineers, the programmer who wrote "a" depends on the implementation of "b" calling back into "a" for each iteration. Heaven knows why the programmer would be so odd, but "a" will break if it isn't called back for every iteration of "b". Although "a"'s programmer wrote robust test routines and measured code coverage, the function "a" is almost never exercised, so it misses key quality controls. It's put into production.
<<<System version 3 — year 3>>>
A year later the system starts working on different data. Now the function "b" is being called with "tsx" tens of thousands of elements long. The function "b" fails with a stack overflow. The fastest way to get a bug fix into "b" is to optimize its tail recursion.
The body of “b” is put into a new private method “d”. The @tailrec annotation is placed in front of “d”. The function “b” is left public, but now just delegates to “d”.
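The delegation pattern just described can be sketched as follows; the pass-through body is a placeholder for the real n*log(n) algorithm, and the holder class name is mine:

```scala
import scala.annotation.tailrec

class FixedB[T] {
  // "b" stays public (and overridable) but now just delegates to "d".
  def b(tsx: Seq[T]): Seq[T] = d(tsx, Vector.empty)

  // "d" is private, so scalac is allowed to turn the self-call into a
  // jump; @tailrec makes the compiler verify that it actually does.
  @tailrec
  private def d(rest: Seq[T], acc: Vector[T]): Seq[T] =
    if (rest.isEmpty) acc
    else d(rest.tail, acc :+ rest.head)
}
```

Because the loop in "d" is compiled to a jump, "b" now handles sequences of any length in constant stack space.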
<<<System version 3 — year 4>>>
Something goes wrong. The core table in the database is destroyed. The cause of the problem can’t be found. The production staff restore the database and restart the system.
<<<System version 3.1 — year 4.1>>>
A call to the function "a" is made. Remember, it relies on being called back while "b" is being evaluated. It isn't. For a single event, the system's internal state is poisoned. This is the same thing that destroyed the database in year 4. After 24 hours the poison hits the core of the system, and the database is destroyed again.
An entire day's production is lost. A forensic team is assembled to find out what went wrong. The database is destroyed again the next day. The forensic team can't find the bug.
The system is now taken down at midnight every day, and preventive maintenance is done on the database. The database no longer has to be rebuilt during peak hours, which is a very good thing. Unfortunately, another operator must be added to the night staff, and additional jobs are needed to perform the maintenance each night. The entire batch execution structure must be rewritten, because a job that needs to run nightly no longer fits into the overnight batch cycle. In addition to the extra operator, more computers and disks are bought to make everything complete overnight.
<<<System version 3.1 — year 4.5>>>
Four months later, the team finds that the fault is in function "a". When it first went into production in version 2, it worked correctly. That was the last time anyone actually tested it. That entire subsystem was unchanged after version 2 was released, and the function was never reevaluated before the changes to the core system (release 3) were put into production. For the first year of version 3, the data remained the same and the function "a" wasn't called. The data changed at the start of year 4: no code was changed, but the data being fed into the system changed. The bug in function "a" doesn't destroy the database immediately. It introduces a poisoned data packet that crashes the database a day or so later.
The bug is fixed. The production staff, the operations manager, and the manager who has overspent his budget by more than $1 million gather pitchforks and look for the programmer of "a".
[[End of use case 1]]
Analysis:
The problem was introduced when the function "a" was installed into production. A year after it was installed, a performance problem was detected in "b", and "b" was rewritten to go from n^2 to n*log(n). A year after that, the data changed, "b" was called with 10,000-element sequences, and its tail recursion was optimized.
The first time the problem was observed was a year later still: two years and a major version change after "a" was written. The function was almost never called. When it was first called, "b" called it back at each step, and it worked. When version 3 was installed, it stopped working, but the bug was benign; it never caused a problem. It wasn't until the data changed that "a" started being called every day. Unfortunately, the bug didn't cause the system to fall over immediately. It poisoned the state of the system in a way that would make it fall over at a later time.
Use Case 2:
Use Case 2 is actually Use Case 1. I’ll just give you more information.
We have trait
trait X[T] { def fn(tsx:Seq[T]):Seq[T] /* abstract */ }
Function “b” is created by
class System1[T] extends X[T]
{ def fn(tsx:Seq[T]):Seq[T] = {...} }
For the class System1, the method "fn" is the function "b". The method "fn" is written with n^2 complexity and uses tail recursion for looping. The method isn't declared final, so the Scala 2 compiler can't optimize the tail recursion. But that's fine: the sequences are never longer than 5 and it is rarely called, so any kind of optimization is more hassle than not optimizing.
Starting in release 2 another class is created
class SubSystem2[T] extends System1[T]
{ override def fn(tsx: Seq[T]): Seq[T] = { super.fn(tsx) } }
SubSystem2's "fn" is function "a". Somewhere in its implementation, it calls System1's implementation of "fn". It calls it in a way that requires the super implementation to call back into "a" once for each element. The call stack looks like
….a
……..b
…………a // Iteration 0
…………a // Iteration 1
…………a
…………a // Until n
In version 3, System1's "fn" (function "b") first switches to a CPU-efficient n*log(n) algorithm. Later, its tail recursion is optimized into a stack-efficient loop that makes no call.
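The difference is observable. A hedged reconstruction (all bodies are illustrative placeholders, and the V2/V3 class names are mine): in version 2 the tail call goes through the virtual method, so an override like SubSystem2's is re-entered once per element; after the version-3 rewrite the loop lives in a private helper and never touches the vtable.

```scala
import scala.annotation.tailrec

// Version 2: "b" loops via an unoptimized tail call on the *virtual* method.
class System1V2[T] {
  def fn(tsx: Seq[T]): Seq[T] =
    if (tsx.isEmpty) Nil
    else fn(tsx.tail)               // virtual self-call: dispatches to overrides
}

class SubSystem2V2[T] extends System1V2[T] {
  var callbacks = 0                 // count how often "a" is re-entered
  override def fn(tsx: Seq[T]): Seq[T] = { callbacks += 1; super.fn(tsx) }
}

// Version 3: the loop moves into a private @tailrec helper "d".
class System1V3[T] {
  def fn(tsx: Seq[T]): Seq[T] = d(tsx)

  @tailrec
  private def d(tsx: Seq[T]): Seq[T] =
    if (tsx.isEmpty) Nil else d(tsx.tail)   // compiled to a jump, no dispatch
}

class SubSystem2V3[T] extends System1V3[T] {
  var callbacks = 0
  override def fn(tsx: Seq[T]): Seq[T] = { callbacks += 1; super.fn(tsx) }
}
```

With a three-element sequence, the V2 override is entered four times (the initial call plus once per element); the V3 override is entered exactly once. That missing callback is the bug in "a".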
Most callers of "b" reach it through an invokevirtual call. The function "b" calls "d" using an invokespecial instruction (a private method is not virtually dispatched). SubSystem2's "fn", function "a", calls System1's "fn" using an invokespecial super call.
Here lies the problem. The programmer of SubSystem2.fn uses a super.fn() call. Because the programmer is overriding System1's fn, he, in this case, knows its inner implementation. SubSystem2.fn is written to take advantage of knowing the implementation of "fn", not just its contract.
The problem didn't happen until there was a major revision of System1. Even then, it didn't become catastrophic until the data in the system changed and "b" started getting called more often.
Lesson:
It’s a very bad idea to rely on the implementation of a method that you call. In Use Case 1, the names of the functions were “a”, “b”, and “d”. In Use Case 2, the functions are called “fn”, “fn”, and “private tailRecOptimizedFn”.
Use Case 1 likely would never happen as written. The problem happened in Use Case 2 because the author of SubSystem2 overrode "fn" and used "super". That is likely what made the author of SubSystem2 feel entitled to rely on the implementation of its superclass.
Conclusion:
Tail-recursion stack optimization is little different from an algorithm optimization: one optimizes call-stack depth, the other CPU cycles. The Scala compiler isn't doing anyone any favors by refusing to stack-optimize tail recursion in non-final methods. The bug could have happened regardless of the "final" status of the functions; all of "a", "b", and "d" could be static and the same bug would exist. There is nothing special about non-final methods. Every method can be optimized for CPU or stack by programmers. Having the Scala compiler implement a standard optimization technique is a good thing, and not performing the optimization isn't a bad thing in itself; it just costs runtime overhead. The compiler is free to optimize (or not) stack usage for a final method. The compiler has code that essentially reads "if final then optimize else raise an error." There is no reason for the conditional logic. The compiler should stack-optimize a method's tail recursion regardless of whether it is final.
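The "if final then optimize else raise an error" conditional is visible through the @tailrec annotation. A minimal sketch: the final method below compiles and is rewritten to a loop, so a million iterations run in constant stack; removing `final` makes the Scala 2 compiler reject the annotation instead, reporting that the method can be overridden. The class name here is illustrative.

```scala
import scala.annotation.tailrec

class Counter {
  // Compiles, and the self-call is rewritten into a loop, because the
  // method is final. Deleting `final` turns @tailrec into a compile error
  // rather than merely leaving the recursion unoptimized.
  @tailrec
  final def countDown(n: Int): Int =
    if (n <= 0) 0 else countDown(n - 1)
}
```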