Tail recursion in non-final methods

tyohDeveloper · July 19, 2019, 10:33pm

This started with a discussion related to Colojure’s loop/recur construct. I believe I have included all elements of the discussion w/o any of the particulars. Originally, I had thought the non-final methods would require a few extra opcodes to stack optimize tail recursive functions. I will show that optimizing stack usage with tail recursion should produce the identical code regardless of the method’s “final” status.

Summary:

Currently the Scala compiler will only perform tail recursion for final methods. There is no reason for it to limit the optimization.

Discussion:

In this summary I will use the words “method” and “function.” When I use the word “function” I mean an executable subroutine. Methods may be functions or abstract. An abstract method defines a contract, but no usable subroutine. In Scala and other Smalltalk derived languages, there is an implicit variable called “this”, “self” or something the same. I will make this variable explicit in my examples, e.g.

trait X { def fn():Unit }

is a method in X of type

X=>Unit

Consider the functions (a,b,d) with the same type

(X,Seq[T])=>Seq[T]

Let the function “b” be defined by the first concrete class with type X. The function “b” iterates over the given sequence and returns a sequence. In scala there are two ways to create this function.

trait X[T] { def b(tsx:Seq[T]):Seq[T]=???}

or

trait X[T] { val b: (x:X,tsx:Seq[T]):Seq[T]=???}

Until later, I will not make a distinction between the two.

The function “a” needs to consume the function “b”. Function “a” calls “b”

a(x,tsx) = { b(x,tsx) }

The consumer “a” in general knows nothing about the implementation of “b”. The function “b” could tsx, iterate over every element in tsx, use tail recursion, iterate with a do/while construct, or something else. Unless “a” depends on an implementation artifact of “b” it shouldn’t presume anything about the implementation.

[[Skip over the use case if you see the problem already.]]

Use case 1:

<<<System version 1— year 1>>>

The trait X is in an installed system that has been running for several years. The method “b” is rarely called. When it is called, it is with very short sequences. The Hotspot compiler never compiles the subroutine as it is never “hot.”

.b(x,tsx) // Called very rarely.

The programmer an n^2 algorithm. It was trivial to implement, code review, and there isn’t any performance reason to use a more sophisticated algorithm. It uses tail recursion for looping, the scala compiler does not optimize the tail recursion. Every iteration performs a subroutine call to itself.

<<<System version 2— year 2>>>

One year later, a new capability is added to the system. The function “a” uses the function “b”, but “a” is rarely used. The function “b” is called a lot, it’s n^2 algorithm is no longer acceptable. Its algorithm is changed to a n*ln(n). The function remains using tail recursion, but it remains unoptimized. All of the tsx are short.

.a(x,tsx)     // Called rarely, when it does it calls “b”.

.b(x,tsx).    // Now called frequently. The length of “tsx” never exceeds 5.

Unknown to the engineers, the programmer who wrote “a” depends on the implementation of “b” to call back “a” for each iteration. Heavens knows why the programmer would be so odd. But “a” will break if it isn’t called for every iteration in “b”. However, “a”’s programmer wrote robust test routines and performed code coverage. The function “a” is almost never used, it misses key quality controls. It’s put into production.

<<System version 3— year 3>>>

A year later system starts working on different data. Now the function “b” is being called with the “tsx” being tens of thousand of elements long. The function “b” fails with a stack overflow. The fastest way to get the bug fix into “b” is to optimize tail recursion.

The body of “b” is put into a new private method “d”. The @tailrec annotation is placed in front of “d”. The function “b” is left public, but now just delegates to “d”.

<<System version 3 — year 4>>

Something goes wrong. The core table in the database is destroyed. The cause of the problem can’t be found. The production staff restore the database and restart the system.

<<System version 3.1 — year 4.1>>

A call to the function “a” is made. Remember it relies on being called back while “b” is being evaluated. It isn’t, for a single event, the system’s internal state is poisoned. This is the same thing that destroyed the database in year 4. After 24 hours the poison hits the core of the system. The database is destroyed again.

An entire day’s production is lost. A forensic team is created to find out what went wrong. It the database is destroyed again the next day. The forensic team can’t find the bug.

The system is now taken down at midnight every day, some PM is done on the database. The database no longer has to be rebuilt every day during peak hours. A very good thing. Unfortunately another operator must be added to the night staff. Additional jobs are needed to perform maintenance each night. The entire batch execution structure must be rewritten. Unfortunately, a job that needs to be nightly no longer fits into the overnight batch cycle. In addition to the additional operator, more computers and disks are bought to make everything complete overnight.

<<System version 3.1 — year 4.5>>

Four months later, the team finds that the fault is in function “a.” When it first went into production in version 2, it worked correctly. That was the last time anyone actually tested the function “b”. That entire subsystem was unchanged after version 2 was released. The function was never reevaluated before changes to the core system (release 3) were put into production. For the first year of version 3, the data remained the same. The function “a” wasn’t called. The data changed at the start of year 4. No system was changed, but the data being fed into the system changed. The bug in function “a” doesn’t immediately destroy the database. It introduces a poisoned data packet that will crash the database in a day or so.

The bug is fixed. The production staff, operations manager, and the manager that has overspent his budget by more than $1 million gather pitchforks and look for the programmer of “a”.

[[End of use case 1]]

Analysis:

The problem occurred when the function “a” was installed into production. A year after it was installed, a performance problem was detected in “b”. The function “b” is rewritten to change from n^2 to nln(n). A year after that, the data changed and “b” was called with a 10,000 long sequence. It’s tail recursion was optimized.

The first time the problem was observed is a year later. Two years and a major version change later. The function was almost never called. When it first it was called, “b” called it with each step. It worked. When version 3 was installed, it stopped working, but the bug was benign. It never caused a problem. It wasn’t until the data changed that it started being called every day. Unfortunately, the bug didn’t call the system to fall over immediately. It poisoned the state of the system in a way that would cause it to fall over at a later time.

Use Case 2:

Use Case 2 is actually Use Case 1. I’ll just give you more information.

We have trait

trait X[T] { def fn(tsx:Seq[T]):Seq[T] /* abstract */ }

Function “b” is created by

class System1[T] extends X[T]

    { def fn(tsx:Seq[T]):Seq[T] = {...}  }

For the class System, the method “fn” is the function “b”. The method “fn” is written using n^2 complexity and uses tail recursion for looping. The method isn’t declared final, so the scala compiler version 2.18 can’t optimize the tail recursion. But that’s fine. The sequences are never longer than 5, it is rarely called. Any kind of optimization is more hassle than not optimizing.

Starting in release 2 another class is created

class SubSystem2[T] extends System1[T] 

   { def fn(tsx:Seq[T]):Seq[T] = { super.fn(tsx) }

SubSystem2’s “fn” is function “a”. Somewhere in its implementation, it calls System’s implementation of “fn”. It calls it in a what that requires itself to be called by the super implementation to call itself once for each element. The call stack looks like

….a

……..b

………...a   // Iteration 0

………...a.  // Iteration 1

………...a

………...a.  // Until n

In version 3, System1’s “fn” (function “b”) first uses a cpu efficient nLogN algorithm. Later it’s tail recursion was optimized to stack efficient loop/not-call tail recursion.

Most callers of “b” use an invokedynamic call. The function “b” calls “d” using a jsr instruction. SubSystem2’s “fn”, function “a”, uses an invokespecial call.

Here lies the problem. The programmer of SubSystem2.fn uses a super.() call. Because the programmer is overriding System1’s fn, and in this case, knows the inner implementation of it. Writing SubSystem2.fn takes advantage of knowing the implementation, not just the contract for “fn”.

The problem didn’t happen until there was a major revision of System1. Even then, it didn’t become catastrophic until the data in the system changed and it started getting called more often.

Lesson:

It’s a very bad idea to rely on the implementation of a method that you call. In Use Case 1, the names of the functions were “a”, “b”, and “d”. In Use Case 2, the functions are called “fn”, “fn”, and “private tailRecOptimizedFn”.

Use Case 1 likely would never happen. The problem happened in Use Case 2 because the author of SubSystem2 overrode “fn” and used “super”. That likely was the reason the author of SS2 felt entitled to rely on the implementation of its superclass.

Conclusion:

Tail recursion stack optimization is little different than an algorithm optimization. One optimizes for call stack depth, the other for cpu cycles. Scala 2.18 isn’t doing anyone any favors by refusing to stack optimize tail recursion. The bug could have happened if regardless of the “final” status of the functions. All functions “a”, “b”, “c” could be static and the same bug would exist. There is nothing special (sic) about non-final methods. Every method can be optimized for cpu or stack by programmers. Having the scala compiler implement a standard optimization technique is a good thing. Not performing the optimization isn’t a bad thing, it just causes runtime overhead. The compiler is free to optimize (or not) stack usage for a final method. The compiler has code that essentially reads “if final then optimize else raise an error.” There is no reason for the conditional logic. The compiler should stack optimize a method’s tail recursion regardless if it is final or not.

tyohDeveloper · July 19, 2019, 10:34pm

Oops, I think I might have used Scala Compiler 2.18 for 2.13. It changes nothing.

tarsa · July 20, 2019, 12:15am

I’m not sure what you wanted to tell by showing that long story, but it seems that you are trying to rationalize breaking semantics of bad code. The reality is that if user code do not rely on undefined behaviour or implementation specific properties then no safe optimization is allowed to change semantics of user’s code. It doesn’t matter if user’s code is badly written, error-prone, slow, brittle, inadequate, causing cancer, plain wrong or whatever. Calling super is well defined in Scala so automatic optimizations must not break code that invokes super just because it invokes super. Scala classes can be extended from Java (you’re talking about non-final methods and subclassing anyway) and Java code has super also - that’s another thing to consider when proposing change in semantics.

tyohDeveloper · July 20, 2019, 3:04am

You misunderstand. The the behavior you outline with calling “b” via a super call is no different than exists anywhere. The method “b” will always break “a” when optimized. The maintainers of “b” will never know they broke “a”.

In the use case, “b” had a clear contract, but hidden attributes. 1) It was only to be used rarely. 2) It was only to be used for short sequences. Over time, first 1 then 2 changed. The owners of the method “b” optimized it by hand twice. The first required them to change the algorithm. The complexity goes from n^2 to n*logN. The second is to optimize the stack usage.

Neither optimization changes the contract for ”b”. The first loosens the constraint of how frequently it can be called. The second optimization eliminates the length of sequence constraint. The contract remains identical. The limitations are removed. The maintainers of “b” are correct that they can optimize “b” at will. Unfortunately, the programmer that codes “a” writes one or two bugs. Both of them depend on the number of times it is called in relationship to the number of calls it makes to “b” via super.

The owners of System1 may not ever know of the existence of SubSystem2. That is the use case. SubSystem2 writes code that depends on the first implementation of “b”. That’s a bug, but it’s not discovered in the testing of System1nor in the testing of SubSystem2. It is an insidious bug that will never be discovered by the maintainers of System1. Because the bug isn’t theirs. In all three versions of “a” the code is correct.

I outlined the use case to the worst outcome possible for the bug. 1) the bug only shows up when the data streams change, 2) some times that the bug triggers no symptoms occur, 3) when a symptom occurs, it occurs well after the error happens.

System1 will not catch the bug when it is updated.

SubSystem2 never goes through code level tests. Period. It never changes, so it does not need to be tested at that level.

When System1 is changed, SubSystem2 will go through an integration change. That test most likely won’t discover the problem with the first optimization.

It isn’t until both systems have been in production for a year the bug will cause a fatality. That happens because of the base bug and the new data pattern flowing through both symptoms.

The compiler’s rule not to stack optimize “b” likely compounds the cost of the bug. The bug would have been caught by the programmer when they wrote “a” in the first place. The authors of “b” made the correct choices in all three versions. Unoptimized, optimized for cycles, optimize for stack. That the combined compound system eventually suffers a fatality isn’t due to anything other than the programmer of “a”.

The test here is

Does the compiler deliberately create code that is more likely cause runtime errors later or does it create code that is not only faster, but less likely to fail?

Ether choice results in correct code.

The bug you fear will happen. Nothing can change that. It will happen because the compiler knows how to optimize the code or the maintainers optimize the code. Preventing the compiler to do the optimization saves nothing. The bug exists because “a” tightly binds to the first implementation of “b”.

The optimizer will never prevent nor cause the bug. Writing a test in the compiler to skip if the method is non-final doesn’t save anyone. It adds complexity to the compiler and generates slower code with execution constraints.

The compiler should generate the same code in the general case as well as the final case. The “final” quality of a method is little different than the “protected” quality. Think it through. I’ve written my complete system. It works.

Because I know that static calls are faster than fully dynamic calls, write a program to place “final” everywhere in the code. I expect that the system will be faster. I expect the system to operate exactly the same. It is like using -optimize. The system will work exactly the same, but just faster.

A caller can break when it’s target is optimized. The bug is the calling code, not in the optimized code. Optimizing tail recursion to be a loop rather than a stack call is just that. An optimization. Something has to be done deliberately/carelessly to make that not true.

Imagine the scala compiler did no optimization. The bug you fear still exists. Imagine I rewrite the first version of “fn” to explicitly use a while loop in place of tail recursion loop. The first “fn” code gets stack optimized tail recursion and the compiler has nothing to do with it. My first unoptimized code and my later, optimized code didn’t change anything. The method “fn” still can be overridden. If the consumer breaks, it’s only because it is too tightly bound to the unoptimized edition.

wrbriggs · July 20, 2019, 3:10am

I feel like you are conflating two different problems. The long example you provided seems to me to be mostly a cautionary tale on why circular dependencies are bad, and why you should strive to avoid writing functions that aren’t referentially transparent (or at least push state changes to the outer layer of your code base).

The fact that this particular use case benefited from a tail-recursive refactor seems almost beside the point; the same problem could just as easily happen with a variety of other refactors.

The biggest problem in your use case isn’t that the function wasn’t tail-call optimized on the stack; the problem is that this example system has tightly coupled implementations that create a circular dependency and an unstable mutable state. It’s already breaking many of the best practices for both OO design (e.g., open/close principle, separation of concerns, single responsibility) and functional programming (e.g., immutable data structures, referential transparency).

There are certainly arguments to be made and discussions to be had regarding Scala’s current restrictions on tail call optimization, but I think I must be missing something, as I don’t find this use case particularly compelling given my current understanding of it.

tyohDeveloper · July 20, 2019, 3:28am

Scala’s existing compiler chooses to optimize only final methods. The stated reason is that it will prevent buggy software from presenting symptoms. The bug will exist if the optimizer is a human or a compiler.

If there is a body of software out their that will break if it’s libraries are optimized, that body is buggy. And it’s the worst kind of buggy that exists.

wrbriggs · July 20, 2019, 4:28am

The stated reason is that it will prevent buggy software from presenting symptoms. The bug will exist if the optimizer is a human or a compiler.

It seems like you’re creating a strawman, and then arguing against it. My understanding is that Scala requires methods be final due to the way that method dispatch and inheritance work in the JVM (i.e., the JVM’s goto instruction is less flexible than true jump instructions - it can only target instructions within the scope of the current function).

There are valid reasons to recurse into an overridden method (e.g. continuation passing). Scala’s insistence on final methods prevents programmers from having to deal with this class of bug, while providing flexibility where necessary.