Has a loop/recurs comprehension been considered for scala

tyohDeveloper · July 6, 2019, 10:30pm

Clojure uses a loop/recurs comprehension to make tail recursion explicit. Has something like this been considered for scala? It would be better than the @tailrec that scala currently uses. It could also become a foundation for multifunction tail recursion/entry-exit.

tarsa · July 7, 2019, 11:49am

How such compherension would look like? Scala’s standard library already contains facilities for trampolining: TailCalls It’s somewhat verbose and has some efficiency overhead, but should work for any recursion scheme.

Also @tailrec doesn’t actually do anything except instructing the compiler to throw an error if it can’t turn recursion into an optimized version, i.e. it’s for verification only:
https://www.scala-lang.org/api/current/scala/annotation/tailrec.html

A method annotation which verifies that the method will be compiled with tail call optimization.
If it is present, the compiler will issue an error if the method cannot be optimized into a loop.

Tail recursion is optimized into loops whenever it can be, regardless if @tailrec is present or not.

As a final note - Java guys are working on tail call elimination. The work is part of Project Loom: OpenJDK: Loom Probably it will take them a few years to deliver that, though. They haven’t produced any public prototype of Project Loom yet and their description on wiki page isn’t comforting either:
Main - Main - OpenJDK Wiki

Tail Calls

Design

We envision tail-call elimination that pops one or perhaps even an arbitrary number of stack frames at explicitly marked call-sites. It is not the intention of this project to implement automatic tail-call optimization.

Implementation

The implementation of this feature requires cross-cutting changes to the VM, VM specification (bytecode), and possibly the front-end Java compiler (javac). As a result, in order not to delay the completion of continuations and fibers, we will only begin specifying and implementing this feature only when the project is at a more advanced phase.

tyohDeveloper · July 7, 2019, 1:19pm

Thanks. I didn’t know about TailRec. That’s sounds like it solves the simple case (99%) of multi-function coroutine problem.

I don’t know how recur would/should fit into scala. Recur explicitly calls the identical function. It is a syntax for a zero weight tail loop. Like a “this” call, but for the function. It doesn’t offer any more than @tailrec. It makes it explicit at the call site. It’s less of a special trick that the programmer must know about.

Clojure is a lisp, so the loop/recur comprehension is a natural fit. I’m not sure, it might even be a macro. For comprehensions aren’t a good fit for a lisp. My first thought was “why don’t you just say what you mean?” Lispers think in terms of map/reduce, the for is syntactically jarring.

[I’m not suggesting this lexical solution — it’s ugly.] I was thinking somewhat along the lines of

fn(...):type =  body @tailrec(...)

Using @tailrec function annotation seems error prone, just like Scheme’s compiler requirement for tail recursion. It’s too simple for programmer to incorrectly think they are making a tail call. With scala, the programmer can easily make the same mistake. The annotation will catch it, iff it is used.

I was wondering if something had been considered. It sounds like it has. The annotation being good enough that …control.TailRec was added for the more general case rather than adding more to the language itself.

Thanks.

tyohDeveloper · July 7, 2019, 6:08pm

[tangentially related]
Why doesn’t scala optimize non-final methods? The implied val “this” can’t change, so there is no risk of dispatching to a different function.

tarsa · July 7, 2019, 6:26pm

Because together with recursion you can:

override a non-final method
call super.method from the overriding method

This code:

class Parent {
	def method(n: Int): Unit = {
		println("Entering Parent.method")
		if (n > 0) method(n - 1)
	}
}

class Child extends Parent {
	override def method(n: Int): Unit = {
		println("Entering Child.method")
		super.method(n)
	}
}

object Main extends App {
	new Child().method(3)
}

gives following output:

Entering Child.method
Entering Parent.method
Entering Child.method
Entering Parent.method
Entering Child.method
Entering Parent.method
Entering Child.method
Entering Parent.method

As you see, there’s close cooperation between parent’s and child’s methods. To retain correctness of tail call optimization in such scenario you would probably need some serious gymnastics. What if Parent and Child are in different libraries? What if you swap JARs between e.g. test run and production run? Probably the disadvantages outweigh the advantages.

Also it isn’t painful to just split out an extra final function from a non-final one, i.e. you can turn such code:

class SomeClass {
  def nonFinalMethod(params) = <body>
}

into something like that:

class SomeClass {
  def nonFinalMethod(params) = finalMethod(params)

  final def finalMethod(params) = <body>
}

or that:

class SomeClass {
  def nonFinalMethod(params) = {
    // "go" is a final local method
    def go(params1) = <body>
    go(startingParams)
  }
}

First solution won’t solve the problem with super.method but if you warn against that then you can have both tail recursion optimization and openess for overriding. Second solution is OK as long as you don’t call nonFinalMethod from inside of go local method.

tyohDeveloper · July 7, 2019, 6:46pm

The compiler has a bug. Neither of those are tail recursive. The call a function of the same name, but not the same function. Each method is a different function. Calling a different function is obviously not tail recursive. The programmer is explicitly saying, “Call a method that has the same name, but is different than this function.” The compiler has a bug.

Tail recursion only happens when both are true
1: The method dispatch pattern match val is (this) is eq.
2: There is no code that is executed after the the function call.

The first is not true in either case. The compiler checks this already, it issues an error. The compiler could change it’s code generation to be upwardly compatible. The function call is referentially transparent.

tarsa · July 7, 2019, 7:02pm

Generally in Java and Scala following rules apply:

call to non-final method is virtual
call to final method is non-virtual
in a subclass you can call method implementation from superclass

Therefore even if you call non-final method named X from non-final method named X (of the same class) it doesn’t always mean that the method calls itself. That breaks tail recursion optimization that is based on static rewriting into a loop.

I don’t see a bug anywhere in the compiler. I’ve presented perfectly legal code from both Scala and JVM point of view. It’s just how it works in Java world.

Tail call optimization must not change the semantics so if the optmization has the potential to break it, it won’t be applied.

tyohDeveloper · July 7, 2019, 8:42pm

No, “this” is invariant. That means the method dispatch will always call the same function. The method is virtual, the function is not. Super.f and new C.f are not this. It’s impossible to call the same function if one is not using this. It’s a misunderstanding at best, really it’s a bug.

tarsa · July 7, 2019, 10:39pm

I do not see any function there. In Scala a method is the same as a method in Java. Function is an instance of class Function0, Function1, … and so on.

this is implied if no instance is provided, i.e. the code from post above is directly completed with explicit this. by the compiler and then looks like this:

class Parent {
	def method(n: Int): Unit = {
		println("Entering Parent.method")
		if (n > 0) this.method(n - 1) // explicit "this." !!!
	}
}

class Child extends Parent {
	override def method(n: Int): Unit = {
		println("Entering Child.method")
		super.method(n)
	}
}

object Main extends App {
	new Child().method(3)
}

Above code will still produce exactly the same output (and will look exactly the same in Java bytecode) as the previous version:

Entering Child.method
Entering Parent.method
Entering Child.method
Entering Parent.method
Entering Child.method
Entering Parent.method
Entering Child.method
Entering Parent.method

this.method is always virtual unless method is final. It doesn’t matter if this.method is called inside method or not and it shouldn’t matter. If this.method works differently depending whether it’s inside of method or not then that breaks refactoring. Nobody wants broken refactoring.

tyohDeveloper · July 7, 2019, 10:52pm

The function is the concrete method. Not the abstract method.

Functions are reverentially transparent (if no side effects). Methods are not referentially transparent. They should be schematically transparent, else the programmer/designer is violating the principles of OO. But they should never be the same. If they are, someone is just duplicating the code. Violating general programming principles.

tarsa · July 7, 2019, 11:01pm

Well, you use some foreign terminology. In Scala a function is really an instance of a FunctionXXX class, e.g: Function2

A function of 2 parameters.

In the following example, the definition of max is a shorthand for the anonymous class definition anonfun2:
object Main extends App {
   val max = (x: Int, y: Int) => if (x < y) y else x

   val anonfun2 = new Function2[Int, Int, Int] {
     def apply(x: Int, y: Int): Int = if (x < y) y else x
   }
   assert(max(0, 1) == anonfun2(0, 1))
}

What is not referentially transparent about methods?

Could you give examples of that violations? Preferably in Scala.

tyohDeveloper · July 8, 2019, 12:29am

package acme
import scala.collection.immutable.IndexedSeq

// Violates contract. Returns a sequence of length 0.
// Map must return a sequence the same length as itself.
class Foo extends IndexedSeq[String] {
  override def map( f: String=>String ): IndexSeq[String] =
    IndexedSeq[String]()
}

Why is a violation? Imagine using zip to create pairs that can be used to create a map. The map will be empty. The consumer will be very surprised. E.g.

def consumerMethod( sx: Seq[String] ):Map[(String,String] = sx.map(???).zip(sx).toMap

The consumer has no idea that sx is a Foo. It requires a map to function correctly. The returned map is zero length. The contract has been violated.

tarsa · July 8, 2019, 12:46am

Well, you can do analogous violation with functions. Suppose you have:

trait Filterable[A] {
  val filterFunction: (A => Boolean) => Filterable[A] =
    <some concrete valid implementation>
}

then you do:

class Foo extends Filterable[String] {
  override val filterFunction =
    _ => Filterable.empty // this is not a valid implementation
}

You’ve violated contract using functions. How violating contracts using methods is different from violating contracts using functions and why it matters? Are we still talking in context of tail recursion optimization?

In Scala functions are built using methods, not the other way around. Methods are low level building blocks, functions are ordinary classes with some syntactic sugar. Tail recursion optimization works on methods, not functions.

tyohDeveloper · July 9, 2019, 1:15am

There is no difference between an individual method and a function. Using val or def to declare and implement the function makes no difference.

With OO (and several other paradigms) the declaration of a non-final/non-override method defines a multi-method. A method that can be implemented by any subtype. Each individual def of a method is a function. Each have an identical name, parameters, and return contract. They don’t have an identical implementation. The consumer depends on the method the contract. In the case above, Foo should have thrown a runtime exception rather than return a null array.

The consumer can create an iteration, call length, do many things. Map is the only thing that violates the contract. In the case I outlined, the consumer can use an iteration to zip the origin and the results of applying a function. If they know all of the possible combinations, they won’t have to use a getOrElse. And that’s the best case. Foo could end up poisoning the entire state of the running system. By the time the system or dozens of systems goes belly up, they are so far from the offending code that it might take weeks to track it down. Longer if it poisons a persistent information storage. The fault may not be found for days to years.

Function is each individual implementation. Method is each contract. Supertypes do not have to implement the method. It can be abstract. In these cases, the concrete subtypes must override and implement the method. There is absolutely no reason for a super type to declare an abstract method. Except to define a contract that all concrete subtypes implement.

I’m not certain this is the proper scala terminology. In scala books, most absolute references to functions mean “val …: x=>y”. More often they speak of functions as an abstract concept. In v3.0 of scala, there is no difference between a concrete definition of a function and a concrete definition of a method. Def vs. val is only syntactic. Other than the implicit parameter/variable this. Even that last bit is sketchy. Val functions have access to their dynamic, lexical scope.

The way I mean it is a function has a concrete implementation. Methods are either a contract declaration and/or a concrete function that abides by the contract.

Tail recursion is an implementation detail. The consumer has no knowledge of its use. @tailrec isn’t part of the contract, it’s part of the black box of the implementation. The tail-rec might use tail directly, create and use an iterator, a loop, or a stack call. The compiler works on concrete functions/methods. It knows if a method uses tail recursion or not. The vm also can reason that a method is tail recursive. Both will know that a calls that dispatch on “this” is the only thing that can possibly tail recursive. Calling another object’s method will always be different. By definition of “this” is a ref to another object. Or in the case of “super” a different type.

Look to lambda calculus. If a (math) function means one thing on page 1 and something else on page 2, no proof would be possible. The contract is the only thing that gives the function meaning. While math rarely uses concrete numbers*, the function “add1” is extremely important. It provides the same operation on the groups Natural, Integer, Rational, Irrational. Using it we can prove that all three groups are of the same order of infinity (aleph nought), counting the number of each group results in a different order of infinity (aleph one). It’s critical that the function named “add1” performs the equivalent operation in each group. Add1 is a constructor in Natural and Integer and it isn’t a constructor for the other two groups, but it performs the same in each group. So the named function follows the same contract in each of the four groups, but it isn’t identical in each group.

Violating the contract of the “add1” method by using incomparable functions in a program won’t destroy the universe, probably won’t prevent you from getting a PhD. It might cause your company to declare chapter-7.

Concrete numbers are the domain of scientists and engineers. Mathematicians quickly all ability to to arithmetic starting their junior year.

tarsa · July 13, 2019, 9:02pm

There is and it’s substantial: in Scala, Java, Kotlin, C# etc a function is an object and can be passed around while a method isn’t. Method has to be lifted to a function to be able to be passed around. Also you can’t invoke a virtual method like a non-virtual method, i.e. if a method is virtual then you can’t even invoke a particular implementation - what you invoke always depends on the class of receiver (the only exception is calling super).

JVM doesn’t have multimethods. .NET also doesn’t have them. Same goes for C++. Multimethods are in Common Lisp though. Multimethods as written on Multiple dispatch - Wikipedia is a mechanism of dynamic overloading. Overloading in Java, Scala, C#, C++ etc is static, so you can’t directly have multimethods in such environments. You can emulate them though, with IMO a rather big overhead.

Nope. I haven’t seen anywhere that Scala 3 fundamentally changes what a function is. scastie unfortunately doesn’t work so I’ll not show any proof, but then you could prove that a function changed its mechanics. The only new things about function that I know is that the type system is improved to the point where functions can be generic and have internally path dependent types. That was previously reserved for methods only. In Scala 3 you can losslessly lift methods to functions in more (or all?) situations.

anything(arguments) is always expanded by the Scala compiler to anything.apply(arguments) so it’s clearly visible that it’s a method invocation of an object.

Update:
scastie started working. Scastie - An interactive playground for Scala. shows that functions in Dotty are still instances of FunctionXXX subclasses:

  val function = (a: Int, b: String) => a + b.length
  println(function.isInstanceOf[Function2[_, _, _]])

prints true

@tailrec is just an annotation meant to verify whether a compiler did TCO. It doesn’t enable or force TCO.

this is a reference like any other with the exception that it’s implicitly passed to instance method and it’s implicitly available in instance methods. this.method(args) is no different than notThis.method(args) to the compiler. If method is a virtual method (in Java, Scala, Kotlin, etc every overridable method is virtual) then the type of this or nonThis is only used to verify that method exists on object. The invoked implementation of method depends on the type of object pointed to by this or nonThis.
egXa3w - Online Scala Compiler & Debugging Tool - Ideone.com

object Main extends App {
	class Parent {
		def entry(): Unit =
			// this is a virtual dispatch, so it can invoke different
			// implementation of "printer" method than println("I'm Parent")
			// `this` reference is of type Parent, but real type can be Child
			this.printer() 
 
		def printer(): Unit =
			println("I'm Parent")
	}
 
	class Child extends Parent {
		override def printer(): Unit =
			println("I'm Child")
	}
 
	new Child().entry() // prints "I'm Child"
}

Going back to the Scala example I’ve given above, i.e.:

class Parent {
	def method(n: Int): Unit = {
		println("Entering Parent.method")
		if (n > 0) this.method(n - 1) // explicit "this." !!!
	}
}

class Child extends Parent {
	override def method(n: Int): Unit = {
		println("Entering Child.method")
		super.method(n)
	}
}

object Main extends App {
	new Child().method(3)
}

the output is the same in Java ( WMOzRD - Online Java Compiler & Debugging Tool - Ideone.com ):

import java.util.*;
import java.lang.*;
import java.io.*;
 
class Parent {
	void method(int n) {
		System.out.println("Entering Parent.method");
		if (n > 0) this.method(n - 1); // explicit "this." !!!
	}
}
 
class Child extends Parent {
	@Override void method(int n) {
		System.out.println("Entering Child.method");
		super.method(n);
	}
}
 
public class Main {
	public static void main (String[] args) throws java.lang.Exception {
		new Child().method(3);
	}
}

in Python ( Y7ekft - Online Python3 Interpreter & Debugging Tool - Ideone.com ):

class Parent:
	def method(self, n):
		print("Entering Parent.method")
		if n > 0:
			self.method(n - 1)
 
class Child(Parent):
	def method(self, n):
		print("Entering Child.method")
		super().method(n)
 
Child().method(3)

in JavaScript ( JS Bin - Collaborative JavaScript Debugging ):

"use strict";

class Parent {
  method(n) {
    console.log("Entering Parent.method")
    if (n > 0) this.method(n - 1)
  }
}

class Child extends Parent {
  method(n) {
    console.log("Entering Child.method")
    super.method(n)
  }
}

new Child().method(3)

They all print following output:

Entering Child.method
Entering Parent.method
Entering Child.method
Entering Parent.method
Entering Child.method
Entering Parent.method
Entering Child.method
Entering Parent.method

which prove that in all of them this.method(args) is virtual and doesn’t depend on type of this reference. Scala language thus doesn’t differ in that regard from most of other OOP languages. Notable exceptions are C++ and C# which have non-virtual instance methods which have static dispatch, but if you use virtual keyword in C++ or C# then you get the same behaviour as in Java, Scala, JavaScript, Python, etc Therefore either all mentioned languages are buggy or they aren’t. What behaviour would you want from them and why?

Remember that with virtual methods the reference type is only used to verify that invocation is legal. Actual implementation of a virtual method that is invoked depends on real object type, not the reference type.

tyohDeveloper · July 14, 2019, 12:00am

Functions are always implementations. If a function is referentially transparent, it is referentially transparent. In other words, it doesn’t matter how it is consumed. Methods are multi-methods. They only are allowed a single implementation type pattern to match. This is a val. It can never change. QED, it will always dispatch to the same function, never to a different to different function (never a different method).

An implementation of a method is a function. The method contract itself is not a function. Eta selects a specific function. That function is unknown to the consumer of the method/eta. Again, it doesn’t matter to the consumer. Tail recursion isn’t part of the contract. It is part of the implementation/function. In every case, dispatching on this will execute the exact same function. The function doesn’t even need to be referentially transparent. The only difference between a tail-recursive function and generally recursive function is the need to use a method dispatch. Final functions guarantee the same function will be called. So does a dispatch on “this.”

The annotation is no different than using -depreciated in the compiler. It doesn’t change anything other than what the compiler writes to the console. It doesn’t even suggest to the compiler that it should attempt to prove it is recursive. Which is a trivial task. It already proves that it only understands a fraction of the problem.

The use of macros or partial functions can not be determined, but they are not functions. They are partial functions or compile time constructed artifacts. Actually it is possible to determine if a macro creates a referentially transparent function. In general, they are never used to create one.

tarsa · July 14, 2019, 7:08am

Ouch. If you really want to have your own definitions of functions or multi-methods, that’s fine. Since we were talking about that:

I think I’ve proved well enough that you can’t statically know what will this.method(args) invoke even when you’re inside method. I’ve shown examples in Scala, Java, JavaScript and Python and if you want I can show examples with virtual methods in C# and C++. It will work the same.

And as a bonus - how would you describe that in your terminology?

class Parent extends (Int => Unit) {
	def apply(n: Int): Unit = {
		println("Entering Parent.apply")
		if (n > 0) this(n - 1)
	}
}

class Child extends Parent {
	override def apply(n: Int): Unit = {
		println("Entering Child.apply")
		super.apply(n)
	}
}

object Main {
  def main(args: Array[String]): Unit =
  	functionInvoker(new Child)
  
  def functionInvoker(function: Int => Unit): Unit =
  	function(3)
}

It works the same in Scala 2.13.0 ( Scastie - An interactive playground for Scala. ) and Dotty 0.17 ( Scastie - An interactive playground for Scala. ) giving following output:

Entering Child.apply
Entering Parent.apply
Entering Child.apply
Entering Parent.apply
Entering Child.apply
Entering Parent.apply
Entering Child.apply
Entering Parent.apply

tyohDeveloper · July 14, 2019, 4:39pm

I will start at the beginning.

Tail recursion is a do/while or for loop. It is not a call. The entire point is to not use the stack.

Compilers that make a method call always have code that pushes the parameters onto the stack. Including the constant variable this. The compiler knows the address/reference-to of every variable. Constant variables or mutable variables. In the general case (not Unit), the method pushes the return value onto the stack. That is is where the calling code expects to find the return value after the static or dynamic opcodes expects the return value. The opcodes immediately after the code pops the result of the call. Either to squirrel it away or to ensure the top of the stack has the address to return to.

In recursive function the compiler must have the address to the opcode after the called function returns*. That is either the address of the method preamble or the opcode directly after the preamble. It also knows the address of the next opcode. It also has the address of the first opcode in the code for the method,. A tail recursive call is a jump relative to the executing code.

In the cases of jsr and invoke* the compiler must know the class reference to specify. If that class is the same class as the executing class, generating an invoke* is not required. This is also true for any inherited or interface method. For the case of invokevirtual the class and method references (C,A*)T. If that reference resolved to the currently executing method, it will resolve the same method during the lookup. The compiler knows which class/interface was specified by the caller. If that is the same data it would produce, the lookup must resolve to the executing code. Invoke virtual and special require a tad more logic that a final or static method call, but the compiler will generate the same (C,A*)T as the caller.

QED: The compiler has the knowledge of everything required to generate the tail-recursion. This is true for a static method and dynamic method calls. It also has all of the information to pull the returned value off of the stack and generate the opcodes after the call. The code the compiler produces for tail-recursion is exactly the same code generated by a for loop or a while/do loop.

The preamble is where the parameters are popped off of the stack. A tail recursive call can push the loop variables onto the stack. In which case the first few opcodes pop the opcodes and squirrels them away. The compiler may chose to leave a parameter on the stack. The variable this is the most common parameter to leave on the stack. The return value of a call (static or dynamic) is the top of the stack. The compiler must also know the stack of both a void/Unit call and a call with a return value (tail, static, dynamic calls).

[Tail recursion is functionally identical to inclining the function call. Once the Hotspot compiler knows enough to implement tail recursion, it also has the information required to inline other calls, including dynamic dispatch. The logical address of both static and dynamic calls changes depending on when the class is loaded. If the Hotspot compiler can infer tail recursion, it can infer inline calls. The only reason it doesn’t implement tail recursion already is that it doesn’t have as much information as the source code compiler. ]

tarsa · July 14, 2019, 5:03pm

First, we must disambiguate between compilation from source code to bytecode and JIT compilation from bytecode to native code. scalac only does compilation to bytecode, compilation to native code is done by JIT. scalac doesn’t know whether a non-final method in a non-final class is going to be overridden because it doesn’t know the final classpath. JIT OTOH knows the full classpath at a given moment and can do class hierarchy analysis determining e.g. which methods are effectively final (classpath can also dynamically change as classes can be loaded and unloaded, turning effectively final methods to overridable methods and vice versa, but HotSpot knows how to deal with that). JVM (HotSpot) currently doesn’t do any tail call optimization for security reasons (some security mechanisms rely on predictable stacktrace).

I’m not sure what you want to change. scalac or JVM? Change language semantics or add new keyword? Implement a linker for Scala which will do static (thus inferior) class hierarchy analysis?

Show how your optimization should work, i.e. show:

unoptimized Scala code before optimization
optimized (pseudo) Scala code after optimization
what optimization exactly does
exact conditions under which optimization can be applied
also prove that optimization doesn’t change semantics

tyohDeveloper · July 14, 2019, 5:10pm

Eta expansion converts a {class, interface} def to a function.

trait Bar { def m(p:String):Unit }
class Foo extends Bar {
  val f: (p:String)=>Unit = println(“function”)
  def m(s:String): Unit = println(“function”)
  val fn = m _
}

I don’t know exactly. The code the scala compiler produces. The actual code generated should be identical. Though that is not a requirement. A concrete definition of a method is a function. The abstract definition of the method (contract) defines a multimethod. In clojure, the macros

(defmulti f Class)
(defmethod f String [s] (println s))

Is a multi-method (abstract) declaration. In McCarthy lisp it is the same as make-specializable. The method declaration defines a concrete function that is specialized on the parameter “s”. This is equivalent to the scala code above. With the exception that “this” is an implicit parameter that is never referenced.

The macro “defmethod” creates a function. In scala the “def” in the class Foo is a function. The abstract declaration “def” in the trait Bar defines a multimethod. The scala team may use different names. But the concepts are identical.