Lambda side effect change when `inline` is used - is this intended?

jpsacha · November 21, 2022, 4:50pm

I am trying to add support for Scala 3 to the cfor project (a Scala cfor macro, like a Java for-loop). In Scala 3, the implementation is quite compact and relies on inline. However, for inlined methods the side effects change in some situations.

Here is a simplified implementation of an optimized loop that illustrates the issue:

inline def loop(n: Int)(inline body: Int => Any): Unit =
  var i = 0
  while i < n do
    body(i)
    i += 1

This executes the lambda body n times.

Consider a use case with side effects:

var v = 0
loop(3) {
  v += 100
  x => println(x)
}
println(s"v: $v [should be 100]")

The final value of v differs depending on whether body is inlined or not, that is, on whether the signature of loop above is changed to

inline def loop(n: Int)(body: Int => Any): Unit = ...

Without inline, as expected, the final value of v is 100. With inline the value is 300.

Is this an expected behavior for inline?
If it is expected, is there a way to modify the implementation of loop to maintain the compatibility with old code using side effects like that?
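
For anyone who wants to reproduce this, here is a minimal self-contained sketch of the two variants side by side. The names loopInlineBody, loopPlainBody, inlineBodyResult and plainBodyResult are made up for this illustration and are not part of cfor. If the behavior described above holds, the first result is 300 and the second is 100.

```scala
// Hypothetical helper: body is an inline parameter, so the whole argument
// block is spliced in wherever body is referenced (here: inside the loop).
inline def loopInlineBody(n: Int)(inline body: Int => Any): Unit =
  var i = 0
  while i < n do
    body(i)
    i += 1

// Hypothetical helper: body is a regular parameter of an inline method,
// so the argument block is evaluated once, before the loop starts.
inline def loopPlainBody(n: Int)(body: Int => Any): Unit =
  var i = 0
  while i < n do
    body(i)
    i += 1

def inlineBodyResult(): Int =
  var v = 0
  loopInlineBody(3) {
    v += 100 // repeated on every iteration
    x => ()
  }
  v

def plainBodyResult(): Int =
  var v = 0
  loopPlainBody(3) {
    v += 100 // executed once
    x => ()
  }
  v

@main def compareLoops(): Unit =
  println(s"inline body: ${inlineBodyResult()}, plain body: ${plainBodyResult()}")
```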

javax-swing · November 21, 2022, 10:19pm

I think that semantically this makes sense. It boils down to a strictness problem: you can achieve the same result with a by-name parameter rather than inline.
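
A rough sketch of the by-name comparison, with no inline involved at all. loopByName and byNameResult are hypothetical names used only for this example. Each reference to a by-name parameter re-evaluates the argument block, so the side effect runs once per iteration.

```scala
// Hypothetical helper: body is passed by name, so every reference to it
// re-evaluates the argument block and then applies the resulting function.
def loopByName(n: Int)(body: => Int => Any): Unit =
  var i = 0
  while i < n do
    body(i)
    i += 1

def byNameResult(): Int =
  var v = 0
  loopByName(3) {
    v += 100 // runs on each of the 3 iterations
    x => ()
  }
  v

@main def byNameDemo(): Unit =
  println(s"v: ${byNameResult()}") // prints "v: 300"
```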

I don’t think you can achieve what you want without losing the performance benefits of the inline.

You may be able to use a macro to achieve what you want, but that could be quite a bit of work.

Are you sure that you need it to be inlined? Usually the JIT is pretty good at optimising these things on the fly.

jpsacha · November 21, 2022, 11:13pm

Scala's for is slower than a while loop (or the inlined code), though that may not be important for everybody.

To put some numbers on it, here is a benchmark comparing cfor (inline) to the regular Scala for (foreach) and a while loop. In Scala 3.2.1 (using code from the PR):

Benchmark            (length)  Mode  Cnt     Score    Error  Units
cforSum                  1000  avgt   10   234.728 +-  3.829  ns/op
scalaForeachSum          1000  avgt   10  1096.873 +- 16.572  ns/op
scalaWhileSum            1000  avgt   10   239.102 +- 10.611  ns/op

The cfor code is more than 4 times faster than Scala foreach.

Interestingly, in Scala 2.13.8, Scala foreach was doing better:

Benchmark            (length)  Mode  Cnt     Score     Error  Units
cforSum                  1000  avgt   10   232.725 +-   3.174  ns/op
scalaForeachSum          1000  avgt   10   418.292 +-  11.573  ns/op
scalaWhileSum            1000  avgt   10   231.293 +-   4.036  ns/op

Still, cfor is almost twice as fast as Scala for.

Depending on what your code does, that overhead may be more or less important. When you write computationally intensive code, like image processing, that overhead in Scala for is really visible. Using while is possible, but the code is less clear and more error-prone.

The side effect in question came up in existing cfor tests (some failed). To me, using this type of side effect is strange, but it is legal Scala code.

I just do not know whether that difference in side effects is normal or a compiler bug.

markehammons · November 22, 2022, 6:40am

This seems very much to be expected. When using inline parameters, you’re passing in a chunk of code, and that chunk of code replaces body(i) in your inline method. The chunk in question is the block of code you passed in, which contains a side effect. With non-inline parameters, the side effects in the block are executed and the block is reduced to its result (the last line of the block). With an inline parameter, the block is not reduced to its result, but rather passed in whole, and then replicated everywhere you refer to the inline parameter.

That means you have v += 100 inside the body of your while loop.

If you want the side effect to be executed only once, you must expand body outside of the while loop. That means something like val fn = body outside the while loop, and using fn inside the while loop.
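
A minimal sketch of that suggestion, assuming the simplified loop from the first post. loopOnce and onceResult are hypothetical names. Note that the loop now calls through a function value, so this may give up some of the performance benefit that motivated the inline parameter in the first place.

```scala
// Hypothetical variant of loop: the inline argument is expanded exactly once,
// into a local val, and only the resulting function is used inside the loop.
inline def loopOnce(n: Int)(inline body: Int => Any): Unit =
  val fn: Int => Any = body // side effects in the argument block run here
  var i = 0
  while i < n do
    fn(i)
    i += 1

def onceResult(): Int =
  var v = 0
  loopOnce(3) {
    v += 100 // executed once, when fn is initialized
    x => ()
  }
  v

@main def onceDemo(): Unit =
  println(s"v: ${onceResult()}") // prints "v: 100"
```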

The behavior you’re seeing is not a bug. This behavior of inline parameters makes them very useful, but also very dangerous. You have to take care when programming with them.

som-snytt · November 22, 2022, 7:25am

I have different words to describe what is said in the other replies.

loop(42)(f)

passes an expression f.

loop(42) { f }

also passes an expression, but it is a “block expression”, { f }.

That is, the whole block in braces is the arg to the function.

For by-name or inline, it doesn’t matter what the block expression looks like.

The type of the block is, of course, the type of the result expression at the end.

Where you put the x => delimits or demarcates the result expression. That’s why you don’t have to write

xs.map { x => { f(x) } }

I think I saw a ticket or PR about dropping this syntax. I hope I am mistaken, or I hope the SIP committee nixes it. (SIP apparently stands for “So long as It Parses”. Just kidding in a way that is not entirely untrue.)

If the expected type for the loop parameter is some special type ToDo, where there is a conversion from a function to ToDo, then the conversion is applied as:

{
  v += 100
  ToDo(println(_))
}

where the arg is by-name ToDo(f: => Any => Unit).

Then evaluating the body once produces a ToDo after evaluating the statements you wanted evaluated once.

Then the ToDo can be applied n times.

“I didn’t try this, on my phone.” OP assumes all risk, etc.

som-snytt · November 22, 2022, 8:03am

Sample because I’m trying to keep my feet wet with Scala 3

import scala.language.implicitConversions

object Syntax:
  class ToDo(body: => Int => Unit):
    lazy val g: Int => Unit = body

  given Conversion[Int => Unit, ToDo] with
    def apply(g: Int => Unit): ToDo = new ToDo(g)

  def loop(n: Int)(f: ToDo): Unit =
    var i = 0
    val fg = f.g
    while i < n do
      fg(i)
      i += 1

@main def test() =
  import Syntax.{*, given}
  var bumps = 0
  loop(42) {
    bumps += 1
    println(_)
  }

which looks like

[[syntax trees at end of                     typer]] // looper.scala
package <empty> {
  import scala.language.implicitConversions
  final lazy module val Syntax: Syntax = new Syntax()
  final module class Syntax() extends Object() { this: Syntax.type =>
    class ToDo(body: => (Int => Unit)) extends Object() {
      private[this] val body: => Int => Unit
      lazy val g: Int => Unit = ToDo.this.body
    }
    final lazy module given val given_Conversion_Function_ToDo: Syntax.given_Conversion_Function_ToDo =
      new Syntax.given_Conversion_Function_ToDo()
    final module class given_Conversion_Function_ToDo() extends Conversion[Int => Unit, Syntax.ToDo]() {
      this: Syntax.given_Conversion_Function_ToDo.type =>
      def apply(g: Int => Unit): Syntax.ToDo = new Syntax.ToDo(g)
    }
    def loop(n: Int)(f: Syntax.ToDo): Unit =
      {
        var i: Int = 0
        val fg: Int => Unit = f.g
        while i.<(n) do
          {
            fg.apply(i)
            i = i.+(1)
          }
      }
  }
  final lazy module val looper$package: looper$package = new looper$package()
  final module class looper$package() extends Object() { this: looper$package.type =>
    @main def test(): Unit =
      {
        import Syntax.{*, given}
        var bumps: Int = 0
        Syntax.loop(42)(
          {
            bumps = bumps.+(1)
            Syntax.given_Conversion_Function_ToDo.apply(
              {
                def $anonfun(_$1: Any): Unit = println(_$1)
                closure($anonfun)
              }
            )
          }
        )
      }
  }
  final class test() extends Object() {
    <static> def main(args: Array[String]): Unit =
      try test() catch
        {
          case error @ _:scala.util.CommandLineParser.ParseError => scala.util.CommandLineParser.showError(error)
        }
  }
}

I had already forgotten how to write a Scala 3 import. I know I said somewhere that one shouldn’t have to remember how to write an import, and now I remember why I said that.

There’s probably a mechanism to pass in an arg to the main and _.toInt etc but I also forgot how to do that.

If I can’t remember how to write something in vi, then it’s probably not worth remembering without autotooling.

Alexa is my Copilot is the bumper sticker.

Edit: I forgot what the OP was. The daughter did fix her fish’s tank water, so Leo the fish isn’t likely to die tonight. I joked that the fish’s name is “Six” as in 六, she speaks Chinese not because of me, and may I say sometimes she still can’t help laughing or rather chortling at my jokes, such as Jupiter is so bright, do you know which planet is closest? The one you’re standing on. So when I came back, I verified that adding inlines still “works”. For some definition of “work” left as an exercise for the reader.

jpsacha · November 23, 2022, 2:20am

This approach with ToDo works with the side effect. given Conversion requires a concrete type.

I probably oversimplified the “loop” example. The actual implementation of cfor uses a parametric type for the incremented variable (not Int):

inline def cfor[A](inline init: A)(inline test: A => Boolean, inline next: A => A)(inline body: A => Any): Unit =
  var a = init
  while test(a) do
    body(a)
    a = next(a)

It is not clear whether the ToDo approach could be used with a parametric type, due to given Conversion.
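
For context, here is a small usage sketch of the cfor signature above, with the definition repeated so the snippet stands alone. sumBelow is a hypothetical helper written only for this example.

```scala
// Definition repeated from the post above so that the snippet is self-contained.
inline def cfor[A](inline init: A)(inline test: A => Boolean, inline next: A => A)(inline body: A => Any): Unit =
  var a = init
  while test(a) do
    body(a)
    a = next(a)

// Hypothetical helper: sums 0 until n, in the style of a Java for-loop.
def sumBelow(n: Int): Int =
  var sum = 0
  cfor(0)(_ < n, _ + 1) { i =>
    sum += i
  }
  sum

@main def cforDemo(): Unit =
  println(sumBelow(5)) // prints 10 (0 + 1 + 2 + 3 + 4)
```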

bishabosha · November 23, 2022, 8:58am

For reference, I contributed the Scala 3 version of cfor in Spire, which will only execute side-effects once.

More work can still be done to improve the inlining of side-effecting blocks that return lambdas.

som-snytt · November 23, 2022, 10:22am

Not sure if you’re unclear just about:

given cv[A]: Conversion[A => Unit, ToDo[A]] = ???

I sprinkled type params throughout.
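
Roughly what that could look like, as an untuned sketch rather than the exact code. GenericSyntax, cforToDo and bumpsAndSum are hypothetical names, and the lambda parameter is annotated explicitly on the assumption that its type would not be inferred when the expected type is ToDo[A] rather than a function type.

```scala
import scala.language.implicitConversions

// Hypothetical generic variant of the ToDo sketch.
object GenericSyntax:
  class ToDo[A](body: => A => Unit):
    lazy val g: A => Unit = body

  // One conversion for every element type A.
  given cv[A]: Conversion[A => Unit, ToDo[A]] with
    def apply(g: A => Unit): ToDo[A] = new ToDo(g)

  def cforToDo[A](init: A)(test: A => Boolean, next: A => A)(f: ToDo[A]): Unit =
    val fg = f.g // obtain the wrapped function once, before the loop
    var a = init
    while test(a) do
      fg(a)
      a = next(a)

def bumpsAndSum(): (Int, Int) =
  import GenericSyntax.{*, given}
  var bumps = 0
  var sum = 0
  cforToDo(0)(_ < 4, _ + 1) {
    bumps += 1           // statement part of the block: runs once
    (x: Int) => sum += x // result expression: converted to ToDo[Int]
  }
  (bumps, sum)

@main def genericDemo(): Unit =
  println(bumpsAndSum()) // expected: (1,6)
```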

Now to go study bishabosha’s macro.

jpsacha · November 23, 2022, 9:56pm

Thanks. I did not think about adding a type parameter that way. Unfortunately, benchmarking shows a significant performance drop, from 294 ns/op to 1417 ns/op.
I am going to table this path for now. I will take a look at @bishabosha’s implementation. The cfor project ReadMe says that it was originally inspired by the Spire macro.