I’m one of the authors of scalding and several similar libraries for building distributed computation pipelines. One thing those libraries generally can’t do is real map-fusion of the code:
pipe.map { x => x * 2 }.map { x => x + 2 }
we might like to avoid boxing twice and have that optimized to
pipe.map { x => (x * 2) + 2 }
one way I had thought we could solve this was with dotty’s ability to do staging and hold expressions.
Imagine a toy like:
object PipeExample {
sealed trait Pipe[A] {
def toExpr(using QuoteContext): Expr[List[A]]
}
case class Source[A](toList: List[A]) extends Pipe[A] {
def toExpr(using QuoteContext) = ???
}
case class Mapped[A, B](first: Pipe[A], fn: Expr[A => B]) extends Pipe[B] {
def toExpr(using QuoteContext) = ???
// here we would recursively expand all the functions into a single method so the JVM doesn't see them as megamorphic calls to Function1.apply.
}
extension pipeOps {
inline def [A, B](first: Pipe[A]).map(inline fn: => A => B)(using QuoteContext): Pipe[B] =
Mapped(first, '{fn})
}
inline def compile[A](p: => Pipe[A])(using Toolbox): List[A] =
run(p.toExpr)
}
this won’t compile because you get:
[error] 204 | Mapped(first, '{fn})
[error] | ^^
[error] | access to value fn from wrong staging level:
[error] | - the definition is at level 0,
[error] | - but the access is at level 1.
A similar issue will come up with this pattern without map fusion. If we build up nodes in a dag that will later build Exprs, we generally want to tie the type of the nodes to the type of the expressions they generate, but every time I try to do that, I get the error:
[error] | access to type A from wrong staging level:
[error] | - the definition is at level 0,
[error] | - but the access is at level 1.
[error] |
[error] | The access would be accepted with a given scala.quoted.Type[A]
I bypassed this error by just using Any and then casting to the correct type. I think my casts are sound, but I don’t like having to cast (even if users wouldn’t have to interact with the casts).
My questions:
- can someone show a toy example without casts that achieves the map-fusion optimization (which is just an example not the entirety of what one would want)?
- are there real soundness bugs these limits about level 0 and level 1 errors are complaining about, or is it (as I suspect) ruling out many sound programs in order to also rule out many unsound programs?
- If the above is true, is there a prospect for a more precise static check that could allow us to express this pattern without resorting to casts?
Thanks.