Scala 2.11 compiler internals

Hello!

While trying to understand bottlenecks in the Scala 2.11 compiler on our monolith, I came across the following observation.

I tried compiling 10,000 Scala files with scalac. All of these files are independent of each other, i.e. each one can be compiled in isolation:

scalac *.scala

This completed in around 38 seconds.

I also noticed that scalac compiles all the files serially (I checked thread dumps, etc.). So I tried compiling two batches of 5,000 files using two separate scalac invocations, hoping the compilation time would roughly halve.

To my surprise, this made things worse: it took a total of 48 seconds to finish compiling all the files.
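For anyone wanting to reproduce the batched experiment, here is a rough Scala driver sketch. It assumes `scalac` is on the `PATH` and that the batches are meant to run concurrently; `BatchCompile`, `split` and `compileInParallel` are hypothetical names. Each batch is started as its own `scalac` process (and therefore its own JVM) via `scala.sys.process`:

```scala
import scala.sys.process._

object BatchCompile {
  // Split `files` into at most `n` batches of roughly equal size, preserving order.
  def split(files: Seq[String], n: Int): List[Seq[String]] = {
    require(n > 0, "n must be positive")
    if (files.isEmpty) Nil
    else {
      val size = math.ceil(files.size.toDouble / n).toInt
      files.grouped(size).toList
    }
  }

  // Start one `scalac` process per batch, all at once, then wait for each to exit.
  // Assumes `scalac` is on the PATH; every process pays its own JVM startup/warmup cost.
  def compileInParallel(batches: Seq[Seq[String]]): List[Int] = {
    val running = batches.toList.map(batch => Process(Seq("scalac") ++ batch).run())
    running.map(_.exitValue())
  }

  def main(args: Array[String]): Unit = {
    val batches = split(args.toSeq, 2)
    val start = System.nanoTime()
    val codes = compileInParallel(batches)
    val elapsedMs = (System.nanoTime() - start) / 1000000
    println(s"exit codes: $codes, elapsed: $elapsedMs ms")
  }
}
```

Wall-clock time measured this way includes JVM startup and warmup for every batch, which matters for the results discussed below.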

So, does scalac compile all the provided files serially, or does it leverage some form of parallelism internally?

There is no parallelism within scalac itself.

(Note though that a commercial product from Triplequote called Hydra adds it.)

The usual way to proceed in a scenario like this is to break the monolith up into independently compilable subprojects. sbt has project-level parallelism, all within a single JVM.
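As a rough sketch, assuming sbt 1.x and a hypothetical split of the monolith into `moduleA`, `moduleB` and `app`, a multi-project `build.sbt` might look like the following. Subprojects with no dependency on each other can be compiled concurrently by sbt, while `dependsOn` makes `app` wait for both:

```scala
// build.sbt (hypothetical layout; module names and directories are illustrative)
ThisBuild / scalaVersion := "2.11.12"

// Independent leaf modules: sbt is free to compile these in parallel.
lazy val moduleA = (project in file("module-a"))
lazy val moduleB = (project in file("module-b"))

// Depends on both leaves, so sbt compiles it only after they finish.
lazy val app = (project in file("app"))
  .dependsOn(moduleA, moduleB)

// Root project: running `compile` here compiles every aggregated subproject.
lazy val root = (project in file("."))
  .aggregate(moduleA, moduleB, app)
```

How much this helps depends on how independent the resulting modules really are; a long chain of `dependsOn` edges leaves little for sbt to parallelize.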

I can’t be completely certain why your batching approach didn’t give a speedup, but my best guess would be that by firing up multiple JVMs, you had to pay JVM startup cost multiple times and also pay HotSpot warmup cost multiple times. To avoid that, you really want everything to happen inside a single, already-warm JVM instance, as in sbt.
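To illustrate the warmup effect within a single JVM, here is a small sketch. The CPU-bound loop in `workload` is a hypothetical stand-in for compiler work, not scalac itself. Timing the same work several times in one process usually shows the early rounds running slower than later ones while HotSpot is still interpreting and JIT-compiling, though the exact numbers vary by JVM and machine:

```scala
object WarmupDemo {
  // Hypothetical stand-in for compiler work: a simple CPU-bound loop.
  def workload(n: Int): Long = {
    var acc = 0L
    var i = 0
    while (i < n) {
      acc += (i.toLong * i) % 7
      i += 1
    }
    acc
  }

  // Run the same workload `rounds` times in this JVM and record milliseconds per round.
  // Results are summed into `sink` so the JIT cannot discard the work as dead code.
  def timeRounds(rounds: Int, n: Int): (Seq[Double], Long) = {
    var sink = 0L
    val times = (1 to rounds).map { _ =>
      val start = System.nanoTime()
      sink += workload(n)
      (System.nanoTime() - start) / 1e6
    }
    (times, sink)
  }

  def main(args: Array[String]): Unit = {
    val (times, sink) = timeRounds(rounds = 5, n = 20000000)
    times.zipWithIndex.foreach { case (ms, i) => println(f"round ${i + 1}: $ms%.1f ms") }
    println(s"(checksum $sink)")
  }
}
```

Starting a fresh JVM for each batch throws this warmed-up state away, which is one plausible contributor to the slower batched run.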

The other thing I’d note here is that the Scala 2.12 compiler is considerably faster than the 2.11 compiler, so if there’s any chance you can move to 2.12 or 2.13, you’ll almost certainly see a nice speedup.
