Parallel processing: Java vs. Scala

As an aero engineer who uses Scala, I am wondering how Java and Scala compare in terms of parallel processing safety and simplicity. If Scala is superior, it would help me advocate for it in my organization.

I use immutable case classes and parallel Vectors extensively. They are very convenient for parallel processing since all you have to do is add “.par” to the iteration to safely make it parallel.
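For readers who have not seen this pattern, here is a minimal sketch of what it might look like. The `Sample` case class and `kineticEnergyPerKg` are hypothetical names made up for illustration. The sketch assumes Scala 2.13 or later with the separate scala-parallel-collections module on the classpath; on Scala 2.12 and earlier `.par` is built in and the import should be dropped.

```scala
// Assumption: Scala 2.13+ with the scala-parallel-collections module available.
// On Scala 2.12 and earlier, `.par` needs no import, so remove this line.
import scala.collection.parallel.CollectionConverters._

object ParSketch {
  // Hypothetical immutable record, for illustration only.
  final case class Sample(altitudeM: Double, speedMps: Double)

  // Pure function: reads only its argument and mutates nothing.
  def kineticEnergyPerKg(s: Sample): Double = 0.5 * s.speedMps * s.speedMps

  // Sequential version.
  def totalSequential(samples: Vector[Sample]): Double =
    samples.map(kineticEnergyPerKg).sum

  // Parallel version: the only change is the added `.par`.
  def totalParallel(samples: Vector[Sample]): Double =
    samples.par.map(kineticEnergyPerKg).sum
}
```

Because a parallel sum may combine partial results in a different order, floating-point totals can differ from the sequential ones in the last few digits.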

I saw the following Coursera course on parallel programming in Java:

How much of that material is just a harder way to do parallelism than using parallel Vectors in Scala? Thanks.

Java lags Scala in parallel processing efficacy, but Scala is dragging Java along, as the best features of Scala tend to get adopted into Java eventually.

I was also able to transform a program from sequential to parallel just by the strategic addition of “.par” in a line of code. On a 4-core, 8-thread system, the performance of the program increased by about 40% in a test case using real data. And yes, immutable collections and good functional design make .par work even better.

Scala’s Future is more straightforward than Java’s Future, where you have to use CompletableFuture to get the same effect.

Akka can really help, but now we are talking more about concurrent computing than parallel computing. Akka has both Java and Scala APIs, but the Scala APIs are more powerful and streamlined. Also, Scala has ‘for comprehensions’ which can express a lot of things better.

Back to parallel computing, compare the Scala collection classes with the Java ones.

fwiw,

3+ years ago Rex Kerr benchmarked Java 8 parallel streams vs. Scala’s parallel collections:

https://groups.google.com/d/msg/scala-internals/cCH3YbS3UUI/BFNypJqub2MJ

Hopefully things have improved since, but I don’t know.

From a syntax perspective, Scala is, in my biased opinion, superior.

I appreciate the replies, but I am a bit disappointed in the lack of enthusiasm about the superiority of Scala over Java for parallel programming.

Also, not being a Java programmer, I am still wondering how far the Scala parallel Vector class can take me compared to all that Java stuff in that Coursera class.

I’m guessing that most of the complexity of Java parallelism is due to the fact that most Java classes and collections are mutable, forcing the developer to use locks and other defensive measures to prevent race conditions. Am I wrong about that?

That’s a bit of a Scala community thing. You will be hard-pressed to find anyone in the Scala community who claims Scala is the best programming language ever for any purpose.

Scala itself doesn’t have anything to say about parallelism or concurrency, so it’s rather hard to be enthusiastic about that.

As for libraries that provide parallelism on the JVM, Scala parallel collections are really nice to work with IMO. For large-scale parallelism spanning multiple machines, Spark is really neat.

Whether you consume those libraries from Scala, from Java, or from any other language that can use JVM libraries doesn’t really matter much.

In that sense, Scala isn’t better at parallelism at all.

Yeah, I think @martijnhoekstra has the right of it here. The thing is, Scala is just a language; parallelism is really a library-level function. So it’s a somewhat apples-and-oranges question.

That said, using libraries is often more pleasant from Scala than Java, just in terms of how much boilerplate you have to wrestle with. For example, Akka (mentioned upthread a bit tangentially) works fine from both languages – but it’s considerably more pleasant to work with in Scala. And to be fair, Scala’s standard-library versions of some concepts like Future are often a bit easier to work with than Java’s.

But broadly speaking, you should be evaluating libraries here, not languages per se. The Scala ecosystem has a lot to offer in that regard (including a lot of great stuff on the functional-programming side), but it’s not world-shaking differences, just well-thought-out, relatively usable libraries…

I guess I’ll take a minute to throw in some of my opinions here. The TL;DR version is that I find Scala to be a much better language for programming parallelism than Java. That isn’t because it has better libraries, though. It is because it is a more powerful language that stresses a functional style.

The bottom line is that anything you do in Scala in terms of parallelism could be done in Java. Every library supporting parallelism in Scala is either also available in Java, like Akka, or has equivalents in Java, like parallel collections and composable Futures. The way you interact with those libraries is different though. I’ll start with a simple example using a parallel collection. Here’s the Scala code.

val mappedSum = values.par.map(f).sum

How this translates to Java varies a bit depending on the type of values, whether it is an array or a List and issues with primitives, but it will be a fair bit longer. More importantly, using the parallel stream is only safe if the function f is a pure function. Pure functions are much more the norm in Scala than they are in Java. As such, it is much more likely that you will be able to take existing functions in Scala and use them in parallel than you will be able to in Java.
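To make the purity point concrete, here is a small sketch that contrasts a task body that mutates shared state with one that only returns values. It uses standard-library Futures rather than parallel collections so that it needs no extra module; `f` is a placeholder for whatever function is being applied, and both method names are hypothetical.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object PuritySketch {
  // Placeholder pure function: the result depends only on the argument.
  def f(x: Int): Int = x * x

  // Unsafe pattern: every task performs an unsynchronized read-modify-write
  // on the shared `total`, so updates can be lost when tasks overlap.
  def sumWithSharedVar(values: Vector[Int]): Int = {
    var total = 0
    val tasks = values.map(v => Future { total += f(v) })
    Await.result(Future.sequence(tasks), 10.seconds)
    total // may be lower than the true sum
  }

  // Safe pattern: tasks only return values; combining happens afterwards.
  def sumPure(values: Vector[Int]): Int = {
    val tasks = values.map(v => Future(f(v)))
    Await.result(Future.sequence(tasks), 10.seconds).sum
  }
}
```

The first version compiles without complaint in Scala too, so the language does not prevent the mistake. The argument is only that idiomatic Scala code tends to look like the second version.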

I feel that Futures provide an even better example, because they illustrate the power of the Scala syntax to make DSLs, something that Java clearly lacks.

val futures = for(v <- values) yield Future { f(v) }
val resultF = Future.sequence(futures).map(_.sum)

This code does the same thing as the parallel collection, but it does it in a way that could be used to do far more interesting things. Because it does the same thing, the issue with the nature of f remains, but what stands out to me in this example is the ease of expressing the creation and use of Futures. I know that there are people on these boards who dislike the pass-by-name semantics in Scala, but I think they are wonderful for the creation of DSL features like Future. This code would make a lot of people think that Future is a language construct because of the Future { ... } syntax, but as has been pointed out in this thread, no parallel constructs are built into the Scala language. It is just that the language is flexible enough to make libraries that look like language constructs. To my mind, this isn’t just a convenience, it is something that boosts productivity.
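As a rough illustration of the mechanism being described, the sketch below defines a hypothetical `timed { ... }` helper with a by-name parameter and uses it around the Future code from above. `timed` is not part of any library; it is only there to show that a block-taking "construct" is an ordinary method. `f` is again a placeholder, and the 10-second timeout is an arbitrary choice.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object ByNameSketch {
  // Hypothetical helper: `body` is a by-name parameter, so the block passed
  // at the call site is evaluated here, inside `timed`, not before the call.
  def timed[A](label: String)(body: => A): A = {
    val start = System.nanoTime()
    val result = body
    val elapsedMs = (System.nanoTime() - start) / 1e6
    println(s"$label took $elapsedMs ms")
    result
  }

  // Placeholder for the function being applied.
  def f(x: Int): Int = x * x

  // `timed("...") { ... }` and `Future { ... }` both read like built-in
  // syntax, but each is a plain method call taking a by-name argument.
  def sumOfSquares(values: Vector[Int]): Int = timed("sum of squares") {
    val futures = for (v <- values) yield Future { f(v) }
    Await.result(Future.sequence(futures).map(_.sum), 10.seconds)
  }
}
```

Blocking with `Await.result` is done here only to get a plain value out for demonstration; real code would more likely keep composing the Future.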

To see this, just take the two examples above and convert them to Java using parallel streams and completable futures. For fun, assume that values starts its life as an array, then some refactoring changes it to a List and see how the code has to be modified. Also think about your code base and how often elements in the Java code have side effects, especially mutation, and what it would take to get rid of that to make it so that it was safer to reuse existing code in parallel contexts. If the code were written in Scala, odds are good that the code style would be largely functional with a heavy use of val and immutable collections. Most of your functions would be pure functions, and would, therefore, be safe to use in a parallel context.

The thing about parallelism is that it is really a project-wide concern and it goes way beyond just performance issues. So if you have a team of highly qualified Java developers who are always diligent and don’t mind writing more verbose code, then they can probably do just as well in Java as in Scala. However, standard Scala style will produce code that is both shorter and less error-prone, so I expect most real teams will get more done using Scala.

Just two cents on this topic: Scala’s for comprehensions are part of what makes futures easier to use in Scala than in Java… and that is a language feature, not a library feature.
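A small sketch of what that looks like, next to the method calls the for comprehension stands for. `loadA` and `loadB` are hypothetical stand-ins for two independent computations.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object ForComprehensionSketch {
  // Hypothetical stand-ins for two independent computations.
  def loadA(): Int = 20
  def loadB(): Int = 22

  def combined(): Future[Int] = {
    // Start both futures before the for comprehension so they can run
    // concurrently; creating them inside it would run them one after another.
    val fa = Future(loadA())
    val fb = Future(loadB())
    for {
      a <- fa
      b <- fb
    } yield a + b
  }

  // The same logic written with flatMap and map directly.
  def combinedDesugared(): Future[Int] = {
    val fa = Future(loadA())
    val fb = Future(loadB())
    fa.flatMap(a => fb.map(b => a + b))
  }
}
```

With two futures the difference is modest, but the nesting in the second form grows with every additional future, while the for comprehension stays flat.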

Thanks again for the replies. They have greatly helped my understanding. Now let me throw another related question out there. I’ve been hearing lately about GPUs for massively parallel computing. Can current Scala easily make use of these by using “.par” just like it can make use of multiple cores on a “conventional” CPU?

Unfortunately, no.

Maybe stuff like that will come to the JVM: http://openjdk.java.net/projects/sumatra/ - don’t hold your breath though.

Aparapi tried to do it with a library approach. AFAIK, that initiative is dead.

I don’t expect anything like this to exist in Scala anytime soon, though who knows what those guys over at EPFL will think of next. The Dotty metaprogramming features, as they are being experimented with now, might make something like this possible as a research goal in the future.

The Compute.scala library (https://github.com/ThoughtWorksInc/Compute.scala) that was just announced on one of these boards might be worth looking into. It doesn’t give you GPU with just .par, but it is aimed at bringing OpenCL computing access to Scala projects using a Scala style.
