Recommended Scala/JVM ecosystem learning resources for Pythonista?

anthony.cros · May 8, 2021, 9:15pm

Author of the Gallia Project here. Funny I landed on this post completely by chance, while searching for a Scala.js-like project for Python (I will set up alerts from now on, to be notified).

Interestingly, the reason I was searching for such a Python-related project is that time and again I find myself impressed with Python's tools (seaborn specifically here). To give some perspective in the context of Gallia, at some point I was considering which visualisation libraries I could offer support for (the way I “support” MongoDB for instance). But the libraries I found in Scala felt over-complicated, or had giant dependency graphs that seemed unwarranted. The Java ones of course felt clunky. And then sure enough, I stumbled upon seaborn: easy to use, to the point, pretty. Now the thing is, Python itself is something I dislike generally speaking, for reasons that should be obvious to users of this forum at least. But I think Scala must find a way to better bridge to the wonderful libraries that exist in the Python world, because their pragmatism is what the Scala ecosystem most crucially lacks in my opinion (as illustrated in Gallia’s goals).

Now regarding how Gallia fits in the picture for @phendric, it would probably be most comparable to Python’s pandas. One major difference however is that Gallia isn’t a “dataframe” library, in the sense that it does not expect the data to be tabular. I’m not a pandas expert, but looking at this SO answer for instance makes me think that pandas might at least feel a bit unnatural when dealing with nested structures. Here’s what I mean: example of handling gene interactions with Gallia, for a given gene (“Genemania” dataset). Gallia also offers a shorthand to help with referencing nested fields as if they were top-level ones: see documentation. Lastly, these two examples show what some pandas processing looks like in Gallia:

example 1 (uses Eurostat census data)
example 2 (uses Football premier league data)
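
To make the tabular-versus-nested point above a bit more concrete, here is a minimal sketch in plain Python (no pandas, no Gallia). The record, its field names and its values are all made up for illustration and are not taken from the Genemania dataset; the `flatten` helper is likewise hypothetical.

```python
# Hypothetical nested record: one gene with a list of interactions.
# All names and values below are invented for illustration.
record = {
    "gene": "GENE_A",
    "interactions": [
        {"partner": "GENE_B", "weight": 0.9},
        {"partner": "GENE_C", "weight": 0.7},
    ],
}


def flatten(rec):
    """Turn one nested record into flat rows, one row per interaction.

    This is roughly the shape a tabular tool wants: every row carries
    its own copy of the parent field ("gene").
    """
    return [
        {"gene": rec["gene"], "partner": i["partner"], "weight": i["weight"]}
        for i in rec["interactions"]
    ]


rows = flatten(record)

# Nested access stays close to the data's natural shape.
partners_nested = [i["partner"] for i in record["interactions"]]

# Flat access has to go through the repeated parent column.
partners_flat = [r["partner"] for r in rows if r["gene"] == "GENE_A"]
```

In the flat form the parent field is repeated on every row, and getting back to the nested shape would take a group-by; that round trip is the kind of friction I have in mind when I say tabular tools can feel unnatural with nested data.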

In terms of other data manipulation libraries on the JVM (excluding big data tools), I’m also aware of these:

Scala: Saddle and Frameless
Java: Tablesaw
Kotlin: Krangl

I have however only played around with them a little, not enough that I could really comment on their merits.

Hope that helps!