Scala appears extremely bloated for a first-time user

venkat · June 5, 2021, 3:26pm

I’ve been working with Java since v1.1. Though I have coded in Python recently, Java is my home ground. So I wanted to check out Scala a bit. I followed the instructions to install the Metals extension in VS Code, then created a new project by selecting hello-world.g8.

Then all hell broke loose. It started downloading something called “bloop” (I later learned that it is a build server), even though I already have sbt installed locally. I let it complete whatever it was doing. After it finished setting up my awesome one-liner hello-world project, I took a look at the files it created. I couldn’t believe my eyes. There are 513 items (folders + files) in the root project folder, taking up 1.7MB on disk. I haven’t seen a hello-world project this bloated in any other language.

What’s going on here? Why was the build stuff allowed to become so overly complicated? I knew builds were getting complicated when people moved from Ant to Maven, but I didn’t expect them to get this much worse. I thought a Scala build had just one job to do: do what javac does and produce a class file. Is there a way to keep it that simple?

All my excitement about seeing a bright, modern offshoot of Java is gone now.

-venkat

venkat · June 5, 2021, 3:36pm

Also, by the way, is there any other extension (other than Metals) that does just three things: syntax highlighting, language assistance, and compiling to class files? I thought a VS Code extension should be able to do these things out of the box.

jducoeur · June 5, 2021, 3:44pm

I suspect part of that is that you aren’t noticing the bloat of Java itself. Consider: a Java program isn’t just that one class file – it’s that file plus the gigantic weight of the JRE, all of which is installed elsewhere on your machine.

Scala builds on top of the JVM, but it isn’t just the JVM: it has its own standard library as well. I haven’t measured it, but I suspect the Scala stdlib is a couple of orders of magnitude smaller than Java’s is – but you’re used to ignoring the Java one, so you’re startled by the new stuff.

On the plus side, most of that downloading is a one-off, to set things up on your machine. Additional projects are generally much quicker.

venkat · June 5, 2021, 4:10pm

Thanks for the note. I did optimistically think it must be a “one-off” thing, but the stuff (153 items) is created right inside the project root folder, which means every project will carry that baggage.

Regarding the comparison to Java, the JRE is outside the bloat I’m talking about. I’m only looking at the stuff one has to deal with going from source code to bytecode, and at how the official plugin creates this bloat and forces bloop into the project.

Why can’t it use scalac to start with, and leave the choice of build server to the user?

BalmungSan · June 5, 2021, 4:25pm

You can use sbt as your BSP (Build Server Protocol) server; I do that.

But really, I do not understand what the deal is with the tool downloading something; you can think of bloop as an implementation detail of Metals.

Also, you can choose not to use Metals at all: just install the official syntax-highlighting plugin and do the compilation manually in the shell.

Also, why do you care about the number of files created? You do not need to commit them, and you can delete them whenever you want. Have you checked how many files git creates?

MarkCLewis · June 5, 2021, 4:37pm

You can start with scala and scalac just like you might start with java and javac. I do that with my students when we start with scripts using vim as the editor. However, things get challenging as soon as you decide you want to use anything that isn’t part of the standard library. Over time I have noticed that this approach to doing things has been downplayed and it is more challenging to find the downloads as it has gotten easier to set things up with sbt. I like sbt and Metals as a development environment. There are a number of things it pulls down when you first get started, but I have to say that it still feels a lot lighter than Eclipse, which was always my standard tool for Java development.
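To make the scalac route concrete, here is a minimal single-file sketch (the file and object names are hypothetical). It assumes the standalone Scala 2.13 distribution is installed so that `scalac` and `scala` are on the PATH; as with javac, compiling it produces plain class files and no build metadata.

```scala
// Hello.scala: a hypothetical single-file program that needs no build tool.
// Assumes the standalone Scala 2.13 distribution is on the PATH:
//   scalac Hello.scala   # writes Hello.class and Hello$.class to the current directory
//   scala Hello          # runs the compiled program
object Hello {
  // Kept separate from main so it can be checked without running the program.
  def greeting(name: String): String = s"Hello, $name!"

  // Greets the first command-line argument, or "world" if none is given.
  def main(args: Array[String]): Unit =
    println(greeting(args.headOption.getOrElse("world")))
}
```

This mirrors `javac Hello.java` followed by `java Hello`; the trade-off, as the thread discusses, appears once third-party libraries or many source files enter the picture.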

cbley · June 5, 2021, 5:12pm

You haven’t used Node.js, obviously…

$ find node_modules | wc -l
45944
$ du -hs node_modules
306M	node_modules

Nobody really complains about this. And rightly so: why would anyone care?

As for why it doesn’t just use scalac: a lot of people would then complain about the build being slow. Both sbt and bloop compile incrementally, and scalac cannot do that, AFAIK. We should provide good defaults; the average user really never needs to tinker with the build tooling hidden behind the scenes.

venkat · June 5, 2021, 5:24pm

I did think of Node.js. But the node_modules folder is reused across all projects. I just created another hello-world project with only 4 lines (printing hello-world is moved into a function now). This one created 506 files inside the project folder, and it installed everything afresh. This doesn’t happen with Node.js, Python or Java.

venkat · June 5, 2021, 5:31pm

I agree that dependency resolution requires a mature build tool. However, this doesn’t seem to be related to dependencies being pulled. Nor are these things being downloaded to a common folder such as .m2 (for Maven), node_modules (for Node.js) or Python’s libs. What I was complaining about is per-project bloat, which is bad.

Ichoran · June 5, 2021, 6:07pm

Scala isn’t a good language to develop with on devices that are very constrained in memory or storage space. It also assumes you have a decent internet connection.

If you have a not-terribly-resource-constrained machine with an okay connection, as long as the build tools keep track of the files themselves, completely, and don’t bug you with them, who cares how big they are or how many files there are?

Is it actually causing any problem? When you want to deploy something, you create an assembly or some such; the tools take care of that too.

The build is usually incredibly simple, not complicated. It handles the complications. Compared to Ant or Maven, you have a few lines that specify exactly the critical information you need, and you’re good.

But that’s not enough. You don’t want to compile programs dissociated from everything else, do you? Don’t you want to be able to use the other numerous fantastic libraries that exist? That’s not just “produce a class file”. What about documentation? Don’t you want it to be easy to create? Integrate with an IDE? It has to happen somehow…but that’s not just “produce a class file”. What if you have seven different builds you want to create? What if you have a bunch of internal dependencies in your project?
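For a sense of what “a few lines” can cover, here is an illustrative `build.sbt` sketch of a build with an internal dependency and a third-party library. The module names `core` and `app` are hypothetical, and the version numbers are only examples from around the time of this thread.

```scala
// build.sbt: an illustrative two-module build (module names and versions are assumptions).
ThisBuild / scalaVersion := "2.13.6"

// A library module with a test-only dependency resolved from Maven Central.
lazy val core = (project in file("core"))
  .settings(
    libraryDependencies += "org.scalameta" %% "munit" % "0.7.26" % Test
  )

// The application module depends on `core`, so sbt compiles `core` first.
lazy val app = (project in file("app"))
  .dependsOn(core)
```

With something like this in place, `sbt compile` should build both modules in dependency order and `sbt doc` generates Scaladoc; the munit jar goes into sbt's shared download cache rather than into the project directory.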

You should judge build tools and languages by how easy they make your work, not how many files they produce along the way, in the vast majority of cases.

(Incidentally, judging this way leads me to use mill rather than sbt as the build tool…I find sbt easier for hello world, mill easier for every real project I want, and sbt easier for when needing to do a bunch of complex stuff via plugins that only sbt has…except I don’t need that. But either way, I don’t count the number of files; I count how hard it is to express my build logic in the build tool, and get the program to do what I want it to do in the language.)

mario-galic · June 5, 2021, 7:20pm

➜  sbt new scala/scala-seed.g8
name [Scala Seed Project]: hello-world
➜  cd hello-world
➜  hello-world  ls -l
total 8
-rw-r--r--  1 mario  staff  440  5 Jun 20:13 build.sbt
drwxr-xr-x  7 mario  staff  224  5 Jun 20:14 project
drwxr-xr-x  4 mario  staff  128  5 Jun 20:13 src
drwxr-xr-x  7 mario  staff  224  5 Jun 20:15 target
➜  cd ..
➜  find hello-world -type f | wc -l
     107
➜  du -sh hello-world
412K	hello-world
➜  cd hello-world
➜  hello-world sbt package
➜  hello-world du -sh target/scala-2.13/hello-world_2.13-0.1.0-SNAPSHOT.jar
4.0K	target/scala-2.13/hello-world_2.13-0.1.0-SNAPSHOT.jar
➜  hello-world echo 'addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.15.0")' >> project/plugins.sbt
➜  hello-world sbt assembly
➜  hello-world du -sh target/scala-2.13/hello-world-assembly-0.1.0-SNAPSHOT.jar
5.6M	target/scala-2.13/hello-world-assembly-0.1.0-SNAPSHOT.jar

There are around 107 files, and the size of the whole project is 412K. The fat jar for hello world is 5.6M, most of which is scala-library.jar, whilst the “skinny” one is 4K.

venkat · June 5, 2021, 8:42pm

That “project” folder has a “target” folder inside it with contents similar to those of the top-level “target” folder. Both “target” folders have “streams” folders in them, which account for most of the 314 items in the project you built (after sbt package). How is this project structure (and the streams) meaningful for a hello-world project?

BalmungSan · June 5, 2021, 9:12pm

They aren’t; those are files produced by your build tool. You do not care about them, you do not commit them, you do not keep track of them.

Also, you are mistaken about Python and Node.js: the node_modules folder is per project. For Python, installing dependencies at the system level is always a bad idea; you can break your system. Installing at the user level is useful for tools like the AWS CLI or Jupyter, but Python projects use virtual environments, which are way more bloated than Scala’s target folders.

Finally, Scala does reuse downloaded dependencies between projects (unlike Node.js or Python). And the target folder is not terribly different from Java’s.

Ichoran · June 5, 2021, 9:24pm

I’m not sure that using a “Hello World” project to understand the workings of a high-powered full-featured build tool is the best approach.

If you want to understand the tooling in depth, that’s great, but the goal of the tooling is not to produce the cleanest “Hello, World” possible. So if you go in with this perspective, you won’t have an easy time understanding why things do what they do.

If you’re actually operating in a situation where a few dozen or hundred or thousand extra files is preventing you from accomplishing what you need to, then if you share your constraints we might be able to help (or advise what other languages would better suit your needs). So far I’ve encountered exactly zero cases where the JDK is fine but Scala tooling is too much. But if you have one, let’s hear about it–maybe we can help, or at least save you time in eventually judging that it won’t work for you.

tarsa · June 6, 2021, 9:28am

That’s because a project definition in sbt is itself a Scala project, and that can be nested several levels deep (see the sbt Reference Manual, “Organizing the build”). Where I work, we actually exploit that for automatic dependency resolution (i.e. I wrote some sbt plugins that require recursive project definitions).

It looks like these streams are used for debugging purposes when a build fails (see the sbt Reference Manual, “Tasks”).

Usually the files that are produced from your source code and that will be needed when deploying your application are located in the classes folders under target (e.g. target/scala-2.13/classes for a Scala 2.13 project).

sbt is the counterpart of Maven; the counterpart of javac is scalac. scalac is included in the standalone Scala package. You can download it using, e.g., coursier (see “Single command Scala setup” by Alex Archambault) or by using the link at the bottom of Redirecting…

The (big) disadvantage is that scalac alone won’t help you with dependency management and lots of other things related to building projects. Therefore I recommend sticking with sbt, just as Java programmers should stick with Maven rather than invoking javac themselves.

siddhartha-gadgil · June 6, 2021, 10:16am

By the way, if you actually want to run small programs in Scala, there is an option: Ammonite scripts.
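A minimal sketch of such a script, assuming Ammonite is installed and available as `amm`; the file name `hello.sc` and the `greeting` function are hypothetical. Scripts allow top-level definitions and statements, so no object, `main` method or build definition is needed, and Ammonite can also pull in library dependencies from inside a script via `import $ivy`.

```scala
// hello.sc: a hypothetical Ammonite script; run it with `amm hello.sc`.
// Top-level definitions and statements are allowed in a script.
def greeting(name: String): String = s"Hello, $name!"

println(greeting("world")) // prints: Hello, world!
```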

Being more used to Scala, I find it a little alarming that whether a Python program runs depends on what is installed somewhere else. I suppose this is a bit like static versus dynamic typing: ease at small scales versus managing complexity.

regards,
Siddhartha

martijnhoekstra · June 6, 2021, 12:12pm

If you just want that (compiling to class files the way javac does), you could just run scalac. For a single file, that’s quite OK.

Mind you, I don’t recommend that once you go beyond a single file, or maybe two. People use build tools to make life simpler for themselves. Just using javac breaks down quickly, too, and becomes more trouble than the simplicity is worth. If that’s your preference, however, nothing is stopping you from doing that.

venkat · June 7, 2021, 5:21pm

Many thanks for the sbt docs link. I couldn’t see it explaining why nested target folders are absolutely necessary, though, or what breaks if you cut them down to just one target folder (as in Maven). Any complexity added without a strong reason makes it look bad. I hope there are good reasons which I don’t understand yet.

venkat · June 7, 2021, 5:25pm

You are right about Node.js. Thanks for pointing it out. However, node_modules is clearly a dependency repo (same as the Maven repo). I wouldn’t have complained if the 500+ items in my hello-world project were dependency libraries required by the hello-world code. I don’t think that is the case here.

BalmungSan · June 7, 2021, 9:21pm

I hope I do not sound rude, but I still do not understand what exactly is your concern.

Because it seems you are not really concerned about the size of the files, nor about whether big files (like dependencies) are cached. Only about the number of them; why?

Also, what is your expectation? It is very unlikely that this will change for sbt (for multiple reasons). However, you have other alternatives, like using scalac directly or using other build tools like mill or seed.