Scala appears extremely bloated for a first-time user

Interesting, hadn’t heard of seed before.


Thanks. I have dropped sbt and switched over to Maven with the scala-maven-plugin. Everything looks clean now: all dependency jars live in the shared repo, and only the project-specific stuff is in the project folder (18 items vs 900+ items). “mvn package” delivers the jar, which runs fine with spark-submit. Not sure what I’m missing by not having sbt here.

sbt also saves dependencies to a shared repo.

For a Spark project you probably won’t be missing anything; Maven is arguably even the recommended choice there.

Off-topic: once your Spark jobs become more complex and you start to need dependencies, you have four options:

  • Install those jars on your cluster. This is usually a bad idea unless you have a cluster per job (or per set of related jobs).
  • Pass the jars when submitting the job.
  • Create an uber jar with your dependencies and your code, one that includes neither Spark nor Scala. I’m not sure how to do this with Maven, but I guess it is possible, since sbt lets you do it with sbt-assembly.
  • Create a thin jar of your code and a separate uber jar with only the dependencies (again, without Scala and Spark) to submit. I’ve never done this, but I believe people use sbt-native-packager for that.
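The uber-jar option can in fact be done with Maven using the shade plugin: dependencies you mark as `provided` (here Spark and the Scala library) are left out of the shaded jar, while compile-scope dependencies get bundled. A hedged sketch; the version numbers and artifact IDs are illustrative, not prescriptive:

```xml
<!-- pom.xml fragment (illustrative versions). Spark and scala-library are
     "provided" by the cluster at runtime, so the shade plugin excludes them;
     your real compile-scope dependencies are bundled into the uber jar. -->
<dependencies>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.12</artifactId>
    <version>3.5.1</version>
    <scope>provided</scope>
  </dependency>
  <dependency>
    <groupId>org.scala-lang</groupId>
    <artifactId>scala-library</artifactId>
    <version>2.12.18</version>
    <scope>provided</scope>
  </dependency>
</dependencies>

<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-shade-plugin</artifactId>
      <version>3.5.1</version>
      <executions>
        <execution>
          <!-- run the shade goal as part of "mvn package" -->
          <phase>package</phase>
          <goals><goal>shade</goal></goals>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>
```

With this in place, `mvn package` produces a shaded jar you can hand directly to spark-submit.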

Finally, remember to use the exact Spark and Scala versions of your cluster in your build, down to the Scala binary version number.
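For example, for a cluster running Spark 3.5.x on Scala 2.12 (illustrative numbers, check what your cluster actually ships), the versions can be pinned once in properties; note that the `_2.12` suffix in Spark artifact IDs must match the cluster’s Scala binary version:

```xml
<!-- pom.xml fragment: pin versions to match the cluster (values illustrative) -->
<properties>
  <scala.version>2.12.18</scala.version>
  <scala.binary.version>2.12</scala.binary.version>
  <spark.version>3.5.1</spark.version>
</properties>

<dependencies>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <!-- the artifact suffix carries the Scala binary version -->
    <artifactId>spark-sql_${scala.binary.version}</artifactId>
    <version>${spark.version}</version>
    <scope>provided</scope>
  </dependency>
</dependencies>
```

Mismatched binary versions (e.g. a 2.13-built jar on a 2.12 cluster) typically fail at runtime with confusing linkage errors, so this is worth getting right up front.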
