Scala appears extremely bloated for a first-time user

Interesting, hadn’t heard of seed before.


Thanks. I have dropped sbt and switched over to Maven with the scala-maven-plugin. Everything looks clean now: all dependency jars live in the shared repo, and only the project-specific stuff is in the project folder (18 items vs 900+ items). “mvn package” delivers the jar, which runs fine with spark-submit. Not sure what I’m missing by not having sbt here.

sbt also saves dependencies to a shared repo.

For a Spark project you probably won’t be missing anything; Maven is arguably even the recommended choice there.

Off-topic: once your Spark jobs become more complex and you start to need dependencies, you have four options:

  • Install those jars on your cluster. This is usually a bad idea unless you have a cluster per job (or per set of related jobs).
  • Pass the jars when submitting the job.
  • Create an uber jar with your dependencies and your code, one that includes neither Spark nor Scala. I’m not sure how to do this with Maven, but I guess it is possible, since sbt lets you do it with sbt-assembly.
  • Create a thin jar of your code and a separate uber jar with only the dependencies (again, without Scala and Spark) to submit. I’ve never done this, but I believe people use sbt-native-packager for that.
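The uber-jar option can in fact be done with Maven using the shade plugin: dependencies you mark as `provided` (here Spark and the Scala library) are left out of the shaded jar, while compile-scope dependencies get bundled. A hedged sketch; the version numbers and artifact IDs are illustrative, not prescriptive:

```xml
<!-- pom.xml fragment (illustrative versions). Spark and scala-library are
     "provided" by the cluster at runtime, so the shade plugin excludes them;
     your real compile-scope dependencies are bundled into the uber jar. -->
<dependencies>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.12</artifactId>
    <version>3.5.1</version>
    <scope>provided</scope>
  </dependency>
  <dependency>
    <groupId>org.scala-lang</groupId>
    <artifactId>scala-library</artifactId>
    <version>2.12.18</version>
    <scope>provided</scope>
  </dependency>
</dependencies>

<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-shade-plugin</artifactId>
      <version>3.5.1</version>
      <executions>
        <execution>
          <!-- run the shade goal as part of "mvn package" -->
          <phase>package</phase>
          <goals><goal>shade</goal></goals>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>
```

With this in place, `mvn package` produces a shaded jar you can hand directly to spark-submit.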

Finally, remember to use the exact Spark and Scala versions of your cluster in your build, down to the Scala binary version number.
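For example, for a cluster running Spark 3.5.x on Scala 2.12 (illustrative numbers, check what your cluster actually ships), the versions can be pinned once in properties; note that the `_2.12` suffix in Spark artifact IDs must match the cluster’s Scala binary version:

```xml
<!-- pom.xml fragment: pin versions to match the cluster (values illustrative) -->
<properties>
  <scala.version>2.12.18</scala.version>
  <scala.binary.version>2.12</scala.binary.version>
  <spark.version>3.5.1</spark.version>
</properties>

<dependencies>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <!-- the artifact suffix carries the Scala binary version -->
    <artifactId>spark-sql_${scala.binary.version}</artifactId>
    <version>${spark.version}</version>
    <scope>provided</scope>
  </dependency>
</dependencies>
```

Mismatched binary versions (e.g. a 2.13-built jar on a 2.12 cluster) typically fail at runtime with confusing linkage errors, so this is worth getting right up front.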
