Interesting, hadn’t heard of seed before.
Thanks. I have dropped sbt and switched over to Maven with the scala-maven-plugin. Everything looks clean now: all dependency jars live in the shared repository, and only the project-specific files sit in the project folder (18 items vs. 900+). “mvn package” produces the jar, which runs fine with spark-submit. Not sure what I’m missing by not having sbt here.
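For reference, a minimal scala-maven-plugin setup along those lines might look like the sketch below. The plugin version number is an illustrative assumption, not something stated in this thread:

```xml
<!-- Sketch of a pom.xml build section using scala-maven-plugin.
     The version number here is illustrative; check for the current release. -->
<build>
  <sourceDirectory>src/main/scala</sourceDirectory>
  <plugins>
    <plugin>
      <groupId>net.alchim31.maven</groupId>
      <artifactId>scala-maven-plugin</artifactId>
      <version>4.8.1</version>
      <executions>
        <execution>
          <goals>
            <!-- Bind Scala compilation into the standard Maven lifecycle -->
            <goal>compile</goal>
            <goal>testCompile</goal>
          </goals>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>
```

With this in place, `mvn package` compiles the Scala sources and builds the jar as described above.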
sbt also saves dependencies to a shared repo.
For a Spark project you probably won’t miss anything; Maven is arguably even the recommended choice there.
Off-topic: once your Spark jobs become more complex and you start to need dependencies, you have four options:
- Install those jars on your cluster. This is usually a bad idea unless you have one cluster per job (or per set of related jobs).
- Pass the jars when submitting the job (e.g. via spark-submit’s `--jars` option).
- Create an uber jar with your dependencies and your code, but one that includes neither Spark nor Scala. I’m not sure how to do this with Maven, but I’d guess it’s possible, since sbt lets you do it with sbt-assembly.
- Create a thin jar of your code plus an uber jar containing only the dependencies (again excluding Scala and Spark), and submit both. I’ve never done this, but I believe people used sbt-native-packager for it.
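For the uber-jar option under Maven, the usual approach (my sketch, not something from this thread) is the maven-shade-plugin, combined with marking Spark as `provided` so that it, and the Scala library it ships with, stay out of the jar. Version numbers and the `_2.12` suffix below are illustrative assumptions:

```xml
<!-- Illustrative: "provided" keeps Spark (and its bundled Scala library)
     out of the shaded jar; the cluster supplies them at runtime. -->
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.12</artifactId>
  <version>3.5.1</version>
  <scope>provided</scope>
</dependency>

<!-- The shade plugin merges the remaining (compile-scope) dependencies
     and your classes into one jar during "mvn package". -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>3.5.1</version>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
    </execution>
  </executions>
</plugin>
```

The resulting jar can then be handed to spark-submit without any `--jars` flags.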
Finally, remember to build against the exact Spark and Scala versions your cluster runs, down to the binary-compatibility version numbers.
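One way to keep those versions aligned (a sketch with made-up version numbers; substitute whatever your cluster actually runs) is to centralize them in Maven properties and reuse the Scala binary version in every Spark artifact id:

```xml
<!-- Illustrative versions only: they must match your cluster. -->
<properties>
  <scala.version>2.12.18</scala.version>
  <scala.binary.version>2.12</scala.binary.version>
  <spark.version>3.5.1</spark.version>
</properties>

<dependencies>
  <!-- Spark artifacts are published per Scala binary version,
       hence the _${scala.binary.version} suffix. -->
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_${scala.binary.version}</artifactId>
    <version>${spark.version}</version>
    <scope>provided</scope>
  </dependency>
</dependencies>
```

A jar built for Scala 2.12 will not run on a Scala 2.13 cluster, so getting `scala.binary.version` wrong fails at runtime, not at compile time.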