Running multiple sbt processes simultaneously

I’d like to run several hundred sbt processes on a compute cluster. I notice that when sbt first starts up, it seems to update some files in my home directory and then compile the Scala code. If I run multiple sbt processes, even on different hosts, that all share my home directory, is this safe? Is sbt smart enough to use some sort of cooperative locking for this update, or do I need to force every sbt run to use a different resource directory?

What about compilation? What if two different sbt processes try to recompile the same files? Is it my responsibility to prevent this by ensuring that nothing needs to be recompiled? If so, can I tell sbt to exit with an error if it thinks it needs to recompile something?

[johan:cl-robdd/src/cl-robdd-scala] jnewton% sbt run dimacsParse
[info] Loading settings for project global-plugins from idea.sbt,plugins.sbt ...
[info] Loading global plugins from /Users/jnewton/.sbt/1.0/plugins
[info] Updating ProjectRef(uri("file:/Users/jnewton/.sbt/1.0/plugins/"), "global-plugins")...
[info] Done updating.
[info] Loading project definition from /Users/jnewton/sw/regular-type-expression/cl-robdd/src/cl-robdd-scala/project
[info] Updating ProjectRef(uri("file:/Users/jnewton/sw/regular-type-expression/cl-robdd/src/cl-robdd-scala/project/"), "cl-robdd-scala-build")...
[info] Done updating.
[info] Loading settings for project cl-robdd-scala from build.sbt ...
[info] Set current project to cl-robdd-scala (in build file:/Users/jnewton/sw/regular-type-expression/cl-robdd/src/cl-robdd-scala/)

This won’t answer your questions but it may be of interest. When you launch your first sbt session, it will launch an “sbt” server (tested only on Linux). Any other sbt process running on the same machine will detect and connect to this server. I think that every command you issue then goes to the server. However, I don’t know whether compilation is synchronized (sbt uses zinc, which supports incremental compilation).

As far as locking goes, I have only seen this when the ivy repository is accessed.

I regularly use two or three sbt sessions when testing data-processing software: one runs a process that generates and sends data, another reads and processes the data to generate results, and a third gets the results and displays them in a GUI. I only use one of these sessions to compile the code, so I have not had any syncing issues.

Hope this is of some use.

Here is my updated plan. I have several hundred sbt jobs to run. I will schedule one job for each in the compute cluster, but I will make them all dependent on an initial job whose task is to compile the code using “git pull ; sbt compile”. This should guarantee that no “sbt runMain …” job will need to recompile anything.
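A minimal sketch of that plan in shell. The scheduler is not named in this thread, so the functions below only print the command line for each job; the project directory and the argument string are placeholders. It uses && rather than ; so that the compile step is skipped if git pull fails.

```shell
#!/bin/sh
# Sketch only: the directory and argument values passed in are placeholders,
# and the printed lines still have to be handed to your own job scheduler.

# Command for the single initial job that every other job depends on.
# $1 = project directory
compile_cmd() {
  printf 'cd %s && git pull && sbt compile\n' "$1"
}

# Command for one dependent job.
# $1 = project directory, $2 = argument string for dimacs.dimacsParse
run_cmd() {
  printf 'cd %s && sbt -Dsbt.log.noformat=true "runMain dimacs.dimacsParse %s"\n' "$1" "$2"
}
```

Submit the output of compile_cmd as the first job and the output of run_cmd once per input file, using whatever dependency mechanism the scheduler provides.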

It appears that sbt does in fact have some sort of locking. Here is the message I received when running multiple sbt processes on several nodes in our compute cluster.

cd /lrde/home/jnewton/sw/regular-type-expression/bin//../cl-robdd/src/cl-robdd-scala

sbt -Dsbt.log.noformat=true "runMain dimacs.dimacsParse $argv"
sbt -Dsbt.log.noformat=true runMain dimacs.dimacsParse /lrde/cluster/jnewton/SAT-benchmarks/NoLimits g2-ACG-20-10p1.cnf
Waiting for lock on /lrde/home/jnewton/.sbt/boot/sbt.boot.lock to be available...

In my experience this locking only occurs at startup, while sbt looks for the required libraries in the ivy repository. Once it is done, the lock is released and the next sbt instance can proceed.
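If waiting on the shared lock ever becomes a problem, one option is to give each job its own sbt directories. A minimal sketch, assuming the sbt system properties sbt.global.base, sbt.boot.directory and sbt.ivy.home behave as documented for your sbt version; the job id and the directory layout are hypothetical:

```shell
#!/bin/sh
# Print the sbt options that point one job at private directories.
# $1 = job id (hypothetical), $2 = base directory for per-job state (hypothetical)
sbt_isolation_opts() {
  job_dir="$2/sbt-job-$1"
  printf '%s %s %s\n' \
    "-Dsbt.global.base=$job_dir/global" \
    "-Dsbt.boot.directory=$job_dir/boot" \
    "-Dsbt.ivy.home=$job_dir/ivy2"
}

# Example use (not executed here):
#   sbt $(sbt_isolation_opts "$JOB_ID" /tmp) "runMain dimacs.dimacsParse $args"
```

Each job then has to fetch its own copy of sbt and the libraries on first start, so this trades lock waiting for extra downloads. It also does not isolate compilation: the project's target directory is still shared by every job started from the same checkout.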

Why do you need to use sbt to launch the applications? You can do this directly via Java in the command line. I use a fat jar to do this (may not be the best way to do it). I have some notes on this. Please see for more information:

HTHs

@hmf, Thanks for the comment, but I’m not sure I understand the motivation of your question. Is sbt something we should avoid using? It seems to me like a tool that solves many problems I’d have to solve by hand otherwise.

I took a look at your link to PRODUCTECH, but I admit that I don’t see the connection to what I’m trying to do. Even the first paragraph assumes the reader has much more understanding than I do. For example: 1) What does it mean for “a project to terminate”? 2) What is Predictive Maintenance Analytics? 3) What is ADW?

You definitely should use a build tool for, well, building your app. Whether you should use it to launch the application on the client site is a different question. I’d second @hmf with the recommendation of building and deploying a (fat) jar that only requires a JVM to be installed on the client.

Same here, looks like you’ve detected a copy/paste anomaly. :wink: I’d recommend something like sbt-assembly.

I prefer, for deployment, the SBT Native Packager. Once you have your DEB or RPM or whatever package, you can install it and launch your app like any other app, and you can later easily remove or upgrade it.

General agreement with the above. While sbt has a “run” command, that’s mostly used during development – while I’m working, I’ll usually stay within the sbt shell, doing a tight loop of compile-run-debug-edit-repeat.

But it’s unusual, even rare, to use that for actual deployment, and sbt isn’t optimized for that. The various deployment options mentioned above are much, much more common, and more likely to work well.

Apologies for any confusion (better links below). The notes cover the sbt plugin that provides the assembly command. This generates a fat jar with all the libraries you need to execute your code. The text just explains:

  • how to set up the plugin [1]
  • how to use the assembly command [2]
  • how to execute the code using Java [2]

With this you can create a script to execute as many instances of as many applications as you have in the project. No need to fuss around setting up the class paths.
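A minimal sketch of such a launcher script, assuming the fat jar has already been built with the assembly command. The jar path is whatever your build produced and is passed in as a placeholder; dimacs.dimacsParse is the main class used earlier in this thread, and the JAVA variable is only a convenience introduced here for overriding the launcher.

```shell
#!/bin/sh
# Run one instance of the application straight from the fat jar, without sbt.
# $1 = path to the fat jar (placeholder), remaining arguments go to the program.
# JAVA may be set to pick a specific java binary; it defaults to "java".
run_from_jar() {
  jar="$1"
  shift
  "${JAVA:-java}" -cp "$jar" dimacs.dimacsParse "$@"
}

# Example use (not executed here):
#   run_from_jar path/to/your-assembly.jar "$benchmark_dir" "$cnf_file"
```

Because nothing is resolved or compiled at launch, the jobs no longer touch the sbt directories in the shared home directory at all.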

HTHs,

  1. https://cese.gitlab.io/adw/docs/installation.html
  2. https://cese.gitlab.io/adw/docs/installation.html#using-the-fat-jar

Correct, sbt assembly. The linked repo readme may be a little intimidating - hence the notes.