I am using Scala 3, and one of my favorite parts is using .par (after import scala.collection.parallel._ and import scala.collection.parallel.CollectionConverters._) to turn my iterations into parallel ones automatically. But I have faced one problem for a very long time: I cannot control the number of CPUs this uses.
It seems that it will use all my available CPUs at once if needed. From the official documentation, I tried to use
import java.util.concurrent.ForkJoinPool
import scala.collection.parallel.ForkJoinTaskSupport

val taskPar = gz2zstTasks.par
taskPar.tasksupport = new ForkJoinTaskSupport(new ForkJoinPool(2))
Then, in the .jvmopts file right under my project root, I used -Djava.util.concurrent.ForkJoinPool.common.parallelism=2 to control the number of CPUs. But it does not work.
I also tried adding -XX:ActiveProcessorCount=20 in .jvmopts, but that does not work either.
I think I must be missing something here; it should be easy to control the number of CPUs when using parallel collections.
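For reference, a quick way to see what the JVM actually picked up is to print the relevant values at startup (a sketch using only standard JDK calls; availableProcessors is what -XX:ActiveProcessorCount overrides, and the common-pool parallelism is what the -D flag above targets):

import java.util.concurrent.ForkJoinPool

// CPU count as the JVM sees it (affected by -XX:ActiveProcessorCount)
println(s"availableProcessors = ${Runtime.getRuntime.availableProcessors()}")
// Parallelism of the JDK common pool (affected by
// -Djava.util.concurrent.ForkJoinPool.common.parallelism)
println(s"commonPool parallelism = ${ForkJoinPool.commonPool().getParallelism}")
// System properties read by Scala's global ExecutionContext, if set
for (k <- Seq("scala.concurrent.context.numThreads",
              "scala.concurrent.context.maxThreads"))
  println(s"$k = ${Option(System.getProperty(k)).getOrElse("<unset>")}")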
I haven’t actually done this myself, but the recommended way to change the parallelism on an individual collection is like so:
import scala.collection.parallel._
val tasksupport = new ForkJoinTaskSupport(new java.util.concurrent.ForkJoinPool(2))
val pc = mutable.ParArray(1, 2, 3)
pc.tasksupport = tasksupport
If you want all your parallel collection operations to run using 2-ish threads, you can set them each to use that tasksupport; if you want each to get its own pool of 2-ish threads, you can create a new one each time, and so on.
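For instance, something like this (a sketch, assuming the scala-parallel-collections module is on the classpath):

import scala.collection.parallel._
import scala.collection.parallel.CollectionConverters._
import java.util.concurrent.ForkJoinPool

// one shared 2-worker pool for several collections...
val shared = new ForkJoinTaskSupport(new ForkJoinPool(2))
val xs = (1 to 100).toArray.par
val ys = (1 to 100).toVector.par
xs.tasksupport = shared
ys.tasksupport = shared

// ...or a private 2-worker pool per collection
val zs = (1 to 100).toArray.par
zs.tasksupport = new ForkJoinTaskSupport(new ForkJoinPool(2))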
Parallel collections, unlike the Future stack, do not take an implicit execution context; they read the global one and use it. You can apparently set it via system properties, as described in the documentation of the global field in ExecutionContext.
I also haven’t set system properties in a very long time, but that’s where to start. (The odd thing about them is that the values are strings.)
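Programmatically, that would look something like this (a sketch; these must be set before anything touches ExecutionContext.global for the first time):

// values are strings, even though they encode numbers
System.setProperty("scala.concurrent.context.numThreads", "2")
System.setProperty("scala.concurrent.context.maxThreads", "2")

Passing them as -D flags to the JVM achieves the same thing without the ordering concern.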
If you need to preserve TaskSupport when forking a task: the first issue expresses the expectation that setting the support on the inner collection should work, and the second issue talks about funky interactions when attempting that. (I don’t remember the details, or what happened to JDK 9 ForkJoinPool support. Example test leveraging JDK 9: scala/test/files/jvm/scala-concurrent-tck-b.scala at 2.13.x · scala/scala · GitHub)
@som-snytt Thanks for your reply! I don’t know anything about TaskSupport. My aim is simply to let my program use a few CPUs instead of eating all the resources at once.
I put it in the .jvmopts file right under my project root. I am using sbt for my project.
It seems that I still cannot control the total number of processors; it just uses all the available ones. Any suggestions on this? I think sbt should read .jvmopts for me.
Thanks, I don’t know what happened in my configuration. (I deleted ~/.sbt, but it still did not work.) I will follow your suggestion and create a minimal project for this testing.
Are you running your tests in a forked JVM? .jvmopts controls sbt’s own JVM, but it does not control forked JVMs; those are controlled by the javaOptions setting, as per sbt Reference Manual — Forking
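Something like the following in the relevant project’s settings, I believe (untested sketch):

// javaOptions only take effect in a forked JVM
Test / fork := true
Test / javaOptions ++= Seq(
  "-Dscala.concurrent.context.numThreads=2",
  "-Dscala.concurrent.context.maxThreads=2"
)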
@SethTisue Thanks! I tried setting javaOptions instead of .jvmopts, but it does not work.
// Build-wide settings
ThisBuild / organization := "io.github.beyondpie"
ThisBuild / organizationName := "zulab"
ThisBuild / scalaVersion := "3.7.1"
ThisBuild / logLevel := Level.Info
ThisBuild / resolvers += "Bioviz".at(
  "https://nexus.bioviz.org/repository/maven-releases/")
ThisBuild / Compile / scalacOptions := List(
  "-encoding",
  "utf8",
  "-feature",
  "-language:implicitConversions",
  "-language:existentials",
  // "-experimental",
  "-unchecked",
  "-explain-types",
  "-explain",
  "-deprecation"
)

// Ref: https://www.scala-sbt.org/1.x/docs/Multi-Project.html
lazy val bioscala = (project in file("bioscala"))
  .settings(
    name := "bioscala",
    version := "0.7.0",
    Test / logBuffered := false,
    javaOptions ++= Seq(
      "-Dscala.concurrent.context.numThreads=2",
      "-Dscala.concurrent.context.maxThreads=4"
    ),
    libraryDependencies ++= Seq(
      "org.scalatest" %% "scalatest" % "3.2.19" % "test",
      "com.lihaoyi" %% "os-lib" % "0.11.4",
      "org.scala-lang.modules" %% "scala-parallel-collections" % "1.2.0",
      // only works on scala 3.6.3 now
      // "com.lihaoyi" % "ammonite" % "3.0.2" % "test" cross CrossVersion.full,
      "commons-io" % "commons-io" % "2.19.0",
      "org.jsoup" % "jsoup" % "1.20.1"
      // I removed some other libraries here.
    )
  )

lazy val pt = (project in file("100.project"))
  .dependsOn(bioscala)
  .settings(
    name := "pt",
    version := "0.9"
  )

lazy val DE = (project in file("12.DE"))
  .dependsOn(bioscala, pt)
  .settings(
    name := "DE",
    version := "1.0"
  )
Here is my project’s build.sbt. I put javaOptions inside the bioscala project, while my test is under the DE project. I am not sure if I got something wrong when setting up sbt. I have a script under src/test/scala, and I use Test/run under project 12.DE for this parallel test in sbt.
If I put javaOptions inside 12.DE, the test reports “[warn] Test / run / javaOptions will be ignored, Test / run / fork is set to false”, and it still uses more processors than I expected (I thought scala.concurrent.context.numThreads=2 should limit it to 2 CPUs?).
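If I read the warning correctly, 12.DE would also need something like this for javaOptions to apply at all (my guess, matching the scope named in the warning):

Test / run / fork := true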
Hi @som-snytt,
Thanks for correcting me. Yes, I added your suggested configurations into the test (under 12.DE, one of my projects with tests). It still uses more CPUs.
My own observation aligns with Som’s. If I do this:
% scala --dep org.scala-lang.modules::scala-parallel-collections:1.2.0 \
-Dscala.concurrent.context.maxThreads=2
Welcome to Scala 3.7.0 (21.0.7, Java OpenJDK 64-Bit Server VM).
Type in expressions for evaluation. Or try :help.
scala> import scala.collection.parallel.CollectionConverters._
scala> List.fill(500)(0).par.map(_ => while(true) {})
I see 200% CPU usage in macOS’s Activity Monitor, as expected.
And if I leave out the -Dscala.concurrent.context.maxThreads=2, then I get much higher CPU usage — around 1150%.
Note that in my experience it’s sufficient to set only maxThreads.
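If you’d rather not rely on Activity Monitor, counting the distinct worker threads from inside the program works too (a sketch; thread names are an implementation detail, but fine for a sanity check):

import scala.collection.parallel.CollectionConverters._
import java.util.concurrent.ConcurrentHashMap

// record which threads actually execute the parallel operation
val threads = ConcurrentHashMap.newKeySet[String]()
(1 to 100000).par.foreach(_ => threads.add(Thread.currentThread.getName))
println(s"distinct worker threads: ${threads.size}")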
I think you need to:

1. Verify that you are able to reproduce my results yourself, using the reproduction steps shown above.
2. Do some investigating, apply some rigor to the problem, and figure out what it is that you are doing differently.

Having me, Rex, and Som all try to guess what might be going on in code we haven’t seen just doesn’t seem to be working. Or rather, we seem to have made some progress, but not all the way to a solution.