How to control the number of CPUs used when using par

Dear Scala Group,

I am using Scala 3, and one of my favorite parts is using par (after import scala.collection.parallel._ and import scala.collection.parallel.CollectionConverters._) to automatically turn my iterations into parallel computations. But I have faced one problem for a very long time: I cannot control the number of CPUs it uses.

It seems that it will use all of my available CPUs at once if needed. Following the official documentation, I tried to use

import java.util.concurrent.ForkJoinPool
import scala.collection.parallel.ForkJoinTaskSupport

val taskPar = gz2zstTasks.par
// Limit this collection to a dedicated 2-thread pool.
taskPar.tasksupport = new ForkJoinTaskSupport(new ForkJoinPool(2))

Then, in the .jvmopts file right under my project root, I set -Djava.util.concurrent.ForkJoinPool.common.parallelism=2 to control the number of CPUs, but it does not work.

I also tried adding -XX:ActiveProcessorCount=20 in .jvmopts, but that does not work either.

I think I must be missing something here, since it should be easy to control the number of CPUs when using par.

Really appreciate any help on this. Thanks!

Songpeng

I haven’t actually done this myself, but the recommended way to change the parallelism on an individual collection is like so:

import scala.collection.parallel._
val tasksupport = new ForkJoinTaskSupport(new java.util.concurrent.ForkJoinPool(2))

val pc = mutable.ParArray(1, 2, 3)
pc.tasksupport = tasksupport

If you want all your parallel collection operations to run using 2-ish threads, you can set them each to use that tasksupport; if you want each to get its own pool of 2-ish threads, you can create a new one each time, etc.
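For instance, a rough sketch of both variants might look like this (untested; the collections here are just placeholders):

import scala.collection.parallel._
import scala.collection.parallel.CollectionConverters._
import java.util.concurrent.ForkJoinPool

// One shared pool of 2 worker threads, reused by several collections.
val shared = new ForkJoinTaskSupport(new ForkJoinPool(2))

val xs = (1 to 100).toVector.par
val ys = (1 to 100).toVector.par
xs.tasksupport = shared // both collections now draw from the same 2 threads
ys.tasksupport = shared

// Or give a collection its own private 2-thread pool.
val zs = (1 to 100).toVector.par
zs.tasksupport = new ForkJoinTaskSupport(new ForkJoinPool(2))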

Parallel collections, unlike the Future stack, do not use an implicit execution context; they read the global one and use it. You can apparently set it via system properties as described in the description of the global field in ExecutionContext.

I also haven’t set system properties in a very, very long time, but that’s where to start. (The odd thing about them is that the values are strings.)
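For reference, the properties documented on the global field are scala.concurrent.context.minThreads, scala.concurrent.context.numThreads, scala.concurrent.context.maxThreads, and scala.concurrent.context.maxExtraThreads. Usually you would pass them as JVM flags (e.g. -Dscala.concurrent.context.maxThreads=2). A hedged sketch of setting them programmatically instead, which should only work if nothing has touched the global context yet (the object name is made up):

object LimitGlobalPool {
  def main(args: Array[String]): Unit = {
    // These must be set before scala.concurrent.ExecutionContext.global is
    // initialized, i.e. before the first parallel operation runs.
    sys.props("scala.concurrent.context.numThreads") = "2"
    sys.props("scala.concurrent.context.maxThreads") = "2"
    // ... the rest of the program goes here.
  }
}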

2 Likes

Thank you! I didn’t know about ExecutionContext before; I will give it a look. It looks like I can set these as Java parameters in .jvmopts. Will try it.

If the global config works, great.

If you need to preserve TaskSupport when forking a task, the first issue expresses the expectation that setting the support on the inner collection should work, and the second issue describes funky interactions when attempting that. (I don’t remember the details or what happened to JDK 9 ForkJoinPool support. Example test leveraging JDK 9: scala/test/files/jvm/scala-concurrent-tck-b.scala at 2.13.x · scala/scala · GitHub)

Ditto.
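For context, the nested case those issues describe looks roughly like this: an outer parallel collection whose tasks build inner parallel collections, with the expectation that setting tasksupport on the inner one pins its parallelism. Untested sketch; the numbers are arbitrary.

import scala.collection.parallel._
import scala.collection.parallel.CollectionConverters._
import java.util.concurrent.ForkJoinPool

// A dedicated 2-thread pool for the inner collections.
val innerSupport = new ForkJoinTaskSupport(new ForkJoinPool(2))

val results = (1 to 8).toVector.par.map { i =>
  val inner = (1 to 1000).toVector.par
  inner.tasksupport = innerSupport // expectation: inner work uses at most 2 threads
  inner.map(_ * i).sum
}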

@som-snytt Thanks for your reply! I don’t know anything about TaskSupport. My aim is simply to let my program use a few CPUs instead of eating all the resources at once.

@Ichoran Following your suggestions, I put

-Dscala.concurrent.context.numThreads=2
-Dscala.concurrent.context.maxThreads=4

in the .jvmopts file right under my project root. I am using sbt for my project.
It seems that I still cannot control the total number of processors; it just uses all the available ones. Any suggestions on this? I think sbt should read the .jvmopts for me.

Thanks!

I tried it with scala-cli and using javaOpt, and also with sbt and .jvmopts, and both just worked.

I suggest you share a minimal project that demonstrates the issue, and what command you run.

It’s worth deleting your ~/.sbt to exclude other factors.
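Something along these lines should be enough as a reproducer; run it and watch CPU usage in top or Activity Monitor (untested sketch, the object name is made up):

import scala.collection.parallel.CollectionConverters._

object MinimalPar {
  def main(args: Array[String]): Unit = {
    // Confirm the property actually reached this JVM.
    println(sys.props.get("scala.concurrent.context.maxThreads"))
    // Busy-spin on every worker thread the global pool hands out.
    List.fill(500)(0).par.map(_ => while (true) {})
  }
}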

2 Likes

Thanks, I don’t know what happened in my configuration (I deleted ~/.sbt, but it still did not work). I will follow your suggestion and create a minimal project for this test.

Are you running your tests in a forked JVM? .jvmopts controls sbt’s own JVM, but it does not control forked JVMs; those are controlled by the javaOptions setting, as per sbt Reference Manual — Forking

1 Like

@SethTisue Thanks! I tried to set javaOptions instead of .jvmopts. But it does not work.

// Build-wide settings
ThisBuild / organization := "io.github.beyondpie"
ThisBuild / organizationName := "zulab"
ThisBuild / scalaVersion := "3.7.1"
ThisBuild / logLevel := Level.Info
ThisBuild / resolvers += "Bioviz".at(
  "https://nexus.bioviz.org/repository/maven-releases/")

ThisBuild / Compile / scalacOptions := List(
  "-encoding",
  "utf8",
  "-feature",
  "-language:implicitConversions",
  "-language:existentials",
  // "-experimental",
  "-unchecked",
  "-explain-types",
  "-explain",
  "-deprecation"
)

// Ref: https://www.scala-sbt.org/1.x/docs/Multi-Project.html
lazy val bioscala = (project in file("bioscala"))
  .settings(
    name := "bioscala",
    version := "0.7.0",
    Test / logBuffered := false,
    javaOptions ++= Seq(
      "Dscala.concurrent.context.numThreads=2",
      "Dscala.concurrent.context.maxThreads=4"
    ),
    libraryDependencies ++= Seq(
      "org.scalatest" %% "scalatest" % "3.2.19" % "test",
      "com.lihaoyi" %% "os-lib" % "0.11.4",
      "org.scala-lang.modules" %% "scala-parallel-collections" % "1.2.0",
      // only works on scala 3.6.3 now
      // "com.lihaoyi" % "ammonite" % "3.0.2" % "test" cross CrossVersion.full,
      "commons-io"           % "commons-io"  % "2.19.0",
      "org.jsoup"            % "jsoup"       % "1.20.1",
      // I removed some other libraries here.
    )
  )
lazy val pt = (project in file("100.project"))
  .dependsOn(bioscala)
  .settings(
    name := "pt",
    version := "0.9"
  )
lazy val DE = (project in file("12.DE"))
  .dependsOn(bioscala, pt)
  .settings(
    name := "DE",
    version := "1.0"
  )

Here is my project’s build.sbt. I put javaOptions inside the bioscala project, while my test is under the DE project. I am not sure if there is something I got wrong when setting up sbt. I have a script under src/test/scala, and I use Test/run in project 12.DE to run this parallel test from sbt.

If I put javaOptions inside 12.DE, the test reports “[warn] Test / run / javaOptions will be ignored, Test / run / fork is set to false”, and it still uses more processors than I expected (I think scala.concurrent.context.numThreads=2 should limit it to 2 CPUs?).

This works for test; per the instructions, use the Test / run scoped settings for run.

//Test / run / fork := true
Test / fork := true

Test / javaOptions ++= Seq(
  "-Dscala.concurrent.context.numThreads=2",
  "-Dscala.concurrent.context.maxThreads=4"
)
2 Likes

Hi @som-snytt
Thanks for correcting me. Yes, I added your suggested configuration to the test settings (under one of my projects with tests, 12.DE). It still uses more CPUs than configured.

It appears you left out the dashes before the Ds.

This can be separated into two parts:

  • Is the system property actually being set in the JVM process in question?
  • Is the system property having the effect you intended?

It’s important to determine which part is the point of failure.

You can distinguish the two by inserting something like println(sys.props("scala.concurrent.context.numThreads")) into the code you’re running.
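If you want a single program that does both checks, a rough sketch like the following would do (the thread-counting part is only a heuristic: it records which worker threads actually execute the tasks, and the count should be on the order of the limit you configured):

import java.util.concurrent.ConcurrentHashMap
import scala.collection.parallel.CollectionConverters._

object CheckParallelism {
  def main(args: Array[String]): Unit = {
    // Part 1: is the system property set in this JVM process?
    println(sys.props.get("scala.concurrent.context.numThreads"))
    println(sys.props.get("scala.concurrent.context.maxThreads"))

    // Part 2: is it having the intended effect? Record the distinct
    // threads that execute the parallel tasks.
    val seen = ConcurrentHashMap.newKeySet[String]()
    (1 to 10000).toVector.par.foreach { _ =>
      seen.add(Thread.currentThread.getName)
      Thread.sleep(1) // give the pool a chance to spread the work around
    }
    println(s"distinct worker threads: ${seen.size}")
  }
}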

@SethTisue Thanks for your reply!

It appears you left out the -s before the Ds.

Yes, I’ve updated it, thanks, but the result has not changed.

println(sys.props("scala.concurrent.context.numThreads"))

It says:

[info] running (fork) controlParallelCollection
[info] 2

So I think the system property actually is set. I am using Scala 3.7.1 and JDK 22.0.1-internal (2024-04-16). Not sure if this affects the results?

My own observation aligns with Som’s. If I do this:

% scala --dep org.scala-lang.modules::scala-parallel-collections:1.2.0 \
    -Dscala.concurrent.context.maxThreads=2
Welcome to Scala 3.7.0 (21.0.7, Java OpenJDK 64-Bit Server VM).
Type in expressions for evaluation. Or try :help.

scala> import scala.collection.parallel.CollectionConverters._

scala> List.fill(500)(0).par.map(_ => while(true) {})

I see 200% CPU usage in macOS’s Activity Monitor, as expected.

And if I leave out the -Dscala.concurrent.context.maxThreads=2, then I get much higher CPU usage — around 1150%.

Note that in my experience it is sufficient to set maxThreads only (presumably because numThreads defaults to the number of available processors and the result is then capped by maxThreads).

I think you need to:

  1. Verify that you are able to reproduce my results yourself, using the reproduction steps shown above
  2. Do some investigating, apply some rigor to the problem, and figure out what it is that you are doing differently

Having me, Rex, and Som all try to guess what might be going on in code we haven’t seen just doesn’t seem to be working. Or rather, we seem to have made some progress, but not all the way to a solution.

@SethTisue Thanks! I can reproduce your results! I will look into my code. Thanks again!

1 Like