Parallel collections using multiple cores

I’m queuing several hundred Scala programs simultaneously on a compute cluster whose nodes have 16, 24, or 32 cores, depending on the scheduler. When I submit a job, I can specify how many cores the process needs.

In my Scala program, the parallelization is very simple: I call .par.map on a Set[…] of tuples so that the function reducePosCountClausesIncremental can be called independently on each element of the set (shown in the code below).

QUESTION: Is the .par library efficiently using the 4 cores I request for each job? How can I know? If the library asks the system how many cores there are, it presumably gets back 16, 24, or 32 rather than 4.
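A minimal sketch of how this could be checked, assuming Scala 2.13 or later with the separate scala-parallel-collections module on the classpath (on Scala 2.12 the `CollectionConverters` import would be dropped). `CoreCheck` is a hypothetical name used only for illustration. It prints the core count the JVM reports next to the parallelism level of the task support that a freshly created parallel collection gets by default:

```scala
// Assumption: Scala 2.13+ with the scala-parallel-collections module.
import scala.collection.parallel.CollectionConverters._

// Hypothetical helper, not part of the original program.
object CoreCheck {
  // Number of processors the JVM believes it may use.
  def jvmCores: Int = Runtime.getRuntime.availableProcessors()

  // Parallelism level of the task support attached to a new parallel collection.
  def defaultParallelism: Int = (1 to 10).par.tasksupport.parallelismLevel

  def main(args: Array[String]): Unit =
    println(s"JVM cores: $jvmCores, default parallelism: $defaultParallelism")
}
```

If both numbers match the node size rather than the number of cores requested from the scheduler, the job is likely starting more worker threads than it has cores reserved.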

Is there some kind of feedback I can request from the parallel collection to find out how many threads, cores, etc. it used?
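I am not aware of a built-in report of this kind, but one rough way to get such feedback is to record the name of the thread that handles each element. The sketch below assumes Scala 2.13 or later with the scala-parallel-collections module; `ThreadProbe` and `threadsUsed` are hypothetical names, and the function `f` stands in for the real per-element work:

```scala
// Assumption: Scala 2.13+ with the scala-parallel-collections module.
import scala.collection.concurrent.TrieMap
import scala.collection.parallel.CollectionConverters._

// Hypothetical helper, not part of the original program.
object ThreadProbe {
  // Applies f to every element in parallel and also returns the names
  // of all threads that processed at least one element.
  def threadsUsed[A, B](items: Set[A])(f: A => B): (List[B], Set[String]) = {
    val seen = TrieMap.empty[String, Unit] // thread-safe map used as a set
    val out = items.par.map { a =>
      seen.put(Thread.currentThread().getName, ())
      f(a)
    }.toList
    (out, seen.keySet.toSet)
  }
}
```

The size of the returned name set is a lower bound on the number of worker threads that took part. It says nothing about which physical cores those threads ran on.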

I see that scala.collection.parallel.TaskSupport has an API for tweaking the parallelization strategy. Is this something I need to fiddle with? Or is the library already doing a good job?
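For context, the kind of tweak in question might look like the sketch below, which attaches a `ForkJoinTaskSupport` backed by a fixed-size pool to the parallel collection before mapping. It assumes Scala 2.13 or later with the scala-parallel-collections module; `CappedPar` and `mapWithCap` are hypothetical names, and the `cores` argument would be whatever was requested from the scheduler (4 in my case):

```scala
// Assumption: Scala 2.13+ with the scala-parallel-collections module.
import java.util.concurrent.ForkJoinPool
import scala.collection.parallel.CollectionConverters._
import scala.collection.parallel.ForkJoinTaskSupport

// Hypothetical helper, not part of the original program.
object CappedPar {
  // Maps f over items in parallel using a pool with a target of `cores` threads.
  def mapWithCap[A, B](items: Set[A], cores: Int)(f: A => B): List[B] = {
    val pool = new ForkJoinPool(cores)
    try {
      val par = items.par
      par.tasksupport = new ForkJoinTaskSupport(pool) // replace the default task support
      par.map(f).toList
    } finally pool.shutdown() // release the worker threads
  }
}
```

Creating a pool per call keeps the sketch self-contained. In a loop that runs many phases, it would probably be cheaper to create the pool once and reuse it.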

Basically, should I blindly and naively trust the parallelization strategy of the standard library? Or should I be skeptical and try to tweak it for my particular computation and hardware?

    def loop(posCounts: Set[(Int, Int, ClauseAsList)]): Unit = {

      if (posCounts.nonEmpty) {
        val RemoveAdd(removes, adds) = foldUps(posCounts.par.map((reducePosCountClausesIncremental _).tupled).toList)
        loop(calcNextPhase(removes, adds))
      }
    }