Promising performance results from treeMapReduce

Thank you to the several people who helped me with the development of my treeMapReduce function. I have some preliminary tests of its performance as compared to a normal foldLeft. I’m adding all the fractions of the form 1/n (excluding 1/0) from -1/r to 1/r, with r ranging over `List(10, 20, 50, 100, 200, 500, 1000, 2000, 5000)`.
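Since every term 1/n pairs with 1/(-n), the exact sum is zero, which gives a built-in correctness check for any summation strategy. Here is a quick floating-point sanity check of that fact (a sketch only; the actual benchmark sums exact spire.math.Rational values):

```scala
// The test data: 1/n for n in (-r to -1) ++ (1 to r), skipping n = 0.
// Each 1/n cancels with 1/(-n), so the exact sum is zero.
val r = 10
val piercedInterval = (-r to -1) ++ (1 to r)
// Double is used here only for illustration; the benchmark uses Rational.
val approxSum = piercedInterval.map(n => 1.0 / n).sum
assert(math.abs(approxSum) < 1e-12) // zero, up to floating-point rounding
```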

The results are interesting. From somewhere between r=50 and r=100 onward, my implementation of treeMapReduce performs better, and apparently much better as the size increases. I don’t know yet whether it is polynomially or exponentially better.

10 treeMapReduce: Elapsed time: 151011.685 u-seconds
10          fold: Elapsed time: 1285.852 u-seconds

20 treeMapReduce: Elapsed time: 883.631 u-seconds
20          fold: Elapsed time: 523.748 u-seconds

50 treeMapReduce: Elapsed time: 7806.193 u-seconds
50          fold: Elapsed time: 8578.5 u-seconds

100 treeMapReduce: Elapsed time: 3178.148 u-seconds
100          fold: Elapsed time: 12028.774 u-seconds

200 treeMapReduce: Elapsed time: 6827.183 u-seconds
200          fold: Elapsed time: 18540.335 u-seconds

500 treeMapReduce: Elapsed time: 12062.567 u-seconds
500          fold: Elapsed time: 70240.239 u-seconds

1000 treeMapReduce: Elapsed time: 16772.648 u-seconds
1000          fold: Elapsed time: 241687.172 u-seconds

2000 treeMapReduce: Elapsed time: 26706.833 u-seconds
2000          fold: Elapsed time: 1523622.028 u-seconds

5000 treeMapReduce: Elapsed time: 83140.555 u-seconds
5000          fold: Elapsed time: 20322914.918 u-seconds

Here is the code I’m using to make the comparison.

  def time[R](name: String, block: => R): R = {
    val t0 = System.nanoTime()
    val result = block // call-by-name
    val t1 = System.nanoTime()
    println(s"$name: Elapsed time: ${(t1 - t0) / 1000.0} u-seconds") // time in microseconds
    result
  }

  import treereduce.TreeReducable._
  import treereduce.TreeReduce._
  import spire.algebra._
  import spire.implicits._
  import spire.math._

  val zero = Rational(0, 1)
  for { r <- List(10, 20, 50, 100, 200, 500, 1000, 2000, 5000) } {
    val piercedInterval = (-r to -1) ++ (1 to r)
    time(s"$r treeMapReduce", {
      val sum = piercedInterval.treeMapReduce(zero)(Rational(1, _), _ + _)
      assert(sum === zero)
    })
    time(s"$r          fold", {
      val sum = piercedInterval.map(Rational(1, _)).foldLeft(zero)(_ + _)
      assert(sum === zero)
    })
  }

Adding rational numbers is the type of folding which treeMapReduce was designed for. The idea is that the more such rational numbers you add, the larger the denominators become and the more expensive it is to find the common denominator needed to add them. The more rationals you add, the more the computation is dominated by this gcd search. The goal of treeMapReduce is to keep the gcd computations small as long as possible. It does this by making the heuristic assumption that the given numbers have small denominators compared to the intermediate results.
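For readers unfamiliar with the shape of the algorithm, here is a minimal sketch (not the actual library implementation) of a tree-shaped reduction: adjacent elements are combined pairwise, level by level, so each intermediate result is built from inputs of roughly equal size rather than from one large accumulator and one small element:

```scala
// Minimal sketch of a tree-shaped reduce: repeatedly combine adjacent
// pairs until one value remains. With rational addition as `combine`,
// this keeps intermediate denominators small compared to foldLeft.
def treeReduceSketch[A](items: List[A])(combine: (A, A) => A): A = {
  require(items.nonEmpty, "cannot reduce an empty list")
  def pairUp(xs: List[A]): List[A] = xs match {
    case a :: b :: rest => combine(a, b) :: pairUp(rest)
    case short          => short // zero or one element left over
  }
  items match {
    case single :: Nil => single
    case several       => treeReduceSketch(pairUp(several))(combine)
  }
}
```

For example, `treeReduceSketch(List(1, 2, 3, 4, 5))(_ + _)` combines (1+2) and (3+4) with 5 left over, then (3+7), then (10+5), giving 15.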

Please use JMH or some other reasonable benchmarking framework if you want to have a decent chance of your timings meaning something!

Note, for instance, that your times for size 10 are way higher than for 20; this is because you’re not warming up, and therefore you’re also measuring JIT compilation time. Note also that 50 for treeMapReduce is higher than both 20 and 100, probably because of more JIT compilation and/or a garbage collection pause.

By the time you get to 1000 it’s very clearly a win, but everything else should be viewed with great suspicion.

@Ichoran, I’m not familiar with JMH, but I looked at some web site describing how to use it, and tried to follow the example.

Lots of things go wrong.

Perhaps you could suggest a better source of information I could look into in order to use JMH as you suggest.

[info] Updating ProjectRef(uri("file:/Users/jnewton/sw/regular-type-expression/cl-robdd/src/cl-robdd-scala/project/"), "cl-robdd-scala-build")...
[warn] 	module not found: pl.project13.scala#sbt-jmh;0.2.4
[warn] ==== typesafe-ivy-releases: tried
[warn] ==== sbt-plugin-releases: tried
[warn] ==== local: tried
[warn]   /Users/jnewton/.ivy2/local/pl.project13.scala/sbt-jmh/scala_2.12/sbt_1.0/0.2.4/ivys/ivy.xml
[warn] ==== public: tried
[warn] ==== local-preloaded-ivy: tried
[warn]   /Users/jnewton/.sbt/preloaded/pl.project13.scala/sbt-jmh/0.2.4/ivys/ivy.xml
[warn] ==== local-preloaded: tried
[warn]   file:////Users/jnewton/.sbt/preloaded/pl/project13/scala/sbt-jmh_2.12_1.0/0.2.4/sbt-jmh-0.2.4.pom
[warn] 	::::::::::::::::::::::::::::::::::::::::::::::
[warn] 	::          UNRESOLVED DEPENDENCIES         ::
[warn] 	::::::::::::::::::::::::::::::::::::::::::::::
[warn] 	:: pl.project13.scala#sbt-jmh;0.2.4: not found
[warn] 	::::::::::::::::::::::::::::::::::::::::::::::
[warn] 	Note: Some unresolved dependencies have extra attributes.  Check that these dependencies exist with the requested attributes.
[warn] 		pl.project13.scala:sbt-jmh:0.2.4 (scalaVersion=2.12, sbtVersion=1.0)
[warn] 	Note: Unresolved dependencies path:
[warn] 		pl.project13.scala:sbt-jmh:0.2.4 (scalaVersion=2.12, sbtVersion=1.0) (/Users/jnewton/sw/regular-type-expression/cl-robdd/src/cl-robdd-scala/project/plugins.sbt#L1-2)
[warn] 		  +- default:cl-robdd-scala-build:0.1.0-SNAPSHOT (scalaVersion=2.12, sbtVersion=1.0)
[error] sbt.librarymanagement.ResolveException: unresolved dependency: pl.project13.scala#sbt-jmh;0.2.4: not found
[error] 	at sbt.internal.librarymanagement.IvyActions$.resolveAndRetrieve(IvyActions.scala:332)
[error] 	at sbt.internal.librarymanagement.IvyActions$.$anonfun$updateEither$1(IvyActions.scala:208)
[error] 	at sbt.internal.librarymanagement.IvySbt$Module.$anonfun$withModule$1(Ivy.scala:239)
[error] 	at sbt.internal.librarymanagement.IvySbt.$anonfun$withIvy$1(Ivy.scala:204)
[error] 	at sbt.internal.librarymanagement.IvySbt.sbt$internal$librarymanagement$IvySbt$$action$1(Ivy.scala:70)
[error] 	at sbt.internal.librarymanagement.IvySbt$$anon$
[error] 	at xsbt.boot.Locks$GlobalLock.withChannel$1(Locks.scala:95)
[error] 	at xsbt.boot.Locks$GlobalLock.xsbt$boot$Locks$GlobalLock$$withChannelRetries$1(Locks.scala:80)
[error] 	at xsbt.boot.Locks$GlobalLock$$anonfun$withFileLock$1.apply(Locks.scala:99)

Nice. :slight_smile:

Have you checked whether conflating the “map” step really yields a performance improvement compared to a vanilla #map() invocation upfront and a subsequent #treeReduce()?

Try a newer plugin version - the blog post is from 2015…
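For what it’s worth, the usual setup is one line in project/plugins.sbt plus enabling the plugin on the benchmark project; the version number below is only a guess at a more recent release, so check the sbt-jmh project page for the current one:

```scala
// project/plugins.sbt — version is illustrative; use the latest release
addSbtPlugin("pl.project13.scala" % "sbt-jmh" % "0.4.7")
```

```scala
// build.sbt — enable the plugin on the project containing the benchmarks
enablePlugins(JmhPlugin)
```

Benchmarks are then run from sbt with `jmh:run`.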

@sangamon, no, I haven’t checked how it affects the performance. But semantically it is necessary for my application, so I included it. My suspicion is that whether map is done sooner or later has no real effect, as it will be called the same number of times on exactly the same arguments.

Well, #map() (without a prior #view) will create an intermediate collection, so there will be some effect - I’d just hope that, at least with #view thrown in, it should be negligible.
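The difference sangamon describes can be sketched like this (plain Ints for brevity; the principle is the same with Rational):

```scala
val data = (1 to 1000).toVector

// Eager map: allocates a full intermediate Vector before the fold runs.
val eagerSum = data.map(_ * 2).foldLeft(0L)(_ + _)

// With .view the mapping is lazy: each element is transformed on demand
// as the fold consumes it, so no intermediate collection is built.
val lazySum = data.view.map(_ * 2).foldLeft(0L)(_ + _)

assert(eagerSum == lazySum) // identical results; only allocation differs
```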

Personally I’d try to limit the scope of the function to the core of what it actually tries to improve over vanilla #foldLeft() - makes it simpler and easier to reuse in other scenarios.

Here are the results including the pre-mapped time. In this case I simply map the input data with the function Rational(1, _) and then use the identity function in the call to treeMapReduce. I’ve also reversed the order of the computations to try to mitigate the effect of JVM warm-up time.

7500 treeMapReduce: Elapsed time: 366.954679 ms
7500 treeMapReduce (premapped): Elapsed time: 248.117361 ms
7500          fold: Elapsed time: 65680.342273 ms

4000 treeMapReduce: Elapsed time: 56.033245 ms
4000 treeMapReduce (premapped): Elapsed time: 57.941081 ms
4000          fold: Elapsed time: 10621.126162 ms

2000 treeMapReduce: Elapsed time: 15.521412 ms
2000 treeMapReduce (premapped): Elapsed time: 14.801642 ms
2000          fold: Elapsed time: 1452.061682 ms

1000 treeMapReduce: Elapsed time: 6.830617 ms
1000 treeMapReduce (premapped): Elapsed time: 8.010081 ms
1000          fold: Elapsed time: 249.093154 ms

750 treeMapReduce: Elapsed time: 6.145716 ms
750 treeMapReduce (premapped): Elapsed time: 4.25656 ms
750          fold: Elapsed time: 98.032205 ms

400 treeMapReduce: Elapsed time: 1.635057 ms
400 treeMapReduce (premapped): Elapsed time: 1.297002 ms
400          fold: Elapsed time: 24.82132 ms

200 treeMapReduce: Elapsed time: 0.524225 ms
200 treeMapReduce (premapped): Elapsed time: 0.48011 ms
200          fold: Elapsed time: 5.309775 ms

100 treeMapReduce: Elapsed time: 0.337112 ms
100 treeMapReduce (premapped): Elapsed time: 0.35326 ms
100          fold: Elapsed time: 1.027745 ms

75 treeMapReduce: Elapsed time: 0.174846 ms
75 treeMapReduce (premapped): Elapsed time: 0.178162 ms
75          fold: Elapsed time: 0.612126 ms

40 treeMapReduce: Elapsed time: 0.080857 ms
40 treeMapReduce (premapped): Elapsed time: 0.093869 ms
40          fold: Elapsed time: 0.089154 ms

20 treeMapReduce: Elapsed time: 0.062364 ms
20 treeMapReduce (premapped): Elapsed time: 0.077151 ms
20          fold: Elapsed time: 0.080771 ms

10 treeMapReduce: Elapsed time: 0.062454 ms
10 treeMapReduce (premapped): Elapsed time: 0.073589 ms
10          fold: Elapsed time: 0.060697 ms

As suspected, in this test case it makes little difference whether the seqop function is applied before the reduction or during it.