Function performance: List vs Vector benchmark

While implementing a function to generate Pascal's triangle, I found that List is faster than Vector when used inside a LazyList.

The code is as follows:

package myjmh
import java.util.concurrent.TimeUnit

import org.openjdk.jmh.annotations._

@State(Scope.Benchmark)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
@BenchmarkMode(Array(Mode.AverageTime))
class fibTest {
  def generateF1(numRows: Int): Vector[Vector[Int]] = {
    LazyList.iterate(Vector(1))(v => (0 +: v :+ 0).sliding(2).map(_.sum).toVector).take(numRows).toVector
  }

  def generateF2(numRows: Int): List[List[Int]] = {
    LazyList.iterate(List(1))(v => (0 +: v :+ 0).sliding(2).map(_.sum).toList).take(numRows).toList
  }


  @Benchmark
  @OperationsPerInvocation
  def mytest1() = {
    generateF1(2000)
  }

  @Benchmark
  @OperationsPerInvocation
  def mytest2() = {
    generateF2(2000)
  }
}
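As a quick sanity check (separate from the benchmark), both functions produce the same Pascal's-triangle rows; here is a small standalone sketch with the same bodies as generateF1/generateF2, reproduced so it compiles on its own:

```scala
object PascalCheck extends App {
  def generateF1(numRows: Int): Vector[Vector[Int]] =
    LazyList.iterate(Vector(1))(v => (0 +: v :+ 0).sliding(2).map(_.sum).toVector)
      .take(numRows).toVector

  def generateF2(numRows: Int): List[List[Int]] =
    LazyList.iterate(List(1))(v => (0 +: v :+ 0).sliding(2).map(_.sum).toList)
      .take(numRows).toList

  // First five rows: 1 / 1 1 / 1 2 1 / 1 3 3 1 / 1 4 6 4 1
  println(generateF1(5))
  assert(generateF1(5).map(_.toList).toList == generateF2(5))
}
```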

I used sbt-jmh to measure performance.

sbt> jmh:run -i 3 -wi 3 -f3 -t2 .*mytest.*
...
[info] # Run complete. Total time: 00:06:33
[info] REMEMBER: The numbers below are just data. To gain reusable insights, you need to follow up on
[info] why the numbers are the way they are. Use profilers (see -prof, -lprof), design factorial
[info] experiments, perform baseline and negative tests that provide experimental control, make sure
[info] the benchmarking environment is safe on JVM/OS/HW level, ask for reviews from the domain experts.
[info] Do not assume the numbers tell you what you want them to tell.
[info] Benchmark        Mode  Cnt       Score       Error  Units
[info] fibTest.mytest1  avgt    9  756295.431 ± 15359.877  us/op
[info] fibTest.mytest2  avgt    9  584970.123 ± 18923.042  us/op

However, the only difference between the two functions is the conversion calls (toVector vs toList); the :+ and +: operations are the same in both.

Programming in Scala, 3rd Edition, lists the time complexity of operations on each collection.

According to that table, eC (effectively constant) is slower than C (constant), but L (linear) is much slower than eC. So before running the test I expected Vector to be faster than List in this code, especially since Vector in 2.13.2 was re-implemented with a new algorithm.

My question is: why is Vector slower than List here?

My environment:

  • WSL
  • Scala 2.13.2
  • Java 1.8.0_181

More like “why is .toVector slower than .toList here”, as I don’t see any other differences. Scala’s stdlib uses mutability internally for performance reasons. .toList uses mutability to construct a List quickly (in linear time, with constant memory overhead). .toVector probably also uses mutability, but perhaps in a less effective way.
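To illustrate what “uses mutability to construct a List” means, here is a sketch of the ListBuffer pattern the stdlib relies on (a simplified illustration, not the actual toList implementation):

```scala
import scala.collection.mutable.ListBuffer

object BuilderSketch extends App {
  // ListBuffer appends in O(1) by mutating the last cons cell internally,
  // then exposes the finished structure as an immutable List without copying.
  val buf = ListBuffer.empty[Int]
  Iterator(1, 2, 3).foreach(buf += _)
  val result: List[Int] = buf.toList
  assert(result == List(1, 2, 3))
}
```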

1 Like

For one thing, a lot of stuff other than prepending and appending is going on in that code.

Another thing is that the table doesn’t really list the time cost. It shows how the time it takes to complete an operation evolves as a collection gets bigger. Constant means that if operation X takes 10 milliseconds on a collection of size 10, then it will take 10 milliseconds on a collection of size 1 million. Linear means that if that operation takes 10 milliseconds for size 10, and 100 for size 100, it will probably take around 1 million milliseconds for size 1 million. eC means that the time will grow a little but very slowly compared to L. But that doesn’t say anything about how long an operation really takes. Append on List can be L but still very fast on relatively small lists. While maybe on Vector it’s pretty slow compared to List for a small size, but will be faster than List for larger sizes.
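One concrete consequence of those complexity classes: prepending to a List is constant-time because the new list shares the old one as its tail, while appending must rebuild every cell. A small sketch:

```scala
object SharingDemo extends App {
  val l = List(1, 2, 3)

  // Prepend (C): the new list's tail is the old list itself, no copying.
  val prepended = 0 +: l
  assert(prepended.tail eq l)

  // Append (L): every cons cell up to the end is rebuilt,
  // so the cost grows with the length of the list.
  val appended = l :+ 4
  assert(appended == List(1, 2, 3, 4))
}
```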

And not entirely related to the question, but I wouldn’t use LazyList as an intermediate builder for another collection. In this case you could use List.iterate(List(1), numRows)(...). Or in general if you want to avoid generating intermediary garbage or want to fuse intermediary operations together, I would start from Iterator instead of LazyList.
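A sketch of what those suggestions might look like (both produce the same rows as the original functions):

```scala
object NoLazyList extends App {
  // List.iterate builds the outer List directly, with no LazyList in between.
  def viaListIterate(numRows: Int): List[List[Int]] =
    List.iterate(List(1), numRows)(v => (0 +: v :+ 0).sliding(2).map(_.sum).toList)

  // Iterator avoids materializing an intermediate collection at all.
  def viaIterator(numRows: Int): List[List[Int]] =
    Iterator.iterate(List(1))(v => (0 +: v :+ 0).sliding(2).map(_.sum).toList)
      .take(numRows).toList

  assert(viaListIterate(4) == List(List(1), List(1, 1), List(1, 2, 1), List(1, 3, 3, 1)))
  assert(viaListIterate(4) == viaIterator(4))
}
```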

1 Like

I would be interested in knowing how the cats Chain perform compared to List

sliding is a major factor here that hasn’t been mentioned yet.

Starting with a version of the test case that removes the LazyList and follows good JMH benchmarking practice (like not using constants and consuming the results to make sure you’re actually computing what you think at runtime):

  @Param(Array("2000"))
  var size: Int = _

  @Benchmark
  def testVector(bh: Blackhole) = {
    var v = Vector(1)
    var i = 1
    while(i < size) {
      v = (0 +: v :+ 0).sliding(2).map(_.sum).toVector
      i += 1
    }
    bh.consume(v)
  }

  @Benchmark
  def testList(bh: Blackhole) = {
    var v = List(1)
    var i = 1
    while(i < size) {
      v = (0 +: v :+ 0).sliding(2).map(_.sum).toList
      i += 1
    }
    bh.consume(v)
  }

For this version I get

[info] Benchmark                       (size)  Mode  Cnt       Score      Error  Units
[info] CollectionBenchmark.testList      2000  avgt   20  252418.689 ± 1338.564  us/op
[info] CollectionBenchmark.testVector    2000  avgt   20  321702.395 ± 2495.787  us/op

Using GroupedIterator with additional collection conversions, so that only the sliding implementation changes while we still operate on the same types (i.e. .iterator.sliding(2).map(_.to{List|Vector}.sum) instead of .sliding(2).map(_.sum)):
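Concretely, the per-row step in that variant might look like this sketch (shown for the List case; the Vector case is analogous):

```scala
object SlidingVariants extends App {
  // Direct sliding on the collection (the default implementation).
  def rowDirect(v: List[Int]): List[Int] =
    (0 +: v :+ 0).sliding(2).map(_.sum).toList

  // Detour through Iterator's GroupedIterator, converting each group
  // back to the original element type before summing.
  def rowViaIterator(v: List[Int]): List[Int] =
    (0 +: v :+ 0).iterator.sliding(2).map(_.toList.sum).toList

  assert(rowDirect(List(1, 2, 1)) == List(1, 3, 3, 1))
  assert(rowViaIterator(List(1, 2, 1)) == rowDirect(List(1, 2, 1)))
}
```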

[info] CollectionBenchmark.testList      2000  avgt   20  219279.824 ± 2815.904  us/op
[info] CollectionBenchmark.testVector    2000  avgt   20  255130.429 ± 5141.456  us/op

It looks like we have a really bad default implementation of sliding (which is overridden by neither List nor Vector). In both cases the Iterator detour is considerably faster despite having to build an ArraySeq first which then gets converted to a Vector or List.

Finally, summing over the ArraySeq without converting back:

[info] CollectionBenchmark.testList      2000  avgt   20  188229.061 ± 2082.610  us/op
[info] CollectionBenchmark.testVector    2000  avgt   20  181409.047 ±  485.285  us/op

So toVector from an ArraySeq is considerably slower than toList. I didn’t dig any deeper here. With the relatively large instance size of 1000 (on average) this seems a bit odd but it’s probably to be expected. Building a Vector has more constant overhead. Memory use and allocations will be much lower but allocating small objects in a tight loop is fast and has good memory locality. The price you pay is increased memory footprint and higher gc pressure but we’re not measuring either of those.

There is room for improvement though. Vector building has no optimization for building from large arrays. Small arrays (<= 32) are special-cased but larger ones get appended element by element (different from appending a Vector, which is done with 1 or 2 arraycopy operations per 32-element slice, depending on alignment). The collections library should provide a general abstraction for optimizing such array-based bulk operations.

3 Likes

Here’s a better one: https://github.com/scala/scala/pull/9103

1 Like