Collect may need a way to skip the current element within the case statement

Consider the following:
view
  .collect {
    case o if f(o) => bigFn(o)
  }
  .collect {
    case p if p.value > 10 => p
  }
https://pastebin.com/skckN5DF (with formatting)

In this circumstance, for performance reasons, the fewer classes generated the better. bigFn is performance-heavy enough that we don’t want to call it twice. If we were able to return a value from a collect statement to indicate that we wanted to “skip” the current element, we would be able to avoid the performance cost of calling collect twice in a row. For example:
view
  .collect {
    case o if f(o) => {
      val oVal = bigFn(o)
      // Proposed syntax, not valid Scala today: Nothing would mean "skip this element"
      if (oVal.value > 10) oVal else Nothing
    }
  }
https://pastebin.com/ka8m3eJa

What do you think about this?

You can use flatMap:

view.flatMap { o =>
  if (!f(o)) None
  else Some(bigFn(o)).filter(_.value > 10)
}

My main reservation about doing so is the performance penalty that flatMap incurs.

Benchmark                              Mode  Cnt        Score      Error  Units
FilterBenchmark.withCollect           thrpt   10  1359035.689 ± 2749.815  ops/s
FilterBenchmark.withCollectTypeMatch  thrpt   10  1361227.743 ± 2337.850  ops/s
FilterBenchmark.withFlatMap           thrpt   10   113074.826 ±  288.107  ops/s
FilterBenchmark.withFlatMapTypeMatch  thrpt   10   113188.419 ±  262.826  ops/s

flatMap allocates an Option for every item in the sequence, whereas collect filters the items before operating on them. In the benchmark above that is roughly a 12x throughput deficit (about 1.36M vs. 0.11M ops/s), in a circumstance in which the deficit matters.
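For reference, here is a minimal self-contained sketch of the two variants being compared, written in worksheet/script style. The definitions of Foo, f, bigFn and xs are hypothetical stand-ins (the thread does not show the real ones), and this is not a benchmark: it only checks that both variants return the same elements and call bigFn once per element that passes f.

```scala
// Hypothetical stand-ins for the names used in this thread.
final case class Foo(value: Int)

var bigFnCalls = 0                       // counts how often the expensive function runs
def f(o: Int): Boolean = o % 2 == 0      // cheap pre-filter
def bigFn(o: Int): Foo = { bigFnCalls += 1; Foo(o * 3) }

val xs = Vector(1, 2, 3, 4, 5, 6)

// Two chained collects: the guard runs first, bigFn only on elements that pass it.
val viaCollect = xs.view
  .collect { case o if f(o) => bigFn(o) }
  .collect { case p if p.value > 10 => p }
  .toVector

val callsForCollect = bigFnCalls
bigFnCalls = 0

// Single flatMap: same elements, but each one goes through an Option.
val viaFlatMap = xs.flatMap { o =>
  if (!f(o)) None
  else Some(bigFn(o)).filter(_.value > 10)
}
val callsForFlatMap = bigFnCalls

println(viaCollect)   // Vector(Foo(12), Foo(18))
println(viaFlatMap)   // Vector(Foo(12), Foo(18))
```

Whether the Option wrapping costs as much in your code as in the benchmark above depends on how expensive bigFn is relative to the allocation, so it is worth measuring with your own data.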

For high-performance code like this you will see considerably larger gains by writing custom code than you will by hoping that the library happens to hit your exact use-case.

In your particular case you can write a faster version with a builder and an iterator.

val b = Array.newBuilder[Foo]
val i = xs.iterator
while (i.hasNext) {
  val o = i.next()
  if (f(o)) {
    // bigFn runs once, and only for elements that pass f
    val temp = bigFn(o)
    if (temp.value > 10) b += temp
  }
}
b.result()

This manually fuses the iterator- or view-based version

xs.iterator.filter(f).map(bigFn).filter(_.value > 10).toArray

whose overhead may also be significant if you already care about the cost of boxing in Option.

(Note: the filter/map/filter chain may still be faster than the flatMap version that boxes in Option.)
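If the same filter, compute, filter shape shows up in more than one place, the hand-written loop could be wrapped in a small helper. The sketch below is only an illustration: fusedCollect and its parameter names are hypothetical (nothing like it exists in the standard library), and Foo and the lambdas are stand-ins for the real types and functions.

```scala
import scala.reflect.ClassTag

// Hypothetical helper (not a standard-library method): filter, map, filter in one pass.
def fusedCollect[A, B: ClassTag](xs: Iterable[A])(
    pre: A => Boolean, compute: A => B, post: B => Boolean): Array[B] = {
  val b = Array.newBuilder[B]
  val it = xs.iterator
  while (it.hasNext) {
    val a = it.next()
    if (pre(a)) {
      val r = compute(a)   // the expensive step runs once, only after the pre-filter
      if (post(r)) b += r
    }
  }
  b.result()
}

final case class Foo(value: Int)   // stand-in for the thread's result type

val input = List(1, 2, 3, 4, 5, 6)

// Hand-fused single pass.
val fused = fusedCollect(input)(_ % 2 == 0, o => Foo(o * 3), (r: Foo) => r.value > 10)

// Equivalent iterator chain, for comparison.
val chained = input.iterator.filter(_ % 2 == 0).map(o => Foo(o * 3)).filter(_.value > 10).toArray

println(fused.sameElements(chained))   // true
```

Passing the three steps as function values adds some indirection of its own, so for the hottest loops the inlined version above may still be the better choice.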

Scala 2.13 has overloads in scala.PartialFunction like def andThen[C](k: PartialFunction[B, C]): PartialFunction[A, C] which enable concise composition of partial functions, so you can fuse multiple collects into one. Example: https://scastie.scala-lang.org/n8QriIZTT1K4qQvA97ljyw

import scala.{PartialFunction => PF}

val ints = List.tabulate(10)(i => i)

val skipOddNumsAndHalveEven: PF[Int, Int] = {
  case x if x % 2 == 0 => x / 2
}

def skipLowerThanAndSubtract(threshold: Int): PF[Int, Int] = {
  case x if x >= threshold => x - threshold
}

// unfused partial functions, two 'collect' passes
println(ints.collect(skipOddNumsAndHalveEven).collect(skipLowerThanAndSubtract(3)))
// fused partial functions, single 'collect' pass
println(ints.collect(skipOddNumsAndHalveEven.andThen(skipLowerThanAndSubtract(3))))

On Scala 2.12 this will throw a MatchError, because andThen, compose, and similar combinators lack support for partial functions there. On Scala 2.13 it works properly.
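If you are stuck on Scala 2.12, one possible workaround is to build the fused partial function by hand from lift and Function.unlift, both of which exist in 2.12 and 2.13. The helper name fuse below is hypothetical, and the two partial functions are simplified versions of the ones in the example above. Note that this route goes through Option again, so it is a sketch of the composition, not of the fastest option.

```scala
// Hypothetical helper, not part of the standard library:
// the result is defined only where `first` is defined AND `second` accepts its output.
def fuse[A, B, C](first: PartialFunction[A, B], second: PartialFunction[B, C]): PartialFunction[A, C] =
  Function.unlift((a: A) => first.lift(a).flatMap(second.lift))

val halveEven: PartialFunction[Int, Int] = { case x if x % 2 == 0 => x / 2 }
val subtractThree: PartialFunction[Int, Int] = { case x if x >= 3 => x - 3 }

val fusedPf = fuse(halveEven, subtractThree)

// Single 'collect' pass over 0..9; elements rejected by either step are skipped.
val result = List.tabulate(10)(i => i).collect(fusedPf)
println(result)   // List(0, 1)
```

Here fusedPf.isDefinedAt(2) is false, because 2 halves to 1 and 1 is below the threshold, so collect skips the element instead of throwing.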

@Ichoran You’re right; I’ll refactor in this case.

@tarsa Appreciate the info. That’s clean enough, I’ll definitely use that when I run into this next time in smaller collections.