Best way to generate pairs

Is there a better way to generate a list of pairs (x, y) from a given list that preserves the relative order of x and y? For example, computePairs(List(1,2,3)) returns a list containing (1,2), (1,3), and (2,3) but not (2,1), (3,1), or (3,2). It seems my solution is more complicated than necessary.

It seems fairly common that I encounter functions which I only know how to write using tail recursion.

  import scala.annotation.tailrec

  def computePairs[T](data: List[T]): List[(T, T)] = {
    @tailrec
    def recur(data: List[T], data2: List[T], acc: List[(T, T)]): List[(T, T)] =
      (data, data2) match {
        case (Nil, _)              => acc
        case (_ :: Nil, Nil)       => acc
        case (h1 :: _, h2 :: tail) => recur(data, tail, (h1 -> h2) :: acc)
        case (_ :: rest, Nil)      => recur(rest, rest.tail, acc)
      }

    if (data.isEmpty) Nil
    else recur(data, data.tail, Nil)
  }

Something to this effect:

def computePairs[T](data: List[T]): List[(T, T)] =
  data
    .tails
    .filter(_.nonEmpty)
    .flatMap(xs => xs.tail.map((xs.head, _)))
    .toList
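In case it helps to see why the filter(_.nonEmpty) step is there, here is a minimal sketch (my own, not from the thread) of what tails produces and how the pairs fall out of it:

```scala
// tails enumerates the suffixes of a list, longest first, ending with the empty list
val suffixes = List(1, 2, 3).tails.toList
// List(List(1, 2, 3), List(2, 3), List(3), List())

// pairing each suffix's head with the rest of that suffix yields the ordered pairs;
// the empty suffix has no head, hence the nonEmpty filter
val pairs = List(1, 2, 3).tails
  .filter(_.nonEmpty)
  .flatMap(xs => xs.tail.map((xs.head, _)))
  .toList
// List((1,2), (1,3), (2,3))
```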

Or if you wanna be really lame:

def computePairs[T](data: List[T]): List[(T, T)] =
  data
    .combinations(2)
    .map{ case List(a, b) => (a, b) }
    .toList

But that just takes all the fun out of it, doesn’t it? :joy:

It’s also not exactly the same thing in case your data can contain duplicates.
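A minimal sketch (my own example) of the difference: combinations deduplicates, so repeated elements yield fewer pairs than the suffix-based approach, which keeps every positional pair.

```scala
val data = List(1, 1, 2)

// suffix-based pairing keeps every positional pair, duplicates included
val viaTails = data.tails
  .filter(_.nonEmpty)
  .flatMap(xs => xs.tail.map((xs.head, _)))
  .toList
// List((1,1), (1,2), (1,2))

// combinations(2) treats the two 1s as interchangeable and deduplicates,
// producing only two distinct combinations
val viaCombinations = data.combinations(2)
  .map { case List(a, b) => (a, b) }
  .toList
```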

That implementation is REALLY SLOW. On a test case of length 7311, my implementation finishes in about a second (wall-clock timing); I let Jasper-M’s implementation run for 5 minutes and it never finished.

That’s strange. All three seem to run out of memory for me on a default setup.

I changed my heap size to 2500 MB.

Without really measuring, your solution and my tails solution seem similar in memory and time usage, though they take considerably longer than a second for me. I wonder what kind of machine you’re using.
That the combinations one is so slow might be related to the duplicate checking.

Here’s what I’m using

Here are my three functions


  def computePairs3[T](data: List[T]): List[(T, T)] =
    data
      .tails
      .filter(_.nonEmpty)
      .flatMap(xs => xs.tail.map((xs.head, _)))
      .toList

  def computePairs2[T](data: List[T]): List[(T, T)] =
    data
      .combinations(2)
      .map{ case List(a, b) => (a, b) }
      .toList

  import scala.annotation.tailrec

  def computePairs[T](data: List[T]): List[(T, T)] = {
    @tailrec
    def recur(data: List[T], data2: List[T], acc: List[(T, T)]): List[(T, T)] =
      (data, data2) match {
        case (Nil, _)              => acc
        case (_ :: Nil, Nil)       => acc
        case (h1 :: _, h2 :: tail) => recur(data, tail, (h1 -> h2) :: acc)
        case (_ :: rest, Nil)      => recur(rest, rest.tail, acc)
      }

    if (data.isEmpty) Nil
    else recur(data, data.tail, Nil)
  }

I’ve set up a small function to test the execution using System.nanoTime


  def main(argv: Array[String]): Unit = {
    for {
      i <- 800 to 5000 by 100
      data = (1 to i).toList
    } {
      println(s"length = $i")
      val t0 = System.nanoTime()
      computePairs(data)
      val t1 = System.nanoTime()
      println(s"computePairs ${(t1 - t0) / 1e9}")
      computePairs2(data)
      val t2 = System.nanoTime()
      println(s"computePairs2 ${(t2 - t1) / 1e9}")
      computePairs3(data)
      val t3 = System.nanoTime()
      println(s"computePairs3 ${(t3 - t2) / 1e9}")
    }
  }

It seems the tail recursive version is by far the fastest.

length = 2000
computePairs 0.011500182
computePairs2 19.731632092
computePairs3 0.05089523
length = 2100
computePairs 0.0128521
computePairs2 21.493444822
computePairs3 0.061381673
length = 2200
computePairs 0.013651585
computePairs2 25.100234025
computePairs3 0.066952978
length = 2300
computePairs 0.014679647
computePairs2 29.491569607
computePairs3 0.074526357
length = 2400
computePairs 0.018036686
computePairs2 31.925753898
computePairs3 0.07187178
length = 2500
computePairs 0.017542373
computePairs2 36.101928877
computePairs3 0.086673232

How about this:

val list = List(1, 2, 3, 4)

val result = list.tails.foldLeft[List[(Int, Int)]](Nil)((acc, subList) => {
  subList match {
    case head :: tail => tail.map((head, _)).reverse ::: acc
    case _ => acc
  }
}).reverse

println(result) // prints: List((1,2), (1,3), (1,4), (2,3), (2,4), (3,4))

???

Or this:

val list = List(1, 2, 3, 4)

val result = List.unfold(list) {
  case head :: tail => Some(tail.map((head, _)) -> tail)
  case Nil => None
}.flatten

println(result) // prints: List((1,2), (1,3), (1,4), (2,3), (2,4), (3,4))

.reverse is not necessary. I don’t care about the order of the computed list.

I don’t seem to have List.unfold

I added the function

  def computePairs4[T](data:List[T]):List[(T,T)] =
    data.tails.foldLeft[List[(T, T)]](Nil)((acc, subList) => {
      subList match {
        case head :: tail => tail.map((head, _)) ::: acc
        case _ => acc
      }
    })

So I ran the timing test again.

length = 1000
computePairs 0.033336759
computePairs2 2.66880161
computePairs3 0.030787732
computePairs4 0.01413223
length = 2000
computePairs 0.125602738
computePairs2 18.915179939
computePairs3 0.036282752
computePairs4 0.026724497
length = 3000
computePairs 0.026496565
computePairs2 62.851550954
computePairs3 0.09019425
computePairs4 0.069243984
length = 4000
computePairs 4.581079107
computePairs2 272.994857273
computePairs3 1.659703356
computePairs4 13.428980719

List.unfold was added in Scala 2.13. There is also Iterator.unfold and many more unfolds.

So now computePairs3 is faster than computePairs? Looks like your benchmark results are rather unstable. I would consider using sbt-jmh if performance numbers are really important.

I have a suspicion that your recursive version is the fastest because it generates the least amount of non-optimizable garbage. Even here: tail.map((head, _)) ::: acc - first the prefix is generated (tail.map((head, _))), and then prepending it to acc requires rebuilding the prefix to point to the new tail.
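A small check (my own sketch) of the sharing behaviour described above: ::: can reuse the right-hand list as-is, but it has to allocate fresh cons cells for the whole prefix, because every prefix cell needs a different tail than it had before.

```scala
val prefix = List(1, 2, 3)
val suffix = List(4, 5)
val joined = prefix ::: suffix

// the suffix is shared structurally: walking past the prefix
// lands on the very same suffix object, no copying on the right
val sharesSuffix = joined.drop(prefix.length) eq suffix

// the prefix cells are fresh allocations: joined does not
// begin with prefix's own cons cells
val sharesPrefix = joined eq prefix
```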

For reducing garbage probably something like that would be good:

  def computePairs[T](data: List[T]): List[(T, T)] = {
    (for (head :: tail <- data.tails) yield {
      tail.iterator.map((head, _))
    }).flatten.toList
  }

or if you manage to get Scala 2.13 working then:

  def computePairs[T](data: List[T]): List[(T, T)] = {
    Iterator.unfold(data) {
      case head :: tail => Some(tail.iterator.map((head, _)) -> tail)
      case Nil => None
    }.flatten.toList
  }

It seems to me that building a list just in order to flatten it is inherently wasteful. Using flatMap avoids the need to create an intermediate object just to flatten it later.
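As a sketch of that idea (my own variant, not anyone’s posted code): flatMapping directly over the tails iterator produces the pairs without materialising a top-level list of lists first.

```scala
// pairs each suffix head with the rest of its suffix, flattening as it goes;
// nothing is built only to be flattened afterwards
def computePairsFlat[T](data: List[T]): List[(T, T)] =
  data.tails.flatMap {
    case head :: tail => tail.map((head, _))
    case Nil          => Nil
  }.toList
```

Each suffix still allocates its inner list of pairs, but the outer structure is consumed lazily by the iterator.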

Yes, I want to convert the project to a newer Scala version eventually. Perhaps it is time to do so.

The test case is done simply with lists of integers. But in my actual program the objects are heftier. I suppose this is why I was getting an out-of-memory error.

In some examples I’ve built lists of lists, but in others I’ve built iterators of iterators. Creating or flattening iterators in Scala 2.13 should be a constant-time operation, as there are many iterator implementations and specialized methods on iterators.

list.tails returns an Iterator; so do list.iterator and Iterator.unfold (obviously).

Another Scheme-like approach is to write a function that maps over the cons cells, and a function that does collection using a poor man’s continuation-passing style. Here is an example implementation:

  import scala.annotation.tailrec

  // call the given client function on each cons cell in the given list,
  // discarding the return values
  def mapConsCells[T](data:List[T])(client:List[T]=>Unit):Unit = {
    @tailrec
    def recur(data: List[T]):Unit = {
      if (data.nonEmpty) {
        client(data)
        recur(data.tail)
      }
    }
    recur(data)
  }

  def withCollector[T](continuation: (T => Unit) => Unit): List[T] = {
    var objects: List[T] = Nil

    def collect(obj: T): Unit = {
      objects = obj :: objects
    }

    continuation(collect)
    objects
  }

  def computePairs5[T](data:List[T]):List[(T,T)] = {
    withCollector(collect => {
      mapConsCells(data) { c: List[T] =>
        val a = c.head
        c.tail.foreach { b: T => collect((a, b)) } }
    })
  }