# Best way to generate pairs

Is there a better way to generate a list of pairs `(x, y)` from a given list, one which preserves the relative order of `x` and `y`? For example, `computePairs(List(1,2,3))` returns a list containing `(1,2)`, `(1,3)`, and `(2,3)` but not `(2,1)`, `(3,1)`, or `(3,2)`. It seems my solution is more complicated than necessary.

It seems fairly common that I encounter functions which I only know how to write using tail recursion.

``````
def computePairs[T](data: List[T]): List[(T, T)] = {
  val nil: List[(T, T)] = Nil

  def recur(data: List[T], data2: List[T], acc: List[(T, T)]): List[(T, T)] = {
    (data, data2) match {
      case (Nil, _)               => acc
      case (h1 :: Nil, Nil)       => acc
      case (h1 :: _, h2 :: tail)  => recur(data, tail, (h1 -> h2) :: acc)
      case (h1 :: _ :: tail, Nil) => recur(data.tail, data.tail.tail, acc)
    }
  }

  if (data.isEmpty)
    nil
  else
    recur(data, data.tail, nil)
}
``````

Something to this effect:

``````
def computePairs[T](data: List[T]): List[(T, T)] =
  data
    .tails
    .filter(_.nonEmpty)
    .flatMap(sub => sub.tail.map(sub.head -> _))
    .toList
``````

Or if you wanna be really lame:

``````
def computePairs[T](data: List[T]): List[(T, T)] =
  data
    .combinations(2)
    .map { case List(a, b) => (a, b) }
    .toList
``````

But that just takes all the fun out of it, doesn’t it?

It’s also not exactly the same thing in case your data can contain duplicates.
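To make that concrete, here is a small sketch showing how the two approaches diverge once duplicates appear (`dup` is just a made-up input):

```scala
val dup = List(1, 1, 2)

// combinations treats the two 1s as interchangeable and emits each
// distinct combination only once
val viaCombinations =
  dup.combinations(2).map { case List(a, b) => (a, b) }.toList
// List((1,1), (1,2))

// the positional expansion over suffixes keeps both (1,2) pairs
val viaTails =
  dup.tails.filter(_.nonEmpty).flatMap(s => s.tail.map(s.head -> _)).toList
// List((1,1), (1,2), (1,2))
```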

That implementation is REALLY SLOW. On a test case of length 7311, my implementation finishes in about a second (wall-clock timing); I let Jasper-M's implementation run for 5 minutes and it never finished.

That’s strange. All 3 seem to be going out of memory for me on a default setup.

I changed my heap size to 2500 MB.

Without really measuring, your solution and my `tails` solution seem to be similar in memory and time usage, though they take considerably longer than a second for me. I wonder what kind of machine you're using.
That the `combinations` one is so slow might be related to its duplicate checking.

Here’s what I’m using

Here are my three functions

``````
def computePairs3[T](data: List[T]): List[(T, T)] =
  data
    .tails
    .filter(_.nonEmpty)
    .flatMap(sub => sub.tail.map(sub.head -> _))
    .toList

def computePairs2[T](data: List[T]): List[(T, T)] =
  data
    .combinations(2)
    .map { case List(a, b) => (a, b) }
    .toList

def computePairs[T](data: List[T]): List[(T, T)] = {
  val nil: List[(T, T)] = Nil

  def recur(data: List[T], data2: List[T], acc: List[(T, T)]): List[(T, T)] = {
    (data, data2) match {
      case (Nil, _)               => acc
      case (h1 :: Nil, Nil)       => acc
      case (h1 :: _, h2 :: tail)  => recur(data, tail, (h1 -> h2) :: acc)
      case (h1 :: _ :: tail, Nil) => recur(data.tail, data.tail.tail, acc)
    }
  }

  if (data.isEmpty)
    nil
  else
    recur(data, data.tail, nil)
}
``````

I’ve set up a small function to test the execution using `System.nanoTime`

``````
def main(argv: Array[String]): Unit = {
  //val localTemps = locateTemperatures(1975, "/stations.csv", "/head1975.csv")
  // triangularMesh()
  for { i <- 800 to 5000 by 100
        data = (1 to i).toList } {
    println(s"length = $i")
    val t0 = System.nanoTime()
    computePairs(data)
    val t1 = System.nanoTime()
    println(s"computePairs ${(t1 - t0) / 1e9}")
    computePairs2(data)
    val t2 = System.nanoTime()
    println(s"computePairs2 ${(t2 - t1) / 1e9}")
    computePairs3(data)
    val t3 = System.nanoTime()
    println(s"computePairs3 ${(t3 - t2) / 1e9}")
  }
}
``````

It seems the tail recursive version is by far the fastest.

``````length = 2000
computePairs 0.011500182
computePairs2 19.731632092
computePairs3 0.05089523
length = 2100
computePairs 0.0128521
computePairs2 21.493444822
computePairs3 0.061381673
length = 2200
computePairs 0.013651585
computePairs2 25.100234025
computePairs3 0.066952978
length = 2300
computePairs 0.014679647
computePairs2 29.491569607
computePairs3 0.074526357
length = 2400
computePairs 0.018036686
computePairs2 31.925753898
computePairs3 0.07187178
length = 2500
computePairs 0.017542373
computePairs2 36.101928877
computePairs3 0.086673232
``````

``````
val list = List(1, 2, 3, 4)

val result = list.tails.foldLeft[List[(Int, Int)]](Nil)((acc, subList) => {
  subList match {
    case head :: tail => tail.map((head, _)) ::: acc
    case _            => acc
  }
}).reverse

println(result) // prints: List((1,4), (1,3), (1,2), (2,4), (2,3), (3,4))
``````

???

Or that:

``````
val list = List(1, 2, 3, 4)

val result = List.unfold(list) {
  case Nil          => None
  case head :: tail => Some((tail.map(head -> _), tail))
}.flatten

println(result) // prints: List((1,2), (1,3), (1,4), (2,3), (2,4), (3,4))
``````

`.reverse` is not necessary. I don’t care about the order of the computed list.
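Since only the set of pairs matters, the variants can be cross-checked by comparing results as sets. A sketch, using the `combinations` version as the reference (valid only for inputs without duplicates, per the earlier caveat):

```scala
// reference implementation via the standard library
def pairsRef[T](data: List[T]): List[(T, T)] =
  data.combinations(2).map { case List(a, b) => (a, b) }.toList

// the tails-based variant under test
def pairsTails[T](data: List[T]): List[(T, T)] =
  data.tails.filter(_.nonEmpty).flatMap(s => s.tail.map(s.head -> _)).toList

// order-insensitive comparison
val agree = pairsTails(List(1, 2, 3, 4)).toSet == pairsRef(List(1, 2, 3, 4)).toSet
```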

I don’t seem to have `List.unfold`

``````
def computePairs4[T](data: List[T]): List[(T, T)] =
  data.tails.foldLeft[List[(T, T)]](Nil)((acc, subList) => {
    subList match {
      case head :: tail => tail.map((head, _)) ::: acc
      case _            => acc
    }
  })
``````

So I ran the timing test again.

``````length = 1000
computePairs 0.033336759
computePairs2 2.66880161
computePairs3 0.030787732
computePairs4 0.01413223
length = 2000
computePairs 0.125602738
computePairs2 18.915179939
computePairs3 0.036282752
computePairs4 0.026724497
length = 3000
computePairs 0.026496565
computePairs2 62.851550954
computePairs3 0.09019425
computePairs4 0.069243984
length = 4000
computePairs 4.581079107
computePairs2 272.994857273
computePairs3 1.659703356
computePairs4 13.428980719
``````

`List.unfold` was added in Scala 2.13. There is also `Iterator.unfold` and many more `unfold`s.
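For anyone else stuck on the name: `unfold` repeatedly applies a step function to a state and stops when it returns `None`. A tiny example:

```scala
// count down from a seed; each step emits the current value and the next state
val countdown = List.unfold(3)(n => if (n == 0) None else Some((n, n - 1)))
// countdown == List(3, 2, 1)
```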

So now `computePairs3` is faster than `computePairs`? Looks like your benchmark results are rather unstable. I would consider using `sbt-jmh` if performance numbers are really important.
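Short of setting up `sbt-jmh`, even a few warm-up runs before the measured one make `System.nanoTime` numbers somewhat less noisy. A rough sketch, not a substitute for JMH (`timed`, `label`, and `warmup` are made-up names):

```scala
// run `body` a few times untimed so the JIT gets a chance to compile it,
// then time one measured run and print the elapsed seconds
def timed[A](label: String, warmup: Int = 3)(body: => A): A = {
  (0 until warmup).foreach(_ => body)
  val t0 = System.nanoTime()
  val result = body
  println(s"$label ${(System.nanoTime() - t0) / 1e9} s")
  result
}
```

Usage would look like `timed("computePairs")(computePairs(data))`.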

I have a suspicion that your recursive version is the fastest because it generates the least amount of non-optimizable garbage. Even here, in `tail.map((head, _)) ::: acc`, first the prefix is generated (`tail.map((head, _))`), and then prepending it to `acc` requires rebuilding the prefix so it can point at the new tail.
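To make that concrete: consing straight onto the accumulator skips the intermediate mapped list entirely (the pairs come out in reversed order, which is fine here since order doesn't matter). A sketch:

```scala
val head = 1
val tail = List(2, 3, 4)
val acc: List[(Int, Int)] = Nil

// tail.map((head, _)) ::: acc first allocates the mapped list and then
// copies its cells so the last one can point at acc; a fold conses each
// pair directly onto the accumulator instead
val direct = tail.foldLeft(acc)((a, x) => (head, x) :: a)
// direct == List((1,4), (1,3), (1,2))
```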

For reducing garbage probably something like that would be good:

``````
def computePairs[T](data: List[T]): List[(T, T)] = {
  (for (head :: tail <- data.tails) yield {
    tail.map(head -> _)
  }).flatten.toList
}
``````

or if you manage to get Scala 2.13 working then:

``````
def computePairs[T](data: List[T]): List[(T, T)] = {
  Iterator.unfold(data) {
    case Nil          => None
    case head :: tail => Some((tail.map(head -> _), tail))
  }.flatten.toList
}
``````

It seems to me that building a list just in order to flatten it is inherently wasteful. If one can use `flatMap`, that avoids the need to create an intermediate object just to flatten it later.
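Along those lines, a `flatMap` over the `tails` iterator never materialises the inner lists at all. A sketch (`computePairsFlat` is just a name for this variant):

```scala
// flatMap over the lazy tails iterator; each suffix contributes an
// iterator of pairs, so no intermediate list-of-lists is ever built
def computePairsFlat[T](data: List[T]): List[(T, T)] =
  data.tails.flatMap {
    case head :: tail => tail.iterator.map(head -> _)
    case Nil          => Iterator.empty
  }.toList
```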

Yes, I want to convert the project to a newer Scala version eventually. Perhaps it is time to do so.

The test case is done simply with lists of integers. But in my actual program the objects are heftier. I suppose this is why I was getting an out-of-memory error.

In some examples I've built lists of lists, but in others I've built iterators of iterators. Creating or flattening iterators in Scala 2.13 should be a constant-time operation, as there are many iterator implementations and specialized methods on iterators.

`list.tails` returns `Iterator`. `list.iterator` and `Iterator.unfold` too (obviously).

Another Scheme-like approach is to write a function that maps over the cons cells, and a function that does the collection using a poor man's continuation-passing style. Here is an example implementation:

``````
import scala.annotation.tailrec

// call the given client function on each cons cell in the given list,
// i.e. on the list itself and on each of its non-empty suffixes
def mapConsCells[T](data: List[T])(client: List[T] => Unit): Unit = {
  @tailrec
  def recur(data: List[T]): Unit = {
    if (data.nonEmpty) {
      client(data)
      recur(data.tail)
    }
  }
  recur(data)
}

// hand a collector function to the continuation and return everything
// it collected
def withCollector[T](continuation: (T => Unit) => Unit): List[T] = {
  var objects: List[T] = Nil

  def collect(obj: T): Unit = {
    objects = obj :: objects
  }

  continuation(collect)
  objects
}

def computePairs5[T](data: List[T]): List[(T, T)] = {
  withCollector(collect => {
    mapConsCells(data) { c: List[T] =>