Collect adjacent elements into pairs

jimka · July 23, 2019, 1:12pm

What is the most elegant way using the standard library to group consecutive paris of elements into 2-tuples? I’m not sure what type of data structure the elements are given in, so hopefully the solution could be fairly general, working on arrays, lists, sequences, etc.

Something like the following

List(1,2,3,4,5,1,2,3,4,5) -> List((1,2),(3,4),(5,1),(2,3),(4,5))

Of course one element might be left over so it would have to be handled specially, e.g., return a pair consisting of the list of pairs and an Option of the remaining element.

I can easily do this with a tail recursive function given that the input is a List or an Array, but can it be done as some combination of drop/fold/zip/scan for example?

Jasper-M · July 23, 2019, 1:23pm

You can use grouped. And then convert the intermediate lists to tuples.

scala> def pairs[A](as: List[A]) = as.grouped(2).map {
     |   case List(x, y) => (x, y)
     |   case _          => sys.error("uneven size")
     | }.toList
pairs: [A](as: List[A])List[(A, A)]

scala> pairs(List(1,2,3,4,5,1,2,3,4,5))
res13: List[(Int, Int)] = List((1,2), (3,4), (5,1), (2,3), (4,5))

jimka · July 23, 2019, 1:30pm

Ahh that works, I don’t really need tuples, I can also use lists, and just pattern match against 2-element lists or 1 element lists. Thanks.

jimka · July 24, 2019, 9:11am

Thanks for the suggestion. Here is what I have now, which leads to other questions.

The first question comes because grouped drastically changes the type of its input (outputs a very different type than its input). Certainly the type must be changed/tweaked, because of pairing, but it would be nice if it grouped its input by converted a M[A] to M[List[A]]. Does such a function exist?

Having such a function should (???) allow me to rewrite the function below so that the call to map would respect the fact that the given object is a par object such as List(1,2,3,4,5).par

  def pairReduce_v1[B](m: List[B])(init: B)(combop: (B, B) => B):B = {
    val reduced = m.grouped(2).map { b: List[B] =>
      b match {
        case List() => init
        case List(x) => x
        case List(x, y) => combop(x, y)
      }
    }.toList   // unfortunately we must convert back to list now :-(
    if (reduced.isEmpty)
      init
    else if (reduced.tail.isEmpty)
      reduced.head
    else // recursive call uses the identity function as seqop
      pairReduce_v1(reduced)(init)(combop)
  }

Would it be possible to rewrite this function something like the following. Am I going to have to implement a type class?

  def pairReduce_v1[A, B](m: Seq[A])(init: B)(combop: (B, B) => B):B = {
    val reduced = m.grouped(2).map { b: List[B] =>
      b match {
        case List() => init
        case List(x) => x
        case List(x, y) => combop(x, y)
      }
    }
    if (reduced.isEmpty)
      init
    else if (reduced.tail.isEmpty) // TODO how to ask this question for arbitrary Seq ???
      reduced.head
    else // recursive call uses the identity function as seqop
      pairReduce_v1(reduced)(init)(combop)
  }

jducoeur · July 24, 2019, 12:59pm

Do you mean the fact that the series is output as an Iterable? That seems necessary for efficiency, so that it is lazy – basically, the onus is on you to force that to be strict, but I probably wouldn’t recommend that.

And it’s really unrelated to the original type – it’s taking an M[A], and producing an Iterable[M[A]], composed of the subsets. I think what you’re asking for is for that to return M[M[A]], and I don’t think that actually makes sense in practice.

I really don’t think that makes sense in the general case. Consider: grouped() is also implemented on collections such as Map, but it wouldn’t make any sense for its output to be a Map[Map[]]. The “wrapper” type in grouped() (the Iterable) really is conceptually separate from the type that is being grouped.

I’m not sure, but my instinct says yes – problems where you are starting with a generic type T, and need to implement a function that winds up creating another instance of T, tend to be best solved with a typeclass.

Otherwise, you tend to wind up with the f-bound problem. This isn’t quite the same problem, but feels like it’s likely to hit similar issues – the clause in which you are calculating reduced feels like a classic f-bound situation.

jimka · July 24, 2019, 2:41pm

This reasoning seems curious to me. The function in question is a library function of sorts. The caller can imagine that it is sitting next to foldLeft and fold etc. The caller might call it with List(1,2,3,4,5,...), or he might call it with List(1,2,3,4,5,....).par. In the latter case the caller would expect the computation to be possibly done in parallel, but in effect only the pairing (obj.grouped(2)) would be done in parallel. In my opinion, it is the caller who decides whether the parallization should be done (depending on his application, shared resources etc). And the way to specify this desire, using the scala standard library, is to make the data parallizable, and then use the same 2nd order functions.

Furthermore, with regard to lazy vs greedy computation, again it should be the caller who expresses this desire, not the library function.

Does my point of view make sense? Or am I thinking too simplistically.

jducoeur · July 24, 2019, 5:04pm

See above – I think the key point of disagreement is that you think the “outer” result type should have something to do with the input type, and I don’t think that actually makes sense in practice. Only the “inner” type is related to it.

Keep in mind: List.grouped() isn’t implemented on List – it’s implemented on a supertype, way up the inheritance stack. It’s a generic function for grouping values in collections. Outputting the same type is tricky.

And like I said above, returning the original collection type would be nonsensical in cases like Map – I don’t think there’s even a function signature for that that would make any sense.

So honestly, I think you’re asking for a function that is defined differently from the one that actually exists: you’re expecting rather unusual requirements, that only make sense for a subset of the collections…