Detect consecutive elements in a list

jimka · May 28, 2019, 8:35am

The List and Seq libraries have lots of interesting as well as obscure methods. Mastering them is fun an challenging.

Given a sequence, (in my case it its a List), and two candidate elements, a and b, what is the best way to determine whether the list contains a followed immediately by b somewhere in it? Here is my attempt, but I hope there’s an easier way.

    (!list.isEmpty &&
      (list.head :: list).tails.exists {
        case x :: y :: _ => x == a && y == b
        case _ => false
      })

I have to use (list.head::list).tails because tails does not contain the list itself. Could also use list::(list.tails).

martijnhoekstra · May 28, 2019, 9:21am

if you don’t want to use the stdlib method containsSlice (so list.containsSlice(List(a, b))) how about

val a: A
val b: A

def containsab(list: List[A]) = list match {
  case `a` :: `b` :: _ => true
  case  _ :: tail => containsab(tail)
  case _ => false
}

If you want to make your own containsSlice with combinators

def containsSlice[A](list: List[A], sublist: List[A]) = {
  list.sliding(toFind.length).contains(toFind)
  //or, more efficiently list.iterator.sliding(toFind.length).exists(it => it.sameElements(sublist))
}

or if you want to write out the recursion by hand

def containsSlice[A](list: List[A], sublist: List[A]) = list.startsWith(sublist) ||
  list.tailOption match {
    case Some(tail) => containsSlice(tail)
    case None => false
  }

the prettier

def containsSlice[A](list: List[A], sublist: List[A]) = list.startsWith(sublist)
  || list.tailOption.map(t => containsSlice(t, sublist)).getOrElse(false)

won’t work as the recursive call is inside the map, it won’t optimize out.

the “use unsafe methods for performance” version would be

def containsSlice[A](list: List[A], sublist: List[A]) = list.startsWith(sublist) ||
  list.nonEmpty && containsSlice(list.tail)

curoli · May 28, 2019, 2:30pm

**scala> val seq: Seq[Int] = 1 to 10
seq: Seq[Int] = Range 1 to 10

(seq.dropRight(1)).zip(seq.drop(1)).contains((5, 6))
res0: Boolean = true
(seq.dropRight(1)).zip(seq.drop(1)).contains((6, 5))
res1: Boolean = false**

stewSquared · May 28, 2019, 10:19pm

As @martijnhoekstra mentioned, this can be directly implemented with containsSlice:

list.containsSlice(Seq(a, b))

But since you’re here to master “interesting and obscure methods”, here are a few alternatives using standard collection methods:

list.indexOfSlice(Seq(a, b)) != -1

list.sliding(2).contains(Seq(a, b))

list.zip(list.drop(1)).contains((a, b))

list.tails.exists(_.startsWith(Seq(a, b)))

Btw, containsSlice uses indexOfSlice, which has a reasonably performant implementation.

jimka · May 29, 2019, 7:37am

@stewSquared, your comment made me think of the real problem that I was trying to solve. Perhaps there’s much better way that what I did. Here is the problem.
Given a sequence s, of elements of type A, and a function f:A=>B. I want to split s into a sequence of sequences, in particular split on the boundaries where f(x) == f(y), even if x !=y … omitting sequences of length one. E.g., if the seq contains a,b,c,d,esuch that f(a)==f(b)==...==f(e), then only keep a and e in the output seq, omitting b...d.

I did this in 8 lines using a tail recursive function and pattern matching. Maybe there’s a more clever way?

    def splitOnBoundary[A,B](s:List[A],f:A=>B):List[List[A]] = {
      def split(acc:List[List[A]],as:List[A],remaining:List[A]):List[List[A]] = {
        (as,remaining ) match {
          case (Nil,Nil) => acc.reverse
          case (s1::Nil,s2::t2) if f(s1) == f(s2) => split(acc,List(s2),t2)
          case (s1::t1, s2::t2) if f(s1) == f(s2) => split(as.reverse::acc,List(s2),t2)
          case (s1::_, s2::t2) => split(acc,s2::as,t2)
          case (Nil,s2::t2) => split(acc,List(s2),t2)
          case (s2::Nil,Nil) => split(acc,Nil,Nil)
          case (_,Nil) => split(as.reverse::acc,Nil,Nil)
        }
      }
      split(Nil,Nil,s)
    }

jimka · May 29, 2019, 8:29am

I can remove two of the cases, if I simply filter out singleton lists as a post filter.

    def splitOnBoundary[A, B](s: List[A], f: A => B): List[List[A]] = {
      def split(acc: List[List[A]], as: List[A], remaining: List[A]): List[List[A]] = {
        (as, remaining) match {
          case (Nil, Nil) => acc.reverse
          case (s1 :: _, s2 :: t2) if f(s1) == f(s2) => split(as.reverse :: acc, List(s2), t2)
          case (s1 :: _, s2 :: t2) => split(acc, s2 :: as, t2)
          case (Nil, s2 :: t2) => split(acc, List(s2), t2)
          case (_, Nil) => split(as.reverse :: acc, Nil, Nil)
        }
      }
      split(Nil, Nil, s).filter(_.length > 1)
    }

martijnhoekstra · May 29, 2019, 11:35am

slightly simplified:

def splitOnBoundary[A](as: List[A], equiv: (A, A) => Boolean): List[A] = {
  def rec(remaining: List[A], agg: List[A]) = remaining match {
    case Nil => agg.reverse
    case (h1 :: t1) => agg match {
      case (ha :: sa :: ta) if equiv(h1, ha) && equiv(ha, sa) => rec(t1, h1 :: sa :: ta)
      case _ => rec(t1, h1 :: agg)
    }
  }
  rec(as, Nil)
}

the aggregate is a flat list, and you skip the previously checked element if and only if the next element, the previously checked element, and the element before that are equivalent

if you want to call it with the projection, derive the equivalence relation from that and universal equality:

def splitOnBoundry[A, B](as: List[A], proj: A => B): List[A] = splitOnBoundry(as, (a1, a2) => proj(a1) == proj(a2))

jimka · May 29, 2019, 11:55am

@martijnhoekstra, I don’t understand how your implementation will work. Isn’t it incompatible with type erasure? How are the two functions named splitOnBoundry which both specialize on List and Function distinguished after type erasure?

martijnhoekstra · May 29, 2019, 11:57am

one is Function1[Erased, Erased], the other is Function2[Erased, Erased, Erased] – but I may be wrong there. In that case, you’d need to change one of the two method names.