Sceptical of zip

jimka · June 5, 2019, 7:42am

I admit that I’m sceptical of zip because it seems (to me) like a hack to get around the limitation that arity of method calls in Scala must be known at compile time. I suspect my scepticism is misguided, but don’t really know how to verify my suspicion.

To avoid using zip I often use a tail-recursive local function which simultaneously traverses both Lists.
Here is an example. Can someone suggest a better approach? I’m happy to have suggestions. I admit that my implementation is obscure, and every time I look at it again, I have to convince myself again that it works.

The function recursively traverses two Lists, clause1 and clause2. At each iteration it looks at the heads of the two lists. If they are not equal in absolute value, the recursion stops and false is returned, otherwise, the recursion continues. As soon as more than one such difference is detected, the recursion stops and false is returned. If the end of the lists are reached, then true or false is returned depending on whether exactly 1 difference was discovered.

I think I could write this function in a less obfuscated way using zip and count, but 1) I’m afraid zip might create an intermediate list which is unnecessary, and 2) I never want to count higher than 2. As soon as 2 is reached, I want count to abort and qmCompatible_? should return false.

This is actually important for my application, as the lists might be very long, and there might be a large number of such lists which are being tested.

  type Clause = List[Int]

  def qmCompatible_?(clause1: Clause, clause2: Clause): Boolean = {
    // We assume this function is only called with lists of the same
    //   length, if it is called with lists of different lengths, the
    //   results are undefined.
    // Given two lists of Int (Clause), determine whether they
    // are equal, except for exactly one pair of corresponding entries
    // which are equal in absolute value.
    // E.g., List(1, -2, 3) == List(1, 2, 3)
    // but   List(-1, -2, 3) != List(1, 2, -3)
    // and   List(1, 2, 4) != List(1, 2, 3)
    def loop(clause1: Clause, clause2: Clause, diff: Int): Boolean = {
      if (diff > 1) false
      else (clause1, clause2) match {
        case (a :: as, b :: bs) =>
          equalAbs(a, b) && loop(as, bs, if (a == b) diff else diff + 1)
        case (_, _) => 1 == diff
      }
    }

    loop(clause1, clause2, diff=0)
  }

  def equalAbs(x: Int, y: Int): Boolean = {
    abs(x) == abs(y)
  }

SethTisue · June 5, 2019, 8:03am

If you’re worried about the intermediate list, use an Iterator, or, in Scala 2.13, a View.

Jasper-M · June 5, 2019, 11:17am

Could you illustrate that with an example that shows how zip would not be necessary if arity of method calls must not be known at compile time? I don’t really see the connection.

Or lazyZip.

jimka · June 5, 2019, 12:15pm

Sorry to reference Common Lisp. I’m not trying to make an argument for CL, just that that is what I’m most familiar with. But in Common Lisp the mapping functions take any n-ary function and n-many lists to iterate over. In Scala-speak would look something like the following.

List(1,2,3).fold(0){_+_}
(List(1,2,3),List(10,20,30)).fold(0,0){_+_+_}
(List(1,2,3),List(10,20,30),List(100,200,300)).fold(0,0,0){_+_+_+_}

var s = somelist
var t = somelist
(s,t).fold(z1,z2){_+_+_}
(s,t).exists(_>_)   // is there an element of s which is > than the corresponding element of t
(s,t).count(_==_) // count corresponding equal elements
(s,t).forall(_!=_)
(s,t).map(_+_) // add the elements of s pairwise to the elements of t
(s,t).flatMap(f) // f is some binary function

What I meant about knowing the artily at compile-time vs run-time was referring to the fact that in CL I can have a list of lists, where I don’t know the length of the outer nor inner lists at compile time, and I can still do the map or flatMap (in CL called mapcar and mapcan), as long as the types are correct at run-time. The following code takes a list of n lists each of length m, and returns a list of m lists each of length n.

(apply #'mapcar #'list lists-of-lists)

CL-USER> (let ((ll (list (list 1 2 3)
                         (list 10 20 30)
                         (list 100 200 300)
                         (list 1000 2000 3000))))
           (apply #'mapcar #'list ll))
((1 10 100 1000) (2 20 200 2000) (3 30 300 3000))
CL-USER> (let ((ll (list (list 1 2 3)
                         (list 10 20 30)
                         (list 100 200 300)
                         (list 1000 2000 3000))))
           (apply #'mapcan #'list ll))
(1 10 100 1000 2 20 200 2000 3 30 300 3000)
CL-USER>

jimka · June 5, 2019, 12:18pm

Yes, I applaud the enhancements in 2.13. Great job, congratulations to all the contributors.

MarkCLewis · June 5, 2019, 2:38pm

In Scala 2.12 and earlier I liked the zipped method on tuples for this, but it only did arity 2 and 3. That was still enough for most uses. I’m going to have to get used to lazyZip in Scala 2.13 as zipped is being deprecated.

Ichoran · June 5, 2019, 4:08pm

Not given your constraints (long lists, care about efficiency) except to not even use lists.

zip, like most of Scala’s high level constructs, are there to make it easier to express logic but not necessarily to be really fast. You end up creating extra tuple objects, too, which isn’t good even in the iterator/view case.

The recursive version I’d write (with arrays, for speed) is similar to what you wrote, though I’d run the logic differently. Admittedly it’s considerably longer than your logic, but I find separating the concerns easier.

def qm(xs: Array[Int], ys: Array[Int]): Boolean = {
  def find(i: Int = 0): Int =
    if (i >= xs.length || i >= ys.length) -1
    else if (!equalAbs(xs(i), ys(i)) -1
    else if (xs(i) == ys(i)) find(i+1)
    else i
  def exact(i: Int): Boolean =
    if (i >= xs.length || i >= ys.length) true
    else if (xs(i) == ys(i)) exact(i+1)
    else false
  val diff = find()
  (diff >= 0) && exact(diff+1)    
}

The shortest logic I can think of is something like this:

def qm(xs: Seq[Int], ys: Seq[Int]): Boolean =
  (xs zip ys).collect{ case (x, y) if x != y => equalsAbs(x, y) } == Seq(true)

which you can use with iterators and sameElements instead of == to become lazy:

def qm(xs: Seq[Int], ys: Seq[Int]): Boolean =
  (xs.iterator zip ys.iterator).
    collect{ case (x,y) if x != y => equalsAbs(x, y) }.
    sameElements(Iterator(true))

Yeah, being able to abstract over arity would help with that. Can’t do it easily in Scala, and in practice it’s not really done at all; instead you build the tuples and operate on a single list.