What is zip of Set

I found some code (written by one of my students) which works, and passes all the tests in the test suite, but I have trouble believing that it works.

Given two sets of 4 strings each, such that the intersection of the two sets might be empty or inhabited, the student found the matches and differences like this.

Do small sets always zip in some predicable order which makes this code accidentally work?

What does (c1,c2).zipped do?

  // Card is a type aliased to Set[String], and every Card is guaranteed exactly 4 elements.
  def completeTriptych(c1: Card, c2: Card): Card = {
    (c1, c2).zipped.foldLeft(Set[String]())({
      (acc, cards) =>
        if (cards._1 == cards._2)
          acc + cards._1
        else {
          acc + features.find(_.contains(cards._1)).get
            .find(el => el != cards._1 && el != cards._2).get
        }
    })
  }

features is a Set[Set[String]] a set of 4 Sets, each of the leaf sets contains 4 strings, exactly one of which is contained in c1, and one of which is contained in c2.

Here is an example?

(Set("red","oval","stripped","two"),
  Set("green","squiggle","filled","one")).zipped.toList

This surprisingly returns List((red,green), (oval,squiggle), (stripped,filled), (two,one)),
i.e., zipping element 1 with element 1, element 2 with element 2, etc…

There’s an implicit def scala.Predef.tuple2ToZippedOps which wraps a pair of values in scala.runtime.Tuple2Zipped.Ops which has a method zipped that takes some further implicits.

There are classes scala.collection.immutable.Set.SetX for X in (1, 2, 3, 4) that have predictable order.

That’s correct but it is an implementation detail that you shouldn’t rely on for correct behavior.

1 Like

Maybe it would be worth it to randomize the order in that Set subclasses, so people couldn’t rely on it?

I guess there is a compatibility issue but surely (for say 3.0) the better design would be to make sure zipped isn’t defined except on types where there is some ordering that can be relied on?

It won’t be possible for 3.0 because it has to have the same standard library as 2.13

However, I believe I am not the only one who believes the heavy on inheritance approach used to model the collections just to allow abstracting between them isn’t really worth it.

I find the concept of Set very useful and practical as the concept corresponds well to certain abstractions needed in real problems. However, the concept seems contrary to functional programming, in the sense that FP is about referential transparency and dependable results. The Set has a very side-effect-like behavior, in the sense that interacting with it can give intentionally unpredictable results. This is powerful if you understand the abstraction.

In my opinion it is an excellent abstraction. But as my clever student led me to discover, it is pretty easy to depend on undocumented behavior without realizing it.

I also found the concept of Set very useful.

I just do not find useful that it has a lot of methods like head or zip which don’t make sense on them.

1 Like

I’m using Sets a lot, and occasionally, I use Set.head or Set.headOption with the intention "pick an element, doesn’t matter which`. So, those methods are definitely useful, they might just be misleading what they do.

I’ve never used Set.zip, but I could imagine someone might have a case of “pair up elements from two collections arbitrarily”.

If you use a SortedSet that imposes an ordering on its elements you will get much more predictable behavior, otherwise you will just have to know that extensional equality doesn’t imply congruence; i.e., s1 == s2 doesn’t necessarily mean f(s1) == f(s2) for all pure functions f.

For what it’s worth, Cats formalizes this and only provides UnorderedFoldable for Set, which limits you to commutative folds.

1 Like