Ordering in Vector groupBy

If I call groupBy on a Vector, can I assume that the ordering within each group will be the same as the relative ordering in the original Vector? I just want to be sure. Thanks.

Nope !! … it’s not warrantied … I’ve just run this at the Scala REPL:

scala> Vector("1", "1", "3", "1", "0").groupBy(x => x)
res0: scala.collection.immutable.Map[String,scala.collection.immutable.Vector[String]] = Map(1 -> Vector(1, 1, 1), 0 -> Vector(0), 3 -> Vector(3))

As you can see in the res0 value, the item 0 comes before the item 3 … and at the original Vector the 3 is first

collection.immutable.Map doesn’t preserve key order, but that’s different than what Russ asked. He asked about the ordering within each group.

Within each group, the ordering from the original Vector is preserved.

Example:

scala 2.13.0> Vector.fill(20)(util.Random.nextInt(100))
res6: scala.collection.immutable.Vector[Int] = Vector(57, 74, 17, 79, 87, 86, 13, 63, 47, 56, 94, 19, 84, 39, 93, 15, 35, 33, 2, 68)

scala 2.13.0> res6.groupBy(_ % 10)
res7: scala.collection.immutable.Map[Int,scala.collection.immutable.Vector[Int]] = HashMap(5 -> Vector(15, 35), 6 -> Vector(86, 56), 9 -> Vector(79, 19, 39), 2 -> Vector(2), 7 -> Vector(57, 17, 87, 47), 3 -> Vector(13, 63, 93, 33), 8 -> Vector(68), 4 -> Vector(74, 94, 84))

take Vector(79, 19, 39) for example — those numbers appear in that same order in the input.

1 Like

This isnt specified in the doc though, which implies that it could change for whatever reason, and still meet the contract of the method, and depending on it is depending on an implementation detail potentially subject to change.

Either way should IMO be documented.

Yes, I agree that the preservation of ordering should be guaranteed and documented.

In case it’s better for performance or simplicity, I’d argue that preservation of ordering should not be guaranteed.

def groupBy[A, B](as: List[A], f: A => B) = as.foldLeft(Map.empty[B, List[A]])((m, a) => m.updateWith(f(a))(l => a :: l.getOrElse(Nil)))

should IMO be allowed without going through builders that build a List is order, or reversing them on the way out.

Preserving the order seems a small cost for a large benefit and it appears we already have it. I suppose it was always intended like this, but the docs don’t reflect this, just because most collection methods inherit their documentation from super types like Iterable. Let’s help people like Russ by updating the docs.

I don’t think this should be a concern. All of the collections have efficient builders, and we consider it normal for collections operations to have builder-based implementations.

I think a pull request documenting the order-preserving property would be accepted, especially if it came with a property-based test that checked it.

(about property-based collections tests, see GitHub - scala/scala-collection-laws: partially-automatic generation of tests for the entire collections library , which unfortunately is still a separate repo; we’d love to have interested contributor bring it into scala/scala. ticket on that is Include Rex's collections-laws tests in PR validation · Issue #411 · scala/scala-dev · GitHub)

1 Like

I have a different solution to my original problem, so I won’t be using groupBy in this case. However, it was intuitive to me to assume, at least until I thought about it more carefully, that the original ordering would be preserved. If it is not preserved, that opens up the potential for subtle errors by those who intuitively assume otherwise. Whether that assumption is careless or not, why take the chance?

I think there is a general principle by the authors. Things will be stable unless otherwise noted.

I think that’s the case. I might have read it somewhere else. It might have been in a discussion of a sort. Generally, everything should produce identical results when evaluated over identical inputs. Anyplace that doesn’t happen, it needs a big-red-flashing-sign that the consumer must beware.

Even things like HashSet guarantees the order of the results. We just need computers to figure out the order. Unless there is a compelling reason (speed,memory) things really should be ordered.

I believe, the document does not mention ordering because, ordering depends on the type on which groupBy is being performed.

If groupBy is performed on a Vector, then ordering within the result Vector is implied.
If groupBy is performed on an unordered collection like Set, ordering does not matter.

Documentation should clarify that, the characteristics on the input type will be preserved in the output.