Help with sortWith function

jimka · November 19, 2020, 10:26am

I’m looking at the documentation of sortWith. It is confusing to me what my lt function should do in case the two given objects are equal.

Is sortWith ever/never going call my function with equal arguments?
Am I responsible for calling == and returning a Boolean, which Boolean should I return,
Am I really supposed to implement a less-than-or-equal function, or is it really a less-than function?

Jasper-M · November 19, 2020, 10:33am

It’s a less-than function, so if a is strictly less than b return true, in all other cases return false.

jimka · November 19, 2020, 11:49am

@Jasper-M, I have some empirical evidence that you are right. It appears for my simple experiment that using < strict less-than makes the sort stable. e.g., (1,2),(1,3),(1,4) remain in the original order, as do (2,1),(2,5)

List((2,1),(1,2),(1,3),(1,4),(2,5)).sortWith{
  case (a,b) => a._1 < b._1
}
--> List((1,2), (1,3), (1,4), (2,1), (2,5))

However, using <= the sort is not stable. We see that the resulting list has perturbed (1,4),(1,3),(1,2)

List((2,1),(1,2),(1,3),(1,4),(2,5)).sortWith{
  case (a,b) => a._1 <= b._1
}
--> List((1,4), (1,3), (1,2), (2,5), (2,1))

martijnhoekstra · November 19, 2020, 1:40pm

Would a documentation change like this: https://github.com/scala/scala/compare/2.13.x...martijnhoekstra:sortwithdoc have helped?

jimka · November 20, 2020, 9:41am

@martijnhoekstra, thanks for the proposed improvement in the documentation. For me it the text is still ambiguous. Of course I understand it because we discussed it above and I did some experimentation. But when asking whether a user can understand the behavior JUST from the documentation, it still lacks clarity (in my opinion).

It would have been nice if the documentation explained what to do about equivalent elements.
The choice of the word “precede” in the proposed documentation change, is still not 100% clear. If a == b then a is indeed allowed to precede b in the sorted list. This might imply that the function is allowed to return true on equal elements. However, such an understanding would be a mistake on the part of the user.

Another thing that was unclear for me from reading the original documentation was whether
I MUST detect equivalence myself, and whether I should return true or false in this case, and whether my failure to do so would result in a broken sort, or whether it would only break the stability feature.

It is conceivable (although not true) that false has the meaning of "the objects need to be swapped. In such a case returning false on equivalent elements would cause an infinite loop.

Another sentence that seems to be missing (although it might be stated elsewhere) is that requirement that the given compare function must be transitive in the sense that if a<b and b<c, then a<c. That may be obvious to experienced programmers, but I believe it is a crucial hypothesis which sortWith assumes.

curoli · November 20, 2020, 10:38am

The docs say “precedes in the ordering”, which is stronger than saying it precedes in the sorted collection.

When we say “a precedes b in the ordering”, it means the ordering demands that a is always before b and they are not equal.

jimka · November 20, 2020, 11:18am

@curoli, no argument that the essence is there. Just in order to minimize confusion it might be nice to clarify. For example rather than saying “precedes in the ordering” to “is not equivalent and strictly precedes in the ordering” would prevent the reader from having to wonder.

BTW my comments are only suggestions. When I write technical documents, my philosophy is to make it correct and easy for someone to read, rather than making it as terse as possible. Sometimes saying the same thing in two slightly different ways can unburden the reader.

Jasper-M · November 20, 2020, 11:38am

IMHO some improvements could be done to the docs, but “less than” is the least ambiguous name and description for the parameter of sortWith. It’s a well known operator with a well known mathematical definition.

Perhaps some confusion could stem from the phrase “elements that are equal (as determined by lt )”, because it leaves as an exercise for the reader that equal means !(lt(a, b) || lt(b, a)).