Sorting by Comparable parts

kwalcock · September 8, 2021, 5:17am

I’m trying to sort complicated objects (simplified here into class Record) by using a sequence (the comparableExtractors) of little functions (of type ComparableExtractor[_]) that extract Comparable parts of the object. These functions are called until some parts compare as not equal and thereby determine which of the Records comes first. However, even though IntelliJ likes the code, the compiler does not. I can’t get the type specification correct. There must be some black hole in my understanding. I’m working in Scala 2.12, but hope that the solution also works for 2.11. The code is below. The compiler error is

inferred type arguments [_$1] do not conform to method compareExtraction's type parameter bounds [T <: Comparable[T]]
        .find { comparableExtractor => compareExtraction(comparableExtractor) != 0 }

Here’s the code.

package org.example

class Record(val index: Integer, val name: String) extends Comparable[Record] {
  type ComparableExtractor[T <: Comparable[T]] = Record => T

  val comparableExtractors: Seq[ComparableExtractor[_]] = Seq(
    (record => record.index): ComparableExtractor[Integer],
    (record => record.name): ComparableExtractor[String]
  )

  override def compareTo(other: Record): Int = {

    def compareExtraction[T <: Comparable[T]](comparableExtractor: ComparableExtractor[T]): Int =
        comparableExtractor(this).compareTo(comparableExtractor(other))

    comparableExtractors
        .find { comparableExtractor: ComparableExtractor[_] => compareExtraction(comparableExtractor) != 0 }
        .map(compareExtraction(_))
        .getOrElse(-1)
  }
}

object Example extends App {
  val records = Seq(
    new Record(1, "b"),
    new Record(2, "a")
  )
  val sortedRecords = records.sorted

  println(sortedRecords)
}

cbley · September 8, 2021, 6:47am

Hi.

This sounds like you may want to use a lens library like quicklens or monocle. Here are a few others.

You are using a wildcard type variable here, so there’s no information which type this ComparableExtrator extracts.

And your approach is not idiomatic Scala. Usually, you would be using type classes for this kind of stuff (and there already is one: Ordering).

For your example, you could simply use

val sortedRecords = records.sortBy(r => (r.index, r.name))

cbley · September 8, 2021, 7:25am

Besides, your implementation of the compareTo method

kwalcock:

comparableExtractors
        .find { comparableExtractor: ComparableExtractor[_] => compareExtraction(comparableExtractor) != 0 }
        .map(compareExtraction(_))
        .getOrElse(-1)

breaks the reflexivity law that a == a.

And, if you would make Record a case class, the ordering would be defined automatically, in the way that you wanted.

kwalcock · September 8, 2021, 3:17pm

Thanks. I tried with Ordering and ran into much the same problem, but will try again.

This shouldn’t be so complicated as to require use of anything more than the standard library, I would think.

The conversion to a tuple is familiar, but that only works because r.name looks like it is cheap. In reality it is prohibitively expensive. However, I’m contradicting myself because if it is so expensive, I should cache the result. I’ll add that.

That’s a good point about reflexivity. I can return 0 instead. This is being called on a list that is guaranteed not to have duplicates, but still.

kwalcock · September 8, 2021, 9:17pm

The working version has been updated to that below. I’m still not happy with some of the duplication or unnecessary recalculation after the find by the map, but at least it seems to work. Better ideas would be appreciated. Thanks.

package org.example

class RecordComparator(val record: Record) extends Comparable[RecordComparator] {
  lazy val index = { println(s"slowly calculate index from $record"); record.index }
  lazy val name = { println(s"slowly calculate name from $record"); record.name }
  lazy val description = { println(s"slowly calculate description from $record"); record.description }

  override def compareTo(other: RecordComparator): Int = {
    RecordComparator.fieldComparators
      .find(_(this, other) != 0)
      .map(_(this, other))
      .getOrElse(0)
  }
}

object RecordComparator {
  type FieldComparator = (RecordComparator, RecordComparator) => Int

  val fieldComparators = Seq[FieldComparator](
    (left, right) => left.name.compareTo(right.name),
    (left, right) => left.description.compareTo(right.description),
    (left, right) => left.index.compareTo(right.index)
  )
}

case class Record(index: Integer, name: String, description: String) extends Comparable[Record] {
  val recordComparator = new RecordComparator(this)

  override def compareTo(other: Record): Int = recordComparator.compareTo(other.recordComparator)
}

object Example extends App {
  val records = Seq(
    Record(3, "b", "3b"),
    Record(1, "b", "1b"),
    Record(2, "a", "2a")
  )
  val sortedRecords = records.sorted

  println(sortedRecords)
}

cbley · September 9, 2021, 7:01am

Maybe you could explain what your objective is, what you want to achieve. Minimize allocations? Optimize for speed?

Note that Scala 2.13’s Ordering class has an orElseBy method, which you can easily add for 2.12 too:

import scala.math.Ordering

object implicits {
  implicit class OrderingOrElseBy[T](val ordering: Ordering[T]) extends AnyVal {
    def orElseBy[U](f: T => U)(implicit o: Ordering[U]): Ordering[T] = { (a, b) =>              
        val res = ordering.compare(a, b)
        if (res == 0) o.compare(f(a), f(b))
        else res 
    }
  }
}

object Example extends App {
  import implicits._

  val recordOrdering = Ordering.by[Record, Integer](_.index)
    .orElseBy(_.name)
    .orElseBy(_.description)

  val records = Seq(
    Record(3, "b", "3b"),
    Record(1, "b", "1b"),
    Record(2, "a", "2a")
  )
  val sortedRecords = records.sorted(recordOrdering)

  println(sortedRecords)
}

kwalcock · September 9, 2021, 9:18pm

This is brilliant. I didn’t think of backporting anything. I like how there is only one written call to record.index. It’s fancy enough that Scala 2.11 doesn’t grok it, though. It complains

[error]  Example3.scala:13:75: missing parameter type
[error]      def orElseBy[U](f: T => U)(implicit o: Ordering[U]): Ordering[T] = { (a, b) =>
[error]                                                                           ^

I haven’t figured out how to help the compiler. Neither a: T nor a: Ordering[T] work.

I looked into the redundancy in

Ordering.by[Record, Integer](_.index)

and came up with

Ordering.by { _: Record => true } // give compiler some clues
    .orElseBy(_.index)

but that leaves some to be desired as well. The compiler should know that index is an Int like it can deal with the other ones.

cbley · September 10, 2021, 7:02am

kwalcock:

It’s fancy enough that Scala 2.11 doesn’t grok it, though. It complains

[error]  Example3.scala:13:75: missing parameter type
[error]      def orElseBy[U](f: T => U)(implicit o: Ordering[U]): Ordering[T] = { (a, b) =>
[error]                                                                           ^

Yeah, Scala 2.11 does not support conversion of lambda functions to SAM type instances. One has to be a bit more explicit:

def orElseBy[U](f: T => U)(implicit o: Ordering[U]): Ordering[T] = new Ordering[T] {
      def compare(a: T, b: T) = {
        val res = ordering.compare(a, b)
        if (res == 0) o.compare(f(a), f(b))
        else res 
      }
    }

kwalcock:

I looked into the redundancy in
Ordering.by[Record, Integer](_.index)
and came up with
Ordering.by { _: Record => true } // give compiler some clues
    .orElseBy(_.index)
but that leaves some to be desired as well. The compiler should know that index is an Int like it can deal with the other ones.

You could instead write something like this:

Ordering.by((_: Record).index)

Here’s a scastie example running with Scala 2.11: Scastie - An interactive playground for Scala.

kwalcock · September 10, 2021, 6:29pm

Excellent! Now that you mention it, I do remember that change between 2.11 and 2.12. To get around that slight redundancy or non-parallel construction in which the first field to compare is described differently than the others, I’m trying an Unordered. Its compare is needlessly called, but pretty things don’t have to be free. Here’s the updated code before I move some parts to library files for reuse. Thanks for your help.

package org.example

import scala.math.Ordering

object Unordered {
  def apply[T]: Ordering[T] = new Ordering[T] {
    def compare(x: T, y: T): Int = 0
  }
}

import implicits._
object implicits {
  implicit class OrderingOrElseBy[T](val ordering: Ordering[T]) extends AnyVal {
    def orElseBy[U](f: T => U)(implicit o: Ordering[U]): Ordering[T] = new Ordering[T] {
      def compare(x: T, y: T): Int = {
        val result = ordering.compare(x, y)
        if (result == 0) o.compare(f(x), f(y))
        else result
      }
    }
  }
}

case class RecordComparator(record: Record) {
  lazy val slowIndex: Int = { println(s"$this slowIndex"); record.index }
  lazy val slowName: String = { println(s"$this slowName"); record.name }
  lazy val slowDescription: String = { println(s"$this slowDescription"); record.description }
}

object RecordComparator {
  val ordering = Unordered[RecordComparator]
    .orElseBy(_.slowName)
    .orElseBy(_.slowDescription)
    .orElseBy(_.slowIndex)
}

case class Record(index: Integer, name: String, description: String) {
  protected val recordComparator = RecordComparator(this)
}

object Record {
  implicit val ordering: Ordering[Record] = RecordComparator.ordering.on(_.recordComparator)
}

object Example extends App {
  val records = Seq(
    Record(3, "b", "3b"),
    Record(1, "b", "1b"),
    Record(2, "a", "2a"),
    Record(4, "b", "1b")
  )
  val sortedRecords = records.sorted

  println(sortedRecords)
}

kwalcock · September 11, 2021, 4:29am

For the record, it did occur to me that self comparisons should be checked. They are otherwise very expensive because every single field needs to be compared.

  implicit class OrderingOrElseBy[T <: AnyRef](val ordering: Ordering[T]) {
    def orElseBy[U](f: T => U)(implicit innerOrdering: Ordering[U]): Ordering[T] = new Ordering[T] {
      def compare(x: T, y: T): Int = {
        if (x.eq(y)) 0 // Short-circuit all further comparisons.
        else {
          val result = ordering.compare(x, y)

          if (result == 0) innerOrdering.compare(f(x), f(y))
          else result
        }
      }
    }
  }

chaotic3quilibrium · September 11, 2021, 4:21pm

Since what you are doing (especially your using the x.eq(y) method), is bordering right on an abyss of complexity and confusion regarding the equals and hashCode methods available on every single instance of both AnyVal and AnyRef (which all automatically inherit from java.lang.Object), it’s worth sharing an Answer I produced on StackOverflow dealing with all of the issues in this area.

Almost all ordering in Scala eventually has to visit this area if it in ANY way plans to use ANY aspect of the Scala collections library.

chaotic3quilibrium · September 11, 2021, 4:24pm

Well, it appears I screwed up and didn’t properly “Reply” to the post which this addresses. And deleting and attempting to correctly “Reply” wouldn’t let me “post identical content” even though the original post was deleted. Sigh.

kwalcock · September 12, 2021, 5:04am

Thanks. I read that discussion and maybe even reread it. This codebase has customized versions of equals and hashCode which may have been influenced by your post. It looks like I should change _.hashCode to _.##.

chaotic3quilibrium · September 12, 2021, 5:44pm

Yasvw! Godspeed!

I cannot begin to tell you how much time I have lost in this area. I have a tendency to learn the hard way. Like the maximum hard way.