Help writing equals and a compatible hashCode

jimka · March 4, 2020, 3:52pm

I’m trying to debug some code, and i want to experiment with making a different equals function for my class.
Currently I’ve made equals equivalent to eq. What I’d like to try is that two objects BddNode(a:Int,b:Bdd,c:Bdd) equals BddNode(x:Int,y:Bdd,z:Bdd) if a == x, b eq y, and c eq z. If I make that my equals function logic, then how do I change hashCode to be compatible?


case class BddNode(label:Int, positive:Bdd, negative:Bdd) extends Bdd {
  // equals and hashCode methods copied from here:
  //   https://users.scala-lang.org/t/what-does-equivalence-of-instances-mean/4368/11?u=jimka
  override def equals(that: Any): Boolean = {
    that match {
      case b: Bdd => this eq b
      case _ => false
    }
  }

  override def hashCode(): Int =
    System.identityHashCode(this)
}

martijnhoekstra · March 4, 2020, 3:59pm

you could use for example

a.## ^ System.identityHashCode(b) ^ System.identityHashCode(c)

A weakness is that it hashes identical with b and c switched. Whether that’s a problem depends on how likely that is.

jimka · March 4, 2020, 6:59pm

could I use something like a.## ^ System.identityHashCode(b) ^ ~System.identityHashCode(c)? If I’m not mistaken ~ is bitwise-not. As is exceedingly likely that I will have label/a/b and also label/b/a in the same program.

martijnhoekstra · March 4, 2020, 7:17pm

yeah, then a.## ^ System.identityHashCode(b) ^ ~System.identityHashCode(c) sounds good.

eyalroth · March 4, 2020, 7:20pm

I would advice using Scala’s own hashcode utility:

import scala.runtime.ScalaRunTime

case class BddNode(label:Int, positive:Bdd, negative:Bdd) extends Bdd {
  override def hashCode(): Int = {
    val product = (label, System.identityHashCode(positive), System.identityHashCode(negative))
    ScalaRunTime._hashcode(product)
  }
}

It uses MurmurHash3, which is designed to generate well-distributed non-cryptographic hashes. Perhaps @Ichoran has more to insight to provide, given that he is credited in the source code.

jimka · March 4, 2020, 8:41pm

My first thought was that allocating a Tuple3 is too much work just to generate a hash code. Am I overreacting?

eyalroth · March 4, 2020, 9:28pm

Maybe, but if the hashcode isn’t well distributed, that might cost with slower hash-map lookup. You’re gonna have to benchmark and see which option is better overall.

Ichoran · March 4, 2020, 9:53pm

If you want an ordered hashcode, you can build your own using MurmurHash3. If you want it unordered (i.e. Foo(a, b, c) has the same hashcode as Foo(b, c, a), then use unorderedHash. Otherwise, you can do it by hand like finalizeHash( mixLast( mix( mix(42, a.##), b.##, c.##))), 3 ), where you can choose whatever you want for 42 and 3.

The mutable version is easier to understand, I think:

var h = initialValueThatYouChoose
h = mix(h, a.##)
h = mix(h, b.##)
h = mixLast(h, c.##)   // Can also do mix, this is just slightly faster
finalizeHash(h, 3)   // Usually the value here is the number of items hashed

If you want to have multiple different things that each have their own distinct hash codes, but contain the same data, make them have different initial values.

Tuple does all this for you, but in the worst case for three items with trivial hash codes (e.g. Int, whose value is its own hash code), you are about 3x faster using MurmurHash3 directly.

It’s still only like 10 ns per hash (on a decently fast machine) overhead for the tuple, so you need to be doing a lot of them for it to matter.

eyalroth · March 5, 2020, 2:54pm

@jimka Note that in your case you probably don’t want to be using .##, especially not on the nested Bdds, as this will trigger a recursive evaluation (which you want to avoid):

var h = initialValueThatYouChoose
h = mix(h, a) // `a` is `Int`, so `.##` will return the same value, and thus redundant
h = mix(h, System.identityHashCode(b))
h = mixLast(h, System.identityHashCode(c))
finalizeHash(h, 3)

jimka · March 5, 2020, 3:15pm

I admit I don’t understand all the discussion. What’s wrong with my proposed solution? Is it a good one? or is it too naive? I’m only using ## on the integer, not on the bdd pointers. Or are ## and System.identityHashCode really the same thing under the hood?

a.## ^ System.identityHashCode(b) ^ ~System.identityHashCode(c)

eyalroth · March 5, 2020, 4:38pm

## is basically hashCode, but it deals with some edge cases regarding certain value types (such as Double). You can read about it more here.

For an integer, the hash-code will always be the same as the integer itself, so you can just use the integer.

val i: Int = ???
i == i.hashCode // true
i == i.## // true

As for identityHashCode, it has rather different semantics than hashCode. The former will generate an arbitrary hash, while the latter is expected to be compatible with equals (in the class it is defined).

The default implementation of equals is reference equality (== in Java, eq in Scala), and the default hashCode implementation is identityHashCode. In this case, hashCode is compatible with equals as it is guaranteed that the object is equal only to itself, and identityHashCode will always generate the same hash for the same object.

When implementing a different equals that allows separate objects to be considered equal (based on some deterministic logic), one must implement hashCode so that every pair of objects that are considered equal will generate the same hash code.

Objects which are not equal may still generate the same hash code – that would be considered a collision – and a good implementation of the hashCode method will try to minimize the amount of collisions (that’s the meaning of a “well-distributed” hash). The reason we don’t want collisions is a direct result of the main usage of hashCode; i.e, lookup in a hash table. When looking up an object in a hash table, we must test equality of that object with every other object in the table that has the same hash code, so we would like that number to be as low as possible.

The MurmurHash3 algorithm should provide better distribution (less collisions) than a simple ^.

As for using identityHashCode in your case, it is so we could emulate the equality logic of eq between the children. We know that if two objects are the same (eq), they ought to generate the same identityHashCode; however, it is possible that different objects will also generate the same identityHashCode (there is no guarantee for this method to be unique), and then we would get a collision.

You could try mitigating these collisions by generating a unique id for every object you create, though I’m not sure if that would yield much of an improvement on the hash-lookup performance:

case class BddNode {
  private val id: Long = rollingNumber.getAndIncrement()
  override def hashCode(): Int = {
    var h = initialValueThatYouChoose
    h = mix(h, label)
    h = mix(h, positive.id.toInt)
    h = mixLast(h, negative.id.toInt)
    finalizeHash(h, 3)
  }
}

object BddNode {
  // you could initialize this with a `DynamicVariable` like you did with `numAllocations`.
  private val rollingNumber = new java.util.concurrent.atomic.AtomicLong()
}

I’m not sure this will be effective, as the potential number of the objects you will allocate may exceed the range of Int (this is why the id is Long), and since a hashCode is Int, you may still get collisions.