How to find a key of a Map

Right, I meant that the Map is not guaranteed to actually store the key anywhere, and therefore asking it for a key is not guaranteed to give you a key identical to the one you used to update the Map, or to the one you got last time.

Pretty much a yes on both questions 🙂

You can attempt to detect when your objects are being GC-ed by overriding the finalize method and logging there. This doesn’t necessarily mean that the object’s entry was evicted from the cache; unfortunately, there doesn’t seem to be a “hook” for entry eviction. You could try checking the cache size, though.
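
For example, adding something like this to the node class (a rough sketch; note that finalize is deprecated since Java 9 and gives no timing guarantees):

case class BddNode(label: Int, positive: Bdd, negative: Bdd) extends Bdd {
  // Called at some unspecified point when the collector decides to
  // reclaim this node. Only a diagnostic aid, not an eviction hook.
  override def finalize(): Unit = {
    println(s"BddNode($label) is being GC-ed")
    super.finalize()
  }
}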

Note that this cache implementation is a naive one. Guava (Google’s library for Java) has a more elaborate Interner utility which provides a simpler API and potentially better performance. You’d obviously want to use a weak interner.
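
Usage would be roughly like this (a sketch, assuming a companion object for the Bdd trait and the BddNode case class from this thread; as discussed below, it relies on logical equals/hashCode):

import com.google.common.collect.{Interner, Interners}

object Bdd {
  // Weakly-referenced canonical instances: an entry disappears once no
  // other code holds a reference to the interned node.
  private val interner: Interner[BddNode] = Interners.newWeakInterner[BddNode]()

  // Build a candidate node and swap it for the canonical one, if any.
  def node(label: Int, positive: Bdd, negative: Bdd): BddNode =
    interner.intern(BddNode(label, positive, negative))
}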

I have implemented your suggestion, and I probably have a bug. What I see is that it is EXTREMELY slow. Can you tell me what line 38 of your code example does, depending on whether I redefine equals and hashCode versus defining BddNode exactly as in your example? I fear that with the way you’ve defined BddNode there is an equals call somewhere inside WeakHashMap.get which is traversing the two BddNodes to discover equality. The BddNodes are HHUUGGEE if you try to traverse them. Maybe I’m wrong in my supposition about why I’m seeing the slowness.

case class BddNode(label:Int, positive:Bdd, negative:Bdd) extends Bdd {
  // equals and hashCode methods copied from here:
  //   https://users.scala-lang.org/t/what-does-equivalence-of-instances-mean/4368/11?u=jimka
  override def equals(that: Any): Boolean = {
    that match {
      case b: Bdd => this eq b
      case _ => false
    }
  }

  override def hashCode(): Int =
    System.identityHashCode(this)
}

In order for the weak hash map or the Guava interner solutions to work, the equals and hashCode methods must rely on canonical / logical identity rather than reference identity (as in your code).

Your problem is that you want to prevent duplication of objects that are logically equal; hence, when you are about to create a new object, you first want to check whether a logically equal object already exists in your cache. If you had a reference to that object in the first place, you wouldn’t need the cache; since you don’t, you have to construct a new object, meaning no object in the cache holds a reference to it. With reference-based equality you will therefore always fail to find it in the cache, rendering the cache useless.
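
To make that concrete, here is a minimal sketch of the intern-style cache pattern being described (the BddCache name is made up for illustration; it assumes equals/hashCode encode logical identity):

import java.lang.ref.WeakReference
import java.util.WeakHashMap

object BddCache {
  // Keys are compared with equals/hashCode, so a freshly built node can
  // match a cached one only if equality is logical, not referential.
  private val cache = new WeakHashMap[BddNode, WeakReference[BddNode]]()

  def intern(candidate: BddNode): BddNode = {
    val ref = cache.get(candidate)
    val canonical = if (ref != null) ref.get() else null
    if (canonical != null) canonical
    else {
      cache.put(candidate, new WeakReference(candidate))
      candidate
    }
  }
}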

It’s fair to assume that if your nodes are deeply nested, then both hashCode and equals will consume a lot of CPU, as they are both recursive (in this case). You could verify that by profiling your application (say, with VisualVM).

In order to optimize the runtime of hashCode, I would attempt to compute it only once and then memoize it; that trades memory for CPU:

import scala.runtime.ScalaRunTime

case class BddNode(label:Int, positive:Bdd, negative:Bdd) extends Bdd {
  // `_hashCode` is what case classes use to generate hash code
  private lazy val _hashCode = ScalaRunTime._hashCode(this)
  override def hashCode(): Int = _hashCode
}

As for equals, I think you would need to come up with a way of generating a unique hash for your nodes and memoize it as well; hashCode is not unique, as it might very well produce collisions between non-equal objects:

case class BddNode(label:Int, positive:Bdd, negative:Bdd) extends Bdd {
  override def equals(that: Any): Boolean = {
    that match {
      case node: BddNode => uniqueHash == node.uniqueHash
      case _ => false
    }
  }

  private lazy val uniqueHash: String = {
    def childHash(child: Bdd): String = child match {
      case True => "T"
      case False => "F"
      case node: BddNode => node.uniqueHash
    }

    s"$label[${childHash(positive)},${childHash(negative)}]" 
  }
}

This uniqueHash is very simplistic and consumes a lot of memory. I’ve used it in order to demonstrate what it stands for: the “path” from the node to the root bdds (True and False), which determines the identity of the node.

If the logical meaning of the label is to denote the level of nesting of the node, then it can be safely omitted from this hash method, as the level of nesting is implied by the path.

One more thing. If, when about to create a new node, it is guaranteed that the children are already interned (i.e., no other live objects are logically equal to positive or negative), then you can use this equals method:

override def equals(that: Any): Boolean = {
  that match {
    case node: BddNode =>
      label == node.label && positive eq node.positive && negative eq node.negative
    case _ => false
  }
}

I think this is the gist of my other question.

I’m trying the following. Early results are positive. It’s not dead slow as it was before.

I wonder if it is a bad idea to use pattern matching in the equals method.
There was a previous discussion about how to write a similar equals method.

case class BddNode(label:Int, positive:Bdd, negative:Bdd) extends Bdd {

  override def equals(that: Any): Boolean = {
    that match {
      case BddNode(l2, p2, n2) => (p2 eq positive) && (n2 eq negative) && (l2 == label)
      case _ => false
    }
  }

  override def hashCode(): Int =
    label.## ^ System.identityHashCode(positive) ^ ~System.identityHashCode(negative)

...
}

So one piece of good news: whereas my function was running out of memory at numNodes=40 with the previous implementation, after incorporating Eyal’s suggestion (including my hashCode described above) I’m now able to compute numNodes=40, and 41.

@jimka:
Weren’t you creating BddNodes in a way that ensures uniqueness of the label field? In that case you could simplify the code:

case class BddNode(label:Int, positive:Bdd, negative:Bdd) extends Bdd {
  override def equals(that: Any): Boolean = {
    that match {
      case b: Bdd => this eq b
      case _ => false
    }
  }

  override def hashCode(): Int =
    label * 1000003 // multiply by a large prime to improve distribution
}

The issue with System.identityHashCode is that the Java specification guarantees it to be stable, but an object’s address is not: the garbage collector can and often does move objects around, so the JVM needs to maintain a mapping from each object to its identity hash code. That mapping takes up memory, and looking it up can be costly compared to just accessing the label field. Its contents are also created lazily, so if you never invoke System.identityHashCode you don’t pay the price of mapping your objects to identity hash codes.

@tarsa no, label is not a unique identifier.
Here is an image of what such a structure looks like. As you can see, many nodes have the same label: Z1, Z2, Z3, Z4. But a node is uniquely identified by its label + positive child + negative child.
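
For instance (assuming the True and False terminal objects from the earlier snippets):

// Two distinct nodes sharing label 1: the label alone does not
// identify a node, but label + positive child + negative child does.
val a = BddNode(1, True, False)
val b = BddNode(1, False, True)
// a.label == b.label, yet a and b are logically different nodes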

What is the purpose of System.identityHashCode if it cannot be reliably used as a hash-code? This seems strange to me, like I’m perhaps not understanding something fundamental. Do Maps work? Do they use identityHashCode internally?

It is reliable. What @tarsa said is that identityHashCode consumes more memory, but only after its first use (it is lazy), and that therefore it would have been better if the Bdds had had a unique identifier in the first place (but they don’t).
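
A quick demonstration of the stability guarantee:

val o = new Object
val h1 = System.identityHashCode(o)
System.gc() // the collector may move `o` in memory...
val h2 = System.identityHashCode(o)
assert(h1 == h2) // ...but its identity hash code never changes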

Is it possible in Scala to decrease the size of an object containing an integer by promising that the integer will always be between 0 and 256 exclusive? Exponentially many wasted bytes are expensive. Typically this type of optimization is only available in C-like languages.

You could use a Byte; i.e., 1.toByte. I’m not sure how this helps, though.

Will Byte actually compile to a normal Integer?

Byte will be compiled to Byte. But like all Scala integer types, it is signed, so it goes from -128 to 127, not 0 to 255.

ah OK, that sucks 🙁 Not sure if it is useful for my applications to have integers from 1 to 127.

BTW, was there ever a discussion of supporting unsigned integers in Scala 3?

java.lang.Integer in Java 8 implements some unsigned operations: compareUnsigned, divideUnsigned, remainderUnsigned, parseUnsignedInt, toUnsignedString. Addition, subtraction and multiplication don’t need unsigned equivalents because in two’s complement encoding they work the same.
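
For example, calling those from Scala:

val x = 0xFFFFFFFE // -2 as a signed Int; 4294967294 viewed as unsigned
val y = 2

Integer.compareUnsigned(x, y)          // positive: unsigned x > y
Integer.divideUnsigned(x, y)           // 2147483647, not the signed -1
Integer.toUnsignedString(x)            // "4294967294"
Integer.parseUnsignedInt("4294967294") // back to the bit pattern of -2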

You could implement unsigned types in Scala 2, but if you go for AnyVals they may incur high overhead: https://failex.blogspot.com/2017/04/the-high-cost-of-anyval-subclasses.html OTOH Scala 3 has https://docs.scala-lang.org/sips/opaque-types.html so it should be more feasible there (by design, opaque types never box).
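
As a rough sketch (not any real library’s API), an unsigned Int built on Scala 3 opaque types could look something like this:

object UInts:
  opaque type UInt = Int

  object UInt:
    // reinterpret the Int's bit pattern as an unsigned number
    def fromInt(bits: Int): UInt = bits

  extension (x: UInt)
    // widen to Long to read the unsigned numeric value
    def toLong: Long = x & 0xFFFFFFFFL
    // +, -, * produce the same bit patterns for signed and unsigned
    def +(y: UInt): UInt = x + y
    // comparison and division need the dedicated unsigned variants
    def <(y: UInt): Boolean = Integer.compareUnsigned(x, y) < 0
    def /(y: UInt): UInt = Integer.divideUnsigned(x, y)

Outside UInts a UInt is not an Int, so signed operations can’t be applied by accident, yet it still erases to a plain Int at runtime, so there’s no boxing.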

There are always some issues when doing operations on types of mixed signedness. The Rust language forces you to cast numbers to matching types before doing operations, but that’s rather cheap syntactically (e.g. something as i32, something_else as u32). Also, unsigned types are a source of mistakes if you e.g. count down to zero and have an off-by-one error in the counting somewhere. It’s easier to avoid such mistakes with signed types.

As to numeric types’ ranges and bit sizes, they’re pretty standard.

  • Byte takes 1 byte and has range -(2^7) to (2^7) - 1
  • Short takes 2 bytes and has range -(2^15) to (2^15) - 1
  • Int takes 4 bytes and has range -(2^31) to (2^31) - 1
  • Long takes 8 bytes and has range -(2^63) to (2^63) - 1

Nobody ever said “if only I had a signed Byte”, but pretending they are unsigned (and not using division on them) works well enough.
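
For example, reading a Byte back as a value in 0 to 255 is just a mask:

val b: Byte = 200.toByte // stored as -56 (two's complement)
val u: Int = b & 0xFF    // read back as unsigned: 200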

And especially since this works:

scala> 255.toByte
res1: Byte = -1