How to find a key of a Map

jimka · February 28, 2020, 1:29pm

If I have a Map[A,Boolean] and A is type for which distinct values might be ==, is there a way for me to find the key (if any) which is equal to a given object of type A?

One application of this would be to normalize a collection so that all == objects become eq, and thus decrease memory consumption.

BalmungSan · February 28, 2020, 2:37pm

I do not understand the question. If two things are the same, then it can’t appear multiple times as the key of a map. And thus, you should only need to map.contains(x).

jimka · February 28, 2020, 3:27pm

Let me give an example.

Suppose I have a Map[(Int,Int),Boolean] such and at some point I add ((2,4) -> true) to it. Later on I create another tuple which happens to have the same components (2,4). If I ask whether (2,4) is in the map, the answer is yes. But its not the same (2,4) object, it’s just one which happens to contain 2 and 4. How can I ask the map to give me the original (2,4) object? I could of course iterate over all the keys, but that would be a linear complexity operation, rather than an amortised constant complexity operation.

A use case. Suppose I have a sequence of 100,000 tuples. I could normalise the 100,000 tuples, but building a Map, and then mapping over the sequence to build a new sequence where each tuple is replaced with the hash key which it is == to. This would decrease the memory allocation by 100,000 * (2 * 64), i.e., replacing 3 64 bit pointers by 1 such pointer, but repeated 100,000 times, thus saving 16Meg of memory.

BalmungSan · February 28, 2020, 3:41pm

Sorry, I still do not get the question.

If you already have an object which is equivalent to the one inside the map, why do you the original one? Unless it is mutable and carries some internal state?

jimka · February 28, 2020, 3:45pm

Because 100,000 copies of an object takes 128 meg of memory, and 100,000 pointers to the same object takes 16 meg of memory. My application deals with exponentially explosive data structures, and I regularly get out-of-memory errors. I’m looking for ways to decrease memory usage.

BalmungSan · February 28, 2020, 3:55pm

Yeah I get that, what I still do not understand (and bear with me that this probably due my bad English) is how this relates to the original question.

If I understand you correctly, you want to find if a key exists in the Map and retrieve the original key, but what do you want to use to do that check? If you construct an object that is exactly the same one, then how retrieving the original one would help if you already created a copy?

jimka · February 28, 2020, 4:18pm

Here is code which I believe would work, but has quadratic complexity. The function getKeyFromMap returns the key from the Map which is equal to its first argument.

def block[A](body:(A=>Nothing)=>A):A = {
  // CL like block/return, the name of the return() function is provided
  //  by the caller.
  //  Usage:  block{ ret =>  ... ret(someValue) ...}

  import scala.util.control.NoStackTrace

  class NonLocalExit(val data:A) extends Exception with NoStackTrace {}
  def ret(data:A):Nothing = {
    throw new NonLocalExit(data)
  }
  try{
    body(ret)
  }
  catch{
    case nonLocalExit: NonLocalExit => nonLocalExit.data
  }
}

def getKeyFromMap[T,S](key:T,base:Map[T,S]):Option[T] = {
  block { produce =>
    base.map{ case (k:T, _:S) => if (key == k) produce(Some(k)) }
    produce(None)
  }
}

def normalizeSequence[T,S](seq:Seq[T],base:Map[T,S]):Seq[T] = {
  seq.map{x:T =>
    getKeyFromMap(x,base) match {
      case Some(key) => key
      case _ => x
    }
  }
}

val seq = List((1,2),(2,3),(1,2),(1,2),(2,3))
val base = seq.map{ tuple => (tuple -> true)}.toMap
val normalizedBase = normalizeSequence(seq,base)

// check to confirm that any equal items are actually eq
for { x1 <- normalizedBase
      x2 <- normalizedBase 
      if x1 == x2
      } assert (x1 eq x2)

jimka · February 28, 2020, 4:20pm

The way it would help would be that I must have something generating the 100,000 objects, many of which are == to each other. So while I’m creating them, I would normalise them, so that the transient values would be garbage which the GC will happily and efficiently collect as ephemeral data. the JVM is very good at reclaiming ephemeral object.

shawjef3 · February 28, 2020, 4:45pm

@jimka you’re looking for reference equality. Tuples and Produces (case classes are Products) don’t use reference equality for ==.

scala> (2,4) eq (2,4)
res1: Boolean = false

scala> (2,4) ne (2,4)
res2: Boolean = true

jgulotta · February 28, 2020, 5:47pm

Key retrieval is not part of the current map or set APIs, but if it’s not too much overhead you could use a TrieMap (or Java concurrent map) where the value is the same object as the first key that was inserted.

@ object Intern  {
    private val pool = collection.concurrent.TrieMap.empty[AnyRef, AnyRef]

    def apply[T](value: T): T = pool.getOrElseUpdate(value.asInstanceOf[AnyRef], value.asInstanceOf[AnyRef]).asInstanceOf[T]
  }
defined object Intern

@ val a = (1, 2)
a: (Int, Int) = (1, 2)

@ val b = (1, 2)
b: (Int, Int) = (1, 2)

@ a == b
res12: Boolean = true

@ a eq b
res13: Boolean = false

@ a eq Intern(a)
res14: Boolean = true

@ b eq Intern(b)
res15: Boolean = false

@ a eq Intern(b)
res16: Boolean = true

jimka · February 28, 2020, 6:43pm

Is there a weak TrieMap ?

jimka · February 28, 2020, 7:03pm

What are all the @ signs for?

jgulotta · February 28, 2020, 7:06pm

There is no weak TrieMap, but you may be able to use a WeakReference to get the behavior you desire.

jgulotta · February 28, 2020, 7:07pm

The @ is copy/paste from an ammonite REPL session.

curoli · February 28, 2020, 7:45pm

But how are you getting 100,000 independent copies of the same object?

jimka · February 28, 2020, 8:15pm

The 100,000 is a pedagogical example. My data structure uses a weak value hash map to enforce structural identity, ie to assure by construction that all isomorphic objects are eq, making tree equivalence an O(1) operation.

Sadly the Java boss weak value hash map wastes what seems like a huge amount of memory. Therefore I’m investigating whether a weak key hash map might work. But to use a weak keys map I need to find the key given something that hashes to the same thing.

Does it make sense?

jimka · February 28, 2020, 8:19pm

Seems like I may need to learn about implementing weak-references. They are magic to me.

eyalroth · March 1, 2020, 1:06pm

I’m not exactly sure what is the purpose of this code; is it to demonstrate what you are trying to achieve in general?

In any case, it seems that you’re looking for reference equality (as mentioned by others). According to this SO question, you could use Java’s IdentityHashMap for this purpose, and potentially convert it to a Scala immutable map.

jimka · March 1, 2020, 1:19pm

Can we make the java IdentityHashMap into a weak-reference hash map, so that objects which are not referenced elsewhere are subject to GC? I’ve never implemented weak-hash-tables but I’ve used them in Lisp. Taking a look at java-weak-reference I admit that I don’t understand java, and I don’t know how to use this information in Scala. Am I supposed to create a Map[] in Scala of Scala objects each wrapped by this java object? Will this allow the GC to remove objects from the Map[] when they are no longer referenced elsewhere? Or does it man I need to implement some class entirely in java and then import it into my Scala program?

eyalroth · March 1, 2020, 1:37pm

I am not exactly sure what you are trying to achieve, or what is exactly your use case in which your JVM’s memory explodes.