How to find a key of a Map

If I have a Map[A,Boolean] and A is type for which distinct values might be ==, is there a way for me to find the key (if any) which is equal to a given object of type A?

One application of this would be to normalize a collection so that all == objects become eq, and thus decrease memory consumption.

I do not understand the question. If two things are the same, then it can’t appear multiple times as the key of a map. And thus, you should only need to map.contains(x).

Let me give an example.

Suppose I have a Map[(Int,Int),Boolean] such and at some point I add ((2,4) -> true) to it. Later on I create another tuple which happens to have the same components (2,4). If I ask whether (2,4) is in the map, the answer is yes. But its not the same (2,4) object, it’s just one which happens to contain 2 and 4. How can I ask the map to give me the original (2,4) object? I could of course iterate over all the keys, but that would be a linear complexity operation, rather than an amortised constant complexity operation.

A use case. Suppose I have a sequence of 100,000 tuples. I could normalise the 100,000 tuples, but building a Map, and then mapping over the sequence to build a new sequence where each tuple is replaced with the hash key which it is == to. This would decrease the memory allocation by 100,000 * (2 * 64), i.e., replacing 3 64 bit pointers by 1 such pointer, but repeated 100,000 times, thus saving 16Meg of memory.

Sorry, I still do not get the question.

If you already have an object which is equivalent to the one inside the map, why do you the original one? Unless it is mutable and carries some internal state?

Because 100,000 copies of an object takes 128 meg of memory, and 100,000 pointers to the same object takes 16 meg of memory. My application deals with exponentially explosive data structures, and I regularly get out-of-memory errors. I’m looking for ways to decrease memory usage.

Yeah I get that, what I still do not understand (and bear with me that this probably due my bad English) is how this relates to the original question.

If I understand you correctly, you want to find if a key exists in the Map and retrieve the original key, but what do you want to use to do that check? If you construct an object that is exactly the same one, then how retrieving the original one would help if you already created a copy?

Here is code which I believe would work, but has quadratic complexity. The function getKeyFromMap returns the key from the Map which is equal to its first argument.

def block[A](body:(A=>Nothing)=>A):A = {
  // CL like block/return, the name of the return() function is provided
  //  by the caller.
  //  Usage:  block{ ret =>  ... ret(someValue) ...}

  import scala.util.control.NoStackTrace

  class NonLocalExit(val data:A) extends Exception with NoStackTrace {}
  def ret(data:A):Nothing = {
    throw new NonLocalExit(data)
  }
  try{
    body(ret)
  }
  catch{
    case nonLocalExit: NonLocalExit => nonLocalExit.data
  }
}

def getKeyFromMap[T,S](key:T,base:Map[T,S]):Option[T] = {
  block { produce =>
    base.map{ case (k:T, _:S) => if (key == k) produce(Some(k)) }
    produce(None)
  }
}

def normalizeSequence[T,S](seq:Seq[T],base:Map[T,S]):Seq[T] = {
  seq.map{x:T =>
    getKeyFromMap(x,base) match {
      case Some(key) => key
      case _ => x
    }
  }
}

val seq = List((1,2),(2,3),(1,2),(1,2),(2,3))
val base = seq.map{ tuple => (tuple -> true)}.toMap
val normalizedBase = normalizeSequence(seq,base)

// check to confirm that any equal items are actually eq
for { x1 <- normalizedBase
      x2 <- normalizedBase 
      if x1 == x2
      } assert (x1 eq x2)

The way it would help would be that I must have something generating the 100,000 objects, many of which are == to each other. So while I’m creating them, I would normalise them, so that the transient values would be garbage which the GC will happily and efficiently collect as ephemeral data. the JVM is very good at reclaiming ephemeral object.

@jimka you’re looking for reference equality. Tuples and Produces (case classes are Products) don’t use reference equality for ==.

scala> (2,4) eq (2,4)
res1: Boolean = false

scala> (2,4) ne (2,4)
res2: Boolean = true
1 Like

Key retrieval is not part of the current map or set APIs, but if it’s not too much overhead you could use a TrieMap (or Java concurrent map) where the value is the same object as the first key that was inserted.

@ object Intern  {
    private val pool = collection.concurrent.TrieMap.empty[AnyRef, AnyRef]

    def apply[T](value: T): T = pool.getOrElseUpdate(value.asInstanceOf[AnyRef], value.asInstanceOf[AnyRef]).asInstanceOf[T]
  }
defined object Intern

@ val a = (1, 2)
a: (Int, Int) = (1, 2)

@ val b = (1, 2)
b: (Int, Int) = (1, 2)

@ a == b
res12: Boolean = true

@ a eq b
res13: Boolean = false

@ a eq Intern(a)
res14: Boolean = true

@ b eq Intern(b)
res15: Boolean = false

@ a eq Intern(b)
res16: Boolean = true

Is there a weak TrieMap ?

What are all the @ signs for?

There is no weak TrieMap, but you may be able to use a WeakReference to get the behavior you desire.

The @ is copy/paste from an ammonite REPL session.

But how are you getting 100,000 independent copies of the same object?

The 100,000 is a pedagogical example. My data structure uses a weak value hash map to enforce structural identity, ie to assure by construction that all isomorphic objects are eq, making tree equivalence an O(1) operation.

Sadly the Java boss weak value hash map wastes what seems like a huge amount of memory. Therefore I’m investigating whether a weak key hash map might work. But to use a weak keys map I need to find the key given something that hashes to the same thing.

Does it make sense?

Seems like I may need to learn about implementing weak-references. They are magic to me.

I’m not exactly sure what is the purpose of this code; is it to demonstrate what you are trying to achieve in general?

In any case, it seems that you’re looking for reference equality (as mentioned by others). According to this SO question, you could use Java’s IdentityHashMap for this purpose, and potentially convert it to a Scala immutable map.

Can we make the java IdentityHashMap into a weak-reference hash map, so that objects which are not referenced elsewhere are subject to GC? I’ve never implemented weak-hash-tables but I’ve used them in Lisp. Taking a look at java-weak-reference I admit that I don’t understand java, and I don’t know how to use this information in Scala. Am I supposed to create a Map[] in Scala of Scala objects each wrapped by this java object? Will this allow the GC to remove objects from the Map[] when they are no longer referenced elsewhere? Or does it man I need to implement some class entirely in java and then import it into my Scala program?

I am not exactly sure what you are trying to achieve, or what is exactly your use case in which your JVM’s memory explodes.