For with hash map reference

I find it frightfully convenient, even unsettling so, how elegantly the for comprehension works with hash map references. This is especially true using a data structure which is a hash of maybe hashes of maybe hashes etc. A hash map reference using get returns an Option, and map or flatMap automatically short-circuit the evaluation if the Option is None. This can replace many lines of logic which I’ve repeated several times in my application. For example The following two expressions are equivalent. At least as far as I can tell, but it took me a long time to convince myself that they really are.

QUESTION: Does everyone agree that the for comprehension is better? Or is it too terse? Which version of the function is easier to understand?

def removeClause(removeMe: Clause, numPos: Int, length: Int, rectified:Clause): Unit = {
  hash.get(numPos) match {
    case None => Unit
    case Some(lengthHash) =>
      lengthHash.get(length) match {
        case None => Unit
        case Some(rectHash) => {
          rectHash.get(rectified) match {
            case None => Unit
            case Some(_) => rectHash(rectified) -= removeMe
          }
        }
      }
  }
}

the same as

def removeClause(removeMe: Clause, numPos: Int, length: Int, rectified:Clause): Unit = {
  for{
    lengthHash <- hash.get(numPos)
    rectHash <- lengthHash.get(length)
    _ <- rectHash.get(rectified)
  } rectHash(rectified) -= removeMe
}

The shortcircuit to None is awesome. Thanks for sharing this example.
The for comprehension makes it more expressive and concise.

I wonder whether the’s a similar type of shortcut notation for the addClause method? In this case there is important action to take when None is found.

  // Add the given clause, in the correct place, into hash.   This function,
  // addClause, takes redundant information in its arguments.  We assume this redundant
  // information is correct (not contradictory).  Use other arity versions of addClause
  // in the case you need to calculate posCount, length, or rectified.
  // * posCount is the number of positive integers in addMe
  // * length is the length of addMe
  // * rectified is a Clause of all positive integers.  
  //        E.g., if addMe=List(1,-2,-3), then rectified=List(1,2,3)
  def addClause(addMe: Clause, posCount: Int, length: Int, rectified:Clause): Unit = {
    hash.get(posCount) match {
      // If there is no mapping yet for posCount, then create one.
      case None => hash += (posCount -> new HashMap)
      case _ => Unit
    }
    hash(posCount).get(length) match {
      // If there is no mapping yet for length, then create one.
      case None => hash(posCount) += (length -> new HashMap)
      case _ => Unit
    }
    val rectHash:HashMap[Clause,Set[Clause]] = hash(posCount)(length)
    rectHash.get(rectified) match {
      // If there is not Set[Clause] yet for rectified, then create one by adding
      //    the mapping from rectified to Set(addMe); otherwise add addMe to the
      //    existing set of clause.  The Set operation is non-destructive, but
      //    but the new set is destructively added to the existing Mash Map
      case None => rectHash += (rectified -> Set(addMe))
      case _ => rectHash(rectified) += addMe
    }
  }

There isn’t any special syntax for handling the None case (as it is specific to Option, while for comprehensions work with any Monad).
But for your addClause method, there is a much shorter variant using methods from Map:

  def addClause(addMe: Clause, posCount: Int, length: Int, rectified:Clause): Unit =
    hash.getOrElseUpdate(posCount, new HashMap)
            .getOrElseUpdate(length, new HashMap)
            .getOrElseUpdate(rectified, Set()) += addMe

This uses getOrElseUpdate, which works like getOrElse, but also stores the returned default value in the map.

That’s great. I didn’t know about getOrElseUpdate, but it unfortunately doesn’t compile:

Error:(51, 42) value += is not a member of Set[dimacs.dimacsSimplify.Clause]
  Expression does not convert to assignment because receiver is not assignable.
      .getOrElseUpdate(rectified, Set()) += addMe

Frankly, I think the question is ill-formed: it’s very much a matter of taste. I strongly prefer for (when appropriate) myself, but I know many folks who prefer to spell it out.

That said, your first version, with all the .gets and matches – I don’t know anybody who does it that way when the None clauses just reduce to Unit like this. That’s kind of pointlessly wordy, and obscures the intent. The common alternative to for is to spell out the flatMaps and maps yourself – precisely the same code, just expressed differently.

Hmm. The error message suggests that it’s using immutable.Set, where you clearly require mutable.Set. You might try changing that Set() to mutable.Set(), although I’m really surprised you aren’t hitting other compile errors as well – are you sure that your data structure is declared with the mutable version?

I’m using immutable Sets. When I add a Clause to the hash table, I find the key/value pair in the Hash Map, the value is a Set[Clause], and I add the new Clause non-destructively to the Set, and destructively mute the Hash Map to reference that Set. At least that’s what my original code does. By the way, this confusion is part of why I find the original code difficult to read—it is not clear what is getting mutated.

I did wonder whether I’d gain some speed if I changed the Sets to be mutable. That’s on my list of things to test in optimizing the code.

Ah – okay, that explains why @crater2150’s suggestion didn’t work. Honestly, mixing mutable and immutable like this can be pretty hard to read and comprehend (as is attested by the fact that two of us read that code fairly carefully and didn’t understand what’s going on). You’re going to need to rewrite his last line to take that into account, but offhand I’m not quite sure of the best way to do that.

1 Like

I still don’t understand how the line in the original code:

case _ => rectHash(rectified) += addMe

worked with an immutable Set. Is += somehow overloaded on mutable maps to allow this?

For me the original line makes sense. rectHash(rectified) += addMe has the same semantics as rectHash(rectified) = rechHash(rectified) + addMe. The + simply creates a new Set[Clause] by adding addMe non-destructively to the Set which is the value of rectHash(rectified). Then the = simply destructively updates the Hash Map. Is that mysterious?

The following works, but I’m not sure I’m happy with it.

  def addClause(addMe: Clause, posCount: Int, length: Int, rectified:Clause): Unit = {
    hash.getOrElseUpdate(posCount, new HashMap)
      .getOrElseUpdate(length, new HashMap)
      .getOrElseUpdate(rectified, Set())

      hash(posCount)(length)(rectified) += addMe
  }

Although it would be nice to execute the last line hash(posCount)(length)(rectified) += addMe if and only if the ElseUpdate part didn’t run. Then rather than initializing the size with empty Set(), I could initialize with Set(addMe) and skip the 3 hash table dereferences and unnecessary Set + Set operation.

  def addClause(addMe: Clause, posCount: Int, length: Int, rectified:Clause): Unit = {
    hash.getOrElseUpdate(posCount, new HashMap)
      .getOrElseUpdate(length, new HashMap)
      .getOrElseUpdate(rectified, Set(addMe))

      // magically execute the following only if the ElseUpdate part didn't evaluate.
      hash(posCount)(length)(rectified) += addMe
  }

Perhaps the following is not so ugly?

  def addClause(addMe: Clause, posCount: Int, length: Int, rectified:Clause): Unit = {
    val lengthHash = hash.getOrElseUpdate(posCount, new HashMap)
    val rectHash = lengthHash.getOrElseUpdate(length, new HashMap)

    rectHash(rectified) = rectHash.get(rectified) match {
      case None =>  Set(addMe)
      case Some(clauses) => clauses + addMe
    }
  }

I understood the intent of the code correctly and also think it’s clear what the result is. As someone nearly never using mutable maps I just did not know that map(key) = value is allowed syntactically at all.

I looked it up, and it seems to be syntactic sugar for a call to map.update(key, value), just like map(key) is sugar for map.apply(key) (when used with +=, a combination of both is used). This also explains, why it does not work with getOrElseUpdate.

If you want to omit additional hash dereferences, you could also use the updateWith method, which also omits a second key lookup on insert:

  def addClause(addMe: Clause, posCount: Int, length: Int, rectified:Clause): Unit = {
    val lengthHash = hash.getOrElseUpdate(posCount, new HashMap)
    val rectHash = lengthHash.getOrElseUpdate(length, new HashMap)

    rectHash.updateWith(rectified) {
      case None =>  Set(addMe)
      case Some(clauses) => clauses + addMe
    }
    // alternatively
    rectHash.updateWith(rectified)(v => v.map(_ + addMe).getOrElse(Set(addMe))
  }

+= works for immutable sets as long as it’s a var, because then

set += newElement

is rewritten to

set = set + newElement

For example:

**Welcome to Scala 2.12.8 (OpenJDK 64-Bit Server VM, Java 11.0.3).
Type in expressions for evaluation. Or try :help.

var set = Set(1,2,3)
set: scala.collection.immutable.Set[Int] = Set(1, 2, 3)
set += 4
set
res1: scala.collection.immutable.Set[Int] = Set(1, 2, 3, 4)**

I do find, though, that the typechecker sometimes trips over it until the type annotations are added.

If you already know the code reasonably well, no – but that’s the rub. Someone who doesn’t know it well looks at this code, seems the Unit return type and the obviously-mutable calls above it, and tends to assume that it’s mutable all the way down.

It’s not spaghetti per se, but I do think it might be challenging for a disparate group to maintain…

I knew it worked for vars, the automatic rewrite to update in case of mutable data structures is what was new to me. Which totally makes sense for mutable data structures, but in case of nested structures, like here it wasn’t obvious to me, that it applies to the Map and not the Set inside.

(And the first code I posted actually works with mutable.Set imported)