Scala sum/iterator bug?

I have a mutable.Map, which is passed to a method as a collection.Map[String, PathData] argument.

I wanted to find the average length of the keys, so I used the following sub expression (subsequently divided by the map size):

pathMap.keys.map(_.length).sum

However, this gives the wrong answer (it is much too low). Converting it to a sequence first then gives the right answer:

pathMap.keys.toSeq.map(_.length).sum

Assuming that I’m not mutating the mutable map (which I’m not), is it reasonable for these to return different answers, or is this just a bug?

This is an Ammonite script, and I presume that I’m using the latest, or a very recent, 2.13.x version.

E.g.,

println(pathMap.keys.toSeq.map(_.length).sum)
println(pathMap.keys.map(_.length).sum)

//prints 
1307484
40665

scala> val m = scala.collection.mutable.Map("a" -> 1, "b" -> 2, "c" -> 3)
val m: scala.collection.mutable.Map[String,Int] = HashMap(a -> 1, b -> 2, c -> 3)

scala> m.keys
val res0: Iterable[String] = Set(a, b, c)

scala> m.keys.map(_.length)
val res1: Iterable[Int] = Set(1)

The joys of collection subtyping. :woozy_face:
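A small sketch of what is going on, assuming Scala 2.13 and run as a script (for example in Ammonite); the map contents are made up for illustration. The static type of `m.keys` is `Iterable[String]`, but the object behind it is a `Set` at runtime, and `map` builds its result with that runtime collection's own factory. Duplicate lengths are therefore dropped before `sum` ever sees them.

```scala
import scala.collection.mutable

val m = mutable.Map("a" -> 1, "bb" -> 2, "cc" -> 3)

// Statically typed as Iterable[String], but backed by the map's key set.
val ks: Iterable[String] = m.keys
val keysAreASet = ks.isInstanceOf[scala.collection.Set[_]]

// map uses the runtime collection's factory, so the result is also a Set:
// the lengths 1, 2, 2 collapse to {1, 2}.
val lengthsViaKeys = ks.map(_.length)
val sumViaKeys = lengthsViaKeys.sum // 1 + 2 = 3

// Converting to a Seq first keeps the duplicates: 1 + 2 + 2 = 5.
val sumViaSeq = ks.toSeq.map(_.length).sum
```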


Haha, surprising behaviour of careful design! It is logical that keys of a Map form a Set. But of course their lengths do not. It’s an easy mistake to make.

Ah yes, thanks for the explanation.

I think that map applied to the elements of a set returning a set is somewhat surprising. It is even more so because, if you examine the types in the call flow, you never see the type as a set at all…

Well, #map() (derived from the Functor concept) usually returns the “self” type. But the behavior is somewhat surprising (or not: this is exactly why there cannot be a proper Functor for Set).
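The “self type” behaviour can be seen in a short sketch (Scala 2.13 script; the values are illustrative): `map` on a `List` gives a `List`, on a `Set` gives a `Set`, and the choice follows the runtime collection rather than the declared type. With a function that maps different elements to the same value, as `_.length` does here, the `Set` result is smaller than the input.

```scala
// map keeps the "shape" of the collection it is called on.
val fromList = List("a", "bb", "cc").map(_.length) // List(1, 2, 2)
val fromSet  = Set("a", "bb", "cc").map(_.length)  // Set(1, 2)

// The declared type only says Iterable, but the runtime collection is a Set,
// and its factory decides what map builds.
val asIterable: Iterable[String] = Set("a", "bb", "cc")
val fromIterable = asIterable.map(_.length)

// Because "bb" and "cc" have the same length, the Set results shrink,
// so "map then sum" no longer counts every element.
val sizes = (fromList.size, fromSet.size, fromIterable.size) // (3, 2, 2)
```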

Agreed. I don’t see why Map#keys() doesn’t declare Set as a return type (or why it exists at all, as there is #keySet()) - probably just legacy/historical coincidence.
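To compare the two declared types, here is a sketch assuming Scala 2.13, where `keySet` is declared as returning a `collection.Set` and `keys` as returning an `Iterable`. In this example both expose the same keys, and both deduplicate on `map`; only `keySet` advertises that in its type.

```scala
import scala.collection.mutable

val m = mutable.Map("a" -> 1, "bb" -> 2, "cc" -> 3)

// keySet says it is a Set, so deduplication on map is expected from the type.
val viaKeySet: scala.collection.Set[String] = m.keySet
val keySetLengths: scala.collection.Set[Int] = viaKeySet.map(_.length)

// keys only promises an Iterable, yet it holds the same elements
// and maps to the same deduplicated result.
val viaKeys: Iterable[String] = m.keys
val keysLengths: Iterable[Int] = viaKeys.map(_.length)
```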

For your use case, I’d just go for this:

m.keysIterator.map(_.length).sum
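Applied to the original goal, the average key length, this could look like the sketch below (Scala 2.13). The helper name `averageKeyLength` is hypothetical, the value type is left generic in place of `PathData`, and returning `0.0` for an empty map is an assumption made to avoid dividing by zero. An `Iterator`'s `map` never deduplicates, so keys of equal length each contribute to the total; the lengths are widened to `Long` so that the sum cannot overflow `Int` on very large maps.

```scala
import scala.collection.mutable

// Hypothetical helper: average key length of any collection.Map.
// keysIterator yields every key once, and Iterator#map keeps duplicates.
def averageKeyLength[V](pathMap: scala.collection.Map[String, V]): Double =
  if (pathMap.isEmpty) 0.0 // assumption: treat an empty map as average 0
  else pathMap.keysIterator.map(_.length.toLong).sum.toDouble / pathMap.size

val m = mutable.Map("a" -> 1, "bb" -> 2, "cc" -> 3)
val avg = averageKeyLength(m) // (1 + 2 + 2) / 3
```

Going through `.toSeq` (as in the original question) or `.view` before `map` also avoids the deduplication, since neither result is a `Set`.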

Yes, this is what I had changed my code to.

I don’t write Scala regularly for my day job, so I tend to dip in and out (though it has been about 10 years now…). I have a feeling that I have hit this before (or perhaps just seen someone talk about it), but I didn’t remember that it was an issue.

I think that doing the set aggregation/conversion in a method whose declared return type is Iterable is just confusing/surprising.

Maybe there are good, sound theoretical reasons why it should be this way, but it still feels like people will occasionally trip over this.

Regards,
Rob