The answer’s not nearly that simple.
First, it’s important to keep in mind that the mapValues function has essentially nothing to do with map, which is a much more general function that is implemented on many data types. mapValues is much more complex and much more specific.
And the for construct is actually a complete fiction: it’s “syntax sugar”, a syntactic structure that the compiler turns into calls to some combination of map, flatMap, foreach and withFilter under the hood. So for almost can’t be slower than map, because it’s typically made of calls to map. (But it depends on the details.)
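To make the "syntax sugar" point concrete, here is a minimal sketch of roughly what the compiler does. The object name and the sample lists are made up for illustration, and the hand-written versions are an approximation of the desugaring, not the exact code the compiler emits.

```scala
object ForDesugaring {
  // Hypothetical sample data, just for the illustration.
  val xs = List(1, 2, 3, 4)
  val ys = List(10, 20)

  // A for comprehension with two generators, a guard, and yield...
  val sugared: List[Int] =
    for {
      x <- xs
      if x % 2 == 0
      y <- ys
    } yield x * y

  // ...is roughly equivalent to writing the method calls by hand:
  // the guard becomes withFilter, the outer generator becomes flatMap,
  // and the last generator becomes map.
  val desugared: List[Int] =
    xs.withFilter(x => x % 2 == 0).flatMap(x => ys.map(y => x * y))

  // Without yield, the same shape turns into foreach calls and returns Unit.
  def printAll(): Unit =
    for (x <- xs; y <- ys) println(x * y)

  def printAllDesugared(): Unit =
    xs.foreach(x => ys.foreach(y => println(x * y)))
}
```

Both sugared and desugared build the same list, which is why comparing "for versus map" in the abstract doesn't tell you much.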
So it all depends on the exact data structures and what you are doing. A for with yield usually compiles to a call to map, but in your case it isn’t doing so (because you aren’t using the yield keyword, it compiles to foreach instead), and mapValues is very specialized.
Also, I have a suspicion that you’re not testing with large enough numbers. Using .toList to create theMultiPolygonKeys, and then calling theMultiPolygonKeys(i) over and over like that, is just about the slowest possible way to do this. It’s super-fast on short Lists, but as your list gets longer and longer, that’s a seriously O(N**2) operation. Every call to theMultiPolygonKeys(i) starts from the beginning of the list and walks i steps: very quick for 5 items, incredibly slow for a million.
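Here is a small sketch of the difference, using a made-up list of keys (the object and method names are hypothetical, not from your code). All three methods produce the same result; what changes is how many steps it takes to get there.

```scala
object ListIndexing {
  // Hypothetical stand-in for a list of keys.
  val keys: List[String] = List("a", "b", "c", "d")

  // Indexed access on a List: keys(i) walks i links from the head every time,
  // so the loop as a whole takes on the order of N*N/2 steps.
  def joinByIndex(): String = {
    val sb = new StringBuilder
    for (i <- 0 until keys.length) sb.append(keys(i))
    sb.toString
  }

  // Direct iteration visits each element exactly once: N steps in total.
  def joinByIteration(): String = {
    val sb = new StringBuilder
    for (key <- keys) sb.append(key)
    sb.toString
  }

  // If you really do need indexes, a Vector has effectively constant-time apply.
  def joinByVectorIndex(): String = {
    val v = keys.toVector
    val sb = new StringBuilder
    for (i <- v.indices) sb.append(v(i))
    sb.toString
  }
}
```

With four elements you won't be able to measure any difference; the gap only shows up once the list gets large.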
But also note, almost nobody writes Scala code like this. This way of using for, with indexes for a loop, is common in many languages but fairly rare in Scala, and pretty much never used with Lists. You’d instead say:
for (key <- multiPolygons.keys.toList) {
  val geom = multiPolygons.get(key.toString).get.geom
  val histogram = rasterTileLayerRDD.polygonalHistogram(geom)
  val theStats = histogram.statistics
  // key is bound directly by the for clause, so no index is needed
  println(key.toString, theStats)
}
The two primary differences are:
- You rarely work with indexes. You just assign the desired value directly in your for clause.
- You usually should favor val unless there is a specific reason to use var. (Which is actually a little unusual.)
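As a rough illustration of both points together, here is the same computation written twice. The data, the object name, and the method names are all invented for the example; they are not from your code or from any library.

```scala
object ValOverVar {
  // Hypothetical stand-in data: key -> measurements.
  val readings: Map[String, List[Double]] =
    Map("a" -> List(1.0, 3.0), "b" -> List(2.0, 4.0, 6.0))

  // Index-and-var style, as you might write it coming from another language.
  def meansWithVar(): Map[String, Double] = {
    val keys = readings.keys.toList
    var result = Map.empty[String, Double]
    var i = 0
    while (i < keys.length) {
      val values = readings(keys(i))
      result = result + (keys(i) -> (values.sum / values.length))
      i += 1
    }
    result
  }

  // The same computation with no indexes and no vars:
  // the for clause binds key and values directly, and yield builds the result.
  def meansWithVal(): Map[String, Double] =
    for ((key, values) <- readings) yield key -> (values.sum / values.length)
}
```

The second version is shorter, has nothing to get out of sync, and never indexes into a List.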
Hope this helps. Getting a realistic sense for how fast different operations are is really important, but the honest answer is usually going to be, “It depends on the details”. That’s why there are so many different data structures – each is better at some things, worse at others.