What's the complexity of `immutable.HashSet.concat(HashSet)`?

The title says it all. I looked into it, but it’s hard to understand without a deep dive into the whole implementation. I am interested in whether an algorithm I’m thinking of is viable.

2 Likes

According to the following, adding one element takes effectively constant time (with some qualifications):
https://docs.scala-lang.org/overviews/collections-2.13/performance-characteristics.html

Well, that’s only par hash set in general. And that’s not the question I asked.

1 Like

Are we doing a technical part of a job interview here? Probably for someone to produce wireframe UIs, but we have to be stringent in our hiring. :sweat_smile:

You know from looking at the source code that it is at most to a good approximation linear in the size of the prepended collection, is that good enough for your plan? It may well be better than that, but if this answer suffices…

I had to scratch the itch and write a benchmark for this, the Scastie is here: Scastie - An interactive playground for Scala..

It’s a Scalameter benchmark, using sets of integers so the underlying hashing is constant time.

You’ll have to run it from the command line as per the Scalameter instructions here: SBT integration | ScalaMeter.

Better still, use IntelliJ to run the benchmark directly.

It’s easy to import the results from Scalameter into Excel or OpenOffice to get a regression equation.

UPDATE: New and improved benchmark:

  1. This one forces evaluation if the set pair iterator to avoid polluting the measured time with Americium’s generation of cases.
  2. It allows the sizes of the lhs and rhs of the set pairs to vary across the run set, as long as they sum to the size being measured.
  3. The size is changed exponentially.
4 Likes

Wow, that was quite a bit of work! Thanks. I kind of hoped it’s faster than linear, although without much conviction. I’ll probably have to use SortedSet, just that Ordering is cumbersome.

You may be able to get better concatenation performance if you use a fingertree. I don’t know precisely what you want to do with your sets (and what you want to put in them).

Assuming you have something orderable and has monoid addition defined for its ordering property, you can use something like this: sciss/FingerTree: A Scala implementation of the versatile purely functional data structure of the same name. - Codeberg.org.

As I recall, concatenation of fingertrees is fast, as long as the concatenated parts don’t have overlaps. Give it a try.

Bloom filters and hyperloglogs may also suit your purposes better…

2 Likes