Fast filter over a list of json strings

Hi All.

From time to time I have to read a file that contains a list of simple JSON maps. I do not always know what keys will be in each map. Once I load it, I need to perform a bunch of operations, like filtering for a certain field having a specific value. Naturally, there is no database-like index on any of the fields in the maps stored in the list. I could of course load it into a database, but the data is somewhat free-form, so I was wondering: is there a way to dramatically speed up operations like List(…).filter(…).filter(…).findAny() by performing some kind of indexing operation on the list?

Three ideas come to my mind.

  • Do all the operations lazily so you avoid extra traversals of the data, e.g. by using an Iterator or a Stream (like fs2, akka, monix, zio).
  • Do not load the whole file at once; process it in batches instead. Again, this could be done using a Stream.
  • If your main operations are on the first level of nesting, parse your data into a Map[String, Json] so you can take advantage of fast access by key.
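
To make the three ideas concrete, here is a minimal sketch using only the standard library. Everything in it is an assumption for illustration: `LazyFilter`, `Record`, `findFirst`, `countMatches` and `toyParse` are hypothetical names, values are plain strings instead of a real `Json` type, and `toyParse` is a toy stand-in for whatever JSON decoder you actually use.

```scala
object LazyFilter {
  // Stand-in for one parsed JSON object: first-level keys to values.
  // Values are plain strings only to keep the sketch dependency-free.
  type Record = Map[String, String]

  // True when the record has every requested key with the requested value.
  // Each check is a hash lookup on the Map, not a scan of the fields.
  def matches(rec: Record, criteria: Seq[(String, String)]): Boolean =
    criteria.forall { case (k, v) => rec.get(k).contains(v) }

  // Parses and tests one line at a time. Iterator.map and find are lazy,
  // so nothing after the first matching line is parsed.
  def findFirst(lines: Iterator[String], parse: String => Record)(
      criteria: (String, String)*): Option[Record] =
    lines.map(parse).find(rec => matches(rec, criteria))

  // Counts matches while holding at most `batchSize` parsed records at once.
  def countMatches(lines: Iterator[String], parse: String => Record, batchSize: Int)(
      criteria: (String, String)*): Int =
    lines.grouped(batchSize)
      .map(batch => batch.map(parse).count(rec => matches(rec, criteria)))
      .sum

  // Toy parser for lines such as "id=2,status=open" (hypothetical format);
  // replace it with a real JSON decoder.
  def toyParse(line: String): Record =
    line.split(',').iterator
      .map(_.split("=", 2))
      .collect { case Array(k, v) => k.trim -> v.trim }
      .toMap
}
```

For example, `LazyFilter.findFirst(lines, LazyFilter.toyParse)("status" -> "open")` stops parsing as soon as one record matches, which is where most of the saving over List(…).filter(…).filter(…) comes from. Note that none of this builds an index; it only avoids doing work that is not needed.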

Note: I personally would use circe-fs2 for this, which would cover all three points above.

This could probably be done super-efficiently in weePickle by building a custom Visitor that understands the data structure and does the filtering while the JSON is being parsed, but that’s a good deal of effort.