From time to time I have to read a file that contains a list of simple JSON maps. I do not always know what keys will be in those maps. Once I load the file, I need to perform a bunch of operations, like filtering for a certain field having a specific value, and so on. Naturally there is no database-like index on any of the fields of the maps stored in the list. I could of course load the data into a database, but again it is somewhat free-form, so I was wondering: is there a way to dramatically speed up operations like List(…).filter(…).filter(…).findAny() by performing some kind of indexing operation on the list?
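One common form of the "indexing operation" asked about here is a secondary index: group the records once by the value of a single field, after which an equality lookup on that field is one map lookup instead of a full scan. The sketch below is minimal and makes some assumptions: each record has already been parsed into a flat `Map[String, String]` (a simplification standing in for a real JSON type such as circe's `Json`), the `Record` alias, `buildIndex`, and the sample field names are hypothetical, and `groupMap` requires Scala 2.13 or later.

```scala
// Hypothetical simplified record type: a flat JSON object whose values are strings.
type Record = Map[String, String]

// Builds an index for one field: field value -> all records carrying that value.
// Records that lack the field are left out of the index.
def buildIndex(records: List[Record], field: String): Map[String, List[Record]] =
  records
    .flatMap(r => r.get(field).map(value => value -> r))
    .groupMap(_._1)(_._2)

val records: List[Record] = List(
  Map("id" -> "1", "status" -> "active", "country" -> "NL"),
  Map("id" -> "2", "status" -> "inactive", "country" -> "DE"),
  Map("id" -> "3", "status" -> "active"),
  Map("id" -> "4", "country" -> "NL")
)

// One O(n) pass builds the index...
val byStatus: Map[String, List[Record]] = buildIndex(records, "status")

// ...then each lookup is a hash-map access, and any further filters
// only scan the (hopefully small) matching bucket.
val activeInNl: Option[Record] =
  byStatus.getOrElse("active", Nil).find(_.get("country").contains("NL"))
```

The trade-off is an extra pass plus memory for every indexed field, so this only pays off when you run many lookups against the same field; for a single filter-and-find, a plain scan is cheaper.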
Three ideas come to my mind.
- Do all the operations lazily so you can avoid extra traversals of the data, e.g. using an Iterator or a Stream (like fs2, akka, monix, zio).
- Do not load the whole file at once; rather, process it in batches. Again, this could be done using a Stream.
- If your main operations are on the first level of nesting, parse your data into a Map[String, Json] so you can take advantage of fast access by key.
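The three ideas above can be combined using only the Scala standard library (2.13+): read the input lazily as an `Iterator`, parse each element into a map keyed by field name, and chunk the records with `grouped`. This is a minimal sketch under stated assumptions: the input is newline-delimited with one flat JSON object per line, values are kept as `String` rather than a JSON type, and the naive `parseFlatObject` helper is hypothetical, standing in for a real parser such as circe (it does not handle nesting, escapes, or commas inside strings).

```scala
import scala.io.Source

// Hypothetical, deliberately naive parser for one flat object like {"a":"1","b":"2"}.
// Not a real JSON parser: no nesting, no escapes, no commas/colons inside keys.
def unquote(s: String): String = s.trim.stripPrefix("\"").stripSuffix("\"")

def parseFlatObject(line: String): Map[String, String] =
  line.trim.stripPrefix("{").stripSuffix("}")
    .split(",")
    .iterator
    .map(_.split(":", 2))
    .collect { case Array(k, v) => unquote(k) -> unquote(v) }
    .toMap

val sample =
  """{"id":"1","status":"inactive","country":"NL"}
    |{"id":"2","status":"active","country":"DE"}
    |{"id":"3","status":"active","country":"NL"}
    |{"id":"4","status":"active","country":"NL"}""".stripMargin

// Counts how many lines were actually parsed, to make the laziness visible.
var parsed = 0

// Lazy version of .filter(...).filter(...).findAny(): lines are read and parsed
// one at a time, and nothing after the first match is touched.
val firstMatch: Option[Map[String, String]] =
  Source.fromString(sample).getLines()
    .map { line => parsed += 1; parseFlatObject(line) }
    .filter(_.get("status").contains("active"))
    .filter(_.get("country").contains("NL"))
    .nextOption()

// Batching: pull records in chunks of `batchSize` (e.g. to bulk-insert them),
// holding only one chunk of parsed records in memory at a time.
def activeCountPerBatch(lines: Iterator[String], batchSize: Int): List[Int] =
  lines
    .map(parseFlatObject)
    .grouped(batchSize)
    .map(batch => batch.count(_.get("status").contains("active")))
    .toList
```

Here `firstMatch` stops after parsing three of the four lines. For a real file, `Source.fromFile(path)` can replace `Source.fromString(sample)`, ideally wrapped in `scala.util.Using` so the file gets closed.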
Note: I personally would use circe-fs2 for this, which would give me the three previous points.
This could probably be done super-efficiently in weePickle by building a custom Visitor that understands the data structure and does the filtering while the JSON is being parsed, but that's a good deal of effort.