Here comes another beginner’s question: I am given a multiline String containing one key-value pair per line. Now I’d like to parse the String and generate a case class object from it. Assume the following case class:
case class Something(key1: String, key2: String, key3: Boolean)
and a String like:
key2: value2
key1: value1
key4: value4
key3: no
As can be seen, the String contains some hurdles:
key-value pairs might not occur in a defined order.
whitespace and newlines might occur more often than required
there might be keys that can be ignored when the object is constructed
there may be a need for implicit conversions, such as key3 = yes -> true, anything else -> false
My question is: what is the most beautiful, Scala-like approach to parsing this? I considered using pattern matching, but using it in a loop, I would need to construct an “empty object” up front whose setters are called during parsing. This is not very Scala-like.
It doesn’t require the Map to be mutable. You can build a Seq[(String, String)] and call toMap on it. That whole process can be done in a way that is functional and immutable.
What I’m wondering is whether other people would use regular expressions for this. Given the variability of the input, that is my first inclination. Then do a for-yield with a pattern on the regular expression.
Not at all – you’re parsing in a line-by-line loop, with each line creating a new immutable Map based on the previous Map and the new information. It’s very rarely necessary to use a mutable Map…
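To make the "new immutable Map per line" idea concrete, here is a minimal sketch using foldLeft. The object name FoldParse, the sample input, and the choice to split each line at the first colon are my assumptions, not something from the posts above.

```scala
object FoldParse {
  // Hypothetical sample input, modelled on the question (extra blanks added).
  val serialized: String =
    """key2: value2
      |key1:   value1
      |
      |key4: value4
      |key3: no""".stripMargin

  // Each step returns a new immutable Map; nothing is ever mutated.
  def parse(input: String): Map[String, String] =
    input.split("\n").foldLeft(Map.empty[String, String]) { (acc, line) =>
      // Split at the first colon only, so values may contain colons.
      line.split(":", 2) match {
        case Array(k, v) => acc + (k.trim -> v.trim)
        case _           => acc // blank or malformed line: keep the map as it is
      }
    }
}
```

Calling FoldParse.parse(FoldParse.serialized) gives a Map[String, String] with one entry per well-formed line. Picking out the keys you care about is a separate step.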
Personally, I tend to just go directly to FastParse. But you’re probably right that that’s overkill in this case, and that doing a line-by-line regex is good enough…
You can’t do that with the for-comprehension, but you can use collect on the collection instead. It takes a partial function (e.g. a pattern match that does not match all cases) and returns only the values that matched.
For this special case, Scala has a method lines for Strings, which splits a string at \n. It returns an Iterator instead of an array, which may be faster (not storing the intermediate result), but the split should be negligible performance-wise anyway.
With the for-comprehension, you would have to wrap the whole comprehension in parens to add a .toMap afterwards. But using collect, the whole thing becomes a one-liner:
serialized.lines.collect{ case pattern(key, value) => (key, value) }.toMap
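The one-liner relies on a pattern that is never spelled out in this thread, so here is a rough end-to-end sketch. The regex, the object name CollectParse, and the helper toSomething are assumptions of mine. It uses split("\n") rather than lines so that it does not depend on which lines method is in scope, and it treats only "yes" (in any case) as true.

```scala
import scala.util.matching.Regex

case class Something(key1: String, key2: String, key3: Boolean)

object CollectParse {
  // Assumed line format: optional whitespace, a key, a colon, then the value.
  // A Regex used as an extractor must match the whole line.
  val pattern: Regex = """\s*(\w+)\s*:\s*(.*?)\s*""".r

  // Lines that do not match the pattern (e.g. blank lines) are dropped by collect.
  def toMap(serialized: String): Map[String, String] =
    serialized.split("\n").collect { case pattern(key, value) => (key, value) }.toMap

  // Returns None if a required key is missing; extra keys such as key4 are ignored.
  def toSomething(serialized: String): Option[Something] = {
    val kv = toMap(serialized)
    for {
      k1 <- kv.get("key1")
      k2 <- kv.get("key2")
      k3 <- kv.get("key3")
    } yield Something(k1, k2, k3.equalsIgnoreCase("yes"))
  }
}
```

Returning an Option is just one way to handle missing keys. If defaults make more sense for your data, kv.getOrElse would work as well.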
You can do this with a for-comprehension. One of the features of for-comprehensions (which has recently been debated on these boards) is that it simply skips anything that isn’t a match. So you can do the following.
val resultMap = (for(pattern(key, value) <- serialized.split("\n")) yield (key, value)).toMap
Any line that doesn’t match the pattern is excluded. Whether you prefer this syntax to the collect is up to you. Using @crater2150’s lines trick works here too. (I hadn’t seen that method in the API for String. I just used it for Source in the past.)
val resultMap = (for(pattern(key, value) <- serialized.lines) yield (key, value)).toMap
(Aside: this is now JDK version dependent — String acquired a built-in lines method on JDK 11. We undeprecated .linesIterator in Scala 2.12.8 to avoid conflicting with the new method.)
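Given that aside, a version written against linesIterator should behave the same regardless of the JDK. This is only a sketch: the object name LinesIteratorParse and the regex are my assumptions, and on Scala versions before 2.12.8 linesIterator still works but may produce a deprecation warning.

```scala
object LinesIteratorParse {
  // Assumed line format: key, colon, value, with optional surrounding whitespace.
  private val pattern = """\s*(\w+)\s*:\s*(.*?)\s*""".r

  // linesIterator is the Scala method on strings, so it is not shadowed
  // by the java.lang.String.lines method that exists on JDK 11 and later.
  def parse(serialized: String): Map[String, String] =
    serialized.linesIterator.collect { case pattern(k, v) => (k, v) }.toMap
}
```

On JDK 11 and later, serialized.lines picks up the Java method, which returns a java.util.stream.Stream and so has no collect that takes a partial function. That is why the earlier one-liner may stop compiling there.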