I’ve found myself in need of parsing strings of the form:
r + e + c + u - r - s - i - o - n t + o - o - e + a - s - y
r + e - c - u - r - s + i + o - n t - o + o - e + a - s - y
r + e - c + u + r - s - i - o + n t + o - o + e - a - s + y
r - e + c + u - r + s - i + o - n t - o + o + e - a - s + y
t - o - o - e + a + s + y
r - e - c - u + r - s - i + o + n t + o - o - e - a + s - y
t - o + o - e - a + s - y
I decided to try using parsing combinators:
class ParseAll extends RegexParsers:
override val whiteSpace: Regex = """\h+""".r
def nl: Parser[Any] = """\v""".r // one newline
def nl2: Parser[Any] = """\v{2,}""".r // two or more newlines
def all: Parser[Set[Set[Int]]] = repsep(par, nl2) <~ rep(nl) ^^ (_.toSet) // sequence of paragraphs
def par: Parser[Set[Int]] = rep1sep(line, nl) ^^ (_.toSet.flatten) // sequence of lines
def line: Parser[Set[Int]] = repNM(1, 2, sum) ^^ (_.toSet) // 1 or 2 sums
def sum: Parser[Int] = chainl1(letter, plus | minus)
def plus: Parser[(Int, Int) => Int] = "+" ^^ (_ => (x: Int, y: Int) => x + y)
def minus: Parser[(Int, Int) => Int] = "-" ^^ (_ => (x: Int, y: Int) => x - y)
def letter: Parser[Int] = """\p{Alpha}""".r ^^ (_.head.toInt) // single letter
This is concise and does exactly what I need. (The columns in the input are actually meaningful but I plan to write another parser for that.) My question is more of a matter of style and use of features. Is this the right way to handle the fact that blank lines are meaningful in the input? For instance, the <~ rep(nl)
in the first rule looks weird to me, but I need to handle potential newlines at the end of the file. Also, is there a more generic way of saying: “I want repetitions to be parsed as sets instead of sequences”?
Thanks in advance for any feedback.