Rewriting left recursive grammars


#1

I had a parser which parsed boolean expressions correctly by having higher precedence items encoded in separate terms which somehow came together and worked.

However I’m now trying to encode a more thorough grammar which is given in left recursive format. This has the problem of stack overflow. For example in this very slimmed down version:

  def formula: Parser[Boolean] = (
      formula ~ "&&" ~ formula ^^ {case f1 ~ "&&" ~ f2 => f1 && f2 }
    | truthValue
  )

  def truthValue: Parser[Boolean]  = (
      "true" ^^ (_ => true)
    | "false" ^^ (_ => false)
  )

the formula will keep going down into itself until it blows up.

Therefore following the algorithm given on https://en.wikipedia.org/wiki/Left_recursion “Removing Direct Recursion”, I end up with this:

  def formula: Parser[Boolean] = term~formula'

  def formula' = (
                "&&"~formula'
              | "" // epsilon
            )

However, I find it hard to apply the ^^ operator since, for example the “&&”~formula’ item has no idea of what has come before it so how do I do the case f1 ~ “&&” ~ f2 element?

There is probably an easier way to solve this problem, but I’m trying to keep to the grammar as much as possible. Swapping truthValue and formula around leads to it terminating too early.

Many thanks,


#2

The transformed version looks wrong. Where’s truthValue? In any case, you can encode whatever data you have in each production and then combine it properly on the way up. Untested (and it’s been a few years since I last used Scala parser combinators):

 def formula: Parser[Boolean] = term ~ formula' ^^ { case f1 ~ p => p(f1) }

  def formula': Parser[Boolean => Boolean] = (
                "&&"~formula' ^^ { case "&&" ~ f2 => (f1 => f1 && f2)}
              | "" ^^^ identity // epsilon
            )

#3

great thanks - I’ve managed to solve it by changing the defs in the left recursive grammars to lazy vals and keeping PackratParser mixed in but your solution shows how to solve the problem using the classic rewriting rules which I couldn’t manage.

I’ve come across another problem - the grammar I am implementing has two rules which are

string~"="~string

or

term

which itself eventually goes down to

string~"="~string.

It is possible when the parser comes across the first string~"="~string to do a lookup of the first string in a map or set object in the ^^ method. If the string is not found, then it tells the parser string~"="~string has failed, and to continue trying to match term?

i.e

(string<~"=")~string ^^ { case s1~s2 => if (mySet.contains(s1)) new SomeClass(s2) else this parsing run has failed, try to match it under term}