With-input-from-string

jducoeur · July 2, 2019, 4:41pm

Oh, scala-parser-combinators works just fine – I’m still using it in production. It’s just very slow compared to many of the other options…

sangamon · July 2, 2019, 5:23pm

Maybe there is some FastParse feature that helps with this - I don’t know the library, yet. If not, there’s still the option of transforming the grammar. That’s somewhat inconvenient and likely to turn out non-trivial in the general case, but I guess left-recursive grammars pose problems for handcrafted parsers, as well.

curoli · July 3, 2019, 7:27am

Well, there is an example right at the beginning of the FastParse hompage that involves indirect left-recursion.

So, FastParse seems to be able to do left-recursion in some cases - the problem is understanding when it works and when it doesn’t.

sangamon · July 3, 2019, 9:14am

There’s no left recursion there, just repetition. Left associativity is realized inside #eval() (which can be found further down on the page). Switching to #foldRight() should make it right associative. This is a perfectly fine restructuring that yields the intended result, it’s just different than encoding it directly in the grammar.

curoli · July 3, 2019, 10:54am

Ok, the FastParse doc example is not strictly left-recursive, because factor is number|parens. But then, my failing example is not strictly left-recursive either, because expression is term|sum.

I can see the doc example, thanks to repetition patterns in the grammar, can consume 1+2+3 without descending for the extra +, while my example requires descending.

It is still not clear why my example fails, though. The FastParse docs explicitly state that mutually recursive parsers are supported and realized via backtracking. Shouldn’t is parse after a single, simple backtrack?

sangamon · July 3, 2019, 11:27am

The FastParse example is recursive, but not left - the only way to enter the recursion requires a leftmost literal “(” before the recursive application.

In your example, if you inline sum, you get expression := ... | expression ....

curoli · July 3, 2019, 12:50pm

Oh, I see. I thought to be left-recursive, it needs to be to the left of the |, but that is actually not necessary. So, as you said, the FastParse doc example is not left-recursive, but my example is.

The FastParse result is still mysterious, since a simple descend and backtrack should work on “a+a”, but I realize now my grammar is problematic, because other inputs may cause indefinite descend (I think “b” would). Writing a grammar is definitely less trivial than I thought!

In that case, FastParse can be forgiven to fail on my flawed grammar, even if the failure mode is puzzling - perhaps some obscure optimization.

I think at some point I’ll try FastParse again with a better grammar.

davips · July 22, 2020, 6:39am

I am also trying to avoid/handle left-recursion in fastparse.
I found it intriguing people were insisting into putting it as a replacement for the old combinators library.
I’ve put a question on SO if anyone is interested in helping.

curoli · July 22, 2020, 11:35am

To clarify, left-recursive means that a term can recursively call itself without first consuming anything, but not necessarily as the first choice. So the first comment on SO may not solve the problem.

The first example on the FastParse site actually is a parser for math expressions that avoids left-recursion.

The trick to parse expressions consisting of binary infix operators is to divide your grammar terms into different levels according to how tightly they are bound, i.e. according to how many operators will treat them as a unit.

The highest level are unbreakable terms like numbers and things in parentheses that will remain a unit regardless of which operators are before or after.

The second highest level are expressions connected by operators of highest precedence, then the next level are for operators of next highest precedence and so on.

In the FastParse example, the highest level is called factor (numbers and things in parentheses), the second highest ist called divMul (division and multiplication) and the third is called addSub (addition and subtraction).

The trick to avoid recursion is to build expressions of each level by only using terms of higher level. For example, addSub only contains divMul terms and divMul only contains factor terms.

Instead of defining terms as binary, they are defined as operand followed by any number of operator-operand. Thereby addSub captures not only 1+1, but also 1 and 1+1+1. Note that operators of the same precedence are combined into the same rule, so addSub also captures 1+1-1+1-1.

To figure out that 1+1+1 means effectively (1+1)+1 in this example is not done by the parser, but the evaluation after the parsing.

I suppose it is possible to write a grammar that breaks everything down into binary expressions through the parser, but it would require quite a bit more rules.

Jasper-M · July 22, 2020, 12:44pm

A key part here is also that instead of recursing on itself, the parser rule uses the rep combinator.

So instead of

def num[_: P] = P(CharIn("0-9").rep(1).!).map(n => Num(n.toInt))

def div[_: P] = P(expr ~ "/" ~ expr).map(Div.tupled)

def expr[_: P]: P[Expr] = P(div | num)

You’d get

def num[_: P] = P(CharIn("0-9").rep(1).!).map(n => Num(n.toInt))

def div[_: P] = P(num ~ ("/" ~ num).rep).map{case (a,b) => b.foldLeft(a: Expr)(Div)}

davips · July 22, 2020, 5:49pm

Thanks for both replies.
I suppose they solve pretty well arithmetic cases, but I am unsure if, as long as I put lambdas/function application/etc, things will get really complicated for the purpose of using a combinator library.
Perhaps I should stick to the slow(?) old packrat implementation, or put some state inside the defs to detect the loop… I just couldn’t figure out how.

curoli · July 22, 2020, 6:10pm

Packrat parsers don’t work for left-recursive grammar. There are other types of parsers that may work, but maybe it is easier to convert your grammar into non-left-recursive. Can you give more details about what you are looking for?

davips · August 2, 2020, 8:38am

Curoli,
The std lib implementation of PackRat seems to avoid left-recursion (despite being terribly slow as the std ordinary parser is):
https://www.scala-lang.org/api/2.12.3/scala-parser-combinators/scala/util/parsing/combinator/PackratParsers.html

I have no ideia how to do it, but I feel I could embed some “rats” inside my fastparser functions to detect and avoid the recursion. I am afraid I will have to spend some months studying the packrat and parser algorithms in general to be able to do it.

curoli · August 6, 2020, 12:44pm

Ah, thanks for the clarifications. According to Wikipedia on Packrat Parsers:

"The process of rewriting indirectly left-recursive rules is complex in some packrat parsers, especially when semantic actions are involved.

With some modification, traditional packrat parsing can support direct left recursion,[3][9][10] but doing so results in a loss of the linear-time parsing property[9] which is generally the justification for using PEGs and packrat parsing in the first place. Only the OMeta parsing algorithm[9] supports full direct and indirect left recursion without additional attendant complexity (but again, at a loss of the linear time complexity), whereas all GLR parsers support left recursion. "

I would just rewrite my grammar to be non-left recursive.