Syntax parsing (BNF, code generators, etc)

Good day

What tutorial should I consult to learn how to do complex parsing in Scala?

Parser combinators do not look comfortable to me. Is there any lex/yacc-style approach or Ragel variant for Scala?

What is it about parser combinators that you don’t like? In particular, are you referring to the “standard” parser combinators library or any? Atto and fastparse are pretty nice.

Have you ever tried to write a C++ parser using combinators?

  • How ugly is the resulting code?
  • Is this code supportable by non-Scala programmers?
  • How resource-hungry is it compared to state machines generated by compiler compilers?

I’m looking at the Scala stack as a distributed platform for building a source code analysis and processing system for embedded software: something like 1-2 GB of embedded Linux source code with the full kernel+libc+ssh+… toolset, say a typical Buildroot or OpenWrt build set.

The goal above looks infeasible as a whole, so I will first try to implement an embedded C++ parser with GNU and IAR syntax variants.

I’m going to reiterate: have you looked at fastparse? It’s efficient, it’s relatively easy to use, and it produces the nicest parser code I’ve ever worked with.

If you’re fundamentally opposed to parser combinators, you may be out of luck in the Scala ecosystem. But things have evolved a lot from the early versions of that idea…

Is it important that the parsing is done in Scala, or would a Java framework such as ANTLR (https://www.antlr.org/), which generates Java code, be OK?

Thanks, all. First I’ll try the parser combinator libraries; maybe some of them can work multithreaded. I’ll keep Ragel and ANTLR as a fallback.

Are combinators comparable to DCGs (definite clause grammars) in generality and in parsing context-dependent grammars?

A parser for C++ is going to be “ugly” no matter what. Parser combinators will be the least ugly of all options though, because they come closest to looking like the BNF while being concise. Would you rather have a huge ball of generated code?

I’m not sure how you want Scala code to be supportable by non-Scala programmers, unless you mean not-very-advanced Scala programmers.

If you want simplicity, I would recommend Atto. If speed is important, you can use FastParse, although version 2 has a somewhat weird syntax and, since it uses macros, some things can be slightly counterintuitive. These are pretty minor issues, though.

BTW, there is a Scala parser written with FastParse; you can check whether it looks ugly to you.

I would personally use fastparse, but there is also parboiled. It’s nearly as good, though I liked fastparse’s syntax better. https://github.com/sirthias/parboiled2

I just found that fastparse can work in streaming mode. It looks much better, and the grammar DSL really is more readable. Thanks for the advice.

Are there any binary parsers for dissecting protocols and data formats that can handle bit fields and big/little endianness in file system images and network protocol captures? Offline, not real-time, since FP-wrapped Java is not as fast as C(++).

https://github.com/scodec/scodec
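
For a sense of what such a dissection involves, here is a minimal sketch that uses only `java.nio.ByteBuffer` from the JDK rather than scodec's API. The 4-byte header layout and the names `Header` and `parseHeader` are made up for illustration; a real format would need its own field definitions.

```scala
import java.nio.{ByteBuffer, ByteOrder}

object BinaryFieldsSketch {
  // Hypothetical layout (an assumption, not a real protocol):
  //   bytes 0-1: length, 16-bit big-endian
  //   bytes 2-3: 16-bit little-endian word; top 4 bits = version, low 12 bits = flags
  final case class Header(length: Int, version: Int, flags: Int)

  def parseHeader(bytes: Array[Byte]): Header = {
    val buf = ByteBuffer.wrap(bytes)

    // Read the first field as big-endian; mask to treat the short as unsigned.
    buf.order(ByteOrder.BIG_ENDIAN)
    val length = buf.getShort() & 0xFFFF

    // Switch byte order for the second field, then split it into bit fields.
    buf.order(ByteOrder.LITTLE_ENDIAN)
    val word = buf.getShort() & 0xFFFF
    val version = (word >> 12) & 0xF
    val flags = word & 0xFFF

    Header(length, version, flags)
  }
}
```

A library like scodec lets you describe the same layout declaratively instead of reading fields by hand.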

How would you write a parser-combinator, for example, to parse a String literal? Basically, how do you implement the rule that a String literal ends with a double-quote, but not a double-quote preceded by a backslash?

Maybe that’s just my lack of experience, but I can’t find an easy solution, which makes me a bit skeptical of parser-combinators.

I would check the Python and Scala parsers written with FastParse. The latter is relied on by Ammonite.

I’m not sure why you consider it a particularly hard case.

Off the top of my head, you would probably define a parser for a string element which would normally consume one character, but if it sees a backslash it consumes two. It rejects a standalone double quote. Then you’d sandwich a zero-or-more repeat of that between two double-quote parsers.
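
A minimal, dependency-free sketch of that idea in plain Scala. This is not fastparse's API; the names `Parser`, `char`, `element`, `rep`, `seq` and `matches` are made up for illustration, and the parser only recognizes the literal without building a value.

```scala
object StringLiteralSketch {
  // A parser takes the input and a start offset; on success it returns
  // the offset just past what it consumed.
  type Parser = (String, Int) => Option[Int]

  // Matches exactly one given character.
  def char(c: Char): Parser =
    (in, i) => if (i < in.length && in.charAt(i) == c) Some(i + 1) else None

  // One string element: a backslash plus whatever follows it (two characters),
  // or any single character other than a double quote.
  val element: Parser = (in, i) => {
    if (i >= in.length) None
    else in.charAt(i) match {
      case '\\' => if (i + 1 < in.length) Some(i + 2) else None
      case '"'  => None // a standalone double quote is rejected
      case _    => Some(i + 1)
    }
  }

  // Zero or more repetitions of p; always succeeds.
  def rep(p: Parser): Parser = (in, i) => {
    var pos = i
    var next = p(in, pos)
    while (next.isDefined) {
      pos = next.get
      next = p(in, pos)
    }
    Some(pos)
  }

  // Runs the parsers one after another, failing if any of them fails.
  def seq(ps: Parser*): Parser = (in, i) =>
    ps.foldLeft(Option(i))((acc, p) => acc.flatMap(pos => p(in, pos)))

  // Opening quote, zero or more elements, closing quote.
  val stringLiteral: Parser = seq(char('"'), rep(element), char('"'))

  // True if the whole input is exactly one string literal.
  def matches(s: String): Boolean = stringLiteral(s, 0).contains(s.length)
}
```

Because the element parser swallows the character after a backslash, an escaped quote never reaches the closing-quote parser, so the "not preceded by a backslash" rule needs no lookbehind.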

Go to http://www.lihaoyi.com/fastparse and search for def string. It’s right there!

Oh, yeah, I must have had a knot in my brain. It’s actually quite easy to parse a String literal with parser combinators.