How to externalize an internal Scala DSL?

Hi gurus,

I tried to define my own internal DSL for payroll calculation scripts, as shown below:

class SchemaEntryPoint(implicit var ctxMeta: PayrollEngineContextMeta,
                       var pyClusterBuffer: PayrollClusterBuffer)
    extends PySchemaDsl {

  def schema_cn20 = {
    // ---------------- Schema Script Begins Here ----------------
    IMPORT(IT0008)
    PIT(PCR1("NORM"))
    IF(OFF_CYCLE) {
      BLOCK("Off-cycle schema block") {
        PIT(PCR1("NORM"))
      }
    } ELSE {
      IMPORT(IT0008)
    }
    PIT(PCR1("NORM"))
    IF(TERM_MONTH) {
      PIT(PCR1("WFWF"))
    }
    // ---------------- Schema Script Ends Here ----------------
  }
}

The DSL script above, which describes payroll calculation rules, is compiled together with the main program and can only be maintained by a developer. What I really need is to externalize it (save the DSL script in a database, a text file, etc.) and load it at runtime, so that a payroll business specialist can maintain it without touching the main code.

This requirement is much like the rule-engine concept (a rough sketch of the flow follows the list):

  • Load the rule set at runtime from an external source (DB or file system)
  • Load a structured dataset and apply the loaded rules (expressed as DSL scripts) to it
  • Execute the actions and logical steps scripted in the DSL
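As an illustration of that flow in Scala, here is a minimal sketch: everything in it is a made-up placeholder (`Schema`, `parseSchema`, `execute`, and the employee/result types stand in for whatever a real engine would use), not part of any existing library.

```scala
import scala.io.Source

object PayrollRunner {
  // Placeholder types and stub implementations, just to show the shape of the flow.
  type EmployeeData = Map[String, String]
  final case class Schema(statements: Seq[String])
  final case class PayrollResult(lines: Seq[String])

  def parseSchema(script: String): Schema =                       // stand-in for a real DSL parser
    Schema(script.split('\n').map(_.trim).filter(_.nonEmpty).toSeq)

  def execute(schema: Schema, emp: EmployeeData): PayrollResult = // stand-in for a real interpreter
    PayrollResult(schema.statements.map(s => s"$s -> ${emp.getOrElse("id", "?")}"))

  def runPayroll(schemaPath: String, employees: Seq[EmployeeData]): Seq[PayrollResult] = {
    val src    = Source.fromFile(schemaPath)                      // 1. load the rule script at runtime
    val script = try src.mkString finally src.close()
    val schema = parseSchema(script)                              // 2. parse the DSL text into rule objects
    employees.map(execute(schema, _))                             // 3. apply the rules to each record
  }
}
```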

But I could not find any mature Scala rule engine for this, because payroll calculation rules are much more complex than a common business rule set of plain IF…ELSE conditions.

Do I need to design a custom rule engine for this requirement? I am quite at sea now…

For an external DSL, you would design a grammar and implement a parser. I can recommend FastParse.
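For example, a first cut at a grammar for the schema statements above could look like the following. This is only a sketch assuming FastParse 2.x; the AST (`Stmt`, `Import`, `Pit`) is invented here, and the grammar only covers `IMPORT(...)` and `PIT(PCR("..."))`:

```scala
import fastparse._, MultiLineWhitespace._

object SchemaParser {
  // Hypothetical AST for a tiny subset of the schema language.
  sealed trait Stmt
  final case class Import(infotype: String)          extends Stmt
  final case class Pit(pcr: String, variant: String) extends Stmt

  def ident[_: P]: P[String] = P(CharsWhileIn("A-Z0-9_").!)

  def importStmt[_: P]: P[Stmt] =
    P("IMPORT" ~ "(" ~ ident ~ ")").map(it => Import(it): Stmt)

  def pitStmt[_: P]: P[Stmt] =
    P("PIT" ~ "(" ~ ident ~ "(" ~ "\"" ~ ident ~ "\"" ~ ")" ~ ")")
      .map { case (pcr, variant) => Pit(pcr, variant): Stmt }

  def schema[_: P]: P[Seq[Stmt]] = P((importStmt | pitStmt).rep ~ End)
}

// fastparse.parse("IMPORT(IT0008)\nPIT(PCR1(\"NORM\"))", SchemaParser.schema(_))
// succeeds with Vector(Import("IT0008"), Pit("PCR1", "NORM"))
```

From the resulting `Seq[Stmt]` you can then build whatever evaluation or code generation the engine needs.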

The term “rule engine” is too generic to ask which one is best. It depends on what kind of rules you have.

I understand your use case involves going through a number of records and subjecting each record to modifications based on rules. Sounds vaguely similar to a project I’m currently working on, except that in my case, records are genomic features. As a DSL, I started with something based on JSON, then implemented complex filtering conditions using FastParse, and I’m planning to move the entire DSL away from JSON to my own grammar at some point.


Nowadays, I’d probably take a look at cats-parse, which is already compatible with Dotty / Scala 3 and mostly compatible with FastParse (a minimal usage sketch follows the goals list below).

Why another parsing library? See this blog post detailing the design. To reiterate, this library has a few goals:

  1. Compatibility: should work on all Scala platforms and recent versions. Currently it supports JVM and JS on versions 2.12, 2.13, and Dotty. The core library should have minimal dependencies. Currently this library only depends on cats.
  2. Excellent performance: should be as fast or faster than any parser combinator that has comparable scala version support.
  3. Cats friendliness: method names match cats style, and out of the box support for cats typeclasses.
  4. Precise errors: following the Haskell Trifecta parsing library, backtracking is opt-in vs opt-out. This design tends to make it easier to write parsers that point correctly to failure points.
  5. Safety: by introducing Parser1, a parser that must consume at least one character on success, some combinators and methods can be made safer to use and less prone to runtime errors.
  6. Stability: we are very reluctant to break compatibility between versions. We want to put a minimal tax on users to stay on the latest versions.
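For comparison, here is a minimal sketch of the same `IMPORT(...)` fragment with cats-parse (assuming a recent release, where the always-consuming parser is simply called `Parser` rather than `Parser1`); the tiny grammar is again just an illustration:

```scala
import cats.parse.{Parser => P}

object ImportParser {
  // `Parser` must consume at least one character on success, `Parser0` may match empty input;
  // `rep` is only available on the consuming kind, which rules out accidental infinite loops.
  val ident: P[String] =
    P.charIn(('A' to 'Z') ++ ('0' to '9') :+ '_').rep.string

  val importStmt: P[String] =
    P.string("IMPORT(") *> ident <* P.char(')')
}

// ImportParser.importStmt.parseAll("IMPORT(IT0008)")   // Right("IT0008")
```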

Thanks a lot, Curoli and cbley, for your recommendations.

I think I am going with FastParse, as cats-parse has less low-level documentation for a newbie like me. :stuck_out_tongue_winking_eye:

But lexical parsing with a parser library is just the first step of building a customized DSL. There are further steps like syntactic transformation and compilation.

I am picking up the basic concepts from the two tutorials below, which are based on the Scala standard parser combinators:

And I am going to learn more deeply from chapters 19 and 20 of Hands-on Scala Programming. Unfortunately, it will take more than a month before I get the book.

Do you have any further pointers that I can refer to?

After you have parsed the language document into a syntax tree, the next step depends heavily on your context.
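For instance, a purely illustrative sketch, using the same kind of invented `Stmt` tree as in the FastParse example further up (redeclared here so the snippet stands alone): “executing” the parsed schema can start out as a plain traversal of the tree.

```scala
object Interpreter {
  // The same invented AST as in the parser sketch above.
  sealed trait Stmt
  final case class Import(infotype: String)          extends Stmt
  final case class Pit(pcr: String, variant: String) extends Stmt

  // A stand-in for the real payroll buffer: here we just collect a log line per statement.
  def interpret(stmts: Seq[Stmt]): Seq[String] =
    stmts.map {
      case Import(it)        => s"import wage types from infotype $it"
      case Pit(pcr, variant) => s"run rule $pcr (variant $variant) over the cluster buffer"
    }
}
```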

For example, in my current project, the parser returns an expression that takes a record and returns a Boolean for filtering (I’m also planning to add expressions that turn records into new fields for mapping).
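In Scala terms, that looks roughly like the following. This is a hedged sketch with FastParse 2.x; the record type and the `field == "value"` grammar are invented for illustration and are not the project’s actual DSL:

```scala
import fastparse._, SingleLineWhitespace._

object FilterDsl {
  type Record = Map[String, String]

  def field[_: P]: P[String] = P(CharsWhileIn("a-zA-Z_").!)
  def str[_: P]: P[String]   = P("\"" ~ CharsWhile(_ != '"').! ~ "\"")

  // `field == "value"` parses directly into a predicate over a record.
  def comparison[_: P]: P[Record => Boolean] =
    P(field ~ "==" ~ str).map { case (f, v) => (r: Record) => r.get(f).contains(v) }

  // Conditions joined with `&&` compose into a single predicate.
  def condition[_: P]: P[Record => Boolean] =
    P(comparison.rep(1, sep = "&&")).map(ps => (r: Record) => ps.forall(_(r)))
}

// val Parsed.Success(pred, _) =
//   parse("""chrom == "chr1" && strand == "+" """, FilterDsl.condition(_))
// records.filter(pred)
```

The nice part is that parsing and semantics collapse into one step: the parse result is already a function you can pass straight to `filter`.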

You can also take a look at an earlier project, which allowed munging of tab-separated value files based on a DSL supporting various datatypes and method calls. The entry point for the shell (REPL) is here. In this project, I rolled my own shift-reduce parser, but would definitely use FastParse if I had to do it again.