I wonder which parser is recommended for parsing JVM languages, in particular Scala and Java, in 2025? By parser, I mean a tool that deconstructs the code so that it can be transformed into some other format such as JSON.
PS: I am looking into ANTLR 4, and I wonder whether there are better alternatives.
Kind of depends on the details of how you want to use it. For example, if I was writing a tool in Scala itself that parsed Scala code, I might poke at FastParse’s Scala parser.
OTOH, for many purposes I’d probably focus on using the TASTy files generated by the Scala compiler. (Already parsed and ready for manipulation.)
There’s no single correct answer here – it really depends on what you’re trying to accomplish, and the context you’re parsing in.
Basically, I would like to parse the code and its JavaDoc/ScalaDoc and transform them into a structured format, e.g. JSON, that a language model can consume. The best option would be a parser that supports multiple languages such as Python, C and C# in addition to the JVM languages. However, a parser that supports only JVM languages would still be very good.
The functionality I need can be summarized as follows:
1. Read the system documentation (JavaDoc, ScalaDoc, Python docstrings) and parse the prose explanation of the role of each class, interface, trait and method into structured information in JSON.
2. Read the signatures of the methods of each class/interface/trait, cross-match them with the information parsed in step 1, and extend that JSON with more structured information.
All in all, I want to read the “definitions” of the high-level constructs in an application, say in Java or Scala, along with the system documentation, and parse them into structured information such as JSON.
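
To make the target output concrete, here is a minimal sketch of the two steps above, written against Python source only because Python's standard `ast` module needs no extra dependencies (it assumes Python 3.9+ for `ast.unparse`). It is not a recommendation for the JVM side; Java or Scala would need a different parser. The JSON field names (`kind`, `params`, `returns`, `doc`) and the helper names `describe_function` and `describe_module` are placeholders made up for this illustration, as is the sample `Account` class.

```python
import ast
import json

# Hypothetical sample input; any Python source text would do.
SOURCE = '''
class Account:
    """A bank account holding a balance."""

    def deposit(self, amount: int) -> None:
        """Add amount to the balance."""
        self.balance += amount

    def _audit(self):
        pass
'''


def describe_function(node):
    """Combine a method's signature with its docstring in one record."""
    return {
        "name": node.name,
        "params": [
            {
                "name": arg.arg,
                # Annotations are optional, so the type may be missing.
                "type": ast.unparse(arg.annotation) if arg.annotation else None,
            }
            for arg in node.args.args
        ],
        "returns": ast.unparse(node.returns) if node.returns else None,
        "doc": ast.get_docstring(node),  # None when there is no docstring
    }


def describe_module(source):
    """Collect top-level classes, their docstrings and their methods."""
    tree = ast.parse(source)
    classes = []
    for node in tree.body:
        if isinstance(node, ast.ClassDef):
            classes.append({
                "kind": "class",
                "name": node.name,
                "doc": ast.get_docstring(node),
                "methods": [
                    describe_function(child)
                    for child in node.body
                    if isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef))
                ],
            })
    return {"classes": classes}


if __name__ == "__main__":
    print(json.dumps(describe_module(SOURCE), indent=2))
```

The point of the sketch is the shape of the result: each method record carries both the documentation prose and the signature, so the cross-matching in step 2 happens by construction. What I am looking for is a parser that can give me the same kind of tree, with doc comments attached, for Java and Scala.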