Which parser is recommended in 2025?

Which parser is recommended for parsing JVM languages, in particular Scala and Java, in 2025? By parser, I mean a tool that deconstructs the code so that it can be transformed into some other format, like JSON.

PS: I am looking into ANTLR 4, and I wonder if there are better alternatives.

Kind of depends on the details of how you want to use it. For example, if I was writing a tool in Scala itself that parsed Scala code, I might poke at FastParse’s Scala parser.

OTOH, for many purposes I’d probably focus on using the TASTy files generated by the Scala compiler. (Already parsed and ready for manipulation.)

There’s no single correct answer here – it really depends on what you’re trying to accomplish, and the context you’re parsing in.


Thank you for the kind response.

Basically, I would like to parse the code together with its JavaDoc/ScalaDoc and transform both into a structured format, such as JSON, that a language model understands. The best option would be a parser that supports multiple languages like Python, C, and C# in addition to JVM languages. However, a parser that supports only JVM languages would still be very good.

The functionality I need can be summarized as the following:

  1. Read the system documentation, i.e. JavaDoc, ScalaDoc, or Python docstrings, and parse the prose explanation of the role of each class, interface, trait, and method into structured information in JSON.
  2. Read the method signatures of each class/interface/trait, cross-match them with the information parsed in item 1, and extend that JSON with more structured information.

All in all, I want to read the “definitions” of the high-level constructs in an application, say in Java or Scala, along with the system documentation, and parse them into some structured information like JSON.
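To make the target shape concrete, here is a minimal sketch of the two steps above for the Python case only, using the standard-library `ast` module (Python 3.9+). The function name `describe_source`, the JSON field names, and the sample class are all made up for illustration; a Java or Scala version would need a different parser (ANTLR, TASTy, etc.), and the doc text is kept as raw prose rather than parsed further.

```python
import ast
import json


def describe_source(source: str) -> list[dict]:
    """Return one record per class in `source`: its docstring plus method signatures."""
    tree = ast.parse(source)
    records = []
    for node in ast.walk(tree):
        if not isinstance(node, ast.ClassDef):
            continue
        methods = []
        for item in node.body:
            if isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef)):
                methods.append({
                    "name": item.name,
                    # Positional parameters with their annotations, if any.
                    "parameters": [
                        {
                            "name": arg.arg,
                            "type": ast.unparse(arg.annotation) if arg.annotation else None,
                        }
                        for arg in item.args.args
                    ],
                    "returns": ast.unparse(item.returns) if item.returns else None,
                    # Step 1: the prose documentation attached to the method.
                    "doc": ast.get_docstring(item),
                })
        records.append({
            "kind": "class",
            "name": node.name,
            "doc": ast.get_docstring(node),
            # Step 2: signatures matched to the docs by living in the same record.
            "methods": methods,
        })
    return records


# Hypothetical sample input, not taken from any real project.
SAMPLE = '''
class Account:
    """A bank account holding a balance."""

    def deposit(self, amount: int) -> int:
        """Add amount to the balance and return the new balance."""
        self.balance += amount
        return self.balance
'''

records = describe_source(SAMPLE)
print(json.dumps(records, indent=2))
```

Because the docstring and the signature come from the same syntax-tree node, the "cross-match" in step 2 is free here. With a parser that treats comments as separate tokens, you would have to attach each doc comment to the declaration that follows it yourself.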

Peace :)

If it is for LLMs, you might want to look at the Model Context Protocol introduction (“Introduction - Model Context Protocol”) and, in particular, the recent support for it from Metals (the Scala LSP).

(I do not use AI that much, so I don’t know much, and I’ve not read the above articles, but they look like good introductions to the subject)


On a related topic, what’s the standard way to manipulate TASTy files?

To read, I believe it is tasty-query (GitHub: scalacenter/tasty-query).
To write, I would say write code and let the compiler handle writing the TASTy.
