JavaCC 21 Parser Generator


JavaCC 21 is a continuation of work on the venerable JavaCC parser generator, originally developed at Sun Microsystems in the 1990s and released under a liberal open source license in 2003. It is currently the most advanced version of JavaCC. It has many feature enhancements (with more to come soon) and generates much more modern, readable Java code. Certain key bugs have also finally been fixed. (N.B. The “21” in JavaCC 21 is not a version number. It is simply part of the project name and means that this is a JavaCC for the 21st century!)


Here are some highlights:


Up-to-date Java language support


JavaCC 21 supports the Java language through JDK 13. See here. The Java grammar that JavaCC 21 uses internally can be used in your own projects without any restriction.


Major Bugfix! Nested Syntactic Lookahead now works correctly!


A longstanding limitation of JavaCC has been that syntactic lookahead does not nest, i.e. it does not work recursively. This issue went unaddressed for 24 years and surely made the tool less generally useful than it could have been, since any attempt at something sophisticated would typically rely on recursive lookahead and would simply not work. This is now fixed in JavaCC 21.


Streamlined Syntax


Though the original (or legacy) syntax is still supported (and will be for the indefinite future), JavaCC 21 offers a new streamlined syntax that should be easier to read and write. Sometimes the improvement in readability is truly dramatic! For example, where you would previously have written:


void FooBar() :
{}
{
    Foo() Bar()
}

with the streamlined syntax, you can now write:


FooBar : Foo Bar ;

Cumbersome aspects of the legacy LOOKAHEAD construct have been streamlined. See here. Again, the newer syntax sometimes affords dramatic improvements in clarity. Where you would have written before:


LOOKAHEAD(Foo() Bar() Baz()) Foo() Bar() Baz()

In JavaCC 21, you can write:


=> Foo Bar Baz

The new up-to-here marker also offers great gains in readability and maintainability. For example, where you would have written before:


LOOKAHEAD(Foo() Bar()) Foo() Bar() Baz()

You can now write:


Foo Bar =>|| Baz

The new syntactic element =>|| is called the up-to-here marker and expresses much more clearly and succinctly the concept that, when deciding whether to enter the Foo Bar Baz expansion, we scan forward up to and including the Bar.
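
To make the idea concrete, here is a minimal hand-written sketch in Java of what an up-to-here decision amounts to. It is not the code JavaCC 21 generates. The class and method names are hypothetical, and each of Foo, Bar and Baz is assumed to match a single token ("foo", "bar", "baz") to keep the example short.

```java
import java.util.List;

// Hypothetical illustration only; not JavaCC 21 output.
class UpToHereSketch {

    // Lookahead for "Foo Bar =>||": check Foo and then Bar starting at pos,
    // without consuming anything. Baz is deliberately NOT part of the scan.
    static boolean scanUpToHere(List<String> tokens, int pos) {
        return pos + 1 < tokens.size()
                && tokens.get(pos).equals("foo")
                && tokens.get(pos + 1).equals("bar");
    }

    // A choice point with two alternatives: "Foo Bar =>|| Baz" or something else.
    static String parseChoice(List<String> tokens) {
        if (scanUpToHere(tokens, 0)) {
            // The scan succeeded, so we are committed to this expansion.
            // A mismatch on Baz is now a parse error, not a reason to
            // fall back to the other alternative.
            if (tokens.size() < 3 || !tokens.get(2).equals("baz")) {
                throw new IllegalStateException("expected baz after foo bar");
            }
            return "FooBarBaz";
        }
        return "other";
    }

    public static void main(String[] args) {
        System.out.println(parseChoice(List.of("foo", "bar", "baz")));
        System.out.println(parseChoice(List.of("foo", "qux")));
    }
}
```

The point of the sketch is the asymmetry: everything before the marker decides whether we enter the expansion, and everything after it is only checked once we are already inside.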


Lookbehind predicates


Lookbehind is a new feature in JavaCC 21 that allows you to write predicates at choice points that check whether you are at a given point in the parse. For example:


SCAN …\Foo => Bar

The above uses a lookbehind predicate to express the idea that we only enter the Bar production if we have previously entered a Foo. Or, for example:


SCAN ~\…\Foo => Foo

This means that we can only enter the Foo production if we are not already in a Foo. In other words, Foo is not re-entrant. Lookbehind is quite a useful feature, already used in internal development, and we are not aware of any similar tool that offers it.
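
One way to picture a lookbehind predicate is as a question asked of the stack of productions the parser is currently inside. The Java sketch below is a simplified model of the two examples above, under that assumption. It does not show how JavaCC 21 implements lookbehind, and every name in it is hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical model of lookbehind as a check on the parser's call stack.
class LookbehindSketch {
    // Names of the productions we are currently inside, innermost first.
    private final Deque<String> callStack = new ArrayDeque<>();

    void enter(String production) { callStack.push(production); }

    void exit() { callStack.pop(); }

    // True if the named production appears anywhere on the call stack.
    boolean isInside(String production) { return callStack.contains(production); }

    // Models the first example: only enter Bar if we are somewhere inside a Foo.
    boolean mayEnterBar() { return isInside("Foo"); }

    // Models the second example: only enter Foo if we are not already in a Foo.
    boolean mayEnterFoo() { return !isInside("Foo"); }

    public static void main(String[] args) {
        LookbehindSketch parser = new LookbehindSketch();
        System.out.println(parser.mayEnterFoo()); // not inside a Foo yet
        parser.enter("Foo");
        parser.enter("Baz");
        System.out.println(parser.mayEnterFoo()); // Foo is on the stack
        System.out.println(parser.mayEnterBar());
    }
}
```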


Tree Building Enhancements


JavaCC 21 is based on the view that building an AST (Abstract Syntax Tree) is the normal usage of this sort of tool. While the legacy JavaCC package does contain automatic tree-building functionality, i.e. the JJTree preprocessor, JJTree has some (very) longstanding usability issues that JavaCC 21 addresses.


For one thing, a lot of what makes JJTree quite cumbersome to use is precisely that it is a preprocessor! In JavaCC 21, all of the JJTree functionality is simply in the core tool and the generated parser builds an AST by default. (Tree building can be turned off however.)


(N.B. JavaCC 21 uses the same syntax for tree-building annotations as legacy JJTree.)


JavaCC 21 has an INJECT statement that allows you to inject Java code into any generated file, thus doing away with the unwieldy anti-pattern of post-editing generated files.


Better Generated Code


JavaCC 21 generally generates more readable code, and certain things have been modernized significantly. Consider the Token.kind field in the Token.java file generated by the legacy tool. That field is an integer and is also (contrary to well-known best practices) publicly accessible. In the Token.java file that JavaCC 21 generates, the Token.getType() method returns a type-safe enum, and all of the code in the generated parser that previously used integers to represent the type of a Token now uses type-safe enums.
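
The difference between the two styles can be shown with a small, self-contained Java sketch. This is not the Token.java that either tool generates. The constants, the TokenType values and the helper methods are all hypothetical, and only the public integer kind versus getType() contrast comes from the description above.

```java
import java.util.EnumSet;

// Hypothetical illustration; not generated by legacy JavaCC or JavaCC 21.
class TokenTypeSketch {

    // Legacy style: token kinds are plain ints, so any int at all compiles.
    static final int IDENTIFIER = 1;
    static final int NUMBER = 2;

    static String describeLegacy(int kind) {
        switch (kind) {
            case IDENTIFIER: return "identifier";
            case NUMBER: return "number";
            default: return "unknown kind " + kind;
        }
    }

    // Enum style: only declared token types can be passed around.
    enum TokenType { IDENTIFIER, NUMBER, EOF }

    static final class Token {
        private final TokenType type; // private, exposed through an accessor
        private final String image;

        Token(TokenType type, String image) {
            this.type = type;
            this.image = image;
        }

        TokenType getType() { return type; }

        String getImage() { return image; }
    }

    // Sets of token types can be expressed with EnumSet instead of int arrays.
    static boolean isNameOrNumber(Token token) {
        return EnumSet.of(TokenType.IDENTIFIER, TokenType.NUMBER).contains(token.getType());
    }

    public static void main(String[] args) {
        // A meaningless kind is accepted by the compiler in the int-based style.
        System.out.println(describeLegacy(42));
        Token token = new Token(TokenType.NUMBER, "7");
        System.out.println(token.getType() + " " + isNameOrNumber(token));
    }
}
```

With the enum, a call such as new Token(42, "7") is rejected at compile time, whereas the int-based version only reveals the mistake at runtime, if at all.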


Code generated by legacy JavaCC gave just about zero information about where the generated code originated. Parsers generated by JavaCC 21 carry line/column information relative to the real source file (the grammar file), and they also inject line/column information relative to the grammar file into the stack trace generated by ParseException.


If you want to compare side-by-side code generated by legacy JavaCC with that generated by JavaCC 21, see this page.


The current JavaCC 21 codebase itself is the result of a massive refactoring/cleanup. Code generation has been externalized to FreeMarker templates. To get an idea of what this looks like in practice, here is the main template that generates Java code for grammatical productions.


Assorted Usability Enhancements


In general, JavaCC 21 has more sensible default settings and is much more usable out-of-the-box. See here.


JavaCC 21 is actively developed!


Perhaps most importantly, the project is again under active development, so users can expect significant new features fairly soon. Given that code generation has been externalized to template files, the ability to generate parsers in other languages is probably not very far off. Another major near-term goal is to provide support for fault-tolerant parsing, where a parser incorporates heuristics for building an AST even when the input is invalid (unbalanced delimiters, missing semicolons and such).


One way to stay up to date with the JavaCC 21 project is to subscribe to our blog newsfeed in any newsreader.


Usage is really quite simple. As described here, JavaCC 21 is invoked on the command line via:


java -jar javacc-full.jar MyGrammar.javacc

The latest source code can be checked out from GitHub via:


git clone https://github.com/javacc21/javacc21.git

And then you can do a build by invoking ant from the top-level directory. You should also be able to run the test suite by running ant test.


If you are interested in this project, either as a user or as a developer, you may write to us. Better yet, you can sign up on our Discourse forum and post any questions or suggestions there.
