Rust LALR(1) parser generator

Dmitry Soshnikov
3 min readJun 15, 2017

--

In this small note we’ll talk about new plugin for the Syntax tool, which allows building parsers for Rust programming language.

NOTE: if you’re not familiar with the Syntax tool, you can address this introductory article.

Automatic parsers

Writing a manual parser “by hands” might be fun, but mostly when you learn the thing. Even a simple recursive-descent parser can take some time and effort to actually implement it, and also recursive-descent parsers can be pretty slow due to a lot of recursion.

For production usage, a parser generator tool might be a better approach. It can generate fast table-driven (i.e. recursion-less) parsers from simple grammar format.

Syntax is one of such tools: supporting grammars in Bison/Yacc/JSON formats, and being a language-agnostic parser generator, it allows building almost any parser for any language. And recently plugin for Rust language was added.

Other solutions

There are some other good solutions available for Rust, e.g. nom parser combinator, which though uses a different approach, and instead of having a small grammar, and an automatic parser, it uses a set of manually implemented functions. This approach can be useful as well, however below we focus on exactly Bison/Yacc approach for parsers.

Building the parser

Below we’re going to build a parser step by step, getting as direct evaluations, and also AST (abstract syntax tree) generations.

We’re going to use standard Cargo package manager to build our project. In the example we’re going to use a simple “calculator” grammar.

0. Prerequisite: install Syntaxtool

npm install -g syntax-cli

NOTE: npm is pre-installed with Node.js

1. Create new cargo project

Let’s create a new application called calc-parser:

cargo new calc-parser --bin
Created binary (application) `calc-parser` project
cd calc-parser

2. Create a local library crate

Now we need to create an actual library which will hold our parser code. Let’s call it syntax:

cargo new syntax --lib
Created library `syntax` project

And add it to the dependencies list of the main app. In the Cargo.toml:

[dependencies]
syntax = { path = "syntax" }

3. Configure dependencies of the syntax crate

Syntax tool generates a parser which depends on two external crates: onig, and lazy_static. Let’s add them to our syntax library dependencies list.

In the syntax/Cargo.toml add:

[dependencies]
onig = "4"
lazy_static = "0.2.8"

NOTE: Onig dependency requires Rust version at least 1.26.0.

4. Create grammar file

Below is our simple calculator grammar. In the syntax/grammar.g add:

NOTE: if you’re not familiar with Bison/Yacc grammar format, you can address this documentation.

NOTE: here we used example in Bison/Yacc format. You can also check the example in JSON format.

Since Rust is a strongly and (mostly) statically typed language, we need to define types of all the used arguments in production handlers. They are defined in the Rust’s closure notation.

For example, before being able to do a mathematical operation of $$ = $1 + $3 in the first Expr + Expr production, we need to define the types of the used arguments, and the result type:

|$1: i32, $3: i32| -> i32;

If an argument is just propagated without any operation, the type declarations can be omitted, as in the last production ( Expr ) where we just return the $$ = $2.

Notice also, that in the third NUMBER production, we get direct matched token value via the yytext variable. Since we don’t use arguments, the type declaration of the production defines only return type, having empty arguments:

|| -> i32;

We could also access the matched token via the $1.value, and for this the type declaration would be |$1: Token| -> i32.

5. Generate the parser

Now using Syntax tool, let’s generate the parser from our grammar:

syntax-cli -g syntax/grammar.g -m LALR1 -o syntax/src/lib.rs    ✓ Successfully generated: syntax/src/lib.rs

Notice how we specified output file to be the lib.rs from our syntax crate. We also chose LALR(1) parsing mode, which is the most practical one.

6. Use the parser

Now in the main.rs we can require and use the parser:

Check the result:

cargo run
....
6

And that’s it! We have a fully working parser automatically generated from the simple grammar.

Above we used a direct evaluation of the expression, however, we can easily build an AST for the code. Check out this example which builds a tree of nodes for math expressions.

If you have any questions, I’ll be glad to answer them in the comments. Have fun with parsers!

--

--

Dmitry Soshnikov
Dmitry Soshnikov

Written by Dmitry Soshnikov

Software engineer interested in learning and education. Sometimes blog on topics of programming languages theory, compilers, and ECMAScript.

No responses yet