Rust LALR(1) parser generator
In this small note we’ll talk about new plugin for the Syntax tool, which allows building parsers for Rust programming language.
NOTE: if you’re not familiar with the Syntax tool, you can address this introductory article.
Automatic parsers
Writing a manual parser “by hands” might be fun, but mostly when you learn the thing. Even a simple recursive-descent parser can take some time and effort to actually implement it, and also recursive-descent parsers can be pretty slow due to a lot of recursion.
For production usage, a parser generator tool might be a better approach. It can generate fast table-driven (i.e. recursion-less) parsers from simple grammar format.
Syntax is one of such tools: supporting grammars in Bison/Yacc/JSON formats, and being a language-agnostic parser generator, it allows building almost any parser for any language. And recently plugin for Rust language was added.
Other solutions
There are some other good solutions available for Rust, e.g. nom parser combinator, which though uses a different approach, and instead of having a small grammar, and an automatic parser, it uses a set of manually implemented functions. This approach can be useful as well, however below we focus on exactly Bison/Yacc approach for parsers.
Building the parser
Below we’re going to build a parser step by step, getting as direct evaluations, and also AST (abstract syntax tree) generations.
We’re going to use standard Cargo package manager to build our project. In the example we’re going to use a simple “calculator” grammar.
0. Prerequisite: install Syntax
tool
npm install -g syntax-cli
NOTE:
npm
is pre-installed with Node.js
1. Create new cargo project
Let’s create a new application called calc-parser
:
cargo new calc-parser --bin
Created binary (application) `calc-parser` projectcd calc-parser
2. Create a local library crate
Now we need to create an actual library which will hold our parser code. Let’s call it syntax
:
cargo new syntax --lib
Created library `syntax` project
And add it to the dependencies list of the main app. In the Cargo.toml
:
[dependencies]
syntax = { path = "syntax" }
3. Configure dependencies of the syntax
crate
Syntax tool generates a parser which depends on two external crates: onig
, and lazy_static
. Let’s add them to our syntax
library dependencies list.
In the syntax/Cargo.toml
add:
[dependencies]
onig = "4"
lazy_static = "0.2.8"
NOTE: Onig dependency requires Rust version at least
1.26.0
.
4. Create grammar file
Below is our simple calculator grammar. In the syntax/grammar.g
add:
NOTE: if you’re not familiar with Bison/Yacc grammar format, you can address this documentation.
NOTE: here we used example in Bison/Yacc format. You can also check the example in JSON format.
Since Rust is a strongly and (mostly) statically typed language, we need to define types of all the used arguments in production handlers. They are defined in the Rust’s closure notation.
For example, before being able to do a mathematical operation of $$ = $1 + $3
in the first Expr + Expr
production, we need to define the types of the used arguments, and the result type:
|$1: i32, $3: i32| -> i32;
If an argument is just propagated without any operation, the type declarations can be omitted, as in the last production ( Expr )
where we just return the $$ = $2
.
Notice also, that in the third NUMBER
production, we get direct matched token value via the yytext
variable. Since we don’t use arguments, the type declaration of the production defines only return type, having empty arguments:
|| -> i32;
We could also access the matched token via the $1.value
, and for this the type declaration would be |$1: Token| -> i32
.
5. Generate the parser
Now using Syntax tool, let’s generate the parser from our grammar:
syntax-cli -g syntax/grammar.g -m LALR1 -o syntax/src/lib.rs ✓ Successfully generated: syntax/src/lib.rs
Notice how we specified output file to be the lib.rs
from our syntax
crate. We also chose LALR(1)
parsing mode, which is the most practical one.
6. Use the parser
Now in the main.rs
we can require and use the parser:
Check the result:
cargo run
....6
And that’s it! We have a fully working parser automatically generated from the simple grammar.
Above we used a direct evaluation of the expression, however, we can easily build an AST for the code. Check out this example which builds a tree of nodes for math expressions.
If you have any questions, I’ll be glad to answer them in the comments. Have fun with parsers!