Using modern regexes in JavaScript

Dmitry Soshnikov
Apr 24, 2017


It’s no secret that today the JS RegExp library is pretty limited compared with Python regexes or PCRE. The good news is that ES2018 is going to bring several great improvements to regular expressions.

Some of the new features, along with other convenient extensions, can already be used today through transpilation. We’re going to discuss a Babel plugin for modern regexes, which enables powerful RegExp features and makes regexes more convenient to use.

Babel plugin for modern regexp

The plugin babel-plugin-transform-modern-regexp can be installed from npm, and the usage section explains how to add it to your Babel transformation pipeline.

Currently it implements three new features, which we’re going to discuss in detail.

“dotAll” /s flag

As we know, the “dot” . meta-symbol matches any character except line terminators such as \n. The new /s flag changes that: with it, the dot matches every character, including new lines.

Note: see details in the proposal.
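To make the difference concrete, here is a minimal sketch. It assumes an engine that already understands the s flag natively (ES2018) or code that has gone through the plugin; the sample string and variable names are made up for illustration.

```javascript
const text = 'first line\nsecond line';

// Without the s flag the dot stops at the line break, so the match fails.
const plainDot = /first.*second/.test(text);

// With the s flag the dot also matches the line break.
const dotAll = /first.*second/s.test(text);

// A common hand-written equivalent for engines without the s flag.
const classWorkaround = /first[\s\S]*second/.test(text);

console.log(plainDot, dotAll, classWorkaround); // false true true
```

The [\s\S] character class is the usual manual workaround; it is shown only for comparison and is not necessarily what the plugin emits.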

Named capturing groups

Another useful feature is named capturing groups.

Note: see details in the proposal.

Simple (or numbered) capturing groups allow referring to captured content either through a backreference within the regexp itself, or in a match result. Each capturing group is assigned a unique number and can be referenced using that number. However, this can make a regular expression hard to grasp and refactor.

For example, given the /(\d{4})-(\d{2})-(\d{2})/ expression, which matches a date, one cannot be sure which group corresponds to the month and which one to the day without examining the surrounding code. Also, if we want to swap the order of the month and the day, every group reference has to be updated as well.
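A short sketch of the problem with numbered groups; the input date and variable names are illustrative only.

```javascript
const numberedDate = /(\d{4})-(\d{2})-(\d{2})/;
const numberedMatch = numberedDate.exec('2017-04-24');

// Nothing in the pattern says that group 2 is the month and group 3 the day.
const monthByIndex = numberedMatch[2];
const dayByIndex = numberedMatch[3];

// The replacement string depends on the same positions, so reordering the
// groups in the pattern silently changes what $2 and $3 mean.
const reorderedByIndex = '2017-04-24'.replace(numberedDate, '$3/$2/$1');

console.log(monthByIndex, dayByIndex, reorderedByIndex); // 04 24 24/04/2017
```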

Named capture groups provide a nice solution for these issues:
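A minimal sketch of the same date pattern with named groups, using the (?<name>...) syntax from the proposal. The group names year, month and day are chosen for illustration, and the snippet assumes an engine with native ES2018 support for the groups property and for $<name> in replacement strings.

```javascript
const namedDate = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/;
const namedMatch = namedDate.exec('2017-04-24');

// Each part is read by name, so swapping groups in the pattern is safe.
const { year, month, day } = namedMatch.groups;

// Replacement strings can refer to the names as well.
const reorderedByName = '2017-04-24'.replace(namedDate, '$<day>/$<month>/$<year>');

console.log(year, month, day, reorderedByName); // 2017 04 24 24/04/2017
```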

As we can see, the regexp looks clearer now. If the plugin is used with the useRuntime option, one can also access the groups property on the result.

Extended RegExp: /x flag

The x flag, which is standard in Python or PCRE, is not yet standardized in ECMAScript. However, with the plugin it can normally be used today in JavaScript without any runtime overhead (it’s translated to normal regexp literals).

The x flag enables two cool features in regular expressions. First, most whitespace is ignored, and second, it allows using #-comments in regular expressions, making them much cleaner and more comprehensible.

Since the flag is not supported yet by JS engines and parsers, it cannot be used with regexp literals. However, we can use the RegExp constructor for this:

Now it’s annotated with comments, and a person who reads this code doesn’t have to parse complex one-liner regexes by eye.

This is already better, but it adds a slight inconvenience: meta-chars (as in any JavaScript string) have to be escaped with double backslashes: \\d instead of \d, as in regexp literals. To solve this, the plugin provides a convenient re shorthand.
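A sketch of what such an annotated pattern might look like. With the plugin in place, the source string would be passed straight to new RegExp(source, 'x') and rewritten at build time. So that the snippet runs without Babel, a hypothetical helper, stripExtended, stands in for that step; it is a simplified illustration of the idea, not the plugin's actual implementation, and unlike the plugin it leaves the group names in place.

```javascript
// Hypothetical helper (not part of the plugin): removes insignificant
// whitespace and #-comments, leaving escapes and character classes intact.
function stripExtended(source) {
  let out = '';
  let inClass = false; // inside [...], whitespace and # are literal
  for (let i = 0; i < source.length; i++) {
    const ch = source[i];
    if (ch === '\\') {
      // Keep escape sequences such as \d or \# untouched.
      out += ch + (source[i + 1] || '');
      i++;
    } else if (inClass) {
      if (ch === ']') inClass = false;
      out += ch;
    } else if (ch === '[') {
      inClass = true;
      out += ch;
    } else if (ch === '#') {
      // Skip the comment up to, but not including, the end of the line.
      while (i + 1 < source.length && source[i + 1] !== '\n') i++;
    } else if (!/\s/.test(ch)) {
      out += ch;
    }
  }
  return out;
}

// In a string, meta-chars need double backslashes: \\d instead of \d.
const extendedSource = `
  # A regular expression for a date.

  (?<year>\\d{4})-    # year part of a date
  (?<month>\\d{2})-   # month part of a date
  (?<day>\\d{2})      # day part of a date
`;

// With the plugin this would be: new RegExp(extendedSource, 'x')
const extendedDate = new RegExp(stripExtended(extendedSource));

console.log(extendedDate.source);
console.log(extendedDate.exec('2017-04-24').groups.month);
```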

Using “re” shorthand

Not only is it shorter than new RegExp(...), it also allows single escaping of meta-chars, as in regexp literals:

Note: \\1 should still be escaped with two backslashes, since \1 is treated as an octal escape, which isn’t allowed in template strings.
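A sketch of how the shorthand reads. The plugin handles re at build time; so that the snippet runs on its own, re is defined here as a hypothetical runtime tag function that mimics that behaviour in a simplified way (it ignores character classes when stripping, and it is not the plugin's code).

```javascript
// Hypothetical runtime stand-in for the plugin's build-time `re` shorthand.
function re(strings, ...values) {
  // String.raw keeps backslashes as typed, so \d stays \d.
  const raw = String.raw(strings, ...values).trim();
  const lastSlash = raw.lastIndexOf('/');
  if (raw[0] !== '/' || lastSlash === 0) {
    throw new SyntaxError('Expected a regexp in /pattern/flags notation');
  }
  let body = raw.slice(1, lastSlash);
  let flags = raw.slice(lastSlash + 1);
  if (flags.includes('x')) {
    // Keep escapes; drop #-comments and whitespace (simplified handling).
    body = body.replace(/\\[\s\S]|#.*|\s+/g, (m) => (m[0] === '\\' ? m : ''));
    flags = flags.replace('x', '');
  }
  return new RegExp(body, flags);
}

const shorthandDate = re`/
  # A regular expression for a date.

  (?<year>\d{4})-    # year part of a date
  (?<month>\d{2})-   # month part of a date
  (?<day>\d{2})      # day part of a date
/x`;

console.log(shorthandDate.source);
console.log(shorthandDate.exec('2017-04-24').groups.day);
```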

As we can see, re accepts a regular expression in literal notation, which unifies the usage format. And in both cases, the regular expressions are just translated to the simple /(\d{4})-(\d{2})-(\d{2})/ discussed above.

JS regexes still need more improvements. However, as we have seen in this article, some of them can already be used today in any legacy JS engine. Please feel free to use the plugin; I’ll appreciate any feedback, and will be glad to answer any questions.

Have fun with regexes ✎


Dmitry Soshnikov

Software engineer interested in learning and education. Sometimes blog on topics of programming languages theory, compilers, and ECMAScript.