Using modern regexes in JavaScript

It’s no secret that JavaScript’s RegExp support today is fairly limited compared with Python regexes or PCRE. The good news is that ES2018 is going to bring several great improvements to regular expressions.

Some of the new features, along with other convenient extensions, can already be used today through transpilation. We’re going to discuss a Babel plugin for modern regexes, which enables powerful RegExp features and makes regexes more convenient to use.

Babel plugin for modern regexp

Currently it implements three new features, which we’re going to discuss in detail.

“dotAll” /s flag

Note: see details in the proposal.
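As a brief illustration of what the flag changes: without s, the dot does not match line terminators such as \n; with it, the dot matches any character. The sketch below assumes an engine with native dotAll support (for example, a recent Node.js), and the sample string is made up for illustration.

```javascript
// Sample input with a line break between the two words (illustrative only).
const text = 'foo\nbar';

// Without the s flag, "." does not match "\n", so the test fails.
const withoutDotAll = /foo.bar/.test(text);

// With the s flag, "." matches any character, including "\n".
const withDotAll = /foo.bar/s.test(text);

console.log(withoutDotAll); // false
console.log(withDotAll);    // true
```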

Named capturing groups

Note: see details in the proposal.

Simple (or numbered) capturing groups allow referring to captured content either through a backreference within the regexp itself or in the match result. Each capturing group is assigned a unique number and can be referenced using that number; however, this can make a regular expression hard to grasp and refactor.

For example, given the expression /(\d{4})-(\d{2})-(\d{2})/, which matches a date, one cannot be sure which group corresponds to the month and which to the day without examining the surrounding code. Also, if we want to swap the order of the month and the day, the group references have to be updated as well.
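To make the problem concrete, here is a small sketch of numbered groups in use. The sample date and the variable names are assumptions chosen for illustration.

```javascript
const dateRe = /(\d{4})-(\d{2})-(\d{2})/;
const match = dateRe.exec('2017-04-14');

// Groups are addressed by position only; nothing in the pattern says
// whether index 2 holds the month or the day.
const year = match[1];
const month = match[2];
const day = match[3];

console.log(year, month, day); // 2017 04 14

// A numbered backreference: \1 must repeat whatever group 1 captured.
const repeated = /(\d{2})-\1/.test('14-14');
const different = /(\d{2})-\1/.test('14-15');

console.log(repeated);  // true
console.log(different); // false
```

If the month and day groups were swapped in the pattern, match[2] and match[3] would silently change meaning, and every use of them would have to be found and updated.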

Named capture groups provide a nice solution for these issues:
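A minimal sketch of the same date expression with named groups follows. It assumes an engine with native support for named groups (for example, a recent Node.js); the sample date and variable names are illustrative.

```javascript
const dateRe = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/;
const match = dateRe.exec('2017-04-14');

// Each group is read by name, so the intent is visible at the use site.
const { year, month, day } = match.groups;
console.log(year, month, day); // 2017 04 14

// Named references also work in replacement strings...
const swapped = '2017-04-14'.replace(dateRe, '$<day>.$<month>.$<year>');
console.log(swapped); // 14.04.2017

// ...and as backreferences inside the pattern, via \k<name>.
const sameTwice = /(?<dd>\d{2})-\k<dd>/.test('14-14');
console.log(sameTwice); // true
```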

As we can see, the regexp looks much clearer now. If the plugin is used with the useRuntime option, one can also access the groups property on the result.

Extended RegExp: /x flag

The x flag enables two cool features in regular expressions. First, most whitespace is ignored; second, #-comments can be used inside regular expressions, making them much cleaner and more comprehensible.

Since the flag is not yet supported by JS engines and parsers, it cannot be used with regexp literals. However, we can use the RegExp constructor for this:
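With the plugin, a constructor call carrying the x flag is rewritten at build time; a native engine would reject the flag. To keep the example runnable without Babel, the sketch below passes the same kind of annotated pattern through a hypothetical helper, extendedRegExp, which naively strips #-comments and whitespace before calling the constructor. The helper is an assumption made for illustration, not the plugin's implementation.

```javascript
// Hypothetical helper (not part of the plugin): emulates the x flag at runtime.
// Naive by design: it does not handle an escaped "#" or whitespace inside [...].
function extendedRegExp(source, flags = '') {
  const stripped = source
    .replace(/#.*$/gm, '') // drop "#" comments up to the end of each line
    .replace(/\s+/g, '');  // drop all remaining whitespace
  return new RegExp(stripped, flags);
}

// In an ordinary string or template literal, backslashes must be doubled: \\d.
const dateRe = extendedRegExp(`
  # A regular expression for a date.

  (\\d{4})-  # year
  (\\d{2})-  # month
  (\\d{2})   # day
`);

console.log(dateRe.source);             // (\d{4})-(\d{2})-(\d{2})
console.log(dateRe.test('2017-04-14')); // true
```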

Now the regexp is annotated with comments, and a person reading this code doesn’t have to parse a complex one-liner regex by eye.

This is already better, but it adds a slight inconvenience: meta-characters (as in any JavaScript string) have to be escaped with double backslashes, \\d instead of the \d used in regexp literals. To solve this, the plugin provides a convenient re shorthand.

Using “re” shorthand
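The plugin handles the shorthand at compile time. As a rough runtime approximation of the idea, the sketch below defines a hypothetical tag function, reSketch, that reads the raw template text, so \d needs no doubling. The function name and its parsing logic are assumptions made for illustration, not the plugin's code, and the comment stripping is deliberately naive.

```javascript
// Hypothetical runtime stand-in for a compile-time "re" shorthand.
// Assumption: the template holds one regexp in literal notation, /body/flags,
// with no ${...} substitutions.
function reSketch(strings) {
  // strings.raw keeps "\d" as a backslash followed by "d".
  const literal = strings.raw[0].trim();
  const lastSlash = literal.lastIndexOf('/');
  const body = literal.slice(1, lastSlash);
  const flags = literal.slice(lastSlash + 1);

  // Emulate the x flag naively: drop "#" comments, then all whitespace.
  const source = flags.includes('x')
    ? body.replace(/#.*$/gm, '').replace(/\s+/g, '')
    : body;

  // Native engines do not know the x flag, so it is removed before compiling.
  return new RegExp(source, flags.replace('x', ''));
}

const dateRe = reSketch`/
  # A regular expression for a date.

  (\d{4})-  # year
  (\d{2})-  # month
  (\d{2})   # day
/x`;

console.log(dateRe.source);             // (\d{4})-(\d{2})-(\d{2})
console.log(dateRe.test('2017-04-14')); // true
```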

Note: \\1 should still be escaped with two backslashes, since \1 is treated as an octal escape, which isn’t allowed in template strings.

As we can see, re accepts a regular expression in literal notation, which unifies the usage format. And in both cases, the regular expressions are just translated to the simple /(\d{4})-(\d{2})-(\d{2})/ discussed above.

JS regexes still need more improvements; however, as we have seen in this article, some of them can already be used today in any legacy JS engine. Please feel free to use the plugin; I’ll appreciate any feedback and will be glad to answer any questions.

Have fun with regexes ✎

Software engineer interested in learning and education. Sometimes blog on topics of programming languages theory, compilers, and ECMAScript.