Named capturing groups in JavaScript regex?

As far as I know there is no such thing as named capturing groups in JavaScript. What is the alternative way to get similar functionality?

Solutions/Answers:

Solution 1:

ECMAScript 2018 introduces named capturing groups into JavaScript regexes.

If you need to support older browsers, you can do everything with normal (numbered) capturing groups that you can do with named capturing groups. You just need to keep track of the numbers, which may be cumbersome if the order of the capturing groups in your regex changes.

There are only two “structural” advantages of named capturing groups I can think of:

  1. In some regex flavors (.NET and JGSoft, as far as I know), you can use the same name for different groups in your regex (see here for an example where this matters). But most regex flavors do not support this functionality anyway.

  2. If you need to refer to numbered capturing groups in a situation where they are surrounded by digits, you can run into a problem. Let’s say you want to add a zero to a digit and therefore want to replace (\d) with $10. In JavaScript, this will work (as long as you have fewer than 10 capturing groups in your regex), but Perl will think you’re looking for backreference number 10 instead of number 1 followed by a 0. In Perl, you can use ${1}0 in this case.
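
To make the second point concrete, here is a minimal sketch of the "append a zero" replacement in JavaScript, first with a numbered reference and then with an ES2018 named reference. The group name `digit` and the variable names are only illustrative, not something from the answer above.

```javascript
// Numbered reference followed by a literal digit.
// The regex has only one capturing group, so "$10" is read as "$1" plus "0".
const numbered = '5'.replace(/(\d)/, '$10');

// Named reference (ES2018): the name is delimited by <...>, so the trailing
// digit cannot be mistaken for part of the group reference.
const named = '5'.replace(/(?<digit>\d)/, '$<digit>0');

console.log(numbered); // "50"
console.log(named);    // "50"
```

With a named reference the ambiguity never comes up, no matter how many groups the regex has.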

Other than that, named capturing groups are just “syntactic sugar”. It helps to use capturing groups only when you really need them and to use non-capturing groups (?:...) in all other circumstances.
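
As a small illustration of that last piece of advice (the sample pattern and input are made up for this sketch), compare how the numbering shifts when a group that is only there for alternation is made non-capturing:

```javascript
// Capturing group used only for alternation: it still takes up index 1.
const capturing = /(Mr|Ms)\. (\w+)/.exec('Ms. Lovelace');
console.log(capturing[1]); // "Ms"
console.log(capturing[2]); // "Lovelace"

// Non-capturing group: the surname is now the first (and only) capture.
const nonCapturing = /(?:Mr|Ms)\. (\w+)/.exec('Ms. Lovelace');
console.log(nonCapturing[1]); // "Lovelace"
```

Fewer capturing groups means fewer indexes to keep track of, which matters most when you cannot use names.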

The bigger problem (in my opinion) with JavaScript is that it does not support verbose regexes which would make the creation of readable, complex regular expressions a lot easier.

Steve Levithan’s XRegExp library solves these problems.

Solution 2:

You can use XRegExp, an augmented, extensible, cross-browser implementation of regular expressions, including support for additional syntax, flags, and methods:

  • Adds new regex and replacement text syntax, including comprehensive support for named capture.
  • Adds two new regex flags: s, to make dot match all characters (aka dotall or singleline mode), and x, for free-spacing and comments (aka extended mode).
  • Provides a suite of functions and methods that make complex regex processing a breeze.
  • Automagically fixes the most commonly encountered cross-browser inconsistencies in regex behavior and syntax.
  • Lets you easily create and use plugins that add new syntax and flags to XRegExp’s regular expression language.

Solution 3:

Another possible solution: create an object containing the group names and indexes.

var regex = new RegExp("(.*) (.*)");
var regexGroups = { FirstName: 1, LastName: 2 };

Then, use the object keys to reference the groups:

var m = regex.exec("John Smith");
var f = m[regexGroups.FirstName];

This improves the readability/quality of the code using the results of the regex, but not the readability of the regex itself.
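
If you use this pattern in several places, the lookup could be wrapped in a small helper. This is only a sketch: `nameGroups` is a hypothetical function name, and it is written in ES5 style on the assumption that the goal is to support engines without native named groups.

```javascript
// Hypothetical helper: converts a match array into an object keyed by the
// names in groupIndexes. Returns null when there was no match.
function nameGroups(match, groupIndexes) {
  if (match === null) {
    return null;
  }
  var result = {};
  Object.keys(groupIndexes).forEach(function (name) {
    result[name] = match[groupIndexes[name]];
  });
  return result;
}

var nameRegex = new RegExp("(.*) (.*)");
var nameIndexes = { FirstName: 1, LastName: 2 };

var person = nameGroups(nameRegex.exec("John Smith"), nameIndexes);
console.log(person.FirstName); // "John"
console.log(person.LastName);  // "Smith"
```

The index map still has to be updated by hand whenever the groups in the regex are reordered.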

Solution 4:

In ES6 you can use array destructuring to catch your groups:

let text = '27 months';
let regex = /(\d+)\s*(days?|months?|years?)/;
let [, count, unit] = regex.exec(text) || [];

// count === '27'
// unit === 'months'

Notice:

  • the first comma in the last let skips the first value of the resulting array, which is the whole matched string
  • the || [] after .exec() will prevent a destructuring error when there are no matches (because .exec() will return null)
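
In engines that support ES2018 named groups, the same idea should also work with object destructuring on the `groups` property, so the variables no longer depend on group order. A sketch under that assumption (the group names `count` and `unit` are chosen here for illustration):

```javascript
const durationRegex = /(?<count>\d+)\s*(?<unit>days?|months?|years?)/;

// "|| {}" covers a null result from exec(); "= {}" covers a missing groups object.
const { groups: { count, unit } = {} } = durationRegex.exec('27 months') || {};
console.log(count); // "27"
console.log(unit);  // "months"

// No match: both variables are simply undefined instead of throwing.
const { groups: { count: noCount, unit: noUnit } = {} } = durationRegex.exec('soon') || {};
console.log(noCount, noUnit); // undefined undefined
```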

Solution 5:

Update: It finally made it into JavaScript (ECMAScript 2018)!

Named capturing groups could make it into JavaScript very soon.
The proposal for it is at stage 3 already.

A capture group can be given a name inside angle brackets using the (?<name>...) syntax, for
any identifier name. The regular expression for a date then can be
written as /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/u. Each name
should be unique and follow the grammar for ECMAScript IdentifierName.

Named groups can be accessed from properties of a groups property of
the regular expression result. Numbered references to the groups are
also created, just as for non-named groups. For example:

let re = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/u;
let result = re.exec('2015-01-02');
// result.groups.year === '2015';
// result.groups.month === '01';
// result.groups.day === '02';

// result[0] === '2015-01-02';
// result[1] === '2015';
// result[2] === '01';
// result[3] === '02';
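
Named groups can also be referenced in replacement strings with $<name> and inside the pattern itself with \k<name>. A short sketch (the `doubledWord` pattern and the variable names are just examples):

```javascript
const dateRe = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/u;

// $<name> in a replacement string inserts the text captured by that group.
const reordered = '2015-01-02'.replace(dateRe, '$<day>/$<month>/$<year>');
console.log(reordered); // "02/01/2015"

// \k<name> inside the pattern is a backreference to the named group.
const doubledWord = /\b(?<word>\w+) \k<word>\b/u;
console.log(doubledWord.test('the the cat')); // true
console.log(doubledWord.test('the cat'));     // false
```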

Solution 6:

Naming captured groups provides one thing: less confusion with complex regular expressions.

It really depends on your use-case but maybe pretty-printing your regex could help.

Or you could try and define constants to refer to your captured groups.

Comments might then also help to show others who read your code, what you have done.
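
One possible way to combine these three suggestions, sketched with made-up names (`YEAR`, `GROUP_YEAR`, `isoDate` and so on are not from any library), is to build the pattern from commented pieces and refer to the groups through constants:

```javascript
// Each piece of the pattern gets its own line and comment.
const YEAR = '(\\d{4})';  // group 1: four-digit year
const MONTH = '(\\d{2})'; // group 2: two-digit month
const DAY = '(\\d{2})';   // group 3: two-digit day
const isoDate = new RegExp('^' + YEAR + '-' + MONTH + '-' + DAY + '$');

// Constants instead of magic numbers when reading the result.
const GROUP_YEAR = 1;
const GROUP_MONTH = 2;
const GROUP_DAY = 3;

const dateMatch = isoDate.exec('2015-01-02');
console.log(dateMatch[GROUP_YEAR]);  // "2015"
console.log(dateMatch[GROUP_MONTH]); // "01"
console.log(dateMatch[GROUP_DAY]);   // "02"
```

The constants still have to be kept in sync with the pattern by hand, so this only reduces the confusion rather than removing it.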

For the rest, I must agree with Tim's answer.