Fix emulation group capture map
slevithan committed Aug 18, 2024
1 parent 88a5d8c commit 345c73e
Showing 4 changed files with 32 additions and 21 deletions.
4 changes: 4 additions & 0 deletions CHANGELOG.md
@@ -5,6 +5,10 @@
 - Combining atomic/possessive syntax with subroutines previously resulted in subroutines using capturing wrappers. This is now avoided when the regex doesn’t use backreferences, resulting in faster-running generated regex source.
 - Possessive fixed repetition quantifiers (e.g. `{2}+`) are now converted to greedy quantifiers, which gives the same behavior with faster-running generated regex source.
 
+**Fixes**
+
+- When using extended syntax (e.g. atomic groups) that resulted in the use of emulation groups in generated source, the `subclass: true` option led to incorrect values for submatches that preceded emulation groups.
+
 ## Released changes
 
 Changes for released versions are tracked on the GitHub [releases](https://github.com/slevithan/regex/releases) page.
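
For context on the fix entry above, here is a minimal sketch of why emulation groups affect numbered submatches. It uses only native `RegExp` and assumes an atomic group such as `(?>(?<a>.))` is emulated with a capturing lookahead followed by a backreference; the source the library actually generates is not shown in this commit, so the pattern below is illustrative only.

```javascript
// Assumed emulation of `(?>(?<a>.))(?<b>.)`. The outer capture inside the
// lookahead exists only to support the backreference: it plays the role of an
// "emulation group" that the regex author never wrote.
const emulated = /(?=((?<a>.)))\1(?<b>.)/;
const raw = emulated.exec('ab');

// Native numbering counts the emulation group, so the author's second group
// (`b`) lands at index 3 instead of index 2.
const rawValues = [...raw]; // ['ab', 'a', 'a', 'b']
const byName = raw.groups.b; // 'b' (named access is unaffected)
const byNumber = raw[2]; // 'a' (not the 'b' the author would expect)
```

With `subclass: true`, the library hides such groups from results so that numbered references line up with the groups the author wrote.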
2 changes: 1 addition & 1 deletion README.md
@@ -228,7 +228,7 @@ Here's how possessive quantifier syntax compares to the greedy and lazy quantifi
 | *N* or more | `{2,}` | `{2,}?` | `{2,}+` |
 | Between *N* and *M* | `{0,5}` | `{0,5}?` | `{0,5}+` |
 
-> Fixed repetition quantifiers like `{2}` behave the same whether they're greedy `{2}`, lazy `{2}?`, or possessive `{2}+`.
+> Fixed repetition quantifiers behave the same whether they're greedy `{2}`, lazy `{2}?`, or possessive `{2}+`.
 
 > [!NOTE]
 > Possessive quantifiers are supported in many other regex flavors. There's a [proposal](https://github.com/tc39/proposal-regexp-atomic-operators) to add them to JavaScript.
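
The README note about fixed repetition can be checked with native `RegExp` alone. This is a hedged sketch: JavaScript has no possessive syntax, so the possessive forms are approximated here with a capturing lookahead plus backreference, which is an assumption about how such emulation can work rather than the library's confirmed output.

```javascript
// Fixed repetition has only one possible length, so there is nothing to
// backtrack into and all three forms match the same text.
const greedy = 'aaa'.match(/a{2}a/)[0];
const lazy = 'aaa'.match(/a{2}?a/)[0];
const possessiveFixed = 'aaa'.match(/(?=(a{2}))\1a/)[0];

// Variable repetition is where possessive matching differs: greedy `a{1,2}a`
// gives back one `a` to let the final `a` match, but the possessive form
// cannot give anything back, so the overall match fails.
const greedyRange = 'aa'.match(/a{1,2}a/);
const possessiveRange = 'aa'.match(/(?=(a{1,2}))\1a/);
```

Here `greedy`, `lazy`, and `possessiveFixed` all hold `'aaa'`, while `possessiveRange` is `null` even though `greedyRange` matches.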
1 change: 1 addition & 0 deletions spec/regex-tag.spec.js
@@ -119,6 +119,7 @@ describe('regex', () => {
   it('should adjust for emulation groups when referencing groups by number from outside the regex', () => {
     // RegExp#exec
     expect(regex({subclass: true})`(?>(?<a>.))(?<b>.)`.exec('ab')[2]).toBe('b');
+    expect(regex({subclass: true})`(?<a>.)(?>(?<b>.))`.exec('ab')[1]).toBe('a');
     // String#replace: replacement string
     expect('ab'.replace(regex({subclass: true})`(?>(?<a>.))(?<b>.)`, '$2$1')).toBe('ba');
     // String#replace: replacement function
46 changes: 26 additions & 20 deletions src/regex.js
@@ -132,27 +132,27 @@ const regexFromTemplate = (options, template, ...substitutions) => {
   ].forEach(p => expression = p(expression, {flags: fullFlags, useEmulationGroups: subclass}));
   if (subclass) {
     const unmarked = unmarkEmulationGroups(expression);
-    return new WrappedRegex(unmarked.expression, fullFlags, {emulationGroups: unmarked.emulationGroups});
+    return new WrappedRegex(unmarked.expression, fullFlags, {captureMap: unmarked.captureMap});
   }
   return new RegExp(expression, fullFlags);
 }
 
 class WrappedRegex extends RegExp {
-  #emulationGroups;
+  #captureMap;
   /**
   @param {string | WrappedRegex} expression
   @param {string} [flags]
-  @param {{emulationGroups: Array<boolean>;}} [data]
+  @param {{captureMap: Array<boolean>;}} [data]
   */
   constructor(expression, flags, data) {
     super(expression, flags);
     if (data) {
-      this.#emulationGroups = data.emulationGroups;
+      this.#captureMap = data.captureMap;
     // The third argument `data` isn't provided when regexes are copied as part of the internal
     // handling of string methods `matchAll` and `split`
     } else if (expression instanceof WrappedRegex) {
       // Can read private properties of the existing object since it was created by this class
-      this.#emulationGroups = expression.#emulationGroups;
+      this.#captureMap = expression.#captureMap;
     }
   }
   /**
@@ -163,14 +163,14 @@ class WrappedRegex extends RegExp {
   */
   exec(str) {
     const match = RegExp.prototype.exec.call(this, str);
-    if (!match || !this.#emulationGroups) {
+    if (!match || !this.#captureMap) {
       return match;
     }
     const copy = [...match];
     // Empty all but the first value of the array while preserving its other properties
     match.length = 1;
     for (let i = 1; i < copy.length; i++) {
-      if (!this.#emulationGroups[i]) {
+      if (this.#captureMap[i]) {
         match.push(copy[i]);
       }
     }
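
The filtering done in `exec` above can be sketched as a standalone class. This is a simplified illustration, not the library's `WrappedRegex`: the class name `FilteredRegex`, its constructor signature, and the hand-written capture map are assumptions made for the example, and the copy handling for `matchAll` and `split` is omitted.

```javascript
// Hypothetical, stripped-down counterpart of `WrappedRegex`
class FilteredRegex extends RegExp {
  #captureMap;
  constructor(pattern, flags, captureMap) {
    super(pattern, flags);
    this.#captureMap = captureMap;
  }
  exec(str) {
    const match = super.exec(str);
    if (!match || !this.#captureMap) {
      return match;
    }
    const copy = [...match];
    // Truncate in place so `index`, `input`, and `groups` are preserved
    match.length = 1;
    for (let i = 1; i < copy.length; i++) {
      // `true` means the group was written by the author and stays visible
      if (this.#captureMap[i]) {
        match.push(copy[i]);
      }
    }
    return match;
  }
}

// Group 1 is the (assumed) emulation capture, so it is mapped to `false`
const re = new FilteredRegex('(?=((?<a>.)))\\1(?<b>.)', '', [true, false, true, true]);
const m = re.exec('ab');
const values = [...m]; // ['ab', 'a', 'b']
// String#replace consults the overridden `exec`, so `$1` and `$2` follow the
// filtered numbering as well
const swapped = 'ab'.replace(re, '$2$1'); // 'ba'
```

Storing `true` for visible groups means any index missing from the map is treated as hidden rather than silently passed through.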
@@ -305,24 +305,30 @@ function transformForLocalFlags(re, outerFlags) {
 }
 
 /**
-Build the emulation group map and remove markers for anonymous captures which were added to emulate
-extended syntax.
+Build the capturing group map (with emulation groups marked as `false` to indicate their submatches
+shouldn't appear in results), and remove the markers for anonymous captures which were added to
+emulate extended syntax.
 @param {string} expression
-@returns {{expression: string; emulationGroups: Array<boolean>;}}
+@returns {{expression: string; captureMap: Array<boolean>;}}
 */
 function unmarkEmulationGroups(expression) {
   const marker = emulationGroupMarker.replace(/\$/g, '\\$');
-  const emulationGroups = [false];
-  expression = replaceUnescaped(expression, `(?:${capturingDelim})${marker}`, ({0: m}) => {
-    if (m.endsWith(emulationGroupMarker)) {
-      emulationGroups.push(true);
-      return m.slice(0, -emulationGroupMarker.length);
-    }
-    emulationGroups.push(false);
-    return m;
-  }, Context.DEFAULT);
+  const captureMap = [true];
+  expression = replaceUnescaped(
+    expression,
+    `(?:${capturingDelim})(?<mark>${marker})?`,
+    ({0: m, groups: {mark}}) => {
+      if (mark) {
+        captureMap.push(false);
+        return m.slice(0, -emulationGroupMarker.length);
+      }
+      captureMap.push(true);
+      return m;
+    },
+    Context.DEFAULT
+  );
   return {
-    emulationGroups,
+    captureMap,
     expression,
   };
 }
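
The change to `unmarkEmulationGroups` can be reproduced in isolation. The sketch below rests on several assumptions: `MARKER` is a made-up stand-in for `emulationGroupMarker`, `captureOpen` is a simplified stand-in for `capturingDelim` that ignores escapes and character classes, and plain `String#replace` is used in place of `replaceUnescaped`. It shows why the marker has to be optional in the search pattern: when the marker is required, unmarked captures are never counted, so the map's indexes stop lining up with real group numbers.

```javascript
// Hypothetical marker; the real `emulationGroupMarker` value isn't in this diff
const MARKER = '$E$';
const escapedMarker = MARKER.replace(/\$/g, '\\$');
// Simplified opener for numbered and named capturing groups (assumption)
const captureOpen = String.raw`\((?!\?)|\(\?<(?![=!])[^>]+>`;

// New approach: visit every capturing group; the marker is optional
function buildCaptureMap(expression) {
  const captureMap = [true];
  const re = new RegExp(`(?:${captureOpen})(?<mark>${escapedMarker})?`, 'g');
  const unmarked = expression.replace(re, (m, mark) => {
    if (mark) {
      captureMap.push(false);
      return m.slice(0, -MARKER.length);
    }
    captureMap.push(true);
    return m;
  });
  return {expression: unmarked, captureMap};
}

// Old approach, for contrast: only marked groups are ever visited
function buildEmulationGroupsOld(expression) {
  const emulationGroups = [false];
  const re = new RegExp(`(?:${captureOpen})${escapedMarker}`, 'g');
  for (const _ of expression.matchAll(re)) {
    emulationGroups.push(true);
  }
  return emulationGroups;
}

// Assumed marked form of `(?<a>.)(?>(?<b>.))`
const marked = String.raw`(?<a>.)(?=(${MARKER}(?<b>.)))\2`;
const oldMap = buildEmulationGroupsOld(marked); // [false, true]
const result = buildCaptureMap(marked); // captureMap: [true, true, false, true]

const raw = new RegExp(result.expression).exec('ab');
const visible = [...raw].filter((_, i) => result.captureMap[i]); // ['ab', 'a', 'b']
```

In `oldMap`, index 1 is flagged as an emulation group even though group 1 is the author's `a`, which matches the symptom in the changelog: wrong values for submatches that precede an emulation group.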