Fix capturing group numbering and a few more issues in regex parsing #392
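The JS regex engine assigns numbers to capturing groups sequentially (regardless of whether the group is named or not), but .NET uses a different, weird approach: unnamed groups are numbered first, from left to right, and named groups are numbered only after them.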
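As a quick illustration of the sequential numbering on the JS side, here is a minimal sketch using the pattern from this description (the input strings are made up for the example):

```javascript
// In JS, groups are numbered by the position of their opening parenthesis,
// whether they are named or not.
const re = /.(?<a>a)(.)\1/;

// Group 1 is the named group "a", group 2 is the unnamed group,
// so \1 has to repeat whatever (?<a>a) captured.
const m = re.exec("xaya");
console.log(m[1], m[2], m.groups.a);

// "xayy" would only match if \1 referred to the unnamed group (.),
// which is what happens when the same pattern is handed to .NET unchanged.
console.log(re.test("xayy"));
```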
This could totally mess up numbered backreferences and replace pattern references. (See also sebastienros/jint#1603 (comment).) So, as a workaround, we wrap every named capturing group in an unnamed capturing group to force .NET to include all capturing groups in the resulting match in the expected order. (For example, `/.(?<a>a)(.)\1/` will be converted to `@".((?<a>a))(.)\1"`, which makes it work as expected.)

Of course, this won't prevent named groups from being listed after the numbered ones, but we can't really do anything about that other than returning the actual count of groups (`RegExpParseResult.ActualRegexGroupCount`). Using this information, consumers can discard the irrelevant groups. We also provide a method (`RegExpParseResult.GetRegexGroupName`) for querying the group name by number, which can be used as a replacement for `Regex.GroupNameFromNumber`.

This means that we can no longer get away with storing a plain `Regex` object in the AST or returning it from `Scanner.AdaptRegExp`. We need to put a wrapper (`RegExpParseResult`) around the `Regex` object to be able to store the related information. (`RegExpParseResult` also allows us to improve error reporting a bit.) How do you feel about this API change? (BTW, I tried to adjust Jint to this change and could do that without major problems. I'll open a PR over there soon so you can evaluate that side of the "equation" as well.)

Also fixed a minor bug related to lone surrogate matching. (Test262 test `language/literals/regexp/u-surrogate-pairs-atom-escape-decimal.js` will now pass.)