Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement workarounds for regex parsing known issues #1603

Merged
merged 4 commits into from
Jul 31, 2023
Merged

Implement workarounds for regex parsing known issues #1603

merged 4 commits into from
Jul 31, 2023

Conversation

adams85
Copy link
Contributor

@adams85 adams85 commented Jul 30, 2023

Implements workarounds for regex parsing known issues.

Also, fixes a bug in unicode mode regex handling (code point -> code unit index translation is not needed as .NET regexes return code unit indices).

Copy link
Collaborator

@lahma lahma left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, thanks for helping out here! Seems that following test cases can also removed from Test262Harness.settings.json exclude now as they pass after the fixes:

"built-ins/RegExp/match-indices/indices-array-unicode-property-names.js",
"built-ins/RegExp/named-groups/non-unicode-property-names.js",
"built-ins/RegExp/named-groups/unicode-property-names.js",
"built-ins/RegExp/prototype/Symbol.replace/named-groups.js",

Jint/Native/RegExp/RegExpPrototype.cs Outdated Show resolved Hide resolved
Jint/Native/RegExp/RegExpPrototype.cs Outdated Show resolved Hide resolved
@adams85
Copy link
Contributor Author

adams85 commented Jul 31, 2023

PR updated. There's a failing macos test but doesn't seem to have anything to do with my changes:

[xUnit.net 00:00:01.75]     Jint.Tests.PublicInterface.PublicInterfaceTests.BindFunctionInstancesArePublic [FAIL]

Error: System.AggregateException : One or more errors occurred. (Unable to cast object of type 'Jint.Runtime.Environments.GlobalEnvironmentRecord' to type 'Jint.Runtime.Environments.FunctionEnvironmentRecord'.)
---- System.InvalidCastException : Unable to cast object of type 'Jint.Runtime.Environments.GlobalEnvironmentRecord' to type 'Jint.Runtime.Environments.FunctionEnvironmentRecord'.

Is this really something unrelated?

@lahma
Copy link
Collaborator

lahma commented Jul 31, 2023

PR updated. There's a failing macos test but doesn't seem to have anything to do with my changes:

[xUnit.net 00:00:01.75]     Jint.Tests.PublicInterface.PublicInterfaceTests.BindFunctionInstancesArePublic [FAIL]

Error: System.AggregateException : One or more errors occurred. (Unable to cast object of type 'Jint.Runtime.Environments.GlobalEnvironmentRecord' to type 'Jint.Runtime.Environments.FunctionEnvironmentRecord'.)
---- System.InvalidCastException : Unable to cast object of type 'Jint.Runtime.Environments.GlobalEnvironmentRecord' to type 'Jint.Runtime.Environments.FunctionEnvironmentRecord'.

Is this really something unrelated?

Unrelated. Engine is not thread safe and for some reason that test case is racy on macos.

Copy link
Collaborator

@lahma lahma left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. I guess we can merge?

@adams85
Copy link
Contributor Author

adams85 commented Jul 31, 2023

Yep, I think this one is ready.

However, I just discovered another issue:

  • "bab".replace(/.(?<a>a)(.)/, "$1$2") results "ab", while
  • new Regex(".(?<a>a)(.)", RegexOptions.ECMAScript).Replace("bab", "$1$2") results "ba".

It seems that capturing group numbering is messed up in .NET, at least it doesn't work in the same way as in JS (which assigns strictly increasing numbers to capturing groups, regardless of being a named group or not). This is a pretty sad situation and I'm not sure if we can do anything about it. I'll try to find a workaround but will do that in a separate PR (if it exists at all - but probably that will need to be fixed in Esprima...)

@lahma
Copy link
Collaborator

lahma commented Jul 31, 2023

It's almost like that RegexOptions.ECMAScript wasn't doing it's job 😉

@lahma
Copy link
Collaborator

lahma commented Jul 31, 2023

For some reason I cannot neither update or rebase this branch via GitHub UI, could you please do the rebase locally and force push?

@lahma lahma enabled auto-merge (squash) July 31, 2023 12:15
@lahma lahma merged commit 6eddc61 into sebastienros:main Jul 31, 2023
3 checks passed
@lahma
Copy link
Collaborator

lahma commented Jul 31, 2023

Thanks once again!

@adams85
Copy link
Contributor Author

adams85 commented Jul 31, 2023

Glad to help!

I did some investigation on capturing group numbering. According to MSDN, this "anomaly" is by design:

[...] Captures that use parentheses are numbered automatically from left to right based on the order of the opening parentheses in the regular expression, starting from 1. However, named capture groups are always ordered last, after non-named capture groups. The capture that's numbered 0 is the text matched by the entire regular expression pattern.

Crazy design decision. I never thought I would say this, but in this regard JS is the sane one ... If only MS had got the ECMAScript compatibility mode right at least... But that ship has sailed for good.

I can see only one viable workaround: in the regex parser we rewrite all named capturing groups to numbered ones (and, of course, named backrefs as well), like /.(?<a>a)(.)\k<a>/ -> @".(a)(.)\1". Only problem is that Jint will (and other consumers may) want to look up matches by group names, so we also need to return that information in some form (probably a group name -> group number lookup), besides the Regex instance. The regex parser gathers this information, so technically this workaround should be possible.

I think we can't get away without addressing this issues because it's a big one. Without fixing this, we can't even call legacy regex handling okayish... So looks like we're in for another RC. I'll get back to you soon.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants