Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Upgrade to Esprima 3.0.0-rc-03 #1596

Merged
merged 4 commits into from
Jul 29, 2023
Merged

Upgrade to Esprima 3.0.0-rc-03 #1596

merged 4 commits into from
Jul 29, 2023

Conversation

lahma
Copy link
Collaborator

@lahma lahma commented Jul 29, 2023

No description provided.

Copy link
Contributor

@adams85 adams85 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure whether we should also address this Unicode mode (flag u) problem:

There could be false positive empty string matches in the middle of surrogate pairs. Patterns as simple as /a?/ have this issue when the input string contains astral Unicode chars. I didn't find a workaround for this but I'm pretty sure that, if it's exists, it's very complicated. This is a big problem but can be mitigated somewhat using a bit of "post-processing", that is, filtering out the false positive matches after evaluation like it's done here. I'm afraid we can't do anything better than this until .NET adds the options to treat the input string as Unicode code points.

If the issue is not clear, just check the output of this expression: [..."💩".matchAll(/a?/ug)]. It should return only 2 matches (with index 0 and 2) but Jint will report 3 matches (an extra fake one with index 1) unless we do something about it.

What do you think? Should we, as a best effort, remove such false positive matches during regex evaluation? I'm not familiar with the Jint internals but I suppose it could be done by adjusting RegExpPrototype.

Jint/Native/RegExp/RegExpConstructor.cs Outdated Show resolved Hide resolved
@lahma
Copy link
Collaborator Author

lahma commented Jul 29, 2023

I think you are the best authority to assess this as you've gone so deep with regex parsing. Jint generally just tries its best with produced regexes. Many seem as quite advanced scenarios so not sure how often people will hit limitations.

Maybe after this has been merged you could check if you can work some magic on Jint side? No pressure as Jint probably isn't your main scenario to tweak Esprima 🙂

@adams85
Copy link
Contributor

adams85 commented Jul 29, 2023

Sure thing. I'll have a look soon if the situation can be improved.

@lahma lahma enabled auto-merge (squash) July 29, 2023 19:00
@lahma lahma merged commit ac1cec8 into sebastienros:main Jul 29, 2023
3 checks passed
@lahma lahma deleted the esprima-3-rc3 branch July 29, 2023 19:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants