
Correct regular expression flags scanning for non-BMP characters #58612

Merged: 5 commits, Jun 4, 2024


@graphemecluster
Contributor

graphemecluster commented May 22, 2024

Per the IsValidRegularExpressionLiteral static semantics in the ECMA-262 specification, FlagText should be interpreted as code points, not code units.

Cherry picked from #58289@e67692a.
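
As a rough illustration of the code point versus code unit distinction described above, here is a minimal sketch of a flag check that walks the flag text one code point at a time. The helper name `isValidFlagText` and the constant `VALID_FLAGS` are hypothetical and are not taken from the TypeScript scanner; the flag set shown is the one ECMA-262 defined at the time of this PR.

```typescript
// Flags allowed by ECMA-262 at the time of this PR: d, g, i, m, s, u, v, y.
const VALID_FLAGS = "dgimsuvy";

// Hypothetical helper for illustration; not the TypeScript scanner's code.
function isValidFlagText(flagText: string): boolean {
    const seen = new Set<number>();
    let pos = 0;
    while (pos < flagText.length) {
        // Read a whole code point, which may span two UTF-16 code units.
        const cp = flagText.codePointAt(pos)!;
        const ch = String.fromCodePoint(cp);
        // Reject unknown flags and flags that appear more than once.
        if (!VALID_FLAGS.includes(ch) || seen.has(cp)) return false;
        seen.add(cp);
        // Advance by 2 for non-BMP code points (above U+FFFF), else by 1.
        pos += cp > 0xffff ? 2 : 1;
    }
    return true;
}
```

With this sketch, `isValidFlagText("gi")` is accepted, while `isValidFlagText("g\u{1D454}")` is rejected as a single unknown code point rather than as two unrelated code units.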

@typescript-bot
Collaborator

This PR doesn't have any linked issues. Please open an issue that references this PR. From there we can discuss and prioritise.

@typescript-bot added the "For Uncommitted Bug" label (PR for untriaged, rejected, closed or missing bug) on May 22, 2024

@rbuckton
Member

Overall, this looks good. It looks like there are some failing tests that still need to be resolved. Also, since this isn't a bug fix, per se, I expect we will want to hold off from putting this in the 5.5 RC and will wait to merge after main is open for TypeScript 5.6 work.

@rbuckton
Member

rbuckton left a comment

Switching from "Approve" to "Request Changes" until the test issues are resolved.

@graphemecluster
Contributor Author

IMO it is a bug fix especially since I forgot to revert the change on line 2561 in #58289. And it’s important to align the scanner behavior to the spec.
#58615 is a bug fix, too, that is even more critical since the bug breaks real-world codebases.
#58613 is not the case, but (as an excuse) it was supposed to be included in #55600.

@rbuckton
Member

rbuckton commented May 31, 2024

> IMO it is a bug fix especially since I forgot to revert the change on line 2561 in #58289. And it’s important to align the scanner behavior to the spec.

What change do you mean? If it's not related to code points then I could see taking a small fix for that. Properly handling non-BMP characters in flags doesn't come up in practice, and delaying this specific functionality will only result in a slightly less informative error, since none of the valid flag/modifier characters will ever be equivalent to the first code unit of a surrogate pair.

> #58615 is a bug fix, too, that is even more critical since the bug breaks real-world codebases. #58613 is not the case, but (as an excuse) it was supposed to be included in #55600.

#58613 can wait till 5.6, but I'll review #58615 shortly.
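
The remark above that delaying the change only makes the error slightly less informative can be sketched as follows. Both helpers are hypothetical and exist only for this comparison; neither is the scanner's real code. All valid flags are ASCII letters, while the halves of a surrogate pair lie in the range U+D800 to U+DFFF, so a scan by code unit still rejects a non-BMP flag. The difference is only in which character gets reported.

```typescript
// Hypothetical helpers contrasting the two ways to report an unknown flag.

// Scans one UTF-16 code unit at a time.
function firstUnknownFlagByCodeUnit(flags: string): string | undefined {
    for (let i = 0; i < flags.length; i++) {
        const unit = flags.charAt(i);
        if (!"dgimsuvy".includes(unit)) return unit;
    }
    return undefined;
}

// Scans one full code point at a time.
function firstUnknownFlagByCodePoint(flags: string): string | undefined {
    for (let i = 0; i < flags.length; ) {
        const cp = flags.codePointAt(i)!;
        const ch = String.fromCodePoint(cp);
        if (!"dgimsuvy".includes(ch)) return ch;
        // ch.length is 2 for a surrogate pair and 1 otherwise.
        i += ch.length;
    }
    return undefined;
}
```

For the flag text `"g\u{1D454}"`, the code unit version reports the lone leading surrogate `"\uD835"`, while the code point version reports the whole character `"\u{1D454}"`. Both reject the input.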

@graphemecluster
Contributor Author

> What change do you mean? If it's not related to code points then I could see taking a small fix for that.

This line, which causes the String.fromCharCode below to convert incorrectly (as the method only considers the 16 least significant bits).
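
To illustrate the truncation mentioned above, here is a small sketch using only the standard `String.fromCharCode` and `String.fromCodePoint` methods. The code point U+1D454 is an arbitrary non-BMP example chosen for illustration and does not come from the PR.

```typescript
// An arbitrary non-BMP code point (U+1D454), chosen only for illustration.
const codePoint = 0x1d454;

// fromCharCode keeps only the low 16 bits: 0x1D454 & 0xFFFF === 0xD454,
// so the result is a single, unrelated BMP character.
const truncated = String.fromCharCode(codePoint);

// fromCodePoint encodes the full value as a surrogate pair (two code units).
const full = String.fromCodePoint(codePoint);

console.log(truncated.charCodeAt(0).toString(16)); // d454
console.log(full.codePointAt(0)!.toString(16));    // 1d454
```

This is why a value read as a full code point has to be converted back with `String.fromCodePoint`, or the resulting text no longer matches the source character.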

@rbuckton
Member

> > What change do you mean? If it's not related to code points then I could see taking a small fix for that.
>
> This line, which causes the String.fromCharCode below to convert incorrectly (as the method only considers the 16 least significant bits).

I've just put up a minimal fix for that line in #58727.

Use `codePointChecked` instead of `charCodeChecked` in `reScanSlashToken`
@rbuckton added this to the TypeScript 5.6.0 milestone on Jun 4, 2024
@rbuckton merged commit dc1ffb1 into microsoft:main on Jun 4, 2024
28 checks passed