fix(regex): LineContinuation produces empty code points sequence#13458
fix(regex): LineContinuation produces empty code points sequence#13458
Conversation
How to use the Graphite Merge QueueAdd either label to this PR to merge it via the merge queue:
You must have a Graphite account in order to use the merge queue. Sign up using this link. An organization admin has enabled the Graphite Merge Queue in this repository. Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue. This stack of pull requests is managed by Graphite. Learn more about stacking. |
CodSpeed Instrumentation Performance ReportMerging #13458 will not alter performanceComparing Summary
|
c26f50a to
096f603
Compare
There was a problem hiding this comment.
Pull Request Overview
This PR fixes a bug in the regular expression parser where LineContinuation sequences (backslash followed by line terminators) were incorrectly producing code points instead of empty sequences as required by the ECMAScript specification.
- Changed
parse_line_terminator_sequenceto returnboolinstead ofOption<u32>to indicate detection without producing code points - Modified
parse_string_characterto recursively continue parsing when a LineContinuation is detected - Added detailed documentation explaining the ECMAScript specification requirement
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
crates/oxc_regular_expression/src/parser/reader/string_literal_parser/parser_impl.rs
Show resolved
Hide resolved
crates/oxc_regular_expression/src/parser/reader/string_literal_parser/parser_impl.rs
Outdated
Show resolved
Hide resolved
096f603 to
1698435
Compare
This comment was marked as outdated.
This comment was marked as outdated.
There was a problem hiding this comment.
<LF>and<CR>cannot appear in a string literal, except as part of a LineContinuation to produce the empty code points sequence.
Currently, I think we can properly handle <CR> and <LF> when parsing LineContinuation. Simply skipping all of them doesn't seem appropriate. (And <PS> and <LS> should not be skipped)
The issue might be that the current AST only represents CodePoint, which makes it impossible to determine whether they originated from a LineContinuation.
So... what was the original purpose of this PR?
Was it inconvenient in some specific use case?
I found it when working on #13365. oxc/crates/oxc_linter/src/rules/eslint/no_misleading_character_class.rs Lines 582 to 585 in 9ca9dc5 But I guess this is not the root of my problem :)
This will need then more refactoring. :/
|
Yes...🫠 Do you mind just leaving this as-is for now? I've never seen usage like this, so I want to believe it won't cause any problems. |

https://tc39.es/ecma262/#sec-literals-string-literals
This implementation skips the code point.
Maybe return
Option<Option<u32>>forparse_string_character?