fix(regex): LineContinuation produces empty code points sequence#13458

Closed
Sysix wants to merge 1 commit into main from
08-31-fix_regex_linecontinuation_produces_the_empty_code_points_sequence

Member

@Sysix Sysix commented Aug 30, 2025

https://tc39.es/ecma262/#sec-literals-string-literals

Note 2
<LF> and <CR> cannot appear in a string literal, except as part of a LineContinuation to produce the empty code points sequence. The proper way to include either in the String value of a string literal is to use an escape sequence such as \n or \u000A.

This implementation skips the code point.
Maybe return Option<Option<u32>> for parse_string_character?
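
A minimal sketch of the `Option<Option<u32>>` idea, assuming a hypothetical `Reader` type and helper names that are not taken from the actual parser. The outer `Option` says whether anything was consumed, and the inner one says whether the consumed input produced a code point. Escape sequences other than LineContinuation are deliberately left unhandled here.

```rust
/// Hypothetical reader over the characters of a string literal body.
/// This is an illustration only, not the parser's real type.
struct Reader {
    chars: Vec<char>,
    pos: usize,
}

impl Reader {
    fn new(src: &str) -> Self {
        Self { chars: src.chars().collect(), pos: 0 }
    }

    fn peek(&self, offset: usize) -> Option<char> {
        self.chars.get(self.pos + offset).copied()
    }

    /// `None`: end of input, nothing consumed.
    /// `Some(None)`: a LineContinuation was consumed; it yields no code point.
    /// `Some(Some(cp))`: one code point was consumed and produced.
    fn parse_string_character(&mut self) -> Option<Option<u32>> {
        let c = self.peek(0)?;
        if c == '\\' {
            // LineContinuation :: \ LineTerminatorSequence
            let terminator_len = match (self.peek(1), self.peek(2)) {
                (Some('\r'), Some('\n')) => 2,
                (Some('\n' | '\r' | '\u{2028}' | '\u{2029}'), _) => 1,
                _ => 0,
            };
            if terminator_len > 0 {
                self.pos += 1 + terminator_len;
                return Some(None);
            }
        }
        // Simplification: every other character, including a lone backslash,
        // is returned as-is.
        self.pos += 1;
        Some(Some(c as u32))
    }
}

/// Collects the produced code points, skipping empty sequences.
fn collect_code_points(src: &str) -> Vec<u32> {
    let mut reader = Reader::new(src);
    let mut out = Vec::new();
    while let Some(item) = reader.parse_string_character() {
        if let Some(cp) = item {
            out.push(cp);
        }
    }
    out
}
```

With this shape, a caller can tell "end of input" apart from "consumed a LineContinuation", which a plain `Option<u32>` cannot express.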

@github-actions github-actions bot added the C-bug Category - Bug label Aug 30, 2025

codspeed-hq bot commented Aug 30, 2025

CodSpeed Instrumentation Performance Report

Merging #13458 will not alter performance

Comparing 08-31-fix_regex_linecontinuation_produces_the_empty_code_points_sequence (1698435) with main (5b139aa)

Summary

✅ 37 untouched benchmarks

@Sysix Sysix changed the title fix(regex): LineContinuation produces the empty code points sequence fix(regex): LineContinuation produces empty code points sequence Aug 30, 2025
@Sysix Sysix force-pushed the 08-31-fix_regex_linecontinuation_produces_the_empty_code_points_sequence branch from c26f50a to 096f603 Compare August 31, 2025 00:16
@Sysix Sysix marked this pull request as ready for review August 31, 2025 00:19
Copilot AI review requested due to automatic review settings August 31, 2025 00:19
@Sysix Sysix requested a review from leaysgur as a code owner August 31, 2025 00:19
Contributor

Copilot AI left a comment

Pull Request Overview

This PR fixes a bug in the regular expression parser where LineContinuation sequences (backslash followed by line terminators) were incorrectly producing code points instead of empty sequences as required by the ECMAScript specification.

  • Changed parse_line_terminator_sequence to return bool instead of Option<u32> to indicate detection without producing code points
  • Modified parse_string_character to recursively continue parsing when a LineContinuation is detected
  • Added detailed documentation explaining the ECMAScript specification requirement
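
The approach summarized in the list above can be sketched roughly as follows. This is an assumption-laden illustration: the `Cursor` type and the `eat` and `code_points` helpers are hypothetical, and only the two function names come from the overview. The line-terminator check reports a `bool`, and the character parser calls itself again after a LineContinuation.

```rust
/// Hypothetical cursor over the remaining input; not the parser's real type.
struct Cursor<'a> {
    rest: &'a str,
}

impl<'a> Cursor<'a> {
    /// Consumes `prefix` if the remaining input starts with it.
    fn eat(&mut self, prefix: &str) -> bool {
        let rest: &'a str = self.rest;
        if let Some(tail) = rest.strip_prefix(prefix) {
            self.rest = tail;
            true
        } else {
            false
        }
    }

    /// Reports whether a LineTerminatorSequence was consumed.
    /// It produces no code point. <CR><LF> is tried first so the pair
    /// is consumed as a single sequence.
    fn parse_line_terminator_sequence(&mut self) -> bool {
        self.eat("\r\n")
            || self.eat("\n")
            || self.eat("\r")
            || self.eat("\u{2028}")
            || self.eat("\u{2029}")
    }

    /// Returns the next code point, skipping any LineContinuation.
    fn parse_string_character(&mut self) -> Option<u32> {
        let saved: &'a str = self.rest;
        if self.eat("\\") {
            if self.parse_line_terminator_sequence() {
                // LineContinuation: empty sequence, so parse what follows.
                return self.parse_string_character();
            }
            // Not a LineContinuation: rewind and treat the backslash normally.
            self.rest = saved;
        }
        let c = saved.chars().next()?;
        self.rest = &saved[c.len_utf8()..];
        Some(c as u32)
    }
}

/// Drives the cursor until the input is exhausted.
fn code_points(src: &str) -> Vec<u32> {
    let mut cursor = Cursor { rest: src };
    std::iter::from_fn(|| cursor.parse_string_character()).collect()
}
```

One consequence of this design is that a LineContinuation at the very end of the input returns `None`, so the caller cannot distinguish it from plain end of input.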

@Sysix Sysix force-pushed the 08-31-fix_regex_linecontinuation_produces_the_empty_code_points_sequence branch from 096f603 to 1698435 Compare August 31, 2025 00:37
leaysgur

This comment was marked as outdated.

Member

@leaysgur leaysgur left a comment

<LF> and <CR> cannot appear in a string literal, except as part of a LineContinuation to produce the empty code points sequence.

Currently, I think we can properly handle <CR> and <LF> when parsing LineContinuation. Simply skipping all of them doesn't seem appropriate. (And <PS> and <LS> should not be skipped)

The issue might be that the current AST only represents CodePoint, which makes it impossible to determine whether they originated from a LineContinuation.

So... what was the original purpose of this PR?
Was it inconvenient in some specific use case?

Member Author

Sysix commented Sep 1, 2025

> So... what was the original purpose of this PR?

I found it when working on #13365.
Copilot found it while trying to fix this failing test case:

// (
// r#"new RegExp('[ \\u\\\r\nfe0f]')"#, // line continuation: backslash + <CR> + <LF>
// None
// ),

But I guess this is not the root of my problem :)

> The issue might be that the current AST only represents CodePoint, which makes it impossible to determine whether they originated from a LineContinuation.

This will need then more refactoring. :/
I guess the empty code point sequence does not apply to TemplateLiteral.

Note
TV excludes the code units of LineContinuation while TRV includes them. <CR><LF> and <CR> LineTerminatorSequences are normalized to <LF> for both TV and TRV. An explicit TemplateEscapeSequence is needed to include a <CR> or <CR><LF> sequence.
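
To make the TV/TRV distinction in the note concrete, here is a rough sketch. The function names are hypothetical, and the sketch only models the two behaviours the note describes (line-terminator normalization and dropping LineContinuation). All other escape sequences are left undecoded, so this is not a full TV computation.

```rust
/// Normalizes <CR><LF> and lone <CR> to <LF>, as the note describes
/// for both TV and TRV.
fn normalize_line_terminators(src: &str) -> String {
    src.replace("\r\n", "\n").replace('\r', "\n")
}

/// TRV-like value: LineContinuation is kept, line terminators are normalized.
fn template_raw_value(src: &str) -> String {
    normalize_line_terminators(src)
}

/// TV-like value: LineContinuation is dropped, line terminators are normalized.
/// Assumption: other escape sequences are copied through undecoded.
fn template_value_without_continuations(src: &str) -> String {
    let normalized = normalize_line_terminators(src);
    let mut out = String::new();
    let mut chars = normalized.chars();
    while let Some(c) = chars.next() {
        if c != '\\' {
            out.push(c);
            continue;
        }
        match chars.next() {
            // Backslash + line terminator: a LineContinuation, contributes nothing.
            Some('\n' | '\u{2028}' | '\u{2029}') => {}
            // Any other escape: keep both characters untouched in this sketch.
            Some(next) => {
                out.push('\\');
                out.push(next);
            }
            // Trailing backslash: keep it.
            None => out.push('\\'),
        }
    }
    out
}
```

Scanning pairwise, instead of doing a plain string replace, keeps an escaped backslash followed by a real line break from being mistaken for a LineContinuation.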

Member

leaysgur commented Sep 3, 2025

> This will need then more refactoring. :/

Yes...🫠

Do you mind just leaving this as-is for now?

https://github.com/eslint/eslint/blob/a355a0e5b2e6a47cda099b31dc7d112cfb5c4315/tests/lib/rules/no-misleading-character-class.js#L2027C10-L2032C56

I've never seen usage like this, so I want to believe it won't cause any problems.

@Sysix Sysix closed this Sep 5, 2025
@Sysix Sysix deleted the 08-31-fix_regex_linecontinuation_produces_the_empty_code_points_sequence branch September 5, 2025 12:54