
fix(js_lexer): regex escape char #2044

Merged
merged 6 commits into from
Mar 11, 2024


Sec-ant
Member

@Sec-ant Sec-ant commented Mar 11, 2024

Summary

After the lexer reads an escape character \ in a regular expression, it should advance past the whole escaped character (one byte for ASCII, or the full multi-byte UTF-8 sequence otherwise), so that valid regexes like /\’/ no longer fail.

Fixes #1941.
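To make the fix concrete, here is a simplified, standalone sketch of scanning a regex literal that steps over an escaped character by its full UTF-8 length rather than by a fixed single byte. It is not Biome's lexer code: `regex_literal_len` is a hypothetical helper, and it deliberately ignores flags, line terminators, and other details a real JavaScript lexer must handle.

```rust
/// Hypothetical helper, not part of Biome: given text starting with `/`,
/// return the byte length of the regex literal up to and including its
/// closing `/`, or `None` if it is unterminated. Flags and line terminators
/// are deliberately ignored to keep the sketch short.
fn regex_literal_len(src: &str) -> Option<usize> {
    let bytes = src.as_bytes();
    if bytes.first() != Some(&b'/') {
        return None;
    }
    let mut pos = 1; // skip the opening slash
    let mut in_class = false; // a `/` inside `[...]` does not close the regex
    while pos < bytes.len() {
        match bytes[pos] {
            b'\\' => {
                pos += 1; // consume the backslash itself
                // Advance past the whole escaped character. For `’` (U+2019)
                // that is 3 bytes, so a fixed 1-byte step would land mid-char.
                let escaped = src[pos..].chars().next()?;
                pos += escaped.len_utf8();
            }
            b'[' => {
                in_class = true;
                pos += 1;
            }
            b']' => {
                in_class = false;
                pos += 1;
            }
            b'/' if !in_class => return Some(pos + 1),
            _ => {
                // Step over one full character so `pos` stays on a char boundary.
                let c = src[pos..].chars().next()?;
                pos += c.len_utf8();
            }
        }
    }
    None
}
```

With this approach, `regex_literal_len("/\\’/")` returns `Some(6)`, which matches the 6-byte JS_REGEX_LITERAL token expected in the test proposed later in this thread.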

Test Plan

A test case is added in crates/biome_js_parser/test_data/inline/ok/literals.js

@github-actions github-actions bot added A-Parser Area: parser L-JavaScript Language: JavaScript and super languages labels Mar 11, 2024

netlify bot commented Mar 11, 2024

Deploy Preview for biomejs ready!

Name Link
🔨 Latest commit 8964131
🔍 Latest deploy log https://app.netlify.com/sites/biomejs/deploys/65ef1ba04cd60b0008585601
😎 Deploy Preview https://deploy-preview-2044--biomejs.netlify.app
Lighthouse
1 paths audited
Performance: 100 (no change from production)
Accessibility: 97 (no change from production)
Best Practices: 100 (no change from production)
SEO: 93 (no change from production)
PWA: -
View the detailed breakdown and full score reports


@Sec-ant Sec-ant changed the title fix(lexer): regex escape char fix(js_lexer): regex escape char Mar 11, 2024
@arendjr
Contributor

arendjr commented Mar 11, 2024

Nice! I would appreciate it if we could add a test case for this, but the fix makes sense 👍


codspeed-hq bot commented Mar 11, 2024

CodSpeed Performance Report

Merging #2044 will degrade performance by 18.31%

Comparing Sec-ant:fix-regex-lexer (8964131) with main (57fa936)

Summary

⚡ 1 improvements
❌ 2 regressions
✅ 90 untouched benchmarks

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

Benchmark | main | Sec-ant:fix-regex-lexer | Change
--- | --- | --- | ---
router.ts[cached] | 10.8 ms | 11.6 ms | -6.72%
big5-added.json[cached] | 2.2 ms | 2.7 ms | -18.31%
db.json[cached] | 77.2 ms | 71.8 ms | +7.63%

@arendjr
Contributor

arendjr commented Mar 11, 2024

@Conaclos
Member

Conaclos commented Mar 11, 2024

Could you add a changelog entry in CHANGELOG.md? I just committed the Unreleased section on main. You could rebase your PR on main :)

EDIT: you can find some guidelines in our contributing guide.

@Sec-ant
Member Author

Sec-ant commented Mar 11, 2024

> Lexer test cases can be added here, btw: https://github.com/biomejs/biome/blob/main/crates/biome_js_parser/src/lexer/tests.rs

Yes, I was hoping to add a test there, something like this:

#[test]
fn issue_1941() {
    assert_lex! {
        r#"/\’/"#,
        JS_REGEX_LITERAL:6
    }
}

But the lexer needs some context to read this as a regex; otherwise it produces SLASH:1 ERROR:1 ERROR:3 SLASH:1. How should I give it the proper context?

@arendjr
Contributor

arendjr commented Mar 11, 2024

I think r#"let a = /\’/"# should do the trick.

@Sec-ant
Member Author

Sec-ant commented Mar 11, 2024

> I think r#"let a = /\’/"# should do the trick.

I still get SLASH and ERROR_TOKEN tokens. Did I do something wrong?

[
    (LET_KW, 0..3),
    (WHITESPACE, 3..4),
    (IDENT, 4..5),
    (WHITESPACE, 5..6),
    (EQ, 6..7),
    (WHITESPACE, 7..8),
    (SLASH, 8..9),
    (ERROR_TOKEN, 9..10),
    (ERROR_TOKEN, 10..13),
    (SLASH, 13..14),
]
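The byte ranges in this dump follow from UTF-8 encoding: `’` (U+2019) takes 3 bytes, which is why the second ERROR_TOKEN spans 10..13, and the literal `/\’/` is 6 bytes in total, matching the JS_REGEX_LITERAL:6 expectation in the proposed test. The small sketch below (hypothetical helper names, not Biome code) contrasts stepping over an escaped character by a fixed single byte with stepping by its UTF-8 length; only the latter ends on a character boundary.

```rust
/// Hypothetical: byte offset reached by skipping a fixed single byte after
/// the backslash at `backslash_pos` (breaks on multi-byte characters).
fn skip_one_byte(backslash_pos: usize) -> usize {
    backslash_pos + 2 // backslash (1 byte) + 1 byte of the escaped char
}

/// Hypothetical: byte offset reached by skipping the whole escaped character,
/// using its UTF-8 length. Returns `None` if nothing follows the backslash.
fn skip_one_char(src: &str, backslash_pos: usize) -> Option<usize> {
    let after = backslash_pos + 1;
    let escaped = src.get(after..)?.chars().next()?;
    Some(after + escaped.len_utf8())
}
```

For `/\’/` the backslash sits at offset 1: the fixed step stops at offset 3, inside `’` (bytes 2..5), while the char-length step stops at offset 5, right before the closing slash.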

@arendjr
Contributor

arendjr commented Mar 11, 2024

No, it's not your fault. It looks like the lexer cannot detect regexes on its own; it only lexes them after the parser tells it to "re-lex" certain nodes. So instead, I think you should add your test case here: https://github.com/biomejs/biome/blob/main/crates/biome_js_parser/test_data/inline/ok/literals.js
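The context problem described here can be illustrated with a toy model: on its own, the lexer treats `/` as a SLASH token, and only when the parser knows a regex may start at that position does it ask the lexer to rewind and re-lex the text as one regex token. This is a hypothetical sketch (`ToyLexer`, `next_token`, and `relex_as_regex` are invented names), not Biome's actual re-lexing API.

```rust
#[derive(Debug, PartialEq)]
enum Kind {
    Slash,
    Regex,
    Other,
}

struct Token {
    kind: Kind,
    start: usize,
    end: usize,
}

/// Hypothetical toy lexer; it has no idea whether `/` starts a regex.
struct ToyLexer<'a> {
    src: &'a str,
    pos: usize,
}

impl<'a> ToyLexer<'a> {
    fn new(src: &'a str) -> Self {
        ToyLexer { src, pos: 0 }
    }

    /// Context-free lexing: every `/` becomes Slash, every other char Other.
    fn next_token(&mut self) -> Option<Token> {
        let c = self.src[self.pos..].chars().next()?;
        let start = self.pos;
        self.pos += c.len_utf8();
        let kind = if c == '/' { Kind::Slash } else { Kind::Other };
        Some(Token { kind, start, end: self.pos })
    }

    /// Called by an (imaginary) parser once it knows a regex may start at
    /// `start`: rewind and scan up to the closing `/` as one Regex token.
    fn relex_as_regex(&mut self, start: usize) -> Option<Token> {
        let mut chars = self.src[start..].char_indices();
        if chars.next()?.1 != '/' {
            return None;
        }
        while let Some((i, c)) = chars.next() {
            match c {
                // Skip the escaped char; char_indices steps over all its bytes.
                '\\' => {
                    chars.next()?;
                }
                '/' => {
                    self.pos = start + i + 1;
                    return Some(Token { kind: Kind::Regex, start, end: self.pos });
                }
                _ => {}
            }
        }
        None
    }
}
```

Without the parser's call to `relex_as_regex`, the input stays a Slash followed by other tokens, roughly what the standalone lexer test produced; after the call, `/\’/` becomes a single Regex token spanning bytes 0..6.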

@Sec-ant
Member Author

Sec-ant commented Mar 11, 2024

@arendjr Yes here it works, thanks!

@Sec-ant
Member Author

Sec-ant commented Mar 11, 2024

> Could you add a changelog entry in CHANGELOG.md? I just committed the Unreleased section on main. You could rebase your PR on main :)
>
> EDIT: you can find some guidelines in our contributing guide.

@Conaclos Changelog entry added. Should I add changelogs for other PRs (post v1.60) as well? Some are already merged.

@github-actions github-actions bot added A-Formatter Area: formatter A-Website Area: website L-CSS Language: CSS L-JSON Language: JSON and super languages labels Mar 11, 2024
@github-actions github-actions bot removed A-Formatter Area: formatter A-Website Area: website L-CSS Language: CSS labels Mar 11, 2024
Contributor

@arendjr arendjr left a comment


Looks good!

@arendjr
Contributor

arendjr commented Mar 11, 2024

> Should I add changelogs for other PRs (post v1.60) as well? Some are already merged.

Might be better to do that in a separate PR, I think.

@arendjr
Contributor

arendjr commented Mar 11, 2024

Could you run the codegen one more time to update the changelog for the website?

@github-actions github-actions bot added A-Formatter Area: formatter A-Website Area: website L-CSS Language: CSS labels Mar 11, 2024
@Sec-ant
Member Author

Sec-ant commented Mar 11, 2024

There are over 100 files modified after running `just gen`. One is the changelog entry I added, but the others seem to come from changes on main that I picked up when I rebased. I split them into two commits.

@arendjr
Contributor

arendjr commented Mar 11, 2024

Thanks, I guess something was pushed to main without running the codegen. Still waiting for the CI, but I think it will pass 🤞

@arendjr arendjr merged commit 5223bc8 into biomejs:main Mar 11, 2024
19 of 20 checks passed
@arendjr
Contributor

arendjr commented Mar 11, 2024

Thanks for the patch!

@Sec-ant Sec-ant deleted the fix-regex-lexer branch March 11, 2024 15:33
Labels
A-Changelog Area: changelog A-Formatter Area: formatter A-Parser Area: parser A-Website Area: website L-CSS Language: CSS L-JavaScript Language: JavaScript and super languages L-JSON Language: JSON and super languages
Projects
None yet
Development

Successfully merging this pull request may close these issues.

🐛 Including smart-quote with escape \ confuses the JS lexer
3 participants