Skip to content

Comments

perf(lexer): store escaped identifiers in a Vec#16283

Closed
lilnasy wants to merge 7 commits intooxc-project:mainfrom
lilnasy:escaped-strings-vec
Closed

perf(lexer): store escaped identifiers in a Vec#16283
lilnasy wants to merge 7 commits intooxc-project:mainfrom
lilnasy:escaped-strings-vec

Conversation

@lilnasy
Copy link
Contributor

@lilnasy lilnasy commented Nov 29, 2025

Closes #16208.

@github-actions github-actions bot added A-parser Area - Parser C-performance Category - Solution not expected to change functional behavior, only performance labels Nov 29, 2025
@codspeed-hq
Copy link

codspeed-hq bot commented Nov 29, 2025

CodSpeed Performance Report

Merging #16283 will improve performances by 3.22%

Comparing lilnasy:escaped-strings-vec (0373f16) with main (43a6c32)

Summary

⚡ 1 improvement
✅ 41 untouched
⏩ 3 skipped1

Benchmarks breakdown

Mode Benchmark BASE HEAD Change
Simulation lexer[RadixUIAdoptionSection.jsx] 21.2 µs 20.5 µs +3.22%

Footnotes

  1. 3 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

Copy link
Member

@overlookmotel overlookmotel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work. A few comments below.

Run just allocs to fix the CI fail - this alters the number of allocations made in parser, I guess.

@lilnasy lilnasy force-pushed the escaped-strings-vec branch 2 times, most recently from e8c3e92 to 1fe4398 Compare November 30, 2025 09:15
@lilnasy lilnasy marked this pull request as ready for review November 30, 2025 10:05
Copilot AI review requested due to automatic review settings November 30, 2025 10:05
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR optimizes memory allocation in the lexer by replacing HashMaps with Vecs for storing escaped identifiers and template strings. The Token structure is updated to store a 1-based index (32 bits) instead of a boolean flag (8 bits), allowing direct indexing into the Vec storage while using 0 to indicate no escapes.

  • Changes escaped string/template storage from FxHashMap<u32, T> to Vec<T> for better memory efficiency
  • Replaces boolean escaped flag in Token with escape_index (u32) field
  • Updates bit layout in Token to accommodate the new 32-bit field

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
tasks/track_memory_allocations/allocs_parser.snap Documents performance improvements showing reduced system allocations
crates/oxc_parser/src/lexer/token.rs Replaces escaped boolean with escape_index u32 field, updates bit layout and all related getters/setters/tests
crates/oxc_parser/src/lexer/mod.rs Changes storage from HashMap to Vec, removes rustc_hash dependency
crates/oxc_parser/src/lexer/string.rs Updates to use Vec with 1-based indexing for escaped string storage
crates/oxc_parser/src/lexer/template.rs Updates to use Vec with 1-based indexing for escaped template storage
crates/oxc_parser/src/cursor.rs Updates API call to pass Token instead of span start

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +159 to +160
/// Returns `index + 1` (to allow for `0` to mean no escapes)
/// of the escaped string otherwise.
Copy link

Copilot AI Nov 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The documentation is misleading. It should clarify that this returns a 1-based index into the escaped strings/templates vectors. Consider rewording to:

/// Returns `0` if token has no escape sequences.
/// Returns a 1-based index into the escaped strings/templates vector otherwise.

This makes it clearer that the returned value is directly the index to use (after subtracting 1), not index + 1.

Suggested change
/// Returns `index + 1` (to allow for `0` to mean no escapes)
/// of the escaped string otherwise.
/// Returns a 1-based index into the escaped strings/templates vector otherwise.

Copilot uses AI. Check for mistakes.
@lilnasy lilnasy force-pushed the escaped-strings-vec branch 2 times, most recently from 687b124 to 857b992 Compare November 30, 2025 14:48
Copy link
Member

@overlookmotel overlookmotel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for doing this.

The odd thing is that while there's a large improvement on 1 of the lexer benchmarks, the parser benchmarks (which is what matters) actually regress a tiny bit (sub-1% change). It's not the boost I'd hoped for. The mysteries of the compiler 🤷...

But we're within the noise threshold, so it could not indicate very much. It'll be interesting to see if the benchmarks stay the same when you push again, and they re-run.

@overlookmotel
Copy link
Member

overlookmotel commented Dec 3, 2025

Just to let you know, reason I've stalled on this is (a) unfortunately it's not showing the perf improvement in parser that I hoped for, and (b) we may need those 4 spare bytes in Token for something else - oxc-project/backlog#193 (comment) - but that's not clear yet.

In any case, the original motivation for this was to enable unescaping identifiers in tokens in linter (#16207), but it turns out TS-ESLint doesn't do that, so maybe we don't need to either.

@lilnasy
Copy link
Contributor Author

lilnasy commented Dec 3, 2025

I thought that might be the case. My original motivation was to find a good onboard to lexer internals, and I've gotten it!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

A-parser Area - Parser C-performance Category - Solution not expected to change functional behavior, only performance

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Lexer: Store unescaped strings in Vecs instead of HashMaps

2 participants