perf(lexer): store escaped identifiers in a Vec #16283
lilnasy wants to merge 7 commits into oxc-project:main
CodSpeed Performance Report
Merging #16283 will improve performance by 3.22%.
Pull request overview
This PR optimizes memory allocation in the lexer by replacing HashMaps with Vecs for storing escaped identifiers and template strings. The Token structure is updated to store a 1-based index (32 bits) instead of a boolean flag (8 bits), allowing direct indexing into the Vec storage while using 0 to indicate no escapes.
- Changes escaped string/template storage from `FxHashMap<u32, T>` to `Vec<T>` for better memory efficiency
- Replaces boolean `escaped` flag in Token with `escape_index` (u32) field
- Updates bit layout in Token to accommodate the new 32-bit field
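
To make the 1-based indexing scheme concrete, here is a minimal sketch in Rust. It is not the code from this PR: `EscapedStrings`, `push`, and `get` are hypothetical names chosen for illustration, and the real lexer stores arena-allocated values rather than plain `&str` slices.

```rust
/// Hypothetical store for unescaped string values, kept in a plain `Vec`
/// instead of a hash map keyed by span start. Names are illustrative only.
#[derive(Default)]
struct EscapedStrings<'a> {
    values: Vec<&'a str>,
}

impl<'a> EscapedStrings<'a> {
    /// Saves an unescaped value and returns its 1-based index.
    /// `0` is never returned, so it stays free to mean "no escapes".
    fn push(&mut self, value: &'a str) -> u32 {
        self.values.push(value);
        // After the push, `len()` equals `index + 1` of the new element.
        let len = self.values.len();
        assert!(len <= u32::MAX as usize, "too many escaped strings");
        len as u32
    }

    /// Looks up a value by the 1-based index stored in a token.
    /// Returns `None` for `0` (no escapes) or an out-of-range index.
    fn get(&self, escape_index: u32) -> Option<&'a str> {
        // `checked_sub` turns the `0` sentinel into `None` without underflow.
        let index = escape_index.checked_sub(1)?;
        self.values.get(index as usize).copied()
    }
}
```

The appeal of this shape is that a lookup is a bounds-checked array access rather than a hash computation, and the token itself carries everything needed to find its value.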
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| tasks/track_memory_allocations/allocs_parser.snap | Documents performance improvements showing reduced system allocations |
| crates/oxc_parser/src/lexer/token.rs | Replaces escaped boolean with escape_index u32 field, updates bit layout and all related getters/setters/tests |
| crates/oxc_parser/src/lexer/mod.rs | Changes storage from HashMap to Vec, removes rustc_hash dependency |
| crates/oxc_parser/src/lexer/string.rs | Updates to use Vec with 1-based indexing for escaped string storage |
| crates/oxc_parser/src/lexer/template.rs | Updates to use Vec with 1-based indexing for escaped template storage |
| crates/oxc_parser/src/cursor.rs | Updates API call to pass Token instead of span start |
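
As a rough illustration of what "updates bit layout" can involve, the sketch below packs a 32-bit `escape_index` into a 128-bit token word. The shift values and the `start` field position are assumptions made for this example and do not reflect the actual layout in `token.rs`; only the names `Token` and `escape_index` come from the PR description.

```rust
/// Hypothetical packed token. The field positions are illustrative
/// assumptions, not the layout used in `token.rs`.
#[derive(Clone, Copy, Default)]
struct Token(u128);

const START_SHIFT: u32 = 0; // assumed position of the span start
const ESCAPE_INDEX_SHIFT: u32 = 72; // assumed position of `escape_index`
const U32_MASK: u128 = 0xFFFF_FFFF;

impl Token {
    fn start(self) -> u32 {
        ((self.0 >> START_SHIFT) & U32_MASK) as u32
    }

    fn set_start(&mut self, start: u32) {
        self.0 &= !(U32_MASK << START_SHIFT);
        self.0 |= u128::from(start) << START_SHIFT;
    }

    /// Returns `0` if the token has no escape sequences,
    /// otherwise a 1-based index into the escaped-values vector.
    fn escape_index(self) -> u32 {
        ((self.0 >> ESCAPE_INDEX_SHIFT) & U32_MASK) as u32
    }

    fn set_escape_index(&mut self, escape_index: u32) {
        // Clear the old 32-bit field, then write the new value into it,
        // leaving every other field untouched.
        self.0 &= !(U32_MASK << ESCAPE_INDEX_SHIFT);
        self.0 |= u128::from(escape_index) << ESCAPE_INDEX_SHIFT;
    }

    /// Replaces the old boolean flag: a non-zero index means "escaped".
    fn has_escape(self) -> bool {
        self.escape_index() != 0
    }
}
```

In a layout like this the token stays 16 bytes, but the wider field consumes bits that were previously spare, which is the trade-off discussed later in the thread.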
/// Returns `index + 1` (to allow for `0` to mean no escapes)
/// of the escaped string otherwise.
The documentation is misleading. It should clarify that this returns a 1-based index into the escaped strings/templates vectors. Consider rewording to:
/// Returns `0` if token has no escape sequences.
/// Returns a 1-based index into the escaped strings/templates vector otherwise.
This makes it clearer that the returned value is directly the index to use (after subtracting 1), not index + 1.
Suggested change:
- /// Returns `index + 1` (to allow for `0` to mean no escapes)
- /// of the escaped string otherwise.
+ /// Returns a 1-based index into the escaped strings/templates vector otherwise.
Thanks for doing this.
The odd thing is that while there's a large improvement on 1 of the lexer benchmarks, the parser benchmarks (which are what matter) actually regress a tiny bit (sub-1% change). It's not the boost I'd hoped for. The mysteries of the compiler 🤷...
But we're within the noise threshold, so it may not indicate very much. It'll be interesting to see if the benchmarks stay the same when you push again and they re-run.
Just to let you know, the reason I've stalled on this is (a) unfortunately it's not showing the perf improvement in the parser that I hoped for, and (b) we may need those 4 spare bytes in …
In any case, the original motivation for this was to enable unescaping identifiers in tokens in the linter (#16207), but it turns out TS-ESLint doesn't do that, so maybe we don't need to either.
I thought that might be the case. My original motivation was to find a good onboard to lexer internals, and I've gotten it!
Closes #16208.