perf(lexer): store escaped identifiers in a `Vec` by lilnasy · Pull Request #16283 · oxc-project/oxc

lilnasy · 2025-11-29T20:47:44Z

Closes #16208.

codspeed-hq · 2025-11-29T20:53:30Z

CodSpeed Performance Report

Merging #16283 will improve performance by 3.22%

Comparing lilnasy:escaped-strings-vec (0373f16) with main (43a6c32)

Summary

⚡ 1 improvement
✅ 41 untouched
⏩ 3 skipped¹

Benchmarks breakdown

| | Mode | Benchmark | `BASE` | `HEAD` | Change |
|---|---|---|---|---|---|
| ⚡ | Simulation | `lexer[RadixUIAdoptionSection.jsx]` | 21.2 µs | 20.5 µs | +3.22% |

¹ 3 benchmarks were skipped, so the baseline results were used instead.

overlookmotel

Nice work. A few comments below.

Run `just allocs` to fix the CI failure. I guess this changes the number of allocations made in the parser.

crates/oxc_parser/src/lexer/string.rs

crates/oxc_parser/src/lexer/token.rs

Copilot

Pull request overview

This PR optimizes memory allocation in the lexer by replacing HashMaps with Vecs for storing escaped identifiers and template strings. The Token structure is updated to store a 1-based index (32 bits) instead of a boolean flag (8 bits), allowing direct indexing into the Vec storage while using 0 to indicate no escapes.

- Changes escaped string/template storage from `FxHashMap<u32, T>` to `Vec<T>` for better memory efficiency
- Replaces the boolean `escaped` flag in `Token` with an `escape_index` (`u32`) field
- Updates the bit layout in `Token` to accommodate the new 32-bit field
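The storage change described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from this PR: `EscapedStrings`, `push`, and `get` are hypothetical names, and the real lexer stores arena-allocated strings rather than plain `&str` slices. The sketch only shows how a 1-based index lets `0` stand for "no escapes" while still indexing directly into a `Vec`.

```rust
/// Hypothetical stand-in for the lexer's side storage of unescaped strings.
/// The names are illustrative and are not the actual oxc API.
#[derive(Default)]
pub struct EscapedStrings<'a> {
    /// Unescaped values, in the order the lexer encountered them.
    values: Vec<&'a str>,
}

impl<'a> EscapedStrings<'a> {
    /// Stores an unescaped value and returns its 1-based index,
    /// so that `0` stays free to mean "token has no escapes".
    pub fn push(&mut self, value: &'a str) -> u32 {
        self.values.push(value);
        // After the push, `len()` is exactly the 1-based index of the new entry.
        let len = self.values.len();
        assert!(len <= u32::MAX as usize, "too many escaped strings");
        len as u32
    }

    /// Looks up a value by the 1-based index stored on a token.
    /// Returns `None` for `0` (no escapes) or for an out-of-range index.
    pub fn get(&self, escape_index: u32) -> Option<&'a str> {
        let zero_based = (escape_index as usize).checked_sub(1)?;
        self.values.get(zero_based).copied()
    }
}
```

Compared with a hash map keyed by span start, a lookup here is a subtraction and a bounds check, with no hashing involved. Whether that shows up in end-to-end parser benchmarks is a separate question, as the discussion in this thread makes clear.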

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.


| File | Description |
|---|---|
| tasks/track_memory_allocations/allocs_parser.snap | Documents performance improvements showing reduced system allocations |
| crates/oxc_parser/src/lexer/token.rs | Replaces `escaped` boolean with `escape_index` u32 field, updates bit layout and all related getters/setters/tests |
| crates/oxc_parser/src/lexer/mod.rs | Changes storage from HashMap to Vec, removes rustc_hash dependency |
| crates/oxc_parser/src/lexer/string.rs | Updates to use Vec with 1-based indexing for escaped string storage |
| crates/oxc_parser/src/lexer/template.rs | Updates to use Vec with 1-based indexing for escaped template storage |
| crates/oxc_parser/src/cursor.rs | Updates API call to pass Token instead of span start |
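To make the `token.rs` change more concrete, here is a hedged sketch of packing a 32-bit escape index into the top bits of a 128-bit token. The field positions below are assumptions chosen for illustration and do not reproduce the real `Token` layout. `PackedToken` and its methods are hypothetical names. Only the constant name `ESCAPE_INDEX_SHIFT` appears in this PR's commit messages, and its value here is assumed.

```rust
/// Illustrative packed token. The bit layout is an assumption for this sketch,
/// not the real `Token` layout in `oxc_parser`.
#[derive(Clone, Copy, Default, PartialEq, Eq, Debug)]
pub struct PackedToken(u128);

impl PackedToken {
    // Assumed layout: start in bits 0..32, end in bits 32..64,
    // escape index in the top 32 bits (96..128).
    // Bits 64..96 are left free for other fields such as kind and flags.
    const START_SHIFT: u32 = 0;
    const END_SHIFT: u32 = 32;
    const ESCAPE_INDEX_SHIFT: u32 = 96;
    const U32_MASK: u128 = 0xFFFF_FFFF;

    pub fn new(start: u32, end: u32) -> Self {
        Self((u128::from(start) << Self::START_SHIFT) | (u128::from(end) << Self::END_SHIFT))
    }

    pub fn start(self) -> u32 {
        ((self.0 >> Self::START_SHIFT) & Self::U32_MASK) as u32
    }

    pub fn end(self) -> u32 {
        ((self.0 >> Self::END_SHIFT) & Self::U32_MASK) as u32
    }

    /// Returns `0` if the token has no escapes, otherwise a 1-based index.
    pub fn escape_index(self) -> u32 {
        // The field occupies the top 32 bits, so no mask is needed after the shift.
        (self.0 >> Self::ESCAPE_INDEX_SHIFT) as u32
    }

    pub fn set_escape_index(&mut self, index: u32) {
        // Clear the old field, then write the new value.
        self.0 &= !(Self::U32_MASK << Self::ESCAPE_INDEX_SHIFT);
        self.0 |= u128::from(index) << Self::ESCAPE_INDEX_SHIFT;
    }

    /// Equivalent of the old boolean flag, derived from the index.
    pub fn escaped(self) -> bool {
        self.escape_index() != 0
    }
}
```

In this sketch the token stays 16 bytes, and the old boolean can be recovered by comparing the index with zero. The cost is that the 32 bits are no longer available for other uses.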


Copilot · 2025-11-30T10:07:48Z

crates/oxc_parser/src/lexer/token.rs

+    /// Returns `index + 1` (to allow for `0` to mean no escapes)
+    /// of the escaped string otherwise.


The documentation is misleading. It should clarify that this returns a 1-based index into the escaped strings/templates vectors. Consider rewording to:

/// Returns `0` if token has no escape sequences.
/// Returns a 1-based index into the escaped strings/templates vector otherwise.

This makes it clearer that the returned value is directly the index to use (after subtracting 1), not index + 1.

Suggested change

-    /// Returns `index + 1` (to allow for `0` to mean no escapes)
-    /// of the escaped string otherwise.
+    /// Returns a 1-based index into the escaped strings/templates vector otherwise.

overlookmotel

Thanks for doing this.

The odd thing is that while there's a large improvement on 1 of the lexer benchmarks, the parser benchmarks (which are what matter) actually regress a tiny bit (sub-1% change). It's not the boost I'd hoped for. The mysteries of the compiler 🤷...

But we're within the noise threshold, so it may not indicate very much. It'll be interesting to see if the benchmarks stay the same when you push again and they re-run.

crates/oxc_parser/src/lexer/token.rs

overlookmotel · 2025-12-03T22:11:58Z

Just to let you know, the reason I've stalled on this is that (a) unfortunately it's not showing the perf improvement in the parser that I hoped for, and (b) we may need those 4 spare bytes in `Token` for something else (oxc-project/backlog#193 (comment)), but that's not clear yet.

In any case, the original motivation for this was to enable unescaping identifiers in tokens in linter (#16207), but it turns out TS-ESLint doesn't do that, so maybe we don't need to either.

lilnasy · 2025-12-03T22:32:42Z

I thought that might be the case. My original motivation was to find a good on-ramp to lexer internals, and I've gotten it!

github-actions bot added the A-parser (Area - Parser) and C-performance (Category - Solution not expected to change functional behavior, only performance) labels on Nov 29, 2025

overlookmotel reviewed Nov 30, 2025


crates/oxc_parser/src/lexer/string.rs (outdated, resolved)

crates/oxc_parser/src/lexer/string.rs (outdated, resolved)

crates/oxc_parser/src/lexer/token.rs (resolved)

lilnasy force-pushed the escaped-strings-vec branch 2 times, most recently from e8c3e92 to 1fe4398 Compare November 30, 2025 09:15

lilnasy marked this pull request as ready for review November 30, 2025 10:05

Copilot AI review requested due to automatic review settings November 30, 2025 10:05

Copilot started reviewing on behalf of lilnasy November 30, 2025 10:05

Copilot finished reviewing on behalf of lilnasy November 30, 2025 10:07

Copilot AI reviewed Nov 30, 2025


lilnasy force-pushed the escaped-strings-vec branch 2 times, most recently from 687b124 to 857b992 Compare November 30, 2025 14:48

overlookmotel reviewed Dec 1, 2025


crates/oxc_parser/src/lexer/token.rs (outdated, resolved)

crates/oxc_parser/src/lexer/token.rs (outdated, resolved)

crates/oxc_parser/src/lexer/token.rs (outdated, resolved)

lilnasy added 7 commits December 1, 2025 07:52

perf(lexer): store escaped identifiers in a Vec

14c7d4a

just allocs

3c7da7d

avoid sentinel values

ca33ccf

move escape_index to the last 32 bits in the token memory layout

9a21707

undo unimportant comment diff

547d85d

move comment and debug fields in layout order

63bfece

remove is_valid_shift assertion for ESCAPE_INDEX_SHIFT

0373f16

lilnasy force-pushed the escaped-strings-vec branch from 8a0fa5b to 0373f16 Compare December 1, 2025 02:22

overlookmotel mentioned this pull request Dec 1, 2025

perf(parser): remove bounds check for getting strings from lexer #16135

Closed

lilnasy closed this Dec 3, 2025

overlookmotel mentioned this pull request Dec 7, 2025

Lexer: Store unescaped strings in Vecs instead of HashMaps #16208

Closed

lilnasy deleted the escaped-strings-vec branch December 16, 2025 22:04

lilnasy wants to merge 7 commits into oxc-project:main from lilnasy:escaped-strings-vec
