perf(all): experiment: add dedicated identifier string type#16416

Closed
camchenry wants to merge 2 commits into main from
12-02-perf_all_experiment_add_dedicated_identifier_string_type

Conversation

@camchenry camchenry commented Dec 3, 2025

This is a proof of concept, mainly to see how bad the damage is if we add a new Ident type. It will help guide the API and how we approach implementing this.

@github-actions github-actions bot added A-linter Area - Linter A-parser Area - Parser A-semantic Area - Semantic A-cli Area - CLI A-minifier Area - Minifier A-ast Area - AST A-transformer Area - Transformer / Transpiler A-isolated-declarations Isolated Declarations A-ast-tools Area - AST tools A-formatter Area - Formatter A-linter-plugins Area - Linter JS plugins labels Dec 3, 2025
@github-actions github-actions bot added the C-performance Category - Solution not expected to change functional behavior, only performance label Dec 3, 2025
@camchenry camchenry force-pushed the 12-02-perf_all_experiment_add_dedicated_identifier_string_type branch from 72d7eb5 to bfbde4f Compare December 3, 2025 02:09
codspeed-hq bot commented Dec 3, 2025

CodSpeed Performance Report

Merging #16416 will degrade performances by 3.17%

Comparing 12-02-perf_all_experiment_add_dedicated_identifier_string_type (38dbc7d) with main (514c724)

Summary

⚡ 3 improvements
❌ 1 regression
✅ 38 untouched
⏩ 3 skipped [1]

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

| Mode | Benchmark | BASE | HEAD | Change |
|---|---|---|---|---|
| Simulation | isolated-declarations[vue-id.ts] | 58 ms | 55 ms | +5.57% |
| Simulation | semantic[binder.ts] | 4.1 ms | 3.9 ms | +5.1% |
| Simulation | semantic[react.development.js] | 1.6 ms | 1.5 ms | +3.77% |
| Simulation | parser[binder.ts] | 3.2 ms | 3.3 ms | -3.17% |

Footnotes

  1. 3 benchmarks were skipped, so the baseline results were used instead.

@camchenry camchenry force-pushed the 12-02-perf_all_experiment_add_dedicated_identifier_string_type branch from bfbde4f to d0d090f Compare December 3, 2025 02:16
@camchenry camchenry force-pushed the 12-02-perf_all_experiment_add_dedicated_identifier_string_type branch 5 times, most recently from e20a200 to c3790ea Compare December 4, 2025 03:44
@camchenry
@overlookmotel I ended up making a lot of changes I didn't mean to, so please excuse some of the unnecessary changes. I think I've committed a crime on our codebase. However, the main point is that I converted some of the usage of FxHashMap<Atom, _> to use IdentHashMap<_> instead. I did not include the arena-allocated hash maps as that presents quite a challenge in implementation, though I think that's where some of the most valuable work might be.

Regardless, we are starting to see at least some of the positive benefits that we would expect. It's essentially break-even once you take into account the additional parsing time, but it's already 2-3% faster in semantic:
image
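
For readers following along, here is a minimal sketch of the idea behind swapping `FxHashMap<Atom, _>` for a map keyed on identifiers that carry their own hash. It is not the code from this PR: the `Ident` layout, the `PassThroughHasher` name, the `count_references` helper, and the use of FNV-1a (chosen only so the example needs no external crates) are all assumptions made for illustration.

```rust
use std::collections::HashMap;
use std::hash::{BuildHasherDefault, Hash, Hasher};

/// Hypothetical stand-in for the PR's `Ident`: a string plus a hash computed once.
#[derive(Clone, Copy, Debug)]
pub struct Ident<'a> {
    name: &'a str,
    hash: u64,
}

impl<'a> Ident<'a> {
    pub fn new(name: &'a str) -> Self {
        // FNV-1a, used here only to keep the sketch dependency-free (assumption).
        let mut hash: u64 = 0xcbf2_9ce4_8422_2325;
        for byte in name.bytes() {
            hash ^= u64::from(byte);
            hash = hash.wrapping_mul(0x0000_0100_0000_01b3);
        }
        Self { name, hash }
    }
}

impl PartialEq for Ident<'_> {
    fn eq(&self, other: &Self) -> bool {
        // Compare the cached hashes first; only equal hashes need a string comparison.
        self.hash == other.hash && self.name == other.name
    }
}

impl Eq for Ident<'_> {}

impl Hash for Ident<'_> {
    fn hash<H: Hasher>(&self, state: &mut H) {
        // Feed only the cached hash; the string bytes are not read again.
        state.write_u64(self.hash);
    }
}

/// Hasher that passes a single pre-computed `u64` straight through.
#[derive(Default)]
pub struct PassThroughHasher(u64);

impl Hasher for PassThroughHasher {
    fn write(&mut self, _bytes: &[u8]) {
        unreachable!("only `write_u64` is expected for `Ident` keys");
    }

    fn write_u64(&mut self, value: u64) {
        self.0 = value;
    }

    fn finish(&self) -> u64 {
        self.0
    }
}

/// Hypothetical alias; the PR's real `IdentHashMap` may be defined differently.
pub type IdentHashMap<'a, V> = HashMap<Ident<'a>, V, BuildHasherDefault<PassThroughHasher>>;

/// Counts how often each name occurs, hashing each string exactly once in `Ident::new`.
pub fn count_references<'a>(names: &[&'a str]) -> IdentHashMap<'a, usize> {
    let mut counts = IdentHashMap::default();
    for &name in names {
        *counts.entry(Ident::new(name)).or_insert(0) += 1;
    }
    counts
}
```

In this sketch the map never hashes string bytes during lookup or insertion, which is the kind of saving the semantic benchmarks would be expected to reflect; the cost moves to wherever the `Ident` is created.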

Comment on lines 41 to 47
// NOTE: This is creating a fresh hasher for each identifier, which is probably bad for performance?
// But, I want to see how terrible it is and keep the API simple for testing.
let hash = {
    let mut hasher = FxHasher::default();
    s.hash(&mut hasher);
    hasher.finish()
};
Creating a hasher costs nothing. Do "jump to definition" on FxHasher::default() and you'll see.

@overlookmotel overlookmotel Dec 4, 2025
I just realized something. hasher.finish() is rotating the bits (again, do "jump to definition"). That's not what we want to happen as we want the bits with highest entropy in top 32 bits - we do the rotation ourselves in Hash::hash instead.

So I'd suggest undoing the rotation again here.

let hash = {
    let mut hasher = FxHasher::default();
    s.hash(&mut hasher);
    hasher.finish()
};

// `FxHasher::finish` performs a rotation.
// That's not what we want as we want the highest entropy bits in top 32 bits.
// Undo that rotation here by rotating back in the opposite direction.
// This code is the exact reverse of the code in `FxHasher::finish`,
// so compiler should see that together they make a no-op, and remove both rotations.
#[cfg(target_pointer_width = "64")]
const ROTATE: u32 = 26;
#[cfg(target_pointer_width = "32")]
const ROTATE: u32 = 15;

let hash = hash.rotate_right(ROTATE);

If we have a higher-entropy hash, that should mean fewer collisions. If so, it might move the semantic benchmarks a bit.
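
To make the rotation argument concrete, here is a small self-contained sketch. `ToyHasher` is a hypothetical stand-in, not `FxHasher`: its mixing step and multiplier are arbitrary, and the only property it is meant to share is that `finish` rotates the state left by 26 bits (the 64-bit constant quoted above). The helper `hash_and_unrotate` is likewise invented for the example.

```rust
use std::hash::Hasher;

/// Toy hasher standing in for `FxHasher`; NOT the real implementation.
/// The only shared property is that `finish` rotates the internal state left.
#[derive(Default)]
pub struct ToyHasher {
    state: u64,
}

/// Rotation applied by `ToyHasher::finish` (26 mirrors the 64-bit constant above).
const ROTATE: u32 = 26;

impl Hasher for ToyHasher {
    fn write(&mut self, bytes: &[u8]) {
        for &byte in bytes {
            // Arbitrary odd multiplier, chosen only for illustration.
            self.state = (self.state ^ u64::from(byte)).wrapping_mul(0x9E37_79B9_7F4A_7C15);
        }
    }

    fn finish(&self) -> u64 {
        self.state.rotate_left(ROTATE)
    }
}

/// Hashes `s` and returns `(value from finish, value with rotation undone, raw state)`.
pub fn hash_and_unrotate(s: &str) -> (u64, u64, u64) {
    let mut hasher = ToyHasher::default();
    hasher.write(s.as_bytes());
    let finished = hasher.finish();
    // Rotating right by the same amount is the exact inverse of `rotate_left`.
    let unrotated = finished.rotate_right(ROTATE);
    (finished, unrotated, hasher.state)
}
```

Because `rotate_right(n)` is the exact inverse of `rotate_left(n)`, the second element of the tuple always equals the raw state, which is the value the comment above wants to store.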

@camchenry

Ah, good to know that FxHasher::default() doesn't actually do any work. If we end up porting the fxhash code into oxc, we can hopefully simplify this by just not doing the rotation, rather than relying on the compiler to turn it into a no-op.

Yes this is a bit of a hack. We will want to pull FxHasher's code into Oxc as you say. But I'd be very confident compiler will remove it, as long as FxHasher::finish gets inlined (which it really should). Compiler is really good at simple optimizations like this.

https://godbolt.org/z/334EEeKvf

Anyway, it's only 2 ops if it doesn't. The thing that might have an effect on perf is if we get fewer hash collisions.

Comment on lines 240 to 247
impl<'new_alloc> CloneIn<'new_alloc> for Ident<'_> {
    type Cloned = Ident<'new_alloc>;

    #[inline]
    fn clone_in(&self, allocator: &'new_alloc Allocator) -> Self::Cloned {
        Ident::from_in(self.as_str(), allocator)
    }
}
@overlookmotel overlookmotel Dec 4, 2025
This is the place where re-hashing will be happening unnecessarily in semantic when copying from 1 arena to another.

I think this should do the trick - copy string, but don't recalculate hash:

impl<'new_alloc> CloneIn<'new_alloc> for Ident<'_> {
    type Cloned = Ident<'new_alloc>;

    #[inline]
    fn clone_in(&self, allocator: &'new_alloc Allocator) -> Self::Cloned {
        let s = allocator.alloc_str(self.as_str());
        let ptr = NonNull::from(s).cast::<u8>();
        Ident { ptr, len_and_hash: self.len_and_hash, _marker: PhantomData }
    }
}

If there are any other methods which create an Ident from another Ident, they should do the same - but I think this is maybe the only one?
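
The difference between the two `clone_in` versions can be made observable with a simplified model. This sketch is an assumption-heavy illustration, not oxc code: it uses an owned `Box<str>` instead of an arena, a plain `hash` field instead of `len_and_hash`, FNV-1a as a placeholder hash, and a hypothetical `hash_calls` counter to show when hashing happens.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Counts hash computations so the saving is observable (illustration only).
static HASH_CALLS: AtomicUsize = AtomicUsize::new(0);

/// Number of times `hash_str` has run so far.
pub fn hash_calls() -> usize {
    HASH_CALLS.load(Ordering::Relaxed)
}

/// Placeholder hash (FNV-1a); the real type would use its own hasher.
fn hash_str(s: &str) -> u64 {
    HASH_CALLS.fetch_add(1, Ordering::Relaxed);
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325;
    for byte in s.bytes() {
        hash ^= u64::from(byte);
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3);
    }
    hash
}

/// Simplified, hypothetical model of an identifier with a cached hash.
pub struct Ident {
    text: Box<str>,
    hash: u64,
}

impl Ident {
    pub fn new(s: &str) -> Self {
        Self { text: s.into(), hash: hash_str(s) }
    }

    pub fn as_str(&self) -> &str {
        &self.text
    }

    pub fn cached_hash(&self) -> u64 {
        self.hash
    }

    /// Mirrors the original `clone_in`: goes back through the string, so it re-hashes.
    pub fn clone_rehash(&self) -> Self {
        Self::new(self.as_str())
    }

    /// Mirrors the suggested `clone_in`: copies the bytes and reuses the stored hash.
    pub fn clone_keep_hash(&self) -> Self {
        Self { text: self.text.clone(), hash: self.hash }
    }
}
```

Both clones end up with the same string and the same hash; only `clone_rehash` pays for walking the bytes a second time.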

@camchenry camchenry force-pushed the 12-02-perf_all_experiment_add_dedicated_identifier_string_type branch from c3790ea to 7b76c1d Compare December 5, 2025 17:05
@camchenry camchenry force-pushed the 12-02-perf_all_experiment_add_dedicated_identifier_string_type branch 2 times, most recently from efb0e00 to 5d0ddea Compare December 5, 2025 19:48
@github-actions github-actions bot added the A-editor Area - Editor and Language Server label Dec 5, 2025
@camchenry camchenry force-pushed the 12-02-perf_all_experiment_add_dedicated_identifier_string_type branch 3 times, most recently from 488ef40 to 713e396 Compare December 6, 2025 02:47
…#16509)

Comment fix
(https://eslint.org/docs/latest/rules/arrow-body-style)

---------

Signed-off-by: GRK <gauravrkochar@gmail.com>
Co-authored-by: Connor Shea <connor.james.shea@gmail.com>
@camchenry camchenry force-pushed the 12-02-perf_all_experiment_add_dedicated_identifier_string_type branch from 713e396 to 24a8c14 Compare December 6, 2025 04:05
graphite-app bot pushed a commit that referenced this pull request Jan 1, 2026
`phf_set!` is good for large lists of strings, but in cases of small lists, it is often faster to just do a `.contains` or a `matches!`. It also slightly improves compile times since no work needs to be done to hash at compile time.

It's possible the math on this might flip once we implement `Ident` (i.e., #16416), since the hashes will be pre-computed at parse-time.
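
As a rough illustration of the small-list alternatives that commit message mentions, the sketch below checks membership with `matches!` and with a slice `.contains`. The word list and function names are hypothetical, picked only for the example, and no claim is made here about which variant is faster in oxc.

```rust
/// Membership test via `matches!`: compiles to direct string comparisons,
/// with no hashing at compile time or at run time.
pub fn is_reserved_matches(name: &str) -> bool {
    matches!(name, "break" | "case" | "catch" | "class")
}

/// Hypothetical small list used by the `.contains` variant.
const RESERVED: [&str; 4] = ["break", "case", "catch", "class"];

/// Membership test via a linear scan over a small constant array.
pub fn is_reserved_contains(name: &str) -> bool {
    RESERVED.contains(&name)
}
```

Both functions agree on every input; with a pre-hashed `Ident`, a hash-based set lookup could skip the string hashing step that makes it comparatively expensive today, which is the trade-off the note above refers to.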
@overlookmotel overlookmotel removed the A-linter-plugins Area - Linter JS plugins label Jan 20, 2026
@Boshen Boshen self-assigned this Jan 21, 2026
Boshen commented Jan 21, 2026

I'll move this forward.

Boshen commented Jan 23, 2026

closed by the stack #18400

@Boshen Boshen closed this Jan 23, 2026
Labels

A-ast Area - AST A-ast-tools Area - AST tools A-cli Area - CLI A-editor Area - Editor and Language Server A-formatter Area - Formatter A-isolated-declarations Isolated Declarations A-linter Area - Linter A-minifier Area - Minifier A-parser Area - Parser A-semantic Area - Semantic A-transformer Area - Transformer / Transpiler C-performance Category - Solution not expected to change functional behavior, only performance

4 participants