perf(all): experiment: add dedicated identifier string type#16416
perf(all): experiment: add dedicated identifier string type#16416
Conversation
How to use the Graphite Merge QueueAdd either label to this PR to merge it via the merge queue:
You must have a Graphite account in order to use the merge queue. Sign up using this link. An organization admin has enabled the Graphite Merge Queue in this repository. Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue. This stack of pull requests is managed by Graphite. Learn more about stacking. |
72d7eb5 to
bfbde4f
Compare
CodSpeed Performance ReportMerging #16416 will degrade performances by 3.17%Comparing Summary
Benchmarks breakdown
Footnotes
|
bfbde4f to
d0d090f
Compare
This comment was marked as outdated.
This comment was marked as outdated.
e20a200 to
c3790ea
Compare
|
@overlookmotel I ended up making a lot of changes I didn't mean to, so please excuse some of the unnecessary changes. I think I've committed a crime on our codebase. However, the main point is that I converted some of the usage of Regardless, we are starting to at least some positive benefits that we would expect. It's essentially breakeven once you take into account the additional parsing time, but it's 2-3% faster in semantic already: |
crates/oxc_span/src/ident.rs
Outdated
| // NOTE: This is creating a fresh hasher for each identifier, which is probably bad for performance? | ||
| // But, I want to see how terrible it is and keep the API simple for testing. | ||
| let hash = { | ||
| let mut hasher = FxHasher::default(); | ||
| s.hash(&mut hasher); | ||
| hasher.finish() | ||
| }; |
There was a problem hiding this comment.
Creating a hasher costs nothing. Do "jump to definition" on FxHasher::default() and you'll see.
There was a problem hiding this comment.
I just realized something. hasher.finish() is rotating the bits (again, do "jump to definition"). That's not what we want to happen as we want the bits with highest entropy in top 32 bits - we do the rotation ourselves in Hash::hash instead.
So I'd suggest undoing the rotation again here.
let hash = {
let mut hasher = FxHasher::default();
s.hash(&mut hasher);
hasher.finish()
};
// `FxHasher::finish` performs a rotation.
// That's not what we want as we want the highest entropy bits in top 32 bits.
// Undo that rotation here by rotating back in the opposite direction.
// This code is the exact reverse of the code in `FxHasher::finish`,
// so compiler should see that together they make a no-op, and remove both rotations.
#[cfg(target_pointer_width = "64")]
const ROTATE: u32 = 26;
#[cfg(target_pointer_width = "32")]
const ROTATE: u32 = 15;
let hash = hash.rotate_right(ROTATE);If we have a higher-entropy hash, that should mean less collisions. If so, it might move the semantic benchmarks a bit.
There was a problem hiding this comment.
Ah, good to know that doing FxHasher::default() is actually not doing anything. If we end up porting the fxhash code into oxc, hopefully we can make the rotation simpler (by just not doing it), and not need to rely on this hopefully turning into a no-op.
There was a problem hiding this comment.
Yes this is a bit of a hack. We will want to pull FxHasher's code into Oxc as you say. But I'd be very confident compiler will remove it, as long as FxHasher::finish gets inlined (which it really should). Compiler is really good at simple optimizations like this.
https://godbolt.org/z/334EEeKvf
Anyway, it's only 2 ops if it doesn't. The thing that might have an effect on perf is if we get less hash collisions.
| impl<'new_alloc> CloneIn<'new_alloc> for Ident<'_> { | ||
| type Cloned = Ident<'new_alloc>; | ||
|
|
||
| #[inline] | ||
| fn clone_in(&self, allocator: &'new_alloc Allocator) -> Self::Cloned { | ||
| Ident::from_in(self.as_str(), allocator) | ||
| } | ||
| } |
There was a problem hiding this comment.
This is the place where re-hashing will be happening unnecessarily in semantic when copying from 1 arena to another.
I think this should do the trick - copy string, but don't recalculate hash:
impl<'new_alloc> CloneIn<'new_alloc> for Ident<'_> {
type Cloned = Ident<'new_alloc>;
#[inline]
fn clone_in(&self, allocator: &'new_alloc Allocator) -> Self::Cloned {
let s = allocator.alloc_str(self.as_str());
let ptr = NonNull::from(s).cast::<u8>();
Ident { ptr, len_and_hash: self.len_and_hash, _marker: PhantomData }
}
}If there are any other methods which create an Ident from another Ident, they should do the same - but I think this is maybe the only one?
c3790ea to
7b76c1d
Compare
efb0e00 to
5d0ddea
Compare
488ef40 to
713e396
Compare
…#16509) Comment fix (https://eslint.org/docs/latest/rules/arrow-body-style) --------- Signed-off-by: GRK <gauravrkochar@gmail.com> Co-authored-by: Connor Shea <connor.james.shea@gmail.com>
713e396 to
24a8c14
Compare
`phf_set!` is good for large lists of strings, but in cases of small lists, it is often faster to just do a `.contains` or a `matches!`. It also slightly improves compile times since no work needs to be done to hash at compile time. It's possible the math on this might flip once we implement `Ident` (i.e., #16416), since the hashes will be pre-computed at parse-time.
|
I'll move this forward. |
|
closed by the stack #18400 |


Identtype for identifier strings backlog#193This is a proof of concept. Mainly to see how bad the damage is if we add a new
Identtype. Will help guide the API and also how we approach implementing this.