perf(lexer): pad Token to 16 bytes#9926

Closed
overlookmotel wants to merge 1 commit into main from 03-20-perf_lexer_pad_token_to_16_bytes

Conversation


@overlookmotel overlookmotel commented Mar 20, 2025

#9918 caused a small regression on lexer and parser benchmarks because it introduced an uninitialized padding byte into Token.

Add another dummy u8 to fill the uninitialized hole.

@github-actions github-actions bot added the A-parser Area - Parser label Mar 20, 2025


How to use the Graphite Merge Queue

Add either label to this PR to merge it via the merge queue:

  • 0-merge - adds this PR to the back of the merge queue
  • hotfix - for urgent hot fixes, skip the queue and merge this PR next

You must have a Graphite account in order to use the merge queue. Sign up using this link.

An organization admin has enabled the Graphite Merge Queue in this repository.

Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue.

This stack of pull requests is managed by Graphite. Learn more about stacking.

@github-actions github-actions bot added the C-performance Category - Solution not expected to change functional behavior, only performance label Mar 20, 2025
@overlookmotel overlookmotel marked this pull request as ready for review March 20, 2025 12:47

codspeed-hq bot commented Mar 20, 2025

CodSpeed Performance Report

Merging #9926 will not alter performance

Comparing 03-20-perf_lexer_pad_token_to_16_bytes (c54c02a) with main (33c1c76)

Summary

✅ 33 untouched benchmarks


Boshen commented Mar 20, 2025

I'm not seeing any difference, both are size = 16 (0x10), align = 0x4. What property do I need to inspect? Can this be a compile-time check?

We already have this

#[cfg(test)]
mod size_asserts {
    use super::Token;
    const _: () = assert!(std::mem::size_of::<Token>() == 16);
}

why is this not enough?


overlookmotel commented Mar 21, 2025

Unfortunately, Rust offers no tools to detect whether a type contains padding. So I don't think it's possible to implement a test.
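To illustrate: here is a minimal sketch (hypothetical types, not oxc's actual Token) of two #[repr(C)] structs that the existing size_asserts test cannot tell apart. Both are 16 bytes with 4-byte alignment, but one contains a compiler-inserted padding byte and the other fills that hole with an explicit dummy field:

```rust
// Two hypothetical 16-byte types with identical size and alignment.
// `Padded` leaves one byte of compiler-inserted padding after `flag`;
// `Filled` plugs that hole with an explicit dummy field. Size and
// alignment assertions pass identically for both, so they cannot
// detect the uninitialized byte.
#[derive(Clone, Copy)]
#[repr(C)]
struct Padded {
    start: u32,
    end: u32,
    kind: u16,
    flag: u8,
    // one byte of uninitialized padding inserted here by the compiler
    extra: u32,
}

#[derive(Clone, Copy)]
#[repr(C)]
struct Filled {
    start: u32,
    end: u32,
    kind: u16,
    flag: u8,
    _padding: u8, // explicit dummy byte: every byte is now initialized
    extra: u32,
}

fn main() {
    // Both pass the same checks despite differing in padding.
    assert_eq!(std::mem::size_of::<Padded>(), 16);
    assert_eq!(std::mem::size_of::<Filled>(), 16);
    assert_eq!(std::mem::align_of::<Padded>(), 4);
    assert_eq!(std::mem::align_of::<Filled>(), 4);
    println!("identical size and alignment");
}
```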

This PR does not affect type size or alignment, only whether any of the bytes representing the type are uninitialized. When some are, creating and copying a Token takes more instructions, because the compiler avoids ever reading or writing the uninitialized byte. So e.g. copying a Token is 2 x 8-byte reads + 2 x 8-byte writes (overlapping). Whereas with the _padding: u8, _padding2: u16 fields added, there are no uninitialized bytes, so copying a Token is just 1 x 16-byte read + 1 x 16-byte write.

This is a micro-micro-optimization, but creating and copying Tokens is such an extremely hot path that it makes a measurable difference. #3283 was the one that revealed this - 5% speed-up just due to the fields of Token getting re-arranged.

But this PR isn't succeeding in fixing the perf loss. Looks like changing the u32 field to a u16 has caused the other fields to move around, which is also a regression. I'll try to fix it.
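The padding trick described above can be sketched as follows. This is a hedged sketch only: the field names and types are illustrative, not oxc's actual Token layout, but it shows the _padding: u8 / _padding2: u16 dummy fields leaving no compiler-inserted hole in a 16-byte, 4-byte-aligned struct:

```rust
use std::mem::{align_of, size_of};

// Illustrative sketch of a fully-initialized 16-byte token.
// Field names are hypothetical, not oxc's actual `Token`.
#[derive(Clone, Copy)]
#[repr(C)]
struct Token {
    kind: u8,         // token kind discriminant
    _padding: u8,     // explicit dummy bytes: with these present,
    _padding2: u16,   // all 16 bytes are initialized, so a copy can
                      // be a single 16-byte load + store
    start: u32,       // span start offset
    end: u32,         // span end offset
    lexed_value: u32, // e.g. an index into side-table data (hypothetical)
}

fn main() {
    assert_eq!(size_of::<Token>(), 16);
    assert_eq!(align_of::<Token>(), 4);
    // Sum of field sizes equals the struct size, so the compiler
    // inserted no padding of its own.
    assert_eq!(1 + 1 + 2 + 4 + 4 + 4, size_of::<Token>());
}
```

With #[repr(C)] the fields stay in declaration order, so placing the dummy fields where a hole would otherwise appear is what keeps every byte initialized.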

@overlookmotel overlookmotel marked this pull request as draft March 21, 2025 03:15
overlookmotel pushed a commit that referenced this pull request May 13, 2025
Pack the parser tokens into a single u128.

I was musing with o3 about compiler performance, and it told me that
Hermes used packed tokens to improve parser performance so I thought I'd
try and see if that would also work here. In local benchmarks it appears
to have a decent improvement.

closes #9926
@Boshen Boshen deleted the 03-20-perf_lexer_pad_token_to_16_bytes branch August 30, 2025 05:55