Adapt table sizes to the contents, accommodating u64 rmeta offsets #113542
Conversation
@bors try @rust-timer queue
⌛ Trying commit 36f73e3842661138a4cbda3cd8f22ef90494c53c with merge 819796cbc9cc9775f4fed20477f602cf43cbda6b...
☀️ Try build successful - checks-actions
Finished benchmarking commit (819796cbc9cc9775f4fed20477f602cf43cbda6b): comparison URL.

Overall result: ❌ regressions - ACTION NEEDED

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @bors rollup=never

Instruction count: This is a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage): This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Cycles: This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Binary size: This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

Bootstrap: 656.907s -> 658.694s (0.27%)
@bors try @rust-timer queue
⌛ Trying commit 2c29776342dac9316da5f030070478a8870f8705 with merge e89b1c5248a7c7386b88f9ee784339821d187bb3...
☀️ Try build successful - checks-actions
Finished benchmarking commit (e89b1c5248a7c7386b88f9ee784339821d187bb3): comparison URL.

Overall result: ❌ regressions - ACTION NEEDED

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @bors rollup=never

Instruction count: This is a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage): This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Cycles: This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Binary size: This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

Bootstrap: 656.951s -> 656.796s (-0.02%)
@bors try @rust-timer queue
⌛ Trying commit 17d530e19db0f3e15ea106e3975da1ba0c707f4c with merge a1d3d10f347e823a591407c9fbe1b81d62ee67af...
I forgot to update the labels, oops.
Going to reroll.

r? compiler
Lgtm. r=me with nits
let len = self.num_elems;
let len: u32 = len.try_into().unwrap();
len.write_to_bytes(meta_bytes);
for i in 0..8 {
Can you please write a comment here stating the motivation for this interleaving? This code would be very difficult to understand without context.
@@ -136,7 +136,8 @@ impl<T> LazyArray<T> {
/// eagerly and in-order.
struct LazyTable<I, T> {
    position: NonZeroUsize,
    encoded_size: usize,
    width: usize,
Maybe add some comments on those?
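For readers without the diff context, here is a rough, standalone annotated version of those two fields. The field names come from the hunk above, but the doc comments are a paraphrase of the PR description rather than the wording that eventually landed, and the `PhantomData` field is only there to make the sketch compile on its own:

```rust
use std::marker::PhantomData;
use std::num::NonZeroUsize;

/// Sketch of the table header after this change (names from the hunk above;
/// comments are a paraphrase, not the committed documentation).
struct LazyTable<I, T> {
    /// Absolute position of this table's encoded bytes within the rmeta blob.
    position: NonZeroUsize,
    /// Bytes used per element in *this* table: the widest value actually
    /// stored, which may be narrower than the full fixed-width element.
    width: usize,
    /// Carries the index and element types; used only by accessor methods.
    _marker: PhantomData<(I, T)>,
}
```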
let ([position_bytes, meta_bytes], []) = b.as_chunks::<4>() else { panic!() };
if *meta_bytes == [0; 4] {
fn from_bytes(b: &[u8; 16]) -> Self {
let mut position = [0u8; 8];
Maybe extract this to a function since this code is duplicated? Also a small comment referencing the idea behind the interleaving would be nice.
@bors r=b-naber
I'm adding relnotes now so that I don't forget. If we manage to improve the bitcode embedding strategy, then this and a handful of other PRs I've landed will combine to make a number of programs compile that previously could not be compiled on account of their size. People who have run into such issues and quietly hacked around the situation will probably be happy to hear some good news.
☀️ Test successful - checks-actions
Finished benchmarking commit (d64c845): comparison URL.

Overall result: ❌ regressions - ACTION NEEDED

Next Steps: If you can justify the regressions found in this perf run, please indicate this with @rustbot label: +perf-regression

Instruction count: This is a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage): This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Cycles: This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Binary size: This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

Bootstrap: 631.192s -> 631.341s (0.02%)
The regressions are only in instruction counts on doc builds. Cycles and wall time report some changes, but they are not reproduced by the pre-merge perf run and are within the noise envelope based on the recent graphs.

This PR makes the rmeta format more complex in order to shrink it, so I think the instruction counts are not necessarily indicative of a "the compiler is slower" effect, but rather that we are using more instructions to do less work overall. Alternatively, the icount regressions are justified by the binary size improvements.

@rustbot label: +perf-regression-triaged
Removing relnotes; if we manage to improve the bitcode embedding, it won't land in 1.74 with this PR.
This is an implementation of rust-lang/compiler-team#666.

The objective of this PR is to permit the rmeta format to accommodate larger crates that need offsets larger than a `u32` can store, without compromising performance for crates that do not need such range. The second commit is a number of tiny optimization opportunities I noticed while looking at perf recordings of the first commit.

The rmeta tables need to have fixed-size elements to permit lazy random access. But the size only needs to be fixed per table, not per element type. This PR adds another `usize` to the table header which indicates the table element size. As each element of a table is set, we keep track of the widest encoded table value, then don't bother encoding all the unused trailing bytes on each value. When decoding table elements, we copy them to a full-width array if they are not already full-width.
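As an illustration of that scheme, here is a minimal self-contained sketch, not the rustc_metadata code: `TableBuilder`, the fixed 4-byte element (the PR works with wider elements), and the helper names are assumptions for this example. The builder stores full-width values, remembers the widest one, writes only that many bytes per element, and the decoder pads each trimmed element back to full width before reading it.

```rust
/// Build a table of fixed-width (here 4-byte) values, then emit only as many
/// leading bytes per element as the widest value actually needs.
struct TableBuilder {
    blocks: Vec<[u8; 4]>,
    width: usize, // widest encoded value seen so far, in bytes
}

impl TableBuilder {
    fn new() -> Self {
        TableBuilder { blocks: Vec::new(), width: 0 }
    }

    fn push(&mut self, value: u32) {
        let bytes = value.to_le_bytes();
        // Bytes needed for this value once trailing zero bytes are dropped.
        let needed = 4 - bytes.iter().rev().take_while(|&&b| b == 0).count();
        self.width = self.width.max(needed);
        self.blocks.push(bytes);
    }

    /// Encode: record the per-table element width, then write only `width`
    /// bytes of each element.
    fn encode(&self) -> (usize, Vec<u8>) {
        let mut out = Vec::new();
        for block in &self.blocks {
            out.extend_from_slice(&block[..self.width]);
        }
        (self.width, out)
    }
}

/// Decode element `i`: copy the trimmed bytes into a full-width buffer
/// (the missing trailing bytes are exactly the zeros we dropped).
fn decode(width: usize, bytes: &[u8], i: usize) -> u32 {
    let mut full = [0u8; 4];
    full[..width].copy_from_slice(&bytes[i * width..(i + 1) * width]);
    u32::from_le_bytes(full)
}

fn main() {
    let mut builder = TableBuilder::new();
    for v in [3u32, 70_000, 9] {
        builder.push(v);
    }
    let (width, bytes) = builder.encode();
    assert_eq!(width, 3); // 70_000 needs 3 bytes; 3 and 9 are padded up to that
    assert_eq!(decode(width, &bytes, 1), 70_000);
    println!("width = {width}, table = {bytes:?}");
}
```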
`LazyArray` needs some special treatment. Most other values that are encoded in tables are indexes or offsets, and those tend to be small, so we get to drop a lot of zero bytes off the end. But `LazyArray` encodes two small values in a fixed-width table element: a position and a length. The treatment described above could trim zero bytes off the length, but any nonzero length shields the position bytes from the optimization. To improve this, we interleave the bytes of the position and the length. This change is responsible for about half of the crate metadata win on many crates.

Fixes #112934 (probably)
Fixes #103607
Fixes #111855
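To make the interleaving concrete (and to sketch the kind of helper the reviewer suggested extracting above), here is a hedged standalone example; `interleave` and `deinterleave` are hypothetical names for illustration, not functions in this PR. Interleaving places the little-endian bytes of the position and the length side by side, so the high-order zero bytes of both values form one contiguous trailing run that the per-table width trimming can drop.

```rust
/// Interleave the little-endian bytes of `position` and `len` so that the
/// high-order zero bytes of both values become a single run of trailing
/// zeros in the 16-byte table element.
fn interleave(position: u64, len: u64) -> [u8; 16] {
    let pos = position.to_le_bytes();
    let len = len.to_le_bytes();
    let mut out = [0u8; 16];
    for i in 0..8 {
        out[2 * i] = pos[i];
        out[2 * i + 1] = len[i];
    }
    out
}

/// Inverse: split an element back into (position, len).
fn deinterleave(bytes: [u8; 16]) -> (u64, u64) {
    let mut pos = [0u8; 8];
    let mut len = [0u8; 8];
    for i in 0..8 {
        pos[i] = bytes[2 * i];
        len[i] = bytes[2 * i + 1];
    }
    (u64::from_le_bytes(pos), u64::from_le_bytes(len))
}

fn main() {
    let encoded = interleave(0x0123_45, 0x67);
    // Only the first five bytes are nonzero; everything after them can be
    // trimmed by the per-table width described in the PR body.
    let expected: [u8; 16] = [0x45, 0x67, 0x23, 0x00, 0x01, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0];
    assert_eq!(encoded, expected);
    assert_eq!(deinterleave(encoded), (0x0123_45, 0x67));
}
```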