Adapt table sizes to the contents, accommodating u64 rmeta offsets #113542

Merged
3 commits merged into rust-lang:master from adaptive-tables
Aug 30, 2023

Conversation

saethlin
Member

@saethlin saethlin commented Jul 10, 2023

This is an implementation of rust-lang/compiler-team#666

The objective of this PR is to let the rmeta format accommodate larger crates that need offsets wider than a u32 can store, without compromising performance for crates that do not need such range. The second commit is a collection of tiny optimizations I noticed while looking at perf recordings of the first commit.

The rmeta tables need to have fixed-size elements to permit lazy random access. But the size only needs to be fixed per table, not per element type. This PR adds another usize to the table header which indicates the table element size. As each element of a table is set, we keep track of the widest encoded table value, then don't bother encoding all the unused trailing bytes on each value. When decoding table elements, we copy them to a full-width array if they are not already full-width.
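
A minimal sketch of the per-table width idea described above, not the code in this PR. The names `TableBuilder`, `push`, `encode` and `decode` are hypothetical, and the sketch assumes every element is a little-endian `u64`, so the unused bytes are always trailing zeros.

```rust
/// Hypothetical builder: stores full-width little-endian blocks and
/// remembers the widest one that was actually needed.
struct TableBuilder {
    blocks: Vec<[u8; 8]>,
    width: usize,
}

impl TableBuilder {
    fn new() -> Self {
        TableBuilder { blocks: Vec::new(), width: 0 }
    }

    fn push(&mut self, value: u64) {
        let block = value.to_le_bytes();
        // Bytes up to and including the last nonzero byte.
        let trailing_zeros = block.iter().rev().take_while(|b| **b == 0).count();
        self.width = self.width.max(8 - trailing_zeros);
        self.blocks.push(block);
    }

    /// Returns the element width (which would go in the table header)
    /// and the table bytes, with only `width` bytes written per element.
    fn encode(&self) -> (usize, Vec<u8>) {
        let mut out = Vec::with_capacity(self.blocks.len() * self.width);
        for block in &self.blocks {
            out.extend_from_slice(&block[..self.width]);
        }
        (self.width, out)
    }
}

/// Random access: copy the narrow element into a zeroed full-width array.
fn decode(width: usize, bytes: &[u8], index: usize) -> Option<u64> {
    if width > 8 {
        return None;
    }
    let start = index.checked_mul(width)?;
    let chunk = bytes.get(start..start.checked_add(width)?)?;
    let mut full = [0u8; 8];
    full[..width].copy_from_slice(chunk);
    Some(u64::from_le_bytes(full))
}
```

Because the width is fixed per table, element `i` still lives at byte offset `i * width`, so lookups stay lazy and random-access while a table of small values only pays for the bytes it uses.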

LazyArray needs some special treatment. Most other values that are encoded in tables are indexes or offsets, and those tend to be small, so we get to drop a lot of zero bytes off the end. But LazyArray encodes two small values in a fixed-width table element: the position of the array and the length of the array. The treatment described above could trim zero bytes off the length, but any nonzero length shields the position bytes from the optimization. To improve this, we interleave the bytes of position and length. This change is responsible for about half of the crate metadata win on many crates.
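
To illustrate why interleaving helps, here is a small sketch under assumptions: both values are treated as little-endian `u64`s, and the byte order within each pair (position first, then length) is chosen for illustration and may not match the PR. The helpers `interleave`, `deinterleave`, `concatenated` and `used_width` are hypothetical names.

```rust
/// Interleave two little-endian u64 values byte by byte:
/// [p0, l0, p1, l1, ..., p7, l7].
fn interleave(position: u64, len: u64) -> [u8; 16] {
    let p = position.to_le_bytes();
    let l = len.to_le_bytes();
    let mut out = [0u8; 16];
    for i in 0..8 {
        out[2 * i] = p[i];
        out[2 * i + 1] = l[i];
    }
    out
}

/// Inverse of `interleave`.
fn deinterleave(b: &[u8; 16]) -> (u64, u64) {
    let mut p = [0u8; 8];
    let mut l = [0u8; 8];
    for i in 0..8 {
        p[i] = b[2 * i];
        l[i] = b[2 * i + 1];
    }
    (u64::from_le_bytes(p), u64::from_le_bytes(l))
}

/// The naive layout for comparison: all position bytes, then all length bytes.
fn concatenated(position: u64, len: u64) -> [u8; 16] {
    let mut out = [0u8; 16];
    out[..8].copy_from_slice(&position.to_le_bytes());
    out[8..].copy_from_slice(&len.to_le_bytes());
    out
}

/// Number of bytes left after trimming trailing zeros.
fn used_width(block: &[u8]) -> usize {
    block.len() - block.iter().rev().take_while(|b| **b == 0).count()
}
```

With a position of `0x1234` and a length of `3`, the concatenated layout keeps 9 bytes after trimming, because the nonzero length byte sits behind six zero position bytes. The interleaved layout keeps 3 bytes, since the zero high bytes of both values end up together at the tail.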

Fixes #112934 (probably)
Fixes #103607
Fixes #111855

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Jul 10, 2023
@saethlin
Member Author

@bors try @rust-timer queue

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jul 10, 2023
@saethlin saethlin removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. S-waiting-on-perf Status: Waiting on a perf run to be completed. labels Jul 10, 2023
@bors
Contributor

bors commented Jul 10, 2023

⌛ Trying commit 36f73e3842661138a4cbda3cd8f22ef90494c53c with merge 819796cbc9cc9775f4fed20477f602cf43cbda6b...

@saethlin saethlin added the S-experimental Status: Ongoing experiment that does not require reviewing and won't be merged in its current state. label Jul 10, 2023
@bors
Contributor

bors commented Jul 10, 2023

☀️ Try build successful - checks-actions
Build commit: 819796cbc9cc9775f4fed20477f602cf43cbda6b (819796cbc9cc9775f4fed20477f602cf43cbda6b)

@rust-timer
Collaborator

Finished benchmarking commit (819796cbc9cc9775f4fed20477f602cf43cbda6b): comparison URL.

Overall result: ❌ regressions - ACTION NEEDED

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 0.7% | [0.4%, 1.4%] | 6 |
| Regressions ❌ (secondary) | 0.9% | [0.2%, 1.5%] | 23 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | 0.7% | [0.4%, 1.4%] | 6 |

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | 2.9% | [1.2%, 5.0%] | 4 |
| Improvements ✅ (primary) | -1.3% | [-2.4%, -0.6%] | 33 |
| Improvements ✅ (secondary) | -2.7% | [-4.0%, -1.6%] | 9 |
| All ❌✅ (primary) | -1.3% | [-2.4%, -0.6%] | 33 |

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 2.6% | [2.2%, 3.0%] | 2 |
| Regressions ❌ (secondary) | 3.0% | [2.3%, 3.5%] | 11 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | 2.6% | [2.2%, 3.0%] | 2 |

Binary size

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -3.9% | [-10.4%, -0.3%] | 126 |
| Improvements ✅ (secondary) | -5.3% | [-15.6%, -0.1%] | 75 |
| All ❌✅ (primary) | -3.9% | [-10.4%, -0.3%] | 126 |

Bootstrap: 656.907s -> 658.694s (0.27%)

@rustbot rustbot added the perf-regression Performance regression. label Jul 10, 2023
@saethlin
Member Author

@bors try @rust-timer queue

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jul 10, 2023
@bors
Contributor

bors commented Jul 10, 2023

⌛ Trying commit 2c29776342dac9316da5f030070478a8870f8705 with merge e89b1c5248a7c7386b88f9ee784339821d187bb3...

@bors
Contributor

bors commented Jul 10, 2023

☀️ Try build successful - checks-actions
Build commit: e89b1c5248a7c7386b88f9ee784339821d187bb3 (e89b1c5248a7c7386b88f9ee784339821d187bb3)

@rust-timer
Collaborator

Finished benchmarking commit (e89b1c5248a7c7386b88f9ee784339821d187bb3): comparison URL.

Overall result: ❌ regressions - ACTION NEEDED

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 0.8% | [0.5%, 1.5%] | 7 |
| Regressions ❌ (secondary) | 1.1% | [0.4%, 1.5%] | 18 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | 0.8% | [0.5%, 1.5%] | 7 |

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | 2.9% | [1.5%, 4.3%] | 4 |
| Improvements ✅ (primary) | -1.4% | [-2.5%, -0.6%] | 34 |
| Improvements ✅ (secondary) | -2.4% | [-3.7%, -1.3%] | 6 |
| All ❌✅ (primary) | -1.4% | [-2.5%, -0.6%] | 34 |

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 2.9% | [2.9%, 2.9%] | 1 |
| Regressions ❌ (secondary) | 2.5% | [1.3%, 2.9%] | 8 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | 2.9% | [2.9%, 2.9%] | 1 |

Binary size

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -3.9% | [-10.4%, -0.3%] | 126 |
| Improvements ✅ (secondary) | -5.3% | [-15.6%, -0.1%] | 75 |
| All ❌✅ (primary) | -3.9% | [-10.4%, -0.3%] | 126 |

Bootstrap: 656.951s -> 656.796s (-0.02%)

@rustbot rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jul 10, 2023
@saethlin
Member Author

@bors try @rust-timer queue

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jul 10, 2023
@bors
Contributor

bors commented Jul 10, 2023

⌛ Trying commit 17d530e19db0f3e15ea106e3975da1ba0c707f4c with merge a1d3d10f347e823a591407c9fbe1b81d62ee67af...

@saethlin saethlin added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-experimental Status: Ongoing experiment that does not require reviewing and won't be merged in its current state. labels Jul 24, 2023
@saethlin
Member Author

I forgot to update the labels, oops.

@apiraino
Contributor

going to reroll r? compiler

Contributor

@b-naber b-naber left a comment

Lgtm. r=me with nits

let len = self.num_elems;
let len: u32 = len.try_into().unwrap();
len.write_to_bytes(meta_bytes);
for i in 0..8 {
Contributor

Can you please write a comment here stating the motivation for this interleaving? This code would be very difficult to understand without context.

@@ -136,7 +136,8 @@ impl<T> LazyArray<T> {
/// eagerly and in-order.
struct LazyTable<I, T> {
position: NonZeroUsize,
encoded_size: usize,
width: usize,
Contributor

Maybe add some comments on those?

let ([position_bytes, meta_bytes], []) = b.as_chunks::<4>() else { panic!() };
if *meta_bytes == [0; 4] {
fn from_bytes(b: &[u8; 16]) -> Self {
let mut position = [0u8; 8];
Contributor

Maybe extract this to a function since this code is duplicated? Also a small comment referencing the idea behind the interleaving would be nice.

@saethlin saethlin removed the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Aug 29, 2023
@saethlin
Member Author

@bors r=b-naber

@bors
Contributor

bors commented Aug 30, 2023

📌 Commit 225b3c0 has been approved by b-naber

It is now in the queue for this repository.

@bors bors added the S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. label Aug 30, 2023
@saethlin saethlin added the relnotes Marks issues that should be documented in the release notes of the next release. label Aug 30, 2023
@saethlin
Member Author

saethlin commented Aug 30, 2023

I'm adding relnotes now so that I don't forget.

If we manage to improve the bitcode embedding strategy, then this and a handful of other PRs I've landed will combine to make a number of programs compile that previously could not be compiled on account of their size. People who have run into such issues and quietly hacked around the situation will probably be happy to hear some good news.

@bors
Contributor

bors commented Aug 30, 2023

⌛ Testing commit 225b3c0 with merge d64c845...

@bors
Contributor

bors commented Aug 30, 2023

☀️ Test successful - checks-actions
Approved by: b-naber
Pushing d64c845 to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Aug 30, 2023
@bors bors merged commit d64c845 into rust-lang:master Aug 30, 2023
@rustbot rustbot added this to the 1.74.0 milestone Aug 30, 2023
@saethlin saethlin deleted the adaptive-tables branch August 30, 2023 04:30
@rust-timer
Collaborator

Finished benchmarking commit (d64c845): comparison URL.

Overall result: ❌ regressions - ACTION NEEDED

Next Steps: If you can justify the regressions found in this perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please open an issue or create a new PR that fixes the regressions, add a comment linking to the newly created issue or PR, and then add the perf-regression-triaged label to this PR.

@rustbot label: +perf-regression
cc @rust-lang/wg-compiler-performance

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 0.5% | [0.2%, 1.2%] | 11 |
| Regressions ❌ (secondary) | 0.8% | [0.2%, 1.3%] | 26 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | 0.5% | [0.2%, 1.2%] | 11 |

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 1.0% | [0.6%, 1.3%] | 10 |
| Regressions ❌ (secondary) | 3.6% | [1.5%, 12.8%] | 14 |
| Improvements ✅ (primary) | -1.5% | [-2.8%, -0.6%] | 10 |
| Improvements ✅ (secondary) | -2.4% | [-3.9%, -0.6%] | 6 |
| All ❌✅ (primary) | -0.2% | [-2.8%, 1.3%] | 20 |

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 4.1% | [0.7%, 6.4%] | 7 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | 4.1% | [0.7%, 6.4%] | 7 |

Binary size

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -4.0% | [-10.3%, -0.3%] | 129 |
| Improvements ✅ (secondary) | -5.5% | [-18.3%, -0.1%] | 75 |
| All ❌✅ (primary) | -4.0% | [-10.3%, -0.3%] | 129 |

Bootstrap: 631.192s -> 631.341s (0.02%)
Artifact size: 317.47 MiB -> 316.61 MiB (-0.27%)

@saethlin
Member Author

saethlin commented Aug 30, 2023

The regressions are only in instruction counts on doc builds. Cycles and wall time report some changes, but they are not reproduced by the pre-merge perf run and are within the noise envelope based on the recent graphs.

This PR makes the rmeta format more complex in order to shrink it, so I think the instruction counts are not necessarily indicative of a "the compiler is slower" effect, but rather that we are using more instructions to do less work overall.

Alternatively, the icount regressions are justified by the binary size improvements.

@rustbot label: +perf-regression-triaged

@rustbot rustbot added the perf-regression-triaged The performance regression has been triaged. label Aug 30, 2023
@saethlin saethlin changed the title from "Adapt table sizes to the contents" to "Adapt table sizes to the contents, accommodating u64 rmeta offsets" Sep 4, 2023
@saethlin saethlin removed the relnotes Marks issues that should be documented in the release notes of the next release. label Sep 23, 2023
@saethlin
Member Author

Removing relnotes: if we manage to improve the bitcode embedding, it won't land in 1.74 with this PR.

Labels
merged-by-bors This PR was explicitly merged by bors. perf-regression Performance regression. perf-regression-triaged The performance regression has been triaged. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.