Optimize impl_read_unsigned_leb128
#92604
Conversation
impl_read_unsigned_leb128
@bors try @rust-timer queue
Awaiting bors try build completion. @rustbot label: +S-waiting-on-perf
⌛ Trying commit ac6e14fe68386ccb4988623104c2f36f2fdcdbb9 with merge 2817d16b0dd82139c66514d1ed9609a37366c53d...
Force-pushed from ac6e14f to 16869c2
Force-pushed from 16869c2 to a18ddee
Awaiting bors try build completion. @rustbot label: +S-waiting-on-perf
⌛ Trying commit a18ddee2ead17a25fd616314fd0e30524b5fd04e with merge 7775b1f299a2e55b29c22ee1c04e4b31a08e34f8...
☀️ Try build successful - checks-actions
Queued 7775b1f299a2e55b29c22ee1c04e4b31a08e34f8 with parent f1ce0e6, future comparison URL.
Finished benchmarking commit (7775b1f299a2e55b29c22ee1c04e4b31a08e34f8): comparison url. Summary: This change led to moderate relevant mixed results 🤷 in compiler performance.
If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf. Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR led to changes in compiler perf.
Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @bors rollup=never
// The first iteration of the loop is unpeeled because this code is
// hot and integers less than 128 are very common, occurring 80%+
// of the time.
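For readers skimming the thread, here is a minimal sketch of the unpeeled pattern that comment describes. The function name and the slice-plus-position interface are illustrative assumptions, not rustc's exact code:

```rust
// Sketch of reading an unsigned LEB128 value with the first loop
// iteration unpeeled: values below 128 fit in a single byte (the
// common case), so they are handled without entering the loop.
fn read_unsigned_leb128(data: &[u8], position: &mut usize) -> u64 {
    let byte = data[*position];
    *position += 1;
    if byte & 0x80 == 0 {
        // One-byte fast path: no shifting, no loop setup.
        return byte as u64;
    }
    let mut result = (byte & 0x7f) as u64;
    let mut shift = 7;
    loop {
        let byte = data[*position];
        *position += 1;
        if byte & 0x80 == 0 {
            result |= (byte as u64) << shift;
            return result;
        }
        result |= ((byte & 0x7f) as u64) << shift;
        shift += 7;
    }
}

fn main() {
    let buf = [0x7fu8]; // 127 encodes in one byte
    let mut pos = 0;
    assert_eq!(read_unsigned_leb128(&buf, &mut pos), 127);

    let buf = [0x80u8, 0x01]; // 128 needs two bytes
    let mut pos = 0;
    assert_eq!(read_unsigned_leb128(&buf, &mut pos), 128);
}
```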
Is that true for all types? IIRC, that's very common for `usize`, but much less so for `u64`, for example. Maybe it would make sense to have two implementations, one optimized for small values and another optimized for larger values?
Indeed, it could be a good idea to generalize the impl for various "categories" of numbers (often small, often large); a similar approach helped recently in hashing: #92103.
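To make the "two categories" idea concrete, here is a hedged sketch of what the counterpart decoder might look like: a plain tight loop with no unpeeled first byte, the shape one would expect to prefer when values are usually large. The function name and interface are invented for illustration:

```rust
// Hypothetical decoder biased toward usually-large values: no
// single-byte fast path, just a uniform loop over all bytes.
fn read_leb128_large_biased(data: &[u8], position: &mut usize) -> u64 {
    let mut result: u64 = 0;
    let mut shift = 0;
    loop {
        let byte = data[*position];
        *position += 1;
        result |= ((byte & 0x7f) as u64) << shift;
        if byte & 0x80 == 0 {
            return result;
        }
        shift += 7;
    }
}

fn main() {
    // u64::MAX takes the full 10 bytes: nine 0xff continuation
    // bytes followed by 0x01.
    let buf = [0xffu8, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0x01];
    let mut pos = 0;
    assert_eq!(read_leb128_large_biased(&buf, &mut pos), u64::MAX);
}
```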
Good point! I did some follow-up investigation, and saw that this accounts for the regressions seen in some cases. For example, the ctfe-stress-4 check incr-unchanged regression seen on the CI run has this distribution of leb128 lengths for read `u64` values:
2361790 counts:
( 1) 1311716 (55.5%, 55.5%): u64 1
( 2) 1049356 (44.4%,100.0%): u64 10
( 3) 574 ( 0.0%,100.0%): u64 9
( 4) 115 ( 0.0%,100.0%): u64 2
( 5) 20 ( 0.0%,100.0%): u64 4
( 6) 5 ( 0.0%,100.0%): u64 3
( 7) 3 ( 0.0%,100.0%): u64 8
( 8) 1 ( 0.0%,100.0%): u64 7
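Those two buckets are what the encoding itself predicts: values below 128 fit in one byte, while `u64` values with their high bits set (e.g. `-1i64` stored as a `u64`) need ceil(64/7) = 10 bytes. A small sketch of the length calculation:

```rust
// Number of bytes needed to LEB128-encode a u64: each byte carries
// 7 value bits, so the length is ceil(significant_bits / 7),
// clamped to at least 1 for the value 0.
fn leb128_len(value: u64) -> usize {
    let bits = 64 - value.leading_zeros() as usize;
    std::cmp::max(1, (bits + 6) / 7)
}

fn main() {
    assert_eq!(leb128_len(0), 1);
    assert_eq!(leb128_len(127), 1);       // largest 1-byte value
    assert_eq!(leb128_len(128), 2);
    assert_eq!(leb128_len(u64::MAX), 10); // e.g. -1i64 as u64
}
```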
FWIW, I looked at reading/writing LEB128 recently. I tried two things:
etc. It helped for some benchmarks, but in the end it was mostly a regression, probably because the code was much larger. But I think that if a good compromise on the number of branches were found, it could be an improvement, because right now the loop is quite tight and the repeated increment can slow it down.
It's a small but clear performance win.
I have split out the second commit ("Modify the buffer position directly when reading leb128 values") into its own PR (#92631) because it's a straightforward win for all types. Once that is merged I'll return here and work through the more subtle changes that are type-dependent. Thanks for the feedback.
Force-pushed from a18ddee to facba24
I did some more investigation.
So I now have two commits in this PR, and if the CI run works well I think we should land them both here, and drop #92631. Sorry for any confusion!
@bors try @rust-timer queue
📌 Commit facba24 has been approved by
https://crates.io/crates/varint-simd |
It seems like getting any benefit from that crate requires compiling rustc with at least the ssse3 target feature. Rustc is currently compiled to be suitable for all x86_64 CPUs.
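As a rough illustration of that constraint: since rustc targets baseline x86_64, any SIMD path would have to be selected at runtime rather than at compile time. This sketch shows only the feature check; the dispatch described in the comments is hypothetical and is not varint-simd's actual API:

```rust
// Minimal sketch: runtime SSSE3 detection on x86_64, which any
// varint-simd-style fast path inside a baseline-x86_64 rustc would
// need to perform before it could be taken. (Illustrative only.)
fn main() {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("ssse3") {
            // Here one could dispatch to a SIMD LEB128 decoder.
            println!("SSSE3 available: SIMD fast path possible");
        } else {
            // Otherwise, fall back to the scalar loop.
            println!("SSSE3 unavailable: scalar fallback required");
        }
    }
}
```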
⌛ Testing commit facba24 with merge e7fc933df9c0bfd4e10cc827cdcf6bb44afb4769... |
💥 Test timed out |
@bors retry Test timed out
⌛ Testing commit facba24 with merge 1078f1d405bf2eacdfc6efb40f0ef7aa797cbabf... |
💔 Test failed - checks-actions |
@bors retry Some apple builder is having network issues. "unable to access 'https://github.com/rust-lang-ci/rust/': Could not resolve host: github.com"
☀️ Test successful - checks-actions |
Finished benchmarking commit (38c22af): comparison url. Summary: This change led to large relevant improvements 🎉 in compiler performance.
If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.
@rustbot label: -perf-regression
I see instruction count improvements of up to 3.5% locally with these changes, mostly on the smaller benchmarks.
r? @michaelwoerister