Iterate over the smaller list #78323

est31 · 2020-10-24T13:59:41Z

If there are two lists of different sizes,
iterating over the smaller list and then
looking up in the larger list is cheaper
than vice versa, because lookups scale
sublinearly.

If there are two lists of different sizes, iterating over the smaller list and then looking up in the larger list is cheaper than vice versa, because lookups scale sublinearly.

rust-highfive · 2020-10-24T13:59:44Z

r? @varkor

(rust_highfive has picked a reviewer for you, use r? to override)

est31 · 2020-10-24T14:00:39Z

see #78317 for context.

@Mark-Simulacrum could this get a perf run?

jonas-schievink · 2020-10-24T14:06:38Z

@bors try @rust-timer queue

rust-timer · 2020-10-24T14:06:39Z

Awaiting bors try build completion

bors · 2020-10-24T14:06:49Z

⌛ Trying commit a21c2eb with merge 6d375b994aad05109a6a39de4ae01e8ed8940f4d...

bors · 2020-10-24T14:49:16Z

☀️ Try build successful - checks-actions
Build commit: 6d375b994aad05109a6a39de4ae01e8ed8940f4d (6d375b994aad05109a6a39de4ae01e8ed8940f4d)

rust-timer · 2020-10-24T14:49:17Z

Queued 6d375b994aad05109a6a39de4ae01e8ed8940f4d with parent 2e8a54a, future comparison URL.

rust-timer · 2020-10-24T17:52:40Z

Finished benchmarking try commit (6d375b994aad05109a6a39de4ae01e8ed8940f4d): comparison url.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below by specifying rollup- to bors.

Importantly, though, if the results of this run are non-neutral do not roll this PR up -- it will mask other regressions or improvements in the roll up.

@bors rollup=never

est31 · 2020-10-25T05:58:38Z

The biggest change is in the bootstrap timings, many of which going down in total compile time by 2% and 1%. I'm not sure how noisy they are but if multiple go down at the same time, can it be due to noise?

There's a tiny change in packed-simd-opt run incr-unchanged of 0.062 seconds to 0.059 seconds for crate_inherent_impls_overlap_check, so from 1.70% to 1.63% of the total compile time.

Local benchmarks of stm32f4 give me a regression, likely because its impl blocks are highly-homogenous and thus don't benefit from the optimization (the optimization is best for heterogenous impl blocks) and instead one feels the overhead of doing the length comparison... It also calls traits::overlapping_impls very often so that the pre-check code is only fluff that's in the way. Edit: turns out to be an artifact of how I measured, see my comment below.

I have added a commit to this PR that saves some allocations in the arena. Would it be possible to get another perf run for the PR? Then I can compare the two commits to get the impact of that new commit... that works, right?

jyn514 · 2020-10-25T09:36:24Z

@bors try @rust-timer queue

rust-timer · 2020-10-25T09:36:25Z

Awaiting bors try build completion

bors · 2020-10-25T09:36:35Z

⌛ Trying commit 5afdb1efe64d306a5ed9dbb528da7c0f50d4258b with merge f5b6e9824f9430f5bbe58b7c12c4a69612e93a30...

bors · 2020-10-25T10:17:48Z

☀️ Try build successful - checks-actions
Build commit: f5b6e9824f9430f5bbe58b7c12c4a69612e93a30 (f5b6e9824f9430f5bbe58b7c12c4a69612e93a30)

rust-timer · 2020-10-25T10:17:50Z

Queued f5b6e9824f9430f5bbe58b7c12c4a69612e93a30 with parent 3e0dd24, future comparison URL.

rust-timer · 2020-10-25T12:34:04Z

Finished benchmarking try commit (f5b6e9824f9430f5bbe58b7c12c4a69612e93a30): comparison url.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below by specifying rollup- to bors.

Importantly, though, if the results of this run are non-neutral do not roll this PR up -- it will mask other regressions or improvements in the roll up.

@bors rollup=never

est31 · 2020-10-25T18:11:31Z

Now it's red again for the bootstrap times, but green in the instruction counts. Maybe it's noise? Maybe it was noise in #78317 as well?

IDK should I close this or should it be merged? What do you think?

bugadani

I don't know how wall times are measured exactly (well, besides starting and stopping a clock along with the process) and how sensitive is perf to it's environment (processes on the VM, other VMs on the node, etc.). I wouldn't consider a 2% change a huge one there. On the other hand, I don't know enough about the setup to know for sure.

Around half of the averages are 0.0%, others show a tiny improvement. Not much, but I can't see any significant regressions, so I don't see why this shouldn't be accepted.

compiler/rustc_typeck/src/coherence/inherent_impls_overlap.rs

The associated_items(def_id) call allocates internally. Previously, we'd have called it for each pair, so we'd have had O(n^2) many calls. By precomputing the associated items, we avoid repeating so many allocations. The only instance where this precomputation would be a regression is if there's only one inherent impl block for the type, as the inner loop then doesn't run. In that instance, we just early return. Also, use SmallVec to avoid doing an allocation at all if the number is small (the case for most impl blocks out there).

est31 · 2020-10-26T01:35:22Z

Sooo... apparently the way I made local benchmarks of stm32f4 was flawed: I rebased my PR onto the commit of the latest nightly, then compared a local build of rustc with the PR with the nightly. But they were built in different environments, so the local build was a bit slower. If I take the PR's base, the same code that the nightly uses, and compare it to the nightly, I get a slowdown of my local build as well. So I changed the way I run benchmarks to compare two local builds: once of the code as of this PR, and with a local build of the PR's base. Then I see a noticeable speedup:

features	cargo +local-bare check	cargo +local check
stm32f401,stm32f405	31.838s	31.340s
stm32f401,stm32f405,stm32f407,stm32f410, stm32f411	1m34.270s	1m33.241s
stm32f401,stm32f405,stm32f407,stm32f410, stm32f411,stm32f412	2m3.626s	2m1.451s
stm32f401,stm32f405,stm32f407,stm32f410, stm32f411,stm32f412,stm32f413,stm32f427	3m35.581s	3m30.222s
stm32f401,stm32f405,stm32f407,stm32f410, stm32f411,stm32f412,stm32f413,stm32f427, stm32f429,stm32f446,stm32f469	7m28.523s	7m17.673s

Up to 9 seconds speedup if all stm32f4 features are enabled!

This helped me convince of the worth of this PR.

est31 · 2020-10-26T05:21:16Z

Also ran some -Zself-profile. In the instance of 8 enabled features (4th line in the table) crate_inherent_impls_overlap_check took 28.64s before this PR, and 23.53s after, using 8-10% of the entire compile time.

varkor · 2020-10-28T00:24:53Z

@bors r+

bors · 2020-10-28T00:24:54Z

📌 Commit 6c9b8ad has been approved by varkor

bors · 2020-10-28T03:58:38Z

⌛ Testing commit 6c9b8ad with merge 2eb4fc8...

bors · 2020-10-28T06:10:18Z

☀️ Test successful - checks-actions
Approved by: varkor
Pushing 2eb4fc8 to master...

Mark-Simulacrum · 2020-11-03T21:32:46Z

Appears to have been a slight win on packed-simd incr-unchanged instruction counts, though it is unclear that this had any effect on cpu cycles or wall times.

est31 · 2020-11-04T00:33:27Z

@Mark-Simulacrum yes that's consistent with what the manual benchmark runs showed. The wall clock time changes are too small for it to matter for smaller workloads in the builtin benchmark testsuite, but they are observable with real life workloads like stm32f4 (see comments above).

Iterate over the smaller list

a21c2eb

If there are two lists of different sizes, iterating over the smaller list and then looking up in the larger list is cheaper than vice versa, because lookups scale sublinearly.

rust-highfive assigned varkor Oct 24, 2020

rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Oct 24, 2020

bugadani reviewed Oct 25, 2020

View reviewed changes

compiler/rustc_typeck/src/coherence/inherent_impls_overlap.rs Outdated Show resolved Hide resolved

est31 force-pushed the smaller_list_overlap branch from 5afdb1e to 6c9b8ad Compare October 25, 2020 21:51

bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Oct 28, 2020

bors added the merged-by-bors This PR was explicitly merged by bors. label Oct 28, 2020

bors merged commit 2eb4fc8 into rust-lang:master Oct 28, 2020

rustbot added this to the 1.49.0 milestone Oct 28, 2020

bors mentioned this pull request Oct 28, 2020

Turn quadratic time on number of impl blocks into linear time #78317

Merged

est31 mentioned this pull request Dec 6, 2020

Add stm327xx-hal benchmark rust-lang/rustc-perf#802

Merged

est31 mentioned this pull request Jan 13, 2023

Move format_args!() into AST (and expand it during AST lowering) #106745

Merged

matthiaskrgr mentioned this pull request Apr 13, 2024

ICE: unsized arguments must not be 'extern' types #123887

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Iterate over the smaller list #78323

Iterate over the smaller list #78323

est31 commented Oct 24, 2020

rust-highfive commented Oct 24, 2020

est31 commented Oct 24, 2020

jonas-schievink commented Oct 24, 2020

rust-timer commented Oct 24, 2020

bors commented Oct 24, 2020

bors commented Oct 24, 2020

rust-timer commented Oct 24, 2020

rust-timer commented Oct 24, 2020

est31 commented Oct 25, 2020 •

edited

Loading

jyn514 commented Oct 25, 2020

rust-timer commented Oct 25, 2020

bors commented Oct 25, 2020

bors commented Oct 25, 2020

rust-timer commented Oct 25, 2020

rust-timer commented Oct 25, 2020

est31 commented Oct 25, 2020

bugadani left a comment

est31 commented Oct 26, 2020

est31 commented Oct 26, 2020

varkor commented Oct 28, 2020

bors commented Oct 28, 2020

bors commented Oct 28, 2020

bors commented Oct 28, 2020

Mark-Simulacrum commented Nov 3, 2020

est31 commented Nov 4, 2020

Iterate over the smaller list #78323

Iterate over the smaller list #78323

Conversation

est31 commented Oct 24, 2020

rust-highfive commented Oct 24, 2020

est31 commented Oct 24, 2020

jonas-schievink commented Oct 24, 2020

rust-timer commented Oct 24, 2020

bors commented Oct 24, 2020

bors commented Oct 24, 2020

rust-timer commented Oct 24, 2020

rust-timer commented Oct 24, 2020

est31 commented Oct 25, 2020 • edited Loading

jyn514 commented Oct 25, 2020

rust-timer commented Oct 25, 2020

bors commented Oct 25, 2020

bors commented Oct 25, 2020

rust-timer commented Oct 25, 2020

rust-timer commented Oct 25, 2020

est31 commented Oct 25, 2020

bugadani left a comment

Choose a reason for hiding this comment

est31 commented Oct 26, 2020

est31 commented Oct 26, 2020

varkor commented Oct 28, 2020

bors commented Oct 28, 2020

bors commented Oct 28, 2020

bors commented Oct 28, 2020

Mark-Simulacrum commented Nov 3, 2020

est31 commented Nov 4, 2020

est31 commented Oct 25, 2020 •

edited

Loading