Iterate over the smaller list #78323
If there are two lists of different sizes, iterating over the smaller list and then looking up in the larger list is cheaper than vice versa, because lookups scale sublinearly.
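A minimal sketch of the idea. It assumes std's BTreeSet as a stand-in for the compiler's own data structures, because its lookups are O(log n), which is sublinear. The function name `common` is hypothetical. With m elements in the smaller set and n in the larger, probing the larger set costs O(m log n) rather than O(n log m).

```rust
use std::collections::BTreeSet;

/// Hypothetical helper: collects the elements common to both sets.
/// It always iterates over the smaller set and looks each element up in the
/// larger one. It also returns the number of lookups performed, so the
/// saving is visible.
fn common<T: Ord + Clone>(a: &BTreeSet<T>, b: &BTreeSet<T>) -> (Vec<T>, usize) {
    // Pick the iteration order by size, not by argument position.
    let (small, large) = if a.len() <= b.len() { (a, b) } else { (b, a) };
    let mut lookups = 0usize;
    let found: Vec<T> = small
        .iter()
        .filter(|x| {
            lookups += 1; // one O(log n) probe into the larger set
            large.contains(*x)
        })
        .cloned()
        .collect();
    (found, lookups)
}
```

With a 1000-element set and a 3-element set, only 3 lookups happen regardless of argument order. Iterating the other way would cost 1000 lookups.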
r? @varkor (rust_highfive has picked a reviewer for you, use r? to override)
See #78317 for context. @Mark-Simulacrum, could this get a perf run?
@bors try @rust-timer queue
Awaiting bors try build completion
⌛ Trying commit a21c2eb with merge 6d375b994aad05109a6a39de4ae01e8ed8940f4d...
☀️ Try build successful - checks-actions
Queued 6d375b994aad05109a6a39de4ae01e8ed8940f4d with parent 2e8a54a, future comparison URL.
Finished benchmarking try commit (6d375b994aad05109a6a39de4ae01e8ed8940f4d): comparison url.
Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below by specifying …
Importantly, though, if the results of this run are non-neutral, do not roll this PR up: it will mask other regressions or improvements in the roll up.
@bors rollup=never
The biggest change is in the bootstrap timings, many of which went down in total compile time by 1% or 2%. I'm not sure how noisy they are, but if several go down at the same time, can that be due to noise? There's a tiny change in …
I have added a commit to this PR that saves some allocations in the arena. Would it be possible to get another perf run for the PR? Then I can compare the two commits to measure the impact of the new one... that works, right?
@bors try @rust-timer queue
Awaiting bors try build completion
⌛ Trying commit 5afdb1efe64d306a5ed9dbb528da7c0f50d4258b with merge f5b6e9824f9430f5bbe58b7c12c4a69612e93a30...
☀️ Try build successful - checks-actions
Queued f5b6e9824f9430f5bbe58b7c12c4a69612e93a30 with parent 3e0dd24, future comparison URL.
Finished benchmarking try commit (f5b6e9824f9430f5bbe58b7c12c4a69612e93a30): comparison url.
Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below by specifying …
Importantly, though, if the results of this run are non-neutral, do not roll this PR up: it will mask other regressions or improvements in the roll up.
@bors rollup=never
Now the bootstrap times are red again, but the instruction counts are green. Maybe it's noise? Maybe it was noise in #78317 as well? I don't know: should I close this, or should it be merged? What do you think?
I don't know exactly how wall times are measured (well, besides starting and stopping a clock along with the process) or how sensitive perf is to its environment (processes on the VM, other VMs on the node, etc.). I wouldn't consider a 2% change a huge one there. On the other hand, I don't know enough about the setup to say for sure.
Around half of the averages are 0.0%; the others show a tiny improvement. It's not much, but I can't see any significant regressions, so I don't see why this shouldn't be accepted.
The associated_items(def_id) call allocates internally. Previously, we called it for each pair of impl blocks, so we made O(n^2) calls. By precomputing the associated items once per impl block, we avoid repeating so many allocations. The only case where this precomputation would be a regression is when the type has only one inherent impl block, because the inner loop then doesn't run; in that case, we just return early. Also, use SmallVec to avoid allocating at all when the number is small (which is the case for most impl blocks out there).
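The precomputation can be sketched as follows. Everything here is a hypothetical model, not rustc's code: `associated_items` only imitates the allocating query and counts its calls, impl blocks are plain lists of item names, and a plain Vec stands in for SmallVec so the sketch needs no external crate.

```rust
use std::cell::Cell;

/// Hypothetical stand-in for the allocating `associated_items(def_id)` query:
/// every call allocates a fresh Vec, and `calls` records how often it ran.
fn associated_items(impl_items: &[&'static str], calls: &Cell<usize>) -> Vec<&'static str> {
    calls.set(calls.get() + 1);
    impl_items.to_vec()
}

/// Returns the pairs (i, j), i < j, of impl blocks that define an item with the same name.
fn overlapping_pairs(impls: &[Vec<&'static str>], calls: &Cell<usize>) -> Vec<(usize, usize)> {
    // A single impl block has nothing to compare against, so return early
    // instead of paying for the precomputation.
    if impls.len() <= 1 {
        return Vec::new();
    }
    // Query once per impl block (n calls) instead of once per pair (O(n^2) calls).
    let items: Vec<Vec<&'static str>> =
        impls.iter().map(|i| associated_items(i, calls)).collect();
    let mut pairs = Vec::new();
    for i in 0..items.len() {
        for j in (i + 1)..items.len() {
            // Reuse the precomputed lists; no further allocation happens here.
            if items[i].iter().any(|name| items[j].contains(name)) {
                pairs.push((i, j));
            }
        }
    }
    pairs
}
```

For four impl blocks the query runs 4 times. Calling it inside the pair loop would run it twice for each of the 6 pairs.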
5afdb1e to 6c9b8ad
Sooo... apparently the way I made local benchmarks of …
Up to 9 seconds speedup if all … This helped convince me of the worth of this PR.
Also ran some …
@bors r+
📌 Commit 6c9b8ad has been approved by …
☀️ Test successful - checks-actions |
Appears to have been a slight win on packed-simd incr-unchanged instruction counts, though it is unclear whether this had any effect on CPU cycles or wall times.
@Mark-Simulacrum, yes, that's consistent with what the manual benchmark runs showed. The wall-clock time changes are too small to matter for the smaller workloads in the built-in benchmark suite, but they are observable with real-life workloads like stm32f4 (see comments above).