-
Notifications
You must be signed in to change notification settings - Fork 12.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merge into larger interval set #120024
Merge into larger interval set #120024
Conversation
This reduces the work done while merging rows. In at least one case (issue 50450), we have thousands of union([range], [20,000 ranges]), which previously inserted each of the 20,000 ranges one by one. Now we only insert one range into the right hand set after copying the set over.
r? @cjgillot (rustbot has picked a reviewer for you, use r? to override) |
@bors try @rust-timer queue |
This comment has been minimized.
This comment has been minimized.
…<try> Merge into larger interval set This reduces the work done while merging rows. In at least one case (rust-lang#50450), we have thousands of union([range], [20,000 ranges]), which previously inserted each of the 20,000 ranges one by one. Now we only insert one range into the right hand set after copying the set over. This cuts the runtime of the test case in rust-lang#50450 from ~26 seconds to ~6 seconds locally, though it doesn't change the memory usage peak (~9.5GB).
☀️ Try build successful - checks-actions |
This comment has been minimized.
This comment has been minimized.
Finished benchmarking commit (204a1d9): comparison URL. Overall result: ❌ regressions - ACTION NEEDEDBenchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf. Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @bors rollup=never Instruction countThis is a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage)ResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
CyclesThis benchmark run did not return any relevant results for this metric. Binary sizeThis benchmark run did not return any relevant results for this metric. Bootstrap: 664.25s -> 665.588s (0.20%) |
No strong opinion on whether these results merit doing this or not. In some cases (see PR description) this is a large win, but it's likely that the extra check is somewhat expensive for less clear-cut cases. We could try to tune (e.g., only do this if the difference is >300 elements or something) but I'm not convinced that's warranted. The new branch costs instructions but likely is well-predicted by CPUs, so we're probably not actually regressing in the common case (or at least not significantly). |
2 questions:
For the second, I mean an algorithm close to the merge of sorted lists: take the lowest interval from both sets, expand it as much as necessary by consuming intervals, and repeat. |
At least a few call sites can't provide other by value without cloning, which would reduce to basically this same implementation (just more spread out). I think the algorithm you suggest is possible, but I'm not sure it would be much of a win. The common case for interval sets is that we have ~1-5 intervals, since they're primarily used for representing liveness (I think? Or presence?) ranges (which are usually not that disjoint). Some code can be pathological though where a variable is live in thousands of discontingous intervals - which we seem to merge with a single "self" interval. For that case any complex algorithm seems unlikely to be better - we'll need a bunch of extra logic but in the end still end up either worse off or equal (basically the ideal is a binary search + Vec::splice, but it's almost what we have here). My sense is that this case is sufficiently rare that the extra logic isn't warranted, while this simple delta perhaps is. |
Fair enough. |
☀️ Test successful - checks-actions |
Finished benchmarking commit (6351247): comparison URL. Overall result: ❌✅ regressions and improvements - ACTION NEEDEDNext Steps: If you can justify the regressions found in this perf run, please indicate this with @rustbot label: +perf-regression Instruction countThis is a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage)ResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
CyclesResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Binary sizeThis benchmark run did not return any relevant results for this metric. Bootstrap: 659.977s -> 662.476s (0.38%) |
Given that the results here mirror the pre-merge perf run results fairly closely, I think it's fair to take the review as justification that this is worth the cost to protect against the extreme case. @rustbot label: +perf-regression-triaged |
This reduces the work done while merging rows. In at least one case (#50450), we have thousands of union([range], [20,000 ranges]), which previously inserted each of the 20,000 ranges one by one. Now we only insert one range into the right hand set after copying the set over.
This cuts the runtime of the test case in #50450 from ~26 seconds to ~6 seconds locally, though it doesn't change the memory usage peak (~9.5GB).