Compute jump threading opportunities in a single pass #142821
|
Some changes occurred to MIR optimizations cc @rust-lang/wg-mir-opt |
|
@bors try @rust-timer queue |
Compute jump threading opportunities in a single pass The current implementation of jump threading walks MIR CFG backwards from each `SwitchInt` terminator. This PR replaces this by a single postorder traversal of MIR. In theory, we could do a full fixpoint dataflow analysis, but this has low returns as we forbid threading through a loop header, and we do not merge TOs yet. The second commit in this PR modifies the carried state to a lighter data structure. The current implementation uses some kind of `IndexVec<ValueIndex, &[Condition]>`. This is needlessly heavy, as the state rarely ever carries more than a few `Condition`s. The first commit replaces this state with a simpler `&[Condition]`, and puts the corresponding `ValueIndex` inside `Condition`. The last commit is the main change. It needs a fair amount of data structure tweaks, as each condition now needs to carry its chain of blocks with it.
|
☀️ Try build successful - checks-actions |
|
Finished benchmarking commit (d27b44e): comparison URL.
Overall result: ❌✅ regressions and improvements - please read the text below
Benchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf.
Next Steps: If you can justify the regressions found in this try perf run, please do so in sufficient writing along with @bors rollup=never
Instruction count: Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.
Max RSS (memory usage): Results (primary -1.9%, secondary -3.6%). A less reliable metric. May be of interest, but not used to determine the overall result above.
Cycles: Results (primary 1.6%, secondary 2.4%). A less reliable metric. May be of interest, but not used to determine the overall result above.
Binary size: Results (primary -0.1%, secondary -0.5%). A less reliable metric. May be of interest, but not used to determine the overall result above.
Bootstrap: 689.042s -> 688.964s (-0.01%) |
bdf9d85 to
3f66e3a
Compare
|
Some changes occurred in coverage tests. cc @Zalathar |
|
r? wg-mir-opt |
|
Failed to set assignee to
|
325fee6 to
b541dc6
Compare
|
oh, there are people in the wg which can't actually be assigned for review 😅 |
oli-obk
left a comment
I can yolo-review it (check that the general design makes sense and appears to be doing what it is supposed to), but I am certain I cannot antagonistically review it in the way that we should be reviewing mir opts to make sure we don't have a misoptimization. I have tried the last two weeks but I don't think I am a good reviewer for such work
rustc_index::newtype_index!(
    /// This index uniquely identifies a tracked place and therefore a slot in [`State`].
    ///
    /// It is an implementation detail of this module.
this comment is now outdated
|
r? mir |
31e6085 to
95e10dd
Compare
95e10dd to
87ac07f
Compare
|
☔ The latest upstream changes (presumably #148789) made this pull request unmergeable. Please resolve the merge conflicts. |
87ac07f to
223620f
Compare
|
This PR was rebased onto a different main commit. Here's a range-diff highlighting what actually changed. Rebasing is a normal part of keeping PRs up to date, so no action is needed—this note is just to help reviewers. |
|
I've been back and forth over this so many times that I think I've built enough confidence in it, even though I'll admit it is at the edge of my grasp. This PR is really well-organized and well-described. @bors r+ |
|
☀️ Test successful - checks-actions |
What is this? This is an experimental post-merge analysis report that shows differences in test outcomes between the merged PR and its parent PR.
Comparing 1d60f9e (parent) -> 4ad239f (this PR)
Test differences: 2 doctest diffs were found. These are ignored, as they are noisy.
Test dashboard: Run cargo run --manifest-path src/ci/citool/Cargo.toml -- \
test-dashboard 4ad239f4156aa4e7df5ac9eb90ff0ab3d0089d1c --output-dir test-dashboard
And then open
Job duration changes
How to interpret the job duration changes? Job durations can vary a lot, based on the actual runner instance |
|
Finished benchmarking commit (4ad239f): comparison URL.
Overall result: ❌✅ regressions and improvements - please read the text below
Our benchmarks found a performance regression caused by this PR.
Next Steps:
@rustbot label: +perf-regression
Instruction count: Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.
Max RSS (memory usage): Results (primary 1.5%, secondary 1.3%). A less reliable metric. May be of interest, but not used to determine the overall result above.
Cycles: Results (primary 3.5%, secondary -2.2%). A less reliable metric. May be of interest, but not used to determine the overall result above.
Binary size: Results (primary -0.2%, secondary -0.5%). A less reliable metric. May be of interest, but not used to determine the overall result above.
Bootstrap: 471.751s -> 469.27s (-0.53%) |
The current implementation of jump threading walks the MIR CFG backwards from each `SwitchInt` terminator. This PR replaces this by a single postorder traversal of MIR. In theory, we could do a full fixpoint dataflow analysis, but this has low returns as we forbid threading through a loop header.

The second commit in this PR modifies the carried state to a lighter data structure. The current implementation uses some kind of `IndexVec<ValueIndex, &[Condition]>`. This is needlessly heavy, as the state rarely ever carries more than a few `Condition`s. The first commit replaces this state with a simpler `&[Condition]`, and puts the corresponding `ValueIndex` inside `Condition`.

The three later commits are perf tweaks.

The sixth commit is the main change. Instead of carrying the goto target inside the condition, we maintain a set of conditions associated with each block, and their consequences in following blocks. Think: if this condition is fulfilled in this block, then that condition is fulfilled in that block. This makes the threading algorithm much easier to implement, without the extra bookkeeping of `ThreadingOpportunity` we had.

Later commits modify that algorithm to shrink the set of duplicated blocks, by propagating fulfilled conditions down the CFG and trimming costly threads.
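
To make the single-pass idea concrete, here is a minimal sketch on a toy CFG. It is not the code in this PR. The types `Block`, `Place`, `Condition`, `Terminator` and `BlockData`, and the functions `postorder`, `find_opportunities` and `example_cfg`, are hypothetical stand-ins rather than rustc APIs. The sketch assumes each block assigns at most one constant, only propagates conditions across `Goto` edges, and records just the endpoints of a thread, not the chain of blocks to duplicate.

```rust
type Block = usize;
type Place = usize; // hypothetical stand-in for a tracked place index

/// A condition names the place it tests, so a block's state is a flat list.
#[derive(Clone, Debug, PartialEq)]
struct Condition {
    place: Place,
    value: u128,
    switch_block: Block, // the switch whose outcome is known if the condition holds
}

enum Terminator {
    Goto(Block),
    Switch { place: Place, arms: Vec<(u128, Block)>, otherwise: Block },
    Return,
}

struct BlockData {
    assign: Option<(Place, u128)>, // simplification: at most one constant assignment
    term: Terminator,
}

fn successors(term: &Terminator) -> Vec<Block> {
    match term {
        Terminator::Goto(t) => vec![*t],
        Terminator::Switch { arms, otherwise, .. } => {
            arms.iter().map(|&(_, t)| t).chain(std::iter::once(*otherwise)).collect()
        }
        Terminator::Return => Vec::new(),
    }
}

fn visit(b: Block, blocks: &[BlockData], seen: &mut [bool], order: &mut Vec<Block>) {
    if std::mem::replace(&mut seen[b], true) {
        return;
    }
    for s in successors(&blocks[b].term) {
        visit(s, blocks, seen, order);
    }
    order.push(b); // pushed only after all reachable successors
}

/// Blocks in postorder: successors come before predecessors, back edges aside.
fn postorder(blocks: &[BlockData], entry: Block) -> Vec<Block> {
    let mut seen = vec![false; blocks.len()];
    let mut order = Vec::new();
    visit(entry, blocks, &mut seen, &mut order);
    order
}

/// One pass over the CFG. Returns `(assigning block, switch block, value)` triples.
fn find_opportunities(blocks: &[BlockData], entry: Block) -> Vec<(Block, Block, u128)> {
    // Conditions of interest at the entry of each block; empty until processed,
    // so a back edge to a not-yet-processed block propagates nothing.
    let mut state: Vec<Vec<Condition>> = vec![Vec::new(); blocks.len()];
    let mut found = Vec::new();
    for b in postorder(blocks, entry) {
        let mut conds: Vec<Condition> = match &blocks[b].term {
            Terminator::Goto(s) => state[*s].clone(),
            Terminator::Switch { place, arms, .. } => arms
                .iter()
                .map(|&(value, _)| Condition { place: *place, value, switch_block: b })
                .collect(),
            Terminator::Return => Vec::new(),
        };
        if let Some((place, constant)) = blocks[b].assign {
            for c in conds.iter().filter(|c| c.place == place && c.value == constant) {
                found.push((b, c.switch_block, c.value));
            }
            // The assignment overwrites the place, so its conditions stop here.
            conds.retain(|c| c.place != place);
        }
        state[b] = conds;
    }
    found
}

/// bb0 branches to bb1 (sets place 0 to 1) or bb2 (sets it to 2); both reach
/// bb4 through bb3, and bb4 switches on place 0.
fn example_cfg() -> Vec<BlockData> {
    let ret = || BlockData { assign: None, term: Terminator::Return };
    vec![
        BlockData { assign: None, term: Terminator::Switch { place: 1, arms: vec![(0, 1)], otherwise: 2 } },
        BlockData { assign: Some((0, 1)), term: Terminator::Goto(3) },
        BlockData { assign: Some((0, 2)), term: Terminator::Goto(3) },
        BlockData { assign: None, term: Terminator::Goto(4) },
        BlockData { assign: None, term: Terminator::Switch { place: 0, arms: vec![(1, 5), (2, 6)], otherwise: 7 } },
        ret(), ret(), ret(),
    ]
}
```

Calling `find_opportunities(&example_cfg(), 0)` yields `(1, 4, 1)` and `(2, 4, 2)`: bb1 and bb2 each decide the switch in bb4. Because every block is visited once, after its successors, no fixpoint iteration is needed.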