-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Permit MIR inlining without #[inline] #109247
Conversation
This comment has been minimized.
This comment has been minimized.
1412877
to
ac6d3cb
Compare
This comment has been minimized.
This comment has been minimized.
ac6d3cb
to
f5f1cb1
Compare
@bors try @rust-timer queue |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
⌛ Trying commit f5f1cb189a638607152e449c02d83dfce67f9d83 with merge 2dffcafbed22d039709534ae1a5078d9202122ff... |
☀️ Try build successful - checks-actions |
This comment has been minimized.
This comment has been minimized.
Finished benchmarking commit (2dffcafbed22d039709534ae1a5078d9202122ff): comparison URL. Overall result: ❌✅ regressions and improvements - ACTION NEEDEDBenchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf. Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @bors rollup=never Instruction countThis is a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage)ResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
CyclesResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
|
The perf summary above looks more negative than the actual report page. The overall effect is dragged down a lot by the fact that this regresses a function in |
@bors try @rust-timer queue |
This comment has been minimized.
This comment has been minimized.
⌛ Trying commit 9373b9301ae612f830885f227fadf2e77ad1b0e8 with merge 9615c8e881c7c418a670d586a2fe748104c298fe... |
This comment has been minimized.
This comment has been minimized.
☀️ Try build successful - checks-actions |
This comment has been minimized.
This comment has been minimized.
This looks ok to me. There are some concern about check builds regressing instruction counts, but for changes like this that affect lots of code, it's equally/more important to look at cycles and wall-times. Those improve for check builds. (Select "Show non-relevant results" for the broad view.) |
// CHECK: sub <8 x i32> %[[A]], %[[B]] | ||
// CHECK: sub <8 x i32> %[[B]], %[[A]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
uh, what happened here? Is this just a reordering of the loads but not a reordering of the actual subtraction?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
r=me after making sure that this test doesn't get miscompiled
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know exactly what is going on here but the test isn't miscompiled. It's our aggressive abbreviations in this codegen test that make it look suspicious. Using -Zmir-opt-level=3 to simulate the effect of this PR because it lifts the inline attribute requirement: https://godbolt.org/z/EP7aTcz1x
Before:
define void @_ZN7example21short_integer_zip_map17hbd5b1e22189a0746E(ptr noalias nocapture noundef writeonly sret([8 x i32]) dereferenceable(32) %0, ptr noalias nocapture noundef readonly dereferenceable(32) %x, ptr noalias nocapture noundef readonly dereferenceable(32) %y) unnamed_addr #0 personality ptr @rust_eh_personality !dbg !6 {
%1 = load <8 x i32>, ptr %x, align 4, !dbg !11
%2 = load <8 x i32>, ptr %y, align 4, !dbg !11
%3 = sub <8 x i32> %1, %2, !dbg !12
store <8 x i32> %3, ptr %0, align 4, !dbg !62, !alias.scope !63, !noalias !66
ret void, !dbg !68
}
After:
define void @_ZN7example21short_integer_zip_map17hbd5b1e22189a0746E(ptr noalias nocapture noundef writeonly sret([8 x i32]) dereferenceable(32) %0, ptr noalias nocapture noundef readonly dereferenceable(32) %x, ptr noalias nocapture noundef readonly dereferenceable(32) %y) unnamed_addr #0 personality ptr @rust_eh_personality !dbg !6 {
%1 = load <8 x i32>, ptr %y, align 4, !dbg !11
%2 = load <8 x i32>, ptr %x, align 4, !dbg !18
%3 = sub <8 x i32> %2, %1, !dbg !19
store <8 x i32> %3, ptr %0, align 4, !dbg !65, !alias.scope !66, !noalias !69
ret void, !dbg !71
}
@bors r+ |
☀️ Test successful - checks-actions |
Finished benchmarking commit (5546cb6): comparison URL. Overall result: ❌✅ regressions and improvements - ACTION NEEDEDNext Steps: If you can justify the regressions found in this perf run, please indicate this with @rustbot label: +perf-regression Instruction countThis is a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage)ResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
CyclesResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
|
@saethlin @compiler-errors looks like perf may be worse than in the original perf run before merge. Of particular concern is that the cargo incr-patched: println test case has regressed significantly (42% increase in wall time). Is it perhaps worth taking another look at the performance impact of this? cc @nnethercote |
That case was present before merge and has been discussed in this thread. Do you disagree with the statements made above? |
… r=cjgillot Don't expect normalization to succeed in elaborate_drops Fixes rust-lang#110682 This was exposed through the changes in rust-lang#109247, which causes more things to be inlined. Inlining can happen before monomorphization, so we can't expect normalization to succeed. In the elaborate_drops analysis we currently have [this call](https://github.com/rust-lang/rust/blob/033aa092ab23ba14cdad27073c5e37ba0eddb428/compiler/rustc_mir_dataflow/src/elaborate_drops.rs#L278) to `normalize_erasing_regions`, which ICEs when normalization fails. The types are used to infer [whether the type needs a drop](https://github.com/rust-lang/rust/blob/033aa092ab23ba14cdad27073c5e37ba0eddb428/compiler/rustc_mir_dataflow/src/elaborate_drops.rs#L374), where `needs_drop` itself [uses `try_normalize_erasing_regions`](https://github.com/rust-lang/rust/blob/033aa092ab23ba14cdad27073c5e37ba0eddb428/compiler/rustc_middle/src/ty/util.rs#L1121). ~[`instance_mir`](https://doc.rust-lang.org/stable/nightly-rustc/rustc_middle/ty/context/struct.TyCtxt.html#method.instance_mir) isn't explicit about whether it expects the instances corresponding to the `InstanceDef`s to be monomorphized (though I think in all other contexts the function is used post-monomorphization), so the use of `instance_mir` in inlining doesn't necessarily seem wrong to me.~
Add #[inline] to as_deref While working on rust-lang/rust#109247 I found an `as_deref` call in the compiler that should have been inlined. This fixes the missing inlining (but doesn't address the perf issues I was chasing). r? `@thomcc`
Add #[inline] to as_deref While working on rust-lang/rust#109247 I found an `as_deref` call in the compiler that should have been inlined. This fixes the missing inlining (but doesn't address the perf issues I was chasing). r? `@thomcc`
Add #[inline] to as_deref While working on rust-lang/rust#109247 I found an `as_deref` call in the compiler that should have been inlined. This fixes the missing inlining (but doesn't address the perf issues I was chasing). r? `@thomcc`
I noticed that there are at least a handful of portable-simd functions that have no
#[inline]
but compile to an assign + return.I locally benchmarked inlining thresholds between 0 and 50 in increments of 5, and 50 seems to be the best. Interesting. That didn't include check builds though,
maybe perf will have something to say about that.Perf has little useful to say about this. We generally regress all the check builds, as best as I can tell, due to a number of small codegen changes in a particular hot function in the compiler. Probably this is because we've nudged the inlining outcomes all over, and uses of
#[inline(always)]
/#[inline(never)]
might need to be adjusted.