Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Distinguish between library and lang UB in assert_unsafe_precondition #121662

Merged
merged 5 commits into from
Mar 10, 2024

Conversation

saethlin
Copy link
Member

As described in #121583 (comment), assert_unsafe_precondition now explicitly distinguishes between language UB (conditions we explicitly optimize on) and library UB (things we document you shouldn't do, and maybe some library internals assume you don't do).

debug_assert_nounwind was originally added to avoid the "only at runtime" aspect of assert_unsafe_precondition. Since then the difference between the macros has gotten muddied. This totally revamps the situation.

Now all preconditions shall be checked with assert_unsafe_precondition. If you have a precondition that's only checkable at runtime, do a const_eval_select hack, as done in this PR.

r? RalfJung

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Feb 27, 2024
@rustbot
Copy link
Collaborator

rustbot commented Feb 27, 2024

Some changes occurred to the CTFE / Miri engine

cc @rust-lang/miri

Some changes occurred to MIR optimizations

cc @rust-lang/wg-mir-opt

This PR changes MIR

cc @oli-obk, @RalfJung, @JakobDegen, @davidtwco, @celinval, @vakaras

This PR changes Stable MIR

cc @oli-obk, @celinval, @spastorino, @ouz-a

Some changes occurred in compiler/rustc_codegen_cranelift

cc @bjorn3

@rust-log-analyzer

This comment has been minimized.

@rust-log-analyzer

This comment has been minimized.

@saethlin saethlin force-pushed the precondition-unification branch from a64a002 to ea23c97 Compare February 27, 2024 03:25
@rustbot
Copy link
Collaborator

rustbot commented Feb 27, 2024

Some changes occurred in src/tools/clippy

cc @rust-lang/clippy

@saethlin saethlin force-pushed the precondition-unification branch from ea23c97 to b268228 Compare February 27, 2024 03:40
@saethlin
Copy link
Member Author

@bors try @rust-timer queue

@rust-timer

This comment has been minimized.

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Feb 27, 2024
bors added a commit to rust-lang-ci/rust that referenced this pull request Feb 27, 2024
…=<try>

Distinguish between library and lang UB in assert_unsafe_precondition

As described in rust-lang#121583 (comment), `assert_unsafe_precondition` now explicitly distinguishes between language UB (conditions we explicitly optimize on) and library UB (things we document you shouldn't do, and maybe some library internals assume you don't do).

`debug_assert_nounwind` was originally added to avoid the "only at runtime" aspect of `assert_unsafe_precondition`. Since then the difference between the macros has gotten muddied. This totally revamps the situation.

Now _all_ preconditions shall be checked with `assert_unsafe_precondition`. If you have a precondition that's only checkable at runtime, do a `const_eval_select` hack, as done in this PR.

r? RalfJung
@bors
Copy link
Contributor

bors commented Feb 27, 2024

⌛ Trying commit b268228 with merge c492ab1...

@bors
Copy link
Contributor

bors commented Feb 27, 2024

☀️ Try build successful - checks-actions
Build commit: c492ab1 (c492ab1d2d6c77e2d02bf9d0a998a6aee430fd48)

@rust-timer

This comment has been minimized.

@rust-timer
Copy link
Collaborator

Finished benchmarking commit (c492ab1): comparison URL.

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
0.5% [0.2%, 0.9%] 5
Regressions ❌
(secondary)
0.5% [0.1%, 1.3%] 7
Improvements ✅
(primary)
-0.8% [-2.0%, -0.2%] 9
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) -0.3% [-2.0%, 0.9%] 14

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
6.9% [1.2%, 12.1%] 4
Regressions ❌
(secondary)
3.1% [1.4%, 4.0%] 3
Improvements ✅
(primary)
-4.7% [-11.6%, -0.5%] 8
Improvements ✅
(secondary)
-2.4% [-2.4%, -2.4%] 1
All ❌✅ (primary) -0.9% [-11.6%, 12.1%] 12

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
1.0% [0.9%, 1.2%] 2
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
-1.2% [-1.2%, -1.2%] 1
Improvements ✅
(secondary)
-1.0% [-1.0%, -1.0%] 1
All ❌✅ (primary) 0.3% [-1.2%, 1.2%] 3

Binary size

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
0.2% [0.1%, 0.5%] 16
Regressions ❌
(secondary)
0.9% [0.0%, 1.5%] 12
Improvements ✅
(primary)
-0.7% [-2.7%, -0.1%] 70
Improvements ✅
(secondary)
-0.7% [-2.4%, -0.1%] 4
All ❌✅ (primary) -0.5% [-2.7%, 0.5%] 86

Bootstrap: 649.3s -> 650.779s (0.23%)
Artifact size: 311.08 MiB -> 311.03 MiB (-0.02%)

@rustbot rustbot added perf-regression Performance regression. and removed S-waiting-on-perf Status: Waiting on a perf run to be completed. labels Feb 27, 2024
@saethlin saethlin force-pushed the precondition-unification branch from b268228 to f8d0380 Compare February 28, 2024 22:23
library/core/src/intrinsics.rs Outdated Show resolved Hide resolved
library/core/src/intrinsics.rs Outdated Show resolved Hide resolved
/// implement debug assertions that are included in the precompiled standard library, but can
/// be optimized out by builds that monomorphize the standard library code with debug
/// Whether we should check for library UB. This evaluate to the value of `cfg!(debug_assertions)`
/// in codegen, and `true` in const-eval/Miri.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe const-eval/Miri should be consistent with runtime here? Every single time we made Miri diverge from that people were caught off-guard by it...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does const-eval usually behave differently when you pass --release? Granted it's just UB checking... but still I expect const to always work the same regardless of the flags I pass. Even if that's wrong, it feel more like it's part of the type system than the runtime part of Rust.

I'll swap this to work the same because I think it's sufficiently less surprising for Miri and kind of a wash in const-eval.

Copy link
Member

@RalfJung RalfJung Mar 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does const-eval usually behave differently when you pass --release?

const-eval usually honors #[cfg(debug_assertions)]/cfg!(debug_assertions) the same way everything else does.

So, yes. It feels strange to do anything different here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is still open, I think library UB checks during const-eval and Miri should match runtime behavior.

@saethlin
Copy link
Member Author

#97669
@bors retry

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Mar 10, 2024
@bors
Copy link
Contributor

bors commented Mar 10, 2024

⌛ Testing commit 2a1f97f with merge 768408a...

@bors
Copy link
Contributor

bors commented Mar 10, 2024

☀️ Test successful - checks-actions
Approved by: RalfJung
Pushing 768408a to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Mar 10, 2024
@bors bors merged commit 768408a into rust-lang:master Mar 10, 2024
12 checks passed
@rustbot rustbot added this to the 1.78.0 milestone Mar 10, 2024
@saethlin saethlin deleted the precondition-unification branch March 10, 2024 03:56
@rust-timer
Copy link
Collaborator

Finished benchmarking commit (768408a): comparison URL.

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Next Steps: If you can justify the regressions found in this perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please open an issue or create a new PR that fixes the regressions, add a comment linking to the newly created issue or PR, and then add the perf-regression-triaged label to this PR.

@rustbot label: +perf-regression
cc @rust-lang/wg-compiler-performance

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
1.1% [0.3%, 1.7%] 4
Regressions ❌
(secondary)
1.1% [1.1%, 1.1%] 1
Improvements ✅
(primary)
-0.9% [-1.7%, -0.4%] 4
Improvements ✅
(secondary)
-0.2% [-0.2%, -0.2%] 1
All ❌✅ (primary) 0.1% [-1.7%, 1.7%] 8

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
3.8% [1.0%, 7.1%] 4
Regressions ❌
(secondary)
2.5% [0.9%, 4.8%] 10
Improvements ✅
(primary)
-4.3% [-6.8%, -0.3%] 8
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) -1.6% [-6.8%, 7.1%] 12

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
1.2% [1.1%, 1.3%] 2
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
-1.3% [-1.5%, -1.1%] 2
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) -0.0% [-1.5%, 1.3%] 4

Binary size

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
0.3% [0.1%, 0.5%] 18
Regressions ❌
(secondary)
0.9% [0.0%, 1.5%] 12
Improvements ✅
(primary)
-0.7% [-2.5%, -0.1%] 70
Improvements ✅
(secondary)
-0.6% [-2.4%, -0.1%] 4
All ❌✅ (primary) -0.5% [-2.5%, 0.5%] 88

Bootstrap: 647.204s -> 645.087s (-0.33%)
Artifact size: 310.08 MiB -> 309.96 MiB (-0.04%)

celinval pushed a commit to model-checking/kani that referenced this pull request Mar 12, 2024
Relevant upstream changes:

rust-lang/rust#120675: An intrinsic `Symbol` is
now wrapped in a `IntrinsicDef` struct, so the relevant part of the code
needed to be updated.
rust-lang/rust#121464: The second argument of
the `create_wrapper_file` function changed from a vector to a string.
rust-lang/rust#121662: `NullOp::DebugAssertions`
was renamed to `NullOp::UbCheck` and it now has data (currently unused
by Kani)
rust-lang/rust#121728: Introduces `F16` and
`F128`, so needed to add stubs for them
rust-lang/rust#121969: `parse_sess` was renamed
to `psess`, so updated the relevant code.
rust-lang/rust#122059: The
`is_val_statically_known` intrinsic is now used in some `core::fmt`
code, so had to handle it in (codegen'ed to false).
rust-lang/rust#122233: This added a new
`retag_box_to_raw` intrinsic. This is an operation that is primarily
relevant for stacked borrows. For Kani, we just return the pointer.

Resolves #3057
@pnkfelix
Copy link
Member

pnkfelix commented Mar 13, 2024

Visiting for weekly rustc-perf triage

  • primary regressions are to cranelift-codegen opt-full (1.74%), cargo opt-full (1.33%), clap opt-full (1.18%), and exa debug-incr-unchanged (0.28%).
  • the rust-timer run was "only" expected to regress 5 benchmarks, largely a different set, by a mean of 0.5%, not the 1.1% reported above. (I do not know what to take away from that observation, especially the fact that it a different set of benchmarks was impacted...)
  • not marking as triaged.

@DianQK
Copy link
Member

DianQK commented Mar 13, 2024

#122282 might solve part of it?

Primary benchmarks are to cranelift-codegen opt-full (-2.03%), cargo opt-full (-0.90%) and clap opt-full (-1.35%). See https://perf.rust-lang.org/compare.html?start=cdb775cab50311de54ccf3a07b331cc56b0da436&end=acc30d0cafd012430f7d0d83addb6fea4c7b2bbf&stat=instructions:u.

@saethlin
Copy link
Member Author

The changes above are chaotic, small, about-same-magnitude changes, and almost all benchmarks that were changed are opt builds. To me, that means that they are primarily if not entirely due to perturbing the MIR inliner's decisions.

Perhaps that's not the usual kind of perf report you're used to seeing in perf triage. I think most other kinds of changes produce perf reports where it is educational to look into each benchmark, because the fact that one crate was regressed or not means that there is a sort of code pattern that is now pessimized or handled better. I have certainly done such careful case-by-case analysis myself on many PRs and found it helpful. I firmly believe that such analysis is not useful in these perf reports that are primarily MIR inliner chaos (not noise!!). We could make rather subtle tweaks to the relevant standard library code, and completely change the set of benchmarks that get better or worse.

And in addition, for PRs like this where the perf reports are MIR inliner chaos, one should not expect the pre-merge perf report to look the same as the post-merge one. PRs that land between those try builds will provide the perturbations to cause a different chaotic outcome.

The exact same logic applies to #122282. It contains some regressions. That PR is good because it improves our ability to optimize MIR, and not because "improvements outweigh regressions". That suggests that the regressions in such PRs could be recouped by sufficient re-engineering of that change, but I believe they cannot because they reflect imprecision in the MIR inliner's cost model, as opposed to anything about the changes in the PR.

The same logic applies to any improvements that are reported in a PR that adds code to the standard library; such improvements reflect that the MIR inliner previously made wrong decisions and now happens (by sheer luck) to in a few cases make better ones.

If anyone wants to try to do case-by-case analysis of what changed in the generated code or analyze the MIR inliner's decisions they are welcome to attempt it. I will not be doing so. We already know what the next step is to improve the compile-time overhead of these checks, and that is #122282 as @DianQK indicated.

@riking
Copy link

riking commented Mar 13, 2024

Binary size results looking good though, which is an expected result of this change

@saethlin
Copy link
Member Author

Binary size results looking good though, which is an expected result of this change

No. I think you're making exactly the mistake I tried to warn about above:

The same logic applies to any improvements that are reported in a PR that adds code to the standard library; such improvements reflect that the MIR inliner previously made wrong decisions and now happens (by sheer luck) to in a few cases make better ones.

In all cases the relevant code is behind an if false to an opt build, so LLVM should be just as capable of optimizing these checks before as it is now. I strongly suspect that what we've done is just nudge the MIR inliner to inline less, and in terms of binary size, less inlining tends to be a win.

I do not understand exactly why the MIR inliner tends to do inlinings that LLVM doesn't, but I've done a fair bit of analysis in that area before and it does. For some reason, a lot of drop-related symbols appear or disappear from binaries when inlining is tweaked.

@pnkfelix
Copy link
Member

Thanks for the explanations @saethlin , I really appreciate it, and I totally buy the explanation you have provided here.

These are really good thoughts. I am wondering if there's any place less transient we could put them, e.g. in some sort of guide/document that would help people doing the rustc-perf triage with the kinds of things they should be considering.

@rustbot label: +perf-regression-triaged

@rustbot rustbot added the perf-regression-triaged The performance regression has been triaged. label Mar 14, 2024
@saethlin
Copy link
Member Author

Not just perf triage, we should have a manual for doing perf report analysis in general.

bjorn3 pushed a commit to bjorn3/rust that referenced this pull request Mar 16, 2024
…=RalfJung

Distinguish between library and lang UB in assert_unsafe_precondition

As described in rust-lang#121583 (comment), `assert_unsafe_precondition` now explicitly distinguishes between language UB (conditions we explicitly optimize on) and library UB (things we document you shouldn't do, and maybe some library internals assume you don't do).

`debug_assert_nounwind` was originally added to avoid the "only at runtime" aspect of `assert_unsafe_precondition`. Since then the difference between the macros has gotten muddied. This totally revamps the situation.

Now _all_ preconditions shall be checked with `assert_unsafe_precondition`. If you have a precondition that's only checkable at runtime, do a `const_eval_select` hack, as done in this PR.

r? RalfJung
matthiaskrgr pushed a commit to matthiaskrgr/rust that referenced this pull request Mar 21, 2024
…=RalfJung

Distinguish between library and lang UB in assert_unsafe_precondition

As described in rust-lang#121583 (comment), `assert_unsafe_precondition` now explicitly distinguishes between language UB (conditions we explicitly optimize on) and library UB (things we document you shouldn't do, and maybe some library internals assume you don't do).

`debug_assert_nounwind` was originally added to avoid the "only at runtime" aspect of `assert_unsafe_precondition`. Since then the difference between the macros has gotten muddied. This totally revamps the situation.

Now _all_ preconditions shall be checked with `assert_unsafe_precondition`. If you have a precondition that's only checkable at runtime, do a `const_eval_select` hack, as done in this PR.

r? RalfJung
@CAD97
Copy link
Contributor

CAD97 commented Mar 28, 2024

I think I stumbled onto why this is causing "chaotic, small, about-same-magnitude changes" by "perturbing the MIR inliner's decisions" in #123174: the newly introduced branch is blocking MIR inlining of #[inline] functions like Vec::deref, which without the branch would otherwise be a single straightline basic block and obvious candidates for inlining.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
merged-by-bors This PR was explicitly merged by bors. perf-regression Performance regression. perf-regression-triaged The performance regression has been triaged. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-libs Relevant to the library team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.