perf(codegen): Eliminate size_of_val == 0 for DSTs with Non-zero-sized Prefix via NUW and Assume#152843

Open
TKanX wants to merge 1 commit into rust-lang:main from TKanX:bugfix/152788-codegen-dst-size-nuw-assume

@TKanX
Contributor

@TKanX TKanX commented Feb 19, 2026

Summary:

Problem:

size_of_val(p) == 0 fails to optimize away for DST types that have a statically-known non-zero-sized prefix:

pub struct Foo<T: ?Sized>(pub [u32; 3], pub T);

pub fn demo(p: &Foo<dyn std::fmt::Debug>) -> bool {
    std::mem::size_of_val(p) == 0  // always false, but LLVM can't prove it
}

Foo has a 12-byte prefix, so its total size is always ≥ 12. Yet the comparison persists as a runtime computation in LLVM IR. This matters because Box<dyn T> drop emits this exact check to guard the deallocation call — for types with a guaranteed non-zero prefix, the branch should vanish but doesn't.

The slice tail variant Foo<[i32]> already optimized correctly; Foo<dyn Trait> and Foo<[u8]> did not.
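
For reference, a minimal sketch (not part of the PR) that checks the size claim at runtime for the three tail kinds mentioned above. The helper name `sizes` and the tail values are only illustrative; each DST reference is produced by unsized coercion from a sized `Foo`.

```rust
use std::fmt::Debug;
use std::mem::size_of_val;

pub struct Foo<T: ?Sized>(pub [u32; 3], pub T);

/// Hypothetical helper: sizes of a few `Foo` DSTs, including empty tails.
fn sizes() -> [usize; 4] {
    let a = Foo([0u32; 3], ());
    let b = Foo([0u32; 3], 0u64);
    let c = Foo([0u32; 3], [0u8; 0]);
    let d = Foo([0u32; 3], [0i32; 2]);

    // Unsized coercions to the DST forms discussed in this PR.
    let dyn_zst: &Foo<dyn Debug> = &a;
    let dyn_u64: &Foo<dyn Debug> = &b;
    let bytes: &Foo<[u8]> = &c;
    let ints: &Foo<[i32]> = &d;

    [
        size_of_val(dyn_zst),
        size_of_val(dyn_u64),
        size_of_val(bytes),
        size_of_val(ints),
    ]
}

fn main() {
    // Even with a zero-sized tail, the 12-byte prefix keeps the size non-zero.
    for size in sizes() {
        assert!(size >= 12);
    }
}
```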

Root Cause:

In size_and_align_of_dst (the ADT/Tuple branch), the size computation is:

full_size = (offset + unsized_size + (align-1)) & -align

LLVM cannot prove full_size > 0 because:

  1. offset + unsized_size used plain add — no overflow flags, so LLVM cannot conclude the result is ≥ offset.
  2. (x + addend) & -align — LLVM has no fold to prove that alignment rounding never reduces the value below x.
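
To see why the missing flags matter, here is a rough integer model of the pre-patch computation for `Foo<[u8]>` (offset 12, element size 1, alignment 4). The function name `wrapping_full_size` is hypothetical, and wrapping arithmetic is used to mimic a plain `add`; it is a sketch of the reasoning, not the codegen itself.

```rust
/// Hypothetical model of `(offset + unsized_size + (align - 1)) & -align`
/// for `Foo<[u8]>`: offset = 12, unsized_size = len, align = 4.
fn wrapping_full_size(len: usize) -> usize {
    // A plain `add` carries no overflow flags, so it is modelled as wrapping.
    let unrounded = 12usize.wrapping_add(len);
    unrounded.wrapping_add(3) & 4usize.wrapping_neg()
}

fn main() {
    // Realistic lengths never produce zero.
    assert_eq!(wrapping_full_size(0), 12);
    assert_eq!(wrapping_full_size(1), 16);
    // A length that no real slice can have wraps around and lands on zero.
    assert_eq!(wrapping_full_size(usize::MAX - 11), 0);
}
```

The only inputs that produce zero are lengths no real slice can have. That is the information a plain `add` fails to carry, and it matches the `icmp ult` on `len + 15` that survives in the "before" IR for `Foo<[u8]>`.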

Solution:

Two changes:

  1. add nuw nsw on offset + unsized_size: the sum is bounded by the rounded size ≤ isize::MAX, so neither signed nor unsigned overflow is possible. Tells LLVM: unrounded_size ≥ offset.

  2. assume(full_size ≥ unrounded_size): round_up(x, a) ≥ x holds for any power-of-two a whenever the rounding does not overflow, which the isize::MAX bound guarantees. Tells LLVM: full_size ≥ unrounded_size ≥ offset. If offset > 0, the chain proves full_size > 0.
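
The two facts can be sketched on plain integers as follows. This is an illustrative model that treats overflow as impossible (modelled with `checked_add`); the names `size_chain`, `unrounded` and `rounded` are made up for the example and do not appear in rustc.

```rust
/// Illustrative model: returns (unrounded_size, full_size) for a DST whose
/// sized prefix ends at `offset`. `align` must be a power of two.
fn size_chain(offset: usize, unsized_size: usize, align: usize) -> (usize, usize) {
    assert!(align.is_power_of_two());
    // Stands in for `add nuw nsw`: overflow is treated as impossible.
    let unrounded = offset
        .checked_add(unsized_size)
        .expect("offset + unsized_size must not overflow");
    // Round up to the next multiple of `align`.
    let rounded = unrounded
        .checked_add(align - 1)
        .expect("rounding must not overflow")
        & align.wrapping_neg();
    (unrounded, rounded)
}

fn main() {
    for align in [1usize, 2, 4, 8, 16] {
        for unsized_size in 0..64usize {
            let (unrounded, rounded) = size_chain(12, unsized_size, align);
            assert!(unrounded >= 12); // what the overflow flags convey
            assert!(rounded >= unrounded); // what the assume conveys
            assert!(rounded > 0); // the conclusion LLVM needs
        }
    }
}
```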

LLVM IR Comparison:

Foo<dyn Debug>, before (godbolt, https://rust.godbolt.org/z/r1d5n6Phe):

define noundef zeroext i1 @demo(ptr %p.0, ptr %p.1) {
start:
  %0 = getelementptr inbounds nuw i8, ptr %p.1, i64 8
  %1 = load i64, ptr %0, align 8, !range !3, !invariant.load !4
  %2 = getelementptr inbounds nuw i8, ptr %p.1, i64 16
  %3 = load i64, ptr %2, align 8, !range !5, !invariant.load !4
  %4 = tail call i64 @llvm.umax.i64(i64 %3, i64 4)
  %5 = add nuw i64 %1, 11
  %6 = add i64 %5, %4
  %7 = sub i64 0, %4
  %8 = and i64 %6, %7
  %_0 = icmp eq i64 %8, 0
  ret i1 %_0
}

Foo<dyn Debug> — after:

define noundef zeroext i1 @demo(ptr %p.0, ptr %p.1) {
start:
  ret i1 false
}

Foo<[u8]> — before:

define noundef zeroext i1 @demo_lessalignedslice(ptr %p.0, i64 %p.1) {
start:
  %0 = add i64 %p.1, 15
  %_0 = icmp ult i64 %0, 4
  ret i1 %_0
}

Foo<[u8]> — after:

define noundef zeroext i1 @demo_lessalignedslice(ptr %p.0, i64 %p.1) {
start:
  ret i1 false
}

Changes:

  • compiler/rustc_codegen_ssa/src/size_of_val.rs: add → unchecked_suadd (NUW+NSW) on offset + unsized_size; add assume(full_size ≥ unrounded_size).
  • tests/codegen-llvm/dst-size-of-val-not-zst.rs: new codegen test verifying size_of_val == 0 folds to ret i1 false for Foo<dyn Debug>, Foo<[u8]>, and Foo<[i32]>.

Fixes #152788.

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Feb 19, 2026
@TKanX
Contributor Author

TKanX commented Feb 20, 2026

@rustbot label +A-LLVM +A-codegen +C-optimization +T-compiler

@rustbot rustbot added A-codegen Area: Code generation A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. C-optimization Category: An issue highlighting optimization opportunities or PRs implementing such labels Feb 20, 2026
@fmease
Member

fmease commented Feb 21, 2026

r? codegen

@rustbot rustbot assigned dianqk and unassigned fmease Feb 21, 2026
@rustbot rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Feb 22, 2026
@rustbot
Collaborator

rustbot commented Feb 22, 2026

Reminder, once the PR becomes ready for a review, use @rustbot ready.

@TKanX TKanX force-pushed the bugfix/152788-codegen-dst-size-nuw-assume branch from a9ec27f to 8339cfe Compare February 22, 2026 05:32
@rustbot
Collaborator

rustbot commented Feb 22, 2026

This PR was rebased onto a different main commit. Here's a range-diff highlighting what actually changed.

Rebasing is a normal part of keeping PRs up to date, so no action is needed—this note is just to help reviewers.

@TKanX
Contributor Author

TKanX commented Feb 22, 2026

@rustbot ready

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Feb 22, 2026
@TKanX TKanX requested a review from scottmcm February 22, 2026 05:34
Comment on lines +183 to +189
// Alignment rounding can only increase the size, never decrease it:
// `round_up(x, a) >= x` for power-of-two `a`. With the `nuw` on the
// addition above, LLVM can therefore deduce
// `full_size >= unrounded_size >= offset`, which proves `full_size > 0`
// for types with a non-zero-sized prefix (#152788).
let size_ge = bx.icmp(IntPredicate::IntUGE, full_size, unrounded_size);
bx.assume(size_ge);
Member

Can you elaborate on which things you tried and why this is the best one? Was it not enough to say that the alignment is a power-of-two? Or...

Member

I ask because most of the text in the OP is just useless LLM slop, and the updates to the tests make me suspicious.

Contributor Author

@scottmcm

> Can you elaborate on which things you tried and why this is the best one? Was it not enough to say that the alignment is a power-of-two? Or...

Tried nuw-only (unchecked_uadd) first. That gives LLVM unrounded >= offset > 0 but it stops at the rounding — LLVM can't prove (x + a-1) & -a >= x. Also checked whether feeding ctpop(align) == 1 would help, but there's no fold for "round-up is monotonic when alignment is pow2" in InstCombine/ValueTracking. So the assume tells LLVM the conclusion directly.

nsw (making it unchecked_suadd) is because unrounded ≤ rounded ≤ isize::MAX. Same reasoning as your #152867.

> I ask because most of the text in the OP is just useless LLM slop, and the updates to the tests make me suspicious.

Sorry about the OP — English isn't my native language, I overwrite when trying to be precise. Will clean it up.

For the tests: CHECK-NOT: icmp broke because assume itself emits an icmp. The !range checks on the first two functions were dropped because the assume keeps the size computation alive, so there's now a size load before the alignment load and FileCheck hits the wrong one. Range metadata is still verified in align_load_from_align_of_val below. RANGE_META → ALIGN_RANGE since it only covers alignment loads now. Range value {1, 0} → {1, 0x20000001} is Align::max_for_target (same change as #152929).

Happy to close this if you'd rather land it as part of #152867.

Member

Landing this separately is great -- I opened the issue because this particular bit about what LLVM can prove is different enough from the point of layout_of_val that it's better to have the changes separated. (That's why I pulled out #152929 too 🙂 )

Member

Hmm, yeah, I experimented a bit https://llvm.godbolt.org/z/haGYz7aax and even with lots of annotations on everything and an assume it's still not able to understand what's happening properly.

(Also it's so annoying to see add nsw i64 %4, -1 since that used to be sub nuw nsw i64 %4, 1 but LLVM just insists on throwing that information away.)

@dianqk
Member

dianqk commented Feb 22, 2026

r? scottmcm

@rustbot rustbot assigned scottmcm and unassigned dianqk Feb 22, 2026
Comment on lines -33 to 36
// CHECK: load [[USIZE:i[0-9]+]], {{.+}} !range [[RANGE_META:![0-9]+]]
// CHECK: load [[USIZE:i[0-9]+]]
// CHECK-NOT: llvm.umax
// CHECK-NOT: icmp
// CHECK-NOT: select
// CHECK: ret
Member

So the problem here is that if this was testing for "not icmp", just removing that check means this test is (potentially) no longer testing what it was trying to test before.

If there's an icmp now, probably what you want instead is something like

    // CHECK-NOT: llvm.umax
    // CHECK-NOT: icmp
    // CHECK-NOT: select
    // CHECK: [[DOES_NOT_SHRINK:%.+]] = icmp ... something here ...
    // CHECK-NEXT: call void @llvm.assume(i1 [[DOES_NOT_SHRINK]])
    // CHECK-NOT: llvm.umax
    // CHECK-NOT: icmp
    // CHECK-NOT: select

so that the test is that the only icmp is the expected one that's used for the assume.


Similarly, why remove the !range check? It's not being optimized out, is it? (If it is, that's also interesting.)

Contributor Author

Checked the emitted IR — the assume (and the entire size computation) gets DCE'd in these two functions at -O3, since they only need alignment for the field projection. So there's no extra icmp at all, and the alignment load is still the first one with !range. Restored the original patterns as-is; the file is now unchanged from main.

Member

This file is no longer unchanged, so this comment applies again.

@rustbot author

Contributor Author

Currently failing on LLVM 20. I added min-llvm-version: 21 thinking the ret i1 false fold would work there, but that was just an assumption. On LLVM 20 the add nuw nsw + assume(icmp uge ...) survive in the IR and can be checked directly. Should I rewrite the test to check the emission side instead?

@scottmcm

Contributor Author

> I would suggest you survey what testing exists for it and what the intent of the various tests are.
>
> In general, it's fine to limit desirable optimization tests to latest LLVM only, since that's what we ship and if people are using older LLVM then it's at least somewhat expected that things will optimize less well.
>
> On the other hand, if it's "we're testing what rustc is doing" tests, then those should generally continue to pass on older LLVM because we don't want rustc to break on older LLVMs.

@scottmcm similar to your suggestion: 5eec5e3

@TKanX TKanX marked this pull request as draft March 4, 2026 02:41
@rustbot rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Mar 4, 2026
@TKanX TKanX force-pushed the bugfix/152788-codegen-dst-size-nuw-assume branch from 5423cd4 to a184e09 Compare March 4, 2026 03:13
@TKanX TKanX force-pushed the bugfix/152788-codegen-dst-size-nuw-assume branch from a184e09 to c45ca83 Compare March 4, 2026 04:23
@TKanX TKanX force-pushed the bugfix/152788-codegen-dst-size-nuw-assume branch from c45ca83 to 7f42ac4 Compare March 4, 2026 07:51
@scottmcm
Member

scottmcm commented Mar 5, 2026

I would suggest you survey what testing exists for it and what the intent of the various tests are.

In general, it's fine to limit desirable optimization tests to latest LLVM only, since that's what we ship and if people are using older LLVM then it's at least somewhat expected that things will optimize less well.

On the other hand, if it's "we're testing what rustc is doing" tests, then those should generally continue to pass on older LLVM because we don't want rustc to break on older LLVMs.


Another thing you could try would be whether -C opt-level=1 still is sufficient to meet the goals of the tests in question but because they try less they might be more consistent between versions. I don't know.

@TKanX TKanX force-pushed the bugfix/152788-codegen-dst-size-nuw-assume branch from 7f42ac4 to e5e8c07 Compare March 7, 2026 09:32
…assume

Co-authored-by: Scott McMurray <scottmcm@users.noreply.github.com>
@TKanX TKanX marked this pull request as ready for review March 10, 2026 05:14
@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Mar 10, 2026
@TKanX TKanX force-pushed the bugfix/152788-codegen-dst-size-nuw-assume branch from e5e8c07 to 5eec5e3 Compare March 10, 2026 05:16
@scottmcm
Member

@bors r+ rollup=iffy (conditional-on-llvm-version codegen tests are extra scary)

@rust-bors
Contributor

rust-bors bot commented Mar 26, 2026

📌 Commit 5eec5e3 has been approved by scottmcm

It is now in the queue for this repository.

@rust-bors rust-bors bot added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Mar 26, 2026
JonathanBrouwer added a commit to JonathanBrouwer/rust that referenced this pull request Mar 26, 2026
…ze-nuw-assume, r=scottmcm

@matthiaskrgr
Member

@bors r-
#154438 (comment)

@rust-bors rust-bors bot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Mar 26, 2026
@rust-bors
Contributor

rust-bors bot commented Mar 26, 2026

This pull request was unapproved.

This PR was contained in a rollup (#154438), which was unapproved.

@TKanX
Contributor Author

TKanX commented Mar 26, 2026

@matthiaskrgr build fail?

@matthiaskrgr
Member

looks like the test you changed here failed: https://triage.rust-lang.org/gha-logs/rust-lang/rust/68771279640#L2026-03-26T19:26:05.8780382Z

@@ -30,10 +33,18 @@ pub struct Struct<W: ?Sized> {
pub fn eliminates_runtime_check_when_align_1(
Member

An option if it makes your life easier: since this one is "just" an optimization, you could pull it into a separate file with min-llvm-version: 21 and not worry about llvm20

Labels

A-codegen Area: Code generation A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. C-optimization Category: An issue highlighting optimization opportunities or PRs implementing such S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

size_of_val(p) == 0 doesn't optimize out for clearly-not-ZST values

8 participants