Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MaybeUninit seems to prevent RVO in even the most trivial cases. #90595

Closed
mcy opened this issue Nov 4, 2021 · 13 comments · Fixed by #121298
Closed

MaybeUninit seems to prevent RVO in even the most trivial cases. #90595

mcy opened this issue Nov 4, 2021 · 13 comments · Fixed by #121298
Assignees
Labels
C-bug Category: This is a bug. I-slow Issue: Problems and improvements with respect to performance of generated code.

Comments

@mcy
Copy link
Contributor

mcy commented Nov 4, 2021

I tried this code:

pub struct Foo([u8; 1000]);

extern "C" {
  fn init(p: *mut Foo);
}

impl Foo {
  pub fn new_from_uninit() -> Self {
    let mut x = std::mem::MaybeUninit::uninit();
    unsafe {
      init(x.as_mut_ptr());
      x.assume_init()
    }
  }

  pub fn new() -> Self {
    let mut x = Self([0; 1000]);
    unsafe { init(&mut x) }
    x
  }
}

The generated assembly at -Oz on x86 is:

example::Foo::new_from_uninit:
        push    r14
        push    rbx
        sub     rsp, 1000
        mov     rbx, rdi
        mov     r14, rsp
        mov     rdi, r14
        call    qword ptr [rip + init@GOTPCREL]
        mov     edx, 1000
        mov     rdi, rbx
        mov     rsi, r14
        call    qword ptr [rip + memcpy@GOTPCREL] // No RVO! >:(
        mov     rax, rbx
        add     rsp, 1000
        pop     rbx
        pop     r14
        ret

example::Foo::new:
        push    rbx
        mov     rbx, rdi
        mov     edx, 1000
        xor     esi, esi
        call    qword ptr [rip + memset@GOTPCREL]
        mov     rdi, rbx
        call    qword ptr [rip + init@GOTPCREL]
        mov     rax, rbx
        pop     rbx
        ret

Observe that Rust (or LLVM, as the case may be) fails to RVO x when using MaybeUninit, ironically having potentially worse performance than the one that calls memset.

Complete godbolt example: https://godbolt.org/z/c1ccf7WeK

Here is the pertinent optimized IR:

define void @_ZN7example3Foo15new_from_uninit17h20aebee91382058eE(%Foo* noalias nocapture sret(%Foo) dereferenceable(1000) %0) unnamed_addr #0 !dbg !6 {
start:
  %x = alloca %"std::mem::MaybeUninit<Foo>", align 1
  %1 = getelementptr inbounds %"std::mem::MaybeUninit<Foo>", %"std::mem::MaybeUninit<Foo>"* %x, i64 0, i32 0, i64 0, !dbg !11
  call void @llvm.lifetime.start.p0i8(i64 1000, i8* nonnull %1), !dbg !11
  
  %2 = bitcast %"std::mem::MaybeUninit<Foo>"* %x to %Foo*, !dbg !12
  call void @init(%Foo* nonnull %2), !dbg !20
  %3 = getelementptr inbounds %Foo, %Foo* %0, i64 0, i32 0, i64 0
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* noundef nonnull align 1 dereferenceable(1000) %3, i8* noundef nonnull align 1 dereferenceable(1000) %1, i64 1000, i1 false), !dbg !21
  call void @llvm.lifetime.end.p0i8(i64 1000, i8* nonnull %1), !dbg !22
  ret void, !dbg !23
}

define void @_ZN7example3Foo3new17h8ad79a0e3ddd97ffE(%Foo* noalias nocapture sret(%Foo) dereferenceable(1000) %x) unnamed_addr #0 !dbg !24 {
start:
  %x89 = getelementptr inbounds %Foo, %Foo* %x, i64 0, i32 0, i64 0
  call void @llvm.memset.p0i8.i64(i8* noundef nonnull align 1 dereferenceable(1000) %x89, i8 0, i64 1000, i1 false), !dbg !25
  tail call void @init(%Foo* nonnull %x), !dbg !26
  ret void, !dbg !27
}

Skimming the -O0 IR doesn't help much, since Rust needs to generate a whole mess of code in the MaybeUninit version. The thing that's very mysterious to me is that, somehow, the bitcast in @_ZN7example3Foo15new_from_uninit17h20aebee91382058eE is acting as a barrier for merging %x with the return slot %0??

This seems like an LLVM bug but filing here first in case it's bad IR codegen on rustc's part.

@mcy mcy added the C-bug Category: This is a bug. label Nov 4, 2021
@mcy
Copy link
Contributor Author

mcy commented Nov 5, 2021

https://godbolt.org/z/eq45ejr8T via @Thinkofname observes that the bitcast is a red herring. Maybe LLVM is having trouble GVN'ing %x and %0 together because they are of different types...? There's literally no other reason I can think of.

@alex
Copy link
Member

alex commented Nov 5, 2021

Is this possibly the same as https://bugs.llvm.org/show_bug.cgi?id=47114

@scottmcm
Copy link
Member

scottmcm commented Nov 7, 2021

Another similar, albeit not identical, issue with ManuallyDrop: #79914

@nikic nikic added the I-slow Issue: Problems and improvements with respect to performance of generated code. label Nov 7, 2021
@nikic
Copy link
Contributor

nikic commented Nov 7, 2021

This optimization is illegal on the LLVM level because init() may unwind. It does work fine if you enable the c_unwind feature: https://godbolt.org/z/5Y7P3fdqr

@mcy
Copy link
Contributor Author

mcy commented Nov 8, 2021

Isn't unwinding in the foreign -> Rust direction UB? At the very least, Rust's unwind tables are set up (last time I checked, I could be wrong) so that unwinding through Rust jumps to a handler that aborts. Seems like this is a pessimization accounting for a case that cannot happen.

(-Cpanic=abort has the same effect, fwiw.)

Edit: hang on, how is this even disallowed at the IR level in the first place? Not only does it optimize correctly in the non-MaybeUninit case, but it also does so when using mem::uninitialized (ignoring for a moment the fact that that's also UB; I don't think it's interfering with the optimizer). I thought the bitcast was the culprit, but apparently not???

This is the IR in question but where I've just swapped out MaybeUninit for Foo, and then passed through opt: https://godbolt.org/z/G175hMnYz

I'm not sold on unwinding being the real culprit here, but it's not clear whether rustc is emitting poor IR or LLVM is doing something a posteriori dumb.

(I spent a while fighting with Alive2 to decide whether or not I believed unwinding was the problem and decided to go to sleep instead.)

@nikic
Copy link
Contributor

nikic commented Nov 8, 2021

Isn't unwinding in the foreign -> Rust direction UB? At the very least, Rust's unwind tables are set up (last time I checked, I could be wrong) so that unwinding through Rust jumps to a handler that aborts. Seems like this is a pessimization accounting for a case that cannot happen.

It is UB, but currently only behind the c_unwind feature flag, for historical reasons.

Edit: hang on, how is this even disallowed at the IR level in the first place?

If the function first writes to its argument and then unwinds, without the call slot optimization the write would not be visible. With the optimization the write becomes visible.

Now, in this particular case the destination is an sret argument, and we could argue that sret arguments shouldn't be observed on the unwind path. That would allow the optimization to occur without nounwind. We currently don't specify such a requirement for sret, but I think it would be viable (and I'm pretty sure I've seen a discussion about it somewhere in the past...)

@mcy
Copy link
Contributor Author

mcy commented Nov 8, 2021

Ah! That makes sense. I figured that because it was marked sret it was assumed the caller would not read it if the function unwound.

Still doesn't explain why the version without MaybeUninit doesn't have this problem, though. I didn't read the -O0 IR too closely, but perhaps rustc emits sufficiently different IR for that function that it winds up not mattering? Or is this just a matter that the call @could_unwind; call @llvm.memcpy operations can't be reordered a priori (for a suitable definition of "reorder").

@nikic
Copy link
Contributor

nikic commented Nov 9, 2021

Still doesn't explain why the version without MaybeUninit doesn't have this problem, though. I didn't read the -O0 IR too closely, but perhaps rustc emits sufficiently different IR for that function that it winds up not mattering? Or is this just a matter that the call @could_unwind; call @llvm.memcpy operations can't be reordered a priori (for a suitable definition of "reorder").

Without MaybeUninit some MIR optimization takes care of this (try -Z mir-opt-level=0 to disable). I'm not familiar with MIR optimizations though, so I don't know which one does it / which limitations it has.

@nikic
Copy link
Contributor

nikic commented Dec 4, 2021

I started an llvm-dev thread on the topic: https://lists.llvm.org/pipermail/llvm-dev/2021-December/154164.html

Unfortunately, while looking into this I found a major bug in call slot optimization: The check introduced in llvm/llvm-project@703e488 does not look through pointer casts, and thus rarely triggers in practice. I'm afraid that once this is fixed (or once we have opaque pointers) this may commonly block the optimization for rust in practice.

@veber-alex
Copy link
Contributor

There is a regression with nightly compared to 1.64, the memcpy is back with opt-level of 2 or 3.
It still optimized with opt-level of z, 1 or s.
https://godbolt.org/z/43r8GjWcr

@nikic
Copy link
Contributor

nikic commented Oct 12, 2022

Goes away with -Z mir-opt-level=1, likely caused by MIR inlining. On the LLVM side, the issue appear to be a lifetime.start intrinsic blocking call slot optimization: https://llvm.godbolt.org/z/43oe7Mon9

@nikic
Copy link
Contributor

nikic commented Oct 13, 2022

Upstream patch: https://reviews.llvm.org/D135886

@nikic
Copy link
Contributor

nikic commented Apr 3, 2023

The aforementioned regression (for panic=abort) is fixed with the LLVM 16 upgrade on nightly.

@nikic nikic added the E-needs-test Call for participation: An issue has been fixed and does not reproduce, but no test has been added. label Apr 3, 2023
compiler-errors added a commit to compiler-errors/rust that referenced this issue Jan 4, 2024
…t, r=nikic

Add codegen test for RVO on MaybeUninit

Codegen test for rust-lang#90595. Currently, this only works with `-Cpanic=abort`, but hopefully in the [future](https://www.npopov.com/2024/01/01/This-year-in-LLVM-2023.html#writable-and-dead_on_unwind) it should also work in the presence of panics.

r? `@nikic`
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this issue Jan 4, 2024
…t, r=nikic

Add codegen test for RVO on MaybeUninit

Codegen test for rust-lang#90595. Currently, this only works with `-Cpanic=abort`, but hopefully in the [future](https://www.npopov.com/2024/01/01/This-year-in-LLVM-2023.html#writable-and-dead_on_unwind) it should also work in the presence of panics.

r? ``@nikic``
rust-timer added a commit to rust-lang-ci/rust that referenced this issue Jan 4, 2024
Rollup merge of rust-lang#119555 - Kobzol:maybeuninit-rvo-codegen-test, r=nikic

Add codegen test for RVO on MaybeUninit

Codegen test for rust-lang#90595. Currently, this only works with `-Cpanic=abort`, but hopefully in the [future](https://www.npopov.com/2024/01/01/This-year-in-LLVM-2023.html#writable-and-dead_on_unwind) it should also work in the presence of panics.

r? ``@nikic``
@Kobzol Kobzol removed the E-needs-test Call for participation: An issue has been fixed and does not reproduce, but no test has been added. label Jan 5, 2024
bors added a commit to rust-lang-ci/rust that referenced this issue Feb 19, 2024
Set writable and dead_on_unwind attributes for sret arguments

Set the `writable` and `dead_on_unwind` attributes for `sret` arguments. This allows call slot optimization to remove more memcpy's.

See https://llvm.org/docs/LangRef.html#parameter-attributes for the specification of these attributes.

Fixes rust-lang#90595.
bors added a commit to rust-lang-ci/rust that referenced this issue Apr 8, 2024
Set writable and dead_on_unwind attributes for sret arguments

Set the `writable` and `dead_on_unwind` attributes for `sret` arguments. This allows call slot optimization to remove more memcpy's.

See https://llvm.org/docs/LangRef.html#parameter-attributes for the specification of these attributes. In short, the statement we're making here is that:

 * The return slot is writable.
 * The return slot will not be read if the function unwinds.

Fixes rust-lang#90595.
bors added a commit to rust-lang-ci/rust that referenced this issue Apr 24, 2024
Set writable and dead_on_unwind attributes for sret arguments

Set the `writable` and `dead_on_unwind` attributes for `sret` arguments. This allows call slot optimization to remove more memcpy's.

See https://llvm.org/docs/LangRef.html#parameter-attributes for the specification of these attributes. In short, the statement we're making here is that:

 * The return slot is writable.
 * The return slot will not be read if the function unwinds.

Fixes rust-lang#90595.
@bors bors closed this as completed in 284f94f Apr 25, 2024
github-actions bot pushed a commit to rust-lang/miri that referenced this issue Apr 26, 2024
Set writable and dead_on_unwind attributes for sret arguments

Set the `writable` and `dead_on_unwind` attributes for `sret` arguments. This allows call slot optimization to remove more memcpy's.

See https://llvm.org/docs/LangRef.html#parameter-attributes for the specification of these attributes. In short, the statement we're making here is that:

 * The return slot is writable.
 * The return slot will not be read if the function unwinds.

Fixes rust-lang/rust#90595.
RalfJung pushed a commit to RalfJung/rust-analyzer that referenced this issue Apr 27, 2024
Set writable and dead_on_unwind attributes for sret arguments

Set the `writable` and `dead_on_unwind` attributes for `sret` arguments. This allows call slot optimization to remove more memcpy's.

See https://llvm.org/docs/LangRef.html#parameter-attributes for the specification of these attributes. In short, the statement we're making here is that:

 * The return slot is writable.
 * The return slot will not be read if the function unwinds.

Fixes rust-lang/rust#90595.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bug Category: This is a bug. I-slow Issue: Problems and improvements with respect to performance of generated code.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants