Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

code less optimized when accessing a specific array element, only on nightly #111508

Open
antonilol opened this issue May 12, 2023 · 9 comments · Fixed by #125347
Open

code less optimized when accessing a specific array element, only on nightly #111508

antonilol opened this issue May 12, 2023 · 9 comments · Fixed by #125347
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. C-bug Category: This is a bug. C-optimization Category: An issue highlighting optimization opportunities or PRs implementing such E-needs-test Call for participation: An issue has been fixed and does not reproduce, but no test has been added. I-slow Issue: Problems and improvements with respect to performance of generated code. S-has-mcve Status: A Minimal Complete and Verifiable Example has been found for this issue T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@antonilol
Copy link
Contributor

I tried this code on rustc nightly (rustc 1.71.0-nightly (c4190f2d3 2023-05-07)) on godbolt and the assembly code contains some unnecessary checks, but not when changing the value of N to anything different than 24 (but not out of bounds for a) or changing the version to 1.69.0 or beta

const N: usize = 24;

pub fn example(a: Vec<u8>) -> u8 {
    if a.len() != 32 {
        return 0;
    }

    let a: [u8; 32] = a.try_into().unwrap();

    a[15] + a[N]
}

https://godbolt.org/z/WMWv5s57o

@antonilol antonilol added the C-bug Category: This is a bug. label May 12, 2023
@clubby789
Copy link
Contributor

clubby789 commented May 12, 2023

searched nightlies: from nightly-2022-11-01 to nightly-2023-05-12
regressed nightly: nightly-2023-04-30
searched commit range: f495605...87b1f89
regressed commit: f229949

bisected with cargo-bisect-rustc v0.6.6

Host triple: x86_64-unknown-linux-gnu
Reproduce with:

cargo bisect-rustc 2022-11-01 --prompt --script ./script.sh
-> #108106, cc @the8472

It looks like the vec is copied through an alloca instead of just being handled through pointers. The cause is probably the layout changing from type { { i64, ptr }, i64 } to type { { ptr, i64 }, i64 }

@clubby789 clubby789 added S-has-mcve Status: A Minimal Complete and Verifiable Example has been found for this issue A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. I-slow Issue: Problems and improvements with respect to performance of generated code. labels May 12, 2023
@the8472
Copy link
Member

the8472 commented May 12, 2023

The cause is probably the layout changing from type { { i64, ptr }, i64 } to type { { ptr, i64 }, i64 }

That's the intended layout. And I wouldn't expect pointer-in-the-middle to optimize better other than by accident.

And the optimization failure depends on which array element gets accessed after the vec has been turned into an array. So it shouldn't even be related to the Vec struct layout anymore. So it looks more like spooky action at the distance going on in LLVM.

It's also possible that the Result layout for the try_into changed.

@saethlin
Copy link
Member

The index 24 seems to be special. Don't know why.

There's also something very odd going on in the MIR for this: https://godbolt.org/z/f9Wa46cMr

    let mut _0: u8;                      // return place in scope 0 at /app/example.rs:3:31: 3:33
    let mut _2: usize;                   // in scope 0 at /app/example.rs:4:8: 4:15
    let mut _3: &std::vec::Vec<u8>;      // in scope 0 at /app/example.rs:4:8: 4:15
    let _4: [u8; 32];                    // in scope 0 at /app/example.rs:8:9: 8:10
    let mut _5: std::result::Result<[u8; 32], std::vec::Vec<u8>>; // in scope 0 at /app/example.rs:8:23: 8:35
    let mut _6: std::vec::Vec<u8>;       // in scope 0 at /app/example.rs:8:23: 8:24
    let mut _7: u8;                      // in scope 0 at /app/example.rs:10:5: 10:10
    let mut _8: u8;                      // in scope 0 at /app/example.rs:10:13: 10:17
    let mut _9: bool;                    // in scope 0 at /app/example.rs:11:1: 11:2
        _7 = _4[15 of 16];               // scope 1 at /app/example.rs:10:5: 10:10
        StorageLive(_8);                 // scope 1 at /app/example.rs:10:13: 10:17
        _8 = _4[24 of 25];               // scope 1 at /app/example.rs:10:13: 10:17

The 15 of 16 is a bit surprising to me. Maybe this is just a subtelty of our MIR dumps? But the array is length 32, so I would have expected to see a 32 there.

Cranking up the MIR inliner seems to improve things, but only if you don't also increase the inlining threshold for functions with #[inline]: https://godbolt.org/z/Ed4GvPe8d (I feel like LLVM should be more stable than this)

@workingjubilee
Copy link
Member

I am guessing this is on an inflection point of a "if x is less than y, do this" and an "if x is greater than y, do this" heuristic, where one of the cases should be "greater or equal than" or "less than or equal".

@erikdesjardins
Copy link
Contributor

In nightly looking at 24 vs. 25, the only difference in the LLVM IR we generate is the GEP index, as you would expect:

https://godbolt.org/z/zYcs9hhYc

-  %11 = getelementptr inbounds [32 x i8], ptr %a1, i64 0, i64 24
+  %11 = getelementptr inbounds [32 x i8], ptr %a1, i64 0, i64 25
  %_8 = load i8, ptr %11, align 1, !noundef !2

@nikic
Copy link
Contributor

nikic commented May 13, 2023

It looks like it gets SROAd (via bit insert/extract) if the index is 25 but not if it is 24. Haven't looked into why.

Something that would also fix this is to forward load from memcpy target to load from memcpy source in GVN, which seems like it would be a good idea to do anyway.

@cjgillot
Copy link
Contributor

The 15 of 16 is a bit surprising to me. Maybe this is just a subtelty of our MIR dumps? But the array is length 32, so I would have expected to see a 32 there.

This is not an issue. This is the ConstantIndex projection. 15 is the offset and 16 is the minimum length. That information is not used by codegen, so const-prop sets it to offset+1.

@workingjubilee workingjubilee added the C-optimization Category: An issue highlighting optimization opportunities or PRs implementing such label Oct 8, 2023
@erikdesjardins
Copy link
Contributor

Looks like this was fixed in 1.72: https://godbolt.org/z/W4j8qx8s7. I don't know why.

@clubby789 clubby789 added the E-needs-test Call for participation: An issue has been fixed and does not reproduce, but no test has been added. label Feb 24, 2024
tesuji added a commit to tesuji/rustc that referenced this issue May 20, 2024
tesuji added a commit to tesuji/rustc that referenced this issue Jun 8, 2024
tesuji added a commit to tesuji/rustc that referenced this issue Jun 9, 2024
bors added a commit to rust-lang-ci/rust that referenced this issue Jun 9, 2024
bors added a commit to rust-lang-ci/rust that referenced this issue Jun 10, 2024
bors added a commit to rust-lang-ci/rust that referenced this issue Jun 11, 2024
bors added a commit to rust-lang-ci/rust that referenced this issue Jun 13, 2024
@bors bors closed this as completed in 7ac6c2f Jun 14, 2024
@workingjubilee
Copy link
Member

#130726 "unfixes" this if it lands, unfortunately.

@jieyouxu jieyouxu added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Sep 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. C-bug Category: This is a bug. C-optimization Category: An issue highlighting optimization opportunities or PRs implementing such E-needs-test Call for participation: An issue has been fixed and does not reproduce, but no test has been added. I-slow Issue: Problems and improvements with respect to performance of generated code. S-has-mcve Status: A Minimal Complete and Verifiable Example has been found for this issue T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

9 participants