Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Failed assert 2 != 2 after update to LLVM 17 #115385

Closed
djkoloski opened this issue Aug 30, 2023 · 13 comments · Fixed by #115591
Closed

Failed assert 2 != 2 after update to LLVM 17 #115385

djkoloski opened this issue Aug 30, 2023 · 13 comments · Fixed by #115591
Assignees
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. C-bug Category: This is a bug. P-critical Critical priority regression-from-stable-to-beta Performance or correctness regression from stable to beta. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@djkoloski
Copy link
Contributor

djkoloski commented Aug 30, 2023

Running this code in release mode fails:

main.rs:

use serde::Deserialize;

#[derive(Deserialize)]
pub struct TestExecutionPlan {
    pub pos: usize,
    /// List of actions to performs.
    pub actions: Vec<String>,
    pub expected_reboot: bool,
    pub unexpected_reboot_count: u8,
    pub failures: Vec<String>,
}

fn main() {
    let config =
        "{\"pos\":2,\"actions\":[],\"expected_reboot\":true,\"unexpected_reboot_count\":0,\"failures\":[]}";
    let restored_plan: TestExecutionPlan =
        serde_json5::from_str(config).expect("Expected a valid JSON config.");
    assert_eq!(restored_plan.pos, 2);
}

.cargo/config.toml:

[target.x86_64-unknown-linux-gnu]
rustflags = [
    "-Ccodegen-units=1",
]

Cargo.toml

[package]
name = "rust_playground"
version = "0.1.0"
edition = "2021"

[dependencies]
serde = { version = "1.0", features = ["derive"] }
serde_json5 = "0.1"

When run with cargo run --release:

thread 'main' panicked at src/main.rs:18:5:
assertion `left == right` failed
  left: 2
 right: 2
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

This code passes under MIRI, so this is not the result of UB.

This does not repro on nightly, or even Fuchsia with the toolchain built upstream. We suspect that this may be because we set -Cpanic=abort on our toolchain, but we do have more differences from the upstream binaries. Edit: this turned out to be because we had -Ccodegen-units=1 set in our build.

We managed to bisect this in-tree down to #114048, which updated the LLVM submodule to 17.0.0. That, along with the nature of the bug, gives us confidence that this is likely a miscompile when using the codegen-units flag.

This reproduces with the latest nightly:

$ rustc --version
rustc 1.74.0-nightly (84a9f4c6e 2023-08-29)

as well as the nightly immediately after the LLVM upgrade:

$ rustc +nightly-2023-08-09 --version
rustc 1.73.0-nightly (f88a8b71c 2023-08-08)
@djkoloski djkoloski added the C-bug Category: This is a bug. label Aug 30, 2023
@rustbot rustbot added the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Aug 30, 2023
@djkoloski djkoloski changed the title Failed assert 2 != 2 on x86_64-fuchsia after update to LLVM 17 Failed assert 2 != 2 after update to LLVM 17 Aug 30, 2023
@tmandry tmandry added A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. P-critical Critical priority regression-from-stable-to-beta Performance or correctness regression from stable to beta. and removed needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. labels Aug 30, 2023
@workingjubilee
Copy link
Member

This code passes under MIRI, so this is not the result of UB.

You may already know this, but: Unless you set all the "also warn on..." flags on at once, the Rust MIR interpreter allowing something to pass should only be taken as a probabilistic statement on UB. It defaults to conservative. There are several undecided points of the Rust operational model that may become explicitly UB but are not necessarily flagged as such by miri, and even though "Rust hasn't defined it as formally always-UB" means "rustc should not be attempting to suggest LLVM optimizes based on this", we sometimes discover LLVM has added an optimization that is technically valid according to the LLVM language's model if you interpret vague things in an... interesting way.

We also sometimes discover LLVM has simply added an invalid transformation, of course, which still seems more likely. "The compiler told you to reject the evidence of your eyes and ears. It was its final, most essential command."

@djkoloski
Copy link
Contributor Author

Thank you, yes. I should clarify that I ran the crate through MIRI to ensure that this is not the result of UB in the Rust specification of this program. I did that because the repro currently pulls in two external crates which are not guaranteed to be UB-free. You are correct that this does not guarantee that UB doesn't sneak in during the translation from Rust semantics to LLVM semantics.

@lqd
Copy link
Member

lqd commented Aug 30, 2023

From a quick look, à la cargo rustc --profile release -q -- -Ccodegen-units=1 -Cllvm-args=-opt-bisect-limit=17447 --emit=llvm-ir, bisects to the JumpThreadingPass on the main function.

I haven't tried reducing yet so I'm not sure if the following IR has been extracted correctly, but here goes:

The diff between the two is quite small, and the reproducer should reduce nicely. I'll try doing so tomorrow if no one gets to it before then.

(cc @rust-lang/wg-llvm in the meantime)


update: nothing substantial on the reduction yet , and I've asked the LLVM WG for help in this zulip topic.

@djkoloski
Copy link
Contributor Author

Thanks for the help, I have reproduced the erroneous pass (-opt-bisect-limit is very helpful!). I threw together a script that compiles a .ll file, runs it to verify it does not assert, then runs the optimization pass, recompiles, and runs to verify it does assert. I have that plugged into llvm-reduce and am letting that run for a while to see if it can minimize the reproduction.

@lqd
Copy link
Member

lqd commented Aug 31, 2023

I'd love to take a look at the script: I had hoped to do exactly that, but had issues and went the other route of minimizing from the rust side (starting with serde_json5, but after doing that I wondered if I'd also have to minimize pest and serde and asked the WG for help with building and running the repro for llvm-reduce).

@djkoloski
Copy link
Contributor Author

Here's a gist of the script I'm running with llvm-reduce. I was able to grab the .ll from right before the bad pass (in target/release/deps) and plug that into llvm-reduce.

@DianQK
Copy link
Member

DianQK commented Sep 1, 2023

@rustbot claim

@DianQK
Copy link
Member

DianQK commented Sep 2, 2023

Upstream issue: llvm/llvm-project#65195.

I shared my reduced method for this issue.
Some interesting steps may be generic.

  1. Currently we bisected to JumpThreadingPass. I guess we assume one branch is unreachable.
  2. This project disable LTO.
  3. I change the code to
let restored_plan: TestExecutionPlan = serde_json5::from_str(config).expect("Expected a valid JSON config.");
if restored_plan.pos == 2 {
  good()
} else {
  bad()
}
  1. Use llvm-nm to check the good symbol dismissed.
  2. Also bisected to JumpThreadingPass
  3. Run the IR before JumpThreadingPass with opt16. good also be removed. My guess is that JumpThreadingPass is just a failure and we need to continue to find the error.
  4. (Key!) pipeline with opt-16: opt17 -O2 -S -opt-bisect-limit=n | opt16 -O2 -S | grep @good
  5. Bisected to SimplifyCFG
  6. Compare opt16 vs opt17 on the SimplifyCFG step
  7. DONE ^^

PR coming soon (Going back to my parent's home later, so the PR will be tomorrow.).
I also need some trade-off thoughts for the PR.

@djkoloski
Copy link
Contributor Author

In case it's useful, I also completed a reduction: good.ll, bad.ll. The diff between them is pretty small and also shows a branch being optimized out:

image

Reading through this a bit more closely, I think that llvm-reduce may have been too aggressive. It looks like it removed the branch checking that the deserialized value equals two, which is not what I wanted. This reduction may just end up being garbage.

@DianQK
Copy link
Member

DianQK commented Sep 3, 2023

Sorry I got the wrong conclusion. The error should appear in JumpThreading. 😵😵‍💫🥹😭

@DianQK
Copy link
Member

DianQK commented Sep 3, 2023

In case it's useful, I also completed a reduction: good.ll, bad.ll.

Maybe You could try llvm-extract on good.ll.

@RalfJung
Copy link
Member

RalfJung commented Sep 5, 2023

Thank you, yes. I should clarify that I ran the crate through MIRI to ensure that this is not the result of UB in the Rust specification of this program.

Unfortunately there are also still gaps between Miri and the Rust spec, as explained in Miri's readme.

However, to my knowledge Miri can detect all de-facto UB that is exploited by LLVM or MIR optimizations. So a difference between the behavior of Miri and the compiled program is always one of

  • the result of non-determinism (different absolute allocation addresses, different thread scheduling, ...)
  • a consequence of an integer-to-pointer cast (that's why Miri warns about them by default)
  • a bug in Miri
  • a bug in codegen

The main caveat is that future Miri/rustc updates can lead to UB reports in previously UB-free programs, if Miri starts detecting more of the UB permitted by the spec.

@djkoloski
Copy link
Contributor Author

I have reverse-engineered a minimal repro with no deps based on the miscompiled LLVM IR:

main.rs

#[repr(i64)]
pub enum Boolean {
    False = 0,
    True = 1,
}

impl Clone for Boolean {
    fn clone(&self) -> Self {
        *self
    }
}

impl Copy for Boolean {}

#[link(name = "auxiliary")]
extern "C" {
    fn config_set_value(value: i64);
    fn set_value(foo: *mut i64);
}

/// # Safety
///
/// The next call to `set_value` must set its value to either 0 or 1 if x is
/// `true`.
#[no_mangle]
pub unsafe fn foo(x: bool) {
    let mut foo = core::mem::MaybeUninit::<i64>::uninit();
    // SAFETY: `set_value` always initializes the i64 it is given a pointer to.
    unsafe {
        set_value(foo.as_mut_ptr());
    }

    if x {
        // SAFETY: The caller has asserted that `set_value` will only set `foo`
        // to either 0 or 1.
        let l1 = unsafe { *foo.as_mut_ptr().cast::<Boolean>() };
        if matches!(l1, Boolean::False) {
            // SAFETY: We may write to `foo` here.
            unsafe {
                *foo.as_mut_ptr() = 0;
            }
        }
    }

    // SAFETY: `foo` is definitely initialized.
    let l2 = unsafe { *foo.as_mut_ptr() };
    if l2 == 2 {
        bar();
    }
}

#[no_mangle]
pub fn bar() {
    println!("Call to bar preserved!");
}

fn main() {
    let print = std::env::args().nth(1).unwrap() == "print";

    let (config, x) = if print {
        (2, false)
    } else {
        (0, true)
    };

    // SAFETY: (config, x) are always a valid pair to call `config_set_value`
    // and `foo` with.
    unsafe {
        config_set_value(config);
        foo(x);
    }
}

auxiliary.c

#include <stdint.h>

static int64_t i = 0;

void config_set_value(int64_t value) {
    i = value;
}

void set_value(int64_t *foo) {
    *foo = i;
}

.cargo/config.toml

[target.x86_64-unknown-linux-gnu]
rustflags = [
    "-Ccodegen-units=1",
    "-Lauxiliary",
]

And expresses between the two nightlies which bumped the LLVM version:

$ cargo +nightly-2023-08-08 run --release -- print
Call to bar preserved!
$ cargo +nightly-2023-08-09 run --release -- print

I'm moving this into a compiler codegen test and will close this issue when it is checked in.

djkoloski pushed a commit to djkoloski/rust that referenced this issue Sep 6, 2023
@bors bors closed this as completed in 7a4904c Sep 11, 2023
rust-timer added a commit to rust-lang-ci/rust that referenced this issue Sep 11, 2023
Rollup merge of rust-lang#115591 - djkoloski:issue_115385, r=cuviper

Add regression test for LLVM 17-rc3 miscompile

Closes rust-lang#115385, see that issue for more details.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. C-bug Category: This is a bug. P-critical Critical priority regression-from-stable-to-beta Performance or correctness regression from stable to beta. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants