Failed assert `2 != 2` after update to LLVM 17 #115385

djkoloski · 2023-08-30T14:57:15Z

Running this code in release mode fails:

main.rs:

use serde::Deserialize;

#[derive(Deserialize)]
pub struct TestExecutionPlan {
    pub pos: usize,
    /// List of actions to performs.
    pub actions: Vec<String>,
    pub expected_reboot: bool,
    pub unexpected_reboot_count: u8,
    pub failures: Vec<String>,
}

fn main() {
    let config =
        "{\"pos\":2,\"actions\":[],\"expected_reboot\":true,\"unexpected_reboot_count\":0,\"failures\":[]}";
    let restored_plan: TestExecutionPlan =
        serde_json5::from_str(config).expect("Expected a valid JSON config.");
    assert_eq!(restored_plan.pos, 2);
}

.cargo/config.toml:

[target.x86_64-unknown-linux-gnu]
rustflags = [
    "-Ccodegen-units=1",
]

Cargo.toml

[package]
name = "rust_playground"
version = "0.1.0"
edition = "2021"

[dependencies]
serde = { version = "1.0", features = ["derive"] }
serde_json5 = "0.1"

When run with cargo run --release:

thread 'main' panicked at src/main.rs:18:5:
assertion `left == right` failed
  left: 2
 right: 2
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

This code passes under MIRI, so this is not the result of UB.

This does not repro on nightly, or even Fuchsia with the toolchain built upstream. We suspect that this may be because we set -Cpanic=abort on our toolchain, but we do have more differences from the upstream binaries. Edit: this turned out to be because we had -Ccodegen-units=1 set in our build.

We managed to bisect this in-tree down to #114048, which updated the LLVM submodule to 17.0.0. That, along with the nature of the bug, gives us confidence that this is likely a miscompile when using the codegen-units flag.

This reproduces with the latest nightly:

$ rustc --version
rustc 1.74.0-nightly (84a9f4c6e 2023-08-29)

as well as the nightly immediately after the LLVM upgrade:

$ rustc +nightly-2023-08-09 --version
rustc 1.73.0-nightly (f88a8b71c 2023-08-08)

The text was updated successfully, but these errors were encountered:

workingjubilee · 2023-08-30T21:17:38Z

This code passes under MIRI, so this is not the result of UB.

You may already know this, but: Unless you set all the "also warn on..." flags on at once, the Rust MIR interpreter allowing something to pass should only be taken as a probabilistic statement on UB. It defaults to conservative. There are several undecided points of the Rust operational model that may become explicitly UB but are not necessarily flagged as such by miri, and even though "Rust hasn't defined it as formally always-UB" means "rustc should not be attempting to suggest LLVM optimizes based on this", we sometimes discover LLVM has added an optimization that is technically valid according to the LLVM language's model if you interpret vague things in an... interesting way.

We also sometimes discover LLVM has simply added an invalid transformation, of course, which still seems more likely. "The compiler told you to reject the evidence of your eyes and ears. It was its final, most essential command."

djkoloski · 2023-08-30T21:32:59Z

Thank you, yes. I should clarify that I ran the crate through MIRI to ensure that this is not the result of UB in the Rust specification of this program. I did that because the repro currently pulls in two external crates which are not guaranteed to be UB-free. You are correct that this does not guarantee that UB doesn't sneak in during the translation from Rust semantics to LLVM semantics.

lqd · 2023-08-30T22:09:10Z

From a quick look, à la cargo rustc --profile release -q -- -Ccodegen-units=1 -Cllvm-args=-opt-bisect-limit=17447 --emit=llvm-ir, bisects to the JumpThreadingPass on the main function.

I haven't tried reducing yet so I'm not sure if the following IR has been extracted correctly, but here goes:

this unreduced IR doesn't reproduce the regression
this unreduced IR does reproduce the regression

The diff between the two is quite small, and the reproducer should reduce nicely. I'll try doing so tomorrow if no one gets to it before then.

(cc @rust-lang/wg-llvm in the meantime)

update: nothing substantial on the reduction yet , and I've asked the LLVM WG for help in this zulip topic.

djkoloski · 2023-08-31T20:05:30Z

Thanks for the help, I have reproduced the erroneous pass (-opt-bisect-limit is very helpful!). I threw together a script that compiles a .ll file, runs it to verify it does not assert, then runs the optimization pass, recompiles, and runs to verify it does assert. I have that plugged into llvm-reduce and am letting that run for a while to see if it can minimize the reproduction.

lqd · 2023-08-31T20:13:53Z

I'd love to take a look at the script: I had hoped to do exactly that, but had issues and went the other route of minimizing from the rust side (starting with serde_json5, but after doing that I wondered if I'd also have to minimize pest and serde and asked the WG for help with building and running the repro for llvm-reduce).

djkoloski · 2023-08-31T20:20:05Z

Here's a gist of the script I'm running with llvm-reduce. I was able to grab the .ll from right before the bad pass (in target/release/deps) and plug that into llvm-reduce.

DianQK · 2023-09-01T15:56:48Z

@rustbot claim

DianQK · 2023-09-02T04:14:40Z

Upstream issue: llvm/llvm-project#65195.

I shared my reduced method for this issue.
Some interesting steps may be generic.

Currently we bisected to JumpThreadingPass. I guess we assume one branch is unreachable.
This project disable LTO.
I change the code to

let restored_plan: TestExecutionPlan = serde_json5::from_str(config).expect("Expected a valid JSON config.");
if restored_plan.pos == 2 {
  good()
} else {
  bad()
}

Use llvm-nm to check the good symbol dismissed.
Also bisected to JumpThreadingPass
Run the IR before JumpThreadingPass with opt16. good also be removed. My guess is that JumpThreadingPass is just a failure and we need to continue to find the error.
(Key!) pipeline with opt-16: opt17 -O2 -S -opt-bisect-limit=n | opt16 -O2 -S | grep @good
Bisected to SimplifyCFG
Compare opt16 vs opt17 on the SimplifyCFG step
DONE ^^

PR coming soon (Going back to my parent's home later, so the PR will be tomorrow.).
I also need some trade-off thoughts for the PR.

djkoloski · 2023-09-03T01:50:32Z

In case it's useful, I also completed a reduction: good.ll, bad.ll. The diff between them is pretty small and also shows a branch being optimized out:

Reading through this a bit more closely, I think that llvm-reduce may have been too aggressive. It looks like it removed the branch checking that the deserialized value equals two, which is not what I wanted. This reduction may just end up being garbage.

DianQK · 2023-09-03T04:47:18Z

Sorry I got the wrong conclusion. The error should appear in JumpThreading. 😵😵‍💫🥹😭

DianQK · 2023-09-03T04:51:57Z

In case it's useful, I also completed a reduction: good.ll, bad.ll.

Maybe You could try llvm-extract on good.ll.

RalfJung · 2023-09-05T20:26:28Z

Thank you, yes. I should clarify that I ran the crate through MIRI to ensure that this is not the result of UB in the Rust specification of this program.

Unfortunately there are also still gaps between Miri and the Rust spec, as explained in Miri's readme.

However, to my knowledge Miri can detect all de-facto UB that is exploited by LLVM or MIR optimizations. So a difference between the behavior of Miri and the compiled program is always one of

the result of non-determinism (different absolute allocation addresses, different thread scheduling, ...)
a consequence of an integer-to-pointer cast (that's why Miri warns about them by default)
a bug in Miri
a bug in codegen

The main caveat is that future Miri/rustc updates can lead to UB reports in previously UB-free programs, if Miri starts detecting more of the UB permitted by the spec.

djkoloski · 2023-09-06T00:41:12Z

I have reverse-engineered a minimal repro with no deps based on the miscompiled LLVM IR:

main.rs

#[repr(i64)]
pub enum Boolean {
    False = 0,
    True = 1,
}

impl Clone for Boolean {
    fn clone(&self) -> Self {
        *self
    }
}

impl Copy for Boolean {}

#[link(name = "auxiliary")]
extern "C" {
    fn config_set_value(value: i64);
    fn set_value(foo: *mut i64);
}

/// # Safety
///
/// The next call to `set_value` must set its value to either 0 or 1 if x is
/// `true`.
#[no_mangle]
pub unsafe fn foo(x: bool) {
    let mut foo = core::mem::MaybeUninit::<i64>::uninit();
    // SAFETY: `set_value` always initializes the i64 it is given a pointer to.
    unsafe {
        set_value(foo.as_mut_ptr());
    }

    if x {
        // SAFETY: The caller has asserted that `set_value` will only set `foo`
        // to either 0 or 1.
        let l1 = unsafe { *foo.as_mut_ptr().cast::<Boolean>() };
        if matches!(l1, Boolean::False) {
            // SAFETY: We may write to `foo` here.
            unsafe {
                *foo.as_mut_ptr() = 0;
            }
        }
    }

    // SAFETY: `foo` is definitely initialized.
    let l2 = unsafe { *foo.as_mut_ptr() };
    if l2 == 2 {
        bar();
    }
}

#[no_mangle]
pub fn bar() {
    println!("Call to bar preserved!");
}

fn main() {
    let print = std::env::args().nth(1).unwrap() == "print";

    let (config, x) = if print {
        (2, false)
    } else {
        (0, true)
    };

    // SAFETY: (config, x) are always a valid pair to call `config_set_value`
    // and `foo` with.
    unsafe {
        config_set_value(config);
        foo(x);
    }
}

auxiliary.c

#include <stdint.h>

static int64_t i = 0;

void config_set_value(int64_t value) {
    i = value;
}

void set_value(int64_t *foo) {
    *foo = i;
}

.cargo/config.toml

[target.x86_64-unknown-linux-gnu]
rustflags = [
    "-Ccodegen-units=1",
    "-Lauxiliary",
]

And expresses between the two nightlies which bumped the LLVM version:

$ cargo +nightly-2023-08-08 run --release -- print
Call to bar preserved!
$ cargo +nightly-2023-08-09 run --release -- print

I'm moving this into a compiler codegen test and will close this issue when it is checked in.

See rust-lang#115385 for more details.

Rollup merge of rust-lang#115591 - djkoloski:issue_115385, r=cuviper Add regression test for LLVM 17-rc3 miscompile Closes rust-lang#115385, see that issue for more details.

djkoloski added the C-bug Category: This is a bug. label Aug 30, 2023

rustbot added the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Aug 30, 2023

djkoloski changed the title ~~Failed assert 2 != 2 on x86_64-fuchsia after update to LLVM 17~~ Failed assert 2 != 2 after update to LLVM 17 Aug 30, 2023

rustbot assigned DianQK Sep 1, 2023

DianQK mentioned this issue Sep 2, 2023

JumpThreading error removes call void @bar() instruction. llvm/llvm-project#65195

Closed

djkoloski mentioned this issue Sep 5, 2023

Merge tag llvmorg-17.0.0-rc4 into rustc/17.0-2023-07-29 rust-lang/llvm-project#151

Merged

djkoloski pushed a commit to djkoloski/rust that referenced this issue Sep 6, 2023

Add regression test for LLVM 17-rc3 miscompile

c18da3c

See rust-lang#115385 for more details.

djkoloski mentioned this issue Sep 6, 2023

Add regression test for LLVM 17-rc3 miscompile #115591

Merged

wesleywiser mentioned this issue Sep 7, 2023

Update LLVM submodule to 17.0.0-rc4 #115581

Merged

bors closed this as completed in 7a4904c Sep 11, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Failed assert `2 != 2` after update to LLVM 17 #115385

Failed assert `2 != 2` after update to LLVM 17 #115385

djkoloski commented Aug 30, 2023 •

edited by rustbot

Loading

workingjubilee commented Aug 30, 2023

djkoloski commented Aug 30, 2023

lqd commented Aug 30, 2023 •

edited

Loading

djkoloski commented Aug 31, 2023

lqd commented Aug 31, 2023

djkoloski commented Aug 31, 2023

DianQK commented Sep 1, 2023

DianQK commented Sep 2, 2023 •

edited

Loading

djkoloski commented Sep 3, 2023

DianQK commented Sep 3, 2023 •

edited

Loading

DianQK commented Sep 3, 2023

RalfJung commented Sep 5, 2023 •

edited

Loading

djkoloski commented Sep 6, 2023

Failed assert 2 != 2 after update to LLVM 17 #115385

Failed assert 2 != 2 after update to LLVM 17 #115385

Comments

djkoloski commented Aug 30, 2023 • edited by rustbot Loading

workingjubilee commented Aug 30, 2023

djkoloski commented Aug 30, 2023

lqd commented Aug 30, 2023 • edited Loading

djkoloski commented Aug 31, 2023

lqd commented Aug 31, 2023

djkoloski commented Aug 31, 2023

DianQK commented Sep 1, 2023

DianQK commented Sep 2, 2023 • edited Loading

djkoloski commented Sep 3, 2023

DianQK commented Sep 3, 2023 • edited Loading

DianQK commented Sep 3, 2023

RalfJung commented Sep 5, 2023 • edited Loading

djkoloski commented Sep 6, 2023

Failed assert `2 != 2` after update to LLVM 17 #115385

Failed assert `2 != 2` after update to LLVM 17 #115385

djkoloski commented Aug 30, 2023 •

edited by rustbot

Loading

lqd commented Aug 30, 2023 •

edited

Loading

DianQK commented Sep 2, 2023 •

edited

Loading

DianQK commented Sep 3, 2023 •

edited

Loading

RalfJung commented Sep 5, 2023 •

edited

Loading