Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

C without UB is translated to Rust with UB #301

Open
jrmuizel opened this issue Sep 18, 2020 · 15 comments
Open

C without UB is translated to Rust with UB #301

jrmuizel opened this issue Sep 18, 2020 · 15 comments
Assignees
Labels
bug Something isn't working

Comments

@jrmuizel
Copy link

struct Foo {
    int len;
};

void dec(Foo* f) {
    f->len--;
}

void bar() {
    Foo f = {5};
    struct Foo *fp = &f;
    dec(fp);
    f.len = 6;
    dec(fp);
}

translates to

#[derive(Copy, Clone)]
#[repr(C)]
pub struct Foo {
    pub len: libc::c_int,
}
#[no_mangle]
pub unsafe extern "C" fn dec(mut f: *mut Foo) { (*f).len -= 1; }
#[no_mangle]
pub unsafe extern "C" fn bar() {
    let mut f: Foo = { let mut init = Foo{len: 5 as libc::c_int,}; init };
    let mut fp: *mut Foo = &mut f;
    dec(fp);
    f.len = 6 as libc::c_int;
    dec(fp);
}

which when run under miri gives:

error: Undefined Behavior: no item granting read access to tag <untagged> found in borrow stack.
  --> src/main.rs:10:49
   |
10 | pub unsafe extern "C" fn dec(mut f: *mut Foo) { (*f).len -= 1; }
   |                                                 ^^^^^^^^^^^^^ no item granting read access to tag <untagged> found in borrow stack.
   |
   = help: this indicates a potential bug in the program: it performed an invalid operation, but the rules it violated are still experimental
   = help: see https://github.com/rust-lang/unsafe-code-guidelines/blob/master/wip/stacked-borrows.md for further information
           
   = note: inside `dec` at src/main.rs:10:49
note: inside `bar` at src/main.rs:17:5
  --> src/main.rs:17:5
   |
17 |     dec(fp);
   |     ^^^^^^^
note: inside `main` at src/main.rs:21:14
  --> src/main.rs:21:14
   |
21 |     unsafe { bar() }
   |              ^^^^^
   = note: inside `<fn() as std::ops::FnOnce<()>>::call_once - shim(fn())` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ops/function.rs:227:5
   = note: inside `std::sys_common::backtrace::__rust_begin_short_backtrace::<fn(), ()>` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/sys_common/backtrace.rs:137:18
   = note: inside closure at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/rt.rs:66:18
   = note: inside `std::ops::function::impls::<impl std::ops::FnOnce<()> for &dyn std::ops::Fn() -> i32 + std::marker::Sync + std::panic::RefUnwindSafe>::call_once` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ops/function.rs:259:13
   = note: inside `std::panicking::r#try::do_call::<&dyn std::ops::Fn() -> i32 + std::marker::Sync + std::panic::RefUnwindSafe, i32>` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panicking.rs:381:40
   = note: inside `std::panicking::r#try::<i32, &dyn std::ops::Fn() -> i32 + std::marker::Sync + std::panic::RefUnwindSafe>` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panicking.rs:345:19
   = note: inside `std::panic::catch_unwind::<&dyn std::ops::Fn() -> i32 + std::marker::Sync + std::panic::RefUnwindSafe, i32>` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panic.rs:382:14
   = note: inside `std::rt::lang_start_internal` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/rt.rs:51:25
   = note: inside `std::rt::lang_start::<()>` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/rt.rs:65:5

This happens because the mutable borrow of f is invalidated when f.len is mutated.

@ahomescu
Copy link
Contributor

ahomescu commented Sep 18, 2020

This happens because the mutable borrow of f is invalidated when f.len is mutated.

Sounds like the fix will be non-trivial, if possible at all, since we try to perfectly preserve the semantics and structure (somewhat) of the original C program. Re-borrowing every value after every possible mutation is going to require a lot of additional code and might introduce new bugs.

@TheDan64
Copy link
Contributor

Isn't the solution to just do let mut fp = &raw mut f; once raw references (or whatever they're called) are stabilized/nightly-available? This is a known issue with downcasting refs to ptrs in general

@jrmuizel
Copy link
Author

Isn't the solution to just do let mut fp = &raw mut f; once raw references (or whatever they're called) are stabilized/nightly-available? This is a known issue with downcasting refs to ptrs in general

I thought so too but that doesn't seem to fix it:
https://play.rust-lang.org/?version=nightly&mode=release&edition=2018&gist=20e0c25fd0bce187fb2ec03df11ff945

@TheDan64
Copy link
Contributor

Personally, I'd bet MIRI just doesn't understand raw references yet. This is the exact problem they were intended to fix, as I understand it.

@jrmuizel
Copy link
Author

Personally, I'd bet MIRI just doesn't understand raw references yet. This is the exact problem they were intended to fix, as I understand it.

@RalfJung what do you think?

@ahomescu
Copy link
Contributor

This is the exact problem they were intended to fix, as I understand it.

The RFC gives other motivations: taking pointers to unaligned structure fields (which you can't do via an intermediate reference) and to fields in structures at invalid addresses (e.g. at 0x0 to implement offsetof). On the other hand, it might solve the problem in this issue because of this part:

Lowering to MIR should not insert an implicit reborrow of <place> in &raw mut <place>; that reborrow would assert validity and thus defeat the entire point. The borrow checker should do the usual checks on the place used in &raw, but can just ignore the result of this operation and the newly created "reference" can have any lifetime. When translating MIR to LLVM, nothing special has to happen as references and raw pointers have the same LLVM type anyway; the new operation behaves like Ref. When interpreting MIR in the Miri engine, the engine will know not to enforce any invariants on the raw pointer created by &raw.

@RalfJung
Copy link

Miri does understand raw pointers. However, Rust has aliasing rules that C does not have, and those are becoming a problem here.

Specifically, when you have a local variable f and write f.len = ..., then (like if you had used a mutable reference) this asserts to the compiler that f is the unique way to access this memory, so any outstanding other pointers (safe references or unsafe raw pointers) now become invalid. This is crucial to enable the compiler to perform optimizations based on these aliasing rules.

Isn't the solution to just do let mut fp = &raw mut f; once raw references (or whatever they're called) are stabilized/nightly-available? This is a known issue with downcasting refs to ptrs in general

Yes that sounds like the most consistent approach to me -- do that and then never use f again. If everything goes through raw pointers, there are no alias assumptions.

I thought so too but that doesn't seem to fix it:

That is still mixing raw with non-raw accesses. Here's the "raw-only" version.

@jrmuizel
Copy link
Author

std::ptr::addr_of_mut will be in Rust 1.51 which should allow fixing this.

@ahomescu
Copy link
Contributor

ahomescu commented May 9, 2022

Looks like std::ptr::addr_of and addr_of_mut got stabilized, so we should use them.

@kkysen
Copy link
Contributor

kkysen commented Jun 3, 2022

Hi @RalfJung! We're noticing some unexpected behavior with miri with regards to addr_of_mut! and &mut x as *mut _ casts, and we wanted to make sure we're understanding things correctly and/or if there are any bugs in miri, so we have a few questions/examples to clarify.

We're using this as scaffolding code:

use std::ptr::addr_of_mut;

static mut X: i32 = 1;

unsafe fn dec(x: *mut i32) { 
    *x -= 1;
}

And all the examples below are wrapped in unsafe {}.

  1. Why does &mut x invalidate mutable pointers to x but &mut x as *mut _ doesn't?

We noticed that this passes miri:
playground

let mut x: i32 = 1;
let p = &mut x as *mut i32;
&mut x as *mut i32;
dec(p);

but not this:
playground

let mut x: i32 = 1;
let p = &mut x as *mut i32;
let _a = &mut x;
dec(p);

In the &mut x as *mut _ case, why does creating the intermediary &mut x not invalidate mutable pointers to x? Or rather, in
playground

let a = &mut x;         // Invalidates `p`
let _b = a as *mut i32; // Seemingly revalidates it

why does the &mut x invalidate p and then as *mut i32 revalidate it? We would've thought addr_of_mut! would be needed here to avoid creating the intermediary &mut x.

  1. Why does addr_of_mut! not work in some cases where &mut x as *mut _ does?

We noticed that while addr_of_mut! works fine for local variables, like this:
playground

let mut x: i32 = 1;
let p = &mut x as *mut i32;
*addr_of_mut!(x) = 2;
dec(p);

it doesn't work for static variables, like this:
playground

let p = &mut X as *mut i32;
*addr_of_mut!(X) = 2;
dec(p);

Also, for static variables, it works as long as the addr_of_mut! does not come after a &mut x as *mut _. That is,
playground

let p = &mut X as *mut i32;
*addr_of_mut!(X) = 2;
dec(p);

doesn't work, but these 3 other orderings do:
playground

let p = addr_of_mut!(X);
*addr_of_mut!(X) = 2;
dec(p);

playground

let p = addr_of_mut!(X);
*(&mut X as *mut i32) = 2;
dec(p);

playground

let p = &mut X as *mut i32;
*(&mut X as *mut i32) = 2;
dec(p);

Is this the correct behavior, and if it is, could you explain what exactly is happening? Or is &mut x as *mut _ working in more cases than it should? Or is addr_of_mut! working in fewer cases than it should? And why are static variables behaving differently from local variables?

Thanks for all the help!

@RalfJung
Copy link

RalfJung commented Jun 3, 2022

Many of your questions, I think, are related to the fact that Miri, by default, treats raw pointers as "untagged". This is necessary to accept a bunch of real-world code out there that does nasty things with integer-pointer casts, but can lead to very surprising behavior. We are actively working on fixing this, but we are not there yet.

So, what happens when you run all of your examples with -Zmiri-tag-raw-pointers? Unfortunately, this is not possible on the playground, so it is more annoying to explore. Your first example then fails, as you (probably?) expected:

let mut x: i32 = 1;
let p = &mut x as *mut i32;
&mut x as *mut i32;
dec(p);

shows

error: Undefined Behavior: attempting a read access using <3072> at alloc1653[0x0], but that tag does not exist in the borrow stack for this location
  --> example.rs:6:5
   |
6  |     *x -= 1;
   |     ^^^^^^^
   |     |
   |     attempting a read access using <3072> at alloc1653[0x0], but that tag does not exist in the borrow stack for this location
   |     this error occurs as part of an access at alloc1653[0x0..0x4]
   |
   = help: this indicates a potential bug in the program: it performed an invalid operation, but the rules it violated are still experimental
   = help: see https://github.com/rust-lang/unsafe-code-guidelines/blob/master/wip/stacked-borrows.md for further information
help: <3072> was created by a retag at offsets [0x0..0x4]
  --> example.rs:12:9
   |
12 | let p = &mut x as *mut i32;
   |         ^^^^^^
help: <3072> was later invalidated at offsets [0x0..0x4]
  --> example.rs:13:1
   |
13 | &mut x as *mut i32;
   | ^^^^^^
   = note: inside `dec` at example.rs:6:5

If there is still surprising behavior that you see with this flag, please let me know. :)

@kkysen
Copy link
Contributor

kkysen commented Jun 7, 2022

Thanks so much for the help! When I run all the examples with -Zmiri-tag-raw-pointers, the only one that still passes is the one that uses addr_of_mut! everywhere:

playground

let p = addr_of_mut!(X);
*addr_of_mut!(X) = 2;
dec(p);

which is what I was expecting.

@RalfJung
Copy link

RalfJung commented Jun 7, 2022

That's great to hear. :) Once rust-lang/miri#2133 is finished, it'd be great if you could test them all again using the default flags; the hope is they should still all behave as you expect then.

@kkysen
Copy link
Contributor

kkysen commented Jun 7, 2022

Okay, once rust-lang/miri#2133 lands, I'll re-test all the examples.

@kkysen
Copy link
Contributor

kkysen commented Jun 7, 2022

The current plan is to:

  1. Fix all to-pointer casts that go through references to instead use addr_of! and addr_of_mut!. That is,

    • &mut x as *mut T becomes addr_of_mut!(x)
    • &x as *const T becomes addr_of!(x)

    Doing it this way, specifically for all such casts and for *mut and *const pointers, has numerous advantages:

    • It is correct in all cases because it does not create intermediate references.
    • It more closely matches C semantics, which don't contain any of the semantics of Rust references, so we should avoid creating them.
    • It is shorter than the & borrow and as cast, which is better for readability.
    • It has a more descriptive name, so it's clear what is happening compared to the & borrow and as cast.
    • It has higher precedence than an as cast, so extra parentheses aren't needed, which hurts readability.
  2. Then we fix the direct access to owned values (locals and statics) that have had their address taken. Instead, we immediately shadow the owned value with a *mut pointer to it (gotten through addr_of_mut!). Then all reads and writes go through that *mut pointer instead of the original owned value.

@kkysen kkysen added the bug Something isn't working label Jun 17, 2022
@kkysen kkysen self-assigned this Jun 17, 2022
@kkysen kkysen changed the title C without undefined behaviour is translated to Rust code with undefined behaviour C without UB is translated to Rust with UB Jun 29, 2022
zeroqn referenced this issue in zeroqn/blake2b-ref.rs Aug 10, 2022
jjyr referenced this issue in jjyr/blake2b-ref.rs Aug 10, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

5 participants