Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Miri tests failing #135

Closed
sharksforarms opened this issue Aug 3, 2021 · 9 comments
Closed

Miri tests failing #135

sharksforarms opened this issue Aug 3, 2021 · 9 comments

Comments

@sharksforarms
Copy link
Contributor

I noticed the following recently in miri tests in deku: https://github.com/sharksforarms/deku/runs/3231975319?check_suite_focus=true#step:4:301

error: Undefined Behavior: pointer to alloc961099 was dereferenced after this allocation got freed
   --> /home/runner/.cargo/registry/src/github.com-1ecc6299db9ec823/bitvec-0.22.3/src/ptr/span.rs:418:12
    |
418 |         unsafe { &*self.to_bitslice_ptr() }
    |                  ^^^^^^^^^^^^^^^^^^^^^^^^ pointer to alloc961099 was dereferenced after this allocation got freed
    |
    = help: this indicates a bug in the program: it performed an invalid operation, and caused Undefined Behavior
    = help: see https://doc.rust-lang.org/nightly/reference/behavior-considered-undefined.html for further information
            

I was able to bisect it down to 3a6bc02 in bitvec using cargo +nightly miri test 2>&1 | grep "test array::tests::ops ... ok"

What's weird is that running the singular test works fine, and -- --test-threads=1 passes too, but running all tests throws an error.

@niluxv
Copy link

niluxv commented Aug 4, 2021

I don't think this has anything to do with commit 3a6bc02 and it is probably a bug in miri. I think this happens when a BitSlice is allocated at the same point in memory where there has been a previous allocation. That also explains why the issue is extremely dependant on all kinds of environment settings like filtering on tests, test threads and small code changes like in 3a6bc02.

The code which triggers the error

unsafe { &*self.to_bitslice_ptr() }

is sound because self.to_bitslice_ptr() gives a *const BitSlice, which is basically a *const [()]. When the pointer doesn't point to an allocation, miri knows that this is fine, but when it points to an allocation that has previously been used and is freed/deallocated, this somehow triggers the deref to freed allocation error.

Examples in playground showing the above described behaviour:
fresh dangling pointer, miri ok
pointer to freed allocation (thus dangling), miri error

@niluxv
Copy link

niluxv commented Aug 29, 2021

It was pointed out in the referenced miri issue by RalfJung that it might not actually be sound to dereference a pointer to a ZST when the pointer points into a previous deallocated allocation. See rust-lang/unsafe-code-guidelines#93.

The issue is that bitvec encodes the in-byte offset in unused bits of a reference to a ZST. A reference (contrary to a raw pointer) needs to always be valid for reads, so that it is UB when the reference points into memory that is part of a deallocated allocation. Fortunately on Linux x86_64 the reference cannot point to deallocated memory, since the coding bits would otherwise be unused (all zero) by the memory address space design. However, miri is platform independent and cannot assume such restrictions on the memory address space.

@myrrlyn
Copy link
Collaborator

myrrlyn commented Nov 20, 2021

I apologize for my radio silence; it's been a wild fall for me and I've been basically heads-down in pushing for 1.0 and haven't looked at my GitHub notifs at all.

This has been occurring for me also, essentially exactly as you all have described. I've been trying to figure out a way to bypass it that maintains all the necessary invariants, but because it is a transient bug, I haven't been able to really have much success with it.

The main problem appears to be a tension between behavior I have been explicitly asked to maintain for compatibility with [T], and Miri's comprehension of reference behavior. While bitvec is willing to conjure zero-length &[()] references using data pointers other than the canonical dangling, in general it shouldn't do so using addresses that have become deällocated. This is not my first time encountering strangeness with stack items, though.

What stumps me is that the references Miri believes are faulty appear as [(); 0]. If they were non-zero apparent length, I would be more understanding, because that would indicate Miri doesn't understand the BitSpan encoding and is being conservatively harsh. Faulting on references to [(); 0], though, is a different issue.

I don't really know if there's anything I can particularly do to convince Miri that I'm not dereferencing deällocated memory. I haven't seen this fault in a while, but I also haven't changed any of the faulting codepaths since before it went away, so I'm very conscious that there's nothing I did to resolve it.

I think this is something we're basically going to have to live with. I'd like to have more communication with the Miri team about not investigating provenance on [(); N] dereferences, since one of the major purposes of ZSTs is that they do not exist in memory. Miri's conception of provenance irritates me enough that I've strongly considered being more aggressive about routing pointers through usize and destroying provenance information, but there's not a lot that I can do to the bit-pattern of an address that LLVM will dead-code eliminate but Miri can't also see through.

@RalfJung
Copy link

RalfJung commented Dec 3, 2021

(Coming here from rust-lang/miri#1866 (comment).)

I'd like to have more communication with the Miri team

I'm happy to figure out what is needed here. :) I will note though that the problem reported against Miri above is caused by usize being used too much, and if provenance would be preserved better Miri would be less confused. ;)

Also, while I would love to put further exemptions on provenance for zero-sized things, sadly I could not not get confirmation from LLVM that they are okay with this. (Specifically, I asked if getelementptr inbounds with offset 0 is okay no matter the provenance, and to the extend that I got an answer it seemed to be 'no'.) This is being discussed in rust-lang/unsafe-code-guidelines#93 and rust-lang/unsafe-code-guidelines#299.

@RalfJung
Copy link

RalfJung commented Dec 6, 2021

I will note though that the problem reported against Miri above is caused by usize being used too much, and if provenance would be preserved better Miri would be less confused. ;)

Specifically, as @tavianator noted here, avoiding the usize-to-ptr cast in this line fixes at least some provenance problems. That cast can be avoided using a function like this:

// Converts 'addr' to a pointer using the provenance of 'prov'.
fn int_to_ptr_with_provenance<T>(addr: usize, prov: *const T) -> *const T {
  // Nowhere in this function do we cast an integer to a pointer!
  let ptr = prov.cast::<u8>();
  ptr.wrapping_add(addr.wrapping_sub(ptr as usize)).cast()
}

This function makes it explicit which provenance the new pointer should have (in the bitvec code linked above, that would be the provenance of addr.to_const()), thus avoiding any guesswork that the Rust semantics (and Miri) otherwise have to do to produce a provenance for a ptr created from a usize.

On the other hand I have no idea if the compiler will be able to entirely optimize away something like int_to_ptr_with_provenance -- one would hope LLVM Machine IR optimizations can take care of things like that, but I don't know.

@myrrlyn
Copy link
Collaborator

myrrlyn commented Dec 21, 2021

I will note though that the problem reported against Miri above is caused by usize being used too much, and if provenance would be preserved better Miri would be less confused. ;)

I have been habitually taking an adversarial view of Miri's understanding, largely because I don't understand it, and am always happy to be told to change my tune :)

I also always appreciate free code to do a better job. To be sure I understand it, this function takes a computed new address addr, an original pointer with provenance information attached prov, and then computes prov + (addr - prov) within the provenance space, resulting in a new value of addr but with the provenance information from prov?

On the other hand I have no idea if the compiler will be able to entirely optimize away something like int_to_ptr_with_provenance -- one would hope LLVM Machine IR optimizations can take care of things like that, but I don't know.

When I plug BitSpan::new_unchecked, including your int_to_ptr_with_provenance, into Godbolt and instantiate it, the debug build has an absurdly massive amount of mov instructions, and the release build looks like this:

example::make_bitspan_u32:
        mov     rax, rdi
        shl     rsi, 5
        movzx   ecx, dl
        sub     rsi, rcx
        and     edx, 3
        lea     rdx, [rdx + 8*rsi]
        ret

I have long ago stopped pretending that I need to care about LLVM's desires, because lots of people smarter and more dedicated than me have already done this. A truly frightening amount of the crate gets solved at compile time and I have given up entirely on trying to benchmark it or otherwise be conscious about performance, because despite my best efforts I just cannot stop getting reports of 0ns/iteration. If I weren't directly and freely benefiting from other people's labor I'd almost be annoyed.

@RalfJung
Copy link

To be sure I understand it, this function takes a computed new address addr, an original pointer with provenance information attached prov, and then computes prov + (addr - prov) within the provenance space, resulting in a new value of addr but with the provenance information from prov?

Yes, that sounds right. Ideally LLVM would have a primitive operation to do this, but for now this is the best way we have to express this operation.

niluxv added a commit to niluxv/bitvec that referenced this issue Dec 25, 2021
Fix the miri failures in doctests, see issue ferrilab#135. The issue is that miri doesn't guess
correct provenance in the int-to-ptr cast in `BitSpan::new_unchecked`, as was found by
@tavianator [here](rust-lang/miri#1866 (comment)).

The solution is to preserve provenance and was proposed by  @tavianator
[here](rust-lang/miri#1866 (comment)).
With this change the entire test suite passes under miri.
@myrrlyn
Copy link
Collaborator

myrrlyn commented Jan 5, 2022

Removing my patch solution (the load-bearing debug-assert) passes Miri with Ralf's good_inttoptr function (thank you again!). I am closing this as part of run-up to 1.0. If anybody really needs to stay on 0.22, please re-open and I'll add this to the backlog. Thanks everyone!

@myrrlyn
Copy link
Collaborator

myrrlyn commented Jan 12, 2022

I'm locking this because I'm about to tell people about it on reddit

@ferrilab ferrilab locked and limited conversation to collaborators Jan 12, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants