Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Store pointer bytes in pointers for SB/provenance #272

Closed
wants to merge 2 commits into from

Conversation

saethlin
Copy link

@saethlin saethlin commented Apr 6, 2022

I'm opening this because I was made aware of #271.

I would be willing to consider a PR that makes this code friendlier to miri's raw pointer tagging

With these changes, this codebases passes Miri with raw pointer tagging.

In any case, if the original data structure can't be made compatible with raw pointer tagging, that is the opposite of a reason to get rid of it. That is what makes it interesting and useful for informing miri design and standard library design; see rust-lang/miri#1936.

I heartily agree. I think a lot of code which is not currently can be made amenable to Stacked Borrows (even with raw pointer tagging). The question is whether people are willing to write code in the patterns that it requires. In this case, the requirement is that pointer bytes be stored in a type which can carry provenance (that is, if you want to resurrect and dereference the pointer). There are two types which are useful for this: raw pointers and MaybeUninit. This PR implements the raw pointer approach.

Here is the MaybeUninit approach: https://github.com/saethlin/semver/blob/396400cd2eaccc82b6cf01f202fd08ef85099f7e/src/identifier.rs#L90-L120

I far prefer the MaybeUninit approach because I can see a way to implement it without writing platform-dependent code (I have no confidence in the 32-bit code in this PR). But it would bump this crate's MSRV up to 1.36, and I don't know if it's an option for this crate.

(yes I know the transmutes in the MaybeUninit implementation don't compile on 32-bit platforms, but that can be fixed I'm just being lazy with code that may not be interesting)

// Rustc >=1.53 has intrinsics for counting zeros on a non-zeroable integer.
// On many architectures these are more efficient than counting on ordinary
// zeroable integers (bsf vs cttz). On rustc <1.53 without those intrinsics,
// we count zeros in the u64 rather than the NonZeroU64.
#[cfg(no_nonzero_bitscan)]
let repr = repr.get();
let repr = repr.as_ptr() as usize;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably be repr.as_usize because this is where you extract an integer value


// SAFETY: the most significant bit of repr is known to be set, so the value
// is not zero.
unsafe { NonZeroU64::new_unchecked(repr) }
Repr::from_ptr(unsafe { NonNull::new_unchecked(ptr.wrapping_sub(prev).wrapping_add(repr)) })
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pre-subtracting the offset has better behavior in debug mode on CHERI because you never produce a wildly out of range pointer

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pub unsafe fn from_inline(bytes: u64) -> Repr {
Repr {
ptr: unsafe { NonNull::new_unchecked(bytes as usize as *mut u8) },
remainder: (bytes >> 32) as usize,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Due to the lack of as_u64, I don't see anywhere remainder is ever read. Is it actually a 64 bit value or is it just a isize?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is read both in the is_inline method, and a pointer to it is produced by inline_as_str. As I said in my initial comment, all this juggling to support 32-bit platforms makes me uncomfortable and I think the MaybeUninit implementation would be simpler, but whether it is viable at all is up to the maintainer.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, that was exceedingly unclear. Maybe that cast chain could be moved to an impl on Repr?

@RalfJung
Copy link
Contributor

I heartily agree. I think a lot of code which is not currently can be made amenable to Stacked Borrows (even with raw pointer tagging). The question is whether people are willing to write code in the patterns that it requires. In this case, the requirement is that pointer bytes be stored in a type which can carry provenance (that is, if you want to resurrect and dereference the pointer).

FWIW, this requirement is not something that a future aliasing model is likely to change. It is, in fact, entirely orthogonal to aliasing questions -- this is about whether we support transmuting pointers to integers and back. IOW, the old code would fail even with -Zmiri-strict-provenance -Zmiri-disable-stacked-borrows. Pointer-integer transmutation is a major source of ambiguity in memory models and it is unclear to what extent they can be supported without putting a high burden on optimizations in all code (even code that performs no such transmutation).

Comment on lines 95 to 99
#[cfg(target_pointer_width = "32")]
#[derive(PartialEq, Eq, Clone, Copy)]
pub struct Repr {
ptr: NonNull<u8>,
remainder: usize,
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is going to result in the front and back 4 bytes being randomly swapped depending on what order these two fields get placed in the struct layout.

fn main() {
    let prerelease = semver::Prerelease::new("asdfjkl").unwrap();
    println!("{}", prerelease);
    println!("{:?}", prerelease.as_str());
}
jklasdf
"jkl\0asdf"

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we add this one to the list?

Screenshot from 2022-05-06 10-10-57

@dtolnay
Copy link
Owner

dtolnay commented Apr 30, 2022

I tried adding a #[repr(C)] to control the field order I commented above, but the test suite still fails on 32-bit regardless of what order I put those two fields. Could you take a look? You can reproduce with cargo miri test --target i686-unknown-linux-gnu.

---- test_new stdout ----
thread 'main' panicked at 'assertion failed: `(left == right)`
  left: `5`,
 right: `1`', tests/test_identifier.rs:17:9

@saethlin
Copy link
Author

saethlin commented May 1, 2022

So the answer is that I'm a dumdum and didn't actually run the tests on i686-unknown-linux-gnu. I actually wrote 2 bugs, but I also fixed the field order so that you could (if you really wanted) do the to_u64 conversion with a transmute.

We also learned in rust-lang/rust#96538 that ptr.wrapping_sub(ptr as usize) can be miscompiled currently, so I switched to a sequence of operations that we are pretty sure doesn't miscompile.

#[cfg(target_pointer_width = "32")]
pub unsafe fn from_inline(bytes: u64) -> Repr {
Repr {
ptr: unsafe { NonNull::new_unchecked(bytes as usize as *mut u8) },
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this assumes the target is little endian. On big endian, the bytes at the lower address in memory are more significant, so the less significant half of bytes would be holding the end of the inline string, which would all be 0 bytes for strings length 4 or shorter -- see the call site here:

semver/src/identifier.rs

Lines 174 to 180 in 0b38170

let mut bytes = [0u8; 8];
// SAFETY: string is big enough to read len bytes, bytes is big
// enough to write len bytes, and they do not overlap.
unsafe { ptr::copy_nonoverlapping(string.as_ptr(), bytes.as_mut_ptr(), len) };
// SAFETY: it's nonzero because the input string was at least 1
// byte of ASCII and did not contain \0.
unsafe { Repr::from_inline(u64::from_ne_bytes(bytes)) }

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

--target mips64-unknown-linux-gnuabi64 works pretty well for little-endian testing.

Copy link
Owner

@dtolnay dtolnay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I put up my own implementation in #273, which does not involve any architecture-specific conditional compilation (target pointer width or endianness).

Would appreciate review if any of you get a chance.

@dtolnay
Copy link
Owner

dtolnay commented May 2, 2022

Closing in favor of #273.

Thanks anyway for the PR, and for everyone's review comments! This was helpful to inform my approach in the other PR.

@dtolnay dtolnay closed this May 2, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants