-
-
Notifications
You must be signed in to change notification settings - Fork 131
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Store pointer bytes in pointers for SB/provenance #272
Conversation
src/identifier.rs
Outdated
// Rustc >=1.53 has intrinsics for counting zeros on a non-zeroable integer. | ||
// On many architectures these are more efficient than counting on ordinary | ||
// zeroable integers (bsf vs cttz). On rustc <1.53 without those intrinsics, | ||
// we count zeros in the u64 rather than the NonZeroU64. | ||
#[cfg(no_nonzero_bitscan)] | ||
let repr = repr.get(); | ||
let repr = repr.as_ptr() as usize; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should probably be repr.as_usize because this is where you extract an integer value
src/identifier.rs
Outdated
|
||
// SAFETY: the most significant bit of repr is known to be set, so the value | ||
// is not zero. | ||
unsafe { NonZeroU64::new_unchecked(repr) } | ||
Repr::from_ptr(unsafe { NonNull::new_unchecked(ptr.wrapping_sub(prev).wrapping_add(repr)) }) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pre-subtracting the offset has better behavior in debug mode on CHERI because you never produce a wildly out of range pointer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pub unsafe fn from_inline(bytes: u64) -> Repr { | ||
Repr { | ||
ptr: unsafe { NonNull::new_unchecked(bytes as usize as *mut u8) }, | ||
remainder: (bytes >> 32) as usize, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Due to the lack of as_u64, I don't see anywhere remainder is ever read. Is it actually a 64 bit value or is it just a isize?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is read both in the is_inline
method, and a pointer to it is produced by inline_as_str
. As I said in my initial comment, all this juggling to support 32-bit platforms makes me uncomfortable and I think the MaybeUninit
implementation would be simpler, but whether it is viable at all is up to the maintainer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, that was exceedingly unclear. Maybe that cast chain could be moved to an impl on Repr?
FWIW, this requirement is not something that a future aliasing model is likely to change. It is, in fact, entirely orthogonal to aliasing questions -- this is about whether we support transmuting pointers to integers and back. IOW, the old code would fail even with |
src/identifier.rs
Outdated
#[cfg(target_pointer_width = "32")] | ||
#[derive(PartialEq, Eq, Clone, Copy)] | ||
pub struct Repr { | ||
ptr: NonNull<u8>, | ||
remainder: usize, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is going to result in the front and back 4 bytes being randomly swapped depending on what order these two fields get placed in the struct layout.
fn main() {
let prerelease = semver::Prerelease::new("asdfjkl").unwrap();
println!("{}", prerelease);
println!("{:?}", prerelease.as_str());
}
jklasdf
"jkl\0asdf"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried adding a ---- test_new stdout ----
thread 'main' panicked at 'assertion failed: `(left == right)`
left: `5`,
right: `1`', tests/test_identifier.rs:17:9 |
So the answer is that I'm a dumdum and didn't actually run the tests on We also learned in rust-lang/rust#96538 that |
#[cfg(target_pointer_width = "32")] | ||
pub unsafe fn from_inline(bytes: u64) -> Repr { | ||
Repr { | ||
ptr: unsafe { NonNull::new_unchecked(bytes as usize as *mut u8) }, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this assumes the target is little endian. On big endian, the bytes at the lower address in memory are more significant, so the less significant half of bytes
would be holding the end of the inline string, which would all be 0 bytes for strings length 4 or shorter -- see the call site here:
Lines 174 to 180 in 0b38170
let mut bytes = [0u8; 8]; | |
// SAFETY: string is big enough to read len bytes, bytes is big | |
// enough to write len bytes, and they do not overlap. | |
unsafe { ptr::copy_nonoverlapping(string.as_ptr(), bytes.as_mut_ptr(), len) }; | |
// SAFETY: it's nonzero because the input string was at least 1 | |
// byte of ASCII and did not contain \0. | |
unsafe { Repr::from_inline(u64::from_ne_bytes(bytes)) } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
--target mips64-unknown-linux-gnuabi64
works pretty well for little-endian testing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I put up my own implementation in #273, which does not involve any architecture-specific conditional compilation (target pointer width or endianness).
Would appreciate review if any of you get a chance.
Closing in favor of #273. Thanks anyway for the PR, and for everyone's review comments! This was helpful to inform my approach in the other PR. |
I'm opening this because I was made aware of #271.
With these changes, this codebases passes Miri with raw pointer tagging.
I heartily agree. I think a lot of code which is not currently can be made amenable to Stacked Borrows (even with raw pointer tagging). The question is whether people are willing to write code in the patterns that it requires. In this case, the requirement is that pointer bytes be stored in a type which can carry provenance (that is, if you want to resurrect and dereference the pointer). There are two types which are useful for this: raw pointers and
MaybeUninit
. This PR implements the raw pointer approach.Here is the
MaybeUninit
approach: https://github.com/saethlin/semver/blob/396400cd2eaccc82b6cf01f202fd08ef85099f7e/src/identifier.rs#L90-L120I far prefer the
MaybeUninit
approach because I can see a way to implement it without writing platform-dependent code (I have no confidence in the 32-bit code in this PR). But it would bump this crate's MSRV up to 1.36, and I don't know if it's an option for this crate.(yes I know the transmutes in the MaybeUninit implementation don't compile on 32-bit platforms, but that can be fixed I'm just being lazy with code that may not be interesting)