Validity of aggregate types (structs, enums, tuples, arrays, ...) #69
Comments
Since unions are covered in #73 -- I think this looks right to me. One question mark might be some of the "well-known" types for example: I'm not sure how to categorize those. (Similarly, rustc itself supports a richer attribute
Probably that should be fixed by having their fields by |
Ah, good point. This all boils down to |
Hm, the fact that these are "wrapper structs" actually has some interesting consequences.
#![feature(rustc_attrs)]
#[rustc_layout_scalar_valid_range_start(1)]
#[repr(transparent)]
pub(crate) struct NonZero(u32);
fn main() { unsafe {
let mut x = Some(NonZero(1));
match x {
Some(NonZero(ref mut c)) => {
// Just writing 0 into an &mut u32
*c = 0;
}
None => panic!(),
}
assert!(x.is_some());
} }
The assertion fails. And yet, I cannot find a good argument for where this program would raise UB. After all, it never creates an rvalue that violates its validity invariant. |
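For comparison with the hand-rolled `NonZero` above, here is a minimal sketch using the stable `std::num::NonZeroU32`. The helper names `checked_nonzero` and `option_is_free` are made up for illustration. It only shows that the safe API never produces the niche value and that `Option` reuses it; it says nothing about where UB should sit in the example above.

```rust
use std::mem::size_of;
use std::num::NonZeroU32;

/// Hypothetical helper: goes through the safe constructor, which refuses 0,
/// so the niche (the all-zero bit pattern) is never stored in a `NonZeroU32`.
pub fn checked_nonzero(raw: u32) -> Option<NonZeroU32> {
    NonZeroU32::new(raw)
}

/// Hypothetical helper: `Option<NonZeroU32>` is documented to have the same
/// size as `u32`, because `None` is encoded in the niche.
pub fn option_is_free() -> bool {
    size_of::<Option<NonZeroU32>>() == size_of::<u32>()
}
```

Since `Option` encodes `None` in that niche, overwriting the inner `u32` with 0 through a raw write is enough to make `is_some()` return false, as in the program above.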
@RalfJung Wouldn't an rvalue be created in the implementation of |
Not really, it is just going to load the discriminant. But even then, it would be an rvalue of type |
Hmm, I still feel that |
There's three potential places I see that could be declared UB:
To me it feels like we should have UB when assigning 0 to the u32 wrapped in the NonZeroU32. This, however, would require the u32 to inherit validity requirements. I noted that the NonZero types in core don't ever create refs to the inner type at all. (I don't know about other There's a bit more freedom in where UB can be around stdlib-only attributes I feel. Attributes like this one are For those curious, all uses of |
I think that is the key point here, and it worries me a lot: we'd have to somehow keep around some extra state (the "true discriminant"), and that is far from trivial. In particular, with types like
fn main() { unsafe {
let mut x = Some(true);
match x {
Some(ref mut b) => {
let u = b as *mut bool as *mut u8;
// Just writing into a *mut u8
*u = 2;
}
None => panic!(),
}
assert!(x.is_some());
} } |
In that example I'd think the UB is clearer, personally. You're writing I don't know how the rule would be written, or exactly when the UB would be defined to happen, but writing Actual ad-hoc proposal: while there is a reference on the stacked-borrows stack (conceptually, I think this can work for other borrowing models), it's UB to write a value that doesn't meet the validity requirement of all of the borrows. (Either that, or it's UB when popping the raw and getting to the ref on top.) |
There's no such thing as a "bool memory slot". Rust has no C-style rules about "type punning" -- memory is a bunch of bytes, and every load/store operation interprets these types appropriately. That makes it much easier to explain legal byte-level accesses that can be mixed with "properly" typed accesses. To take the discussion into a different direction, is it even a problem if this is not UB? Are there optimizations that are in conflict with this code? The one optimization I can imagine is something along the lines of "nobody wrote the discriminant so it cannot have changed". |
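To illustrate the "memory is a bunch of bytes" point, here is a small sketch (the function name `bytes_of_u32` is hypothetical) that reads a `u32` one byte at a time through a `*const u8`. It assumes only that every byte of an initialized `u32` is a valid `u8`, which seems uncontroversial.

```rust
/// Hypothetical helper: reads the four bytes of a `u32` through a byte pointer.
/// The memory was written as a `u32` and is read back as individual `u8`s;
/// each load interprets the bytes at its own type.
pub fn bytes_of_u32(x: &u32) -> [u8; 4] {
    let p = x as *const u32 as *const u8;
    let mut out = [0u8; 4];
    for i in 0..4 {
        // SAFETY: `p` points to 4 initialized bytes and `i < 4`.
        out[i] = unsafe { *p.add(i) };
    }
    out
}
```

The result matches `u32::to_ne_bytes`, so the byte-level view and the typed view agree.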
Here's another variant:
fn main() { unsafe {
let mut x = Some(&0);
match x {
Some(ref mut b) => {
let u = b as *mut &i32 as *mut usize;
// Just writing into a *mut usize
*u = 0;
}
None => panic!(),
}
assert!(x.is_some());
} }
In terms of what happens with the memory, this code is basically the same as
fn main() { unsafe {
let mut x = Some(&0);
// Overwrite the whole Option<&i32> with the all-zero bit pattern
*(&mut x as *mut _ as *mut usize) = 0;
assert!(x.is_some());
} }
which is certainly allowed. It seems really hard to allow the latter but disallow the former. |
I don't see how, or rather, writing a value "of the wrong type" (intuitively, even if memory is untyped) isn't the only cause for it. Taking |
We guarantee the enum niche optimization for The example @RalfJung showed (#69 (comment)) exploits the knowledge that the niche AFAIK we don't guarantee which niche is used, and we are allowed to change that, so AFAICT the We could start guaranteeing which niches are used for the enum optimizations, although that might get complicated as we start guaranteeing more enum optimizations, e.g., for Personally, I find it weird that using That way to avoid UB one would need to write:
fn main() { unsafe {
let mut x = Some(true);
match x {
Some(ref mut b) => {
let u = b as *mut bool as *mut u8;
let o: Option<bool> = None;
// Just writing into a *mut u8
*u = std::mem::transmute(o);
}
None => panic!(),
}
assert!(x.is_some());
} } |
That doesn't make it UB, but it makes the code rely on unspecified layout details. It's like transmuting structs around making assumptions about layout. If the compiler makes the expected choice, that's not UB, but it might become UB if the compiler choice ever changes. But for my latest example, using references, we do specify the layout optimization, so there would be no issue with this.
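As a sketch of what "we do specify the layout optimization" means for references: the `Option` documentation guarantees that `Option<&T>` has the same size as `&T`, and that all-zero bytes are a valid `None`. The function names below are hypothetical.

```rust
use std::mem::{size_of, transmute};

/// Hypothetical helper: `Option<&T>` is guaranteed to be pointer-sized.
pub fn option_ref_is_pointer_sized() -> bool {
    size_of::<Option<&i32>>() == size_of::<&i32>()
}

/// Hypothetical helper: reinterprets an all-zero `usize` as `Option<&i32>`.
pub fn zero_is_none() -> bool {
    // SAFETY: relies on the documented guarantee that all-zero bytes are a
    // valid `None` for `Option<&T>`; both types are pointer-sized.
    let x: Option<&i32> = unsafe { transmute::<usize, Option<&i32>>(0) };
    x.is_none()
}
```

This is why the reference variant does not depend on unspecified niche choices the way the `bool` variant does.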
That's not the case. This is not UB either:
fn main() { unsafe {
let mut x = true;
let xptr = &mut x as *mut bool as *mut u8;
*xptr = 2;
} }
The point is that |
Makes sense.
Ah, in my experiments in the playground I was always using
We should probably add Vectors to the list, with the caveat that we currently can't have Vectors whose element types are unions, so we can't currently have a Vector of MaybeUninit, although maybe we should be able to do that. |
By vector you mean SIMD types? These behave exactly like arrays, as far as I am concerned. But yeah, we should probably list them... not sure when/if that list will be complete.^^ |
Yes I mean SIMD types. From the validity point of view, they do behave exactly as arrays. They are only a distinct kind of type because they have a different layout. |
I propose we move the discussion about validity of inner fields and unused variables to #84. |
IIRC @gnzlbg agreed to do a write-up of this. |
For cases where one wants to transmute [u8] into structs, would one of the following help avoiding UB? a) a const method that verifies that all bit patterns of a type are valid values or b) a runtime method that checks whether a sequence of bytes would be a valid value of a specific type? |
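A rough sketch of option (b), a runtime check, for one concrete type. The struct `Flagged` and the function `flagged_from_bytes` are hypothetical and not an existing API. Instead of transmuting, it validates each field and builds the value field by field, so an invalid `bool` is never produced.

```rust
/// Hypothetical example type: a byte plus a `bool`, which only allows 0 or 1.
#[derive(Debug, PartialEq, Clone, Copy)]
#[repr(C)]
pub struct Flagged {
    pub value: u8,
    pub flag: bool,
}

/// Hypothetical runtime check: returns `None` if the bytes do not form a
/// valid `Flagged`, i.e. if the second byte is not a valid `bool`.
pub fn flagged_from_bytes(bytes: [u8; 2]) -> Option<Flagged> {
    let flag = match bytes[1] {
        0 => false,
        1 => true,
        _ => return None, // 2..=255 are not valid bit patterns for `bool`
    };
    // Any byte is a valid `u8`, so the first field needs no check.
    Some(Flagged { value: bytes[0], flag })
}
```

A generic version would have to know the validity invariant of every field type, which is exactly what this issue is trying to write down.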
See #137. It would be good to explicitly document that fieldless This is a common misunderstanding and as such it seems to be a good idea to explicitly mention it. |
It may be good to explicitly state that the
#[repr(u8)]
#[non_exhaustive]
enum Foo {
Zero,
One,
}
fn this_is_not_okay() -> Foo {
unsafe { std::mem::transmute(2u8) }
} |
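One way to avoid the transmute is a checked conversion. The sketch below redefines `Foo` with a few extra derives and adds a `TryFrom<u8>` impl; the choice of `u8` as the error type is just an assumption for illustration.

```rust
use std::convert::TryFrom;

#[repr(u8)]
#[non_exhaustive]
#[derive(Debug, PartialEq, Clone, Copy)]
pub enum Foo {
    Zero,
    One,
}

impl TryFrom<u8> for Foo {
    // Assumption for this sketch: hand the rejected byte back as the error.
    type Error = u8;

    fn try_from(raw: u8) -> Result<Self, Self::Error> {
        match raw {
            0 => Ok(Foo::Zero),
            1 => Ok(Foo::One),
            // No variant has this discriminant, so no `Foo` is ever created.
            other => Err(other),
        }
    }
}
```

With this, `Foo::try_from(2u8)` returns `Err(2)` rather than producing a value with an invalid discriminant.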
Chiming in to say I recently made the mistake of accepting a fieldless It looks like this issue has been quiescent for a bit over a year. What's needed to make progress on documenting enum validity? Are there still decisions to be made or is it a matter of writing up some verbiage and making a pull request? Thanks, |
A PR would be good. Note that even if something is documented in these guidelines or issues then it's all still "non-normative", as they say. In other words, unofficial, guidelines only, the compiler doesn't necessarily respect it, etc etc. Most stuff described in this repository needs to eventually become various RFCs. Until then, writing it down in this repo is the best we got. |
Thanks, I'll work on a PR! |
Yeah, there was not a lot of discussion here because we have consensus on most things, I would say:
So, a PR definitely makes sense. |
Discussing what the validity invariants of aggregate types are (and assembling a full list of aggregate types).
Safe compound types include enums, structs, tuples, arrays, slices, closures, generators, SIMD vectors.
The obvious invariant is
repr(C) enums as well! See "#[repr(C)] C-like enums and out of range values" (rust-memory-model#41) for some discussion of that specific case.
Is there any exception? Currently at least, generators are an exception: Their fields may be uninitialized, leading to special cases in both layout computation code and Miri.
(I put these all together because my expectation is that there's not much to say here. We can split this up into several topics if that seems necessary.)