-
Notifications
You must be signed in to change notification settings - Fork 12.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Safe function MIR reads discriminant of moved-out local #91029
Comments
|
…value" This reverts commit 0a2b7d7, reversing changes made to 47c1bd1. This caused several unforeseen problems: - rust-lang#91029 - rust-lang#89764 (comment)
Revert "require full validity when determining the discriminant of a value" This reverts commit 0a2b7d7, reversing changes made to 47c1bd1. This caused several unforeseen problems: - rust-lang#91029 - rust-lang#89764 (comment) So I think it's best to revert for now while we keep discussing the MIR semantics of getting a discriminant. r? `@oli-obk`
Revert "require full validity when determining the discriminant of a value" This reverts commit 0a2b7d7, reversing changes made to 47c1bd1. This caused several unforeseen problems: - rust-lang#91029 - rust-lang#89764 (comment) So I think it's best to revert for now while we keep discussing the MIR semantics of getting a discriminant. r? `@oli-obk`
I've been looking at some related stuff in the last few days. Consider: use std::num::NonZeroU8 as NZ;
struct HasNiche(NZ);
impl Drop for HasNiche {
fn drop(&mut self) {
println!("Niche drop")
}
}
struct LoudDrop(u8);
impl Drop for LoudDrop {
fn drop(&mut self) {
println!("Extra field drop")
}
}
#[inline(never)]
fn print_boop() {
println!("boop");
}
fn real_main(e: Option<(HasNiche, LoudDrop)>) {
match e {
Some((_a, _)) => {}
None => {}
}
print_boop();
}
fn main() {
assert_eq!(std::mem::size_of::<Option<(HasNiche, LoudDrop)>>(), 2);
let x = (HasNiche(NZ::new(5).unwrap()), LoudDrop(10));
real_main(Some(x));
} At the end of There is nothing to really "fix" here though, I think this is just basically a language bug. If we want to support both the above pattern and drop order, and deinitializing on move, then we have to extend drop elaboration to track the active variant separately - this is certainly possible, but it's enough complexity that this feels like a language change. I might try proposing a lang design meeting to go over the options here and get a sense of what the team feels would be best. There's a couple different options |
From a language point, I don't feel there is a real issue. Moving a value deinitializes, but not deinitializing is always more defined than deinitializing. Therefore, if the compiler decides not to deinitialize the enum in order to keep easier track of things, it is completely within its rights. From a MIR semantics POV, we need to have a better way of separating "move and deinitialize" from "move". Add a boolean flag to |
Isn't Personally I think we should rather have a boolean flag in the |
It moves the value (e.g. it clears the drop flag), it just does not deinitialize it. I just wanted to confirm that this does not affect the question of whether moves of locals kill borrows. Which I believe it doesn't, as long as rustc is consistent with itself. |
This issue is about the dynamic operational semantics of If this was unsafe code we'd have the excuse that maybe the code is UB, but since this is all safe code that cannot be the case. Hence either MIR building is wrong or the op.sem of |
Right. If we decide that a MIR move is always destructive, we would need to change the MIR code rustc generates. I believe that for allowing for easier reordering of MIR operations during MIR optimizations, it might be right to have MIR-level "destructive moves" and "non-destructive moves". The operational semantics has to support some sort of non-destructive moves for In that case, the compiler would need to ensure it emits a non-destructive move during match lowering. And all optimizations that want to move reads backwards should turn them into non-destructive reads (or equivalently, have a pass that turns all reads non-destructive beforehand). The mental model I actually had in mind was a pretty ugly "a read of a complete local (e.g. The way I see it:
This means that Rust -> MIR -> MiniRust or Rust -> MIR -> MIRI allows some programs that are UB by the specification, but I think with match desugaring that that is fairly unavoidable (since match desugaring can sometimes not dereference pointers even when it "might"). |
As this issue shows, that is not sufficient. This code has UB in Miri even without loads being destructive! That's how we found it in the first place. The UB arises because a Box is being dropped, making a value invalid no matter whether loads are being destructive or not. This issue is specifically about that match interaction and whether the MIR we build here makes sense, and the semantics of GetDiscriminant. rust-lang/unsafe-code-guidelines#188 is the issue for the general discussion around the semantics of move; #68364 and #71117 are also relevant.
The plan is to use MiniRust and the authoritative definition for what is UB, so this would be undesirable. All Surface Rust UB should be captured in the desugaring. However we can define a desugaring that does not go via MIR. Still it is also desirable for Miri to be able to detect all UB, so we should strive to at least have the option of reflecting all of it in MIR. |
If I understand things correctly, the pattern we have is
Is, for example, giving the Also, in my POV, the values that drop elaboration touches are "owned by the compiler" and we should have our aliasing model set up so that anyone that touches them is invoking at least surface-language-level UB. |
I'd rather avoid having two variants of GetDiscriminant. IF we need to go through the effort of defining a GetDiscriminant that works with partially initialized enums, we might as well use it everywhere and not add even more complexity by defining 2 variants.
I am not entirely sure what you mean by this, but it also sounds like a separate discussion. Could you open an issue in https://github.com/rust-lang/unsafe-code-guidelines/ with an example? |
For "why match construction can sometimes 'refine' UB", see e.g. fn main() {
let mut data = false;
let ptr = &mut data as *mut bool;
let safe_ref = unsafe {&* ptr };
match safe_ref {
&false if {
unsafe { *ptr = true };
false
} => unreachable!(),
&false => println!("A"),
&true => println!("B"),
}
} It is clear to me that this code should be UB, but I don't see a way to make that obvious in the MIR since |
My second thought was that drop elaboration always takes place "after" semantics is decided: when we use MIR for defining semantics, the MIR that counts is the MIR that comes out of MIR construction, and everything that comes after is allowed to be a refinement. |
We might want surface-level Rust matches to have the property of forcing the scrutinee to be valid. ElaborateDrop matches are not a part of the semantics in my mind, they are a refinement of the semantics. |
Is this code defined or undefined, any why? And what variant does it drop? enum MyEnum {
V1 {
discr: Box<u32>,
other_value: Option<Box<u32>>,
},
V2,
}
fn main() {
let mut val = MyEnum::V1 {
discr: Box::new(0),
other_value: Some(Box::new(1))
};
let raw_ptr_to_discr: *mut usize = match val {
MyEnum::V2 => panic!(),
MyEnum::V1 { ref mut discr, .. } => discr as *mut _ as *mut _
};
match val {
MyEnum::V1 { other_value, .. } => drop(other_value),
_ => unreachable!(),
};
// `change the variant`
unsafe { *raw_ptr_to_discr = 0; }
// end of scope
} |
Then that's a problem with our desugaring. I don't think we should be getting into the business of trying to write an opsem for surface-Rust, I expect that to have too many problems. There have been some discussions about how the desugaring should be adjusted so that we're certain we catch all such UB. It's a rather big design space though and no agreement yet.
This is a very nice example. I think the only thing consistent with destructive moves we can have is that this will attempt to drop the fields in V1 that are not moved out of (which is UB due to dropping a null box) |
Yea. I think we also want it to be well-defined (to drop V1) if you do unsafe {
*raw_ptr_to_discr = Box::new(0u32).leak() as *const _ as usize;
} |
BUT - personally, I believe that the "what is UB" part of operational semantics are supposed to run on pre-drop-elaboration, or rather, pre-all-optimization, MIR. And also have a "-Z naive-mir-construction" flag that makes e.g. matches desugar to an if-chain and maybe a few other changes. I believe that both non-naive MIR construction and drop elaborations are allowed to be refinements, i.e. define some behavior for some UB. |
While I still believe that drop elaboration is not supposed to be a part of operational semantics, I don't see a real reason why it can't remember the variant in a drop flag. Maybe it should? Can you find any code that would make the current implementation not a refinement of the "remember the discriminant and check that the enum has the right variant, ignoring the enum's validity invariant implementation"? Unfortunately, I think that fully defining a "magic get discriminant" along with destructive moves would be pretty annoying (if you move out the field that does contain the discriminant, how to define "only me can corrupt this field"?), so maybe just moving to "remember variant in a drop flag" is the simplest fix. |
Drop elaboration is not an optimization imo. How would you write Miri for pre-drop elab, without just reimplementing all of drop elab?
I agree, and this is what I've suggested on the past. The only problem is that this is a pretty big change to drop elab and I have no time to implement it.
No idea, because I don't know what an opsem for pre-drop elab MIR looks like. The only way I can think of to do it would make it trivially correct to just run drop elab instead. |
The more accurate way to say this is: when I wrote drop elaboration, I had in mind an opsem for pre-drop-elaboration MIR which is basically keeping the entire drop elaboration info in magic side tables. I wrote drop elaboration to attempt to be a (potentially strict) refinement of that op sem. If you want drop elaboration to also be a precise implementation of that because it would make MIRI's life easier, I might actually have a go at changing this. I don't think there are any other places where it is a strict refinement, but "being a precise implementation rather than a refinement" was not a design goal. |
That's really useful information actually.
I do think we want this, not only to make Miri's life easier, but also because I think we would run the risk of implementing the same thing twice, which will just result in bugs when those things disagree...
I would be super thankful if this was fixed. If you do work on it, make sure to coordinate work with the folks working on this MCP to prevent toe stepping, etc. |
Now, after we have MIR construction and MIR borrowck all debugged, I see why having semantics after drop elaboration rather than before them is a good idea. I still want to see whether we really want the DropIf change, or juts do borrowck with a "normal" Drop terminator. |
My understanding is that the target goal (and motivation behind this MCP) is being able to run drop-elab before borrowck. If this can be achieved without |
I have opened rust-lang/unsafe-code-guidelines#391 for the general discussion of drop elaboration and operational semantics. For this specific discussion, I notice I am a bit confused. For the case of a partially moved out enum, why do we even have to read the discriminant again to see which variant to drop? Clearly we know which fields have been moved out of, doesn't that also mean we know the variant and can just drop the remaining fields of that variant? |
#90895 caused the Miri test suite to fail, and further investigation showed that this is due to strange MIR being generated for
Option::map
: when I dump the MIR for this code and look at the pre-const-prop code, it looks like:The strangeness is in bb5: this is where we end up after the closure returns, but it reads the discriminant of
_1
which we have already moved out of at this point! With Miri now checking validity when the discriminant is read, it complains (in the linked-list testcase) that there is a danglingBox
here (I guess the closure deallocated thatBox
). There are also proposals that would make a move de-initialize the memory it is moving from, which would certainly make looking at its discriminant again UB.I am not quite sure where this comes from, but it does seem in conflict with the idea that only valid values can have their discriminant read. Cc #89764 @wesleywiser
The text was updated successfully, but these errors were encountered: