Matching on uninhabited unsafe places (union fields, raw pointer dereferences, etc.) allowed in safe code. #47412
Comments
Seems like union fields ought to be inhabited. |
@nagisa points out that we can't tell if generic types are inhabited, so maybe such a fix is not viable. |
cc @rust-lang/lang -- it's a bit tricky to decide what we should disallow here, though I'm leaning towards disallowing "safe access to union fields", or at least restricting the cases further (e.g., to those cases where we know the type is not only `Copy` but also inhabited) |
Also: @eternaleye points out that uninitialized values (e.g., on Itanium) may trap if you read from them, and the union RFC would seem to allow access to them from safe code. Leaning more and more towards "union fields should never be safe to access". =) |
We've known for a long time that a bitcast between two types (with the same number of bits) is safe iff all possible bitpatterns of each of the two types are valid and they correspond to distinct values. |
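A minimal sketch of the bit-pattern rule, using standard-library calls only. The helper names `u32_as_f32` and `u8_as_bool` are made up for illustration and do not come from the thread. Every 32-bit pattern is a valid `u32` and a valid `f32`, so that reinterpretation can be safe; `bool` only admits two patterns, so a `u8` has to be checked first.

```rust
/// Every 32-bit pattern is both a valid `u32` and a valid `f32`, so the
/// standard library exposes this reinterpretation as a safe function.
fn u32_as_f32(bits: u32) -> f32 {
    f32::from_bits(bits)
}

/// Only the patterns 0 and 1 are valid for `bool`, so a `u8` has to be
/// checked before it can be treated as one.
fn u8_as_bool(byte: u8) -> Option<bool> {
    match byte {
        0 => Some(false),
        1 => Some(true),
        _ => None, // any other pattern would be an invalid `bool`
    }
}

fn main() {
    println!("{}", u32_as_f32(0x3f80_0000));
    println!("{:?}", u8_as_bool(2));
}
```

The same reasoning is why a union field of an uninhabited type is a problem: no bit pattern at all is valid for it.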
Is this actually UB, and should it be? EDIT: Ah, nevermind, I understand. I tried this on the playground, and the entire function compiles down to a single undefined instruction ( Also, I don't recall us ever making union field reads safe. Does this somehow work outside an unsafe block because the match doesn't actually do any pattern-matching? It seems like the reference to (Also, good catch, @nagisa!) |
@joshtriplett: It's not that "accessing" it is UB; it's that accessing it produces a value whose mere existence is UB, because |
@eternaleye Thanks; yeah, took staring at it for a while to realize that was the problem here. |
Note that you can also get UB with
fn main() {
    union A { a: u8, v: u16 }
    let a = A { a: 1 };
    match a.v {
        _ => println!("Congrats, it's a u16!")
    }
}
as this reads one of the |
@eternaleye That code won't compile; it'll generate After experimenting with this a fair bit, it looks like Attempting to compile @nagisa's original example should produce the same error E0133 at compile time. |
Wait, I'm getting errors for reading, but @nagisa's example doesn't do this (pattern-matching isn't by-value unless there is a pattern that needs it to be). |
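To illustrate the point about by-value patterns, here is a small sketch on an ordinary `String` rather than a union. The function names are hypothetical. A wildcard arm binds nothing, so the scrutinee is not moved or read; a binding arm takes the value.

```rust
/// A wildcard-only match binds nothing, so `s` is not moved and can
/// still be returned afterwards.
fn wildcard_only(s: String) -> String {
    match s {
        _ => {} // `_` does not bind, so no move happens here
    }
    s
}

/// A binding pattern takes the scrutinee by value: `s` is moved into `owned`.
fn by_value(s: String) -> usize {
    match s {
        owned => owned.len(),
    }
}

fn main() {
    println!("{}", wildcard_only(String::from("hello")));
    println!("{}", by_value(String::from("hello")));
}
```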
@joshtriplett Yeah, I just noticed that - I'd been taking at face value that @nagisa's example was reading out. |
OK, so perhaps the problem is more narrow. |
In particular, matches with empty arms are somewhat special -- they act as an "assertion" of sorts that the path in question is valid. This probably means we forgot to account for that as a kind of read. UPDATE: Some discussion on IRC where I spell out a bit more of the background |
I think this is just a corner case that we didn't catch (or have a test for) in the union implementation, namely, that a match on a union field with no patterns wasn't treated as unsafe. (That said, on the off chance someone was relying on this, such as via some kind of generic code and macros, when we fix it we should probably do a crater run.) Here's a test case that should not compile:
It currently compiles and prints "should not be allowed"; it should not compile at all. |
The
#![feature(untagged_unions)]
fn main() {
    enum Void {}
    union A { a: (), v: Void }
    let a = A { a: () };
    match a.v {
    }
}
It looks like this was changed between nightly-2017-09-23 and nightly-2017-09-29. Maybe caused by #44700? |
@joshtriplett Not just "no arms", but "no by-value arms" - this should also require
fn main() {
    union A { a: u8, v: u16 }
    let a = A { a: 1 };
    match a.v {
        _ => println!("Congrats, it's a u16!")
    }
}
@eternaleye I'm fine with making that unsafe temporarily, however, it's not clear that it will eventually have to be. In particular, I think that
fn main() {
    let x = Box::new(22);
    match x {
        _ => { }
    }
}
As I wrote on IRC, I believe matches with no arms have to be considered somewhat special here. Admittedly this needs to be written up more formally and documented. |
Another way to look at it: with a match with no arms, there is nowhere to branch to! So if that code is ever reached, that is UB. But with a single |
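A sketch of the well-behaved form of a match with no arms. The names `Void`, `absurd` and `unwrap_infallible` are illustrative. Because `Void` has no values, `absurd` can never actually be called in safe code, which is what makes the empty match sound; the issue arises when a union field lets safe code name a place of such a type.

```rust
/// An enum with no variants: no value of this type can exist.
enum Void {}

/// A match with no arms has nowhere to branch to, so it is only sound
/// if this function can never be reached with a real value.
fn absurd(v: Void) -> ! {
    match v {}
}

/// The `Err` arm is statically unreachable because `Void` is uninhabited.
fn unwrap_infallible(r: Result<u32, Void>) -> u32 {
    match r {
        Ok(x) => x,
        Err(v) => absurd(v),
    }
}

fn main() {
    println!("{}", unwrap_infallible(Ok(22)));
}
```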
Oh, wait, my example is bogus =) and actually the example I thought would compile didn't:
fn main() {
    let x = Box::new(22);
    drop(x);
    match x {
        _ => { }
    }
}
Nonetheless, I think this can come up. I'll play around some more. =) |
OK, so, in the MIR-based borrowck, these examples do work as I expected:
#![feature(nll)]
fn main() {
    let pair = (Box::new(22), Box::new(22));
    drop(pair);
    match pair {
        _ => { }
    }
}
I will try to write up a more thorough "proposal" of some kind regarding these validity predicates. I've tried in the past but each time I get stuck trying to figure out how much background to give. |
Reflecting some discussion from IRC back here: my proposal to address this is that naming a union field in a match should always require an unsafe block, even if the match doesn't name the field value or apply any patterns to the field value. That includes only having a |
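A rough sketch of what code looks like under this proposal, assuming a union whose fields are both inhabited and fully initialized by either write. `IntOrBytes` and both functions are hypothetical names. The wildcard-only form is wrapped in `unsafe` as the proposal asks; whether a given compiler version strictly requires it there is not something this sketch asserts, hence the `allow`.

```rust
/// Both fields cover the same four bytes and accept every bit pattern.
union IntOrBytes {
    int: u32,
    bytes: [u8; 4],
}

/// A by-value pattern reads the field, so the match sits in `unsafe`.
fn bytes_of(value: u32) -> [u8; 4] {
    let u = IntOrBytes { int: value };
    unsafe {
        match u.bytes {
            b => b,
        }
    }
}

/// Under the proposal, merely naming the field in a match needs `unsafe`,
/// even when the only arm is a wildcard that never reads the value.
#[allow(unused_unsafe)] // assumption: some compiler versions may not require it here
fn wildcard_match(value: u32) -> bool {
    let u = IntOrBytes { int: value };
    unsafe {
        match u.bytes {
            _ => true,
        }
    }
}

fn main() {
    println!("{:?}", bytes_of(0x0102_0304));
    println!("{}", wildcard_match(7));
}
```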
So, to clarify something that @joshtriplett alluded to but didn't make explicit: There are two interesting questions to clarify. At what point do we have UB, and when is unsafety required? Clearly, unsafety must be required for any case that could cause UB, but it may also be required more broadly. I think it's reasonable to require unsafe more broadly, especially to start. But I think we should also write up and nail down the cases where UB could occur. And I think we may find value in helping the user identify the intersection and calling special attention to those cases where UB could actually occur. |
@petrochenkov Does regression-from-stable-to-nightly apply here? The bug now exists in current stable. |
Oops, wrong label. |
There is no MIR, even dead code, to generate for |
@eddyb wrote:
As part of fixing #27282 I am currently experimenting with adding MIR constructs that represent "start a (pseudo-)borrow of the discriminant for a It's possible we might leverage that work to represent the accesses in question here. |
triage: P-high Well, this is a regression. We ought to fix it. Assigning to pnkfelix and myself to figure out how to get this fixed. |
@nox just demonstrated this by using EDIT: can't we just always add just a dummy Lines 182 to 188 in 29c8276
And the new
|
(Wait, why would that raw pointer dereference be allowed in safe code? Or is this an analogy for the effect that was achieved, and not itself the code that was used to do it?) |
@glaebhoerl Because the check is done on MIR, and this entire issue is about the dereference/union field access not ending up in MIR because it's never read from/written to by |
(Ah I see, I was looking for the union field access in there and wasn't following the details closely enough to see the analogy.) |
I think that's basically what we need to add, yes. At least, it'd be good to do this for now, and maybe revisit later if we want to think about a more "elegant" fix. |
Assigning to @eddyb to do something for now to close the gaping hole. |
@nikomatsakis Small problem, that approach also doubles up some MIR borrowck errors, e.g.:
let mut x = 0;
let r = &mut x;
match *x { ... }
*r += 1;
Because of the added dummy access, |
@eddyb seems ok for now |
rustc_mir: insert a dummy access to places being matched on, when building MIR. Fixes #47412 by adding a `_dummy = Discriminant(place)` before each `match place {...}`. r? @nikomatsakis
With the following code
it is possible to invoke undefined behaviour in safe code without using unstable features.