Decide on when MIR Discriminant() operation is UB #91095
Comments
There seems to be an additional mismatch in terms of which discriminants are valid. The code generation considers discriminants corresponding to uninhabited variants to be invalid, but they do not seem to cause any errors in Miri. Does Miri validate a scalar range? |
What concretely does that assumption look like -- can you link to the code that does this?
Miri checks that a variant with the given tag exists. But Miri does not do further special casing of uninhabited variants. For |
The layout for tag describes a valid range, which is used to emit LLVM range metadata when loading the tag. For example, for a direct encoding: rust/compiler/rustc_middle/src/ty/layout.rs Lines 1146 to 1160 in 81117ff
rust/compiler/rustc_codegen_llvm/src/builder.rs Lines 483 to 485 in 81117ff
|
Ah, and then that
Yeah that seems like there are a lot of special cases for uninhabited variants / enums built into the codegen backend that do not seem to fall out of any more general underlying principle -- except for "the entire value must be valid", but that is incompatible with things we do elsewhere. |
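To make the "uninhabited variant" case concrete, here is a minimal source-level sketch. The enum `MaybeNever` and the function `get` are hypothetical names invented for illustration; they are not taken from rustc or from this thread. The sketch only shows why a discriminant that selects an uninhabited variant can never belong to a valid value. Whether reading such a discriminant is itself UB is exactly what this issue is trying to decide.

```rust
use std::convert::Infallible;

/// Hypothetical enum with one inhabited and one uninhabited variant.
pub enum MaybeNever {
    Value(u8),
    // `Infallible` has no values, so safe code can never construct this variant.
    Never(Infallible),
}

/// Extracts the payload of the only variant that can actually exist.
pub fn get(e: MaybeNever) -> u8 {
    match e {
        MaybeNever::Value(x) => x,
        // No valid value reaches this arm; an empty match on the
        // uninhabited payload type-checks because it has no cases.
        MaybeNever::Never(never) => match never {},
    }
}
```

A compiler that assumes the whole value is valid could drop the second arm entirely, which appears to be the kind of reasoning the codegen special cases rely on.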
I don't really see how there's more than one option here. For
For SB this can then be a read of those places. This lets this example continue to be DB. Despite requiring near full validity, I do think this is effectively encoding the idea of "doing minimal work" - just in a way that is cognizant of enum layout optimizations. Now of course we could instead try and define the safety requirements of For the associated MIR rvalue, I suppose we could choose a weaker requirement (since layout information is actually available there), but I'm also not sure what the benefit of that is. |
Note that this doesn't necessarily have to make the elaborate drops code wrong, if we decide that moving out leaves the value initialized (not that I'm advocating for that) |
That's not actually so bad -- it is, I think, basically what Miri currently does. One "just" implements the logic of determining the discriminant in the obvious way, and if any of the values considered in that process are uninit, we raise UB. |
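As a rough illustration of "determine the discriminant in the obvious way, and raise UB if anything considered along the way is uninit", here is a small model. Everything in it is an assumption made for the sketch: the two-byte encoding, the names `AbstractByte`, `Outcome`, and `decode`, and the use of `None` to stand for an uninitialized byte. It is not Miri's implementation and not a layout rustc is known to pick.

```rust
/// One byte of abstract memory; `None` models an uninitialized byte.
pub type AbstractByte = Option<u8>;

/// Result of the modeled discriminant computation.
#[derive(Debug, PartialEq, Eq)]
pub enum Outcome {
    Variant(&'static str),
    Ub(&'static str),
}

/// Hypothetical two-byte encoding (an assumption, not rustc's actual layout):
///   byte 0 != 0  => variant "A"; byte 1 is payload and is never inspected
///   byte 0 == 0  => byte 1 selects "B" (0) or "C" (1)
pub fn decode(bytes: [AbstractByte; 2]) -> Outcome {
    // Byte 0 is always inspected, so it must be initialized.
    let Some(first) = bytes[0] else {
        return Outcome::Ub("byte 0 is uninitialized");
    };
    if first != 0 {
        return Outcome::Variant("A");
    }
    // Only on this path is byte 1 inspected, so only here can it cause UB.
    match bytes[1] {
        None => Outcome::Ub("byte 1 is uninitialized"),
        Some(0) => Outcome::Variant("B"),
        Some(1) => Outcome::Variant("C"),
        Some(_) => Outcome::Ub("bytes do not encode a variant"),
    }
}
```

In this model `decode([Some(5), None])` is defined because byte 1 is never looked at, while `decode([Some(0), None])` is UB. That path dependence is what makes the rule "minimal", and also what makes it depend on the chosen layout.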
What are the benefits of this though? Users can't stably rely on this anyway, so the only way that they can make use of this is to do some cursed thing where they try and analyze the chosen layout - I'm not even convinced though that it's actually possible to do this for enums the same way it is for structs. Do we expect MIR optimization passes to do this sort of analysis? Do we even want them to? This is one of those instances where I expect that adding more UB to the language is going to decrease the amount of UB in the ecosystem, because it will reduce the chance that people do dangerous things for which there are probably better solutions. I do recognize that this might be useful if we allow |
oh, another relevant point here @RalfJung : Types don't have to have niches in order for us to want to read some of their bytes in relation to a discriminant call. Specifically, imagine that I have an enum like the following: enum E {
A(NonZeroU8, u8),
B,
C,
} rustc could choose a layout like this:
Now we could imagine a future in which we get enough features (eg a x == E::B into
But if we guarantee that What all this means is that "the niche is partially occupied" would be an insufficient description of the requirements (assuming we want to enable this optimization). What we'd instead need is some condition that talks about the use of a field's bytes in conjunction with the niche in another field |
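For reference, the enum from this comment can be exercised through the stable `mem::discriminant` API as sketched below. The helper names `same_variant` and `size_of_e` are made up for this example. The sketch deliberately asserts nothing about which bytes rustc inspects or how it lays out `E`, since that is unspecified; it only shows the observable behavior that any layout choice has to preserve.

```rust
use std::mem::{discriminant, size_of};
use std::num::NonZeroU8;

pub enum E {
    A(NonZeroU8, u8),
    B,
    C,
}

/// Compares two values by variant only, ignoring any payload.
pub fn same_variant(x: &E, y: &E) -> bool {
    discriminant(x) == discriminant(y)
}

/// Size of `E` in bytes. The exact value depends on the layout rustc picks,
/// but it is at least 2 because variant `A` carries two one-byte fields.
pub fn size_of_e() -> usize {
    size_of::<E>()
}
```

An optimization that turns a comparison against `E::B` into a single multi-byte integer comparison would read bytes beyond whatever encodes the niche, which is why the precise UB condition matters here.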
It's the least amount of UB we can have. That's IMO the default state and any extra UB needs justification. :) I am not convinced that special-casing |
This last part is a good point, and I'm not sure what the best strategy for proceeding with it is. But my point above still stands. There are real (and probably non-trivial) optimizations that require us to read possibly anything that is not behind an |
Actually it should be able to do that, since all our references are |
Edit: After more experimentation, this example is inconclusive because LLVM doesn't figure it out even with more metadata. Still though, I do think that |
Races are not a concern, in LLVM a read-write race just means the read returns undef. |
Oh this is a good point about the races, I thought they returned |
Yeah, it can do the 2-byte load, freeze that, and then compare with |
We do not currently have a clear description of what the semantics of the Discriminant() MIR operation, and the corresponding intrinsic (exposed via mem::discriminant()), are -- specifically, what are the safety preconditions of this operation, and when is it UB?
Note that this operation works on all types, not just enums. For valid values of non-enum types it returns some valid integer value (currently, 0).
The implementation in Miri (to be restored with #91088) does the minimum amount of work necessary to determine the discriminant: if the type has no discriminant (since there are not at least 2 variants), the operation is always defined; otherwise it reads the tag (which encodes the discriminant) and causes UB if that is uninitialized or does not encode a valid discriminant. (There are some thorny questions here around what happens if the discriminant has provenance; I would like to keep that out of scope for this issue -- it should likely be treated like a ptr-to-int transmute, whatever we end up doing with that: rust-lang/unsafe-code-guidelines#286.)
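The rule described above can be modeled in a few lines. This is a simplified sketch and not Miri's code: the names `AbstractByte`, `ModelLayout`, `ReadResult`, and `read_discriminant` are hypothetical, the tag is assumed to be a single byte, and the discriminant is assumed to equal the tag value (a direct encoding). Provenance is ignored, matching the scope set out above.

```rust
/// One byte of abstract memory; `None` models an uninitialized byte.
pub type AbstractByte = Option<u8>;

/// Outcome of the modeled Discriminant operation.
#[derive(Debug, PartialEq, Eq)]
pub enum ReadResult {
    Defined(u128),
    Ub(&'static str),
}

/// Hypothetical, simplified layout description.
pub enum ModelLayout {
    /// Fewer than two variants: there is no tag to read.
    NoTag,
    /// A one-byte tag at `offset`; `valid_tags` lists the values that encode a variant.
    DirectTag { offset: usize, valid_tags: Vec<u8> },
}

/// Does the minimum work: inspects only the tag byte, never the payload.
/// Panics if `offset` is out of bounds for `bytes`.
pub fn read_discriminant(layout: &ModelLayout, bytes: &[AbstractByte]) -> ReadResult {
    match layout {
        // Nothing is read, so the operation is always defined.
        ModelLayout::NoTag => ReadResult::Defined(0),
        ModelLayout::DirectTag { offset, valid_tags } => match bytes[*offset] {
            None => ReadResult::Ub("tag is uninitialized"),
            Some(tag) if valid_tags.contains(&tag) => ReadResult::Defined(tag as u128),
            Some(_) => ReadResult::Ub("tag does not encode a valid discriminant"),
        },
    }
}
```

In this model the payload bytes may be uninitialized without causing UB, which is the sense in which the operation makes the fewest assumptions about the value.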
The codegen backend adds some extra UB for the case where the type is uninhabited:
rust/compiler/rustc_codegen_ssa/src/mir/place.rs
Lines 206 to 215 in 81117ff
We also have a related MIR optimization in https://github.com/rust-lang/rust/blob/93542a8240c5f926ac5f3f99cef99366082f9c2b/compiler/rustc_mir_transform/src/uninhabited_enum_branching.rs. I am not quite sure what this does though, it seems to be more about assuming that if a particular enum variant is uninhabited then we will never see the discriminant for that variant, and can hence remove it from the SwitchInt?
An 'obvious' choice is to say that the value passed to the Discriminant operator must fully satisfy its validity invariant -- that would certainly justify both the MIR optimization and what the codegen backend does. However, this also has problems:
Discriminant operations on partially moved-out-of enums (Safe function MIR reads discriminant of moved-out local #91029). Depending on the semantics of 'move' and whether validity invariants might take into account what a pointer points to (such as requiring that a Box be initialized), this might lead to calling Discriminant on invalid values.
These observations make me doubtful that requiring full validity is the right thing. Making the fewest assumptions is appealing IMO, but not compatible with our codegen backend nor with the MIR optimizations -- the optimization seems to kick in even for operations of the form Discriminant(*ptr), so the validity invariant of ptr itself does not help either. It could be possible to strike some middle ground, but that feels like a rather ad-hoc adjustment to the current set of optimizations.
To summarize:
Cc @wesleywiser @tmandry @rust-lang/wg-mir-opt