generator fields are not necessarily initialized #56100
@@ -142,6 +142,7 @@ macro_rules! make_value_visitor {
    self.walk_value(v)
}
/// Visit the given value as a union. No automatic recursion can happen here.
/// Also called for the fields of a generator, which may or may not be initialized.
I don't see this happening in the code below.
Oh yeah I went back on this because it doesn't work very well... I guess I could still do it and go through `visit_field` though.
Actually no that doesn't work, it doesn't have a union type. I don't think there is a way to visit the other generator fields at all with the current interface, and it doesn't seem worth extending the interface?
Oh yea, that's totally fine, as long as the comments mirror reality ;)
Well, as long as validation doesn't get hiccups elsewhere because https://github.com/solson/miri/blob/adfede5cec2c8a136830f7fc309dbb45ac7a098a/src/helpers.rs#L221 wasn't visited in miri.
Hm, that is a good point. I forgot that I was using `visit_union` there.
This is relevant when determining where there are `UnsafeCell` inside a generator. If there is no `UnsafeCell`, shared references enforce memory to be frozen. So we probably should go conservatively type-based here like we do for unions... dang.
Just calling `visit_union` after doing the field projections would actually work, but it would violate the protocol that lets a visitor keep track of which "path" inside the data structure we are at. The only visitor relying on the path is validation, which doesn't do anything for unions, so this is fine in principle... but it's not nice. Any ideas?
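To make the `UnsafeCell` point concrete, here is a minimal sketch in plain Rust (nothing generator-specific; `bump` and `read` are illustrative names, not part of the PR). A type that contains an `UnsafeCell`, such as `Cell`, may be mutated through a shared reference, whereas memory without one is frozen behind `&`:

```rust
use std::cell::Cell;

// `Cell<T>` wraps an `UnsafeCell<T>`, so writing through `&` is permitted.
fn bump(shared: &Cell<i32>) -> i32 {
    shared.set(shared.get() + 1);
    shared.get()
}

// No `UnsafeCell` here: the memory behind `&i32` is frozen, so we can only read.
fn read(shared: &i32) -> i32 {
    *shared
}

fn main() {
    let counter = Cell::new(1);
    println!("bumped: {}", bump(&counter));
    println!("read: {}", read(&41));
}
```

Since a generator field may be uninitialized, a visitor cannot inspect its value to decide which case applies, which is why the comment suggests deciding conservatively from the type alone.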
I added a new `visit_generator_field` hook for this. Now at least it makes sense, and likely nobody will ever override that hook...
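As a rough illustration of what such a hook could look like, here is a toy visitor. It is a sketch under assumptions: `Value`, `Visitor`, `Counter` and `count` are hypothetical and much simpler than rustc's `make_value_visitor!` machinery; only the hook names `visit_union`, `visit_field` and `visit_generator_field` are taken from this thread.

```rust
// Toy model only: these types are hypothetical, not rustc's real interface.
enum Value {
    Int(i64),
    Struct(Vec<Value>),
    // A generator: a state tag plus fields that may be uninitialized.
    Generator { state: u32, fields: Vec<Value> },
}

trait Visitor {
    fn visit_int(&mut self, _v: i64) {}

    // Opaque data: no automatic recursion happens here.
    fn visit_union(&mut self) {}

    // Ordinary field: recurse into it.
    fn visit_field(&mut self, _index: usize, field: &Value) {
        self.walk_value(field);
    }

    // Generator field: by default treated like a union, i.e. not looked into.
    fn visit_generator_field(&mut self, _index: usize, _field: &Value) {
        self.visit_union();
    }

    fn walk_value(&mut self, v: &Value) {
        match v {
            Value::Int(i) => self.visit_int(*i),
            Value::Struct(fields) => {
                for (i, f) in fields.iter().enumerate() {
                    self.visit_field(i, f);
                }
            }
            Value::Generator { state, fields } => {
                // Assumption of this sketch: the state tag is always
                // initialized, so it is visited like a normal value.
                self.visit_int(i64::from(*state));
                for (i, f) in fields.iter().enumerate() {
                    self.visit_generator_field(i, f);
                }
            }
        }
    }
}

// Counts visited integers and opaque (union-like) positions.
#[derive(Default)]
struct Counter {
    ints: usize,
    opaque: usize,
}

impl Visitor for Counter {
    fn visit_int(&mut self, _v: i64) {
        self.ints += 1;
    }
    fn visit_union(&mut self) {
        self.opaque += 1;
    }
}

fn count(v: &Value) -> (usize, usize) {
    let mut c = Counter::default();
    c.walk_value(v);
    (c.ints, c.opaque)
}

fn main() {
    let generator = Value::Generator {
        state: 0,
        fields: vec![Value::Int(1), Value::Struct(vec![Value::Int(2)])],
    };
    println!("{:?}", count(&generator));
}
```

Because the hook has a default body that forwards to `visit_union`, a visitor that only cares about unions picks up the conservative behavior for generator fields without implementing anything extra.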
// (which is the state) are actually implicitly `MaybeUninit`, i.e.,
// they may or may not be initialized, so we cannot visit them.
match v.layout().ty.sty {
    ty::Generator(..) => {
Niche code also has an exception for generator fields
rust/src/librustc/ty/layout.rs
Lines 1812 to 1817 in 7a0cef7
// Locals variables which live across yields are stored
// in the generator type as fields. These may be uninitialized
// so we don't look for niches there.
if let ty::Generator(..) = layout.ty.sty {
    return Ok(None);
}
Would it make sense to try to simplify all downstream code for generators by wrapping all its fields with `MaybeUninit` very early?
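For intuition, this is roughly what "fields wrapped in `MaybeUninit`" would mean if the generator state from the PR description were written by hand. It is only a sketch: `GenState` and its methods are hypothetical names, and the real generator transform produces MIR, not a struct like this.

```rust
use std::mem::MaybeUninit;

// Hypothetical hand-written stand-in for a generator's state.
struct GenState {
    // The state tag is always initialized.
    state: u32,
    // A local that lives across the yield; only valid while `state == 1`.
    x: MaybeUninit<Box<i32>>,
}

impl GenState {
    fn new() -> Self {
        // Only the tag is written, as in the MIR from the PR description.
        GenState { state: 0, x: MaybeUninit::uninit() }
    }

    // Runs up to the yield point and returns the yielded value.
    fn start(&mut self) -> i32 {
        assert_eq!(self.state, 0);
        let x: &mut Box<i32> = self.x.write(Box::new(5));
        let yielded = **x;
        self.state = 1;
        yielded
    }

    // Runs from the yield point to the end and returns the final value.
    fn finish(&mut self) -> i32 {
        assert_eq!(self.state, 1);
        self.state = 2;
        // SAFETY: state 1 guarantees `x` was initialized by `start`, and the
        // tag was just advanced, so `x` is never read again.
        let mut x: Box<i32> = unsafe { self.x.assume_init_read() };
        *x = 10;
        *x
    }
}

fn main() {
    let created = GenState::new();
    // Moving the value while `x` is uninitialized is fine for `MaybeUninit`.
    let mut moved = created;
    println!("yielded {}", moved.start());
    println!("returned {}", moved.finish());
}
```

With this shape, moving a freshly created value is valid even though `x` holds no `Box` yet, which is the property generators currently rely on implicitly. Note that the sketch has no `Drop` impl, so dropping it in state 1 would leak the `Box`.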
Interesting, yes that would be the same exception.
I am not sure how complicated it would be for generators to do this wrapping.
IMO generators should be treated like a `union` with field offsets.
Unless we want to generate "variants" for the states involved, which would be a bit more work, but would provide a safe view into the state of the generator.
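The "variants" idea can be pictured as an ordinary enum with one variant per state. This is a hand-written sketch under assumptions, not what the compiler generates: `Gen`, `Step` and `resume` are illustrative names modeled loosely on the example from the PR description.

```rust
// Hypothetical enum view of a generator: each variant holds exactly the
// locals that are live in that state, so every field is always initialized.
enum Gen {
    Unresumed,
    Suspended { x: Box<i32> },
    Returned,
}

#[derive(Debug, PartialEq)]
enum Step {
    Yielded(i32),
    Complete(i32),
}

impl Gen {
    fn resume(&mut self) -> Step {
        // Take the current state out, leaving `Returned` behind as a placeholder.
        match std::mem::replace(self, Gen::Returned) {
            Gen::Unresumed => {
                let x = Box::new(5);
                let yielded = *x;
                *self = Gen::Suspended { x };
                Step::Yielded(yielded)
            }
            Gen::Suspended { mut x } => {
                *x = 10;
                Step::Complete(*x)
            }
            Gen::Returned => panic!("generator resumed after completion"),
        }
    }
}

fn main() {
    let mut g = Gen::Unresumed;
    println!("{:?}", g.resume());
    println!("{:?}", g.resume());
}
```

The safe view comes at a price in this sketch: `mem::replace` moves the whole state out and a new variant back in on every transition, whereas a compiler-generated layout would want to switch variants in place.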
They don't have `Union` layout though, so right now they need special treatment everywhere.
> Unless we want to generate "variants" for the states involved, which would be a bit more work, but would provide a safe view into the state of the generator.

I was considering that, but I don't know if that actually works in a non-scary way, as you'll want to switch from one variant to another without copying everything.

> IMO generators should be treated like a `union` with field offsets.

But why the entire generator? The discriminant field is perfectly safe to read and we could even do value range restrictions on it to be able to use niche optimizations on generators.
> The discriminant field is perfectly safe to read and we could even do value range restrictions on it to be able to use niche optimizations on generators.
In fact that would make perfect sense, it encodes the state after all and hence has a limited value range.
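The effect of a limited value range can be seen today on a plain fieldless enum, which is a reasonable stand-in for a state tag. This is a sketch: `State` and `sizes` are hypothetical names, and the sizes reflect how rustc lays these types out in practice rather than a documented guarantee.

```rust
use std::mem::size_of;

// Hypothetical state tag of a generator with a single suspension point.
// Only three of the 256 possible byte values are valid.
#[allow(dead_code)]
#[derive(Clone, Copy)]
enum State {
    Unresumed,
    Suspended0,
    Returned,
}

// Returns the sizes of `State`, `Option<State>` and `Option<u32>`.
fn sizes() -> (usize, usize, usize) {
    (
        size_of::<State>(),
        size_of::<Option<State>>(),
        size_of::<Option<u32>>(),
    )
}

fn main() {
    let (tag, opt_tag, opt_u32) = sizes();
    // `Option<State>` can store `None` in an unused tag value (a niche),
    // while `Option<u32>` needs extra space because every `u32` is valid.
    println!("State: {tag}, Option<State>: {opt_tag}, Option<u32>: {opt_u32}");
}
```

If a generator's state field carried a validity range like this, wrapping the generator in an `Option` could in principle reuse the spare tag values in the same way.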
Cc @Nemo157
I'm assuming this is going to be another size pessimization for generators. :( sighs and looks longingly at #52924
@cramertj I don't follow. This PR doesn't change generator layout at all, and layout computation already pretty much treats them as
@RalfJung Ah I missed the comment above saying that we already ignored niches in the layout optimizations. You could imagine initializing the object such that it had a bit-valid repr on creation to prevent UB, but we don't do that, so... :)
@cramertj Since it's like a tagged enum, IMO we should use the tag ("current state") as a niche, by giving it a validity range based on the number of states.
That might not even be possible for some types (uninhabited types, for example), and would certainly be "fun" for references (which have to be dereferenceable).^^ But also... why?
I mean, this doesn't seem unreasonable to me?
TBH I don't even see how it helps, let alone how it ever amortizes the cost of having to set the right bit pattern on initialization.^^ But, anyway, if the state tag gets a niche then
I think the most valuable size optimization would be that the discriminants of all the generators in a stack of generators get unified into a single discriminant value. I doubt using the niches of fields is that important.
Coming back to the topic of this PR... it seems everyone agrees that currently, the fields of a
@bors r+ Yes, this PR represents the current state of how the compiler views generators and I think this code will break if we try to change that representation, so we'll notice
📌 Commit 6befe67 has been approved by
…-obk

generator fields are not necessarily initialized

Looking at the MIR we generate for generators, I think we deliberately leave fields of the generator uninitialized in ways that would be illegal if this was a normal struct (or rather, one would have to use `MaybeUninit`). Consider [this example](https://play.rust-lang.org/?version=nightly&mode=debug&edition=2018&gist=417b4a2950421b726dd7b307e9ee3bec):

```rust
#![feature(generators, generator_trait)]

fn main() {
    let generator = || {
        let mut x = Box::new(5);
        {
            let y = &mut *x;
            *y = 5;
            yield *y;
            *y = 10;
        }
        *x
    };
    let _gen = generator;
}
```

It generates the MIR

```
fn main() -> (){
    let mut _0: ();                      // return place
    scope 1 {
        scope 3 {
        }
        scope 4 {
            let _2: [generator@src/main.rs:4:21: 13:6 for<'r> {std::boxed::Box<i32>, i32, &'r mut i32, ()}]; // "_gen" in scope 4 at src/main.rs:14:9: 14:13
        }
    }
    scope 2 {
        let _1: [generator@src/main.rs:4:21: 13:6 for<'r> {std::boxed::Box<i32>, i32, &'r mut i32, ()}]; // "generator" in scope 2 at src/main.rs:4:9: 4:18
    }

    bb0: {
        StorageLive(_1);                 // bb0[0]: scope 0 at src/main.rs:4:9: 4:18
        (_1.0: u32) = const 0u32;        // bb0[1]: scope 0 at src/main.rs:4:21: 13:6
                                         // ty::Const
                                         // + ty: u32
                                         // + val: Scalar(Bits { size: 4, bits: 0 })
                                         // mir::Constant
                                         // + span: src/main.rs:4:21: 13:6
                                         // + ty: u32
                                         // + literal: Const { ty: u32, val: Scalar(Bits { size: 4, bits: 0 }) }
        StorageLive(_2);                 // bb0[2]: scope 1 at src/main.rs:14:9: 14:13
        _2 = move _1;                    // bb0[3]: scope 1 at src/main.rs:14:16: 14:25
        drop(_2) -> bb1;                 // bb0[4]: scope 1 at src/main.rs:15:1: 15:2
    }

    bb1: {
        StorageDead(_2);                 // bb1[0]: scope 1 at src/main.rs:15:1: 15:2
        StorageDead(_1);                 // bb1[1]: scope 0 at src/main.rs:15:1: 15:2
        return;                          // bb1[2]: scope 0 at src/main.rs:15:2: 15:2
    }
}
```

Notice how we only initialize the first field of `_1` (even though it contains a `Box`!), and then assign it to `_2`. This violates the rule "on assignment, all data must satisfy the validity invariant", and hence miri complains about this code.

What this PR effectively does is to change the validity invariant for generators such that it says nothing about the fields of the generator. We behave as if every field of the generator was wrapped in a `MaybeUninit`.

r? @oli-obk

Cc @nikomatsakis @eddyb @cramertj @withoutboats @Zoxc
Rollup of 14 pull requests

Successful merges:

- #56024 (Don't auto-inline const functions)
- #56045 (Check arg/ret sizedness at ExprKind::Path)
- #56072 (Stabilize macro_literal_matcher)
- #56075 (Encode a custom "producers" section in wasm files)
- #56100 (generator fields are not necessarily initialized)
- #56101 (Incorporate `dyn` into more comments and docs.)
- #56144 (Fix BTreeSet and BTreeMap gdb pretty-printers)
- #56151 (Move a flaky process test out of libstd)
- #56170 (Fix self profiler ICE on Windows)
- #56176 (Panic setup msg)
- #56204 (Suggest correct enum variant on typo)
- #56207 (Stabilize the int_to_from_bytes feature)
- #56210 (read_c_str should call the AllocationExtra hooks)
- #56211 ([master] Forward-ports from beta)

Failed merges:

r? @ghost