Untagged unions (tracking issue for RFC 1444) #32836

Closed
3 of 5 tasks
Tracked by #17
nikomatsakis opened this issue Apr 8, 2016 · 210 comments · Fixed by #65747
Labels
B-RFC-approved Blocker: Approved by a merged RFC but not yet implemented. B-unstable Blocker: Implemented in the nightly compiler and unstable. C-tracking-issue Category: An issue tracking the progress of sth. like the implementation of an RFC disposition-merge This issue / PR is in PFCP or FCP with a disposition to merge it. F-untagged_unions `#![feature(untagged_unions)]` finished-final-comment-period The final comment period is finished for this PR / Issue. T-lang Relevant to the language team, which will review and decide on the PR/issue.

Comments

@nikomatsakis
Contributor

nikomatsakis commented Apr 8, 2016

Tracking issue for rust-lang/rfcs#1444.

Unresolved questions:

Open issues of high import:

@nikomatsakis nikomatsakis added B-RFC-approved Blocker: Approved by a merged RFC but not yet implemented. T-lang Relevant to the language team, which will review and decide on the PR/issue. B-unstable Blocker: Implemented in the nightly compiler and unstable. labels Apr 8, 2016
@sfackler
Member

sfackler commented Apr 8, 2016

I may have missed it in the discussion on the RFC, but am I correct in thinking that destructors of union variants are never run? Would the destructor for the Box::new(1) run in this example?

union Foo {
    f: i32,
    g: Box<i32>,
}

let mut f = Foo { g: Box::new(1) };
f.g = Box::new(2);

@solson
Member

solson commented Apr 8, 2016

@sfackler My current understanding is that f.g = Box::new(2) will run the destructor but f = Foo { g: Box::new(2) } would not. That is, assigning to a Box<i32> lvalue will cause a drop like always, but assigning to a Foo lvalue will not.

@sfackler
Member

sfackler commented Apr 8, 2016

So an assignment to a variant is like an assertion that the field was previously "valid"?

@solson
Member

solson commented Apr 8, 2016

@sfackler For Drop types, yeah, that's my understanding. If they weren't previously valid you need to use the Foo constructor form or ptr::write. From a quick grep, it doesn't seem like the RFC is explicit about this detail, though. I see it as an instantiation of the general rule that writing to a Drop lvalue causes a destructor call.

@ohAitch

ohAitch commented Apr 8, 2016

Should a &mut union with Drop variants be a lint?

On Friday, 8 April 2016, Scott Olson wrote:

@sfackler For Drop types, yeah, that's my
understanding. If they weren't previously valid you need to use the Foo
constructor form or ptr::write. From a quick grep, it doesn't seem like
the RFC is explicit about this detail, though.



@joshtriplett
Member

On April 8, 2016 3:36:22 PM PDT, Scott Olson wrote:

@sfackler For Drop types, yeah, that's my understanding. If they
weren't previously valid you need to use the Foo constructor form or
ptr::write. From a quick grep, it doesn't seem like the RFC is
explicit about this detail, though.

I should have covered that case explicitly. I think both behaviors are defensible, but I think it'd be far less surprising to never implicitly drop a field. The RFC already recommends a lint for union fields with types that implement Drop. I don't think assigning to a field implies that field was previously valid.

@sfackler
Member

sfackler commented Apr 8, 2016

Yeah, that approach seems a bit less dangerous to me as well.

@solson
Member

solson commented Apr 8, 2016

Not dropping when assigning to a union field would make f.g = Box::new(2) act differently from let p = &mut f.g; *p = Box::new(2), because you can't make the latter case not drop. I think my approach is less surprising.

It's not a new problem, either; unsafe programmers already have to deal with other situations where foo = bar is UB if foo is uninitialized and Drop.
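
For readers on current stable Rust, the two forms compared above can be sketched as follows. This assumes the rule that was stabilized later, where a union field with a destructor must be wrapped in ManuallyDrop; it is not the semantics being debated at this point in the thread. Noisy and drops_seen are hypothetical names used only to count destructor calls. Under that assumption, neither the direct field write nor the write through a reference runs the old value's destructor:

```rust
use std::cell::Cell;
use std::mem::ManuallyDrop;

// Hypothetical helper: bumps a counter every time its destructor runs.
struct Noisy<'a>(&'a Cell<u32>);

impl Drop for Noisy<'_> {
    fn drop(&mut self) {
        self.0.set(self.0.get() + 1);
    }
}

#[allow(dead_code)]
union Foo<'a> {
    f: i32,
    g: ManuallyDrop<Noisy<'a>>,
}

// Returns the drop count after a field write, after a write through a
// reference, and after an explicit manual drop.
fn drops_seen() -> (u32, u32, u32) {
    let drops = Cell::new(0);
    let mut u = Foo { g: ManuallyDrop::new(Noisy(&drops)) };

    // Direct field write: safe, and the old value is leaked, not dropped.
    u.g = ManuallyDrop::new(Noisy(&drops));
    let after_field = drops.get();

    // Write through a reference: this "drops" the old ManuallyDrop wrapper,
    // which is a no-op, so the inner destructor still does not run.
    let p = unsafe { &mut u.g };
    *p = ManuallyDrop::new(Noisy(&drops));
    let after_ref = drops.get();

    // The caller asserts that `g` is the live field and drops it by hand.
    unsafe { ManuallyDrop::drop(&mut u.g) };
    (after_field, after_ref, drops.get())
}
```

Calling drops_seen() should give (0, 0, 1): the only destructor call is the explicit ManuallyDrop::drop, and the two earlier values are leaked.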

@joshtriplett
Member

I personally don't plan to use Drop types with unions at all. So I'll defer entirely to people who have worked with analogous unsafe code on the semantics of doing so.

@retep998
Member

retep998 commented Apr 9, 2016

I also don't intend to use Drop types in unions so either way doesn't matter to me as long as it is consistent.

@ohAitch

ohAitch commented Apr 9, 2016

I don't intend to use mutable references to unions, and probably
just "weirdly-tagged" ones with Into

On Friday, 8 April 2016, Peter Atashian wrote:

I also don't intend to use Drop types in unions so either way doesn't
matter to me as long as it is consistent.



@nikomatsakis
Contributor Author

Seems like this is a good issue to raise up as an unresolved question. I'm not sure yet which approach I prefer.

@joshtriplett
Member

@nikomatsakis As much as I find it awkward for assigning to a union field of a type with Drop to require previous validity of that field, the reference case @tsion mentioned seems almost unavoidable. I think this might just be a gotcha associated with code that intentionally disables the lint for putting a type with Drop in a union. (And a short explanation of it should be in the explanatory text for that lint.)

@solson
Member

solson commented Apr 12, 2016

And I'd like to reiterate that unsafe programmers must already generally know that a = b means drop_in_place(&mut a); ptr::write(&mut a, b) to write safe code. Not dropping union fields would be one more exception to learn, not one less.

(NB: the drop doesn't happen when a is statically known to already be uninitialized, like let a; a = b;.)

But I support having a default warning against Drop variants in unions that people have to #[allow(..)] since this is a fairly non-obvious detail.
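
The difference between plain assignment and ptr::write described above can be checked with a small counter. This is only a sketch for ordinary (non-union) values; Noisy and count_drops are hypothetical names introduced for the illustration:

```rust
use std::cell::Cell;
use std::ptr;

// Hypothetical helper: bumps a counter every time its destructor runs.
struct Noisy<'a>(&'a Cell<u32>);

impl Drop for Noisy<'_> {
    fn drop(&mut self) {
        self.0.set(self.0.get() + 1);
    }
}

// Returns the drop count after `a = b` and after `ptr::write(&mut a, b)`.
#[allow(unused_assignments)]
fn count_drops() -> (u32, u32) {
    let drops = Cell::new(0);
    let mut a = Noisy(&drops);

    // Plain assignment to an initialized place drops the old value first.
    a = Noisy(&drops);
    let after_assign = drops.get();

    // ptr::write overwrites without dropping; the old value is leaked.
    unsafe { ptr::write(&mut a, Noisy(&drops)) };
    let after_write = drops.get();

    (after_assign, after_write)
}
```

count_drops() should return (1, 1): the assignment accounts for one destructor call, and ptr::write adds none.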

@nikomatsakis
Contributor Author

@tsion this is not true for a = b and maybe only sometimes true for a.x = b but it is certainly true for *a = b. This uncertainty is what made me hesitant about it. For example, this compiles:

fn main() {
  let mut x: (i32, i32);
  x.0 = 2;
  x.1 = 3;
}

(though trying to print x later fails, but I consider that a bug)

@solson
Member

solson commented Apr 12, 2016

@nikomatsakis That example is new to me. I guess I would have considered it a bug that that example compiles, given my previous experience.

But I'm not sure I see the relevance of that example. Why is what I said not true for a = b and only sometimes for a.x = b?

Say, if x.0 had a type with a destructor, surely that destructor is called:

fn main() {
    let mut x: (Box<i32>, i32);
    x.0 = Box::new(2); // x.0 statically known to be uninit, destructor not called
    x.0 = Box::new(3); // x.0 destructor is called before writing new value
}

@arielb1
Contributor

arielb1 commented Apr 14, 2016

Maybe just lint against that kind of write?

@nikomatsakis
Contributor Author

My point is only that = does not always run the destructor; it
uses some knowledge about whether the target is known to be
initialized.

On Tue, Apr 12, 2016 at 04:10:39PM -0700, Scott Olson wrote:

@nikomatsakis That example is new to me. I guess I would have considered it a bug that that example compiles, given my previous experience.

But I'm not sure I see the relevance of that example. Why is what I said not true for a = b and only sometimes for 'a.x = b'?

Say, if x.0 had a type with a destructor, surely that destructor is called:

fn main() {
    let mut x: (Box<i32>, i32);
    x.0 = Box::new(2); // x.0 statically known to be uninit, destructor not called
    x.0 = Box::new(3); // x.0 destructor is called
}

@arielb1
Contributor

arielb1 commented Apr 16, 2016

@nikomatsakis

It runs the destructor if the drop flag is set.

But I think that kind of write is confusing anyway, so why not just forbid it? You can always do *(&mut u.var) = val.

@solson
Member

solson commented Apr 16, 2016

My point is only that = does not always run the destructor; it uses some knowledge about whether the target is known to be initialized.

@nikomatsakis I already mentioned that:

(NB: the drop doesn't happen when a is statically known to already be uninitialized, like let a; a = b;.)

But I didn't account for dynamic checking of drop flags, so this is definitely more complicated than I considered.
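
A minimal sketch of the two cases discussed here, for ordinary local variables rather than unions: a first assignment to a place that is statically known to be uninitialized drops nothing, and a conditionally initialized local is dropped at scope end only if it was actually initialized (the "drop flag" case). Noisy and both function names are hypothetical:

```rust
use std::cell::Cell;

// Hypothetical helper: bumps a counter every time its destructor runs.
struct Noisy<'a>(&'a Cell<u32>);

impl Drop for Noisy<'_> {
    fn drop(&mut self) {
        self.0.set(self.0.get() + 1);
    }
}

// Static case: `a` is known to be uninitialized, so `a = ...` drops nothing.
fn first_assignment_drops_nothing() -> u32 {
    let drops = Cell::new(0);
    let a;
    a = Noisy(&drops);
    let seen = drops.get();
    drop(a);
    seen
}

// Dynamic case: whether `x` is dropped at scope end depends on whether the
// branch that initializes it was taken.
#[allow(unused_variables, unused_assignments)]
fn drops_at_scope_end(init: bool) -> u32 {
    let drops = Cell::new(0);
    {
        let x;
        if init {
            x = Noisy(&drops);
        }
        // End of scope: the destructor runs only if `x` was initialized.
    }
    drops.get()
}
```

Here first_assignment_drops_nothing() should return 0, drops_at_scope_end(true) should return 1, and drops_at_scope_end(false) should return 0.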

@arielb1
Contributor

arielb1 commented Apr 17, 2016

@tsion

Drop flags are only semi-dynamic - after zeroing drop is gone, they are a part of codegen. I say we forbid that kind of write because it causes more confusion than it does good.

@ghost

ghost commented Apr 27, 2016

Should Drop types even be allowed in unions? If I'm understanding things correctly, the main reason to have unions in Rust is to interface with C code that has unions, and C doesn't even have destructors. For all other purposes, it seems that it's better to just use an enum in Rust code.

@Amanieu
Member

Amanieu commented Apr 27, 2016

There is a valid use case for using a union to implement a NoDrop type which inhibits drop.

@joshtriplett
Member

As well as invoking such code manually via drop_in_place or similar.
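
A rough sketch of the drop-inhibiting use case, written for current stable Rust, where (as an assumption about the later stabilized rules, not something settled in this thread) a field with a destructor has to be wrapped in ManuallyDrop. The union adds an "empty" state on top of that wrapper. Slot, Noisy and leak_then_drop are hypothetical names:

```rust
use std::cell::Cell;
use std::mem::ManuallyDrop;

// Hypothetical helper: bumps a counter every time its destructor runs.
struct Noisy<'a>(&'a Cell<u32>);

impl Drop for Noisy<'_> {
    fn drop(&mut self) {
        self.0.set(self.0.get() + 1);
    }
}

// A slot that may or may not hold a T. The union has no drop glue, so the
// contents are never dropped implicitly.
#[allow(dead_code)]
union Slot<T> {
    empty: (),
    value: ManuallyDrop<T>,
}

// Returns the drop count after a slot goes out of scope untouched, and after
// a second slot has its contents dropped by hand.
fn leak_then_drop() -> (u32, u32) {
    let drops = Cell::new(0);
    {
        let _leaked = Slot { value: ManuallyDrop::new(Noisy(&drops)) };
        // Scope end: nothing is dropped.
    }
    let after_scope = drops.get();
    {
        let mut s = Slot { value: ManuallyDrop::new(Noisy(&drops)) };
        // The caller asserts that `value` is the live field.
        unsafe { ManuallyDrop::drop(&mut s.value) };
    }
    (after_scope, drops.get())
}
```

leak_then_drop() should return (0, 1): the first slot leaks its contents, and the second is dropped only because of the explicit call.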

@RumataEstor

To me dropping a field value while writing to it is definitely wrong because the previous option type is undefined.

Would it be possible to prohibit field setters but require full union replacement? In this case if the union implements Drop full union drop would be called for the value replaced as expected.

@joshtriplett
Member

I don't think it makes sense to prohibit field setters; most uses of unions should have no problem using those, and fields without a Drop implementation will likely remain the common case. Unions with fields that implement Drop will produce a warning by default, making it even less likely to hit this case accidentally.

@petrochenkov
Contributor

petrochenkov commented Jul 29, 2018

@joshtriplett

primary use cases of unions

It's not obvious to me at all why this is the primary use case.
It may be true for repr(C) unions if you assume that all uses of unions for tagged unions / "Rust enum emulation" in FFI assume extensibility (which is not true), but from what I've seen, uses of repr(Rust) unions (drop control, initialization control, transmutes) do not expect "unexpected variants" suddenly appearing in them.

@joshtriplett
Member

@petrochenkov I didn't say "break the primary use case", I said "break primary use cases". FFI is one of the primary use cases of unions.

@scottmcm
Member

and take the union (heh ;) ) of all of those sets

There's certainly an attractive obviousness to a statement that "the possible values of a union are the union of the possible values of all its possible variants"...

@RalfJung
Member

RalfJung commented Jul 30, 2018

True. However, that's not the proposal -- we all agree that the following should be legal:

union F {
  x: (u8, bool),
  y: (bool, u8),
}
fn foo() -> F {
  let mut f = F { x: (5, false) };
  unsafe { f.y.1 = 17; }
  f
}

Actually I think it is a bug that this even requires unsafe.

So, the union has to be taken bytewise, at least.
Also, I don't think "attractive obviousness" on its own is a sufficiently good reason. Any invariant we decide on is a significant burden for unsafe code authors, we should have concrete advantages that we get in turn.

@petrochenkov
Contributor

petrochenkov commented Jul 30, 2018

@RalfJung

Actually I think it is a bug that this even requires unsafe.

I don't know about the new MIR-based unsafety-checker implementation, but in the old HIR-based one it was certainly a checker limitation/simplification - only expressions of the form expr1.field = expr2 were analyzed for possible "field assignment" unsafety opt-out, everything else was conservatively treated as generic "field access" that's unsafe for unions.

@petrochenkov
Contributor

Answering the comment in #52786 (comment):

So the idea is that the compiler still doesn't know anything about Wrap<T>'s contract and can't e.g. do layout optimizations. Ok, this position is understood.
This means that internally, inside of Wrap's module, the implementation of Wrap<T> can, for example, temporarily write "unexpected values" into it, if it doesn't leak them to users, and the compiler will be okay with them.

I'm not sure though how exactly the part of Wrap's contract about absence of unexpected values is related to field privacy.

First of all, regardless of fields being private or public, unexpected values cannot be written directly through those fields. You need something like a raw pointer, or code on the other side of FFI to do it, and it can be done without any field access, just by having a pointer to the whole union. So we need to approach this from some other direction than access to a field being restricted.

As I interpret your comment, the approach is to say that a private field (in a union or a struct, doesn't matter) implies an arbitrary invariant unknown to the user, so any operations changing that field (directly or through wild pointers, doesn't matter) result in UB because they can potentially break that unspecified invariant.

This means that if a union has a single private field, then its implementer (but not the compiler) can assume that no third party will write an unexpected value into that union.
That's a "default union documentation clause" for the user in some sense:
- (Default) If a union has a private field you can't write garbage into it.
- Otherwise, you can write garbage into a union unless its docs explicitly prohibit it.

If some union wants to prohibit unexpected values while still providing pub access to its expected fields (e.g. when those fields have no invariants of their own), then it can still do so through documentation; that's why the "unless" in the second clause is necessary.

@RalfJung
Does this describe your position accurately?

How are scenarios like this treated?

mod m {
    union MyPrivateUnion { /* private fields */ }
    extern {
        fn my_private_ffi_function() -> MyPrivateUnion; // Can return garbage (?)
    }
}

@RalfJung
Member

RalfJung commented Aug 6, 2018

As I interpret your comment, the approach is to say that a private field (in a union or a struct, doesn't matter) implies an arbitrary invariant unknown to the user, so any operations changing that field (directly or through wild pointers, doesn't matter) result in UB because they can potentially break that unspecified invariant.

No, that is not what I meant.

There are multiple invariants. I do not know how many we will need, but there will be at least two (and I don't have great names for them):

  • The "Layout-level invariant" (or "syntactic invariant") of a type is completely defined by the syntactic shape of the type. These are things like "&mut T is non-NULL and aligned", "bool is 0 or 1", "! cannot exist". On this level, *mut T is the same as usize -- both allow any value (or maybe any initialized value, but that distinction is for another discussion). We are, eventually, going to have a document spelling out these invariants for all types, by structural recursion: The layout-level invariant of a struct is that all its fields have their invariant maintained, etc. Visibility does not play a role here.

    Violating the layout-level invariant is instantaneous UB. This is a statement we can make because we have defined this invariant in very simple terms, and we make it part of the definition of the language itself. We can then exploit this UB (and we already do), e.g. to perform enum layout optimizations.

  • The "Custom type-level invariant" (or "semantic invariant") of a type is picked by whoever implements the type. The compiler cannot know this invariant as we do not have a language to express it, and the same goes for the language definition. We cannot make violating this invariant UB, as we cannot even say what that invariant is! The fact that it is even possible to have custom invariants is a feature of any useful type system: Abstraction. I wrote more about this in a past blog post.

    The connection between the custom, semantic invariant and UB is that we declare that unsafe code may rely on its semantic invariants being preserved by foreign code. That makes it incorrect to just go ahead and put random stuff into a Vec's size field. Note that I said incorrect (I sometimes use the term unsound) -- but not undefined behavior! Another example to demonstrate this difference (really, the same example) is the discussion about aliasing rules for &mut ZST. Creating a dangling well-aligned non-null &mut ZST is never immediate UB, but it is still incorrect/unsound because one may write unsafe code which relies on this not to happen.
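
As a small illustration of how the layout-level invariant is exploited for enum layout optimizations: because a reference is never null and a NonZeroU32 is never zero, the compiler can use the forbidden bit pattern to represent None. The function name niche_sizes is hypothetical; the two size equalities it checks are ones the standard library documentation guarantees.

```rust
use std::mem::size_of;
use std::num::NonZeroU32;

// Checks that wrapping these types in Option costs no extra space, which is
// only possible because some bit patterns are invalid for the inner type.
fn niche_sizes() -> (bool, bool) {
    (
        // `&u8` can never be null, so null can encode `None`.
        size_of::<Option<&u8>>() == size_of::<&u8>(),
        // `NonZeroU32` can never be 0, so 0 can encode `None`.
        size_of::<Option<NonZeroU32>>() == size_of::<u32>(),
    )
}
```

This is also why violating the layout-level invariant has to be immediate UB: a null "reference" stored in such an Option would be indistinguishable from None.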

It would be nice to align these two concepts, but I do not think it is practical. First of all, for some types (function pointers, dyn traits), the definition of the custom, semantic invariant actually uses the definition of UB in the language. This definition would be circular if we wanted to say that it is UB to ever violate the custom, semantic invariant. Secondly, I'd prefer if the definition of our language, and whether a certain execution trace exhibits UB, was a decidable property. Semantic, custom invariants are frequently not decidable.


I'm not sure though how exactly the part of Wrap's contract about absence of unexpected values is related to field privacy.

Essentially, when a type chooses its custom invariant, it has to make sure that anything that safe code can do preserves the invariant. After all, the promise is that just using this type's safe API can never lead to UB. This applies to both structs and unions. One of the things safe code can do is access public fields, which is where this connection comes from.

For example, a public field of a struct cannot have a custom invariant that is different from the custom invariant of the field type: After all, any safe user could write arbitrary data into that field, or read from the field and expect "good" data. A struct where all fields are public can be safely constructed, placing further restrictions on the field.

A union with a public field... well that's somewhat interesting. Reading union fields is unsafe anyway, so nothing changes there. Writing union fields is safe, so a union with a public field has to be able to handle arbitrary data which satisfies that field's type's custom invariant being put into the field. I doubt this will be very useful...
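
A minimal sketch of that asymmetry: the write below needs no unsafe block, while the read does. IntOrFloat and bits_of_one are hypothetical names, and repr(C) is added here so that both fields are known to start at offset 0.

```rust
#[repr(C)]
union IntOrFloat {
    i: u32,
    f: f32,
}

// Writes the float 1.0 and reads the same bytes back as an integer.
fn bits_of_one() -> u32 {
    let mut u = IntOrFloat { i: 0 };
    // Safe: writing a Copy field cannot run a destructor or read stale data.
    u.f = 1.0;
    // Unsafe: the compiler cannot know which field was written last.
    unsafe { u.i }
}
```

bits_of_one() should return 0x3f80_0000, the IEEE 754 bit pattern of 1.0f32. Any u32 is a valid value here, so this particular read is fine; reading a field such as bool after arbitrary bytes were written would not be.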

So, to recap, when you choose a custom invariant, it is your responsibility to make sure that foreign safe code cannot break this invariant (and you have tools like private fields to help you achieve this). It is the responsibility of foreign unsafe code to not violate your invariant when that code does something safe code could not do.


This means that internally, inside of Wrap's module, the implementation of Wrap<T> can, for example, temporarily write "unexpected values" into it, if it doesn't leak them to users, and the compiler will be okay with them.

Correct. (panic-safety is a concern here but you are probably aware). This is just like, in Vec, I can safely do

let sz = self.size;
self.size = 1337;
self.size = sz;

and there is no UB.
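
The same idea in a self-contained form, as a sketch: a hypothetical type Even keeps the custom invariant "n is even" behind a private field. Code inside the module may break the invariant for a moment, and that is not UB, because the compiler knows nothing about it; it only has to be restored before anyone outside can observe the value.

```rust
mod even {
    // Custom (semantic) invariant, known only to this module: `n` is even.
    pub struct Even {
        n: u32,
    }

    impl Even {
        // The only way for outside safe code to obtain an `Even`.
        pub fn new(n: u32) -> Option<Even> {
            if n % 2 == 0 { Some(Even { n }) } else { None }
        }

        pub fn get(&self) -> u32 {
            self.n
        }

        pub fn add_two(&mut self) {
            // Invariant temporarily broken: `n` is odd here. No UB, since
            // nothing can observe the value between the two steps.
            self.n = self.n.wrapping_add(1);
            // Invariant restored before returning to the caller.
            self.n = self.n.wrapping_add(1);
        }
    }
}
```

From outside the module, Even::new(4) followed by add_two() yields a value whose get() is 6, and Even::new(3) is rejected; the private field is what keeps safe callers from breaking the invariant.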


mod m {
    union MyPrivateUnion { /* private fields */ }
    extern {
        fn my_private_ffi_function() -> MyPrivateUnion; // Can return garbage (?)
    }
}

In terms of the syntactic layout invariant, my_private_ffi_function can do anything (assuming the function call ABI and signature matches). In terms of the semantic custom invariant, that's not visible in the code -- whoever wrote this module had an invariant in mind, they should document it next to their union definition and then make sure that the FFI function returns a value which satisfies the invariant.

@RalfJung
Member

I finally wrote that blog post about whether and when &mut T must be initialized, and the two kinds of invariants I mentioned above.

@SimonSapin
Contributor

Is there anything left to track here that’s not already covered by #55149, or should we close?

@Nemo157
Member

Nemo157 commented May 13, 2019

E0658 still points here:

error[E0658]: unions with non-Copy fields are unstable (see issue #32836)

@Avi-D-coder
Contributor

This currently plays terribly with atomics, since they do not implement Copy. Does anyone know a workaround?

@SimonSapin
Contributor

When #55149 is implemented, you’ll be able to use ManuallyDrop<AtomicFoo> in a union. Until then, the only work-around is to use Nightly (or not use union and find some alternative).
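
A sketch of what that could look like, assuming the ManuallyDrop-based rule described above is what the compiler accepts. Word and store_then_read_plain are hypothetical names; AtomicU32 stands in for the AtomicFoo placeholder, and repr(C) is added so both fields are known to share offset 0.

```rust
use std::mem::ManuallyDrop;
use std::sync::atomic::{AtomicU32, Ordering};

#[repr(C)]
union Word {
    atomic: ManuallyDrop<AtomicU32>,
    plain: u32,
}

// Stores through the atomic view, then reads the same bytes as a plain u32.
fn store_then_read_plain() -> u32 {
    let w = Word { atomic: ManuallyDrop::new(AtomicU32::new(7)) };
    unsafe {
        // Reading any union field, including to call a method on it,
        // requires unsafe.
        w.atomic.store(42, Ordering::Relaxed);
        // Single-threaded here, so a plain read of the same bytes is fine.
        w.plain
    }
}
```

store_then_read_plain() should return 42. AtomicU32 has no destructor, so nothing is leaked by never calling ManuallyDrop::drop.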

@RalfJung
Member

With that implemented, you shouldn't even need ManuallyDrop; after all rustc knows that Atomic* does not implement Drop.

@Centril
Contributor

Centril commented Oct 21, 2019

Assigning myself to switch the tracking issue to the new one.
