Representation of unions #13

nikomatsakis · 2018-08-30T17:18:20Z

Discussing how unions are laid out.

Is #[repr(C)] meaningful when applie to a union?
When (if ever) do we guarantee that all fields start at offset 0?
When (if ever) do we guarantee that all fields have the same address?
Any key things to note re: FFI interop?

The text was updated successfully, but these errors were encountered:

joshtriplett · 2018-08-30T18:02:59Z

#[repr(C)] is meaningful (it guarantees that the union will have the same layout as an equivalent C union would); we do need to determine if #[repr(Rust)] wants to diverge from that, though. (Or if we want to guarantee that it'll never diverge in the future.)

Also, some relevant text from C11:
6.5.3.6.6:

One special guarantee is made in order to simplify the use of unions: if a union contains several structures that share a common initial sequence (see below), and if the union object currently contains one of these structures, it is permitted to inspect the common initial part of any of them anywhere that a declaration of the completed type of the union is visible. Two structures share a common initial sequence if corresponding members have compatible types (and, for bit-fields, the same widths) for a sequence of one or more initial members.

6.5.8.5:

All pointers to members of the same union object compare equal.

6.7.2.1.16:

A pointer to a union object, suitably converted, points to each of its members (or if a member is a bit- field, then to the unit in which it resides), and vice versa.

hanna-kruppe · 2018-08-30T18:14:19Z

AFAIK the first paragraph you quoted (6.5.3.6.6) is solely about strict aliasing/TBAA, which Rust doesn't do. The other two quotes seem to practically guarantee that all members of the union start at the same offset, and offset 0 at that unless the union foo * -> struct union_member* cast adjusts the pointer. Does that sound right?

joshtriplett · 2018-08-30T18:38:28Z

@rkruppe Yes. (Also, I think the "suitably converted" there exclusively means a type conversion, not a value change.)

nikomatsakis · 2018-09-05T16:17:30Z

we do need to determine if #[repr(Rust)] wants to diverge from that, though. (Or if we want to guarantee that it'll never diverge in the future.)

Seems like we might as well reserve the right for that, although I don't see much motivation. Maybe we should drill in to some of the more specific questions:

Do we guarantee that fields begin at offset 0 for #[repr(Rust)] unions?
Or that they have the same address?

I am sort of of tempted to do so, because I don't know that there is much practical utility to doing otherwise, but I'd be curious to hear of use cases.

joshtriplett · 2018-09-05T16:38:30Z

In the interests of full evaluation of alternatives: the only argument I've heard for doing otherwise would be if we could detect that all the variants in a repr(Rust) union had "holes" in their representations, and then arrange those representations within the union such as to give the overall union a "hole". We could only do so if we 1) allowed flexibility in representation for repr(Rust) unions, and 2) prohibited repr(Rust) unions from containing arbitrary unknown bit patterns not expressed by the field types (which repr(C) unions have to allow).

I don't believe we should do either of those things, but I wanted to mention the arguments for doing so for completeness.

hanna-kruppe · 2018-09-05T16:46:39Z

That's an interesting line of thought! However, what could we actually use these "holes" for? I assume you're referring to padding. I'm not aware of any way to stash discriminants or other data in a contained type's padding. Any write or copy is allowed to omit or clobber padding bytes at random. For example, suppose Result<(u8, u32), u8> wanted to place the discriminant or the u8 in the padding of the tuple, this breaks as soon as someone takes a mutable reference to the tuple and writes to it.

gnzlbg · 2018-09-05T17:46:59Z

Do we guarantee that fields begin at offset 0 for #[repr(Rust)] unions?

Do we want #[repr(Rust)] unions? What do they allow ? E.g. I would be fine with just requiring that all unions must be #[repr(C)] for now, adding a warning for non #[repr(C)] unions, and maybe in the 2018 edition turning that into an error.

We can then, at some point, re-consider adding #[repr(Rust)] unions, with suitable motivation. I am not saying that they would be useless, but that if we are going to specify them exactly the same as #[repr(C)], we don't need both, and not being able to use them in C-FFI would be a downside of repr(Rust) union w.r.t. repr(C) here.

We don't have to allow all kinds of types for all reprs, and doing so makes us waste time in the specification of each repr.

joshtriplett · 2018-09-05T18:53:17Z

@gnzlbg We already have repr(Rust) unions in stable, and people are actively using them. People want to be able to build the space-efficient data structures they allow, and similar.

joshtriplett · 2018-09-05T18:56:41Z

@rkruppe I'm not talking about padding. I'm talking about things like enums and bool not using all the bits in their representation.

If I have a repr(Rust) union of a bool and a three-variant enum, and I then wrap that in an Option, could that fit in one byte?

Again, I don't think that we should do that, but people have suggested doing so.

hanna-kruppe · 2018-09-05T19:15:43Z

Ah, that makes more sense. I also don't think we should do this, though, I'm in favor of the "unions are bags of uninterpreted bits" approach that we seem to be slowly converging on (e.g. with the disposition to merge rust-lang/rfcs#2514).

gnzlbg · 2018-09-06T10:17:36Z

We already have repr(Rust) unions in stable, and people are actively using them. People want to be able to build the space-efficient data structures they allow, and similar.

My question was more about: how is repr(Rust) for unions different from repr(C) ? AFAICT they are identical with the sole exception that repr(C) unions can be passed to C code via FFI while doing so with repr(Rust) unions is UB. So if boths reprs are going to be identical except for this difference, I'd rather prefer to have one single repr that can be passed to C, and that would be repr(C).

We don't really have to break any code for doing this change. We can just say that unions are repr(C) by default, so that writing repr(C) on them becomes redundant, and that if you want a repr(Rust) union you have to actually write and spell repr(Rust), which is something you can't do now.

RalfJung · 2018-09-06T14:44:20Z

If I have a repr(Rust) union of a bool and a three-variant enum, and I then wrap that in an Option, could that fit in one byte?

I think this is part of the next discussion, the one about validity invariants: We first have to decide which bit patterns are valid for a union; once that is fixed, if the decision rules out some bit patterns as invalid, we can talk about layout optimizations.

hanna-kruppe · 2018-09-06T15:20:52Z

That's a good point, but it's also inconvenient since pretty much everything that there is to decide about union layout now depends on the validity invariant =/

RalfJung · 2018-09-06T15:35:21Z

Well, there are still the things @joshtriplett brought up above.

But yeah, while for enums layout came first and now we retrofit the validity invariant, it seems more reasonable to me to proceed the other way around with unions.

I guess one question we could discuss here, that could inform that validity discussion, is: Is there a strong need for layout optimizations on unions? We know for sure that some unions should not get layout optimized (MaybeUninit, for example). Would it be a noticeable loss to never do layout optimizations for unions? If yes, that might be a motivation to have a non-trivial validity invariant as proposed by @petrochenkov.

My personal stanza on this is that if someone really wants their union layout optimized, we should provide attributes to let them do that -- attributes that could also be used on struct, to e.g. make NonNull* types definable by a library. We shouldn't use the field types for this, because unions are often used to "get around" the type system so those field types might not be as reliable as they are in other places.

hanna-kruppe · 2018-09-06T15:39:30Z

Well, there are still the things @joshtriplett brought up above.

Which ones? The C standard quotes just more or less confirm what I think everyone agrees on for repr(C) layout, and as far as I can see all the points on repr(Rust) are driven by layout optimizations that interact with the validity invariant.

My personal stanza on this is that if someone really wants their union layout optimized, we should provide attributes to let them do that -- attributes that could also be used on struct, to e.g. make NonNull* types definable by a library. We shouldn't use the field types for this, because unions are often used to "get around" the type system so those field types might not be as reliable as they are in other places.

+1

petrochenkov · 2018-09-06T15:41:14Z

Is there a strong need for layout optimizations on unions?

My primary use case was ManuallyDrop, but it was converted to a struct recently.

nikomatsakis · 2018-09-10T21:54:52Z

So, this seems like a place where we can "describe the controversy", rather than laying out specific rules. I think I could summarize the comments like so thus far (let me know if I am missing something):

#[repr(C)] unions are guaranteed to be layout and ABI compatible with C unions
- presuming their interior types share the same guarantee etc
the default layout of unions is not yet fixed, because we are trying to figure out if we want room to do layout optimizations
- we'll be able to figure this out after we work through the validity invariants
- but we currently know of no strong use cases for unions to have layout optimizations and we would like people to send us examples

Personally, I lean towards the view that we should not assume anything about the bits of a union until we see an actual access (which then gives us the type we can use). This seems to imply that we can't do layout optimizations, because we can't be sure that the union is even initialized, and all layout optimizations rely on a notion of initialization. (But perhaps this is overly strict?)

joshtriplett · 2018-09-10T22:00:01Z

@nikomatsakis That would be my preference. And even then, the compiler can't necessarily keep assuming that type over time.

alercah · 2018-09-11T00:22:17Z

I think the idea here is that you may be able to do layout optimizations provided that you can find some guarantee that applies to all members? I think the "bag of bits" thing is a bit of a red herring here, although it does depend on the exact decisions we make about uninitialized memory.

For instance, given union<T, U> TwoPointer(&T, &U);, it seems to me that it ought to be possible to null-pointer optimize Option<TwoPointer<T, U>>, because the set of valid representations of a union ought to be the union of the set of valid representations of its fields.

Now that I think about it more, I don't think there's a sensible way for the compiler to offset fields within the union in order to achieve this result more generally. The niches into which we perform enum optimizations are quite different from padding holes, which we have discussed already cannot be assumed to remain constant (and requiring that may force undersized assignments, which may be a pessimization). Without going to too much detail right now, I'm pretty sure we cannot make use of this unless, as Niko floated for enums, we have some way to insist that you cannot bind references to individual fields.

Furthermore, I think we're converging on saying that a single-field struct always lays its field at offset 0. If that is the case, then the same should probably apply to unions, so we would say that all fields are laid out at offset 0.

This leaves two questions unanswered, however: a) can a union be more strictly aligned than any of its fields and b) can a union have unnecessary trailing padding? As far as I can tell, C answers "yes" to both, though I do not know why. I think this implies that repr(Rust) should also do so, as providing guarantees more strict than the C ABI on the platform seems odd. But I'm also tempted to say that repr(Rust) may not match the C ABI exactly, in that we may choose to omit the padding/stricter alignment.

nikomatsakis · 2018-09-26T13:52:14Z

@alercah

Furthermore, I think we're converging on saying that a single-field struct always lays its field at offset 0. If that is the case, then the same should probably apply to unions, so we would say that all fields are laid out at offset 0.

I don't recall us saying anything about this in the struct chapter. Perhaps I just missed it. In any case, it seems like something we might want to add into #31.

RalfJung · 2018-10-05T15:06:11Z

For instance, given union<T, U> TwoPointer(&T, &U);, it seems to me that it ought to be possible to null-pointer optimize Option<TwoPointer<T, U>>, because the set of valid representations of a union ought to be the union of the set of valid representations of its fields.

The latter part of your statement is not correct. We should allow things like this. The union is not valid for any of its variants after the assignment to f.y.1, but that code should remain legal.

So, if anything we'd have to do that union bytewise. I'd rather if we didn't. Unions are bags of bits with names to access some offsets, end of story. That's already complicated enough to use that I wouldn't want to further complicate the story by adding layout optimizations to the mix.

If people show compelling use-cases for layout-optimized unions, I'd suggest we work towards stabilizing something like rust-lang/rust#54032. That covers your union-of-two-references.

RalfJung · 2018-10-11T20:15:22Z

I heard people want me to do the writeup. Assigning to me then.

I didn't hear of any deadline though... ;)

gnzlbg · 2018-10-12T12:44:39Z

What is the layout of union variants when the repr of the different variants differ? E.g.:

union U {
    a: __m128, // repr(simd)
    b: (),
}

Currently, U.a appears to have array layout instead of repr(simd) layout.

hanna-kruppe · 2018-10-12T13:45:44Z

Memory layout is no different. Calling convention details are different (and that factors into the relatively superficial difference in the IR we produce observed over there in that PR), but I see no reason to specify those for repr(rust) unions, as they are irrelevant outside of FFI which one should use repr(C) for anyway.

RalfJung · 2018-10-12T14:33:38Z

So the distinction between[...] and <...> in LLVM should just affect calling conventions, but it does affect more, and that's the problem in that PR?

hanna-kruppe · 2018-10-12T17:32:20Z

No, arrays vs vectors is a quite important distinction for the IR, but none of those differences except ABI lowering affect the sort of visible behavior we are documenting here.

RalfJung · 2018-10-13T07:27:52Z

To summarize the discussion that happened here, the consensus seems to be that repr(C) unions have all their fields at offset 0, and for repr(Rust) that's likely the most sensible option but we'll have to await the discussion about validity invariants for unions to rule out use-cases like #13 (comment). Any objections?

nikomatsakis · 2018-10-23T17:07:14Z

@RalfJung sounds great to me! Do you think you can get a write-up done by this Thursday? Would be good to have somethng by the meeting. =)

RalfJung · 2018-10-24T16:28:34Z

Done: #39

Feels rather short, but what else is there to say?

gnzlbg · 2018-11-20T21:22:15Z

It would be useful to use unions like MaybeUninit<T> in C FFI, but for that MaybeUninit<T> would need to have the same repr as T. Currently we don't have anything like repr(transparent) for unions, and for the case of MaybeUninit<T> which is an union with two variants, it's unclear to me whether something like repr(transparent) would work.

RalfJung · 2019-06-29T10:29:36Z

Turns out regex relies on repr(Rust) unions having their field at offset 0:

https://github.com/rust-lang/regex/blob/172898a4fda4fd6a2d1be9fc7b8a0ea971c84ca6/src/vector/ssse3.rs#L80-L83

I bet they are not the only ones...

nikomatsakis added the A-layout Topic: Related to data structure layout (`#[repr]`) label Aug 30, 2018

avadacatavra added A-layout Topic: Related to data structure layout (`#[repr]`) active discussion topic and removed A-layout Topic: Related to data structure layout (`#[repr]`) labels Aug 31, 2018

hanna-kruppe mentioned this issue Sep 5, 2018

Representation of enums #10

Closed

RalfJung mentioned this issue Sep 6, 2018

Tracking issue for RFC 1892, "Deprecate uninitialized in favor of a new MaybeUninit type" rust-lang/rust#53491

Closed

3 tasks

nikomatsakis added the S-writeup-needed Status: Ready for a writeup and no one is assigned label Oct 11, 2018

RalfJung self-assigned this Oct 11, 2018

nikomatsakis assigned RalfJung and unassigned RalfJung Oct 23, 2018

RalfJung mentioned this issue Oct 24, 2018

basic chapter for representation of unions #39

Merged

avadacatavra closed this as completed in #39 Oct 25, 2018

gnzlbg mentioned this issue Feb 4, 2019

Validity of unions #73

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Representation of unions #13

Representation of unions #13

nikomatsakis commented Aug 30, 2018

joshtriplett commented Aug 30, 2018

hanna-kruppe commented Aug 30, 2018

joshtriplett commented Aug 30, 2018 •

edited

Loading

nikomatsakis commented Sep 5, 2018

joshtriplett commented Sep 5, 2018

hanna-kruppe commented Sep 5, 2018 •

edited

Loading

gnzlbg commented Sep 5, 2018 •

edited

Loading

joshtriplett commented Sep 5, 2018 •

edited

Loading

joshtriplett commented Sep 5, 2018

hanna-kruppe commented Sep 5, 2018

gnzlbg commented Sep 6, 2018 •

edited

Loading

RalfJung commented Sep 6, 2018

hanna-kruppe commented Sep 6, 2018

RalfJung commented Sep 6, 2018

hanna-kruppe commented Sep 6, 2018

petrochenkov commented Sep 6, 2018

nikomatsakis commented Sep 10, 2018

joshtriplett commented Sep 10, 2018

alercah commented Sep 11, 2018

nikomatsakis commented Sep 26, 2018

RalfJung commented Oct 5, 2018 •

edited

Loading

RalfJung commented Oct 11, 2018

gnzlbg commented Oct 12, 2018 •

edited

Loading

hanna-kruppe commented Oct 12, 2018 •

edited

Loading

RalfJung commented Oct 12, 2018

hanna-kruppe commented Oct 12, 2018

RalfJung commented Oct 13, 2018

nikomatsakis commented Oct 23, 2018

RalfJung commented Oct 24, 2018

gnzlbg commented Nov 20, 2018 •

edited

Loading

RalfJung commented Jun 29, 2019

Representation of unions #13

Representation of unions #13

Comments

nikomatsakis commented Aug 30, 2018

joshtriplett commented Aug 30, 2018

hanna-kruppe commented Aug 30, 2018

joshtriplett commented Aug 30, 2018 • edited Loading

nikomatsakis commented Sep 5, 2018

joshtriplett commented Sep 5, 2018

hanna-kruppe commented Sep 5, 2018 • edited Loading

gnzlbg commented Sep 5, 2018 • edited Loading

joshtriplett commented Sep 5, 2018 • edited Loading

joshtriplett commented Sep 5, 2018

hanna-kruppe commented Sep 5, 2018

gnzlbg commented Sep 6, 2018 • edited Loading

RalfJung commented Sep 6, 2018

hanna-kruppe commented Sep 6, 2018

RalfJung commented Sep 6, 2018

hanna-kruppe commented Sep 6, 2018

petrochenkov commented Sep 6, 2018

nikomatsakis commented Sep 10, 2018

joshtriplett commented Sep 10, 2018

alercah commented Sep 11, 2018

nikomatsakis commented Sep 26, 2018

RalfJung commented Oct 5, 2018 • edited Loading

RalfJung commented Oct 11, 2018

gnzlbg commented Oct 12, 2018 • edited Loading

hanna-kruppe commented Oct 12, 2018 • edited Loading

RalfJung commented Oct 12, 2018

hanna-kruppe commented Oct 12, 2018

RalfJung commented Oct 13, 2018

nikomatsakis commented Oct 23, 2018

RalfJung commented Oct 24, 2018

gnzlbg commented Nov 20, 2018 • edited Loading

RalfJung commented Jun 29, 2019

joshtriplett commented Aug 30, 2018 •

edited

Loading

hanna-kruppe commented Sep 5, 2018 •

edited

Loading

gnzlbg commented Sep 5, 2018 •

edited

Loading

joshtriplett commented Sep 5, 2018 •

edited

Loading

gnzlbg commented Sep 6, 2018 •

edited

Loading

RalfJung commented Oct 5, 2018 •

edited

Loading

gnzlbg commented Oct 12, 2018 •

edited

Loading

hanna-kruppe commented Oct 12, 2018 •

edited

Loading

gnzlbg commented Nov 20, 2018 •

edited

Loading