Validity of unions #73

Closed
RalfJung opened this issue Jan 10, 2019 · 77 comments
Labels
A-unions Topic: Related to unions A-validity Topic: Related to validity invariants S-pending-design Status: Resolving this issue requires addressing some open design questions

Comments

@RalfJung
Member

Discussing the validity invariant of unions.

One possible choice here is "none, any bit pattern is allowed no matter which types the fields have, and including uninitialized bits".

We could also decide that e.g. a

union Foo { a: bool, b: (bool, u8) }

must start with the first byte being either the bit-pattern of false or the bit-pattern of true, because all fields agree on that invariant.

Notice that we cannot require the union to be valid for some field: for a union like

union Mix {
  f1: (bool, u8),
  f2: (u8, bool),
}

we want to allow a bit pattern like 0x3 0x3, which can occur from code like

let mut m = Mix { f1: (false, 3) };
m.f2.0 = 3;

There is no demonstrated benefit from disallowing such code, and this kind of code seems perfectly reasonable around unions.

Given that, any validity invariant that wants to restrict the set of allowed bit patterns will be rather complicated. However, such an invariant would enable us to e.g. layout-optimize Option<Foo>, whereas the "anything goes"-invariant would prohibit any kind of layout optimization around unions.
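
As a minimal sketch of the scenario above, the following uses `#[repr(C)]` structs in place of the tuples so that the byte order is fixed (an assumption made for illustration, since tuple layout is not specified). The struct names and the `mixed_bytes` helper are hypothetical. It writes through both fields using only safe code, then inspects the raw bytes without reading either field at its own type.

```rust
// Hypothetical stand-ins for `(bool, u8)` and `(u8, bool)` with a fixed layout.
#[repr(C)]
#[derive(Clone, Copy)]
#[allow(dead_code)]
struct BoolThenByte {
    flag: bool,
    byte: u8,
}

#[repr(C)]
#[derive(Clone, Copy)]
#[allow(dead_code)]
struct ByteThenBool {
    byte: u8,
    flag: bool,
}

#[repr(C)]
union Mix {
    f1: BoolThenByte,
    f2: ByteThenBool,
}

/// Builds the "0x3 0x3" pattern and returns the union's raw bytes.
fn mixed_bytes() -> [u8; 2] {
    // Writes to `Copy` union fields are safe; only reads need `unsafe`.
    let mut m = Mix { f1: BoolThenByte { flag: false, byte: 3 } };
    m.f2.byte = 3;
    // SAFETY: `Mix` is two bytes with no padding, and both bytes are initialized.
    // Neither field is read at its own type, so no invalid `bool` is produced.
    unsafe { std::mem::transmute::<Mix, [u8; 2]>(m) }
}
```

After the second write, the byte that `f1` would read as a `bool` holds 3, and so does the byte that `f2` would read as a `bool`, so neither field is valid even though no `unsafe` write was involved.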

@RalfJung RalfJung added active discussion topic A-validity Topic: Related to validity invariants labels Jan 10, 2019
@RalfJung
Member Author

My personal preference is to allow any bit pattern, mostly to keep things simple. Unions are already complex enough, and they only occur in unsafe code, we should make this as simple to use for the programmer as we can.

If we ever desperately need to layout-optimize Option<Foo>, I propose we add attributes that let us control this -- something like a stable version of rustc_layout_scalar_valid_range_start.

@nikomatsakis
Contributor

nikomatsakis commented Jan 31, 2019

My personal preference is to allow any bit pattern, mostly to keep things simple. Unions are already complex enough, and they only occur in unsafe code, we should make this as simple to use for the programmer as we can.

I feel like @joshtriplett and @cramertj have both, in the past, expressed strong opinions about this.

I think I agree regarding allowing any bit pattern. That said, I can definitely imagine wanting to be able to create a union that is "as optimized" as the equivalent enum, but that it has no discriminant (because you know you can figure that out from other means).

@nikomatsakis
Contributor

nikomatsakis commented Jan 31, 2019

@RalfJung

We could also decide ... must start with the first byte being either the bit-pattern of false or the bit-pattern of true, because all fields agree on that invariant.

It seems like this would be true only if the Rust compiler decided to lay the fields out at offset zero, right? Personally, I sort of think we should just guarantee that the Rust compiler will do so. Particularly if we decide that unions are an opaque "bag of bits" from the perspective of the compiler, what is the motivation for the compiler to add extra padding into that bag?

(The same applies to your second example.)

@RalfJung
Member Author

It seems like this would be true only if the Rust compiler decided to lay the fields out at offset zero, right?

Yes, I've been assuming that to be the base.

I feel like @joshtriplett and @cramertj have both, in the past, expressed strong opinions about this.

Yeah I remember that as well. Also @petrochenkov expressed the opposite opinion, namely that unions should have a non-trivial invariant.

I can definitely imagine wanting to be able to create a union that is "as optimized" as the equivalent enum, but that it has no discriminant (because you know you can figure that out from other means).

I can imagine wanting to do many things. :) But I feel such needs are better served by an opt-in attribute, than enabled per default.

@cramertj
Member

cramertj commented Feb 1, 2019

I think I agree regarding allowing any bit pattern. That said, I can definitely imagine wanting to be able to create a union that is "as optimized" as the equivalent enum, but that it has no discriminant (because you know you can figure that out from other means).

Yep, I would prefer this for non-repr(C) unions. IMO the common case (the FFI stuff) should always use repr(C), so this should be a non-issue for most folks. We could even lint on non-repr(C) unions just as a "be sure you know what you're doing, and that this is uncommon" hint.

@RalfJung
Member Author

RalfJung commented Feb 1, 2019

Yep, I would prefer this for non-repr(C) unions.

Could you spell out what you mean by "this"?

@gnzlbg
Contributor

gnzlbg commented Feb 4, 2019

It seems like this would be true only if the Rust compiler decided to lay the fields out at offset zero, right? Personally, I sort of think we should just guarantee that the Rust compiler will do so.

I also think that we should guarantee this, but @joshtriplett mentioned some reasons about why we might not want to do that in the discussion about the layout of unions (#13 (comment)). It's unclear to me whether that interchange achieved some consensus, but maybe we should open a different issue to discuss whether we might want to guarantee this particular thing? That would need amending the layout of unions in the repo.

EDIT: for repr(C) unions, the fields always start at offset 0 AFAIK.
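
For the `repr(C)` case, the offset-zero property can be observed directly. The following is a small sketch; the `field_offsets` helper is hypothetical, and the check says nothing about what `repr(Rust)` unions may do.

```rust
#[repr(C)]
union Foo {
    a: bool,
    b: (bool, u8),
}

/// Returns the byte offsets of `a` and `b` from the start of the union.
fn field_offsets() -> (usize, usize) {
    let u = Foo { a: true };
    let base = std::ptr::addr_of!(u) as usize;
    // Only addresses are taken here; no field is read.
    // The `unsafe` block is kept in case the toolchain in use requires it.
    #[allow(unused_unsafe)]
    let (a, b) = unsafe {
        (
            std::ptr::addr_of!(u.a) as usize,
            std::ptr::addr_of!(u.b) as usize,
        )
    };
    (a - base, b - base)
}
```

Both offsets come out as zero for a `repr(C)` union, which is what makes the "first byte is a `bool` in every field" argument from the opening post well-defined for this type.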

@hanna-kruppe

I also think that we should guarantee this, but @joshtriplett mentioned some reasons about why we might not want to do that in the discussion about the layout of unions (#13 (comment)). It's unclear to me whether that interchange achieved some consensus, but maybe we should open a different issue to discuss whether we might want to guarantee this particular thing?

That question can't really be separated from what we're discussing here: these kinds of layout optimizations are only possible if unions have a non-trivial validity invariant relating to the validity invariants of the fields. If we achieve consensus here that unions are just bags of bits, then it should be uncontroversial that there's no reason to place union fields at nonzero offsets. Conversely, the desire for layout optimizations of unions is a reason to want some non-trivial validity invariant for unions. I believe that's why nothing was settled during the previous discussion of union layout.

@cramertj
Member

cramertj commented Feb 5, 2019

@RalfJung

Yep, I would prefer this for non-repr(C) unions.
Could you spell out what you mean by "this"?

Yeah-- I'd like the opportunity to optimize based on known-invalid bitpatterns for repr(Rust) unions.

@RalfJung
Member Author

RalfJung commented Feb 5, 2019

@cramertj So what do you think about code like this, which violates the principle that at any time, some variant of the union is valid?

union Mix {
  f1: (bool, u8),
  f2: (u8, bool),
}

let mut m = Mix { f1: (false, 3) };
m.f2.0 = 3;

From what I recall, this is something we explicitly want to support. It is also rather hard to argue that this is UB because it never actually performs an operation that "sees" a bad value.

Given that unions are basically sugar for transmutes, I am really worried about automatically assuming that any of their bit patterns are invalid.

@cramertj
Member

cramertj commented Feb 5, 2019

oh wow what a mess, good point! I'd still argue that the niches left over from overlapping all variants (e.g. union Mix { f1: NonZeroU32, f2: NonZeroU8 }) could be supported, but that is more complicated than what I originally imagined.
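
To see what is at stake for a union like this one, the sizes can be compared. This is a sketch of what current rustc is expected to do, not a guarantee; the `Overlap` name and the `sizes` helper are hypothetical.

```rust
use std::mem::size_of;
use std::num::{NonZeroU32, NonZeroU8};

#[allow(dead_code)]
union Overlap {
    f1: NonZeroU32,
    f2: NonZeroU8,
}

/// Returns the sizes of `Overlap`, `Option<Overlap>` and `Option<NonZeroU32>`.
fn sizes() -> (usize, usize, usize) {
    (
        size_of::<Overlap>(),
        size_of::<Option<Overlap>>(),
        size_of::<Option<NonZeroU32>>(),
    )
}
```

`Option<NonZeroU32>` is documented to have the same size as `NonZeroU32`, because zero serves as the niche. The union currently exposes no niche, so `Option<Overlap>` needs a separate discriminant and ends up larger than `Overlap`.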

@RalfJung
Member Author

I thought I had written down what the validity invariant could be to justify layout optimizations such as what @cramertj is asking for, but it seems not here... it would be something like:

Bit/byte i of the union is allowed to have value v iff there is a variant of the union such that bit/byte i of the variant is allowed to have value v. We assume all variants to be "filled up" to the same size with padding, which may have any value.

Whether it should be bits or bytes is unclear, as are a few other things. For example, this kind of implicitly assumes that validity talks only about the bits, but it might also talk about contents of the memory and then things become even more messy. Also this is very hard to check for in an implementation of our dynamic semantics.

One thing that everyone seems to agree on though (including the above definition) is that if the union has a field of size 0 (such as is the case for MaybeUninit), then it may contain any value and thus there can be no layout optimizations.
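
The byte-wise variant of this definition can be sketched as a small model. Everything here is hypothetical (the `ByteSet` type, its constructors, and `union_byte_sets`), it assumes all fields start at offset 0, and it only tracks per-byte value sets, so it cannot express constraints that span several bytes.

```rust
/// Allowed states of a single byte: which initialized values are permitted,
/// and whether the byte may be uninitialized.
#[derive(Clone, Debug, PartialEq)]
struct ByteSet {
    values: [bool; 256],
    uninit: bool,
}

impl ByteSet {
    /// Padding, or a byte not covered by a field: anything goes.
    fn anything() -> Self {
        ByteSet { values: [true; 256], uninit: true }
    }

    /// A `u8`-like byte: any initialized value.
    fn any_init() -> Self {
        ByteSet { values: [true; 256], uninit: false }
    }

    /// A `bool`-like byte: only 0x00 or 0x01.
    fn bool_byte() -> Self {
        let mut values = [false; 256];
        values[0] = true;
        values[1] = true;
        ByteSet { values, uninit: false }
    }

    /// A state is allowed if either side allows it.
    fn join(&self, other: &ByteSet) -> ByteSet {
        ByteSet {
            values: std::array::from_fn(|v| self.values[v] || other.values[v]),
            uninit: self.uninit || other.uninit,
        }
    }
}

/// Per-byte constraint of a union whose fields all start at offset 0.
/// Shorter fields are "filled up" with unconstrained padding.
fn union_byte_sets(fields: &[Vec<ByteSet>]) -> Vec<ByteSet> {
    let size = fields.iter().map(Vec::len).max().unwrap_or(0);
    (0..size)
        .map(|i| {
            fields
                .iter()
                .map(|f| f.get(i).cloned().unwrap_or_else(ByteSet::anything))
                .reduce(|a, b| a.join(&b))
                .unwrap_or_else(ByteSet::anything)
        })
        .collect()
}
```

Feeding in `Foo` from the opening post (`[bool]` and `[bool, u8]`) keeps the first byte restricted to 0x00 or 0x01. `Mix` collapses to "any initialized value" in both bytes, and adding an empty (zero-sized) field makes every byte unconstrained, which matches the zero-sized-field observation above.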

@HadrienG2

HadrienG2 commented Jun 20, 2019

One thing that everyone seems to agree on though (including the above definition) is that if the union has a field of size 0 (such as is the case for MaybeUninit), then it may contain any value and thus there can be no layout optimizations.

This property certainly is necessary for the current implementation of MaybeUninit to be sound:

// 100% safe code!
use std::mem::MaybeUninit;
use std::num::NonZeroU8;

let opt = Some(MaybeUninit::<NonZeroU8>::zeroed());
assert!(opt.is_some(),
        "Can be false if union with () is not a layout optimization barrier");
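
The same point can be made by comparing sizes. This is a sketch with hypothetical helper names; the size of `Option<MaybeUninit<NonZeroU8>>` is what current rustc is expected to produce rather than a documented guarantee.

```rust
use std::mem::{size_of, MaybeUninit};
use std::num::NonZeroU8;

/// Wraps an all-zero `MaybeUninit<NonZeroU8>` in `Some` and checks it is still `Some`.
fn zeroed_is_some() -> bool {
    let opt = Some(MaybeUninit::<NonZeroU8>::zeroed());
    opt.is_some()
}

/// Returns the sizes of `Option<NonZeroU8>` and `Option<MaybeUninit<NonZeroU8>>`.
fn option_sizes() -> (usize, usize) {
    (
        size_of::<Option<NonZeroU8>>(),
        size_of::<Option<MaybeUninit<NonZeroU8>>>(),
    )
}
```

`Option<NonZeroU8>` fits in one byte because zero encodes `None`. Once the value is wrapped in `MaybeUninit`, every byte pattern has to remain usable, so the `Option` needs extra space for its discriminant.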

@gnzlbg
Contributor

gnzlbg commented Jun 20, 2019

IIUC, something that follows/is required from/by @RalfJung's definition, is that for any layout-optimization to be possible:

  • all fields of the union must have the same niche at the same offset, and

  • at least one field in the union must be valid at all times (otherwise layout optimizations are unsound, like @HadrienG2 points out here)

That is, the union Mix { f1: (bool, u8), f2: (u8, bool), } described above cannot ever benefit from any layout optimizations. Only unions like the one @cramertj mentions here (e.g., union U { f1: NonZeroU32, f2: NonZeroU8, f3: NonNull<T>, f4: &'static mut T }, etc.) can, as long as we require a union field to be valid at all times.

if the union has a field of size 0 (such as is the case for MaybeUninit), then it may contain any value and thus there can be no layout optimizations.

This appears to be forward-compatible with all other guarantees I can imagine. If all agree, maybe we can put this part already in wording, so that we can just focus on considering the validity of unions without zero-sized fields afterwards.

@gnzlbg
Contributor

gnzlbg commented Jun 20, 2019

If we wanted to enable the layout optimizations mentioned by @cramertj , we could do that by extending this:

Bit/byte i of the union is allowed to have value v iff there is a variant of the union such that bit/byte i of the variant is allowed to have value v. We assume all variants to be "filled up" to the same size with padding, which may have any value.

with:

  • Bit/byte i of the union is not allowed to have value v iff the bit i of every variant of the union is not allowed to have value v.

  • A union is valid if at least one of its fields is valid.

This second point would break #73 (comment) , but without it, layout optimizations do not appear possible to me.

@RalfJung
Member Author

Bit/byte i of the union is not allowed to have value v iff the bit i of every variant of the union is not allowed to have value v.

That is equivalent to what I said. P <-> Q and !P <-> !Q are the same statement.

A union is valid if at least one of its fields is valid.

Now I am confused. If we require this, the complicated definition I proposed is not necessary. But also this rules out the use case for union Mix { f1: (bool, u8), f2: (u8, bool), } that I described in the OP.

@gnzlbg
Contributor

gnzlbg commented Jun 22, 2019

@RalfJung

But also this rules out the use case for union Mix { f1: (bool, u8), f2: (u8, bool), } that I described in the OP.

The complicated definition above does not allow layout optimizations on Mix either, right? (EDIT: all union bits overlap with a u8).

@RalfJung
Member Author

But you said at least one field must be valid? Or is that not what you meant?

Or did you just intend to state a consequence of that definition? Sure, just by virtue of Union { field: val } being safe we must ensure that a union is valid at least if one of its field is valid.

@gnzlbg
Contributor

gnzlbg commented Jun 22, 2019

@RalfJung My point is that if we require a union field to always be valid, we can optimize the layout of many unions, including Mix. If we require your "complex" definition, we can optimize the layout of fewer unions, e.g., we can't optimize the layout of Mix, because under your definition, all Mix bit-patterns are valid (all bits overlap with a u8). If we say that all bit-patterns are valid for all unions, then we can't optimize the layout of any union.

Maybe there is an even more complex definition than yours that would allow optimizing Mix layout, without requiring a field to be valid at all times, but AFAICT such a definition would end up being really close to "a field must be valid at all times".

I've the feeling that automatic layout optimizations for unions are incompatible with enabling any use case that requires that no fields are valid (like the one covered in the OP).

@danielhenrymantilla
Contributor

danielhenrymantilla commented Jun 22, 2019

we want to allow a bit pattern like 0x3 0x3, which can occur from code like

let mut m = Mix { f1: (false, 3) };
m.f2.0 = 3;

There is no demonstrated benefit from disallowing such code, and this kind of code seems perfectly reasonable around unions.

It seems like this is a question that is not that obvious. For instance, having

union Mix2 {
    f1: (NonZeroU8, u8),
    f2: (u8, NonZeroU8),
}

it would be nice if we could deduce that 0_u16 is not a valid bit pattern for Mix2, since it would enable Option<Mix2> to have an optimized layout.

Back to our "Schrödinger union pattern",

let mut m = Mix { f1: (false, 3) };
m.f2.0 = 3;

the question is whether we want the above example to require that the union be defined with a zero-sized field for the code not to be UB, so that more layout optimizations are possible, or whether such a benefit does not outweigh the dangers of a potentially easy-to-miss UB scenario.

@RalfJung
Member Author

RalfJung commented Jun 23, 2019

it would be nice if we could deduce that 0_u16 is not a valid bit pattern for Mix2, since it would enable Option<Mix2> to have an optimized layout.

Please read the first post of this thread. This is the same as my Mix example up there. We explicitly consider 0_u16 a valid bit pattern for Mix2.

Maybe there is an even more complex definition than yours that would allow optimizing Mix layout,

No, that's impossible. If we want to accept the code in my OP, then that type cannot have a niche.

@danielhenrymantilla
Contributor

Addressing that very first post was the whole point of my post: I was just pointing out that accepting 3, 3 as a valid pattern has the advantage that the provided example code is not UB (without requiring a zero-sized variant to avoid it), at the cost of a potential layout optimisation, and I wondered whether everybody agreed with that choice.
Or maybe you mean that the ship has already sailed?

@RalfJung
Member Author

RalfJung commented Jun 25, 2019

Oh I see. So you are basically saying what I said in the OP but framing it differently. Fair :)

I don't think the ship has sailed in the sense that we have an explicit RFC-based consensus, but I do feel that many people would consider this one footgun too much for already tricky union code. So far nobody objected when I argued we should support that code.

Personally I think any layout optimizations around union are one footgun too much. ;) I think we should instead give the user the option to explicitly declare a niche on their type, if they want to get layout optimizations. Basically, I am arguing for explicit being better than implicit here.

@danielhenrymantilla
Contributor

I agree. Having to opt-out (by adding a zero-sized field) of a layout optimization leading to "easy" UB doesn't seem like a great idea.
Forbidding the possibility to opt-in the optimization seems, however, overly restrictive.

Could there be a #[repr(Rust, at_least_one_field_valid)] attribute to enable this?
(I imagine an attribute within the repr family, since I assume niches to be part of the repr; it will depend on what the other thread has to say about it).

@alercah

alercah commented Oct 7, 2022

Coming back to this with fresh eyes, of my two "for completeness" options above, the first involves extra complexity and I can't see any benefits to it. The second does technically provide some benefits (niches) but is much more complex.

I think I see three reasonable paths forward:

  1. Make #[repr(Rust)] bag-o-bytes by default, with the possibility of someday having a #[safe_field_access] attribute that announces the safety invariants required for safe field access, with no effect on the validity invariants.
  2. Same, except that we make the attribute affect validity invariants rather than safety.
  3. Make #[repr(Raw)] bag-o-bytes, and reserve #[repr(Rust)] for future possibilities, including for safe field access.

To pick between these, my opinions are:

  1. If we're going to have the complexity required to reason about the invariants required for safe field access anyway, we might as well get niches out of it. So if safe field access ever happens, we might as well make the invariants required for it be validity invariants.
  2. If we're going to make safe field access affect the validity invariant, it shouldn't use the same repr as the bag-o-bytes unions.

So therefore I prefer the third option for now. But it's a relatively weak opinion.

@digama0

digama0 commented Oct 7, 2022

I don't think what you have described is necessarily a different repr. Assuming we do option 2 because we want to get niche optimized unions, we can still have #[repr(Rust)] be bag-o-bytes. I don't really have a strong opinion on whether #[safe_field_access] should be a "repr" (since it doesn't actually change the size or offset of the fields), but if you think it should be classed as such then #[repr(safe_field_access)] is an option as well.

That is, my preference is for #[repr(Rust)] being bag-o-bytes, with the possibility of additional attributes or repr qualifiers to implement something like unions with niches (which requires a whole bunch of design work anyway). Given that such an attribute is not currently under discussion, that plan would be consistent with both your options 1 and 2, but would block option 3 which seems to be aiming for unadorned unions being safe in the future (?).

@JakobDegen JakobDegen added S-pending-design Status: Resolving this issue requires addressing some open design questions and removed C-active-discussion-topic labels Jun 6, 2023
@deltragon

deltragon commented Jul 5, 2023

This discussion seems to go that direction as well, but I noticed in rust-lang/rust#113344 (comment) that this would be a usecase for having unions not be unconditionally marked as "may be uninit" (either by some "no bytes are padding" check, or with a different repr).

@RalfJung
Member Author

RalfJung commented Jul 5, 2023

having unions not be unconditionally marked as "may be uninit"

More like, sometimes having them marked as "never uninit". "may be uninit" is the natural default state of bytes in memory, work has to be done to get anything else. ;)

I wouldn't say that is where the discussion seems to go? @CAD97 made a proposal above but that makes heavy use of predict which is the "nuclear option" of specifying a semantics -- that's angelic non-determinism, which is tempting and powerful and causes all sorts of problems.

More realistically, we'd end up with a situation where the type of a union is described by a list of constraints on each byte, and we leave unspecified how exactly rustc computes that list of constraints.

enum Type {
  ...
  Union {
    fields: Fields,
    bytes: List<UnionByte>,
  }
}

enum UnionByte {
  /// This byte may be anything, even uninit.
  Any,
  /// This byte must be initialized.
  Init,
  /// This byte is padding, and not preserved on typed copies.
  Padding,
}
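
As a rough illustration of how such a per-byte constraint list might be used, the sketch below checks a byte sequence against it and models a typed copy. `AbstractByte`, `is_valid` and `typed_copy` are hypothetical names, and provenance is ignored entirely.

```rust
/// A byte in the abstract machine: either uninitialized or holding a value.
/// (Provenance is ignored in this sketch.)
#[derive(Clone, Copy, Debug, PartialEq)]
enum AbstractByte {
    Uninit,
    Init(u8),
}

#[derive(Clone, Copy, Debug, PartialEq)]
enum UnionByte {
    /// This byte may be anything, even uninit.
    Any,
    /// This byte must be initialized.
    Init,
    /// This byte is padding, and not preserved on typed copies.
    Padding,
}

/// Checks whether `bytes` satisfies the per-byte constraints.
fn is_valid(constraints: &[UnionByte], bytes: &[AbstractByte]) -> bool {
    constraints.len() == bytes.len()
        && constraints.iter().zip(bytes).all(|(c, b)| match c {
            UnionByte::Init => matches!(b, AbstractByte::Init(_)),
            UnionByte::Any | UnionByte::Padding => true,
        })
}

/// Models a typed copy: returns `None` for invalid input,
/// otherwise the bytes with padding reset to uninit.
fn typed_copy(constraints: &[UnionByte], bytes: &[AbstractByte]) -> Option<Vec<AbstractByte>> {
    if !is_valid(constraints, bytes) {
        return None;
    }
    Some(
        constraints
            .iter()
            .zip(bytes)
            .map(|(c, b)| match c {
                UnionByte::Padding => AbstractByte::Uninit,
                UnionByte::Any | UnionByte::Init => *b,
            })
            .collect(),
    )
}
```

The difference between `Any` and `Padding` only shows up in `typed_copy`: both accept every input byte, but only `Any` carries the byte over to the destination.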

@scottmcm
Member

scottmcm commented Jul 5, 2023

Naïvely, I've always liked the "validity for unions is that at least one field must be valid", since it has the straight-forward _allow_uninit: () as an opt-in for "all bytes uninit is ok", whereas if that's the default, there's no obviously-best way to request the other.

But I guess if union bytes need to be able to carry provenance too, that's not enough to solve it.

@RalfJung
Member Author

RalfJung commented Jul 5, 2023

Naïvely, I've always liked the "validity for unions is that at least one field must be valid", since it has the straight-forward _allow_uninit: () as an opt-in for "all bytes uninit is ok", whereas if that's the default, there's no obviously-best way to request the other.

That is not an option, since it can be violated in safe code. An example is literally in the OP of this thread. :)

@chorman0773
Contributor

Is there even a meaningful rule aside from "Yes", tbh?

@digama0

digama0 commented Jul 6, 2023

More realistically, we'd end up with a situation where the type of a union is described by a list of constraints on each byte, and we leave unspecified how exactly rustc computes that list of constraints.

Are you tracking (in minirust repo or elsewhere) the constraints that must be satisfied by a compiler-produced layout scheme? Ideally in executable style similar to minirust itself. Stuff like "field offsets must be nonoverlapping and contained in the type layout" would go there, in addition to "a field value type must ensure that all its bytes are consistent with the UnionBytes overlapping it" for this new scheme.

@RalfJung
Member Author

RalfJung commented Jul 6, 2023

Is there even a meaningful rule aside from "Yes", tbh?

"Yes" to what?

Are you tracking (in minirust repo or elsewhere) the constraints that must be satisfied by a compiler-produced layout scheme? Ideally in executable style similar to minirust itself.

I don't think there is a way to make that executable. "Can the n-th byte of representations of this type be uninit" is not a question that can be answered operationally (except for exhaustive enumeration). Specifying the set of legal choices for layout will be part of the Rust-to-MiniRust lowering (similar to how the existing Rust-to-MiniRust translator computes the "chunks").

Field offsets can of course overlap for unions, the only constraint is that the field fits in the union size.

@digama0

digama0 commented Jul 6, 2023

I don't think there is a way to make that executable. "Can the n-th byte of representations of this type be uninit" is not a question that can be answered operationally (except for exhaustive enumeration).

I'm not sure this is true. Given a grammar of type-defining layouts, I would expect it to be fairly easy to recursively determine whether the nth byte can have uninit as a value. It's not like we have to worry about arbitrary types with arbitrary safety invariants here, only the things the compiler supports: enums, unions, and structs around primitive types with specified validity invariants. That is, any conforming compiler is required to build layouts out of some basic building blocks provided by the spec (like "struct with this layout" or "enum with this discrimination tree to read the discriminant"), and for all of those building blocks evaluating their properties should be straightforward.

@RalfJung
Member Author

RalfJung commented Jul 6, 2023

Sure, we can write such an algorithm as part of the Rust-to-MiniRust lowering. As I said, that's what we already do for constructing the "chunks".

But in MiniRust, actually checking whether the type representation for a given type satisfies this property is not possible. I thought that's what you were asking.

@digama0

digama0 commented Jul 6, 2023

Let me put it this way: rustc picks some implementation-defined strategy, maybe including randomness or the phase of the moon, to determine for each type what its layout is. This layout is described by a grammar, which is already written down in minirust - this is Type, as far as I can tell. You can then evaluate given an element of Type whether it satisfies all the local validity requirements.

We already have Type::inhabited which determines whether a Type "has any elements" without having to do anything like an exhaustive enumeration of values. This would be just like that, but for determining whether a type structurally cannot hold uninit at a given byte, because it's not padding and it is part of an integer value, something like that. (It would then be a theorem that if ty.valid_for(i, UnionByte::Init) is true then ty.decode(bytes).is_some() implies bytes[i] is initialized.)

Is there something about Type that makes it unsuitable to do this? Everything I can see in the definition would imply that you can write a valid_for() which is basically linear time computable.
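
A structural analysis of this kind could look roughly like the sketch below. The `Ty` grammar is a hypothetical toy, far smaller than MiniRust's `Type`, and `may_be_uninit` is an illustrative name rather than the `valid_for` mentioned above. For unions it assumes the rule that a byte may be uninit if some field allows it or does not cover it.

```rust
/// Toy layout grammar (hypothetical, much smaller than MiniRust's `Type`).
enum Ty {
    /// An integer of `size` bytes; every byte must be initialized.
    Int { size: usize },
    /// Fields at given offsets; bytes covered by no field are padding.
    Struct { size: usize, fields: Vec<(usize, Ty)> },
    /// Overlapping fields at given offsets.
    Union { size: usize, fields: Vec<(usize, Ty)> },
}

impl Ty {
    fn size(&self) -> usize {
        match self {
            Ty::Int { size } | Ty::Struct { size, .. } | Ty::Union { size, .. } => *size,
        }
    }

    /// Can byte `i` be uninitialized in some valid representation?
    fn may_be_uninit(&self, i: usize) -> bool {
        match self {
            Ty::Int { .. } => false,
            Ty::Struct { fields, .. } => {
                // Delegate to the covering field, if any; otherwise this is padding.
                match fields.iter().find(|(off, ty)| *off <= i && i < *off + ty.size()) {
                    Some((off, ty)) => ty.may_be_uninit(i - *off),
                    None => true,
                }
            }
            Ty::Union { fields, .. } => {
                // Uninit is fine if any field tolerates it or leaves the byte uncovered.
                fields.is_empty()
                    || fields.iter().any(|(off, ty)| {
                        let covers = *off <= i && i < *off + ty.size();
                        !covers || ty.may_be_uninit(i - *off)
                    })
            }
        }
    }
}
```

Each query walks down one path per overlapping field, so the cost is bounded by the size of the type description. Whether such a check belongs inside MiniRust or in the Rust-to-MiniRust lowering is a separate question from whether it can be written.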

@RalfJung
Member Author

RalfJung commented Jul 7, 2023

We have inhabited since it affects decode for references. It is completely okay to always set inhabited to true.

Sure, we could add some more syntactic structure for this. I don't see a good reason to do that though since MiniRust doesn't care. This is entirely on the frontend. The Rust-to-MiniRust translation will specify the set of possible layouts to choose for any given union, and that is where we will need such an analysis -- but not inside MiniRust itself.

@chorman0773
Contributor

"Yes" to what?

Yes, the bytes are valid. Or, the validity predicate for a union type is valid(b)=true.

I can't see any rule that could assign a validity invariant that doesn't involve a massive decision tree. And even then, only something trivial like:

pub union Foo{
     x: NonZeroU32,
     y: (NonZeroU16, NonZeroU16)
}

would actually get assigned any invalid bit patterns.

At the most, we could limit uninit bytes to where any variant allows an uninit byte, or a padding byte introduced by the union itself (either tail padding, or, for the repr(Rust) union fans, leading padding). I can't see this reasonably being exploitable, though, especially since if we do read out a scalar (or something that contains scalars), we know it's not uninit.
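
To make the `Foo` example concrete, here is a sketch of which fully initialized byte patterns no field would accept. It assumes both fields sit at offset 0 and that the two halves of `y` occupy bytes 0..2 and 2..4, neither of which is guaranteed for `repr(Rust)`. The function names are hypothetical, and "some field accepts it" is only one candidate rule from this thread.

```rust
/// Would `bytes` be a valid `NonZeroU32` (field `x`)?
fn valid_as_x(bytes: [u8; 4]) -> bool {
    u32::from_ne_bytes(bytes) != 0
}

/// Would `bytes` be a valid `(NonZeroU16, NonZeroU16)` (field `y`),
/// assuming the halves occupy bytes 0..2 and 2..4?
fn valid_as_y(bytes: [u8; 4]) -> bool {
    u16::from_ne_bytes([bytes[0], bytes[1]]) != 0
        && u16::from_ne_bytes([bytes[2], bytes[3]]) != 0
}

/// Candidate rule: a pattern is allowed if at least one field accepts it.
fn valid_for_some_field(bytes: [u8; 4]) -> bool {
    valid_as_x(bytes) || valid_as_y(bytes)
}
```

Any pattern accepted by `y` is also accepted by `x`, so the only initialized pattern rejected under this rule is all zeroes. A purely per-byte rule would reject nothing here, since each individual byte of a non-zero integer may be zero.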

@RalfJung
Member Author

@scottmcm why did you say it was important here that this type be noundef?

@scottmcm
Member

@RalfJung I don't have any concrete issue. It's just that it's a scalar pair, so today it's passed as ptr noundef nonnull + ptr noundef in LLVM function arguments (https://rust.godbolt.org/z/Wr93ExKod) and I worry that losing that noundef on the second pointer would inhibit optimizations.

@scottmcm
Member

Oh, wait, it's worse than I thought -- if I used a union then it's no longer eligible to be a scalar pair at all, and gets passed by pointer instead https://rust.godbolt.org/z/bWhEGeYs1.

@CAD97

CAD97 commented Jul 15, 2023

That led to me doing some more experimentation and discovering more ways #[repr(transparent)] has fun implications, in that MaybeUninit<(usize, usize)> gets passed as two i64 parameters at the LLVM level, meaning it'd lose any provenance. This is compared to MaybeUninit<(*mut (), *mut ())>, which gets passed as two ptr parameters.

For the specific case of #[repr(transparent)], presuming we want to preserve ABI transparency, I think the validity effectively has to be exactly the transparently wrapped type's, except that bytes may also store uninit, and with copies only preserving the same bytes as for the wrapped type. (Importantly, this doesn't impact whether provenance is valid.)

(Apologies if this was asserted upthread; I don't recall it being directly stated, at least not so plainly.)

@RalfJung
Member Author

RalfJung commented Jul 16, 2023

It's just that it's a scalar pair, so today it's passed as ptr noundef nonnull + ptr noundef in LLVM function arguments (https://rust.godbolt.org/z/Wr93ExKod) and I worry that losing that noundef on the second pointer would inhibit optimizations.

My hope was always that we'd eventually get the ability to annotate types to add the attributes we want, instead of somehow having to preserve them through a union. (E.g. you'd probably also have situations where you would want to preserve the nonnull.)

if I used a union then it's no longer eligible to be a scalar pair at all, and gets passed by pointer instead https://rust.godbolt.org/z/bWhEGeYs1.

A repr(transparent) union should help.

MaybeUninit<(usize, usize)> gets passed as two i64 parameters at the LLVM level, meaning it'd lose any provenance.

Well, maybe it would, it's not like LLVM specifies this. Doesn't LLVM itself even sometimes turn ptr loads into i64 loads?

The much-discussed "byte" type in LLVM would fix this. So we could also document this as a known issue (unlikely to cause trouble in practice) with the LLVM backend caused by an LLVM limitation, and hope LLVM can one day give us a clear answer for how to express "a type of a given fixed size that gets passed in registers and can carry provenance". I think currently they would say i64 is the type to use for that.

@scottmcm
Member

A repr(transparent) union should help.

I must have misunderstood your point here, because I can't make a two-field transparent union:

error[E0690]: transparent union needs at most one non-zero-sized field, but has 2
 --> src/lib.rs:4:1
  |
4 | pub union Foo<T> {
  | ^^^^^^^^^^^^^^^^ needs at most one non-zero-sized field, but has 2
5 |     ptr: NonNull<T>,
  |     --------------- this field is non-zero-sized
6 |     len: usize,
  |     ---------- this field is non-zero-sized

https://play.rust-lang.org/?version=nightly&mode=debug&edition=2021&gist=28e02ef2968d6500948188e22b976db8

@RalfJung
Member Author

Ah, interesting. MaybeUninit<(NonNull, usize)> will be a scalar pair. So probably you need a repr(transparent) union with one field that is a pair.

@scottmcm
Member

scottmcm commented Jul 20, 2023

I don't think that would really help me in #113344, though.

The union would be convenient -- and way less MIR -- if instead of

let $len = unsafe { &mut *ptr::addr_of_mut!($this.end_or_len).cast::<usize>() };
let $end = unsafe { &mut *ptr::addr_of_mut!($this.end_or_len).cast::<NonNull<T>>() };

I could just write

let $len = unsafe { &mut $this.end_or_len.len };
let $end = unsafe { &mut $this.end_or_len.ptr };

But sticking a pair into a transparent union doesn't make things any easier than just having the *const T field -- it probably makes it overall way harder since accessing the begin pointer becomes messy too.


I guess what this boils down to is that I want an "unsafe enum", not really a union. (Not that I know what to do with that observation.)
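
For reference, the union-based access pattern described above might look like the following sketch. The field names `end_or_len`, `ptr` and `len` come from the snippets above; the `EndOrLen` and `IterState` types and the `decrement_len` helper are hypothetical and much simpler than the real slice iterator.

```rust
use std::ptr::NonNull;

/// Hypothetical union matching the field names used above.
#[allow(dead_code)]
union EndOrLen<T> {
    ptr: NonNull<T>,
    len: usize,
}

/// Hypothetical, stripped-down iterator state.
struct IterState<T> {
    end_or_len: EndOrLen<T>,
}

/// Decrements the stored length and returns the new value.
///
/// # Safety
/// `len` must be the field that was last written, and it must be non-zero.
unsafe fn decrement_len<T>(this: &mut IterState<T>) -> usize {
    // Borrowing a union field needs `unsafe`, but no pointer casts.
    let len = unsafe { &mut this.end_or_len.len };
    *len -= 1;
    *len
}
```

The field access replaces the `addr_of_mut!` plus `cast` dance, but which field is active still has to be tracked by the caller, which is the "unsafe enum" flavour mentioned above.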

@RalfJung
Member Author

Ah sorry, what I said makes no sense.

I guess this is a layout computation thing... a union where all fields are scalar of the same size, could be passed around as a scalar.

@RalfJung
Member Author

RalfJung commented Aug 2, 2023

This issue got too long and just asking about the validity invariant skips some key aspects (such as which bytes are preserved when a union gets copied). Closing in favor of #438, which I think summarizes the remaining open questions here and merges this with #156.

@RalfJung RalfJung closed this as completed Aug 2, 2023