add a stucts-and-tuples chapter #31

Merged: 19 commits, Oct 25, 2018
1 change: 1 addition & 0 deletions reference/src/SUMMARY.md
@@ -14,5 +14,6 @@
- [Unions](./active_discussion/unions.md)
- [Uninitialized memory](./active_discussion/uninitialized_memory.md)
- [Data representation](./representation.md)
- [Structs and tuples](./representation/structs-and-tuples.md)
- [Optimizations](./optimizations.md)
- [Optimizing immutable memory](./optimizations/immutable_memory.md)
5 changes: 0 additions & 5 deletions reference/src/representation.md
@@ -6,11 +6,6 @@

https://github.com/rust-rfcs/unsafe-code-guidelines/issues/9

## Representation of structs and tuples

https://github.com/rust-rfcs/unsafe-code-guidelines/issues/11
https://github.com/rust-rfcs/unsafe-code-guidelines/issues/12

## Representation of enums

https://github.com/rust-rfcs/unsafe-code-guidelines/issues/10
370 changes: 370 additions & 0 deletions reference/src/representation/structs-and-tuples.md
@@ -0,0 +1,370 @@
# Representation of structs and tuples

**Disclaimer:** This chapter represents the consensus from issues
[#11] and [#12]. The statements in here are not (yet) guaranteed:
they may change until an RFC ratifies them.

[#11]: https://github.com/rust-rfcs/unsafe-code-guidelines/issues/11
[#12]: https://github.com/rust-rfcs/unsafe-code-guidelines/issues/12

## Tuple types

In general, an anonymous tuple type `(T1..Tn)` of arity N is laid out
"as if" there were a corresponding tuple struct declared in libcore:

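```rust
// Schematic: one such struct per arity N (not literal Rust syntax).
// Only the final parameter, `Pn`, is `?Sized` (see the note below).
#[repr(Rust)]
struct TupleN<P1, .., Pn: ?Sized>(P1, .., Pn);
```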

In this case, `(T1..Tn)` would be compatible with `TupleN<T1..Tn>`.
As discussed below, this generally means that the compiler is **free
to re-order field layout** as it wishes. Thus, if you would like a
guaranteed layout from a tuple, you are generally advised to create a
named struct with a `#[repr(C)]` annotation (see [the section on
structs for more details](#structs)).

Note that the final element of a tuple (`Pn`) is marked as `?Sized` to
permit unsized tuple coercion -- this is implemented on nightly but is
currently unstable ([tracking issue][#42877]). In the future, we may
extend unsizing to other elements of tuples as well.

[#42877]: https://github.com/rust-lang/rust/issues/42877

### Other notes on tuples

Some related discussion:

- [RFC #1582](https://github.com/rust-lang/rfcs/pull/1582) proposed
that tuple structs should have a "nested representation", where
e.g. `(T1, T2, T3)` would in fact be laid out as `(T1, (T2,
T3))`. The purpose of this was to permit variadic matching and so
forth against some suffix of the struct. This RFC was not accepted,
however. This layout requires extra padding and seems somewhat
surprising: it means that the layout of tuples and tuple structs
would diverge significantly from structs with named fields.

<a name="structs"></a>

## Struct types

Structs come in two principal varieties:

```rust
// Structs with named fields
struct Foo { f1: T1, .., fn: Tn }

// Tuple structs
struct Foo(T1, .., Tn);
```

In terms of their layout, tuple structs can be understood as
equivalent to a named struct with fields named `0..n-1`:

```rust
// Schematic only: integer field names are not valid in a declaration.
struct Foo {
    0: T1,
    ...
    n-1: Tn
}
```

(In fact, one may use such field names in patterns or in accessor
expressions like `foo.0`.)
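A minimal sketch of the point above, using a hypothetical tuple struct `Pair`: the numeric field names are usable in accessor expressions, in braced struct-literal syntax, and in patterns, even though they cannot appear in a braced struct *declaration*.

```rust
// Hypothetical tuple struct, used only for illustration.
#[derive(Debug, PartialEq)]
struct Pair(u8, u32);

fn main() {
    // Ordinary tuple-struct construction.
    let a = Pair(1, 2);

    // The same value written with braced syntax and numeric field names.
    let b = Pair { 0: 1, 1: 2 };
    assert_eq!(a, b);

    // Accessor expressions use the numeric names directly.
    assert_eq!(a.0, 1);
    assert_eq!(a.1, 2);

    // Patterns can bind fields by their numeric names as well.
    let Pair { 0: first, 1: second } = b;
    assert_eq!((first, second), (1, 2));
}
```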

Structs can have various `#[repr]` flags that influence their layout:

- `#[repr(Rust)]` -- the default.
- `#[repr(C)]` -- request C compatibility
- `#[repr(align(N))]` -- specify the alignment
- `#[repr(packed)]` -- request packed layout where fields are not internally aligned
- `#[repr(transparent)]` -- request that a "wrapper struct" be treated
"as if" it were an instance of its field type when passed as an
argument

### Default layout ("repr rust")

**The default layout of structs is not specified.** As of this
writing, we have not reached a full consensus on what limitations
should exist on possible struct field layouts, so effectively one must
assume that the compiler can select any layout it likes for each
struct on each compilation, and it is not required to select the same
layout across two compilations. This implies that (among other things)
two structs with the same field types may not be laid out in the same
way (for example, the hypothetical struct representing tuples may be
laid out differently from user-declared structs).

Known things that can influence layout (non-exhaustive):

- the type of the struct fields and the layout of those types
- compiler settings, including esoteric choices like optimization fuel

**A note on determinism.** The definition above does not guarantee
determinism between executions of the compiler -- two executions may
select different layouts, even if all inputs are identical. Naturally,
in practice, the compiler aims to produce deterministic output for a
given set of inputs. However, it is difficult to produce a
comprehensive summary of the various factors that may affect the
layout of structs, and so for the time being we have opted for a
conservative definition.

**Compiler's current behavior.** As of the time of this writing, the
compiler will reorder struct fields to minimize the overall size of
the struct (and in particular to eliminate padding due to alignment
restrictions).
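The sketch below contrasts two hypothetical structs with the same field types in the same declared order. It assumes a common target where `u32` has 4-byte alignment. Only the `#[repr(C)]` size is something you may rely on; the `#[repr(Rust)]` size is printed, not asserted, because it is unspecified (at the time of writing, rustc typically reorders the fields to `b, a, c` and gets 8 bytes).

```rust
use std::mem::size_of;

// Field order is preserved: a (1) + 3 padding + b (4) + c (1) + 3 tail padding.
#[repr(C)]
struct InOrder { a: u8, b: u32, c: u8 }

// Default layout: the compiler is free to reorder these fields.
struct MayReorder { a: u8, b: u32, c: u8 }

fn main() {
    println!("repr(C):    {} bytes", size_of::<InOrder>());
    println!("repr(Rust): {} bytes", size_of::<MayReorder>());
}
```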

Layout is presently defined not in terms of a "fully monomorphized"
struct definition but rather in terms of its generic definition along
with a set of substitutions (values for each type parameter; lifetime
parameters do not affect layout). This distinction is important
because of *unsizing* -- if the final field has generic type, the
compiler will not reorder it, to allow for the possibility of
unsizing. E.g., `struct Foo { x: u16, y: u32 }` and `struct Foo<T> {
x: u16, y: T }` where `T = u32` are not guaranteed to be identical.
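As a sketch of why unsizing constrains layout: if the last field of a generic struct is a `?Sized` type parameter, a reference to a sized instantiation can be coerced to a reference to an unsized one. That only works if the offsets of the other fields do not depend on `T`, which is why the unsizable field has to stay last. The struct here mirrors the `Foo<T>` above, with the added assumption `T: ?Sized`.

```rust
// Generic struct whose final field may be unsized.
struct Foo<T: ?Sized> {
    x: u16,
    y: T,
}

fn main() {
    let sized: Foo<[u8; 4]> = Foo { x: 7, y: [1, 2, 3, 4] };

    // Unsizing coercion: `&Foo<[u8; 4]>` -> `&Foo<[u8]>`.
    // `x` must have the same offset in both types, so the unsized
    // field `y` cannot be moved in front of it.
    let unsized_ref: &Foo<[u8]> = &sized;

    assert_eq!(unsized_ref.x, 7);
    assert_eq!(unsized_ref.y.len(), 4);
}
```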

#### Unresolved questions

During the course of the discussion in [#11] and [#12], various
suggestions arose to limit the compiler's flexibility. These questions
are currently considered **unresolved** and -- for each of them -- an
issue has been opened for further discussion on the repository. This
section documents the questions and gives a few light details, but the
reader is referred to the issues for further discussion.

**Zero-sized structs ([#37]).** If you have a struct which --
transitively -- contains no data of non-zero size, then the size of
that struct will be zero as well. These zero-sized structs appear
frequently as exceptions in other layout considerations (e.g.,
single-field structs). An example of such a struct is
`std::marker::PhantomData`.

[#37]: https://github.com/rust-rfcs/unsafe-code-guidelines/issues/37
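A small sketch of zero-sized structs as currently implemented; `Empty` and `AlsoEmpty` are hypothetical types. A struct built only from zero-sized parts (a unit struct, `PhantomData`, a zero-length array) is itself zero-sized.

```rust
use std::marker::PhantomData;
use std::mem::size_of;

// Contains no data at all.
struct Empty;

// Contains only zero-sized fields, transitively.
struct AlsoEmpty {
    _a: Empty,
    _b: PhantomData<String>,
    _c: [u64; 0],
}

fn main() {
    assert_eq!(size_of::<PhantomData<u64>>(), 0);
    assert_eq!(size_of::<Empty>(), 0);
    assert_eq!(size_of::<AlsoEmpty>(), 0);
}
```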

**Single-field structs ([#34]).** If you have a struct with a single field
(`struct Foo { x: T }`), should we guarantee that the memory layout of
`Foo` is identical to the memory layout of `T`? (Note that ABI details
around function calls may still draw a distinction, which is why
`#[repr(transparent)]` is needed.) What about zero-sized types like
`PhantomData`?

[#34]: https://github.com/rust-rfcs/unsafe-code-guidelines/issues/34

**Homogeneous structs ([#36]).** If you have homogeneous structs, where all
the `N` fields are of a single type `T`, can we guarantee a mapping to
the memory layout of `[T; N]`? How do we map between the field names
and the indices? What about zero-sized types?

[#36]: https://github.com/rust-rfcs/unsafe-code-guidelines/issues/36

**Deterministic layout ([#35]).** Can we say that layout is some deterministic
function of a certain, fixed set of inputs? This would allow you to be
sure that if you do not alter those inputs, your struct layout would
not change, even if it meant that you can't predict precisely what it
will be. For example, we might say that struct layout is a function of
the struct's generic types and its substitutions, full stop -- this
would imply that any two structs with the same definition are laid out
the same. This might interfere with our ability to do profile-guided
layout or to analyze how a struct is used and optimize based on
that. Some would call that a feature.

[#35]: https://github.com/rust-rfcs/unsafe-code-guidelines/issues/35

### C-compatible layout ("repr C")

For structs tagged `#[repr(C)]`, the compiler will apply a C-like
layout scheme. See section 6.7.2.1 of the [C17 specification][C17] for
a detailed write-up of what such rules entail (as well as the relevant
specs for your platform). For most platforms, however, this means the
following:

[C17]: http://www.open-std.org/jtc1/sc22/wg14/www/abq/c17_updated_proposed_fdis.pdf

- Field order is preserved.
- The first field begins at offset 0.
- Assuming the struct is not packed, each field's offset is aligned[^aligned] to
the ABI-mandated alignment for that field's type, possibly creating
unused padding bytes.
- The total size of the struct is rounded up to its overall alignment.

[^aligned]: Aligning an offset O to an alignment A means to round up the offset O until it is a multiple of the alignment A.
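The rules above can be checked against a hypothetical `#[repr(C)]` struct. This sketch assumes a typical target where `u16` and `u32` have 2- and 4-byte alignment, and uses `std::mem::offset_of!` (stable since Rust 1.77). `align_up` is a helper written here for illustration; it implements the rounding described in the footnote.

```rust
use std::mem::{align_of, offset_of, size_of};

/// Round `offset` up to the next multiple of `align`.
fn align_up(offset: usize, align: usize) -> usize {
    (offset + align - 1) / align * align
}

#[repr(C)]
struct Example {
    a: u8,  // offset 0
    b: u32, // offset align_up(1, 4) = 4, leaving 3 padding bytes
    c: u16, // offset align_up(8, 2) = 8
}           // size   align_up(10, 4) = 12, leaving 2 tail padding bytes

fn main() {
    assert_eq!(offset_of!(Example, a), 0);
    assert_eq!(offset_of!(Example, b), align_up(1, align_of::<u32>()));
    assert_eq!(offset_of!(Example, c), align_up(4 + 4, align_of::<u16>()));
    assert_eq!(size_of::<Example>(), align_up(8 + 2, align_of::<Example>()));
}
```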

One deviation from C comes about with "empty structs". In Rust, a
struct that contains (transitively) no data members is considered to
have size zero, which is not something that exists in C. This includes
a struct like `#[repr(C)] struct Foo { }`. Further, when a
`#[repr(C)]` struct has a field whose type has zero-size, that field
may induce padding due to its alignment, but will not otherwise affect
the offsets of subsequent fields (as it takes up zero space).
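A sketch of both deviations, using hypothetical types: an empty `#[repr(C)]` struct has size zero, and a zero-sized field with 4-byte alignment (here `[u32; 0]`) can introduce padding before itself while the next field starts at the same offset. It assumes `u32` has 4-byte alignment and Rust 1.77+ for `offset_of!`.

```rust
use std::mem::{offset_of, size_of};

// Zero-sized in Rust; standard C does not allow an empty struct.
#[repr(C)]
struct Empty {}

#[repr(C)]
struct WithZstField {
    a: u8,
    // Zero-sized but 4-byte aligned: padding pushes its offset up to 4...
    z: [u32; 0],
    // ...and `b` starts at that same offset, because `z` occupies no space.
    b: u8,
}

fn main() {
    assert_eq!(size_of::<Empty>(), 0);
    assert_eq!(offset_of!(WithZstField, z), 4);
    assert_eq!(offset_of!(WithZstField, b), 4);
    // 5 bytes of fields and padding, rounded up to the 4-byte alignment.
    assert_eq!(size_of::<WithZstField>(), 8);
}
```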
**Member:**

As @rkruppe noted on Zulip, this seems like a problem. It means that "copy-pasting struct definitions and adding repr(C) everywhere" does not give you C compatibility, because your Foo would actually take space when put in a larger struct in C.

This seems like a bug, TBH. I am not sure if it is a bug that we can still fix. Might be worth having at least a warning.

**Contributor Author:**

Yes — I was thinking the same in that conversation. That is, "bug and not clearly a bug we can fix", which does suggest that at least a lint is warranted.

**Contributor Author:**

For the benefit of others, the Zulip conversation was here.

**@strega-nil (Oct 24, 2018):**

That's not actually how C works - C does not allow empty structs. The godbolt link was from C++, which does allow empty structs, and in which empty structs have size one. Basically, this would only be an issue for somebody like Mozilla, who talk to C++ through a non-C ABI.

Notably, also, gcc and clang's C extension for empty structs has sizeof(struct { }) = 0: https://godbolt.org/z/K3_5fJ

**Reviewer:**

Transcribing another thing @ubsan noted on Zulip: empty structs are accepted as an extension by some C compilers, but (at least) GCC and Clang make them have size zero, unlike C++. Example: https://godbolt.org/z/AS2gdC

**Reviewer:**

Allow-by-default at best - 0 size structures are weird in C++, and usually you'd use either EBO or [[no_unique_address]] with them.

**Member:**

Isn't them being weird another argument for making this a warn-by-default lint? I expect many people will not know this. I am not a C++ expert, but I have programmed in C++ for many years and never heard about this; I don't think we can expect everybody doing C++ FFI to know about these issues.

**@gnzlbg, Contributor (Oct 25, 2018):**

So to summarize: C does not allow empty structs, some C language extensions allow empty structs with sizeof == 0, C++ does allow empty structs but these have a sizeof == 1 unless they are inherited from or they are fields that have the [[no_unique_address]] attribute (in both cases, they don't increase the size of the struct; I'm unsure what role the alignment of the type plays, though).

I think #[repr(C)] should warn-by-default on this when it makes a difference, that is, when the ZST would change the layout. We could have an opt-in warning that always warns on ZST being used in #[repr(C)] but I fear that would be extremely noisy for little win.

About the situation with the C-language extension and C++ it appears that #[repr(C)] != #[repr(Cxx)], so maybe we just need to add new reprs to deal with those. In the meantime it might be worth it to just ignore C++ while specifying #[repr(C)] here (maybe add a note so that we don't forget).

**Member:**

I had to google EBO: Empty base optimization.

**Contributor:**

@RalfJung note that EBO is guaranteed by the C++>=11 standard for types with standard layout, like empty structs: struct T {};, so it is a required layout optimization.


The intention is that if one has a set of C struct declarations and a
corresponding set of Rust struct declarations, all of which are tagged
with `#[repr(C)]`, then the layout of those structs will all be
identical. Note that this setup implies that none of the structs in
question can contain any `#[repr(Rust)]` structs (or Rust tuples), as
those would have no corresponding C struct declaration -- as
`#[repr(Rust)]` types have unspecified layout, you cannot safely declare
their layout in a C program.

See also the notes on [ABI compatibility](#fnabi) under the section on `#[repr(transparent)]`.

### Fixed alignment

The `#[repr(align(N))]` attribute may be used to raise the alignment
of a struct, as described in [The Rust Reference][TRR-align].

[TRR-align]: https://doc.rust-lang.org/stable/reference/type-layout.html#the-align-representation
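A short sketch with hypothetical structs: `align(N)` raises the alignment, and because a type's size is always a multiple of its alignment, the size grows with it. `align` can also be combined with `C`. It assumes `u16` has 2-byte alignment.

```rust
use std::mem::{align_of, size_of};

#[repr(align(16))]
struct Aligned(u8);

// `tag` at offset 0, `len` at offset 2: 4 bytes of data, padded to 8.
#[repr(C, align(8))]
struct Header { tag: u8, len: u16 }

fn main() {
    assert_eq!(align_of::<Aligned>(), 16);
    assert_eq!(size_of::<Aligned>(), 16);
    assert_eq!(align_of::<Header>(), 8);
    assert_eq!(size_of::<Header>(), 8);
}
```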

### Packed layout

The `#[repr(packed(N))]` attribute may be used to impose a maximum
limit on the alignments for individual fields. It is most commonly
used with an alignment of 1, which makes the struct as small as
possible. For example, in a `#[repr(packed(2))]` struct, a `u8` or
`u16` would be aligned to 1 or 2 bytes respectively (as normal), but
a `u32` would be aligned to only 2 bytes instead of 4. In the absence
of an explicit `#[repr(align)]` directive, `#[repr(packed(N))]` also
sets the alignment for the struct as a whole to N bytes.
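The `packed(2)` example above, as a sketch with hypothetical structs. Both are also marked `C` so that field order is fixed; a packed `repr(Rust)` struct could still have its fields reordered. It assumes Rust 1.77+ for `offset_of!`, which works on packed structs because it never creates a reference.

```rust
use std::mem::{align_of, offset_of, size_of};

#[repr(C, packed(2))]
struct Packed2 { a: u8, b: u32 }

#[repr(C, packed)] // equivalent to packed(1)
struct Packed1 { a: u8, b: u32 }

fn main() {
    // `b` is aligned to only 2 bytes instead of 4.
    assert_eq!(offset_of!(Packed2, b), 2);
    assert_eq!(align_of::<Packed2>(), 2);
    assert_eq!(size_of::<Packed2>(), 6);

    // With packed(1) there is no padding at all.
    assert_eq!(offset_of!(Packed1, b), 1);
    assert_eq!(align_of::<Packed1>(), 1);
    assert_eq!(size_of::<Packed1>(), 5);
}
```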

The resulting fields may not fall at properly aligned boundaries in
memory. This makes it unsafe to create a Rust reference (`&T` or `&mut
T`) to those fields, as the compiler requires that all reference
values must always be aligned (so that it can use more efficient
load/store instructions at runtime). See the [Rust reference for more
details][TRR-packed].

[TRR-packed]: https://doc.rust-lang.org/stable/reference/type-layout.html#the-packed-representation
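A sketch of ways to read a possibly misaligned packed field without creating a reference, using a hypothetical `Packed` struct. Copying the field out by value is fine for `Copy` types. A raw pointer obtained with `std::ptr::addr_of!` may be unaligned, so it must be read with `read_unaligned`. Current compilers reject `&p.b` outright.

```rust
use std::ptr;

#[repr(C, packed)]
struct Packed { a: u8, b: u32 }

fn main() {
    let p = Packed { a: 1, b: 0xdead_beef };

    // let r = &p.b; // not allowed: reference to a field of a packed struct

    // OK: reading the field by value copies it without taking a reference.
    let b = p.b;
    assert_eq!(b, 0xdead_beef);

    // OK: a raw pointer may be misaligned, but it must be read with
    // `read_unaligned` rather than `*raw` or `raw.read()`.
    let raw: *const u32 = ptr::addr_of!(p.b);
    let b2 = unsafe { raw.read_unaligned() };
    assert_eq!(b2, 0xdead_beef);
}
```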

<a name="fnabi"> </a>

### Function call ABI compatibility

In general, when invoking functions that use the C ABI, `#[repr(C)]`
structs are guaranteed to be passed in the same way as their
corresponding C counterpart (presuming one exists). `#[repr(Rust)]`
structs have no such guarantee. This means that if you have an `extern
"C"` function, you cannot pass a `#[repr(Rust)]` struct as one of its
arguments. Instead, one would typically pass `#[repr(C)]` structs (or
possibly pointers to Rust-structs, if those structs are opaque on the
other side, or the callee is defined in Rust).

However, there is a subtle point about C ABIs: in some C ABIs, passing
a struct with one field of type `T` as an argument is **not**
equivalent to just passing a value of type `T`. So e.g. if you have a
C function that is defined to take a `uint32_t`:

```C
#include <stdint.h>

void some_function(uint32_t value) { /* ... */ }
```

It is **incorrect** to pass in a struct as that value, even if that
struct is `#[repr(C)]` and has only one field:

```rust
#[repr(C)]
struct Foo { x: u32 }

// Declared to match the C function above, but with the wrong type.
extern "C" {
    fn some_function(value: Foo);
}

fn main() {
    unsafe { some_function(Foo { x: 22 }); } // Bad!
}
```

Instead, you should declare the struct with `#[repr(transparent)]`,
which specifies that `Foo` should use the ABI rules for its field
type, `u32`. This is useful when using "wrapper structs" in Rust to
give stronger typing guarantees.

`#[repr(transparent)]` can only be applied to structs with a single
field whose type `T` has non-zero size, along with some number of
other fields whose types are all zero-sized (typically
`std::marker::PhantomData` fields). The struct then takes on the "ABI
behavior" of the type `T` that has non-zero size.
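A sketch of `#[repr(transparent)]` wrappers, with hypothetical types `Meters` and `Tagged<T>`. To keep it runnable without a C library, the "C function" is stood in for by an `extern "C" fn` defined in Rust. Because a `repr(transparent)` wrapper is ABI-compatible with its non-zero-sized field, calling `double` through a function pointer typed to take `Meters` is sound; with a `#[repr(C)] struct Meters(u32)` that would not be guaranteed.

```rust
use std::marker::PhantomData;
use std::mem::{align_of, size_of, transmute};

// A "wrapper struct" giving a `u32` a stronger type.
#[repr(transparent)]
struct Meters(u32);

// Extra zero-sized fields such as `PhantomData` are permitted.
#[repr(transparent)]
struct Tagged<T> {
    value: u32,
    _marker: PhantomData<T>,
}

// Stand-in for a C function that takes a `uint32_t`.
extern "C" fn double(value: u32) -> u32 {
    value * 2
}

fn main() {
    assert_eq!(size_of::<Meters>(), size_of::<u32>());
    assert_eq!(align_of::<Tagged<String>>(), align_of::<u32>());

    // `Meters` uses the ABI of `u32`, so the two signatures are compatible.
    let f: extern "C" fn(u32) -> u32 = double;
    let g: extern "C" fn(Meters) -> u32 = unsafe { transmute(f) };
    assert_eq!(g(Meters(21)), 42);
}
```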

(Note further that the Rust ABI is undefined and theoretically may
vary from compiler revision to compiler revision.)

## Unresolved question: Guaranteeing compatible layouts?

One key unresolved question was whether we would want to guarantee
that two `#[repr(Rust)]` structs whose fields have the same types are
laid out in a "compatible" way, such that one could be transmuted to
the other. @rkruppe laid out a [number of
examples](https://github.com/rust-rfcs/unsafe-code-guidelines/issues/11#issuecomment-419956939)
where this might be a reasonable thing to expect. As currently
written, and in an effort to be conservative, we make no such
guarantee, though we do not firmly rule out doing such a thing in the future.
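For contrast, a sketch of what *is* sound today, using hypothetical types: when both structs are `#[repr(C)]` with the same field types in the same order, their layouts are fixed and identical, so transmuting one into the other is sound. Dropping `#[repr(C)]` from either struct would make the same transmute rely on an unguaranteed layout.

```rust
use std::mem::transmute;

// Two distinct types with identical field types, both `repr(C)`.
#[repr(C)]
struct PointA { x: u8, y: u32 }

#[repr(C)]
struct PointB { x: u8, y: u32 }

fn main() {
    let a = PointA { x: 1, y: 2 };
    // Sound: `repr(C)` gives both types the same field order and offsets.
    let b: PointB = unsafe { transmute(a) };
    assert_eq!((b.x, b.y), (1, 2));
}
```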

It seems like it may well be desirable to -- at minimum -- guarantee
that `#[repr(Rust)]` layout is "some deterministic function of the
struct declaration and the monomorphized types of its fields". Note
that it is not sufficient to consider the monomorphized type of a
struct's fields: due to unsizing coercions, it matters whether the
struct is declared in a generic way or not, since the "unsized" field
must presently be [laid out last in the
structure](https://github.com/rust-rfcs/unsafe-code-guidelines/issues/12#issuecomment-417843595). (Note
that tuples are always coercible (see [#42877] for more information),
and are always declared as generics.) This implies that our
"deterministic function" also takes as input the form in which the
fields are declared in the struct.

However, that rule is not true today. For example, the compiler
includes an option (called "optimization fuel") that will enable us to
alter the layout of only the "first N" structs declared in the
source. When one is accidentally relying on the layout of a structure,
this can be used to track down the struct that is causing the problem.

[pg-unsized-tuple]: https://play.rust-lang.org/?gist=46399bb68ac685f23beffefc014203ce&version=nightly&mode=debug&edition=2015

There are also benefits to having fewer guarantees. For example:

- Code hardening tools can be used to randomize the layout of individual structs.
- Profile-guided optimization might analyze how instances of a
particular struct are used and tweak the layout (e.g., to insert
padding and reduce false sharing).
- However, there aren't many tools that do this sort of thing
([1](https://github.com/rust-rfcs/unsafe-code-guidelines/issues/11#issuecomment-420650851),
[2](https://github.com/rust-rfcs/unsafe-code-guidelines/issues/11#issuecomment-420681763)). Moreover,
it would probably be better for the tools to merely recommend
annotations that could be added
([1](https://github.com/rust-rfcs/unsafe-code-guidelines/issues/11#issuecomment-420077105),
[2](https://github.com/rust-rfcs/unsafe-code-guidelines/issues/11#issuecomment-420077105)),
such that the knowledge of the improved layouts can be recorded in the
source.

As a more declarative alternative, @alercah [proposed a possible
extension](https://github.com/rust-rfcs/unsafe-code-guidelines/issues/12#issuecomment-420165155)
that would permit one to declare that the layout of two structs or
types are compatible (e.g., `#[repr(as(Foo))] struct Bar { .. }`),
thus permitting safe transmutes (and also ABI compatibility). One
might also use some weaker form of `#[repr(C)]` to specify a "more
deterministic" layout. These areas need future exploration.

## Counteropinions and other notes

@joshtriplett [argued against reordering struct
fields](https://github.com/rust-rfcs/unsafe-code-guidelines/issues/11#issuecomment-417953576),
suggesting instead it would be better if users reordered fields
themselves. However, there are a number of downsides to such a
proposal (and -- further -- it does not match our existing behavior):

- In a generic struct, the [best ordering of fields may not be known
ahead of
time](https://github.com/rust-rfcs/unsafe-code-guidelines/issues/11#issuecomment-420659840),
so the user cannot do it manually.
- If layout is defined, and a library exposes a struct with all public
fields, then clients may be more likely to assume that the layout of
that struct is stable. If they were to write unsafe code that relied
on this assumption, that would break if fields were reordered. But
libraries may well expect the freedom to reorder fields. This case
is weakened because of the requirement to write unsafe code (after
all, one can always write unsafe code that relies on virtually any
implementation detail); if we were to permit **safe** casts that
rely on the layout, then reordering fields would clearly be a
breaking change (see also [this
comment](https://github.com/rust-rfcs/unsafe-code-guidelines/issues/11#issuecomment-420117856)
and [this
thread](https://github.com/rust-rfcs/unsafe-code-guidelines/pull/31#discussion_r224955817)).
- Many people would prefer the field ordering to be chosen for
"readability" and not optimal layout.

## Footnotes