rust-lang · avadacatavra · Oct 25, 2018 · Sep 25, 2018 · Sep 26, 2018 · Oct 11, 2018
diff --git a/reference/src/SUMMARY.md b/reference/src/SUMMARY.md
@@ -14,5 +14,6 @@
   - [Unions](./active_discussion/unions.md)
   - [Uninitialized memory](./active_discussion/uninitialized_memory.md)
 - [Data representation](./representation.md)
+  - [Structs and tuples](./representation/structs-and-tuples.md)
 - [Optimizations](./optimizations.md)
   - [Optimizing immutable memory](./optimizations/immutable_memory.md) 
diff --git a/reference/src/representation.md b/reference/src/representation.md
@@ -6,11 +6,6 @@
 
 https://github.com/rust-rfcs/unsafe-code-guidelines/issues/9
 
-## Representation of structs and tuples
-
-https://github.com/rust-rfcs/unsafe-code-guidelines/issues/11
-https://github.com/rust-rfcs/unsafe-code-guidelines/issues/12
-
 ## Representation of enums 
 
 https://github.com/rust-rfcs/unsafe-code-guidelines/issues/10 

diff --git a/reference/src/representation/structs-and-tuples.md b/reference/src/representation/structs-and-tuples.md
@@ -0,0 +1,370 @@
+# Representation of structs and tuples
+
+**Disclaimer:** This chapter represents the consensus from issues
+[#11] and [#12].  The statements in here are not (yet) "guaranteed"
+not to change until an RFC ratifies them.
+
+[#11]: https://github.com/rust-rfcs/unsafe-code-guidelines/issues/11
+[#12]: https://github.com/rust-rfcs/unsafe-code-guidelines/issues/12
+
+## Tuple types
+
+In general, an anonymous tuple type `(T1..Tn)` of arity N is laid out
+"as if" there were a corresponding tuple struct declared in libcore:
+
+```rust
+#[repr(Rust)]
+struct TupleN<P1..Pn:?Sized>(P1..Pn);
+```
+
+In this case, `(T1..Tn)` would be compatible with `TupleN<T1..Tn>`.
+As discussed below, this generally means that the compiler is **free
+to re-order field layout** as it wishes. Thus, if you would like a
+guaranteed layout from a tuple, you are generally advised to create a
+named struct with a `#[repr(C)]` annotation (see [the section on
+structs for more details](#structs)).
+
+Note that the final element of a tuple (`Pn`) is marked as `?Sized` to
+permit unsized tuple coercion -- this is implemented on nightly but is
+currently unstable ([tracking issue][#42877]). In the future, we may
+extend unsizing to other elements of tuples as well.
+
+[#42877]: https://github.com/rust-lang/rust/issues/42877
+
+### Other notes on tuples
+
+Some related discussion:
+
+- [RFC #1582](https://github.com/rust-lang/rfcs/pull/1582) proposed
+  that tuple structs should have a "nested representation", where
+  e.g. `(T1, T2, T3)` would in fact be laid out as `(T1, (T2,
+  T3))`. The purpose of this was to permit variadic matching and so
+  forth against some suffix of the struct. This RFC was not accepted,
+  however. This lay out requires extra padding and seems somewhat
+  surprising: it means that the layout of tuples and tuple structs
+  would diverge significantly from structs with named fields.
+
+<a name="structs"></a>
+
+## Struct types
+
+Structs come in two principle varieties:
+
+```rust
+// Structs with named fields
+struct Foo { f1: T1, .., fn: Tn }
+
+// Tuple structs
+struct Foo(T1, .., Tn);
+```
+
+In terms of their layout, tuple structs can be understood as
+equivalent to a named struct with fields named `0..n-1`:
+
+```rust
+struct Foo {
+  0: T1,
+  ...
+  n-1: Tn
+}
+```
+
+(In fact, one may use such field names in patterns or in accessor
+expressions like `foo.0`.)
+
+Structs can have various `#[repr]` flags that influence their layout:
+
+- `#[repr(Rust)]` -- the default.
+- `#[repr(C)]` -- request C compatibility
+- `#[repr(align(N))]` -- specify the alignment
+- `#[repr(packed)]` -- request packed layout where fields are not internally aligned
+- `#[repr(transparent)]` -- request that a "wrapper struct" be treated
+  "as if" it were an instance of its field type when passed as an
+  argument
+
+### Default layout ("repr rust")
+
+**The default layout of structs is not specified.** As of this
+writing, we have not reached a full consensus on what limitations
+should exist on possible field struct layouts, so effectively one must
+assume that the compiler can select any layout it likes for each
+struct on each compilation, and it is not required to select the same
+layout across two compilations. This implies that (among other things)
+two structs with the same field types may not be laid out in the same
+way (for example, the hypothetical struct representing tuples may be
+laid out differently from user-declared structs).
+
+Known things that can influence layout (non-exhaustive):
+
+- the type of the struct fields and the layout of those types
+- compiler settings, including esoteric choices like optimization fuel
+
+**A note on determinism.** The definition above does not guarantee
+determinism between executions of the compiler -- two executions may
+select different layouts, even if all inputs are identical. Naturally,
+in practice, the compiler aims to produce deterministic output for a
+given set of inputs. However, it is difficult to produce a
+comprehensive summary of the various factors that may affect the
+layout of structs, and so for the time being we have opted for a
+conservative definition.
+
+**Compiler's current behavior.** As of the time of this writing, the
+compiler will reorder struct fields to minimize the overall size of
+the struct (and in particular to eliminate padding due to alignment
+restrictions).
+
+Layout is presently defined not in terms of a "fully monomorphized"
+struct definition but rather in terms of its generic definition along
+with a set of substitutions (values for each type parameter; lifetime
+parameters do not affect layout). This distinction is important
+because of *unsizing* -- if the final field has generic type, the
+compiler will not reorder it, to allow for the possibility of
+unsizing. E.g., `struct Foo { x: u16, y: u32 }` and `struct Foo<T> {
+x: u16, y: T }` where `T = u32` are not guaranteed to be identical.
+
+#### Unresolved questions
+
+During the course of the discussion in [#11] and [#12], various
+suggestions arose to limit the compiler's flexibility. These questions
+are currently considering **unresolved** and -- for each of them -- an
+issue has been opened for further discussion on the repository. This
+section documents the questions and gives a few light details, but the
+reader is referred to the issues for further discussion.
+
+**Zero-sized structs ([#37]).** If you have a struct which --
+transitively -- contains no data of non-zero size, then the size of
+that struct will be zero as well. These zero-sized structs appear
+frequently as exceptions in other layout considerations (e.g.,
+single-field structs). An example of such a struct is
+`std::marker::PhantomData`.
+
+[#37]: https://github.com/rust-rfcs/unsafe-code-guidelines/issues/37
+
+**Single-field structs ([#34]).** If you have a struct with single field
+(`struct Foo { x: T }`), should we guarantee that the memory layout of
+`Foo` is identical to the memory layout of `T` (note that ABI details
+around function calls may still draw a distinction, which is why
+`#[repr(transparent)]` is needed). What about zero-sized types like
+`PhantomData`?
+
+[#34]: https://github.com/rust-rfcs/unsafe-code-guidelines/issues/34
+
+**Homogeneous structs ([#36]).** If you have homogeneous structs, where all
+the `N` fields are of a single type `T`, can we guarantee a mapping to
+the memory layout of `[T; N]`? How do we map between the field names
+and the indices? What about zero-sized types?
+
+[#36]: https://github.com/rust-rfcs/unsafe-code-guidelines/issues/36
+
+**Deterministic layout ([#35]).** Can we say that layout is some deterministic
+function of a certain, fixed set of inputs? This would allow you to be
+sure that if you do not alter those inputs, your struct layout would
+not change, even if it meant that you can't predict precisely what it
+will be. For example, we might say that struct layout is a function of
+the struct's generic types and its substitutions, full stop -- this
+would imply that any two structs with the same definition are laid out
+the same. This might interfere with our ability to do profile-guided
+layout or to analyze how a struct is used and optimize based on
+that. Some would call that a feature.
+
+[#35]: https://github.com/rust-rfcs/unsafe-code-guidelines/issues/35
+
+### C-compatible layout ("repr C")
+
+For structs tagged `#[repr(C)]`, the compiler will apply a C-like
+layout scheme. See section 6.7.2.1 of the [C17 specification][C17] for
+a detailed write-up of what such rules entail (as well as the relevant
+specs for your platform). For most platforms, however, this means the
+following:
+
+[C17]: http://www.open-std.org/jtc1/sc22/wg14/www/abq/c17_updated_proposed_fdis.pdf
+
+- Field order is preserved.
+- The first field begins at offset 0.
+- Assuming the struct is not packed, each field's offset is aligned[^aligned] to
+  the ABI-mandated alignment for that field's type, possibly creating
+  unused padding bits.
+- The total size of the struct is rounded up to its overall alignment.  
+
+[^aligned]: Aligning an offset O to an alignment A means to round up the offset O until it is a multiple of the alignment A.
+
+One deviation from C comes about with "empty structs". In Rust, a
+struct that contains (transitively) no data members is considered to
+have size zero, which is not something that exists in C. This includes
+a struct like `#[repr(C)] struct Foo { }`. Further, when a
+`#[repr(C)]` struct has a field whose type has zero-size, that field
+may induce padding due to its alignment, but will not otherwise affect
+the offsets of subsequent fields (as it takes up zero space).
+
+The intention is that if one has a set of C struct declarations and a
+corresponding set of Rust struct declarations, all of which are tagged
+with `#[repr(C)]`, then the layout of those structs will all be
+identical. Note that this setup implies that none of the structs in
+question can contain any `#[repr(Rust)]` structs (or Rust tuples), as
+those would have no corresponding C struct declaration -- as
+`#[repr(Rust)]` types have undefined layout, you cannot safely declare
+their layout in a C program.
+
+See also the notes on [ABI compatibility](#fnabi) under the section on `#[repr(transparent)]`.
+
+### Fixed alignment
+
+The `#[repr(align(N))]` attribute may be used to raise the alignment
+of a struct, as described in [The Rust Reference][TRR-align].
+
+[TRR-align]: (https://doc.rust-lang.org/stable/reference/type-layout.html#the-align-representation).
+
+### Packed layout
+
+The `#[repr(packed(N))]` attribute may be used to impose a maximum
+limit on the alignments for individual fields. It is most commonly
+used with an alignment of 1, which makes the struct as small as
+possible. For example, in a `#[repr(packed(2))]` struct, a `u8` or
+`u16` would be aligned at 1- or 2-bytes respectively (as normal), but
+a `u32` would be aligned at only 2 bytes instead of 4. In the absence
+of an explicit `#[repr(align)]` directive, `#[repr(packed(N))]` also
+sets the alignment for the struct as a whole to N bytes.
+
+The resulting fields may not fall at properly aligned boundaries in
+memory. This makes it unsafe to create a Rust reference (`&T` or `&mut
+T`) to those fields, as the compiler requires that all reference
+values must always be aligned (so that it can use more efficient
+load/store instructions at runtime). See the [Rust reference for more
+details][TRR-packed].
+
+[TRR-packed]: https://doc.rust-lang.org/stable/reference/type-layout.html#the-packed-representation
+
+<a name="fnabi"> </a>
+
+### Function call ABI compatibility
+
+In general, when invoking functions that use the C ABI, `#[repr(C)]`
+structs are guaranteed to be passed in the same way as their
+corresponding C counterpart (presuming one exists). `#[repr(Rust)]`
+structs have no such guarantee. This means that if you have an `extern
+"C"` function, you cannot pass a `#[repr(Rust)]` struct as one of its
+arguments. Instead, one would typically pass `#[repr(C)]` structs (or
+possibly pointers to Rust-structs, if those structs are opaque on the
+other side, or the callee is defined in Rust).
+
+However, there is a subtle point about C ABIs: in some C ABIs, passing
+a struct with one field of type `T` as an argument is **not**
+equivalent to just passing a value of type `T`. So e.g. if you have a
+C function that is defined to take a `uint32_t`:
+
+```C
+void some_function(uint32_t value) { .. }
+```
+
+It is **incorrect** to pass in a struct as that value, even if that
+struct is `#[repr(C)`] and has only one field:
+
+```rust
+#[repr(C)]
+struct Foo { x: u32 }
+
+extern "C" some_function(Foo);
+
+some_function(Foo { x: 22 }); // Bad!
+```
+
+Instead, you should declare the struct with `#[repr(transparent)]`,
+which specifies that `Foo` should use the ABI rules for its field
+type, `u32`. This is useful when using "wrapper structs" in Rust to
+give stronger typing guarantees.
+
+`#[repr(transparent)]` can only be applied to structs with a single
+field whose type `T` has non-zero size, along with some number of
+other fields whose types are all zero-sized (typically
+`std::marker::PhantomData` fields). The struct then takes on the "ABI
+behavior" of the type `T` that has non-zero size.
+
+(Note further that the Rust ABI is undefined and theoretically may
+vary from compiler revision to compiler revision.)
+
+## Unresolved question: Guaranteeing compatible layouts?
+
+One key unresolved question was whether we would want to guarantee
+that two `#[repr(Rust)]` structs whose fields have the same types are
+laid out in a "compatible" way, such that one could be transmuted to
+the other. @rkruppe laid out a [number of
+examples](https://github.com/rust-rfcs/unsafe-code-guidelines/issues/11#issuecomment-419956939)
+where this might be a reasonable thing to expect. As currently
+written, and in an effort to be conservative, we make no such
+guarantee, though we do not firmly rule out doing such a thing in the future.  
+
+It seems like it may well be desirable to -- at minimum -- guarantee
+that `#[repr(Rust)]` layout is "some deterministic function of the
+struct declaration and the monomorphized types of its fields".  Note
+that it is not sufficient to consider the monomorphized type of a
+struct's fields: due to unsizing coercions, it matters whether the
+struct is declared in a generic way or not, since the "unsized" field
+must presently be [laid out last in the
+structure](https://github.com/rust-rfcs/unsafe-code-guidelines/issues/12#issuecomment-417843595). (Note
+that tuples are always coercible (see [#42877] for more information),
+and are always declared as generics.) This implies that our
+"deterministic function" also takes as input the form in which the
+fields are declared in the struct.
+
+However, that rule is not true today. For example, the compiler
+includes an option (called "optimization fuel") that will enable us to
+alter the layout of only the "first N" structs declared in the
+source. When one is accidentally relying on the layout of a structure,
+this can be used to track down the struct that is causing the problem.
+
+[#42877]: https://github.com/rust-lang/rust/issues/42877
+[pg-unsized-tuple]: https://play.rust-lang.org/?gist=46399bb68ac685f23beffefc014203ce&version=nightly&mode=debug&edition=2015
+
+There are also benefits to having fewer guarantees. For example:
+
+- Code hardening tools can be used to randomize the layout of individual structs.
+- Profile-guided optimization might analyze how instances of a
+particular struct are used and tweak the layout (e.g., to insert
+padding and reduce false sharing).
+    - However, there aren't many tools that do this sort of thing
+([1](https://github.com/rust-rfcs/unsafe-code-guidelines/issues/11#issuecomment-420650851),
+[2](https://github.com/rust-rfcs/unsafe-code-guidelines/issues/11#issuecomment-420681763)). Moreover,
+it would probably be better for the tools to merely recommend
+annotations that could be added
+([1](https://github.com/rust-rfcs/unsafe-code-guidelines/issues/11#issuecomment-420077105),
+[2](https://github.com/rust-rfcs/unsafe-code-guidelines/issues/11#issuecomment-420077105)),
+such that the knowledge of the improved layouts can be recorded in the
+source.
+
+As a more declarative alternative, @alercah [proposed a possible
+extension](https://github.com/rust-rfcs/unsafe-code-guidelines/issues/12#issuecomment-420165155)
+that would permit one to declare that the layout of two structs or
+types are compatible (e.g., `#[repr(as(Foo))] struct Bar { .. }`),
+thus permitting safe transmutes (and also ABI compatibility).  One
+might also use some weaker form of `#[repr(C)]` to specify a "more
+deterministic" layout. These areas need future exploration.
+
+## Counteropinions and other notes
+
+@joshtripplet [argued against reordering struct
+fields](https://github.com/rust-rfcs/unsafe-code-guidelines/issues/11#issuecomment-417953576),
+suggesting instead it would be better if users reordering fields
+themselves. However, there are a number of downsides to such a
+proposal (and -- further -- it does not match our existing behavior):
+
+- In a generic struct, the [best ordering of fields may not be known
+  ahead of
+  time](https://github.com/rust-rfcs/unsafe-code-guidelines/issues/11#issuecomment-420659840),
+  so the user cannot do it manually.
+- If layout is defined, and a library exposes a struct with all public
+  fields, then clients may be more likely to assume that the layout of
+  that struct is stable. If they were to write unsafe code that relied
+  on this assumption, that would break if fields were reordered. But
+  libraries may well expect the freedom to reorder fields. This case
+  is weakened because of the requirement to write unsafe code (after
+  all, one can always write unsafe code that relies on virtually any
+  implementation detail); if we were to permit **safe** casts that
+  rely on the layout, then reordering fields would clearly be a
+  breaking change (see also [this
+  comment](https://github.com/rust-rfcs/unsafe-code-guidelines/issues/11#issuecomment-420117856)
+  and [this
+  thread](https://github.com/rust-rfcs/unsafe-code-guidelines/pull/31#discussion_r224955817)).
+- Many people would prefer the name ordering to be chosen for
+  "readability" and not optimal layout.
+
+## Footnotes