Commit

New Section - Type Layout
Havvy committed Nov 17, 2017
1 parent 4b49378 commit d32069e
Showing 7 changed files with 294 additions and 23 deletions.
1 change: 1 addition & 0 deletions src/SUMMARY.md
@@ -61,6 +61,7 @@
- [Type system](type-system.md)
- [Types](types.md)
- [Dynamically Sized Types](dynamically-sized-types.md)
- [Type layout](type-layout.md)
- [Interior mutability](interior-mutability.md)
- [Subtyping](subtyping.md)
- [Type coercions](type-coercions.md)
2 changes: 1 addition & 1 deletion src/attributes.md
@@ -357,7 +357,7 @@ pub mod m3 {
}
```

### Inline attributes
### Inline attribute

The inline attribute suggests that the compiler should place a copy of
the function or static in the caller, rather than generating code to
2 changes: 1 addition & 1 deletion src/dynamically-sized-types.md
@@ -2,7 +2,7 @@

Most types have a fixed size that is known at compile time and implement the
trait [`Sized`][sized]. A type with a size that is known only at run-time is
called a _dynamically sized type_ (_DST_) or (informally) an unsized type.
called a _dynamically sized type_ (_DST_) or, informally, an unsized type.
[Slices] and [trait objects] are two examples of <abbr title="dynamically sized
types">DSTs</abbr>. Such types can only be used in certain cases:

10 changes: 10 additions & 0 deletions src/glossary.md
@@ -5,6 +5,11 @@
An ‘abstract syntax tree’, or ‘AST’, is an intermediate representation of
the structure of the program when the compiler is compiling it.

### Alignment

The *alignment* of a value specifies what addresses are valid to store the value
at.

### Arity

Arity refers to the number of arguments a function or operation takes.
@@ -57,6 +62,11 @@ can create such an lvalue without initializing it.
Prelude, or The Rust Prelude, is a small collection of items - mostly traits - that are
imported into every module of every crate. The traits in the prelude are pervasive.

### Size

The *size* of a value is the offset in bytes between successive elements in an
array with that item type including alignment padding.

### Slice

A slice is dynamically-sized view into a contiguous sequence, written as `[T]`.
52 changes: 33 additions & 19 deletions src/items/enumerations.md
@@ -4,8 +4,6 @@ An _enumeration_ is a simultaneous definition of a nominal [enumerated type] as
well as a set of *constructors*, that can be used to create or pattern-match
values of the corresponding enumerated type.

[enumerated type]: types.html#enumerated-types

Enumerations are declared with the keyword `enum`.

An example of an `enum` item and its use:
@@ -24,7 +22,7 @@ Enumeration constructors can have either named or unnamed fields:

```rust
enum Animal {
    Dog (String, f64),
    Dog(String, f64),
    Cat { name: String, weight: f64 },
}

@@ -34,36 +32,52 @@ a = Animal::Cat { name: "Spotty".to_string(), weight: 2.7 };

In this example, `Cat` is a _struct-like enum variant_, whereas `Dog` is simply
called an enum variant. Each enum instance has a _discriminant_ which is an
integer associated to it that is used to determine which variant it holds.
integer associated to it that is used to determine which variant it holds. An
opaque reference to this discriminant can be obtained with the
[`mem::discriminant`] function.
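As a minimal sketch of how `mem::discriminant` can be used, the example below compares two values by variant while ignoring their fields. The helper name `same_variant` is made up for illustration and is not part of the standard library.

```rust
use std::mem;

#[allow(dead_code)]
enum Animal {
    Dog(String, f64),
    Cat { name: String, weight: f64 },
}

/// Returns `true` when both values hold the same variant, ignoring field data.
/// (`same_variant` is a hypothetical helper, not a standard library function.)
fn same_variant(a: &Animal, b: &Animal) -> bool {
    mem::discriminant(a) == mem::discriminant(b)
}

fn main() {
    let rex = Animal::Dog("Rex".to_string(), 30.0);
    let fido = Animal::Dog("Fido".to_string(), 12.5);
    let spotty = Animal::Cat { name: "Spotty".to_string(), weight: 2.7 };

    // Same variant, different field values.
    assert!(same_variant(&rex, &fido));
    // Different variants.
    assert!(!same_variant(&rex, &spotty));
}
```

The value returned by `mem::discriminant` is opaque: it can be compared and hashed, but it is not the integer discriminant itself.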

## C-like Enumerations

If there is no data attached to *any* of the variants of an enumeration it is
called a *c-like enumeration*. If a discriminant isn't specified, they start at
zero, and add one for each variant, in order. Each enum value is just its
discriminant which you can specify explicitly:
If there is no data attached to *any* of the variants of an enumeration and
there is at least one variant then it is called a *c-like enumeration*.

C-like enumerations can be cast to integer types with the `as` operator by a
[numeric cast]. The enumeration can optionally specify which integer each
discriminant gets by following the variant name with `=` and then an integer
literal. If the first variant in the declaration is unspecified, then it is set
to zero. Every other unspecified discriminant is set to one higher than the
discriminant of the previous variant in the declaration.

```rust
enum Foo {
    Bar, // 0
    Baz = 123,
    Baz = 123, // 123
    Quux, // 124
}

let baz_discriminant = Foo::Baz as u32;
assert_eq!(baz_discriminant, 123u32);
```

The right hand side of the specification is interpreted as an `isize` value,
but the compiler is allowed to use a smaller type in the actual memory layout.
The [`repr` attribute] can be added in order to change the type of the right
hand side and specify the memory layout.
Under the [default representation], the specified discriminant is interpreted as
an `isize` value although the compiler is allowed to use a smaller type in the
actual memory layout. The size and thus acceptable values can be changed by
using a [primitive representation] or the [`C` representation].

It is an error when two variants share the same discriminant. It is also an
error when a variant's discriminant is unspecified and the previous variant's
discriminant is already the maximum value for the size of the discriminant.
<!-- Need examples here. -->
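The two error cases can be sketched as follows. The enum names are made up for illustration. The commented-out declarations are the ones the compiler rejects, so only the last declaration is live code.

```rust
// Rejected: two variants share the discriminant 1.
// enum SharedDiscriminant {
//     SharedA = 1,
//     SharedB = 1,
// }

// Rejected: `Overflow` would need the discriminant 256, which does not fit
// in the `u8` requested by the primitive representation.
// #[repr(u8)]
// enum OverflowingDiscriminant {
//     Max = 255,
//     Overflow,
// }

// Accepted: every discriminant is distinct and fits in a `u8`.
#[repr(u8)]
enum Fits {
    Low = 254,
    High, // implicitly 255
}

fn main() {
    assert_eq!(Fits::Low as u8, 254);
    assert_eq!(Fits::High as u8, 255);
}
```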

[`repr` attribute]: attributes.html#ffi-attributes
## Zero-variant Enumerations

You can also cast a c-like enum to get its discriminant:
Enums with zero variants are known as *zero-variant enumerations*. As they have
no valid values, they cannot be instantiated.

```rust
# enum Foo { Baz = 123 }
let x = Foo::Baz as u32; // x is now 123u32
enum ZeroVariants {}
```

This only works as long as none of the variants have data attached. If it were
`Baz(i32)`, this is disallowed.
[enumerated type]: types.html#enumerated-types
[`mem::discriminant`]: std/mem/fn.discriminant.html
[numeric cast]: expressions/operator-expr.html#semantics
[`repr` attribute]: attributes.html#ffi-attributes
246 changes: 246 additions & 0 deletions src/type-layout.md
@@ -0,0 +1,246 @@
# Type Layout

The layout of a type is its size and alignment, together with the offsets of
any fields and discriminants in values of that type.

**PR NOTE: This doesn't include valid values. E.g. `bool` and `i8` have the
same layout under this definition. Nor does it include calling convention
differences, so `u8` and `#[repr(C)] struct S { f: u8 }` have the same layout,
as do `*T` and `&T`. I'm not sure if it should or not.**

A specific release of the compiler always lays out a given type the same way,
but new versions of the compiler are free to choose different layouts. Instead
of trying to document exactly what is done, we only document what is guaranteed
today.

## Size and Alignment

All values have an alignment and size.

The *alignment* of a value specifies what addresses are valid to store the value
at. A value of alignment `n` must only be stored at an address that is a
multiple of `n`. For example, a value with an alignment of 2 must be stored at an
even address, while a value with an alignment of 1 can be stored at any address.
Alignment is measured in bytes; it must be at least 1 and is always a power of 2.
The alignment of a value can be checked with the [`align_of_val`] function.

The *size* of a value is the offset in bytes between successive elements in an
array with that item type including alignment padding. The size of a value is
always a multiple of its alignment. The size of a value can be checked with the
[`size_of_val`] function.

Types where all values have the same size and alignment known at compile time
implement the [`Sized`] trait and can be checked with the [`size_of`] and
[`align_of`] functions. Types that are not [`Sized`] are known as [dynamically
sized types]. Since all values of a `Sized` type share the same size and
alignment, we refer to those shared values as the size of the type and the
alignment of the type respectively.
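The following sketch exercises the four functions named above. The helper `layout_invariants_hold` is a made-up name. It only checks the two invariants stated in this section: the alignment is a power of two, and the size is a multiple of it.

```rust
use std::mem::{align_of, align_of_val, size_of, size_of_val};

/// Checks the invariants stated above for one value: the alignment is a power
/// of two (and therefore at least 1), and the size is a multiple of it.
/// (`layout_invariants_hold` is a hypothetical helper.)
fn layout_invariants_hold<T: ?Sized>(value: &T) -> bool {
    let size = size_of_val(value);
    let align = align_of_val(value);
    align.is_power_of_two() && size % align == 0
}

fn main() {
    let n = 7u32;
    // For a `Sized` type, the per-value and per-type queries agree.
    assert_eq!(size_of_val(&n), size_of::<u32>());
    assert_eq!(align_of_val(&n), align_of::<u32>());

    // A slice is dynamically sized: its size depends on the value.
    let short: &[u16] = &[1, 2];
    let long: &[u16] = &[1, 2, 3, 4, 5];
    assert_eq!(size_of_val(short), 2 * size_of::<u16>());
    assert_eq!(size_of_val(long), 5 * size_of::<u16>());

    assert!(layout_invariants_hold(&n));
    assert!(layout_invariants_hold(short));
    assert!(layout_invariants_hold("a str is unsized too"));
}
```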

## Primitive Data Layout

The size of most primitives is given in this table.

Type | `size_of::<Type>()`
- | -
bool | 1
u8 | 1
u16 | 2
u32 | 4
u64 | 8
i8 | 1
i16 | 2
i32 | 4
i64 | 8
f32 | 4
f64 | 8
char | 4

`usize` and `isize` have a size big enough to contain every address on the
target platform. For example, on a 32 bit target, this is 4 bytes and on a 64
bit target, this is 8 bytes.

Most primitives are aligned to their size, although this is platform-specific
behavior. In particular, on x86 `u64` and `f64` may be aligned to only 32 bits.
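The table can be checked on the current target with a short program. This is only a sketch: `primitive_sizes` is a made-up helper that pairs each type name from the table with the result of `size_of`. Alignment is deliberately not asserted, since it is platform-specific.

```rust
use std::mem::size_of;

/// Pairs each primitive type name from the table with its size as reported by
/// `size_of`. (`primitive_sizes` is a hypothetical helper.)
fn primitive_sizes() -> Vec<(&'static str, usize)> {
    vec![
        ("bool", size_of::<bool>()),
        ("u8", size_of::<u8>()),
        ("u16", size_of::<u16>()),
        ("u32", size_of::<u32>()),
        ("u64", size_of::<u64>()),
        ("i8", size_of::<i8>()),
        ("i16", size_of::<i16>()),
        ("i32", size_of::<i32>()),
        ("i64", size_of::<i64>()),
        ("f32", size_of::<f32>()),
        ("f64", size_of::<f64>()),
        ("char", size_of::<char>()),
    ]
}

fn main() {
    // Sizes in the same order as the table above.
    let expected: [usize; 12] = [1, 1, 2, 4, 8, 1, 2, 4, 8, 4, 8, 4];
    for ((name, size), want) in primitive_sizes().into_iter().zip(expected) {
        assert_eq!(size, want, "unexpected size for {}", name);
    }

    // `usize` and `isize` have the same size as a pointer to a sized type.
    assert_eq!(size_of::<usize>(), size_of::<*const u8>());
    assert_eq!(size_of::<isize>(), size_of::<usize>());
}
```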

## Pointers and References Layout

Pointers and references have the same layout. Mutability of the pointer or
reference does not change the layout.

Pointers to sized types have the same size and alignment as `usize`.

Pointers to unsized types are sized. Their size and alignment are guaranteed to
be at least equal to the size and alignment of a pointer.
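A minimal sketch of these guarantees follows. The helper `pointer_size` is a made-up name. The checks for pointers to unsized types use `>=` on purpose, because only a lower bound is guaranteed.

```rust
use std::fmt::Debug;
use std::mem::{align_of, size_of};

/// Size in bytes of a raw pointer to `T`, where `T` may be unsized.
/// (`pointer_size` is a hypothetical helper.)
fn pointer_size<T: ?Sized>() -> usize {
    size_of::<*const T>()
}

fn main() {
    let word = size_of::<usize>();

    // Pointers and references to sized types match `usize`, and mutability
    // does not change the layout.
    assert_eq!(pointer_size::<u8>(), word);
    assert_eq!(size_of::<&u64>(), word);
    assert_eq!(size_of::<&mut u64>(), word);
    assert_eq!(size_of::<*mut u64>(), word);
    assert_eq!(align_of::<&u64>(), align_of::<usize>());

    // Pointers to unsized types are themselves sized, and at least as large
    // as a pointer to a sized type.
    assert!(pointer_size::<[u8]>() >= word);
    assert!(pointer_size::<str>() >= word);
    assert!(pointer_size::<dyn Debug>() >= word);
}
```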

> Note: Though you should not rely on this, all pointers to <abbr
> title="Dynamically Sized Types">DSTs</abbr> are currently twice the size of
> `usize` and have the same alignment.

## Array Layout

Arrays are laid out so that the element at zero-based index `i` is offset from
the start of the array by `i * size_of::<T>()` bytes. An array of type `[T; n]`
has a size of `size_of::<T>() * n` and the same alignment as `T`.
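The offsets can be observed directly with pointer arithmetic. In this sketch, `element_offset` is a made-up helper that measures how far an element is from the start of a slice.

```rust
use std::mem::{align_of, size_of, size_of_val};

/// Byte offset of the element at `index` from the start of `elements`.
/// (`element_offset` is a hypothetical helper.)
fn element_offset<T>(elements: &[T], index: usize) -> usize {
    let base = elements.as_ptr() as usize;
    let element = &elements[index] as *const T as usize;
    element - base
}

fn main() {
    let array: [u32; 4] = [10, 20, 30, 40];

    // Size is element size times length; alignment is that of the element.
    assert_eq!(size_of::<[u32; 4]>(), size_of::<u32>() * 4);
    assert_eq!(align_of::<[u32; 4]>(), align_of::<u32>());

    // Each element sits at `index * size_of::<u32>()` bytes from the start.
    for index in 0..array.len() {
        assert_eq!(element_offset(&array[..], index), index * size_of::<u32>());
    }

    // A slice covers exactly the bytes of the elements it refers to.
    let middle = &array[1..3];
    assert_eq!(size_of_val(middle), 2 * size_of::<u32>());
}
```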

## Slice Layout

Slices have the same layout as the section of the array they slice.

## Tuple Layout

Tuples do not have any guarantees about their layout.

The exception to this is the unit tuple (`()`) which is guaranteed as a
zero-sized type to have a size of 0 and an alignment of 1.
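The unit tuple guarantee can be checked directly. The array assertion in this sketch follows from the array size rule, since zero times any length is zero.

```rust
use std::mem::{align_of, size_of};

fn main() {
    // The unit tuple is zero-sized with an alignment of 1.
    assert_eq!(size_of::<()>(), 0);
    assert_eq!(align_of::<()>(), 1);

    // An array of zero-sized elements is itself zero-sized.
    assert_eq!(size_of::<[(); 100]>(), 0);
}
```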

## Trait Object Layout

Trait objects have the same layout as the value the trait object is of.
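One way to observe this is to query a value through a trait object reference. In the sketch below, `erased_layout` is a made-up helper, and `Debug` is used only because many types implement it.

```rust
use std::fmt::Debug;
use std::mem::{align_of, align_of_val, size_of, size_of_val};

/// Size and alignment of the value behind a trait object reference.
/// (`erased_layout` is a hypothetical helper.)
fn erased_layout(value: &dyn Debug) -> (usize, usize) {
    (size_of_val(value), align_of_val(value))
}

fn main() {
    let small: u8 = 1;
    let large: [u64; 4] = [0; 4];

    // The trait object reports the layout of the concrete value it wraps.
    assert_eq!(erased_layout(&small), (size_of::<u8>(), align_of::<u8>()));
    assert_eq!(
        erased_layout(&large),
        (size_of::<[u64; 4]>(), align_of::<[u64; 4]>())
    );
}
```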

## Closure Layout

Closures have no layout guarantees.

## Representations

All **FIXME** types have a *representation* that specifies what the layout
is for the type.

Note: The representation does not depend upon the type's fields or generic
parameters.

The possible representations for a type are the default representation, `C`, the
primitive representations, and `packed`. Multiple representations can be applied
to a single type.

The representation of a type can be changed by applying the [`repr` attribute]
to it. The following example shows a struct with a `C` representation.

```
#[repr(C)]
struct ThreeInts {
    first: i16,
    second: i8,
    third: i32
}
```

The representation of a type does not change the layout of its fields. For
example, a struct with a `C` representation that contains a struct `Inner` with
the default representation will not change the layout of Inner.

### The Default Representation

Nominal types without a `repr` attribute have the default representation.
Informally, this representation is also called the `rust` representation.

There are no guarantees of data layout made by this representation.

### The `C` Representation

The `C` representation is designed for creating types that are interoperable
with the C language and for soundly performing operations that rely on data
layout, such as reinterpreting values as a different type.

This representation can be applied to structs, unions, and enums.

#### \#[repr(C)] Structs

The alignment of the struct is the alignment of the most-aligned field in it.

The size of the struct and the offsets of its fields are determined by the
following algorithm.

Start with a current offset of 0 bytes.

For each field in declaration order in the struct, first determine the size and
alignment of the field. If the current offset is not a multiple of the field's
alignment, then add padding bytes increasing the current offset until the
current offset is a multiple of the field's alignment. The offset for the field
is what the current offset is now. Then increase the current offset by the size
of the field.

Finally, the size of the struct is the current offset rounded up to the nearest
multiple of the struct's alignment.
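The algorithm above can be sketched as a small function. This is an illustration under stated assumptions, not the compiler's implementation: `FieldLayout`, `field`, `round_up`, and `repr_c_layout` are made-up names, and the result is compared against a `#[repr(C)]` struct by measuring field addresses.

```rust
use std::mem::{align_of, size_of};

/// Size and alignment of a single field, in bytes (hypothetical helper type).
#[derive(Clone, Copy)]
struct FieldLayout {
    size: usize,
    align: usize,
}

/// Describes a field of type `T`.
fn field<T>() -> FieldLayout {
    FieldLayout { size: size_of::<T>(), align: align_of::<T>() }
}

/// Rounds `offset` up to the nearest multiple of `align` (`align` is non-zero).
fn round_up(offset: usize, align: usize) -> usize {
    (offset + align - 1) / align * align
}

/// Returns `(field offsets, struct size, struct alignment)` for fields given
/// in declaration order.
fn repr_c_layout(fields: &[FieldLayout]) -> (Vec<usize>, usize, usize) {
    let mut struct_align: usize = 1;
    let mut offset: usize = 0;
    let mut offsets = Vec::with_capacity(fields.len());
    for f in fields {
        // The struct is as aligned as its most-aligned field.
        struct_align = struct_align.max(f.align);
        // Add padding until the offset is a multiple of the field's alignment.
        offset = round_up(offset, f.align);
        offsets.push(offset);
        offset += f.size;
    }
    // The final size is rounded up to the struct's alignment.
    (offsets, round_up(offset, struct_align), struct_align)
}

#[repr(C)]
struct ThreeInts {
    first: i16,
    second: i8,
    third: i32,
}

fn main() {
    let (offsets, size, align) =
        repr_c_layout(&[field::<i16>(), field::<i8>(), field::<i32>()]);
    assert_eq!(size, size_of::<ThreeInts>());
    assert_eq!(align, align_of::<ThreeInts>());

    // Compare the computed offsets with the addresses of the real fields.
    let value = ThreeInts { first: 1, second: 2, third: 3 };
    let base = &value as *const ThreeInts as usize;
    assert_eq!((&value.first as *const i16 as usize) - base, offsets[0]);
    assert_eq!((&value.second as *const i8 as usize) - base, offsets[1]);
    assert_eq!((&value.third as *const i32 as usize) - base, offsets[2]);
}
```

With an empty field list the function returns a size of 0 and an alignment of 1.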

> Note: This algorithm can produce zero-sized structs. This differs from
> standard C, where a struct declaration without any fields is not permitted,
> and from C++, where such structs have a size of one byte.

#### \#[repr(C)] Unions

A [union] declared with `#[repr(C)]` will have the same size and alignment as an
equivalent C union declaration in the C language for the target platform.
Usually, a union has the size of its largest field, rounded up to a multiple of
its alignment, and the alignment of its most-aligned field. These maximums may
come from different fields.

```
#[repr(C)]
union Union {
    f1: u16,
    f2: [u8; 4],
}

assert_eq!(std::mem::size_of::<Union>(), 4);  // From f2
assert_eq!(std::mem::align_of::<Union>(), 2); // From f1
```

#### \#[repr(C)] Enums

For [C-like enumerations], the `C` representation has the size and alignment of
the default `enum` size and alignment for the target platform's C ABI.

> Note: The enum representation in C is implementation defined, so this is
> really a "best guess". In particular, this may be incorrect when the C code
> of interest is compiled with certain flags.

> Warning: There are crucial differences between an `enum` in the C language and
> Rust's C-like enumerations with this representation. An `enum` in C is
> mostly a `typedef` plus some named constants; in other words, an object of an
> `enum` type can hold any integer value. For example, this is often used for
> bitflags in C. In contrast, Rust’s C-like enumerations can only legally hold
> the discriminant values; everything else is undefined behaviour. Therefore,
> using a C-like enumeration in FFI to model a C `enum` is often wrong.

It is an error for [zero-variant enumerations] to have the `C` representation.

For all other enumerations, the layout is unspecified.

### Primitive representations

The *primitive representations* are the representations with the same names as
the primitive integer types. That is: `u8`, `u16`, `u32`, `u64`, `usize`, `i8`,
`i16`, `i32`, `i64`, and `isize`.

Primitive representations can only be applied to enumerations.

For [C-like enumerations], they set the size and alignment to be the same as the
primitive type of the same name. For example, a C-like enumeration with a `u8`
representation can only have discriminants between 0 and 255 inclusive.
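A short sketch of primitive representations on C-like enumerations follows. The enum names `Small` and `Wide` are made up for illustration.

```rust
use std::mem::{align_of, size_of};

#[repr(u8)]
#[allow(dead_code)]
enum Small {
    Zero = 0,
    Max = 255, // the largest discriminant a `u8` representation allows
}

#[repr(i32)]
#[allow(dead_code)]
enum Wide {
    Negative = -1,
    Positive = 1,
}

fn main() {
    // Size and alignment match the primitive type of the same name.
    assert_eq!(size_of::<Small>(), size_of::<u8>());
    assert_eq!(align_of::<Small>(), align_of::<u8>());
    assert_eq!(size_of::<Wide>(), size_of::<i32>());
    assert_eq!(align_of::<Wide>(), align_of::<i32>());

    // Casting yields the specified discriminants.
    assert_eq!(Small::Max as u8, 255);
    assert_eq!(Wide::Negative as i32, -1);
}
```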

It is an error for [zero-variant enumerations] to have a primitive
representation.

For all other enumerations, the layout is unspecified.

### The `packed` Representation

The `packed` representation can only be used on `struct`s and `union`s.

It modifies the representation (either the default or `C`) by removing any
padding bytes and forcing the alignment of the type to `1`.
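The effect of `packed` can be sketched by comparing two structs with the same fields. The struct names are made up for illustration. The comparison with `Padded` only asserts what holds on any target, since the alignment of `u32` is platform-specific.

```rust
use std::mem::{align_of, size_of};

#[repr(C)]
#[allow(dead_code)]
struct Padded {
    tag: u8,
    value: u32,
}

#[repr(C, packed)]
struct Packed {
    tag: u8,
    value: u32,
}

fn main() {
    // Without padding, the size is just the sum of the field sizes.
    assert_eq!(size_of::<Packed>(), size_of::<u8>() + size_of::<u32>());
    assert_eq!(align_of::<Packed>(), 1);

    // The unpacked struct keeps the alignment of its most-aligned field and
    // is never smaller than the packed one.
    assert_eq!(align_of::<Padded>(), align_of::<u32>());
    assert!(size_of::<Padded>() >= size_of::<Packed>());

    // Copy fields out by value instead of taking references to them, since a
    // reference to a `packed` field may be unaligned.
    let packed = Packed { tag: 1, value: 0xDEAD_BEEF };
    let tag = packed.tag;
    let value = packed.value;
    assert_eq!(tag, 1);
    assert_eq!(value, 0xDEAD_BEEF);
}
```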

> Warning: Dereferencing an unaligned pointer is [undefined behavior] and it is
> possible to [safely create unaligned pointers to `packed` fields][27060].
> Like all ways to create undefined behavior in safe Rust, this is a bug.

[`align_of_val`]: ../std/mem/fn.align_of_val.html
[`size_of_val`]: ../std/mem/fn.size_of_val.html
[`align_of`]: ../std/mem/fn.align_of.html
[`size_of`]: ../std/mem/fn.size_of.html
[`Sized`]: ../std/marker/trait.Sized.html
[dynamically sized types]: dynamically-sized-types.html
[C-like enumerations]: items/enumerations.html#c-like-enumerations
[zero-variant enumerations]: items/enumerations.html#zero-variant-enumerations
[undefined behavior]: behavior-considered-undefined.html
[27060]: https://github.com/rust-lang/rust/issues/27060
4 changes: 2 additions & 2 deletions src/types.md
@@ -146,8 +146,8 @@ let slice: &[i32] = &boxed_array[..];
All elements of arrays and slices are always initialized, and access to an
array or slice is always bounds-checked in safe methods and operators.

The [`Vec<T>`] standard library type provides a heap allocated resizable array
type.
> Note: The [`Vec<T>`] standard library type provides a heap allocated resizable
> array type.

[dynamically sized type]: dynamically-sized-types.html
[`Vec<T>`]: ../std/vec/struct.Vec.html