diff --git a/src/SUMMARY.md b/src/SUMMARY.md
index 7551d559b..5464df575 100644
--- a/src/SUMMARY.md
+++ b/src/SUMMARY.md
@@ -61,6 +61,7 @@
 - [Type system](type-system.md)
     - [Types](types.md)
     - [Dynamically Sized Types](dynamically-sized-types.md)
+    - [Type layout](type-layout.md)
     - [Interior mutability](interior-mutability.md)
     - [Subtyping](subtyping.md)
     - [Type coercions](type-coercions.md)
diff --git a/src/attributes.md b/src/attributes.md
index 0930b30c8..627044adb 100644
--- a/src/attributes.md
+++ b/src/attributes.md
@@ -357,7 +357,7 @@ pub mod m3 {
 }
 ```

-### Inline attributes
+### Inline attribute

 The inline attribute suggests that the compiler should place a copy of
 the function or static in the caller, rather than generating code to
diff --git a/src/dynamically-sized-types.md b/src/dynamically-sized-types.md
index b459dcda6..5485ef7a0 100644
--- a/src/dynamically-sized-types.md
+++ b/src/dynamically-sized-types.md
@@ -2,7 +2,7 @@

 Most types have a fixed size that is known at compile time and implement the
 trait [`Sized`][sized]. A type with a size that is known only at run-time is
-called a _dynamically sized type_ (_DST_) or (informally) an unsized type.
+called a _dynamically sized type_ (_DST_) or, informally, an unsized type.
 [Slices] and [trait objects] are two examples of DSTs. Such types can only be
 used in certain cases:

diff --git a/src/glossary.md b/src/glossary.md
index 16f3bc2ef..50d18e757 100644
--- a/src/glossary.md
+++ b/src/glossary.md
@@ -5,6 +5,12 @@
 An ‘abstract syntax tree’, or ‘AST’, is an intermediate representation of
 the structure of the program when the compiler is compiling it.

+### Alignment
+
+The alignment of a value specifies what addresses values are preferred to
+start at. Always a power of two. References to a value must be aligned.
+[More][alignment].
+
 ### Arity

 Arity refers to the number of arguments a function or operator takes.
@@ -69,6 +75,21 @@ Types that can be referred to by a path directly.
Specifically [enums],
 Prelude, or The Rust Prelude, is a small collection of items - mostly traits - that are
 imported into very module of every crate. The traits in the prelude are pervasive.

+### Size
+
+The size of a value has two definitions.
+
+The first is that it is how much memory must be allocated to store that value.
+
+The second is that it is the offset in bytes between successive elements in an
+array with that item type.
+
+It is a multiple of the alignment, including zero. The size can change
+depending on compiler version (as new optimizations are made) and target
+platform (similar to how `usize` varies per-platform).
+
+[More][alignment].
+
 ### Slice

 A slice is dynamically-sized view into a contiguous sequence, written as `[T]`.
@@ -104,6 +125,7 @@ It allows a type to make certain promises about its behavior.
 Generic functions and generic structs can use traits to constrain, or bound,
 the types they accept.

+[alignment]: type-layout.html#size-and-alignment
 [enums]: items/enumerations.html
 [structs]: items/structs.html
 [unions]: items/unions.html
diff --git a/src/items/enumerations.md b/src/items/enumerations.md
index a39b5adf6..2bd007e1b 100644
--- a/src/items/enumerations.md
+++ b/src/items/enumerations.md
@@ -25,11 +25,9 @@
 > _EnumItemDiscriminant_ :
 >    `=` [_Expression_]

-An _enumeration_ is a simultaneous definition of a nominal [enumerated type] as
-well as a set of *constructors*, that can be used to create or pattern-match
-values of the corresponding enumerated type.
-
-[enumerated type]: types.html#enumerated-types
+An *enumeration*, also referred to as an *enum*, is a simultaneous definition of
+a nominal [enumerated type] as well as a set of *constructors*, that can be used
+to create or pattern-match values of the corresponding enumerated type.

 Enumerations are declared with the keyword `enum`.
@@ -45,11 +43,11 @@ let mut a: Animal = Animal::Dog;
 a = Animal::Cat;
 ```

-Enumeration constructors can have either named or unnamed fields:
+Enum constructors can have either named or unnamed fields:

 ```rust
 enum Animal {
-    Dog (String, f64),
+    Dog(String, f64),
     Cat { name: String, weight: f64 },
 }

@@ -59,41 +57,79 @@ a = Animal::Cat { name: "Spotty".to_string(), weight: 2.7 };

 In this example, `Cat` is a _struct-like enum variant_, whereas `Dog` is simply
 called an enum variant. Each enum instance has a _discriminant_ which is an
-integer associated to it that is used to determine which variant it holds.
+integer associated to it that is used to determine which variant it holds. An
+opaque reference to this discriminant can be obtained with the
+[`mem::discriminant`] function.

 ## Custom Discriminant Values for Field-Less Enumerations

 If there is no data attached to *any* of the variants of an enumeration,
 then the discriminant can be directly chosen and accessed.

-If a discriminant isn't specified, they start at zero, and add one for each
-variant, in order. Each enum value is just its discriminant which you can
-specify explicitly:
+These enumerations can be cast to integer types with the `as` operator by a
+[numeric cast]. The enumeration can optionally specify which integer each
+discriminant gets by following the variant name with `=` and then an integer
+literal. If the first variant in the declaration is unspecified, then it is set
+to zero. Every other unspecified discriminant is set to one higher than that of
+the previous variant in the declaration.

 ```rust
 enum Foo {
     Bar,            // 0
-    Baz = 123,
+    Baz = 123,      // 123
     Quux,           // 124
 }
+
+let baz_discriminant = Foo::Baz as u32;
+assert_eq!(baz_discriminant, 123);
 ```

-The right hand side of the specification is interpreted as an `isize` value,
-but the compiler is allowed to use a smaller type in the actual memory layout.
-The [`repr` attribute] can be added in order to change the type of the right
-hand side and specify the memory layout.
+Under the [default representation], the specified discriminant is interpreted as
+an `isize` value although the compiler is allowed to use a smaller type in the
+actual memory layout. The size and thus acceptable values can be changed by
+using a [primitive representation] or the [`C` representation].

-[`repr` attribute]: attributes.html#ffi-attributes
+It is an error when two variants share the same discriminant.

-You can also cast a field-less enum to get its discriminant:
+```rust,ignore
+enum SharedDiscriminantError {
+    SharedA = 1,
+    SharedB = 1
+}

-```rust
-# enum Foo { Baz = 123 }
-let x = Foo::Baz as u32; // x is now 123u32
+enum SharedDiscriminantError2 {
+    Zero,       // 0
+    One,        // 1
+    OneToo = 1  // 1 (collision with previous!)
+}
 ```

-This only works as long as none of the variants have data attached. If it were
-`Baz(i32)`, this is disallowed.
+It is also an error to have an unspecified discriminant where the previous
+discriminant is the maximum value for the size of the discriminant.
+
+```rust,ignore
+#[repr(u8)]
+enum OverflowingDiscriminantError {
+    Max = 255,
+    MaxPlusOne // Would be 256, but that overflows the enum.
+}
+
+#[repr(u8)]
+enum OverflowingDiscriminantError2 {
+    MaxMinusOne = 254, // 254
+    Max,               // 255
+    MaxPlusOne         // Would be 256, but that overflows the enum.
+}
+```
+
+## Zero-variant Enums
+
+Enums with zero variants are known as *zero-variant enums*. As they have
+no valid values, they cannot be instantiated.
+
+```rust
+enum ZeroVariants {}
+```

 [IDENTIFIER]: identifiers.html
 [_Generics_]: items.html#type-parameters
@@ -101,3 +137,7 @@ This only works as long as none of the variants have data attached.
If it were
 [_Expression_]: expressions.html
 [_TupleFields_]: items/structs.html
 [_StructFields_]: items/structs.html
+[enumerated type]: types.html#enumerated-types
+[`mem::discriminant`]: std/mem/fn.discriminant.html
+[numeric cast]: expressions/operator-expr.html#semantics
+[`repr` attribute]: attributes.html#ffi-attributes
diff --git a/src/type-layout.md b/src/type-layout.md
new file mode 100644
index 000000000..6f188937b
--- /dev/null
+++ b/src/type-layout.md
@@ -0,0 +1,287 @@
+# Type Layout
+
+The layout of a type is its size, alignment, and the relative offsets of its
+fields. For enums, how the discriminant is laid out and interpreted is also part
+of type layout.
+
+Type layout can be changed with each compilation. Instead of trying to document
+exactly what is done, we only document what is guaranteed today.
+
+## Size and Alignment
+
+All values have an alignment and size.
+
+The *alignment* of a value specifies what addresses are valid to store the value
+at. A value of alignment `n` must only be stored at an address that is a
+multiple of `n`. For example, a value with an alignment of 2 must be stored at an
+even address, while a value with an alignment of 1 can be stored at any address.
+Alignment is measured in bytes, must be at least 1, and is always a power of 2.
+The alignment of a value can be checked with the [`align_of_val`] function.
+
+The *size* of a value is the offset in bytes between successive elements in an
+array with that item type including alignment padding. The size of a value is
+always a multiple of its alignment. The size of a value can be checked with the
+[`size_of_val`] function.
+
+Types where all values have the same size and alignment known at compile time
+implement the [`Sized`] trait and can be checked with the [`size_of`] and
+[`align_of`] functions. Types that are not [`Sized`] are known as [dynamically
+sized types].
+Since all values of a `Sized` type share the same size and
+alignment, we refer to those shared values as the size of the type and the
+alignment of the type respectively.
+
+## Primitive Data Layout
+
+The size of most primitives is given in this table.
+
+Type | `size_of::<Type>()`
+- | -
+u8 | 1
+u16 | 2
+u32 | 4
+u64 | 8
+i8 | 1
+i16 | 2
+i32 | 4
+i64 | 8
+f32 | 4
+f64 | 8
+char | 4
+
+`usize` and `isize` have a size big enough to contain every address on the
+target platform. For example, on a 32 bit target, this is 4 bytes and on a 64
+bit target, this is 8 bytes.
+
+Most primitives are aligned to their size, although this is
+platform-specific behavior. In particular, on x86 `u64` and `f64` are only
+aligned to 32 bits.
+
+## Pointers and References Layout
+
+Pointers and references have the same layout. Mutability of the pointer or
+reference does not change the layout.
+
+Pointers to sized types have the same size and alignment as `usize`.
+
+Pointers to unsized types are sized. The size and alignment are guaranteed to be
+at least equal to the size and alignment of a pointer.
+
+> Note: Though you should not rely on this, all pointers to <abbr
+> title="Dynamically Sized Types">DSTs</abbr> are currently twice the size of
+> `usize` and have the same alignment.
+
+## Array Layout
+
+Arrays are laid out so that the `nth` element of the array is offset from the
+start of the array by `n * the size of the type` bytes. An array of `[T; n]`
+has a size of `size_of::<T>() * n` and the same alignment as `T`.
+
+## Slice Layout
+
+Slices have the same layout as the section of the array they slice.
+
+> Note: This is about the raw `[T]` type, not pointers (`&[T]`, `Box<[T]>`,
+> etc.) to slices.
+
+## Tuple Layout
+
+Tuples do not have any guarantees about their layout.
+
+The exception to this is the unit tuple (`()`) which is guaranteed as a
+zero-sized type to have a size of 0 and an alignment of 1.
+
+## Trait Object Layout
+
+Trait objects have the same layout as the value the trait object is of.
+
+> Note: This is about the raw trait object types, not pointers (`&Trait`,
+> `Box<Trait>`, etc.) to trait objects.
+
+## Closure Layout
+
+Closures have no layout guarantees.
+
+## Representations
+
+All user-defined composite types (`struct`s, `enum`s, and `union`s) have a
+*representation* that specifies what the layout is for the type.
+
+The possible representations for a type are the default representation, `C`, the
+primitive representations, and `packed`. Multiple representations can be applied
+to a single type.
+
+The representation of a type can be changed by applying the [`repr` attribute]
+to it. The following example shows a struct with a `C` representation.
+
+```
+#[repr(C)]
+struct ThreeInts {
+    first: i16,
+    second: i8,
+    third: i32
+}
+```
+
+> Note: As a consequence of the representation being an attribute on the item,
+> the representation does not depend on generic parameters. Any two types with
+> the same name have the same representation. For example, `Foo<Bar>` and
+> `Foo<Baz>` both have the same representation.
+
+The representation of a type does not change the layout of its fields. For
+example, a struct with a `C` representation that contains a struct `Inner` with
+the default representation will not change the layout of `Inner`.
+
+### The Default Representation
+
+Nominal types without a `repr` attribute have the default representation.
+Informally, this representation is also called the `rust` representation.
+
+There are no guarantees of data layout made by this representation.
+
+### The `C` Representation
+
+The `C` representation is designed for dual purposes. One purpose is for
+creating types that are interoperable with the C language. The second purpose is
+to create types on which you can soundly perform operations that rely on data
+layout such as reinterpreting values as a different type.
+
+Because of this dual purpose, it is possible to create types that are not useful
+for interfacing with the C programming language.
+
+This representation can be applied to structs, unions, and enums.
+
+#### \#[repr(C)] Structs
+
+The alignment of the struct is the alignment of the most-aligned field in it.
+
+The size of the struct and the offsets of its fields are determined by the
+following algorithm.
+
+Start with a current offset of 0 bytes.
+
+For each field in declaration order in the struct, first determine the size and
+alignment of the field. If the current offset is not a multiple of the field's
+alignment, then add padding bytes to the current offset until it is a multiple
+of the field's alignment. The offset for the field is what the current offset
+is now. Then increase the current offset by the size of the field.
+
+Finally, the size of the struct is the current offset rounded up to the nearest
+multiple of the struct's alignment.
+
+Here is this algorithm described in pseudocode.
+
+```rust,ignore
+struct.alignment = struct.fields().map(|field| field.alignment).max();
+
+let mut current_offset = 0;
+
+for field in struct.fields_in_declaration_order() {
+    // Increase the current offset so that it's a multiple of the alignment
+    // of this field. For the first field, this will always be zero.
+    // The skipped bytes are called padding bytes.
+    let misalignment = current_offset % field.alignment;
+    if misalignment > 0 {
+        current_offset += field.alignment - misalignment;
+    }
+
+    struct[field].offset = current_offset;
+
+    current_offset += field.size;
+}
+
+// Round the size up to the nearest multiple of the struct's alignment.
+let misalignment = current_offset % struct.alignment;
+struct.size = if misalignment > 0 {
+    current_offset + struct.alignment - misalignment
+} else {
+    current_offset
+};
+```
+
+> Note: This algorithm can produce zero-sized structs. This differs from C++,
+> where structs without data still have a size of one byte. Standard C does not
+> allow structs without members at all.
+
+#### \#[repr(C)] Unions
+
+A union declared with `#[repr(C)]` will have the same size and alignment as an
+equivalent C union declaration in the C language for the target platform.
+The union will have a size of the maximum size of all of its fields rounded up
+to its alignment, and an alignment of the maximum alignment of all of its fields.
+These maximums may come from different fields.
+
+```
+#[repr(C)]
+union Union {
+    f1: u16,
+    f2: [u8; 4],
+}
+
+assert_eq!(std::mem::size_of::<Union>(), 4);  // From f2
+assert_eq!(std::mem::align_of::<Union>(), 2); // From f1
+
+#[repr(C)]
+union SizeRoundedUp {
+    a: u32,
+    b: [u16; 3],
+}
+
+assert_eq!(std::mem::size_of::<SizeRoundedUp>(), 8);  // Size of 6 from b,
+                                                      // rounded up to 8 from
+                                                      // alignment of a.
+assert_eq!(std::mem::align_of::<SizeRoundedUp>(), 4); // From a
+```
+
+#### \#[repr(C)] Enums
+
+For [C-like enumerations], the `C` representation has the size and alignment of
+the default `enum` size and alignment for the target platform's C ABI.
+
+> Note: The enum representation in C is implementation defined, so this is
+> really a "best guess". In particular, this may be incorrect when the C code
+> of interest is compiled with certain flags.
+
+> Warning: There are crucial differences between an `enum` in the C language and
+> Rust's C-like enumerations with this representation. An `enum` in C is
+> mostly a `typedef` plus some named constants; in other words, an object of an
+> `enum` type can hold any integer value. For example, this is often used for
+> bitflags in C. In contrast, Rust’s C-like enumerations can only legally hold
+> the discriminant values; everything else is undefined behaviour. Therefore,
+> using a C-like enumeration in FFI to model a C `enum` is often wrong.
+
+It is an error for [zero-variant enumerations] to have the `C` representation.
+
+For all other enumerations, the layout is unspecified.
+
+Likewise, when the `C` representation is combined with a primitive
+representation, the layout is unspecified.
+
+### Primitive representations
+
+The *primitive representations* are the representations with the same names as
+the primitive integer types.
+That is: `u8`, `u16`, `u32`, `u64`, `usize`, `i8`,
+`i16`, `i32`, `i64`, and `isize`.
+
+Primitive representations can only be applied to enumerations.
+
+For [C-like enumerations], they set the size and alignment to be the same as the
+primitive type of the same name. For example, a C-like enumeration with a `u8`
+representation can only have discriminants between 0 and 255 inclusive.
+
+It is an error for [zero-variant enumerations] to have a primitive
+representation.
+
+For all other enumerations, the layout is unspecified.
+
+Likewise, combining two primitive representations together is unspecified.
+
+### The `packed` Representation
+
+The `packed` representation can only be used on `struct`s and `union`s.
+
+It modifies the representation (either the default or `C`) by removing any
+padding bytes and forcing the alignment of the type to `1`.
+
+> Warning: Dereferencing an unaligned pointer is [undefined behavior] and it is
+> possible to [safely create unaligned pointers to `packed` fields][27060].
+> Like all ways to create undefined behavior in safe Rust, this is a bug.
+
+[`align_of_val`]: ../std/mem/fn.align_of_val.html
+[`size_of_val`]: ../std/mem/fn.size_of_val.html
+[`align_of`]: ../std/mem/fn.align_of.html
+[`size_of`]: ../std/mem/fn.size_of.html
+[`Sized`]: ../std/marker/trait.Sized.html
+[dynamically sized types]: dynamically-sized-types.html
+[C-like enumerations]: items/enumerations.html#c-like-enumerations
+[zero-variant enumerations]: items/enumerations.html#zero-variant-enumerations
+[undefined behavior]: behavior-considered-undefined.html
+[27060]: https://github.com/rust-lang/rust/issues/27060
\ No newline at end of file
diff --git a/src/types.md b/src/types.md
index dc4eb91ca..553a14247 100644
--- a/src/types.md
+++ b/src/types.md
@@ -146,8 +146,8 @@ let slice: &[i32] = &boxed_array[..];
 All elements of arrays and slices are always initialized, and access to an
 array or slice is always bounds-checked in safe methods and operators.
-The [`Vec`] standard library type provides a heap allocated resizable array
-type.
+> Note: The [`Vec`] standard library type provides a heap-allocated resizable
+> array type.

 [dynamically sized type]: dynamically-sized-types.html
 [`Vec`]: ../std/vec/struct.Vec.html
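
The `#[repr(C)]` struct layout algorithm that `src/type-layout.md` describes in prose can be checked with a small runnable sketch. The helper names `padding_needed_for` and `repr_c_layout` are hypothetical and not part of this patch or of the standard library, and giving a struct with no fields an alignment of 1 is an assumption of the sketch. It applies the described steps to the `ThreeInts` example and compares the result with what the compiler reports.

```rust
use std::mem::{align_of, size_of};

/// The `ThreeInts` example struct from the patch.
#[allow(dead_code)]
#[repr(C)]
struct ThreeInts {
    first: i16,
    second: i8,
    third: i32,
}

/// Hypothetical helper: bytes of padding needed to round `offset` up to a
/// multiple of `alignment` (which must be non-zero).
fn padding_needed_for(offset: usize, alignment: usize) -> usize {
    (alignment - offset % alignment) % alignment
}

/// Hypothetical helper: applies the described algorithm to
/// `(size, alignment)` pairs given in declaration order.
/// Returns `(field offsets, struct size, struct alignment)`.
fn repr_c_layout(fields: &[(usize, usize)]) -> (Vec<usize>, usize, usize) {
    // Assumption of this sketch: a struct with no fields has alignment 1.
    let struct_alignment = fields.iter().map(|&(_, a)| a).max().unwrap_or(1);
    let mut offsets = Vec::with_capacity(fields.len());
    let mut current_offset = 0;
    for &(size, alignment) in fields {
        // Skip padding bytes until the offset is a multiple of the alignment.
        current_offset += padding_needed_for(current_offset, alignment);
        offsets.push(current_offset);
        current_offset += size;
    }
    // Round the total up to a multiple of the struct's alignment.
    let size = current_offset + padding_needed_for(current_offset, struct_alignment);
    (offsets, size, struct_alignment)
}

fn main() {
    let fields = [
        (size_of::<i16>(), align_of::<i16>()),
        (size_of::<i8>(), align_of::<i8>()),
        (size_of::<i32>(), align_of::<i32>()),
    ];
    let (offsets, size, alignment) = repr_c_layout(&fields);
    // The computed layout should agree with the compiler's `repr(C)` layout.
    assert_eq!(size, size_of::<ThreeInts>());
    assert_eq!(alignment, align_of::<ThreeInts>());
    println!("offsets = {:?}, size = {}, alignment = {}", offsets, size, alignment);
}
```

Field sizes and alignments are read from `size_of` and `align_of` rather than hard-coded, so the comparison in `main` does not depend on a particular target.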