-
Notifications
You must be signed in to change notification settings - Fork 514
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
New Section - Type Layout #156
Conversation
Nominal/user-defined data types? |
Nominal almost works but doesn't because trait objects are nominal but you can't change the representation on those. User-defined is sort of ambiguous, but could work. |
How about "Opt In Built In Types" |
Hmm, maybe "composite types"? All three are composed of other types, either primitive or themselves composites. Which may go well with other text talking about how their layout depends on the layouts (and values) of their constituent types. |
"user-defined/nominal aggregates"? |
I considered "composite", but tuples are also composite. I could go with user-defined composite types. We can pick a different name later if needed. More importantly than the name, I want thoughts about the definition of type layout here. And generally, am I missing details? Should I put in a blockquote explaining how layout currently works for the default representation? Likewise for the NonZero optimization? Are all of the guarantees actually guaranteed? |
So, umm, I just found that rust-by-example is calling them "custom types". So, which do y'all like better, "custom types" or "user-defined composite types"? Quick straw vote - use the "+1" reactji for "custom" and the "heart" reactji for "user-defined composite types". |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a pile of loose pedantic comments. Some of them may be dismissable as "that's not how we do these docs".
src/glossary.md
Outdated
### Alignment | ||
|
||
The *alignment* of a value specifies what addresses are valid to store the value | ||
at. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is weirdly non-descript? You should at least specify what the constraint is (the type must have an address that is a multiple of the alignment). Size must also be a multiple of alignment. Also loads are relevant for type-punning/deserialization reasons. It may also be UB to simply construct a reference that is misaligned (not clear on where that discussion went). This is because e.g. enum opts may use the fact that references are aligned to do niche-filling tricks (generalization of NonZero).
You may also wish to note the existence of read_unaligned and write_unaligned, and repr(packed). (e.g. link to other docs as "relevant")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Worth nothing: minimum alignment is 1.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you spent at least 10 times as long writing that comment as I did on that glossary entry. Will make these changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Random fact: struct
/ union
(and enum
in Rust) have a per-platform aggregate alignment baseline which may be larger than 1
byte.
For some reason, LLVM allows setting ABI alignment not just preferred alignment for this, and some targets use 0
, some use 8
bits (i.e. 1
byte) for ABI aggregate alignment, despite the two options being identical (you can't have 0
alignment).
It'd be nice to have some assurance that there's no minimum mandated ABI alignment on any platform and that LLVM is just using this for preferred alignment (which mainly exists for when a platform doesn't need something to be aligned, but may perform better, either at runtime or LLVM can choose faster instructions to work with that data).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🔧 into my 📋
src/glossary.md
Outdated
### Size | ||
|
||
The *size* of a value is the offset in bytes between successive elements in an | ||
array with that item type including alignment padding. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is worded awkwardly. I would cut the reference to "alignment padding", at least in this sentence. It's also just "how much memory must be allocated to store a value of that type". e.g. Box<T>
cannot somehow forego the alignment padding, because it's valid to read/write that padding.
Worth Considering: restate that size must be a multiple of alignment.
Worth Considering: note that the size may depend on the compiler version (optimizations) or the target (usize, references, etc)
Worth Considering: note that zero-sized types exist.
src/items/enumerations.md
Outdated
zero, and add one for each variant, in order. Each enum value is just its | ||
discriminant which you can specify explicitly: | ||
If there is no data attached to *any* of the variants of an enumeration and | ||
there is at least one variant then it is called a *c-like enumeration*. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does anyone actually call enums "enumerations"? Like yeah it's technically short for it etymologically, but you don't get to call every Alex "Alexander".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Umm...looks at others for opinions on this. I personally prefer spelling out enumeration, but if we want to officially call them enum
s, that's cool too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://doc.rust-lang.org/book/second-edition/ch06-00-enums.html
In this chapter we’ll look at enumerations, also referred to as enums. Enums allow you to define a type by enumerating its possible values.
we then say "enums" every time after
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll add the "also referred as enums" into the reference, and likewise use "enums" every time after.
src/items/enumerations.md
Outdated
Quux, // 124 | ||
} | ||
|
||
let baz_discriminant = Foo::Baz as u32; | ||
assert_eq!(baz_discriminant, 123u32); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Redundant suffix?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably. I don't like eliding suffixes personally. Is there a style rule against it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'm not 100% sure but i def don't add it unless needed
The [`repr` attribute] can be added in order to change the type of the right | ||
hand side and specify the memory layout. | ||
Under the [default representation], the specified discriminant is interpreted as | ||
an `isize` value although the compiler is allowed to use a smaller type in the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TIL! (weird)
src/type-layout.md
Outdated
alignment of the field. If the current offset is not a multiple of the field's | ||
alignment, then add padding bytes increasing the current offset until the | ||
current offset is a multiple of the field's alignment. The offset for the field | ||
is what the current offset is now. Then increase the current offset by the size |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"The offset for the field is..." seems superfluous. idk. consider rewording.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't feel that it's superfluous - just overly explicit.
I was considering turning this into psuedocode. As psuedocode, this line would be field.offset = current_offset
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
algorithmic desc sounds really good!
src/type-layout.md
Outdated
|
||
A union declared with `#[repr(C)]` will have the same size and alignment as an | ||
equivalent C union declaration in the C language for the target platform. | ||
Usually, a union would have the maximum size of the maximum size of all of its |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we remove this usually? When isn't it true? This is important for low-level Rust code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know enough C to answer this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#[repr(C)]
union A {
a: u32,
b: [u16; 3],
}
has size 8.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh yes, this is a good point. It should be, "Has an alignment equal to the maximum alignment of its fields. Has a size equal to the maximum size of its fields, rounded up to its alignment."
src/type-layout.md
Outdated
equivalent C union declaration in the C language for the target platform. | ||
Usually, a union would have the maximum size of the maximum size of all of its | ||
fields, and the maximum alignment of the maximum alignment of all of its fields. | ||
These maximums may come from different fields. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
doubled "maximum size" and "maximum alignment" here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maximum size of the union vs. maximum size of a field of the union. It looks silly, but it's wrong to remove either "maximum".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand the notion of a union having a "maximum" size. It just has a size?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, yeah. Right. Will fix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also we don't really have a notion of "maximum" alignment anymore? We squashed alignment into one number pre 1.0.
It is an error for [zero-variant enumerations] to have a primitive | ||
representation. | ||
|
||
For all other enumerations, the layout is unspecified. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also unspecified: (C, primitive)
, (primitive1, primitive2)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For now, I'm pretending you cannot do that. The first one will come in once your RFC is merged. The second one should probably be an error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's currently only ever been a warning, so I think we should be clear here. Technically not back-compat to make an error.
src/type-layout.md
Outdated
It modifies the representation (either the default or `C`) by removing any | ||
padding bytes and forcing the alignment of the type to `1`. | ||
|
||
> Warning: Dereferencing an unaligned pointer is [undefined behaviour] and is |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and "it" is
src/type-layout.md
Outdated
|
||
Type | `size_of::\<Type>()` | ||
- | - | - | ||
bool | 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is explicitly not stable.
I'll address @gankro's comments on Thursday. Thank you so very much for them. 💟 |
I've procrastinated too much on this, but I now have the changes. They're in their own commit if you want to look at them specifically. |
Rebased because grammar changes caused merge conflict. |
Rebased once again because C-like enum -> field-less enum rename caused merge conflict. |
src/glossary.md
Outdated
|
||
It is a multiple of the alignment, including zero. The size can change | ||
depending on compiler version (as new optimizations are made) and target | ||
platform (as `usize` varies). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is very very very minor, but i'd say "similar to how usize
varies per-platform", as this wording (to me) makes it sound like it's based off of the size of usize.
src/items/enumerations.md
Outdated
|
||
## Zero-variant Enums | ||
|
||
Enums with zero variants are known as *zero-variant enumss*. As they have |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
extra s in enumss
src/type-layout.md
Outdated
struct.size = current_offset + current_offset % struct.alignment; | ||
``` | ||
|
||
> Note: You can have zero-sized structs from this algorithm. This differs from |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would say
Note: This algorithm can produce zero-sized structs.
All three of those changes have been made. They were all good changes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤘
r? @scottmcm @gankro @eddyb - Though if you don't want to, just say so and unsubscribe from the thread.
There's no good term for what the compiler calls ADTs - structs, enums, and unions. I cannot think of one either. As such, there's a FIXME in there.
I'm also choosing a very conservative definition of type layout. I'm not sure if it should be expanded or not, or if the valid values and calling convention changes should be a part or not. Although I don't think they should, I don't have any logical arguments either way. Part of the problem is that layout can probably mean multiple things at the same time. E.g. Layout for allocators is just size and alignment.