Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New Section - Type Layout #156

Merged
merged 3 commits into from
Dec 8, 2017
Merged

New Section - Type Layout #156

merged 3 commits into from
Dec 8, 2017

Conversation

Havvy
Copy link
Contributor

@Havvy Havvy commented Nov 17, 2017

r? @scottmcm @gankro @eddyb - Though if you don't want to, just say so and unsubscribe from the thread.

There's no good term for what the compiler calls ADTs - structs, enums, and unions. I cannot think of one either. As such, there's a FIXME in there.

I'm also choosing a very conservative definition of type layout. I'm not sure if it should be expanded or not, or if the valid values and calling convention changes should be a part or not. Although I don't think they should, I don't have any logical arguments either way. Part of the problem is that layout can probably mean multiple things at the same time. E.g. Layout for allocators is just size and alignment.

@Havvy Havvy added the New Content Missing features or aspects of language not currently documented. label Nov 17, 2017
@eddyb
Copy link
Member

eddyb commented Nov 17, 2017

There's no good term for what the compiler calls ADTs - structs, enums, and unions.

Nominal/user-defined data types?

@Havvy
Copy link
Contributor Author

Havvy commented Nov 17, 2017

Nominal almost works but doesn't because trait objects are nominal but you can't change the representation on those. User-defined is sort of ambiguous, but could work.

@Gankra
Copy link
Contributor

Gankra commented Nov 17, 2017

How about "Opt In Built In Types"

@scottmcm
Copy link
Member

Hmm, maybe "composite types"? All three are composed of other types, either primitive or themselves composites. Which may go well with other text talking about how their layout depends on the layouts (and values) of their constituent types.

@eddyb
Copy link
Member

eddyb commented Nov 17, 2017

"user-defined/nominal aggregates"?

@Havvy
Copy link
Contributor Author

Havvy commented Nov 18, 2017

I considered "composite", but tuples are also composite. I could go with user-defined composite types. We can pick a different name later if needed.

More importantly than the name, I want thoughts about the definition of type layout here.

And generally, am I missing details? Should I put in a blockquote explaining how layout currently works for the default representation? Likewise for the NonZero optimization? Are all of the guarantees actually guaranteed?

@Havvy Havvy changed the title [WIP - Do NOT Merge] New Section - Type Layout New Section - Type Layout Nov 21, 2017
@Havvy Havvy requested a review from steveklabnik November 21, 2017 06:42
@Havvy
Copy link
Contributor Author

Havvy commented Nov 21, 2017

So, umm, I just found that rust-by-example is calling them "custom types". So, which do y'all like better, "custom types" or "user-defined composite types"?

Quick straw vote - use the "+1" reactji for "custom" and the "heart" reactji for "user-defined composite types".

Copy link
Contributor

@Gankra Gankra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a pile of loose pedantic comments. Some of them may be dismissable as "that's not how we do these docs".

src/glossary.md Outdated
### Alignment

The *alignment* of a value specifies what addresses are valid to store the value
at.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is weirdly non-descript? You should at least specify what the constraint is (the type must have an address that is a multiple of the alignment). Size must also be a multiple of alignment. Also loads are relevant for type-punning/deserialization reasons. It may also be UB to simply construct a reference that is misaligned (not clear on where that discussion went). This is because e.g. enum opts may use the fact that references are aligned to do niche-filling tricks (generalization of NonZero).

You may also wish to note the existence of read_unaligned and write_unaligned, and repr(packed). (e.g. link to other docs as "relevant")

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worth nothing: minimum alignment is 1.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you spent at least 10 times as long writing that comment as I did on that glossary entry. Will make these changes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Random fact: struct / union (and enum in Rust) have a per-platform aggregate alignment baseline which may be larger than 1 byte.

For some reason, LLVM allows setting ABI alignment not just preferred alignment for this, and some targets use 0, some use 8 bits (i.e. 1 byte) for ABI aggregate alignment, despite the two options being identical (you can't have 0 alignment).

It'd be nice to have some assurance that there's no minimum mandated ABI alignment on any platform and that LLVM is just using this for preferred alignment (which mainly exists for when a platform doesn't need something to be aligned, but may perform better, either at runtime or LLVM can choose faster instructions to work with that data).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔧 into my 📋

src/glossary.md Outdated
### Size

The *size* of a value is the offset in bytes between successive elements in an
array with that item type including alignment padding.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is worded awkwardly. I would cut the reference to "alignment padding", at least in this sentence. It's also just "how much memory must be allocated to store a value of that type". e.g. Box<T> cannot somehow forego the alignment padding, because it's valid to read/write that padding.

Worth Considering: restate that size must be a multiple of alignment.

Worth Considering: note that the size may depend on the compiler version (optimizations) or the target (usize, references, etc)

Worth Considering: note that zero-sized types exist.

zero, and add one for each variant, in order. Each enum value is just its
discriminant which you can specify explicitly:
If there is no data attached to *any* of the variants of an enumeration and
there is at least one variant then it is called a *c-like enumeration*.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does anyone actually call enums "enumerations"? Like yeah it's technically short for it etymologically, but you don't get to call every Alex "Alexander".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Umm...looks at others for opinions on this. I personally prefer spelling out enumeration, but if we want to officially call them enums, that's cool too.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://doc.rust-lang.org/book/second-edition/ch06-00-enums.html

In this chapter we’ll look at enumerations, also referred to as enums. Enums allow you to define a type by enumerating its possible values.

we then say "enums" every time after

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add the "also referred as enums" into the reference, and likewise use "enums" every time after.

Quux, // 124
}

let baz_discriminant = Foo::Baz as u32;
assert_eq!(baz_discriminant, 123u32);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Redundant suffix?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably. I don't like eliding suffixes personally. Is there a style rule against it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm not 100% sure but i def don't add it unless needed

The [`repr` attribute] can be added in order to change the type of the right
hand side and specify the memory layout.
Under the [default representation], the specified discriminant is interpreted as
an `isize` value although the compiler is allowed to use a smaller type in the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TIL! (weird)

alignment of the field. If the current offset is not a multiple of the field's
alignment, then add padding bytes increasing the current offset until the
current offset is a multiple of the field's alignment. The offset for the field
is what the current offset is now. Then increase the current offset by the size
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"The offset for the field is..." seems superfluous. idk. consider rewording.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't feel that it's superfluous - just overly explicit.

I was considering turning this into psuedocode. As psuedocode, this line would be field.offset = current_offset.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

algorithmic desc sounds really good!


A union declared with `#[repr(C)]` will have the same size and alignment as an
equivalent C union declaration in the C language for the target platform.
Usually, a union would have the maximum size of the maximum size of all of its
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we remove this usually? When isn't it true? This is important for low-level Rust code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know enough C to answer this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#[repr(C)]
union A {
   a: u32,
   b: [u16; 3],
}

has size 8.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yes, this is a good point. It should be, "Has an alignment equal to the maximum alignment of its fields. Has a size equal to the maximum size of its fields, rounded up to its alignment."

equivalent C union declaration in the C language for the target platform.
Usually, a union would have the maximum size of the maximum size of all of its
fields, and the maximum alignment of the maximum alignment of all of its fields.
These maximums may come from different fields.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doubled "maximum size" and "maximum alignment" here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maximum size of the union vs. maximum size of a field of the union. It looks silly, but it's wrong to remove either "maximum".

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand the notion of a union having a "maximum" size. It just has a size?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, yeah. Right. Will fix.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also we don't really have a notion of "maximum" alignment anymore? We squashed alignment into one number pre 1.0.

It is an error for [zero-variant enumerations] to have a primitive
representation.

For all other enumerations, the layout is unspecified.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also unspecified: (C, primitive), (primitive1, primitive2)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now, I'm pretending you cannot do that. The first one will come in once your RFC is merged. The second one should probably be an error.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's currently only ever been a warning, so I think we should be clear here. Technically not back-compat to make an error.

It modifies the representation (either the default or `C`) by removing any
padding bytes and forcing the alignment of the type to `1`.

> Warning: Dereferencing an unaligned pointer is [undefined behaviour] and is
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and "it" is


Type | `size_of::\<Type>()`
- | - | -
bool | 1
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is explicitly not stable.

@Havvy
Copy link
Contributor Author

Havvy commented Nov 22, 2017

I'll address @gankro's comments on Thursday. Thank you so very much for them. 💟

@Havvy
Copy link
Contributor Author

Havvy commented Nov 27, 2017

I've procrastinated too much on this, but I now have the changes. They're in their own commit if you want to look at them specifically.

@Havvy
Copy link
Contributor Author

Havvy commented Dec 2, 2017

Rebased because grammar changes caused merge conflict.

@Havvy
Copy link
Contributor Author

Havvy commented Dec 5, 2017

Rebased once again because C-like enum -> field-less enum rename caused merge conflict.

src/glossary.md Outdated

It is a multiple of the alignment, including zero. The size can change
depending on compiler version (as new optimizations are made) and target
platform (as `usize` varies).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is very very very minor, but i'd say "similar to how usize varies per-platform", as this wording (to me) makes it sound like it's based off of the size of usize.


## Zero-variant Enums

Enums with zero variants are known as *zero-variant enumss*. As they have
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

extra s in enumss

struct.size = current_offset + current_offset % struct.alignment;
```

> Note: You can have zero-sized structs from this algorithm. This differs from
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say

Note: This algorithm can produce zero-sized structs.

@Havvy
Copy link
Contributor Author

Havvy commented Dec 8, 2017

All three of those changes have been made. They were all good changes.

Copy link
Member

@steveklabnik steveklabnik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤘

@steveklabnik steveklabnik merged commit d812ea2 into rust-lang:master Dec 8, 2017
@Havvy Havvy mentioned this pull request Dec 8, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
New Content Missing features or aspects of language not currently documented.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants