Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC for structs with unspecified layouts. #79

Merged
merged 8 commits into from
May 21, 2014
Merged

Conversation

huonw
Copy link
Member

@huonw huonw commented May 17, 2014

No description provided.

A fixed layout can be selected with the `#[repr]` attribute

```rust
#[repr(C)]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the whole idea is a good one; +1. Bikeshed: I'm not sure if repr(C) is the right notation - you might want a defined layout for other reasons - maybe repr(fixed)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're going to bikeshed, maybe repr(declaration) or repr(as_written) would be more descriptive?

I think repr(C) is actually OK, because it's specifying that C layout rules should be used (i.e. declaration order), although it could easily be interpreted as "struct for C FFI". There are other possible "fixed" layouts (e.g. sorting by field size, or even alphabetically).

In any case, I don't particularly care about the name.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had the same thought, but then it occurred to me that the only people who will insist on such control over representation would be coming from a C background anyway. I do like the analogy to repr(C) on enums, and it would be a shame to have repr(C) and repr(fixed) be aliases of each other.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think #[repr(C)] is the only thing that makes sense for producing a C-compatible struct. If #[repr(fixed)] would produce something other than #[repr(C)] then it might be worth having, but if it produces the exact same layout as #[repr(C)] then it seems unnecessarily redundant.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good points. repr(C) sounds like the best bet.

@bstrie
Copy link
Contributor

bstrie commented May 17, 2014

I think the precedent of #[repr(C)] on enums is enough to justify this proposal, if it would net us a free performance win for no loss of functionality. A lint sounds ideal for prodding people to remember the attribute when doing FFI.

I misunderstood what the current #[packed] does... I was under the impression that it maintained alignment while reordering the fields as mentioned in "The Lost Art of C Structure Packing" (http://www.catb.org/esr/structure-packing/). The use of "pack" in both places is what confused me. We should grow an attribute that reorders fields as in that link.

If we had such an attribute, would it make sense to make it the default under this proposal? If not, what do you suspect the default would become?

struct layouts, for example,
[the Grsecurity suite](http://grsecurity.net/) of security
enhancements to the Linux kernel provides
[`GRKERNSEC_RANDSTRUCT`](http://en.wikibooks.org/wiki/Grsecurity/Appendix/Grsecurity_and_PaX_Configuration_Options#Randomize_layout_of_sensitive_kernel_structures)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's completely unnecessary if you're confident that you are memory safe, which (modulo compiler bugs) Rust can give you (except unsafe blocks).

IMO C-struct-compatibility is a major selling point of Rust, it should be the default.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you're writing a kernel in Rust, you presumably aren't guaranteeing that all the programs your kernel runs are also written in Rust. To that end, being able to randomize fields sounds plausibly useful.

However, I would imagine it's probably better done by writing a custom item decorator that randomizes the field order (and places whatever #[repr()] attribute is necessary to tell the compiler to use the declaration order). Which is to say, the kernel author can write the necessary extension, Rust doesn't need to provide it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@o11c, it is also a major selling point of Rust to be highly efficient. IMO, this selling point is more important than C interop.

Of course, easy C interop will still be a selling point, but I suspect it is the wrong default. If we stick to our current default, every struct that is not used for C interop will pay the price of unoptimized representation, unless we add an annotation to the struct definition. This violates the notion of "pay for what you use". And given that the number of structs intended for C interop is relatively few, requiring this annotation for the majority of structs would be comparable to the burden of const-correctness in C++.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kballard that is a good point. It's a trivial syntax extension: https://gist.github.com/huonw/be05427dc80e44f1a594

I'll remove randomisation as a reason for this RFC.

@huonw
Copy link
Member Author

huonw commented May 17, 2014

For reference, our current #[packed] just removes all padding-for-alignment. It doesn't reorder fields e.g. if A had #[packed] then it would be 18 bytes and the "full" AFull version wouldn't have any of the _padding fields.

A problem with packing without reordering is some types & architectures strongly prefer certain alignments, meaning unaligning/packing things naively (by just removing the padding without reordering) can lead to unexpectedly poor performance.

We should grow an attribute that reorders fields as in that link.

If we had such an attribute, would it make sense to make it the default under this proposal? If not, what do you suspect the default would become?

This is exactly what this RFC is discussing. :)

@lilyball
Copy link
Contributor

👍

@huonw huonw changed the title RFC for structs with undefined layouts. RFC for structs with unspecified layouts. May 17, 2014
@alexcrichton
Copy link
Member

I expect that the overwhelming majority of structures need not be FFI compatible, but it might be enlightening to canvas rust and servo to see how many structures would need #[repr(C)] (or the equivalent), just to make sure that we're not surprising ourselves. Both projects have a nontrivial amount of FFI invocations, and it's always nice to know what we're getting into.

I would also add discussion of the lint to the detailed design section, I would expect that to be mandatory with a change such as this.

@Valloric
Copy link

While I like this idea in general, I'm worried about the third-party library corner case: I use a struct from a third-party lib and want to pass it to some C code. The struct isn't annotated with #[repr(C)]. Does this mean I'm screwed?

@lilyball
Copy link
Contributor

@Valloric Seems like you'd have the same issue with using a third-party enum.

What third-party library are you envisioning that provides a non-C-compatible struct that you find just happens to match the definition of a C struct? And if you are relying on this, what guarantee do you have that the third-party library won't reorder its struct fields (or add new fields)?

@Valloric
Copy link

What third-party library are you envisioning that provides a non-C-compatible struct that you find just happens to match the definition of a C struct?

The way I'm reading the RFC, one of the reasons why not forcing a C-compatible layout is a good idea is security. Which I'm guessing means the struct layout can change from compilation to compilation (or rustc versions). So if I want to pass this to a different lib written it C++ that exposes a C interface for interacting with my Rust code, that just won't work.

In other words, a non-C-compatible layout that possibly randomizes for security means I can't send a third-party Rust struct to a C interface I've exposed in a different lib.

I think you're assuming the user might be trying to match some pre-existing C struct; that is not necessarily the case.

@Valloric
Copy link

To elaborate on my example a bit more, imagine the following (quite likely) scenario: I'm part of a large team/org and I'm writing my code in Rust. There's a different team with a C++ legacy codebase that accepts inputs from other libraries. They'd like to provide me with an C interface I can call, but the main data struct I want to give them comes from a third-party Rust lib I've integrated and the struct isn't marked as #[repr(C)]. They can't provide me with an interface that will work since the layout of the Rust struct is unreliable.

@Valloric
Copy link

The problem I'm mentioning could be somewhat mitigated if rustc could have a flag that forces C-compatible layout even for structs that aren't marked with #[repr(C)]... but this still doesn't solve the problem completely. Big corps often have to link in binary blobs from vendors and there's no recompiling those on your end.

I can't see an easy way out of this "library composability" problem; this bothers me because I really like the proposal for both the security and perf benefits.

@huonw
Copy link
Member Author

huonw commented May 17, 2014

Isn't that a problem with traits too? You want to store a value in a HashMap, but upstream doesn't implement Hash for it.

@Valloric
Copy link

Isn't that a problem with traits too? You want to store a value in a HashMap, but upstream doesn't implement Hash for it.

While an issue, that sounds more easily work-aroundable than "I just plain can't pass this to a non-Rust library".

@lilyball
Copy link
Contributor

one of the reasons why not forcing a C-compatible layout is a good idea is security

Not really. Only in the context of sensitive data structures in the kernel.

The real benefit is the compiler can reorder your struct to minimize the amount of packing necessary. As long as you don't have a dependency on the explicit layout of the struct (which for us means using it with C FFI), such a change should not affect the behavior of Rust code, except in that it will make the struct use less memory.

So if I want to pass this to a different lib written it C++ that exposes a C interface for interacting with my Rust code, that just won't work.

If the different lib is interacting with your Rust code, then your Rust code must be exposing an extern "C" interface, which if it exposes the struct, means the struct needs to be marked #[repr(C)]. You can't accidentally expose a struct to outside code.

If you really need to vend a third-party library's struct through FFI, then you can write your own compatible struct, marked as #[repr(C)], and initialize it from the third-party struct you're trying to vend. You can then vend this C-compatible struct.

@bstrie
Copy link
Contributor

bstrie commented May 17, 2014

Isn't that a problem with traits too? You want to store a value in a HashMap, but upstream doesn't implement Hash for it.

While an issue, that sounds more easily work-aroundable than "I just plain can't pass this to a non-Rust library".

Nope, the workaround described here by kballard is pretty much the same process in each case. :)

@bstrie
Copy link
Contributor

bstrie commented May 17, 2014

Though @Valloric raises a similar point to what I was about to come in here and say, which is that I'm curious how this would potentially complicate efforts of other languages to call Rust code natively (that is, via a dedicated Rust FFI rather than routing through C).

@lilyball
Copy link
Contributor

@bstrie For a Rust FFI, I would assume the actual layout of the struct has to be defined somewhere, otherwise how can any Rust code use a struct defined in another library? Which is to say, either the rule for reordering fields has to be fixed, or the chosen field order has to be stored in the crate metadata (which seems like the more sensible approach).

Any third-party language attempting to interact with Rust via a Rust FFI would naturally need to be able to read crate metadata, so it can find the field order.

@pczarn
Copy link

pczarn commented May 17, 2014

The only real benefit is optimisation, but not limited to space optimisation. Consider a struct containing SIMD vectors.

#![feature(simd)]
#[simd]
struct Vec2f(f32, f32);
struct Box {
    bone: int,
    offset: f32x4,
    size: f32x4,
    texture_offset: Vec2f,
    children: uint,
}

The compiler could change the layout so that vector fields are optimally aligned. They already are 16 byte aligned.

@huonw
Copy link
Member Author

huonw commented May 17, 2014

I've removed security as a motivating factor (in light of field randomisation being possible with a simple syntax extension), mentioned the lint in more detail and written up the upstream-FFI struct problem under Drawbacks.

@dobkeratops
Copy link

how about changing #[packed] to mean aligned,reordered packing; add #[unpadded] to do what 'packed' does now, and keep the default as it is -
you can always re-order elements manually swell, and the way struct initialisers works protects users from minding that.

@pczarn
Copy link

pczarn commented May 17, 2014

What #[packed] does now could be generalized to an attribute such as #[fields_aligned_to(1)]. Another attribute should exist to specify explicit alignment for individual fields.

@o11c
Copy link
Contributor

o11c commented May 17, 2014

What #[packed] does now could be generalized to an attribute such as #[fields_aligned_to(1)]. Another attribute should exist to specify explicit alignment for individual fields.

When I first started messing with such things, I always thought it silly that in GCC, __attribute__((aligned(N))) will only increase alignment, not decrease it, so you have to use __attribute__((packed)) as well ... but in recent days, I only:

  • use packed without aligned, for dealing with network ABIs and such.
  • use aligned on an array of char for the purposes of placement new

Besides, since __attribute__((packed)) (and #pragma pack) is already a well-known term, I wouldn't suggest reusing it for something else or even using a new term for that.

Adding an aligned attribute is an old-style RFC, rust-lang/rust#4578

I would like to see:

  • a warning for suboptimally packed structures (but not for structures with only tail padding) - bonus points if it can warn for both 32-bit and 64-bit pointer sizes and also not trigger if the total amount of padding is the same as what tail padding would give (e.g. don't warn for {i8, i32} since it's the same amount of padding as {i32, i8}).
  • that report on how many of Rust's/Servo's struct need FFI compatibility.

# Unresolved questions

- How does this interact with binary compatibility of dynamic libraries?
- Should the lint apply to C-compatible functions defined in Rust like
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it should.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1. This is a way better idea than a pervasive default. I care a lot about the compiler maintaining the order I declare my structs in, especially when I'm touching multithreaded code--which is most of my code these days. #[repr(fluid)] might be a better alternative, allowing the compiler to optimize how it sees fit, e.g., alignment when dealing with alignment sensitive platforms. And don't get me started on serialization with this...

@lilyball
Copy link
Contributor

I agree with @o11c, there's enough precedent that #[packed] needs to keep its current meaning.

@mtwilliams
Copy link

@bstrie An acceptable compromise.

@huonw
Copy link
Member Author

huonw commented May 20, 2014

DST probably gets in the way of some of these optimisations, since a type like Rc<T> requires the data field to be at the end, to allow coercing to Rc<Trait>, even if moving the T type before the two reference counts would be more efficient.

@bstrie
Copy link
Contributor

bstrie commented May 20, 2014

Yes, I kept forgetting to mention the DST restriction, but that doesn't mean that optimizations would be wasted on (what I presume to be) the majority of structs that don't contain a DST, nor would it mean that DST-containing structs would be completely unable to benefit from reordering.

@brson
Copy link
Contributor

brson commented May 21, 2014

Accepted as RFC 18.

@bstrie
Copy link
Contributor

bstrie commented May 21, 2014

Is it acceptable for an RFC to be as vague as this one? Even if I support the sentiment, there's no indication of any concrete design, so whose job is it to come up with one? Where does that discussion happen?

@thestinger
Copy link

@bstrie: The important part is making it unspecified by default, and then fiddling with optimizations based on it being unspecified is just internal performance-related work where an RFC isn't usually required.

@pczarn
Copy link

pczarn commented May 21, 2014

I gathered some information about structs defined in Rust's crates.

total shallow padding (bytes) number of structs
0 11151
1 74
2 2
3 102
4 82
5 28
6 75
7 1322
8 6
10 2
11 8
12 2
13 4
14 55
20 2

edit: stopped counting tuples and enums, but my instrumentation still seems unrealistic.

@bstrie
Copy link
Contributor

bstrie commented May 21, 2014

@pczarn, I'd be curious to know more about how your numbers are gathered.

@pczarn
Copy link

pczarn commented May 21, 2014

@bstrie, total padding is the difference between the size of a struct and the sum of the sizes of its fields.

I couldn't get these numbers from (possibly generic) struct definitions in middle::trans::base. I had to write the code elsewhere. Unfortunately, I can't get names of these structs with my code in trans::adt:

diff --git a/src/librustc/middle/trans/adt.rs b/src/librustc/middle/trans/adt.rs
index 9cea6d0..f75c298 100644
--- a/src/librustc/middle/trans/adt.rs
+++ b/src/librustc/middle/trans/adt.rs
@@ -161,6 +161,13 @@ fn represent_type_uncached(cx: &CrateContext, t: ty::t) -> Repr {
             let dtor = ty::ty_dtor(cx.tcx(), def_id).has_drop_flag();
             if dtor { ftys.push(ty::mk_bool()); }

+            let lltys = ftys.iter().map(|&ty| type_of::sizing_type_of(cx, ty)).collect::<Vec<_>>();
+            let llty_rec = Type::struct_(cx, lltys.as_slice(), packed);
+            println!("struct has size {}: {}",
+                machine::llsize_of_alloc(cx, llty_rec),
+                lltys.iter().map(|&typ| machine::llsize_of_alloc(cx, typ)).collect::<Vec<u64>>()
+            );
+
             return Univariant(mk_struct(cx, ftys.as_slice(), packed), dtor)
         }
         ty::ty_enum(def_id, ref substs) => {

(a single line taken from rustc's output: struct has size 48: [4, 1, 8, 16, 16])

@emberian
Copy link
Member

Is this transitive? That is, does it count the padding that the fields of
the struct might have?

On Wed, May 21, 2014 at 7:17 AM, Piotr Czarnecki
[email protected]:

@bstrie https://github.com/bstrie, total padding is the difference
between the size of a struct and the sum of the sizes of its fields.

I didn't manage to get these numbers from (possibly generic) struct
definitions in middle::trans::base. I had to write the code elsewhere.
Unfortunately, I can't get names of these structs with my code:

diff --git a/src/librustc/middle/trans/adt.rs b/src/librustc/middle/trans/adt.rsindex 9cea6d0..f75c298 100644--- a/src/librustc/middle/trans/adt.rs+++ b/src/librustc/middle/trans/adt.rs@@ -161,6 +161,13 @@ fn represent_type_uncached(cx: &CrateContext, t: ty::t) -> Repr {
let dtor = ty::ty_dtor(cx.tcx(), def_id).has_drop_flag();
if dtor { ftys.push(ty::mk_bool()); }

  •        let lltys = ftys.iter().map(|&ty| type_of::sizing_type_of(cx, ty)).collect::<Vec<_>>();+            let llty_rec = Type::struct_(cx, lltys.as_slice(), packed);+            println!("struct has size {}: {}",+                machine::llsize_of_alloc(cx, llty_rec),+                lltys.iter().map(|&typ| machine::llsize_of_alloc(cx, typ)).collect::<Vec<u64>>()+            );+
         return Univariant(mk_struct(cx, ftys.as_slice(), packed), dtor)
     }
     ty::ty_enum(def_id, ref substs) => {
    


Reply to this email directly or view it on GitHubhttps://github.com//pull/79#issuecomment-43759762
.

http://octayn.net/

@pczarn
Copy link

pczarn commented May 21, 2014

@cmr, no, it doesn't care whether any of the fields themselves are packed or not if that's what you mean. However, it could be modified to count the deep padding easily.

Should it count the drop flag? The numbers are quite different without that additional bool.

@emberian
Copy link
Member

Yeah, the drop flag should definitely be counted.

On Wed, May 21, 2014 at 8:22 AM, Piotr Czarnecki
[email protected]:

@cmr https://github.com/cmr, no, it doesn't care whether any of the
fields themselves are packed or not if that's what you mean. However, it
could be modified count the deep padding easily.

Should it count the drop flag? The numbers are quite different without
that additional bool.


Reply to this email directly or view it on GitHubhttps://github.com//pull/79#issuecomment-43769575
.

http://octayn.net/

@emberian
Copy link
Member

(And the "deep" padding too, since that will matter)

On Wed, May 21, 2014 at 10:36 AM, Corey Richardson [email protected] wrote:

Yeah, the drop flag should definitely be counted.

On Wed, May 21, 2014 at 8:22 AM, Piotr Czarnecki <[email protected]

wrote:

@cmr https://github.com/cmr, no, it doesn't care whether any of the
fields themselves are packed or not if that's what you mean. However, it
could be modified count the deep padding easily.

Should it count the drop flag? The numbers are quite different without
that additional bool.


Reply to this email directly or view it on GitHubhttps://github.com//pull/79#issuecomment-43769575
.

http://octayn.net/

http://octayn.net/

@pczarn
Copy link

pczarn commented May 21, 2014

The data is here: https://gist.github.com/pczarn/532a692f105208fcb428

I think I found a way to get rid of duplicates. It's transitive and probably accurate. It reports a few large structs. They are defined in librustc.

The next step is calculating the benefits of layout optimization.

@pczarn
Copy link

pczarn commented May 22, 2014

The following structs need C compatibility.
_libstd_
pthread_mutex_t used
pthread_cond_t used
Unwind_Exception used
_libnative

fd_set
sigaction
sigset_t
_librustuv_
uv_buf_t
uv_stat_t
uv_stdio_container_t
uv_process_options_t
_liblibc_
stat
timespec
utimbuf
glob_t
sockaddr

@SimonSapin
Copy link
Contributor

The RFC talks about the order of fields. Is it safe to assume that it therefore does not affect structs with a single field? In particular, is it safe to transmute between Foo and Bar with struct Foo { data: Bar }?

@emberian
Copy link
Member

emberian commented Dec 7, 2014

Yes, I think that is a sane thing to guarantee. Should clarify that in the RFC.

@Gankra
Copy link
Contributor

Gankra commented Dec 7, 2014

@cmr I am almost certain that some of the other developers have asserted the opposite. That is, that it wouldn't be guaranteed.

@huonw
Copy link
Member Author

huonw commented Dec 7, 2014

It must be guaranteed, since one can take references to the contained struct and all such references need to have the same layout.

withoutboats pushed a commit to withoutboats/rfcs that referenced this pull request Jan 15, 2017
cleaned up streams tests and split adapters test into several methods
@Centril Centril added A-repr #[repr(...)] related proposals & ideas A-machine Proposals relating to Rust's abstract machine. A-data-types RFCs about data-types A-product-types Product type related proposals labels Nov 23, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-data-types RFCs about data-types A-machine Proposals relating to Rust's abstract machine. A-product-types Product type related proposals A-repr #[repr(...)] related proposals & ideas
Projects
None yet
Development

Successfully merging this pull request may close these issues.