-
Notifications
You must be signed in to change notification settings - Fork 58
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Layout of pointers to slices of ZSTs #224
Comments
I am very against this because it would greatly complicate the unsafe code writing experience for users of the language. |
Not saying it should be done. but will it? which users right now care about the layout of slice? |
The internals of the slice aren't what would complicate it, the fact that a slice is no longer always two Currently, it's an easy rule for people to remember: one usize for normal pointers, and two for slice and trait object pointers. But then if this optimization were put in then every once in a while a slice pointer would be one usize instead of the expected two. Not the end of the world, but it adds complexity to all cases to optimize what I suspect is quite rare. |
Note that multi-trait objects might be larger than two
(and this is without taking into account custom DSTs, which can be arbitrarily large) |
sigh oh well if it's already complex like that then it doesn't hurt extra i guess. |
Related rejected RFC: rust-lang/rfcs#2040 (for thin references rather than slices, which changes the tradeoffs a bit) I agree that this is an unnecessary burden for unsafe code authors, and more generally inelegant. In the framework of custom DSTs, a "pointer to This means that the pointer layout changes not only with the pointee type, but also with the pointer type constructor. In particular, it means that type punning from (something containing a) This specific objection could be avoided by changing all pointer types in the same way, but I think that's even less likely to be acceptable since it would also affect raw pointers. |
@rkruppe Thanks for the link to that RFC. I think most of the arguments made there apply here. This comment by @nikomatsakis is a good summary.
This would require a more "general" and complicated custom DST framework to support these cases, which is unlikely since the current proposal is already complicated enough. That RFC discussion suggest that a better solution might be to offer this via a library type, e.g., maybe something like a That would allow, e.g., a type like |
I think libstd very clearly already promises that Vec's layout isn't clever. |
@Lokathor do you have a link to where libstd makes this guarantee? AFAICT we do guarantee certain cleverness for ZSTs (e.g. that we never allocate), and this result in different behavior (e.g. the capacity of EDIT: That section of libstd does say:
but it's unclear to me at least from that guarantee whether this means that |
I also thought this was a settled issue, but looking through the history seems to have proven me wrong. rust-lang/rust#67024 and rust-lang/rust#45431 are the only two past discussions I can find which cite the "always a triplet" clause as evidence (rust-lang/rust#31554 also quotes it, but it's not directly relevant there). In the former @KrishnaSannasi interpreted it as clearly ruling out implementations which store the capacity on the heap (which seems obviously correct to me, but also very separate from this isssue). In the latter, @gnzlbg raised exactly the same ZST question being discussed here (two years ago!), and explicitly proposed that we reword it to say "where T is not zero-sized" and "When T is zero-sized the layout of Vec is unspecified," but no decision was made. So it seems like we don't actually have anything we can cite as a precedent in either direction. Considering how well-specified a non-ZST Vec's layout is, I do think we should specify (or at least explicitly declare unspecified) a ZST Vec's layout as well. I have no strong opinion on what it should be, since it does seem likely that any code depending on the layout is going to have to think about ZSTs explicitly regardless of what Vec does. |
@gnzlbg every time you ask me where Vec guarantees things I point you to the exact same spot XD There is a section in their docs that's literally called "guarantees", https://doc.rust-lang.org/alloc/vec/struct.Vec.html#guarantees, and it clearly states the triplet thing. It could not be more clear, and it's been like that for as long as I ever remember looking at that page, like 3+ years Edit: and to be clear I don't see how ZST affects it, you'd just have a pointer for no reason. I don't see how Option Vec T affects it, because Option is a totally different type. |
@Lokathor note that |
Yes, that is also true |
Worth noting that code in the wild (the |
@thomcc that pattern is most likely not backed by the layout guarantees that we are making so far... the only legal way to create a |
I didn't mean to imply it was transmuted. They use from_raw_parts: https://github.com/myrrlyn/bitvec/blob/master/src/pointer.rs#L776. If you read the comments in that file it explains in detail what's going on, but basically they're passing a tagged pointer with some special data for the pointer arg of from_raw_parts, and encoding (length_in_bits, bit_offset) into a usize, and passing that in for the length arg of from_raw_parts. (This is in order to emulate a bit-addressed slice) And to be clear I have no horse in this race -- It surprised me that something that dodgy would be in such a widely used crate, and feels like the kind of thing that's really asking for trouble. That said, I do have a pretty strong opinion against the optimization @gnzlbg suggested here. I also worry that something like this would cause bugs in unsafe code, but can be more concrete. My immediate concern boils down to the fact that deref coersions can turn an array into a slice. Specifically, consider a case like: #[repr(C, align(16)]
struct Align16<T>(pub T);
#[repr(C)]
struct Header {
size: usize,
next: *mut Header,
// etc...
data_start: Align16<[(); 0]>
} This is not how I'd do it now, but I'm pretty sure I've done something very close to this when writing a Anyway, if you represented chunks this way, and were returning a ... Anyway, this is just an immediate thought. I imagine it would cause other problems too (maybe for ZSTs which are used to represent opaque C data types). |
I think that using references to ZST like that should in theory cause problems like #134. I'm guessing the reason doesn't in practice is because Miri doesn't track raw pointers and the data is only ever accessed by raw pointers. |
Right now, we specify here that:
However, if
T
is a ZST, we actually only need to store the length of the slice, and can make&[T]
one word wide.When coercing
&[T]
to a*const T
, we can materialize the pointer out of thin air using:std::mem::align_of::<T>() as *const T
.The current specification would prevent such a layout optimization.
Is this an optimization worth supporting?
EDIT: some issues
(ptr, len) != [T]::from_raw_parts(ptr, len).into_raw()
, sinceptr
does not necessarily need to bealign_of::<T>()
, yet if we don't store it, the address is lostThe text was updated successfully, but these errors were encountered: