Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Why don't slices use dedicated metadata structs? #125517

Open
dead-claudia opened this issue May 24, 2024 · 15 comments
Open

Why don't slices use dedicated metadata structs? #125517

dead-claudia opened this issue May 24, 2024 · 15 comments
Labels
C-discussion Category: Discussion or questions that doesn't represent real issues. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.

Comments

@dead-claudia
Copy link

dead-claudia commented May 24, 2024

Curious question: why don't slices also use dedicated wrapper structs? Why just &dyn Trait?

I'd think std::ptr::metadata(slice).len() would be clearer than std::ptr::metadata(slice), for example. You could also add other helper methods, to simplify common tasks that might use it, and the added type safety would make it safer to use. (Safe enough, in fact, for you to entirely replace slice construction APIs with it.)

Concretely, I'm thinking of the following, for &str and &[T]:

// Implements each of the following traits, independent of `T` (can't use `derive`):
// - Debug
// - Clone
// - Copy
// - PartialEq
// - Eq
// - PartialOrd
// - Ord
// - Hash
pub struct SliceMetadata<T> {
    len: usize,
    _phantom: core::marker::PhantomData<T>,
}

impl<T> SliceMetadata<T> {
    /// Create a new synthetic metadata, to pair with a pointer.
    const fn new(len: usize) -> Self {
        Self {
            len: self.len,
            _phantom: core::marker::PhantomData,
        }
    }

    /// Returns the stored length
    const fn len(&self) -> usize {
        self.len
    }

    /// Aligns with `DynMetadata::size_of`
    const fn size_of(&self) -> usize {
        self.len * core::mem::size_of::<T>()
    }

    /// Returns a `Some` of a new metadata for an array of `U`s, with the same
    /// byte length of this slice, or `None` if this cannot be done exactly.
    ///
    /// This mirrors array transmutation.
    const fn transmute<U>(&self) -> Option<SliceMetadata<U>> {
        let src_width = core::mem::size_of::<T>();
        let dest_width = core::mem::size_of::<U>();
        if src_width == dest_width {
            // Easy path, works for everything
            Some(SliceMetadata::new(self.len))
        } else if self.len == 0 {
            // Easy path, works for everything
            Some(SliceMetadata::new(0))
        } else if src_width == 0 {
            // Easy path, works for everything
            Some(SliceMetadata::new(if dest_width == 0 { self.len } else { 0 }))
        } else if dest_width == 0 {
            // can't transmute from non-zero-sized `[A; N]` to zero-sized `[B; M]`
            None
        } else {
            // Time to transmute
            let bytes = self.len * src_width;
            if bytes % dest_width != 0 {
                None
            } else {
                Some(SliceMetadata::new(bytes / dest_width))
            }
        }
    }
}

Also, part of the idea here is to replace slice::from_raw_parts{,_mut} and ptr::slice_from_raw_parts{,_mut} with a less error-prone alternative similarly to how mem::MaybeUninit replaced mem::uninit. So, instead of this:

// Create a slice
let slice = std::slice::from_raw_parts(ptr, len);

// Create a slice pointer
let slice_ptr = std::ptr::slice_from_raw_parts(ptr, len);

// Transmute a slice
let transmuted = std::slice::from_raw_parts(
    slice.as_ptr().cast::<U>(),
    slice.len() / std::mem::size_of::<T>() * std::mem::size_of::<U>()
);

// Cast a slice pointer
let (ptr, len) = std::ptr::metadata(slice_ptr);
let casted_ptr = std::ptr::slice_from_raw_parts(
    ptr.cast::<U>(),
    len / std::mem::size_of::<T>() * std::mem::size_of::<U>()
);

You'd do this instead:

// Create a slice
let slice: &[T] = &*std::ptr::from_raw_parts(ptr, SliceMetadata::new(len));

// Create a slice pointer
let slice_ptr: *const [T] = std::ptr::from_raw_parts(ptr, SliceMetadata::new(len));

// Transmute a slice
let (ptr, metadata) = (slice as *const _).to_raw_parts();
let transmuted: &[U] = &*std::ptr::from_raw_parts(ptr, metadata.transmute().unwrap());

// Cast a slice pointer
let (ptr, metadata) = slice_ptr.to_raw_parts();
let transmuted_ptr: *const [U] = std::ptr::from_raw_parts(ptr, metadata.transmute().unwrap());
@rustbot rustbot added the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label May 24, 2024
@dead-claudia
Copy link
Author

Related to #81513

@Rua
Copy link
Contributor

Rua commented May 24, 2024

I was thinking about this myself, and I agree that the metadata for slices should be a distinct type. Moreover, when there is a separate metadata type for each different kind of DST, then it becomes possible to distinguish DSTs by their type. You could specify that a type must be a slice DST, or a trait object DST, or whatever new DSTs people might add in the future.

@zachs18
Copy link
Contributor

zachs18 commented May 26, 2024

If "multiply-unsized" types ever become a thing, e.g. [dyn Trait] or [[T]], then there could be a SliceMetadata<T>::inner method returning <T as Pointee>::Metadata.


Also, part of the idea here is to replace slice::from_raw_parts{,_mut} and ptr::slice_from_raw_parts{,_mut} with a less error-prone alternative similarly to how mem::MaybeUninit replaced mem::uninit.

I don't think SliceMetadata and {from,into}_raw_parts would necessarily replace these functions; IMO the "create a slice" and "create a slice pointer" sections don't look any safer/more-"correct" in the "new" version, they just look a bit less concise. The "cast a slice" srctions do look better though, assuming "keep the byte size the same or fail" is the goal (and if the goal is "keep the same slice length", just a normal cast works).

@dead-claudia
Copy link
Author

dead-claudia commented May 26, 2024

@zachs18 To be clear:

  • Creation of slices I was just thinking it could help make the pointer construction a little more explicit. (The SliceMetadata itself essentially models a std::alloc::Layout with a static alignment, so this is not unlike storing layout info with the data.)
  • The transmuting between slice types is where the safety issues really come into play. You could fix that with a std::mem::transmute_slice(&[T]) -> &[U] method (plus mutable variant), but using metadata would be simpler IMO.

@scottmcm
Copy link
Member

  • (The SliceMetadata itself essentially models a std::alloc::Layout with a static alignment, so this is not unlike storing layout info with the data.)

Note that that's only precisely true for references to slices. "Thanks" to https://doc.rust-lang.org/std/ptr/fn.slice_from_raw_parts.html, a *const [i32] can have usize::MAX as its metadata, even though that overflows usize (not to mention overflowing isize, which is what Layout checks).

@dead-claudia
Copy link
Author

@scottmcm There's also 32-bit x86 with PAE, which extends the address space beyond what a simple pointer can address. It is possible in kernel land to access those beyond-max-pointer offsets.

@scottmcm
Copy link
Member

@dead-claudia Not with offset nor with &[_], though. As mentioned in the offset docs and places like https://doc.rust-lang.org/nightly/std/ptr/#allocated-object, Rust says isize::MAX bytes at most, period. (No, the kernel isn't special here, nor is any target.)

@jieyouxu jieyouxu added T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. C-discussion Category: Discussion or questions that doesn't represent real issues. and removed needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. labels Jun 4, 2024
@Rua
Copy link
Contributor

Rua commented Jun 15, 2024

Should the impl<T> not be impl<T: ?Sized> at least for some of the methods? Certainly new and len work as-is with unsized types. Then again, your current implementation of size_of assumes that T is the slice element and not the slice itself.

So what about slice-based custom DSTs? It would definitely be valuable to have a size_of method for those too, but I don't know how it would be implemented. In parallel with DynMetadata maybe there should also be align_of and layout.

EDIT: Another thought: since we already have #69835, it may even be useful for the size_of, align_of and layout methods to be placed on a new trait. This trait, in turn, can then be used by other future APIs. It could also be desirable for the metadata of sized types to be a zero-sized newtype (SizedMetadata? EmptyMetadata? NoneMetadata?) instead of simply (), so that methods and traits can be added to it as well.

@dead-claudia
Copy link
Author

@Rua This issue is focused on slices in particular, and ergonomics surrounding them. That's why everything here is centered around slices.

What you're discussing here, I spelled out a bit in #123353 (closed as I thought I had more collected than I did, and I suspect you're in a similar boat). I suggest filing a new issue, linking to it from the main tracking issue #81513, and seeing where it takes you.

@Rua
Copy link
Contributor

Rua commented Jun 22, 2024

Slices and slice-based DSTs use the same metadata. That means that if you call ptr::metadata on either of them, they are both going to return a SliceMetadata<T> under this proposal. However, that's not possible because T must be sized. The method implementations can be left to another discussion, but the fundamental metadata type must be compatible with both.

@Rua
Copy link
Contributor

Rua commented Jun 22, 2024

A method that could be useful to add for SliceMetadata is this one:

pub fn is_valid(&self) -> bool

It would check if the metadata follows the requirements for functions like mem::size_of_val_raw, Layout::for_value_raw, Layout::array, and indeed SliceMetadata::size_of (which can overflow as currently implemented).

@WaffleLapkin
Copy link
Member

However, that's not possible because T must be sized

It is possible. There isn't really a difference between [T] and WrapsSlice<T>.

If <[T]>::Metadata is SliceMetadata<T>, then given struct WrapsSlice<T>([T]);, the WrapsSlice::<T>::Metadata is also SliceMetadata<T>. This is how metadata currently works for all structures -- metadata of the structure is the metadata of the last field.

Note that SliceMetadata::size_of will not necessarily return the size of the thing a reference with the given metadata points. Metadata only describes the unsized tail for structures:

struct A(u16, [u8]);

let a: &A = ...;
assert_eq!(a.1.len(), 2); // given length of the slice = 2
assert_eq!(metadata(a).size_of(), 2); // the byte size of the slice
assert_eq!(size_of_val(a), 4); // size of the whole structure

(this is the same what currently happens with DynMetadata)

@Rua
Copy link
Contributor

Rua commented Jun 23, 2024

Oh, I see. I misunderstood the proposal, I thought T would be the whole type, not just the slice element. Never mind me then!

I think this looks good then. Will you be implementing it?

@vojtechkral
Copy link
Contributor

passer-by remarks: I'm not sure what the overall benefit of the newtype is. Isn't slice transmutation better solved with slice.align_to()?

Also IMO the marker shouldn't be PhantomData<T> since the newtype does not own any Ts. Maybe use PhantomData<fn() -> T> or const ptr? (Don't remember which is better in this situation.)

@dead-claudia
Copy link
Author

dead-claudia commented Sep 12, 2024

@vojtechkral

passer-by remarks: I'm not sure what the overall benefit of the newtype is. Isn't slice transmutation better solved with slice.align_to()?

Yes, for slices you already have and don't own. This targets two cases:

  1. You're dealing with raw pointers and not just slices.
  2. You're transmuting boxes, vecs, and such (where there isn't an equivalent).
  3. Why don't slices use dedicated metadata structs? #125517 (comment) covers a couple other cases.

Also IMO the marker shouldn't be PhantomData<T> since the newtype does not own any Ts. Maybe use PhantomData<fn() -> T> or const ptr? (Don't remember which is better in this situation.)

You could replace PhantomData<T> with PhantomData<anything that contains a T but doesn't include a lifetime>. The whole point is just to ensure that slice layouts don't get crossed up - this is meant to function as essentially a std::alloc::Layout with a compile-time constant alignment.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-discussion Category: Discussion or questions that doesn't represent real issues. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

8 participants