Buffer: a safe utility collection for allocating memory.#103
Buffer: a safe utility collection for allocating memory.#103reem wants to merge 4 commits intoGankra:masterfrom
Conversation
Buffer provides an entirely safe interface for managing memory within other collections. It provides allocate, reallocate, and deallocate functionality and obeys RAII principles (deallocate on drop). This can be used to simplify the implementation of any data structure which does allocation not through another structure like Vec. From the docs: A safe wrapper around a heap allocated buffer of Ts, tracking capacity only. Buffer makes no promises about the actual contents of this memory, that's up to the user of the structure and can be manipulated using the standard pointer utilities, accessible through the impl of `Deref<Target=*mut T>` for `Buffer<T>`. As a result of this hands-off approach, `Buffer`s destructor does not attempt to drop any of the contained elements; the destructor simply frees the contained memory. You can think of `Buffer<T>` as an approximation for `Box<[T]>` where the elements are not guaranteed to be valid/initialized. It is meant to be used as a building block for other collections, so they do not have to concern themselves with the minutiae of allocating, reallocating, and deallocating memory.
|
I've noticed that this logic gets strewn all over the place in various collections both here and in std, it would be much better to bundle it in one typed, checked place, then just re-use that. |
src/util/buffer.rs
Outdated
There was a problem hiding this comment.
Is this reassigning to self in the reallocate case actually safe with the drop impl? From tests it appears yes, but I'm not actually confident about this.
There was a problem hiding this comment.
Yea, this should be incorrect (?). This should drop self before reassigning it.
|
Sidenote: I really enjoyed using |
|
Other sidenote: this also creates and publicizes a |
|
Receiving a bizarre error on travis that won't show up locally. @gankro does this look familiar? |
|
Woaaah that's crazy. |
There was a problem hiding this comment.
If you make this NonZero<Unique<T>>, you can get rid of the unsafe impls.
There was a problem hiding this comment.
That's what I tried originally, but there is no impl of Zeroable for Unique<T>, so you can't put it in a NonZero.
There was a problem hiding this comment.
That sounds like a libstd bug.
There was a problem hiding this comment.
I would be happy to make this change when it becomes possible, and also change all the underlying allocating utilities to take NonZero<Unique<T>> instead of NonZero<*mut T>, seems like a really excellent use of using the type system to maintain invariants.
There was a problem hiding this comment.
I'm making a PR with the impl, by the way. Running tests now.
|
I like this, but want to grab it without pulling in collect-rs. Can it go in another package instead? |
|
@cgaebel it could go elsewhere if that is desired - I'm not really sure what our policy on such things should be. |
These checks aren't needed since the old size was checked when it was originally allocated.
This allows the use of `NonZero<Unique<T>>` for owned, non-null raw pointers. cc Gankra/collect-rs#103
|
This could be used in #100 to massively simplify and safe-ify a lot of the code. |
|
I'm doing some fixing on this but getting allocation errors... Don't merge yet. |
|
Ok, should be fixed and ready to go. That was a tricky to track down bug. |
Also provide forward-compatibility for the move to `NonZero<Unique<T>>` in the future and tweak some docs.
…location. By only swapping out the buffer and not also changing the capacity to 0 during reallocation, a capacity overflow could cause the drop impl of Buffer to run when the buffer was empty() and the capacity was non-zero, causing an incorrect call to deallocate. We now also swap the capacity to 0, so that when reallocate is called (the site of a possible panic) the buffer is in a consistent empty state. Also, in order to avoid leaking memory on oom (ya kind of useless, but still) reallocate now deallocates the old ptr if reallocation fails.
There was a problem hiding this comment.
I'm... not sure this is what NonZero is intended to be used for.
Like, you can, obviously.
There was a problem hiding this comment.
It provides the invariant that the pointer returned by allocate is never null, i.e. allocate can never fail except by aborting.
There was a problem hiding this comment.
Or did you mean cap? That's just encoding the invariant that capacity must never be 0 in the type system.
|
With respect to simplifying the logic used by collections, to my knowledge there are 4 collections that handle allocation directly: Vec, RingBuf, HashMap, and BTree. HashMap and BTree do a triple-array-as-one-allocation trick, so this utility isn't applicable. RingBuf and Vec would potentially benefit from this design, but Vec is actually pretty delicate pref-wise and does a few fancy things. Not sure if you could just drop this in. RingBuf can benefit, as I demonstrated in my faux allocator PR to rust-lang. |
|
You could use this for any allocated buffer, including HashMap and BTree. It's just a wrapper around |
|
Hash and BTree basically make an ([A; n], [B; n], [C; n]). There's no one T that will work here other than u8, but then you're just doing the all the computations anyway. I believe @gereeter had some interesting ideas about using a kind of builder pattern for this kind of thing, |
|
Also of interest is that Hash stores only the "head" pointer and re-computes the other array offsets at every query, while BTree stores all 3 pointers forever. Both of these choices were based on empirical performance data. But BTree might not want to cache the pointers when B is a known constant. So even if you abstract away the triple-vec logic, you need to provide a mechanism for retrieving the relative offsets. |
|
Sounds more complex than I assumed. I was mostly targeting Vec, HybridVec, RingBuf and similar structures with this anyway. |
|
Regarding making all allocation easier, I had 2 APIs in mind. The first one is simply aimed at avoiding all the nasty calculations involved in allocating space for things like parallel arrays and BTree nodes. This involves creating a let desc = MemDescriptor::from_ty::<SafeHash>().array(cap)
.then(MemDescriptor::from_ty::<K>().array(cap))
.then(MemDescriptor::from_ty::<V>().array(cap));The second API idea uses a trait abstracting over different types of arrays to generically implement things like trait Data {
type Target;
unsafe fn read(&self) -> T;
unsafe fn write(&mut self, value: T);
unsafe fn offset(&self, offset: usize) -> Self;
fn array_descriptor(num_elements: usize) -> MemDescriptor;
}
struct Flat<T> { ptr: NonZero<Unique<T>>}
impl<T> Data for Flat<T> {
type Target = T;
unsafe fn read(&self) -> T { mem::read(self.ptr) }
unsafe fn write(&mut self, value: T) { mem::write(self.ptr, value) }
unsafe fn offset(&self, offset: usize) { Flat { ptr: self.ptr.offset(offset) } }
fn array_descriptor(num_elements: usize) -> MemDescriptor { MemDescriptor::from_ty::<T>().array(num_elements) }
}
struct Parallel<A, B> { first: A, second: B }
impl<A: Data, B: Data> Data for Parallel<A, B> {
type Target = (A::Target, B::Target);
// ...
}With such a trait, |
This allows the use of `NonZero<Unique<T>>` for owned, non-null raw pointers. cc Gankra/collect-rs#103
|
I'm actually kind of interested in moving this into std. We could put the can't-get-bigger-than-int-max logic in it and share it between ringbuf and vec. For reference I think the check only needs to be performed when |
|
@gankro Shouldn't the "can't get larger than isize" be done in the allocator? |
|
The allocator API is already unsafe, and I'm not clear that there's anything fundamentally unsound about making really huge allocations. It just causes trouble when you try to treat it as an array of bytes and ptr offset across the whole range. Maybe that's a problem for |
|
I thought such large allocations were always unsound due to some LLVM stuff, but I don't remember anymore. |
|
@gankro Is there a plan for this? |
|
I can rebase if we want to land this, or remake the PR to std, or move to another package. Thoughts? |
|
RIP Collect; can be its own crate. |
|
@reem I still think the |
|
Yes. I was debating whether it belongs in contain-rs or as a standalone repo. |
Buffer provides an entirely safe interface for managing memory within other
collections. It provides allocate, reallocate, and deallocate functionality
and obeys RAII principles (deallocate on drop).
This can be used to simplify the implementation of any data structure which
does allocation not through another structure like Vec.
From the docs:
A safe wrapper around a heap allocated buffer of Ts, tracking capacity only.
Buffer makes no promises about the actual contents of this memory, that's up
to the user of the structure and can be manipulated using the standard pointer
utilities, accessible through the impl of
Deref<Target=*mut T>forBuffer<T>.As a result of this hands-off approach,
Buffers destructor does not attemptto drop any of the contained elements; the destructor simply frees the contained
memory.
You can think of
Buffer<T>as an approximation forBox<[T]>where the elementsare not guaranteed to be valid/initialized. It is meant to be used as a building
block for other collections, so they do not have to concern themselves with the
minutiae of allocating, reallocating, and deallocating memory.