feat: FromIntoIterator helper for serializing iterator-enabled collections#285
feat: FromIntoIterator helper for serializing iterator-enabled collections#285kskalski merged 9 commits intoanza-xyz:masterfrom
Conversation
| /// use wincode::{Deserialize, Serialize, containers::Seq, len::BincodeLen}; | ||
| /// | ||
| /// #[derive(PartialEq, Debug)] | ||
| /// struct MyCollection<T>(Vec<T>); |
There was a problem hiding this comment.
Are there real-world examples we can point to (e.g., in agave) that need this functionality?
For this example in particular, using a derive would be simpler and more performant.
#[derive(SchemaWrite, SchemaRead)]
struct MyCollection<T>(Vec<T>);There was a problem hiding this comment.
the exact example I had in mind initially is https://github.com/anza-xyz/agave/blob/577e58fea70c1b75c1d70d4031b37bd8450bfc5a/runtime/src/stakes.rs#L172 - basically we use an already implemented (and non-trival) helper instead of manually writing it locally in all such cases.
| let mut reader = unsafe { reader.as_trusted_for(size * len) }?; | ||
| (0..len) | ||
| .map(|_| T::get(reader.by_ref())) | ||
| .collect::<Result<Coll, _>>()? |
There was a problem hiding this comment.
There was a problem hiding this comment.
Uh... quite a catch - I added a dedicated struct that exposes its ref as iteratotor and reads items from reader terminating on any error
| /// | ||
| /// # Allocation efficiency | ||
| /// | ||
| /// During deserialization, elements are collected via an [`ExactSizeIterator`] |
There was a problem hiding this comment.
ExactSizeIterator is not necessarily relevant for deserialization. We construct an ExactSizeIterator via (0..len).
The important bit to note in the documentation is the part you have below:
Collections whose [
FromIterator] implementation uses the size hint to
preallocate capacity will allocate optimally
But even still this is not correct as is because allocation is performed via Result's FromIterator, which does not preallocate.
There was a problem hiding this comment.
updated comment to express that we populate size_hint precisely, so any impl that checks that can allocate optimally
|
We need to support |
| /// # } | ||
| /// ``` | ||
| #[cfg(feature = "alloc")] | ||
| pub struct Seq<Coll, T, Len>(PhantomData<(Coll, T, Len)>); |
There was a problem hiding this comment.
We could perhaps avoid the extra T parameter with something like the following (rough idea)
pub struct Seq<Coll, Len>(PhantomData<(Coll, Len)>);
#[cfg(feature = "alloc")]
unsafe impl<Coll, Len, C: ConfigCore> SchemaWrite<C> for Seq<Coll, Len>
where
Len: SeqLen<C>,
Coll: IntoIterator,
Coll::Item: SchemaWrite<C>,
for<'a> &'a Coll: IntoIterator<Item = &'a <Coll::Item as SchemaWrite<C>>::Src>,
for<'a> <&'a Coll as IntoIterator>::IntoIter: ExactSizeIterator,
{
type Src = Coll;
#[inline]
fn size_of(src: &Coll) -> WriteResult<usize> {
size_of_elem_iter::<Coll::Item, Len, C>(src.into_iter())
}
#[inline]
fn write(writer: impl Writer, src: &Coll) -> WriteResult<()> {
write_elem_iter_prealloc_check::<Coll::Item, Len, C>(writer, src.into_iter())
}
}Would make typing constraints simpler from a user perspective (especially in the KV case)
<Seq<MyCollection<u32>, BincodeLen>>::deserialize(&bytes).unwrap();
<SeqKv<MyMap<u32, u64>, BincodeLen>>::deserialize(&bytes).unwrap();There was a problem hiding this comment.
aha, cool, this worked. the only weird aspect is that for SchemaRead the iterator trait doesn't have associated Item type, so... now SchemaRead is also constrainted on IntoIterator just to access its item... this has a couple of bad consequences, but I think it's acceptable tradeoff given how our impl requires specific traits and signatures to match, e.g. & into iterator to produce (&K, &V) and from iterator to consume (K,V)
Yeah, true. We don't want to pull in all deps of agave in Since this example is already using a newtype, we could just implement |
48eb697 to
eeb8b65
Compare
|
Ok, I also managed to remove K,V type params from so I think this is ready to check out again. |
|
Even better - |
| /// Unlike `collect::<Result<C, _>>()` this preserves `remaining` in `size_hint` | ||
| /// so that collections can preallocate the expected capacity. | ||
| #[cfg(feature = "alloc")] | ||
| struct SchemaReadIter<'de, T, C, R> { |
There was a problem hiding this comment.
We could make something more generic here that isn't coupled to SchemaRead
E.g.,
struct ResultPrealloc<T, E>(Result<T, E>);
impl<A, E, V: FromIterator<A>> FromIterator<Result<A, E>> for ResultPrealloc<V, E> {
fn from_iter<I: IntoIterator<Item = Result<A, E>>>(iter: I) -> ResultPrealloc<V, E> {
struct Iter<I, E> {
inner: I,
error: Option<E>,
}
impl<I, T, E> Iterator for Iter<I, E>
where
I: Iterator<Item = Result<T, E>>,
{
type Item = T;
#[inline]
fn next(&mut self) -> Option<Self::Item> {
match self.inner.next()? {
Ok(item) => Some(item),
Err(e) => {
self.error = Some(e);
None
}
}
}
#[inline]
fn size_hint(&self) -> (usize, Option<usize>) {
self.inner.size_hint()
}
}
let mut iter = Iter {
inner: iter.into_iter(),
error: None,
};
let result = V::from_iter(&mut iter);
match iter.error {
None => ResultPrealloc(Ok(result)),
Some(e) => ResultPrealloc(Err(e)),
}
}
}
trait CollectResultExt<T, E>: Iterator<Item = Result<T, E>> {
#[inline]
fn collect_result_prealloc<B>(self) -> Result<B, E>
where
B: FromIterator<T>,
Self: Sized,
{
self.collect::<ResultPrealloc<B, E>>().0
}
}
impl<T, E, I> CollectResultExt<T, E> for I where I: Iterator<Item = Result<T, E>> {}Such that the following is possible:
let coll = (0..len)
.map(|_| T::get(reader.by_ref()))
.collect_result_prealloc()?;And SchemaRead specializations (that do things like the automatic as_trusted_for) could be built on top of this fairly easily. Though, the above code is trivial enough that we probably don't need an extra specialization.
There was a problem hiding this comment.
right, it seems a better separation and more generic solution, applied
9bcc6af to
22eb857
Compare
| /// map: MyMap<u32, u64>, | ||
| /// } | ||
| /// ``` | ||
| #[cfg(feature = "alloc")] |
There was a problem hiding this comment.
I wonder if we actually need to gate this on alloc. Conceivably one could have a type that writes to stack memory in its FromIterator implementation
There was a problem hiding this comment.
right, one could "preallocate" using some custom way that doesn't use alloc... at least the library doesn't allocate, so I will drop alloc
Add
FromIntoIterator<Coll<T>, Len>(also usable asFromIntoIterator<Coll<K, V>, Len>generic container schema for external collection types that cannot have dedicated schema impls added directly.It covers
Collwhere&Coll: IntoIterator<Item = &T::Src>(write) andColl: FromIterator<T::Dst>(read), as well as other variants of iterator items used by map collections, where the iterator yields(&K::Src, &V::Src)pairs andFromIteratoraccepts(K::Dst, V::Dst)tuples.It preserves the static-size trusted-window read / write optimization from the existing macro-based impls.
The main caveat with using this impl is whether given collection's
FromIteratoris able to benefit from passed iterator having target len insize_hint, if they do not then it silently does more allocations than having collection created through possiblewith_capacityAPIs - this should be only a problem for poorly implemented collections.