
feat: FromIntoIterator helper for serializing iterator-enabled collections #285

Merged
kskalski merged 9 commits into anza-xyz:master from kskalski:ks/seq
Apr 9, 2026

Conversation

@kskalski
Contributor

@kskalski kskalski commented Apr 3, 2026

Add FromIntoIterator<Coll<T>, Len> (also usable as FromIntoIterator<Coll<K, V>, Len>), a generic container schema for external collection types that cannot have dedicated schema impls added directly.

It covers Coll where &Coll: IntoIterator<Item = &T::Src> (write) and Coll: FromIterator<T::Dst> (read), as well as other variants of iterator items used by map collections, where the iterator yields (&K::Src, &V::Src) pairs and FromIterator accepts (K::Dst, V::Dst) tuples.
It preserves the static-size trusted-window read / write optimization from the existing macro-based impls.

The main caveat with this impl is whether a given collection's FromIterator can benefit from the passed iterator reporting the target length in size_hint. If it cannot, it silently performs more allocations than a collection created through a with_capacity-style API would. This should only be a problem for poorly implemented collections.
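As a rough illustration of the caveat, here is a minimal sketch of a collection whose FromIterator does read size_hint. MyCollection and collect_exact are hypothetical names used only for this example and are not part of wincode.

```rust
use std::iter::FromIterator;

/// Hypothetical external collection, used only for illustration.
#[derive(Debug, PartialEq)]
pub struct MyCollection<T> {
    pub items: Vec<T>,
    /// Capacity requested up front, recorded so the effect is observable.
    pub reserved: usize,
}

impl<T> FromIterator<T> for MyCollection<T> {
    fn from_iter<I: IntoIterator<Item = T>>(iter: I) -> Self {
        let iter = iter.into_iter();
        // Use the lower bound of size_hint as the initial capacity.
        let (lower, _) = iter.size_hint();
        let mut items = Vec::with_capacity(lower);
        items.extend(iter);
        MyCollection { items, reserved: lower }
    }
}

/// Collects `len` elements from an iterator whose size_hint is exact.
pub fn collect_exact(len: u32) -> MyCollection<u32> {
    (0..len).collect()
}
```

A collection that ignores size_hint would still produce the same elements; it would just grow its storage incrementally instead of reserving once.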

@kskalski kskalski marked this pull request as ready for review April 3, 2026 12:38
Comment thread wincode/src/schema/containers.rs Outdated
/// use wincode::{Deserialize, Serialize, containers::Seq, len::BincodeLen};
///
/// #[derive(PartialEq, Debug)]
/// struct MyCollection<T>(Vec<T>);
Contributor

Are there real-world examples we can point to (e.g., in agave) that need this functionality?

For this example in particular, using a derive would be simpler and more performant.

#[derive(SchemaWrite, SchemaRead)]
struct MyCollection<T>(Vec<T>);

Contributor Author

The exact example I had in mind initially is https://github.com/anza-xyz/agave/blob/577e58fea70c1b75c1d70d4031b37bd8450bfc5a/runtime/src/stakes.rs#L172. Basically, we use an already implemented (and non-trivial) helper instead of writing it manually in every such case.

Comment thread wincode/src/schema/containers.rs Outdated
let mut reader = unsafe { reader.as_trusted_for(size * len) }?;
(0..len)
.map(|_| T::get(reader.by_ref()))
.collect::<Result<Coll, _>>()?
Contributor

Result's FromIterator will not preallocate, so this will start with capacity 0 and allocate incrementally as elements are added.
[1, 2, 3]

To solve this, we will probably need some kind of custom filter-adjacent adapter that preserves the size_hint of the parent iterator, (0..len).
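A small probe can make the effect visible. The sketch below assumes the behavior of current standard library implementations, where the adapter behind Result's FromIterator reports a lower bound of 0. HintProbe, direct, and through_result are hypothetical names for this example.

```rust
use std::iter::FromIterator;

/// Hypothetical collection that records the size_hint it was handed.
pub struct HintProbe {
    pub hint: (usize, Option<usize>),
    pub count: usize,
}

impl<T> FromIterator<T> for HintProbe {
    fn from_iter<I: IntoIterator<Item = T>>(iter: I) -> Self {
        let iter = iter.into_iter();
        // Capture the hint before consuming the iterator.
        let hint = iter.size_hint();
        HintProbe { hint, count: iter.count() }
    }
}

/// Collecting directly from a range: the collection sees the exact length.
pub fn direct(len: usize) -> HintProbe {
    (0..len).collect()
}

/// Collecting through `Result`: the collection sees the hint of std's
/// internal error-short-circuiting adapter, not the hint of the range.
pub fn through_result(len: usize) -> Result<HintProbe, String> {
    (0..len).map(|i| Ok::<usize, String>(i)).collect()
}
```

Both paths yield the same number of elements; only the hint visible to the collection differs.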

Contributor Author

Uh... quite a catch. I added a dedicated struct that exposes its ref as an iterator and reads items from the reader, terminating on any error.

Comment thread wincode/src/schema/containers.rs Outdated
///
/// # Allocation efficiency
///
/// During deserialization, elements are collected via an [`ExactSizeIterator`]
Contributor

ExactSizeIterator is not necessarily relevant for deserialization. We construct an ExactSizeIterator via (0..len).

The important bit to note in the documentation is the part you have below:

Collections whose [FromIterator] implementation uses the size hint to
preallocate capacity will allocate optimally

But even still this is not correct as is because allocation is performed via Result's FromIterator, which does not preallocate.

Contributor Author

Updated the comment to say that we populate size_hint precisely, so any impl that checks it can allocate optimally.

@kskalski
Contributor Author

kskalski commented Apr 3, 2026

We need to support im (https://github.com/anza-xyz/agave/blob/c78eb04311b95441e287e090d648e2def57dd20d/runtime/src/stakes/serde_stakes.rs#L163) or maybe imbl (anza-xyz/agave#10762). I guess there is a limit on how many random or exotic dependencies we want to pull in as for external/ implementations.

Comment thread wincode/src/schema/containers.rs Outdated
/// # }
/// ```
#[cfg(feature = "alloc")]
pub struct Seq<Coll, T, Len>(PhantomData<(Coll, T, Len)>);
Contributor

We could perhaps avoid the extra T parameter with something like the following (rough idea)

pub struct Seq<Coll, Len>(PhantomData<(Coll, Len)>);

#[cfg(feature = "alloc")]
unsafe impl<Coll, Len, C: ConfigCore> SchemaWrite<C> for Seq<Coll, Len>
where
    Len: SeqLen<C>,
    Coll: IntoIterator,
    Coll::Item: SchemaWrite<C>,
    for<'a> &'a Coll: IntoIterator<Item = &'a <Coll::Item as SchemaWrite<C>>::Src>,
    for<'a> <&'a Coll as IntoIterator>::IntoIter: ExactSizeIterator,
{
    type Src = Coll;

    #[inline]
    fn size_of(src: &Coll) -> WriteResult<usize> {
        size_of_elem_iter::<Coll::Item, Len, C>(src.into_iter())
    }

    #[inline]
    fn write(writer: impl Writer, src: &Coll) -> WriteResult<()> {
        write_elem_iter_prealloc_check::<Coll::Item, Len, C>(writer, src.into_iter())
    }
}

Would make typing constraints simpler from a user perspective (especially in the KV case)

<Seq<MyCollection<u32>, BincodeLen>>::deserialize(&bytes).unwrap();
<SeqKv<MyMap<u32, u64>, BincodeLen>>::deserialize(&bytes).unwrap();

Contributor Author

Aha, cool, this worked. The only weird aspect is that for SchemaRead, the FromIterator trait doesn't have an associated Item type, so SchemaRead is now also constrained on IntoIterator just to access its item type. This has a couple of bad consequences, but I think it's an acceptable tradeoff given that our impl requires specific traits and signatures to match, e.g. the & into-iterator producing (&K, &V) and the from-iterator consuming (K, V).
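The shape of that constraint can be sketched outside wincode. The example below is only an illustration under assumed names (rebuild and rebuild_set are hypothetical); it uses Clone in place of the schema traits to show how the element type is named through IntoIterator::Item because FromIterator takes its item as a type parameter.

```rust
use std::collections::BTreeSet;
use std::iter::FromIterator;

/// Hypothetical helper: clone every element of `src` into a fresh `Coll`.
/// `FromIterator<A>` has no associated item type, so the element type is
/// borrowed from `IntoIterator::Item` on the owned collection instead.
pub fn rebuild<Coll>(src: &Coll) -> Coll
where
    Coll: IntoIterator + FromIterator<<Coll as IntoIterator>::Item>,
    <Coll as IntoIterator>::Item: Clone,
    // The borrowed collection must yield references to that same item type.
    for<'a> &'a Coll: IntoIterator<Item = &'a <Coll as IntoIterator>::Item>,
{
    IntoIterator::into_iter(src).cloned().collect()
}

/// Works for set-like and sequence-like collections.
pub fn rebuild_set(src: &BTreeSet<u32>) -> BTreeSet<u32> {
    rebuild(src)
}

// Note: std maps do not satisfy these exact bounds, because `&BTreeMap<K, V>`
// yields `(&K, &V)` rather than `&(K, V)`. That mismatch is the reason the
// key-value case needs its own item handling.
```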

@cpubot
Contributor

cpubot commented Apr 3, 2026

> We need to support im (https://github.com/anza-xyz/agave/blob/c78eb04311b95441e287e090d648e2def57dd20d/runtime/src/stakes/serde_stakes.rs#L163) or maybe imbl (anza-xyz/agave#10762). I guess there is a limit on how many random or exotic dependencies we want to pull in as for external/ implementations.

Yeah, true. We don't want to pull in all deps of agave in external.

Since this example is already using a newtype, we could just implement SchemaRead / SchemaWrite for it in agave.

@kskalski kskalski force-pushed the ks/seq branch 4 times, most recently from 48eb697 to eeb8b65 Compare April 6, 2026 05:25
@kskalski
Contributor Author

kskalski commented Apr 7, 2026

Ok, I also managed to remove K,V type params from SeqKv and double-checked the helper works properly for im and imbl crates, for types like:

+            type ImHashSetSeq = containers::Seq<im::HashSet<u32>, BincodeLen>;
+            type ImOrdSetSeq = containers::Seq<im::OrdSet<u32>, BincodeLen>;
+            type ImHashMapSeq = containers::SeqKv<im::HashMap<u32, u32>, BincodeLen>;
+            type ImOrdMapSeq = containers::SeqKv<im::OrdMap<u32, u32>, BincodeLen>;
+            type ImblHashSetSeq = containers::Seq<imbl::HashSet<u32>, BincodeLen>;
+            type ImblOrdSetSeq = containers::Seq<imbl::OrdSet<u32>, BincodeLen>;
+            type ImblHashMapSeq = containers::SeqKv<imbl::HashMap<u32, u32>, BincodeLen>;
+            type ImblOrdMapSeq = containers::SeqKv<imbl::OrdMap<u32, u32>, BincodeLen>;

so I think this is ready to check out again.

@kskalski kskalski requested a review from cpubot April 7, 2026 09:20
@kskalski
Contributor Author

kskalski commented Apr 7, 2026

Even better - SeqKv could be removed, since now Seq can work flexibly on borrowed or owned iterator elements.

@kskalski kskalski changed the title feat: Seq and SeqKv helpers for serializing iterator-enabled collections feat: Seq helper for serializing iterator-enabled collections Apr 7, 2026
Comment thread wincode/src/schema/containers.rs Outdated
/// Unlike `collect::<Result<C, _>>()` this preserves `remaining` in `size_hint`
/// so that collections can preallocate the expected capacity.
#[cfg(feature = "alloc")]
struct SchemaReadIter<'de, T, C, R> {
Contributor

@cpubot cpubot Apr 8, 2026

We could make something more generic here that isn't coupled to SchemaRead
E.g.,

struct ResultPrealloc<T, E>(Result<T, E>);

impl<A, E, V: FromIterator<A>> FromIterator<Result<A, E>> for ResultPrealloc<V, E> {
    fn from_iter<I: IntoIterator<Item = Result<A, E>>>(iter: I) -> ResultPrealloc<V, E> {
        struct Iter<I, E> {
            inner: I,
            error: Option<E>,
        }

        impl<I, T, E> Iterator for Iter<I, E>
        where
            I: Iterator<Item = Result<T, E>>,
        {
            type Item = T;

            #[inline]
            fn next(&mut self) -> Option<Self::Item> {
                match self.inner.next()? {
                    Ok(item) => Some(item),
                    Err(e) => {
                        self.error = Some(e);
                        None
                    }
                }
            }

            #[inline]
            fn size_hint(&self) -> (usize, Option<usize>) {
                self.inner.size_hint()
            }
        }

        let mut iter = Iter {
            inner: iter.into_iter(),
            error: None,
        };
        let result = V::from_iter(&mut iter);
        match iter.error {
            None => ResultPrealloc(Ok(result)),
            Some(e) => ResultPrealloc(Err(e)),
        }
    }
}

trait CollectResultExt<T, E>: Iterator<Item = Result<T, E>> {
    #[inline]
    fn collect_result_prealloc<B>(self) -> Result<B, E>
    where
        B: FromIterator<T>,
        Self: Sized,
    {
        self.collect::<ResultPrealloc<B, E>>().0
    }
}
impl<T, E, I> CollectResultExt<T, E> for I where I: Iterator<Item = Result<T, E>> {}

Such that the following is possible:

let coll = (0..len)
    .map(|_| T::get(reader.by_ref()))
    .collect_result_prealloc()?;

And SchemaRead specializations (that do things like the automatic as_trusted_for) could be built on top of this fairly easily. Though, the above code is trivial enough that we probably don't need an extra specialization.
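For a quick check of the idea, here is a condensed, self-contained variant written as a free function rather than an extension trait. StopOnErr, collect_prealloc, and LowerBound are hypothetical names for this sketch. It assumes the target collection stops pulling items after the first None.

```rust
use std::iter::FromIterator;

/// Yields `Ok` items, stashes the first error, and forwards size_hint.
struct StopOnErr<I, E> {
    inner: I,
    error: Option<E>,
}

impl<I, T, E> Iterator for StopOnErr<I, E>
where
    I: Iterator<Item = Result<T, E>>,
{
    type Item = T;

    fn next(&mut self) -> Option<T> {
        match self.inner.next()? {
            Ok(item) => Some(item),
            Err(e) => {
                // Remember the error and end iteration.
                self.error = Some(e);
                None
            }
        }
    }

    fn size_hint(&self) -> (usize, Option<usize>) {
        // Unlike std's adapter for Result, keep the inner lower bound.
        self.inner.size_hint()
    }
}

/// Collects into `B`, returning the first error if any item failed.
pub fn collect_prealloc<I, T, E, B>(iter: I) -> Result<B, E>
where
    I: IntoIterator<Item = Result<T, E>>,
    B: FromIterator<T>,
{
    let mut adapter = StopOnErr { inner: iter.into_iter(), error: None };
    let out = B::from_iter(&mut adapter);
    match adapter.error {
        Some(e) => Err(e),
        None => Ok(out),
    }
}

/// Probe collection that records only the lower bound it was offered.
pub struct LowerBound(pub usize);

impl<T> FromIterator<T> for LowerBound {
    fn from_iter<I: IntoIterator<Item = T>>(iter: I) -> Self {
        LowerBound(iter.into_iter().size_hint().0)
    }
}
```

Because the lower bound is forwarded unchanged, it is an optimistic hint: when an element fails, fewer items arrive than were advertised, which is harmless for capacity reservation.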

Contributor Author

Right, this seems like a better separation and a more generic solution. Applied.

Comment thread wincode/src/schema/containers.rs Outdated
@kskalski kskalski force-pushed the ks/seq branch 2 times, most recently from 9bcc6af to 22eb857 Compare April 9, 2026 06:15
@kskalski kskalski changed the title feat: Seq helper for serializing iterator-enabled collections feat: FromIntoIterator helper for serializing iterator-enabled collections Apr 9, 2026
Comment thread wincode/src/schema/containers.rs Outdated
/// map: MyMap<u32, u64>,
/// }
/// ```
#[cfg(feature = "alloc")]
Contributor

I wonder if we actually need to gate this on alloc. Conceivably one could have a type that writes to stack memory in its FromIterator implementation

Contributor Author

Right, one could "preallocate" in some custom way that doesn't use alloc. At least the library itself doesn't allocate, so I will drop the alloc gate.
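As a sketch of what such a type might look like, the following fixed-capacity collection implements FromIterator without any heap allocation. StackBuf is a hypothetical name, and silently dropping elements beyond capacity is a simplification made for this example only.

```rust
use std::iter::FromIterator;

/// Hypothetical fixed-capacity collection stored inline (no heap use).
pub struct StackBuf<T, const N: usize> {
    slots: [Option<T>; N],
    len: usize,
}

impl<T, const N: usize> StackBuf<T, N> {
    /// Number of occupied slots.
    pub fn len(&self) -> usize {
        self.len
    }

    /// Returns the element at `i`, if that slot is occupied.
    pub fn get(&self, i: usize) -> Option<&T> {
        self.slots.get(i).and_then(|slot| slot.as_ref())
    }
}

impl<T, const N: usize> FromIterator<T> for StackBuf<T, N> {
    fn from_iter<I: IntoIterator<Item = T>>(iter: I) -> Self {
        // Build the inline storage without requiring `T: Copy`.
        let mut slots: [Option<T>; N] = std::array::from_fn(|_| None);
        let mut len = 0;
        // Fill slots in order; elements beyond capacity are not consumed.
        for (slot, item) in slots.iter_mut().zip(iter) {
            *slot = Some(item);
            len += 1;
        }
        StackBuf { slots, len }
    }
}
```

A production type would more likely report overflow than truncate, but the point stands that FromIterator itself places no requirement on an allocator.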

Contributor

@cpubot cpubot left a comment

Nice!

@kskalski kskalski merged commit 2e1cb62 into anza-xyz:master Apr 9, 2026
4 checks passed
@kskalski kskalski deleted the ks/seq branch April 9, 2026 17:30