
Do not emit separator before elements for Intersperse on non-fused iterators#152855

Open
zakarumych wants to merge 1 commit intorust-lang:mainfrom
zakarumych:defused-intersperse


@zakarumych

This change is related to unstable feature #![feature(iter_intersperse)]
#79524

This changes the behavior of Intersperse on non-fused iterators.
Specifically, it addresses an iterator that returns None once from its next method and then proceeds to yield elements normally.
Without this change, Intersperse also returns None at first, but afterwards it yields a separator before the first actual element from the inner iterator.
This change fixes that: even if the underlying iterator starts with Nones, Intersperse waits until one item has been returned before inserting separators.
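To make the proposed behavior concrete, here is a minimal sketch of the changed next logic as a standalone adapter. DefusedIntersperse and Script are hypothetical names used only for illustration; unlike the std adapter, this sketch deliberately does not fuse the inner iterator, and it uses 0 as the separator.

```rust
/// Hypothetical stand-in for std's Intersperse that does NOT fuse its input.
struct DefusedIntersperse<I: Iterator> {
    iter: I,
    separator: I::Item,
    next_item: Option<I::Item>,
    started: bool,
}

impl<I> Iterator for DefusedIntersperse<I>
where
    I: Iterator,
    I::Item: Clone,
{
    type Item = I::Item;

    fn next(&mut self) -> Option<I::Item> {
        if self.started {
            if let Some(v) = self.next_item.take() {
                return Some(v);
            }
            let next_item = self.iter.next();
            if next_item.is_some() {
                // Buffer the real item and emit the separator first.
                self.next_item = next_item;
                Some(self.separator.clone())
            } else {
                None
            }
        } else {
            // Proposed change: only count as "started" once a real item appears,
            // so leading `None`s never cause a separator before the first item.
            let item = self.iter.next();
            self.started = item.is_some();
            item
        }
    }
}

/// Hypothetical non-fused source: replays a fixed script of `Option`s,
/// then returns `None` forever.
struct Script<T>(std::vec::IntoIter<Option<T>>);

impl<T> Iterator for Script<T> {
    type Item = T;
    fn next(&mut self) -> Option<T> {
        self.0.next().flatten()
    }
}

fn demo() -> Vec<Option<i32>> {
    let source = Script(vec![None, Some(1), Some(2), Some(3)].into_iter());
    let mut it = DefusedIntersperse { iter: source, separator: 0, next_item: None, started: false };
    (0..7).map(|_| it.next()).collect()
}

fn main() {
    // Expected: [None, Some(1), Some(0), Some(2), Some(0), Some(3), None]
    println!("{:?}", demo());
}
```

The leading None is passed through unchanged, and the first separator only appears after 1 has been yielded.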

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Feb 19, 2026
@rustbot
Collaborator

rustbot commented Feb 19, 2026

r? @Mark-Simulacrum

rustbot has assigned @Mark-Simulacrum.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

Why was this reviewer chosen?

The reviewer was selected based on:

  • Owners of files modified in this PR: @scottmcm, libs
  • @scottmcm, libs expanded to 8 candidates
  • Random selection from Mark-Simulacrum, jhpratt, scottmcm

@asder8215
Contributor

asder8215 commented Feb 20, 2026

My understanding of this part here:

This changes the behavior of Intersperse on non-fused iterators.
Specifically, it addresses an iterator that returns None once from its next method and then proceeds to yield elements normally.

is that you have a non-fused iterator that could iterate like this: None, Some(1), None, Some(2), ...

Here is an example of the kind of iterator I imagine you're talking about:

#[derive(Debug)]
struct TestCounter {
    counter: usize,
}

impl Iterator for TestCounter {
    type Item = usize;

    fn next(&mut self) -> Option<Self::Item> {
        if self.counter > 5 {
            None
        } else if self.counter % 2 == 0 {
            self.counter += 1;
            None
        } else {
            self.counter += 1;
            Some(self.counter)
        } 
    }
}

Where this produces an iterator: None -> Some(2) -> None -> Some(4) -> None -> Some(6) -> None endlessly.
The concern here is that, because started is set to true on the first call even when that call returns None, intersperse will return an iterator that produces a separator right after the first None:

        if self.started {
            if let Some(v) = self.next_item.take() {
                Some(v)
            } else {
                let next_item = self.iter.next();
                if next_item.is_some() {
                    self.next_item = next_item;
                    Some(self.separator.clone())
                } else {
                    None
                }
            }
        } else {
            self.started = true;
            self.iter.next()
        }

which I assume you thought might produce an iterator like: None -> Some(#) -> Some(2) -> None endlessly (using # as the separator between elements).
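To check this concern, here is the quoted (pre-change) logic run directly against TestCounter, assuming the inner iterator is used as-is rather than wrapped in Fuse. OldIntersperse is a hypothetical stand-in for illustration, not the std type, and 0 stands in for the # separator.

```rust
struct TestCounter {
    counter: usize,
}

impl Iterator for TestCounter {
    type Item = usize;

    fn next(&mut self) -> Option<Self::Item> {
        if self.counter > 5 {
            None
        } else if self.counter % 2 == 0 {
            self.counter += 1;
            None
        } else {
            self.counter += 1;
            Some(self.counter)
        }
    }
}

/// Hypothetical copy of the original logic, WITHOUT fusing the inner iterator.
struct OldIntersperse<I: Iterator> {
    iter: I,
    separator: I::Item,
    next_item: Option<I::Item>,
    started: bool,
}

impl<I> Iterator for OldIntersperse<I>
where
    I: Iterator,
    I::Item: Clone,
{
    type Item = I::Item;

    fn next(&mut self) -> Option<I::Item> {
        if self.started {
            if let Some(v) = self.next_item.take() {
                Some(v)
            } else {
                let next_item = self.iter.next();
                if next_item.is_some() {
                    self.next_item = next_item;
                    Some(self.separator.clone())
                } else {
                    None
                }
            }
        } else {
            // Original behavior: `started` flips to true even if this returns None.
            self.started = true;
            self.iter.next()
        }
    }
}

fn old_sequence() -> Vec<Option<usize>> {
    let mut it = OldIntersperse {
        iter: TestCounter { counter: 0 },
        separator: 0, // stands in for `#`
        next_item: None,
        started: false,
    };
    (0..10).map(|_| it.next()).collect()
}

fn main() {
    // Expected: [None, Some(0), Some(2), None, Some(0), Some(4), None, Some(0), Some(6), None]
    println!("{:?}", old_sequence());
}
```

Without fusing, the separator does show up immediately after the leading None, before 2.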

However, I want to point out that Intersperse::new and IntersperseWith::new (used by intersperse() and intersperse_with()) actually convert the passed-in iter into a fused iterator. You can see it in the file you edited:

impl<I: Iterator> Intersperse<I>
where
    I::Item: Clone,
{
    pub(in crate::iter) fn new(iter: I, separator: I::Item) -> Self {
        Self { started: false, separator, next_item: None, iter: iter.fuse() }
    }
}

impl<I, G> IntersperseWith<I, G>
where
    I: Iterator,
    G: FnMut() -> I::Item,
{
    pub(in crate::iter) fn new(iter: I, separator: G) -> Self {
        Self { started: false, separator, next_item: None, iter: iter.fuse() }
    }
}

So the iter inside Intersperse/IntersperseWith is a fused wrapper around the passed-in iter, which keeps yielding None forever once it has seen the first None.
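A quick way to see what .fuse() does to the TestCounter example (redefined here so the snippet is self-contained): the raw iterator keeps producing items after a None, while the fused one stops at the first None for good. raw_and_fused is a hypothetical helper name.

```rust
struct TestCounter {
    counter: usize,
}

impl Iterator for TestCounter {
    type Item = usize;

    fn next(&mut self) -> Option<Self::Item> {
        if self.counter > 5 {
            None
        } else if self.counter % 2 == 0 {
            self.counter += 1;
            None
        } else {
            self.counter += 1;
            Some(self.counter)
        }
    }
}

fn raw_and_fused() -> (Vec<Option<usize>>, Vec<Option<usize>>) {
    let mut raw = TestCounter { counter: 0 };
    let raw_seq: Vec<Option<usize>> = (0..8).map(|_| raw.next()).collect();

    // `Fuse` guarantees `None` forever once the inner iterator returns `None`,
    // so the later `Some` values of TestCounter are never observed.
    let mut fused = TestCounter { counter: 0 }.fuse();
    let fused_seq: Vec<Option<usize>> = (0..8).map(|_| fused.next()).collect();

    (raw_seq, fused_seq)
}

fn main() {
    let (raw, fused) = raw_and_fused();
    println!("raw:   {:?}", raw);
    println!("fused: {:?}", fused);
}
```

Since TestCounter starts with a None, the fused version never yields anything, so the adapter never gets a chance to emit a separator.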

Unless I'm misunderstanding something, I don't think this is something to be concerned about since Intersperse/IntersperseWith covers that.

@zakarumych
Author

zakarumych commented Feb 20, 2026

@asder8215 thank you for pointing out that Intersperse and IntersperseWith fuse the inner iterator on construction.
I didn't notice that. I only noticed that the FusedIterator implementation on Intersperse requires FusedIterator on the inner iterator, and assumed it is not fused.

Either Fuse should be removed, or the I: FusedIterator bound on the FusedIterator implementation should be relaxed to I: Iterator.

Do you know why Fuse was used in the first place? If it's just because itertools uses it, then why was it used there?

It's not like I care a lot about non-fused iterators. I haven't used them in 10 years. But I love Rust for its attention to detail, and this detail needs addressing IMHO ^_^
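For the second option, a rough sketch of why the bound could be relaxed: if the adapter always stores Fuse<I>, it can implement FusedIterator for any I: Iterator. MyIntersperse, Script and assert_fused are hypothetical names; this mirrors the shape of the std code quoted above but is not the std implementation.

```rust
use std::iter::{Fuse, FusedIterator};

/// Hypothetical adapter shaped like std's `Intersperse`, holding a `Fuse<I>`.
struct MyIntersperse<I: Iterator> {
    started: bool,
    separator: I::Item,
    next_item: Option<I::Item>,
    iter: Fuse<I>,
}

impl<I: Iterator> MyIntersperse<I>
where
    I::Item: Clone,
{
    fn new(iter: I, separator: I::Item) -> Self {
        Self { started: false, separator, next_item: None, iter: iter.fuse() }
    }
}

impl<I: Iterator> Iterator for MyIntersperse<I>
where
    I::Item: Clone,
{
    type Item = I::Item;

    fn next(&mut self) -> Option<I::Item> {
        if self.started {
            if let Some(v) = self.next_item.take() {
                return Some(v);
            }
            let next_item = self.iter.next();
            if next_item.is_some() {
                self.next_item = next_item;
                return Some(self.separator.clone());
            }
            None
        } else {
            self.started = true;
            self.iter.next()
        }
    }
}

// The inner iterator is always fused, so once this adapter returns `None`
// it keeps returning `None`: an `I: FusedIterator` bound is not needed here.
impl<I: Iterator> FusedIterator for MyIntersperse<I> where I::Item: Clone {}

/// Hypothetical non-fused source: replays a script of `Option`s.
struct Script<T>(std::vec::IntoIter<Option<T>>);

impl<T> Iterator for Script<T> {
    type Item = T;
    fn next(&mut self) -> Option<T> {
        self.0.next().flatten()
    }
}

/// Compiles only for types that implement `FusedIterator`.
fn assert_fused<T: FusedIterator>(_: &T) {}

fn relaxed_demo() -> Vec<Option<i32>> {
    let source = Script(vec![Some(1), Some(2), None, Some(3)].into_iter());
    let mut it = MyIntersperse::new(source, 0);
    assert_fused(&it); // `Script` itself does not implement `FusedIterator`
    (0..6).map(|_| it.next()).collect()
}

fn main() {
    println!("{:?}", relaxed_demo());
}
```

The trailing Some(3) from the source is never seen, because Fuse latches on the first None.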

@asder8215
Contributor

asder8215 commented Feb 20, 2026

For reference, this change was introduced in #111379.

I didn't notice that. I only noticed that the FusedIterator implementation on Intersperse requires FusedIterator on the inner iterator, and assumed it is not fused.

My assumption is that the goal of the Intersperse/IntersperseWith iterator is to produce a separator between adjacent Some(v) items only. That's my interpretation of the docs for intersperse/intersperse_with:

/// Creates a new iterator which places a copy of `separator` between adjacent
/// items of the original iterator.

Usually, None is treated as a marker that iteration is finished (though not always; non-fused iterators are exactly the cases where an iterator continues to yield items after returning None). I'm not sure why one would need to use Intersperse/IntersperseWith on an iterator that doesn't guarantee that None means iteration is done (though it might be a good question to ask about this behavior).

Looking again at the code change you made, I realized that this:

        if self.started {
            if let Some(v) = self.next_item.take() {
                Some(v)
            } else {
                let next_item = self.iter.next();
                if next_item.is_some() {
                    self.next_item = next_item;
                    Some(self.separator.clone())
                } else {
                    None
                }
            }
        } else {
            let item = self.iter.next();
            self.started = item.is_some();
            item
        }

will produce an Intersperse/IntersperseWith that still enters the if self.started branch with the example I mentioned (None -> Some(2) -> None -> Some(4) -> None -> Some(6) -> None): self.started is set to false on the first call, then to true on the next call to .next(), and it stays true from then on.
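Replaying the changed else branch against TestCounter (again without .fuse(), with 0 standing in for the separator) shows where separators end up. NewIntersperse and new_sequence are hypothetical names for illustration.

```rust
struct TestCounter {
    counter: usize,
}

impl Iterator for TestCounter {
    type Item = usize;

    fn next(&mut self) -> Option<Self::Item> {
        if self.counter > 5 {
            None
        } else if self.counter % 2 == 0 {
            self.counter += 1;
            None
        } else {
            self.counter += 1;
            Some(self.counter)
        }
    }
}

/// Hypothetical adapter with the proposed `else` branch, WITHOUT fusing.
struct NewIntersperse<I: Iterator> {
    iter: I,
    separator: I::Item,
    next_item: Option<I::Item>,
    started: bool,
}

impl<I> Iterator for NewIntersperse<I>
where
    I: Iterator,
    I::Item: Clone,
{
    type Item = I::Item;

    fn next(&mut self) -> Option<I::Item> {
        if self.started {
            if let Some(v) = self.next_item.take() {
                Some(v)
            } else {
                let next_item = self.iter.next();
                if next_item.is_some() {
                    self.next_item = next_item;
                    Some(self.separator.clone())
                } else {
                    None
                }
            }
        } else {
            // Proposed behavior: stay "not started" until a real item appears.
            let item = self.iter.next();
            self.started = item.is_some();
            item
        }
    }
}

fn new_sequence() -> Vec<Option<usize>> {
    let mut it = NewIntersperse {
        iter: TestCounter { counter: 0 },
        separator: 0, // stands in for `#`
        next_item: None,
        started: false,
    };
    (0..10).map(|_| it.next()).collect()
}

fn main() {
    // Expected: [None, Some(2), None, Some(0), Some(4), None, Some(0), Some(6), None, None]
    println!("{:?}", new_sequence());
}
```

The leading separator is gone, but once started is true, a separator is still emitted before every later item, even across a None gap (the "separator before every item except the first one" behavior).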

I think using a Fuse<I> for the Intersperse/IntersperseWith iterator is fine if we assume that Intersperse/IntersperseWith should end at the first None from the inner iterator (ending meaning that Intersperse/IntersperseWith returns None endlessly from that point on). But if you have something like None -> Some("This") -> Some("is") -> None -> Some("a") -> None -> Some("sentence"), it makes me wonder whether producing a separator on 'adjacency' should extend to every pair of Some(v) items, even when a None occurs between them. I see the use case for Postgres' TimeoutIter, an iterator that returns None without that indicating the end of iteration, but I'm not sure if Rust's standard library considers these kinds of iterator interfaces.

@asder8215
Copy link
Contributor

Oh, another reason they call iter.fuse() seems to be that Iterator::intersperse is based on itertools::intersperse, which also calls iter.fuse(). You can see it here.

@zakarumych
Author

@asder8215 thank you.
I already mentioned itertools; the iterator is probably fused because the implementation is a port from itertools.

However, intersperse does not document that it fuses the inner iterator, nor does it implement FusedIterator unless the inner one is FusedIterator (unlike itertools::Intersperse, which implements FusedIterator regardless).

In the case of an iterator that produces None -> Some("This") -> Some("is") -> None -> Some("a") -> None -> Some("sentence"), interspersed with " ", there are two possible behaviors (apart from stopping at the first None):

1: None -> Some("This") -> Some(" ") -> Some("is") -> None -> Some(" ") -> Some("a") -> None -> Some(" ") -> Some("sentence"), i.e. adding a separator before every item except the first one.

2: None -> Some("This") -> Some(" ") -> Some("is") -> None -> Some("a") -> None -> Some("sentence"), i.e. adding separators only between items within a series.
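A rough sketch of how the two variants differ in code, assuming the adapter does not fuse its input: variant 2 only adds a reset of started whenever the inner iterator returns None. Sketch, Script, run and the reset_on_none flag are hypothetical names for illustration, not proposed API.

```rust
/// Hypothetical non-fusing intersperse; `reset_on_none` selects variant 2.
struct Sketch<I: Iterator> {
    iter: I,
    separator: I::Item,
    next_item: Option<I::Item>,
    started: bool,
    reset_on_none: bool,
}

impl<I> Iterator for Sketch<I>
where
    I: Iterator,
    I::Item: Clone,
{
    type Item = I::Item;

    fn next(&mut self) -> Option<I::Item> {
        if self.started {
            if let Some(v) = self.next_item.take() {
                return Some(v);
            }
            match self.iter.next() {
                Some(v) => {
                    self.next_item = Some(v);
                    Some(self.separator.clone())
                }
                None => {
                    // Variant 2: a `None` ends the current series, so the next
                    // item is treated like a first item (no separator before it).
                    if self.reset_on_none {
                        self.started = false;
                    }
                    None
                }
            }
        } else {
            let item = self.iter.next();
            self.started = item.is_some();
            item
        }
    }
}

/// Hypothetical non-fused source: replays a script, then `None` forever.
struct Script<T>(std::vec::IntoIter<Option<T>>);

impl<T> Iterator for Script<T> {
    type Item = T;
    fn next(&mut self) -> Option<T> {
        self.0.next().flatten()
    }
}

fn run(reset_on_none: bool, calls: usize) -> Vec<Option<&'static str>> {
    let script = vec![None, Some("This"), Some("is"), None, Some("a"), None, Some("sentence")];
    let mut it = Sketch {
        iter: Script(script.into_iter()),
        separator: " ",
        next_item: None,
        started: false,
        reset_on_none,
    };
    (0..calls).map(|_| it.next()).collect()
}

fn main() {
    println!("variant 1: {:?}", run(false, 11));
    println!("variant 2: {:?}", run(true, 9));
}
```

Either way the difference is a single branch, which is consistent with the point that implementing either variant is trivial.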

I like the first variant more.

If fusing is desired, it should be clearly stated.

@asder8215
Contributor

asder8215 commented Feb 20, 2026

I like the first variant more.

I agree with this.

If fusing is desired, it should be clearly stated.

I would recommend bringing this up in the discussion on the issue (or, if there's a Zulip thread for this, that would be an appropriate place to talk as well). I'd also like to hear Mark's thoughts on this.

@Mark-Simulacrum
Copy link
Member

Can we update the documentation on intersperse, so it's easier to read what the guarantee change is? This behavior should be documented either way, I think. It would also be good to add some tests (perhaps after we land on the desired behavior with team input, depending on how easy it is to shift their behavior to match what's agreed on).

I think my intuition is that we should document intersperse as not having any guaranteed behavior for non-fused iterators (i.e., we are free to do anything sound in terms of where separators are introduced). It's possible the methods should require Self: FusedIterator...

I'll nominate for libs-api to discuss as well.

@Mark-Simulacrum Mark-Simulacrum added I-libs-api-nominated Nominated for discussion during a libs-api team meeting. S-waiting-on-t-libs-api Status: Awaiting decision from T-libs-api and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Feb 23, 2026
@the8472
Member

the8472 commented Feb 24, 2026

Should this affect the size_hint if fewer items are emitted?

@joshtriplett
Copy link
Member

We talked about this in today's @rust-lang/libs-api meeting.

We're fine with making this change for now, but we'd like to see tests added to validate the behavior.

Separately, we'd like to see an unresolved question added to the checklist in the tracking issue for intersperse, about deciding if we want to unconditionally .fuse() the iterator.

@asder8215
Copy link
Contributor

asder8215 commented Feb 24, 2026

We're fine with making this change for now, but we'd like to see tests added to validate the behavior.

Just to clarify the change: are you fine with the current behavior, or with one of the following suggestions made by @zakarumych here:

1: None -> Some("This") -> Some(" ") -> Some("is") -> None -> Some(" ") -> Some("a") -> None -> Some(" ") -> Some("sentence"), i.e. adding separator before every item except the first one.
2: None -> Some("This") -> Some(" ") -> Some("is") -> None -> Some("a") -> None -> Some("sentence"), i.e. adding separator between series of items.

JonathanBrouwer added a commit to JonathanBrouwer/rust that referenced this pull request Mar 3, 2026
…hpratt

Clarified doc comments + added tests confirming current behavior for intersperse/intersperse_with

This PR builds on top of rust-lang#152855. I just added clarifying comments to `intersperse`/`intersperse_with` about its guarantees for fused iterators (and how behavior for non-fused iterators is subject to change). I also added in tests for non-fused iterators demonstrating its current behavior; fused iterators are already tested for in existing tests for `intersperse`/`intersperse_with`.
rust-timer added a commit that referenced this pull request Mar 3, 2026
Rollup merge of #153265 - asder8215:intersperse_changes, r=jhpratt

Clarified doc comments + added tests confirming current behavior for intersperse/intersperse_with

This PR builds on top of #152855. I just added clarifying comments to `intersperse`/`intersperse_with` about its guarantees for fused iterators (and how behavior for non-fused iterators is subject to change). I also added in tests for non-fused iterators demonstrating its current behavior; fused iterators are already tested for in existing tests for `intersperse`/`intersperse_with`.
@nia-e
Copy link
Member

nia-e commented Mar 3, 2026

In the @rust-lang/libs-api meeting today we concluded that there might be possible use cases for both, but we lack examples of unfused iterators being used in the wild. cc @asder8215 @zakarumych - which behaviour would be more appropriate?

@zakarumych
Copy link
Author

zakarumych commented Mar 3, 2026

@nia-e My only use case was repeatedly collecting a non-fused iterator into a single collection, where "emitting a separator before every element except the first one" would be more appropriate. But I ended up not using intersperse there.

I can also imagine an equally likely use case of collecting the iterator into separate collections, where emitting separators only between elements within a series would make more sense.

Implementing either behavior is trivial. Explaining the first one in documentation is probably easier.
But I can change it to the second one if the libs team decides it's better.

@tgross35 tgross35 changed the title Do not emit separator as before elements Do not emit separator before elements for Intersperse on non-fused iterators Mar 3, 2026
@asder8215
Copy link
Contributor

asder8215 commented Mar 4, 2026

@nia-e

In the @rust-lang/libs-api meeting today we concluded that there might be possible use cases for both, but we lack examples of unfused iterators being used in the wild.

Note that I don't personally use non-fused iterators, but I do know that @cuviper mentioned in this discussion that TryIter in std::sync::mpsc is an example of a non-fused iterator. TryIter uses try_recv().ok() for its .next() implementation, which means None here may only indicate that no data was available at that moment (and instead we could check something else to see if we're done iterating; I do notice that there's no is_disconnected method for mpsc channels though...). For TryIter or similar iterators, intersperse/intersperse_with might be useful for adding a separator (or certain separators) between values received from a separate thread.
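The non-fused behavior of TryIter is easy to observe without threads: a value sent after the iterator has already returned None is still picked up by the same iterator. A minimal single-threaded sketch using only std::sync::mpsc; try_iter_sequence is a hypothetical helper name.

```rust
use std::sync::mpsc::channel;

/// Shows that `TryIter` can yield `Some` after having yielded `None`.
fn try_iter_sequence() -> Vec<Option<i32>> {
    let (sender, receiver) = channel::<i32>();
    let mut iter = receiver.try_iter();
    let mut seen = Vec::new();

    // Nothing has been sent yet: `try_recv` fails immediately, so `next()` is `None`.
    seen.push(iter.next());

    // The sender is still alive, so the same iterator picks up new values.
    sender.send(1).unwrap();
    sender.send(2).unwrap();
    seen.push(iter.next());
    seen.push(iter.next());

    // Buffer drained again: back to `None`, even though more could arrive later.
    seen.push(iter.next());
    seen
}

fn main() {
    // Expected: [None, Some(1), Some(2), None]
    println!("{:?}", try_iter_sequence());
}
```

Wrapping this iterator in .fuse() (as the current intersperse does) would turn the first None into a permanent end, so 1 and 2 would never be observed.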

Taking the example from the TryIter doc and modifying it, here's a dummy example of how intersperse could play a role with unfused iterators (though it's not really a useful example and could definitely have been written better):

#![feature(iter_intersperse)]
fn main() {
    use std::cell::Cell;
    use std::sync::mpsc::channel;
    use std::thread;
    use std::time::Duration;

    let (sender, receiver) = channel();

    // Nothing is in the buffer yet
    assert!(receiver.try_iter().next().is_none());
    println!("Nothing in the buffer...");

    thread::spawn(move || {
        sender.send("This").unwrap();
        sender.send("is").unwrap();
        sender.send("a").unwrap();
        sender.send("sentence").unwrap();
        sender.send("DONE").unwrap();
    });

    // Now imagine this without sleeping the thread here
    // and the sender thread taking a while in between
    // sends: `try_recv` does not block, so the iterator
    // returns `None` whenever the buffer is empty
    // thread::sleep(Duration::from_secs(2));

    let prev_word: Cell<Option<&str>> = Cell::new(None);
    // Makes me realize that it would be useful if `intersperse_with` had an
    // immutable borrow to current `Some(value)` it saw from its inner iter
    // and decide what separator to add
    let mut recv_intersperse_iter = receiver.try_iter().intersperse_with(|| {
        if let Some(_) = prev_word.get() {
            " "
        } else {
            // this is an impossible case to reach with the current
            // implementation of `intersperse` turning the iter into
            // a fused iterator
            "" 
        }
    });

    loop {
        prev_word.set(recv_intersperse_iter.next());
        // I would prefer checking to see if the `Receiver`
        // is disconnected here tbh, otherwise I need to use
        // smth else to signal I'm done.
        if let Some(word) = prev_word.get() {
            if word == "DONE" {
                println!();
                break;
            } else {
                print!("{}", word);
            }
        }
    }
}

This would print "This is a sentence " in the terminal (assuming all messages are already buffered when iteration starts, e.g. with the sleep uncommented; the extra space after sentence is not ideal, but eh). It's a silly example, but there is an unfused iterator in the std library and I'm sure someone may find their own niche use for this. I just think it'd be nice to have the flexibility of letting intersperse/intersperse_with work with unfused iterators naturally (without transforming the unfused iterator into a fused one). I much prefer the first behavior mentioned by @zakarumych here:

In the case of an iterator that produces None -> Some("This") -> Some("is") -> None -> Some("a") -> None -> Some("sentence"), interspersed with " ", there are two possible behaviors (apart from stopping at the first None).
1: None -> Some("This") -> Some(" ") -> Some("is") -> None -> Some(" ") -> Some("a") -> None -> Some(" ") -> Some("sentence"), i.e. adding separator before every item except the first one.

The first behavior seems like the more intuitive thing intersperse/intersperse_with should do on non-fused iterators and I agree with Zakarumych that this is easier to document.


Labels

I-libs-api-nominated Nominated for discussion during a libs-api team meeting. S-waiting-on-t-libs-api Status: Awaiting decision from T-libs-api T-libs Relevant to the library team, which will review and decide on the PR/issue.


7 participants