
concatMapIterable could be enhanced to discard iterator remainder #2014

Closed
simonbasle opened this issue Jan 14, 2020 · 6 comments

Comments

@simonbasle
Contributor

From #1925:

I can't verify because the step before bufferUntil is concatMapIterable, which produces a collection of allocated items, and those don't seem to pass through doOnDiscard in case of a downstream error. Could there be an issue with concatMapIterable?

Here is a simplified test:

import java.util.Arrays;

import org.junit.jupiter.api.Test;

import reactor.core.publisher.Flux;
import reactor.test.StepVerifier;

import static org.assertj.core.api.Assertions.assertThat;

@Test
void concatMapIterableDoOnDiscardTest() {

	Foo foo1 = new Foo();
	Foo foo2 = new Foo();
	Foo foo3 = new Foo();

	Flux<Foo> source = Flux.just(1)
			.concatMapIterable(i -> Arrays.asList(foo1, foo2, foo3))
			.doOnDiscard(Foo.class, Foo::release);

	// cancel after consuming (and releasing) only the first element
	StepVerifier.create(source)
			.consumeNextWith(foo -> {
				foo.release();
			})
			.thenCancel()
			.verify();

	assertThat(foo1.getRefCount()).isEqualTo(0); // okay
	assertThat(foo2.getRefCount()).isEqualTo(0); // fails
	assertThat(foo3.getRefCount()).isEqualTo(0); // fails
}

static class Foo {

	int refCount = 1;

	public int getRefCount() {
		return this.refCount;
	}

	public void release() {
		this.refCount = 0;
	}
}

Originally posted by @rstoyanchev in #1925 (comment)

@simonbasle
Contributor Author

Currently concatMapIterable doesn't attempt to walk the remainder of the current iterator to discard further elements upon cancellation.

We could attempt to discard the iterator on top of the prefetched source elements.

This is tricky though, since an arbitrary Iterable/Iterator can be lazy and infinite...

In order to implement a best-effort solution, we can retain the Class<Iterable> and inspect it, only attempting to discard the remainder of the Iterator if the original Iterable is either a Collection or a Tuple2.
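
A minimal sketch of that best-effort check, assuming a hypothetical discardRemainder helper and a discard hook standing in for the operator's discard support (not the actual reactor-core internals):

import java.util.Collection;
import java.util.Iterator;
import java.util.function.Consumer;

// Hypothetical helper: only drain the leftover elements when the original
// Iterable is known to be finite (a Collection here; Tuple2, being backed by
// a fixed-size array, could be whitelisted the same way).
static <T> void discardRemainder(Iterable<T> original, Iterator<T> remainder,
		Consumer<? super T> discardHook) {
	if (original instanceof Collection) {
		while (remainder.hasNext()) {
			discardHook.accept(remainder.next());
		}
	}
	// any other Iterable could be lazy or infinite, so we leave it untouched
}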

@bsideup
Contributor

bsideup commented Jan 14, 2020

@simonbasle simonbasle added this to the 3.2.15.RELEASE milestone Jan 14, 2020
@simonbasle
Contributor Author

The estimateSize() from Spliterator might be a perfect way of avoiding the infinite remainder case. We could switch the internal implementation from an Iterator-based one to a Spliterator-based one.
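
For illustration, a sketch of how the SIZED characteristic (which backs a meaningful estimateSize()) could be used to rule out potentially infinite sources before draining; this is an assumption about the approach, not the final implementation:

import java.util.Spliterator;

// A SIZED spliterator reports an exact estimateSize(), so the remainder is
// guaranteed to be finite and safe to iterate for discarding purposes.
static boolean safeToDrain(Iterable<?> iterable) {
	Spliterator<?> sp = iterable.spliterator();
	return sp.hasCharacteristics(Spliterator.SIZED);
}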

@rstoyanchev
Contributor

What would that mean for the API? Currently it takes a mapper that returns an Iterable.

@bsideup
Contributor

bsideup commented Jan 16, 2020

Every Iterable can be converted into a Spliterator:

https://docs.oracle.com/javase/8/docs/api/java/lang/Iterable.html#spliterator--
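
To illustrate with plain JDK behaviour: a Collection's spliterator reports SIZED, whereas the default Iterable#spliterator(), built on top of iterator(), does not, which is what the finiteness check discussed here relies on.

import java.util.Arrays;
import java.util.List;
import java.util.Spliterator;

List<Integer> list = Arrays.asList(1, 2, 3);
// Collections report SIZED, so the remainder is known to be finite
boolean sizedFromCollection = list.spliterator().hasCharacteristics(Spliterator.SIZED); // true

// the default Iterable#spliterator() wraps iterator(), is NOT SIZED,
// and its estimateSize() simply returns Long.MAX_VALUE
Iterable<Integer> lazy = list::iterator;
boolean sizedFromDefault = lazy.spliterator().hasCharacteristics(Spliterator.SIZED); // false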

simonbasle added a commit that referenced this issue Jan 23, 2020
…from iterator

If both an Iterator and a Spliterator can be generated for each of the
processed Iterables, then the Spliterator is used to ensure the Iterable
is SIZED. This allows us to safely assume we can iterate over the
remainder of the iterator when cancelling, in order to discard its
elements that weren't emitted.

Not doing this check would likely cause trouble with infinite discarding
loops in the case of infinite Iterables (which is technically possible).

For Streams, since both the iterator() and spliterator() methods terminate the Stream, we only generate the Spliterator. We use it to check SIZED and then wrap it in an Iterator adapter for iteration (which is what BaseStream does by default).

Note that using a Spliterator to drive the internal iteration doesn't work that well, since the default Iterable#spliterator isn't SIZED and its estimateSize() method doesn't behave like hasNext(). Iterator#hasNext is far better suited for looking one element ahead of the emitted element, to trigger onComplete immediately after the last onNext.
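
A rough sketch of the Stream handling described above (illustrative names, not the actual operator code): spliterator() is the single terminal call made on the Stream, then the Spliterator is checked for SIZED and adapted back to an Iterator.

import java.util.Iterator;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.stream.Stream;

static <T> Iterator<T> adaptStream(Stream<T> stream) {
	Spliterator<T> sp = stream.spliterator();  // the only terminal call on the Stream
	boolean knownFinite = sp.hasCharacteristics(Spliterator.SIZED);
	// knownFinite would be kept by the operator to decide whether the
	// remainder may be drained and discarded on cancellation
	return Spliterators.iterator(sp);          // the same adapter BaseStream#iterator uses
}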
@simonbasle
Contributor Author

See more details in #2021, but going the Spliterator route the whole way doesn't work very well. For instance, the default Iterable#spliterator() creates a Spliterator over the Iterator, but its estimateSize() doesn't reflect hasNext() == false (it always returns Long.MAX_VALUE), which is a check that we absolutely need for correct termination of the Flux.

An approach mixing Iterator-based consumption of the Iterable with a peek at Spliterator#getExactSizeIfKnown() works better; at least the latter won't give us false positives on which iterators we can "drain" and discard in case of cancellation.
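
A sketch of that mixed approach, assuming a plain Iterable (for a Stream only the Spliterator is generated, as described in the commit above); the method name is illustrative:

// The Iterator still drives emission; the Spliterator is only peeked at once.
// getExactSizeIfKnown() returns -1 unless the spliterator is SIZED, so it
// cannot yield a false positive on which iterators are safe to drain.
static boolean canDrainOnCancel(Iterable<?> iterable) {
	return iterable.spliterator().getExactSizeIfKnown() != -1L;
}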

simonbasle added a commit that referenced this issue Feb 7, 2020
In this change, the goal is to discard elements of the Iterable that haven't been processed yet. The challenge is to avoid attempting to do so for _infinite_ Iterables (which would lead to infinite discarding loops).

If the Iterable is a Collection, it should be finite.

If both an Iterator and a Spliterator can be generated for each of the
processed Iterables, then the Spliterator is used to ensure the Iterable
is SIZED. This allows us to safely assume we can iterate over the
remainder of the iterator when cancelling, in order to discard its
elements that weren't emitted.

For Streams, since both the iterator() and spliterator() methods terminate the Stream, we only generate the Spliterator. We use it to check SIZED and then wrap it in an Iterator adapter for iteration (which is what BaseStream does by default).

Implementation Notes
----
We didn't fully switch to using a Spliterator to drive the internal iteration. It doesn't work that well, since the Iterable#spliterator default implementation isn't SIZED and its estimateSize() method does not behave like hasNext(). Iterator#hasNext is far better suited for looking one element ahead of the emitted element, to trigger onComplete immediately after the last onNext.
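
A minimal sketch of the hasNext() look-ahead mentioned in these notes (request tracking and error handling omitted; Subscriber comes from Reactive Streams, the method name is illustrative):

import java.util.Iterator;

import org.reactivestreams.Subscriber;

static <T> void fastPathDrain(Iterator<T> it, Subscriber<? super T> actual) {
	while (it.hasNext()) {
		T value = it.next();
		boolean last = !it.hasNext(); // look one element ahead before emitting
		actual.onNext(value);
		if (last) {
			actual.onComplete();      // fires immediately after the last onNext
			return;
		}
	}
	actual.onComplete();              // the Iterable was empty to begin with
}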
simonbasle added a commit that referenced this issue Feb 7, 2020
simonbasle added a commit to OlegDokuka/reactor-core that referenced this issue Apr 24, 2020
This commit improves the javadoc from reactor#2014:
 - add the javadoc discard tags that were missed on `fromIterable` and
 `fromStream`
 - align the javadocs of `concatMapIterable` and `flatMapIterable` since
 they are aliases

It also improves the wording and clarifies that `flatMapIterable` and
`fromIterable` discard support can lead to multiple `iterator()` calls.

Fixes reactor#2127
simonbasle added a commit that referenced this issue Apr 27, 2020
This commit improves the javadoc from #2014:
 - add the javadoc discard tags that were missed on `fromIterable` and
 `fromStream`
 - align the javadocs of `concatMapIterable` and `flatMapIterable` since
 they are aliases

It also improves the wording and clarifies that `flatMapIterable` and
`fromIterable` discard support can lead to multiple `iterator()` calls.

Fixes #2127