3.x: groupBy may never cancel the upstream if the group is dropped #6596
Comments
Thanks for raising this @akarnokd. Reminds me of #5839, where you used the term resource-aware streams. Making RxJava resource-aware might be an option, so that queueing operators would apply some function (global? local? interface-defined?) to unused buffered elements. I've hundreds of uses of Has anyone got a use case where they call
We use something like a hook for unconsumed data (discarded). The problem is still that sometimes we cannot guarantee the parent is not emitting some extra onNext. We had that problem with MonoCollectList in particular, but it can apply to other operators. One thing I was wondering was whether we could make cancel propagate back an onComplete to delimit the onNext sequence consistently. Regarding your case @davidmoten, I've mostly seen cases where we do
Thanks @smaldini. My question is not great at the moment. Really, I want to establish whether any user has a use case to insert a non-trivial operator between
That's interesting and certainly makes resource-aware streams more difficult.
I don't think resource awareness would help in this situation. When the group is composed over ( Resource-awareness is its own complicated thing and requires drastic changes to the architecture. I'd like to explore it someday, but outside of RxJava.
Yep, indeed. What about the internalization of
Like with
Summary
The design decision to allow a delayed `subscribe()` to a group emitted by `groupBy` (i.e., via `subscribeOn`) creates a window where, if the consumer ignores the group, `groupBy` may never cancel its source.
Problem
In order to support taking a limited number of groups (e.g., `source.groupBy(i -> i % 10).take(2)`), the `groupBy` operator can't cancel its source just because the downstream cancelled the flow of groups. Instead, a reference-counting scheme is used so that the upstream is cancelled only once all groups have been cancelled as well (e.g., `source.groupBy(i -> i % 2).take(2).flatMap(g -> g.take(2))`).
The documentation states that one should not ignore a group:
However, some operators may ignore their input: they see only an object, not a group, so the group is never consumed and the source is never cancelled.
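To make the reference counting concrete, here is a minimal, hypothetical model in plain Java. It is not RxJava's actual `groupBy` implementation; the class and method names are made up for illustration. It assumes one reference for the flow of groups plus one per emitted group, and cancels the upstream only when the count drops to zero:

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical model of groupBy's reference counting (not RxJava's actual code). */
public class GroupRefCount {
    // One reference for the main (groups) subscriber plus one per emitted group.
    private final AtomicInteger refs = new AtomicInteger(1);
    private volatile boolean upstreamCancelled;

    /** Called when groupBy emits a new group. */
    public void groupCreated() {
        refs.incrementAndGet();
    }

    /** Called when a group's consumer cancels it. */
    public void groupCancelled() {
        release();
    }

    /** Called when the downstream cancels the flow of groups, e.g. via take(2). */
    public void mainCancelled() {
        release();
    }

    private void release() {
        if (refs.decrementAndGet() == 0) {
            upstreamCancelled = true; // only now is the source cancelled
        }
    }

    public boolean isUpstreamCancelled() {
        return upstreamCancelled;
    }

    public static void main(String[] args) {
        // Like groupBy(...).take(2).flatMap(g -> g.take(2)): both groups are consumed.
        GroupRefCount consumed = new GroupRefCount();
        consumed.groupCreated();
        consumed.groupCreated();
        consumed.mainCancelled();   // take(2) cancels the flow of groups
        consumed.groupCancelled();  // g.take(2) cancels group 1
        consumed.groupCancelled();  // g.take(2) cancels group 2
        System.out.println(consumed.isUpstreamCancelled()); // true

        // Same flow, but the second group is dropped without ever being subscribed.
        GroupRefCount dropped = new GroupRefCount();
        dropped.groupCreated();
        dropped.groupCreated();
        dropped.mainCancelled();
        dropped.groupCancelled();   // only group 1 ever releases its reference
        System.out.println(dropped.isUpstreamCancelled()); // false: the source leaks
    }
}
```

In the second scenario the ignored group never releases its reference, so the count never reaches zero and the upstream stays active.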
Unfortunately, `flatMap` is one such operator: once cancelled, it ignores incoming values, and thus groups. Add an asynchronous cancellation race and the problem manifests sooner or later (`source.groupBy.take.flatMap.takeUntil(cancelSignal)`).
In the original Rx.NET implementation, groups are reference counted, and if the consumer doesn't subscribe to a group immediately, the group is discarded right then. However, Rx.NET groups don't cache items, so a delayed subscription results in the data loss RxJava set out to avoid.
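The trade-off between the two designs can be sketched with a toy group that either drops or caches values emitted before anyone subscribes. This is a hypothetical, single-threaded model; `Group`, `lateSubscribe` and the `caching` flag are illustrative names, not Rx.NET or RxJava API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical model contrasting a non-caching group with a caching one. */
public class DelayedGroupSubscribe {

    static final class Group {
        private final boolean caching;
        private final List<Integer> buffer = new ArrayList<>();
        private Consumer<Integer> subscriber; // null until someone subscribes

        Group(boolean caching) {
            this.caching = caching;
        }

        void onNext(int value) {
            if (subscriber != null) {
                subscriber.accept(value);
            } else if (caching) {
                buffer.add(value); // kept for a late subscriber, but the group stays pinned
            }
            // Otherwise a non-caching group silently drops the value.
        }

        void subscribe(Consumer<Integer> s) {
            subscriber = s;
            buffer.forEach(s); // replay anything cached before the subscription
            buffer.clear();
        }
    }

    /** Emits one value into a fresh group, then subscribes late; returns what was received. */
    static List<Integer> lateSubscribe(boolean caching) {
        Group group = new Group(caching);
        group.onNext(42);               // groupBy delivers the group's first value
        List<Integer> received = new ArrayList<>();
        group.subscribe(received::add); // subscription delayed, e.g. by subscribeOn
        return received;
    }

    public static void main(String[] args) {
        System.out.println("non-caching: " + lateSubscribe(false)); // []
        System.out.println("caching:     " + lateSubscribe(true));  // [42]
    }
}
```

The caching variant keeps the value, but the same cache is what keeps an ignored group, and with it the upstream, alive.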
Proposition
I propose a change to the `groupBy` logic that solves this cancellation problem without losing data. The solution requires several considerations.
First, we need to detect whether there was a `subscribe` call when a fresh group was emitted. If not, the group is discarded after its single value, and a completion signal is emitted to the group. This way, `groupBy` is not held back by a potentially unconsumed group, and if the group is eventually consumed, the value is not lost.
The drawback is that this scheme may lead to a group being recreated over and over, even if it is actually subscribed to in a delayed fashion. Note, however, that since a group is practically a hot subject, using `subscribeOn` generally has no practical benefit, and consumers should apply `observeOn` to shift the emission to the desired thread anyway.
Second, there is an inherent race between an async subscriber and deciding whether the group has been subscribed to just in time. Therefore, an atomic state transition has to be implemented to declare a group live or dead on arrival. In addition, the completion of the dead group and a possible cancellation by its consumer should not trigger multiple cancellations or group removals, and in particular must not remove a newer group with the same key.
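The first two points could be sketched with a compare-and-set on a per-group state. This is a hypothetical model, not RxJava code: the `FRESH`/`LIVE`/`ABANDONED` states, `checkArrival` and `cleanup` are made-up names, and releasing the group's reference is represented by a simple counter. `ConcurrentHashMap.remove(key, value)` is used so that cleaning up a dead group cannot remove a newer group with the same key:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch of the "live or dead on arrival" decision; names are illustrative. */
public class GroupArrival {
    static final int FRESH = 0, LIVE = 1, ABANDONED = 2;

    static final class Group {
        final int key;
        final AtomicInteger state = new AtomicInteger(FRESH);
        final AtomicBoolean removed = new AtomicBoolean(); // guards one-time cleanup

        Group(int key) {
            this.key = key;
        }

        /** Consumer side: wins only if groupBy has not yet declared the group dead.
         *  A losing (late) subscriber would still receive the cached item and completion. */
        boolean trySubscribe() {
            return state.compareAndSet(FRESH, LIVE);
        }
    }

    final Map<Integer, Group> groups = new ConcurrentHashMap<>();
    final AtomicInteger cleanups = new AtomicInteger(); // stands in for reference releases

    /** groupBy side: called right after the new group was handed to the downstream. */
    boolean checkArrival(Group g) {
        if (g.state.compareAndSet(FRESH, ABANDONED)) {
            // Nobody subscribed in time: the group would be completed (its single
            // value stays cached) and detached from groupBy.
            cleanup(g);
            return false;
        }
        return true; // the consumer won the race; the group is live
    }

    /** Shared by dead-on-arrival completion and a later cancel from the consumer. */
    void cleanup(Group g) {
        if (g.removed.compareAndSet(false, true)) {
            groups.remove(g.key, g); // leaves a newer group with the same key untouched
            cleanups.incrementAndGet();
        }
    }

    public static void main(String[] args) {
        GroupArrival gb = new GroupArrival();

        Group first = new Group(1);
        gb.groups.put(1, first);
        System.out.println(gb.checkArrival(first));   // false: dead on arrival

        Group second = new Group(1);                  // next item with key 1 recreates the group
        gb.groups.put(1, second);
        System.out.println(second.trySubscribe());    // true: subscribed in time
        System.out.println(gb.checkArrival(second));  // true: live

        first.trySubscribe();                         // late subscriber loses the CAS...
        gb.cleanup(first);                            // ...and its cancel is a no-op
        System.out.println(gb.groups.get(1) == second); // true
        System.out.println(gb.cleanups.get());          // 1
    }
}
```

Whichever side performs its compare-and-set first decides the group's fate, and the `removed` flag makes the cleanup idempotent regardless of whether completion or cancellation reaches it first.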
Third, when a `GroupedFlowable` is declared dead on arrival and then consumed later, the consumption of the cached item should not trigger a request from the main source the way consumption from a live group does. In contrast, declaring a group dead should request replenishment from the main source, as we can no longer know whether the group will ever be consumed.
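The request accounting in the third point might look like the following hypothetical sketch, where an `AtomicLong` stands in for calls to the upstream's `request(n)` and each consumed item is assumed to be replenished one-for-one:

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch of upstream request accounting for live vs. dead-on-arrival groups. */
public class GroupReplenish {
    // Total items requested from the main source so far (stands in for upstream.request(n)).
    final AtomicLong requestedUpstream = new AtomicLong();

    static final class Group {
        final boolean deadOnArrival;

        Group(boolean deadOnArrival) {
            this.deadOnArrival = deadOnArrival;
        }
    }

    /** groupBy declares a group dead: replenish right away, as it may never be consumed. */
    void onDeclaredDead(Group g) {
        requestedUpstream.incrementAndGet();
    }

    /** A group's consumer takes one item out of the group. */
    void onGroupItemConsumed(Group g) {
        if (!g.deadOnArrival) {
            requestedUpstream.incrementAndGet(); // normal replenishment for a live group
        }
        // Dead group: its item was already replenished in onDeclaredDead, so
        // requesting again here would over-request from the main source.
    }

    public static void main(String[] args) {
        GroupReplenish gb = new GroupReplenish();
        Group live = new Group(false);
        Group dead = new Group(true);

        gb.onGroupItemConsumed(live); // +1
        gb.onDeclaredDead(dead);      // +1 immediately
        gb.onGroupItemConsumed(dead); // late consumer of the cached item: +0
        System.out.println(gb.requestedUpstream.get()); // 2
    }
}
```

Replenishing at the moment of declaring the group dead, rather than at consumption, keeps the main source flowing even if the cached item is never read, while the guard prevents the same item from being replenished twice.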