How best to release elements in a ReplaySubject? #1794

Closed
simonbasle opened this issue Oct 24, 2014 · 17 comments

Comments

@simonbasle
Contributor

Hi,

We're using ReplaySubject to keep a streamed response from a server into a buffer, in order to be able to pass the whole (finite) sequence to observables that subscribe to the subject.

The whole sequence may take a bit of time to complete, but we want to pass the Observable to client code right away. Furthermore, client code may not immediately subscribe to the flow, or it could subscribe several times to process the sequence more than once.

Hence the use of ReplaySubject: it avoids replaying the original request each time, and avoids blocking until the server has responded with the whole sequence.
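
Roughly, the current shape is this (a sketch with placeholder names, not our actual code):

ReplaySubject<Row> rows = ReplaySubject.create();

// the networking layer feeds chunks in as they arrive:
// rows.onNext(chunk); ... rows.onCompleted();

// client code gets the Observable immediately and may subscribe late,
// or several times, without the request being re-issued
Observable<Row> observable = rows;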

However, said elements are resources that are reference counted and should be manually freed before being garbage collected. What do you think is the best approach to cope with that?

Maybe what we need is to add semantics to ReplaySubject for cleaning up and clearing the buffer (like a dispose() or release() method)? The bounded ReplaySubject would probably also need a way to plug in cleanup behavior at eviction time...

Maybe there is a better alternative to ReplaySubject? Or a simple solution, like for-eaching over the items and cleaning them up that way?

I hope someone will be able to make a suggestion, and I'd be delighted to contribute if it makes sense ;) Thanks!

@akarnokd
Member

I think for-each is your best option now. Neither multicast nor refCount supports an unsubscription action, and ReplaySubject doesn't expose its buffer in any safe manner.
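
For example, something like this (a minimal sketch; release() stands in for whatever cleanup call your resource exposes):

// once the sequence has completed and no consumer will replay it again,
// subscribe one last time purely to release every buffered element
replaySubject.subscribe(element -> element.release());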

@simonbasle
Contributor Author

I see... Would it be worth the effort to try to implement and expose buffer-releasing mechanics in ReplaySubject?

@benjchristensen
Member

Using Subjects directly is almost never the desired solution, as they don't compose with backpressure or resource cleanup (unsubscription). Subjects are very imperative in use and "hot" in how data flows through them.

For example, cache() internally uses a ReplaySubject, but that buffer is only reclaimed by garbage collection once references to the Observable are released. replay().refCount() is even better: once all Subscribers are done it unsubscribes up the chain and disposes of the underlying ReplaySubject.
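
For illustration, a minimal sketch of the two variants (RxJava 1.x, with a stand-in source Observable<Data>):

// cache(): subscribes upstream once and buffers everything; the buffer is only
// reclaimed when the cached Observable itself becomes unreachable and is GC'd
Observable<Data> cached = source.cache();

// replay().refCount(): shares one upstream subscription among concurrent
// Subscribers and, once the last one unsubscribes, unsubscribes upstream and
// drops the underlying ReplaySubject
Observable<Data> shared = source.replay().refCount();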

Your use case, however, sounds like one where you should use Observable.create to represent your data source, since that lets you manage the cleanup as you need to (which matters here, given that you have to manage the reference counting).

Thus, I suggest something like the following:

import java.util.Arrays;
import java.util.Collection;

import rx.Observable;
import rx.schedulers.Schedulers;
import rx.subscriptions.Subscriptions;

public class ObservableCreateWithCleanup {

    public static void main(String[] args) {
        System.out.println("fetch slow data...");
        getData().toBlocking().forEach(System.out::println);
        // sleep so the JVM doesn't shut down before we see cleanup
        try {
            Thread.sleep(200);
        } catch (InterruptedException e) {
        }
    }

    public static Observable<Data> getData() {
        return Observable.<Data> create(subscriber -> {
            try {
                Collection<Data> data = slowSynchronousCall();
                // completely ignoring backpressure and not using setProducer for simplicity here
                for (Data d : data) {
                    subscriber.onNext(d);
                }
                subscriber.onCompleted();
                // register the cleanup action to run when the Subscriber unsubscribes
                subscriber.add(Subscriptions.create(() -> {
                    try {
                        // do the manual refcount release
                        for (Data d : data) {
                            d.release();
                        }
                    } catch (Throwable e) {
                        // never throw while unsubscribing
                        e.printStackTrace();
                    }
                }));
            } catch (Throwable e) {
                subscriber.onError(e);
            }
        }).subscribeOn(Schedulers.io());
    }

    /**
     * Assuming the slow call is synchronous, and returning a single collection
     * rather than streaming items into the Observable.
     */
    public static Collection<Data> slowSynchronousCall() {
        try {
            // simulate slow
            Thread.sleep(1000);
        } catch (InterruptedException e) {
        }
        return Arrays.asList(new Data(1), new Data(2), new Data(3));
    }

    public static class Data {
        private final int id;

        public Data(int id) {
            this.id = id;
        }

        public void release() {
            System.out.println("Releasing " + toString());
        }

        @Override
        public String toString() {
            return "Data => " + id;
        }
    }
}

This outputs:

fetch slow data...
Data => 1
Data => 2
Data => 3
Releasing Data => 1
Releasing Data => 2
Releasing Data => 3

@daschl
Contributor

daschl commented Oct 25, 2014

@benjchristensen @simonbasle is working with me on the Java client now.

The reason we resorted to a Subject is that we need to create it, pass it over to Netty, and then fill it from the event loop as server chunks come in. I implemented a JSON streaming parser that writes chunks into the subject.

Since we're giving the Subject (cast to an Observable) back to the user, we can't influence their behavior, but we can wrap or extend those subjects as needed since they are never exposed as Subjects to the user.

The main problem here is that we expose Netty ByteBufs through the subject, and if we don't clean them up properly, Netty's leak detector gets unhappy.

@benjchristensen
Member

Hi Michael, I figured it was Netty ByteBufs. Does the solution I offered not work? Subjects are not good to use because they break the subscription and backpressure composition.

If my example doesn't work, can you modify the code to show how it doesn't?

@daschl
Contributor

daschl commented Oct 27, 2014

@benjchristensen hmm, maybe I didn't grasp the whole idea. Are you suggesting that

a) we should remove the subject and use the Observable.create directly? (that may or may not work in our case, but it will require a larger rewrite of that codepath)

b) we wrap the subject with Observable.create, return the wrapper to the user, and handle the releasing in the wrapper once the items have been emitted?
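
For concreteness, a rough sketch of what I imagine for (b), with subject as a placeholder. It deliberately releases each buffer as soon as it has been forwarded, so downstream code could not cache it:

Observable<ByteBuf> wrapped = Observable.<ByteBuf>create(child ->
    child.add(subject.subscribe(
        buf -> {
            try {
                child.onNext(buf);
            } finally {
                buf.release(); // release once the wrapper has forwarded it
            }
        },
        child::onError,
        child::onCompleted)));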

@simonbasle
Contributor Author

@benjchristensen the problem is that we are not dealing with just a blocking operation. The request has been sent to the server and it is already responding, in a streaming fashion, by sending back chunks. This is all done inside the core layer, then consumed by the Java SDK layer.

Upon receiving the first chunk from the server, we return a response with metadata and an Observable to the Java SDK layer (in fact a ReplaySubject; let's call it rows).

The next server chunk will probably be our first ByteBuf (the first row), which will be fed into the ReplaySubject.

At this point, client code may not yet have subscribed to the rows Observable, yet we need to process and store the beginning of the response; hence the ReplaySubject (1).

Also, down the road the final client code (using the Java SDK layer) may want to process the sequence several times without incurring the cost of calling the server again. They would do that by resubscribing to a different flow starting from rows. Once again the ReplaySubject kicks in (2).

I see that there is a BufferUntilSubscriber subject implementation. Maybe this could be a better match for part (1) of our use case above?
I've explored doing this temporary caching, then transcoding on the client side and releasing the ByteBuf before caching again:

//this is java sdk layer code, we adapt the response from core and pass it to final client code
//coreResponse contains metadata like total size of sequence + rows observable
Observable<ByteBuf> rows = coreResponse.rows(); //this was ReplaySubject, now BufferUntilSubscriber subject (1)
int rowCount = coreResponse.rowCount();
Observable<JsonObject> clientRows = rows
    .map(ByteBufToJsonObject::transcode) //this also releases the ByteBuf
    .cache(rowCount); //here we cache on client side (2)
//calling code of the client library will be able to subscribe to clientRows multiple times and process rows as JSON
return new ClientResponse(ResultEnum.SUCCESS, rowCount, clientRows);
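
For reference, a sketch of what the transcode helper does (the actual JSON decoding is elided here; the relevant part is that the ByteBuf is released in a finally block, so it is freed even if decoding throws):

import io.netty.buffer.ByteBuf;
import io.netty.util.CharsetUtil;

public final class ByteBufToJsonObject {

    public static JsonObject transcode(ByteBuf buf) {
        try {
            String json = buf.toString(CharsetUtil.UTF_8);
            return decode(json); // parsing into our JsonObject elided
        } finally {
            buf.release();
        }
    }

    private static JsonObject decode(String json) {
        throw new UnsupportedOperationException("actual JSON parsing goes here");
    }
}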

Does that give you a better grasp of what we're trying to achieve? Do you think it makes more sense to do things this way?

@benjchristensen
Member

Are you suggesting that

a) we should remove the subject and use the Observable.create directly? (that may or may not work in our case, but it will require a larger rewrite of that codepath)

Yes, use of Subject for anything other than hot streams is almost always the source of problems if resource cleanup and backpressure are needed. Subjects are always "hot".

down the road the final client code (using java sdk layer) may want to process the sequence several time, without the cost of calling the server again occurring

The use of ReplaySubject would generally be okay for your use case (as long as it is hidden as an implementation detail, such as by using observable.replay()), except that your use case is unusual in needing to manually release memory instead of counting on GC to do it.

(note that backpressure is still not supported on replay, that is one of the few places still to be done)

BufferUntilSubscriber is an internal implementation detail, so not safe to use (it could break at any time). It also only achieves part of what you want (buffering until the first subscribe, not for multiple subscribes).

You can model your solution after BufferUntilSubscriber but you'll want to register your cleanup inside it to release all the ByteBufs. You also have the risk of someone kicking off the work and then never subscribing and thus never releasing the ByteBufs ... not a problem when GC cleans up the BufferUntilSubscriber normally, but a problem for refCounted objects. This is another reason why Observable.create would be preferable so that you only do the work when there is a Subscriber that you can register an unsubscribe event with.

I also caution you against automatically caching things to allow multiple subscribers, as that means you will never know when to release the ByteBufs (unless you use ConnectableObservable.refCount(), but that doesn't solve your second use case of subscribing later, only the case where multiple subscribers are interested at the same time). The user should opt into that with operators like .cache() or .share() if they need to multicast. The .cache() implementation, though, will cause a memory leak with ByteBuf because it never releases (until GC, which doesn't help you).

I've been working with ByteBufs on the RxNetty project and it's honestly a mess to deal with in user space. Mixing a GC'd programming model with a refCounting strategy is not elegant at all. There are use cases that require the user to increment/decrement the refCount themselves. That is why in RxNetty we automatically release the ByteBuf at the end of the chain, and if a user wants to cache it for later, they need to opt in to incrementing the count and are then responsible for releasing it later.
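
As a rough illustration of that opt-in pattern (not RxNetty's actual API, just the plain Netty reference-counting calls it builds on; content is a placeholder Observable<ByteBuf>):

List<ByteBuf> kept = new ArrayList<>();

// opting in: retain() bumps the reference count so the buffer survives the
// pipeline's automatic release, and the consumer now owes a release() of its own
content.subscribe(buf -> kept.add(buf.retain()));

// ... later, once the consumer is done with the buffers it kept ...
kept.forEach(ByteBuf::release);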

@simonbasle
Contributor Author

Interesting, thanks for the insight. I'll explore the Observable.create path and see how we can refactor our code. I'll also have a look at RxNetty implementations. Thanks Ben!

@benjchristensen
Member

Feel free to keep discussing here. This isn't an easy thing to solve gracefully and I'm interested in figuring out a good solution with you.

@benjchristensen
Member

Potentially of interest to you about releasing ByteBuf -> ReactiveX/RxNetty#264

@NiteshKant
Member

This is a very interesting topic :)

I agree with @benjchristensen that auto-caching isn't a clean solution and the responsibility to cache should lie in the hands of the user, i.e. when they cannot eagerly subscribe to the content. OTOH, if the user eagerly subscribes to the content, then there is no need for caching. We have taken this approach to caching in Zuul 2.x, which is a proxy.
RxNetty employs a version of BufferUntilSubscriber here and expects users to eagerly subscribe to the HTTP content (which holds the ByteBuf). In order to detect cases where no one subscribes to the content (and hence causes a leak), RxNetty has a subscription timeout, which disposes the subject if it is never subscribed. Also, to make it explicit that there will be no subscription, it provides a way to express ignoring the content here.

As for auto-releasing the ByteBuf in RxNetty, I have realized that it causes more heartache than it solves. I have collected my thoughts in this issue and I intend to remove the auto-release behavior from RxNetty soon.

@daschl
Contributor

daschl commented Oct 30, 2014

@NiteshKant thanks for chiming in here. I agree it's an interesting topic and hard to get right for a variety of scenarios.

I'd say we are currently in the position where we can assume everywhere that the subscribers consuming the ByteBufs are trusted (namely the language bindings for the core package), but of course we need to solve this "what happens if no one subscribes" issue. I'm curious to see how you did that in RxNetty; I'm sure we can apply something similar.

@NiteshKant
Member

@daschl I replied to your comment on the RxNetty issue

@akarnokd
Member

akarnokd commented Feb 5, 2015

ReplaySubject in 1.0.5 now has methods that let you peek into it.
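
For example (a minimal sketch, assuming the buffered elements are the reference-counted ByteBufs discussed above; getValues() returns the buffered items as an Object[]):

for (Object o : replaySubject.getValues()) {
    ((ByteBuf) o).release();
}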

@simonbasle
Contributor Author

getValues() right? awesome 😍

@akarnokd
Member

I'm closing this due to inactivity. If you have further questions, please don't hesitate to reopen this issue or post a new one.
