
RetainableByteBuffer buffer release bug in WebSocket #9682

Closed
SerCeMan opened this issue Apr 27, 2023 · 15 comments

Labels: Bug (For general bugs on Jetty side)

SerCeMan commented Apr 27, 2023

Jetty version(s)

The extreme case was observed on 10.0.13; however, the same symptoms seem to be observable on 10.0.15.

Java version/vendor (use: java -version)

openjdk version "17.0.5" 2022-10-18 LTS
OpenJDK Runtime Environment Zulu17.38+21-CA (build 17.0.5+8-LTS)
OpenJDK 64-Bit Server VM Zulu17.38+21-CA (build 17.0.5+8-LTS, mixed mode, sharing)

OS type/version

Linux, containers

Description

Hi, Jetty folks! We've observed a few cases where a running instance of Jetty seems to leak native memory through direct ByteBuffers. From our investigation, we've found that the likely cause is a single RetainedBucket that ends up holding a large number of RetainableByteBuffer instances – 346k of them, which amounts to ~5GB of native memory. The native memory usage seems to grow steadily over time – e.g., 5GB of native memory usage was observed after the application had been running for a couple of days.


We're continuing the investigation, mainly by adding additional instrumentation; however, I wonder if you have any suggestions as to whether what we're seeing is expected under any specific circumstances.
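For reference, direct-buffer usage of this kind can be watched from inside the JVM with the standard JDK BufferPoolMXBean; a minimal sketch (illustrative only, separate from our actual instrumentation):

import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;

public class DirectBufferUsage {
    public static void main(String[] args) {
        // The "direct" pool tracks all direct ByteBuffers allocated by the JVM,
        // including the ones held by Jetty's buffer pools.
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            System.out.printf("%s: count=%d, used=%d bytes, capacity=%d bytes%n",
                    pool.getName(), pool.getCount(), pool.getMemoryUsed(), pool.getTotalCapacity());
        }
    }
}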

How to reproduce?

I don't have a solid reproducer at the moment. However, some notes about the environment that might be relevant:

  • It's a Jetty application that primarily handles WebSockets.
  • It uses Conscrypt for SSL.
SerCeMan added the Bug (For general bugs on Jetty side) label on Apr 27, 2023

gregw commented Apr 27, 2023

@SerCeMan are these isolated incidents, or does the memory for this app always grow over time, even after a restart?

@SerCeMan

It does grow over time even after a restart, although sometimes it decreases (not by much). Most of the time the growth is manageable, and it typically stays under 1GB of direct memory used. In some cases, however, maybe once a week, all instances in the cluster observe a jump in the amount of direct memory consumed.

We'll hopefully be able to provide more information as the investigation continues, but I was wondering if you had any ideas off the top of your head for what could be causing this behaviour. Thanks!


gregw commented Apr 27, 2023

@SerCeMan, nothing comes to mind immediately... other than that we have been working on optimisations and improvements to the buffer pooling mechanisms, so it is plausible that these changes have introduced a problem. Thus we are keen to hear any further analysis on this. We'll also ponder what it could be and come up with some suggestions to help analyse. Stand by...


lorban commented Apr 27, 2023

We've fixed a handful of buffer leaks in the past that were due to bugs in the SSL error-handling code, so my first guess would be that an error (aborted SSL connections, rejection of client certificates, ...) is causing the leak. Is the observed leak slow and steady, or does it happen in large bursts at some interval? Is there any way you can correlate the apparent leak with seemingly unrelated events?

If you can, I would also try replacing Conscrypt with the JDK SSL implementation and see if that helps.
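For reference, a rough sketch of what that swap can look like, assuming the application registers Conscrypt explicitly via Conscrypt.newProvider() and uses Jetty's SslContextFactory.Server (the keystore path and password below are placeholders; how your application actually wires SSL may differ):

import java.security.Security;

import org.conscrypt.Conscrypt;
import org.eclipse.jetty.util.ssl.SslContextFactory;

public class SslProviderChoice {
    static SslContextFactory.Server sslContextFactory(boolean useConscrypt) {
        SslContextFactory.Server ssl = new SslContextFactory.Server();
        ssl.setKeyStorePath("/path/to/keystore.p12"); // placeholder path
        ssl.setKeyStorePassword("changeit");          // placeholder password
        if (useConscrypt) {
            // Register Conscrypt and ask Jetty to use it for TLS.
            Security.insertProviderAt(Conscrypt.newProvider(), 1);
            ssl.setProvider("Conscrypt");
        }
        // With useConscrypt == false, the default JDK SSL implementation is used.
        return ssl;
    }
}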


sshkel commented May 4, 2023

Hi gents, I've been looking into this issue, and we've managed to replicate it. I am going to take a quick look to see if I can pinpoint the issue; if not, I will work on replicating it with a basic setup I can share. Here are some findings so far, in case they give you an idea.

  1. The issue appears when using the LogarithmicRetainablePool API. We wrapped LogarithmicArrayByteBufferPool in a LeakTracking proxy and couldn't observe any problems, but then noticed it can be converted with an asRetainableByteBufferPool call (a small sketch of that conversion follows this list). For that pool, we can see that a whole bunch of buffers are marked as still in use even after all clients have disconnected.

[Screenshot: retainable pool dump, 2023-05-04, showing buffers still marked as in use]

  2. It doesn't appear to be SSL-related, as we can reproduce it without SSL enabled.

  3. Our current suspect is this path where the buffer gets acquired. I need to double-check my LeakTracking logic; it was a late-night hack 😅

[Screenshot: suspected buffer acquisition path, 2023-05-04]
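For completeness, here is a small sketch of the conversion mentioned in point 1, using only the pool type and getters that also appear in the reproducer later in this thread (the in-use figure is simply total minus available):

import org.eclipse.jetty.io.LogarithmicArrayByteBufferPool;

public class RetainablePoolInspection {
    public static void main(String[] args) {
        var pool = new LogarithmicArrayByteBufferPool();

        // The plain pool can be converted to its retainable view; buffers acquired
        // through this view are what showed up as "still in use" above.
        var retainable = (LogarithmicArrayByteBufferPool.LogarithmicRetainablePool) pool.asRetainableByteBufferPool();

        long inUseDirect = retainable.getDirectByteBufferCount() - retainable.getAvailableDirectByteBufferCount();
        System.out.println("Direct buffers still in use: " + inUseDirect);
    }
}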


lorban commented May 8, 2023

@sshkel Tracking buffer leaks is tough, especially the Retainable kind, as those are backed by a reference counter that decides when re-pooling must happen, in order to cope with much more complicated buffer-handling logic. Here is a quick explanation of how these work (a small sketch follows the list):

  • A retainable buffer is acquired from a retainable pool; the ref counter gets set to 1.
  • When a certain code path is done with the buffer, it calls release() on it, which decreases the counter by 1. When the counter reaches 0, the buffer is re-pooled.
  • When the buffer is handed over to some other code path (think: an async task), retain() is called on the buffer to increase the counter by 1.
  • When everything goes to plan, the number of times release() is called matches the number of times retain() was called plus 1 (because the acquisition counts as the initial reference). When a code path forgets a release() call, you have a leak.
  • Following all the code paths the buffer went through to track all the calls to retain() and release(), and making sure none is missing, can be a complex task, as it sometimes involves many different async code paths.
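To make the counting concrete, here is a minimal sketch of that contract (illustrative only, not Jetty's actual RetainableByteBuffer implementation):

import java.nio.ByteBuffer;
import java.util.concurrent.atomic.AtomicInteger;

// Minimal sketch of the retain/release contract described above.
class SketchRetainableBuffer {
    private final ByteBuffer buffer;
    // Acquisition counts as the first reference.
    private final AtomicInteger references = new AtomicInteger(1);

    SketchRetainableBuffer(ByteBuffer buffer) {
        this.buffer = buffer;
    }

    ByteBuffer getBuffer() {
        return buffer;
    }

    // Called when the buffer is handed over to another (possibly async) code path.
    void retain() {
        references.incrementAndGet();
    }

    // Called when a code path is done with the buffer; returns true when the counter
    // reaches 0 and the buffer can be re-pooled. A code path that forgets to call
    // release() leaves the counter above 0 forever, and the buffer leaks.
    boolean release() {
        int refs = references.decrementAndGet();
        if (refs < 0)
            throw new IllegalStateException("released more times than retained");
        return refs == 0;
    }
}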

For the two screenshots you posted, here is what we can see:

The leak is concentrated on the 4K buffer size. That bucket contains 93440 buffers, out of which 92276 are still marked as in use; this is what the RetainedBucket@... string tells us. Then, looking at the MonoEntry@... strings, we see that the displayed buffers are fully consumed (the first r=0) and still retained once (the rightmost r=1). This seems to indicate that some path somewhere did not call RetainableByteBuffer.release().

Since you managed to reproduce the issue without SSL, we can exclude a great deal of extra complexity from the equation.

Since you managed to reproduce the issue, narrowing it down to the smallest possible amount of code that you could share with us would help tremendously. The culprit seems to lie in the WebSocket code, so I would concentrate on that first. Otherwise, you're going to have to collect debug logs while the issue is being reproduced and share those logs with us, but that is going to be a much longer journey.

Good luck, and let us know about your progress on this.


sshkel commented May 9, 2023

Thanks for the explanation! I've been wrangling with it for the past few days and can see how tricky it can be. 😅
I am building a small version of the app, but so far, I haven't been able to replicate it there.
I will keep you posted once I make some progress. Appreciate the tips.

@lachlan-roberts

@sshkel I have been reviewing the code and believe I have found some code paths which could cause such a leak.

Does your WebSocket application happen to be using the Suspend/Resume feature from the Jetty WebSocket API?
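For reference, the feature in question is the Session.suspend() / SuspendToken.resume() pair from the Jetty WebSocket API. A minimal sketch, assuming the Jetty 10 jetty-websocket-api types and a hypothetical handleAsync helper:

import java.util.concurrent.CompletableFuture;

import org.eclipse.jetty.websocket.api.SuspendToken;
import org.eclipse.jetty.websocket.api.WebSocketAdapter;

public class SuspendingEndpoint extends WebSocketAdapter {
    @Override
    public void onWebSocketText(String message) {
        // Suspend reading further frames while the message is handled asynchronously...
        SuspendToken token = getSession().suspend();
        // ...and resume demand once the application is done with it.
        handleAsync(message).thenRun(token::resume);
    }

    // Hypothetical asynchronous handler, standing in for application logic.
    private CompletableFuture<Void> handleAsync(String message) {
        return CompletableFuture.runAsync(() -> System.out.println("handled: " + message));
    }
}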


sshkel commented May 11, 2023

Hey @lachlan-roberts, I think you are right on the money with that one. I've just applied your patch to our app and did a quick test, and there is no leak 🎉
We use rsocket on top of websocket for our app. I had a suspicion it was calling some other APIs that weren't being called in my basic websocket app. I will take another look tomorrow to see if I can catch it on that path and let you know.
Thanks heaps for looking into this. You are a legend!

lachlan-roberts added a commit that referenced this issue on May 12, 2023
lachlan-roberts added a commit that referenced this issue on May 15, 2023
lachlan-roberts added a commit that referenced this issue on May 22, 2023: Issue #9682 - fix RetainableByteBuffer release bug in WebSocket

lachlan-roberts commented May 23, 2023

Closing this as fixed in PR #9759; the fix will be released with 10.0.16 / 11.0.16 / 12.0.0.beta2.

Thanks for confirming the fix @sshkel.


sshkel commented May 24, 2023

Hi @lachlan-roberts, sorry to be the bearer of bad news, but I don't think that was the fix. I ran a few extra tests this week and can still replicate the leak with the patch applied. I think it has something to do with clients abruptly disappearing while the server is handling largish payloads. I can usually replicate it by starting a handful of clients, stopping them abruptly, and restarting them (see the video below). Some buffers get released after the idle timeout, but as you can see, toward the end it gets stuck at 365 available vs 557 total. I left it running for about another 10 minutes, and it stayed the same.

I've put together a simple server and client that I am using.
Server

package websocket;

import org.eclipse.jetty.io.LogarithmicArrayByteBufferPool;
import org.eclipse.jetty.server.HttpConnectionFactory;
import org.eclipse.jetty.server.Server;
import org.eclipse.jetty.server.ServerConnector;
import org.eclipse.jetty.servlet.ServletContextHandler;
import org.eclipse.jetty.util.BlockingArrayQueue;
import org.eclipse.jetty.util.thread.QueuedThreadPool;
import org.eclipse.jetty.websocket.api.Session;
import org.eclipse.jetty.websocket.api.WebSocketAdapter;
import org.eclipse.jetty.websocket.server.JettyServerUpgradeRequest;
import org.eclipse.jetty.websocket.server.JettyServerUpgradeResponse;
import org.eclipse.jetty.websocket.server.JettyWebSocketCreator;
import org.eclipse.jetty.websocket.server.config.JettyWebSocketServletContainerInitializer;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.io.IOException;
import java.nio.ByteBuffer;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class WsServer {
    private static final Logger LOG = LoggerFactory.getLogger(WsServer.class);

    public static void main(String[] args) throws Exception {
        var maxThreads = 8;
        var queue = new BlockingArrayQueue<Runnable>(1000);
        var threadPool = new QueuedThreadPool(maxThreads, 1, 300, queue);

        var server = new Server(threadPool);
        var pool = new LogarithmicArrayByteBufferPool();
        var connector = new ServerConnector(server, null, null, pool, -1, -1, new HttpConnectionFactory());
        connector.setPort(8080);

        setupMetrics(pool);

        server.addConnector(connector);

        ServletContextHandler context = new ServletContextHandler(ServletContextHandler.SESSIONS);
        context.setContextPath("/");
        server.setHandler(context);

        var maxFrame = 5 * 1024 * 1024;
        // Configure specific websocket behavior
        JettyWebSocketServletContainerInitializer.configure(context, (servletContext, wsContainer) ->
        {
            // Configure default max size
            wsContainer.setMaxBinaryMessageSize(maxFrame);
            wsContainer.setMaxFrameSize(maxFrame);

            // Add websockets
            wsContainer.addMapping("/", new JettyWebSocketCreator() {
                @Override
                public Object createWebSocket(JettyServerUpgradeRequest jettyServerUpgradeRequest, JettyServerUpgradeResponse jettyServerUpgradeResponse) {

                    return new WebSocketAdapter() {
                        private volatile Session session;


                        @Override
                        public void onWebSocketBinary(byte[] payload, int offset, int len) {

                            if (session != null && session.isOpen()) {
                                try {
                                    session.getRemote().sendBytes(ByteBuffer.wrap(payload));
                                } catch (IOException e) {
                                    LOG.error("Failed to send bytes");
                                    throw new RuntimeException(e);
                                }
                            }
                        }

                        @Override
                        public void onWebSocketConnect(Session sess) {
                            this.session = sess;
                        }

                        @Override
                        public void onWebSocketClose(int statusCode, String reason) {
                            this.session = null;

                        }

                        @Override
                        public void onWebSocketError(Throwable cause) {
                            LOG.error("Failed with ", cause);
                        }
                    };
                }
            });
        });
        server.start();
        server.join();
    }

    private static void setupMetrics(LogarithmicArrayByteBufferPool pool) {
        Executors.newScheduledThreadPool(1).scheduleAtFixedRate(() -> {
            var rp = (LogarithmicArrayByteBufferPool.LogarithmicRetainablePool) pool.asRetainableByteBufferPool();
            LOG.info("Direct mem: {}", rp.getDirectMemory());
            LOG.info("Direct count:{} ", rp.getDirectByteBufferCount());
            LOG.info("Direct avail mem: {}", rp.getAvailableDirectMemory());
            LOG.info("Direct avail count: {}", rp.getAvailableDirectByteBufferCount());

        }, 5, 3, TimeUnit.SECONDS);
    }

}

Client

package websocket;



import org.eclipse.jetty.websocket.api.Session;
import org.eclipse.jetty.websocket.api.WebSocketAdapter;
import org.eclipse.jetty.websocket.client.WebSocketClient;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.net.URI;
import java.nio.ByteBuffer;
import java.util.concurrent.Executors;

public class WsClient {
    private final Logger logger = LoggerFactory.getLogger(WsClient.class);

    public static void main(String[] args) {

        WsClient client = new WsClient();
        URI uri = URI.create("ws://localhost:8080/");
        try {
            client.run(uri);
        } catch (Throwable t) {
            t.printStackTrace(System.err);
        }
    }

    public void run(URI uri) throws Exception {

        var payload = new String(new char[300_000]).replace('\0', 'A');
        var frameSize = 5 * 1024 * 1024;
        var pool = Executors.newCachedThreadPool();
        for (var i = 0; i < 100; i++) {
            pool.submit(() -> {
                WebSocketClient client = new WebSocketClient();
                try {
                    client.start();
                    // Wait for Connect
                    Session session = client.connect(new WebSocketAdapter(), uri).get();
                    session.setMaxFrameSize(frameSize);
                    session.setMaxBinaryMessageSize(frameSize);
                    while (true) {
                        session.getRemote().sendBytes(ByteBuffer.wrap(payload.getBytes()));
                        Thread.sleep(300);
                    }
                } catch (Exception e) {
                    logger.error("Exception:", e);
                } finally {
                    try {
                        client.stop();
                    } catch (Exception e) {
                        throw new RuntimeException(e);
                    }
                }
            });
        }
    }
}
[Video: Kapture.2023-05-24.at.10.25.25.mp4]

@lachlan-roberts

@sshkel thanks for the reproducer, I will look into this tomorrow or early next week.

@lachlan-roberts

@sshkel I have identified the cause of the leak from your example and will be putting up a PR to get this fixed.

So it should make it into the next release of 10.


sshkel commented May 31, 2023

That's epic! Thanks heaps for the fix! Do you know roughly when the next release is planned?

@lachlan-roberts

There is no release schedule, but it will likely be sometime in the next few weeks.

gregw closed this as completed on Jun 6, 2023
lachlan-roberts added a commit that referenced this issue on Jun 8, 2023
lachlan-roberts added a commit that referenced this issue on Jun 8, 2023
lachlan-roberts added a commit that referenced this issue on Jun 9, 2023
lachlan-roberts added a commit that referenced this issue on Jun 15, 2023: Issue #9682 - notify WebSocket message sinks of connection close
joakime changed the title from "A possible native memory leak through RetainableByteBuffers" to "RetainableByteBuffer buffer release bug in WebSocket" on Aug 25, 2023