RetainableByteBuffer buffer release bug in WebSocket #9682
Comments
@SerCeMan is this an isolated incident, or does the memory for this app always grow over time even after a restart?
It does grow over time even after a restart, although sometimes it decreases (not by much). Most of the time the growth is manageable, and it typically stays under 1GB of direct memory used. However, in some cases, maybe once a week, all instances in the cluster observe a growth in the amount of direct memory consumed. We'll hopefully be able to provide more information as the investigation continues, but I was wondering if you had any ideas off the top of your head for what could be causing this behaviour. Thanks!
@SerCeMan, nothing comes to mind immediately.... other than we have been working on optimisations and improvements to the buffer pooling mechanisms.... so it is plausible that these changes have introduced a problem. Thus we are keen to hear any further analysis on this. We'll also ponder what it could be and come up with some suggestions to help analyse. Stand by....
We've fixed a handful of buffer leaks in the past that were due to bugs in the SSL error handling code, so my first guess would be that it might be an error (aborted SSL connections, rejection of client certificates...) that is causing the leak. Is the observed leak slow and steady, or does it happen in large amounts at some interval? Is there any way you can correlate the apparent leak with seemingly unrelated events? If you can, I would also try replacing Conscrypt with the JDK SSL implementation and see if that helps.
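For reference, switching to the JDK implementation usually just means not selecting the Conscrypt provider when building the SslContextFactory. A minimal sketch, assuming a plain ServerConnector setup (class and method names here are illustrative, not taken from the affected application):

```java
import org.eclipse.jetty.server.HttpConfiguration;
import org.eclipse.jetty.server.HttpConnectionFactory;
import org.eclipse.jetty.server.SecureRequestCustomizer;
import org.eclipse.jetty.server.Server;
import org.eclipse.jetty.server.ServerConnector;
import org.eclipse.jetty.server.SslConnectionFactory;
import org.eclipse.jetty.util.ssl.SslContextFactory;

public class JdkSslConnector {
    public static ServerConnector createHttpsConnector(Server server,
                                                       String keystorePath,
                                                       String keystorePassword) {
        SslContextFactory.Server ssl = new SslContextFactory.Server();
        ssl.setKeyStorePath(keystorePath);
        ssl.setKeyStorePassword(keystorePassword);
        // Leaving setProvider(...) unset means the default JDK TLS implementation
        // is used; selecting Conscrypt would be done with ssl.setProvider("Conscrypt")
        // after registering its security provider.

        HttpConfiguration httpsConfig = new HttpConfiguration();
        httpsConfig.addCustomizer(new SecureRequestCustomizer());

        ServerConnector connector = new ServerConnector(server,
                new SslConnectionFactory(ssl, "http/1.1"),
                new HttpConnectionFactory(httpsConfig));
        connector.setPort(8443);
        return connector;
    }
}
```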
Hi gents, I've been looking into this issue. We've managed to replicate it. I am going to take a quick look to see if I can pinpoint the issue; if not, I will work on replicating it with a basic setup I can share. Here are some findings so far, in case you have an idea.
@sshkel Tracking buffer leaks is tough.
For the two screenshots you posted, here is what we can see: the leak is concentrated on the 4K buffer size. That bucket contains 93440 buffers, of which 92276 are still marked as in-use.
Since you managed to reproduce the issue without SSL, we can exclude a great deal of extra complexity from the equation. Since you managed to reproduce the issue, if you could narrow it down to the smallest possible amount of code that you could share with us, that would help tremendously. Since the culprit seems to lie in the websocket code, I would concentrate on that first. Otherwise you're going to have to collect debug logs while the issue is being reproduced and share those logs with us, but that is going to be a much longer journey. Good luck, and let us know about your progress on this.
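While narrowing things down, one low-effort way to watch those per-bucket in-use counters over time is Jetty's component dump, which includes the retained buffer pool. A rough sketch (the scheduling details are illustrative, not a prescribed setup):

```java
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

import org.eclipse.jetty.server.Server;

public class BufferPoolDump {
    public static void scheduleDumps(Server server) {
        // Print the full component tree (including buffer pool buckets)
        // once the server has started...
        server.setDumpAfterStart(true);

        // ...and periodically afterwards, so growth of the "in use" count
        // per bucket can be compared over time.
        Executors.newSingleThreadScheduledExecutor()
                .scheduleAtFixedRate(
                        () -> System.err.println(server.dump()),
                        1, 10, TimeUnit.MINUTES);
    }
}
```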
Thanks for the explanation! I've been wrangling with it for the past few days and can see how tricky it can be. 😅
@sshkel I have been reviewing the code and believe I have found some code paths which could cause such a leak. Does your WebSocket application happen to be using the Suspend/Resume feature from the Jetty WebSocket API?
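For anyone unfamiliar with that feature, this is roughly what Suspend/Resume usage looks like with the Jetty 10 WebSocket API; the endpoint below is only an illustrative sketch, not code from this issue:

```java
import org.eclipse.jetty.websocket.api.Session;
import org.eclipse.jetty.websocket.api.SuspendToken;
import org.eclipse.jetty.websocket.api.WebSocketAdapter;

public class SuspendingSocket extends WebSocketAdapter {
    @Override
    public void onWebSocketText(String message) {
        Session session = getSession();
        // Stop reading further frames while this message is handed off
        // to a slower downstream consumer.
        SuspendToken token = session.suspend();
        handOffAsync(message, token::resume); // resume once processing is done
    }

    private void handOffAsync(String message, Runnable onDone) {
        // Placeholder for application-specific asynchronous processing.
        new Thread(() -> {
            process(message);
            onDone.run();
        }).start();
    }

    private void process(String message) { /* ... */ }
}
```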
Hey @lachlan-roberts, I think you are right on the money with that one. I've just applied your patch to our app and did a quick test, and there is no leak 🎉
Issue #9682 - fix RetainableByteBuffer release bug in WebSocket
Hi @lachlan-roberts, sorry to be the bearer of bad news, but I don't think that was the fix. I ran a few extra tests this week and I can still replicate the leak with the patch. I think it has something to do with clients abruptly disappearing while the server is handling largish payloads. I can usually replicate it by starting a handful of clients and then stopping them abruptly and restarting (see the video below). Some buffers get released after the idle timeout, but as you can see, toward the end it gets stuck with 365 available vs 557 total. I left it running for about another 10 minutes, and it stayed the same. I've put together a simple server and client that I am using.
Kapture.2023-05-24.at.10.25.25.mp4
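For readers without access to the attachments, a client of that shape might look roughly like the sketch below (an illustrative approximation using Jetty's WebSocketClient API, not the actual attached reproducer): connect, push largish messages in a loop, and kill the process without a clean close.

```java
import java.net.URI;

import org.eclipse.jetty.websocket.api.Session;
import org.eclipse.jetty.websocket.api.WebSocketAdapter;
import org.eclipse.jetty.websocket.client.WebSocketClient;

public class AbruptClient {
    public static void main(String[] args) throws Exception {
        WebSocketClient client = new WebSocketClient();
        client.start();

        // Endpoint URI is a placeholder for the test server.
        Session session = client.connect(new WebSocketAdapter(),
                URI.create("ws://localhost:8080/ws")).get();

        String payload = "x".repeat(64 * 1024); // largish message
        while (true) {
            session.getRemote().sendString(payload);
            Thread.sleep(10);
            // No session.close(): the process is expected to be killed
            // externally (e.g. kill -9) to simulate a client vanishing abruptly.
        }
    }
}
```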
@sshkel thanks for the reproducer, I will look into this tomorrow or early next week.
@sshkel I have identified the cause of the leak from your example and will be putting up a PR to get this fixed. So it should make it into the next release of Jetty 10.
That's epic! Thanks heaps for the fix! Do you know roughly when the next release is planned?
There is no release schedule, but it will likely be sometime in the next few weeks.
…erLeak Issue #9682 - notify WebSocket message sinks of connection close
Jetty version(s)
The extreme case was observed on 10.0.13, however, the same symptoms seem to be observable on 10.0.15.
Java version/vendor
OS type/version
Linux, containers
Description
Hi, Jetty folks! We've observed a few cases where a running instance of Jetty seems to leak native memory through direct ByteBuffers. From our investigation, we've found that the likely cause is a single RetainedBucket that ends up holding a large number of RetainableByteBuffer instances – 346k of them, which amounts to ~5G of native memory. The native memory usage seems to grow steadily over time – e.g. the 5G figure was observed after the application had been running for a couple of days.
We're continuing the investigation, mainly by adding additional instrumentation; however, I wonder if you have any suggestions as to whether what we're seeing is expected under any specific circumstances.
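As an aside, one way to track direct ByteBuffer usage like this over time is the JDK's BufferPoolMXBean for the "direct" pool; a minimal probe along those lines (a simplified sketch, not the exact instrumentation used here) could look like:

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;

public class DirectMemoryProbe {
    public static void logDirectBufferUsage() {
        // The platform exposes one BufferPoolMXBean per pool ("direct", "mapped", ...).
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            if ("direct".equals(pool.getName())) {
                System.err.printf("direct buffers: count=%d, used=%d bytes, capacity=%d bytes%n",
                        pool.getCount(), pool.getMemoryUsed(), pool.getTotalCapacity());
            }
        }
    }
}
```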
How to reproduce?
I don't have a solid reproducer at the moment. However, some notes about the environment that might be relevant: