[Bug] Broker memory leak #22157
Comments
Could you please upload the heap dump file?
The heap dump has reached 13 GB and cannot be uploaded.
Could you compress it and upload it to a cloud drive such as Baidu Cloud (百度云)? It is very important for locating the root cause.
@graysonzeng @dao-jun please note that the heap dump could contain sensitive data. Because of this, it should never be shared without encryption.
@graysonzeng from the screenshots, it looks like the problem is caused by the 6.5 million NonDurableCursorImpl instances.
I'd recommend adding the https://github.com/vlsi/mat-calcite-plugin plugin to Eclipse MAT so that you can run SQL queries against the heap dump. Eclipse MAT has OQL support, but that is not as handy as the SQL queries, where you can do anything that Calcite supports with SQL. It is not strictly needed for this case, but Eclipse MAT plus the Calcite plugin makes that kind of query possible.
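For the investigation itself, plain Eclipse MAT OQL is enough to list the suspect objects. A minimal sketch, assuming the cursor class is org.apache.bookkeeper.mledger.impl.NonDurableCursorImpl (this is not the Calcite query from the comment above):

```
SELECT c, c.@retainedHeapSize
FROM org.apache.bookkeeper.mledger.impl.NonDurableCursorImpl c
```

Sorting the result by the retained size column in MAT should confirm whether these cursor instances dominate the retained heap.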
@lhotari Yes, and I found what is holding references to these instances. We used the routine load task in StarRocks for reader-based consumption, and it created and deleted consumers many times, so many NonDurableCursorImpl instances were generated.
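To illustrate the access pattern being described (a hedged sketch with a placeholder service URL and topic, not the StarRocks routine load code): each Pulsar Reader is backed by a non-durable subscription on the broker, so repeatedly creating and closing readers churns through NonDurableCursorImpl instances.

```java
import java.util.concurrent.TimeUnit;

import org.apache.pulsar.client.api.MessageId;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.Reader;
import org.apache.pulsar.client.api.Schema;

public class ReaderChurn {
    public static void main(String[] args) throws Exception {
        // Placeholder service URL and topic; adjust for a real cluster.
        try (PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")
                .build()) {
            for (int i = 0; i < 10_000; i++) {
                // Each Reader is backed by a non-durable subscription on the broker,
                // which creates a NonDurableCursorImpl on the topic's managed ledger.
                try (Reader<byte[]> reader = client.newReader(Schema.BYTES)
                        .topic("persistent://public/default/routine-load-topic")
                        .startMessageId(MessageId.earliest)
                        .create()) {
                    reader.readNext(1, TimeUnit.SECONDS); // read a little, then close
                }
                // After close() the broker should drop the cursor; the leak reported
                // in this issue is that some cursors stay referenced on the broker side.
            }
        }
    }
}
```

If cursor cleanup worked as expected, broker heap usage would stay roughly flat while a loop like this runs.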
@graysonzeng related to #13939?
The Pulsar version is 3.1.1. It looks related to #13939. It looks like removeWaitingCursor may not be properly removing the cursor after deactivateCursor() sets the cursor's isActive flag to false. @lhotari (Lines 310 to 311 in 6ec473e)
Another possibility is that non-durable cursors and related subscriptions should be cleaned up when a connection dies in an unexpected way. I'm not sure how that is handled in the code base currently.
Or maybe cleaned up after an inactivity period?
There is a race condition involving the code at Lines 298 to 313 in ccc2ea6: after line 311 has executed, the cursor is then added to waitingCursors, as illustrated in the sketch below.
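A simplified model of that interleaving, not the actual ManagedLedgerImpl code: the class, field, and method names below (Cursor, waitingCursors, parkCursorForNewEntries, deactivateCursor) are stand-ins chosen to mirror the discussion, and the queue type is an assumption.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicBoolean;

// Simplified model of the suspected interleaving; NOT the actual Pulsar code.
// "Cursor" stands in for NonDurableCursorImpl and "waitingCursors" for the
// queue of the same name in the managed ledger.
public class WaitingCursorRaceSketch {

    static class Cursor {
        final AtomicBoolean active = new AtomicBoolean(true);
    }

    static final Queue<Cursor> waitingCursors = new ConcurrentLinkedQueue<>();

    // Read path: nothing to read yet, so park the cursor until new entries arrive.
    static void parkCursorForNewEntries(Cursor c) {
        // ... availability checks happen before this point ...
        waitingCursors.add(c);            // (A)
    }

    // Disconnect/close path: deactivate the cursor and drop it from the queue.
    static void deactivateCursor(Cursor c) {
        c.active.set(false);
        waitingCursors.remove(c);         // (B) no-op if (A) has not run yet
    }

    public static void main(String[] args) {
        Cursor cursor = new Cursor();
        // In the broker these two calls run on different threads; here the bad
        // ordering is forced to show the effect: (B) happens before (A).
        deactivateCursor(cursor);
        parkCursorForNewEntries(cursor);
        // The deactivated cursor is now stuck in waitingCursors and is never
        // removed again, so it (and what it references) cannot be collected.
        System.out.println("cursors left in waitingCursors: " + waitingCursors.size()); // prints 1
    }
}
```

Re-checking isActive after the add, or performing the add and the removal under the same lock, would close this window; which fix applies depends on the actual code at those lines.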
Reproduced test
Search before asking
Version
v3.1.1
Minimal reproduce step
After running for a period of time, the broker's memory usage gradually increases and eventually leads to a restart.

After taking a heap dump, it was found that many ManagedLedgerImpl instances were retained in memory, and these instances occupied most of the heap.
Common Path To the Accumulation Point:

What did you expect to see?
Normal memory GC
What did you see instead?
Broker restart
Anything else?
No response
Are you willing to submit a PR?