HDDS-13400. S3g has accumulated memory pressure due to unlimited ElasticByteBufferPool in RpcClient #9166
Conversation
@ChenSammi could you please review this patch.
peterxcli left a comment:
LGTM!
@ChenSammi Would you like to take a look? Thanks!
@peterxcli Please take a look, I have updated the patch.
peterxcli left a comment:
Otherwise looks good.
> like the S3 Gateway. Once this limit is reached, used buffers are not
> put back to the pool and will be garbage collected.
> used buffers are not put back to the pool and will be garbage collected.

Can we help them deallocate the buffer immediately, so we can reduce the GC pressure? I don't quite understand how GC in Java works.
In Java, we can't deallocate memory manually (like free() in C/C++). The only way to free memory is to remove all references to an object and let the Garbage Collector (GC) reclaim it.
When our pool is full, by returning without storing the buffer, we are doing exactly that. The buffer becomes "unreachable," and the GC will handle its deallocation.
So while we still rely on the GC (which is unavoidable in Java), it now applies to a much smaller fraction of objects, which is exactly the fix we want to reduce overall s3g memory pressure.
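To make that concrete, here is a minimal sketch of the "drop when full" behavior, not the actual patch; the field names maxPoolSizeBytes and cached, and the use of a Deque, are assumptions for illustration only.

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.Deque;

// Minimal sketch of the "drop when full" idea; not the actual Ozone class.
public class BoundedPoolSketch {
  private final long maxPoolSizeBytes;   // assumed: configured cap in bytes
  private long currentPoolSizeBytes;     // bytes currently cached in the pool
  private final Deque<ByteBuffer> cached = new ArrayDeque<>();

  public BoundedPoolSketch(long maxPoolSizeBytes) {
    this.maxPoolSizeBytes = maxPoolSizeBytes;
  }

  public synchronized void putBuffer(ByteBuffer buffer) {
    // If caching this buffer would exceed the cap, drop the reference instead
    // of storing it; the buffer becomes unreachable and the GC reclaims it.
    if (currentPoolSizeBytes + buffer.capacity() > maxPoolSizeBytes) {
      return;
    }
    buffer.clear();
    cached.push(buffer);
    currentPoolSizeBytes += buffer.capacity();
  }
}
```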
We can also call System.gc() to suggest that the garbage collector run immediately; however, according to the Java documentation, the runtime makes the final decision. So immediately deallocating a buffer is not possible in Java, but if needed we can use System.gc() to request that the garbage collector run.
Got it, thanks for the detailed explanation. Learned a lot!
@peterxcli Could you please re-trigger these failed checks, as my CI has passed them.
I think I need to rebase my branch as the above commit has been reverted.
Please use
Okay sure.
@Override
public synchronized ByteBuffer getBuffer(boolean direct, int length) {
  TreeMap<Key, ByteBuffer> tree = this.getBufferTree(direct);
  Map.Entry<Key, ByteBuffer> entry = tree.ceilingEntry(new Key(length, 0L));
  if (entry == null) {
    // Pool is empty or has no suitable buffer. Allocate a new one.
    return direct ? ByteBuffer.allocateDirect(length) : ByteBuffer.allocate(length);
  }
  tree.remove(entry.getKey());
  ByteBuffer buffer = entry.getValue();

  // Decrement the size because we are taking a buffer OUT of the pool.
  currentPoolSize.addAndGet(-buffer.capacity());
  buffer.clear();
  return buffer;
}
Should we also count those "allocated but not released" buffers toward the buffer size limit?
Just like BufferPool does:
ozone/hadoop-hdds/client/src/main/java/org/apache/hadoop/hdds/scm/storage/BufferPool.java
Lines 52 to 53 in 3f6ba7e
private final LinkedList<ChunkBuffer> allocated = new LinkedList<>();
private final LinkedList<ChunkBuffer> released = new LinkedList<>();
ozone/hadoop-hdds/client/src/main/java/org/apache/hadoop/hdds/scm/storage/BufferPool.java
Lines 92 to 95 in 3f6ba7e
while (allocated.size() == capacity) {
  LOG.debug("Allocation needs to wait the pool is at capacity (allocated = capacity = {}).", capacity);
  notFull.await();
}
I know that the original ElasticByteBufferPool doesn't do this. I just want to make sure whether we need to manage the allocated buffers, and why or why not.
I appreciate your suggestion, but if we counted "allocated but not released" buffers toward the limit, we would be forced to make getBuffer blocking (i.e., wait() when the limit is hit).
That would be a major, high-risk change from the original ElasticByteBufferPool's behavior, which always allocates a new buffer immediately, and it could introduce performance bottlenecks or even deadlocks.
The BufferPool linked above is a blocking, fixed-size pool. Its purpose is to strictly limit the total number of buffers ever created (e.g., "this system will only ever use 100 buffers, total"). If you ask for buffer 101, getBuffer will wait until one is returned.
Our BoundedElasticByteBufferPool is a non-blocking, caching pool. Its purpose is to fix a memory leak from the original ElasticByteBufferPool (which grew forever) while preserving its "elastic" (non-blocking) nature.
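For illustration, a hedged sketch of the two styles side by side; none of this is Ozone code, and the class, field, and method names are made up for the example. The blocking, fixed-size style waits on a condition when every buffer is checked out, while the elastic style never blocks and only caps how many bytes it caches.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Sketch only: contrasts the two pool styles discussed above.
public class PoolStyles {
  private final int capacity;                    // assumed fixed buffer count
  private int allocatedCount;
  private final ReentrantLock lock = new ReentrantLock();
  private final Condition notFull = lock.newCondition();

  public PoolStyles(int capacity) {
    this.capacity = capacity;
  }

  // Blocking, fixed-size style (like BufferPool): waits when all buffers are out.
  public ByteBuffer acquireBlocking(int length) throws InterruptedException {
    lock.lock();
    try {
      while (allocatedCount == capacity) {
        notFull.await();                         // caller blocks until a release
      }
      allocatedCount++;
      return ByteBuffer.allocate(length);
    } finally {
      lock.unlock();
    }
  }

  public void release() {
    lock.lock();
    try {
      allocatedCount--;
      notFull.signal();                          // wake one waiting acquirer
    } finally {
      lock.unlock();
    }
  }

  // Non-blocking, elastic style (like BoundedElasticByteBufferPool):
  // never waits; always hands out a buffer, only the cached bytes are capped.
  public ByteBuffer acquireElastic(int length) {
    return ByteBuffer.allocate(length);          // reuse from cache omitted for brevity
  }
}
```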
Sounds good—let’s get this merged.
Thanks @Gargi-jais11 for the patch, @adoroszlai for reviewing!
What changes were proposed in this pull request?
Currently RpcClient has an ElasticByteBufferPool to reuse buffers during EC data reads and writes, which saves buffer-allocation time. However, this pool has no upper limit, so in the s3g case a long-lived RpcClient accumulates every buffer ever allocated through the pool, which leads to high memory pressure in s3g.
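For context, here is a minimal sketch of the ByteBufferPool borrow/return contract that RpcClient exercises during EC reads and writes. The buffer size and fill logic are illustrative only, and the bounded pool's constructor is not shown since its exact signature is not part of this excerpt.

```java
import java.nio.ByteBuffer;
import org.apache.hadoop.io.ByteBufferPool;
import org.apache.hadoop.io.ElasticByteBufferPool;

// Sketch of the borrow/return cycle the pool serves; not code from this patch.
public class PoolUsageSketch {
  public static void main(String[] args) {
    // Today RpcClient holds an unbounded ElasticByteBufferPool; the patch swaps
    // in a BoundedElasticByteBufferPool sized by a new configuration property.
    ByteBufferPool pool = new ElasticByteBufferPool();

    ByteBuffer buf = pool.getBuffer(false, 1 << 20);  // borrow a 1 MiB heap buffer
    try {
      buf.put((byte) 42);                             // ...fill/drain EC chunk data...
    } finally {
      // Return the buffer. The unbounded pool caches it forever; a bounded pool
      // may drop it once the configured cache-size limit is reached.
      pool.putBuffer(buf);
    }
  }
}
```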
Solution:
- Create a new class BoundedElasticByteBufferPool implementing ByteBufferPool: a bounded version of ElasticByteBufferPool that limits the total size of buffers that can be cached in the pool.
- To control the size of this pool, a new configuration property was added.
- In RpcClient, use BoundedElasticByteBufferPool instead of ElasticByteBufferPool.
What is the link to the Apache JIRA?
https://issues.apache.org/jira/browse/HDDS-13400
How was this patch tested?
Passed existing tests and green CI.