HDDS-9536. Datanode perf: Copying (heap) buffers is costly. #5497
Conversation
Thanks for the patch @szetszwo. Is this optimization for the WriteChunk path, or only for the read path? It looks like it's only for reads. I believe the problem with WriteChunk starts from the moment the ContainerCommandRequestProto is parsed from the Ratis log entry. The resulting …
Yes, looking at the changes in KeyValueHandler.java, the changes seem to be for readChunk. Are you planning a separate JIRA for writeChunk?
@duongkame , @umamaheswararao , this PR is mainly to change ChunkBuffer. For Write, the buffers are allocated by the gRPC server when it receives the requests. Let me see if the buffers are direct or not. If the buffers from gRPC are direct, we might have copied them to non-direct buffers somewhere in our code.
Thanks for the hint. Let me check.
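For a quick check, something like the sketch below could work, assuming the request carries a WriteChunk payload (logBufferKind is a hypothetical helper, not code from this PR):

```java
import java.nio.ByteBuffer;

import com.google.protobuf.ByteString;

// Hypothetical helper: report whether the parsed WriteChunk payload is
// backed by direct or heap buffers after gRPC/protobuf parsing.
static void logBufferKind(ContainerCommandRequestProto request) {
  final ByteString data = request.getWriteChunk().getData();
  for (ByteBuffer b : data.asReadOnlyByteBufferList()) {
    System.out.println("direct? " + b.isDirect() + ", remaining=" + b.remaining());
  }
}
```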
In a Ratis test, it shows that the …
The client requests are received from the network (for the leader, it is directly from the client; for followers, it is the log entries from the leader). It should not be related to the log reader unless the cache is full and the entry is invalidated. This reminds me that Ozone does have a … Anyway, the …
Yes, Ratis Streaming uses Netty directly. It uses neither gRPC nor Protobuf for data. (It does use Protobuf for headers.)
Tried testing Ratis with a larger message size (32MB) to see if gRPC will change to use …
Found that gRPC uses …
Thanks for the digging, @szetszwo. It seems gRPC recently finalized the APIs to support zero-copy; details in grpc/grpc-java#7387. This implies some effort to configure the right marshaller for the … Today, datanodes clone and copy a WriteChunk data buffer 3+ times, and this is due to the …
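For reference, the zero-copy parse path enabled by grpc/grpc-java#7387 looks roughly like the sketch below; this is an outline under assumptions (naming, error handling, and the exact release point are simplified), not code from this PR:

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;

import com.google.protobuf.ByteString;
import com.google.protobuf.CodedInputStream;
import com.google.protobuf.UnsafeByteOperations;
import io.grpc.Detachable;
import io.grpc.HasByteBuffer;
import io.grpc.KnownLength;

// Sketch of a zero-copy-aware Marshaller.parse(..): gather the transport's
// internal buffers without copying, then parse with aliasing enabled so the
// message's ByteString fields keep pointing at those (direct) buffers.
static ContainerCommandRequestProto parseZeroCopy(InputStream stream) throws IOException {
  if (stream instanceof KnownLength && stream instanceof Detachable
      && stream instanceof HasByteBuffer
      && ((HasByteBuffer) stream).byteBufferSupported()) {
    int remaining = stream.available();
    final InputStream detached = ((Detachable) stream).detach();
    ByteString data = ByteString.EMPTY;
    while (remaining > 0) {
      final ByteBuffer buffer = ((HasByteBuffer) detached).getByteBuffer();
      final int n = buffer.remaining();
      data = data.concat(UnsafeByteOperations.unsafeWrap(buffer)); // no copy
      detached.skip(n);
      remaining -= n;
    }
    final CodedInputStream in = data.newCodedInput();
    in.enableAliasing(true);
    return ContainerCommandRequestProto.parseFrom(in);
    // NOTE: `detached` must be closed, but only after the parsed message
    // (and the ByteStrings aliasing its buffers) are no longer in use.
  }
  return ContainerCommandRequestProto.parseFrom(stream); // copying fallback
}
```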
supportsUnsafeByteBufferOperations seems to depend on MEMORY_ACCESSOR. I am not sure we have a way to control it. Do we?
I was just doing some experimenting here. Here is my sample: … However, I get an exception since my input is just some random bytes and not a proper proto format, but the stack trace shows that the bytes are getting extracted from a direct buffer. What I am thinking is: what if we load the log into a direct buffer and make CodedInputStream backed by that buffer? I am not sure if that is a possible option in the Ratis code, but just throwing out some thoughts in case it makes sense. (A minimal sketch of the idea follows.)
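A minimal sketch of that idea, assuming protobuf's unsafe direct-buffer support (the MEMORY_ACCESSOR check mentioned above) is available, as it normally is on HotSpot JVMs:

```java
import java.io.IOException;
import java.nio.ByteBuffer;

import com.google.protobuf.CodedInputStream;

// Parse from a direct ByteBuffer with aliasing enabled, so that ByteString
// fields wrap the direct buffer instead of being copied onto the heap.
static ContainerCommandRequestProto parseAliased(ByteBuffer direct) throws IOException {
  final CodedInputStream in = CodedInputStream.newInstance(direct);
  in.enableAliasing(true);
  return ContainerCommandRequestProto.parseFrom(in);
}
```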
@umamaheswararao , thanks for testing it! The log entry is from the network, not loaded from the log.
@szetszwo yeah, I had an offline chat with @duongkame. He is actually trying to get direct buffers from the stream directly. I will let him comment once he has reasonable results!
I filed 2 JIRAs to make zero-copy work for Ratis gRPC. (I did a quick try for zero-copy in GrpcService and I'm positive it's feasible.) Let's keep this PR for readChunk only.
@duongkame , sure, let's do only readChunk here. Could you review this?
@duongkame , found some bugs in this PR. Let me fix them first.
The bug is that a buffer can be released only after the proto is sent out to the network. We use gRPC onNext(..), which is asynchronous; however, onNext(..) does not return a future. Not sure how to wait for the asynchronous task to complete. @duongkame , do you have any idea?
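One possible direction, as a hedged sketch rather than anything this PR implements: if the response is produced through a custom marshaller that returns an InputStream, gRPC drains and then closes that stream after writing the message to the transport (an assumption about grpc-java internals worth verifying), so the release could be tied to close():

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical wrapper: release the backing buffer when gRPC closes the
// marshalled stream, i.e. after the bytes have been handed to the transport.
final class ReleasingStream extends InputStream {
  private final InputStream delegate;
  private final Runnable release; // e.g. buffer::release (assumed callback)

  ReleasingStream(InputStream delegate, Runnable release) {
    this.delegate = delegate;
    this.release = release;
  }

  @Override
  public int read() throws IOException {
    return delegate.read();
  }

  @Override
  public void close() throws IOException {
    try {
      delegate.close();
    } finally {
      release.run(); // the buffer is safe to free only from this point on
    }
  }
}
```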
duongkame left a comment
Thanks for the patch @szetszwo . It will solve the memory/GC inefficiency not only in the datanode but also on the client side (BlockOutputStream).
I put a few inline comments below.
hadoop-hdds/common/src/main/java/org/apache/hadoop/ozone/common/IncrementalChunkBuffer.java
```java
static ChunkBuffer preallocate(long capacity, int increment) {
  Preconditions.assertTrue(increment > 0);
  if (capacity <= increment) {
    final CodecBuffer c = CodecBuffer.allocateDirect(Math.toIntExact(capacity));
    // ...
```
It would be cleaner if we dealt directly with ByteBufAllocator and ByteBuf in ChunkBuffer. The CodecBuffer logic doesn't provide much but adds an unnecessary dependency.
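For comparison, the suggested alternative might look like this sketch (assumed shape, not the PR's code):

```java
import io.netty.buffer.ByteBuf;
import io.netty.buffer.ByteBufAllocator;

// Allocate the chunk storage straight from Netty's pooled allocator
// instead of wrapping it in CodecBuffer.
static ByteBuf allocateChunk(int capacity) {
  final ByteBuf buf = ByteBufAllocator.DEFAULT.directBuffer(capacity);
  // Pooled ByteBufs are reference-counted: buf.release() must be called
  // once the chunk has been consumed, or the allocator leaks.
  return buf;
}
```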
```java
@Override
public void onNext(ContainerCommandRequestProto request) {
  final DispatcherContext context;
  // ...
```
I think we have to do the same in ContainerStateMachine.readStateMachineData; otherwise there will be a memory leak. Not sure if I should put this comment in #5805.
@szetszwo @duongkame should we continue working on this PR after #6153?
/pending conflicts; Q: should we continue working on this PR after 6153? |
Marking this issue as un-mergeable as requested.
Please use /ready comment when it's resolved.
Please note that the PR will be closed after 21 days of inactivity from now. (But can be re-opened anytime later...)
conflicts; Q: should we continue working on this PR after 6153?
Sure, let's close this.
What changes were proposed in this pull request?
Change ChunkBuffer to allocate direct buffers in order to avoid buffer copying.
What is the link to the Apache JIRA?
HDDS-9536
How was this patch tested?
Modified existing tests.
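For context on why direct buffers avoid a copy (an illustrative sketch, not part of the patch): when a heap ByteBuffer is written to a socket, the JDK first copies it into a temporary direct buffer, whereas a direct buffer is handed to the OS as-is.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SocketChannel;

// Illustrative only: the JDK copies heap buffers into a pooled direct buffer
// inside SocketChannel.write(..), so allocating direct up front removes one
// copy per write.
static void writeHeap(SocketChannel ch, byte[] data) throws IOException {
  ch.write(ByteBuffer.wrap(data)); // hidden copy to a direct buffer in the JDK
}

static void writeDirect(SocketChannel ch, ByteBuffer direct) throws IOException {
  assert direct.isDirect();
  ch.write(direct); // passed to the native write as-is, no extra copy
}
```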