HDDS-4552. Read data from chunk into ByteBuffer[] instead of single ByteBuffer. #1685
Conversation
Force-pushed from bdfe9cb to 577ddad.
Moved the read error handling by refreshing the pipeline from BlockInputStream to ChunkInputStream. cc. @adoroszlai

@bshashikant, can you please take a look at this PR when you get a chance. Thanks.
LOG.info -> LOG.warn?
This will do an additional buffer copy. Let's see if we can explore something to avoid the buffer copy here:
https://developers.google.com/protocol-buffers/docs/reference/java/com/google/protobuf/UnsafeByteOperations
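A minimal sketch of what that could look like, assuming the chunk data is already in a ByteBuffer (names here are illustrative, not from the patch):

```java
import com.google.protobuf.ByteString;
import com.google.protobuf.UnsafeByteOperations;
import java.nio.ByteBuffer;

public class UnsafeWrapSketch {
  public static void main(String[] args) {
    // Pretend this buffer was just filled from the chunk file channel.
    ByteBuffer chunkData = ByteBuffer.allocate(1024);
    chunkData.put(new byte[1024]);
    chunkData.flip();

    // unsafeWrap shares the backing buffer instead of copying it; the
    // caller must not mutate the buffer after handing it off.
    ByteString wrapped = UnsafeByteOperations.unsafeWrap(chunkData);
    System.out.println(wrapped.size()); // 1024
  }
}
```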
We get ByteString from the response. But the returned ByteString does not have the underlying buffer boundary information. Hence ByteString#asReadOnlyByteBufferList() will return only one ByteBuffer with all the data irrespective of the backing arrays used to construct the ByteString.
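To illustrate the point (a standalone sketch, not code from the PR): a ByteString backed by one contiguous array reports a single buffer, regardless of how the data was originally pieced together on the server.

```java
import com.google.protobuf.ByteString;
import java.nio.ByteBuffer;
import java.util.List;

public class ByteStringBoundarySketch {
  public static void main(String[] args) {
    // 256 KB of data copied into one ByteString, as in a readChunk response.
    ByteString response = ByteString.copyFrom(new byte[256 * 1024]);

    // Any 64 KB pieces it may have been assembled from are not recoverable;
    // the list contains a single 256 KB buffer.
    List<ByteBuffer> buffers = response.asReadOnlyByteBufferList();
    System.out.println(buffers.size()); // 1
  }
}
```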
What if we read in small buffers on the server side itself and send it across as a list of bytestrings to the client?
Copying a big buffer on the client read path will slow down the read. We should probably do some benchmarking to understand the effects of all these options.
In case this turns out to be unavoidable, we can also think about doing ByteBuffer.compact(), which also does an intrinsic buffer copy to release the buffers, but the logic would be simpler.
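For reference, a small standalone sketch of the compact() semantics being suggested (not from the patch):

```java
import java.nio.ByteBuffer;

public class CompactSketch {
  public static void main(String[] args) {
    ByteBuffer buf = ByteBuffer.allocate(1024);
    buf.put(new byte[768]);        // producer fills part of the buffer
    buf.flip();
    buf.get(new byte[512]);        // consumer reads the first 512 bytes

    // compact() copies the 256 unread bytes to the front and readies the
    // buffer for the next fill, so consumed space is reclaimed in place.
    buf.compact();
    System.out.println(buf.position()); // 256
  }
}
```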
What if we read in small buffers on the server side itself and send it across as a list of bytestrings to the client?
This might work but would require a change in the DN-Client protocol. Would have to analyze the compatibility issues and how to address them.
In case this turns out to be unavoidable, we can also think about doing ByteBuffer.compact(), which also does an intrinsic buffer copy to release the buffers, but the logic would be simpler.
I am not sure if there is much gain in doing this. The code changes in this PR were required because the logic was inaccurate. It was working because there was always only one ByteBuffer.
The basic problem we are trying to solve here is to minimize the memory overhead in the client. In order to solve this, adding an extra buffer copy overhead (with the patch) does not seem to be a reasonable idea to me. Let's discuss in some more detail how to address this.
Force-pushed from 7819426 to e3f550c.
Changed the design to avoid buffer copying. Instead of copying read chunk data into smaller buffers on the client side, the readChunk response will return data as a list of smaller ByteStrings. Please refer to the PR description for more details.

@bshashikant can you please take a look at the updated patch.
bshashikant left a comment:
Thanks @hanishakoneru for the patch. The patch in general looks good. I am still reviewing the patch but have 2 questions:
- I think we should use the default read buffer size irrespective of whether checksum is disabled or not. It can be the same as the checksum boundary by default.
- Can we add a few acceptance tests to test the compatibility?
The problem with that would be in verifying the checksums. Let's say a chunk has a checksum boundary at every 256KB and we set the default read buffer size to 64KB. To calculate the checksum, we would need to combine 4 buffers of 64KB each and create a read-only buffer of 256KB, which would be passed to Checksum.verifyChecksum. This would result in the buffer copy we were trying to avoid in the previous design.
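A hypothetical illustration of the copy that would be needed (the method name and sizes are made up for this example, not taken from the code):

```java
import java.nio.ByteBuffer;
import java.util.List;

public class ChecksumCombineSketch {
  // Combine several small read buffers into one buffer that spans a full
  // checksum boundary; every byte gets copied, which is the cost at issue.
  static ByteBuffer combineForChecksum(List<ByteBuffer> readBuffers,
      int checksumBoundary) {
    ByteBuffer combined = ByteBuffer.allocate(checksumBoundary);
    for (ByteBuffer b : readBuffers) {
      combined.put(b.duplicate());
    }
    combined.flip();
    return combined.asReadOnlyBuffer();
  }

  public static void main(String[] args) {
    List<ByteBuffer> pieces = List.of(
        ByteBuffer.allocate(64 * 1024), ByteBuffer.allocate(64 * 1024),
        ByteBuffer.allocate(64 * 1024), ByteBuffer.allocate(64 * 1024));
    System.out.println(
        combineForChecksum(pieces, 256 * 1024).remaining()); // 262144
  }
}
```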
Yes, I am working on adding more tests here.
@hanishakoneru, can we do ByteString.concat in cases where bytes per checksum < read buffer size? It doesn't do a buffer copy as per the documentation here: https://developers.google.com/protocol-buffers/docs/reference/java/com/google/protobuf/ByteString.
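For reference, concat builds a rope over the existing pieces rather than copying them (a standalone sketch; the sizes are arbitrary):

```java
import com.google.protobuf.ByteString;

public class ConcatSketch {
  public static void main(String[] args) {
    ByteString first = ByteString.copyFrom(new byte[64 * 1024]);
    ByteString second = ByteString.copyFrom(new byte[64 * 1024]);

    // concat() links the two pieces without copying their bytes,
    // per the ByteString javadoc referenced above.
    ByteString combined = first.concat(second);
    System.out.println(combined.size()); // 131072
  }
}
```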
Force-pushed from 853bfdf to 3b5253e.
ByteString concat could work. But the problem is that ChunkBuffer wraps around ByteBuffers, and checksums are calculated on ChunkBuffers. We would have to change the whole ChunkBuffer model (or change checksum computations to use ByteStrings instead). I think the change would get very complicated. Also, with concatenating ByteStrings, we would have to keep track of position, limit, etc. separately to track checksum boundaries.
The existing xcompat acceptance tests added as part of HDDS-4731 should cover most of the testing required for this change.
Just an idea: we have a
Yes, the same.
Thanks @adoroszlai
Yes, using ChunkBuffer implementation of ByteBufferList in this PR to wrap the list of buffers. |
bshashikant left a comment:
Thanks @hanishakoneru for the explanation. The changes look good, with a few minor suggestions.
hadoop-hdds/client/src/main/java/org/apache/hadoop/hdds/scm/storage/ChunkInputStream.java
Unintended change?
Yup. Reverted.
Let's not change the default for now. We can change it once we do some tests and analyze performance.
Sure. Reverted back to 1MB.
Thanks @hanishakoneru. Can you please rebase? Also, do we need to add this to the Ozone documentation somewhere?
1. Add a ReadChunk version: V0 for returning data as a single ByteString (old format), V1 for returning data as a list of ByteStrings, with each ByteString length = number of bytes per checksum.
2. If chunk does not have checksums, then set buffer capacity to a default (64KB).
3. Return data from chunk as a list of ByteBuffers instead of a single ByteBuffer.
Force-pushed from a3d49a1 to 6beb15d.
Sure, we can open a doc Jira to get this documented. Do you know where we can document this?
Thanks @hanishakoneru. For documentation, you can refer to https://issues.apache.org/jira/browse/HDDS-4948 for example.
Thank you @bshashikant. I will merge this shortly. We currently do not have docs explaining the client read/write path. Do you propose we add docs for that? As this is an internal feature (not configurable), do we want to add it to the docs, or would the javadocs in the code suffice?
adoroszlai left a comment:
Thanks @hanishakoneru for working on this, and sorry for the late review. I have a few comments. Would it be possible to address them in a follow-up issue?
    throw ex;
  }
  data.clear();
  dataBuffers = null;
If the read from the first location fails and we have to fall back to the temp chunk file, this would cause an exception.
  long bufferCapacity = 0;
  if (info.isReadDataIntoSingleBuffer()) {
    // Older client - read all chunk data into one single buffer.
    bufferCapacity = len;
  } else {
    // Set buffer capacity to checksum boundary size so that each buffer
    // corresponds to one checksum. If checksum is NONE, then set buffer
    // capacity to default (OZONE_CHUNK_READ_BUFFER_DEFAULT_SIZE_KEY = 64KB).
    ChecksumData checksumData = info.getChecksumData();

    if (checksumData != null) {
      if (checksumData.getChecksumType() ==
          ContainerProtos.ChecksumType.NONE) {
        bufferCapacity = defaultReadBufferCapacity;
      } else {
        bufferCapacity = checksumData.getBytesPerChecksum();
      }
    }
  }
  // If the buffer capacity is 0, set all the data into one ByteBuffer
  if (bufferCapacity == 0) {
    bufferCapacity = len;
  }
This block seems to be duplicated from FilePerBlock.... Can it be extracted?
  return buffersList.stream()
      .map(ByteString::asReadOnlyByteBuffer)
      .collect(Collectors.toList());
I think we should avoid streams on the read/write path. Earlier these were found to cause CPU usage hotspots. See e.g. HDDS-3702.
(Also in a few other instances below.)
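A plain-loop equivalent of the snippet above might look like this (sketch only; the wrapper method name and the assumption that buffersList is a List<ByteString> are mine):

```java
import com.google.protobuf.ByteString;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

public class NoStreamSketch {
  // Loop-based alternative to the stream pipeline, avoiding the
  // stream/lambda overhead on the read path.
  static List<ByteBuffer> asReadOnlyBuffers(List<ByteString> buffersList) {
    List<ByteBuffer> buffers = new ArrayList<>(buffersList.size());
    for (ByteString bs : buffersList) {
      buffers.add(bs.asReadOnlyByteBuffer());
    }
    return buffers;
  }

  public static void main(String[] args) {
    List<ByteString> input = List.of(ByteString.copyFromUtf8("hello"));
    System.out.println(asReadOnlyBuffers(input).size()); // 1
  }
}
```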
Thanks for the review @adoroszlai. I will address them in HDDS-4553 (#2062), which is a follow-up of this Jira.
What changes were proposed in this pull request?
When a ReadChunk operation is performed, all the data to be read from one chunk is read into a single ByteBuffer.
This Jira proposes to read the data from the channel into an array of ByteBuffers to optimize reads. For example, currently we hold onto the buffer until the ChunkInputStream is closed or the last chunk byte is read (which can lead to up to 4 MB of data being cached in memory per ChunkInputStream). If we have smaller buffers, they can be released sooner, thus helping to optimize memory utilization (HDDS-4553). This Jira is a prerequisite for optimizing client memory utilization.
We propose to add a ReadChunk version to the ReadChunkRequestProto to determine whether the response should have all the chunk data as a single ByteString (V0) or as a list of ByteStrings (V1). The default version will be V0. Older clients will get data back as a single ByteString to maintain wire compatibility.
For new clients, data will be returned as a list of ByteStrings, with each ByteString's length equal to the number of bytes per checksum. This keeps checksum verification straightforward and avoids extra buffer copying. For chunks with no checksum, the buffer capacity will be set to a configurable default.
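As a rough illustration of the V1 response shape (a simplified sketch, not the actual server-side code; the method name and the explicit copy are made up for clarity):

```java
import com.google.protobuf.ByteString;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

public class ReadChunkV1Sketch {
  // Slice chunk data into ByteStrings no larger than bytesPerChecksum,
  // so each piece lines up with one checksum on the client side.
  static List<ByteString> sliceForV1(ByteBuffer chunkData, int bytesPerChecksum) {
    List<ByteString> pieces = new ArrayList<>();
    while (chunkData.hasRemaining()) {
      int pieceLen = Math.min(bytesPerChecksum, chunkData.remaining());
      ByteBuffer slice = chunkData.slice();
      slice.limit(pieceLen);
      pieces.add(ByteString.copyFrom(slice)); // copy kept simple for the sketch
      chunkData.position(chunkData.position() + pieceLen);
    }
    return pieces;
  }

  public static void main(String[] args) {
    ByteBuffer chunk = ByteBuffer.allocate(10 * 1024);
    System.out.println(sliceForV1(chunk, 4 * 1024).size()); // 3 pieces
  }
}
```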
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-4552
How was this patch tested?
Added unit tests. Working on adding more unit tests.