HDDS-6039. Define a minimum packet size during streaming writes. #2883
It's better to allocate buffers equal in size to the packet size and keep using the same buffer until the packet is full or the stream is closed/flushed. This will help reduce the number of buffer allocation calls.
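A minimal sketch of the buffer-reuse suggestion above, assuming a single `ByteBuffer` sized to the packet. The class name `PacketBuffer`, its methods, and the in-memory `sentPackets` list (a stand-in for the wire) are hypothetical and are not taken from the PR:

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: one buffer, sized to the packet, reused for every packet. */
class PacketBuffer {
    private final ByteBuffer buffer;                             // allocated once
    private final List<byte[]> sentPackets = new ArrayList<>();  // stand-in for the wire

    PacketBuffer(int packetSize) {
        this.buffer = ByteBuffer.allocate(packetSize);
    }

    /** Copies data into the buffer and sends a packet each time the buffer fills. */
    void write(byte[] data, int off, int len) {
        while (len > 0) {
            int n = Math.min(len, buffer.remaining());
            buffer.put(data, off, n);
            off += n;
            len -= n;
            if (!buffer.hasRemaining()) {
                sendPacket();
            }
        }
    }

    /** Sends whatever is buffered, even a partial packet (flush or close). */
    void flush() {
        if (buffer.position() > 0) {
            sendPacket();
        }
    }

    private void sendPacket() {
        buffer.flip();
        // The copy below only records the packet for this demo; a real
        // implementation would hand the buffer contents to the transport.
        byte[] packet = new byte[buffer.remaining()];
        buffer.get(packet);
        sentPackets.add(packet);
        buffer.clear(); // reuse the same buffer for the next packet
    }

    List<byte[]> sentPackets() {
        return sentPackets;
    }
}
```

With a packet size of 4, ten one-byte writes produce two full packets, and a final `flush()` sends the remaining two bytes as a third packet.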
Thanks @sadanand48 for the contribution. Can you describe the purpose of adding this PR? Is this PR related to retry optimization?
The idea here is to ensure we do not send too many packets when the data is written in very small fragments (a few bytes at a time). There have been cases (for example, the word count acceptance test) where the write pattern is one byte at a time. In such cases, we need to avoid sending extremely small packets over the wire. This idea is derived from HDFS, where the minimum packet size is defined as 64 KB.
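To put rough numbers on this, here is a small worked example. It assumes every packet except possibly the last is filled to the minimum packet size. The helper `packetsNeeded` and the 1 MiB total are illustrative only:

```java
/** Hypothetical helper: counts packets for a given total and minimum packet size. */
class PacketCount {
    /** Packets needed when every packet except possibly the last is full. */
    static long packetsNeeded(long totalBytes, long minPacketSize) {
        return (totalBytes + minPacketSize - 1) / minPacketSize; // ceiling division
    }
}
```

For 1 MiB written one byte at a time, sending each write as its own packet means 1,048,576 packets, whereas `packetsNeeded(1 << 20, 64 * 1024)` gives 16.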
Thanks @bshashikant for the explanation. I get your point. Yeah, we need this.
Let's set this size to 1 MB == bytesPerCheckSum by default.
Set to 1 MB.
In a performance test, I found that 512 KB performs better than 1 MB. Can we change the default value to 512 KB?
It's better to add position() and other such calls here, instead of exposing the underlying buffer instance directly.
Done
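A minimal sketch of the delegation idea from this thread, assuming the wrapper holds a `java.nio.ByteBuffer`. The class name `StreamBufferSketch` and its method set are hypothetical, and the actual class in the PR may look different:

```java
import java.nio.ByteBuffer;

/** Hypothetical wrapper that delegates to its buffer without exposing it. */
class StreamBufferSketch {
    private final ByteBuffer buffer;

    StreamBufferSketch(ByteBuffer buffer) {
        this.buffer = buffer;
    }

    /** Number of bytes written so far. */
    int position() {
        return buffer.position();
    }

    /** Free space left, deduced from the buffer rather than tracked separately. */
    int remaining() {
        return buffer.remaining();
    }

    /** Appends len bytes from src starting at off. */
    void put(byte[] src, int off, int len) {
        buffer.put(src, off, len);
    }
}
```

Callers see only `position()`, `remaining()`, and `put(...)`, so they cannot change the buffer's position or limit behind the wrapper's back.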
It's safe to always return a read-only copy of the buffer if required. I would prefer to remove this.
Removed this.
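For reference, a small sketch of what handing out a read-only buffer could look like with `ByteBuffer.asReadOnlyBuffer()`. Note that this returns a view that shares content with the source rather than a true copy. The helper names `writtenBytes` and `rejectsWrites` are hypothetical:

```java
import java.nio.ByteBuffer;
import java.nio.ReadOnlyBufferException;

/** Hypothetical helpers showing a read-only view over the bytes written so far. */
class ReadOnlyViewDemo {
    /** Returns a read-only view covering bytes [0, position) of src. */
    static ByteBuffer writtenBytes(ByteBuffer src) {
        // Shares content with src, but has its own position and limit.
        ByteBuffer view = src.asReadOnlyBuffer();
        view.flip(); // limit = bytes written so far, position = 0
        return view;
    }

    /** True if a write through the view is rejected. */
    static boolean rejectsWrites(ByteBuffer view) {
        try {
            view.put((byte) 0);
            return false;
        } catch (ReadOnlyBufferException e) {
            return true;
        }
    }
}
```

Flipping the view leaves the source buffer's position untouched, so the writer can keep appending while a reader consumes the view.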
Do we really need to track currentBufferRemaining? It can always be deduced from streamBuffer.
Done.
Will this position always be set to 0? The starting offset should be deduced from the last ack'd length in case of retry.
We are writing a new chunk here every time, picking each entry from the bufferList, hence the position will be 0. Same logic as this
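To illustrate the question and the reply, here is a sketch of how a resume point could be deduced from the last acknowledged length over a list of buffers. This is not the PR's actual retry logic. The helper `resumePoint` and the use of plain buffer lengths are assumptions made for illustration:

```java
/** Hypothetical helper: finds where a retry would resume in a list of buffers. */
class RetryOffset {
    /**
     * Returns {bufferIndex, offsetWithinBuffer} for the first unacknowledged byte.
     * bufferLengths holds the byte count of each entry in the buffer list.
     */
    static int[] resumePoint(int[] bufferLengths, long ackedLength) {
        long remaining = ackedLength;
        for (int i = 0; i < bufferLengths.length; i++) {
            if (remaining < bufferLengths[i]) {
                return new int[] {i, (int) remaining};
            }
            remaining -= bufferLengths[i];
        }
        // Everything was acknowledged; nothing to resend.
        return new int[] {bufferLengths.length, 0};
    }
}
```

When acknowledgements land exactly on buffer boundaries, the offset within the buffer is 0, which matches the reply above: each entry is rewritten as a new chunk from position 0.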
d1e4eea to 89a12ad
d1e060e to 5f7951a
Changed the default of ozone.fs.datastream.enable and created a new Jira (HDDS-6126) to track the MR test failure.
+1 the change looks good.
Let's merge this PR first. If there is any problem later, we will modify it.
What changes were proposed in this pull request?
Define a minimum packet size during streaming writes, 64 KB by default.
What is the link to the Apache JIRA?
https://issues.apache.org/jira/browse/HDDS-6039
How was this patch tested?
Unit test