Skip to content

Conversation

@sadanand48
Copy link
Contributor

@sadanand48 sadanand48 commented Dec 3, 2021

What changes were proposed in this pull request?

Define a minimum packet size during streaming writes as 64kb default.

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-6039

How was this patch tested?

Unit test

@sadanand48 sadanand48 marked this pull request as ready for review December 3, 2021 20:41
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's better to allocate buffers of size equal to packet size and keep on using the same buffer until the packet is full or stream is closed/flushed. This will help reducing buffer allocations call.

@captainzmc
Copy link
Member

Thanks @sadanand48 for the contribution. Can you describe the purpose of adding this PR? Is this PR related to retry optimization?

@bshashikant
Copy link
Contributor

Thanks @sadanand48 for the contribution. Can you describe the purpose of adding this PR? Is this PR related to retry optimization?

The idea here is to ensure we are not sending too many packets when the data gets written in very small fragements(in bytes). There has been cases (for example-- word count acceptance test) where the write pattern is in terms of one byte at a time. For such cases, it becomes necessary to avoid sending extremely small packets over the wire. This idea is derived from hdfs where min packet size is defined as 64 kb.

@captainzmc
Copy link
Member

Thanks @bshashikant for the explanation. I get your point. Yeah, we need this.

@sadanand48 sadanand48 marked this pull request as draft December 8, 2021 09:09
@sadanand48 sadanand48 marked this pull request as ready for review December 9, 2021 06:29
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's set this size to 1 MB == bytesPerCheckSum by default.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Set to 1Mb

Copy link
Member

@captainzmc captainzmc Dec 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In performance test, I found that the performance of 512K is better than that of IMB. Can we change the default value to 512K?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its better to add position() and other calls in here, instead of getting an instance of underlying buffer directly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its safe to always return a read only copy of the buffer if required. I would prefer to remove this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need to track currentBufferRemaining? It can always be deduced from streamBuffer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will this position be always be set to 0?? The starting offset should be deduced rom last ack'd length in case of retry.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are writing a new chunk here everytime picking each entry from the bufferList, hence the position will be 0. Same logic as this

@sadanand48
Copy link
Contributor Author

Changed the default of ozone.fs.datastream.enable and created a new jira (HDDS-6126) to track the MR test failure .

Copy link
Member

@captainzmc captainzmc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 the change looks good.
Let's merge this PR first. If there is any problem later, we will modify it.

@captainzmc captainzmc merged commit cd143c5 into apache:HDDS-4454 Dec 21, 2021
captainzmc pushed a commit to captainzmc/hadoop-ozone that referenced this pull request Jul 4, 2022
szetszwo pushed a commit that referenced this pull request Oct 25, 2022
(cherry picked from commit 1f56a45)
(cherry picked from commit 750c5f475a4a03cdb078699d5f5a2254f1f491b0)
szetszwo pushed a commit that referenced this pull request Nov 7, 2022
(cherry picked from commit 1f56a45)
(cherry picked from commit 750c5f475a4a03cdb078699d5f5a2254f1f491b0)
(cherry picked from commit 7e61b1f)
szetszwo pushed a commit that referenced this pull request Dec 1, 2022
szetszwo pushed a commit that referenced this pull request Dec 16, 2022
nishitpatira pushed a commit to nishitpatira/ozone that referenced this pull request Dec 16, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants