HDDS-3223. Improve s3g read 1GB object efficiency by 100 times #843

runzhiwang · 2020-04-20T02:19:04Z

What changes were proposed in this pull request?

What's the problem?

Using dd to read a 1000M object from an Ozone bucket mounted with goofys takes about 470 seconds, i.e. 2.2M/s, which is too slow.

I also used tcpdump to capture the packets while reading a 200M object. As the image shows, the 1st GET request took about 1 second, but the 10th GET request took about 22 seconds. The GET requests become slower and slower.

What's the reason?
When reading a 1000M object, there are 50 GET requests, each reading 20M. For each GET, the call stack is: IOUtils::copyLarge -> IOUtils::skipFully -> IOUtils::skip -> InputStream::read.

This means that the 50th GET request, which should read only 980M-1000M, skips 0-980M by calling InputStream::read over that whole range. So the 1st GET request reads 0-20M, the 2nd reads 0-40M, the 3rd reads 0-60M, ..., and the 50th reads 0-1000M. This is why the GET requests from the 1st to the 50th become slower and slower.
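The read amplification described above can be tallied with a small sketch. The class and method names are hypothetical and are not part of Ozone or Commons IO. The sketch only models data volume (in M, as in the description) and ignores network and per-request overhead, so it does not by itself account for the measured timings.

```java
/** Hypothetical helper: estimates total data read across a series of ranged GETs. */
final class SkipByReadCost {

  /** Total M read when request k (1-based) reads everything from 0 to k * chunkM. */
  static long totalReadWhenSkippingByRead(int requests, long chunkM) {
    long total = 0;
    for (int k = 1; k <= requests; k++) {
      total += k * chunkM;
    }
    return total;
  }

  /** Total M read when every request seeks straight to its own start offset. */
  static long totalReadWhenSeeking(int requests, long chunkM) {
    return requests * chunkM;
  }

  public static void main(String[] args) {
    // 50 GET requests of 20M each, as in the description.
    System.out.println(totalReadWhenSkippingByRead(50, 20)); // 25500
    System.out.println(totalReadWhenSeeking(50, 20));        // 1000
  }
}
```

Under these assumptions the gateway reads about 25 times more data than the object contains when it skips by reading.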

See IO-203 and IO-355 for why IOUtils implements skip by reading rather than by a real skip, e.g. seek.
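To make the cost of a read-based skip concrete, here is a minimal sketch. `SkipByRead.skipFullyByReading` is a simplified stand-in written for this illustration, not the Commons IO source, and `ReadCountingStream` is a hypothetical wrapper that counts how many bytes are pulled from the underlying stream.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical wrapper that counts the bytes handed out by read(). */
final class ReadCountingStream extends InputStream {
  private final InputStream delegate;
  long bytesRead;

  ReadCountingStream(InputStream delegate) {
    this.delegate = delegate;
  }

  @Override
  public int read() throws IOException {
    int b = delegate.read();
    if (b != -1) {
      bytesRead++;
    }
    return b;
  }

  @Override
  public int read(byte[] buf, int off, int len) throws IOException {
    int n = delegate.read(buf, off, len);
    if (n > 0) {
      bytesRead += n;
    }
    return n;
  }
}

final class SkipByRead {
  /** Simplified read-based skip: discards toSkip bytes by reading them. */
  static void skipFullyByReading(InputStream in, long toSkip) throws IOException {
    byte[] scratch = new byte[2048];
    long remaining = toSkip;
    while (remaining > 0) {
      int n = in.read(scratch, 0, (int) Math.min(remaining, scratch.length));
      if (n < 0) {
        throw new EOFException(remaining + " bytes left to skip at end of stream");
      }
      remaining -= n;
    }
  }

  public static void main(String[] args) throws IOException {
    ReadCountingStream in =
        new ReadCountingStream(new ByteArrayInputStream(new byte[10_000]));
    skipFullyByReading(in, 9_800);
    // Every skipped byte was actually read from the underlying stream.
    System.out.println(in.bytesRead);
  }
}
```

When the underlying stream fetches its data over the network, each of those discarded bytes still has to be transferred, which is the behavior the stack trace above points to.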

How to improve?
Copy IOUtils::copyLarge into S3WrapperInputStream and replace IOUtils::skipFully with S3WrapperInputStream::seek. IOUtils::copyLarge is otherwise unchanged.
After the improvement, reading a 1000M object takes 4.79 seconds, i.e. 219M/s, about 100 times faster.
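The idea can be sketched as follows. This is an illustration modeled loosely on the description, not the code from the patch: `SeekableInput`, `InMemorySeekableInput`, and `SeekingCopy` are hypothetical names, and the in-memory stream only stands in for a stream that supports a real seek.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

/** Hypothetical stream type with a real seek. */
abstract class SeekableInput extends InputStream {
  abstract void seek(long pos) throws IOException;
}

/** In-memory stand-in used only for this illustration. */
final class InMemorySeekableInput extends SeekableInput {
  private final byte[] data;
  private int pos;
  long bytesRead; // counts bytes actually returned by read()

  InMemorySeekableInput(byte[] data) {
    this.data = data;
  }

  @Override
  void seek(long newPos) throws IOException {
    if (newPos < 0 || newPos > data.length) {
      throw new IOException("seek out of range: " + newPos);
    }
    pos = (int) newPos;
  }

  @Override
  public int read() {
    if (pos >= data.length) {
      return -1;
    }
    bytesRead++;
    return data[pos++] & 0xff;
  }

  @Override
  public int read(byte[] buf, int off, int len) {
    if (len == 0) {
      return 0;
    }
    if (pos >= data.length) {
      return -1;
    }
    int n = Math.min(len, data.length - pos);
    System.arraycopy(data, pos, buf, off, n);
    pos += n;
    bytesRead += n;
    return n;
  }
}

final class SeekingCopy {
  /** Copies up to length bytes starting at inputOffset, seeking instead of skipping. */
  static long copyLarge(SeekableInput in, OutputStream out,
      long inputOffset, long length, byte[] buffer) throws IOException {
    if (inputOffset > 0) {
      in.seek(inputOffset); // replaces the read-based skip
    }
    if (length == 0) {
      return 0;
    }
    int bytesToRead = buffer.length;
    if (length > 0 && length < buffer.length) {
      bytesToRead = (int) length;
    }
    long total = 0;
    int read;
    while (bytesToRead > 0 && (read = in.read(buffer, 0, bytesToRead)) != -1) {
      out.write(buffer, 0, read);
      total += read;
      if (length > 0) { // a non-positive length means "copy to the end"
        bytesToRead = (int) Math.min(length - total, buffer.length);
      }
    }
    return total;
  }

  public static void main(String[] args) throws IOException {
    InMemorySeekableInput in = new InMemorySeekableInput(new byte[1000]);
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    long copied = copyLarge(in, out, 980, 20, new byte[8]);
    // Only the requested range is read; nothing before offset 980 is touched.
    System.out.println(copied + " copied, " + in.bytesRead + " read");
  }
}
```

With a seek, the work per request depends only on the requested range and no longer on how far into the object the range starts.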

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-3223

How was this patch tested?

Existing UT and IT.

bharatviswa504

Thank You @runzhiwang for the analysis and great improvement in read performance.

I see there is one more usage of IOUtils.copyLarge in createMultipartKey; can we update it there as well with a similar change?

L534:

            if (range != null) {
              RangeHeader rangeHeader =
                  RangeHeaderParserUtil.parseRangeHeader(range, 0);
              IOUtils.copyLarge(sourceObject, ozoneOutputStream,
                  rangeHeader.getStartOffset(),
                  rangeHeader.getEndOffset() - rangeHeader.getStartOffset());

            }

Also, can we add a test for S3WrapperInputStream that exercises the new method with multiple seek values?

hadoop-ozone/s3gateway/src/main/java/org/apache/hadoop/ozone/s3/io/S3WrapperInputStream.java

runzhiwang · 2020-04-21T06:32:31Z

@bharatviswa504 I will handle this. Thanks for your comments.

runzhiwang · 2020-04-22T14:32:57Z

@bharatviswa504 Hi, I moved copyLarge into KeyInputStream and added an integration test for it in TestKeyInputStream. I also changed all the uses of IOUtils.copyLarge.

runzhiwang

I deleted the following code because the InputStream in the unit test is a ByteArrayInputStream, which cannot be cast to KeyInputStream in new S3WrapperInputStream(sourceObject.getInputStream()).

It's OK to delete it because other tests cover this case, such as MultipartUpload.robot, as the image shows.

bharatviswa504 · 2020-04-29T05:56:53Z

+1 LGTM.
Thank you @runzhiwang for the improvement.

runzhiwang force-pushed the s3g-seek branch 2 times, most recently from ced02e6 to 2388d62 on April 20, 2020 05:59

HDDS-3223. Improve s3g read 1GB object efficiency by 100 times

2388d62

bharatviswa504 reviewed Apr 21, 2020

hadoop-ozone/s3gateway/src/main/java/org/apache/hadoop/ozone/s3/io/S3WrapperInputStream.java

fix code review

f7d948b

runzhiwang commented Apr 22, 2020

adoroszlai requested a review from elek April 22, 2020 15:26

bharatviswa504 merged commit 5fbb045 into apache:master Apr 29, 2020

This was referenced Apr 29, 2020

HDDS-426. Add field modificationTime for Volume and Bucket #164

Merged

HDDS-2424. Add the recover-trash command server side handling. #399

Merged

maobaolong mentioned this pull request Aug 27, 2020

HDDS-4151. Skip the inputstream while offset larger than zero in s3g #1354

Merged
