Skip to content

Conversation

@duongkame
Copy link
Contributor

@duongkame duongkame commented Jul 13, 2022

What changes were proposed in this pull request?

Today, a keyLookup (head-object) in S3G makes 3 calls to OM (ref: ObjectEndpoint, EndpointBase and ObjectStore).

  1. getS3VolumeContext: to get the S3 volume name under the tenant (callee) context.
  2. getBucket: for no particular reason, maybe it's only for code convenience.
  3. headObject(volume, bucket, key): actual key lookup.

The same pattern happens for all APIs in ObjectEndpoint.

This code change removes the getBucket call where possible to improve S3 throughput and avoid storming OM unnecessarily.

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-7009

How was this patch tested?

Unit test, robot test.

Copy link
Contributor

@kerneltime kerneltime left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @DuongNguyen0 for starting the discussion for this change.

@swagle swagle requested a review from smengcl July 13, 2022 21:09
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This call is present in other places as well (e.g get, put, delete, abort etc). Is it possible to fix there as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, it's definitely what we want. I'd like to take that review to confirm the approach then we can apply that for other S3 APIs.

@duongkame duongkame changed the title HDDS-6996. Avoid calling getBucket/getS3VolumeContext when looking up a key HDDS-7009. Avoid calling getBucket in object APIs Jul 15, 2022
@duongkame duongkame marked this pull request as ready for review July 15, 2022 19:33
@duongkame
Copy link
Contributor Author

duongkame commented Jul 15, 2022

@kerneltime I've changed this PR scope to only avoid the getBucket calls in relevant Object APIs. The part of optimizing to avoid getS3VolumeContext in a separate call will be done by HDDS-7011.

Copy link
Contributor

@adoroszlai adoroszlai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @DuongNguyen0 for refining the patch. Can ObjectEndpoint use ClientProtocol/RpcClient directly, instead of having to pollute OzoneVolume with key-related operations?

@duongkame
Copy link
Contributor Author

Thanks @DuongNguyen0 for refining the patch. Can ObjectEndpoint use ClientProtocol/RpcClient directly, instead of having to pollute OzoneVolume with key-related operations?

Yes, it can be done that way. I decided to follow the precedence to easily stub the OzoneVolume for testing. But maybe it's not worth it to add some unhelpful code to OzoneVolume.

@duongkame duongkame closed this Jul 18, 2022
@duongkame duongkame reopened this Jul 18, 2022
@duongkame
Copy link
Contributor Author

Thanks @DuongNguyen0 for refining the patch. Can ObjectEndpoint use ClientProtocol/RpcClient directly, instead of having to pollute OzoneVolume with key-related operations?

Updated to use ClientProtocol directly instead of putting delegation in OzoneVolume. I didn't clean up all delegations in OzoneVolume and OzoneBucket, thought, as they're used by OFS as well. And anyway, same optimization to avoid calling getBucket (or maybe even getVolume because volume is already in OFS path context) can be done in many OFS operations I believe.

@kerneltime
Copy link
Contributor

Thanks @DuongNguyen0 for refining the patch. Can ObjectEndpoint use ClientProtocol/RpcClient directly, instead of having to pollute OzoneVolume with key-related operations?

In some sense, volume is a composition of buckets and keys, so having this as part of volume might not be that bad.

@adoroszlai
Copy link
Contributor

adoroszlai commented Jul 19, 2022

Can ObjectEndpoint use ClientProtocol/RpcClient directly, instead of having to pollute OzoneVolume with key-related operations?

In some sense, volume is a composition of buckets and keys, so having this as part of volume might not be that bad.

ClientProtocol already provides that interface. I think OzoneVolume/OzoneBucket etc. are a hierarchical abstraction over that. Adding everything to OzoneVolume would be a maintenance problem due to duplication.

@duongkame
Copy link
Contributor Author

@smengcl can you have a look?

Copy link
Contributor

@smengcl smengcl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @DuongNguyen0 for the patch. Overall lgtm. Some minor comments inline.

IIUC, the idea is similar to what we've done in OFS -- use ClientProtocol directly wherever possible in order to get rid of the extra round trip that gets the bucket before getting the key.

New acceptance test cases (objecthead.robot) are triggered and passing: https://github.com/apache/ozone/runs/7416388703

Copy link
Contributor

@smengcl smengcl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 pending CI. Some kubernetes (seemingly unrelated) tests were failing.

@kerneltime kerneltime merged commit 5e17998 into apache:master Jul 20, 2022
duongkame added a commit to duongkame/ozone that referenced this pull request Aug 25, 2022
@duongkame duongkame deleted the HDDS-6996 branch April 12, 2025 00:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants