Skip to content

HDDS-7231. Integrate the GetKeyInfo API to key read flows#3800

Merged
kerneltime merged 7 commits intoapache:masterfrom
duongkame:HDDS-7231
Oct 28, 2022
Merged

HDDS-7231. Integrate the GetKeyInfo API to key read flows#3800
kerneltime merged 7 commits intoapache:masterfrom
duongkame:HDDS-7231

Conversation

@duongkame
Copy link
Contributor

@duongkame duongkame commented Oct 5, 2022

What changes were proposed in this pull request?

This is a part of the container location cache implementation(HDDS-7223), specified in Client Interaction design.

This PR includes the changes to integrate the GetKeyInfo API to key read flows, keeping client code backward compatible with previous OM versions.

https://issues.apache.org/jira/browse/HDDS-7231

Details about client interaction changes:

Before:

  1. Client calls lookupKey.
  2. Client calls getBlock and readChunk. They when there's an exception in either calls:
    • If it's a security problems, fail without retry.
    • If StorageContainerException (threw when Datanode returns any stattus code than), recall lookupKey to get latest block location info.
    • If it a conectivity issues (all datanodes are not reachable), or any IOException, just retry without recalling lookupKey.

After:

Client vs new OM.

  1. Client calls getKeyInfo with forceCacheUpdate = false to tell OM to take container location from its cache.
  2. Client calls getBlock and readChunk. When there's an exception in either call:
    • If it's a security problems, fail without retry.
    • If StorageContainerException (threw when Datanode returns any stattus code than SUCCESS) or a conectivity issue (all datanodes are not reachable), recall getKeyInfo with forceCacheUpdate = true refresh block location from OM and tell OM to call SCM for updated container locations.
    • If any other IOException, just retry without recalling getKeyInfo.

Client vs old OM.

  1. Client calls lookupKey.
  2. Client calls getBlock and readChunk. When there's an exception in either calls:
    • If it's a security problems, fail without retrying.
    • If StorageContainerException (threw when Datanode returns any stattus code than SUCCESS) or a conectivity issue (all datanodes are not reachable), recall lookupKey to get latest block location info.
    • If any other IOException, just retries without recalling lookupKey.

How was this patch tested?

Standard CI: https://github.com/duongkame/ozone/actions/runs/3191413426/jobs/5207698694

  • Integration tests are added to verify clients correctly

Manual verification for the following cases:

  • New client vs old OM.
  • Old client vs new OM.

@duongkame duongkame marked this pull request as ready for review October 7, 2022 18:06
@kerneltime
Copy link
Contributor

@GeorgeJahad

@duongkame duongkame marked this pull request as draft October 11, 2022 23:42
@duongkame duongkame marked this pull request as ready for review October 11, 2022 23:42
@duongkame duongkame marked this pull request as draft October 12, 2022 23:34
@duongkame duongkame marked this pull request as ready for review October 12, 2022 23:35
@GeorgeJahad
Copy link
Contributor

@GeorgeJahad
Copy link
Contributor

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when does the assumeS3Context parameter get set to true?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great catch. The parameter will be used as per HDDS-7324 to improve S3G GET performance.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a very nicely written class of tests. A few minor nits below.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the very detailed review, @GeorgeJahad .

@duongkame
Copy link
Contributor Author

There is a call to looupKey here:

https://github.com/apache/ozone/blob/1ea4c2adc640a02508696bf346c7f1f7b6e0afd9/hadoop-ozone/client/src/main/java/org/apache/hadoop/ozone/client/rpc/RpcClient.java#L1264

Should that be changed to getKeyInfo()?

Yes, I cleaned that up as well.

There is a call to lookupFile() here:

https://github.com/apache/ozone/blob/1ea4c2adc640a02508696bf346c7f1f7b6e0afd9/hadoop-ozone/client/src/main/java/org/apache/hadoop/ozone/client/rpc/RpcClient.java#L1683

But lookupFile() doesn't appear to use the cache. Should it?

Thanks, lookupFile will be another path we will tackle for OFS. I think it's to contain the scope of this PR for key reads.
The file read path can probably be deprecated and merged to key read, but we'll see.

@duongkame duongkame requested review from GeorgeJahad and kerneltime and removed request for GeorgeJahad October 19, 2022 17:50
@duongkame duongkame marked this pull request as draft October 22, 2022 05:59
@duongkame duongkame marked this pull request as ready for review October 22, 2022 05:59
@kerneltime kerneltime merged commit 3294d28 into apache:master Oct 28, 2022
@duongkame duongkame deleted the HDDS-7231 branch April 12, 2025 00:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants

Comments