
Conversation

@duongkame
Contributor

What changes were proposed in this pull request?

Use container cache in Key listing API to improve Key/file listing performance.

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-8076

How was this patch tested?

Unit test.

@duongkame duongkame marked this pull request as ready for review March 3, 2023 23:03
Contributor

@smengcl smengcl left a comment


nice.

need another +1 as I might not have the full context.

@kerneltime
Contributor

We need to check whether the client caches or exposes the pipeline-refreshed results in any way. One alternative that avoids breaking behavior would be adding a lighter-weight list keys API (something discussed in the context of the ls command and the list bucket API).

@duongkame
Contributor Author

We need to check whether the client caches or exposes the pipeline-refreshed results in any way. One alternative that avoids breaking behavior would be adding a lighter-weight list keys API (something discussed in the context of the ls command and the list bucket API).

In the ozone-client, I don't see any code using block locations from a listStatus result.

Block location info is used in the RpcClient getKey or getFile APIs, which read key metadata individually from OM and create an OzoneInputStream that allows reading data from datanodes given the block location information.
OzoneInputStream will retry reading an individual key's metadata if it finds the metadata doesn't work.

OFS listStatus converts the file information and block locations into org.apache.hadoop.fs.LocatedFileStatus structures. I'm not sure how this is used by external dependencies, but I'd guess any data read eventually has to go through OzoneInputStream.

There's one problem left: newer clients will have OzoneInputStream retry getting block locations with getKeyInfo(cacheRefresh=true), which forces OM to refresh its container location cache. Older clients, however, will call lookupKey, which calls SCM directly to grab container locations. There will be a recurring performance degradation for older clients if OM caches an outdated container location.
Still, I don't yet see any usage of block information from the result of listStatus.
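To make the retry path above concrete, here is a minimal sketch of the refresh-on-failure pattern: a first read uses the (possibly stale) cached container location, and on failure the client retries with a forced refresh, mirroring what getKeyInfo(cacheRefresh=true) does against OM. The names (LocationCache, readWithRetry, the node strings) are illustrative assumptions, not actual Ozone APIs.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative stand-in for OM's container location cache; the
// "authoritative" map plays the role of SCM.
class LocationCache {
    private final Map<Long, String> cache = new HashMap<>();
    private final Map<Long, String> authoritative;

    LocationCache(Map<Long, String> authoritative) {
        this.authoritative = authoritative;
    }

    // cacheRefresh=true bypasses the cached entry, mirroring
    // getKeyInfo(cacheRefresh=true) forcing OM to re-query SCM.
    String getLocation(long containerId, boolean cacheRefresh) {
        if (cacheRefresh || !cache.containsKey(containerId)) {
            cache.put(containerId, authoritative.get(containerId));
        }
        return cache.get(containerId);
    }
}

public class RetryDemo {
    // First read uses the cached location; if it points at the wrong
    // node (simulated read failure), retry once with a forced refresh.
    static String readWithRetry(LocationCache cache, long containerId,
                                String liveNode) {
        String loc = cache.getLocation(containerId, false);
        if (!loc.equals(liveNode)) {
            loc = cache.getLocation(containerId, true);
        }
        return loc;
    }

    public static void main(String[] args) {
        Map<Long, String> scm = new HashMap<>();
        scm.put(1L, "dn-old");
        LocationCache cache = new LocationCache(scm);
        cache.getLocation(1L, false);  // warm the cache
        scm.put(1L, "dn-new");         // container replica moved
        System.out.println(readWithRetry(cache, 1L, "dn-new"));
    }
}
```

An older client that calls lookupKey instead would correspond to always taking the cacheRefresh=false path against a stale entry, which is where the recurring degradation comes from.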

@kerneltime
Contributor

I will file a separate JIRA to work on the lightweight list keys API. For now, this looks good. I will try out the code and review.

Contributor

@sumitagrawl sumitagrawl left a comment


LGTM. The cache TTL defaults to 6 hours, and currently the location information is observed to be unused.
I can see the JIRAs below where block location is populated for integration with Hive, per a code comment:

HDDS-2188. Implement LocatedFileStatus & getFileBlockLocations to provide node/localization information to Yarn/Mapreduce
HDDS-2914. Certain Hive queries started to fail on generating splits (#563)

Please check whether this integration is impacted.

@kerneltime kerneltime requested review from fapifta and neils-dev March 13, 2023 16:54
@duongkame
Contributor Author

LGTM. The cache TTL defaults to 6 hours, and currently the location information is observed to be unused. I can see the JIRAs below where block location is populated for integration with Hive, per a code comment:

HDDS-2188. Implement LocatedFileStatus & getFileBlockLocations to provide node/localization information to Yarn/Mapreduce
HDDS-2914. Certain Hive queries started to fail on generating splits (#563)

Please check whether this integration is impacted.

Thanks for the deep-dive, @sumitagrawl. As discussed in the community meeting, listFileStatus doesn't calculate block tokens for the returned BlockLocationInformation, so clients of the listFileStatus API can't use the result to read data. Whatever use LocatedFileStatus & getFileBlockLocations may have in the list API, it is not on the critical path of reading data, so there is no need to guarantee strong block location consistency.

I also went through the JIRAs you found. They helped me understand how LocatedFileStatus was brought into the OFS API. Note that those JIRAs only concern the getFileStatus API, which provides individual file information, not listFileStatus. My best guess is that LocatedFileStatus is reused for both OFS APIs for programming consistency (and not actually intended for listFileStatus clients).
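Since the 6-hour default TTL came up above, here is a minimal sketch of a TTL-based location cache: entries expire after a fixed duration and are treated as absent, forcing a re-fetch. The class and method names are illustrative assumptions, not the actual OM container cache implementation; the clock is injectable so the expiry can be tested without waiting.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;

// Hypothetical TTL cache, assuming the 6-hour default mentioned above.
class TtlCache<K, V> {
    static final long TTL_MILLIS = 6L * 60 * 60 * 1000; // 6 hours

    private static final class Entry<V> {
        final V value;
        final long insertedAt;
        Entry(V value, long insertedAt) {
            this.value = value;
            this.insertedAt = insertedAt;
        }
    }

    private final Map<K, Entry<V>> entries = new HashMap<>();
    private final LongSupplier clock; // injectable for testing

    TtlCache(LongSupplier clock) {
        this.clock = clock;
    }

    // Returns the cached value, or null if absent or older than the TTL.
    V get(K key) {
        Entry<V> e = entries.get(key);
        if (e == null || clock.getAsLong() - e.insertedAt >= TTL_MILLIS) {
            entries.remove(key);
            return null;
        }
        return e.value;
    }

    void put(K key, V value) {
        entries.put(key, new Entry<>(value, clock.getAsLong()));
    }
}

public class TtlCacheDemo {
    public static void main(String[] args) {
        long[] now = {0L};
        TtlCache<Long, String> cache = new TtlCache<>(() -> now[0]);
        cache.put(1L, "dn-1");
        System.out.println(cache.get(1L));        // fresh entry
        now[0] = TtlCache.TTL_MILLIS + 1;         // jump past the TTL
        System.out.println(cache.get(1L));        // expired, re-fetch needed
    }
}
```

A caller getting null here would fall back to the authoritative source (SCM, in the OM case) and re-populate the entry, which bounds how long a stale container location can be served.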

@kerneltime kerneltime merged commit f83b008 into apache:master Mar 15, 2023
errose28 added a commit to errose28/ozone that referenced this pull request Mar 16, 2023
* master: (262 commits)
  HDDS-8153. Integrate ContainerBalancer with MoveManager (apache#4391)
  HDDS-8090. When getBlock from a datanode fails, retry other datanodes. (apache#4357)
  HDDS-8163 Use try-with-resources to ensure close rockdb connection in SstFilteringService (apache#4402)
  HDDS-8065. Provide GNU long options (apache#4394)
  HDDS-7930. [addendum] input stream does not refresh expired block token.
  HDDS-7930. input stream does not refresh expired block token. (apache#4378)
  HDDS-7740. [Snapshot] Implement SnapshotDeletingService (apache#4244)
  HDDS-8076. Use container cache in Key listing API. (apache#4346)
  HDDS-8091. [addendum] Generate list of config tags from ConfigTag enum - Hadoop 3.1 compatibility fix (apache#4374)
  HDDS-8144. TestDefaultCertificateClient#testTimeBeforeExpiryGracePeriod fails as we approach DST. (apache#4382)
  HDDS-8151. Support fine grained lifetime for root CA certificate (apache#4386)
  HDDS-8150. RpcClientTest and ConfigurationSourceTest not run due to naming convention (apache#4388)
  HDDS-8131. Add Configuration for OM Ratis Log Purge Tuning Parameters. (apache#4371)
  HDDS-8133. Create ozone sh key checksum command (apache#4375)
  HDDS-8142. Check if no entries in Block DB for a container on container delete (apache#4379)
  HDDS-8118. Fail container delete on non empty chunks dir (apache#4367)
  HDDS-8028. JNI for RocksDB SST Dump tool (apache#4315)
  HDDS-8129. ContainerStateMachine allows two different tasks with the same container id running in parallel. (apache#4370)
  HDDS-8119. Remove loosely related AutoCloseable from SendContainerOutputStream (apache#4368)
  close db connection (apache#4366)
  ...
@duongkame duongkame deleted the HDDS-8076 branch April 12, 2025 00:11