HDDS-8076. Use container cache in Key listing API. #4346
smengcl
left a comment
Nice.
This needs another +1, as I might not have the full context.
We need to check if the client is caching or exposing the pipeline-refreshed results in any way. One alternative without breaking behavior would be adding a lighter-weight list keys API (something discussed in the context of …
In the lock location info is used in OFS. There's one problem left: newer clients will have …
I will file a separate JIRA to work on lightweight list keys. For now, this looks good. I will try out the code and review.
LGTM. The cache TTL is 6 hours by default, and as currently observed, the location information is not used.
I can see the JIRAs below, where block location is populated for integration with Hive, as per the code comment:
HDDS-2188. Implement LocatedFileStatus & getFileBlockLocations to provide node/localization information to Yarn/Mapreduce
HDDS-2914. Certain Hive queries started to fail on generating splits (#563)
Please check whether this integration is impacted.
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/KeyManagerImpl.java
…ne/om/KeyManagerImpl.java Co-authored-by: Ritesh H Shukla <[email protected]>
Thanks for the deep-dive, @sumitagrawl. As discussed in the community meeting, listFileStatus doesn't calculate block tokens for the returned BlockLocationInformation. Thus, clients of the listFileStatus API can't use the result to read data. So, whatever the usage of … I also went through the JIRAs you found. They helped me understand how the …
* master: (262 commits)
  HDDS-8153. Integrate ContainerBalancer with MoveManager (apache#4391)
  HDDS-8090. When getBlock from a datanode fails, retry other datanodes. (apache#4357)
  HDDS-8163 Use try-with-resources to ensure close rockdb connection in SstFilteringService (apache#4402)
  HDDS-8065. Provide GNU long options (apache#4394)
  HDDS-7930. [addendum] input stream does not refresh expired block token.
  HDDS-7930. input stream does not refresh expired block token. (apache#4378)
  HDDS-7740. [Snapshot] Implement SnapshotDeletingService (apache#4244)
  HDDS-8076. Use container cache in Key listing API. (apache#4346)
  HDDS-8091. [addendum] Generate list of config tags from ConfigTag enum - Hadoop 3.1 compatibility fix (apache#4374)
  HDDS-8144. TestDefaultCertificateClient#testTimeBeforeExpiryGracePeriod fails as we approach DST. (apache#4382)
  HDDS-8151. Support fine grained lifetime for root CA certificate (apache#4386)
  HDDS-8150. RpcClientTest and ConfigurationSourceTest not run due to naming convention (apache#4388)
  HDDS-8131. Add Configuration for OM Ratis Log Purge Tuning Parameters. (apache#4371)
  HDDS-8133. Create ozone sh key checksum command (apache#4375)
  HDDS-8142. Check if no entries in Block DB for a container on container delete (apache#4379)
  HDDS-8118. Fail container delete on non empty chunks dir (apache#4367)
  HDDS-8028. JNI for RocksDB SST Dump tool (apache#4315)
  HDDS-8129. ContainerStateMachine allows two different tasks with the same container id running in parallel. (apache#4370)
  HDDS-8119. Remove loosely related AutoCloseable from SendContainerOutputStream (apache#4368)
  close db connection (apache#4366)
  ...
What changes were proposed in this pull request?
Use container cache in Key listing API to improve Key/file listing performance.
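For readers unfamiliar with the idea, here is a minimal sketch of a TTL-based cache for container locations, assuming the general pattern rather than the code in this patch. The class `ContainerLocationCacheSketch`, its `loader` and `clock` parameters, and the string locations are all hypothetical and are not Ozone APIs; only the 6-hour default TTL comes from the review discussion. It illustrates why listing many keys that share a few containers can avoid repeated location lookups.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongFunction;
import java.util.function.LongSupplier;

/** Hypothetical illustration only; these are not Ozone's actual classes. */
final class ContainerLocationCacheSketch {

  /** A cached location together with the time it was loaded. */
  private static final class Entry {
    final String location;
    final long loadedAtMillis;

    Entry(String location, long loadedAtMillis) {
      this.location = location;
      this.loadedAtMillis = loadedAtMillis;
    }
  }

  private final Map<Long, Entry> entries = new HashMap<>();
  private final LongFunction<String> loader; // stands in for the expensive lookup
  private final LongSupplier clock;          // injectable so expiry can be tested
  private final long ttlMillis;
  private int loadCount;

  ContainerLocationCacheSketch(LongFunction<String> loader,
                               LongSupplier clock,
                               long ttlMillis) {
    this.loader = loader;
    this.clock = clock;
    this.ttlMillis = ttlMillis;
  }

  /** Returns the cached location, reloading it if absent or older than the TTL. */
  synchronized String get(long containerId) {
    long now = clock.getAsLong();
    Entry entry = entries.get(containerId);
    if (entry == null || now - entry.loadedAtMillis >= ttlMillis) {
      entry = new Entry(loader.apply(containerId), now);
      entries.put(containerId, entry);
      loadCount++;
    }
    return entry.location;
  }

  /** Number of times the loader was actually invoked. */
  synchronized int loadCount() {
    return loadCount;
  }

  public static void main(String[] args) {
    long[] now = {0L};
    long sixHours = 6L * 60 * 60 * 1000; // default TTL mentioned in the review
    ContainerLocationCacheSketch cache = new ContainerLocationCacheSketch(
        id -> "pipeline-for-" + id, () -> now[0], sixHours);

    // Three listed keys whose blocks live in only two containers.
    long[] containerIdsOfListedKeys = {1L, 1L, 2L};
    for (long id : containerIdsOfListedKeys) {
      cache.get(id);
    }
    System.out.println("loads after listing: " + cache.loadCount()); // 2

    now[0] += sixHours; // entries are now stale
    cache.get(1L);
    System.out.println("loads after expiry: " + cache.loadCount()); // 3
  }
}
```

The trade-off in such a design is staleness: within the TTL, a listing may return location information that is no longer current, which is why the discussion above focuses on whether clients rely on the locations returned by list calls.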
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-8076
How was this patch tested?
Unit test.