Hdds 12793 #19

Closed
ChenSammi wants to merge 6 commits into HDDS-8342 from HDDS-12793

@ChenSammi

What changes were proposed in this pull request?

Provide a one-liner summary of the changes in the PR Title field above.
It should be in the form of HDDS-1234. Short summary of the change.

Please describe your PR in detail:

  • What changes are proposed in the PR? and Why? It would be better if it is written from third person's
    perspective not just for the reviewer.
  • Provide as much context and rationale for the pull request as possible. It could be copy-paste from
    the Jira's description if the jira is well defined.
  • If it is complex code, describe the approach used to solve the issue. If possible attach design doc,
    issue investigation, github discussion, etc.

Examples of well-written pull requests:

What is the link to the Apache JIRA

Please create an issue in ASF JIRA before opening a pull request, and you need to set the title of the pull
request which starts with the corresponding JIRA issue number. (e.g. HDDS-XXXX. Fix a typo in YYY.)

(Please replace this section with the link to the Apache JIRA)

How was this patch tested?

(Please explain how this patch was tested. Ex: unit tests, manual tests, workflow run on the fork git repo.)
(If this patch involves UI changes, please attach a screenshot; otherwise, remove this.)

@gemini-code-assist

Summary of Changes

Hello @ChenSammi, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly refactors the KeyLifecycleService within Ozone Manager to improve how lifecycle rules are applied to directories, particularly in File System Optimized (FSO) buckets. The changes introduce a more sophisticated, stack-based recursive evaluation mechanism, allowing for accurate and efficient processing of deeply nested directories. Additionally, the trash functionality has been refined to properly handle the movement of directories, ensuring that lifecycle policies are consistently applied across both keys and directories. New metrics are also included to provide better visibility into directory rename operations.

Highlights

  • Recursive Directory Evaluation: The KeyLifecycleService now employs a stack-based, depth-first traversal for evaluating directories, enabling more robust and efficient processing of nested directory structures in FSO buckets.
  • Unified Key and Directory Handling: A new evaluateKeyAndDirTable method unifies the logic for processing both keys and directories, streamlining the lifecycle rule application process.
  • Improved Trash Functionality for Directories: The trash mechanism has been enhanced to correctly move directories to the trash, including the creation of necessary nested trash directories, and new metrics have been added to track renamed directories.
  • New Configuration for Directory Cache: A new configuration key, ozone.lifecycle.service.delete.cached.directory.max-count, has been introduced to control the maximum number of directories cached during lifecycle service deletion, with a default of 1,000,000.
  • New Internal Utility Classes: Several new internal classes (PendingEvaluateDirectory, SubDirectorySummary, LimitedSizeStack, CapacityFullException) were added to support the new recursive directory evaluation logic.
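
The stack-based, depth-first traversal mentioned in the highlights can be sketched roughly as follows. This is a minimal illustration only: the class shapes, the `traverse` helper, the in-memory `Map` standing in for the directory table, and the pre-order visit order are all assumptions made for the example. The names `LimitedSizeStack` and `CapacityFullException` are borrowed from the summary, but their real signatures are not shown in this conversation.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch; not the classes added by this PR. */
final class BoundedDfsSketch {

  /** Thrown when the traversal stack would exceed its configured capacity. */
  static final class CapacityFullException extends RuntimeException {
    CapacityFullException(int capacity) {
      super("stack capacity " + capacity + " reached");
    }
  }

  /** A LIFO stack that refuses to grow beyond a fixed capacity. */
  static final class LimitedSizeStack<T> {
    private final Deque<T> items = new ArrayDeque<>();
    private final int capacity;

    LimitedSizeStack(int capacity) {
      this.capacity = capacity;
    }

    void push(T item) {
      if (items.size() >= capacity) {
        throw new CapacityFullException(capacity);
      }
      items.push(item);
    }

    T pop() {
      return items.pop();
    }

    boolean isEmpty() {
      return items.isEmpty();
    }
  }

  /**
   * Visits every directory reachable from root depth-first, without recursion.
   * The tree maps a directory name to the names of its child directories.
   */
  static List<String> traverse(Map<String, List<String>> tree, String root, int capacity) {
    List<String> visited = new ArrayList<>();
    LimitedSizeStack<String> stack = new LimitedSizeStack<>(capacity);
    stack.push(root);
    while (!stack.isEmpty()) {
      String dir = stack.pop();
      visited.add(dir);
      List<String> children = tree.getOrDefault(dir, List.of());
      // Push in reverse so children are visited in listing order.
      for (int i = children.size() - 1; i >= 0; i--) {
        stack.push(children.get(i));
      }
    }
    return visited;
  }

  public static void main(String[] args) {
    Map<String, List<String>> tree = Map.of(
        "/", List.of("a", "b"),
        "a", List.of("a1", "a2"));
    System.out.println(traverse(tree, "/", 16)); // prints [/, a, a1, a2, b]
  }
}
```

Bounding the stack keeps memory use predictable on very wide or deep trees: instead of growing without limit, the traversal fails fast and the caller can decide how to resume.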

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request significantly refactors the KeyLifecycleService to improve the handling of lifecycle rules for FSO buckets, particularly for nested directories and prefix-based rules. It introduces a more robust depth-first search mechanism using a stack to traverse directory structures, which should prevent out-of-memory issues with deep hierarchies. The changes also add support for moving directories to trash, a previously missing feature. The new logic appears more correct and is well-supported by extensive new tests. My review includes a couple of performance suggestions to optimize collection lookups within the new directory traversal logic.

return;
}

List<Long> deletedDirList = new ArrayList<>();


medium

For better performance, consider using a HashSet<Long> instead of an ArrayList<Long> for deletedDirList (perhaps renaming it to deletedDirSet). The current implementation uses contains() and remove(Object) on an ArrayList on lines 537-540, which have O(n) time complexity. With a HashSet, these operations would be O(1) on average, which can be a significant improvement if the number of deleted directories is large.

You would also need to update its usage on lines 537-540 and 624. For example, lines 537-540 could be changed to:

for (OmDirectoryInfo subDir : subDirSummary.getSubDirList()) {
  if (deletedDirSet.remove(subDir.getObjectID())) {
    deletedDirCount++;
  }
}
Suggested change
- List<Long> deletedDirList = new ArrayList<>();
+ Set<Long> deletedDirSet = new HashSet<>();
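
To make the suggestion above concrete, here is a small standalone sketch of the count-and-remove pattern using a `HashSet<Long>`. The class name, the `countAndRemoveDeleted` helper, and the plain `List<Long>` of object IDs are hypothetical stand-ins for illustration; they are not code from the PR.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of the HashSet-based lookup suggested in the review. */
final class DeletedDirCountSketch {

  /**
   * Counts how many of subDirIds were present in deletedDirSet,
   * removing each match from the set as it is counted.
   */
  static long countAndRemoveDeleted(Set<Long> deletedDirSet, List<Long> subDirIds) {
    long deletedDirCount = 0;
    for (Long id : subDirIds) {
      // Set.remove returns true only if the element was present, so a
      // separate contains() call is unnecessary.
      if (deletedDirSet.remove(id)) {
        deletedDirCount++;
      }
    }
    return deletedDirCount;
  }

  public static void main(String[] args) {
    Set<Long> deleted = new HashSet<>(List.of(1L, 2L, 3L));
    System.out.println(countAndRemoveDeleted(deleted, List.of(2L, 3L, 4L))); // prints 2
    System.out.println(deleted); // prints [1]
  }
}
```

One trade-off to keep in mind: a `HashSet` does not preserve insertion order, so this swap is only safe if nothing else depends on the order of the deleted IDs.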

// and fromKey is also in table
long numKeysUnderDir = 0;
long numKeysExpired = 0;
List<String> deletedKeyList = new ArrayList();


medium

For better performance, consider using a HashSet<String> for deletedKeyList (perhaps renaming it to deletedKeySet). The contains() check on line 590 has O(n) complexity for an ArrayList, which could be slow if there are many deleted keys in the cache. A HashSet would provide O(1) average time complexity for this check.

You would also need to update its usage on lines 563 and 590.

Suggested change
- List<String> deletedKeyList = new ArrayList();
+ Set<String> deletedKeySet = new HashSet<>();

@github-actions

This PR has been marked as stale due to 21 days of inactivity. Please comment or remove the stale label to keep it open. Otherwise, it will be automatically closed in 7 days.

@github-actions github-actions bot added the stale label Nov 27, 2025
@github-actions

github-actions bot commented Dec 4, 2025

Thank you for your contribution. This PR is being closed due to inactivity. If needed, feel free to reopen it.

@github-actions github-actions bot closed this Dec 4, 2025