HDDS-8565. Recon memory leak in NSSummary #8823
sumitagrawl
left a comment
@devmadhuu Thanks for working on this. I have left a few comments.
...n/src/main/java/org/apache/hadoop/ozone/recon/spi/impl/ReconNamespaceSummaryManagerImpl.java
hadoop-ozone/recon/src/main/java/org/apache/hadoop/ozone/recon/tasks/NSSummaryTaskWithFSO.java
...p-ozone/recon/src/main/java/org/apache/hadoop/ozone/recon/tasks/NSSummaryTaskWithLegacy.java
...one/recon/src/main/java/org/apache/hadoop/ozone/recon/tasks/NSSummaryTaskDbEventHandler.java
… in NSSummary in Recon.
sumitagrawl
left a comment
LGTM
Hey Devesh! Could you please update the PR description to accurately reflect the actual changes made?
Please wait for clean CI run in fork before opening PR. These tests are failing:
CI is green in fork.
ArafatKhan2198
left a comment
Thanks devesh!
LGTM +1
* master: (730 commits)
  HDDS-13083. Handle cases where block deletion generates tree file before scanner (apache#8565)
  HDDS-12982. Reduce log level for snapshot validation failure (apache#8851)
  HDDS-13396. Documentation: Improve the top-level overview page for new users. (apache#8753)
  HDDS-13176. containerIds table value format change to proto from string (apache#8589)
  HDDS-13449. Incorrect Interrupt Handling for DirectoryDeletingService and KeyDeletingService (apache#8817)
  HDDS-2453. Add Freon tests for S3 MPU Keys (apache#8803)
  HDDS-13237. Container data checksum should contain block IDs. (apache#8773)
  HDDS-13489. Fix SCMBlockdeleting unnecessary iteration in corner case. (apache#8847)
  HDDS-13464. Make ozone.snapshot.filtering.service.interval reconfigurable (apache#8825)
  HDDS-13473. Amend validation for OZONE_OM_SNAPSHOT_DB_MAX_OPEN_FILES (apache#8829)
  HDDS-13435. Add an OzoneManagerAuthorizer interface (apache#8840)
  HDDS-8565. Recon memory leak in NSSummary (apache#8823).
  HDDS-12852. Implement a sliding window counter utility (apache#8498)
  HDDS-12000. Add unit test for RatisContainerSafeModeRule and ECContainerSafeModeRule (apache#8801)
  HDDS-13092. Container scanner should trigger volume scan when marking a container unhealthy (apache#8603)
  HDDS-13070. OM Follower changes to create and place sst files from hardlink file. (apache#8761)
  HDDS-13482. Mark testWriteStateMachineDataIdempotencyWithClosedContainer as flaky
  HDDS-13481. Fix success latency metric in SCM panels of deletion grafana dashboard (apache#8835)
  HDDS-13468. Update default value of ozone.scm.ha.dbtransactionbuffer.flush.interval. (apache#8834)
  HDDS-13410. Control block deletion for each DN from SCM. (apache#8767)
  ...
hadoop-hdds/common/src/main/java/org/apache/hadoop/hdds/scm/container/ContainerReplicaInfo.java
hadoop-ozone/cli-admin/src/main/java/org/apache/hadoop/hdds/scm/cli/container/ReconcileSubcommand.java
hadoop-ozone/cli-admin/src/test/java/org/apache/hadoop/hdds/scm/cli/container/TestReconcileSubcommand.java
What changes were proposed in this pull request?
This PR addresses a memory leak in the NSSummary map. When a large number of files and directories are deleted and orphaned directory links are unlinked from the NSSummary tree, their entries can remain in the map forever. Once the directories and files are completely deleted from Ozone, including from deletedDirTable, their entries should be cleaned up from the NSSummary map as well as from the NSSummary table.
Key Changes Made:
NSSummaryTaskWithFSO.java:
- Extended getTaskTables() to include DELETED_DIR_TABLE for monitoring hard delete operations
- Added handleUpdateOnDeletedDirTable() method to properly handle DELETE operations on deletedDirTable
- For files: When entries are removed from deletedTable (hard delete), we update the parent directory's NSSummary to decrement file count, size, and file size buckets
- For directories: When entries are removed from deletedDirTable (hard delete), we remove the directory's NSSummary entry and update the parent directory to remove the child directory reference
- Database cleanup: Added calls to deleteNSSummary() to ensure entries are properly removed from the database, not just from the in-memory map
- Fixed the OmKeyInfo.Builder NPE by adding all required fields (setFileName, setCreationTime, setModificationTime, etc.)
- Created a comprehensive test that simulates the soft delete → hard delete flow
- The test validates that NSSummary objects are properly cleaned up during hard delete operations
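The hard-delete handling described above can be illustrated with a minimal, self-contained sketch. This is not the Recon code: the class NsSummaryCleanupSketch, its Summary record, the bucketing rule, and the two maps standing in for the in-memory NSSummary map and the NSSummary table are all hypothetical simplifications. It only shows the intended bookkeeping: a file hard delete decrements the parent's counters, and a directory hard delete removes the directory's entry from both stores and unlinks it from its parent.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical, simplified model of NSSummary cleanup on hard delete. */
public class NsSummaryCleanupSketch {

  static final int NUM_BUCKETS = 8; // assumed bucket count, for illustration only

  /** Hypothetical stand-in for one NSSummary record. */
  static final class Summary {
    final long parentId;
    int numFiles;
    long sizeOfFiles;
    final int[] fileSizeBucket = new int[NUM_BUCKETS];
    final Set<Long> childDirs = new HashSet<>();

    Summary(long parentId) {
      this.parentId = parentId;
    }
  }

  // Stand-ins for the in-memory NSSummary map and the persisted NSSummary table.
  final Map<Long, Summary> inMemoryMap = new HashMap<>();
  final Map<Long, Summary> dbTable = new HashMap<>();

  /** Assumed bucketing rule: one bucket per factor of 1024 in file size. */
  static int bucketIndex(long size) {
    if (size <= 0) {
      return 0;
    }
    int log2 = 63 - Long.numberOfLeadingZeros(size);
    return Math.min(NUM_BUCKETS - 1, log2 / 10);
  }

  void addDir(long dirId, long parentId) {
    Summary summary = new Summary(parentId);
    inMemoryMap.put(dirId, summary);
    dbTable.put(dirId, summary);
    Summary parent = inMemoryMap.get(parentId);
    if (parent != null) {
      parent.childDirs.add(dirId);
    }
  }

  void addFile(long parentId, long size) {
    Summary parent = inMemoryMap.get(parentId);
    if (parent != null) {
      parent.numFiles++;
      parent.sizeOfFiles += size;
      parent.fileSizeBucket[bucketIndex(size)]++;
    }
  }

  /** File hard delete: decrement count, size, and size bucket on the parent. */
  void onFileHardDeleted(long parentId, long size) {
    Summary parent = inMemoryMap.get(parentId);
    if (parent == null) {
      return; // parent summary already cleaned up
    }
    parent.numFiles = Math.max(0, parent.numFiles - 1);
    parent.sizeOfFiles = Math.max(0L, parent.sizeOfFiles - size);
    int bucket = bucketIndex(size);
    parent.fileSizeBucket[bucket] = Math.max(0, parent.fileSizeBucket[bucket] - 1);
    dbTable.put(parentId, parent);
  }

  /** Directory hard delete: drop the entry from both stores, unlink from parent. */
  void onDirHardDeleted(long dirId) {
    Summary removed = inMemoryMap.remove(dirId);
    dbTable.remove(dirId);
    if (removed != null) {
      Summary parent = inMemoryMap.get(removed.parentId);
      if (parent != null) {
        parent.childDirs.remove(dirId);
      }
    }
  }

  public static void main(String[] args) {
    NsSummaryCleanupSketch store = new NsSummaryCleanupSketch();
    store.addDir(1L, 0L);      // root-like directory
    store.addDir(2L, 1L);      // child directory
    store.addFile(2L, 2048L);  // one file in the child directory
    store.onFileHardDeleted(2L, 2048L);
    store.onDirHardDeleted(2L);
    System.out.println("entries left in map: " + store.inMemoryMap.size());
    System.out.println("entries left in table: " + store.dbTable.size());
  }
}
```

Removing the entry from both stores in the same step is the point of the sketch: cleaning only the in-memory map would leave a stale row behind in the persisted table.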
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-8565
Tested by running in Docker and by running the existing integration and unit tests.