@devmadhuu
Contributor

@devmadhuu devmadhuu commented Jul 17, 2025

What changes were proposed in this pull request?

This PR addresses a memory leak in the NSSummary map. When many files and directories are deleted, orphaned directory entries that get unlinked from the NSSummary tree can remain in the map forever. Once the directories and files are completely deleted from Ozone, including from deletedDirTable, their entries should be cleaned up from both the NSSummary map and the NSSummary table.

Key Changes Made:

  1. NSSummaryTaskWithFSO.java:
    - Extended getTaskTables() to include DELETED_DIR_TABLE for monitoring hard delete operations
    - Added handleUpdateOnDeletedDirTable() method to properly handle DELETE operations on deletedDirTable
  2. Memory Leak Fix Logic:
    - For files: When entries are removed from deletedTable (hard delete), we update the parent directory's NSSummary to decrement file count, size, and file size buckets
    - For directories: When entries are removed from deletedDirTable (hard delete), we remove the directory's NSSummary entry and update the parent directory to remove the child directory reference
    - Database cleanup: Added calls to deleteNSSummary() to ensure entries are properly removed from the database, not just from the in-memory map
  3. Test Implementation:
    - Fixed the OmKeyInfo.Builder NPE by adding all required fields (setFileName, setCreationTime, setModificationTime, etc.)
    - Created a comprehensive test that simulates the soft delete → hard delete flow
    - The test validates that NSSummary objects are properly cleaned up during hard delete operations
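
The directory cleanup described above can be illustrated with a minimal, self-contained sketch. It is not the code in this PR: `NsSummaryCleanupSketch`, `Summary`, `addDirectory`, and `handleHardDeleteOfDirectory` are hypothetical names, and two plain `HashMap`s stand in for the in-memory NSSummary map and the persisted NSSummary table. It only shows the idea that a hard delete removes the directory's entry from both places and unlinks it from its parent.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Illustrative stand-in only; these are NOT the real Ozone Recon classes. */
public class NsSummaryCleanupSketch {

  /** Hypothetical, trimmed-down summary holding only what the sketch needs. */
  static final class Summary {
    final long parentId;
    final Set<Long> childDirs = new HashSet<>();

    Summary(long parentId) {
      this.parentId = parentId;
    }
  }

  // Assumption: stand-in for the in-memory NSSummary map.
  private final Map<Long, Summary> nsSummaryMap = new HashMap<>();
  // Assumption: stand-in for the persisted NSSummary table.
  private final Map<Long, Summary> nsSummaryTable = new HashMap<>();

  /** Registers a directory and links it under its parent, if the parent is known. */
  public void addDirectory(long objectId, long parentId) {
    Summary summary = new Summary(parentId);
    nsSummaryMap.put(objectId, summary);
    nsSummaryTable.put(objectId, summary);
    Summary parent = nsSummaryTable.get(parentId);
    if (parent != null) {
      parent.childDirs.add(objectId);
    }
  }

  /** Models a DELETE event on deletedDirTable, i.e. a hard delete of a directory. */
  public void handleHardDeleteOfDirectory(long objectId) {
    // Remove from both the in-memory map and the table so nothing lingers.
    Summary removed = nsSummaryTable.remove(objectId);
    nsSummaryMap.remove(objectId);
    if (removed == null) {
      return; // nothing was tracked for this directory
    }
    // Drop the child reference held by the parent directory.
    Summary parent = nsSummaryTable.get(removed.parentId);
    if (parent != null) {
      parent.childDirs.remove(objectId);
    }
  }

  /** True if the directory is still tracked in either the map or the table. */
  public boolean contains(long objectId) {
    return nsSummaryMap.containsKey(objectId) || nsSummaryTable.containsKey(objectId);
  }

  /** Returns a copy of the child directory IDs recorded for a directory. */
  public Set<Long> childrenOf(long objectId) {
    Summary summary = nsSummaryTable.get(objectId);
    return summary == null ? new HashSet<>() : new HashSet<>(summary.childDirs);
  }

  public static void main(String[] args) {
    NsSummaryCleanupSketch recon = new NsSummaryCleanupSketch();
    recon.addDirectory(1L, 0L); // parent directory
    recon.addDirectory(2L, 1L); // child directory
    recon.handleHardDeleteOfDirectory(2L);
    System.out.println("child still tracked: " + recon.contains(2L));
    System.out.println("parent children: " + recon.childrenOf(1L));
  }
}
```

In this sketch, skipping the `handleHardDeleteOfDirectory` call would leave directory `2` in both maps indefinitely, which is the kind of leak this PR targets.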

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-8565

Tested by running docker and existing integration and unit tests.

Contributor

@sumitagrawl sumitagrawl left a comment

@devmadhuu Thanks for working on this. I have left a few comments.

@devmadhuu devmadhuu requested a review from sumitagrawl July 18, 2025 05:49
@adoroszlai adoroszlai changed the title HDDS-8565. Recon - memory leak in NSSummary in Recon. HDDS-8565. Recon memory leak in NSSummary Jul 18, 2025
@devmadhuu devmadhuu requested a review from sumitagrawl July 18, 2025 14:40
Contributor

@sumitagrawl sumitagrawl left a comment

LGTM

@ArafatKhan2198
Contributor

Hey Devesh!
Thanks for the changes!
Looking at the PR description and the actual code changes, I noticed there's a discrepancy. The implementation only handles the deletedDirTable in the NSSummary tree in Recon, but the PR description mentions operations on both DELETED_TABLE and DELETED_DIR_TABLE.

Could you please update the PR description to accurately reflect the actual changes made?
Thanks!

@adoroszlai adoroszlai marked this pull request as draft July 21, 2025 20:34
@adoroszlai
Contributor

Please wait for a clean CI run in your fork before opening a PR.

These tests are failing:

org.apache.hadoop.ozone.recon.TestReconWithOzoneManagerFSO
org.apache.hadoop.ozone.recon.tasks.TestNSSummaryTaskWithFSO$TestProcess

@devmadhuu devmadhuu marked this pull request as ready for review July 22, 2025 12:44
@devmadhuu
Contributor Author

TestNSSummaryTaskWithFSO

CI is green in fork.

Contributor

@ArafatKhan2198 ArafatKhan2198 left a comment

Thanks devesh!
LGTM +1

@ArafatKhan2198 ArafatKhan2198 merged commit a9c6f7f into apache:master Jul 22, 2025
53 checks passed
errose28 added a commit to errose28/ozone that referenced this pull request Jul 30, 2025
* master: (730 commits)
  HDDS-13083. Handle cases where block deletion generates tree file before scanner (apache#8565)
  HDDS-12982. Reduce log level for snapshot validation failure (apache#8851)
  HDDS-13396. Documentation: Improve the top-level overview page for new users. (apache#8753)
  HDDS-13176. containerIds table value format change to proto from string (apache#8589)
  HDDS-13449. Incorrect Interrupt Handling for DirectoryDeletingService and KeyDeletingService (apache#8817)
  HDDS-2453. Add Freon tests for S3 MPU Keys (apache#8803)
  HDDS-13237. Container data checksum should contain block IDs. (apache#8773)
  HDDS-13489. Fix SCMBlockdeleting unnecessary iteration in corner case. (apache#8847)
  HDDS-13464. Make ozone.snapshot.filtering.service.interval reconfigurable (apache#8825)
  HDDS-13473. Amend validation for OZONE_OM_SNAPSHOT_DB_MAX_OPEN_FILES (apache#8829)
  HDDS-13435. Add an OzoneManagerAuthorizer interface (apache#8840)
  HDDS-8565. Recon memory leak in NSSummary (apache#8823).
  HDDS-12852. Implement a sliding window counter utility (apache#8498)
  HDDS-12000. Add unit test for RatisContainerSafeModeRule and ECContainerSafeModeRule (apache#8801)
  HDDS-13092. Container scanner should trigger volume scan when marking a container unhealthy (apache#8603)
  HDDS-13070. OM Follower changes to create and place sst files from hardlink file. (apache#8761)
  HDDS-13482. Mark testWriteStateMachineDataIdempotencyWithClosedContainer as flaky
  HDDS-13481. Fix success latency metric in SCM panels of deletion grafana dashboard (apache#8835)
  HDDS-13468. Update default value of ozone.scm.ha.dbtransactionbuffer.flush.interval. (apache#8834)
  HDDS-13410. Control block deletion for each DN from SCM. (apache#8767)
  ...

hadoop-hdds/common/src/main/java/org/apache/hadoop/hdds/scm/container/ContainerReplicaInfo.java
hadoop-ozone/cli-admin/src/main/java/org/apache/hadoop/hdds/scm/cli/container/ReconcileSubcommand.java
hadoop-ozone/cli-admin/src/test/java/org/apache/hadoop/hdds/scm/cli/container/TestReconcileSubcommand.java
jojochuang pushed a commit to jojochuang/ozone that referenced this pull request Jul 31, 2025
Gargi-jais11 pushed a commit to Gargi-jais11/ozone that referenced this pull request Aug 19, 2025