Skip to content

Conversation

@swamirishi
Copy link
Contributor

@swamirishi swamirishi commented Jun 3, 2025

What changes were proposed in this pull request?

With implementation of ReclaimableKeyFilter & ReclaimableDirectoryFilter both snapshot directory deep cleaning and AOS DirectoryDeletingService can be unified which will simplify the entire implementation rather than having multiple implementations for DirectoryDeletingService.
Algorithm in play:

  1. The directory deleting service would iterate through all the entries in the deletedDirectoryTable for a store(AOS and snapshot). One thread processing deleted directory entry for a particular store.

  2. This thread would further initialize a DeletedDirectorySupplier which is an implementation for thread safe table iterator(We can eventually use a TableSpliterator after HDDS-12742. Make RDBStoreAbstractIterator Return Reference-Counted KeyValues #8203 gets merged).

  3. Each thread here would be iterating through the deletedDirectoryTable independently and would submit purgeDirectoriesRequests where the logic for directoryPurge is as follows:
    - Check if the deletedDirectory object from deletedDirectoryTable can be reclaimed as in check if the reference for directory object exists in the previous directory.
    - If the directory exists in the previous snapshot then we need not expand the sub files inside the files[Number of files would be a lot generally (number of files >>> number of directories)] and we would just expand all the subdirectories under the directory(this would help in the recursively iterating on the sub files in the next iteration instead of traversing again from root all over). Now we cannot purge the directories as a reference exists we would just move all sub directories present in the AOS directory table and files which can be reclaimed i.e. files which are not present in the previous snapshot's fileTable by iterating through the sub file list based on deleted directory prefix in the AOS file table and move it to the deleted space of the store[Check HDDS-11244. OmPurgeDirectoriesRequest should clean up File and Directory tables of AOS for deleted snapshot directories #8509].
    - If the directory doesn't exist in the previous snapshot we have to blindly remove all entries[sub files & sub directories] inside this directory irrespective whether it can be reclaimed or not and purge the directory if all the entries are going to be moved from AOS to the specific store's deleted space of the store.

  4. Now once all threads have processed if none of the threads have been able to move any entry from the key space to deleted space(All keys already have been reclaimed in the previous run) we can go ahead update the exclusive size of the previous snapshot and the deep clean flag of the snapshot.

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-13034

How was this patch tested?

Existing unit tests and additional assertions added on existing tests.

…s and deleted subFiles

Change-Id: Ic1fc709b3963cde14c2a7fb64b687322a29e642a
Change-Id: I47b24dfc3b5afa3cefbdc85ac7b3e4a9b8c94869
…ectoryFilter & ReclaimableKeyFilter

Change-Id: Iffdda403cba914ddef2b75808cfbef1a72b9a2d3
@jojochuang jojochuang added the snapshot https://issues.apache.org/jira/browse/HDDS-6517 label Jun 3, 2025
swamirishi added 11 commits June 3, 2025 00:13
Change-Id: I11acc3782aadf8393f731adcaa2a436dd9b534ae
…kets and volumes have already been deleted

Change-Id: I16dc9d8f00686320b4e98fa5691420294a7f1e2f
Change-Id: Ie5fd1406bbb8af3a9ba76440dcba9b8d8db14691
Change-Id: I61ef68263ff88daa0e53dfb9d7d8eb62495d226b
Change-Id: I2667c6d12523f4dee7cbcf7c48c93803fe84d3d4
Change-Id: I5ed93af3b5ae794b0cfe4671ec2a851592edcb8c
Change-Id: I4c234b75208d96a146c7498bb6c2c188a270e1b8

# Conflicts:
#	hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/KeyManagerImpl.java
#	hadoop-ozone/ozone-manager/src/test/java/org/apache/hadoop/ozone/om/snapshot/filter/AbstractReclaimableFilterTest.java
Change-Id: Iac3af98a7e568a135073b6704a6ad5a5fac7b427
Change-Id: I9b7b41cf667e03d48120a4201757e445227924f7
Change-Id: I2ff1cf3ecf3baa00a5c5646901f6c9ffdbe6e370
Change-Id: Ibe4244ba9eac58eeb86ff4a5a65bd42b15b2a8ae
Change-Id: I2c74594424fac70e62750815b45daf3780f7d85c
Copy link
Contributor

@jojochuang jojochuang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please also check Gemini's review comments jojochuang#573

*/
public static final String OZONE_SNAPSHOT_DEEP_CLEANING_ENABLED = "ozone.snapshot.deep.cleaning.enabled";
public static final boolean OZONE_SNAPSHOT_DEEP_CLEANING_ENABLED_DEFAULT = false;
@Deprecated
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no longer used.

@Deprecated
public static final String OZONE_SNAPSHOT_DIRECTORY_SERVICE_INTERVAL =
"ozone.snapshot.directory.service.interval";
@Deprecated
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no longer used.

@Deprecated
public static final String OZONE_SNAPSHOT_DIRECTORY_SERVICE_TIMEOUT =
"ozone.snapshot.directory.service.timeout";
@Deprecated
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no longer used.

@Deprecated
public static final String OZONE_SNAPSHOT_DIRECTORY_SERVICE_INTERVAL_DEFAULT
= "24h";
@Deprecated
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no longer used.

<tag>OZONE, PERFORMANCE, OM, DEPRECATED</tag>
<description>
The time interval between successive SnapshotDirectoryCleaningService
DEPRECATED. The time interval between successive SnapshotDirectoryCleaningService
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deprecated because it is replaced by DirectoryDeletingService

Change-Id: I7211fe692fb036d578535cbd06b88db634e228d3

# Conflicts:
#	hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/service/DirectoryDeletingService.java
Change-Id: I2b38ebc6bfeaba0935d568864605efc53e9eb222
@swamirishi swamirishi requested a review from jojochuang June 6, 2025 18:25
@smengcl smengcl changed the title HDDS-13034. Refactor Directory Deleting Service to use ReclaimableDirectoryFilter & ReclaimableKeyFilter HDDS-13034. Refactor DirectoryDeletingService to use ReclaimableDirFilter and ReclaimableKeyFilter Jun 7, 2025
Change-Id: I76b3d086daf4c2b90af2ef5df0e53542e164c52b
Copy link
Contributor

@smengcl smengcl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm. Thanks @swamirishi for working on this!

pls make sure the CI is green before merging

@smengcl smengcl marked this pull request as ready for review June 8, 2025 03:29
Change-Id: I218dfbf383800d2097df47141ee909fc45f8cfa7
@swamirishi swamirishi merged commit 5081ba2 into apache:master Jun 9, 2025
42 checks passed
@swamirishi
Copy link
Contributor Author

Thank you for reviewing the PR @smengcl & @jojochuang .

aswinshakil added a commit to aswinshakil/ozone that referenced this pull request Jun 9, 2025
…239-container-reconciliation

Commits: 80 commits
5e273a4 HDDS-12977. Fail build on dependency problems (apache#8574)
5081ba2 HDDS-13034. Refactor DirectoryDeletingService to use ReclaimableDirFilter and ReclaimableKeyFilter (apache#8546)
e936e4d HDDS-12134. Implement Snapshot Cache lock for OM Bootstrap (apache#8474)
31d13de HDDS-13165. [Docs] Python client developer guide. (apache#8556)
9e6955e HDDS-13205. Bump common-custom-user-data-maven-extension to 2.0.3 (apache#8581)
750b629 HDDS-13203. Bump Bouncy Castle to 1.81 (apache#8580)
ba5177e HDDS-13202. Bump build-helper-maven-plugin to 3.6.1 (apache#8579)
07ee5dd HDDS-13204. Bump awssdk to 2.31.59 (apache#8582)
e1964f2 HDDS-13201. Bump jersey2 to 2.47 (apache#8578)
81295a5 HDDS-13013. [Snapshot] Add metrics and tests for snapshot operations. (apache#8436)
b3d75ab HDDS-12976. Clean up unused dependencies (apache#8521)
e0f08b2 HDDS-13179. rename-generated-config fails on re-compile without clean (apache#8569)
f388317 HDDS-12554. Support callback on completed reconfiguration (apache#8391)
c13a3fe HDDS-13154 Link more Grafana dashboard json files to the Observability user doc (apache#8533)
2a761f7 HDDS-11967. [Docs]DistCP Integration in Kerberized environment. (apache#8531)
81fc4c4 HDDS-12550. Use DatanodeID instead of UUID in NodeManager CommandQueue. (apache#8560)
2360af4 HDDS-13169. Intermittent failure in testSnapshotOperationsNotBlockedDuringCompaction (apache#8553)
f19789d HDDS-13170. Reclaimable filter should always reclaim entries when buckets and volumes have already been deleted (apache#8551)
315ef20 HDDS-13175. Leftover reference to OM-specific trash implementation (apache#8563)
902e715 HDDS-13159. Refactor KeyManagerImpl for getting deleted subdirectories and deleted subFiles (apache#8538)
46a93d0 HDDS-12817. Addendum rename ecIndex to replicaIndex in chunkinfo output (apache#8552)
19b9b9c HDDS-13166. Set pipeline ID in BlockExistenceVerifier to avoid cached pipeline with different node (apache#8549)
b3ff67c HDDS-13068. Validate Container Balancer move timeout and replication timeout configs (apache#8490)
7a7b9a8 HDDS-13139. Introduce bucket layout flag in freon rk command (apache#8539)
3c25e7d HDDS-12595. Add verifier for container replica states (apache#8422)
6d59220 HDDS-13104. Move auditparser acceptance test under debug (apache#8527)
8e8c432 HDDS-13071. Documentation for Container Replica Debugger Tool (apache#8485)
0e8c8d4 HDDS-13158. Bump junit to 5.13.0 (apache#8537)
8e552b4 HDDS-13157. Bump exec-maven-plugin to 3.5.1 (apache#8534)
168f690 HDDS-13155. Bump jline to 3.30.4 (apache#8535)
cc1e4d1 HDDS-13156. Bump awssdk to 2.31.54 (apache#8536)
3bfb7af HDDS-13136. KeyDeleting Service should not run for already deep cleaned snapshots (apache#8525)
006e691 HDDS-12503. Compact snapshot DB before evicting a snapshot out of cache (apache#8141)
568b228 HDDS-13067. Container Balancer delete commands should not be sent with an expiration time in the past (apache#8491)
53673c5 HDDS-11244. OmPurgeDirectoriesRequest should clean up File and Directory tables of AOS for deleted snapshot directories (apache#8509)
07f4868 HDDS-13099. ozone admin datanode list ignores --json flag when --id filter is used (apache#8500)
08c0ab8 HDDS-13075. Fix default value in description of container placement policy configs (apache#8511)
58c87a8 HDDS-12177. Set runtime scope where missing (apache#8513)
10c470d HDDS-12817. Add EC block index in the ozone debug replicas chunk-info (apache#8515)
7027ab7 HDDS-13124. Respect config hdds.datanode.use.datanode.hostname when reading from datanode (apache#8518)
b8b226c HDDS-12928. datanode min free space configuration (apache#8388)
fd3d70c HDDS-13026. KeyDeletingService should also delete RenameEntries (apache#8447)
4c1c6cf HDDS-12714. Create acceptance test framework for debug and repair tools (apache#8510)
fff80fc HDDS-13118. Remove duplicate mockito-core dependency from hdds-test-utils (apache#8508)
10d5555 HDDS-13115. Bump awssdk to 2.31.50 (apache#8505)
360d139 HDDS-13017. Fix warnings due to non-test scoped test dependencies (apache#8479)
1db1cca HDDS-13116. Bump jline to 3.30.3 (apache#8504)
322ca93 HDDS-13025. Refactor KeyDeletingService to use ReclaimableKeyFilter (apache#8450)
988b447 HDDS-5287. Document S3 ACL classes (apache#8501)
64bb29d HDDS-12777. Use module-specific name for generated config files (apache#8475)
54ed115 HDDS-9210. Update snapshot chain restore test to incorporate snapshot delete. (apache#8484)
87dfa5a HDDS-13014. Improve PrometheusMetricsSink#normalizeName performance (apache#8438)
7cdc865 HDDS-13100. ozone admin datanode list --json should output a newline at the end (apache#8499)
9cc4194 HDDS-13089. [snapshot] Add an integration test to verify snapshotted data can be read by S3 SDK client (apache#8495)
cb9867b HDDS-13065. Refactor SnapshotCache to return AutoCloseSupplier instead of ReferenceCounted (apache#8473)
a88ff71 HDDS-10979. Support STANDARD_IA S3 storage class to accept EC replication config (apache#8399)
6ec8f85 HDDS-13080. Improve delete metrics to show number of timeout DN command from SCM (apache#8497)
3bb8858 HDDS-12378. Change default hdds.scm.safemode.min.datanode to 3 (apache#8331)
0171bef HDDS-13073. Set pipeline ID in checksums verifier to avoid cached pipeline with different node (apache#8480)
5c7726a HDDS-11539. OzoneClientCache `@PreDestroy` is never called (apache#8493)
a8ed19b HDDS-13031. Implement a Flat Lock resource in OzoneManagerLock (apache#8446)
e9e8b30 HDDS-12935. Support unsigned chunked upload with STREAMING-UNSIGNED-PAYLOAD-TRAILER (apache#8366)
7590268 HDDS-13079. Improve logging in DN for delete operation. (apache#8489)
435fe7e HDDS-12870. Fix listObjects corner cases (apache#8307)
eb5dabd HDDS-12926. Remove *.tmp.* exclusion in DU (apache#8486)
eeb98c7 HDDS-13030. Snapshot Purge should unset deep cleaning flag for next 2 snapshots in the chain (apache#8451)
6bf121e HDDS-13032. Support proper S3OwnerId representation (apache#8478)
5d1b43d HDDS-13076. Refactor OzoneManagerLock class to rename Resource class to LeveledResource (apache#8482)
bafe6d9 HDDS-13064. [snapshot] Add test coverage for SnapshotUtils.isBlockLocationInfoSame() (apache#8476)
7035846 HDDS-13040. Add user doc highlighting the difference between Ozone ACL and S3 ACL. (apache#8457)
1825cdf HDDS-13049. Deprecate VolumeName & BucketName in OmKeyPurgeRequest and prevent Key version purge on Block Deletion Failure (apache#8463)
211c76c HDDS-13060. Change NodeManager.addDatanodeCommand(..) to use DatanodeID (apache#8471)
f410238 HDDS-13061. Add test for key ACL operations without permission (apache#8472)
d1a2f48 HDDS-13057. Increment block delete processed transaction counts regardless of log level (apache#8466)
0cc6fcc HDDS-13043. Replace != with assertNotEquals in TestSCMContainerPlacementRackAware (apache#8470)
e1c779a HDDS-13051. Use DatanodeID in server-scm. (apache#8465)
35e1126 HDDS-13042. [snapshot] Add future proofing test cases for unsupported file system API (apache#8458)
619c05d HDDS-13008. Exclude same SST files when calculating full snapdiff (apache#8423)
21b49d3 HDDS-12965. Fix warnings about "used undeclared" dependencies (apache#8468)
8136119 HDDS-13048. Create new module for Recon integration tests (apache#8464)

Conflicts:
	hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/node/NodeManager.java
jojochuang pushed a commit to jojochuang/ozone that referenced this pull request Jun 13, 2025
sadanand48 pushed a commit to sadanand48/hadoop-ozone that referenced this pull request Jun 16, 2025
swamirishi added a commit to swamirishi/ozone that referenced this pull request Dec 3, 2025
…aimableDirFilter and ReclaimableKeyFilter (apache#8546)

(cherry picked from commit 5081ba2)

 Conflicts:
	hadoop-hdds/common/src/main/resources/ozone-default.xml
	hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/OMConfigKeys.java
	hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/fs/ozone/TestDirectoryDeletingServiceWithFSO.java
	hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/om/snapshot/TestSnapshotDeletingServiceIntegrationTest.java
	hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/om/snapshot/TestSnapshotDirectoryCleaningService.java
	hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/KeyManagerImpl.java
	hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/service/AbstractKeyDeletingService.java
	hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/service/DirectoryDeletingService.java
	hadoop-ozone/ozone-manager/src/test/java/org/apache/hadoop/ozone/om/service/TestKeyDeletingService.java

Change-Id: Id17ecbc2b289dc159bd06a1c9a59281ac57432a8
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

snapshot https://issues.apache.org/jira/browse/HDDS-6517

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants