HDDS-12503. Compact snapshot DB before evicting a snapshot out of cache #8141

peterxcli · 2025-03-23T16:17:28Z

What changes were proposed in this pull request?

Snapshot Cache should compact the rocksdb for all tables excepting for FileTable/DirectoryTable/KeyTable

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-12503

How was this patch tested?

Tejaskriya

Changes LGTM overall. Just a small suggestion, as a compactTable method has been added to the DBStore and RDBStore, can you please update the CompactDBService#compactAsync() to use this method as well?

peterxcli · 2025-03-25T04:21:57Z

Sure, but maybe in another Jira?

Tejaskriya · 2025-03-27T05:44:26Z

@swamirishi what is the motivation behind triggering this compaction for snapshots? If it is performance related, then we might need to force bottommost compaction.

jojochuang · 2025-04-22T23:08:06Z

The PR looks really good to me. I wonder if compacting to the bottom most is required, because these are column families in the snapshots, so supposedly no change. Compacting to the bottom most for the first time a snapshot get evicted out of cache may make sense, but subsequent compactions for the same snapshot wouldn't make a difference.

Or maybe we should compact column families not tracked in the DAG when a snapshot is created? Do it in the background asynchronously so snapshot create request returns immediately.

jojochuang · 2025-04-30T03:41:41Z

@smengcl maybe you can help review this one? Specifically can we "compact column families not tracked in the DAG when a snapshot is created? Do it in the background asynchronously so snapshot create request returns immediately."

…-snapshot-db-before-evicting-a-snapshot-out-of-cache

smengcl · 2025-05-01T07:24:58Z

Thanks for the patch @peterxcli .

May I ask what is the motivation behind this? Is this required for correctness, or just an optimization?

cc @hemantk-12 @swamirishi

smengcl · 2025-05-01T08:24:46Z

hadoop-hdds/framework/src/test/java/org/apache/hadoop/hdds/utils/db/TestRDBStore.java

+  public void compactTable() throws Exception {
+    assertNotNull(rdbStore, "DBStore cannot be null");
+
+    for (int j = 0; j <= 20; j++) {


I have an interesting observation here.

When I play with the number of insertions. I noticed that metaSizeBeforeCompact always fluctuates between 2 to 5.

It looks like once it is about to reach 6, it does some compaction before the new insertion.

smengcl

lgtm implementation-wise.

swamirishi

Thanks for the patch @peterxcli overall the implementation looks good. But we need to make the compactTable synchronous or wait for all compactions before closing the db. By default manual compactions are asynchronous. This would work for AOS rocksdb but it won't work for snapshots.

swamirishi · 2025-05-01T12:35:09Z

hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/utils/db/RDBStore.java

+  @Override
+  public void compactTable(String tableName) throws IOException {
+    try (ManagedCompactRangeOptions options = new ManagedCompactRangeOptions()) {
+      compactTable(tableName, options);


This is not synchronous. Even if we call this function from snapshot cache to compact db, it wouldn't be helpful since the compaction would be cancelled. We would have to wait for all the compactions to finish before we close it.

Consider ManagedCompactRangeOptions.setExclusiveManualCompaction(true)

https://github.com/facebook/rocksdb/wiki/Manual-Compaction

Consider ManagedCompactRangeOptions.setExclusiveManualCompaction(true)

https://github.com/facebook/rocksdb/wiki/Manual-Compaction

This seems default to true.
https://github.com/facebook/rocksdb/wiki/Manual-Compaction#compactrange

CompactRangeOptions::exclusive_manual_compaction When set to true, no other compaction will run when this manual compaction is running. Default value is true

hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/snapshot/SnapshotCache.java

jojochuang · 2025-05-01T14:58:05Z

Also when we discussed about this PR, @swamirishi wanted to have a configuration to turn off this behavior, since we're not sure if it's going to help with performance or adding extra overhead.

… cache

…-snapshot-db-before-evicting-a-snapshot-out-of-cache

…ault.xml

…-snapshot-db-before-evicting-a-snapshot-out-of-cache

peterxcli · 2025-05-14T18:48:06Z

@swamirishi would you like to take another look? TIA!

swamirishi

@peterxcli Left some more comments inline

swamirishi · 2025-05-17T00:26:45Z

hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/utils/db/RDBStore.java

+    if (columnFamily == null) {
+      throw new IOException("Table not found: " + tableName);
+    }
+    db.compactRange(columnFamily, null, null, options);


We should set options.setMaxCompactionBytes() when we open the snapshot rocksdb this could be useful so that we ensure one sub compaction doesn't take up a lot of memory. Also look into making ManagedCompactRangeOptions.setMaxSubCompactions() configurable so that we don't use a lot of CPU for this operation. It is ok if the compactions take time.

Created one to track: https://issues.apache.org/jira/browse/HDDS-13114

swamirishi · 2025-05-17T00:30:17Z

hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/snapshot/SnapshotCache.java

            LOG.debug("Closing SnapshotId {}. It is not being referenced anymore.", k);
            // Close the instance, which also closes its DB handle.
            try {
+              compactSnapshotDB(v.get());


We should perform this outside dbMap.compute since this can block snapshot reads. dbMap.compute is blocking. We can just move this above line number 235 same level as the dbMap.compute()

@jojochuang @smengcl This can block other read operations. Let us perform the snapshot compaction while we still allow reads to go through. Compactions shouldn't block active snapshot operations both background services and active user reads. @peterxcli please write a test case for this as well so that we don't inadvertently block any snapshot operations.

Done. please take another look, thanks!

…-snapshot-db-before-evicting-a-snapshot-out-of-cache

…other thread that reading same key

peterxcli · 2025-05-31T03:24:25Z

Verified build on my local.

gh pr checkout 8141
git merge --no-commit upstream/master
mvn -DskipRecon -DskipShade -DskipTests clean verify
hadoop-ozone/dev-support/checks/checkstyle.sh
hadoop-ozone/dev-support/checks/pmd.sh

peterxcli · 2025-05-31T03:25:48Z

Thanks @Tejaskriya, @jojochuang, @smengcl and @smengcl for the review!

…239-container-reconciliation Commits: 80 commits 5e273a4 HDDS-12977. Fail build on dependency problems (apache#8574) 5081ba2 HDDS-13034. Refactor DirectoryDeletingService to use ReclaimableDirFilter and ReclaimableKeyFilter (apache#8546) e936e4d HDDS-12134. Implement Snapshot Cache lock for OM Bootstrap (apache#8474) 31d13de HDDS-13165. [Docs] Python client developer guide. (apache#8556) 9e6955e HDDS-13205. Bump common-custom-user-data-maven-extension to 2.0.3 (apache#8581) 750b629 HDDS-13203. Bump Bouncy Castle to 1.81 (apache#8580) ba5177e HDDS-13202. Bump build-helper-maven-plugin to 3.6.1 (apache#8579) 07ee5dd HDDS-13204. Bump awssdk to 2.31.59 (apache#8582) e1964f2 HDDS-13201. Bump jersey2 to 2.47 (apache#8578) 81295a5 HDDS-13013. [Snapshot] Add metrics and tests for snapshot operations. (apache#8436) b3d75ab HDDS-12976. Clean up unused dependencies (apache#8521) e0f08b2 HDDS-13179. rename-generated-config fails on re-compile without clean (apache#8569) f388317 HDDS-12554. Support callback on completed reconfiguration (apache#8391) c13a3fe HDDS-13154 Link more Grafana dashboard json files to the Observability user doc (apache#8533) 2a761f7 HDDS-11967. [Docs]DistCP Integration in Kerberized environment. (apache#8531) 81fc4c4 HDDS-12550. Use DatanodeID instead of UUID in NodeManager CommandQueue. (apache#8560) 2360af4 HDDS-13169. Intermittent failure in testSnapshotOperationsNotBlockedDuringCompaction (apache#8553) f19789d HDDS-13170. Reclaimable filter should always reclaim entries when buckets and volumes have already been deleted (apache#8551) 315ef20 HDDS-13175. Leftover reference to OM-specific trash implementation (apache#8563) 902e715 HDDS-13159. Refactor KeyManagerImpl for getting deleted subdirectories and deleted subFiles (apache#8538) 46a93d0 HDDS-12817. Addendum rename ecIndex to replicaIndex in chunkinfo output (apache#8552) 19b9b9c HDDS-13166. Set pipeline ID in BlockExistenceVerifier to avoid cached pipeline with different node (apache#8549) b3ff67c HDDS-13068. Validate Container Balancer move timeout and replication timeout configs (apache#8490) 7a7b9a8 HDDS-13139. Introduce bucket layout flag in freon rk command (apache#8539) 3c25e7d HDDS-12595. Add verifier for container replica states (apache#8422) 6d59220 HDDS-13104. Move auditparser acceptance test under debug (apache#8527) 8e8c432 HDDS-13071. Documentation for Container Replica Debugger Tool (apache#8485) 0e8c8d4 HDDS-13158. Bump junit to 5.13.0 (apache#8537) 8e552b4 HDDS-13157. Bump exec-maven-plugin to 3.5.1 (apache#8534) 168f690 HDDS-13155. Bump jline to 3.30.4 (apache#8535) cc1e4d1 HDDS-13156. Bump awssdk to 2.31.54 (apache#8536) 3bfb7af HDDS-13136. KeyDeleting Service should not run for already deep cleaned snapshots (apache#8525) 006e691 HDDS-12503. Compact snapshot DB before evicting a snapshot out of cache (apache#8141) 568b228 HDDS-13067. Container Balancer delete commands should not be sent with an expiration time in the past (apache#8491) 53673c5 HDDS-11244. OmPurgeDirectoriesRequest should clean up File and Directory tables of AOS for deleted snapshot directories (apache#8509) 07f4868 HDDS-13099. ozone admin datanode list ignores --json flag when --id filter is used (apache#8500) 08c0ab8 HDDS-13075. Fix default value in description of container placement policy configs (apache#8511) 58c87a8 HDDS-12177. Set runtime scope where missing (apache#8513) 10c470d HDDS-12817. Add EC block index in the ozone debug replicas chunk-info (apache#8515) 7027ab7 HDDS-13124. Respect config hdds.datanode.use.datanode.hostname when reading from datanode (apache#8518) b8b226c HDDS-12928. datanode min free space configuration (apache#8388) fd3d70c HDDS-13026. KeyDeletingService should also delete RenameEntries (apache#8447) 4c1c6cf HDDS-12714. Create acceptance test framework for debug and repair tools (apache#8510) fff80fc HDDS-13118. Remove duplicate mockito-core dependency from hdds-test-utils (apache#8508) 10d5555 HDDS-13115. Bump awssdk to 2.31.50 (apache#8505) 360d139 HDDS-13017. Fix warnings due to non-test scoped test dependencies (apache#8479) 1db1cca HDDS-13116. Bump jline to 3.30.3 (apache#8504) 322ca93 HDDS-13025. Refactor KeyDeletingService to use ReclaimableKeyFilter (apache#8450) 988b447 HDDS-5287. Document S3 ACL classes (apache#8501) 64bb29d HDDS-12777. Use module-specific name for generated config files (apache#8475) 54ed115 HDDS-9210. Update snapshot chain restore test to incorporate snapshot delete. (apache#8484) 87dfa5a HDDS-13014. Improve PrometheusMetricsSink#normalizeName performance (apache#8438) 7cdc865 HDDS-13100. ozone admin datanode list --json should output a newline at the end (apache#8499) 9cc4194 HDDS-13089. [snapshot] Add an integration test to verify snapshotted data can be read by S3 SDK client (apache#8495) cb9867b HDDS-13065. Refactor SnapshotCache to return AutoCloseSupplier instead of ReferenceCounted (apache#8473) a88ff71 HDDS-10979. Support STANDARD_IA S3 storage class to accept EC replication config (apache#8399) 6ec8f85 HDDS-13080. Improve delete metrics to show number of timeout DN command from SCM (apache#8497) 3bb8858 HDDS-12378. Change default hdds.scm.safemode.min.datanode to 3 (apache#8331) 0171bef HDDS-13073. Set pipeline ID in checksums verifier to avoid cached pipeline with different node (apache#8480) 5c7726a HDDS-11539. OzoneClientCache `@PreDestroy` is never called (apache#8493) a8ed19b HDDS-13031. Implement a Flat Lock resource in OzoneManagerLock (apache#8446) e9e8b30 HDDS-12935. Support unsigned chunked upload with STREAMING-UNSIGNED-PAYLOAD-TRAILER (apache#8366) 7590268 HDDS-13079. Improve logging in DN for delete operation. (apache#8489) 435fe7e HDDS-12870. Fix listObjects corner cases (apache#8307) eb5dabd HDDS-12926. Remove *.tmp.* exclusion in DU (apache#8486) eeb98c7 HDDS-13030. Snapshot Purge should unset deep cleaning flag for next 2 snapshots in the chain (apache#8451) 6bf121e HDDS-13032. Support proper S3OwnerId representation (apache#8478) 5d1b43d HDDS-13076. Refactor OzoneManagerLock class to rename Resource class to LeveledResource (apache#8482) bafe6d9 HDDS-13064. [snapshot] Add test coverage for SnapshotUtils.isBlockLocationInfoSame() (apache#8476) 7035846 HDDS-13040. Add user doc highlighting the difference between Ozone ACL and S3 ACL. (apache#8457) 1825cdf HDDS-13049. Deprecate VolumeName & BucketName in OmKeyPurgeRequest and prevent Key version purge on Block Deletion Failure (apache#8463) 211c76c HDDS-13060. Change NodeManager.addDatanodeCommand(..) to use DatanodeID (apache#8471) f410238 HDDS-13061. Add test for key ACL operations without permission (apache#8472) d1a2f48 HDDS-13057. Increment block delete processed transaction counts regardless of log level (apache#8466) 0cc6fcc HDDS-13043. Replace != with assertNotEquals in TestSCMContainerPlacementRackAware (apache#8470) e1c779a HDDS-13051. Use DatanodeID in server-scm. (apache#8465) 35e1126 HDDS-13042. [snapshot] Add future proofing test cases for unsupported file system API (apache#8458) 619c05d HDDS-13008. Exclude same SST files when calculating full snapdiff (apache#8423) 21b49d3 HDDS-12965. Fix warnings about "used undeclared" dependencies (apache#8468) 8136119 HDDS-13048. Create new module for Recon integration tests (apache#8464) Conflicts: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/node/NodeManager.java

…t out of cache (apache#8141) (cherry picked from commit 006e691) Conflicts: hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/utils/db/DBStore.java hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/OMConfigKeys.java hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/snapshot/SnapshotCache.java hadoop-ozone/ozone-manager/src/test/java/org/apache/hadoop/ozone/om/snapshot/TestSnapshotCache.java hadoop-ozone/ozone-manager/src/test/java/org/apache/hadoop/ozone/om/snapshot/TestSnapshotDiffManager.java Change-Id: Ice4365b4346e03efb1a4419ebf2406064a9e625c

HDDS-12503. Compact snapshot DB before evicting a snapshot out of cache

0294847

ivandika3 added the snapshot https://issues.apache.org/jira/browse/HDDS-6517 label Mar 24, 2025

ivandika3 requested a review from swamirishi March 24, 2025 01:49

Tejaskriya reviewed Mar 25, 2025

View reviewed changes

peterxcli marked this pull request as draft April 11, 2025 13:50

peterxcli marked this pull request as ready for review April 23, 2025 02:49

jojochuang requested a review from smengcl April 30, 2025 03:40

Merge remote-tracking branch 'upstream/master' into hdds12503-compact…

ec156b2

…-snapshot-db-before-evicting-a-snapshot-out-of-cache

smengcl reviewed May 1, 2025

View reviewed changes

smengcl approved these changes May 1, 2025

View reviewed changes

swamirishi requested changes May 1, 2025

View reviewed changes

swamirishi reviewed May 1, 2025

View reviewed changes

hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/snapshot/SnapshotCache.java Outdated Show resolved Hide resolved

peterxcli added 3 commits May 11, 2025 09:42

Add config to enable/disable compaction during snapshot eviction from…

c4e1e76

… cache

Merge remote-tracking branch 'upstream/master' into hdds12503-compact…

711988e

…-snapshot-db-before-evicting-a-snapshot-out-of-cache

Add ozone.om.snapshot.compact.non.snapshot.diff.tables in ozone-def…

1b47e0e

…ault.xml

peterxcli requested a review from jojochuang May 11, 2025 19:00

jojochuang approved these changes May 12, 2025

View reviewed changes

peterxcli requested a review from swamirishi May 13, 2025 03:25

Merge remote-tracking branch 'upstream/master' into hdds12503-compact…

d9020de

…-snapshot-db-before-evicting-a-snapshot-out-of-cache

swamirishi requested changes May 17, 2025

View reviewed changes

peterxcli added 2 commits May 24, 2025 01:52

Merge remote-tracking branch 'upstream/master' into hdds12503-compact…

6aad265

…-snapshot-db-before-evicting-a-snapshot-out-of-cache

move compact snapshot db out of map.compute to prevent from blocking …

9fea1b2

…other thread that reading same key

peterxcli requested a review from swamirishi May 24, 2025 06:32

jojochuang approved these changes May 30, 2025

View reviewed changes

peterxcli merged commit 006e691 into apache:master May 31, 2025
43 checks passed

This was referenced Jun 3, 2025

HDDS-13169. Intermittent failure in testSnapshotOperationsNotBlockedDuringCompaction #8553

Merged

HDDS-13114. Make MaxSubCompactions configureable for manual compaction #8572

Closed

HDDS-12503. Compact snapshot DB before evicting a snapshot out of cache #8141

HDDS-12503. Compact snapshot DB before evicting a snapshot out of cache #8141

Uh oh!

Conversation

peterxcli commented Mar 23, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What changes were proposed in this pull request?

What is the link to the Apache JIRA

How was this patch tested?

Uh oh!

Tejaskriya left a comment

Choose a reason for hiding this comment

Uh oh!

peterxcli commented Mar 25, 2025

Uh oh!

Tejaskriya commented Mar 27, 2025

Uh oh!

jojochuang commented Apr 22, 2025

Uh oh!

jojochuang commented Apr 30, 2025

Uh oh!

smengcl commented May 1, 2025

Uh oh!

Choose a reason for hiding this comment

Uh oh!

smengcl left a comment

Choose a reason for hiding this comment

Uh oh!

swamirishi left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

jojochuang commented May 1, 2025

Uh oh!

peterxcli commented May 14, 2025

Uh oh!

swamirishi left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

swamirishi May 17, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

peterxcli commented May 31, 2025

Uh oh!

Uh oh!

peterxcli commented May 31, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

peterxcli commented Mar 23, 2025 •

edited

Loading

swamirishi May 17, 2025 •

edited

Loading