Skip to content

Conversation

@aryangupta1998
Copy link
Contributor

What changes were proposed in this pull request?

While fetching keys for deletion based on keyLimitPerTask, we should also check that the size of the keys cumulatively does not exceed the Ratis limit (default: 32 MB).

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-13213

How was this patch tested?

Tested Manually.

@kerneltime kerneltime requested a review from Copilot July 8, 2025 19:58
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Enforce a cumulative serialized-size limit on deletion tasks alongside the existing key-count limit.

  • Introduce ratisByteLimit in the KeyManager API and implementations to stop fetching when serialized size crosses the threshold.
  • Update KeyDeletingService and SnapshotDeletingService to compute and pass the ratisByteLimit.
  • Extend tests to configure and use the new ratisLimit parameter in deletion calls.

Reviewed Changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
KeyManager.java Added ratisByteLimit parameter to pending-deletion and rename-fetch methods.
KeyManagerImpl.java Implemented size accumulation and break logic based on ratisByteLimit.
KeyDeletingService.java Initialized ratisByteLimit and supplied it to deletion fetch calls.
SnapshotDeletingService.java Updated calls to fetch deleted keys/renames to include ratisByteLimit (missing initialization).
DirectoryDeletingService.java Refined purge-path submission logic.
TestKeyDeletingService.java Configured and passed ratisLimit in test calls.
TestKeyManagerImpl.java Modified tests to supply ratisLimit to deletion and rename entry methods.
TestKeyPurging.java Added ratisLimit setup for integration test.
MapBackedTableIterator.java Updated test iterator to account for byte-size in Table.newKeyValue.
Comments suppressed due to low confidence (3)

hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/KeyManager.java:160

  • [nitpick] The parameter name ratisLimit differs from ratisByteLimit used elsewhere; consider unifying the parameter name across methods for consistency.
      throws IOException;

hadoop-ozone/ozone-manager/src/test/java/org/apache/hadoop/ozone/om/service/TestKeyDeletingService.java:1064

  • Tests currently validate normal deletion flows but don’t cover the scenario where cumulative serialized size hits the Ratis byte limit and truncates results. Consider adding a test case that creates a set of keys whose total size exceeds ratisLimit to verify the truncation behavior.
          .getKeyBlocksList().size();

hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/service/SnapshotDeletingService.java:180

  • The field ratisByteLimit is not defined or initialized in SnapshotDeletingService, causing compilation errors. Consider adding a class member and initializing it in the constructor similar to KeyDeletingService.
                    null, (kv) -> true, remaining, ratisByteLimit);

if (serializedSize > ratisByteLimit) {
flag = true;
LOG.info(
"Total size of cumulative keys in a cycle crossed 90% ratis limit, serialized size: {}",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be a common occurrence, and this log will flood the logs. I would recommend adding a metric that be charted for measuring progress.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on my testing, this scenario doesn't appear to be very common. In a typical production case, a key might have around 10 blocks. From test results, 20,000 such keys occupy roughly 2 MB (1.8 MB to be exact). Given that 90% of the default Ratis limit is about 29 MB, it would take approximately 300,000 keys in a single iteration to reach that limit — which seems unlikely in most practical scenarios.

Moreover, to successfully delete 300,000 keys in one iteration, the following configurations would need to be tuned:

hdds.scm.block.deletion.per-interval.max (on SCM)

hdds.datanode.block.deleting.limit.per.interval (on Datanodes)

Given the above, it makes sense to downgrade this log message to DEBUG level.

Let me know your thoughts — open to suggestions.

cc @ashishkumar50

Copy link
Contributor

@ashishkumar50 ashishkumar50 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aryangupta1998 Thanks for updating patch, LGTM.

Copy link
Contributor

@sumitagrawl sumitagrawl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ashishkumar50 ashishkumar50 merged commit 918b284 into apache:master Jul 14, 2025
42 checks passed
@swamirishi
Copy link
Contributor

@aryangupta1998 The check applied on KeyDeletingService is wrong to check for ratis request serialized size is wrong. We don’t send OmKeyInfo in the PurgeKeyRequest and send only the rocksdb key from deletedKeyTable. This calculation is unnecessary. We might need this check only for modified version keys but given that we would have only one OmKeyInfo in RepeatedKeyInfo we wouldn’t require any ratis serialized checks in KeyDeletingService.
We should revert this patch since this would have implications on the deletion speed.

errose28 added a commit to errose28/ozone that referenced this pull request Jul 22, 2025
* master: (90 commits)
  HDDS-13308. OM should expose Ratis config for increasing pending write limits (apache#8668)
  HDDS-8903. Add validation for ozone.om.snapshot.db.max.open.files. (apache#8787)
  HDDS-13429. Custom metadata headers with uppercase characters are not supported (apache#8805)
  HDDS-13448. DeleteBlocksCommandHandler thread stop for normal exception (apache#8816)
  HDDS-13346. Intermittent failure in TestCloseContainer#testContainerChecksumForClosedContainer (apache#8771)
  HDDS-13125. Add metrics for monitoring the SST file pruning threads. (apache#8764)
  HDDS-13367. [Docs] User doc for container balancer. (apache#8726)
  HDDS-13200. OM RocksDB Grafana Dashbroad shows no data on all panels (apache#8577)
  HDDS-13428. Recon - Retrigger of build whole NSSummary tree task submission inconsistency. (apache#8793)
  HDDS-13378. [Docs] Add a Production page under Getting Started (apache#8734)
  HDDS-13403. [Docs] Make feature proposal process more visible. (apache#8758)
  HDDS-11797. Remove cyclic dependency between SCMSafeModeManager and SafeModeRules (apache#8782)
  HDDS-13213. KeyDeletingService should limit task size by both key count and serialized size. (apache#8757)
  HDDS-13387. OMSnapshotCreateRequest logs invalid warning about DefaultReplicationConfig (apache#8760)
  HDDS-13405. ozone admin container create runs forever without kinit (apache#8765)
  HDDS-11514. Set optimal default values for delete configurations based on live cluster testing. (apache#8766)
  HDDS-13376. Add server-side limit note to ozone sh snapshot diff --page-size option (apache#8791)
  HDDS-11679. Support multiple S3Gs in MiniOzoneCluster (apache#8733)
  HDDS-13424. Use lsof instead of fuser to find if file is used in AbstractTestChunkManager (apache#8790)
  HDDS-13427. Bump awssdk to 2.31.78 (apache#8792)
  ...
jojochuang pushed a commit to jojochuang/ozone that referenced this pull request Jul 31, 2025
aryangupta1998 pushed a commit to aryangupta1998/hadoop-ozone that referenced this pull request Aug 19, 2025
… key count and serialized size. (apache#8757)"

This reverts commit 918b284.
ashishkumar50 pushed a commit that referenced this pull request Aug 20, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants