
Conversation

@aryangupta1998
Contributor

What changes were proposed in this pull request?

Background deletion configurations were initially chosen based on educated guesses and had not been validated against real-world workloads. I have now tested large-scale deletions involving millions of files and directories on a real cluster. Based on the results, I'm updating the relevant property values to better reflect practical performance and scalability.

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-11514

How was this patch tested?

Tested manually.

@ivandika3
Contributor

ivandika3 commented Jul 10, 2025

@aryangupta1998 Thanks for the patch.

> Based on the results, I'm updating the relevant property values to better reflect practical performance and scalability.

Could you share some of these results? What are the tradeoffs that need to be considered to decide on what is optimal?

@aryangupta1998
Contributor Author

@ivandika3, sharing some results from my testing:

These three properties are interdependent:

  • ozone.key.deleting.limit.per.task (default: 20k)

  • hdds.scm.block.deletion.per-interval.max (default: 100k)

  • hdds.datanode.block.deleting.limit.per.interval (default: 5k)

Test setup: HA cluster with 10 Datanodes
For 1 million keys, I observed:

  • OM took ~50 minutes

  • SCM took ~30 minutes

  • DNs took ~1 hour

Based on my observations:

  • For keys with 10-character names and 10 blocks each, a batch of 20k keys consumes ~1.8 MB when submitted to SCM via Ratis.

  • A batch of 100k blocks in SCM consumes ~1 MB.

Considering the default Ratis log appender size of 32 MB, I tuned the configs as follows:

  • ozone.key.deleting.limit.per.task → 50k

  • hdds.scm.block.deletion.per-interval.max → 500k

  • hdds.datanode.block.deleting.limit.per.interval → 20k

Results:

  • OM took ~20 minutes

  • SCM took ~6 minutes

  • DNs took ~15 minutes

OM’s 50k key batch consumed ~10 MB, and SCM’s batch used ~5 MB — both safely within Ratis limits.
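
As a rough sanity check, extrapolating the measured payload sizes linearly (an approximation) suggests the new limits leave comfortable headroom under the 32 MB Ratis log appender default:

```python
# Back-of-envelope check, not part of the patch: extrapolate the payload
# sizes quoted above to see how much headroom the new limits leave under
# Ratis' default 32 MB log appender size.

RATIS_APPENDER_LIMIT_MB = 32

# Measured points from this thread: OM's 50k-key batch ~10 MB,
# SCM's ~500k-block batch ~5 MB.
om_keys_per_mb = 50_000 / 10.0      # ~5k keys per MB
scm_blocks_per_mb = 500_000 / 5.0   # ~100k blocks per MB

print(f"Keys fitting in one {RATIS_APPENDER_LIMIT_MB} MB request:   ~{int(om_keys_per_mb * RATIS_APPENDER_LIMIT_MB):,}")
print(f"Blocks fitting in one {RATIS_APPENDER_LIMIT_MB} MB request: ~{int(scm_blocks_per_mb * RATIS_APPENDER_LIMIT_MB):,}")
# -> roughly 160,000 keys and 3,200,000 blocks, consistent with the
#    100k / 2500k run below also completing without issues.
```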

I also tested with:

  • ozone.key.deleting.limit.per.task → 100k

  • hdds.scm.block.deletion.per-interval.max → 2500k

  • hdds.datanode.block.deleting.limit.per.interval → 100k

This worked fine too, but I avoided setting such high defaults.
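
For reference, here is a minimal ozone-site.xml sketch showing these properties as explicit overrides. It is illustrative only, since this patch makes the 50k / 500k / 20k values the built-in defaults:

```xml
<!-- Illustrative override of the deletion limits in ozone-site.xml.
     With this patch the values below are already the shipped defaults,
     so setting them explicitly is only needed when tuning further. -->
<property>
  <name>ozone.key.deleting.limit.per.task</name>
  <value>50000</value>
</property>
<property>
  <name>hdds.scm.block.deletion.per-interval.max</name>
  <value>500000</value>
</property>
<property>
  <name>hdds.datanode.block.deleting.limit.per.interval</name>
  <value>20000</value>
</property>
```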

Additionally, I’ve introduced new metrics, added them to Grafana, and created a lightweight dashboard to track deletion progress.
You can find all related work under the umbrella JIRA: HDDS-11506 Improvements for large scale deletion.

@ivandika3 ivandika3 requested a review from xichen01 July 12, 2025 14:01
Contributor

@ivandika3 ivandika3 left a comment

@aryangupta1998 Thanks for sharing the results. The configs look reasonable. LGTM +1.

Contributor

@ashishkumar50 ashishkumar50 left a comment

Thanks @aryangupta1998 for the improvement, LGTM.

@ivandika3 ivandika3 merged commit 3171688 into apache:master Jul 14, 2025
42 checks passed
@ivandika3
Contributor

Thanks @aryangupta1998 for the patch and @ashishkumar50 for the review.

@errose28
Contributor

@aryangupta1998 did you also check the directory deleting service for both deep and wide filesystem trees?

errose28 added a commit to errose28/ozone that referenced this pull request Jul 22, 2025
* master: (90 commits)
  HDDS-13308. OM should expose Ratis config for increasing pending write limits (apache#8668)
  HDDS-8903. Add validation for ozone.om.snapshot.db.max.open.files. (apache#8787)
  HDDS-13429. Custom metadata headers with uppercase characters are not supported (apache#8805)
  HDDS-13448. DeleteBlocksCommandHandler thread stop for normal exception (apache#8816)
  HDDS-13346. Intermittent failure in TestCloseContainer#testContainerChecksumForClosedContainer (apache#8771)
  HDDS-13125. Add metrics for monitoring the SST file pruning threads. (apache#8764)
  HDDS-13367. [Docs] User doc for container balancer. (apache#8726)
  HDDS-13200. OM RocksDB Grafana Dashbroad shows no data on all panels (apache#8577)
  HDDS-13428. Recon - Retrigger of build whole NSSummary tree task submission inconsistency. (apache#8793)
  HDDS-13378. [Docs] Add a Production page under Getting Started (apache#8734)
  HDDS-13403. [Docs] Make feature proposal process more visible. (apache#8758)
  HDDS-11797. Remove cyclic dependency between SCMSafeModeManager and SafeModeRules (apache#8782)
  HDDS-13213. KeyDeletingService should limit task size by both key count and serialized size. (apache#8757)
  HDDS-13387. OMSnapshotCreateRequest logs invalid warning about DefaultReplicationConfig (apache#8760)
  HDDS-13405. ozone admin container create runs forever without kinit (apache#8765)
  HDDS-11514. Set optimal default values for delete configurations based on live cluster testing. (apache#8766)
  HDDS-13376. Add server-side limit note to ozone sh snapshot diff --page-size option (apache#8791)
  HDDS-11679. Support multiple S3Gs in MiniOzoneCluster (apache#8733)
  HDDS-13424. Use lsof instead of fuser to find if file is used in AbstractTestChunkManager (apache#8790)
  HDDS-13427. Bump awssdk to 2.31.78 (apache#8792)
  ...
jojochuang pushed a commit to jojochuang/ozone that referenced this pull request Jul 31, 2025
swamirishi pushed a commit to swamirishi/ozone that referenced this pull request Dec 3, 2025
HDDS-11514. Set optimal default values for delete configurations based on live cluster testing. (apache#8766)

(cherry picked from commit 3171688)

 Conflicts:
	hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/statemachine/DatanodeConfiguration.java

Change-Id: I5e5becdd8f53d25d7325fd6b1081bf2fe81e6b84