HDDS-13639. Optimize container iterator for frequent operation #9147

Merged
sumitagrawl merged 4 commits into apache:master from sarvekshayr:HDDS-13639
Oct 24, 2025

Conversation

@sarvekshayr (Contributor)

What changes were proposed in this pull request?

In a datanode environment with 900k containers, frequent operations like ContainerController.getContainerCount(HddsVolume) and iterating through containers per volume were extremely slow. Looping 1k times took 2.5 minutes.

Added two new data structures to ContainerSet:

  • volumeToContainersMap: ConcurrentHashMap<String, ConcurrentSkipListSet>
    Maps volume UUID -> sorted set of container IDs on that volume
    Updated on every addContainer() and removeContainer().

  • volumeContainerCountCache: ConcurrentHashMap<String, AtomicLong>
    Caches the count of containers per volume
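As a rough illustration of how these two structures work together, here is a minimal sketch; the class and method names below are hypothetical and simplified, not the actual ContainerSet code from the patch:

```java
import java.util.Collections;
import java.util.NavigableSet;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentSkipListSet;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of the per-volume index described above.
class VolumeContainerIndex {
  // volume UUID -> sorted set of container IDs on that volume
  private final ConcurrentHashMap<String, ConcurrentSkipListSet<Long>>
      volumeToContainersMap = new ConcurrentHashMap<>();
  // volume UUID -> cached count of containers on that volume
  private final ConcurrentHashMap<String, AtomicLong>
      volumeContainerCountCache = new ConcurrentHashMap<>();

  void addContainer(String volumeUuid, long containerId) {
    // Both structures are updated on every add.
    if (volumeToContainersMap
        .computeIfAbsent(volumeUuid, k -> new ConcurrentSkipListSet<>())
        .add(containerId)) {
      volumeContainerCountCache
          .computeIfAbsent(volumeUuid, k -> new AtomicLong())
          .incrementAndGet();
    }
  }

  void removeContainer(String volumeUuid, long containerId) {
    Set<Long> ids = volumeToContainersMap.get(volumeUuid);
    if (ids != null && ids.remove(containerId)) {
      volumeContainerCountCache.get(volumeUuid).decrementAndGet();
    }
  }

  long getContainerCount(String volumeUuid) {
    // Constant time: read the cached count instead of scanning everything.
    AtomicLong count = volumeContainerCountCache.get(volumeUuid);
    return count == null ? 0 : count.get();
  }

  NavigableSet<Long> getContainerIds(String volumeUuid) {
    // Per-volume iteration touches only this volume's IDs, in sorted order.
    NavigableSet<Long> ids = volumeToContainersMap.get(volumeUuid);
    return ids == null ? Collections.emptyNavigableSet() : ids;
  }
}
```

With this index, a per-volume count or iteration no longer needs to scan the full container set.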

What is the link to the Apache JIRA?

https://issues.apache.org/jira/browse/HDDS-13639

How was this patch tested?

Added testContainerCountPerVolume and testContainerIteratorPerVolume in TestContainerSet.

@priyeshkaratha (Contributor)

priyeshkaratha commented Oct 15, 2025

Hi @sarvekshayr, thanks for working on this. The overall code looks good to me.
I have a few suggestions:
I noticed that two new maps have been introduced; the first one's size will be proportional to the number of containers. It might be helpful to include a note in the PR about the expected memory overhead. For example, with around 900k containers and 50 volumes, the additional heap usage could be roughly 50-70 MB.

From my understanding, since the computation primarily happens during container additions, the performance impact should be minimal. However, it would be good to document these details in the PR description, covering both the heap memory footprint and performance considerations.

@sumitagrawl what are your thoughts on this?

@sumitagrawl (Contributor) left a comment

@sarvekshayr We need to change the implementation to keep only the containerId in HddsVolume itself, populated and managed via ContainerSet.

@sumitagrawl (Contributor)

@priyeshkaratha Good point on considering the memory footprint.
We can use ConcurrentHashSet, whose memory footprint is 56 bytes for every Long element added, so the overall overhead is 900k * 56 bytes in this example => 50.4 MB.

The performance impact applies in the following cases:

  1. publishing the container count metric for a volume
  2. iterating over the containers of a volume
  3. getting the used bytes of a volume

Previously, getting any volume-specific information required iterating over all 900k containers, even though a single volume holds only about 10k of them.

In local testing, iterating 10k containers to collect data is approximately 3 times faster. And for the size lookup used in the volume container count metric, it is now constant time.
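The back-of-envelope heap estimate above can be reproduced directly. Note the 56-bytes-per-entry figure is an assumption from this discussion (per-entry node plus boxed Long plus table slot), not a measured value:

```java
public class MemoryEstimate {
  public static void main(String[] args) {
    // Assumed overhead per Long element in a concurrent hash-based set.
    long containers = 900_000L;
    long bytesPerEntry = 56L;
    double megabytes = containers * bytesPerEntry / 1_000_000.0;
    // 900,000 * 56 B = 50,400,000 B = 50.4 MB
    System.out.printf("~%.1f MB for %d container IDs%n", megabytes, containers);
  }
}
```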

@sumitagrawl (Contributor) left a comment

LGTM

@sumitagrawl sumitagrawl merged commit 65fb295 into apache:master Oct 24, 2025
43 checks passed
swamirishi pushed a commit to swamirishi/ozone that referenced this pull request Dec 3, 2025
jojochuang pushed a commit to jojochuang/ozone that referenced this pull request Jan 12, 2026
3 participants