HDDS-13639. Optimize container iterator for frequent operation #9147
sumitagrawl merged 4 commits into apache:master
Hi @sarvekshayr, thanks for working on this. The overall code looks good to me. From my understanding, since the computation primarily happens during container additions, the performance impact should be minimal. However, it would be good to document these details in the PR description, covering both the heap memory footprint and performance considerations. @sumitagrawl, what are your thoughts on this?
sumitagrawl left a comment
@sarvekshayr We need to change the implementation to keep only the container IDs in HddsVolume itself, populated and managed via ContainerSet.
...tainer-service/src/main/java/org/apache/hadoop/ozone/container/common/impl/ContainerSet.java
Outdated
...tainer-service/src/main/java/org/apache/hadoop/ozone/container/common/impl/ContainerSet.java
...tainer-service/src/main/java/org/apache/hadoop/ozone/container/common/volume/HddsVolume.java
@priyeshkaratha Good point about the memory footprint. The performance impact applies in the cases below:
Previously, getting any volume-specific information required iterating over all 900k containers, even though the volume holds only 10k applicable containers. In a local test, iterating over only those 10k containers to get the data was approximately 3 times faster. For the size, as used in the volume container count metric, the lookup is just constant time.
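To make the cost described above concrete, here is a minimal Java sketch of a full-scan count, assuming a flat map from container ID to volume UUID. The class FullScanCount, the method countByScan, and the map layout are hypothetical names chosen for illustration; they are not taken from the Ozone code base.

```java
import java.util.Map;

/** Hypothetical illustration of a per-volume count done by scanning everything. */
class FullScanCount {
  /**
   * Counts the containers on one volume by visiting every entry in the map.
   * The cost is proportional to the total number of containers on the
   * datanode, not to the number of containers on the requested volume.
   */
  static long countByScan(Map<Long, String> containerIdToVolume, String volumeUuid) {
    long count = 0;
    for (String volume : containerIdToVolume.values()) {
      if (volume.equals(volumeUuid)) {
        count++;
      }
    }
    return count;
  }
}
```

With 900k entries in such a map, every call visits all 900k entries no matter how few belong to the requested volume, so 1k calls add up to 900 million visits.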
…ation (apache#9147) (cherry picked from commit 65fb295)
What changes were proposed in this pull request?
In a datanode environment with 900k containers, frequent operations like ContainerController.getContainerCount(HddsVolume) and iterating through containers per volume were extremely slow. Looping 1k times took 2.5 minutes.
Added two new data structures to ContainerSet:
volumeToContainersMap: ConcurrentHashMap<String, ConcurrentSkipListSet>
  Maps volume UUID -> sorted set of container IDs on that volume
  Updated on every addContainer() and removeContainer().
volumeContainerCountCache: ConcurrentHashMap<String, AtomicLong>
  Caches the count of containers per volume
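The two structures above can be sketched roughly as follows. This is a simplified, standalone illustration and not the actual ContainerSet code: the class name VolumeContainerIndex, the method signatures, and the use of Long for container IDs are assumptions. Only the two field names and their map types come from the description.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentSkipListSet;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical stand-in for the per-volume bookkeeping; not the real ContainerSet. */
class VolumeContainerIndex {
  // volume UUID -> sorted set of container IDs on that volume
  private final ConcurrentHashMap<String, ConcurrentSkipListSet<Long>> volumeToContainersMap =
      new ConcurrentHashMap<>();
  // volume UUID -> cached count of containers on that volume
  private final ConcurrentHashMap<String, AtomicLong> volumeContainerCountCache =
      new ConcurrentHashMap<>();

  /** Records a container on a volume; returns false if it was already recorded. */
  boolean addContainer(String volumeUuid, long containerId) {
    // Create the counter first so a concurrent remove always finds one.
    AtomicLong count =
        volumeContainerCountCache.computeIfAbsent(volumeUuid, k -> new AtomicLong());
    boolean added = volumeToContainersMap
        .computeIfAbsent(volumeUuid, k -> new ConcurrentSkipListSet<>())
        .add(containerId);
    if (added) {
      // The set and the counter are not updated atomically together, so a
      // reader may briefly observe a count that lags the set by one.
      count.incrementAndGet();
    }
    return added;
  }

  /** Forgets a container; returns false if it was not recorded on that volume. */
  boolean removeContainer(String volumeUuid, long containerId) {
    ConcurrentSkipListSet<Long> ids = volumeToContainersMap.get(volumeUuid);
    if (ids == null || !ids.remove(containerId)) {
      return false;
    }
    AtomicLong count = volumeContainerCountCache.get(volumeUuid);
    if (count != null) {
      count.decrementAndGet();
    }
    return true;
  }

  /** Reads the cached count: one hash lookup plus one atomic read. */
  long getContainerCount(String volumeUuid) {
    AtomicLong count = volumeContainerCountCache.get(volumeUuid);
    return count == null ? 0L : count.get();
  }

  /** Iterates only this volume's container IDs, in ascending order. */
  Iterator<Long> getContainerIds(String volumeUuid) {
    ConcurrentSkipListSet<Long> ids = volumeToContainersMap.get(volumeUuid);
    return ids == null ? Collections.<Long>emptyIterator() : ids.iterator();
  }
}
```

One plausible reason for keeping a separate counter is that ConcurrentSkipListSet.size() is documented as not being a constant-time operation, since it has to traverse the elements. Reading an AtomicLong avoids that traversal on every count request.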
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-13639
How was this patch tested?
Added testContainerCountPerVolume and testContainerIteratorPerVolume in TestContainerSet.