Skip to content

Conversation

@adoroszlai
Copy link
Contributor

What changes were proposed in this pull request?

ContainerStateManagerImpl uses LockManager for maintaining a set of locks to be used per container ID. Similar to HDDS-8765, it should be replaced with Striped locks.

https://issues.apache.org/jira/browse/HDDS-8910

How was this patch tested?

CI:
https://github.com/adoroszlai/hadoop-ozone/actions/runs/5346927967

@Galsza
Copy link
Contributor

Galsza commented Jun 22, 2023

@adoroszlai thanks for working on this, +1
Just for my curiosity, why was the initial stripe size chosen to be 512?

@adoroszlai
Copy link
Contributor Author

@duongkame
Copy link
Contributor

duongkame commented Jun 23, 2023

Just for my curiosity, why was the initial stripe size chosen to be 512?

Shameless copy from @duongkame's PR:

With striped locks, collisions can happen. It is when different keys are resolved to the same lock instance. Note that the lock instance is resolved based on the key hashCode % striped_size. Collisions can slow things down because the works that can be done in parallel (with LockManager) now get to block each other. However, they don't result in deadlocks. So, it's ok to have collisions at a low rate.

The size of the lock pool (Striped size) is a tradeoff between getting a low collision rate and memory footprint.
Assuming we have 100 threads competing striped locks, the maximum number of unique lock keys to be requested concurrently is 100. Then, if the striped size is 100 and hashCode is perfect, the probability of collisions is ~0. Yet, because hashCode is not perfect, we can simply have Striped size as X times the number of threads. In OM, the default number of RPC handler threads is 100, and I put X=5, hence 512 (closest binary). I didn't test it thoroughly and 5X may be too generous. Think 2X is quite enough to keep the collision rate ~0.
And of course, the right Stripe size also depends on the cardinality of the keys space.

This is nowhere near a complete theory, but it should give you some ideas to choose the right default Striped size for container IDs.
@adoroszlai @Galsza

Copy link
Contributor

@duongkame duongkame left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like a very nice change. Thanks for the patch @adoroszlai. +1

*
* This class is NOT thread safe. All the calls are idempotent.
*/
public final class ContainerStateManagerImpl
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@adoroszlai , do we need similar striped lock in org.apache.hadoop.hdds.scm.container.ContainerManagerImpl class to be changed as well ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing out ContainerManagerImpl.

ContainerManagerImpl uses a single lock, not a set of "managed" locks. It may be possible to improve that by adding a striped lock for some operations that apply to a single container, but that's a different task.

@adoroszlai adoroszlai merged commit 0a3d585 into apache:master Jun 23, 2023
@adoroszlai adoroszlai deleted the HDDS-8910 branch June 23, 2023 18:30
@adoroszlai
Copy link
Contributor Author

Thanks @devmadhuu, @duongkame, @Galsza for the review.

errose28 added a commit to errose28/ozone that referenced this pull request Jun 25, 2023
* master: (96 commits)
  HDDS-8586 Recon. - API for Count of deletePending keys and amount of data mapped to such keys. (apache#4923)
  HDDS-8908. Intermittent failure in TestBlockDeletion#testBlockDeletion (apache#4958)
  HDDS-8910. Replace LockManager with striped lock in ContainerStateManager (apache#4962)
  HDDS-8917. Move protobuf conversion out of the lock in PipelineStateManagerImpl (apache#4965)
  HDDS-8825. Use apache/hadoop 3.3.5 docker image (apache#4963)
  HDDS-8906. Avoid stream when getting in-service healthy nodes (apache#4960)
  HDDS-8907. Store volume count when storage report is updated (apache#4957)
  HDDS-8905. PipelineManager metrics should not be synchronized (apache#4959)
  HDDS-8553. Improve scanner integration tests. (apache#4936)
  HDDS-8854. Avoid unnecessary DatanodeDetails creation for NodeStateManager lookup (apache#4925)
  HDDS-8315. [Snapshot] Added unit tests for SnapshotDiffManager (apache#4716)
  HDDS-7968. [Snapshot] Improve KeyDeletingService to reclaim eligible key blocks in snapshot's deletedTable (apache#4935)
  HDDS-8838. Update default datanode check empty containter on disk to false (apache#4937)
  HDDS-8763. Support RocksDB iterator with ByteBuffer. (apache#4942)
  HDDS-8543. FSO directory should reflect bucket/cluster default replication (apache#4947)
  HDDS-8898. Replication limit should not be less than reconstruction weight (apache#4954)
  HDDS-8739. Snapdiff should return complete absolute path in Diff Entry (apache#4823)
  HDDS-8908. Mark TestBlockDeletion#testBlockDeletion as flaky
  HDDS-8534. Support asynchronous service logging (apache#4663)
  HDDS-8879. Cleanup SecurityConfig and related class initialization (apache#4921)
  ...
errose28 added a commit to errose28/ozone that referenced this pull request Jun 26, 2023
* tmp-dir-refactor: (99 commits)
  HDDS-8586 Recon. - API for Count of deletePending keys and amount of data mapped to such keys. (apache#4923)
  Fix SCM HA finalization compat test
  HDDS-8908. Intermittent failure in TestBlockDeletion#testBlockDeletion (apache#4958)
  HDDS-8910. Replace LockManager with striped lock in ContainerStateManager (apache#4962)
  HDDS-8917. Move protobuf conversion out of the lock in PipelineStateManagerImpl (apache#4965)
  HDDS-8825. Use apache/hadoop 3.3.5 docker image (apache#4963)
  HDDS-8906. Avoid stream when getting in-service healthy nodes (apache#4960)
  HDDS-8907. Store volume count when storage report is updated (apache#4957)
  HDDS-8905. PipelineManager metrics should not be synchronized (apache#4959)
  HDDS-8553. Improve scanner integration tests. (apache#4936)
  HDDS-8854. Avoid unnecessary DatanodeDetails creation for NodeStateManager lookup (apache#4925)
  HDDS-8315. [Snapshot] Added unit tests for SnapshotDiffManager (apache#4716)
  HDDS-7968. [Snapshot] Improve KeyDeletingService to reclaim eligible key blocks in snapshot's deletedTable (apache#4935)
  HDDS-8838. Update default datanode check empty containter on disk to false (apache#4937)
  HDDS-8763. Support RocksDB iterator with ByteBuffer. (apache#4942)
  HDDS-8543. FSO directory should reflect bucket/cluster default replication (apache#4947)
  HDDS-8898. Replication limit should not be less than reconstruction weight (apache#4954)
  HDDS-8739. Snapdiff should return complete absolute path in Diff Entry (apache#4823)
  HDDS-8908. Mark TestBlockDeletion#testBlockDeletion as flaky
  HDDS-8534. Support asynchronous service logging (apache#4663)
  ...
errose28 added a commit to errose28/ozone that referenced this pull request Jun 26, 2023
* master: (79 commits)
  HDDS-8914. Datanode may fail to start due to duplicate VolumeInfoMetrics (apache#4966)
  HDDS-8921. Add support for EC in Freon SCM block generator (apache#4982)
  HDDS-8927. Metadata scanner should not scan unhealthy containers. (apache#4976)
  HDDS-8929. Avoid list allocation for pipeline search (apache#4980)
  HDDS-8778. Support recursive volume delete using Ozone sh command. (apache#4842)
  HDDS-8885. Quota repair count enable quota feature for old bucket/volume. (apache#4941)
  HDDS-8771. Refactor volume level tmp directory for generic usage. (apache#4838)
  HDDS-8922. Random EC read pipeline ID causes XceiverClient cache churn (apache#4971)
  HDDS-8586 Recon. - API for Count of deletePending keys and amount of data mapped to such keys. (apache#4923)
  HDDS-8908. Intermittent failure in TestBlockDeletion#testBlockDeletion (apache#4958)
  HDDS-8910. Replace LockManager with striped lock in ContainerStateManager (apache#4962)
  HDDS-8917. Move protobuf conversion out of the lock in PipelineStateManagerImpl (apache#4965)
  HDDS-8825. Use apache/hadoop 3.3.5 docker image (apache#4963)
  HDDS-8906. Avoid stream when getting in-service healthy nodes (apache#4960)
  HDDS-8907. Store volume count when storage report is updated (apache#4957)
  HDDS-8905. PipelineManager metrics should not be synchronized (apache#4959)
  HDDS-8553. Improve scanner integration tests. (apache#4936)
  HDDS-8854. Avoid unnecessary DatanodeDetails creation for NodeStateManager lookup (apache#4925)
  HDDS-8315. [Snapshot] Added unit tests for SnapshotDiffManager (apache#4716)
  HDDS-7968. [Snapshot] Improve KeyDeletingService to reclaim eligible key blocks in snapshot's deletedTable (apache#4935)
  ...
vtutrinov pushed a commit to Cyrill/ozone that referenced this pull request Jul 3, 2023
vtutrinov pushed a commit to Cyrill/ozone that referenced this pull request Jul 3, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants