HDDS-8233. ReplicationManager: Throttle delete container commands from over-replication handlers #4447

sodonnel · 2023-03-22T11:56:28Z

What changes were proposed in this pull request?

Similar to ReplicateContainerCommands, we should limit the number of delete commands queued on a given datanode at any time. This PR enforces the limit with a static config variable, with a view to making it more dynamic later.
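
To illustrate the idea, here is a minimal sketch of a per-datanode limit on queued delete commands. It is not the code from this patch: `DeleteCommandThrottle`, `TargetOverloadedException`, the string datanode IDs, and the method names are all hypothetical, and the limit is passed to the constructor in place of the static config variable.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for an "overloaded target" exception; not the real Ozone class. */
class TargetOverloadedException extends Exception {
  TargetOverloadedException(String message) {
    super(message);
  }
}

/** Illustrative per-datanode limiter for queued delete container commands. */
class DeleteCommandThrottle {
  // Static limit, standing in for the config variable (assumed name and type).
  private final int maxQueuedDeletes;
  // Number of delete commands currently queued, keyed by a datanode identifier.
  private final Map<String, Integer> queuedDeletes = new HashMap<>();

  DeleteCommandThrottle(int maxQueuedDeletes) {
    this.maxQueuedDeletes = maxQueuedDeletes;
  }

  /** Records a delete for the datanode, or throws if it is already at the limit. */
  synchronized void sendThrottledDelete(String datanodeId, long containerId)
      throws TargetOverloadedException {
    int queued = queuedDeletes.getOrDefault(datanodeId, 0);
    if (queued >= maxQueuedDeletes) {
      throw new TargetOverloadedException("Datanode " + datanodeId + " already has "
          + queued + " delete commands queued; not deleting container " + containerId);
    }
    // A real implementation would send the command to the datanode here.
    queuedDeletes.put(datanodeId, queued + 1);
  }

  /** Called when a queued delete finishes or expires, freeing one slot. */
  synchronized void deleteCompleted(String datanodeId) {
    queuedDeletes.computeIfPresent(datanodeId, (id, n) -> n > 1 ? n - 1 : null);
  }

  /** Returns how many deletes are currently counted against the datanode. */
  synchronized int queuedFor(String datanodeId) {
    return queuedDeletes.getOrDefault(datanodeId, 0);
  }
}
```

In this sketch the limit applies to each datanode independently, so one overloaded node does not stop deletes from being sent to others.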

This change does not limit any delete container commands sent from the health check chain in RM. It only affects deletes for the Ratis and EC Over Replication Handlers, which should drive the bulk of the deletes.
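
As a rough sketch of how an over-replication handler could use a throttled delete, the example below walks the candidate replicas, skips datanodes that are at their limit, and reports an overload only if it could not remove all excess replicas. The names `OverReplicationSketch`, `ThrottledDeleteSender`, and `DatanodeOverloadedException` are hypothetical, and the skip-then-rethrow behaviour is an assumption made for illustration rather than a description of the handlers in this patch.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical checked exception signalling a datanode at its delete limit. */
class DatanodeOverloadedException extends Exception {
  DatanodeOverloadedException(String message) {
    super(message);
  }
}

/** Hypothetical hook through which a handler sends a throttled delete. */
interface ThrottledDeleteSender {
  void sendDelete(String datanodeId, long containerId)
      throws DatanodeOverloadedException;
}

class OverReplicationSketch {
  /**
   * Tries to delete up to {@code excess} replicas of a container.
   * Overloaded datanodes are skipped in favour of the next candidate.
   * Returns the datanodes that accepted a delete. If fewer than
   * {@code excess} deletes were sent because of overloaded datanodes,
   * the first overload exception is rethrown (deletes already sent stay
   * sent), so the caller can retry the container later.
   */
  static List<String> removeExcess(long containerId, List<String> candidates,
      int excess, ThrottledDeleteSender sender)
      throws DatanodeOverloadedException {
    List<String> deleted = new ArrayList<>();
    DatanodeOverloadedException firstOverload = null;
    for (String datanode : candidates) {
      if (deleted.size() >= excess) {
        break;
      }
      try {
        sender.sendDelete(datanode, containerId);
        deleted.add(datanode);
      } catch (DatanodeOverloadedException e) {
        // Remember the first overload, then try the remaining candidates.
        if (firstOverload == null) {
          firstOverload = e;
        }
      }
    }
    if (deleted.size() < excess && firstOverload != null) {
      throw firstOverload;
    }
    return deleted;
  }
}
```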

Note that delete container replica commands from the balancer are not throttled. The balancer issues moves in a controlled way, and its deletes are triggered when a replication completes. Its deletes are therefore naturally throttled by the rate at which the replication commands complete. Unlike the case where a couple of dead nodes are brought back into the cluster with their containers still in place, the balancer will not flood the cluster with deletes.

What is the link to the Apache JIRA?

https://issues.apache.org/jira/browse/HDDS-8233

How was this patch tested?

New unit tests added.

adoroszlai

LGTM. The only issue I found is a stale javadoc comment, but to save CI time it can be fixed later in any upcoming patch.

adoroszlai · 2023-03-22T13:16:35Z

.../java/org/apache/hadoop/hdds/scm/container/replication/CommandTargetOverloadedException.java

 import java.io.IOException;

 /**
 * Exception class used to indicate that all sources are overloaded.


nit: stale javadoc comment
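
For illustration only, one possible wording for the updated comment is sketched below. The javadoc text and the constructor are assumptions, not the wording that was eventually committed, and the class is declared package-private here only so the snippet compiles in any file.

```java
import java.io.IOException;

/**
 * Exception class used to indicate that the target of a command is
 * overloaded, for example a datanode that already has too many commands
 * queued. (Suggested wording only.)
 */
class CommandTargetOverloadedException extends IOException {
  // Assumed single-message constructor, shown for completeness.
  CommandTargetOverloadedException(String message) {
    super(message);
  }
}
```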

* master: (43 commits)
  HDDS-8148. Improve log for Pipeline creation failure (apache#4385)
  HDDS-7853. Add support for RemoveSCM in SCMRatisServer. (apache#4358)
  HDDS-8042. Display certificate issuer in cert list command. (apache#4429)
  HDDS-8189. [Snapshot] renamedKeyTable should only track keys in buckets that has at least one active snapshot. (apache#4436)
  HDDS-8154. Perf: Reuse Mac instances in S3 token validation (apache#4433)
  HDDS-8245. Info log for keyDeletingService when nonzero number of keys are deleted. (apache#4451)
  HDDS-8233. ReplicationManager: Throttle delete container commands from over-replication handlers (apache#4447)
  HDDS-8220. [Ozone-Streaming] Trigger volume check on IOException in StreamDataChannelBase (apache#4428)
  HDDS-8173. Fix to remove enrties from RocksDB after container gets deleted. (apache#4445)
  HDDS-7975. Rebalance acceptance tests (apache#4437)
  HDDS-8152. Reduce S3 acceptance test setup time (apache#4393)
  HDDS-8172. ECUnderReplicationHandler should consider commands already sent when processing the container (apache#4435)
  HDDS-7883. [Snapshot] Accommodate FSO, key renames and implement OMSnapshotPurgeRequest for SnapshotDeletingService (apache#4407)
  HDDS-8168. Make deadlines inside MoveManager for move commands configurable (apache#4415)
  HDDS-7918. EC: ECBlockReconstructedStripeInputStream should check for spare replicas before failing an index (apache#4441)
  HDDS-8222. EndpointBase#getBucket should handle BUCKET_NOT_FOUND (apache#4431)
  HDDS-8068. Fix Exception: JMXJsonServlet, getting attribute RatisRoles of Hadoop:service=OzoneManager. (apache#4352)
  HDDS-8139. Datanodes should not drop block delete transactions based on transaction ID (apache#4384)
  HDDS-8216. EC: OzoneClientConfig is overwritten in ECKeyOutputStream (apache#4425)
  HDDS-8054. Fix NPE in metrics for failed volume (apache#4340)
  ...

S O'Donnell added 5 commits March 22, 2023 11:45

Create throttleDeleteContainer command and tests

32e6b45

Remove createThrottledReplicationCommand method only used in tests

aae1a1d

Changes to RatisOverReplicationHandler to throttle the delete commands

37ad6c5

Throttle deletes in the ECOverReplicationHandler

7033047

Rename AllSourcesOverloadedException to CommandTargetOverloadedException

e54f9db

sodonnel changed the title ~~Hdds 8233~~ HDDS-8233. ReplicationManager: Throttle delete container commands from over replication handlers Mar 22, 2023

adoroszlai approved these changes Mar 22, 2023

adoroszlai changed the title ~~HDDS-8233. ReplicationManager: Throttle delete container commands from over replication handlers~~ HDDS-8233. ReplicationManager: Throttle delete container commands from over-replication handlers Mar 22, 2023

adoroszlai merged commit 6dd80eb into apache:master Mar 22, 2023
