
Conversation

@siddhantsangwan
Contributor

What changes were proposed in this pull request?

Currently ContainerBalancer uses ReplicationManager#move, which internally uses LegacyReplicationManager. This Jira proposes integrating MoveManager with ContainerBalancer such that it uses MoveManager#move if "hdds.scm.replication.enable.legacy" is false and ReplicationManager#move if it's true.
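A minimal sketch of that selection, for illustration only; the config key is the one named above, but the method signatures, variable names, and default value are assumptions rather than the exact Ozone code:

// Sketch only: pick the move implementation based on the legacy RM flag.
// Everything except the config key is illustrative.
boolean legacyEnabled = ozoneConfiguration.getBoolean(
    "hdds.scm.replication.enable.legacy", true);

CompletableFuture<MoveManager.MoveResult> future = legacyEnabled
    ? replicationManager.move(containerID, source, target)
    : moveManager.move(containerID, source, target);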

What is the link to the Apache JIRA?

https://issues.apache.org/jira/browse/HDDS-8153

How was this patch tested?

Added a unit test and modified existing ones.

Contributor

@adoroszlai left a comment

LGTM so far.

Comment on lines 789 to 791
ReplicationManager.ReplicationManagerConfiguration rmConf =
    ozoneConfiguration.getObject(
        ReplicationManager.ReplicationManagerConfiguration.class);
Contributor

nit: I think this should be stored in the constructor.
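A minimal sketch of the suggestion, assuming a field assigned once in the constructor; the constructor shape shown here is illustrative, not the actual ContainerBalancerTask signature:

// Sketch only: read ReplicationManagerConfiguration once in the constructor and
// keep it in a field instead of fetching it again later.
private final ReplicationManager.ReplicationManagerConfiguration rmConf;

ContainerBalancerTask(OzoneConfiguration ozoneConfiguration /* , ... */) {
  this.rmConf = ozoneConfiguration.getObject(
      ReplicationManager.ReplicationManagerConfiguration.class);
}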

if (result == MoveManager.MoveResult.COMPLETED) {
  sizeActuallyMovedInLatestIteration +=
      containerInfo.getUsedBytes();
  if (LOG.isDebugEnabled()) {
Contributor

NIT: We usually don't need to wrap debug calls in if (LOG.isDebugEnabled) if the parameters to the log are simple getters, which I think they are here.
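For illustration, the guarded call could collapse to a plain parameterized debug call; the message text and arguments below are a sketch of what the surrounding code appears to log, not the exact lines in the PR:

// Sketch only: with SLF4J parameterized logging, formatting happens only when
// debug is enabled, and the arguments here are cheap getters, so the
// isDebugEnabled() guard adds nothing.
LOG.debug("Container {} moved; counted {} bytes towards this iteration",
    containerInfo.containerID(), containerInfo.getUsedBytes());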

@siddhantsangwan siddhantsangwan marked this pull request as ready for review March 14, 2023 12:25
@siddhantsangwan
Contributor Author

Thanks for the reviews. I've addressed them in the latest commit.

private String excludeContainers = "";

@Config(key = "move.timeout", type = ConfigType.TIME, defaultValue = "30m",
@Config(key = "move.timeout", type = ConfigType.TIME, defaultValue = "60m",
Contributor

We can do this in another PR, but there is a timeout hard coded into MoveManager right now. Probably we need to pass this value into MoveManager somehow so it passes a sensible timeout to RM when scheduling the command. That will then set the DN deadline and the pending Ops timeout.
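A rough sketch of the idea, assuming a hypothetical setter on MoveManager and a timeout accessor on the balancer configuration; the real API may differ and is exactly what the follow-up would decide:

// Hypothetical sketch: forward the balancer's configured move timeout to
// MoveManager so RM schedules the command with a matching datanode deadline and
// pending-op timeout. setMoveTimeout() is an assumed method, not the real API.
moveManager.setMoveTimeout(balancerConfiguration.getMoveTimeout().toMillis());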

Contributor Author

Yes, I've reverted any changes in this PR related to timeouts. We can do that in the next PR.


@Config(key = "balancing.iteration.interval", type = ConfigType.TIME,
defaultValue = "70m", tags = {ConfigTag.BALANCER}, description =
defaultValue = "130m", tags = {ConfigTag.BALANCER}, description =
Contributor

Should the timeout and interval not be quite close in time? If commands time out after 60 minutes, and the interval is 130m, does that mean the balancer will go idle for some time in between?

@siddhantsangwan
Contributor Author

siddhantsangwan commented Mar 15, 2023

I noticed a subtle bug in this PR. We're only constructing MoveManager once when ContainerBalancerTask is constructed. But ContainerBalancerTask works in iterations and resets and reuses any collections or members that hold state. Since MoveManager tracks moves using ContainerID as the key in a hash map, we need to reset its state between iterations too.
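A rough sketch of one possible fix, clearing the tracked moves alongside the other per-iteration resets; the reset method name is hypothetical and not necessarily what the final change does:

// Hypothetical sketch: drop MoveManager's per-ContainerID tracking map before a
// new iteration starts, like the other stateful members the task already resets.
// resetState() is an assumed method name, not the real MoveManager API.
moveManager.resetState();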

Contributor

@sodonnel left a comment

These changes are good to commit if we get green CI.

I had a discussion with @siddhantsangwan and we know we need to take MoveManager construction out of ContainerBalancerTask and inject the dependency, and also make the MoveManager instance register with ContainerReplicaPendingOps.

We are going to do that in a followup PR immediately after this one.
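For context, a rough sketch of what that follow-up could look like; the MoveManager constructor arguments, the subscription method, and the task constructor shape are all assumptions, not the actual Ozone code:

// Hypothetical sketch: build MoveManager outside the task, subscribe it to
// pending replica op events, and inject it into ContainerBalancerTask.
MoveManager moveManager = new MoveManager(replicationManager, containerManager);
containerReplicaPendingOps.registerSubscriber(moveManager);
ContainerBalancerTask task =
    new ContainerBalancerTask(scm, moveManager /* , ... */);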

@siddhantsangwan
Contributor Author

Thanks for the reviews. Merging now that CI is green. Follow up Jira: https://issues.apache.org/jira/browse/HDDS-8167

@siddhantsangwan siddhantsangwan merged commit 2fc0117 into apache:master Mar 15, 2023
errose28 added a commit to errose28/ozone that referenced this pull request Mar 16, 2023
* master: (262 commits)
  HDDS-8153. Integrate ContainerBalancer with MoveManager (apache#4391)
  ...