
Conversation

@Tejaskriya (Contributor) commented Mar 1, 2024

What changes were proposed in this pull request?

The FindSourceGreedy class maintains a priority queue of source datanodes and is used to get the next source DN for balancing. Under the current design, a datanode can be involved in multiple source->target pairings within one iteration of the container balancer.
When FindSourceGreedy#getNextCandidateSourceDataNode() is called during an iteration, it polls a DN from the queue. This DN is re-queued when FindSourceGreedy#increaseSizeLeaving is called. The problem is that if ContainerBalancer#moveContainer returns false, increaseSizeLeaving is never called and the DN does not get added back.
If the move operation fails due to errors related to the container selected in that run, we should add the DN back to the queue so that the source DN can still be considered in the next run.
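To make the queueing behaviour concrete, here is a minimal, hypothetical sketch (simplified names and types, not the actual Ozone implementation) of how polling and re-queueing interact in a FindSourceGreedy-style strategy, and where the new addBackSourceDataNode call fits:

    // Minimal sketch of the behaviour described above; simplified types,
    // not the real Ozone classes.
    import java.util.Comparator;
    import java.util.PriorityQueue;
    import java.util.Queue;

    class FindSourceGreedySketch {
      // candidate source datanodes, ordered by the strategy's comparator
      private final Queue<String> potentialSources =
          new PriorityQueue<>(Comparator.naturalOrder());

      String getNextCandidateSourceDataNode() {
        // polling removes the DN from the queue
        return potentialSources.poll();
      }

      void increaseSizeLeaving(String datanode, long size) {
        // ...update the size-leaving accounting for this DN...
        // the DN is re-queued only here, i.e. on the successful-move path
        potentialSources.add(datanode);
      }

      void addBackSourceDataNode(String datanode) {
        // the fix: explicitly re-queue the DN when a move fails for
        // container-related reasons, so it remains a balancing candidate
        potentialSources.add(datanode);
      }
    }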

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-7252

How was this patch tested?

Added a unit test in TestContainerBalancerTask.

@siddhantsangwan (Contributor) left a comment:

I have a comment, otherwise looks good. Let's also add some unit tests.

Comment on lines 863 to 870
// add source back if move failed due to container related errors
if (result == MoveManager.MoveResult.REPLICATION_FAIL_NOT_EXIST_IN_SOURCE ||
result == MoveManager.MoveResult.REPLICATION_FAIL_EXIST_IN_TARGET ||
result == MoveManager.MoveResult.REPLICATION_FAIL_CONTAINER_NOT_CLOSED ||
result == MoveManager.MoveResult.REPLICATION_FAIL_INFLIGHT_DELETION ||
result == MoveManager.MoveResult.REPLICATION_FAIL_INFLIGHT_REPLICATION) {
findSourceStrategy.addBackSourceDataNode(source);
}
@siddhantsangwan (Contributor) commented Mar 6, 2024:
I don't think we need to add the source back here, because it would already have been added back in

    if (moveContainer(source, moveSelection)) {
      // consider move successful for now, and update selection criteria
      updateTargetsAndSelectionCriteria(moveSelection, source);
    }

when moveContainer returns true.

Contributor:

Since whenComplete is processed after the move work, which is async, I'm not sure this will help with the moveSelection; the selection part might already have finished.

@Tejaskriya (Author):

In my understanding, if the move completes within one iteration of the containerBalancer, wouldn't this change help in the next run of the infinite loop by adding the source back? What I mean is:

ContainerBalancer # doIteration() {
  ...  //this logic runs for each iteration
  while(true) {
    ...
    // in one run of this loop, the source is selected and the move is scheduled
    // in some later run, the previous move completes with a failed result,
    //         so with this PR's changes, the source is added back
    // and can now be considered for source selection next time
    ...
  }
  ...
}

If we look at doIteration -> processMoveSelection -> updateTargetsAndSelectionCriteria, the same is done when the move is successful.
This change simply does the same when the move fails.

Contributor:

@Tejaskriya After scheduling the move here:

      if (replicationManager.getConfig().isLegacyEnabled()) {
        future = replicationManager
            .move(containerID, source, moveSelection.getTargetNode());
      } else {
        future = moveManager.move(containerID, source,
            moveSelection.getTargetNode());
      }

this method will return true here if the move hasn't failed/completed yet:

      moveSelectionToFutureMap.put(moveSelection, future);
      return true;

Then, updateTargetsAndSelectionCriteria will eventually call findSourceStrategy.increaseSizeLeaving(source, size), which will add the source datanode back so it can be considered in further move selections. So we don't need to add the source back in the asynchronous whenComplete handling of the future.

Contributor:

Actually, I'm rethinking my comment. We might need to add back the source in the whenComplete logic as well. I'll get back here once I've understood what we can do.

Contributor:

We need to handle the case where the scheduled move completes with a MoveResult other than COMPLETED before the if...else code of if (future.isDone()) gets executed. In that case, the line return result == MoveManager.MoveResult.COMPLETED; will evaluate to false and the source datanode will not get added back. However, instead of handling this case in the whenComplete block, I suggest handling it this way:

    if (future.isDone()) {
      if (future.isCompletedExceptionally()) {
        return false;
      } else {
        MoveManager.MoveResult result = future.join();
        moveSelectionToFutureMap.put(moveSelection, future);
        if (result == MoveManager.MoveResult.REPLICATION_FAIL_NOT_EXIST_IN_SOURCE ||
            result == MoveManager.MoveResult.REPLICATION_FAIL_EXIST_IN_TARGET ||
            result == MoveManager.MoveResult.REPLICATION_FAIL_CONTAINER_NOT_CLOSED ||
            result == MoveManager.MoveResult.REPLICATION_FAIL_INFLIGHT_DELETION ||
            result == MoveManager.MoveResult.REPLICATION_FAIL_INFLIGHT_REPLICATION) {
          findSourceStrategy.addBackSourceDataNode(source);
        }
        return result == MoveManager.MoveResult.COMPLETED;
      }
    } else {
      moveSelectionToFutureMap.put(moveSelection, future);
      return true;
    }

This will achieve the same result and I think it is a better approach, mainly because of thread-safety issues. Any code in the whenComplete block can be called from a thread other than the balancing thread, which means we would need to worry about thread-safety problems such as concurrent access to the priority queue that findSourceStrategy.addBackSourceDataNode(source) modifies. We're better off avoiding this to prevent multithreading issues.
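To illustrate the thread-safety concern, here is a small, self-contained demo (not Ozone code) showing that a whenComplete callback registered on a not-yet-finished future runs on the thread that completes it, so any shared, non-thread-safe structure it touches (such as the source priority queue) would be accessed concurrently with the balancing thread:

    import java.util.concurrent.CompletableFuture;

    public class WhenCompleteThreadDemo {
      public static void main(String[] args) throws Exception {
        CompletableFuture<String> future = CompletableFuture.supplyAsync(() -> {
          try {
            Thread.sleep(100);
          } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
          }
          return "COMPLETED";
        });

        future.whenComplete((result, error) ->
            // typically a ForkJoinPool worker thread, not "main"; anything
            // mutated here needs to be safe against concurrent access
            System.out.println("callback thread: " + Thread.currentThread().getName()));

        System.out.println("balancing thread: " + Thread.currentThread().getName());
        future.join();
        Thread.sleep(50); // give the callback a moment to print
      }
    }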

@Tejaskriya (Author):

I have followed this method to handle the checks. Thank you for the suggestion. Could you please review the recent patch?

@siddhantsangwan (Contributor) commented:

@Tejaskriya We were discussing how to test this change earlier. I came up with a bit of a hack to test it.

  @Test
  public void testSourceDatanodeAddedBack()
      throws NodeNotFoundException, IOException, IllegalContainerBalancerStateException,
      InvalidContainerBalancerConfigurationException, TimeoutException, InterruptedException {

    when(moveManager.move(any(ContainerID.class),
        any(DatanodeDetails.class),
        any(DatanodeDetails.class)))
        .thenReturn(CompletableFuture.completedFuture(MoveManager.MoveResult.REPLICATION_FAIL_NOT_EXIST_IN_SOURCE))
        .thenReturn(CompletableFuture.completedFuture(MoveManager.MoveResult.COMPLETED));
    balancerConfiguration.setThreshold(10);
    balancerConfiguration.setIterations(1);
    balancerConfiguration.setMaxSizeEnteringTarget(10 * STORAGE_UNIT);
    balancerConfiguration.setMaxSizeToMovePerIteration(100 * STORAGE_UNIT);
    balancerConfiguration.setMaxDatanodesPercentageToInvolvePerIteration(100);
    String includeNodes = nodesInCluster.get(0).getDatanodeDetails().getHostName() + "," +
        nodesInCluster.get(nodesInCluster.size() - 1).getDatanodeDetails().getHostName();
    balancerConfiguration.setIncludeNodes(includeNodes);

    startBalancer(balancerConfiguration);
    GenericTestUtils.waitFor(() -> ContainerBalancerTask.IterationResult.ITERATION_COMPLETED ==
        containerBalancerTask.getIterationResult(), 10, 50);

    assertEquals(2, containerBalancerTask.getCountDatanodesInvolvedPerIteration());
    assertTrue(containerBalancerTask.getMetrics().getNumContainerMovesCompletedInLatestIteration() >= 1);
    assertThat(containerBalancerTask.getMetrics().getNumContainerMovesFailed()).isEqualTo(1);
    stopBalancer();
  }

It uses the includeNodes configuration to ensure only one under-utilized and one over-utilized node is involved in balancing. This way we're able to test whether the same source was or wasn't added back. What do you think?

@Tejaskriya (Author) commented:

@siddhantsangwan Thank you for finding the hack to test it! It works; I have added it to the unit tests.

@siddhantsangwan @symious Could you please review this updated PR? I have incorporated the suggested changes.

@Tejaskriya Tejaskriya marked this pull request as ready for review March 26, 2024 08:15
@siddhantsangwan (Contributor) left a comment:

Thanks for the update, I have a few more comments below.

@Tejaskriya (Author) commented:

@siddhantsangwan Thank you for the review! I have made the changes suggested by you and addressed the remaining comments with appropriate code changes. Could you please take another look at the PR now?

@sumitagrawl (Contributor) left a comment:

@Tejaskriya Thanks for working on this, I have a minor comment.

"starting a container move", containerID, e);
// add source back to queue as a different container can be selected in next run.
findSourceStrategy.addBackSourceDataNode(source);
selectionCriteria.addToExcludeDueToFailContainers(moveSelection.getContainerID());
Contributor:

What about the same container on another source? Does that also need to keep being excluded? I think we need to reconsider this container exclusion to be at the source level only.

Contributor:

I missed this comment - @sumitagrawl do you want to take another look? I think @Tejaskriya has updated the PR now.

@Tejaskriya (Author) commented Apr 5, 2024:

Thanks for the review! I have created a Map of DN and containers to be excluded for that DN. Could you please take a look at the recent changes?
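A rough sketch of the shape this per-source exclusion could take (hypothetical names, with the datanode keyed here as a plain String; not the exact PR code):

    import java.util.HashMap;
    import java.util.HashSet;
    import java.util.Map;
    import java.util.Set;

    class PerSourceExclusions {
      // containers excluded per source datanode (keyed by datanode UUID)
      private final Map<String, Set<Long>> excludedPerSource = new HashMap<>();

      void excludeForSource(String datanodeUuid, long containerId) {
        excludedPerSource
            .computeIfAbsent(datanodeUuid, k -> new HashSet<>())
            .add(containerId);
      }

      boolean isExcluded(String datanodeUuid, long containerId) {
        return excludedPerSource
            .getOrDefault(datanodeUuid, Set.of())
            .contains(containerId);
      }
    }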

Contributor:

> I think we need to reconsider this container exclusion to be at the source level only.

What's your reasoning behind this? If container manager can't find this container I think we should avoid this container for any DN.

Contributor:

If 2 different sources have the same container chosen to be moved to a target to reduce usage at the source:
s1: c5 --> t5
s2: c5 --> t6
Now s1 had some failure for c5, so this will avoid choosing container c5 for s1 (as expected). But the old logic where this comment was given would also prevent c5 from being chosen for another source, s2.

@siddhantsangwan (Contributor) commented Apr 10, 2024:

When a container is not found in the SCM, the problem is likely outside of the balancer and will affect any other sources trying to move the same container later. So I think we should exclude the container for all sources when it's a ContainerInfo-related error like in the above case.

@siddhantsangwan (Contributor) left a comment:

LGTM, pending green CI.

@sumitagrawl (Contributor) left a comment:

LGTM

@siddhantsangwan (Contributor) commented:

Holding off on merging while we're still discussing a point above.

@Tejaskriya (Author) commented:

@siddhantsangwan @sumitagrawl following our discussion today, I have reverted the latest commit. Now the container is excluded for all DNs, since a ContainerNotFoundException will come up for every DN that the container is present on. For replica-related intermittent failures, the container is not excluded.
I have also added some comments explaining why we add a source DN back and why we exclude containers. The merge conflicts are resolved.
Could you please give it a final review?
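A rough, self-contained illustration of the split described above (assumed names and structure, not the merged Ozone code): container-level errors such as a container not found in SCM exclude the container for every source, while replica-related intermittent failures only re-queue the source DN:

    import java.util.HashSet;
    import java.util.Set;

    class MoveFailureHandlingSketch {
      enum FailureKind { CONTAINER_NOT_FOUND, REPLICA_RELATED }

      private final Set<Long> excludedContainers = new HashSet<>();
      private final Set<String> candidateSources = new HashSet<>();

      void handleMoveFailure(FailureKind kind, String sourceDn, long containerId) {
        if (kind == FailureKind.CONTAINER_NOT_FOUND) {
          // any other source holding this container would hit the same error,
          // so exclude the container for all datanodes
          excludedContainers.add(containerId);
        }
        // in both cases the source DN stays a candidate for further selections
        candidateSources.add(sourceDn);
      }
    }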

Comment on lines 888 to 889
// exclude the container whose replica caused failure of move to avoid error in next run.
selectionCriteria.addToExcludeDueToFailContainers(moveSelection.getContainerID());
Contributor:

I don't think we need to exclude this container as it's a replica related error. Balancer should get the updated list of containers in a node from node manager next time, so it's fine if we don't exclude it.

@Tejaskriya (Author) commented:

I have updated the PR. Thank you for the review! Could you please approve the workflows and merge it if everything looks fine?

@siddhantsangwan (Contributor) commented:

@Tejaskriya please check the failing tests.

@siddhantsangwan siddhantsangwan merged commit fdc38b5 into apache:master Apr 17, 2024
Tejaskriya added a commit to Tejaskriya/ozone that referenced this pull request Apr 17, 2024
jojochuang pushed a commit to jojochuang/ozone that referenced this pull request May 29, 2024
…balancing in Container Balancer (apache#6305)

(cherry picked from commit fdc38b5)
ptlrs pushed a commit to ptlrs/ozone that referenced this pull request Mar 8, 2025
…DDS-10870 (apache#221)

* CDPD-65217. HDDS-10085. Improve method name in ContainerBalancerSelectionCriteria (apache#5957)

(cherry picked from commit b932e16)

* CDPD-65645. HDDS-10160. Cache sort results in ContainerBalancerSelectionCriteria (apache#6050)

(cherry picked from commit 95666eb)

* CDPD-52140. HDDS-7252. Polled source Datanodes are wrongly not re-considered for balancing in Container Balancer (apache#6305)

(cherry picked from commit fdc38b5)

* CDPD-70112. HDDS-10869. SCMNodeManager#getUsageInfo memory occupancy optimization (apache#6737)

(cherry picked from commit 39baf0f)

* CDPD-70110. HDDS-10871. ContainerBalancerSelectionCriteria memory occupancy optimization (apache#6738)

(cherry picked from commit 19d4419)

* CDPD-70111. HDDS-10870. moveSelectionToFutureMap cleanup when future complete (apache#6746)

(cherry picked from commit 4f02853)

* CDPD-78919. HDDS-12231. Logging in Container Balancer is too verbose. (apache#7826)

(cherry picked from commit 371792f)

---------

Conflicts:

1. Forced to include CDPD-65217 (minor commit that only changes a method name) and CDPD-52140 to resolve conflicts. No major changes made.
2. Had to change a test method to use the full Assertions.assertEquals instead of just assertEquals.
3. CDPD-70111 - had to merge manually to exclude changes from other unrelated commits in ContainerBalancerTask.


---------

Co-authored-by: Siddhant Sangwan <[email protected]>
Co-authored-by: Symious <[email protected]>
Co-authored-by: Tejaskriya <[email protected]>
Co-authored-by: hao guo <[email protected]>