
Conversation

@Tejaskriya (Contributor) commented Mar 1, 2024

What changes were proposed in this pull request?

The FindSourceGreedy class maintains a priority queue of source datanodes and is used to get the next source DN for balancing. Under the current design, a datanode can be involved in multiple source->target pairings within one iteration of the container balancer.
When FindSourceGreedy#getNextCandidateSourceDataNode() is called during an iteration, it polls a DN from the queue. This DN is re-queued when FindSourceGreedy#increaseSizeLeaving is called. The problem is that if ContainerBalancer#moveContainer returns false, increaseSizeLeaving is never called and the DN does not get added back.
If the move operation fails due to errors related to the container selected in that run, we should add the DN back to the queue so that the source DN can still be considered in the next run.
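To make the queueing behaviour concrete, here is a minimal, hypothetical sketch (simplified names and types, not the actual Ozone implementation) of how polling and re-queueing interact in a FindSourceGreedy-style strategy, and where the new addBackSourceDataNode call fits:

    // Minimal sketch of the behaviour described above; simplified types,
    // not the real Ozone classes.
    import java.util.Comparator;
    import java.util.PriorityQueue;
    import java.util.Queue;

    class FindSourceGreedySketch {
      // candidate source datanodes, ordered by the strategy's comparator
      private final Queue<String> potentialSources =
          new PriorityQueue<>(Comparator.naturalOrder());

      String getNextCandidateSourceDataNode() {
        // polling removes the DN from the queue
        return potentialSources.poll();
      }

      void increaseSizeLeaving(String datanode, long size) {
        // ...update the size-leaving accounting for this DN...
        // the DN is re-queued only here, i.e. on the successful-move path
        potentialSources.add(datanode);
      }

      void addBackSourceDataNode(String datanode) {
        // the fix: explicitly re-queue the DN when a move fails for
        // container-related reasons, so it remains a balancing candidate
        potentialSources.add(datanode);
      }
    }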

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-7252

How was this patch tested?

Added a unit test in TestContainerBalancerTask.

@siddhantsangwan (Contributor) left a comment:

I have a comment, otherwise looks good. Let's also add some unit tests.

Comment on lines 863 to 870
// add source back if move failed due to container related errors
if (result == MoveManager.MoveResult.REPLICATION_FAIL_NOT_EXIST_IN_SOURCE ||
result == MoveManager.MoveResult.REPLICATION_FAIL_EXIST_IN_TARGET ||
result == MoveManager.MoveResult.REPLICATION_FAIL_CONTAINER_NOT_CLOSED ||
result == MoveManager.MoveResult.REPLICATION_FAIL_INFLIGHT_DELETION ||
result == MoveManager.MoveResult.REPLICATION_FAIL_INFLIGHT_REPLICATION) {
findSourceStrategy.addBackSourceDataNode(source);
}
@siddhantsangwan (Contributor) commented Mar 6, 2024:
I don't think we need to add the source back here, because it would already have been added back in

    if (moveContainer(source, moveSelection)) {
      // consider move successful for now, and update selection criteria
      updateTargetsAndSelectionCriteria(moveSelection, source);
    }

when moveContainer returns true.

Contributor:

Since whenComplete is processed after the move work, which is async, I'm not sure this will help with the moveSelection; the selection part might already have finished.

@Tejaskriya (Author):

In my understanding, if the move completes within one iteration of the containerBalancer, wouldn't this change help in the next run of the infinite loop by adding the source back? What I mean is:

ContainerBalancer # doIteration() {
  ...  //this logic runs for each iteration
  while(true) {
    ...
    // in one run of this loop, the source is selected and the move is scheduled
    // in some later run, the previous move completes with a failed result,
    //         so with this PR's changes, the source is added back
    // and can now be considered for source selection next time
    ...
  }
  ...
}

If we look at doIteration -> processMoveSelection -> updateTargetsAndSelectionCriteria, the same is done when the move is successful.
This change simply does the same when the move fails.

Contributor:

@Tejaskriya After scheduling the move here:

      if (replicationManager.getConfig().isLegacyEnabled()) {
        future = replicationManager
            .move(containerID, source, moveSelection.getTargetNode());
      } else {
        future = moveManager.move(containerID, source,
            moveSelection.getTargetNode());
      }

this method will return true here if the move hasn't failed/completed yet:

      moveSelectionToFutureMap.put(moveSelection, future);
      return true;

Then, updateTargetsAndSelectionCriteria will eventually call findSourceStrategy.increaseSizeLeaving(source, size), which will add the source datanode back so it can be considered in further move selections. So we don't need to add the source back in the asynchronous whenComplete handling of the future.

Contributor:

Actually, I'm rethinking my comment. We might need to add back the source in the whenComplete logic as well. I'll get back here once I've understood what we can do.

Contributor:

We need to handle the case where the scheduled move completes with a MoveResult other than COMPLETED before the if...else code of if (future.isDone()) gets executed. In that case, the line return result == MoveManager.MoveResult.COMPLETED; will evaluate to false and the source datanode will not get added back. However, instead of handling this case in the whenComplete block, I suggest handling it this way:

    if (future.isDone()) {
      if (future.isCompletedExceptionally()) {
        return false;
      } else {
        MoveManager.MoveResult result = future.join();
        moveSelectionToFutureMap.put(moveSelection, future);
        if (result == MoveManager.MoveResult.REPLICATION_FAIL_NOT_EXIST_IN_SOURCE ||
            result == MoveManager.MoveResult.REPLICATION_FAIL_EXIST_IN_TARGET ||
            result == MoveManager.MoveResult.REPLICATION_FAIL_CONTAINER_NOT_CLOSED ||
            result == MoveManager.MoveResult.REPLICATION_FAIL_INFLIGHT_DELETION ||
            result == MoveManager.MoveResult.REPLICATION_FAIL_INFLIGHT_REPLICATION) {
          findSourceStrategy.addBackSourceDataNode(source);
        }
        return result == MoveManager.MoveResult.COMPLETED;
      }
    } else {
      moveSelectionToFutureMap.put(moveSelection, future);
      return true;
    }

This will achieve the same result and I think it is a better approach, mainly because of thread-safety issues. Any code in the whenComplete block can be called from a thread other than the balancing thread, which means we would need to worry about thread-safety problems such as concurrent access to the priority queue that findSourceStrategy.addBackSourceDataNode(source) modifies. We're better off avoiding this to prevent multithreading issues.
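To illustrate the thread-safety concern, here is a small, self-contained demo (not Ozone code) showing that a whenComplete callback registered on a not-yet-finished future runs on the thread that completes it, so any shared, non-thread-safe structure it touches (such as the source priority queue) would be accessed concurrently with the balancing thread:

    import java.util.concurrent.CompletableFuture;

    public class WhenCompleteThreadDemo {
      public static void main(String[] args) throws Exception {
        CompletableFuture<String> future = CompletableFuture.supplyAsync(() -> {
          try {
            Thread.sleep(100);
          } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
          }
          return "COMPLETED";
        });

        future.whenComplete((result, error) ->
            // typically a ForkJoinPool worker thread, not "main"; anything
            // mutated here needs to be safe against concurrent access
            System.out.println("callback thread: " + Thread.currentThread().getName()));

        System.out.println("balancing thread: " + Thread.currentThread().getName());
        future.join();
        Thread.sleep(50); // give the callback a moment to print
      }
    }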

@Tejaskriya (Author):

I have followed this method to handle the checks. Thank you for the suggestion. Could you please review the recent patch?

@siddhantsangwan (Contributor) commented:

@Tejaskriya We were discussing how to test this change earlier. I came up with a bit of a hack to test it.

  @Test
  public void testSourceDatanodeAddedBack()
      throws NodeNotFoundException, IOException, IllegalContainerBalancerStateException,
      InvalidContainerBalancerConfigurationException, TimeoutException, InterruptedException {

    when(moveManager.move(any(ContainerID.class),
        any(DatanodeDetails.class),
        any(DatanodeDetails.class)))
        .thenReturn(CompletableFuture.completedFuture(MoveManager.MoveResult.REPLICATION_FAIL_NOT_EXIST_IN_SOURCE))
        .thenReturn(CompletableFuture.completedFuture(MoveManager.MoveResult.COMPLETED));
    balancerConfiguration.setThreshold(10);
    balancerConfiguration.setIterations(1);
    balancerConfiguration.setMaxSizeEnteringTarget(10 * STORAGE_UNIT);
    balancerConfiguration.setMaxSizeToMovePerIteration(100 * STORAGE_UNIT);
    balancerConfiguration.setMaxDatanodesPercentageToInvolvePerIteration(100);
    String includeNodes = nodesInCluster.get(0).getDatanodeDetails().getHostName() + "," +
        nodesInCluster.get(nodesInCluster.size() - 1).getDatanodeDetails().getHostName();
    balancerConfiguration.setIncludeNodes(includeNodes);

    startBalancer(balancerConfiguration);
    GenericTestUtils.waitFor(() -> ContainerBalancerTask.IterationResult.ITERATION_COMPLETED ==
        containerBalancerTask.getIterationResult(), 10, 50);

    assertEquals(2, containerBalancerTask.getCountDatanodesInvolvedPerIteration());
    assertTrue(containerBalancerTask.getMetrics().getNumContainerMovesCompletedInLatestIteration() >= 1);
    assertThat(containerBalancerTask.getMetrics().getNumContainerMovesFailed()).isEqualTo(1);
    stopBalancer();
  }

It uses the includeNodes configuration to ensure only one under-utilized and one over-utilized node is involved in balancing. This way we're able to test whether the same source was or wasn't added back. What do you think?

@Tejaskriya (Author) commented:

@siddhantsangwan Thank you for finding the hack to test it! It works; I have added it to the unit tests.

@siddhantsangwan @symious Could you please review this updated PR? I have incorporated the suggested changes.

@Tejaskriya Tejaskriya marked this pull request as ready for review March 26, 2024 08:15
@siddhantsangwan (Contributor) left a comment:

Thanks for the update, I have a few more comments below.

@Tejaskriya (Author) commented:

@siddhantsangwan Thank you for the review! I have made the changes suggested by you and addressed the remaining comments with appropriate code changes. Could you please take another look at the PR now?

@sumitagrawl (Contributor) left a comment:

@Tejaskriya Thanks for working on this, I have a minor comment.

"starting a container move", containerID, e);
// add source back to queue as a different container can be selected in next run.
findSourceStrategy.addBackSourceDataNode(source);
selectionCriteria.addToExcludeDueToFailContainers(moveSelection.getContainerID());
Contributor:

What about the same container on another source? Does that also need to keep being excluded? I think we need to reconsider this container exclusion to be at the source level only.

Contributor:

I missed this comment - @sumitagrawl do you want to take another look? I think @Tejaskriya has updated the PR now.

@Tejaskriya (Author) commented Apr 5, 2024:

Thanks for the review! I have created a Map of DN and containers to be excluded for that DN. Could you please take a look at the recent changes?
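A rough sketch of the shape this per-source exclusion could take (hypothetical names, with the datanode keyed here as a plain String; not the exact PR code):

    import java.util.HashMap;
    import java.util.HashSet;
    import java.util.Map;
    import java.util.Set;

    class PerSourceExclusions {
      // containers excluded per source datanode (keyed by datanode UUID)
      private final Map<String, Set<Long>> excludedPerSource = new HashMap<>();

      void excludeForSource(String datanodeUuid, long containerId) {
        excludedPerSource
            .computeIfAbsent(datanodeUuid, k -> new HashSet<>())
            .add(containerId);
      }

      boolean isExcluded(String datanodeUuid, long containerId) {
        return excludedPerSource
            .getOrDefault(datanodeUuid, Set.of())
            .contains(containerId);
      }
    }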

Contributor:

> I think we need to reconsider this container exclusion to be at the source level only.

What's your reasoning behind this? If container manager can't find this container I think we should avoid this container for any DN.

Contributor:

If 2 different sources have the same container chosen to be moved to a target to reduce usage at the source:
s1: c5 --> t5
s2: c5 --> t6
Now s1 had some failure for c5, so this will avoid choosing container c5 for s1 (as expected). But the old logic where this comment was given would also prevent c5 from being chosen for another source, s2.

@siddhantsangwan (Contributor) commented Apr 10, 2024:

When a container is not found in the SCM, the problem is likely outside of the balancer and will affect any other sources trying to move the same container later. So I think we should exclude the container for all sources when it's a ContainerInfo-related error like in the above case.

@siddhantsangwan (Contributor) left a comment:

LGTM, pending green CI.

@sumitagrawl (Contributor) left a comment:

LGTM

@siddhantsangwan (Contributor) commented:

Holding off on merging while we're still discussing a point above.

@Tejaskriya (Author) commented:

@siddhantsangwan @sumitagrawl following our discussion today, I have reverted the latest commit. Now the container is excluded for all DNs, since a ContainerNotFoundException will come up for every DN that the container is present on. For replica-related intermittent failures, the container is not excluded.
I have also added some comments explaining why we add a source DN back and why we exclude containers. The merge conflicts are resolved.
Could you please give it a final review?
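A rough, self-contained illustration of the split described above (assumed names and structure, not the merged Ozone code): container-level errors such as a container not found in SCM exclude the container for every source, while replica-related intermittent failures only re-queue the source DN:

    import java.util.HashSet;
    import java.util.Set;

    class MoveFailureHandlingSketch {
      enum FailureKind { CONTAINER_NOT_FOUND, REPLICA_RELATED }

      private final Set<Long> excludedContainers = new HashSet<>();
      private final Set<String> candidateSources = new HashSet<>();

      void handleMoveFailure(FailureKind kind, String sourceDn, long containerId) {
        if (kind == FailureKind.CONTAINER_NOT_FOUND) {
          // any other source holding this container would hit the same error,
          // so exclude the container for all datanodes
          excludedContainers.add(containerId);
        }
        // in both cases the source DN stays a candidate for further selections
        candidateSources.add(sourceDn);
      }
    }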

Comment on lines 888 to 889
// exclude the container whose replica caused failure of move to avoid error in next run.
selectionCriteria.addToExcludeDueToFailContainers(moveSelection.getContainerID());
Contributor:

I don't think we need to exclude this container as it's a replica related error. Balancer should get the updated list of containers in a node from node manager next time, so it's fine if we don't exclude it.

@Tejaskriya (Author) commented:

I have updated the PR. Thank you for the review! Could you please approve the workflows and merge it if everything looks fine?

@siddhantsangwan (Contributor) commented:

@Tejaskriya please check the failing tests.

@siddhantsangwan siddhantsangwan merged commit fdc38b5 into apache:master Apr 17, 2024
Tejaskriya added a commit to Tejaskriya/ozone that referenced this pull request Apr 17, 2024
jojochuang pushed a commit to jojochuang/ozone that referenced this pull request May 29, 2024
…balancing in Container Balancer (apache#6305)

(cherry picked from commit fdc38b5)
ptlrs pushed a commit to ptlrs/ozone that referenced this pull request Mar 8, 2025
…DDS-10870 (apache#221)

* CDPD-65217. HDDS-10085. Improve method name in ContainerBalancerSelectionCriteria (apache#5957)

(cherry picked from commit b932e16)

* CDPD-65645. HDDS-10160. Cache sort results in ContainerBalancerSelectionCriteria (apache#6050)

(cherry picked from commit 95666eb)

* CDPD-52140. HDDS-7252. Polled source Datanodes are wrongly not re-considered for balancing in Container Balancer (apache#6305)

(cherry picked from commit fdc38b5)

* CDPD-70112. HDDS-10869. SCMNodeManager#getUsageInfo memory occupancy optimization (apache#6737)

(cherry picked from commit 39baf0f)

* CDPD-70110. HDDS-10871. ContainerBalancerSelectionCriteria memory occupancy optimization (apache#6738)

(cherry picked from commit 19d4419)

* CDPD-70111. HDDS-10870. moveSelectionToFutureMap cleanup when future complete (apache#6746)

(cherry picked from commit 4f02853)

* CDPD-78919. HDDS-12231. Logging in Container Balancer is too verbose. (apache#7826)

(cherry picked from commit 371792f)

---------

Conflicts:

1. Forced to include CDPD-65217 (minor commit that only changes a method name) and CDPD-52140 to resolve conflicts. No major changes made.
2. Had to change a test method to use the full Assertions.assertEquals instead of just assertEquals.
3. CDPD-70111 - had to merge manually to exclude changes from other unrelated commits in ContainerBalancerTask.


---------

Co-authored-by: Siddhant Sangwan <[email protected]>
Co-authored-by: Symious <[email protected]>
Co-authored-by: Tejaskriya <[email protected]>
Co-authored-by: hao guo <[email protected]>