
@JacksonYao287
Contributor

What changes were proposed in this pull request?

Always choose the nearest candidate as the target, according to the network topology.

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-5602

How was this patch tested?

Unit tests.

@JacksonYao287
Contributor Author

@lokeshj1703 @siddhantsangwan PTAL, thanks!

Contributor

@lokeshj1703 lokeshj1703 left a comment

@JacksonYao287 Thanks for working on this! The changes look good to me.
Can we add a unit test, if it is not too much effort?

Contributor

@siddhantsangwan siddhantsangwan left a comment

@JacksonYao287 It might be better to create another implementation of FindTargetStrategy that sorts potential targets by distance. FindTargetGreedy is working on a list of potential targets sorted from least utilized to most utilized.
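For reference, the greedy first-fit behaviour over a utilization-ordered set can be sketched as below. This is a minimal illustration, not Ozone's FindTargetGreedy: `Node`, `BY_UTILIZATION` and the eligibility predicate are hypothetical stand-ins for DatanodeUsageInfo and the balancer's real checks, and the nested record needs Java 16 or later.

```java
import java.util.Comparator;
import java.util.Optional;
import java.util.TreeSet;
import java.util.function.Predicate;

public class GreedyTargetSketch {

  // Hypothetical stand-in for DatanodeUsageInfo: node id plus used/capacity bytes.
  record Node(String id, long used, long capacity) {
    double utilization() {
      return (double) used / capacity;
    }
  }

  // Least utilized first; the id breaks ties so that two distinct nodes never
  // compare as equal (a TreeSet silently drops an element equal to another).
  static final Comparator<Node> BY_UTILIZATION =
      Comparator.comparingDouble(Node::utilization).thenComparing(Node::id);

  // Greedy first fit: walk candidates from least to most utilized and return
  // the first one that passes the (hypothetical) eligibility check.
  static Optional<Node> findTarget(TreeSet<Node> candidates, Predicate<Node> eligible) {
    for (Node n : candidates) {
      if (eligible.test(n)) {
        return Optional.of(n);
      }
    }
    return Optional.empty();
  }

  public static void main(String[] args) {
    TreeSet<Node> candidates = new TreeSet<>(BY_UTILIZATION);
    candidates.add(new Node("dn1", 70, 100));
    candidates.add(new Node("dn2", 20, 100));
    candidates.add(new Node("dn3", 40, 100));

    // Pretend dn2 is ineligible, e.g. it already holds a replica of the container.
    Optional<Node> target = findTarget(candidates, n -> !n.id().equals("dn2"));
    System.out.println(target.map(Node::id).orElse("none")); // prints "dn3"
  }
}
```

A distance-based strategy would differ only in how the candidates are ordered before this walk, which is why a separate implementation (or a swappable ordering) was suggested.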

@JacksonYao287
Contributor Author

@siddhantsangwan Thanks for the review! What about adding a boolean parameter to findTargetForContainerMove that indicates whether to sort the potential targets by network topology? This could then be configured by the user.

@JacksonYao287
Contributor Author

@siddhantsangwan @lokeshj1703 I have refactored the code, please take a look!

@lokeshj1703
Contributor

I would recommend using an interface. @JacksonYao287 @siddhantsangwan Let's first finalise what the interface should look like and how to integrate it with the balancer. Please also consider whether we would like to use the existing selection criteria/strategy.

@JacksonYao287
Contributor Author

JacksonYao287 commented Nov 25, 2021

@lokeshj1703 @siddhantsangwan I have refactored the code again in the latest commit, please take a look!
If it looks good to you, I will then add the network-topology-related logic to this patch.
When selecting a target for a source, if the target meets all other conditions, I think it is better to make network topology distance the first criterion, and storage usage the second.

@JacksonYao287 JacksonYao287 force-pushed the HDDS-5602 branch 4 times, most recently from 335c927 to babfd61 on November 30, 2021 08:26
Contributor

@lokeshj1703 lokeshj1703 left a comment

@JacksonYao287 Thanks for working on this! I think it is better to have a separate ContainerBalancerSelectionCriteria, as it gives us the flexibility later to account for both source and target limitations in the criteria. With the current changes, it would become part of FindSourceStrategy itself.

Contributor

Unrelated to this PR: since we are doing a poll here, won't the source datanode be left out of consideration for the rest of the iteration?

Contributor

Can you please check if this is an issue? We can create a new jira to address it if it is.

Contributor Author

This is not an issue. It came from my attempt to refactor findTargetStrategy and remove ContainerBalancerSelectionCriteria. I have cancelled that refactor, so getNextCandidateSourceDataNode is public now.

Contributor

The issue is that we are doing a poll in potentialSources. As a result source datanode will be removed.

Contributor

I think this source is added back here, and that method is called here. It seems correct but the logic is a bit confusing (#2808 (comment))

Contributor Author

@lokeshj1703 @siddhantsangwan

  private void incSizeSelectedForMoving(DatanodeDetails source,
                                        ContainerMoveSelection moveSelection) {
    // ...
    // update sizeLeavingNode map with the recent moveSelection
    findSourceStrategy.increaseSizeLeaving(source, size);

    // update sizeEnteringNode map with the recent moveSelection
    findTargetStrategy.increaseSizeEntering(target, size);
  }
 FindSourceGreedy#increaseSizeLeaving

  public void increaseSizeLeaving(DatanodeDetails dui, long size) {
    Long currentSize = sizeLeavingNode.get(dui);
    if(currentSize != null) {
      sizeLeavingNode.put(dui, currentSize + size);
      //reorder according to the latest sizeLeavingNode
      potentialSources.add(nodeManager.getUsageInfo(dui));
      return;
    }
    LOG.warn("Cannot find datanode {} in candidate source datanodes",
        dui.getUuid());
  }

If we can find a target and a container for this source datanode, then incSizeSelectedForMoving will definitely be called, so this source datanode will be added back to the priority queue and re-sorted.

If we cannot find any target and container for this source datanode, what we should do is simply remove it.

So I don't think this is an issue. Does this explanation make sense to you?
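The poll-then-re-add pattern described above can be sketched with a plain java.util.PriorityQueue. This is a simplified model, not the balancer code: the maps, `addSource` and `schedule` are hypothetical, and "the source still has a container in `movable`" stands in for "a target and a container were found". A PriorityQueue does not reorder an element whose key changes while it is inside the queue, so the source is polled, its leaving size is updated, and only then is it re-added.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

public class SourceQueueSketch {

  // Hypothetical bookkeeping, keyed by datanode id.
  final Map<String, Long> used = new HashMap<>();           // bytes used on each node
  final Map<String, Long> sizeLeaving = new HashMap<>();    // bytes already scheduled to leave
  final Map<String, Deque<Long>> movable = new HashMap<>(); // container sizes still movable

  // Most loaded source first, counting bytes already scheduled to leave;
  // the node id breaks ties deterministically.
  final PriorityQueue<String> potentialSources = new PriorityQueue<>(
      Comparator.comparingLong((String dn) -> used.get(dn) - sizeLeaving.get(dn))
          .reversed()
          .thenComparing(Comparator.naturalOrder()));

  void addSource(String dn, long usedBytes, Long... containerSizes) {
    used.put(dn, usedBytes);
    sizeLeaving.put(dn, 0L);
    movable.put(dn, new ArrayDeque<>(List.of(containerSizes)));
    potentialSources.add(dn);
  }

  // Plays the role of increaseSizeLeaving: update the map first, then re-add
  // the node so the heap places it according to the new value.
  void increaseSizeLeaving(String dn, long size) {
    sizeLeaving.merge(dn, size, Long::sum);
    potentialSources.add(dn);
  }

  // Poll the best source; if a move can be scheduled it is re-added,
  // otherwise it simply stays removed.
  List<String> schedule() {
    List<String> moves = new ArrayList<>();
    while (!potentialSources.isEmpty()) {
      String source = potentialSources.poll();
      Long size = movable.get(source).poll(); // null when nothing is left to move
      if (size != null) {
        moves.add(source);
        increaseSizeLeaving(source, size);
      }
    }
    return moves;
  }

  public static void main(String[] args) {
    SourceQueueSketch s = new SourceQueueSketch();
    s.addSource("dn1", 90, 20L, 20L);
    s.addSource("dn2", 80, 10L);
    System.out.println(s.schedule()); // prints [dn1, dn2, dn1]
  }
}
```

Here dn1 is picked first, falls behind dn2 after its first scheduled move, and is picked again once both are tied at 70; each source is dropped for good once it has nothing left to move.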

@JacksonYao287
Contributor Author

JacksonYao287 commented Dec 1, 2021

Thanks very much for the review, @lokeshj1703!

I think it is better to have a separated ContainerBalancerSelectionCriteria as it gives flexibility later to account both source and target limitations in the criteria.

For now, the only role of ContainerBalancerSelectionCriteria is to get candidate containers, but I agree with you: in the future we may need to do more flexible work in ContainerBalancerSelectionCriteria. So let us keep it for now, and I will cancel this refactor.

So in this patch, I will only do the network-topology-related work. I want to hear your opinion: should we sort candidate targets by network topology first, or make it configurable?
If we sort candidate targets by network topology first, we will use a plain list instead of the current TreeSet to hold all the candidate targets, and we will sort all the candidate targets for each source datanode with a comparator, for example:

List<DatanodeUsageInfo> potentialTargets;
DatanodeUsageInfo source;
NetworkTopology nt = scm.getClusterMap();
// sort the candidate targets separately for each source
potentialTargets.sort((a, b) -> {
  // sort by network topology distance first; getDistanceCost takes nodes,
  // so pass the DatanodeDetails rather than the usage info
  int distanceToA = nt.getDistanceCost(source.getDatanodeDetails(),
      a.getDatanodeDetails());
  int distanceToB = nt.getDistanceCost(source.getDatanodeDetails(),
      b.getDatanodeDetails());
  int ret = Integer.compare(distanceToA, distanceToB);
  if (ret != 0) {
    return ret;
  }
  // if the network distances are equal, sort by usage
  double currentUsageOfA = a.calculateUtilization(
      sizeEnteringNode.get(a.getDatanodeDetails()));
  double currentUsageOfB = b.calculateUtilization(
      sizeEnteringNode.get(b.getDatanodeDetails()));
  ret = Double.compare(currentUsageOfA, currentUsageOfB);
  if (ret != 0) {
    return ret;
  }
  // finally break ties by UUID so the order is deterministic
  UUID uuidA = a.getDatanodeDetails().getUuid();
  UUID uuidB = b.getDatanodeDetails().getUuid();
  return uuidA.compareTo(uuidB);
});
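To give a feel for the first sort key, here is a simplified hop-count distance between two leaves of a tree topology, with network locations written as paths such as `/dc1/rack1/dn1`. This is only an approximation for illustration: `distance` is a hypothetical helper, and Ozone's NetworkTopology#getDistanceCost may weight levels differently according to the configured topology schema.

```java
public class TopologyDistanceSketch {

  // Simplified distance between two leaves of a tree topology: the number of
  // edges from each node up to their lowest common ancestor, summed.
  // Paths are assumed to look like "/dc1/rack1/dn1".
  static int distance(String pathA, String pathB) {
    String[] a = pathA.substring(1).split("/");
    String[] b = pathB.substring(1).split("/");
    int common = 0;
    while (common < a.length && common < b.length && a[common].equals(b[common])) {
      common++;
    }
    return (a.length - common) + (b.length - common);
  }

  public static void main(String[] args) {
    System.out.println(distance("/dc1/rack1/dn1", "/dc1/rack1/dn1")); // 0: same node
    System.out.println(distance("/dc1/rack1/dn1", "/dc1/rack1/dn2")); // 2: same rack
    System.out.println(distance("/dc1/rack1/dn1", "/dc1/rack2/dn3")); // 4: different rack
  }
}
```

With a key like this, every same-rack candidate sorts ahead of every off-rack candidate, and utilization only matters among candidates at equal distance.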

@lokeshj1703
Contributor

I think configurable is better. It would be preferable to choose most under utilised nodes first in cases like addition of new nodes or highly unbalanced cluster.
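Making the ordering configurable can be sketched by composing java.util.Comparator instances behind a boolean flag. This is a hypothetical sketch: `Candidate`, `ordering` and the distance function are illustrative names, not the configuration key or classes the patch actually added.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.ToIntFunction;

public class TargetOrderingSketch {

  // Hypothetical candidate: id and current utilization (0.0 to 1.0).
  record Candidate(String id, double utilization) {}

  // Builds the ordering for one source. With sortByNetwork == false only
  // utilization (then id) is used, favouring the most under-utilized nodes.
  // With sortByNetwork == true, network distance to the source comes first
  // and utilization only breaks distance ties.
  static Comparator<Candidate> ordering(boolean sortByNetwork,
                                        ToIntFunction<Candidate> distanceToSource) {
    Comparator<Candidate> byUsage =
        Comparator.comparingDouble(Candidate::utilization)
            .thenComparing(Candidate::id);
    return sortByNetwork
        ? Comparator.comparingInt(distanceToSource).thenComparing(byUsage)
        : byUsage;
  }

  public static void main(String[] args) {
    List<Candidate> targets = new ArrayList<>(List.of(
        new Candidate("far-empty", 0.10),
        new Candidate("near-busy", 0.60),
        new Candidate("near-idle", 0.30)));
    // Hypothetical distances: the "near-" nodes share the source's rack.
    ToIntFunction<Candidate> dist = c -> c.id().startsWith("near") ? 2 : 4;

    targets.sort(ordering(false, dist));
    System.out.println(targets.get(0).id()); // prints "far-empty"
    targets.sort(ordering(true, dist));
    System.out.println(targets.get(0).id()); // prints "near-idle"
  }
}
```

With the flag off, the emptiest node wins even though it is farther away, which suits newly added nodes or a highly unbalanced cluster; with the flag on, the nearer nodes win and utilization only decides between them.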

@JacksonYao287 JacksonYao287 changed the title HDDS-5602. always choose the nearest one as the target in the candidates according to networkTopology HDDS-5602. make it configurable to choose the nearest one as the target in the candidates according to networkTopology Dec 4, 2021
@JacksonYao287 JacksonYao287 force-pushed the HDDS-5602 branch 2 times, most recently from 4ec207a to ebf730f on December 8, 2021 03:32
@JacksonYao287
Contributor Author

@lokeshj1703 I have updated this patch according to the comments, PTAL!

@JacksonYao287 JacksonYao287 force-pushed the HDDS-5602 branch 2 times, most recently from 80b3eec to 234d764 on December 8, 2021 11:28
Contributor

@lokeshj1703 lokeshj1703 left a comment

@JacksonYao287 Thanks for working on this! The changes look good to me. Please find my comments below.

Contributor

Sorting would happen every time the function is called. I think we can optimise this for the same source.

Contributor Author

Thanks, I will do this optimization.

Contributor Author

After thinking it through, I believe we have to sort potentialTargets every time, even for the same source. After a target is selected and a move is scheduled to it, its sizeEntering increases, and so does its utilization. So when choosing a target again, even for the same source, if two candidate targets have the same network topology distance to the source, their priority may change according to the current usage info.

Contributor

We can remove the target and re-add it I think. But we can do this in a separate PR.
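The remove-and-re-add alternative can be sketched for a TreeSet ordered by utilization. It assumes a utilization-only ordering (a distance-first ordering depends on the source, so a single shared set cannot be sorted for every source at once); the maps, `addTarget` and `increaseSizeEntering` are hypothetical simplifications of the balancer's state. The order of operations matters: the element is removed while its old key is still valid, then updated, then re-added.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeSet;

public class TreeSetReinsertSketch {

  // Hypothetical state: fixed capacity, used bytes per node, and bytes already
  // scheduled to enter each node (a stand-in for sizeEnteringNode).
  static final long CAPACITY = 100;
  final Map<String, Long> used = new HashMap<>();
  final Map<String, Long> sizeEntering = new HashMap<>();

  // Least utilized first, counting bytes already scheduled to enter;
  // the id breaks ties so distinct nodes never compare as equal.
  final TreeSet<String> potentialTargets = new TreeSet<>(
      Comparator.comparingDouble(
              (String dn) -> (double) (used.get(dn) + sizeEntering.get(dn)) / CAPACITY)
          .thenComparing(Comparator.naturalOrder()));

  void addTarget(String dn, long usedBytes) {
    used.put(dn, usedBytes);
    sizeEntering.put(dn, 0L);
    potentialTargets.add(dn);
  }

  // A TreeSet only positions an element when it is inserted. Updating
  // sizeEntering first would make remove() search with the new key and
  // possibly miss the element, so remove, update, then re-add.
  void increaseSizeEntering(String dn, long size) {
    potentialTargets.remove(dn);
    sizeEntering.merge(dn, size, Long::sum);
    potentialTargets.add(dn);
  }

  public static void main(String[] args) {
    TreeSetReinsertSketch s = new TreeSetReinsertSketch();
    s.addTarget("dn1", 10);
    s.addTarget("dn2", 20);
    System.out.println(s.potentialTargets.first()); // prints "dn1"
    s.increaseSizeEntering("dn1", 15);               // dn1 is now at 25%
    System.out.println(s.potentialTargets.first()); // prints "dn2"
  }
}
```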

@JacksonYao287
Contributor Author

@lokeshj1703 Thanks for the review! I have updated this patch according to your comments, PTAL!

@JacksonYao287 JacksonYao287 force-pushed the HDDS-5602 branch 3 times, most recently from 6b928e1 to e48a50e on December 18, 2021 15:13
@JacksonYao287
Contributor Author

JacksonYao287 commented Dec 20, 2021

@lokeshj1703 @siddhantsangwan I have added a test case. Could you help review this patch?

Contributor

@lokeshj1703 lokeshj1703 left a comment

@JacksonYao287 Thanks for updating the PR! The changes look good to me. +1.

@lokeshj1703 lokeshj1703 merged commit e01b471 into apache:master Dec 22, 2021
@lokeshj1703
Contributor

@JacksonYao287 Thanks for the contribution! @siddhantsangwan Thanks for review! I have committed the PR to master branch.

@JacksonYao287
Contributor Author

Thanks @lokeshj1703 @siddhantsangwan for the review!

@JacksonYao287 JacksonYao287 deleted the HDDS-5602 branch December 22, 2021 07:56