HDDS-7492. Placement Policy Interface changes to handle misreplication changes #4006
Taking the `replicasToCopy` method to start with, as it is easier. What is passed into this method should be neither over- nor under-replicated; the only potential problem is mis-replication.

For Random and Capacity placement, neither can be mis-replicated, as they don't care about racks.

For RackScatter, the policy says we should spread the replicas across as many racks as possible. If we group the replicas per rack, then any rack with more than maxReplicasPerRack in its list needs to have some replicas copied elsewhere. It doesn't matter which ones we copy, as each replica is unique when there is no over- or under-replication.

For the RackAware placement policy, the policy says the replicas must be on at least 2 racks, so I think we can just say:

expectedRacks = Math.min(Total-Racks-In-Cluster, Math.min(replica-count, 2)) // Handle replica=ONE containers

And then the logic is exactly the same. In the current implementation there seem to be a lot more groupings than the above, and I am not sure if they are needed. Also, I know it was me who mentioned the parameters …
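A rough sketch of the grouping described above (an illustration with hypothetical names, not the PR's actual code, which operates on ContainerReplica objects and network-topology racks):

```java
import java.util.*;

public class MisreplicationSketch {
  // Replicas on any rack beyond maxReplicasPerRack must be copied elsewhere.
  // Which ones we pick does not matter: with no over- or under-replication,
  // every replica on a rack is unique.
  static <R> Set<R> replicasToCopy(Map<String, List<R>> replicasPerRack,
      int maxReplicasPerRack) {
    Set<R> toCopy = new HashSet<>();
    for (List<R> onRack : replicasPerRack.values()) {
      for (int i = maxReplicasPerRack; i < onRack.size(); i++) {
        toCopy.add(onRack.get(i));
      }
    }
    return toCopy;
  }

  // RackAware: replicas must span at least 2 racks, capped by the cluster's
  // rack count; Math.min(replicaCount, 2) handles replication factor ONE.
  static int expectedRacks(int totalRacksInCluster, int replicaCount) {
    return Math.min(totalRacksInCluster, Math.min(replicaCount, 2));
  }

  public static void main(String[] args) {
    Map<String, List<String>> perRack = new HashMap<>();
    perRack.put("rack1", Arrays.asList("r1", "r2", "r3"));
    perRack.put("rack2", Arrays.asList("r4"));
    System.out.println(replicasToCopy(perRack, 2)); // prints [r3]
    System.out.println(expectedRacks(10, 1)); // prints 1
  }
}
```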
I suggested leaving the parameters as they were for now, but the latest commit changed them anyway, and the other comments are not addressed. Let's add only the replicasToCopy method in this PR and save replicasToRemove for another PR, please. We also need to look at the algorithm used - I posted a simplified version above that I think will work, so we should check if that is the case, as it appears to be simpler than the approach in this PR right now.
Did not want the CI to run again. Have to add a test case.
Done with all the test cases. Running CI. The PR can be reviewed.
@sodonnel I have addressed all review comments. Can you check if the PR is good now?
    }

    @Test
    public void testReplicasToFixMisreplication() {
It's hard to see what is tested here. E.g. can I at a glance see if we have a test for something like 3 racks and 5, 3, 1 replicas on each rack? Or one requiring 3 racks but currently having 5, 4 replicas per rack?
Also, can we add some tests for the scenarios I added in my earlier comment today, so we can see if those cases are really failing or not?
Working on splitting the test case.
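The scenarios asked for above could be checked with small targeted cases; a standalone sketch (hypothetical names, not the PR's actual test class):

```java
import java.util.List;

public class MisreplicationScenarios {
  // Excess replicas that must be copied off over-full racks.
  static int excess(List<Integer> rackCounts, int maxReplicasPerRack) {
    return rackCounts.stream()
        .mapToInt(c -> Math.max(c - maxReplicasPerRack, 0)).sum();
  }

  public static void main(String[] args) {
    // 3 racks holding 5, 3 and 1 replicas: 9 replicas over 3 racks allows
    // at most 3 per rack, so 2 replicas must move off the first rack.
    System.out.println(excess(List.of(5, 3, 1), 3)); // prints 2
    // Require 3 racks but have 5 and 4 replicas on 2 racks: 3 replicas
    // must move, which also satisfies the extra-rack requirement.
    System.out.println(excess(List.of(5, 4), 3)); // prints 3
  }
}
```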
If you think it would work, could you try implementing it with my algorithm and see how it looks? I think it will be less LOC and possibly easier to understand.

On 5 Dec 2022, at 22:59, Swaminathan Balachandran wrote:
@swamirishi commented on this pull request.
In hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/SCMCommonPlacementPolicy.java:
        0);
    int replicasPerPlacementGroup =
        getMaxReplicasPerRack(totalNumberOfReplicas);
    Set<ContainerReplica> copyReplicaSet = Sets.newHashSet();

    for (List<ContainerReplica> replicaList : placementGroupReplicaIdMap
        .values()) {
      if (replicaList.size() > replicasPerPlacementGroup) {
        List<ContainerReplica> replicasToBeCopied = replicaList.stream()
            .limit(replicaList.size() - replicasPerPlacementGroup)
            .collect(Collectors.toList());
        copyReplicaSet.addAll(replicasToBeCopied);
        replicaList.removeAll(replicasToBeCopied);
      }
    }
    if (additionalNumberOfRacksRequired > copyReplicaSet.size()) {
The algorithm you are suggesting would also work.
This change looks good to me now. Thanks for sticking with it. There seem to be some compile problems - could you check and fix them? If we get a green build, we can commit. Thanks!
Done.
@sodonnel I had to fix rack scatter policy to support max replicas per rack. Please look at the changes in SCMContainerPlacementRackScatter.java |
    return Math.max(requiredRacks - currentRacks,
        rackReplicaCnts.stream().mapToInt(
            cnt -> Math.max(maxReplicasPerRack - cnt, 0)).sum());
Trying to figure out what this part is doing. Should it be cnt - maxReplicasPerRack instead? @swamirishi @sodonnel
Yeah, you are right.
Thanks for pointing this out. Let me fix this.
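As the discussion concludes, the per-rack term should count the excess (cnt - maxReplicasPerRack), not the shortfall. A sketch of the corrected expression (the variable names mirror the snippet quoted above, but the method wrapper is hypothetical):

```java
import java.util.List;

public class MisreplicationCount {
  // Replicas to relocate: the larger of the rack shortfall and the total
  // excess replicas sitting on over-full racks.
  static int misReplicationCount(int requiredRacks, int currentRacks,
      List<Integer> rackReplicaCnts, int maxReplicasPerRack) {
    return Math.max(requiredRacks - currentRacks,
        rackReplicaCnts.stream().mapToInt(
            cnt -> Math.max(cnt - maxReplicasPerRack, 0)).sum());
  }

  public static void main(String[] args) {
    // Need 3 racks, have 2 holding 5 and 4 replicas, max 3 per rack:
    // excess = (5-3) + (4-3) = 3, shortfall = 1, so 3 replicas must move.
    System.out.println(misReplicationCount(3, 2, List.of(5, 4), 3)); // prints 3
  }
}
```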
What changes were proposed in this pull request?
Placement Policy Interface changes to handle misreplication changes
What is the link to the Apache JIRA?
https://issues.apache.org/jira/browse/HDDS-7492
How was this patch tested?
Unit Tests