HDDS-2459. Change the ReplicationManager to consider decommission and maintenance states #262
Conversation
Force-pushed from 61ea716 to 5a9bc68.
… locally. Will revisit it in a later patch
@sodonnel Consider disabling the failing unit test using @Ignore.
+1. And please create a Jira with information about the failure (log, stdout) and put it to the …
elek left a comment
Thank you very much for this patch @sodonnel . Overall it looks good to me. I tested it and it works as it was proposed in the design doc.
My biggest question is the usage of the NodeManager vs using the state of the containers. AFAIK @anuengineer had a different proposal.
 * and zero indicates the containers has replicationFactor healthy
 * replica
 */
public int additionalReplicaNeeded() {
I agree with @anuengineer, who suggested separating the calculation of the required replicas from the retry logic (the calculation with inflights). It would make the logic more similar to the original proposal and would make it possible to handle inFlightDel / inFlightAdd in different ways (for example, in case of over replication the inFlightAdd can be stopped...)
For example:
public int additionalReplicaNeeded() {
  int missing = calculateMissingReplicas();
  if (missing <= 0) {
    return missing + inFlightDel;
  } else {
    return missing + inFlightDel - inFlightAdd;
  }
}
Where calculateMissingReplicas is the existing additionalReplicaNeeded
I discussed this with @elek offline. I have refactored this code and we believe it is simpler and easier to understand now.
// calculating the new containers needed.
delta = Math.max(0, delta - maintenanceCount);
// Check we have enough healthy replicas
minHealthyForMaintenance = Math.min(repFactor, minHealthyForMaintenance);
This line can be moved to the constructor to simplify the logic here.
True, I missed this. I have made this change.
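For illustration only, a minimal sketch of the kind of change being discussed; the class and field names below are assumptions, not the actual patch:

final class ReplicaLimits {
  final int repFactor;
  final int minHealthyForMaintenance;

  ReplicaLimits(int repFactor, int minHealthyForMaintenance) {
    this.repFactor = repFactor;
    // A container never needs more healthy replicas than its replication
    // factor, so clamp the configured value once here instead of on every
    // replication check.
    this.minHealthyForMaintenance =
        Math.min(repFactor, minHealthyForMaintenance);
  }
}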
 * @return True if the container is sufficiently replicated and False
 * otherwise.
 */
public boolean isSufficientlyReplicated() {
This (and the isOverReplicated function) seems to be the same logic as what we have in additionalReplicaNeeded, just in a simplified version. It can be harder to maintain the logic in two places. Why don't we calculate the number of missing replicas and use that number to decide if we need to handle under/over replication?
There is a subtle difference in these methods in that they only consider inflight deletes, while additionalReplicaNeeded() considers inflight add and delete in some cases.
isSufficientlyReplicated() is intended to be used by the decommission monitor, so it can make a decision on whether a container is correctly replicated to allow decommission or maintenance to proceed. Therefore it assumes an inflight del will complete (worst case) but ignores inflight adds (assuming they will fail, again worst case, until they actually complete).
However, based on the refactor above we can reuse the logic to calculate the missing replicas, and then apply the inflight deletes and simplify these methods. The next commit will have that change.
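A small sketch of that worst-case rule, with names assumed for illustration (the actual code works off the shared missing-replica calculation mentioned above):

final class SufficiencyCheck {
  // Worst case for decommission/maintenance decisions: pending deletes are
  // assumed to succeed, pending adds are ignored until they complete, so
  // only the deletes are added to the missing-replica count.
  static boolean isSufficientlyReplicated(int missingReplicas,
      int inFlightDel) {
    return missingReplicas + inFlightDel <= 0;
  }
}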
/**
 * Used to lookup the health of a nodes or the nodes operational state.
 */
private final NodeManager nodeManager;
@anuengineer / @nandakumar131 had a different proposal offline (and in #160): to manage the state only in the containers. In that case it's not required to call the NodeManager.
I am not sure about the current state of that proposal.
The use of the node manager is not to see the state of the containers, but to check if a node is alive or not before attempting to use it as the source of a replication. E.g. the containers can be decommissioned (which is their end state) but the host can be healthy, stale or dead. We don't want to use a stale or dead node as the source of a replication, hence we need to use the node manager to get that health state. The proposal by @anuengineer / @nandakumar131 is in place here: the logic expects the DNs to change the container state via a container report to decommissioned / maintenance for the rest of the logic to work.
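A hedged sketch of that source-selection rule; the NodeHealth enum and the health lookup below are stand-ins for whatever the real NodeManager call is, not the actual API:

import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

final class SourceSelection {
  enum NodeHealth { HEALTHY, STALE, DEAD }

  // Even a DECOMMISSIONED replica can be a valid copy source, but only if
  // the datanode holding it is still healthy; replicas on stale or dead
  // nodes are dropped from the candidate list.
  static <R> List<R> eligibleSources(List<R> replicas,
      Function<R, NodeHealth> nodeHealthOf) {
    return replicas.stream()
        .filter(r -> nodeHealthOf.apply(r) == NodeHealth.HEALTHY)
        .collect(Collectors.toList());
  }
}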
As far as I understood, the proposal is to update the state of the containers by another component based on the node state, and to use only the container state here (instead of checking the state via the node manager).
I discussed it with @anuengineer. Let's go forward with this approach and later we can improve this part.
public class TestContainerReplicaCount {

  private OzoneConfiguration conf;
  private List<DatanodeDetails> dns;
Might be unused.
Yes, this test class went through a few changes and those variables are no longer used. I have removed them.
  @Test
  public void testThreeHealthyReplica() {
    registerNodes(CLOSED, CLOSED, CLOSED);
Is there any reason to store the state on the class level here? It would be more readable (for me) to store all the required state in local variables:
Set<ContainerReplica> replica = registerNodes(CLOSED, CLOSED, CLOSED);
... rcount = new ContainerReplicaCount(replica, 0, 0, 3, 2);
validate(rcount, true, 0, false);
I have changed this in the most recent commit and all the instance variables are now gone.
I will change this to @Ignore, however I have not been able to find the cause of the problem. It just times out when running via github actions every time, but did not before the switch to GH actions, and it works locally in < 2 seconds every time. I am going to revisit the approach to this test as a wider effort or dig into the failure later. HDDS-2631 raised so I do not forget this.
…lyReplicated() and isOverReplicated() methods
Sure, just link the failing github actions unit test + download the logs and upload them to the jira (if meaningful; in case of a timeout it can be empty). I am not interested in the real root cause, but we need a definition of the problem including assertion errors, exceptions and log output to check it later.
What changes were proposed in this pull request?
In its current form the replication manager does not consider decommission or maintenance states when checking if replicas are sufficiently replicated. With the introduction of maintenance states, it needs to consider decommission and maintenance states when deciding if blocks are over or under replicated.
It also needs to provide an API to allow the decommission manager to check if blocks are over or under replicated, so the decommission manager can decide if a node has completed decommission and maintenance or not.
The key part of this change is a new class called ContainerReplicaCount. The logic to determine whether a container is "sufficiently replicated" or over replicated, and the delta of replicas required, is extracted into this class. This gives a standalone, testable unit which can be used inside the ReplicationManager, and the same object can be returned from the Replication Manager to another class that needs to make replication decisions (i.e. the datanode admin manager).
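As a rough illustration of the shape of such a class, here is a sketch; the field names, maintenance accounting, and in-flight handling below are assumptions for illustration and may differ from the actual patch:

// Sketch only: a standalone, testable holder for one container's replica
// accounting. Maintenance replicas are tracked separately from healthy
// ones so the under/over replication decision can account for them.
final class ContainerReplicaCountSketch {
  private final int healthyCount;      // replicas on healthy, in-service nodes
  private final int maintenanceCount;  // replicas on nodes in maintenance
  private final int inFlightAdd;
  private final int inFlightDel;
  private final int repFactor;
  private final int minHealthyForMaintenance;

  ContainerReplicaCountSketch(int healthyCount, int maintenanceCount,
      int inFlightAdd, int inFlightDel, int repFactor,
      int minHealthyForMaintenance) {
    this.healthyCount = healthyCount;
    this.maintenanceCount = maintenanceCount;
    this.inFlightAdd = inFlightAdd;
    this.inFlightDel = inFlightDel;
    this.repFactor = repFactor;
    // A container never needs more healthy replicas than its factor.
    this.minHealthyForMaintenance =
        Math.min(repFactor, minHealthyForMaintenance);
  }

  /** Positive: under replicated, negative: over replicated, zero: exact. */
  int missingReplicas() {
    // Maintenance replicas count towards the factor; they are expected back.
    int delta = repFactor - healthyCount - maintenanceCount;
    if (maintenanceCount > 0) {
      // ...but never let the healthy copies drop below the configured minimum.
      delta = Math.max(delta, minHealthyForMaintenance - healthyCount);
    }
    return delta;
  }

  /** Extra replicas the ReplicationManager should schedule now. */
  int additionalReplicaNeeded() {
    int missing = missingReplicas();
    // Pending deletes always count against us; pending adds are only
    // credited while the container is under replicated.
    return missing <= 0 ? missing + inFlightDel
        : missing + inFlightDel - inFlightAdd;
  }

  /** Worst-case view for the decommission monitor: ignores pending adds. */
  boolean isSufficientlyReplicated() {
    return missingReplicas() + inFlightDel <= 0;
  }
}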
What is the link to the Apache JIRA?
https://issues.apache.org/jira/browse/HDDS-2459
How was this patch tested?
Additional unit tests have been added to test the new functionality.