HDDS-2459. Change the ReplicationManager to consider decommission and maintenance states #262
Conversation
Force-pushed from 61ea716 to 5a9bc68.
… locally. Will revisit it in a later patch
@sodonnel Consider disabling the failing unit test using @Ignore.
+1. And please create a Jira with information about the failure (log, stdout) and put it to the …
elek left a comment
Thank you very much for this patch @sodonnel . Overall it looks good to me. I tested it and it works as it was proposed in the design doc.
My biggest question is the usage of the NodeManager vs using the state of the containers. AFAIK @anuengineer had a different proposal.
 * and zero indicates the containers has replicationFactor healthy
 * replica
 */
public int additionalReplicaNeeded() {
I agree with @anuengineer, who suggested separating the calculation of the required replicas from the retry logic (the calculation with inflights). It would make the logic more similar to the original proposal and would make it possible to handle inFlightDel / inFlightAdd in different ways (for example, in case of over replication the inFlightAdd can be stopped...)
For example:
public int additionalReplicaNeeded() {
  int missing = calculateMissingReplicas();
  if (missing <= 0) {
    return missing + inFlightDel;
  } else {
    return missing + inFlightDel - inFlightAdd;
  }
}
Where calculateMissingReplicas is the existing additionalReplicaNeeded
I discussed this with @elek offline. I have refactored this code and we believe it is simpler and easier to understand now.
// calculating the new containers needed.
delta = Math.max(0, delta - maintenanceCount);
// Check we have enough healthy replicas
minHealthyForMaintenance = Math.min(repFactor, minHealthyForMaintenance);
This line can be moved to the constructor to simplify the logic here.
True, I missed this. I have made this change.
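For illustration only, a minimal sketch of the kind of change being discussed; the class and field names below are assumptions, not the actual patch:

final class ReplicaLimits {
  final int repFactor;
  final int minHealthyForMaintenance;

  ReplicaLimits(int repFactor, int minHealthyForMaintenance) {
    this.repFactor = repFactor;
    // A container never needs more healthy replicas than its replication
    // factor, so clamp the configured value once here instead of on every
    // replication check.
    this.minHealthyForMaintenance =
        Math.min(repFactor, minHealthyForMaintenance);
  }
}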
 * @return True if the container is sufficiently replicated and False
 * otherwise.
 */
public boolean isSufficientlyReplicated() {
This (and the isOverReplicated function) seems to be the same logic as what we have in additionalReplicaNeeded, just in a simplified version. It can be harder to maintain the logic in two places. Why don't we calculate the number of missing replicas and use that number to decide if we need to handle under/over replication?
There is a subtle difference in these methods in that they only consider inflight deletes, while additionalReplicaNeeded() considers inflight add and delete in some cases.
isSufficientlyReplicated() is intended to be used by the decommission monitor, so it can make a decision on whether a container is correctly replicated to allow decommission or maintenance to proceed. Therefore it assumes an inflight del will complete (worst case) but ignores inflight adds (assuming they will fail, again worst case, until they actually complete).
However, based on the refactor above we can reuse the logic to calculate the missing replicas, and then apply the inflight deletes and simplify these methods. The next commit will have that change.
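A small sketch of that worst-case rule, with names assumed for illustration (the actual code works off the shared missing-replica calculation mentioned above):

final class SufficiencyCheck {
  // Worst case for decommission/maintenance decisions: pending deletes are
  // assumed to succeed, pending adds are ignored until they complete, so
  // only the deletes are added to the missing-replica count.
  static boolean isSufficientlyReplicated(int missingReplicas,
      int inFlightDel) {
    return missingReplicas + inFlightDel <= 0;
  }
}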
/**
 * Used to lookup the health of a nodes or the nodes operational state.
 */
private final NodeManager nodeManager;
@anuengineer / @nandakumar131 had a different proposal offline (and in #160): to manage the state only in the containers. In that case it's not required to call the NodeManager.
I am not sure about the current state of that proposal.
The use of the node manager is not to see the state of the containers, but to check if a node is alive or not before attempting to use it as the source of a replication. E.g. the containers can be decommissioned (which is their end state) but the host can be healthy, stale or dead. We don't want to use a stale or dead node as the source of a replication, hence we need to use the node manager to get that health state. The proposal by @anuengineer / @nandakumar131 is in place here: the logic expects the DNs to change the container state via a container report to decommissioned / maintenance for the rest of the logic to work.
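A hedged sketch of that source-selection rule; the NodeHealth enum and the health lookup below are stand-ins for whatever the real NodeManager call is, not the actual API:

import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

final class SourceSelection {
  enum NodeHealth { HEALTHY, STALE, DEAD }

  // Even a DECOMMISSIONED replica can be a valid copy source, but only if
  // the datanode holding it is still healthy; replicas on stale or dead
  // nodes are dropped from the candidate list.
  static <R> List<R> eligibleSources(List<R> replicas,
      Function<R, NodeHealth> nodeHealthOf) {
    return replicas.stream()
        .filter(r -> nodeHealthOf.apply(r) == NodeHealth.HEALTHY)
        .collect(Collectors.toList());
  }
}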
As far as I understood, the proposal is to update the state of the containers by another component based on the node state, and to use only the container state here (instead of checking the state via the node manager).
I discussed it with @anuengineer. Let's go forward with this approach and later we can improve this part.
public class TestContainerReplicaCount {

  private OzoneConfiguration conf;
  private List<DatanodeDetails> dns;
Might be unused.
Yes, this test class went through a few changes and those variables are no longer used. I have removed them.
  @Test
  public void testThreeHealthyReplica() {
    registerNodes(CLOSED, CLOSED, CLOSED);
Is there any reason to store the state on the class level here? It would be more readable (for me) to store all the required state in local variables:
Set<ContainerReplica> replica = registerNodes(CLOSED, CLOSED, CLOSED);
... rcount = new ContainerReplicaCount(replica, 0, 0, 3, 2);
validate(rcount, true, 0, false);
I have changed this in the most recent commit and all the instance variables are now gone.
I will change this to @Ignore, however I have not been able to find the cause of the problem. It just times out when running via github actions every time, but did not before the switch to GH actions, and it works locally in < 2 seconds every time. I am going to revisit the approach to this test as a wider effort or dig into the failure later. HDDS-2631 raised so I do not forget this.
…lyReplicated() and isOverReplicated() methods
Sure, just link the failing github actions unit test + download the logs and upload them to the jira (if meaningful; in case of a timeout it can be empty). I am not interested in the real root cause, but we need a definition of the problem including assertion errors, exceptions and log output to check it later.
What changes were proposed in this pull request?
In its current form the replication manager does not consider decommission or maintenance states when checking if replicas are sufficiently replicated. With the introduction of maintenance states, it needs to consider decommission and maintenance states when deciding if blocks are over or under replicated.
It also needs to provide an API to allow the decommission manager to check if blocks are over or under replicated, so the decommission manager can decide if a node has completed decommission and maintenance or not.
The key part of this change is a new class called ContainerReplicaCount. The logic to determine whether a container is "sufficiently replicated" or over replicated, and the delta of replicas required, is extracted into this class. This gives a standalone, testable unit which can be used inside the ReplicationManager, and the same object can be returned from the Replication Manager to another class that needs to make replication decisions (i.e. the datanode admin manager).
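As a rough illustration of the shape of such a class, here is a sketch; the field names, maintenance accounting, and in-flight handling below are assumptions for illustration and may differ from the actual patch:

// Sketch only: a standalone, testable holder for one container's replica
// accounting. Maintenance replicas are tracked separately from healthy
// ones so the under/over replication decision can account for them.
final class ContainerReplicaCountSketch {
  private final int healthyCount;      // replicas on healthy, in-service nodes
  private final int maintenanceCount;  // replicas on nodes in maintenance
  private final int inFlightAdd;
  private final int inFlightDel;
  private final int repFactor;
  private final int minHealthyForMaintenance;

  ContainerReplicaCountSketch(int healthyCount, int maintenanceCount,
      int inFlightAdd, int inFlightDel, int repFactor,
      int minHealthyForMaintenance) {
    this.healthyCount = healthyCount;
    this.maintenanceCount = maintenanceCount;
    this.inFlightAdd = inFlightAdd;
    this.inFlightDel = inFlightDel;
    this.repFactor = repFactor;
    // A container never needs more healthy replicas than its factor.
    this.minHealthyForMaintenance =
        Math.min(repFactor, minHealthyForMaintenance);
  }

  /** Positive: under replicated, negative: over replicated, zero: exact. */
  int missingReplicas() {
    // Maintenance replicas count towards the factor; they are expected back.
    int delta = repFactor - healthyCount - maintenanceCount;
    if (maintenanceCount > 0) {
      // ...but never let the healthy copies drop below the configured minimum.
      delta = Math.max(delta, minHealthyForMaintenance - healthyCount);
    }
    return delta;
  }

  /** Extra replicas the ReplicationManager should schedule now. */
  int additionalReplicaNeeded() {
    int missing = missingReplicas();
    // Pending deletes always count against us; pending adds are only
    // credited while the container is under replicated.
    return missing <= 0 ? missing + inFlightDel
        : missing + inFlightDel - inFlightAdd;
  }

  /** Worst-case view for the decommission monitor: ignores pending adds. */
  boolean isSufficientlyReplicated() {
    return missingReplicas() + inFlightDel <= 0;
  }
}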
What is the link to the Apache JIRA?
https://issues.apache.org/jira/browse/HDDS-2459
How was this patch tested?
Additional unit tests have been added to test the new functionality.