HDDS-5274. Revert "HDDS-5153. Decommissioning a dead node should complete immediately (#2190)" #2282
What changes were proposed in this pull request?
After some discussion with István Fajth and Siddharth Wagle we believe that the change in HDDS-5153 should be reverted.
If a DN starts decommissioning or maintenance but goes dead before it completes the process, the decommission monitor notices that it has become dead and moves the node back to IN_SERVICE and DEAD. This is because decommission should remove the node gracefully, but if the node goes dead first, we may not be able to replicate its containers. In this case, decommission effectively fails.
In HDDS-5153, we decided that if a node is already dead and you decommission it, it should immediately move to DECOMMISSIONED. However, that is not really consistent with the behaviour above.
Also, there is no real value in decommissioning a dead node: it does nothing except adjust the node's state in SCM.
To keep things consistent, I propose we revert HDDS-5153 so starting decommission on a dead node will work the same as when a node goes dead part way through decommission. In both cases the node will end up as IN_SERVICE + DEAD.
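To make the two paths concrete, here is a minimal sketch of the state transitions described above. It is not Ozone's SCM code: the class `DecommissionSketch`, the enums `OpState` and `Health`, the record `NodeStatus`, and the methods `startDecommission` and `monitorTick` are all hypothetical stand-ins chosen for illustration, assuming a monitor that checks node health on each pass. It shows that, with HDDS-5153 reverted, an already-dead node and a node that dies mid-decommission converge on the same IN_SERVICE + DEAD state.

```java
public class DecommissionSketch {
    // Hypothetical stand-ins for SCM's operational and health states.
    enum OpState { IN_SERVICE, DECOMMISSIONING, DECOMMISSIONED, ENTERING_MAINTENANCE, IN_MAINTENANCE }
    enum Health { HEALTHY, STALE, DEAD }

    // Immutable snapshot of a datanode's state as SCM might track it (hypothetical).
    record NodeStatus(OpState op, Health health) {}

    // Admin starts decommission: the node always enters DECOMMISSIONING,
    // whatever its health (no shortcut to DECOMMISSIONED for dead nodes).
    static NodeStatus startDecommission(NodeStatus s) {
        if (s.op() != OpState.IN_SERVICE) {
            throw new IllegalStateException("Node is not IN_SERVICE: " + s.op());
        }
        return new NodeStatus(OpState.DECOMMISSIONING, s.health());
    }

    // One pass of a hypothetical monitor: a node that is DEAD while an admin
    // operation is still in progress is returned to IN_SERVICE, because its
    // containers may not have been replicated. Other nodes are left unchanged.
    static NodeStatus monitorTick(NodeStatus s) {
        boolean inProgress = s.op() == OpState.DECOMMISSIONING
                || s.op() == OpState.ENTERING_MAINTENANCE;
        if (inProgress && s.health() == Health.DEAD) {
            return new NodeStatus(OpState.IN_SERVICE, Health.DEAD);
        }
        return s;
    }

    public static void main(String[] args) {
        // Case 1: the node is already dead when decommission starts.
        NodeStatus alreadyDead = monitorTick(
                startDecommission(new NodeStatus(OpState.IN_SERVICE, Health.DEAD)));

        // Case 2: the node is healthy at the start, then goes dead part way through.
        NodeStatus started = startDecommission(new NodeStatus(OpState.IN_SERVICE, Health.HEALTHY));
        NodeStatus diedMidway = monitorTick(new NodeStatus(started.op(), Health.DEAD));

        System.out.println(alreadyDead); // NodeStatus[op=IN_SERVICE, health=DEAD]
        System.out.println(diedMidway);  // NodeStatus[op=IN_SERVICE, health=DEAD]
    }
}
```

Both cases print the same state, which is the consistency this revert aims for. A healthy node in DECOMMISSIONING passes through `monitorTick` unchanged and would continue decommissioning as normal.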
What is the link to the Apache JIRA?
https://issues.apache.org/jira/browse/HDDS-5274
How was this patch tested?
Existing tests