HDDS-7666. EC: Unrecoverable EC containers with some remaining replicas may block decommissioning #4118
Conversation
…sioning replicas to the queue
…Handler so decommission and maint indexes are replicated
public boolean isSufficientlyReplicatedForOffline(DatanodeDetails datanode) {
  boolean sufficientlyReplicated = isSufficientlyReplicated(false);
  if (sufficientlyReplicated) {
    return true;
  }
  // If it is not sufficiently replicated (ie the container does not have
  // all replicas) then we need to check if the replica that is on this
  // node is available on another ONLINE node, ie in the healthy set. This
  // means we avoid blocking decommission or maintenance caused by
  // un-recoverable EC containers.
  ContainerReplica thisReplica = null;
  for (ContainerReplica r : replicas) {
    if (r.getDatanodeDetails().equals(datanode)) {
      thisReplica = r;
      break;
    }
  }
  if (thisReplica == null) {
    // From the set of replicas, none are on the passed datanode.
    // This should not happen in practice, but if it does we cannot
    // indicate the container is sufficiently replicated.
    return false;
  }
  return healthyIndexes.containsKey(thisReplica.getReplicaIndex());
}
This returns true even if called for the datanode which has the online replica. Should we guard against that?
The intention is for this method to be passed a decommissioning / maintenance node. If that is the case, its replicas will not be in the healthy set, but the code does not enforce that.
Perhaps we could also check that there is a replica index for the passed-in node in the decommission / maintenance indexes and return false if not. I will add that in.
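For illustration, a minimal sketch of that extra guard as a replacement for the tail of the method above, assuming the class also tracks decommissionIndexes and maintenanceIndexes maps keyed by replica index alongside healthyIndexes (those two field names are hypothetical here, not from the patch):

// Hypothetical guard: only vouch for the container if the passed node's
// replica index is actually in an offline-pending state.
int index = thisReplica.getReplicaIndex();
if (!decommissionIndexes.containsKey(index)
    && !maintenanceIndexes.containsKey(index)) {
  // The method was called for a node whose replica is not decommissioning
  // or entering maintenance, so this check does not apply; be conservative.
  return false;
}
return healthyIndexes.containsKey(index);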
container, remainingRedundancy, dueToDecommission,
    replicaCount.isSufficientlyReplicated(true),
    replicaCount.isUnrecoverable());
if (replicaCount.decommissioningOnlyIndexes(true).size() > 0
These methods will also count indexes that are on DECOMMISSIONED or IN_MAINTENANCE datanodes. I guess we only want to add replicas on datanodes that haven't yet entered these states to the under-replicated queue.
For it to have reached DECOMMISSIONED or IN_MAINTENANCE, it should have made a new copy elsewhere already. The flag set here only comes into play later, and only when the container is unrecoverable. If that many replicas are missing, we should have another copy of any decommission or maintenance replicas anyway, and if we do not, it would be OK for the system to create them. I think it's OK for this to consider both DECOMMISSIONING / ENTERING_MAINTENANCE and DECOMMISSIONED / IN_MAINTENANCE.
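For reference, a minimal sketch of the stricter filter the review comment describes (which the author ultimately decided was unnecessary). pendingOfflineIndexes is a hypothetical helper; getPersistedOpState and NodeOperationalState are existing Ozone APIs:

import java.util.HashSet;
import java.util.Set;
import org.apache.hadoop.hdds.protocol.proto.HddsProtos.NodeOperationalState;
import org.apache.hadoop.hdds.scm.container.ContainerReplica;

// Hypothetical helper: collect only replica indexes held on nodes still
// transitioning offline (DECOMMISSIONING / ENTERING_MAINTENANCE), skipping
// nodes that have already reached DECOMMISSIONED / IN_MAINTENANCE.
static Set<Integer> pendingOfflineIndexes(Set<ContainerReplica> replicas) {
  Set<Integer> indexes = new HashSet<>();
  for (ContainerReplica r : replicas) {
    NodeOperationalState state =
        r.getDatanodeDetails().getPersistedOpState();
    if (state == NodeOperationalState.DECOMMISSIONING
        || state == NodeOperationalState.ENTERING_MAINTENANCE) {
      indexes.add(r.getReplicaIndex());
    }
  }
  return indexes;
}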
…as may block decommissioning (apache#4118) (cherry picked from commit 10811c5) Change-Id: I4f5cbdcda9f7492f8a4f3b332b8f2f600a2b34e5
What changes were proposed in this pull request?
In EC, a container is considered missing and under-replicated if it has lost enough replicas that offline reconstruction is not possible. If any of the remaining replicas for this container are on a datanode that is being decommissioned, decommissioning will not proceed: all the containers on that node must be restored to full replication before it can finish, but the code will not copy the replica of the missing container to a different node.
There are 3 parts to fixing this problem:
1. In ECReplicationCheckHandler, we deliberately skip adding "unrecoverable" containers to the under-replicated queue, as we previously believed there was no point in adding them: they cannot be recovered anyway. However, this decommission issue is specific to EC, so we should allow the container onto the under-replicated queue if it has decommissioning or maintenance indexes.
2. In ECUnderReplicationHandler, we need to check that the decommissioning indexes are copied OK, even if the container is otherwise unrecoverable.
3. In DatanodeAdminMonitorImpl, inside the method checkContainersReplicatedOnNode, we use a call to ECContainerReplicaCount.isSufficientlyReplicated() to decide whether the container is replicated OK. Even if we address 1 and 2 above, this is still a problem, as the container is unrecoverable. For EC containers in the decommission monitor, we need a different check: that the replica on the host being checked is also available on another IN_SERVICE host (see the sketch after this list). From a decommission point of view, we don't care whether the entire EC container is sufficiently replicated; we just care that the replica on the current host has a copy elsewhere.
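A minimal sketch of the check described in item 3, under the assumption that decommission only needs each index held by the offline node to exist on some IN_SERVICE node; the method name is illustrative, not the actual patch:

import java.util.Set;
import org.apache.hadoop.hdds.protocol.DatanodeDetails;
import org.apache.hadoop.hdds.protocol.proto.HddsProtos.NodeOperationalState;
import org.apache.hadoop.hdds.scm.container.ContainerReplica;

// Hypothetical check: is every replica index held by offlineNode also
// present on some IN_SERVICE node? If so, decommission can proceed even
// though the container as a whole is unrecoverable.
static boolean replicaAvailableElsewhere(DatanodeDetails offlineNode,
    Set<ContainerReplica> replicas) {
  for (ContainerReplica mine : replicas) {
    if (!mine.getDatanodeDetails().equals(offlineNode)) {
      continue;
    }
    boolean copied = false;
    for (ContainerReplica other : replicas) {
      if (!other.getDatanodeDetails().equals(offlineNode)
          && other.getReplicaIndex() == mine.getReplicaIndex()
          && other.getDatanodeDetails().getPersistedOpState()
              == NodeOperationalState.IN_SERVICE) {
        copied = true;
        break;
      }
    }
    if (!copied) {
      return false;
    }
  }
  return true;
}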
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-7666
How was this patch tested?
New unit tests added.