Conversation

@sodonnel
Contributor

What changes were proposed in this pull request?

In EC, a container is considered missing (unrecoverable) if it has lost enough replicas that offline reconstruction is no longer possible. If any of the remaining replicas of such a container are on a datanode that is being decommissioned, decommissioning will not complete: all containers on that node must be restored to proper replication before it can finish, but the existing code will never copy a replica of an unrecoverable container to a different node.

There are three parts to fixing this problem:

1. In ECReplicationCheckHandler, we deliberately skip adding "unrecoverable" containers to the under-replicated queue, as we previously believed there was no point in adding them: they cannot be recovered anyway. However, this decommission issue is specific to EC, so we should allow an unrecoverable container onto the under-replicated queue if it has decommissioning or maintenance indexes (a sketch of this check follows the list).

2. In ECUnderReplicationHandler, we need to check that the decommissioning indexes are copied correctly, even if the container is otherwise unrecoverable.

3. In DatanodeAdminMonitorImpl, inside the method checkContainersReplicatedOnNode, we use a call to ECContainerReplicaCount.isSufficientlyReplicated() to decide whether the container is replicated OK. Even if we address 1 and 2 above, this is still a problem, as the container is unrecoverable. For EC containers, the decommission monitor needs a different check: that the replica on the host being drained is also available on another IN_SERVICE host. From a decommission point of view, we don't care whether the entire EC container is sufficiently replicated; we only care that the replica on the current host has a copy elsewhere.
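
For illustration, the ECReplicationCheckHandler change in part 1 might look roughly like the sketch below. This is a minimal sketch, not the merged patch: decommissioningOnlyIndexes appears in a diff fragment later in this conversation, but maintenanceOnlyIndexes and the method shape here are assumptions.

// Sketch only: decides whether an EC container should be queued as
// under-replicated. Simplified relative to the real handler.
boolean shouldQueueUnderReplicated(ECContainerReplicaCount replicaCount) {
  if (!replicaCount.isUnrecoverable()) {
    // Recoverable containers follow the normal sufficiency check.
    return !replicaCount.isSufficientlyReplicated(true);
  }
  // Previously, unrecoverable containers were never queued, as offline
  // reconstruction cannot rebuild them. Queue them anyway if indexes remain
  // on decommissioning or maintenance nodes, so those replicas can be
  // copied and the nodes can finish draining.
  return replicaCount.decommissioningOnlyIndexes(true).size() > 0
      || replicaCount.maintenanceOnlyIndexes(true).size() > 0;
}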

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-7666

How was this patch tested?

New unit tests added.

Comment on lines 436 to 460
public boolean isSufficientlyReplicatedForOffline(DatanodeDetails datanode) {
  boolean sufficientlyReplicated = isSufficientlyReplicated(false);
  if (sufficientlyReplicated) {
    return true;
  }
  // If it is not sufficiently replicated (ie the container does not have
  // all its replicas), then we need to check if the replica that is on this
  // node is available on another ONLINE node, ie in the healthy set. This
  // means we avoid blocking decommission or maintenance caused by
  // un-recoverable EC containers.
  ContainerReplica thisReplica = null;
  for (ContainerReplica r : replicas) {
    if (r.getDatanodeDetails().equals(datanode)) {
      thisReplica = r;
      break;
    }
  }
  if (thisReplica == null) {
    // From the set of replicas, none are on the passed datanode.
    // This should not happen in practice, but if it does we cannot indicate
    // the container is sufficiently replicated.
    return false;
  }
  return healthyIndexes.containsKey(thisReplica.getReplicaIndex());
}
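
For context, part 3 of the description would have the decommission monitor call this new method instead of the plain sufficiency check. A minimal sketch of that call site follows; the loop shape, the getReplicaCount helper, and the containersOnNode / dn / underReplicatedContainers names are assumptions, and the real checkContainersReplicatedOnNode carries more state:

// Sketch only: containers failing this check keep the datanode in the
// DECOMMISSIONING workflow until their replicas are copied elsewhere.
for (ContainerID id : containersOnNode) {
  ECContainerReplicaCount replicaCount = getReplicaCount(id); // assumed helper
  if (!replicaCount.isSufficientlyReplicatedForOffline(dn)) {
    // The replica on this draining node has no copy on an IN_SERVICE node
    // yet, so decommission cannot complete for this datanode.
    underReplicatedContainers.add(id);
  }
}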
Contributor

This returns true even if called for the datanode which has the online replica. Should we guard against that?

Contributor Author

The intention is for this method to be passed a decommissioning / maintenance node. If that is the case, its replicas will not be in the healthy set, but the code does not enforce that.

Perhaps we could also check that there is a replica index for the passed-in node in the decommission / maintenance indexes, and return false if not. I will add that in.
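
For illustration, the extra guard described above might be added to the end of isSufficientlyReplicatedForOffline, something like the sketch below. The decommissionIndexes and maintenanceIndexes maps are assumed to exist alongside healthyIndexes; the merged code may differ:

// Sketch of the proposed guard: only report sufficiently replicated if the
// passed datanode actually holds a decommissioning or maintenance index.
// Calling this for a node holding an online (healthy) replica returns false.
int index = thisReplica.getReplicaIndex();
if (!decommissionIndexes.containsKey(index)
    && !maintenanceIndexes.containsKey(index)) {
  return false;
}
return healthyIndexes.containsKey(index);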

container, remainingRedundancy, dueToDecommission,
replicaCount.isSufficientlyReplicated(true),
replicaCount.isUnrecoverable());
if (replicaCount.decommissioningOnlyIndexes(true).size() > 0
Contributor

@siddhantsangwan Jan 4, 2023

These methods will also count indexes that are on DECOMMISSIONED or IN_MAINTENANCE datanodes. I guess we only want to add replicas on datanodes that haven't yet entered these states to the under-replicated queue.

Contributor Author

For it to have reached DECOMMISSIONED or IN_MAINTENANCE, it should already have made a new copy elsewhere. The flag set here only comes into play later, and only for an unrecoverable container. If that many replicas are missing, we should have another copy of any decommission or maintenance replicas anyway, and if we do not, it would be OK for the system to create them. So I think it's fine for this to consider both decommissioning / entering_maintenance and decommissioned / in_maintenance.

@sodonnel merged commit 10811c5 into apache:master Jan 5, 2023
errose28 added a commit to errose28/ozone that referenced this pull request Jan 9, 2023
* master: (176 commits)
  HDDS-7726. EC: Enhance datanode reconstruction log message (apache#4155)
  HDDS-7739. EC: Increase the information in the RM sending command log message (apache#4153)
  HDDS-7652. Volume Quota not enforced during write when bucket quota is not set (apache#4124)
  HDDS-7628. Intermittent failure in TestOzoneContainerWithTLS (apache#4142)
  HDDS-7695. EC metrics related to replication commands don't add up (apache#4152)
  HDDS-7729. EC: ECContainerReplicaCount should handle pending delete of unhealthy replicas (apache#4146)
  HDDS-7738. SCM terminates when adding container to a closed pipeline (apache#4154)
  HDDS-7243. Remove RequestFeatureValidator from echoRPC method which supports only ValidationCondition.OLDER_CLIENT_REQUESTS (apache#4051)
  HDDS-7708. No check for certificate duration config scenarios. (apache#4149)
  HDDS-7727. EC: SCM unregistered event handler for DatanodeCommandCountUpdated (apache#4147)
  HDDS-7606. Add SCM HA support in intellij run (apache#4058)
  HDDS-7666. EC: Unrecoverable EC containers with some remaining replicas may block decommissioning (apache#4118)
  HDDS-7339. Implement Certificate renewal task for services (apache#3982)
  HDDS-7696. MisReplicationHandler does not consider QUASI_CLOSED replicas as sources (apache#4144)
  HDDS-7714. Docker cluster ozone-om-ha fails during docker-compose up (apache#4137)
  HDDS-7716. Log read requests rejected with permission denied in OM audit (apache#4136)
  HDDS-7588. Intermittent failure in TestObjectStoreWithLegacyFS#testFlatKeyStructureWithOBS (apache#4040)
  HDDS-7633. Compile error with Java 11: package com.sun.jmx.mbeanserver is not visible (apache#4077)
  HDDS-7648. Add a servername tag in UGI metrics. (apache#4094)
  HDDS-7564. Update Ozone version after 1.3.0 release (apache#4115)
  ...
jojochuang pushed a commit to jojochuang/ozone that referenced this pull request Feb 6, 2023
HDDS-7666. EC: Unrecoverable EC containers with some remaining replicas may block decommissioning (apache#4118)

(cherry picked from commit 10811c5)
Change-Id: I4f5cbdcda9f7492f8a4f3b332b8f2f600a2b34e5