Skip to content

Conversation

@GlenGeng-awx
Copy link
Contributor

@GlenGeng-awx GlenGeng-awx commented Nov 25, 2020

What changes were proposed in this pull request?

This change is inspired by avoiding StaleNodeHandler to take effect in TestDeleteWithSlowFollower.

Consider the follow logic in RM,

      /*
       * We don't take any action if the container is in OPEN state and
       * the container is healthy. If the container is not healthy, i.e.
       * the replicas are not in OPEN state, send CLOSE_CONTAINER command.
       */
      if (state == LifeCycleState.OPEN) {
        if (!isContainerHealthy(container, replicas)) {
          eventPublisher.fireEvent(SCMEvents.CLOSE_CONTAINER, id);
        }
        return;
      }

  private boolean isContainerHealthy(final ContainerInfo container,
                                     final Set<ContainerReplica> replicas) {
    return !isContainerUnderReplicated(container, replicas) &&
        !isContainerOverReplicated(container, replicas) &&
        replicas.stream().allMatch(
            r -> compareState(container.getState(), r.getState()));
  }

  private boolean isContainerUnderReplicated(final ContainerInfo container,
      final Set<ContainerReplica> replicas) {
    if (container.getState() != LifeCycleState.CLOSED &&
        container.getState() != LifeCycleState.QUASI_CLOSED) {
      return false;
    }
    boolean misReplicated = !getPlacementStatus(
        replicas, container.getReplicationFactor().getNumber())
        .isPolicySatisfied();
    return container.getReplicationFactor().getNumber() >
        getReplicaCount(container.containerID(), replicas) || misReplicated;
  }

Consider this scenario: client triggers SCM to allocate one container by allocating blocks, then it crashes, never writes chunks to DN to trigger the creation of the container, thus no replica report for this container.

Previously, ReplicationManager will close such containers, since it is under replicated. This is a reasonable and legacy handling which is used to prevents StaleNodeHandler to take effect in TestDeleteWithSlowFollower.

This following logic is added by Sammi in her PR HDDS-4023. Delete closed container after all blocks have been deleted

    if (container.getState() != LifeCycleState.CLOSED &&
        container.getState() != LifeCycleState.QUASI_CLOSED) {
      return false;
    }

After talked with @ChenSammi , by design, it just needs to explicitly avoid replicating container in DELETING or DELETED state.

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-4511

How was this patch tested?

CI

@linyiqun
Copy link
Contributor

@GlenGeng , not fully get the reason for doing this. If we enable replication for open state containers, how we keep data consistent when container be in replication and meanwhile there is block written in this container.

@bshashikant
Copy link
Contributor

@GlenGeng , is the description of the pull request correct?

@GlenGeng-awx
Copy link
Contributor Author

GlenGeng-awx commented Nov 26, 2020

@linyiqun @bshashikant Thanks for the review!

Shashi, I found this issue when trying to debug the test case TestDeleteWithSlowFollower for SCM HA, which is written by you. You can find more details in the jira's description.

The key point is, although this case can pass in master, after HDDS-4023. Delete closed container after all blocks have been deleted., there will be log from stale node hander, which is by design not correct.

This case relies on RM to make the container to CLOSING state, so that SCM can move container state to CLOSED, and not hold the delete blocks command.

There are two solutions to avoid the stale node issue, one is shorten the frequency of RM in the case TestDeleteWithSlowFollower, the other is the above changes.

Actually, this fix would work in the version of Oct 20, which is the latest merge from master to HDDS-2823, But seems some change after Oct 20 affect the case TestDeleteWithSlowFollower again.

@GlenGeng-awx
Copy link
Contributor Author

The description may not be accurate enough, I would update them later. Thanks for pointing out !

@GlenGeng-awx GlenGeng-awx changed the title HDDS-4511: ReplicationManager#isContainerUnderReplicated should consider OPEN container [DRAFT]HDDS-4511: ReplicationManager#isContainerUnderReplicated should consider OPEN container Nov 26, 2020
@GlenGeng-awx GlenGeng-awx changed the title [DRAFT]HDDS-4511: ReplicationManager#isContainerUnderReplicated should consider OPEN container HDDS-4511: Avoid StaleNodeHandler to take effect in TestDeleteWithSlowFollower. Nov 26, 2020
@GlenGeng-awx GlenGeng-awx changed the title HDDS-4511: Avoid StaleNodeHandler to take effect in TestDeleteWithSlowFollower. HDDS-4511: ReplicationManager#isContainerUnderReplicated should consider OPEN container Nov 26, 2020
@GlenGeng-awx
Copy link
Contributor Author

@linyiqun Hey yiqun, never mind, RM will only send out replicate container command for closed container.

@GlenGeng-awx GlenGeng-awx changed the title HDDS-4511: ReplicationManager#isContainerUnderReplicated should consider OPEN container HDDS-4511: Avoiding StaleNodeHandler to take effect in TestDeleteWithSlowFollower. Nov 26, 2020
@GlenGeng-awx GlenGeng-awx reopened this Nov 26, 2020
@linyiqun
Copy link
Contributor

linyiqun commented Nov 26, 2020

@linyiqun Hey yiqun, never mind, RM will only send out replicate container command for closed container.

Okay, get it.

@GlenGeng-awx
Copy link
Contributor Author

@ChenSammi Please take a look at this PR. Thanks !

@ChenSammi
Copy link
Contributor

The patch LGTM, +1.

Thanks @GlenGeng for investigating and fix the issue, and @linyiqun @bshashikant for the review.

@ChenSammi ChenSammi merged commit 130ba4d into apache:master Nov 27, 2020
errose28 added a commit to errose28/ozone that referenced this pull request Dec 1, 2020
* HDDS-3698-upgrade:
  HDDS-4429. Create unit test for SimpleContainerDownloader. (apache#1551)
  HDDS-4461. Reuse compiled binaries in acceptance test (apache#1588)
  HDDS-4511: Avoiding StaleNodeHandler to take effect in TestDeleteWithSlowFollower. (apache#1625)
  HDDS-4510. SCM can avoid creating RetriableDatanodeEventWatcher for deletion command ACK (apache#1626)
  HDDS-3363. Intermittent failure in testContainerImportExport (apache#1618)
  HDDS-4370. Datanode deletion service can avoid storing deleted blocks. (apache#1620)
  HDDS-4512. Remove unused netty3 transitive dependency (apache#1627)
  HDDS-4481. With HA OM can send deletion blocks to SCM multiple times. (apache#1608)
  HDDS-4487. SCM can avoid using RETRIABLE_DATANODE_COMMAND for datanode deletion commands. (apache#1621)
  HDDS-4471. GrpcOutputStream length can overflow (apache#1617)
  HDDS-4308. Fix issue with quota update (apache#1489)
  HDDS-4392. [DOC] Add Recon architecture to docs (apache#1602)
  HDDS-4501. Reload OM State fail should terminate OM for any exceptions. (apache#1622)
  HDDS-4492. CLI flag --quota should default to 'spaceQuota' to preserve backward compatibility. (apache#1609)
  HDDS-3689. Add various profiles to MiniOzoneChaosCluster to run different modes. (apache#1420)
  HDDS-4497. Recon File Size Count task throws SQL Exception. (apache#1612)
errose28 added a commit to errose28/ozone that referenced this pull request Dec 1, 2020
* HDDS-3698-upgrade:
  HDDS-4429. Create unit test for SimpleContainerDownloader. (apache#1551)
  HDDS-4461. Reuse compiled binaries in acceptance test (apache#1588)
  HDDS-4511: Avoiding StaleNodeHandler to take effect in TestDeleteWithSlowFollower. (apache#1625)
  HDDS-4510. SCM can avoid creating RetriableDatanodeEventWatcher for deletion command ACK (apache#1626)
  HDDS-3363. Intermittent failure in testContainerImportExport (apache#1618)
  HDDS-4370. Datanode deletion service can avoid storing deleted blocks. (apache#1620)
  HDDS-4512. Remove unused netty3 transitive dependency (apache#1627)
  HDDS-4481. With HA OM can send deletion blocks to SCM multiple times. (apache#1608)
  HDDS-4487. SCM can avoid using RETRIABLE_DATANODE_COMMAND for datanode deletion commands. (apache#1621)
  HDDS-4471. GrpcOutputStream length can overflow (apache#1617)
  HDDS-4308. Fix issue with quota update (apache#1489)
  HDDS-4392. [DOC] Add Recon architecture to docs (apache#1602)
  HDDS-4501. Reload OM State fail should terminate OM for any exceptions. (apache#1622)
  HDDS-4492. CLI flag --quota should default to 'spaceQuota' to preserve backward compatibility. (apache#1609)
  HDDS-3689. Add various profiles to MiniOzoneChaosCluster to run different modes. (apache#1420)
  HDDS-4497. Recon File Size Count task throws SQL Exception. (apache#1612)
errose28 added a commit to errose28/ozone that referenced this pull request Jan 5, 2021
* master: (40 commits)
  HDDS-4473. Reduce number of sortDatanodes RPC calls (apache#1610)
  HDDS-4485. [DOC] add the authentication rules of the Ozone Ranger. (apache#1603)
  HDDS-4528. Upgrade slf4j to 1.7.30 (apache#1639)
  HDDS-4424. Update README with information how to report security issues (apache#1548)
  HDDS-4484. Use RaftServerImpl isLeader instead of periodic leader update logic in OM and isLeaderReady for read/write requests (apache#1638)
  HDDS-4429. Create unit test for SimpleContainerDownloader. (apache#1551)
  HDDS-4461. Reuse compiled binaries in acceptance test (apache#1588)
  HDDS-4511: Avoiding StaleNodeHandler to take effect in TestDeleteWithSlowFollower. (apache#1625)
  HDDS-4510. SCM can avoid creating RetriableDatanodeEventWatcher for deletion command ACK (apache#1626)
  HDDS-3363. Intermittent failure in testContainerImportExport (apache#1618)
  HDDS-4370. Datanode deletion service can avoid storing deleted blocks. (apache#1620)
  HDDS-4512. Remove unused netty3 transitive dependency (apache#1627)
  HDDS-4481. With HA OM can send deletion blocks to SCM multiple times. (apache#1608)
  HDDS-4487. SCM can avoid using RETRIABLE_DATANODE_COMMAND for datanode deletion commands. (apache#1621)
  HDDS-4471. GrpcOutputStream length can overflow (apache#1617)
  HDDS-4308. Fix issue with quota update (apache#1489)
  HDDS-4392. [DOC] Add Recon architecture to docs (apache#1602)
  HDDS-4501. Reload OM State fail should terminate OM for any exceptions. (apache#1622)
  HDDS-4492. CLI flag --quota should default to 'spaceQuota' to preserve backward compatibility. (apache#1609)
  HDDS-3689. Add various profiles to MiniOzoneChaosCluster to run different modes. (apache#1420)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants