HDDS-6697. EC: ReplicationManager - create class to detect EC container health issues #3512

sodonnel · 2022-06-13T16:52:17Z

What changes were proposed in this pull request?

This PR defines a ContainerHealthCheckInterface, with the following definition:

  ContainerHealthResult checkHealth(
      ContainerInfo container, Set<ContainerReplica> replicas,
      List<Pair<Integer, DatanodeDetails>> indexesPendingAdd,
      List<Pair<Integer, DatanodeDetails>> indexesPendingDelete,
      int remainingRedundancyForMaintenance);
}

It also adds an implementation for EC containers that can determine whether an EC container is over- or under-replicated. The interface returns a ContainerHealthResult object that describes the health state. For example, for an under-replicated container it contains details such as the remaining redundancy and whether the under-replication is due to decommission.
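To make the idea concrete, here is a minimal sketch of how an EC health check could classify a container from its replica indexes. It is not the code in this PR: the class `EcHealthSketch`, its `Result` type, and the `check` method are hypothetical, and plain integers stand in for `ContainerInfo` and `ContainerReplica`. It also ignores pending adds, pending deletes, decommission and maintenance, all of which the real check takes into account.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical, simplified stand-in; not the ECContainerHealthCheck class. */
final class EcHealthSketch {

  enum State { HEALTHY, UNDER_REPLICATED, OVER_REPLICATED }

  /** Small result holder, loosely modelled on the idea of a health result. */
  static final class Result {
    final State state;
    // Number of further indexes that can be lost before data is unreadable.
    // A negative value means more indexes are missing than parity can cover.
    final int remainingRedundancy;
    // Missing indexes (under-replicated) or duplicated indexes (over-replicated).
    final Set<Integer> affectedIndexes;

    Result(State state, int remainingRedundancy, Set<Integer> affectedIndexes) {
      this.state = state;
      this.remainingRedundancy = remainingRedundancy;
      this.affectedIndexes = affectedIndexes;
    }
  }

  /**
   * Classifies a container with dataNum + parityNum replica indexes.
   * Assumes every entry of replicaIndexes is in the range 1..dataNum+parityNum.
   */
  static Result check(int dataNum, int parityNum, List<Integer> replicaIndexes) {
    int total = dataNum + parityNum;
    int[] counts = new int[total + 1]; // replica indexes are 1-based
    for (int idx : replicaIndexes) {
      counts[idx]++;
    }
    Set<Integer> missing = new TreeSet<>();
    Set<Integer> duplicated = new TreeSet<>();
    for (int i = 1; i <= total; i++) {
      if (counts[i] == 0) {
        missing.add(i);
      } else if (counts[i] > 1) {
        duplicated.add(i);
      }
    }
    // Under-replication takes priority in this sketch: a missing index is
    // reported even if another index is duplicated at the same time.
    if (!missing.isEmpty()) {
      return new Result(State.UNDER_REPLICATED,
          parityNum - missing.size(), missing);
    }
    if (!duplicated.isEmpty()) {
      return new Result(State.OVER_REPLICATED, parityNum, duplicated);
    }
    return new Result(State.HEALTHY, parityNum, Collections.emptySet());
  }
}
```

With a 3+2 layout, replicas at indexes 1, 2, 3 and 5 would be reported as under-replicated with index 4 missing and a remaining redundancy of 1.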

For now this class is standalone and unused. It will be integrated with ReplicationManager soon via a new Jira.

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-6697

How was this patch tested?

New unit tests

...m/src/main/java/org/apache/hadoop/hdds/scm/container/replication/ECContainerHealthCheck.java

umamaheswararao · 2022-06-14T13:31:13Z

...cm/src/main/java/org/apache/hadoop/hdds/scm/container/replication/ContainerHealthResult.java

+     * will the container still be over-replicated when they complete.
+     * @return True if the over-replication is corrected by the pending
+     *         deletes. False otherwise.
+     */


For consistency, you may want to rename this to isSufficientlyReplicatedAfterPending?

I see I have a typo in there - is that what you are referring to? sufficient -> sufficiently ?
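For readers following the thread, the question the method under discussion answers can be sketched as follows. This is an illustration only, under the assumption that each pending delete removes exactly one copy of its replica index; the class and method names are hypothetical and are not the ones in ContainerHealthResult.java. The sketch only looks at over-replication and does not check whether a pending delete would remove the last copy of an index.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration; not the code under review. */
final class PendingDeleteSketch {

  /**
   * Returns true if, once every pending delete has completed, no replica
   * index has more than one copy left. Returns false if the container
   * would still be over-replicated.
   */
  static boolean correctedByPendingDeletes(
      List<Integer> replicaIndexes, List<Integer> pendingDeleteIndexes) {
    // Count the copies currently present for each replica index.
    Map<Integer, Integer> counts = new HashMap<>();
    for (int idx : replicaIndexes) {
      counts.merge(idx, 1, Integer::sum);
    }
    // Each pending delete is assumed to remove one copy of its index.
    for (int idx : pendingDeleteIndexes) {
      counts.computeIfPresent(idx, (k, v) -> v - 1);
    }
    for (int copies : counts.values()) {
      if (copies > 1) {
        return false;
      }
    }
    return true;
  }
}
```

For example, with replicas at indexes 1, 2, 3, 4, 5, 5 and one pending delete for index 5, the over-replication is corrected once the delete completes; with no pending deletes it is not.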

umamaheswararao · 2022-06-14T14:56:31Z

@sodonnel overall the patch looks good to me. After fixing the typo in the method name, I am +1 on this.

umamaheswararao

LGTM, pending CI

* master: (34 commits)
  HDDS-6868 Add S3Auth information to thread local (apache#3527)
  HDDS-6877. Keep replication port unchanged when restarting datanode in MiniOzoneCluster (apache#3510)
  HDDS-6907. OFS should create buckets with FILE_SYSTEM_OPTIMIZED layout. (apache#3528)
  HDDS-6875. Migrate parameterized tests in hdds-common to JUnit5 (apache#3513)
  HDDS-6924. OBJECT_STORE isn't flat namespaced (apache#3533)
  HDDS-6899. [EC] Remove warnings and errors from console during online reconstruction of data. (apache#3522)
  HDDS-6695. Enable SCM Ratis by default for new clusters only (apache#3499)
  HDDS-4123. Integrate OM Open Key Cleanup Service Into Existing Code (apache#3319)
  HDDS-6882. Correct exit code for invalid arguments passed to command-line tools. (apache#3517)
  HDDS-6890. EC: Fix potential wrong replica read with over-replicated container. (apache#3523)
  HDDS-6902. Duplicate mockito-core entries in pom.xml (apache#3525)
  HDDS-6752. Migrate tests with rules in hdds-server-scm to JUnit5 (apache#3442)
  HDDS-6806. EC: Implement the EC Reconstruction coordinator. (apache#3504)
  HDDS-6829. Limit the no of inflight replication tasks in SCM. (apache#3482)
  HDDS-6898. [SCM HA finalization] Modify acceptance test configuration to speed up test finalization (apache#3521)
  HDDS-6577. Configurations to reserve HDDS volume space. (apache#3484)
  HDDS-6870 Clean up isTenantAdmin to use UGI (apache#3503)
  HDDS-6872. TestAuthorizationV4QueryParser should pass offline (apache#3506)
  HDDS-6840. Add MetaData volume information to the SCM and OM - UI (apache#3488)
  HDDS-6697. EC: ReplicationManager - create class to detect EC container health issues (apache#3512)
  ...

HDDS-6697. EC: ReplicationManager - create class to detect EC container health issues (apache#3512) (cherry picked from commit be29c67) Change-Id: I5f895f60d7ecba2a3b37dfd92c42fce7fb2c9611

S O'Donnell added 2 commits June 13, 2022 12:52

Change missingNonMaintenanceIndexes to unavailable indexes and exclude decommissioning indexes from the list

6009bee

HDDS-6697. EC: ReplicationManager - create class to detect container health issues

3eaaabd

umamaheswararao reviewed Jun 14, 2022

Fixed typo in method name

09caf57

umamaheswararao approved these changes Jun 14, 2022

umamaheswararao merged commit be29c67 into apache:master Jun 15, 2022

guihecheng pushed a commit to guihecheng/ozone that referenced this pull request Jun 28, 2022

Revert "HDDS-6697. EC: ReplicationManager - create class to detect EC container health issues (apache#3512)" This reverts commit be29c67.

bdc3383
