Skip to content

Conversation

@sodonnel
Copy link
Contributor

What changes were proposed in this pull request?

This change adds a class and tests intended to be used for checking if an EC container is over or under replicated, while also considering decommission and maintenance. Given the container details and replicas, there are a series of methods used to check for over and under replication. There are also methods to return the under or over replicated indexes. The intention of this class, is to be used by Replication manager and any other parts of the code that need to make decisions based on the replicated state of EC containers.

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-6704

How was this patch tested?

New unit tests

Copy link
Contributor

@umamaheswararao umamaheswararao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sodonnel thanks for the patch. I have dropped few minors/nits. Thanks

* in maintenance. Note it is possible for an index to be
* decommissioning, healthy and in maintenance, if there are multiple copies
* of it.
* @return
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: forgot to mention what we are returning?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed this.

* Return true if there are insufficient replicas to recover this container.
* Ie, less than EC Datanum containers are present.
* @return True if the container cannot be recovered, false otherwise.
*/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this method is canRecover?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably unrecoverable makes sense, rather than missing

private final Map<Integer, Integer> decommissionIndexes = new HashMap<>();
private final Map<Integer, Integer> maintenanceIndexes = new HashMap<>();

public ECContainerReplicaCount(ContainerInfo containerInfo,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IndexesPendingAdd means that we already schedule with some of previous iteration?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the command to create them has been scheduled and may not have completed yet. Or it may fail / timeout.

/**
* Tests for EcContainerReplicaCounts.
*/
public class TestECContainerReplicaCount {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like pendingAdd list related tests not covered?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea I have missed this. They are only used in missingNonMaintenanceIndexes(), so I will add a test there.

Copy link
Contributor

@JacksonYao287 JacksonYao287 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @sodonnel for the patch , i have few comment , please take a look.

* @param index The replica index to check.
* @Throws IllegalArgumentException if the index is out of bounds.
*/
private void ensureIndexWithinBounds(Integer index) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT: we could add an info to indicate in which set the index is out of bound.
Eg.

ensureIndexWithinBounds(Interger index, String setName) {
    if (index < 1 || index > repConfig.getRequiredNodes()) {
      throw new IllegalArgumentException("Replica Index in {} must be between "
          + " 1 and " + repConfig.getRequiredNodes(), setName);
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, good idea. I also added the containerID.

Comment on lines +169 to +173
// Now we have a list of missing. Remove any pending add as they should
// eventually recover.
for (Integer i : pendingAdd) {
missing.remove(i);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pendingAdd may fail. should we remove them here?

Copy link
Contributor Author

@sodonnel sodonnel May 12, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, the intention in this case, is to get a list of indexes we must schedule new copies for, so we assume the adds will complete. From the java doc above:

and any pending add are assume to be created and omitted them from the returned list. This list can be used to determine which replicas must be recovered in a group, assuming the inflight replicas pending add complete successfully.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

got it

* @return
*/
private int getMaxMaintenance() {
return Math.max(0, repConfig.getParity() - remainingMaintenanceRedundancy);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if i do not missunderstand , remainingMaintenanceRedundancy is just the maximum number of replicas that are allowed to exist only on a maintenance node. should we just return remainingMaintenanceRedundancy ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remainingMaintenanceRedundancy is how many replicas we can still lose when some are in maintenance. For EC-3-2, and remainingRedundancy of 1, we can have 1 in maintenance without replicating.

For 6-3, and remainingRedundancy of 1, we can have 2 in maintenance without replicating.

In both these cases, one more replica could fail without affecting data availability.

It is also possible for someone to set an impossible remainingRedundancy, eg 3 for 3-2.

If we have 3 for 3-2, then this method returns zero, indicating we must make a new copy of all maintenance replicas.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

got it , thanks

* will have the pending deletes already removed.
* @return List of indexes which are over-replicated.
*/
public List<Integer> overReplicatedIndexes() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we return a Map<Integer , Integer> to indicate the index and the associated extra number?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should keep this idea is mind when we write the code to use this method. I am not yet sure how the code to deal with each over-replicated index will behave. Eg whether it should query the replicas again and work out which to remove or just use the output from this class.

I am guessing that code will need to get the replicas for a container, filter them for "IN_SERVICE" + Replica_Index and then schedule a delete of all but 1 of them, so knowing the total count is not going to help it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we could keep the code like this for now. when we implement EC container check and know how we will use it exactly, we can refine it if necessary

Copy link
Contributor

@umamaheswararao umamaheswararao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM from my side as this is currently evolving stage and not yet connected to existing code flows yet. Let's refine if needed when we use this class.

Copy link
Contributor

@JacksonYao287 JacksonYao287 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank @sodonnel for the patch and @umamaheswararao for the review! LGTM +1
I have commit this patch to master branch , and we can refine it if necessary when implementing EC container health check

@JacksonYao287 JacksonYao287 merged commit 86ffd94 into apache:master May 13, 2022
errose28 added a commit to errose28/ozone that referenced this pull request May 20, 2022
* master: (96 commits)
  HDDS-6738. Migrate tests with rules in hdds-server-framework to JUnit5 (apache#3415)
  HDDS-6650. S3MultipartUpload support update bucket usedNamespace. (apache#3404)
  HDDS-6491. Support FSO keys in getExpiredOpenKeys (apache#3226)
  HDDS-6596. EC: Support ListBlock from CoordinatorDN (apache#3410)
  HDDS-6737. Migrate parameterized tests in hdds-server-framework to JUnit5 (apache#3414)
  HDDS-6660: EC: Add the DN side Reconstruction Handler class. (apache#3399)
  HDDS-6750. Migrate simple tests in hdds-server-scm to JUnit5 (apache#3417)
  HDDS-6749. SCM includes itself as peer in addSCM request (apache#3413)
  HDDS-6657. Improve Ozone integrated Ranger configuration instructions (apache#3365)
  HDDS-6742. Audit operation category mismatch (apache#3407)
  HDDS-6748. Intermittent timeout in TestECBlockReconstructedInputStream#testReadDataWithUnbuffer (apache#3416)
  HDDS-6731. Migrate simple tests in hdds-server-framework to JUnit5 (apache#3412)
  HDDS-5919. In kubernetes OM HA has circular dependency on service availability (apache#3185)
  HDDS-6730. Migrate tests in hdds-tools to JUnit5 (apache#3402)
  HDDS-6630. Explicitly remove node after being chosen (apache#3332)
  HDDS-6560. Add general Grafana dashboard (apache#3285)
  HDDS-6704. EC: ReplicationManager - create version of ContainerReplicaCounts applicable to EC (apache#3405)
  HDDS-6680. Pre-Finalize behaviour for Bucket Layout Feature. (apache#3377)
  HDDS-6619. Add freon command to run r/w mix workload using ObjectStore APIs (apache#3383)
  HDDS-6734. ozone admin pipeline list CLI is not backward compatible (apache#3406)
  ...

Conflicts:
hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/scm/metadata/SCMMetadataStore.java
hadoop-hdds/interface-server/src/main/proto/SCMRatisProtocol.proto
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/metadata/SCMDBDefinition.java
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/metadata/SCMMetadataStoreImpl.java
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/server/StorageContainerManager.java
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants