@siddhantsangwan siddhantsangwan commented Jan 19, 2023

What changes were proposed in this pull request?

  • Replicas with UNHEALTHY state will not count toward sufficient replication.
  • A new constructor in RatisContainerReplicaCount that takes a list of ContainerReplicaOp. This is needed to ignore pending deletes on UNHEALTHY replicas.
  • Ensure Legacy RM is not affected.
  • New UTs.
  • This breaks the Ratis flow in the new RM, which is still WIP. Disabled a UT for now. It will be fixed when the rest of the tasks are in: https://issues.apache.org/jira/browse/HDDS-7785
  • There is a discrepancy in the meaning of "healthy replicas" between RatisContainerReplicaCount#isHealthy and getHealthyReplicaCount. This should become clear once the rest of the Jiras are in.
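
To illustrate the first two bullets, here is a minimal, self-contained sketch of the counting rule. It is not the code in this patch: `ReplicaCountSketch`, `Replica`, `PendingOp`, `State`, and `OpType` are hypothetical stand-ins for the real Ozone types, QUASI_CLOSED is assumed to count as healthy, and pending adds are ignored to keep the example short.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: none of these types are the real Ozone classes. */
public class ReplicaCountSketch {

  enum State { CLOSED, QUASI_CLOSED, UNHEALTHY }

  enum OpType { ADD, DELETE }

  /** A replica of one container on one datanode (hypothetical model). */
  static final class Replica {
    final String datanode;
    final State state;

    Replica(String datanode, State state) {
      this.datanode = datanode;
      this.state = state;
    }
  }

  /** A pending add or delete targeting one datanode (hypothetical model). */
  static final class PendingOp {
    final OpType type;
    final String datanode;

    PendingOp(OpType type, String datanode) {
      this.type = type;
      this.datanode = datanode;
    }
  }

  /**
   * Returns true if enough non-UNHEALTHY replicas remain after pending
   * deletes. Deletes that target UNHEALTHY replicas are ignored, because
   * those replicas were never counted in the first place.
   */
  static boolean isSufficientlyReplicated(
      List<Replica> replicas, List<PendingOp> pendingOps, int replicationFactor) {
    Map<String, State> stateByDatanode = new HashMap<>();
    int healthy = 0;
    for (Replica r : replicas) {
      stateByDatanode.put(r.datanode, r.state);
      if (r.state != State.UNHEALTHY) {
        healthy++;
      }
    }
    int pendingDeletesOnHealthy = 0;
    for (PendingOp op : pendingOps) {
      State target = stateByDatanode.get(op.datanode);
      if (op.type == OpType.DELETE && target != null && target != State.UNHEALTHY) {
        pendingDeletesOnHealthy++;
      }
    }
    return healthy - pendingDeletesOnHealthy >= replicationFactor;
  }

  public static void main(String[] args) {
    List<Replica> replicas = List.of(
        new Replica("dn1", State.CLOSED),
        new Replica("dn2", State.CLOSED),
        new Replica("dn3", State.CLOSED),
        new Replica("dn4", State.UNHEALTHY));
    // The pending delete targets the UNHEALTHY replica, so it is ignored.
    List<PendingOp> ops = List.of(new PendingOp(OpType.DELETE, "dn4"));
    System.out.println(isSufficientlyReplicated(replicas, ops, 3));
  }
}
```

In this sketch, two CLOSED replicas plus one UNHEALTHY replica would not satisfy a replication factor of 3, because the UNHEALTHY replica is not counted.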

See the Jira issue for a detailed description.

What is the link to the Apache JIRA?

https://issues.apache.org/jira/browse/HDDS-7785

How was this patch tested?

New UTs
CI run in my fork: https://github.com/siddhantsangwan/ozone/actions/runs/3956910453/jobs/6776657068

@siddhantsangwan siddhantsangwan changed the title from "Hdds 7804" to "HDDS-7804. UNHEALTHY replicas will not contribute to sufficient replication in RatisContainerReplicaCount" Jan 19, 2023

* prioritize creating delete commands for unhealthy replicas over quasi
* closed replicas.
*/
@Ignore("HDDS-7804")
Contributor

If this test is no longer valid based on the changes here, probably best to remove it rather than just ignore it.

Contributor Author

Yeah, I think we need a similar test later on but in a different context. I've removed it for now.

Contributor

@sodonnel sodonnel left a comment

LGTM, although I think we should remove the invalid test you have ignored, as it may just cause confusion later.

@adoroszlai adoroszlai merged commit be40803 into apache:master Jan 20, 2023
@adoroszlai
Contributor

Thanks @siddhantsangwan for the patch, @sodonnel for the review.

errose28 added a commit to errose28/ozone that referenced this pull request Apr 20, 2023
* master: (209 commits)
  HDDS-7097. Container scanner log output lacks useful information (apache#4169)
  HDDS-7813. Handle Mismatched Replicas (OPEN or CLOSING) of QUASI-CLOSED containers in RM (apache#4195)
  HDDS-7625. Do not compress OM/SCM checkpoints (apache#4130)
  HDDS-7801. Bucket not found when calling getKeyInfo with tenant context (apache#4189)
  HDDS-7807. TarContainerPacker closes streams multiple times (apache#4193)
  HDDS-7755. Ensure that acquired locks are always released. (apache#4191)
  HDDS-7804. UNHEALTHY replicas will not contribute to sufficient replication in RatisContainerReplicaCount (apache#4192)
  HDDS-7748. Rename OMFileRequest.addToOpenFileTable() to avoid misuse. (apache#4176)
  HDDS-7723. Refresh Keys and Certificate used in OzoneSecretManager after certificate renewed (apache#4179)
  HDDS-7788. Ratis OverReplicationHandler should exclude stale replicas (apache#4183)
  HDDS-7718. Bump Netty to 4.1.86 and gRPC to 1.51.1 (apache#4139)
  HDDS-7542. Refactor DefaultReplicationConfig (apache#4005)
  HDDS-7787. GetChecksum for EC files can fail intermittently with IndexOutOfBounds exception (apache#4180)
  HDDS-7754. Download of container is failing with SSL/TLS error during re-replication (apache#4174)
  HDDS-7455. ClassCastException: OzoneTokenIdentifier cannot be cast to String (apache#4159)
  HDDS-7441. Rename function names of retrieving metadata keys (apache#3918)
  HDDS-7722. FSO buckets fail to invalidate open file table cache when committing a key (apache#4156)
  HDDS-7774. Update outdated Trash documentation (apache#4172)
  HDDS-7761. EC: ReplicationManager - Use placementPolicy.replicasToRemoveToFixOverreplication in EC Over replication handler (apache#4166)
  HDDS-7775. EC: Exception encountered while deleting UNHEALTHY replica in Datanode (apache#4173)
  ...