@sodonnel

What changes were proposed in this pull request?

A stale DN is more than likely dead, so RatisOverReplicationHandler should exclude stale replicas before checking for over-replication. That way, stale replicas do not count toward over-replication. If we continue to consider a stale replica as "present" for over-replication, we may remove another replica, and then the stale one could go dead and get removed too, resulting in under-replication.
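
To illustrate the idea, here is a minimal standalone sketch. It is not the actual patch: `StaleReplicaFilterSketch`, `Replica`, `NodeState`, `excludeStale` and `excessReplicas` are hypothetical names made up for this example and do not correspond to the real Ozone classes or methods. It only shows that filtering stale replicas first changes the over-replication count.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical, simplified model; not the real Ozone replication manager types.
public class StaleReplicaFilterSketch {

    // Assumed node states for illustration only.
    public enum NodeState { HEALTHY, STALE }

    // A replica together with the state of the datanode that hosts it.
    public static final class Replica {
        private final String datanode;
        private final NodeState nodeState;

        public Replica(String datanode, NodeState nodeState) {
            this.datanode = datanode;
            this.nodeState = nodeState;
        }

        public String getDatanode() {
            return datanode;
        }

        public NodeState getNodeState() {
            return nodeState;
        }
    }

    // Drop replicas hosted on stale datanodes before any counting happens.
    public static List<Replica> excludeStale(List<Replica> replicas) {
        return replicas.stream()
            .filter(r -> r.getNodeState() != NodeState.STALE)
            .collect(Collectors.toList());
    }

    // Number of replicas that could be deleted, counting only non-stale ones.
    public static int excessReplicas(List<Replica> replicas, int replicationFactor) {
        int usable = excludeStale(replicas).size();
        return Math.max(0, usable - replicationFactor);
    }
}
```

With a replication factor of 3 and four replicas of which one is on a stale node, this sketch reports no excess, so nothing would be scheduled for deletion. If the stale replica were counted, one healthy replica could be removed and the container would drop to two replicas once the stale node goes dead.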

What is the link to the Apache JIRA?

https://issues.apache.org/jira/browse/HDDS-7788

How was this patch tested?

New unit test added.

@adoroszlai adoroszlai left a comment

Thanks @sodonnel for the improvement, LGTM.

@siddhantsangwan siddhantsangwan left a comment

I have a minor comment, otherwise LGTM.

Comment on lines 98 to 101
/**
* Handler should create one delete command when a closed ratis container
* has 5 replicas and 1 pending delete.
*/

Description is incorrect for this test

@sodonnel sodonnel merged commit 3bc2b79 into apache:master Jan 18, 2023
errose28 added a commit to errose28/ozone that referenced this pull request Apr 20, 2023
* master: (209 commits)
  HDDS-7097. Container scanner log output lacks useful information (apache#4169)
  HDDS-7813. Handle Mismatched Replicas (OPEN or CLOSING) of QUASI-CLOSED containers in RM (apache#4195)
  HDDS-7625. Do not compress OM/SCM checkpoints (apache#4130)
  HDDS-7801. Bucket not found when calling getKeyInfo with tenant context (apache#4189)
  HDDS-7807. TarContainerPacker closes streams multiple times (apache#4193)
  HDDS-7755. Ensure that acquired locks are always released. (apache#4191)
  HDDS-7804. UNHEALTHY replicas will not contribute to sufficient replication in RatisContainerReplicaCount (apache#4192)
  HDDS-7748. Rename OMFileRequest.addToOpenFileTable() to avoid misuse. (apache#4176)
  HDDS-7723. Refresh Keys and Certificate used in OzoneSecretManager after certificate renewed (apache#4179)
  HDDS-7788. Ratis OverReplicationHandler should exclude stale replicas (apache#4183)
  HDDS-7718. Bump Netty to 4.1.86 and gRPC to 1.51.1 (apache#4139)
  HDDS-7542. Refactor DefaultReplicationConfig (apache#4005)
  HDDS-7787. GetChecksum for EC files can fail intermittently with IndexOutOfBounds exception (apache#4180)
  HDDS-7754. Download of container is failing with SSL/TLS error during re-replication (apache#4174)
  HDDS-7455. ClassCastException: OzoneTokenIdentifier cannot be cast to String (apache#4159)
  HDDS-7441. Rename function names of retrieving metadata keys (apache#3918)
  HDDS-7722. FSO buckets fail to invalidate open file table cache when committing a key (apache#4156)
  HDDS-7774. Update outdated Trash documentation (apache#4172)
  HDDS-7761. EC: ReplicationManager - Use placementPolicy.replicasToRemoveToFixOverreplication in EC Over replication handler (apache#4166)
  HDDS-7775. EC: Exception encountered while deleting UNHEALTHY replica in Datanode (apache#4173)
  ...
3 participants