HDDS-10113. UNHEALTHY replicas of QUASI_CLOSED container with unique origins should be handled during decommission #5984

siddhantsangwan · 2024-01-11T13:41:49Z

What changes were proposed in this pull request?

A QUASI_CLOSED container may have UNHEALTHY replicas that have the correct sequence ID and unique origin Datanodes. If the Datanode hosting one of these UNHEALTHY replicas is being taken offline, the replica needs to be replicated to another Datanode for decommission to progress. Currently, decommission simply proceeds without replication, and such an UNHEALTHY replica is lost.

We try to save such UNHEALTHY replicas because HDDS may in future be able to restore them to a healthy state. These replicas could then be used to achieve quorum and close the QUASI_CLOSED container.

This PR changes VulnerableUnhealthyReplicasHandler. Previously, it queued the container only if the healthy replicas did not have the correct sequence ID. Now, it checks whether an UNHEALTHY replica:

has the correct sequence ID, and
has a unique origin, i.e., there is no other replica on a healthy, in-service node with the same origin node ID.
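The two checks above, combined with the "being taken offline" condition, can be sketched roughly as follows. This is a simplified, hypothetical illustration, not the actual VulnerableUnhealthyReplicasHandler code: the `Replica` record, the `ReplicaState` and `OpState` enums, and `findVulnerableUnhealthy` are stand-ins invented for this sketch. It reads "no other replica on a healthy, in-service node" literally, so a replica in any state on such a node covers its origin.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class VulnerableUnhealthySketch {

  // Hypothetical, simplified stand-ins for SCM's replica and node types.
  enum ReplicaState { QUASI_CLOSED, CLOSED, UNHEALTHY }
  enum OpState { IN_SERVICE, DECOMMISSIONING, ENTERING_MAINTENANCE }

  record Replica(String datanodeId, String originId, ReplicaState state,
                 long sequenceId, OpState opState, boolean nodeHealthy) { }

  // Returns UNHEALTHY replicas of a QUASI_CLOSED container that would be lost
  // if their Datanodes go offline: correct sequence ID, hosted on a node that
  // is not IN_SERVICE, and no replica on a healthy IN_SERVICE node shares
  // their origin.
  static List<Replica> findVulnerableUnhealthy(long containerSeqId,
                                               List<Replica> replicas) {
    // Origins already covered by a replica that will remain available.
    Set<String> coveredOrigins = new HashSet<>();
    for (Replica r : replicas) {
      if (r.nodeHealthy() && r.opState() == OpState.IN_SERVICE) {
        coveredOrigins.add(r.originId());
      }
    }

    List<Replica> vulnerable = new ArrayList<>();
    for (Replica r : replicas) {
      boolean unhealthyWithCorrectSeq = r.state() == ReplicaState.UNHEALTHY
          && r.sequenceId() == containerSeqId;
      boolean goingOffline = r.opState() != OpState.IN_SERVICE;
      if (unhealthyWithCorrectSeq && goingOffline
          && !coveredOrigins.contains(r.originId())) {
        vulnerable.add(r);
      }
    }
    return vulnerable;
  }

  public static void main(String[] args) {
    List<Replica> replicas = List.of(
        new Replica("dn1", "dn1", ReplicaState.QUASI_CLOSED, 10, OpState.IN_SERVICE, true),
        new Replica("dn2", "dn2", ReplicaState.QUASI_CLOSED, 10, OpState.IN_SERVICE, true),
        // Unique origin, correct sequence ID, node decommissioning: vulnerable.
        new Replica("dn3", "dn3", ReplicaState.UNHEALTHY, 10, OpState.DECOMMISSIONING, true),
        // Stale sequence ID: not worth saving.
        new Replica("dn4", "dn4", ReplicaState.UNHEALTHY, 9, OpState.DECOMMISSIONING, true),
        // Origin dn1 is already covered by an in-service replica.
        new Replica("dn5", "dn1", ReplicaState.UNHEALTHY, 10, OpState.ENTERING_MAINTENANCE, true));
    List<Replica> result = findVulnerableUnhealthy(10, replicas);
    System.out.println(result.size() + " " + result.get(0).datanodeId()); // prints "1 dn3"
  }
}
```

Under these assumptions, only the replica on dn3 would need to be replicated before its Datanode leaves service; the stale-sequence replica and the replica whose origin is still represented in service are left alone.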

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-10113

How was this patch tested?

Added unit tests.


sodonnel

LGTM - thanks for fixing this quickly.


…with unique origins should be handled during decommission (apache#5984) (cherry picked from commit 46b6f3d) Change-Id: I7e178ab4f098de596310d0f87212f1144ddb9da2

2816f3b: HDDS-10113. UNHEALTHY replicas of QUASI_CLOSED container with unique origins should be handled during decommission

siddhantsangwan requested a review from sodonnel January 11, 2024 13:41

sodonnel approved these changes Jan 11, 2024


sodonnel merged commit 46b6f3d into apache:master Jan 11, 2024

adoroszlai pushed a commit to adoroszlai/ozone that referenced this pull request Jan 25, 2024

edf1b18: HDDS-10113. UNHEALTHY replicas of QUASI_CLOSED container with unique origins should be handled during decommission (apache#5984) (cherry picked from commit 46b6f3d)

adoroszlai mentioned this pull request Jan 25, 2024

[DO NOT MERGE] Backport some fixes from master to ozone-1.4 #6096 (Merged)
