@siddhantsangwan
Contributor

What changes were proposed in this pull request?

A QUASI_CLOSED container may have UNHEALTHY replicas that carry the correct sequence ID and have unique origin Datanodes. If the Datanode hosting such a replica is being taken offline, the replica needs to be replicated to another Datanode before decommission can progress. Currently, decommission simply proceeds without replication, and the UNHEALTHY replica is lost.

We try to preserve such UNHEALTHY replicas because HDDS may in the future be able to restore them to a healthy state. Once restored, they could be used to achieve quorum and close the QUASI_CLOSED container.

This PR changes VulnerableUnhealthyReplicasHandler. Previously, it queued the container only if the healthy replicas didn't have the correct sequence ID. Now, it checks whether an UNHEALTHY replica:

  1. Has the correct sequence ID.
  2. Has a unique origin, i.e., there is no other replica on a healthy, in-service node with the same origin node ID.
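
To make the two conditions concrete, here is a minimal, self-contained sketch of the check. It is not the code in this PR: `VulnerableReplicaSketch`, `Replica`, `ReplicaState`, and `hasVulnerableUnhealthyReplica` are hypothetical names invented for illustration, and the single `onHealthyInServiceNode` flag is an assumed simplification of the node health and operational state that the real handler would look up.

```java
import java.util.List;

/** Illustrative sketch only; all names here are hypothetical, not Ozone APIs. */
final class VulnerableReplicaSketch {

  enum ReplicaState { QUASI_CLOSED, UNHEALTHY }

  /** Simplified stand-in for a container replica (assumed shape). */
  static final class Replica {
    final ReplicaState state;
    final long sequenceId;
    final String originNodeId;
    // Assumption: true when the hosting node is healthy and in service,
    // false when it is, for example, being decommissioned.
    final boolean onHealthyInServiceNode;

    Replica(ReplicaState state, long sequenceId, String originNodeId,
        boolean onHealthyInServiceNode) {
      this.state = state;
      this.sequenceId = sequenceId;
      this.originNodeId = originNodeId;
      this.onHealthyInServiceNode = onHealthyInServiceNode;
    }
  }

  /**
   * Returns true if some UNHEALTHY replica on a node that is not healthy and
   * in service has the container's sequence ID and an origin that no other
   * replica on a healthy, in-service node shares.
   */
  static boolean hasVulnerableUnhealthyReplica(long containerSequenceId,
      List<Replica> replicas) {
    for (Replica candidate : replicas) {
      if (candidate.state != ReplicaState.UNHEALTHY
          || candidate.onHealthyInServiceNode
          || candidate.sequenceId != containerSequenceId) {
        continue; // not an at-risk UNHEALTHY replica with the right sequence ID
      }
      boolean originCoveredElsewhere = false;
      for (Replica other : replicas) {
        if (other != candidate
            && other.onHealthyInServiceNode
            && other.originNodeId.equals(candidate.originNodeId)) {
          originCoveredElsewhere = true;
          break;
        }
      }
      if (!originCoveredElsewhere) {
        return true; // unique origin would be lost if the node goes offline
      }
    }
    return false;
  }
}
```

In this sketch, a container for which the method returns true would be queued for replication so that the at-risk replica is copied before its node goes offline.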

What is the link to the Apache JIRA?

https://issues.apache.org/jira/browse/HDDS-10113

How was this patch tested?

Added unit tests.

…origins should be handled during decommission
Contributor

@sodonnel sodonnel left a comment

LGTM - thanks for fixing this quickly.

@sodonnel sodonnel merged commit 46b6f3d into apache:master Jan 11, 2024
adoroszlai pushed a commit to adoroszlai/ozone that referenced this pull request Jan 25, 2024
…origins should be handled during decommission (apache#5984)

(cherry picked from commit 46b6f3d)
jojochuang pushed a commit to jojochuang/ozone that referenced this pull request Feb 1, 2024
…with unique origins should be handled during decommission (apache#5984)

(cherry picked from commit 46b6f3d)
Change-Id: I7e178ab4f098de596310d0f87212f1144ddb9da2