Skip to content

Conversation

@ChenSammi
Copy link
Contributor

@adoroszlai adoroszlai requested a review from sodonnel July 6, 2020 08:36
Copy link
Contributor

@xiaoyuyao xiaoyuyao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @ChenSammi for reporting the issue and propose the fix. The change LGTM.

I have a question wrt the excess calculation in handleOverReplicationContainer where we only consider inflightDeletion without inflightReplication at around line 615. Could this contribute the over replication issue as we will break out when a smaller excess value reaches 0 but the inflighreplication is still going?

@sodonnel
Copy link
Contributor

sodonnel commented Jul 7, 2020

Thanks for this change. I also wonder if there is a bug in the method isContainerUnderReplicated(...) which leads to this problem, and results in more processing that necessary for mis-replicated containers with inFlight Additions.

In isContainerUnderReplicated is uses only the live replicas to check for mis-replication, but then it considers inflight add and delete for under replication. Should we also include the inflightAdds when considering mis-replication in that method?

For isContainerOverReplicated it also uses the inflightAdds and deletes to check for over replication.

@ChenSammi
Copy link
Contributor Author

Thanks @ChenSammi for reporting the issue and propose the fix. The change LGTM.

I have a question wrt the excess calculation in handleOverReplicationContainer where we only consider inflightDeletion without inflightReplication at around line 615. Could this contribute the over replication issue as we will break out when a smaller excess value reaches 0 but the inflighreplication is still going?

@xiaoyuyao , I think inflightReplication not considered here is safer since replication has the change to fail. Imaging we have 2 healthy replicas and 2 inflight replications, this case, send the command to delete the extra 1 replica until we are sure that we have 4 healthy replicas in hand.

@ChenSammi
Copy link
Contributor Author

Thanks for this change. I also wonder if there is a bug in the method isContainerUnderReplicated(...) which leads to this problem, and results in more processing that necessary for mis-replicated containers with inFlight Additions.

In isContainerUnderReplicated is uses only the live replicas to check for mis-replication, but then it considers inflight add and delete for under replication. Should we also include the inflightAdds when considering mis-replication in that method?

For isContainerOverReplicated it also uses the inflightAdds and deletes to check for over replication.

It's a good point. I don't think it's a real bug, but we can improve it. HDDS-3942 to track it.

Copy link
Contributor

@sodonnel sodonnel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This revision LGTM, pending green CI run. +1.

Lets wait for Xiaoyu to check he is happy before committing.

@ChenSammi
Copy link
Contributor Author

testDeleteKeyWithSlowFollower failed at leader membership check step. The test passed locally. It seems a timing issue, not relevant to this patch.

@ChenSammi ChenSammi merged commit 5977168 into apache:master Jul 13, 2020
@ChenSammi
Copy link
Contributor Author

Thanks @sodonnel and @xiaoyuyao for the review.

@xiaoyuyao
Copy link
Contributor

LGTM, +1. Thanks @ChenSammi for the contribution and all for the reviews. I will merge the PR shortly.

ChenSammi added a commit that referenced this pull request Jul 22, 2020
rakeshadr pushed a commit to rakeshadr/hadoop-ozone that referenced this pull request Sep 3, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants