Skip to content

Conversation

@JacksonYao287
Copy link
Contributor

What changes were proposed in this pull request?

add a move API in replication manager , and it will be used by container balancer

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-4928

How was this patch tested?

add unit test

@JacksonYao287
Copy link
Contributor Author

@ChenSammi @GlenGeng @lokeshj1703 @siddhantsangwan PATL!

I will add more test case for move in additional commits of this PR

Copy link
Contributor

@lokeshj1703 lokeshj1703 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this PR @JacksonYao287 ! The changes look good to me. I have few comments inline.

I was also thinking if we could simplify the logic in RM#handleOverReplicatedContainer. How about we always try to make the container over-replicated so that move can succeed. I think in handleOverReplicatedContainer if src and target are both present and container is over-replicated, delete can be sent for source.
To make sure that container is over-replicated, maybe changes can be made in ContainerReplicaCount so that move target is not included in replica count? This would make sure that finally handleOverReplicatedContainer would have an over-replicated state so that it can just delete source dn.

@JacksonYao287
Copy link
Contributor Author

Thanks @lokeshj1703 very much for this review, i will update this PR according to your comments.

I was also thinking if we could simplify the logic in RM#handleOverReplicatedContainer. How about we always try to make the container over-replicated so that move can succeed. I think in handleOverReplicatedContainer if src and target are both present and container is over-replicated, delete can be sent for source.

this idea seems good, but i have a little question. i think even if src and target are both present and container is over-replicated , deletion can not be definitely sent for source. if the placementPolicy is not met after src is deleted, i think we can not directly send a delete command to src.

@lokeshj1703
Copy link
Contributor

this idea seems good, but i have a little question. i think even if src and target are both present and container is over-replicated , deletion can not be definitely sent for source. if the placementPolicy is not met after src is deleted, i think we can not directly send a delete command to src.

We can use a simple approach initially where if placement policy is not satisfied with replicas + target - src, we fail the move future.

@JacksonYao287
Copy link
Contributor Author

JacksonYao287 commented Jul 5, 2021

i have updated the PR , @lokeshj1703 PTAL! if there is no more logic problem , i will add unit test for move

Comment on lines 1179 to 1192
if (isTargetDnInReplicaSet) {
if (isSourceDnInReplicaSet) {
// if the target is present, and source disappears somehow,
// we can consider move is successful.
inflightMoveFuture.get(id).complete(MoveResult.COMPLETED);
inflightMove.remove(id);
inflightMoveFuture.remove(id);
} else {
// if the container is in inflightMove and target datanode is
// included in the replicas, then swap the source datanode to
// first of the replica list if exists, so the source datanode
// will be first removed if possible.
eligibleReplicas.add(0, eligibleReplicas.remove(sourceDnPos));
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please explain this part? If both target and source are present in replicas, then source should be swapped with the first node in replicas list so it can be deleted, right? And move completes if source is absent while target is present. It seems the logic of if and else parts is opposite.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea, thanks @siddhantsangwan for pointing out this! I explained the logic in the comment, which is just what you said, but i made a mistake in the code. i will fix this , thanks!

@JacksonYao287 JacksonYao287 requested a review from lokeshj1703 July 7, 2021 13:08
@lokeshj1703
Copy link
Contributor

@JacksonYao287 Thanks for updating the PR! The logic looks good to me.

@lokeshj1703
Copy link
Contributor

@JacksonYao287 There is one optimization which we can use for move.
https://github.com/apache/ozone/pull/2349/files#diff-7b38a0477ad3f17c66925825066d4f153b69efbf37288b2c71cd573313771122R1265-R1270

If placement policy was initially not satisfied then deletion for source can be sent provided placement count remains same after deletion. I think it would be a good idea to abstract out logic here
https://github.com/apache/ozone/pull/2349/files#diff-7b38a0477ad3f17c66925825066d4f153b69efbf37288b2c71cd573313771122R1265-R1270
and reuse it for move and over-replication.

@JacksonYao287
Copy link
Contributor Author

JacksonYao287 commented Jul 12, 2021

If placement policy was initially not satisfied then deletion for source can be sent provided placement count remains same after deletion. I think it would be a good idea to abstract out logic here and reuse it for move and over-replication.

yea, looks good, i will make this change and add some unit tests for move.

by the way , i think !ps.isPolicySatisfied() before nowPS.actualPlacementCount() == ps.actualPlacementCount() is useless, we can remove it

Copy link
Contributor

@lokeshj1703 lokeshj1703 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JacksonYao287 Thanks for updating the PR! The UTs look really good.
I would suggest adding a UT to make sure RM does not delete replica if placement policy is not satisfied.

@JacksonYao287
Copy link
Contributor Author

@lokeshj1703 Thanks for the review

I would suggest adding a UT to make sure RM does not delete replica if placement policy is not satisfied.

i will add this UT

@JacksonYao287
Copy link
Contributor Author

JacksonYao287 commented Jul 16, 2021

@ChenSammi @siddhantsangwan PTAL!

Copy link
Contributor

@lokeshj1703 lokeshj1703 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JacksonYao287 Thanks for updating the PR! The changes look good to me. +1.
I have a minor comment inline.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants