Skip to content

Conversation

@GlenGeng-awx
Copy link
Contributor

What changes were proposed in this pull request?

As part of SCMCommand SCM will also send its current term, which will be used in Datanode to identify if the command was sent by the latest leader SCM.
 
Datanode will maintain the highest term that it has seen and compare it with the term that is received as part of SCMCommand.
If the term in the Datanode and SCMCommand are same, the command is added to the command queue for processing.
If the term in the Datanode is less than the term received in SCMCommand, Datanode will update its term and add the command to the command queue for processing.
If the term in the Datanode is greater than the term received in SCMCommand, Datanode will ignore the command.

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-3988

How was this patch tested?

CI

@GlenGeng-awx GlenGeng-awx force-pushed the HDDS-3988 branch 2 times, most recently from ea538f9 to b0ed636 Compare August 19, 2020 02:00
return;
}

termOfLeaderSCM = Optional.of(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we use AtomicLong for termOfLeaderSCM?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good Point.
Since this is executed in a single thread, let's do this first. We surely need to consider the thread safety in future change.

@amaliujia
Copy link
Contributor

cc @amaliujia

}

updateTermOfLeaderSCM(command);
if (command.getTerm() == termOfLeaderSCM.get()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am confused on when termOfLeaderSCM is updated to the newest leader term. Is termOfLeaderSCM updated in during leader selection?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean whether termOfLeaderSCM is updated in leader election of SCM ? No, it won't. Datanode detects the latest SCM term by heartbeat with SCMs, whose interval is larger than 30s.

@Override
public boolean isLeader() {
return isLeader;
public Optional<Long> isLeader() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

another challenge is how to test such change. ideally we can have a minicluster setup with configurable nodes so we can simulate the scenario of split brian.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. We need add a lot of test cases, including UT and acceptance test, before merging 2823 back to master.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. I have been thinking about it and we might be some testing infrastructure to simulate these complicated consensus cases. E.g. split-brian.

OM HA might have done something similar.

@GlenGeng-awx GlenGeng-awx force-pushed the HDDS-3988 branch 3 times, most recently from 2ecae8f to 6645cab Compare November 19, 2020 12:24
@GlenGeng-awx
Copy link
Contributor Author

@ChenSammi @timmylicheng @nandakumar131 CI has been passed upon the latest HDDS-2823. Please revise this PR, Thanks!

* term in the command so that datanode can distinguish commands from stale
* leader SCM by comparing term.
*
* There are 7 SCMCommands:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Try not to list all the commands by name. It would be hard to maintain the correctness of the comments.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I've removed the comments for good.

As discussed in design of SCMContext, we will query and inject the term before firing event of DATANODE_COMMAND explicitly for all related places. I will revisit this code in dev of phase 2.0.

@timmylicheng
Copy link
Contributor

timmylicheng commented Dec 3, 2020

LGTM overall. Left comments inline on a minor issue.
Also would like to see some level of tests on this feature.

@GlenGeng-awx
Copy link
Contributor Author

@timmylicheng Thanks for looking at this patch!

We surely need unit tests for this feature. I've file a new for this task https://issues.apache.org/jira/browse/HDDS-4544

@timmylicheng
Copy link
Contributor

+1. Merging

@timmylicheng timmylicheng merged commit 0aa9ba3 into apache:HDDS-2823 Dec 7, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants