Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Quorum raft upgrade v3.3.13 #887

Merged
merged 45 commits into from
Dec 4, 2019
Merged

Quorum raft upgrade v3.3.13 #887

merged 45 commits into from
Dec 4, 2019

Conversation

vsmk98
Copy link
Contributor

@vsmk98 vsmk98 commented Nov 19, 2019

This PR contains the following changes to Raft consensus:

  • Upgrade quorum's Raft library to version v3.3.13 from upstream.
  • Raft API to add raft learner node in quorum. learner node will follow the leader but will not vote or take part in quorum for commits.
  • Raft API for promoting a learner node to peer node
  • Disallow learner node from adding another learner or peer
  • Disallow learner node from promoting itself or other learners to peer
  • Disallow learner node to remove other peers or learners from raft cluster
  • Allow learner node to remove itself from raft cluster

#825, #790, #852, #180

amalraj.manigmail.com and others added 30 commits June 6, 2019 09:37
fix learner node changing to voter node when network restarted
…d-upgrade

# Conflicts:
#	vendor/github.com/coreos/etcd/raft/raft.go
add --raftlearner flag to geth to specify node type when joining
…ly when it joins cluster first time and remove my fixes for the same
amalraj.manigmail.com and others added 7 commits September 29, 2019 20:00
….3.13) package. This is based on the same convention followed by previous version(3.1.0)
allow raft learner node to call removePeer if raftId to be removed is its own raftId
update raft.cluster api to show minter instead of leader
update raft consensus documentation
code improvements
…-upgrade-v3.3.13

# Conflicts:
#	raft/handler.go
#	raft/peer.go
@lookfwd
Copy link

lookfwd commented Nov 22, 2019

Wondering how has this been tested?

@amalrajmani
Copy link
Contributor

Wondering how has this been tested?

When a new node joins the network as a peer, it tries to sync with leader and during this time the leader is overloaded and can potentially lead to periods of cluster unavailability. Please refer to https://github.com/etcd-io/etcd/blob/master/Documentation/learning/design-learner.md for further details on various scenarios. The advantage of a learner node is it joins the network as a non voting member and hence does not impact the Raft quorum.
For testing this, brought up a 3 node raft network and created few thousand blocks. Then added a 4th node as peer. While the 4th node is syncing with the network, brought down one of the other nodes. Now if any transaction is send, the transaction will wait till the newly joined node is fully synced with the network.
Repeated the same test adding the 4th node as learner. In this case the network will be able to process the transactions while the new learner node is still syncing with the network

Copy link
Contributor

@jbhurat jbhurat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When a node is learner and it calls raft.removePeer() on itself, it returns null. When checking raft.cluster the node is still seen with a verifier role. Running raft.role returns an empty string. Running raft.cluster from any other node returns correct results though

jbhurat
jbhurat previously approved these changes Nov 26, 2019
Copy link
Contributor

@jbhurat jbhurat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@amalrajmani
Copy link
Contributor

When a node is learner and it calls raft.removePeer() on itself, it returns null. When checking raft.cluster the node is still seen with a verifier role. Running raft.role returns an empty string. Running raft.cluster from any other node returns correct results though

this has been fixed now. after a learner/peer node removes itself it will return an empty array when raft.cluster is called

@jpmsam jpmsam requested a review from trung November 29, 2019 19:15
zzy96 and others added 2 commits December 2, 2019 22:58
1. let raft step return error when proposal is dropped to allow fail-fast
2. Propose in raft node wait the proposal result so we can fail fast while dropping proposal
Copy link
Contributor

@trung trung left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Changes in the original etcd raft library:

@jpmsam jpmsam merged commit dfd93ff into master Dec 4, 2019
@trung trung deleted the Quorum-raft-upgrade-v3.3.13 branch December 6, 2019 14:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants