Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

startStopReplication: Handle Existing Peer Update #419

Merged
merged 1 commit into from
Sep 14, 2020

Conversation

alexanderbez
Copy link
Contributor

@alexanderbez alexanderbez commented Aug 25, 2020

Currently, when Raft#startStopReplication is executed from a config change update, it handles two cases: adding a new peer or removing an existing peer. However, if a configuration change resulted in a peer change (e.g. server address changed), Raft#startStopReplication won't catch this and heartbeats will be sent to the old address.

e.g.

  1. Create a cluster {A, B, C}
  2. Stop node A & wait for view change
  3. Start A with new IP address
  4. Observe AddStaging is handled by Raft#appendConfigurationEntry
  5. Observe new configuration is persisted
  6. Observe Raft#startStopReplication is called, but Raft keeps A with the old server address
  7. Observe heartbeats timeout

I'm not sure if we need to cleanup any routines or replication, so I just have the followerReplication.server field updated for now.

r.leaderState.replState[server.ID] = s
r.goFunc(func() { r.replicate(s) })
asyncNotifyCh(s.triggerCh)
r.observe(PeerObservation{Peer: server, Removed: false})
} else if ok && s.peer.Address != server.Address {
r.logger.Info("updating peer", "peer", server.ID)
s.peer = server
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if you need to call asyncNotifyCh(s.triggerCh) like we are above?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps? What does that do? is it idempotent?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it tells raft to kick off replication to this server

raft/replication.go

Lines 156 to 158 in ff338ca

case <-s.triggerCh:
lastLogIdx, _ := r.getLastLog()
shouldStop = r.replicateTo(s, lastLogIdx)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. We're only change the peer address here. So it depends on if the replication layer will pick up this change or not? If we do call asyncNotifyCh, is this idempotent? Specifically, is there any harm in calling it?

@alexanderbez
Copy link
Contributor Author

Going to merge this. During my testing of this change, replication was still occurring so I believe the asyncNotifyCh(s.triggerCh) call is unnecessary.

@alexanderbez alexanderbez merged commit b7cd2b3 into master Sep 14, 2020
@alexanderbez alexanderbez deleted the handle-peer-addr-update branch September 14, 2020 14:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants