release/v21.03: fix(Raft): Detect network partition when streaming #7908

Merged Jun 22, 2021 (1 commit)

@danielmai commented Jun 21, 2021

When streaming Raft messages in a k8s cluster, we don't seem to get an
error when a send doesn't succeed. The packets get queued up, but they
neither fail nor get sent. This causes a long re-election process.

This PR periodically tries to send a message to the destination node via
IsPeer, so the sender has another way to test the connection. If that
check fails, the streaming fails too, and the node is marked as unreachable.


@danielmai requested a review from manishrjain as a code owner June 21, 2021 20:49
@danielmai merged commit 09c9548 into release/v21.03 Jun 22, 2021
@danielmai deleted the danielmai/v21.03-raft-partition-fix branch June 22, 2021 00:43