Stream Raft Messages and Fix Check Quorum #3138

manishrjain · 2019-03-14T18:52:48Z

Instead of sending Raft messages one per gRPC call, this PR creates a one-way stream between the sender and the receiver. Each message gets pushed to a channel. We use smart batching to pick up as many messages as we can and send them over the stream in order. If we see connection issues, there are mechanisms in place to recreate the stream.
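For readers unfamiliar with the batching pattern described above, here is a minimal sketch of one way to do it in Go. It is not the code from this PR: `batchMessages`, the `[]byte` message type, and the `max` cap are assumptions made for illustration. The idea is to block for the first message, then drain whatever is already queued without blocking again.

```go
package main

// batchMessages is a hypothetical helper (not taken from conn/node.go).
// It blocks until one message arrives on ch, then keeps picking up messages
// that are already queued, without blocking, until it has max of them.
// Messages keep their channel order. It returns nil if ch is closed and empty.
func batchMessages(ch <-chan []byte, max int) [][]byte {
	first, ok := <-ch
	if !ok {
		return nil
	}
	batch := [][]byte{first}
	for len(batch) < max {
		select {
		case msg, ok := <-ch:
			if !ok {
				// Channel closed: send what we collected so far.
				return batch
			}
			batch = append(batch, msg)
		default:
			// Nothing else is queued right now: send what we have.
			return batch
		}
	}
	return batch
}
```

A sender loop would call this repeatedly and write each returned batch to the stream in one request, so a burst of messages costs one send rather than one call per message.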

Another issue I saw was related to Zero being unable to maintain quorum. It was caused by an unbuffered channel in checkQuorum asking for read index, which didn't allow multiple requests to be pushed into one batch, causing check quorum to fail even with a one-second timeout. After allocating a buffered channel, check quorum requests typically finish within a millisecond, rarely going above 7ms in my tests.
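To illustrate why the buffer matters, here is a rough sketch (not Dgraph's actual code) of a loop that answers several queued read index requests with a single lookup. `readIndexReq`, `serveOnce`, and `fetch` are hypothetical names. With a buffered request channel, callers can enqueue while the loop is busy, so the next round can pick them all up together.

```go
package main

// readIndexReq is a hypothetical request type: the caller waits on resp
// for the read index. resp should have capacity 1 so the server never blocks.
type readIndexReq struct {
	resp chan uint64
}

// serveOnce waits for one request, collects every other request that is
// already queued on reqs, and answers all of them with one call to fetch.
// It returns how many requests were served in this round.
func serveOnce(reqs <-chan readIndexReq, fetch func() uint64) int {
	pending := []readIndexReq{<-reqs}
	for more := true; more; {
		select {
		case r := <-reqs:
			pending = append(pending, r)
		default:
			more = false
		}
	}
	idx := fetch() // one (possibly slow) read index round for the whole batch
	for _, r := range pending {
		r.resp <- idx
	}
	return len(pending)
}
```

With `reqs` created as `make(chan readIndexReq, 8)`, three callers that enqueue before the loop runs are served by a single `fetch` call.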

martinmr

Reviewed 3 of 4 files at r1, 1 of 1 files at r2.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @manishrjain)

conn/node.go, line 379 at r2 (raw file):

// Exit after a thousand tries

"Exit after at least a thousand tries or ten seconds" to match what the code is actually doing.

conn/node.go, line 386 at r2 (raw file):

So that we print error only a few times

nit: "Update lastLog so that we print the error ... "

manishrjain

Reviewable status: 1 of 5 files reviewed, 2 unresolved discussions (waiting on @martinmr)

conn/node.go, line 379 at r2 (raw file):

Previously, martinmr (Martin Martinez Rivera) wrote…

// Exit after a thousand tries
"Exit after at least a thousand tries or ten seconds" to match what the code is actually doing.

Done.

conn/node.go, line 386 at r2 (raw file):

Previously, martinmr (Martin Martinez Rivera) wrote…

So that we print error only a few times
nit: "Update lastLog so that we print the error ... "

Done.
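The `lastLog` comment discussed above refers to throttling repeated error logs. As a sketch of that general technique (the type and method names here are hypothetical, and this is not the code under review), one can track the time of the last logged error and skip logging until an interval has passed:

```go
package main

import "time"

// logThrottle is a hypothetical helper, not a type from conn/node.go.
type logThrottle struct {
	every   time.Duration // minimum gap between two logged errors
	lastLog time.Time     // when an error was last logged
}

// shouldLog reports whether at least `every` has elapsed since the last
// logged error. When it returns true it also updates lastLog, so that
// repeated failures print the error only a few times instead of flooding
// the log.
func (t *logThrottle) shouldLog(now time.Time) bool {
	if now.Sub(t.lastLog) < t.every {
		return false
	}
	t.lastLog = now
	return true
}
```

A retry loop would call `shouldLog(time.Now())` on each failure and only emit the warning when it returns true.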

Changes:

* Stream raft messages instead of sending them one by one.
* Set duration to 10s
* Zero checkQuorum works well now
* Martin's comments
* Adjust timeouts in contexts, so the deeper one has shorter timeout and the outer one has longer.
* Batch up multiple Raft messages from channel and send them in one request.
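On the context timeout adjustment: in Go, a context derived from a parent can never outlive the parent's deadline, so giving the deeper context a longer timeout than the outer one has no effect. The sketch below shows this with the standard `context` package; `nestedDeadlines` is a hypothetical function written only for this illustration, and the durations are arbitrary.

```go
package main

import (
	"context"
	"time"
)

// nestedDeadlines builds an outer context with outerTimeout and derives an
// inner context from it with innerTimeout, then returns both deadlines.
// If innerTimeout is longer than outerTimeout, the inner context simply
// inherits the outer deadline.
func nestedDeadlines(outerTimeout, innerTimeout time.Duration) (outer, inner time.Time) {
	outerCtx, cancelOuter := context.WithTimeout(context.Background(), outerTimeout)
	defer cancelOuter()

	innerCtx, cancelInner := context.WithTimeout(outerCtx, innerTimeout)
	defer cancelInner()

	outer, _ = outerCtx.Deadline()
	inner, _ = innerCtx.Deadline()
	return outer, inner
}
```

Calling `nestedDeadlines(2*time.Second, time.Second)` yields an inner deadline before the outer one, which is the arrangement the change list describes. Swapping the arguments yields two equal deadlines, because the inner timeout is capped by the outer context.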

manishrjain added 6 commits February 6, 2019 18:06

Stream raft messages instead of sending them one by one.

3442948

Merge branch 'master' into mrjn/stream-raft-messages

96572fe

Merge branch 'master' into mrjn/stream-raft-messages

03058b2

Merge branch 'master' into mrjn/stream-raft-messages

5b0ba66

Merge master

7e2c95a

Set duration to 10s

66a12f5

manishrjain marked this pull request as ready for review March 15, 2019 00:32

manishrjain assigned martinmr Mar 15, 2019

martinmr suggested changes Mar 15, 2019

manishrjain added 4 commits March 19, 2019 14:19

Zero checkQuorum works well now

98678c0

Merge master

81e3de6

Martin's comments

b543706

Adjust timeouts in contexts, so the deeper one has shorter timeout and the outer one has longer.

3dd0987

manishrjain commented Mar 20, 2019

manishrjain added 2 commits March 19, 2019 17:40

Fix an error message

2d0b14f

Batch up multiple messages from channel and send them in one request.

73ad1ed

manishrjain changed the title ~~Stream Raft Messages~~ Stream Raft Messages and Fix Check Quorum Mar 20, 2019

manishrjain merged commit a57dfc0 into master Mar 20, 2019

manishrjain deleted the mrjn/stream-raft-messages branch March 20, 2019 01:16

manishrjain mentioned this pull request Mar 20, 2019

Sometimes dgraph cannot end a term #3129

Closed
