Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sam/stream msg app #1252

Closed
wants to merge 21 commits into from
Closed

Sam/stream msg app #1252

wants to merge 21 commits into from

Conversation

srh
Copy link

@srh srh commented Jul 25, 2017

Connected to #1180.

This makes us stream Raft log entries to a peer that's fallen behind, instead of sending one raftpb.Message per second, or per heartbeat. Puts all MsgApps on a gRPC stream for pushback.

This also adds Kick() to etcd so that we can "kick" it into sending more MsgApp messages, instead of one per heartbeat.


This change is Reviewable

@srh
Copy link
Author

srh commented Jul 25, 2017

Review status: 0 of 8 files reviewed at latest revision, 4 unresolved discussions.


cmd/dgraph/main.go, line 620 at r1 (raw file):

	// By default Go GRPC traces all requests.
	grpc.EnableTracing = false
	workerCancel, err := worker.SpawnServer(bindall) // For internal communication.

We need a cancellation function because we need to cancel RaftMessageStream workers on the server side. The other way to do that would be by closing the TCP connection, but that's not a graceful shutdown.


vendor/github.com/coreos/etcd/raft/node.go, line 168 at r1 (raw file):

	ReportSnapshot(id uint64, status SnapshotStatus)
	// Kick tells the node to send more MsgApps to the peer.
	Kick(id uint64, lastIndexSent uint64)

Changes to etcd.


worker/conn.go, line 176 at r1 (raw file):

// NewPool creates a new "pool" with one gRPC connection, refcount 0.
func newPool(ctx context.Context, addr string) (*pool, error) {
	conn, err := grpc.DialContext(ctx, addr,

Not related to the main fix, I guess I noticed using DialContext was better and non-deprecated.


worker/draft.go, line 78 at r1 (raw file):

	peers map[uint64]peerPoolEntry
	// A waitGroup for the peerPoolEntrys' goroutines.
	wg *sync.WaitGroup

I think I forgot to delete this field -- there's a WaitGroup in node, instead. Will remove.


Comments from Reviewable

@manishrjain
Copy link
Contributor

Hey @srh , A bunch of this stuff is supposed to be done by Raft library itself, and we shouldn't need to do it. So, I want to understand what and why behind this change. Let's have a video chat, we'll loop in @janardhan1993 as well.


Review status: 0 of 8 files reviewed at latest revision, 4 unresolved discussions, some commit checks failed.


Comments from Reviewable

@janardhan1993
Copy link
Contributor

Review status: 0 of 8 files reviewed at latest revision, 7 unresolved discussions, some commit checks failed.


protos/payload.proto, line 57 at r1 (raw file):

	// RAFT serving RPCs.
	rpc RaftMessage (Payload)                     returns (Payload) {}

Do we still need RaftMessage if we are streaming ?


vendor/github.com/coreos/etcd/raft/node.go, line 168 at r1 (raw file):

Previously, srh (Sam Hughes) wrote…

Changes to etcd.

Raft does flow control and do we need kick ? We can stream as many entries as possible once per hearbeat. We can use your change to do flow control instead of spawning go routines and leaving the messages in memory.

Kick complicates things. Sometimes we might need to send the entries again or send from a previous index. These cases are handled via msgappresp.


worker/draft.go, line 355 at r1 (raw file):

	// a *pool can be created, good, but if not, we still create a peerPoolEntry with
	// a nil *pool, no goroutine.
	p, ok := pools().connect(n.ctx, addr)

We could use the same connection for multiple groups. Since we are using n.ctx in grpc.dialcontext, it will cause requests from other groups to fail(if we have multiple groups running on same server) when n.ctx is cancelled.


worker/draft.go, line 541 at r1 (raw file):

			if err != nil {
				wg.Add(1)
				go func(stream protos.Worker_RaftMessageStreamClient) {

why goroutine here ??


Comments from Reviewable

@srh srh closed this Jul 26, 2017
@srh srh deleted the sam/streamMsgApp branch July 28, 2017 02:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

3 participants