perf(txn): de-duplicate the context keys and predicates #7478
828b49a to e978b65
Move the proposal application out of the Zero Raft loop, into an applyCh style application, like we do in Alpha. That would tackle the slowness issues.
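For illustration, the applyCh idea can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not Dgraph's actual Zero or Alpha code: the `proposal` type, `runApplier`, and the channel size are hypothetical names chosen for the example.

```go
package main

import (
	"fmt"
	"sync"
)

// proposal is a hypothetical stand-in for a committed Raft entry.
type proposal struct {
	index uint64
	data  string
}

// runApplier drains applyCh on its own goroutine, so slow application work
// does not block the loop that produces the batches.
func runApplier(applyCh <-chan []proposal, apply func(proposal)) *sync.WaitGroup {
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		for batch := range applyCh {
			for _, p := range batch {
				apply(p)
			}
		}
	}()
	return &wg
}

func main() {
	applyCh := make(chan []proposal, 16) // buffer size is an arbitrary assumption
	var applied []uint64
	wg := runApplier(applyCh, func(p proposal) {
		applied = append(applied, p.index) // only the applier goroutine writes here
	})

	// Stand-in for the Raft loop: hand off committed entries and move on.
	for i := uint64(1); i <= 3; i++ {
		applyCh <- []proposal{{index: i, data: "txn"}}
	}
	close(applyCh)
	wg.Wait() // wait for the applier to finish before reading applied
	fmt.Println(applied) // prints [1 2 3]
}
```

The loop only blocks when the channel buffer is full, so periodic work such as quorum checks keeps running while proposals are applied in order on the other goroutine.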
Reviewable status: 0 of 1 files reviewed, 1 unresolved discussion (waiting on @NamanJain8 and @vvbalaji-dgraph)
e978b65 to 0d391e6
As discussed, the real issue was that txnContext.Keys contained duplicates, so we don't need that Raft loop change.
Reviewable status: 0 of 2 files reviewed, 1 unresolved discussion (waiting on @martinmr, @NamanJain8, and @vvbalaji-dgraph)
Nice work @NamanJain8 🎉
The Zero Raft loop got stuck because proposals took a long time to process. As a result, the goroutine that checks the quorum also got stuck, leading to a leadership change (even with a single Zero and a single Alpha).
Update: The real issue was duplicate entries in txnContext.Keys. The conflict keys were merged without deduplication. This change deduplicates them before sending the request to Zero.
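A minimal sketch of that deduplication step, assuming the keys are a plain []string. `dedupKeys` is a hypothetical helper written for this example, not necessarily the function the PR uses.

```go
package main

import (
	"fmt"
	"sort"
)

// dedupKeys returns the distinct entries of keys in sorted order.
// Hypothetical helper for illustration only.
func dedupKeys(keys []string) []string {
	if len(keys) == 0 {
		return keys
	}
	// Copy first so the caller's slice is left untouched.
	sorted := append([]string(nil), keys...)
	sort.Strings(sorted)

	// After sorting, duplicates are adjacent; keep the first of each run.
	out := sorted[:1]
	for _, k := range sorted[1:] {
		if k != out[len(out)-1] {
			out = append(out, k)
		}
	}
	return out
}

func main() {
	// Conflict keys merged from several mutations in the same transaction.
	merged := []string{"k3", "k1", "k3", "k2", "k1"}
	fmt.Println(dedupKeys(merged)) // prints [k1 k2 k3]
}
```

Sorting and compacting costs O(n log n) on the sending side, but it means the receiver processes each conflict key once instead of once per duplicate.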
The issue is due to lock contention.
https://github.com/dgraph-io/dgraph/blob/c66e86985b642eb6b563014ed05389e7b3170f72/dgraph/cmd/zero/oracle.go#L132-L151 holds the lock while this function tries to send an update: https://github.com/dgraph-io/dgraph/blob/c66e86985b642eb6b563014ed05389e7b3170f72/dgraph/cmd/zero/oracle.go#L273-L283.
Processing the keyCommit map is slow. The locks for o.commits and o.keyCommit can be separated by using sync.Map for o.commits.
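One way the lock separation could look, as a rough sketch. The `oracle` type below is a simplified stand-in, not Dgraph's actual Oracle. The field types and the method names `commit` and `commitTs` are assumptions made for the example.

```go
package main

import (
	"fmt"
	"sync"
)

// oracle is a simplified, hypothetical stand-in for the Zero oracle.
type oracle struct {
	// commits maps startTs (uint64) to commitTs (uint64).
	// sync.Map handles its own synchronization, so readers do not need mu.
	commits sync.Map

	mu        sync.Mutex        // guards keyCommit only
	keyCommit map[uint64]uint64 // key fingerprint -> last commitTs
}

func newOracle() *oracle {
	return &oracle{keyCommit: make(map[uint64]uint64)}
}

// commit records a transaction. The potentially slow keyCommit update holds
// only mu, so lookups in commits are not blocked by it.
func (o *oracle) commit(startTs, commitTs uint64, keys []uint64) {
	o.mu.Lock()
	for _, k := range keys {
		o.keyCommit[k] = commitTs
	}
	o.mu.Unlock()
	o.commits.Store(startTs, commitTs)
}

// commitTs looks up the commit timestamp without taking mu.
func (o *oracle) commitTs(startTs uint64) (uint64, bool) {
	v, ok := o.commits.Load(startTs)
	if !ok {
		return 0, false
	}
	return v.(uint64), true
}

func main() {
	o := newOracle()
	o.commit(5, 10, []uint64{1, 2})
	fmt.Println(o.commitTs(5)) // prints 10 true
}
```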