-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[OPTIMIZATION] Optimize snapshot creation #4901
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 3 files at r1, 2 of 2 files at r3.
Reviewable status: all files reviewed, 8 unresolved discussions (waiting on @ashish-goswami, @golangcibot, @manishrjain, and @pawanrawal)
raftwal/storage.go, line 90 at r2 (raw file):
Previously, golangcibot (Bot from GolangCI) wrote…
printf: Errorf format %s has arg r.until of wrong type uint64 (from
govet
)
Same thing regarding the use of Wrapf instead of Errorf.
raftwal/storage.go, line 80 at r3 (raw file):
func (w *DiskStorage) processDeleteRange() { // TODO: Figure out a way to close this.
for future reference, can you be more specific about what needs to be closed?
I am assuming you mean closing the channel but I am not sure.
raftwal/storage.go, line 85 at r3 (raw file):
"deleteuntil
deleteUntil
raftwal/storage.go, line 85 at r3 (raw file):
glog.Errorf(
don't print the error message here. Used Wrapf instead to include the previous error.
raftwal/storage.go, line 273 at r3 (raw file):
if snap, err := w.Snapshot(); err == nil && !raft.IsEmptySnap(snap) {
why is the cache not being used anymore. If this is intentional, make sure to mention it in the PR description.
raftwal/storage.go, line 659 at r3 (raw file):
} // TODO: should we block here?
Also for future reference, maybe add a short description of why are you concerned the code might have to block here.
raftwal/storage.go, line 660 at r3 (raw file):
// TODO: should we block here? w.deleteChan <- deleteRange{first - 1, snap.Metadata.Index}
why first - 1
instead of just first
. Maybe add a comment explaining this because it's not obvious why.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's clean it up, so we can merge this.
Reviewed 1 of 3 files at r1, 1 of 2 files at r3, 1 of 1 files at r4.
Reviewable status: all files reviewed, 10 unresolved discussions (waiting on @ashish-goswami, @golangcibot, and @pawanrawal)
raftwal/storage.go, line 81 at r4 (raw file):
batch := w.db.NewWriteBatch() defer batch.Cancel() x.Check(w.deleteFrom(batch, first-2, true))
use delete until.
And if you're modifying the meaning of delete until, then call it deleteRange.
raftwal/storage.go, line 89 at r4 (raw file):
func (w *DiskStorage) processDeleteRange() { // TODO: Figure out a way to close this. go func() {
always call goroutine from outside. Don't envelope entire function in this from within.
Also, you need a way to shut this down.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 1 of 2 files reviewed, 10 unresolved discussions (waiting on @ashish-goswami, @golangcibot, @manishrjain, @martinmr, and @pawanrawal)
raftwal/storage.go, line 80 at r3 (raw file):
Previously, martinmr (Martin Martinez Rivera) wrote…
for future reference, can you be more specific about what needs to be closed?
I am assuming you mean closing the channel but I am not sure.
Done.
raftwal/storage.go, line 85 at r3 (raw file):
Previously, martinmr (Martin Martinez Rivera) wrote…
"deleteuntil
deleteUntil
Done.
raftwal/storage.go, line 273 at r3 (raw file):
Previously, martinmr (Martin Martinez Rivera) wrote…
if snap, err := w.Snapshot(); err == nil && !raft.IsEmptySnap(snap) {
why is the cache not being used anymore. If this is intentional, make sure to mention it in the PR description.
Done.
raftwal/storage.go, line 660 at r3 (raw file):
Previously, martinmr (Martin Martinez Rivera) wrote…
why
first - 1
instead of justfirst
. Maybe add a comment explaining this because it's not obvious why.
Done.
raftwal/storage.go, line 81 at r4 (raw file):
Previously, manishrjain (Manish R Jain) wrote…
use delete until.
And if you're modifying the meaning of delete until, then call it deleteRange.
Done.
raftwal/storage.go, line 89 at r4 (raw file):
Previously, manishrjain (Manish R Jain) wrote…
always call goroutine from outside. Don't envelope entire function in this from within.
Also, you need a way to shut this down.
Done.
worker/draft.go, line 427 at r1 (raw file):
Previously, golangcibot (Bot from GolangCI) wrote…
line is 114 characters (from
lll
)
Fixed.
worker/groups.go
Outdated
@@ -52,6 +52,7 @@ type groupi struct { | |||
triggerCh chan struct{} // Used to trigger membership sync | |||
blockDeletes *sync.Mutex // Ensure that deletion won't happen when move is going on. | |||
closer *y.Closer | |||
storeCloser *y.Closer // Closer for raftwal storage. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
field storeCloser
is unused (from unused
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 7 files reviewed, 13 unresolved discussions (waiting on @ashish-goswami, @golangcibot, @manishrjain, @martinmr, and @pawanrawal)
raftwal/storage.go, line 50 at r6 (raw file):
type indexRange struct { from, until uint64
Just add comment that this is RAFT index.
raftwal/storage.go, line 290 at r6 (raw file):
// into the latest Snapshot). func (w *DiskStorage) FirstIndex() (uint64, error) { // Previously we used to get snapshot key from cache. If it is not found in cache, we used to
Just add a comment explaining the current behavior that cache is only set on calling CreateSnapshot and also how does doing snapshot in background affects this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 6 of 8 files at r5, 1 of 1 files at r6, 1 of 1 files at r7.
Reviewable status: all files reviewed, 16 unresolved discussions (waiting on @ashish-goswami, @golangcibot, @manishrjain, and @martinmr)
dgraph/cmd/debug/wal.go, line 271 at r7 (raw file):
case len(opt.wsetSnapshot) > 0: err := overwriteSnapshot(db, store) store.Closer.SignalAndWait()
Is there another goroutine here? I'm not sure. So, do run the tool and see if it gets stuck.
raftwal/storage.go, line 94 at r7 (raw file):
defer w.Closer.Done() slurp := func(r indexRange) {
deleteRange
raftwal/storage.go, line 116 at r7 (raw file):
// As we have already shutdown the node, it is safe to close indexRangeChan. // node.processApplyChan() calls CreteSnapshot, which internally sends values on this chan.
CreateSnapshot
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 6 of 7 files reviewed, 16 unresolved discussions (waiting on @ashish-goswami, @golangcibot, @manishrjain, @martinmr, and @pawanrawal)
raftwal/storage.go, line 659 at r3 (raw file):
Previously, martinmr (Martin Martinez Rivera) wrote…
Also for future reference, maybe add a short description of why are you concerned the code might have to block here.
Done.
raftwal/storage.go, line 50 at r6 (raw file):
Previously, pawanrawal (Pawan Rawal) wrote…
Just add comment that this is RAFT index.
Done.
raftwal/storage.go, line 290 at r6 (raw file):
Previously, pawanrawal (Pawan Rawal) wrote…
Just add a comment explaining the current behavior that cache is only set on calling CreateSnapshot and also how does doing snapshot in background affects this.
Done.
raftwal/storage.go, line 94 at r7 (raw file):
Previously, manishrjain (Manish R Jain) wrote…
deleteRange
Renamed it processSingleRange
raftwal/storage.go, line 116 at r7 (raw file):
Previously, manishrjain (Manish R Jain) wrote…
CreateSnapshot
Done.
worker/groups.go, line 55 at r5 (raw file):
Previously, golangcibot (Bot from GolangCI) wrote…
field
storeCloser
is unused (fromunused
)
Fixed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 6 of 7 files reviewed, 16 unresolved discussions (waiting on @ashish-goswami, @golangcibot, @manishrjain, @martinmr, and @pawanrawal)
dgraph/cmd/debug/wal.go, line 271 at r7 (raw file):
Previously, manishrjain (Manish R Jain) wrote…
Is there another goroutine here? I'm not sure. So, do run the tool and see if it gets stuck.
Ran debug
tool on w/zw directory of alpha/zero nodes after insertion of 1 million data using live loader. Didn't get stuck.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 5 of 7 files reviewed, 17 unresolved discussions (waiting on @ashish-goswami, @golangcibot, @manishrjain, @martinmr, and @pawanrawal)
raftwal/storage.go, line 97 at r8 (raw file):
batch := w.db.NewWriteBatch() if err := w.deleteRange(batch, r.from, r.until); err != nil { glog.Errorf("deleteRange failed with error: %s, from: %d, until: %d\n",
Why are we using %s
for printing error?
Bumping this up! Have definitely noticed that my snapshot times have been anywhere between 1-6 seconds 🤔 |
Fixes: #4250, DGRAPH-676 Currently snapshot creation is taking time, which shoots up mutations latency. This PR optimizes snapshot creation in following ways: We delete all raft entries until snapshot index, while saving new snapshot. This requires performing a seek on badger. We used to seek from entry 0, which is not required. We can seek from FirstIndex at that point of time. This avoids going through lot of previously deleted entries in badger iterator. This improves seek time significantly. Saving of new snapshot and deletion of all entries used to be synchronous. This PR saves new snapshot synchronously but performs deletion of entries in background. This PR reduces snapshot creation time from seconds(in worst case) to milliseconds. Logs from master for snapshot creation time: 574 I0318 10:30:55.174392 18 storage.go:595] CreateSnapshot i=20739611, cs=nodes:1 nodes:2 nodes:3 575 I0318 10:30:59.398864 18 draft.go:506] Creating snapshot at index: 20739611. ReadTs: 39763033. time taken: 4.22447658s 576 I0318 10:31:47.558142 18 log.go:34] Got compaction priority: {level:0 score:1 dropPrefix:[]} 577 I0318 10:31:47.558183 18 log.go:34] Running for level: 0 578 I0318 10:31:48.202713 18 log.go:34] LOG Compact 0->1, del 6 tables, add 1 tables, took 644.511199ms 579 I0318 10:31:48.202740 18 log.go:34] Compaction for level: 0 DONE 580 I0318 10:31:54.985492 18 draft.go:1404] Num pending txns: 54 581 I0318 10:31:55.131087 18 draft.go:804] [0x1] Set Raft progress to index: 20768031. 582 I0318 10:31:55.131145 18 draft.go:1404] Num pending txns: 60 583 I0318 10:31:55.272493 18 storage.go:595] CreateSnapshot i=20768104, cs=nodes:1 nodes:2 nodes:3 584 I0318 10:31:58.809421 18 draft.go:506] Creating snapshot at index: 20768104. ReadTs: 39817697. time taken: 3.536931581s 585 I0318 10:32:54.972385 18 draft.go:1404] Num pending txns: 68 586 I0318 10:32:55.124161 18 draft.go:804] [0x1] Set Raft progress to index: 20796681. 587 I0318 10:32:55.124219 18 draft.go:1404] Num pending txns: 56 588 I0318 10:32:55.322028 18 storage.go:595] CreateSnapshot i=20796770, cs=nodes:1 nodes:2 nodes:3 589 I0318 10:32:59.558248 18 draft.go:506] Creating snapshot at index: 20796770. ReadTs: 39872679. time taken: 4.236221663s 590 I0318 10:33:28.558129 18 log.go:34] Got compaction priority: {level:0 score:1 dropPrefix:[]} 591 I0318 10:33:28.558176 18 log.go:34] Running for level: 0 592 I0318 10:33:29.230710 18 log.go:34] LOG Compact 0->1, del 6 tables, add 1 tables, took 672.502982ms 593 I0318 10:33:29.230742 18 log.go:34] Compaction for level: 0 DONE 594 I0318 10:33:54.986669 18 draft.go:1404] Num pending txns: 64 595 I0318 10:33:55.125660 18 draft.go:804] [0x1] Set Raft progress to index: 20825162. 596 I0318 10:33:55.125710 18 draft.go:1404] Num pending txns: 60 597 I0318 10:33:55.264070 18 storage.go:595] CreateSnapshot i=20825236, cs=nodes:1 nodes:2 nodes:3 598 I0318 10:33:58.716431 18 draft.go:506] Creating snapshot at index: 20825236. ReadTs: 39927283. time taken: 3.452365171s 599 I0318 10:34:54.969608 18 draft.go:115] Operation started with id: 1 Logs from this branch for snapshot creation time: 178 I0318 11:02:42.103363 1 draft.go:425] Creating snapshot at index: 21496840. ReadTs: 41226003. time taken: 297.39µs 179 I0318 11:03:21.943711 1 log.go:34] Got compaction priority: {level:0 score:1 dropPrefix:[]} 180 I0318 11:03:21.943763 1 log.go:34] Running for level: 0 181 I0318 11:03:24.510385 1 log.go:34] LOG Compact 0->1, del 6 tables, add 1 tables, took 2.566600068s 182 I0318 11:03:24.510417 1 log.go:34] Compaction for level: 0 DONE 183 I0318 11:03:40.164856 1 draft.go:1283] Num pending txns: 56 184 I0318 11:03:40.330751 1 draft.go:709] [0x1] Set Raft progress to index: 21526400. 185 I0318 11:03:40.330806 1 draft.go:1283] Num pending txns: 55 186 I0318 11:03:40.503145 1 storage.go:645] CreateSnapshot i=21526477, cs=nodes:1 nodes:2 nodes:3 187 I0318 11:03:40.503317 1 draft.go:425] Creating snapshot at index: 21526477. ReadTs: 41282847. time taken: 174.56µs 188 W0318 11:04:20.508999 1 draft.go:1025] Raft.Ready took too long to process: Timer Total: 221ms. Breakdown: [{disk 220ms} {proposal s 0s} {advance 0s}] Num entries: 0. MustSync: false 189 I0318 11:04:40.164813 1 draft.go:1283] Num pending txns: 66 190 I0318 11:04:40.429105 1 draft.go:709] [0x1] Set Raft progress to index: 21556865. 191 I0318 11:04:40.429157 1 draft.go:1283] Num pending txns: 23 192 I0318 11:04:40.639351 1 storage.go:645] CreateSnapshot i=21557009, cs=nodes:1 nodes:2 nodes:3 193 I0318 11:04:40.639667 1 draft.go:425] Creating snapshot at index: 21557009. ReadTs: 41341402. time taken: 317.73µs 194 W0318 11:04:40.952292 1 draft.go:1025] Raft.Ready took too long to process: Timer Total: 248ms. Breakdown: [{disk 243ms} {sync 5ms } {proposals 0s} {advance 0s}] Num entries: 30. MustSync: true 195 I0318 11:04:50.943669 1 log.go:34] Got compaction priority: {level:0 score:1 dropPrefix:[]} 196 I0318 11:04:50.943708 1 log.go:34] Running for level: 0 197 W0318 11:04:53.236307 1 draft.go:1025] Raft.Ready took too long to process: Timer Total: 443ms. Breakdown: [{sync 443ms} {disk 0s} {proposals 0s} {advance 0s}] Num entries: 35. MustSync: true 198 I0318 11:04:53.299058 1 log.go:34] LOG Compact 0->1, del 6 tables, add 1 tables, took 2.355332869s 199 I0318 11:04:53.299089 1 log.go:34] Compaction for level: 0 DONE 200 I0318 11:05:40.164832 1 draft.go:1283] Num pending txns: 39 201 I0318 11:05:40.340678 1 draft.go:709] [0x1] Set Raft progress to index: 21587019. 202 I0318 11:05:40.340732 1 draft.go:1283] Num pending txns: 34 203 I0318 11:05:40.511474 1 storage.go:645] CreateSnapshot i=21587137, cs=nodes:1 nodes:2 nodes:3 204 I0318 11:05:40.511781 1 draft.go:425] Creating snapshot at index: 21587137. ReadTs: 41399174. time taken: 306.48µs (cherry picked from commit 1c61346)
Fixes: #4250, DGRAPH-676 Currently snapshot creation is taking time, which shoots up mutations latency. This PR optimizes snapshot creation in following ways: We delete all raft entries until snapshot index, while saving new snapshot. This requires performing a seek on badger. We used to seek from entry 0, which is not required. We can seek from FirstIndex at that point of time. This avoids going through lot of previously deleted entries in badger iterator. This improves seek time significantly. Saving of new snapshot and deletion of all entries used to be synchronous. This PR saves new snapshot synchronously but performs deletion of entries in background. This PR reduces snapshot creation time from seconds(in worst case) to milliseconds. Logs from master for snapshot creation time: 574 I0318 10:30:55.174392 18 storage.go:595] CreateSnapshot i=20739611, cs=nodes:1 nodes:2 nodes:3 575 I0318 10:30:59.398864 18 draft.go:506] Creating snapshot at index: 20739611. ReadTs: 39763033. time taken: 4.22447658s 576 I0318 10:31:47.558142 18 log.go:34] Got compaction priority: {level:0 score:1 dropPrefix:[]} 577 I0318 10:31:47.558183 18 log.go:34] Running for level: 0 578 I0318 10:31:48.202713 18 log.go:34] LOG Compact 0->1, del 6 tables, add 1 tables, took 644.511199ms 579 I0318 10:31:48.202740 18 log.go:34] Compaction for level: 0 DONE 580 I0318 10:31:54.985492 18 draft.go:1404] Num pending txns: 54 581 I0318 10:31:55.131087 18 draft.go:804] [0x1] Set Raft progress to index: 20768031. 582 I0318 10:31:55.131145 18 draft.go:1404] Num pending txns: 60 583 I0318 10:31:55.272493 18 storage.go:595] CreateSnapshot i=20768104, cs=nodes:1 nodes:2 nodes:3 584 I0318 10:31:58.809421 18 draft.go:506] Creating snapshot at index: 20768104. ReadTs: 39817697. time taken: 3.536931581s 585 I0318 10:32:54.972385 18 draft.go:1404] Num pending txns: 68 586 I0318 10:32:55.124161 18 draft.go:804] [0x1] Set Raft progress to index: 20796681. 587 I0318 10:32:55.124219 18 draft.go:1404] Num pending txns: 56 588 I0318 10:32:55.322028 18 storage.go:595] CreateSnapshot i=20796770, cs=nodes:1 nodes:2 nodes:3 589 I0318 10:32:59.558248 18 draft.go:506] Creating snapshot at index: 20796770. ReadTs: 39872679. time taken: 4.236221663s 590 I0318 10:33:28.558129 18 log.go:34] Got compaction priority: {level:0 score:1 dropPrefix:[]} 591 I0318 10:33:28.558176 18 log.go:34] Running for level: 0 592 I0318 10:33:29.230710 18 log.go:34] LOG Compact 0->1, del 6 tables, add 1 tables, took 672.502982ms 593 I0318 10:33:29.230742 18 log.go:34] Compaction for level: 0 DONE 594 I0318 10:33:54.986669 18 draft.go:1404] Num pending txns: 64 595 I0318 10:33:55.125660 18 draft.go:804] [0x1] Set Raft progress to index: 20825162. 596 I0318 10:33:55.125710 18 draft.go:1404] Num pending txns: 60 597 I0318 10:33:55.264070 18 storage.go:595] CreateSnapshot i=20825236, cs=nodes:1 nodes:2 nodes:3 598 I0318 10:33:58.716431 18 draft.go:506] Creating snapshot at index: 20825236. ReadTs: 39927283. time taken: 3.452365171s 599 I0318 10:34:54.969608 18 draft.go:115] Operation started with id: 1 Logs from this branch for snapshot creation time: 178 I0318 11:02:42.103363 1 draft.go:425] Creating snapshot at index: 21496840. ReadTs: 41226003. time taken: 297.39µs 179 I0318 11:03:21.943711 1 log.go:34] Got compaction priority: {level:0 score:1 dropPrefix:[]} 180 I0318 11:03:21.943763 1 log.go:34] Running for level: 0 181 I0318 11:03:24.510385 1 log.go:34] LOG Compact 0->1, del 6 tables, add 1 tables, took 2.566600068s 182 I0318 11:03:24.510417 1 log.go:34] Compaction for level: 0 DONE 183 I0318 11:03:40.164856 1 draft.go:1283] Num pending txns: 56 184 I0318 11:03:40.330751 1 draft.go:709] [0x1] Set Raft progress to index: 21526400. 185 I0318 11:03:40.330806 1 draft.go:1283] Num pending txns: 55 186 I0318 11:03:40.503145 1 storage.go:645] CreateSnapshot i=21526477, cs=nodes:1 nodes:2 nodes:3 187 I0318 11:03:40.503317 1 draft.go:425] Creating snapshot at index: 21526477. ReadTs: 41282847. time taken: 174.56µs 188 W0318 11:04:20.508999 1 draft.go:1025] Raft.Ready took too long to process: Timer Total: 221ms. Breakdown: [{disk 220ms} {proposal s 0s} {advance 0s}] Num entries: 0. MustSync: false 189 I0318 11:04:40.164813 1 draft.go:1283] Num pending txns: 66 190 I0318 11:04:40.429105 1 draft.go:709] [0x1] Set Raft progress to index: 21556865. 191 I0318 11:04:40.429157 1 draft.go:1283] Num pending txns: 23 192 I0318 11:04:40.639351 1 storage.go:645] CreateSnapshot i=21557009, cs=nodes:1 nodes:2 nodes:3 193 I0318 11:04:40.639667 1 draft.go:425] Creating snapshot at index: 21557009. ReadTs: 41341402. time taken: 317.73µs 194 W0318 11:04:40.952292 1 draft.go:1025] Raft.Ready took too long to process: Timer Total: 248ms. Breakdown: [{disk 243ms} {sync 5ms } {proposals 0s} {advance 0s}] Num entries: 30. MustSync: true 195 I0318 11:04:50.943669 1 log.go:34] Got compaction priority: {level:0 score:1 dropPrefix:[]} 196 I0318 11:04:50.943708 1 log.go:34] Running for level: 0 197 W0318 11:04:53.236307 1 draft.go:1025] Raft.Ready took too long to process: Timer Total: 443ms. Breakdown: [{sync 443ms} {disk 0s} {proposals 0s} {advance 0s}] Num entries: 35. MustSync: true 198 I0318 11:04:53.299058 1 log.go:34] LOG Compact 0->1, del 6 tables, add 1 tables, took 2.355332869s 199 I0318 11:04:53.299089 1 log.go:34] Compaction for level: 0 DONE 200 I0318 11:05:40.164832 1 draft.go:1283] Num pending txns: 39 201 I0318 11:05:40.340678 1 draft.go:709] [0x1] Set Raft progress to index: 21587019. 202 I0318 11:05:40.340732 1 draft.go:1283] Num pending txns: 34 203 I0318 11:05:40.511474 1 storage.go:645] CreateSnapshot i=21587137, cs=nodes:1 nodes:2 nodes:3 204 I0318 11:05:40.511781 1 draft.go:425] Creating snapshot at index: 21587137. ReadTs: 41399174. time taken: 306.48µs
Fixes: #4250, DGRAPH-676 Currently snapshot creation is taking time, which shoots up mutations latency. This PR optimizes snapshot creation in following ways: We delete all raft entries until snapshot index, while saving new snapshot. This requires performing a seek on badger. We used to seek from entry 0, which is not required. We can seek from FirstIndex at that point of time. This avoids going through lot of previously deleted entries in badger iterator. This improves seek time significantly. Saving of new snapshot and deletion of all entries used to be synchronous. This PR saves new snapshot synchronously but performs deletion of entries in background. This PR reduces snapshot creation time from seconds(in worst case) to milliseconds. Logs from master for snapshot creation time: 574 I0318 10:30:55.174392 18 storage.go:595] CreateSnapshot i=20739611, cs=nodes:1 nodes:2 nodes:3 575 I0318 10:30:59.398864 18 draft.go:506] Creating snapshot at index: 20739611. ReadTs: 39763033. time taken: 4.22447658s 576 I0318 10:31:47.558142 18 log.go:34] Got compaction priority: {level:0 score:1 dropPrefix:[]} 577 I0318 10:31:47.558183 18 log.go:34] Running for level: 0 578 I0318 10:31:48.202713 18 log.go:34] LOG Compact 0->1, del 6 tables, add 1 tables, took 644.511199ms 579 I0318 10:31:48.202740 18 log.go:34] Compaction for level: 0 DONE 580 I0318 10:31:54.985492 18 draft.go:1404] Num pending txns: 54 581 I0318 10:31:55.131087 18 draft.go:804] [0x1] Set Raft progress to index: 20768031. 582 I0318 10:31:55.131145 18 draft.go:1404] Num pending txns: 60 583 I0318 10:31:55.272493 18 storage.go:595] CreateSnapshot i=20768104, cs=nodes:1 nodes:2 nodes:3 584 I0318 10:31:58.809421 18 draft.go:506] Creating snapshot at index: 20768104. ReadTs: 39817697. time taken: 3.536931581s 585 I0318 10:32:54.972385 18 draft.go:1404] Num pending txns: 68 586 I0318 10:32:55.124161 18 draft.go:804] [0x1] Set Raft progress to index: 20796681. 587 I0318 10:32:55.124219 18 draft.go:1404] Num pending txns: 56 588 I0318 10:32:55.322028 18 storage.go:595] CreateSnapshot i=20796770, cs=nodes:1 nodes:2 nodes:3 589 I0318 10:32:59.558248 18 draft.go:506] Creating snapshot at index: 20796770. ReadTs: 39872679. time taken: 4.236221663s 590 I0318 10:33:28.558129 18 log.go:34] Got compaction priority: {level:0 score:1 dropPrefix:[]} 591 I0318 10:33:28.558176 18 log.go:34] Running for level: 0 592 I0318 10:33:29.230710 18 log.go:34] LOG Compact 0->1, del 6 tables, add 1 tables, took 672.502982ms 593 I0318 10:33:29.230742 18 log.go:34] Compaction for level: 0 DONE 594 I0318 10:33:54.986669 18 draft.go:1404] Num pending txns: 64 595 I0318 10:33:55.125660 18 draft.go:804] [0x1] Set Raft progress to index: 20825162. 596 I0318 10:33:55.125710 18 draft.go:1404] Num pending txns: 60 597 I0318 10:33:55.264070 18 storage.go:595] CreateSnapshot i=20825236, cs=nodes:1 nodes:2 nodes:3 598 I0318 10:33:58.716431 18 draft.go:506] Creating snapshot at index: 20825236. ReadTs: 39927283. time taken: 3.452365171s 599 I0318 10:34:54.969608 18 draft.go:115] Operation started with id: 1 Logs from this branch for snapshot creation time: 178 I0318 11:02:42.103363 1 draft.go:425] Creating snapshot at index: 21496840. ReadTs: 41226003. time taken: 297.39µs 179 I0318 11:03:21.943711 1 log.go:34] Got compaction priority: {level:0 score:1 dropPrefix:[]} 180 I0318 11:03:21.943763 1 log.go:34] Running for level: 0 181 I0318 11:03:24.510385 1 log.go:34] LOG Compact 0->1, del 6 tables, add 1 tables, took 2.566600068s 182 I0318 11:03:24.510417 1 log.go:34] Compaction for level: 0 DONE 183 I0318 11:03:40.164856 1 draft.go:1283] Num pending txns: 56 184 I0318 11:03:40.330751 1 draft.go:709] [0x1] Set Raft progress to index: 21526400. 185 I0318 11:03:40.330806 1 draft.go:1283] Num pending txns: 55 186 I0318 11:03:40.503145 1 storage.go:645] CreateSnapshot i=21526477, cs=nodes:1 nodes:2 nodes:3 187 I0318 11:03:40.503317 1 draft.go:425] Creating snapshot at index: 21526477. ReadTs: 41282847. time taken: 174.56µs 188 W0318 11:04:20.508999 1 draft.go:1025] Raft.Ready took too long to process: Timer Total: 221ms. Breakdown: [{disk 220ms} {proposal s 0s} {advance 0s}] Num entries: 0. MustSync: false 189 I0318 11:04:40.164813 1 draft.go:1283] Num pending txns: 66 190 I0318 11:04:40.429105 1 draft.go:709] [0x1] Set Raft progress to index: 21556865. 191 I0318 11:04:40.429157 1 draft.go:1283] Num pending txns: 23 192 I0318 11:04:40.639351 1 storage.go:645] CreateSnapshot i=21557009, cs=nodes:1 nodes:2 nodes:3 193 I0318 11:04:40.639667 1 draft.go:425] Creating snapshot at index: 21557009. ReadTs: 41341402. time taken: 317.73µs 194 W0318 11:04:40.952292 1 draft.go:1025] Raft.Ready took too long to process: Timer Total: 248ms. Breakdown: [{disk 243ms} {sync 5ms } {proposals 0s} {advance 0s}] Num entries: 30. MustSync: true 195 I0318 11:04:50.943669 1 log.go:34] Got compaction priority: {level:0 score:1 dropPrefix:[]} 196 I0318 11:04:50.943708 1 log.go:34] Running for level: 0 197 W0318 11:04:53.236307 1 draft.go:1025] Raft.Ready took too long to process: Timer Total: 443ms. Breakdown: [{sync 443ms} {disk 0s} {proposals 0s} {advance 0s}] Num entries: 35. MustSync: true 198 I0318 11:04:53.299058 1 log.go:34] LOG Compact 0->1, del 6 tables, add 1 tables, took 2.355332869s 199 I0318 11:04:53.299089 1 log.go:34] Compaction for level: 0 DONE 200 I0318 11:05:40.164832 1 draft.go:1283] Num pending txns: 39 201 I0318 11:05:40.340678 1 draft.go:709] [0x1] Set Raft progress to index: 21587019. 202 I0318 11:05:40.340732 1 draft.go:1283] Num pending txns: 34 203 I0318 11:05:40.511474 1 storage.go:645] CreateSnapshot i=21587137, cs=nodes:1 nodes:2 nodes:3 204 I0318 11:05:40.511781 1 draft.go:425] Creating snapshot at index: 21587137. ReadTs: 41399174. time taken: 306.48µs
Fixes: hypermodeinc#4250, DGRAPH-676 Currently snapshot creation is taking time, which shoots up mutations latency. This PR optimizes snapshot creation in following ways: We delete all raft entries until snapshot index, while saving new snapshot. This requires performing a seek on badger. We used to seek from entry 0, which is not required. We can seek from FirstIndex at that point of time. This avoids going through lot of previously deleted entries in badger iterator. This improves seek time significantly. Saving of new snapshot and deletion of all entries used to be synchronous. This PR saves new snapshot synchronously but performs deletion of entries in background. This PR reduces snapshot creation time from seconds(in worst case) to milliseconds. Logs from master for snapshot creation time: 574 I0318 10:30:55.174392 18 storage.go:595] CreateSnapshot i=20739611, cs=nodes:1 nodes:2 nodes:3 575 I0318 10:30:59.398864 18 draft.go:506] Creating snapshot at index: 20739611. ReadTs: 39763033. time taken: 4.22447658s 576 I0318 10:31:47.558142 18 log.go:34] Got compaction priority: {level:0 score:1 dropPrefix:[]} 577 I0318 10:31:47.558183 18 log.go:34] Running for level: 0 578 I0318 10:31:48.202713 18 log.go:34] LOG Compact 0->1, del 6 tables, add 1 tables, took 644.511199ms 579 I0318 10:31:48.202740 18 log.go:34] Compaction for level: 0 DONE 580 I0318 10:31:54.985492 18 draft.go:1404] Num pending txns: 54 581 I0318 10:31:55.131087 18 draft.go:804] [0x1] Set Raft progress to index: 20768031. 582 I0318 10:31:55.131145 18 draft.go:1404] Num pending txns: 60 583 I0318 10:31:55.272493 18 storage.go:595] CreateSnapshot i=20768104, cs=nodes:1 nodes:2 nodes:3 584 I0318 10:31:58.809421 18 draft.go:506] Creating snapshot at index: 20768104. ReadTs: 39817697. time taken: 3.536931581s 585 I0318 10:32:54.972385 18 draft.go:1404] Num pending txns: 68 586 I0318 10:32:55.124161 18 draft.go:804] [0x1] Set Raft progress to index: 20796681. 587 I0318 10:32:55.124219 18 draft.go:1404] Num pending txns: 56 588 I0318 10:32:55.322028 18 storage.go:595] CreateSnapshot i=20796770, cs=nodes:1 nodes:2 nodes:3 589 I0318 10:32:59.558248 18 draft.go:506] Creating snapshot at index: 20796770. ReadTs: 39872679. time taken: 4.236221663s 590 I0318 10:33:28.558129 18 log.go:34] Got compaction priority: {level:0 score:1 dropPrefix:[]} 591 I0318 10:33:28.558176 18 log.go:34] Running for level: 0 592 I0318 10:33:29.230710 18 log.go:34] LOG Compact 0->1, del 6 tables, add 1 tables, took 672.502982ms 593 I0318 10:33:29.230742 18 log.go:34] Compaction for level: 0 DONE 594 I0318 10:33:54.986669 18 draft.go:1404] Num pending txns: 64 595 I0318 10:33:55.125660 18 draft.go:804] [0x1] Set Raft progress to index: 20825162. 596 I0318 10:33:55.125710 18 draft.go:1404] Num pending txns: 60 597 I0318 10:33:55.264070 18 storage.go:595] CreateSnapshot i=20825236, cs=nodes:1 nodes:2 nodes:3 598 I0318 10:33:58.716431 18 draft.go:506] Creating snapshot at index: 20825236. ReadTs: 39927283. time taken: 3.452365171s 599 I0318 10:34:54.969608 18 draft.go:115] Operation started with id: 1 Logs from this branch for snapshot creation time: 178 I0318 11:02:42.103363 1 draft.go:425] Creating snapshot at index: 21496840. ReadTs: 41226003. time taken: 297.39µs 179 I0318 11:03:21.943711 1 log.go:34] Got compaction priority: {level:0 score:1 dropPrefix:[]} 180 I0318 11:03:21.943763 1 log.go:34] Running for level: 0 181 I0318 11:03:24.510385 1 log.go:34] LOG Compact 0->1, del 6 tables, add 1 tables, took 2.566600068s 182 I0318 11:03:24.510417 1 log.go:34] Compaction for level: 0 DONE 183 I0318 11:03:40.164856 1 draft.go:1283] Num pending txns: 56 184 I0318 11:03:40.330751 1 draft.go:709] [0x1] Set Raft progress to index: 21526400. 185 I0318 11:03:40.330806 1 draft.go:1283] Num pending txns: 55 186 I0318 11:03:40.503145 1 storage.go:645] CreateSnapshot i=21526477, cs=nodes:1 nodes:2 nodes:3 187 I0318 11:03:40.503317 1 draft.go:425] Creating snapshot at index: 21526477. ReadTs: 41282847. time taken: 174.56µs 188 W0318 11:04:20.508999 1 draft.go:1025] Raft.Ready took too long to process: Timer Total: 221ms. Breakdown: [{disk 220ms} {proposal s 0s} {advance 0s}] Num entries: 0. MustSync: false 189 I0318 11:04:40.164813 1 draft.go:1283] Num pending txns: 66 190 I0318 11:04:40.429105 1 draft.go:709] [0x1] Set Raft progress to index: 21556865. 191 I0318 11:04:40.429157 1 draft.go:1283] Num pending txns: 23 192 I0318 11:04:40.639351 1 storage.go:645] CreateSnapshot i=21557009, cs=nodes:1 nodes:2 nodes:3 193 I0318 11:04:40.639667 1 draft.go:425] Creating snapshot at index: 21557009. ReadTs: 41341402. time taken: 317.73µs 194 W0318 11:04:40.952292 1 draft.go:1025] Raft.Ready took too long to process: Timer Total: 248ms. Breakdown: [{disk 243ms} {sync 5ms } {proposals 0s} {advance 0s}] Num entries: 30. MustSync: true 195 I0318 11:04:50.943669 1 log.go:34] Got compaction priority: {level:0 score:1 dropPrefix:[]} 196 I0318 11:04:50.943708 1 log.go:34] Running for level: 0 197 W0318 11:04:53.236307 1 draft.go:1025] Raft.Ready took too long to process: Timer Total: 443ms. Breakdown: [{sync 443ms} {disk 0s} {proposals 0s} {advance 0s}] Num entries: 35. MustSync: true 198 I0318 11:04:53.299058 1 log.go:34] LOG Compact 0->1, del 6 tables, add 1 tables, took 2.355332869s 199 I0318 11:04:53.299089 1 log.go:34] Compaction for level: 0 DONE 200 I0318 11:05:40.164832 1 draft.go:1283] Num pending txns: 39 201 I0318 11:05:40.340678 1 draft.go:709] [0x1] Set Raft progress to index: 21587019. 202 I0318 11:05:40.340732 1 draft.go:1283] Num pending txns: 34 203 I0318 11:05:40.511474 1 storage.go:645] CreateSnapshot i=21587137, cs=nodes:1 nodes:2 nodes:3 204 I0318 11:05:40.511781 1 draft.go:425] Creating snapshot at index: 21587137. ReadTs: 41399174. time taken: 306.48µs
Fixes: #4250, DGRAPH-676
Currently snapshot creation is taking time, which shoots up mutations latency.
This PR optimizes snapshot creation in following ways:
0
, which is not required. We can seek fromFirstIndex
at that point of time. This avoids going through lot of previously deleted entries in badger iterator. This improves seek time significantly.This PR reduces snapshot creation time from seconds(in worst case) to milliseconds.
Logs from master for snapshot creation time:
Logs from this branch for snapshot creation time:
This change is
Docs Preview: