fix(backup): use StreamWriter instead of KVLoader during backup restore #8510
Conversation
Do we want to merge all these in one go, or can we split these up into smaller cherry-picks?
No, I was thinking of merging them one by one, but I realized that these changes are better merged together. The challenge is that they involve a lot of refactoring, including file name changes, and it becomes very difficult to make sense of them one commit at a time.
NIT - could we align titles to our existing format?
A couple of comments -
Most of the normal cases are already covered. I have a few more tests in mind, and Siddhesh has a PR for adding more tests. I want to unblock the rest of the changes for the slash alignment, and I will work on adding tests in parallel.
I think StreamWriter is inherently faster than our existing approach. Even if it is not faster, it improves the performance for writes later on. Let's talk more about this in tomorrow's meeting.
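For reference, here is a minimal sketch of the badger StreamWriter pattern being discussed, assuming the v2-style API where `Write` takes a `*pb.KVList` (the `bulkLoad` function name is hypothetical, not the PR's actual code):

```go
// Sketch: bulk-loading sorted key-values with badger's StreamWriter.
// StreamWriter builds SSTables directly, bypassing the normal LSM
// write path, which is why it beats transactional loaders for restore.
package backup

import (
	"github.com/dgraph-io/badger/v2"
	"github.com/dgraph-io/badger/v2/pb"
)

func bulkLoad(db *badger.DB, kvs *pb.KVList) error {
	sw := db.NewStreamWriter()
	// Prepare drops existing data and readies the DB for streaming.
	if err := sw.Prepare(); err != nil {
		return err
	}
	// Keys must arrive in sorted order within each stream.
	if err := sw.Write(kvs); err != nil {
		return err
	}
	// Flush finalizes the tables and makes the data visible.
	return sw.Flush()
}
```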
still looking over ...
This is how the map-reduce phases are implemented in this PR: we create MAP files, each of a limited size, and write sorted data into them. We may end up creating many such files. Then we take all of these MAP files, read part of the data from each file, sort all of this data, and then use StreamWriter to write the sorted data into pstore badger. We store partition keys at the beginning of each MAP file. The partition keys are just intermediate keys among the entries that we store in the map file. When we read data during reduce, we read in chunks bounded by these partition keys, meaning from one partition key to the next. I am not sure if there is value in having these partition keys; maybe we can live without them.
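A rough sketch of the map phase as described above; all names here (`mapFileWriter`, `writeMapFile`, `partitionStride`) are hypothetical illustrations, not the PR's actual code:

```go
package restoremap

import (
	"bytes"
	"sort"
)

// Hypothetical: emit one partition key per N entries.
const partitionStride = 1000

// writeMapFile stands in for the real encoder: the header carries the
// partition keys, the body carries the sorted entries.
func writeMapFile(partitions, entries [][]byte) error { return nil }

type mapFileWriter struct {
	entries [][]byte
	size    int
	maxSize int // size cap per MAP file (2GB in the PR)
}

func (w *mapFileWriter) add(entry []byte) error {
	w.entries = append(w.entries, entry)
	w.size += len(entry)
	if w.size >= w.maxSize {
		return w.flush()
	}
	return nil
}

func (w *mapFileWriter) flush() error {
	// Entries are sorted before hitting disk, so each MAP file is
	// internally ordered and the reduce phase only has to merge.
	sort.Slice(w.entries, func(i, j int) bool {
		return bytes.Compare(w.entries[i], w.entries[j]) < 0
	})
	// Intermediate keys become the partition keys stored in the header.
	var partitions [][]byte
	for i := partitionStride; i < len(w.entries); i += partitionStride {
		partitions = append(partitions, w.entries[i])
	}
	if err := writeMapFile(partitions, w.entries); err != nil {
		return err
	}
	w.entries, w.size = nil, 0
	return nil
}
```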
A few questions as I try to wade through the change:
hard coded
in restore_reduce.go; it is set to 2GB
correct
correct. We need to sort data across all the map files, while each map file is already sorted. We read data up to the partition key from each map file, sort that data, and then write it to badger.
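A sketch of that reduce step, under the same hypothetical naming as above. Because each MAP file is already sorted, the partition keys only bound how much data has to be held in memory per merge batch:

```go
package restorereduce

import (
	"bytes"
	"sort"
)

// mapFileReader stands in for a reader over one sorted MAP file.
type mapFileReader interface {
	// readUpTo returns all remaining entries whose key is < bound.
	readUpTo(bound []byte) [][]byte
}

// reducePartition merges one partition's worth of data from every MAP
// file: each slice is already sorted, so after concatenating we only
// need one sort (or a k-way merge) before streaming to badger.
func reducePartition(readers []mapFileReader, bound []byte,
	writeToBadger func([][]byte) error) error {

	var batch [][]byte
	for _, r := range readers {
		batch = append(batch, r.readUpTo(bound)...)
	}
	sort.Slice(batch, func(i, j int) bool {
		return bytes.Compare(batch[i], batch[j]) < 0
	})
	// Each partition's key range is disjoint from the next, so the
	// batches can be handed to the StreamWriter strictly in order.
	return writeToBadger(batch)
}
```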
LGTM. Minor nitpicks here and there. I haven't really looked that deeply into the algorithm yet.
type predicateSet map[string]struct{}

// Manifest records backup details, these are values used during restore.
// Since is the timestamp from which the next incremental backup should start (it's set
Can we do something like SinceTs is the timestamp
what do you mean?
"/state": true, | ||
"/health": true, | ||
"/state": true, | ||
"/probe/graphql": true, |
Does this require a doc update in the audit log section, noting that this endpoint will not be audited?
I think it does require a doc update:
https://dgraph.io/docs/enterprise-features/audit-logs/
I will make a note of it. Thanks
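For context, a minimal sketch of how a skip map like the one in this diff typically gates audit logging in HTTP middleware; the names here are illustrative, not dgraph's actual code:

```go
package audit

import "net/http"

// skipEndpoints mirrors the map in the diff above: requests to these
// paths are never written to the audit log.
var skipEndpoints = map[string]bool{
	"/health":        true,
	"/state":         true,
	"/probe/graphql": true,
}

// auditMiddleware logs every request except those on the skip list,
// then hands the request to the wrapped handler.
func auditMiddleware(next http.Handler, log func(*http.Request)) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if !skipEndpoints[r.URL.Path] {
			log(r)
		}
		next.ServeHTTP(w, r)
	})
}
```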
It changes
This commit is a major rewrite of backup and online restore code. It used to use KVLoader in badger. Now it instead uses StreamWriter, which is much faster for writes.

cherry-pick PR #7753

The following commits are cherry-picked (in reverse order):

* opt(restore): Sort the buffer before spinning the writeToDisk goroutine (#7984) (#7996)
* fix(backup): Fix full backup request (#7932) (#7933)
* fix: fixing graphql schema update when the data is restored + skipping /probe/graphql from audit (#7925)
* fix(restore): return nil if there is error (#7899)
* Don't ban namespace in export_backup
* reset the kv.StreamId before sending to stream writer (#7833) (#7837)
* fix(restore): Bump uid and namespace after restore (#7790) (#7800)
* fix(ee): GetKeys should return an error (#7713) (#7797)
* fix(backup): Free the UidPack after use (#7786)
* fix(export-backup): Fix double free in export backup (#7780) (#7783)
* fix(lsbackup): Fix profiler in lsBackup (#7729)
* Bring back "perf(Backup): Improve backup performance (#7601)"
* Opt(Backup): Make backups faster (#7680)
* Fix s3 backup copy (#7669)
* [BREAKING] Opt(Restore): Optimize Restore's new map-reduce based design (#7666)
* Perf(restore): Implement map-reduce based restore (#7664)
* feat(backup): Merge backup refactoring
* Revert "perf(Backup): Improve backup performance (#7601)"
cherry-pick PR #7753
This commit is a major rewrite of online restore code. It used to use KVLoader in badger. Now it instead uses StreamWriter, which is much faster for writes in the case of restore.