Add Support for Restoring Specific Backups#8824
Conversation
By default Vitess will only make practical use of the latest backup of a given shard. While this makes perfect sense for the common use cases there are times where you need to restore a specific backup. For example: 1. In order to extract a portion of the data that can then be merged with the current state. For example if you later realize that you accidentally deleted some records in a table that you shouldn't have last week, and you need to perform a restore so that you can copy those specific records back to the live data set. 2. To perform validation, forensics, analysis on the system state at that time. 3. A specific PITR for whatever reason ... This is a continuation of: vitessio#7998 This solves: vitessio#4905 Co-authored-by: Guido Iaquinti <giaquinti@slack-corp.com> Signed-off-by: Matt Lord <mattalord@gmail.com>
597aa29 to
50938fd
Compare
|
Once tests are added LGTM. Thank you for moving this forward! |
1caad45 to
863df25
Compare
|
I've tested and verified the tablet flag as well as the vtctlclient flag: BackupsUpdated help outputInvalid timestampUses latest / 2021-09-17.034524Uses the older one based on the timestampNow I have a good idea of what to test and how in the test suite. 🙂 |
40a66cf to
0d9f2c8
Compare
Signed-off-by: Matt Lord <mattalord@gmail.com>
0d9f2c8 to
bf78488
Compare
deepthi
left a comment
There was a problem hiding this comment.
Looking good, except protobuf definition is missing.
e61794f to
4b80776
Compare
And add missing protobuf change Signed-off-by: Matt Lord <mattalord@gmail.com>
9db70ac to
25fa195
Compare
dff915a to
f8ec301
Compare
mm == minutes (two digits) MM == month (two digits) Signed-off-by: Matt Lord <mattalord@gmail.com>
ee31291 to
a7a951b
Compare
deepthi
left a comment
There was a problem hiding this comment.
LGTM.
Let's get another set of eyes before merging.
Can you please create a website PR to document the new usage (vttablet flag & vtctl command)?
| err = localCluster.VtctlclientProcess.ExecuteCommand("DeleteTablet", "-allow_primary=true", replica2.Alias) | ||
| require.Nil(t, err) |
There was a problem hiding this comment.
This is a clever way to simulate the test scenario 👍
This documents the work done in: vitessio/vitess#8824 Signed-off-by: Matt Lord <mattalord@gmail.com>
vitessio/website#834 (noted in the description now and linked in comments) |
ajm188
left a comment
There was a problem hiding this comment.
I wanted to bring up one comment about the types we're using, but if you don't feel it's worth it to change then I'm happy to merge as-is.
| // Check if we should use the latest (default) or a specified backup timestamp for the restore | ||
| if restoreFromBackupTs != "" { | ||
| startTime, err = time.Parse(mysqlctl.BackupTimestampFormat, restoreFromBackupTs) | ||
| if err != nil { | ||
| return vterrors.New(vtrpcpb.Code_INVALID_ARGUMENT, fmt.Sprintf("unable to parse the backup timestamp value provided of '%s'", restoreFromBackupTs)) | ||
| } | ||
| } |
There was a problem hiding this comment.
I skimmed the rest of the PR and it doesn't seem like anyone brought this up. Another way to do this would be to push the parsing further up to the cli, and enforce stricter types closer to this function. In this way we could:
commandRestoreBackupingo/vt/vtctl/vtctl.goparses itsbackupTimestampStrinto atime.Time, usingmysqlctl.BackupTimestampFormat- we pass
time.Timeto all the tmc functions - when making the gRPC request, we change that protobuf definition to be a
vttime.Time, and callprotoutil.TimeToPrototo convert thetime.Timeto avttime.Time, and convert back on the tabletmanager side.
What do you think?
There was a problem hiding this comment.
I briefly considered this but wasn't sure of the benefits (avoiding the RPC call with an invalid vtctl CLI parameter is one). I don't mind making this change if you prefer it? Sounds like you do or you wouldn't have mentioned it 😄 ... so I'll work on that now. Thanks!
There was a problem hiding this comment.
OK, after trying it again I'm remembering WHY I didn't go this route last time. 🙂
What makes it feel awkward/clunky to use time.Time coming from the vtctl side is that there's also a corresponding vttablet flag that's used directly in the tmc:
As things are now, it's uniform handling (always a string) on the tmc side -- whether the value came from a vtctl call and RPC or from the vttablet flag. Maybe I'm missing something though?
I understand the pull to use a more precise type... but it also feels like it makes the code more complex with little practical benefit. If you feel strongly about it though I can continue.
There was a problem hiding this comment.
OK, I just switched to this method. Thank you for the nudge! I feel better about it too and It wasn't really that awkward in the end. :-)
0920d09 to
67db03f
Compare
Signed-off-by: Matt Lord <mattalord@gmail.com>
67db03f to
f9c6938
Compare
| // Open the state manager after restore is done. | ||
| defer tm.tmState.Open() | ||
|
|
||
| // Zero date will cause us to use the latest, which is the default |
Follow-up to: vitessio#8824 I noticed this oversight after it was merged. :-) Signed-off-by: Matt Lord <mattalord@gmail.com>
Description
By default Vitess will only make practical use of the latest backup of a given shard. While this makes perfect sense for the common use cases there are times where you may want to restore a specific backup. For example:
This work supports that in two different ways:
-restore_from_backup -restore_from_backup_ts 2021-04-29.133050vtctlclient -server=<vtctld-server>:<vtctld-port> RestoreFromBackup -backup_timestamp=2021-04-29.133050 <tablet-alias>This compared with PITR
While Vitess supports a method to accomplish PITR, it's a fairly involved process with the details left up to the user (e.g. binlog servers). This work offers another method that can potentially be used in various ways/circumstances as an alternative. For example, let's say we realize that 1-2 days ago we made a mistake and want to look at the previous state of the data on one tablet in the shard... we could do that with:
This will stop MySQL replication on the tablet ASAP after the restore completes and prevent it from being automatically re-started (via tablet repair). This allows you to do whatever you like on that tablet before deciding to later:
vtctlclient StartReplication)Co-authored-by: Guido Iaquinti giaquinti@slack-corp.com
Signed-off-by: Matt Lord mattalord@gmail.com
Related Issue(s)
This is a continuation of: #7998
This solves: #4905
Checklist