VReplication based traffic migrater#4981
Conversation
|
Oh this is very cool. I'll take time to review this as I'm familiarized with the legacy MigrateServed. |
| // If it's a bad table or db, it could be because _vt.vreplication wasn't created. | ||
| // In that case we can try creating it again. | ||
| // If it's a bad table or db, it could be because the vreplication tables weren't created. | ||
| // In that case we can try creating then again. |
go/vt/wrangler/fake_tablet_test.go
Outdated
| // This file was copied from testlib. All tests from testlib should be moved | ||
| // to the current directory. In order to move tests from there, we have to | ||
| // remove the circular dependency it causes (through vtctl dependence). | ||
| // The tests in this diectory call wrangler functions directory. So, there's |
There was a problem hiding this comment.
| // The tests in this diectory call wrangler functions directory. So, there's | |
| // The tests in this directory call wrangler functions directly. So, there's |
go/vt/wrangler/migrater.go
Outdated
| return err | ||
| } | ||
|
|
||
| // For reads, locking the source keysppace is sufficient. |
There was a problem hiding this comment.
| // For reads, locking the source keysppace is sufficient. | |
| // For reads, locking the source keyspace is sufficient. |
| } | ||
| defer unlock(&err) | ||
|
|
||
| if mi.migrationType == binlogdatapb.MigrationType_TABLES { |
There was a problem hiding this comment.
When would you use TABLES vs SHARDS?
go/vt/wrangler/migrater.go
Outdated
| return targets, nil | ||
| } | ||
|
|
||
| // hashStreams produces a reproduceable hash based on the input parameters. |
There was a problem hiding this comment.
| // hashStreams produces a reproduceable hash based on the input parameters. | |
| // hashStreams produces a reproducible hash based on the input parameters. |
|
|
||
| //------------------------------------------------------------------------------------------------------------------- | ||
| // Single cell RDONLY migration. | ||
| err := tme.wr.MigrateReads(ctx, tme.targetKeyspace, "test", topodatapb.TabletType_RDONLY, []string{"cell1"}, DirectionForward) |
There was a problem hiding this comment.
put a checkCellRouting before this to show that there are only 4 entries for cell1 before we MigrateReads, so that it is obvious to anyone reading the test what changes happened because of the call to MigrateReads.
go/vt/wrangler/migrater_test.go
Outdated
| // Other cell REPLICA migration. | ||
| // The global routing already contains redirections for rdonly. | ||
| // So, adding routes for replica and deploying to cell2 will also cause | ||
| // cell2 to migrat rdonly. This is a quirk that can be fixed later if necessary. |
There was a problem hiding this comment.
| // cell2 to migrat rdonly. This is a quirk that can be fixed later if necessary. | |
| // cell2 to migrate rdonly. This is a quirk that can be fixed later if necessary. |
go/vt/wrangler/migrater_test.go
Outdated
| // Other cell REPLICA migration. | ||
| // The global routing already contains redirections for rdonly. | ||
| // So, adding routes for replica and deploying to cell2 will also cause | ||
| // cell2 to migrat rdonly. This is a quirk that can be fixed later if necessary. |
| "ks1.t2@replica": {"ks2.t2"}, | ||
| }) | ||
| verifyQueries(t, tme.allDBClients) | ||
|
|
There was a problem hiding this comment.
will be nice to add one more backward migration here - of REPLICA
Signed-off-by: Sugu Sougoumarane <ssougou@gmail.com>
Signed-off-by: Sugu Sougoumarane <ssougou@gmail.com>
Signed-off-by: Sugu Sougoumarane <ssougou@gmail.com>
Signed-off-by: Sugu Sougoumarane <ssougou@gmail.com>
Signed-off-by: Sugu Sougoumarane <ssougou@gmail.com>
Signed-off-by: Sugu Sougoumarane <ssougou@gmail.com>
Signed-off-by: Sugu Sougoumarane <ssougou@gmail.com>
Signed-off-by: Sugu Sougoumarane <ssougou@gmail.com>
Signed-off-by: Sugu Sougoumarane <ssougou@gmail.com>
Signed-off-by: Sugu Sougoumarane <ssougou@gmail.com>
Signed-off-by: Sugu Sougoumarane <ssougou@gmail.com>
Since we don't support multiple source or target keyspaces, the uid parameters can be simplified to only have shards as keys, and there will be a separate targetKeyspace parameter that applies to all shards. Signed-off-by: Sugu Sougoumarane <ssougou@gmail.com>
Specifying shards and uids wasn't user-friendly. Specifying workflow names should better. However, it will be the user's responsibilty to keep them unique. Signed-off-by: Sugu Sougoumarane <ssougou@gmail.com>
Signed-off-by: Sugu Sougoumarane <ssougou@gmail.com>
Signed-off-by: Sugu Sougoumarane <ssougou@gmail.com>
Signed-off-by: Sugu Sougoumarane <ssougou@gmail.com>
There was a problem hiding this comment.
I didn't review the tests in detail. I spent most of my time in migrater.go
This looks really good and way more readable that we have today. I added some comments.
Next steps would be to refactor the low level functions. I need to remember what was the plan we had for that.
go/vt/vtctl/vtctl.go
Outdated
| } | ||
|
|
||
| func commandMigrateReads(ctx context.Context, wr *wrangler.Wrangler, subFlags *flag.FlagSet, args []string) error { | ||
| reverse := subFlags.Bool("reverse", false, "Moves the served tablet type backward instead of forward. Use in case of trouble") |
There was a problem hiding this comment.
Maybe: Replace use in case of trouble. Use in case a rollback of a migration is needed.
There was a problem hiding this comment.
I removed that entire statement (including places where I copied from). It doesn't add any value.
| } | ||
|
|
||
| log.Info("Looks like _vt.vreplication table may not exist. Trying to recreate... ") | ||
| log.Info("Looks like the vreplcation tables may not exist. Trying to recreate... ") |
There was a problem hiding this comment.
I forgot if this is this log, but there is one that as is similar to this one that in practice becomes noisy and confusing. I think we should remove and only log if there are errors.
There was a problem hiding this comment.
I removed the confusing comment a few weeks ago. It was in tablet manager. This should not spam and looks useful, because it will let us know when the tables got created.
| if servedType != topodatapb.TabletType_REPLICA && servedType != topodatapb.TabletType_RDONLY { | ||
| return fmt.Errorf("tablet type must be REPLICA or RDONLY: %v", servedType) | ||
| } | ||
| mi, err := wr.buildMigrater(ctx, targetKeyspace, workflow) |
There was a problem hiding this comment.
Wouldn't be better to pass servedType and direction to the migrator? That way this whole method could become way more succinct because:
- The validation above could be push down to the
mi.validate. - The logic to decide if it's
migrateTableReadsormigrateShardReadscould also be pushed down there. So this whole method could be something like:
mi, err = buildReadMigrater(...)
...
if err := mi.validate(ctx); err != nil {
return err
}
ctx, unlock, lockErr := wr.ts.LockKeyspace(ctx, mi.sourceKeyspace, "MigrateReads")
if lockErr != nil {
return lockErr
}
return mi.migrate(ctx)
This implies that you have a migrator for reads and a migrator for writes. But I'm thinking that this is actually better as right now the paths are already different for the validations and we are passing a flag to distinguish this.
There was a problem hiding this comment.
I tried changing. It made some things better and some things worse. So, let's leave it the way it is for now until we add more functionality.
| if err != nil { | ||
| return nil, err | ||
| } | ||
| if len(p3qr.Rows) < 1 { |
There was a problem hiding this comment.
I think the comment is still not very clear. Maybe adding a concrete example will help visualize.
| if err := mi.wr.saveRoutingRules(ctx, rules); err != nil { | ||
| return err | ||
| } | ||
| return mi.wr.ts.RebuildSrvVSchema(ctx, cells) |
There was a problem hiding this comment.
Why do we need this for the tables case? This is not clear to me.
There was a problem hiding this comment.
RebuildSrvVSchema also propagates routing rules to the cells. Routing rules are logically part of the vschema, except that they're not associated to a specific keyspace.
Signed-off-by: Sugu Sougoumarane <ssougou@gmail.com>
This is the new VReplication based migrater that will eventually replace all existing MigrateServed (From and Types).
At a high level, the input for the migration is a list of vreplication streams. The migrate command will migrate the traffic for all the requested streams. This means that the higher level
MigrateServedfunctions can be changed to reuse the new implementation instead. The main idea is that the resharding metadata that was previously crucial for traffic migration will become non-critical. We instead rely on the vreplication streams to be the source of truth.The functionality is exposed as vtctl commands
MigrateReadsandMigrateWrites. One of the input arguments is a workflow name which can be used to identify a group of vreplication streams across all shards.The new code has the following properties:
_vt.resharding_journaltable has been created to keep track of the important parameters and binlog positions of a migration. This can be used to reverse vreplication, and also for redirecting other vreplication streams.