
Partial Movetables: allow moving a keyspace one shard at a time#9987

Merged
rohit-nayak-ps merged 20 commits into vitessio:main from planetscale:partial-movetables
Sep 23, 2022

@rohit-nayak-ps rohit-nayak-ps commented Mar 26, 2022

Warning! 
This feature is expected to be used only for migrating large Vitess clusters across data centers and should not be enabled on an ongoing basis since it adds performance overhead during query serving. It also changes the paradigm hitherto expected in Vitess that a table is served, at any given time, from a single keyspace. The latter might lead to unstable behavior, with global routing, for example.

TL;DR;

  • MoveTables can now take a subset of shards
  • ShardRoutingRules have been added, to route shards already moved from source to target
  • ShardRoutingRules apply to both shard-targeted and globally routed queries

Description

This feature introduces the concept of partial keyspaces where some shards are served from a different keyspace. This is
useful for a specific but critical use-case where a large production Vitess setup (100s of shards) is being migrated to
a new data center or provider. Migrating the entire cluster in one go using MoveTables could cause an unacceptable
downtime due to the large number of primaries that need to be synced when writes are switched.

Sample Usage

partial MoveTables signalled by --source_shards

vtctlclient MoveTables -- -source customer --tables 'customer,corder' --source_shards '-80' Create customer2.partial1

VDiff works as-is

vtctlclient VDiff customer2.partial1

SwitchTraffic generates this shard routing rule

vtctlclient MoveTables -- SwitchTraffic customer2.partial1

{"rules":[{"from_keyspace":"customer", "to_keyspace":"customer2", "shards":"-80"}]}
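To make the rule format concrete, here is a minimal Go sketch that parses the JSON above and indexes it by (keyspace, shard). The field names come from the JSON shown; the struct names, the `ruleMap` helper, the "keyspace.shard" key format, and the treatment of `shards` as a possibly comma-separated list are assumptions for illustration, not the types used in the PR.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// ShardRoutingRule mirrors one entry of the JSON generated by SwitchTraffic.
// The Go type itself is hypothetical; only the JSON keys come from the PR.
type ShardRoutingRule struct {
	FromKeyspace string `json:"from_keyspace"`
	ToKeyspace   string `json:"to_keyspace"`
	Shards       string `json:"shards"`
}

// ShardRoutingRules is the top-level document: {"rules": [...]}.
type ShardRoutingRules struct {
	Rules []ShardRoutingRule `json:"rules"`
}

// ruleMap (hypothetical helper) returns a map from "keyspace.shard" to the
// keyspace that now serves that shard.
// Assumption: "shards" may hold a comma-separated list of shard names.
func ruleMap(doc []byte) (map[string]string, error) {
	var parsed ShardRoutingRules
	if err := json.Unmarshal(doc, &parsed); err != nil {
		return nil, err
	}
	m := make(map[string]string)
	for _, r := range parsed.Rules {
		for _, shard := range strings.Split(r.Shards, ",") {
			m[r.FromKeyspace+"."+strings.TrimSpace(shard)] = r.ToKeyspace
		}
	}
	return m, nil
}

func main() {
	doc := []byte(`{"rules":[{"from_keyspace":"customer", "to_keyspace":"customer2", "shards":"-80"}]}`)
	m, err := ruleMap(doc)
	if err != nil {
		panic(err)
	}
	fmt.Println(m["customer.-80"]) // prints "customer2"
}
```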

Demo that shard routing now works

# stop workflow to stop reverse replication from running 
vtctlclient Workflow customer2.partial1 stop
# update customer2 database directly.
mysql -S $VTDATAROOT/vt_0000000500/mysql.sock -u vt_dba -e "update vt_customer2.customer set email = concat('routed.', email)"

# use shard targeting using customer (not customer2). 
mysql -e "use customer:-80; select customer_id, email from customer order by customer_id"

+-------------+---------------------------+
| customer_id | email                     |
+-------------+---------------------------+
|           1 | routed.alice@domain.com   |
|           2 | routed.bob@domain.com     |
|           3 | routed.charlie@domain.com |
|           5 | routed.eve@domain.com     |
+-------------+---------------------------+

Summary of code changes

Core Changes

Workflow Show changes

While we had the possibility of partial reads being switched earlier, now writes can also be partially switched in a workflow.

Shard Routing Rules

Topo

Shard routing rules are a new concept introduced for this feature. Each rule maps a (keyspace, shard) tuple to another keyspace, and together the rules form a new cluster-level map. They are set by SwitchTraffic in a partial MoveTables and used by vtgate while routing queries. The vtctlclient commands ApplyShardRoutingRules and GetShardRoutingRules allow setting and getting these rules.

vtgate Shard Targeting

Shard-targeted query routing is handled in vtgate's bypass mechanism (go/vt/vtgate/planbuilder/bypass.go). We create a map from the SrvVSchema's shard routing rules object and check whether a specified shard destination needs to be rerouted.
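As a rough illustration of that check, the following Go sketch reroutes a shard-targeted destination such as `customer:-80`. The function name `rerouteTarget`, the "keyspace.shard" map key, and the plain-string target are assumptions made for this example; the real code in bypass.go works on vtgate's own types, and tablet-type suffixes such as @replica are ignored here.

```go
package main

import (
	"fmt"
	"strings"
)

// rerouteTarget (hypothetical) takes a "keyspace:shard" target and returns
// the target that should actually be used once shard routing rules apply.
// rules maps "keyspace.shard" to the keyspace now serving that shard.
func rerouteTarget(rules map[string]string, target string) string {
	keyspace, shard, ok := strings.Cut(target, ":")
	if !ok {
		// Not a shard-targeted destination: leave it untouched.
		return target
	}
	if to, found := rules[keyspace+"."+shard]; found {
		return to + ":" + shard
	}
	return target
}

func main() {
	rules := map[string]string{"customer.-80": "customer2"}
	fmt.Println(rerouteTarget(rules, "customer:-80")) // moved shard
	fmt.Println(rerouteTarget(rules, "customer:80-")) // shard not moved yet
}
```

With the rule from the sample usage, the first call yields `customer2:-80` and the second stays `customer:80-`, which matches the demo where `use customer:-80` reads rows updated in customer2.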

vtgate Global Routing

Global query routing is handled in vtgate's ResolveDestinations() methods (go/vt/vtgate/vcursor_impl.go). We go through all selected shard destinations and modify those that are mapped in the shard routing rules.
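The global-routing step can be sketched in the same spirit: after destinations are resolved, each (keyspace, shard) pair is looked up and its keyspace is swapped if a rule exists. The `resolvedShard` type and `applyShardRouting` function below are hypothetical stand-ins, not the types used by ResolveDestinations().

```go
package main

import "fmt"

// resolvedShard is a simplified, hypothetical stand-in for a resolved
// shard destination in vtgate.
type resolvedShard struct {
	Keyspace string
	Shard    string
}

// applyShardRouting (hypothetical) returns a copy of dests in which every
// shard that has a routing rule points at the rule's target keyspace.
// rules maps "keyspace.shard" to the keyspace now serving that shard.
func applyShardRouting(rules map[string]string, dests []resolvedShard) []resolvedShard {
	out := make([]resolvedShard, len(dests))
	for i, d := range dests {
		if to, ok := rules[d.Keyspace+"."+d.Shard]; ok {
			d.Keyspace = to
		}
		out[i] = d
	}
	return out
}

func main() {
	rules := map[string]string{"customer.-80": "customer2"}
	dests := []resolvedShard{{"customer", "-80"}, {"customer", "80-"}}
	for _, d := range applyShardRouting(rules, dests) {
		fmt.Printf("%s/%s\n", d.Keyspace, d.Shard)
	}
}
```

A scatter query over the customer keyspace would then touch customer2/-80 and customer/80-, so one table is served from two keyspaces at once, which is the paradigm change the warning at the top describes.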

New column workflow_sub_type in _vt.vreplication

There is a new bool column workflow_sub_type added to _vt.vreplication, set for partial movetables. It is used for
visibility and for bypassing certain validations that expect a full keyspace.

Flags

vtgate --enable-partial-keyspace-migration

Default: false. It is used when a cluster is set up with shard routing rules, and it tells vtgate to use these rules while routing queries.

MoveTables --source_shards

This is the only flag needed to tell MoveTables to perform a partial MoveTables, for example --source_shards -80. The flag already exists and is used by Reshard.

Notes

Both read and write traffic are switched at the same time when a shard's migration is deemed complete (using SwitchTraffic), because that is when the shard routing rule is added. Switching reads and writes separately is done by updating the regular routing rules, targeting @replica and @rdonly.

Test Changes

e2e test TestPartialMoveTables

It creates a workflow that moves tables from only one shard to the target shard, ensures that it completed correctly, and switches traffic. It then checks that the shard routing rules and regular routing rules are set up correctly and that vtgate queries, both shard-targeted and global, are routed correctly.

Modified vtgate tests

A new set of tests has been added where the test cluster is set up as a partial cluster: one shard is in a different keyspace, with the ShardRoutingRules set up accordingly. The main change is that the vtgate params need to specify the default keyspace as the DbName to avoid ambiguity.

Some tests have to be skipped for partial keyspaces because they intrinsically affect tablets in the base cluster which are now not all serving.

TODOs:
[ ] Add explicit reasons for Skipped vtgate unit tests during partial keyspace

Checklist

  • Should this PR be backported?
  • Tests were added or are not required
  • Documentation was added or is not required

@rohit-nayak-ps rohit-nayak-ps added the labels Type: Enhancement, Component: VReplication, release notes (needs details), Skip Upgrade Downgrade, Do Not Merge Mar 26, 2022
@rohit-nayak-ps rohit-nayak-ps force-pushed the partial-movetables branch 2 times, most recently from 7395dd5 to 4f81532 Compare June 15, 2022 16:52
@rohit-nayak-ps rohit-nayak-ps added the label release notes and removed the label release notes (needs details) Jul 3, 2022
@rohit-nayak-ps rohit-nayak-ps force-pushed the partial-movetables branch 2 times, most recently from e2604c8 to 2574892 Compare July 5, 2022 07:54
@rohit-nayak-ps rohit-nayak-ps force-pushed the partial-movetables branch 2 times, most recently from 142cf66 to 5a9e6a5 Compare July 14, 2022 20:30

rohit-nayak-ps commented Jul 14, 2022

Squashed all commits, since it was becoming tougher to fix conflicts each time with a large number of commits.

@rohit-nayak-ps rohit-nayak-ps force-pushed the partial-movetables branch 6 times, most recently from 2a2c20c to 9fe807f Compare July 20, 2022 16:33
@rohit-nayak-ps rohit-nayak-ps changed the title from "POC: (do not merge): Shard-by-shard Cross-cluster Migration" to "Partial Movetables: allow moving a table one shard at a time" Jul 20, 2022
Signed-off-by: Rohit Nayak <rohit@planetscale.com>
…e multiple config.json files

@mattlord mattlord self-requested a review September 19, 2022 20:25

@mattlord mattlord left a comment

Nice work on this! There were a couple of things we should change before merging (example related), but otherwise it's a few minor questions and suggestions that you can make the final call on. I'll review your feedback tomorrow and quickly approve. Thanks!

I'm not sure what done means here. 🙂 We're making it a future ToDo post merge?

@mattlord mattlord Sep 19, 2022

I see. In that case, any reason not to export the flag variable enableShardRouting->vtgate.EnableShardRouting and reference that directly? That would make more sense to me than creating these new exported functions in the planbuilder package: EnableShardRoutingFlag() and IsShardRoutingEnabled().

@mattlord mattlord left a comment

❤️

enableSchemaChangeSignal = flag.Bool("schema_change_signal", true, "Enable the schema tracker; requires queryserver-config-schema-change-signal to be enabled on the underlying vttablets for this to work")
schemaChangeUser = flag.String("schema_change_signal_user", "", "User to be used to send down query to vttablet to retrieve schema changes")

enableShardRouting = flag.Bool("enable_partial_keyspace_migration", false, "(Experimental) Follow shard routing rules: enable only while migrating a keyspace shard by shard. See documentation on Partial MoveTables for more. (default false)")

Minor thing, but I think we're supposed to use dashes for new flags.
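For illustration, a dashed variant of the flag definition might look like the sketch below. It uses Go's standard `flag` package, as the snippet under review does, with the dashed name listed in the Flags section of the description; the `newVtgateFlags` helper, the FlagSet name, and the shortened help text are assumptions, and this is not the code that was ultimately merged.

```go
package main

import (
	"flag"
	"fmt"
)

// newVtgateFlags (hypothetical helper) registers the flag on its own FlagSet
// so the example can be parsed without touching the process-wide flags.
func newVtgateFlags() (*flag.FlagSet, *bool) {
	fs := flag.NewFlagSet("vtgate-sketch", flag.ContinueOnError)
	enableShardRouting := fs.Bool(
		"enable-partial-keyspace-migration", // dashes instead of underscores
		false,
		"(Experimental) Follow shard routing rules: enable only while migrating a keyspace shard by shard.",
	)
	return fs, enableShardRouting
}

func main() {
	fs, enableShardRouting := newVtgateFlags()
	if err := fs.Parse([]string{"--enable-partial-keyspace-migration"}); err != nil {
		panic(err)
	}
	fmt.Println(*enableShardRouting) // prints "true"
}
```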

@rohit-nayak-ps rohit-nayak-ps merged commit 86e0cf8 into vitessio:main Sep 23, 2022
@rohit-nayak-ps rohit-nayak-ps deleted the partial-movetables branch September 23, 2022 10:15
@rohit-nayak-ps rohit-nayak-ps changed the title from "Partial Movetables: allow moving a table one shard at a time" to "Partial Movetables: allow moving a keyspace one shard at a time" Oct 5, 2022
DeathBorn added a commit to vinted/vitess that referenced this pull request Apr 23, 2024
…moving a keyspace one shard at a time vitessio#9987

Signed-off-by: Vilius Okockis <vilius.okockis@vinted.com>