diff --git a/content/en/blog/2022-11-14-vdiff-v2.md b/content/en/blog/2022-11-14-vdiff-v2.md new file mode 100644 index 000000000..04426f532 --- /dev/null +++ b/content/en/blog/2022-11-14-vdiff-v2.md @@ -0,0 +1,161 @@ +--- +author: 'Matt Lord' +date: 2022-11-22 +slug: '2022-11-22-vdiff-v2' +tags: ['Vitess','MySQL','sharding','replication'] +title: 'Introducing VDiff V2' +description: "Vitess's Powerful Workflow Data Diff Tool Just Got Even Better" +--- + +Vitess is a solution that allows you to infinitely scale MySQL while providing clients and apps with [a single logical +view](https://vitess.io/docs/concepts/vtgate/) of the fleet of MySQL instances comprising any number of +[`Keyspaces`](https://vitess.io/docs/concepts/keyspace/) and [`Shards`](https://vitess.io/docs/concepts/shard/). + +Vitess also provides the cluster and data management tools that make it possible to manage a massive cluster and +perform complex workflows using [VReplication](https://vitess.io/docs/reference/vreplication/vreplication/), such +as: + * [Moving tables](https://vitess.io/docs/reference/vreplication/movetables/) into Vitess or between keyspaces + * [Resharding](https://vitess.io/docs/reference/vreplication/reshard/) to adjust to changes in data size and load + * [Materialized views and rollups](https://vitess.io/docs/reference/vreplication/materialize/) for data analytics +and data locality + * [Online schema changes](https://vitess.io/docs/user-guides/schema-changes/managed-online-schema-changes/) that +are trackable, cancellable, revertible, and retryable + +## Why a Diff Tool? + +Data is typically one of the most critical assets within an organization. As such, an operator needs to be able to +verify the correctness of this data, in particular as the data is moved around or otherwise transformed. 
For example, +operators have wanted a way to verify data consistency after replicating data from one MySQL instance to another or +dumping a table from one instance and loading it into another. However, even for a single table in these simplest of +cases, performing a safe, reliable, lightweight, and performant online diff between two MySQL instances is a +surprisingly difficult problem. Due to the challenges involved, there have been few attempted general solutions, with +the most notable being: + * Percona's [pt-table-checksum](https://docs.percona.com/percona-toolkit/pt-table-checksum.html) + * MySQL's `mysqldiff` tool that was part of the now EOL'd [MySQL Utilities](https://downloads.mysql.com/docs/mysql-utilities-1.6-en.pdf) + +With Vitess, _the need for a data diff tool is even more pronounced_ because you'll be migrating data from your +legacy systems into Vitess, migrating data across keyspaces, and performing a variety of other workflows. This +is further complicated by the fact that these workflows may be done across MySQL versions and data centers, with +differing schemas between the source and target, and over long time periods in which your data evolves. So it +is critical to have a tool that can reliably perform a logical diff between the source and target of these +workflows, in a timely manner, and without impacting production traffic. + +## VDiff + +Vitess provided a solution for _diffing tables that are part of a [VReplication](https://vitess.io/docs/reference/vreplication/vreplication/) +workflow_ called [VDiff](https://vitess.io/docs/reference/vreplication/vdiff/). 
I will walk through the basic algorithm or flow used for diffing +each table in order to demonstrate the challenges involved and how we solved them in VDiff: +* [`vtctld`](https://vitess.io/docs/reference/programs/vtctld/) + [selects tablets](https://vitess.io/docs/reference/vreplication/tablet_selection/) in the source and target + shards to use for the comparison — one per shard on each side +* On the target [tablets](https://vitess.io/docs/concepts/tablet/): stop the VReplication workflow for the VDiff + operation, to "freeze" the state, and record the current + [GTID](https://dev.mysql.com/doc/refman/en/replication-gtids-concepts.html) position in the + [VStream](https://vitess.io/docs/concepts/vstream/) +* On the source [tablets](https://vitess.io/docs/concepts/tablet/): + * wait for replication to catch up to at least where the target is (remember that the source instance may be a replica and the target + a primary) + * [lock the table](https://dev.mysql.com/doc/refman/en/lock-tables.html) to get the current + [`GTID_EXECUTED`](https://dev.mysql.com/doc/refman/en/replication-gtids-concepts.html), which gives us a logical + point in time that will correspond to the read view in our upcoming transaction + * issue [`START TRANSACTION WITH CONSISTENT SNAPSHOT`](https://dev.mysql.com/doc/refman/en/commit.html) + * [unlock the table](https://dev.mysql.com/doc/refman/en/lock-tables.html) as we now have a consistent snapshot of + the table data and the GTID metadata that are both at the same logical point in time with regard to the table + we're diffing +* On the target [tablets](https://vitess.io/docs/concepts/tablet/): + * start VReplication UNTIL we have reached that `GTID_EXECUTED` position in the [VStream](https://vitess.io/docs/concepts/vstream/) + which matches the one we saved when setting up the read view on the source + * issue [`START TRANSACTION WITH CONSISTENT SNAPSHOT`](https://dev.mysql.com/doc/refman/en/commit.html) (remember + that the state is "frozen" on the target tablet) — now the target context is at the same logical point in + time as the source for this table +* On the source and target tablets: issue `SELECT <columns> FROM <table> ORDER BY <primary key columns>` (for deterministic ordering and to avoid a filesort) +* In [`vtctld`](https://vitess.io/docs/reference/programs/vtctld/): stream the results from those SELECTs, doing a + merge sort from the shards, and compare the rows on both sides logically, as the schema may be different on either + side, keeping a record of any differences seen +* On the target [tablets](https://vitess.io/docs/concepts/tablet/): restart the VReplication workflow +* On the source and target [tablets](https://vitess.io/docs/concepts/tablet/): close the open transaction with + a `ROLLBACK` +* Finally, the [`vtctl`](https://vitess.io/docs/reference/programs/vtctl/) client prints a report (to STDOUT) of the + results + +{{< info >}} +For large tables, holding a transaction open on the source tablets can have a significant impact on normal query +traffic due to [InnoDB MVCC](https://dev.mysql.com/doc/refman/en/innodb-multi-versioning.html) needing to keep those +older versions of rows around if they are updated after the transaction started +([`innodb_history_list_length`](https://orangematter.solarwinds.com/2015/07/20/what-is-innodb-history-list-length/)). For +this reason, I would recommend always using REPLICA tablets for VDiff operations whenever you can (when the source is an +[unmanaged tablet](https://vitess.io/docs/user-guides/configuration-advanced/unmanaged-tablet/), such as when moving +from RDS into Vitess, you may only have a PRIMARY tablet available). You can control that using the +`--tablet_types=REPLICA` flag for the [VDiff command](https://vitess.io/docs/reference/vreplication/vdiff/). In v14+ the +default was changed to: `--tablet_types=in_order:RDONLY,REPLICA,PRIMARY`. 
+{{< /info >}} + +The original version worked very well but it suffered from [some limitations](https://vitess.io/docs/15.0/reference/vreplication/vdiff/#note) +that posed challenges in certain situations such as when working +[with very large tables](https://vitess.io/docs/15.0/reference/vreplication/vdiff/#using-vdiff-with-huge-tables). +For example, if you have over 1TiB of data that needs to be compared, the VDiff could take a week to complete. If +during this time you had any failure such as one of the MySQL connections used getting closed (e.g. due to +[`wait_timeout`](https://dev.mysql.com/doc/refman/en/server-system-variables.html#sysvar_wait_timeout) or +[`net_write_timeout`](https://dev.mysql.com/doc/refman/en/server-system-variables.html#sysvar_net_write_timeout)) +then you'd have to start the entire operation over again from scratch. + +We processed feedback from Vitess users over the course of 2+ years as they used VDiff in production and a +set of underlying issues started to become clear: +* Fragility — any connection loss, process failure, failover etc. would cause the VDiff to fail and need to be re-run +* Synchronous command — the vtctl client command would block until the VDiff completed, which posed some challenges and + required a stable machine where e.g. a [tmux](https://github.com/tmux/tmux/wiki) session could be used for the client + call +* VTCtld as the controller — the [Vitess cluster management daemon](https://vitess.io/docs/reference/programs/vtctld/) is + generally a lightweight process used to coordinate complex operations that span many Vitess components. It's not designed + to be used for operations that span days and require the resources needed to compare 100s of GiBs of data + * Network traffic — the [tablets](https://vitess.io/docs/concepts/tablet/) on each side of the VDiff streamed their + data to the `vtctld` process which then compared the data. 
This generated a lot of network traffic which could + become a bottleneck and impact overall network bandwidth and latency. Keep in mind that it's common for the data + involved to reside in 3 or more failure domains / availability zones. +* No progress reporting — the VDiff could run for days without any indication of overall progress +* Execution time — the VDiff could take days or weeks to complete for very large tables, in large part because there + was very little concurrency with a single `vtctld` process doing the bulk of the work + +We set out to create a new version of VDiff that addressed all of these issues. + +## VDiff V2 + +We started by largely [rearranging the existing VDiff code](https://github.com/vitessio/vitess/pull/10382) so that +instead of being managed and controlled by a single `vtctld` it's managed and executed — in parallel — by the +primary tablet in each shard on the target side. This offers parallelism while also reducing the amount of network traffic +needed to perform the diff. The operation was also made asynchronous, with the +[`VDiff Show`](https://vitess.io/docs/reference/vreplication/vdiff2/#show-progressstatus-of-a-vdiff) +client command gathering and reporting the results of the VDiff operation from each of the target shards involved. + +We then made VDiffs [resumable](https://github.com/vitessio/vitess/pull/10497) so that if a failure occurs during +the diff, the operation can be resumed from where it left off. This also makes it possible to do a rolling or +incremental VDiff where you may perform the VDiff immediately after a workflow completes, and then again just before +doing a cutover for added confidence, as there may be weeks between those two stages. From there we added support for +[auto-restarting](https://github.com/vitessio/vitess/pull/10639) a VDiff if any ephemeral/recoverable error occurs. 
+This means that you can have process crashes, failovers, network issues, etc., and the VDiff will automatically +recover and continue running. + +We also added [progress reporting](https://github.com/vitessio/vitess/pull/10639) so that you have some idea of +how much work the VDiff has done, how much is left, and have an ETA for when it's likely to complete. This gives +you greater peace of mind while a longer operation runs and better allows you to prepare for the next step once +the VDiff completes. + +There were a variety of other minor improvements as well. In total, we hope that this new version addresses the +major set of issues that users had and provides a solid base for us to continue making further improvements. + +## Conclusion + +Vitess [VReplication](https://vitess.io/docs/16.0/reference/vreplication/vreplication/) offers a set of +powerful features that allow users to manage data workflows when that data is spread across a large fleet of +MySQL instances. [VDiff](https://vitess.io/docs/reference/vreplication/vdiff2/) then provides an invaluable +tool for verifying the correctness of these complex operations, giving you confidence and peace of mind +as you execute the data operations required to better meet your evolving business needs and objectives over +time. + +Please try out [VDiff v2](https://vitess.io/docs/reference/vreplication/vdiff2/) in +[Vitess 15.0](https://github.com/vitessio/vitess/releases/tag/v15.0.0) — where it's marked as experimental — +and provide feedback! We hope to mark it as GA/production-ready in the upcoming 16.0 release and your +feedback is invaluable. Special shout out to [Arthur Schreiber @ GitHub](https://github.com/arthurschreiber) +for providing a lot of great early feedback that's helping to make the feature better! ♥️ + +Happy data migrations! 🚀 🚀 🚀 \ No newline at end of file