---
author: 'Matt Lord'
date: 2022-11-14
slug: '2022-11-14-vdiff-v2'
tags: ['Vitess','MySQL','sharding','replication']
title: 'Introducing Vitess Diff V2'
description: "Vitess's Powerful Data Diff Tool Just Got Even Better"
---

Vitess is a solution that allows you to massively scale MySQL while providing clients and apps with a single logical
`mysqld` view of the fleet of MySQL instances comprising any number of [`Keyspaces`](https://vitess.io/docs/concepts/keyspace/)
and [`Shards`](https://vitess.io/docs/concepts/shard/).
<a href="/img/VitessQueryExample.png"><img src="/img/VitessQueryExample.png" alt="Query Example" width="275" align="right"/></a>

Vitess also provides the cluster and data management tools that make it possible to manage a massive cluster and
perform complex workflows using [VReplication](https://vitess.io/docs/reference/vreplication/vreplication/), such
as:
* [Moving tables](https://vitess.io/docs/reference/vreplication/movetables/) into Vitess or between keyspaces
* [Resharding](https://vitess.io/docs/reference/vreplication/reshard/) to adjust to changes in data size and load
* [Materialized views and rollups](https://vitess.io/docs/reference/vreplication/materialize/) for data analytics
  and data locality
* [Online schema changes](https://vitess.io/docs/user-guides/schema-changes/managed-online-schema-changes/) that
  are trackable, cancellable, revertible, and retryable

## Why a Diff Tool?

Data is typically one of the most critical assets within an organization. As such, an operator needs to be able to
verify the correctness of this data, in particular as the data is moved around or otherwise transformed. For example,
operators have wanted a way to verify data consistency after replicating data from one MySQL instance to another, or after
dumping a table from one instance and loading it into another. However, even for a single table in these simplest of
cases, performing a safe, reliable, lightweight, and performant online diff between two MySQL instances is a
surprisingly difficult problem. Due to the challenges involved, there have been few attempts at a general solution,
the most notable being:
* Percona's [pt-table-checksum](https://docs.percona.com/percona-toolkit/pt-table-checksum.html)
* MySQL's `mysqldiff` tool that was part of the now EOL'd [MySQL Utilities](https://downloads.mysql.com/docs/mysql-utilities-1.6-en.pdf)

With Vitess, _the need for a data diff tool is even more pronounced_ because you'll be migrating data from your
legacy systems into Vitess, migrating data across keyspaces, and performing a variety of other workflows. This
is further complicated by the fact that these workflows may span MySQL versions and data centers, may involve
differing schemas between the source and target, and may run over long time periods in which your data evolves. So it
is critical to have a tool that can reliably perform a logical diff between the source and target of these
workflows, in a timely manner, and without impacting production traffic.

## VDiff

Vitess has provided a solution for _diffing tables that are part of a [VReplication](https://vitess.io/docs/reference/vreplication/vreplication/)
workflow_ called [VDiff](https://vitess.io/docs/reference/vreplication/vdiff/). The basic algorithm, or flow, for each table is as follows:
* [`vtctld`](https://vitess.io/docs/reference/programs/vtctld/)
  [selects tablets](https://vitess.io/docs/reference/vreplication/tablet_selection/) in the source and target
  shards to use for the comparison: one per shard on each side
* On the target [tablets](https://vitess.io/docs/concepts/tablet/): stop the VReplication workflow for the VDiff
  operation, to "freeze" the state, and record the current
  [GTID](https://dev.mysql.com/doc/refman/en/replication-gtids-concepts.html) position in the
  [VStream](https://vitess.io/docs/concepts/vstream/)
* On the source [tablets](https://vitess.io/docs/concepts/tablet/):
  * wait for replication to catch up to at least where the target is (remember that the source instance may be a replica and the target
    a primary)
  * [lock the table](https://dev.mysql.com/doc/refman/en/lock-tables.html) to get the current
    [`GTID_EXECUTED`](https://dev.mysql.com/doc/refman/en/replication-gtids-concepts.html) value, which gives us a logical
    point in time that will correspond to the read view in our upcoming transaction
  * issue [`START TRANSACTION WITH CONSISTENT SNAPSHOT`](https://dev.mysql.com/doc/refman/en/commit.html)
  * [unlock the table](https://dev.mysql.com/doc/refman/en/lock-tables.html), as we now have a consistent snapshot of
    the table data and the GTID metadata, both at the same logical point in time with regard to the table
    we're diffing
* On the target [tablets](https://vitess.io/docs/concepts/tablet/):
  * start VReplication and let it run until it reaches that `GTID_EXECUTED` position in the [VStream](https://vitess.io/docs/concepts/vstream/),
    which matches the one we saved when setting up the read view on the source
  * issue [`START TRANSACTION WITH CONSISTENT SNAPSHOT`](https://dev.mysql.com/doc/refman/en/commit.html) (remember
    that the state is "frozen" on the target tablet): now the target context is at the same logical point in
    time as the source for this table
* On the source and target tablets: issue `SELECT <cols> FROM <table> ORDER BY <pkcols>`
* In [`vtctld`](https://vitess.io/docs/reference/programs/vtctld/): stream the results from those `SELECT`s,
  merge-sorting the rows from the shards, and compare the rows on both sides logically (as the schema may differ
  between the two sides), keeping a record of any differences seen
* On the target [tablets](https://vitess.io/docs/concepts/tablet/): restart the VReplication workflow
* On the source and target [tablets](https://vitess.io/docs/concepts/tablet/): close the open transactions with
  a `ROLLBACK`
* Finally, the [`vtctl`](https://vitess.io/docs/reference/programs/vtctl/) client prints a report (to STDOUT) of the
  results
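
The heart of the comparison step can be sketched in a few lines of Go. This is a minimal illustration under simplifying assumptions, not the VDiff source. It assumes each row is reduced to an integer primary key plus a single string holding the other column values (`Row`, `mergeShards`, `diffRows`, and `DiffReport` are all hypothetical names). It merges the PK-ordered results of several shards into one ordered stream, then walks the source and target streams in lockstep, merge-join style, to classify every key as matching, mismatched, or present on only one side.

```go
package main

import "fmt"

// Row is a simplified, hypothetical row: an integer primary key plus the
// remaining column values encoded as one comparable string.
type Row struct {
	PK  int
	Val string
}

// mergeShards does a k-way merge of per-shard results, each already sorted by
// PK (as returned by ORDER BY <pkcols>), into a single PK-ordered slice. It
// scans the head of every shard on each step; a heap would scale better when
// there are many shards.
func mergeShards(shards [][]Row) []Row {
	pos := make([]int, len(shards)) // next unread index in each shard
	var out []Row
	for {
		best := -1
		for i, s := range shards {
			if pos[i] < len(s) && (best == -1 || s[pos[i]].PK < shards[best][pos[best]].PK) {
				best = i
			}
		}
		if best == -1 {
			return out // every shard is exhausted
		}
		out = append(out, shards[best][pos[best]])
		pos[best]++
	}
}

// DiffReport is a hypothetical summary of how each primary key was classified.
type DiffReport struct {
	Matching, Mismatched, ExtraSource, ExtraTarget int
}

// diffRows walks two PK-ordered streams in lockstep, like a merge join.
func diffRows(source, target []Row) DiffReport {
	var r DiffReport
	i, j := 0, 0
	for i < len(source) && j < len(target) {
		s, t := source[i], target[j]
		switch {
		case s.PK < t.PK: // key only present on the source
			r.ExtraSource++
			i++
		case s.PK > t.PK: // key only present on the target
			r.ExtraTarget++
			j++
		default: // key on both sides: compare the remaining columns
			if s.Val == t.Val {
				r.Matching++
			} else {
				r.Mismatched++
			}
			i++
			j++
		}
	}
	// Whatever is left on one side has no counterpart on the other.
	r.ExtraSource += len(source) - i
	r.ExtraTarget += len(target) - j
	return r
}

func main() {
	// Two source shards and two target shards, each already sorted by PK.
	source := mergeShards([][]Row{{{1, "a"}, {3, "c"}}, {{2, "b"}, {4, "d"}}})
	target := mergeShards([][]Row{{{1, "a"}, {2, "B"}}, {{3, "c"}, {5, "e"}}})
	fmt.Printf("%+v\n", diffRows(source, target))
	// prints: {Matching:2 Mismatched:1 ExtraSource:1 ExtraTarget:1}
}
```

Because both inputs are ordered by primary key, a single forward pass over each side is enough. A streaming version could hold just one row per stream at a time instead of the full slices used here for brevity.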

{{< info >}}
For large tables, holding a transaction open on the source tablets can have a significant impact on normal query
traffic, because [InnoDB MVCC](https://dev.mysql.com/doc/refman/en/innodb-multi-versioning.html) must keep older
versions of any rows that are updated after the transaction started
(see [`innodb_history_list_length`](https://orangematter.solarwinds.com/2015/07/20/what-is-innodb-history-list-length/)). For
this reason, I would recommend always using REPLICA tablets for VDiff operations whenever you can (when the source is an
[unmanaged tablet](https://vitess.io/docs/user-guides/configuration-advanced/unmanaged-tablet/), such as when moving
from RDS into Vitess, you may only have a PRIMARY tablet available). You can control that using the
`--tablet_types=REPLICA` flag for the [VDiff command](https://vitess.io/docs/reference/vreplication/vdiff/). In v14+ the
default was changed to `--tablet_types=in_order:RDONLY,REPLICA,PRIMARY`.
{{</ info >}}
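
The `in_order:` prefix in that default expresses a preference order rather than a set of equally acceptable types. The Go sketch below shows one way such a value could be interpreted: try each listed type in order and use the first one for which a healthy tablet is available. The function names, the `available` map, and the treatment of a list *without* the prefix (every available listed type is equally acceptable) are illustrative assumptions, not Vitess's actual tablet picker code.

```go
package main

import (
	"fmt"
	"strings"
)

// parseTabletTypes (hypothetical) splits a value such as
// "in_order:RDONLY,REPLICA,PRIMARY" into its tablet types and reports
// whether the in_order: prefix was present.
func parseTabletTypes(value string) (types []string, inOrder bool) {
	const prefix = "in_order:"
	if strings.HasPrefix(value, prefix) {
		inOrder = true
		value = strings.TrimPrefix(value, prefix)
	}
	for _, t := range strings.Split(value, ",") {
		if t = strings.TrimSpace(t); t != "" {
			types = append(types, strings.ToUpper(t))
		}
	}
	return types, inOrder
}

// candidateTypes (hypothetical) returns the tablet types that may be used,
// given which types currently have a healthy tablet. With in_order, only the
// first available type in the list is used. Without it, this sketch assumes
// every available listed type is equally acceptable.
func candidateTypes(value string, available map[string]bool) []string {
	types, inOrder := parseTabletTypes(value)
	var out []string
	for _, t := range types {
		if available[t] {
			if inOrder {
				return []string{t}
			}
			out = append(out, t)
		}
	}
	return out
}

func main() {
	healthy := map[string]bool{"REPLICA": true, "PRIMARY": true} // no RDONLY tablets
	fmt.Println(candidateTypes("in_order:RDONLY,REPLICA,PRIMARY", healthy)) // [REPLICA]
	fmt.Println(candidateTypes("REPLICA,PRIMARY", healthy))                 // [REPLICA PRIMARY]
}
```

With this default, RDONLY tablets are preferred, REPLICA tablets are used when there are no RDONLY ones, and the PRIMARY is only a last resort.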

The original version worked very well, but it suffered from [some limitations](https://vitess.io/docs/15.0/reference/vreplication/vdiff/#note)
that posed challenges in certain situations, such as when working
[with very large tables](https://vitess.io/docs/15.0/reference/vreplication/vdiff/#using-vdiff-with-huge-tables).
For example, if you had over 1TiB of data to compare, the VDiff could take a week to complete. If any failure
occurred during that time, such as one of the MySQL connections being closed (e.g. due to
[`wait_timeout`](https://dev.mysql.com/doc/refman/en/server-system-variables.html#sysvar_wait_timeout) or
[`net_write_timeout`](https://dev.mysql.com/doc/refman/en/server-system-variables.html#sysvar_net_write_timeout)),
you would have to start the entire operation over again from scratch.

Over the course of 2+ years, we processed feedback from Vitess users running VDiff in production, and a
set of underlying issues became clear:
* Fragility: any connection loss, process failure, failover, etc. would cause the VDiff to fail and need to be re-run
* Synchronous command: the `vtctl` client command would block until the VDiff completed, which posed some challenges and
  required a stable machine where e.g. a [tmux](https://github.com/tmux/tmux/wiki) session could be used for the client
  call
* `vtctld` as the controller: the [Vitess cluster management daemon](https://vitess.io/docs/reference/programs/vtctld/) is
  generally a lightweight process used to coordinate complex operations that span many Vitess components. It's not designed
  for operations that span days and require the resources needed to compare hundreds of GiB of data
* Network traffic: the [tablets](https://vitess.io/docs/concepts/tablet/) on each side of the VDiff streamed their
  data to the `vtctld` process, which then compared the data. This generated a lot of network traffic, which could
  become a bottleneck and impact overall network bandwidth and latency. Keep in mind that it's common for the data
  involved to reside in 3 or more failure domains / availability zones.
* No progress reporting: the VDiff could run for days without any indication of overall progress
* Execution time: the VDiff could take days or weeks to complete for very large tables, in large part because there
  was very little concurrency, with a single `vtctld` process doing the bulk of the work

We set out to create a new version of VDiff that addressed all of these issues.

## VDiff V2

We started by largely [rearranging the existing VDiff code](https://github.com/vitessio/vitess/pull/10382) so that
instead of being managed and controlled by a `vtctld`, it's managed and executed, in parallel, by each shard on the
target side. This parallelizes the work and also reduces the network traffic needed to perform the diff, since the
data no longer has to be streamed to a single `vtctld` process for comparison. The operation was also made asynchronous,
with the [`VDiff Show`](https://vitess.io/docs/reference/vreplication/vdiff2/#show-progressstatus-of-a-vdiff)
client command gathering and reporting the results of the VDiff operation from each of the target shards involved.
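
The move from a single coordinator to per-shard execution can be illustrated with a small Go sketch. Each target shard runs its own diff concurrently, and a separate step sums the per-shard results, loosely analogous to the gathering that `VDiff Show` does. The `shardResult` type, the `diffShard` callback, and the fake results are hypothetical. In Vitess the per-shard work runs inside the target tablets, not as goroutines in one process.

```go
package main

import (
	"fmt"
	"sync"
)

// shardResult is a hypothetical per-shard summary of a completed diff.
type shardResult struct {
	Shard          string
	RowsCompared   int64
	MismatchedRows int64
}

// runPerShard starts one diff per target shard concurrently and waits for all
// of them to finish.
func runPerShard(shards []string, diffShard func(shard string) shardResult) []shardResult {
	results := make([]shardResult, len(shards))
	var wg sync.WaitGroup
	for i, shard := range shards {
		wg.Add(1)
		go func(i int, shard string) {
			defer wg.Done()
			results[i] = diffShard(shard) // each goroutine writes only its own slot
		}(i, shard)
	}
	wg.Wait()
	return results
}

// summarize aggregates the per-shard results into workflow-level totals.
func summarize(results []shardResult) (rows, mismatched int64) {
	for _, r := range results {
		rows += r.RowsCompared
		mismatched += r.MismatchedRows
	}
	return rows, mismatched
}

func main() {
	// A stand-in for the real per-shard diff, returning canned results.
	fakeDiff := func(shard string) shardResult {
		if shard == "-80" {
			return shardResult{Shard: shard, RowsCompared: 1000, MismatchedRows: 2}
		}
		return shardResult{Shard: shard, RowsCompared: 1500}
	}
	rows, bad := summarize(runPerShard([]string{"-80", "80-"}, fakeDiff))
	fmt.Printf("rows compared: %d, mismatched rows: %d\n", rows, bad)
}
```

Writing into a pre-sized slice by index keeps the sketch free of locks. The key point is that the total running time is bounded by the slowest shard rather than by the sum of all shards.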

We then made VDiffs [resumable](https://github.com/vitessio/vitess/pull/10497) so that if a failure occurs during
the diff, the operation can be resumed from where it left off. This also makes it possible to do a rolling or
incremental VDiff: you may perform the VDiff immediately after a workflow completes, and then again just before
doing a cutover for added confidence, as there may be weeks between those two stages. From there we added support for
[auto-restarting](https://github.com/vitessio/vitess/pull/10639) a VDiff if any ephemeral/recoverable error occurs.
This means that you can have process crashes, failovers, network issues, etc., and the VDiff will automatically
recover and continue running.

We also added [progress reporting](https://github.com/vitessio/vitess/pull/10639) so that you can see how much work
the VDiff has done and how much is left, along with an ETA for when it's likely to complete. This gives
you greater peace of mind while a longer operation runs and better allows you to prepare for the next step once
the VDiff completes.

There were a variety of other minor improvements as well. In total, we hope that this new version addresses the
major issues that users had and provides a solid base for us to continue making further improvements.

## Conclusion

Vitess [VReplication](https://vitess.io/docs/16.0/reference/vreplication/vreplication/) offers a set of
powerful features that allow users to manage data workflows when that data is spread across a large fleet of
MySQL instances. [VDiff](https://vitess.io/docs/reference/vreplication/vdiff2/) then provides an invaluable
tool for verifying the correctness of these complex operations, giving you confidence and peace of mind
as you execute the data operations required to better meet your evolving business needs and objectives over
time.

Please try out [VDiff v2](https://vitess.io/docs/reference/vreplication/vdiff2/) in
[Vitess 15.0](https://github.com/vitessio/vitess/releases/tag/v15.0.0), where it's marked as experimental,
and provide feedback! We hope to mark it as GA/production-ready in the upcoming 16.0 release, and your
feedback is invaluable.

Happy data migrations! 🚀 🚀 🚀