vitessio · mattlord · Nov 22, 2022 · Nov 10, 2022 · Nov 10, 2022 · Nov 10, 2022
diff --git a/content/en/blog/2022-11-14-vdiff-v2.md b/content/en/blog/2022-11-14-vdiff-v2.md
@@ -0,0 +1,161 @@
+---
+author: 'Matt Lord'
+date: 2022-11-22
+slug: '2022-11-22-vdiff-v2'
+tags: [Vitess','MySQL','sharding','replication']
+title: 'Introducing VDiff V2'
+description: "Vitess's Powerful Workflow Data Diff Tool Just Got Even Better"
+---
+
+Vitess is a solution that allows you to infinitely scale MySQL while providing clients and apps with [a single logical
+view](https://vitess.io/docs/concepts/vtgate/) of the fleet of MySQL instances comprising any number of
+[`Keyspaces`](https://vitess.io/docs/concepts/keyspace/) and [`Shards`](https://vitess.io/docs/concepts/shard/).
+
+Vitess also provides the cluster and data management tools that make it possible to manage a massive cluster and
+perform complex workflows using [VReplication](https://vitess.io/docs/reference/vreplication/vreplication/), such
+as:
+  * [Moving tables](https://vitess.io/docs/reference/vreplication/movetables/) into Vitess or between keyspaces
+  * [Resharding](https://vitess.io/docs/reference/vreplication/reshard/) to adjust to changes in data size and load
+  * [Materialized views and rollups](https://vitess.io/docs/reference/vreplication/materialize/) for data analytics
+and data locality
+  * [Online schema changes](https://vitess.io/docs/user-guides/schema-changes/managed-online-schema-changes/) that
+are trackable, cancellable, revertible, and retryable
+
+## Why a Diff Tool?
+
+Data is typically one of the most critical assets within an organization. As such, an operator needs to be able to
+verify the correctness of this data, in particular as the data is moved around or otherwise transformed. For example,
+operators have wanted a way to verify data consistency after replicating data from one MySQL instance to another or
+dumping a table from one instance and loading it in another. However, even for a single table in these simplest of
+cases — performing a safe, reliable, light-weight, and performant online diff between two MySQL instances is a
+suprisingly difficult problem. Due to the challenges involved, there have been few attempted general solutions with
+the most notable being:
+  * Percona's [pt-table-checksum](https://docs.percona.com/percona-toolkit/pt-table-checksum.html)
+  * MySQL's `mysqldiff` tool that was part of the now EOL'd [MySQL Utilities](https://downloads.mysql.com/docs/mysql-utilities-1.6-en.pdf)
+
+With Vitess, _the need for a data diff tool is even more pronounced_ because you'll be migrating data from your
+legacy systems into Vitess, migrating data across keyspaces, and performing a variety of other workflows. This
+is further complicated by the fact that these workflows may be done across MySQL versions, data centers, with
+differing schemas between the source and target, and over long time periods in which your data evolves. So it
+is critical to have a tool that can reliably perform a logical diff between the source and target of these
+workflows, in a timely manner, and without impacting production traffic.
+
+## VDiff
+
+Vitess provided a solution for _diffing tables that are part of a [VReplication](https://vitess.io/docs/reference/vreplication/vreplication/)
+workflow_ called [VDiff](https://vitess.io/docs/reference/vreplication/vdiff/). I will walk through the basic algorithm or flow used for diffing
+each table in order to demonstrate the challenges involved and how we solved them in VDiff:
+* [`vtctld`](https://vitess.io/docs/reference/programs/vtctld/)
+  [selects tablets](https://vitess.io/docs/reference/vreplication/tablet_selection/) in the source and target
+  shards to use for the comparison — one per shard on each side
+* On the target [tablets](https://vitess.io/docs/concepts/tablet/): stop the VReplication workflow for the VDiff
+  operation, to "freeze" the state, and record the current &nbsp;
+  [GTID](https://dev.mysql.com/doc/refman/en/replication-gtids-concepts.html) position in the
+  [VStream](https://vitess.io/docs/concepts/vstream/)
+* On the source [tablets](https://vitess.io/docs/concepts/tablet/):
+  * wait for replication to catch up to at least where the target is (remember that the source instance may be a replica and the target
+    a primary)
+  * [lock the table](https://dev.mysql.com/doc/refman/en/lock-tables.html) to get the current
+    [`GTID_EXECUTED`](https://dev.mysql.com/doc/refman/en/replication-gtids-concepts.html) which gives us a logical
+    point in time that will correspond to the the read view in our upcoming transaction
+  * issue [`START TRANSACTION WITH CONSISTENT SNAPSHOT`](https://dev.mysql.com/doc/refman/en/commit.html)
+  * [unlock the table](https://dev.mysql.com/doc/refman/en/lock-tables.html) as we now have a consistent snapshot of
+    the table data and the GTID metadata that are both at the same logical point in time with regards to the table
+    we're diffing
+* On the target [tablets](https://vitess.io/docs/concepts/tablet/):
+  * start VReplication UNTIL we have reached that `GTID_EXECUTED` position in the [VStream](https://vitess.io/docs/concepts/vstream/)
+    which matches the one we saved when setting up the read view on the source
+  * issue [`START TRANSACTION WITH CONSISTENT SNAPSHOT`](https://dev.mysql.com/doc/refman/en/commit.html) (remember
+    that the state is "frozen" on the target tablet) — now the target context is at the same logical point in
+    time as the source for this table
+* On the source and target tablets: issue `SELECT <cols> FROM <table> ORDER BY <pkcols>` (for deterministic ordering and to avoid a filesort)
+* In [`vtctld`](https://vitess.io/docs/reference/programs/vtctld/) : stream the results from those SELECTs, doing a
+  merge sort from the shards, and compare the rows on both sides logically, as the schema may be different on either
+  side, keeping a record of any differences seen
+* On the target [tablets](https://vitess.io/docs/concepts/tablet/): restart the VReplication workflow
+* On the source and target [tablets](https://vitess.io/docs/concepts/tablet/): close the open transaction with
+  a `ROLLBACK`
+* Finally the [`vtctl`](https://vitess.io/docs/reference/programs/vtctl/) client prints a report (to STDOUT) of the
+  results
+
+{{< info >}}
+For large tables, holding a transaction open on the source tablets can have a significant impact on normal query
+traffic due to [InnoDB MVCC](https://dev.mysql.com/doc/refman/en/innodb-multi-versioning.html) needing to keep those
+older versions of rows around if they are updated after the transaction started
+([`innodb_history_list_length`](https://orangematter.solarwinds.com/2015/07/20/what-is-innodb-history-list-length/)). For
+this reason, I would recommend always using REPLICA tablets for VDiff operations whenever you can (when the source is an
+[unmanaged tablet](https://vitess.io/docs/user-guides/configuration-advanced/unmanaged-tablet/), such as when e.g. moving
+from RDS into Vitess, you may only have a PRIMARY tablet available). You can control that using the
+`--tablet_types=REPLICA` flag for the [VDiff command](https://vitess.io/docs/reference/vreplication/vdiff/). In v14+ the
+default was changed to: `--tablet_types=in_order:RDONLY,REPLICA,PRIMARY`.
+{{</ info >}}
+
+The original version worked very well but it suffered from [some limitations](https://vitess.io/docs/15.0/reference/vreplication/vdiff/#note)
+that posed challenges in certain situations such as when working
+[with very large tables](https://vitess.io/docs/15.0/reference/vreplication/vdiff/#using-vdiff-with-huge-tables).
+For example, if you have over 1TiB of data that needs to be compared the VDiff could take a week to complete. If
+during this time you had any failure such as one of the MySQL connections used getting closed (e.g. due to
+[`wait_timeout`](https://dev.mysql.com/doc/refman/en/server-system-variables.html#sysvar_wait_timeout) or
+[`net_write_timeout`](https://dev.mysql.com/doc/refman/en/server-system-variables.html#sysvar_net_write_timeout))
+then you'd have to start the entire operation over again from scratch.
+
+We processed feedback from Vitess users over the course of 2+ years as they used VDiff in production and a
+set of underlying issues started to become clear:
+* Fragility — any connection loss, process failure, failover etc. would cause the VDiff to fail and need to be re-run
+* Synchronous command — the vtctl client command would block until the VDiff completed which posed some challenges and
+  required a stable machine where e.g. a [tmux](https://github.com/tmux/tmux/wiki) session could be used for the client
+  call
+* VTCtld as the controller — the [Vitess cluster management daemon](https://vitess.io/docs/reference/programs/vtctld/) is
+  generally a lightweight process used to coordinate complex operations that span many Vitess components. It's not designed
+  to be used for operations that span days and require the resources needed to compare 100s of GiBs of data
+  * Network traffic — the [tablets](https://vitess.io/docs/concepts/tablet/) on each side of the VDiff streamed their
+    data to the `vtctld` process which then compared the data. This generated a lot of network traffic which could
+    become a bottleneck and impact overall network bandwidth and latency. Keep in mind that it's common for the data
+    involved to reside in 3 or more failure domains / availability zones.
+* No progress reporting — the VDiff could run for days without any indication of overall progress
+* Execution time — the VDiff could take days or weeks to complete for very large tables, in large part because there
+  was very little concurrency with a single `vtctld` process doing the bulk of the work
+
+We set out to create a new version of VDiff that addressed all of these issues.
+
+## VDiff V2
+
+We started by largely [rearranging the existing VDiff code](https://github.com/vitessio/vitess/pull/10382) so that
+instead of being managed and controlled by a single `vtctld` it's instead managed and executed — in parallel — by the
+primary tablet in each shard on the target side. This offers parallelism while also reducing the amount of network traffic
+needed to perform the diff. The operation was also made asynchronous, with the
+[`VDiff Show`](https://vitess.io/docs/reference/vreplication/vdiff2/#show-progressstatus-of-a-vdiff)
+client command gathering and reporting the results of the VDiff operation from each of the target shards involved.
+
+We then made VDiffs [resumable](https://github.com/vitessio/vitess/pull/10497) so that if a failure occurs during
+the diff, the operation can be resumed from where it left off. This also makes it possible to do a rolling or
+incremental VDiff where you may perform the VDiff immediately after a workflow completes, and then again just before
+doing a cutover for added confidence as there may be weeks between those two stages. From there we added support for
+[auto-restarting](https://github.com/vitessio/vitess/pull/10639) a VDiff if any ephemeral/recoverable error occurs.
+This means that you can have process crashes, failovers, network issues, etc and the VDiff will automatically
+recover and continue running.
+
+We also added [progress reporting](https://github.com/vitessio/vitess/pull/10639) so that you have some idea of
+how much work the VDiff has done, how much is left, and have an ETA for when it's likely to complete. This gives
+you greater peace of mind while a longer operation runs and better allows you to prepare for the next step once
+the VDiff completes.
+
+There were a variety of other minor improvements as well. In total, we hope that this new version addresses the 
+major set of issues that users had and provides a solid base for us to continue making further improvements.
+
+## Conclusion
+
+Vitess [VReplication](https://vitess.io/docs/16.0/reference/vreplication/vreplication/) offers a set of
+powerful features that allow users to manage data workflows when that data is spread across a large fleet of
+MySQL instances. [VDiff](https://vitess.io/docs/reference/vreplication/vdiff2/) then provides an invaluable
+tool for verifying the correctness of these complex operations, giving you confidence and peace of mind
+as you execute the data operations required to better meet your evolving business needs and objectives over
+time.
+
+Please try out [VDiff v2](https://vitess.io/docs/reference/vreplication/vdiff2/) in
+[Vitess 15.0](https://github.com/vitessio/vitess/releases/tag/v15.0.0) — where it's marked as experimental —
+and provide feedback! We hope to mark it as GA/production-ready in the upcoming 16.0 release and your
+feedback is invaluable. Special shout out to [Arthur Schreiber @ GitHub](https://github.com/arthurschreiber)
+for providing a lot of great early feedback that's helping to make the feature better! ♥️
+
+Happy data migrations! 🚀 🚀 🚀