Skip to content
161 changes: 161 additions & 0 deletions content/en/blog/2022-11-14-vdiff-v2.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,161 @@
---
author: 'Matt Lord'
date: 2022-11-22
slug: '2022-11-22-vdiff-v2'
tags: [Vitess','MySQL','sharding','replication']
title: 'Introducing VDiff V2'
description: "Vitess's Powerful Workflow Data Diff Tool Just Got Even Better"
---

Vitess is a solution that allows you to infinitely scale MySQL while providing clients and apps with [a single logical
view](https://vitess.io/docs/concepts/vtgate/) of the fleet of MySQL instances comprising any number of
[`Keyspaces`](https://vitess.io/docs/concepts/keyspace/) and [`Shards`](https://vitess.io/docs/concepts/shard/).

Vitess also provides the cluster and data management tools that make it possible to manage a massive cluster and
perform complex workflows using [VReplication](https://vitess.io/docs/reference/vreplication/vreplication/), such
as:
* [Moving tables](https://vitess.io/docs/reference/vreplication/movetables/) into Vitess or between keyspaces
* [Resharding](https://vitess.io/docs/reference/vreplication/reshard/) to adjust to changes in data size and load
* [Materialized views and rollups](https://vitess.io/docs/reference/vreplication/materialize/) for data analytics
and data locality
* [Online schema changes](https://vitess.io/docs/user-guides/schema-changes/managed-online-schema-changes/) that
are trackable, cancellable, revertible, and retryable

## Why a Diff Tool?

Data is typically one of the most critical assets within an organization. As such, an operator needs to be able to
verify the correctness of this data, in particular as the data is moved around or otherwise transformed. For example,
operators have wanted a way to verify data consistency after replicating data from one MySQL instance to another or
dumping a table from one instance and loading it in another. However, even for a single table in these simplest of
cases — performing a safe, reliable, light-weight, and performant online diff between two MySQL instances is a
suprisingly difficult problem. Due to the challenges involved, there have been few attempted general solutions with
the most notable being:
* Percona's [pt-table-checksum](https://docs.percona.com/percona-toolkit/pt-table-checksum.html)
* MySQL's `mysqldiff` tool that was part of the now EOL'd [MySQL Utilities](https://downloads.mysql.com/docs/mysql-utilities-1.6-en.pdf)

With Vitess, _the need for a data diff tool is even more pronounced_ because you'll be migrating data from your
legacy systems into Vitess, migrating data across keyspaces, and performing a variety of other workflows. This
is further complicated by the fact that these workflows may be done across MySQL versions, data centers, with
differing schemas between the source and target, and over long time periods in which your data evolves. So it
is critical to have a tool that can reliably perform a logical diff between the source and target of these
workflows, in a timely manner, and without impacting production traffic.

## VDiff

Vitess provided a solution for _diffing tables that are part of a [VReplication](https://vitess.io/docs/reference/vreplication/vreplication/)
workflow_ called [VDiff](https://vitess.io/docs/reference/vreplication/vdiff/). I will walk through the basic algorithm or flow used for diffing
each table in order to demonstrate the challenges involved and how we solved them in VDiff:
* [`vtctld`](https://vitess.io/docs/reference/programs/vtctld/)
[selects tablets](https://vitess.io/docs/reference/vreplication/tablet_selection/) in the source and target
shards to use for the comparison — one per shard on each side
* On the target [tablets](https://vitess.io/docs/concepts/tablet/): stop the VReplication workflow for the VDiff
operation, to "freeze" the state, and record the current  
[GTID](https://dev.mysql.com/doc/refman/en/replication-gtids-concepts.html) position in the
[VStream](https://vitess.io/docs/concepts/vstream/)
* On the source [tablets](https://vitess.io/docs/concepts/tablet/):
* wait for replication to catch up to at least where the target is (remember that the source instance may be a replica and the target
a primary)
* [lock the table](https://dev.mysql.com/doc/refman/en/lock-tables.html) to get the current
[`GTID_EXECUTED`](https://dev.mysql.com/doc/refman/en/replication-gtids-concepts.html) which gives us a logical
point in time that will correspond to the the read view in our upcoming transaction
* issue [`START TRANSACTION WITH CONSISTENT SNAPSHOT`](https://dev.mysql.com/doc/refman/en/commit.html)
* [unlock the table](https://dev.mysql.com/doc/refman/en/lock-tables.html) as we now have a consistent snapshot of
the table data and the GTID metadata that are both at the same logical point in time with regards to the table
we're diffing
* On the target [tablets](https://vitess.io/docs/concepts/tablet/):
* start VReplication UNTIL we have reached that `GTID_EXECUTED` position in the [VStream](https://vitess.io/docs/concepts/vstream/)
which matches the one we saved when setting up the read view on the source
* issue [`START TRANSACTION WITH CONSISTENT SNAPSHOT`](https://dev.mysql.com/doc/refman/en/commit.html) (remember
that the state is "frozen" on the target tablet) — now the target context is at the same logical point in
time as the source for this table
* On the source and target tablets: issue `SELECT <cols> FROM <table> ORDER BY <pkcols>` (for deterministic ordering and to avoid a filesort)
* In [`vtctld`](https://vitess.io/docs/reference/programs/vtctld/) : stream the results from those SELECTs, doing a
merge sort from the shards, and compare the rows on both sides logically, as the schema may be different on either
side, keeping a record of any differences seen
* On the target [tablets](https://vitess.io/docs/concepts/tablet/): restart the VReplication workflow
* On the source and target [tablets](https://vitess.io/docs/concepts/tablet/): close the open transaction with
a `ROLLBACK`
* Finally the [`vtctl`](https://vitess.io/docs/reference/programs/vtctl/) client prints a report (to STDOUT) of the
results

{{< info >}}
For large tables, holding a transaction open on the source tablets can have a significant impact on normal query
traffic due to [InnoDB MVCC](https://dev.mysql.com/doc/refman/en/innodb-multi-versioning.html) needing to keep those
older versions of rows around if they are updated after the transaction started
([`innodb_history_list_length`](https://orangematter.solarwinds.com/2015/07/20/what-is-innodb-history-list-length/)). For
this reason, I would recommend always using REPLICA tablets for VDiff operations whenever you can (when the source is an
[unmanaged tablet](https://vitess.io/docs/user-guides/configuration-advanced/unmanaged-tablet/), such as when e.g. moving
from RDS into Vitess, you may only have a PRIMARY tablet available). You can control that using the
`--tablet_types=REPLICA` flag for the [VDiff command](https://vitess.io/docs/reference/vreplication/vdiff/). In v14+ the
default was changed to: `--tablet_types=in_order:RDONLY,REPLICA,PRIMARY`.
{{</ info >}}

The original version worked very well but it suffered from [some limitations](https://vitess.io/docs/15.0/reference/vreplication/vdiff/#note)
that posed challenges in certain situations such as when working
[with very large tables](https://vitess.io/docs/15.0/reference/vreplication/vdiff/#using-vdiff-with-huge-tables).
For example, if you have over 1TiB of data that needs to be compared the VDiff could take a week to complete. If
during this time you had any failure such as one of the MySQL connections used getting closed (e.g. due to
[`wait_timeout`](https://dev.mysql.com/doc/refman/en/server-system-variables.html#sysvar_wait_timeout) or
[`net_write_timeout`](https://dev.mysql.com/doc/refman/en/server-system-variables.html#sysvar_net_write_timeout))
then you'd have to start the entire operation over again from scratch.

We processed feedback from Vitess users over the course of 2+ years as they used VDiff in production and a
set of underlying issues started to become clear:
* Fragility — any connection loss, process failure, failover etc. would cause the VDiff to fail and need to be re-run
* Synchronous command — the vtctl client command would block until the VDiff completed which posed some challenges and
required a stable machine where e.g. a [tmux](https://github.com/tmux/tmux/wiki) session could be used for the client
call
* VTCtld as the controller — the [Vitess cluster management daemon](https://vitess.io/docs/reference/programs/vtctld/) is
generally a lightweight process used to coordinate complex operations that span many Vitess components. It's not designed
to be used for operations that span days and require the resources needed to compare 100s of GiBs of data
* Network traffic — the [tablets](https://vitess.io/docs/concepts/tablet/) on each side of the VDiff streamed their
data to the `vtctld` process which then compared the data. This generated a lot of network traffic which could
become a bottleneck and impact overall network bandwidth and latency. Keep in mind that it's common for the data
involved to reside in 3 or more failure domains / availability zones.
* No progress reporting — the VDiff could run for days without any indication of overall progress
* Execution time — the VDiff could take days or weeks to complete for very large tables, in large part because there
was very little concurrency with a single `vtctld` process doing the bulk of the work

We set out to create a new version of VDiff that addressed all of these issues.

## VDiff V2

We started by largely [rearranging the existing VDiff code](https://github.com/vitessio/vitess/pull/10382) so that
instead of being managed and controlled by a single `vtctld` it's instead managed and executed — in parallel — by the
primary tablet in each shard on the target side. This offers parallelism while also reducing the amount of network traffic
needed to perform the diff. The operation was also made asynchronous, with the
[`VDiff Show`](https://vitess.io/docs/reference/vreplication/vdiff2/#show-progressstatus-of-a-vdiff)
client command gathering and reporting the results of the VDiff operation from each of the target shards involved.

We then made VDiffs [resumable](https://github.com/vitessio/vitess/pull/10497) so that if a failure occurs during
the diff, the operation can be resumed from where it left off. This also makes it possible to do a rolling or
incremental VDiff where you may perform the VDiff immediately after a workflow completes, and then again just before
doing a cutover for added confidence as there may be weeks between those two stages. From there we added support for
[auto-restarting](https://github.com/vitessio/vitess/pull/10639) a VDiff if any ephemeral/recoverable error occurs.
This means that you can have process crashes, failovers, network issues, etc and the VDiff will automatically
recover and continue running.

We also added [progress reporting](https://github.com/vitessio/vitess/pull/10639) so that you have some idea of
how much work the VDiff has done, how much is left, and have an ETA for when it's likely to complete. This gives
you greater peace of mind while a longer operation runs and better allows you to prepare for the next step once
the VDiff completes.

There were a variety of other minor improvements as well. In total, we hope that this new version addresses the
major set of issues that users had and provides a solid base for us to continue making further improvements.

## Conclusion

Vitess [VReplication](https://vitess.io/docs/16.0/reference/vreplication/vreplication/) offers a set of
powerful features that allow users to manage data workflows when that data is spread across a large fleet of
MySQL instances. [VDiff](https://vitess.io/docs/reference/vreplication/vdiff2/) then provides an invaluable
tool for verifying the correctness of these complex operations, giving you confidence and peace of mind
as you execute the data operations required to better meet your evolving business needs and objectives over
time.

Please try out [VDiff v2](https://vitess.io/docs/reference/vreplication/vdiff2/) in
[Vitess 15.0](https://github.com/vitessio/vitess/releases/tag/v15.0.0) — where it's marked as experimental —
and provide feedback! We hope to mark it as GA/production-ready in the upcoming 16.0 release and your
feedback is invaluable. Special shout out to [Arthur Schreiber @ GitHub](https://github.com/arthurschreiber)
for providing a lot of great early feedback that's helping to make the feature better! ♥️

Happy data migrations! 🚀 🚀 🚀