160 changes: 160 additions & 0 deletions content/en/blog/2022-11-14-vdiff-v2.md
@@ -0,0 +1,160 @@
---
author: 'Matt Lord'
date: 2022-11-14
slug: '2022-11-14-vdiff-v2'
tags: ['Vitess','MySQL','sharding','replication']
title: 'Introducing Vitess Diff V2'
description: "Vitess's Powerful Data Diff Tool Just Got Even Better"
---

Vitess is a solution that allows you to infinitely scale MySQL while providing clients and apps with a single logical
mysqld view of the fleet of MySQL instances comprising any number of [`Keyspaces`](https://vitess.io/docs/concepts/keyspace/)
and [`Shards`](https://vitess.io/docs/concepts/shard/).
<a href="/img/VitessQueryExample.png"><img src="/img/VitessQueryExample.png" alt="Query Example" width="275" align="right"/></a>

Vitess also provides the cluster and data management tools that make it possible to manage a massive cluster and
perform complex workflows using [VReplication](https://vitess.io/docs/reference/vreplication/vreplication/), such
as:
* [Moving tables](https://vitess.io/docs/reference/vreplication/movetables/) into Vitess or between keyspaces
* [Resharding](https://vitess.io/docs/reference/vreplication/reshard/) to adjust to changes in data size and load
* [Materialized views and rollups](https://vitess.io/docs/reference/vreplication/materialize/) for data analytics
and data locality
* [Online schema changes](https://vitess.io/docs/user-guides/schema-changes/managed-online-schema-changes/) that
are trackable, cancellable, revertible, and retryable

## Why a Diff Tool?

Data is typically one of the most critical assets within an organization. As such, an operator needs to be able to
verify the correctness of this data, in particular as the data is moved around or otherwise transformed. For example,
operators have wanted a way to verify data consistency after replicating data from one MySQL instance to another or
dumping a table from one instance and loading it into another. However, even for a single table in the simplest of
these cases, performing a safe, reliable, lightweight, and performant online diff between two MySQL instances is a
surprisingly difficult problem. Due to the challenges involved, few general solutions have been attempted, with
the most notable being:
* Percona's [pt-table-checksum](https://docs.percona.com/percona-toolkit/pt-table-checksum.html)
* MySQL's `mysqldiff` tool that was part of the now EOL'd [MySQL Utilities](https://downloads.mysql.com/docs/mysql-utilities-1.6-en.pdf)

With Vitess, _the need for a data diff tool is even more pronounced_ because you'll be migrating data from your
legacy systems into Vitess, migrating data across keyspaces, and performing a variety of other workflows. This
is further complicated by the fact that these workflows may be done across MySQL versions, data centers, with
differing schemas between the source and target, and over long time periods in which your data evolves. So it
is critical to have a tool that can reliably perform a logical diff between the source and target of these
workflows, in a timely manner, and without impacting production traffic.

## VDiff

Vitess provided a solution for _diffing tables that are part of a [VReplication](https://vitess.io/docs/reference/vreplication/vreplication/)
workflow_ called [VDiff](https://vitess.io/docs/reference/vreplication/vdiff/). The basic algorithm or flow for each table is as follows:
* [`vtctld`](https://vitess.io/docs/reference/programs/vtctld/)
[selects tablets](https://vitess.io/docs/reference/vreplication/tablet_selection/) in the source and target
shards to use for the comparison — one per shard on each side
* On the target [tablets](https://vitess.io/docs/concepts/tablet/): stop the VReplication workflow for the VDiff
operation, to "freeze" the state, and record the current
[GTID](https://dev.mysql.com/doc/refman/en/replication-gtids-concepts.html) position in the
[VStream](https://vitess.io/docs/concepts/vstream/)
* On the source [tablets](https://vitess.io/docs/concepts/tablet/):
  * wait for replication to catch up to at least where the target is (remember that the source instance may be a replica and the target
    a primary)
  * [lock the table](https://dev.mysql.com/doc/refman/en/lock-tables.html) to get the current
    [`GTID_EXECUTED`](https://dev.mysql.com/doc/refman/en/replication-gtids-concepts.html), which gives us a logical
    point in time that will correspond to the read view in our upcoming transaction
  * issue [`START TRANSACTION WITH CONSISTENT SNAPSHOT`](https://dev.mysql.com/doc/refman/en/commit.html)
  * [unlock the table](https://dev.mysql.com/doc/refman/en/lock-tables.html), as we now have a consistent snapshot of
    the table data and the GTID metadata that are both at the same logical point in time with regard to the table
    we're diffing
* On the target [tablets](https://vitess.io/docs/concepts/tablet/):
  * start VReplication UNTIL we have reached that `GTID_EXECUTED` position in the [VStream](https://vitess.io/docs/concepts/vstream/)
    which matches the one we saved when setting up the read view on the source
  * issue [`START TRANSACTION WITH CONSISTENT SNAPSHOT`](https://dev.mysql.com/doc/refman/en/commit.html) (remember
    that the state is "frozen" on the target tablet); the target context is now at the same logical point in
    time as the source for this table
* On the source and target tablets: issue `SELECT <cols> FROM <table> ORDER BY <pkcols>`
* In [`vtctld`](https://vitess.io/docs/reference/programs/vtctld/): stream the results from those SELECTs, doing a
merge sort from shards, and compare the rows on both sides logically, as the schema may be different on either
side, keeping a record of any differences seen
* On the target [tablets](https://vitess.io/docs/concepts/tablet/): restart the VReplication workflow
* On the source and target [tablets](https://vitess.io/docs/concepts/tablet/): close the open transactions with
a `ROLLBACK`
* Finally the [`vtctl`](https://vitess.io/docs/reference/programs/vtctl/) client prints a report (to STDOUT) of the
results
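
The comparison step above relies on both result sets arriving in primary key order, which allows a single streaming pass over both sides. The following is a minimal sketch of that idea in Go, not the actual VDiff code: the `Row`, `DiffReport`, and `DiffSorted` names are hypothetical, each row is reduced to an integer primary key and one value, and in-memory slices stand in for the streamed results.

```go
package main

import "fmt"

// Row is a hypothetical, simplified row: an integer primary key plus one
// value standing in for all of the remaining columns.
type Row struct {
	PK    int
	Value string
}

// DiffReport records the differences found between the two sides.
type DiffReport struct {
	Matching        int   // rows present on both sides with equal values
	MissingOnTarget []int // primary keys found only on the source
	ExtraOnTarget   []int // primary keys found only on the target
	Mismatched      []int // primary keys present on both sides with different values
}

// DiffSorted walks two inputs that are each sorted by primary key,
// advancing whichever side has the smaller key, so every row is
// visited exactly once.
func DiffSorted(source, target []Row) DiffReport {
	var r DiffReport
	i, j := 0, 0
	for i < len(source) && j < len(target) {
		s, t := source[i], target[j]
		switch {
		case s.PK < t.PK:
			r.MissingOnTarget = append(r.MissingOnTarget, s.PK)
			i++
		case s.PK > t.PK:
			r.ExtraOnTarget = append(r.ExtraOnTarget, t.PK)
			j++
		default:
			if s.Value == t.Value {
				r.Matching++
			} else {
				r.Mismatched = append(r.Mismatched, s.PK)
			}
			i++
			j++
		}
	}
	// Whatever remains on one side has no counterpart on the other.
	for ; i < len(source); i++ {
		r.MissingOnTarget = append(r.MissingOnTarget, source[i].PK)
	}
	for ; j < len(target); j++ {
		r.ExtraOnTarget = append(r.ExtraOnTarget, target[j].PK)
	}
	return r
}

func main() {
	source := []Row{{1, "a"}, {2, "b"}, {3, "c"}, {5, "e"}}
	target := []Row{{1, "a"}, {2, "B"}, {4, "d"}, {5, "e"}}
	fmt.Printf("%+v\n", DiffSorted(source, target))
}
```

Because both inputs are ordered, the pass needs only the current row from each side in memory, which is what makes streaming very large tables feasible.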

{{< info >}}
For large tables, holding a transaction open on the source tablets can have a significant impact on normal query
traffic due to [InnoDB MVCC](https://dev.mysql.com/doc/refman/en/innodb-multi-versioning.html) needing to keep those
older versions of rows around if they are updated after the transaction started
([`innodb_history_list_length`](https://orangematter.solarwinds.com/2015/07/20/what-is-innodb-history-list-length/)). For
this reason, I would recommend always using REPLICA tablets for VDiff operations whenever you can (when the source is an
[unmanaged tablet](https://vitess.io/docs/user-guides/configuration-advanced/unmanaged-tablet/), for example when moving
from RDS into Vitess, you may only have a PRIMARY tablet available). You can control that using the
`--tablet_types=REPLICA` flag for the [VDiff command](https://vitess.io/docs/reference/vreplication/vdiff/). In v14+ the
default was changed to: `--tablet_types=in_order:RDONLY,REPLICA,PRIMARY`.
{{</ info >}}

The original version worked very well but it suffered from [some limitations](https://vitess.io/docs/15.0/reference/vreplication/vdiff/#note)
that posed challenges in certain situations such as when working
[with very large tables](https://vitess.io/docs/15.0/reference/vreplication/vdiff/#using-vdiff-with-huge-tables).
For example, if you have over 1 TiB of data that needs to be compared, the VDiff could take a week to complete. If
during this time you had any failure, such as one of the MySQL connections in use being closed (e.g. due to
[`wait_timeout`](https://dev.mysql.com/doc/refman/en/server-system-variables.html#sysvar_wait_timeout) or
[`net_write_timeout`](https://dev.mysql.com/doc/refman/en/server-system-variables.html#sysvar_net_write_timeout))
then you'd have to start the entire operation over again from scratch.

We processed feedback from Vitess users over the course of 2+ years as they used VDiff in production and a
set of underlying issues started to become clear:
* Fragility — any connection loss, process failure, failover etc. would cause the VDiff to fail and need to be re-run
* Synchronous command — the vtctl client command would block until the VDiff completed which posed some challenges and
required a stable machine where e.g. a [tmux](https://github.com/tmux/tmux/wiki) session could be used for the client
call
* VTCtld as the controller — the [Vitess cluster management daemon](https://vitess.io/docs/reference/programs/vtctld/) is
generally a lightweight process used to coordinate complex operations that span many Vitess components. It's not designed
to be used for operations that span days and require the resources needed to compare 100s of GiBs of data
* Network traffic — the [tablets](https://vitess.io/docs/concepts/tablet/) on each side of the VDiff streamed their
data to the `vtctld` process which then compared the data. This generated a lot of network traffic which could
become a bottleneck and impact overall network bandwidth and latency. Keep in mind that it's common for the data
involved to reside in 3 or more failure domains / availability zones.
* No progress reporting — the VDiff could run for days without any indication of overall progress
* Execution time — the VDiff could take days or weeks to complete for very large tables, in large part because there
was very little concurrency with a single `vtctld` process doing the bulk of the work

We set out to create a new version of VDiff that addressed all of these issues.

## VDiff V2

We started by largely [rearranging the existing VDiff code](https://github.com/vitessio/vitess/pull/10382) so that
instead of being managed and controlled by a `vtctld` it's managed and executed — in parallel — by each shard on the
target side. This offers parallelism while also reducing the amount of network traffic needed to perform the diff.
The operation was also made asynchronous, with the
[`VDiff Show`](https://vitess.io/docs/reference/vreplication/vdiff2/#show-progressstatus-of-a-vdiff)
client command gathering and reporting the results of the VDiff operation from each of the target shards involved.
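
As a rough illustration of this shape (per-shard work running in parallel, with a separate step that gathers the results), here is a minimal Go sketch. It is not the VDiff V2 implementation: `ShardResult`, `ShardDiffFunc`, `RunOnAllShards`, and `Summarize` are hypothetical names, and goroutines in one process stand in for independent target shards.

```go
package main

import (
	"fmt"
	"sync"
)

// ShardResult is a hypothetical per-shard summary of a diff.
type ShardResult struct {
	Shard        string
	RowsCompared int
	Mismatches   int
}

// ShardDiffFunc stands in for the work each target shard performs locally.
type ShardDiffFunc func(shard string) ShardResult

// RunOnAllShards runs the diff for every shard concurrently and returns
// the results in the same order as the input shard list.
func RunOnAllShards(shards []string, diff ShardDiffFunc) []ShardResult {
	results := make([]ShardResult, len(shards))
	var wg sync.WaitGroup
	for i, shard := range shards {
		wg.Add(1)
		go func(i int, shard string) {
			defer wg.Done()
			// Each goroutine writes only to its own slot, so no lock is needed.
			results[i] = diff(shard)
		}(i, shard)
	}
	wg.Wait()
	return results
}

// Summarize aggregates the per-shard results, loosely analogous to a
// client gathering a report from each target shard.
func Summarize(results []ShardResult) (rows, mismatches int) {
	for _, r := range results {
		rows += r.RowsCompared
		mismatches += r.Mismatches
	}
	return rows, mismatches
}

func main() {
	shards := []string{"-80", "80-"}
	results := RunOnAllShards(shards, func(shard string) ShardResult {
		return ShardResult{Shard: shard, RowsCompared: 1000}
	})
	rows, mismatches := Summarize(results)
	fmt.Printf("compared %d rows across %d shards, %d mismatches\n", rows, len(results), mismatches)
}
```

With the work split this way, total run time tends to track the slowest shard rather than the sum of all shards, and row data no longer has to pass through a single central process.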

We then made VDiffs [resumable](https://github.com/vitessio/vitess/pull/10497) so that if a failure occurs during
the diff, the operation can be resumed from where it left off. This also makes it possible to do a rolling or
incremental VDiff where you may perform the VDiff immediately after a workflow completes, and then again just before
doing a cutover for added confidence as there may be weeks between those two stages. From there we added support for
[auto-restarting](https://github.com/vitessio/vitess/pull/10639) a VDiff if any ephemeral/recoverable error occurs.
This means that you can have process crashes, failovers, network issues, etc., and the VDiff will automatically
recover and continue running.
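
One common way to get this behavior is to checkpoint the last primary key that was fully compared and, after a recoverable error, continue from that checkpoint instead of from the first row. The Go sketch below shows the pattern under stated assumptions; it does not describe how VDiff stores its state. `Checkpoint`, `BatchFunc`, `RunResumable`, and `ErrEphemeral` are hypothetical names introduced for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrEphemeral marks a hypothetical recoverable failure, such as a
// dropped connection.
var ErrEphemeral = errors.New("ephemeral error")

// Checkpoint records the last primary key that was fully compared.
type Checkpoint struct {
	LastPK int
}

// BatchFunc compares the next batch of rows with a primary key greater
// than afterPK. It returns the last key compared and whether the table
// has been fully processed.
type BatchFunc func(afterPK int) (lastPK int, done bool, err error)

// RunResumable advances the checkpoint batch by batch. On an ephemeral
// error it retries from the checkpoint, up to maxRetries consecutive
// attempts; any other error is returned to the caller.
func RunResumable(cp *Checkpoint, batch BatchFunc, maxRetries int) error {
	retries := 0
	for {
		last, done, err := batch(cp.LastPK)
		if err != nil {
			if errors.Is(err, ErrEphemeral) && retries < maxRetries {
				retries++
				continue // resume from the saved checkpoint
			}
			return err
		}
		cp.LastPK = last
		retries = 0
		if done {
			return nil
		}
	}
}

func main() {
	failedOnce := false
	// Simulated table with primary keys 1..10, compared 3 rows at a time,
	// with a single connection failure partway through.
	batch := func(after int) (int, bool, error) {
		if after == 6 && !failedOnce {
			failedOnce = true
			return 0, false, ErrEphemeral
		}
		if after+3 >= 10 {
			return 10, true, nil
		}
		return after + 3, false, nil
	}
	cp := &Checkpoint{}
	err := RunResumable(cp, batch, 3)
	fmt.Println(cp.LastPK, err)
}
```

Persisting the checkpoint outside the process is what would allow a later run, for example a second pass just before cutover, to continue from the same point.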

We also added [progress reporting](https://github.com/vitessio/vitess/pull/10639) so that you have some idea of
how much work the VDiff has done, how much is left, and have an ETA for when it's likely to complete. This gives
you greater peace of mind while a longer operation runs and better allows you to prepare for the next step once
the VDiff completes.
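
To make the arithmetic behind such an estimate concrete, here is a minimal Go sketch that extrapolates linearly from the rows compared so far. The formula is an assumption for illustration, not necessarily how VDiff computes its ETA, and `Estimate` is a hypothetical function; in practice the total row count would itself typically be an estimate.

```go
package main

import (
	"fmt"
	"time"
)

// Estimate returns the percentage complete and the projected remaining
// time, assuming the comparison rate stays constant. ok is false when
// there is not yet enough information to extrapolate.
func Estimate(rowsCompared, totalRows int64, elapsed time.Duration) (pct float64, remaining time.Duration, ok bool) {
	if rowsCompared <= 0 || totalRows <= 0 {
		return 0, 0, false
	}
	// A row-count estimate can undercount, so cap progress at 100%.
	if rowsCompared > totalRows {
		rowsCompared = totalRows
	}
	frac := float64(rowsCompared) / float64(totalRows)
	remaining = time.Duration(float64(elapsed) * (1 - frac) / frac)
	return frac * 100, remaining, true
}

func main() {
	pct, remaining, ok := Estimate(250, 1000, 10*time.Minute)
	if ok {
		fmt.Printf("%.1f%% done, about %s remaining\n", pct, remaining)
	}
}
```

The estimate is only as good as the constant-rate assumption, so it is best read as a rough guide early in a long run.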

There were a variety of other minor improvements as well. In total, we hope that this new version addresses the
major set of issues that users had and provides a solid base for us to continue making further improvements.

## Conclusion

Vitess [VReplication](https://vitess.io/docs/16.0/reference/vreplication/vreplication/) offers a set of
powerful features that allow users to manage data workflows when that data is spread across a large fleet of
MySQL instances. [VDiff](https://vitess.io/docs/reference/vreplication/vdiff2/) then provides an invaluable
tool for verifying the correctness of these complex operations, giving you confidence and peace of mind
as you execute the data operations required to better meet your evolving business needs and objectives over
time.

Please try out [VDiff v2](https://vitess.io/docs/reference/vreplication/vdiff2/) in
[Vitess 15.0](https://github.com/vitessio/vitess/releases/tag/v15.0.0) — where it's marked as experimental —
and provide feedback! We hope to mark it as GA/production-ready in the upcoming 16.0 release and your
feedback is invaluable.

Happy data migrations! 🚀 🚀 🚀
Binary file added static/img/VitessQueryExample.png