
WIP: NativeDDL cut-over with MySQL 8.0 locking #8573

Closed

Conversation


@shlomi-noach shlomi-noach commented Aug 2, 2021

Right now this is not working

But I may as well submit the changes as Draft and explain what we're trying to achieve.

Description

What are we trying to achieve

This is an attempt to create an atomic cut-over for NativeDDL (OnlineDDL via VReplication). The current NativeDDL cut-over is done in stages: writes to the original table are rejected, we wait for the final catchup of binary log events, and we then swap the original and new tables.

During this wait time, queries on the original table are rejected. We want them to be blocked, instead. That is, we want the queries to wait, and then proceed to execute on the newly instated table. This is what we call a blocking/atomic cut-over. It means queries are not rejected, though they do pile up during the wait time.

gh-ost uses an elaborate mechanism to achieve that. Recently, a few users found holes in the logic and were able to demonstrate how queries can be lost during the cut-over time (the queries would execute on the old table after it was renamed away, rather than on the new table once instated).
Also, the gh-ost mechanism involves multiple moving parts working in fine synchronization and is not viable for Vitess. Vitess NativeDDL runs in different planes that have one-way gRPC communication (the control plane calls vreplication, but not the other way around).

At the time, Oracle/MySQL developed a feature for gh-ost at my specific request, one that would allow cleaner cut-over logic. Unfortunately, the feature was implemented in a way that only made it more complex to use with gh-ost. The feature shipped in MySQL 8.0.13, and this PR attempts to utilize it for NativeDDL.

Right now without success.

What's the MySQL 8.0.13 locking feature and what's the implementation problem?

Before MySQL 8.0.13 it was impossible to LOCK TABLES ... and then RENAME TABLE ... from the same connection.

My intention was for gh-ost's control plane to be able to LOCK TABLES original_table WRITE, then wait for catchup, and follow up with RENAME TABLE original_table TO old_table, ghost_table TO original_table. In the same connection.

However, the way the feature was implemented, you have to hold locks on all tables in your RENAME statement. So, if you're holding any locks at all during RENAME TABLE original_table TO old_table, ghost_table TO original_table, then you must specifically hold locks on original_table and on ghost_table.
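To make the constraint concrete, here is a minimal sketch in Go of the rule described above: while any table locks are held, a RENAME may only touch tables that are themselves locked. The helper name and table names are illustrative, not actual Vitess or MySQL server code.

```go
package main

import "fmt"

// renameCoveredByLocks reports whether every table referenced by the
// RENAME is also held by LOCK TABLES. This mirrors the MySQL 8.0.13
// behavior described above: holding any lock means the RENAME must
// name only locked tables. (Illustrative sketch, not Vitess code.)
func renameCoveredByLocks(locked map[string]bool, renamed []string) bool {
	for _, t := range renamed {
		if !locked[t] {
			return false
		}
	}
	return true
}

func main() {
	// The plan I wanted for gh-ost: lock only the original table...
	lockedOriginalOnly := map[string]bool{"original_table": true}
	// ...but the cut-over RENAME also touches the ghost table:
	renamed := []string{"original_table", "ghost_table"}

	// false: MySQL rejects the RENAME because ghost_table is unlocked.
	fmt.Println(renameCoveredByLocks(lockedOriginalOnly, renamed))

	// What 8.0.13 actually requires: locks on every renamed table.
	lockedBoth := map[string]bool{"original_table": true, "ghost_table": true}
	fmt.Println(renameCoveredByLocks(lockedBoth, renamed))
}
```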

This changes the logic and viability of the feature dramatically.

I should note that in general the reasoning is in line with the general MySQL LOCK theme: when you hold LOCKs, you may only operate on locked tables (and this gets even more complicated, as we'll see below). This is fine; it's just not what I wanted to achieve for gh-ost.

So, why is this so dramatic? Because now we can't let the control plane run the LOCK. Why? Because it would then lock ghost_table. And the whole point is that we want to lock the original table, and then apply more changes to the ghost table, based on final binary log events.

But if the ghost table is locked, then the only entity that can write to the ghost table is the applier, which is on a different plane than the controller.

Specifically in Vitess, the control plane is onlineddl.Executor. The applier is VReplication itself. Normally, VReplication should be ignorant of the specifics of the operation, and the controller tells it how to run, when to stop, etc.

But now, it seems VReplication has to be the one that takes the locks. Here's where it gets more complicated, in different trajectories:

  1. VReplication has to be the one to lock both the original and ghost tables. It must also be the one to follow up with the RENAME and the final UNLOCK TABLES. This means VReplication now needs to carry substantial Online DDL logic. Ideally, only the controller should have that logic, but there's no choice here.
  2. Not only do the LOCK, RENAME, and UNLOCK have to happen from within the same connection, it also has to be the same connection used by vreplication itself. Specifically, VPlayer must be able to write to the ghost table, and it must use the same connection used for locking the table.
  3. Which creates concurrency/synchronization/race condition issues, because the controller invokes VReplication via gRPC, asynchronously to the vreplication flow. Vitess' mysql.Conn is not thread safe, and we need to be able to sync queries invoked from gRPC with queries invoked by VPlayer.
  4. Moreover, the onlineDDL flow actually requires stopping vreplication once the original and ghost tables are in sync. That leads to the destruction of vreplication and its controller, which also destroys the dbClient/connection. But we still need that connection open, because we have yet to RENAME and UNLOCK.
  5. Moreover, VReplication also writes to the _vt.vreplication table. It does so transactionally, in both VCopier and VPlayer. Recall that in MySQL, if you hold any locks at all and write to some table, you must lock that table as well. This means we need to extend LOCK TABLES to include _vt.vreplication. This further complicates concurrency and synchronization, as other moving parts access _vt.vreplication and, while that table is LOCKed, they cannot operate.
  6. Which requires us to patch some other logic, such as waitForPos and the stopping of vreplication.
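The single-connection constraint in points 1–3 above can be sketched as follows: one owner serializes LOCK, the applier's final catchup, RENAME, and UNLOCK onto a shared "connection" (simulated here as an ordered statement log, since mysql.Conn is not thread safe). All type, function, and table names here are illustrative, not the actual Vitess implementation.

```go
package main

import (
	"fmt"
	"sync"
)

// conn simulates a single, mutex-guarded MySQL connection by recording
// statements in order. (Illustrative sketch, not mysql.Conn.)
type conn struct {
	mu  sync.Mutex
	log []string
}

func (c *conn) exec(q string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.log = append(c.log, q)
}

// cutOverSequence sketches the required ordering: lock both tables,
// let the applier (VPlayer) drain final binlog events into the ghost
// table on the SAME connection, then rename and unlock.
func cutOverSequence() []string {
	c := &conn{}

	// Locks taken first, on the shared connection.
	c.exec("LOCK TABLES original_table WRITE, ghost_table WRITE")

	// Applier side: it may write because it shares the locking connection.
	applied := make(chan struct{})
	go func() {
		c.exec("INSERT INTO ghost_table /* final catchup */")
		close(applied)
	}()
	<-applied // wait until the final events are drained

	// Only then swap the tables and release the locks.
	c.exec("RENAME TABLE original_table TO _old, ghost_table TO original_table")
	c.exec("UNLOCK TABLES")
	return c.log
}

func main() {
	for _, q := range cutOverSequence() {
		fmt.Println(q)
	}
}
```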

Some notable changes in the code:

  • vreplication's dbClient now has a mutex, making it thread safe for use by various moving parts.
  • vreplication's controller now exposes dbClient so that it can be used from within vreplication's Engine.
  • In the cut-over logic, Engine subscribes as an "interested party" to the controller's dbClient. The controller will not destroy dbClient (hence, will not close the connection) before Engine completes the cut-over.
  • A few goroutines are necessary to avoid deadlock in the above logic.
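The first and third bullets above can be sketched together as a mutex-guarded, reference-counted client: the connection closes only once every interested party has released it. Type and method names are hypothetical, not the real vreplication code.

```go
package main

import (
	"fmt"
	"sync"
)

// sharedDBClient sketches the two changes above: a mutex so the client
// is safe to use from several goroutines, and a reference count so the
// controller does not close the connection while the Engine still needs
// it for cut-over. (Illustrative names, not actual Vitess types.)
type sharedDBClient struct {
	mu     sync.Mutex
	refs   int
	closed bool
}

func (c *sharedDBClient) Exec(q string) error {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.closed {
		return fmt.Errorf("connection closed")
	}
	// Real code would execute q on the underlying connection here.
	return nil
}

// Register marks an interested party (e.g. Engine during cut-over).
func (c *sharedDBClient) Register() {
	c.mu.Lock()
	c.refs++
	c.mu.Unlock()
}

// Release drops a reference; the connection closes only when none remain.
func (c *sharedDBClient) Release() {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.refs--
	if c.refs == 0 {
		c.closed = true
	}
}

func main() {
	c := &sharedDBClient{}
	c.Register() // controller
	c.Register() // Engine subscribes for cut-over

	c.Release() // controller tears down vreplication...
	// ...but Engine's reference keeps the connection open for the RENAME.
	fmt.Println(c.Exec("RENAME TABLE ..."))

	c.Release() // Engine completes the cut-over
	fmt.Println(c.Exec("SELECT 1")) // now the connection is closed
}
```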

Tests

I revived an older Online DDL stress test, and adapted it for the new logic. Introducing:

  • .github/workflows/cluster_endtoend_onlineddl_vrepl_stress80.yml
  • go/test/endtoend/onlineddl/vrepl_stress_80/onlineddl_vrepl_stress_mysql80_test.go

Together, these:

  • Make sure to install MySQL 8.0 (above 8.0.13, which is an "old" version by now)
  • Run a series of stress tests on a table. The tests inject insert and update statements on the table, with a predictable effect, which is recorded. After the stress workload completes, we compare the table's data with what we've predicted. There must be a match.
  • Some tests run without any ALTER TABLE just to validate the data/prediction logic.
  • We then run similarly, but with a trivial alter table... engine=innodb. We validate that the migration utilizes the new 8.0 locking mechanism.

THIS SADLY FAILS

The tests can show inconsistencies. They're able to show that some statements are wrongly executed on the old table but not on the new table. We find rows in the old table that do not exist in the new table (which shouldn't happen, as there are only inserts and updates). We keep an increasing counter that's used for each insert and update, and we find rows with higher counter values on the old table.
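The counter-based check described above can be sketched as a simple set difference: any counter value present in the old table but absent from the new table is a write that was lost during cut-over. The function name and values are illustrative, not the actual test code.

```go
package main

import "fmt"

// lostWrites sketches the test's consistency check: every insert/update
// carries a monotonically increasing counter, so after cut-over the new
// table must contain every counter value the old table has. Counters on
// the old table but missing from the new one indicate lost writes.
// (Illustrative sketch, not the actual endtoend test logic.)
func lostWrites(oldCounters, newCounters []int) []int {
	seen := make(map[int]bool, len(newCounters))
	for _, c := range newCounters {
		seen[c] = true
	}
	var lost []int
	for _, c := range oldCounters {
		if !seen[c] {
			lost = append(lost, c)
		}
	}
	return lost
}

func main() {
	// Counters 7 and 8 landed on the old table after the rename: a failure.
	fmt.Println(lostWrites([]int{1, 2, 7, 8}, []int{1, 2, 3, 4, 5, 6}))
}
```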

This shows failure of the logic. I am yet unable to explain it. It is likely related to MySQL internal locking queue behavior, and I suspect some similar gotchas as presented by gh-ost users.

Right now I'm not sure how to proceed

The code is here, it's not pretty, and it's not working. I'm not sure if we have a path to make this work. Meanwhile I hope the above clarifies what we are trying to do.

Related Issue(s)

Checklist

  • Should this PR be backported?
  • Tests were added or are not required
  • Documentation was added or is not required

cc @rohit-nayak-ps @deepthi @sougou @rbranson

Signed-off-by: Shlomi Noach <[email protected]>
@shlomi-noach shlomi-noach added Component: Query Serving Component: VReplication Type: Enhancement Logical improvement (somewhere between a bug and feature) labels Aug 2, 2021
@shlomi-noach

shlomi-noach commented Aug 2, 2021

Very curiously, on GitHub CI the test onlineddl_vrepl_stress_80 passes: https://github.com/vitessio/vitess/pull/8573/checks?check_run_id=3218283408

But I am able to reproduce test failures on a dev machine 😬

@shlomi-noach

This does not show great promise. We're likely to solve NativeDDL cut-over with vtgate buffering, instead.

@shlomi-noach

This is not working as one might expect. We've decided that the path forward is by caching queries. Closing this experimental (yet educational) branch.

@shlomi-noach shlomi-noach deleted the onlineddl-vrepl-80-cutover branch August 29, 2021 06:07