
WIP: NativeDDL cut-over with MySQL 8.0 locking #8573

Closed

Conversation


@shlomi-noach shlomi-noach commented Aug 2, 2021

Right now this is not working

But I may as well submit the changes as Draft and explain what we're trying to achieve.

Description

What are we trying to achieve

This is an attempt to create an atomic cut-over for NativeDDL (OnlineDDL via VReplication). The current NativeDDL cut-over is done in stages: writes to the original table are rejected, we wait for the final catchup of binary log events, and we then swap the original and new tables.

During this wait time, queries on the original table are rejected. We want them to be blocked, instead. That is, we want the queries to wait, and then proceed to execute on the newly instated table. This is what we call a blocking/atomic cut-over. It means queries are not rejected, though they do pile up during the wait time.

gh-ost uses an elaborate mechanism to achieve that. Recently, a few users found holes in the logic and were able to demonstrate how queries can be lost during the cut-over time (the queries would execute on the old table after it was renamed away, rather than on the new table once instated).
Also, the gh-ost mechanism involves multiple moving parts working in fine synchronization and is not viable for Vitess. Vitess NativeDDL runs in different planes that have one-way gRPC communication (the control plane calls vreplication, but not the other way around).

At the time, Oracle/MySQL developed a feature for gh-ost at my specific request, one that would allow cleaner cut-over logic. Unfortunately, the feature was implemented in a way that only made it more complex to use with gh-ost. The feature shipped in MySQL 8.0.13, and this PR attempts to utilize it for NativeDDL.

Right now without success.

What's the MySQL 8.0.13 locking feature and what's the implementation problem?

Before MySQL 8.0.13 it was impossible to LOCK TABLES ... and then RENAME TABLE ... from the same connection.

My intention was for gh-ost's control plane to be able to LOCK TABLES original_table WRITE, then wait for catchup, and follow up with RENAME TABLE original_table TO old_table, ghost_table TO original_table. In the same connection.

However, the way the feature was implemented, you have to hold locks on all tables in your RENAME statement. So, if you're holding any locks at all during RENAME TABLE original_table TO old_table, ghost_table TO original_table, then you must specifically hold locks on original_table and on ghost_table.
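To make the constraint concrete, here is a minimal sketch in Go of the rule described above: while any table locks are held, a RENAME may only touch tables that are themselves locked. The helper name and table names are illustrative, not actual Vitess or MySQL server code.

```go
package main

import "fmt"

// renameCoveredByLocks reports whether every table referenced by the
// RENAME is also held by LOCK TABLES. This mirrors the MySQL 8.0.13
// behavior described above: holding any lock means the RENAME must
// name only locked tables. (Illustrative sketch, not Vitess code.)
func renameCoveredByLocks(locked map[string]bool, renamed []string) bool {
	for _, t := range renamed {
		if !locked[t] {
			return false
		}
	}
	return true
}

func main() {
	// The plan I wanted for gh-ost: lock only the original table...
	lockedOriginalOnly := map[string]bool{"original_table": true}
	// ...but the cut-over RENAME also touches the ghost table:
	renamed := []string{"original_table", "ghost_table"}

	// false: MySQL rejects the RENAME because ghost_table is unlocked.
	fmt.Println(renameCoveredByLocks(lockedOriginalOnly, renamed))

	// What 8.0.13 actually requires: locks on every renamed table.
	lockedBoth := map[string]bool{"original_table": true, "ghost_table": true}
	fmt.Println(renameCoveredByLocks(lockedBoth, renamed))
}
```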

This changes the logic and viability of the feature dramatically.

I should note that in general the reasoning is in line with the general MySQL LOCK theme: when you hold LOCKs, you may only operate on locked tables (and this gets even more complicated, as we'll see below). This is fine; it's just not what I wanted to achieve for gh-ost.

So, why is this so dramatic? Because now we can't let the control plane run the LOCK. Why? Because it would then lock ghost_table. And the whole point is that we want to lock the original table, and then apply more changes to the ghost table, based on final binary log events.

But if the ghost table is locked, then the only entity that can write to the ghost table is the applier, which is on a different plane than the controller.

Specifically in Vitess, the control plane is onlineddl.Executor. The applier is VReplication itself. Normally, VReplication should be ignorant of the specifics of the operation, and the controller tells it how to run, when to stop, etc.

But now, it seems VReplication has to be the one that takes the locks. Here's where it gets more complicated, in different trajectories:

  1. VReplication has to be the one to lock both the original and ghost tables. It must also be the one to follow up with the RENAME and the final UNLOCK TABLES. This means VReplication now needs to carry substantial Online DDL logic. Ideally, only the controller should have that logic, but there's no choice here.
  2. Not only do the LOCK, RENAME, and UNLOCK have to happen from within the same connection, it also has to be the same connection used by vreplication itself. Specifically, VPlayer must be able to write to the ghost table, and it must use the same connection used for locking the table.
  3. Which creates concurrency/synchronization/race condition issues, because the controller invokes VReplication via gRPC, asynchronously to the vreplication flow. Vitess' mysql.Conn is not thread safe, and we need to be able to sync queries invoked from gRPC with queries invoked by VPlayer.
  4. Moreover, the onlineDDL flow actually requires stopping vreplication once the original and ghost tables are in sync. That leads to the destruction of vreplication and its controller, which also destroys the dbClient/connection. But we still need that connection open, because we have yet to RENAME and UNLOCK.
  5. Moreover, VReplication also writes to the _vt.vreplication table. It does so transactionally, in both VCopier and VPlayer. Recall that in MySQL, if you hold any locks at all and write to some table, you must lock that table as well. This means we need to extend LOCK TABLES to include _vt.vreplication. This further complicates concurrency and synchronization, as other moving parts access _vt.vreplication and, while that table is LOCKed, they cannot operate.
  6. Which requires us to patch some other logic, such as waitForPos and the stopping of vreplication.
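The single-connection constraint in points 1–3 above can be sketched as follows: one owner serializes LOCK, the applier's final catchup, RENAME, and UNLOCK onto a shared "connection" (simulated here as an ordered statement log, since mysql.Conn is not thread safe). All type, function, and table names here are illustrative, not the actual Vitess implementation.

```go
package main

import (
	"fmt"
	"sync"
)

// conn simulates a single, mutex-guarded MySQL connection by recording
// statements in order. (Illustrative sketch, not mysql.Conn.)
type conn struct {
	mu  sync.Mutex
	log []string
}

func (c *conn) exec(q string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.log = append(c.log, q)
}

// cutOverSequence sketches the required ordering: lock both tables,
// let the applier (VPlayer) drain final binlog events into the ghost
// table on the SAME connection, then rename and unlock.
func cutOverSequence() []string {
	c := &conn{}

	// Locks taken first, on the shared connection.
	c.exec("LOCK TABLES original_table WRITE, ghost_table WRITE")

	// Applier side: it may write because it shares the locking connection.
	applied := make(chan struct{})
	go func() {
		c.exec("INSERT INTO ghost_table /* final catchup */")
		close(applied)
	}()
	<-applied // wait until the final events are drained

	// Only then swap the tables and release the locks.
	c.exec("RENAME TABLE original_table TO _old, ghost_table TO original_table")
	c.exec("UNLOCK TABLES")
	return c.log
}

func main() {
	for _, q := range cutOverSequence() {
		fmt.Println(q)
	}
}
```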

Some notable changes in the code:

  • vreplication's dbClient now has a mutex, making it thread safe for use by various moving parts.
  • vreplication's controller now exposes dbClient so that it can be used from within vreplication's Engine.
  • In the cut-over logic, Engine subscribes as an "interested party" to the controller's dbClient. The controller will not destroy dbClient (hence, will not close the connection) before Engine completes the cut-over.
  • A few goroutines are necessary to avoid deadlock in the above logic.
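The first and third bullets above can be sketched together as a mutex-guarded, reference-counted client: the connection closes only once every interested party has released it. Type and method names are hypothetical, not the real vreplication code.

```go
package main

import (
	"fmt"
	"sync"
)

// sharedDBClient sketches the two changes above: a mutex so the client
// is safe to use from several goroutines, and a reference count so the
// controller does not close the connection while the Engine still needs
// it for cut-over. (Illustrative names, not actual Vitess types.)
type sharedDBClient struct {
	mu     sync.Mutex
	refs   int
	closed bool
}

func (c *sharedDBClient) Exec(q string) error {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.closed {
		return fmt.Errorf("connection closed")
	}
	// Real code would execute q on the underlying connection here.
	return nil
}

// Register marks an interested party (e.g. Engine during cut-over).
func (c *sharedDBClient) Register() {
	c.mu.Lock()
	c.refs++
	c.mu.Unlock()
}

// Release drops a reference; the connection closes only when none remain.
func (c *sharedDBClient) Release() {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.refs--
	if c.refs == 0 {
		c.closed = true
	}
}

func main() {
	c := &sharedDBClient{}
	c.Register() // controller
	c.Register() // Engine subscribes for cut-over

	c.Release() // controller tears down vreplication...
	// ...but Engine's reference keeps the connection open for the RENAME.
	fmt.Println(c.Exec("RENAME TABLE ..."))

	c.Release() // Engine completes the cut-over
	fmt.Println(c.Exec("SELECT 1")) // now the connection is closed
}
```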

Tests

I revived an older Online DDL stress test, and adapted it for the new logic. Introducing:

  • .github/workflows/cluster_endtoend_onlineddl_vrepl_stress80.yml
  • go/test/endtoend/onlineddl/vrepl_stress_80/onlineddl_vrepl_stress_mysql80_test.go

Together, these:

  • Make sure to install MySQL 8.0 (above 8.0.13, which is an "old" version by now)
  • Run a series of stress tests on a table. The tests inject insert and update statements on the table, with a predictable effect, which is recorded. After the stress workload completes, we compare the table's data with what we've predicted. There must be a match.
  • Some tests run without any ALTER TABLE just to validate the data/prediction logic.
  • We then run similarly, but with a trivial alter table... engine=innodb. We validate that the migration utilizes the new 8.0 locking mechanism.

THIS SADLY FAILS

The tests can show inconsistencies. They're able to show that some statements are wrongly executed on the old table but not on the new table. We find rows in the old table that do not exist in the new table (which shouldn't happen, as there are only inserts and updates). We keep an increasing counter that's used for each insert and update, and we find rows with higher counter values on the old table.
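The counter-based check described above can be sketched as a simple set difference: any counter value present in the old table but absent from the new table is a write that was lost during cut-over. The function name and values are illustrative, not the actual test code.

```go
package main

import "fmt"

// lostWrites sketches the test's consistency check: every insert/update
// carries a monotonically increasing counter, so after cut-over the new
// table must contain every counter value the old table has. Counters on
// the old table but missing from the new one indicate lost writes.
// (Illustrative sketch, not the actual endtoend test logic.)
func lostWrites(oldCounters, newCounters []int) []int {
	seen := make(map[int]bool, len(newCounters))
	for _, c := range newCounters {
		seen[c] = true
	}
	var lost []int
	for _, c := range oldCounters {
		if !seen[c] {
			lost = append(lost, c)
		}
	}
	return lost
}

func main() {
	// Counters 7 and 8 landed on the old table after the rename: a failure.
	fmt.Println(lostWrites([]int{1, 2, 7, 8}, []int{1, 2, 3, 4, 5, 6}))
}
```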

This shows failure of the logic. I am yet unable to explain it. It is likely related to MySQL internal locking queue behavior, and I suspect some similar gotchas as presented by gh-ost users.

Right now I'm not sure how to proceed

The code is here, it's not pretty, and it's not working. I'm not sure if we have a path to make this work. Meanwhile I hope the above clarifies what we are trying to do.

Related Issue(s)

Checklist

  • Should this PR be backported?
  • Tests were added or are not required
  • Documentation was added or is not required

cc @rohit-nayak-ps @deepthi @sougou @rbranson

Signed-off-by: Shlomi Noach <[email protected]>
@shlomi-noach shlomi-noach added Component: Query Serving Component: VReplication Type: Enhancement Logical improvement (somewhere between a bug and feature) labels Aug 2, 2021
@shlomi-noach

shlomi-noach commented Aug 2, 2021

Very curiously, on GitHub CI the test onlineddl_vrepl_stress_80 passes: https://github.com/vitessio/vitess/pull/8573/checks?check_run_id=3218283408

But I am able to reproduce test failures on a dev machine 😬

@shlomi-noach

This does not show great promise. We're likely to solve NativeDDL cut-over with vtgate buffering, instead.

@shlomi-noach

This is not working as one might expect. We've decided that the path forward is by caching queries. Closing this experimental (yet educational) branch.

@shlomi-noach shlomi-noach deleted the onlineddl-vrepl-80-cutover branch August 29, 2021 06:07