-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
vitess Online DDL atomic cut-over #11460
vitess Online DDL atomic cut-over #11460
Conversation
Signed-off-by: Shlomi Noach <[email protected]>
Signed-off-by: Shlomi Noach <[email protected]>
Signed-off-by: Shlomi Noach <[email protected]>
Signed-off-by: Shlomi Noach <[email protected]>
…-stage Signed-off-by: Shlomi Noach <[email protected]>
Signed-off-by: Shlomi Noach <[email protected]>
Signed-off-by: Shlomi Noach <[email protected]>
Signed-off-by: Shlomi Noach <[email protected]>
Review ChecklistHello reviewers! 👋 Please follow this checklist when reviewing this Pull Request. General
If a new flag is being introduced:
If a workflow is added or modified:
Bug fixes
Non-trivial changes
New/Existing features
Backward compatibility
|
Signed-off-by: Shlomi Noach <[email protected]>
Signed-off-by: Shlomi Noach <[email protected]>
Signed-off-by: Shlomi Noach <[email protected]>
Signed-off-by: Shlomi Noach <[email protected]>
Signed-off-by: Shlomi Noach <[email protected]>
Signed-off-by: Shlomi Noach <[email protected]>
…-stage Signed-off-by: Shlomi Noach <[email protected]>
A walkthrough to explain the new cut-over logic and why it is implemented the way it is.First, to explain the current logic, also discussed in this blog post. Current cut-over logicAssume we modify table
When
In essence, we have swapped Renaming
New cut-over logic in this PRWe want to avoid the puncture, such that replicas always have The query buffering should help us: activating the buffering means no queries should operate on We therefore employ a variation of In the naive implementation
Problem is this won't work as expected. (1) and (3) can't both run from the same connection. If they were, (3) would fail because the connection did not lock So (1) and (3) must run by different connections. But this poses a risk. When we run (3), how do we ensure that (1) is still holding the lock? Maybe (1)'s connections was terminated, releasing the lock? Meaning queries are actually able to run against This is how Non-naive, simplified implementation
Nuances
|
Signed-off-by: Shlomi Noach <[email protected]>
…-stage Signed-off-by: Shlomi Noach <[email protected]>
…-stage Signed-off-by: Shlomi Noach <[email protected]>
Seeing CI errors in the form
I suspect something crashes the primary tablet. Of course this only happens in GitHub CI and not on local machine, so it's a bit of a chase to find out what happened. |
Not even sure the CI issue is related to this PR. Still looking. |
From an internal discussion it seems the aforementioned CI problem is colossal, and not specifically related to this PR. |
Ready to review! Please make sure to read the original comment and then also #11460 (comment), if you want to have a good grasp of what's going on. Hint: it's complex! |
…-stage Signed-off-by: Shlomi Noach <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This sounds good to me.
Do we have a test that reproduces the problem that is described in the PR, or is that too difficult (needs to be timed precisely)?
Let's get a review from @mattlord as well.
Yeah, it's too difficult I think; and on top of this, because we do use query buffering on top, I'm not sure I can reproduce it. |
Signed-off-by: Shlomi Noach <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! This approach makes sense to me and seems sound.
I only had some minor comments/questions/suggestions. Thanks!
@@ -739,31 +765,99 @@ func (e *Executor) cutOverVReplMigration(ctx context.Context, s *VReplStream) er | |||
return err | |||
} | |||
isVreplicationTestSuite := onlineDDL.StrategySetting().IsVreplicationTestSuite() | |||
e.updateMigrationStage(ctx, onlineDDL.UUID, "starting cut-over") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can fail for a variety of reasons, e.g. deadlock. We should have some error handling and perhaps retry logic. Or am I missing something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is nice-to-have but not strictly critical for the cut-over. It adds a level of auditing/logging. Not sure we should fail the cut-over over failure of this action.
go/vt/vttablet/onlineddl/executor.go
Outdated
e.updateMigrationStage(ctx, onlineDDL.UUID, "dropping sentry table") | ||
dropTableQuery := sqlparser.BuildParsedQuery(sqlDropTable, sentryTableName) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't these be in the following code block? Or we could just remove the code block. I personally think it's less readable in this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes! done
if err != nil { | ||
return err | ||
} | ||
_, err = e.execQuery(ctx, query) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the error that we're not handling.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to summarize, we're ignoring it where it doesn't have a true effect on the cut-over, as we consider it more of a logging/debuggability feature
@@ -81,6 +81,8 @@ const ( | |||
alterSchemaMigrationsComponentThrottled = "ALTER TABLE _vt.schema_migrations add column component_throttled tinytext NOT NULL" | |||
alterSchemaMigrationsCancelledTimestamp = "ALTER TABLE _vt.schema_migrations add column cancelled_timestamp timestamp NULL DEFAULT NULL" | |||
alterSchemaMigrationsTablePostponeLaunch = "ALTER TABLE _vt.schema_migrations add column postpone_launch tinyint unsigned NOT NULL DEFAULT 0" | |||
alterSchemaMigrationsStage = "ALTER TABLE _vt.schema_migrations add column stage text not null" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
text seems a little excessive, no? In case we want to use this in query predicates and index it. Not a problem, just struck me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe we're running out of varchar
space for 5.7
. I may be mistaken but I have this recollection from a few months ago when I tried adding a varchar
column.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, we're probably hitting the [on] page size limit (text can be moved off page to secondary storage).
This is a good example of why we should probably start leveraging JSON columns more, otherwise we'll end up with hundreds of columns which may or may not be used at different times. We're using them now in VDiff2 FWIW.
Signed-off-by: Shlomi Noach <[email protected]>
Signed-off-by: Shlomi Noach <[email protected]>
Signed-off-by: Shlomi Noach <[email protected]>
Signed-off-by: Shlomi Noach <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Nice work on this!
force merging: only failing test is |
Hi, @shlomi-noach
|
Unfortunately not. Try this at home: create table t0 (id int primary key);
create table t1 (id int primary key);
lock tables t0 write;
rename table t0 to told, t1 to t0; output:
But we really want to lock just the one table. For this reason we have to use a third connection, which complicates the entire logic. See my suggestion for an upstream change: mysql/mysql-server#426, https://bugs.mysql.com/bug.php?id=108864 |
* cut-over with sentry table Signed-off-by: Shlomi Noach <[email protected]> * wait for rename via channel; write extra transaction post LOCK Signed-off-by: Shlomi Noach <[email protected]> * add stage info Signed-off-by: Shlomi Noach <[email protected]> * reduced wait-for-pos timeout. Improved stage logic Signed-off-by: Shlomi Noach <[email protected]> * cleanup Signed-off-by: Shlomi Noach <[email protected]> * more cleanup Signed-off-by: Shlomi Noach <[email protected]> * even more cleanup Signed-off-by: Shlomi Noach <[email protected]> * context timeout Signed-off-by: Shlomi Noach <[email protected]> * preserve stage by disabling deferred stage changes Signed-off-by: Shlomi Noach <[email protected]> * killing rename query upon error Signed-off-by: Shlomi Noach <[email protected]> * increase test timeout Signed-off-by: Shlomi Noach <[email protected]> * fix defer ordering Signed-off-by: Shlomi Noach <[email protected]> * log.info Signed-off-by: Shlomi Noach <[email protected]> * add and populate cutover_attempts column Signed-off-by: Shlomi Noach <[email protected]> * search PROCESSLIST with LIKE Signed-off-by: Shlomi Noach <[email protected]> * code comment Signed-off-by: Shlomi Noach <[email protected]> * making a variable more local Signed-off-by: Shlomi Noach <[email protected]> * literal string Signed-off-by: Shlomi Noach <[email protected]> Signed-off-by: Shlomi Noach <[email protected]>
* cut-over with sentry table * wait for rename via channel; write extra transaction post LOCK * add stage info * reduced wait-for-pos timeout. Improved stage logic * cleanup * more cleanup * even more cleanup * context timeout * preserve stage by disabling deferred stage changes * killing rename query upon error * increase test timeout * fix defer ordering * log.info * add and populate cutover_attempts column * search PROCESSLIST with LIKE * code comment * making a variable more local * literal string Signed-off-by: Shlomi Noach <[email protected]> Co-authored-by: Shlomi Noach <[email protected]>
Description
This is work in progress for an atomic
RENAME
(not two-step gaping hole) in the cut-over process of a vitess/vreplication Online DDL migration.Currently, we run a two-step rename, when the original table is renamed away, and then a 2nd rename moves the
vreplication
table in its place. This is protected under a buffering rule on the primary, but replicas see two distinct renames and there is a point in time on the replica, where the table does not exist. Of course this means queries are failing.There is also a scenario depicted in #11226, where even queries against the primary may fail.
This PR introduces an algorithm similar to the
gh-ost
cut-over, described in http://code.openark.org/blog/mysql/solving-the-non-atomic-table-swap-take-iii-making-it-atomic and in github/gh-ost#82We still retain our query buffering logic, which protects is against existing logic failures in the
gh-ost
implementation, per github/gh-ost#887.the new logic to be described in detail once the PR is validated and considered safe. Some noteworthy details in the meantime:
VReplicationWaitForPos
after creating the sentry table, which increases the overall cut-over process time. This does not affect incoming traffic, as the creation of the table comes before the critical section of locking/blocking/renaming. It does, however, lock the executor's mutex, which potentially adds a few more seconds to an already a few seconds worth of process. The impact is internal and of no real concern.stowaway_table
logic. The swap is atomic, and there is no need to recover a "stowaway scenario". We will keep the recovery logic for backwards compatibility for one full release, then remove it. The recovery logic is found here:vitess/go/vt/vttablet/onlineddl/executor.go
Lines 3258 to 3280 in 893cc9d
stage
column to_vt.schema_migrations
. It serves multiple purposes:stage
column in every step of the cut-over logic, so that we can easily see where a failure took placestage
in the logs, so we have a full historyLOCK TABLES
andRENAME TABLES
: theLOCK TABLES
query does not generate a GTID entry. TheRENAME TABLES
query is blocked due toLOCK TABLES
and of course does not generate a GTID entry. We updatestage
column immediately after issuing both, and thus ensure we have a post-lock/rename position for which we can wait for inVReplicationWaitForPos
.Related Issue(s)
Checklist
Deployment Notes