-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Online DDL: ready_to_complete
race fix
#12612
Online DDL: ready_to_complete
race fix
#12612
Conversation
…ission condition; introduce ready_to_complete_timestamp and WasReadyToComplete Signed-off-by: Shlomi Noach <[email protected]>
Signed-off-by: Shlomi Noach <[email protected]>
Signed-off-by: Shlomi Noach <[email protected]>
Review ChecklistHello reviewers! 👋 Please follow this checklist when reviewing this Pull Request. General
If a new flag is being introduced:
If a workflow is added or modified:
Bug fixes
Non-trivial changes
New/Existing features
Backward compatibility
|
Signed-off-by: Shlomi Noach <[email protected]>
if conflictFound, conflictingMigration := e.isAnyConflictingMigrationRunning(onlineDDL); conflictFound { | ||
return vterrors.Wrapf(ErrExecutorMigrationAlreadyRunning, "conflicting migration: %v over table: %v", conflictingMigration.UUID, conflictingMigration.Table) | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need for this double-verification, since we now run synchronously and under same mutex protection.
the diff is I'll try to restart the test tomorrow, 24hr after US daylight saving took effect, and see what happens. |
UUID string `json:"uuid,omitempty"` | ||
Strategy DDLStrategy `json:"strategy,omitempty"` | ||
Options string `json:"options,omitempty"` | ||
RequestTime int64 `json:"time_created,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Found that this field(RequestTime
) was unused.
TabletAlias string `json:"tablet,omitempty"` | ||
Retries int64 `json:"retries,omitempty"` | ||
ReadyToComplete int64 `json:"ready_to_complete,omitempty"` | ||
WasReadyToComplete int64 `json:"was_ready_to_complete,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was going to ask if we can't use atomic.Int64
here (and for others where appropriate), but then found golang/go#54582.
Description
Fixes #12610
Two main change in this PR:
executeAlterDDLActionMigration()
does not create goroutines in order to executevitess
,gh-ost
, andpt-osc
migrations. The calls toExecuteWithVReplication
,ExecuteWithGhost
, andExecuteWithPTOSC
are now inlined, synchronousely.migrationMutex
acquired byrunNextMigration()
, which callsisAnyConflictingMigrationRunning()
to determine which migration can be executed that does not conflict with running migrations.isAnyConflictingMigrationRunning()
themselves, and that 2nd chcek is removed, trivially solving the race condition reported in Bug Report: race condition in vreplication migration submission #12610.The reason these migrations were called in a goroutine is historical, and I do not see the need for that anymore.
ready_to_complete_timestamp
, inschema_migrations
table. This column is updated any time the migration isready_to_complete
. It defaultsNULL
.It follows that if the column is
NOT NULL
, then the migration was at some point in the pastread_to_complete
, even if it isn't right now. This condition,ready_to_complete_timestamp IS NOT NULL
, is read intoWasReadyToComplete
.In
isAnyConflictingMigrationRunning()
, we replace theReadyToComplete
check with aWasReadyToComplete
check. It is OK to execute a new concurrentvitess
migration if all existing migrations were at least once ready-to-complete, even if some of them are now not ready to complete. The main thing is that all existing migrations are done with the copy phase and are tailing the logs.This change is compatible with the behavior of
ready_to_complete
of immediate operations (those are implicitlyready_to_complete
and thereby theirready_to_complete_timestamp
is set to a non-NULL
value).Related Issue(s)
Checklist
Deployment Notes