We recently hit a lock hang at the table cut-over stage. After analysis, I suspect it is related to PR #888. The problem occurs when `atomicCutOver()` handles the error `Timeout while waiting for events up to lock`. After the timeout error, the current `atomicCutOver` attempt is cancelled and retried. On cancellation, its deferred function runs, which includes `okToUnlockTable <- true` and `this.applier.DropAtomicCutOverSentryTableIfExists()`. Meanwhile, `applier.AtomicCutOverMagicLock` also drops the magic cut-over table after receiving from the `okToUnlockTable` channel. The PR uses `sync.Once` so that the drop of the cut-over sentry table is not sent to MySQL twice. However, if the drop issued by `applier.DropAtomicCutOverSentryTableIfExists()` runs first, it blocks in `Waiting for table metadata lock`, while the actual lock owner, `applier.AtomicCutOverMagicLock`, is stuck on the Once mutex, waiting for the former to complete. Neither side can make progress.
This can be reproduced by injecting faults: force a timeout error just before waiting for the events up to lock, and add a wait of a few seconds so that the drop table is issued by `DropAtomicCutOverSentryTableIfExists()` first.
Thank you!
@cenkore Hello, is it possible to move the drop table action into the deferred function of `AtomicCutOverMagicLock`? That way the sentry table is created and released in the same goroutine, which avoids this problem.