tm: fix replmanager deadlock #6625

Merged
sougou merged 3 commits into vitessio:master from planetscale:ss-tm3-repl-deadlock
Aug 26, 2020
Conversation

Contributor

@sougou sougou commented Aug 26, 2020

A deadlock was found during a PlannedReparentShard (PRS). The root cause was an earlier fix that changed the replmanager to take the action lock; without the lock, it could race and conflict with other actions. But this led to a deadlock: PromoteReplica waits for the replmanager to finish its fix while the replmanager is itself waiting for the action lock.

We could have spot-fixed this for the specific use case. But in the interest of preventing other corner cases, the better fix was to change the replmanager so that it does not wait if it cannot obtain the lock.

However, the existing implementation of lock with a context timeout was flawed: it would not actually time out when the context expired. So I implemented a new AcquireContext in sync2.Semaphore, which also prompted me to fix the flaky tests there.

Using the semaphore allowed me to implement a real tryLock, and replManager could use it.
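As a rough illustration of the non-blocking approach, the sketch below shows a single-slot lock with a `tryLock` and a periodic check that skips its work when the lock is busy. All names here (`actionLock`, `checkReplication`, `repair`) are hypothetical and chosen for the example; they are not taken from the tablet manager code.

```go
package main

import "fmt"

// actionLock is a hypothetical stand-in for an action lock, implemented
// as a one-slot semaphore so that a non-blocking tryLock is possible.
type actionLock struct {
	ch chan struct{}
}

func newActionLock() *actionLock {
	return &actionLock{ch: make(chan struct{}, 1)}
}

// lock blocks until the lock is available.
func (l *actionLock) lock() { l.ch <- struct{}{} }

// tryLock acquires the lock only if it is free right now.
func (l *actionLock) tryLock() bool {
	select {
	case l.ch <- struct{}{}:
		return true
	default:
		return false
	}
}

// unlock releases the lock.
func (l *actionLock) unlock() { <-l.ch }

// checkReplication is a hypothetical periodic check. If another action
// holds the lock, it returns without waiting and leaves the repair for a
// later tick. It reports whether the repair ran.
func checkReplication(l *actionLock, repair func()) bool {
	if !l.tryLock() {
		return false
	}
	defer l.unlock()
	repair()
	return true
}

func main() {
	l := newActionLock()
	repairs := 0
	repair := func() { repairs++ }

	l.lock()                                 // another action holds the lock
	fmt.Println(checkReplication(l, repair)) // skipped, does not block
	l.unlock()

	fmt.Println(checkReplication(l, repair)) // lock is free, repair runs
	fmt.Println(repairs)
}
```

Because the check never blocks on the lock, an action that holds the lock and waits for the check to finish cannot end up in a circular wait with it.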

Since this was a race condition, I tested it manually. The test that failed previously now passes.

sougou added 2 commits August 25, 2020 19:46
And fix flaky test

Signed-off-by: Sugu Sougoumarane <ssougou@gmail.com>
Change actionMutex to a semaphore to implement a tryLock function
in tm, and use it in replManager.

Signed-off-by: Sugu Sougoumarane <ssougou@gmail.com>
@sougou sougou requested review from deepthi, enisoc and rafael and removed request for deepthi August 26, 2020 03:17
Signed-off-by: Sugu Sougoumarane <ssougou@gmail.com>
Member

@rafael rafael left a comment

LGTM

@sougou sougou merged commit 6cb8496 into vitessio:master Aug 26, 2020
@sougou sougou deleted the ss-tm3-repl-deadlock branch August 26, 2020 22:36
@askdba askdba added this to the v8.0 milestone Oct 6, 2020
4 participants