Skip to content

Backport: tm: fix replmanager deadlock#6633

Merged
deepthi merged 3 commits intovitessio:release-7.0from
planetscale:ss-tm4-backport-tm-fix
Aug 26, 2020
Merged

Backport: tm: fix replmanager deadlock#6633
deepthi merged 3 commits intovitessio:release-7.0from
planetscale:ss-tm4-backport-tm-fix

Conversation

@sougou
Copy link
Copy Markdown
Contributor

@sougou sougou commented Aug 26, 2020

of #6625

A deadlock was found during a PRS. The root cause was a fix where we changed the replmanager to take the action lock. Otherwise, it would potentially race and conflict with other actions. But this led to a deadlock because PromoteReplica also waits for the replmanager to finish its fix.

We could have spot-fixed this for the specific use case. But in the interest of preventing other corner cases, the better fix was to change replmanager to not wait if it couldn't obtain a lock.

However, the implementation of lock with context timeout was flawed, because it wouldn't really timeout if the context expired. So, I implemented a new AcquireContext in sync2.Semaphore to, which encouraged to fix the flaky tests there.

Using the semaphore allowed me to implement a real tryLock, and replManager could use it.

Since this was a race condition, I tested it manually. The test that failed previously now passes.

sougou added 3 commits August 26, 2020 13:35
And fix flaky test

Signed-off-by: Sugu Sougoumarane <ssougou@gmail.com>
Change actionMutex to a semaphore to implement a tryLock function
in tm, and use it in replManager.

Signed-off-by: Sugu Sougoumarane <ssougou@gmail.com>
Signed-off-by: Sugu Sougoumarane <ssougou@gmail.com>
@sougou sougou requested a review from deepthi August 26, 2020 20:37
@deepthi deepthi added this to the v7.0.2 milestone Aug 26, 2020
@deepthi deepthi merged commit 9f6b69c into vitessio:release-7.0 Aug 26, 2020
@sougou sougou deleted the ss-tm4-backport-tm-fix branch August 26, 2020 22:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants