repairReplication deadlock fix #177

Merged
vmogilev merged 10 commits into slack-vitess-r14.0.5 from vm_debug-prs
Jan 27, 2024

@vmogilev vmogilev commented Jan 17, 2024

Description

This PR fixes a bug that made PRS slow (17-18s hangs). The hangs were caused by repairReplication deadlocking the shard, as described here: https://slack-pde.slack.com/archives/C8EJ0PTPF/p1705042056083619?thread_ts=1696929303.041269&cid=C8EJ0PTPF

Testing

Extensively tested over a 2-week period here and here.

Backport/Upstream Plans

See this thread.

Rollout

After making a new build off of slack-vitess-r14.0.5:

  1. soak the new build in dev (all keyspaces)
  2. wait until the vtgate v14 rollout is 100% complete, to avoid adding any new variables
  3. vttablet canary phase in prod using the vtops-go upgrade plan
  4. roll out the vttablet default build in prod via u22/high-uptime recycling

@vmogilev vmogilev marked this pull request as ready for review January 22, 2024 19:16
@vmogilev vmogilev requested a review from a team as a code owner January 22, 2024 19:16
Signed-off-by: Vitaliy Mogilevskiy <vmogilevskiy@slack-corp.com>

3 participants