vtorc: support analysis ordering, improve semi-sync rollout #19427
timvaillancourt merged 21 commits into vitessio:main
Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #19427 +/- ##
===========================================
- Coverage 69.67% 46.46% -23.21%
===========================================
Files 1614 24 -1590
Lines 216793 3736 -213057
===========================================
- Hits 151044 1736 -149308
+ Misses 65749 2000 -63749
@mattlord / @nickvanw I believe all PR comments/questions have been addressed. Thanks for the review ❤️ For the most part: I've made sure all problems have a priority and deprecated the
mhamza15 left a comment
LGTM! Just some small comments.
@timvaillancourt I think this is the only open comment from me: https://github.com/vitessio/vitess/pull/19427/files/2f3b04caca1ba12f4fbcb41baccc5e830d9eaca3#r2843704610 I'm not sure whether the comment needs to be changed or whether that should return true.
mattlord left a comment
LGTM! Nice work on this, @timvaillancourt ❤️
@mhamza15 this PR seems to be hitting a race in I debugged this with Claude and here was the summary: I wanted to hear what you thought about the fix proposed, and if you'd prefer to run with it, or I add it here?
This reverts commit 91d3264. Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>
…#19427 - Breaking change: external decompressor no longer read from backup MANIFEST by default - VTOrc: ordered recovery execution and semi-sync rollout improvements Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>
Resolve merge conflicts from cherry-picking the analysis ordering and semi-sync rollout PR onto release-23.0. Adapt the new problem-matching system to the release-23.0 codebase by removing IncapacitatedPrimary (not present on this branch) and keeping fmt.Sprintf style consistent with existing code. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>
…rollout (#19427) (#19472) Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com> Co-authored-by: vitess-bot[bot] <108069721+vitess-bot[bot]@users.noreply.github.com> Co-authored-by: Mohamed Hamza <mhamza@fastmail.com> Co-authored-by: Tim Vaillancourt <tim@timvaillancourt.com>
Description
This PR refactors how VTOrc prioritizes and executes recoveries in order to address #18712, which requires some problems to be executed in a defined order.
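To make "executed in a defined order" concrete, here is a minimal sketch of one way dependency-ordered execution could work. It is not the code from this PR: the `Problem` struct, the `Priority` field, the `orderProblems` helper, and the exact meaning given to `BeforeAnalysis`/`AfterAnalysis` are assumptions for illustration, and the replica-side problem name is used only as an example.

```go
package main

import (
	"fmt"
	"sort"
)

// Problem is a hypothetical stand-in for one entry in a slice of problem
// definitions. Field semantics are assumptions, not the PR's actual types.
type Problem struct {
	Name           string
	Priority       int      // assumed: lower value runs first when no dependency applies
	BeforeAnalysis []string // assumed: this problem must run before these
	AfterAnalysis  []string // assumed: this problem must run after these
}

// orderProblems returns problem names in an order that honors the declared
// dependencies (Kahn's topological sort). Ties are broken by Priority, then Name.
func orderProblems(problems []Problem) ([]string, error) {
	byName := make(map[string]Problem, len(problems))
	indegree := make(map[string]int, len(problems))
	for _, p := range problems {
		byName[p.Name] = p
		indegree[p.Name] = 0
	}
	edges := make(map[string][]string) // edges[a] contains b when a must run before b
	addEdge := func(first, second string) {
		if _, ok := byName[first]; !ok {
			return // ignore references to problems that are not present
		}
		if _, ok := byName[second]; !ok {
			return
		}
		edges[first] = append(edges[first], second)
		indegree[second]++
	}
	for _, p := range problems {
		for _, later := range p.BeforeAnalysis {
			addEdge(p.Name, later)
		}
		for _, earlier := range p.AfterAnalysis {
			addEdge(earlier, p.Name)
		}
	}
	var ready []string
	for name, d := range indegree {
		if d == 0 {
			ready = append(ready, name)
		}
	}
	var ordered []string
	for len(ready) > 0 {
		sort.Slice(ready, func(i, j int) bool {
			a, b := byName[ready[i]], byName[ready[j]]
			if a.Priority != b.Priority {
				return a.Priority < b.Priority
			}
			return a.Name < b.Name
		})
		next := ready[0]
		ready = ready[1:]
		ordered = append(ordered, next)
		for _, later := range edges[next] {
			indegree[later]--
			if indegree[later] == 0 {
				ready = append(ready, later)
			}
		}
	}
	if len(ordered) != len(indegree) {
		return nil, fmt.Errorf("dependency cycle detected among problems")
	}
	return ordered, nil
}

func main() {
	problems := []Problem{
		// The primary-side fix has the better priority but declares a dependency.
		{Name: "PrimarySemiSyncMustBeSet", Priority: 1, AfterAnalysis: []string{"ReplicaSemiSyncMustBeSet"}},
		{Name: "ReplicaSemiSyncMustBeSet", Priority: 2}, // illustrative name
	}
	order, err := orderProblems(problems)
	if err != nil {
		panic(err)
	}
	fmt.Println(order)
}
```

In this sketch the replica-side problem is ordered first even though the primary-side problem has the better priority, because the declared dependency wins over priority.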
The major changes this PR brings are:
- `switch` statement, limiting options to solve this problem
- `slice-of-struct` in a new file: `analysis_problem.go`
- `BeforeAnalysis` and `AfterAnalysis` dependencies, to enforce dependencies
- `PrimarySemiSyncMustBeSet` problem is only solved if `>= semi-sync ackers` has enabled semi-sync, and VTOrc sees it as enabled in subsequent pollings (via the `FullStatus` RPC VTOrc calls every X seconds)

End-to-end tests were added to validate that the main issue, #18712, is addressed. This e2e test checks that semi-sync states on replicas are fixed first, before a PRIMARY. Also, various unit tests confirm the validity of the new functions that were added.
Backport Reason
As explained in #18712, the current recovery logic causes a stall in writes, from the perspective of a shard, when semi-sync is enabled.
Backporting this would avoid a solvable stall in writes for older Vitess versions. If this doesn't backport nicely, that may be a reason not to backport.
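For readers unfamiliar with why ordering avoids the stall: if the primary starts requiring semi-sync acknowledgements before enough replicas can send them, writes wait. A minimal sketch of the gating condition described for `PrimarySemiSyncMustBeSet` follows. The `replicaStatus` type, the `canEnablePrimarySemiSync` helper, and the tablet aliases are hypothetical and only illustrate the ">= semi-sync ackers" check against polled state. They are not VTOrc's actual API.

```go
package main

import "fmt"

// replicaStatus is a hypothetical, trimmed-down view of what a poll might
// report for one replica. It is not a real Vitess type.
type replicaStatus struct {
	Alias           string
	SemiSyncEnabled bool // as observed in the most recent poll
}

// canEnablePrimarySemiSync reports whether at least requiredAckers replicas
// were observed with semi-sync enabled, so that enabling semi-sync on the
// primary would not leave writes waiting for acknowledgements.
func canEnablePrimarySemiSync(polled []replicaStatus, requiredAckers int) bool {
	enabled := 0
	for _, r := range polled {
		if r.SemiSyncEnabled {
			enabled++
		}
	}
	return enabled >= requiredAckers
}

func main() {
	polled := []replicaStatus{
		{Alias: "zone1-101", SemiSyncEnabled: true},
		{Alias: "zone1-102", SemiSyncEnabled: false},
	}
	fmt.Println(canEnablePrimarySemiSync(polled, 1)) // one acker required, one observed
	fmt.Println(canEnablePrimarySemiSync(polled, 2)) // two required, only one observed
}
```

The check deliberately uses observed (polled) state rather than the state VTOrc last asked for, mirroring the description's requirement that VTOrc "sees it as enabled in subsequent pollings".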
Related Issue(s)
Resolves #18712
Checklist
Deployment Notes
AI Disclosure
Claude w/Opus used to move switch statement to new map. Also some test-failure debugging