vtorc: improve handling of partial cell topo results #17718
Merged
deepthi merged 6 commits into vitessio:main on Feb 14, 2025
Codecov Report
Coverage diff, main vs. #17718:
  Coverage: 67.79% -> 67.95% (+0.16%)
  Files: 1587 -> 1586 (-1)
  Lines: 255829 -> 255209 (-620)
  Hits: 173427 -> 173433 (+6)
  Misses: 82402 -> 81776 (-626)
Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>
GuptaManan100 (Contributor) approved these changes on Feb 10, 2025 and left a comment:
Rest looks good to me!
Comment from the PR author (Contributor):
Update: tested this PR by backporting it to our v19 release. Working as expected ✅. We now get an error log line for each individual cell failure, plus an error indicating that we got a partial result from all cells.
deepthi approved these changes on Feb 14, 2025
timvaillancourt added a commit to slackhq/vitess that referenced this pull request on Feb 19, 2025
timvaillancourt added a commit to slackhq/vitess that referenced this pull request on Feb 20, 2025
* Move to native sqlite3 queries (vitessio#17124)
  Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com>
  Co-authored-by: Tim Vaillancourt <tim@timvaillancourt.com>
* Improve efficiency of `vtorc` topo calls (vitessio#17071)
  Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>
  Co-authored-by: Matt Lord <mattalord@gmail.com>
* Ensure all topo read calls consider `--topo_read_concurrency` (vitessio#17276)
  Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>
* Avoid flaky topo concurrency test (vitessio#17407)
  Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>
* `vtorc`: fetch all tablets from cells once + filter during refresh (vitessio#17388)
  Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>
* Support KeyRange in `--clusters_to_watch` flag (vitessio#17604)
  Signed-off-by: Manan Gupta <manan@planetscale.com>
* `vtorc`: improve handling of partial cell topo results (vitessio#17718)
  Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>
* Add stats for shards watched by VTOrc
  Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>
* add more tests
  Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>
* cleanup
  Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>
* fix ineffassign
  Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>
* fix test for v21
  Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>
* Use prefix in all vtorc check and recover logs (vitessio#17526)
  Signed-off-by: Eduardo J. Ortega U. <5791035+ejortegau@users.noreply.github.com>
---------
Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com>
Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>
Signed-off-by: Manan Gupta <manan@planetscale.com>
Signed-off-by: Eduardo J. Ortega U. <5791035+ejortegau@users.noreply.github.com>
Co-authored-by: Dirkjan Bussink <d.bussink@gmail.com>
Co-authored-by: Matt Lord <mattalord@gmail.com>
Co-authored-by: Manan Gupta <35839558+GuptaManan100@users.noreply.github.com>
Co-authored-by: Eduardo J. Ortega U. <5791035+ejortegau@users.noreply.github.com>
Description
This PR improves the safety of `getAllTablets` in `go/vt/vtorc/logic/tablet_discovery.go`, which is used to get all tablet records from all cells.

The new logic returns the tablets from `getAllTablets(...)` in a per-cell map (as `map[string][]*topo.TabletInfo`) to ensure that only the cells that responded are operated on. A list of failed cells is also returned as `[]string`.

This prevents tablets from being forgotten when one or more cells fail: before this PR, the SQL query that gets the aliases to forget in `refreshTablets(...)` did not consider cells that never responded. This bug was introduced by #17388.

Related Issue(s)
Closes #17719
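For reviewers who want the idea in isolation, here is a minimal, self-contained Go sketch of the per-cell return shape described in the Description. It is not the code from this PR: `Tablet`, `cellFetcher`, `aliasesToForget`, and `demoFetch` are hypothetical stand-ins invented for illustration, the real function works with `*topo.TabletInfo` and a topo server, and the real forget step is a SQL query rather than an in-memory loop.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
	"sync"
)

// Tablet is a hypothetical stand-in for *topo.TabletInfo.
type Tablet struct {
	Alias string
	Cell  string
}

// cellFetcher is a hypothetical function that reads all tablets in one cell.
type cellFetcher func(cell string) ([]Tablet, error)

// getAllTablets queries every cell concurrently. It returns the tablets keyed
// by cell for the cells that responded, plus a sorted list of failed cells.
func getAllTablets(cells []string, fetch cellFetcher) (map[string][]Tablet, []string) {
	var (
		mu          sync.Mutex
		wg          sync.WaitGroup
		byCell      = make(map[string][]Tablet, len(cells))
		failedCells []string
	)
	for _, cell := range cells {
		wg.Add(1)
		go func(cell string) {
			defer wg.Done()
			tablets, err := fetch(cell)
			mu.Lock()
			defer mu.Unlock()
			if err != nil {
				// Record the failure; the cell gets no entry in the map.
				failedCells = append(failedCells, cell)
				return
			}
			byCell[cell] = tablets
		}(cell)
	}
	wg.Wait()
	sort.Strings(failedCells)
	return byCell, failedCells
}

// aliasesToForget returns known aliases that are absent from the latest
// results, considering only cells that responded. known maps alias -> cell.
func aliasesToForget(known map[string]string, byCell map[string][]Tablet) []string {
	seen := make(map[string]bool)
	for _, tablets := range byCell {
		for _, t := range tablets {
			seen[t.Alias] = true
		}
	}
	var forget []string
	for alias, cell := range known {
		if _, responded := byCell[cell]; !responded {
			continue // the cell never answered, so keep its tablets
		}
		if !seen[alias] {
			forget = append(forget, alias)
		}
	}
	sort.Strings(forget)
	return forget
}

// demoFetch simulates one healthy cell (zone1) and one failing cell.
func demoFetch(cell string) ([]Tablet, error) {
	if cell == "zone1" {
		return []Tablet{{Alias: "zone1-0000000100", Cell: "zone1"}}, nil
	}
	return nil, errors.New("topo unavailable")
}

func main() {
	byCell, failed := getAllTablets([]string{"zone1", "zone2"}, demoFetch)
	known := map[string]string{
		"zone1-0000000100": "zone1",
		"zone1-0000000101": "zone1", // gone from a healthy cell
		"zone2-0000000200": "zone2", // cell failed, must be kept
	}
	fmt.Println("failed cells:", failed)
	fmt.Println("forget:", aliasesToForget(known, byCell))
}
```

In this sketch only `zone1-0000000101` is forgotten. The `zone2` tablet survives because its cell has no key in the map, which is the distinction a flat tablet slice cannot express.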