Handle SQL thread crash in vt/vttablet/tabletserver/repltracker #7156
Closed
timvaillancourt wants to merge 5 commits into vitessio:master
Signed-off-by: Tim Vaillancourt <timvaillancourt@github.com>
…com/timvaillancourt/vitess into repltracker-handle-sql-thread-error
…com/timvaillancourt/vitess into repltracker-handle-sql-thread-error Signed-off-by: Tim Vaillancourt <timvaillancourt@github.com>
f95c8b2 to
a2b378b
a2b378b to
9002059
Replacing with PR #7157 due to DCO CI problems
Backport
NO
Status
DRAFT
Description
This PR causes vt/vttablet/tabletserver/repltracker/poller.go to return a "replication sql thread error" when a replica's SQL thread has crashed on an unrecoverable error. This addresses delays in the system noticing that a replica is unhealthy (and potentially inconsistent) when the SQL thread has crashed.
The criteria for the error are:
sql_slave_skip_counter=0, i.e. SQL thread errors are not skipped
The drawback to this change is that Last_SQL_Errno is not reset to zero unless a RESET MASTER or RESET SLAVE is run following an SQL thread crash.
Although somewhat unlikely, a user who manually resolves the SQL error and restarts the SQL thread without running RESET [MASTER|SLAVE] will continue to have a Last_SQL_Errno that is greater than zero. In this situation the new error could be returned if the SQL thread were later stopped. I don't have a good solution for this at the moment. Because restoring from a good backup is the norm in this situation (in my experience), I wonder if this is an acceptable limitation or whether more logic is required 🤔
cc @sougou / @shlomi-noach / @deepthi / @tomkrouper / @drogart for thoughts on the above
Related Issue(s)
List related PRs against other branches:
Todos
Deployment Notes
Notes regarding deployment of the contained body of work. These should note any
db migrations, etc.
Impacted Areas in Vitess
List general components of the application that this PR will affect: