VStreams NOT_FOUND Error Retries and Omits Tablet #154
makinje16 merged 4 commits into slack-vitess-r14.0.5
Conversation
original discussion for posterity: https://slack-pde.slack.com/archives/C01P84R7L02/p1695329095557079
@makinje16 / @pbibra: are we seeing VStreams use
Sorry, to clarify: we're not streaming from a tablet that is still in Restore; we'd be trying to stream from a tablet that's being replaced due to an issue. Since the current tablet picker version does not perform the health check before a tablet is chosen for streaming, we run into this issue.
Ahh I understand, thanks for clarifying 👍
Description
Currently, vstreams fail as expected when trying to stream from a tablet that is being replaced for some reason. However, when the NOT_FOUND error is returned to the vstream client, that tablet is not omitted from selection on the next retry. This can lead to streams being blocked for multiple hours.
This is added to our fork instead of being committed upstream because it has already been fixed in later versions of Vitess, but bringing that fix in pulled in many dependencies that caused issues with the merge. This change is added to unblock CDC until we are more caught up with upstream.
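For readers unfamiliar with the failure mode, here is a minimal sketch of the retry-and-omit idea described above. It is not the diff from this PR and does not use the Vitess tablet picker API: errNotFound, pickTablet, streamWithRetry, and the use of plain string aliases are all hypothetical names chosen for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFound is a hypothetical stand-in for the NOT_FOUND error a tablet returns.
var errNotFound = errors.New("NOT_FOUND")

// pickTablet returns the first candidate alias that is not in the ignore set.
// (Hypothetical helper; a real picker would also consider cell, type, and health.)
func pickTablet(candidates []string, ignore map[string]bool) (string, error) {
	for _, alias := range candidates {
		if !ignore[alias] {
			return alias, nil
		}
	}
	return "", fmt.Errorf("no usable tablet among %d candidates", len(candidates))
}

// streamWithRetry tries to stream from a picked tablet. When the attempt fails
// with NOT_FOUND, the tablet is added to the ignore set so that the next retry
// cannot select it again. Other errors are retried without omitting the tablet.
func streamWithRetry(candidates []string, maxRetries int, stream func(alias string) error) (string, error) {
	ignore := make(map[string]bool)
	var lastErr error
	for attempt := 0; attempt <= maxRetries; attempt++ {
		alias, err := pickTablet(candidates, ignore)
		if err != nil {
			return "", err
		}
		err = stream(alias)
		if err == nil {
			return alias, nil
		}
		lastErr = err
		if errors.Is(err, errNotFound) {
			// Omit this tablet from all later picks in this retry loop.
			ignore[alias] = true
		}
	}
	return "", fmt.Errorf("giving up after %d attempts: %w", maxRetries+1, lastErr)
}
```

Without the ignore set, a picker that keeps returning the same tablet would spend every retry on it, which matches the multi-hour blocking described above.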
Testing
This change has been in dev for the last several weeks, and it was recently noted that NOT_FOUND errors have not been seen in over 4 weeks.
Related Issue(s)
Checklist
Deployment Notes