Fallback to poller replication lag if heartbeat lag fails #207
Signed-off-by: Eduardo J. Ortega U <5791035+ejortegau@users.noreply.github.com>
// rt.mode == tabletenv.Poller or fallback after heartbeat error
mysqlLag, mysqlErr = rt.poller.Status()
if fallbackToPoller && mysqlErr != nil {
	return 0, errFallback
@ejortegau what if we just return rt.poller.Status() here? This would give you a more useful error although it wouldn't be clear there was a "fallback"
Wrapping the mysqlErr with errFallback might work too
I want to explicitly return a different error when fallback fails. I will do some wrapping, though
timvaillancourt left a comment
I suggested one typo-fix in the error, but LGTM 👍
if heartbeatLag, heartbeatErr = rt.hr.Status(); heartbeatErr == nil {
	return heartbeatLag, heartbeatErr
}
fallbackToPoller = true
Do we need this boolean? If either of the case statements is met, we would return anyway.
then the condition below on line 158 can just be
if mysqlErr != nil {...}
I think so. The intention is to raise different errors in case rt.poller.Status() failed depending on whether fallback was used or not.
…beat_poller_replication_tracker
* Fallback to poller replication lag if heartbeat lag fails
Signed-off-by: Eduardo J. Ortega U <5791035+ejortegau@users.noreply.github.com>
* Try to make CI pipeline happy
Signed-off-by: Eduardo J. Ortega U <5791035+ejortegau@users.noreply.github.com>
* Address PR comments
Signed-off-by: Eduardo J. Ortega U <5791035+ejortegau@users.noreply.github.com>
* Fix typo
Co-authored-by: Tim Vaillancourt <tim@timvaillancourt.com>
Signed-off-by: Eduardo J. Ortega U. <5791035+ejortegau@users.noreply.github.com>
---------
Signed-off-by: Eduardo J. Ortega U <5791035+ejortegau@users.noreply.github.com>
Signed-off-by: Eduardo J. Ortega U. <5791035+ejortegau@users.noreply.github.com>
Co-authored-by: Tim Vaillancourt <tim@timvaillancourt.com>

* Fallback to poller replication lag if heartbeat lag fails
* Try to make CI pipeline happy
* Address PR comments
* Fix typo
---------
Signed-off-by: Eduardo J. Ortega U <5791035+ejortegau@users.noreply.github.com>
Signed-off-by: Eduardo J. Ortega U. <5791035+ejortegau@users.noreply.github.com>
Co-authored-by: Eduardo J. Ortega U <5791035+ejortegau@users.noreply.github.com>
Description
This PR allows the replication tracker to fall back to the poller replication lag tracker when the heartbeat lag tracker fails. It is meant as a temporary change to let us move to heartbeat lag. It is needed because moving a shard from poller to heartbeat lag tracking can lead to replicas being taken out of service. For example:
With the changes in this PR, if there are issues getting the lag from the heartbeat, the replication tracker uses poller lag instead, which is what we are currently using.
Once we are fully migrated to using heartbeats, these changes can be reverted.