healthcheck: update healthy tablets correctly when a stream returns an error or times out #7654
…n error or times out Signed-off-by: deepthi <deepthi@planetscale.com>
…r timeout from healthcheck stream Signed-off-by: deepthi <deepthi@planetscale.com>
go/vt/discovery/healthcheck.go
Outdated
}

-func (hc *HealthCheckImpl) updateHealth(th *TabletHealth, shr *query.StreamHealthResponse, currentTarget *query.Target, trivialNonMasterUpdate bool, isMasterUpdate bool, isMasterChange bool) {
+func (hc *HealthCheckImpl) updateHealth(th *TabletHealth, currentTarget *query.Target, trivialUpdate bool, isPrimaryUp bool) {
Rename this to isPrimaryUpdate? Otherwise, I was reading it as "is primary up".
It is supposed to be "is primary up". The old parameters (isMasterChange and isMasterUpdate) were actually unnecessary because all the information they pass in is available within the scope of this func. However, we do need to know whether the primary should be marked unhealthy (if there is an error/timeout on the healthcheck connection), so I introduced this parameter.
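A minimal sketch of how a caller might compute such a flag, assuming simplified stand-in types rather than the real Vitess ones. `TabletType`, `errStreamTimeout`, and `primaryUp` are hypothetical names introduced only for illustration; the actual caller in healthcheck.go may look quite different.

```go
package main

import "errors"

// TabletType is a simplified stand-in for topodata.TabletType (assumption).
type TabletType int

const (
	TypeReplica TabletType = iota
	TypeMaster
)

// errStreamTimeout is a hypothetical sentinel for a timed-out health stream.
var errStreamTimeout = errors.New("healthcheck stream timed out")

// primaryUp is a hypothetical helper. It reports whether a tablet should
// still be treated as a healthy primary after one health stream attempt:
// only a MASTER tablet whose stream ended without an error or timeout counts.
func primaryUp(tabletType TabletType, streamErr error) bool {
	return tabletType == TypeMaster && streamErr == nil
}
```

With this shape, `primaryUp(TypeMaster, errStreamTimeout)` is false, which is the signal the update function needs in order to mark the primary unhealthy.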
go/vt/discovery/healthcheck.go
Outdated
topoproto.TabletAliasString(hc.healthy[targetKey][0].Tablet.Alias),
shr.TabletExternallyReparentedTimestamp,
hc.healthy[targetKey][0].MasterTermStartTime)
if th.Target.TabletType == topodata.TabletType_MASTER {
I would presume that isPrimaryUp would be true only if the tablet type was MASTER. Is there a case where that does not hold?
That is a good point. I suppose this is more of a consistency check - to make sure that we still behave correctly if a caller passes in isPrimaryUp true for a non-master tablet type.
go/vt/discovery/healthcheck.go
Outdated
hc.healthy[targetKey][0] = th
}
}
} else {
If this section was the reason you added the MASTER check, it may read better if you explicitly checked for it here. Something like th.Target.TabletType == topodata.TabletType_MASTER && !isPrimaryUp. But I'm not sure if that means the same thing.
The MASTER check was already there but in the form of a boolean (isMasterUpdate).
I can break this into two separate if blocks instead of an if .. if .. else, if that is easier to read/understand. What you are suggesting will be equivalent to how it is written today.
The concern was the deep nesting which made it non-obvious. A cascading if (or switch) may read better. Try it. If it doesn't improve it, we can keep this as is.
Ok, I think the switch made it better.
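The cascading switch discussed in this thread can be sketched roughly as follows. This is an illustration with simplified stand-in types, not the code from the PR: `TabletHealth`, `TabletType`, and `updateHealthy` are hypothetical, the map plays the role of `hc.healthy`, and the handling of non-primary tablets is deliberately left out.

```go
package main

// TabletType is a simplified stand-in for topodata.TabletType (assumption).
type TabletType int

const (
	TypeReplica TabletType = iota
	TypeMaster
)

// TabletHealth is a simplified stand-in for the discovery package's type.
type TabletHealth struct {
	Alias string
	Type  TabletType
}

// updateHealthy is a hypothetical, flattened version of the primary handling.
// isPrimary is derived from th itself, so only isPrimaryUp is passed in.
func updateHealthy(healthy map[string][]*TabletHealth, targetKey string, th *TabletHealth, isPrimaryUp bool) {
	isPrimary := th.Type == TypeMaster
	switch {
	case isPrimary && isPrimaryUp:
		// Exactly one healthy primary per target.
		healthy[targetKey] = []*TabletHealth{th}
	case isPrimary && !isPrimaryUp:
		// No healthy master tablet.
		healthy[targetKey] = []*TabletHealth{}
	}
	// Non-primary tablets are not handled in this sketch.
}
```

Each case states its full condition, so a reader does not have to track the nesting to see when the healthy list is emptied. Deriving isPrimary inside the function also covers the consistency concern raised earlier: passing isPrimaryUp as true for a non-master tablet matches neither case.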
sougou left a comment
Approving, in case no further improvements are possible.
Signed-off-by: deepthi <deepthi@planetscale.com>
}
case isPrimary && !isPrimaryUp:
	// No healthy master tablet
	hc.healthy[targetKey] = []*TabletHealth{}
That makes sense. This should be fixed.
Description
There are error conditions from a tablet health check that require the healthy tablet list to be updated.
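As a rough illustration of what "updating the healthy tablet list" can mean for a tablet whose health stream failed, here is one plausible shape, assuming a simplified `TabletHealth` type and a hypothetical `removeTablet` helper. It is a sketch of the general idea, not the logic in this PR.

```go
package main

// TabletHealth is a simplified stand-in for the discovery package's type.
type TabletHealth struct {
	Alias string
}

// removeTablet is a hypothetical helper. It returns a new slice holding
// every entry of list except the tablet with the given alias, for example
// one whose health stream returned an error or timed out.
func removeTablet(list []*TabletHealth, alias string) []*TabletHealth {
	kept := make([]*TabletHealth, 0, len(list))
	for _, th := range list {
		if th.Alias != alias {
			kept = append(kept, th)
		}
	}
	return kept
}
```

Building a new slice instead of editing in place keeps any previously handed-out slice unchanged, which is one simple way to avoid surprising readers of the old list.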
Related Issue(s)
Fixes #7472
Fixes #7177
Testing
Unit tests were added for each case. For #7472 I also followed the provided testing procedure to reproduce the problem and confirmed that the fix works.