Signed-off-by: crowu <y.wu4515@gmail.com>
"vitess.io/vitess/go/vt/vterrors"
)

var replicationLagGauges = stats.NewGaugesWithMultiLabels(
None of the vttablet metrics have the keyspace/shard dimensions because by definition a tablet belongs to only one keyspace/shard. This can be a simple gauge (NewGauge).
Also the name should include the units - replicationLagMs or replicationLagNs.
HeartbeatLag is being reported in nanoseconds so we should probably do the same here.
Sure, I renamed the gauge to replicationLagSec, since we always express the lag in seconds (e.g., we have SecondsBehindMaster and also cast the duration to seconds on line 60).
That makes more sense than ns :)
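A minimal sketch of what the thread converges on: a single, unlabeled gauge whose name carries its unit, set from a duration converted to seconds. The `Gauge` type, `NewGauge`, and `recordLag` below are hypothetical stand-ins written for illustration, not the Vitess `stats` API, and truncating to whole seconds is an assumption about how the cast is done.

```go
package main

import (
	"fmt"
	"sync/atomic"
	"time"
)

// Gauge is a hypothetical single-valued metric used only for this sketch.
// It is NOT the Vitess stats.Gauge type.
type Gauge struct {
	v    int64 // kept first so 64-bit atomic access stays aligned
	name string
	help string
}

// NewGauge creates a gauge with a name and a help string (illustrative helper).
func NewGauge(name, help string) *Gauge {
	return &Gauge{name: name, help: help}
}

// Set stores a new value atomically.
func (g *Gauge) Set(v int64) { atomic.StoreInt64(&g.v, v) }

// Get loads the current value atomically.
func (g *Gauge) Get() int64 { return atomic.LoadInt64(&g.v) }

// The unit is part of both the variable name and the metric name.
// No keyspace/shard labels: a tablet belongs to exactly one keyspace/shard.
var replicationLagSeconds = NewGauge("replicationLagSeconds", "replication lag in seconds")

// recordLag converts the lag to whole seconds (truncating, an assumption)
// and stores it in the gauge.
func recordLag(g *Gauge, lag time.Duration) {
	g.Set(int64(lag / time.Second))
}

func main() {
	recordLag(replicationLagSeconds, 2500*time.Millisecond)
	fmt.Println(replicationLagSeconds.name, replicationLagSeconds.Get())
}
```

Keeping the unit in the name means a dashboard reader does not have to look at the code to know whether a value of 2 means seconds or nanoseconds.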
Signed-off-by: crowu <y.wu4515@gmail.com>
"vitess.io/vitess/go/vt/vterrors"
)

var replicationLagGauges = stats.NewGauge("replicationLagSec", "replication lag in seconds")
Sorry to be nitpicky, but could you rename the variable? It can be the same: replicationLagSec or even rename both the variable and gauge to replicationLagSeconds.
Yep, good catch. I was going to do that initially as well :-)
Signed-off-by: crowu <y.wu4515@gmail.com>
Description
I think polling is the default recommendation, given how the VTGate gateway works. This PR reports lag stats from the poller so that we can track which replica is "unhealthy".