
Report lag stats in poller #7490

Merged
deepthi merged 3 commits into vitessio:master from
5antelope:ywu/poll_stats
Feb 17, 2021


Member

@5antelope commented on Feb 14, 2021

Signed-off-by: crowu <y.wu4515@gmail.com>

Description

I think polling is the default recommendation, given how the VTGate gateway works. This PR reports replication lag stats from the poller so that we can track which replica is "unhealthy".

Related Issue(s)

Checklist

  • Should this PR be backported?
  • Tests were added or are not required
  • Documentation was added or is not required

Deployment Notes

Impacted Areas in Vitess

Components that this PR will affect:

  • Query Serving
  • VReplication
  • Cluster Management
  • Build/CI
  • VTAdmin

Signed-off-by: crowu <y.wu4515@gmail.com>
"vitess.io/vitess/go/vt/vterrors"
)

var replicationLagGauges = stats.NewGaugesWithMultiLabels(
Collaborator


None of the vttablet metrics have keyspace/shard dimensions because, by definition, a tablet belongs to only one keyspace/shard. This can be a simple gauge (NewGauge).
Also, the name should include the units: replicationLagMs or replicationLagNs.
HeartbeatLag is reported in nanoseconds, so we should probably do the same here.

Member Author


Sure, I renamed the gauge to replicationLagSec, since we always assume the lag is in seconds (e.g., we have SecondsBehindMaster, and we also cast the duration to seconds on line 60).

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes more sense than ns :)

@deepthi requested a review from dweitzman on February 16, 2021 04:19
Signed-off-by: crowu <y.wu4515@gmail.com>
"vitess.io/vitess/go/vt/vterrors"
)

var replicationLagGauges = stats.NewGauge("replicationLagSec", "replication lag in seconds")
Collaborator

@deepthi commented on Feb 16, 2021


Sorry to be nitpicky, but could you rename the variable? It can match the gauge name (replicationLagSec), or you could rename both the variable and the gauge to replicationLagSeconds.

Member Author


Yep, good catch. I was going to do that initially as well :-)

Signed-off-by: crowu <y.wu4515@gmail.com>
@deepthi merged commit 84b5ab3 into vitessio:master on Feb 17, 2021
@askdba added this to the v10.0 milestone on Feb 22, 2021

3 participants