go/{stats,vt}: publish VReplicationStreamState to prometheus backend #12772
Signed-off-by: Max Englander <max@planetscale.com>
// StringMapFuncWithMultiLabels is a multidimensional string map publisher.
//
// Map keys are compound names made with joining multiple strings with '.',
// and are named by corresponding key labels.
//
// Map values are any string, and are named by the value label.
//
// Since the map is returned by the function, we assume it's in the right
// format (meaning each key is of the form 'aaa.bbb.ccc' with as many elements
// as there are in Labels).
//
// Backends which need to provide a numeric value can set a constant value of 1
// (or whatever is appropriate for the backend) for each key-value pair present
// in the map.
I'm curious why we can't use a variation of this:
stats.NewGaugesFuncWithMultiLabels(
	"VReplicationLagSeconds",
	"vreplication seconds behind primary per stream",
	[]string{"source_keyspace", "source_shard", "workflow", "counts"},
	func() map[string]int64 {
		st.mu.Lock()
		defer st.mu.Unlock()
		result := make(map[string]int64, len(st.controllers))
		for _, ct := range st.controllers {
			result[ct.source.Keyspace+"."+ct.source.Shard+"."+ct.workflow+"."+fmt.Sprintf("%v", ct.id)] = ct.blpStats.ReplicationLagSeconds.Load()
		}
		return result
	})
I don't see how it's any different. Using the existing code would eliminate a lot of the code in this PR.
Hey @mattlord the difference is that with the current code, it does not change the format of the data exported to expvars. If we follow your recommendation it will change the shape of the expvars JSON.
I wasn't sure if we wanted to make that change, and, if we do that, whether we should treat it as a breaking change.
If you aren't concerned with changing the shape of the JSON, I'm all for your suggestion! Let me know.
Changing JSON should always be non-breaking. But I don't understand what you mean there. Can you share an example?
Here are the local changes I made, based on how I understood your suggestion:
diff --git a/go/vt/vttablet/tabletmanager/vreplication/stats.go b/go/vt/vttablet/tabletmanager/vreplication/stats.go
index 58c958ce35..b1a5f3250e 100644
--- a/go/vt/vttablet/tabletmanager/vreplication/stats.go
+++ b/go/vt/vttablet/tabletmanager/vreplication/stats.go
@@ -62,19 +62,18 @@ type vrStats struct {
 func (st *vrStats) register() {
 	stats.NewGaugeFunc("VReplicationStreamCount", "Number of vreplication streams", st.numControllers)
 	stats.NewGaugeFunc("VReplicationLagSecondsMax", "Max vreplication seconds behind primary", st.maxReplicationLagSeconds)
-	stats.NewStringMapFuncWithMultiLabels(
+	stats.NewGaugesFuncWithMultiLabels(
 		"VReplicationStreamState",
 		"State of vreplication workflow",
-		[]string{"workflow", "counts"},
-		"state",
-		func() map[string]string {
+		[]string{"workflow", "counts", "state"},
+		func() map[string]int64 {
 			st.mu.Lock()
 			defer st.mu.Unlock()
-			result := make(map[string]string, len(st.controllers))
+			result := make(map[string]int64, len(st.controllers))
 			for _, ct := range st.controllers {
 				state := ct.blpStats.State.Load()
 				if state != nil {
-					result[ct.workflow+"."+fmt.Sprintf("%v", ct.id)] = state.(string)
+					result[ct.workflow+"."+fmt.Sprintf("%v", ct.id)+"."+state.(string)] = 1
 				}
 			}
 			return result
Here is what the expvar stats look like with this change:
"VReplicationStreamState": {
"commerce2customer.1.Running": 1
},
Which is different from the current JSON shape in main and this PR in its current form:
"VReplicationStreamState": {
"commerce2customer.1": "Running"
}
Ah, I see. A workflow can have N streams and the controller ID is for a given stream. That part makes sense either way. What do we want to do with this metric? Do we want to check the count of ones where the status is e.g. Stopped? If so, I feel like the first output might be easier for that in e.g. promql or in grafana. Maybe not though?
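To make the "count of ones where the status is e.g. Stopped" idea concrete, here is a minimal Go sketch of that aggregation over the 1-valued map shape. The helper `countByState` and the stream IDs 2 and 3 are hypothetical and exist only for illustration; nothing like this is part of the PR. A Prometheus or Grafana query that sums the gauge grouped by the `state` label would compute roughly the same thing.

```go
package main

import (
	"fmt"
	"strings"
)

// countByState is a hypothetical helper: it sums the 1-valued entries per
// state, taking the state to be the last '.'-separated element of each key
// (keys are assumed to look like "workflow.id.state").
func countByState(m map[string]int64) map[string]int64 {
	counts := make(map[string]int64)
	for key, v := range m {
		i := strings.LastIndex(key, ".")
		if i < 0 {
			continue // key does not follow the assumed format
		}
		counts[key[i+1:]] += v
	}
	return counts
}

func main() {
	// Made-up sample data; only the first entry appears in the discussion.
	m := map[string]int64{
		"commerce2customer.1.Running": 1,
		"commerce2customer.2.Stopped": 1,
		"commerce2customer.3.Stopped": 1,
	}
	counts := countByState(m)
	fmt.Println("Stopped:", counts["Stopped"], "Running:", counts["Running"])
}
```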
Hey @mattlord the Prometheus output is different from the expvar output format.
The change you recommended above outputs this expvar format:
"VReplicationStreamState": {
"commerce2customer.1.Running": 1
},
Both this PR and main output this expvar format:
"VReplicationStreamState": {
"commerce2customer.1": "Running"
}
Both this PR and the change you recommended will output the same Prometheus output:
# HELP vttablet_v_replication_stream_state State of vreplication workflow
# TYPE vttablet_v_replication_stream_state gauge
vttablet_v_replication_stream_state{counts="1",state="Running",workflow="commerce2customer"} 1
So with both this PR and the change you recommended, we'll be able to accomplish what we want with Prometheus just as well. The difference between your recommendation and the PR in its current form is that your recommendation will change the shape of expvars output, whereas this PR maintains its current form.
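As a rough illustration of why the two expvar shapes can collapse into the same Prometheus line, here is a minimal, self-contained Go sketch. The names `sample`, `splitKey`, `fromStringMap`, `fromGaugeMap`, and `render` are hypothetical and are not the Vitess Prometheus backend; the sketch only assumes, as the doc comment in this PR does, that keys are '.'-joined label values and that a string value can be exported as an extra label with a constant value of 1.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// sample is a hypothetical stand-in for one labeled gauge sample.
type sample struct {
	Labels map[string]string
	Value  float64
}

// splitKey maps the '.'-separated parts of key onto the given label names.
// It reports false when the number of parts does not match.
func splitKey(names []string, key string) (map[string]string, bool) {
	parts := strings.Split(key, ".")
	if len(parts) != len(names) {
		return nil, false
	}
	labels := make(map[string]string, len(names)+1)
	for i, name := range names {
		labels[name] = parts[i]
	}
	return labels, true
}

// fromStringMap handles the "workflow.id": "state" shape: the string value
// becomes one more label and the sample value is the constant 1.
func fromStringMap(keyLabels []string, valueLabel string, m map[string]string) []sample {
	var out []sample
	for key, val := range m {
		if labels, ok := splitKey(keyLabels, key); ok {
			labels[valueLabel] = val
			out = append(out, sample{Labels: labels, Value: 1})
		}
	}
	return out
}

// fromGaugeMap handles the "workflow.id.state": 1 shape.
func fromGaugeMap(labelNames []string, m map[string]int64) []sample {
	var out []sample
	for key, v := range m {
		if labels, ok := splitKey(labelNames, key); ok {
			out = append(out, sample{Labels: labels, Value: float64(v)})
		}
	}
	return out
}

// render formats a sample as a text exposition line with sorted labels.
func render(name string, s sample) string {
	keys := make([]string, 0, len(s.Labels))
	for k := range s.Labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	pairs := make([]string, 0, len(keys))
	for _, k := range keys {
		pairs = append(pairs, fmt.Sprintf("%s=%q", k, s.Labels[k]))
	}
	return fmt.Sprintf("%s{%s} %g", name, strings.Join(pairs, ","), s.Value)
}

func main() {
	const name = "vttablet_v_replication_stream_state"
	a := fromStringMap([]string{"workflow", "counts"}, "state",
		map[string]string{"commerce2customer.1": "Running"})
	b := fromGaugeMap([]string{"workflow", "counts", "state"},
		map[string]int64{"commerce2customer.1.Running": 1})
	fmt.Println(render(name, a[0]))
	fmt.Println(render(name, b[0]))
}
```

Both calls in `main` render the same line, which is the point being made above: the choice between the two publishers only matters for the expvar JSON, not for the Prometheus samples.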
I feel like we might be getting hung up on the specifics of the function when I was merely asking why we couldn't use stats.NewGaugesFuncWithMultiLabels(). It's not clear to me why we can't get the same output you desire and produced in the PR as-is using that existing function. I apologize if I'm being dense and missing something obvious. 🙂
@mattlord we totally can use stats.NewGaugesFuncWithMultiLabels(), and it will do exactly what I want in terms of giving me nice Prometheus output. However, it will change the shape of the existing expvars JSON, which we may be fine with or we may not want, your call. The shape of the expvars JSON will change because that's just the nature of NewGaugesFuncWithMultiLabels.
Thanks! After chatting on Slack we cleared up my confusion.
Description
In order to make it easier to monitor VReplication workflow status, expose the existing VReplicationStreamState as a 1-valued Prometheus gauge. S/o @mcrauwel for the idea.
Related Issue(s)
Checklist