What happened?
Recently we started receiving KubeAPIErrorBudgetBurn alerts. Looking at the historical data, I noticed that the recorded metrics, e.g. apiserver_request:burnrate1d, are empty before the alert. After digging into the definitions, I found flaws in some of the rules in kubernetes-prometheusRule.yaml. The underlying request metrics didn't change much before and after the alert; the 5xx errors from the apiserver are what made this metric available from then on.
The expression sum by (cluster) (rate(apiserver_request_total{job="apiserver",verb=~"LIST|GET",code=~"5.."}[1d])) returns no series until a 5xx error occurs, so the metric that depends on it is not available until then.
Did you expect to see something different?
I expect the metrics to produce correct numbers even before any 5xx error happens. The wrong definitions should be fixed.
How to reproduce it (as minimally and precisely as possible):
https://github.com/prometheus-operator/kube-prometheus/blob/main/manifests/kubernetes-prometheusRule.yaml#L785
should be replaced by
The same applies to other similar metrics.
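The replacement expression is not included in the text above. As an illustration only, and not necessarily the fix proposed here, one common PromQL pattern for this kind of gap is to fall back to zero for every cluster that has request traffic but no 5xx series yet. The sketch below reuses the selector quoted in this issue and assumes that the total request rate with the same job and verb matchers exists for each cluster:

```promql
(
  # 5xx error rate per cluster; returns no series while no 5xx has been recorded
  sum by (cluster) (rate(apiserver_request_total{job="apiserver",verb=~"LIST|GET",code=~"5.."}[1d]))
  or
  # fallback (assumption): a zero-valued series for each cluster that has any matching requests
  0 * sum by (cluster) (rate(apiserver_request_total{job="apiserver",verb=~"LIST|GET"}[1d]))
)
```

Both operands carry only the cluster label after aggregation, so `or` keeps the real error rate where it exists and fills in 0 elsewhere. A plain `or vector(0)` would only match when the aggregated series has no labels at all, which is why the zero is derived from the total request rate instead.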
Environment
This is independent of the environment.
Anything else we need to know?:
Nope