Flaws in some definitions in kubernetes-prometheusRule.yaml #1480

allenhsu · 2021-11-03T08:35:10Z

What happened?

Recently we started receiving KubeAPIErrorBudgetBurn alerts. Looking at the historical data, I noticed that metrics such as apiserver_request:burnrate1d are empty before the alert. After digging into the definitions, I found flaws in some of them in kubernetes-prometheusRule.yaml. The underlying values did not actually change much before and after the alert; the 5xx errors from the apiserver simply made these metrics available from that point on.

The expression sum by (cluster) (rate(apiserver_request_total{job="apiserver",verb=~"LIST|GET",code=~"5.."}[1d])) returns no series until a 5xx error occurs, so the metrics that depend on it are unavailable until then.

Did you expect to see something different?

I expect the metrics to produce correct values even before any 5xx error occurs. The incorrect definitions should be fixed.

How to reproduce it (as minimally and precisely as possible):

https://github.com/prometheus-operator/kube-prometheus/blob/main/manifests/kubernetes-prometheusRule.yaml#L785

sum by (cluster) (rate(apiserver_request_total{job="apiserver",verb=~"LIST|GET",code=~"5.."}[1d]))

should be replaced by

sum by (cluster) (rate(apiserver_request_total{job="apiserver",verb=~"LIST|GET",code=~"5.."}[1d]) or vector(0))

It also applies to other similar metrics.
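
To illustrate why the empty operand matters, here is a minimal sketch that models PromQL instant vectors as Python dicts. It is a toy model, not Prometheus code: the helper names (`vector`, `vec_or`, `sum_by`, `vec_add`) and the sample values are assumptions made for illustration, and it assumes the burn-rate rule adds the 5xx term to another per-cluster term with `+`.

```python
# Toy model of PromQL instant vectors: {frozenset(label items): value}.
# Illustration only; the names and values here are hypothetical.

def vector(value):
    """Model of PromQL vector(s): one sample with an empty label set."""
    return {frozenset(): float(value)}

def vec_or(left, right):
    """Model of `or`: keep left, add right elements whose label set is not in left."""
    out = dict(left)
    for labels, value in right.items():
        out.setdefault(labels, value)
    return out

def sum_by(vec, *names):
    """Model of `sum by (names)`: group on the given labels, drop all others."""
    out = {}
    for labels, value in vec.items():
        key = frozenset((k, v) for k, v in labels if k in names)
        out[key] = out.get(key, 0.0) + value
    return out

def vec_add(left, right):
    """Model of vector `+`: only label sets present on both sides yield output."""
    return {k: left[k] + right[k] for k in left if k in right}

# Assumed setup: no cluster label, so the other term has an empty label set.
other_term = {frozenset(): 0.5}

# No 5xx series exist yet, so rate(...{code=~"5.."}[1d]) is an empty vector.
no_errors = {}
broken = vec_add(other_term, sum_by(no_errors, "cluster"))
fixed = vec_add(other_term, sum_by(vec_or(no_errors, vector(0)), "cluster"))

# Once a 5xx series exists, the extra zero does not change the sum.
some_errors = {frozenset({("code", "500"), ("verb", "GET")}): 0.25}
with_errors = vec_add(other_term, sum_by(vec_or(some_errors, vector(0)), "cluster"))

print(broken)       # empty: the whole expression has no data
print(fixed)        # one sample, equal to the other term
print(with_errors)  # one sample, other term plus the error rate
```

In this model the zero from vector(0) carries no labels, so it lands in the empty-label group of sum by (cluster). That matches the single-cluster case described here; if the series carried a non-empty cluster label, the zero would not be filled in per cluster.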

Environment

It is not environment-specific.

Anything else we need to know?:

Nope

paulfantom · 2021-11-03T10:00:34Z

The alert definition is coming from https://github.com/kubernetes-monitoring/kubernetes-mixin/ project. Please file an issue there.

allenhsu added the kind/bug label Nov 3, 2021

paulfantom added the dependency/external label Nov 3, 2021

allenhsu mentioned this issue Nov 3, 2021

Flaws in the definition of apiserver metrics kubernetes-monitoring/kubernetes-mixin#690

Open
