Flaws in some definitions in kubernetes-prometheusRule.yaml #1480

Open
allenhsu opened this issue Nov 3, 2021 · 1 comment

allenhsu commented Nov 3, 2021

What happened?

Recently we started receiving KubeAPIErrorBudgetBurn alerts. Looking at the historical data, I noticed that the recorded metrics, e.g. apiserver_request:burnrate1d, are empty before the alert. After digging into the rule definitions, I found flaws in several of them in kubernetes-prometheusRule.yaml. The underlying metrics did not actually change much before and after the alert; the first 5xx errors from the apiserver are what made these recorded metrics available from then on.

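The term sum by (cluster) (rate(apiserver_request_total{job="apiserver",verb=~"LIST|GET",code=~"5.."}[1d])) returns no series until a 5xx error has been recorded. Because the rest of the rule is combined with this term arithmetically, the whole expression is empty and the metric is unavailable until then.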

Did you expect to see something different?

I expect the metrics to produce correct numbers even before any 5xx error happens. The wrong definitions should be fixed.

How to reproduce it (as minimally and precisely as possible):

https://github.com/prometheus-operator/kube-prometheus/blob/main/manifests/kubernetes-prometheusRule.yaml#L785

sum by (cluster) (rate(apiserver_request_total{job="apiserver",verb=~"LIST|GET",code=~"5.."}[1d]))

should be replaced by

sum by (cluster) (rate(apiserver_request_total{job="apiserver",verb=~"LIST|GET",code=~"5.."}[1d]) or vector(0))

The same applies to other similar rules in the file.
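
The behaviour described here, and the effect of the proposed `or vector(0)` fallback, can be illustrated with a toy Python model of PromQL vector matching. This is only a sketch under stated assumptions: the helper names (`add`, `or_`, `vector`, `sum_by`) and the sample values are hypothetical, the model is not how Prometheus is implemented, and it assumes series that carry no cluster label.

```python
from typing import Dict, FrozenSet, Tuple

# Toy "instant vector": maps a label set to a sample value.
Labels = FrozenSet[Tuple[str, str]]
Vector = Dict[Labels, float]


def add(lhs: Vector, rhs: Vector) -> Vector:
    """Model of `lhs + rhs` with default one-to-one matching:
    only label sets present on both sides produce a result."""
    return {ls: v + rhs[ls] for ls, v in lhs.items() if ls in rhs}


def or_(lhs: Vector, rhs: Vector) -> Vector:
    """Model of `lhs or rhs`: all of lhs, plus rhs elements whose
    label set is not already present in lhs."""
    out = dict(lhs)
    for ls, v in rhs.items():
        out.setdefault(ls, v)
    return out


def vector(value: float) -> Vector:
    """Model of `vector(s)`: a single sample with no labels."""
    return {frozenset(): value}


def sum_by(vec: Vector, *names: str) -> Vector:
    """Model of `sum by (names) (vec)`: group on the named labels only."""
    out: Vector = {}
    for ls, v in vec.items():
        key = frozenset((k, val) for k, val in ls if k in names)
        out[key] = out.get(key, 0.0) + v
    return out


# Hypothetical inputs: some other term of the rule has data,
# but no 5xx request has been observed yet.
other_term: Vector = {frozenset(): 0.02}
errors_5xx: Vector = {}

# Without the fallback, the empty operand empties the whole sum.
without_fallback = add(other_term, sum_by(errors_5xx, "cluster"))

# With the fallback, the missing series is treated as 0.
with_fallback = add(other_term, sum_by(or_(errors_5xx, vector(0.0)), "cluster"))
```

In this model `without_fallback` is empty, while `with_fallback` keeps the value of the other term. Note that the fallback sample carries no labels, so it only lines up with the other operand when that operand is also unlabelled after aggregation; whether that holds in a given setup depends on whether the series carry a cluster label.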

Environment

This is not environment-specific.

Anything else we need to know?:

Nope

paulfantom (Member) commented

The alert definition comes from the https://github.com/kubernetes-monitoring/kubernetes-mixin/ project. Please file an issue there.
