This repository has been archived by the owner on Jun 29, 2022. It is now read-only.

Disable etcd alert about false positive cluster state #850

Open
surajssd opened this issue Aug 25, 2020 · 1 comment
Labels
area/monitoring Monitoring bug Something isn't working

Comments

@surajssd
Member

The following alert is triggered when multiple gRPC Watch requests fail:

[firing]  — etcdHighNumberOfFailedGRPCRequests (1)
critical
etcd cluster "kube-etcd": 100% of requests for Watch failed on etcd instance 10.99.170.163:2381.
Source
Labels:
· alertname: etcdHighNumberOfFailedGRPCRequests
· grpc_method: Watch
· grpc_service: etcdserverpb.Watch
· instance: 10.1.1.1:2381
· job: kube-etcd
· prometheus: monitoring/prometheus-operator-prometheus
· severity: critical
@surajssd surajssd added area/monitoring Monitoring bug Something isn't working labels Aug 25, 2020
@invidian
Member

I investigated this issue privately, and it seems that removing the alert should be okay-ish for the time being.

Alternatively, we could modify the alert formula to ignore the offending metrics while leaving the others in place.

We should then restore the alert once the issue is fixed in etcd.
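
To illustrate the "ignore offending metrics" option, here is a minimal Python sketch of the idea, not the actual alert rule. It assumes the alert compares the percentage of non-`OK` gRPC responses per method against a threshold. The sample data, the 5% threshold, and the function names `failed_percent` and `firing` are all hypothetical and only mimic the labels shown in the alert above.

```python
from collections import defaultdict

# Hypothetical samples: (labels, per-second request rate) pairs standing in
# for per-series request rates. The numbers are made up for illustration.
SAMPLES = [
    ({"grpc_method": "Watch", "grpc_code": "Unavailable"}, 2.0),
    ({"grpc_method": "Range", "grpc_code": "OK"}, 99.0),
    ({"grpc_method": "Range", "grpc_code": "Unavailable"}, 1.0),
]


def failed_percent(samples, ignored_methods=frozenset()):
    """Return {grpc_method: percent of non-OK requests}, skipping ignored methods."""
    total = defaultdict(float)
    failed = defaultdict(float)
    for labels, rate in samples:
        method = labels["grpc_method"]
        if method in ignored_methods:
            continue  # drop the offending series, keep everything else
        total[method] += rate
        if labels["grpc_code"] != "OK":
            failed[method] += rate
    return {m: 100.0 * failed[m] / total[m] for m in total if total[m] > 0}


def firing(samples, threshold=5.0, ignored_methods=frozenset()):
    """Return the sorted methods whose failure percentage exceeds the threshold.

    The 5.0 default is an assumed value, not taken from this issue.
    """
    percents = failed_percent(samples, ignored_methods)
    return sorted(m for m, pct in percents.items() if pct > threshold)


if __name__ == "__main__":
    print(firing(SAMPLES))                             # Watch trips the alert
    print(firing(SAMPLES, ignored_methods={"Watch"}))  # Watch ignored, others still checked
```

In a Prometheus rule, the equivalent change would likely be a label matcher such as `grpc_method!="Watch"` added to the selectors of the alert expression. The exact upstream expression is not shown in this issue, so treat that as an assumption.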
