Add etcd metrics, Prometheus scrapes, and Grafana dash #175
6d7bd60 to 79493cd
#176 was to address an alert that fired specifically on AWS because of slow disks.
5adc367 to 7eb1f0a
We've seen that a couple of times before, and I believe it's because of etcd-io/etcd#9166, for which it doesn't seem like the "fix" landed in 3.3. Even then, I believe it will require a change to the alerting rules to ignore cancelled connections.
For now I think I'll drop the two noisy alerts. Even without them, this change adds etcd alerts which weren't active before and populates the one page in Grafana that was empty before. It's a good feedback loop too: using the new etcd alerts motivated switching to faster disks on AWS for the v1.10 release (other platforms were fast enough).
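For anyone wondering what "ignore cancelled connections" could look like in practice, here is a minimal sketch of a failure-ratio calculation that leaves a cancelled status out of the failure count. It is not the alerting rule from this PR. The counter values are invented, `failure_ratio` is a hypothetical helper, and the status spelling `"Canceled"` is an assumption about how the gRPC server metrics label that code. A real rule would work on rates of counters over a window rather than raw totals.

```python
def failure_ratio(handled_by_code, ignored_codes=("OK", "Canceled")):
    """Return the fraction of handled requests that count as failures.

    handled_by_code maps a gRPC status code name to a request count.
    Codes listed in ignored_codes are not treated as failures.
    """
    total = sum(handled_by_code.values())
    if total == 0:
        return 0.0
    failed = sum(
        count
        for code, count in handled_by_code.items()
        if code not in ignored_codes
    )
    return failed / total


# Made-up counts, for illustration only.
sample = {"OK": 980, "Canceled": 15, "Unavailable": 5}

# Cancelled requests ignored: only "Unavailable" counts as a failure.
print(failure_ratio(sample))
# Cancelled requests counted as failures, which is the noisy behaviour.
print(failure_ratio(sample, ignored_codes=("OK",)))
```

With the cancelled requests excluded, the ratio in this made-up sample drops to a quarter of its previous value, which is the kind of difference that would decide whether a threshold alert fires.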
* Use etcd v3.3 --listen-metrics-urls to expose only metrics data via http://0.0.0.0:2381 on controllers
* Add Prometheus discovery for etcd peers on controller nodes
* Temporarily drop two noisy Prometheus alerts
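A rough sketch of how one might spot-check the metrics endpoint described above, assuming it serves the Prometheus text format at a `/metrics` path on port 2381. The helper names `fetch_metrics` and `parse_metrics`, the localhost URL, and the sample payload are all hypothetical. The parser is deliberately simplistic and is not a full implementation of the exposition format.

```python
import urllib.request


def fetch_metrics(url="http://127.0.0.1:2381/metrics", timeout=5):
    """Fetch raw metrics text. The URL is an assumption for illustration."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def parse_metrics(text):
    """Parse simple `name value` lines into a dict, skipping comments."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Treat the last space-separated field as the value. This sketch
        # ignores optional timestamps and label values containing spaces.
        name, _, value = line.rpartition(" ")
        samples[name] = float(value)
    return samples


# Invented payload in the shape of a metrics response.
SAMPLE = """\
# HELP etcd_server_has_leader illustrative help text
# TYPE etcd_server_has_leader gauge
etcd_server_has_leader 1
etcd_disk_wal_fsync_duration_seconds_count 42
"""

if __name__ == "__main__":
    metrics = parse_metrics(SAMPLE)
    print(metrics["etcd_server_has_leader"])
```

On a controller, `parse_metrics(fetch_metrics())` would be the equivalent call against a live endpoint, provided the address and path match your setup.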
7eb1f0a to d770393
* Expose etcd metrics to workers so Prometheus can run on a worker, rather than a controller
* Drop temporary firewall rules allowing Prometheus to run on a controller and scrape targets
* Related to #175
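Since the change above depends on firewall rules letting a worker reach the etcd metrics port, a quick TCP reachability check can help confirm the rules took effect. This is a minimal sketch, not part of the PR. `can_connect` is a hypothetical helper, and the host and port in the usage line are placeholders to replace with a real controller address.

```python
import socket


def can_connect(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Placeholder target; run this from a worker against a controller.
    print(can_connect("127.0.0.1", 2381))
```

A successful connection only shows the port is open from that node. It does not confirm that the endpoint serves metrics.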
* Use etcd v3.3 --listen-metrics-urls to expose only metrics data via http://0.0.0.0:2381 on controllers
* Hold off on allowing workers firewall access (can't think of any concrete concern with workloads seeing this). Move Prometheus to a controller node for a while (maybe drop).
* Adjust firewall rules now that Prometheus can run on a controller, rather than a worker
Made possible by:
Closes #114