Just to say, naming is hard and I'm not married to the idea of it being ...collector_api_requests..., I just don't have a better way of communicating external requests.
In order to better track freshness of data, this PR adds a few more operational metrics:
- `cloudcost_exporter_collector_last_scrape_time`
- `cloudcost_exporter_last_scrape_time`
The intent of these is to export, as a Unix timestamp, the last time a scrape was performed. This can be used to alert in Prometheus when the last scrape was more than, say, 60 minutes ago.
This also implements in AWS the operational metrics that GCP implemented so that we have feature parity between the two. In the future it would make sense to generalize this to a common interface so that new providers do not need to implement the same metrics.
- refs #5 + #105
Unlike our cost metrics, operational metrics should be uniform across all providers and collectors. This makes dashboards and alerts a lot easier to set up, e.g. you only have one set of alerts instead of N*M sets of alerts (where N is the number of providers and M is the number of collectors). This work has already begun with #104, which can be refactored to belong to the providers package and be used by both AWS and GCP. While the current set is a nice foundation, we can extend this even further to:
- `cloudcost_exporter_collector_api_requests_total`
- `cloudcost_exporter_collector_api_requests_errors_total`
- `cloudcost_exporter_collector_api_requests_duration_seconds`
- `cloudcost_exporter_collector_last_scrape_time`
We'd likely need the following labels:
- `provider` => CSP
- `collector` => Module that's making the request
- `service` => The backend system being called (compute, storage, billing, costexplorer, etc.)
- `method` => The method (ListInstancesInZone, GetServiceName, GetCostUsage)

Once we do this, we can update our existing operational dashboard (https://admin-ops-us-east-0.grafana-ops.net/grafana/d/1a9c0de366458599246184cf0ae8b468/cloudcost-exporter-overview?orgId=1) to use the generic metrics instead of the provider-specific ones.