Implement operational metrics #105

Open
Pokom opened this issue Feb 8, 2024 · 2 comments

Contributor

Pokom commented Feb 8, 2024

Unlike our cost metrics, operational metrics should be uniform across all providers and collectors. This makes dashboards and alerts a lot easier to set up, e.g. you only need one set of alerts instead of N*M alerts (where N is the number of providers and M is the number of collectors). This work has already begun with #104, which can be refactored to belong to the providers package and be used by both AWS and GCP. While the current set is a nice foundation, we can extend it even further to:

  • cloudcost_exporter_collector_api_requests_total
  • cloudcost_exporter_collector_api_requests_errors_total
  • cloudcost_exporter_collector_api_requests_duration_seconds
  • cloudcost_exporter_collector_last_scrape_time

We'd likely need the following labels:

Once we do this, we can update our existing operational dashboard (https://admin-ops-us-east-0.grafana-ops.net/grafana/d/1a9c0de366458599246184cf0ae8b468/cloudcost-exporter-overview?orgId=1) to use the generic metrics instead of the provider-specific ones.

@Pokom changed the title from "Define operational metrics" to "Implement operational metrics" Feb 8, 2024
Contributor Author

Pokom commented Feb 8, 2024

Just to say, naming is hard and I'm not married to the idea of it being ...collector_api_requests..., I just don't have a better way of communicating external requests.

Pokom added a commit that referenced this issue May 2, 2024
In order to better track freshness of data, this PR adds a few more operational metrics:
- `cloudcost_exporter_collector_last_scrape_time`
- `cloudcost_exporter_last_scrape_time`

The intent of these is to export, in Unix time, the last time a scrape was performed. This can be used to alert in Prometheus when the last scrape happened more than, say, 60 minutes ago.

This also implements in AWS the operational metrics that GCP implemented so that we have feature parity between the two. In the future it would make sense to generalize this to a common interface so that new providers do not need to implement the same metrics.

- refs #5 + #105
Contributor Author

Pokom commented Jul 9, 2024

This is likely related work to #222 and could follow similar implementations.

@Pokom added the good first issue and area/capacity labels, then removed the good first issue label Jul 9, 2024