Relax threshold for bq_gardener_historical_throughput alerts #873

Merged
merged 3 commits into from
Feb 7, 2022

@cristinaleonr cristinaleonr commented Feb 3, 2022

The bq_gardener_historical_throughput query is run every 3h.

The current threshold for the alerts that use this query is 3h and 10mins.

The GardenerHistoricalThroughputIsStalled alert fired because it took BQ Exporter 3h and 13mins to report the new throughput.

This PR is adjusting the threshold to 4h.

Example:
Alert condition became true at 18:15 and resolved itself at 21:28 (3h and 13mins later).
https://prometheus.mlab-oti.measurementlab.net/graph?g0.expr=increase(gardener_jobs_total%7Bdaily%3D%22false%22%2Cstatus%3D%22success%22%7D%5B1d%5D)%20%3E%200%20unless%20on(datatype)%20bq_gardener_historical_throughput%20%3E%200&g0.tab=0&g0.stacked=0&g0.show_exemplars=0&g0.range_input=3h28m21s591ms&g0.end_input=2022-02-03%2021%3A40%3A04&g0.moment_input=2022-02-03%2021%3A40%3A04
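
For illustration only, a minimal sketch of what the relaxed rule could look like. It assumes the alert uses the expression from the graph link in the example and that the threshold is expressed as the rule's `for:` duration. The group name and the `severity` label are hypothetical and are not taken from `alerts.yml`.

```yaml
groups:
  - name: gardener-sketch  # hypothetical group name
    rules:
      - alert: GardenerHistoricalThroughputIsStalled
        # Expression decoded from the Prometheus graph link in the example.
        # True for datatypes with successful historical jobs in the last day
        # but no positive bq_gardener_historical_throughput sample.
        expr: |
          increase(gardener_jobs_total{daily="false",status="success"}[1d]) > 0
            unless on(datatype) bq_gardener_historical_throughput > 0
        # Assumption: the threshold is the `for:` duration. With 4h, a
        # 3h13m gap between exporter reports no longer fires the alert.
        for: 4h
        labels:
          severity: ticket  # hypothetical label value
```

With a 3h query interval, a 4h window leaves roughly one hour of slack for the exporter, compared with the previous 10 minutes.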


@stephen-soltesz stephen-soltesz left a comment

:lgtm: Please update the alert comment.

Reviewable status: :shipit: complete! 1 of 1 approvals obtained (waiting on @cristinaleonr)


config/federation/prometheus/alerts.yml, line 843 at r1 (raw file):

# datatypes under processing by the v2 pipeline falls below 1 date / day.
# The bq_gardener_historical_throughput metric is under the bigquery exporter 3h
# deployment, so the timeout for this alert is 3 hours and 10 minutes.

The comment is now out of date.
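
One possible rewording of the stale comment lines, as a sketch. The wording is hypothetical and is not the text that was merged; it only reflects the 4h threshold and the 3h13m delay described in the PR.

```yaml
# The bq_gardener_historical_throughput metric is under the bigquery exporter 3h
# deployment. The exporter can take slightly longer than 3 hours to report a new
# value (3h13m has been observed), so the timeout for this alert is 4 hours.
```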

@cristinaleonr cristinaleonr merged commit 9f1103e into master Feb 7, 2022
@cristinaleonr cristinaleonr deleted the sandbox-cristinaleon-relax-threshold branch February 7, 2022 16:11