@stevesg commented Jun 20, 2022

What this PR does

The `cortex_ruler_queries_failed_total` metric should increment whenever a rule evaluation query fails for a reason not caused by user behavior. Before this change, errors were ignored unless they were specifically of type `httpgrpc`. As a result, many errors were lost, for example connectivity errors coming from gRPC.

Note: I inverted the logic slightly so that it reads "count everything except...", which I found easier to read.

Which issue(s) this PR fixes or relates to

Fixes #

Checklist

  • Tests updated
  • n/a Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

The `cortex_ruler_queries_failed_total` metric should increment whenever a
rule evaluation query fails. Before this change, errors would be ignored
unless they are specifically of type `httpgrpc`. This means that many errors
get lost, for example, connectivity errors coming from gRPC.
@ortuman left a comment

This makes perfect sense to me, and explains why failed evaluations were not being properly reported. Nice catch!

@pstibrany left a comment

Makes sense to me.

@stevesg marked this pull request as ready for review June 20, 2022 13:49
@stevesg merged commit d58da4a into main Jun 21, 2022
@stevesg deleted the tweak-ruler-remote-query-error-checking branch June 21, 2022 09:31
masonmei pushed a commit to udmire/mimir that referenced this pull request Jul 11, 2022
…es. (grafana#2143)

