Commit

Fix rules (#3908)
d80tb7 authored Sep 5, 2024
1 parent 35cb59f commit 0d5415e
Showing 1 changed file with 2 additions and 2 deletions.
4 changes: 2 additions & 2 deletions deployment/scheduler/templates/scheduler-prometheusrule.yaml
@@ -22,7 +22,7 @@ spec:
expr: sum by (cluster, category, subCategory) (armada_scheduler_error_classification_by_node)
# Per-queue failures.
- record: queue_category_subCategory:armada_scheduler_failed_jobs
- expr: sum by (queue, category, subCategory) (job_error_classification_by_queue)
+ expr: sum by (queue, category, subCategory) (armada_scheduler_job_error_classification_by_queue)
# Per-node successes.
- record: node:armada_scheduler_succeeded_jobs
expr: sum by (node) (armada_scheduler_job_state_counter_by_node{state="succeeded"})
@@ -31,7 +31,7 @@ spec:
expr: sum by (cluster, category, subCategory) (armada_scheduler_job_state_counter_by_node{state="succeeded"})
# Per-queue successes.
- record: queue_category_subCategory:armada_scheduler_succeeded_jobs
- expr: sum by (queue) (job_state_counter_by_queue{state="succeeded"})
+ expr: sum by (queue) (armada_scheduler_job_state_counter_by_queue{state="succeeded"})
# Per-node failures increase.
# increase(sum... is safe here, since all metrics that make up the sum reset at the same time.
- record: node:armada_scheduler_failed_jobs:increase1m
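
Note on the `increase(sum...` comment above: taking the increase of a sum is normally discouraged because a reset in one series can be misread once it is mixed into the total. The sketch below illustrates why simultaneous resets make it safe. It is not Prometheus's implementation: `increase` here is a simplified reset-aware delta with no extrapolation, and the helper names and sample values are invented for illustration.

```python
from typing import List, Sequence


def increase(samples: Sequence[float]) -> float:
    """Simplified reset-aware increase over a window (assumption: no extrapolation)."""
    total = 0.0
    for prev, cur in zip(samples, samples[1:]):
        # A drop is treated as a counter reset: the counter restarted from 0,
        # so the whole current value counts as new increase.
        total += cur - prev if cur >= prev else cur
    return total


def sum_series(*series: Sequence[float]) -> List[float]:
    """Point-wise sum of several counters sampled at the same timestamps."""
    return [sum(values) for values in zip(*series)]


# Hypothetical counters that reset together (e.g. one process restart).
a_sync = [10, 12, 0, 3]
b_sync = [5, 6, 0, 1]

# Hypothetical counters where only the first one resets.
a_stag = [10, 12, 0, 3]
b_stag = [5, 6, 7, 8]

# Resets aligned: increase-of-sum agrees with sum-of-increases.
sync_of_sum = increase(sum_series(a_sync, b_sync))
sync_sum_of = increase(a_sync) + increase(b_sync)

# Resets staggered: the summed series drops from 18 to 7, which is read as a
# reset of the whole sum, so the surviving counter's value is counted again.
stag_of_sum = increase(sum_series(a_stag, b_stag))
stag_sum_of = increase(a_stag) + increase(b_stag)

print(sync_of_sum, sync_sum_of)
print(stag_of_sum, stag_sum_of)
```

When every series resets at the same sample, the sum also drops to its new starting value at that sample, so the reset handling applied to the sum matches what it would do per series. With staggered resets the two results diverge.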