This repository was archived by the owner on Apr 28, 2025. It is now read-only.
Merged
15 changes: 15 additions & 0 deletions cortex-mixin/alerts/alerts.libsonnet
@@ -91,6 +91,21 @@
|||,
},
},
{
alert: 'CortexInconsistentConfig',
expr: |||
count(count by(%s, job, sha256) (cortex_config_hash)) without(sha256) > 1
Collaborator

Isn't this always > 1 when there are multiple jobs?

Contributor Author

I don't think so (maybe I am making a mistake here).

I think the inner count collapses the series to one per distinct hash within each cluster/job group, and the outer count then counts those hashes per group, so series from different clusters or jobs are never mixed. This unit test passes (maybe you can break it 🙂):

jsonnet -S alerts.jsonnet > alerts.yaml

cat > unit_test.yaml <<EOF
rule_files:
  - alerts.yaml
evaluation_interval: 1m
tests:
 - interval: 1m
   input_series:
    - series: 'cortex_config_hash{sha256="aa",job="customer-a/cortex", cluster="customer-a", namespace="customer-a", instance="host-a1"}'
      values: '1+0x100'
    - series: 'cortex_config_hash{sha256="bb",job="customer-a/cortex", cluster="customer-a", namespace="customer-a", instance="host-a2"}'
      values: '1+0x100'
    - series: 'cortex_config_hash{sha256="cc",job="customer-b/cortex", cluster="customer-b", namespace="customer-b", instance="host-b1"}'
      values: '1+0x100'
    - series: 'cortex_config_hash{sha256="cc",job="customer-b/cortex", cluster="customer-b", namespace="customer-b", instance="host-b2"}'
      values: '1+0x100'
   alert_rule_test:
    - alertname: CortexInconsistentConfig
      eval_time: 59m
    - alertname: CortexInconsistentConfig
      eval_time: 60m
      exp_alerts:
       - exp_labels:
           severity: warning
           job: customer-a/cortex
           cluster: customer-a
           namespace: customer-a
         exp_annotations:
           message: "An inconsistent config file hash is used across cluster customer-a/cortex.\n"
EOF

promtool test rules unit_test.yaml
Unit Testing:  unit_test.yaml
  SUCCESS
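
For readers who want to check the reasoning without a Prometheus setup, the two-level aggregation can be sketched in plain Python. This is only a rough model of the PromQL semantics, not the mixin's implementation: it assumes `%s` expands to `cluster, namespace` (as the expected labels in the unit test suggest), and the helper names `distinct_hashes_per_group` and `firing_groups` are made up for illustration. It ignores the `for: 1h` clause.

```python
from collections import Counter

# Label sets of the four input series from the unit test above.
series = [
    {"sha256": "aa", "job": "customer-a/cortex", "cluster": "customer-a",
     "namespace": "customer-a", "instance": "host-a1"},
    {"sha256": "bb", "job": "customer-a/cortex", "cluster": "customer-a",
     "namespace": "customer-a", "instance": "host-a2"},
    {"sha256": "cc", "job": "customer-b/cortex", "cluster": "customer-b",
     "namespace": "customer-b", "instance": "host-b1"},
    {"sha256": "cc", "job": "customer-b/cortex", "cluster": "customer-b",
     "namespace": "customer-b", "instance": "host-b2"},
]

# Assumption: %s expands to "cluster, namespace"; job is added by the rule itself.
GROUP_LABELS = ("cluster", "namespace", "job")


def distinct_hashes_per_group(series, group_labels=GROUP_LABELS):
    """Rough model of count(count by(<group>, sha256)(metric)) without(sha256)."""
    # Inner count: collapse instances so each (group, sha256) pair appears once.
    inner = {tuple(s[label] for label in group_labels) + (s["sha256"],)
             for s in series}
    # Outer count: drop sha256 and count the remaining entries per group.
    return Counter(key[:-1] for key in inner)


def firing_groups(series):
    """Groups for which the rule expression would be > 1."""
    counts = distinct_hashes_per_group(series)
    return sorted(group for group, n in counts.items() if n > 1)


print(firing_groups(series))
# prints [('customer-a', 'customer-a', 'customer-a/cortex')]
```

Because `job` stays in the outer grouping, having several jobs only produces several groups; a group exceeds 1 only when it contains more than one distinct hash.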

Contributor

I ran the following query against our local environment to verify the logic of this query:

count(count by(cluster, job, sha256) (cortex_runtime_config_hash)) without (sha256)

The result looked sane to me.

||| % $._config.alert_aggregation_labels,
'for': '1h',
labels: {
severity: 'warning',
},
annotations: {
message: |||
An inconsistent config file hash is used across cluster {{ $labels.job }}.
|||,
},
},
{
// As of https://github.com/cortexproject/cortex/pull/2092, this metric is
// only exposed when it is supposed to be non-zero, so we don't need to do
1 change: 1 addition & 0 deletions cortex-mixin/dashboards.libsonnet
@@ -1,5 +1,6 @@
{
grafanaDashboards+:
(import 'dashboards/config.libsonnet') +
(import 'dashboards/queries.libsonnet') +
(import 'dashboards/reads.libsonnet') +
(import 'dashboards/ruler.libsonnet') +
26 changes: 26 additions & 0 deletions cortex-mixin/dashboards/config.libsonnet
@@ -0,0 +1,26 @@
local utils = import 'mixin-utils/utils.libsonnet';

(import 'dashboard-utils.libsonnet') {

'cortex-config.json':
$.dashboard('Cortex / Config')
.addClusterSelectorTemplates()
.addRow(
$.row('Startup config file')
.addPanel(
$.panel('Startup config file hashes') +
$.queryPanel('count(cortex_config_hash{%s}) by (sha256)' % $.namespaceMatcher(), 'sha256:{{sha256}}') +
$.stack +
{ yaxes: $.yaxes('instances') },
)
)
.addRow(
$.row('Runtime config file')
.addPanel(
$.panel('Runtime config file hashes') +
$.queryPanel('count(cortex_runtime_config_hash{%s}) by (sha256)' % $.namespaceMatcher(), 'sha256:{{sha256}}') +
$.stack +
{ yaxes: $.yaxes('instances') },
)
),
}
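
As a rough illustration of what the two dashboard panels plot, the `count(...) by (sha256)` queries amount to counting instances per hash within the selected scope. The Python sketch below models that under stated assumptions: the sample data is invented, `instances_per_hash` is a hypothetical helper, and the matcher is simplified to a single exact namespace, whereas the real `$.namespaceMatcher()` output may differ.

```python
from collections import Counter

# Invented sample: one entry per instance exposing a config hash metric.
samples = [
    {"namespace": "customer-a", "instance": "host-a1", "sha256": "aa"},
    {"namespace": "customer-a", "instance": "host-a2", "sha256": "bb"},
    {"namespace": "customer-a", "instance": "host-a3", "sha256": "aa"},
    {"namespace": "customer-b", "instance": "host-b1", "sha256": "cc"},
]


def instances_per_hash(samples, namespace):
    """Rough model of count(metric{namespace="<namespace>"}) by (sha256)."""
    # Keep only the selected namespace, then count instances per hash value.
    return Counter(s["sha256"] for s in samples if s["namespace"] == namespace)


print(dict(instances_per_hash(samples, "customer-a")))
# prints {'aa': 2, 'bb': 1}
```

A healthy namespace would show a single hash carrying all instances; more than one stacked series in a panel corresponds to the condition the `CortexInconsistentConfig` alert checks for the startup config.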