
dashboard Namespace (Workloads) is broken when you have HA prometheus #680

Open

Andor opened this issue Oct 19, 2021 · 5 comments

Comments

@Andor

Andor commented Oct 19, 2021

When you run HA Prometheus, you usually have multiple Prometheus instances whose series differ only in the value of the prometheus_replica label.

For example, this query from the Namespace (Workloads) dashboard returns an error:

sum(
  node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{cluster="$cluster", namespace="$namespace"}
* on(namespace,pod)
  group_left(workload, workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster="$cluster", namespace="$namespace", workload_type="$type"}
) by (workload, workload_type)

error:

duplicate time series on the right side of `* on (namespace, pod) group_left (workload, workload_type)`
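
For reference, one way to make this join tolerate duplicate right-hand series is to aggregate the replica label away before joining. This is only a sketch of a workaround, not what the shipped dashboard does; max by (...) keeps exactly one series per (cluster, namespace, pod, workload, workload_type) group, so group_left sees a unique right side:

sum(
  node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{cluster="$cluster", namespace="$namespace"}
* on(namespace,pod)
  group_left(workload, workload_type)
  max by (cluster, namespace, pod, workload, workload_type) (
    namespace_workload_pod:kube_pod_owner:relabel{cluster="$cluster", namespace="$namespace", workload_type="$type"}
  )
) by (workload, workload_type)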
@paulfantom
Member

That sounds like an issue with rules being evaluated not in Prometheus but in something like Thanos Ruler or Cortex.

@Andor
Author

Andor commented Oct 21, 2021

No, the rules are evaluated on the Prometheus side. On top of that I use remote_write and store the data in VictoriaMetrics.
I managed to mitigate the issue by removing the prometheus_replica label in the remote_write relabeling.
You can probably close this issue now.
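
For anyone hitting the same thing, the equivalent in plain Prometheus configuration would be roughly the following (a sketch; the URL is a placeholder, and labeldrop is the relabel action that removes a label rather than the series):

remote_write:
  - url: http://victoriametrics.example:8428/api/v1/write
    write_relabel_configs:
      - regex: prometheus_replica
        action: labeldrop

Note the trade-off: once the label is gone, VictoriaMetrics can no longer tell the replicas apart, so both replicas write overlapping samples for the same series.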

@paulfantom
Member

I am just wondering how your setup is configured to produce these results. Usually there are 2 Prometheus replicas using identical scrape configuration and the same recording/alerting rules; the only difference between the replicas is the external_labels section. In such a scenario replica A does not have access to data from replica B, so there is no way for the data duplication described here to occur.

The only way I can see this happening is if recording rules are evaluated over data from both replica A and replica B.
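
For context, the per-replica difference described above looks roughly like this in each replica's configuration (a sketch with example values; with prometheus-operator the replica label is typically injected automatically rather than written by hand):

global:
  external_labels:
    cluster: my-cluster
    prometheus_replica: prometheus-k8s-0   # prometheus-k8s-1 on the other replica

Because each replica evaluates rules only against its own locally scraped data, the recorded series never mix; the duplication only appears once both replicas' output lands in a shared backend.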

@pschulten

Thanks @Andor
I ran into the same behavior. Query:

sum(
    node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate{cluster="tortilla", namespace="loki"}
  * on(namespace,pod)
    group_left(workload, workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster="tortilla", namespace="loki", workload="compactor", workload_type="deployment"}
) by (pod)

error:

Query error
422: error when executing query="sum(\n node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate{cluster=\"tortilla\", namespace=\"loki\"}\n * on(namespace,pod)\n group_left(workload, workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"tortilla\", namespace=\"loki\", workload=\"compactor\", workload_type=\"deployment\"}\n) by (pod)\n" on the time range (start=1640087850000, end=1640088750000, step=30000): cannot execute query: cannot evaluate "node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate{cluster=\"tortilla\", namespace=\"loki\"} * on (namespace, pod) group_left (workload, workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"tortilla\", namespace=\"loki\", workload=\"compactor\", workload_type=\"deployment\"}": duplicate time series on the right side of `* on (namespace, pod) group_left (workload, workload_type)`: {cluster="tortilla", namespace="loki", pod="compactor-8564fbf8b4-mdr7p", prometheus="monitoring/k8s", prometheus_replica="prometheus-k8s-1", stage="prod", workload="compactor", workload_type="deployment"} and {cluster="tortilla", namespace="loki", pod="compactor-8564fbf8b4-mdr7p", prometheus="monitoring/k8s", prometheus_replica="prometheus-k8s-0", stage="prod", workload="compactor", workload_type="deployment"}

"fixed" by:

prometheus+: {
  prometheus+: {
    spec+: {
      //...
      externalLabels: {
        cluster: 'xyz',
        //...
      },
      remoteWrite: [
        // Also write metrics to VictoriaMetrics
        {
          writeRelabelConfigs: [
            {
              sourceLabels: ['prometheus_replica'],
              action: 'drop',
            },
          ],
          url: 'https://example.com:8427/api/v1/write',
          //...
        },
      ],
    },
  },
}

@Pluies

Pluies commented Mar 9, 2022

FWIW, we ran into the same issue after setting up HA Prometheus (backed by Thanos using the Thanos sidecar), and it affected a lot of dashboards.

The best fix for us was to turn on deduplication in Thanos Query, so that Thanos replies with a single time series rather than one per scraping Prometheus. All dashboards are back to normal!
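
For anyone else going this route: Thanos Query deduplication is driven by the replica-label flag, roughly as below (a sketch; addresses are placeholders, and the flag value must match the replica label set in your external_labels):

thanos query \
  --http-address=0.0.0.0:10902 \
  --store=prometheus-sidecar.example:10901 \
  --query.replica-label=prometheus_replica

With this flag set, Thanos treats series that differ only in prometheus_replica as the same logical series and merges them, so the dashboards' group_left joins see a unique right-hand side without any relabeling.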
