Define and apply a priority class to critical components #92

p1-bot-repo-sync · 2025-02-03T20:05:40Z

Summary

While updating a Bigbang deployment, we noticed that the monitoring release repeatedly failed to reconcile, ending in a timeout error each time.

The underlying cause was that the prometheus-node-exporter pods could not be scheduled on all nodes because of insufficient resources. The failed Helm release showed this:

      DaemonSet is not ready: monitoring/monitoring-monitoring-prometheus-node-exporter. 0 out of 10 expected pods have been scheduled
      DaemonSet is not ready: monitoring/monitoring-monitoring-prometheus-node-exporter. 1 out of 10 expected pods have been scheduled
      DaemonSet is not ready: monitoring/monitoring-monitoring-prometheus-node-exporter. 2 out of 10 expected pods have been scheduled
      warning: Upgrade "monitoring-monitoring" failed: timed out waiting for the condition
    reason: UpgradeFailed

The pod events showed the same scheduling failure:

Warning  FailedScheduling   39s (x6 over 2m18s)  default-scheduler   0/10 nodes are available: 1 Insufficient cpu, 9 node(s) didn't match Pod's node affinity/selector

Temporary resolution

The cluster we were deploying into happened to have a pre-defined PriorityClass. We manually added it to the DaemonSet's pod spec, after which the pods were scheduled and the release reconciled as expected.
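For illustration, the manual change amounts to setting `priorityClassName` in the DaemonSet's pod template. The fragment below is a minimal sketch, not the exact edit we made: the DaemonSet name and namespace are taken from the release error above, while `example-critical` is a placeholder for whatever PriorityClass already exists in the cluster.

```yaml
# Partial manifest showing only the field that was added.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: monitoring-monitoring-prometheus-node-exporter
  namespace: monitoring
spec:
  template:
    spec:
      # Placeholder name (assumption): substitute a PriorityClass
      # that is already defined in the target cluster.
      priorityClassName: example-critical
```

With a higher priority than the workloads already on the node, the scheduler can preempt lower-priority pods to make room instead of leaving the DaemonSet pod pending.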

Notional feature request

Bigbang deploys several DaemonSets that could run into the same scenario: Promtail, Twistlock defenders, and Velero/restic all come to mind. Bigbang could define a PriorityClass (perhaps as part of /base) and apply it to DaemonSets and other critical components as appropriate, so that reconciliation of BB-managed Helm releases can complete without hanging on resource scheduling constraints.
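As a rough sketch of what such a definition could look like, the manifest below declares a cluster-scoped PriorityClass. The name `bigbang-critical`, the value, and the description are all hypothetical choices for illustration; nothing here reflects an existing Bigbang resource.

```yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  # Hypothetical name (assumption), not an existing Bigbang resource.
  name: bigbang-critical
# Illustrative value; higher numbers mean higher scheduling priority.
value: 1000000
# Do not apply this priority to pods that omit priorityClassName.
globalDefault: false
# Allow pods in this class to preempt lower-priority pods.
preemptionPolicy: PreemptLowerPriority
description: "Critical Bigbang components such as node-level DaemonSets."
```

Components would then opt in by setting `priorityClassName: bigbang-critical` in their pod specs. The value would need to be chosen relative to any priority classes that cluster operators already use.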

p1-bot-repo-sync bot added community-contribution kind::feature priority::7 team::Observability status::grooming epic::grooming and removed epic::grooming labels Feb 3, 2025
