feat(base-cluster/flux): add alert about suspended resources #1540
Walkthrough
The changes update Prometheus alert rules and kube-state-metrics configurations for Flux resources in a Kubernetes cluster. Alert expressions now use new gauge metrics and include additional conditions for suspended resources. Metric definitions are refactored, switching from "Info" to "Gauge" types, introducing new labels, and adding a metric to track resource suspension.
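To make the alerting side of the walkthrough concrete, here is a minimal sketch of what a suspended-resources rule could look like as a plain Prometheus rule file. Only the alert name `ResourcesSuspended` and the metric name `gotk_reconcile_suspended_gauge` come from this PR; the group name, the expression, the `for` duration, the severity, and the annotation text are assumptions for illustration, not the chart's actual rule.

```yaml
# Hypothetical sketch, not the rule shipped in flux-status.yaml.
groups:
  - name: flux-status-sketch            # assumed group name
    rules:
      - alert: ResourcesSuspended       # alert name from this PR
        # Assumes the gauge is 1 while spec.suspend is true and 0 otherwise.
        expr: gotk_reconcile_suspended_gauge == 1
        for: 1h                         # assumed grace period
        labels:
          severity: warning             # assumed severity
        annotations:
          summary: A Flux resource has been suspended for longer than expected.
```

A `for` window of this kind keeps short, intentional suspensions (for example during maintenance) from alerting immediately; the value used in the chart may differ.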
Pull Request Overview
Adds a new gauge for suspended Flux resources and an accompanying alert, while simplifying the resource-type mapping for kube-state-metrics.
- Simplifies the `$types` dict to only map resource kinds to their group names.
- Renames the existing `reconcile_condition_info` metric to `reconcile_condition_gauge` and adds a new `reconcile_suspended_gauge`.
- Introduces a `ResourcesSuspended` alert and updates the missing-metrics alert to include the suspended gauge.
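As a rough illustration of the metric side, the following sketch shows how a suspension gauge could be declared in rendered kube-state-metrics custom resource state configuration. The chart generates its configuration from Helm templates, so this is not the PR's actual output: the choice of `Kustomization` as the example resource, the `spec.suspend` path, the help text, and the label names are assumptions. Only the metric name and the `gotk` prefix (from `gotk_reconcile_suspended_gauge`) appear in this PR.

```yaml
# Hypothetical rendered config, assuming one Flux kind as an example.
kind: CustomResourceStateMetrics
spec:
  resources:
    - groupVersionKind:
        group: kustomize.toolkit.fluxcd.io
        version: v1                     # assumed API version
        kind: Kustomization
      metricNamePrefix: gotk
      metrics:
        - name: reconcile_suspended_gauge
          help: Whether reconciliation of the resource is suspended (1) or not (0).
          each:
            type: Gauge
            gauge:
              path: [spec, suspend]     # assumed source field
              nilIsZero: true           # an unset spec.suspend is reported as 0
          labelsFromPath:               # assumed label set
            exported_namespace: [metadata, namespace]
            name: [metadata, name]
```

Reporting unset `spec.suspend` as 0 means every resource emits a series, which is what allows a missing-metrics check to distinguish "not suspended" from "metric absent".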
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| charts/base-cluster/templates/monitoring/kube-prometheus-stack/_kube-state-metrics-config.yaml | Update types mapping, rename existing metric, add suspended-resource gauge |
| charts/base-cluster/templates/flux/rules/flux-status.yaml | Add ResourcesSuspended alert rule and extend missing-metrics check |
Comments suppressed due to low confidence (4)
charts/base-cluster/templates/monitoring/kube-prometheus-stack/_kube-state-metrics-config.yaml:58
- The `groupVersionKind` dict only includes `group`, but it should also include both `version` and `kind` to properly identify each resource and avoid metric-generation errors.
"groupVersionKind" (dict
charts/base-cluster/templates/monitoring/kube-prometheus-stack/_kube-state-metrics-config.yaml:56
- It looks like the `$types` dict declaration above has its closing `}}` removed, which will cause a Helm template syntax error. Please re-add the closing braces to complete the dict.
{{- range $kind, $group := $types -}}
charts/base-cluster/templates/flux/rules/flux-status.yaml:29
- Consider adding or updating chart/unit tests to cover the new `ResourcesSuspended` alert and the `gotk_reconcile_suspended_gauge` metric to ensure they fire and resolve as expected.
- alert: ResourcesSuspended
charts/base-cluster/templates/monitoring/kube-prometheus-stack/_kube-state-metrics-config.yaml:83
- [nitpick] It would help maintainers to add a brief comment explaining the purpose and expected labels/value semantics of the new `reconcile_suspended_gauge` metric.
"name" "reconcile_suspended_gauge"
🤖 I have diffed this beep boop"/$namespace/$kind/$name.yaml" for normal resources
🤖 I have created a release *beep* *boop*

---

## [8.2.0](base-cluster-v8.1.0...base-cluster-v8.2.0) (2025-07-21)

### Features

* **base-cluster/flux:** add alert about suspended resources ([#1540](#1540)) ([bb1555e](bb1555e))
* **base-cluster/monitoring:** lower cpu request for prometheus ([#1578](#1578)) ([4a83bf6](4a83bf6))
* **base-cluster/monitoring:** non-critical alerts aren't routed to on-call ([#1533](#1533)) ([0d080f4](0d080f4))
* **base-cluster/rbac:** adjust rbac stuff for OIDC accounts ([#1538](#1538)) ([3f9aa69](3f9aa69))

### Bug Fixes

* **base-cluster/docs:** there is no 9.0.0 release for now... ([#1563](#1563)) ([8e417fb](8e417fb))

### Miscellaneous Chores

* **base-cluster/dependencies:** update common docker tag to v1.5.0 ([#1520](#1520)) ([cb4a522](cb4a522))
* **base-cluster/dependencies:** update docker.io/bitnami/kubectl docker tag to v1.33.3-debian-12-r1 ([#1521](#1521)) ([4ed2a77](4ed2a77))
* **base-cluster/dependencies:** update docker.io/curlimages/curl docker tag to v8.13.0 ([#1523](#1523)) ([e451428](e451428))
* **base-cluster/dependencies:** update docker.io/curlimages/curl docker tag to v8.14.1 ([#1544](#1544)) ([02ba163](02ba163))
* **base-cluster/dependencies:** update docker.io/curlimages/curl docker tag to v8.15.0 ([#1585](#1585)) ([a672a67](a672a67))
* **base-cluster/dependencies:** update docker.io/fluxcd/flux-cli docker tag to v2.6.1 ([#1524](#1524)) ([956ad7e](956ad7e))
* **base-cluster/dependencies:** update docker.io/fluxcd/flux-cli docker tag to v2.6.4 ([#1536](#1536)) ([32a69cc](32a69cc))
* **base-cluster/dependencies:** update docker.io/vladgh/gpg docker tag to v1.3.6 ([#1509](#1509)) ([e521e61](e521e61))
* **base-cluster/dependencies:** update external-dns docker tag to v8.8.4 ([#1510](#1510)) ([b8b3f80](b8b3f80))
* **base-cluster/dependencies:** update external-dns docker tag to v8.9.1 ([#1559](#1559)) ([f9c5642](f9c5642))
* **base-cluster/dependencies:** update external-dns docker tag to v8.9.2 ([#1575](#1575)) ([88b2630](88b2630))
* **base-cluster/dependencies:** update grafana-tempo docker tag to v4 ([#1570](#1570)) ([63c2593](63c2593))
* **base-cluster/dependencies:** update grafana-tempo docker tag to v4.0.13 ([#1580](#1580)) ([5b9df00](5b9df00))
* **base-cluster/dependencies:** update helm release alloy to v1 ([#1573](#1573)) ([013a670](013a670))
* **base-cluster/dependencies:** update helm release alloy to v1.2.0 ([#1596](#1596)) ([aec70ba](aec70ba))
* **base-cluster/dependencies:** update helm release descheduler to v0.33.0 ([#1525](#1525)) ([36d8eca](36d8eca))
* **base-cluster/dependencies:** update helm release ingress-nginx to v4.12.3 ([#1511](#1511)) ([3dd2aa7](3dd2aa7))
* **base-cluster/dependencies:** update helm release kube-prometheus-stack to v75.12.0 ([#1598](#1598)) ([eec214f](eec214f))
* **base-cluster/dependencies:** update helm release kyverno to v3.4.4 ([#1512](#1512)) ([bc20fb9](bc20fb9))
* **base-cluster/dependencies:** update helm release kyverno-policies to v3.4.4 ([#1513](#1513)) ([f79dce1](f79dce1))
* **base-cluster/dependencies:** update helm release loki to v6.30.1 ([#1526](#1526)) ([4bf6daa](4bf6daa))
* **base-cluster/dependencies:** update helm release loki to v6.31.0 ([#1566](#1566)) ([b188c09](b188c09))
* **base-cluster/dependencies:** update helm release loki to v6.32.0 ([#1586](#1586)) ([57c7d86](57c7d86))
* **base-cluster/dependencies:** update helm release reflector to v9 ([#1590](#1590)) ([e679195](e679195))
* **base-cluster/dependencies:** update helm release tetragon to v1.4.1 ([#1581](#1581)) ([ff4f27d](ff4f27d))
* **base-cluster/dependencies:** update helm release traefik to v35.4.0 ([#1527](#1527)) ([9d30e5e](9d30e5e))
* **base-cluster/dependencies:** update helm release traefik to v36 ([#1592](#1592)) ([8c9cafa](8c9cafa))
* **base-cluster/dependencies:** update helm release trivy-operator to v0.29.2 ([#1528](#1528)) ([a815190](a815190))
* **base-cluster/dependencies:** update helm release trivy-operator to v0.29.3 ([#1582](#1582)) ([b078902](b078902))
* **base-cluster/dependencies:** update helm release velero to v7.2.2 ([#1172](#1172)) ([a33e9cb](a33e9cb))
* **base-cluster/dependencies:** update metrics-server docker tag to v7.4.10 ([#1576](#1576)) ([c29757a](c29757a))
* **base-cluster/dependencies:** update metrics-server docker tag to v7.4.6 ([#1514](#1514)) ([c897fbd](c897fbd))
* **base-cluster/dependencies:** update metrics-server docker tag to v7.4.9 ([#1541](#1541)) ([134e3c1](134e3c1))
* **base-cluster/dependencies:** update oauth2-proxy docker tag to v6.2.13 ([#1515](#1515)) ([0c04e1c](0c04e1c))

---

This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).

## Summary by CodeRabbit

* **New Features**
  * Added an alert for suspended resources in the flux component.
  * Reduced CPU requests for Prometheus in the monitoring component.
  * Updated routing to exclude non-critical alerts from on-call notifications.
  * Improved RBAC configurations for better OIDC account support.
* **Bug Fixes**
  * Clarified that there is no 9.0.0 release currently.
* **Chores**
  * Updated multiple dependencies and docker image tags.
  * Bumped chart version to 8.2.0.

---------

Co-authored-by: github-actions <41898282+github-actions[bot]@users.noreply.github.com>