feat(base-cluster/monitoring): non-critical alerts aren't routed to on-call#1533
Period WorkingHours prevents this from waking someone up, but it will still be sent out during the day.
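This PR does not show how the `period: WorkingHours` label is consumed downstream. As one possible illustration only, assuming Alertmanager does the routing, a route could match the label and be active only during a named time interval. The receiver names, the interval name, and the hours below are all hypothetical:

```yaml
# Sketch of an Alertmanager configuration fragment (not taken from this PR).
route:
  receiver: default                 # hypothetical fallback receiver
  routes:
    - receiver: on-call             # hypothetical on-call receiver
      matchers:
        - period = "WorkingHours"
      # Notifications on this route are only sent while the interval is active.
      active_time_intervals:
        - working-hours

time_intervals:
  - name: working-hours             # hypothetical interval name
    time_intervals:
      - weekdays: ['monday:friday']
        times:
          - start_time: '09:00'     # assumed hours, adjust as needed
            end_time: '17:00'

receivers:
  - name: default
  - name: on-call
```

With a setup like this, an alert labeled `period: WorkingHours` that fires at night would not notify anyone until the interval becomes active again.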
Walkthrough
Alert rules across multiple PrometheusRule templates were updated to change the severity label from "warning" to "critical" where applicable, and to add a new label "period: WorkingHours" to all affected alerts. No modifications were made to alert expressions, durations, or other metadata.
Pull Request Overview
This PR updates several alert rules so that non-critical notifications are sent only during defined working hours, while treating them as critical alerts.
- Changed severity from `warning` to `critical` in multiple PrometheusRule templates
- Added a `period: WorkingHours` label to relevant alert definitions
- Applied updates across NFS provisioner, Kubernetes deprecation, Flux, cert-manager, and Velero charts
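To make the label change concrete, here is a minimal sketch of what one affected rule could look like after this PR. The resource name, alert name, metric, and threshold are placeholders invented for illustration; only the two labels reflect the change described above:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: example-working-hours-rule        # hypothetical name
spec:
  groups:
    - name: example
      rules:
        - alert: ExampleCertificateExpiringSoon   # hypothetical alert
          # Placeholder expression; the PR leaves real expressions untouched.
          expr: example_certificate_expiry_seconds - time() < 14 * 24 * 3600
          for: 1h
          labels:
            severity: critical      # previously: warning
            period: WorkingHours    # label added by this PR
```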
Reviewed Changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| charts/base-cluster/templates/nfs-server-provisioner/rules/storage-size.yaml | Bumped severity and added working-hours period |
| charts/base-cluster/templates/monitoring/kdave/rules/releases-with-deprecation.yaml | Updated severity and period for deprecation alerts |
| charts/base-cluster/templates/flux/rules/flux-status.yaml | Adjusted severity and added working-hours period |
| charts/base-cluster/templates/cert-manager/rules/certificate-expiration.yaml | Changed severity and added period to expiration rule |
| charts/base-cluster/templates/backup/velero.yaml | Updated Velero backup alert labels |
Comments suppressed due to low confidence (2)
charts/base-cluster/templates/nfs-server-provisioner/rules/storage-size.yaml:29
- [nitpick] These `severity` and `period` label blocks are duplicated across multiple rule files. Consider extracting them into a Helm partial (e.g., `_working_hours_labels`) to reduce repetition and simplify future updates.
period: WorkingHours
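The suggested partial could be sketched roughly as follows. The file name, the template name `base-cluster.workingHoursLabels`, and the indentation depth are assumptions for illustration and are not part of this PR:

```yaml
{{/* templates/_working_hours_labels.tpl (hypothetical file) */}}
{{- define "base-cluster.workingHoursLabels" -}}
severity: critical
period: WorkingHours
{{- end -}}
```

A rule file would then pull the labels in with `include`, with `nindent` matched to the nesting of the `labels` key in that file:

```yaml
          labels:
            {{- include "base-cluster.workingHoursLabels" . | nindent 12 }}
```

This keeps the two labels in one place, so a later change to the severity or period value would only need to touch the partial.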
🤖 I have created a release *beep* *boop*

---

## [8.2.0](base-cluster-v8.1.0...base-cluster-v8.2.0) (2025-07-21)

### Features

* **base-cluster/flux:** add alert about suspended resources ([#1540](#1540)) ([bb1555e](bb1555e))
* **base-cluster/monitoring:** lower cpu request for prometheus ([#1578](#1578)) ([4a83bf6](4a83bf6))
* **base-cluster/monitoring:** non-critical alerts aren't routed to on-call ([#1533](#1533)) ([0d080f4](0d080f4))
* **base-cluster/rbac:** adjust rbac stuff for OIDC accounts ([#1538](#1538)) ([3f9aa69](3f9aa69))

### Bug Fixes

* **base-cluster/docs:** there is no 9.0.0 release for now... ([#1563](#1563)) ([8e417fb](8e417fb))

### Miscellaneous Chores

* **base-cluster/dependencies:** update common docker tag to v1.5.0 ([#1520](#1520)) ([cb4a522](cb4a522))
* **base-cluster/dependencies:** update docker.io/bitnami/kubectl docker tag to v1.33.3-debian-12-r1 ([#1521](#1521)) ([4ed2a77](4ed2a77))
* **base-cluster/dependencies:** update docker.io/curlimages/curl docker tag to v8.13.0 ([#1523](#1523)) ([e451428](e451428))
* **base-cluster/dependencies:** update docker.io/curlimages/curl docker tag to v8.14.1 ([#1544](#1544)) ([02ba163](02ba163))
* **base-cluster/dependencies:** update docker.io/curlimages/curl docker tag to v8.15.0 ([#1585](#1585)) ([a672a67](a672a67))
* **base-cluster/dependencies:** update docker.io/fluxcd/flux-cli docker tag to v2.6.1 ([#1524](#1524)) ([956ad7e](956ad7e))
* **base-cluster/dependencies:** update docker.io/fluxcd/flux-cli docker tag to v2.6.4 ([#1536](#1536)) ([32a69cc](32a69cc))
* **base-cluster/dependencies:** update docker.io/vladgh/gpg docker tag to v1.3.6 ([#1509](#1509)) ([e521e61](e521e61))
* **base-cluster/dependencies:** update external-dns docker tag to v8.8.4 ([#1510](#1510)) ([b8b3f80](b8b3f80))
* **base-cluster/dependencies:** update external-dns docker tag to v8.9.1 ([#1559](#1559)) ([f9c5642](f9c5642))
* **base-cluster/dependencies:** update external-dns docker tag to v8.9.2 ([#1575](#1575)) ([88b2630](88b2630))
* **base-cluster/dependencies:** update grafana-tempo docker tag to v4 ([#1570](#1570)) ([63c2593](63c2593))
* **base-cluster/dependencies:** update grafana-tempo docker tag to v4.0.13 ([#1580](#1580)) ([5b9df00](5b9df00))
* **base-cluster/dependencies:** update helm release alloy to v1 ([#1573](#1573)) ([013a670](013a670))
* **base-cluster/dependencies:** update helm release alloy to v1.2.0 ([#1596](#1596)) ([aec70ba](aec70ba))
* **base-cluster/dependencies:** update helm release descheduler to v0.33.0 ([#1525](#1525)) ([36d8eca](36d8eca))
* **base-cluster/dependencies:** update helm release ingress-nginx to v4.12.3 ([#1511](#1511)) ([3dd2aa7](3dd2aa7))
* **base-cluster/dependencies:** update helm release kube-prometheus-stack to v75.12.0 ([#1598](#1598)) ([eec214f](eec214f))
* **base-cluster/dependencies:** update helm release kyverno to v3.4.4 ([#1512](#1512)) ([bc20fb9](bc20fb9))
* **base-cluster/dependencies:** update helm release kyverno-policies to v3.4.4 ([#1513](#1513)) ([f79dce1](f79dce1))
* **base-cluster/dependencies:** update helm release loki to v6.30.1 ([#1526](#1526)) ([4bf6daa](4bf6daa))
* **base-cluster/dependencies:** update helm release loki to v6.31.0 ([#1566](#1566)) ([b188c09](b188c09))
* **base-cluster/dependencies:** update helm release loki to v6.32.0 ([#1586](#1586)) ([57c7d86](57c7d86))
* **base-cluster/dependencies:** update helm release reflector to v9 ([#1590](#1590)) ([e679195](e679195))
* **base-cluster/dependencies:** update helm release tetragon to v1.4.1 ([#1581](#1581)) ([ff4f27d](ff4f27d))
* **base-cluster/dependencies:** update helm release traefik to v35.4.0 ([#1527](#1527)) ([9d30e5e](9d30e5e))
* **base-cluster/dependencies:** update helm release traefik to v36 ([#1592](#1592)) ([8c9cafa](8c9cafa))
* **base-cluster/dependencies:** update helm release trivy-operator to v0.29.2 ([#1528](#1528)) ([a815190](a815190))
* **base-cluster/dependencies:** update helm release trivy-operator to v0.29.3 ([#1582](#1582)) ([b078902](b078902))
* **base-cluster/dependencies:** update helm release velero to v7.2.2 ([#1172](#1172)) ([a33e9cb](a33e9cb))
* **base-cluster/dependencies:** update metrics-server docker tag to v7.4.10 ([#1576](#1576)) ([c29757a](c29757a))
* **base-cluster/dependencies:** update metrics-server docker tag to v7.4.6 ([#1514](#1514)) ([c897fbd](c897fbd))
* **base-cluster/dependencies:** update metrics-server docker tag to v7.4.9 ([#1541](#1541)) ([134e3c1](134e3c1))
* **base-cluster/dependencies:** update oauth2-proxy docker tag to v6.2.13 ([#1515](#1515)) ([0c04e1c](0c04e1c))

---

This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).

## Summary by CodeRabbit

* **New Features**
  * Added an alert for suspended resources in the flux component.
  * Reduced CPU requests for Prometheus in the monitoring component.
  * Updated routing to exclude non-critical alerts from on-call notifications.
  * Improved RBAC configurations for better OIDC account support.
* **Bug Fixes**
  * Clarified that there is no 9.0.0 release currently.
* **Chores**
  * Updated multiple dependencies and docker image tags.
  * Bumped chart version to 8.2.0.

Co-authored-by: github-actions <41898282+github-actions[bot]@users.noreply.github.com>