Otherwise the Prometheus alerts don't fire, as they never get old enough. Combined with pods that become ready and only crash a little later, not even KubeDeploymentReplicasMismatch triggers. This effectively hides the error.
Walkthrough
The descheduler plugin configuration in the Helm values file has been updated to remove the …
Pull Request Overview
This PR adjusts the default descheduler behavior to avoid evicting pods with high restart counts so that Prometheus alerts can age and fire as intended.
- Remove the RemovePodsHavingTooManyRestarts strategy from the descheduler configuration
- Retain other descheduler strategies unchanged
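As a rough illustration of the change described above, the following sketch shows a descheduler policy in the upstream `descheduler/v1alpha2` format with the `RemovePodsHavingTooManyRestarts` plugin taken out. The surrounding `deschedulerPolicy` key, the profile name, and the other plugins listed are assumptions for illustration only; the actual layout of the base-cluster values file is not shown in this PR and may differ.

```yaml
# Hypothetical values fragment; key names outside the policy itself are assumed.
deschedulerPolicy:
  profiles:
    - name: default            # assumed profile name
      plugins:
        deschedule:
          enabled:
            # Removed by this PR so that crash-looping pods are no longer
            # evicted before the Prometheus alerts can fire:
            # - RemovePodsHavingTooManyRestarts
            - RemovePodsViolatingNodeTaints   # illustrative; other plugins stay unchanged
        balance:
          enabled:
            - RemoveDuplicates                # illustrative; other plugins stay unchanged
```

With the plugin disabled, a pod that keeps restarting stays in place and keeps its restart history, so the alerts that depend on it have time to fire.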
🤖 I have created a release *beep* *boop*

---

## [10.0.1](base-cluster-v10.0.0...base-cluster-v10.0.1) (2025-10-27)

### Bug Fixes

* **base-cluster/descheduler:** don't remove pods with too many restarts ([#1744](#1744)) ([9c1ed51](9c1ed51))
* **base-cluster/ingress:** add missing `prometheus` block 🙄 ([#1767](#1767)) ([a329e1a](a329e1a))
* **base-cluster/loki:** adjust retention settings for loki logs ([#1745](#1745)) ([1985d34](1985d34))
* **base-cluster/monitoring:** use the correct prometheus datasource id ([#1764](#1764)) ([511cc84](511cc84))

---

This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).

## Summary by CodeRabbit

* **Bug Fixes**
  * Fixed descheduler to prevent removal of pods with excessive restart counts
  * Added missing Prometheus monitoring configuration to ingress
  * Adjusted log retention settings in Loki
  * Corrected Prometheus datasource ID in monitoring

---------

Co-authored-by: github-actions <41898282+github-actions[bot]@users.noreply.github.com>