fix(base-cluster/descheduler): don't remove pods with too many restarts#1744

Merged: cwrau merged 1 commit into main from fix/base-cluster/descheduler-dont-remove-pods-with-too-many-restarts on Oct 24, 2025

@cwrau cwrau commented Oct 17, 2025

Otherwise the Prometheus alerts don't fire, as they never get old enough.
Combined with pods that take a little while to crash and become ready
before they do, not even KubeDeploymentReplicasMismatch triggers.
This effectively hides the error.
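
One possible reading of the point above, sketched as a Prometheus alerting rule. The rule name, group name, expression, and `for` duration are illustrative assumptions, not the rules deployed by this chart. An alert with a `for` clause stays pending until its expression has held continuously for that long for the same label set. If the descheduler evicts the crash-looping pod first, the replacement pod carries a new `pod` label and the pending timer starts over.

```yaml
groups:
  - name: example-crashloop            # hypothetical group name
    rules:
      - alert: ExamplePodCrashLooping  # hypothetical alert name
        # kube-state-metrics exposes one series per pod and container,
        # so every replacement pod is a new series with a new label set.
        expr: max_over_time(kube_pod_container_status_waiting_reason{reason="CrashLoopBackOff"}[5m]) >= 1
        # Assumed duration: the alert fires only if the same series
        # matches for this long without interruption.
        for: 15m
        labels:
          severity: warning
```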

Summary by CodeRabbit

  • Chores
    • Removed the RemovePodsHavingTooManyRestarts policy from the base cluster descheduler configuration, including its associated pod restart threshold parameters and all related plugin list entries. This configuration change updates how the cluster manages pod eviction and rescheduling, specifically modifying the handling of pods that experience frequent restarts.

Copilot AI review requested due to automatic review settings October 17, 2025 08:08
@cwrau cwrau enabled auto-merge October 17, 2025 08:08
coderabbitai Bot commented Oct 17, 2025

Walkthrough

The descheduler plugin configuration in the Helm values file has been updated to remove the RemovePodsHavingTooManyRestarts plugin entry, including its DefaultEvictor arguments and podRestartThreshold setting, as well as its entry in the enabled plugins list.
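
For orientation, a minimal sketch of the kind of entries the walkthrough describes, written in the upstream `DeschedulerPolicy` (`descheduler/v1alpha2`) layout. The actual nesting inside `charts/base-cluster/values.yaml` is not shown in this PR, so the profile name, threshold value, and surrounding keys are assumptions.

```yaml
apiVersion: descheduler/v1alpha2
kind: DeschedulerPolicy
profiles:
  - name: default                    # hypothetical profile name
    pluginConfig:
      # The kind of block removed by this PR:
      - name: RemovePodsHavingTooManyRestarts
        args:
          podRestartThreshold: 100   # illustrative value, not taken from the chart
    plugins:
      deschedule:
        enabled:
          # The matching entry in the enabled list is removed as well:
          - RemovePodsHavingTooManyRestarts
```

With both entries gone, pods that restart repeatedly are left in place rather than evicted, which is the behaviour the PR description asks for.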

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Descheduler plugin config cleanup: `charts/base-cluster/values.yaml` | Removed RemovePodsHavingTooManyRestarts plugin configuration block from DefaultEvictor args, including podRestartThreshold argument, and removed the plugin from the enabled plugins list |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

🐰 A plugin bids farewell today,
No more restarts get in the way,
The config grows lean and clean,
Simplest Helm change we've seen! ✨

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title Check | ✅ Passed | The pull request title "fix(base-cluster/descheduler): don't remove pods with too many restarts" directly and accurately describes the primary change in the changeset. The title clearly indicates that the modification prevents the descheduler from removing pods that have experienced many restarts, which aligns perfectly with the raw summary showing that the RemovePodsHavingTooManyRestarts plugin configuration has been removed. The title is concise, uses conventional commit formatting, avoids vague terms, and provides sufficient clarity for a reviewer scanning the project history to understand the intent of the change. |
| Docstring Coverage | ✅ Passed | No functions found in the changes. Docstring coverage check skipped. |

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8b61010 and 4ceea80.

📒 Files selected for processing (1)
  • charts/base-cluster/values.yaml (0 hunks)
💤 Files with no reviewable changes (1)
  • charts/base-cluster/values.yaml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: check licenses
  • GitHub Check: Update release-please config file for a possibly new chart
  • GitHub Check: lint helm chart (base-cluster)

Copilot AI left a comment

Pull Request Overview

This PR adjusts the default descheduler behavior to avoid evicting pods with high restart counts so that Prometheus alerts can age and fire as intended.

  • Remove the RemovePodsHavingTooManyRestarts strategy from the descheduler configuration
  • Retain other descheduler strategies unchanged

Comment thread charts/base-cluster/values.yaml
Comment thread charts/base-cluster/values.yaml
@cwrau cwrau added this pull request to the merge queue Oct 24, 2025
Merged via the queue into main with commit 9c1ed51 Oct 24, 2025
30 of 35 checks passed
@cwrau cwrau deleted the fix/base-cluster/descheduler-dont-remove-pods-with-too-many-restarts branch October 24, 2025 09:53
github-merge-queue Bot pushed a commit that referenced this pull request Oct 27, 2025
🤖 I have created a release *beep* *boop*
---


## [10.0.1](base-cluster-v10.0.0...base-cluster-v10.0.1) (2025-10-27)


### Bug Fixes

* **base-cluster/descheduler:** don't remove pods with too many restarts ([#1744](#1744)) ([9c1ed51](9c1ed51))
* **base-cluster/ingress:** add missing `prometheus` block 🙄 ([#1767](#1767)) ([a329e1a](a329e1a))
* **base-cluster/loki:** adjust retention settings for loki logs ([#1745](#1745)) ([1985d34](1985d34))
* **base-cluster/monitoring:** use the correct prometheus datasource id ([#1764](#1764)) ([511cc84](511cc84))

---
This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).

<!-- This is an auto-generated comment: release notes by coderabbit.ai -->

## Summary by CodeRabbit

* **Bug Fixes**
  * Fixed descheduler to prevent removal of pods with excessive restart counts
  * Added missing Prometheus monitoring configuration to ingress
  * Adjusted log retention settings in Loki
  * Corrected Prometheus datasource ID in monitoring

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: github-actions <41898282+github-actions[bot]@users.noreply.github.com>