fix(base-cluster/descheduler): don't remove pods with too many restarts by cwrau · Pull Request #1744 · teutonet/teutonet-helm-charts

cwrau · 2025-10-17T08:08:51Z

Otherwise the Prometheus alerts never fire: the descheduler evicts the crashing pods before the alerts have been pending long enough.
Combined with pods that take a little while to crash and become ready
before that, not even KubeDeploymentReplicasMismatch triggers.
This effectively hides the error.
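
To illustrate the reasoning above, here is a minimal sketch of a Prometheus alerting rule with a `for:` clause. The group name, alert name, and the 15m duration are hypothetical and not taken from this chart. The expression is only an approximation of the kind of crash-loop rule a monitoring stack might ship.

```yaml
# Hypothetical rule group; names and durations are illustrative only.
groups:
  - name: example-pod-alerts
    rules:
      - alert: ExamplePodCrashLooping
        # kube-state-metrics reports this series per pod and container,
        # so the `pod` label is part of the alert's identity.
        expr: max_over_time(kube_pod_container_status_waiting_reason{reason="CrashLoopBackOff"}[5m]) >= 1
        # The alert stays pending until the expression has held for the
        # same label set for this whole duration.
        for: 15m
        labels:
          severity: warning
```

If a crashing pod is evicted and replaced under a new name before the `for:` duration elapses, the replacement produces a new series and the pending timer starts over, which matches the behavior described above.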

Summary by CodeRabbit

Chores
- Removed the RemovePodsHavingTooManyRestarts policy from the base cluster descheduler configuration, including its associated pod restart threshold parameters and all related plugin list entries. This configuration change updates how the cluster manages pod eviction and rescheduling, specifically modifying the handling of pods that experience frequent restarts.

coderabbitai · 2025-10-17T08:09:33Z

Walkthrough

The descheduler plugin configuration in the Helm values file has been updated to remove the RemovePodsHavingTooManyRestarts plugin entry, including its DefaultEvictor arguments and podRestartThreshold setting, as well as its entry in the enabled plugins list.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Descheduler plugin config cleanup `charts/base-cluster/values.yaml` | Removed `RemovePodsHavingTooManyRestarts` plugin configuration block from DefaultEvictor args, including podRestartThreshold argument, and removed the plugin from the enabled plugins list |
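
For orientation, the kind of configuration this change removes might look roughly like the following upstream descheduler `v1alpha2` policy. This is a sketch, not the chart's actual content: the profile name and threshold value are placeholders, and the key under which the policy is nested in `charts/base-cluster/values.yaml` is not shown in this PR page.

```yaml
# Sketch of a descheduler policy with the plugin enabled.
# Profile name and threshold are placeholders, not the chart's values.
apiVersion: descheduler/v1alpha2
kind: DeschedulerPolicy
profiles:
  - name: example-profile
    pluginConfig:
      # Plugin arguments: evict pods whose containers restarted this often.
      - name: RemovePodsHavingTooManyRestarts
        args:
          podRestartThreshold: 100
          includingInitContainers: true
    plugins:
      deschedule:
        enabled:
          # Removing this entry (and the pluginConfig block above)
          # disables the restart-based eviction.
          - RemovePodsHavingTooManyRestarts
```

With both the `pluginConfig` entry and the `enabled` entry gone, crash-looping pods stay in place and remain visible to alerting.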

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

🐰 A plugin bids farewell today,
No more restarts get in the way,
The config grows lean and clean,
Simplest Helm change we've seen! ✨

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title Check | ✅ Passed | The pull request title "fix(base-cluster/descheduler): don't remove pods with too many restarts" directly and accurately describes the primary change in the changeset. The title clearly indicates that the modification prevents the descheduler from removing pods that have experienced many restarts, which aligns perfectly with the raw summary showing that the RemovePodsHavingTooManyRestarts plugin configuration has been removed. The title is concise, uses conventional commit formatting, avoids vague terms, and provides sufficient clarity for a reviewer scanning the project history to understand the intent of the change. |
| Docstring Coverage | ✅ Passed | No functions found in the changes. Docstring coverage check skipped. |

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8b61010 and 4ceea80.

📒 Files selected for processing (1)

charts/base-cluster/values.yaml (0 hunks)

💤 Files with no reviewable changes (1)

charts/base-cluster/values.yaml

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)

GitHub Check: check licenses
GitHub Check: Update release-please config file for a possibly new chart
GitHub Check: lint helm chart (base-cluster)

Copilot

Pull Request Overview

This PR adjusts the default descheduler behavior to avoid evicting pods with high restart counts so that Prometheus alerts can age and fire as intended.

- Remove the RemovePodsHavingTooManyRestarts strategy from the descheduler configuration
- Retain other descheduler strategies unchanged

🤖 I have created a release *beep* *boop*

---

## [10.0.1](base-cluster-v10.0.0...base-cluster-v10.0.1) (2025-10-27)

### Bug Fixes

* **base-cluster/descheduler:** don't remove pods with too many restarts ([#1744](#1744)) ([9c1ed51](9c1ed51))
* **base-cluster/ingress:** add missing `prometheus` block 🙄 ([#1767](#1767)) ([a329e1a](a329e1a))
* **base-cluster/loki:** adjust retention settings for loki logs ([#1745](#1745)) ([1985d34](1985d34))
* **base-cluster/monitoring:** use the correct prometheus datasource id ([#1764](#1764)) ([511cc84](511cc84))

---

This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).

## Summary by CodeRabbit

* **Bug Fixes**
  * Fixed descheduler to prevent removal of pods with excessive restart counts
  * Added missing Prometheus monitoring configuration to ingress
  * Adjusted log retention settings in Loki
  * Corrected Prometheus datasource ID in monitoring

---------

Co-authored-by: github-actions <41898282+github-actions[bot]@users.noreply.github.com>

Copilot AI review requested due to automatic review settings October 17, 2025 08:08

cwrau requested review from marvinWolff, tasches and teutonet-bot as code owners October 17, 2025 08:08

cwrau enabled auto-merge October 17, 2025 08:08

github-actions Bot assigned cwrau Oct 17, 2025

teutonet-bot added the base-cluster label Oct 17, 2025

Copilot AI reviewed Oct 17, 2025

Comment thread charts/base-cluster/values.yaml

Comment thread charts/base-cluster/values.yaml

marvinWolff approved these changes Oct 22, 2025

cwrau added this pull request to the merge queue Oct 24, 2025

Merged via the queue into main with commit 9c1ed51 Oct 24, 2025
30 of 35 checks passed

cwrau deleted the fix/base-cluster/descheduler-dont-remove-pods-with-too-many-restarts branch October 24, 2025 09:53

teutonet-bot mentioned this pull request Oct 24, 2025

chore(main): [bot] release base-cluster:10.0.1 #1765

Merged

This was referenced Nov 28, 2025

chore(main): [bot] release base-cluster:10.1.2 #1841

Merged

chore(main): [bot] release base-cluster:11.0.0 #1862

Merged

teutonet-bot mentioned this pull request Mar 17, 2026

chore(main): [bot] release base-cluster:11.1.1 #2023

Merged

cwrau merged 1 commit into main from fix/base-cluster/descheduler-dont-remove-pods-with-too-many-restarts


4 participants
