
Introduce node minTTL parameter to delay consolidation for freshly launched nodes. #1648

Open
eugenea opened this issue Sep 9, 2024 · 2 comments
Labels
kind/feature: Categorizes issue or PR as related to a new feature.
needs-triage: Indicates an issue or PR lacks a `triage/foo` label and requires one.

Comments

eugenea commented Sep 9, 2024

Description

What problem are you trying to solve?

We observe that Karpenter sometimes disrupts a node less than 1 minute after it was launched. This creates several problems in the cluster:

  • Velero backup jobs cannot complete because the Velero pod keeps getting evicted by Karpenter's aggressive consolidation.
  • Elevated alert noise from container crash loops or service inactivity timeouts: some cluster components are removed before they are fully initialized, which complicates monitoring because of component dependencies.
  • High observability costs from high metric cardinality, caused by the constant pod churn that overly aggressive Karpenter node consolidation produces.
  • High stress on the Istio discovery layer from the large cluster changes that aggressive Karpenter node consolidation causes.
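
To make the request concrete, here is a minimal sketch of the check a minTTL setting might perform before a node is considered for consolidation. It is illustrative Python, not Karpenter code: `is_consolidation_candidate`, `launched_at`, and `min_ttl` are hypothetical names, and the 10-minute value is an arbitrary example.

```python
from datetime import datetime, timedelta, timezone


def is_consolidation_candidate(launched_at, now, min_ttl):
    """Return True only once the node has existed for at least min_ttl.

    Hypothetical helper: a node younger than min_ttl is skipped by
    consolidation, whatever its utilization looks like.
    """
    return now - launched_at >= min_ttl


launched = datetime(2024, 9, 9, 12, 0, 0, tzinfo=timezone.utc)
min_ttl = timedelta(minutes=10)  # assumed example value

# 45 seconds after launch: the node is still protected.
print(is_consolidation_candidate(launched, launched + timedelta(seconds=45), min_ttl))

# 15 minutes after launch: the node may be consolidated again.
print(is_consolidation_candidate(launched, launched + timedelta(minutes=15), min_ttl))
```

With a guard like this, the node from the report above, disrupted less than 1 minute after launch, would have been left alone until the minimum lifetime had passed.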

How important is this feature to you?

It is pretty important because it is challenging to find a workaround for some of the situations listed above.

eugenea added the kind/feature label Sep 9, 2024
k8s-ci-robot (Contributor) commented

This issue is currently awaiting triage.

If Karpenter contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

k8s-ci-robot added the needs-triage label Sep 9, 2024

eugenea commented Sep 9, 2024

Maybe instead of, or in addition to, minTTL there should be a "stabilization window" for scaling down, matching the functionality that HPA provides. A stabilization window would reduce the number of cluster changes in clusters that run spiky/batch workloads.
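
A rough sketch of how such a window could behave, modeled on HPA scale-down stabilization, which acts on the highest recommendation seen within the window. This is illustrative Python, not Karpenter or HPA code: `ScaleDownStabilizer`, its method name, and the 5-minute window are assumptions made for the example.

```python
from collections import deque
from datetime import datetime, timedelta, timezone


class ScaleDownStabilizer:
    """Hypothetical helper that smooths scale-down decisions over a time window."""

    def __init__(self, window):
        self.window = window
        self._history = deque()  # (timestamp, recommended_node_count)

    def stabilized(self, now, recommended):
        """Record a recommendation and return the highest one still in the window."""
        self._history.append((now, recommended))
        cutoff = now - self.window
        # Drop recommendations that are older than the window.
        while self._history and self._history[0][0] < cutoff:
            self._history.popleft()
        return max(count for _, count in self._history)


t0 = datetime(2024, 9, 9, 12, 0, 0, tzinfo=timezone.utc)
stabilizer = ScaleDownStabilizer(window=timedelta(minutes=5))  # assumed window

print(stabilizer.stabilized(t0, 10))                         # 10
print(stabilizer.stabilized(t0 + timedelta(minutes=1), 6))   # 10, the spike is still in the window
print(stabilizer.stabilized(t0 + timedelta(minutes=6), 6))   # 6, the spike has aged out
```

Under this model a short dip in demand does not shrink the cluster; nodes are only removed once the lower recommendation has held for the whole window.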
