Borrow Mechanism Between NodePools in Karpenter #1703

Open
woehrl01 opened this issue Sep 21, 2024 · 1 comment
Labels
kind/feature Categorizes issue or PR as related to a new feature. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one.

Comments

woehrl01 commented Sep 21, 2024

Summary:
Introduce a "borrow" mechanism between NodePools in Karpenter, inspired by Kueue's Cohort borrowing functionality. This feature would allow a NodePool to borrow CPU cores (or other resources) from another NodePool, optimizing the utilization of reserved compute instances and improving overall flexibility.

Background:
Karpenter significantly improves workload efficiency by automatically provisioning nodes that meet pod requirements and deprovisioning nodes when they are no longer needed. However, in environments with multiple NodePools (such as those containing both on-demand and reserved instances), it would be beneficial to allow NodePools to share underutilized resources. This would enable more efficient use of reserved instances, ensuring that the resources already allocated are fully leveraged before new nodes are provisioned.

Proposed Solution:
Implement a feature similar to Kueue’s "borrow" mechanism, where a NodePool can borrow unused CPU cores or other resources from other NodePools. In this mapping, a Kueue "ClusterQueue" plays the role of a Karpenter NodePool.

  1. Cohort of NodePools:
    Allow grouping of NodePools into a cohort. NodePools in the same cohort should be able to borrow resources (e.g., CPU cores) from each other.

  2. Borrowing Semantics:
    When a NodePool runs out of its allocated resources, it should be able to borrow unused resources from another NodePool in the same cohort:

    • Karpenter should attempt to provision workloads within the assigned quota of a NodePool first.
    • If resources are exhausted, it should try to borrow from unused quota in other NodePools within the cohort.
    • A NodePool can only borrow resources it is configured to use, and borrowing should be limited by predefined thresholds (similar to Kueue’s borrowingLimit).

  3. Resource Prioritization:
    Workloads that fit within a NodePool's nominal quota should take priority over workloads that require borrowing, so that borrowing remains a secondary measure. If multiple workloads require borrowing, order them by workload priority and then by creation timestamp, similar to Kueue's approach.
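
The three points above can be sketched as a small admission routine. This is only an illustration of the proposed semantics, not Karpenter or Kueue code: `NodePoolQuota`, `Workload`, `admit`, `schedule`, and every field name are hypothetical, and only CPU is modeled. Reclaiming lent capacity when the lending NodePool needs it back is left out.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class NodePoolQuota:
    """Hypothetical per-NodePool quota; these fields are assumptions for illustration."""
    name: str
    cohort: str            # NodePools sharing a cohort may lend to each other
    nominal_cpu: int       # cores the pool may use without borrowing
    borrowing_limit: int   # max cores the pool may borrow (cf. Kueue's borrowingLimit)
    used_cpu: int = 0      # cores currently consumed, own plus borrowed


@dataclass
class Workload:
    name: str
    node_pool: str
    cpu: int
    priority: int = 0
    created: int = 0       # creation timestamp; smaller means older


def admit(pools: Dict[str, NodePoolQuota], w: Workload) -> Optional[str]:
    """Admit w into its pool; return 'nominal', 'borrowed', or None if it does not fit."""
    pool = pools[w.node_pool]
    new_used = pool.used_cpu + w.cpu

    # Step 1: stay within the pool's own nominal quota when possible.
    if new_used <= pool.nominal_cpu:
        pool.used_cpu = new_used
        return "nominal"

    # Step 2: borrow from unused quota in the cohort, capped by borrowing_limit.
    members = [p for p in pools.values() if p.cohort == pool.cohort]
    cohort_free = sum(p.nominal_cpu for p in members) - sum(p.used_cpu for p in members)
    borrowed_after = new_used - pool.nominal_cpu
    if borrowed_after <= pool.borrowing_limit and w.cpu <= cohort_free:
        pool.used_cpu = new_used
        return "borrowed"
    return None


def schedule(pools: Dict[str, NodePoolQuota], workloads: List[Workload]) -> Dict[str, Optional[str]]:
    """Admit nominal-quota workloads first, then borrowers by priority and age."""
    ordered = sorted(workloads, key=lambda w: (-w.priority, w.created))
    result: Dict[str, Optional[str]] = {}
    needs_borrowing: List[Workload] = []

    # Pass 1: workloads that fit within nominal quota take precedence.
    for w in ordered:
        pool = pools[w.node_pool]
        if pool.used_cpu + w.cpu <= pool.nominal_cpu:
            result[w.name] = admit(pools, w)
        else:
            needs_borrowing.append(w)

    # Pass 2: remaining workloads may borrow, highest priority and oldest first.
    for w in needs_borrowing:
        result[w.name] = admit(pools, w)
    return result
```

For example, with a "reserved" pool (nominal 16, 4 used, borrowing limit 0) and an "on-demand" pool (nominal 8, 6 used, borrowing limit 8) in one cohort, three on-demand workloads of 2, 4, and 6 cores with priorities 0, 10, and 5 are handled as follows: the 2-core workload is admitted within nominal quota, the 4-core workload borrows, and the 6-core workload is rejected because it would exceed the borrowing limit.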

Reference:
For details on Kueue’s borrowing semantics, refer to the Kueue documentation.

Use Case:
This feature would be especially useful for environments with a mix of on-demand and reserved instances. For example, a NodePool running on reserved instances could lend unused CPU cores to an on-demand NodePool, reducing the need to provision additional on-demand nodes when reserved capacity is available. This could lower cost and improve resource utilization, because reserved capacity that is already paid for would be used before new on-demand capacity is provisioned.
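
The arithmetic behind this use case is small enough to write down. The helper below is a hypothetical illustration (the function name and parameters are not part of Karpenter), assuming CPU is the only resource and that borrowing is capped by a per-pool limit as proposed above.

```python
from typing import Tuple


def split_extra_demand(extra_cores: int, reserved_unused: int, borrowing_limit: int) -> Tuple[int, int]:
    """Return (cores borrowed from reserved capacity, cores still needing new on-demand nodes)."""
    # Borrowing is bounded by the demand, the lender's idle cores, and the borrower's limit.
    borrowed = max(0, min(extra_cores, reserved_unused, borrowing_limit))
    return borrowed, extra_cores - borrowed
```

If the on-demand NodePool needs 20 more cores, the reserved NodePool has 12 idle, and the borrowing limit is 16, then `split_extra_demand(20, 12, 16)` returns `(12, 8)`: 12 cores come from reserved capacity and only 8 require new on-demand nodes.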

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments; they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment
@woehrl01 woehrl01 added the kind/feature Categorizes issue or PR as related to a new feature. label Sep 21, 2024
@k8s-ci-robot k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Sep 21, 2024
@k8s-ci-robot
Contributor

This issue is currently awaiting triage.

If Karpenter contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
