Conversation

@pajakd
Contributor

@pajakd pajakd commented Jul 22, 2025

What type of PR is this?

/kind feature

What this PR does / why we need it:

Some users would like to express a different preference when selecting a flavor under flavor fungibility. Currently we always try to minimize preemptions. This sometimes means that an incoming high-priority workload chooses a flavor in which it has to borrow (because its nominal quota is in use or borrowed). A borrowing workload is at risk of being preempted, which may be undesirable for high-priority workloads, so it might be better for such a workload to preempt and be scheduled in a flavor that does not require borrowing.

This PR adds a feature gate FlavorFungibilityImplicitPreferenceDefault. When the feature gate is enabled, the preference (borrowing or preemption) is inferred from the FlavorFungibility strategy. More precisely, in two cases:

  1. WhenCanPreempt = Preempt and WhenCanBorrow = TryNextFlavor
  2. WhenCanPreempt = TryNextFlavor and WhenCanBorrow = TryNextFlavor

the flavor assigner will prioritize flavors that require preemption over ones that require borrowing. In all other cases, borrowing will be prioritized.
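
For illustration, the inference rule described above can be sketched as follows. This is a minimal sketch and not the code from this PR: the types, the Preference enum, and the gateEnabled parameter are hypothetical stand-ins for the Kueue API types and the feature gate check.

```go
package main

import "fmt"

// FlavorFungibilityPolicy is a simplified stand-in for the Kueue API type;
// only the values named in this description are modelled.
type FlavorFungibilityPolicy string

const (
	Borrow        FlavorFungibilityPolicy = "Borrow"
	Preempt       FlavorFungibilityPolicy = "Preempt"
	TryNextFlavor FlavorFungibilityPolicy = "TryNextFlavor"
)

// FlavorFungibility holds the two fields named in the description.
type FlavorFungibility struct {
	WhenCanBorrow  FlavorFungibilityPolicy
	WhenCanPreempt FlavorFungibilityPolicy
}

// Preference is a hypothetical enum for the inferred ordering.
type Preference string

const (
	PreferBorrowing  Preference = "borrowing"
	PreferPreemption Preference = "preemption"
)

// inferPreference returns which kind of flavor is prioritized.
// gateEnabled stands in for FlavorFungibilityImplicitPreferenceDefault.
func inferPreference(gateEnabled bool, ff FlavorFungibility) Preference {
	if !gateEnabled {
		// Behavior without the gate: minimize preemptions.
		return PreferBorrowing
	}
	// Cases 1 and 2 from the description.
	avoidBorrowing := ff.WhenCanBorrow == TryNextFlavor &&
		(ff.WhenCanPreempt == Preempt || ff.WhenCanPreempt == TryNextFlavor)
	if avoidBorrowing {
		return PreferPreemption
	}
	return PreferBorrowing
}

func main() {
	case1 := FlavorFungibility{WhenCanBorrow: TryNextFlavor, WhenCanPreempt: Preempt}
	other := FlavorFungibility{WhenCanBorrow: Borrow, WhenCanPreempt: Preempt}
	fmt.Println(inferPreference(true, case1))  // preemption
	fmt.Println(inferPreference(true, other))  // borrowing
	fmt.Println(inferPreference(false, case1)) // borrowing
}
```

With the gate disabled the sketch always reports a borrowing preference, which matches the "always try to minimize preemptions" behavior described above.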

KEP changes will be added in a separate PR.

Which issue(s) this PR fixes:

Fixes #5424

Special notes for your reviewer:

Does this PR introduce a user-facing change?

Flavor Fungibility: Introduces a new mode that allows preferring preemption over borrowing when choosing a flavor.
In this mode the preference is decided based on the FlavorFungibility strategy. This behavior is behind the
FlavorFungibilityImplicitPreferenceDefault Alpha feature gate (disabled by default).

@k8s-ci-robot
Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@k8s-ci-robot k8s-ci-robot added the release-note, do-not-merge/work-in-progress, kind/feature, kind/api-change, and cncf-cla: yes labels Jul 22, 2025
@k8s-ci-robot k8s-ci-robot added the size/L label Jul 22, 2025
@pajakd pajakd changed the title from "Enable borrowing avoidance in flavor fungibility" to "Flavor Fungibility Prefers Fit over NoBorrow" Jul 22, 2025
@netlify

netlify bot commented Jul 22, 2025

Deploy Preview for kubernetes-sigs-kueue canceled.

Name Link
🔨 Latest commit 1ca3056
🔍 Latest deploy log https://app.netlify.com/projects/kubernetes-sigs-kueue/deploys/6882307a48ecf50008c9b6fb

@pajakd pajakd marked this pull request as ready for review July 22, 2025 09:32
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress label Jul 22, 2025
@k8s-ci-robot k8s-ci-robot requested a review from tenzen-y July 22, 2025 09:33
@mimowo
Contributor

mimowo commented Jul 22, 2025

@pajakd @gabesaba please also prepare a KEP update for this extension.

Comment on lines 50 to 53
for _, candidate := range candidates {
	revertRemoval := cq.SimulateUsageRemoval(candidate.WorkloadInfo.Usage())
	defer revertRemoval()
}
Contributor

Can we use SimulateWorkloadRemoval?

Contributor Author

Yes, but we still have to iterate over the candidates, because they are of type Target and we need the WorkloadInfo. With SimulateWorkloadRemoval we don't have to call WorkloadInfo.Usage(), so it should be better.

Comment on lines 55 to 57
 if len(candidates) == 0 {
-	return preemptioncommon.NoCandidates
+	return preemptioncommon.NoCandidates, borrowAfterPreemptions
 }
Contributor

Can this clause come before the usage removal, on line 50?

Contributor Author

We need to calculate the borrowing level before this (even if there are no candidates) because we return it. And before that calculation we have to remove the usage of the candidates (if there are any). That is why it's here.

Contributor

Does the borrowing level matter if it is NoFit (which is implied by 0 candidates, right)?

Contributor Author

It does not matter. If the workload does not fit, borrowAfterPreemptions will be the maximum distance. Are you suggesting it would make more sense to return something else?

Contributor Author

I think I finally understood what you mean. Updated the code.

Contributor Author

I take it back. Changing the borrowing for NoCandidates makes some integration tests fail. I reverted the change. I'll investigate why that is, but in the meantime can it stay as it is?

Contributor

Once investigated, please also add a unit test that demonstrates why this needs to be the way it is, so that it is much easier to avoid the trap in the future.

@mimowo
Contributor

mimowo commented Jul 22, 2025

/hold
I have some higher-level concerns, as this does not match my understanding of the solution from the initial discussions. If we still want this to be included in 0.13, please prepare a KEP update with the story and justification for the changes, in particular addressing:

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold label Jul 22, 2025
@pajakd
Contributor Author

pajakd commented Jul 22, 2025

> @pajakd @gabesaba please also prepare a KEP update for this extension.

Proposed KEP changes in #6133

@k8s-ci-robot k8s-ci-robot added the size/XL label and removed the size/L label Jul 24, 2025
@mimowo
Contributor

mimowo commented Jul 24, 2025

@pajakd please mark all addressed comments as resolved and update the release note

@k8s-ci-robot k8s-ci-robot added the size/L label and removed the size/XL label Jul 24, 2025
@gabesaba
Contributor

/lgtm

thank you!

@k8s-ci-robot k8s-ci-robot added the lgtm label Jul 24, 2025
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: a68c4aa2a68c2fd06ffe2058e2b15a0a81d171ad

Comment on lines +330 to +333
func preferToAvoidBorrowing(fungiblityConfig kueue.FlavorFungibility) bool {
	return (fungiblityConfig.WhenCanBorrow == kueue.TryNextFlavor && fungiblityConfig.WhenCanPreempt == kueue.Preempt) ||
		(fungiblityConfig.WhenCanBorrow == kueue.TryNextFlavor && fungiblityConfig.WhenCanPreempt == kueue.TryNextFlavor)
}
Contributor

Non-blocking nit: since the function now simplifies to fungiblityConfig.WhenCanBorrow == kueue.TryNextFlavor, can you inline it?
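
One way to check the simplification is to compare both forms over every combination of values. The sketch below does that under the assumption that WhenCanBorrow only takes Borrow or TryNextFlavor and WhenCanPreempt only takes Preempt or TryNextFlavor; the policy type and function names are local stand-ins, not the Kueue API. If either field could take another value, the two forms would no longer be guaranteed to agree.

```go
package main

import "fmt"

// policy is a local stand-in for the fungibility policy type.
type policy string

const (
	borrow        policy = "Borrow"
	preempt       policy = "Preempt"
	tryNextFlavor policy = "TryNextFlavor"
)

// original mirrors the two-clause condition quoted in the review.
func original(whenCanBorrow, whenCanPreempt policy) bool {
	return (whenCanBorrow == tryNextFlavor && whenCanPreempt == preempt) ||
		(whenCanBorrow == tryNextFlavor && whenCanPreempt == tryNextFlavor)
}

// simplified is the single comparison proposed in the nit.
func simplified(whenCanBorrow policy) bool {
	return whenCanBorrow == tryNextFlavor
}

// equivalent reports whether both forms agree on every combination
// in the assumed value domains.
func equivalent() bool {
	for _, b := range []policy{borrow, tryNextFlavor} {
		for _, p := range []policy{preempt, tryNextFlavor} {
			if original(b, p) != simplified(b) {
				return false
			}
		}
	}
	return true
}

func main() {
	fmt.Println(equivalent()) // true
}
```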

@mimowo
Contributor

mimowo commented Jul 24, 2025

/remove-kind api-change

@k8s-ci-robot k8s-ci-robot removed the kind/api-change label Jul 24, 2025
@mimowo
Contributor

mimowo commented Jul 24, 2025

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mimowo, pajakd

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved label Jul 24, 2025
@k8s-ci-robot k8s-ci-robot merged commit 32ed754 into kubernetes-sigs:main Jul 24, 2025
22 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v0.13 milestone Jul 24, 2025
@pajakd pajakd deleted the fungibility_order branch July 25, 2025 06:19
kannon92 pushed a commit to openshift-kannon92/kubernetes-sigs-kueue that referenced this pull request Aug 11, 2025
* Allow to avoid borrowing in flavor fungibility

* Update comment

* Update comment

* Update comment

* Typo fix

* Fix tests

* Fix typo

* Update pkg/scheduler/flavorassigner/flavorassigner.go

Co-authored-by: gabesaba <[email protected]>

* Address comments

* Fix a whitespace

* Fix tests

* Address comments

* Avoid -> prefer

* Strategy -> policy

* Update the defaults and validation

* Fix

* Allow empty selection policy

* Fix whitespaces

* Fix unit test

* Update apis/kueue/v1beta1/clusterqueue_types.go

Co-authored-by: Michał Woźniak <[email protected]>

* Fix errors

* Revert API + add feature gate

* Cleanup

* Cleanup

* Simplify the preference check

* Additional integration tests

* Update pkg/features/kube_features.go

Co-authored-by: Michał Woźniak <[email protected]>

* Earlier check for 0 candidates

* Fix tests

* Remove reliance on FS

* Fix test

---------

Co-authored-by: gabesaba <[email protected]>
Co-authored-by: Michał Woźniak <[email protected]>
@pajakd pajakd mentioned this pull request Nov 27, 2025
3 tasks

Labels

approved, cncf-cla: yes, kind/feature, lgtm, release-note, size/L

Development

Successfully merging this pull request may close these issues.

Flavor Fungibility Prefers Fit over NoBorrow

4 participants