@gabesaba
Contributor

@gabesaba gabesaba commented Aug 9, 2024

What type of PR is this?

/kind bug

What this PR does / why we need it:

We update FlavorSelector to simulate preemption, to determine if a workload can reclaim capacity from its Cohort in one Flavor, rather than preempting its workloads in another Flavor. This matches behavior in the single Flavor case, where a ClusterQueue prioritizes preempting workloads in its Cohort, before preempting its own workloads.
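The preference described above can be sketched as a ranking over per-Flavor outcomes. This is only an illustration: the `mode` type, its constants, `flavorOption`, and `pickFlavor` are hypothetical names, not the identifiers used in this PR, and the sketch ignores flavor fungibility settings and multi-resource assignments.

```go
package main

import "fmt"

// mode ranks how a workload could obtain quota in one Flavor.
// Higher values are preferred. Hypothetical names, not Kueue's API.
type mode int

const (
	noFit           mode = iota // cannot be admitted in this Flavor
	priorityPreempt             // needs to preempt workloads in its own ClusterQueue
	reclaim                     // needs only to reclaim quota lent to the Cohort
	fit                         // fits without any preemption
)

// flavorOption pairs a Flavor name with its simulated outcome.
type flavorOption struct {
	name string
	mode mode
}

// pickFlavor returns the option with the best mode; on ties the
// earlier Flavor wins. The bool is false when nothing can fit.
func pickFlavor(options []flavorOption) (flavorOption, bool) {
	best := flavorOption{mode: noFit}
	for _, o := range options {
		if o.mode > best.mode {
			best = o
		}
	}
	return best, best.mode != noFit
}

func main() {
	choice, ok := pickFlavor([]flavorOption{
		{name: "on-demand", mode: priorityPreempt},
		{name: "spot", mode: reclaim},
	})
	fmt.Println(choice.name, ok) // prints "spot true"
}
```

Without the distinction between `priorityPreempt` and `reclaim`, both Flavors would look equally good and the first one would win, which is the behavior this PR changes.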

Which issue(s) this PR fixes:

Fixes #2720

Special notes for your reviewer:

Does this PR introduce a user-facing change?

Update Flavor selection logic to prefer Flavors which allow reclamation of lent nominal quota, over Flavors which require preempting workloads within the ClusterQueue. This matches the behavior in the single Flavor case.

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/bug Categorizes issue or PR as related to a bug. labels Aug 9, 2024
@k8s-ci-robot k8s-ci-robot requested review from mimowo and trasc August 9, 2024 12:54
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Aug 9, 2024
@netlify

netlify bot commented Aug 9, 2024

Deploy Preview for kubernetes-sigs-kueue canceled.

Name Link
🔨 Latest commit ca32567
🔍 Latest deploy log https://app.netlify.com/sites/kubernetes-sigs-kueue/deploys/66bb3e3cce1cd8000843888e

@gabesaba gabesaba force-pushed the prefer_reclamation branch from b65cfb3 to 3d87d6f Compare August 9, 2024 13:00
@gabesaba
Contributor Author

gabesaba commented Aug 9, 2024

/assign @PBundyra

Contributor

@mimowo mimowo left a comment

LGTM, it is quite impressive how small the fix is. I'm wondering if we have enough test coverage for the oracle though.

}

// granularMode is the FlavorAssignmentMode internal to
// FlavorAssigner, which lets us distinguish priority based premption,
Contributor

Suggested change
- // FlavorAssigner, which lets us distinguish priority based premption,
+ // FlavorAssigner, which lets us distinguish priority based preemption,


for _, candidate := range p.preemptor.getTargets(log, wl, resources.FlavorResourceQuantities{fr: quantity}, sets.New(fr), p.snapshot) {
if candidate.WorkloadInfo.ClusterQueue == cq.Name {
return false
Contributor

Do we have a test that exercises whether this is needed? If not, it would be good to have one.

Contributor Author

Missed this, thank you. Added a test which covers this path.
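For readers following the thread, the check in the quoted loop can be sketched in isolation. This is a simplified stand-in, assuming the preemption targets have already been computed: `target` and `isPureReclaim` are hypothetical names, and the real code works on Kueue's workload and snapshot types rather than plain strings.

```go
package main

import "fmt"

// target is a hypothetical stand-in for one simulated preemption target.
type target struct {
	workload     string
	clusterQueue string
}

// isPureReclaim reports whether none of the simulated targets belongs to
// the preemptor's own ClusterQueue. If any target does, admission in this
// Flavor would need priority-based preemption, not just reclamation.
// In this sketch an empty target list also yields true.
func isPureReclaim(preemptorCQ string, targets []target) bool {
	for _, t := range targets {
		if t.clusterQueue == preemptorCQ {
			return false
		}
	}
	return true
}

func main() {
	onlyCohort := []target{{workload: "b1", clusterQueue: "eng-beta"}}
	mixed := append(onlyCohort, target{workload: "a1", clusterQueue: "eng-alpha"})

	fmt.Println(isPureReclaim("eng-alpha", onlyCohort)) // prints "true"
	fmt.Println(isPureReclaim("eng-alpha", mixed))      // prints "false"
}
```

The second call is the path the reviewer asked about: a target inside the preemptor's own ClusterQueue means the Flavor cannot be treated as reclaim-only.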

Comment on lines +2368 to +2371
wantPreempted: sets.New("eng-beta/b1"),
wantLeft: map[string][]string{
"other-alpha": {"eng-alpha/preemptor"},
},
Contributor

Since eng-alpha/a1 has lower priority than eng-alpha/preemptor, shouldn't a1 be preempted and preemptor be admitted?

Contributor Author

preemptor will be admitted - but in a future cycle (the way these tests are set up, we expect to see it in wantLeft, since it didn't fit this cycle). It was correct here to preempt b1, as we have a preference (after this PR) to reclaim capacity, rather than preempt another workload in the same CQ.

@PBundyra
Contributor

When reclamation meets required quota only partially, do we still reclaim as much as possible and then preempt the rest, or do we just preempt the whole missing quota?

@gabesaba gabesaba force-pushed the prefer_reclamation branch from 3d87d6f to ca32567 Compare August 13, 2024 11:06
@gabesaba
Contributor Author

> When reclamation meets required quota only partially, do we still reclaim as much as possible and then preempt the rest, or do we just preempt the whole missing quota?

Yes, we will first preempt the workloads whose removal reclaims capacity. However, we may not end up preempting them if we can get all the resources we need from a workload in our CQ, since we attempt to add workloads back after the fit - the step here
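The remove-then-add-back behavior described in this reply can be sketched for a single resource. This is a minimal illustration under stated assumptions: `candidate` and `minimalTargets` are hypothetical names, candidates arrive already ordered with reclaim candidates first, and quota is a single integer rather than Kueue's per-Flavor resource quantities.

```go
package main

import "fmt"

// candidate is a hypothetical preemption candidate using `usage` units.
type candidate struct {
	name  string
	usage int64
}

// minimalTargets removes candidates in order until `need` units are free,
// then walks the removed set backwards and restores every candidate whose
// usage is not required for the fit. It returns the candidates that still
// have to be preempted (in reverse removal order), or nil if the workload
// cannot fit even after removing everyone.
func minimalTargets(candidates []candidate, free, need int64) []candidate {
	var removed []candidate
	for _, c := range candidates {
		if free >= need {
			break
		}
		free += c.usage
		removed = append(removed, c)
	}
	if free < need {
		return nil
	}
	var kept []candidate
	for i := len(removed) - 1; i >= 0; i-- {
		c := removed[i]
		if free-c.usage >= need {
			free -= c.usage // add the workload back; it need not be preempted
			continue
		}
		kept = append(kept, c)
	}
	return kept
}

func main() {
	// b1 borrows from the Cohort and is tried first; a1 is in our own CQ.
	ordered := []candidate{{name: "b1", usage: 2}, {name: "a1", usage: 5}}
	for _, c := range minimalTargets(ordered, 0, 5) {
		fmt.Println(c.name) // prints "a1"
	}
}
```

In the example, b1 is removed first but a1 alone frees enough capacity, so b1 is added back and only a1 is preempted.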

"prefer first preemption flavor when second flavor requires both reclaim and cq priority preemption": {
// Flavor 1, on-demand, requires preemption of workload in CQ.
// Flavor 2, spot, requires preemption of workload in Cohort and CQ
// Since Flavor 2 doesn't improve the assignment, we prefer Flavor 1.
Contributor

Thanks for the descriptions. They are really helpful here.

@mimowo
Contributor

mimowo commented Aug 13, 2024

/approve
Will wait with lgtm for a little while in case @PBundyra has some additional comments.

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: gabesaba, mimowo

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 13, 2024
@PBundyra
Contributor

LGTM @mimowo

@mimowo
Contributor

mimowo commented Aug 13, 2024

/lgtm
Thanks!

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 13, 2024
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: f72ab35e2c72db45ec2b5f37802c4721cb01dc69

@mimowo
Contributor

mimowo commented Aug 13, 2024

/cherry-pick release-0.8

@k8s-infra-cherrypick-robot
Contributor

@mimowo: once the present PR merges, I will cherry-pick it on top of release-0.8 in a new PR and assign it to you.

In response to this:

/cherry-pick release-0.8

@k8s-ci-robot k8s-ci-robot merged commit 615f13d into kubernetes-sigs:main Aug 13, 2024
@k8s-ci-robot k8s-ci-robot added this to the v0.9 milestone Aug 13, 2024
@k8s-infra-cherrypick-robot
Contributor

@mimowo: #2811 failed to apply on top of branch "release-0.8":

Applying: Prefer Reclamation to Priority Based Preemption
Using index info to reconstruct a base tree...
M	pkg/scheduler/flavorassigner/flavorassigner.go
M	pkg/scheduler/flavorassigner/flavorassigner_test.go
M	pkg/scheduler/preemption/preemption.go
M	pkg/scheduler/scheduler.go
M	pkg/scheduler/scheduler_test.go
Falling back to patching base and 3-way merge...
Auto-merging pkg/scheduler/scheduler_test.go
Auto-merging pkg/scheduler/scheduler.go
Auto-merging pkg/scheduler/preemption/preemption.go
CONFLICT (content): Merge conflict in pkg/scheduler/preemption/preemption.go
Auto-merging pkg/scheduler/flavorassigner/flavorassigner_test.go
Auto-merging pkg/scheduler/flavorassigner/flavorassigner.go
CONFLICT (content): Merge conflict in pkg/scheduler/flavorassigner/flavorassigner.go
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Patch failed at 0001 Prefer Reclamation to Priority Based Preemption
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".

In response to this:

/cherry-pick release-0.8

@mimowo
Contributor

mimowo commented Aug 13, 2024

@gabesaba PTAL what is the best way to backport it.

gabesaba added a commit to gabesaba/kueue that referenced this pull request Aug 13, 2024
k8s-ci-robot pushed a commit that referenced this pull request Aug 13, 2024
* Cleanup to use FlavorResourceQuantities.Add in cache (#2696)

* fix: Refactor FitInCohort tests (#2655)

* fix: Refactor FitInCohorot tests

* fix: delete no-op function

* fix: use new method to add usage

* fix: to enforce resource group constraint for flavors and resources

* fix: consolidated into a single resource group

* fix: delete flavorNames

* fix: adjusted test cases to align with existing expected conditions

* fix: change FlavorResourceQuantitiesFlat value

* Finish flattening of FlavorResourceQuantities (#2721)

* Finish Flattenning FlavorResourceQuantities

* Rename FlavorResourceQuantitiesFlat to FlavorResourceQuantities

* Cleanup preemption.go (#2800)

* [Partial Admission] Check Mode before attempting Preemption (#2809)

* Prefer Reclamation to Priority Based Preemption (#2811)

---------

Co-authored-by: s-shiraki <[email protected]>
@gabesaba gabesaba deleted the prefer_reclamation branch August 13, 2024 13:17
mbobrovskyi pushed a commit to epam/kubernetes-kueue that referenced this pull request Aug 20, 2024
@@ -0,0 +1,35 @@
package preemption
Contributor

Add copyright

Contributor Author

usage resources.FlavorResourceQuantities
}

type testOracle struct{}
Contributor

Why add this, as opposed to using the real object (or not having an intermediary object at all)?

Contributor Author

dependency graph - preemption depends on flavor assigner, and refactoring was non-trivial
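One common way to handle this kind of dependency constraint in Go is for the consuming package to declare a narrow interface and accept a stub in tests. The sketch below shows the pattern in a single file; `reclaimOracle`, `stubOracle`, and `describeMode` are hypothetical names with simplified signatures, and it is not a claim about how this PR structures its packages.

```go
package main

import "fmt"

// reclaimOracle is the narrow interface the consumer depends on. Declaring
// it on the consumer side means the consumer does not import the package
// that provides the real implementation, which avoids an import cycle.
type reclaimOracle interface {
	IsReclaimPossible(flavor string, quantity int64) bool
}

// stubOracle is a test double that answers from a fixed table.
type stubOracle struct {
	reclaimable map[string]bool
}

func (s stubOracle) IsReclaimPossible(flavor string, _ int64) bool {
	return s.reclaimable[flavor]
}

// describeMode is a hypothetical consumer that needs only the interface.
func describeMode(o reclaimOracle, flavor string, quantity int64) string {
	if o.IsReclaimPossible(flavor, quantity) {
		return "reclaim"
	}
	return "preempt"
}

func main() {
	oracle := stubOracle{reclaimable: map[string]bool{"spot": true}}
	fmt.Println(describeMode(oracle, "spot", 1))      // prints "reclaim"
	fmt.Println(describeMode(oracle, "on-demand", 1)) // prints "preempt"
}
```

The trade-off is that the stub can drift from the real implementation, which is the concern raised in the question above.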

Development

Successfully merging this pull request may close these issues.

Prioritize reclamation over priority-based preemption across multiple RFs
