

@PBundyra
Contributor

@PBundyra PBundyra commented Mar 19, 2025

What type of PR is this?

/kind feature

What this PR does / why we need it:

Changes described in the first phase of #4252

Which issue(s) this PR fixes:

Fixes #4136

Special notes for your reviewer:

I still need to add comments to some API fields and integration tests.

Does this PR introduce a user-facing change?

Support a new alpha fair sharing feature, `Admission Fair Sharing`, along with its new API. It orders workloads based on the recent usage of the LocalQueue each workload was submitted to. Recent usage takes precedence over workload priority.
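A minimal Go sketch of the ordering described in the release note. It assumes that the recent usage of every LocalQueue has already been computed into a map, and a queue missing from the map counts as zero usage. `workload` and `orderWorkloads` are hypothetical names, not Kueue's API, and the sketch omits any further tie-breakers Kueue may apply.

```go
package main

import (
	"fmt"
	"sort"
)

// workload is a hypothetical, simplified stand-in for a pending workload.
type workload struct {
	name       string
	localQueue string
	priority   int32
}

// orderWorkloads sorts pending workloads so that those submitted to LocalQueues
// with lower recent usage come first; priority only breaks ties between equal
// usages. lqUsage maps a LocalQueue name to its (assumed precomputed) recent usage.
func orderWorkloads(wls []workload, lqUsage map[string]float64) {
	sort.SliceStable(wls, func(i, j int) bool {
		ui, uj := lqUsage[wls[i].localQueue], lqUsage[wls[j].localQueue]
		if ui != uj {
			return ui < uj // lower recent usage goes first
		}
		return wls[i].priority > wls[j].priority // higher priority breaks ties
	})
}

func main() {
	usage := map[string]float64{"lq-a": 10, "lq-b": 2}
	wls := []workload{
		{name: "high-prio-from-a", localQueue: "lq-a", priority: 100},
		{name: "low-prio-from-b", localQueue: "lq-b", priority: 1},
		{name: "mid-prio-from-b", localQueue: "lq-b", priority: 50},
	}
	orderWorkloads(wls, usage)
	for _, w := range wls {
		fmt.Println(w.name)
	}
	// Prints: mid-prio-from-b, low-prio-from-b, high-prio-from-a (one per line).
}
```

In this example, the priority-100 workload still goes last because its LocalQueue has the highest recent usage.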

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. kind/feature Categorizes issue or PR as related to a new feature. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Mar 19, 2025
@k8s-ci-robot k8s-ci-robot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Mar 19, 2025
@netlify

netlify bot commented Mar 19, 2025

Deploy Preview for kubernetes-sigs-kueue canceled.

🔨 Latest commit: 118ac5c
🔍 Latest deploy log: https://app.netlify.com/sites/kubernetes-sigs-kueue/deploys/681b97748d89a5000795c375

@PBundyra
Contributor Author

/assign @gabesaba @pajakd
cc @mimowo @mwielgus

@k8s-ci-robot
Contributor

@PBundyra: GitHub didn't allow me to assign the following users: pajakd.

Note that only kubernetes-sigs members with read permissions, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time.
For more information please see the contributor guide

In response to this:

> /assign @gabesaba @pajakd
> cc @mimowo @mwielgus

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@PBundyra PBundyra force-pushed the FS-AdmissionTime-CQ branch from a6bef60 to b0e124a March 19, 2025 16:20
@PBundyra
Contributor Author

/retest

@PBundyra PBundyra force-pushed the FS-AdmissionTime-CQ branch from 03c713d to 45ac937 March 20, 2025 10:45
@pajakd
Contributor

pajakd commented Mar 21, 2025

Just to be sure about the scope here -- by "Changes described in the first phase of #4252" you mean CQ + LQ level support, right (so no cohorts yet)?

@PBundyra
Contributor Author

> Just to be sure about the scope here -- by "Changes described in the first phase of #4252" you mean CQ + LQ level support, right (so no cohorts yet)?

Exactly

Contributor

@gabesaba gabesaba left a comment

First pass. Let's iron out the API and its interaction with the other FS mode, as these will be difficult to change later.

@PBundyra PBundyra force-pushed the FS-AdmissionTime-CQ branch 2 times, most recently from 53b0c21 to 51a869f April 11, 2025 11:56
lqBUsage, errB := b.LqUsage(ctx, c, fsResWeights)
switch {
case errA != nil:
	log.V(3).Error(errA, "Error fetching LocalQueue from informer")
Contributor

Actually, this code does not know that the data comes from an informer. For example, we could change the implementation of LqUsage to use the cache, and we might not remember to update the log message. I would suggest having a local function that logs the error, and calling it instead.

something like:

lqUsageOrLogError := func(...) usage {
	usage, err := a.LqUsage(ctx, c, fsResWeights)
	if err != nil {
		log.V(2).Error(err, "Error determining LocalQueue usage", "localQueue", lq.Name)
	}
	return usage
}

Also, errors should probably not be logged at V(3).

Contributor Author

I'll fix the message and the verbosity of the log, but the local function seems like overkill to me. I need the error not only for logging but also to decide whether I can compare the usages.

Contributor

Ah, got it: in case of errors you don't compare them at all. Sounds reasonable. Ideally, though, the functions used in comparators should not return errors at all.

For example, we could retrieve the LqUsage before calling the comparators and store it in memory for quick access. We can leave that for a follow-up.

@k8s-ci-robot
Contributor

k8s-ci-robot commented May 7, 2025

@PBundyra: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name | Commit | Details | Required | Rerun command
pull-kueue-test-integration-main | 88778ec | link | true | /test pull-kueue-test-integration-main

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.


@mimowo
Contributor

mimowo commented May 7, 2025

/lgtm
/approve
Thank you, this is an important feature.

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 7, 2025
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: 33fd40967855828f0da439e29de8d1485591560e

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mimowo, PBundyra

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 7, 2025
@k8s-ci-robot k8s-ci-robot merged commit 9d2919e into kubernetes-sigs:main May 7, 2025
22 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v0.12 milestone May 7, 2025
@mimowo
Contributor

mimowo commented May 8, 2025

These are good comments, I guess the most important is to address the API remarks before the release, the other ones could probably be fixed as follow up issue(s). wdyt @tenzen-y ?

@mimowo
Contributor

mimowo commented May 8, 2025

@PBundyra please address the comments, I think the API updates have priority.

@PBundyra
Contributor Author

PBundyra commented May 8, 2025

Sure, will do in a follow-up

@tenzen-y
Member

tenzen-y commented May 8, 2025

> These are good comments, I guess the most important is to address the API remarks before the release, the other ones could probably be fixed as follow up issue(s). wdyt @tenzen-y ?

I think API markers and validations for ConfigAPIs are higher priority.

@PBundyra
Contributor Author

PBundyra commented May 8, 2025

> @PBundyra please address the comments, I think the API updates have priority.

Addressed here: #5203

pajakd pushed a commit to pajakd/kueue that referenced this pull request May 9, 2025
* Introduce FairSharing at admission time

* Change shape of config API

* Clean up

* Address comments

* Address comments

* Generate helm charts

* Add unit tests that check time left for another reconcile

* Address nits

* Adjust integration test to the new shape of API

* Make FS status in LQ a pointer

* Adjust wrapper

* Rename API field to be consistent with CQ

* Clean up comments, small logic fix

* Add feature gate, reduce number of requests in lq controller

* Switch on feature gate before creating manager

* Align examples with the latest naming changes

* Ensure test is not flaky, small refactor

* Small refactor, fix naming, improve logging, add another feature gate check

* Fix feature gates in unit tests
@PBundyra PBundyra deleted the FS-AdmissionTime-CQ branch August 12, 2025 10:15
@pajakd pajakd mentioned this pull request Nov 27, 2025

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.


Development

Successfully merging this pull request may close these issues.

Fair sharing mechanism without preemptions

7 participants