KEP-5142: Pop pod from backoffQ when activeQ is empty #5144

Open
wants to merge 1 commit into
base: master


macsko
Member

@macsko macsko commented Feb 6, 2025

  • One-line PR description: Add KEP-5142
  • Other comments:

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Feb 6, 2025
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: macsko
Once this PR has been reviewed and has the lgtm label, please assign huang-wei, jpbetz for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Feb 6, 2025
@macsko macsko force-pushed the pop-Pod_from_backoffq_when-activeq_is_empty branch from 7fa0d54 to 34fc985 Compare February 6, 2025 16:32
@macsko
Member Author

macsko commented Feb 6, 2025

/cc @dom4ha @sanposhiho @Huang-Wei @alculquicondor

I haven't finished all the points yet, but the general idea is there.

@macsko macsko force-pushed the pop-Pod_from_backoffq_when-activeq_is_empty branch from 34fc985 to 58bc648 Compare February 6, 2025 16:38
Member

@sanposhiho sanposhiho left a comment

Some sections still have TODOs, but this looks good overall and is aligned with the discussion we had in the issue.


In the default scheduler, we should see the throughput around 100-150 pods/s ([ref](https://perf-dash.k8s.io/#/?jobname=gce-5000Nodes&metriccategoryname=Scheduler&metricname=LoadSchedulingThroughput&TestName=load)), and this feature shouldn't bring any regression there.

Based on that `schedule_attempts_total` shouldn't be less than 100 in a second.
Member

nit:

Suggested change
- Based on that `schedule_attempts_total` shouldn't be less than 100 in a second.
+ Based on that `schedule_attempts_total` shouldn't be less than 100 in a second
+ if there are enough pods that are constantly created within the cluster.

that were previously unschedulable.

## Motivation

Member

Can you add a bit more about what backoffQ is (= it holds pods that are serving a penalty because they wasted scheduling cycles) and why we can pop from it (= those pods can already be retried; they are only waiting out the penalty)?

@@ -0,0 +1,3 @@
kep-number: 5142
alpha:
approver: "@wojtek-t"
Member

/cc @wojtek-t

@k8s-ci-robot k8s-ci-robot requested a review from wojtek-t February 7, 2025 00:07
Labels
  • `cncf-cla: yes`: Indicates the PR's author has signed the CNCF CLA.
  • `kind/kep`: Categorizes KEP tracking issues and PRs modifying the KEP directory
  • `sig/scheduling`: Categorizes an issue or PR as relevant to SIG Scheduling.
  • `size/L`: Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
Status: Needs Triage

3 participants