KEP-5142: Pop pod from backoffQ when activeQ is empty #5144
base: master
macsko commented Feb 6, 2025
- One-line PR description: Add KEP-5142
- Issue link: Pop pod from backoffQ when activeQ is empty #5142
- Other comments:
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by: macsko
The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
/cc @dom4ha @sanposhiho @Huang-Wei @alculquicondor
I haven't finished all the points yet, but the general idea is there.
Though there are some sections with TODOs, this looks good overall and is aligned with the discussion we had in the issue.
In the default scheduler, we should see the throughput around 100-150 pods/s ([ref](https://perf-dash.k8s.io/#/?jobname=gce-5000Nodes&metriccategoryname=Scheduler&metricname=LoadSchedulingThroughput&TestName=load)), and this feature shouldn't bring any regression there.

Based on that `schedule_attempts_total` shouldn't be less than 100 in a second.
nit:
Suggested change:
- Based on that `schedule_attempts_total` shouldn't be less than 100 in a second.
+ Based on that `schedule_attempts_total` shouldn't be less than 100 in a second
+ if there are enough pods that are constantly created within the cluster.
that were previously unschedulable.

## Motivation
Can you add a bit more about what backoffQ is (= it puts a penalty on pods that wasted scheduling cycles) and why we can pop from there (= because those pods can be retried; they're just waiting out a penalty)?
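To make the ask concrete, here is a rough toy model of the behaviour in the KEP title, written in Python for brevity rather than the scheduler's Go. It is only a sketch under assumptions: the names `SchedulingQueue`, `BackoffEntry`, `add_active`, `add_backoff`, `flush_backoff` and `pop` are hypothetical, activeQ is simplified to a FIFO, and picking the pod with the earliest backoff expiry is my assumption, not something the KEP text quoted here states.

```python
import heapq
from collections import deque
from dataclasses import dataclass, field


@dataclass(order=True)
class BackoffEntry:
    ready_at: float                  # time at which the backoff penalty expires
    pod: str = field(compare=False)  # pod name; not used for ordering


class SchedulingQueue:
    """Toy model only: activeQ is a FIFO here, backoffQ is ordered by expiry."""

    def __init__(self):
        self.active_q = deque()
        self.backoff_q = []  # min-heap of BackoffEntry

    def add_active(self, pod):
        self.active_q.append(pod)

    def add_backoff(self, pod, ready_at):
        heapq.heappush(self.backoff_q, BackoffEntry(ready_at, pod))

    def flush_backoff(self, now):
        # Move pods whose penalty has already expired into activeQ.
        while self.backoff_q and self.backoff_q[0].ready_at <= now:
            self.active_q.append(heapq.heappop(self.backoff_q).pod)

    def pop(self, now):
        self.flush_backoff(now)
        if self.active_q:
            return self.active_q.popleft()
        # activeQ is empty: rather than idling, take a pod that is only
        # waiting out its penalty (assumption: the one closest to expiry).
        if self.backoff_q:
            return heapq.heappop(self.backoff_q).pod
        return None
```

With one pod in activeQ and two pods still backing off, `pop` returns the active pod first and then drains backoffQ instead of leaving the scheduler idle.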
@@ -0,0 +1,3 @@
kep-number: 5142
alpha:
  approver: "@wojtek-t"
/cc @wojtek-t