KEP-5142: Pop pod from backoffQ when activeQ is empty #5144

macsko · 2025-02-06T16:29:18Z

One-line PR description: Add KEP-5142

Issue link: Pop pod from backoffQ when activeQ is empty #5142

Other comments:

k8s-ci-robot · 2025-02-06T16:29:25Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: macsko
Once this PR has been reviewed and has the lgtm label, please assign huang-wei, jpbetz for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

keps/prod-readiness/OWNERS
keps/sig-scheduling/OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

macsko · 2025-02-06T16:35:53Z

/cc @dom4ha @sanposhiho @Huang-Wei @alculquicondor

I haven't finished all the points yet, but the general idea is there.

sanposhiho

Though there are some sections with TODO, this looks good overall and is aligned with the discussion we had in the issue.

sanposhiho · 2025-02-06T23:43:39Z

keps/sig-scheduling/5142-pop-backoffq-when-activeq-empty/README.md

+
+In the default scheduler, we should see the throughput around 100-150 pods/s ([ref](https://perf-dash.k8s.io/#/?jobname=gce-5000Nodes&metriccategoryname=Scheduler&metricname=LoadSchedulingThroughput&TestName=load)), and this feature shouldn't bring any regression there.
+
+Based on that `schedule_attempts_total` shouldn't be less than 100 in a second.


nit:

Suggested change

-Based on that `schedule_attempts_total` shouldn't be less than 100 in a second.
+Based on that `schedule_attempts_total` shouldn't be less than 100 in a second
+if there are enough pods that are constantly created within the cluster.

sanposhiho · 2025-02-06T23:54:44Z

keps/sig-scheduling/5142-pop-backoffq-when-activeq-empty/README.md

+that were previously unschedulable.
+
+## Motivation
+


Can you add a bit more about what backoffQ is (= it puts a penalty on pods that wasted scheduling cycles) and why we can pop from there (= because those pods can be retried; they're just waiting out a penalty)?
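
As a rough illustration of the behavior this comment asks the KEP to explain, here is a minimal Go sketch. It is an assumption-laden model, not the scheduler's code: the `pod` and `queue` types and the `pop` method are hypothetical, the sub-queues are plain slices, and ordering backoffQ by expiry time is assumed. The point it shows is that a pod in backoffQ is already retriable and is only waiting out a penalty, so it can be taken when activeQ has nothing to offer.

```go
package main

import (
	"fmt"
	"time"
)

// pod is a hypothetical stand-in for a queued pod, not the scheduler's real type.
type pod struct {
	name          string
	backoffExpiry time.Time // when the backoff penalty would normally end
}

// queue is a simplified model holding only the two sub-queues discussed here.
type queue struct {
	activeQ  []pod // pods ready to be scheduled
	backoffQ []pod // pods waiting out a penalty (assumed ordered by expiry)
}

// pop returns the next pod to schedule. It prefers activeQ; only when
// activeQ is empty does it take the head of backoffQ, even if that pod's
// penalty has not expired yet.
func (q *queue) pop() (pod, bool) {
	if len(q.activeQ) > 0 {
		p := q.activeQ[0]
		q.activeQ = q.activeQ[1:]
		return p, true
	}
	if len(q.backoffQ) > 0 {
		p := q.backoffQ[0]
		q.backoffQ = q.backoffQ[1:]
		return p, true
	}
	return pod{}, false
}

func main() {
	now := time.Now()
	q := &queue{
		activeQ:  []pod{{name: "ready"}},
		backoffQ: []pod{{name: "penalized", backoffExpiry: now.Add(10 * time.Second)}},
	}
	for {
		p, ok := q.pop()
		if !ok {
			break
		}
		fmt.Println("popped:", p.name)
	}
}
```

In this model the penalty still has an effect: a backing-off pod never jumps ahead of a pod in activeQ, it only fills cycles that would otherwise be idle.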

sanposhiho · 2025-02-07T00:07:36Z

keps/prod-readiness/sig-scheduling/5142.yaml

@@ -0,0 +1,3 @@
+kep-number: 5142
+alpha:
+  approver: "@wojtek-t"


/cc @wojtek-t

k8s-ci-robot added the cncf-cla: yes label (indicates the PR's author has signed the CNCF CLA) Feb 6, 2025

k8s-ci-robot requested review from Huang-Wei and kikisdeliveryservice February 6, 2025 16:29

k8s-ci-robot added the kind/kep (categorizes KEP tracking issues and PRs modifying the KEP directory), sig/scheduling (categorizes an issue or PR as relevant to SIG Scheduling), and size/L (denotes a PR that changes 100-499 lines, ignoring generated files) labels Feb 6, 2025

macsko mentioned this pull request Feb 6, 2025

Pop pod from backoffQ when activeQ is empty #5142

Open

4 tasks

macsko force-pushed the pop-Pod_from_backoffq_when-activeq_is_empty branch from 7fa0d54 to 34fc985 Compare February 6, 2025 16:32

k8s-ci-robot requested review from alculquicondor, dom4ha and sanposhiho February 6, 2025 16:35

KEP-5142: Pop pod from backoffQ when activeQ is empty

58bc648

macsko force-pushed the pop-Pod_from_backoffq_when-activeq_is_empty branch from 34fc985 to 58bc648 Compare February 6, 2025 16:38

sanposhiho reviewed Feb 6, 2025


sanposhiho reviewed Feb 7, 2025


k8s-ci-robot requested a review from wojtek-t February 7, 2025 00:07
