overcommit plugin does not need to check other resources #3520

xieyanker · 2024-06-13T11:23:40Z

What happened:

I created a vcjob that requests 0.1 CPU core, but its podgroup stays pending. I added some debug logs to the source code, recompiled volcano-scheduler, and replaced the old binary. I found that the check failed on the "nvidia.com/gpu" dimension: the minReq being checked exceeded idle for that resource.
However, my vcjob did not request any "nvidia.com/gpu" resource. I eventually realized that jobMinReq is added to the resources already requested by Inqueue podgroups, and the sum is compared against idle resources. The total amount of "nvidia.com/gpu" requested by Inqueue podgroups already exceeds idle.

volcano/pkg/scheduler/plugins/overcommit/overcommit.go, line 122 in c61742d:

if inqueue.Add(jobMinReq).LessEqual(idle, api.Zero) {
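
To make the failure mode concrete, here is a minimal Go sketch of the line above under simplifying assumptions: `Resource`, `Add`, and `LessEqualAll` are hypothetical stand-ins rather than Volcano's `api.Resource` API, and the quantities are invented to match the scenario in this report. It shows that every dimension of the summed request is compared, so a CPU-only job is rejected whenever Inqueue GPU already exceeds idle GPU.

```go
package main

import "fmt"

// Resource is a hypothetical, simplified stand-in for Volcano's api.Resource:
// resource name (e.g. "cpu", "nvidia.com/gpu") mapped to a quantity.
type Resource map[string]float64

// Add returns a new Resource with the per-dimension sum of r and other.
func (r Resource) Add(other Resource) Resource {
	sum := Resource{}
	for name, v := range r {
		sum[name] += v
	}
	for name, v := range other {
		sum[name] += v
	}
	return sum
}

// LessEqualAll reports whether every dimension of r is <= the same dimension
// of limit. A dimension missing from limit counts as zero (assumed to roughly
// correspond to passing api.Zero in the original check).
func (r Resource) LessEqualAll(limit Resource) bool {
	for name, v := range r {
		if v > limit[name] {
			return false
		}
	}
	return true
}

func main() {
	// Hypothetical numbers: Inqueue podgroups already request more GPU than is
	// idle, e.g. after a GPU node was removed from the cluster.
	idle := Resource{"cpu": 8, "memory": 16, "nvidia.com/gpu": 2}
	inqueue := Resource{"cpu": 1, "memory": 2, "nvidia.com/gpu": 4}
	jobMinReq := Resource{"cpu": 0.1} // the new job asks for CPU only

	// Mirrors the shape of: inqueue.Add(jobMinReq).LessEqual(idle, api.Zero)
	fmt.Println(inqueue.Add(jobMinReq).LessEqualAll(idle)) // false: GPU 4 > 2
}
```

Running it prints `false`: CPU (1.1 <= 8) and memory (2 <= 16) fit, but the GPU dimension (4 > 2) fails even though jobMinReq requests no GPU.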

What you expected to happen:

I think that when adding newly requested resources to the Inqueue resources, irrelevant resource dimensions should not be considered. For example, if the minimum request is only 1 CPU core, then memory, GPU, and other dimensions do not need to be added or compared.
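
A minimal sketch of the expected behavior, assuming the same hypothetical `Resource` map: only the dimensions that jobMinReq requests with a positive quantity are compared. The helper `fitsRequestedDims` is an illustrative name, not the code in #3521 or #3522.

```go
package main

import "fmt"

// Resource is the same hypothetical stand-in: resource name -> quantity.
type Resource map[string]float64

// fitsRequestedDims checks inqueue+jobMinReq against idle, but only for the
// dimensions that jobMinReq requests with a positive quantity. Dimensions the
// job does not use (e.g. GPU for a CPU-only job) are ignored.
func fitsRequestedDims(inqueue, jobMinReq, idle Resource) bool {
	for name, req := range jobMinReq {
		if req <= 0 {
			continue // not requested by this job
		}
		if inqueue[name]+req > idle[name] {
			return false
		}
	}
	return true
}

func main() {
	idle := Resource{"cpu": 8, "memory": 16, "nvidia.com/gpu": 2}
	inqueue := Resource{"cpu": 1, "memory": 2, "nvidia.com/gpu": 4}

	cpuOnly := Resource{"cpu": 0.1}
	gpuJob := Resource{"cpu": 1, "nvidia.com/gpu": 1}

	fmt.Println(fitsRequestedDims(inqueue, cpuOnly, idle)) // true: only CPU is compared
	fmt.Println(fitsRequestedDims(inqueue, gpuJob, idle))  // false: GPU 4+1 > 2
}
```

With the same numbers, the CPU-only job is admitted while a job that also asks for one GPU is still rejected. The trade-off is that admission no longer guards dimensions the job does not use, which is the behavior this issue asks for.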

How to reproduce it (as minimally and precisely as possible):

I suspect it can be reproduced as follows:

1. Add and register GPU resources on one node of the cluster.
2. Create a vcjob that requests GPU resources.
3. While the podgroup is running, remove that node from the cluster.
4. The podgroup's status may change to "Inqueue".
5. Create a new vcjob that requests only CPU or memory resources.
6. The new podgroup stays pending because the GPU requested by Inqueue podgroups exceeds the idle GPU.

Anything else we need to know?:

Environment:

Volcano Version:
v1.9.0
Kubernetes version (use kubectl version):
v1.25.4
Cloud provider or hardware configuration:
OS (e.g. from /etc/os-release):
Kernel (e.g. uname -a):
Install tools:
Others:


xieyanker added the kind/bug label (categorizes issue or PR as related to a bug) on Jun 13, 2024.

xieyanker linked a pull request on Jun 13, 2024 that will close this issue: #3521 "overcommit plugin AddForMinReq is no need to add other resources." (Open)

lowang-bh linked a pull request on Jun 13, 2024 that will close this issue: #3522 "resource compare support only consider the requested resource item" (Open, 3 tasks)

xieyanker closed this as completed Jun 15, 2024

xieyanker reopened this Jun 15, 2024
