What happened:
I created a vcjob that requests 0.1 CPU core, but its podgroup stays pending. I added debug logging to the source code, recompiled volcano-scheduler, and replaced the running one. The logs show that the podgroup is pending because the "nvidia.com/gpu" resource in minReq does not fit within idle.
However, my vcjob does not request the "nvidia.com/gpu" resource. I eventually realized that jobMinReq is added to the resources already requested by inqueue jobs, and the sum is compared with the idle resources. The total "nvidia.com/gpu" requested by inqueue jobs already exceeds idle, so the comparison fails regardless of what the new job asks for.
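To make the behavior described above concrete, here is a minimal Go sketch. It is not Volcano's code: the `Resources` type and the `add` and `lessEqualAll` helpers are hypothetical stand-ins, and the quantities are made up for illustration. It only shows how summing inqueue requests with the new job's minimum request and then comparing every dimension against idle can reject a CPU-only job.

```go
package main

import "fmt"

// Resources is a hypothetical stand-in for the scheduler's resource type:
// resource name -> quantity (CPU in millicores, GPU in whole devices).
type Resources map[string]float64

// add returns a new Resources holding the per-name sum of a and b.
func add(a, b Resources) Resources {
	sum := Resources{}
	for name, qty := range a {
		sum[name] += qty
	}
	for name, qty := range b {
		sum[name] += qty
	}
	return sum
}

// lessEqualAll reports whether every dimension in req fits within idle.
// A name that is missing from idle counts as zero.
func lessEqualAll(req, idle Resources) bool {
	for name, qty := range req {
		if qty > idle[name] {
			return false
		}
	}
	return true
}

func main() {
	// Assumed state: the GPU node is gone, but an inqueue job still requests 2 GPUs.
	idle := Resources{"cpu": 8000, "nvidia.com/gpu": 0}
	inqueue := Resources{"cpu": 1000, "nvidia.com/gpu": 2}
	jobMinReq := Resources{"cpu": 100} // the new job asks for 0.1 core only

	total := add(inqueue, jobMinReq)
	// The GPU dimension of the sum exceeds idle, so the CPU-only job is rejected.
	fmt.Println(lessEqualAll(total, idle)) // prints "false"
}
```

In this sketch the new job contributes nothing to the "nvidia.com/gpu" dimension, yet that dimension alone decides the outcome.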
What you expected to happen:
When the newly requested resources are added to the inqueue resources, resource types that the new job does not request should be ignored. For example, if the minimum request is only 1 CPU core, there is no need to include memory, GPU, or other resources in the comparison.
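One way the expectation above could be expressed is sketched below in Go. This is an illustration under assumptions, not a patch against Volcano: `Resources` and `fitsRequestedOnly` are hypothetical names, and the numbers are invented. The check only looks at dimensions for which the new job requests a positive quantity.

```go
package main

import "fmt"

// Resources is a hypothetical stand-in for the scheduler's resource type:
// resource name -> quantity (CPU in millicores, GPU in whole devices).
type Resources map[string]float64

// fitsRequestedOnly compares inqueue + jobMinReq against idle, but only for
// the resource names that jobMinReq requests with a positive quantity.
func fitsRequestedOnly(inqueue, jobMinReq, idle Resources) bool {
	for name, qty := range jobMinReq {
		if qty <= 0 {
			continue // dimension not actually requested by the new job
		}
		if inqueue[name]+qty > idle[name] {
			return false
		}
	}
	return true
}

func main() {
	// Assumed state: no idle GPUs, while an inqueue job still requests 2 GPUs.
	idle := Resources{"cpu": 8000, "nvidia.com/gpu": 0}
	inqueue := Resources{"cpu": 1000, "nvidia.com/gpu": 2}

	// A CPU-only job is no longer blocked by the GPU dimension.
	fmt.Println(fitsRequestedOnly(inqueue, Resources{"cpu": 100}, idle)) // prints "true"

	// A job that does request a GPU is still rejected.
	fmt.Println(fitsRequestedOnly(inqueue, Resources{"nvidia.com/gpu": 1}, idle)) // prints "false"
}
```

Whether skipping unrequested dimensions is the right policy for the overcommit plugin as a whole is a design question for the maintainers; the sketch only shows the comparison the issue asks for.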
How to reproduce it (as minimally and precisely as possible):
I guess it may be possible to reproduce it by doing the following:
1. Register GPU resources on one node of the cluster.
2. Create a vcjob that requests GPU resources.
3. While the podgroup is running, remove the node from the cluster.
4. The podgroup's status may become "Inqueue".
5. Create a new vcjob that requests only CPU or memory.
6. The new podgroup stays pending because the GPU resources counted for the Inqueue podgroup exceed the idle GPU resources.
Anything else we need to know?:
Environment:
Volcano Version: v1.9.0
Kubernetes version (use kubectl version): v1.25.4
Cloud provider or hardware configuration:
OS (e.g. from /etc/os-release):
Kernel (e.g. uname -a):
Install tools:
Others:
Code referenced in the issue: volcano/pkg/scheduler/plugins/overcommit/overcommit.go, line 122 in c61742d