overcommit plugin does not need to check other resources. #3520

Open
xieyanker opened this issue Jun 13, 2024 · 0 comments · May be fixed by #3521 or #3522
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@xieyanker
Contributor

What happened:

I created a vcjob that requests 0.1 CPU core, but its podgroup stays pending. I added some debug logs to the source code, recompiled volcano-scheduler, and replaced the old binary. I found that the podgroup is pending because the comparison of minReq against idle fails on the "nvidia.com/gpu" resource.
However, my vcjob didn't request the "nvidia.com/gpu" resource. Eventually I realized that jobMinReq is added to the resources already requested by inqueue jobs, and the sum is compared with the idle resources. In fact, the total amount of "nvidia.com/gpu" requested by inqueue jobs alone already exceeds idle.

if inqueue.Add(jobMinReq).LessEqual(idle, api.Zero) {
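
To illustrate why this check rejects a CPU-only job, here is a minimal sketch. It does not use the Volcano API: the `Resources` map type and the `Add` and `LessEqualAll` helpers are hypothetical stand-ins, the quantities are made up, and treating a resource missing from idle as zero is an assumption.

```go
package main

import "fmt"

// Resources is a simplified, hypothetical stand-in for the scheduler's
// resource type: resource name -> quantity.
type Resources map[string]float64

// Add returns the per-resource sum of a and b without modifying either.
func Add(a, b Resources) Resources {
	sum := Resources{}
	for name, v := range a {
		sum[name] += v
	}
	for name, v := range b {
		sum[name] += v
	}
	return sum
}

// LessEqualAll reports whether every resource in l fits within r.
// A resource missing from r is treated as zero (an assumption).
func LessEqualAll(l, r Resources) bool {
	for name, v := range l {
		if v > r[name] {
			return false
		}
	}
	return true
}

func main() {
	// Made-up numbers: no idle GPU is left, but an earlier job that
	// requested 2 GPUs is still counted as inqueue.
	idle := Resources{"cpu": 8, "nvidia.com/gpu": 0}
	inqueue := Resources{"cpu": 1, "nvidia.com/gpu": 2}
	jobMinReq := Resources{"cpu": 0.1} // the new CPU-only job

	// The GPU dimension of inqueue alone exceeds idle, so the check
	// fails even though the new job requests no GPU.
	fmt.Println(LessEqualAll(Add(inqueue, jobMinReq), idle)) // false
}
```

With these numbers the CPU dimension fits easily (1.1 of 8), so the rejection comes entirely from the GPU dimension contributed by inqueue.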

What you expected to happen:

I think that when the newly requested resources are added to the inqueue resources, resources that the job does not request should not be considered. For example, if the minimum request is only 1 CPU core, there is no need to add or compare memory, GPU, or other resources.
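
One way to express this expectation is to compare only the dimensions the new job actually requests. The sketch below is a simplified model, not a patch against Volcano: the `Resources` type and the `FitsRequestedOnly` helper are hypothetical names, and the quantities are made up.

```go
package main

import "fmt"

// Resources is a simplified, hypothetical stand-in for the scheduler's
// resource type: resource name -> quantity.
type Resources map[string]float64

// FitsRequestedOnly checks inqueue + jobMinReq against idle, but only
// for resources that jobMinReq requests with a non-zero quantity.
// Resources missing from inqueue or idle count as zero (an assumption).
func FitsRequestedOnly(inqueue, jobMinReq, idle Resources) bool {
	for name, req := range jobMinReq {
		if req <= 0 {
			continue // not requested by this job, so skip it
		}
		if inqueue[name]+req > idle[name] {
			return false
		}
	}
	return true
}

func main() {
	idle := Resources{"cpu": 8, "nvidia.com/gpu": 0}
	inqueue := Resources{"cpu": 1, "nvidia.com/gpu": 2}

	cpuJob := Resources{"cpu": 0.1}
	gpuJob := Resources{"cpu": 0.1, "nvidia.com/gpu": 1}

	fmt.Println(FitsRequestedOnly(inqueue, cpuJob, idle)) // true
	fmt.Println(FitsRequestedOnly(inqueue, gpuJob, idle)) // false
}
```

In this model the CPU-only job is no longer blocked by GPU demand from other inqueue jobs, while a job that does request GPU is still rejected when GPU is oversubscribed.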

How to reproduce it (as minimally and precisely as possible):

I guess it may be possible to reproduce it by doing the following:

  1. Add and register GPU resources on one node of the cluster.
  2. Create a vcjob that requests GPU resources.
  3. While the podgroup is running, remove the node from the cluster.
  4. The podgroup's status may become "Inqueue".
  5. Then create a new vcjob that requests only CPU or memory resources.
  6. The new podgroup will be pending because the GPU requested by the Inqueue podgroup exceeds the idle GPU.

Anything else we need to know?:

Environment:

  • Volcano Version:
    v1.9.0
  • Kubernetes version (use kubectl version):
    v1.25.4
  • Cloud provider or hardware configuration:
  • OS (e.g. from /etc/os-release):
  • Kernel (e.g. uname -a):
  • Install tools:
  • Others:
@xieyanker xieyanker added the kind/bug Categorizes issue or PR as related to a bug. label Jun 13, 2024
@xieyanker xieyanker reopened this Jun 15, 2024