Restarting kubelet causes gpu-device-plugin to restart and re-register with the new kubelet, which also restarts the pods that depend on gpu-device-plugin #199

Open
Soledao opened this issue Jul 14, 2021 · 1 comment

Soledao commented Jul 14, 2021

As a result, these pods are typically unavailable for about 60 seconds.

Normally, a pod that uses only CPU and memory does not restart when kubelet is restarted. What can be done to keep pods that rely on GPU devices available while kubelet restarts, especially when I want to upgrade kubelet?

Contributor

grac3gao-zz commented Jul 15, 2021

Hi @Soledao, could you share the steps to reproduce this, along with some information about your cluster?
I tried to reproduce it as follows:

  1. Create a cluster with GPUs, install the device installer pod so that the device plugin starts working, then apply a pod that uses a GPU.
  2. SSH into the node and restart kubelet: sudo systemctl restart kubelet

However, I did not see gpu-device-plugin restart, and the pod using the GPU did not restart either.
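
One way to check the result of the steps above without relying on observation alone is to compare container restart counts taken before and after the kubelet restart. The following is a minimal sketch, assuming both snapshots are the JSON output of `kubectl get pods -A -o json`; the helper names `restart_counts` and `restarted` are hypothetical and not part of any tool mentioned in this thread.

```python
import json


def restart_counts(pod_list_json: str) -> dict:
    """Map 'namespace/pod/container' to restartCount from a pod list in JSON form."""
    counts = {}
    for pod in json.loads(pod_list_json).get("items", []):
        meta = pod["metadata"]
        # containerStatuses can be missing or null for pods that have not started yet.
        statuses = pod.get("status", {}).get("containerStatuses") or []
        for cs in statuses:
            key = f'{meta["namespace"]}/{meta["name"]}/{cs["name"]}'
            counts[key] = cs["restartCount"]
    return counts


def restarted(before: dict, after: dict) -> dict:
    """Return containers whose restart count increased, as key -> (old, new)."""
    return {
        key: (before[key], after[key])
        for key in after
        if key in before and after[key] > before[key]
    }
```

To use it, save one snapshot before running sudo systemctl restart kubelet and another a minute or two afterwards, then pass both through `restart_counts` and compare them with `restarted`. Note that a pod that was deleted and recreated under a new name will not appear in the result, since only keys present in both snapshots are compared.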

grac3gao-zz self-assigned this Jul 21, 2021