Feature Request: Support multiple GPU driver versions in one k8s cluster #323
Comments
Thanks for the feature request. This would indeed be a great feature. Currently, the only way to do this is to use pre-installed drivers on the host with the GPU operator. We would need to introduce …
@shivamerla Thanks for considering this.
@khatrig, we currently package a single driver version into each image, hence the requirement to have separate DaemonSets.
I understand now what you meant earlier. Thanks for the explanation.
If this feature is developed, could we add a sanity check that deploys the latest driver compatible with the hardware found on each node? For example, it would be nice if the operator checked the devices on a node and automatically selected the latest supported driver, preventing the breakage we currently see in this use case. Thanks!
@shivamerla I would also be interested in having different drivers. We are currently trying to run different GPUs (A100 and K40m) in the same cluster, and I can't find a single driver that works for both: the latest 515 driver works on the A100 but not on the K40m, and the latest 470 driver works on the K40m but doesn't see the device on the A100.
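The automatic selection requested in the two comments above could be sketched as picking the newest driver branch that every GPU on a node supports. This is a minimal sketch under stated assumptions, not how the GPU operator works: `SUPPORTED_BRANCHES` and `select_driver_branch` are hypothetical names, and the table only encodes the A100/K40m observations reported in this thread, not an authoritative compatibility matrix.

```python
from typing import Dict, Iterable, Optional, Set

# Hypothetical compatibility table: GPU model -> driver branches assumed to work.
# Illustrative only; it encodes just the A100/K40m reports from this thread.
SUPPORTED_BRANCHES: Dict[str, Set[int]] = {
    "A100": {515},
    "K40m": {470},
}


def select_driver_branch(
    gpu_models: Iterable[str],
    table: Dict[str, Set[int]] = SUPPORTED_BRANCHES,
) -> Optional[int]:
    """Return the newest driver branch that every GPU on the node supports.

    Returns None when the node has no GPUs, when a model is missing from the
    table, or when the models share no common branch; in those cases the
    caller should refuse to deploy rather than guess.
    """
    common: Optional[Set[int]] = None
    for model in gpu_models:
        branches = table.get(model)
        if branches is None:
            return None  # unknown hardware: do not guess
        # Keep only branches that all GPUs seen so far support.
        common = set(branches) if common is None else common & branches
    if not common:
        return None
    return max(common)


if __name__ == "__main__":
    print(select_driver_branch(["A100", "A100"]))  # 515
    print(select_driver_branch(["K40m"]))          # 470
    print(select_driver_branch(["A100", "K40m"]))  # None: no shared branch
```

Returning `None` instead of falling back to some default is the "sanity check" part: a node whose hardware the table cannot satisfy is flagged rather than given a driver that may not see its devices.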
Wouldn't this be possible today with separate node groups that carry the appropriate labels/taints? It would require deploying multiple DaemonSets, but it seems like it could work, no?
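A rough sketch of the node-group workaround suggested above, assuming each node group carries a hypothetical label `example.com/gpu-driver-group` (plus a matching `NoSchedule` taint) and that a driver image exists per version under a placeholder repository. It builds one DaemonSet per group whose `nodeSelector` pins it to that group's nodes. It is not the GPU operator's own mechanism, and it omits the privileged security context and host mounts a real driver container needs.

```python
import json
from typing import Dict, List

# Hypothetical node label (and taint key) that marks each GPU node group.
NODE_GROUP_LABEL = "example.com/gpu-driver-group"


def driver_daemonset(group: str, driver_version: str, image_repo: str) -> Dict:
    """Return a DaemonSet manifest (as a dict) that runs the driver container
    only on nodes labeled NODE_GROUP_LABEL=<group>."""
    labels = {"app": f"gpu-driver-{group}"}
    return {
        "apiVersion": "apps/v1",
        "kind": "DaemonSet",
        "metadata": {"name": f"gpu-driver-{group}"},
        "spec": {
            # The selector must match the pod template's labels.
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    # Pin the pods to this node group.
                    "nodeSelector": {NODE_GROUP_LABEL: group},
                    # Tolerate the group's taint so the pods can be scheduled there.
                    "tolerations": [{
                        "key": NODE_GROUP_LABEL,
                        "operator": "Equal",
                        "value": group,
                        "effect": "NoSchedule",
                    }],
                    "containers": [{
                        "name": "driver",
                        "image": f"{image_repo}:{driver_version}",
                    }],
                },
            },
        },
    }


def build_manifests(versions_by_group: Dict[str, str], image_repo: str) -> Dict:
    """Wrap one DaemonSet per node group in a v1 List that kubectl can apply."""
    items: List[Dict] = [
        driver_daemonset(group, version, image_repo)
        for group, version in versions_by_group.items()
    ]
    return {"apiVersion": "v1", "kind": "List", "items": items}


if __name__ == "__main__":
    # Placeholder versions: one node group per driver version.
    manifests = build_manifests(
        {"group-a": "515", "group-b": "470"},
        "registry.example.com/gpu-driver",  # placeholder repository
    )
    print(json.dumps(manifests, indent=2))
```

The printed JSON could be saved to a file and applied with `kubectl apply -f`; nodes in `group-a` would then run the 515 image and nodes in `group-b` the 470 image, which is the "multiple DaemonSets" cost the comment mentions.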
Glad to see that gpu-operator now has a feature (tech preview) that makes it possible to run multiple driver versions in the same cluster. Closing.
@khatrig
1. Feature description
Currently, the gpu-operator supports only one driver version across the cluster. It would be great if each GPU node could run a different driver version, depending on its requirements.
For example, suppose there are two GPU nodes, A and B, in a k8s cluster.
Node A could run driver version X and node B could run driver version Y.