Using single GPU with multiple containers #123
Comments
Porting gpushare-device-plugin (and gpushare-scheduler-extender, if it needs adaptation) to GKE would be very useful to us!
There is a simple way to do GPU sharing on GKE: create symlinks on the host filesystem that point to the /dev/nvidia0 device. The plugin will pick up these new devices and register them with the node. There is no advanced scheduling, but it does let you attach the same GPU to multiple containers. Here is an example I use in a DaemonSet to add the symlinks to the node after the NVIDIA driver has been installed:

# Create symlinks to NVIDIA device to support GPU sharing.
NVIDIA_0_SHARE=16
OLD_DEV=/dev/nvidia0
for i in $(seq 1 $(($NVIDIA_0_SHARE - 1))); do
  NEW_DEV=/dev/nvidia${i}
  echo "Linking $OLD_DEV -> $NEW_DEV"
  ln -sf $OLD_DEV $NEW_DEV
done
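For anyone who wants to try the symlink approach without touching /dev first, here is a minimal sketch of the same loop wrapped in a function. The function name share_gpu_device and its two arguments (device directory and share count) are my own assumptions for illustration, not part of the plugin or of the script above. It also adds two guards that the original does not have: it stops if nvidia0 is missing, and it skips any path that is already a real (non-symlink) device, such as a second physical GPU.

```shell
# Sketch only: share_gpu_device is a hypothetical helper, not part of the plugin.
# Usage: share_gpu_device <device_dir> <share_count>
share_gpu_device() {
  local dev_dir="$1"
  local share_count="$2"
  local old_dev="${dev_dir}/nvidia0"
  local new_dev
  local i=1

  # Avoid creating dangling links if the driver has not created the device yet.
  if [ ! -e "$old_dev" ]; then
    echo "Device $old_dev not found; is the NVIDIA driver installed?" >&2
    return 1
  fi

  # Create nvidia1 .. nvidia(share_count - 1), each pointing at nvidia0.
  while [ "$i" -lt "$share_count" ]; do
    new_dev="${dev_dir}/nvidia${i}"
    if [ -e "$new_dev" ] && [ ! -L "$new_dev" ]; then
      # A real device node already lives here; leave it alone.
      echo "Skipping $new_dev: a real device already exists" >&2
    else
      echo "Linking $old_dev -> $new_dev"
      ln -sf "$old_dev" "$new_dev"
    fi
    i=$((i + 1))
  done
}
```

Calling share_gpu_device /dev 16 as root on the node should behave like the original script on a single-GPU machine. Pointing it at a scratch directory that contains an empty file named nvidia0 lets you check the resulting links before running it for real.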
@danisla That's a nice, simple solution for a use case with basic scheduling.
Another trick would be to make your node's RAM match the GPU memory.
@danisla, could you please clarify where exactly this bash script should be applied?
@danisla, thank you, but it seems that it doesn't work with EKS.
@HenriTEL That would work when we have just a single node. But with multiple nodes whose GPUs have different amounts of memory, it becomes increasingly difficult to ensure proper scheduling and to avoid overcommitting our resources. @ALL I will try to create a PoC for it, since there are enough people interested in this. I will open a pull request here when I'm done.
@ndesh26, in our case it doesn't work even with a single node, with or without the NVIDIA device plugin, but that may be specific to EKS.
I was able to find some unofficial support for this in the NVIDIA plugin. Is there similar support in this plugin?
I'm interested in this feature and would be willing to implement it. I would like to know whether other people would be interested in such a feature, and whether the maintainers of this repo would like to add it.