GPU support #2115
Comments
@kozikow Have you enabled the feature gate
After adding "--feature-gates=Accelerators=true" to minikube, the container starts, but I get CUDA library errors (https://gist.github.com/kozikow/be44083d4812c554d84271edf01853aa). The same workflow succeeds on GKE and in nvidia-docker. For reference, another GPU workflow succeeds with a similar pod configuration: Image: gcr.io/tensorflow/tensorflow:latest-gpu
I suspected a CUDA library mismatch between my host machine and the container. However, the container starts successfully in nvidia-docker. Is there any magic that nvidia-docker is doing that I am missing?
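One thing worth ruling out here: nvidia-docker makes the host's NVIDIA device files and driver libraries available inside the container, which a plain container runtime does not do by itself. A minimal sketch for checking whether the usual device nodes are visible follows. The path list and the `missing_paths` helper are assumptions for illustration, not something taken from this thread; run it both on the host and inside the container to compare.

```python
import os

# Device nodes the NVIDIA driver typically creates on a Linux host.
# Assumed list for illustration: /dev/nvidia0 is the first GPU, and
# multi-GPU machines have further /dev/nvidiaN entries.
EXPECTED_DEVICE_NODES = ("/dev/nvidiactl", "/dev/nvidia-uvm", "/dev/nvidia0")


def missing_paths(paths, exists=os.path.exists):
    """Return the subset of `paths` that does not exist.

    `exists` is injectable so the check can be exercised without a GPU.
    """
    return [p for p in paths if not exists(p)]


if __name__ == "__main__":
    missing = missing_paths(EXPECTED_DEVICE_NODES)
    if missing:
        print("Missing device nodes:", ", ".join(missing))
    else:
        print("All expected NVIDIA device nodes are present.")
```

If the nodes exist on the host but not in the container, the problem is likely device or library exposure rather than a CUDA version mismatch.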
We do not support this use case yet because it wasn't clear whether minikube is used for spinning up k8s clusters on Linux hosts. On hosts where minikube spins up a VM, it is harder to consume GPUs, since that requires isolating extra GPUs on the host and attaching them to the minikube VM.
@vishh We already have an option to use kubeadm. The "none" driver is officially supported for localkube and possibly soon the kubeadm bootstrapper. The "none" driver runs the cluster directly on the host without a VM.
Is there a recommended way to test GPU workloads locally? minikube with I suspect the only missing link is some CUDA library trickery that GKE or nvidia-docker is doing. I have been reading the code of nvidia-docker (https://github.com/NVIDIA/nvidia-docker) and the GKE GPU installer (https://github.com/ContainerEngine/accelerators/tree/master/cos-nvidia-gpu-installer), but I haven't found anything yet.
Same problem here. It would be great if you could advise how to solve this.
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/remove-lifecycle stale
FWIW I described our current setup for developing GPU containers on kubernetes in https://tensorflight.blog/2018/02/23/dev-environment-for-gke/ . Please let me know if minikube gets GPU support or there is any other way.
I too would like to see this happen 😀
FYI: This feature is being tackled via the ML Working Group.
Here's my setup, in case it's valuable for the continuation of this RFE.
minikube version: v0.26.1
Starting minikube:
Device plugin being installed:
Querying the node to see if it sees the GPU:
If I understand things correctly the
However, no matter what I do, I can't seem to get the node to recognize the GPU as an available resource. I know this isn't officially supported yet :) but I thought I'd contribute my env to help with the progression.
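For anyone repeating the "does the node see the GPU" check above, here is a minimal sketch of what to look for in a node object. It assumes the GPU would be advertised under `nvidia.com/gpu` (the name used by the NVIDIA device plugin) or `alpha.kubernetes.io/nvidia-gpu` (the name used by the older Accelerators feature gate). The `gpu_capacity` helper and the sample node objects are hypothetical, not output from this setup.

```python
import json

# Resource names under which a node may advertise NVIDIA GPUs (see assumptions above).
GPU_RESOURCE_NAMES = ("nvidia.com/gpu", "alpha.kubernetes.io/nvidia-gpu")


def gpu_capacity(node):
    """Return {resource_name: count} for GPU resources found in node.status.capacity."""
    capacity = node.get("status", {}).get("capacity", {})
    return {name: int(capacity[name]) for name in GPU_RESOURCE_NAMES if name in capacity}


# Hypothetical, trimmed-down node objects for illustration only.
node_without_gpu = json.loads('{"status": {"capacity": {"cpu": "8", "memory": "32Gi"}}}')
node_with_gpu = json.loads('{"status": {"capacity": {"cpu": "8", "nvidia.com/gpu": "1"}}}')

print(gpu_capacity(node_without_gpu))  # empty dict: no GPU resource advertised
print(gpu_capacity(node_with_gpu))     # one GPU advertised by the device plugin
```

In practice the node object would come from `kubectl get node <node-name> -o json`. An empty result means the kubelet has not registered any GPU resource, which matches the symptom described here.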
/assign
@Nick-Harvey Didn't work for minikube v0.28.0
Hello,
Is this a BUG REPORT or FEATURE REQUEST? (choose one):
FEATURE REQUEST
**Description**:
It would be really great to run GPU workloads on minikube.
I successfully ran GPU workload on GKE using instructions from https://docs.google.com/document/d/1hYOqaOVSu68ZaUsmCKwyP6kf6UtlTMiE_hxoJ2uUqvs/edit# . I was looking to replicate this in minikube.
Example pod that successfully runs GPU workload on GKE:
Expected output:
[[ 3. 3. 3.] [ 3. 3. 3.]]
I was looking to replicate this workflow within minikube. I have a correct local GPU setup that runs the image in nvidia-docker.
I installed and started local minikube with:
I copied all required CUDA and NVIDIA libraries into the local host directory
/home/kubernetes/bin/nvidia/lib
I added GPU node capacity:
Yet when I start the same pod as on GKE, I get pod status "CreateContainerConfigError" and the event:
kubelet, kozikowpc Error: GPUs are not supported
I've seen some code for GPU support in minikube: https://github.com/kubernetes/minikube/blob/master/vendor/k8s.io/kubernetes/pkg/kubelet/gpu/nvidia/nvidia_gpu_manager.go . Is there anything I am doing wrong?