
GPU support #2115

Closed
kozikow opened this issue Oct 24, 2017 · 15 comments
Assignees
Labels
kind/feature Categorizes issue or PR as related to a new feature.

Comments

@kozikow
Contributor

kozikow commented Oct 24, 2017

Is this a BUG REPORT or FEATURE REQUEST? (choose one):

FEATURE REQUEST

**Description**:

It would be really great to run GPU workloads on minikube.

I successfully ran a GPU workload on GKE using the instructions from https://docs.google.com/document/d/1hYOqaOVSu68ZaUsmCKwyP6kf6UtlTMiE_hxoJ2uUqvs/edit# . I was looking to replicate this in minikube.

Example pod that successfully runs GPU workload on GKE:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-container
spec:
  volumes:
    - name: nvidia-libraries
      hostPath:
        path: /home/kubernetes/bin/nvidia/lib
  containers:
  - name: gpu-container
    image: mxnet/python:gpu
    args:
      - python
      - -c
      - "import mxnet as mx; a = mx.nd.ones((2, 3), mx.gpu()); b = a * 2 + 1; print b.asnumpy()"
    resources:
      limits:
        alpha.kubernetes.io/nvidia-gpu: 1
    volumeMounts:
    - name: nvidia-libraries
      mountPath: /usr/local/nvidia/lib64
```

Expected output: `[[ 3. 3. 3.] [ 3. 3. 3.]]`

I was looking to replicate this workflow within minikube. I have a correct local GPU setup that runs the image under nvidia-docker.

I installed and started local minikube with:

```shell
wget https://storage.googleapis.com/minikube-builds/2050/minikube-linux-amd64 && mv minikube-linux-amd64 /usr/bin/minikube && chmod +x /usr/bin/minikube
curl -Lo kubectl https://storage.googleapis.com/kubernetes-release/release/v1.8.0/bin/linux/amd64/kubectl && chmod +x kubectl
sudo gsutil cp gs://minikube/k8sReleases/v1.8.0/localkube-linux-amd64 /usr/local/bin/localkube && chmod +x localkube
```

```shell
export MINIKUBE_WANTUPDATENOTIFICATION=false
export MINIKUBE_WANTREPORTERRORPROMPT=false
export MINIKUBE_HOME=$HOME
export CHANGE_MINIKUBE_NONE_USER=true
mkdir $HOME/.kube || true
touch $HOME/.kube/config
export KUBECONFIG=$HOME/.kube/config
sudo -E minikube start --vm-driver=none
```

I copied all the required CUDA and NVIDIA libraries into the local host directory /home/kubernetes/bin/nvidia/lib.

I added GPU node capacity:

```shell
kubectl proxy
curl --header "Content-Type: application/json-patch+json" \
  --request PATCH \
  --data '[{"op": "add", "path": "/status/capacity/alpha.kubernetes.io~1nvidia-gpu", "value": "1"}]' \
  http://127.0.0.1:8001/api/v1/nodes/kozikowpc/status
```
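(Aside: the `~1` in the patch path is JSON Pointer escaping per RFC 6901, which is how the `/` inside the resource name `alpha.kubernetes.io/nvidia-gpu` is encoded. A small sketch of building such a patch body, with a hypothetical helper name:)

```python
# JSON Patch paths use JSON Pointer (RFC 6901) escaping: "~" becomes "~0"
# and "/" becomes "~1", which is why the resource name above appears as
# alpha.kubernetes.io~1nvidia-gpu in the patch path.
def escape_json_pointer_token(token):
    # Order matters: escape "~" first so a literal "/" never turns into "~1"
    # before the "~" pass could double-escape it.
    return token.replace("~", "~0").replace("/", "~1")

resource = "alpha.kubernetes.io/nvidia-gpu"
patch = [{
    "op": "add",
    "path": "/status/capacity/" + escape_json_pointer_token(resource),
    "value": "1",
}]
print(patch[0]["path"])  # /status/capacity/alpha.kubernetes.io~1nvidia-gpu
```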

Yet when I start the same pod as on GKE, I get pod status "CreateContainerConfigError" and the event `kubelet, kozikowpc Error: GPUs are not supported`. I've seen some code for GPU support in minikube: https://github.com/kubernetes/minikube/blob/master/vendor/k8s.io/kubernetes/pkg/kubelet/gpu/nvidia/nvidia_gpu_manager.go . Is there anything I am doing wrong?

@r2d4
Contributor

r2d4 commented Oct 24, 2017

@kozikow Have you enabled the feature gate Accelerators=true? Not sure if that's still required, but a Google search turned it up.

@kozikow
Contributor Author

kozikow commented Oct 24, 2017

After adding `--feature-gates=Accelerators=true` to minikube, the container starts, but I get CUDA library errors: https://gist.github.com/kozikow/be44083d4812c554d84271edf01853aa . The same workflow succeeds on GKE or in nvidia-docker.

For reference, another GPU workflow succeeds with a similar pod configuration:

Image: gcr.io/tensorflow/tensorflow:latest-gpu
Code:

```python
import tensorflow as tf

with tf.device('/gpu:0'):
    a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
    b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
    c = tf.matmul(a, b)

with tf.Session() as sess:
    print(sess.run(c))
```

I suspected a CUDA library mismatch between my host machine and the container. However, the same image starts successfully in nvidia-docker. Is there some magic that nvidia-docker is doing that I am missing?
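(For context, my rough understanding is that nvidia-docker's "magic" is injecting the GPU device nodes and the host's driver libraries into the container at run time. A simplified sketch of the extra `docker run` flags it roughly adds; the function name and the sample device/volume values below are illustrative, not nvidia-docker's actual API:)

```python
def nvidia_docker_extra_args(device_nodes, driver_volume):
    """Approximate the extra flags nvidia-docker (v1) passes to `docker run`:
    one --device flag per GPU/control device node, plus a read-only volume
    carrying the host's driver libraries (matching the driver version)."""
    args = []
    for dev in device_nodes:
        args.append("--device=" + dev)
    args.append("--volume={}:/usr/local/nvidia:ro".format(driver_volume))
    return args

# Illustrative values; the real tool discovers these from the host driver.
args = nvidia_docker_extra_args(
    ["/dev/nvidiactl", "/dev/nvidia-uvm", "/dev/nvidia0"],
    "nvidia_driver_384.90",
)
print(" ".join(args))
```

This would explain why the same image works under nvidia-docker but fails in a plain pod: without those mounts, the container only sees whatever libraries were copied into the hostPath volume by hand.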

@vishh
Contributor

vishh commented Oct 25, 2017

We do not support this use case yet because it wasn't clear whether minikube would be used for spinning up k8s clusters on Linux hosts. On hosts where minikube spins up a VM, it is harder to consume GPUs, since that requires isolating extra GPUs on the host and attaching them to the minikube VM.
@dlorenc is vm-driver=none supported officially by minikube? Would it make sense to use kubeadm instead?

@r2d4
Contributor

r2d4 commented Oct 25, 2017

@vishh We already have an option to use kubeadm. The "none" driver is officially supported for localkube, and possibly soon for the kubeadm bootstrapper.

The "none" driver runs the cluster directly on the host without a VM.

@kozikow
Contributor Author

kozikow commented Oct 26, 2017

Is there a recommended way to test GPU workloads locally? minikube with `--vm-driver=none --feature-gates=Accelerators=true` gets pretty close to achieving this: some GPU containers run successfully.

I suspect the only missing link is some CUDA library trickery that GKE or nvidia-docker is doing. I have been reading the code of nvidia-docker ( https://github.com/NVIDIA/nvidia-docker ) and the GKE GPU installer ( https://github.com/ContainerEngine/accelerators/tree/master/cos-nvidia-gpu-installer ), but I haven't found anything yet.

@sebastianlach

Same problem here. It would be great if you could advise how to solve it.

@r2d4 r2d4 added the kind/feature Categorizes issue or PR as related to a new feature. label Nov 9, 2017
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 7, 2018
@sebastianlach

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 13, 2018
@kozikow
Contributor Author

kozikow commented Feb 23, 2018

FWIW, I described our current setup for developing GPU containers on Kubernetes in https://tensorflight.blog/2018/02/23/dev-environment-for-gke/ . Please let me know if minikube gets GPU support, or if there is any other way to do this.

@Nick-Harvey

I too would like to see this happen 😀

@vishh
Contributor

vishh commented Apr 24, 2018

FYI: This feature is being tackled via the ML Working Group.

@Nick-Harvey

Nick-Harvey commented Apr 25, 2018

Here's my setup, in case it's valuable for the continuation of this RFE:

minikube version: v0.26.1
Kubernetes version being created: 1.10

Starting minikube:
```shell
minikube start --feature-gates=DevicePlugins=true --vm-driver none --feature-gates=Accelerators=true
```

Installing the device plugin:

```shell
kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v1.9/nvidia-device-plugin.yml
```

Querying the node to see if it sees the GPU:

```shell
$ kubectl get nodes -o=custom-columns=NAME:.metadata.name,GPUs:.status.capacity.'nvidia\.com/gpu'
NAME       GPUs
minikube
```
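(The empty `GPUs` column means `.status.capacity` on the node has no `nvidia.com/gpu` entry yet, i.e. no device plugin has registered the resource. A hypothetical helper, not part of kubectl, mirroring the lookup the custom-columns query performs against the node object:)

```python
# Read .status.capacity["nvidia.com/gpu"] from a node object, defaulting
# to "0" when the device plugin has not registered the resource.
def gpu_capacity(node):
    return node.get("status", {}).get("capacity", {}).get("nvidia.com/gpu", "0")

# Sample node objects (illustrative, shaped like the Kubernetes API response):
node_without_gpu = {"status": {"capacity": {"cpu": "8", "memory": "16Gi"}}}
node_with_gpu = {"status": {"capacity": {"cpu": "8", "nvidia.com/gpu": "1"}}}

print(gpu_capacity(node_without_gpu))  # 0  -> the empty column above
print(gpu_capacity(node_with_gpu))     # 1  -> what a working setup reports
```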

If I understand things correctly, `--vm-driver none` leverages the existing Docker runtime on the host, which I have set to nvidia-docker.

However, no matter what I do, I can't seem to get the node to recognize the GPU as an available resource. I know this isn't officially supported yet :) but I thought I'd contribute my env to help with the progression.

**Edit:** Figured it out. I was using the 1.9 NVIDIA device plugin rather than the 1.10 one. Once I swapped those out, the node recognized the GPU.

@rohitagarwal003
Member

/assign

@aclowkey

@Nick-Harvey This didn't work for me on minikube v0.28.0.

@rohitagarwal003
Member

Hello,
I have a PR that adds GPU support to minikube: #2936. It would be really helpful if people on this thread tried it out. The instructions are in the PR.
Thank you!
