NVIDIA driver installation support on GPU instances #645
Conversation
Thanks for your pull request. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). 📝 Please follow instructions at https://github.com/kubernetes/kubernetes/wiki/CLA-FAQ to sign the CLA. It may take a couple minutes for the CLA signature to be fully registered; after that, please reply here with a new comment and we'll verify. Thanks.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
Codecov Report
@@            Coverage Diff             @@
##            master     #645      +/-  ##
==========================================
- Coverage    38.26%   37.14%    -1.13%
==========================================
  Files           51       51
  Lines         3316     3201      -115
==========================================
- Hits          1269     1189       -80
+ Misses        1845     1836        -9
+ Partials       202      176       -26
==========================================

Continue to review the full report at Codecov.
cc @jollinshead @redbaron I believe you're currently running batch workloads on your kube-aws clusters. Would you also be willing to run machine learning workloads utilizing GPUs? 😃
# # (Experimental) GPU Driver installation support
# # Currently, only the Nvidia driver is supported.
# # This setting takes effect only when the node's instance family is p2 or g2.
# # Otherwise, installation will be skipped even if enabled.
According to https://github.com/kubernetes-incubator/kube-aws/pull/645/files#diff-5e5dcac90c0e906cb335a42b0352ce9cR47, it seems like kube-aws emits a validation error when GPU support is enabled on a node pool with an instance type other than p2 or g2?
Ah, sorry! I missed that there is only a warning.
Anyway, I believe we'd better make it an error rather than a warning, because a user who sets this clearly intends to enable GPU support, but kube-aws was unable to do so.
WDYT?
Oh, ok. Yes, I agree with you. kube-aws should prohibit `enabled: true` with instance types that don't support GPUs. I will update my code.
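For illustration, a hypothetical `cluster.yaml` fragment that the agreed-upon behavior would reject at `kube-aws validate` time. The instance type and pool name here are made up for the example; only the `gpu.nvidia` keys come from this PR:

```
worker:
  nodePools:
    - name: pool1
      # t2.medium has no GPU, so combining it with gpu.nvidia.enabled: true
      # should now fail validation instead of merely emitting a warning.
      instanceType: t2.medium
      gpu:
        nvidia:
          enabled: true
          version: "375.66"
```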
ExecStart=/opt/nvidia/current/bin/nvidia-persistenced --user nvidia-persistenced --no-persistence-mode --verbose
ExecStopPost=/bin/rm -rf /var/run/nvidia-persistenced

- path: /opt/nvidia-build/nvidia-start.service
Just for my education, would you mind sharing how this systemd unit gets installed into systemd?
Can we set up the systemd unit via the `units` section of cloud-config like the others?
Please see the other comment; `nvidia-install.sh` does the installation.
tar -C ${ARTIFACT_DIR} -cvj ${TOOLS} > tools-${VERSION}.tar.bz2
tar -C ${ARTIFACT_DIR}/kernel -cvj $(basename -a ${ARTIFACT_DIR}/kernel/*.ko) > modules-${COMBINED_VERSION}.tar.bz2

- path: /opt/nvidia-build/build.sh
AFAICS, coreos-nvidia supports cross-building against any version of Container Linux.
Then, just out of curiosity: can we run `build.sh` locally, e.g. in a Vagrant machine hosting Container Linux, export the built assets to disk, and then embed them in cloud-config or put them on S3 for faster startup of GPU-enabled nodes?
Yeah, probably. I will try this in my local Vagrant!
As you pointed out, building the libraries, kernel modules, and nvidia tools succeeded in Vagrant!
Putting pre-built binaries onto GPU nodes directly does make startup faster. However, the `kube-aws up` process would become more complex, and it would require users to install Vagrant and VirtualBox.
Honestly speaking, the build process takes several minutes (probably 5-10 min, depending on the speed of downloading the CoreOS dev container and the nvidia installer). I believe this duration would be acceptable for many users because a GPU node pool usually doesn't need to scale as quickly as a normal node pool hosting service pods.
What do you think? Do you prefer a local build and putting pre-built binaries onto the nodes directly for faster startup?
> I believe this duration would be acceptable for many users because a GPU node pool usually doesn't need to scale as quickly as a normal node pool hosting service pods.

I completely agree with you here 👍
The local build feature could be an extra thing we may or may not add in the future.
}

// This function is used when rendering cloud-config-worker
func (c NvidiaSetting) IsEnabledOn(instanceType string) bool {
I like the nice naming 👍
# # Make sure to choose 'docker' as the container runtime when enabling this feature.
# gpu:
#   nvidia:
#     enabled: true
I guess an installed driver can become unusable once Container Linux is updated afterwards, due to an updated kernel. Do you think so too?
Then, I believe we'd better document it. Maybe something like:
"Ensure that automatic Container Linux updates are disabled (they are disabled by default, btw). Otherwise the installed driver may stop working when an OS update results in an updated kernel."
would work.
Yes, you're absolutely right. I will add this to the comment.
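Putting the pieces of this thread together, the documented `cluster.yaml` block could end up looking something like the sketch below; the exact wording of the auto-update note is only a suggestion:

```
# # (Experimental) GPU Driver installation support
# # Currently, only the Nvidia driver is supported.
# # This setting takes effect only when the node's instance family is p2 or g2;
# # otherwise, enabling it is a validation error.
# # Make sure to choose 'docker' as the container runtime when enabling this feature.
# # Note: keep automatic Container Linux updates disabled (the default), or the
# # installed driver may stop working after an OS update brings a new kernel.
# gpu:
#   nvidia:
#     enabled: true
#     version: "375.66"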
cp *.service /etc/systemd/system/
systemctl daemon-reload
systemctl enable nvidia-start.service
systemctl start nvidia-start.service
`nvidia-install.sh` installs `nvidia-start.service` and `nvidia-persistenced.service` into systemd, and this script only starts `nvidia-start.service`. That unit insmods the nvidia kernel module, and then udevadm spawns several actions from `71-nvidia.rules`, which include starting `nvidia-persistenced.service`.
Thanks! So, basically we're copying systemd unit files written via `write_files` into `/etc/systemd/system`?
Then, it is possible to just write those units directly into the `units` section, right?
If we put `nvidia-start.service` in the `units` section, can we control when the service starts? Unless we define `enabled` explicitly in the unit definition, it doesn't start automatically, right?
I guess so.
You can even enable it by default.
More concretely, my idea is modifying `nvidia-start.service` to something like:

[Unit]
Description=Start NVIDIA daemon(?)
After=local-fs.target
Before=kubelet.service

[Service]
Type=oneshot
RemainAfterExit=true
ExecStartPre=/opt/bin/retry /opt/nvidia/current/bin/nvidia-start.sh
ExecStart=/bin/true

[Install]
RequiredBy=kubelet.service

And trigger it via a systemd dependency from a newly introduced `nvidia-install.service`:

[Unit]
Description=Install NVIDIA driver(?)
After=local-fs.target
Before=nvidia-start.service

[Service]
Type=oneshot
RemainAfterExit=true
ExecStartPre=/opt/bin/retry /opt/nvidia/current/bin/nvidia-install.sh
ExecStart=/bin/true

[Install]
RequiredBy=nvidia-start.service

where `/opt/bin/retry` is:

#!/usr/bin/bash
set -e
# could be improved to a finite loop
while true; do
  if "$@"; then
    exit 0
  fi
  echo retrying "$@"
  # could be improved to an exponential backoff
  sleep 1
done

and omit `ExecStartPre=/opt/nvidia-build/build-and-install.sh` from `kubelet.service`.
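For reference, a minimal sketch (not the actual kube-aws template) of how such a unit could be declared directly in the cloud-config `units` section, with `enable: true` so no `systemctl` calls are needed; the paths are taken from the suggestion above:

```
coreos:
  units:
    - name: nvidia-start.service
      enable: true
      content: |
        [Unit]
        Before=kubelet.service
        [Service]
        Type=oneshot
        RemainAfterExit=true
        ExecStartPre=/opt/bin/retry /opt/nvidia/current/bin/nvidia-start.sh
        ExecStart=/bin/true
        [Install]
        RequiredBy=kubelet.service
```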
systemctl daemon-reload
systemctl enable nvidia-start.service
systemctl start nvidia-start.service
It may sound like a nit, but if we could transform the `build-and-install.sh` in `ExecStartPre` into a systemd unit as suggested in #645 (comment), then `systemctl daemon-reload`, `enable`, and `start` could be omitted altogether, and such dependencies could be handled completely by systemd rather than by a bash script?
@mumoshu Yes, sounds nice! I'll fix this.
Deleted the `systemctl` commands from the bash script; the unit dependency above is introduced instead. `nvidia-install.service`, which just invokes `build-and-install.sh`, is implemented as `Type=oneshot` because `nvidia-start.service` should wait until `nvidia-install.service` has succeeded completely. To make `build-and-install.sh` retryable, `/opt/nvidia-build/util/retry.sh` is introduced, because `Type=oneshot` and `Restart=always` can't be combined in systemd.
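To visualize the chain described here (kubelet → nvidia-start → nvidia-install), a condensed sketch of the units as they might appear in the cloud-config `units` section; the script paths are taken from the comments above and may differ from the final code:

```
coreos:
  units:
    - name: nvidia-install.service
      content: |
        [Unit]
        Before=nvidia-start.service
        [Service]
        Type=oneshot
        RemainAfterExit=true
        # retry.sh wraps the build because Type=oneshot cannot use Restart=always.
        ExecStart=/opt/nvidia-build/util/retry.sh /opt/nvidia-build/build-and-install.sh
        [Install]
        RequiredBy=nvidia-start.service
    - name: nvidia-start.service
      content: |
        [Unit]
        Requires=nvidia-install.service
        After=nvidia-install.service
        Before=kubelet.service
        [Service]
        Type=oneshot
        RemainAfterExit=true
        ExecStart=/opt/nvidia/current/bin/nvidia-start.sh
        [Install]
        RequiredBy=kubelet.service
```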
@mumoshu I updated the systemd units' dependencies. I'd be glad if you could take a look!
@everpeace LGTM. Thanks for your efforts on this great feature 👍
* kubernetes-incubator/master:
  - Fix "install-kube-system" script when "clusterAutoscaler" is disabled.
  - Remove obsolete etcd locking logic
  - Re: cluster-autoscaler support
  - Make `go test` timeout longer enough for Travis
  - Fixes kubernetes-retired#667
  - NVIDIA driver installation support on GPU instances (kubernetes-retired#645)
  - Make kubelet flags more consistent
  - Fix taint being assigned as labels
  - Avoid unnecessary node replacements when TLS bootstrapping is enabled (kubernetes-retired#639)
  - Update Kubernetes dashboard to v1.6.1.
  - Update calico to v2.2.1.
  - Fix typo in help message
Has anyone tested this w/ the new G3 instances yet?
NVIDIA driver installation support on GPU instances (kubernetes-retired#645)

AWS offers Nvidia GPU-ready instance type families (P2 and G2). And, of course, Kubernetes has supported GPU resource scheduling since 1.6. However, the Nvidia driver is not installed in the default CoreOS AMI used by kube-aws. So, let's support it!

This implements automatic installation of the Nvidia GPU driver. Some of the driver installation scripts are borrowed from [Clarifai/coreos-nvidia](https://github.com/Clarifai/coreos-nvidia/).

## Design summary

### Configuration and what will happen

The new configuration for this feature is really simple. `worker.nodePool[i].gpu.nvidia.{enabled,version}` is introduced in `cluster.yaml`.

- The default value of `enabled` is false.
- The user will be warned if
  - the user sets `enabled: true` when `instanceType` doesn't support GPU. In this case the configuration will be ignored.
  - the user sets `enabled: false` when `instanceType` does support GPU.
- When `enabled: true` on a GPU-supported instance type,
  - the nvidia driver will be installed automatically on each node in the node pool.
  - The installation happens just before `kubelet.service` starts (see below).
  - `kubelet` will start with [`--feature-gates="Accelerators=true"`](https://github.com/everpeace/kube-aws/blob/feature/nvidia-gpu/core/controlplane/config/templates/cloud-config-worker#L212-L214),
  - so containers can mount the nvidia driver [like this](https://gist.github.com/everpeace/9e03050467d5ef5f66b7ce96b5fefa72#file-pod-yaml-L30-L53).
  - Several labels are assigned to the node so that pods can be scheduled onto the appropriate GPU model and driver version using `nodeAffinity`:
    - `alpha.kubernetes.io/nvidia-gpu-name=<GPU hardware type name>`
    - `kube-aws.coreos.com/gpu=nvidia`
    - `kube-aws.coreos.com/nvidia-gpu-version=<version>`
  - Because substitutions are not used in unit definitions, I introduced `/etc/default/kubelet` for defining these label values in [this commit](kubernetes-retired@5c59944).

### Driver installation process

Most of the installation script is borrowed from [Clarifai/coreos-nvidia](https://github.com/Clarifai/coreos-nvidia/). In particular, for device node installation, I referenced Clarifai/coreos-nvidia#4. A summary of the installation process:

- [`kubelet.service`](https://github.com/everpeace/kube-aws/blob/feature/nvidia-gpu/core/controlplane/config/templates/cloud-config-worker#L144-L147) requires [`nvidia-start.service`](https://github.com/everpeace/kube-aws/blob/feature/nvidia-gpu/core/controlplane/config/templates/cloud-config-worker#L456-L471).
- [`nvidia-start.service`](https://github.com/everpeace/kube-aws/blob/feature/nvidia-gpu/core/controlplane/config/templates/cloud-config-worker#L456-L471) invokes [`build-and-install.sh`](https://github.com/everpeace/kube-aws/blob/feature/nvidia-gpu/core/controlplane/config/templates/cloud-config-worker#L918-L947), which installs the nvidia drivers and kernel module files, via `ExecStartPre`. `nvidia-start.service` will create the device nodes (`nvidiactl` and `nvidia0,1,...`). Other dynamic device nodes are controlled by `udevadm` (the configuration is in [this rule file](https://github.com/everpeace/kube-aws/blob/feature/nvidia-gpu/core/controlplane/config/templates/cloud-config-worker#L905-L939)).
- `nvidia-start.service` is `Type=oneshot` because `kubelet.service` should wait until `nvidia-start.sh` has completely succeeded.
- A `Restart` policy cannot be used with `Type=oneshot`, so `nvidia-start.service` doesn't use systemd's retry feature; a manual `retry.sh` is used instead.
- [nvidia-persistenced](https://docs.nvidia.com/deploy/driver-persistence/#persistence-daemon) is also enabled to speed up startup. This service is started/stopped via `udevadm` too.

## How to try

1. Build `kube-aws` on this branch.
2. `kube-aws up` with the minimal node pool configuration below:

```
worker:
  nodePools:
    - name: p2xlarge
      count: 1
      instanceType: p2.xlarge
      rootVolume:
        size: 30
        type: gp2
      gpu:
        nvidia:
          enabled: true
          version: "375.66"
```

3. Check `kubectl get nodes --show-labels`. You'll see one node with GPU-related labels.
4. Try starting this [pod](https://gist.github.com/everpeace/9e03050467d5ef5f66b7ce96b5fefa72#file-pod-yaml):

```
kubectl create -f pod.yaml
```

5. The log reports that a sample matrix multiplication was computed on the GPUs:

```
kubectl logs gpu-pod
```

## Full changelog

* Add `/etc/default/kubelet` to worker nodes.
* Add nvidia driver installation support.
* Add GPU-related config tests.
* It should be an error when the user sets `gpu.nvidia.enabled: true` with GPU-unsupported instance types. This change is caused by: kubernetes-retired#645 (comment)
* Add a note warning that the driver may stop working when the OS is updated. This change is caused by: kubernetes-retired#645 (comment)
* Move nvidia-{start,persistenced}.service to the `coreos.units` section, and the creation of the nvidia-persistenced user to the `users` section, too. This change is caused by: kubernetes-retired#645 (comment)
* Introduce the unit dependency kubelet --> nvidia-start --> nvidia-install. Deleted the `systemctl` commands from the bash script; the unit dependency above is introduced instead. nvidia-install.service, which just invokes build-and-install.sh, is implemented as Type=oneshot because nvidia-start should wait until nvidia-install.service has succeeded completely. To make build-and-install.sh retryable, /opt/nvidia-build/util/retry.sh is introduced, because Type=oneshot and Restart=always can't be combined in systemd.
* Delete nvidia-install.service; now nvidia-start.service invokes build-and-install.sh via ExecStartPre with retry.sh. kubelet.service 'Requires' and 'After' nvidia-start.service.
Hello community! Thank you for developing this great tool! I'm glad to have the opportunity to contribute to this project, because my colleague @mumoshu always encourages me.
As everybody knows, AWS offers Nvidia GPU-ready instance type families (P2 and G2). And, of course, Kubernetes has supported GPU resource scheduling since 1.6. However, the Nvidia driver is not installed in the default CoreOS AMI used by kube-aws. So, let's support it!
This PR implements automatic installation of the Nvidia GPU driver. I borrowed some of the driver installation scripts from Clarifai/coreos-nvidia.
Design summary
Configuration and what will happen
New configuration for this feature is really simple.
`worker.nodePool[i].gpu.nvidia.{enabled,version}` is introduced in `cluster.yaml`.

- The default value of `enabled` is false.
- The user will be warned if
  - the user sets `enabled: true` when `instanceType` doesn't support GPU. In this case the configuration will be ignored.
  - the user sets `enabled: false` when `instanceType` does support GPU.
- When `enabled: true` on a GPU-supported instance type,
  - the nvidia driver will be installed automatically on each node in the node pool.
  - The installation happens just before `kubelet.service` starts (see below).
  - `kubelet` will start with `--feature-gates="Accelerators=true"`,
  - so containers can mount the nvidia driver (see the example pod in the gist linked above).
  - Several labels are assigned to the node so that pods can be scheduled onto the appropriate GPU model and driver version using `nodeAffinity`:
    - `alpha.kubernetes.io/nvidia-gpu-name=<GPU hardware type name>`
    - `kube-aws.coreos.com/gpu=nvidia`
    - `kube-aws.coreos.com/nvidia-gpu-version=<version>`
  - Because substitutions are not used in unit definitions, I introduced `/etc/default/kubelet` for defining these label values in this commit.

Driver installation process
Most of the installation script is borrowed from Clarifai/coreos-nvidia. In particular, for device node installation, I referenced Clarifai/coreos-nvidia#4. Below is a summary of the installation process.
- `kubelet.service` requires `nvidia-start.service`.
- `nvidia-start.service` invokes `build-and-install.sh`, which installs the nvidia drivers and kernel module files, via `ExecStartPre`. `nvidia-start.service` will create the device nodes (`nvidiactl` and `nvidia0,1,...`). Other dynamic device nodes are controlled by `udevadm` (the configuration is in this rule file).
- `nvidia-start.service` is `Type=oneshot` because `kubelet.service` should wait until `nvidia-start.sh` has completely succeeded. A `Restart` policy cannot be used with `Type=oneshot`, so `nvidia-start.service` doesn't use systemd's retry feature; a manual `retry.sh` is used instead.
- nvidia-persistenced is also enabled to speed up startup. This service is started/stopped via `udevadm` too.

How to try

1. Build `kube-aws` on this branch.
2. `kube-aws up` with the minimal node pool configuration shown in the commit message above.
3. Check `kubectl get nodes --show-labels`. You'll see one node with GPU-related labels.
4. Try starting the example pod with `kubectl create -f pod.yaml` (a sketch of such a pod is shown below).
5. The log reports that a sample matrix multiplication was computed on the GPUs: `kubectl logs gpu-pod`

Feedback is always welcome!!
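Since the referenced gist isn't reproduced here, the following is only a rough sketch of what such a GPU pod could look like on Kubernetes 1.6 with `Accelerators=true`. The container image, command, and host/mount paths are assumptions; the node label and the `alpha.kubernetes.io/nvidia-gpu` resource are the ones discussed above:

```
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  # Land on a node from the GPU node pool, using one of the labels kube-aws assigns.
  nodeSelector:
    "kube-aws.coreos.com/gpu": "nvidia"
  volumes:
    # Driver files installed on the host by this PR (path assumed from the comments above).
    - name: nvidia-driver
      hostPath:
        path: /opt/nvidia/current
  containers:
    - name: cuda
      image: nvidia/cuda:8.0-runtime   # assumed CUDA base image
      command: ["/usr/local/nvidia/bin/nvidia-smi"]
      env:
        - name: LD_LIBRARY_PATH
          value: /usr/local/nvidia/lib:/usr/local/nvidia/lib64
      resources:
        limits:
          # Alpha GPU resource available in Kubernetes 1.6 with --feature-gates=Accelerators=true.
          "alpha.kubernetes.io/nvidia-gpu": 1
      volumeMounts:
        - name: nvidia-driver
          mountPath: /usr/local/nvidia
          readOnly: true
```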