The Kyma GPU driver is a remake of gardenlinux-nvidia-installer that performs node kernel detection and driver compilation in-cluster, so you do not need to maintain a container repository with images prebuilt for every possible kernel version.
The Kyma GPU driver operator requires a Kyma cluster with GPU machine types.
You can install the Kyma GPU driver operator either with the Helm chart or with a plain manifest. The Helm chart is recommended because it simplifies uninstallation and the removal of resources that are no longer needed.
Requirements:
- Helm CLI - for details, see Installing Helm
Once Helm has been set up correctly, add the repo as follows:
```bash
helm repo add kyma-gpu-driver https://kyma-project.github.io/gpu-driver
```

If you had already added this repository earlier, run `helm repo update` to retrieve the latest versions of the packages. You can then run `helm search repo kyma-gpu-driver` to see the charts.
To install the gpu-driver-operator chart, run:

```bash
helm upgrade --install gpu-driver kyma-gpu-driver/gpu-driver-operator
```
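Because Helm tracks every resource it installs under the release name, the installation can later be removed with a single command, for example:

```bash
# Remove the gpu-driver release and all resources Helm installed for it
helm uninstall gpu-driver
```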
Requirements:
- kubectl - for details, see Kubernetes tools
To install the Kyma GPU driver operator, run:

```bash
kubectl apply -f https://raw.githubusercontent.com/kyma-project/gpu-driver/refs/heads/main/config/dist/all.yaml
```
Note: If you are not familiar with the Kubernetes platform and the details of the Kyma GPU driver operator manifests, the Helm installation procedure is recommended. Applying the plain manifest with kubectl does not delete resources that were installed by an older version but removed in a newer release.
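If you later want to remove a manifest-based installation entirely, the same manifest can be deleted; a minimal sketch, assuming the published all.yaml still matches the version you applied:

```bash
# Delete everything defined in the manifest, including the CRD and,
# with it, any GpuDriver resources (assumes all.yaml matches the
# version that was originally applied)
kubectl delete -f https://raw.githubusercontent.com/kyma-project/gpu-driver/refs/heads/main/config/dist/all.yaml
```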
To tell the Kyma GPU driver operator on which nodes the GPU driver should be installed, create a GpuDriver custom resource.
```yaml
apiVersion: gpu.kyma-project.io/v1beta1
kind: GpuDriver
metadata:
  name: gpu1
spec:
  nodeSelector:
    worker.gardener.cloud/pool: gpu-worker-pool
```

The resource above specifies that all nodes from the node pool gpu-worker-pool will be instrumented with the GPU driver. You can use any other set of labels as the node selector; see the sketch below for an example. If the node selector is empty, it matches all nodes.
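For instance, a sketch of a GpuDriver that selects nodes by a custom label; the accelerator: nvidia key/value pair here is purely illustrative, not a label the operator defines:

```yaml
apiVersion: gpu.kyma-project.io/v1beta1
kind: GpuDriver
metadata:
  name: gpu-custom
spec:
  nodeSelector:
    # hypothetical label; substitute labels actually present on your nodes
    accelerator: nvidia
```

Apply it like any other resource, for example with `kubectl apply -f gpu-driver.yaml` (the file name is illustrative).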
List Helm repositories, expecting the kyma-gpu-driver https://kyma-project.github.io/gpu-driver repository to be defined:

```bash
helm repo list | grep gpu
```

List charts in the kyma-gpu-driver repository, expecting the latest version of the kyma-gpu-driver/gpu-driver-operator chart to be present:

```bash
helm search repo kyma-gpu-driver
```

List all Kubernetes nodes, expecting at least one node pool with GPU machine types, whose nodes report .status.capacity["nvidia.com/gpu"] > 0:

```bash
kubectl get nodes -o yaml
```
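Instead of scanning the full YAML, the per-node GPU capacity can also be extracted directly; a sketch using kubectl's custom-columns output (the column names are arbitrary):

```bash
# Print each node name next to its reported nvidia.com/gpu capacity;
# the dots in the resource name must be escaped in the expression
kubectl get nodes -o custom-columns='NODE:.metadata.name,GPU:.status.capacity.nvidia\.com/gpu'
```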
List Kubernetes API resources, expecting gpudrivers gpu.kyma-project.io/v1beta1 to exist:

```bash
kubectl api-resources | grep gpu
```

List Kubernetes namespaces, expecting the gpu-driver-system namespace to exist:

```bash
kubectl get ns
```

List GpuDriver custom resources, expecting one for each node pool with a GPU-capable machine type, with a node selector matching that pool:

```bash
kubectl get GpuDriver -A -o yaml
```

List the operator pods in the gpu-driver-system namespace:

```bash
kubectl get pods -n gpu-driver-system -o yaml
```
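To inspect what the operator is doing, its logs can be streamed from one of the pods listed above; a sketch where the pod name placeholder must be replaced with a real name from that output:

```bash
# Stream logs from an operator pod; substitute a pod name from the
# previous command's output for the placeholder
kubectl logs -n gpu-driver-system <operator-pod-name> --follow
```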
See the Contributing Rules.
See the Code of Conduct document.
See the license file.