Unfortunately, this is a consequence of the complexity of the AMD GPU product stack. Because the graphics and compute product lines have diverged (RDNA and CDNA in marketing terms), there are different ways of doing things, and the two links reflect that divergence. The choice between the two depends on which AMD GPU products you have in your servers. In principle, you can run a cluster of consumer-grade AMD GPUs and this k8s device plugin will work.
> Also the text or latest AMD GPU Linux driver makes me think that any Linux system should work out of the box without requiring any extra software installation, because the amdgpu driver is included in the kernel
This is correct if the Linux distribution you chose for your cluster already ships a kernel that supports the AMD GPUs you have. I am not sure which kernel version you have in your example, but it looks like MI210 support has already been upstreamed and included. In that case, you should be able to simply install Linux, install k8s, deploy the device plugin, and have things working.
However, there are situations where AMD has just launched new hardware and the distribution you picked does not move fast enough to include support for it. In such cases, you will have to follow one of the install links (depending on the SKU).
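One way to tell which situation you are in is to check whether the running kernel already provides the amdgpu module. This is only a sketch: `modinfo` and `/proc`-style tooling are standard Linux, but the exact output and module packaging vary by distribution and kernel build.

```shell
# Check whether the in-tree amdgpu driver is available for this kernel.
# "present" suggests the distro kernel already supports AMD GPUs and the
# install guides may not be needed; "absent" suggests following one of them.
if command -v modinfo >/dev/null 2>&1 && modinfo amdgpu >/dev/null 2>&1; then
  amdgpu_status="present"
else
  amdgpu_status="absent"
fi
echo "amdgpu driver: ${amdgpu_status} (kernel $(uname -r))"
```

Note that the module being packaged with the kernel does not by itself guarantee support for a very new GPU; a device that is too new for the kernel may still need the out-of-tree driver from the install guides.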
> Also since the two linked documents cover the general case and are not k8s-specific I am not sure which parts are applicable to using a GPU in k8s pods as opposed to general bare metal. For example wouldn't parts of the ROCm software be installed in an application pod rather than on the node?
Only the kernel driver installation is needed on the node. Any user-space software (the ROCm libraries and runtime) should be part of the container images.
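That split can be illustrated with a minimal pod spec: the node carries only the kernel driver and the device plugin, while the ROCm user space comes from the image. This is a sketch; the pod and container names and the image are illustrative, and `amd.com/gpu` is the extended resource name advertised by this device plugin.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: rocm-test              # example name
spec:
  containers:
    - name: rocm-app
      # ROCm user-space software ships inside the image, not on the node.
      # Image shown is an example ROCm development image.
      image: rocm/dev-ubuntu-22.04
      command: ["/bin/sh", "-c", "rocminfo"]
      resources:
        limits:
          amd.com/gpu: 1       # resource exposed by the k8s device plugin
```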
Description of errors
The README at https://github.com/ROCm/k8s-device-plugin says "ROCm kernel (Installation guide) or latest AMD GPU Linux driver (Installation guide)".
These two links seem to be different ways of doing more or less the same thing, except one only uses the amdgpu-install script while the other documents both the script and a package-manager-only approach. Are these two sets of documentation redundant, and if not, what is the difference?
Also the text "or latest AMD GPU Linux driver" makes me think that any Linux system should work out of the box without requiring any extra software installation, because the amdgpu driver is included in the kernel. Is that not the case?
Also, since the two linked documents cover the general case and are not k8s-specific, I am not sure which parts apply to using a GPU in k8s pods as opposed to bare metal in general. For example, wouldn't parts of the ROCm software be installed in an application pod rather than on the node?
Attach any links, screenshots, or additional evidence you think will be helpful.
https://rocm.docs.amd.com/projects/install-on-linux/en/latest/tutorial/quick-start.html
https://amdgpu-install.readthedocs.io/en/latest/