libnvidia-container: fix arm64 build #1915

Merged

arnaldo2792

Issue number:
N / A

Description of changes:

libnvidia-container: fix arm64 build

The arm64 build doesn't provide a symlink to
libnvidia-container.so.<version>, which causes runtime errors when the
NVIDIA prestart hooks run.
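
For readers unfamiliar with the failure mode, here is a minimal sketch of creating such a symlink by hand. It is not the change made in this PR; the function name `ensure_soname_link` and the file names in the usage note are hypothetical and only illustrate the idea.

```shell
#!/bin/sh
# Hypothetical helper (not part of this PR): make sure a link such as
# libexample.so.1 exists next to the versioned file libexample.so.1.2.3.
ensure_soname_link() {
    libdir="$1"   # directory that holds the shared library
    real="$2"     # versioned file name, e.g. libexample.so.1.2.3
    link="$3"     # link name the loader looks for, e.g. libexample.so.1

    # Refuse to create a dangling link if the real file is absent.
    if [ ! -e "${libdir}/${real}" ]; then
        echo "missing library: ${libdir}/${real}" >&2
        return 1
    fi

    # -s: symbolic link, -f: replace an existing link of the same name.
    # The target is relative, so the link stays valid if libdir is moved.
    ln -sf "${real}" "${libdir}/${link}"
}
```

Called as `ensure_soname_link /some/libdir libexample.so.1.2.3 libexample.so.1`, it leaves a relative symlink in place or returns non-zero if the versioned file is missing. The directory and file names here are placeholders, not the paths used by the package.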

Testing done:
Launched an aws-k8s-1.21 g5g instance with NVIDIA tools. The orchestrated containers were set up correctly, and I was able to call nvidia-smi:

❯ kubectl exec nvidia-device-plugin-1642187599-jb8tc -n kube-system -it -- sh -c "uname -a && nvidia-smi"
Linux nvidia-device-plugin-1642187599-jb8tc 5.10.75 #1 SMP Fri Jan 14 18:15:25 UTC 2022 aarch64 aarch64 aarch64 GNU/Linux

Tue Jan 18 17:37:52 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.82.01    Driver Version: 470.82.01    CUDA Version: N/A      |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA T4G          Off  | 00000000:00:1F.0 Off |                    0 |
| N/A   39C    P8     9W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Terms of contribution:

By submitting this pull request, I agree that this contribution is dual-licensed under the terms of both the Apache License, version 2.0, and the MIT license.


bcressey left a comment

LGTM apart from some wording nits - but would like to see the commit message updated to reflect the actual root cause (ldconfig)

ldconfig doesn't create the required symlinks for the arm64 build, which
causes runtime errors when the NVIDIA prestart hooks run.

Signed-off-by: Arnaldo Garcia Rincon <[email protected]>
arnaldo2792 merged commit 72e30c8 into bottlerocket-os:develop Jan 19, 2022
arnaldo2792 deleted the fix-libnvidia-container branch January 26, 2022 21:37
4 participants