
Update libnvidia-container and nvidia-container-toolkit #161

Merged

Conversation

@koooosh (Contributor) commented Sep 26, 2024:

Description of changes:

This updates libnvidia-container and nvidia-container-toolkit to their latest versions (both at v1.16.2).

nvidia-container-toolkit Changelog: https://github.com/NVIDIA/nvidia-container-toolkit/releases/tag/v1.16.2

libnvidia-container Changelog: https://github.com/NVIDIA/libnvidia-container/releases/tag/v1.16.2
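Since both changelogs above cover v1.16.2, one quick way to confirm the bump actually landed on a built image is to query the tools directly. This is only a sketch: it assumes nvidia-ctk and nvidia-container-cli are on the PATH of whatever shell you use on the host, which may not hold in Bottlerocket's locked-down host environment.

# Print the nvidia-container-toolkit version (nvidia-ctk ships with the toolkit).
nvidia-ctk --version
# Print the libnvidia-container CLI version.
nvidia-container-cli --version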

Testing done:

  • aws-ecs-2-nvidia x86_64 instance launches and is able to connect to a cluster (a reproduction sketch follows the test output below):
bash-5.1# docker ps
CONTAINER ID   IMAGE           COMMAND            CREATED          STATUS          PORTS     NAMES
308db6dda1d3   fedora:latest   "sleep infinity"   11 minutes ago   Up 11 minutes             ecs-test-ecs-gpu-2-nvidia-e0aae2e7b181e5fc6200
bash-5.1# docker exec -it 308db6dda1d3 nvidia-smi
Thu Sep 26 19:32:32 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.183.06             Driver Version: 535.183.06   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Tesla T4                       Off | 00000000:00:1E.0 Off |                    0 |
| N/A   29C    P0              25W /  70W |      2MiB / 15360MiB |      5%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+
  • aws-k8s-1.29-nvidia x86_64 node launches and is able to connect to a cluster:
bash-5.1# /usr/libexec/nvidia/tesla/bin/nvidia-smi
Thu Sep 26 20:31:55 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.183.06             Driver Version: 535.183.06   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Tesla T4                       Off | 00000000:00:1E.0 Off |                    0 |
| N/A   25C    P8              11W /  70W |      2MiB / 15360MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+
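For anyone reproducing the container-level checks above, here is a minimal sketch. The image tag and the use of --gpus are assumptions on my part; the ECS test above exec'd into an already-running task rather than launching a fresh container.

# Run nvidia-smi inside a fresh container on the ECS-variant host.
docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi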

Terms of contribution:

By submitting this pull request, I agree that this contribution is dual-licensed under the terms of both the Apache License, version 2.0, and the MIT license.

@larvacea (Member) left a comment:


Mechanically: the download URLs work, and the sha512 checksums are correct.
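As a rough illustration of that check (a sketch only: it assumes the sources come from the upstream GitHub tag tarballs, and the URL and checksum location the packages actually use may differ):

# Fetch the upstream source tarball for v1.16.2 and print its sha512.
curl -fsSLo libnvidia-container-v1.16.2.tar.gz \
    https://github.com/NVIDIA/libnvidia-container/archive/refs/tags/v1.16.2.tar.gz
sha512sum libnvidia-container-v1.16.2.tar.gz
# Compare the output against the sha512 pinned in the package metadata.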

@yeazelm (Contributor) left a comment:


Changes look good to me assuming testing passes.

@koooosh force-pushed the update-nvidia-container-toolkit branch from 8d4b1d2 to f1ea1e8 on September 26, 2024 19:39
@koooosh (Author) commented Sep 26, 2024:

^ updating commit messages to include version

@koooosh marked this pull request as ready for review on September 26, 2024 21:23
@koooosh merged commit 323f5af into bottlerocket-os:develop on Sep 26, 2024
2 checks passed
koooosh added a commit to koooosh/bottlerocket-core-kit-fork that referenced this pull request on Sep 28, 2024:
…tainer-toolkit

Update libnvidia-container and nvidia-container-toolkit
@koooosh deleted the update-nvidia-container-toolkit branch on October 3, 2024 01:01