ci-wheel: ensure libnccl is always installed #342

Merged
bdice merged 2 commits into rapidsai:main from jameslamb:install-nccl
Jan 6, 2026

@jameslamb jameslamb commented Jan 6, 2026

While working through the CUDA 13.1 rollout in RAPIDS (rapidsai/build-planning#236), we noticed that wheel builds requiring libnccl were failing (example: rapidsai/raft#2912 (comment)).

The root cause appears to be:

  • those builds depended on finding a system-installed libnccl
  • the available CUDA 13.1 nvidia/cuda images (base images used here) didn't come with it preinstalled

This PR proposes installing libnccl-dev(el) when the rapidsai/ci-wheel image is built, if it isn't already present in the base image.
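
A rough sketch of the "install only if missing" idea follows. It is not the script from this PR: the function names are hypothetical, and it assumes the NVIDIA package repository is already configured in the image. The package names (`libnccl-dev` on Debian/Ubuntu, `libnccl-devel` on RPM-based distributions) are the ones the "dev(el)" shorthand above refers to.

```shell
#!/usr/bin/env bash
# Sketch only, not the code from this PR. All function names are hypothetical.

# Map a package manager to the name of the NCCL development package.
nccl_dev_package() {
    case "$1" in
        apt-get) echo "libnccl-dev" ;;
        dnf|yum) echo "libnccl-devel" ;;
        *) echo "unsupported package manager: $1" >&2; return 1 ;;
    esac
}

# Succeed if the package is already installed in the image.
package_installed() {
    case "$1" in
        apt-get) dpkg -s "$2" >/dev/null 2>&1 ;;
        dnf|yum) rpm -q "$2" >/dev/null 2>&1 ;;
        *) return 1 ;;
    esac
}

# Install the NCCL development package only when the base image lacks it.
# Assumes the package index is already up to date (e.g. after `apt-get update`).
ensure_libnccl() {
    pkg="$(nccl_dev_package "$1")" || return 1
    if package_installed "$1" "${pkg}"; then
        echo "${pkg} already installed"
    else
        echo "${pkg} not found, manually installing it"
        "$1" install -y "${pkg}"
    fi
}
```

In a Dockerfile `RUN` step this might be invoked as `ensure_libnccl dnf` or `ensure_libnccl apt-get`, depending on the base image.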

Notes for Reviewers

How I tested this

Manually pushed similar changes to the build scripts in the RAFT PR where these failures were observed, and saw builds succeed: rapidsai/raft#2912

Confirmed that I saw the expected log messages and image contents from builds here. For example, in the build-images / ci-wheel / build (13.1.0, 3.13, rockylinux8, amd64) build (link):

#9 37.11 libnccl-devel not found, manually installing it
...
#9 38.23 Installing:
...
#9 38.23  libnccl-devel             x86_64  2.28.9-1+cuda13.0                       cuda        63 k

and

docker run \
  --rm \
  --pull always \
  docker.io/rapidsai/staging:ci-wheel-342-26.02-cuda13.1.0-rockylinux8-py3.13-amd64@sha256:ba0dadd01dc5145b280966b54b72e3ef86cd114b08b9c12feb867b0cb43c2aa1 \
  find / -name 'libnccl.so*'

# /usr/lib64/libnccl.so.2.28.9
# /usr/lib64/libnccl.so
# /usr/lib64/libnccl.so.2
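
The manual `find` check above could also be turned into a pass/fail assertion, for example as a smoke test inside the image. This is a minimal sketch under that assumption; `has_shared_lib` is a hypothetical helper, not something provided by the image or by this PR.

```shell
#!/usr/bin/env bash
# Sketch only. has_shared_lib is a hypothetical helper name.

# has_shared_lib DIR NAME
# Succeed if DIR (searched recursively) contains NAME.so or a versioned NAME.so.*
has_shared_lib() {
    search_root="$1"
    lib_name="$2"
    # -quit stops at the first match, so large trees are not fully scanned.
    [ -n "$(find "${search_root}" -name "${lib_name}.so*" -print -quit 2>/dev/null)" ]
}
```

Inside the container, `has_shared_lib /usr/lib64 libnccl || exit 1` would fail the step if none of the files listed above were present.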

@jameslamb jameslamb added the improvement (Improves an existing functionality) and non-breaking (Introduces a non-breaking change) labels Jan 6, 2026
@jameslamb jameslamb changed the title from "WIP: ci-wheel: ensure libnccl is always installed" to "ci-wheel: ensure libnccl is always installed" Jan 6, 2026
@jameslamb jameslamb requested a review from gforsyth January 6, 2026 16:49
@jameslamb jameslamb marked this pull request as ready for review January 6, 2026 18:20
@jameslamb jameslamb requested a review from a team as a code owner January 6, 2026 18:20
wget \
yasm \
zip \
LIBRARIES_TO_INSTALL=(
Contributor

Maybe we can use the same array logic in other Dockerfiles? Let's do that as a followup.

Member Author

sure, happy to do a small followup soon with this and some other minor cleanup
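
For readers unfamiliar with the array pattern this thread refers to, here is a minimal bash sketch of collecting packages in an array and appending to it conditionally. Only the variable name `LIBRARIES_TO_INSTALL` comes from the diff. The array's real contents are not shown in this excerpt, so the initial entries, the `build_package_list` function, and its argument are all assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Sketch only. Requires bash (arrays are not POSIX sh).
# build_package_list and the initial entries are hypothetical placeholders.

# build_package_list NCCL_FOUND
# Populate LIBRARIES_TO_INSTALL, adding the NCCL package only when
# NCCL_FOUND is not "true".
build_package_list() {
    LIBRARIES_TO_INSTALL=(wget zip)
    if [ "$1" != "true" ]; then
        LIBRARIES_TO_INSTALL+=(libnccl-devel)
    fi
}
```

The list can then be passed to a single install command, for example `dnf install -y "${LIBRARIES_TO_INSTALL[@]}"`. Quoting the expansion keeps each entry a separate argument.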

@bdice bdice merged commit a9b2037 into rapidsai:main Jan 6, 2026
422 checks passed
@jameslamb jameslamb deleted the install-nccl branch January 6, 2026 18:58