Use NCCL wheels from PyPI for CUDA 12 builds #2629
rapids-bot[bot] merged 27 commits into rapidsai:branch-25.06
conda/recipes/libraft/recipe.yaml
Outdated
    - nccl ${{ nccl_version }}
    - ucxx ${{ ucxx_version }}
The libraft recipe installs the distributed component (raft/conda/recipes/libraft/recipe.yaml, line 94 in 902ed56), which depends on NCCL and ucxx, as seen here (lines 392 to 406 in 902ed56).
I think these changes are needed. I believe it did not error out until now because distributed is a header-only target with INTERFACE dependencies, so everything worked as long as consumer libraries brought in NCCL themselves.
Happy to revert the changes, however, if this is considered an improper way of working with header-only packages.
@divyegala I think these changes to the rattler-build recipe are reasonable, but can we break them out into a separate PR so we've got a cleaner history of when and why this was added?
Agreed, this seems fine but doesn't make sense to bundle with the NCCL wheels work.
    fi

    RAPIDS_CUDA_MAJOR="${RAPIDS_CUDA_VERSION%%.*}"
    if [[ ${RAPIDS_CUDA_MAJOR} == "12" ]]; then
Is there a reason we do this only for CUDA 12?
Yes, there have been no NCCL wheel distributions for CUDA 11 since April 2024, and even those had no arm64 binaries. I think we'll have to continue to vendor NCCL binaries with CUDA 11.
Great. That is fine! I just didn't know the constraints off the top of my head.
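For readers following the CUDA 12 gating discussed above, here is a minimal sketch of how the major-version check behaves. Only `RAPIDS_CUDA_VERSION`, the `%%.*` expansion, and the comparison against "12" come from the diff; the helper names `cuda_major` and `nccl_source`, the labels they print, and the `12.8.0` fallback are hypothetical and exist only for illustration.

```shell
# Sketch only: cuda_major and nccl_source are hypothetical helpers,
# not functions from the RAFT CI scripts.

# Print the major component of a CUDA version string (12.8.0 -> 12).
cuda_major() {
  # %%.* removes the longest suffix that starts at the first dot.
  echo "${1%%.*}"
}

# Print where NCCL would come from for a given CUDA version.
nccl_source() {
  if [ "$(cuda_major "$1")" = "12" ]; then
    echo "pypi-wheel"   # CUDA 12: depend on the NCCL wheel from PyPI
  else
    echo "vendored"     # other majors (e.g. CUDA 11): keep bundling NCCL
  fi
}

# 12.8.0 is a placeholder default used when the variable is unset.
nccl_source "${RAPIDS_CUDA_VERSION:-12.8.0}"
```

Comparing only the major component keeps the branch stable across CUDA 12 minor releases, which matches the constraint described in the thread: wheels exist for CUDA 12, while CUDA 11 builds continue to vendor NCCL.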
    common:
      - output_types: conda
        packages:
          - &nccl_unsuffixed nccl>=2.19
Is 2.19 a high enough lower bound? The latest is 2.26. I'm pretty sure we rely on newer versions than 2.19 for Blackwell support and other features. I don't know the exact bound to use here but we should do some validation with the oldest NCCL release we claim to support.
That's a good question. I think we could bump the version of NCCL as a follow-on to this PR depending on the . I just used the version constraint that was already present. I just checked our ci-wheel images and the system version on those is also 2.26.
Alrighty! If this matches other bounds already being used in RAFT, we can merge in the current state.
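The lower-bound question in this thread can be checked mechanically. The sketch below compares a version string against the `2.19` bound from the diff. It assumes a `sort` that supports `-V` (version sort, as in GNU coreutils); the `version_ge` helper is hypothetical and not part of the RAFT scripts, and the sketch does not say how to discover the installed NCCL version.

```shell
# Sketch only: version_ge is a hypothetical helper.
# version_ge A B succeeds when version A is greater than or equal to B.
version_ge() {
  # sort -V orders version strings numerically per component,
  # so 2.9 sorts before 2.19 (a plain string sort would get this wrong).
  # If B is the smallest of the pair, then A >= B.
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

NCCL_LOWER_BOUND="2.19"   # bound taken from the dependency list in the diff

# 2.26 is the version the thread reports on the ci-wheel images.
if version_ge "2.26" "${NCCL_LOWER_BOUND}"; then
  echo "2.26 satisfies nccl>=${NCCL_LOWER_BOUND}"
fi
```

A check like this only confirms that a version satisfies the declared bound. Whether 2.19 itself is new enough still needs the validation against the oldest supported NCCL release that the reviewer asks for.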
/merge
Reference PR in RAFT rapidsai/raft#2629. The size of the `libcuvs-cu12` wheel goes down from about 1.2G to 854M.

Authors:
  - Divye Gala (https://github.com/divyegala)

Approvers:
  - Bradley Dice (https://github.com/bdice)

URL: #827
This PR attempts to de-vendor NCCL DSOs that we bundle along with our wheel builds.
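One way to confirm that a built wheel no longer bundles NCCL is to scan its file listing for NCCL shared objects. The sketch below is an assumption about how such a check could look, not something taken from the PR. The `bundles_nccl` helper and the file names in the comments are hypothetical, and the pattern assumes bundled copies keep a `libnccl` prefix in their file name.

```shell
# Sketch only: bundles_nccl is a hypothetical helper.
# It reads file names on stdin (one per line) and succeeds if any of them
# looks like an NCCL shared object, e.g. some.libs/libnccl-1a2b3c4d.so.2.
bundles_nccl() {
  grep -Eq '(^|/)libnccl[^/]*\.so'
}

# Possible usage, assuming Info-ZIP's unzip is installed and
# path/to/wheel.whl is replaced with a real wheel path:
#
#   if unzip -Z1 path/to/wheel.whl | bundles_nccl; then
#     echo "wheel still vendors NCCL" >&2
#   fi
```

Because the helper only inspects names, it works on any listing source; `unzip -Z1` is used here because it prints one archive member per line.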