Use NCCL wheels from PyPI for CUDA 12 builds #2629
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -16,6 +16,7 @@ files: | |
| - depends_on_distributed_ucxx | ||
| - depends_on_rmm | ||
| - depends_on_rapids_logger | ||
| - depends_on_nccl | ||
| - develop | ||
| - docs | ||
| - rapids_build_skbuild | ||
|
|
@@ -78,6 +79,7 @@ files: | |
| - build_common | ||
| - depends_on_librmm | ||
| - depends_on_rapids_logger | ||
| - depends_on_nccl | ||
| py_run_libraft: | ||
| output: pyproject | ||
| pyproject_dir: python/libraft | ||
|
|
@@ -87,6 +89,7 @@ files: | |
| - cuda_wheels | ||
| - depends_on_librmm | ||
| - depends_on_rapids_logger | ||
| - depends_on_nccl | ||
| py_build_pylibraft: | ||
| output: pyproject | ||
| pyproject_dir: python/pylibraft | ||
|
|
@@ -146,6 +149,7 @@ files: | |
| - depends_on_libraft | ||
| - depends_on_librmm | ||
| - depends_on_ucx_build | ||
| - depends_on_nccl | ||
| py_run_raft_dask: | ||
| output: pyproject | ||
| pyproject_dir: python/raft-dask | ||
|
|
@@ -154,6 +158,7 @@ files: | |
| includes: | ||
| - depends_on_distributed_ucxx | ||
| - depends_on_libraft | ||
| - depends_on_nccl | ||
| - run_raft_dask | ||
| py_test_raft_dask: | ||
| output: pyproject | ||
|
|
@@ -192,7 +197,6 @@ dependencies: | |
| - c-compiler | ||
| - cxx-compiler | ||
| - libucxx==0.44.*,>=0.0.0a0 | ||
| - nccl>=2.19 | ||
| specific: | ||
| - output_types: conda | ||
| matrices: | ||
|
|
@@ -700,3 +704,18 @@ dependencies: | |
| - matrix: null | ||
| packages: | ||
| - libucx>=1.15.0 | ||
| depends_on_nccl: | ||
| common: | ||
| - output_types: conda | ||
| packages: | ||
| - &nccl_unsuffixed nccl>=2.19 | ||
|
Contributor
Is 2.19 a high enough lower bound? The latest is 2.26. I'm pretty sure we rely on newer versions than 2.19 for Blackwell support and other features. I don't know the exact bound to use here, but we should do some validation with the oldest NCCL release we claim to support.
Member
Author
That's a good question. I think we could bump the version of NCCL as a follow-on to this PR depending on the . I just used the version constraint that was already present. I just checked our
Contributor
Alrighty! If this matches other bounds already being used in RAFT, we can merge in the current state. |
||
| specific: | ||
| - output_types: [pyproject, requirements] | ||
| matrices: | ||
| - matrix: | ||
| cuda: "12.*" | ||
| cuda_suffixed: "true" | ||
| packages: | ||
| - nvidia-nccl-cu12>=2.19 | ||
| - matrix: | ||
| packages: | ||
Is there a reason we do this only for CUDA 12?
Yes, there have been no NCCL wheel distributions available for CUDA 11 since April 2024, and even then there were no arm64 binaries. I think we'll have to continue to vendor NCCL binaries with CUDA 11.
Great. That is fine! I just didn't know the constraints off the top of my head.
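For readers unfamiliar with the `matrices` blocks in `dependencies.yaml`, here is a rough sketch of how the `depends_on_nccl` entry for `pyproject`/`requirements` output could resolve. It is a simplified model, not the actual dependency file generator logic: the function `select_packages` and the constant `NCCL_MATRICES` are hypothetical names, glob matching via `fnmatch` is an assumption, and the fallback entry is modeled as adding no packages because the diff above does not show any entries under it.

```python
from fnmatch import fnmatch

# Hypothetical in-memory copy of the `specific` block for depends_on_nccl.
NCCL_MATRICES = [
    {
        "matrix": {"cuda": "12.*", "cuda_suffixed": "true"},
        "packages": ["nvidia-nccl-cu12>=2.19"],
    },
    # Fallback entry: an empty matrix matches any selectors.
    # Assumption: it contributes no packages (the diff shows none).
    {"matrix": {}, "packages": []},
]


def select_packages(matrices, **selectors):
    """Return the packages of the first entry whose matrix matches.

    Every key in an entry's matrix must be present in `selectors`, and the
    selector value must match the entry's glob pattern (e.g. "12.*").
    """
    for entry in matrices:
        required = entry["matrix"] or {}
        if all(
            key in selectors and fnmatch(str(selectors[key]), pattern)
            for key, pattern in required.items()
        ):
            return list(entry["packages"] or [])
    raise ValueError(f"no matrix entry matches {selectors!r}")


if __name__ == "__main__":
    # CUDA 12 suffixed wheel builds pick up the PyPI NCCL wheel.
    print(select_packages(NCCL_MATRICES, cuda="12.8", cuda_suffixed="true"))
    # CUDA 11 builds fall through to the empty fallback entry.
    print(select_packages(NCCL_MATRICES, cuda="11.8", cuda_suffixed="true"))
```

Under these assumptions, only builds that select both `cuda="12.*"` and `cuda_suffixed="true"` gain the `nvidia-nccl-cu12` requirement, which matches the discussion above: CUDA 11 wheels get no PyPI NCCL dependency and continue to vendor the NCCL binaries.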