Skip to content

WIP: wheels CI: stricter torch index selection, test oldest versions of dependencies#413

Closed
jameslamb wants to merge 17 commits intorapidsai:release/26.04from
jameslamb:fix/index-selection
Closed

WIP: wheels CI: stricter torch index selection, test oldest versions of dependencies#413
jameslamb wants to merge 17 commits intorapidsai:release/26.04from
jameslamb:fix/index-selection

Conversation

@jameslamb
Copy link
Copy Markdown
Member

@jameslamb jameslamb commented Feb 23, 2026

Fixes #410

There, some nightly wheels tests were failing because CUDA 13 packages were being installed but testing against CUDA 12 pylibwholegraph packages. This fixes that, along with some other improvements to wheel testing:

  • ensures CUDA variants of torch are always installed (no fallback to pypi.org CPU-only packages)
  • ensures the correct torch index (based on CUDA major version) is used
  • adds coverage of "oldest" dependencies group, to check that lower bounds on dependencies are correct

@copy-pr-bot

This comment was marked as resolved.

@jameslamb jameslamb added improvement Improves an existing functionality non-breaking Introduces a non-breaking change labels Feb 23, 2026
# (useful in CI scripts where we want to tightly which indices 'pip' uses).
- matrix:
include_torch_extra_index: "false"
packages:
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rapids-dependency-file-generator uses the first matching matrix (see https://github.com/rapidsai/dependency-file-generator?tab=readme-ov-file#how-dependency-lists-are-merged).

This will only affect cases where include_torch_extra_index=false is passed (as in CI here). Other cases (like RAPIDS devcontainers) will fall through to othe groups that pull in --extra-index-url lines.

So this should not break any other uses of this file.

build_type: pull-request
script: ci/test_wheel_pylibwholegraph.sh
matrix_filter: map(select(.ARCH == "amd64"))
matrix_type: 'nightly'
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
matrix_type: 'nightly'

TODO: revert this. Just added here for testing, to confirm this will fix the issues we've been seeing in nightlies.

build_type: pull-request
script: ci/test_wheel_cugraph-pyg.sh
matrix_filter: map(select(.ARCH == "amd64"))
matrix_type: 'nightly'
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
matrix_type: 'nightly'

Revert before merging.

@jameslamb jameslamb changed the title WIP: [NOT READY FOR REVIEW] wheels CI: stricter torch index selection, test oldest versions of dependencies wheels CI: stricter torch index selection, test oldest versions of dependencies Feb 23, 2026
@jameslamb jameslamb marked this pull request as ready for review February 23, 2026 22:57
@jameslamb jameslamb requested review from a team as code owners February 23, 2026 22:57
@greptile-apps

This comment was marked as resolved.

greptile-apps[bot]

This comment was marked as resolved.

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
greptile-apps[bot]

This comment was marked as off-topic.

greptile-apps[bot]

This comment was marked as resolved.

Co-authored-by: Bradley Dice <bdice@bradleydice.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
greptile-apps[bot]

This comment was marked as resolved.

greptile-apps[bot]

This comment was marked as off-topic.

- output_types: [conda]
packages:
- torchdata
- pydantic
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just moving this here so depends_on_pytorch only ever contains torch / pytorch.

This test_python_common group is used everywhere that depends_on_pytorch is.

Comment on lines +389 to +390
# 2.6.0 is the oldest version on https://download.pytorch.org/whl/cu126 with CUDA wheels
- torch==2.6.0
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alexbarghi-nv see this note.

There aren't CUDA 12 wheels available for PyTorch older than 2.6.0.

pip download \
  --isolated \
  --no-deps \
  --index-url=https://download.pytorch.org/whl/cu126 \
  'torch==2.3.0'

# ERROR: Could not find a version that satisfies the requirement torch==2.3.0
# (from versions: 2.6.0+cu126, 2.7.0+cu126, 2.7.1+cu126, 2.8.0+cu126, 2.9.0+cu126, 2.9.1+cu126, 2.10.0+cu126)

Do you want to bump the floor in dependency metadata here to >=2.6.0? Or to leave it at >=2.3 so that these libraries are still installable alongside older PyTorch releases (for example, if people build PyTorch 2.4 from source)?

Your call.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would vote for bumping the floor to >=2.6.0. It's a little over a year old at this point. https://github.com/pytorch/pytorch/releases/tag/v2.6.0

greptile-apps[bot]

This comment was marked as off-topic.

greptile-apps[bot]

This comment was marked as off-topic.

greptile-apps[bot]

This comment was marked as resolved.

greptile-apps[bot]

This comment was marked as resolved.

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
greptile-apps[bot]

This comment was marked as resolved.

jameslamb and others added 2 commits February 25, 2026 12:11
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Copy link
Copy Markdown
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

8 files reviewed, no comments

Edit Code Review Agent Settings | Greptile

rapids-bot bot pushed a commit that referenced this pull request Mar 4, 2026
Nightly CI here has been failing for a couple weeks, and the root cause is "some jobs are installing incorrect `torch` wheels".

That's tracked in #410 and being worked on in #413. That work unfortunately uncovered some other significant compatibility issues that will require RAPIDS-wide fixes:

* rapidsai/build-planning#256
* rapidsai/build-planning#257
* rapidsai/cugraph#5443

As a short-term patch, this proposes allowing `cugraph-gnn` nightlies to fail for a few more weeks, so regular PR CI can be unblocked while we focus on the more permanent fix.

Targeting the more permanent fix (and reverting this back to 7 days) for the 26.04 release (so over the next few weeks).

Authors:
  - James Lamb (https://github.com/jameslamb)

Approvers:
  - Alex Barghi (https://github.com/alexbarghi-nv)
  - Gil Forsyth (https://github.com/gforsyth)

URL: #419
rapids-bot bot pushed a commit to rapidsai/cugraph that referenced this pull request Mar 4, 2026
Contributes to #5443

Related to rapidsai/build-planning#143 

`libcugraph.so` dynamically links to several CUDA Toolkit libraries

```console
$ ldd /pyenv/versions/3.11.14/lib/python3.11/site-packages/libcugraph/lib64/libcugraph.so
        ...
        libcusolver.so.12 => /usr/local/cuda/lib64/libcusolver.so.12 (0x00007c616aba7000)
        libcublas.so.13 => /usr/local/cuda/lib64/libcublas.so.13 (0x00007c61675d5000)
        libcublasLt.so.13 => /usr/local/cuda/lib64/libcublasLt.so.13 (0x00007c6143c83000)
        libcusparse.so.12 => /usr/local/cuda/lib64/libcusparse.so.12 (0x00007c613987c000)
        libcurand.so.10 => /usr/local/cuda/lib64/libcurand.so.10 (0x00007c6131161000)
        ...
        libnvJitLink.so.13 => /usr/local/cuda/lib64/libnvJitLink.so.13 (0x00007c612af5f000)
        ...
```

This proposes getting them from `cuda-toolkit` wheels, instead of system installations.

## Notes for Reviewers

### Benefits of this change

* reduces the risk of multiple copies of the same library being loaded
* allows the use of Python package versioning to manage compatibility
* consistency with other RAPIDS libraries (see rapidsai/build-planning#35)
* reduces the risk of runtime issues with other libraries that use CTK wheels, like `torch` (rapidsai/cugraph-gnn#413 (comment))

Authors:
  - James Lamb (https://github.com/jameslamb)

Approvers:
  - Kyle Edwards (https://github.com/KyleFromNVIDIA)

URL: #5444
@jameslamb jameslamb changed the base branch from main to release/26.04 March 12, 2026 19:15
@jameslamb
Copy link
Copy Markdown
Member Author

This PR uncovered a bunch of other issues that have spiraled off into other efforts: #410 (comment)

Most of the changes here are now on #425 and the rest will be addressed soon in rapidsai/build-planning#256

Moving this to draft for now, but it might end up getting closed depending on how #425 goes.

@jameslamb jameslamb marked this pull request as draft March 12, 2026 19:19
@jameslamb jameslamb changed the title wheels CI: stricter torch index selection, test oldest versions of dependencies WIP: wheels CI: stricter torch index selection, test oldest versions of dependencies Mar 12, 2026
@jameslamb
Copy link
Copy Markdown
Member Author

Since this effort started, this work has spun out into a bunch of other PRs (#410 (comment))

I'm going to close this. It'll be replaced by:

@jameslamb jameslamb closed this Mar 17, 2026
@jameslamb jameslamb deleted the fix/index-selection branch March 17, 2026 19:51
rapids-bot bot pushed a commit that referenced this pull request Mar 18, 2026
…an optional dependency (#425)

Contributes to rapidsai/build-planning#256 and #410

## Ensures that `torch` CUDA wheels are always installed in CI

*"unintentionally installed a CPU-only `torch` from PyPI"* is a failure mode I've seen a few times for CI in this project, and each time it's happened it has take a bit of time to figure out that that was the root cause.

This PR tries to fix that by:

* using a stricter install pattern to guarantee a CUDA `torch` wheel is installed if a compatible one exists
* using local versions like `+cu130` to help prevent pulling in packages from PyPI
* adding `dependencies.yaml` items for "oldest" dependencies so CI covers a range of supported versions

I tested similar patterns in rapidsai/rmm#2279 and saw them work well there.

## Makes `torch` truly optional

We want these packages to be installable and importable without `torch`, for use in RAPIDS DLFW builds (where we don't install `torch` alongside RAPIDS because it's build in other processes).

I'd started relying on the assumption that they worked that way in this PR, but quickly realized that that isn't true... `torch` is used unconditionally in many ways in these libraries.

This PR fixes that. It makes `torch` optional and adds testing to ensure it stays that way:

* copying the `import_optional` machinery from `cugraph-pyg` into `pylibwholegraph`
* using `import_optional()` for all `torch` imports in the libraries
* using `pytest.importorskip("torch")` in tests
* deferring some `torch` inputs from import time to run time
* adding the `flake8-tidy-imports:banned-api` check from `ruff` to enforce that `import torch` isn't used anywhere in library code or test code
* explicitly testing that the libraries are still installable and at least 1 unit test can run successfully after a `pip uninstall torch`
* adding a check in `ci/validate_wheel.sh` confirming that `torch` doesn't make it into any wheel metadata *(which could happen due to mistakes in `dependencies.yaml` / `pyproject.toml`)*

## Notes for Reviewers

Pulling these changes out of #413 , so CI in this repo can immediately benefit from them and so #419 can be reverted.

When this is merged, #413 will have a smaller diff and just be focused on testing against a range of CTKs.

Authors:
  - James Lamb (https://github.com/jameslamb)
  - Alex Barghi (https://github.com/alexbarghi-nv)

Approvers:
  - Alex Barghi (https://github.com/alexbarghi-nv)
  - Kyle Edwards (https://github.com/KyleFromNVIDIA)

URL: #425
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

improvement Improves an existing functionality non-breaking Introduces a non-breaking change

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG] pylibwholegraph: nightly wheel tests failing: "driver on your system is too old"

2 participants