
wheels: build with CUDA 13.0, test against mix of CTK versions, make 'torch-geometric' fully optional for 'cugraph-pyg'#434

Merged
rapids-bot[bot] merged 8 commits into rapidsai:release/26.04 from jameslamb:test-older-ctk
Mar 25, 2026
Conversation

@jameslamb
Member

@jameslamb jameslamb commented Mar 18, 2026

Description

Fixes #410

Contributes to rapidsai/build-planning#257

  • builds CUDA 13 wheels with the 13.0 CTK

Contributes to rapidsai/build-planning#256

  • updates wheel tests to cover a range of CTK versions (previously, we were accidentally testing only the latest 12.x and 13.x)

Makes torch even more optional for wheels (follow-up to #425)

  • removes torch-geometric from cugraph-pyg wheels' runtime dependencies (leaves it for conda)
  • removes ogb and sentence-transformers from cugraph-pyg[test] (they're only used for examples that aren't run in wheels CI)
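
Since torch-geometric is no longer a hard runtime dependency of the wheels, code that uses it has to degrade gracefully when it is absent. A minimal sketch of the kind of optional-import guard this implies (illustrative only — the names below are hypothetical, not taken from the cugraph-pyg codebase):

```python
# Illustrative optional-import guard; HAS_TORCH_GEOMETRIC and
# require_torch_geometric are hypothetical names, not cugraph-pyg API.
try:
    import torch_geometric  # only present if the user installed it separately
    HAS_TORCH_GEOMETRIC = True
except ImportError:
    torch_geometric = None
    HAS_TORCH_GEOMETRIC = False

def require_torch_geometric():
    """Raise a clear error when an optional feature needs torch-geometric."""
    if not HAS_TORCH_GEOMETRIC:
        raise ImportError(
            "this feature requires torch-geometric; "
            "install it with: pip install torch-geometric"
        )
```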

Notes for Reviewers

How I tested this

Tested the full set of nightly and PR CI jobs for wheels, saw them all pass: https://github.com/rapidsai/cugraph-gnn/actions/runs/23348691254

This should fix #410 😁

@jameslamb jameslamb requested review from a team as code owners March 18, 2026 20:44
@jameslamb jameslamb requested a review from bdice March 18, 2026 20:44
@jameslamb jameslamb added improvement Improves an existing functionality non-breaking Introduces a non-breaking change labels Mar 18, 2026
@greptile-apps

This comment was marked as resolved.

- wheel-tests-nightly-pylibwholegraph
- wheel-build-cugraph-pyg
- wheel-tests-cugraph-pyg
- wheel-tests-nightly-cugraph-pyg
Member Author

This is pretty close!

  • ✔️ all PR CI wheel jobs passing
  • ✔️ all nightly pylibwholegraph wheel jobs passing
  • 😬 1 nightly cugraph-pyg wheel job failing
Collecting ucxx-cu12==0.49.*,>=0.0.0a0 (from cugraph-cu12==26.4.*,>=0.0.0a0->cugraph-pyg-cu12==26.4.0a40->cugraph-pyg-cu12==26.4.0a40)
  Downloading https://pypi.anaconda.org/rapidsai-wheels-nightly/simple/ucxx-cu12/0.49.0a32/ucxx_cu12-0.49.0a32-cp311-abi3-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (503 kB)
  Downloading https://pypi.anaconda.org/rapidsai-wheels-nightly/simple/ucxx-cu12/0.49.0a31/ucxx_cu12-0.49.0a31-cp311-abi3-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (503 kB)
  Downloading https://pypi.anaconda.org/rapidsai-wheels-nightly/simple/ucxx-cu12/0.49.0a30/ucxx_cu12-0.49.0a30-cp311-abi3-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (503 kB)
  Downloading https://pypi.anaconda.org/rapidsai-wheels-nightly/simple/ucxx-cu12/0.49.0a29/ucxx_cu12-0.49.0a29-cp311-abi3-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (503 kB)
  Downloading https://pypi.anaconda.org/rapidsai-wheels-nightly/simple/ucxx-cu12/0.49.0a28/ucxx_cu12-0.49.0a28-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (515 kB)
  Downloading https://pypi.anaconda.org/rapidsai-wheels-nightly/simple/ucxx-cu12/0.49.0a27/ucxx_cu12-0.49.0a27-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (515 kB)
Collecting torch-geometric<2.8,>=2.5 (from cugraph-pyg-cu12==26.4.0a40->cugraph-pyg-cu12==26.4.0a40)
  Downloading http://pip-cache.local.gha-runners.nvidia.com/packages/03/9f/157e913626c1acfb3b19ce000b1a6e4e4fb177c0bc0ea0c67ca5bd714b5a/torch_geometric-2.6.1-py3-none-any.whl.metadata (63 kB)
error: resolution-too-deep

× Dependency resolution exceeded maximum depth
╰─> Pip cannot resolve the current dependencies as the dependency graph is too complex for pip to solve efficiently.

hint: Try adding lower bounds to constrain your dependencies, for example: 'package>=2.0.0' instead of just 'package'.

(wheel-tests-nightly-cugraph-pyg / 12.2.2, 3.11, amd64, ubuntu22.04, v100, earliest-driver, latest-deps)

I'll try to reproduce that locally and see if I can get a better solver error.

Member Author

@jameslamb jameslamb Mar 19, 2026

I'm able to reproduce this locally

Code to reproduce:
docker run \
    --rm \
    --pull always \
    --env RAPIDS_REPOSITORY=rapidsai/cugraph-gnn \
    --env RAPIDS_SHA=13ef184fcfbeab41e096fa643f1ff082a3127ccd \
    --env RAPIDS_REF_NAME=pull-request/434 \
    --env RAPIDS_BUILD_TYPE=pull-request \
    -v $(pwd):/opt/work \
    -w /opt/work \
    -it rapidsai/citestwheel:26.04-cuda12.2.2-ubuntu22.04-py3.11 \
    bash


source rapids-init-pip

package_name="cugraph-pyg"

RAPIDS_PY_CUDA_SUFFIX="$(rapids-wheel-ctk-name-gen ${RAPIDS_CUDA_VERSION})"

# Download the libwholegraph, pylibwholegraph, and cugraph-pyg built in the previous step
LIBWHOLEGRAPH_WHEELHOUSE=$(RAPIDS_PY_WHEEL_NAME="libwholegraph_${RAPIDS_PY_CUDA_SUFFIX}" rapids-download-wheels-from-github cpp)
PYLIBWHOLEGRAPH_WHEELHOUSE=$(rapids-download-from-github "$(rapids-package-name "wheel_python" pylibwholegraph --stable --cuda "$RAPIDS_CUDA_VERSION")")
CUGRAPH_PYG_WHEELHOUSE=$(RAPIDS_PY_WHEEL_NAME="${package_name}_${RAPIDS_PY_CUDA_SUFFIX}" RAPIDS_PY_WHEEL_PURE="1" rapids-download-wheels-from-github python)

# generate constraints (possibly pinning to oldest support versions of dependencies)
rapids-generate-pip-constraints test_cugraph_pyg "${PIP_CONSTRAINT}"

rapids-generate-pip-constraints torch_only "${PIP_CONSTRAINT}"

rapids-pip-retry install \
  --prefer-binary \
  --constraint "${PIP_CONSTRAINT}" \
  --extra-index-url 'https://pypi.nvidia.com' \
  "${LIBWHOLEGRAPH_WHEELHOUSE}"/*.whl \
  "$(echo "${PYLIBWHOLEGRAPH_WHEELHOUSE}"/pylibwholegraph_"${RAPIDS_PY_CUDA_SUFFIX}"*.whl)" \
  "$(echo "${CUGRAPH_PYG_WHEELHOUSE}"/cugraph_pyg_"${RAPIDS_PY_CUDA_SUFFIX}"*.whl)[test]"

I think I see what's happening.

  • torch-geometric and ogb require torch
  • ogb requires some nvidia-{project} CTK packages like nvidia-cuda-nvrtc
  • when we don't install a CUDA build of torch, the version of torch in the environment is only constrained by ogb and torch-geometric's requirements, which allow all the way back to torch>=1.6.0

Taken together, you end up in this "resolution-too-deep" situation, where pip is trying varying combinations of ogb, torch-geometric, and CPU-only torch. CUDA-suffixed packages make the resolution graph larger... go back far enough and ogb flips from depending on nvidia-cuda-nvrtc-cu12 to nvidia-cuda-nvrtc-cu11.
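
As a rough back-of-envelope illustration of why such a loose floor explodes the search (the candidate counts below are made up, purely to show the scaling — the real resolver's state space is far messier):

```python
# Made-up candidate counts, just to illustrate the combinatorial blow-up
# a resolver faces when torch>=1.6.0 leaves dozens of versions in play.
torch_candidates = 40   # hypothetical: releases admitted by torch>=1.6.0
pyg_candidates = 10     # hypothetical: torch-geometric releases in range
ogb_candidates = 8      # hypothetical: ogb releases in range

worst_case = torch_candidates * pyg_candidates * ogb_candidates
print(worst_case)  # 3200 combinations, before CUDA-suffixed deps multiply it further
```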

Unfortunately I think the best long-term fix here is to treat ogb and torch-geometric as fully optional for wheels, just as we do for torch... keeping them out of wheel metadata and installing them separately (ref: #425). If torch has to be truly optional, then anything that pulls it in needs to be optional too. I'll work on that.

Member Author

Trying this in 9c60899

Member Author

Interestingly, I'm still hitting a 'resolution-too-deep' error even without 'torch', 'ogb', or 'torch-geometric' in the solve: https://github.com/rapidsai/cugraph-gnn/actions/runs/23318288336/job/67824788071?pr=434

Will look more into this tomorrow. Maybe it's actually RAPIDS libraries that are causing the conflicts?

Member

This is with pip right?

If so, maybe it's worth trying with uv. That might give us more insight into the nature of the conflict.

Member Author

Thanks, I'll consider it.

Member Author

I found the root cause... cugraph-pyg[test] had sentence-transformers in it, which pulls in torch as a required dependency. That took us back down the road of pip considering many different torch versions and other libraries with competing dependencies (including building some from source during backtracking!), which led to these issues.

We really do not want torch in the environment at all unless it's a CUDA build of torch, and that means making sentence-transformers optional just as we did with torch itself in #425.

Pushed that change and it looks like all CI jobs (including all nightly wheels jobs!) are now passing: https://github.com/rapidsai/cugraph-gnn/actions/runs/23348691254/job/67923786458?pr=434

I'll revert the nightly stuff and go ask for a review.
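
One way to sanity-check that the torch in an environment is a CUDA build, without importing it, is to look at its version metadata. A hedged sketch — it assumes the PyTorch download index's convention of tagging CUDA wheels with a `+cuXYZ` local version, which wheels installed from other indexes may not carry:

```python
# Hedged sketch: detect a CUDA torch wheel by its local-version tag.
# Assumes the download.pytorch.org convention of versions like '2.4.0+cu121'.
from importlib import metadata

def torch_cuda_local_tag():
    """Return the '+cuXYZ' local-version tag of the installed torch, if any."""
    try:
        version = metadata.version("torch")
    except metadata.PackageNotFoundError:
        return None  # torch is not installed at all
    _, _, local = version.partition("+")
    return local if local.startswith("cu") else None

print(torch_cuda_local_tag())
```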

@jameslamb jameslamb changed the title build wheels with CUDA 13.0.x, test wheels against mix of CTK versions wheels: build with CUDA 13.0, test against mix of CTK versions, make 'torch-geometric' fully optional for 'cugraph-pyg' Mar 19, 2026
@jameslamb jameslamb mentioned this pull request Mar 20, 2026
- matrix:
packages:
- sentence-transformers
- sentence-transformers>=3.0.1
Member Author

Ran into issues in this PR that were like "pip is processing a graph of possibilities that's too large".

I don't think this floor would have helped that (in this specific case, the entire dependency just needed to be skipped), but in general having floors for test-only requirements like this reduces the risk of this type of problem.

This choice is pretty arbitrary... sentence-transformers 3.0.0 came out about 2 years ago (May 2024) and 3.0.1 followed a few days later, so it probably fixed some bug(s).

Chose this just to go from "no floor" to "some floor", and "version from 2 years ago" seemed like a safe choice 🤷🏻
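
The effect of a floor like `sentence-transformers>=3.0.1` on the resolver's candidate set can be sketched like this (the version list is hypothetical, and real resolvers use proper PEP 440 comparison rather than this simplified tuple compare):

```python
# Hypothetical release list; the floor prunes everything the resolver
# would otherwise have to backtrack through.
available = ["2.2.2", "2.7.0", "3.0.0", "3.0.1", "3.2.1", "4.1.0"]

def meets_floor(version, floor="3.0.1"):
    # Simplified numeric compare; real tools use PEP 440 semantics.
    return tuple(map(int, version.split("."))) >= tuple(map(int, floor.split(".")))

candidates = [v for v in available if meets_floor(v)]
print(candidates)  # ['3.0.1', '3.2.1', '4.1.0']
```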

@jameslamb jameslamb requested a review from BradReesWork March 23, 2026 19:34
Member

@alexbarghi-nv alexbarghi-nv left a comment

👍

@alexbarghi-nv
Member

/merge

@rapids-bot rapids-bot bot merged commit c11936f into rapidsai:release/26.04 Mar 25, 2026
62 checks passed
@jameslamb jameslamb deleted the test-older-ctk branch March 25, 2026 16:14
rapids-bot bot pushed a commit that referenced this pull request Mar 25, 2026
Fixes #410

#434 was the last step to unblocking nightly CI in this project. This reverts #419. Once this is merged, PR CI will be blocked if the project goes 7 days without passing nightly tests.

## How I tested this:

1. manually triggered a `build` run on `release/26.04` ([link](https://github.com/rapidsai/cugraph-gnn/actions/runs/23559380423))
2. manually triggered a `test` run on `release/26.04` ([link](https://github.com/rapidsai/cugraph-gnn/actions/runs/23560028629))
3. re-ran the `check-nightly-ci` check here and saw it pass ([link](https://github.com/rapidsai/cugraph-gnn/actions/runs/23559415201/job/68607287625?pr=442))

```text
Found 1 successful runs of workflow 'test.yaml' on branch 'release/26.04' in the previous 7 days (most recent: '2026-03-25 19:59:38+00:00'). View logs:
 - https://github.com/rapidsai/cugraph-gnn/actions/runs/23560028629
```

Authors:
  - James Lamb (https://github.com/jameslamb)

Approvers:
  - Joseph (https://github.com/jolorunyomi)
  - Gil Forsyth (https://github.com/gforsyth)

URL: #442
