NIXL EP wheels: Add support for multiple CUDA and PyTorch versions by ovidiusm · Pull Request #1646 · ai-dynamo/nixl

ovidiusm · 2026-05-15T10:39:28Z

What?

Adds a `nixl_ep` Python dispatch layer that ships in the meta wheel and, at runtime, routes `import nixl_ep` to the correct backend. The backend is picked along two axes:

- CUDA major (cu12 vs cu13): selected from `torch.version.cuda`.
- torch minor ABI (e.g. 2.11 vs 2.12): selected from `torch.__version__`.

Each nixl-cu12 / nixl-cu13 backend wheel ships all supported torch ABIs as separate compiled extensions in the same package, so a single backend wheel works against any installed torch in the supported range without reinstalling.
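Because every backend wheel carries one extension per supported torch minor, the set of importable extension modules is the cross product of the two axes. The minimal sketch below enumerates the expected module names. It assumes the supported set is exactly 2.11 and 2.12 and that names follow the `nixl_ep_cu{N}.nixl_ep_cpp_torch{MM}` pattern used in this PR. `expected_extension_modules` is a hypothetical helper and is not part of the wheel.

```python
from itertools import product

# Assumed axes, taken from this PR: CUDA majors and supported torch minors.
CUDA_MAJORS = (12, 13)
TORCH_MINORS = ("2.11", "2.12")


def expected_extension_modules() -> list[str]:
    """Return the fully qualified extension module name for every (CUDA, torch) pair."""
    names = []
    for cuda, torch_mm in product(CUDA_MAJORS, TORCH_MINORS):
        slug = torch_mm.replace(".", "")  # "2.12" -> "212"
        names.append(f"nixl_ep_cu{cuda}.nixl_ep_cpp_torch{slug}")
    return names


for name in expected_extension_modules():
    print(name)
```

The trade-off is wheel size. Each additional torch minor adds one more extension to every backend wheel, but users never need to reinstall NIXL when they switch torch within the supported range.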

Why?

vLLM is working on upgrading torch from 2.11 to 2.12. Currently, the NIXL EP wheel ships a binary that is ABI-compatible only with torch 2.11, and 2.11 may already be obsolete by the time we release.

There may also be a transition period during which vLLM ships a mix of images, some requiring PyTorch 2.11 (stable) and some 2.12 (nightly), so it is best to support both.

vLLM PR: vllm-project/vllm#40077

Also fixes vllm-project/vllm#42525

How?

The full import flow works like this:

import nixl_ep lands in the meta dispatcher, src/bindings/python/nixl-meta/nixl_ep/__init__.py. It reads torch.version.cuda and picks nixl_ep_cu12 or nixl_ep_cu13:

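```python
import importlib


def _load_ep_backend() -> str:
    # _get_torch_cuda_major() (defined elsewhere in the dispatcher) returns the
    # CUDA major of the installed torch, or None for CPU-only builds.
    cuda_major = _get_torch_cuda_major()
    if cuda_major is not None:
        pip_name = f"nixl-cu{cuda_major}"
        mod_name = f"nixl_ep_cu{cuda_major}"
        try:
            return importlib.import_module(mod_name).__name__
        except ModuleNotFoundError as e:
            # Re-raise if something *inside* the backend is missing; only a
            # missing backend package gets the friendlier message below.
            if e.name != mod_name:
                raise
            raise ImportError(
                f"torch reports CUDA {cuda_major} but {pip_name} is not installed"
            ) from e
    # CPU-only torch: falls through to the cu13 -> cu12 probe (not shown here).
```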
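The snippet calls `_get_torch_cuda_major()`, whose body is not shown. The sketch below shows one plausible way to write it. It assumes the helper only needs the major component of `torch.version.cuda`, which is `None` on CPU-only builds, and it uses the same parsing expression as the smoke-test script's environment check. The string-taking `cuda_major_from` is a hypothetical split that makes the parsing testable without torch installed.

```python
from typing import Optional


def cuda_major_from(cuda_version: Optional[str]) -> Optional[int]:
    """Parse the CUDA major from a string such as "12.9" or "13.0".

    Returns None for CPU-only torch builds, where torch.version.cuda is None.
    """
    if not cuda_version:
        return None
    return int(cuda_version.split(".")[0])


def _get_torch_cuda_major() -> Optional[int]:
    # Hypothetical sketch. torch is imported inside the function, so a missing
    # torch raises at call time rather than when this module is imported.
    import torch

    return cuda_major_from(torch.version.cuda)


print(cuda_major_from("12.9"), cuda_major_from("13.0"), cuda_major_from(None))
```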

On CPU-only torch, the dispatcher falls through to a cu13 → cu12 probe order. This guarantees that some backend gets loaded and the import does not fail, because vLLM imports `nixl_ep` unconditionally, even though that backend will never be used. This mirrors how `import nixl` works.
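The CPU-only fallback can be sketched as a first-match probe over the backend modules in cu13, then cu12 order. This is a hypothetical illustration, not the dispatcher's actual code. `probe_backends` and its injectable `import_fn` parameter are assumptions added so the logic can be exercised without either backend wheel installed. The `e.name != mod_name` check copies the pattern from `_load_ep_backend` above.

```python
import importlib
from types import ModuleType
from typing import Callable, Iterable

# Probe order described above for CPU-only torch: cu13 first, then cu12.
FALLBACK_ORDER = ("nixl_ep_cu13", "nixl_ep_cu12")


def probe_backends(
    candidates: Iterable[str] = FALLBACK_ORDER,
    import_fn: Callable[[str], ModuleType] = importlib.import_module,
) -> str:
    """Return the name of the first candidate backend that imports cleanly."""
    tried = []
    for mod_name in candidates:
        try:
            return import_fn(mod_name).__name__
        except ModuleNotFoundError as e:
            # Skip only "this backend is not installed"; a missing dependency
            # inside an installed backend is a real error and is re-raised.
            if e.name != mod_name:
                raise
            tried.append(mod_name)
    raise ImportError(f"no nixl_ep backend installed (tried: {', '.join(tried)})")


def fake_import(name: str) -> ModuleType:
    # Simulates an environment where only the cu12 backend wheel is installed.
    if name == "nixl_ep_cu13":
        raise ModuleNotFoundError(f"No module named {name!r}", name=name)
    return ModuleType(name)


print(probe_backends(import_fn=fake_import))
```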

The selected backend (e.g. nixl_ep_cu13/__init__.py, shipped in the nixl-cu13 wheel) reads torch.__version__ and loads the matching ABI extension bindings through a second dispatch:
```python
import importlib
import torch

# "2.12.0+cu130" -> "212"; "2.12.0.dev20260401+cu129" -> "212"
_torch_mm = "".join(torch.__version__.split(".")[:2])
_nixl_ep_cpp = importlib.import_module(f".nixl_ep_cpp_torch{_torch_mm}", __package__)
```
With torch 2.12 installed, this resolves to nixl_ep_cu13.nixl_ep_cpp_torch212.
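Splitting on `.` and keeping the first two fields is sufficient because both the local version label (`+cu129`) and a nightly suffix (`.dev20260401`) come after the minor component. Below is a quick check against the exact version strings from the test matrix. `torch_abi_slug` is a hypothetical standalone copy of the one-liner above.

```python
def torch_abi_slug(torch_version: str) -> str:
    # Same expression as in the backend __init__: keep "major.minor", drop the dot.
    return "".join(torch_version.split(".")[:2])


# Version strings reported by torch in the four test-matrix runs.
for v in ("2.11.0+cu129", "2.12.0.dev20260401+cu129", "2.11.0+cu130", "2.12.0+cu130"):
    print(f"{v:28} -> nixl_ep_cpp_torch{torch_abi_slug(v)}")
```

The slug is a plain concatenation, so `2.10` and a hypothetical `21.0` would both map to `torch210`. That collision cannot occur within the supported 2.11/2.12 range.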

The meta dispatcher then re-exposes the backend's public symbols and submodules under the top-level nixl_ep namespace so that the import is fully pass-through:

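```python
import importlib
import sys

# _pkg is the backend package selected by _load_ep_backend(), e.g. nixl_ep_cu13.
submodules = ["buffer", "utils"]
for sub_name in submodules:
    module = importlib.import_module(f"{_pkg.__name__}.{sub_name}")
    # Alias the backend submodule so `import nixl_ep.buffer` resolves to it.
    sys.modules[f"nixl_ep.{sub_name}"] = module
    setattr(sys.modules[__name__], sub_name, module)
    # Hoist the submodule's public names (e.g. Buffer, EventOverlap) to nixl_ep.
    for attr in dir(module):
        if not attr.startswith("_"):
            setattr(sys.modules[__name__], attr, getattr(module, attr))

# Finally, re-export the backend package's own public names.
for attr in dir(_pkg):
    if not attr.startswith("_"):
        setattr(sys.modules[__name__], attr, getattr(_pkg, attr))
```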
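The effect of this loop can be reproduced in isolation with synthetic modules. In the toy below, `fake_backend`, `fake_meta`, and the `Buffer` class are stand-ins created with `types.ModuleType`, not the real packages. The aliasing loop, however, follows the same pattern as the dispatcher code.

```python
import importlib
import sys
import types

# Synthetic backend package with one submodule, standing in for nixl_ep_cu13.
backend = types.ModuleType("fake_backend")
buffer_mod = types.ModuleType("fake_backend.buffer")


class Buffer:  # stand-in for the real nixl_ep Buffer class
    pass


buffer_mod.Buffer = Buffer
backend.buffer = buffer_mod
sys.modules["fake_backend"] = backend
sys.modules["fake_backend.buffer"] = buffer_mod

# Synthetic meta package standing in for nixl_ep.
meta = types.ModuleType("fake_meta")
sys.modules["fake_meta"] = meta

for sub_name in ["buffer"]:
    module = importlib.import_module(f"{backend.__name__}.{sub_name}")
    sys.modules[f"fake_meta.{sub_name}"] = module  # submodule alias
    setattr(meta, sub_name, module)
    for attr in dir(module):  # hoist public names
        if not attr.startswith("_"):
            setattr(meta, attr, getattr(module, attr))

for attr in dir(backend):
    if not attr.startswith("_"):
        setattr(meta, attr, getattr(backend, attr))

# Resolves through the sys.modules alias; no fake_meta/buffer.py exists anywhere.
import fake_meta.buffer

print(meta.Buffer is Buffer, fake_meta.buffer is buffer_mod)
```

The `sys.modules` alias is what lets `from nixl_ep.buffer import Buffer` succeed even though the meta wheel ships only `nixl_ep/__init__.py`. The `setattr` calls cover attribute-style access such as `nixl_ep.Buffer`.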

So nixl_ep.Buffer, nixl_ep.EventOverlap, nixl_ep.Config, nixl_ep.utils, nixl_ep.buffer all work regardless of which (CUDA, torch) backend was selected.

If torch reports a CUDA major for which no backend wheel is installed, the meta dispatcher raises ImportError with a clear message (e.g. "torch reports CUDA 13 but nixl-cu13 is not installed").

Wheel layout

Supported PyTorch versions: 2.11, 2.12.

PyTorch 2.13 is intentionally excluded. It is currently a nightly package that is not built for all platforms, and it also requires C++20. Supporting it is out of scope for this PR.

Meta wheel — nixl-1.1.0-py3-none-any.whl (~12 KB, pure-Python, dispatch only):

```
   2933  nixl/__init__.py               ← NIXL dispatcher
   1593  nixl/_api.py
   1166  nixl/logging.py
   2916  nixl_ep/__init__.py            ← NIXL EP dispatcher
```

Backend wheel — nixl_cu13-1.1.0-cp312-cp312-manylinux_2_28_x86_64.whl:

```
        0  nixl_cu13/
     1099  nixl_cu13/__init__.py
    45009  nixl_cu13/_api.py
  1122345  nixl_cu13/_bindings.cpython-312-x86_64-linux-gnu.so
   187232  nixl_cu13/_utils.cpython-312-x86_64-linux-gnu.so
     3640  nixl_cu13/logging.py
        0  nixl_cu13/py.typed
        0  nixl_ep_cu13/
     1474  nixl_ep_cu13/__init__.py
    36336  nixl_ep_cu13/buffer.py
     2985  nixl_ep_cu13/utils.py
  9447553  nixl_ep_cu13/nixl_ep_cpp_torch211.cpython-312-x86_64-linux-gnu.so    <-- PyTorch 2.11 bindings
  9555257  nixl_ep_cu13/nixl_ep_cpp_torch212.cpython-312-x86_64-linux-gnu.so    <-- PyTorch 2.12 bindings
```

The cu12 backend wheel mirrors this, with nixl_cu12/ + nixl_ep_cu12/ and the same per-torch .so set.
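A wheel is a zip archive, so listings like the two above can be regenerated from any built wheel with the standard-library `zipfile` module. The minimal sketch below uses `list_wheel`, a hypothetical helper. The wheel file name in the comment is a placeholder for whatever the build produced.

```python
import sys
import zipfile


def list_wheel(path: str, prefixes: tuple[str, ...] = ()) -> list[tuple[int, str]]:
    """Return (size_in_bytes, member_name) for wheel members, optionally filtered by prefix."""
    with zipfile.ZipFile(path) as whl:
        return [
            (info.file_size, info.filename)
            for info in whl.infolist()
            if not prefixes or info.filename.startswith(prefixes)
        ]


if __name__ == "__main__" and len(sys.argv) > 1:
    # e.g. python list_wheel.py nixl_cu13-1.1.0-cp312-cp312-manylinux_2_28_x86_64.whl nixl_ep_cu13/
    for size, name in list_wheel(sys.argv[1], tuple(sys.argv[2:])):
        print(f"{size:>9}  {name}")
```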

Testing

Test matrix

| Run | torch resolved | Dispatched backend | Loaded `.so` | `libcudart.so.X` |
|---|---|---|---|---|
| CUDA 12, PyTorch 2.11 | `2.11.0+cu129` (stable) | `nixl_ep_cu12` | `nixl_ep_cpp_torch211` | `12` ✓ |
| CUDA 12, PyTorch 2.12 | `2.12.0.dev20260401+cu129` (nightly) | `nixl_ep_cu12` | `nixl_ep_cpp_torch212` | `12` ✓ |
| CUDA 13, PyTorch 2.11 | `2.11.0+cu130` (stable) | `nixl_ep_cu13` | `nixl_ep_cpp_torch211` | `13` ✓ |
| CUDA 13, PyTorch 2.12 | `2.12.0+cu130` (stable) | `nixl_ep_cu13` | `nixl_ep_cpp_torch212` | `13` ✓ |

For each case, we check that:

- The CUDA-major assertion passes (`torch.version.cuda` matches the requested CUDA major).
- The meta `nixl_ep` dispatcher routed to the correct `nixl_ep_cu{12,13}` backend based on `torch.version.cuda`.
- The backend selected the correct `nixl_ep_cpp_torch{211,212}` `.so` based on `torch.__version__`.
- `Buffer`, `Config` and `EventOverlap` all resolve through both layers (i.e. the `nixl_ep_cpp` alias trick in `buffer.py`/`utils.py` works).
- `ldd` shows the wheel-bundled `libucp-…`/`libucs-…` from `nixl_cu{12,13}.libs/`, the right `libcudart.so.{12,13}` for the CUDA major, and `libtorch*.so` from the resolved torch in the venv, with no `not found` entries.
- `import nixl` (the non-EP meta package) also imports cleanly.
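The `ldd` criterion in the checklist can be made mechanical. The sketch below assumes the `ldd` output format shown in the test log (`name => path (address)`). `check_ldd` is a hypothetical helper that flags unresolved libraries and any `libcudart` major that does not match the backend. The sample lines are copied from the CUDA 12 / PyTorch 2.11 run.

```python
import re

# Matches e.g. "libcudart.so.12 => /usr/local/cuda/lib64/libcudart.so.12 (0x...)".
CUDART_RE = re.compile(r"libcudart\.so\.(\d+)\s*=>")


def check_ldd(ldd_output: str, cuda_major: int) -> list[str]:
    """Return a list of problems found in ldd output; an empty list means it passed."""
    problems = []
    cudart_majors = set()
    for line in ldd_output.splitlines():
        if "not found" in line:
            problems.append(f"unresolved: {line.strip()}")
        m = CUDART_RE.search(line)
        if m:
            cudart_majors.add(int(m.group(1)))
    if cudart_majors != {cuda_major}:
        problems.append(f"expected libcudart.so.{cuda_major}, found {sorted(cudart_majors)}")
    return problems


sample = """\
        libucp-14aa4f8d.so.0.0.0 => /venv/lib/python3.12/site-packages/nixl_ep_cu12/../nixl_cu12.libs/libucp-14aa4f8d.so.0.0.0 (0x00007ffff1507000)
        libcudart.so.12 => /usr/local/cuda/lib64/libcudart.so.12 (0x00007fff9fa00000)
"""
print(check_ldd(sample, 12))
```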

End to end, the dispatch contract of `nixl_ep/__init__.py` (CUDA major from `torch.version.cuda`) and `nixl_ep_cu{N}/__init__.py` (torch ABI slug from `torch.__version__`) is verified across the full {cu12, cu13} × {2.11, 2.12} matrix.

Testing strategy

```bash
# install uv
apt-get update -qq && apt-get install -y curl ca-certificates
curl -LsSf https://astral.sh/uv/install.sh | sh
export PATH=$HOME/.local/bin:$PATH

# venv
uv venv --python 3.12 /venv
source /venv/bin/activate

# install torch — pick ONE line based on your image:
uv pip install --index-url https://download.pytorch.org/whl/cu129 'torch==2.11.*'    # cu12 + 2.11
uv pip install --index-url https://download.pytorch.org/whl/cu129 --extra-index-url https://download.pytorch.org/whl/nightly/cu129 --index-strategy unsafe-best-match --pre 'torch==2.12.*'    # cu12 + 2.12
uv pip install --index-url https://download.pytorch.org/whl/cu130 'torch==2.11.*'    # cu13 + 2.11
uv pip install --index-url https://download.pytorch.org/whl/cu130 'torch==2.12.*'    # cu13 + 2.12

# install nixl
uv pip install --find-links /wheels nixl

# test
python -c 'import nixl_ep; print(nixl_ep.Buffer, nixl_ep.Config)'
```
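The four torch install lines differ only in the CUDA tag, plus the extra nightly index, `--pre`, and `--index-strategy` for the one nightly-only combination (torch 2.12 on cu129). The sketch below expresses in Python the same rule that the smoke script encodes in `TORCH_FLAGS`. `torch_install_args` is hypothetical. It only builds the argument list and does not run `uv`.

```python
import shlex

# CUDA major -> PyTorch wheel index tag used in this PR's tests.
CU_TAGS = {12: "cu129", 13: "cu130"}


def torch_install_args(cuda_major: int, torch_ver: str) -> list[str]:
    """Build the `uv pip install` argv for torch, mirroring the smoke script's TORCH_FLAGS."""
    cu_tag = CU_TAGS[cuda_major]
    args = ["uv", "pip", "install", "--index-url", f"https://download.pytorch.org/whl/{cu_tag}"]
    # Only torch 2.12 on cu129 was nightly-only at the time of testing.
    if cu_tag == "cu129" and torch_ver == "2.12":
        args += [
            "--extra-index-url", f"https://download.pytorch.org/whl/nightly/{cu_tag}",
            "--pre",
            "--index-strategy", "unsafe-best-match",
        ]
    args.append(f"torch=={torch_ver}.*")
    return args


for cuda_major in (12, 13):
    for torch_ver in ("2.11", "2.12"):
        print(shlex.join(torch_install_args(cuda_major, torch_ver)))
```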

Smoke test script

```bash
#!/usr/bin/env bash
set -euo pipefail

if [ "$#" -lt 2 ]; then
    echo "usage: $0 <cuda-major: 12|13> <torch-version: 2.11|2.12>" >&2
    exit 2
fi

CUDA_MAJOR="$1"
TORCH_VER="$2"
WHEEL_DIR="${WHEEL_DIR:-$(dirname "$(readlink -f "$0")")}"

case "$CUDA_MAJOR" in
    12) IMAGE="nvidia/cuda:12.9.0-runtime-ubuntu22.04"; CU_TAG="cu129" ;;
    13) IMAGE="nvidia/cuda:13.0.0-runtime-ubuntu22.04"; CU_TAG="cu130" ;;
    *) echo "cuda-major must be 12 or 13" >&2; exit 2 ;;
esac

case "$TORCH_VER" in
    2.11|2.12) ;;
    *) echo "torch-version must be 2.11 or 2.12" >&2; exit 2 ;;
esac

echo ">>> image     : $IMAGE"
echo ">>> cu-tag    : $CU_TAG"
echo ">>> torch     : $TORCH_VER"
echo ">>> wheel dir : $WHEEL_DIR"

docker run --rm --gpus all \
    -v "$WHEEL_DIR":/wheels:ro \
    -e CUDA_MAJOR="$CUDA_MAJOR" -e TORCH_VER="$TORCH_VER" -e CU_TAG="$CU_TAG" \
    "$IMAGE" bash -exc '
        set -euo pipefail

        apt-get update -qq
        apt-get install -y -qq --no-install-recommends curl ca-certificates >/dev/null
        curl -LsSf https://astral.sh/uv/install.sh | sh >/dev/null
        export PATH="$HOME/.local/bin:$PATH"

        uv venv --python 3.12 /venv
        # shellcheck disable=SC1091
        . /venv/bin/activate

        # Only torch==2.12 on cu129 is nightly-only today; everything else
        # is on the stable cu index. Add --pre + nightly index iff needed.
        TORCH_FLAGS=( --index-url "https://download.pytorch.org/whl/${CU_TAG}" )
        if [ "$CU_TAG" = "cu129" ] && [ "$TORCH_VER" = "2.12" ]; then
            TORCH_FLAGS+=(
                --extra-index-url "https://download.pytorch.org/whl/nightly/${CU_TAG}"
                --pre
                --index-strategy unsafe-best-match
            )
        fi
        uv pip install "${TORCH_FLAGS[@]}" "torch==${TORCH_VER}.*"

        # nixl + transitive deps (numpy, etc.) — torch already satisfied;
        # local wheel dir provides the meta + nixl-cu{12,13}.
        uv pip install --find-links /wheels nixl

        echo "=== environment ==="
        python -c "
import torch
print(f\"torch={torch.__version__}  cuda={torch.version.cuda}\")
expected = ${CUDA_MAJOR}
got = int(torch.version.cuda.split(\".\")[0]) if torch.version.cuda else None
assert got == expected, f\"expected torch built for cuda{expected}, got cuda{got}\"
"

        echo "=== nixl_ep dispatch ==="
        python - <<PY
import nixl_ep, sys
print("nixl_ep loaded from:", nixl_ep.__file__)
backend = next(m for n, m in sys.modules.items()
               if n.startswith("nixl_ep_cu") and "." not in n)
print("dispatched backend :", backend.__name__, "->", backend.__file__)
print("loaded torch-abi .so:", backend._nixl_ep_cpp.__file__)
print("Buffer       :", nixl_ep.Buffer)
print("Config       :", nixl_ep.Config)
print("EventOverlap :", nixl_ep.EventOverlap)
PY

        echo "=== resolved deps of active .so ==="
        python - <<PY
import os, subprocess, sys, torch
import nixl_ep
backend = next(m for n, m in sys.modules.items()
               if n.startswith("nixl_ep_cu") and "." not in n)
so = backend._nixl_ep_cpp.__file__
torch_libdir = os.path.join(os.path.dirname(torch.__file__), "lib")
env = {**os.environ, "LD_LIBRARY_PATH": torch_libdir + ":" + os.environ.get("LD_LIBRARY_PATH", "")}
print(so)
out = subprocess.check_output(["ldd", so], env=env, text=True)
for line in out.splitlines():
    if any(x in line for x in ("libcudart", "libtorch", "libcuda.", "libucp", "libucs", "not found")):
        print(line)
PY

        echo "=== nixl dispatch ==="
        python -c "import nixl; print(\"nixl OK\", nixl.__file__)"
    '
```

Test log:


./smoke 12 2.11
./smoke 12 2.12
./smoke 13 2.11
./smoke 13 2.12
>>> image     : nvidia/cuda:12.9.0-runtime-ubuntu22.04
>>> cu-tag    : cu129
>>> torch     : 2.11
>>> wheel dir : /.autodirect/mtrswgwork/ovidium/wheels/nixl-ep-wheel-dispatch-20260515

==========
== CUDA ==
==========

CUDA Version 12.9.0

Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.

+ set -euo pipefail
+ apt-get update -qq
+ apt-get install -y -qq --no-install-recommends curl ca-certificates
debconf: delaying package configuration, since apt-utils is not installed
+ curl -LsSf https://astral.sh/uv/install.sh
+ sh
downloading uv 0.11.14 x86_64-unknown-linux-gnu
+ export PATH=/root/.local/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
+ PATH=/root/.local/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
+ uv venv --python 3.12 /venv
Downloading cpython-3.12.13-linux-x86_64-gnu (download) (32.5MiB)
 Downloaded cpython-3.12.13-linux-x86_64-gnu (download)
Using CPython 3.12.13
Creating virtual environment at: /venv
Activate with: source venv/bin/activate
+ . /venv/bin/activate
++ '[' -z '' ']'
++ '[' -n x ']'
++ SCRIPT_PATH=/venv/bin/activate
++ '[' /venv/bin/activate = bash ']'
++ deactivate nondestructive
++ unset -f pydoc
++ '[' -z '' ']'
++ '[' -z '' ']'
++ hash -r
++ '[' -z '' ']'
++ unset VIRTUAL_ENV
++ unset VIRTUAL_ENV_PROMPT
++ '[' '!' nondestructive = nondestructive ']'
++ VIRTUAL_ENV=/venv
++ '[' linux-gnu = cygwin ']'
++ '[' linux-gnu = msys ']'
++ export VIRTUAL_ENV
++ '[' -z '' ']'
++ unset SCRIPT_PATH
++ _OLD_VIRTUAL_PATH=/root/.local/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
++ PATH=/venv/bin:/root/.local/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
++ export PATH
++ '[' x '!=' x ']'
+++ basename /venv
++ VIRTUAL_ENV_PROMPT=venv
++ export VIRTUAL_ENV_PROMPT
++ '[' -z '' ']'
++ '[' -z '' ']'
++ _OLD_VIRTUAL_PS1=
++ PS1='(venv) '
++ export PS1
++ alias pydoc
++ true
++ hash -r
+ TORCH_FLAGS=(--index-url "https://download.pytorch.org/whl/${CU_TAG}")
+ '[' cu129 = cu129 ']'
+ '[' 2.11 = 2.12 ']'
+ uv pip install --index-url https://download.pytorch.org/whl/cu129 'torch==2.11.*'
Using Python 3.12.13 environment at: /venv
Resolved 29 packages in 1.95s
Downloading nvidia-curand-cu12 (65.1MiB)
Downloading torch (1.1GiB)
Downloading cuda-bindings (11.6MiB)
Downloading triton (179.6MiB)
Downloading nvidia-cuda-nvrtc-cu12 (85.4MiB)
Downloading nvidia-nccl-cu12 (283.0MiB)
Downloading nvidia-cusparse-cu12 (349.5MiB)
Downloading nvidia-cuda-runtime-cu12 (3.3MiB)
Downloading nvidia-cublas-cu12 (554.3MiB)
Downloading nvidia-cufile-cu12 (1.2MiB)
Downloading nvidia-nvjitlink-cu12 (37.9MiB)
Downloading nvidia-cudnn-cu12 (617.8MiB)
Downloading nvidia-cusparselt-cu12 (273.9MiB)
Downloading nvidia-cufft-cu12 (191.6MiB)
Downloading nvidia-nvshmem-cu12 (132.7MiB)
Downloading nvidia-cuda-cupti-cu12 (10.3MiB)
Downloading nvidia-cusolver-cu12 (322.5MiB)
Downloading networkx (2.0MiB)
Downloading sympy (6.0MiB)
 Downloaded nvidia-cufile-cu12
 Downloaded nvidia-cuda-runtime-cu12
 Downloaded cuda-bindings
 Downloaded nvidia-cuda-cupti-cu12
 Downloaded nvidia-curand-cu12
 Downloaded networkx
 Downloaded triton
 Downloaded nvidia-nvjitlink-cu12
 Downloaded nvidia-cuda-nvrtc-cu12
 Downloaded sympy
 Downloaded nvidia-nccl-cu12
 Downloaded nvidia-nvshmem-cu12
 Downloaded nvidia-cusparse-cu12
 Downloaded nvidia-cufft-cu12
 Downloaded nvidia-cusparselt-cu12
 Downloaded nvidia-cublas-cu12
 Downloaded nvidia-cusolver-cu12
 Downloaded nvidia-cudnn-cu12
 Downloaded torch
Prepared 29 packages in 34.86s
Installed 29 packages in 318ms
 + cuda-bindings==12.9.4
 + cuda-pathfinder==1.2.2
 + cuda-toolkit==12.9.1
 + filelock==3.29.0
 + fsspec==2026.4.0
 + jinja2==3.1.6
 + markupsafe==3.0.3
 + mpmath==1.3.0
 + networkx==3.6.1
 + nvidia-cublas-cu12==12.9.1.4
 + nvidia-cuda-cupti-cu12==12.9.79
 + nvidia-cuda-nvrtc-cu12==12.9.86
 + nvidia-cuda-runtime-cu12==12.9.79
 + nvidia-cudnn-cu12==9.17.1.4
 + nvidia-cufft-cu12==11.4.1.4
 + nvidia-cufile-cu12==1.14.1.1
 + nvidia-curand-cu12==10.3.10.19
 + nvidia-cusolver-cu12==11.7.5.82
 + nvidia-cusparse-cu12==12.5.10.65
 + nvidia-cusparselt-cu12==0.7.1
 + nvidia-nccl-cu12==2.28.9
 + nvidia-nvjitlink-cu12==12.9.86
 + nvidia-nvshmem-cu12==3.4.5
 + nvidia-nvtx-cu12==12.9.79
 + setuptools==70.2.0
 + sympy==1.14.0
 + torch==2.11.0+cu129
 + triton==3.6.0
 + typing-extensions==4.15.0
+ uv pip install --find-links /wheels nixl
Using Python 3.12.13 environment at: /venv
Resolved 33 packages in 1.04s
Downloading numpy (15.9MiB)
 Downloaded numpy
Prepared 4 packages in 962ms
Installed 4 packages in 64ms
 + nixl==1.1.0
 + nixl-cu12==1.1.0
 + nixl-cu13==1.1.0
 + numpy==2.4.4
+ echo '=== environment ==='
=== environment ===
+ python -c '
import torch
print(f"torch={torch.__version__}  cuda={torch.version.cuda}")
expected = 12
got = int(torch.version.cuda.split(".")[0]) if torch.version.cuda else None
assert got == expected, f"expected torch built for cuda{expected}, got cuda{got}"
'
torch=2.11.0+cu129  cuda=12.9
=== nixl_ep dispatch ===
+ echo '=== nixl_ep dispatch ==='
+ python -
nixl_ep loaded from: /venv/lib/python3.12/site-packages/nixl_ep/__init__.py
dispatched backend : nixl_ep_cu12 -> /venv/lib/python3.12/site-packages/nixl_ep_cu12/__init__.py
loaded torch-abi .so: /venv/lib/python3.12/site-packages/nixl_ep_cu12/nixl_ep_cpp_torch211.cpython-312-x86_64-linux-gnu.so
Buffer       : <class 'nixl_ep_cu12.buffer.Buffer'>
Config       : <class 'nixl_ep_cu12.nixl_ep_cpp_torch211.Config'>
EventOverlap : <class 'nixl_ep_cu12.utils.EventOverlap'>
=== resolved deps of active .so ===
+ echo '=== resolved deps of active .so ==='
+ python -
/venv/lib/python3.12/site-packages/nixl_ep_cu12/nixl_ep_cpp_torch211.cpython-312-x86_64-linux-gnu.so
        libcuda.so.1 => /lib/x86_64-linux-gnu/libcuda.so.1 (0x00007ffff1618000)
        libucp-14aa4f8d.so.0.0.0 => /venv/lib/python3.12/site-packages/nixl_ep_cu12/../nixl_cu12.libs/libucp-14aa4f8d.so.0.0.0 (0x00007ffff1507000)
        libucs-f64e0bb0.so.0.0.0 => /venv/lib/python3.12/site-packages/nixl_ep_cu12/../nixl_cu12.libs/libucs-f64e0bb0.so.0.0.0 (0x00007ffff1435000)
        libtorch.so => /venv/lib/python3.12/site-packages/torch/lib/libtorch.so (0x00007ffff13d0000)
        libtorch_python.so => /venv/lib/python3.12/site-packages/torch/lib/libtorch_python.so (0x00007fffefd77000)
        libtorch_cuda.so => /venv/lib/python3.12/site-packages/torch/lib/libtorch_cuda.so (0x00007fff9ffd1000)
        libcudart.so.12 => /usr/local/cuda/lib64/libcudart.so.12 (0x00007fff9fa00000)
        libtorch_cpu.so => /venv/lib/python3.12/site-packages/torch/lib/libtorch_cpu.so (0x00007fff89bc5000)
        libtorch_nvshmem.so => /venv/lib/python3.12/site-packages/torch/lib/libtorch_nvshmem.so (0x00007fff6da25000)
=== nixl dispatch ===
+ echo '=== nixl dispatch ==='
+ python -c 'import nixl; print("nixl OK", nixl.__file__)'
nixl OK /venv/lib/python3.12/site-packages/nixl/__init__.py
>>> image     : nvidia/cuda:12.9.0-runtime-ubuntu22.04
>>> cu-tag    : cu129
>>> torch     : 2.12
>>> wheel dir : /.autodirect/mtrswgwork/ovidium/wheels/nixl-ep-wheel-dispatch-20260515

==========
== CUDA ==
==========

CUDA Version 12.9.0

Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.

+ set -euo pipefail
+ apt-get update -qq
+ apt-get install -y -qq --no-install-recommends curl ca-certificates
debconf: delaying package configuration, since apt-utils is not installed
+ curl -LsSf https://astral.sh/uv/install.sh
+ sh
downloading uv 0.11.14 x86_64-unknown-linux-gnu
+ export PATH=/root/.local/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
+ PATH=/root/.local/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
+ uv venv --python 3.12 /venv
Downloading cpython-3.12.13-linux-x86_64-gnu (download) (32.5MiB)
 Downloaded cpython-3.12.13-linux-x86_64-gnu (download)
Using CPython 3.12.13
Creating virtual environment at: /venv
Activate with: source venv/bin/activate
+ . /venv/bin/activate
++ '[' -z '' ']'
++ '[' -n x ']'
++ SCRIPT_PATH=/venv/bin/activate
++ '[' /venv/bin/activate = bash ']'
++ deactivate nondestructive
++ unset -f pydoc
++ '[' -z '' ']'
++ '[' -z '' ']'
++ hash -r
++ '[' -z '' ']'
++ unset VIRTUAL_ENV
++ unset VIRTUAL_ENV_PROMPT
++ '[' '!' nondestructive = nondestructive ']'
++ VIRTUAL_ENV=/venv
++ '[' linux-gnu = cygwin ']'
++ '[' linux-gnu = msys ']'
++ export VIRTUAL_ENV
++ '[' -z '' ']'
++ unset SCRIPT_PATH
++ _OLD_VIRTUAL_PATH=/root/.local/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
++ PATH=/venv/bin:/root/.local/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
++ export PATH
++ '[' x '!=' x ']'
+++ basename /venv
++ VIRTUAL_ENV_PROMPT=venv
++ export VIRTUAL_ENV_PROMPT
++ '[' -z '' ']'
++ '[' -z '' ']'
++ _OLD_VIRTUAL_PS1=
++ PS1='(venv) '
++ export PS1
++ alias pydoc
++ true
++ hash -r
+ TORCH_FLAGS=(--index-url "https://download.pytorch.org/whl/${CU_TAG}")
+ '[' cu129 = cu129 ']'
+ '[' 2.12 = 2.12 ']'
+ TORCH_FLAGS+=(--extra-index-url "https://download.pytorch.org/whl/nightly/${CU_TAG}" --pre --index-strategy unsafe-best-match)
+ uv pip install --index-url https://download.pytorch.org/whl/cu129 --extra-index-url https://download.pytorch.org/whl/nightly/cu129 --pre --index-strategy unsafe-best-match 'torch==2.12.*'
Using Python 3.12.13 environment at: /venv
Resolved 29 packages in 2.47s
Downloading nvidia-cuda-runtime-cu12 (3.3MiB)
Downloading nvidia-cublas-cu12 (554.3MiB)
Downloading nvidia-cusparse-cu12 (349.5MiB)
Downloading nvidia-cufile-cu12 (1.2MiB)
Downloading nvidia-nvjitlink-cu12 (37.9MiB)
Downloading nvidia-cufft-cu12 (191.6MiB)
Downloading nvidia-nvshmem-cu12 (132.7MiB)
Downloading nvidia-cuda-cupti-cu12 (10.3MiB)
Downloading nvidia-cusolver-cu12 (322.5MiB)
Downloading nvidia-curand-cu12 (65.1MiB)
Downloading nvidia-nccl-cu12 (280.0MiB)
Downloading nvidia-cuda-nvrtc-cu12 (85.4MiB)
Downloading nvidia-cusparselt-cu12 (228.2MiB)
Downloading nvidia-cudnn-cu12 (627.4MiB)
Downloading setuptools (1.2MiB)
Downloading networkx (2.0MiB)
Downloading sympy (6.0MiB)
Downloading cuda-bindings (11.6MiB)
Downloading triton (278.6MiB)
Downloading torch (1.1GiB)
 Downloaded nvidia-cufile-cu12
 Downloaded nvidia-cuda-runtime-cu12
 Downloaded cuda-bindings
 Downloaded nvidia-cuda-cupti-cu12
 Downloaded setuptools
 Downloaded networkx
 Downloaded nvidia-nvjitlink-cu12
 Downloaded triton
 Downloaded sympy
 Downloaded nvidia-curand-cu12
 Downloaded nvidia-cuda-nvrtc-cu12
 Downloaded nvidia-cusparse-cu12
 Downloaded nvidia-nvshmem-cu12
 Downloaded nvidia-cublas-cu12
 Downloaded nvidia-cufft-cu12
 Downloaded nvidia-cusolver-cu12
 Downloaded nvidia-nccl-cu12
 Downloaded nvidia-cudnn-cu12
 Downloaded nvidia-cusparselt-cu12
 Downloaded torch
Prepared 29 packages in 36.00s
Installed 29 packages in 315ms
 + cuda-bindings==12.9.4
 + cuda-pathfinder==1.2.2
 + cuda-toolkit==12.9.1
 + filelock==3.29.0
 + fsspec==2026.4.0
 + jinja2==3.1.6
 + markupsafe==3.0.3
 + mpmath==1.3.0
 + networkx==3.6.1
 + nvidia-cublas-cu12==12.9.1.4
 + nvidia-cuda-cupti-cu12==12.9.79
 + nvidia-cuda-nvrtc-cu12==12.9.86
 + nvidia-cuda-runtime-cu12==12.9.79
 + nvidia-cudnn-cu12==9.20.0.48
 + nvidia-cufft-cu12==11.4.1.4
 + nvidia-cufile-cu12==1.14.1.1
 + nvidia-curand-cu12==10.3.10.19
 + nvidia-cusolver-cu12==11.7.5.82
 + nvidia-cusparse-cu12==12.5.10.65
 + nvidia-cusparselt-cu12==0.8.1
 + nvidia-nccl-cu12==2.29.7
 + nvidia-nvjitlink-cu12==12.9.86
 + nvidia-nvshmem-cu12==3.4.5
 + nvidia-nvtx-cu12==12.9.79
 + setuptools==78.1.0
 + sympy==1.14.0
 + torch==2.12.0.dev20260401+cu129
 + triton==3.7.0+git9c288bc5
 + typing-extensions==4.15.0
+ uv pip install --find-links /wheels nixl
Using Python 3.12.13 environment at: /venv
Resolved 33 packages in 921ms
Downloading numpy (15.9MiB)
 Downloaded numpy
Prepared 4 packages in 1.01s
Installed 4 packages in 61ms
 + nixl==1.1.0
 + nixl-cu12==1.1.0
 + nixl-cu13==1.1.0
 + numpy==2.4.4
+ echo '=== environment ==='
=== environment ===
+ python -c '
import torch
print(f"torch={torch.__version__}  cuda={torch.version.cuda}")
expected = 12
got = int(torch.version.cuda.split(".")[0]) if torch.version.cuda else None
assert got == expected, f"expected torch built for cuda{expected}, got cuda{got}"
'
torch=2.12.0.dev20260401+cu129  cuda=12.9
=== nixl_ep dispatch ===
+ echo '=== nixl_ep dispatch ==='
+ python -
nixl_ep loaded from: /venv/lib/python3.12/site-packages/nixl_ep/__init__.py
dispatched backend : nixl_ep_cu12 -> /venv/lib/python3.12/site-packages/nixl_ep_cu12/__init__.py
loaded torch-abi .so: /venv/lib/python3.12/site-packages/nixl_ep_cu12/nixl_ep_cpp_torch212.cpython-312-x86_64-linux-gnu.so
Buffer       : <class 'nixl_ep_cu12.buffer.Buffer'>
Config       : <class 'nixl_ep_cu12.nixl_ep_cpp_torch212.Config'>
EventOverlap : <class 'nixl_ep_cu12.utils.EventOverlap'>
=== resolved deps of active .so ===
+ echo '=== resolved deps of active .so ==='
+ python -
/venv/lib/python3.12/site-packages/nixl_ep_cu12/nixl_ep_cpp_torch212.cpython-312-x86_64-linux-gnu.so
        libcuda.so.1 => /lib/x86_64-linux-gnu/libcuda.so.1 (0x00007ffff15ff000)
        libucp-14aa4f8d.so.0.0.0 => /venv/lib/python3.12/site-packages/nixl_ep_cu12/../nixl_cu12.libs/libucp-14aa4f8d.so.0.0.0 (0x00007ffff14ee000)
        libucs-f64e0bb0.so.0.0.0 => /venv/lib/python3.12/site-packages/nixl_ep_cu12/../nixl_cu12.libs/libucs-f64e0bb0.so.0.0.0 (0x00007ffff141c000)
        libtorch.so => /venv/lib/python3.12/site-packages/torch/lib/libtorch.so (0x00007ffff13e9000)
        libtorch_python.so => /venv/lib/python3.12/site-packages/torch/lib/libtorch_python.so (0x00007fffefd99000)
        libtorch_cuda.so => /venv/lib/python3.12/site-packages/torch/lib/libtorch_cuda.so (0x00007fff9d8af000)
        libcudart.so.12 => /usr/local/cuda/lib64/libcudart.so.12 (0x00007fff9d200000)
        libtorch_cpu.so => /venv/lib/python3.12/site-packages/torch/lib/libtorch_cpu.so (0x00007fff87748000)
        libtorch_nvshmem.so => /venv/lib/python3.12/site-packages/torch/lib/libtorch_nvshmem.so (0x00007fff867a3000)
=== nixl dispatch ===
+ echo '=== nixl dispatch ==='
+ python -c 'import nixl; print("nixl OK", nixl.__file__)'
nixl OK /venv/lib/python3.12/site-packages/nixl/__init__.py
>>> image     : nvidia/cuda:13.0.0-runtime-ubuntu22.04
>>> cu-tag    : cu130
>>> torch     : 2.11
>>> wheel dir : /.autodirect/mtrswgwork/ovidium/wheels/nixl-ep-wheel-dispatch-20260515

==========
== CUDA ==
==========

CUDA Version 13.0.0

Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.

+ set -euo pipefail
+ apt-get update -qq
+ apt-get install -y -qq --no-install-recommends curl ca-certificates
debconf: delaying package configuration, since apt-utils is not installed
+ curl -LsSf https://astral.sh/uv/install.sh
+ sh
downloading uv 0.11.14 x86_64-unknown-linux-gnu
+ export PATH=/root/.local/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
+ PATH=/root/.local/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
+ uv venv --python 3.12 /venv
Downloading cpython-3.12.13-linux-x86_64-gnu (download) (32.5MiB)
 Downloaded cpython-3.12.13-linux-x86_64-gnu (download)
Using CPython 3.12.13
Creating virtual environment at: /venv
Activate with: source venv/bin/activate
+ . /venv/bin/activate
++ '[' -z '' ']'
++ '[' -n x ']'
++ SCRIPT_PATH=/venv/bin/activate
++ '[' /venv/bin/activate = bash ']'
++ deactivate nondestructive
++ unset -f pydoc
++ '[' -z '' ']'
++ '[' -z '' ']'
++ hash -r
++ '[' -z '' ']'
++ unset VIRTUAL_ENV
++ unset VIRTUAL_ENV_PROMPT
++ '[' '!' nondestructive = nondestructive ']'
++ VIRTUAL_ENV=/venv
++ '[' linux-gnu = cygwin ']'
++ '[' linux-gnu = msys ']'
++ export VIRTUAL_ENV
++ '[' -z '' ']'
++ unset SCRIPT_PATH
++ _OLD_VIRTUAL_PATH=/root/.local/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
++ PATH=/venv/bin:/root/.local/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
++ export PATH
++ '[' x '!=' x ']'
+++ basename /venv
++ VIRTUAL_ENV_PROMPT=venv
++ export VIRTUAL_ENV_PROMPT
++ '[' -z '' ']'
++ '[' -z '' ']'
++ _OLD_VIRTUAL_PS1=
++ PS1='(venv) '
++ export PS1
++ alias pydoc
++ true
++ hash -r
+ TORCH_FLAGS=(--index-url "https://download.pytorch.org/whl/${CU_TAG}")
+ '[' cu130 = cu129 ']'
+ uv pip install --index-url https://download.pytorch.org/whl/cu130 'torch==2.11.*'
Using Python 3.12.13 environment at: /venv
Resolved 29 packages in 1.86s
Downloading nvidia-cufft (204.2MiB)
Downloading triton (179.6MiB)
Downloading nvidia-cusolver (191.6MiB)
Downloading nvidia-cuda-runtime (2.1MiB)
Downloading nvidia-nvjitlink (38.8MiB)
Downloading nvidia-cufile (1.2MiB)
Downloading nvidia-curand (56.8MiB)
Downloading nvidia-cudnn-cu13 (349.1MiB)
Downloading nvidia-cusparse (139.2MiB)
Downloading nvidia-nvshmem-cu13 (57.6MiB)
Downloading nvidia-cublas (403.5MiB)
Downloading nvidia-cuda-nvrtc (86.0MiB)
Downloading nvidia-cusparselt-cu13 (162.0MiB)
Downloading nvidia-nccl-cu13 (187.4MiB)
Downloading nvidia-cuda-cupti (10.2MiB)
Downloading networkx (2.0MiB)
Downloading sympy (6.0MiB)
Downloading torch (506.5MiB)
Downloading cuda-bindings (11.6MiB)
 Downloaded nvidia-cufile
 Downloaded nvidia-cuda-runtime
 Downloaded cuda-bindings
 Downloaded nvidia-cuda-cupti
 Downloaded nvidia-nvjitlink
 Downloaded networkx
 Downloaded triton
 Downloaded nvidia-curand
 Downloaded sympy
 Downloaded nvidia-nvshmem-cu13
 Downloaded nvidia-cufft
 Downloaded nvidia-cusolver
 Downloaded nvidia-cuda-nvrtc
 Downloaded nvidia-cusparse
 Downloaded nvidia-cusparselt-cu13
 Downloaded nvidia-nccl-cu13
 Downloaded nvidia-cudnn-cu13
 Downloaded nvidia-cublas
 Downloaded torch
Prepared 29 packages in 14.47s
Installed 29 packages in 318ms
 + cuda-bindings==13.0.3
 + cuda-pathfinder==1.2.2
 + cuda-toolkit==13.0.2
 + filelock==3.29.0
 + fsspec==2026.4.0
 + jinja2==3.1.6
 + markupsafe==3.0.3
 + mpmath==1.3.0
 + networkx==3.6.1
 + nvidia-cublas==13.1.0.3
 + nvidia-cuda-cupti==13.0.85
 + nvidia-cuda-nvrtc==13.0.88
 + nvidia-cuda-runtime==13.0.96
 + nvidia-cudnn-cu13==9.19.0.56
 + nvidia-cufft==12.0.0.61
 + nvidia-cufile==1.15.1.6
 + nvidia-curand==10.4.0.35
 + nvidia-cusolver==12.0.4.66
 + nvidia-cusparse==12.6.3.3
 + nvidia-cusparselt-cu13==0.8.0
 + nvidia-nccl-cu13==2.28.9
 + nvidia-nvjitlink==13.0.88
 + nvidia-nvshmem-cu13==3.4.5
 + nvidia-nvtx==13.0.85
 + setuptools==70.2.0
 + sympy==1.14.0
 + torch==2.11.0+cu130
 + triton==3.6.0
 + typing-extensions==4.15.0
+ uv pip install --find-links /wheels nixl
Using Python 3.12.13 environment at: /venv
Resolved 33 packages in 933ms
Downloading numpy (15.9MiB)
 Downloaded numpy
Prepared 4 packages in 1.17s
Installed 4 packages in 64ms
 + nixl==1.1.0
 + nixl-cu12==1.1.0
 + nixl-cu13==1.1.0
 + numpy==2.4.4
+ echo '=== environment ==='
=== environment ===
+ python -c '
import torch
print(f"torch={torch.__version__}  cuda={torch.version.cuda}")
expected = 13
got = int(torch.version.cuda.split(".")[0]) if torch.version.cuda else None
assert got == expected, f"expected torch built for cuda{expected}, got cuda{got}"
'
torch=2.11.0+cu130  cuda=13.0
=== nixl_ep dispatch ===
+ echo '=== nixl_ep dispatch ==='
+ python -
nixl_ep loaded from: /venv/lib/python3.12/site-packages/nixl_ep/__init__.py
dispatched backend : nixl_ep_cu13 -> /venv/lib/python3.12/site-packages/nixl_ep_cu13/__init__.py
loaded torch-abi .so: /venv/lib/python3.12/site-packages/nixl_ep_cu13/nixl_ep_cpp_torch211.cpython-312-x86_64-linux-gnu.so
Buffer       : <class 'nixl_ep_cu13.buffer.Buffer'>
Config       : <class 'nixl_ep_cu13.nixl_ep_cpp_torch211.Config'>
EventOverlap : <class 'nixl_ep_cu13.utils.EventOverlap'>
=== resolved deps of active .so ===
+ echo '=== resolved deps of active .so ==='
+ python -
/venv/lib/python3.12/site-packages/nixl_ep_cu13/nixl_ep_cpp_torch211.cpython-312-x86_64-linux-gnu.so
        libcuda.so.1 => /lib/x86_64-linux-gnu/libcuda.so.1 (0x00007ffff1983000)
        libucp-14aa4f8d.so.0.0.0 => /venv/lib/python3.12/site-packages/nixl_ep_cu13/../nixl_cu13.libs/libucp-14aa4f8d.so.0.0.0 (0x00007ffff1872000)
        libucs-f64e0bb0.so.0.0.0 => /venv/lib/python3.12/site-packages/nixl_ep_cu13/../nixl_cu13.libs/libucs-f64e0bb0.so.0.0.0 (0x00007ffff17a0000)
        libtorch.so => /venv/lib/python3.12/site-packages/torch/lib/libtorch.so (0x00007ffff173b000)
        libtorch_python.so => /venv/lib/python3.12/site-packages/torch/lib/libtorch_python.so (0x00007ffff00e2000)
        libtorch_cuda.so => /venv/lib/python3.12/site-packages/torch/lib/libtorch_cuda.so (0x00007fffd845e000)
        libcudart.so.13 => /usr/local/cuda/lib64/libcudart.so.13 (0x00007fffd7e00000)
        libtorch_cpu.so => /venv/lib/python3.12/site-packages/torch/lib/libtorch_cpu.so (0x00007fffc2024000)
        libtorch_nvshmem.so => /venv/lib/python3.12/site-packages/torch/lib/libtorch_nvshmem.so (0x00007fffc1347000)
=== nixl dispatch ===
+ echo '=== nixl dispatch ==='
+ python -c 'import nixl; print("nixl OK", nixl.__file__)'
nixl OK /venv/lib/python3.12/site-packages/nixl/__init__.py
>>> image     : nvidia/cuda:13.0.0-runtime-ubuntu22.04
>>> cu-tag    : cu130
>>> torch     : 2.12
>>> wheel dir : /.autodirect/mtrswgwork/ovidium/wheels/nixl-ep-wheel-dispatch-20260515

==========
== CUDA ==
==========

CUDA Version 13.0.0

Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.

+ set -euo pipefail
+ apt-get update -qq
+ apt-get install -y -qq --no-install-recommends curl ca-certificates
debconf: delaying package configuration, since apt-utils is not installed
+ curl -LsSf https://astral.sh/uv/install.sh
+ sh
downloading uv 0.11.14 x86_64-unknown-linux-gnu
+ export PATH=/root/.local/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
+ PATH=/root/.local/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
+ uv venv --python 3.12 /venv
Downloading cpython-3.12.13-linux-x86_64-gnu (download) (32.5MiB)
 Downloaded cpython-3.12.13-linux-x86_64-gnu (download)
Using CPython 3.12.13
Creating virtual environment at: /venv
Activate with: source venv/bin/activate
+ . /venv/bin/activate
++ '[' -z '' ']'
++ '[' -n x ']'
++ SCRIPT_PATH=/venv/bin/activate
++ '[' /venv/bin/activate = bash ']'
++ deactivate nondestructive
++ unset -f pydoc
++ '[' -z '' ']'
++ '[' -z '' ']'
++ hash -r
++ '[' -z '' ']'
++ unset VIRTUAL_ENV
++ unset VIRTUAL_ENV_PROMPT
++ '[' '!' nondestructive = nondestructive ']'
++ VIRTUAL_ENV=/venv
++ '[' linux-gnu = cygwin ']'
++ '[' linux-gnu = msys ']'
++ export VIRTUAL_ENV
++ '[' -z '' ']'
++ unset SCRIPT_PATH
++ _OLD_VIRTUAL_PATH=/root/.local/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
++ PATH=/venv/bin:/root/.local/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
++ export PATH
++ '[' x '!=' x ']'
+++ basename /venv
++ VIRTUAL_ENV_PROMPT=venv
++ export VIRTUAL_ENV_PROMPT
++ '[' -z '' ']'
++ '[' -z '' ']'
++ _OLD_VIRTUAL_PS1=
++ PS1='(venv) '
++ export PS1
++ alias pydoc
++ true
++ hash -r
+ TORCH_FLAGS=(--index-url "https://download.pytorch.org/whl/${CU_TAG}")
+ '[' cu130 = cu129 ']'
+ uv pip install --index-url https://download.pytorch.org/whl/cu130 'torch==2.12.*'
Using Python 3.12.13 environment at: /venv
Resolved 29 packages in 2.32s
Downloading nvidia-cuda-runtime (2.1MiB)
Downloading torch (508.0MiB)
Downloading nvidia-cusolver (191.6MiB)
Downloading nvidia-cufft (204.2MiB)
Downloading nvidia-cufile (1.2MiB)
Downloading nvidia-nvjitlink (38.8MiB)
Downloading nvidia-cuda-cupti (10.2MiB)
Downloading nvidia-curand (56.8MiB)
Downloading nvidia-cudnn-cu13 (349.2MiB)
Downloading nvidia-cusparse (139.2MiB)
Downloading nvidia-nvshmem-cu13 (57.6MiB)
Downloading nvidia-cublas (403.5MiB)
Downloading nvidia-cusparselt-cu13 (162.3MiB)
Downloading nvidia-cuda-nvrtc (86.0MiB)
Downloading nvidia-nccl-cu13 (196.4MiB)
Downloading sympy (6.0MiB)
Downloading networkx (2.0MiB)
Downloading cuda-bindings (11.6MiB)
Downloading triton (192.1MiB)
 Downloaded nvidia-cufile
 Downloaded nvidia-cuda-runtime
 Downloaded nvidia-cuda-cupti
 Downloaded cuda-bindings
 Downloaded nvidia-nvjitlink
 Downloaded networkx
 Downloaded triton
 Downloaded sympy
 Downloaded nvidia-curand
 Downloaded nvidia-cusolver
 Downloaded nvidia-nvshmem-cu13
 Downloaded nvidia-cufft
 Downloaded nvidia-cuda-nvrtc
 Downloaded nvidia-cusparse
 Downloaded nvidia-cusparselt-cu13
 Downloaded nvidia-cudnn-cu13
 Downloaded nvidia-nccl-cu13
 Downloaded nvidia-cublas
 Downloaded torch
Prepared 29 packages in 15.06s
Installed 29 packages in 310ms
 + cuda-bindings==13.0.3
 + cuda-pathfinder==1.2.2
 + cuda-toolkit==13.0.2
 + filelock==3.29.0
 + fsspec==2026.4.0
 + jinja2==3.1.6
 + markupsafe==3.0.3
 + mpmath==1.3.0
 + networkx==3.6.1
 + nvidia-cublas==13.1.1.3
 + nvidia-cuda-cupti==13.0.85
 + nvidia-cuda-nvrtc==13.0.88
 + nvidia-cuda-runtime==13.0.96
 + nvidia-cudnn-cu13==9.20.0.48
 + nvidia-cufft==12.0.0.61
 + nvidia-cufile==1.15.1.6
 + nvidia-curand==10.4.0.35
 + nvidia-cusolver==12.0.4.66
 + nvidia-cusparse==12.6.3.3
 + nvidia-cusparselt-cu13==0.8.1
 + nvidia-nccl-cu13==2.29.7
 + nvidia-nvjitlink==13.0.88
 + nvidia-nvshmem-cu13==3.4.5
 + nvidia-nvtx==13.0.85
 + setuptools==70.2.0
 + sympy==1.14.0
 + torch==2.12.0+cu130
 + triton==3.7.0
 + typing-extensions==4.15.0
+ uv pip install --find-links /wheels nixl
Using Python 3.12.13 environment at: /venv
Resolved 33 packages in 827ms
Downloading numpy (15.9MiB)
 Downloaded numpy
Prepared 4 packages in 900ms
Installed 4 packages in 61ms
 + nixl==1.1.0
 + nixl-cu12==1.1.0
 + nixl-cu13==1.1.0
 + numpy==2.4.4
+ echo '=== environment ==='
=== environment ===
+ python -c '
import torch
print(f"torch={torch.__version__}  cuda={torch.version.cuda}")
expected = 13
got = int(torch.version.cuda.split(".")[0]) if torch.version.cuda else None
assert got == expected, f"expected torch built for cuda{expected}, got cuda{got}"
'
torch=2.12.0+cu130  cuda=13.0
=== nixl_ep dispatch ===
+ echo '=== nixl_ep dispatch ==='
+ python -
nixl_ep loaded from: /venv/lib/python3.12/site-packages/nixl_ep/__init__.py
dispatched backend : nixl_ep_cu13 -> /venv/lib/python3.12/site-packages/nixl_ep_cu13/__init__.py
loaded torch-abi .so: /venv/lib/python3.12/site-packages/nixl_ep_cu13/nixl_ep_cpp_torch212.cpython-312-x86_64-linux-gnu.so
Buffer       : <class 'nixl_ep_cu13.buffer.Buffer'>
Config       : <class 'nixl_ep_cu13.nixl_ep_cpp_torch212.Config'>
EventOverlap : <class 'nixl_ep_cu13.utils.EventOverlap'>
=== resolved deps of active .so ===
+ echo '=== resolved deps of active .so ==='
+ python -
/venv/lib/python3.12/site-packages/nixl_ep_cu13/nixl_ep_cpp_torch212.cpython-312-x86_64-linux-gnu.so
        libcuda.so.1 => /lib/x86_64-linux-gnu/libcuda.so.1 (0x00007ffff196d000)
        libucp-14aa4f8d.so.0.0.0 => /venv/lib/python3.12/site-packages/nixl_ep_cu13/../nixl_cu13.libs/libucp-14aa4f8d.so.0.0.0 (0x00007ffff185c000)
        libucs-f64e0bb0.so.0.0.0 => /venv/lib/python3.12/site-packages/nixl_ep_cu13/../nixl_cu13.libs/libucs-f64e0bb0.so.0.0.0 (0x00007ffff178a000)
        libtorch.so => /venv/lib/python3.12/site-packages/torch/lib/libtorch.so (0x00007ffff1757000)
        libtorch_python.so => /venv/lib/python3.12/site-packages/torch/lib/libtorch_python.so (0x00007ffff00fb000)
        libtorch_cuda.so => /venv/lib/python3.12/site-packages/torch/lib/libtorch_cuda.so (0x00007fffd7edd000)
        libcudart.so.13 => /usr/local/cuda/lib64/libcudart.so.13 (0x00007fffd7a00000)
        libtorch_cpu.so => /venv/lib/python3.12/site-packages/torch/lib/libtorch_cpu.so (0x00007fffc1db8000)
        libtorch_nvshmem.so => /venv/lib/python3.12/site-packages/torch/lib/libtorch_nvshmem.so (0x00007fffc1242000)
=== nixl dispatch ===
+ echo '=== nixl dispatch ==='
+ python -c 'import nixl; print("nixl OK", nixl.__file__)'
nixl OK /venv/lib/python3.12/site-packages/nixl/__init__.py
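The two runs above show the dispatch chain end to end. `torch.version.cuda` = 13.0 selects `nixl_ep_cu13`, and torch 2.11 / 2.12 select `nixl_ep_cpp_torch211` / `nixl_ep_cpp_torch212`. Below is a minimal sketch of that name derivation. It assumes the ABI suffix is just major and minor concatenated, which is what the log suggests. `expected_modules` is a hypothetical helper, not the code shipped in the wheel. It takes version strings instead of importing torch, so it runs anywhere.

```python
import re
from typing import Optional, Tuple


def expected_modules(torch_version: str, cuda_version: Optional[str]) -> Tuple[Optional[str], str]:
    """Map torch.__version__ / torch.version.cuda to (backend package, ABI extension).

    Hypothetical helper mirroring the module names seen in the smoke-test log.
    """
    # "2.12.0+cu130" or "2.12.0.dev20260515+cu130" -> major "2", minor "12"
    m = re.match(r"(\d+)\.(\d+)", torch_version)
    if m is None:
        raise ValueError(f"unrecognized torch version: {torch_version!r}")
    abi_ext = f"nixl_ep_cpp_torch{m.group(1)}{m.group(2)}"

    # torch.version.cuda is None on CPU-only builds; per the PR description the
    # real dispatcher then probes the installed backends instead.
    backend = None
    if cuda_version:
        backend = f"nixl_ep_cu{cuda_version.split('.')[0]}"
    return backend, abi_ext


if __name__ == "__main__":
    print(expected_modules("2.11.0+cu130", "13.0"))  # -> ('nixl_ep_cu13', 'nixl_ep_cpp_torch211')
```

Because the mapping is a pure function of the two version strings, it can be checked against a smoke-test log like the one above without a GPU.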

Summary by CodeRabbit

New Features
- Build can produce wheels for multiple PyTorch versions in one run.
- Runtime auto-selects and loads the appropriate Torch/CUDA-versioned extension.
Chores
- Wheel pipeline reorganized to build, repair, and merge per-PyTorch artifacts and inject plugins.
- Added a utility to merge extension artifacts between wheels.
- Packaging updated to include the EP package alongside the core distribution.
Style
- Type-checker config adjusted to avoid duplicate-package resolution errors.

Signed-off-by: Ovidiu Mara <ovidium@nvidia.com>

github-actions · 2026-05-15T10:39:37Z

👋 Hi ovidiusm! Thank you for contributing to ai-dynamo/nixl.

Your PR reviewers will review your contribution then trigger the CI to test your changes.

🚀

dpressle · 2026-05-15T10:39:58Z

👀 Investigating Run Pre-Commit Hooks…

dpressle · 2026-05-15T10:41:50Z

🤖 CI Triage Agent — Run Pre-Commit Hooks · commit b299774a

I have everything needed to produce a complete diagnosis. Here is the full breakdown:

Summary: Pre-commit hooks (mypy + flake8) failed on two files touched by PR #1646: a new nixl_ep/__init__.py under src/bindings/python/nixl-meta/ and the existing one under examples/device/ep/nixl_ep/.

Root cause: Two independent problems detected by pre-commit:

1. `mypy` — Duplicate module `nixl_ep` (exit code 2)

src/bindings/python/nixl-meta/nixl_ep/__init__.py: error: Duplicate module named "nixl_ep"
(also at "examples/device/ep/nixl_ep/__init__.py")

PR #1646 introduces a new nixl_ep Python package at src/bindings/python/nixl-meta/nixl_ep/__init__.py. Because a module with the identical name (nixl_ep) already exists at examples/device/ep/nixl_ep/__init__.py, mypy cannot resolve which one to check and raises a fatal error, aborting all further type-checking.

2. `flake8` — E402 module-level imports not at top of file (exit code 1)

examples/device/ep/nixl_ep/__init__.py:28:1: E402 module level import not at top of file
examples/device/ep/nixl_ep/__init__.py:29:1: E402 module level import not at top of file

Lines 28–29 of examples/device/ep/nixl_ep/__init__.py are module-level imports that come after non-import statements (the agent points at the topk_idx_t/Config assignments just above them), so they are no longer part of the import block at the top of the module. flake8 reports each of them as E402.

Implicated commit: b299774a906dbc5c73e9b7064726a3d370c90f0f (PR #1646 branch nixl-ep-wheel-dispatch)

Files:

src/bindings/python/nixl-meta/nixl_ep/__init__.py — new file, causes duplicate-module collision for mypy
examples/device/ep/nixl_ep/__init__.py:28-29 — E402 flake8 violations

Suggested fix:

mypy duplicate module — Pick one of mypy's documented resolutions:
- Preferred: Add a mypy.ini / setup.cfg / pyproject.toml [mypy] section (or update the existing one in .pre-commit-config.yaml's mypy hook args) with --exclude examples/device/ep/nixl_ep so mypy only checks the canonical src/ copy, or
- Rename the new package (e.g. nixl_ep_meta) so the two directories no longer share a module name, or
- Add --explicit-package-bases to the mypy invocation and ensure both directories live under a common namespace with distinct qualified names.
flake8 E402 — In examples/device/ep/nixl_ep/__init__.py, move the import torch statement (line 21) and the relative imports (from . import ..., from .buffer import ..., from .utils import ...) before any non-import executable statements, or suppress the two offending lines with # noqa: E402 if the order is intentional (e.g. torch must be imported before the C++ extension).

Related: PR #1646 (nixl-ep-wheel-dispatch), no prior issues found matching this exact collision.

Signed-off-by: Ovidiu Mara <ovidium@nvidia.com>

ovidiusm · 2026-05-15T10:45:51Z

/build


ovidiusm · 2026-05-15T11:13:21Z

/build

Signed-off-by: Ovidiu Mara <ovidium@nvidia.com>

Signed-off-by: ovidiusm <ovidium@nvidia.com>

ovidiusm · 2026-05-18T07:46:25Z

/build

coderabbitai

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@contrib/build-wheel.sh`:
- Around line 159-169: The probe and installer flags currently use
--extra-index-url which leaves PyPI as the primary index and can pull non-CUDA
wheels; update the torch resolution in torch_classify(), torch_uv_flags(), and
build_wheel to pin CUDA indexes as the primary --index-url: for the stable
probe/branch use --index-url "$TORCH_STABLE_INDEX" (remove index-strategy), for
nightly use --index-url "$TORCH_NIGHTLY_INDEX" and keep --extra-index-url
"$TORCH_STABLE_INDEX" as a fallback, and split the build-deps vs torch install
steps so build deps are installed from PyPI without torch index flags while
torch itself is installed with the CUDA-pinned --index-url. Apply these changes
to the uv pip install invocations in torch_classify(), torch_uv_flags(), and
build_wheel.

In `@examples/device/ep/nixl_ep/__init__.py`:
- Around line 26-30: Detect the torch minor version string currently built into
_torch_mm and validate it against an explicit supported set before calling
importlib.import_module; if it is not supported, raise an ImportError that
includes the full detected torch.__version__ and a list of supported minor
versions. Specifically, before calling
importlib.import_module(f".nixl_ep_cpp_torch{_torch_mm}", __package__), check
_torch_mm against the allowed values (e.g., a tuple/list you define), and if
absent raise ImportError with a clear message naming torch.__version__ and the
supported minors; otherwise proceed to import and set
sys.modules[f"{__package__}.nixl_ep_cpp"] = _nixl_ep_cpp as before.

In `@src/bindings/python/nixl-meta/nixl_ep/__init__.py`:
- Around line 44-50: The loop that tries backends using
importlib.import_module(mod_name) currently only catches ModuleNotFoundError, so
import-time failures like ImportError or OSError will propagate and stop
fallback; change the except to catch (ModuleNotFoundError, ImportError, OSError)
as e, and keep the existing special-case check for ModuleNotFoundError (if
e.name != mod_name: raise) but for ImportError/OSError simply continue to the
next mod_name; ensure the successful path still returns
importlib.import_module(mod_name).__name__ inside the try block.
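Below is a sketch of the fallback behaviour requested in the last inline comment. It assumes the cu13 → cu12 probe order that the PR describes for CPU-only torch. `load_first_backend` and its injectable `importer` are hypothetical names. The key detail is the order of the `except` clauses. `ModuleNotFoundError` is a subclass of `ImportError`, so it must be handled first for the `e.name != mod_name` check to keep working.

```python
import importlib
import types
from typing import Callable, Sequence

# Probe order used when torch reports no CUDA version (from the PR description).
PROBE_ORDER: Sequence[str] = ("nixl_ep_cu13", "nixl_ep_cu12")


def load_first_backend(
    names: Sequence[str] = PROBE_ORDER,
    importer: Callable[[str], types.ModuleType] = importlib.import_module,
) -> types.ModuleType:
    """Hypothetical sketch: return the first backend module that imports cleanly."""
    errors = []
    for mod_name in names:
        try:
            return importer(mod_name)
        except ModuleNotFoundError as e:
            # A missing dependency *inside* an installed backend is a real bug:
            # re-raise instead of silently moving on to the next backend.
            if e.name != mod_name:
                raise
            errors.append(f"{mod_name}: not installed")
        except (ImportError, OSError) as e:
            # Installed but not loadable. A C extension whose shared library is
            # missing raises ImportError (e.g. "libcudart.so.12: cannot open
            # shared object file"); OSError covers ctypes-style preloads.
            errors.append(f"{mod_name}: {e}")
    raise ImportError("no usable nixl_ep backend: " + "; ".join(errors))
```

Collecting the per-backend reasons means that, when every candidate fails, the final `ImportError` explains each failure instead of reporting only the last one.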


ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 77bcd126-f819-48ff-ae85-b8b81fc126ca

📥 Commits

Reviewing files that changed from the base of the PR and between 71e6817 and 6c5a83c.

📒 Files selected for processing (11)

contrib/Dockerfile.manylinux
contrib/build-wheel.sh
contrib/wheel_merge.py
examples/device/ep/meson.build
examples/device/ep/nixl_ep/__init__.py
pyproject.toml
src/bindings/python/nixl-meta/meson.build
src/bindings/python/nixl-meta/nixl/meson.build
src/bindings/python/nixl-meta/nixl_ep/__init__.py
src/bindings/python/nixl-meta/nixl_ep/meson.build
src/bindings/python/nixl-meta/pyproject.toml.in
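This review's `examples/device/ep/nixl_ep/__init__.py` comment asks for an explicit check on `_torch_mm` before the extension is imported. Below is a minimal sketch of that check. Only the `.nixl_ep_cpp_torch{_torch_mm}` module naming comes from the comment. The supported set (`SUPPORTED_TORCH_MM`), the helper names, and the `importer` parameter are all hypothetical. `importer` is injected so the logic can be exercised without the compiled extensions.

```python
import importlib
import types
from typing import Callable, Sequence

# Hypothetical: ABI suffixes compiled into the backend wheel (the PR's test
# matrix covers torch 2.11 and 2.12). Assumes a single-digit major version.
SUPPORTED_TORCH_MM: Sequence[str] = ("211", "212")


def torch_mm(torch_version: str) -> str:
    """'2.12.0+cu130' -> '212' (major and minor concatenated)."""
    major, minor = torch_version.split("+")[0].split(".")[:2]
    return f"{major}{minor}"


def load_abi_extension(
    torch_version: str,
    package: str,
    importer: Callable[..., types.ModuleType] = importlib.import_module,
) -> types.ModuleType:
    mm = torch_mm(torch_version)
    if mm not in SUPPORTED_TORCH_MM:
        # Fail with the full version and the supported list, instead of a bare
        # "No module named ...nixl_ep_cpp_torch213".
        supported = ", ".join(f"{s[0]}.{s[1:]}" for s in SUPPORTED_TORCH_MM)
        raise ImportError(
            f"{package}: torch {torch_version} is not supported; "
            f"this wheel ships extensions for torch {supported}"
        )
    # Relative import resolved against the backend package, as in the original.
    return importer(f".nixl_ep_cpp_torch{mm}", package)
```

Inside the package's `__init__.py`, this would be called as `load_abi_extension(torch.__version__, __package__)`. The existing `sys.modules[f"{__package__}.nixl_ep_cpp"]` alias would then be set as before.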

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/bindings/python/nixl-meta/nixl_ep/__init__.py`:
- Around line 1-17: There are two top-level packages named nixl_ep (this
__init__.py and examples/device/ep/nixl_ep) causing mypy's "Duplicate module"
error; fix by either renaming this package (e.g., to nixl_ep_meta) and updating
any imports referencing nixl_ep, or keep the name and disambiguate CI by
excluding the example package from type-checking (add an exclude or files/paths
entry in mypy.ini to ignore examples/device/ep/nixl_ep); locate the module in
__init__.py (nixl_ep) and the example package path (examples/device/ep/nixl_ep)
when applying the change.


ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: ead6418f-f2b7-4879-8526-9378403ce77e

📥 Commits

Reviewing files that changed from the base of the PR and between 6c5a83c and edda9d1.

📒 Files selected for processing (3)

src/bindings/python/nixl-meta/nixl/meson.build
src/bindings/python/nixl-meta/nixl_ep/__init__.py
src/bindings/python/nixl-meta/nixl_ep/meson.build


dpressle · 2026-05-18T08:41:04Z

👀 Investigating Run Pre-Commit Hooks…

dpressle · 2026-05-18T08:41:41Z

🤖 CI Triage Agent — Run Pre-Commit Hooks · commit 8d59ad79

I have everything I need. The diagnosis is complete and unambiguous.

Summary: codespell pre-commit hook failed on contrib/build-wheel.sh due to a typo — doesnt instead of doesn't — introduced by PR #1646 at line 230.

Root cause: Commit 8d59ad7937feaa673c59dc82d9b5b3dc01c06c8d (branch pull-request/1646, PR "NIXL EP wheels: Add support for multiple CUDA and PyTorch versions") added new content to contrib/build-wheel.sh. At line 230 of the modified file, the word doesnt was written without an apostrophe. The codespell hook caught this with exit code 65:

contrib/build-wheel.sh:230: doesnt ==> doesn't, does not

All other hooks passed or were skipped: mypy, isort, black and flake8 were skipped because no Python files were touched, and the case-conflict, shebang, merge-conflict, line-ending and trailing-whitespace checks all passed. Only codespell failed.

Implicated commit: 8d59ad7937feaa673c59dc82d9b5b3dc01c06c8d — PR #1646 author (branch pull-request/1646)

File: contrib/build-wheel.sh:230

Suggested fix: On line 230 of contrib/build-wheel.sh, replace doesnt with doesn't (or does not). The fix is a one-character change — add the apostrophe:

-  # This doesnt require CUDA 13 ...
+  # This doesn't require CUDA 13 ...

After applying the fix, re-run pre-commit run --files contrib/build-wheel.sh locally to verify before pushing.

Related: PR #1646 — NIXL EP wheels: Add support for multiple CUDA and PyTorch versions

coderabbitai

Actionable comments posted: 2

♻️ Duplicate comments (1)

contrib/build-wheel.sh (1)

153-158: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Pin the nightly probe to the nightly CUDA index.

This branch still leaves PyPI as the primary index by using only --extra-index-url. That means torch_classify() can mark a version as nightly based on a non-CUDA/PyPI match, then install_torch() fails later because it installs from the nightly CUDA index only.

Suggested fix

-        elif uv pip install --dry-run --pre \
-            --python "$PROBE/bin/python" \
-            --extra-index-url "$TORCH_STABLE_INDEX" \
-            --extra-index-url "$TORCH_NIGHTLY_INDEX" \
-            --index-strategy unsafe-best-match \
+        elif uv pip install --dry-run --pre \
+            --python "$PROBE/bin/python" \
+            --index-url "$TORCH_NIGHTLY_INDEX" \
+            --extra-index-url "$TORCH_STABLE_INDEX" \
             "torch==${VER}.*" >/dev/null 2>&1; then
             CLASS="nightly"

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@contrib/build-wheel.sh` around lines 153 - 158, The nightly probe currently
leaves PyPI as the primary index by using --extra-index-url for
$TORCH_NIGHTLY_INDEX, which can misclassify a version as nightly; update the pip
dry-run probe in build-wheel.sh (the `elif` branch that runs `pip install
--dry-run ... "torch==${VER}.*"`) to use --index-url "$TORCH_NIGHTLY_INDEX"
(make the nightly CUDA index the primary index) instead of --extra-index-url for
the nightly index so the probe checks the correct source; this change ensures
`torch_classify()` matches the same index that `install_torch()` will use.
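Below is a sketch of the probe command with the CUDA index pinned. It is written in Python only to make the argument list explicit; the script itself is bash. Both index URLs are assumptions. The stable cu130 URL appears in the smoke-test log, and the nightly one follows PyTorch's usual `/whl/nightly/<cu-tag>` layout.

One caveat is worth checking against the uv documentation. In uv, indexes passed via `--extra-index-url` take priority over `--index-url`, and the default `first-index` strategy stops at the first index that has the package. Keeping the stable index as an extra for the nightly probe, as suggested above, could therefore make uv consult the stable index first. For that reason this sketch gives each probe a single index.

```python
from typing import List

# Assumed values: the stable URL is the one used in the smoke-test log; the
# nightly URL follows PyTorch's /whl/nightly/<cu-tag> layout.
TORCH_STABLE_INDEX = "https://download.pytorch.org/whl/cu130"
TORCH_NIGHTLY_INDEX = "https://download.pytorch.org/whl/nightly/cu130"


def torch_probe_cmd(python: str, version: str, nightly: bool) -> List[str]:
    """Hypothetical helper: argv for a `uv pip install --dry-run` torch probe.

    Each probe names exactly one index, the same one the later install step
    uses, so a version is only classified where it can also be installed.
    """
    index = TORCH_NIGHTLY_INDEX if nightly else TORCH_STABLE_INDEX
    cmd = ["uv", "pip", "install", "--dry-run", "--python", python,
           "--index-url", index]
    if nightly:
        # Nightly torch builds are dev (pre-release) versions.
        cmd.append("--pre")
    cmd.append(f"torch=={version}.*")
    return cmd
```

The probe would then run as `subprocess.run(torch_probe_cmd(...), capture_output=True).returncode == 0`. The install step would reuse the same index arguments, so classification and installation cannot disagree.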

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@contrib/build-wheel.sh`:
- Around line 229-230: Fix the typo in the comment string that currently reads
"doesnt" to include the missing apostrophe ("doesn't") so codespell no longer
fails; update the comment in the contrib/build-wheel.sh diff block (the comment
beginning "torch + nvidia-* in each venv is several GB; tear down so the docker
layer doesnt blow up across the (python, torch) matrix.") to use "doesn't".
- Around line 71-75: The new CLI flag --torch-versions is parsed into
TORCH_VERSIONS in the case block but not documented in the script's help/usage
output; update the usage/help text (the function or block that prints the CLI
usage) to include a short description of --torch-versions and its expected
value, matching the style of the other options so users see the flag when
running --help; ensure the option name and variable TORCH_VERSIONS are
referenced so the docs stay consistent with the parser.

---

Duplicate comments:
In `@contrib/build-wheel.sh`:
- Around line 153-158: The nightly probe currently leaves PyPI as the primary
index by using --extra-index-url for $TORCH_NIGHTLY_INDEX, which can misclassify
a version as nightly; update the pip dry-run probe in build-wheel.sh (the `elif`
branch that runs `pip install --dry-run ... "torch==${VER}.*"`) to use
--index-url "$TORCH_NIGHTLY_INDEX" (make the nightly CUDA index the primary
index) instead of --extra-index-url for the nightly index so the probe checks
the correct source; this change ensures `torch_classify()` matches the same
index that `install_torch()` will use.


ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 44cbbc54-f86e-4ccc-b5d2-984f558c06d1

📥 Commits

Reviewing files that changed from the base of the PR and between edda9d1 and 8d59ad7.

📒 Files selected for processing (1)

contrib/build-wheel.sh

Signed-off-by: Ovidiu Mara <ovidium@nvidia.com>

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@contrib/build-wheel.sh`:
- Line 272: The assignment BASE_WHL=$(ls "$TMP_DIR"/dist/*.whl) is fragile
because it parses ls output and can yield multiple newline-separated names;
replace it with glob expansion and explicit handling of matches (e.g., use an
array like files=( "$TMP_DIR"/dist/*.whl ) or set -- "$TMP_DIR"/dist/*.whl, then
validate count and pick the intended entry) and update any downstream uses
(e.g., cp "$BASE_WHL") to use the chosen single path; modify the same pattern at
the other occurrence around the earlier noted line (the second glob usage) so
both BASE_WHL and the related variable(s) come from safe shell globbing and
include a check for zero or multiple matches.
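Below is the safe-glob pattern requested here, sketched in Python for clarity. `single_wheel` reuses the name of the bash helper mentioned in a later comment. Its body here is an assumption, not the script's implementation.

```python
import glob
import os


def single_wheel(dist_dir: str) -> str:
    """Return the one *.whl in dist_dir; fail loudly on zero or multiple matches.

    Hypothetical Python analogue of the bash helper: globbing instead of
    parsing `ls` output, plus an explicit count check.
    """
    matches = sorted(glob.glob(os.path.join(dist_dir, "*.whl")))
    if len(matches) != 1:
        raise RuntimeError(
            f"expected exactly one wheel in {dist_dir}, found {len(matches)}: {matches}"
        )
    return matches[0]
```

The same helper fits the final copy step flagged later in the review: resolve the single wheel first, then copy that one path to the output directory.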


ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: bc6364c8-24a7-4d99-8f74-de7d541ca4a2

📥 Commits

Reviewing files that changed from the base of the PR and between 8d59ad7 and 83264e8.

📒 Files selected for processing (1)

contrib/build-wheel.sh

Signed-off-by: Ovidiu Mara <ovidium@nvidia.com>

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@contrib/build-wheel.sh`:
- Around line 239-240: The shell commands calling auditwheel repair and
./contrib/wheel_add_ucx_plugins.py should quote directory/platform variables to
prevent word-splitting: update the auditwheel repair invocation to quote
"$WHL_PLATFORM" and update the wheel_add_ucx_plugins.py call to quote
"$UCX_PLUGINS_DIR", "$NIXL_PLUGINS_DIR" (keep $AUDITWHEEL_EXCLUDES unquoted to
allow splitting) so the variables are treated as single arguments even when they
contain spaces.
- Around line 312-314: The final copy step uses a raw glob to copy wheels (cp
"$TMP_DIR"/dist/*.whl "$OUTPUT_DIR") instead of validating/normalizing the
output like the multi-torch path; replace that raw copy with a call to the
existing helper single_wheel(...) to pick/validate the built wheel and then move
it to OUTPUT_DIR. Locate the block that calls build_wheel and repair_wheel
(functions build_wheel and repair_wheel) and change the final cp step to use
single_wheel pointing at "$TMP_DIR/dist" (and ensure it writes or moves the
selected wheel into OUTPUT_DIR), preserving TMP_DIR and OUTPUT_DIR variables.


ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 745e2b49-c6b0-474a-8df7-5d7c52d4ad21

📥 Commits

Reviewing files that changed from the base of the PR and between 83264e8 and 3d10ba5.

📒 Files selected for processing (1)

contrib/build-wheel.sh

Signed-off-by: Ovidiu Mara <ovidium@nvidia.com>

ovidiusm · 2026-05-18T09:49:02Z

/build

guy-ealey-morag

We discussed some points to improve in later PRs, but other than that LGTM

Signed-off-by: Ovidiu Mara <ovidium@nvidia.com>

ovidiusm added 8 commits May 15, 2026 02:58

NIXL EP: suffixed builds for CUDA versions and PyTorch versions

8fcb247

Signed-off-by: Ovidiu Mara <ovidium@nvidia.com>

Fix torch version pin

8fc3d56

Signed-off-by: Ovidiu Mara <ovidium@nvidia.com>

Use extra index correctly

da2f6df

Signed-off-by: Ovidiu Mara <ovidium@nvidia.com>

Add missing repair wheel step

771346a

Signed-off-by: Ovidiu Mara <ovidium@nvidia.com>

Refactor to create venv in a controlled way

a79b000

Signed-off-by: Ovidiu Mara <ovidium@nvidia.com>

Clean up diff

37509dd

Signed-off-by: Ovidiu Mara <ovidium@nvidia.com>

Rename flags variable

4d380cd

Signed-off-by: Ovidiu Mara <ovidium@nvidia.com>

Remove torch 2.13 build, it fails

b299774

Signed-off-by: Ovidiu Mara <ovidium@nvidia.com>

pull-request-size Bot added the size/L label May 15, 2026

copy-pr-bot Bot temporarily deployed to SWX_AWS May 15, 2026 10:39 Inactive

github-actions Bot added the external-contribution label May 15, 2026

Fix mypy CI check

3fec58f

Signed-off-by: Ovidiu Mara <ovidium@nvidia.com>

copy-pr-bot Bot temporarily deployed to SWX_AWS May 15, 2026 10:45 Inactive

Revert C++ 20 flags, out of scope and unsafe to build only parts of the code

3896d93

Signed-off-by: Ovidiu Mara <ovidium@nvidia.com>

copy-pr-bot Bot temporarily deployed to SWX_AWS May 15, 2026 10:49 Inactive

Fix import of binary bindings

3511722

Signed-off-by: Ovidiu Mara <ovidium@nvidia.com>

copy-pr-bot Bot temporarily deployed to SWX_AWS May 15, 2026 12:10 Inactive

ovidiusm added 4 commits May 15, 2026 15:55

Refactor meson build scripts to track deps correctly

96b3b7e

Signed-off-by: Ovidiu Mara <ovidium@nvidia.com>

Rename variable for clarity

90e282c

Signed-off-by: Ovidiu Mara <ovidium@nvidia.com>

Refactor wheel merge from bash into Python

b30995a

Signed-off-by: Ovidiu Mara <ovidium@nvidia.com>

Refactor bash script to simplify

758c025

Signed-off-by: Ovidiu Mara <ovidium@nvidia.com>

pull-request-size Bot added size/XL and removed size/L labels May 15, 2026

copy-pr-bot Bot temporarily deployed to SWX_AWS May 15, 2026 14:13 Inactive

Format python code

deb5840

Signed-off-by: Ovidiu Mara <ovidium@nvidia.com>

copy-pr-bot Bot temporarily deployed to SWX_AWS May 18, 2026 07:43 Inactive

Rename function _load_ep_backend to _load_ep_module

edf4b94

Signed-off-by: ovidiusm <ovidium@nvidia.com>

copy-pr-bot Bot had a problem deploying to SWX_AWS May 18, 2026 07:44 Failure

Merge branch 'main' into nixl-ep-wheel-dispatch

edda9d1

copy-pr-bot Bot had a problem deploying to SWX_AWS May 18, 2026 07:46 Failure

coderabbitai Bot reviewed May 18, 2026


Comment thread contrib/build-wheel.sh

Comment thread examples/device/ep/nixl_ep/__init__.py

Comment thread src/bindings/python/nixl-meta/nixl_ep/__init__.py

coderabbitai Bot reviewed May 18, 2026


Comment thread src/bindings/python/nixl-meta/nixl_ep/__init__.py

Update build script to address torch installation issue and simplify comments

8d59ad7

Signed-off-by: Ovidiu Mara <ovidium@nvidia.com>

copy-pr-bot Bot temporarily deployed to SWX_AWS May 18, 2026 08:40 Inactive

coderabbitai Bot reviewed May 18, 2026


Comment thread contrib/build-wheel.sh

Comment thread contrib/build-wheel.sh Outdated

Fix help and comment

83264e8

Signed-off-by: Ovidiu Mara <ovidium@nvidia.com>

copy-pr-bot Bot temporarily deployed to SWX_AWS May 18, 2026 08:47 Inactive

coderabbitai Bot reviewed May 18, 2026


Comment thread contrib/build-wheel.sh Outdated

Address comment

3d10ba5

Signed-off-by: Ovidiu Mara <ovidium@nvidia.com>

copy-pr-bot Bot temporarily deployed to SWX_AWS May 18, 2026 08:58 Inactive

coderabbitai Bot reviewed May 18, 2026


Comment thread contrib/build-wheel.sh Outdated

Comment thread contrib/build-wheel.sh Outdated

Refactoring

a8f6bdb

Signed-off-by: Ovidiu Mara <ovidium@nvidia.com>

copy-pr-bot Bot temporarily deployed to SWX_AWS May 18, 2026 09:18 Inactive

guy-ealey-morag previously approved these changes May 20, 2026


Address comments

ca86eb7

Signed-off-by: Ovidiu Mara <ovidium@nvidia.com>

ovidiusm dismissed guy-ealey-morag’s stale review via ca86eb7 May 21, 2026 14:11

copy-pr-bot Bot temporarily deployed to SWX_AWS May 21, 2026 14:11 Inactive

ovidiusm marked this pull request as draft May 21, 2026 14:12

alec-flowers mentioned this pull request Jun 1, 2026

[Bugfix][CI] Fix ImportError: libcudart.so.12: cannot open shared object file: No such file or directory vllm-project/vllm#44192

Open

ofirfarjun7 mentioned this pull request Jun 3, 2026

NIXL/EP/WHEEL: pack cu in diff namespace #1727

Merged

ovidiusm closed this Jun 5, 2026


5 participants
