
[CI][ROCm] Ship RIXL with vllm/vllm-openai-rocm#41634

Merged
tjtanaa merged 5 commits into vllm-project:main from simondanielsson:ci/rixl-final-image
May 8, 2026

Conversation

@simondanielsson
Contributor

@simondanielsson simondanielsson commented May 4, 2026

Purpose

Fixes #41637.

RIXL is not readily available in the official vLLM ROCm image:

$ docker run --rm --entrypoint python3  vllm/vllm-openai-rocm:v0.20.1 -c "from rixl._api import nixl_agent; print('RIXL OK')"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ModuleNotFoundError: No module named 'rixl'

Unlike NIXL, there are no pre-built wheels for RIXL yet. As a result, the NixlConnector for P/D disaggregation cannot be used on AMD platforms without a manual install from source, which limits reproducibility and productivity. My suggestion is to follow the current vLLM documentation and ship RIXL with the ROCm image, at least until RIXL wheels are readily available.

RIXL is already installed in the test stage of the image, but not in the final stage. This PR installs it in the final stage as well.
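
A minimal sketch of what the final-stage install can look like, assuming the RIXL wheel is built in an earlier stage of the multi-stage Dockerfile (the stage name build_rixl and the wheel path are illustrative, not the exact merged diff):

# Final image stage (stage and path names are hypothetical)
ARG BASE_IMAGE=rocm/vllm-dev:base
FROM ${BASE_IMAGE} AS vllm-openai-rocm
# Copy the RIXL wheel produced in the earlier build stage and install it,
# so `from rixl._api import nixl_agent` works out of the box.
COPY --from=build_rixl /workspace/rixl/dist/*.whl /tmp/rixl-wheels/
RUN python3 -m pip install --no-cache-dir /tmp/rixl-wheels/*.whl \
    && rm -rf /tmp/rixl-wheels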

Note: this aligns the behavior with the NVIDIA stack:

$ docker run --rm --entrypoint python3  vllm/vllm-openai:v0.20.1 -c "from nixl._api import nixl_agent; print('NIXL OK')"
NIXL OK

Test Plan

Tested on an 8xMI300X node with Thor2 NICs.

1. Build


docker build \
  -f docker/Dockerfile.rocm \
  --build-arg BASE_IMAGE=rocm/vllm-dev:base \
  -t vllm/vllm-openai-rocm:local \
  .

Depending on your platform you might need additional RDMA user-space libraries. The following is for Thor2:

# docker/Dockerfile.rocm_dev
ARG BASE_IMAGE=vllm/vllm-openai-rocm:local
FROM ${BASE_IMAGE}

RUN apt-get update -q -y && apt-get install -q -y \
        autoconf \
        libtool \
        unzip \
        wget \
    && rm -rf /var/lib/apt/lists/*

# Thor2 (Broadcom BCM5760x) RDMA user-space driver (libbnxt_re).
# The inbox libbnxt_re-rdmav*.so shipped by libibverbs is renamed so the
# vendor build takes precedence via libibverbs provider discovery.
RUN wget -q \
        https://docs.broadcom.com/docs-and-downloads/ethernet-network-adapters/NXE/Thor2/GCA1/bcm5760x_230.2.52.0a.zip \
    && unzip -q bcm5760x_230.2.52.0a.zip \
    && cd bcm5760x_230.2.52.0a/drivers_linux/bnxt_rocelib/ \
    && tar -xf "$(find . -name 'libbnxt*.tar.gz' | head -n 1)" \
    && cd "$(find . -maxdepth 1 -type d -name 'libbnxt*' ! -name '*.tar.gz' | head -n 1)" \
    && sh autogen.sh \
    && ./configure \
    && make \
    && find /usr/lib64/ /usr/lib -name "libbnxt_re-rdmav*.so" \
         -exec mv {} {}.inbox \; 2>/dev/null || true \
    && make install all \
    && echo /usr/local/lib >> /etc/ld.so.conf \
    && ldconfig \
    && cp -f bnxt_re.driver /etc/libibverbs.d/ \
    && cd / \
    && rm -rf /bcm5760x_230.2.52.0a /bcm5760x_230.2.52.0a.zip

which we can build with

docker build \
    -f docker/Dockerfile.rocm_dev \
    -t vllm/vllm-openai-rocm:local-rixl \
    .
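
Optionally, you can sanity-check that the vendor RDMA provider is discovered inside the container, e.g. with ibv_devinfo (a hypothetical check; it assumes the ibverbs-utils tooling is available in the image):

docker run --rm \
  --device /dev/infiniband \
  --entrypoint ibv_devinfo \
  vllm/vllm-openai-rocm:local-rixl

Seeing the bnxt_re devices listed confirms that libibverbs picked up the vendor driver.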

2. Test

  1. RIXL importable:
docker run --rm --entrypoint python3  vllm/vllm-openai-rocm:local -c "from rixl._api import nixl_agent; print('RIXL OK')"
  2. vLLM starts with NixlConnector without errors:
docker run \
  --rm \
  --name rixl-prefill \
  --init --network host --ipc host --privileged \
  --cap-add SYS_PTRACE --security-opt seccomp=unconfined \
  --ulimit memlock=-1 --ulimit stack=67108864 \
  --shm-size 256G \
  --group-add video --group-add render \
  --device /dev/kfd --device /dev/dri --device /dev/infiniband \
  -v /sys:/sys \
  -v "${HOME}/.cache/huggingface:/root/.cache/huggingface" \
  -e HF_HOME=/root/.cache/huggingface \
  -e HF_HUB_ENABLE_HF_TRANSFER=0 \
  -e NCCL_MIN_NCHANNELS=112 \
  -e VLLM_ENGINE_READY_TIMEOUT_S=3600 \
  -e VLLM_ROCM_USE_AITER=1 \
  -e VLLM_ROCM_USE_AITER_PAGED_ATTN=0 \
  -e VLLM_ROCM_USE_AITER_RMSNORM=1 \
  -e NCCL_SOCKET_IFNAME=ens51np0 \
  vllm/vllm-openai-rocm:local-rixl \
  deepseek-ai/DeepSeek-R1-0528 \
    --port 8100 \
    --load-format dummy \
    --tensor-parallel-size 8 \
    --kv-cache-dtype fp8 \
    --gpu-memory-utilization 0.7 \
    --max-model-len 16384 \
    --trust-remote-code \
    --block-size 1 \
    --enforce-eager \
    --kv-transfer-config '{
      "kv_connector": "NixlConnector",
      "kv_role": "kv_producer"
    }'
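
For a full 1P-1D test, one would additionally launch a matching decode instance with the consumer role. A minimal sketch mirroring the producer flags above (the container name, port, and trimmed-down set of flags are illustrative; on a single node you would lower --gpu-memory-utilization or partition the GPUs so both instances fit):

docker run \
  --rm \
  --name rixl-decode \
  --init --network host --ipc host --privileged \
  --group-add video --group-add render \
  --device /dev/kfd --device /dev/dri --device /dev/infiniband \
  -v "${HOME}/.cache/huggingface:/root/.cache/huggingface" \
  vllm/vllm-openai-rocm:local-rixl \
  deepseek-ai/DeepSeek-R1-0528 \
    --port 8200 \
    --load-format dummy \
    --tensor-parallel-size 8 \
    --kv-cache-dtype fp8 \
    --gpu-memory-utilization 0.7 \
    --max-model-len 16384 \
    --trust-remote-code \
    --block-size 1 \
    --enforce-eager \
    --kv-transfer-config '{
      "kv_connector": "NixlConnector",
      "kv_role": "kv_consumer"
    }'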

Test Result

  1. Importable:
$ docker run --rm --entrypoint python3  vllm/vllm-openai-rocm:local -c "from rixl._api import nixl_agent; print('RIXL OK')"
RIXL OK
  2. vLLM runs with the NixlConnector:
...
(EngineCore pid=621) INFO 05-04 15:43:37 [core.py:306] init engine (profile, create kv cache, warmup model) took 23.54 s
(EngineCore pid=621) INFO 05-04 15:43:40 [factory.py:64] Creating v1 connector with name: NixlConnector and engine_id: 4babd70a-813e-475e-b651-85c47f14298a
(EngineCore pid=621) WARNING 05-04 15:43:40 [base.py:189] Initializing KVConnectorBase_V1. This API is experimental and subject to change in the future as we iterate the design.
(EngineCore pid=621) INFO 05-04 15:43:40 [scheduler.py:87] Initializing NIXL Scheduler 4babd70a-813e-475e-b651-85c47f14298a
(EngineCore pid=621) INFO 05-04 15:43:40 [scheduler.py:89] Hybrid Memory Allocator is enabled with NIXL
(EngineCore pid=621) INFO 05-04 15:43:42 [vllm.py:844] Asynchronous scheduling is enabled.
(EngineCore pid=621) WARNING 05-04 15:43:42 [vllm.py:900] Enforce eager set, disabling torch.compile and CUDAGraphs. This is equivalent to setting -cc.mode=none -cc.cudagraph_mode=none
(EngineCore pid=621) WARNING 05-04 15:43:42 [vllm.py:918] Inductor compilation was disabled by user settings, optimizations settings that are only active during inductor compilation will be ignored.
(EngineCore pid=621) INFO 05-04 15:43:42 [kernel.py:210] Final IR op priority after setting platform defaults: IrOpPriorityConfig(rms_norm=['vllm_c', 'native'], fused_add_rms_norm=['vllm_c', 'native'])
(EngineCore pid=621) INFO 05-04 15:43:42 [vllm.py:1093] Cudagraph is disabled under eager mode
(EngineCore pid=621) INFO 05-04 15:43:42 [compilation.py:303] Enabled custom fusions: norm_quant, act_quant, allreduce_rms
(APIServer pid=7) INFO 05-04 15:43:42 [api_server.py:598] Supported tasks: ['generate']
(APIServer pid=7) WARNING 05-04 15:43:42 [model.py:1449] Default vLLM sampling parameters have been overridden by the model's `generation_config.json`: `{'temperature': 0.6, 'top_p': 0.95}`. If this is not intended, please relaunch vLLM instance with `--generation-config vllm`.
(APIServer pid=7) INFO 05-04 15:43:45 [hf.py:482] Detected the chat template content format to be 'string'. You can set `--chat-template-content-format` to override this.
(APIServer pid=7) INFO 05-04 15:43:45 [api_server.py:602] Starting vLLM server on http://0.0.0.0:8100
(APIServer pid=7) INFO 05-04 15:43:45 [launcher.py:37] Available routes are:
(APIServer pid=7) INFO 05-04 15:43:45 [launcher.py:46] Route: /openapi.json, Methods: GET, HEAD
(APIServer pid=7) INFO 05-04 15:43:45 [launcher.py:46] Route: /docs, Methods: GET, HEAD
(APIServer pid=7) INFO 05-04 15:43:45 [launcher.py:46] Route: /docs/oauth2-redirect, Methods: GET, HEAD
(APIServer pid=7) INFO 05-04 15:43:45 [launcher.py:46] Route: /redoc, Methods: GET, HEAD
(APIServer pid=7) INFO 05-04 15:43:45 [launcher.py:46] Route: /tokenize, Methods: POST
(APIServer pid=7) INFO 05-04 15:43:45 [launcher.py:46] Route: /detokenize, Methods: POST
(APIServer pid=7) INFO 05-04 15:43:45 [launcher.py:46] Route: /load, Methods: GET
(APIServer pid=7) INFO 05-04 15:43:45 [launcher.py:46] Route: /version, Methods: GET
(APIServer pid=7) INFO 05-04 15:43:45 [launcher.py:46] Route: /health, Methods: GET
(APIServer pid=7) INFO 05-04 15:43:45 [launcher.py:46] Route: /metrics, Methods: GET
(APIServer pid=7) INFO 05-04 15:43:45 [launcher.py:46] Route: /v1/models, Methods: GET
(APIServer pid=7) INFO 05-04 15:43:45 [launcher.py:46] Route: /ping, Methods: GET
(APIServer pid=7) INFO 05-04 15:43:45 [launcher.py:46] Route: /ping, Methods: POST
(APIServer pid=7) INFO 05-04 15:43:45 [launcher.py:46] Route: /invocations, Methods: POST
(APIServer pid=7) INFO 05-04 15:43:45 [launcher.py:46] Route: /v1/chat/completions, Methods: POST
(APIServer pid=7) INFO 05-04 15:43:45 [launcher.py:46] Route: /v1/chat/completions/batch, Methods: POST
(APIServer pid=7) INFO 05-04 15:43:45 [launcher.py:46] Route: /v1/responses, Methods: POST
(APIServer pid=7) INFO 05-04 15:43:45 [launcher.py:46] Route: /v1/responses/{response_id}, Methods: GET
(APIServer pid=7) INFO 05-04 15:43:45 [launcher.py:46] Route: /v1/responses/{response_id}/cancel, Methods: POST
(APIServer pid=7) INFO 05-04 15:43:45 [launcher.py:46] Route: /v1/completions, Methods: POST
(APIServer pid=7) INFO 05-04 15:43:45 [launcher.py:46] Route: /v1/messages, Methods: POST
(APIServer pid=7) INFO 05-04 15:43:45 [launcher.py:46] Route: /v1/messages/count_tokens, Methods: POST
(APIServer pid=7) INFO 05-04 15:43:45 [launcher.py:46] Route: /inference/v1/generate, Methods: POST
(APIServer pid=7) INFO 05-04 15:43:45 [launcher.py:46] Route: /scale_elastic_ep, Methods: POST
(APIServer pid=7) INFO 05-04 15:43:45 [launcher.py:46] Route: /is_scaling_elastic_ep, Methods: POST
(APIServer pid=7) INFO 05-04 15:43:45 [launcher.py:46] Route: /generative_scoring, Methods: POST
(APIServer pid=7) INFO 05-04 15:43:45 [launcher.py:46] Route: /v1/chat/completions/render, Methods: POST
(APIServer pid=7) INFO 05-04 15:43:45 [launcher.py:46] Route: /v1/completions/render, Methods: POST
(APIServer pid=7) INFO:     Started server process [7]
(APIServer pid=7) INFO:     Waiting for application startup.
(APIServer pid=7) INFO:     Application startup complete.

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.


Signed-off-by: simondanielsson <simon.danielsson99@hotmail.com>
@simondanielsson simondanielsson changed the title from "[CI][ROCm] install RIXL wheel in final stage of Dockerfile.rocm" to "[CI][ROCm] Install RIXL wheel in final stage of Dockerfile.rocm" May 4, 2026
@mergify mergify Bot added ci/build rocm Related to AMD ROCm labels May 4, 2026
@github-project-automation github-project-automation Bot moved this to Todo in AMD May 4, 2026
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request updates the ROCm Dockerfile to install the RIXL wheel and its RDMA runtime dependencies, and sets the HSA_ENABLE_IPC_MODE_LEGACY environment variable to avoid GPU memory pinning issues. Feedback was provided to optimize the system package installation by adding the --no-install-recommends flag, removing an invalid flag from the update command, and reordering the instructions to improve Docker layer caching.
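
For illustration, the suggested apt-get pattern looks roughly like this (the package names stand in for the RDMA runtime dependencies and are not the merged diff):

RUN apt-get update -q \
    && apt-get install -q -y --no-install-recommends \
        libibverbs1 \
        librdmacm1 \
    && rm -rf /var/lib/apt/lists/*

--no-install-recommends keeps the image smaller by skipping recommended-but-unneeded packages, and the -y flag that the review flagged is dropped from apt-get update, where it has no effect.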

Comment thread docker/Dockerfile.rocm Outdated
Signed-off-by: simondanielsson <simon.danielsson99@hotmail.com>
@simondanielsson simondanielsson marked this pull request as ready for review May 4, 2026 14:45

@claude claude Bot left a comment


Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

@simondanielsson simondanielsson changed the title from "[CI][ROCm] Install RIXL wheel in final stage of Dockerfile.rocm" to "[CI][ROCm] Ship RIXL with vllm/vllm-openai-rocm" May 4, 2026
Signed-off-by: simondanielsson <simon.danielsson99@hotmail.com>
@mergify
Contributor

mergify Bot commented May 4, 2026

Documentation preview: https://vllm--41634.org.readthedocs.build/en/41634/

@mergify mergify Bot added the documentation Improvements or additions to documentation label May 4, 2026
Contributor

@divakar-amd divakar-amd left a comment


Added 2 comments. Looks good overall.

Comment thread docker/Dockerfile.rocm Outdated
Comment thread docker/Dockerfile.rocm Outdated
… earlier

Signed-off-by: simondanielsson <simon.danielsson99@hotmail.com>
Contributor

@divakar-amd divakar-amd left a comment


LGTM. Tested the change with a single-node 1P-1D disaggregated setup on mi300.

@functionstackx

@simondanielsson thanks for this PR for having RIXL available out of the box (for models where RIXL is better than mori)

@tjtanaa tjtanaa added the ready ONLY add when PR is ready to merge/full CI is needed label May 8, 2026
Collaborator

@tjtanaa tjtanaa left a comment


LGTM

@tjtanaa tjtanaa enabled auto-merge (squash) May 8, 2026 06:38
@tjtanaa tjtanaa merged commit f9b9bf3 into vllm-project:main May 8, 2026
13 of 14 checks passed
@github-project-automation github-project-automation Bot moved this from Todo to Done in AMD May 8, 2026
@simondanielsson simondanielsson deleted the ci/rixl-final-image branch May 8, 2026 07:19
libinta pushed a commit to libinta/vllm that referenced this pull request May 8, 2026
Signed-off-by: simondanielsson <simon.danielsson99@hotmail.com>
Signed-off-by: Libin Tang <libin.tang@intel.com>

Labels

ci/build
documentation (Improvements or additions to documentation)
ready (ONLY add when PR is ready to merge/full CI is needed)
rocm (Related to AMD ROCm)

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

[Installation]: RIXL not available

4 participants