[Bugfix][ROCm] Include float8_e4m3fnuz in NCCL Dtype Dispatching#33713

Merged
vllm-bot merged 3 commits into vllm-project:main from ROCm:micah/nccl-dtype
Feb 4, 2026

Conversation

@micah-wil (Contributor) commented Feb 3, 2026

#33030 fixed dtypes in the Pynccl wrapper, but omitted the case for float8_e4m3fnuz, which is used on MI300/MI325.

Repro command:

vllm serve QWen/Qwen3-30B-A3B-FP8 --enforce-eager --enable-eplb --all2all-backend allgather_reducescatter --eplb-config '{"window_size":10, "step_interval":100, "num_redundant_experts":0, "log_balancedness":true}' --tensor-parallel-size 2 --data-parallel-size 2  --enable-expert-parallel

When running this on main, I am seeing the following:

RuntimeError: Worker failed with error 'Unsupported dtype torch.float8_e4m3fnuz: should be one of int8, uint8, int32, int64, float16, float32, float64, bfloat16, float8e4m3.', please check the stack trace above for the root cause

With this PR, the server starts as expected. This fixes the Qwen3-30B-A3B-FP8-block Accuracy failure on AMD CI. When running bash .buildkite/scripts/scheduled_integration_test/qwen30b_a3b_fp8_block_ep_eplb.sh 0.8 200 8020 on MI300X, I am now seeing:

Evaluating: 100%|████████| 200/200 [00:58<00:00,  3.44it/s]

Results:
Accuracy: 0.915
Invalid responses: 0.000
Total latency: 58.083 s
Questions per second: 3.443
Total output tokens: 21947
Output tokens per second: 377.858

RCCL only defines ncclFloat8e4m3 without distinguishing between the fn and fnuz variants, which is why both torch dtype variants map to the same NCCL dtype.
https://github.com/ROCm/rocm-systems/blob/0334750b74d14d92102196d9bd435d3ca4fc67ed/projects/rccl/src/nccl.h.in#L468

Signed-off-by: Micah Williamson <micah.williamson@amd.com>
@mergify mergify bot added rocm Related to AMD ROCm bug Something isn't working labels Feb 3, 2026
@github-project-automation github-project-automation bot moved this to Todo in AMD Feb 3, 2026
@gemini-code-assist bot (Contributor) left a comment


Code Review

This pull request addresses a bug on ROCm platforms where torch.float8_e4m3fnuz was not supported for NCCL dtype dispatching, causing a RuntimeError on MI300/325 hardware. The change correctly adds this dtype to the mapping in vllm/distributed/device_communicators/pynccl_wrapper.py. This ensures both float8_e4m3fn and float8_e4m3fnuz variants map to ncclFloat8e4m3, which is the correct behavior as explained in the pull request description. The fix is targeted, correct, and resolves the issue.

Signed-off-by: Micah Williamson <micah.williamson@amd.com>
@gshtras gshtras enabled auto-merge (squash) February 3, 2026 22:24
@github-actions github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Feb 3, 2026
@vllm-bot vllm-bot merged commit 1d367a7 into vllm-project:main Feb 4, 2026
48 of 49 checks passed
@github-project-automation github-project-automation bot moved this from Todo to Done in AMD Feb 4, 2026
gameofdimension pushed a commit to gameofdimension/vllm that referenced this pull request Feb 5, 2026
…m-project#33713)

Signed-off-by: Micah Williamson <micah.williamson@amd.com>
Signed-off-by: felix01.yu <felix01.yu@vipshop.com>
ItzDEXX pushed a commit to ItzDEXX/vllm that referenced this pull request Feb 19, 2026
…m-project#33713)

Signed-off-by: Micah Williamson <micah.williamson@amd.com>
tunglinwood pushed a commit to tunglinwood/vllm that referenced this pull request Mar 4, 2026
…m-project#33713)

Signed-off-by: Micah Williamson <micah.williamson@amd.com>

Labels

bug Something isn't working ready ONLY add when PR is ready to merge/full CI is needed rocm Related to AMD ROCm

Projects

Status: Done


3 participants