[Bugfix][ROCm] Include float8_e4m3fnuz in NCCL Dtype Dispatching #33713
vllm-bot merged 3 commits into vllm-project:main from Feb 4, 2026
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
Code Review
This pull request addresses a bug on ROCm platforms where torch.float8_e4m3fnuz was not supported for NCCL dtype dispatching, causing a RuntimeError on MI300/325 hardware. The change correctly adds this dtype to the mapping in vllm/distributed/device_communicators/pynccl_wrapper.py. This ensures both float8_e4m3fn and float8_e4m3fnuz variants map to ncclFloat8e4m3, which is the correct behavior as explained in the pull request description. The fix is targeted, correct, and resolves the issue.
gshtras reviewed on Feb 3, 2026
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
gshtras approved these changes on Feb 3, 2026
gameofdimension pushed a commit to gameofdimension/vllm that referenced this pull request on Feb 5, 2026
…m-project#33713) Signed-off-by: Micah Williamson <micah.williamson@amd.com> Signed-off-by: felix01.yu <felix01.yu@vipshop.com>
ItzDEXX pushed a commit to ItzDEXX/vllm that referenced this pull request on Feb 19, 2026
…m-project#33713) Signed-off-by: Micah Williamson <micah.williamson@amd.com>
tunglinwood pushed a commit to tunglinwood/vllm that referenced this pull request on Mar 4, 2026
…m-project#33713) Signed-off-by: Micah Williamson <micah.williamson@amd.com>
#33030 fixed dtype handling in the Pynccl wrapper but omitted the case for float8_e4m3fnuz, which is used on MI300/325.
Repro command:
When running this on main, I am seeing the following:
With this PR, the server starts as expected. This fixes the Qwen3-30B-A3B-FP8-block Accuracy failure on AMD CI. When running bash .buildkite/scripts/scheduled_integration_test/qwen30b_a3b_fp8_block_ep_eplb.sh 0.8 200 8020 on MI300X, I am now seeing:

RCCL only defines ncclFloat8e4m3 and does not distinguish between the fn and fnuz variants, so both torch dtype variants map to the same NCCL dtype.
https://github.com/ROCm/rocm-systems/blob/0334750b74d14d92102196d9bd435d3ca4fc67ed/projects/rccl/src/nccl.h.in#L468
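For readers unfamiliar with the dispatch, here is a minimal sketch of the idea. It is not the code in vllm/distributed/device_communicators/pynccl_wrapper.py: the names NcclDtype, TORCH_TO_NCCL, and nccl_dtype_from_torch_name are hypothetical, the enum values are placeholders rather than the values from nccl.h, and torch dtypes are represented by their names as strings so the snippet runs without torch installed. The exception type is also an assumption.

```python
from enum import IntEnum


class NcclDtype(IntEnum):
    """Hypothetical stand-in for the NCCL dtype enum.

    The integer values are placeholders for illustration only; the real
    values are defined in nccl.h.
    """

    FLOAT16 = 0
    FLOAT32 = 1
    FLOAT8_E4M3 = 2


# Keys are torch dtype names as plain strings (an assumption made so the
# sketch has no torch dependency).
TORCH_TO_NCCL = {
    "float16": NcclDtype.FLOAT16,
    "float32": NcclDtype.FLOAT32,
    # RCCL exposes a single e4m3 type, so both variants share one entry.
    "float8_e4m3fn": NcclDtype.FLOAT8_E4M3,
    "float8_e4m3fnuz": NcclDtype.FLOAT8_E4M3,
}


def nccl_dtype_from_torch_name(name: str) -> NcclDtype:
    """Look up the NCCL dtype for a torch dtype name, failing loudly if absent."""
    try:
        return TORCH_TO_NCCL[name]
    except KeyError:
        raise ValueError(f"Unsupported dtype: {name}") from None
```

In this sketch, dropping the float8_e4m3fnuz entry makes the lookup raise for that dtype, which is the kind of gap this PR closes.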