Support nccl fp8 communication #32760
robertgshaw2-redhat merged 1 commit into vllm-project:naive-pf-separation
Conversation
Signed-off-by: Amir Klein <203507526+amirkl94@users.noreply.github.com>
Code Review
This pull request introduces support for FP8 communication in NCCL and updates the fused MoE layers accordingly. The changes in pynccl_wrapper.py correctly extend the NCCL data type enum and its from_torch mapping to cover torch.float8_e4m3fn. The modifications in prepare_finalize.py remove an int8_view workaround, indicating improved native FP8 handling. However, a TODO comment in shared_fused_moe.py flags uncertainty about a condition controlling shared expert overlap, which should be addressed.
(self.enable_eplb and backend != "allgather_reducescatter")
or (self.moe_config.use_flashinfer_cutlass_kernels and self.dp_size > 1)
# TODO: Is this correct?
or self.moe_parallel_config.use_fi_all2allv_kernels
The "TODO: Is this correct?" comment indicates uncertainty about the logic for disabling shared expert overlap when use_fi_all2allv_kernels is true. If this condition is wrong, shared expert overlap could be disabled improperly, potentially impacting performance or correctness. Please verify the logic and either replace the TODO with a clarifying comment or correct the condition.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
(self.enable_eplb and backend != "allgather_reducescatter")
or (self.moe_config.use_flashinfer_cutlass_kernels and self.dp_size > 1)
# TODO: Is this correct?
or self.moe_parallel_config.use_fi_all2allv_kernels
Removed DP check changes overlap disabling behavior
Medium Severity
The condition for disabling shared expert overlap changed from checking self.moe_config.use_flashinfer_cutlass_kernels and self.dp_size > 1 to just self.moe_parallel_config.use_fi_all2allv_kernels. The original comment indicated overlap was only disabled "with DP, since there is nothing to gain." Removing the dp_size > 1 check means overlap is now disabled even when dp_size == 1, which may unnecessarily reduce performance. The TODO comment "Is this correct?" signals the author's own uncertainty about this change.
Hi @amirkl94, the pre-commit checks have failed. Please run:

uv pip install pre-commit
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.
thank you!
Merged commit c3ee917 into vllm-project:naive-pf-separation
Purpose
Porting change from #32677