Support nccl fp8 communication #32760

Merged
robertgshaw2-redhat merged 1 commit into vllm-project:naive-pf-separation from amirkl94:feature/support-fp8-in-nccl-wrapper
Jan 21, 2026

@amirkl94 (Contributor) commented Jan 21, 2026

Purpose

Porting change from #32677

Signed-off-by: Amir Klein <203507526+amirkl94@users.noreply.github.com>

@gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request introduces support for FP8 communication in NCCL and updates the fused MoE layers to accommodate this. The changes in pynccl_wrapper.py correctly extend the NCCL data type enum and its from_torch mapping for torch.float8_e4m3fn. The modifications in prepare_finalize.py remove an int8_view workaround, suggesting improved native FP8 handling. However, a TODO comment in shared_fused_moe.py indicates uncertainty regarding a critical condition for shared expert overlap, which needs to be addressed.
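For readers unfamiliar with the wrapper, the dtype mapping described above can be sketched roughly as follows. This is a minimal illustration, not the code from pynccl_wrapper.py: the class name NcclDataType, the helper from_dtype_name, and the use of dtype name strings in place of torch.dtype objects are all assumptions made to keep the example free of dependencies. The value 10 for FP8 E4M3 is what recent NCCL headers appear to use; confirm it against the nccl.h of the NCCL version in use.

```python
import enum


class NcclDataType(enum.IntEnum):
    """Illustrative subset of NCCL's ncclDataType_t values (hypothetical class)."""

    INT8 = 0
    UINT8 = 1
    INT32 = 2
    INT64 = 4
    FLOAT16 = 6
    FLOAT32 = 7
    FLOAT64 = 8
    BFLOAT16 = 9
    # Assumed value for FP8 E4M3; verify against nccl.h before relying on it.
    FLOAT8_E4M3 = 10


# Keys are dtype names; the real wrapper is described as keying on torch dtypes.
_NAME_TO_NCCL = {
    "int8": NcclDataType.INT8,
    "uint8": NcclDataType.UINT8,
    "int32": NcclDataType.INT32,
    "int64": NcclDataType.INT64,
    "float16": NcclDataType.FLOAT16,
    "float32": NcclDataType.FLOAT32,
    "float64": NcclDataType.FLOAT64,
    "bfloat16": NcclDataType.BFLOAT16,
    "float8_e4m3fn": NcclDataType.FLOAT8_E4M3,  # the new entry
}


def from_dtype_name(name: str) -> NcclDataType:
    """Return the NCCL type for a dtype name, failing loudly on unknown dtypes."""
    try:
        return _NAME_TO_NCCL[name]
    except KeyError:
        raise ValueError(f"Unsupported dtype for NCCL: {name}") from None
```

With a mapping entry like this, an FP8 tensor can be passed to a collective directly instead of being reinterpreted as int8 first, which is consistent with the removal of the int8_view workaround noted in the review.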

(self.enable_eplb and backend != "allgather_reducescatter")
or (self.moe_config.use_flashinfer_cutlass_kernels and self.dp_size > 1)
# TODO: Is this correct?
or self.moe_parallel_config.use_fi_all2allv_kernels

high

The "TODO: Is this correct?" comment indicates uncertainty about the logic for disabling shared expert overlap when use_fi_all2allv_kernels is true. If this condition is incorrect, it could lead to improper disabling of shared expert overlap, potentially impacting performance or correctness. Please verify this logic and either remove the TODO comment with a clarifying explanation or correct the condition if it's found to be wrong.

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

(self.enable_eplb and backend != "allgather_reducescatter")
or (self.moe_config.use_flashinfer_cutlass_kernels and self.dp_size > 1)
# TODO: Is this correct?
or self.moe_parallel_config.use_fi_all2allv_kernels

Removed DP check changes overlap disabling behavior

Medium Severity

The condition for disabling shared expert overlap changed from checking self.moe_config.use_flashinfer_cutlass_kernels and self.dp_size > 1 to just self.moe_parallel_config.use_fi_all2allv_kernels. The original comment indicated overlap was only disabled "with DP, since there nothing to gain." The removal of the dp_size > 1 check means overlap is now disabled even when dp_size == 1, which may unnecessarily reduce performance. The TODO comment "Is this correct?" indicates the author's uncertainty about this change.
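The behavioral difference Bugbot describes can be isolated in a small sketch. It models only the one clause under discussion, as characterized in the comment above; the function names are hypothetical and the rest of the real condition (for example the enable_eplb clause) is deliberately left out.

```python
def disable_overlap_old(use_flashinfer_cutlass_kernels: bool, dp_size: int) -> bool:
    """Clause as described before the change: only disable overlap under DP."""
    return use_flashinfer_cutlass_kernels and dp_size > 1


def disable_overlap_new(use_fi_all2allv_kernels: bool) -> bool:
    """Clause as described after the change: dp_size is no longer consulted."""
    return use_fi_all2allv_kernels


# Compare both clauses with the relevant kernel flag switched on.
for dp_size in (1, 2):
    old = disable_overlap_old(True, dp_size)
    new = disable_overlap_new(True)
    print(f"dp_size={dp_size}: old disables overlap={old}, new disables overlap={new}")
```

With the flag set and dp_size == 1, the old clause leaves overlap enabled while the new one disables it. Whether that is intended depends on how use_fi_all2allv_kernels is derived, which this thread does not show.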

mergify bot commented Jan 21, 2026

Hi @amirkl94, the pre-commit checks have failed. Please run:

uv pip install pre-commit
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy or markdownlint failing?
mypy and markdownlint are run differently in CI. If the failure is related to either of these checks, please use the following commands to run them locally:
# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10
# For markdownlint
pre-commit run --hook-stage manual markdownlint

@robertgshaw2-redhat (Collaborator) commented

thank you!

@robertgshaw2-redhat robertgshaw2-redhat merged commit c3ee917 into vllm-project:naive-pf-separation Jan 21, 2026
8 of 11 checks passed