
[BugFix] Support DP/EP in AG/RS for FLASHINFER_CUTLASS FP8#32677

Draft
amirkl94 wants to merge 1 commit into vllm-project:main from amirkl94:feat/cutlass-fp8-moe-dp

Conversation

@amirkl94
Contributor

@amirkl94 amirkl94 commented Jan 20, 2026

Purpose

Add support for FP8 per-tensor DP and EP in the MoE FlashInfer CUTLASS path.
Changes:

  1. Support FP8 hidden-state communication in pynccl_wrapper.
  2. Split the DP-only case from the DP+EP case in the all-gather prepare/finalize object.
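The first change can be sketched roughly as below. The ncclDataType_t integer values come from nccl.h (the FP8 entries exist in recent NCCL releases); everything else, including the helper name and the use of dtype name strings in place of real torch dtypes, is illustrative and not vLLM's exact code.

```python
# Illustrative sketch of teaching a from_torch-style dtype helper about FP8.
# The enum values mirror ncclDataType_t in nccl.h; the dict keys are dtype
# names (strings) so the sketch runs without torch installed.

NCCL_DTYPE_FROM_TORCH = {
    "torch.float16": 6,          # ncclFloat16
    "torch.float32": 7,          # ncclFloat32
    "torch.float64": 8,          # ncclFloat64
    "torch.bfloat16": 9,         # ncclBfloat16
    "torch.float8_e4m3fn": 10,   # ncclFloat8e4m3 -- enables FP8 hidden-state comms
    "torch.float8_e5m2": 11,     # ncclFloat8e5m2
}

def nccl_dtype_from_torch(dtype_name: str) -> int:
    """Map a torch dtype (given by name) to an ncclDataType_t value."""
    try:
        return NCCL_DTYPE_FROM_TORCH[dtype_name]
    except KeyError:
        raise ValueError(f"No NCCL dtype for {dtype_name}") from None
```

With such a mapping in place, FP8 hidden states can be passed to NCCL collectives (e.g. all-gather / reduce-scatter) directly, instead of reinterpreting the buffer as int8.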

Test Plan

TBD

Test Result

TBD

Signed-off-by: Amir Klein <203507526+amirkl94@users.noreply.github.com>
@mergify mergify bot added the nvidia label Jan 20, 2026
@robertgshaw2-redhat robertgshaw2-redhat self-assigned this Jan 20, 2026
@robertgshaw2-redhat robertgshaw2-redhat changed the title Feature: Support DP and EP in fp8 MoE cutlass path [BugFix] Support DP/EP in AG/RS for FLASHINFER_CUTLASS FP8 Jan 20, 2026
@mergify mergify bot added the bug Something isn't working label Jan 20, 2026
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces support for FP8 per-tensor DP and EP in the MoE FlashInfer Cutlass path. The changes correctly integrate the ncclFloat8e4m3 data type and update the from_torch method in pynccl_wrapper.py. Additionally, a new use_ep flag has been added to FlashInferAllGatherMoEPrepareAndFinalize and its factory function, create_flashinfer_prepare_finalize, to explicitly handle expert parallelism. The conditional logic in prepare and finalize methods has been updated to incorporate this new flag, ensuring proper differentiation between DP-only and DP+EP cases. The use_ep parameter is correctly propagated from the moe configuration, making the implementation robust. Overall, the changes align well with the stated purpose of supporting FP8 and explicit DP/EP handling.
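The use_ep split described above might look roughly like this sketch. The class and factory names mirror the PR, but the bodies are placeholders under assumed semantics (DP-only all-gathers hidden states across DP ranks; DP+EP instead dispatches tokens to the ranks owning the selected experts), not vLLM's implementation.

```python
# Hypothetical sketch of branching prepare logic on a use_ep flag.
# Names follow the PR description; behavior is a stand-in for illustration.

class FlashInferAllGatherMoEPrepareAndFinalize:
    def __init__(self, use_dp: bool, use_ep: bool):
        self.use_dp = use_dp
        self.use_ep = use_ep

    def prepare(self, hidden_states):
        if self.use_dp and not self.use_ep:
            # DP-only: all-gather the (possibly FP8) hidden states so every
            # DP rank sees the full batch before expert computation.
            return ("allgather", hidden_states)
        # DP+EP: each rank keeps its shard; tokens are dispatched to the
        # ranks that own the selected experts.
        return ("ep_dispatch", hidden_states)

def create_flashinfer_prepare_finalize(use_dp: bool, use_ep: bool):
    # Factory mirroring the PR's new use_ep parameter (illustrative only).
    return FlashInferAllGatherMoEPrepareAndFinalize(use_dp=use_dp, use_ep=use_ep)
```

Making the EP case explicit, rather than inferring it inside prepare/finalize, is what lets the two communication patterns diverge cleanly.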

@robertgshaw2-redhat
Collaborator

FYI - @amirkl94 I also made this work in this PR by casting to int8, but your solution is cleaner

Could you make your change on top of #32567?

@amirkl94
Contributor Author

> FYI - @amirkl94 I also made this work in this PR by casting to int8, but your solution is cleaner
>
> Could you make your change on top of #32567?

Yes, I'm testing this now on top of your branch as well.

@mergify

mergify bot commented Jan 27, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @amirkl94.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Jan 27, 2026

Labels

bug (Something isn't working), needs-rebase, nvidia
