[BugFix] Support DP/EP in AG/RS for FLASHINFER_CUTLASS FP8 #32677
amirkl94 wants to merge 1 commit into vllm-project:main
Signed-off-by: Amir Klein <203507526+amirkl94@users.noreply.github.com>
Code Review
This pull request introduces support for FP8 per-tensor DP and EP in the MoE FlashInfer Cutlass path. The changes correctly integrate the ncclFloat8e4m3 data type and update the from_torch method in pynccl_wrapper.py. Additionally, a new use_ep flag has been added to FlashInferAllGatherMoEPrepareAndFinalize and its factory function, create_flashinfer_prepare_finalize, to explicitly handle expert parallelism. The conditional logic in prepare and finalize methods has been updated to incorporate this new flag, ensuring proper differentiation between DP-only and DP+EP cases. The use_ep parameter is correctly propagated from the moe configuration, making the implementation robust. Overall, the changes align well with the stated purpose of supporting FP8 and explicit DP/EP handling.
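As a rough illustration of the dtype mapping described in the review, here is a minimal sketch of how a torch dtype could be translated to an NCCL data type code, with FP8 e4m3 added. It is not the code from this PR: `NcclDataType`, `nccl_dtype_from_name`, and the lookup table are hypothetical names, dtypes are keyed by their string names so the sketch runs without torch installed, and the numeric value used for the FP8 entry is an assumption that should be checked against the `nccl.h` of the NCCL release in use.

```python
from enum import IntEnum


class NcclDataType(IntEnum):
    """Hypothetical mirror of a subset of NCCL's ncclDataType_t codes."""

    INT8 = 0
    UINT8 = 1
    INT32 = 2
    INT64 = 4
    FLOAT16 = 6
    FLOAT32 = 7
    FLOAT64 = 8
    BFLOAT16 = 9
    # Assumed value; FP8 types only exist in NCCL releases with FP8 support.
    FLOAT8_E4M3 = 10


# Keys are str(torch_dtype) values, used here so the sketch needs no torch import.
_DTYPE_NAME_TO_NCCL = {
    "torch.int8": NcclDataType.INT8,
    "torch.uint8": NcclDataType.UINT8,
    "torch.int32": NcclDataType.INT32,
    "torch.int64": NcclDataType.INT64,
    "torch.float16": NcclDataType.FLOAT16,
    "torch.float32": NcclDataType.FLOAT32,
    "torch.float64": NcclDataType.FLOAT64,
    "torch.bfloat16": NcclDataType.BFLOAT16,
    "torch.float8_e4m3fn": NcclDataType.FLOAT8_E4M3,
}


def nccl_dtype_from_name(dtype_name: str) -> NcclDataType:
    """Return the NCCL code for a dtype name, or raise for unsupported dtypes."""
    try:
        return _DTYPE_NAME_TO_NCCL[dtype_name]
    except KeyError:
        raise ValueError(
            f"Unsupported dtype for NCCL collectives: {dtype_name}"
        ) from None


if __name__ == "__main__":
    # With torch available, the key would typically be str(tensor.dtype).
    print(nccl_dtype_from_name("torch.float8_e4m3fn").name)
```

Failing loudly on an unmapped dtype, rather than falling back to a byte-wise type, keeps an unsupported FP8 variant from silently reaching an all-gather or reduce-scatter call.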
This pull request has merge conflicts that must be resolved before it can be merged.
Purpose
Add support for FP8 per-tensor quantization with data parallelism (DP) and expert parallelism (EP) in the MoE FlashInfer CUTLASS path.
Changes:
pynccl_wrapper.
Test Plan
TBD
Test Result
TBD