Support nccl fp8 communication #32760
robertgshaw2-redhat merged 1 commit into vllm-project:naive-pf-separation
Conversation
Signed-off-by: Amir Klein <203507526+amirkl94@users.noreply.github.com>
Code Review
This pull request introduces support for FP8 communication in NCCL and updates the fused MoE layers accordingly. The changes in pynccl_wrapper.py correctly extend the NCCL data type enum and its from_torch mapping to cover torch.float8_e4m3fn. The modifications in prepare_finalize.py remove an int8_view workaround, indicating improved native FP8 handling. However, a TODO comment in shared_fused_moe.py flags uncertainty about a condition controlling shared expert overlap, which should be addressed.
(self.enable_eplb and backend != "allgather_reducescatter")
or (self.moe_config.use_flashinfer_cutlass_kernels and self.dp_size > 1)
# TODO: Is this correct?
or self.moe_parallel_config.use_fi_all2allv_kernels
The "TODO: Is this correct?" comment indicates uncertainty about the logic for disabling shared expert overlap when use_fi_all2allv_kernels is true. If this condition is wrong, shared expert overlap could be disabled improperly, potentially impacting performance or correctness. Please verify the logic and either replace the TODO with a clarifying comment or correct the condition.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
(self.enable_eplb and backend != "allgather_reducescatter")
or (self.moe_config.use_flashinfer_cutlass_kernels and self.dp_size > 1)
# TODO: Is this correct?
or self.moe_parallel_config.use_fi_all2allv_kernels
Removed DP check changes overlap disabling behavior
Medium Severity
The condition for disabling shared expert overlap changed from checking self.moe_config.use_flashinfer_cutlass_kernels and self.dp_size > 1 to just self.moe_parallel_config.use_fi_all2allv_kernels. The original comment indicated overlap was only disabled "with DP, since there is nothing to gain." Removing the dp_size > 1 check means overlap is now disabled even when dp_size == 1, which may unnecessarily reduce performance. The TODO comment "Is this correct?" signals the author's own uncertainty about this change.
Hi @amirkl94, the pre-commit checks have failed. Please run:

uv pip install pre-commit
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.
thank you!
Merged commit c3ee917 into vllm-project:naive-pf-separation
Purpose
Porting change from #32677