[Bugfix] Fix torch.compile error for DP + MoE on CPU Backend #31650
bigPYJ1151 merged 3 commits into vllm-project:main
Conversation
Code Review
This pull request resolves a torch.compile error that occurred with MoE models on CPU backends using data parallelism. The core of the fix is in vllm/model_executor/layers/fused_moe/layer.py, where the conditions for the post_quant_allgather flag are reordered. By moving the has_flashinfer_trtllm_fused_moe() check to the end of the expression, the change leverages short-circuiting to prevent torch.compile from tracing this non-traceable function on CPU platforms. This is a clean and correct solution.

The accompanying changes in base_device_communicator.py and xpu_communicator.py are necessary API updates to plumb the extra_tensors parameter, which is part of the feature enabled by post_quant_allgather. These changes are implemented safely and do not pose a risk to other backends. The pull request is well-executed and improves the framework's robustness.
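The mechanism behind the fix can be illustrated with a minimal, self-contained sketch. The names below are illustrative stand-ins, not vLLM's actual code: non_traceable_check() plays the role of has_flashinfer_trtllm_fused_moe(), and post_quant_allgather() models the reordered boolean expression. The point is that Python's `and` evaluates left to right and short-circuits, so placing cheap platform checks first means the risky call is never evaluated on CPU.

```python
def non_traceable_check():
    # Stand-in for has_flashinfer_trtllm_fused_moe(): assume calling it
    # during CPU tracing fails, modeled here as an exception.
    raise RuntimeError("not traceable on this platform")

def post_quant_allgather(is_cuda, dp_size):
    # Before the fix (conceptually), the fragile check came first:
    #   return non_traceable_check() and is_cuda and dp_size > 1
    # which evaluated it on every platform. After the reordering, the
    # cheap platform checks run first; on CPU (is_cuda == False) the
    # expression short-circuits and non_traceable_check() is never called.
    return is_cuda and dp_size > 1 and non_traceable_check()

# On a CPU platform the guard short-circuits before the risky call:
print(post_quant_allgather(is_cuda=False, dp_size=2))
```

With the original ordering, the same call would raise; with the fixed ordering it simply returns False on CPU, which is why the service can now start.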
Signed-off-by: kunzh <zhikun.wu@outlook.com>
Purpose
Fix #31648 (torch.compile fails for MoE models on the CPU backend with -dp 2). The error is fixed by reordering has_flashinfer_trtllm_fused_moe() to the end of the post_quant_allgather condition, so short-circuit evaluation skips it on CPU.

Test Plan
vllm serve "Qwen/Qwen1.5-MoE-A2.7B-Chat" -dp 2

Test Result
The service can start successfully.
Essential Elements of an Effective PR Description Checklist
supported_models.md and examples for a new model.