Port of: fix: bypass _forward_impl for dp_size==1 to fix DeepSeek R1 FP8 crash #1441 by iboiko-habana · Pull Request #1459 · vllm-project/vllm-gaudi

iboiko-habana · 2026-05-19T09:23:14Z

No description provided.

…FP8 crash vllm-project#1441 Signed-off-by: Iryna Boiko <iboiko@habana.ai>

Copilot

Pull request overview

Ports an upstream fix to the Gaudi fused MoE runner that avoids a DeepSeek R1 FP8 crash: the dp_size == 1 execution path now bypasses _forward_impl and directly runs the quant-method application and output-combination steps.

Changes:

Implement a dp_size == 1 fast path that calls _apply_quant_method and _maybe_combine directly instead of _forward_impl.
Persist additional runner references (_hpu_layer_ref, _hpu_gate_ref) during FusedMoE.__init__ to avoid relying on potentially detached gate modules at runtime.
Minor style/formatting adjustments to patch assignments.
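
The control flow described in the changes above can be sketched roughly as follows. This is a toy illustration, not the vllm-gaudi code: `ToyMoEConfig`, `ToyRunner`, the method signatures, and the list-based "tensors" are all assumptions made for the example. Only the names `_forward_impl`, `_apply_quant_method`, `_maybe_combine`, `_hpu_layer_ref`, `dp_size`, `pcp_size`, and the error message come from the PR.

```python
from dataclasses import dataclass


@dataclass
class ToyMoEConfig:
    # Hypothetical stand-in for the runner's moe_config; only the two
    # fields mentioned in this PR are modelled.
    dp_size: int = 1
    pcp_size: int = 1


class ToyRunner:
    """Illustrative runner; signatures here are assumptions, not the real API."""

    def __init__(self, moe_config, layer):
        self.moe_config = moe_config
        # Layer reference stashed at construction time.
        self._hpu_layer_ref = layer
        self.calls = []  # records which steps ran, for demonstration only

    def _forward_impl(self, layer, hidden_states):
        # General path, kept for dp_size > 1 in this sketch.
        self.calls.append("_forward_impl")
        return self._maybe_combine(self._apply_quant_method(layer, hidden_states))

    def _apply_quant_method(self, layer, hidden_states):
        self.calls.append("_apply_quant_method")
        return [layer["scale"] * x for x in hidden_states]

    def _maybe_combine(self, expert_out):
        self.calls.append("_maybe_combine")
        return expert_out

    def forward(self, hidden_states):
        if self.moe_config.dp_size == 1:
            # Guard taken from the diff: the fast path rejects pcp_size > 1.
            if self.moe_config.pcp_size > 1:
                raise RuntimeError("dp_size==1 fast path does not support pcp_size > 1")
            layer = self._hpu_layer_ref
            # Fast path: skip _forward_impl and call the two steps directly.
            return self._maybe_combine(self._apply_quant_method(layer, hidden_states))
        return self._forward_impl(self._hpu_layer_ref, hidden_states)
```

In this sketch, a runner built with `dp_size=1` never records a `_forward_impl` call, while any other `dp_size` still goes through it.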

+        if self.moe_config.pcp_size > 1:
+            raise RuntimeError("dp_size==1 fast path does not support pcp_size > 1")
+        layer = self._hpu_layer_ref


    Instead of calling forward_dispatch (which uses get_layer_from_name,
    ensure_moe_quant_config_init, and _sequence_parallel_context — all of
    which access ForwardContext and cause torch.compile graph breaks), we
    use a layer reference stashed on the runner at FusedMoE.__init__ time
-    (self._hpu_layer_ref) and call _forward_impl directly. This also
+    (self._hpu_layer_ref) and bypass _forward_impl for dp_size==1,
+    calling _apply_quant_method + _maybe_combine directly. This also
    bypasses self.layer_name (a per-layer string) so dynamo no longer


kamil-kaczor

lgtm

github-actions · 2026-05-20T03:48:57Z

✅ CI Passed

All checks passed successfully against the following vllm commit:
ad7125a431e176d4161099480a66f0169609a690

Port of: fix: bypass _forward_impl for dp_size==1 to fix DeepSeek R1 …

c51a791

…FP8 crash vllm-project#1441 Signed-off-by: Iryna Boiko <iboiko@habana.ai>

Copilot AI review requested due to automatic review settings May 19, 2026 09:23

iboiko-habana requested review from PatrykWo, mgawarkiewicz-intel and wpyszka as code owners May 19, 2026 09:23

Copilot started reviewing on behalf of iboiko-habana May 19, 2026 09:23

Copilot AI reviewed May 19, 2026

kamil-kaczor approved these changes May 19, 2026

github-actions Bot mentioned this pull request May 19, 2026

🚦 Team Review Dashboard #701

Open

mgawarkiewicz-intel merged commit de9f2fb into vllm-project:releases/v0.21.0 May 25, 2026
2 of 6 checks passed

mgawarkiewicz-intel merged 1 commit into vllm-project:releases/v0.21.0 from iboiko-habana:port1441

4 participants
