
Fix stale gate ref overriding caller router_logits in dp_size==1 MoE fast path#1469

Merged
iboiko-habana merged 3 commits into vllm-project:main from iboiko-habana:fix1441 on May 22, 2026

Conversation

@iboiko-habana
Collaborator

PR #1441 added an _hpu_gate_ref fallback in the dp_size==1 fast path
that unconditionally re-invoked a runner-owned gate, overwriting the
router_logits supplied by the caller. For SharedFusedMoE models
(Qwen3 MoE, ernie45, ...), the block's mlp.gate(...) has already
produced router_logits, and _sync_shared_moe_gates sets
runner.gate=None post-INC. The cached _hpu_gate_ref, however, still
pointed at the pre-INC module, and re-invoking it produced shape/dtype
mismatches under fp8.

Only invoke the runner-owned gate when the caller did not provide
router_logits, preserving the DeepSeek R1 internal-router fast path
from #1441.
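To make the difference concrete, here is a minimal sketch of the fast path before and after this fix. It assumes a gate is any callable returning a (router_logits, bias) pair, as the unpacking router_logits, _ = gate(hidden_states) in the patch suggests. FakeMoE, stale_pre_inc_gate, and the list-based "logits" are hypothetical stand-ins, not vllm_gaudi code; the fp8 shape/dtype mismatch is modeled simply as a stale gate returning logits of the wrong length.

```python
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical gate type: calling it returns a (router_logits, bias) pair.
Gate = Callable[[Sequence[float]], Tuple[list, None]]


class FakeMoE:
    """Hypothetical minimal model of the dp_size==1 fast path (not the real class)."""

    def __init__(self, gate: Optional[Gate], hpu_gate_ref: Optional[Gate]):
        self.gate = gate                   # None after _sync_shared_moe_gates post-INC
        self._hpu_gate_ref = hpu_gate_ref  # cached reference, possibly stale

    def forward_before_fix(self, hidden_states, router_logits):
        # Pre-fix: the gate is re-invoked unconditionally, clobbering caller logits.
        gate = self.gate or getattr(self, "_hpu_gate_ref", None)
        if gate is not None:
            router_logits, _ = gate(hidden_states)
        return router_logits

    def forward_after_fix(self, hidden_states, router_logits):
        # Post-fix: fall back to a runner-owned gate only when no logits were supplied.
        if router_logits is None:
            gate = self.gate or getattr(self, "_hpu_gate_ref", None)
            if gate is not None:
                router_logits, _ = gate(hidden_states)
        return router_logits


def stale_pre_inc_gate(hidden_states):
    # Hypothetical stale pre-INC module: wrong output width for the current model.
    return [0.0] * 3, None


moe = FakeMoE(gate=None, hpu_gate_ref=stale_pre_inc_gate)
hidden = [1.0, 2.0]
caller_logits = [0.1, 0.7, 0.2, 0.0]  # as if produced by the block's own mlp.gate(...)

print(moe.forward_before_fix(hidden, caller_logits))  # [0.0, 0.0, 0.0] (wrong width)
print(moe.forward_after_fix(hidden, caller_logits))   # [0.1, 0.7, 0.2, 0.0]
```

Internal-router callers that pass router_logits=None still reach the gate in forward_after_fix, which is the DeepSeek R1 path the description says is preserved.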

…fast path

Signed-off-by: Iryna Boiko <iboiko@habana.ai>
Contributor

Copilot AI left a comment


Pull request overview

This PR fixes the dp_size==1 MoE fast path in patched_fused_moe_forward so that a runner-owned (or cached) gate is invoked only when the caller did not supply router_logits. This prevents a stale _hpu_gate_ref from overwriting caller-provided logits (notably for SharedFusedMoE models) and avoids fp8 shape/dtype mismatches.

Changes:

  • Guard runner-owned gate invocation behind if router_logits is None to preserve caller-supplied router_logits.
  • Add clarifying in-code rationale about SharedFusedMoE post-INC behavior and stale gate references.

Comment thread vllm_gaudi/ops/hpu_fused_moe.py Outdated
Comment on lines +315 to +318
if router_logits is None:
    gate = self.gate or getattr(self, "_hpu_gate_ref", None)
    if gate is not None:
        router_logits, _ = gate(hidden_states)
Comment thread vllm_gaudi/ops/hpu_fused_moe.py Outdated
Comment on lines +309 to +318
# Only invoke a runner-owned gate when the caller did not provide
# router_logits (internal-router mode, e.g. DeepSeek R1). For
# SharedFusedMoE models (Qwen3 MoE, ernie45, ...) the block's
# mlp.gate(...) has already produced router_logits and runner.gate
# is explicitly set to None by _sync_shared_moe_gates post-INC;
# re-invoking _hpu_gate_ref here would call a stale pre-INC module.
if router_logits is None:
    gate = self.gate or getattr(self, "_hpu_gate_ref", None)
    if gate is not None:
        router_logits, _ = gate(hidden_states)
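When router_logits is None, the guarded snippet resolves a gate in a fixed order: self.gate first, then the cached _hpu_gate_ref, leaving router_logits as None if neither exists. The sketch below pulls that control flow into a hypothetical free function, resolve_router_logits, with plain lists standing in for tensors and two hypothetical gates, runner_gate and cached_gate. It passes the cached reference as an explicit argument instead of using getattr on a missing attribute, so it illustrates the logic rather than reproducing the vllm_gaudi code.

```python
def resolve_router_logits(hidden_states, router_logits, gate, hpu_gate_ref):
    """Hypothetical free-function version of the guarded fast-path snippet."""
    if router_logits is None:
        # Prefer the runner-owned gate; otherwise fall back to the cached reference.
        chosen = gate or hpu_gate_ref
        if chosen is not None:
            router_logits, _ = chosen(hidden_states)
    # Caller-supplied logits pass through untouched; None survives if no gate exists.
    return router_logits


def runner_gate(hidden_states):
    # Hypothetical runner-owned gate (internal-router mode, e.g. DeepSeek R1).
    return ["runner"], None


def cached_gate(hidden_states):
    # Hypothetical cached _hpu_gate_ref.
    return ["cached"], None


hidden = [1.0, 2.0]
print(resolve_router_logits(hidden, ["caller"], runner_gate, cached_gate))  # ['caller']
print(resolve_router_logits(hidden, None, runner_gate, cached_gate))        # ['runner']
print(resolve_router_logits(hidden, None, None, cached_gate))               # ['cached']
print(resolve_router_logits(hidden, None, None, None))                      # None
```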
…d_moe_gates

Signed-off-by: Iryna Boiko <iboiko@habana.ai>
@github-actions

✅ CI Passed

All checks passed successfully against the following vllm commit:
dcacdf9a8860a86401127d1c8f93ebf3cfbfd026

@iboiko-habana iboiko-habana merged commit 2cb5d99 into vllm-project:main May 22, 2026
2 checks passed
mgawarkiewicz-intel pushed a commit that referenced this pull request May 26, 2026
…e==1 MoE fast path (#1469) (#1492)

Signed-off-by: Iryna Boiko <iboiko@habana.ai>

3 participants