Fix stale gate ref overriding caller router_logits in dp_size==1 MoE fast path #1469
Merged
…fast path Signed-off-by: Iryna Boiko <iboiko@habana.ai>
Pull request overview
This PR fixes the dp_size==1 MoE fast path in patched_fused_moe_forward so that a runner-owned (or cached) gate is invoked only when the caller did not supply router_logits. This prevents a stale _hpu_gate_ref from overwriting caller-provided logits (notably for SharedFusedMoE models) and avoids fp8 shape/dtype mismatches.
Changes:
- Guard the runner-owned gate invocation behind `if router_logits is None` to preserve caller-supplied router_logits.
- Add in-code rationale explaining SharedFusedMoE post-INC behavior and stale gate references.
Comment on lines +309 to +318:

```python
# Only invoke a runner-owned gate when the caller did not provide
# router_logits (internal-router mode, e.g. DeepSeek R1). For
# SharedFusedMoE models (Qwen3 MoE, ernie45, ...) the block's
# mlp.gate(...) has already produced router_logits and runner.gate
# is explicitly set to None by _sync_shared_moe_gates post-INC;
# re-invoking _hpu_gate_ref here would call a stale pre-INC module.
if router_logits is None:
    gate = self.gate or getattr(self, "_hpu_gate_ref", None)
    if gate is not None:
        router_logits, _ = gate(hidden_states)
```
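A minimal, self-contained sketch of the guarded lookup follows. ToyRunner, resolve_router_logits, and internal_gate are hypothetical stand-ins that use plain Python lists instead of tensors; they are not the real HPU runner. Only the body of resolve_router_logits mirrors the patched lines, so the two modes the comment describes can be exercised in isolation.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical gate signature: returns (router_logits, extra), matching the
# `router_logits, _ = gate(hidden_states)` unpacking in the patch.
Gate = Callable[[Sequence[float]], Tuple[List[float], None]]


class ToyRunner:
    """Illustrative stand-in for the MoE runner; not the real HPU class."""

    def __init__(self, gate: Optional[Gate] = None,
                 gate_ref: Optional[Gate] = None) -> None:
        self.gate = gate
        if gate_ref is not None:
            # Plays the role of the cached _hpu_gate_ref attribute.
            self._hpu_gate_ref = gate_ref

    def resolve_router_logits(
        self,
        hidden_states: Sequence[float],
        router_logits: Optional[List[float]] = None,
    ) -> Optional[List[float]]:
        # Mirrors the patched fast path: fall back to a runner-owned gate
        # only when the caller supplied no router_logits.
        if router_logits is None:
            gate = self.gate or getattr(self, "_hpu_gate_ref", None)
            if gate is not None:
                router_logits, _ = gate(hidden_states)
        return router_logits


def internal_gate(hidden_states: Sequence[float]) -> Tuple[List[float], None]:
    # Hypothetical internal router (the DeepSeek R1-style mode in this PR).
    return [2.0 * x for x in hidden_states], None


runner = ToyRunner(gate_ref=internal_gate)
# Internal-router mode: no logits from the caller, so the cached gate runs.
print(runner.resolve_router_logits([1.0, 2.0]))              # [2.0, 4.0]
# SharedFusedMoE-style mode: caller logits are passed through unchanged.
print(runner.resolve_router_logits([1.0, 2.0], [0.5, 0.5]))  # [0.5, 0.5]
```

The guard keeps both call patterns working on the same code path: the fallback gate still serves internal-router models, while models that compute logits in the block never reach it.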
…==1 MoE fast path" This reverts commit d1cb3b9.
…d_moe_gates Signed-off-by: Iryna Boiko <iboiko@habana.ai>
✅ CI Passed: All checks passed successfully against the following vllm commit:
adobrzyn approved these changes on May 22, 2026.
PR #1441 added an _hpu_gate_ref fallback in the dp_size==1 fast path
that unconditionally re-invoked a runner-owned gate, overwriting
router_logits supplied by the caller. For SharedFusedMoE models
(Qwen3 MoE, ernie45, ...) the block's mlp.gate(...) has already
produced router_logits and _sync_shared_moe_gates sets
runner.gate=None post-INC; the cached _hpu_gate_ref still points at
the pre-INC module, so re-invoking it produced shape/dtype mismatches
under fp8.
Only invoke the runner-owned gate when the caller did not provide
router_logits, preserving the DeepSeek R1 internal-router fast path
from #1441.
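To illustrate the regression described above, the sketch below assumes a hypothetical sync_gate_post_inc helper that, like _sync_shared_moe_gates, clears runner.gate but leaves the cached _hpu_gate_ref in place. All names are illustrative, and the stale gate's string output merely stands in for logits that would not match the post-INC fp8 pipeline. It contrasts the unconditional fallback with the guarded version.

```python
from typing import Callable, Optional


def pre_inc_gate(hidden_states):
    # Hypothetical stale (pre-INC) module; its output is a placeholder for
    # logits with the wrong shape/dtype for the post-INC fp8 path.
    return "stale-pre-INC-logits", None


class ToyRunner:
    """Illustrative runner, not the real HPU MoE runner."""

    def __init__(self, gate: Optional[Callable]) -> None:
        self.gate = gate
        self._hpu_gate_ref = gate  # cached reference used by the fallback


def sync_gate_post_inc(runner: ToyRunner) -> None:
    # Hypothetical stand-in for _sync_shared_moe_gates: clears runner.gate
    # but does not touch the cached _hpu_gate_ref.
    runner.gate = None


def unguarded(runner, hidden_states, router_logits):
    # Before the fix: the fallback runs even when logits were supplied.
    gate = runner.gate or getattr(runner, "_hpu_gate_ref", None)
    if gate is not None:
        router_logits, _ = gate(hidden_states)
    return router_logits


def guarded(runner, hidden_states, router_logits):
    # After the fix: caller-supplied logits are left untouched.
    if router_logits is None:
        gate = runner.gate or getattr(runner, "_hpu_gate_ref", None)
        if gate is not None:
            router_logits, _ = gate(hidden_states)
    return router_logits


runner = ToyRunner(pre_inc_gate)
sync_gate_post_inc(runner)
caller_logits = [0.1, 0.9]  # e.g. produced by the block's mlp.gate(...)
print(unguarded(runner, [1.0, 2.0], caller_logits))  # stale-pre-INC-logits
print(guarded(runner, [1.0, 2.0], caller_logits))    # [0.1, 0.9]
```

Because `runner.gate or ...` treats the cleared gate as falsy, the unguarded path silently falls through to the cached reference. Checking router_logits first removes that path for callers that already routed.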