[DSV4][XPU] Pass gemm1_clamp_limit to XpuFusedMoe#44517

Draft
majian4work wants to merge 3 commits into vllm-project:main from majian4work:dsv4-pr6-moe-clamp

@majian4work

Summary

Pass quant_config.gemm1_clamp_limit to XpuFusedMoe so that the SwiGLU clamp limit is applied during MoE expert computation on XPU.
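To illustrate what a SwiGLU clamp limit typically does, here is a minimal pure-Python sketch. It is not the XpuFusedMoe kernel: the function names are hypothetical, and the clamping convention (gate capped from above, up projection clamped symmetrically) is an assumption borrowed from other clamped-SwiGLU implementations, so the exact behavior on XPU may differ.

```python
import math


def silu(x: float) -> float:
    """Numerically stable SiLU: x * sigmoid(x)."""
    if x >= 0:
        return x / (1.0 + math.exp(-x))
    e = math.exp(x)
    return x * e / (1.0 + e)


def clamped_swiglu(gate, up, clamp_limit=None):
    """Hypothetical reference for SwiGLU with an optional clamp limit.

    Assumed convention: the gate is capped at +clamp_limit and the up
    projection is clamped to [-clamp_limit, +clamp_limit] before the
    activation. With clamp_limit=None this is plain SwiGLU.
    """
    out = []
    for g, u in zip(gate, up):
        if clamp_limit is not None:
            g = min(g, clamp_limit)
            u = max(-clamp_limit, min(u, clamp_limit))
        out.append(silu(g) * u)
    return out
```

If the limit is not forwarded to the expert computation, the result matches the `clamp_limit=None` path, which can diverge from the reference for large activations.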

Dependencies

This PR depends on:

Add XPU-specific decode implementation for DeepSeek-V4 MLA sparse attention.

Signed-off-by: Ma Jian <jian1.ma@intel.com>
- Add forward_xpu to MHCFusedPostPreOp (decomposes into mhc_post_torch + mhc_pre_torch)
- Update XPU model forward to use fused MHC path (matching AMD pattern):
  first layer uses standalone hc_pre, middle layers use mhc_fused_post_pre
- Add explicit hc_post after decoder loop

Signed-off-by: Ma Jian <jian1.ma@intel.com>
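
The call order described in the commit message above can be sketched as follows. This only records the sequence of operations; `decoder_call_order` and the `layer_{i}` labels are hypothetical, and the sketch assumes every layer after the first starts with the fused op (post of the previous layer plus pre of the current one).

```python
def decoder_call_order(num_layers: int) -> list[str]:
    """Return the assumed sequence of MHC ops around the decoder layers."""
    if num_layers < 1:
        raise ValueError("num_layers must be at least 1")
    trace = []
    for i in range(num_layers):
        if i == 0:
            # The first layer has no preceding hc_post to fuse with.
            trace.append("hc_pre")
        else:
            # Fuses hc_post of layer i-1 with hc_pre of layer i.
            trace.append("mhc_fused_post_pre")
        trace.append(f"layer_{i}")
    # The last layer's hc_post has nothing to fuse with, so it runs
    # explicitly after the loop.
    trace.append("hc_post")
    return trace
```
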
Signed-off-by: Ma Jian <jian1.ma@intel.com>
mergify Bot added the intel-gpu and v1 labels Jun 4, 2026

mergify Bot commented Jun 4, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @majian4work.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

mergify Bot added the needs-rebase label Jun 4, 2026