[XPU][4/N] add mxfp4 moe model support #33679
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
Code Review
This pull request refactors the IPEX-specific MXFP4 MoE method to a more generic XPU implementation by renaming IpexMxfp4MoEMethod to XpuMxfp4MoEMethod and replacing the IPEX-dependent logic in apply_monolithic with a call to the xpu_fused_moe kernel.
My review identifies a critical issue in the new apply_monolithic implementation where input padding is missing, which will likely lead to a shape mismatch and runtime errors. I've provided a detailed comment with a suggested fix for this. Additionally, I've noted a minor performance concern regarding an unused tensor allocation.
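The padding concern raised in the review can be illustrated with a small, framework-free sketch. It assumes the kernel expects the hidden dimension rounded up to some alignment. The helper names, the alignment of 4, and the sizes below are hypothetical and are not taken from this PR or from the xpu_fused_moe kernel.

```python
# Illustrative only: plain Python lists stand in for tensors.
# ALIGNMENT and the helper names are assumptions, not values from the PR.

ALIGNMENT = 4  # hypothetical alignment required along the hidden dimension


def round_up(value: int, multiple: int) -> int:
    """Round value up to the nearest multiple of `multiple`."""
    return ((value + multiple - 1) // multiple) * multiple


def pad_last_dim(rows: list[list[float]], target: int) -> list[list[float]]:
    """Zero-pad every row to `target` columns; reject rows that are wider."""
    padded = []
    for row in rows:
        if len(row) > target:
            raise ValueError(f"row has {len(row)} columns, expected <= {target}")
        padded.append(row + [0.0] * (target - len(row)))
    return padded


def unpad_last_dim(rows: list[list[float]], original: int) -> list[list[float]]:
    """Drop the padded columns so the caller sees the original hidden size."""
    return [row[:original] for row in rows]


hidden_size = 6                                      # hypothetical unpadded hidden size
padded_hidden = round_up(hidden_size, ALIGNMENT)     # size the padded weights would have

x = [[1.0] * hidden_size, [2.0] * hidden_size]       # two tokens of activations
x_padded = pad_last_dim(x, padded_hidden)            # pad before calling the kernel

# A real kernel call would go here; the identity stands in for it.
kernel_out = x_padded

out = unpad_last_dim(kernel_out, hidden_size)        # slice back to the original width
```

Without the padding step, the activations would have 6 columns while the padded weights expect 8, which is the kind of shape mismatch the review warns about.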
@robertgshaw2-redhat @mgoin could you help review this? Thanks!
@marvind thanks for reporting this. can you share your client command? or try with
Thank you for the swift feedback! For me, it consistently ends up outputting exclamation marks (!!!!) after a longer stretch of reasoning that gets more and more chaotic:
I will also run lm_eval, but I will not have time for it until tomorrow.
strict-match looks off compared to yours, doesn't it? I also get
v0.15.1 actually looks similar, but I do not get the warnings, and the curl command from my previous message works fine:
Not sure how to interpret this. Please let me know if I can provide more information.
@marvind
Thank you, @jikunshang. v0.15.1 works fine for the time being.
Thank you for the update, @jikunshang! I will attach two sample outputs to show what I mean (v0.15.1 with IPEX vs. v0.17.0, both include reasoning and content):
I ran this command to obtain the outputs:
@marvind we will continue investigating. thanks!
@marvind would you mind trying the latest per-commit wheel? You can find it here: https://github.com/vllm-project/vllm-xpu-kernels/actions/runs/22834791643
@jikunshang, I tested vllm v0.17.0 plus the updated vllm-xpu-kernels per-commit wheel you mentioned. This fixes the issue! Great work, thanks. 😊


Purpose
[4/N] of #33214
Add MXFP4 MoE support. We can also refactor the XPU part once the kernel abstraction is applied to MXFP4.
Test Plan
Test Result