
[XPU][MoE Refactor] Refactor xpu mxfp4 support into oracle (#37784)

Merged
bigPYJ1151 merged 2 commits into vllm-project:main from jikunshang:kunshang/mxfp4_oracle
Mar 23, 2026

Conversation

@jikunshang (Collaborator) commented Mar 22, 2026

Purpose

Follow-up of #37128: move XPU MXFP4 support into the oracle as well.
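For context, the "oracle" pattern this refactor targets can be sketched as a registry of expert implementations, each declaring which platform/quantization combination it supports, with the oracle picking the most specific match. This is a hypothetical illustration; the class and function names (`MoEConfig`, `ExpertImpl`, `select_experts`) are made up for this sketch and are not vLLM's actual API, apart from `XPUExpertsMXFp4`, which this PR introduces.

```python
# Hedged sketch of oracle-style expert selection: specialized backends are
# tried in order before a generic fallback. Illustrative only.
from dataclasses import dataclass


@dataclass
class MoEConfig:
    platform: str  # e.g. "cuda", "xpu", "cpu"
    quant: str     # e.g. "mxfp4", "fp8"


class ExpertImpl:
    """Generic fallback that applies on any platform."""
    name = "generic"

    @staticmethod
    def supports(cfg: MoEConfig) -> bool:
        return True


class XPUExpertsMXFp4(ExpertImpl):
    """XPU-specific MXFP4 experts, selectable by the oracle."""
    name = "xpu_mxfp4"

    @staticmethod
    def supports(cfg: MoEConfig) -> bool:
        return cfg.platform == "xpu" and cfg.quant == "mxfp4"


# Order matters: specialized implementations come before the fallback.
REGISTRY = [XPUExpertsMXFp4, ExpertImpl]


def select_experts(cfg: MoEConfig) -> type:
    for impl in REGISTRY:
        if impl.supports(cfg):
            return impl
    raise RuntimeError("no expert implementation matches config")
```

With this shape, adding a backend means registering one class rather than wiring a bespoke quantization method into the MoE layer, which is the modularity gain the refactor is after.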

Test Plan

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
@jikunshang (Collaborator, Author) commented:

@zyongye @mgoin @robertgshaw2-redhat PTAL, thanks!

@gemini-code-assist bot (Contributor) left a comment


Code Review

This pull request refactors the XPU MXFP4 support for Mixture-of-Experts layers to integrate it into the MoE oracle system. This is achieved by removing the specialized XpuMxfp4MoEMethod and introducing a new XPUExpertsMXFp4 class that can be selected by the oracle. While this is a good refactoring for modularity, I've identified a potential performance regression. The previous implementation used XPU-specific custom operators for routing, but the new implementation appears to fall back to a generic PyTorch-based router. My review includes a comment highlighting this issue and asking for clarification.

I am having trouble creating individual review comments, so my feedback is consolidated below.

vllm/model_executor/layers/quantization/mxfp4.py, lines 416-506 (severity: high)

By removing XpuMxfp4MoEMethod, the routing logic for XPU MXFP4 MoE layers is now handled by the generic Router class. The previous implementation used XPU-specific custom ops (torch.ops._moe_C.fused_grouped_topk and torch.ops._moe_C.topk_softmax) for routing, which are likely more performant on XPU hardware.

The new implementation uses a pure PyTorch-based router, which might cause a performance regression, especially for models that use grouped top-k routing. Was this change in routing implementation intentional? If the custom routing ops are still desired, the logic from XpuMxfp4MoEMethod.apply_monolithic might need to be preserved, perhaps by creating a monolithic XPU expert class (FusedMoEExpertsMonolithic) that can be selected by the oracle and can perform the specialized routing.
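To make the concern concrete, here is a minimal sketch of what grouped top-k routing computes for a single token: experts are partitioned into groups, each group is scored by its best expert, low-scoring groups are masked out, and the final top-k is taken over the survivors. This is written in NumPy purely to illustrate the generic (non-custom-op) path the comment describes; it is not vLLM's implementation, and the function name `grouped_topk` is used only for this sketch.

```python
# Illustrative grouped top-k routing for one token's router logits.
import numpy as np


def grouped_topk(logits, num_groups, topk_groups, topk):
    """logits: (num_experts,) router scores for a single token."""
    logits = np.asarray(logits, dtype=np.float64)
    group_size = logits.shape[0] // num_groups
    groups = logits.reshape(num_groups, group_size)

    # Score each group by its best expert, keep only the top groups.
    group_scores = groups.max(axis=1)
    keep = np.argsort(group_scores)[-topk_groups:]

    # Mask out experts in the discarded groups.
    masked = np.full_like(logits, -np.inf)
    for g in keep:
        masked[g * group_size:(g + 1) * group_size] = \
            logits[g * group_size:(g + 1) * group_size]

    # Final top-k over surviving experts, softmax-normalized weights.
    idx = np.argsort(masked)[-topk:][::-1]
    w = np.exp(masked[idx] - masked[idx].max())
    return idx, w / w.sum()
```

A fused custom op (such as the `torch.ops._moe_C.fused_grouped_topk` call the previous XPU path used) performs the same selection in a single kernel instead of this multi-pass tensor arithmetic, which is why falling back to the generic router could regress performance on XPU.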

@bigPYJ1151 bigPYJ1151 enabled auto-merge (squash) March 23, 2026 09:15
@github-actions github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Mar 23, 2026
@bigPYJ1151 bigPYJ1151 merged commit debd6e7 into vllm-project:main Mar 23, 2026
66 checks passed
yzong-rh pushed a commit to yzong-rh/vllm that referenced this pull request Mar 23, 2026
RhizoNymph pushed a commit to RhizoNymph/vllm that referenced this pull request Mar 26, 2026
HenryTangDev pushed a commit to HenryTangMain/vllm that referenced this pull request Mar 27, 2026
SouthWest7 pushed a commit to SouthWest7/vllm that referenced this pull request Mar 27, 2026
khairulkabir1661 pushed a commit to khairulkabir1661/vllm that referenced this pull request Mar 27, 2026
Monishver11 pushed a commit to Monishver11/vllm that referenced this pull request Mar 27, 2026
nithinvc pushed a commit to nithinvc/vllm that referenced this pull request Mar 27, 2026
JiantaoXu pushed a commit to JiantaoXu/vllm that referenced this pull request Mar 28, 2026
vrdn-23 pushed a commit to vrdn-23/vllm that referenced this pull request Mar 30, 2026
EricccYang pushed a commit to EricccYang/vllm that referenced this pull request Apr 1, 2026
liuchenbing2026 pushed a commit to liuchenbing2026/vllm that referenced this pull request Apr 4, 2026
rishitdholakia13 pushed a commit to rishitdholakia13/vllm that referenced this pull request Apr 7, 2026
big-yellow-duck pushed a commit to EmbeddedLLM/vllm that referenced this pull request Apr 8, 2026
mtparet pushed a commit to blackfuel-ai/vllm that referenced this pull request Apr 9, 2026

2 participants