[XPU][MoE Refactor] Refactor xpu mxfp4 support into oracle #37784
bigPYJ1151 merged 2 commits into vllm-project:main
Conversation
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
@zyongye @mgoin @robertgshaw2-redhat PTAL, thanks!
Code Review
This pull request refactors the XPU MXFP4 support for Mixture-of-Experts layers to integrate it into the MoE oracle system. It removes the specialized XpuMxfp4MoEMethod and introduces a new XPUExpertsMXFp4 class that the oracle can select. While this is a good refactoring for modularity, I've identified a potential performance regression: the previous implementation used XPU-specific custom operators for routing, whereas the new implementation appears to fall back to a generic PyTorch-based router. My review includes a comment highlighting this issue and asking for clarification.
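For context, the oracle pattern described above dispatches an experts implementation based on platform and quantization configuration. The sketch below is purely illustrative (MoEConfig, select_experts_impl, and the string return values are hypothetical names, not vLLM's actual oracle API); it only shows the selection shape the refactor enables:

```python
from dataclasses import dataclass


@dataclass
class MoEConfig:
    # Hypothetical config fields; vLLM's real oracle inspects
    # richer platform/quantization state than this.
    platform: str       # e.g. "cuda" or "xpu"
    quant_method: str   # e.g. "mxfp4"


def select_experts_impl(config: MoEConfig) -> str:
    """Return the name of the experts class an oracle-style
    dispatcher would pick for this configuration (illustrative)."""
    if config.platform == "xpu" and config.quant_method == "mxfp4":
        # After this PR, the oracle can select the new XPU class
        # instead of routing through a specialized MoE method.
        return "XPUExpertsMXFp4"
    return "GenericExperts"
```

With this shape, adding a new backend means registering another branch (or entry) in the oracle rather than a parallel MoE method class.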
vllm/model_executor/layers/quantization/mxfp4.py (416-506)
By removing XpuMxfp4MoEMethod, the routing logic for XPU MXFP4 MoE layers is now handled by the generic Router class. The previous implementation used XPU-specific custom ops (torch.ops._moe_C.fused_grouped_topk and torch.ops._moe_C.topk_softmax) for routing, which are likely more performant on XPU hardware.
The new implementation uses a pure PyTorch-based router, which might cause a performance regression, especially for models that use grouped top-k routing. Was this change in routing implementation intentional? If the custom routing ops are still desired, the logic from XpuMxfp4MoEMethod.apply_monolithic might need to be preserved, perhaps by creating a monolithic XPU expert class (FusedMoEExpertsMonolithic) that can be selected by the oracle and can perform the specialized routing.
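To make the performance concern concrete, here is a minimal sketch of what a pure-PyTorch grouped top-k router does per token: score expert groups, keep the best groups, then take the top-k experts within them. The function name and signature are hypothetical (this is not vLLM's Router or the fused_grouped_topk kernel); custom ops fuse these steps into one kernel, which is why falling back to this composition of eager PyTorch ops can regress latency:

```python
import torch


def grouped_topk_sketch(
    scores: torch.Tensor,  # [num_tokens, num_experts] router logits
    topk: int,
    num_groups: int,
    topk_groups: int,
) -> tuple[torch.Tensor, torch.Tensor]:
    """Illustrative pure-PyTorch grouped top-k routing (hypothetical helper,
    not vLLM's implementation). Each step below is a separate kernel launch,
    unlike a fused custom op such as torch.ops._moe_C.fused_grouped_topk."""
    num_tokens, num_experts = scores.shape
    probs = torch.softmax(scores, dim=-1)
    # Score each group by its best expert, then keep the top groups.
    group_scores = probs.view(num_tokens, num_groups, -1).max(dim=-1).values
    group_idx = torch.topk(group_scores, k=topk_groups, dim=-1).indices
    group_mask = torch.zeros_like(group_scores)
    group_mask.scatter_(1, group_idx, 1.0)
    # Mask out experts belonging to unselected groups.
    expert_mask = (
        group_mask.unsqueeze(-1)
        .expand(num_tokens, num_groups, num_experts // num_groups)
        .reshape(num_tokens, num_experts)
    )
    masked = probs.masked_fill(expert_mask == 0, float("-inf"))
    # Finally, top-k experts among the surviving groups.
    topk_weights, topk_ids = torch.topk(masked, k=topk, dim=-1)
    return topk_weights, topk_ids
```

A fused kernel performs the group scoring, masking, and final top-k in a single pass over the logits, so preserving the custom-op path (for example via a monolithic XPU expert class) may matter for grouped top-k models.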
Purpose
Follow-up of #37128: move the XPU MXFP4 support into the oracle as well.
Test Plan
Test Result