[MoE] Unify MoE oracles with class structure by Zijun9 · Pull Request #37776 · vllm-project/vllm

Zijun9 · 2026-03-21T22:26:25Z

Purpose

Resolves #37753.

Introduces MoEKernelOracle(ABC, Generic[BackendT]) as a base class for all MoE kernel selection oracles. Each oracle (FP8, NvFP4, MXFP4, MXFP8, Unquantized) now inherits from this base class, which standardizes four core operations:

select_backend – choose the best kernel backend
convert_to_kernel_format – shuffle weights for a backend
make_quant_config – build a FusedMoEQuantConfig
make_kernel – construct the FusedMoEKernel

Two shared helper methods, backend_to_kernel_cls and map_backend, are declared as abstract methods.
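The structure described above can be sketched roughly as follows. This is a minimal illustration, not the code in the PR: the method signatures, the choice to mark select_backend as abstract, and the ToyBackend, ToyOracle, ReferenceKernel, and FastKernel names are all assumptions made for the example.

```python
from abc import ABC, abstractmethod
from enum import Enum
from typing import Any, Generic, TypeVar

# Each oracle is parameterized by its own backend enum.
BackendT = TypeVar("BackendT", bound=Enum)


class MoEKernelOracle(ABC, Generic[BackendT]):
    """Sketch of the shared base class; all signatures are assumptions."""

    @abstractmethod
    def select_backend(self, *args: Any, **kwargs: Any) -> BackendT:
        """Choose the best kernel backend."""

    @abstractmethod
    def backend_to_kernel_cls(self, backend: BackendT) -> type:
        """Return the kernel class that implements `backend`."""

    @abstractmethod
    def map_backend(self, runner_backend: Any) -> BackendT:
        """Translate an external backend choice into this oracle's enum."""

    # Optional operations: an oracle that delegates leaves these untouched.
    def convert_to_kernel_format(self, *args: Any, **kwargs: Any) -> Any:
        raise NotImplementedError

    def make_quant_config(self, *args: Any, **kwargs: Any) -> Any:
        raise NotImplementedError

    def make_kernel(self, *args: Any, **kwargs: Any) -> Any:
        raise NotImplementedError


class ToyBackend(Enum):  # hypothetical backend enum
    REFERENCE = "reference"
    FAST = "fast"


class ReferenceKernel:  # hypothetical kernel class
    pass


class FastKernel:  # hypothetical kernel class
    pass


class ToyOracle(MoEKernelOracle[ToyBackend]):
    """Hypothetical oracle that implements only the required methods."""

    def select_backend(self, fast_path_available: bool) -> ToyBackend:
        return ToyBackend.FAST if fast_path_available else ToyBackend.REFERENCE

    def backend_to_kernel_cls(self, backend: ToyBackend) -> type:
        table = {ToyBackend.REFERENCE: ReferenceKernel, ToyBackend.FAST: FastKernel}
        return table[backend]

    def map_backend(self, runner_backend: str) -> ToyBackend:
        return ToyBackend(runner_backend)
```

With this layout, a subclass that forgets one of the abstract methods cannot be instantiated, while calling an optional method that the subclass does not override fails at call time with NotImplementedError.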

Key design decisions:

Module-level wrapper functions are preserved for full backward compatibility, so external callers require no changes.
Method signatures intentionally vary across subclasses (different quant types need different weight/scale parameters); this is documented in the base class docstring.
Optional methods (convert_to_kernel_format, make_quant_config, make_kernel) raise NotImplementedError by default, for oracles that delegate (e.g. MXFP8 reuses FP8's kernel logic).
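As a rough illustration of the backward-compatibility decision, a module can keep its old function-style entry point and forward it to a single oracle instance. The sketch below is an assumption about the general pattern, not the PR's code; ExampleOracle, _ORACLE, and the use_fast_path parameter are hypothetical names.

```python
class ExampleOracle:
    """Hypothetical stand-in for one of the oracle subclasses."""

    def select_backend(self, use_fast_path: bool) -> str:
        return "fast" if use_fast_path else "reference"


# One shared instance per module (an assumption for this sketch).
_ORACLE = ExampleOracle()


def select_backend(use_fast_path: bool) -> str:
    """Module-level wrapper kept so existing call sites keep working."""
    return _ORACLE.select_backend(use_fast_path)
```

Callers that import the module-level function see the same behavior as before, while new code can use the class directly.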

Additional fixes:

Fixed class methods in fp8, nvfp4, and mxfp4 that called the module-level wrapper functions instead of self.method().
Fixed map_backend type annotation inconsistency (str → MoEBackend) in mxfp8 and mxfp4.
Fixed potential UnboundLocalError in unquantized.py select_backend (changed if/if chain to if/elif with else fallback).
Fixed missing else branch in unquantized.py make_kernel.
Renamed private _select_kernel_cls to select_kernel_cls in mxfp8.
Exported MoEKernelOracle from oracle/__init__.py.
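The UnboundLocalError fix in the list above follows a common pattern, shown here in simplified form. The function names, flags, and backend strings are hypothetical and do not reflect the actual conditions in unquantized.py.

```python
def select_backend_buggy(use_a: bool, use_b: bool) -> str:
    # Independent `if` statements: when neither flag is set, `backend`
    # is never assigned and the return raises UnboundLocalError.
    if use_a:
        backend = "A"
    if use_b:
        backend = "B"
    return backend


def select_backend_fixed(use_a: bool, use_b: bool) -> str:
    # if/elif/else guarantees exactly one branch assigns `backend`.
    if use_a:
        backend = "A"
    elif use_b:
        backend = "B"
    else:
        backend = "DEFAULT"
    return backend
```

Note that switching from an if/if chain to if/elif also changes which branch wins when several conditions hold: the first matching branch now takes priority instead of the last.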

Test Plan

pytest tests/kernels/moe/test_oracle_class_structure.py -v -s
pytest tests/kernels/moe/test_unquantized_backend_selection.py -v -s

Test Result

tests/kernels/moe/test_oracle_class_structure.py: 20 passed
tests/kernels/moe/test_unquantized_backend_selection.py: 8 passed
Total: 28 passed, 0 failed

gemini-code-assist

Code Review

This pull request introduces a significant and well-executed refactoring to unify the MoE kernel selection oracles under a common MoEKernelOracle base class. This standardizes the API across different quantization types and improves code structure. The changes thoughtfully preserve backward compatibility by keeping module-level wrapper functions. The PR also includes several valuable bug fixes. My review found one minor type inconsistency in a wrapper function in mxfp4.py that was likely missed during the refactoring. Overall, this is a high-quality contribution that improves the MoE infrastructure.

vllm/model_executor/layers/fused_moe/oracle/mxfp4.py

robertgshaw2-redhat · 2026-03-21T23:19:17Z

FYI - please build off "[MoE Refactor] Migrate Unquantized to Full Oracle Flow" (#36286); this has the unquantized stuff properly structured.

robertgshaw2-redhat · 2026-03-21T23:20:20Z

I think we should do a series of PRs

this one (which adds the structure)
follow on (which makes more of the logic generic)

WDYT?

mergify · 2026-03-21T23:21:28Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @Zijun9.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

Zijun9 · 2026-03-22T00:38:24Z

> FYI - please build off "[MoE Refactor] Migrate Unquantized to Full Oracle Flow" (#36286); this has the unquantized stuff properly structured.

Thanks for the reminder! I'll rebase this PR onto #36286. Would you recommend doing that now on this branch, or waiting until it is merged into main? Just want to avoid rebasing multiple times if there are further changes.

Zijun9 · 2026-03-22T01:07:55Z

> I think we should do a series of PRs
>
> this one (which adds the structure)
> follow on (which makes more of the logic generic)
>
> WDYT?

Sounds good. This PR focuses on adding the structure. After #36286 is merged, I'll rebase accordingly. Then I can follow up with a separate PR to make more of the logic generic.

Zijun9 requested review from WoosukKwon, mgoin, pavanimajety, tlrmchlsmth and yewentao256 as code owners March 21, 2026 22:26

gemini-code-assist bot reviewed Mar 21, 2026

Zijun9 added 2 commits March 21, 2026 15:29

[MoE] Unify MoE oracles with class structure

d965a8a

Closes vllm-project#37753 Signed-off-by: Zijun Gao <zijung3@illinois.edu>

[MoE] Refine oracle class structure and fix bugs

e85ebfc

Signed-off-by: Zijun Gao <zijung3@illinois.edu>

Zijun9 force-pushed the feature/unify-moe-oracle-class-structure branch from 17a98ee to e85ebfc March 21, 2026 22:29

Update vllm/model_executor/layers/fused_moe/oracle/mxfp4.py

809e5bf

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: Zijun Gao <zijung3@illinois.edu>

mergify bot added the needs-rebase label Mar 21, 2026

Zijun9 wants to merge 3 commits into vllm-project:main from Zijun9:feature/unify-moe-oracle-class-structure
