[MoE] Unify MoE oracles with class structure #37776

Open
Zijun9 wants to merge 3 commits into vllm-project:main from Zijun9:feature/unify-moe-oracle-class-structure

Zijun9 commented Mar 21, 2026

Purpose

Resolves #37753.

Introduces MoEKernelOracle(ABC, Generic[BackendT]) as a base class for all MoE kernel selection oracles. Each oracle (FP8, NvFP4, MXFP4, MXFP8, Unquantized) now inherits from this base class, standardizing the 4 core operations:

  • select_backend – choose the best kernel backend
  • convert_to_kernel_format – shuffle weights for a backend
  • make_quant_config – build a FusedMoEQuantConfig
  • make_kernel – construct the FusedMoEKernel

Two shared helper methods, backend_to_kernel_cls and map_backend, are also declared on the base class as abstract methods.
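
For readers who want a feel for the shape of this hierarchy, here is a minimal, self-contained sketch. It is not the code in this PR: the signatures are simplified, select_backend is assumed to be abstract, and ToyBackend, ToyKernel, and ToyOracle are hypothetical names invented for illustration.

```python
from abc import ABC, abstractmethod
from enum import Enum
from typing import Any, Generic, TypeVar

# Assumption: each oracle is parameterized by its own backend enum.
BackendT = TypeVar("BackendT", bound=Enum)


class MoEKernelOracle(ABC, Generic[BackendT]):
    """Illustrative base class; real signatures vary per quantization type."""

    @abstractmethod
    def select_backend(self, *args: Any, **kwargs: Any) -> BackendT:
        """Choose the best kernel backend."""

    @abstractmethod
    def backend_to_kernel_cls(self, backend: BackendT) -> type:
        """Return the kernel class that implements a backend."""

    @abstractmethod
    def map_backend(self, name: str) -> BackendT:
        """Translate a user-facing backend name into the backend enum."""

    # Optional operations: oracles that delegate may leave these alone.
    def convert_to_kernel_format(self, *args: Any, **kwargs: Any) -> Any:
        raise NotImplementedError(type(self).__name__)

    def make_quant_config(self, *args: Any, **kwargs: Any) -> Any:
        raise NotImplementedError(type(self).__name__)

    def make_kernel(self, *args: Any, **kwargs: Any) -> Any:
        raise NotImplementedError(type(self).__name__)


class ToyBackend(Enum):
    """Hypothetical backend enum, not one of the real vLLM enums."""

    TRITON = "triton"
    CUTLASS = "cutlass"


class ToyKernel:
    """Hypothetical placeholder for a kernel implementation."""


class ToyOracle(MoEKernelOracle[ToyBackend]):
    """Hypothetical oracle that implements only the abstract methods."""

    def select_backend(self, prefer_cutlass: bool = False) -> ToyBackend:
        return ToyBackend.CUTLASS if prefer_cutlass else ToyBackend.TRITON

    def backend_to_kernel_cls(self, backend: ToyBackend) -> type:
        return ToyKernel

    def map_backend(self, name: str) -> ToyBackend:
        # Enum lookup by value raises ValueError for unknown names.
        return ToyBackend(name)
```

Because the three abstract methods are enforced by ABC, instantiating MoEKernelOracle directly raises TypeError, while a subclass such as ToyOracle can skip the optional operations and let callers hit NotImplementedError.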

Key design decisions:

  • Module-level wrapper functions are preserved for full backward compatibility — zero changes required from external callers.
  • Method signatures intentionally vary across subclasses (different quant types need different weight/scale parameters); this is documented in the base class docstring.
  • Optional methods (convert_to_kernel_format, make_quant_config, make_kernel) raise NotImplementedError by default, which suits oracles that delegate (e.g. MXFP8 reuses FP8's kernel logic).
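
The backward-compatibility point can be sketched roughly as follows. This is an illustration under assumptions, not the PR's code: _SketchOracle, _ORACLE, and the use_fast_path parameter are hypothetical, and the real wrappers take the parameters of their respective quant types.

```python
class _SketchOracle:
    """Hypothetical stand-in for one oracle subclass."""

    def select_backend(self, use_fast_path: bool) -> str:
        return "fast" if use_fast_path else "reference"

    def make_kernel(self, use_fast_path: bool) -> str:
        # Inside the class, call self.select_backend rather than the
        # module-level wrapper, so subclass overrides are respected.
        backend = self.select_backend(use_fast_path)
        return f"kernel[{backend}]"


# Assumption: one shared module-level instance backs the wrappers.
_ORACLE = _SketchOracle()


def select_backend(use_fast_path: bool) -> str:
    """Module-level wrapper kept so existing call sites keep working."""
    return _ORACLE.select_backend(use_fast_path)


def make_kernel(use_fast_path: bool) -> str:
    """Module-level wrapper that forwards to the oracle instance."""
    return _ORACLE.make_kernel(use_fast_path)
```

External callers keep importing and calling the module-level functions, while the logic itself lives on the class.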

Additional fixes:

  • Fixed class methods calling module-level wrapper functions instead of self.method() in fp8, nvfp4, mxfp4.
  • Fixed map_backend type annotation inconsistency (str vs. MoEBackend) in mxfp8 and mxfp4.
  • Fixed potential UnboundLocalError in unquantized.py select_backend (changed if/if chain to if/elif with else fallback).
  • Fixed missing else branch in unquantized.py make_kernel.
  • Renamed private _select_kernel_cls to select_kernel_cls in mxfp8.
  • Exported MoEKernelOracle from oracle/__init__.py.
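
The UnboundLocalError fix follows a common pattern, sketched below with hypothetical flags and backend names (the actual conditions in unquantized.py differ).

```python
def select_backend_buggy(use_a: bool, use_b: bool) -> str:
    # Independent ifs: when neither flag is set, `backend` is never
    # assigned and the return statement raises UnboundLocalError.
    if use_a:
        backend = "A"
    if use_b:
        backend = "B"
    return backend


def select_backend_fixed(use_a: bool, use_b: bool) -> str:
    # if/elif/else: exactly one branch runs, so `backend` is always bound.
    # Note that when both flags are set, the first match now wins.
    if use_a:
        backend = "A"
    elif use_b:
        backend = "B"
    else:
        backend = "DEFAULT"  # hypothetical fallback name
    return backend
```

Calling select_backend_buggy(False, False) raises UnboundLocalError, whereas select_backend_fixed(False, False) returns the fallback.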

Test Plan

pytest tests/kernels/moe/test_oracle_class_structure.py -v -s
pytest tests/kernels/moe/test_unquantized_backend_selection.py -v -s

Test Result

tests/kernels/moe/test_oracle_class_structure.py: 20 passed
tests/kernels/moe/test_unquantized_backend_selection.py: 8 passed
Total: 28 passed, 0 failed

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a significant and well-executed refactoring to unify the MoE kernel selection oracles under a common MoEKernelOracle base class. This standardizes the API across different quantization types and improves code structure. The changes thoughtfully preserve backward compatibility by keeping module-level wrapper functions. The PR also includes several valuable bug fixes. My review found one minor type inconsistency in a wrapper function in mxfp4.py that was likely missed during the refactoring. Overall, this is a high-quality contribution that improves the MoE infrastructure.

Zijun9 added 2 commits March 21, 2026 15:29
Closes vllm-project#37753

Signed-off-by: Zijun Gao <zijung3@illinois.edu>
Signed-off-by: Zijun Gao <zijung3@illinois.edu>
Zijun9 force-pushed the feature/unify-moe-oracle-class-structure branch from 17a98ee to e85ebfc on March 21, 2026 22:29
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: Zijun Gao <zijung3@illinois.edu>
@robertgshaw2-redhat
Collaborator

FYI - please build off

this has the unquantized stuff properly structured

@robertgshaw2-redhat
Collaborator

I think we should do a series of PRs

  • this one (which adds the structure)
  • follow on (which makes more of the logic generic)

WDYT?

@mergify

mergify bot commented Mar 21, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @Zijun9.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Mar 21, 2026
@Zijun9
Author

Zijun9 commented Mar 22, 2026

> FYI - please build off
>
> this has the unquantized stuff properly structured

Thanks for the reminder! I'll rebase this PR onto #36286. Would you recommend doing that now on this branch, or waiting until it is merged into main? Just want to avoid rebasing multiple times if there are further changes.

@Zijun9
Author

Zijun9 commented Mar 22, 2026

> I think we should do a series of PRs
>
>   • this one (which adds the structure)
>   • follow on (which makes more of the logic generic)
>
> WDYT?

Sounds good. This PR focuses on adding the structure. After #36286 is merged, I'll rebase accordingly. Then I can follow up with a separate PR to make more of the logic generic.

Development

Successfully merging this pull request may close these issues.

[Feature]: Unify MoE "Oracles" with Class Structure
