[MoE] Unify MoE oracles with class structure #37776
Zijun9 wants to merge 3 commits into vllm-project:main
Code Review
This pull request introduces a significant and well-executed refactoring to unify the MoE kernel selection oracles under a common MoEKernelOracle base class. This standardizes the API across different quantization types and improves code structure. The changes thoughtfully preserve backward compatibility by keeping module-level wrapper functions. The PR also includes several valuable bug fixes. My review found one minor type inconsistency in a wrapper function in mxfp4.py that was likely missed during the refactoring. Overall, this is a high-quality contribution that improves the MoE infrastructure.
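The pattern the review describes, a shared generic base class plus module-level wrappers kept for backward compatibility, might look roughly like the sketch below. It is not the vLLM code: `ToyBackend`, `ToyOracle`, the two kernel classes, `select_toy_backend`, and every method signature are hypothetical and exist only for illustration. Only the class name `MoEKernelOracle(ABC, Generic[BackendT])` and the method names come from the PR description, and treating `select_backend` as abstract is an assumption.

```python
from abc import ABC, abstractmethod
from enum import Enum
from typing import Any, Generic, List, TypeVar

BackendT = TypeVar("BackendT", bound=Enum)


class MoEKernelOracle(ABC, Generic[BackendT]):
    """Illustrative base class; signatures are assumptions, not vLLM's."""

    @abstractmethod
    def select_backend(self, available: List[BackendT]) -> BackendT:
        """Choose the best kernel backend among the available ones."""

    @abstractmethod
    def backend_to_kernel_cls(self, backend: BackendT) -> type:
        """Map a backend to the class implementing its kernel."""

    @abstractmethod
    def map_backend(self, name: str) -> BackendT:
        """Translate a user-facing backend name into an enum member."""

    # Non-abstract defaults: an oracle that delegates to another oracle
    # can simply leave these unimplemented.
    def convert_to_kernel_format(self, backend: BackendT, weights: Any) -> Any:
        raise NotImplementedError(f"{type(self).__name__} delegates weight conversion")

    def make_quant_config(self, backend: BackendT) -> Any:
        raise NotImplementedError(f"{type(self).__name__} delegates quant config")

    def make_kernel(self, backend: BackendT) -> Any:
        raise NotImplementedError(f"{type(self).__name__} delegates kernel creation")


class ToyBackend(Enum):  # hypothetical backend enum
    FAST = "fast"
    REFERENCE = "reference"


class FastKernel:  # hypothetical kernel classes
    pass


class ReferenceKernel:
    pass


class ToyOracle(MoEKernelOracle[ToyBackend]):
    """Hypothetical oracle that only overrides the abstract methods."""

    def select_backend(self, available: List[ToyBackend]) -> ToyBackend:
        if not available:
            raise ValueError("no backend available")
        # Prefer the fast backend; otherwise take the first one offered.
        return ToyBackend.FAST if ToyBackend.FAST in available else available[0]

    def backend_to_kernel_cls(self, backend: ToyBackend) -> type:
        return {ToyBackend.FAST: FastKernel, ToyBackend.REFERENCE: ReferenceKernel}[backend]

    def map_backend(self, name: str) -> ToyBackend:
        return ToyBackend(name)


# Module-level wrapper kept so existing call sites keep working.
_ORACLE = ToyOracle()


def select_toy_backend(available: List[ToyBackend]) -> ToyBackend:
    return _ORACLE.select_backend(available)
```

Existing callers can keep using `select_toy_backend(...)` while new code calls the oracle directly. Calling `make_kernel` on `ToyOracle` raises `NotImplementedError`, mirroring the delegation default the PR describes for MXFP8.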
Closes vllm-project#37753 Signed-off-by: Zijun Gao <zijung3@illinois.edu>
Signed-off-by: Zijun Gao <zijung3@illinois.edu>
17a98ee to e85ebfc
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: Zijun Gao <zijung3@illinois.edu>
FYI - please build off this; it has the unquantized stuff properly structured.
I think we should do a series of PRs
WDYT?
This pull request has merge conflicts that must be resolved before it can be merged.
Thanks for the reminder! I'll rebase this PR onto #36286. Would you recommend doing that now on this branch, or waiting until it is merged into main? Just want to avoid rebasing multiple times if there are further changes.
Sounds good. This PR focuses on adding the structure. After #36286 is merged, I'll rebase accordingly. Then I can follow up with a separate PR to make more of the logic generic.
Purpose
Resolves #37753.
Introduces MoEKernelOracle(ABC, Generic[BackendT]) as a base class for all MoE kernel selection oracles. Each oracle (FP8, NvFP4, MXFP4, MXFP8, Unquantized) now inherits from this base class, standardizing the 4 core operations:
- select_backend – choose the best kernel backend
- convert_to_kernel_format – shuffle weights for a backend
- make_quant_config – build a FusedMoEQuantConfig
- make_kernel – construct the FusedMoEKernel
Plus 2 shared helper methods (backend_to_kernel_cls, map_backend) as abstract methods.
Key design decisions:
- convert_to_kernel_format, make_quant_config, and make_kernel default to NotImplementedError for oracles that delegate (e.g. MXFP8 reuses FP8's kernel logic).
Additional fixes:
- self.method() in fp8, nvfp4, mxfp4.
- map_backend type annotation inconsistency (str → MoEBackend) in mxfp8 and mxfp4.
- UnboundLocalError in unquantized.py select_backend (changed if/if chain to if/elif with else fallback).
- else branch in unquantized.py make_kernel.
- _select_kernel_cls to select_kernel_cls in mxfp8.
- MoEKernelOracle from oracle/__init__.py.
Test Plan
Test Result