
[MoE Refactor] FusedMoE/MoERunner inversion refactor#41184

Open
bnellnm wants to merge 228 commits into vllm-project:main from neuralmagic:layer-refactor

Conversation

@bnellnm
Collaborator

@bnellnm bnellnm commented Apr 29, 2026

Purpose

Invert the MoERunner <-> FusedMoE relationship

The MoERunner will own the FusedMoE layer, which is renamed to RoutedExperts.
The FusedMoE class itself will go away.
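A minimal sketch of the inverted ownership, written in plain Python so it runs without torch. It assumes the runner is registered under the block's existing `experts` attribute, which is where the extra `routed_experts` level in parameter paths comes from. The `Module`/`Param` stand-ins, `SparseMoeBlock`, and the `w13_weight`/`w2_weight` attribute names are illustrative assumptions, not vLLM's actual classes.

```python
# Plain-Python stand-ins for nn.Module/parameters (hypothetical, illustration only).


class Param:
    """Stand-in for a tensor parameter; only its attribute name matters here."""


class Module:
    """Tiny nn.Module stand-in: children and params are plain attributes."""

    def named_parameters(self, prefix=""):
        # Walk attributes in insertion order, recursing into child modules.
        for name, value in vars(self).items():
            full = f"{prefix}.{name}" if prefix else name
            if isinstance(value, Param):
                yield full
            elif isinstance(value, Module):
                yield from value.named_parameters(full)


class RoutedExperts(Module):
    """Formerly FusedMoE: holds the stacked expert weights."""

    def __init__(self):
        self.w13_weight = Param()
        self.w2_weight = Param()


class MoERunner(Module):
    """After the inversion, the runner owns the routed experts as a child."""

    def __init__(self):
        self.routed_experts = RoutedExperts()


class SparseMoeBlock(Module):
    def __init__(self):
        # The runner sits where FusedMoE used to be ("experts"), so every
        # expert weight path gains one extra level.
        self.experts = MoERunner()


names = list(SparseMoeBlock().named_parameters("mlp"))
print(names)
# ['mlp.experts.routed_experts.w13_weight', 'mlp.experts.routed_experts.w2_weight']
```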

Some model weight-loading code needed updating, since the paths for MoE weights now have an extra level, e.g. .experts.<foo> is now .experts.routed_experts.<foo>.
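For model code that builds parameter names by hand, the rename can be pictured as a string rewrite. This is a hypothetical helper (`remap_moe_weight_name` is not a vLLM function), and it assumes every parameter that previously lived directly under `.experts.` now lives under `.experts.routed_experts.`.

```python
import re

# Hypothetical helper: insert the extra "routed_experts" level into
# pre-refactor parameter names. The negative lookahead skips names that
# already contain the new level, so the rewrite is idempotent.
_EXPERTS = re.compile(r"\.experts\.(?!routed_experts\.)")


def remap_moe_weight_name(name: str) -> str:
    """Map '.experts.<foo>' to '.experts.routed_experts.<foo>'."""
    return _EXPERTS.sub(".experts.routed_experts.", name)


old = "model.layers.0.mlp.experts.w13_weight"
print(remap_moe_weight_name(old))
# model.layers.0.mlp.experts.routed_experts.w13_weight
```

Because the pattern requires a `.` immediately before `experts`, names such as `mlp.shared_experts.*` are left untouched, and applying the helper to an already-updated name is harmless.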

Based on the following PRs:
#41997 - Move capture state out of FusedMoE

cc @yzong-rh

Test Plan

CI + MoE refactoring tests
Run all MoE layer tests (including the SP tests from #41299)
Run tests from #39956
Ran model loading tests for the following models:

arcee-ai/Trinity-Nano-Preview
rhymes-ai/Aria
skt/A.X-K1
inclusionAI/Ling-lite-1.5
inclusionAI/Ring-2.5-1T
deepseek-ai/DeepSeek-V2-Lite-Chat
rednote-hilab/dots.llm1.inst
baidu/ERNIE-4.5-21B-A3B-PT
baidu/ERNIE-4.5-VL-28B-A3B-PT
LGAI-EXAONE/K-EXAONE-236B-A23B
allenai/Flex-reddit-2x7B-1T
google/gemma-4-E2B-it
zai-org/GLM-4.5-FP8
ibm/PowerMoE-3b
xai-org/grok-2
tencent/Hunyuan-A13B-Instruct
tencent/Hy3-preview
ai21labs/Jamba-tiny-dev
moonshotai/Kimi-Linear-48B-A3B-Instruct
poolside/Laguna-XS.2
LiquidAI/LFM2-8B-A1B
meituan-longcat/LongCat-Flash-Chat-FP8
XiaomiMiMo/MiMo-V2.5-Pro
MiniMaxAI/MiniMax-M2
TitanML/tiny-mixtral
allenai/OLMoE-1B-7B-0924-Instruct
bharatgenai/Param2-17B-A2.4B-Thinking
microsoft/Phi-3.5-MoE-instruct
Qwen/Qwen1.5-MoE-A2.7B-Chat
Qwen/Qwen3-30B-A3B
Qwen/Qwen3-Next-80B-A3B-Instruct
sarvamai/sarvam-30b
stepfun-ai/Step-3.5-Flash
MiniMaxAI/MiniMax-Text-01

Test Result

Waiting for CI results
All model loading tests passed except for the following, which I was unable to verify (due to OOM or other issues):

skt/A.X-K1
inclusionAI/Ring-2.5-1T
rednote-hilab/dots.llm1.inst
LGAI-EXAONE/K-EXAONE-236B-A23B
xai-org/grok-2
tencent/Hy3-preview
MiniMaxAI/MiniMax-Text-01

bnellnm added 30 commits March 18, 2026 16:48
Signed-off-by: Bill Nell <bnell@redhat.com>
@hmellor
Member

hmellor commented May 21, 2026

Sorry, I'll clarify.

By elevate I meant move it up one level in the model structure.

So that MoERunner is used for the sparse_moe_block (often named mlp) instead of the experts in the model structure.

Currently, MoERunner is the class used for experts and then it tries to own all the siblings of experts.

It would be better if the MoERunner was used for the sparse_moe_block and it could then own the routed experts, shared experts and router as its children.
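To make the proposal concrete, here is a toy plain-Python sketch, assuming the runner replaces the whole sparse_moe_block and holds the router (named `gate` here), routed experts and shared experts as attributes. Apart from `MoERunner`, the class names, the renormalized top-k routing and the scalar "experts" are illustrative assumptions, not vLLM code.

```python
import math


class Router:
    """Toy router: one weight row per expert, renormalized top-k softmax."""

    def __init__(self, weight):
        self.weight = weight

    def __call__(self, x, top_k):
        logits = [sum(w * v for w, v in zip(row, x)) for row in self.weight]
        top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
        # Softmax over the selected experts only.
        m = max(logits[i] for i in top)
        exps = {i: math.exp(logits[i] - m) for i in top}
        total = sum(exps.values())
        return {i: e / total for i, e in exps.items()}


class RoutedExperts:
    """Toy routed experts: each expert just scales its input."""

    def __init__(self, scales):
        self.scales = scales

    def __call__(self, x, routing):
        out = [0.0] * len(x)
        for i, weight in routing.items():
            for j, v in enumerate(x):
                out[j] += weight * self.scales[i] * v
        return out


class SharedExperts:
    """Toy shared expert applied to every token."""

    def __init__(self, scale):
        self.scale = scale

    def __call__(self, x):
        return [self.scale * v for v in x]


class MoERunner:
    """Proposed: the runner is the whole sparse MoE block and owns all parts."""

    def __init__(self, gate, routed_experts, shared_experts, top_k):
        self.gate = gate
        self.routed_experts = routed_experts
        self.shared_experts = shared_experts
        self.top_k = top_k

    def __call__(self, x):
        routing = self.gate(x, self.top_k)
        routed = self.routed_experts(x, routing)
        shared = self.shared_experts(x)
        return [r + s for r, s in zip(routed, shared)]


mlp = MoERunner(
    gate=Router([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]),
    routed_experts=RoutedExperts([2.0, 3.0, 4.0]),
    shared_experts=SharedExperts(1.0),
    top_k=2,
)
print(mlp([1.0, 1.0]))  # [3.5, 3.5]
```

With the router and shared experts as children of the runner, their weights would naturally sit under the block (e.g. `mlp.gate.*`, `mlp.shared_experts.*`) instead of being siblings the runner has to reach across to.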

@bnellnm
Collaborator Author

bnellnm commented May 21, 2026

> Sorry, I'll clarify.
>
> By elevate I meant move it up one level in the model structure.
>
> So that MoERunner is used for the sparse_moe_block (often named mlp) instead of the experts in the model structure.
>
> Currently, MoERunner is the class used for experts and then it tries to own all the siblings of experts.
>
> It would be better if the MoERunner was used for the sparse_moe_block and it could then own the routed experts, shared experts and router as its children.

OK, I'm not opposed to that, but I think it would be too much for this PR.

@hmellor
Member

hmellor commented May 21, 2026

OK, happy for it to be a follow-up. It would simplify model loading and the Transformers modelling backend if we did this.

@mergify
Contributor

mergify Bot commented May 23, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @bnellnm.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify Bot added the needs-rebase label May 23, 2026
Signed-off-by: Bill Nell <bnell@redhat.com>
@bnellnm bnellnm requested a review from AndreasKaratzas as a code owner May 28, 2026 19:36
@mergify mergify Bot removed the needs-rebase label May 28, 2026
@AndreasKaratzas
Member

cc @divakar-amd, can you review this PR too?

Signed-off-by: Bill Nell <bnell@redhat.com>
Comment thread vllm/models/deepseek_v4/quant_config.py
@mergify
Contributor

mergify Bot commented May 29, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @bnellnm.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify Bot added the needs-rebase label May 29, 2026
bnellnm added 4 commits May 29, 2026 01:43
Signed-off-by: Bill Nell <bnell@redhat.com>
@mergify mergify Bot removed the needs-rebase label May 29, 2026
Comment thread vllm/model_executor/layers/fused_moe/runner/moe_runner.py Outdated
Signed-off-by: Bill Nell <bnell@redhat.com>
@mergify
Contributor

mergify Bot commented May 29, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @bnellnm.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify Bot added the needs-rebase label May 29, 2026
Signed-off-by: Bill Nell <bnell@redhat.com>
@mergify mergify Bot removed the needs-rebase label May 29, 2026
bnellnm added 2 commits May 29, 2026 19:35
Signed-off-by: Bill Nell <bnell@redhat.com>
@mergify mergify Bot added the performance Performance-related issues label May 29, 2026
bnellnm added 2 commits May 30, 2026 00:33
Signed-off-by: Bill Nell <bnell@redhat.com>

Labels

deepseek: Related to DeepSeek models
documentation: Improvements or additions to documentation
gpt-oss: Related to GPT-OSS models
nvidia
performance: Performance-related issues
qwen: Related to Qwen models
v1

Projects

Status: No status
Status: To Triage


5 participants