[MoE][Refactor] Remove most arguments to FusedMoEMethodBase.apply #29066

vllm-bot merged 24 commits into vllm-project:main
Conversation
This pull request has merge conflicts that must be resolved before it can be merged.
Looks like we need to land #29067 first? Marking draft until that lands 👍
Yeah, sorry, this should have been a draft to start with. It needs the other PR to land first.
Hi @bnellnm, the pre-commit checks have failed. Please run:

    uv pip install pre-commit
    pre-commit install
    pre-commit run --all-files

Then, commit the changes and push to your branch.
mgoin
left a comment
Great work Bill! Thinking now on the state, the main downside I see of this move is that, since we aren't passing arguments into the function, it is less explicit for each apply method to assert against unsupported modes as we add more. Hopefully we can move to a system like the attention backends for each kernel backend.
Purpose
After making `select_experts` a non-static method (#29067), we can avoid passing most of the arguments to `FusedMoEMethodBase.apply` and get them from the layer directly. This doesn't really decrease flexibility: the MoE quantization methods were already fairly tightly coupled with `FusedMoE`. A further step could be making `FusedMoE` an abstract base class that simply provides the `quant_method` parameters; the current `FusedMoE` class would then become a subclass of that.

Test Plan
CI
Test Result
The Blackwell failure is a known bug fixed by #30336.
cc @varun-sundar-rabindranath
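To make the shape of the change concrete, here is a minimal, hypothetical sketch of the idea described in Purpose: routing parameters that used to be threaded through `apply` as arguments are instead read off the layer object. The class and method names echo the PR's vocabulary, but the signatures and the toy `FusedMoELayer` are illustrative assumptions, not vLLM's real API.

```python
import torch

class FusedMoELayer:
    """Toy stand-in for a FusedMoE layer that owns its routing config."""
    def __init__(self, top_k: int, renormalize: bool):
        self.top_k = top_k
        self.renormalize = renormalize

class MoEMethod:
    def select_experts(self, router_logits, top_k, renormalize):
        # Standard top-k softmax routing.
        weights = torch.softmax(router_logits, dim=-1)
        topk_weights, topk_ids = torch.topk(weights, top_k, dim=-1)
        if renormalize:
            topk_weights = topk_weights / topk_weights.sum(dim=-1, keepdim=True)
        return topk_weights, topk_ids

    # Before (conceptually): routing arguments passed explicitly to apply().
    def apply_old(self, layer, router_logits, top_k, renormalize):
        return self.select_experts(router_logits, top_k, renormalize)

    # After: apply() pulls top_k / renormalize from the layer directly.
    def apply(self, layer: FusedMoELayer, router_logits: torch.Tensor):
        return self.select_experts(router_logits, layer.top_k, layer.renormalize)

layer = FusedMoELayer(top_k=2, renormalize=True)
method = MoEMethod()
logits = torch.randn(4, 8)  # 4 tokens, 8 experts
w_old, ids_old = method.apply_old(layer, logits, layer.top_k, layer.renormalize)
w_new, ids_new = method.apply(layer, logits)
# Both paths produce identical routing decisions.
assert torch.equal(ids_old, ids_new) and torch.allclose(w_old, w_new)
```

The coupling mgoin notes above is visible here: the slimmer `apply` only works because the layer is guaranteed to carry the routing config, which is exactly the tight `FusedMoE` coupling the PR description acknowledges.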