
[MoE][Refactor] Remove most arguments to FusedMoEMethodBase.apply#29066

Merged
vllm-bot merged 24 commits into vllm-project:main from neuralmagic:simplify-apply
Dec 9, 2025

Conversation

@bnellnm
Collaborator

@bnellnm bnellnm commented Nov 20, 2025

Purpose

After making select_experts a non-static method (#29067), we can avoid passing most of the arguments to FusedMoEMethodBase.apply and instead get them from the layer directly.

This doesn't really decrease flexibility; the MoE quantization methods were already fairly tightly coupled with FusedMoE. A further step could be making FusedMoE an abstract base class that simply provides the quant_method parameters; the current FusedMoE class would then become a subclass of that.
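The refactor described above can be sketched roughly as follows. This is an illustrative toy, not vLLM's actual code: the attribute and parameter names (`top_k`, `renormalize`, `_route`) are hypothetical stand-ins, and only `FusedMoEMethodBase.apply` and the layer-owns-its-config idea come from the PR description.

```python
# Sketch of the pattern: routing config lives on the layer object, so
# apply() no longer needs it threaded through every call site.
# All names besides FusedMoEMethodBase.apply are hypothetical.
import math


class FusedMoELayer:
    """Stand-in for a FusedMoE layer that owns its routing config."""

    def __init__(self, top_k: int, renormalize: bool):
        self.top_k = top_k
        self.renormalize = renormalize


class FusedMoEMethodBase:
    # Before the refactor: every call site passes the config explicitly.
    def apply_old(self, layer, router_logits, top_k, renormalize):
        return self._route(router_logits, top_k, renormalize)

    # After the refactor: the method reads the config from the layer,
    # shrinking the signature.
    def apply(self, layer, router_logits):
        return self._route(router_logits, layer.top_k, layer.renormalize)

    def _route(self, logits, top_k, renormalize):
        # Softmax over expert logits.
        exps = [math.exp(v) for v in logits]
        total = sum(exps)
        probs = [e / total for e in exps]
        # Pick the top-k experts by probability.
        ranked = sorted(range(len(probs)), key=lambda i: -probs[i])[:top_k]
        weights = [probs[i] for i in ranked]
        if renormalize:
            s = sum(weights)
            weights = [w / s for w in weights]
        return ranked, weights


layer = FusedMoELayer(top_k=2, renormalize=True)
method = FusedMoEMethodBase()
ids, weights = method.apply(layer, [0.1, 2.0, 1.0, -1.0])
```

Both signatures compute the same result; the point is that the slimmer `apply` stays stable as more per-layer knobs are added, at the cost of coupling the method to the layer's attributes.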

Test Plan

CI

Test Result

The Blackwell failure is a known bug, fixed by #30336.

cc @varun-sundar-rabindranath


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting a before/after results comparison, or e2e results.
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

@mergify

mergify bot commented Nov 20, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @bnellnm.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Nov 20, 2025
@bnellnm bnellnm changed the title from Simplify apply to Remove most arguments to FusedMoEMethodBase.apply Nov 20, 2025
@bnellnm bnellnm marked this pull request as ready for review November 20, 2025 16:22

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

@LucasWilkinson
Collaborator

Looks like we need to land #29067 first? Marking as draft until that lands 👍

@LucasWilkinson LucasWilkinson marked this pull request as draft November 21, 2025 19:42
@mergify mergify bot added the nvidia label Nov 21, 2025
@mergify mergify bot removed the needs-rebase label Nov 21, 2025
@bnellnm
Collaborator Author

bnellnm commented Nov 21, 2025

> draft

Yeah, sorry, this should have been a draft to start with. It needs the other PR to land first.

@bnellnm bnellnm changed the title Remove most arguments to FusedMoEMethodBase.apply [MoE][Refactor] Remove most arguments to FusedMoEMethodBase.apply Nov 21, 2025
@bnellnm bnellnm marked this pull request as ready for review November 24, 2025 19:38
@mergify

mergify bot commented Dec 1, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @bnellnm.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Dec 1, 2025
Signed-off-by: Bill Nell <bnell@redhat.com>
Signed-off-by: Bill Nell <bnell@redhat.com>
Signed-off-by: Bill Nell <bnell@redhat.com>
@mergify

mergify bot commented Dec 9, 2025

Hi @bnellnm, the pre-commit checks have failed. Please run:

uv pip install pre-commit
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy or markdownlint failing?
mypy and markdownlint are run differently in CI. If the failure is related to either of these checks, please use the following commands to run them locally:
# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10
# For markdownlint
pre-commit run --hook-stage manual markdownlint

Signed-off-by: Bill Nell <bnell@redhat.com>
@mergify

mergify bot commented Dec 9, 2025

Hi @bnellnm, the pre-commit checks have failed. Please run:

uv pip install pre-commit
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy or markdownlint failing?
mypy and markdownlint are run differently in CI. If the failure is related to either of these checks, please use the following commands to run them locally:
# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10
# For markdownlint
pre-commit run --hook-stage manual markdownlint

@mgoin mgoin added the moe label Dec 9, 2025
Member

@mgoin mgoin left a comment


Great work Bill! Thinking about the state now, the main downside I see of this move is that, since we aren't passing arguments into the function, it is less explicit for each apply method to assert against unsupported modes as we add more. Hopefully we can move to a system like the attention backends for each kernel backend.
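The concern in the comment above can be made concrete with a small hedged sketch. Everything here is hypothetical (the `Int8MoEMethod` class, the `activation` attribute, the supported set); it only illustrates the shape of the trade-off, not vLLM's real quantization methods.

```python
# Sketch of the reviewer's concern: with arguments read from the layer,
# unsupported-mode checks move inside apply() instead of being visible
# at the call site. All names here are hypothetical.
from types import SimpleNamespace


class Int8MoEMethod:
    """Hypothetical quantized MoE method illustrating in-method guards."""

    SUPPORTED_ACTIVATIONS = {"silu"}

    def apply(self, layer, router_logits):
        # The guard now lives inside apply(), reading layer attributes;
        # nothing in the signature advertises what is supported.
        if layer.activation not in self.SUPPORTED_ACTIVATIONS:
            raise NotImplementedError(
                f"activation {layer.activation!r} is not supported"
            )
        return "ok"


method = Int8MoEMethod()
try:
    method.apply(SimpleNamespace(activation="gelu"), router_logits=[0.0])
    raised = False
except NotImplementedError:
    raised = True
```

Each method must remember to add such guards itself; a per-backend capability system (as suggested for attention backends) would centralize these checks instead.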

@vllm-bot vllm-bot merged commit 00e5cbb into vllm-project:main Dec 9, 2025
132 of 136 checks passed
@github-project-automation github-project-automation bot moved this from In review to Done in NVIDIA Dec 9, 2025
@bnellnm bnellnm deleted the simplify-apply branch December 9, 2025 21:59
iboiko-habana added a commit to vllm-project/vllm-gaudi that referenced this pull request Dec 12, 2025
Culprit commit: vllm-project/vllm#29066

---------

Signed-off-by: Dobrzyniewicz, Agata <agata.dobrzyniewicz@intel.com>
Signed-off-by: Agata Dobrzyniewicz <160237065+adobrzyn@users.noreply.github.com>
Signed-off-by: Iryna Boiko <iboiko@habana.ai>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Iryna Boiko <iboiko@habana.ai>
wangxiyuan pushed a commit to vllm-project/vllm-ascend that referenced this pull request Dec 15, 2025
### What this PR does / why we need it?

### Does this PR introduce _any_ user-facing change?
1. fix vllm-project/vllm#27938
2. fix vllm-project/vllm#27145
pooling models now support chunked prefill and prefix caching,
3. fix vllm-project/vllm#30181
define the CPU fields in the field config where they really belong.
4. fix vllm-project/vllm#28168
define the CPU fields in the field config where they really belong.
5. fix vllm-project/vllm#30201
some module renames
6. fix vllm-project/vllm#29067
FusedMoE module refactor
7. fix vllm-project/vllm#29066
FusedMoE module refactor
8. fix vllm-project/vllm#29624
### How was this patch tested?

- vLLM version: v0.12.0
- vLLM main:
vllm-project/vllm@ad32e3e

---------

Signed-off-by: wangli <wangli858794774@gmail.com>
chenaoxuan pushed a commit to chenaoxuan/vllm-ascend that referenced this pull request Dec 20, 2025
dsuhinin pushed a commit to dsuhinin/vllm that referenced this pull request Jan 21, 2026
…lm-project#29066)

Signed-off-by: Bill Nell <bnell@redhat.com>
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
Signed-off-by: dsuhinin <suhinin.dmitriy@gmail.com>
ZRJ026 pushed a commit to ZRJ026/vllm-ascend that referenced this pull request Feb 28, 2026
ZRJ026 pushed a commit to ZRJ026/vllm-ascend that referenced this pull request Mar 4, 2026

Labels

nvidia · ready (ONLY add when PR is ready to merge / full CI is needed) · ready-run-all-tests (trigger CI with all tests for wide-ranging PRs)

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

6 participants