
[Feature]: Remove Chunking From FusedMoE#34086

Merged
ProExpertProg merged 16 commits into vllm-project:main from SouthWest7:feat/remove-chunk
Mar 12, 2026

Conversation

@SouthWest7
Contributor

@SouthWest7 SouthWest7 commented Feb 8, 2026

Purpose

Remove the kernel-level chunking mechanism from FusedMoE.

Resolves #30620
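As a rough illustration of what "kernel-level chunking" means here, the MoE layer previously split large inputs into fixed-size chunks before invoking the expert kernels; after this change it makes a single invocation and relies on the scheduler's chunked prefill to bound the batch size. The sketch below is purely illustrative: `CHUNK_SIZE`, `run_experts`, and both `forward_*` functions are hypothetical names, not vLLM's actual identifiers.

```python
CHUNK_SIZE = 4  # illustrative; the real chunk size was on the order of tens of thousands of tokens

def run_experts(tokens):
    # stand-in for the fused expert computation
    return [t * 2.0 for t in tokens]

def forward_chunked(tokens):
    """Old behavior: the MoE layer chunks the input itself."""
    out = []
    for start in range(0, len(tokens), CHUNK_SIZE):
        out.extend(run_experts(tokens[start:start + CHUNK_SIZE]))
    return out

def forward_unchunked(tokens):
    """New behavior: one invocation; the scheduler's chunked prefill
    is expected to keep the batch size within kernel limits upstream."""
    return run_experts(tokens)
```

Both paths compute the same result; removing the inner loop simply deletes redundant bookkeeping once batch sizes are bounded elsewhere.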

Test Plan

pytest tests/kernels/moe/test_moe.py

Test Result

============================================== test session starts ===============================================
platform linux -- Python 3.12.4, pytest-9.0.2, pluggy-1.6.0
rootdir: /home/mxn/llm/vllm
configfile: pyproject.toml
plugins: anyio-4.12.1, asyncio-1.3.0
asyncio: mode=Mode.STRICT, debug=False, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
collected 3551 items                                                                                             

tests/kernels/moe/test_moe.py ............................................................................ [  2%]
[... 31 similar lines of passing-test progress output elided ...]
...ssssssss..s..........ss..s..s.................s.................................                              [100%]


@mergify

mergify bot commented Feb 8, 2026

Documentation preview: https://vllm--34086.org.readthedocs.build/en/34086/

@mergify mergify bot added the documentation, gpt-oss, nvidia, and rocm labels Feb 8, 2026
@github-project-automation github-project-automation bot moved this to Todo in AMD Feb 8, 2026
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request removes the kernel-level chunking mechanism from FusedMoE, which simplifies the codebase significantly. The changes are mostly about removing code related to chunking and relying on the scheduler's chunked prefill to handle large inputs. A safety check has been added to the non-modular path to prevent illegal memory access with Triton kernels when the number of tokens is too large. However, my review identified that a similar safety check is missing in the modular kernel path for Triton-based experts, which could lead to memory corruption issues. I've added a critical comment to address this.
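The guard the review describes can be sketched as a simple pre-launch validation: with chunking gone, a Triton launch over too many tokens could index out of bounds, so the token count is checked before the kernel runs. Everything below is a hypothetical sketch; `MAX_TOKENS` and `check_token_count` are illustrative names, and the real limit and call site live in vLLM's fused MoE code.

```python
# Illustrative bound; the PR discussion mentions a scheduler-enforced
# limit of roughly 65K tokens per forward pass.
MAX_TOKENS = 65536

def check_token_count(num_tokens: int) -> None:
    """Reject batches too large for the Triton kernels to launch safely."""
    if num_tokens > MAX_TOKENS:
        raise ValueError(
            f"FusedMoE received {num_tokens} tokens, exceeding the supported "
            f"maximum of {MAX_TOKENS}; the scheduler's chunked prefill is "
            "expected to bound the batch size before this point."
        )
```

The review's point is that this kind of check must cover every path that can reach the Triton experts, including the modular kernel path, not just the non-modular one.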

@ZJY0516
Member

ZJY0516 commented Feb 10, 2026

Chunked prefill now guarantees that the scheduler never sends more tokens than ~65K in a single forward pass

Do we have any assertion for this?

@mergify

mergify bot commented Feb 10, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @SouthWest7.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Feb 10, 2026
SouthWest7 added 2 commits February 12, 2026 00:52
Signed-off-by: SouthWest7 <am1ao@qq.com>
Signed-off-by: SouthWest7 <am1ao@qq.com>
@SouthWest7
Contributor Author

Chunked prefill now guarantees that the scheduler never sends more tokens than ~65K in a single forward pass

Do we have any assertion for this?

Sorry, there is actually no related assertion — I misunderstood the original issue. I’ll be more careful in future changes.

# Conflicts:
#	tests/kernels/moe/test_flashinfer.py
#	vllm/model_executor/layers/fused_moe/fused_moe.py
#	vllm/model_executor/layers/fused_moe/modular_kernel.py
@mergify

mergify bot commented Mar 3, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @SouthWest7.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Mar 3, 2026
Signed-off-by: SouthWest7 <am1ao@qq.com>

# Conflicts:
#	docs/design/fused_moe_modular_kernel.md
#	tests/kernels/moe/test_modular_kernel_combinations.py
#	vllm/model_executor/layers/fused_moe/modular_kernel.py
Collaborator

@ProExpertProg ProExpertProg left a comment


LGTM, but let's wait for @robertgshaw2-redhat & bill's response to the one recent comment before merging

@github-project-automation github-project-automation bot moved this from To Triage to Ready in gpt-oss Issues & Enhancements Mar 8, 2026
@github-project-automation github-project-automation bot moved this to Ready in NVIDIA Mar 8, 2026
@SouthWest7
Contributor Author

@ProExpertProg I marked the previous comment as resolved because the reviewer said it was only related to chunking.

@ProExpertProg ProExpertProg enabled auto-merge (squash) March 11, 2026 17:08
@github-actions github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Mar 11, 2026
@ProExpertProg
Collaborator

Thank you for the contribution! Let's hope CI passes

auto-merge was automatically disabled March 12, 2026 02:39

Head branch was pushed to by a user without write access

Signed-off-by: SouthWest7 <am1ao@qq.com>
@ProExpertProg ProExpertProg merged commit 2cdf922 into vllm-project:main Mar 12, 2026
61 of 62 checks passed
@github-project-automation github-project-automation bot moved this from Todo to Done in AMD Mar 12, 2026
@github-project-automation github-project-automation bot moved this from Ready to Done in NVIDIA Mar 12, 2026
@SouthWest7 SouthWest7 deleted the feat/remove-chunk branch March 12, 2026 23:20
tlrmchlsmth added a commit to tlrmchlsmth/vllm that referenced this pull request Mar 15, 2026
Remove supports_chunking from test helpers to match main branch changes
from vllm-project#34086, and replace torch.cuda.device_count() with
torch.accelerator.device_count() per project policy.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
Lucaskabela pushed a commit to Lucaskabela/vllm that referenced this pull request Mar 17, 2026
Signed-off-by: SouthWest7 <am1ao@qq.com>
Signed-off-by: Southwest <1403572259@qq.com>
Signed-off-by: southwest <am1ao@qq.com>
Signed-off-by: Xinan Miao <1403572259@qq.com>
Co-authored-by: SouthWest7 <am1ao@qq.com>
wendyliu235 pushed a commit to wendyliu235/vllm-public that referenced this pull request Mar 18, 2026
Signed-off-by: SouthWest7 <am1ao@qq.com>
Signed-off-by: Southwest <1403572259@qq.com>
Signed-off-by: southwest <am1ao@qq.com>
Signed-off-by: Xinan Miao <1403572259@qq.com>
Co-authored-by: SouthWest7 <am1ao@qq.com>
fxdawnn pushed a commit to fxdawnn/vllm that referenced this pull request Mar 19, 2026
Signed-off-by: SouthWest7 <am1ao@qq.com>
Signed-off-by: Southwest <1403572259@qq.com>
Signed-off-by: southwest <am1ao@qq.com>
Signed-off-by: Xinan Miao <1403572259@qq.com>
Co-authored-by: SouthWest7 <am1ao@qq.com>
khairulkabir1661 pushed a commit to khairulkabir1661/vllm that referenced this pull request Mar 27, 2026
Signed-off-by: SouthWest7 <am1ao@qq.com>
Signed-off-by: Southwest <1403572259@qq.com>
Signed-off-by: southwest <am1ao@qq.com>
Signed-off-by: Xinan Miao <1403572259@qq.com>
Co-authored-by: SouthWest7 <am1ao@qq.com>
Monishver11 pushed a commit to Monishver11/vllm that referenced this pull request Mar 27, 2026
Signed-off-by: SouthWest7 <am1ao@qq.com>
Signed-off-by: Southwest <1403572259@qq.com>
Signed-off-by: southwest <am1ao@qq.com>
Signed-off-by: Xinan Miao <1403572259@qq.com>
Co-authored-by: SouthWest7 <am1ao@qq.com>
Signed-off-by: Monishver Chandrasekaran <monishverchandrasekaran@gmail.com>

Labels

  • documentation: Improvements or additions to documentation
  • gpt-oss: Related to GPT-OSS models
  • nvidia
  • ready: ONLY add when PR is ready to merge/full CI is needed
  • rocm: Related to AMD ROCm

Projects

Status: Done
Status: Done
Status: Done

Development

Successfully merging this pull request may close these issues.

[Feature]: Remove Chunking From FusedMoE

5 participants