[Feature]: Remove Chunking From FusedMoE #34086
ProExpertProg merged 16 commits into vllm-project:main
Documentation preview: https://vllm--34086.org.readthedocs.build/en/34086/
Code Review
This pull request removes the kernel-level chunking mechanism from FusedMoE, which simplifies the codebase significantly. The changes are mostly about removing code related to chunking and relying on the scheduler's chunked prefill to handle large inputs. A safety check has been added to the non-modular path to prevent illegal memory access with Triton kernels when the number of tokens is too large. However, my review identified that a similar safety check is missing in the modular kernel path for Triton-based experts, which could lead to memory corruption issues. I've added a critical comment to address this.
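The kind of guard the review describes could look roughly like the sketch below. This is only an illustration, not the check added in this PR: the helper name `check_triton_token_limit`, the constant `HYPOTHETICAL_MAX_TOKENS`, its value, and the error message are all assumptions made for the example.

```python
# Hypothetical guard. The name, the limit, and the message are illustrative
# assumptions; they are not taken from the vLLM code base.
HYPOTHETICAL_MAX_TOKENS = 65536


def check_triton_token_limit(
    num_tokens: int, max_tokens: int = HYPOTHETICAL_MAX_TOKENS
) -> None:
    """Raise ValueError if one kernel launch would see too many tokens."""
    if num_tokens < 0:
        raise ValueError(f"num_tokens must be non-negative, got {num_tokens}")
    if num_tokens > max_tokens:
        # Without kernel-level chunking, the caller has to keep the per-step
        # batch small enough, for example through scheduler chunked prefill.
        raise ValueError(
            f"Got {num_tokens} tokens, above the limit of {max_tokens}; "
            "reduce the per-step token budget."
        )


# A batch within the limit passes silently.
check_triton_token_limit(1024)
```

Failing early with a clear error is preferable to launching a kernel that may read or write out of bounds; the same guard would need to be called on every path that reaches the Triton kernels.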
Do we have any assertion for this?
This pull request has merge conflicts that must be resolved before it can be merged.
Signed-off-by: SouthWest7 <am1ao@qq.com>
Sorry, there is actually no related assertion; I misunderstood the original issue. I’ll be more careful in future changes.
23f6a80 to b3617de
# Conflicts:
#   tests/kernels/moe/test_flashinfer.py
#   vllm/model_executor/layers/fused_moe/fused_moe.py
#   vllm/model_executor/layers/fused_moe/modular_kernel.py
Signed-off-by: SouthWest7 <am1ao@qq.com>
# Conflicts:
#   docs/design/fused_moe_modular_kernel.md
#   tests/kernels/moe/test_modular_kernel_combinations.py
#   vllm/model_executor/layers/fused_moe/modular_kernel.py
ProExpertProg left a comment
LGTM, but let's wait for @robertgshaw2-redhat and bill's response to the one recent comment before merging.
@ProExpertProg I marked the previous comment as resolved because the reviewer said it was only related to chunking.
Thank you for the contribution! Let's hope CI passes.
Head branch was pushed to by a user without write access
afe35c2 to 7bad003
Signed-off-by: SouthWest7 <am1ao@qq.com>
Remove supports_chunking from test helpers to match main branch changes from vllm-project#34086, and replace torch.cuda.device_count() with torch.accelerator.device_count() per project policy.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
Signed-off-by: SouthWest7 <am1ao@qq.com> Signed-off-by: Southwest <1403572259@qq.com> Signed-off-by: southwest <am1ao@qq.com> Signed-off-by: Xinan Miao <1403572259@qq.com> Co-authored-by: SouthWest7 <am1ao@qq.com>
Signed-off-by: SouthWest7 <am1ao@qq.com> Signed-off-by: Southwest <1403572259@qq.com> Signed-off-by: southwest <am1ao@qq.com> Signed-off-by: Xinan Miao <1403572259@qq.com> Co-authored-by: SouthWest7 <am1ao@qq.com> Signed-off-by: Monishver Chandrasekaran <monishverchandrasekaran@gmail.com>
Purpose
Remove the kernel-level chunking mechanism from FusedMoE.
Resolves #30620
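For context, kernel-level chunking amounts to a loop of roughly the following shape. This is a toy sketch under stated assumptions, not the code removed by this PR: `toy_expert`, `run_chunked`, and `run_unchunked` are hypothetical names, and `toy_expert` is a plain-Python stand-in for a fused MoE kernel that treats each token independently.

```python
from typing import Callable, List, Sequence

Token = List[float]
Kernel = Callable[[Sequence[Token]], List[Token]]


def toy_expert(tokens: Sequence[Token]) -> List[Token]:
    """Hypothetical stand-in for a kernel; each token is processed on its own."""
    return [[2.0 * x for x in tok] for tok in tokens]


def run_chunked(kernel: Kernel, tokens: Sequence[Token], chunk_size: int) -> List[Token]:
    """Kernel-level chunking: split the batch and launch once per chunk."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    out: List[Token] = []
    for start in range(0, len(tokens), chunk_size):
        out.extend(kernel(tokens[start:start + chunk_size]))
    return out


def run_unchunked(kernel: Kernel, tokens: Sequence[Token]) -> List[Token]:
    """Single launch; relies on the caller to bound the batch size."""
    return list(kernel(tokens))


tokens = [[float(i), float(i + 1)] for i in range(10)]
# Because the toy kernel is per-token, both paths give the same result.
assert run_chunked(toy_expert, tokens, chunk_size=4) == run_unchunked(toy_expert, tokens)
```

In this toy setting the chunked loop adds bookkeeping without changing the output, which is why bounding the batch upstream (for example with the scheduler's chunked prefill) can stand in for it.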
Test Plan
Test Result
============================== test session starts ==============================
platform linux -- Python 3.12.4, pytest-9.0.2, pluggy-1.6.0
rootdir: /home/mxn/llm/vllm
configfile: pyproject.toml
plugins: anyio-4.12.1, asyncio-1.3.0
asyncio: mode=Mode.STRICT, debug=False, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
collected 3551 items

tests/kernels/moe/test_moe.py
[progress lines from 2% through 97% omitted; they contain only passing (.) markers]
...ssssssss..s..........ss..s..s.................s................................. [100%]

Essential Elements of an Effective PR Description Checklist
supported_models.md and examples for a new model.