
[MoE Refactor] Remove MoE DP chunking #39107

Merged
robertgshaw2-redhat merged 22 commits into vllm-project:main from neuralmagic:remove-dp-chunking
Apr 14, 2026

bnellnm (Collaborator) commented Apr 6, 2026

Purpose

Remove the DP chunking MoE runner (ChunkingMoERunner). Use max_num_batched_tokens as the default for max_num_tokens in FusedMoEConfig.
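
For readers skimming the thread, here is a minimal sketch of what "use max_num_batched_tokens as the default for max_num_tokens" could look like. It is not the code from this PR: `SchedulerConfigSketch`, `FusedMoEConfigSketch`, and `make_moe_config` are hypothetical stand-ins, and the real FusedMoEConfig has many more fields.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SchedulerConfigSketch:
    # Hypothetical stand-in for the scheduler config; only this field matters here.
    max_num_batched_tokens: int


@dataclass
class FusedMoEConfigSketch:
    # Hypothetical stand-in for FusedMoEConfig.
    max_num_tokens: int


def make_moe_config(
    scheduler: SchedulerConfigSketch,
    max_num_tokens: Optional[int] = None,
) -> FusedMoEConfigSketch:
    """Build the MoE config, falling back to the scheduler's token budget."""
    if max_num_tokens is None:
        # No explicit value: size the MoE layer for the largest batch the
        # scheduler can emit, so no separate chunk-size knob is required.
        max_num_tokens = scheduler.max_num_batched_tokens
    return FusedMoEConfigSketch(max_num_tokens=max_num_tokens)
```

With this shape, a caller that passes nothing inherits the scheduler's limit, while an explicit value still takes precedence.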

Test Plan

CI.
Ran the DeepEP-related tests in tests/kernels/moe locally.

Test Result

cc @robertgshaw2-redhat , @tlrmchlsmth



bnellnm added 2 commits April 6, 2026 18:33
Signed-off-by: Bill Nell <bnell@redhat.com>
Signed-off-by: Bill Nell <bnell@redhat.com>
gemini-code-assist Bot (Contributor) left a comment

Code Review

This pull request removes the VLLM_MOE_DP_CHUNK_SIZE and VLLM_ENABLE_MOE_DP_CHUNK environment variables, refactoring MoE chunking to rely on the scheduler's max_num_batched_tokens. It eliminates ChunkingMoERunner and simplifies related logic in the runner factory and shared experts. Feedback indicates that defaulting max_num_tokens to 0 in FusedMoEConfig causes an assertion failure if not explicitly set, which may break external integrations.
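
The failure mode flagged in the review can be illustrated with a small sketch. The names below (`MoEConfigZeroDefault`, `workspace_tokens`, `try_workspace_tokens`) are hypothetical and are not taken from the PR; the review thread is also marked Outdated, so the merged code may behave differently.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class MoEConfigZeroDefault:
    # Hypothetical config whose default of 0 means "not set".
    max_num_tokens: int = 0


def workspace_tokens(config: MoEConfigZeroDefault) -> int:
    # A consumer that rejects the unset sentinel, as the review describes.
    assert config.max_num_tokens > 0, "max_num_tokens must be set explicitly"
    return config.max_num_tokens


def try_workspace_tokens(config: MoEConfigZeroDefault) -> Union[int, str]:
    """Return the token count, or the assertion message if the value is unset."""
    try:
        return workspace_tokens(config)
    except AssertionError as exc:
        return str(exc)
```

An external integration that constructs the config without the new field would hit the assertion, which is the compatibility risk the review points out.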

Comment thread vllm/model_executor/layers/fused_moe/config.py Outdated
Signed-off-by: Bill Nell <bnell@redhat.com>
@bnellnm bnellnm marked this pull request as ready for review April 6, 2026 21:27
@robertgshaw2-redhat robertgshaw2-redhat added the ready-run-all-tests Trigger CI with all tests for wide-ranging PRs label Apr 6, 2026
robertgshaw2-redhat (Collaborator) commented:

We should set the default max-num-batched-tokens to something smaller if we detect deepep-ll.
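
One way the suggestion above might be expressed is sketched here. Everything in it is an assumption: the thread names neither a cap value nor a detection mechanism, so `DEEPEP_LL_TOKEN_CAP`, the `"deepep_low_latency"` backend string, the generic default of 8192, and `default_max_num_batched_tokens` are all illustrative.

```python
from typing import Optional

# Hypothetical cap; the thread does not name a value.
DEEPEP_LL_TOKEN_CAP = 256
# Assumed identifier for the DeepEP low-latency all2all backend.
DEEPEP_LL_BACKEND = "deepep_low_latency"


def default_max_num_batched_tokens(
    all2all_backend: str,
    user_value: Optional[int] = None,
    generic_default: int = 8192,  # hypothetical generic default
) -> int:
    """Pick a token budget, shrinking only the default for deepep-ll."""
    if user_value is not None:
        # An explicit user setting always wins.
        return user_value
    if all2all_backend == DEEPEP_LL_BACKEND:
        # Smaller default when the low-latency backend is detected.
        return min(generic_default, DEEPEP_LL_TOKEN_CAP)
    return generic_default
```

The sketch only changes the default; a user-supplied value is passed through untouched.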

@bnellnm bnellnm changed the title [MoE Refactor] Remove dp chunking [MoE Refactor] Remove MoE DP chunking Apr 6, 2026
bnellnm added 3 commits April 6, 2026 22:41
Signed-off-by: Bill Nell <bnell@redhat.com>
Signed-off-by: Bill Nell <bnell@redhat.com>
@mergify mergify Bot added the ci/build label Apr 7, 2026
Comment thread vllm/config/parallel.py
robertgshaw2-redhat (Collaborator) commented:

LGTM.

@elvircrn can you do a sanity check on gb?

@robertgshaw2-redhat robertgshaw2-redhat added the ready ONLY add when PR is ready to merge/full CI is needed label Apr 9, 2026
robertgshaw2-redhat (Collaborator) commented:

Shouldn't this also delete the ChunkingMoERunner file?

robertgshaw2-redhat (Collaborator) commented:

Triggering a full CI run now.

@robertgshaw2-redhat robertgshaw2-redhat added the ready-run-all-tests Trigger CI with all tests for wide-ranging PRs label Apr 9, 2026
bnellnm (Collaborator, Author) commented Apr 9, 2026

> Shouldn't this also delete the ChunkingMoERunner file?

I thought I did. Thanks for reminding me.

mergify Bot (Contributor) commented Apr 10, 2026

This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @bnellnm.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify Bot added the needs-rebase label Apr 10, 2026
@mergify mergify Bot removed the needs-rebase label Apr 10, 2026
@robertgshaw2-redhat robertgshaw2-redhat merged commit e1e318a into vllm-project:main Apr 14, 2026
142 of 144 checks passed
@github-project-automation github-project-automation Bot moved this from Ready to Done in NVIDIA Apr 14, 2026
vllm-agent pushed a commit to vllm-agent/vllm that referenced this pull request Apr 15, 2026
zxd1997066 pushed a commit to zxd1997066/vllm that referenced this pull request Apr 15, 2026
Signed-off-by: Bill Nell <bnell@redhat.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Signed-off-by: zengxian <xiangdong.zeng@intel.com>
@bnellnm bnellnm deleted the remove-dp-chunking branch April 15, 2026 20:52
whk-lab pushed a commit to whk-lab/vllm that referenced this pull request Apr 23, 2026
Signed-off-by: Bill Nell <bnell@redhat.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
avinashsingh77 pushed a commit to avinashsingh77/vllm that referenced this pull request Apr 27, 2026
Signed-off-by: Bill Nell <bnell@redhat.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Signed-off-by: Avinash Singh <avinashsingh.rcoem@gmail.com>
wangxiyuan pushed a commit to vllm-project/vllm-ascend that referenced this pull request Apr 29, 2026
**Commit range:** `6f786f2`..`d886c26`
1. Fix 'DPMetadata' object has no attribute 'max_tokens_across_dp_cpu' by vllm-project/vllm#39107
2. Fix 'Indexer' object has no attribute 'wk' by vllm-project/vllm#38928
3. Fix 'float' object has no attribute 'language_model' by vllm-project/vllm#39240
### What this PR does / why we need it? 

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.19.0
- vLLM main:
vllm-project/vllm@6f786f2

---------

Signed-off-by: Meihan-chen <jcccx.cmh@gmail.com>
Signed-off-by: Meihan-chen <zr010426ztt@outlook.com>

Labels

ci/build
nvidia
ready: ONLY add when PR is ready to merge/full CI is needed
ready-run-all-tests: Trigger CI with all tests for wide-ranging PRs

Projects

Status: Done

4 participants