[MoE Refactor] Remove MoE DP chunking #39107
robertgshaw2-redhat merged 22 commits into vllm-project:main
Conversation
Signed-off-by: Bill Nell <bnell@redhat.com>
Code Review
This pull request removes the `VLLM_MOE_DP_CHUNK_SIZE` and `VLLM_ENABLE_MOE_DP_CHUNK` environment variables, refactoring MoE chunking to rely on the scheduler's `max_num_batched_tokens`. It eliminates `ChunkingMoERunner` and simplifies related logic in the runner factory and shared experts. Feedback indicates that defaulting `max_num_tokens` to 0 in `FusedMoEConfig` causes an assertion failure if it is not explicitly set, which may break external integrations.
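To make the flagged failure mode concrete, here is a minimal sketch; the class below is a hypothetical reduced stand-in for `FusedMoEConfig`, not the actual vLLM definition:

```python
from dataclasses import dataclass


@dataclass
class FusedMoEConfigSketch:
    num_experts: int
    # A bare 0 default means any caller that builds the config without
    # setting max_num_tokens explicitly fails the assertion below, which
    # is the breakage the review warns external integrations about.
    max_num_tokens: int = 0

    def __post_init__(self) -> None:
        assert self.max_num_tokens > 0, "max_num_tokens must be set explicitly"


ok = FusedMoEConfigSketch(num_experts=8, max_num_tokens=8192)
# FusedMoEConfigSketch(num_experts=8)  # would raise AssertionError
```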
Signed-off-by: Bill Nell <bnell@redhat.com>
We should set the default max-num-batched-tokens to something smaller if we detect deepep-ll.
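A rough sketch of one way to do that. The backend string follows vLLM's `VLLM_ALL2ALL_BACKEND` naming, but the helper function and both cap values are illustrative assumptions, not vLLM code:

```python
# Hypothetical config-time adjustment: DeepEP low-latency kernels are sized
# for small decode batches, so shrink the default token budget when that
# backend is detected. The cap below is a placeholder, not a vLLM constant.
DEEPEP_LL_TOKEN_CAP = 256
GENERAL_DEFAULT = 8192  # illustrative general default


def default_max_num_batched_tokens(all2all_backend: str,
                                   user_value: int | None = None) -> int:
    if user_value is not None:
        return user_value  # never override an explicit user setting
    if all2all_backend == "deepep_low_latency":
        return DEEPEP_LL_TOKEN_CAP
    return GENERAL_DEFAULT


print(default_max_num_batched_tokens("deepep_low_latency"))  # 256
```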
Signed-off-by: Bill Nell <bnell@redhat.com>
Signed-off-by: Bill Nell <bnell@redhat.com>
LGTM. @elvircrn can you do a sanity check on gb?
Shouldn't this also delete the ChunkingMoERunner file?
Triggering a full CI run now.
I thought I did. Thanks for reminding me.
Signed-off-by: Bill Nell <bnell@redhat.com>
This pull request has merge conflicts that must be resolved before it can be merged.
Signed-off-by: Bill Nell <bnell@redhat.com>
Signed-off-by: Bill Nell <bnell@redhat.com>
robertgshaw2-redhat merged commit e1e318a into vllm-project:main
This reverts commit e1e318a.
Signed-off-by: Bill Nell <bnell@redhat.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com> Signed-off-by: zengxian <xiangdong.zeng@intel.com>
Signed-off-by: Bill Nell <bnell@redhat.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Signed-off-by: Bill Nell <bnell@redhat.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com> Signed-off-by: Avinash Singh <avinashsingh.rcoem@gmail.com>
**Commit range:** `6f786f2`..`d886c26`

1. Fix 'DPMetadata' object has no attribute 'max_tokens_across_dp_cpu' by vllm-project/vllm#39107
2. Fix 'Indexer' object has no attribute 'wk' by vllm-project/vllm#38928
3. Fix 'float' object has no attribute 'language_model' by vllm-project/vllm#39240

### What this PR does / why we need it?

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.19.0
- vLLM main: vllm-project/vllm@6f786f2

Signed-off-by: Meihan-chen <jcccx.cmh@gmail.com>
Signed-off-by: Meihan-chen <zr010426ztt@outlook.com>
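For downstream code hit by the first error above, one defensive pattern is to probe for the removed attribute and fall back to the scheduler budget. The helper below is an illustrative assumption, not code from vLLM or the referenced plugin; the attribute name comes from the error message:

```python
def max_tokens_across_dp(dp_metadata, fallback: int) -> int:
    # DPMetadata lost 'max_tokens_across_dp_cpu' in vllm-project/vllm#39107;
    # probe for it and fall back to a caller-supplied bound, e.g. the
    # scheduler's max_num_batched_tokens.
    value = getattr(dp_metadata, "max_tokens_across_dp_cpu", None)
    if value is None:
        return fallback
    # Handle both plain ints and 0-d tensors exposing .item().
    return int(value.item()) if hasattr(value, "item") else int(value)
```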
Purpose
Remove the DP chunking MoE runner. Use `max_num_batched_tokens` as the default for `max_num_tokens` in `FusedMoEConfig`.
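A minimal sketch of the new default wiring, using hypothetical stand-in classes (the real vLLM config objects carry many more fields):

```python
from dataclasses import dataclass


@dataclass
class SchedulerConfigSketch:
    max_num_batched_tokens: int  # the scheduler's per-step token budget


@dataclass
class MoEConfigSketch:
    num_experts: int
    max_num_tokens: int


def build_moe_config(sched: SchedulerConfigSketch,
                     num_experts: int) -> MoEConfigSketch:
    # The scheduler never feeds more than max_num_batched_tokens through a
    # forward pass, so it is a safe upper bound for MoE workspace sizing,
    # replacing the removed VLLM_MOE_DP_CHUNK_SIZE environment variable.
    return MoEConfigSketch(num_experts=num_experts,
                           max_num_tokens=sched.max_num_batched_tokens)
```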
Test Plan
CI
Ran the DeepEP-related tests under tests/kernels/moe locally.
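Roughly reproducible with something like the following; the `-k` filter is a guess at the relevant test selection, and a vLLM dev checkout is assumed:

```python
# Run the MoE kernel tests matching "deepep" from the repo root.
import pytest

pytest.main(["tests/kernels/moe", "-k", "deepep"])
```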
Test Result
cc @robertgshaw2-redhat, @tlrmchlsmth