[DSV4] Guard megamoe flag with Pure TP#41522

Merged
ywang96 merged 3 commits intovllm-project:mainfrom
zyongye:bug/megamoe_gate
May 2, 2026
Conversation

zyongye (Member) commented May 2, 2026

Purpose

Test Plan

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as the test command used.
  • The test results, such as a before/after comparison or e2e results.
  • (Optional) Any necessary documentation updates, such as updating supported_models.md and adding examples for a new model.

Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>

@claude claude Bot left a comment

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

@mergify mergify Bot added the deepseek Related to DeepSeek models label May 2, 2026
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request updates the initialization logic for DeepSeek V4 models to support the deep_gemm_mega_moe backend. While a guard was added to DeepseekV4MoE to ensure expert parallel is enabled when using MegaMoE, the reviewer pointed out that this same check is missing in DeepseekV4Model and suggested adding it for consistency and to ensure the model fails early during initialization.

Comment on lines +1229 to +1231

    self.use_mega_moe = (
        vllm_config.kernel_config.moe_backend == "deep_gemm_mega_moe"
    )
Contributor


Severity: high

The guard against using MegaMoE without expert parallel is missing here in DeepseekV4Model, although it was added to DeepseekV4MoE. For consistency and to ensure the model fails early during initialization (before creating layers), the same guard should be applied here. This also ensures that self.use_mega_moe is only True when the configuration is valid, which is important as this flag is used in the forward pass and for expert mapping logic.

Suggested change

    self.use_mega_moe = (
        vllm_config.kernel_config.moe_backend == "deep_gemm_mega_moe"
    )
    if self.use_mega_moe and not vllm_config.parallel_config.enable_expert_parallel:
        raise NotImplementedError(
            "DeepSeek V4 MegaMoE currently requires expert parallel. "
            "Enable it with --enable-expert-parallel, or pick a different "
            "--moe-backend."
        )

zyongye added 2 commits May 2, 2026 22:45
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
@ywang96 ywang96 added the ready ONLY add when PR is ready to merge/full CI is needed label May 2, 2026
@ywang96 ywang96 merged commit 1c607d7 into vllm-project:main May 2, 2026
9 of 15 checks passed
@ywang96 ywang96 added this to the v0.20.1 milestone May 2, 2026
ywang96 pushed a commit that referenced this pull request May 2, 2026
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
joa-stdn pushed a commit to joa-stdn/vllm that referenced this pull request May 4, 2026
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
Signed-off-by: Joachim Studnia <joachim@mistral.ai>
chaojun-zhang pushed a commit to chaojun-zhang/vllm that referenced this pull request May 6, 2026
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
Copilot AI pushed a commit to hongbolv/vllm that referenced this pull request May 7, 2026
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
Co-authored-by: hongbolv <33214277+hongbolv@users.noreply.github.com>

Labels

deepseek Related to DeepSeek models ready ONLY add when PR is ready to merge/full CI is needed

2 participants