[Bugfix] Fix DP/EP Shared Expert With Monolithic Kernels #36061

robertgshaw2-redhat merged 4 commits into main from
Conversation
Code Review
This pull request fixes an issue with shared expert computation in monolithic kernels by passing `shared_experts` to the `FusedMoEKernel` only when the `deepep_low_latency` All-to-All backend is used, since it is the only backend that supports shared expert overlap.
While the intent is correct, the change appears to introduce a critical bug. For other All-to-All backends, the shared expert computation will be skipped entirely: `DefaultMoERunner` delegates shared expert computation to the modular kernel for all All-to-All configurations, but with this change the kernel no longer receives the shared expert module for non-`deepep_ll` backends. I've added comments with details on the issue and suggestions for a fix.
```diff
  shared_experts=(
      shared_experts
-     if moe_config.moe_parallel_config.use_all2all_kernels
+     if moe_config.moe_parallel_config.use_deepep_ll_kernels
```
This change correctly identifies that only the deepep_low_latency backend supports shared expert overlap within the modular kernel. However, this introduces a critical issue for other All-to-All backends (e.g., deepep_high_throughput, mori).
Here's the breakdown of the issue:
- For any All-to-All backend, a `FusedMoEKernel` is created, so `quant_method.mk_owns_shared_expert` becomes `True`.
- This prevents `DefaultMoERunner` from computing the shared experts, as it delegates this task to the modular kernel.
- With this change, for non-`deepep_ll` backends, `shared_experts` is passed as `None` to the `FusedMoEKernel`.
- Consequently, the `FusedMoEKernel` also doesn't compute the shared experts.
This results in the shared expert computation being skipped entirely for these configurations, likely leading to incorrect model outputs.
To fix this, the logic that determines whether the modular kernel "owns" the shared expert computation needs to be updated. For instance, DefaultMoERunner should handle shared experts if use_all2all_kernels is true but use_deepep_ll_kernels is false.
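The ownership rule the comment proposes could be sketched as follows. This is a minimal, hypothetical illustration, not vLLM's actual code: the helper names are invented for clarity, and the two boolean flags stand in for `moe_config.moe_parallel_config.use_all2all_kernels` and `use_deepep_ll_kernels`.

```python
def mk_owns_shared_experts(use_all2all_kernels: bool,
                           use_deepep_ll_kernels: bool) -> bool:
    """Hypothetical sketch: the modular kernel should own shared expert
    computation only when it can actually overlap it, i.e. only for the
    deepep_low_latency backend."""
    return use_all2all_kernels and use_deepep_ll_kernels


def runner_computes_shared_experts(use_all2all_kernels: bool,
                                   use_deepep_ll_kernels: bool) -> bool:
    """The runner must compute shared experts whenever the modular kernel
    does not own them, so the computation is never skipped."""
    return not mk_owns_shared_experts(use_all2all_kernels,
                                      use_deepep_ll_kernels)
```

Under this rule, a `deepep_high_throughput` configuration (`use_all2all_kernels=True`, `use_deepep_ll_kernels=False`) falls back to the runner instead of silently dropping the shared experts.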
```diff
  shared_experts=(
      shared_experts
-     if moe_config.moe_parallel_config.use_all2all_kernels
+     if moe_config.moe_parallel_config.use_deepep_ll_kernels
```
Similar to the change in fp8.py, this modification correctly restricts passing shared_experts to the FusedMoEKernel to only when the deepep_low_latency backend is used. However, it creates the same critical issue for other All-to-All backends.
The shared expert computation will be skipped for backends like `deepep_high_throughput` because:
- `quant_method.mk_owns_shared_expert` will be `True`, so `DefaultMoERunner` won't run the shared experts.
- The `FusedMoEKernel` will receive `shared_experts=None` and will also not run them.
This logic needs to be reconciled to ensure shared experts are always computed. The DefaultMoERunner should likely handle the shared expert computation when a modular kernel is used but does not support shared expert overlap (i.e., when use_all2all_kernels is true but use_deepep_ll_kernels is false).
Hi @robertgshaw2-redhat, the pre-commit checks have failed. Please run:

```shell
uv pip install pre-commit
pre-commit install
pre-commit run --all-files
```

Then, commit the changes and push to your branch.
```diff
  shared_experts=(
      shared_experts
-     if moe_config.moe_parallel_config.use_all2all_kernels
+     if moe_config.moe_parallel_config.use_deepep_ll_kernels
```
Should this condition be `prepare_finalize.supports_async` instead? That's the only case where it actually matters for the modular kernel to call `shared_experts`.
ProExpertProg left a comment
Good with this if @bnellnm is
…t#36061) Signed-off-by: Robert Shaw <robshaw@redhat.com> Co-authored-by: Robert Shaw <robshaw@redhat.com>
Summary