
[Bugfix] Fix DP/EP Shared Expert With Monolithic Kernels #36061

Merged
robertgshaw2-redhat merged 4 commits into main from fix-dp-ep-shared-expert-monolithic on Mar 11, 2026

Conversation

@robertgshaw2-redhat (Collaborator) commented Mar 4, 2026

Summary

Signed-off-by: Robert Shaw <robshaw@redhat.com>
mergify bot added the bug ("Something isn't working") label Mar 4, 2026
@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request aims to fix an issue with shared expert computation in monolithic kernels by passing shared_experts to the FusedMoEKernel only when the deepep_low_latency All-to-All backend is in use, since that is the only backend that supports shared expert overlap.

While the intent is correct, the change appears to introduce a critical bug. For other All-to-All backends, this change will cause the shared expert computation to be skipped entirely. This is because the DefaultMoERunner delegates shared expert computation to the modular kernel for all All-to-All configurations, but with this change, the kernel will no longer receive the shared expert module for non-deepep_ll backends. I've added comments with details on the issue and suggestions for a fix.

```diff
 shared_experts=(
     shared_experts
-    if moe_config.moe_parallel_config.use_all2all_kernels
+    if moe_config.moe_parallel_config.use_deepep_ll_kernels
```
critical

This change correctly identifies that only the deepep_low_latency backend supports shared expert overlap within the modular kernel. However, this introduces a critical issue for other All-to-All backends (e.g., deepep_high_throughput, mori).

Here's the breakdown of the issue:

  1. For any All-to-All backend, a FusedMoEKernel is created, so quant_method.mk_owns_shared_expert becomes True.
  2. This prevents DefaultMoERunner from computing the shared experts, as it delegates this task to the modular kernel.
  3. With this change, for non-deepep_ll backends, shared_experts is passed as None to the FusedMoEKernel.
  4. Consequently, the FusedMoEKernel also doesn't compute the shared experts.

This results in the shared expert computation being skipped entirely for these configurations, likely leading to incorrect model outputs.

To fix this, the logic that determines whether the modular kernel "owns" the shared expert computation needs to be updated. For instance, DefaultMoERunner should handle shared experts if use_all2all_kernels is true but use_deepep_ll_kernels is false.
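The ownership rule the review is asking for can be sketched in a few lines. This is a minimal illustration using the flag names from the discussion (`use_all2all_kernels`, `use_deepep_ll_kernels`, `mk_owns_shared_expert`); the dataclass and helper functions are simplified stand-ins, not vLLM's actual implementation:

```python
from dataclasses import dataclass


@dataclass
class MoEParallelConfig:
    use_all2all_kernels: bool
    use_deepep_ll_kernels: bool


def mk_owns_shared_expert(cfg: MoEParallelConfig) -> bool:
    # The modular kernel owns the shared expert only when the
    # deepep_low_latency backend is active, because that is the only
    # all2all backend that can overlap the shared expert computation.
    return cfg.use_all2all_kernels and cfg.use_deepep_ll_kernels


def runner_computes_shared_expert(cfg: MoEParallelConfig) -> bool:
    # DefaultMoERunner must compute the shared expert whenever the
    # modular kernel does not, so the computation is never skipped.
    return not mk_owns_shared_expert(cfg)


# deepep_low_latency: the modular kernel owns the shared expert.
print(mk_owns_shared_expert(MoEParallelConfig(True, True)))        # True
# deepep_high_throughput / mori: the runner must own it instead.
print(runner_computes_shared_expert(MoEParallelConfig(True, False)))  # True
```

With this split, every configuration has exactly one owner: either the modular kernel overlaps the shared expert, or the runner computes it eagerly.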

```diff
 shared_experts=(
     shared_experts
-    if moe_config.moe_parallel_config.use_all2all_kernels
+    if moe_config.moe_parallel_config.use_deepep_ll_kernels
```
critical

Similar to the change in fp8.py, this modification correctly restricts passing shared_experts to the FusedMoEKernel to only when the deepep_low_latency backend is used. However, it creates the same critical issue for other All-to-All backends.

The shared expert computation will be skipped for backends like deepep_high_throughput because:

  1. quant_method.mk_owns_shared_expert will be True, so DefaultMoERunner won't run the shared experts.
  2. The FusedMoEKernel will receive shared_experts=None and will also not run them.

This logic needs to be reconciled to ensure shared experts are always computed. The DefaultMoERunner should likely handle the shared expert computation when a modular kernel is used but does not support shared expert overlap (i.e., when use_all2all_kernels is true but use_deepep_ll_kernels is false).

mergify bot commented Mar 4, 2026

Hi @robertgshaw2-redhat, the pre-commit checks have failed. Please run:

uv pip install pre-commit
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy or markdownlint failing?
mypy and markdownlint are run differently in CI. If the failure is related to either of these checks, please use the following commands to run them locally:
# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10
# For markdownlint
pre-commit run --hook-stage manual markdownlint

robertgshaw2-redhat added the ready ("ONLY add when PR is ready to merge/full CI is needed") label Mar 10, 2026
```diff
 shared_experts=(
     shared_experts
-    if moe_config.moe_parallel_config.use_all2all_kernels
+    if moe_config.moe_parallel_config.use_deepep_ll_kernels
```
A collaborator commented:
Should this condition be prepare_finalize.supports_async? That's the only time it really matters for the MK to call shared_experts.
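The alternative this comment proposes would key the decision off the prepare/finalize object's async support rather than naming a specific backend. A hedged sketch of that idea follows; `PrepareFinalize` and `supports_async` mirror the names in the comment, but this is an illustration, not vLLM's actual API:

```python
class PrepareFinalize:
    """Stand-in for a prepare/finalize object with an async-support flag."""

    def __init__(self, supports_async: bool):
        self.supports_async = supports_async


def shared_experts_for_kernel(shared_experts, prepare_finalize):
    # Hand the shared expert module to the modular kernel only when
    # prepare/finalize can overlap it asynchronously; otherwise return
    # None so the caller retains ownership of the computation.
    return shared_experts if prepare_finalize.supports_async else None


print(shared_experts_for_kernel("shared", PrepareFinalize(True)))   # shared
print(shared_experts_for_kernel("shared", PrepareFinalize(False)))  # None
```

Gating on the capability rather than the backend name would keep the condition correct if another async-capable backend is added later.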

@ProExpertProg (Collaborator) left a comment

Good with this if @bnellnm is

@robertgshaw2-redhat robertgshaw2-redhat enabled auto-merge (squash) March 11, 2026 13:41
@robertgshaw2-redhat robertgshaw2-redhat merged commit b7e5a58 into main Mar 11, 2026
58 checks passed
@robertgshaw2-redhat robertgshaw2-redhat deleted the fix-dp-ep-shared-expert-monolithic branch March 11, 2026 16:07
wendyliu235 pushed a commit to wendyliu235/vllm-public that referenced this pull request Mar 18, 2026
…t#36061)

Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
fxdawnn pushed a commit to fxdawnn/vllm that referenced this pull request Mar 19, 2026
…t#36061)

Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>

Labels

bug Something isn't working ready ONLY add when PR is ready to merge/full CI is needed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: DeepSeek v3.2 FP8 Failure to start server

3 participants