[AMD] Qwen3.5 MXFP4 breaks after shared expert fusion is enabled #22948

Merged

HaiShaw merged 2 commits into sgl-project:main from mqhc2020:marv/fix_qwen3.5_mxfp4_shared_expert_fusion on Apr 16, 2026

Conversation

mqhc2020 (Contributor) commented on Apr 16, 2026

Motivation

After shared expert fusion was enabled for Qwen3.5 models (as in #20736), MXFP4 models hit an issue: the shared expert in the checkpoint is stored in BF16, but the current weight loading can only treat it as MXFP4, the dtype of the routed experts. Until either online quantization is ready or the shared expert in MXFP4 checkpoints has been pre-quantized to MXFP4, we have to skip the shared expert fusion feature for MXFP4 models.
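
For illustration, here is a hypothetical way to observe the mismatch in a checkpoint shard; the file name is a placeholder and the packed representation of the routed experts is an assumption, not something taken from this PR:

```python
# Hypothetical inspection of one Qwen3.5 MXFP4 checkpoint shard. The file name is a
# placeholder, and the packed-uint8 layout assumed for routed experts is an assumption
# about how MXFP4 weights are stored on disk.
from safetensors import safe_open

with safe_open("model-00001-of-000NN.safetensors", framework="pt") as f:
    for name in f.keys():
        if ".mlp.shared_expert." in name or ".mlp.experts." in name:
            print(name, f.get_tensor(name).dtype)

# Expected pattern per the description above: shared_expert projections come back as
# torch.bfloat16, while routed-expert weights are stored as packed MXFP4 blocks plus scales.
```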

Modifications

In qwen2_moe.py, disable fusion when shared-expert layers such as the following are found among the quantization config's excluded layers (a minimal sketch of the check follows the lists below):

  • "model.language_model.layers.xxx.mlp.shared_expert.down_proj"
  • "model.language_model.layers.xxx.mlp.shared_expert.gate_proj"
  • "model.language_model.layers.xxx.mlp.shared_expert.up_proj"

Note that the following layers are irrelevant:

  • "model.language_model.layers.xxx.mlp.shared_expert_gate"
  • "mtp.layers.xxx.mlp.shared_expert*"
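
As referenced above, a minimal sketch of the resulting check, mirroring the diff discussed in the review below. The function name and quant_config parameter follow the existing can_fuse_shared_expert helper mentioned in the bot review, the exclude_layers attribute is assumed to be a list of layer-name patterns from the quant config, and the real function contains additional conditions omitted here:

```python
from typing import Optional

def can_fuse_shared_expert(quant_config: Optional["QuantizationConfig"]) -> bool:
    # Simplified sketch: skip fusion when the quant config excludes shared-expert
    # projections from quantization (they stay BF16 in the MXFP4 checkpoint), since
    # the fused MoE loader would otherwise treat them as MXFP4 routed-expert weights.
    if quant_config is None:
        return True
    exclude_layers = getattr(quant_config, "exclude_layers", [])
    if any(
        "shared_expert" in layer
        and "shared_expert_gate" not in layer  # the shared_expert_gate layer is irrelevant
        and not layer.startswith("mtp.")       # MTP shared-expert layers are irrelevant
        for layer in exclude_layers
    ):
        return False
    return True
```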

Accuracy Tests

Speed Tests and Profiling

Checklist

Review and Merge Process

  1. Ping Merge Oncalls to start the process. See the PR Merge Process.
  2. Get approvals from CODEOWNERS and other reviewers.
  3. Trigger CI tests with comments or contact authorized users to do so.
    • Common commands include /tag-and-rerun-ci, /tag-run-ci-label, /rerun-failed-ci
  4. After green CI and required approvals, ask Merge Oncalls or people with Write permission to merge the PR.

gemini-code-assist (Bot) left a comment
Code Review

This pull request updates the can_fuse_shared_expert function in qwen2_moe.py to prevent the fusion of shared experts when they are explicitly excluded from quantization. This logic ensures that BF16 shared experts are not incorrectly fused into quantized MoE weight tensors, which would require unsupported online quantization. Feedback was provided to correct the type hint for the quant_config parameter from None to Optional[QuantizationConfig] to accurately reflect its usage and maintain consistency with the rest of the codebase.

Comment thread on python/sglang/srt/models/qwen2_moe.py (outdated)
mqhc2020 force-pushed the marv/fix_qwen3.5_mxfp4_shared_expert_fusion branch from 763f2d4 to e5b3e21 on April 16, 2026 08:32
mqhc2020 changed the title from "[AMD] Qwen3 MXFP4 breaks if shared expert fusion is enabled" to "[AMD] Qwen3.5 MXFP4 breaks after shared expert fusion is enabled" on Apr 16, 2026
hubertlu-tw (Collaborator) commented:

@mqhc2020 Thanks for the fix.
@yctseng0211 do we have capacity to add a 4-GPU e2e test to our CI for this model?

yctseng0211 (Collaborator) commented on Apr 16, 2026

@hubertlu-tw yeah I think we can add it in amd Nightly Test. If we get more capacity then we can move it to pr-test if needed.

Comment on lines +129 to +136
exclude_layers = getattr(quant_config, "exclude_layers", [])
if any(
    "shared_expert" in layer
    and "shared_expert_gate" not in layer
    and not layer.startswith("mtp.")
    for layer in exclude_layers
):
    return False
A collaborator commented:
Can we add a can_fuse_shared_expert method to QuantConfig and implement this check inside QuarkConfig? That way we keep Quark-specific logic within Quark.

And a more precise check would probably be to verify whether shared_expert shares the same quantization spec as the MoE layers.
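
A hypothetical sketch of this suggested refactor; the class names, the base-class hook, and the default behavior are assumptions for illustration, not the actual sglang API:

```python
class QuantizationConfig:
    # Hypothetical base-class hook: by default nothing blocks shared expert fusion.
    def can_fuse_shared_expert(self) -> bool:
        return True


class QuarkConfig(QuantizationConfig):
    # Hypothetical Quark-specific override, keeping the exclude_layers logic out of
    # qwen2_moe.py. A more precise variant would compare the shared expert's
    # quantization spec against the routed-expert (MoE) spec, as suggested above.
    exclude_layers: list[str] = []

    def can_fuse_shared_expert(self) -> bool:
        return not any(
            "shared_expert" in layer
            and "shared_expert_gate" not in layer
            and not layer.startswith("mtp.")
            for layer in self.exclude_layers
        )
```

With such a hook, the caller in qwen2_moe.py could reduce to something like `if quant_config is not None and not quant_config.can_fuse_shared_expert(): return False`.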

HaiShaw (Collaborator) left a comment


@mqhc2020 Approving now, but let's look into whether the refactor is feasible.

@HaiShaw HaiShaw merged commit 52f0b86 into sgl-project:main Apr 16, 2026
54 of 105 checks passed
hubertlu-tw added a commit to hubertlu-tw/sglang that referenced this pull request on Apr 18, 2026
jmamou pushed a commit to jmamou/sglang that referenced this pull request on Apr 20, 2026
yhyang201 pushed a commit to yhyang201/sglang that referenced this pull request on Apr 22, 2026
zhangying098 pushed a commit to zhangying098/sglang that referenced this pull request on Apr 23, 2026
kyx1999 pushed a commit to KMSorSMS/sglang that referenced this pull request on Apr 27, 2026