[AMD] Qwen3.5 MXFP4 breaks after shared expert fusion is enabled#22948
Conversation
Code Review
This pull request updates the can_fuse_shared_expert function in qwen2_moe.py to prevent fusion of shared experts when they are explicitly excluded from quantization. This ensures that unquantized (BF16) shared experts are not incorrectly fused into quantized MoE weight tensors, which would require unsupported online quantization. Feedback was provided to correct the type hint of the quant_config parameter from None to Optional[QuantizationConfig], to accurately reflect its usage and stay consistent with the rest of the codebase.
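A minimal sketch of the suggested annotation change (the function bodies here are placeholders, and the stub class stands in for the real QuantizationConfig, whose import path is repo-specific):

```python
from typing import Optional


class QuantizationConfig:
    """Stand-in for the real class; the actual import path is repo-specific."""


# Before: the hint claims the argument is always None.
def can_fuse_shared_expert_old(quant_config: None) -> bool:
    return True  # body elided


# After: the hint reflects that a QuantizationConfig may actually be passed.
def can_fuse_shared_expert(quant_config: Optional[QuantizationConfig]) -> bool:
    return True  # body elided
```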
@mqhc2020 Thanks for the fix.
@hubertlu-tw Yeah, I think we can add it to the AMD nightly test. If we get more capacity, we can move it to pr-test if needed.
```python
exclude_layers = getattr(quant_config, "exclude_layers", [])
if any(
    "shared_expert" in layer
    and "shared_expert_gate" not in layer
    and not layer.startswith("mtp.")
    for layer in exclude_layers
):
    return False
```
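As a quick illustration of the guard's behavior, here is a standalone sketch (the config object and layer name are hypothetical, standing in for a real Quark config):

```python
from types import SimpleNamespace

# Hypothetical quant config whose exclude list names the shared expert;
# the layer name is illustrative, not taken from a real checkpoint.
quant_config = SimpleNamespace(
    exclude_layers=["model.layers.0.mlp.shared_expert.up_proj"]
)

exclude_layers = getattr(quant_config, "exclude_layers", [])
can_fuse = not any(
    "shared_expert" in layer
    and "shared_expert_gate" not in layer
    and not layer.startswith("mtp.")
    for layer in exclude_layers
)
print(can_fuse)  # False: fusion is skipped for this config
```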
Can we add a method can_fuse_shared_expert to QuantConfig and implement this check inside QuarkConfig? That way we keep Quark-specific logic within Quark.
Also, a more precise check would probably be whether shared_expert shares the same quantization spec as the MoE layers.
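A rough sketch of that suggestion (the class shapes and base-class default are assumptions; only the method name and the check itself come from this thread):

```python
class QuantizationConfig:
    # Assumed base-class default: no quant-method-specific restriction.
    def can_fuse_shared_expert(self) -> bool:
        return True


class QuarkConfig(QuantizationConfig):
    def __init__(self, exclude_layers: list[str]):
        self.exclude_layers = exclude_layers

    # Quark-specific rule: refuse fusion when the shared expert is excluded
    # from quantization, keeping qwen2_moe.py quant-method-agnostic.
    def can_fuse_shared_expert(self) -> bool:
        return not any(
            "shared_expert" in layer
            and "shared_expert_gate" not in layer
            and not layer.startswith("mtp.")
            for layer in self.exclude_layers
        )
```

This would let qwen2_moe.py simply call quant_config.can_fuse_shared_expert() instead of inspecting exclude_layers directly.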
Motivation
After shared expert fusion was enabled for Qwen3.5 models (as in #20736), MXFP4 models hit an issue: the shared expert in the checkpoint is stored in BF16, but the current weight loading can only treat it as MXFP4, the dtype of the routed experts. Until either online quantization is ready or the shared expert in MXFP4 checkpoints has been pre-quantized to MXFP4, we have to skip the shared expert fusion feature for MXFP4 models.
Modifications
In qwen2_moe.py, disable fusion when excluded shared experts are detected, via the exclude_layers check shown in the review diff above.
Note that shared_expert_gate layers and layers under the mtp. prefix are irrelevant to this check and are skipped.
Accuracy Tests
Speed Tests and Profiling
Checklist
Review and Merge Process
/tag-and-rerun-ci, /tag-run-ci-label, /rerun-failed-ci