[NPU] [Bugfix] Wan quantization fix by OrangeRedeng · Pull Request #24540 · sgl-project/sglang

OrangeRedeng · 2026-05-06T15:09:11Z

Description

Fix Wan2.2-T2V-A14B-Diffusers-w8a8/w8a8/mxfp8 quant scheme recognition on NPU by threading
reverse_param_names_mapping through the ModelSlim config stack.

Motivation

Fixes #24518. After PR #23625 reshaped the quantized-diffusion prefixes, the Wan2.2-W8A8 model stopped loading (No modelslim compatible scheme). The cause is that _get_scheme_from_parts looks up the quant description using the model's internal layer name (e.g. blocks.0.self_attn.to_q), while the quant config file is keyed by architecture-canonical names (e.g. blocks.0.attn1.to_q).

The WanVideo arch‑config already carries a reverse_param_names_mapping that translates internal names back to canonical names, but the ModelSlim initialisation path silently discarded this mapping.
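The mismatch can be illustrated with a minimal sketch. The regex rule, the ".weight" key format, and the quant description entry below are hypothetical, chosen only to mirror the two example names above; the real mapping lives in the WanVideo arch config and the real keys in quant_model_description.json.

```python
import re

# Hypothetical reverse mapping: internal layer name -> canonical name.
REVERSE_RULES = [
    (r"^blocks\.(\d+)\.self_attn\.(.*)$", r"blocks.\1.attn1.\2"),
]

# Hypothetical excerpt of a quant description keyed by canonical names.
QUANT_DESCRIPTION = {"blocks.0.attn1.to_q.weight": "W8A8"}


def to_canonical(name: str) -> str:
    """Apply the first matching rule; return the name unchanged otherwise."""
    for pattern, repl in REVERSE_RULES:
        if re.match(pattern, name):
            return re.sub(pattern, repl, name)
    return name


internal = "blocks.0.self_attn.to_q"
# Looking up the internal name misses; the canonical name hits.
missing = QUANT_DESCRIPTION.get(internal + ".weight")
found = QUANT_DESCRIPTION.get(to_canonical(internal) + ".weight")
```

Without the translation step the lookup returns nothing, which is consistent with the scheme not being recognized.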

Modifications

modelslim.py – accept reverse_param_names_mapping in __init__ and from_config, build a mapper via get_param_names_mapping, and use it in _get_scheme_from_parts before querying quant_model_description.json.
transformer_load_utils.py – read reverse_param_names_mapping from the arch config and pass it into get_quant_config.
quantization_utils.py – accept and forward reverse_param_names_mapping.
fsdp_load.py – add missing quant parameter suffixes (input_offset, quant_bias, deq_scale) to the sharding allow‑list; mark sharded tensors with requires_grad=False to avoid meta‑tensor copy errors.
npu/utils.py – update a performance warning to mention --dit-cpu-offload false.
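As a rough illustration of the fsdp_load.py change, a suffix allow-list check might look like the sketch below. Only input_offset, quant_bias, and deq_scale come from this PR; the other suffixes and the helper name are assumptions, not the actual contents of fsdp_load.py.

```python
# Suffixes named in this PR as newly allowed.
ADDED_SUFFIXES = ("input_offset", "quant_bias", "deq_scale")
# Assumed pre-existing entries, for illustration only.
ASSUMED_EXISTING = ("weight", "bias")
ALLOWED_SUFFIXES = ASSUMED_EXISTING + ADDED_SUFFIXES


def is_allowed_param(name: str) -> bool:
    """Return True if the last dotted component of name is allow-listed."""
    return name.rsplit(".", 1)[-1] in ALLOWED_SUFFIXES
```

Under these assumptions, a parameter such as blocks.0.attn1.to_q.deq_scale passes the check, while a parameter with an unlisted suffix does not.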

Accuracy Tests

Command:
SGLANG_CACHE_DIT_FN=2 SGLANG_CACHE_DIT_BN=1 SGLANG_CACHE_DIT_WARMUP=4 SGLANG_CACHE_DIT_RDT=0.4 SGLANG_CACHE_DIT_MC=4 SGLANG_CACHE_DIT_TAYLORSEER=true SGLANG_CACHE_DIT_TS_ORDER=2 SGLANG_CACHE_DIT_ENABLED=true sglang generate --model-path ./weights/Wan2.2-T2V-A14B-Diffusers-w8a8/ --prompt "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage." --height 720 --width 1280 --tp-size 2 --sp-degree 2 --num-gpus 4 --num-frames 81 --num-inference-steps 40 --warmup true

Two_anthropomorphic_cats_in_comfy_boxing_gear_and_bright_gloves_fight_intensely_on_a_spotlighted_sta_20260507-001126_b2e8ddbd.mp4

Speed Tests and Profiling

Checklist

Format your code according to the Format code with pre-commit.
Add unit tests according to the Run and add unit tests.
Update documentation according to Write documentations.
Provide accuracy and speed benchmark results according to Test the accuracy and Benchmark the speed.
Follow the SGLang code style guidance.

Review and Merge Process

Ping Merge Oncalls to start the process. See the PR Merge Process.
Get approvals from CODEOWNERS and other reviewers.
Trigger CI tests with comments or contact authorized users to do so.
- Common commands include /tag-and-rerun-ci, /tag-run-ci-label, /rerun-failed-ci
After green CI and required approvals, ask Merge Oncalls or people with Write permission to merge the PR.

gemini-code-assist · 2026-05-06T15:09:15Z

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

gemini-code-assist · 2026-05-06T19:16:05Z

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

OrangeRedeng · 2026-05-07T14:06:00Z

/gemini review

OrangeRedeng · 2026-05-07T14:06:08Z

/gemini summary

gemini-code-assist

Code Review

This pull request introduces parameter name mapping for ModelSlim quantization, allowing for more flexible layer identification during the quantization process. It also updates FSDP loading to handle additional quantization-related parameters like input_offset and deq_scale, while ensuring sharded tensors are loaded with gradients disabled. Review feedback identifies a potential TypeError in ModelSlimConfig.from_config due to a signature change lacking a default value for the new mapping parameter. Additionally, a logic error was found in _get_scheme_from_parts where the .weight suffix is omitted in the fallback path, which would cause configuration lookups to fail for standard layers.
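The two review points can be sketched as follows. SketchQuantConfig is a hypothetical stand-in, not the real ModelSlimConfig, and the ".weight" key format is an assumption based on the review text: the new mapping argument defaults to None so existing callers keep working, and the ".weight" suffix is appended whether or not a mapper is present.

```python
from typing import Callable, Optional


class SketchQuantConfig:
    """Hypothetical stand-in for a quant config; not the real ModelSlimConfig."""

    def __init__(self, description: dict,
                 mapper: Optional[Callable[[str], str]] = None):
        self.description = description
        self.mapper = mapper

    @classmethod
    def from_config(cls, config: dict,
                    mapper: Optional[Callable[[str], str]] = None):
        # Defaulting to None keeps existing one-argument callers working.
        return cls(config, mapper)

    def scheme_for(self, layer_name: str) -> Optional[str]:
        # Map when a mapper is present; otherwise use the name unchanged.
        name = self.mapper(layer_name) if self.mapper else layer_name
        # Both paths append ".weight" before the lookup.
        return self.description.get(name + ".weight")
```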

gemini-code-assist · 2026-05-07T14:13:55Z

Summary of Changes

This pull request addresses a regression in quantization scheme loading for Wan2.2 models on NPU hardware. By ensuring that parameter name mappings are correctly passed through the ModelSlim initialization flow, the system can now accurately resolve internal layer names to canonical architecture names required for quantization configuration lookups. Additionally, the PR improves FSDP robustness by expanding the allowed parameter list and adjusting tensor gradient requirements to avoid runtime errors during model loading.

Highlights

Quantization Scheme Recognition: Enabled correct recognition of Wan2.2-T2V-A14B-Diffusers-w8a8 quantization schemes on NPU by propagating reverse_param_names_mapping through the ModelSlim configuration stack.
FSDP Loading Improvements: Updated FSDP loading to include missing quantization parameter suffixes (input_offset, quant_bias, deq_scale) and marked sharded tensors with requires_grad=False to prevent meta-tensor copy errors.
Performance Diagnostics: Updated NPU performance warnings to explicitly suggest disabling CPU offloading for better compatibility.

Activity

Pull request created by OrangeRedeng.
Author requested a summary and review via automated commands.
Automated review identified potential TypeError risks in from_config and suggested improvements for name mapping logic.

ping1jing2 · 2026-05-07T17:08:45Z

/tag-and-rerun-ci

ping1jing2 · 2026-05-11T05:31:54Z

This is a Diffusion-related PR and all GPU CIs passed, so I merged it.

Co-authored-by: ronnie_zheng <zl19940307@163.com>

github-actions Bot added the documentation, quant, and diffusion labels May 6, 2026

Fix wan quantization on NPU

a1316d3

OrangeRedeng force-pushed the wan_quantization_fix branch from 03aeade to a1316d3 Compare May 6, 2026 19:15

OrangeRedeng marked this pull request as ready for review May 6, 2026 19:15

OrangeRedeng requested review from BBuf, iforgetmyname, mickqian, ping1jing2, yhyang201 and yingluosanqian as code owners May 6, 2026 19:15

Merge branch 'main' into wan_quantization_fix

b28391d

gemini-code-assist Bot reviewed May 7, 2026

Comment thread python/sglang/multimodal_gen/runtime/layers/quantization/modelslim.py Outdated

Comment thread python/sglang/multimodal_gen/runtime/layers/quantization/modelslim.py Outdated

OrangeRedeng added 2 commits May 7, 2026 17:18

Resolve comment

f214d25

Resolve comment

2743978

ping1jing2 linked an issue May 7, 2026 that may be closed by this pull request

[Bug] [NPU] Wan2.2-T2V-A14B-Diffusers-w8a8 does not work #24518

Closed

5 tasks

ping1jing2 self-assigned this May 7, 2026

github-actions Bot added the run-ci label May 7, 2026

ping1jing2 approved these changes May 7, 2026

Comment thread python/sglang/multimodal_gen/runtime/layers/quantization/modelslim.py

OrangeRedeng added 3 commits May 7, 2026 22:08

Merge branch 'main' into wan_quantization_fix

6534f1c

Return mxfp8

8070371

Update modelslim.py

15573b9

OrangeRedeng mentioned this pull request May 7, 2026

✨ [diffusion][npu][quant] Add MXFP8 quantization support for Wan2.2 Diffusion on Ascend NPU #20922

Merged

OrangeRedeng added 6 commits May 7, 2026 22:24

Fix lint

fa42790

Merge branch 'main' into wan_quantization_fix

47b93cc

Merge branch 'main' into wan_quantization_fix

b8ac18b

Update fsdp_load.py

5ade319

Update fsdp_load.py

a0ee586

Update perf_baselines_npu

1e5f68b

github-actions Bot added the npu label May 8, 2026

OrangeRedeng and others added 9 commits May 8, 2026 16:40

Update perf_baselines_npu.

7e94678

Update modelslim.py

0b62f10

Update modelslim.py

88fb620

Fix lint

81cfe78

Merge branch 'main' into wan_quantization_fix

ee670f5

Fix perf_baselines_npu.json

598512b

Merge branch 'main' into wan_quantization_fix

c97acb6

Update perf_baselines_npu.json

d92e5a1

Merge branch 'main' into wan_quantization_fix

1e2e4a5

sglang-npu-bot merged commit 9ec2880 into sgl-project:main May 11, 2026
103 of 135 checks passed

LucQueen pushed a commit to LucQueen/sglang that referenced this pull request May 12, 2026

[NPU] [Bugfix] Wan quantization fix (sgl-project#24540)

e055351

Co-authored-by: ronnie_zheng <zl19940307@163.com>

xjpang pushed a commit to xjpang/sglang that referenced this pull request May 13, 2026

[NPU] [Bugfix] Wan quantization fix (sgl-project#24540)

cda6b6a

Co-authored-by: ronnie_zheng <zl19940307@163.com>

sglang-npu-bot merged 22 commits into sgl-project:main from OrangeRedeng:wan_quantization_fix

3 participants
