[NPU] [Bugfix] Wan quantization fix #24540
03aeade to a1316d3
/gemini review
/gemini summary
Code Review
This pull request introduces parameter name mapping for ModelSlim quantization, allowing for more flexible layer identification during the quantization process. It also updates FSDP loading to handle additional quantization-related parameters like input_offset and deq_scale, while ensuring sharded tensors are loaded with gradients disabled. Review feedback identifies a potential TypeError in ModelSlimConfig.from_config due to a signature change lacking a default value for the new mapping parameter. Additionally, a logic error was found in _get_scheme_from_parts where the .weight suffix is omitted in the fallback path, which would cause configuration lookups to fail for standard layers.
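The two review findings can be illustrated with a minimal sketch. Everything in it is hypothetical: the class `QuantConfigSketch`, the helper `to_canonical`, the regex table, and the `"W8A8"` value are illustrative stand-ins, not the actual ModelSlim code. It assumes the mapping is a dict of regex patterns to replacement templates, gives the new mapping argument a default of `None` so that existing callers of `from_config` keep working, and appends `.weight` on every lookup path, including the fallback where no mapping applies.

```python
import re
from typing import Dict, Optional

# Hypothetical reverse mapping: regex over internal names -> canonical template.
REVERSE_MAPPING: Dict[str, str] = {
    r"^blocks\.(\d+)\.self_attn\.(.*)$": r"blocks.\1.attn1.\2",
}


def to_canonical(name: str, mapping: Optional[Dict[str, str]]) -> str:
    """Translate an internal layer name to its canonical name if a rule matches."""
    for pattern, template in (mapping or {}).items():
        translated, count = re.subn(pattern, template, name)
        if count:
            return translated
    return name  # fallback: no mapping given, or no rule matched


class QuantConfigSketch:
    """Illustrative stand-in for a quantization config; not the real class."""

    def __init__(
        self,
        quant_description: Dict[str, str],
        reverse_param_names_mapping: Optional[Dict[str, str]] = None,
    ) -> None:
        self.quant_description = quant_description
        self.mapping = reverse_param_names_mapping

    @classmethod
    def from_config(
        cls,
        config: Dict[str, str],
        reverse_param_names_mapping: Optional[Dict[str, str]] = None,
    ) -> "QuantConfigSketch":
        # Defaulting to None keeps callers that pass only `config` working.
        return cls(config, reverse_param_names_mapping)

    def get_scheme(self, layer_name: str) -> Optional[str]:
        canonical = to_canonical(layer_name, self.mapping)
        # ".weight" is appended on both the mapped and the fallback path.
        return self.quant_description.get(f"{canonical}.weight")


# Usage: an internal name resolves through the canonical key.
cfg = QuantConfigSketch.from_config(
    {"blocks.0.attn1.to_q.weight": "W8A8"}, REVERSE_MAPPING
)
scheme = cfg.get_scheme("blocks.0.self_attn.to_q")
```

Returning the unchanged name from `to_canonical` when nothing matches means layers that are already named canonically take the same lookup path as mapped ones.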
Summary of Changes
This pull request addresses a regression in quantization scheme loading for Wan2.2 models on NPU hardware. By ensuring that parameter name mappings are correctly passed through the ModelSlim initialization flow, the system can now accurately resolve internal layer names to canonical architecture names required for quantization configuration lookups. Additionally, the PR improves FSDP robustness by expanding the allowed parameter list and adjusting tensor gradient requirements to avoid runtime errors during model loading.
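The expanded parameter list can be pictured as a suffix check on parameter names. The sketch below is an assumption about the shape of that check, not the code in `fsdp_load.py`: the function name `is_allowed_param` and the base suffixes `weight` and `bias` are hypothetical, and only `input_offset`, `quant_bias`, and `deq_scale` come from the PR description. It does not cover the gradient-requirement change.

```python
from typing import Tuple

# Hypothetical base list; the real allow-list may contain other entries.
BASE_SUFFIXES: Tuple[str, ...] = ("weight", "bias")

# Quantization-related suffixes named in the PR description.
QUANT_SUFFIXES: Tuple[str, ...] = ("input_offset", "quant_bias", "deq_scale")

ALLOWED_SUFFIXES = frozenset(BASE_SUFFIXES + QUANT_SUFFIXES)


def is_allowed_param(param_name: str) -> bool:
    """Return True if the last dotted component of the name is allow-listed."""
    suffix = param_name.rsplit(".", 1)[-1]
    return suffix in ALLOWED_SUFFIXES
```

Matching on the last dotted component rather than with `str.endswith` avoids accepting a name such as `foo.my_weight` by accident.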
/tag-and-rerun-ci
This is a Diffusion-related PR and all GPU CIs passed, so I merged it.
Co-authored-by: ronnie_zheng <zl19940307@163.com>
Description
Fix Wan2.2-T2V-A14B-Diffusers-w8a8/w8a8/mxfp8 quant scheme recognition on NPU by threading `reverse_param_names_mapping` through the ModelSlim config stack.
Motivation
Fix of #24518. After PR #23625 reshaped quantized-diffusion prefixes, the Wan2.2-W8A8 model stopped loading (`No modelslim compatible scheme`). The bug is that `_get_scheme_from_parts` uses the model's internal layer name (e.g. `blocks.0.self_attn.to_q`) to look up the quant description, but the quant config file uses architecture-canonical names (e.g. `blocks.0.attn1.to_q`).
The WanVideo arch-config already carries a `reverse_param_names_mapping` that translates internal names back to canonical names, but the ModelSlim initialisation path silently discarded this mapping.
Modifications
- `modelslim.py`: accept `reverse_param_names_mapping` in `__init__` and `from_config`, build a mapper via `get_param_names_mapping`, and use it in `_get_scheme_from_parts` before querying `quant_model_description.json`.
- `transformer_load_utils.py`: read `reverse_param_names_mapping` from the arch config and pass it into `get_quant_config`.
- `quantization_utils.py`: accept and forward `reverse_param_names_mapping`.
- `fsdp_load.py`: add missing quant parameter suffixes (`input_offset`, `quant_bias`, `deq_scale`) to the sharding allow-list; mark sharded tensors with `requires_grad=False` to avoid meta-tensor copy errors.
- `npu/utils.py`: update a performance warning to mention `--dit-cpu-offload false`.
Accuracy Tests
Command:
SGLANG_CACHE_DIT_FN=2 SGLANG_CACHE_DIT_BN=1 SGLANG_CACHE_DIT_WARMUP=4 SGLANG_CACHE_DIT_RDT=0.4 SGLANG_CACHE_DIT_MC=4 SGLANG_CACHE_DIT_TAYLORSEER=true SGLANG_CACHE_DIT_TS_ORDER=2 SGLANG_CACHE_DIT_ENABLED=true sglang generate --model-path ./weights/Wan2.2-T2V-A14B-Diffusers-w8a8/ --prompt "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage." --height 720 --width 1280 --tp-size 2 --sp-degree 2 --num-gpus 4 --num-frames 81 --num-inference-steps 40 --warmup true
Two_anthropomorphic_cats_in_comfy_boxing_gear_and_bright_gloves_fight_intensely_on_a_spotlighted_sta_20260507-001126_b2e8ddbd.mp4
Speed Tests and Profiling
Checklist
Review and Merge Process
/tag-and-rerun-ci, /tag-run-ci-label, /rerun-failed-ci