[AutoRound] Support WAN2.2 W4A16 quantization model by lvliang-intel · Pull Request #3353 · vllm-project/vllm-omni

lvliang-intel · 2026-05-05T13:06:02Z

Purpose

Add AutoRound W4A16 (4-bit weights, 16-bit activations) quantization support for the Wan2.2 pipelines and their transformer modules. Pre-quantized checkpoints:
https://huggingface.co/Intel/Wan2.2-TI2V-5B-Diffusers-int4-AutoRound
https://huggingface.co/Intel/Wan2.2-I2V-A14B-Diffusers-int4-AutoRound
https://huggingface.co/Intel/Wan2.2-T2V-A14B-Diffusers-int4-AutoRound

Related: #1325, #1777, #2670

Test Plan

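Run the unit tests.
Run the VBench accuracy evaluation on four dimensions (subject consistency, background consistency, aesthetic quality, imaging quality) for the original and Int4 AutoRound checkpoints.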

Test Result

Raw Scores

Dimension	Wan2.2-I2V-A14B-Diffusers	Wan2.2-I2V-A14B-Diffusers-Int4-AutoRound	Wan2.2-T2V-A14B-Diffusers	Wan2.2-T2V-A14B-Diffusers-Int4-AutoRound
Subject Consistency	0.9752	0.9741	0.9508	0.9578
Background Consistency	0.9704	0.9691	0.9449	0.9465
Aesthetic Quality	0.6241	0.6089	0.5730	0.5980
Imaging Quality	0.6832	0.6679	0.6623	0.6591

Aggregate by Category

Category	Wan2.2-I2V-A14B-Diffusers	Wan2.2-I2V-A14B-Diffusers-Int4-AutoRound	Wan2.2-T2V-A14B-Diffusers	Wan2.2-T2V-A14B-Diffusers-Int4-AutoRound
Consistency	0.9728	0.9716	0.9478	0.9522
Quality	0.6537	0.6384	0.6176	0.6286

Evaluated Dimension Average

Model	Dimensions Evaluated	Avg Score
Wan2.2-I2V-A14B-Diffusers	4	0.8132
Wan2.2-I2V-A14B-Diffusers-Int4-AutoRound	4	0.8050
Wan2.2-T2V-A14B-Diffusers	4	0.7827
Wan2.2-T2V-A14B-Diffusers-Int4-AutoRound	4	0.7904
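The two aggregate tables are consistent with plain unweighted means of the raw scores: Consistency averages the two consistency dimensions, Quality averages aesthetic and imaging quality, and the dimension average is the mean of all four. The PR does not state the weighting, so the sketch below is an assumption that simply recomputes those numbers from the Raw Scores table; `RAW` and `aggregate` are illustrative names.

```python
from statistics import mean

# Raw VBench scores copied from the Raw Scores table. Order (assumed):
# subject consistency, background consistency, aesthetic quality, imaging quality
RAW = {
    "Wan2.2-I2V-A14B-Diffusers": [0.9752, 0.9704, 0.6241, 0.6832],
    "Wan2.2-I2V-A14B-Diffusers-Int4-AutoRound": [0.9741, 0.9691, 0.6089, 0.6679],
    "Wan2.2-T2V-A14B-Diffusers": [0.9508, 0.9449, 0.5730, 0.6623],
    "Wan2.2-T2V-A14B-Diffusers-Int4-AutoRound": [0.9578, 0.9465, 0.5980, 0.6591],
}


def aggregate(scores):
    """Return (consistency, quality, overall) as unweighted means (assumed weighting)."""
    subject, background, aesthetic, imaging = scores
    consistency = mean([subject, background])
    quality = mean([aesthetic, imaging])
    overall = mean(scores)
    return consistency, quality, overall


for model, scores in RAW.items():
    consistency, quality, overall = aggregate(scores)
    print(f"{model}: consistency={consistency:.4f} quality={quality:.4f} avg={overall:.4f}")
```

Some recomputed values land exactly on a half unit in the fourth decimal (for example 0.94785), so they may differ from the table by one in the last digit depending on rounding.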

Generation Statistics

Model	Success Rate (%)	Avg Latency (s)	Avg Memory (MB)	Speedup vs Ref	Memory Ratio vs Ref
Wan2.2-I2V-A14B-Diffusers	100.0	377.31	76309.33	1.00x	1.00x
Wan2.2-I2V-A14B-Diffusers-Int4-AutoRound	100.0	439.68	36893.0	0.86x	0.48x
Wan2.2-T2V-A14B-Diffusers	100.0	736.09	76298.33	1.00x	1.00x
Wan2.2-T2V-A14B-Diffusers-Int4-AutoRound	100.0	863.49	36891.33	0.85x	0.48x
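The two ratio columns match Speedup = reference latency / quantized latency and Memory Ratio = quantized memory / reference memory, where the reference is the unquantized checkpoint of the same pipeline (the rows marked 1.00x). These definitions are inferred from the numbers rather than stated in the PR; a small sketch of the arithmetic, with illustrative names `STATS` and `ratios`:

```python
# (avg latency in s, avg memory in MB) copied from the Generation Statistics table
STATS = {
    "I2V": {"ref": (377.31, 76309.33), "int4": (439.68, 36893.0)},
    "T2V": {"ref": (736.09, 76298.33), "int4": (863.49, 36891.33)},
}


def ratios(ref, quant):
    """Return (speedup, memory_ratio) of a quantized run against its reference."""
    ref_latency, ref_mem = ref
    q_latency, q_mem = quant
    speedup = ref_latency / q_latency   # below 1.0 means the quantized model is slower
    memory_ratio = q_mem / ref_mem      # below 1.0 means it uses less memory
    return speedup, memory_ratio


for task, s in STATS.items():
    speedup, mem = ratios(s["ref"], s["int4"])
    print(f"{task}: speedup {speedup:.2f}x, memory ratio {mem:.2f}x")
```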

These runs mainly target accuracy. For video generation at batch size 1, Int4 W4A16 mainly reduces memory (to 0.48x of the unquantized reference, which helps fit larger models or longer videos in VRAM) but does not necessarily reduce latency: the workload is compute-bound, and dequantization adds significant overhead (0.85x to 0.86x of reference speed in the runs above).
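To illustrate where that trade-off comes from: in a W4A16 layer the weights are stored as 4-bit integer codes with per-group scales and zero points, and they must be dequantized back to 16-bit before (or inside) each matmul, while activations stay 16-bit. The NumPy sketch below assumes asymmetric round-to-nearest quantization with `group_size=128`; it is not AutoRound's algorithm (AutoRound additionally tunes the rounding with signed gradient descent) and not vLLM's fused kernel, which dequantizes on the fly instead of materializing the full weight. `quantize_w4`, `dequantize_w4` and `w4a16_linear` are hypothetical names.

```python
import numpy as np

GROUP_SIZE = 128  # assumed group size for this illustration


def quantize_w4(w: np.ndarray, group_size: int = GROUP_SIZE):
    """Round-to-nearest asymmetric 4-bit quantization of an [out, in] weight."""
    out_f, in_f = w.shape
    assert in_f % group_size == 0, "in_features must be divisible by group_size"
    groups = w.reshape(out_f, in_f // group_size, group_size)
    w_min = groups.min(axis=-1, keepdims=True)
    w_max = groups.max(axis=-1, keepdims=True)
    scale = (w_max - w_min) / 15.0              # 16 levels: codes 0..15
    scale = np.where(scale == 0, 1.0, scale)     # avoid division by zero
    zero = np.round(-w_min / scale)
    q = np.clip(np.round(groups / scale) + zero, 0, 15).astype(np.uint8)
    # Real kernels pack two 4-bit codes per byte; uint8 keeps the sketch simple.
    return q, scale.astype(np.float16), zero.astype(np.float16)


def dequantize_w4(q, scale, zero):
    """Recover a float16 [out, in] weight from 4-bit codes, scales and zero points."""
    w = (q.astype(np.float16) - zero) * scale
    return w.reshape(q.shape[0], -1)


def w4a16_linear(x_fp16, q, scale, zero):
    # This dequantization step is the extra work W4A16 adds to every forward call
    w_fp16 = dequantize_w4(q, scale, zero)
    return x_fp16 @ w_fp16.T                    # activations stay in 16-bit


rng = np.random.default_rng(0)
w = rng.standard_normal((256, 512)).astype(np.float32)
x = rng.standard_normal((4, 512)).astype(np.float16)
q, s, z = quantize_w4(w)
y = w4a16_linear(x, q, s, z)
```

Under these assumptions (a 16-bit scale and zero point per 128 weights), quantized layers store about 4 + 32/128 = 4.25 bits per weight instead of 16, roughly 0.27x; the measured end-to-end 0.48x is presumably higher because activations and any unquantized components still use 16-bit memory.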

Essential Elements of an Effective PR Description Checklist

The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
The test plan. Please provide the test scripts and test commands. Please state the reasons if your code doesn't require additional test scripts. For test file guidelines, please check the test style doc.
The test results. Please paste a before/after comparison of the results, or the e2e results.
(Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model. Please run mkdocs serve to sync the documentation edits to ./docs.
(Optional) Release notes update. If your change is user-facing, please update the release notes draft.

chatgpt-codex-connector · 2026-05-05T13:06:10Z

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
Credits must be used to enable repository wide code reviews.

hsliuustc0106 · 2026-05-05T21:11:07Z

Comprehensive benchmarks and well-structured tests. Memory reduction to 0.48x is significant for VRAM-constrained deployments. Two notes: 1) Checklist items at the bottom are unchecked - confirm documentation was updated if required. 2) Latency impact (0.86x speedup) is expected for compute-bound workloads at batch size 1, but consider documenting guidance for optimal batch sizes where dequantization overhead is amortized.

david6666666 · 2026-05-18T06:50:06Z

Merge conflicts need fixing before review. Thx.

Signed-off-by: lvliang-intel <liang1.lv@intel.com>

david6666666 · 2026-05-19T08:59:02Z

LGTM now

david6666666 · 2026-05-19T09:00:40Z

@yenuo26 please check test

Signed-off-by: lvliang-intel <liang1.lv@intel.com>

lvliang-intel requested a review from hsliuustc0106 as a code owner May 5, 2026 13:06

lvliang-intel force-pushed the feats/ar-w4a16-wan22 branch from 8ccefe8 to cbfbe97 May 5, 2026 13:16

lvliang-intel force-pushed the feats/ar-w4a16-wan22 branch from cbfbe97 to 3a2cde7 May 6, 2026 01:49

yiliu30 mentioned this pull request May 7, 2026:
[RFC]: Intel Auto-Round x vLLM-Omni Quantization Support (2026 H1) #1325 (open)

This was referenced May 8, 2026:
[RFC]: Continuous Quantization Support #1854 (open)
[RFC] [0.22.0]: Quantization Support JiusiServe/vllm-omni#182 (open)

Gaohan123 added this to the v0.22.0 milestone May 11, 2026

lvliang-intel added 7 commits May 19, 2026 11:24

support autoround w4a16 for wan2.2

6c6f3b1

Signed-off-by: lvliang-intel <liang1.lv@intel.com>

fix stage diffusion proc

4a43993

Signed-off-by: lvliang-intel <liang1.lv@intel.com>

fix i2v

bd786ae

Signed-off-by: lvliang-intel <liang1.lv@intel.com>

snapshot sys.modules before iteration to prevent RuntimeError

452e2e1

Signed-off-by: lvliang-intel <liang1.lv@intel.com>

add test

27d744a

Signed-off-by: lvliang-intel <liang1.lv@intel.com>

fix pre-commit

bf286bb

Signed-off-by: lvliang-intel <liang1.lv@intel.com>

update doc

7616e04

Signed-off-by: lvliang-intel <liang1.lv@intel.com>
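The commit "snapshot sys.modules before iteration to prevent RuntimeError" (452e2e1) refers to a standard Python pitfall: if code running inside a loop over `sys.modules` imports a new module, the dict changes size mid-iteration and CPython raises `RuntimeError: dictionary changed size during iteration`. The diff is not shown on this page, so the following is only a generic sketch of the pattern; `collect_modules` and its `import_during_scan` hook are hypothetical names.

```python
import importlib
import sys


def collect_modules(prefix, import_during_scan=None):
    """Return names of loaded modules whose name starts with `prefix`.

    `import_during_scan` is a hypothetical hook standing in for any code in the
    loop body that might import a new module (and thus add to sys.modules).
    """
    matches = []
    # list(...) takes a snapshot; iterating sys.modules.items() directly would
    # raise RuntimeError as soon as an import inside the loop adds an entry.
    for name, module in list(sys.modules.items()):
        if module is not None and name.startswith(prefix):
            matches.append(name)
        if import_during_scan is not None:
            importlib.import_module(import_during_scan)
    return matches


import json  # ensure at least one matching module is loaded

print(collect_modules("json", import_during_scan="colorsys"))
```

Copying the items costs one pass over the dict, which is negligible next to the imports it guards against.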

lvliang-intel force-pushed the feats/ar-w4a16-wan22 branch from 3a2cde7 to 7616e04 May 19, 2026 03:28

lvliang-intel requested review from Gaohan123, Isotr0py, RuixiangMa, SamitHuang, ZJY0516, david6666666, princepride, wtomin and yenuo26 as code owners May 19, 2026 03:28

lvliang-intel added 3 commits May 19, 2026 11:29

Merge branch 'main' into feats/ar-w4a16-wan22

224c031

remove unnecessary import

df49f4a

Signed-off-by: lvliang-intel <liang1.lv@intel.com>

Merge branch 'main' into feats/ar-w4a16-wan22

c3af8fc

fix lint

9cdf385

Signed-off-by: lvliang-intel <liang1.lv@intel.com>

david6666666 added the ready label (triggers Buildkite CI) May 19, 2026

yenuo26 reviewed May 19, 2026

adapt test code according to comments

2a8ea43

Signed-off-by: lvliang-intel <liang1.lv@intel.com>

lvliang-intel requested review from congw729 and lishunyang12 as code owners May 19, 2026 14:40

Merge branch 'main' into feats/ar-w4a16-wan22

459579b

lvliang-intel wants to merge 13 commits into vllm-project:main from lvliang-intel:feats/ar-w4a16-wan22

5 participants