
[Model] Adapt Wan2.2-I2V-A14B via LightX2V offline conversion path #2134

Merged
gcanlin merged 12 commits into vllm-project:main from Celeste-jq:wan22-lightx2v-upstream-main on Apr 13, 2026

Conversation

@Celeste-jq (Contributor) commented Mar 24, 2026

Purpose

Adapt Wan-AI/Wan2.2-I2V-A14B to the existing vLLM-Omni Diffusers runtime path via an offline conversion workflow, instead of extending the runtime protocol for online dual-LoRA loading.

This addresses the integration path discussed in #2093.

What this PR changes

  1. Add offline assembly helper for Wan2.2 LightX2V outputs:

    • tools/wan22/assemble_lightx2v_wan22_i2v_diffusers.py
    • Supports both:
      • single-file checkpoints
      • sharded checkpoints (*.index.json + shards)
  2. Add Wan2.2 loader compatibility for converted checkpoint key variants:

    • vllm_omni/diffusion/models/wan2_2/wan2_2_transformer.py
    • Accept alias mapping (a sketch follows this list):
      • blocks.N.modulation -> blocks.N.scale_shift_table
  3. Add Wan2.2 sampling controls needed by LightX2V-distilled setups:

    • sample_solver switch (unipc / euler)
    • request-level flow_shift support
    • request/engine boundary_ratio plumbing in default diffusion stage path
    • Wan-specific Euler scheduler implementation (vllm_omni/diffusion/models/wan2_2/scheduling_wan_euler.py)
    • non-CUDA compatibility fix in Wan Euler scheduler init
    • touched files:
      • vllm_omni/diffusion/models/wan2_2/pipeline_wan2_2.py
      • vllm_omni/diffusion/models/wan2_2/pipeline_wan2_2_i2v.py
      • vllm_omni/diffusion/models/wan2_2/pipeline_wan2_2_ti2v.py
      • vllm_omni/engine/async_omni_engine.py
      • examples/offline_inference/image_to_video/image_to_video.py
  4. Add user guide and nav entry:

    • docs/user_guide/examples/offline_inference/wan22_i2v_lightx2v_conversion.md
    • docs/.nav.yml
  5. Include a docs build compatibility fix that is currently part of this branch:

    • mkdocs.yml inventory URL adjustment
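
For reference, here is a minimal sketch of the loader-side alias mapping from item 2. The names are illustrative; the real logic lives in wan2_2_transformer.py:

import re

# Converted LightX2V checkpoints may carry the per-block modulation table as
# `blocks.N.modulation`, while the Diffusers-style transformer expects
# `blocks.N.scale_shift_table`. Rename matching keys; pass others through.
_ALIAS = re.compile(r"^(blocks\.\d+)\.modulation$")

def remap_wan22_keys(state_dict: dict) -> dict:
    return {_ALIAS.sub(r"\1.scale_shift_table", k): v
            for k, v in state_dict.items()}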

Why this approach

  • Minimal runtime impact: keep standard Diffusers loading path.
  • Lower regression risk than introducing online dual-LoRA runtime support.
  • Practical for current upstream architecture and easier to maintain.

Euler note for LightX2V Distill

For Wan2.2-I2V-A14B + Wan2.2-Distill-Loras (4-step distilled LoRAs), sample_solver=euler is important for quality stability.

In practice on this setup:

  • num_inference_steps=4
  • sample_solver=euler
  • flow_shift=12 (for 480p)
  • guidance_scale=1.0
  • guidance_scale_high=1.0
  • boundary_ratio=0.875

produces significantly better visual quality than using the previous default sampling setup.
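
To make the Euler path concrete, here is a minimal sketch of a flow-matching Euler step with the flow_shift warp and the boundary_ratio expert switch. The function names and the uniform base schedule are illustrative assumptions, not the actual scheduling_wan_euler.py implementation:

import torch

def shifted_sigmas(num_steps: int, flow_shift: float) -> torch.Tensor:
    # Uniform sigmas from 1 -> 0, warped by the flow shift:
    # sigma' = shift * sigma / (1 + (shift - 1) * sigma).
    # A larger shift spends more of the trajectory in the high-noise regime.
    sigmas = torch.linspace(1.0, 0.0, num_steps + 1)
    return flow_shift * sigmas / (1.0 + (flow_shift - 1.0) * sigmas)

def euler_step(x: torch.Tensor, velocity: torch.Tensor,
               sigmas: torch.Tensor, i: int) -> torch.Tensor:
    # One first-order update along the predicted velocity field.
    return x + (sigmas[i + 1] - sigmas[i]) * velocity

def pick_expert(sigma: float, boundary_ratio: float, high_noise, low_noise):
    # Wan2.2 routes noisy (high-sigma) steps to the high-noise transformer
    # and the remaining steps to the low-noise one.
    return high_noise if sigma >= boundary_ratio else low_noise

Under this illustrative schedule with num_inference_steps=4 and flow_shift=12, the sigmas come out to roughly [1.0, 0.973, 0.923, 0.8, 0.0], so with boundary_ratio=0.875 the first three steps hit the high-noise expert and the last one the low-noise expert.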

Test Plan

python converter.py \
    --source /home/xx/Wan-AI/Wan2.2-I2V-A14B/high_noise_model \
    --output /home/xx/Wan-AI/wan22_lightx2v/high_noise_out \
    --output_ext .safetensors \
    --output_name diffusion_pytorch_model \
    --model_type wan_dit \
    --direction forward \
    --lora_path /home/xx/Wan-AI/Wan2.2-Distill-Loras/wan2.2_i2v_A14b_high_noise_lora_rank64_lightx2v_4step_1022.safetensors \
    --lora_key_convert auto \
    --single_file --device cpu

python converter.py \
    --source /home/xx/Wan-AI/Wan2.2-I2V-A14B/low_noise_model \
    --output /home/xx/Wan-AI/wan22_lightx2v/low_noise_out \
    --output_ext .safetensors \
    --output_name diffusion_pytorch_model \
    --model_type wan_dit \
    --direction forward \
    --lora_path /home/xx/Wan-AI/Wan2.2-Distill-Loras/wan2.2_i2v_A14b_low_noise_lora_rank64_lightx2v_4step_1022.safetensors \
    --lora_key_convert auto \
    --single_file --device cpu

python tools/wan22/assemble_lightx2v_wan22_i2v_diffusers.py \
    --diffusers-skeleton /home/xx/Wan-AI/Wan2.2-I2V-A14B-Diffusers \
    --high-noise-weight /home/xx/Wan-AI/wan22_lightx2v/high_noise_out \
    --low-noise-weight /home/xx/Wan-AI/wan22_lightx2v/low_noise_out \
    --output-dir /home/xx/Wan-AI/Wan2.2-I2V-A14B-LightX2V-Diffusers

python examples/offline_inference/image_to_video/image_to_video.py \
    --model /home/xx/Wan-AI/Wan2.2-I2V-A14B-LightX2V-Diffusers \
    --image /home/xx/vllm_public_assets/images.jpg \
    --prompt "A cat playing with yarn" \
    --num-frames 81 --num-inference-steps 4 \
    --tensor-parallel-size 4 --height 480 --width 832 \
    --flow-shift 12 --sample-solver euler \
    --guidance-scale 1.0 --guidance-scale-high 1.0 \
    --boundary-ratio 0.875

Test Result

  • Environment: Ascend NPU
  • Model loaded successfully: /home/xx/Wan-AI/Wan2.2-I2V-A14B-LightX2V-Diffusers
  • Generation completed successfully and output video saved: i2v_output.mp4
  • E2E generation time: 38.78s
  • Step profile: 4 steps, around 4.70s/step
  • Peak memory (request): 38.06 GB reserved, 33.94 GB allocated

Raw execution log:

INFO 03-25 02:42:13 [async_omni_diffusion.py:154] AsyncOmniDiffusion initialized with model: /home/xx/Wan-AI/Wan2.2-I2V-A14B-LightX2V-Diffusers, batch_size: 1
INFO 03-25 02:42:13 [stage_diffusion_client.py:53] [StageDiffusionClient] Stage-0 initialized (batch_size=1)
INFO 03-25 02:42:13 [async_omni_engine.py:484] [AsyncOmniEngine] Stage 0 initialized (diffusion, batch_size=1)
INFO 03-25 02:42:13 [orchestrator.py:158] [Orchestrator] Starting event loop
INFO 03-25 02:42:13 [async_omni_engine.py:287] [AsyncOmniEngine] Orchestrator ready with 1 stages
INFO 03-25 02:42:13 [omni_base.py:105] [Omni] AsyncOmniEngine initialized in 65.11 seconds
INFO 03-25 02:42:13 [omni_base.py:120] [Omni] Initialized with 1 stages for model /home/xx/Wan-AI/Wan2.2-I2V-A14B-LightX2V-Diffusers

============================================================
Generation Configuration:
  Model: /home/xx/Wan-AI/Wan2.2-I2V-A14B-LightX2V-Diffusers
  Inference steps: 4
  Frames: 81
  Solver: euler
  Parallel configuration: cfg_parallel_size=1, tensor_parallel_size=4, vae_patch_parallel_size=1
  Video size: 832x480
============================================================

/home/xx/vllm-omni-wan/.venv/lib/python3.11/site-packages/torch_npu/utils/storage.py:43: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  if self.device.type != 'cpu':
INFO 03-25 02:42:13 [orchestrator.py:584] [Orchestrator] _handle_add_request: stage=0 req=0_8456df36-19f3-4fa7-a5c5-a3210a20e285 prompt_type=dict original_prompt_type=dict final_stage=0 num_sampling_params=1
Processed prompts:   0%|                                                                                                                                                                                       | 0/1 [00:00<?, ?it/s]INFO 03-25 02:42:13 [diffusion_engine.py:96] Pre-processing completed in 0.0247 seconds
INFO 03-25 02:42:13 [manager.py:608] Deactivating all adapters: 0 layers
INFO 03-25 02:42:13 [manager.py:608] Deactivating all adapters: 0 layers
WARNING 03-25 02:42:13 [kv_transfer_manager.py:381] No connector available for receiving KV cache
INFO 03-25 02:42:13 [manager.py:608] Deactivating all adapters: 0 layers
INFO 03-25 02:42:13 [manager.py:608] Deactivating all adapters: 0 layers
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:18<00:00,  4.70s/it]
INFO 03-25 02:42:46 [diffusion_model_runner.py:212] Peak GPU memory (this request): 38.06 GB reserved, 33.94 GB allocated, 4.13 GB pool overhead (10.8%)
INFO 03-25 02:42:52 [diffusion_engine.py:104] Generation completed successfully.
INFO 03-25 02:42:52 [diffusion_engine.py:137] Post-processing completed in 0.1528 seconds
INFO 03-25 02:42:52 [diffusion_engine.py:140] DiffusionEngine.step breakdown: preprocess=24.69 ms, add_req_and_wait=38592.13 ms, postprocess=152.84 ms, total=38770.68 ms
Processed prompts: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:38<00:00, 38.77s/it]INFO 03-25 02:42:52 [omni_base.py:161] [Summary] {}
Processed prompts: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:38<00:00, 38.77s/it]
Total generation time: 38.7800 seconds (38779.99 ms)
Saved generated video to i2v_output.mp4
INFO 03-25 02:42:53 [async_omni_engine.py:1103] [AsyncOmniEngine] Shutting down Orchestrator
INFO 03-25 02:42:53 [orchestrator.py:210] [Orchestrator] Received shutdown signal
INFO 03-25 02:42:53 [orchestrator.py:820] [Orchestrator] Shutting down all stages
INFO 03-25 02:42:53 [diffusion_worker.py:460] Worker 0: Received shutdown message
INFO 03-25 02:42:53 [diffusion_worker.py:481] event loop terminated.
INFO 03-25 02:42:53 [diffusion_worker.py:460] Worker 1: Received shutdown message
INFO 03-25 02:42:53 [diffusion_worker.py:481] event loop terminated.
INFO 03-25 02:42:53 [diffusion_worker.py:460] Worker 3: Received shutdown message
INFO 03-25 02:42:53 [diffusion_worker.py:481] event loop terminated.
INFO 03-25 02:42:53 [diffusion_worker.py:460] Worker 2: Received shutdown message
INFO 03-25 02:42:53 [diffusion_worker.py:481] event loop terminated.
INFO 03-25 02:42:54 [diffusion_worker.py:516] Worker 0: Shutdown complete.
INFO 03-25 02:42:54 [diffusion_worker.py:516] Worker 1: Shutdown complete.
INFO 03-25 02:42:54 [diffusion_worker.py:516] Worker 2: Shutdown complete.
INFO 03-25 02:42:54 [diffusion_worker.py:516] Worker 3: Shutdown complete.


@chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 23637f6a10


Comment on lines +93 to +94:

if path.is_file():
    return WeightSpec(kind="single", single_file=path)

P1: Validate file inputs before treating them as a single checkpoint

_resolve_weight_spec classifies any existing file as a single weight file, so a sharded index file such as diffusion_pytorch_model.safetensors.index.json (a plausible "checkpoint file" per the CLI help) is accepted as kind="single". On that path the assembler copies only the index JSON, skips shard validation, and _validate_output still reports success, producing an output directory that is missing the actual tensor shards and cannot be loaded. Please special-case *.index.json as sharded input (with shard checks) or reject non-weight file extensions.
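
A minimal sketch of the suggested special-casing, reusing the WeightSpec shape from the snippet above (the index_file/shards fields and the elided directory handling are illustrative assumptions, not the merged fix):

import json
from pathlib import Path

def _resolve_weight_spec(path: Path) -> "WeightSpec":
    if path.is_file():
        if path.name.endswith(".index.json"):
            # Sharded checkpoint: verify every shard listed in the
            # index's weight_map exists next to the index file.
            weight_map = json.loads(path.read_text())["weight_map"]
            shards = sorted(set(weight_map.values()))
            missing = [s for s in shards if not (path.parent / s).is_file()]
            if missing:
                raise FileNotFoundError(f"index references missing shards: {missing}")
            return WeightSpec(kind="sharded", index_file=path, shards=shards)
        if path.suffix not in {".safetensors", ".bin", ".pt", ".pth"}:
            raise ValueError(f"not a recognized weight file: {path}")
        return WeightSpec(kind="single", single_file=path)
    ...  # directory handling elided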


@Fishermanykx (Contributor):

Thanks for your contribution. Could you also attach the generated video result for this PR?

@lishunyang12 (Collaborator) commented Mar 24, 2026

Can you provide visual output and relevant metrics compared with the existing Wan2.2-I2V-A14B solution?

@@ -0,0 +1,140 @@
# Wan2.2 I2V LightX2V Conversion
@lishunyang12 (Collaborator), Mar 24, 2026:

Can you put this file under img2video section instead of creating a standalone page?

Reply (Collaborator):

@lishunyang12 Oh, moving it to image_to_video/README.md leads to minor problems in CI. Let's find a better place for it.

@Celeste-jq Celeste-jq changed the title [Model] Adapt Wan2.2-I2V-A14B via LightX2V offline conversion path [WIP] [Model] Adapt Wan2.2-I2V-A14B via LightX2V offline conversion path Mar 25, 2026
@Celeste-jq Celeste-jq changed the title [WIP] [Model] Adapt Wan2.2-I2V-A14B via LightX2V offline conversion path [Model] Adapt Wan2.2-I2V-A14B via LightX2V offline conversion path Mar 25, 2026
@congw729 (Collaborator), Mar 25, 2026:

Hi, the doc for your model example should be placed under examples/offline_inference/image_to_video/; then run mkdocs serve (it will auto-generate docs/user_guide/examples/offline_inference/wan22_i2v_lightx2v_conversion.md for you).
You can also check whether the page layout is acceptable on the local mkdocs server at http://127.0.0.1:8000/vllm-omni/.

Reply (Contributor Author):

Thank you. I've updated this according to your suggestion, but I'm not sure I'm doing this right. Please check.

@Celeste-jq Celeste-jq force-pushed the wan22-lightx2v-upstream-main branch from cbb4bb9 to c360806 Compare March 25, 2026 07:40
@Celeste-jq (Contributor Author) commented Mar 25, 2026

Input image: images.jpg (attached)


Wan2.2-I2V-A14B-Diffusers (baseline)

  • test
    python examples/offline_inference/image_to_video/image_to_video.py \
        --model /home/xx/Wan-AI/Wan2.2-I2V-A14B-Diffusers \
        --image /home/xx/vllm_public_assets/images.jpg \
        --prompt "A cat playing with yarn" \
        --num-frames 81 --num-inference-steps 10 \
        --tensor-parallel-size 4 --height 480 --width 832
  • result
============================================================
Generation Configuration:
  Model: /home/xx/Wan-AI/Wan2.2-I2V-A14B-Diffusers
  Inference steps: 10
  Frames: 81
  Solver: unipc
  Parallel configuration: cfg_parallel_size=1, tensor_parallel_size=4, vae_patch_parallel_size=1
  Video size: 832x480
============================================================
INFO 03-25 05:57:35 [diffusion_model_runner.py:212] Peak GPU memory (this request): 38.02 GB reserved, 33.96 GB allocated, 4.05 GB pool overhead (10.7%)
INFO 03-25 05:57:41 [diffusion_engine.py:104] Generation completed successfully.
INFO 03-25 05:57:41 [diffusion_engine.py:137] Post-processing completed in 0.1882 seconds
INFO 03-25 05:57:41 [diffusion_engine.py:140] DiffusionEngine.step breakdown: preprocess=26.33 ms, add_req_and_wait=136750.86 ms, postprocess=188.16 ms, total=136966.08 ms
Total generation time: 136.9741 seconds (136974.08 ms)
i2v_output-diffuser325.mp4

Wan2.2-I2V-A14B-LightX2V-Diffusers (Wan2.2-I2V-A14B-Lightning+Wan2.2-Distill-Loras)

  • test
    python examples/offline_inference/image_to_video/image_to_video.py \
        --model /home/xx/Wan-AI/Wan2.2-I2V-A14B-LightX2V-Diffusers-Lightning \
        --image /home/xx/vllm_public_assets/images.jpg \
        --prompt "A cat playing with yarn" \
        --num-frames 81 --num-inference-steps 4 \
        --tensor-parallel-size 4 --height 480 --width 832 \
        --flow-shift 12 --sample-solver euler \
        --guidance-scale 1.0 --guidance-scale-high 1.0 \
        --boundary-ratio 0.875
  • result
============================================================
Generation Configuration:
  Model: /home/xx/Wan-AI/Wan2.2-I2V-A14B-LightX2V-Diffusers-Lightning
  Inference steps: 4
  Frames: 81
  Solver: euler
  Parallel configuration: cfg_parallel_size=1, tensor_parallel_size=4, vae_patch_parallel_size=1
  Video size: 832x480
============================================================
INFO 03-25 06:10:03 [diffusion_model_runner.py:212] Peak GPU memory (this request): 38.06 GB reserved, 33.94 GB allocated, 4.13 GB pool overhead (10.8%)
INFO 03-25 06:10:09 [diffusion_engine.py:104] Generation completed successfully.
INFO 03-25 06:10:10 [diffusion_engine.py:137] Post-processing completed in 0.1675 seconds
INFO 03-25 06:10:10 [diffusion_engine.py:140] DiffusionEngine.step breakdown: preprocess=29.09 ms, add_req_and_wait=38545.65 ms, postprocess=167.50 ms, total=38743.63 ms
Total generation time: 38.7537 seconds (38753.66 ms)
i2v_output-lightning-lora.mp4

Wan2.2-I2V-A14B-LightX2V-Diffusers (Wan2.2-I2V-A14B+Wan2.2-Distill-Loras)

  • test
    python examples/offline_inference/image_to_video/image_to_video.py \
        --model /home/xx/Wan-AI/Wan2.2-I2V-A14B-LightX2V-Diffusers-Base \
        --image /home/xx/vllm_public_assets/images.jpg \
        --prompt "A cat playing with yarn" \
        --num-frames 81 --num-inference-steps 4 \
        --tensor-parallel-size 4 --height 480 --width 832 \
        --flow-shift 12 --sample-solver euler \
        --guidance-scale 1.0 --guidance-scale-high 1.0 \
        --boundary-ratio 0.875
  • result
============================================================
Generation Configuration:
  Model: /home/xx/Wan-AI/Wan2.2-I2V-A14B-LightX2V-Diffusers-Base
  Inference steps: 4
  Frames: 81
  Solver: euler
  Parallel configuration: cfg_parallel_size=1, tensor_parallel_size=4, vae_patch_parallel_size=1
  Video size: 832x480
============================================================
INFO 03-25 06:33:07 [diffusion_model_runner.py:212] Peak GPU memory (this request): 38.06 GB reserved, 33.94 GB allocated, 4.13 GB pool overhead (10.8%)
INFO 03-25 06:33:13 [diffusion_engine.py:104] Generation completed successfully.
INFO 03-25 06:33:13 [diffusion_engine.py:137] Post-processing completed in 0.1704 seconds
INFO 03-25 06:33:13 [diffusion_engine.py:140] DiffusionEngine.step breakdown: preprocess=27.21 ms, add_req_and_wait=38526.15 ms, postprocess=170.39 ms, total=38724.81 ms
Total generation time: 38.7337 seconds (38733.73 ms)
i2v_output-base-lora.mp4
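
Net effect on these runs: the 4-step distilled Euler setups finish in about 38.7 s end-to-end versus 137.0 s for the 10-step UniPC baseline (roughly a 3.5x speedup), at essentially the same peak memory (38.06 GB vs 38.02 GB reserved).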

@Celeste-jq Celeste-jq force-pushed the wan22-lightx2v-upstream-main branch 3 times, most recently from 11d4f43 to 1b7274c Compare March 26, 2026 02:11
@Celeste-jq Celeste-jq force-pushed the wan22-lightx2v-upstream-main branch from 1b7274c to 80c8a18 Compare March 27, 2026 01:10

@Fishermanykx (Contributor):

LGTM

- Base model: `Wan-AI/Wan2.2-I2V-A14B`
- Diffusers skeleton: `Wan-AI/Wan2.2-I2V-A14B-Diffusers`
- LoRA weights: `lightx2v/Wan2.2-Distill-Loras`
- LightX2V converter: `tools/convert/converter.py`
Comment (Collaborator):

We don't have this file.

Reply (Contributor Author):
done

@@ -0,0 +1,339 @@
#!/usr/bin/env python3
Comment (Collaborator):
This tool seems to only work with LoRA weights. Is it possible to decouple the LoRA part and make it work for both LoRA and non-LoRA checkpoints?

Reply (Contributor Author):
done

Use a single assemble_wan22_i2v_diffusers.py tool that supports both new and legacy LightX2V argument names, and update README guidance to avoid referencing converter files in this repo.

Made-with: Cursor
Signed-off-by: Celeste-jq <591998922@qq.com>
@Celeste-jq Celeste-jq force-pushed the wan22-lightx2v-upstream-main branch from db6c171 to a838e33 Compare March 31, 2026 08:41
@Celeste-jq (Contributor Author):

@gcanlin PTAL

@congw729 (Collaborator) commented Apr 1, 2026

This PR modified vllm_omni/diffusion/models/wan2_2/wan2_2_transformer.py; we highly suggest adding the nightly-test label to trigger L4-level testing before merge. If you need help adding the label, please let me know.

@gcanlin (Collaborator) commented Apr 1, 2026

> This PR modified vllm_omni/diffusion/models/wan2_2/wan2_2_transformer.py; we highly suggest adding the nightly-test label to trigger L4-level testing before merge.

Agree. We need it.

@gcanlin gcanlin added the nightly-test label to trigger buildkite nightly test CI label Apr 2, 2026
@gcanlin (Collaborator) commented Apr 2, 2026

@wtomin @SamitHuang PTAL

@Celeste-jq Celeste-jq force-pushed the wan22-lightx2v-upstream-main branch 4 times, most recently from c3f5635 to 2bd60c8 Compare April 3, 2026 01:34
@Celeste-jq Celeste-jq force-pushed the wan22-lightx2v-upstream-main branch from 2bd60c8 to 4632284 Compare April 3, 2026 02:49
@wtomin (Collaborator) left a comment:

LGTM.

Merged main into the branch; conflicts resolved in vllm_omni/engine/async_omni_engine.py.
@Fishermanykx (Contributor):

@gcanlin PTAL

@gcanlin gcanlin merged commit 8097747 into vllm-project:main Apr 13, 2026
6 checks passed
daixinning pushed a commit to daixinning/vllm-omni that referenced this pull request Apr 13, 2026
lengrongfu pushed a commit to lengrongfu/vllm-omni that referenced this pull request May 1, 2026
clodaghwalsh17 pushed a commit to clodaghwalsh17/nm-vllm-omni-ent that referenced this pull request May 12, 2026

Labels

nightly-test label to trigger buildkite nightly test CI
