vllm-project · gcanlin · Apr 13, 2026 · Mar 24, 2026 · Mar 24, 2026 · Mar 25, 2026
@@ -56,6 +56,92 @@ outputs = omni.generate(
 !!! note "Server-side Path Requirement"
     The LoRA adapter path (`local_path`) must be readable on the **server** machine. If your client and server are on different machines, ensure the LoRA adapter is accessible via a shared mount or copied to the server.
 
+## Wan2.2 LightX2V Offline Assembly
+
+This workflow is LoRA-adjacent: instead of loading LoRA adapters at runtime, it
+uses the external LightX2V converter plus `Wan2.2-Distill-Loras` to bake the
+LoRA weights into converted Wan2.2 I2V checkpoints, which are then assembled
+into a local Diffusers directory.
+
+### Required assets
+
+- Base model: `Wan-AI/Wan2.2-I2V-A14B`
+- Diffusers skeleton: `Wan-AI/Wan2.2-I2V-A14B-Diffusers`
+- Optional external converter from the LightX2V project (not shipped in this repository)
+- Optional LoRA weights: `lightx2v/Wan2.2-Distill-Loras`
+
+### Step 1 (optional): Convert high/low-noise DiT weights with LightX2V
+
+Install or clone LightX2V from the upstream repository
+(`https://github.com/ModelTC/LightX2V`). After cloning, the converter used
+below is available at `<lightx2v_root>/tools/convert/converter.py`.
+
+```bash
+python /path/to/lightx2v/tools/convert/converter.py \
+  --source /path/to/Wan2.2-I2V-A14B/high_noise_model \
+  --output /tmp/wan22_lightx2v/high_noise_out \
+  --output_ext .safetensors \
+  --output_name diffusion_pytorch_model \
+  --model_type wan_dit \
+  --direction forward \
+  --lora_path /path/to/wan2.2_i2v_A14b_high_noise_lora_rank64_lightx2v_4step_1022.safetensors \
+  --lora_key_convert auto \
+  --single_file
+
+python /path/to/lightx2v/tools/convert/converter.py \
+  --source /path/to/Wan2.2-I2V-A14B/low_noise_model \
+  --output /tmp/wan22_lightx2v/low_noise_out \
+  --output_ext .safetensors \
+  --output_name diffusion_pytorch_model \
+  --model_type wan_dit \
+  --direction forward \
+  --lora_path /path/to/wan2.2_i2v_A14b_low_noise_lora_rank64_lightx2v_4step_1022.safetensors \
+  --lora_key_convert auto \
+  --single_file
+```
+
+If you are not using LightX2V, skip this step and either keep the original
+Diffusers weights from the skeleton or point Step 2 at any other converted
+`transformer/` and `transformer_2/` checkpoints.
+
+### Step 2: Assemble a final Diffusers-style directory
+
+```bash
+python tools/wan22/assemble_wan22_i2v_diffusers.py \
+  --diffusers-skeleton /path/to/Wan2.2-I2V-A14B-Diffusers \
+  --transformer-weight /tmp/wan22_lightx2v/high_noise_out \
+  --transformer-2-weight /tmp/wan22_lightx2v/low_noise_out \
+  --output-dir /path/to/Wan2.2-I2V-A14B-Custom-Diffusers \
+  --asset-mode symlink \
+  --overwrite
+```
+
+`--transformer-weight` and `--transformer-2-weight` are optional. If you omit
+them, the tool keeps the original weights from the Diffusers skeleton.
+
+### Step 3: Run offline inference
+
+```bash
+python examples/offline_inference/image_to_video/image_to_video.py \
+  --model /path/to/Wan2.2-I2V-A14B-Custom-Diffusers \
+  --image /path/to/input.jpg \
+  --prompt "A cat playing with yarn" \
+  --num-frames 81 \
+  --num-inference-steps 4 \
+  --tensor-parallel-size 4 \
+  --height 480 \
+  --width 832 \
+  --flow-shift 12 \
+  --sample-solver euler \
+  --guidance-scale 1.0 \
+  --guidance-scale-high 1.0 \
+  --boundary-ratio 0.875
+```
+
+Notes:
+
+- Because the converted weights are baked into a local Diffusers directory, this route requires no runtime LoRA loading in vLLM-Omni.
+- Output quality and speed depend on the replacement checkpoints and sampling parameters you choose.
+
 
 ## See Also
 

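The LoRA guide diff above talks about baking LoRA weights into a checkpoint rather than loading adapters at runtime. As a rough illustration of what "baking" means, here is a minimal sketch of the commonly used LoRA merge formula, `W' = W + (alpha / rank) * (up @ down)`, on a toy 2x2 layer. The function names, shapes, and the `alpha` value are assumptions made for this example; the LightX2V converter's real merge logic and key mapping are not shown in this PR and may differ.

```python
"""Toy illustration of merging ("baking") a LoRA update into a base weight.

Everything here is hypothetical example code, not the LightX2V converter.
"""


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def merge_lora(weight, lora_down, lora_up, alpha):
    """Return weight + (alpha / rank) * (lora_up @ lora_down).

    weight:    out_features x in_features
    lora_down: rank x in_features
    lora_up:   out_features x rank
    """
    rank = len(lora_down)
    scale = alpha / rank
    delta = matmul(lora_up, lora_down)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


# Hypothetical 2x2 layer with a rank-1 adapter.
base = [[1.0, 0.0], [0.0, 1.0]]
down = [[1.0, 2.0]]  # rank x in_features
up = [[0.5], [1.0]]  # out_features x rank
merged = merge_lora(base, down, up, alpha=1.0)
print(merged)
```

Once the update is folded into the weight like this, inference only needs the merged tensor, which is why the assembled directory can be served without any adapter loading step.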
@@ -62,12 +62,13 @@ Key arguments:
 - `--negative-prompt`: Optional list of artifacts to suppress.
 - `--boundary-ratio`: Boundary split ratio for two-stage MoE models.
 - `--flow-shift`: Scheduler flow shift (5.0 for 720p, 12.0 for 480p).
+- `--sample-solver`: Wan2.2 sampling solver. Use `unipc` for the default multistep solver, or `euler` for Lightning/Distill checkpoints.
 - `--num-inference-steps`: Number of denoising steps (default 50).
 - `--fps`: Frames per second for the saved MP4 (requires `diffusers` export_to_video).
 - `--output`: Path to save the generated video.
 - `--vae-use-slicing`: Enable VAE slicing for memory optimization.
 - `--vae-use-tiling`: Enable VAE tiling for memory optimization.
-- `--cfg-parallel-size`: set it to 2 to enable CFG Parallel. See more examples in [`user_guide`](https://github.com/vllm-project/vllm-omni/tree/main/docs/user_guide/diffusion/parallelism_acceleration.md#cfg-parallel).
+- `--cfg-parallel-size`: set it to 2 to enable CFG Parallel. See more examples in [`user_guide`](https://github.com/vllm-project/vllm-omni/tree/main/docs/user_guide/diffusion/parallelism/cfg_parallel.md).
 - `--tensor-parallel-size`: tensor parallel size (effective for models that support TP, e.g. LTX2).
 - `--enable-cpu-offload`: enable CPU offloading for diffusion models.
 - `--use-hsdp`: Enable Hybrid Sharded Data Parallel to shard model weights across GPUs.
@@ -78,6 +79,9 @@ Key arguments:
 
 > ℹ️ If you encounter OOM errors, try using `--vae-use-slicing` and `--vae-use-tiling` to reduce memory usage.
 
+For Wan2.2 LightX2V-converted local Diffusers directories and related LoRA
+assets, see the [LoRA guide](../../diffusion/lora.md#wan22-lightx2v-offline-assembly).
+
 ## Example materials
 
 ??? abstract "image_to_video.py"

@@ -59,12 +59,13 @@ Key arguments:
 - `--negative-prompt`: Optional list of artifacts to suppress.
 - `--boundary-ratio`: Boundary split ratio for two-stage MoE models.
 - `--flow-shift`: Scheduler flow shift (5.0 for 720p, 12.0 for 480p).
+- `--sample-solver`: Wan2.2 sampling solver. Use `unipc` for the default multistep solver, or `euler` for Lightning/Distill checkpoints.
 - `--num-inference-steps`: Number of denoising steps (default 50).
 - `--fps`: Frames per second for the saved MP4 (requires `diffusers` export_to_video).
 - `--output`: Path to save the generated video.
 - `--vae-use-slicing`: Enable VAE slicing for memory optimization.
 - `--vae-use-tiling`: Enable VAE tiling for memory optimization.
-- `--cfg-parallel-size`: set it to 2 to enable CFG Parallel. See more examples in [`user_guide`](../../../docs/user_guide/diffusion/parallelism_acceleration.md#cfg-parallel).
+- `--cfg-parallel-size`: set it to 2 to enable CFG Parallel. See more examples in [`user_guide`](https://github.com/vllm-project/vllm-omni/tree/main/docs/user_guide/diffusion/parallelism/cfg_parallel.md).
 - `--tensor-parallel-size`: tensor parallel size (effective for models that support TP, e.g. LTX2).
 - `--enable-cpu-offload`: enable CPU offloading for diffusion models.
 - `--use-hsdp`: Enable Hybrid Sharded Data Parallel to shard model weights across GPUs.
@@ -74,3 +75,6 @@ Key arguments:
 
 
 > ℹ️ If you encounter OOM errors, try using `--vae-use-slicing` and `--vae-use-tiling` to reduce memory usage.
+
+For Wan2.2 LightX2V-converted local Diffusers directories and related LoRA
+assets, see the [LoRA guide](../../../docs/user_guide/diffusion/lora.md#wan22-lightx2v-offline-assembly).
@@ -84,6 +84,13 @@ def parse_args() -> argparse.Namespace:
     parser.add_argument(
         "--flow-shift", type=float, default=5.0, help="Scheduler flow_shift (5.0 for 720p, 12.0 for 480p)."
     )
+    parser.add_argument(
+        "--sample-solver",
+        type=str,
+        default="unipc",
+        choices=["unipc", "euler"],
+        help="Sampling solver for Wan2.2 pipelines. Use 'euler' for Lightning/Distill setups.",
+    )
     parser.add_argument("--output", type=str, default="i2v_output.mp4", help="Path to save the video (mp4).")
     parser.add_argument("--fps", type=int, default=None, help="Frames per second for the output video.")
     parser.add_argument(
@@ -305,6 +312,7 @@ def main():
     print(f"  Model: {args.model}")
     print(f"  Inference steps: {args.num_inference_steps}")
     print(f"  Frames: {args.num_frames}")
+    print(f"  Solver: {args.sample_solver}")
     print(
         f"  Parallel configuration: cfg_parallel_size={args.cfg_parallel_size},"
         f" tensor_parallel_size={args.tensor_parallel_size}, vae_patch_parallel_size={args.vae_patch_parallel_size}"
@@ -326,9 +334,14 @@ def main():
             generator=generator,
             guidance_scale=guidance_scale,
             guidance_scale_2=args.guidance_scale_high,
+            boundary_ratio=args.boundary_ratio,
             num_inference_steps=num_inference_steps,
             num_frames=num_frames,
             frame_rate=frame_rate,
+            extra_args={
+                "sample_solver": args.sample_solver,
+                "flow_shift": args.flow_shift,
+            },
         ),
     )
     generation_end = time.perf_counter()

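The `image_to_video.py` hunks above add a `--sample-solver` flag and forward it, together with `--flow-shift`, through `extra_args`. The following standalone sketch reproduces just that argument flow so it can be tried without the rest of the script. The helper names `build_parser` and `build_extra_args` are made up for this example; only the two flags, their defaults, and the dictionary keys come from the diff.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Parser with only the two arguments relevant to solver selection."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--flow-shift", type=float, default=5.0)
    parser.add_argument(
        "--sample-solver",
        type=str,
        default="unipc",
        choices=["unipc", "euler"],
    )
    return parser


def build_extra_args(args: argparse.Namespace) -> dict:
    """Mirror the extra_args dictionary built in the diff."""
    return {"sample_solver": args.sample_solver, "flow_shift": args.flow_shift}


# Defaults: multistep solver with the 720p flow shift.
default_extra = build_extra_args(build_parser().parse_args([]))

# Lightning/Distill style invocation, as in the 480p example.
distill_extra = build_extra_args(
    build_parser().parse_args(["--sample-solver", "euler", "--flow-shift", "12"])
)

print(default_extra)
print(distill_extra)
```

Because `choices` is set on the argument, any value other than `unipc` or `euler` is rejected by `argparse` before the pipeline is started.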
@@ -26,6 +26,23 @@ The script allows overriding:
 - `CACHE_BACKEND` (default: `none`)
 - `ENABLE_CACHE_DIT_SUMMARY` (default: `0`)
 
+### Ascend / Local LightX2V Example
+
+For a local Wan2.2-LightX2V Diffusers directory on Ascend/NPU, you can start the server like this:
+
+```bash
+vllm serve /path/to/Wan2.2-I2V-A14B-LightX2V-Diffusers-Lightning \
+  --omni \
+  --port 8091 \
+  --flow-shift 12 \
+  --cfg-parallel-size 1 \
+  --ulysses-degree 4 \
+  --use-hsdp \
+  --trust-remote-code \
+  --allowed-local-media-path / \
+  --seed 42
+```
+
 ## Async Job Behavior
 
 `POST /v1/videos` is asynchronous. It creates a video job and immediately
@@ -69,10 +86,35 @@ curl -X POST http://localhost:8091/v1/videos/sync \
   -F "guidance_scale_2=1.0" \
   -F "boundary_ratio=0.875" \
   -F "flow_shift=12.0" \
+  -F 'extra_params={"sample_solver":"euler"}' \
   -F "seed=42" \
   -o sync_i2v_output.mp4
 ```
 
+For Wan Lightning/Distill checkpoints, pass `{"sample_solver":"euler"}` via `extra_params`. The default solver is `unipc`.
+
+Example matching the local LightX2V deployment above:
+
+```bash
+curl -sS -X POST http://localhost:8091/v1/videos/sync \
+  -H "Accept: video/mp4" \
+  -F "prompt=A cat playing with yarn" \
+  -F "input_reference=@/path/to/input.jpg" \
+  -F "width=832" \
+  -F "height=480" \
+  -F "num_frames=81" \
+  -F "fps=16" \
+  -F "num_inference_steps=4" \
+  -F "guidance_scale=1.0" \
+  -F "guidance_scale_2=1.0" \
+  -F "boundary_ratio=0.875" \
+  -F "seed=42" \
+  -F 'extra_params={"sample_solver":"euler"}' \
+  -o ./output.mp4
+```
+
+Use `/v1/videos/sync` if you want to write the MP4 directly to a file. `POST /v1/videos` is async and returns job metadata, not inline `b64_json`.
+
 ## Storage
 
 Generated video files are stored on local disk by the async video API.
@@ -96,6 +138,9 @@ export VLLM_OMNI_STORAGE_MAX_CONCURRENCY=8
 # Basic image-to-video generation
 bash run_curl_image_to_video.sh
 
+# Wan Lightning/Distill checkpoints
+SAMPLE_SOLVER=euler bash run_curl_image_to_video.sh
+
 # Or execute directly (OpenAI-style multipart)
 create_response=$(curl -s http://localhost:8091/v1/videos \
   -H "Accept: application/json" \
@@ -111,6 +156,7 @@ create_response=$(curl -s http://localhost:8091/v1/videos \
   -F "guidance_scale_2=1.0" \
   -F "boundary_ratio=0.875" \
   -F "flow_shift=12.0" \
+  -F 'extra_params={"sample_solver":"euler"}' \
   -F "seed=42")
 
 video_id=$(echo "$create_response" | jq -r '.id')
@@ -169,9 +215,12 @@ curl -X POST http://localhost:8091/v1/videos \
   -F "guidance_scale_2=1.0" \
   -F "boundary_ratio=0.875" \
   -F "flow_shift=12.0" \
+  -F 'extra_params={"sample_solver":"euler"}' \
   -F "seed=42"
 ```
 
+`sample_solver` is supported by Wan2.2 online serving through the existing `extra_params` field, which is merged into the pipeline `extra_args`. Use `unipc` for the default multistep solver, or `euler` for Lightning/Distill checkpoints.
+
 ## Create Response Format
 
 `POST /v1/videos` returns a job record, not inline base64 video data.

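The curl examples in the serving docs above pass the solver as a JSON string in the `extra_params` form field. For clients written in Python, the sketch below shows one way to build the same set of form fields, using `json.dumps` so the JSON is always well formed. The helper name `build_video_form` and its default values are assumptions taken from the 480p curl example; it does not send a request, and the input image would still need to be attached as `input_reference` by whichever HTTP client you use.

```python
import json


def build_video_form(prompt, sample_solver=None, **overrides):
    """Build text form fields for POST /v1/videos or /v1/videos/sync.

    Hypothetical helper: defaults copy the 480p Lightning example.
    """
    fields = {
        "prompt": prompt,
        "width": "832",
        "height": "480",
        "num_frames": "81",
        "num_inference_steps": "4",
        "guidance_scale": "1.0",
        "guidance_scale_2": "1.0",
        "boundary_ratio": "0.875",
        "seed": "42",
    }
    # Multipart form values are strings, so stringify any overrides.
    fields.update({key: str(value) for key, value in overrides.items()})
    if sample_solver is not None:
        # extra_params carries a JSON object encoded as a string.
        fields["extra_params"] = json.dumps({"sample_solver": sample_solver})
    return fields


form = build_video_form("A cat playing with yarn", sample_solver="euler", fps=16)
print(form["extra_params"])
```

Leaving `sample_solver` unset omits `extra_params` entirely, in which case the docs state that the server falls back to `unipc`.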
@@ -7,6 +7,7 @@ INPUT_IMAGE="${INPUT_IMAGE:-../../offline_inference/image_to_video/qwen-bear.png
 BASE_URL="${BASE_URL:-http://localhost:8099}"
 OUTPUT_PATH="${OUTPUT_PATH:-wan22_i2v_output.mp4}"
 NEGATIVE_PROMPT="${NEGATIVE_PROMPT:-}"
+SAMPLE_SOLVER="${SAMPLE_SOLVER:-}"
 POLL_INTERVAL="${POLL_INTERVAL:-2}"
 
 if [ ! -f "$INPUT_IMAGE" ]; then
@@ -34,6 +35,10 @@ if [ -n "${NEGATIVE_PROMPT}" ]; then
   create_cmd+=(-F "negative_prompt=${NEGATIVE_PROMPT}")
 fi
 
+if [ -n "${SAMPLE_SOLVER}" ]; then
+  create_cmd+=(-F "extra_params={\"sample_solver\":\"${SAMPLE_SOLVER}\"}")
+fi
+
 create_response="$("${create_cmd[@]}")"
 video_id="$(echo "${create_response}" | jq -r '.id')"
 if [ -z "${video_id}" ] || [ "${video_id}" = "null" ]; then

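The docs in this PR state that `extra_params` is merged into the pipeline `extra_args`, and the test diff that follows checks that `sample_solver` arrives there. The server-side merge code is not part of the visible hunks, so the sketch below is only a guess at the general shape: parse the JSON string, require an object, and overlay it on any existing arguments. The function name `merge_extra_params` and the rule that request values win on key collisions are assumptions, not confirmed behavior.

```python
import json


def merge_extra_params(extra_args, extra_params_json):
    """Overlay a JSON-encoded extra_params object onto extra_args.

    Hypothetical helper; the real serving code may validate or merge differently.
    """
    if not extra_params_json:
        return dict(extra_args)
    parsed = json.loads(extra_params_json)
    if not isinstance(parsed, dict):
        raise ValueError("extra_params must be a JSON object")
    merged = dict(extra_args)
    merged.update(parsed)  # assumption: request-supplied values take precedence
    return merged


merged = merge_extra_params({"zero_steps": 2}, '{"sample_solver": "euler"}')
print(merged)
```

Copying `extra_args` before updating keeps the caller's dictionary unchanged, which matters if the same defaults are reused across requests.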
@@ -737,6 +737,28 @@ def test_extra_params_merged_with_existing_extra_args(test_client, mocker: Mocke
     assert captured.extra_args["zero_steps"] == 2
 
 
+def test_sample_solver_forwarded_via_extra_params(test_client, mocker: MockerFixture):
+    """sample_solver can be passed through existing extra_params for Wan2.2 online serving."""
+    mocker.patch(
+        "vllm_omni.entrypoints.openai.serving_video.encode_video_base64",
+        return_value="Zg==",
+    )
+    response = test_client.post(
+        "/v1/videos",
+        data={
+            "prompt": "A fox running through snow.",
+            "extra_params": json.dumps({"sample_solver": "euler"}),
+        },
+    )
+
+    assert response.status_code == 200
+    video_id = response.json()["id"]
+    _wait_for_status(test_client, video_id, VideoGenerationStatus.COMPLETED.value)
+    engine = test_client.app.state.openai_serving_video._engine_client
+    captured = engine.captured_sampling_params_list[0]
+    assert captured.extra_args["sample_solver"] == "euler"
+
+
 # ---------------------------------------------------------------------------
 # Sync endpoint tests (POST /v1/videos/sync)
 # ---------------------------------------------------------------------------