Merged
86 changes: 86 additions & 0 deletions docs/user_guide/diffusion/lora.md
@@ -56,6 +56,92 @@ outputs = omni.generate(
!!! note "Server-side Path Requirement"
    The LoRA adapter path (`local_path`) must be readable on the **server** machine. If your client and server are on different machines, ensure the LoRA adapter is accessible via a shared mount or copied to the server.

## Wan2.2 LightX2V Offline Assembly

This workflow is LoRA-adjacent rather than a runtime LoRA feature: it uses the
external LightX2V converter together with `Wan2.2-Distill-Loras` to produce
converted Wan2.2 I2V checkpoints, then bakes them into a local Diffusers
directory. No LoRA adapters are loaded at runtime.

### Required assets

- Base model: `Wan-AI/Wan2.2-I2V-A14B`
- Diffusers skeleton: `Wan-AI/Wan2.2-I2V-A14B-Diffusers`
- Optional external converter from the LightX2V project (not shipped in this repository)
- Optional LoRA weights: `lightx2v/Wan2.2-Distill-Loras`

### Step 1 (optional): Convert high/low-noise DiT weights with LightX2V

Install or clone LightX2V from the upstream repository
(`https://github.com/ModelTC/LightX2V`). After cloning, the converter used
below is available at `<lightx2v_root>/tools/convert/converter.py`.

```bash
python /path/to/lightx2v/tools/convert/converter.py \
--source /path/to/Wan2.2-I2V-A14B/high_noise_model \
--output /tmp/wan22_lightx2v/high_noise_out \
--output_ext .safetensors \
--output_name diffusion_pytorch_model \
--model_type wan_dit \
--direction forward \
--lora_path /path/to/wan2.2_i2v_A14b_high_noise_lora_rank64_lightx2v_4step_1022.safetensors \
--lora_key_convert auto \
--single_file

python /path/to/lightx2v/tools/convert/converter.py \
--source /path/to/Wan2.2-I2V-A14B/low_noise_model \
--output /tmp/wan22_lightx2v/low_noise_out \
--output_ext .safetensors \
--output_name diffusion_pytorch_model \
--model_type wan_dit \
--direction forward \
--lora_path /path/to/wan2.2_i2v_A14b_low_noise_lora_rank64_lightx2v_4step_1022.safetensors \
--lora_key_convert auto \
--single_file
```

If you are not using LightX2V, skip this step and either keep the original
Diffusers weights from the skeleton or point Step 2 at any other converted
`transformer/` and `transformer_2/` checkpoints.

### Step 2: Assemble a final Diffusers-style directory

```bash
python tools/wan22/assemble_wan22_i2v_diffusers.py \
--diffusers-skeleton /path/to/Wan2.2-I2V-A14B-Diffusers \
--transformer-weight /tmp/wan22_lightx2v/high_noise_out \
--transformer-2-weight /tmp/wan22_lightx2v/low_noise_out \
--output-dir /path/to/Wan2.2-I2V-A14B-Custom-Diffusers \
--asset-mode symlink \
--overwrite
```

`--transformer-weight` and `--transformer-2-weight` are optional. If you omit
them, the tool keeps the original weights from the Diffusers skeleton.
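
Before running inference, it can help to confirm that the assembled directory contains the pieces the steps above refer to. The following is a minimal sketch and not part of `assemble_wan22_i2v_diffusers.py`: the helper name `check_assembled_dir` is hypothetical, and the required entries (`model_index.json`, `transformer/`, `transformer_2/`) are assumed from the standard Diffusers layout and the directory names mentioned above.

```python
from pathlib import Path

# Assumed minimum layout: a Diffusers pipeline index plus the two DiT
# directories that Step 2 can replace. Adjust if your skeleton differs.
REQUIRED_ENTRIES = ("model_index.json", "transformer", "transformer_2")


def check_assembled_dir(root: str) -> list[str]:
    """Return the required entries missing from `root` (empty list if none).

    Path.exists() follows symlinks, so entries created with
    `--asset-mode symlink` count as present only if their target exists.
    """
    base = Path(root)
    return [name for name in REQUIRED_ENTRIES if not (base / name).exists()]
```

For example, `check_assembled_dir("/path/to/Wan2.2-I2V-A14B-Custom-Diffusers")` returning an empty list suggests the directory is at least structurally complete; it does not validate the weights themselves.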

### Step 3: Run offline inference

```bash
python examples/offline_inference/image_to_video/image_to_video.py \
--model /path/to/Wan2.2-I2V-A14B-Custom-Diffusers \
--image /path/to/input.jpg \
--prompt "A cat playing with yarn" \
--num-frames 81 \
--num-inference-steps 4 \
--tensor-parallel-size 4 \
--height 480 \
--width 832 \
--flow-shift 12 \
--sample-solver euler \
--guidance-scale 1.0 \
--guidance-scale-high 1.0 \
--boundary-ratio 0.875
```

Notes:

- Because the converted weights are baked into a local Diffusers directory, this route needs no runtime LoRA loading changes in vLLM-Omni.
- Output quality and speed depend on the replacement checkpoints and the sampling parameters you choose.
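
To make `--boundary-ratio 0.875` from Step 3 concrete, the sketch below follows the convention used by the Diffusers Wan pipeline: the boundary timestep is `boundary_ratio * num_train_timesteps`, and steps at or above it are handled by the high-noise transformer. Treat it as an illustration under assumptions: `num_train_timesteps=1000` is assumed, `select_expert` is a hypothetical name, and vLLM-Omni's own implementation may differ in detail.

```python
def select_expert(
    timestep: float,
    boundary_ratio: float = 0.875,
    num_train_timesteps: int = 1000,  # assumed scheduler setting
) -> str:
    """Pick which DiT handles a denoising step in a two-expert Wan2.2 setup.

    Returns "transformer" (high-noise expert) for timesteps at or above the
    boundary and "transformer_2" (low-noise expert) below it.
    """
    boundary_timestep = boundary_ratio * num_train_timesteps
    return "transformer" if timestep >= boundary_timestep else "transformer_2"
```

With only 4 inference steps, which steps land on each side of the boundary depends on the scheduler's timestep spacing and flow shift, so the split between experts is not necessarily even.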


## See Also

@@ -62,12 +62,13 @@ Key arguments:
- `--negative-prompt`: Optional list of artifacts to suppress.
- `--boundary-ratio`: Boundary split ratio for two-stage MoE models.
- `--flow-shift`: Scheduler flow shift (5.0 for 720p, 12.0 for 480p).
- `--sample-solver`: Wan2.2 sampling solver. Use `unipc` for the default multistep solver, or `euler` for Lightning/Distill checkpoints.
- `--num-inference-steps`: Number of denoising steps (default 50).
- `--fps`: Frames per second for the saved MP4 (requires `diffusers` export_to_video).
- `--output`: Path to save the generated video.
- `--vae-use-slicing`: Enable VAE slicing for memory optimization.
- `--vae-use-tiling`: Enable VAE tiling for memory optimization.
- `--cfg-parallel-size`: set it to 2 to enable CFG Parallel. See more examples in [`user_guide`](https://github.com/vllm-project/vllm-omni/tree/main/docs/user_guide/diffusion/parallelism_acceleration.md#cfg-parallel).
- `--cfg-parallel-size`: set it to 2 to enable CFG Parallel. See more examples in [`user_guide`](https://github.com/vllm-project/vllm-omni/tree/main/docs/user_guide/diffusion/parallelism/cfg_parallel.md).
- `--tensor-parallel-size`: tensor parallel size (effective for models that support TP, e.g. LTX2).
- `--enable-cpu-offload`: enable CPU offloading for diffusion models.
- `--use-hsdp`: Enable Hybrid Sharded Data Parallel to shard model weights across GPUs.
@@ -78,6 +79,9 @@ Key arguments:

> ℹ️ If you encounter OOM errors, try using `--vae-use-slicing` and `--vae-use-tiling` to reduce memory usage.

For Wan2.2 LightX2V-converted local Diffusers directories and related LoRA
assets, see the [LoRA guide](../../diffusion/lora.md#wan22-lightx2v-offline-assembly).
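
As rough intuition for `--flow-shift` in the key arguments above: flow-matching schedulers commonly remap each noise level `sigma` in [0, 1] as `shift * sigma / (1 + (shift - 1) * sigma)`, so a larger shift keeps more of the schedule at high noise. The sketch assumes this common formula (as used by Diffusers' flow-match schedulers); it is not taken from the vLLM-Omni scheduler code, and `shift_sigma` is a hypothetical name.

```python
def shift_sigma(sigma: float, shift: float) -> float:
    """Apply the (assumed) flow-shift remapping to one noise level in [0, 1]."""
    return shift * sigma / (1.0 + (shift - 1.0) * sigma)


# Compare an unshifted schedule with the two values suggested for 720p and 480p.
for shift in (1.0, 5.0, 12.0):
    remapped = [round(shift_sigma(s, shift), 3) for s in (0.25, 0.5, 0.75)]
    print(f"shift={shift}: {remapped}")
```

The endpoints are fixed (0 maps to 0 and 1 maps to 1), so the shift only changes how the intermediate steps are distributed.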

## Example materials

??? abstract "image_to_video.py"
6 changes: 5 additions & 1 deletion examples/offline_inference/image_to_video/README.md
@@ -59,12 +59,13 @@ Key arguments:
- `--negative-prompt`: Optional list of artifacts to suppress.
- `--boundary-ratio`: Boundary split ratio for two-stage MoE models.
- `--flow-shift`: Scheduler flow shift (5.0 for 720p, 12.0 for 480p).
- `--sample-solver`: Wan2.2 sampling solver. Use `unipc` for the default multistep solver, or `euler` for Lightning/Distill checkpoints.
- `--num-inference-steps`: Number of denoising steps (default 50).
- `--fps`: Frames per second for the saved MP4 (requires `diffusers` export_to_video).
- `--output`: Path to save the generated video.
- `--vae-use-slicing`: Enable VAE slicing for memory optimization.
- `--vae-use-tiling`: Enable VAE tiling for memory optimization.
- `--cfg-parallel-size`: set it to 2 to enable CFG Parallel. See more examples in [`user_guide`](../../../docs/user_guide/diffusion/parallelism_acceleration.md#cfg-parallel).
- `--cfg-parallel-size`: set it to 2 to enable CFG Parallel. See more examples in [`user_guide`](https://github.com/vllm-project/vllm-omni/tree/main/docs/user_guide/diffusion/parallelism/cfg_parallel.md).
- `--tensor-parallel-size`: tensor parallel size (effective for models that support TP, e.g. LTX2).
- `--enable-cpu-offload`: enable CPU offloading for diffusion models.
- `--use-hsdp`: Enable Hybrid Sharded Data Parallel to shard model weights across GPUs.
@@ -74,3 +75,6 @@ Key arguments:


> ℹ️ If you encounter OOM errors, try using `--vae-use-slicing` and `--vae-use-tiling` to reduce memory usage.

For Wan2.2 LightX2V-converted local Diffusers directories and related LoRA
assets, see the [LoRA guide](../../../docs/user_guide/diffusion/lora.md#wan22-lightx2v-offline-assembly).
13 changes: 13 additions & 0 deletions examples/offline_inference/image_to_video/image_to_video.py
@@ -84,6 +84,13 @@ def parse_args() -> argparse.Namespace:
    parser.add_argument(
        "--flow-shift", type=float, default=5.0, help="Scheduler flow_shift (5.0 for 720p, 12.0 for 480p)."
    )
    parser.add_argument(
        "--sample-solver",
        type=str,
        default="unipc",
        choices=["unipc", "euler"],
        help="Sampling solver for Wan2.2 pipelines. Use 'euler' for Lightning/Distill setups.",
    )
    parser.add_argument("--output", type=str, default="i2v_output.mp4", help="Path to save the video (mp4).")
    parser.add_argument("--fps", type=int, default=None, help="Frames per second for the output video.")
    parser.add_argument(
@@ -305,6 +312,7 @@ def main():
    print(f" Model: {args.model}")
    print(f" Inference steps: {args.num_inference_steps}")
    print(f" Frames: {args.num_frames}")
    print(f" Solver: {args.sample_solver}")
    print(
        f" Parallel configuration: cfg_parallel_size={args.cfg_parallel_size},"
        f" tensor_parallel_size={args.tensor_parallel_size}, vae_patch_parallel_size={args.vae_patch_parallel_size}"
@@ -326,9 +334,14 @@ def main():
            generator=generator,
            guidance_scale=guidance_scale,
            guidance_scale_2=args.guidance_scale_high,
            boundary_ratio=args.boundary_ratio,
            num_inference_steps=num_inference_steps,
            num_frames=num_frames,
            frame_rate=frame_rate,
            extra_args={
                "sample_solver": args.sample_solver,
                "flow_shift": args.flow_shift,
            },
        ),
    )
    generation_end = time.perf_counter()
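
The hunk above forwards the two scheduler-related flags through `extra_args`. The standalone sketch below reproduces only that mapping with `argparse`, so it can be run without vLLM-Omni. `build_parser` and `build_extra_args` are hypothetical helpers for illustration; the real script passes the dictionary into its sampling parameters rather than returning it.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Parser with only the two flags that end up in extra_args."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--flow-shift", type=float, default=5.0)
    parser.add_argument(
        "--sample-solver",
        type=str,
        default="unipc",
        choices=["unipc", "euler"],  # argparse rejects any other value
    )
    return parser


def build_extra_args(args: argparse.Namespace) -> dict:
    """Mirror the extra_args dictionary built in the diff above."""
    return {"sample_solver": args.sample_solver, "flow_shift": args.flow_shift}


# Settings matching the LightX2V offline example: Euler solver, flow shift 12.
args = build_parser().parse_args(["--sample-solver", "euler", "--flow-shift", "12"])
print(build_extra_args(args))
```

Because `choices` is set on the parser, a mistyped solver name fails at argument parsing instead of surfacing later inside the pipeline.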
49 changes: 49 additions & 0 deletions examples/online_serving/image_to_video/README.md
@@ -26,6 +26,23 @@ The script allows overriding:
- `CACHE_BACKEND` (default: `none`)
- `ENABLE_CACHE_DIT_SUMMARY` (default: `0`)

### Ascend / Local LightX2V Example

For a local Wan2.2-LightX2V Diffusers directory on Ascend/NPU, you can start the server like this:

```bash
vllm serve /path/to/Wan2.2-I2V-A14B-LightX2V-Diffusers-Lightning \
--omni \
--port 8091 \
--flow-shift 12 \
--cfg-parallel-size 1 \
--ulysses-degree 4 \
--use-hsdp \
--trust-remote-code \
--allowed-local-media-path / \
--seed 42
```

## Async Job Behavior

`POST /v1/videos` is asynchronous. It creates a video job and immediately
@@ -69,10 +86,35 @@ curl -X POST http://localhost:8091/v1/videos/sync \
-F "guidance_scale_2=1.0" \
-F "boundary_ratio=0.875" \
-F "flow_shift=12.0" \
-F 'extra_params={"sample_solver":"euler"}' \
-F "seed=42" \
-o sync_i2v_output.mp4
```

For Wan Lightning/Distill checkpoints, pass `{"sample_solver":"euler"}` via `extra_params`. The default solver is `unipc`.
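
When building the request from Python rather than `curl`, the main subtlety is that `extra_params` is a single form field holding a JSON string. The following is a minimal sketch that assumes the field names from the `curl` example above; `build_video_form` is a hypothetical helper, the allowed solver names are the two documented here, and actually sending the form with an HTTP client is left out.

```python
import json
from typing import Optional

# The two solver names documented for Wan2.2 in this guide.
ALLOWED_SOLVERS = ("unipc", "euler")


def build_video_form(prompt: str, sample_solver: Optional[str] = None, **fields) -> dict:
    """Build multipart form fields for a video request.

    Extra keyword arguments (e.g. num_inference_steps=4) become plain string
    fields; the solver is JSON-encoded into the single `extra_params` field.
    """
    form = {"prompt": prompt}
    form.update({key: str(value) for key, value in fields.items()})
    if sample_solver is not None:
        if sample_solver not in ALLOWED_SOLVERS:
            raise ValueError(f"unsupported sample_solver: {sample_solver!r}")
        form["extra_params"] = json.dumps({"sample_solver": sample_solver})
    return form
```

Using `json.dumps` avoids hand-escaping quotes, which is the error-prone part of the shell version.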

Example matching the local LightX2V deployment above:

```bash
curl -sS -X POST http://localhost:8091/v1/videos/sync \
-H "Accept: video/mp4" \
-F "prompt=A cat playing with yarn" \
-F "input_reference=@/path/to/input.jpg" \
-F "width=832" \
-F "height=480" \
-F "num_frames=81" \
-F "fps=16" \
-F "num_inference_steps=4" \
-F "guidance_scale=1.0" \
-F "guidance_scale_2=1.0" \
-F "boundary_ratio=0.875" \
-F "seed=42" \
-F 'extra_params={"sample_solver":"euler"}' \
-o ./output.mp4
```

Use `/v1/videos/sync` if you want to write the MP4 directly to a file. `POST /v1/videos` is async and returns job metadata, not inline `b64_json`.

## Storage

Generated video files are stored on local disk by the async video API.
@@ -96,6 +138,9 @@ export VLLM_OMNI_STORAGE_MAX_CONCURRENCY=8
# Basic image-to-video generation
bash run_curl_image_to_video.sh

# Wan Lightning/Distill checkpoints
SAMPLE_SOLVER=euler bash run_curl_image_to_video.sh

# Or execute directly (OpenAI-style multipart)
create_response=$(curl -s http://localhost:8091/v1/videos \
-H "Accept: application/json" \
@@ -111,6 +156,7 @@ create_response=$(curl -s http://localhost:8091/v1/videos \
-F "guidance_scale_2=1.0" \
-F "boundary_ratio=0.875" \
-F "flow_shift=12.0" \
-F 'extra_params={"sample_solver":"euler"}' \
-F "seed=42")

video_id=$(echo "$create_response" | jq -r '.id')
@@ -169,9 +215,12 @@ curl -X POST http://localhost:8091/v1/videos \
-F "guidance_scale_2=1.0" \
-F "boundary_ratio=0.875" \
-F "flow_shift=12.0" \
-F 'extra_params={"sample_solver":"euler"}' \
-F "seed=42"
```

`sample_solver` is supported by Wan2.2 online serving through the existing `extra_params` field, which is merged into the pipeline `extra_args`. Use `unipc` for the default multistep solver, or `euler` for Lightning/Distill checkpoints.

## Create Response Format

`POST /v1/videos` returns a job record, not inline base64 video data.
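
Because the create call only returns a job record, a client has to poll until the job finishes. The loop below is a transport-agnostic sketch: `fetch_status` is a caller-supplied function (hypothetical) that returns the job's current status string, and the terminal status names `completed` and `failed` are assumptions that should be checked against the actual response format.

```python
import time
from typing import Callable

# Assumed terminal status strings; verify against the server's job record.
TERMINAL_STATUSES = ("completed", "failed")


def wait_for_job(
    fetch_status: Callable[[], str],
    poll_interval: float = 2.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll `fetch_status` until it reports a terminal status or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {status!r} after {timeout} seconds")
        sleep(poll_interval)
```

The `sleep` parameter is injectable so the loop can be exercised in tests without real delays; the 2-second default mirrors `POLL_INTERVAL` in `run_curl_image_to_video.sh`.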
@@ -7,6 +7,7 @@ INPUT_IMAGE="${INPUT_IMAGE:-../../offline_inference/image_to_video/qwen-bear.png
BASE_URL="${BASE_URL:-http://localhost:8099}"
OUTPUT_PATH="${OUTPUT_PATH:-wan22_i2v_output.mp4}"
NEGATIVE_PROMPT="${NEGATIVE_PROMPT:-}"
SAMPLE_SOLVER="${SAMPLE_SOLVER:-}"
POLL_INTERVAL="${POLL_INTERVAL:-2}"

if [ ! -f "$INPUT_IMAGE" ]; then
@@ -34,6 +35,10 @@ if [ -n "${NEGATIVE_PROMPT}" ]; then
create_cmd+=(-F "negative_prompt=${NEGATIVE_PROMPT}")
fi

if [ -n "${SAMPLE_SOLVER}" ]; then
create_cmd+=(-F "extra_params={\"sample_solver\":\"${SAMPLE_SOLVER}\"}")
fi

create_response="$("${create_cmd[@]}")"
video_id="$(echo "${create_response}" | jq -r '.id')"
if [ -z "${video_id}" ] || [ "${video_id}" = "null" ]; then
22 changes: 22 additions & 0 deletions tests/entrypoints/openai_api/test_video_server.py
@@ -737,6 +737,28 @@ def test_extra_params_merged_with_existing_extra_args(test_client, mocker: Mocke
    assert captured.extra_args["zero_steps"] == 2


def test_sample_solver_forwarded_via_extra_params(test_client, mocker: MockerFixture):
    """sample_solver can be passed through existing extra_params for Wan2.2 online serving."""
    mocker.patch(
        "vllm_omni.entrypoints.openai.serving_video.encode_video_base64",
        return_value="Zg==",
    )
    response = test_client.post(
        "/v1/videos",
        data={
            "prompt": "A fox running through snow.",
            "extra_params": json.dumps({"sample_solver": "euler"}),
        },
    )

    assert response.status_code == 200
    video_id = response.json()["id"]
    _wait_for_status(test_client, video_id, VideoGenerationStatus.COMPLETED.value)
    engine = test_client.app.state.openai_serving_video._engine_client
    captured = engine.captured_sampling_params_list[0]
    assert captured.extra_args["sample_solver"] == "euler"


# ---------------------------------------------------------------------------
# Sync endpoint tests (POST /v1/videos/sync)
# ---------------------------------------------------------------------------