
[Model] Add LingBot-World I2V support#2073

Draft
pakkah wants to merge 2 commits into vllm-project:main from pakkah:main

Conversation

@pakkah pakkah commented Mar 22, 2026


Purpose

Add LingbotWorldPipeline support for robbyant/lingbot-world-base-cam and integrate it into the existing image-to-video example flow. This PR closes #1045.

Implementation notes:

  • Follow the DreamID-Omni integration pattern: keep the pipeline self-loading, reuse the external dependency repo via a download helper, and keep the vllm-omni-specific model adaptation local.
  • Support LingBot-World control signals through the offline serving --action-path argument, which matches the upstream LingBot-World usage pattern.
  • The API shape for model-specific control signals is still to be determined (see [RFC]: World Model Support #1987), so we leave it to a follow-up PR. In this PR, online serving remains limited to plain I2V without control signals.
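For reference, a minimal argparse sketch of how an offline example script could wire the new flag. This is illustrative only: apart from --action-path, the flag names here are a hypothetical subset of the example's CLI, not the PR's actual code.

```python
import argparse
from pathlib import Path

def build_parser() -> argparse.ArgumentParser:
    # Hypothetical subset of the offline image_to_video.py CLI;
    # --action-path is the only flag this sketch is really about.
    parser = argparse.ArgumentParser(
        description="LingBot-World I2V offline example (sketch)")
    parser.add_argument("--model", required=True)
    parser.add_argument("--image", required=True)
    parser.add_argument("--prompt", default="")
    parser.add_argument(
        "--action-path", type=Path, default=None,
        help="Directory of LingBot-World control signals; omit for plain I2V")
    return parser

args = build_parser().parse_args([
    "--model", "./lingbot-world-base-cam",
    "--image", "image.jpg",
    "--action-path", "examples/00",
])
# args.action_path is a Path when provided, None for plain I2V
print(args.action_path)
```

Defaulting --action-path to None keeps the same entry point serving both modes, mirroring the two test-plan invocations below (with and without control signals).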

Test Plan

cd examples/offline_inference/image_to_video

python download_lingbot_world.py \
  --model-id robbyant/lingbot-world-base-cam \
  --output-dir ./lingbot-world-base-cam

PROMPT="$(cat /tmp/vllm-omni-dependency/lingbot-world/examples/00/prompt.txt)"

# Run with LingBot-World control signals (--action-path)
python image_to_video.py \
  --model ./lingbot-world-base-cam \
  --image /tmp/vllm-omni-dependency/lingbot-world/examples/00/image.jpg \
  --action-path /tmp/vllm-omni-dependency/lingbot-world/examples/00 \
  --prompt "$PROMPT" \
  --height 480 \
  --width 832 \
  --num-frames 161 \
  --guidance-scale 5.0 \
  --guidance-scale-high 5.0 \
  --num-inference-steps 20 \
  --flow-shift 10.0 \
  --fps 16 \
  --output lingbot_world_base_cam_examples00.mp4

# Run plain I2V without control signals
python image_to_video.py \
  --model ./lingbot-world-base-cam \
  --image /tmp/vllm-omni-dependency/lingbot-world/examples/00/image.jpg \
  --prompt "$PROMPT" \
  --height 480 \
  --width 832 \
  --num-frames 161 \
  --guidance-scale 5.0 \
  --guidance-scale-high 5.0 \
  --num-inference-steps 20 \
  --flow-shift 10.0 \
  --fps 16 \
  --output lingbot_world_base_cam_no_control.mp4
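Both runs render 161 frames at 16 fps, so the expected clip length of the output .mp4 files is easy to sanity-check:

```python
# Expected duration of the rendered clip from the CLI arguments above.
num_frames = 161
fps = 16
duration_s = num_frames / fps  # 161 / 16 = 10.0625
print(f"expected clip length: {duration_s:.4f} s")
```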

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan. Please provide the test scripts & test commands. Please state the reasons if your code doesn't require additional test scripts. For test file guidelines, please check the test style doc.
  • The test results. Please paste the results comparison before and after, or the e2e results.
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model. Please run mkdocs serve to sync the documentation editions to ./docs.
  • (Optional) Release notes update. If your change is user-facing, please update the release notes draft.


pakkah and others added 2 commits March 22, 2026 13:11
Signed-off-by: Zhaoxiang Huang <zhaoxiang.huang@outlook.com>
Signed-off-by: asukaqaq-s <1311722138@qq.com>
@pakkah pakkah mentioned this pull request Mar 22, 2026
1 task
@TKONIY TKONIY mentioned this pull request Mar 22, 2026
20 tasks
wjsuijlenh added a commit to zzhang-fr/vllm-omni that referenced this pull request Apr 15, 2026
…up claims

Fix three things the earlier phase-D draft got wrong by re-reading the
DreamZero paper carefully:

1. Decimal separator: the naive baseline is 5.7s per chunk (not 7s),
   and the bimanual action horizon is 1.6s per chunk (not 6s). pypdf
   had silently dropped the leading digit at line breaks.

2. Step-reduction progression: DreamZero does not "stay at 16 steps".
   DiT Caching (velocity reuse within a chunk, based on cosine
   similarity of flow-matching velocities) reduces effective steps
   from 16 to 4. DreamZero-Flash, a training-time noise-schedule
   change, further reduces to 1 step. The paper's Table 3 shows the
   task-progress cost: naive 1-step loses 31 points on table-bussing,
   Flash 1-step loses only 9.

3. The 38x headline is GB200-only. Table 1 shows the cumulative
   speedup caps at 9.6x on H100; NVFP4 and DreamZero-Flash rows are
   dashed for H100. CFG parallelism is also multi-GPU from row 2, so
   the post-baseline config in the paper is never single-GPU.

Also separate DreamZero's "DiT Caching" (intra-chunk velocity reuse)
from RFC vllm-project#1987 / StreamDiffusionV2's "rolling KV cache across chunks"
-- these are different optimizations and should not be conflated.

Drop all cross-baseline speedup attribution from phase_d_cross_check.md
and the journal's Phase D/E sections. Earlier drafts claimed that e.g.
"36x of DreamZero's 38x is accounted for by choosing streaming-point
hyperparameters", which compares across different baselines, hardware,
training regimes, and chunk shapes. That comparison is invalid and is
removed. Our own measurement (53.30s -> 2.585s = 20.6x on Wan-1.3B at
our offline vs our streaming point on 1x A100) is retained as a fact
about our own two configurations, with no claim about how it relates
to any published speedup.

Contribution-target vllm-project#3 (rolling KV cache + blockwise-causal attention)
is downgraded from "~1.5-2x at our streaming operating point" to
"speed-up not measured on our hardware; research commitment, not
quantified target." The measured ~260 ms framework overhead is kept
as the concrete motivating number for target #2 (per-call overhead
reduction, RFC vllm-project#2073).

Add a correction notice at the top of the earlier SOTA-scan DreamZero
subsection pointing readers to the Phase D corrections.
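The retained measurement quoted in the commit above (53.30 s -> 2.585 s on Wan-1.3B, 1x A100) is internally consistent; the 20.6x ratio can be reproduced directly:

```python
# Reproduce the speedup ratio from the two latencies quoted in the
# commit message (offline config vs streaming point, same hardware).
offline_s = 53.30
streaming_s = 2.585
speedup = offline_s / streaming_s  # ~20.6
print(f"{speedup:.1f}x")
```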

Successfully merging this pull request may close these issues.

[New Model]: LingBot-World
