[Model] Add LingBot-World I2V support#2073
Draft
pakkah wants to merge 2 commits into vllm-project:main from
Signed-off-by: Zhaoxiang Huang <zhaoxiang.huang@outlook.com>
Signed-off-by: asukaqaq-s <1311722138@qq.com>
wjsuijlenh added a commit to zzhang-fr/vllm-omni that referenced this pull request on Apr 15, 2026:
…up claims

Fix three things the earlier phase-D draft got wrong by re-reading the DreamZero paper carefully:

1. Decimal separator: the naive baseline is 5.7 s per chunk (not 7 s), and the bimanual action horizon is 1.6 s per chunk (not 6 s). pypdf had silently dropped the leading digit at line breaks.
2. Step-reduction progression: DreamZero does not "stay at 16 steps". DiT Caching (velocity reuse within a chunk, based on cosine similarity of flow-matching velocities) reduces effective steps from 16 to 4. DreamZero-Flash, a training-time noise-schedule change, further reduces this to 1 step. The paper's Table 3 shows the task-progress cost: naive 1-step loses 31 points on table-bussing, while Flash 1-step loses only 9.
3. The 38x headline is GB200-only. Table 1 shows the cumulative speedup caps at 9.6x on H100; the NVFP4 and DreamZero-Flash rows are dashed for H100. CFG parallelism is also multi-GPU from row 2, so the post-baseline config in the paper is never single-GPU.

Also separate DreamZero's "DiT Caching" (intra-chunk velocity reuse) from RFC vllm-project#1987 / StreamDiffusionV2's "rolling KV cache across chunks" -- these are different optimizations and should not be conflated.

Drop all cross-baseline speedup attribution from phase_d_cross_check.md and the journal's Phase D/E sections. Earlier drafts claimed that, e.g., "36x of DreamZero's 38x is accounted for by choosing streaming-point hyperparameters", which compares across different baselines, hardware, training regimes, and chunk shapes. That comparison is invalid and has been removed. Our own measurement (53.30 s -> 2.585 s = 20.6x on Wan-1.3B, our offline point vs. our streaming point on 1x A100) is retained as a fact about our own two configurations, with no claim about how it relates to any published speedup.
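The intra-chunk velocity reuse that DiT Caching describes can be sketched as a cosine-similarity gate over flow-matching velocities. This is a minimal illustration, not DreamZero's implementation: the 0.99 threshold, the reuse rule (skip the model call when the last two computed velocities were nearly parallel), and all names are assumptions.

```python
import math

def cosine_sim(a, b):
    """Cosine similarity between two flat velocity vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb + 1e-8)

def denoise_chunk(model, x, timesteps, threshold=0.99):
    """Euler flow-matching integration with intra-chunk velocity caching:
    when the last two computed velocities are nearly parallel, skip the
    model call and reuse the cached velocity. The threshold is an assumed
    hyperparameter, not taken from the paper."""
    prev_v = prev_prev_v = None
    calls = 0
    for t, dt in timesteps:
        if (prev_v is not None and prev_prev_v is not None
                and cosine_sim(prev_v, prev_prev_v) > threshold):
            v = prev_v                              # reuse cached velocity
        else:
            v = model(x, t)                         # recompute velocity
            calls += 1
            prev_prev_v, prev_v = prev_v, v
        x = [xi + dt * vi for xi, vi in zip(x, v)]  # Euler step
    return x, calls
```

With a velocity field that is stable across steps, a 16-step schedule collapses to a handful of model calls, which is the effect the 16-to-4 reduction relies on.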
Contribution-target vllm-project#3 (rolling KV cache + blockwise-causal attention) is downgraded from "~1.5-2x at our streaming operating point" to "speed-up not measured on our hardware; research commitment, not quantified target." The measured ~260 ms framework overhead is kept as the concrete motivating number for target #2 (per-call overhead reduction, RFC vllm-project#2073).

Add a correction notice at the top of the earlier SOTA-scan DreamZero subsection pointing readers to the Phase D corrections.
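A figure like the ~260 ms per-call framework overhead above is typically obtained by timing the full call path and subtracting the known pure-compute time. The sketch below shows that measurement pattern only; `fake_call` and the sleep durations are stand-ins, not vllm-omni APIs or measured values.

```python
import time

def measure_overhead(call_fn, compute_time_s, n_warmup=3, n_iters=10):
    """Estimate per-call framework overhead: average wall-clock time of
    the full call minus the known pure-compute time."""
    for _ in range(n_warmup):      # warm caches / JIT before timing
        call_fn()
    total = 0.0
    for _ in range(n_iters):
        t0 = time.perf_counter()
        call_fn()
        total += time.perf_counter() - t0
    return total / n_iters - compute_time_s

def fake_call():
    """Hypothetical call: simulated compute plus simulated per-call
    bookkeeping, so the overhead estimate has something to recover."""
    time.sleep(0.02)   # "compute"
    time.sleep(0.005)  # "framework overhead"

overhead = measure_overhead(fake_call, compute_time_s=0.02)
```

`time.perf_counter()` is a monotonic high-resolution clock, so the difference is a valid duration even across system clock adjustments.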
Purpose
Add `LingbotWorldPipeline` support for `robbyant/lingbot-world-base-cam` and integrate it into the existing image-to-video example flow. This PR closes #1045.

Implementation notes:

- Follow the `DreamID-Omni` integration pattern: keep the pipeline self-loading, reuse an external dependency repo via a download helper, and keep the vllm-omni-specific model adaptation local.
- An `--action-path` arg, which matches the upstream LingBot-World usage pattern.

Test Plan
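The `--action-path` wiring mentioned in the implementation notes could look like the following. Only the flag name and the model repo come from this PR; the other arguments, defaults, and help strings are hypothetical illustrations, not the example script's actual CLI.

```python
import argparse

def build_parser():
    """CLI sketch for the I2V example; --action-path mirrors the upstream
    LingBot-World usage pattern (other flags are illustrative)."""
    parser = argparse.ArgumentParser(description="LingBot-World I2V example")
    parser.add_argument("--model", default="robbyant/lingbot-world-base-cam",
                        help="Model repo to load")
    parser.add_argument("--action-path", default=None,
                        help="Path to the action sequence driving the camera")
    parser.add_argument("--image", help="Conditioning image for I2V")
    return parser

# Hypothetical invocation, parsed from an explicit argv list for clarity.
args = build_parser().parse_args(["--action-path", "actions.json",
                                  "--image", "frame0.png"])
```

Note that argparse exposes the long option as `args.action_path` (hyphen mapped to underscore), so downstream code reads the attribute, not the flag spelling.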
Test Result
Essential Elements of an Effective PR Description Checklist
Update `supported_models.md` and `examples` for a new model. Please run `mkdocs serve` to sync the documentation editions to `./docs`.