
[Diffusion] DreamZero world model integration with CFG parallel + OpenPI serving #2162

Open

TKONIY wants to merge 10 commits into vllm-project:main from TKONIY:feature/dreamzero-pipeline

Conversation

@TKONIY (Contributor) commented Mar 25, 2026

Summary

Current PR branch: feature/dreamzero-pipeline
Latest pushed commit: a820f779 ("tests: fix diffusion scheduler mock imports after rebase")

This PR integrates DreamZero into vllm-omni with:

  • DreamZero diffusion pipeline support
  • OpenPI-compatible robot WebSocket serving
  • DreamZero-specific transform pipeline owned by the model
  • DreamZero stage config + model-specific policy server config
  • Online serving example with bundled real videos
  • DROID sim-evals rollout client for the vLLM OpenPI endpoint
  • Explicit optional-dependency errors in the DreamZero example clients
  • Self-contained e2e tests plus optional upstream parity checks

What Changed

Model / Pipeline

  • Added DreamZero stage config: vllm_omni/model_executor/stage_configs/dreamzero.yaml
  • Added DreamZero model defaults in vllm_omni/diffusion/models/dreamzero/utils.py
  • Moved robot transforms into vllm_omni/diffusion/models/dreamzero/transform/
  • Let DreamZeroPipeline consume raw robot_obs and apply the transform inside the model (see the sketch after this list)
  • Kept reset semantics as deferred engine reset via extra_args["reset"]
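
A minimal sketch of the pipeline-owned transform flow described above; the class, method, and key names here are illustrative assumptions, not the actual vllm-omni API:

```python
# Hypothetical sketch: the pipeline owns its robot-observation transform, so
# callers pass raw robot_obs and the model applies DreamZero-specific
# preprocessing internally. All names below are illustrative only.
from dataclasses import dataclass
from typing import Any


@dataclass
class RobotObsTransform:
    """Converts raw robot observations into model-ready inputs."""

    image_size: tuple[int, int] = (224, 224)

    def __call__(self, robot_obs: dict[str, Any]) -> dict[str, Any]:
        # In practice: resize camera frames, normalize joint states, etc.
        return {
            "images": robot_obs["images"],          # camera frames
            "state": robot_obs["state"],            # proprioceptive state vector
            "prompt": robot_obs.get("prompt", ""),  # task instruction
        }


class DreamZeroPipelineSketch:
    def __init__(self) -> None:
        # The transform is owned by the model, not by the serving layer.
        self.transform = RobotObsTransform()

    def generate(self, robot_obs: dict[str, Any], extra_args: dict[str, Any]) -> Any:
        if extra_args.get("reset"):
            self._reset_state()  # deferred engine reset via extra_args["reset"]
        inputs = self.transform(robot_obs)
        return self._run_diffusion(inputs)

    def _reset_state(self) -> None: ...
    def _run_diffusion(self, inputs: dict[str, Any]) -> Any: ...
```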

Serving

  • Added model-specific PolicyServerConfig loading in vllm_omni/entrypoints/openai/realtime/robot/openpi_serving.py
  • Added optional dependency guard, structured errors, and idle timeout in vllm_omni/entrypoints/openai/realtime/robot/openpi_connection.py
  • Added create_policy_server() so OpenPI serving is enabled only when the loaded model provides policy_server_config (gating sketch after this list)
  • Sent OpenPI handshake metadata from model config instead of hardcoding it in the WebSocket connection layer
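
A minimal sketch of the opt-in gating described above, under the assumption of hypothetical names; only models that declare a policy_server_config get an OpenPI policy server, and the handshake metadata comes from the model config:

```python
# Hypothetical sketch of the opt-in gating: OpenPI serving is only wired up
# when the loaded model exposes a policy_server_config. Function and attribute
# names are illustrative, not the real vllm-omni API.
from typing import Any, Optional


def create_policy_server(model_config: Any) -> Optional["PolicyServerSketch"]:
    policy_cfg = getattr(model_config, "policy_server_config", None)
    if policy_cfg is None:
        # Model does not declare a robot policy endpoint; skip OpenPI serving.
        return None
    return PolicyServerSketch(policy_cfg)


class PolicyServerSketch:
    def __init__(self, config: Any) -> None:
        # Handshake metadata is taken from the model config instead of being
        # hardcoded in the WebSocket connection layer.
        self.metadata = getattr(config, "handshake_metadata", {})
```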

Framework Wiring

  • vllm_omni/diffusion/stage_diffusion_proc.py
    • no longer carries DreamZero-specific inline detection
    • preserves explicit model_class_name for unregistered architectures
  • vllm_omni/diffusion/diffusion_engine.py
    • deduplicates multimodal payload slicing for audio / actions
  • vllm_omni/entrypoints/utils.py
    • resolves DreamZero config from model type override
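
A minimal sketch of the model-type-to-stage-config resolution described for vllm_omni/entrypoints/utils.py; the mapping and helper name are illustrative assumptions, not the real implementation:

```python
# Hypothetical sketch: resolve a model-specific stage config from a model-type
# override, falling back to the detected model type. Names are illustrative.
from pathlib import Path

_STAGE_CONFIGS = {
    "dreamzero": Path("vllm_omni/model_executor/stage_configs/dreamzero.yaml"),
}


def resolve_stage_config(model_type: str, override: str | None = None) -> Path:
    key = (override or model_type).lower()
    try:
        return _STAGE_CONFIGS[key]
    except KeyError:
        raise ValueError(f"No stage config registered for model type {key!r}")
```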

Tests / Examples / Docs

  • Added OpenPI unit tests:
    • tests/entrypoints/openai_api/test_openpi_connection.py
    • tests/entrypoints/openai_api/test_openpi_serving.py
  • Added DreamZero e2e / example tests:
    • tests/e2e/online_serving/test_dreamzero.py
    • tests/examples/online_serving/test_dreamzero.py
  • Added optional upstream parity checks (dependency-guard sketch after this list) under:
    • tests/dreamzero/upstream/
  • Added runnable example:
    • examples/online_serving/dreamzero/
  • Added offline prediction-video export helpers:
    • examples/online_serving/dreamzero/export_prediction_video.py
    • examples/online_serving/dreamzero/generate_comparison_videos.py
  • Added DROID sim rollout client:
    • examples/online_serving/dreamzero/droid_sim_eval_client.py
  • Added model docs:
    • docs/models/dreamzero/README.md
    • docs/models/dreamzero/quick_start.md
  • Documented per-script environment requirements for:
    • vllm serve
    • bundled OpenPI client
    • prediction-video export helpers
    • DROID sim-eval client
    • optional upstream parity tests
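
Because the upstream parity checks depend on the external DreamZero repository, they only make sense as optional tests. A minimal sketch of one way such tests can guard that optional dependency; the importable module name dreamzero is an assumption, not the actual package name used in tests/dreamzero/upstream/:

```python
# Hypothetical sketch: skip upstream parity tests when the optional upstream
# DreamZero package is not importable. The module name "dreamzero" is assumed.
import pytest

upstream = pytest.importorskip("dreamzero")


def test_action_head_matches_upstream():
    # Compare vllm-omni outputs against upstream reference outputs here.
    assert upstream is not None
```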

Quick Start

Start the server

From the repository root:

CUDA_VISIBLE_DEVICES=0,1 \
examples/online_serving/dreamzero/run_server.sh

To run on a single GPU:

CUDA_VISIBLE_DEVICES=0 \
CFG_PARALLEL_SIZE=1 \
examples/online_serving/dreamzero/run_server.sh

OpenPI WebSocket endpoint:

  • ws://127.0.0.1:8000/v1/realtime/robot/openpi

Run the client

From the repository root:

python examples/online_serving/dreamzero/openpi_client.py \
  --host 127.0.0.1 \
  --port 8000

Extra client dependencies:

pip install openpi-client websockets opencv-python

The example client uses bundled real videos from:

  • examples/online_serving/dreamzero/assets/

Optional flags:

python examples/online_serving/dreamzero/openpi_client.py \
  --host 127.0.0.1 \
  --port 8000 \
  --video-dir examples/online_serving/dreamzero/assets \
  --session-id demo-session \
  --num-chunks 2
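
For orientation, a minimal client sketch using the openpi-client package is shown below. It assumes the upstream OpenPI WebsocketClientPolicy API and uses illustrative observation keys; the bundled openpi_client.py remains the authoritative example for the exact DreamZero observation schema.

```python
# Hypothetical sketch of a minimal OpenPI client call against the vLLM endpoint.
# Assumes the upstream openpi-client package exposes WebsocketClientPolicy; the
# observation keys below are placeholders, not the exact DreamZero schema.
import numpy as np
from openpi_client import websocket_client_policy

policy = websocket_client_policy.WebsocketClientPolicy(host="127.0.0.1", port=8000)

observation = {
    "image": np.zeros((224, 224, 3), dtype=np.uint8),  # camera frame placeholder
    "state": np.zeros(8, dtype=np.float32),            # proprioceptive state placeholder
    "prompt": "pick up the cube",
}

result = policy.infer(observation)
print(result)  # expected to contain the predicted action chunk
```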

Run DROID sim-eval

Run this from an environment where isaaclab, isaaclab_tasks,
sim_evals, and gymnasium are already importable.

pip install openpi-client websockets opencv-python mediapy

From the vllm-omni repository root, invoke the client through an external
Isaac Lab launcher, for example:

/path/to/isaaclab.sh -p \
  examples/online_serving/dreamzero/droid_sim_eval_client.py \
  --host 127.0.0.1 \
  --port 8000 \
  --scene 1 \
  --episodes 1 \
  --headless \
  --device cuda:0

Validation

Passed locally:

  • PYTHONPATH=. pytest tests/entrypoints/openai_api/test_openpi_serving.py tests/entrypoints/openai_api/test_openpi_connection.py -q
  • OPENPI_E2E_GPUS=0,1 PYTHONPATH=. pytest tests/e2e/online_serving/test_dreamzero.py -q --run-level=advanced_model
  • PYTHONPATH=. .venv/bin/python -m py_compile examples/online_serving/dreamzero/openpi_client.py examples/online_serving/dreamzero/droid_sim_eval_client.py

This confirms:

  • DreamZero OpenPI handshake / serving works
  • DreamZero online e2e works with real bundled videos

Performance Snapshot

Measurement scope for the numbers below:

  • eager mode (--enforce-eager)
  • no torch.compile
  • no DiT cache / dynamic cached schedule in the baseline table
  • single-request OpenPI serving path
  • DreamZero parallelism configured via stage YAML parallel_config, not CLI TP/CFG flags

Hardware environment for these measurements:

  • GPU: 4x NVIDIA RTX PRO 6000 Blackwell Server Edition, 97887 MiB VRAM each
  • GPU driver: 590.48.01
  • CPU: 2x AMD EPYC 9355 32-Core Processor (128 logical CPUs total)
  • Host RAM: 1.5 TiB

VRAM

Interpretation notes:

  • dreamzero.yaml is the default TP=1, CFG=1 baseline
  • all GPUs on this host are the same model, so the table reports only how many GPUs were used and the per-GPU memory numbers
  • TP=2, CFG=2 needs 4 GPUs and is still blocked on this host
| Mode | GPUs used | Per-GPU startup VRAM | Per-GPU peak VRAM (reserved) | Status | Notes |
|------|-----------|----------------------|------------------------------|--------|-------|
| TP=1, CFG=1 | 1 | 43.58 GiB | 52.01 GB | Measured | vllm_omni/model_executor/stage_configs/dreamzero.yaml |
| TP=1, CFG=2 | 2 | 43.58 GiB | 49.61 GB | Measured | True CFG-parallel serving via dreamzero_tp1_cfg2.yaml |
| TP=2, CFG=1 | 2 | 28.88 GiB | 32.65 GB | Measured | True TP serving via dreamzero_tp2_cfg1.yaml |
| TP=2, CFG=2 | 4 | Not measured | Not measured | Pending | Requires 4 GPUs; blocked by unrelated jobs on GPUs 2,3 |
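
For context, a minimal sketch of how per-GPU reserved VRAM can be sampled in-process with PyTorch's allocator statistics; this is illustrative only and not the exact instrumentation behind the table above:

```python
# Illustrative only: sample per-GPU reserved VRAM via PyTorch's CUDA allocator
# statistics. Not the exact reporting path used for the numbers above.
import torch


def reserved_vram_gib(device: int = 0) -> tuple[float, float]:
    current = torch.cuda.memory_reserved(device) / 2**30       # currently reserved
    peak = torch.cuda.max_memory_reserved(device) / 2**30      # peak reserved so far
    return current, peak


if torch.cuda.is_available():
    for dev in range(torch.cuda.device_count()):
        cur, peak = reserved_vram_gib(dev)
        print(f"GPU {dev}: reserved={cur:.2f} GiB, peak reserved={peak:.2f} GiB")
```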

Latency

Interpretation notes:

  • the latency metric is the server-side "DiffusionEngine.step breakdown ... total=... ms" log line
  • each row below used the same OpenPI client workload:
    • 3 action-producing infer requests plus one reset, issued in the order infer, infer, reset, infer
    • the same bundled DreamZero example videos
  • TP=1, CFG=2 reduces wall time versus baseline on this workload
  • TP=2, CFG=1 works correctly but is slower than baseline on this workload
| Mode | GPUs used | Mean latency | P50 latency | Range | Client wall time | Status | Notes |
|------|-----------|--------------|-------------|-------|------------------|--------|-------|
| TP=1, CFG=1 | 1 | 7349.93 ms | 7419.06 ms | 7145.32–7485.42 ms | 22.329 s | Measured | tmp/dreamzero_perf_yaml/tp1_cfg1.log |
| TP=1, CFG=2 | 2 | 4033.93 ms | 3863.00 ms | 3829.45–4409.35 ms | 12.365 s | Measured | tmp/dreamzero_perf_yaml/tp1_cfg2.log |
| TP=2, CFG=1 | 2 | 8636.77 ms | 8526.22 ms | 8451.61–8932.49 ms | 26.196 s | Measured | tmp/dreamzero_perf_yaml/tp2_cfg1.log |
| TP=2, CFG=2 | 4 | Not measured | Not measured | Not measured | Not measured | Pending | Requires 4 GPUs; blocked by unrelated jobs on GPUs 2,3 |
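
For reference, a minimal sketch of how the mean / P50 / range figures above can be derived from the server logs by scraping the DiffusionEngine.step breakdown lines; the exact log format is assumed from the metric description above:

```python
# Illustrative only: extract "total=... ms" values from a server log of
# "DiffusionEngine.step breakdown" lines and compute mean / P50 / range.
# The exact log format may differ from this assumed pattern.
import re
import statistics
from pathlib import Path

PATTERN = re.compile(r"DiffusionEngine\.step breakdown.*?total=([\d.]+)\s*ms")


def summarize(log_path: str) -> None:
    totals = [float(m) for m in PATTERN.findall(Path(log_path).read_text())]
    if not totals:
        print("no step-breakdown lines found")
        return
    print(f"n={len(totals)}")
    print(f"mean={statistics.mean(totals):.2f} ms")
    print(f"p50={statistics.median(totals):.2f} ms")
    print(f"range={min(totals):.2f}-{max(totals):.2f} ms")


# Example: summarize("tmp/dreamzero_perf_yaml/tp1_cfg2.log")
```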

Important Future Work

  • Design a clear and stable API and protocol for robot serving; the current PolicyServerConfig and Transform design is awkward and should be revisited.
  • Manage the KV cache with paged attention.
  • Performance optimizations, e.g., asynchronous pipelined execution.

@TKONIY force-pushed the feature/dreamzero-pipeline branch 3 times, most recently from 8a51735 to 22bfc2f, on March 26, 2026 19:16
@TKONIY force-pushed the feature/dreamzero-pipeline branch 5 times, most recently from 1d6c89f to f5bfbc9, on April 3, 2026 21:25
@TKONIY force-pushed the feature/dreamzero-pipeline branch from f5bfbc9 to 783b0f1 on April 11, 2026 01:15
@TKONIY mentioned this pull request on Apr 11, 2026
@TKONIY force-pushed the feature/dreamzero-pipeline branch from 31a9faf to e6d1229 on April 13, 2026 00:18
@TKONIY marked this pull request as ready for review on April 13, 2026 00:48
@TKONIY requested a review from hsliuustc0106 as a code owner on April 13, 2026 00:48
@chatgpt-codex-connector commented:

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
Credits must be used to enable repository wide code reviews.

@TKONIY force-pushed the feature/dreamzero-pipeline branch 2 times, most recently from 7542ada to e048473, on April 13, 2026 01:49
@linyueqian (Collaborator) left a comment:

Nice clean architecture with good separation of concerns (connection / serving / transform). A few items to address before merge.

Review threads (several now marked outdated) were opened on:

  • vllm_omni/entrypoints/openai/realtime/robot/openpi_connection.py
  • vllm_omni/entrypoints/openai/realtime/robot/openpi_serving.py
  • vllm_omni/diffusion/models/dreamzero/transform/droid.py
  • vllm_omni/diffusion/stage_diffusion_proc.py
  • vllm_omni/diffusion/diffusion_engine.py
  • vllm_omni/diffusion/models/dreamzero/pipeline_dreamzero.py
  • vllm_omni/diffusion/models/dreamzero/state_dreamzero.py

@hsliuustc0106 (Collaborator) left a comment:

BLOCKING:

  • Merge conflict — This PR is in CONFLICTING state. Please rebase onto latest main before review can proceed.

Non-blocking notes:

  1. PR description TODO says "Clear the comments" — please resolve before merge.
  2. Consider adding concrete latency / VRAM numbers to the PR description (even rough figures from the local validation runs). The current "passed locally" section lists test commands but not their measured outputs.

TKONIY added 7 commits April 19, 2026 19:07
Implement the DreamZero omni serving path as a single clean feature commit
without test- or doc-only files. This keeps the model registry/stage
detection, root-config-driven pipeline initialization, root-checkpoint
weight loading, DreamZero model/state wiring, and OpenPI serving / transform
integration required for the feature branch.

Signed-off-by: Yangshen Deng <yangshen.d@outlook.com>
Add a dedicated DreamZero video-latent decode helper that matches upstream WanVideoVAE decode semantics. The fix keeps forward() output as normalized video latents for serving, but documents the contract clearly and restores exact debug-video parity by inverting latent normalization in bf16 the same way as the upstream source path.

Signed-off-by: Yangshen Deng <yangshen.d@outlook.com>
Signed-off-by: Yangshen Deng <yangshen.d@outlook.com>
Add concise environment guidance for DreamZero serving, bundled OpenPI client usage, and DROID sim-eval rollout usage.

Also guard optional client-side imports in the DreamZero example scripts so missing non-core dependencies fail with explicit messages instead of opaque import errors.

Signed-off-by: Yangshen Deng <yangshen.d@outlook.com>
Document the sim-eval launch flow without assuming Isaac Lab lives inside the vllm-omni repo, while still assuming commands are run from the repository root.

Signed-off-by: Yangshen Deng <yangshen.d@outlook.com>
Signed-off-by: Yangshen Deng <yangshen.d@outlook.com>
Signed-off-by: Yangshen Deng <yangshen.d@outlook.com>
@TKONIY force-pushed the feature/dreamzero-pipeline branch from 88f2f22 to a820f77 on April 19, 2026 19:49
TKONIY added 3 commits April 20, 2026 18:59
Add offline example scripts to export DreamZero prediction videos and generate TP/CFG comparison outputs without changing the serving path. Document the workflow in the DreamZero quick start and example README, ignore local generated video artifacts, and add stage YAMLs for TP/CFG variants used by the comparison helper. Also update DreamZero weight loading to honor custom parameter weight loaders during remapped checkpoint loading.

Signed-off-by: Yangshen Deng <yangshen.d@outlook.com>
Move the upstream DreamZero policy imports in test_openpi_client_ar behind a helper so the file passes E402 without changing behavior, and restore the BasePolicy import while removing the unused cv2 dependency guard in the DROID sim-eval client.

Signed-off-by: Yangshen Deng <yangshen.d@outlook.com>
Run the same pre-commit --all-files pass used by CI and commit the resulting ruff/format adjustments so the DreamZero PR branch is clean under the repo's global hooks.

Signed-off-by: Yangshen Deng <yangshen.d@outlook.com>
@TKONIY (Contributor, Author) commented Apr 20, 2026

@yinpeiqi @hsliuustc0106 @linyueqian
There are many unit tests under tests/dreamzero/upstream, plus an end-to-end test, that check whether our implementation aligns with upstream DreamZero's official implementation. They require a dependency on that repo. Do you prefer keeping or removing them?

@TKONIY (Contributor, Author) commented Apr 21, 2026

BLOCKING:

  • Merge conflict — This PR is in CONFLICTING state. Please rebase onto latest main before review can proceed.

Non-blocking notes:

  1. PR description TODO says "Clear the comments" — please resolve before merge.
  2. Consider adding concrete latency / VRAM numbers to the PR description (even rough figures from the local validation runs). The current "passed locally" section lists test commands but not their measured outputs.

All comments have been addressed.
