[Diffusion] DreamZero world model integration with CFG parallel + OpenPI serving #2162
TKONIY wants to merge 10 commits into vllm-project:main from feature/dreamzero-pipeline
Conversation
linyueqian left a comment:
Nice clean architecture with good separation of concerns (connection / serving / transform). A few items to address before merge.
hsliuustc0106 left a comment:
BLOCKING:
- Merge conflict — this PR is in CONFLICTING state. Please rebase onto latest main before review can proceed.
Non-blocking notes:
- PR description TODO says "Clear the comments" — please resolve before merge.
- Consider adding concrete latency / VRAM numbers to the PR description (even rough figures from the local validation runs). The current "passed locally" section lists test commands but not their measured outputs.
Implement the DreamZero omni serving path as a single clean feature commit without test- or doc-only files. This keeps the model registry/stage detection, root-config-driven pipeline initialization, root-checkpoint weight loading, DreamZero model/state wiring, and OpenPI serving / transform integration required for the feature branch. Signed-off-by: Yangshen Deng <yangshen.d@outlook.com>
Add a dedicated DreamZero video-latent decode helper that matches upstream WanVideoVAE decode semantics. The fix keeps forward() output as normalized video latents for serving, but documents the contract clearly and restores exact debug-video parity by inverting latent normalization in bf16 the same way as the upstream source path. Signed-off-by: Yangshen Deng <yangshen.d@outlook.com>
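The latent-normalization inversion this commit describes can be sketched as follows. This is a minimal illustration only: the real path uses the WanVideoVAE's per-channel statistics and performs the math in bf16, while the mean/std values below are placeholders.

```python
# Minimal sketch of inverting latent normalization before debug-video decode.
# LATENT_MEAN / LATENT_STD are placeholders; the real model uses the
# WanVideoVAE per-channel statistics and computes in bf16.
LATENT_MEAN = 0.5
LATENT_STD = 2.0

def normalize_latents(x):
    """Forward normalization applied to the latents that forward() serves."""
    return [(v - LATENT_MEAN) / LATENT_STD for v in x]

def denormalize_latents(z):
    """Exact inverse, restoring raw latents for the decode helper."""
    return [v * LATENT_STD + LATENT_MEAN for v in z]
```

The decode helper only needs the inverse to be exact, so a round trip through both functions must return the input unchanged.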
Signed-off-by: Yangshen Deng <yangshen.d@outlook.com>
Add concise environment guidance for DreamZero serving, bundled OpenPI client usage, and DROID sim-eval rollout usage. Also guard optional client-side imports in the DreamZero example scripts so missing non-core dependencies fail with explicit messages instead of opaque import errors. Signed-off-by: Yangshen Deng <yangshen.d@outlook.com>
Document the sim-eval launch flow without assuming Isaac Lab lives inside the vllm-omni repo, while still assuming commands are run from the repository root. Signed-off-by: Yangshen Deng <yangshen.d@outlook.com>
Signed-off-by: Yangshen Deng <yangshen.d@outlook.com>
Signed-off-by: Yangshen Deng <yangshen.d@outlook.com>
Add offline example scripts to export DreamZero prediction videos and generate TP/CFG comparison outputs without changing the serving path. Document the workflow in the DreamZero quick start and example README, ignore local generated video artifacts, and add stage YAMLs for TP/CFG variants used by the comparison helper. Also update DreamZero weight loading to honor custom parameter weight loaders during remapped checkpoint loading. Signed-off-by: Yangshen Deng <yangshen.d@outlook.com>
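The weight-loading change in that commit follows a pattern common in vLLM-style loaders: prefer a parameter's own weight_loader over a plain copy when remapping checkpoint names. A rough sketch, where the Param type and all names are illustrative rather than the real vllm-omni classes:

```python
# Illustrative sketch of honoring custom per-parameter weight loaders
# during remapped checkpoint loading. Param stands in for a framework
# parameter object; real loaders operate on tensors, not lists.
class Param:
    def __init__(self):
        self.data = None
        self.weight_loader = None  # optional custom loader (e.g. for sharding)

def default_weight_loader(param, tensor):
    param.data = tensor

def load_remapped_weights(params, checkpoint, remap):
    """Copy checkpoint tensors into params, remapping names and
    deferring to a parameter's own weight_loader when it defines one."""
    for ckpt_name, tensor in checkpoint.items():
        param = params[remap.get(ckpt_name, ckpt_name)]
        loader = param.weight_loader or default_weight_loader
        loader(param, tensor)
```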
Move the upstream DreamZero policy imports in test_openpi_client_ar behind a helper so the file passes E402 without changing behavior, and restore the BasePolicy import while removing the unused cv2 dependency guard in the DROID sim-eval client. Signed-off-by: Yangshen Deng <yangshen.d@outlook.com>
Run the same pre-commit --all-files pass used by CI and commit the resulting ruff/format adjustments so the DreamZero PR branch is clean under the repo's global hooks. Signed-off-by: Yangshen Deng <yangshen.d@outlook.com>
@yinpeiqi @hsliuustc0106 @linyueqian
All comments have been addressed.
Summary
Current PR branch: feature/dreamzero-pipeline
Latest pushed commit: a820f779 tests: fix diffusion scheduler mock imports after rebase
This PR integrates DreamZero into vllm-omni with:
- a sim-evals rollout client for the vLLM OpenPI endpoint
What Changed
Model / Pipeline
- vllm_omni/model_executor/stage_configs/dreamzero.yaml
- vllm_omni/diffusion/models/dreamzero/utils.py
- vllm_omni/diffusion/models/dreamzero/transform/
- DreamZeroPipeline consumes raw robot_obs and applies the transform inside the model
- extra_args["reset"]
Serving
- PolicyServerConfig loading in vllm_omni/entrypoints/openai/realtime/robot/openpi_serving.py
- vllm_omni/entrypoints/openai/realtime/robot/openpi_connection.py
- create_policy_server() so OpenPI serving is enabled only when the loaded model provides policy_server_config
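The create_policy_server() gating described above could be sketched as follows. The function and attribute names come from this PR description, but the bodies and the PolicyServerConfig fields are hypothetical:

```python
from dataclasses import dataclass
from typing import Any, Optional

@dataclass
class PolicyServerConfig:
    """Illustrative stand-in for the PR's policy server config."""
    host: str = "127.0.0.1"
    port: int = 8000

def create_policy_server(model: Any) -> Optional[dict]:
    """Return a policy server only for models that opt in by exposing
    a policy_server_config; other models get no OpenPI route."""
    config = getattr(model, "policy_server_config", None)
    if config is None:
        return None
    # Placeholder: the real code would construct the OpenPI serving object.
    return {"host": config.host, "port": config.port}
```

The point of the getattr gate is that ordinary diffusion models, which never define policy_server_config, silently skip OpenPI serving.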
Framework Wiring
- vllm_omni/diffusion/stage_diffusion_proc.py: model_class_name for unregistered architectures
- vllm_omni/diffusion/diffusion_engine.py: audio/actions
- vllm_omni/entrypoints/utils.py
Tests / Examples / Docs
- tests/entrypoints/openai_api/test_openpi_connection.py
- tests/entrypoints/openai_api/test_openpi_serving.py
- tests/e2e/online_serving/test_dreamzero.py
- tests/examples/online_serving/test_dreamzero.py
- tests/dreamzero/upstream/
- examples/online_serving/dreamzero/
- examples/online_serving/dreamzero/export_prediction_video.py
- examples/online_serving/dreamzero/generate_comparison_videos.py
- examples/online_serving/dreamzero/droid_sim_eval_client.py
- docs/models/dreamzero/README.md
- docs/models/dreamzero/quick_start.md: vllm serve quick start
Quick Start
Start the server
From the repository root:
If you only want 1 GPU:
OpenPI websocket endpoint: ws://127.0.0.1:8000/v1/realtime/robot/openpi
Run the client
From the repository root:
Extra client dependencies:
The example client uses bundled real videos from examples/online_serving/dreamzero/assets/
Optional flags:
Run DROID sim-eval
Run this from an environment where isaaclab, isaaclab_tasks, sim_evals, and gymnasium are already importable.
From the vllm-omni repository root, invoke the client through an external Isaac Lab launcher, for example:
Validation
Passed locally:
- PYTHONPATH=. pytest tests/entrypoints/openai_api/test_openpi_serving.py tests/entrypoints/openai_api/test_openpi_connection.py -q
- OPENPI_E2E_GPUS=0,1 PYTHONPATH=. pytest tests/e2e/online_serving/test_dreamzero.py -q --run-level=advanced_model
- PYTHONPATH=. .venv/bin/python -m py_compile examples/online_serving/dreamzero/openpi_client.py examples/online_serving/dreamzero/droid_sim_eval_client.py
This confirms:
Performance Snapshot
Measurement scope for the numbers below:
- eager mode (--enforce-eager), no torch.compile
- TP/CFG sizes come from the stage YAML parallel_config, not CLI TP/CFG flags
Hardware environment for these measurements:
- 4x NVIDIA RTX PRO 6000 Blackwell Server Edition, 97887 MiB VRAM each, driver 590.48.01
- 2x AMD EPYC 9355 32-Core Processor (128 logical CPUs total)
- 1.5 TiB system memory
Interpretation notes:
- dreamzero.yaml is the default TP=1, CFG=1 baseline
- TP=2, CFG=2 needs 4 GPUs and is still blocked on this host

| Config | Allocated | Reserved | Stage YAML |
| --- | --- | --- | --- |
| TP=1, CFG=1 | 43.58 GiB | 52.01 GB | vllm_omni/model_executor/stage_configs/dreamzero.yaml |
| TP=1, CFG=2 | 43.58 GiB | 49.61 GB | dreamzero_tp1_cfg2.yaml |
| TP=2, CFG=1 | 28.88 GiB | 32.65 GB | dreamzero_tp2_cfg1.yaml |
| TP=2, CFG=2 | blocked on this host (2,3) | | |

Latency
Interpretation notes:
- per-step timing parsed from the DiffusionEngine.step breakdown ... total=... ms log lines
- each run executes the sequence (infer, infer, reset, infer)
- TP=1, CFG=2 reduces wall time versus baseline on this workload
- TP=2, CFG=1 works correctly but is slower than baseline on this workload

| Config | Mean step | Median step | Min–max | Total | Log |
| --- | --- | --- | --- | --- | --- |
| TP=1, CFG=1 | 7349.93 ms | 7419.06 ms | 7145.32–7485.42 ms | 22.329 s | tmp/dreamzero_perf_yaml/tp1_cfg1.log |
| TP=1, CFG=2 | 4033.93 ms | 3863.00 ms | 3829.45–4409.35 ms | 12.365 s | tmp/dreamzero_perf_yaml/tp1_cfg2.log |
| TP=2, CFG=1 | 8636.77 ms | 8526.22 ms | 8451.61–8932.49 ms | 26.196 s | tmp/dreamzero_perf_yaml/tp2_cfg1.log |
| TP=2, CFG=2 | blocked on this host (2,3) | | | | |

Important Future Work
- The coupling between PolicyServerConfig and Transform is ugly.
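As a quick sanity check on the latency figures in the Performance Snapshot above, the mean per-step times imply roughly a 1.8x speedup for TP=1, CFG=2 and a mild slowdown for TP=2, CFG=1 relative to the baseline:

```python
# Mean per-step latencies (ms) from the latency table above.
baseline_tp1_cfg1 = 7349.93
tp1_cfg2 = 4033.93
tp2_cfg1 = 8636.77

cfg_parallel_speedup = baseline_tp1_cfg1 / tp1_cfg2   # ~1.82x faster
tp_slowdown = tp2_cfg1 / baseline_tp1_cfg1            # ~1.18x slower
print(f"CFG-parallel speedup: {cfg_parallel_speedup:.2f}x")
print(f"TP=2 slowdown: {tp_slowdown:.2f}x")
```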