feat(omni): add Cosmos3 support to vLLM-Omni backend #10132
base: main
Changes from 3 commits
ebe6779
b9b9ca3
22812d0
7744835
0034bee
001eacb
22d56b9
2c48064
271214e
`@@ -0,0 +1,72 @@`

```bash
#!/bin/bash
# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Aggregated Cosmos3 image-to-video generation (1 GPU).
# Same worker as text-to-video (registers the "video" modality); i2v is driven
# by adding "input_reference" to the /v1/videos request. The image loader
# rejects local file paths — pass a data: URI (base64) or an http(s) URL.
# --no-cosmos3-guardrails skips loading the safety guardrail models.

set -e
trap 'echo Cleaning up...; kill 0' EXIT

SCRIPT_DIR="$(dirname "$(readlink -f "$0")")"
source "$SCRIPT_DIR/../../../common/gpu_utils.sh"
source "$SCRIPT_DIR/../../../common/launch_utils.sh"

MODEL="nvidia/Cosmos3-Nano"

# Parse command line arguments
EXTRA_ARGS=()
while [[ $# -gt 0 ]]; do
    case $1 in
        --model)
            MODEL="$2"
            shift 2
            ;;
        *)
            EXTRA_ARGS+=("$1")
            shift
            ;;
    esac
done

HTTP_PORT="${DYN_HTTP_PORT:-8000}"
GPU_MEM_ARGS=$(build_vllm_gpu_mem_args)
print_launch_banner --no-curl "Launching vLLM-Omni Cosmos3 Image-to-Video (1 GPU)" "$MODEL" "$HTTP_PORT"
print_curl_footer <<CURL
# input_reference must be an http(s) URL or a data: URI (local paths are rejected)
curl -s http://localhost:${HTTP_PORT}/v1/videos \\
  -H 'Content-Type: application/json' \\
  -d '{
    "model": "${MODEL}",
    "prompt": "The scene comes alive, gentle camera motion",
    "input_reference": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/ai2d-demo.jpg",
    "size": "512x512",
    "response_format": "url",
    "nvext": {
      "num_inference_steps": 20,
      "num_frames": 17
    }
  }' | jq
CURL


python -m dynamo.frontend &
FRONTEND_PID=$!

sleep 2
```

Contributor

Remove fixed readiness sleep and use shared health-check orchestration.

As per coding guidelines, launch scripts should “Avoid readiness sleeps/polls; rely on the shared framework health-check patterns instead.”

```bash
echo "Starting Omni worker..."
DYN_SYSTEM_PORT=${DYN_SYSTEM_PORT:-8081} \
    python -m dynamo.vllm.omni \
    --model "$MODEL" \
    --output-modalities video \
    --no-cosmos3-guardrails \
    --media-output-fs-url file:///tmp/dynamo_media \
    $GPU_MEM_ARGS \
    "${EXTRA_ARGS[@]}" &

# Exit on first worker failure; kill 0 in the EXIT trap tears down the rest
wait_any_exit
```
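
The image-to-video header notes that the loader rejects local file paths, so a local image has to be inlined as a `data:` URI before it goes into `input_reference`. Below is a minimal client-side sketch, assuming the endpoint accepts the standard `data:<mime>;base64,<payload>` form and that the caller knows the MIME type (it defaults to `image/png` and is not detected). `to_data_uri` and `build_i2v_body` are hypothetical helpers, not part of `launch_utils.sh`; the request fields are copied from the curl footer above.

```bash
#!/usr/bin/env bash

# Hypothetical helper: encode a local file as a data: URI.
# The MIME type defaults to image/png; it is NOT detected from the file.
to_data_uri() {
    local path="$1" mime="${2:-image/png}"
    # GNU base64 wraps output at 76 columns by default; strip newlines so the
    # URI is a single token that can sit inside a JSON string.
    printf 'data:%s;base64,%s' "$mime" "$(base64 < "$path" | tr -d '\n')"
}

# Hypothetical helper: build a /v1/videos i2v request body.
# Assumes model and prompt contain no characters that need JSON escaping;
# base64 output (A-Z, a-z, 0-9, +, /, =) is already JSON-safe.
build_i2v_body() {
    local model="$1" prompt="$2" image_path="$3"
    printf '{"model": "%s", "prompt": "%s", "input_reference": "%s", "size": "512x512", "response_format": "url", "nvext": {"num_inference_steps": 20, "num_frames": 17}}\n' \
        "$model" "$prompt" "$(to_data_uri "$image_path")"
}
```

Usage, with `./frame.png` standing in for any local image: `build_i2v_body nvidia/Cosmos3-Nano "The scene comes alive, gentle camera motion" ./frame.png | curl -s http://localhost:8000/v1/videos -H 'Content-Type: application/json' --data-binary @- | jq`. Piping the body through `--data-binary @-` keeps a large base64 payload off the curl command line.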
`@@ -0,0 +1,63 @@`

```bash
#!/bin/bash
# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Aggregated Cosmos3 text-to-image generation (1 GPU).
# Uses the native vLLM-Omni Cosmos3 pipeline; --no-cosmos3-guardrails skips
# loading the safety guardrail models. A worker serves a single modality, so
# this script registers the "image" modality (see agg_omni_cosmos3_video.sh
# for text-to-video).

set -e
trap 'echo Cleaning up...; kill 0' EXIT

SCRIPT_DIR="$(dirname "$(readlink -f "$0")")"
source "$SCRIPT_DIR/../../../common/launch_utils.sh"
```

Comment on lines +15 to +16

Contributor

Align this launcher with the shared vLLM GPU-memory utilities. Unlike the text-to-video and image-to-video launchers, this script does not source `common/gpu_utils.sh` and never passes `$GPU_MEM_ARGS` (built by `build_vllm_gpu_mem_args`) to the worker. As per coding guidelines, launchers should source …

Also applies to: 50-57

Shellcheck (0.11.0): [info] 15-15: Not following: ./../../../common/launch_utils.sh was not specified as input (see shellcheck -x). (SC1091)

```bash
MODEL="nvidia/Cosmos3-Nano"

# Parse command line arguments
EXTRA_ARGS=()
while [[ $# -gt 0 ]]; do
    case $1 in
        --model)
            MODEL="$2"
            shift 2
            ;;
        *)
            EXTRA_ARGS+=("$1")
            shift
            ;;
    esac
done

HTTP_PORT="${DYN_HTTP_PORT:-8000}"
print_launch_banner --no-curl "Launching vLLM-Omni Cosmos3 Image Generation (1 GPU)" "$MODEL" "$HTTP_PORT"
print_curl_footer <<CURL
curl -s -X POST http://localhost:${HTTP_PORT}/v1/images/generations \\
  -H 'Content-Type: application/json' \\
  -d '{
    "model": "${MODEL}",
    "prompt": "A robot standing in a bright laboratory",
```

IMO, the examples should use an appropriate JSON caption rather than a dense prompt. If we later add JSON upsampling within the container, we can show a normal "dense" prompt as the example, plus an extra parameter such as `upsample_prompt=True`.

```bash
    "size": "512x512",
    "num_inference_steps": 20
  }' | jq
CURL


python -m dynamo.frontend &
FRONTEND_PID=$!

sleep 2
```

Contributor

Replace fixed startup sleep with framework readiness handling. Using `sleep 2` …

As per coding guidelines, launch scripts should “Avoid readiness sleeps/polls; rely on the shared framework health-check patterns instead.”

```bash
echo "Starting Omni worker..."
DYN_SYSTEM_PORT=${DYN_SYSTEM_PORT:-8081} \
    python -m dynamo.vllm.omni \
    --model "$MODEL" \
    --output-modalities image \
    --no-cosmos3-guardrails \
    --media-output-fs-url file:///tmp/dynamo_media \
    "${EXTRA_ARGS[@]}" &

# Exit on first worker failure; kill 0 in the EXIT trap tears down the rest
wait_any_exit
```
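
All three launchers share the same argument loop: `--model` consumes the next token, and every other token is forwarded verbatim to `python -m dynamo.vllm.omni` through `"${EXTRA_ARGS[@]}"`. The sketch below wraps that loop in a function so its behavior can be checked in isolation. `parse_launcher_args` is a hypothetical name; the loop body is copied from the scripts, with one deliberate deviation. In the scripts, a trailing `--model` with no value makes `shift 2` return non-zero, and it is `set -e` that aborts the script. The sketch returns an error explicitly, so it is safe to call without `set -e`.

```bash
#!/usr/bin/env bash

# Hypothetical wrapper around the launchers' argument loop.
# Sets the globals MODEL and EXTRA_ARGS, as the scripts do.
parse_launcher_args() {
    MODEL="nvidia/Cosmos3-Nano"   # default used by all three launchers
    EXTRA_ARGS=()
    while [[ $# -gt 0 ]]; do
        case $1 in
            --model)
                # Deviation from the scripts: fail explicitly instead of
                # relying on set -e to catch a failing `shift 2`.
                if [[ $# -lt 2 ]]; then
                    echo "--model requires a value" >&2
                    return 1
                fi
                MODEL="$2"
                shift 2
                ;;
            *)
                # Everything else is forwarded untouched to the worker.
                EXTRA_ARGS+=("$1")
                shift
                ;;
        esac
    done
}
```

Because `EXTRA_ARGS` is an array expanded with `"${EXTRA_ARGS[@]}"`, an argument containing spaces stays a single argument. By contrast, `$GPU_MEM_ARGS` is expanded unquoted and relies on word splitting. That works as long as `build_vllm_gpu_mem_args` prints space-separated flags whose values contain no spaces, which is an assumption, since its implementation is not part of this diff.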
`@@ -0,0 +1,70 @@`

```bash
#!/bin/bash
# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Aggregated Cosmos3 text-to-video generation (1 GPU).
# Uses the native vLLM-Omni Cosmos3 pipeline; --no-cosmos3-guardrails skips
# loading the safety guardrail models. A worker serves a single modality, so
# this script registers the "video" modality (see agg_omni_cosmos3_image.sh
# for text-to-image).

set -e
trap 'echo Cleaning up...; kill 0' EXIT

SCRIPT_DIR="$(dirname "$(readlink -f "$0")")"
source "$SCRIPT_DIR/../../../common/gpu_utils.sh"
source "$SCRIPT_DIR/../../../common/launch_utils.sh"

MODEL="nvidia/Cosmos3-Nano"

# Parse command line arguments
EXTRA_ARGS=()
while [[ $# -gt 0 ]]; do
    case $1 in
        --model)
            MODEL="$2"
            shift 2
            ;;
        *)
            EXTRA_ARGS+=("$1")
            shift
            ;;
    esac
done

HTTP_PORT="${DYN_HTTP_PORT:-8000}"
GPU_MEM_ARGS=$(build_vllm_gpu_mem_args)
print_launch_banner --no-curl "Launching vLLM-Omni Cosmos3 Video Generation (1 GPU)" "$MODEL" "$HTTP_PORT"
print_curl_footer <<CURL
curl -s http://localhost:${HTTP_PORT}/v1/videos \\
  -H 'Content-Type: application/json' \\
  -d '{
    "model": "${MODEL}",
    "prompt": "A waterfall in a green forest, gentle mist",
    "size": "512x512",
    "response_format": "url",
    "nvext": {
      "num_inference_steps": 20,
      "num_frames": 17
    }
  }' | jq
CURL


python -m dynamo.frontend &
FRONTEND_PID=$!

sleep 2
```

Contributor

Use shared readiness checks instead of a fixed sleep.

As per coding guidelines, launch scripts should “Avoid readiness sleeps/polls; rely on the shared framework health-check patterns instead.”

```bash
echo "Starting Omni worker..."
DYN_SYSTEM_PORT=${DYN_SYSTEM_PORT:-8081} \
    python -m dynamo.vllm.omni \
    --model "$MODEL" \
    --output-modalities video \
    --no-cosmos3-guardrails \
    --media-output-fs-url file:///tmp/dynamo_media \
    $GPU_MEM_ARGS \
    "${EXTRA_ARGS[@]}" &

# Exit on first worker failure; kill 0 in the EXIT trap tears down the rest
wait_any_exit
```
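
Each curl footer is passed to `print_curl_footer` through an unquoted heredoc (`<<CURL`). In an unquoted heredoc, quotes have no special meaning, so `${HTTP_PORT}` and `${MODEL}` are expanded even though they sit inside the single-quoted `-d '...'` payload, and each `\\` is reduced to a single `\`, so the printed command keeps its line continuations. The sketch below substitutes `cat` for `print_curl_footer`, whose implementation is not shown in this diff. `render_footer` and `render_footer_quoted` are hypothetical names.

```bash
#!/usr/bin/env bash

# Hypothetical stand-in for print_curl_footer: cat echoes the heredoc, which
# is enough to see what the shell substitutes before the helper ever runs.
render_footer() {
    local HTTP_PORT="${1:-8000}" MODEL="${2:-nvidia/Cosmos3-Nano}"
    cat <<CURL
curl -s http://localhost:${HTTP_PORT}/v1/videos \\
  -H 'Content-Type: application/json' \\
  -d '{"model": "${MODEL}", "size": "512x512"}' | jq
CURL
}

# A quoted delimiter (<<'CURL') disables expansion entirely: the output keeps
# the literal text ${HTTP_PORT} and both backslashes.
render_footer_quoted() {
    cat <<'CURL'
curl -s http://localhost:${HTTP_PORT}/v1/videos \\
CURL
}
```

The unquoted form is what lets the launchers print a copy-pasteable command with the real port and model filled in. The trade-off is that any literal `$` or backquote in a future example payload would need escaping as `\$` or `` \` ``.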
`normalize_image_frames` collapses a `[B, F, H, W, C]` Cosmos3 array by taking `arr[0]`, so image requests with `n > 1` silently drop every generated batch after the first. Fix: preserve and flatten all leading batch/frame dimensions before converting frames to PIL images.

🤖 AI Fix

In `components/src/dynamo/common/utils/video_utils.py`, update `normalize_image_frames` to replace the `while arr.ndim > 4: arr = arr[0]` logic with a check that the last three dimensions are `H, W, C`, followed by `arr = arr.reshape((-1, *arr.shape[-3:]))`, so that all `[B, F, H, W, C]` outputs are emitted.
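
The data loss described in this comment can be checked on shapes alone. The sketch below does only the shape bookkeeping in plain shell arithmetic (no NumPy), assuming a `[B, F, H, W, C]` output with illustrative sizes. `frames_kept_current` models `while arr.ndim > 4: arr = arr[0]` and `frames_kept_fixed` models `arr.reshape((-1, *arr.shape[-3:]))`. Both names are hypothetical, and neither touches real pixel data.

```bash
#!/usr/bin/env bash

# Both helpers take a shape as separate integers, e.g. 2 1 512 512 3.

# Models `while arr.ndim > 4: arr = arr[0]`: every leading axis beyond the
# last four is indexed at 0, so only the axis just before H, W, C survives.
frames_kept_current() {
    local dims=("$@")
    while (( ${#dims[@]} > 4 )); do
        dims=("${dims[@]:1}")   # drop the leading axis (i.e. take index 0)
    done
    if (( ${#dims[@]} == 4 )); then echo "${dims[0]}"; else echo 1; fi
}

# Models `arr.reshape((-1, *arr.shape[-3:]))`: all axes before H, W, C are
# flattened into one, so the number of emitted frames is their product.
frames_kept_fixed() {
    local dims=("$@") n=$# total=1 i
    if (( n < 3 )); then
        echo "expected at least H, W, C" >&2
        return 1
    fi
    for (( i = 0; i < n - 3; i++ )); do
        total=$(( total * ${dims[i]} ))
    done
    echo "$total"
}
```

For two requested images with one frame each (`2 1 64 64 3`), the current loop keeps a single frame while the reshape keeps both. For a batch of 4 clips of 17 frames, the reshape emits 68 frames where the loop emits 17.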