feat(omni): add Cosmos3 support to vLLM-Omni backend#10132

Open
ayushag-nv wants to merge 9 commits into main from cosmos3-omni-integration

Conversation

@ayushag-nv
Contributor

@ayushag-nv ayushag-nv commented May 29, 2026

Summary

Adds Dynamo vLLM-Omni backend support for NVIDIA Cosmos3 (nvidia/Cosmos3-Nano / -Super) — text-to-image, text-to-video, and image-to-video — backed by the native Cosmos3 pipeline in vllm-project/vllm-omni#3454.

Changes

Worker integration (components/src/dynamo/vllm/omni, common/utils)

  • --cosmos3-guardrails / --no-cosmos3-guardrails flag (guardrails enabled by default). When disabled, the handler passes model_config={"guardrails": False} to AsyncOmni so the Cosmos3 safety-guardrail models are not loaded at startup.
  • normalize_image_frames() + the image output path: the native Cosmos3 pipeline returns numpy [batch, frames, H, W, C] arrays (not PIL), so the formatter normalizes them before PNG-encoding /v1/images/generations responses.
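
As a rough illustration of the normalization step described above, the sketch below flattens arrays shaped like [batch, frames, H, W, C] into a flat list of uint8 frames. It is not the PR's normalize_image_frames(): the name normalize_frames_sketch, the reshape-based flattening, and the assumption that float inputs lie in [0, 1] are all hypothetical, and it stops at NumPy arrays instead of converting to PIL images.

```python
import numpy as np


def normalize_frames_sketch(images):
    """Flatten image-like outputs into a list of uint8 [H, W, C] arrays.

    Hypothetical helper for illustration only; the real helper in the PR
    returns PIL images and may handle more input types.
    """
    out = []
    for item in images:
        arr = np.asarray(item)
        if arr.ndim < 3:
            raise ValueError(f"expected at least [H, W, C], got shape {arr.shape}")
        # Collapse every leading axis (batch, frames, ...) into one frame axis:
        # [batch, frames, H, W, C] -> [batch * frames, H, W, C]
        arr = arr.reshape(-1, *arr.shape[-3:])
        if np.issubdtype(arr.dtype, np.floating):
            # Assumption: float pixel values are in [0, 1].
            arr = (np.clip(arr, 0.0, 1.0) * 255.0).round().astype(np.uint8)
        else:
            arr = arr.astype(np.uint8)
        # Iterating the first axis yields one [H, W, C] frame at a time.
        out.extend(arr)
    return out
```

Each returned frame could then be handed to an image encoder (for example a PNG writer) one at a time, which is the shape the /v1/images/generations path needs.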

Examples & docs

  • Launch scripts agg_omni_cosmos3_{image,video,i2v}.sh (one modality per worker).
  • Sample request payloads under examples/backends/vllm/launch/cosmos3/ (official Cosmos3 prompts mapped to the Dynamo request schema).
  • Guide docs/backends/vllm/cosmos3.md (install, serving, request formats, gotchas).

Tests for normalize_image_frames, the guardrails arg, and the model_config passthrough.
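
For readers unfamiliar with paired boolean flags, the guardrails toggle and the model_config passthrough can be sketched as below. This is a minimal sketch under assumptions: using argparse.BooleanOptionalAction and the helper names build_parser and build_engine_kwargs are illustrative choices, not necessarily how the PR's args.py and base_handler.py are written.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    # BooleanOptionalAction registers both --cosmos3-guardrails and
    # --no-cosmos3-guardrails for a single destination.
    parser.add_argument(
        "--cosmos3-guardrails",
        action=argparse.BooleanOptionalAction,
        default=True,
        help="Load the Cosmos3 safety-guardrail models (default: enabled).",
    )
    return parser


def build_engine_kwargs(cosmos3_guardrails: bool) -> dict:
    """Return extra engine kwargs; only override model_config when disabled."""
    kwargs: dict = {}
    if not cosmos3_guardrails:
        kwargs["model_config"] = {"guardrails": False}
    return kwargs


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(build_engine_kwargs(args.cosmos3_guardrails))
```

Leaving model_config untouched in the enabled case keeps the engine's own defaults in charge, so the flag only changes behavior when a user explicitly opts out.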

Dependency

Requires the Cosmos3 pipeline from vllm-omni#3454 (not yet in a released vLLM-Omni). This PR is the Dynamo-side integration only — container pinning of vLLM-Omni is intentionally not included; install vLLM-Omni from that PR per the guide.

Notes

  • One modality per worker (--output-modalities image|video) — request type derives from the worker's configured modality, not the HTTP endpoint.
  • Image size is the OpenAI enum; per-request num_inference_steps / guidance_scale / seed go under nvext; i2v input_reference must be an http(s) URL or a data: URI (local paths rejected).
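
The request-shape notes above can be illustrated with a small client-side sketch. The field names model, prompt, nvext, input_reference, num_inference_steps, guidance_scale, and seed come from this PR; placing input_reference at the top level of the payload, and the helper names check_input_reference and build_i2v_payload, are assumptions made for illustration.

```python
from urllib.parse import urlparse

ALLOWED_SCHEMES = ("http", "https", "data")


def check_input_reference(ref: str) -> str:
    """Accept http(s) URLs and data: URIs; reject local paths and other schemes."""
    scheme = urlparse(ref).scheme.lower()
    if scheme not in ALLOWED_SCHEMES:
        raise ValueError(f"input_reference must be an http(s) URL or data: URI, got {ref!r}")
    return ref


def build_i2v_payload(model, prompt, input_reference, *,
                      num_inference_steps=None, guidance_scale=None, seed=None):
    """Build an image-to-video request body (hypothetical layout)."""
    payload = {
        "model": model,
        "prompt": prompt,
        "input_reference": check_input_reference(input_reference),
    }
    # Per-request generation knobs go under nvext; omit unset values.
    nvext = {
        "num_inference_steps": num_inference_steps,
        "guidance_scale": guidance_scale,
        "seed": seed,
    }
    nvext = {key: value for key, value in nvext.items() if value is not None}
    if nvext:
        payload["nvext"] = nvext
    return payload
```

Validating the reference on the client side gives an early, readable error instead of a rejected request from the worker.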

Verified t2i / t2v / i2v end-to-end through the frontend.

🤖 Generated with Claude Code

Summary by CodeRabbit

Release Notes

  • New Features

    • Added support for NVIDIA Cosmos3 omni model via vLLM backend with new --cosmos3-guardrails CLI flag to control optional safety features.
    • Introduced image frame normalization for improved handling of diffusion pipeline outputs.
  • Improvements

    • MP4 video encoding now uses h264_nvenc codec for better GPU efficiency.
    • Updated FFmpeg configuration for enhanced media support.
  • Documentation

    • New guide for running Cosmos3 text-to-image, text-to-video, and image-to-video generation.
    • Added example launch scripts and request payloads for Cosmos3 workflows.
  • Tests

    • Expanded test coverage for frame normalization and Cosmos3 guardrails configuration.

Signed-off-by: ayushag <ayushag@nvidia.com>
Signed-off-by: ayushag <ayushag@nvidia.com>
Signed-off-by: ayushag <ayushag@nvidia.com>
@copy-pr-bot

copy-pr-bot Bot commented May 29, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

-H 'Content-Type: application/json' \\
-d '{
"model": "${MODEL}",
"prompt": "A robot standing in a bright laboratory",

IMO, the examples should provide an appropriate JSON caption, not a dense one.
Currently, we don't have any upsampling within the container, so our example captions should only be JSON strings.

If we later add JSON upsampling within the container, we can use a normal "dense" prompt as an example together with an extra parameter like "upsample_prompt=True" or whatever.

@ayushag-nv ayushag-nv changed the title from "feat(omni): add Cosmos3-Nano support to vLLM-Omni backend" to "feat(omni): add Cosmos3 support to vLLM-Omni backend" May 29, 2026
…stall

Signed-off-by: ayushag <ayushag@nvidia.com>
@github-actions github-actions Bot added the documentation label (Improvements or additions to documentation) May 29, 2026
Signed-off-by: ayushag <ayushag@nvidia.com>
@@ -0,0 +1,12 @@
{
Contributor

This is too long for a sample request. Where is this from? If it is a web download, can we just point to the URL instead of having this in the repo?

Contributor Author

@GuanLuo This is from the private Cosmos repo; there is just a release branch. We can clean this up before merging. By then, more example payloads will be published publicly.

saturley-hall and others added 2 commits May 31, 2026 04:09
Cosmos3 pipelines are only in the unreleased vllm-omni PR
vllm-project/vllm-omni#3454, not in any released wheel. Re-enable the
git-install mechanism (reverted in 7744835) so the vllm-runtime
container installs vllm-omni from the canonical repo pinned to the
current PR head SHA (65b83d87, == refs/pull/3454/head).

When vllm_omni_git_url is set, install_vllm_omni.sh installs
"vllm-omni @ git+<url>@<ref>"; otherwise it falls back to the released
"vllm-omni==<ref>" wheel.

Signed-off-by: Harrison King Saturley-Hall <hsaturleyhal@nvidia.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
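
The install-spec selection described in the commit message above (git URL set versus unset) could be sketched as follows. The actual logic lives in the shell script install_vllm_omni.sh; this Python version only mirrors the two spec strings quoted in the message, and the function name vllm_omni_spec is hypothetical.

```python
def vllm_omni_spec(ref: str, git_url: str = "") -> str:
    """Return a pip requirement string for vLLM-Omni.

    With a git URL, `ref` is treated as a commit SHA or branch;
    without one, `ref` is treated as a released version number.
    """
    if git_url:
        return f"vllm-omni @ git+{git_url}@{ref}"
    return f"vllm-omni=={ref}"


if __name__ == "__main__":
    # Example values are placeholders, not pins taken from the repository.
    print(vllm_omni_spec("<sha>", "https://example.com/vllm-omni.git"))
    print(vllm_omni_spec("<version>"))
```

Keeping both forms behind one variable means the rest of the install step does not need to know whether it is installing from source or from a wheel.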
…it (#10091)

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
(cherry picked from commit dc2f352)
@github-actions github-actions Bot added the backend::sglang (Relates to the sglang backend) and backend::trtllm (Relates to the trtllm backend) labels May 31, 2026
@saturley-hall saturley-hall marked this pull request as ready for review May 31, 2026 08:27
@saturley-hall saturley-hall requested review from a team as code owners May 31, 2026 08:27
@saturley-hall saturley-hall added the blocked label (Waiting on external dependency) May 31, 2026
@saturley-hall
Member

/ok to test 22d56b9

@github-actions
Contributor

github-actions Bot commented May 31, 2026

@coderabbitai
Contributor

coderabbitai Bot commented May 31, 2026

Review Change Stack

Walkthrough

This PR integrates NVIDIA Cosmos3 omni model support into Dynamo's vLLM-Omni backend, including codec migration from libx264 to h264_nvenc for MP4 encoding, Cosmos3 guardrails toggleable configuration, FFmpeg build infrastructure with NVENC/VP9 encoder support, updated container runtime images to deploy LGPL ffmpeg instead of bundled GPL binary wheels, vLLM-Omni git-based installation support, and comprehensive documentation with launch examples for text-to-image, text-to-video, and image-to-video generation.

Changes

Cosmos3 Model Integration

Layer / File(s) Summary
Video normalization and MP4 codec migration
components/src/dynamo/common/utils/video_utils.py, components/src/dynamo/common/tests/test_video_utils.py, components/src/dynamo/sglang/request_handlers/video_generation/video_generation_handler.py, components/src/dynamo/vllm/omni/output_formatter.py
Adds normalize_image_frames() helper to flatten diffusion outputs into ordered PIL frames, migrates MP4 encoding from libx264 to h264_nvenc codec across video utilities and handlers, and integrates normalization into image output formatter. Tests validate codec selection and normalization behavior for PIL passthrough, uint8/float numpy arrays, and multi-dimensional Cosmos3 inputs.
Cosmos3 guardrails configuration
components/src/dynamo/vllm/omni/args.py, components/src/dynamo/vllm/omni/base_handler.py, components/src/dynamo/vllm/tests/omni/test_omni_args.py, components/src/dynamo/vllm/tests/omni/test_omni_base_handler.py
Adds --cosmos3-guardrails CLI flag (default enabled) and OmniConfig.cosmos3_guardrails field; BaseOmniHandler conditionally injects model_config={"guardrails": False} into AsyncOmni kwargs when disabled; tests verify toggle behavior and configuration validation.
FFmpeg NVENC and VP9 build support
container/templates/wheel_builder.Dockerfile, container/context.yaml, container/templates/args.Dockerfile
Extends wheel_builder compilation to include NVENC and VP9 codecs: adds build args for nv_codec_headers and libvpx versions, installs source builds for both codec libraries in CUDA and non-CUDA environments, configures FFmpeg with h264_nvenc and libvpx_vp9 encoder support while maintaining LGPL-only licensing; updates build arg declarations and version pins in context and templates.
Runtime FFmpeg deployment and imageio source installation
container/deps/requirements.common.txt, container/deps/requirements.sglang.txt, container/deps/requirements.trtllm.txt, container/deps/requirements.vllm.txt, container/templates/dynamo_runtime.Dockerfile, container/templates/sglang_runtime.Dockerfile, container/templates/trtllm_runtime.Dockerfile
Updates all runtime images to copy LGPL ffmpeg binary and libraries from wheel_builder stage into /usr/local, runs ldconfig, and sets IMAGEIO_FFMPEG_EXE environment variable; enforces source installation of imageio-ffmpeg via --no-binary directive across all requirements files to avoid bundled GPL binary wheel, allowing imageio to use the in-tree LGPL CLI via environment variable.
vLLM-Omni git-based installation
container/deps/vllm/install_vllm_omni.sh, container/context.yaml, container/templates/args.Dockerfile, container/templates/vllm_runtime.Dockerfile, container/templates/dynamo_base.Dockerfile
Refactors install_vllm_omni.sh to support git checkout via VLLM_OMNI_GIT_URL: computes unified VLLM_OMNI_SPEC that selects git or PyPI installation based on URL presence; updates vllm_runtime and base Dockerfiles with new build args; context.yaml pins vLLM-Omni to specific commit SHA; removes SCCACHE_VERSION default and adds AWS_SDK_CPP_VERSION arg to enable caller-supplied build configuration.
Cosmos3 documentation and launch examples
docs/backends/vllm/cosmos3.md, docs/backends/trtllm/trtllm-diffusion.md, examples/backends/vllm/launch/agg_omni_cosmos3_image.sh, examples/backends/vllm/launch/agg_omni_cosmos3_video.sh, examples/backends/vllm/launch/agg_omni_cosmos3_i2v.sh, examples/backends/vllm/launch/cosmos3/t2i.json, examples/backends/vllm/launch/cosmos3/t2v.json, examples/backends/vllm/launch/cosmos3/i2v.json
Comprehensive Cosmos3 documentation covering checkpoint selection, supported modalities, setup from cosmos3-omni-integration branch with pinned vLLM-Omni commit, and request format specifications; three launch scripts demonstrate aggregated inference with guardrails disabled and media output to filesystem; JSON payload examples show prompt structures and generation parameters for text-to-image, text-to-video, and image-to-video workflows; TensorRT-LLM documentation clarified with container-aware ffmpeg/NVENC setup guidance.
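
Because the MP4 path now depends on h264_nvenc being compiled into the runtime ffmpeg, a quick sanity check can be useful when debugging a container. The sketch below parses the output of `ffmpeg -hide_banner -encoders`; the helper names are hypothetical, and reading the binary path from IMAGEIO_FFMPEG_EXE with a fallback to "ffmpeg" is an assumption based on the environment variable mentioned in the table.

```python
import os
import subprocess


def parse_encoders(listing: str) -> set[str]:
    """Extract encoder names from `ffmpeg -encoders` output.

    The listing starts with a legend, then a dashed separator line,
    then one encoder per line: flags, name, description.
    """
    names: set[str] = set()
    seen_separator = False
    for line in listing.splitlines():
        stripped = line.strip()
        if not seen_separator:
            seen_separator = stripped.startswith("------")
            continue
        parts = stripped.split()
        if len(parts) >= 2:
            names.add(parts[1])
    return names


def ffmpeg_has_encoder(name: str) -> bool:
    """Ask the configured ffmpeg binary whether it provides the given encoder."""
    exe = os.environ.get("IMAGEIO_FFMPEG_EXE", "ffmpeg")
    result = subprocess.run(
        [exe, "-hide_banner", "-encoders"],
        capture_output=True, text=True, check=True,
    )
    return name in parse_encoders(result.stdout)


if __name__ == "__main__":
    print("h264_nvenc available:", ffmpeg_has_encoder("h264_nvenc"))
```

Note that an encoder being listed only shows it was compiled in; actually encoding with NVENC still requires a GPU and driver that support it at run time.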

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name | Status | Explanation | Resolution
Docstring Coverage | ⚠️ Warning | Docstring coverage is 53.85% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name | Status | Explanation
Title check | ✅ Passed | The title 'feat(omni): add Cosmos3 support to vLLM-Omni backend' clearly and concisely summarizes the main change: adding Cosmos3 model support to the vLLM-Omni backend for text-to-image, text-to-video, and image-to-video tasks.
Description check | ✅ Passed | The PR description includes Overview (Summary section), Details (Changes section covering worker integration, examples & docs, tests, and dependencies), and Related Issues; however, it lacks a dedicated 'Where should the reviewer start?' section explicitly calling out specific files.
Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 Trivy (0.69.3)

Trivy execution failed: 2026-05-31T08:28:08Z FATAL Fatal error run error: fs scan error: scan error: scan failed: failed analysis: post analysis error: post analysis error: ansible scan error: fs filter error: fs filter error: walk error range error: stat .coderabbit-opengrep-fallback.yml: no such file or directory: range error: stat .coderabbit-opengrep-fallback.yml: no such file or directory



Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 7

🧹 Nitpick comments (2)
components/src/dynamo/common/utils/video_utils.py (1)

93-93: 💤 Low value

Hoist the PIL import to module top.

from PIL import Image is imported inside the function. The module already implicitly relies on PIL (frames_to_numpy operates on PIL Images), so moving this to the top-level imports is safe and aligns with the repo convention.

♻️ Proposed change
 import numpy as np
+from PIL import Image
-    from PIL import Image
-
     out: list = []

As per coding guidelines: "Keep all imports at module top (no imports inside functions/classes)."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@components/src/dynamo/common/utils/video_utils.py` at line 93, Move the local
import "from PIL import Image" out of the function and add it to the module's
top-level imports in components/src/dynamo/common/utils/video_utils.py;
specifically remove the in-function import (the one used by frames_to_numpy) and
place "from PIL import Image" alongside the other imports at file top so
frames_to_numpy and any other PIL-dependent functions reference the module-level
Image symbol.
container/templates/trtllm_runtime.Dockerfile (1)

154-161: 💤 Low value

Inconsistent error handling for libav/libsw copies compared to sglang.

In sglang_runtime.Dockerfile, the libav*.so* and libsw*.so* copies (line 43) will fail the build if missing, but here (lines 155-156) they silently succeed with || true. If FFmpeg encoding is required for TRT-LLM diffusion (as the comment states), these libraries are also required for the ffmpeg binary to function.

Consider removing 2>/dev/null || true from lines 155-156 to match the sglang behavior and fail fast if the wheel_builder stage is missing required libraries.

Suggested change for consistency
 RUN --mount=type=bind,from=wheel_builder,source=/usr/local/,target=/tmp/usr/local/ \
-    cp -nL /tmp/usr/local/lib/libav*.so* /usr/local/lib/ 2>/dev/null || true && \
-    cp -nL /tmp/usr/local/lib/libsw*.so* /usr/local/lib/ 2>/dev/null || true && \
+    cp -nL /tmp/usr/local/lib/libav*.so* /usr/local/lib/ && \
+    cp -nL /tmp/usr/local/lib/libsw*.so* /usr/local/lib/ && \
     cp -nL /tmp/usr/local/lib/lib*vpx*.so* /usr/local/lib/ 2>/dev/null || true && \
     cp -nL /tmp/usr/local/bin/ffmpeg /usr/local/bin/ffmpeg && \
     cp -r /tmp/usr/local/src/ffmpeg /usr/local/src/ && \
     ldconfig
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@container/templates/trtllm_runtime.Dockerfile` around lines 154 - 161, The
RUN step currently silences failures for critical libs by appending "2>/dev/null
|| true" to the cp of libav*.so* and libsw*.so*; remove the "2>/dev/null ||
true" (and optional "2>/dev/null") from the cp commands that match "cp -nL
/tmp/usr/local/lib/libav*.so*" and "cp -nL /tmp/usr/local/lib/libsw*.so*" so the
build fails fast if those libraries are missing (keeping the cp for lib*vpx*.so*
and the ffmpeg cp/ldconfig as-is), since ffmpeg (ENV
IMAGEIO_FFMPEG_EXE=/usr/local/bin/ffmpeg) requires these libs to function.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/backends/vllm/cosmos3.md`:
- Around line 112-114: The multiline shell snippet in the docs uses backslash
line continuations with inline comments after the backslashes which breaks
copy/paste execution; update the example around the flags --output-modalities,
--no-cosmos3-guardrails, and --media-output-fs-url so comments are on their own
lines (or provide separate full-command variants) instead of trailing the
backslashes, ensuring each continued line ends only with the backslash and the
flag text so the shell command is copy/paste-safe.
- Around line 31-32: Replace the generic link text "[link]" with descriptive
labels matching the checkpoint names so the table rows for `nvidia/Cosmos3-Nano`
and `nvidia/Cosmos3-Super` use link text like "Cosmos3-Nano" and "Cosmos3-Super"
respectively; update the markdown links in the table to read
`[Cosmos3-Nano](https://huggingface.co/nvidia/Cosmos3-Nano)` and
`[Cosmos3-Super](https://huggingface.co/nvidia/Cosmos3-Super)` so linting passes
and the labels clearly identify the checkpoints.
- Around line 31-32: The HF checkpoint links for the model names
`nvidia/Cosmos3-Nano` and `nvidia/Cosmos3-Super` in the docs page are returning
401 in link-check CI; update the two Markdown link targets for those model
entries so they point to publicly accessible URLs that pass docs link-check (for
example swap the current https://huggingface.co/... checkpoint links for the
public model hub pages or an official NVIDIA/public docs page), or alternatively
add those exact HF URLs/statuses to the docs-link-check allowlist; change the
two link targets referenced alongside the `nvidia/Cosmos3-Nano` and
`nvidia/Cosmos3-Super` entries to the new URLs or add them to the allowlist so
CI no longer fails.

In `@examples/backends/vllm/launch/agg_omni_cosmos3_i2v.sh`:
- Line 50: Remove the fragile fixed wait ("sleep 2") from
agg_omni_cosmos3_i2v.sh and replace it with the project’s shared health-check
orchestration: remove the "sleep 2" line and invoke the centralized readiness
check (use the launch framework's health-check helper or wait-for-ready wrapper
used by other launch scripts) to block until the service reports healthy; ensure
you call the same health-check entrypoint used elsewhere in the repo so the
script follows the launch-script convention for readiness handling.

In `@examples/backends/vllm/launch/agg_omni_cosmos3_image.sh`:
- Around line 15-16: The launcher missing gpu_utils integration should source
gpu_utils.sh (via SCRIPT_DIR/../../../common/gpu_utils.sh) and use
build_vllm_gpu_mem_args() when constructing the vLLM CLI invocation in
agg_omni_cosmos3_image.sh; update the script to source gpu_utils.sh near the
other shared utils and insert the output of build_vllm_gpu_mem_args into the
vLLM/vllm-server command-line assembly so GPU memory flags are consistent with
other Cosmos3 launchers.
- Line 48: Replace the fixed "sleep 2" with a proper readiness check: remove the
"sleep 2" line and instead call the shared framework health-check/ready helper
(e.g., a common script or function like wait_for_framework_ready or
check_framework_health) in a loop with a timeout and non-zero exit if not ready;
ensure the script waits for the specific service(s) the launch depends on and
logs progress/errors so startup is deterministic and not flaky.

In `@examples/backends/vllm/launch/agg_omni_cosmos3_video.sh`:
- Line 49: Replace the fixed "sleep 2" with a call to the repository's shared
readiness-check helper (instead of a blind sleep, invoke the common
wait-for-ready/health-check script or function used elsewhere), passing the
service endpoint/port or health URL for the component started in this script and
fail the launch if the check returns non-zero; specifically remove the "sleep 2"
line and invoke the shared readiness checker (e.g., wait_for_service or
wait-for-ready) with the correct args so the script blocks until a successful
health response and exits on timeout/error.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 4d3be066-b46b-43d3-8b8b-18d9296de8e8

📥 Commits

Reviewing files that changed from the base of the PR and between 5b4bc1d and 22d56b9.

📒 Files selected for processing (29)
  • components/src/dynamo/common/tests/test_video_utils.py
  • components/src/dynamo/common/utils/video_utils.py
  • components/src/dynamo/sglang/request_handlers/video_generation/video_generation_handler.py
  • components/src/dynamo/vllm/omni/args.py
  • components/src/dynamo/vllm/omni/base_handler.py
  • components/src/dynamo/vllm/omni/output_formatter.py
  • components/src/dynamo/vllm/tests/omni/test_omni_args.py
  • components/src/dynamo/vllm/tests/omni/test_omni_base_handler.py
  • container/context.yaml
  • container/deps/requirements.common.txt
  • container/deps/requirements.sglang.txt
  • container/deps/requirements.trtllm.txt
  • container/deps/requirements.vllm.txt
  • container/deps/vllm/install_vllm_omni.sh
  • container/templates/args.Dockerfile
  • container/templates/dynamo_base.Dockerfile
  • container/templates/dynamo_runtime.Dockerfile
  • container/templates/sglang_runtime.Dockerfile
  • container/templates/trtllm_runtime.Dockerfile
  • container/templates/vllm_runtime.Dockerfile
  • container/templates/wheel_builder.Dockerfile
  • docs/backends/trtllm/trtllm-diffusion.md
  • docs/backends/vllm/cosmos3.md
  • examples/backends/vllm/launch/agg_omni_cosmos3_i2v.sh
  • examples/backends/vllm/launch/agg_omni_cosmos3_image.sh
  • examples/backends/vllm/launch/agg_omni_cosmos3_video.sh
  • examples/backends/vllm/launch/cosmos3/i2v.json
  • examples/backends/vllm/launch/cosmos3/t2i.json
  • examples/backends/vllm/launch/cosmos3/t2v.json

Comment on lines +31 to +32
| `nvidia/Cosmos3-Nano` | Smaller, faster — default in the Dynamo launch scripts below | [link](https://huggingface.co/nvidia/Cosmos3-Nano) |
| `nvidia/Cosmos3-Super` | Larger, higher quality | [link](https://huggingface.co/nvidia/Cosmos3-Super) |
Contributor

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Use descriptive link labels for checkpoint URLs.

[link] is too generic and already flagged by markdownlint. Use labels like Cosmos3-Nano / Cosmos3-Super.

As per coding guidelines, for **/*.md documentation quality should be maintained; replacing non-descriptive link text improves clarity and lint compliance.

🧰 Tools
🪛 markdownlint-cli2 (0.22.1)

[warning] 31-31: Link text should be descriptive

(MD059, descriptive-link-text)


[warning] 32-32: Link text should be descriptive

(MD059, descriptive-link-text)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/backends/vllm/cosmos3.md` around lines 31 - 32, Replace the generic link
text "[link]" with descriptive labels matching the checkpoint names so the table
rows for `nvidia/Cosmos3-Nano` and `nvidia/Cosmos3-Super` use link text like
"Cosmos3-Nano" and "Cosmos3-Super" respectively; update the markdown links in
the table to read `[Cosmos3-Nano](https://huggingface.co/nvidia/Cosmos3-Nano)`
and `[Cosmos3-Super](https://huggingface.co/nvidia/Cosmos3-Super)` so linting
passes and the labels clearly identify the checkpoints.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fix the checkpoint links that currently fail docs link-check CI.

The current Hugging Face checkpoint URLs are failing lychee with 401, which blocks docs checks. Please switch these to URLs that pass CI (or update the docs-link-check allowlist for these exact domains/statuses).

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/backends/vllm/cosmos3.md` around lines 31 - 32, The HF checkpoint links
for the model names `nvidia/Cosmos3-Nano` and `nvidia/Cosmos3-Super` in the docs
page are returning 401 in link-check CI; update the two Markdown link targets
for those model entries so they point to publicly accessible URLs that pass docs
link-check (for example swap the current https://huggingface.co/... checkpoint
links for the public model hub pages or an official NVIDIA/public docs page), or
alternatively add those exact HF URLs/statuses to the docs-link-check allowlist;
change the two link targets referenced alongside the `nvidia/Cosmos3-Nano` and
`nvidia/Cosmos3-Super` entries to the new URLs or add them to the allowlist so
CI no longer fails.

Comment on lines +112 to +114
--output-modalities image \ # or: video
--no-cosmos3-guardrails \ # skip loading the safety guardrail models
--media-output-fs-url file:///tmp/dynamo_media
Contributor

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

The multiline shell example is not copy/paste-safe.

The inline comments after line-continuation backslashes break the command. Move those comments to separate lines (or provide separate command variants) so the snippet executes as documented.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/backends/vllm/cosmos3.md` around lines 112 - 114, The multiline shell
snippet in the docs uses backslash line continuations with inline comments after
the backslashes which breaks copy/paste execution; update the example around the
flags --output-modalities, --no-cosmos3-guardrails, and --media-output-fs-url so
comments are on their own lines (or provide separate full-command variants)
instead of trailing the backslashes, ensuring each continued line ends only with
the backslash and the flag text so the shell command is copy/paste-safe.

python -m dynamo.frontend &
FRONTEND_PID=$!

sleep 2
Contributor

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Remove fixed readiness sleep and use shared health-check orchestration.

sleep 2 is a fragile startup gate and violates the launch-script convention for readiness handling.

As per coding guidelines, launch scripts should “Avoid readiness sleeps/polls; rely on the shared framework health-check patterns instead.”

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@examples/backends/vllm/launch/agg_omni_cosmos3_i2v.sh` at line 50, Remove the
fragile fixed wait ("sleep 2") from agg_omni_cosmos3_i2v.sh and replace it with
the project’s shared health-check orchestration: remove the "sleep 2" line and
invoke the centralized readiness check (use the launch framework's health-check
helper or wait-for-ready wrapper used by other launch scripts) to block until
the service reports healthy; ensure you call the same health-check entrypoint
used elsewhere in the repo so the script follows the launch-script convention
for readiness handling.

Comment on lines +15 to +16
source "$SCRIPT_DIR/../../../common/launch_utils.sh"

Contributor

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Align this launcher with shared vLLM GPU-memory utilities.

This script skips gpu_utils.sh and does not use build_vllm_gpu_mem_args, so users can’t control VRAM behavior consistently with the other Cosmos3 launchers.

As per coding guidelines, launchers should source gpu_utils.sh and “Use build_vllm_gpu_mem_args() to construct GPU memory CLI flags for vLLM.”

Also applies to: 50-57

🧰 Tools
🪛 Shellcheck (0.11.0)

[info] 15-15: Not following: ./../../../common/launch_utils.sh was not specified as input (see shellcheck -x).

(SC1091)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@examples/backends/vllm/launch/agg_omni_cosmos3_image.sh` around lines 15 -
16, The launcher missing gpu_utils integration should source gpu_utils.sh (via
SCRIPT_DIR/../../../common/gpu_utils.sh) and use build_vllm_gpu_mem_args() when
constructing the vLLM CLI invocation in agg_omni_cosmos3_image.sh; update the
script to source gpu_utils.sh near the other shared utils and insert the output
of build_vllm_gpu_mem_args into the vLLM/vllm-server command-line assembly so
GPU memory flags are consistent with other Cosmos3 launchers.

python -m dynamo.frontend &
FRONTEND_PID=$!

sleep 2
Contributor

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Replace fixed startup sleep with framework readiness handling.

Using sleep 2 introduces flaky startup behavior across machines.

As per coding guidelines, launch scripts should “Avoid readiness sleeps/polls; rely on the shared framework health-check patterns instead.”

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@examples/backends/vllm/launch/agg_omni_cosmos3_image.sh` at line 48, remove
the fixed "sleep 2" and wait on the shared framework health-check/ready helper
instead (e.g., a common script or function like wait_for_framework_ready or
check_framework_health). Bound the wait with a timeout, exit non-zero if the
specific service(s) this launch depends on never become ready, and log
progress/errors so startup is deterministic rather than flaky.

python -m dynamo.frontend &
FRONTEND_PID=$!

sleep 2

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Use shared readiness checks instead of a fixed sleep.

sleep 2 is not reliable for service readiness and can fail under slower startup conditions.

As per coding guidelines, launch scripts should “Avoid readiness sleeps/polls; rely on the shared framework health-check patterns instead.”

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@examples/backends/vllm/launch/agg_omni_cosmos3_video.sh` at line 49, remove
the fixed "sleep 2" and call the repository's shared readiness-check helper
used elsewhere (e.g., wait_for_service or wait-for-ready), passing the service
endpoint/port or health URL for the component started in this script. The
script should block until it gets a successful health response and exit
non-zero on timeout/error.

        out.append(item)
        continue
    arr = np.asarray(item)
    while arr.ndim > 4:  # [batch, frames, H, W, C] -> [frames, H, W, C]

normalize_image_frames collapses a [B, F, H, W, C] Cosmos3 array by taking arr[0], so image requests with n > 1 silently drop every generated batch after the first. Fix: preserve and flatten all leading batch/frame dimensions before converting frames to PIL images.

🤖 AI Fix

In components/src/dynamo/common/utils/video_utils.py, update normalize_image_frames to replace the while arr.ndim > 4: arr = arr[0] logic with validation that the last three dimensions are H, W, C and arr = arr.reshape((-1, *arr.shape[-3:])) so all [B, F, H, W, C] outputs are emitted.
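
A minimal sketch of the suggested reshape, assuming NumPy inputs whose last three dimensions are H, W, C. The helper name `flatten_leading_dims` and the accepted channel counts (1, 3, 4) are assumptions made for illustration; this is not the actual body of normalize_image_frames.

```python
import numpy as np


def flatten_leading_dims(arr: np.ndarray) -> np.ndarray:
    """Collapse every leading batch/frame axis into one, keeping H, W, C.

    Hypothetical helper for illustration only; not the real
    normalize_image_frames implementation.
    """
    if arr.ndim < 3:
        raise ValueError(f"expected at least [H, W, C], got shape {arr.shape}")
    # Assumption: valid images have 1 (gray), 3 (RGB), or 4 (RGBA) channels.
    if arr.shape[-1] not in (1, 3, 4):
        raise ValueError(f"last axis should be channels, got {arr.shape[-1]}")
    # [B, F, H, W, C] -> [B*F, H, W, C]; [H, W, C] -> [1, H, W, C]
    return arr.reshape((-1, *arr.shape[-3:]))


# Two batches of one frame each: both survive, unlike taking arr[0].
batch = np.zeros((2, 1, 8, 8, 3), dtype=np.uint8)
batch[1] = 255
frames = flatten_leading_dims(batch)
print(frames.shape)
```

With this shape handling, an image request with n > 1 would yield one frame per generated batch entry instead of only the first.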

RUN --mount=type=bind,source=./container/deps/requirements.vllm.txt,target=/tmp/requirements.vllm.txt \
--mount=type=cache,target=/root/.cache/uv,sharing=locked \
export UV_CACHE_DIR=/root/.cache/uv && \
uv pip install {{ pip_target }} --reinstall-package imageio-ffmpeg --no-deps \

Reinstalling imageio-ffmpeg from source removes the bundled ffmpeg from the vLLM image, but vLLM-Omni video formatting still calls diffusers.export_to_video through imageio's ffmpeg writer and will fail without a configured ffmpeg/h264 encoder. Fix: copy the in-tree ffmpeg CLI/libs into the vLLM image, set IMAGEIO_FFMPEG_EXE, and ensure the vLLM video formatter uses h264_nvenc instead of imageio's default libx264.

🤖 AI Fix

In container/templates/vllm_runtime.Dockerfile, copy /usr/local/bin/ffmpeg plus libav*.so*, libsw*.so*, and lib*vpx*.so* from wheel_builder, run ldconfig, and set ENV IMAGEIO_FFMPEG_EXE=/usr/local/bin/ffmpeg; in components/src/dynamo/vllm/omni/output_formatter.py DiffusionFormatter._encode_video, replace diffusers.export_to_video with the shared encode_to_video_bytes(..., output_format="mp4") path so the codec is h264_nvenc.

The vllm-runtime build failed at install_vllm_omni.sh with "Git executable
not found" because uv needs git to fetch the vllm-omni PR pin
(git+https://...@65b83d87), but the upstream vllm/vllm-openai runtime image
does not ship git. The released-wheel install never needed it.

Add git to the existing omni apt step, gated on VLLM_OMNI_GIT_URL via
${VLLM_OMNI_GIT_URL:+git} so the PyPI-wheel path (and the eventual revert)
keeps the runtime image lean.

Signed-off-by: Harrison King Saturley-Hall <hsaturleyhal@nvidia.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@saturley-hall
Member

/ok to test 2c48064

The diffusion image tests fed bare MagicMock() objects as images. Since
ebe6779 routed _prepare_images through normalize_image_frames(), a
non-PIL input takes the np.asarray(item).max() path; MagicMock.__iter__
defaults to empty, so np.asarray(MagicMock()) is a zero-size array and
arr.max() raises "zero-size array to reduction operation maximum". These
8 tests only ran in CI once the runtime image built, exposing the failure.

Swap the MagicMock image doubles for real PIL images via a _make_pil_image()
helper, so they hit the isinstance(item, Image.Image) pass-through and
img.save(buf, format="PNG") produces real PNG bytes. Assertions unchanged.

Signed-off-by: Harrison King Saturley-Hall <hsaturleyhal@nvidia.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
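
The failure mode described in this commit message can be checked in isolation. The sketch below assumes only NumPy and the standard library, and it does not call the real formatter or normalize_image_frames. It verifies the two facts the explanation relies on: an unconfigured MagicMock iterates as empty, and max() on a zero-size array raises.

```python
from unittest.mock import MagicMock

import numpy as np

# An unconfigured MagicMock supports iteration but yields nothing.
mock_items = list(MagicMock())
print(mock_items)

# max() over a zero-size array has no identity value, so NumPy raises.
empty = np.asarray(mock_items)
error_message = None
try:
    empty.max()
except ValueError as exc:
    error_message = str(exc)
print(error_message)
```

Passing a real PIL image instead avoids this path entirely, because the isinstance(item, Image.Image) pass-through returns before any array reduction happens.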
@saturley-hall
Member

/ok to test 271214e

Labels

backend::sglang: Relates to the sglang backend
backend::trtllm: Relates to the trtllm backend
backend::vllm: Relates to the vllm backend
blocked: Waiting on external dependency
container
documentation: Improvements or additions to documentation
feat
multimodal
size/XL

5 participants