feat: use generic image and use single node for oss-gpt-120b recipe #3454

Merged: biswapanda merged 1 commit into main from bis/oss-gpt-120b-1node on Oct 7, 2025

@biswapanda biswapanda commented Oct 7, 2025

Overview:

  • use generic image
  • use single node
  • fix hub path

Details:

Where should the reviewer start?

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

  • closes GitHub issue: #xxx

Summary by CodeRabbit

  • Bug Fixes

    • Corrected model path to ensure models load reliably.
  • Chores

    • Switched container images to a generic placeholder registry and tag (my-registry/trtllm-runtime:my-tag) so the recipe is not tied to a specific release image.
    • Adjusted default replica count to a single instance for streamlined deployments.
    • Tuned performance configuration to use fewer GPUs, aligning perf runs with typical resource availability.

@biswapanda biswapanda self-assigned this Oct 7, 2025
@biswapanda biswapanda requested a review from a team as a code owner October 7, 2025 06:32
@biswapanda biswapanda requested a review from a team October 7, 2025 06:32
@biswapanda biswapanda requested a review from a team as a code owner October 7, 2025 06:32
@github-actions github-actions Bot added the feat label Oct 7, 2025

coderabbitai Bot commented Oct 7, 2025

Walkthrough

Updated TRT-LLM deployment configs: switched container image registry/tag, reduced replicas, adjusted MODEL_PATH prefix, and scaled down GPU count in perf settings. No new features or API changes.
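
The MODEL_PATH change adds a `hub/` segment, which matches the standard Hugging Face cache layout (`<cache root>/hub/models--<org>--<name>`). A minimal sketch of how that directory name is derived, assuming `/model-store` is the cache root mounted in the container (the helper name `hub_repo_dir` is hypothetical and not part of the recipe):

```python
from pathlib import PurePosixPath


def hub_repo_dir(cache_root: str, repo_id: str) -> PurePosixPath:
    """Return the directory the Hugging Face hub cache uses for a model repo.

    Assumption: cache_root plays the role of HF_HOME, so repos live under
    <cache_root>/hub/. The "models--<org>--<name>" folder naming follows the
    hub cache convention.
    """
    folder = "models--" + repo_id.replace("/", "--")
    return PurePosixPath(cache_root) / "hub" / folder


# Prefix used by the updated MODEL_PATH (the rest of the path is elided in the PR).
print(hub_repo_dir("/model-store", "openai/gpt-oss-120b"))
```

The old prefix lacked the `hub/` segment, so a model downloaded into the default cache layout would not have been found under it.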

Changes

| Cohort / File(s) | Summary of Changes |
|---|---|
| TRT-LLM deployment configs: recipes/gpt-oss-120b/trtllm/agg/deploy.yaml | Changed image from nvcr.io/nvidia/ai-dynamo/tensorrtllm-runtime:0.5.1-rc0.pre3 to my-registry/trtllm-runtime:my-tag in three containers; reduced replicas 18→1 across three sections; updated MODEL_PATH from /model-store/models--openai--gpt-oss-120b/... to /model-store/hub/models--openai--gpt-oss-120b/... in two places. |
| Perf scaling config: recipes/gpt-oss-120b/trtllm/agg/perf.yaml | Reduced DEPLOYMENT_GPU_COUNT from "72" to "4", affecting derived concurrency and perf inputs; no other logic/path changes. |
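
The replica and GPU-count changes listed above are numerically consistent with each other. The following sketch checks that arithmetic; the figure of 4 GPUs per replica is an inference from 72 / 18 and is not stated anywhere in the PR, and the function name is hypothetical:

```python
def total_gpus(replicas: int, gpus_per_replica: int) -> int:
    """Total GPUs consumed by a deployment with identical replicas."""
    return replicas * gpus_per_replica


# Values taken from the change summary.
OLD_REPLICAS, NEW_REPLICAS = 18, 1
OLD_GPU_COUNT, NEW_GPU_COUNT = 72, 4

# Assumption: every replica uses the same number of GPUs.
gpus_per_replica = OLD_GPU_COUNT // OLD_REPLICAS

# Under that assumption, one replica needs exactly the new DEPLOYMENT_GPU_COUNT.
assert total_gpus(NEW_REPLICAS, gpus_per_replica) == NEW_GPU_COUNT
print(gpus_per_replica)
```

If the per-replica GPU count differs in practice, DEPLOYMENT_GPU_COUNT in perf.yaml would need to be adjusted to match deploy.yaml.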

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

I thump my paws—new tags, fewer crews,
One GPU burrow, not seventy-two’s.
Paths hop to “hub,” replicas rest,
Leaner carrots, same warm nest.
In quiet racks, I twitch with glee—
Configs trimmed, swift as a bunny.

Pre-merge checks

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Description Check | ⚠️ Warning | The pull request description includes the Overview and Related Issues sections but leaves the Details and Where should the reviewer start sections as unmodified placeholders, providing no concrete information about the specific changes or files that need attention, which makes the description incomplete for effective review. | Please populate the Details section with a clear summary of the file changes and update the Where should the reviewer start section to call out the specific files or areas that require focused review. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title Check | ✅ Passed | The pull request title clearly and concisely summarizes the main changes by indicating the switch to a generic image and reducing to a single node for the oss-gpt-120b recipe, directly reflecting the core updates in the deployment configurations. It uses specific terminology and avoids vague language, making it easy for reviewers to grasp the primary intent at a glance. The title is appropriately scoped and aligned with the changeset. |
| Docstring Coverage | ✅ Passed | No functions found in the changes. Docstring coverage check skipped. |

@biswapanda biswapanda enabled auto-merge (squash) October 7, 2025 17:16
@biswapanda biswapanda merged commit af7a41c into main Oct 7, 2025
20 checks passed
@biswapanda biswapanda deleted the bis/oss-gpt-120b-1node branch October 7, 2025 17:25
saturley-hall pushed a commit that referenced this pull request Oct 7, 2025
Signed-off-by: Biswa Panda <biswa.panda@gmail.com>
ptarasiewiczNV pushed a commit that referenced this pull request Oct 8, 2025
…3454)

Signed-off-by: Piotr Tarasiewicz <ptarasiewicz@nvidia.com>
saturley-hall added a commit that referenced this pull request May 31, 2026
Cosmos3 pipelines are only in the unreleased vllm-omni PR
vllm-project/vllm-omni#3454, not in any released wheel. Re-enable the
git-install mechanism (reverted in 7744835) so the vllm-runtime
container installs vllm-omni from the canonical repo pinned to the
current PR head SHA (65b83d87, == refs/pull/3454/head).

When vllm_omni_git_url is set, install_vllm_omni.sh installs
"vllm-omni @ git+<url>@<ref>"; otherwise it falls back to the released
"vllm-omni==<ref>" wheel.

Signed-off-by: Harrison King Saturley-Hall <hsaturleyhal@nvidia.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
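
The install fallback described in the commit message above can be sketched as follows. The real logic lives in the shell script install_vllm_omni.sh, whose contents are not shown here, so this Python function only illustrates the selection rule; the function name, the repository URL, and the version string "0.1.0" are assumptions made for the example:

```python
def vllm_omni_requirement(ref: str, git_url: str = "") -> str:
    """Build the pip requirement string for vllm-omni.

    If git_url is set, ref is treated as a git ref (branch, tag, or SHA)
    and a direct-reference requirement is produced. Otherwise ref is
    treated as a released version and pinned with "==".
    """
    if git_url:
        return f"vllm-omni @ git+{git_url}@{ref}"
    return f"vllm-omni=={ref}"


# Git install pinned to the PR head SHA named in the commit message.
# The URL below is an assumed form of the canonical repo address.
print(vllm_omni_requirement("65b83d87", "https://github.com/vllm-project/vllm-omni.git"))

# Fallback to a released wheel; "0.1.0" is a hypothetical version.
print(vllm_omni_requirement("0.1.0"))
```

Reusing one `ref` value for both the git ref and the wheel version keeps the build argument surface small, at the cost of the value meaning different things depending on whether the URL is set.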