feat: update gpt-oss 120b model recipe by biswapanda · Pull Request #3143 · ai-dynamo/dynamo

biswapanda · 2025-09-19T16:16:43Z

Overview:

New Features
- Added a persistent, shared model cache via Kubernetes PVC to speed up reuse across jobs.
- Introduced a Kubernetes Job to pre-download and cache the target Hugging Face model using a secret token.
- Added a benchmarking Job to load-test the GPT‑OSS 120B endpoint, automatically waits for readiness, runs specified concurrencies, and saves performance artifacts.

coderabbitai · 2025-09-19T16:26:56Z

Walkthrough

Adds three Kubernetes manifests under recipes/gpt-oss-120b: a PersistentVolumeClaim for a shared Hugging Face cache, a Job to pre-download the openai/gpt-oss-120b model into that cache, and a benchmarking Job that waits for model readiness and runs aiperf against a TRT-LLM aggregate endpoint.

Changes

Cohort / File(s)	Change Summary
Model cache PVC `recipes/gpt-oss-120b/model-cache/model-cache.yaml`	New PersistentVolumeClaim named model-cache (apiVersion v1, RWX, 100Gi, storageClassName oci-nfs) with SPDX header.
Model pre-download Job `recipes/gpt-oss-120b/model-cache/model-download.yaml`	New Kubernetes Job that installs huggingface_hub and hf_transfer, authenticates via hf-token-secret, downloads MODEL_NAME=openai/gpt-oss-120b into /root/.cache/huggingface/hub backed by the model-cache PVC; backoffLimit 3, parallelism 1, restartPolicy Never.
Benchmark (TRT-LLM agg) Job `recipes/gpt-oss-120b/trtllm/agg/bench-pub.yaml`	New Kubernetes Job oss-gpt120b-bench running aiperf against endpoint gpt-oss-agg-trtllmworker:8000 for TARGET_MODEL openai/gpt-oss-120b; waits for /v1/models readiness, iterates over CONCURRENCIES, writes artifacts under ROOT_ARTIFACT_DIR, mounts model-cache PVC at /root/.cache/huggingface; uses nvcrimagepullsecret.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant K8s as Kubernetes
  participant PVC as PVC model-cache
  participant DL as Job: model-download
  participant Bench as Job: oss-gpt120b-bench
  participant EP as TRT-LLM Endpoint (agg)

  rect rgb(240,248,255)
    Note over PVC: Shared HF cache (RWX, 100Gi)
  end

  K8s->>PVC: Create PersistentVolumeClaim
  K8s->>DL: Schedule model-download (mounts PVC)
  DL->>DL: Install huggingface_hub, hf_transfer
  DL->>HF: huggingface-cli download MODEL_NAME
  DL->>PVC: Write model files to /root/.cache/huggingface/hub
  DL-->>K8s: Complete

  K8s->>Bench: Schedule bench job (mounts PVC)
  loop Poll every 5s
    Bench->>EP: GET /v1/models
    EP-->>Bench: Model list (checks for TARGET_MODEL)
  end
  Bench->>EP: POST /v1/chat/completions (stream), varying concurrency
  Bench->>Artifacts: Write results per concurrency under ROOT_ARTIFACT_DIR

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

I cached a moon of tokens bright,
In NFS fields through the night—
Then hopped to bench with eager cheer,
“Are models up?” I twitch an ear.
Concurrency thumps, results take flight,
A happy hare with graphs in sight. 🐇📈

Pre-merge checks

❌ Failed checks (1 warning, 1 inconclusive)

Check name	Status	Explanation	Resolution
Description Check	⚠️ Warning	The PR description provides an overview and a closing issue reference but omits the Details and Where should the reviewer start sections from the repository’s template, and the Related Issues header is misformatted, so it does not fully follow the required structure.	Add the missing “#### Details:” and “#### Where should the reviewer start?” sections with specifics on the changes and entry points, and format the Related Issues section as “#### Related Issues:” followed by “- closes GitHub issue: #DEP-410.”
Title Check	❓ Inconclusive	The title indicates a feature update to the GPT-OSS 120B model recipe but is overly generic and does not specify the new additions such as the shared PVC cache, model download job, or benchmarking job, making it unclear at a glance what the primary change is.	Consider revising the title to clearly highlight the key feature additions, for example, “feat: add persistent model cache and download/benchmark jobs for GPT-OSS 120B recipe.”

✅ Passed checks (1 passed)

Check name	Status	Explanation
Docstring Coverage	✅ Passed	No functions found in the changes. Docstring coverage check skipped.

Tip

👮 Agentic pre-merge checks are now available in preview!

Pro plan users can now enable pre-merge checks in their settings to enforce checklists before merging PRs.

Built-in checks – Quickly apply ready-made checks to enforce title conventions, require pull request descriptions that follow templates, validate linked issues for compliance, and more.
Custom agentic checks – Define your own rules using CodeRabbit’s advanced agentic capabilities to enforce organization-specific policies and workflows. For example, you can instruct CodeRabbit’s agent to verify that API documentation is updated whenever API schema files are modified in a PR. Note: Upto 5 custom checks are currently allowed during the preview period. Pricing for this feature will be announced in a few weeks.

Please see the documentation for more information.

Example:

reviews:
  pre_merge_checks:
    custom_checks:
      - name: "Undocumented Breaking Changes"
        mode: "warning"
        instructions: |
          Pass/fail criteria: All breaking changes to public APIs, CLI flags, environment variables, configuration keys, database schemas, or HTTP/GraphQL endpoints must be documented in the "Breaking Change" section of the PR description and in CHANGELOG.md. Exclude purely internal or private changes (e.g., code not exported from package entry points or explicitly marked as internal).

Please share your feedback with us on this Discord post.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (8)

recipes/gpt-oss-120b/model-cache/model-cache.yaml (1)
1-13: LGTM! Add missing newline at end of file.

The PVC configuration is appropriate for shared model caching with NFS storage. The 100Gi storage size and ReadWriteMany access mode are suitable for sharing cached models across multiple jobs.

Apply this diff to fix the missing newline:
   storageClassName: oci-nfs
+
recipes/gpt-oss-120b/model-cache/model-download.yaml (3)
25-28: Update obsolete comment.

The comment references "llama-3-70b model" but the actual model being downloaded is "openai/gpt-oss-120b".

Apply this diff to correct the comment:
-            # NOTE: This is the model name for the llama-3-70b model
-            # Update this to model name for the model you are downloading
+            # NOTE: This is the model name for the openai/gpt-oss-120b model
29-33: Remove redundant HF_TOKEN configuration.

The HF_TOKEN is already sourced from hf-token-secret via envFrom (lines 21-23), making the explicit env var definition redundant.

Apply this diff to remove the redundant configuration:
-            - name: HF_TOKEN
-              valueFrom:
-                secretKeyRef:
-                  name: hf-token-secret
-                  key: HF_TOKEN
1-46: Add missing newline at end of file.

Apply this diff to fix the missing newline:
           claimName: model-cache
+
recipes/gpt-oss-120b/trtllm/agg/bench-pub.yaml (4)
46-46: Address the TODO comment.

The TODO indicates this setup should be baked into the aiperf image. This suggests the current approach is temporary and adds overhead to job startup.

Do you want me to help create an issue to track building a custom aiperf image with these dependencies pre-installed?

60-61: Remove duplicate echo statement.

Line 60 duplicates the message from line 59.

Apply this diff to remove the duplicate:
             echo "✅ Model '$TARGET_MODEL' is now available!"
-            echo "Model '$TARGET_MODEL' is now available!"
103-112: Fix JSON key naming inconsistency.

The JSON uses inconsistent key naming: "model endpoint" contains a space while other keys use underscores or are single words.

Apply this diff for consistent naming:
             "model endpoint": "$TARGET_MODEL"
+            "model_endpoint": "$TARGET_MODEL"
8-8: Consider increasing backoffLimit for robustness.

The benchmarking job has backoffLimit: 1, meaning it will fail after a single retry. Given the complexity of the benchmarking setup and potential for transient failures, consider increasing this value.

Apply this diff to allow more retries:
-  backoffLimit: 1
+  backoffLimit: 3

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 56f423c and e74467f.

📒 Files selected for processing (3)

recipes/gpt-oss-120b/model-cache/model-cache.yaml (1 hunks)
recipes/gpt-oss-120b/model-cache/model-download.yaml (1 hunks)
recipes/gpt-oss-120b/trtllm/agg/bench-pub.yaml (1 hunks)

🧰 Additional context used

🪛 YAMLlint (1.37.1)

recipes/gpt-oss-120b/model-cache/model-cache.yaml

[error] 13-13: no new line character at the end of file

(new-line-at-end-of-file)

recipes/gpt-oss-120b/model-cache/model-download.yaml

[error] 46-46: no new line character at the end of file

(new-line-at-end-of-file)

🪛 Checkov (3.2.334)

recipes/gpt-oss-120b/trtllm/agg/bench-pub.yaml

[medium] 3-126: Containers should not run with allowPrivilegeEscalation

(CKV_K8S_20)

[medium] 3-126: Minimize the admission of root containers

(CKV_K8S_23)

recipes/gpt-oss-120b/model-cache/model-download.yaml

[medium] 3-46: Containers should not run with allowPrivilegeEscalation

(CKV_K8S_20)

[medium] 3-46: Minimize the admission of root containers

(CKV_K8S_23)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)

GitHub Check: Build and Test - sglang
GitHub Check: Build and Test - dynamo

🔇 Additional comments (1)

recipes/gpt-oss-120b/trtllm/agg/bench-pub.yaml (1)

26-27: Verify CONCURRENCIES are appropriate for the 32‑GPU deployment

Both recipes/gpt-oss-120b/trtllm/agg/bench.yaml and recipes/gpt-oss-120b/trtllm/agg/bench-pub.yaml set CONCURRENCIES to very large values (bench: "13000 13500 1400"; bench-pub: "13000 13500") and the scripts iterate over $CONCURRENCIES to run perf — confirm these are intentional and compatible with DEPLOYMENT_GPU_COUNT (32 GPUs), or reduce/document them.

Locations: recipes/gpt-oss-120b/trtllm/agg/bench.yaml (lines ~26, ~112–115) and recipes/gpt-oss-120b/trtllm/agg/bench-pub.yaml (lines ~26, ~114–117).

copy-pr-bot · 2025-09-25T00:10:08Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

Signed-off-by: Biswa Panda <biswa.panda@gmail.com>

biswapanda · 2025-09-30T04:23:31Z

/ok to test 7918461

Signed-off-by: Biswa Panda <biswa.panda@gmail.com>

biswapanda requested a review from a team as a code owner September 19, 2025 16:16

biswapanda requested a review from a team September 19, 2025 16:16

pull-request-size Bot added the size/L label Sep 19, 2025

github-actions Bot added the feat label Sep 19, 2025

coderabbitai Bot reviewed Sep 19, 2025

View reviewed changes

Comment thread recipes/gpt-oss-120b/trtllm/agg/bench-pub.yaml Outdated

biswapanda changed the title ~~feat: use public aiperf image for gpt-oss 120b~~ feat: update gpt-oss 120b model recipe Sep 22, 2025

biswapanda requested a review from a team as a code owner September 22, 2025 22:24

copy-pr-bot Bot temporarily deployed to GITLAB September 22, 2025 22:24 Inactive

copy-pr-bot Bot temporarily deployed to GITLAB September 22, 2025 22:25 Inactive

copy-pr-bot Bot temporarily deployed to GITLAB September 22, 2025 22:31 Inactive

copy-pr-bot Bot temporarily deployed to GITLAB September 22, 2025 22:32 Inactive

copy-pr-bot Bot temporarily deployed to GITLAB September 22, 2025 22:34 Inactive

copy-pr-bot Bot temporarily deployed to GITLAB September 22, 2025 22:42 Inactive

biswapanda force-pushed the bis/dep-410-oss-gpt-120b-updates branch from a348ea5 to d81752b Compare September 24, 2025 22:58

copy-pr-bot Bot temporarily deployed to GITLAB September 24, 2025 22:58 Inactive

copy-pr-bot Bot temporarily deployed to GITLAB September 24, 2025 23:00 Inactive

biswapanda self-assigned this Sep 24, 2025

julienmancuso reviewed Sep 24, 2025

View reviewed changes

Comment thread recipes/gpt-oss-120b/model-cache/model-download.yaml Outdated

biswapanda force-pushed the bis/dep-410-oss-gpt-120b-updates branch from d81752b to d04e971 Compare September 25, 2025 00:07

copy-pr-bot Bot temporarily deployed to GITLAB September 25, 2025 00:07 Inactive

copy-pr-bot Bot temporarily deployed to GITLAB September 25, 2025 00:08 Inactive

biswapanda force-pushed the bis/dep-410-oss-gpt-120b-updates branch from d04e971 to e39ba15 Compare September 25, 2025 00:09

copy-pr-bot Bot temporarily deployed to GITLAB September 25, 2025 00:09 Inactive

biswapanda force-pushed the bis/dep-410-oss-gpt-120b-updates branch from e39ba15 to 92b5e31 Compare September 25, 2025 00:10

biswapanda added 2 commits September 24, 2025 17:10

add model cache and use public aiperf image

15ac82a

Signed-off-by: Biswa Panda <biswa.panda@gmail.com>

fix

61e60db

Signed-off-by: Biswa Panda <biswa.panda@gmail.com>

biswapanda force-pushed the bis/dep-410-oss-gpt-120b-updates branch from 92b5e31 to 61e60db Compare September 25, 2025 00:10

biswapanda added 4 commits September 29, 2025 17:42

fix

0cbecad

fix

d67ef6e

fix

123a9fc

fix

d05e7ca

copy-pr-bot Bot temporarily deployed to GITLAB September 30, 2025 01:06 Inactive

copy-pr-bot Bot temporarily deployed to GITLAB September 30, 2025 01:09 Inactive

fix

11554a9

copy-pr-bot Bot temporarily deployed to GITLAB September 30, 2025 02:29 Inactive

biswapanda added 2 commits September 29, 2025 19:30

fix

c737358

fix

9e0e434

copy-pr-bot Bot temporarily deployed to GITLAB September 30, 2025 03:00 Inactive

Merge branch 'main' into bis/dep-410-oss-gpt-120b-updates

7918461

copy-pr-bot Bot temporarily deployed to GITLAB September 30, 2025 04:23 Inactive

copy-pr-bot Bot temporarily deployed to GITLAB September 30, 2025 04:28 Inactive

Merge branch 'main' into bis/dep-410-oss-gpt-120b-updates

659a379

Signed-off-by: Biswa Panda <biswa.panda@gmail.com>

copy-pr-bot Bot temporarily deployed to GITLAB October 2, 2025 16:35 Inactive

update

4cfa0c5

copy-pr-bot Bot temporarily deployed to GITLAB October 2, 2025 19:51 Inactive

copy-pr-bot Bot temporarily deployed to GITLAB October 2, 2025 19:53 Inactive

biswapanda merged commit de6fdf0 into main Oct 6, 2025
20 of 23 checks passed

biswapanda deleted the bis/dep-410-oss-gpt-120b-updates branch October 6, 2025 15:55

biswapanda added a commit that referenced this pull request Oct 6, 2025

feat: update gpt-oss 120b model recipe (#3143)

9228909

Signed-off-by: Biswa Panda <biswa.panda@gmail.com>

biswapanda mentioned this pull request Oct 6, 2025

feat: update gpt-oss 120b model recipe #3143 #3454 #3431

Merged

saturley-hall pushed a commit that referenced this pull request Oct 7, 2025

feat: update gpt-oss 120b model recipe #3143 #3454 (#3431)

99f1696

Signed-off-by: Biswa Panda <biswa.panda@gmail.com>

nv-tusharma pushed a commit that referenced this pull request Oct 20, 2025

feat: update gpt-oss 120b model recipe (#3143)

dcb45ed

Signed-off-by: Biswa Panda <biswa.panda@gmail.com>

yao531441 pushed a commit to yao531441/dynamo that referenced this pull request May 13, 2026

feat: update gpt-oss 120b model recipe (ai-dynamo#3143)

817e380

Signed-off-by: Biswa Panda <biswa.panda@gmail.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: update gpt-oss 120b model recipe#3143

feat: update gpt-oss 120b model recipe#3143
biswapanda merged 16 commits into
mainfrom
bis/dep-410-oss-gpt-120b-updates

biswapanda commented Sep 19, 2025 •

edited

Loading

Uh oh!

coderabbitai Bot commented Sep 19, 2025 •

edited

Loading

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Uh oh!

copy-pr-bot Bot commented Sep 25, 2025

Uh oh!

biswapanda commented Sep 30, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

biswapanda commented Sep 19, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Overview:

Uh oh!

coderabbitai Bot commented Sep 19, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Poem

Pre-merge checks

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

copy-pr-bot Bot commented Sep 25, 2025

Uh oh!

biswapanda commented Sep 30, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

biswapanda commented Sep 19, 2025 •

edited

Loading

coderabbitai Bot commented Sep 19, 2025 •

edited

Loading