Skip to content

feat: update gpt-oss 120b model recipe#3143

Merged
biswapanda merged 16 commits into
mainfrom
bis/dep-410-oss-gpt-120b-updates
Oct 6, 2025
Merged

feat: update gpt-oss 120b model recipe#3143
biswapanda merged 16 commits into
mainfrom
bis/dep-410-oss-gpt-120b-updates

Conversation

@biswapanda
Copy link
Copy Markdown
Contributor

@biswapanda biswapanda commented Sep 19, 2025

Overview:

  • New Features
    • Added a persistent, shared model cache via Kubernetes PVC to speed up reuse across jobs.
    • Introduced a Kubernetes Job to pre-download and cache the target Hugging Face model using a secret token.
    • Added a benchmarking Job to load-test the GPT‑OSS 120B endpoint, automatically waits for readiness, runs specified concurrencies, and saves performance artifacts.

closes: DEP-410

@biswapanda biswapanda requested a review from a team as a code owner September 19, 2025 16:16
@biswapanda biswapanda requested a review from a team September 19, 2025 16:16
@github-actions github-actions Bot added the feat label Sep 19, 2025
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Sep 19, 2025

Walkthrough

Adds three Kubernetes manifests under recipes/gpt-oss-120b: a PersistentVolumeClaim for a shared Hugging Face cache, a Job to pre-download the openai/gpt-oss-120b model into that cache, and a benchmarking Job that waits for model readiness and runs aiperf against a TRT-LLM aggregate endpoint.

Changes

Cohort / File(s) Change Summary
Model cache PVC
recipes/gpt-oss-120b/model-cache/model-cache.yaml
New PersistentVolumeClaim named model-cache (apiVersion v1, RWX, 100Gi, storageClassName oci-nfs) with SPDX header.
Model pre-download Job
recipes/gpt-oss-120b/model-cache/model-download.yaml
New Kubernetes Job that installs huggingface_hub and hf_transfer, authenticates via hf-token-secret, downloads MODEL_NAME=openai/gpt-oss-120b into /root/.cache/huggingface/hub backed by the model-cache PVC; backoffLimit 3, parallelism 1, restartPolicy Never.
Benchmark (TRT-LLM agg) Job
recipes/gpt-oss-120b/trtllm/agg/bench-pub.yaml
New Kubernetes Job oss-gpt120b-bench running aiperf against endpoint gpt-oss-agg-trtllmworker:8000 for TARGET_MODEL openai/gpt-oss-120b; waits for /v1/models readiness, iterates over CONCURRENCIES, writes artifacts under ROOT_ARTIFACT_DIR, mounts model-cache PVC at /root/.cache/huggingface; uses nvcrimagepullsecret.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant K8s as Kubernetes
  participant PVC as PVC model-cache
  participant DL as Job: model-download
  participant Bench as Job: oss-gpt120b-bench
  participant EP as TRT-LLM Endpoint (agg)

  rect rgb(240,248,255)
    Note over PVC: Shared HF cache (RWX, 100Gi)
  end

  K8s->>PVC: Create PersistentVolumeClaim
  K8s->>DL: Schedule model-download (mounts PVC)
  DL->>DL: Install huggingface_hub, hf_transfer
  DL->>HF: huggingface-cli download MODEL_NAME
  DL->>PVC: Write model files to /root/.cache/huggingface/hub
  DL-->>K8s: Complete

  K8s->>Bench: Schedule bench job (mounts PVC)
  loop Poll every 5s
    Bench->>EP: GET /v1/models
    EP-->>Bench: Model list (checks for TARGET_MODEL)
  end
  Bench->>EP: POST /v1/chat/completions (stream), varying concurrency
  Bench->>Artifacts: Write results per concurrency under ROOT_ARTIFACT_DIR
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

I cached a moon of tokens bright,
In NFS fields through the night—
Then hopped to bench with eager cheer,
“Are models up?” I twitch an ear.
Concurrency thumps, results take flight,
A happy hare with graphs in sight. 🐇📈

Pre-merge checks

❌ Failed checks (1 warning, 1 inconclusive)
Check name Status Explanation Resolution
Description Check ⚠️ Warning The PR description provides an overview and a closing issue reference but omits the Details and Where should the reviewer start sections from the repository’s template, and the Related Issues header is misformatted, so it does not fully follow the required structure. Add the missing “#### Details:” and “#### Where should the reviewer start?” sections with specifics on the changes and entry points, and format the Related Issues section as “#### Related Issues:” followed by “- closes GitHub issue: #DEP-410.”
Title Check ❓ Inconclusive The title indicates a feature update to the GPT-OSS 120B model recipe but is overly generic and does not specify the new additions such as the shared PVC cache, model download job, or benchmarking job, making it unclear at a glance what the primary change is. Consider revising the title to clearly highlight the key feature additions, for example, “feat: add persistent model cache and download/benchmark jobs for GPT-OSS 120B recipe.”
✅ Passed checks (1 passed)
Check name Status Explanation
Docstring Coverage ✅ Passed No functions found in the changes. Docstring coverage check skipped.

Tip

👮 Agentic pre-merge checks are now available in preview!

Pro plan users can now enable pre-merge checks in their settings to enforce checklists before merging PRs.

  • Built-in checks – Quickly apply ready-made checks to enforce title conventions, require pull request descriptions that follow templates, validate linked issues for compliance, and more.
  • Custom agentic checks – Define your own rules using CodeRabbit’s advanced agentic capabilities to enforce organization-specific policies and workflows. For example, you can instruct CodeRabbit’s agent to verify that API documentation is updated whenever API schema files are modified in a PR. Note: Upto 5 custom checks are currently allowed during the preview period. Pricing for this feature will be announced in a few weeks.

Please see the documentation for more information.

Example:

reviews:
  pre_merge_checks:
    custom_checks:
      - name: "Undocumented Breaking Changes"
        mode: "warning"
        instructions: |
          Pass/fail criteria: All breaking changes to public APIs, CLI flags, environment variables, configuration keys, database schemas, or HTTP/GraphQL endpoints must be documented in the "Breaking Change" section of the PR description and in CHANGELOG.md. Exclude purely internal or private changes (e.g., code not exported from package entry points or explicitly marked as internal).

Please share your feedback with us on this Discord post.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (8)
recipes/gpt-oss-120b/model-cache/model-cache.yaml (1)

1-13: LGTM! Add missing newline at end of file.

The PVC configuration is appropriate for shared model caching with NFS storage. The 100Gi storage size and ReadWriteMany access mode are suitable for sharing cached models across multiple jobs.

Apply this diff to fix the missing newline:

   storageClassName: oci-nfs
+
recipes/gpt-oss-120b/model-cache/model-download.yaml (3)

25-28: Update obsolete comment.

The comment references "llama-3-70b model" but the actual model being downloaded is "openai/gpt-oss-120b".

Apply this diff to correct the comment:

-            # NOTE: This is the model name for the llama-3-70b model
-            # Update this to model name for the model you are downloading
+            # NOTE: This is the model name for the openai/gpt-oss-120b model

29-33: Remove redundant HF_TOKEN configuration.

The HF_TOKEN is already sourced from hf-token-secret via envFrom (lines 21-23), making the explicit env var definition redundant.

Apply this diff to remove the redundant configuration:

-            - name: HF_TOKEN
-              valueFrom:
-                secretKeyRef:
-                  name: hf-token-secret
-                  key: HF_TOKEN

1-46: Add missing newline at end of file.

Apply this diff to fix the missing newline:

           claimName: model-cache
+
recipes/gpt-oss-120b/trtllm/agg/bench-pub.yaml (4)

46-46: Address the TODO comment.

The TODO indicates this setup should be baked into the aiperf image. This suggests the current approach is temporary and adds overhead to job startup.

Do you want me to help create an issue to track building a custom aiperf image with these dependencies pre-installed?


60-61: Remove duplicate echo statement.

Line 60 duplicates the message from line 59.

Apply this diff to remove the duplicate:

             echo "✅ Model '$TARGET_MODEL' is now available!"
-            echo "Model '$TARGET_MODEL' is now available!"

103-112: Fix JSON key naming inconsistency.

The JSON uses inconsistent key naming: "model endpoint" contains a space while other keys use underscores or are single words.

Apply this diff for consistent naming:

             "model endpoint": "$TARGET_MODEL"
+            "model_endpoint": "$TARGET_MODEL"

8-8: Consider increasing backoffLimit for robustness.

The benchmarking job has backoffLimit: 1, meaning it will fail after a single retry. Given the complexity of the benchmarking setup and potential for transient failures, consider increasing this value.

Apply this diff to allow more retries:

-  backoffLimit: 1
+  backoffLimit: 3
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 56f423c and e74467f.

📒 Files selected for processing (3)
  • recipes/gpt-oss-120b/model-cache/model-cache.yaml (1 hunks)
  • recipes/gpt-oss-120b/model-cache/model-download.yaml (1 hunks)
  • recipes/gpt-oss-120b/trtllm/agg/bench-pub.yaml (1 hunks)
🧰 Additional context used
🪛 YAMLlint (1.37.1)
recipes/gpt-oss-120b/model-cache/model-cache.yaml

[error] 13-13: no new line character at the end of file

(new-line-at-end-of-file)

recipes/gpt-oss-120b/model-cache/model-download.yaml

[error] 46-46: no new line character at the end of file

(new-line-at-end-of-file)

🪛 Checkov (3.2.334)
recipes/gpt-oss-120b/trtllm/agg/bench-pub.yaml

[medium] 3-126: Containers should not run with allowPrivilegeEscalation

(CKV_K8S_20)


[medium] 3-126: Minimize the admission of root containers

(CKV_K8S_23)

recipes/gpt-oss-120b/model-cache/model-download.yaml

[medium] 3-46: Containers should not run with allowPrivilegeEscalation

(CKV_K8S_20)


[medium] 3-46: Minimize the admission of root containers

(CKV_K8S_23)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Build and Test - sglang
  • GitHub Check: Build and Test - dynamo
🔇 Additional comments (1)
recipes/gpt-oss-120b/trtllm/agg/bench-pub.yaml (1)

26-27: Verify CONCURRENCIES are appropriate for the 32‑GPU deployment

Both recipes/gpt-oss-120b/trtllm/agg/bench.yaml and recipes/gpt-oss-120b/trtllm/agg/bench-pub.yaml set CONCURRENCIES to very large values (bench: "13000 13500 1400"; bench-pub: "13000 13500") and the scripts iterate over $CONCURRENCIES to run perf — confirm these are intentional and compatible with DEPLOYMENT_GPU_COUNT (32 GPUs), or reduce/document them.

Locations: recipes/gpt-oss-120b/trtllm/agg/bench.yaml (lines ~26, ~112–115) and recipes/gpt-oss-120b/trtllm/agg/bench-pub.yaml (lines ~26, ~114–117).

Comment thread recipes/gpt-oss-120b/trtllm/agg/bench-pub.yaml Outdated
@biswapanda biswapanda changed the title feat: use public aiperf image for gpt-oss 120b feat: update gpt-oss 120b model recipe Sep 22, 2025
@biswapanda biswapanda requested a review from a team as a code owner September 22, 2025 22:24
@biswapanda biswapanda force-pushed the bis/dep-410-oss-gpt-120b-updates branch from a348ea5 to d81752b Compare September 24, 2025 22:58
@biswapanda biswapanda self-assigned this Sep 24, 2025
Comment thread recipes/gpt-oss-120b/model-cache/model-download.yaml Outdated
@copy-pr-bot
Copy link
Copy Markdown

copy-pr-bot Bot commented Sep 25, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

Signed-off-by: Biswa Panda <biswa.panda@gmail.com>
Signed-off-by: Biswa Panda <biswa.panda@gmail.com>
@biswapanda biswapanda force-pushed the bis/dep-410-oss-gpt-120b-updates branch from 92b5e31 to 61e60db Compare September 25, 2025 00:10
@biswapanda
Copy link
Copy Markdown
Contributor Author

/ok to test 7918461

Signed-off-by: Biswa Panda <biswa.panda@gmail.com>
@biswapanda biswapanda merged commit de6fdf0 into main Oct 6, 2025
20 of 23 checks passed
@biswapanda biswapanda deleted the bis/dep-410-oss-gpt-120b-updates branch October 6, 2025 15:55
biswapanda added a commit that referenced this pull request Oct 6, 2025
Signed-off-by: Biswa Panda <biswa.panda@gmail.com>
saturley-hall pushed a commit that referenced this pull request Oct 7, 2025
Signed-off-by: Biswa Panda <biswa.panda@gmail.com>
nv-tusharma pushed a commit that referenced this pull request Oct 20, 2025
Signed-off-by: Biswa Panda <biswa.panda@gmail.com>
yao531441 pushed a commit to yao531441/dynamo that referenced this pull request May 13, 2026
Signed-off-by: Biswa Panda <biswa.panda@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants