feat: update gpt-oss 120b model recipe#3143
Walkthrough
Adds three Kubernetes manifests under recipes/gpt-oss-120b: a PersistentVolumeClaim for a shared Hugging Face cache, a Job to pre-download the openai/gpt-oss-120b model into that cache, and a benchmarking Job that waits for model readiness and runs aiperf against a TRT-LLM aggregate endpoint.
Sequence Diagram(s)
sequenceDiagram
autonumber
participant K8s as Kubernetes
participant PVC as PVC model-cache
participant DL as Job: model-download
participant Bench as Job: oss-gpt120b-bench
participant EP as TRT-LLM Endpoint (agg)
rect rgb(240,248,255)
Note over PVC: Shared HF cache (RWX, 100Gi)
end
K8s->>PVC: Create PersistentVolumeClaim
K8s->>DL: Schedule model-download (mounts PVC)
DL->>DL: Install huggingface_hub, hf_transfer
DL->>HF: huggingface-cli download MODEL_NAME
DL->>PVC: Write model files to /root/.cache/huggingface/hub
DL-->>K8s: Complete
K8s->>Bench: Schedule bench job (mounts PVC)
loop Poll every 5s
Bench->>EP: GET /v1/models
EP-->>Bench: Model list (checks for TARGET_MODEL)
end
Bench->>EP: POST /v1/chat/completions (stream), varying concurrency
Bench->>Artifacts: Write results per concurrency under ROOT_ARTIFACT_DIR
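
The readiness loop in the diagram (poll GET /v1/models every 5 seconds until TARGET_MODEL is listed) could look roughly like the sketch below. This is not the script from bench-pub.yaml: the function names `model_listed` and `wait_for_model`, the `FETCH_CMD` override, and the optional attempt cap are all assumptions added for illustration, and the model check is a plain substring match rather than real JSON parsing.

```shell
# Sketch only; names and defaults here are hypothetical, not taken from the PR.

model_listed() {
  # $1 = JSON body returned by /v1/models, $2 = model id to look for.
  # Fixed-string match on the quoted id, so jq is not required.
  printf '%s' "$1" | grep -Fq -- "\"$2\""
}

wait_for_model() {
  # $1 = endpoint base URL, $2 = model id,
  # $3 = poll interval in seconds (default 5),
  # $4 = max attempts (default 0, meaning poll forever).
  local url="$1" model="$2" interval="${3:-5}" max="${4:-0}" attempt=0 body
  while :; do
    # FETCH_CMD is an assumed hook so the HTTP call can be swapped out in tests.
    body="$(${FETCH_CMD:-curl -fsS} "$url/v1/models" 2>/dev/null || true)"
    if model_listed "$body" "$model"; then
      echo "Model '$model' is now available!"
      return 0
    fi
    attempt=$((attempt + 1))
    if [ "$max" -gt 0 ] && [ "$attempt" -ge "$max" ]; then
      echo "Timed out waiting for '$model'" >&2
      return 1
    fi
    sleep "$interval"
  done
}
```

A call such as `wait_for_model "$ENDPOINT" "$TARGET_MODEL" 5 120` would give up after roughly ten minutes instead of blocking the Job indefinitely; `ENDPOINT` is a placeholder for whatever URL the benchmark targets.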
Estimated code review effort
🎯 3 (Moderate) | ⏱️ ~25 minutes
Pre-merge checks
❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
Actionable comments posted: 1
🧹 Nitpick comments (8)
recipes/gpt-oss-120b/model-cache/model-cache.yaml (1)
1-13: LGTM! Add missing newline at end of file.
The PVC configuration is appropriate for shared model caching with NFS storage. The 100Gi storage size and ReadWriteMany access mode are suitable for sharing cached models across multiple jobs.
Apply this diff to fix the missing newline:
  storageClassName: oci-nfs
+
recipes/gpt-oss-120b/model-cache/model-download.yaml (3)
25-28: Update obsolete comment.
The comment references "llama-3-70b model" but the actual model being downloaded is "openai/gpt-oss-120b".
Apply this diff to correct the comment:
- # NOTE: This is the model name for the llama-3-70b model
- # Update this to model name for the model you are downloading
+ # NOTE: This is the model name for the openai/gpt-oss-120b model
29-33: Remove redundant HF_TOKEN configuration.
The HF_TOKEN is already sourced from hf-token-secret via envFrom (lines 21-23), making the explicit env var definition redundant.
Apply this diff to remove the redundant configuration:
- - name: HF_TOKEN
-   valueFrom:
-     secretKeyRef:
-       name: hf-token-secret
-       key: HF_TOKEN
1-46: Add missing newline at end of file.
Apply this diff to fix the missing newline:
  claimName: model-cache
+
recipes/gpt-oss-120b/trtllm/agg/bench-pub.yaml (4)
46-46: Address the TODO comment.
The TODO indicates this setup should be baked into the aiperf image. This suggests the current approach is temporary and adds overhead to job startup.
Do you want me to help create an issue to track building a custom aiperf image with these dependencies pre-installed?
60-61: Remove duplicate echo statement.
Line 60 duplicates the message from line 59.
Apply this diff to remove the duplicate:
  echo "✅ Model '$TARGET_MODEL' is now available!"
- echo "Model '$TARGET_MODEL' is now available!"
103-112: Fix JSON key naming inconsistency.
The JSON uses inconsistent key naming: "model endpoint" contains a space while other keys use underscores or are single words.
Apply this diff for consistent naming:
- "model endpoint": "$TARGET_MODEL"
+ "model_endpoint": "$TARGET_MODEL"
8-8: Consider increasing backoffLimit for robustness.
The benchmarking job has backoffLimit: 1, meaning it will fail after a single retry. Given the complexity of the benchmarking setup and potential for transient failures, consider increasing this value.
Apply this diff to allow more retries:
- backoffLimit: 1
+ backoffLimit: 3
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
recipes/gpt-oss-120b/model-cache/model-cache.yaml (1 hunks)
recipes/gpt-oss-120b/model-cache/model-download.yaml (1 hunks)
recipes/gpt-oss-120b/trtllm/agg/bench-pub.yaml (1 hunks)
🧰 Additional context used
🪛 YAMLlint (1.37.1)
recipes/gpt-oss-120b/model-cache/model-cache.yaml
[error] 13-13: no new line character at the end of file
(new-line-at-end-of-file)
recipes/gpt-oss-120b/model-cache/model-download.yaml
[error] 46-46: no new line character at the end of file
(new-line-at-end-of-file)
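
For the two new-line-at-end-of-file errors above, a small local check before pushing might look like the following sketch. The helper names `ends_with_newline` and `fix_missing_newline` are hypothetical and not part of the PR or of YAMLlint; the check relies on shell command substitution stripping a trailing newline, so an empty result from `tail -c 1` means the last byte is a newline.

```shell
# Sketch only; helper names are made up for illustration.

ends_with_newline() {
  # Empty files pass; otherwise the final byte must be a newline.
  [ ! -s "$1" ] || [ -z "$(tail -c 1 "$1")" ]
}

fix_missing_newline() {
  # Append exactly one newline when the file does not already end with one.
  ends_with_newline "$1" || printf '\n' >> "$1"
}
```

Running `fix_missing_newline recipes/gpt-oss-120b/model-cache/model-cache.yaml` (and the same for model-download.yaml) should be enough to clear both findings, assuming no other YAMLlint rules are triggered.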
🪛 Checkov (3.2.334)
recipes/gpt-oss-120b/trtllm/agg/bench-pub.yaml
[medium] 3-126: Containers should not run with allowPrivilegeEscalation
(CKV_K8S_20)
[medium] 3-126: Minimize the admission of root containers
(CKV_K8S_23)
recipes/gpt-oss-120b/model-cache/model-download.yaml
[medium] 3-46: Containers should not run with allowPrivilegeEscalation
(CKV_K8S_20)
[medium] 3-46: Minimize the admission of root containers
(CKV_K8S_23)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Build and Test - sglang
- GitHub Check: Build and Test - dynamo
🔇 Additional comments (1)
recipes/gpt-oss-120b/trtllm/agg/bench-pub.yaml (1)
26-27: Verify CONCURRENCIES are appropriate for the 32-GPU deployment
Both recipes/gpt-oss-120b/trtllm/agg/bench.yaml and recipes/gpt-oss-120b/trtllm/agg/bench-pub.yaml set CONCURRENCIES to very large values (bench: "13000 13500 1400"; bench-pub: "13000 13500") and the scripts iterate over $CONCURRENCIES to run perf. Confirm these are intentional and compatible with DEPLOYMENT_GPU_COUNT (32 GPUs), or reduce/document them.
Locations: recipes/gpt-oss-120b/trtllm/agg/bench.yaml (lines ~26, ~112–115) and recipes/gpt-oss-120b/trtllm/agg/bench-pub.yaml (lines ~26, ~114–117).
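
One way to make such a list easier to audit is a pre-flight check that runs before the benchmark loop. The sketch below is an assumption, not code from either bench file: `validate_concurrencies` is a hypothetical helper that rejects non-numeric entries and warns when a value does not increase over the previous one. A warning only flags an entry for a second look and does not decide whether the value was intended.

```shell
# Sketch only; this helper does not exist in the PR.

validate_concurrencies() {
  # $1 = space-separated list, e.g. "$CONCURRENCIES".
  # Echoes each valid value on its own line; returns 1 on the first bad entry.
  local prev=0 c
  for c in $1; do
    case "$c" in
      ''|*[!0-9]*) echo "invalid concurrency: '$c'" >&2; return 1 ;;
    esac
    if [ "$c" -le "$prev" ]; then
      echo "warning: $c is not greater than previous value $prev" >&2
    fi
    prev="$c"
    echo "$c"
  done
}
```

The benchmark loop could then iterate over `$(validate_concurrencies "$CONCURRENCIES")` instead of the raw variable, so a malformed list fails fast rather than partway through a run.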
a348ea5 to d81752b
d81752b to d04e971
d04e971 to e39ba15
e39ba15 to 92b5e31
Signed-off-by: Biswa Panda <biswa.panda@gmail.com>
92b5e31 to 61e60db
/ok to test 7918461
Signed-off-by: Biswa Panda <biswa.panda@gmail.com>
Overview:
closes: DEP-410