fix: update model recipe for llama-3 70b to match with common recipe template #3637#3656
Conversation
WalkthroughThe PR updates container image registries/tags to nvcr.io/nvidia/ai-dynamo:0.6.0, replaces GenAI-Perf with AIPerf across benchmarking code/docs, adds git to several Dockerfiles and new Python deps, introduces Kubernetes pre-deployment check scripts/docs and NIXL build/deploy tooling, adjusts several recipe/perf flows, and removes some Rust test scaffolding. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
actor U as User
participant S as pre-deployment-check.sh
participant K as Kubernetes API
participant G as GPU Operator
U->>S: Run script
S->>K: kubectl version/check (context)
S-->>U: Connectivity PASS/FAIL
S->>K: Get default StorageClass
S-->>U: Default SC PASS/FAIL
S->>K: Get nodes (check GPUs)
S-->>U: GPU nodes PASS/FAIL
S->>G: Check NVIDIA GPU Operator status
S-->>U: Operator PASS/FAIL
S-->>U: Summary + exit code
sequenceDiagram
autonumber
actor U as User
participant B as build_and_deploy.sh
participant GH as Source ZIP Host
participant D as Docker
participant R as Registry
participant K as Kubernetes API
U->>B: Start (select steps/arch/registry)
B->>GH: Download NIXL source (wget)
B->>D: Build image (buildx)
D->>R: Push image (optional)
B->>B: Update deployment YAML (sed image)
B->>K: kubectl apply -f deployment.yaml
K-->>B: Status
B-->>U: Results/logging guidance
sequenceDiagram
autonumber
participant J as K8s Job (perf)
participant S as Model Service
participant P as AIPerf CLI
participant A as Artifact Store
J->>J: wait_for_model_ready()
J->>S: Probe /healthz
S-->>J: 200 OK when ready
J->>P: aiperf profile --model --endpoint --concurrency
P->>S: Load test traffic
S-->>P: Responses
P-->>A: Write profile_export_aiperf.json
J-->>A: Save input_config.json, logs
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~30 minutes Possibly related PRs
Poem
Pre-merge checks❌ Failed checks (3 warnings)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 7
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
recipes/llama-3-70b/vllm/disagg-single-node/deploy.yaml (1)
45-92: Hard-codedMODEL_PATHprevents fresh deploymentsPointing
MODEL_PATHat a specific HuggingFace cache snapshot requires that exact directory already exist on the PVC. On a fresh cluster (PVC empty) the workers start with/bin/sh -c python3 … --model $MODEL_PATH, the path is missing, and vLLM exits immediately. The previous value (RedHatAI/Llama-3.3-70B-Instruct-FP8-dynamic) allowed vLLM/HuggingFace to materialize the model on first run. Please revert to the repo ID (or add an explicit pre-download job and keep the repo ID here). Remember to fix both prefill and decode sections.- - name: MODEL_PATH - value: "/root/.cache/huggingface/hub/models--RedHatAI--Llama-3.3-70B-Instruct-FP8-dynamic/snapshots/ddb4128556dfcff99e0c41aee159ea6c3e655dcd" + - name: MODEL_PATH + value: "RedHatAI/Llama-3.3-70B-Instruct-FP8-dynamic" @@ - - name: MODEL_PATH - value: "/root/.cache/huggingface/hub/models--RedHatAI--Llama-3.3-70B-Instruct-FP8-dynamic/snapshots/ddb4128556dfcff99e0c41aee159ea6c3e655dcd" + - name: MODEL_PATH + value: "RedHatAI/Llama-3.3-70B-Instruct-FP8-dynamic"
♻️ Duplicate comments (9)
recipes/llama-3-70b/vllm/disagg-multi-node/deploy.yaml (1)
38-39: Duplicate: Hardcoded snapshot hash in both workers.Both workers use the same hardcoded MODEL_PATH with snapshot hash
ddb4128556dfcff99e0c41aee159ea6c3e655dcd. Consider the maintainability concern noted in the agg deployment - centralizing this path value would make updates easier.Also applies to: 67-68
recipes/llama-3-70b/vllm/disagg-multi-node/perf.yaml (4)
23-102: Consider extracting duplicated shell script to a ConfigMap.Same refactoring suggestion as in disagg-single-node/perf.yaml. This file is nearly identical except for environment variable values (DEPLOYMENT_GPU_COUNT=16, DEPLOYMENT_MODE=disagg-mn, ENDPOINT=llama3-70b-disagg-mn-frontend:8000).
8-8: Consider the impact of reducing backoffLimit.Same issue as in disagg-single-node/perf.yaml. The backoffLimit reduction from 3 to 1 reduces resilience to transient failures.
52-52: Hard-coded tokenizer path is brittle and not portable.Same issue as in disagg-single-node/perf.yaml. The hard-coded tokenizer path with snapshot hash is not portable across environments.
86-98: Undefined variable $max_threads in input_config.json.Same issue as in disagg-single-node/perf.yaml. The variable
$max_threadsis used but not defined in the outer scope.recipes/llama-3-70b/vllm/agg/perf.yaml (4)
23-102: Consider extracting duplicated shell script to a ConfigMap.Same refactoring suggestion as in the other perf.yaml files. This file is nearly identical except for environment variable values (DEPLOYMENT_GPU_COUNT=4, DEPLOYMENT_MODE=agg, ENDPOINT=llama3-70b-agg-frontend:8000).
8-8: Consider the impact of reducing backoffLimit.Same issue as in the other perf.yaml files. The backoffLimit reduction from 3 to 1 reduces resilience to transient failures.
52-52: Hard-coded tokenizer path is brittle and not portable.Same issue as in the other perf.yaml files. The hard-coded tokenizer path with snapshot hash is not portable across environments.
86-98: Undefined variable $max_threads in input_config.json.Same issue as in the other perf.yaml files. The variable
$max_threadsis used but not defined in the outer scope.
🧹 Nitpick comments (10)
container/Dockerfile.sglang-wideep (1)
110-133: Align the section comment with AIPerfWe now install
aiperf, but the header still reads “GenAI Perf,” which can mislead anyone scanning the Dockerfile. Please rename the comment to reflect the new tool.-# GenAI Perf +# AIPerfrecipes/llama-3-70b/vllm/agg/deploy.yaml (2)
38-39: Consider making the snapshot reference more maintainable.The MODEL_PATH uses a hardcoded snapshot hash (
ddb4128556dfcff99e0c41aee159ea6c3e655dcd). This creates tight coupling to a specific model version and may require manual updates when the model is refreshed.Consider whether this should:
- Use a symbolic link or alias that can be updated centrally
- Be parameterized via ConfigMap for easier management
- Include a comment explaining why this specific snapshot is required
35-41: Parameterize hardcoded snapshot and confirm shell-var expansion
- Shell variable substitution in
argsis consistent across all VLLM deployment manifests and will expand correctly under/bin/sh -c.- The fixed snapshot hash in
MODEL_PATHis repeated in every recipe; consider parameterizing this or using a version tag to avoid brittle, manual updates.recipes/llama-3-70b/vllm/disagg-single-node/perf.yaml (1)
23-102: Consider extracting duplicated shell script to a ConfigMap.The shell script containing utility functions (
wait_for_model_ready,run_perf) and execution flow is nearly identical across three perf.yaml files (disagg-single-node, disagg-multi-node, and agg). Only the environment variable values differ between them.Consider extracting the common shell script to a ConfigMap and mounting it into the containers. This would:
- Eliminate code duplication
- Make updates easier to maintain
- Reduce the risk of inconsistencies between files
- Follow the DRY principle
Example structure:
# Create a ConfigMap with the script apiVersion: v1 kind: ConfigMap metadata: name: perf-script data: run-perf.sh: | #!/bin/bash # ... entire script here ... # Mount and execute in the Job volumeMounts: - name: perf-script mountPath: /scripts volumes: - name: perf-script configMap: name: perf-script defaultMode: 0755 command: ["/scripts/run-perf.sh"]deploy/cloud/pre-deployment/README.md (1)
1-172: Excellent documentation for pre-deployment checks!This README provides comprehensive documentation for the Dynamo pre-deployment check script, including:
- Clear usage instructions
- Detailed explanation of each check (kubectl connectivity, default StorageClass, GPU resources)
- Sample outputs for both passing and failing scenarios
- Helpful troubleshooting guidance
- External references to Kubernetes documentation
The documentation is well-organized and will be valuable for users deploying Dynamo.
Minor markdown improvements: Consider adding language specifiers to the fenced code blocks at lines 54-94, 97-124, and 150-153 to improve syntax highlighting and comply with markdown linting best practices. For example:
-``` +```text ======================================== Dynamo Pre-Deployment Check Script ========================================Note: The LanguageTool grammar warnings appear to be false positives related to markdown list formatting and can be safely ignored.
deploy/cloud/pre-deployment/nixl/nixlbench-deployment.yaml (1)
17-35: Consider security hardening for production deployments.Static analysis identified potential security improvements:
- Running without
allowPrivilegeEscalation: falsecould allow privilege escalation- Running as root user increases attack surface
For benchmark workloads, these may be acceptable, but for production consider:
spec: + securityContext: + runAsNonRoot: true + runAsUser: 1000 containers: - name: nixl-benchmark image: "my-registry/nixlbench:version-arch" + securityContext: + allowPrivilegeEscalation: false + capabilities: + drop: + - ALLdeploy/cloud/pre-deployment/pre-deployment-check.sh (1)
205-236: Consider separate declaration and assignment.Shellcheck warns that declaring and assigning
statusin the same statement (line 214) can mask the return value ofget_check_result. While not a bug here sinceget_check_resultusesecho, separating declaration and assignment is a safer pattern.Apply this diff if you want to follow the more defensive pattern:
IFS='|' read -ra CHECKS <<< "$CHECK_ORDER" for check_name in "${CHECKS[@]}"; do - local status=$(get_check_result "$check_name") + local status + status=$(get_check_result "$check_name") if [[ "$status" == "PASS" ]]; thenbenchmarks/llm/plot_pareto.py (1)
81-130: Consider usingstrict=Truein zip for safety.The function signature change from
genai_perf_profile_export_json_pathstoaiperf_profile_export_json_pathsis correct. However, thezip()call on lines 85-87 should includestrict=Trueto ensure both lists have the same length and catch potential data inconsistencies early.Apply this diff:
- for aiperf_profile_export_json_path, deployment_config_json_path in zip( - aiperf_profile_export_json_paths, deployment_config_json_paths - ): + for aiperf_profile_export_json_path, deployment_config_json_path in zip( + aiperf_profile_export_json_paths, deployment_config_json_paths, strict=True + ):Based on static analysis hints.
deploy/cloud/pre-deployment/nixl/build_and_deploy.sh (1)
188-188: Remove or explain the debug pause.Line 188 contains
read -p "Press Enter to continue"which appears to be a debug statement. This interrupts the automated flow and should either be removed or documented as an intentional manual checkpoint.If this is intentional for user review of the download, add a comment explaining why:
+ # Allow user to inspect downloaded files before building read -p "Press Enter to continue"Otherwise, remove it:
- read -p "Press Enter to continue"benchmarks/profiler/utils/aiperf.py (1)
67-99: Consider iterable unpacking for list construction.Both
get_prefill_aiperf_cmdandget_decode_aiperf_cmdare correctly renamed and functional. However, the list concatenation pattern could be slightly more efficient using iterable unpacking.For example, in
get_prefill_aiperf_cmd:- return _get_common_aiperf_cmd( + return [ + *_get_common_aiperf_cmd( artifact_dir, seed, model, tokenizer, base_url, - ) + [ + ), "--synthetic-input-tokens-mean", str(isl), # ... rest of arguments ]This is a minor style preference and the current implementation is perfectly functional. Based on static analysis hints.
Also applies to: 102-137
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (84)
Earthfile(3 hunks)README.md(1 hunks)benchmarks/README.md(1 hunks)benchmarks/incluster/benchmark_job.yaml(1 hunks)benchmarks/llm/perf.sh(2 hunks)benchmarks/llm/plot_pareto.py(6 hunks)benchmarks/nixl/README.md(0 hunks)benchmarks/profiler/profile_endpoint.py(1 hunks)benchmarks/profiler/profile_sla.py(3 hunks)benchmarks/profiler/utils/aiperf.py(8 hunks)benchmarks/profiler/utils/config.py(1 hunks)benchmarks/profiler/utils/profile_cache.py(5 hunks)benchmarks/profiler/utils/profile_decode.py(2 hunks)benchmarks/profiler/utils/profile_prefill.py(2 hunks)benchmarks/utils/genai.py(4 hunks)benchmarks/utils/plot.py(1 hunks)components/backends/sglang/deploy/README.md(2 hunks)components/backends/sglang/deploy/agg.yaml(2 hunks)components/backends/sglang/deploy/agg_logging.yaml(2 hunks)components/backends/sglang/deploy/agg_router.yaml(2 hunks)components/backends/sglang/deploy/disagg-multinode.yaml(3 hunks)components/backends/sglang/deploy/disagg.yaml(3 hunks)components/backends/sglang/deploy/disagg_planner.yaml(4 hunks)components/backends/sglang/launch/disagg_dp_attn.sh(2 hunks)components/backends/trtllm/deploy/README.md(3 hunks)components/backends/trtllm/deploy/agg-with-config.yaml(2 hunks)components/backends/trtllm/deploy/agg.yaml(2 hunks)components/backends/trtllm/deploy/agg_router.yaml(2 hunks)components/backends/trtllm/deploy/disagg-multinode.yaml(3 hunks)components/backends/trtllm/deploy/disagg.yaml(3 hunks)components/backends/trtllm/deploy/disagg_planner.yaml(4 hunks)components/backends/trtllm/deploy/disagg_router.yaml(3 hunks)components/backends/vllm/deploy/README.md(2 hunks)components/backends/vllm/deploy/agg.yaml(2 hunks)components/backends/vllm/deploy/agg_kvbm.yaml(2 hunks)components/backends/vllm/deploy/agg_router.yaml(2 hunks)components/backends/vllm/deploy/disagg-multinode.yaml(3 hunks)components/backends/vllm/deploy/disagg.yaml(3 hunks)components/backends/vllm/deploy/disagg_kvbm.yaml(3 hunks)components/backends/vllm/deploy/disagg_kvbm_2p2d.yaml(3 hunks)components/backends/vllm/deploy/disagg_kvbm_tp2.yaml(3 hunks)components/backends/vllm/deploy/disagg_planner.yaml(4 hunks)components/backends/vllm/deploy/disagg_router.yaml(3 hunks)container/Dockerfile.sglang(2 hunks)container/Dockerfile.sglang-wideep(2 hunks)container/Dockerfile.trtllm(1 hunks)container/Dockerfile.vllm(1 hunks)container/deps/requirements.txt(2 hunks)deploy/cloud/operator/Earthfile(1 hunks)deploy/cloud/pre-deployment/README.md(1 hunks)deploy/cloud/pre-deployment/nixl/README.md(1 hunks)deploy/cloud/pre-deployment/nixl/build_and_deploy.sh(1 hunks)deploy/cloud/pre-deployment/nixl/nixlbench-deployment.yaml(1 hunks)deploy/cloud/pre-deployment/pre-deployment-check.sh(1 hunks)docs/_includes/install.rst(2 hunks)docs/backends/sglang/README.md(1 hunks)docs/backends/trtllm/gpt-oss.md(1 hunks)docs/benchmarks/benchmarking.md(7 hunks)docs/kubernetes/README.md(1 hunks)docs/kubernetes/create_deployment.md(1 hunks)docs/kubernetes/sla_planner_quickstart.md(1 hunks)examples/basics/kubernetes/Distributed_Inference/agg_router.yaml(2 hunks)examples/custom_backend/hello_world/deploy/hello_world.yaml(2 hunks)examples/deployments/ECS/task_definition_frontend.json(1 hunks)examples/deployments/ECS/task_definition_prefillworker.json(1 hunks)examples/multimodal/deploy/agg_llava.yaml(4 hunks)examples/multimodal/deploy/agg_qwen.yaml(4 hunks)lib/parsers/Cargo.toml(0 hunks)lib/parsers/tests/mod.rs(0 hunks)recipes/gpt-oss-120b/trtllm/agg/deploy.yaml(2 hunks)recipes/llama-3-70b/model-cache/model-download.yaml(1 hunks)recipes/llama-3-70b/vllm/agg/deploy.yaml(2 hunks)recipes/llama-3-70b/vllm/agg/perf.yaml(2 hunks)recipes/llama-3-70b/vllm/disagg-multi-node/deploy.yaml(3 hunks)recipes/llama-3-70b/vllm/disagg-multi-node/perf.yaml(2 hunks)recipes/llama-3-70b/vllm/disagg-single-node/deploy.yaml(3 hunks)recipes/llama-3-70b/vllm/disagg-single-node/perf.yaml(2 hunks)tests/planner/perf_test_configs/agg_8b.yaml(2 hunks)tests/planner/perf_test_configs/disagg_8b_2p2d.yaml(3 hunks)tests/planner/perf_test_configs/disagg_8b_3p1d.yaml(3 hunks)tests/planner/perf_test_configs/disagg_8b_planner.yaml(4 hunks)tests/planner/perf_test_configs/disagg_8b_tp2.yaml(3 hunks)tests/planner/perf_test_configs/image_cache_daemonset.yaml(1 hunks)tests/planner/scaling/disagg_planner.yaml(4 hunks)
💤 Files with no reviewable changes (3)
- benchmarks/nixl/README.md
- lib/parsers/Cargo.toml
- lib/parsers/tests/mod.rs
🧰 Additional context used
🪛 Checkov (3.2.334)
deploy/cloud/pre-deployment/nixl/nixlbench-deployment.yaml
[medium] 3-35: Containers should not run with allowPrivilegeEscalation
(CKV_K8S_20)
[medium] 3-35: Minimize the admission of root containers
(CKV_K8S_23)
🪛 LanguageTool
README.md
[grammar] ~181-~181: There might be a mistake here.
Context: ...ggregated vs. vanilla vLLM) using AIPerf - **[Pre-Deployment Profiling](docs/benchmark...
(QB_NEW_EN)
docs/benchmarks/benchmarking.md
[grammar] ~187-~187: There might be a mistake here.
Context: ...marking The Python benchmarking module: 1. Connects to your port-forwarded endpoi...
(QB_NEW_EN)
[grammar] ~188-~188: There might be a mistake here.
Context: ...nnects** to your port-forwarded endpoint 2. Benchmarks using AIPerf at various con...
(QB_NEW_EN)
[grammar] ~189-~189: There might be a mistake here.
Context: ...els (default: 1, 2, 5, 10, 50, 100, 250) 3. Measures key metrics: latency, through...
(QB_NEW_EN)
deploy/cloud/pre-deployment/README.md
[grammar] ~38-~38: There might be a mistake here.
Context: ...ed summary: ### 1. kubectl Connectivity - Verifies that kubectl is installed and...
(QB_NEW_EN)
[grammar] ~41-~41: There might be a mistake here.
Context: ...tes cluster ### 2. Default StorageClass - Verifies that a default StorageClass is ...
(QB_NEW_EN)
[grammar] ~43-~43: There might be a mistake here.
Context: ...r - If no default StorageClass is found: - Lists all available StorageClasses in th...
(QB_NEW_EN)
[grammar] ~48-~48: There might be a mistake here.
Context: ...d guidance ### 3. Cluster GPU Resources - Checks for GPU-enabled nodes in the clus...
(QB_NEW_EN)
[grammar] ~130-~130: There might be a mistake here.
Context: ... Name | Description | Pass/Fail Status | |------------|-------------|------------...
(QB_NEW_EN)
[grammar] ~131-~131: There might be a mistake here.
Context: ...------|-------------|------------------| | kubectl Connectivity | Verifies ku...
(QB_NEW_EN)
[grammar] ~132-~132: There might be a mistake here.
Context: ...d cluster access | ✅ PASSED / ❌ FAILED | | Default StorageClass | Checks for ...
(QB_NEW_EN)
[grammar] ~133-~133: There might be a mistake here.
Context: ...Class annotation | ✅ PASSED / ❌ FAILED | | Cluster Resources | Validates GPU ...
(QB_NEW_EN)
[grammar] ~148-~148: There might be a mistake here.
Context: ...ing ### Multiple Default StorageClasses If you have multiple StorageClasses mark...
(QB_NEW_EN)
[grammar] ~160-~160: There might be a mistake here.
Context: ...:"false"}}}' ``` ### No GPU Nodes Found If no GPU nodes are found, ensure your c...
(QB_NEW_EN)
[grammar] ~163-~163: There might be a mistake here.
Context: ... label. ### No StorageClasses Available If no StorageClasses are available in yo...
(QB_NEW_EN)
[grammar] ~164-~164: There might be a mistake here.
Context: ...ailable in your cluster, you'll need to: 1. Install a storage provisioner (e.g., for...
(QB_NEW_EN)
deploy/cloud/pre-deployment/nixl/README.md
[grammar] ~16-~16: There might be a mistake here.
Context: ... Prerequisites ### Cluster Requirements Before deploying NIXL benchmark, ensure ...
(QB_NEW_EN)
[grammar] ~17-~17: There might be a mistake here.
Context: ...isites ### Cluster Requirements Before deploying NIXL benchmark, ensure your cluster mee...
(QB_NEW_EN)
[grammar] ~29-~29: There might be a mistake here.
Context: ...d status ### NIXL-Specific Requirements In addition to the cluster requirements ...
(QB_NEW_EN)
[grammar] ~30-~30: There might be a mistake here.
Context: ...dition to the cluster requirements above, NIXL benchmark requires: - Docker i...
(QB_NEW_EN)
[grammar] ~30-~30: There might be a mistake here.
Context: ...irements above, NIXL benchmark requires: - Docker installed and configured on you...
(QB_NEW_EN)
[grammar] ~31-~31: There might be a mistake here.
Context: ...your local machine (for building images) - Docker registry access to push the bui...
(QB_NEW_EN)
[grammar] ~32-~32: There might be a mistake here.
Context: ...ess** to push the built nixlbench images - ETCD service deployed and accessible a...
(QB_NEW_EN)
[grammar] ~36-~36: There might be a mistake here.
Context: ...NIXL source code ### Verification Steps 1. Run pre-deployment check (recommended)...
(QB_NEW_EN)
[grammar] ~63-~63: There might be a mistake here.
Context: ...vides a flexible workflow where you can: 1. Select architecture: Choose between x8...
(QB_NEW_EN)
[grammar] ~65-~65: There might be a mistake here.
Context: ...to execute**: Select any combination of: - Build nixlbench Docker image - Update...
(QB_NEW_EN)
[grammar] ~75-~75: There might be a mistake here.
Context: ...ipt Features ### Architecture Selection The script supports two architectures: -...
(QB_NEW_EN)
[grammar] ~76-~76: There might be a mistake here.
Context: ...n The script supports two architectures: - Option 1: x86_64 (Intel/AMD 64-bit) - ...
(QB_NEW_EN)
[grammar] ~77-~77: There might be a mistake here.
Context: ... Option 1: x86_64 (Intel/AMD 64-bit) - Option 2: aarch64 (ARM64) You can sel...
(QB_NEW_EN)
[grammar] ~84-~84: There might be a mistake here.
Context: ...aarch64 architecture ### Step Selection Choose which steps to execute by enterin...
(QB_NEW_EN)
[grammar] ~87-~87: There might be a mistake here.
Context: ...ma-separated numbers: - All steps: 1,2,3 - Build and update only: 1,2 (skips Ku...
(QB_NEW_EN)
[grammar] ~88-~88: There might be a mistake here.
Context: ...y**: 1,2 (skips Kubernetes deployment) - Deploy only: 3 (useful if image is a...
(QB_NEW_EN)
[grammar] ~89-~89: There might be a mistake here.
Context: ...lready built and deployment file exists) - Build only: 1 (useful for just creat...
(QB_NEW_EN)
[grammar] ~90-~90: There might be a mistake here.
Context: ...eful for just creating the Docker image) - Update deployment only: 2 (useful fo...
(QB_NEW_EN)
[grammar] ~93-~93: There might be a mistake here.
Context: ...y/version) ### Smart Registry Prompting The script only prompts for Docker regis...
(QB_NEW_EN)
[grammar] ~94-~94: There might be a mistake here.
Context: ...Docker registry information when needed: - Steps 1 or 2: Registry required for bu...
(QB_NEW_EN)
[grammar] ~95-~95: There might be a mistake here.
Context: ...or building image or updating deployment - Step 3 only: No registry prompt (uses ...
(QB_NEW_EN)
[grammar] ~102-~102: There might be a mistake here.
Context: ...### Step 1: Build nixlbench Docker Image - Downloads NIXL source code (version 0.6....
(QB_NEW_EN)
[grammar] ~109-~109: There might be a mistake here.
Context: ... ### Step 2: Update Deployment YAML File - Copies base deployment template (`nixlbe...
(QB_NEW_EN)
[grammar] ~110-~110: There might be a mistake here.
Context: ...t template (nixlbench-deployment.yaml) - Creates architecture-specific deployment...
(QB_NEW_EN)
[grammar] ~111-~111: There might be a mistake here.
Context: ...ile (nixlbench-deployment-{arch}.yaml) - Updates image reference with your regist...
(QB_NEW_EN)
[grammar] ~112-~112: There might be a mistake here.
Context: ...ence with your registry and architecture - Preserves all other deployment configura...
(QB_NEW_EN)
[grammar] ~115-~115: There might be a mistake here.
Context: ...ations ### Step 3: Deploy to Kubernetes - Validates deployment file exists - Appli...
(QB_NEW_EN)
[grammar] ~124-~124: There might be a mistake here.
Context: ...t Configuration The deployment creates: - 2 replicas of the nixlbench pod - **Re...
(QB_NEW_EN)
[grammar] ~125-~125: There might be a mistake here.
Context: ...s: - 2 replicas of the nixlbench pod - Resource requests/limits: - CPU: 10 ...
(QB_NEW_EN)
[grammar] ~126-~126: There might be a mistake here.
Context: ...ench pod - Resource requests/limits: - CPU: 10 cores - Memory: 5Gi - GPU: 1...
(QB_NEW_EN)
[grammar] ~127-~127: There might be a mistake here.
Context: ...rce requests/limits**: - CPU: 10 cores - Memory: 5Gi - GPU: 1 NVIDIA GPU per po...
(QB_NEW_EN)
[grammar] ~128-~128: Ensure spelling is correct
Context: ...limits**: - CPU: 10 cores - Memory: 5Gi - GPU: 1 NVIDIA GPU per pod - **Environmen...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[grammar] ~129-~129: There might be a mistake here.
Context: ...emory: 5Gi - GPU: 1 NVIDIA GPU per pod - Environment variables: - `ETCD_ENDPO...
(QB_NEW_EN)
[grammar] ~130-~130: There might be a mistake here.
Context: ...GPU per pod - Environment variables: - ETCD_ENDPOINTS: Points to etcd:2379 - Command: R...
(QB_NEW_EN)
[grammar] ~131-~131: There might be a mistake here.
Context: ...bles**: - ETCD_ENDPOINTS: Points to etcd:2379 - Command: Runs nixlbench with VRAM segm...
(QB_NEW_EN)
[grammar] ~181-~181: There might be a mistake here.
Context: ...of the base template with your specific: - Docker registry - Image tag - Architectu...
(QB_NEW_EN)
[grammar] ~182-~182: There might be a mistake here.
Context: ...te with your specific: - Docker registry - Image tag - Architecture --- ## Monito...
(QB_NEW_EN)
[grammar] ~183-~183: There might be a mistake here.
Context: ... specific: - Docker registry - Image tag - Architecture --- ## Monitoring Your De...
(QB_NEW_EN)
[grammar] ~217-~217: There might be a mistake here.
Context: ...roubleshooting ### Cluster-Level Issues For cluster-related problems, first run ...
(QB_NEW_EN)
[grammar] ~224-~224: There might be a mistake here.
Context: ...t-check.sh ``` This will help diagnose: - kubectl connectivity problems - Missing ...
(QB_NEW_EN)
[grammar] ~232-~232: There might be a mistake here.
Context: ...Specific Issues 1. ETCD Connection: - Ensure etcd service is running: `kubectl...
(QB_NEW_EN)
[grammar] ~237-~237: There might be a mistake here.
Context: ...ect namespace 2. Image Pull Issues: - Verify registry credentials are configur...
(QB_NEW_EN)
[grammar] ~239-~239: There might be a mistake here.
Context: ...are configured - Check image exists: docker pull {registry}/nixlbench:0.6.0-{arch} - Ensure image was pushed successfully aft...
(QB_NEW_EN)
[grammar] ~242-~242: There might be a mistake here.
Context: ...ully after build 3. Build Failures: - Ensure Docker daemon is running - Che...
(QB_NEW_EN)
[grammar] ~248-~248: There might be a mistake here.
Context: ...nzip` 4. Deployment File Not Found: - Run step 2 to create deployment file bef...
(QB_NEW_EN)
[grammar] ~249-~249: There might be a mistake here.
Context: ...nt File Not Found**: - Run step 2 to create deployment file before step 3 - Chec...
(QB_NEW_EN)
[grammar] ~250-~250: There might be a mistake here.
Context: ...fore step 3 - Check file permissions in script directory - Verify script dir...
(QB_NEW_EN)
[grammar] ~286-~286: There might be a mistake here.
Context: ...cript Reference ### build_and_deploy.sh Interactive script that provides flexibl...
(QB_NEW_EN)
[grammar] ~287-~287: There might be a mistake here.
Context: ...d_and_deploy.sh Interactive script that provides flexible build and deployment workflow:...
(QB_NEW_EN)
[grammar] ~287-~287: There might be a mistake here.
Context: ... flexible build and deployment workflow: - Architecture selection: x86_64 or aarc...
(QB_NEW_EN)
[grammar] ~288-~288: There might be a mistake here.
Context: ...hitecture selection**: x86_64 or aarch64 - Step selection: Choose any combination...
(QB_NEW_EN)
[grammar] ~292-~292: There might be a mistake here.
Context: ...deploying ### nixlbench-deployment.yaml Base Kubernetes deployment template that...
(QB_NEW_EN)
[grammar] ~293-~293: There might be a mistake here.
Context: ...late that gets customized by the script: - Template image: `my-registry/nixlbench...
(QB_NEW_EN)
[grammar] ~294-~294: There might be a mistake here.
Context: ...ed by the script: - Template image: my-registry/nixlbench:version-arch - Resource allocation: 10 CPU, 5Gi memor...
(QB_NEW_EN)
[grammar] ~295-~295: There might be a mistake here.
Context: ...ion**: 10 CPU, 5Gi memory, 1 GPU per pod - ETCD integration: Pre-configured envir...
(QB_NEW_EN)
[grammar] ~296-~296: There might be a mistake here.
Context: ...**: Pre-configured environment variables - Benchmark command: Runs with VRAM segm...
(QB_NEW_EN)
🪛 markdownlint-cli2 (0.18.1)
deploy/cloud/pre-deployment/README.md
54-54: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
97-97: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
150-150: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🪛 Ruff (0.14.0)
benchmarks/utils/genai.py
83-83: subprocess call: check for execution of untrusted input
(S603)
benchmarks/utils/plot.py
47-47: Do not catch blind exception: Exception
(BLE001)
benchmarks/profiler/utils/aiperf.py
76-99: Consider iterable unpacking instead of concatenation
Replace with iterable unpacking
(RUF005)
112-137: Consider iterable unpacking instead of concatenation
Replace with iterable unpacking
(RUF005)
147-149: Avoid specifying long messages outside the exception class
(TRY003)
171-171: subprocess call: check for execution of untrusted input
(S603)
214-214: subprocess call: check for execution of untrusted input
(S603)
232-232: subprocess call: check for execution of untrusted input
(S603)
benchmarks/llm/plot_pareto.py
36-36: Create your own exception
(TRY002)
36-36: Avoid specifying long messages outside the exception class
(TRY003)
85-87: zip() without an explicit strict= parameter
Add explicit value for parameter strict=
(B905)
266-268: Avoid specifying long messages outside the exception class
(TRY003)
🪛 Shellcheck (0.11.0)
deploy/cloud/pre-deployment/pre-deployment-check.sh
[warning] 105-105: provisioner appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 214-214: Declare and assign separately to avoid masking return values.
(SC2155)
🔇 Additional comments (80)
docs/_includes/install.rst (2)
13-13: Doc version update looks goodThanks for bumping the PyPI install snippet to 0.6.0; this keeps the doc aligned with the tagged release.
44-44: Docker tag alignment confirmedThe Docker command now points at the 0.6.0 runtime image, matching the standardized version across the project. No further action needed.
container/deps/requirements.txt (2)
5-7: Appreciate the pinned git SHAs.Thanks for pinning
aiconfiguratorandaiperfto specific commits; that keeps our images reproducible as we roll out the new tooling.
34-34: Double-check NumPy compatibility alongside this SciPy pin.We’re explicitly holding SciPy below 1.14 because of
pmdarima, but NumPy is now unconstrained. Please confirm whetherpmdarima(and the rest of the stack) already caps NumPy <2.0 via its metadata; if not, add an explicit<2guard so we don’t accidentally pick up NumPy 2.x and hit ABI/build failures.examples/basics/kubernetes/Distributed_Inference/agg_router.yaml (1)
41-41: Image tag alignment looks good.Both frontend and decode worker now point to the shared 0.6.0 runtime, matching the common recipe template.
Also applies to: 98-98
components/backends/vllm/deploy/agg.yaml (1)
16-16: Consistent vLLM runtime update.The deployment now uses the standardized 0.6.0 image for both roles; no further action needed.
Also applies to: 27-27
examples/multimodal/deploy/agg_qwen.yaml (1)
17-17: Multimodal stack now references the shared 0.6.0 build.Glad to see every worker updated together—prevents heterogeneous runtimes during deploys.
Also applies to: 28-28, 45-45, 62-62
examples/custom_backend/hello_world/deploy/hello_world.yaml (1)
44-44: Hello-world example tracks the canonical 0.6.0 image.Keeps the tutorial consistent with the shipped containers.
Also applies to: 83-83
components/backends/vllm/deploy/agg_kvbm.yaml (1)
16-16: KV cache deployment adopts the 0.6.0 runtime.Update aligns this specialized recipe with the rest of the vLLM templates.
Also applies to: 34-34
components/backends/trtllm/deploy/disagg-multinode.yaml (1)
98-98: Uniform TRT-LLM runtime at 0.6.0.All three roles now share the same vetted image; this prevents mixed-version cluster issues.
Also applies to: 130-130, 170-170
recipes/gpt-oss-120b/trtllm/agg/deploy.yaml (1)
33-33: Recipe now mirrors the common 0.6.0 TRT-LLM image.Brings the GPT-OSS recipe in sync with the shared deployment guidance.
Also applies to: 72-72
benchmarks/README.md (1)
18-18: Doc wording updated appropriately.Calling out AIPerf keeps the README aligned with the current benchmarking stack.
components/backends/vllm/deploy/agg_router.yaml (1)
16-16: LGTM - Image version update.The image tags have been consistently updated to
0.6.0. This aligns with the broader PR-wide upgrade across vLLM deployments.Also applies to: 30-30
components/backends/trtllm/deploy/agg-with-config.yaml (1)
37-37: LGTM - TensorRT-LLM image version update.The image tags have been updated to
0.6.0for the TensorRT-LLM runtime. Ensure thatnvcr.io/nvidia/ai-dynamo/tensorrtllm-runtime:0.6.0is available and compatible with the current configuration.Also applies to: 53-53
components/backends/trtllm/deploy/agg.yaml (1)
16-16: LGTM - Consistent image version update.Image tags updated to
0.6.0for TensorRT-LLM aggregate deployment.Also applies to: 27-27
components/backends/trtllm/deploy/disagg_planner.yaml (1)
19-19: LGTM - All container images updated consistently.The image tags have been updated to
0.6.0across all four containers (Frontend, Planner, TRTLLMDecodeWorker, TRTLLMPrefillWorker) in the disaggregated planner deployment.Also applies to: 47-47, 88-88, 117-117
components/backends/vllm/deploy/disagg.yaml (2)
16-16: LGTM - vLLM disaggregated deployment image updates.Image tags consistently updated to
0.6.0across all three containers.Also applies to: 28-28, 48-48
1-57: PR title does not match the actual changes.The PR title states "fix: update model recipe for llama-3 70b to match with common recipe template", but the actual changes are:
- Image version bumps from
my-tagto0.6.0across multiple deployment files- Docker registry migration to
nvcr.io/nvidia/ai-dynamo- Tool rename from
genai-perftoaiperfNo llama-3 70b model-specific recipe changes are present in these files. Please update the PR title and description to accurately reflect the scope of changes, such as:
- "chore: upgrade deployment images to 0.6.0 and migrate to NVIDIA registry"
- "refactor: replace genai-perf with aiperf and update container images"
components/backends/vllm/deploy/disagg_router.yaml (1)
16-16: Verify vllm-runtime:0.6.0 image availability
Test the new tag can be pulled or inspected with valid nvcr.io credentials before deploying:docker pull nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.6.0Applies to lines 16, 30, 49.
deploy/cloud/operator/Earthfile (1)
48-48: Verify nvcr.io registry credentials and access (Earthfile:48).
- Connectivity check returned 401 Unauthorized—registry reachable but requires authentication.
- Add Docker credentials for nvcr.io (e.g., in ~/.docker/config.json or CI/CD secrets).
- Ensure deployment and CI/CD pipelines can pull from nvcr.io/nvidia/ai-dynamo.
Earthfile is deprecated (lines 17–22); plan migration to the new docker-based build system.benchmarks/llm/perf.sh (1)
205-205: Confirmaiperfinstallation and compatibility
- The benchmarking script was updated from
genai-perftoaiperf, butaiperfis not found in the environment.- Ensure
aiperfis installed in all target environments (local, CI, Docker, etc.)- Verify that
aiperf profileaccepts the same flags asgenai-perf profileor document any changes- Update README or setup guides to reflect the tool rename and installation steps
tests/planner/perf_test_configs/disagg_8b_planner.yaml (1)
44-44: Image updates look consistent.All services now reference
nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.6.0, keeping the deployment aligned with the common recipe template. 👍Also applies to: 77-77, 141-141, 198-198
components/backends/trtllm/deploy/disagg_router.yaml (1)
16-16: Container images aligned to 0.6.0.Frontend, prefill, and decode workers all point to the shared
trtllm-runtime:0.6.0, matching the standardized recipe.Also applies to: 30-30, 58-58
container/Dockerfile.vllm (1)
190-191: Addinggitto runtime packages makes sense.Including git in the runtime image unblocks workflows that rely on it (e.g., pulling git-hosted assets) without affecting cleanup.
components/backends/vllm/deploy/disagg-multinode.yaml (1)
16-16: Version bump applied uniformly.Frontend, decode, and prefill pods now share the
vllm-runtime:0.6.0image, keeping the multinode recipe in sync with the common template.Also applies to: 37-37, 60-60
components/backends/sglang/deploy/README.md (1)
64-65: Docs now match the new registry/tag.README snippets reference
sglang-runtime:0.6.0, mirroring the deployment YAML updates.Also applies to: 95-95
components/backends/sglang/deploy/disagg_planner.yaml (1)
19-19: All planner components aligned to 0.6.0.Frontend, planner, and both worker roles now run the standardized
sglang-runtime:0.6.0image.Also applies to: 30-30, 52-52, 84-84
components/backends/trtllm/deploy/README.md (1)
92-93: Documentation reflects the new TensorRT-LLM image.The README now points to
trtllm-runtime:0.6.0, matching the deployment YAML updates.Also applies to: 112-112, 144-144
components/backends/trtllm/deploy/disagg.yaml (1)
16-16: Runtime image updates look good.Each service now references
trtllm-runtime:0.6.0, matching the common recipe template.Also applies to: 28-28, 56-56
tests/planner/scaling/disagg_planner.yaml (1)
21-21: LGTM! Container image references consistently updated.All container images have been updated from placeholder values to the official NVIDIA registry with the 0.6.0 tag, aligning with the PR-wide migration to standardized image references.
Also applies to: 44-44, 81-81, 105-105
components/backends/sglang/deploy/disagg.yaml (1)
16-16: LGTM! Container images updated to official NVIDIA registry.The sglang-runtime images have been consistently updated to version 0.6.0 across all service containers (Frontend, decode, and prefill).
Also applies to: 28-28, 61-61
recipes/llama-3-70b/vllm/agg/deploy.yaml (1)
41-41: LGTM! GPU resources and tensor parallelism correctly aligned.The reduction of both tensor-parallel-size (from 8 to 4) and GPU resources (from 8 to 4) is properly aligned, ensuring each tensor parallel shard gets exactly one GPU. This appears to be an intentional configuration change for the updated recipe template.
Also applies to: 50-52
recipes/llama-3-70b/vllm/disagg-multi-node/deploy.yaml (2)
21-21: LGTM! Container images consistently updated.All container images updated to nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.6.0 across Frontend, VllmPrefillWorker, and VllmDecodeWorker.
Also applies to: 45-45, 74-74
35-41: Verify shell variable expansion for both workers.Both VllmPrefillWorker and VllmDecodeWorker use environment variable substitution in their args. The pattern is consistent with the agg deployment. Verify that the shell expansion works correctly in your Kubernetes cluster.
Also applies to: 64-70
Earthfile (1)
137-137: LGTM! Docker registry defaults updated consistently.The DOCKER_SERVER defaults have been updated from
my-registrytonvcr.io/nvidia/ai-dynamoacross all three targets (dynamo-base-docker, all-docker, custom), aligning with the PR-wide registry migration.Note: Since this Earthfile is marked as deprecated (lines 17-22), ensure the replacement system also uses the correct registry defaults.
Also applies to: 178-178, 192-192
benchmarks/utils/plot.py (1)
35-50: LGTM! Renaming from GenAI-Perf to AIPerf completed consistently.The renaming has been applied consistently throughout the function:
- Variable names:
genai_perf_json→aiperf_json- File pattern:
profile_export_genai_perf.json→profile_export_aiperf.json- Messages: "genai"/"Genai-perf" → "aiperf"
The logic and control flow remain unchanged.
Note: The static analysis tool flags line 47 for catching a broad
Exception. While more specific exception handling would be ideal, this is acceptable for a benchmarking utility script where you want to continue processing other concurrency levels even if one fails.benchmarks/utils/genai.py (2)
36-36: LGTM! Public function renamed consistently.The function
run_genai_perfhas been renamed torun_aiperfwhile maintaining the same signature and semantics. The caller inrun_concurrency_sweep(line 116) has been updated accordingly.Also applies to: 116-116
47-47: LGTM! Command and internal references updated consistently.The changes correctly update:
- Command from
"genai-perf"to"aiperf"- Variable from
gap_processtoaip_process- User-facing messages from "Genai-perf" to "Aiperf"
Note: The static analysis warning about subprocess execution (line 83) is a false positive. The command is constructed from structured benchmark parameters (
service_url,model_name, concurrency levels, etc.) that come from the function arguments, not from arbitrary untrusted input.Also applies to: 79-79, 83-101
components/backends/sglang/launch/disagg_dp_attn.sh (1)
30-30: Verify new round-robin flags in sglang CLI
- Prefill worker: replaced
--expert-distribution-recorder-mode stat(MoE diagnostics) with--load-balance-method round_robin(valid DP routing flag).- Decode worker: confirm
--prefill-round-robin-balanceis supported in sglang 0.6.0—otherwise switch to--load-balance-method round_robinor the correct router policy.- Ensure removal of
SGLANG_EXPERT_DISTRIBUTION_RECORDER_DIR(MoE stats) is intentional.benchmarks/profiler/utils/profile_cache.py (2)
29-29: LGTM! Consistent renaming from GenAI-Perf to AIPerf.The file pattern updates from
profile_export_genai_perf.jsontoprofile_export_aiperf.jsonand fromgap_request*toaiperf_request*are consistent throughout the file. This aligns with the broader repository-wide migration to AIPerf tooling.Also applies to: 68-69, 96-96, 118-118, 131-131
45-49: Good addition of JSON validation for prefill results.The validation logic appropriately checks for
time_to_first_tokenmetrics within the JSON structure, which strengthens the data validation for AIPerf results.docs/benchmarks/benchmarking.md (1)
64-64: LGTM! Documentation consistently updated for AIPerf.The documentation correctly reflects the migration from GenAI-Perf to AIPerf across all sections, including:
- Tool descriptions and framing
- Command-line parameter documentation
- Artifact file naming (profile_export_aiperf.json/csv)
- Docker image references updated to 0.6.0
This aligns with the broader repository changes to AIPerf tooling.
Also applies to: 73-73, 168-169, 182-183, 189-189, 304-307, 413-413, 519-519
docs/kubernetes/README.md (1)
22-25: LGTM! Helpful addition of pre-deployment checks.The new "Pre-deployment Checks" section appropriately guides users to run pre-deployment validation before installing the platform. This is well-placed before the installation steps and provides a clear reference to the detailed documentation.
components/backends/vllm/deploy/README.md (1)
72-72: LGTM! Documentation updated to match deployment files.The image references have been correctly updated to
nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.6.0, which aligns with the broader repository-wide image version standardization visible across multiple deployment manifests.Also applies to: 119-119
components/backends/vllm/deploy/disagg_planner.yaml (1)
19-19: LGTM! Consistent image version updates.All four services (Frontend, Planner, VllmDecodeWorker, VllmPrefillWorker) have been updated to use
nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.6.0. This is consistent with the broader repository-wide standardization to version 0.6.0 and the NVIDIA AI-Dynamo image registry.Also applies to: 29-29, 51-51, 71-71
recipes/llama-3-70b/vllm/disagg-single-node/perf.yaml (2)
52-52: Hard-coded tokenizer path is brittle and not portable.The tokenizer path
/root/.cache/huggingface/hub/models--RedHatAI--Llama-3.3-70B-Instruct-FP8-dynamic/snapshots/ddb4128556dfcff99e0c41aee159ea6c3e655dcdis hard-coded with a specific snapshot hash. This creates the following issues:
- The path will break if the model is updated or the cache structure changes
- The snapshot hash is not parameterized and may not match across different environments
- The path assumes a specific cache location structure
Consider using the model identifier directly (e.g.,
--tokenizer $TARGET_MODEL) and let the tool resolve the path, or parameterize the tokenizer path as an environment variable if an absolute path is required.⛔ Skipped due to learnings
Learnt from: biswapanda PR: ai-dynamo/dynamo#3143 File: recipes/gpt-oss-120b/trtllm/agg/bench-pub.yaml:0-0 Timestamp: 2025-09-24T23:44:28.972Z Learning: In the gpt-oss-120b benchmarking jobs, hardcoded snapshot hashes in tokenizer paths are intentional for version pinning and reproducible results, not an issue to be fixed.
8-8: Verify reduced backoffLimit is intentional
All perf.yaml files under recipes/* use backoffLimit: 1; if retries for transient failures are needed, consider reverting to backoffLimit: 3.components/backends/vllm/deploy/disagg_kvbm_2p2d.yaml (1)
16-16: LGTM! Consistent image update across all containers.All three container images (Frontend, VllmDecodeWorker, VllmPrefillWorker) have been updated to use
nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.6.0, maintaining consistency with the broader registry migration in this PR.Also applies to: 27-27, 63-63
tests/planner/perf_test_configs/agg_8b.yaml (1)
41-41: LGTM! Image updates are consistent.Both Frontend and VllmDecodeWorker containers have been updated to use the NVIDIA registry image at version 0.6.0, consistent with the migration pattern across this PR.
Also applies to: 91-91
components/backends/sglang/deploy/agg_logging.yaml (1)
19-19: LGTM! SGLang runtime images updated consistently.Both Frontend and decode worker containers have been updated to use
nvcr.io/nvidia/ai-dynamo/sglang-runtime:0.6.0, maintaining version consistency with the vLLM deployments in this PR.Also applies to: 30-30
components/backends/vllm/deploy/disagg_kvbm.yaml (1)
16-16: LGTM! All container images updated consistently.The three containers in this KVBM disaggregated deployment (Frontend, VllmDecodeWorker, VllmPrefillWorker) have been updated to version 0.6.0, aligning with the registry migration across this PR.
Also applies to: 27-27, 59-59
components/backends/sglang/deploy/agg.yaml (1)
16-16: LGTM! Image references updated correctly.Both Frontend and decode worker containers have been updated to
nvcr.io/nvidia/ai-dynamo/sglang-runtime:0.6.0, maintaining consistency with the broader image migration in this PR.Also applies to: 27-27
deploy/cloud/pre-deployment/nixl/nixlbench-deployment.yaml (2)
21-26: LGTM: Parameterized ETCD endpoint configuration.The addition of the
ETCD_ENDPOINTSenvironment variable and its use in the command improves configurability and follows best practices for avoiding hardcoded values.
27-35: LGTM: Resource requests and limits specified.Adding explicit CPU, memory, and GPU resource requests and limits is a best practice for Kubernetes deployments, ensuring predictable scheduling and preventing resource contention.
components/backends/vllm/deploy/disagg_kvbm_tp2.yaml (1)
16-16: LGTM: Container image version updated to 0.6.0.The image references have been consistently updated to use
nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.6.0across all three services (Frontend, VllmDecodeWorker, VllmPrefillWorker), aligning with the repository-wide standardization to NVIDIA's official registry and versioning.Also applies to: 29-29, 67-67
benchmarks/profiler/utils/profile_prefill.py (2)
9-9: LGTM: Import updated to use AIPerf module.The import path has been updated from
benchmarks.profiler.utils.genai_perftobenchmarks.profiler.utils.aiperf, aligning with the repository-wide renaming from GenAI-Perf to AIPerf.
83-94: LGTM: Artifact directory and result access updated for AIPerf.The changes consistently update:
- Artifact directory naming from
genai_perf_artifact_dirtoai_perf_artifact_dir- Function calls from
benchmark_prefill(genai_perf) to the aiperf equivalent- Result access pattern to match AIPerf's structure:
aiperf_result["records"]["ttft"]["avg"]The logic remains functionally equivalent, properly handling None results.
deploy/cloud/pre-deployment/pre-deployment-check.sh (4)
42-59: LGTM: kubectl connectivity check is well-structured.The function properly checks both command availability and cluster connectivity, with clear error messages guiding users to remediation steps.
122-141: LGTM: GPU resource check is straightforward and effective.The function correctly identifies GPU-enabled nodes using the
nvidia.com/gpu.present=truelabel with appropriate error messaging.
143-177: LGTM: GPU operator check handles partial running state.The function properly distinguishes between no operator, non-running pods, partially running, and fully running states, providing appropriate warnings and status messages for each scenario.
239-283: LGTM: Main function orchestrates checks effectively.The main function properly executes all checks, records results, displays a summary, and exits with the appropriate code. The try-finally pattern with
set -eensures proper error handling.benchmarks/profiler/profile_sla.py (3)
25-25: LGTM: Import updated to AIPerf modules.The import has been updated to use
benchmarks.profiler.utils.aiperffor bothbenchmark_decodeandbenchmark_prefill, consistent with the repository-wide migration from GenAI-Perf to AIPerf.
248-259: LGTM: Prefill benchmarking updated to use AIPerf.The changes consistently update:
- Artifact directory naming from
gap_isl{args.isl}toaiperf_isl{args.isl}- Function call target to
benchmark_prefillfrom the aiperf module- Result access to
aiperf_result["records"]["ttft"]["avg"]The logic flow remains unchanged, properly handling None results.
427-444: LGTM: Decode benchmarking updated to use AIPerf.The changes consistently update:
- Artifact directory naming pattern to use
aiperf_prefix- Function call to
benchmark_decodefrom the aiperf module- Result access patterns to use
aiperf_result["records"]["inter_token_latency"]["avg"]andaiperf_result["records"]["output_token_throughput"]["avg"]The throughput calculation correctly divides by
num_gpusto get per-GPU metrics, and None handling is preserved.deploy/cloud/pre-deployment/nixl/README.md (1)
1-297: LGTM: Comprehensive and well-structured documentation.The README provides excellent coverage of:
- Prerequisites with clear verification steps
- Interactive script usage with multiple examples
- Deployment configuration details
- Monitoring commands
- Troubleshooting guidance organized by issue type
The documentation follows best practices by referencing the pre-deployment check script and providing step-by-step workflows for different use cases.
Note: The static analysis grammar warnings are false positives related to markdown list formatting detection and can be safely ignored. The term "5Gi" is valid Kubernetes resource notation.
benchmarks/profiler/utils/profile_decode.py (2)
9-9: LGTM: Import updated to AIPerf module.The import path has been updated to use
benchmarks.profiler.utils.aiperf, consistent with the repository-wide migration from GenAI-Perf to AIPerf.
115-132: LGTM: Decode benchmarking updated to use AIPerf.The changes consistently update:
- Artifact directory naming from
gap_isl{isl}_osl{osl}_n{num_request}toaiperf_isl{isl}_osl{osl}_n{num_request}- Function call to
benchmark_decodefrom the aiperf module- Result access to use
aiperf_result["records"]["inter_token_latency"]["avg"]andaiperf_result["records"]["output_token_throughput"]["avg"]The per-GPU throughput calculation and None handling logic remain correct.
tests/planner/perf_test_configs/disagg_8b_2p2d.yaml (1)
41-41: LGTM: Container image version updated to 0.6.0.The image references have been consistently updated to use
nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.6.0across all three services (Frontend, VllmDecodeWorker, VllmPrefillWorker), aligning with the repository-wide standardization to NVIDIA's official registry and versioning.Also applies to: 91-91, 141-141
components/backends/sglang/deploy/agg_router.yaml (1)
16-16: LGTM! Image references updated consistently.The container image updates to
nvcr.io/nvidia/ai-dynamo/sglang-runtime:0.6.0are consistent across both the Frontend and decode worker components.Also applies to: 30-30
benchmarks/llm/plot_pareto.py (2)
28-43: LGTM! Function correctly updated for AIPerf.The renaming from
genai_perf_profile_export_json_pathstoaiperf_profile_export_json_pathsis consistent throughout, and the file search pattern correctly targetsprofile_export_aiperf.json.
242-247: LGTM! Help text updated correctly.The argument parser description and help text have been properly updated from "GenAI-Perf" to "AIPerf", maintaining consistency with the renamed functionality.
deploy/cloud/pre-deployment/nixl/build_and_deploy.sh (5)
1-11: LGTM! Script header and safety settings are correct.The script includes proper licensing, uses
set -euo pipefailfor robust error handling, and defines the NIXL version and script directory appropriately.
12-133: Excellent dependency checking implementation.The
check_dependenciesandcommand_existsfunctions provide thorough validation of required tools (wget, unzip, kubectl, docker) with clear error messages and installation suggestions. The warnings for Docker daemon and buildx availability are helpful for troubleshooting.
213-263: LGTM! Step selection logic is well-structured.The
prompt_for_stepsfunction provides a clear interactive interface for users to select which steps to execute, with proper validation and error handling.
265-279: LGTM! Deployment function is clean and informative.The
deploy_to_k8sfunction correctly applies the deployment and provides helpful next-step commands for checking pod status and viewing logs.
281-413: LGTM! Main function orchestrates workflow effectively.The main function provides a clear workflow with multiple confirmation points, proper error handling, and informative summaries. The step-by-step execution with conditional logic based on user selections is well-implemented.
benchmarks/profiler/utils/aiperf.py (4)
33-64: LGTM! Common command builder updated correctly.The function has been properly renamed from
_get_common_genai_perf_cmdto_get_common_aiperf_cmd, and the executable name is correctly changed from"genai-perf"to"aiperf".
140-151: LGTM! Result retrieval function updated correctly.The function has been renamed from
get_gap_resulttoget_aiperf_result, and the file search correctly targetsprofile_export_aiperf.jsoninstead ofprofile_export_genai_perf.json.
154-186: LGTM! Prefill benchmark function updated correctly.The function signature has been properly updated to use
aiperf_artifact_dirinstead ofgenai_perf_artifact_dir, and all internal references have been consistently renamed. The logging messages correctly reflect the AIPerf terminology.
189-247: LGTM! Decode benchmark function updated correctly.The function signature and implementation have been thoroughly updated from GenAI-Perf to AIPerf naming. The warmup and profiling flows correctly use the renamed command builders and result retrieval functions.
5713c67 to
665ce7f
Compare
Overview:
closes: DEP-526
Main PR: #3637
nvbug: 5579876