Skip to content

fix: update model recipe for llama-3 70b to match with common recipe template #3637#3656

Merged
saturley-hall merged 1 commit into
release/0.6.0from
bis/cp-fix-model-recipe
Oct 16, 2025
Merged

fix: update model recipe for llama-3 70b to match with common recipe template #3637#3656
saturley-hall merged 1 commit into
release/0.6.0from
bis/cp-fix-model-recipe

Conversation

@biswapanda
Copy link
Copy Markdown
Contributor

@biswapanda biswapanda commented Oct 15, 2025

Overview:

closes: DEP-526

Main PR: #3637

nvbug: 5579876

@biswapanda biswapanda requested a review from a team as a code owner October 15, 2025 19:52
@biswapanda biswapanda requested a review from a team October 15, 2025 19:52
@biswapanda biswapanda requested review from a team as code owners October 15, 2025 19:52
@copy-pr-bot
Copy link
Copy Markdown

copy-pr-bot Bot commented Oct 15, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Oct 15, 2025

Walkthrough

The PR updates container image registries/tags to nvcr.io/nvidia/ai-dynamo:0.6.0, replaces GenAI-Perf with AIPerf across benchmarking code/docs, adds git to several Dockerfiles and new Python deps, introduces Kubernetes pre-deployment check scripts/docs and NIXL build/deploy tooling, adjusts several recipe/perf flows, and removes some Rust test scaffolding.

Changes

Cohort / File(s) Summary
Registry defaults and image tags
Earthfile, deploy/cloud/operator/Earthfile, components/.../deploy/*.yaml, examples/.../*.yaml, examples/deployments/ECS/*.json, tests/planner/.../*.yaml, docs/_includes/install.rst, docs/backends/*/*.md, docs/kubernetes/*.md, components/backends/*/deploy/README.md, README.md
Switch default/published images to nvcr.io/nvidia/ai-dynamo:*:0.6.0; update imagePullSecrets example; bump literal version tags in docs and examples.
Benchmarking: GenAI-Perf → AIPerf
benchmarks/llm/perf.sh, benchmarks/llm/plot_pareto.py, benchmarks/utils/genai.py, benchmarks/utils/plot.py, benchmarks/profiler/profile_{endpoint,sla}.py, `benchmarks/profiler/utils/*(aiperf
profile_).py, benchmarks/incluster/benchmark_job.yaml, docs/benchmarks/benchmarking.md`
Dockerfiles and dependencies
container/Dockerfile.{sglang,sglang-wideep,trtllm,vllm}, container/deps/requirements.txt
Add git to images; replace genai-perf with aiperf install; add aiconfigurator/aiperf (git), pin scipy<1.14.0, loosen pydantic (>=2.10.6), remove numpy==1.26.4 pin.
Pre-deployment checks and NIXL tooling
deploy/cloud/pre-deployment/README.md, deploy/cloud/pre-deployment/pre-deployment-check.sh, docs/kubernetes/README.md, deploy/cloud/pre-deployment/nixl/README.md, deploy/cloud/pre-deployment/nixl/build_and_deploy.sh, deploy/cloud/pre-deployment/nixl/nixlbench-deployment.yaml, benchmarks/nixl/README.md (removed)
Add Kubernetes readiness check script/docs; add NIXL build/deploy guide and interactive script; update NIXL deployment YAML (image placeholder, ETCD env, resources); remove old NIXL README.
Recipes: llama-3-70b
recipes/llama-3-70b/model-cache/model-download.yaml, recipes/llama-3-70b/vllm/{agg,disagg-*}/deploy.yaml, recipes/llama-3-70b/vllm/{agg,disagg-*}/perf.yaml
Model cache switches to hf download with MODEL_REVISION and /model-store; deployment uses env-driven MODEL_PATH/SERVED_MODEL_NAME, updates images and GPU counts; perf Jobs reworked to install/run AIPerf with parameterized harness and artifacts; reduce backoffLimit.
SGLang launch tweaks
components/backends/sglang/launch/disagg_dp_attn.sh
Remove expert distribution recorder; switch to round-robin load balancing flags for prefill/decode.
Profiler config/image
benchmarks/profiler/utils/config.py, benchmarks/profiler/profile_endpoint.py
Update default planner image to nvcr tag; fix import paths to absolute module locations.
Rust tests cleanup
lib/parsers/Cargo.toml, lib/parsers/tests/mod.rs
Remove dev-dependencies and a test module scaffolding.
Misc examples and tests
examples/basics/kubernetes/Distributed_Inference/agg_router.yaml, examples/multimodal/deploy/*, examples/custom_backend/hello_world/deploy/hello_world.yaml, recipes/gpt-oss-120b/trtllm/agg/deploy.yaml
Update image references to nvcr 0.6.0 across various example deployments.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor U as User
  participant S as pre-deployment-check.sh
  participant K as Kubernetes API
  participant G as GPU Operator

  U->>S: Run script
  S->>K: kubectl version/check (context)
  S-->>U: Connectivity PASS/FAIL
  S->>K: Get default StorageClass
  S-->>U: Default SC PASS/FAIL
  S->>K: Get nodes (check GPUs)
  S-->>U: GPU nodes PASS/FAIL
  S->>G: Check NVIDIA GPU Operator status
  S-->>U: Operator PASS/FAIL
  S-->>U: Summary + exit code
Loading
sequenceDiagram
  autonumber
  actor U as User
  participant B as build_and_deploy.sh
  participant GH as Source ZIP Host
  participant D as Docker
  participant R as Registry
  participant K as Kubernetes API

  U->>B: Start (select steps/arch/registry)
  B->>GH: Download NIXL source (wget)
  B->>D: Build image (buildx)
  D->>R: Push image (optional)
  B->>B: Update deployment YAML (sed image)
  B->>K: kubectl apply -f deployment.yaml
  K-->>B: Status
  B-->>U: Results/logging guidance
Loading
sequenceDiagram
  autonumber
  participant J as K8s Job (perf)
  participant S as Model Service
  participant P as AIPerf CLI
  participant A as Artifact Store

  J->>J: wait_for_model_ready()
  J->>S: Probe /healthz
  S-->>J: 200 OK when ready
  J->>P: aiperf profile --model --endpoint --concurrency
  P->>S: Load test traffic
  S-->>P: Responses
  P-->>A: Write profile_export_aiperf.json
  J-->>A: Save input_config.json, logs
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~30 minutes

Possibly related PRs

Poem

In burrows of YAML I hop with delight,
Tags turn to 0.6.0, shining bright.
AIPerf drums a quicker beat,
Images pull, deployments meet.
Preflight checks—thump, thump—okay!
Git in the soup, we’re on our way.
Carrot commits shipped today. 🥕🚀

Pre-merge checks

❌ Failed checks (3 warnings)
Check name Status Explanation Resolution
Title Check ⚠️ Warning The title indicates that the pull request only updates the model recipe for Llama-3 70B, but the change summary shows extensive modifications across Docker image references, benchmarking tooling, deployment scripts, documentation, and configuration files well beyond the Llama-3 recipe update. Therefore the title does not accurately reflect the scope of changes. Please update the title to summarize the full scope of changes or split this PR into smaller focused PRs that each align with their titles.
Docstring Coverage ⚠️ Warning Docstring coverage is 54.76% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
Description Check ⚠️ Warning The pull request description is incomplete when compared to the required template structure. While the author provided an "Overview" section header, it only contains issue references ("closes: DEP-526", link to Main PR, and nvbug reference) rather than an actual overview describing what the PR accomplishes. The required "Details" section, which should describe the changes made in the PR, is entirely missing. Additionally, the "Where should the reviewer start?" section for directing reviewers to specific files is absent. Based on the raw_summary, this PR makes extensive changes across multiple files including benchmarking tool migrations from genai-perf to aiperf, image tag updates from my-tag to 0.6.0, registry updates, documentation changes, and new deployment scripts, none of which are described in the provided description. The author should expand the pull request description by: (1) adding a substantive Overview section that explains at a high level what this PR accomplishes and why it's needed, (2) adding a Details section that describes the key changes being made (such as the migration from genai-perf to aiperf, image tag updates, documentation updates, and addition of deployment scripts), and (3) adding a "Where should the reviewer start?" section that identifies critical files for review. The related issues information can remain but should be moved to a dedicated "Related Issues" section following the template structure.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 7

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
recipes/llama-3-70b/vllm/disagg-single-node/deploy.yaml (1)

45-92: Hard-coded MODEL_PATH prevents fresh deployments

Pointing MODEL_PATH at a specific HuggingFace cache snapshot requires that exact directory already exist on the PVC. On a fresh cluster (PVC empty) the workers start with /bin/sh -c python3 … --model $MODEL_PATH, the path is missing, and vLLM exits immediately. The previous value (RedHatAI/Llama-3.3-70B-Instruct-FP8-dynamic) allowed vLLM/HuggingFace to materialize the model on first run. Please revert to the repo ID (or add an explicit pre-download job and keep the repo ID here). Remember to fix both prefill and decode sections.

-            - name: MODEL_PATH
-              value: "/root/.cache/huggingface/hub/models--RedHatAI--Llama-3.3-70B-Instruct-FP8-dynamic/snapshots/ddb4128556dfcff99e0c41aee159ea6c3e655dcd"
+            - name: MODEL_PATH
+              value: "RedHatAI/Llama-3.3-70B-Instruct-FP8-dynamic"
@@
-            - name: MODEL_PATH
-              value: "/root/.cache/huggingface/hub/models--RedHatAI--Llama-3.3-70B-Instruct-FP8-dynamic/snapshots/ddb4128556dfcff99e0c41aee159ea6c3e655dcd"
+            - name: MODEL_PATH
+              value: "RedHatAI/Llama-3.3-70B-Instruct-FP8-dynamic"
♻️ Duplicate comments (9)
recipes/llama-3-70b/vllm/disagg-multi-node/deploy.yaml (1)

38-39: Duplicate: Hardcoded snapshot hash in both workers.

Both workers use the same hardcoded MODEL_PATH with snapshot hash ddb4128556dfcff99e0c41aee159ea6c3e655dcd. Consider the maintainability concern noted in the agg deployment - centralizing this path value would make updates easier.

Also applies to: 67-68

recipes/llama-3-70b/vllm/disagg-multi-node/perf.yaml (4)

23-102: Consider extracting duplicated shell script to a ConfigMap.

Same refactoring suggestion as in disagg-single-node/perf.yaml. This file is nearly identical except for environment variable values (DEPLOYMENT_GPU_COUNT=16, DEPLOYMENT_MODE=disagg-mn, ENDPOINT=llama3-70b-disagg-mn-frontend:8000).


8-8: Consider the impact of reducing backoffLimit.

Same issue as in disagg-single-node/perf.yaml. The backoffLimit reduction from 3 to 1 reduces resilience to transient failures.


52-52: Hard-coded tokenizer path is brittle and not portable.

Same issue as in disagg-single-node/perf.yaml. The hard-coded tokenizer path with snapshot hash is not portable across environments.


86-98: Undefined variable $max_threads in input_config.json.

Same issue as in disagg-single-node/perf.yaml. The variable $max_threads is used but not defined in the outer scope.

recipes/llama-3-70b/vllm/agg/perf.yaml (4)

23-102: Consider extracting duplicated shell script to a ConfigMap.

Same refactoring suggestion as in the other perf.yaml files. This file is nearly identical except for environment variable values (DEPLOYMENT_GPU_COUNT=4, DEPLOYMENT_MODE=agg, ENDPOINT=llama3-70b-agg-frontend:8000).


8-8: Consider the impact of reducing backoffLimit.

Same issue as in the other perf.yaml files. The backoffLimit reduction from 3 to 1 reduces resilience to transient failures.


52-52: Hard-coded tokenizer path is brittle and not portable.

Same issue as in the other perf.yaml files. The hard-coded tokenizer path with snapshot hash is not portable across environments.


86-98: Undefined variable $max_threads in input_config.json.

Same issue as in the other perf.yaml files. The variable $max_threads is used but not defined in the outer scope.

🧹 Nitpick comments (10)
container/Dockerfile.sglang-wideep (1)

110-133: Align the section comment with AIPerf

We now install aiperf, but the header still reads “GenAI Perf,” which can mislead anyone scanning the Dockerfile. Please rename the comment to reflect the new tool.

-# GenAI Perf
+# AIPerf
recipes/llama-3-70b/vllm/agg/deploy.yaml (2)

38-39: Consider making the snapshot reference more maintainable.

The MODEL_PATH uses a hardcoded snapshot hash (ddb4128556dfcff99e0c41aee159ea6c3e655dcd). This creates tight coupling to a specific model version and may require manual updates when the model is refreshed.

Consider whether this should:

  • Use a symbolic link or alias that can be updated centrally
  • Be parameterized via ConfigMap for easier management
  • Include a comment explaining why this specific snapshot is required

35-41: Parameterize hardcoded snapshot and confirm shell-var expansion

  • Shell variable substitution in args is consistent across all VLLM deployment manifests and will expand correctly under /bin/sh -c.
  • The fixed snapshot hash in MODEL_PATH is repeated in every recipe; consider parameterizing this or using a version tag to avoid brittle, manual updates.
recipes/llama-3-70b/vllm/disagg-single-node/perf.yaml (1)

23-102: Consider extracting duplicated shell script to a ConfigMap.

The shell script containing utility functions (wait_for_model_ready, run_perf) and execution flow is nearly identical across three perf.yaml files (disagg-single-node, disagg-multi-node, and agg). Only the environment variable values differ between them.

Consider extracting the common shell script to a ConfigMap and mounting it into the containers. This would:

  • Eliminate code duplication
  • Make updates easier to maintain
  • Reduce the risk of inconsistencies between files
  • Follow the DRY principle

Example structure:

# Create a ConfigMap with the script
apiVersion: v1
kind: ConfigMap
metadata:
  name: perf-script
data:
  run-perf.sh: |
    #!/bin/bash
    # ... entire script here ...

# Mount and execute in the Job
volumeMounts:
  - name: perf-script
    mountPath: /scripts
volumes:
  - name: perf-script
    configMap:
      name: perf-script
      defaultMode: 0755
command: ["/scripts/run-perf.sh"]
deploy/cloud/pre-deployment/README.md (1)

1-172: Excellent documentation for pre-deployment checks!

This README provides comprehensive documentation for the Dynamo pre-deployment check script, including:

  • Clear usage instructions
  • Detailed explanation of each check (kubectl connectivity, default StorageClass, GPU resources)
  • Sample outputs for both passing and failing scenarios
  • Helpful troubleshooting guidance
  • External references to Kubernetes documentation

The documentation is well-organized and will be valuable for users deploying Dynamo.

Minor markdown improvements: Consider adding language specifiers to the fenced code blocks at lines 54-94, 97-124, and 150-153 to improve syntax highlighting and comply with markdown linting best practices. For example:

-```
+```text
 ========================================
   Dynamo Pre-Deployment Check Script
 ========================================

Note: The LanguageTool grammar warnings appear to be false positives related to markdown list formatting and can be safely ignored.

deploy/cloud/pre-deployment/nixl/nixlbench-deployment.yaml (1)

17-35: Consider security hardening for production deployments.

Static analysis identified potential security improvements:

  • Running without allowPrivilegeEscalation: false could allow privilege escalation
  • Running as root user increases attack surface

For benchmark workloads, these may be acceptable, but for production consider:

     spec:
+      securityContext:
+        runAsNonRoot: true
+        runAsUser: 1000
       containers:
       - name: nixl-benchmark
         image: "my-registry/nixlbench:version-arch"
+        securityContext:
+          allowPrivilegeEscalation: false
+          capabilities:
+            drop:
+              - ALL
deploy/cloud/pre-deployment/pre-deployment-check.sh (1)

205-236: Consider separate declaration and assignment.

Shellcheck warns that declaring and assigning status in the same statement (line 214) can mask the return value of get_check_result. While not a bug here since get_check_result uses echo, separating declaration and assignment is a safer pattern.

Apply this diff if you want to follow the more defensive pattern:

     IFS='|' read -ra CHECKS <<< "$CHECK_ORDER"
     for check_name in "${CHECKS[@]}"; do
-        local status=$(get_check_result "$check_name")
+        local status
+        status=$(get_check_result "$check_name")
         if [[ "$status" == "PASS" ]]; then
benchmarks/llm/plot_pareto.py (1)

81-130: Consider using strict=True in zip for safety.

The function signature change from genai_perf_profile_export_json_paths to aiperf_profile_export_json_paths is correct. However, the zip() call on lines 85-87 should include strict=True to ensure both lists have the same length and catch potential data inconsistencies early.

Apply this diff:

-    for aiperf_profile_export_json_path, deployment_config_json_path in zip(
-        aiperf_profile_export_json_paths, deployment_config_json_paths
-    ):
+    for aiperf_profile_export_json_path, deployment_config_json_path in zip(
+        aiperf_profile_export_json_paths, deployment_config_json_paths, strict=True
+    ):

Based on static analysis hints.

deploy/cloud/pre-deployment/nixl/build_and_deploy.sh (1)

188-188: Remove or explain the debug pause.

Line 188 contains read -p "Press Enter to continue" which appears to be a debug statement. This interrupts the automated flow and should either be removed or documented as an intentional manual checkpoint.

If this is intentional for user review of the download, add a comment explaining why:

+    # Allow user to inspect downloaded files before building
     read -p "Press Enter to continue"

Otherwise, remove it:

-    read -p "Press Enter to continue"
benchmarks/profiler/utils/aiperf.py (1)

67-99: Consider iterable unpacking for list construction.

Both get_prefill_aiperf_cmd and get_decode_aiperf_cmd are correctly renamed and functional. However, the list concatenation pattern could be slightly more efficient using iterable unpacking.

For example, in get_prefill_aiperf_cmd:

-    return _get_common_aiperf_cmd(
+    return [
+        *_get_common_aiperf_cmd(
         artifact_dir,
         seed,
         model,
         tokenizer,
         base_url,
-    ) + [
+        ),
         "--synthetic-input-tokens-mean",
         str(isl),
         # ... rest of arguments
     ]

This is a minor style preference and the current implementation is perfectly functional. Based on static analysis hints.

Also applies to: 102-137

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a7badb8 and 5713c67.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (84)
  • Earthfile (3 hunks)
  • README.md (1 hunks)
  • benchmarks/README.md (1 hunks)
  • benchmarks/incluster/benchmark_job.yaml (1 hunks)
  • benchmarks/llm/perf.sh (2 hunks)
  • benchmarks/llm/plot_pareto.py (6 hunks)
  • benchmarks/nixl/README.md (0 hunks)
  • benchmarks/profiler/profile_endpoint.py (1 hunks)
  • benchmarks/profiler/profile_sla.py (3 hunks)
  • benchmarks/profiler/utils/aiperf.py (8 hunks)
  • benchmarks/profiler/utils/config.py (1 hunks)
  • benchmarks/profiler/utils/profile_cache.py (5 hunks)
  • benchmarks/profiler/utils/profile_decode.py (2 hunks)
  • benchmarks/profiler/utils/profile_prefill.py (2 hunks)
  • benchmarks/utils/genai.py (4 hunks)
  • benchmarks/utils/plot.py (1 hunks)
  • components/backends/sglang/deploy/README.md (2 hunks)
  • components/backends/sglang/deploy/agg.yaml (2 hunks)
  • components/backends/sglang/deploy/agg_logging.yaml (2 hunks)
  • components/backends/sglang/deploy/agg_router.yaml (2 hunks)
  • components/backends/sglang/deploy/disagg-multinode.yaml (3 hunks)
  • components/backends/sglang/deploy/disagg.yaml (3 hunks)
  • components/backends/sglang/deploy/disagg_planner.yaml (4 hunks)
  • components/backends/sglang/launch/disagg_dp_attn.sh (2 hunks)
  • components/backends/trtllm/deploy/README.md (3 hunks)
  • components/backends/trtllm/deploy/agg-with-config.yaml (2 hunks)
  • components/backends/trtllm/deploy/agg.yaml (2 hunks)
  • components/backends/trtllm/deploy/agg_router.yaml (2 hunks)
  • components/backends/trtllm/deploy/disagg-multinode.yaml (3 hunks)
  • components/backends/trtllm/deploy/disagg.yaml (3 hunks)
  • components/backends/trtllm/deploy/disagg_planner.yaml (4 hunks)
  • components/backends/trtllm/deploy/disagg_router.yaml (3 hunks)
  • components/backends/vllm/deploy/README.md (2 hunks)
  • components/backends/vllm/deploy/agg.yaml (2 hunks)
  • components/backends/vllm/deploy/agg_kvbm.yaml (2 hunks)
  • components/backends/vllm/deploy/agg_router.yaml (2 hunks)
  • components/backends/vllm/deploy/disagg-multinode.yaml (3 hunks)
  • components/backends/vllm/deploy/disagg.yaml (3 hunks)
  • components/backends/vllm/deploy/disagg_kvbm.yaml (3 hunks)
  • components/backends/vllm/deploy/disagg_kvbm_2p2d.yaml (3 hunks)
  • components/backends/vllm/deploy/disagg_kvbm_tp2.yaml (3 hunks)
  • components/backends/vllm/deploy/disagg_planner.yaml (4 hunks)
  • components/backends/vllm/deploy/disagg_router.yaml (3 hunks)
  • container/Dockerfile.sglang (2 hunks)
  • container/Dockerfile.sglang-wideep (2 hunks)
  • container/Dockerfile.trtllm (1 hunks)
  • container/Dockerfile.vllm (1 hunks)
  • container/deps/requirements.txt (2 hunks)
  • deploy/cloud/operator/Earthfile (1 hunks)
  • deploy/cloud/pre-deployment/README.md (1 hunks)
  • deploy/cloud/pre-deployment/nixl/README.md (1 hunks)
  • deploy/cloud/pre-deployment/nixl/build_and_deploy.sh (1 hunks)
  • deploy/cloud/pre-deployment/nixl/nixlbench-deployment.yaml (1 hunks)
  • deploy/cloud/pre-deployment/pre-deployment-check.sh (1 hunks)
  • docs/_includes/install.rst (2 hunks)
  • docs/backends/sglang/README.md (1 hunks)
  • docs/backends/trtllm/gpt-oss.md (1 hunks)
  • docs/benchmarks/benchmarking.md (7 hunks)
  • docs/kubernetes/README.md (1 hunks)
  • docs/kubernetes/create_deployment.md (1 hunks)
  • docs/kubernetes/sla_planner_quickstart.md (1 hunks)
  • examples/basics/kubernetes/Distributed_Inference/agg_router.yaml (2 hunks)
  • examples/custom_backend/hello_world/deploy/hello_world.yaml (2 hunks)
  • examples/deployments/ECS/task_definition_frontend.json (1 hunks)
  • examples/deployments/ECS/task_definition_prefillworker.json (1 hunks)
  • examples/multimodal/deploy/agg_llava.yaml (4 hunks)
  • examples/multimodal/deploy/agg_qwen.yaml (4 hunks)
  • lib/parsers/Cargo.toml (0 hunks)
  • lib/parsers/tests/mod.rs (0 hunks)
  • recipes/gpt-oss-120b/trtllm/agg/deploy.yaml (2 hunks)
  • recipes/llama-3-70b/model-cache/model-download.yaml (1 hunks)
  • recipes/llama-3-70b/vllm/agg/deploy.yaml (2 hunks)
  • recipes/llama-3-70b/vllm/agg/perf.yaml (2 hunks)
  • recipes/llama-3-70b/vllm/disagg-multi-node/deploy.yaml (3 hunks)
  • recipes/llama-3-70b/vllm/disagg-multi-node/perf.yaml (2 hunks)
  • recipes/llama-3-70b/vllm/disagg-single-node/deploy.yaml (3 hunks)
  • recipes/llama-3-70b/vllm/disagg-single-node/perf.yaml (2 hunks)
  • tests/planner/perf_test_configs/agg_8b.yaml (2 hunks)
  • tests/planner/perf_test_configs/disagg_8b_2p2d.yaml (3 hunks)
  • tests/planner/perf_test_configs/disagg_8b_3p1d.yaml (3 hunks)
  • tests/planner/perf_test_configs/disagg_8b_planner.yaml (4 hunks)
  • tests/planner/perf_test_configs/disagg_8b_tp2.yaml (3 hunks)
  • tests/planner/perf_test_configs/image_cache_daemonset.yaml (1 hunks)
  • tests/planner/scaling/disagg_planner.yaml (4 hunks)
💤 Files with no reviewable changes (3)
  • benchmarks/nixl/README.md
  • lib/parsers/Cargo.toml
  • lib/parsers/tests/mod.rs
🧰 Additional context used
🪛 Checkov (3.2.334)
deploy/cloud/pre-deployment/nixl/nixlbench-deployment.yaml

[medium] 3-35: Containers should not run with allowPrivilegeEscalation

(CKV_K8S_20)


[medium] 3-35: Minimize the admission of root containers

(CKV_K8S_23)

🪛 LanguageTool
README.md

[grammar] ~181-~181: There might be a mistake here.
Context: ...ggregated vs. vanilla vLLM) using AIPerf - **[Pre-Deployment Profiling](docs/benchmark...

(QB_NEW_EN)

docs/benchmarks/benchmarking.md

[grammar] ~187-~187: There might be a mistake here.
Context: ...marking The Python benchmarking module: 1. Connects to your port-forwarded endpoi...

(QB_NEW_EN)


[grammar] ~188-~188: There might be a mistake here.
Context: ...nnects** to your port-forwarded endpoint 2. Benchmarks using AIPerf at various con...

(QB_NEW_EN)


[grammar] ~189-~189: There might be a mistake here.
Context: ...els (default: 1, 2, 5, 10, 50, 100, 250) 3. Measures key metrics: latency, through...

(QB_NEW_EN)

deploy/cloud/pre-deployment/README.md

[grammar] ~38-~38: There might be a mistake here.
Context: ...ed summary: ### 1. kubectl Connectivity - Verifies that kubectl is installed and...

(QB_NEW_EN)


[grammar] ~41-~41: There might be a mistake here.
Context: ...tes cluster ### 2. Default StorageClass - Verifies that a default StorageClass is ...

(QB_NEW_EN)


[grammar] ~43-~43: There might be a mistake here.
Context: ...r - If no default StorageClass is found: - Lists all available StorageClasses in th...

(QB_NEW_EN)


[grammar] ~48-~48: There might be a mistake here.
Context: ...d guidance ### 3. Cluster GPU Resources - Checks for GPU-enabled nodes in the clus...

(QB_NEW_EN)


[grammar] ~130-~130: There might be a mistake here.
Context: ... Name | Description | Pass/Fail Status | |------------|-------------|------------...

(QB_NEW_EN)


[grammar] ~131-~131: There might be a mistake here.
Context: ...------|-------------|------------------| | kubectl Connectivity | Verifies ku...

(QB_NEW_EN)


[grammar] ~132-~132: There might be a mistake here.
Context: ...d cluster access | ✅ PASSED / ❌ FAILED | | Default StorageClass | Checks for ...

(QB_NEW_EN)


[grammar] ~133-~133: There might be a mistake here.
Context: ...Class annotation | ✅ PASSED / ❌ FAILED | | Cluster Resources | Validates GPU ...

(QB_NEW_EN)


[grammar] ~148-~148: There might be a mistake here.
Context: ...ing ### Multiple Default StorageClasses If you have multiple StorageClasses mark...

(QB_NEW_EN)


[grammar] ~160-~160: There might be a mistake here.
Context: ...:"false"}}}' ``` ### No GPU Nodes Found If no GPU nodes are found, ensure your c...

(QB_NEW_EN)


[grammar] ~163-~163: There might be a mistake here.
Context: ... label. ### No StorageClasses Available If no StorageClasses are available in yo...

(QB_NEW_EN)


[grammar] ~164-~164: There might be a mistake here.
Context: ...ailable in your cluster, you'll need to: 1. Install a storage provisioner (e.g., for...

(QB_NEW_EN)

deploy/cloud/pre-deployment/nixl/README.md

[grammar] ~16-~16: There might be a mistake here.
Context: ... Prerequisites ### Cluster Requirements Before deploying NIXL benchmark, ensure ...

(QB_NEW_EN)


[grammar] ~17-~17: There might be a mistake here.
Context: ...isites ### Cluster Requirements Before deploying NIXL benchmark, ensure your cluster mee...

(QB_NEW_EN)


[grammar] ~29-~29: There might be a mistake here.
Context: ...d status ### NIXL-Specific Requirements In addition to the cluster requirements ...

(QB_NEW_EN)


[grammar] ~30-~30: There might be a mistake here.
Context: ...dition to the cluster requirements above, NIXL benchmark requires: - Docker i...

(QB_NEW_EN)


[grammar] ~30-~30: There might be a mistake here.
Context: ...irements above, NIXL benchmark requires: - Docker installed and configured on you...

(QB_NEW_EN)


[grammar] ~31-~31: There might be a mistake here.
Context: ...your local machine (for building images) - Docker registry access to push the bui...

(QB_NEW_EN)


[grammar] ~32-~32: There might be a mistake here.
Context: ...ess** to push the built nixlbench images - ETCD service deployed and accessible a...

(QB_NEW_EN)


[grammar] ~36-~36: There might be a mistake here.
Context: ...NIXL source code ### Verification Steps 1. Run pre-deployment check (recommended)...

(QB_NEW_EN)


[grammar] ~63-~63: There might be a mistake here.
Context: ...vides a flexible workflow where you can: 1. Select architecture: Choose between x8...

(QB_NEW_EN)


[grammar] ~65-~65: There might be a mistake here.
Context: ...to execute**: Select any combination of: - Build nixlbench Docker image - Update...

(QB_NEW_EN)


[grammar] ~75-~75: There might be a mistake here.
Context: ...ipt Features ### Architecture Selection The script supports two architectures: -...

(QB_NEW_EN)


[grammar] ~76-~76: There might be a mistake here.
Context: ...n The script supports two architectures: - Option 1: x86_64 (Intel/AMD 64-bit) - ...

(QB_NEW_EN)


[grammar] ~77-~77: There might be a mistake here.
Context: ... Option 1: x86_64 (Intel/AMD 64-bit) - Option 2: aarch64 (ARM64) You can sel...

(QB_NEW_EN)


[grammar] ~84-~84: There might be a mistake here.
Context: ...aarch64 architecture ### Step Selection Choose which steps to execute by enterin...

(QB_NEW_EN)


[grammar] ~87-~87: There might be a mistake here.
Context: ...ma-separated numbers: - All steps: 1,2,3 - Build and update only: 1,2 (skips Ku...

(QB_NEW_EN)


[grammar] ~88-~88: There might be a mistake here.
Context: ...y**: 1,2 (skips Kubernetes deployment) - Deploy only: 3 (useful if image is a...

(QB_NEW_EN)


[grammar] ~89-~89: There might be a mistake here.
Context: ...lready built and deployment file exists) - Build only: 1 (useful for just creat...

(QB_NEW_EN)


[grammar] ~90-~90: There might be a mistake here.
Context: ...eful for just creating the Docker image) - Update deployment only: 2 (useful fo...

(QB_NEW_EN)


[grammar] ~93-~93: There might be a mistake here.
Context: ...y/version) ### Smart Registry Prompting The script only prompts for Docker regis...

(QB_NEW_EN)


[grammar] ~94-~94: There might be a mistake here.
Context: ...Docker registry information when needed: - Steps 1 or 2: Registry required for bu...

(QB_NEW_EN)


[grammar] ~95-~95: There might be a mistake here.
Context: ...or building image or updating deployment - Step 3 only: No registry prompt (uses ...

(QB_NEW_EN)


[grammar] ~102-~102: There might be a mistake here.
Context: ...### Step 1: Build nixlbench Docker Image - Downloads NIXL source code (version 0.6....

(QB_NEW_EN)


[grammar] ~109-~109: There might be a mistake here.
Context: ... ### Step 2: Update Deployment YAML File - Copies base deployment template (`nixlbe...

(QB_NEW_EN)


[grammar] ~110-~110: There might be a mistake here.
Context: ...t template (nixlbench-deployment.yaml) - Creates architecture-specific deployment...

(QB_NEW_EN)


[grammar] ~111-~111: There might be a mistake here.
Context: ...ile (nixlbench-deployment-{arch}.yaml) - Updates image reference with your regist...

(QB_NEW_EN)


[grammar] ~112-~112: There might be a mistake here.
Context: ...ence with your registry and architecture - Preserves all other deployment configura...

(QB_NEW_EN)


[grammar] ~115-~115: There might be a mistake here.
Context: ...ations ### Step 3: Deploy to Kubernetes - Validates deployment file exists - Appli...

(QB_NEW_EN)


[grammar] ~124-~124: There might be a mistake here.
Context: ...t Configuration The deployment creates: - 2 replicas of the nixlbench pod - **Re...

(QB_NEW_EN)


[grammar] ~125-~125: There might be a mistake here.
Context: ...s: - 2 replicas of the nixlbench pod - Resource requests/limits: - CPU: 10 ...

(QB_NEW_EN)


[grammar] ~126-~126: There might be a mistake here.
Context: ...ench pod - Resource requests/limits: - CPU: 10 cores - Memory: 5Gi - GPU: 1...

(QB_NEW_EN)


[grammar] ~127-~127: There might be a mistake here.
Context: ...rce requests/limits**: - CPU: 10 cores - Memory: 5Gi - GPU: 1 NVIDIA GPU per po...

(QB_NEW_EN)


[grammar] ~128-~128: Ensure spelling is correct
Context: ...limits**: - CPU: 10 cores - Memory: 5Gi - GPU: 1 NVIDIA GPU per pod - **Environmen...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[grammar] ~129-~129: There might be a mistake here.
Context: ...emory: 5Gi - GPU: 1 NVIDIA GPU per pod - Environment variables: - `ETCD_ENDPO...

(QB_NEW_EN)


[grammar] ~130-~130: There might be a mistake here.
Context: ...GPU per pod - Environment variables: - ETCD_ENDPOINTS: Points to etcd:2379 - Command: R...

(QB_NEW_EN)


[grammar] ~131-~131: There might be a mistake here.
Context: ...bles**: - ETCD_ENDPOINTS: Points to etcd:2379 - Command: Runs nixlbench with VRAM segm...

(QB_NEW_EN)


[grammar] ~181-~181: There might be a mistake here.
Context: ...of the base template with your specific: - Docker registry - Image tag - Architectu...

(QB_NEW_EN)


[grammar] ~182-~182: There might be a mistake here.
Context: ...te with your specific: - Docker registry - Image tag - Architecture --- ## Monito...

(QB_NEW_EN)


[grammar] ~183-~183: There might be a mistake here.
Context: ... specific: - Docker registry - Image tag - Architecture --- ## Monitoring Your De...

(QB_NEW_EN)


[grammar] ~217-~217: There might be a mistake here.
Context: ...roubleshooting ### Cluster-Level Issues For cluster-related problems, first run ...

(QB_NEW_EN)


[grammar] ~224-~224: There might be a mistake here.
Context: ...t-check.sh ``` This will help diagnose: - kubectl connectivity problems - Missing ...

(QB_NEW_EN)


[grammar] ~232-~232: There might be a mistake here.
Context: ...Specific Issues 1. ETCD Connection: - Ensure etcd service is running: `kubectl...

(QB_NEW_EN)


[grammar] ~237-~237: There might be a mistake here.
Context: ...ect namespace 2. Image Pull Issues: - Verify registry credentials are configur...

(QB_NEW_EN)


[grammar] ~239-~239: There might be a mistake here.
Context: ...are configured - Check image exists: docker pull {registry}/nixlbench:0.6.0-{arch} - Ensure image was pushed successfully aft...

(QB_NEW_EN)


[grammar] ~242-~242: There might be a mistake here.
Context: ...ully after build 3. Build Failures: - Ensure Docker daemon is running - Che...

(QB_NEW_EN)


[grammar] ~248-~248: There might be a mistake here.
Context: ...nzip` 4. Deployment File Not Found: - Run step 2 to create deployment file bef...

(QB_NEW_EN)


[grammar] ~249-~249: There might be a mistake here.
Context: ...nt File Not Found**: - Run step 2 to create deployment file before step 3 - Chec...

(QB_NEW_EN)


[grammar] ~250-~250: There might be a mistake here.
Context: ...fore step 3 - Check file permissions in script directory - Verify script dir...

(QB_NEW_EN)


[grammar] ~286-~286: There might be a mistake here.
Context: ...cript Reference ### build_and_deploy.sh Interactive script that provides flexibl...

(QB_NEW_EN)


[grammar] ~287-~287: There might be a mistake here.
Context: ...d_and_deploy.sh Interactive script that provides flexible build and deployment workflow:...

(QB_NEW_EN)


[grammar] ~287-~287: There might be a mistake here.
Context: ... flexible build and deployment workflow: - Architecture selection: x86_64 or aarc...

(QB_NEW_EN)


[grammar] ~288-~288: There might be a mistake here.
Context: ...hitecture selection**: x86_64 or aarch64 - Step selection: Choose any combination...

(QB_NEW_EN)


[grammar] ~292-~292: There might be a mistake here.
Context: ...deploying ### nixlbench-deployment.yaml Base Kubernetes deployment template that...

(QB_NEW_EN)


[grammar] ~293-~293: There might be a mistake here.
Context: ...late that gets customized by the script: - Template image: `my-registry/nixlbench...

(QB_NEW_EN)


[grammar] ~294-~294: There might be a mistake here.
Context: ...ed by the script: - Template image: my-registry/nixlbench:version-arch - Resource allocation: 10 CPU, 5Gi memor...

(QB_NEW_EN)


[grammar] ~295-~295: There might be a mistake here.
Context: ...ion**: 10 CPU, 5Gi memory, 1 GPU per pod - ETCD integration: Pre-configured envir...

(QB_NEW_EN)


[grammar] ~296-~296: There might be a mistake here.
Context: ...**: Pre-configured environment variables - Benchmark command: Runs with VRAM segm...

(QB_NEW_EN)

🪛 markdownlint-cli2 (0.18.1)
deploy/cloud/pre-deployment/README.md

54-54: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


97-97: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


150-150: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🪛 Ruff (0.14.0)
benchmarks/utils/genai.py

83-83: subprocess call: check for execution of untrusted input

(S603)

benchmarks/utils/plot.py

47-47: Do not catch blind exception: Exception

(BLE001)

benchmarks/profiler/utils/aiperf.py

76-99: Consider iterable unpacking instead of concatenation

Replace with iterable unpacking

(RUF005)


112-137: Consider iterable unpacking instead of concatenation

Replace with iterable unpacking

(RUF005)


147-149: Avoid specifying long messages outside the exception class

(TRY003)


171-171: subprocess call: check for execution of untrusted input

(S603)


214-214: subprocess call: check for execution of untrusted input

(S603)


232-232: subprocess call: check for execution of untrusted input

(S603)

benchmarks/llm/plot_pareto.py

36-36: Create your own exception

(TRY002)


36-36: Avoid specifying long messages outside the exception class

(TRY003)


85-87: zip() without an explicit strict= parameter

Add explicit value for parameter strict=

(B905)


266-268: Avoid specifying long messages outside the exception class

(TRY003)

🪛 Shellcheck (0.11.0)
deploy/cloud/pre-deployment/pre-deployment-check.sh

[warning] 105-105: provisioner appears unused. Verify use (or export if used externally).

(SC2034)


[warning] 214-214: Declare and assign separately to avoid masking return values.

(SC2155)

🔇 Additional comments (80)
docs/_includes/install.rst (2)

13-13: Doc version update looks good

Thanks for bumping the PyPI install snippet to 0.6.0; this keeps the doc aligned with the tagged release.


44-44: Docker tag alignment confirmed

The Docker command now points at the 0.6.0 runtime image, matching the standardized version across the project. No further action needed.

container/deps/requirements.txt (2)

5-7: Appreciate the pinned git SHAs.

Thanks for pinning aiconfigurator and aiperf to specific commits; that keeps our images reproducible as we roll out the new tooling.


34-34: Double-check NumPy compatibility alongside this SciPy pin.

We’re explicitly holding SciPy below 1.14 because of pmdarima, but NumPy is now unconstrained. Please confirm whether pmdarima (and the rest of the stack) already caps NumPy <2.0 via its metadata; if not, add an explicit <2 guard so we don’t accidentally pick up NumPy 2.x and hit ABI/build failures.

examples/basics/kubernetes/Distributed_Inference/agg_router.yaml (1)

41-41: Image tag alignment looks good.

Both frontend and decode worker now point to the shared 0.6.0 runtime, matching the common recipe template.

Also applies to: 98-98

components/backends/vllm/deploy/agg.yaml (1)

16-16: Consistent vLLM runtime update.

The deployment now uses the standardized 0.6.0 image for both roles; no further action needed.

Also applies to: 27-27

examples/multimodal/deploy/agg_qwen.yaml (1)

17-17: Multimodal stack now references the shared 0.6.0 build.

Glad to see every worker updated together—prevents heterogeneous runtimes during deploys.

Also applies to: 28-28, 45-45, 62-62

examples/custom_backend/hello_world/deploy/hello_world.yaml (1)

44-44: Hello-world example tracks the canonical 0.6.0 image.

Keeps the tutorial consistent with the shipped containers.

Also applies to: 83-83

components/backends/vllm/deploy/agg_kvbm.yaml (1)

16-16: KV cache deployment adopts the 0.6.0 runtime.

Update aligns this specialized recipe with the rest of the vLLM templates.

Also applies to: 34-34

components/backends/trtllm/deploy/disagg-multinode.yaml (1)

98-98: Uniform TRT-LLM runtime at 0.6.0.

All three roles now share the same vetted image; this prevents mixed-version cluster issues.

Also applies to: 130-130, 170-170

recipes/gpt-oss-120b/trtllm/agg/deploy.yaml (1)

33-33: Recipe now mirrors the common 0.6.0 TRT-LLM image.

Brings the GPT-OSS recipe in sync with the shared deployment guidance.

Also applies to: 72-72

benchmarks/README.md (1)

18-18: Doc wording updated appropriately.

Calling out AIPerf keeps the README aligned with the current benchmarking stack.

components/backends/vllm/deploy/agg_router.yaml (1)

16-16: LGTM - Image version update.

The image tags have been consistently updated to 0.6.0. This aligns with the broader PR-wide upgrade across vLLM deployments.

Also applies to: 30-30

components/backends/trtllm/deploy/agg-with-config.yaml (1)

37-37: LGTM - TensorRT-LLM image version update.

The image tags have been updated to 0.6.0 for the TensorRT-LLM runtime. Ensure that nvcr.io/nvidia/ai-dynamo/tensorrtllm-runtime:0.6.0 is available and compatible with the current configuration.

Also applies to: 53-53

components/backends/trtllm/deploy/agg.yaml (1)

16-16: LGTM - Consistent image version update.

Image tags updated to 0.6.0 for TensorRT-LLM aggregate deployment.

Also applies to: 27-27

components/backends/trtllm/deploy/disagg_planner.yaml (1)

19-19: LGTM - All container images updated consistently.

The image tags have been updated to 0.6.0 across all four containers (Frontend, Planner, TRTLLMDecodeWorker, TRTLLMPrefillWorker) in the disaggregated planner deployment.

Also applies to: 47-47, 88-88, 117-117

components/backends/vllm/deploy/disagg.yaml (2)

16-16: LGTM - vLLM disaggregated deployment image updates.

Image tags consistently updated to 0.6.0 across all three containers.

Also applies to: 28-28, 48-48


1-57: PR title does not match the actual changes.

The PR title states "fix: update model recipe for llama-3 70b to match with common recipe template", but the actual changes are:

  1. Image version bumps from my-tag to 0.6.0 across multiple deployment files
  2. Docker registry migration to nvcr.io/nvidia/ai-dynamo
  3. Tool rename from genai-perf to aiperf

No llama-3 70b model-specific recipe changes are present in these files. Please update the PR title and description to accurately reflect the scope of changes, such as:

  • "chore: upgrade deployment images to 0.6.0 and migrate to NVIDIA registry"
  • "refactor: replace genai-perf with aiperf and update container images"
components/backends/vllm/deploy/disagg_router.yaml (1)

16-16: Verify vllm-runtime:0.6.0 image availability
Test the new tag can be pulled or inspected with valid nvcr.io credentials before deploying:

docker pull nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.6.0

Applies to lines 16, 30, 49.

deploy/cloud/operator/Earthfile (1)

48-48: Verify nvcr.io registry credentials and access (Earthfile:48).

  • Connectivity check returned 401 Unauthorized—registry reachable but requires authentication.
  • Add Docker credentials for nvcr.io (e.g., in ~/.docker/config.json or CI/CD secrets).
  • Ensure deployment and CI/CD pipelines can pull from nvcr.io/nvidia/ai-dynamo.
    Earthfile is deprecated (lines 17–22); plan migration to the new docker-based build system.
benchmarks/llm/perf.sh (1)

205-205: Confirm aiperf installation and compatibility

  • The benchmarking script was updated from genai-perf to aiperf, but aiperf is not found in the environment.
  • Ensure aiperf is installed in all target environments (local, CI, Docker, etc.)
  • Verify that aiperf profile accepts the same flags as genai-perf profile or document any changes
  • Update README or setup guides to reflect the tool rename and installation steps
tests/planner/perf_test_configs/disagg_8b_planner.yaml (1)

44-44: Image updates look consistent.

All services now reference nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.6.0, keeping the deployment aligned with the common recipe template. 👍

Also applies to: 77-77, 141-141, 198-198

components/backends/trtllm/deploy/disagg_router.yaml (1)

16-16: Container images aligned to 0.6.0.

Frontend, prefill, and decode workers all point to the shared trtllm-runtime:0.6.0, matching the standardized recipe.

Also applies to: 30-30, 58-58

container/Dockerfile.vllm (1)

190-191: Adding git to runtime packages makes sense.

Including git in the runtime image unblocks workflows that rely on it (e.g., pulling git-hosted assets) without affecting cleanup.

components/backends/vllm/deploy/disagg-multinode.yaml (1)

16-16: Version bump applied uniformly.

Frontend, decode, and prefill pods now share the vllm-runtime:0.6.0 image, keeping the multinode recipe in sync with the common template.

Also applies to: 37-37, 60-60

components/backends/sglang/deploy/README.md (1)

64-65: Docs now match the new registry/tag.

README snippets reference sglang-runtime:0.6.0, mirroring the deployment YAML updates.

Also applies to: 95-95

components/backends/sglang/deploy/disagg_planner.yaml (1)

19-19: All planner components aligned to 0.6.0.

Frontend, planner, and both worker roles now run the standardized sglang-runtime:0.6.0 image.

Also applies to: 30-30, 52-52, 84-84

components/backends/trtllm/deploy/README.md (1)

92-93: Documentation reflects the new TensorRT-LLM image.

The README now points to trtllm-runtime:0.6.0, matching the deployment YAML updates.

Also applies to: 112-112, 144-144

components/backends/trtllm/deploy/disagg.yaml (1)

16-16: Runtime image updates look good.

Each service now references trtllm-runtime:0.6.0, matching the common recipe template.

Also applies to: 28-28, 56-56

tests/planner/scaling/disagg_planner.yaml (1)

21-21: LGTM! Container image references consistently updated.

All container images have been updated from placeholder values to the official NVIDIA registry with the 0.6.0 tag, aligning with the PR-wide migration to standardized image references.

Also applies to: 44-44, 81-81, 105-105

components/backends/sglang/deploy/disagg.yaml (1)

16-16: LGTM! Container images updated to official NVIDIA registry.

The sglang-runtime images have been consistently updated to version 0.6.0 across all service containers (Frontend, decode, and prefill).

Also applies to: 28-28, 61-61

recipes/llama-3-70b/vllm/agg/deploy.yaml (1)

41-41: LGTM! GPU resources and tensor parallelism correctly aligned.

The reduction of both tensor-parallel-size (from 8 to 4) and GPU resources (from 8 to 4) is properly aligned, ensuring each tensor parallel shard gets exactly one GPU. This appears to be an intentional configuration change for the updated recipe template.

Also applies to: 50-52

recipes/llama-3-70b/vllm/disagg-multi-node/deploy.yaml (2)

21-21: LGTM! Container images consistently updated.

All container images updated to nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.6.0 across Frontend, VllmPrefillWorker, and VllmDecodeWorker.

Also applies to: 45-45, 74-74


35-41: Verify shell variable expansion for both workers.

Both VllmPrefillWorker and VllmDecodeWorker use environment variable substitution in their args. The pattern is consistent with the agg deployment. Verify that the shell expansion works correctly in your Kubernetes cluster.

Also applies to: 64-70

Earthfile (1)

137-137: LGTM! Docker registry defaults updated consistently.

The DOCKER_SERVER defaults have been updated from my-registry to nvcr.io/nvidia/ai-dynamo across all three targets (dynamo-base-docker, all-docker, custom), aligning with the PR-wide registry migration.

Note: Since this Earthfile is marked as deprecated (lines 17-22), ensure the replacement system also uses the correct registry defaults.

Also applies to: 178-178, 192-192

benchmarks/utils/plot.py (1)

35-50: LGTM! Renaming from GenAI-Perf to AIPerf completed consistently.

The renaming has been applied consistently throughout the function:

  • Variable names: genai_perf_jsonaiperf_json
  • File pattern: profile_export_genai_perf.jsonprofile_export_aiperf.json
  • Messages: "genai"/"Genai-perf" → "aiperf"

The logic and control flow remain unchanged.

Note: The static analysis tool flags line 47 for catching a broad Exception. While more specific exception handling would be ideal, this is acceptable for a benchmarking utility script where you want to continue processing other concurrency levels even if one fails.

benchmarks/utils/genai.py (2)

36-36: LGTM! Public function renamed consistently.

The function run_genai_perf has been renamed to run_aiperf while maintaining the same signature and semantics. The caller in run_concurrency_sweep (line 116) has been updated accordingly.

Also applies to: 116-116


47-47: LGTM! Command and internal references updated consistently.

The changes correctly update:

  • Command from "genai-perf" to "aiperf"
  • Variable from gap_process to aip_process
  • User-facing messages from "Genai-perf" to "Aiperf"

Note: The static analysis warning about subprocess execution (line 83) is a false positive. The command is constructed from structured benchmark parameters (service_url, model_name, concurrency levels, etc.) that come from the function arguments, not from arbitrary untrusted input.

Also applies to: 79-79, 83-101

components/backends/sglang/launch/disagg_dp_attn.sh (1)

30-30: Verify new round-robin flags in sglang CLI

  • Prefill worker: replaced --expert-distribution-recorder-mode stat (MoE diagnostics) with --load-balance-method round_robin (valid DP routing flag).
  • Decode worker: confirm --prefill-round-robin-balance is supported in sglang 0.6.0—otherwise switch to --load-balance-method round_robin or the correct router policy.
  • Ensure removal of SGLANG_EXPERT_DISTRIBUTION_RECORDER_DIR (MoE stats) is intentional.
benchmarks/profiler/utils/profile_cache.py (2)

29-29: LGTM! Consistent renaming from GenAI-Perf to AIPerf.

The file pattern updates from profile_export_genai_perf.json to profile_export_aiperf.json and from gap_request* to aiperf_request* are consistent throughout the file. This aligns with the broader repository-wide migration to AIPerf tooling.

Also applies to: 68-69, 96-96, 118-118, 131-131


45-49: Good addition of JSON validation for prefill results.

The validation logic appropriately checks for time_to_first_token metrics within the JSON structure, which strengthens the data validation for AIPerf results.

docs/benchmarks/benchmarking.md (1)

64-64: LGTM! Documentation consistently updated for AIPerf.

The documentation correctly reflects the migration from GenAI-Perf to AIPerf across all sections, including:

  • Tool descriptions and framing
  • Command-line parameter documentation
  • Artifact file naming (profile_export_aiperf.json/csv)
  • Docker image references updated to 0.6.0

This aligns with the broader repository changes to AIPerf tooling.

Also applies to: 73-73, 168-169, 182-183, 189-189, 304-307, 413-413, 519-519

docs/kubernetes/README.md (1)

22-25: LGTM! Helpful addition of pre-deployment checks.

The new "Pre-deployment Checks" section appropriately guides users to run pre-deployment validation before installing the platform. This is well-placed before the installation steps and provides a clear reference to the detailed documentation.

components/backends/vllm/deploy/README.md (1)

72-72: LGTM! Documentation updated to match deployment files.

The image references have been correctly updated to nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.6.0, which aligns with the broader repository-wide image version standardization visible across multiple deployment manifests.

Also applies to: 119-119

components/backends/vllm/deploy/disagg_planner.yaml (1)

19-19: LGTM! Consistent image version updates.

All four services (Frontend, Planner, VllmDecodeWorker, VllmPrefillWorker) have been updated to use nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.6.0. This is consistent with the broader repository-wide standardization to version 0.6.0 and the NVIDIA AI-Dynamo image registry.

Also applies to: 29-29, 51-51, 71-71

recipes/llama-3-70b/vllm/disagg-single-node/perf.yaml (2)

52-52: Hard-coded tokenizer path is brittle and not portable.

The tokenizer path /root/.cache/huggingface/hub/models--RedHatAI--Llama-3.3-70B-Instruct-FP8-dynamic/snapshots/ddb4128556dfcff99e0c41aee159ea6c3e655dcd is hard-coded with a specific snapshot hash. This creates the following issues:

  • The path will break if the model is updated or the cache structure changes
  • The snapshot hash is not parameterized and may not match across different environments
  • The path assumes a specific cache location structure

Consider using the model identifier directly (e.g., --tokenizer $TARGET_MODEL) and let the tool resolve the path, or parameterize the tokenizer path as an environment variable if an absolute path is required.

⛔ Skipped due to learnings
Learnt from: biswapanda
PR: ai-dynamo/dynamo#3143
File: recipes/gpt-oss-120b/trtllm/agg/bench-pub.yaml:0-0
Timestamp: 2025-09-24T23:44:28.972Z
Learning: In the gpt-oss-120b benchmarking jobs, hardcoded snapshot hashes in tokenizer paths are intentional for version pinning and reproducible results, not an issue to be fixed.

8-8: Verify reduced backoffLimit is intentional
All perf.yaml files under recipes/* use backoffLimit: 1; if retries for transient failures are needed, consider reverting to backoffLimit: 3.

components/backends/vllm/deploy/disagg_kvbm_2p2d.yaml (1)

16-16: LGTM! Consistent image update across all containers.

All three container images (Frontend, VllmDecodeWorker, VllmPrefillWorker) have been updated to use nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.6.0, maintaining consistency with the broader registry migration in this PR.

Also applies to: 27-27, 63-63

tests/planner/perf_test_configs/agg_8b.yaml (1)

41-41: LGTM! Image updates are consistent.

Both Frontend and VllmDecodeWorker containers have been updated to use the NVIDIA registry image at version 0.6.0, consistent with the migration pattern across this PR.

Also applies to: 91-91

components/backends/sglang/deploy/agg_logging.yaml (1)

19-19: LGTM! SGLang runtime images updated consistently.

Both Frontend and decode worker containers have been updated to use nvcr.io/nvidia/ai-dynamo/sglang-runtime:0.6.0, maintaining version consistency with the vLLM deployments in this PR.

Also applies to: 30-30

components/backends/vllm/deploy/disagg_kvbm.yaml (1)

16-16: LGTM! All container images updated consistently.

The three containers in this KVBM disaggregated deployment (Frontend, VllmDecodeWorker, VllmPrefillWorker) have been updated to version 0.6.0, aligning with the registry migration across this PR.

Also applies to: 27-27, 59-59

components/backends/sglang/deploy/agg.yaml (1)

16-16: LGTM! Image references updated correctly.

Both Frontend and decode worker containers have been updated to nvcr.io/nvidia/ai-dynamo/sglang-runtime:0.6.0, maintaining consistency with the broader image migration in this PR.

Also applies to: 27-27

deploy/cloud/pre-deployment/nixl/nixlbench-deployment.yaml (2)

21-26: LGTM: Parameterized ETCD endpoint configuration.

The addition of the ETCD_ENDPOINTS environment variable and its use in the command improves configurability and follows best practices for avoiding hardcoded values.


27-35: LGTM: Resource requests and limits specified.

Adding explicit CPU, memory, and GPU resource requests and limits is a best practice for Kubernetes deployments, ensuring predictable scheduling and preventing resource contention.

components/backends/vllm/deploy/disagg_kvbm_tp2.yaml (1)

16-16: LGTM: Container image version updated to 0.6.0.

The image references have been consistently updated to use nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.6.0 across all three services (Frontend, VllmDecodeWorker, VllmPrefillWorker), aligning with the repository-wide standardization to NVIDIA's official registry and versioning.

Also applies to: 29-29, 67-67

benchmarks/profiler/utils/profile_prefill.py (2)

9-9: LGTM: Import updated to use AIPerf module.

The import path has been updated from benchmarks.profiler.utils.genai_perf to benchmarks.profiler.utils.aiperf, aligning with the repository-wide renaming from GenAI-Perf to AIPerf.


83-94: LGTM: Artifact directory and result access updated for AIPerf.

The changes consistently update:

  • Artifact directory naming from genai_perf_artifact_dir to ai_perf_artifact_dir
  • Function calls from benchmark_prefill (genai_perf) to the aiperf equivalent
  • Result access pattern to match AIPerf's structure: aiperf_result["records"]["ttft"]["avg"]

The logic remains functionally equivalent, properly handling None results.

deploy/cloud/pre-deployment/pre-deployment-check.sh (4)

42-59: LGTM: kubectl connectivity check is well-structured.

The function properly checks both command availability and cluster connectivity, with clear error messages guiding users to remediation steps.


122-141: LGTM: GPU resource check is straightforward and effective.

The function correctly identifies GPU-enabled nodes using the nvidia.com/gpu.present=true label with appropriate error messaging.


143-177: LGTM: GPU operator check handles partial running state.

The function properly distinguishes between no operator, non-running pods, partially running, and fully running states, providing appropriate warnings and status messages for each scenario.


239-283: LGTM: Main function orchestrates checks effectively.

The main function properly executes all checks, records results, displays a summary, and exits with the appropriate code. The try-finally pattern with set -e ensures proper error handling.

benchmarks/profiler/profile_sla.py (3)

25-25: LGTM: Import updated to AIPerf modules.

The import has been updated to use benchmarks.profiler.utils.aiperf for both benchmark_decode and benchmark_prefill, consistent with the repository-wide migration from GenAI-Perf to AIPerf.


248-259: LGTM: Prefill benchmarking updated to use AIPerf.

The changes consistently update:

  • Artifact directory naming from gap_isl{args.isl} to aiperf_isl{args.isl}
  • Function call target to benchmark_prefill from the aiperf module
  • Result access to aiperf_result["records"]["ttft"]["avg"]

The logic flow remains unchanged, properly handling None results.


427-444: LGTM: Decode benchmarking updated to use AIPerf.

The changes consistently update:

  • Artifact directory naming pattern to use aiperf_ prefix
  • Function call to benchmark_decode from the aiperf module
  • Result access patterns to use aiperf_result["records"]["inter_token_latency"]["avg"] and aiperf_result["records"]["output_token_throughput"]["avg"]

The throughput calculation correctly divides by num_gpus to get per-GPU metrics, and None handling is preserved.

deploy/cloud/pre-deployment/nixl/README.md (1)

1-297: LGTM: Comprehensive and well-structured documentation.

The README provides excellent coverage of:

  • Prerequisites with clear verification steps
  • Interactive script usage with multiple examples
  • Deployment configuration details
  • Monitoring commands
  • Troubleshooting guidance organized by issue type

The documentation follows best practices by referencing the pre-deployment check script and providing step-by-step workflows for different use cases.

Note: The static analysis grammar warnings are false positives related to markdown list formatting detection and can be safely ignored. The term "5Gi" is valid Kubernetes resource notation.

benchmarks/profiler/utils/profile_decode.py (2)

9-9: LGTM: Import updated to AIPerf module.

The import path has been updated to use benchmarks.profiler.utils.aiperf, consistent with the repository-wide migration from GenAI-Perf to AIPerf.


115-132: LGTM: Decode benchmarking updated to use AIPerf.

The changes consistently update:

  • Artifact directory naming from gap_isl{isl}_osl{osl}_n{num_request} to aiperf_isl{isl}_osl{osl}_n{num_request}
  • Function call to benchmark_decode from the aiperf module
  • Result access to use aiperf_result["records"]["inter_token_latency"]["avg"] and aiperf_result["records"]["output_token_throughput"]["avg"]

The per-GPU throughput calculation and None handling logic remain correct.

tests/planner/perf_test_configs/disagg_8b_2p2d.yaml (1)

41-41: LGTM: Container image version updated to 0.6.0.

The image references have been consistently updated to use nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.6.0 across all three services (Frontend, VllmDecodeWorker, VllmPrefillWorker), aligning with the repository-wide standardization to NVIDIA's official registry and versioning.

Also applies to: 91-91, 141-141

components/backends/sglang/deploy/agg_router.yaml (1)

16-16: LGTM! Image references updated consistently.

The container image updates to nvcr.io/nvidia/ai-dynamo/sglang-runtime:0.6.0 are consistent across both the Frontend and decode worker components.

Also applies to: 30-30

benchmarks/llm/plot_pareto.py (2)

28-43: LGTM! Function correctly updated for AIPerf.

The renaming from genai_perf_profile_export_json_paths to aiperf_profile_export_json_paths is consistent throughout, and the file search pattern correctly targets profile_export_aiperf.json.


242-247: LGTM! Help text updated correctly.

The argument parser description and help text have been properly updated from "GenAI-Perf" to "AIPerf", maintaining consistency with the renamed functionality.

deploy/cloud/pre-deployment/nixl/build_and_deploy.sh (5)

1-11: LGTM! Script header and safety settings are correct.

The script includes proper licensing, uses set -euo pipefail for robust error handling, and defines the NIXL version and script directory appropriately.


12-133: Excellent dependency checking implementation.

The check_dependencies and command_exists functions provide thorough validation of required tools (wget, unzip, kubectl, docker) with clear error messages and installation suggestions. The warnings for Docker daemon and buildx availability are helpful for troubleshooting.


213-263: LGTM! Step selection logic is well-structured.

The prompt_for_steps function provides a clear interactive interface for users to select which steps to execute, with proper validation and error handling.


265-279: LGTM! Deployment function is clean and informative.

The deploy_to_k8s function correctly applies the deployment and provides helpful next-step commands for checking pod status and viewing logs.


281-413: LGTM! Main function orchestrates workflow effectively.

The main function provides a clear workflow with multiple confirmation points, proper error handling, and informative summaries. The step-by-step execution with conditional logic based on user selections is well-implemented.

benchmarks/profiler/utils/aiperf.py (4)

33-64: LGTM! Common command builder updated correctly.

The function has been properly renamed from _get_common_genai_perf_cmd to _get_common_aiperf_cmd, and the executable name is correctly changed from "genai-perf" to "aiperf".


140-151: LGTM! Result retrieval function updated correctly.

The function has been renamed from get_gap_result to get_aiperf_result, and the file search correctly targets profile_export_aiperf.json instead of profile_export_genai_perf.json.


154-186: LGTM! Prefill benchmark function updated correctly.

The function signature has been properly updated to use aiperf_artifact_dir instead of genai_perf_artifact_dir, and all internal references have been consistently renamed. The logging messages correctly reflect the AIPerf terminology.


189-247: LGTM! Decode benchmark function updated correctly.

The function signature and implementation have been thoroughly updated from GenAI-Perf to AIPerf naming. The warmup and profiling flows correctly use the renamed command builders and result retrieval functions.

Comment thread benchmarks/profiler/profile_endpoint.py
Comment thread container/deps/requirements.txt
Comment thread deploy/cloud/pre-deployment/nixl/build_and_deploy.sh
Comment thread deploy/cloud/pre-deployment/nixl/build_and_deploy.sh
Comment thread deploy/cloud/pre-deployment/pre-deployment-check.sh
Comment thread recipes/llama-3-70b/vllm/disagg-single-node/perf.yaml
Comment thread tests/planner/perf_test_configs/disagg_8b_3p1d.yaml
@biswapanda biswapanda changed the base branch from main to release/0.6.0 October 15, 2025 20:18
@biswapanda biswapanda force-pushed the bis/cp-fix-model-recipe branch from 5713c67 to 665ce7f Compare October 15, 2025 20:19
@saturley-hall saturley-hall merged commit 1ef8cc1 into release/0.6.0 Oct 16, 2025
24 of 27 checks passed
@saturley-hall saturley-hall deleted the bis/cp-fix-model-recipe branch October 16, 2025 21:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants