feat: cherry pick PR#3306 benchmarks use aiperf by saturley-hall · Pull Request #3626 · ai-dynamo/dynamo

saturley-hall · 2025-10-14T22:01:11Z

Overview:

Cherry-pick PR #3306. First of several.

Summary by CodeRabbit

New Features
- None
Bug Fixes
- Corrected command typo from python3E to python3 in the SGLang disaggregated deployment config.
Refactor
- Migrated benchmarking workflows from GenAI-Perf to AIPerf, including tooling, outputs, and plots.
Documentation
- Updated guides and READMEs to reference AIPerf and pinned container images to version 0.6.0.
Tests
- Refreshed test/deployment configs to use 0.6.0 images.
Chores
- Standardized container images to nvcr.io with 0.6.0 tags across deployments and examples.
- Added git to runtime images and updated dependencies (added AIPerf; relaxed some pins).
- Updated default container registry settings.

Signed-off-by: Biswa Panda <biswa.panda@gmail.com>
Signed-off-by: lkomali <lkomali@nvidia.com>
Co-authored-by: lkomali <lkomali@nvidia.com>
Co-authored-by: Harshini Komali <157742537+lkomali@users.noreply.github.com>
Signed-off-by: Harrison King Saturley-Hall <hsaturleyhal@nvidia.com>

coderabbitai · 2025-10-14T22:01:24Z

Walkthrough

Updates default container registries/tags to nvcr.io/nvidia/ai-dynamo:0.6.0 across deployments, docs, tests, and examples. Migrates benchmarking from GenAI-Perf to AIPerf in code, scripts, and artifact handling. Adds git to Docker images and updates Python requirements. Minor import path adjustments, file pattern updates, and small Rust test cleanup.

Changes

Cohort / File(s)	Summary
Registry defaults `Earthfile`, `deploy/cloud/operator/Earthfile`	Default DOCKER_SERVER switched to nvcr.io/nvidia/ai-dynamo in Earthly targets.
VLLM deployment images `components/backends/vllm/deploy/`, `recipes/llama-3-70b/vllm/.../deploy.yaml`, `examples/basics/kubernetes/Distributed_Inference/agg_router.yaml`, `tests/planner/perf_test_configs/vllm*.yaml`	Image tags updated to nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.6.0 across Frontend/Decode/Prefill; no other spec changes.
TRTLLM deployment images `components/backends/trtllm/deploy/*`, `recipes/gpt-oss-120b/trtllm/agg/deploy.yaml`	Image tags updated to nvcr.io/nvidia/ai-dynamo/trtllm-runtime:0.6.0 in Frontend/Worker/Planner specs.
SGLang deployment images `components/backends/sglang/deploy/*`	Image tags set to nvcr.io/nvidia/ai-dynamo/sglang-runtime:0.6.0; disagg.yaml also fixes command from python3E to python3.
Examples and ECS images `examples/multimodal/deploy/`, `examples/custom_backend/hello_world/deploy/hello_world.yaml`, `examples/deployments/ECS/task_definition_*.json`	Images switched to nvcr.io/nvidia/ai-dynamo/*:0.6.0 across multimodal, custom backend, and ECS task definitions.
Docs image/version refs `docs/_includes/install.rst`, `docs/backends/*/*.md`, `docs/kubernetes/*.md`, `components/backends/*/deploy/README.md`, `README.md`	Documentation updated to reference 0.6.0 images and the new secret name, and to rename GenAI-Perf to AIPerf where applicable.
Benchmarks: AIPerf migration (code) `benchmarks/llm/plot_pareto.py`, `benchmarks/utils/genai.py`, `benchmarks/utils/plot.py`, `benchmarks/profiler/profile_*.py`, `benchmarks/profiler/utils/(aiperf	profile_*.py
Benchmarks: AIPerf migration (scripts/configs) `benchmarks/llm/perf.sh`, `benchmarks/README.md`, `benchmarks/incluster/benchmark_job.yaml`, `docs/benchmarks/benchmarking.md`, `recipes/llama-3-70b/vllm/*/perf.yaml`	CLI/tool name switched to aiperf; artifact lookups updated to AIPerf JSON/CSV; example images pinned to 0.6.0.
Container builds `container/Dockerfile.sglang`, `container/Dockerfile.sglang-wideep`, `container/Dockerfile.trtllm`, `container/Dockerfile.vllm`, `container/deps/requirements.txt`	Added git to runtime/build stages; switch Python dep genai-perf→aiperf; add aiconfigurator; relax pydantic pin, remove numpy pin, add scipy<1.14.0.
SGLang launch tweak `components/backends/sglang/launch/disagg_dp_attn.sh`	Remove expert distribution recorder features; replace with load-balancing flags (round robin).
Rust parsers cleanup `lib/parsers/Cargo.toml`, `lib/parsers/tests/mod.rs`	Remove dev-dependency and associated test module/includes.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor User
  participant Runner as Benchmark Runner
  participant AIP as AIPerf CLI
  participant Svc as Model Service
  participant FS as Artifacts (JSON/CSV)
  participant Prof as Profiler/Parsers
  participant Plot as Plotting

  User->>Runner: Start benchmark
  Runner->>AIP: aiperf profile ... (isl/osl/concurrency)
  AIP->>Svc: Send requests (prefill/decode)
  Svc-->>AIP: Responses/latencies
  AIP-->>FS: Write profile_export_aiperf.json/.csv

  Note over Prof,FS: Updated readers expect "records" schema
  Runner->>Prof: Parse artifacts
  Prof->>FS: Read profile_export_aiperf.json
  Prof-->>Runner: Metrics (ttft, itl, throughput)
  Runner->>Plot: Build Pareto plots
  Plot-->>User: Rendered charts

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

I thump my paw at versioned skies,
Tags hop to 0.6.0—no disguise.
AIPerf drums a quicker beat,
Artifacts crisp, results neat.
With git in paws and YAML rows,
I nibble bytes as workflow flows—
Carrots, charts, and green light glows. 🥕✨

Pre-merge checks

❌ Failed checks (2 warnings)

Check name	Status	Explanation	Resolution
Description Check	⚠️ Warning	The description only provides the Overview section and omits the required Details, Where should the reviewer start, and Related Issues sections from the repository’s template, leaving out crucial information about the changes and review guidance.	Please fill out the template by adding a Details section describing the specific changes, a Where should the reviewer start section highlighting key files, and a Related Issues section linking or closing relevant issue numbers.
Docstring Coverage	⚠️ Warning	Docstring coverage is 25.00% which is insufficient. The required threshold is 80.00%.	You can run `@coderabbitai generate docstrings` to improve docstring coverage.

✅ Passed checks (1 passed)

Check name	Status	Explanation
Title Check	✅ Passed	The title clearly identifies the main change—benchmarks now use AIPerf—and is specific enough for a teammate to grasp the feature update, even though it includes an internal cherry-pick reference.


coderabbitai

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

docs/benchmarks/benchmarking.md (1)

286-287: Update stale artifact name references.

Lines 286-287 (and line 460) still reference the old artifact name profile_export_genai_perf.json. These should be updated to profile_export_aiperf.json to match the changes at lines 304-306 and maintain consistency throughout the documentation.

Apply this diff to fix the inconsistency:

 results/                         # Client-side: ./benchmarks/results/ or custom dir
 ├── plots/                       # Server-side: /data/results/
 │   ├── SUMMARY.txt              # Performance visualization plots
 │   ├── p50_inter_token_latency_vs_concurrency.png
 │   ├── avg_inter_token_latency_vs_concurrency.png
 │   ├── request_throughput_vs_concurrency.png
 │   ├── efficiency_tok_s_gpu_vs_user.png
 │   └── avg_time_to_first_token_vs_concurrency.png
 ├── <your-benchmark-name>/       # Results for your benchmark (uses your custom name)
 │   ├── c1/                      # Concurrency level 1
-│   │   └── profile_export_genai_perf.json
+│   │   └── profile_export_aiperf.json
 │   ├── c2/                      # Concurrency level 2

A similar fix is needed around line 460:

 /data/results/
 └── <benchmark-name>/                # Results for your benchmark name
     ├── c1/                          # Concurrency level 1
-    │   └── profile_export_genai_perf.json
+    │   └── profile_export_aiperf.json
     ├── c2/                          # Concurrency level 2
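For anyone scripting against this layout, here is a minimal sketch that collects the per-concurrency exports. It assumes each concurrency level lives in a `c<N>/` subdirectory holding `profile_export_aiperf.json`, exactly as in the trees above; the function name `collect_aiperf_exports` is hypothetical and not part of the benchmarks package.

```python
from __future__ import annotations

import re
from pathlib import Path

# Matches concurrency directory names such as "c1", "c50", "c250".
_CONCURRENCY_DIR = re.compile(r"^c(\d+)$")


def collect_aiperf_exports(benchmark_dir: str | Path) -> dict[int, Path]:
    """Map each concurrency level to its profile_export_aiperf.json.

    Hypothetical helper: assumes the <benchmark-name>/c<N>/ layout shown above.
    Concurrency directories without an AIPerf export are skipped.
    """
    exports: dict[int, Path] = {}
    for child in Path(benchmark_dir).iterdir():
        match = _CONCURRENCY_DIR.match(child.name)
        if match is None or not child.is_dir():
            continue
        export = child / "profile_export_aiperf.json"
        if export.is_file():
            exports[int(match.group(1))] = export
    # Sort numerically so c2 comes before c10.
    return dict(sorted(exports.items()))
```

Exports still written under the old profile_export_genai_perf.json name are not picked up, which is why stale names in the docs matter.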

benchmarks/profiler/utils/profile_cache.py (1)

45-49: Update cached-result readers to the new AIPerf schema

These helpers still look for the GenAI-Perf layout (time_to_first_token, inter_token_latency, and output_token_throughput at the top level). AIPerf nests metrics under records (with TTFT keyed as ttft rather than time_to_first_token), so every skip/load check silently fails, forcing redundant reruns and ignoring existing data. Please switch all lookups to the new structure.

Suggested fix:

-            if "time_to_first_token" in data and "avg" in data["time_to_first_token"]:
+            if (
+                "records" in data
+                and "ttft" in data["records"]
+                and "avg" in data["records"]["ttft"]
+            ):
@@
-                ttft = data["time_to_first_token"]["avg"]
+                ttft = data["records"]["ttft"]["avg"]
@@
-            if "inter_token_latency" in data and "avg" in data["inter_token_latency"]:
+            if (
+                "records" in data
+                and "inter_token_latency" in data["records"]
+                and "avg" in data["records"]["inter_token_latency"]
+            ):
@@
-                itl = data["inter_token_latency"]["avg"]
-                thpt_per_gpu = data["output_token_throughput"]["avg"] / tp_size
+                itl = data["records"]["inter_token_latency"]["avg"]
+                thpt_per_gpu = data["records"]["output_token_throughput"]["avg"] / tp_size

Also applies to: 80-84, 103-105, 128-134
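The repeated `"records" in data and ...` guards could be folded into one helper. A minimal sketch, assuming the records, then metric, then avg nesting described in this comment (the key names ttft, inter_token_latency, and output_token_throughput come from the suggested fix; the helper name `get_record_avg` is hypothetical):

```python
from __future__ import annotations

from typing import Any


def get_record_avg(data: dict[str, Any], metric: str) -> float | None:
    """Return data["records"][metric]["avg"], or None if any level is missing.

    Hypothetical helper for the AIPerf export layout assumed in this review:
    {"records": {"ttft": {"avg": ...}, "inter_token_latency": {...}, ...}}
    """
    records = data.get("records")
    if not isinstance(records, dict):
        return None
    stats = records.get(metric)
    if not isinstance(stats, dict) or "avg" not in stats:
        return None
    return stats["avg"]
```

A caller can then treat `None` as "no usable cached result" and rerun only in that case; a file in the old top-level GenAI-Perf layout also yields `None`, so it is rerun rather than misread.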

🧹 Nitpick comments (6)

container/deps/requirements.txt (2)
30-30: Reintroduce an upper bound for Pydantic.

Switching to pydantic>=2.10.6 leaves us exposed to the eventual 3.x release, which will ship breaking changes. Please keep a <3 cap (or equivalent) so future installs remain stable.

Apply this diff:
-pydantic>=2.10.6
+pydantic>=2.10.6,<3
30-34: Keep NumPy constrained away from 2.x.

We dropped the explicit NumPy pin, but nothing now prevents a future resolver (e.g., once SciPy loosens its cap) from pulling in NumPy 2.x, which pmdarima still does not support. Please add back an explicit <2 guard (with an appropriate minimum) so we don’t regress when upstream relaxes its requirements.

Suggested addition:
+pydantic>=2.10.6,<3
+numpy>=1.24,<2
 scipy<1.14.0  # Pin scipy version for pmdarima compatibility
recipes/llama-3-70b/vllm/disagg-multi-node/perf.yaml (1)
37-65: Consider renaming the artifacts directory for clarity.

We're now generating AIPerf outputs but still writing them under /tmp/genai, which is confusing for anyone inspecting artifacts by hand. Renaming the directory keeps the intent obvious and avoids future mixups.
-          export ARTIFACT_DIR="/tmp/genai"
+          export ARTIFACT_DIR="/tmp/aiperf"
@@
-          PERF_JSON=$(find $ARTIFACT_DIR -name profile_export_aiperf.json)
+          PERF_JSON=$(find $ARTIFACT_DIR -name profile_export_aiperf.json)
@@
-          PERF_CSV=$(find $ARTIFACT_DIR -name profile_export_aiperf.csv)
+          PERF_CSV=$(find $ARTIFACT_DIR -name profile_export_aiperf.csv)
benchmarks/utils/genai.py (1)

1-1: Consider renaming the module for consistency.

The module filename is still genai.py, but the primary function is now run_aiperf. For clarity and consistency, consider renaming this module to aiperf.py in a follow-up change.

If this naming is intentional for backward compatibility or will be addressed in a subsequent PR, you can safely ignore this suggestion.
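If the rename to aiperf.py does happen in a follow-up, one common way to avoid breaking existing callers is to keep the old function name as a thin deprecated alias. A sketch under that assumption; the `run_aiperf` body here is only a placeholder, since its real signature is not shown in this review, and `run_genai_perf` as an alias is hypothetical:

```python
import warnings
from typing import Any


def run_aiperf(*args: Any, **kwargs: Any) -> Any:
    """Placeholder for the real run_aiperf (actual signature not shown here)."""
    return ("aiperf", args, kwargs)


def run_genai_perf(*args: Any, **kwargs: Any) -> Any:
    """Deprecated alias so older callers keep working after the rename."""
    warnings.warn(
        "run_genai_perf is deprecated; use run_aiperf instead",
        DeprecationWarning,
        stacklevel=2,  # point the warning at the caller, not this shim
    )
    return run_aiperf(*args, **kwargs)
```

After a module rename, the old module could likewise re-export `run_aiperf` from the new one for a release or two.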

benchmarks/utils/plot.py (1)

35-50: LGTM! Successfully migrated from GenAI-Perf to AIPerf naming.

The file pattern and variable renaming from genai_perf to aiperf has been applied consistently throughout the parsing logic:

- File pattern: profile_export_genai_perf.json → profile_export_aiperf.json
- Variable: genai_perf_json → aiperf_json
- Error messages updated to reference aiperf

The logic remains unchanged, ensuring a clean migration to AIPerf.

Note: The broad exception catch on line 47 was flagged by static analysis. While not introduced by this PR, consider refining the exception handling in a future update to catch specific exceptions (e.g., json.JSONDecodeError, OSError) for better error diagnostics.
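A sketch of the narrower handling suggested above, assuming the parser only needs to skip unreadable or malformed files and log why. `load_aiperf_json` and the logger name are illustrative, not the existing code in plot.py:

```python
from __future__ import annotations

import json
import logging
from pathlib import Path
from typing import Any

logger = logging.getLogger("benchmarks.plot")  # hypothetical logger name


def load_aiperf_json(path: Path) -> dict[str, Any] | None:
    """Load one profile_export_aiperf.json, returning None on expected failures."""
    try:
        with path.open(encoding="utf-8") as f:
            data = json.load(f)
    except OSError as exc:
        # Missing file, permission error, and similar I/O problems.
        logger.warning("Could not read aiperf export %s: %s", path, exc)
        return None
    except json.JSONDecodeError as exc:
        # File exists but is truncated or not valid JSON.
        logger.warning("Malformed aiperf export %s: %s", path, exc)
        return None
    if not isinstance(data, dict):
        logger.warning("Unexpected top-level JSON type in %s", path)
        return None
    return data
```

Anything outside these cases (for example a bug in the parsing code itself) now propagates instead of being swallowed, which is what BLE001 is pushing toward.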
benchmarks/llm/plot_pareto.py (1)
85-129: Tighten the pairing of JSON/config lists

Using zip() without strict=True hides length mismatches; the loop silently drops unmatched entries, yielding incomplete results. With strict=True, we fail fast if the artifact/config discovery falls out of sync, which is safer for downstream plots.
-    for aiperf_profile_export_json_path, deployment_config_json_path in zip(
-        aiperf_profile_export_json_paths, deployment_config_json_paths
-    ):
+    for aiperf_profile_export_json_path, deployment_config_json_path in zip(
+        aiperf_profile_export_json_paths,
+        deployment_config_json_paths,
+        strict=True,
+    ):

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e5f1335 and bb0fb08.

⛔ Files ignored due to path filters (1)

Cargo.lock is excluded by !**/*.lock

📒 Files selected for processing (76)

Earthfile (3 hunks)
README.md (1 hunks)
benchmarks/README.md (1 hunks)
benchmarks/incluster/benchmark_job.yaml (1 hunks)
benchmarks/llm/perf.sh (2 hunks)
benchmarks/llm/plot_pareto.py (6 hunks)
benchmarks/profiler/profile_endpoint.py (1 hunks)
benchmarks/profiler/profile_sla.py (3 hunks)
benchmarks/profiler/utils/aiperf.py (8 hunks)
benchmarks/profiler/utils/config.py (1 hunks)
benchmarks/profiler/utils/profile_cache.py (5 hunks)
benchmarks/profiler/utils/profile_decode.py (2 hunks)
benchmarks/profiler/utils/profile_prefill.py (2 hunks)
benchmarks/utils/genai.py (4 hunks)
benchmarks/utils/plot.py (1 hunks)
components/backends/sglang/deploy/README.md (2 hunks)
components/backends/sglang/deploy/agg.yaml (2 hunks)
components/backends/sglang/deploy/agg_logging.yaml (2 hunks)
components/backends/sglang/deploy/agg_router.yaml (2 hunks)
components/backends/sglang/deploy/disagg-multinode.yaml (3 hunks)
components/backends/sglang/deploy/disagg.yaml (3 hunks)
components/backends/sglang/deploy/disagg_planner.yaml (4 hunks)
components/backends/sglang/launch/disagg_dp_attn.sh (2 hunks)
components/backends/trtllm/deploy/README.md (3 hunks)
components/backends/trtllm/deploy/agg-with-config.yaml (2 hunks)
components/backends/trtllm/deploy/agg.yaml (2 hunks)
components/backends/trtllm/deploy/agg_router.yaml (2 hunks)
components/backends/trtllm/deploy/disagg-multinode.yaml (3 hunks)
components/backends/trtllm/deploy/disagg.yaml (3 hunks)
components/backends/trtllm/deploy/disagg_planner.yaml (4 hunks)
components/backends/trtllm/deploy/disagg_router.yaml (3 hunks)
components/backends/vllm/deploy/README.md (2 hunks)
components/backends/vllm/deploy/agg.yaml (2 hunks)
components/backends/vllm/deploy/agg_kvbm.yaml (2 hunks)
components/backends/vllm/deploy/agg_router.yaml (2 hunks)
components/backends/vllm/deploy/disagg-multinode.yaml (3 hunks)
components/backends/vllm/deploy/disagg.yaml (3 hunks)
components/backends/vllm/deploy/disagg_kvbm.yaml (3 hunks)
components/backends/vllm/deploy/disagg_kvbm_2p2d.yaml (3 hunks)
components/backends/vllm/deploy/disagg_kvbm_tp2.yaml (3 hunks)
components/backends/vllm/deploy/disagg_planner.yaml (4 hunks)
components/backends/vllm/deploy/disagg_router.yaml (3 hunks)
container/Dockerfile.sglang (2 hunks)
container/Dockerfile.sglang-wideep (2 hunks)
container/Dockerfile.trtllm (1 hunks)
container/Dockerfile.vllm (1 hunks)
container/deps/requirements.txt (2 hunks)
deploy/cloud/operator/Earthfile (1 hunks)
docs/_includes/install.rst (2 hunks)
docs/backends/sglang/README.md (1 hunks)
docs/backends/trtllm/gpt-oss.md (1 hunks)
docs/benchmarks/benchmarking.md (7 hunks)
docs/kubernetes/create_deployment.md (1 hunks)
docs/kubernetes/sla_planner_quickstart.md (1 hunks)
examples/basics/kubernetes/Distributed_Inference/agg_router.yaml (2 hunks)
examples/custom_backend/hello_world/deploy/hello_world.yaml (2 hunks)
examples/deployments/ECS/task_definition_frontend.json (1 hunks)
examples/deployments/ECS/task_definition_prefillworker.json (1 hunks)
examples/multimodal/deploy/agg_llava.yaml (4 hunks)
examples/multimodal/deploy/agg_qwen.yaml (4 hunks)
lib/parsers/Cargo.toml (0 hunks)
lib/parsers/tests/mod.rs (0 hunks)
recipes/gpt-oss-120b/trtllm/agg/deploy.yaml (2 hunks)
recipes/llama-3-70b/vllm/agg/deploy.yaml (2 hunks)
recipes/llama-3-70b/vllm/agg/perf.yaml (3 hunks)
recipes/llama-3-70b/vllm/disagg-multi-node/deploy.yaml (3 hunks)
recipes/llama-3-70b/vllm/disagg-multi-node/perf.yaml (3 hunks)
recipes/llama-3-70b/vllm/disagg-single-node/deploy.yaml (3 hunks)
recipes/llama-3-70b/vllm/disagg-single-node/perf.yaml (3 hunks)
tests/planner/perf_test_configs/agg_8b.yaml (2 hunks)
tests/planner/perf_test_configs/disagg_8b_2p2d.yaml (3 hunks)
tests/planner/perf_test_configs/disagg_8b_3p1d.yaml (3 hunks)
tests/planner/perf_test_configs/disagg_8b_planner.yaml (4 hunks)
tests/planner/perf_test_configs/disagg_8b_tp2.yaml (3 hunks)
tests/planner/perf_test_configs/image_cache_daemonset.yaml (1 hunks)
tests/planner/scaling/disagg_planner.yaml (4 hunks)

💤 Files with no reviewable changes (2)

lib/parsers/tests/mod.rs
lib/parsers/Cargo.toml

🧰 Additional context used

🧬 Code graph analysis (5)

benchmarks/profiler/utils/profile_prefill.py (1)

benchmarks/profiler/utils/aiperf.py (1)

benchmark_prefill (154-186)

components/backends/sglang/launch/disagg_dp_attn.sh (1)

lib/bindings/python/rust/lib.rs (1)

round_robin (760-794)

benchmarks/profiler/profile_endpoint.py (2)

benchmarks/profiler/utils/profile_decode.py (1)

profile_decode (104-142)

benchmarks/profiler/utils/profile_prefill.py (1)

profile_prefill (74-102)

benchmarks/profiler/utils/profile_decode.py (1)

benchmarks/profiler/utils/aiperf.py (1)

benchmark_decode (189-247)

benchmarks/profiler/profile_sla.py (2)

benchmarks/profiler/utils/aiperf.py (2)

benchmark_decode (189-247)

benchmark_prefill (154-186)

deploy/utils/dynamo_deployment.py (1)

get_service_url (212-218)

🪛 LanguageTool

README.md

[grammar] ~181-~181: There might be a mistake here.
Context: ...ggregated vs. vanilla vLLM) using AIPerf - **[Pre-Deployment Profiling](docs/benchmark...

(QB_NEW_EN)

docs/benchmarks/benchmarking.md

[grammar] ~187-~187: There might be a mistake here.
Context: ...marking The Python benchmarking module: 1. Connects to your port-forwarded endpoi...

(QB_NEW_EN)

[grammar] ~188-~188: There might be a mistake here.
Context: ...nnects** to your port-forwarded endpoint 2. Benchmarks using AIPerf at various con...

(QB_NEW_EN)

[grammar] ~189-~189: There might be a mistake here.
Context: ...els (default: 1, 2, 5, 10, 50, 100, 250) 3. Measures key metrics: latency, through...

(QB_NEW_EN)

🪛 Ruff (0.14.0)

benchmarks/utils/plot.py

47-47: Do not catch blind exception: Exception

(BLE001)

benchmarks/utils/genai.py

83-83: subprocess call: check for execution of untrusted input

(S603)

benchmarks/profiler/utils/aiperf.py

76-99: Consider iterable unpacking instead of concatenation

Replace with iterable unpacking

(RUF005)

112-137: Consider iterable unpacking instead of concatenation

Replace with iterable unpacking

(RUF005)

147-149: Avoid specifying long messages outside the exception class

(TRY003)

171-171: subprocess call: check for execution of untrusted input

(S603)

214-214: subprocess call: check for execution of untrusted input

(S603)

232-232: subprocess call: check for execution of untrusted input

(S603)
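For the RUF005 findings in aiperf.py the fix is mechanical. A sketch with placeholder arguments (`build_command` is a hypothetical name, and "arg1"/"arg2" are not real aiperf options):

```python
from collections.abc import Iterable


def build_command(base_cmd: Iterable[str], extra_args: Iterable[str]) -> list[str]:
    """Build an argv list using iterable unpacking, the form RUF005 prefers."""
    # [*a, *b] always builds a new list and accepts any iterables, whereas
    # base_cmd + extra_args would raise TypeError for a list + tuple mix.
    return [*base_cmd, *extra_args]


base = ["aiperf", "profile"]
extra = ("arg1", "arg2")  # placeholder arguments, not real aiperf options
print(build_command(base, extra))  # ['aiperf', 'profile', 'arg1', 'arg2']
```

S603 fires on subprocess calls that do not use `shell=True`, so passing a list like this is already the safer form; the rule mainly asks for a check that the arguments come from trusted input.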

benchmarks/llm/plot_pareto.py

36-36: Create your own exception

(TRY002)

36-36: Avoid specifying long messages outside the exception class

(TRY003)

85-87: zip() without an explicit strict= parameter

Add explicit value for parameter strict=

(B905)

266-268: Avoid specifying long messages outside the exception class

(TRY003)
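TRY002 and TRY003 together suggest a dedicated exception type that owns its message. A sketch for a missing-artifact case; the class name `AIPerfArtifactError`, the helper `require_artifact`, and the assumption that line 36 of plot_pareto.py reports a missing file are all hypothetical, since the review does not show what that line raises for:

```python
from __future__ import annotations

from pathlib import Path


class AIPerfArtifactError(Exception):
    """Raised when an expected AIPerf artifact is missing (hypothetical type)."""

    def __init__(self, path: Path | str) -> None:
        self.path = Path(path)
        # The message lives in the class (TRY003), not at each raise site.
        super().__init__(f"AIPerf artifact not found: {self.path}")


def require_artifact(path: Path | str) -> Path:
    """Return the path if it is an existing file, else raise a specific error."""
    p = Path(path)
    if not p.is_file():
        raise AIPerfArtifactError(p)  # TRY002: a specific type, not bare Exception
    return p
```

Callers can then catch `AIPerfArtifactError` specifically and inspect `.path`, rather than catching a generic `Exception`.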

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)

GitHub Check: Build and Test - dynamo
GitHub Check: tests (.)
GitHub Check: clippy (launch/dynamo-run)
GitHub Check: clippy (.)
GitHub Check: tests (launch/dynamo-run)
GitHub Check: tests (lib/runtime/examples)
GitHub Check: clippy (lib/bindings/python)
GitHub Check: tests (lib/bindings/python)

🔇 Additional comments (33)

examples/deployments/ECS/task_definition_prefillworker.json (1)

6-6: Image tag update LGTM

Tagging the prefill worker to vllm-runtime:0.6.0 matches the rest of the rollout and keeps ECS in sync. Nothing else to change here.

components/backends/trtllm/deploy/agg.yaml (1)

16-16: LGTM! Image references updated consistently.

The container image references have been updated from placeholder values to the concrete NVIDIA registry with version 0.6.0 for both Frontend and TRTLLMWorker services.

Also applies to: 27-27

deploy/cloud/operator/Earthfile (1)

48-48: LGTM! Default registry updated.

The default DOCKER_SERVER has been updated to align with the NVIDIA registry used across other deployment configurations.

recipes/llama-3-70b/vllm/agg/deploy.yaml (1)

21-21: LGTM! vLLM runtime images updated consistently.

Both Frontend and VllmPrefillWorker services now reference the pinned vLLM runtime image at version 0.6.0.

Also applies to: 40-40

examples/multimodal/deploy/agg_qwen.yaml (1)

17-17: LGTM! All service images updated consistently.

All four services (Frontend, EncodeWorker, VLMWorker, and Processor) have been updated to use the vLLM runtime image version 0.6.0 from the NVIDIA registry.

Also applies to: 28-28, 45-45, 62-62

recipes/llama-3-70b/vllm/disagg-multi-node/deploy.yaml (1)

21-21: LGTM! Disaggregated deployment images updated.

All three services (Frontend, VllmPrefillWorker, and VllmDecodeWorker) in the multi-node disaggregated deployment now use the vLLM runtime version 0.6.0.

Also applies to: 40-40, 64-64

components/backends/vllm/deploy/disagg_kvbm_tp2.yaml (1)

16-16: LGTM! KVBM deployment images updated.

All services in the disaggregated KVBM tensor-parallel deployment (Frontend, VllmDecodeWorker, and VllmPrefillWorker) have been updated to version 0.6.0.

Also applies to: 29-29, 67-67

components/backends/trtllm/deploy/agg_router.yaml (1)

16-16: LGTM! Router deployment images updated.

Both Frontend and TRTLLMWorker services in the router-enabled aggregated deployment now use the trtllm-runtime version 0.6.0.

Also applies to: 30-30

components/backends/vllm/deploy/agg_kvbm.yaml (1)

16-16: LGTM! Aggregated KVBM images updated.

Both Frontend and VllmDecodeWorker services in the aggregated KVBM deployment have been updated to vLLM runtime version 0.6.0.

Also applies to: 34-34

components/backends/vllm/deploy/disagg.yaml (1)

16-16: LGTM! Image references standardized.

The container image references have been properly updated from placeholder tags to the official NVIDIA AI-Dynamo runtime image (version 0.6.0). This ensures consistent deployment across all components.

Also applies to: 28-28, 48-48

Earthfile (1)

137-137: LGTM! Default registry standardized.

The default DOCKER_SERVER argument has been properly updated across all build targets to use the official NVIDIA container registry. This aligns with the repository-wide standardization on official NVIDIA AI-Dynamo images.

Also applies to: 178-178, 192-192

benchmarks/utils/genai.py (1)

36-116: LGTM! Function properly migrated to AIPerf.

The function has been correctly renamed from run_genai_perf to run_aiperf, and all internal references (command name, process variable, log messages) have been consistently updated. The implementation correctly invokes the aiperf CLI tool with appropriate arguments.

docs/benchmarks/benchmarking.md (1)

64-64: LGTM! Documentation properly updated to reference AIPerf.

The documentation has been correctly updated to reference AIPerf throughout, including tool descriptions, command examples, artifact names, and container image references. The changes are consistent with the broader migration from GenAI-Perf to AIPerf.

Also applies to: 73-73, 168-168, 182-182, 189-189, 304-306, 413-413, 519-519

docs/backends/sglang/README.md (1)

133-133: LGTM! Documentation updated with correct image tag.

The Docker pull command has been updated to reference the official NVIDIA AI-Dynamo SGLang runtime image with version 0.6.0, consistent with the repository-wide image standardization.

tests/planner/perf_test_configs/agg_8b.yaml (1)

41-41: LGTM! Test configuration updated with official images.

The test configuration now uses the official NVIDIA AI-Dynamo vLLM runtime image (version 0.6.0) for both frontend and worker components, ensuring consistent test environments.

Also applies to: 91-91

components/backends/sglang/deploy/disagg-multinode.yaml (1)

25-25: LGTM! Multi-node deployment standardized on official images.

All three container references (frontend, decode worker, prefill worker) have been properly updated to use the official NVIDIA AI-Dynamo SGLang runtime image version 0.6.0, ensuring consistent multi-node deployments.

Also applies to: 38-38, 73-73

tests/planner/perf_test_configs/image_cache_daemonset.yaml (1)

23-23: LGTM! Image cache DaemonSet updated.

The image cache DaemonSet now references the official NVIDIA AI-Dynamo vLLM runtime image version 0.6.0, ensuring that the correct image is pre-cached on cluster nodes.

components/backends/sglang/deploy/agg.yaml (1)

16-16: LGTM! Image references updated consistently.

The container image references have been updated from placeholder tags to the NVIDIA AI-Dynamo registry with version 0.6.0 for both Frontend and decode worker components. This aligns with the broader image pinning strategy across the repository.

Also applies to: 27-27

components/backends/sglang/deploy/agg_router.yaml (1)

16-16: LGTM! Consistent image pinning.

Image references updated to nvcr.io/nvidia/ai-dynamo/sglang-runtime:0.6.0 for both Frontend and decode worker, maintaining consistency with other SGLang deployment configurations.

Also applies to: 30-30

components/backends/sglang/deploy/agg_logging.yaml (1)

19-19: LGTM! Image updates aligned with deployment standards.

Container images updated to the NVIDIA AI-Dynamo registry (version 0.6.0) for both Frontend and worker components, consistent with the repository-wide image pinning updates.

Also applies to: 30-30

docs/backends/trtllm/gpt-oss.md (1)

52-52: LGTM! Documentation updated to reflect the pinned image version.

The documentation now references the concrete NVIDIA AI-Dynamo TensorRT-LLM runtime image version 0.6.0, replacing the placeholder tag. This ensures users follow the correct deployment instructions aligned with the repository's image pinning strategy.

components/backends/sglang/deploy/disagg.yaml (2)

64-64: Excellent catch! Critical typo fixed.

The command has been corrected from python3E to python3. This typo would have caused the prefill worker to fail at startup. The decode worker at line 31 already had the correct command, making this an important consistency fix.

16-16: LGTM! Image references updated consistently.

Container images updated to nvcr.io/nvidia/ai-dynamo/sglang-runtime:0.6.0 across Frontend, decode, and prefill workers, aligning with the repository-wide image pinning strategy.

Also applies to: 28-28, 61-61

benchmarks/profiler/profile_endpoint.py (1)

8-9: LGTM! Import paths updated to absolute references.

The imports have been changed from relative (utils.profile_decode, utils.profile_prefill) to absolute paths (benchmarks.profiler.utils.profile_decode, benchmarks.profiler.utils.profile_prefill). This improves code clarity and aligns with the broader profiler restructuring as part of the AIPerf migration.

tests/planner/scaling/disagg_planner.yaml (1)

21-21: LGTM! Test configuration images updated consistently.

All four service components (Frontend, Planner, VllmDecodeWorker, VllmPrefillWorker) have been updated to use the NVIDIA AI-Dynamo vLLM runtime image version 0.6.0, ensuring test configurations align with production deployment standards.

Also applies to: 44-44, 81-81, 105-105

tests/planner/perf_test_configs/disagg_8b_3p1d.yaml (1)

41-41: LGTM! Performance test configuration images updated.

Container images for Frontend, VllmDecodeWorker, and VllmPrefillWorker components have been updated to nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.6.0, maintaining consistency with the test configuration's 3-prefill-1-decode worker setup.

Also applies to: 91-91, 141-141

benchmarks/profiler/utils/config.py (1)

114-114: LGTM! Image reference standardized to NVIDIA registry.

The update from a placeholder to the official NVIDIA NGC registry image (nvcr.io/nvidia/ai-dynamo/dynamo-runtime:0.6.0) aligns with the repository-wide standardization observed across deployment configs and test files.

components/backends/sglang/deploy/README.md (1)

64-64: LGTM! Documentation updated to reflect new image registry.

The image references in the documentation examples have been correctly updated to use the NVIDIA NGC registry (nvcr.io/nvidia/ai-dynamo/sglang-runtime:0.6.0), consistent with the actual deployment file changes in this PR.

Also applies to: 95-95

tests/planner/perf_test_configs/disagg_8b_2p2d.yaml (1)

41-41: LGTM! Test configuration images updated consistently.

All three container image references (Frontend, VllmDecodeWorker, VllmPrefillWorker) have been uniformly updated to nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.6.0, maintaining consistency across the deployment specification.

Also applies to: 91-91, 141-141

tests/planner/perf_test_configs/disagg_8b_tp2.yaml (1)

41-41: LGTM! Test configuration images updated consistently.

Image references updated uniformly across all services to nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.6.0, matching the pattern in other test configurations.

Also applies to: 91-91, 141-141

components/backends/sglang/deploy/disagg_planner.yaml (1)

19-19: LGTM! Deployment configuration images updated consistently.

All four service containers (Frontend, Planner, decode worker, prefill worker) have been updated to nvcr.io/nvidia/ai-dynamo/sglang-runtime:0.6.0, ensuring consistent runtime across the disaggregated deployment with planner.

Also applies to: 30-30, 52-52, 84-84

tests/planner/perf_test_configs/disagg_8b_planner.yaml (1)

44-44: LGTM! Test configuration with planner updated consistently.

All four service images (Frontend, Planner, VllmDecodeWorker, VllmPrefillWorker) have been uniformly updated to nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.6.0, ensuring consistent testing environment.

Also applies to: 77-77, 141-141, 198-198

components/backends/trtllm/deploy/README.md (1)

92-92: LGTM! Documentation updated to reflect new image registry.

All three references to the TensorRT-LLM runtime image have been correctly updated to nvcr.io/nvidia/ai-dynamo/trtllm-runtime:0.6.0:

- Container configuration example
- Prerequisites/container images note
- Usage/customization section

This ensures the documentation is synchronized with the actual deployment configurations.

Also applies to: 112-112, 144-144

saturley-hall requested review from biswapanda and debermudez October 14, 2025 22:01

saturley-hall requested a review from a team as a code owner October 14, 2025 22:01

saturley-hall requested a review from a team October 14, 2025 22:01

saturley-hall requested review from a team as code owners October 14, 2025 22:01

pull-request-size Bot added the size/XL label Oct 14, 2025

github-actions Bot added the feat label Oct 14, 2025

saturley-hall changed the base branch from main to release/0.6.0 October 14, 2025 22:06

pull-request-size Bot added size/L and removed size/XL labels Oct 14, 2025

coderabbitai Bot reviewed Oct 14, 2025


debermudez approved these changes Oct 14, 2025


saturley-hall merged commit a61a800 into release/0.6.0 Oct 14, 2025
28 of 29 checks passed

saturley-hall deleted the harrison/cherry-pick-pr3306-benchmarks-use-aiperf branch October 14, 2025 22:25

saturley-hall merged 1 commit into release/0.6.0 from harrison/cherry-pick-pr3306-benchmarks-use-aiperf
