feat: cherry pick PR#3306 benchmarks use aiperf #3626
Signed-off-by: Biswa Panda <biswa.panda@gmail.com>
Signed-off-by: lkomali <lkomali@nvidia.com>
Co-authored-by: lkomali <lkomali@nvidia.com>
Co-authored-by: Harshini Komali <157742537+lkomali@users.noreply.github.com>
Signed-off-by: Harrison King Saturley-Hall <hsaturleyhal@nvidia.com>
Walkthrough
Updates default container registries and image tags to the nvcr.io/nvidia/ai-dynamo images at version 0.6.0 across deployments, docs, tests, and examples. Migrates benchmarking from GenAI-Perf to AIPerf in code, scripts, and artifact handling. Adds git to the Docker images and updates the Python requirements. Also includes minor import path adjustments, file pattern updates, and a small Rust test cleanup.
Sequence Diagram(s)
sequenceDiagram
autonumber
actor User
participant Runner as Benchmark Runner
participant AIP as AIPerf CLI
participant Svc as Model Service
participant FS as Artifacts (JSON/CSV)
participant Prof as Profiler/Parsers
participant Plot as Plotting
User->>Runner: Start benchmark
Runner->>AIP: aiperf profile ... (isl/osl/concurrency)
AIP->>Svc: Send requests (prefill/decode)
Svc-->>AIP: Responses/latencies
AIP-->>FS: Write profile_export_aiperf.json/.csv
Note over Prof,FS: Updated readers expect "records" schema
Runner->>Prof: Parse artifacts
Prof->>FS: Read profile_export_aiperf.json
Prof-->>Runner: Metrics (ttft, itl, throughput)
Runner->>Plot: Build Pareto plots
Plot-->>User: Rendered charts
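The artifact-reading step in the diagram (the parsers reading profile_export_aiperf.json) could be sketched roughly as follows. This is a minimal sketch, assuming results are stored in one c<N>/ directory per concurrency level; find_artifacts is a hypothetical helper for illustration, not a function from the repository.

```python
import re
import tempfile
from pathlib import Path

# File name taken from the diagram; the c<N>/ directory naming is an assumption.
ARTIFACT_NAME = "profile_export_aiperf.json"


def find_artifacts(results_dir):
    """Map each concurrency level to its artifact path under results_dir."""
    found = {}
    for path in Path(results_dir).rglob(ARTIFACT_NAME):
        match = re.fullmatch(r"c(\d+)", path.parent.name)
        if match:  # ignore artifacts that are not directly inside a c<N>/ directory
            found[int(match.group(1))] = path
    # Sort numerically so that c10 comes after c2.
    return dict(sorted(found.items()))


# Usage: build a throwaway results tree and discover the artifacts in it.
with tempfile.TemporaryDirectory() as tmp:
    for level in (10, 1, 2):
        run_dir = Path(tmp) / "my-benchmark" / f"c{level}"
        run_dir.mkdir(parents=True)
        (run_dir / ARTIFACT_NAME).write_text("{}")
    levels = list(find_artifacts(tmp))
```

If several benchmarks share one results directory, the same concurrency level would appear more than once and later paths would overwrite earlier ones, so a real reader would likely key on the benchmark name as well.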
Estimated code review effort
🎯 3 (Moderate) | ⏱️ ~25 minutes
Pre-merge checks
❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
docs/benchmarks/benchmarking.md (1)
286-287: Update stale artifact name references.
Lines 286-287 (and line 460) still reference the old artifact name profile_export_genai_perf.json. These should be updated to profile_export_aiperf.json to match the changes at lines 304-306 and keep the documentation consistent.
Apply this diff to fix the inconsistency:
     results/  # Client-side: ./benchmarks/results/ or custom dir
     ├── plots/  # Server-side: /data/results/
     │   ├── SUMMARY.txt  # Performance visualization plots
     │   ├── p50_inter_token_latency_vs_concurrency.png
     │   ├── avg_inter_token_latency_vs_concurrency.png
     │   ├── request_throughput_vs_concurrency.png
     │   ├── efficiency_tok_s_gpu_vs_user.png
     │   └── avg_time_to_first_token_vs_concurrency.png
     ├── <your-benchmark-name>/  # Results for your benchmark (uses your custom name)
     │   ├── c1/  # Concurrency level 1
    -│   │   └── profile_export_genai_perf.json
    +│   │   └── profile_export_aiperf.json
     │   ├── c2/  # Concurrency level 2
A similar fix is needed around line 460:
     /data/results/
     └── <benchmark-name>/  # Results for your benchmark name
         ├── c1/  # Concurrency level 1
    -    │   └── profile_export_genai_perf.json
    +    │   └── profile_export_aiperf.json
         ├── c2/  # Concurrency level 2
benchmarks/profiler/utils/profile_cache.py (1)
45-49: Update cached-result readers to the new AIPerf schema
These helpers still look for the GenAI-Perf layout (time_to_first_token, inter_token_latency, output_token_throughput at the top level). AIPerf now nests metrics under records, so every skip/load check will silently fail, forcing redundant reruns and ignoring existing data. Please switch all lookups to the new structure.
Suggested fix:
    -    if "time_to_first_token" in data and "avg" in data["time_to_first_token"]:
    +    if (
    +        "records" in data
    +        and "ttft" in data["records"]
    +        and "avg" in data["records"]["ttft"]
    +    ):
    @@
    -        ttft = data["time_to_first_token"]["avg"]
    +        ttft = data["records"]["ttft"]["avg"]
    @@
    -    if "inter_token_latency" in data and "avg" in data["inter_token_latency"]:
    +    if (
    +        "records" in data
    +        and "inter_token_latency" in data["records"]
    +        and "avg" in data["records"]["inter_token_latency"]
    +    ):
    @@
    -        itl = data["inter_token_latency"]["avg"]
    -        thpt_per_gpu = data["output_token_throughput"]["avg"] / tp_size
    +        itl = data["records"]["inter_token_latency"]["avg"]
    +        thpt_per_gpu = data["records"]["output_token_throughput"]["avg"] / tp_size
Also applies to: 80-84, 103-105, 128-134
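The repeated membership checks in the suggested fix could be folded into one small lookup helper. The sketch below assumes the nested layout described in this comment (metrics under a top-level records key, each with an avg field); that layout has not been verified against AIPerf's actual output here, and metric_avg is a hypothetical name.

```python
def metric_avg(data, name):
    """Return the "avg" value for metric `name`, or None when it is missing."""
    # Assumed layout, per the review comment: {"records": {name: {"avg": ...}}}
    records = data.get("records")
    if not isinstance(records, dict):
        return None
    metric = records.get(name)
    if not isinstance(metric, dict):
        return None
    return metric.get("avg")


# Usage with made-up numbers: one result in the assumed new layout and one in
# the old top-level layout.
new_style = {"records": {"ttft": {"avg": 12.5}, "inter_token_latency": {"avg": 3.0}}}
old_style = {"time_to_first_token": {"avg": 12.5}}

ttft = metric_avg(new_style, "ttft")
stale = metric_avg(old_style, "ttft")
```

Returning None for a missing metric lets a cache check treat "file present but in the old layout" the same as "no cached result", which is the rerun behavior the comment describes, but made explicit.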
🧹 Nitpick comments (6)
container/deps/requirements.txt (2)
30-30: Reintroduce an upper bound for Pydantic.
Switching to pydantic>=2.10.6 leaves us exposed to the eventual 3.x release, which will ship breaking changes. Please keep a <3 cap (or equivalent) so future installs remain stable.
Apply this diff:
    -pydantic>=2.10.6
    +pydantic>=2.10.6,<3
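To illustrate what the capped constraint admits, here is a rough sketch that mimics >=2.10.6,<3 for plain X.Y.Z version strings. It is a simplification for illustration: real installers follow PEP 440 (pre-releases, post-releases, and so on), and the helper names are hypothetical.

```python
def parse(version):
    """Turn "2.10.6" into (2, 10, 6); pre-release tags are not handled."""
    return tuple(int(part) for part in version.split("."))


def allowed(version, minimum="2.10.6", below_major=3):
    """Mimic the constraint >=minimum,<below_major for plain X.Y.Z versions."""
    # (3,) sorts before (3, 0, 0), so every 3.x version falls outside the range.
    return parse(minimum) <= parse(version) < (below_major,)


checks = {v: allowed(v) for v in ("2.10.5", "2.10.6", "2.99.0", "3.0.0")}
```

Only 2.10.6 and 2.99.0 pass in this example; without the upper bound, 3.0.0 would pass as well.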
30-34: Keep NumPy constrained away from 2.x.
We dropped the explicit NumPy pin, but nothing now prevents a future resolver (e.g., once SciPy loosens its cap) from pulling in NumPy 2.x, which pmdarima still does not support. Please add back an explicit <2 guard (with an appropriate minimum) so we don’t regress when upstream relaxes its requirements.
Suggested addition:
    +pydantic>=2.10.6,<3
    +numpy>=1.24,<2
     scipy<1.14.0  # Pin scipy version for pmdarima compatibility
recipes/llama-3-70b/vllm/disagg-multi-node/perf.yaml (1)
37-65: Consider renaming the artifacts directory for clarity.
We're now generating AIPerf outputs but still writing them under /tmp/genai, which is confusing for anyone inspecting artifacts by hand. Renaming the directory keeps the intent obvious and avoids future mixups.
    -      export ARTIFACT_DIR="/tmp/genai"
    +      export ARTIFACT_DIR="/tmp/aiperf"
    @@
    -      PERF_JSON=$(find $ARTIFACT_DIR -name profile_export_aiperf.json)
    +      PERF_JSON=$(find $ARTIFACT_DIR -name profile_export_aiperf.json)
    @@
    -      PERF_CSV=$(find $ARTIFACT_DIR -name profile_export_aiperf.csv)
    +      PERF_CSV=$(find $ARTIFACT_DIR -name profile_export_aiperf.csv)
benchmarks/utils/genai.py (1)
1-1: Consider renaming the module for consistency.
The module filename is still genai.py, but the primary function is now run_aiperf. For clarity and consistency, consider renaming this module to aiperf.py in a follow-up change.
If this naming is intentional for backward compatibility or will be addressed in a subsequent PR, you can safely ignore this suggestion.
benchmarks/utils/plot.py (1)
35-50: LGTM! Successfully migrated from GenAI-Perf to AIPerf naming.
The file pattern and variable renaming from genai_perf to aiperf has been applied consistently throughout the parsing logic:
- File pattern: profile_export_genai_perf.json → profile_export_aiperf.json
- Variable: genai_perf_json → aiperf_json
- Error messages updated to reference aiperf
The logic remains unchanged, ensuring a clean migration to AIPerf.
Note: The broad exception catch on line 47 was flagged by static analysis. While not introduced by this PR, consider refining the exception handling in a future update to catch specific exceptions (e.g., json.JSONDecodeError, OSError) for better error diagnostics.
benchmarks/llm/plot_pareto.py (1)
85-129: Tighten the pairing of JSON/config lists
Using zip() without strict=True hides length mismatches; the loop silently drops unmatched entries, yielding incomplete results. With strict=True, we fail fast if the artifact/config discovery falls out of sync, which is safer for downstream plots.
    -    for aiperf_profile_export_json_path, deployment_config_json_path in zip(
    -        aiperf_profile_export_json_paths, deployment_config_json_paths
    -    ):
    +    for aiperf_profile_export_json_path, deployment_config_json_path in zip(
    +        aiperf_profile_export_json_paths,
    +        deployment_config_json_paths,
    +        strict=True,
    +    ):
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (76)
- Earthfile (3 hunks)
- README.md (1 hunks)
- benchmarks/README.md (1 hunks)
- benchmarks/incluster/benchmark_job.yaml (1 hunks)
- benchmarks/llm/perf.sh (2 hunks)
- benchmarks/llm/plot_pareto.py (6 hunks)
- benchmarks/profiler/profile_endpoint.py (1 hunks)
- benchmarks/profiler/profile_sla.py (3 hunks)
- benchmarks/profiler/utils/aiperf.py (8 hunks)
- benchmarks/profiler/utils/config.py (1 hunks)
- benchmarks/profiler/utils/profile_cache.py (5 hunks)
- benchmarks/profiler/utils/profile_decode.py (2 hunks)
- benchmarks/profiler/utils/profile_prefill.py (2 hunks)
- benchmarks/utils/genai.py (4 hunks)
- benchmarks/utils/plot.py (1 hunks)
- components/backends/sglang/deploy/README.md (2 hunks)
- components/backends/sglang/deploy/agg.yaml (2 hunks)
- components/backends/sglang/deploy/agg_logging.yaml (2 hunks)
- components/backends/sglang/deploy/agg_router.yaml (2 hunks)
- components/backends/sglang/deploy/disagg-multinode.yaml (3 hunks)
- components/backends/sglang/deploy/disagg.yaml (3 hunks)
- components/backends/sglang/deploy/disagg_planner.yaml (4 hunks)
- components/backends/sglang/launch/disagg_dp_attn.sh (2 hunks)
- components/backends/trtllm/deploy/README.md (3 hunks)
- components/backends/trtllm/deploy/agg-with-config.yaml (2 hunks)
- components/backends/trtllm/deploy/agg.yaml (2 hunks)
- components/backends/trtllm/deploy/agg_router.yaml (2 hunks)
- components/backends/trtllm/deploy/disagg-multinode.yaml (3 hunks)
- components/backends/trtllm/deploy/disagg.yaml (3 hunks)
- components/backends/trtllm/deploy/disagg_planner.yaml (4 hunks)
- components/backends/trtllm/deploy/disagg_router.yaml (3 hunks)
- components/backends/vllm/deploy/README.md (2 hunks)
- components/backends/vllm/deploy/agg.yaml (2 hunks)
- components/backends/vllm/deploy/agg_kvbm.yaml (2 hunks)
- components/backends/vllm/deploy/agg_router.yaml (2 hunks)
- components/backends/vllm/deploy/disagg-multinode.yaml (3 hunks)
- components/backends/vllm/deploy/disagg.yaml (3 hunks)
- components/backends/vllm/deploy/disagg_kvbm.yaml (3 hunks)
- components/backends/vllm/deploy/disagg_kvbm_2p2d.yaml (3 hunks)
- components/backends/vllm/deploy/disagg_kvbm_tp2.yaml (3 hunks)
- components/backends/vllm/deploy/disagg_planner.yaml (4 hunks)
- components/backends/vllm/deploy/disagg_router.yaml (3 hunks)
- container/Dockerfile.sglang (2 hunks)
- container/Dockerfile.sglang-wideep (2 hunks)
- container/Dockerfile.trtllm (1 hunks)
- container/Dockerfile.vllm (1 hunks)
- container/deps/requirements.txt (2 hunks)
- deploy/cloud/operator/Earthfile (1 hunks)
- docs/_includes/install.rst (2 hunks)
- docs/backends/sglang/README.md (1 hunks)
- docs/backends/trtllm/gpt-oss.md (1 hunks)
- docs/benchmarks/benchmarking.md (7 hunks)
- docs/kubernetes/create_deployment.md (1 hunks)
- docs/kubernetes/sla_planner_quickstart.md (1 hunks)
- examples/basics/kubernetes/Distributed_Inference/agg_router.yaml (2 hunks)
- examples/custom_backend/hello_world/deploy/hello_world.yaml (2 hunks)
- examples/deployments/ECS/task_definition_frontend.json (1 hunks)
- examples/deployments/ECS/task_definition_prefillworker.json (1 hunks)
- examples/multimodal/deploy/agg_llava.yaml (4 hunks)
- examples/multimodal/deploy/agg_qwen.yaml (4 hunks)
- lib/parsers/Cargo.toml (0 hunks)
- lib/parsers/tests/mod.rs (0 hunks)
- recipes/gpt-oss-120b/trtllm/agg/deploy.yaml (2 hunks)
- recipes/llama-3-70b/vllm/agg/deploy.yaml (2 hunks)
- recipes/llama-3-70b/vllm/agg/perf.yaml (3 hunks)
- recipes/llama-3-70b/vllm/disagg-multi-node/deploy.yaml (3 hunks)
- recipes/llama-3-70b/vllm/disagg-multi-node/perf.yaml (3 hunks)
- recipes/llama-3-70b/vllm/disagg-single-node/deploy.yaml (3 hunks)
- recipes/llama-3-70b/vllm/disagg-single-node/perf.yaml (3 hunks)
- tests/planner/perf_test_configs/agg_8b.yaml (2 hunks)
- tests/planner/perf_test_configs/disagg_8b_2p2d.yaml (3 hunks)
- tests/planner/perf_test_configs/disagg_8b_3p1d.yaml (3 hunks)
- tests/planner/perf_test_configs/disagg_8b_planner.yaml (4 hunks)
- tests/planner/perf_test_configs/disagg_8b_tp2.yaml (3 hunks)
- tests/planner/perf_test_configs/image_cache_daemonset.yaml (1 hunks)
- tests/planner/scaling/disagg_planner.yaml (4 hunks)
💤 Files with no reviewable changes (2)
- lib/parsers/tests/mod.rs
- lib/parsers/Cargo.toml
🧰 Additional context used
🧬 Code graph analysis (5)
benchmarks/profiler/utils/profile_prefill.py (1)
benchmarks/profiler/utils/aiperf.py (1)
benchmark_prefill (154-186)
components/backends/sglang/launch/disagg_dp_attn.sh (1)
lib/bindings/python/rust/lib.rs (1)
round_robin (760-794)
benchmarks/profiler/profile_endpoint.py (2)
benchmarks/profiler/utils/profile_decode.py (1)
profile_decode (104-142)
benchmarks/profiler/utils/profile_prefill.py (1)
profile_prefill (74-102)
benchmarks/profiler/utils/profile_decode.py (1)
benchmarks/profiler/utils/aiperf.py (1)
benchmark_decode (189-247)
benchmarks/profiler/profile_sla.py (2)
benchmarks/profiler/utils/aiperf.py (2)
benchmark_decode (189-247)
benchmark_prefill (154-186)
deploy/utils/dynamo_deployment.py (1)
get_service_url (212-218)
🪛 LanguageTool
README.md
[grammar] ~181-~181: There might be a mistake here.
Context: ...ggregated vs. vanilla vLLM) using AIPerf - **[Pre-Deployment Profiling](docs/benchmark...
(QB_NEW_EN)
docs/benchmarks/benchmarking.md
[grammar] ~187-~187: There might be a mistake here.
Context: ...marking The Python benchmarking module: 1. Connects to your port-forwarded endpoi...
(QB_NEW_EN)
[grammar] ~188-~188: There might be a mistake here.
Context: ...nnects** to your port-forwarded endpoint 2. Benchmarks using AIPerf at various con...
(QB_NEW_EN)
[grammar] ~189-~189: There might be a mistake here.
Context: ...els (default: 1, 2, 5, 10, 50, 100, 250) 3. Measures key metrics: latency, through...
(QB_NEW_EN)
🪛 Ruff (0.14.0)
benchmarks/utils/plot.py
47-47: Do not catch blind exception: Exception
(BLE001)
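The BLE001 finding pairs with the earlier note about catching specific exceptions when reading artifacts. A minimal sketch of that narrower handling follows; load_json_or_none is a hypothetical helper, not the code in plot.py, and it deliberately leaves other error types (such as decoding errors) uncaught.

```python
import json
import logging
import tempfile
from pathlib import Path

logger = logging.getLogger(__name__)


def load_json_or_none(path):
    """Return parsed JSON from path, or None if it cannot be read or parsed."""
    try:
        with open(path, encoding="utf-8") as handle:
            return json.load(handle)
    except OSError as exc:  # missing file, permission error, and similar
        logger.warning("Could not read %s: %s", path, exc)
    except json.JSONDecodeError as exc:  # file exists but is not valid JSON
        logger.warning("Invalid JSON in %s: %s", path, exc)
    return None


# Usage: one valid file, one malformed file, and one missing file.
with tempfile.TemporaryDirectory() as tmp:
    good = Path(tmp) / "good.json"
    good.write_text('{"ok": true}', encoding="utf-8")
    bad = Path(tmp) / "bad.json"
    bad.write_text("{not json", encoding="utf-8")
    missing = Path(tmp) / "missing.json"
    results = [load_json_or_none(p) for p in (good, bad, missing)]
```

Separate except clauses keep the log message specific to the failure, which is the diagnostic benefit the review note is after.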
benchmarks/utils/genai.py
83-83: subprocess call: check for execution of untrusted input
(S603)
benchmarks/profiler/utils/aiperf.py
76-99: Consider iterable unpacking instead of concatenation
Replace with iterable unpacking
(RUF005)
112-137: Consider iterable unpacking instead of concatenation
Replace with iterable unpacking
(RUF005)
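For readers unfamiliar with RUF005, the two forms look like this. The command and flags below are placeholders for illustration only; they are not real AIPerf arguments and are not taken from aiperf.py.

```python
def with_concatenation(extra):
    """Build an argument list by concatenating lists (the form RUF005 flags)."""
    return ["example-cli", "run"] + extra + ["--verbose"]


def with_unpacking(extra):
    """Build the same list with iterable unpacking (the form RUF005 suggests)."""
    return ["example-cli", "run", *extra, "--verbose"]


extra_args = ["--example-flag", "1"]
same_result = with_concatenation(extra_args) == with_unpacking(extra_args)

# Unpacking also accepts any iterable, such as a tuple, where + would raise TypeError.
from_tuple = with_unpacking(("--example-flag", "1"))
```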
147-149: Avoid specifying long messages outside the exception class
(TRY003)
171-171: subprocess call: check for execution of untrusted input
(S603)
214-214: subprocess call: check for execution of untrusted input
(S603)
232-232: subprocess call: check for execution of untrusted input
(S603)
benchmarks/llm/plot_pareto.py
36-36: Create your own exception
(TRY002)
36-36: Avoid specifying long messages outside the exception class
(TRY003)
85-87: zip() without an explicit strict= parameter
Add explicit value for parameter strict=
(B905)
266-268: Avoid specifying long messages outside the exception class
(TRY003)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
- GitHub Check: Build and Test - dynamo
- GitHub Check: tests (.)
- GitHub Check: clippy (launch/dynamo-run)
- GitHub Check: clippy (.)
- GitHub Check: tests (launch/dynamo-run)
- GitHub Check: tests (lib/runtime/examples)
- GitHub Check: clippy (lib/bindings/python)
- GitHub Check: tests (lib/bindings/python)
🔇 Additional comments (33)
examples/deployments/ECS/task_definition_prefillworker.json (1)
6-6: Image tag update LGTM
Tagging the prefill worker to vllm-runtime:0.6.0 matches the rest of the rollout and keeps ECS in sync. Nothing else to change here.
components/backends/trtllm/deploy/agg.yaml (1)
16-16: LGTM! Image references updated consistently.
The container image references have been updated from placeholder values to the concrete NVIDIA registry with version 0.6.0 for both Frontend and TRTLLMWorker services.
Also applies to: 27-27
deploy/cloud/operator/Earthfile (1)
48-48: LGTM! Default registry updated.
The default DOCKER_SERVER has been updated to align with the NVIDIA registry used across other deployment configurations.
recipes/llama-3-70b/vllm/agg/deploy.yaml (1)
21-21: LGTM! vLLM runtime images updated consistently.
Both Frontend and VllmPrefillWorker services now reference the pinned vLLM runtime image at version 0.6.0.
Also applies to: 40-40
examples/multimodal/deploy/agg_qwen.yaml (1)
17-17: LGTM! All service images updated consistently.
All four services (Frontend, EncodeWorker, VLMWorker, and Processor) have been updated to use the vLLM runtime image version 0.6.0 from the NVIDIA registry.
Also applies to: 28-28, 45-45, 62-62
recipes/llama-3-70b/vllm/disagg-multi-node/deploy.yaml (1)
21-21: LGTM! Disaggregated deployment images updated.
All three services (Frontend, VllmPrefillWorker, and VllmDecodeWorker) in the multi-node disaggregated deployment now use the vLLM runtime version 0.6.0.
Also applies to: 40-40, 64-64
components/backends/vllm/deploy/disagg_kvbm_tp2.yaml (1)
16-16: LGTM! KVBM deployment images updated.
All services in the disaggregated KVBM tensor-parallel deployment (Frontend, VllmDecodeWorker, and VllmPrefillWorker) have been updated to version 0.6.0.
Also applies to: 29-29, 67-67
components/backends/trtllm/deploy/agg_router.yaml (1)
16-16: LGTM! Router deployment images updated.
Both Frontend and TRTLLMWorker services in the router-enabled aggregated deployment now use the trtllm-runtime version 0.6.0.
Also applies to: 30-30
components/backends/vllm/deploy/agg_kvbm.yaml (1)
16-16: LGTM! Aggregated KVBM images updated.
Both Frontend and VllmDecodeWorker services in the aggregated KVBM deployment have been updated to vLLM runtime version 0.6.0.
Also applies to: 34-34
components/backends/vllm/deploy/disagg.yaml (1)
16-16: LGTM! Image references standardized.
The container image references have been properly updated from placeholder tags to the official NVIDIA AI-Dynamo runtime image (version 0.6.0). This ensures consistent deployment across all components.
Also applies to: 28-28, 48-48
Earthfile (1)
137-137: LGTM! Default registry standardized.
The default DOCKER_SERVER argument has been properly updated across all build targets to use the official NVIDIA container registry. This aligns with the repository-wide standardization on official NVIDIA AI-Dynamo images.
Also applies to: 178-178, 192-192
benchmarks/utils/genai.py (1)
36-116: LGTM! Function properly migrated to AIPerf.
The function has been correctly renamed from run_genai_perf to run_aiperf, and all internal references (command name, process variable, log messages) have been consistently updated. The implementation correctly invokes the aiperf CLI tool with appropriate arguments.
docs/benchmarks/benchmarking.md (1)
64-64: LGTM! Documentation properly updated to reference AIPerf.
The documentation has been correctly updated to reference AIPerf throughout, including tool descriptions, command examples, artifact names, and container image references. The changes are consistent with the broader migration from GenAI-Perf to AIPerf.
Also applies to: 73-73, 168-168, 182-182, 189-189, 304-306, 413-413, 519-519
docs/backends/sglang/README.md (1)
133-133: LGTM! Documentation updated with correct image tag.
The Docker pull command has been updated to reference the official NVIDIA AI-Dynamo SGLang runtime image with version 0.6.0, consistent with the repository-wide image standardization.
tests/planner/perf_test_configs/agg_8b.yaml (1)
41-41: LGTM! Test configuration updated with official images.
The test configuration now uses the official NVIDIA AI-Dynamo vLLM runtime image (version 0.6.0) for both frontend and worker components, ensuring consistent test environments.
Also applies to: 91-91
components/backends/sglang/deploy/disagg-multinode.yaml (1)
25-25: LGTM! Multi-node deployment standardized on official images.
All three container references (frontend, decode worker, prefill worker) have been properly updated to use the official NVIDIA AI-Dynamo SGLang runtime image version 0.6.0, ensuring consistent multi-node deployments.
Also applies to: 38-38, 73-73
tests/planner/perf_test_configs/image_cache_daemonset.yaml (1)
23-23: LGTM! Image cache DaemonSet updated.
The image cache DaemonSet now references the official NVIDIA AI-Dynamo vLLM runtime image version 0.6.0, ensuring that the correct image is pre-cached on cluster nodes.
components/backends/sglang/deploy/agg.yaml (1)
16-16: LGTM! Image references updated consistently.
The container image references have been updated from placeholder tags to the NVIDIA AI-Dynamo registry with version 0.6.0 for both Frontend and decode worker components. This aligns with the broader image pinning strategy across the repository.
Also applies to: 27-27
components/backends/sglang/deploy/agg_router.yaml (1)
16-16: LGTM! Consistent image pinning.
Image references updated to nvcr.io/nvidia/ai-dynamo/sglang-runtime:0.6.0 for both Frontend and decode worker, maintaining consistency with other SGLang deployment configurations.
Also applies to: 30-30
components/backends/sglang/deploy/agg_logging.yaml (1)
19-19: LGTM! Image updates aligned with deployment standards.
Container images updated to the NVIDIA AI-Dynamo registry (version 0.6.0) for both Frontend and worker components, consistent with the repository-wide image pinning updates.
Also applies to: 30-30
docs/backends/trtllm/gpt-oss.md (1)
52-52: LGTM! Documentation updated to reflect the pinned image version.
The documentation now references the concrete NVIDIA AI-Dynamo TensorRT-LLM runtime image version 0.6.0, replacing the placeholder tag. This ensures users follow the correct deployment instructions aligned with the repository's image pinning strategy.
components/backends/sglang/deploy/disagg.yaml (2)
64-64: Excellent catch! Critical typo fixed.
The command has been corrected from python3E to python3. This typo would have caused the prefill worker to fail at startup. The decode worker at line 31 already had the correct command, making this an important consistency fix.
16-16: LGTM! Image references updated consistently.
Container images updated to nvcr.io/nvidia/ai-dynamo/sglang-runtime:0.6.0 across Frontend, decode, and prefill workers, aligning with the repository-wide image pinning strategy.
Also applies to: 28-28, 61-61
benchmarks/profiler/profile_endpoint.py (1)
8-9: LGTM! Import paths updated to absolute references.
The imports have been changed from relative (utils.profile_decode, utils.profile_prefill) to absolute paths (benchmarks.profiler.utils.profile_decode, benchmarks.profiler.utils.profile_prefill). This improves code clarity and aligns with the broader profiler restructuring as part of the AIPerf migration.
tests/planner/scaling/disagg_planner.yaml (1)
21-21: LGTM! Test configuration images updated consistently.
All four service components (Frontend, Planner, VllmDecodeWorker, VllmPrefillWorker) have been updated to use the NVIDIA AI-Dynamo vLLM runtime image version 0.6.0, ensuring test configurations align with production deployment standards.
Also applies to: 44-44, 81-81, 105-105
tests/planner/perf_test_configs/disagg_8b_3p1d.yaml (1)
41-41: LGTM! Performance test configuration images updated.
Container images for Frontend, VllmDecodeWorker, and VllmPrefillWorker components have been updated to nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.6.0, maintaining consistency with the test configuration's 3-prefill-1-decode worker setup.
Also applies to: 91-91, 141-141
benchmarks/profiler/utils/config.py (1)
114-114: LGTM! Image reference standardized to NVIDIA registry.
The update from a placeholder to the official NVIDIA NGC registry image (nvcr.io/nvidia/ai-dynamo/dynamo-runtime:0.6.0) aligns with the repository-wide standardization observed across deployment configs and test files.
components/backends/sglang/deploy/README.md (1)
64-64: LGTM! Documentation updated to reflect new image registry.
The image references in the documentation examples have been correctly updated to use the NVIDIA NGC registry (nvcr.io/nvidia/ai-dynamo/sglang-runtime:0.6.0), consistent with the actual deployment file changes in this PR.
Also applies to: 95-95
tests/planner/perf_test_configs/disagg_8b_2p2d.yaml (1)
41-41: LGTM! Test configuration images updated consistently.
All three container image references (Frontend, VllmDecodeWorker, VllmPrefillWorker) have been uniformly updated to nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.6.0, maintaining consistency across the deployment specification.
Also applies to: 91-91, 141-141
tests/planner/perf_test_configs/disagg_8b_tp2.yaml (1)
41-41: LGTM! Test configuration images updated consistently.
Image references updated uniformly across all services to nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.6.0, matching the pattern in other test configurations.
Also applies to: 91-91, 141-141
components/backends/sglang/deploy/disagg_planner.yaml (1)
19-19: LGTM! Deployment configuration images updated consistently.
All four service containers (Frontend, Planner, decode worker, prefill worker) have been updated to nvcr.io/nvidia/ai-dynamo/sglang-runtime:0.6.0, ensuring consistent runtime across the disaggregated deployment with planner.
Also applies to: 30-30, 52-52, 84-84
tests/planner/perf_test_configs/disagg_8b_planner.yaml (1)
44-44: LGTM! Test configuration with planner updated consistently.
All four service images (Frontend, Planner, VllmDecodeWorker, VllmPrefillWorker) have been uniformly updated to nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.6.0, ensuring consistent testing environment.
Also applies to: 77-77, 141-141, 198-198
components/backends/trtllm/deploy/README.md (1)
92-92: LGTM! Documentation updated to reflect new image registry.
All three references to the TensorRT-LLM runtime image have been correctly updated to nvcr.io/nvidia/ai-dynamo/trtllm-runtime:0.6.0:
- Container configuration example
- Prerequisites/container images note
- Usage/customization section
This ensures the documentation is synchronized with the actual deployment configurations.
Also applies to: 112-112, 144-144
Overview:
Cherry-pick PR #3306. First of several.