
docs: reorganization of documentation structure for 0.6.0 #3750

Closed
dagil-nvidia wants to merge 25 commits into main from docs-reorg-to-release-0.6.0

Conversation

@dagil-nvidia
Collaborator

@dagil-nvidia dagil-nvidia commented Oct 20, 2025

docs: reorganize documentation structure for 0.6.0 release

Overview

This PR reorganizes the documentation structure to improve navigation and logical grouping of content for the 0.6.0 release. The reorganization creates clearer separation between API documentation, developer guides, user guides, and reference materials.

Note: This PR has 11 known merge conflicts with main that require review (detailed below).

Details

Major Structural Changes

1. Directory Reorganization:

  • Renamed docs/API/ → docs/api/ (lowercase for consistency)
  • Created docs/development/ for developer-focused guides (backend-guide.md, runtime-guide.md)
  • Created docs/observability/ for monitoring, logging, and health checks
  • Created docs/performance/ for performance tuning guides
  • Created docs/reference/ for CLI reference, glossary, and support matrix
  • Moved docs/architecture/kvbm_* → docs/kvbm/ (top-level KV Block Manager docs)
  • Moved docs/architecture/planner_* → docs/planner/ (top-level Planner docs)

2. File Movements:

  • docs/guides/backend.md → docs/development/backend-guide.md
  • docs/runtime/README.md → docs/development/runtime-guide.md
  • docs/guides/health_check.md → docs/observability/health-checks.md
  • docs/guides/logging.md → docs/observability/logging.md
  • docs/guides/metrics.md → docs/observability/metrics.md
  • docs/guides/disagg_perf_tuning.md → docs/performance/tuning.md
  • docs/guides/dynamo_run.md → docs/reference/cli.md
  • docs/dynamo_glossary.md → docs/reference/glossary.md
  • docs/support_matrix.md → docs/reference/support-matrix.md
  • docs/guides/tool_calling.md → docs/guides/tool-calling.md (renamed for consistency)
  • docs/architecture/run_kvbm_in_*.md → docs/kvbm/*-setup.md

3. Updated References:

  • Updated all internal documentation links in docs/index.rst
  • Updated all references in docs/hidden_toctree.rst
  • Fixed broken links in backend READMEs
  • Updated relative paths throughout documentation

4. Version Updates:

  • Updated container image tags from my-tag to 0.6.0 in deployment YAML files
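
The file movements and reference updates listed above amount to applying an old-path → new-path map to every internal link. The following is a minimal Python sketch of that idea, not the tooling this PR actually used: the `MOVES` subset is copied from the list above, while `rewrite_links` is a hypothetical helper. It assumes links are written as repository-root-relative Markdown inline links (`docs/...` or `/docs/...`); `../`-style relative links are left untouched.

```python
import re

# Subset of the old -> new moves listed in this PR description.
MOVES = {
    "docs/guides/backend.md": "docs/development/backend-guide.md",
    "docs/runtime/README.md": "docs/development/runtime-guide.md",
    "docs/guides/health_check.md": "docs/observability/health-checks.md",
    "docs/guides/logging.md": "docs/observability/logging.md",
    "docs/guides/metrics.md": "docs/observability/metrics.md",
    "docs/guides/disagg_perf_tuning.md": "docs/performance/tuning.md",
    "docs/guides/dynamo_run.md": "docs/reference/cli.md",
    "docs/dynamo_glossary.md": "docs/reference/glossary.md",
    "docs/support_matrix.md": "docs/reference/support-matrix.md",
    "docs/guides/tool_calling.md": "docs/guides/tool-calling.md",
}

# Target of a Markdown inline link: "](target)" or "](target#anchor)".
LINK_TARGET = re.compile(r"\]\(([^)#\s]+)(#[^)\s]*)?\)")


def rewrite_links(markdown: str, moves: dict[str, str]) -> str:
    """Replace link targets that point at a moved file; leave all other links as they are."""

    def _sub(match: re.Match) -> str:
        target, anchor = match.group(1), match.group(2) or ""
        # Accept both "docs/..." and "/docs/..." spellings and preserve whichever was used.
        prefix = "/" if target.startswith("/") else ""
        new = moves.get(target.lstrip("/"))
        return f"]({prefix}{new}{anchor})" if new else match.group(0)

    return LINK_TARGET.sub(_sub, markdown)


if __name__ == "__main__":
    print(rewrite_links("See the [metrics guide](docs/guides/metrics.md#prometheus).", MOVES))
```

Keeping the anchor and the leading-slash style intact means the rewrite changes only the path, which keeps diffs small and easy to review.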

New Documentation Structure

docs/
├── api/              # API documentation (lowercase)
├── architecture/     # Core architecture docs
├── backends/         # Backend-specific guides
├── benchmarks/       # Benchmarking guides
├── development/      # Developer guides (NEW)
├── guides/           # User guides (REDUCED)
├── kubernetes/       # K8s deployment docs
├── kvbm/             # KV Block Manager docs (NEW)
├── observability/    # Monitoring, logging, health (NEW)
├── performance/      # Performance tuning (NEW)
├── planner/          # Planner-related docs (NEW)
├── reference/        # Reference materials (NEW)
└── router/           # Router documentation

Where should the reviewer start?

1. Review Merge Conflicts (11 files)

These conflicts need careful review because main has added new KVBM documentation while this branch reorganizes the structure:

Documentation conflicts (9 files):

  • docs/backends/sglang/prometheus.md - Content conflict
  • docs/backends/trtllm/README.md - Content conflict
  • docs/backends/vllm/prometheus.md - Content conflict
  • docs/index.rst - Content conflict (structure vs new content)
  • docs/kvbm/kvbm_components.md - Rename conflict: renamed to kvbm_components.md here but kvbm_design_deepdive.md in main
  • docs/kvbm/trtllm-setup.md - Content conflict
  • docs/kvbm/vllm-setup.md - Content conflict

Recipe conflicts (3 files):

  • recipes/llama-3-70b/vllm/agg/perf.yaml
  • recipes/llama-3-70b/vllm/disagg-multi-node/perf.yaml
  • recipes/llama-3-70b/vllm/disagg-single-node/perf.yaml

2. Key Files to Review

  • docs/index.rst - Main table of contents with updated paths
  • docs/hidden_toctree.rst - Updated references to moved files
  • docs/backends/*/README.md - Updated feature matrix links
  • New directory structure under docs/

3. Verification

  • ✅ All moved files exist in new locations
  • ✅ All toctree references are valid
  • ✅ Internal links updated (1 broken link fixed in latest commit)
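
The "internal links updated" check above could be automated along these lines. This is a hedged sketch, not the verification that was actually run: `find_broken_links` is a hypothetical helper, it only understands inline Markdown links (no reference-style links and no `.rst` toctree entries), it skips external URLs, and it treats targets starting with `/` as filesystem-absolute, so those are reported as broken.

```python
import re
from pathlib import Path

# Inline Markdown link (or image) targets, ignoring any "#anchor" suffix.
LINK = re.compile(r"\[[^\]]*\]\(([^)#\s]+)(?:#[^)\s]*)?\)")


def find_broken_links(root: Path) -> list[tuple[Path, str]]:
    """Return (file, target) pairs for relative links under `root` whose target does not exist."""
    broken = []
    for md in sorted(root.rglob("*.md")):
        for target in LINK.findall(md.read_text(encoding="utf-8")):
            # External URLs and mail links are out of scope for this check.
            if "://" in target or target.startswith("mailto:"):
                continue
            # Relative links resolve against the directory of the file that contains them.
            if not (md.parent / target).resolve().exists():
                broken.append((md.relative_to(root), target))
    return broken


if __name__ == "__main__":
    for path, target in find_broken_links(Path("docs")):
        print(f"{path}: broken link -> {target}")
```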

Related Issues

  • Relates to 0.6.0 release documentation cleanup

Checklist

  • DCO sign-off added to all commits
  • PR title follows conventional commit format (docs:)
  • All internal documentation links updated
  • No files left in old locations after moves
  • Merge conflicts resolved (requires reviewer input)
  • Documentation builds successfully (will verify after merge)

Summary by CodeRabbit

  • Documentation

    • Restructured documentation paths for improved organization, consolidating planner, KVBM, and observability guides
    • Added comprehensive architecture documentation for DynamoGraphDeployment specifications
  • Chores

    • Updated container image versions to 0.6.0 across all backends
    • Migrated to official NVIDIA container registry (nvcr.io/nvidia/ai-dynamo)
    • Updated benchmark and profiling workflows with new image references

saturley-hall and others added 25 commits October 14, 2025 17:38
Signed-off-by: Graham King <grahamk@nvidia.com>
Signed-off-by: Harrison Saturley-Hall <hsaturleyhal@nvidia.com>
Co-authored-by: Graham King <grahamk@nvidia.com>
Signed-off-by: Harrison Saturley-Hall <hsaturleyhal@nvidia.com>
Signed-off-by: Harrison King Saturley-Hall <hsaturleyhal@nvidia.com>
Co-authored-by: ishandhanani <82981111+ishandhanani@users.noreply.github.com>
Signed-off-by: Biswa Panda <biswa.panda@gmail.com>
Signed-off-by: lkomali <lkomali@nvidia.com>
Signed-off-by: Harrison King Saturley-Hall <hsaturleyhal@nvidia.com>
Co-authored-by: Biswa Panda <biswa.panda@gmail.com>
Co-authored-by: lkomali <lkomali@nvidia.com>
Co-authored-by: Harshini Komali <157742537+lkomali@users.noreply.github.com>
Signed-off-by: Harrison Saturley-Hall <harrison.saturley.hall@gmail.com>
Co-authored-by: Harrison Saturley-Hall <hsaturleyhal@nvidia.com>
- Rename API/ → api/ for consistency
- Create observability/, performance/, development/, reference/ sections
- Move guides/ content to appropriate sections:
  * observability/: health-checks, logging, metrics
  * development/: backend-guide, runtime-guide
  * performance/: tuning
  * backends/: kvbm-setup guides
  * reference/: CLI, glossary, support-matrix
- Keep guides/tool-calling.md (genuine how-to)
- Remove empty runtime/ and deploy/ directories

Signed-off-by: athreesh <anish.maddipoti@utexas.edu>
- Update API/ → api/ references
- Update guides/* → new locations (observability/, development/, performance/, reference/)
- Update index.rst navigation with new paths
- Update hidden_toctree.rst with new paths
- Update cross-references in kubernetes/metrics.md

Signed-off-by: athreesh <anish.maddipoti@utexas.edu>
Signed-off-by: athreesh <anish.maddipoti@utexas.edu>
Signed-off-by: athreesh <anish.maddipoti@utexas.edu>
Signed-off-by: athreesh <anish.maddipoti@utexas.edu>
- Create top-level kvbm/ directory with all KVBM content
  * Architecture docs from architecture/kvbm_*.md
  * Setup guides from backends/*/kvbm-setup.md → kvbm/*-setup.md
- Create top-level planner/ directory
  * Move planner docs from architecture/
- Move router/ out of components/ to top level
- Remove empty components/ directory
- Update all internal links and navigation

Signed-off-by: athreesh <anish.maddipoti@utexas.edu>
Signed-off-by: athreesh <anish.maddipoti@utexas.edu>
Update all documentation links to reflect the new structure where
kvbm/, planner/, and router/ directories have been moved to the top
level of the docs folder.

Changes:
- Move docs/kubernetes/sla_planner_quickstart.md to docs/planner/
- Update all references from /docs/kubernetes/sla_planner_quickstart
  to /docs/planner/sla_planner_quickstart (8 files)
- Update references from /docs/architecture/planner_intro to
  /docs/planner/planner_intro (2 files)
- Update references from /docs/architecture/kvbm_intro to
  /docs/kvbm/kvbm_intro (1 file)
- Fix critical Sphinx toctree reference in docs/index.rst
- Remove temporary analysis files (DOCS_ANALYSIS.md, FINAL_STRUCTURE.md)

All documentation links now correctly point to the reorganized structure.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: athreesh <anish.maddipoti@utexas.edu>
Fix relative path in components/src/dynamo/planner/README.md to
correctly resolve to docs/planner/planner_intro.rst from the repo root.

The path ../../docs/planner resolves to components/src/docs/planner
which doesn't exist. Updated to ../../../../docs/planner to properly
navigate from components/src/dynamo/planner back to repo root.

Signed-off-by: athreesh <anish.maddipoti@utexas.edu>
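
The path arithmetic in the commit message above can be double-checked with Python's standard library. This sketch assumes the renderer resolves a relative link against the directory of the file that contains it, which is how GitHub treats relative links in Markdown; `resolve_link` is a hypothetical helper.

```python
import posixpath


def resolve_link(source_file: str, link: str) -> str:
    """Resolve a relative link against the directory of the file that contains it."""
    return posixpath.normpath(posixpath.join(posixpath.dirname(source_file), link))


readme = "components/src/dynamo/planner/README.md"
print(resolve_link(readme, "../../docs/planner"))        # components/src/docs/planner
print(resolve_link(readme, "../../../../docs/planner"))  # docs/planner
```

Each `../` climbs one directory from `components/src/dynamo/planner`, so four are needed to reach the repository root.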
Fix repository-relative links in README.md to work correctly on GitHub.
Links with /docs/... resolve to https://github.com/docs/... and break.
Changed to docs/... for proper relative resolution.

Fixes:
- /docs/planner/load_planner.md -> docs/planner/load_planner.md
- /docs/planner/sla_planner.md -> docs/planner/sla_planner.md
- /docs/kvbm/kvbm_architecture.md -> docs/kvbm/kvbm_architecture.md

Signed-off-by: athreesh <anish.maddipoti@utexas.edu>
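
For the top-level README.md, the fix described in this commit is mechanical: drop the leading slash from `](/docs/...)` targets. A minimal sketch follows, assuming the file sits at the repository root (a nested file would need a `../` prefix instead); `make_repo_relative` is a hypothetical name.

```python
import re

# A Markdown link whose target is root-relative and starts with "/docs/".
ROOT_RELATIVE_DOCS = re.compile(r"\]\(/(docs/[^)\s]*)\)")


def make_repo_relative(markdown: str) -> str:
    """Turn "](/docs/...)" into "](docs/...)" so the link resolves from the repository root."""
    return ROOT_RELATIVE_DOCS.sub(r"](\1)", markdown)


print(make_repo_relative("[SLA planner](/docs/planner/sla_planner.md)"))
```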
Update all broken links to reflect the new documentation structure:

Metrics and Logging:
- docs/guides/metrics.md → docs/observability/metrics.md (4 files)
- docs/guides/logging.md → docs/observability/logging.md (1 file)

Backend and Development:
- backend.md → ../development/backend-guide.md (3 files)
- docs/runtime/README.md → docs/development/runtime-guide.md (1 file)

CLI and KVBM:
- docs/guides/dynamo_run.md → docs/reference/cli.md (1 file)
- docs/guides/run_kvbm_in_trtllm.md → docs/kvbm/trtllm-setup.md (1 file)

Files updated:
- components/backends/sglang/prometheus.md (2 instances)
- components/backends/vllm/prometheus.md (2 instances)
- deploy/metrics/README.md
- deploy/tracing/README.md
- docs/backends/trtllm/README.md
- docs/kubernetes/create_deployment.md
- docs/observability/health-checks.md
- docs/observability/logging.md
- docs/observability/metrics.md
- lib/bindings/python/README.md

Cleanup:
- Remove temporary reorganization documentation files
  (REORG_SUMMARY.md, RESTRUCTURE_PLAN.md, URL_MIGRATION_GUIDE.md)

All documentation links now correctly reference the reorganized structure.

Signed-off-by: athreesh <anish.maddipoti@utexas.edu>
Signed-off-by: alec-flowers <aflowers@nvidia.com>
Merges all documentation reorganization changes and other updates from
docs-reorg branch into release/0.6.0, including the fix for the planner
README hyperlink.

Conflicts resolved by accepting docs-reorg version for all conflicts.

Signed-off-by: athreesh <anish.maddipoti@utexas.edu>
Signed-off-by: Dan Gil <dagil@nvidia.com>
@dagil-nvidia dagil-nvidia requested a review from a team as a code owner October 20, 2025 21:02
@dagil-nvidia dagil-nvidia requested a review from a team October 20, 2025 21:02
@dagil-nvidia dagil-nvidia requested review from a team as code owners October 20, 2025 21:02
@copy-pr-bot

copy-pr-bot Bot commented Oct 20, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@coderabbitai
Contributor

coderabbitai Bot commented Oct 20, 2025

Walkthrough

This PR consolidates a configuration and documentation restructuring: it introduces DynamoGraphDeployment architecture documentation with public type definitions, updates the default Docker registry from my-registry to nvcr.io/nvidia/ai-dynamo and container image tags from my-tag to 0.6.0, reorganizes documentation paths (architecture → planner/kvbm/observability), and downgrades the sglang dependency from post2 to post1.

Changes

Cohort / File(s) Change Summary
Architecture & Public Types
DGD_ARCHITECTURE_ANALYSIS.md, dynamographdeployment_types.go, dynamocomponentdeployment_types.go
Introduces DGD architecture documentation and four new public types: DynamoGraphDeploymentSpec, DynamoGraphDeploymentStatus, DynamoComponentDeploymentSharedSpec, and MultinodeSpec. Documents planner, router, worker deployment flows and multi-DGD considerations.
Build Configuration & Docker Registry
Earthfile, deploy/cloud/operator/Earthfile, container/Dockerfile.sglang*
Updates ARG DOCKER_SERVER default from my-registry to nvcr.io/nvidia/ai-dynamo across multiple Earthfile targets. Adjusts sglang build argument from 0.5.3.post2 to 0.5.3.post1.
vLLM Deployment Images
components/backends/vllm/deploy/*, recipes/llama-3-70b/vllm/*, examples/basics/kubernetes/Distributed_Inference/agg_router.yaml, examples/multimodal/deploy/*, tests/planner/perf_test_configs/*, tests/planner/scaling/disagg_planner.yaml
Updates vllm-runtime container image references from my-tag to 0.6.0 across Frontend, Decode, Prefill, and related worker deployments (~30+ files).
SGLang Deployment Images
components/backends/sglang/deploy/*
Updates sglang-runtime image references from my-registry/sglang-runtime:my-tag to nvcr.io/nvidia/ai-dynamo/sglang-runtime:0.6.0 across Frontend, worker, and router components (~7 files).
TensorRT-LLM Deployment Images
components/backends/trtllm/deploy/*, recipes/gpt-oss-120b/trtllm/agg/deploy.yaml
Updates trtllm-runtime image references from my-registry/trtllm-runtime:my-tag to nvcr.io/nvidia/ai-dynamo/trtllm-runtime:0.6.0 (~10 files).
Other Container Image Updates
benchmarks/incluster/benchmark_job.yaml, benchmarks/profiler/utils/config.py, examples/custom_backend/hello_world/deploy/hello_world.yaml, examples/deployments/ECS/task_definition_*.json
Updates miscellaneous container references to 0.6.0 or NVIDIA registry paths. Config defaults updated for DgdPlannerServiceConfig.
Profiler CLI & Planner Integration
benchmarks/profiler/profile_sla.py
Replaces separate parser setup with consolidated argparse-based CLI; integrates planner arguments via add_planner_arguments_to_parser; adds post-profile deployment logic guarded by args.deploy_after_profile.
Documentation Structure Reorganization
docs/hidden_toctree.rst, docs/index.rst, README.md
Relocates docs from docs/architecture/* to docs/planner/*, docs/kvbm/*, and consolidates docs/guides/* into docs/reference/, docs/observability/, and docs/development/. Updates all toctree references accordingly.
Targeted Documentation Path Updates
docs/architecture/architecture.md, docs/backends/*/*.md, docs/kubernetes/*.md, docs/planner/*.md, docs/kvbm/*.md, docs/observability/*.md, docs/benchmarks/*.md, docs/_includes/install.rst, components/src/dynamo/planner/README.md, components/backends/*/prometheus.md, deploy/(metrics|tracing)/README.md, lib/bindings/python/README.md, examples/README.md
Updates links and references from old architecture/guides paths to new planner/kvbm/observability/reference paths; adds GKE and Launch Tools sections to examples; updates container image tags in documentation examples.
Performance Test Workflow Changes
recipes/llama-3-70b/vllm/(agg|disagg-single-node|disagg-multi-node)/perf.yaml
Increases backoffLimit from 1 to 3; refactors perf job to replace Python-based aiperf setup with self-contained vllm-runtime container; introduces model readiness polling loop via shell script before benchmark execution; restructures artifact generation flow.
Dependency & Configuration Updates
pyproject.toml
Downgrades sglang group dependency from 0.5.3.post2 to 0.5.3.post1.
Minor Documentation Additions
deploy/cloud/pre-deployment/nixl/README.md
Adds SPDX license header block (no functional changes).
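
The image updates summarized in this table are a mechanical substitution. A minimal sketch, assuming the only placeholder forms are `my-registry/<image>:my-tag` and `nvcr.io/nvidia/ai-dynamo/<image>:my-tag` as described above; `pin_release_images` is hypothetical and not part of the repository.

```python
import re

# Placeholder references such as "my-registry/trtllm-runtime:my-tag"
# or "nvcr.io/nvidia/ai-dynamo/vllm-runtime:my-tag".
PLACEHOLDER_IMAGE = re.compile(r"(?:my-registry|nvcr\.io/nvidia/ai-dynamo)/([a-z0-9._-]+):my-tag\b")
RELEASE_IMAGE = r"nvcr.io/nvidia/ai-dynamo/\1:0.6.0"


def pin_release_images(manifest: str) -> str:
    """Point placeholder image references at the NVIDIA registry and the 0.6.0 tag."""
    return PLACEHOLDER_IMAGE.sub(RELEASE_IMAGE, manifest)


print(pin_release_images("image: my-registry/sglang-runtime:my-tag"))
```

References that already carry a concrete tag are left alone, so running the substitution twice is harmless.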

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Profiler as benchmarks/profiler/<br>profile_sla.py
    participant Parser as Argparse CLI
    participant DGD as DynamoGraphDeployment
    participant Planner as Planner Service

    User->>Profiler: Launch with args
    Profiler->>Parser: Build CLI with planner args
    Parser->>Profiler: Return parsed arguments
    Profiler->>Profiler: Run profiling workflow
    Note over Profiler: Original profiling logic
    alt args.deploy_after_profile && !args.dry_run
        Profiler->>DGD: generate_dgd_config_with_planner()
        DGD->>Planner: Deploy optimized config
        Planner->>DGD: Monitor & adjust replicas
    end
    Profiler->>User: Complete with artifacts
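
The control flow in the diagram (one argparse CLI that also carries planner options, with deployment gated on `args.deploy_after_profile` and `not args.dry_run`) can be sketched as below. This is an illustration under assumptions, not the contents of `profile_sla.py`: `add_planner_arguments` is a hypothetical stand-in for `add_planner_arguments_to_parser` (whose real signature is not shown here), and the flag spellings and the `--adjustment-interval` option are invented for the example.

```python
import argparse
from typing import Optional, Sequence


def add_planner_arguments(parser: argparse.ArgumentParser) -> None:
    """Hypothetical stand-in for add_planner_arguments_to_parser."""
    group = parser.add_argument_group("planner")
    # Illustrative flag only; the real planner options live in benchmarks/profiler/utils/planner_utils.py.
    group.add_argument("--adjustment-interval", type=int, default=60,
                       help="Seconds between scaling decisions (hypothetical).")


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="SLA profiler CLI (sketch)")
    # Assumed spellings; the walkthrough only names args.dry_run and args.deploy_after_profile.
    parser.add_argument("--dry-run", action="store_true")
    parser.add_argument("--deploy-after-profile", action="store_true")
    add_planner_arguments(parser)  # planner options share the profiler's namespace
    return parser


def main(argv: Optional[Sequence[str]] = None) -> str:
    args = build_parser().parse_args(argv)
    # ... the profiling workflow would run here ...
    if args.deploy_after_profile and not args.dry_run:
        # Stand-in for generate_dgd_config_with_planner() followed by creating the deployment.
        return f"deploy with planner (interval={args.adjustment_interval}s)"
    return "profile only"


if __name__ == "__main__":
    print(main())
```

Injecting the planner options into the same parser means a single `args` namespace can be handed to both the profiler and the config generator.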

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~35 minutes

Rationale: The PR spans ~80+ files but exhibits high homogeneity in the bulk of changes (repetitive container image tag updates across ~50+ deployment YAML files and documentation link path reorganizations across ~25+ files). These patterns are straightforward to verify once understood. However, complexity arises from: (1) the breadth of documentation restructuring requiring verification of path consistency, (2) the new public type definitions and architecture documentation requiring understanding of DGD design, (3) the refactored perf job workflows with new polling and artifact logic, and (4) the profiler CLI integration changes. The variety of change categories (build config, images, docs, workflows, types) prevents this from being "Simple," but the predominance of homogeneous updates keeps it from reaching "Complex."

Possibly related PRs

Poem

🐰 Registry paths hop to NVIDIA's gate,
Docs reorganized—planner, kvbm partake,
vLLM and SGLang now versioned just right (0.6.0!),
New DGD types emerge in the architectural light,
Perf jobs poll models with patient delight! 🚀

Pre-merge checks

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Title Check ✅ Passed The PR title "docs: reorganization of documentation structure for 0.6.0" clearly and specifically describes the primary change in the changeset. It follows conventional commit format with the "docs:" prefix and accurately captures the main objective of reorganizing the documentation structure for the release. The title is concise and specific enough for reviewers scanning history to understand the nature of the changes without being vague or overly generic.
Description Check ✅ Passed The PR description meets all required sections from the template: Overview explains the purpose clearly, Details provides comprehensive information about structural changes with specific file movements and a visual structure diagram, Where should the reviewer start provides detailed guidance with merge conflict listings and key files to review, and Related Issues indicates the connection to the 0.6.0 release. The description is thorough and well-organized, exceeding the minimal template requirements by including verification checklists and specific conflict file listings that aid the review process.


Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
docs/backends/sglang/README.md (1)

58-58: Fix broken documentation link.

Line 58 references an incorrect path. The target docs/architecture/request_migration.md exists; from docs/backends/sglang/README.md the link should use the relative path ../../architecture/request_migration.md, matching the pattern of the other documentation links in the file (lines 37, 38, 39, 42).

-| `--migration-limit` | Max times a request can migrate between workers for fault tolerance. See [Request Migration Architecture](../../../docs/architecture/request_migration.md). | `0` (disabled) | N/A |
+| `--migration-limit` | Max times a request can migrate between workers for fault tolerance. See [Request Migration Architecture](../../architecture/request_migration.md). | `0` (disabled) | N/A |
🧹 Nitpick comments (4)
DGD_ARCHITECTURE_ANALYSIS.md (2)

1-6: Consider relocating to docs/ directory.

Given this PR's focus on documentation reorganization, this new architecture analysis file should likely be placed under the new docs/ structure (perhaps docs/planner/ or docs/development/) rather than at the repository root.


83-83: Add language identifiers to fenced code blocks.

Multiple fenced code blocks lack language identifiers, which affects syntax highlighting and accessibility.

Add appropriate language identifiers:

  • Go code blocks: ```go
  • Python code blocks: ```python
  • Rust code blocks: ```rust
  • YAML code blocks: ```yaml
  • Shell/text blocks: ```text or ```bash

Also applies to: 104-104, 114-114, 143-143, 190-190, 196-196, 206-206, 215-215, 235-235, 250-250, 256-256, 268-268, 276-276, 283-283, 294-294, 318-318, 326-326, 333-333
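
The MD040 findings above can be reproduced with a small scanner. A minimal sketch, not markdownlint itself: `fences_missing_language` is hypothetical, and it only handles backtick fences (no `~~~` fences and no nesting of fences with different lengths).

```python
FENCE = "`" * 3  # built at runtime so this snippet can itself sit inside a fenced block


def fences_missing_language(markdown: str) -> list[int]:
    """Return 1-based line numbers of opening backtick fences that have no language (MD040)."""
    missing = []
    in_fence = False
    for lineno, line in enumerate(markdown.splitlines(), start=1):
        stripped = line.strip()
        if not stripped.startswith(FENCE):
            continue
        if in_fence:
            in_fence = False  # closing fence: no language expected
            continue
        in_fence = True
        # Whatever follows the backticks on an opening fence is the info string (language).
        if not stripped.lstrip("`").strip():
            missing.append(lineno)
    return missing
```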

docs/kvbm/vllm-setup.md (1)

22-22: Improve link text for clarity and linting compliance.

The link text "here" is not descriptive and violates markdown linting rule MD059. Provide more contextual text that describes what users will find at the link destination.

Consider updating the link text as follows:

-To learn what KVBM is, please check [here](https://docs.nvidia.com/dynamo/latest/kvbm/kvbm_intro.html)
+To learn what KVBM is, please check the [KVBM introduction](https://docs.nvidia.com/dynamo/latest/kvbm/kvbm_intro.html)

This provides better context for readers and accessibility tools.

docs/kvbm/trtllm-setup.md (1)

22-22: Replace non-descriptive link text "here" with descriptive alternative.

The KVBM intro documentation file exists at the new location (./docs/kvbm/kvbm_intro.rst), confirming the link target is valid. The .html extension in the external URL is correct for Sphinx-built documentation.

However, the link text "here" remains non-descriptive per Markdown best practices. Apply the suggested improvement:

-To learn what KVBM is, please check [here](https://docs.nvidia.com/dynamo/latest/kvbm/kvbm_intro.html)
+To learn what KVBM is, check the [KVBM introduction](https://docs.nvidia.com/dynamo/latest/kvbm/kvbm_intro.html)
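
Both MD059 findings come from link text that says nothing about its destination. A hedged sketch of the check follows: the set of generic phrases is an assumption chosen for illustration, not markdownlint's exact configuration, and `non_descriptive_links` is a hypothetical helper that only inspects inline links.

```python
import re

# Assumed list of non-descriptive link texts (illustrative, not markdownlint's config).
GENERIC_LINK_TEXT = {"here", "click here", "link", "more"}
INLINE_LINK = re.compile(r"\[([^\]]+)\]\(([^)\s]+)\)")


def non_descriptive_links(markdown: str) -> list[tuple[int, str]]:
    """Return (line number, link text) for inline links whose text is generic."""
    hits = []
    for lineno, line in enumerate(markdown.splitlines(), start=1):
        for text, _target in INLINE_LINK.findall(line):
            if text.strip().lower() in GENERIC_LINK_TEXT:
                hits.append((lineno, text))
    return hits
```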
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f6ed01b and b52f23a.

📒 Files selected for processing (87)
  • DGD_ARCHITECTURE_ANALYSIS.md (1 hunks)
  • Earthfile (3 hunks)
  • README.md (2 hunks)
  • benchmarks/incluster/benchmark_job.yaml (1 hunks)
  • benchmarks/profiler/profile_sla.py (3 hunks)
  • benchmarks/profiler/utils/config.py (1 hunks)
  • components/backends/sglang/deploy/README.md (2 hunks)
  • components/backends/sglang/deploy/agg.yaml (2 hunks)
  • components/backends/sglang/deploy/agg_logging.yaml (2 hunks)
  • components/backends/sglang/deploy/agg_router.yaml (2 hunks)
  • components/backends/sglang/deploy/disagg-multinode.yaml (3 hunks)
  • components/backends/sglang/deploy/disagg.yaml (3 hunks)
  • components/backends/sglang/deploy/disagg_planner.yaml (4 hunks)
  • components/backends/sglang/prometheus.md (2 hunks)
  • components/backends/trtllm/deploy/README.md (3 hunks)
  • components/backends/trtllm/deploy/agg-with-config.yaml (2 hunks)
  • components/backends/trtllm/deploy/agg.yaml (2 hunks)
  • components/backends/trtllm/deploy/agg_router.yaml (2 hunks)
  • components/backends/trtllm/deploy/disagg-multinode.yaml (3 hunks)
  • components/backends/trtllm/deploy/disagg.yaml (3 hunks)
  • components/backends/trtllm/deploy/disagg_planner.yaml (4 hunks)
  • components/backends/trtllm/deploy/disagg_router.yaml (3 hunks)
  • components/backends/vllm/deploy/README.md (3 hunks)
  • components/backends/vllm/deploy/agg.yaml (2 hunks)
  • components/backends/vllm/deploy/agg_kvbm.yaml (2 hunks)
  • components/backends/vllm/deploy/agg_router.yaml (2 hunks)
  • components/backends/vllm/deploy/disagg-multinode.yaml (3 hunks)
  • components/backends/vllm/deploy/disagg.yaml (3 hunks)
  • components/backends/vllm/deploy/disagg_kvbm.yaml (3 hunks)
  • components/backends/vllm/deploy/disagg_kvbm_2p2d.yaml (3 hunks)
  • components/backends/vllm/deploy/disagg_kvbm_tp2.yaml (3 hunks)
  • components/backends/vllm/deploy/disagg_planner.yaml (4 hunks)
  • components/backends/vllm/deploy/disagg_router.yaml (3 hunks)
  • components/backends/vllm/prometheus.md (2 hunks)
  • components/src/dynamo/planner/README.md (1 hunks)
  • container/Dockerfile.sglang (1 hunks)
  • container/Dockerfile.sglang-wideep (1 hunks)
  • deploy/cloud/operator/Earthfile (1 hunks)
  • deploy/cloud/pre-deployment/nixl/README.md (1 hunks)
  • deploy/metrics/README.md (1 hunks)
  • deploy/tracing/README.md (1 hunks)
  • docs/_includes/install.rst (2 hunks)
  • docs/architecture/architecture.md (1 hunks)
  • docs/architecture/kv_cache_routing.md (2 hunks)
  • docs/backends/sglang/README.md (2 hunks)
  • docs/backends/trtllm/README.md (2 hunks)
  • docs/backends/trtllm/gpt-oss.md (1 hunks)
  • docs/backends/vllm/README.md (1 hunks)
  • docs/benchmarks/benchmarking.md (1 hunks)
  • docs/benchmarks/pre_deployment_profiling.md (2 hunks)
  • docs/deploy/metrics/docker-compose.yml (0 hunks)
  • docs/hidden_toctree.rst (2 hunks)
  • docs/index.rst (2 hunks)
  • docs/kubernetes/create_deployment.md (2 hunks)
  • docs/kubernetes/installation_guide.md (1 hunks)
  • docs/kubernetes/metrics.md (1 hunks)
  • docs/kvbm/trtllm-setup.md (1 hunks)
  • docs/kvbm/vllm-setup.md (1 hunks)
  • docs/observability/health-checks.md (1 hunks)
  • docs/observability/logging.md (1 hunks)
  • docs/observability/metrics.md (1 hunks)
  • docs/planner/planner_intro.rst (2 hunks)
  • docs/planner/sla_planner.md (2 hunks)
  • docs/planner/sla_planner_quickstart.md (2 hunks)
  • examples/README.md (3 hunks)
  • examples/basics/kubernetes/Distributed_Inference/agg_router.yaml (2 hunks)
  • examples/custom_backend/hello_world/deploy/hello_world.yaml (2 hunks)
  • examples/deployments/ECS/task_definition_frontend.json (1 hunks)
  • examples/deployments/ECS/task_definition_prefillworker.json (1 hunks)
  • examples/multimodal/deploy/agg_llava.yaml (4 hunks)
  • examples/multimodal/deploy/agg_qwen.yaml (4 hunks)
  • lib/bindings/python/README.md (1 hunks)
  • pyproject.toml (1 hunks)
  • recipes/gpt-oss-120b/trtllm/agg/deploy.yaml (2 hunks)
  • recipes/llama-3-70b/vllm/agg/deploy.yaml (2 hunks)
  • recipes/llama-3-70b/vllm/agg/perf.yaml (2 hunks)
  • recipes/llama-3-70b/vllm/disagg-multi-node/deploy.yaml (3 hunks)
  • recipes/llama-3-70b/vllm/disagg-multi-node/perf.yaml (2 hunks)
  • recipes/llama-3-70b/vllm/disagg-single-node/deploy.yaml (3 hunks)
  • recipes/llama-3-70b/vllm/disagg-single-node/perf.yaml (2 hunks)
  • tests/planner/perf_test_configs/agg_8b.yaml (2 hunks)
  • tests/planner/perf_test_configs/disagg_8b_2p2d.yaml (3 hunks)
  • tests/planner/perf_test_configs/disagg_8b_3p1d.yaml (3 hunks)
  • tests/planner/perf_test_configs/disagg_8b_planner.yaml (4 hunks)
  • tests/planner/perf_test_configs/disagg_8b_tp2.yaml (3 hunks)
  • tests/planner/perf_test_configs/image_cache_daemonset.yaml (1 hunks)
  • tests/planner/scaling/disagg_planner.yaml (4 hunks)
💤 Files with no reviewable changes (1)
  • docs/deploy/metrics/docker-compose.yml
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-09-16T00:26:43.641Z
Learnt from: keivenchang
PR: ai-dynamo/dynamo#3035
File: lib/runtime/examples/system_metrics/README.md:65-65
Timestamp: 2025-09-16T00:26:43.641Z
Learning: The team at ai-dynamo/dynamo prefers to use consistent metric naming patterns with _total suffixes across all metric types (including gauges) for internal consistency, even when this differs from strict Prometheus conventions that reserve _total for counters only. This design decision was confirmed by keivenchang in PR 3035, referencing examples in prometheus_names.rs and input from team members.

Applied to files:

  • components/backends/vllm/prometheus.md
  • components/backends/sglang/prometheus.md
📚 Learning: 2025-09-29T19:11:14.161Z
Learnt from: nv-anants
PR: ai-dynamo/dynamo#3290
File: docs/_includes/quick_start_local.rst:13-14
Timestamp: 2025-09-29T19:11:14.161Z
Learning: ai-dynamo package requires --prerelease=allow flag for installation due to sglang dependency requiring RC versions (e.g., sglang[all]==0.5.0rc2).

Applied to files:

  • docs/backends/sglang/README.md
🧬 Code graph analysis (1)
benchmarks/profiler/profile_sla.py (3)
benchmarks/profiler/utils/config.py (1)
  • generate_dgd_config_with_planner (405-514)
benchmarks/profiler/utils/planner_utils.py (1)
  • add_planner_arguments_to_parser (132-174)
deploy/utils/dynamo_deployment.py (2)
  • DynamoDeploymentClient (98-475)
  • create_deployment (220-265)
🪛 LanguageTool
DGD_ARCHITECTURE_ANALYSIS.md

[style] ~90-~90: The words ‘Observation’ and ‘observed’ are quite similar. Consider replacing ‘observed’ with a different word.
Context: ...justment_interval` seconds - Metrics observed: - Number of requests (num_req) ...

(VERB_NOUN_SENT_LEVEL_REP)


[grammar] ~122-~122: Use a hyphen to join words.
Context: ...replicas, blocking=False)` ### Decision Making Assumptions - **Homogeneous worke...

(QB_NEW_EN_HYPHEN)

🪛 markdownlint-cli2 (0.18.1)
docs/kvbm/vllm-setup.md

22-22: Link text should be descriptive

(MD059, descriptive-link-text)

docs/kvbm/trtllm-setup.md

22-22: Link text should be descriptive

(MD059, descriptive-link-text)

DGD_ARCHITECTURE_ANALYSIS.md

33-33: Link text should be descriptive

(MD059, descriptive-link-text)


83-83: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


104-104: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


114-114: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


143-143: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


190-190: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


196-196: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


206-206: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


215-215: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


219-219: Bare URL used

(MD034, no-bare-urls)


229-229: Blank line inside blockquote

(MD028, no-blanks-blockquote)


235-235: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


246-246: Link text should be descriptive

(MD059, descriptive-link-text)


250-250: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


256-256: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


268-268: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


276-276: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


283-283: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


294-294: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


318-318: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


326-326: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


333-333: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🔇 Additional comments (88)
examples/basics/kubernetes/Distributed_Inference/agg_router.yaml (1)

41-41: Verify version consistency across related deployment manifests.

The changes to agg_router.yaml are correctly updated to 0.6.0 with no remaining placeholders. However, the verification revealed a version inconsistency: shared_frontend/shared_frontend.yaml still uses vllm-runtime:0.5.0 (5 instances on lines 26, 48, 77, 98, 119) while agg_router.yaml uses 0.6.0.

Clarify intent: If this PR aims for comprehensive 0.6.0 release updates across all deployment manifests, shared_frontend.yaml should also be updated. If the version difference is intentional (e.g., different services require different versions), please confirm this is expected and document the reason.
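One way to answer the consistency question is to collect every `image:` tag per repository across the manifests and report any repository pinned to more than one tag. The sketch below assumes image lines have the form `image: registry/repo:tag`. It uses a regex rather than a YAML parser, so list-item or multi-line layouts are not handled. The manifest excerpts in the usage are illustrative one-liners, not the full files:

```python
import re
from collections import defaultdict

# Matches lines such as `image: nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.6.0`.
IMAGE_LINE = re.compile(
    r"^[ \t]*image:[ \t]*[\"']?([^\s\"']+):([\w.\-]+)[\"']?[ \t]*$", re.MULTILINE
)


def inconsistent_tags(manifests: dict[str, str]) -> dict[str, set[str]]:
    """Return image repositories that are pinned to more than one tag."""
    tags: dict[str, set[str]] = defaultdict(set)
    for text in manifests.values():
        for repo, tag in IMAGE_LINE.findall(text):
            tags[repo].add(tag)
    return {repo: found for repo, found in tags.items() if len(found) > 1}


# Illustrative one-line excerpts, not the full manifests.
manifests = {
    "agg_router.yaml": "    image: nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.6.0\n",
    "shared_frontend.yaml": "    image: nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.5.0\n",
}
print(inconsistent_tags(manifests))  # vllm-runtime is reported with both 0.5.0 and 0.6.0
```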

benchmarks/profiler/profile_sla.py (2)

16-16: LGTM! Import additions support the new planner integration.

The new imports are correctly added and used throughout the file: argparse for the inline CLI parser, WORKER_COMPONENT_NAMES and generate_dgd_config_with_planner for the planner configuration, and add_planner_arguments_to_parser for dynamic argument injection.

Also applies to: 26-32


762-905: LGTM! Comprehensive argparse setup with good organization.

The inline argparse parser is well-structured with:

  • Clear argument grouping (profiling, interpolation, MoE, deployment, AI configurator)
  • Helpful descriptions and default values
  • Proper integration with planner arguments via add_planner_arguments_to_parser() (line 876)

This replaces the previous create_profiler_parser() approach with a more maintainable, self-contained implementation.

docs/backends/trtllm/gpt-oss.md (1)

52-52: Image tag and registry update is consistent with release version.

The container image reference has been properly updated from a placeholder to the official 0.6.0 release tag with the NVIDIA AI-Dynamo registry, in line with the PR objectives.

deploy/metrics/README.md (1)

6-6: Documentation reference path updated consistently with reorganization.

The metrics guide link has been properly updated to reflect the new documentation structure where metrics documentation has been moved to the observability section.

deploy/tracing/README.md (1)

22-22: Documentation reference path updated consistently with reorganization.

The logging guide reference has been properly updated to reflect the new documentation structure where logging documentation has been moved to the observability section, consistent with the metrics README update.

docs/planner/sla_planner_quickstart.md (2)

109-109: Image tag and registry updated consistently with release version.

The Docker image reference has been properly updated to use the official 0.6.0 release tag with the NVIDIA AI-Dynamo registry.


252-252: Documentation link updated to reflect planner subsection reorganization.

The architecture documentation link has been properly updated to point to the new planner documentation location, aligned with PR's documentation reorganization objectives.

components/src/dynamo/planner/README.md (1)

18-18: Relative documentation path updated correctly for new directory structure.

The planner documentation link has been properly updated with the correct relative path traversal (from components/src/dynamo/planner/ to root level docs/planner/), aligning with the documentation reorganization.

examples/deployments/ECS/task_definition_frontend.json (1)

6-6: Container image updated to official NVIDIA AI-Dynamo registry with 0.6.0 tag.

The ECS task definition now references the official container image with proper registry and release tag, consistent with other deployment updates in this PR.

docs/kubernetes/create_deployment.md (2)

93-93: Documentation link updated to reference section.

The dynamo run guide reference has been updated to point to the reference section. Verify this is the intended location—if this is contributor-focused documentation, consider whether /docs/development/cli.md or similar would be more appropriate.


161-161: Image pull secrets example updated with realistic registry hostname.

The example imagePullSecrets name has been updated from a generic placeholder to reflect the actual NVIDIA registry. Ensure the surrounding documentation clearly indicates this is an example that users should customize for their actual secret configurations.

deploy/cloud/pre-deployment/nixl/README.md (1)

1-4: SPDX license header properly added.

Standard copyright and Apache-2.0 license header has been added to the README file, aligning with repository licensing standards.

benchmarks/incluster/benchmark_job.yaml (1)

21-21: LGTM!

The image tag update from placeholder to nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.6.0 is consistent with the 0.6.0 release updates across the PR and aligns with the TODO comment.

docs/_includes/install.rst (1)

13-13: LGTM!

The installation documentation is correctly updated to reflect the 0.6.0 release with proper version pinning for both PyPI and Docker image references.

Also applies to: 44-44

pyproject.toml (1)

63-63: Verify sglang post1 stability.

The sglang dependency is being downgraded from 0.5.3.post2 to 0.5.3.post1. While the enriched context indicates this aligns with broader ecosystem updates reflected in container Dockerfiles, confirm that post1 is the stable/tested version and doesn't regress any functionality used by the codebase.

lib/bindings/python/README.md (1)

53-53: Verify documentation path is correct.

The link has been updated from a runtime README to docs/development/runtime-guide.md. Ensure that this file exists at the new location after the documentation reorganization and contains the prerequisites content expected by users of this README.
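Several comments in this review ask for confirmation that a rewritten link resolves. The sketch below shows a minimal offline check. It assumes inline Markdown links of the form `[text](path)` and skips URLs, in-page anchors, and reference-style links. Links starting with `/` are treated as relative to the repository root. The directory layout in the usage is hypothetical:

```python
import re
import tempfile
from pathlib import Path

# Inline Markdown links: [text](target)
LINK = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")


def broken_relative_links(md_file: Path, repo_root: Path) -> list[str]:
    """Return link targets in md_file that do not resolve to an existing path."""
    broken = []
    for target in LINK.findall(md_file.read_text(encoding="utf-8")):
        if "://" in target or target.startswith(("#", "mailto:")):
            continue  # external URL or in-page anchor
        path_part = target.split("#", 1)[0]
        if path_part.startswith("/"):
            resolved = repo_root / path_part.lstrip("/")  # repo-root-relative link
        else:
            resolved = md_file.parent / path_part  # file-relative link
        if not resolved.exists():
            broken.append(target)
    return broken


# Hypothetical layout: only backend-guide.md exists.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "docs" / "development").mkdir(parents=True)
    (root / "docs" / "development" / "backend-guide.md").write_text("# Backend\n")
    readme = root / "lib" / "bindings" / "python" / "README.md"
    readme.parent.mkdir(parents=True)
    readme.write_text(
        "[runtime guide](../../../docs/development/runtime-guide.md)\n"
        "[backend guide](/docs/development/backend-guide.md)\n"
    )
    result = broken_relative_links(readme, root)

print(result)  # ['../../../docs/development/runtime-guide.md']
```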

examples/deployments/ECS/task_definition_prefillworker.json (1)

6-6: LGTM!

The ECS task definition is correctly updated to reference the official runtime image with the 0.6.0 tag.

examples/multimodal/deploy/agg_qwen.yaml (1)

17-17: LGTM!

All four services in the multimodal deployment spec are consistently updated to the correct runtime image with version 0.6.0. No other fields were modified, preserving the deployment logic.

Also applies to: 28-28, 45-45, 62-62

components/backends/vllm/deploy/disagg_kvbm.yaml (1)

16-16: LGTM!

All three worker types in the disaggregated KVBM deployment are consistently updated to 0.6.0. The image references are correct for the KVBM test configuration.

Also applies to: 27-27, 59-59

docs/architecture/architecture.md (1)

57-58: Links verify correctly — no issues found.

Both target documentation files exist at their new paths:

  • docs/kvbm/kvbm_intro.rst
  • docs/planner/planner_intro.rst

The relative path updates in the architecture documentation are correct and resolve to valid destinations. No broken links detected.

deploy/cloud/operator/Earthfile (1)

48-48: LGTM! Registry update aligns with the 0.6.0 release.

The update from the placeholder registry to NVIDIA's production registry is consistent with the broader migration across the PR.

docs/planner/planner_intro.rst (1)

80-80: LGTM! Toctree reference correctly updated.

The relative path update is consistent with the documentation reorganization.

components/backends/vllm/deploy/README.md (2)

72-72: LGTM! Image references updated to production registry.

The container image references are now pointing to the official NVIDIA registry with the 0.6.0 release tag, replacing the placeholder values.

Also applies to: 119-119


240-240: LGTM! Documentation link correctly updated.

The SLA Planner quickstart link has been updated to reflect the new documentation structure, with the correct relative path from the current file location.

components/backends/vllm/deploy/disagg_kvbm_2p2d.yaml (1)

16-16: LGTM! All service images consistently updated to 0.6.0.

The three services (Frontend, VllmDecodeWorker, VllmPrefillWorker) have been updated consistently to use the production NVIDIA registry with version 0.6.0.

Also applies to: 27-27, 63-63

components/backends/trtllm/deploy/disagg_router.yaml (1)

16-16: LGTM! TensorRT-LLM service images consistently updated.

All three services (Frontend, TRTLLMPrefillWorker, TRTLLMDecodeWorker) have been updated to use the production NVIDIA registry with version 0.6.0, correctly using the trtllm-runtime image.

Also applies to: 30-30, 58-58

benchmarks/profiler/utils/config.py (1)

109-109: LGTM! Default planner image updated to production registry.

The default container image for the planner service has been updated from placeholder values to the production NVIDIA registry with version 0.6.0, consistent with other components in this PR.

components/backends/sglang/prometheus.md (1)

13-13: LGTM! Metrics documentation paths updated correctly.

Both references to the Dynamo metrics documentation have been updated to reflect the reorganization from docs/guides/metrics.md to docs/observability/metrics.md, with correct relative paths.

Also applies to: 94-94

docs/backends/trtllm/README.md (2)

311-311: Link path and filename change verified.

The documentation link has been successfully updated from docs/architecture/run_kvbm_in_trtllm.md to docs/kvbm/trtllm-setup.md. The new file exists and the old path has been removed. No action required.


58-60: All documentation paths verified and valid.

The script confirms that all three updated documentation paths exist at their new locations:

  • docs/planner/sla_planner.md
  • docs/planner/load_planner.md
  • docs/kvbm/kvbm_architecture.md

No broken links or missing files found.

examples/multimodal/deploy/agg_llava.yaml (1)

17-17: Registry and image tag updates applied consistently.

All four container image references have been updated uniformly from the generic registry to the NVIDIA AI-Dynamo registry with the 0.6.0 release tag. No functional changes were made to the deployment configuration.

Also applies to: 28-28, 45-45, 62-62

components/backends/vllm/deploy/disagg-multinode.yaml (1)

16-16: Registry and image tag updates applied consistently across all workers.

All three vllm-runtime image references (Frontend, decode worker, prefill worker) have been updated uniformly to the NVIDIA AI-Dynamo registry with the 0.6.0 tag.

Also applies to: 37-37, 60-60

components/backends/sglang/deploy/README.md (1)

64-64: Image references updated consistently to 0.6.0 release tag.

Both container image references have been updated to the NVIDIA AI-Dynamo registry with the 0.6.0 tag, aligning with the release version.

Also applies to: 95-95

components/backends/trtllm/deploy/disagg-multinode.yaml (1)

98-98: Registry and image tag updates applied consistently across all workers.

All three trtllm-runtime image references (Frontend, prefill worker, decode worker) have been updated uniformly to the NVIDIA AI-Dynamo registry with the 0.6.0 tag.

Also applies to: 130-130, 170-170

components/backends/trtllm/deploy/README.md (1)

92-92: Image references updated consistently to 0.6.0 release tag.

All three trtllm-runtime image references have been updated to the NVIDIA AI-Dynamo registry with the 0.6.0 tag, aligning with the 0.6.0 release version.

Also applies to: 112-112, 144-144

container/Dockerfile.sglang (1)

16-17: Dependency versions are synchronized correctly.

The sglang dependency in pyproject.toml matches the Dockerfile specification: both specify 0.5.3.post1. The version downgrade has been applied consistently across both configuration files.

container/Dockerfile.sglang-wideep (1)

4-4: Dependency synchronization verified — no action required.

The pyproject.toml correctly specifies sglang[all]==0.5.3.post1, matching the downgraded image tag in the Dockerfile. The version synchronization across files is complete.
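Keeping the pyproject.toml pin and the Dockerfile image tag in sync can be checked mechanically. The sketch below assumes pip pins of the form `sglang[all]==X` and image tags of the form `sglang:vX-suffix`. The Dockerfile line in the usage is hypothetical, since the actual ARG name and base image are not shown in this review:

```python
import re

# pip-style pins such as sglang[all]==0.5.3.post1
PIP_PIN = re.compile(r"sglang(?:\[[\w,\s]+\])?==([\w.]+)")
# image tags such as .../sglang:v0.5.3.post1-cu126 (the tag layout is an assumption)
IMAGE_TAG = re.compile(r"sglang:v?(\d[\w.]*)")


def sglang_versions(text: str) -> set[str]:
    """Collect every sglang version pinned in text, via pip pin or image tag."""
    return set(PIP_PIN.findall(text)) | set(IMAGE_TAG.findall(text))


def pins_in_sync(*texts: str) -> bool:
    """True when all texts together pin exactly one sglang version."""
    versions: set[str] = set()
    for text in texts:
        versions |= sglang_versions(text)
    return len(versions) == 1


pyproject = '    "sglang[all]==0.5.3.post1",'
dockerfile = "ARG SGLANG_IMAGE=lmsysorg/sglang:v0.5.3.post1-cu126"  # hypothetical line
print(pins_in_sync(pyproject, dockerfile))  # True
print(pins_in_sync(pyproject.replace("post1", "post2"), dockerfile))  # False
```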

examples/README.md (3)

51-59: All launch tool directories verified successfully.

All four paths referenced in the documentation exist:

  • launch/dynamo-run/
  • components/backends/vllm/launch/
  • components/backends/sglang/launch/
  • components/backends/trtllm/launch/

The merge conflicts were properly resolved, and the directory reorganization is complete.


86-100: All referenced backend component paths verified—links are correct.

The verification confirms that all nine paths referenced in the restructured Components Workflows section (lines 86–100) exist in the repository:

  • ✓ Backend directories: components/backends/{vllm,sglang,trtllm}/
  • ✓ Kubernetes CRDs: components/backends/{vllm,sglang,trtllm}/deploy/
  • ✓ Launch Scripts: components/backends/{vllm,sglang,trtllm}/launch/

No broken or missing links found.


39-39: GKE deployment guide verified — no action needed.

Line 39 references ../docs/kubernetes/gke_setup.md under platform-specific deployment guides. Verification confirms this file exists at the correct relative path.

examples/custom_backend/hello_world/deploy/hello_world.yaml (1)

44-44: LGTM! Container image references updated for 0.6.0 release.

Both service containers now reference the official NVIDIA registry with the 0.6.0 release tag, consistent with the PR's objectives.

Also applies to: 83-83

docs/benchmarks/benchmarking.md (1)

413-413: LGTM! Documentation updated with 0.6.0 image tag.

The default configuration example now reflects the 0.6.0 release image tag.

Earthfile (1)

137-137: LGTM! Docker registry updated across all build targets.

All Earthfile targets now reference the official NVIDIA registry nvcr.io/nvidia/ai-dynamo, consistent with the 0.6.0 release standardization.

Also applies to: 178-178, 192-192

recipes/llama-3-70b/vllm/disagg-single-node/perf.yaml (2)

8-8: LGTM! Increased retry tolerance for performance tests.

Increasing backoffLimit from 1 to 3 is appropriate for performance benchmarking jobs that may experience transient failures.


18-66: LGTM! Enhanced performance testing workflow.

The refactored approach introduces several improvements:

  • Uses official 0.6.0 vLLM runtime image
  • Implements model readiness polling before benchmarking
  • Provides detailed benchmark execution with explicit artifact markers
  • Better structured for automated CI/CD pipelines
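The readiness-polling step can be sketched as a bounded retry loop. This is an illustration, not the logic in perf.yaml. `probe` stands in for whatever request the job actually makes, for example checking that the frontend lists the model. The timeout and interval defaults are assumptions. `sleep` and `clock` are injectable so the loop can be exercised without waiting:

```python
import time
from typing import Callable


def wait_until_ready(
    probe: Callable[[], bool],
    timeout_s: float = 600.0,
    interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Call probe() until it returns True or timeout_s elapses; return readiness."""
    deadline = clock() + timeout_s
    while True:
        try:
            if probe():
                return True
        except OSError:
            pass  # e.g. connection refused while the frontend starts: not ready yet
        if clock() >= deadline:
            return False
        sleep(interval_s)


calls = {"n": 0}


def fake_probe() -> bool:
    # Hypothetical probe that reports ready on its third call.
    calls["n"] += 1
    return calls["n"] >= 3


print(wait_until_ready(fake_probe, sleep=lambda _: None), calls["n"])  # True 3
```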

README.md (2)

77-77: No issues found—support matrix path is correct.

The file docs/reference/support-matrix.md exists at the location referenced in README.md line 77. The path update is accurate and reflects the file's actual location after reorganization.


62-64: Documentation paths verified—all files exist at referenced locations.

The framework support matrix links in README.md lines 62-64 correctly reference the reorganized documentation. All three files (load_planner.md, sla_planner.md, and kvbm_architecture.md) exist at their new paths under docs/planner/ and docs/kvbm/. No broken links found.

docs/benchmarks/pre_deployment_profiling.md (1)

4-4: The verification confirms that the file sla_planner_quickstart.md exists at the path /docs/planner/sla_planner_quickstart.md. The documentation link is valid and references an existing file. No issues found.

docs/hidden_toctree.rst (1)

14-42: Documentation reorganization verified—all paths correct and toctree syntax valid.

All 26 files referenced in the hidden toctree exist at their new locations following the major reorganization:

  • docs/development/runtime-guide.md
  • docs/api/nixl_connect/ (12 files) ✓
  • docs/kubernetes/ (6 files) ✓
  • docs/reference/cli.md
  • docs/observability/metrics.md
  • docs/kvbm/ (2 files) ✓
  • docs/guides/tool-calling.md
  • docs/architecture/kv_cache_routing.md
  • docs/planner/load_planner.md

The toctree directive syntax is valid with proper indentation. No broken references detected.
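A toctree check like the one described can be approximated in two steps: read the entries under each `.. toctree::` directive, then test whether a matching source file exists. The sketch below assumes entries are plain document paths with no `Title <path>` syntax and no globs. It also assumes sources end in `.md` or `.rst` and are resolved relative to the `.rst` file's directory. The usage layout is hypothetical:

```python
import tempfile
from pathlib import Path


def toctree_entries(rst_text: str) -> list[str]:
    """Return the document entries listed under every `.. toctree::` directive."""
    entries: list[str] = []
    in_toctree = False
    for line in rst_text.splitlines():
        stripped = line.strip()
        if stripped.startswith(".. toctree::"):
            in_toctree = True
            continue
        if not in_toctree or not stripped:
            continue  # outside a directive, or a blank line inside one
        if not line[0].isspace():
            in_toctree = False  # an unindented line ends the directive body
            continue
        if not stripped.startswith(":"):  # skip options such as :hidden:
            entries.append(stripped)
    return entries


def missing_toctree_targets(rst_file: Path) -> list[str]:
    """Return entries with no matching .md or .rst file relative to rst_file."""
    base = rst_file.parent
    missing = []
    for entry in toctree_entries(rst_file.read_text(encoding="utf-8")):
        stem = base / entry.lstrip("/")
        if stem.suffix in (".md", ".rst"):
            candidates = [stem]
        else:
            candidates = [Path(f"{stem}.md"), Path(f"{stem}.rst")]
        if not any(path.exists() for path in candidates):
            missing.append(entry)
    return missing


with tempfile.TemporaryDirectory() as tmp:
    docs = Path(tmp)
    (docs / "observability").mkdir()
    (docs / "observability" / "metrics.md").write_text("# Metrics\n")
    index = docs / "hidden_toctree.rst"
    index.write_text(
        ".. toctree::\n   :hidden:\n\n   observability/metrics\n   guides/tool-calling\n"
    )
    missing = missing_toctree_targets(index)

print(missing)  # ['guides/tool-calling']
```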

docs/observability/metrics.md (1)

99-99: Verify documentation link target exists.

The Backend Guide link has been updated as part of the documentation reorganization. Ensure that the target file docs/development/backend-guide.md exists and contains the expected content.

The link text is descriptive and the relative path looks correct for the file hierarchy. Please confirm the target file exists in the repository after the reorganization.

recipes/gpt-oss-120b/trtllm/agg/deploy.yaml (1)

33-33: Verify TensorRT-LLM runtime image availability.

The container image tags have been updated to 0.6.0 in both the Frontend and TrtllmWorker components. Confirm that the image nvcr.io/nvidia/ai-dynamo/trtllm-runtime:0.6.0 exists and is accessible in the NVIDIA registry.

As mentioned in the previous file review, you can verify the image availability by pulling it locally or checking the NVCR registry documentation for published tags.

Also applies to: 72-72

components/backends/vllm/deploy/disagg.yaml (1)

16-16: Verify vLLM runtime image availability.

The container image tags have been updated to 0.6.0 across all three components (Frontend, VllmDecodeWorker, and VllmPrefillWorker). Confirm that the image nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.6.0 exists and is accessible in the NVIDIA registry.

Verify the image is available by attempting to pull it or checking the NVCR registry for published tags. This should be validated before the PR is merged to ensure deployments can pull the images.

Also applies to: 28-28, 48-48

docs/observability/health-checks.md (1)

200-200: Verify documentation link target exists.

The Backend Guide link has been updated as part of the documentation reorganization. Ensure that the target file docs/development/backend-guide.md exists and contains the expected content.

The link text is descriptive and the relative path appears correct for the reorganized file structure.

docs/kubernetes/metrics.md (1)

68-68: Verify documentation link target exists.

The metrics guide link has been updated to reflect the new documentation structure. Ensure that the target file docs/observability/metrics.md exists (which we reviewed earlier in this PR).

The link text is descriptive and aligns with the documentation reorganization objectives.

docs/kubernetes/installation_guide.md (1)

199-199: Verify documentation link target exists.

The SLA Planner Quickstart Guide link has been updated to reflect the new planner directory structure. Ensure that the target file docs/planner/sla_planner_quickstart.md exists and contains the expected content.

The link text is descriptive and the relative path aligns with the documentation reorganization for the 0.6.0 release.

recipes/llama-3-70b/vllm/disagg-multi-node/perf.yaml (2)

43-43: The review comment is incorrect—hardcoded snapshot hashes are intentional for reproducibility.

The pattern of hardcoded tokenizer snapshot hashes appears consistently across multiple benchmarking jobs in both gpt-oss-120b and llama-3-70b recipe suites. Learnings confirm that hardcoded snapshot hashes in tokenizer paths are intentional for version pinning and reproducible results, not an issue to be fixed. This is established design in benchmarking manifests, not a brittleness concern requiring refactoring. No changes are needed.

Likely an incorrect or invalid review comment.


8-8:

Manual verification required for container image availability.

The NGC catalog hosts the Dynamo vLLM Runtime, a pre-built, Docker-based environment for running NVIDIA Dynamo with the vLLM inference engine. However, the sandbox environment cannot access external registries, so it cannot confirm that the 0.6.0 tag exists.

Verify the image before deployment by:

  • Pulling directly: docker pull nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.6.0 (requires NGC credentials)
  • Checking the NGC catalog web UI for available tags at the container repository
  • Consulting release notes or documentation for this version's availability

docs/architecture/kv_cache_routing.md (1)

159-159: Documentation reorganization verified successfully.

The new router documentation path docs/router/README.md exists and is properly accessible. The old path docs/components/router/README.md has been removed as expected. The link updates at lines 159 and 515 correctly reference the new location.

components/backends/vllm/deploy/agg.yaml (1)

16-16: Consider removing the image availability check or reformulating it as a pre-deployment verification step.

The 0.6.0 tag in this image reference is the Dynamo release version, not an upstream vLLM version, so vLLM's own v0.6.0 release does not establish whether this image exists. The NVIDIA registry (nvcr.io) requires authentication to verify manifest details via API, so a 401 response neither confirms nor denies that the image exists.

If image pull failures occur during deployment, they will be caught at runtime. The image tag appears consistent with a legitimate release. Either verify access separately with proper credentials before deployment or remove this check—it cannot be conclusively resolved without authenticated registry access.

docs/observability/logging.md (1)

190-190: Link target verified successfully — no issues found.

The target file docs/development/backend-guide.md exists at the correct location. The link update from ../backend.md to ../development/backend-guide.md is correct and consistently applied across related documentation files (metrics.md, health-checks.md). No stale references to the old path remain.

recipes/llama-3-70b/vllm/disagg-multi-node/deploy.yaml (1)

21-21: Image references updated consistently to 0.6.0.

All three vLLM runtime container images are updated to the new NVIDIA registry with consistent tagging (nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.6.0). The updates align with the 0.6.0 release standardization.

Also applies to: 45-45, 74-74

components/backends/trtllm/deploy/agg.yaml (1)

16-16: trtllm-runtime images updated correctly to 0.6.0.

Both Frontend and TRTLLMWorker containers are correctly updated to nvcr.io/nvidia/ai-dynamo/trtllm-runtime:0.6.0.

Also applies to: 27-27

components/backends/sglang/deploy/agg_router.yaml (1)

16-16: sglang-runtime images updated correctly to 0.6.0.

Both Frontend and decode worker containers are correctly updated to nvcr.io/nvidia/ai-dynamo/sglang-runtime:0.6.0.

Also applies to: 30-30

components/backends/trtllm/deploy/disagg.yaml (1)

16-16: trtllm-runtime images updated consistently across all workers.

Frontend, prefill, and decode worker containers all correctly reference nvcr.io/nvidia/ai-dynamo/trtllm-runtime:0.6.0.

Also applies to: 28-28, 56-56

components/backends/trtllm/deploy/agg_router.yaml (1)

16-16: trtllm-runtime images updated correctly to 0.6.0.

Frontend and worker containers correctly reference nvcr.io/nvidia/ai-dynamo/trtllm-runtime:0.6.0.

Also applies to: 30-30

tests/planner/perf_test_configs/image_cache_daemonset.yaml (1)

23-23: Image cache DaemonSet updated to 0.6.0.

The vllm-runtime image reference for the image-caching DaemonSet is correctly updated to nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.6.0.

components/backends/sglang/deploy/agg_logging.yaml (1)

19-19: sglang-runtime images updated correctly to 0.6.0.

Both Frontend and decode worker containers are correctly updated to nvcr.io/nvidia/ai-dynamo/sglang-runtime:0.6.0 with consistent image references.

Also applies to: 30-30

components/backends/sglang/deploy/disagg-multinode.yaml (1)

25-25: Image tag updates look good.

All three container image references are consistently updated to nvcr.io/nvidia/ai-dynamo/sglang-runtime:0.6.0. The changes align with the PR objective of updating image tags for the 0.6.0 release.

Also applies to: 38-38, 75-75

components/backends/trtllm/deploy/disagg_planner.yaml (1)

19-19: Image tag updates are consistent.

All four container image references are updated to nvcr.io/nvidia/ai-dynamo/trtllm-runtime:0.6.0 with no other changes to pod spec configuration.

Also applies to: 47-47, 88-88, 117-117

tests/planner/perf_test_configs/disagg_8b_tp2.yaml (1)

41-41: Image tag updates are consistent across worker services.

All three container image references are updated to nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.6.0 with no changes to pod spec or service configuration.

Also applies to: 91-91, 141-141

components/backends/vllm/prometheus.md (1)

99-99: Documentation reference path update looks appropriate.

Line 99 updates the reference in the "See Also" section to point to the new reorganized path docs/observability/metrics.md. This aligns with the PR's documentation reorganization objectives.

components/backends/vllm/deploy/disagg_kvbm_tp2.yaml (1)

16-16: Image tag updates are consistent.

All three container image references are updated to nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.6.0 with no other changes to pod spec configuration.

Also applies to: 29-29, 67-67

components/backends/sglang/deploy/disagg.yaml (1)

16-16: Image tag updates are consistent.

All three container image references are updated to nvcr.io/nvidia/ai-dynamo/sglang-runtime:0.6.0 with no other changes to pod spec configuration.

Also applies to: 28-28, 64-64

recipes/llama-3-70b/vllm/disagg-single-node/deploy.yaml (1)

21-21: Image tag updates are consistent.

All three container image references are updated to nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.6.0 with no other changes to pod spec configuration.

Also applies to: 55-55, 94-94

tests/planner/perf_test_configs/disagg_8b_2p2d.yaml (1)

41-41: Image tag updates are consistent across worker services.

All three container image references are updated to nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.6.0 with no changes to pod spec or service configuration.

Also applies to: 91-91, 141-141

docs/backends/sglang/README.md (1)

107-108: Verify sglang version strategy.

The installation section has been updated to specify sglang[all]==0.5.3.post1 (a stable post-release) instead of an RC version. According to prior learnings, the ai-dynamo package requires the --prerelease=allow flag when used with RC versions of sglang (e.g., sglang[all]==0.5.0rc2).

Confirm that 0.5.3.post1 is the intended stable version and that it's compatible with the project's prerelease requirements. If RC versions are still needed, update the documentation to clarify the --prerelease=allow flag usage.

Based on learnings from PR #3290.
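Whether `--prerelease=allow` is needed depends on whether the pinned version counts as a pre-release under PEP 440. The sketch below illustrates that distinction with a simplified regex. The `packaging` library's `Version.is_prerelease` is the robust option. The regex treats `a`/`b`/`rc` (and spelled-out forms) and `dev` segments as pre-releases, and treats `.postN` as a regular release:

```python
import re

# Simplified PEP 440 check: a/b/c/rc (or spelled-out forms) right after a digit,
# or a dev segment. Post-releases such as .post1 are not pre-releases.
PRERELEASE = re.compile(r"\d(?:alpha|beta|preview|pre|rc|a|b|c)\d*|dev", re.IGNORECASE)


def is_prerelease(version: str) -> bool:
    """Return True if a PEP 440 version string is a pre- or dev-release."""
    public = version.split("+", 1)[0]  # ignore local labels such as +cu126
    return bool(PRERELEASE.search(public))


for v in ("0.5.3.post1", "0.5.0rc2", "0.6.0", "1.0.dev3"):
    print(v, is_prerelease(v))
# 0.5.3.post1 False / 0.5.0rc2 True / 0.6.0 False / 1.0.dev3 True
```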

tests/planner/perf_test_configs/disagg_8b_3p1d.yaml (1)

41-41: Container image references updated consistently.

The vLLM runtime image references have been properly updated from placeholder tags to the NVIDIA AI-Dynamo registry with version 0.6.0 across all services (Frontend, VllmDecodeWorker, VllmPrefillWorker).

Also applies to: 91-91, 141-141

tests/planner/scaling/disagg_planner.yaml (1)

21-21: Container image registry and tag updated consistently.

All vLLM runtime container references have been updated to nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.6.0 across all services in the disagg_planner configuration.

Also applies to: 44-44, 81-81, 105-105

components/backends/vllm/deploy/agg_router.yaml (1)

16-16: Container images updated to NVIDIA registry 0.6.0 tag.

Frontend and VllmDecodeWorker images properly updated to nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.6.0.

Also applies to: 30-30

docs/planner/sla_planner.md (1)

4-4: Documentation links properly updated to new directory structure.

SLA Planner Quick Start Guide references have been updated from /docs/kubernetes/sla_planner_quickstart.md to /docs/planner/sla_planner_quickstart.md to reflect the documentation reorganization.

Also applies to: 132-132

components/backends/vllm/deploy/disagg_planner.yaml (1)

19-19: Container images updated to NVIDIA registry version 0.6.0 across all services.

vLLM runtime images properly updated to nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.6.0 in Frontend, Planner, and both worker services.

Also applies to: 29-29, 51-51, 71-71

tests/planner/perf_test_configs/disagg_8b_planner.yaml (1)

44-44: Container images consistently updated to NVIDIA AI-Dynamo 0.6.0 across all services.

All vLLM runtime container references updated to nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.6.0 in this comprehensive perf test configuration spanning Frontend, Planner, and worker services.

Also applies to: 77-77, 141-141, 198-198

components/backends/sglang/deploy/agg.yaml (1)

16-16: SGLang runtime images updated to NVIDIA registry 0.6.0 tag.

Frontend and decode worker images properly updated to nvcr.io/nvidia/ai-dynamo/sglang-runtime:0.6.0, maintaining differentiation from vLLM runtime images across the backend deployments.

Also applies to: 27-27

docs/backends/vllm/README.md (1)

41-43: Incomplete path migration in Feature Support Matrix.

Three documentation links are being updated to reflect the new directory structure (planner/, kvbm/), but the file contains other architecture documentation links that remain unchanged:

  • Line 38: ../../../docs/architecture/disagg_serving.md
  • Line 40: ../../../docs/architecture/kv_cache_routing.md
  • Line 181: ../../../docs/architecture/kv_cache_routing.md
  • Lines 185-186: ../../../docs/architecture/request_migration.md

Per the PR objectives, the documentation structure is being reorganized. If these files (disagg_serving, kv_cache_routing, request_migration) have been moved to new top-level directories as part of this reorganization, these links should also be updated for consistency. Otherwise, readers will encounter inconsistent navigation and potentially broken links.

Verify whether the following files have been moved in this PR:

  • docs/architecture/disagg_serving.md
  • docs/architecture/kv_cache_routing.md
  • docs/architecture/request_migration.md

If they have been moved, provide their new paths so these links can be updated.
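If the moves are confirmed, link updates like these can be applied mechanically from an old-to-new path map. The sketch below uses only moves listed in this PR's description. It rewrites targets that spell out the full `docs/...` path and keeps any relative prefix such as `../../../` and any `#anchor`. This is only correct when the file containing the link did not itself move. Short relative links such as `../backend.md` are not matched:

```python
import re

# Moves taken from the PR description; extend as needed.
MOVES = {
    "docs/guides/backend.md": "docs/development/backend-guide.md",
    "docs/guides/health_check.md": "docs/observability/health-checks.md",
    "docs/guides/logging.md": "docs/observability/logging.md",
    "docs/guides/metrics.md": "docs/observability/metrics.md",
    "docs/support_matrix.md": "docs/reference/support-matrix.md",
}

LINK_TARGET = re.compile(r"(\]\()([^)\s]+)(\))")


def rewrite_links(markdown: str, moves: dict[str, str] = MOVES) -> str:
    """Rewrite Markdown link targets that point at a moved docs/ file."""

    def fix(match: re.Match) -> str:
        path, sep, anchor = match.group(2).partition("#")
        for old, new in moves.items():
            if path == old or path.endswith("/" + old):
                path = path[: len(path) - len(old)] + new  # keep the relative prefix
                break
        return f"{match.group(1)}{path}{sep}{anchor}{match.group(3)}"

    return LINK_TARGET.sub(fix, markdown)


before = "See the [metrics guide](../../../docs/guides/metrics.md#labels)."
print(rewrite_links(before))
# See the [metrics guide](../../../docs/observability/metrics.md#labels).
```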

recipes/llama-3-70b/vllm/agg/perf.yaml (1)

8-8: Reasonable retry increase for perf testing job.

Increasing backoffLimit from 1 to 3 provides better fault tolerance for the performance testing job, allowing transient failures to be retried. This change aligns with the broader perf.yaml updates across the PR.

docs/index.rst (1)

45-45: Toctree link targets verified—all paths are valid.

All 10 toctree links in the restructured index file have been verified and resolve correctly to their target files/directories across the new organizational structure (reference/, router/, planner/, kvbm/, observability/, performance/, development/). No broken links detected.

recipes/llama-3-70b/vllm/agg/deploy.yaml (1)

21-21: Verify 0.6.0 container image availability before merging.

The image tag is being updated from the placeholder my-tag to the specific release version 0.6.0 on lines 21 and 45. The changes appear intentional and aligned with the PR's 0.6.0 release objectives, but the image tag availability cannot be verified programmatically in this environment (the NGC API requires authentication).

Please confirm that nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.6.0 is available in the NGC registry before merging this PR.

Comment on lines +746 to +760
    # deploy the optimized DGD with planner
    if args.deploy_after_profile and not args.dry_run:
        logger.info("Deploying the optimized DGD with planner...")
        # TODO: check conflicts for dynamo namespace and DGD name
        # TODO: handle deployment errors and propagate proper error messages to users
        client = DynamoDeploymentClient(
            namespace=args.namespace,
            base_log_dir=f"{args.output_dir}/final_deployment",
            model_name=model_name,
            service_name=args.service_name,
            frontend_port=frontend_port,
            deployment_name=config["metadata"]["name"],
        )
        await client.create_deployment(f"{args.output_dir}/config_with_planner.yaml")


⚠️ Potential issue | 🔴 Critical

Critical: Risk of undefined variables and missing deployment verification.

This post-profile deployment logic has several critical issues:

  1. Undefined variable risk: The variables model_name (line 754), frontend_port (line 756), and config (line 757) are defined inside the try block (lines 121, 171, 721). If an exception occurs before these variables are set, this code will fail with a NameError.

  2. No deployment verification: The deployment is created (line 759) but there's no await client.wait_for_deployment_ready() call. This means the function returns without confirming the deployment is functional.

  3. No error handling: There's no try-except block around the deployment creation. Deployment failures will either crash the script or leave broken deployments.

  4. No cleanup mechanism: Unlike the deployments in the try block, which are tracked in deployment_clients and cleaned up in the finally block, this deployment is never added to that list and has no cleanup path.

  5. Incomplete implementation: The TODO comments (lines 749-750) confirm that conflict checking and error handling are missing.

Consider this structure:

+    # deploy the optimized DGD with planner
+    if args.deploy_after_profile and not args.dry_run:
+        try:
+            logger.info("Deploying the optimized DGD with planner...")
+            client = DynamoDeploymentClient(
+                namespace=args.namespace,
+                base_log_dir=f"{args.output_dir}/final_deployment",
+                model_name=model_name,
+                service_name=args.service_name,
+                frontend_port=frontend_port,
+                deployment_name=config["metadata"]["name"],
+            )
+            await client.create_deployment(f"{args.output_dir}/config_with_planner.yaml")
+            logger.info("Waiting for final deployment to be ready...")
+            await client.wait_for_deployment_ready(timeout=1800)
+            logger.info("Final deployment is ready and operational")
+        except Exception as e:
+            logger.error(f"Failed to deploy optimized DGD with planner: {e}")
+            logger.info("Attempting to clean up failed deployment...")
+            try:
+                await client.delete_deployment()
+            except Exception as cleanup_error:
+                logger.warning(f"Failed to clean up deployment: {cleanup_error}")
+            raise

Additionally, move this block inside the try block (before line 736) or ensure all required variables are defined with fallback values before this block executes.
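Below is a minimal sketch of the create, wait, and roll-back-on-failure pattern. It assumes the client exposes `create_deployment`, `wait_for_deployment_ready(timeout=...)`, and `delete_deployment()`, as named in this thread; those signatures have not been checked against `DynamoDeploymentClient`. `deploy_final` and `FakeClient` are hypothetical names. The client is built through a factory, so a constructor error cannot leave `client` unbound in the except branch. The caller must still have `model_name`, `frontend_port`, and `config` bound, for example by calling this from inside the existing try block. The sketch cleans up only on failure. It assumes the final deployment is meant to keep running after profiling, so it does not register the client for teardown in the `finally` block.

```python
import asyncio
import logging
from typing import Callable, Optional, Protocol

logger = logging.getLogger(__name__)


class DeploymentClient(Protocol):
    """Methods assumed from the review thread; not verified against the real client."""

    async def create_deployment(self, config_path: str) -> None: ...
    async def wait_for_deployment_ready(self, timeout: int) -> None: ...
    async def delete_deployment(self) -> None: ...


async def deploy_final(
    make_client: Callable[[], DeploymentClient],
    config_path: str,
    timeout: int = 1800,
) -> DeploymentClient:
    """Create the final deployment, wait until ready, and roll back on failure."""
    client: Optional[DeploymentClient] = None
    try:
        client = make_client()  # if this raises, client stays None
        await client.create_deployment(config_path)
        await client.wait_for_deployment_ready(timeout=timeout)
        logger.info("Final deployment is ready")
        return client  # left running on purpose: it is the profiler's output
    except Exception:
        logger.exception("Failed to deploy optimized DGD with planner")
        if client is not None:
            try:
                await client.delete_deployment()
            except Exception as cleanup_error:
                logger.warning("Cleanup also failed: %s", cleanup_error)
        raise  # surface the original error to the caller


class FakeClient:
    """Hypothetical in-memory stand-in used only to exercise deploy_final."""

    def __init__(self, fail_on: Optional[str] = None) -> None:
        self.fail_on = fail_on
        self.created = False
        self.deleted = False

    async def create_deployment(self, config_path: str) -> None:
        if self.fail_on == "create":
            raise RuntimeError("create failed")
        self.created = True

    async def wait_for_deployment_ready(self, timeout: int) -> None:
        if self.fail_on == "wait":
            raise TimeoutError("deployment never became ready")

    async def delete_deployment(self) -> None:
        self.deleted = True


if __name__ == "__main__":
    demo = asyncio.run(deploy_final(FakeClient, "config_with_planner.yaml"))
    print("created:", demo.created, "deleted:", demo.deleted)
```

Inside the profiler, which already runs in an async function, the call would be `await deploy_final(lambda: DynamoDeploymentClient(...), f"{args.output_dir}/config_with_planner.yaml")` rather than `asyncio.run`. The namespace/DGD-name conflict check from the TODO is not covered here.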

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In benchmarks/profiler/profile_sla.py around lines 746 to 760, the post-profile
deployment uses variables (model_name, frontend_port, config) that may be
undefined, lacks error handling, verification and cleanup registration; move
this deployment block inside the existing try block (or ensure those variables
have safe defaults set earlier), add a pre-deploy check for namespace/DGD name
conflicts, wrap client.create_deployment in a try/except to log and surface
errors, call await client.wait_for_deployment_ready() after creation to verify
readiness, and append the client to deployment_clients so it will be cleaned up
in the finally block.

extraPodSpec:
mainContainer:
image: nvcr.io/nvidia/ai-dynamo/tensorrtllm-runtime:my-tag
image: nvcr.io/nvidia/ai-dynamo/tensorrtllm-runtime:0.6.0

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

⚠️ Image name inconsistency: tensorrtllm-runtime vs trtllm-runtime.

This file references nvcr.io/nvidia/ai-dynamo/tensorrtllm-runtime:0.6.0, but other TRTLLM deployment files (agg.yaml, disagg.yaml, agg_router.yaml) consistently use nvcr.io/nvidia/ai-dynamo/trtllm-runtime:0.6.0. This naming inconsistency could cause deployment failures if the tensorrtllm-runtime image doesn't exist or is outdated.

Verify that the image name is correct and exists in the registry. If this should match the other TRTLLM deployments, apply this diff:

-          image: nvcr.io/nvidia/ai-dynamo/tensorrtllm-runtime:0.6.0
+          image: nvcr.io/nvidia/ai-dynamo/trtllm-runtime:0.6.0

Alternatively, confirm that tensorrtllm-runtime is the correct image name and verify it exists in the NVIDIA registry with the 0.6.0 tag. Run the following script to check:

Also applies to: 53-53


🏁 Script executed:

#!/bin/bash
# Verify image names exist in NVIDIA registry
echo "Checking trtllm-runtime..."
curl -s -H "Authorization: Bearer $(gh auth token)" https://api.github.com/repos/ai-dynamo/dynamo/issues \
  -X POST -d '{"title":"Verify NVIDIA image: trtllm-runtime:0.6.0"}' 2>/dev/null || echo "Could not verify via API"

# Alternative: Check if both image variants are referenced elsewhere in codebase
echo "Searching for trtllm-runtime vs tensorrtllm-runtime in TRTLLM files..."
grep -r "tensorrtllm-runtime" components/backends/trtllm/ || echo "Only trtllm-runtime found (expected)"
grep -r "trtllm-runtime" components/backends/trtllm/ | head -5

Length of output: 3043



🏁 Script executed:

#!/bin/bash
# Find YAML files in the repository
echo "Searching for YAML deployment files..."
find . -name "*.yaml" -o -name "*.yml" | grep -E "(agg|deploy|trtllm)" | head -20

echo ""
echo "Searching for image references across all YAML files..."
grep -r "tensorrtllm-runtime\|trtllm-runtime" --include="*.yaml" --include="*.yml" . 2>/dev/null || echo "No matches found"

Length of output: 3634


Fix image name inconsistency in agg-with-config.yaml

The file uses tensorrtllm-runtime while all other TRTLLM deployment files use trtllm-runtime. Update lines 37 and 53:

-          image: nvcr.io/nvidia/ai-dynamo/tensorrtllm-runtime:0.6.0
+          image: nvcr.io/nvidia/ai-dynamo/trtllm-runtime:0.6.0

If no tensorrtllm-runtime:0.6.0 image is published, this inconsistency will cause image-pull failures at deploy time. All related TRTLLM deployment files (agg.yaml, agg_router.yaml, disagg.yaml, disagg_planner.yaml, disagg_router.yaml, disagg-multinode.yaml) consistently use trtllm-runtime:0.6.0.
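The grep in the script above finds the mismatch by name only. A slightly stricter sketch checks repository and tag together across every YAML file in a directory. It assumes every `image:` line under the scanned directory should be exactly `nvcr.io/nvidia/ai-dynamo/trtllm-runtime:0.6.0`, so any other image, such as a benchmark sidecar, would also be reported. The function names are illustrative.

```python
import re
from pathlib import Path
from typing import Dict, List, Tuple

# Matches "image: <ref>" and "- image: <ref>" lines, with optional quotes.
IMAGE_LINE = re.compile(r"""^[ \t]*(?:-[ \t]*)?image:[ \t]*["']?([^\s"']+)""", re.MULTILINE)


def find_images(yaml_text: str) -> List[str]:
    """Return every image reference found on an `image:` line."""
    return IMAGE_LINE.findall(yaml_text)


def split_image(ref: str) -> Tuple[str, str]:
    """Split 'repo:tag' into (repo, tag); a colon before the last '/' is a registry port."""
    repo, sep, tag = ref.rpartition(":")
    if not sep or "/" in tag:
        return ref, ""
    return repo, tag


def flag_unexpected(files: Dict[str, str], expected_repo: str, expected_tag: str) -> Dict[str, List[str]]:
    """Map each file name to the image references that differ from expected repo:tag."""
    problems: Dict[str, List[str]] = {}
    for name, text in sorted(files.items()):
        bad = [ref for ref in find_images(text) if split_image(ref) != (expected_repo, expected_tag)]
        if bad:
            problems[name] = bad
    return problems


def load_yaml_texts(root: str) -> Dict[str, str]:
    """Read every .yaml/.yml file under root (e.g. a backend's deploy/ directory)."""
    texts: Dict[str, str] = {}
    for pattern in ("*.yaml", "*.yml"):
        for path in Path(root).rglob(pattern):
            texts[str(path)] = path.read_text(encoding="utf-8")
    return texts
```

For example, `flag_unexpected(load_yaml_texts("components/backends/trtllm/deploy"), "nvcr.io/nvidia/ai-dynamo/trtllm-runtime", "0.6.0")` should list only agg-with-config.yaml if the review's analysis is right.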

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
image: nvcr.io/nvidia/ai-dynamo/tensorrtllm-runtime:0.6.0
image: nvcr.io/nvidia/ai-dynamo/trtllm-runtime:0.6.0
🤖 Prompt for AI Agents
In components/backends/trtllm/deploy/agg-with-config.yaml around lines 37 and
53, the container image name is mistakenly set to
nvcr.io/nvidia/ai-dynamo/tensorrtllm-runtime:0.6.0 while other TRTLLM deployment
files use nvcr.io/nvidia/ai-dynamo/trtllm-runtime:0.6.0; update both occurrences
to nvcr.io/nvidia/ai-dynamo/trtllm-runtime:0.6.0 to match the rest of the
deployments and avoid deployment failures.

For the complete and authoritative list of all vLLM metrics, always refer to the official documentation linked above.

Dynamo runtime metrics are documented in [docs/guides/metrics.md](../../../docs/guides/metrics.md).
Dynamo runtime metrics are documented in [docs/observability/metrics.md](../../../docs/observability/metrics.md).

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Verify the relative path depth for the documentation link.

Line 13 uses a relative path ../../../docs/observability/metrics.md to link to the metrics documentation. From components/backends/vllm/prometheus.md, this path goes up three levels (to components/) but the target file is in the repo root's docs/ directory. This may require an additional ../ level.


🏁 Script executed:

#!/bin/bash
# Description: Verify the relative path from prometheus.md to the metrics documentation

# Check the actual directory structure and confirm the correct relative path
echo "=== Checking directory structure ==="
echo "prometheus.md is at: components/backends/vllm/prometheus.md"

# Count path depth
file_depth=$(echo "components/backends/vllm/prometheus.md" | tr '/' '\n' | wc -l)
target_depth=$(echo "docs/observability/metrics.md" | tr '/' '\n' | wc -l)

echo "File is at depth: $file_depth"
echo "Target is at depth: $target_depth"

# Check if target file exists
if [ -f "docs/observability/metrics.md" ]; then
  echo "✓ Target file exists: docs/observability/metrics.md"
else
  echo "✗ Target file NOT found: docs/observability/metrics.md"
fi

# Test what the relative path should be
echo ""
echo "=== Relative path analysis ==="
echo "From: components/backends/vllm/"
echo "To: docs/observability/metrics.md"
echo "Correct relative path should be: ../../../../docs/observability/metrics.md"

Length of output: 363


🏁 Script executed:

#!/bin/bash
# Verify the actual content of line 13 in the file
echo "=== Current content of line 13 ==="
sed -n '13p' components/backends/vllm/prometheus.md

echo ""
echo "=== Full context around line 13 ==="
sed -n '10,16p' components/backends/vllm/prometheus.md

echo ""
echo "=== Checking for other path references ==="
grep -n "docs/observability/metrics" components/backends/vllm/prometheus.md

Length of output: 722


Fix the relative path in the markdown link on line 13—it's missing one level.

The relative path on line 13 uses ../../../docs/observability/metrics.md but should use ../../../../docs/observability/metrics.md. From components/backends/vllm/prometheus.md, the path needs four levels (../../../../) to reach the repo root, not three. The target file docs/observability/metrics.md exists and is correct; only the relative path depth is wrong.
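The depth can be checked mechanically with the standard library's `posixpath.relpath` (the wrapper name `relative_link` is illustrative). For the two paths quoted in this thread, it returns three parent levels. relpath starts from the directory `components/backends/vllm/`, and the filename `prometheus.md` is not itself a level to climb. This suggests the existing three-level link may already resolve to the repo root, so confirm it in the rendered docs before switching to four levels.

```python
import posixpath


def relative_link(source_file: str, target_file: str) -> str:
    """Markdown link path from source_file to target_file (both repo-root-relative)."""
    # Start from the directory containing the source file, not the file itself.
    return posixpath.relpath(target_file, start=posixpath.dirname(source_file))


print(relative_link("components/backends/vllm/prometheus.md",
                    "docs/observability/metrics.md"))
# → ../../../docs/observability/metrics.md
```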

🤖 Prompt for AI Agents
In components/backends/vllm/prometheus.md around line 13, the markdown link to
the metrics docs uses an incorrect relative path with three parent directories;
update the link to use four parent directories
(../../../../docs/observability/metrics.md) so it correctly resolves to
docs/observability/metrics.md from this file's location.

## 1. DynamoGraphDeployment Definition and Structure

### Core Definition
**File:** `/Users/anishmaddipoti/Desktop/repos/dynamo/deploy/cloud/operator/api/v1alpha1/dynamographdeployment_types.go` (lines 30-69)

⚠️ Potential issue | 🟡 Minor

Replace absolute local file paths with repository-relative paths.

The documentation contains absolute paths to local files (e.g., /Users/anishmaddipoti/Desktop/repos/dynamo/...). These should be replaced with repository-relative paths for portability.

Apply this pattern throughout the file:

-**File:** `/Users/anishmaddipoti/Desktop/repos/dynamo/deploy/cloud/operator/api/v1alpha1/dynamographdeployment_types.go` (lines 30-69)
+**File:** `deploy/cloud/operator/api/v1alpha1/dynamographdeployment_types.go` (lines 30-69)

Also applies to: 268-268, 431-440

🤖 Prompt for AI Agents
In DGD_ARCHITECTURE_ANALYSIS.md around line 6 (and also at lines 268 and
431-440), replace any absolute local file paths like
/Users/anishmaddipoti/Desktop/repos/dynamo/deploy/... with repository-relative
paths (e.g., deploy/cloud/operator/api/v1alpha1/dynamographdeployment_types.go);
scan the file for all occurrences of /Users/... and substitute them with the
repo-root-relative path, keep path separators intact, update any inline examples
to the same repository-relative format, and verify there are no remaining
absolute paths anywhere in the document.
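A small sketch of the substitution and the final "no remaining absolute paths" check is shown below. It assumes the checkout directory is named `dynamo`, so everything up to and including the first `/dynamo/` in a `/Users/...` or `/home/...` path is treated as machine-specific. The regexes and function names are illustrative.

```python
import re
from typing import List

# Assumption: the repo root is the first ".../dynamo/" directory in the path.
LOCAL_PREFIX = re.compile(r"/(?:Users|home)/[^\s`)]*?/dynamo/")
ABSOLUTE_PATH = re.compile(r"/(?:Users|home)/[^\s`)]+")


def to_repo_relative(text: str) -> str:
    """Strip machine-specific prefixes, leaving repository-relative paths."""
    return LOCAL_PREFIX.sub("", text)


def leftover_absolute_paths(text: str) -> List[str]:
    """Home-directory paths that the rewrite did not handle."""
    return ABSOLUTE_PATH.findall(text)
```

Running `leftover_absolute_paths(to_repo_relative(doc_text))` over the whole file should return an empty list once every occurrence has been converted.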

:class: seealso

**New to SLA Planner?** Start with the [SLA Planner Quick Start Guide](/docs/kubernetes/sla_planner_quickstart.md) for a complete, step-by-step workflow.
**New to SLA Planner?** Start with the [SLA Planner Quick Start Guide](/docs/planner/sla_planner_quickstart.md) for a complete, step-by-step workflow.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Verify the absolute path in the Markdown link within RST.

The link uses an absolute path /docs/planner/sla_planner_quickstart.md inside a Markdown-formatted link within an RST file. RST may not correctly resolve absolute paths that start with /docs/.

Run the following script to check if this path exists and to identify other similar link patterns:


🏁 Script executed:

#!/bin/bash
# Check if the target file exists
if [ -f "docs/planner/sla_planner_quickstart.md" ]; then
    echo "✓ Target file exists: docs/planner/sla_planner_quickstart.md"
else
    echo "✗ Target file NOT found: docs/planner/sla_planner_quickstart.md"
fi

# Search for other absolute path links in RST files
echo ""
echo "Other absolute /docs/ paths in RST files:"
rg -t rst '\[.*\]\(/docs/' --color never -n

Length of output: 291


I'll search for information about RST and Sphinx path handling to verify whether this absolute path is problematic.

Replace the Markdown-style link with a proper Sphinx cross-reference.

The link uses Markdown syntax [text](/docs/...) within an RST file, which bypasses Sphinx's path resolution system. In Sphinx, absolute paths starting with / are relative to the top source directory, but only when using Sphinx roles like :doc: or :download:.

Change line 32 from:

[SLA Planner Quick Start Guide](/docs/planner/sla_planner_quickstart.md)

To:

:doc:`SLA Planner Quick Start Guide </planner/sla_planner_quickstart>`

Note: Remove the .md extension and use the :doc: role for proper Sphinx resolution.
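If other RST files contain the same pattern, the conversion can be applied mechanically. The sketch below assumes the Sphinx source root is `docs/`, so `/docs/` is dropped and the remainder becomes an absolute `:doc:` target. It only handles links of the form `[text](/docs/....md)`; the function name is illustrative.

```python
import re

# [text](/docs/some/page.md) -> :doc:`text </some/page>`
MD_DOCS_LINK = re.compile(r"\[([^\]]+)\]\(/docs/([^)\s]+?)\.md\)")


def md_links_to_doc_role(text: str) -> str:
    """Rewrite Markdown links into docs/ as Sphinx :doc: cross-references."""
    return MD_DOCS_LINK.sub(lambda m: f":doc:`{m.group(1)} </{m.group(2)}>`", text)
```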

🤖 Prompt for AI Agents
docs/planner/planner_intro.rst around line 32: the file contains a
Markdown-style absolute link that bypasses Sphinx path resolution; replace the
Markdown link [SLA Planner Quick Start
Guide](/docs/planner/sla_planner_quickstart.md) with a Sphinx cross-reference
using the :doc: role, removing the .md and the leading /docs path, e.g. use
:doc:`SLA Planner Quick Start Guide </planner/sla_planner_quickstart>` so Sphinx
can resolve the document correctly.

Comment on lines +18 to +66
- name: perf
image: nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.6.0
workingDir: /workspace/components/backends/vllm
command:
- /bin/sh
- -c
- |
apt-get update && apt-get install -y curl jq procps git && apt-get clean
pip install git+https://github.com/ai-dynamo/aiperf.git@70af59489df24a601dba57604a7341966150b366;
echo "aiperf installation completed";
sysctl -w net.ipv4.ip_local_port_range="1024 65000"
cat /proc/sys/net/ipv4/ip_local_port_range
# wait for the model to be ready
export ENDPOINT=llama3-70b-agg-0:8000
export TARGET_MODEL=RedHatAI/Llama-3.3-70B-Instruct-FP8-dynamic
export INTERVAL=5
echo "Waiting for model '$TARGET_MODEL' at $ENDPOINT/v1/models (checking every ${INTERVAL}s)..."
while ! curl -s "http://$ENDPOINT/v1/models" | jq -e --arg model "$TARGET_MODEL" '.data[]? | select(.id == $model)' >/dev/null 2>&1; do
echo "[$(date '+%H:%M:%S')] Model not ready yet, waiting ${INTERVAL}s..."
sleep $INTERVAL
done
echo "✅ Model '$TARGET_MODEL' is now available!"
curl -s "http://$ENDPOINT/v1/models" | jq .
# now run the benchmark
export ARTIFACT_DIR="/tmp/genai"
mkdir -p "$ARTIFACT_DIR"
echo "Running benchmark..."
export COLUMNS=200
EPOCH=$(date +%s)
## utility functions -- can be moved to a bash script / configmap
wait_for_model_ready() {
echo "Waiting for model '$TARGET_MODEL' at $ENDPOINT/v1/models (checking every 5s)..."
while ! curl -s "http://$ENDPOINT/v1/models" | jq -e --arg model "$TARGET_MODEL" '.data[]? | select(.id == $model)' >/dev/null 2>&1; do
echo "[$(date '+%H:%M:%S')] Model not ready yet, sleeping 5s before checking again http://$ENDPOINT/v1/models"
sleep 5
done
echo "✅ Model '$TARGET_MODEL' is now available!"
echo "Model '$TARGET_MODEL' is now available!"
curl -s "http://$ENDPOINT/v1/models" | jq .
}
run_perf() {
local concurrency=$1
local isl=$2
local osl=$3
local max_threads=${concurrency}
key=concurrency_${concurrency}
export ARTIFACT_DIR="${ROOT_ARTIFACT_DIR}/${EPOCH}_${JOB_NAME}/${key}"
mkdir -p "$ARTIFACT_DIR"
echo "ARTIFACT_DIR: $ARTIFACT_DIR"
aiperf profile --artifact-dir $ARTIFACT_DIR \
--model $TARGET_MODEL \
--tokenizer /root/.cache/huggingface/hub/models--RedHatAI--Llama-3.3-70B-Instruct-FP8-dynamic/snapshots/ddb4128556dfcff99e0c41aee159ea6c3e655dcd \
--endpoint-type chat --endpoint /v1/chat/completions \
--streaming \
--url http://$ENDPOINT \
--synthetic-input-tokens-mean $isl \
--synthetic-input-tokens-stddev 0 \
--output-tokens-mean $osl \
--output-tokens-stddev 0 \
--extra-inputs max_tokens:$osl \
--extra-inputs min_tokens:$osl \
--extra-inputs ignore_eos:true \
--extra-inputs repetition_penalty:1.0 \
--extra-inputs temperature:0.0 \
--extra-inputs "{\"nvext\":{\"ignore_eos\":true}}" \
--concurrency $concurrency \
--request-count $((10*concurrency)) \
--warmup-request-count $concurrency \
--conversation-num 12800 \
--random-seed 100 \
--workers-max $max_threads \
-H 'Authorization: Bearer NOT USED' \
-H 'Accept: text/event-stream'\
--record-processors 32 \
--ui simple
echo "ARTIFACT_DIR: $ARTIFACT_DIR"
ls -la $ARTIFACT_DIR
}
#### Actual execution ####
wait_for_model_ready
mkdir -p "${ROOT_ARTIFACT_DIR}/${EPOCH}_${JOB_NAME}"
# Calculate total concurrency based on per-GPU concurrency and GPU count
TOTAL_CONCURRENCY=$((CONCURRENCY_PER_GPU * DEPLOYMENT_GPU_COUNT))
echo "Calculated total concurrency: $TOTAL_CONCURRENCY (${CONCURRENCY_PER_GPU} per GPU × ${DEPLOYMENT_GPU_COUNT} GPUs)"
# Write input_config.json
cat > "${ROOT_ARTIFACT_DIR}/${EPOCH}_${JOB_NAME}/input_config.json" <<EOF
{
"gpu_count": $DEPLOYMENT_GPU_COUNT,
"max_threads": $max_threads,
"concurrency_per_gpu": $CONCURRENCY_PER_GPU,
"total_concurrency": $TOTAL_CONCURRENCY,
"mode": "$DEPLOYMENT_MODE",
"isl": $ISL,
"osl": $OSL,
"endpoint": "$ENDPOINT",
"model endpoint": "$TARGET_MODEL"
}
EOF

# Run perf with calculated total concurrency
run_perf $TOTAL_CONCURRENCY $ISL $OSL
echo "done with concurrency $TOTAL_CONCURRENCY"
env:
- name: TARGET_MODEL
value: RedHatAI/Llama-3.3-70B-Instruct-FP8-dynamic
- name: ENDPOINT
value: llama3-70b-agg-frontend:8000
- name: CONCURRENCY_PER_GPU
value: "16"
- name: DEPLOYMENT_GPU_COUNT
value: "4"
- name: ISL
value: "8192"
- name: OSL
value: "1024"
- name: DEPLOYMENT_MODE
value: agg
- name: AIPERF_HTTP_CONNECTION_LIMIT
value: "200"
- name: JOB_NAME
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: metadata.labels['job-name']
- name: ROOT_ARTIFACT_DIR
value: /root/.cache/huggingface/perf
- name: HF_HOME
value: /root/.cache/huggingface
- name: PYTHONUNBUFFERED
value: "1"
image: python:3.12-slim
imagePullPolicy: IfNotPresent
name: perf
securityContext:
privileged: true
aiperf profile \
--model "$TARGET_MODEL" \
--tokenizer ~/.cache/huggingface/hub/models--RedHatAI--Llama-3.3-70B-Instruct-FP8-dynamic/snapshots/ddb4128556dfcff99e0c41aee159ea6c3e655dcd \
--endpoint-type chat --url "$ENDPOINT" --streaming \
--concurrency 64 \
--warmup-request-count 2 \
--request-count 320 \
--extra-inputs max_tokens:1024 \
--synthetic-input-tokens-mean 8192 \
--synthetic-input-tokens-stddev 0 \
--output-tokens-mean 1024 \
--output-tokens-stddev 0 \
--extra-inputs min_tokens:1024 \
--extra-inputs ignore_eos:true \
--extra-inputs "{\"nvext\":{\"ignore_eos\":true}}" \
--random-seed 1418186270 \
--artifact-dir $ARTIFACT_DIR \
--num-dataset-entries=3000 -- \
--max-threads 64
echo "----------------json----------------"
PERF_JSON=$(find $ARTIFACT_DIR -name profile_export_aiperf.json)
cat $PERF_JSON | jq .
echo "----------------csv-----------------"
PERF_CSV=$(find $ARTIFACT_DIR -name profile_export_aiperf.csv)
cat $PERF_CSV
echo "Benchmark completed successfully!"

⚠️ Potential issue | 🟠 Major

Add timeout safeguard to model readiness polling loop.

The new perf container workflow includes a valuable model readiness check (lines 25-35) that prevents benchmarking from starting before the model is available. However, the polling loop lacks an explicit timeout or maximum retry limit:

while ! curl -s "http://$ENDPOINT/v1/models" | jq -e --arg model "$TARGET_MODEL" '.data[]? | select(.id == $model)' >/dev/null 2>&1; do
    echo "[$(date '+%H:%M:%S')] Model not ready yet, waiting ${INTERVAL}s..."
    sleep $INTERVAL
done

If the model never becomes available (due to deployment failures, misconfiguration, etc.), this loop will retry indefinitely. The job will then hang until something external stops it, such as the Job's activeDeadlineSeconds (if one is set) or manual deletion.

Recommended fix: Add an explicit timeout mechanism:

+MAX_RETRIES=120  # 10 minutes with 5-second intervals
+RETRY_COUNT=0
 while ! curl -s "http://$ENDPOINT/v1/models" | jq -e --arg model "$TARGET_MODEL" '.data[]? | select(.id == $model)' >/dev/null 2>&1; do
     echo "[$(date '+%H:%M:%S')] Model not ready yet, waiting ${INTERVAL}s..."
+    RETRY_COUNT=$((RETRY_COUNT + 1))
+    if [ $RETRY_COUNT -ge $MAX_RETRIES ]; then
+      echo "❌ Timeout: Model did not become available within $((MAX_RETRIES * INTERVAL)) seconds"
+      exit 1
+    fi
     sleep $INTERVAL
 done

Additionally, consider validating that required CLI tools (curl, jq, aiperf) are available before the wait loop begins to fail fast on configuration issues.
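The shell diff above is the direct fix for perf.yaml. If the readiness check ever moves into a Python harness, the same two safeguards look like this: fail fast on missing tools, and poll `/v1/models` against a deadline rather than forever. This is a sketch under those assumptions. `missing_tools`, `model_listed`, and `wait_until_ready` are hypothetical names, and the `/v1/models` response shape (`{"data": [{"id": ...}]}`) is taken from the `jq` filter in the script.

```python
import json
import shutil
import time
import urllib.request
from typing import Callable, Iterable, List


def missing_tools(tools: Iterable[str]) -> List[str]:
    """Return the tools that are not on PATH, so the job can fail fast."""
    return [t for t in tools if shutil.which(t) is None]


def model_listed(endpoint: str, model: str, timeout: float = 5.0) -> bool:
    """True if GET http://<endpoint>/v1/models lists a model with this id."""
    try:
        with urllib.request.urlopen(f"http://{endpoint}/v1/models", timeout=timeout) as resp:
            payload = json.load(resp)
    except (OSError, ValueError):  # connection/HTTP errors, or a non-JSON body
        return False
    if not isinstance(payload, dict):
        return False
    entries = payload.get("data") or []
    return any(isinstance(m, dict) and m.get("id") == model for m in entries)


def wait_until_ready(
    is_ready: Callable[[], bool],
    timeout_s: float,
    interval_s: float,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll is_ready() until it succeeds or timeout_s elapses; never loops forever."""
    deadline = clock() + timeout_s
    while True:
        if is_ready():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

A caller would first exit non-zero if `missing_tools(["curl", "jq", "aiperf"])` is non-empty. It would then call `wait_until_ready(lambda: model_listed(endpoint, target_model), timeout_s=600, interval_s=5)` and exit non-zero on `False`, so Kubernetes marks the Job as failed instead of leaving it hanging.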

🤖 Prompt for AI Agents
In recipes/llama-3-70b/vllm/agg/perf.yaml around lines 18 to 66, the readiness
polling loop has no timeout and can hang indefinitely; add a timeout/retry
mechanism (e.g. READINESS_TIMEOUT_SECONDS or MAX_RETRIES computed from INTERVAL)
and break the loop with a clear error log and non‑zero exit when the timeout is
reached, and before entering the loop validate required tools (curl, jq, aiperf)
are present and exit fast with an explanatory message if any are missing. Ensure
the timeout variables are configurable via environment variables, use a counter
or timestamp to track elapsed time, and return a non‑zero exit code when giving
up so Kubernetes can fail the job cleanly.

@dagil-nvidia dagil-nvidia deleted the docs-reorg-to-release-0.6.0 branch October 20, 2025 21:18