docs: reorganization of documentation structure for 0.6.0 #3750
dagil-nvidia wants to merge 25 commits into
Signed-off-by: Harrison Saturley-Hall <hsaturleyhal@nvidia.com>
Signed-off-by: Biswa Panda <biswa.panda@gmail.com>
Signed-off-by: lkomali <lkomali@nvidia.com>
Signed-off-by: Harrison King Saturley-Hall <hsaturleyhal@nvidia.com>
Co-authored-by: Biswa Panda <biswa.panda@gmail.com>
Co-authored-by: lkomali <lkomali@nvidia.com>
Co-authored-by: Harshini Komali <157742537+lkomali@users.noreply.github.com>
- Rename API/ → api/ for consistency
- Create observability/, performance/, development/, reference/ sections
- Move guides/ content to appropriate sections:
  * observability/: health-checks, logging, metrics
  * development/: backend-guide, runtime-guide
  * performance/: tuning
  * backends/: kvbm-setup guides
  * reference/: CLI, glossary, support-matrix
- Keep guides/tool-calling.md (genuine how-to)
- Remove empty runtime/ and deploy/ directories

Signed-off-by: athreesh <anish.maddipoti@utexas.edu>
- Update API/ → api/ references
- Update guides/* → new locations (observability/, development/, performance/, reference/)
- Update index.rst navigation with new paths
- Update hidden_toctree.rst with new paths
- Update cross-references in kubernetes/metrics.md

Signed-off-by: athreesh <anish.maddipoti@utexas.edu>
Signed-off-by: athreesh <anish.maddipoti@utexas.edu>
Signed-off-by: athreesh <anish.maddipoti@utexas.edu>
Signed-off-by: athreesh <anish.maddipoti@utexas.edu>
- Create top-level kvbm/ directory with all KVBM content
  * Architecture docs from architecture/kvbm_*.md
  * Setup guides from backends/*/kvbm-setup.md → kvbm/*-setup.md
- Create top-level planner/ directory
  * Move planner docs from architecture/
- Move router/ out of components/ to top level
- Remove empty components/ directory
- Update all internal links and navigation

Signed-off-by: athreesh <anish.maddipoti@utexas.edu>
Signed-off-by: athreesh <anish.maddipoti@utexas.edu>
Update all documentation links to reflect the new structure where kvbm/, planner/, and router/ directories have been moved to the top level of the docs folder.

Changes:
- Move docs/kubernetes/sla_planner_quickstart.md to docs/planner/
- Update all references from /docs/kubernetes/sla_planner_quickstart to /docs/planner/sla_planner_quickstart (8 files)
- Update references from /docs/architecture/planner_intro to /docs/planner/planner_intro (2 files)
- Update references from /docs/architecture/kvbm_intro to /docs/kvbm/kvbm_intro (1 file)
- Fix critical Sphinx toctree reference in docs/index.rst
- Remove temporary analysis files (DOCS_ANALYSIS.md, FINAL_STRUCTURE.md)

All documentation links now correctly point to the reorganized structure.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: athreesh <anish.maddipoti@utexas.edu>
Fix relative path in components/src/dynamo/planner/README.md to correctly resolve to docs/planner/planner_intro.rst from the repo root. The path ../../docs/planner resolves to components/src/docs/planner which doesn't exist. Updated to ../../../../docs/planner to properly navigate from components/src/dynamo/planner back to repo root. Signed-off-by: athreesh <anish.maddipoti@utexas.edu>
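The path arithmetic in this commit message can be checked mechanically. The following is a minimal sketch, assuming POSIX-style paths relative to the repository root; `resolve_link` is a hypothetical helper written for illustration, not part of the repository.

```python
import posixpath


def resolve_link(source_dir: str, link: str) -> str:
    """Resolve a relative link against the directory of the file that contains it."""
    return posixpath.normpath(posixpath.join(source_dir, link))


# Directory of the README named in the commit message, relative to the repo root.
readme_dir = "components/src/dynamo/planner"

# Two levels up only reaches components/src, so the link lands in a missing directory.
broken = resolve_link(readme_dir, "../../docs/planner/planner_intro.rst")

# Four levels up reaches the repo root, where docs/ actually lives.
fixed = resolve_link(readme_dir, "../../../../docs/planner/planner_intro.rst")

print(broken)
print(fixed)
```

The number of `../` segments has to match the depth of the containing directory (four components here), which is why the two-level version fell short.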
Fix repository-relative links in README.md to work correctly on GitHub. Links with /docs/... resolve to https://github.com/docs/... and break. Changed to docs/... for proper relative resolution.

Fixes:
- /docs/planner/load_planner.md -> docs/planner/load_planner.md
- /docs/planner/sla_planner.md -> docs/planner/sla_planner.md
- /docs/kvbm/kvbm_architecture.md -> docs/kvbm/kvbm_architecture.md

Signed-off-by: athreesh <anish.maddipoti@utexas.edu>
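The difference between the two link forms can be illustrated with standard URL resolution. This is a sketch under assumptions: the page URL below (including the `main` branch name) is a placeholder, and it shows only generic RFC 3986 resolution as implemented by `urllib.parse.urljoin`. GitHub's own Markdown renderer may treat root-relative links differently, so this is illustrative rather than a reproduction of GitHub's behavior.

```python
from urllib.parse import urljoin

# Assumed URL of the rendered README; the branch name "main" is a placeholder.
README_URL = "https://github.com/ai-dynamo/dynamo/blob/main/README.md"

# A leading slash makes the link relative to the host root, dropping the repo path.
root_relative = urljoin(README_URL, "/docs/planner/load_planner.md")

# Without the leading slash the link is resolved next to README.md.
relative = urljoin(README_URL, "docs/planner/load_planner.md")

print(root_relative)
print(relative)
```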
Update all broken links to reflect the new documentation structure:

Metrics and Logging:
- docs/guides/metrics.md → docs/observability/metrics.md (4 files)
- docs/guides/logging.md → docs/observability/logging.md (1 file)

Backend and Development:
- backend.md → ../development/backend-guide.md (3 files)
- docs/runtime/README.md → docs/development/runtime-guide.md (1 file)

CLI and KVBM:
- docs/guides/dynamo_run.md → docs/reference/cli.md (1 file)
- docs/guides/run_kvbm_in_trtllm.md → docs/kvbm/trtllm-setup.md (1 file)

Files updated:
- components/backends/sglang/prometheus.md (2 instances)
- components/backends/vllm/prometheus.md (2 instances)
- deploy/metrics/README.md
- deploy/tracing/README.md
- docs/backends/trtllm/README.md
- docs/kubernetes/create_deployment.md
- docs/observability/health-checks.md
- docs/observability/logging.md
- docs/observability/metrics.md
- lib/bindings/python/README.md

Cleanup:
- Remove temporary reorganization documentation files (REORG_SUMMARY.md, RESTRUCTURE_PLAN.md, URL_MIGRATION_GUIDE.md)

All documentation links now correctly reference the reorganized structure.

Signed-off-by: athreesh <anish.maddipoti@utexas.edu>
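A bulk link migration like the one listed in this commit could be sketched as a simple old-to-new mapping applied to file contents. The mapping below copies only the full-path moves named in the commit message; `rewrite_links` is a hypothetical helper, and the PR may well have been prepared by hand or with different tooling.

```python
# Old path -> new path, taken from the moves listed in the commit message.
LINK_MOVES = {
    "docs/guides/metrics.md": "docs/observability/metrics.md",
    "docs/guides/logging.md": "docs/observability/logging.md",
    "docs/runtime/README.md": "docs/development/runtime-guide.md",
    "docs/guides/dynamo_run.md": "docs/reference/cli.md",
    "docs/guides/run_kvbm_in_trtllm.md": "docs/kvbm/trtllm-setup.md",
}


def rewrite_links(text: str, moves: dict[str, str] = LINK_MOVES) -> str:
    """Replace every occurrence of an old documentation path with its new path."""
    for old, new in moves.items():
        text = text.replace(old, new)
    return text


before = "See the [metrics guide](../../docs/guides/metrics.md) for details."
after = rewrite_links(before)
print(after)
```

Plain substring replacement keeps any leading `../` prefix as it was, so it is only safe when the old and new targets sit at the same directory depth, as they do for the entries above. Bare filenames such as `backend.md` are left out because replacing them blindly could touch unrelated text.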
Signed-off-by: alec-flowers <aflowers@nvidia.com>
Merges all documentation reorganization changes and other updates from docs-reorg branch into release/0.6.0, including the fix for the planner README hyperlink. Conflicts resolved by accepting docs-reorg version for all conflicts. Signed-off-by: athreesh <anish.maddipoti@utexas.edu>
Signed-off-by: Dan Gil <dagil@nvidia.com>
Walkthrough
This PR consolidates a configuration and documentation restructuring: it introduces DynamoGraphDeployment architecture documentation with public type definitions, updates the default Docker registry from
Changes
Sequence Diagram(s)
sequenceDiagram
participant User
participant Profiler as benchmarks/profiler/<br>profile_sla.py
participant Parser as Argparse CLI
participant DGD as DynamoGraphDeployment
participant Planner as Planner Service
User->>Profiler: Launch with args
Profiler->>Parser: Build CLI with planner args
Parser->>Profiler: Return parsed arguments
Profiler->>Profiler: Run profiling workflow
Note over Profiler: Original profiling logic
alt args.deploy_after_profile && !args.dry_run
Profiler->>DGD: generate_dgd_config_with_planner()
DGD->>Planner: Deploy optimized config
Planner->>DGD: Monitor & adjust replicas
end
Profiler->>User: Complete with artifacts
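The control flow in the diagram can be sketched in Python as follows. This is an illustration under assumptions, not the code in `profile_sla.py`: the flag spellings `--deploy-after-profile` and `--dry-run` are inferred from `args.deploy_after_profile` and `args.dry_run`, and `run_profiling` and `generate_config_stub` are hypothetical stand-ins for the real profiling logic and for `generate_dgd_config_with_planner()`, whose actual signature is not shown on this page.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a tiny CLI with the two flags that gate deployment (assumed names)."""
    parser = argparse.ArgumentParser(description="Profiling flow sketch")
    parser.add_argument("--deploy-after-profile", action="store_true")
    parser.add_argument("--dry-run", action="store_true")
    return parser


def run_profiling(args: argparse.Namespace) -> dict:
    """Hypothetical placeholder for the original profiling logic."""
    return {"artifacts": "profiling_results/"}


def generate_config_stub(results: dict) -> dict:
    """Hypothetical stand-in for generate_dgd_config_with_planner()."""
    return {"kind": "DynamoGraphDeployment", "source": results["artifacts"]}


def main(argv=None) -> list[str]:
    """Run the flow and return the steps taken, in order."""
    args = build_parser().parse_args(argv)
    steps = ["profile"]
    results = run_profiling(args)
    # Deployment happens only when requested and when this is not a dry run.
    if args.deploy_after_profile and not args.dry_run:
        config = generate_config_stub(results)
        steps.append(f"deploy {config['kind']}")
    steps.append("done")
    return steps


print(main(["--deploy-after-profile"]))
```

Returning the list of steps keeps the sketch easy to check; a dry run or a missing `--deploy-after-profile` skips the deployment branch entirely.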
Estimated code review effort
🎯 3 (Moderate) | ⏱️ ~35 minutes
Rationale: The PR spans ~80+ files but exhibits high homogeneity in the bulk of changes (repetitive container image tag updates across ~50+ deployment YAML files and documentation link path reorganizations across ~25+ files). These patterns are straightforward to verify once understood. However, complexity arises from: (1) the breadth of documentation restructuring requiring verification of path consistency, (2) the new public type definitions and architecture documentation requiring understanding of DGD design, (3) the refactored perf job workflows with new polling and artifact logic, and (4) the profiler CLI integration changes. The variety of change categories (build config, images, docs, workflows, types) prevents this from being "Simple," but the predominance of homogeneous updates keeps it from reaching "Complex."
Possibly related PRs
Poem
Pre-merge checks
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 6
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
docs/backends/sglang/README.md (1)
58-58: Fix broken documentation link.
Line 58 references an incorrect path. From `docs/backends/sglang/README.md`, the file `docs/architecture/request_migration.md` exists and should use the correct relative path `../../architecture/request_migration.md` to match the pattern of all other documentation links in the file (lines 37, 38, 39, 42).
-| `--migration-limit` | Max times a request can migrate between workers for fault tolerance. See [Request Migration Architecture](../../../docs/architecture/request_migration.md). | `0` (disabled) | N/A |
+| `--migration-limit` | Max times a request can migrate between workers for fault tolerance. See [Request Migration Architecture](../../architecture/request_migration.md). | `0` (disabled) | N/A |
🧹 Nitpick comments (4)
DGD_ARCHITECTURE_ANALYSIS.md (2)
1-6: Consider relocating to docs/ directory.
Given this PR's focus on documentation reorganization, this new architecture analysis file should likely be placed under the new `docs/` structure (perhaps `docs/planner/` or `docs/development/`) rather than at the repository root.
83-83: Add language identifiers to fenced code blocks.
Multiple fenced code blocks lack language identifiers, which affects syntax highlighting and accessibility.
Add appropriate language identifiers:
- Go code blocks: `` ```go ``
- Python code blocks: `` ```python ``
- Rust code blocks: `` ```rust ``
- YAML code blocks: `` ```yaml ``
- Shell/text blocks: `` ```text `` or `` ```bash ``
Also applies to: 104-104, 114-114, 143-143, 190-190, 196-196, 206-206, 215-215, 235-235, 250-250, 256-256, 268-268, 276-276, 283-283, 294-294, 318-318, 326-326, 333-333
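A check in the spirit of the MD040 finding above can be sketched in a few lines. This is a simplified illustration, not markdownlint's implementation: it assumes backtick fences of exactly three characters and ignores tilde fences, longer fences, and nesting. `fences_missing_language` is a hypothetical helper name.

```python
# Build the fence marker programmatically so this example contains no literal fence.
FENCE = "`" * 3


def fences_missing_language(markdown: str) -> list[int]:
    """Return 1-based line numbers of opening fences that carry no language."""
    missing = []
    in_fence = False
    for number, line in enumerate(markdown.splitlines(), start=1):
        stripped = line.strip()
        if not stripped.startswith(FENCE):
            continue
        if in_fence:
            in_fence = False  # this line closes the current block
        else:
            in_fence = True  # this line opens a new block
            if stripped == FENCE:
                missing.append(number)
    return missing


sample = "\n".join(
    ["intro", FENCE, "x = 1", FENCE, FENCE + "python", "y = 2", FENCE]
)
print(fences_missing_language(sample))
```

Tracking open and closed state matters because a closing fence also has no language, and counting it would produce false positives.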
docs/kvbm/vllm-setup.md (1)
22-22: Improve link text for clarity and linting compliance.
The link text "here" is not descriptive and violates markdown linting rule MD059. Provide more contextual text that describes what users will find at the link destination.
Consider updating the link text as follows:
-To learn what KVBM is, please check [here](https://docs.nvidia.com/dynamo/latest/kvbm/kvbm_intro.html)
+To learn what KVBM is, please check the [KVBM introduction](https://docs.nvidia.com/dynamo/latest/kvbm/kvbm_intro.html)
This provides better context for readers and accessibility tools.
docs/kvbm/trtllm-setup.md (1)
22-22: Replace non-descriptive link text "here" with descriptive alternative.
The KVBM intro documentation file exists at the new location (`./docs/kvbm/kvbm_intro.rst`), confirming the link target is valid. The `.html` extension in the external URL is correct for Sphinx-built documentation.
However, the link text "here" remains non-descriptive per Markdown best practices. Apply the suggested improvement:
-To learn what KVBM is, please check [here](https://docs.nvidia.com/dynamo/latest/kvbm/kvbm_intro.html)
+To learn what KVBM is, check the [KVBM introduction](https://docs.nvidia.com/dynamo/latest/kvbm/kvbm_intro.html)
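Non-descriptive link text of the kind flagged in both KVBM setup guides can be found with a small scan. The sketch below is an approximation under assumptions: the set of generic phrases is illustrative and is not the list markdownlint uses for MD059, and the regular expression handles only simple inline links.

```python
import re

# Illustrative set of generic phrases; an assumption, not markdownlint's own list.
GENERIC_LINK_TEXT = {"here", "click here", "link", "this"}

# Matches simple inline links of the form [text](target).
LINK_PATTERN = re.compile(r"\[([^\]]+)\]\(([^)]+)\)")


def non_descriptive_links(markdown: str) -> list[tuple[int, str, str]]:
    """Return (line number, link text, target) for links with generic text."""
    findings = []
    for number, line in enumerate(markdown.splitlines(), start=1):
        for text, target in LINK_PATTERN.findall(line):
            if text.strip().lower() in GENERIC_LINK_TEXT:
                findings.append((number, text, target))
    return findings


URL = "https://docs.nvidia.com/dynamo/latest/kvbm/kvbm_intro.html"
sample = (
    f"To learn what KVBM is, please check [here]({URL})\n"
    f"To learn what KVBM is, check the [KVBM introduction]({URL})"
)
print(non_descriptive_links(sample))
```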
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (87)
- DGD_ARCHITECTURE_ANALYSIS.md (1 hunks)
- Earthfile (3 hunks)
- README.md (2 hunks)
- benchmarks/incluster/benchmark_job.yaml (1 hunks)
- benchmarks/profiler/profile_sla.py (3 hunks)
- benchmarks/profiler/utils/config.py (1 hunks)
- components/backends/sglang/deploy/README.md (2 hunks)
- components/backends/sglang/deploy/agg.yaml (2 hunks)
- components/backends/sglang/deploy/agg_logging.yaml (2 hunks)
- components/backends/sglang/deploy/agg_router.yaml (2 hunks)
- components/backends/sglang/deploy/disagg-multinode.yaml (3 hunks)
- components/backends/sglang/deploy/disagg.yaml (3 hunks)
- components/backends/sglang/deploy/disagg_planner.yaml (4 hunks)
- components/backends/sglang/prometheus.md (2 hunks)
- components/backends/trtllm/deploy/README.md (3 hunks)
- components/backends/trtllm/deploy/agg-with-config.yaml (2 hunks)
- components/backends/trtllm/deploy/agg.yaml (2 hunks)
- components/backends/trtllm/deploy/agg_router.yaml (2 hunks)
- components/backends/trtllm/deploy/disagg-multinode.yaml (3 hunks)
- components/backends/trtllm/deploy/disagg.yaml (3 hunks)
- components/backends/trtllm/deploy/disagg_planner.yaml (4 hunks)
- components/backends/trtllm/deploy/disagg_router.yaml (3 hunks)
- components/backends/vllm/deploy/README.md (3 hunks)
- components/backends/vllm/deploy/agg.yaml (2 hunks)
- components/backends/vllm/deploy/agg_kvbm.yaml (2 hunks)
- components/backends/vllm/deploy/agg_router.yaml (2 hunks)
- components/backends/vllm/deploy/disagg-multinode.yaml (3 hunks)
- components/backends/vllm/deploy/disagg.yaml (3 hunks)
- components/backends/vllm/deploy/disagg_kvbm.yaml (3 hunks)
- components/backends/vllm/deploy/disagg_kvbm_2p2d.yaml (3 hunks)
- components/backends/vllm/deploy/disagg_kvbm_tp2.yaml (3 hunks)
- components/backends/vllm/deploy/disagg_planner.yaml (4 hunks)
- components/backends/vllm/deploy/disagg_router.yaml (3 hunks)
- components/backends/vllm/prometheus.md (2 hunks)
- components/src/dynamo/planner/README.md (1 hunks)
- container/Dockerfile.sglang (1 hunks)
- container/Dockerfile.sglang-wideep (1 hunks)
- deploy/cloud/operator/Earthfile (1 hunks)
- deploy/cloud/pre-deployment/nixl/README.md (1 hunks)
- deploy/metrics/README.md (1 hunks)
- deploy/tracing/README.md (1 hunks)
- docs/_includes/install.rst (2 hunks)
- docs/architecture/architecture.md (1 hunks)
- docs/architecture/kv_cache_routing.md (2 hunks)
- docs/backends/sglang/README.md (2 hunks)
- docs/backends/trtllm/README.md (2 hunks)
- docs/backends/trtllm/gpt-oss.md (1 hunks)
- docs/backends/vllm/README.md (1 hunks)
- docs/benchmarks/benchmarking.md (1 hunks)
- docs/benchmarks/pre_deployment_profiling.md (2 hunks)
- docs/deploy/metrics/docker-compose.yml (0 hunks)
- docs/hidden_toctree.rst (2 hunks)
- docs/index.rst (2 hunks)
- docs/kubernetes/create_deployment.md (2 hunks)
- docs/kubernetes/installation_guide.md (1 hunks)
- docs/kubernetes/metrics.md (1 hunks)
- docs/kvbm/trtllm-setup.md (1 hunks)
- docs/kvbm/vllm-setup.md (1 hunks)
- docs/observability/health-checks.md (1 hunks)
- docs/observability/logging.md (1 hunks)
- docs/observability/metrics.md (1 hunks)
- docs/planner/planner_intro.rst (2 hunks)
- docs/planner/sla_planner.md (2 hunks)
- docs/planner/sla_planner_quickstart.md (2 hunks)
- examples/README.md (3 hunks)
- examples/basics/kubernetes/Distributed_Inference/agg_router.yaml (2 hunks)
- examples/custom_backend/hello_world/deploy/hello_world.yaml (2 hunks)
- examples/deployments/ECS/task_definition_frontend.json (1 hunks)
- examples/deployments/ECS/task_definition_prefillworker.json (1 hunks)
- examples/multimodal/deploy/agg_llava.yaml (4 hunks)
- examples/multimodal/deploy/agg_qwen.yaml (4 hunks)
- lib/bindings/python/README.md (1 hunks)
- pyproject.toml (1 hunks)
- recipes/gpt-oss-120b/trtllm/agg/deploy.yaml (2 hunks)
- recipes/llama-3-70b/vllm/agg/deploy.yaml (2 hunks)
- recipes/llama-3-70b/vllm/agg/perf.yaml (2 hunks)
- recipes/llama-3-70b/vllm/disagg-multi-node/deploy.yaml (3 hunks)
- recipes/llama-3-70b/vllm/disagg-multi-node/perf.yaml (2 hunks)
- recipes/llama-3-70b/vllm/disagg-single-node/deploy.yaml (3 hunks)
- recipes/llama-3-70b/vllm/disagg-single-node/perf.yaml (2 hunks)
- tests/planner/perf_test_configs/agg_8b.yaml (2 hunks)
- tests/planner/perf_test_configs/disagg_8b_2p2d.yaml (3 hunks)
- tests/planner/perf_test_configs/disagg_8b_3p1d.yaml (3 hunks)
- tests/planner/perf_test_configs/disagg_8b_planner.yaml (4 hunks)
- tests/planner/perf_test_configs/disagg_8b_tp2.yaml (3 hunks)
- tests/planner/perf_test_configs/image_cache_daemonset.yaml (1 hunks)
- tests/planner/scaling/disagg_planner.yaml (4 hunks)
💤 Files with no reviewable changes (1)
- docs/deploy/metrics/docker-compose.yml
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-09-16T00:26:43.641Z
Learnt from: keivenchang
PR: ai-dynamo/dynamo#3035
File: lib/runtime/examples/system_metrics/README.md:65-65
Timestamp: 2025-09-16T00:26:43.641Z
Learning: The team at ai-dynamo/dynamo prefers to use consistent metric naming patterns with _total suffixes across all metric types (including gauges) for internal consistency, even when this differs from strict Prometheus conventions that reserve _total for counters only. This design decision was confirmed by keivenchang in PR 3035, referencing examples in prometheus_names.rs and input from team members.
Applied to files:
components/backends/vllm/prometheus.md
components/backends/sglang/prometheus.md
📚 Learning: 2025-09-29T19:11:14.161Z
Learnt from: nv-anants
PR: ai-dynamo/dynamo#3290
File: docs/_includes/quick_start_local.rst:13-14
Timestamp: 2025-09-29T19:11:14.161Z
Learning: ai-dynamo package requires --prerelease=allow flag for installation due to sglang dependency requiring RC versions (e.g., sglang[all]==0.5.0rc2).
Applied to files:
docs/backends/sglang/README.md
🧬 Code graph analysis (1)
benchmarks/profiler/profile_sla.py (3)
benchmarks/profiler/utils/config.py (1)
generate_dgd_config_with_planner (405-514)
benchmarks/profiler/utils/planner_utils.py (1)
add_planner_arguments_to_parser (132-174)
deploy/utils/dynamo_deployment.py (2)
DynamoDeploymentClient (98-475)
create_deployment (220-265)
🪛 LanguageTool
DGD_ARCHITECTURE_ANALYSIS.md
[style] ~90-~90: The words ‘Observation’ and ‘observed’ are quite similar. Consider replacing ‘observed’ with a different word.
Context: ...justment_interval` seconds - Metrics observed: - Number of requests (num_req) ...
(VERB_NOUN_SENT_LEVEL_REP)
[grammar] ~122-~122: Use a hyphen to join words.
Context: ...replicas, blocking=False)` ### Decision Making Assumptions - **Homogeneous worke...
(QB_NEW_EN_HYPHEN)
🪛 markdownlint-cli2 (0.18.1)
docs/kvbm/vllm-setup.md
22-22: Link text should be descriptive
(MD059, descriptive-link-text)
docs/kvbm/trtllm-setup.md
22-22: Link text should be descriptive
(MD059, descriptive-link-text)
DGD_ARCHITECTURE_ANALYSIS.md
33-33: Link text should be descriptive
(MD059, descriptive-link-text)
83-83: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
104-104: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
114-114: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
143-143: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
190-190: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
196-196: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
206-206: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
215-215: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
219-219: Bare URL used
(MD034, no-bare-urls)
229-229: Blank line inside blockquote
(MD028, no-blanks-blockquote)
235-235: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
246-246: Link text should be descriptive
(MD059, descriptive-link-text)
250-250: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
256-256: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
268-268: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
276-276: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
283-283: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
294-294: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
318-318: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
326-326: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
333-333: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🔇 Additional comments (88)
examples/basics/kubernetes/Distributed_Inference/agg_router.yaml (1)
41-41: Verify version consistency across related deployment manifests.
The changes to `agg_router.yaml` are correctly updated to `0.6.0` with no remaining placeholders. However, the verification revealed a version inconsistency: `shared_frontend/shared_frontend.yaml` still uses `vllm-runtime:0.5.0` (5 instances on lines 26, 48, 77, 98, 119) while `agg_router.yaml` uses `0.6.0`.
Clarify intent: If this PR aims for comprehensive 0.6.0 release updates across all deployment manifests, `shared_frontend.yaml` should also be updated. If the version difference is intentional (e.g., different services require different versions), please confirm this is expected and document the reason.
benchmarks/profiler/profile_sla.py (2)
16-16: LGTM! Import additions support the new planner integration.
The new imports are correctly added and used throughout the file: `argparse` for the inline CLI parser, `WORKER_COMPONENT_NAMES` and `generate_dgd_config_with_planner` for the planner configuration, and `add_planner_arguments_to_parser` for dynamic argument injection.
Also applies to: 26-32
762-905: LGTM! Comprehensive argparse setup with good organization.
The inline argparse parser is well-structured with:
- Clear argument grouping (profiling, interpolation, MoE, deployment, AI configurator)
- Helpful descriptions and default values
- Proper integration with planner arguments via `add_planner_arguments_to_parser()` (line 876)
This replaces the previous `create_profiler_parser()` approach with a more maintainable, self-contained implementation.
docs/backends/trtllm/gpt-oss.md (1)
52-52: Image tag and registry update is consistent with release version.
The container image reference has been properly updated from a placeholder to the official 0.6.0 release tag with the NVIDIA AI-Dynamo registry. This is in line with the PR objectives.
deploy/metrics/README.md (1)
6-6: Documentation reference path updated consistently with reorganization.
The metrics guide link has been properly updated to reflect the new documentation structure where metrics documentation has been moved to the observability section.
deploy/tracing/README.md (1)
22-22: Documentation reference path updated consistently with reorganization.
The logging guide reference has been properly updated to reflect the new documentation structure where logging documentation has been moved to the observability section, consistent with the metrics README update.
docs/planner/sla_planner_quickstart.md (2)
109-109: Image tag and registry updated consistently with release version.
The Docker image reference has been properly updated to use the official 0.6.0 release tag with the NVIDIA AI-Dynamo registry.
252-252: Documentation link updated to reflect planner subsection reorganization.
The architecture documentation link has been properly updated to point to the new planner documentation location, aligned with the PR's documentation reorganization objectives.
components/src/dynamo/planner/README.md (1)
18-18: Relative documentation path updated correctly for new directory structure.
The planner documentation link has been properly updated with the correct relative path traversal (from `components/src/dynamo/planner/` to root level `docs/planner/`), aligning with the documentation reorganization.
examples/deployments/ECS/task_definition_frontend.json (1)
6-6: Container image updated to official NVIDIA AI-Dynamo registry with 0.6.0 tag.
The ECS task definition now references the official container image with proper registry and release tag, consistent with other deployment updates in this PR.
docs/kubernetes/create_deployment.md (2)
93-93: Documentation link updated to reference section.
The dynamo run guide reference has been updated to point to the reference section. Verify this is the intended location; if this is contributor-focused documentation, consider whether `/docs/development/cli.md` or similar would be more appropriate.
161-161: Image pull secrets example updated with realistic registry hostname.
The example imagePullSecrets name has been updated from a generic placeholder to reflect the actual NVIDIA registry. Ensure the surrounding documentation clearly indicates this is an example that users should customize for their actual secret configurations.
deploy/cloud/pre-deployment/nixl/README.md (1)
1-4: SPDX license header properly added.
Standard copyright and Apache-2.0 license header has been added to the README file, aligning with repository licensing standards.
benchmarks/incluster/benchmark_job.yaml (1)
21-21: LGTM!
The image tag update from placeholder to `nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.6.0` is consistent with the 0.6.0 release updates across the PR and aligns with the TODO comment.
docs/_includes/install.rst (1)
13-13: LGTM!
The installation documentation is correctly updated to reflect the 0.6.0 release with proper version pinning for both PyPI and Docker image references.
Also applies to: 44-44
pyproject.toml (1)
63-63: Verify sglang post1 stability.
The sglang dependency is being downgraded from `0.5.3.post2` to `0.5.3.post1`. While the enriched context indicates this aligns with broader ecosystem updates reflected in container Dockerfiles, confirm that `post1` is the stable/tested version and doesn't regress any functionality used by the codebase.
lib/bindings/python/README.md (1)
53-53: Verify documentation path is correct.
The link has been updated from a runtime README to `docs/development/runtime-guide.md`. Ensure that this file exists at the new location after the documentation reorganization and contains the prerequisites content expected by users of this README.
examples/deployments/ECS/task_definition_prefillworker.json (1)
6-6: LGTM!
The ECS task definition is correctly updated to reference the official runtime image with the 0.6.0 tag.
examples/multimodal/deploy/agg_qwen.yaml (1)
17-17: LGTM!
All four services in the multimodal deployment spec are consistently updated to the correct runtime image with version 0.6.0. No other fields were modified, preserving the deployment logic.
Also applies to: 28-28, 45-45, 62-62
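A tag-consistency check of the kind that surfaced the `shared_frontend.yaml` mismatch noted earlier could be sketched as below. This is an illustration under assumptions: it scans raw text for `image:` lines rather than parsing YAML, it does not handle digest references, and the full registry path used for `shared_frontend.yaml` in the sample is assumed, since the review quotes only `vllm-runtime:0.5.0`.

```python
import re
from collections import defaultdict

# Greedy match splits at the last colon, so registry ports stay in the repo part.
IMAGE_PATTERN = re.compile(r"image:\s*(\S+):(\S+)")


def image_tags_by_repo(manifests: dict[str, str]) -> dict[str, set[str]]:
    """Collect every tag seen for each image repository across all manifests."""
    tags = defaultdict(set)
    for text in manifests.values():
        for repo, tag in IMAGE_PATTERN.findall(text):
            tags[repo].add(tag)
    return dict(tags)


def inconsistent_repos(manifests: dict[str, str]) -> dict[str, set[str]]:
    """Return only the repositories that appear with more than one tag."""
    return {
        repo: tags
        for repo, tags in image_tags_by_repo(manifests).items()
        if len(tags) > 1
    }


# Sample manifest contents; the registry prefix on the second entry is assumed.
manifests = {
    "agg_router.yaml": "image: nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.6.0\n",
    "shared_frontend.yaml": "image: nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.5.0\n",
}
print(inconsistent_repos(manifests))
```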
components/backends/vllm/deploy/disagg_kvbm.yaml (1)
16-16: LGTM!
All three worker types in the disaggregated KVBM deployment are consistently updated to `0.6.0`. The image references are correct for the KVBM test configuration.
Also applies to: 27-27, 59-59
docs/architecture/architecture.md (1)
57-58: Links verify correctly; no issues found.
Both target documentation files exist at their new paths:
- `docs/kvbm/kvbm_intro.rst` ✓
- `docs/planner/planner_intro.rst` ✓
The relative path updates in the architecture documentation are correct and resolve to valid destinations. No broken links detected.
deploy/cloud/operator/Earthfile (1)
48-48: LGTM! Registry update aligns with the 0.6.0 release.
The update from the placeholder registry to NVIDIA's production registry is consistent with the broader migration across the PR.
docs/planner/planner_intro.rst (1)
80-80: LGTM! Toctree reference correctly updated.
The relative path update is consistent with the documentation reorganization.
components/backends/vllm/deploy/README.md (2)
72-72: LGTM! Image references updated to production registry.
The container image references are now pointing to the official NVIDIA registry with the 0.6.0 release tag, replacing the placeholder values.
Also applies to: 119-119
240-240: LGTM! Documentation link correctly updated.
The SLA Planner quickstart link has been updated to reflect the new documentation structure, with the correct relative path from the current file location.
components/backends/vllm/deploy/disagg_kvbm_2p2d.yaml (1)
16-16: LGTM! All service images consistently updated to 0.6.0.
The three services (Frontend, VllmDecodeWorker, VllmPrefillWorker) have been updated consistently to use the production NVIDIA registry with version 0.6.0.
Also applies to: 27-27, 63-63
components/backends/trtllm/deploy/disagg_router.yaml (1)
16-16: LGTM! TensorRT-LLM service images consistently updated.
All three services (Frontend, TRTLLMPrefillWorker, TRTLLMDecodeWorker) have been updated to use the production NVIDIA registry with version 0.6.0, correctly using the `trtllm-runtime` image.
Also applies to: 30-30, 58-58
benchmarks/profiler/utils/config.py (1)
109-109: LGTM! Default planner image updated to production registry.
The default container image for the planner service has been updated from placeholder values to the production NVIDIA registry with version 0.6.0, consistent with other components in this PR.
components/backends/sglang/prometheus.md (1)
13-13: LGTM! Metrics documentation paths updated correctly.
Both references to the Dynamo metrics documentation have been updated to reflect the reorganization from `docs/guides/metrics.md` to `docs/observability/metrics.md`, with correct relative paths.
Also applies to: 94-94
docs/backends/trtllm/README.md (2)
311-311: Link path and filename change verified.
The documentation link has been successfully updated from `docs/architecture/run_kvbm_in_trtllm.md` to `docs/kvbm/trtllm-setup.md`. The new file exists and the old path has been removed. No action required.
58-60: All documentation paths verified and valid.
The script confirms that all three updated documentation paths exist at their new locations:
- `docs/planner/sla_planner.md` ✓
- `docs/planner/load_planner.md` ✓
- `docs/kvbm/kvbm_architecture.md` ✓
No broken links or missing files found.
examples/multimodal/deploy/agg_llava.yaml (1)
17-17: Registry and image tag updates applied consistently.
All four container image references have been updated uniformly from the generic registry to the NVIDIA AI-Dynamo registry with the 0.6.0 release tag. No functional changes were made to the deployment configuration.
Also applies to: 28-28, 45-45, 62-62
components/backends/vllm/deploy/disagg-multinode.yaml (1)
16-16: Registry and image tag updates applied consistently across all workers.
All three vllm-runtime image references (Frontend, decode worker, prefill worker) have been updated uniformly to the NVIDIA AI-Dynamo registry with the 0.6.0 tag.
Also applies to: 37-37, 60-60
components/backends/sglang/deploy/README.md (1)
64-64: Image references updated consistently to 0.6.0 release tag.
Both container image references have been updated to the NVIDIA AI-Dynamo registry with the 0.6.0 tag, aligning with the release version.
Also applies to: 95-95
components/backends/trtllm/deploy/disagg-multinode.yaml (1)
98-98: Registry and image tag updates applied consistently across all workers.
All three trtllm-runtime image references (Frontend, prefill worker, decode worker) have been updated uniformly to the NVIDIA AI-Dynamo registry with the 0.6.0 tag.
Also applies to: 130-130, 170-170
components/backends/trtllm/deploy/README.md (1)
92-92: Image references updated consistently to 0.6.0 release tag.
All three trtllm-runtime image references have been updated to the NVIDIA AI-Dynamo registry with the 0.6.0 tag, aligning with the 0.6.0 release version.
Also applies to: 112-112, 144-144
container/Dockerfile.sglang (1)
16-17: Dependency versions are synchronized correctly.
The sglang dependency in pyproject.toml matches the Dockerfile specification: both specify `0.5.3.post1`. The version downgrade has been applied consistently across both configuration files.
container/Dockerfile.sglang-wideep (1)
4-4: Dependency synchronization verified; no action required.
The pyproject.toml correctly specifies `sglang[all]==0.5.3.post1`, matching the downgraded image tag in the Dockerfile. The version synchronization across files is complete.
examples/README.md (3)
51-59: All launch tool directories verified successfully.
All four paths referenced in the documentation exist:
- `launch/dynamo-run/` ✓
- `components/backends/vllm/launch/` ✓
- `components/backends/sglang/launch/` ✓
- `components/backends/trtllm/launch/` ✓
The merge conflicts were properly resolved, and the directory reorganization is complete.
86-100: All referenced backend component paths verified; links are correct.
The verification confirms that all nine paths referenced in the restructured Components Workflows section (lines 86–100) exist in the repository:
- ✓ Backend directories: `components/backends/{vllm,sglang,trtllm}/`
- ✓ Kubernetes CRDs: `components/backends/{vllm,sglang,trtllm}/deploy/`
- ✓ Launch Scripts: `components/backends/{vllm,sglang,trtllm}/launch/`
No broken or missing links found.
39-39: GKE deployment guide verified; no action needed.
Line 39 references `../docs/kubernetes/gke_setup.md` under platform-specific deployment guides. Verification confirms this file exists at the correct relative path.
examples/custom_backend/hello_world/deploy/hello_world.yaml (1)
44-44: LGTM! Container image references updated for 0.6.0 release.
Both service containers now reference the official NVIDIA registry with the 0.6.0 release tag, consistent with the PR's objectives.
Also applies to: 83-83
docs/benchmarks/benchmarking.md (1)
413-413: LGTM! Documentation updated with 0.6.0 image tag.
The default configuration example now reflects the 0.6.0 release image tag.
Earthfile (1)
137-137: LGTM! Docker registry updated across all build targets.
All Earthfile targets now reference the official NVIDIA registry `nvcr.io/nvidia/ai-dynamo`, consistent with the 0.6.0 release standardization.
Also applies to: 178-178, 192-192
recipes/llama-3-70b/vllm/disagg-single-node/perf.yaml (2)
8-8: LGTM! Increased retry tolerance for performance tests.
Increasing `backoffLimit` from 1 to 3 is appropriate for performance benchmarking jobs that may experience transient failures.
18-66: LGTM! Enhanced performance testing workflow.
The refactored approach introduces several improvements:
- Uses official 0.6.0 vLLM runtime image
- Implements model readiness polling before benchmarking
- Provides detailed benchmark execution with explicit artifact markers
- Better structured for automated CI/CD pipelines
README.md (2)
77-77: No issues found; support matrix path is correct.
The file docs/reference/support-matrix.md exists at the location referenced in README.md line 77. The path update is accurate and reflects the file's actual location after reorganization.
62-64: Documentation paths verified; all files exist at referenced locations.
The framework support matrix links in README.md lines 62-64 correctly reference the reorganized documentation. All three files (load_planner.md, sla_planner.md, and kvbm_architecture.md) exist at their new paths under docs/planner/ and docs/kvbm/. No broken links found.
docs/benchmarks/pre_deployment_profiling.md (1)
4-4: The verification confirms that the file sla_planner_quickstart.md exists at the path /docs/planner/sla_planner_quickstart.md. The documentation link is valid and references an existing file. No issues found.
docs/hidden_toctree.rst (1)
14-42: Documentation reorganization verified; all paths correct and toctree syntax valid.
All 26 files referenced in the hidden toctree exist at their new locations following the major reorganization:
- docs/development/runtime-guide.md ✓
- docs/api/nixl_connect/ (12 files) ✓
- docs/kubernetes/ (6 files) ✓
- docs/reference/cli.md ✓
- docs/observability/metrics.md ✓
- docs/kvbm/ (2 files) ✓
- docs/guides/tool-calling.md ✓
- docs/architecture/kv_cache_routing.md ✓
- docs/planner/load_planner.md ✓
The toctree directive syntax is valid with proper indentation. No broken references detected.
docs/observability/metrics.md (1)
99-99: Verify documentation link target exists.
The Backend Guide link has been updated as part of the documentation reorganization. Ensure that the target file docs/development/backend-guide.md exists and contains the expected content.
The link text is descriptive and the relative path looks correct for the file hierarchy. Please confirm the target file exists in the repository after the reorganization.
recipes/gpt-oss-120b/trtllm/agg/deploy.yaml (1)
33-33: Verify TensorRT-LLM runtime image availability.
The container image tags have been updated to 0.6.0 in both the Frontend and TrtllmWorker components. Confirm that the image nvcr.io/nvidia/ai-dynamo/trtllm-runtime:0.6.0 exists and is accessible in the NVIDIA registry.
As mentioned in the previous file review, you can verify the image availability by pulling it locally or checking the NVCR registry documentation for published tags.
Also applies to: 72-72
components/backends/vllm/deploy/disagg.yaml (1)
16-16: Verify vLLM runtime image availability.
The container image tags have been updated to 0.6.0 across all three components (Frontend, VllmDecodeWorker, and VllmPrefillWorker). Confirm that the image nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.6.0 exists and is accessible in the NVIDIA registry.
Verify the image is available by attempting to pull it or checking the NVCR registry for published tags. This should be validated before the PR is merged to ensure deployments can pull the images.
Also applies to: 28-28, 48-48
docs/observability/health-checks.md (1)
200-200: Verify documentation link target exists.
The Backend Guide link has been updated as part of the documentation reorganization. Ensure that the target file docs/development/backend-guide.md exists and contains the expected content.
The link text is descriptive and the relative path appears correct for the reorganized file structure.
docs/kubernetes/metrics.md (1)
68-68: Verify documentation link target exists.
The metrics guide link has been updated to reflect the new documentation structure. Ensure that the target file docs/observability/metrics.md exists (which we reviewed earlier in this PR).
The link text is descriptive and aligns with the documentation reorganization objectives.
docs/kubernetes/installation_guide.md (1)
199-199: Verify documentation link target exists.
The SLA Planner Quickstart Guide link has been updated to reflect the new planner directory structure. Ensure that the target file docs/planner/sla_planner_quickstart.md exists and contains the expected content.
The link text is descriptive and the relative path aligns with the documentation reorganization for the 0.6.0 release.
recipes/llama-3-70b/vllm/disagg-multi-node/perf.yaml (2)
43-43: The review comment is incorrect: hardcoded snapshot hashes are intentional for reproducibility.
The pattern of hardcoded tokenizer snapshot hashes appears consistently across multiple benchmarking jobs in both gpt-oss-120b and llama-3-70b recipe suites. Learnings confirm that hardcoded snapshot hashes in tokenizer paths are intentional for version pinning and reproducible results, not an issue to be fixed. This is established design in benchmarking manifests, not a brittleness concern requiring refactoring. No changes are needed.
Likely an incorrect or invalid review comment.
8-8: Manual verification required for container image availability.
While the NGC catalog hosts Dynamo vLLM Runtime, a pre-built, Docker-based environment designed to run NVIDIA Dynamo with the vLLM inference engine, the sandbox environment cannot directly access external registries to confirm that the specific tag 0.6.0 exists.
Verify the image before deployment by:
- Pulling directly: docker pull nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.6.0 (requires NGC credentials)
- Checking the NGC catalog web UI for available tags at the container repository
- Consulting release notes or documentation for this version's availability
docs/architecture/kv_cache_routing.md (1)
159-159: Documentation reorganization verified successfully.
The new router documentation path docs/router/README.md exists and is properly accessible. The old path docs/components/router/README.md has been removed as expected. The link updates at lines 159 and 515 correctly reference the new location.
components/backends/vllm/deploy/agg.yaml (1)
16-16: Consider removing the image availability check or reformulating it as a pre-deployment verification step.
The vLLM v0.6.0 release is a documented version that introduced significant optimizations. The NVIDIA registry (nvcr.io) requires authentication to verify manifest details via API, so a 401 response does not confirm or deny image existence.
If image pull failures occur during deployment, they will be caught at runtime. The image tag appears consistent with a legitimate release. Either verify access separately with proper credentials before deployment or remove this check; it cannot be conclusively resolved without authenticated registry access.
docs/observability/logging.md (1)
190-190: Link target verified successfully; no issues found.
The target file docs/development/backend-guide.md exists at the correct location. The link update from ../backend.md to ../development/backend-guide.md is correct and consistently applied across related documentation files (metrics.md, health-checks.md). No stale references to the old path remain.
recipes/llama-3-70b/vllm/disagg-multi-node/deploy.yaml (1)
21-21: Image references updated consistently to 0.6.0.
All three vLLM runtime container images are updated to the new NVIDIA registry with consistent tagging (nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.6.0). The updates align with the 0.6.0 release standardization.
Also applies to: 45-45, 74-74
components/backends/trtllm/deploy/agg.yaml (1)
16-16: trtllm-runtime images updated correctly to 0.6.0.
Both Frontend and TRTLLMWorker containers are correctly updated to nvcr.io/nvidia/ai-dynamo/trtllm-runtime:0.6.0.
Also applies to: 27-27
components/backends/sglang/deploy/agg_router.yaml (1)
16-16: sglang-runtime images updated correctly to 0.6.0.
Both Frontend and decode worker containers are correctly updated to nvcr.io/nvidia/ai-dynamo/sglang-runtime:0.6.0.
Also applies to: 30-30
components/backends/trtllm/deploy/disagg.yaml (1)
16-16: trtllm-runtime images updated consistently across all workers.
Frontend, prefill, and decode worker containers all correctly reference nvcr.io/nvidia/ai-dynamo/trtllm-runtime:0.6.0.
Also applies to: 28-28, 56-56
components/backends/trtllm/deploy/agg_router.yaml (1)
16-16: trtllm-runtime images updated correctly to 0.6.0.
Frontend and worker containers correctly reference nvcr.io/nvidia/ai-dynamo/trtllm-runtime:0.6.0.
Also applies to: 30-30
tests/planner/perf_test_configs/image_cache_daemonset.yaml (1)
23-23: Image cache DaemonSet updated to 0.6.0.
The vllm-runtime image reference for the image-caching DaemonSet is correctly updated to nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.6.0.
components/backends/sglang/deploy/agg_logging.yaml (1)
19-19: sglang-runtime images updated correctly to 0.6.0.
Both Frontend and decode worker containers are correctly updated to nvcr.io/nvidia/ai-dynamo/sglang-runtime:0.6.0 with consistent image references.
Also applies to: 30-30
components/backends/sglang/deploy/disagg-multinode.yaml (1)
25-25: Image tag updates look good.
All three container image references are consistently updated to nvcr.io/nvidia/ai-dynamo/sglang-runtime:0.6.0. The changes align with the PR objective of updating image tags for the 0.6.0 release.
Also applies to: 38-38, 75-75
components/backends/trtllm/deploy/disagg_planner.yaml (1)
19-19: Image tag updates are consistent.
All four container image references are updated to nvcr.io/nvidia/ai-dynamo/trtllm-runtime:0.6.0 with no other changes to pod spec configuration.
Also applies to: 47-47, 88-88, 117-117
tests/planner/perf_test_configs/disagg_8b_tp2.yaml (1)
41-41: Image tag updates are consistent across worker services.
All three container image references are updated to nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.6.0 with no changes to pod spec or service configuration.
Also applies to: 91-91, 141-141
components/backends/vllm/prometheus.md (1)
99-99: Documentation reference path update looks appropriate.
Line 99 updates the reference in the "See Also" section to point to the new reorganized path docs/observability/metrics.md. This aligns with the PR's documentation reorganization objectives.
components/backends/vllm/deploy/disagg_kvbm_tp2.yaml (1)
16-16: Image tag updates are consistent.
All three container image references are updated to nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.6.0 with no other changes to pod spec configuration.
Also applies to: 29-29, 67-67
components/backends/sglang/deploy/disagg.yaml (1)
16-16: Image tag updates are consistent.
All three container image references are updated to nvcr.io/nvidia/ai-dynamo/sglang-runtime:0.6.0 with no other changes to pod spec configuration.
Also applies to: 28-28, 64-64
recipes/llama-3-70b/vllm/disagg-single-node/deploy.yaml (1)
21-21: Image tag updates are consistent.
All three container image references are updated to nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.6.0 with no other changes to pod spec configuration.
Also applies to: 55-55, 94-94
tests/planner/perf_test_configs/disagg_8b_2p2d.yaml (1)
41-41: Image tag updates are consistent across worker services.
All three container image references are updated to nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.6.0 with no changes to pod spec or service configuration.
Also applies to: 91-91, 141-141
docs/backends/sglang/README.md (1)
107-108: Verify sglang version strategy.
The installation section has been updated to specify sglang[all]==0.5.3.post1 (a production/patch version) instead of an RC version. According to prior learnings, the ai-dynamo package requires the --prerelease=allow flag for RC versions of sglang (e.g., sglang[all]==0.5.0rc2).
Confirm that 0.5.3.post1 is the intended stable version and that it's compatible with the project's prerelease requirements. If RC versions are still needed, update the documentation to clarify the --prerelease=allow flag usage.
Based on learnings from PR #3290.
tests/planner/perf_test_configs/disagg_8b_3p1d.yaml (1)
41-41: Container image references updated consistently.
The vLLM runtime image references have been properly updated from placeholder tags to the NVIDIA AI-Dynamo registry with version 0.6.0 across all services (Frontend, VllmDecodeWorker, VllmPrefillWorker).
Also applies to: 91-91, 141-141
tests/planner/scaling/disagg_planner.yaml (1)
21-21: Container image registry and tag updated consistently.
All vLLM runtime container references have been updated to nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.6.0 across all services in the disagg_planner configuration.
Also applies to: 44-44, 81-81, 105-105
components/backends/vllm/deploy/agg_router.yaml (1)
16-16: Container images updated to NVIDIA registry 0.6.0 tag.
Frontend and VllmDecodeWorker images properly updated to nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.6.0.
Also applies to: 30-30
docs/planner/sla_planner.md (1)
4-4: Documentation links properly updated to new directory structure.
SLA Planner Quick Start Guide references have been updated from /docs/kubernetes/sla_planner_quickstart.md to /docs/planner/sla_planner_quickstart.md to reflect the documentation reorganization.
Also applies to: 132-132
components/backends/vllm/deploy/disagg_planner.yaml (1)
19-19: Container images updated to NVIDIA registry version 0.6.0 across all services.
vLLM runtime images properly updated to nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.6.0 in Frontend, Planner, and both worker services.
Also applies to: 29-29, 51-51, 71-71
tests/planner/perf_test_configs/disagg_8b_planner.yaml (1)
44-44: Container images consistently updated to NVIDIA AI-Dynamo 0.6.0 across all services.
All vLLM runtime container references updated to nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.6.0 in this comprehensive perf test configuration spanning Frontend, Planner, and worker services.
Also applies to: 77-77, 141-141, 198-198
components/backends/sglang/deploy/agg.yaml (1)
16-16: SGLang runtime images updated to NVIDIA registry 0.6.0 tag.
Frontend and decode worker images properly updated to nvcr.io/nvidia/ai-dynamo/sglang-runtime:0.6.0, maintaining differentiation from vLLM runtime images across the backend deployments.
Also applies to: 27-27
docs/backends/vllm/README.md (1)
41-43: Incomplete path migration in Feature Support Matrix.
Three documentation links are being updated to reflect the new directory structure (planner/, kvbm/), but the file contains other architecture documentation links that remain unchanged:
- Line 38: ../../../docs/architecture/disagg_serving.md
- Line 40: ../../../docs/architecture/kv_cache_routing.md
- Line 181: ../../../docs/architecture/kv_cache_routing.md
- Lines 185-186: ../../../docs/architecture/request_migration.md
Per the PR objectives, the documentation structure is being reorganized. If these files (disagg_serving, kv_cache_routing, request_migration) have been moved to new top-level directories as part of this reorganization, these links should also be updated for consistency. Otherwise, readers will encounter inconsistent navigation and potentially broken links.
Verify whether the following files have been moved in this PR:
- docs/architecture/disagg_serving.md
- docs/architecture/kv_cache_routing.md
- docs/architecture/request_migration.md
If they have been moved, provide their new paths so these links can be updated.
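A check like this can be automated. The following is a minimal sketch, assuming the list of files on the PR branch is already available (for example from git ls-files); the function name broken_relative_links and the sample file set in the usage note are hypothetical and not part of the repository.

```python
import posixpath
import re

# Captures the target of a Markdown link, stopping at ")", "#", or whitespace.
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)#\s]+)")


def broken_relative_links(source_path, markdown_text, existing_files):
    """Return (target, resolved_path) pairs for relative links that do not resolve.

    source_path and existing_files use repo-root-relative POSIX paths.
    """
    base = posixpath.dirname(source_path)
    broken = []
    for target in LINK_RE.findall(markdown_text):
        if "://" in target or target.startswith("/"):
            continue  # skip external URLs and site-absolute links
        resolved = posixpath.normpath(posixpath.join(base, target))
        if resolved not in existing_files:
            broken.append((target, resolved))
    return broken
```

Calling it with source_path "docs/backends/vllm/README.md", the README text, and the branch's file set would list any of the four architecture links whose targets were moved.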
recipes/llama-3-70b/vllm/agg/perf.yaml (1)
8-8: Reasonable retry increase for perf testing job.
Increasing backoffLimit from 1 to 3 provides better fault tolerance for the performance testing job, allowing transient failures to be retried. This change aligns with the broader perf.yaml updates across the PR.
docs/index.rst (1)
45-45: Toctree link targets verified; all paths are valid.
All 10 toctree links in the restructured index file have been verified and resolve correctly to their target files/directories across the new organizational structure (reference/, router/, planner/, kvbm/, observability/, performance/, development/). No broken links detected.
recipes/llama-3-70b/vllm/agg/deploy.yaml (1)
21-21: Verify 0.6.0 container image availability before merging.
The image tag is being updated from the placeholder my-tag to the specific release version 0.6.0 on lines 21 and 45. The changes appear intentional and aligned with the PR's 0.6.0 release objectives, but the image tag availability cannot be verified programmatically in this environment (the NGC API requires authentication).
Please confirm that nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.6.0 is available in the NGC registry before merging this PR.
    # deploy the optimized DGD with planner
    if args.deploy_after_profile and not args.dry_run:
        logger.info("Deploying the optimized DGD with planner...")
        # TODO: check conflicts for dynamo namespace and DGD name
        # TODO: handle deployment errors and propagate proper error messages to users
        client = DynamoDeploymentClient(
            namespace=args.namespace,
            base_log_dir=f"{args.output_dir}/final_deployment",
            model_name=model_name,
            service_name=args.service_name,
            frontend_port=frontend_port,
            deployment_name=config["metadata"]["name"],
        )
        await client.create_deployment(f"{args.output_dir}/config_with_planner.yaml")
Critical: Risk of undefined variables and missing deployment verification.
This post-profile deployment logic has several critical issues:
- Undefined variable risk: The variables model_name (line 754), frontend_port (line 756), and config (line 757) are defined inside the try block (lines 121, 171, 721). If an exception occurs before these variables are set, this code will fail with a NameError.
- No deployment verification: The deployment is created (line 759) but there's no await client.wait_for_deployment_ready() call. This means the function returns without confirming the deployment is functional.
- No error handling: There's no try-except block around the deployment creation. Deployment failures will either crash the script or leave broken deployments.
- No cleanup mechanism: Unlike the deployments in the try block, which are tracked in deployment_clients and cleaned up in the finally block, this deployment is never added to that list and has no cleanup path.
- Incomplete implementation: The TODO comments (lines 749-750) confirm that conflict checking and error handling are missing.
Consider this structure:
+    # deploy the optimized DGD with planner
+    if args.deploy_after_profile and not args.dry_run:
+        try:
+            logger.info("Deploying the optimized DGD with planner...")
+            client = DynamoDeploymentClient(
+                namespace=args.namespace,
+                base_log_dir=f"{args.output_dir}/final_deployment",
+                model_name=model_name,
+                service_name=args.service_name,
+                frontend_port=frontend_port,
+                deployment_name=config["metadata"]["name"],
+            )
+            await client.create_deployment(f"{args.output_dir}/config_with_planner.yaml")
+            logger.info("Waiting for final deployment to be ready...")
+            await client.wait_for_deployment_ready(timeout=1800)
+            logger.info("Final deployment is ready and operational")
+        except Exception as e:
+            logger.error(f"Failed to deploy optimized DGD with planner: {e}")
+            logger.info("Attempting to clean up failed deployment...")
+            try:
+                await client.delete_deployment()
+            except Exception as cleanup_error:
+                logger.warning(f"Failed to clean up deployment: {cleanup_error}")
+            raise
Additionally, move this block inside the try block (before line 736) or ensure all required variables are defined with fallback values before this block executes.
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In benchmarks/profiler/profile_sla.py around lines 746 to 760, the post-profile
deployment uses variables (model_name, frontend_port, config) that may be
undefined, lacks error handling, verification and cleanup registration; move
this deployment block inside the existing try block (or ensure those variables
have safe defaults set earlier), add a pre-deploy check for namespace/DGD name
conflicts, wrap client.create_deployment in a try/except to log and surface
errors, call await client.wait_for_deployment_ready() after creation to verify
readiness, and append the client to deployment_clients so it will be cleaned up
in the finally block.
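One way to picture the verify-then-clean-up flow is the sketch below. It is illustrative only: StubDeploymentClient is a hypothetical stand-in for DynamoDeploymentClient that mirrors just the three method names mentioned in this comment, and deleting only on failure (so a successful final deployment stays up) is an assumption about the intended behavior, not something the source confirms.

```python
import asyncio


class StubDeploymentClient:
    """Hypothetical stand-in for DynamoDeploymentClient; not the real API."""

    def __init__(self, fail_on_create=False):
        self.fail_on_create = fail_on_create
        self.ready = False
        self.deleted = False

    async def create_deployment(self, path):
        if self.fail_on_create:
            raise RuntimeError(f"create failed for {path}")

    async def wait_for_deployment_ready(self, timeout=1800):
        self.ready = True

    async def delete_deployment(self):
        self.deleted = True


async def deploy_final(client, config_path):
    """Create the deployment, wait for readiness, and delete it if either step fails.

    The client is built by the caller, outside the try block, so the
    except branch can never hit an unbound name.
    """
    try:
        await client.create_deployment(config_path)
        await client.wait_for_deployment_ready(timeout=1800)
    except Exception:
        await client.delete_deployment()
        raise
```

A caller would run it with asyncio.run(deploy_final(client, "config_with_planner.yaml")) and let the re-raised exception surface to the user.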
  extraPodSpec:
    mainContainer:
-     image: nvcr.io/nvidia/ai-dynamo/tensorrtllm-runtime:my-tag
+     image: nvcr.io/nvidia/ai-dynamo/tensorrtllm-runtime:0.6.0
🧩 Analysis chain
This file references nvcr.io/nvidia/ai-dynamo/tensorrtllm-runtime:0.6.0, but other TRTLLM deployment files (agg.yaml, disagg.yaml, agg_router.yaml) consistently use nvcr.io/nvidia/ai-dynamo/trtllm-runtime:0.6.0. This naming inconsistency could cause deployment failures if the tensorrtllm-runtime image doesn't exist or is outdated.
Verify that the image name is correct and exists in the registry. If this should match the other TRTLLM deployments, apply this diff:
- image: nvcr.io/nvidia/ai-dynamo/tensorrtllm-runtime:0.6.0
+ image: nvcr.io/nvidia/ai-dynamo/trtllm-runtime:0.6.0
Alternatively, confirm that tensorrtllm-runtime is the correct image name and verify it exists in the NVIDIA registry with the 0.6.0 tag. Run the following script to check:
Also applies to: 53-53
🏁 Script executed:
#!/bin/bash
# Verify image names exist in NVIDIA registry
echo "Checking trtllm-runtime..."
curl -s -H "Authorization: Bearer $(gh auth token)" https://api.github.com/repos/ai-dynamo/dynamo/issues \
-X POST -d '{"title":"Verify NVIDIA image: trtllm-runtime:0.6.0"}' 2>/dev/null || echo "Could not verify via API"
# Alternative: Check if both image variants are referenced elsewhere in codebase
echo "Searching for trtllm-runtime vs tensorrtllm-runtime in TRTLLM files..."
grep -r "tensorrtllm-runtime" components/backends/trtllm/ || echo "Only trtllm-runtime found (expected)"
grep -r "trtllm-runtime" components/backends/trtllm/ | head -5
Length of output: 3043
🏁 Script executed:
#!/bin/bash
# Find YAML files in the repository
echo "Searching for YAML deployment files..."
find . -name "*.yaml" -o -name "*.yml" | grep -E "(agg|deploy|trtllm)" | head -20
echo ""
echo "Searching for image references across all YAML files..."
grep -r "tensorrtllm-runtime\|trtllm-runtime" --include="*.yaml" --include="*.yml" . 2>/dev/null || echo "No matches found"
Length of output: 3634
Fix image name inconsistency in agg-with-config.yaml
The file uses tensorrtllm-runtime while all other TRTLLM deployment files use trtllm-runtime. Update lines 37 and 53:
- image: nvcr.io/nvidia/ai-dynamo/tensorrtllm-runtime:0.6.0
+ image: nvcr.io/nvidia/ai-dynamo/trtllm-runtime:0.6.0
This inconsistency can cause deployment failures if no tensorrtllm-runtime image is published under that name. All related TRTLLM deployment files (agg.yaml, agg_router.yaml, disagg.yaml, disagg_planner.yaml, disagg_router.yaml, disagg-multinode.yaml) consistently use trtllm-runtime:0.6.0.
📝 Committable suggestion
- image: nvcr.io/nvidia/ai-dynamo/tensorrtllm-runtime:0.6.0
+ image: nvcr.io/nvidia/ai-dynamo/trtllm-runtime:0.6.0
🤖 Prompt for AI Agents
In components/backends/trtllm/deploy/agg-with-config.yaml around lines 37 and
53, the container image name is mistakenly set to
nvcr.io/nvidia/ai-dynamo/tensorrtllm-runtime:0.6.0 while other TRTLLM deployment
files use nvcr.io/nvidia/ai-dynamo/trtllm-runtime:0.6.0; update both occurrences
to nvcr.io/nvidia/ai-dynamo/trtllm-runtime:0.6.0 to match the rest of the
deployments and avoid deployment failures.
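A naming drift like this can be caught before review with a small consistency scan. The sketch below is an assumption-laden illustration: the helper names image_repos and repo_outliers are hypothetical, it treats the most common repository name as the expected one, and a majority vote cannot prove which name is actually published in the registry.

```python
import re
from collections import Counter

# Matches "image: <ref>" lines in YAML text, with optional indentation or list dash.
IMAGE_RE = re.compile(r"^\s*-?\s*image:\s*(\S+)", re.MULTILINE)


def image_repos(yaml_text):
    """Return the repository part (everything before the final ':tag') of each image ref."""
    repos = []
    for ref in IMAGE_RE.findall(yaml_text):
        repo, sep, _tag = ref.rpartition(":")
        repos.append(repo if sep else ref)
    return repos


def repo_outliers(files):
    """Map each file path to the repositories it uses that differ from the majority.

    files maps a path to that file's YAML text.
    """
    counts = Counter(r for text in files.values() for r in image_repos(text))
    if not counts:
        return {}
    majority = counts.most_common(1)[0][0]
    result = {}
    for path, text in files.items():
        odd = sorted({r for r in image_repos(text) if r != majority})
        if odd:
            result[path] = odd
    return result
```

Fed the contents of the files under components/backends/trtllm/deploy/, it would report agg-with-config.yaml as the only file using a different repository name.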
  For the complete and authoritative list of all vLLM metrics, always refer to the official documentation linked above.

- Dynamo runtime metrics are documented in [docs/guides/metrics.md](../../../docs/guides/metrics.md).
+ Dynamo runtime metrics are documented in [docs/observability/metrics.md](../../../docs/observability/metrics.md).
🧩 Analysis chain
Verify the relative path depth for the documentation link.
Line 13 uses the relative path ../../../docs/observability/metrics.md to link to the metrics documentation. The file lives at components/backends/vllm/prometheus.md and the target is under the repo root's docs/ directory, so confirm that the number of ../ segments matches the file's directory depth.
🏁 Script executed:
#!/bin/bash
# Description: Verify the relative path from prometheus.md to the metrics documentation
# Check the actual directory structure and confirm the correct relative path
echo "=== Checking directory structure ==="
echo "prometheus.md is at: components/backends/vllm/prometheus.md"
# Count path depth
file_depth=$(echo "components/backends/vllm/prometheus.md" | tr '/' '\n' | wc -l)
target_depth=$(echo "docs/observability/metrics.md" | tr '/' '\n' | wc -l)
echo "File is at depth: $file_depth"
echo "Target is at depth: $target_depth"
# Check if target file exists
if [ -f "docs/observability/metrics.md" ]; then
echo "✓ Target file exists: docs/observability/metrics.md"
else
echo "✗ Target file NOT found: docs/observability/metrics.md"
fi
# Test what the relative path should be
echo ""
echo "=== Relative path analysis ==="
echo "From: components/backends/vllm/"
echo "To: docs/observability/metrics.md"
echo "Correct relative path should be: ../../../../docs/observability/metrics.md"
Length of output: 363
🏁 Script executed:
#!/bin/bash
# Verify the actual content of line 13 in the file
echo "=== Current content of line 13 ==="
sed -n '13p' components/backends/vllm/prometheus.md
echo ""
echo "=== Full context around line 13 ==="
sed -n '10,16p' components/backends/vllm/prometheus.md
echo ""
echo "=== Checking for other path references ==="
grep -n "docs/observability/metrics" components/backends/vllm/prometheus.md
Length of output: 722
The relative path in the markdown link on line 13 is already correct; no change is needed.
The link uses ../../../docs/observability/metrics.md. From components/backends/vllm/prometheus.md, three levels (../../../) reach the repo root: vllm/ to backends/, backends/ to components/, and components/ to the root. A fourth ../ would point above the repository. The analysis script above counts the file name as a path component and hard-codes the four-level path in its last echo rather than computing it, so its output does not support a change. The target file docs/observability/metrics.md exists.
🤖 Prompt for AI Agents
In components/backends/vllm/prometheus.md around line 13, keep the markdown link
to the metrics docs as ../../../docs/observability/metrics.md; three parent
directories already resolve to docs/observability/metrics.md from this file's
location, so do not add a fourth.
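The depth can be computed rather than counted by hand. A minimal sketch using only the standard library follows; relative_link is a hypothetical helper name, and both arguments are repo-root-relative paths.

```python
import posixpath


def relative_link(from_file, to_file):
    """Return the relative path from the directory containing from_file to to_file."""
    return posixpath.relpath(to_file, start=posixpath.dirname(from_file))


link = relative_link(
    "components/backends/vllm/prometheus.md",
    "docs/observability/metrics.md",
)
print(link)  # ../../../docs/observability/metrics.md
```

Computed this way the result has three ../ segments, because the file sits three directories below the repo root (components/, backends/, vllm/); the file name itself does not add a level.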
## 1. DynamoGraphDeployment Definition and Structure

### Core Definition
**File:** `/Users/anishmaddipoti/Desktop/repos/dynamo/deploy/cloud/operator/api/v1alpha1/dynamographdeployment_types.go` (lines 30-69)
Replace absolute local file paths with repository-relative paths.
The documentation contains absolute paths to local files (e.g., /Users/anishmaddipoti/Desktop/repos/dynamo/...). These should be replaced with repository-relative paths for portability.
Apply this pattern throughout the file:
-**File:** `/Users/anishmaddipoti/Desktop/repos/dynamo/deploy/cloud/operator/api/v1alpha1/dynamographdeployment_types.go` (lines 30-69)
+**File:** `deploy/cloud/operator/api/v1alpha1/dynamographdeployment_types.go` (lines 30-69)
Also applies to: 268-268, 431-440
🤖 Prompt for AI Agents
In DGD_ARCHITECTURE_ANALYSIS.md around line 6 (and also at lines 268 and
431-440), replace any absolute local file paths like
/Users/anishmaddipoti/Desktop/repos/dynamo/deploy/... with repository-relative
paths (e.g., deploy/cloud/operator/api/v1alpha1/dynamographdeployment_types.go);
scan the file for all occurrences of /Users/... and substitute them with the
repo-root-relative path, keep path separators intact, update any inline examples
to the same repository-relative format, and verify there are no remaining
absolute paths anywhere in the document.
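The substitution is mechanical enough to script. This is a minimal sketch under the assumption that every absolute path shares one local checkout prefix; the helper names to_repo_relative and remaining_absolute_paths are hypothetical.

```python
import re


def to_repo_relative(text, local_root):
    """Strip a local checkout prefix so paths become repo-root-relative."""
    prefix = local_root.rstrip("/") + "/"
    return text.replace(prefix, "")


def remaining_absolute_paths(text):
    """Return any leftover home-directory style absolute paths."""
    return re.findall(r"/(?:Users|home)/\S+", text)


line = (
    "**File:** `/Users/anishmaddipoti/Desktop/repos/dynamo/deploy/cloud/"
    "operator/api/v1alpha1/dynamographdeployment_types.go` (lines 30-69)"
)
fixed = to_repo_relative(line, "/Users/anishmaddipoti/Desktop/repos/dynamo")
```

Running remaining_absolute_paths over the rewritten document afterwards gives a quick check that no /Users/... paths were missed.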
  :class: seealso

- **New to SLA Planner?** Start with the [SLA Planner Quick Start Guide](/docs/kubernetes/sla_planner_quickstart.md) for a complete, step-by-step workflow.
+ **New to SLA Planner?** Start with the [SLA Planner Quick Start Guide](/docs/planner/sla_planner_quickstart.md) for a complete, step-by-step workflow.
🧩 Analysis chain
Verify the absolute path in the Markdown link within RST.
The link uses an absolute path /docs/planner/sla_planner_quickstart.md inside a Markdown-formatted link within an RST file. RST may not correctly resolve absolute paths that start with /docs/.
Run the following script to check if this path exists and to identify other similar link patterns:
🏁 Script executed:
#!/bin/bash
# Check if the target file exists
if [ -f "docs/planner/sla_planner_quickstart.md" ]; then
echo "✓ Target file exists: docs/planner/sla_planner_quickstart.md"
else
echo "✗ Target file NOT found: docs/planner/sla_planner_quickstart.md"
fi
# Search for other absolute path links in RST files
echo ""
echo "Other absolute /docs/ paths in RST files:"
rg -t rst '\[.*\]\(/docs/' --color never -n
Length of output: 291
Replace the Markdown-style link with a proper Sphinx cross-reference.
The link uses Markdown syntax [text](/docs/...) within an RST file, which bypasses Sphinx's path resolution system. In Sphinx, absolute paths starting with / are relative to the top source directory, but only when using Sphinx roles like :doc: or :download:.
Change line 32 from:
[SLA Planner Quick Start Guide](/docs/planner/sla_planner_quickstart.md)
To:
:doc:`SLA Planner Quick Start Guide </planner/sla_planner_quickstart>`
Note: Remove the .md extension and use the :doc: role for proper Sphinx resolution.
🤖 Prompt for AI Agents
docs/planner/planner_intro.rst around line 32: the file contains a
Markdown-style absolute link that bypasses Sphinx path resolution; replace the
Markdown link [SLA Planner Quick Start
Guide](/docs/planner/sla_planner_quickstart.md) with a Sphinx cross-reference
using the :doc: role, removing the .md and the leading /docs path, e.g. use
:doc:`SLA Planner Quick Start Guide </planner/sla_planner_quickstart>` so Sphinx
can resolve the document correctly.
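If several RST files carry the same pattern, the rewrite could be applied with a regex. The sketch below assumes the Sphinx source directory is docs/ (so /docs/planner/x.md maps to /planner/x) and only handles Markdown links that start with /docs/ and end in .md; md_link_to_doc_role is a hypothetical helper name.

```python
import re

# [text](/docs/path/to/page.md) -> :doc:`text </path/to/page>`
MD_DOC_LINK_RE = re.compile(r"\[([^\]]+)\]\(/docs/([^)\s]+?)\.md\)")


def md_link_to_doc_role(text):
    """Rewrite Markdown-style /docs/ links into Sphinx :doc: cross-references."""
    return MD_DOC_LINK_RE.sub(r":doc:`\1 </\2>`", text)


before = "[SLA Planner Quick Start Guide](/docs/planner/sla_planner_quickstart.md)"
after = md_link_to_doc_role(before)
```

Links that point outside /docs/ or carry an anchor fragment are left untouched and would need a manual pass.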
| - name: perf | ||
| image: nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.6.0 | ||
| workingDir: /workspace/components/backends/vllm | ||
| command: | ||
| - /bin/sh | ||
| - -c | ||
| - | | ||
| apt-get update && apt-get install -y curl jq procps git && apt-get clean | ||
| pip install git+https://github.com/ai-dynamo/aiperf.git@70af59489df24a601dba57604a7341966150b366; | ||
| echo "aiperf installation completed"; | ||
| sysctl -w net.ipv4.ip_local_port_range="1024 65000" | ||
| cat /proc/sys/net/ipv4/ip_local_port_range | ||
| # wait for the model to be ready | ||
| export ENDPOINT=llama3-70b-agg-0:8000 | ||
| export TARGET_MODEL=RedHatAI/Llama-3.3-70B-Instruct-FP8-dynamic | ||
| export INTERVAL=5 | ||
| echo "Waiting for model '$TARGET_MODEL' at $ENDPOINT/v1/models (checking every ${INTERVAL}s)..." | ||
| while ! curl -s "http://$ENDPOINT/v1/models" | jq -e --arg model "$TARGET_MODEL" '.data[]? | select(.id == $model)' >/dev/null 2>&1; do | ||
| echo "[$(date '+%H:%M:%S')] Model not ready yet, waiting ${INTERVAL}s..." | ||
| sleep $INTERVAL | ||
| done | ||
| echo "✅ Model '$TARGET_MODEL' is now available!" | ||
| curl -s "http://$ENDPOINT/v1/models" | jq . | ||
| # now run the benchmark | ||
| export ARTIFACT_DIR="/tmp/genai" | ||
| mkdir -p "$ARTIFACT_DIR" | ||
| echo "Running benchmark..." | ||
| export COLUMNS=200 | ||
| EPOCH=$(date +%s) | ||
| ## utility functions -- can be moved to a bash script / configmap | ||
| wait_for_model_ready() { | ||
| echo "Waiting for model '$TARGET_MODEL' at $ENDPOINT/v1/models (checking every 5s)..." | ||
| while ! curl -s "http://$ENDPOINT/v1/models" | jq -e --arg model "$TARGET_MODEL" '.data[]? | select(.id == $model)' >/dev/null 2>&1; do | ||
| echo "[$(date '+%H:%M:%S')] Model not ready yet, sleeping 5s before checking again http://$ENDPOINT/v1/models" | ||
| sleep 5 | ||
| done | ||
| echo "✅ Model '$TARGET_MODEL' is now available!" | ||
| echo "Model '$TARGET_MODEL' is now available!" | ||
| curl -s "http://$ENDPOINT/v1/models" | jq . | ||
| } | ||
| run_perf() { | ||
| local concurrency=$1 | ||
| local isl=$2 | ||
| local osl=$3 | ||
| local max_threads=${concurrency} | ||
| key=concurrency_${concurrency} | ||
| export ARTIFACT_DIR="${ROOT_ARTIFACT_DIR}/${EPOCH}_${JOB_NAME}/${key}" | ||
| mkdir -p "$ARTIFACT_DIR" | ||
| echo "ARTIFACT_DIR: $ARTIFACT_DIR" | ||
| aiperf profile --artifact-dir $ARTIFACT_DIR \ | ||
| --model $TARGET_MODEL \ | ||
| --tokenizer /root/.cache/huggingface/hub/models--RedHatAI--Llama-3.3-70B-Instruct-FP8-dynamic/snapshots/ddb4128556dfcff99e0c41aee159ea6c3e655dcd \ | ||
| --endpoint-type chat --endpoint /v1/chat/completions \ | ||
| --streaming \ | ||
| --url http://$ENDPOINT \ | ||
| --synthetic-input-tokens-mean $isl \ | ||
| --synthetic-input-tokens-stddev 0 \ | ||
| --output-tokens-mean $osl \ | ||
| --output-tokens-stddev 0 \ | ||
| --extra-inputs max_tokens:$osl \ | ||
| --extra-inputs min_tokens:$osl \ | ||
| --extra-inputs ignore_eos:true \ | ||
| --extra-inputs repetition_penalty:1.0 \ | ||
| --extra-inputs temperature:0.0 \ | ||
| --extra-inputs "{\"nvext\":{\"ignore_eos\":true}}" \ | ||
| --concurrency $concurrency \ | ||
| --request-count $((10*concurrency)) \ | ||
| --warmup-request-count $concurrency \ | ||
| --conversation-num 12800 \ | ||
| --random-seed 100 \ | ||
| --workers-max $max_threads \ | ||
| -H 'Authorization: Bearer NOT USED' \ | ||
| -H 'Accept: text/event-stream'\ | ||
| --record-processors 32 \ | ||
| --ui simple | ||
| echo "ARTIFACT_DIR: $ARTIFACT_DIR" | ||
| ls -la $ARTIFACT_DIR | ||
| } | ||
| #### Actual execution #### | ||
| wait_for_model_ready | ||
| mkdir -p "${ROOT_ARTIFACT_DIR}/${EPOCH}_${JOB_NAME}" | ||
| # Calculate total concurrency based on per-GPU concurrency and GPU count | ||
| TOTAL_CONCURRENCY=$((CONCURRENCY_PER_GPU * DEPLOYMENT_GPU_COUNT)) | ||
| echo "Calculated total concurrency: $TOTAL_CONCURRENCY (${CONCURRENCY_PER_GPU} per GPU × ${DEPLOYMENT_GPU_COUNT} GPUs)" | ||
| # Write input_config.json | ||
| cat > "${ROOT_ARTIFACT_DIR}/${EPOCH}_${JOB_NAME}/input_config.json" <<EOF | ||
| { | ||
| "gpu_count": $DEPLOYMENT_GPU_COUNT, | ||
| "max_threads": $max_threads, | ||
| "concurrency_per_gpu": $CONCURRENCY_PER_GPU, | ||
| "total_concurrency": $TOTAL_CONCURRENCY, | ||
| "mode": "$DEPLOYMENT_MODE", | ||
| "isl": $ISL, | ||
| "osl": $OSL, | ||
| "endpoint": "$ENDPOINT", | ||
| "model endpoint": "$TARGET_MODEL" | ||
| } | ||
| EOF | ||
| # Run perf with calculated total concurrency | ||
| run_perf $TOTAL_CONCURRENCY $ISL $OSL | ||
| echo "done with concurrency $TOTAL_CONCURRENCY" | ||
| env: | ||
| - name: TARGET_MODEL | ||
| value: RedHatAI/Llama-3.3-70B-Instruct-FP8-dynamic | ||
| - name: ENDPOINT | ||
| value: llama3-70b-agg-frontend:8000 | ||
| - name: CONCURRENCY_PER_GPU | ||
| value: "16" | ||
| - name: DEPLOYMENT_GPU_COUNT | ||
| value: "4" | ||
| - name: ISL | ||
| value: "8192" | ||
| - name: OSL | ||
| value: "1024" | ||
| - name: DEPLOYMENT_MODE | ||
| value: agg | ||
| - name: AIPERF_HTTP_CONNECTION_LIMIT | ||
| value: "200" | ||
| - name: JOB_NAME | ||
| valueFrom: | ||
| fieldRef: | ||
| apiVersion: v1 | ||
| fieldPath: metadata.labels['job-name'] | ||
| - name: ROOT_ARTIFACT_DIR | ||
| value: /root/.cache/huggingface/perf | ||
| - name: HF_HOME | ||
| value: /root/.cache/huggingface | ||
| - name: PYTHONUNBUFFERED | ||
| value: "1" | ||
| image: python:3.12-slim | ||
| imagePullPolicy: IfNotPresent | ||
| name: perf | ||
| securityContext: | ||
| privileged: true | ||
| aiperf profile \ | ||
| --model "$TARGET_MODEL" \ | ||
| --tokenizer ~/.cache/huggingface/hub/models--RedHatAI--Llama-3.3-70B-Instruct-FP8-dynamic/snapshots/ddb4128556dfcff99e0c41aee159ea6c3e655dcd \ | ||
| --endpoint-type chat --url "$ENDPOINT" --streaming \ | ||
| --concurrency 64 \ | ||
| --warmup-request-count 2 \ | ||
| --request-count 320 \ | ||
| --extra-inputs max_tokens:1024 \ | ||
| --synthetic-input-tokens-mean 8192 \ | ||
| --synthetic-input-tokens-stddev 0 \ | ||
| --output-tokens-mean 1024 \ | ||
| --output-tokens-stddev 0 \ | ||
| --extra-inputs min_tokens:1024 \ | ||
| --extra-inputs ignore_eos:true \ | ||
| --extra-inputs "{\"nvext\":{\"ignore_eos\":true}}" \ | ||
| --random-seed 1418186270 \ | ||
| --artifact-dir $ARTIFACT_DIR \ | ||
| --num-dataset-entries=3000 -- \ | ||
| --max-threads 64 | ||
| echo "----------------json----------------" | ||
| PERF_JSON=$(find $ARTIFACT_DIR -name profile_export_aiperf.json) | ||
| cat $PERF_JSON | jq . | ||
| echo "----------------csv-----------------" | ||
| PERF_CSV=$(find $ARTIFACT_DIR -name profile_export_aiperf.csv) | ||
| cat $PERF_CSV | ||
| echo "Benchmark completed successfully!" |
Add timeout safeguard to model readiness polling loop.
The new perf container workflow includes a valuable model readiness check (lines 25-35) that prevents benchmarking from starting before the model is available. However, the polling loop lacks an explicit timeout or maximum retry limit:
while ! curl -s "http://$ENDPOINT/v1/models" | jq -e --arg model "$TARGET_MODEL" '.data[]? | select(.id == $model)' >/dev/null 2>&1; do
echo "[$(date '+%H:%M:%S')] Model not ready yet, waiting ${INTERVAL}s..."
sleep $INTERVAL
done
If the model never becomes available (due to deployment failures, misconfiguration, etc.), this loop will retry indefinitely, causing the job to hang and eventually hit Kubernetes pod termination limits.
Recommended fix: Add an explicit timeout mechanism:
+MAX_RETRIES=120 # 10 minutes with 5-second intervals
+RETRY_COUNT=0
while ! curl -s "http://$ENDPOINT/v1/models" | jq -e --arg model "$TARGET_MODEL" '.data[]? | select(.id == $model)' >/dev/null 2>&1; do
echo "[$(date '+%H:%M:%S')] Model not ready yet, waiting ${INTERVAL}s..."
+ RETRY_COUNT=$((RETRY_COUNT + 1))
+ if [ $RETRY_COUNT -ge $MAX_RETRIES ]; then
+ echo "❌ Timeout: Model did not become available within $((MAX_RETRIES * INTERVAL)) seconds"
+ exit 1
+ fi
sleep $INTERVAL
done
Additionally, consider validating that required CLI tools (curl, jq, aiperf) are available before the wait loop begins to fail fast on configuration issues.
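The tool validation suggested above could look roughly like the following. This is a sketch under the assumption of a POSIX shell; `require_tools` is a hypothetical helper name, not something defined in the recipe.

```shell
#!/bin/sh
# require_tools CMD...: report every missing command on stderr and return 1
# if any are absent. Hypothetical helper for illustration only.
require_tools() {
    missing=""
    for tool in "$@"; do
        # command -v succeeds only when the shell can resolve the name.
        if ! command -v "$tool" >/dev/null 2>&1; then
            missing="$missing $tool"
        fi
    done
    if [ -n "$missing" ]; then
        echo "Missing required tools:$missing" >&2
        return 1
    fi
    return 0
}

# Possible use in the job script, before the readiness loop:
# require_tools curl jq aiperf || exit 1
```

Collecting all missing names before failing means a single run reports every gap, rather than one tool per attempt.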
🤖 Prompt for AI Agents
In recipes/llama-3-70b/vllm/agg/perf.yaml around lines 18 to 66, the readiness
polling loop has no timeout and can hang indefinitely; add a timeout/retry
mechanism (e.g. READINESS_TIMEOUT_SECONDS or MAX_RETRIES computed from INTERVAL)
and break the loop with a clear error log and non‑zero exit when the timeout is
reached, and before entering the loop validate required tools (curl, jq, aiperf)
are present and exit fast with an explanatory message if any are missing. Ensure
the timeout variables are configurable via environment variables, use a counter
or timestamp to track elapsed time, and return a non‑zero exit code when giving
up so Kubernetes can fail the job cleanly.
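A timestamp-based variant of the timeout, as the prompt above allows, might be sketched as follows. The helper names `wait_until` and `model_ready` and the variable `READINESS_TIMEOUT_SECONDS` are assumptions for illustration; only `ENDPOINT`, `TARGET_MODEL`, and `INTERVAL` come from the recipe.

```shell
#!/bin/sh
# wait_until TIMEOUT INTERVAL CMD...: rerun CMD until it succeeds, or return 1
# once TIMEOUT seconds have elapsed. Hypothetical helper for illustration only.
wait_until() {
    timeout=$1
    interval=$2
    shift 2
    deadline=$(( $(date +%s) + timeout ))
    until "$@"; do
        if [ "$(date +%s)" -ge "$deadline" ]; then
            echo "Timeout: condition not met within ${timeout}s" >&2
            return 1
        fi
        sleep "$interval"
    done
    return 0
}

# Same readiness probe as the recipe, wrapped in a function.
model_ready() {
    curl -s "http://$ENDPOINT/v1/models" \
        | jq -e --arg model "$TARGET_MODEL" '.data[]? | select(.id == $model)' \
        >/dev/null 2>&1
}

# Possible use (READINESS_TIMEOUT_SECONDS is an assumed variable name):
# wait_until "${READINESS_TIMEOUT_SECONDS:-600}" "${INTERVAL:-5}" model_ready || exit 1
```

Tracking a deadline rather than a retry count keeps the limit accurate even when individual `curl` calls are slow, and the non-zero return lets Kubernetes mark the job as failed.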
docs: reorganize documentation structure for 0.6.0 release
Overview
This PR reorganizes the documentation structure to improve navigation and logical grouping of content for the 0.6.0 release. The reorganization creates clearer separation between API documentation, developer guides, user guides, and reference materials.
Note: This PR has 11 known merge conflicts with main that require review (detailed below).
Details
Major Structural Changes
1. Directory Reorganization:
- docs/API/ → docs/api/ (lowercase for consistency)
- docs/development/ for developer-focused guides (backend-guide.md, runtime-guide.md)
- docs/observability/ for monitoring, logging, and health checks
- docs/performance/ for performance tuning guides
- docs/reference/ for CLI reference, glossary, and support matrix
- docs/architecture/kvbm_* → docs/kvbm/ (top-level KV Block Manager docs)
- docs/architecture/planner_* → docs/planner/ (top-level Planner docs)
2. File Movements:
- docs/guides/backend.md → docs/development/backend-guide.md
- docs/runtime/README.md → docs/development/runtime-guide.md
- docs/guides/health_check.md → docs/observability/health-checks.md
- docs/guides/logging.md → docs/observability/logging.md
- docs/guides/metrics.md → docs/observability/metrics.md
- docs/guides/disagg_perf_tuning.md → docs/performance/tuning.md
- docs/guides/dynamo_run.md → docs/reference/cli.md
- docs/dynamo_glossary.md → docs/reference/glossary.md
- docs/support_matrix.md → docs/reference/support-matrix.md
- docs/guides/tool_calling.md → docs/guides/tool-calling.md (renamed for consistency)
- docs/architecture/run_kvbm_in_*.md → docs/kvbm/*-setup.md
3. Updated References:
- docs/index.rst
- docs/hidden_toctree.rst
4. Version Updates:
- my-tag to 0.6.0 in deployment YAML files
New Documentation Structure
Where should the reviewer start?
1. Review Merge Conflicts (11 files)
These conflicts need careful review as main has added new KVBM documentation while this branch reorganizes the structure:
Documentation conflicts (9 files):
- docs/backends/sglang/prometheus.md - Content conflict
- docs/backends/trtllm/README.md - Content conflict
- docs/backends/vllm/prometheus.md - Content conflict
- docs/index.rst - Content conflict (structure vs new content)
- docs/kvbm/kvbm_components.md - Rename conflict: renamed to kvbm_components.md here but kvbm_design_deepdive.md in main
- docs/kvbm/trtllm-setup.md - Content conflict
- docs/kvbm/vllm-setup.md - Content conflict
Recipe conflicts (3 files):
- recipes/llama-3-70b/vllm/agg/perf.yaml
- recipes/llama-3-70b/vllm/disagg-multi-node/perf.yaml
- recipes/llama-3-70b/vllm/disagg-single-node/perf.yaml
2. Key Files to Review
- docs/index.rst - Main table of contents with updated paths
- docs/hidden_toctree.rst - Updated references to moved files
- docs/backends/*/README.md - Updated feature matrix links
- docs/
3. Verification
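As a sketch of one possible check (not taken from the PR itself), the old paths from the File Movements list can be searched for under docs/ to spot references that were not updated. The helper name `find_stale_refs` is hypothetical, and `grep -r` is assumed to be available, as it is in GNU, BSD, and BusyBox grep.

```shell
#!/bin/sh
# find_stale_refs DIR OLD_PATH...: print every line under DIR that still
# mentions one of the old paths. Returns 0 when nothing is found, 1 otherwise.
# Hypothetical helper for illustration only.
find_stale_refs() {
    dir=$1
    shift
    status=0
    for old in "$@"; do
        # -F matches the path as a literal string, not a regular expression.
        if grep -rnF -- "$old" "$dir"; then
            status=1
        fi
    done
    return $status
}

# Possible use with a few old paths from the File Movements list:
# find_stale_refs docs guides/health_check.md guides/logging.md dynamo_glossary.md
```

A non-zero return makes the function usable as a gate in a CI step, while the printed `file:line:text` matches show where each stale link lives.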
Related Issues
Checklist
docs:)
Summary by CodeRabbit
Documentation
Chores