@biswapanda (Contributor) commented Aug 5, 2025

Overview:

Cherry-pick for #2309

Fix broken documentation links for framework deployment.

fixes nvbug: https://nvbugspro.nvidia.com/bug/5424387
closes linear: dyn-819

Details:

Where should the reviewer start?

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

  • closes GitHub issue: #xxx

Summary by CodeRabbit

  • New Features

    • Added comprehensive Kubernetes deployment guides and configuration templates for vLLM, SGLang, and TensorRT-LLM backends, including new deployment patterns and troubleshooting sections.
    • Introduced dynamic port allocation and reservation utilities for distributed deployments.
    • Added support for Grove termination delay configuration in Kubernetes operator, replacing the previous enable/disable flag.
  • Bug Fixes

    • Corrected and clarified documentation links, installation instructions, and command-line examples across multiple READMEs and guides.
    • Fixed configuration and script issues for container images, improving dependency management and health check tooling.
  • Documentation

    • Expanded and reorganized deployment documentation, adding backend-specific instructions and removing outdated or duplicate files.
    • Updated usage, configuration, and deployment instructions for improved clarity and accuracy.
  • Refactor

    • Restructured configuration files for CUDA graph and cache settings, consolidating related parameters for clarity.
    • Updated operator and controller logic to support new Grove configuration structure and dynamic detection.
  • Chores

    • Updated dependency versions and default build references for improved compatibility and stability.
    • Removed deprecated or duplicate documentation files.

@biswapanda biswapanda changed the base branch from main to release/0.4.0 August 5, 2025 18:46
@biswapanda biswapanda enabled auto-merge (squash) August 5, 2025 18:51
@coderabbitai (bot, Contributor) commented Aug 5, 2025

Caution

Review failed

Failed to post review comments.

Walkthrough

This update introduces extensive documentation and configuration changes across the codebase, focusing on Kubernetes deployment guides, backend-specific deployment patterns, and configuration refactoring. Notable code changes include a major refactor of port allocation logic for vLLM backends, Grove feature configuration in the operator, and updates to CUDA graph and cache settings in TRTLLM engine configs. Multiple Dockerfiles, Helm charts, and deployment YAMLs are updated for consistency, dependency pinning, and improved health checks.

Changes

| Cohort | File(s) | Change Summary |
|---|---|---|
| Documentation Restructuring & Additions | README.md, examples/README.md, docs/examples/README.md, docs/guides/dynamo_deploy/README.md, components/backends/sglang/deploy/README.md, components/backends/vllm/deploy/README.md, components/backends/trtllm/deploy/README.md, components/backends/sglang/docs/*, components/backends/trtllm/README.md, components/backends/vllm/README.md, components/backends/sglang/README.md, components/backends/llama_cpp/README.md, components/backends/sglang/slurm_jobs/README.md, components/README.md, benchmarks/llm/README.md, deploy/inference-gateway/README.md, deploy/metrics/README.md, deploy/cloud/README.md, docs/runtime/README.md, docs/guides/dynamo_deploy/quickstart.md, docs/guides/dynamo_run.md, docs/API/nixl_connect/connector.md, docs/architecture/dynamo_flow.md, examples/basics/disaggregated_serving/README.md, examples/basics/multinode/README.md, examples/basics/quickstart/README.md, examples/deployments/EKS/Deploy_VLLM_example.md | Major documentation restructuring: expanded and reorganized deployment guides, backend-specific deployment instructions, updated links, improved usage examples, and new/deleted README files for various components and deployment patterns. |
| Kubernetes Deployment YAMLs | components/backends/sglang/deploy/disagg.yaml, components/backends/trtllm/deploy/agg.yaml, components/backends/trtllm/deploy/agg_router.yaml, components/backends/trtllm/deploy/disagg.yaml, components/backends/trtllm/deploy/disagg_router.yaml | Added or updated Kubernetes CRDs for SGLang and TRTLLM backends, specifying new deployment patterns, resource allocations, health checks, and updated worker launch commands. |
| TRTLLM Engine Config Refactoring | components/backends/trtllm/engine_configs/* | Consolidated CUDA graph and cache config keys into nested structures, added explicit dtype specifications, and introduced cache transceiver configs across multiple engine YAMLs. |
| vLLM Port Allocation Refactor | components/backends/vllm/src/dynamo/vllm/args.py, components/backends/vllm/src/dynamo/vllm/ports.py | Refactored port allocation logic: replaced internal allocation with new modular ports.py utilities, introduced explicit port range config, atomic ETCD-based port reservation, and improved error handling. |
| Operator & Grove Feature Refactor | deploy/cloud/operator/cmd/main.go, deploy/cloud/operator/internal/controller_common/predicate.go, deploy/cloud/operator/internal/controller/dynamographdeployment_controller.go, deploy/cloud/operator/internal/dynamo/graph.go, deploy/cloud/operator/internal/dynamo/graph_test.go, deploy/cloud/operator/internal/consts/consts.go, deploy/cloud/helm/platform/components/operator/templates/deployment.yaml, deploy/cloud/helm/platform/components/operator/templates/manager-rbac.yaml, deploy/cloud/helm/platform/components/operator/values.yaml, deploy/cloud/helm/platform/values.yaml, deploy/cloud/helm/dynamo-platform-values.yaml, deploy/cloud/helm/deploy.sh | Replaced static Grove enable flag with dynamic detection and configurable termination delay; updated operator config, controller logic, Helm charts, and test cases accordingly. |
| Dockerfiles & Build Scripts | container/Dockerfile.sglang, container/Dockerfile.sglang-wideep, container/Dockerfile.tensorrt_llm, container/Dockerfile.vllm, container/build.sh | Updated NIXL and TensorRT-LLM versions, added/updated dependencies (flashinfer-python, jq, curl), improved image build consistency, and fixed installation commands. |
| Dependency & Config Updates | pyproject.toml, lib/llm/Cargo.toml | Pinned/updated dependencies for compatibility (e.g., triton==3.3.1, nixl<=0.4.1), and updated optional dependency versions. |
| File Deletions | docs/components/backends/llm/README.md, docs/guides/dynamo_deploy/operator_deployment.md, examples/basics/multimodal/README.md | Removed outdated or redundant documentation files. |
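
To illustrate what "consolidated config keys into nested structures" means for the engine configs, here is a minimal Python sketch of folding flat keys into nested sections. The key names in `NESTING` and the helper `nest_config` are hypothetical and chosen only for illustration; they are not taken from the engine YAMLs in this PR.

```python
from typing import Any, Dict, Tuple

# Hypothetical mapping: flat key -> (nested section, key inside that section).
# These names are illustrative assumptions, not the PR's actual config keys.
NESTING: Dict[str, Tuple[str, str]] = {
    "cuda_graph_batch_sizes": ("cuda_graph_config", "batch_sizes"),
    "cuda_graph_padding_enabled": ("cuda_graph_config", "enable_padding"),
    "kv_cache_dtype": ("kv_cache_config", "dtype"),
}


def nest_config(flat: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new config with mapped flat keys moved into nested sections."""
    nested: Dict[str, Any] = {}
    for key, value in flat.items():
        if key in NESTING:
            section, name = NESTING[key]
            nested.setdefault(section, {})[name] = value
        elif isinstance(value, dict):
            # Merge into a copy so the caller's dict is never mutated.
            nested.setdefault(key, {}).update(value)
        else:
            nested[key] = value
    return nested


old_style = {
    "tensor_parallel_size": 4,
    "kv_cache_dtype": "fp8",
    "kv_cache_config": {"free_gpu_memory_fraction": 0.8},
    "cuda_graph_batch_sizes": [1, 2, 4],
}
new_style = nest_config(old_style)
```

Related settings end up under one section each, so a reader of the config sees all CUDA graph options together and all cache options together.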

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Etcd
    participant vLLM Worker
    participant Ports Module

    User->>vLLM Worker: Start with --dynamo-port-min/max
    vLLM Worker->>Ports Module: Request port allocation (with metadata)
    Ports Module->>Etcd: Reserve port(s) atomically
    Etcd-->>Ports Module: Confirm reservation
    Ports Module-->>vLLM Worker: Return allocated port(s)
    vLLM Worker->>User: Ready (ports configured)
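
The reservation flow in the diagram above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code in ports.py: `InMemoryStore` is a hypothetical stand-in for etcd, and `reserve_port`, the `ports/<n>` key layout, and the metadata fields are invented for the example. The part that matters is the create-if-absent step, which lets exactly one worker claim a given port.

```python
import json
import threading
from typing import Any, Dict


class InMemoryStore:
    """Hypothetical stand-in for etcd offering an atomic create-if-absent."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}
        self._lock = threading.Lock()

    def create_if_absent(self, key: str, value: str) -> bool:
        # The lock makes check-and-write a single step: the first writer wins.
        with self._lock:
            if key in self._data:
                return False
            self._data[key] = value
            return True


def reserve_port(
    store: InMemoryStore, port_min: int, port_max: int, metadata: Dict[str, Any]
) -> int:
    """Reserve the first free port in [port_min, port_max] and return it."""
    if port_min > port_max:
        raise ValueError(f"invalid port range: {port_min}-{port_max}")
    for port in range(port_min, port_max + 1):
        # Assumed key layout; the metadata records who holds the port.
        if store.create_if_absent(f"ports/{port}", json.dumps(metadata)):
            return port
    raise RuntimeError(f"no free port in range {port_min}-{port_max}")


store = InMemoryStore()
first = reserve_port(store, 20000, 20001, {"worker": "a"})
second = reserve_port(store, 20000, 20001, {"worker": "b"})
```

A linear scan keeps the sketch short; a real allocator might instead pick candidates at random to reduce contention between workers that start at the same time.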
sequenceDiagram
    participant Operator
    participant K8s API Server
    participant Grove API
    participant Helm Chart

    Operator->>K8s API Server: Start with Grove config
    Operator->>Grove API: Detect Grove availability
    Grove API-->>Operator: Grove present?
    Operator->>K8s API Server: Set Grove.Enabled and TerminationDelay
    Helm Chart->>K8s API Server: Deploy with Grove settings
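
As a rough sketch of the change from a static enable flag to detection plus a configurable delay, the Python below derives the Grove settings from what the cluster reports. The operator itself is written in Go; `GroveConfig`, `resolve_grove_config`, and the `grove.io` API group name are assumptions made for this example, not identifiers from the PR.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Iterable


@dataclass(frozen=True)
class GroveConfig:
    """Hypothetical settings object: enablement is detected, delay is configured."""

    enabled: bool
    termination_delay: timedelta


def resolve_grove_config(
    available_api_groups: Iterable[str],
    termination_delay: timedelta,
    grove_group: str = "grove.io",  # assumed API group name
) -> GroveConfig:
    # Enabled only if the cluster actually serves the Grove API group,
    # instead of trusting a user-supplied boolean.
    enabled = grove_group in set(available_api_groups)
    return GroveConfig(enabled=enabled, termination_delay=termination_delay)


with_grove = resolve_grove_config(["apps", "grove.io"], timedelta(minutes=5))
without_grove = resolve_grove_config(["apps", "batch"], timedelta(minutes=5))
```

Detecting availability at startup avoids the failure mode where the flag is set but the Grove API is not installed in the cluster.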

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~40 minutes

Possibly related PRs

Poem

In the meadow of code, where YAMLs bloom bright,
The rabbits have shuffled the docs left and right.
Ports now reserved, with Etcd in tow,
Grove’s delay set, as the Kubernetes pods grow.
With Dockerfiles polished and configs in tune,
This patch is a garden—review it soon!
🐇✨

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (2.2.2)

Error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/product/migration-guide for migration instructions
The command is terminated due to an error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/product/migration-guide for migration instructions


@dmitry-tokarev-nv dmitry-tokarev-nv merged commit 1b145bb into release/0.4.0 Aug 5, 2025
11 of 13 checks passed
@dmitry-tokarev-nv dmitry-tokarev-nv deleted the bis/fix-docs branch August 5, 2025 21:38
tzulingk pushed a commit that referenced this pull request Aug 8, 2025