

@athreesh (Contributor) commented Sep 23, 2025

Streamline README and add comprehensive quickstart guide
Motivation: VDR feedback indicated that the main README was too long and that we lacked a dedicated quickstart guide for new users coming to the repository.

Changes:
  • README: Reduced from 317 to ~240 lines (~25% reduction) with a streamlined introduction, condensed engine sections, and a collapsible development section.
  • Quickstart: Added a comprehensive quickstart.md with local (5 min) and Kubernetes (15-20 min) deployment paths, framework-specific guides, and troubleshooting.
  • Consistency: Updated KVBM support status across all READMEs and removed the load-based planner from the support matrix.

Key Improvements:
  • Clear navigation paths to detailed guides
  • Only essential commands in the main README, with pointers to the quickstart
  • Framework-specific quickstarts with setup instructions
  • Better UX with time estimates and use-case guidance

Summary by CodeRabbit

  • Documentation
    • Revamped README with concise intro, updated positioning, and refreshed framework support matrix.
    • Added comprehensive Quick Start guide covering Local and Kubernetes deployment, validation, cleanup, and troubleshooting.
    • Simplified local setup via docker-compose; streamlined run/test flow with example commands.
    • Introduced Kubernetes Helm-based deployment path with status checks and port-forward testing.
    • Consolidated engine guidance with a high-level table and references to detailed guides.
    • Added “Building from Source” section with Rust/Python steps and tooling.
    • Updated backend docs: KVBM status adjustments for vLLM (✅), TensorRT-LLM (✅), and SGLang (🚧).

@copy-pr-bot (bot) commented Sep 23, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.


@coderabbitai bot (Contributor) commented Sep 23, 2025

Walkthrough

Documentation overhaul: rewrites and restructures root README, adds a new comprehensive quickstart guide, and updates backend feature matrices to reflect KVBM status changes for vLLM, SGLang, and TensorRT‑LLM. No code or API changes.

Changes

| Cohort / File(s) | Summary of changes |
| --- | --- |
| **Root README restructuring** (`README.md`) | Rewrote intro, updated news (0.5.0 KV Cache Block Manager), revised support matrix (removed Load Based Planner), consolidated Quick Start, added Docker Compose infra, Helm-based Kubernetes path, collapsed build-from-source, expanded dev/build tooling, and added logging/devcontainer notes. |
| **New Quickstart guide** (`quickstart.md`) | Added end-to-end Local and Kubernetes quickstarts: prerequisites, install via uv, infra bootstrap, frontend/worker run, REST validation, framework-specific steps (vLLM/SGLang/TensorRT-LLM), Helm deploy paths, validation and cleanup, troubleshooting, and next steps. |
| **Backend KVBM status updates** (`components/backends/sglang/README.md`, `components/backends/trtllm/README.md`, `components/backends/vllm/README.md`) | Adjusted Core Features matrices: SGLang KVBM to WIP, TensorRT-LLM KVBM to Completed, vLLM KVBM to Completed. No API or logic changes. |

Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    actor User
    participant FE as Frontend
    participant NATS as NATS
    participant ETCD as etcd
    participant Worker as Backend Worker
    participant Model as Model Runtime

    User->>FE: POST /v1/completions
    FE->>ETCD: Read config / model routing
    FE->>NATS: Publish inference request
    Worker->>NATS: Subscribe & receive request
    Worker->>Model: Run inference
    Model-->>Worker: Tokens / result
    Worker-->>NATS: Publish response
    FE-->>NATS: Receive response
    FE-->>User: Return completion
    Note over FE,Worker: Local Quickstart flow (Docker Compose)
```
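The first and last hops of this flow (the user's POST to the frontend and the returned completion) can be exercised from any HTTP client. A minimal Python sketch, assuming an OpenAI-compatible frontend reachable at `http://localhost:8000` (host and port are assumptions; adjust them for your deployment) that accepts the standard `model`, `prompt`, and `max_tokens` fields. The etcd and NATS hops happen behind the frontend and are invisible to the client:

```python
import json
import urllib.request

# Assumption: the frontend is reachable on localhost:8000; change for your deployment.
FRONTEND_URL = "http://localhost:8000/v1/completions"


def build_completion_request(model: str, prompt: str, max_tokens: int = 32) -> bytes:
    """Serialize an OpenAI-style completions body (model, prompt, max_tokens)."""
    body = {"model": model, "prompt": prompt, "max_tokens": max_tokens}
    return json.dumps(body).encode("utf-8")


def send_completion(url: str, payload: bytes, timeout: float = 60.0) -> dict:
    """POST the JSON payload to the frontend (step 1) and decode the completion (last step)."""
    request = urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example (requires a running frontend and worker):
# payload = build_completion_request("deepseek-ai/DeepSeek-R1-Distill-Llama-8B", "Hello")
# print(send_completion(FRONTEND_URL, payload))
```

Only the standard library is used so the sketch runs without extra dependencies; an existing HTTP or OpenAI client library works equally well.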

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes


Poem

I thump the docs with gentle cheer,
New paths to run are crystal-clear.
Compose to start, Helm charts to steer,
KVBM marked—progress near!
I nose-twitch logs, then curl to hear—
Models hum: the carrots’ here. 🥕🐇

Pre-merge checks

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Description Check | ⚠️ Warning | The PR description summarizes motivation, the README reductions, and the new quickstart.md, but it does not follow the repository's required template: it is missing the explicit "Overview", "Details", "Where should the reviewer start?" and "Related Issues" sections and does not list which files reviewers should inspect or any issue numbers. Because the repository expects the template, the description should be reformatted to include those required headings and reviewer guidance. | Please reformat the PR description to match the repository template by adding the "Overview" and "Details" headings, a "Where should the reviewer start?" section that calls out README.md, quickstart.md, and the modified components/backends/* README files, and a "Related Issues" section listing any issue numbers to close (or "N/A" if none). |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title Check | ✅ Passed | The title "fix: creating quickstart.md, updating README, and small updates" references the main documentation changes (adding quickstart.md and updating the README), so it reflects the primary changes in the PR; however the "fix:" prefix is misleading for documentation-only changes and the phrase "small updates" is vague. A concise, conventional title would improve clarity for reviewers and history. |
| Docstring Coverage | ✅ Passed | No functions found in the changes. Docstring coverage check skipped. |
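The failed Description Check above amounts to looking for four named sections. A minimal Python sketch of that test, assuming the template's sections are written as Markdown headings (the exact heading markup used by the repository template is an assumption, and this is not CodeRabbit's implementation):

```python
import re

# Section titles the Description Check reports as required by the template.
REQUIRED_SECTIONS = (
    "Overview",
    "Details",
    "Where should the reviewer start?",
    "Related Issues",
)


def missing_sections(description: str) -> list[str]:
    """Return required section titles that do not appear as a Markdown heading.

    Matching is case-insensitive and ignores a trailing colon, so "#### Overview:" counts.
    """
    headings = {
        m.group(1).strip().rstrip(":").lower()
        for m in re.finditer(r"^#{1,6}[ \t]+(.+?)[ \t]*$", description, flags=re.MULTILINE)
    }
    return [s for s in REQUIRED_SECTIONS if s.lower() not in headings]
```

Running it on a draft description before opening the PR shows which headings still need to be added.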


@coderabbitai bot (Contributor) left a comment


Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)
components/backends/trtllm/README.md (1)

72-72: Fix grammar: “all of our the” → “all of the”

Reads awkwardly.

Apply:

```diff
-Below we provide a guide that lets you run all of our the common deployment patterns on a single node.
+Below we provide a guide that lets you run all of the common deployment patterns on a single node.
```
components/backends/vllm/README.md (1)

56-56: Fix grammar: “all of our the” → “all of the”

Minor readability improvement.

Apply:

```diff
-Below we provide a guide that lets you run all of our the common deployment patterns on a single node.
+Below we provide a guide that lets you run all of the common deployment patterns on a single node.
```
components/backends/sglang/README.md (2)

49-50: Fix typo: “does not router” → “does not route”

Minor wording issue.

Apply:

```diff
-| **DP Rank Routing** | 🚧     | Direct routing supported. Dynamo KV router does not router to DP worker |
+| **DP Rank Routing** | 🚧     | Direct routing supported. Dynamo KV router does not route to DP worker |
```

164-164: Fix typo: “conjuction” → “conjunction”

Minor spelling fix.

Apply:

```diff
-... is used in conjuction with NIXL to handle the kv transfer.
+... is used in conjunction with NIXL to handle the KV transfer.
```
🧹 Nitpick comments (7)
components/backends/trtllm/README.md (2)

231-241: Remove duplicated “Client” and “Benchmarking” sections

These repeat the earlier sections at Lines 191–201. Deduplicate to reduce maintenance burden.

Apply:

```diff
-## Client
-
-See [client](../sglang/README.md#testing-the-deployment) section to learn how to send request to the deployment.
-
-NOTE: To send a request to a multi-node deployment, target the node which is running `python3 -m dynamo.frontend <args>`.
-
-## Benchmarking
-
-To benchmark your deployment with GenAI-Perf, see this utility script, configuring the
-`model` name and `host` based on your deployment: [perf.sh](../../../benchmarks/llm/perf.sh)
```
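Duplicated sections like these are easy to miss by eye. A minimal Python sketch (a hypothetical helper, not part of the repository's tooling) that reports ATX headings repeated verbatim at the same level, skipping backtick-fenced code so shell comments such as `# install` are not mistaken for headings:

```python
import re
from collections import defaultdict

# ATX headings such as "## Client" (levels 1-6); optional closing #'s are ignored.
HEADING_RE = re.compile(r"^(#{1,6})\s+(.+?)\s*#*\s*$")


def find_duplicate_headings(markdown: str) -> dict[str, list[int]]:
    """Return {heading: [1-based line numbers]} for headings that appear more than once."""
    seen: dict[str, list[int]] = defaultdict(list)
    in_fence = False
    for lineno, line in enumerate(markdown.splitlines(), start=1):
        # Toggle on ``` fences so code comments inside them are not read as headings.
        if line.lstrip().startswith("```"):
            in_fence = not in_fence
            continue
        if in_fence:
            continue
        match = HEADING_RE.match(line)
        if match:
            seen[f"{match.group(1)} {match.group(2)}"].append(lineno)
    return {heading: lines for heading, lines in seen.items() if len(lines) > 1}
```

Treat the output as candidates for review rather than errors, since the same heading can legitimately appear under different parent sections.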

311-312: Tighten punctuation spacing

Remove stray space before the period.

Apply:

```diff
-Here is the instruction: [Running KVBM in TensorRT-LLM](./../../../docs/guides/run_kvbm_in_trtllm.md) .
+Here is the instruction: [Running KVBM in TensorRT-LLM](./../../../docs/guides/run_kvbm_in_trtllm.md).
```
quickstart.md (2)

152-158: Use ‘helm pull’ instead of ‘helm fetch’

In Helm v3, ‘helm fetch’ was renamed to ‘helm pull’ and survives only as an alias. Replace both occurrences, for the CRDs chart and the platform chart.

Apply:

```diff
-helm fetch https://helm.ngc.nvidia.com/nvidia/ai-dynamo/charts/dynamo-crds-${RELEASE_VERSION}.tgz
+helm pull https://helm.ngc.nvidia.com/nvidia/ai-dynamo/charts/dynamo-crds-${RELEASE_VERSION}.tgz
 helm install dynamo-crds dynamo-crds-${RELEASE_VERSION}.tgz --namespace default

-helm fetch https://helm.ngc.nvidia.com/nvidia/ai-dynamo/charts/dynamo-platform-${RELEASE_VERSION}.tgz
+helm pull https://helm.ngc.nvidia.com/nvidia/ai-dynamo/charts/dynamo-platform-${RELEASE_VERSION}.tgz
 helm install dynamo-platform dynamo-platform-${RELEASE_VERSION}.tgz --namespace ${NAMESPACE} --create-namespace
```

86-94: Version-pin backend installs for reproducibility

Align with the top-level install (0.5.0) to reduce drift between sections.

Apply:

```diff
-uv pip install "ai-dynamo[vllm]"
+uv pip install "ai-dynamo[vllm]==0.5.0"
```
README.md (3)

84-85: Pin version to match Quickstart

Keeps top-level README reproducible and aligned with quickstart.

Apply:

```diff
-uv pip install "ai-dynamo[sglang]"  # or [vllm], [trtllm]
+uv pip install "ai-dynamo[sglang]==0.5.0"  # or [vllm]==0.5.0, [trtllm]==0.5.0
```
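To keep the README and the quickstart from drifting apart again, the pin can be checked mechanically. A minimal Python sketch (a hypothetical lint, not existing repository tooling; the expected `0.5.0` comes from the comments above) that flags `ai-dynamo[...]` install specs that are unpinned or pinned to a different version:

```python
import re

# ai-dynamo[extra] optionally followed by ==version; the version must not end in a dot,
# so a sentence-final period after "0.5.0" is not swallowed.
INSTALL_RE = re.compile(r"ai-dynamo\[(?P<extra>[a-z]+)\](?:==(?P<version>\d[\w.]*\w|\d))?")


def check_pins(markdown: str, expected: str) -> list[str]:
    """List install specs that are unpinned or pinned to a version other than `expected`."""
    problems = []
    for lineno, line in enumerate(markdown.splitlines(), start=1):
        for match in INSTALL_RE.finditer(line):
            version = match.group("version")
            if version != expected:
                found = version or "unpinned"
                problems.append(f"line {lineno}: ai-dynamo[{match.group('extra')}] is {found}")
    return problems
```

Running it over both `README.md` and `quickstart.md` with the same expected version would catch the mismatch flagged in these two comments.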

118-123: Use ‘helm pull’ instead of ‘helm fetch’

Modern Helm uses ‘pull’.

Apply:

```diff
-helm fetch https://helm.ngc.nvidia.com/nvidia/ai-dynamo/charts/dynamo-crds-${RELEASE_VERSION}.tgz
+helm pull https://helm.ngc.nvidia.com/nvidia/ai-dynamo/charts/dynamo-crds-${RELEASE_VERSION}.tgz
 helm install dynamo-crds dynamo-crds-${RELEASE_VERSION}.tgz --namespace default

-helm fetch https://helm.ngc.nvidia.com/nvidia/ai-dynamo/charts/dynamo-platform-${RELEASE_VERSION}.tgz
+helm pull https://helm.ngc.nvidia.com/nvidia/ai-dynamo/charts/dynamo-platform-${RELEASE_VERSION}.tgz
 helm install dynamo-platform dynamo-platform-${RELEASE_VERSION}.tgz --namespace ${NAMESPACE} --create-namespace
```

150-155: Normalize engine run commands and flags

Make SGLang line consistent with earlier fix and harmonize flag names across engines.

Apply:

```diff
-| **SGLang** | `uv pip install ai-dynamo[sglang]` | `python -m dynamo.sglang.worker --model deepseek-ai/DeepSeek-R1-Distill-Llama-8B` | Requires `apt install -y libnuma-dev` dependency. |
+| **SGLang** | `uv pip install ai-dynamo[sglang]==0.5.0` | `python -m dynamo.sglang --model-path deepseek-ai/DeepSeek-R1-Distill-Llama-8B` | Requires `apt install -y libnuma-dev`. |
-| **TensorRT-LLM** | `uv pip install ai-dynamo[trtllm]` | `python -m dynamo.trtllm --model deepseek-ai/DeepSeek-R1-Distill-Llama-8B` | Requires NVIDIA PyTorch container. See [TensorRT-LLM Quickstart](quickstart.md#tensorrt-llm-backend) for setup. |
+| **TensorRT-LLM** | `uv pip install ai-dynamo[trtllm]==0.5.0` | `python -m dynamo.trtllm --model-path deepseek-ai/DeepSeek-R1-Distill-Llama-8B` | Requires NVIDIA PyTorch container. See [TensorRT-LLM Quickstart](quickstart.md#tensorrt-llm-backend) for setup. |
```
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c63ccea and 0b01bd6.

📒 Files selected for processing (5)
  • README.md (6 hunks)
  • components/backends/sglang/README.md (1 hunks)
  • components/backends/trtllm/README.md (1 hunks)
  • components/backends/vllm/README.md (1 hunks)
  • quickstart.md (1 hunks)
🧰 Additional context used
🪛 GitHub Actions: Pre Merge Validation of (ai-dynamo/dynamo/refs/pull/3189/merge) by athreesh.
components/backends/trtllm/README.md

[error] 95-95: Trailing whitespace found and removed by pre-commit hook.

components/backends/vllm/README.md

[error] 95-95: Trailing whitespace found and removed by pre-commit hook.

README.md

[error] 95-95: Trailing whitespace found and removed by pre-commit hook.

components/backends/sglang/README.md

[error] 95-95: Trailing whitespace found and removed by pre-commit hook.
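All four files failed the same pre-commit trailing-whitespace check, which running `pre-commit run --all-files` locally before pushing would have fixed. For illustration, a minimal Python sketch of what such a check does, assuming UTF-8 text files. Unlike pre-commit's `trailing-whitespace` hook, which can be told to keep Markdown hard line breaks with `--markdown-linebreak-ext=md`, this sketch strips every trailing space and tab:

```python
from pathlib import Path


def strip_trailing_whitespace(text: str) -> tuple[str, list[int]]:
    """Remove trailing spaces/tabs from each line, preserving line endings.

    Returns the fixed text and the 1-based numbers of the lines that changed.
    """
    changed = []
    out_lines = []
    for lineno, line in enumerate(text.splitlines(keepends=True), start=1):
        body = line.rstrip("\r\n")
        ending = line[len(body):]  # "\n", "\r\n", or "" for a final line without newline
        stripped = body.rstrip(" \t")
        if stripped != body:
            changed.append(lineno)
        out_lines.append(stripped + ending)
    return "".join(out_lines), changed


def fix_file(path: Path) -> list[int]:
    """Rewrite `path` in place if needed; bytes I/O keeps CRLF endings untouched."""
    original = path.read_bytes().decode("utf-8")
    fixed, changed = strip_trailing_whitespace(original)
    if changed:
        path.write_bytes(fixed.encode("utf-8"))
    return changed
```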

🪛 GitHub Check: Check for broken markdown links
quickstart.md

[failure] 244-244:
Broken link: Security Guide - View: https://github.com/ai-dynamo/dynamo/blob/HEAD/quickstart.md?plain=1#L244
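The failing link can be reproduced locally before pushing. A minimal Python sketch, not the implementation behind the GitHub check (which is not shown here), that resolves inline relative links in a Markdown file against the file's own directory and reports targets that do not exist. Reference-style links are not handled, and anchors within files are not validated:

```python
import re
from pathlib import Path

# Inline links and images: [text](target) with an optional "title".
LINK_RE = re.compile(r'!?\[[^\]]*\]\(([^)\s]+)(?:\s+"[^"]*")?\)')
# Targets with a URL scheme (http:, https:, mailto:) are external and skipped.
SCHEME_RE = re.compile(r"^[a-zA-Z][a-zA-Z0-9+.-]*:")


def broken_relative_links(md_path: Path) -> list[tuple[int, str]]:
    """Return (line, target) pairs for relative links whose file does not exist."""
    problems = []
    text = md_path.read_text(encoding="utf-8")
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in LINK_RE.finditer(line):
            target = match.group(1)
            if SCHEME_RE.match(target) or target.startswith("#"):
                continue
            file_part = target.split("#", 1)[0]  # drop "#section" before checking the file
            if not (md_path.parent / file_part).exists():
                problems.append((lineno, target))
    return problems
```

If the Security Guide target is missing, calling `broken_relative_links(Path("quickstart.md"))` from the repository root would report it along with its line number.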

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Build and Test - dynamo

@athreesh requested review from a team as code owners September 24, 2025 02:27
@athreesh closed this Sep 24, 2025

9 participants