fix: creating quickstart.md, updating README, and small updates #3189

athreesh · 2025-09-23T23:22:33Z

Streamline README and add comprehensive quickstart guide
Motivation: VDR feedback indicated that the main README was too long and that we lacked a dedicated quickstart guide for new users coming to the repository.

Changes:
- README: Reduced from 317 to ~240 lines (25% reduction) with streamlined introduction, condensed engine sections, and collapsible development section
- Quickstart: Added comprehensive quickstart.md with local (5 min) and Kubernetes (15-20 min) deployment paths, framework-specific guides, and troubleshooting
- Consistency: Updated KVBM support status across all READMEs and removed load-based planner from support matrix

Key Improvements:
- Clear navigation paths to detailed guides
- Essential commands only in main README with pointers to quickstart
- Framework-specific quickstarts with setup instructions
- Better UX with time estimates and use case guidance

Summary by CodeRabbit

Documentation
- Revamped README with concise intro, updated positioning, and refreshed framework support matrix.
- Added comprehensive Quick Start guide covering Local and Kubernetes deployment, validation, cleanup, and troubleshooting.
- Simplified local setup via docker-compose; streamlined run/test flow with example commands.
- Introduced Kubernetes Helm-based deployment path with status checks and port-forward testing.
- Consolidated engine guidance with a high-level table and references to detailed guides.
- Added “Building from Source” section with Rust/Python steps and tooling.
- Updated backend docs: KVBM status adjustments for vLLM (✅), TensorRT-LLM (✅), and SGLang (🚧).

copy-pr-bot · 2025-09-23T23:22:37Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

coderabbitai · 2025-09-23T23:29:14Z

Walkthrough

Documentation overhaul: rewrites and restructures root README, adds a new comprehensive quickstart guide, and updates backend feature matrices to reflect KVBM status changes for vLLM, SGLang, and TensorRT‑LLM. No code or API changes.

Changes

| Cohort / File(s) | Summary of changes |
| --- | --- |
| Root README restructuring `README.md` | Rewrote intro, updated news (0.5.0 KV Cache Block Manager), revised support matrix (removed Load Based Planner), consolidated Quick Start, added Docker Compose infra, Helm-based Kubernetes path, collapsed build-from-source, expanded dev/build tooling, and added logging/devcontainer notes. |
| New Quickstart guide `quickstart.md` | Added end-to-end Local and Kubernetes quickstarts: prerequisites, install via uv, infra bootstrap, frontend/worker run, REST validation, framework-specific steps (vLLM/SGLang/TensorRT‑LLM), Helm deploy paths, validation and cleanup, troubleshooting, and next steps. |
| Backend KVBM status updates `components/backends/sglang/README.md`, `components/backends/trtllm/README.md`, `components/backends/vllm/README.md` | Adjusted Core Features matrices: SGLang KVBM to WIP, TensorRT‑LLM KVBM to Completed, vLLM KVBM to Completed. No API or logic changes. |

Sequence Diagram(s)

sequenceDiagram
    autonumber
    actor User
    participant FE as Frontend
    participant NATS as NATS
    participant ETCD as etcd
    participant Worker as Backend Worker
    participant Model as Model Runtime

    User->>FE: POST /v1/completions
    FE->>ETCD: Read config / model routing
    FE->>NATS: Publish inference request
    Worker->>NATS: Subscribe & receive request
    Worker->>Model: Run inference
    Model-->>Worker: Tokens / result
    Worker-->>NATS: Publish response
    NATS-->>FE: Deliver response
    FE-->>User: Return completion
    Note over FE,Worker: Local Quickstart flow (Docker Compose)
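
The request path in the diagram can be mimicked with a toy in-process model. The sketch below is only an illustration and does not use Dynamo's actual APIs: `Bus`, `frontend_handle`, `worker_step`, and `fake_model` are hypothetical names, a set of in-memory queues stands in for NATS, a plain dict stands in for the etcd routing lookup, and the "model" simply upper-cases the prompt.

```python
import queue


class Bus:
    """Toy stand-in for NATS: one FIFO queue per subject (hypothetical)."""

    def __init__(self):
        self._subjects = {}

    def publish(self, subject, message):
        self._subjects.setdefault(subject, queue.Queue()).put(message)

    def receive(self, subject):
        # Raises queue.Empty if nothing was published on the subject.
        return self._subjects.setdefault(subject, queue.Queue()).get_nowait()


def fake_model(prompt):
    """Placeholder for the model runtime."""
    return prompt.upper()


def worker_step(bus, subject):
    """Worker: receive one request, run inference, publish the response."""
    request = bus.receive(subject)
    result = fake_model(request["prompt"])
    bus.publish(request["reply_to"], {"text": result})


def frontend_handle(bus, routing, model_name, prompt):
    """Frontend: look up routing, publish the request, return the reply."""
    subject = routing[model_name]  # stand-in for the etcd read
    reply_to = f"reply.{model_name}"
    bus.publish(subject, {"prompt": prompt, "reply_to": reply_to})
    worker_step(bus, subject)  # in a real deployment the worker runs separately
    return bus.receive(reply_to)["text"]


routing = {"demo-model": "requests.demo-model"}
completion = frontend_handle(Bus(), routing, "demo-model", "hello")
print(completion)
```

Calling the worker inline keeps the example deterministic; a real worker would be a separate process subscribed to the subject.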

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

Update README.md #2938 — Updates README framework matrices, including KVBM status for SGLang and TensorRT‑LLM.
docs: Post-Merge cleanup of the deploy documentation #1922 — Restructures deployment/quickstart docs with Helm/Kubernetes guidance.
docs: Refactor README.md and add components/README.md #2141 — Reworks the top-level README intro, support matrix, and quickstart organization.

Poem

I thump the docs with gentle cheer,
New paths to run are crystal-clear.
Compose to start, Helm charts to steer,
KVBM marked—progress near!
I nose-twitch logs, then curl to hear—
Models hum: the carrots’ here. 🥕🐇

Pre-merge checks

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Description Check | ⚠️ Warning | The PR description summarizes motivation, the README reductions, and the new quickstart.md, but it does not follow the repository's required template: it is missing the explicit "Overview", "Details", "Where should the reviewer start?" and "Related Issues" sections and does not list which files reviewers should inspect or any issue numbers. Because the repository expects the template, the description should be reformatted to include those required headings and reviewer guidance. | Please reformat the PR description to match the repository template by adding the "Overview" and "Details" headings, a "Where should the reviewer start?" section that calls out README.md, quickstart.md, and the modified components/backends/* README files, and a "Related Issues" section listing any issue numbers to close (or "N/A" if none). |
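
A check of this kind could be approximated locally before opening a PR. The following is a minimal sketch, not CodeRabbit's implementation: the section names come from the warning above, while `missing_sections`, `normalize`, and the assumption that each section appears on its own heading-like line are illustrative choices.

```python
REQUIRED_SECTIONS = (
    "Overview",
    "Details",
    "Where should the reviewer start?",
    "Related Issues",
)


def normalize(line):
    """Strip Markdown heading markers, a trailing colon, and case."""
    return line.strip().lstrip("#").strip().rstrip(":").strip().lower()


def missing_sections(description, required=REQUIRED_SECTIONS):
    """Return the required section names that have no matching line."""
    present = {normalize(line) for line in description.splitlines()}
    return [name for name in required if name.lower() not in present]


# Shape of the description in this PR: none of the template headings appear.
current = "Motivation: VDR feedback\n\nChanges:\nREADME: Reduced\n\nKey Improvements:\n"
# A partially reformatted description (heading style is an assumption).
partial = "#### Overview:\nDocs update\n\n#### Details:\nREADME and quickstart\n"

print(missing_sections(current))
print(missing_sections(partial))
```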

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title Check | ✅ Passed | The title "fix: creating quickstart.md, updating README, and small updates" references the main documentation changes (adding quickstart.md and updating the README), so it reflects the primary changes in the PR; however the "fix:" prefix is misleading for documentation-only changes and the phrase "small updates" is vague. A concise, conventional title would improve clarity for reviewers and history. |
| Docstring Coverage | ✅ Passed | No functions found in the changes. Docstring coverage check skipped. |

Tip

👮 Agentic pre-merge checks are now available in preview!

Pro plan users can now enable pre-merge checks in their settings to enforce checklists before merging PRs.

Built-in checks – Quickly apply ready-made checks to enforce title conventions, require pull request descriptions that follow templates, validate linked issues for compliance, and more.
Custom agentic checks – Define your own rules using CodeRabbit’s advanced agentic capabilities to enforce organization-specific policies and workflows. For example, you can instruct CodeRabbit’s agent to verify that API documentation is updated whenever API schema files are modified in a PR. Note: Up to 5 custom checks are currently allowed during the preview period. Pricing for this feature will be announced in a few weeks.

Please see the documentation for more information.

Example:

reviews:
  pre_merge_checks:
    custom_checks:
      - name: "Undocumented Breaking Changes"
        mode: "warning"
        instructions: |
          Pass/fail criteria: All breaking changes to public APIs, CLI flags, environment variables, configuration keys, database schemas, or HTTP/GraphQL endpoints must be documented in the "Breaking Change" section of the PR description and in CHANGELOG.md. Exclude purely internal or private changes (e.g., code not exported from package entry points or explicitly marked as internal).


coderabbitai

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)

components/backends/trtllm/README.md (1)
72-72: Fix grammar: “all of our the” → “all of the”

Reads awkwardly.

Apply:
-Below we provide a guide that lets you run all of our the common deployment patterns on a single node.
+Below we provide a guide that lets you run all of the common deployment patterns on a single node.
components/backends/vllm/README.md (1)
56-56: Fix grammar: “all of our the” → “all of the”

Minor readability improvement.

Apply:
-Below we provide a guide that lets you run all of our the common deployment patterns on a single node.
+Below we provide a guide that lets you run all of the common deployment patterns on a single node.
components/backends/sglang/README.md (2)
49-50: Fix typo: “does not router” → “does not route”

Minor wording issue.

Apply:
-| **DP Rank Routing** | 🚧     | Direct routing supported. Dynamo KV router does not router to DP worker |
+| **DP Rank Routing** | 🚧     | Direct routing supported. Dynamo KV router does not route to DP worker |
164-164: Fix typo: “conjuction” → “conjunction”

Minor spelling fix.

Apply:
-... is used in conjuction with NIXL to handle the kv transfer.
+... is used in conjunction with NIXL to handle the KV transfer.

🧹 Nitpick comments (7)

components/backends/trtllm/README.md (2)

231-241: Remove duplicated “Client” and “Benchmarking” sections

These repeat the earlier sections at Lines 191–201. Deduplicate to reduce maintenance burden.

Apply:

-## Client
-
-See [client](../sglang/README.md#testing-the-deployment) section to learn how to send request to the deployment.
-
-NOTE: To send a request to a multi-node deployment, target the node which is running `python3 -m dynamo.frontend <args>`.
-
-## Benchmarking
-
-To benchmark your deployment with GenAI-Perf, see this utility script, configuring the
-`model` name and `host` based on your deployment: [perf.sh](../../../benchmarks/llm/perf.sh)

311-312: Tighten punctuation spacing

Remove stray space before the period.

Apply:

-Here is the instruction: [Running KVBM in TensorRT-LLM](./../../../docs/guides/run_kvbm_in_trtllm.md) .
+Here is the instruction: [Running KVBM in TensorRT-LLM](./../../../docs/guides/run_kvbm_in_trtllm.md).

quickstart.md (2)

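152-158: Use ‘helm pull’ instead of the legacy ‘helm fetch’

Helm 3 renamed ‘fetch’ to ‘pull’; ‘helm fetch’ still works as an alias, but ‘helm pull’ is the documented form. Replace both occurrences, for the CRDs chart and the platform chart.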

Apply:

-helm fetch https://helm.ngc.nvidia.com/nvidia/ai-dynamo/charts/dynamo-crds-${RELEASE_VERSION}.tgz
+helm pull https://helm.ngc.nvidia.com/nvidia/ai-dynamo/charts/dynamo-crds-${RELEASE_VERSION}.tgz
 helm install dynamo-crds dynamo-crds-${RELEASE_VERSION}.tgz --namespace default

-helm fetch https://helm.ngc.nvidia.com/nvidia/ai-dynamo/charts/dynamo-platform-${RELEASE_VERSION}.tgz
+helm pull https://helm.ngc.nvidia.com/nvidia/ai-dynamo/charts/dynamo-platform-${RELEASE_VERSION}.tgz
 helm install dynamo-platform dynamo-platform-${RELEASE_VERSION}.tgz --namespace ${NAMESPACE} --create-namespace

86-94: Version-pin backend installs for reproducibility

Align with the top-level install (0.5.0) to reduce drift between sections.

Apply:

-uv pip install "ai-dynamo[vllm]"
+uv pip install "ai-dynamo[vllm]==0.5.0"
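
Whether an install line carries an exact pin can be verified mechanically. This is a rough sketch under stated assumptions: `is_pinned` is a hypothetical helper, and the regular expression only covers the simple `name[extras]==version` shape used in this review, not the full Python requirement grammar.

```python
import re

# name, optional [extras], then an exact '==' version (simplified on purpose)
PINNED = re.compile(
    r"^[A-Za-z0-9._-]+(\[[A-Za-z0-9._, -]+\])?==[A-Za-z0-9.+!*-]+$"
)


def is_pinned(requirement):
    """Return True when the requirement string uses an exact '==' pin."""
    return bool(PINNED.match(requirement.strip()))


checks = {
    req: is_pinned(req)
    for req in (
        "ai-dynamo[vllm]",
        "ai-dynamo[vllm]==0.5.0",
        "ai-dynamo[sglang]>=0.5.0",
    )
}
print(checks)
```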

README.md (3)

84-85: Pin version to match Quickstart

Keeps top-level README reproducible and aligned with quickstart.

Apply:

-uv pip install "ai-dynamo[sglang]"  # or [vllm], [trtllm]
+uv pip install "ai-dynamo[sglang]==0.5.0"  # or [vllm]==0.5.0, [trtllm]==0.5.0

118-123: Use ‘helm pull’ instead of ‘helm fetch’

Modern Helm uses ‘pull’.

Apply:

-helm fetch https://helm.ngc.nvidia.com/nvidia/ai-dynamo/charts/dynamo-crds-${RELEASE_VERSION}.tgz
+helm pull https://helm.ngc.nvidia.com/nvidia/ai-dynamo/charts/dynamo-crds-${RELEASE_VERSION}.tgz
 helm install dynamo-crds dynamo-crds-${RELEASE_VERSION}.tgz --namespace default

-helm fetch https://helm.ngc.nvidia.com/nvidia/ai-dynamo/charts/dynamo-platform-${RELEASE_VERSION}.tgz
+helm pull https://helm.ngc.nvidia.com/nvidia/ai-dynamo/charts/dynamo-platform-${RELEASE_VERSION}.tgz
 helm install dynamo-platform dynamo-platform-${RELEASE_VERSION}.tgz --namespace ${NAMESPACE} --create-namespace

150-155: Normalize engine run commands and flags

Make SGLang line consistent with earlier fix and harmonize flag names across engines.

Apply:

-| **SGLang** | `uv pip install ai-dynamo[sglang]` | `python -m dynamo.sglang.worker --model deepseek-ai/DeepSeek-R1-Distill-Llama-8B` | Requires `apt install -y libnuma-dev` dependency. |
+| **SGLang** | `uv pip install ai-dynamo[sglang]==0.5.0` | `python -m dynamo.sglang --model-path deepseek-ai/DeepSeek-R1-Distill-Llama-8B` | Requires `apt install -y libnuma-dev`. |
-| **TensorRT-LLM** | `uv pip install ai-dynamo[trtllm]` | `python -m dynamo.trtllm --model deepseek-ai/DeepSeek-R1-Distill-Llama-8B` | Requires NVIDIA PyTorch container. See [TensorRT-LLM Quickstart](quickstart.md#tensorrt-llm-backend) for setup. |
+| **TensorRT-LLM** | `uv pip install ai-dynamo[trtllm]==0.5.0` | `python -m dynamo.trtllm --model-path deepseek-ai/DeepSeek-R1-Distill-Llama-8B` | Requires NVIDIA PyTorch container. See [TensorRT-LLM Quickstart](quickstart.md#tensorrt-llm-backend) for setup. |

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c63ccea and 0b01bd6.

📒 Files selected for processing (5)

README.md (6 hunks)
components/backends/sglang/README.md (1 hunks)
components/backends/trtllm/README.md (1 hunks)
components/backends/vllm/README.md (1 hunks)
quickstart.md (1 hunks)

🧰 Additional context used

🪛 GitHub Actions: Pre Merge Validation of (ai-dynamo/dynamo/refs/pull/3189/merge) by athreesh.

components/backends/trtllm/README.md

[error] 95-95: Trailing whitespace found and removed by pre-commit hook.

components/backends/vllm/README.md

[error] 95-95: Trailing whitespace found and removed by pre-commit hook.

README.md

[error] 95-95: Trailing whitespace found and removed by pre-commit hook.

components/backends/sglang/README.md

[error] 95-95: Trailing whitespace found and removed by pre-commit hook.
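
Failures like these can be caught before pushing. The sketch below is an assumption-laden illustration of the idea, not the pre-commit hook's own code: `trailing_whitespace_lines` and `strip_trailing_whitespace` are hypothetical helpers that treat only spaces and tabs as trailing whitespace.

```python
def trailing_whitespace_lines(text):
    """Return 1-based numbers of lines that end in a space or tab."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if line != line.rstrip(" \t")
    ]


def strip_trailing_whitespace(text):
    """Remove trailing spaces and tabs, keeping a final newline if present."""
    fixed = "\n".join(line.rstrip(" \t") for line in text.splitlines())
    return fixed + "\n" if text.endswith("\n") else fixed


sample = "clean line\ntrailing space \n\ttab inside\ntrailing tab\t\n"
found = trailing_whitespace_lines(sample)
cleaned = strip_trailing_whitespace(sample)
print(found)
```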

🪛 GitHub Check: Check for broken markdown links

quickstart.md

[failure] 244-244:
Broken link: Security Guide - View: https://github.com/ai-dynamo/dynamo/blob/HEAD/quickstart.md?plain=1#L244
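
A broken relative link of this kind can be found locally with a short script. The sketch below is illustrative only and is not the repository's link checker: `broken_relative_links` is a hypothetical helper, the regular expression handles only inline `[label](target)` links, and the `docs/security.md` target in the demo is invented because the log above does not show the real target.

```python
import re
import tempfile
from pathlib import Path

LINK = re.compile(r"\[([^\]]+)\]\(([^)\s]+)\)")
SCHEME = re.compile(r"^[a-z][a-z0-9+.-]*:", re.IGNORECASE)


def broken_relative_links(markdown_path):
    """Return (line, label, target) for relative links whose file is missing."""
    path = Path(markdown_path)
    broken = []
    lines = path.read_text(encoding="utf-8").splitlines()
    for number, line in enumerate(lines, start=1):
        for label, target in LINK.findall(line):
            if SCHEME.match(target) or target.startswith("#"):
                continue  # skip URLs (https:, mailto:, ...) and in-page anchors
            file_part = target.split("#", 1)[0]
            if not (path.parent / file_part).exists():
                broken.append((number, label, target))
    return broken


# Demo in a throwaway directory; file names and link targets are made up.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "README.md").write_text("# Readme\n", encoding="utf-8")
    (root / "quickstart.md").write_text(
        "See the [README](README.md) and the [site](https://example.com).\n"
        "Read the [Security Guide](docs/security.md) first.\n",
        encoding="utf-8",
    )
    result = broken_relative_links(root / "quickstart.md")

print(result)
```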

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: Build and Test - dynamo

quickstart.md

README.md


athreesh added 2 commits September 24, 2025 07:13

fix: making quickstart.md, editing readme, + small minor changes

770a2a3

additions to readme from previous

0b01bd6

athreesh requested review from grahamking, hutm, ishandhanani and rmccorm4 September 23, 2025 23:22

pull-request-size bot added the size/XL label Sep 23, 2025

github-actions bot added the fix label Sep 23, 2025

athreesh requested a review from nnshah1 September 23, 2025 23:26

coderabbitai bot reviewed Sep 23, 2025

quickstart.md (resolved)

quickstart.md (outdated, resolved)

README.md (resolved)

athreesh added 2 commits September 24, 2025 07:30

small changes to k8s readme

6861ac7

addressing coderabbit

e52d2f5

athreesh requested review from a team as code owners September 24, 2025 02:27

athreesh and others added 13 commits September 24, 2025 10:33

addressing coderabbit

0fe7574

Merge branch 'main' into anish-ux-fixes

09faa49

Merge branch 'main' into anish-ux-fixes

88a4f07

fix: fix broken links (#3186)

8678b5d

Signed-off-by: athreesh <[email protected]>

fix: improve sglang multinode handling in operator (#3151)

2a987f9

Signed-off-by: Julien Mancuso <[email protected]>

feat: add trtllm and vllm multinode k8s examples (#3100)

0e1bb5d

Signed-off-by: Julien Mancuso <[email protected]>

chore: added middleware layer to catch json validation errors (#3182)

e27d058

Signed-off-by: ayushag <[email protected]>

fix: making quickstart.md, editing readme, + small minor changes

3941ba4

Signed-off-by: athreesh <[email protected]>

additions to readme from previous

300d10c

Signed-off-by: athreesh <[email protected]>

small changes to k8s readme

b6823c5

Signed-off-by: athreesh <[email protected]>

addressing coderabbit

9fa2e64

Signed-off-by: athreesh <[email protected]>

addressing coderabbit

fd33821

Signed-off-by: athreesh <[email protected]>

feat: JailedStream (#3034)

b723c86

Signed-off-by: ayushag <[email protected]>

biswapanda and others added 6 commits September 24, 2025 17:40

feat: allow setting affinity for controller manager pod (#3157)

1c870fc

Signed-off-by: athreesh <[email protected]>

feat: support MoE model in SLA Planner Sglang (#3185)

40054be

Signed-off-by: hongkuanz <[email protected]> Signed-off-by: Hongkuan Zhou <[email protected]> Co-authored-by: hhzhang16 <[email protected]>

feat: allow framework tokenization/detokenization (#3134)

3e07d5e

Co-authored-by: Ubuntu <[email protected]> Signed-off-by: athreesh <[email protected]>

feat: tensor type for generic inference. (#2746)

8cf6437

Signed-off-by: Guan Luo <[email protected]> Signed-off-by: GuanLuo <[email protected]> Co-authored-by: Olga Andreeva <[email protected]> Co-authored-by: Ryan McCormick <[email protected]>

chore: jail stream optimizations (v1) (#3195)

1203972

Signed-off-by: ayushag <[email protected]>

Merge branch 'anish-ux-fixes' of https://github.com/ai-dynamo/dynamo into anish-ux-fixes

0e402be

athreesh closed this Sep 24, 2025


9 participants
