-
Notifications
You must be signed in to change notification settings - Fork 690
fix: creating quickstart.md, updating README, and small updates #3189
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughDocumentation overhaul: rewrites and restructures root README, adds a new comprehensive quickstart guide, and updates backend feature matrices to reflect KVBM status changes for vLLM, SGLang, and TensorRT‑LLM. No code or API changes. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
actor User
participant FE as Frontend
participant NATS as NATS
participant ETCD as etcd
participant Worker as Backend Worker
participant Model as Model Runtime
User->>FE: POST /v1/completions
FE->>ETCD: Read config / model routing
FE->>NATS: Publish inference request
Worker->>NATS: Subscribe & receive request
Worker->>Model: Run inference
Model-->>Worker: Tokens / result
Worker-->>NATS: Publish response
FE-->>NATS: Receive response
FE-->>User: Return completion
Note over FE,Worker: Local Quickstart flow (Docker Compose)
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes Possibly related PRs
Poem
Pre-merge checks❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Tip 👮 Agentic pre-merge checks are now available in preview!Pro plan users can now enable pre-merge checks in their settings to enforce checklists before merging PRs.
Please see the documentation for more information. Example: reviews:
pre_merge_checks:
custom_checks:
- name: "Undocumented Breaking Changes"
mode: "warning"
instructions: |
Pass/fail criteria: All breaking changes to public APIs, CLI flags, environment variables, configuration keys, database schemas, or HTTP/GraphQL endpoints must be documented in the "Breaking Change" section of the PR description and in CHANGELOG.md. Exclude purely internal or private changes (e.g., code not exported from package entry points or explicitly marked as internal).Please share your feedback with us on this Discord post. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (4)
components/backends/trtllm/README.md (1)
72-72: Fix grammar: “all of our the” → “all of the”Reads awkwardly.
Apply:
-Below we provide a guide that lets you run all of our the common deployment patterns on a single node. +Below we provide a guide that lets you run all of the common deployment patterns on a single node.components/backends/vllm/README.md (1)
56-56: Fix grammar: “all of our the” → “all of the”Minor readability improvement.
Apply:
-Below we provide a guide that lets you run all of our the common deployment patterns on a single node. +Below we provide a guide that lets you run all of the common deployment patterns on a single node.components/backends/sglang/README.md (2)
49-50: Fix typo: “does not router” → “does not route”Minor wording issue.
Apply:
-| **DP Rank Routing** | 🚧 | Direct routing supported. Dynamo KV router does not router to DP worker | +| **DP Rank Routing** | 🚧 | Direct routing supported. Dynamo KV router does not route to DP worker |
164-164: Fix typo: “conjuction” → “conjunction”Minor spelling fix.
Apply:
-... is used in conjuction with NIXL to handle the kv transfer. +... is used in conjunction with NIXL to handle the KV transfer.
🧹 Nitpick comments (7)
components/backends/trtllm/README.md (2)
231-241: Remove duplicated “Client” and “Benchmarking” sectionsThese repeat the earlier sections at Lines 191–201. Deduplicate to reduce maintenance burden.
Apply:
-## Client - -See [client](../sglang/README.md#testing-the-deployment) section to learn how to send request to the deployment. - -NOTE: To send a request to a multi-node deployment, target the node which is running `python3 -m dynamo.frontend <args>`. - -## Benchmarking - -To benchmark your deployment with GenAI-Perf, see this utility script, configuring the -`model` name and `host` based on your deployment: [perf.sh](../../../benchmarks/llm/perf.sh)
311-312: Tighten punctuation spacingRemove stray space before the period.
Apply:
-Here is the instruction: [Running KVBM in TensorRT-LLM](./../../../docs/guides/run_kvbm_in_trtllm.md) . +Here is the instruction: [Running KVBM in TensorRT-LLM](./../../../docs/guides/run_kvbm_in_trtllm.md).quickstart.md (2)
152-158: Use ‘helm pull’ instead of deprecated ‘helm fetch’Helm v3 recommends ‘helm pull’. Replace both occurrences for CRDs and platform.
Apply:
-helm fetch https://helm.ngc.nvidia.com/nvidia/ai-dynamo/charts/dynamo-crds-${RELEASE_VERSION}.tgz +helm pull https://helm.ngc.nvidia.com/nvidia/ai-dynamo/charts/dynamo-crds-${RELEASE_VERSION}.tgz helm install dynamo-crds dynamo-crds-${RELEASE_VERSION}.tgz --namespace default -helm fetch https://helm.ngc.nvidia.com/nvidia/ai-dynamo/charts/dynamo-platform-${RELEASE_VERSION}.tgz +helm pull https://helm.ngc.nvidia.com/nvidia/ai-dynamo/charts/dynamo-platform-${RELEASE_VERSION}.tgz helm install dynamo-platform dynamo-platform-${RELEASE_VERSION}.tgz --namespace ${NAMESPACE} --create-namespace
86-94: Version-pin backend installs for reproducibilityAlign with the top-level install (0.5.0) to reduce drift between sections.
Apply:
-uv pip install "ai-dynamo[vllm]" +uv pip install "ai-dynamo[vllm]==0.5.0"README.md (3)
84-85: Pin version to match QuickstartKeeps top-level README reproducible and aligned with quickstart.
Apply:
-uv pip install "ai-dynamo[sglang]" # or [vllm], [trtllm] +uv pip install "ai-dynamo[sglang]==0.5.0" # or [vllm]==0.5.0, [trtllm]==0.5.0
118-123: Use ‘helm pull’ instead of ‘helm fetch’Modern Helm uses ‘pull’.
Apply:
-helm fetch https://helm.ngc.nvidia.com/nvidia/ai-dynamo/charts/dynamo-crds-${RELEASE_VERSION}.tgz +helm pull https://helm.ngc.nvidia.com/nvidia/ai-dynamo/charts/dynamo-crds-${RELEASE_VERSION}.tgz helm install dynamo-crds dynamo-crds-${RELEASE_VERSION}.tgz --namespace default -helm fetch https://helm.ngc.nvidia.com/nvidia/ai-dynamo/charts/dynamo-platform-${RELEASE_VERSION}.tgz +helm pull https://helm.ngc.nvidia.com/nvidia/ai-dynamo/charts/dynamo-platform-${RELEASE_VERSION}.tgz helm install dynamo-platform dynamo-platform-${RELEASE_VERSION}.tgz --namespace ${NAMESPACE} --create-namespace
150-155: Normalize engine run commands and flagsMake SGLang line consistent with earlier fix and harmonize flag names across engines.
Apply:
-| **SGLang** | `uv pip install ai-dynamo[sglang]` | `python -m dynamo.sglang.worker --model deepseek-ai/DeepSeek-R1-Distill-Llama-8B` | Requires `apt install -y libnuma-dev` dependency. | +| **SGLang** | `uv pip install ai-dynamo[sglang]==0.5.0` | `python -m dynamo.sglang --model-path deepseek-ai/DeepSeek-R1-Distill-Llama-8B` | Requires `apt install -y libnuma-dev`. | -| **TensorRT-LLM** | `uv pip install ai-dynamo[trtllm]` | `python -m dynamo.trtllm --model deepseek-ai/DeepSeek-R1-Distill-Llama-8B` | Requires NVIDIA PyTorch container. See [TensorRT-LLM Quickstart](quickstart.md#tensorrt-llm-backend) for setup. | +| **TensorRT-LLM** | `uv pip install ai-dynamo[trtllm]==0.5.0` | `python -m dynamo.trtllm --model-path deepseek-ai/DeepSeek-R1-Distill-Llama-8B` | Requires NVIDIA PyTorch container. See [TensorRT-LLM Quickstart](quickstart.md#tensorrt-llm-backend) for setup. |
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
README.md(6 hunks)components/backends/sglang/README.md(1 hunks)components/backends/trtllm/README.md(1 hunks)components/backends/vllm/README.md(1 hunks)quickstart.md(1 hunks)
🧰 Additional context used
🪛 GitHub Actions: Pre Merge Validation of (ai-dynamo/dynamo/refs/pull/3189/merge) by athreesh.
components/backends/trtllm/README.md
[error] 95-95: Trailing whitespace found and removed by pre-commit hook.
components/backends/vllm/README.md
[error] 95-95: Trailing whitespace found and removed by pre-commit hook.
README.md
[error] 95-95: Trailing whitespace found and removed by pre-commit hook.
components/backends/sglang/README.md
[error] 95-95: Trailing whitespace found and removed by pre-commit hook.
🪛 GitHub Check: Check for broken markdown links
quickstart.md
[failure] 244-244:
Broken link: Security Guide - View: https://github.com/ai-dynamo/dynamo/blob/HEAD/quickstart.md?plain=1#L244
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Build and Test - dynamo
Signed-off-by: athreesh <[email protected]>
Signed-off-by: Julien Mancuso <[email protected]>
Signed-off-by: Julien Mancuso <[email protected]>
Signed-off-by: ayushag <[email protected]>
Signed-off-by: athreesh <[email protected]>
Signed-off-by: athreesh <[email protected]>
Signed-off-by: athreesh <[email protected]>
Signed-off-by: athreesh <[email protected]>
Signed-off-by: athreesh <[email protected]>
Signed-off-by: ayushag <[email protected]>
Signed-off-by: athreesh <[email protected]>
Signed-off-by: hongkuanz <[email protected]> Signed-off-by: Hongkuan Zhou <[email protected]> Co-authored-by: hhzhang16 <[email protected]>
Co-authored-by: Ubuntu <[email protected]> Signed-off-by: athreesh <[email protected]>
Signed-off-by: Guan Luo <[email protected]> Signed-off-by: GuanLuo <[email protected]> Co-authored-by: Olga Andreeva <[email protected]> Co-authored-by: Ryan McCormick <[email protected]>
Signed-off-by: ayushag <[email protected]>
Streamline README and add comprehensive quickstart guide
Motivation: VDR feedback indicated the main README was too long and we lacked dedicated quickstart guide for new users coming to the repository
Changes:
README: Reduced from 317 to ~240 lines (25% reduction) with streamlined introduction, condensed engine sections, and collapsible development section
Quickstart: Added comprehensive quickstart.md with local (5 min) and Kubernetes (15-20 min) deployment paths, framework-specific guides, and troubleshooting
Consistency: Updated KVBM support status across all READMEs and removed load-based planner from support matrix
Key Improvements:
Clear navigation paths to detailed guides
Essential commands only in main README with pointers to quickstart
Framework-specific quickstarts with setup instructions
Better UX with time estimates and use case guidance
Summary by CodeRabbit