-
Notifications
You must be signed in to change notification settings - Fork 693
chore: Add EKS docs and tweak backend READMEs #2157
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughThis update introduces significant improvements and expansions to documentation across several components. The TensorRT-LLM and vLLM backend READMEs are extensively restructured for clarity, modularity, and completeness, with new feature support matrices and deployment instructions. Three new EKS deployment guides are added, detailing cluster setup, Dynamo Cloud installation, and vLLM deployment. The distributed runtime architecture documentation is modernized and clarified. Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant Documentation
participant Kubernetes Cluster
participant Dynamo Cloud
participant vLLM/TRTLLM Backend
User->>Documentation: Follow EKS setup guide
User->>Kubernetes Cluster: Create EKS cluster with EFS
User->>Documentation: Follow Dynamo Cloud install guide
User->>Kubernetes Cluster: Deploy Dynamo Cloud components
User->>Documentation: Follow vLLM/TRTLLM deployment guide
User->>Kubernetes Cluster: Deploy vLLM/TRTLLM backend
User->>Dynamo Cloud: Send inference request
Dynamo Cloud->>vLLM/TRTLLM Backend: Route request
vLLM/TRTLLM Backend-->>Dynamo Cloud: Return result
Dynamo Cloud-->>User: Return inference response
Estimated code review effort🎯 2 (Simple) | ⏱️ ~8 minutes Possibly related PRs
Poem
Note ⚡️ Unit Test Generation is now available in beta!Learn more here, or try it out under "Finishing Touches" below. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
Documentation and Community
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🔭 Outside diff range comments (1)
components/backends/trtllm/README.md (1)
201-204: Typo in link path
kv-cache-tranfer.md→kv-cache-transfer.md- see the [KV cache transfer guide](./kv-cache-tranfer.md). + see the [KV cache transfer guide](./kv-cache-transfer.md).
🧹 Nitpick comments (16)
examples/deployments/EKS/Deploy_VLLM_example.md (2)
28-41: Second fenced block is missing its language identifier.Same MD040 issue as above – add
bash(orshell) so the CI linter passes.-``` +```bash kubectl port-forward deployment/vllm-agg-router-frontend 8080:8000 -n dynamo-cloud curl localhost:8080/v1/chat/completions \ ... }' -``` +```
6-8: Make the “edit YAML” step reproducible & non-interactive.
vimin scripted docs breaks copy-paste automation. Suggest showing a one-liner or ayq/sedexample instead, e.g.:# yq eval '.metadata.namespace = "dynamo-cloud" | .spec.template.spec.containers[0].image = env(IMAGE)' -i agg_router.yamlThis keeps the guide copy-paste friendly.
examples/deployments/EKS/Deploy_Dynamo_Cloud.md (3)
5-5: Typo: “repositoriy”.-Create 1 ECR repositoriy +Create an ECR repository
8-10: Add language tags to fenced blocks (MD040).Applies to every block in this file; adding
bashonce per section will clear the linter noise.
69-77: Secret creation duplicated; will error on second run.
kubectl create secret ...appears both before and after dependency build. Either drop the second block or switch tokubectl apply.components/backends/vllm/README.md (1)
55-56: Grammar nit – duplicate article.-Below we provide a guide that lets you run all of our the common deployment patterns on a single node. +Below we provide a guide that lets you run all of the common deployment patterns on a single node.docs/architecture/distributed_runtime.md (3)
29-31: Minor grammar: plural agreement.-...in practice, each dynamo components typically are deployed... +...in practice, each Dynamo component is typically deployed...
58-60: Spelling: “Componenet”.-... `/services/{namespace}/{component}/{endpoint}-{lease_id}`. Note that the endpoints of different workers of the same type (i.e., two `PrefillWorker`s in one deployment) share the same `Namespace`, `Componenet`, and `Endpoint` name. +... share the same `Namespace`, `Component`, and `Endpoint` name.
78-79: Dangling back-tick at end of sentence.-... `/components/backends`. ` for full implementation details. +... `/components/backends` for full implementation details.examples/deployments/EKS/Create_EKS_EFS.md (3)
7-12: Add language identifiers to fenced blocks (MD040).Same issue repeats throughout the file; apply
bash(oryamlwhere appropriate) to silence the linter.
112-112: Style: “inside of” → “inside”.-...accessed by processes running inside of said pods... +...accessed by processes running inside said pods...
125-125: Concise wording: “are able to” → “can”.-...whether your nodes are able to access the created EFS file system. +...whether your nodes can access the created EFS file system.components/backends/trtllm/README.md (4)
36-41: Broken anchor in Table of Contents
Single Node Examplespoints to#single-node-deployments, but the section header renders to#single-node-examples.
This breaks intra-doc navigation.- - [Single Node Examples](#single-node-deployments) + - [Single Node Examples](#single-node-examples)
36-45: List indentation violates MD007Markdown-lint flags the bullets in the TOC. Add two spaces after the hyphen (or disable the rule locally) for consistent list-indentation.
67-70: Minor grammar tweak
all of our the common→all of the common.-Below we provide a guide that lets you run all of our the common deployment patterns on a single node. +Below we provide a guide that lets you run all of the common deployment patterns on a single node.
73-77: Command assumes repo root
docker compose -f deploy/docker-compose.yml …only works when run from the repository root, but the README sits three levels deep. Add an explicitcd $DYNAMO_HOME(or a relative path) to avoid copy-paste failures.- docker compose -f deploy/docker-compose.yml up -d + # Run from the repository root + docker compose -f deploy/docker-compose.yml up -d
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
components/backends/trtllm/README.md(8 hunks)components/backends/vllm/README.md(3 hunks)docs/architecture/distributed_runtime.md(3 hunks)examples/deployments/EKS/Create_EKS_EFS.md(1 hunks)examples/deployments/EKS/Deploy_Dynamo_Cloud.md(1 hunks)examples/deployments/EKS/Deploy_VLLM_example.md(1 hunks)
🧰 Additional context used
🧠 Learnings (3)
examples/deployments/EKS/Deploy_VLLM_example.md (1)
Learnt from: julienmancuso
PR: #2012
File: deploy/cloud/helm/crds/templates/nvidia.com_dynamocomponentdeployments.yaml:92-98
Timestamp: 2025-07-18T16:04:31.771Z
Learning: CRD schemas in files like deploy/cloud/helm/crds/templates/*.yaml are auto-generated from Kubernetes library upgrades and should not be manually modified as changes would be overwritten during regeneration.
components/backends/trtllm/README.md (4)
Learnt from: GuanLuo
PR: #1371
File: examples/llm/benchmarks/vllm_multinode_setup.sh:18-25
Timestamp: 2025-06-05T01:46:15.509Z
Learning: In multi-node setups with head/worker architecture, the head node typically doesn't need environment variables pointing to its own services (like NATS_SERVER, ETCD_ENDPOINTS) because local processes can access them via localhost. Only worker nodes need these environment variables to connect to the head node's external IP address.
Learnt from: fsaady
PR: #1730
File: examples/sglang/slurm_jobs/job_script_template.j2:59-59
Timestamp: 2025-07-02T13:20:28.800Z
Learning: In the SLURM job script template at examples/sglang/slurm_jobs/job_script_template.j2, the --total_nodes parameter represents the total nodes per worker type (prefill or decode), not the total nodes in the entire cluster. Each worker type needs to know its own group size for distributed coordination.
Learnt from: PeaBrane
PR: #1409
File: examples/router_standalone/worker.py:171-186
Timestamp: 2025-06-08T08:30:45.126Z
Learning: Example code in the examples/ directory may intentionally use hard-coded values or simplified implementations that wouldn't be appropriate for production code, but are acceptable for demonstration and testing purposes.
Learnt from: nnshah1
PR: #2124
File: components/backends/vllm/deploy/disagg.yaml:54-60
Timestamp: 2025-07-25T22:34:11.384Z
Learning: In vLLM worker deployments, startup probes (with longer periods and higher failure thresholds like periodSeconds: 10, failureThreshold: 60) are used to handle the slow model loading startup phase, while liveness probes are intentionally kept aggressive (periodSeconds: 5, failureThreshold: 1) for quick failure detection once the worker is operational. This pattern separates startup concerns from operational health monitoring in GPU-heavy workloads.
docs/architecture/distributed_runtime.md (4)
Learnt from: nnshah1
PR: #1444
File: tests/fault_tolerance/utils/metrics.py:30-32
Timestamp: 2025-07-01T13:55:03.940Z
Learning: The @dynamo_worker() decorator in the dynamo codebase returns a wrapper that automatically injects the runtime parameter before calling the wrapped function. This means callers only need to provide the non-runtime parameters, while the decorator handles injecting the runtime argument automatically. For example, a function with signature async def get_metrics(runtime, log_dir) decorated with @dynamo_worker() can be called as get_metrics(log_dir) because the decorator wrapper injects the runtime parameter.
Learnt from: fsaady
PR: #1730
File: examples/sglang/slurm_jobs/scripts/worker_setup.py:230-244
Timestamp: 2025-07-03T10:14:30.570Z
Learning: In examples/sglang/slurm_jobs/scripts/worker_setup.py, background processes (like nats-server, etcd) are intentionally left running even if later processes fail. This design choice allows users to manually connect to nodes and debug issues without having to restart the entire SLURM job from scratch, providing operational flexibility for troubleshooting in cluster environments.
Learnt from: GuanLuo
PR: #1371
File: examples/llm/benchmarks/vllm_multinode_setup.sh:18-25
Timestamp: 2025-06-05T01:46:15.509Z
Learning: In multi-node setups with head/worker architecture, the head node typically doesn't need environment variables pointing to its own services (like NATS_SERVER, ETCD_ENDPOINTS) because local processes can access them via localhost. Only worker nodes need these environment variables to connect to the head node's external IP address.
Learnt from: PeaBrane
PR: #1409
File: examples/router_standalone/worker.py:171-186
Timestamp: 2025-06-08T08:30:45.126Z
Learning: Example code in the examples/ directory may intentionally use hard-coded values or simplified implementations that wouldn't be appropriate for production code, but are acceptable for demonstration and testing purposes.
🪛 markdownlint-cli2 (0.17.2)
components/backends/trtllm/README.md
49-49: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
50-50: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
examples/deployments/EKS/Deploy_Dynamo_Cloud.md
7-7: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
14-14: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
25-25: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
35-35: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
43-43: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
58-58: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
66-66: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
88-88: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
examples/deployments/EKS/Create_EKS_EFS.md
5-5: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
13-13: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
27-27: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
46-46: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🪛 GitHub Actions: Pre Merge Validation of (ai-dynamo/dynamo/refs/pull/2157/merge) by athreesh.
docs/architecture/distributed_runtime.md
[error] 31-31: pre-commit hook 'trailing-whitespace' failed. Trailing whitespace was fixed in this file.
components/backends/vllm/README.md
[error] 52-52: pre-commit hook 'trailing-whitespace' failed. Trailing whitespace was fixed in this file.
🪛 LanguageTool
examples/deployments/EKS/Create_EKS_EFS.md
[style] ~112-~112: This phrase is redundant. Consider using “inside”.
Context: ... and then accessed by processes running inside of said pods as if they were part of the p...
(OUTSIDE_OF)
[style] ~125-~125: As a shorter alternative for ‘able to’, consider using “can”.
Context: ...ly. This will affect whether your nodes are able to access the created EFS file system. ##...
(BE_ABLE_TO)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: pre-merge-rust (lib/bindings/python)
- GitHub Check: pre-merge-rust (lib/runtime/examples)
- GitHub Check: pre-merge-rust (.)
- GitHub Check: Build and Test - vllm
🔇 Additional comments (4)
examples/deployments/EKS/Deploy_VLLM_example.md (1)
5-9: Add a language identifier to satisfy markdown‐lint (MD040).Without the specifier, the pre-commit hook will keep flagging the file.
[ suggest_nitpick ]-``` +```bash cd dynamo/components/backends/vllm/deploy vim agg_router.yaml #under metadata add namespace: dynamo-cloud and change image to your built base image kubectl apply -f agg_router.yaml -``` +```examples/deployments/EKS/Deploy_Dynamo_Cloud.md (1)
59-64: Namespace inconsistency may lead to orphaned CRDs.You install CRDs into
defaultbut everything else lives in${NAMESPACE}(dynamo-cloud). Unless there is a deliberate reason, target the same namespace to avoid confusion.- --namespace default \ + --namespace ${NAMESPACE} \components/backends/trtllm/README.md (2)
61-66: Inconsistent “GB200 Support” rowThe ✅ icon claims support while the Notes column says Not supported. Please verify and make the row unambiguous.
205-210: Potentially broken client link
../llm/README.md#clientresolves tocomponents/backends/llm/README.md, which likely does not exist.
Confirm the correct target (maybe../../llm/README.mdor the repo-root README) and update the path.
WalkthroughThis update introduces substantial improvements to documentation across multiple components. It restructures and clarifies the README files for the TensorRT-LLM and vLLM backends, updates the distributed runtime architecture documentation, and adds three new EKS deployment guides for Dynamo Cloud and vLLM. No code or configuration logic is changed; all modifications are limited to documentation. Changes
Sequence Diagram(s)No sequence diagrams are generated, as all changes are limited to documentation content and structure. Estimated code review effort🎯 2 (Simple) | ⏱️ ~8 minutes Possibly related PRs
Poem
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
Documentation and Community
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🧹 Nitpick comments (20)
examples/deployments/EKS/Create_EKS_EFS.md (6)
7-12: Addapt-get updatebefore installingunzip.Without refreshing package indices, some CI images fail to locate the
unzippackage.sudo apt install unzip +sudo apt-get update -y +sudo apt-get install unzip -y
17-21: Use the latest stable kubectl instead of pinning to 1.30.0.Hard-coding a minor version risks drift between control-plane and client. Prefer the upstream pattern:
curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
52-54: Remind readers to save the YAML toeks_cluster_config.yaml.The subsequent
eksctl create cluster -f eks_cluster_config.yamlassumes this filename, but the guide never instructs users to write the file.
112-114: Minor wording & redundancy fixes.“too large to loaded” ➜ “too large to be loaded”
“inside of said pods” ➜ “inside the pods”
125-126: Concise wording.Replace “are able to access” with “can access”.
135-149: Consider addingreclaimPolicyand replacing hard-codedfileSystemId.reclaimPolicy: Retain # prevents accidental data loss parameters: fileSystemId: <YOUR_FS_ID> # make it explicit this must be editedexamples/deployments/EKS/Deploy_VLLM_example.md (2)
6-8: Use a generic editor placeholder instead of forcingvim.-vim agg_router.yaml #under metadata add namespace: dynamo-cloud and change image to your built base image +$EDITOR agg_router.yaml # edit metadata.namespace and image fields
14-23: Clarify that pod names and ages will differ.Real outputs vary per cluster; suggesting users rely on exact names can confuse copy-pastes.
examples/deployments/EKS/Deploy_Dynamo_Cloud.md (3)
5-10: Typo & pluralisation.“Create 1 ECR repositoriy” ➜ “Create an ECR repository”.
26-29: Remove trailing slash from registry URL.Some Docker versions treat the trailing
/as part of the registry hostname and fail auth.-docker login --username AWS --password-stdin <ECR_REGISTRY>/ +docker login --username AWS --password-stdin <ECR_REGISTRY>
67-77: Duplicate secret creation.Namespace and
docker-imagepullsecretare created twice (lines 44–49 and 71–76). Factor this into a single step to avoid “already exists” warnings.components/backends/vllm/README.md (2)
14-20: Fetch tags beforegit describeto avoid detached-HEAD errors.git fetch --tags git checkout $(git describe --tags $(git rev-list --tags --max-count=1))
55-56: Grammar: remove extra “the”.“run all of our the common deployment patterns” ➜ “run all of our common deployment patterns”.
components/backends/trtllm/README.md (2)
24-26: Capitalize product name.“dynamo” ➜ “Dynamo”.
69-70: Grammar: remove extra “the”.Same phrasing issue as the vLLM README.
docs/architecture/distributed_runtime.md (5)
22-22: Capitalize language names & use “e.g.” instead of “i.e.”“Rust” and “Python” are proper nouns and should be capitalized. The parenthetical gives an example rather than an exhaustive list, so “e.g.” is the appropriate Latin abbreviation.
-Dynamo's `DistributedRuntime` is the core infrastructure in the framework that enables distributed communication and coordination between different Dynamo components. It is implemented in rust (`/lib/runtime`) and exposed to other programming languages via bindings (i.e., python bindings can be found in `/lib/bindings/python`). `DistributedRuntime` follows a hierarchical structure: +Dynamo's `DistributedRuntime` is the core infrastructure in the framework that enables distributed communication and coordination between different Dynamo components. It is implemented in Rust (`/lib/runtime`) and exposed to other programming languages via bindings (e.g., Python bindings can be found in `/lib/bindings/python`). `DistributedRuntime` follows a hierarchical structure:
33-35: Minor grammar fix – subject/verb agreement-Then, it route the request to the `Worker`. +Then, it routes the request to the `Worker`.
38-38: Namespace example contradicts preceding explanationThe paragraph says all components share a namespace such as
vllm-v1-aggorsglang-agg, yet the example code usesruntime.namespace("dynamo"). This is confusing for readers and may lead them to create inconsistent namespaces.-… programmatically (e.g., `runtime.namespace("dynamo").component("worker")`), … +… programmatically (e.g., `runtime.namespace("vllm-v1-agg").component("worker")`), …
78-79: Remove stray back-tick and tighten wordingThere is an extraneous back-tick at the end of the sentence, and “directories in … for full implementation details” reads awkwardly.
- - Python: We also provide complete examples of using `DistributedRuntime` for communication and Dynamo's LLM library for prompt templates and (de)tokenization to deploy Dynamo graphs. Please refer to the directories in `/components/backends`. ` +- Python: Complete examples that combine `DistributedRuntime` with Dynamo’s LLM utilities (prompt templates, (de)tokenization, etc.) are available in `/components/backends/`; see those directories for full implementation details.
59-59: Typo: “Componenet” → “Component”Small spelling error that might hinder searchability in the rendered docs.
- - etcd Registration: The endpoint information is stored in etcd at a path following the naming: `/services/{namespace}/{component}/{endpoint}-{lease_id}`. Note that the endpoints of different workers of the same type (i.e., two `PrefillWorker`s in one deployment) share the same `Namespace`, `Componenet`, and `Endpoint` name. They are distinguished by their different primary `lease_id` of their `DistributedRuntime`. + - etcd Registration: The endpoint information is stored in etcd at a path following the naming: `/services/{namespace}/{component}/{endpoint}-{lease_id}`. Note that the endpoints of different workers of the same type (i.e., two `PrefillWorker`s in one deployment) share the same `Namespace`, `Component`, and `Endpoint` name. They are distinguished by their different primary `lease_id` of their `DistributedRuntime`.
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
components/backends/trtllm/README.md(8 hunks)components/backends/vllm/README.md(3 hunks)docs/architecture/distributed_runtime.md(3 hunks)examples/deployments/EKS/Create_EKS_EFS.md(1 hunks)examples/deployments/EKS/Deploy_Dynamo_Cloud.md(1 hunks)examples/deployments/EKS/Deploy_VLLM_example.md(1 hunks)
🧰 Additional context used
🧠 Learnings (4)
examples/deployments/EKS/Deploy_VLLM_example.md (1)
Learnt from: julienmancuso
PR: #2012
File: deploy/cloud/helm/crds/templates/nvidia.com_dynamocomponentdeployments.yaml:92-98
Timestamp: 2025-07-18T16:04:31.771Z
Learning: CRD schemas in files like deploy/cloud/helm/crds/templates/*.yaml are auto-generated from Kubernetes library upgrades and should not be manually modified as changes would be overwritten during regeneration.
examples/deployments/EKS/Deploy_Dynamo_Cloud.md (1)
Learnt from: julienmancuso
PR: #2012
File: deploy/cloud/helm/crds/templates/nvidia.com_dynamocomponentdeployments.yaml:92-98
Timestamp: 2025-07-18T16:04:31.771Z
Learning: CRD schemas in files like deploy/cloud/helm/crds/templates/*.yaml are auto-generated from Kubernetes library upgrades and should not be manually modified as changes would be overwritten during regeneration.
docs/architecture/distributed_runtime.md (4)
Learnt from: nnshah1
PR: #1444
File: tests/fault_tolerance/utils/metrics.py:30-32
Timestamp: 2025-07-01T13:55:03.940Z
Learning: The @dynamo_worker() decorator in the dynamo codebase returns a wrapper that automatically injects the runtime parameter before calling the wrapped function. This means callers only need to provide the non-runtime parameters, while the decorator handles injecting the runtime argument automatically. For example, a function with signature async def get_metrics(runtime, log_dir) decorated with @dynamo_worker() can be called as get_metrics(log_dir) because the decorator wrapper injects the runtime parameter.
Learnt from: fsaady
PR: #1730
File: examples/sglang/slurm_jobs/scripts/worker_setup.py:230-244
Timestamp: 2025-07-03T10:14:30.570Z
Learning: In examples/sglang/slurm_jobs/scripts/worker_setup.py, background processes (like nats-server, etcd) are intentionally left running even if later processes fail. This design choice allows users to manually connect to nodes and debug issues without having to restart the entire SLURM job from scratch, providing operational flexibility for troubleshooting in cluster environments.
Learnt from: GuanLuo
PR: #1371
File: examples/llm/benchmarks/vllm_multinode_setup.sh:18-25
Timestamp: 2025-06-05T01:46:15.509Z
Learning: In multi-node setups with head/worker architecture, the head node typically doesn't need environment variables pointing to its own services (like NATS_SERVER, ETCD_ENDPOINTS) because local processes can access them via localhost. Only worker nodes need these environment variables to connect to the head node's external IP address.
Learnt from: PeaBrane
PR: #1409
File: examples/router_standalone/worker.py:171-186
Timestamp: 2025-06-08T08:30:45.126Z
Learning: Example code in the examples/ directory may intentionally use hard-coded values or simplified implementations that wouldn't be appropriate for production code, but are acceptable for demonstration and testing purposes.
components/backends/trtllm/README.md (3)
Learnt from: GuanLuo
PR: #1371
File: examples/llm/benchmarks/vllm_multinode_setup.sh:18-25
Timestamp: 2025-06-05T01:46:15.509Z
Learning: In multi-node setups with head/worker architecture, the head node typically doesn't need environment variables pointing to its own services (like NATS_SERVER, ETCD_ENDPOINTS) because local processes can access them via localhost. Only worker nodes need these environment variables to connect to the head node's external IP address.
Learnt from: fsaady
PR: #1730
File: examples/sglang/slurm_jobs/job_script_template.j2:59-59
Timestamp: 2025-07-02T13:20:28.800Z
Learning: In the SLURM job script template at examples/sglang/slurm_jobs/job_script_template.j2, the --total_nodes parameter represents the total nodes per worker type (prefill or decode), not the total nodes in the entire cluster. Each worker type needs to know its own group size for distributed coordination.
Learnt from: PeaBrane
PR: #1409
File: examples/router_standalone/worker.py:171-186
Timestamp: 2025-06-08T08:30:45.126Z
Learning: Example code in the examples/ directory may intentionally use hard-coded values or simplified implementations that wouldn't be appropriate for production code, but are acceptable for demonstration and testing purposes.
🪛 GitHub Actions: Pre Merge Validation of (ai-dynamo/dynamo/refs/pull/2157/merge) by athreesh.
components/backends/vllm/README.md
[error] 52-52: pre-commit hook 'trailing-whitespace' failed. Trailing whitespace was fixed in this file.
docs/architecture/distributed_runtime.md
[error] 31-31: pre-commit hook 'trailing-whitespace' failed. Trailing whitespace was fixed in this file.
🪛 LanguageTool
examples/deployments/EKS/Create_EKS_EFS.md
[style] ~112-~112: This phrase is redundant. Consider using “inside”.
Context: ... and then accessed by processes running inside of said pods as if they were part of the p...
(OUTSIDE_OF)
[style] ~125-~125: As a shorter alternative for ‘able to’, consider using “can”.
Context: ...ly. This will affect whether your nodes are able to access the created EFS file system. ##...
(BE_ABLE_TO)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: pre-merge-rust (lib/bindings/python)
- GitHub Check: pre-merge-rust (lib/runtime/examples)
- GitHub Check: pre-merge-rust (.)
🔇 Additional comments (1)
examples/deployments/EKS/Create_EKS_EFS.md (1)
44-46: Warn about g6e.48xlarge & EFA availability across regions.That SKU is still spot-market-only in most regions; cluster creation silently stalls if capacity is unavailable. Add a footnote or alternative (e.g.,
p5.48xlarge) to reduce first-time friction.
Co-authored-by: Ryan McCormick <[email protected]> Signed-off-by: Anish <[email protected]>
Co-authored-by: Ryan McCormick <[email protected]> Signed-off-by: Anish <[email protected]>
Co-authored-by: Ryan McCormick <[email protected]> Signed-off-by: Anish <[email protected]>
rmccorm4
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM other than any unresolved comments
Co-authored-by: Ryan McCormick <[email protected]> Signed-off-by: Anish <[email protected]>
ishandhanani
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left some comments across the board
…dynamo into readmes-eks-additions
…stributed runtime docs
addressed Ishan's comments, will add a Github issue to add SGLang/TRT-LLM YAMLs to EKS example
Overview:
Details:
Where should the reviewer start?
Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)
Summary by CodeRabbit