-
Notifications
You must be signed in to change notification settings - Fork 690
feat: Dynamo LLM Router Integeration #3045
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
WalkthroughAdds a new “LLM Router” deployment guide and associated Kubernetes manifests/configs under examples/deployments/LLM Router/, updates the examples index to link to it, and provides Helm values and router configuration for NVIDIA Dynamo-integrated, OpenAI-compatible routing across multiple vLLM models. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant C as Client
participant G as OpenAI-compatible Gateway
participant R as LLM Router Server
participant DC as Dynamo Frontend
participant FE as Shared Frontend
participant W as vLLM Workers (Decode/Prefill)
C->>G: POST /v1/chat/completions (model: policy/model id)
G->>R: Forward request (API key, payload)
R->>R: Select route (task/complexity policy)
alt Model-based route
R->>DC: Invoke model endpoint (OpenAI-compatible)
DC->>FE: Dispatch request
FE->>W: Schedule to appropriate worker(s)
W-->>FE: Generate tokens / stream
FE-->>DC: Return outputs
DC-->>R: Response
R-->>G: Routed completion
G-->>C: Completion response
else Error
R-->>G: Error details
G-->>C: Error response
end
note over FE,W: Aggregated or disaggregated topology
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Poem
Tip 👮 Agentic pre-merge checks are now available in preview!Pro plan users can now enable pre-merge checks in their settings to enforce checklists before merging PRs.
Please see the documentation for more information. Example: reviews:
pre_merge_checks:
custom_checks:
- name: "Undocumented Breaking Changes"
mode: "warning"
instructions: |
Pass/fail criteria: All breaking changes to public APIs, CLI flags, environment variables, configuration keys, database schemas, or HTTP/GraphQL endpoints must be documented in the "Breaking Change" section of the PR description and in CHANGELOG.md. Exclude purely internal or private changes (e.g., code not exported from package entry points or explicitly marked as internal).Please share your feedback with us on this Discord post. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment Pre-merge checks✅ Passed checks (3 passed)
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
🧹 Nitpick comments (14)
examples/deployments/LLM Router/frontend.yaml (2)
16-16: Parameterize image with DYNAMO_IMAGE for consistency with docs.README instructs users to export DYNAMO_IMAGE; this manifest uses DYNAMO_VERSION directly. Align to avoid drift.
Apply:
- image: nvcr.io/nvidia/ai-dynamo/vllm-runtime:${DYNAMO_VERSION} + image: ${DYNAMO_IMAGE}
11-16: Optional: add resources and imagePullSecrets.Frontend benefits from explicit CPU/memory requests/limits and (optionally) imagePullSecrets if pulling from NGC.
Example:
replicas: 1 extraPodSpec: mainContainer: - image: ${DYNAMO_IMAGE} + image: ${DYNAMO_IMAGE} + resources: + requests: + cpu: "500m" + memory: "1Gi" + limits: + cpu: "1" + memory: "2Gi" + imagePullSecrets: + - name: nvcr-secretexamples/deployments/LLM Router/agg.yaml (1)
20-26: Parameterize image with DYNAMO_IMAGE (matches README and env setup).Avoid duplicating registry/tag logic across files.
- image: nvcr.io/nvidia/ai-dynamo/vllm-runtime:${DYNAMO_VERSION} + image: ${DYNAMO_IMAGE}examples/deployments/LLM Router/disagg.yaml (2)
20-26: Parameterize images with DYNAMO_IMAGE for consistency.- image: nvcr.io/nvidia/ai-dynamo/vllm-runtime:${DYNAMO_VERSION} + image: ${DYNAMO_IMAGE}Apply to both workers.
Also applies to: 37-43
43-43: Add newline at EOF to satisfy yamllint.- - "python3 -m dynamo.vllm --model ${MODEL_NAME} --is-prefill-worker" + - "python3 -m dynamo.vllm --model ${MODEL_NAME} --is-prefill-worker" +examples/deployments/LLM Router/router-config-dynamo.yaml (2)
1-39: Trim trailing whitespace and keep headers minimal.Pre-commit flagged trailing whitespace. Keep SPDX header, but remove trailing spaces in comment block.
139-139: Add newline at EOF to satisfy yamllint.- model: mistralai/Mixtral-8x22B-Instruct-v0.1 + model: mistralai/Mixtral-8x22B-Instruct-v0.1 +examples/deployments/LLM Router/llm-router-values-override.yaml (3)
31-33: Don’t hardcode namespace in DYNAMO_API_BASE.Use ${NAMESPACE} to match README and envsubst flow.
- - name: DYNAMO_API_BASE - value: "http://vllm-frontend-frontend.dynamo-kubernetes.svc.cluster.local:8000" + - name: DYNAMO_API_BASE + value: "http://vllm-frontend-frontend.${NAMESPACE}.svc.cluster.local:8000"
101-110: Remove trailing whitespace and add newline at EOF.Pre-commit flagged trailing whitespace; also add final newline (yamllint).
service: type: ClusterIP +
45-66: Resources: consider adding CPU requests/limits for routerServer.You set memory and GPU; adding cpu requests/limits helps scheduling predictability.
resources: limits: nvidia.com/gpu: 1 memory: "8Gi" + cpu: "2" requests: nvidia.com/gpu: 1 memory: "8Gi" + cpu: "1"examples/deployments/LLM Router/README.md (4)
787-794: Wrong deployment name in logs command.Deployment is created by the Dynamo operator; earlier you port-forward svc/vllm-frontend-frontend. Align logs command.
-kubectl logs deployment/frontend -n ${NAMESPACE} --tail=10 +kubectl logs deployment/vllm-frontend-frontend -n ${NAMESPACE} --tail=10
391-391: Remove claim about health checks in extraPodSpec.Manifests don’t include probes; either add them or drop the note to avoid confusion.
215-237: Empty “model” fields in examples may confuse users.Either omit the field (let router pick) or show explicit model when bypassing router. Clarify intent.
- "model": "", + "model": "", "nim-llm-router": {Add a note: leave model empty when using router policies; set to a concrete HF id to bypass router.
Also applies to: 245-259
1-20: Add repository-standard SPDX header (for consistency).Other docs include SPDX in an HTML comment. Consider adding it here as well.
+<!-- +SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved. +SPDX-License-Identifier: Apache-2.0 +-->
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
examples/README.md(1 hunks)examples/deployments/LLM Router/README.md(1 hunks)examples/deployments/LLM Router/agg.yaml(1 hunks)examples/deployments/LLM Router/disagg.yaml(1 hunks)examples/deployments/LLM Router/frontend.yaml(1 hunks)examples/deployments/LLM Router/llm-router-values-override.yaml(1 hunks)examples/deployments/LLM Router/router-config-dynamo.yaml(1 hunks)
🧰 Additional context used
🪛 markdownlint-cli2 (0.17.2)
examples/deployments/LLM Router/README.md
11-11: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
154-154: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
183-183: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
435-435: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
531-531: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
626-626: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
692-692: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
712-712: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
718-718: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
777-777: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
802-802: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
843-843: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
876-876: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
974-974: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
🪛 GitHub Actions: Pre Merge Validation of (ai-dynamo/dynamo/refs/pull/3045/merge) by arunraman.
examples/deployments/LLM Router/README.md
[error] 1-1: Trailing whitespace detected and fixed by pre-commit hook 'trailing-whitespace' during 'pre-commit run --show-diff-on-failure --color=always --all-files'.
examples/deployments/LLM Router/llm-router-values-override.yaml
[error] 1-1: Trailing whitespace detected and fixed by pre-commit hook 'trailing-whitespace' during 'pre-commit run --show-diff-on-failure --color=always --all-files'.
examples/deployments/LLM Router/router-config-dynamo.yaml
[error] 1-1: Trailing whitespace detected and fixed by pre-commit hook 'trailing-whitespace' during 'pre-commit run --show-diff-on-failure --color=always --all-files'.
🪛 YAMLlint (1.37.1)
examples/deployments/LLM Router/llm-router-values-override.yaml
[error] 110-110: no new line character at the end of file
(new-line-at-end-of-file)
examples/deployments/LLM Router/disagg.yaml
[error] 43-43: no new line character at the end of file
(new-line-at-end-of-file)
examples/deployments/LLM Router/router-config-dynamo.yaml
[error] 139-139: no new line character at the end of file
(new-line-at-end-of-file)
🪛 GitHub Actions: Copyright Checks
examples/deployments/LLM Router/llm-router-values-override.yaml
[error] 1-1: Copyright header check failed. Invalid/Missing Header: examples/deployments/LLM Router/llm-router-values-override.yaml.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Build and Test - dynamo
🔇 Additional comments (5)
examples/deployments/LLM Router/agg.yaml (1)
15-17: Confirm resource key ‘gpu’ matches CRD schema.Other charts use nvidia.com/gpu. If this CRD expects ‘gpu’, keep it; otherwise switch.
examples/deployments/LLM Router/disagg.yaml (1)
33-35: Verify resource key ‘gpu’ vs ‘nvidia.com/gpu’.Ensure this matches DynamoGraphDeployment schema; otherwise routing to GPUs will fail scheduling.
examples/deployments/LLM Router/router-config-dynamo.yaml (1)
42-43: Confirm router-server service/port.The policy URLs target router-server:8000. Elsewhere you port-forward router-controller:8084. Verify this is the intended inference endpoint for policies.
examples/README.md (1)
39-39: LGTM: clear link addition.examples/deployments/LLM Router/README.md (1)
979-984: Health check endpoint mismatch with earlier port-forward.You port-forward router-controller:8084; ensure /health is served there (controller) or adjust to router-server service.
| global: | ||
| storageClass: "standard" | ||
| imageRegistry: "nvcr.io/nvidian/sae/" | ||
| imagePullSecrets: | ||
| - name: nvcr-secret | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing SPDX header (pipeline failure).
Add the repository-standard SPDX header to satisfy checks.
+##
+# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+##Place at top of file.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| global: | |
| storageClass: "standard" | |
| imageRegistry: "nvcr.io/nvidian/sae/" | |
| imagePullSecrets: | |
| - name: nvcr-secret | |
| ## | |
| # SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved. | |
| # SPDX-License-Identifier: Apache-2.0 | |
| ## | |
| global: | |
| storageClass: "standard" | |
| imageRegistry: "nvcr.io/nvidian/sae/" | |
| imagePullSecrets: | |
| - name: nvcr-secret |
🤖 Prompt for AI Agents
In examples/deployments/LLM Router/llm-router-values-override.yaml lines 6-11,
the file is missing the repository-standard SPDX header which causes pipeline
checks to fail; insert the repo's standard SPDX header as the very first line of
the file (use the exact SPDX-License-Identifier string used elsewhere in this
repository) so the header appears above all YAML content.
| # Global configuration (following official sample structure) | ||
| global: | ||
| storageClass: "standard" | ||
| imageRegistry: "nvcr.io/nvidian/sae/" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Registry typo will break image pulls.
“nvcr.io/nvidian/sae/” looks incorrect; should be “nvcr.io/nvidia/sae/” (or your actual org/team). Fix or parameterize.
- imageRegistry: "nvcr.io/nvidian/sae/"
+ imageRegistry: "nvcr.io/nvidia/sae/"📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| imageRegistry: "nvcr.io/nvidian/sae/" | |
| imageRegistry: "nvcr.io/nvidia/sae/" |
🤖 Prompt for AI Agents
In examples/deployments/LLM Router/llm-router-values-override.yaml around line
8, the imageRegistry value contains a typo "nvcr.io/nvidian/sae/" which will
break image pulls; update it to the correct registry "nvcr.io/nvidia/sae/" (or
replace with the proper org/team registry), or make the registry a configurable
parameter and reference that variable throughout the values file to avoid
hardcoded typos.
| cd customizations/LLM\ Router | ||
|
|
||
| # Check that required files exist | ||
| ls -la frontend.yaml agg.yaml disagg.yaml router-config-dynamo.yaml llm-router-values-override.yaml | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix paths: “customizations/LLM Router” doesn’t exist in this repo.
Use the committed path examples/deployments/LLM Router.
-cd customizations/LLM\ Router
+cd examples/deployments/LLM\ RouterApply similar fixes for other occurrences (e.g., Lines 707–708).
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| cd customizations/LLM\ Router | |
| # Check that required files exist | |
| ls -la frontend.yaml agg.yaml disagg.yaml router-config-dynamo.yaml llm-router-values-override.yaml | |
| cd examples/deployments/LLM\ Router | |
| # Check that required files exist | |
| ls -la frontend.yaml agg.yaml disagg.yaml router-config-dynamo.yaml llm-router-values-override.yaml |
🤖 Prompt for AI Agents
In examples/deployments/LLM Router/README.md around lines 566 to 570, the path
"customizations/LLM Router" is incorrect; update the commands and any path
references to use the committed path "examples/deployments/LLM Router" (e.g.,
change cd customizations/LLM\ Router to cd examples/deployments/LLM\ Router and
adjust the ls paths accordingly), and apply the same replacement for the other
occurrences mentioned (e.g., lines 707–708) so all examples reference the
correct repository path.
| # 3. Create router configuration ConfigMap using official External ConfigMap strategy | ||
| # The official Helm chart now supports external ConfigMaps natively | ||
| kubectl create configmap router-config-dynamo \ | ||
| --from-file=config.yaml=router-config-dynamo.yaml \ | ||
| --namespace=llm-router | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ConfigMap creation path likely wrong after changing directories.
You cd into NVIDIA LLM Router repo; router-config-dynamo.yaml lives in this repo. Provide an absolute or correct relative path.
-kubectl create configmap router-config-dynamo \
- --from-file=config.yaml=router-config-dynamo.yaml \
+kubectl create configmap router-config-dynamo \
+ --from-file=config.yaml=../../../examples/deployments/LLM\ Router/router-config-dynamo.yaml \
--namespace=llm-router📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| # 3. Create router configuration ConfigMap using official External ConfigMap strategy | |
| # The official Helm chart now supports external ConfigMaps natively | |
| kubectl create configmap router-config-dynamo \ | |
| --from-file=config.yaml=router-config-dynamo.yaml \ | |
| --namespace=llm-router | |
| # 3. Create router configuration ConfigMap using official External ConfigMap strategy | |
| # The official Helm chart now supports external ConfigMaps natively | |
| kubectl create configmap router-config-dynamo \ | |
| --from-file=config.yaml=../../../examples/deployments/LLM\ Router/router-config-dynamo.yaml \ | |
| --namespace=llm-router |
🤖 Prompt for AI Agents
In examples/deployments/LLM Router/README.md around lines 896–901, the kubectl
create configmap command uses a relative filename that will be wrong after
cd'ing into the NVIDIA LLM Router repo; update the --from-file path to point to
the actual location of router-config-dynamo.yaml (either an absolute path or the
correct relative path from the current working directory, e.g.,
./router-config-dynamo.yaml or the full repo path) so kubectl can find the file
when creating the ConfigMap.
mohammedabdulwahhab
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few small nits but overall LGTM
… Dynamo integration - Updated README.md to include a new section for LLM Router deployment with NVIDIA Dynamo. - Added new YAML files for aggregated and disaggregated worker configurations (agg.yaml, disagg.yaml). - Introduced frontend.yaml for shared API frontend service. - Created router-config-dynamo.yaml for routing policies and model configurations. - Added llm-router-values-override.yaml for Helm values specific to LLM Router integration. - Included comprehensive documentation on deployment steps and routing strategies. Signed-off-by: arunraman <[email protected]> Signed-off-by: arunraman <[email protected]>
…update README.md for model routing - Deleted the helm-enhancement-implementation.yaml file as it is no longer needed. - Updated README.md to reflect changes in model routing, including new API base URLs and model names. - Adjusted environment variable descriptions for clarity, particularly regarding the DYNAMO_API_KEY for local deployments. - Enhanced deployment instructions to include multiple model deployment examples. Signed-off-by: arunraman <[email protected]> Signed-off-by: arunraman <[email protected]>
…deployment instructions - Added SPDX license headers to llm-router-values-override.yaml. - Updated imageRegistry placeholder in llm-router-values-override.yaml for clarity. - Revised README.md to reflect changes in directory structure and emphasize the need to update imageRegistry and imagePullSecrets. - Adjusted paths in README.md for configuration file references to ensure accuracy. - Modified router-config-dynamo.yaml to enhance model routing strategies and updated model names for better clarity. Signed-off-by: arunraman <[email protected]> Signed-off-by: arunraman <[email protected]>
Signed-off-by: arunraman <[email protected]> Signed-off-by: arunraman <[email protected]>
… deployment instructions Signed-off-by: arunraman <[email protected]> Signed-off-by: arunraman <[email protected]>
…uctions - Revised the overview and table of contents for better organization. - Enhanced quickstart section with detailed environment variable setup and deployment steps. - Updated routing strategies and API usage examples for clarity. - Adjusted version numbers and image references to reflect the latest updates. - Removed outdated sections and ensured consistency throughout the document. Signed-off-by: arunraman <[email protected]> Signed-off-by: arunraman <[email protected]>
…ategies section for improved readability Signed-off-by: arunraman <[email protected]> Signed-off-by: arunraman <[email protected]>
Signed-off-by: arunraman <[email protected]> Signed-off-by: arunraman <[email protected]>
Signed-off-by: arunraman <[email protected]> Signed-off-by: arunraman <[email protected]>
- Rename 'examples/deployments/LLM Router' to 'examples/deployments/LLMRouter' - Remove spaces from directory name for better Linux/Mac compatibility - Update all references in examples/README.md and deployment files - Update cd commands to use new path without quotes Signed-off-by: arunraman <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @arunraman, I'm assuming we will currently rely on you or the LLM Router team to maintain support for this example. If you have other expectations for maintenance moving forward, please let us know. If we can't have someone helping to maintain it, then we may need to consider removing it or moving the dynamo example to LLM Router repository at some point to flip the maintenance burden.
This pull request introduces a new example deployment guide and configuration for integrating NVIDIA Dynamo with an intelligent LLM Router. The changes add platform-specific deployment manifests, Helm values, and routing configuration to enable production-ready, multi-model LLM request routing using Dynamo. The deployment patterns support intelligent routing across multiple models based on task type and complexity.
LLM Router integration and deployment:
examples/README.mdlinking to the LLM Router deployment guide for NVIDIA Dynamo integration, making it discoverable for users seeking intelligent routing solutions.Kubernetes deployment manifests:
examples/deployments/LLM Router/:agg.yaml: Defines a singleVllmDecodeWorkerservice for aggregated model serving.disagg.yaml: Deploys bothVllmDecodeWorkerandVllmPrefillWorkerservices for disaggregated model serving, allowing for more granular scaling.frontend.yaml: Specifies the frontend service for the LLM Router, connecting to the Dynamo backend.Helm values and configuration:
llm-router-values-override.yamlto provide Helm chart values for deploying the LLM Router with Dynamo integration, including image, environment, service, and storage configuration, as well as support for external ConfigMap-based routing configuration.Intelligent LLM routing policies:
router-config-dynamo.yamlConfigMap manifest, defining intelligent routing policies that map various task types and complexity levels to specific LLMs (Llama 8B, Llama 70B, Mixtral 8x22B), leveraging environment variables for service endpoints and authentication.Summary by CodeRabbit
New Features
Documentation