feat: Dynamo LLM Router Integeration #3045

arunraman · 2025-09-16T02:57:46Z

This pull request introduces a new example deployment guide and configuration for integrating NVIDIA Dynamo with an intelligent LLM Router. The changes add platform-specific deployment manifests, Helm values, and routing configuration to enable production-ready, multi-model LLM request routing using Dynamo. The deployment patterns support intelligent routing across multiple models based on task type and complexity.

LLM Router integration and deployment:

Added a new section to the examples/README.md linking to the LLM Router deployment guide for NVIDIA Dynamo integration, making it discoverable for users seeking intelligent routing solutions.

Kubernetes deployment manifests:

Introduced three new Kubernetes YAML manifests in examples/deployments/LLM Router/:
- agg.yaml: Defines a single VllmDecodeWorker service for aggregated model serving.
- disagg.yaml: Deploys both VllmDecodeWorker and VllmPrefillWorker services for disaggregated model serving, allowing for more granular scaling.
- frontend.yaml: Specifies the frontend service for the LLM Router, connecting to the Dynamo backend.

Helm values and configuration:

Added llm-router-values-override.yaml to provide Helm chart values for deploying the LLM Router with Dynamo integration, including image, environment, service, and storage configuration, as well as support for external ConfigMap-based routing configuration.

Intelligent LLM routing policies:

Added router-config-dynamo.yaml ConfigMap manifest, defining intelligent routing policies that map various task types and complexity levels to specific LLMs (Llama 8B, Llama 70B, Mixtral 8x22B), leveraging environment variables for service endpoints and authentication.

Summary by CodeRabbit

New Features
- Added end-to-end LLM Router deployment examples for Kubernetes with NVIDIA Dynamo integration.
- Supports task-based and complexity-based routing through a unified API gateway.
- Includes shared frontend plus aggregated and disaggregated model worker options.
- Provides configurable Helm values and routing policies with demo app support.
Documentation
- Introduced a new production deployment guide for LLM Router with step-by-step installation, requirements, validation, and troubleshooting.
- Linked the guide under platform-specific deployment references.

copy-pr-bot · 2025-09-16T02:57:49Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

coderabbitai · 2025-09-16T03:07:17Z

Walkthrough

Adds a new “LLM Router” deployment guide and associated Kubernetes manifests/configs under examples/deployments/LLM Router/, updates the examples index to link to it, and provides Helm values and router configuration for NVIDIA Dynamo-integrated, OpenAI-compatible routing across multiple vLLM models.

Changes

Cohort / File(s)	Summary of changes
Docs index link `examples/README.md`	Adds a new bullet linking to deployments/LLM%20Router/ for platform-specific production deployment guides.
LLM Router deployment guide `examples/deployments/LLM Router/README.md`	New comprehensive Kubernetes deployment guide for NVIDIA Dynamo + LLM Router, covering architecture, routing paradigms, prerequisites, step-by-step setup, validation, and troubleshooting.
DynamoGraphDeployment manifests (workers & frontend) `examples/deployments/LLM Router/agg.yaml`, `examples/deployments/LLM Router/disagg.yaml`, `examples/deployments/LLM Router/frontend.yaml`	Adds CRD manifests for vLLM aggregated decode worker, disaggregated decode/prefill workers, and a shared frontend service; parameterized via env/templated variables.
Helm values override for Router `examples/deployments/LLM Router/llm-router-values-override.yaml`	Adds Helm values enabling controller/server/app, GPU resources, PVC-mounted model repo, external ConfigMap/Secret wiring, optional Ingress, and cluster-internal service settings.
Router external config `examples/deployments/LLM Router/router-config-dynamo.yaml`	Adds router configuration defining task-based and complexity-based policies mapping request categories to target models via Dynamo API base/key.

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant C as Client
    participant G as OpenAI-compatible Gateway
    participant R as LLM Router Server
    participant DC as Dynamo Frontend
    participant FE as Shared Frontend
    participant W as vLLM Workers (Decode/Prefill)

    C->>G: POST /v1/chat/completions (model: policy/model id)
    G->>R: Forward request (API key, payload)
    R->>R: Select route (task/complexity policy)
    alt Model-based route
        R->>DC: Invoke model endpoint (OpenAI-compatible)
        DC->>FE: Dispatch request
        FE->>W: Schedule to appropriate worker(s)
        W-->>FE: Generate tokens / stream
        FE-->>DC: Return outputs
        DC-->>R: Response
        R-->>G: Routed completion
        G-->>C: Completion response
    else Error
        R-->>G: Error details
        G-->>C: Error response
    end
    note over FE,W: Aggregated or disaggregated topology

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

fix: vllm router examples #1942 — Similar additions of vLLM/Dynamo deployment examples and router-oriented manifests.
feat: add crds for vllm and llm examples #1766 — Adds DynamoGraphDeployment manifests for vLLM router components, aligning with agg/disagg and frontend patterns.
feat: update DynamoGraphDeployments for vllm_v1 #1890 — Introduces vLLM decode/prefill worker CRDs comparable to the aggregated/disaggregated setups here.

Poem

I thump my paws, deploy with cheer,
A router hops from queue to peer.
Through Dynamo fields the tokens flow,
To llama, mixtral—on they go.
One gate, many paths—what bliss!
Ship it quick—bun’s seal, a kiss. 🐇✨

Tip

👮 Agentic pre-merge checks are now available in preview!

Pro plan users can now enable pre-merge checks in their settings to enforce checklists before merging PRs.

Built-in checks – Quickly apply ready-made checks to enforce title conventions, require pull request descriptions that follow templates, validate linked issues for compliance, and more.
Custom agentic checks – Define your own rules using CodeRabbit’s advanced agentic capabilities to enforce organization-specific policies and workflows. For example, you can instruct CodeRabbit’s agent to verify that API documentation is updated whenever API schema files are modified in a PR. Note: Upto 5 custom checks are currently allowed during the preview period. Pricing for this feature will be announced in a few weeks.

Please see the documentation for more information.

Example:

reviews:
  pre_merge_checks:
    custom_checks:
      - name: "Undocumented Breaking Changes"
        mode: "warning"
        instructions: |
          Pass/fail criteria: All breaking changes to public APIs, CLI flags, environment variables, configuration keys, database schemas, or HTTP/GraphQL endpoints must be documented in the "Breaking Change" section of the PR description and in CHANGELOG.md. Exclude purely internal or private changes (e.g., code not exported from package entry points or explicitly marked as internal).

Please share your feedback with us on this Discord post.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

Pre-merge checks

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	The PR description provides a clear Overview and Details, enumerating the added README link, Kubernetes manifests, Helm values, and routing policies and therefore captures the main changes and intent; it maps well to the repository's template content. The description omits explicit template headings for "Where should the reviewer start?" and "Related Issues" which the repository template requests, so those specific fields are incomplete. Overall the description is mostly complete and actionable but should include the missing template sections to be fully compliant.
Docstring Coverage	✅ Passed	No functions found in the changes. Docstring coverage check skipped.
Title Check	✅ Passed	The title "feat: Dynamo LLM Router Integeration" correctly captures the PR’s primary change—adding Dynamo integration with an LLM Router via example deployments and configuration—so it reflects the main purpose and is concise; however it contains a spelling error ("Integeration" → "Integration") and could be slightly clearer by mentioning that these are example deployment guides.

coderabbitai

Actionable comments posted: 4

🧹 Nitpick comments (14)

examples/deployments/LLM Router/frontend.yaml (2)
16-16: Parameterize image with DYNAMO_IMAGE for consistency with docs.

README instructs users to export DYNAMO_IMAGE; this manifest uses DYNAMO_VERSION directly. Align to avoid drift.

Apply:
-          image: nvcr.io/nvidia/ai-dynamo/vllm-runtime:${DYNAMO_VERSION}
+          image: ${DYNAMO_IMAGE}
11-16: Optional: add resources and imagePullSecrets.

Frontend benefits from explicit CPU/memory requests/limits and (optionally) imagePullSecrets if pulling from NGC.

Example:
       replicas: 1
       extraPodSpec:
         mainContainer:
-          image: ${DYNAMO_IMAGE}
+          image: ${DYNAMO_IMAGE}
+          resources:
+            requests:
+              cpu: "500m"
+              memory: "1Gi"
+            limits:
+              cpu: "1"
+              memory: "2Gi"
+        imagePullSecrets:
+          - name: nvcr-secret
examples/deployments/LLM Router/agg.yaml (1)
20-26: Parameterize image with DYNAMO_IMAGE (matches README and env setup).

Avoid duplicating registry/tag logic across files.
-          image: nvcr.io/nvidia/ai-dynamo/vllm-runtime:${DYNAMO_VERSION}
+          image: ${DYNAMO_IMAGE}
examples/deployments/LLM Router/disagg.yaml (2)
20-26: Parameterize images with DYNAMO_IMAGE for consistency.
-          image: nvcr.io/nvidia/ai-dynamo/vllm-runtime:${DYNAMO_VERSION}
+          image: ${DYNAMO_IMAGE}
Apply to both workers.

Also applies to: 37-43

43-43: Add newline at EOF to satisfy yamllint.
-            - "python3 -m dynamo.vllm --model ${MODEL_NAME} --is-prefill-worker"
+            - "python3 -m dynamo.vllm --model ${MODEL_NAME} --is-prefill-worker"
+
examples/deployments/LLM Router/router-config-dynamo.yaml (2)
1-39: Trim trailing whitespace and keep headers minimal.

Pre-commit flagged trailing whitespace. Keep SPDX header, but remove trailing spaces in comment block.

139-139: Add newline at EOF to satisfy yamllint.
-        model: mistralai/Mixtral-8x22B-Instruct-v0.1 
+        model: mistralai/Mixtral-8x22B-Instruct-v0.1 
+
examples/deployments/LLM Router/llm-router-values-override.yaml (3)
31-33: Don’t hardcode namespace in DYNAMO_API_BASE.

Use ${NAMESPACE} to match README and envsubst flow.
-    - name: DYNAMO_API_BASE
-      value: "http://vllm-frontend-frontend.dynamo-kubernetes.svc.cluster.local:8000"
+    - name: DYNAMO_API_BASE
+      value: "http://vllm-frontend-frontend.${NAMESPACE}.svc.cluster.local:8000"
101-110: Remove trailing whitespace and add newline at EOF.

Pre-commit flagged trailing whitespace; also add final newline (yamllint).
   service:
     type: ClusterIP  
+  
45-66: Resources: consider adding CPU requests/limits for routerServer.

You set memory and GPU; adding cpu requests/limits helps scheduling predictability.
   resources:
     limits:
       nvidia.com/gpu: 1
       memory: "8Gi"
+      cpu: "2"
     requests:
       nvidia.com/gpu: 1
       memory: "8Gi"
+      cpu: "1"
examples/deployments/LLM Router/README.md (4)
787-794: Wrong deployment name in logs command.

Deployment is created by the Dynamo operator; earlier you port-forward svc/vllm-frontend-frontend. Align logs command.
-kubectl logs deployment/frontend -n ${NAMESPACE} --tail=10
+kubectl logs deployment/vllm-frontend-frontend -n ${NAMESPACE} --tail=10
391-391: Remove claim about health checks in extraPodSpec.

Manifests don’t include probes; either add them or drop the note to avoid confusion.

215-237: Empty “model” fields in examples may confuse users.

Either omit the field (let router pick) or show explicit model when bypassing router. Clarify intent.
-    "model": "",
+    "model": "",
     "nim-llm-router": {
Add a note: leave model empty when using router policies; set to a concrete HF id to bypass router.

Also applies to: 245-259

1-20: Add repository-standard SPDX header (for consistency).

Other docs include SPDX in an HTML comment. Consider adding it here as well.
+

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b1186ae and 3bb2b21.

📒 Files selected for processing (7)

examples/README.md (1 hunks)
examples/deployments/LLM Router/README.md (1 hunks)
examples/deployments/LLM Router/agg.yaml (1 hunks)
examples/deployments/LLM Router/disagg.yaml (1 hunks)
examples/deployments/LLM Router/frontend.yaml (1 hunks)
examples/deployments/LLM Router/llm-router-values-override.yaml (1 hunks)
examples/deployments/LLM Router/router-config-dynamo.yaml (1 hunks)

🧰 Additional context used

🪛 markdownlint-cli2 (0.17.2)

examples/deployments/LLM Router/README.md

11-11: Emphasis used instead of a heading