Skip to content

Conversation

@hutm
Copy link
Contributor

@hutm hutm commented Sep 11, 2025

Overview:

added example for a frontend shared across multiple models

Details:

added example for a frontend shared across multiple models

Where should the reviewer start?

review all the files

Summary by CodeRabbit

  • New Features
    • Added a Kubernetes example for a Shared Frontend that serves multiple models with a shared model cache.
    • Provides manifests to deploy the stack and expose endpoints for listing models (/v1/models) and chat completions.
  • Documentation
    • New README with end-to-end deployment steps: install chart, create access token secret, apply manifests, port-forward, and test requests.
    • Includes guidance for verifying model availability and sample payloads, plus a reference for benchmarking.

@copy-pr-bot
Copy link

copy-pr-bot bot commented Sep 11, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@hutm hutm changed the title added example for a frontend shared across multiple models [docs] added example for a frontend shared across multiple models Sep 11, 2025
@hutm hutm changed the title [docs] added example for a frontend shared across multiple models docs: added example for a frontend shared across multiple models Sep 11, 2025
@github-actions github-actions bot added the docs label Sep 11, 2025
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Sep 11, 2025

Walkthrough

Adds a new Kubernetes example for a shared Dynamo frontend. Introduces a README with deployment steps and a manifest defining a PVC, a frontend deployment, a vLLM aggregation worker, and an agg-qwen stack (encode, VLM, processor), all using a shared HF cache and token secret.

Changes

Cohort / File(s) Summary
Docs: Kubernetes shared frontend README
examples/basics/kubernetes/shared_frontend/README.md
New README describing deployment to the dynamo namespace, Helm install, HF token secret, applying shared_frontend.yaml, port-forwarding on 8000, listing /v1/models, sample chat completion, and GenAI-Perf reference.
Kubernetes manifests: shared frontend stack
examples/basics/kubernetes/shared_frontend/shared_frontend.yaml
New manifest adding: PVC dynamo-model-cache (100Gi); DynamoGraphDeployment frontend (namespace: dynamo); DynamoGraphDeployment vllm-agg (VllmDecodeWorker, GPU 1, HF cache/token); DynamoGraphDeployment agg-qwen (EncodeWorker, VLMWorker prefill, Processor) with shared PVC mounts and command entries.

Sequence Diagram(s)

sequenceDiagram
    autonumber
    actor User
    participant Frontend as Frontend (dynamo)
    participant vLLMAgg as VllmDecodeWorker (vllm-agg)
    participant AggQwen as agg-qwen Services
    participant HFCache as Shared PVC (/root/.cache/huggingface)
    participant HF as Hugging Face Hub

    User->>Frontend: HTTP request (/v1/*)
    alt Text decode
        Frontend->>vLLMAgg: Generate/Decode request
        vLLMAgg-->>HFCache: Read/Write model weights
        HFCache-->>HF: Fetch missing weights (via HF token)
        vLLMAgg-->>Frontend: Tokens/Result
    else Multimodal pipeline
        Frontend->>AggQwen: Encode request
        AggQwen->>AggQwen: EncodeWorker → VLMWorker(prefill) → Processor
        AggQwen-->>HFCache: Read/Write model weights
        HFCache-->>HF: Fetch missing weights (via HF token)
        AggQwen-->>Frontend: Pipeline result
    end
    Frontend-->>User: Response
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Poem

A rabbit twitches whiskers, keen,
New pods arise in namespaces clean—
One cache to share, the models hum,
Frontend routes and tokens come.
VLLM sings, Qwen joins the thread,
Burrows of YAML, neatly spread.
Hop! The cluster’s green lights led.

Tip

👮 Agentic pre-merge checks are now available in preview!

Pro plan users can now enable pre-merge checks in their settings to enforce checklists before merging PRs.

  • Built-in checks – Quickly apply ready-made checks to enforce title conventions, require pull request descriptions that follow templates, validate linked issues for compliance, and more.
  • Custom agentic checks – Define your own rules using CodeRabbit’s advanced agentic capabilities to enforce organization-specific policies and workflows. For example, you can instruct CodeRabbit’s agent to verify that API documentation is updated whenever API schema files are modified in a PR. Note: Upto 5 custom checks are currently allowed during the preview period. Pricing for this feature will be announced in a few weeks.

Please see the documentation for more information.

Example:

reviews:
  pre_merge_checks:
    custom_checks:
      - name: "Undocumented Breaking Changes"
        mode: "warning"
        instructions: |
          Pass/fail criteria: All breaking changes to public APIs, CLI flags, environment variables, configuration keys, database schemas, or HTTP/GraphQL endpoints must be documented in the "Breaking Change" section of the PR description and in CHANGELOG.md. Exclude purely internal or private changes (e.g., code not exported from package entry points or explicitly marked as internal).

Please share your feedback with us on this Discord post.

Pre-merge checks (2 passed, 1 inconclusive)

❌ Failed checks (1 inconclusive)
Check name Status Explanation Resolution
Description Check ❓ Inconclusive The description includes the template headings (Overview, Details, Where should the reviewer start?) but the content is minimal and largely repetitive, offering no file-level guidance, summary of key changes, or testing instructions. The "Where should the reviewer start?" entry only says "review all the files," which does not meet the template's intent to call out specific files or risk areas. Because important details required by the template are missing, the description is insufficient to confidently assess the PR. Please expand Details to briefly summarize the added files and key changes (for example, examples/basics/kubernetes/shared_frontend/README.md and shared_frontend.yaml) and note important resources such as the new PVC and DynamoGraphDeployment entries and any manual test steps; replace "review all the files" with explicit starting points and callouts (specific files, sections, or commands to run) and add a Related Issues line if applicable. After those additions the description will meet the repository template and can be marked as pass.
✅ Passed checks (2 passed)
Check name Status Explanation
Docstring Coverage ✅ Passed No functions found in the changes. Docstring coverage check skipped.
Title Check ✅ Passed The title accurately and concisely summarizes the primary change: adding an example for a frontend shared across multiple models (documentation plus Kubernetes manifests). It directly matches the added README and shared_frontend.yaml in the changeset and is clear for a teammate scanning PR history. The phrasing is specific and avoids unnecessary noise.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 7

🧹 Nitpick comments (4)
examples/basics/kubernetes/shared_frontend/shared_frontend.yaml (2)

39-41: HF token secret must exist in each runtime namespace.

envFromSecret assumes hf-token-secret resides in the pod namespace. Ensure the secret is created in: vllm-agg, agg-qwen (and dynamo if frontend needs it). Update README accordingly. I can provide a patch.

Also applies to: 68-70, 89-91, 110-112


43-46: Add CPU/memory requests for reliable scheduling.

Only GPU limits are set. Add cpu/memory requests (and limits as needed). Example (tune values):

       resources:
-        limits:
-          gpu: "1"
+        requests:
+          cpu: "2"
+          memory: "8Gi"
+        limits:
+          gpu: "1"
+          memory: "16Gi"

Also applies to: 72-75, 94-96, 115-116

examples/basics/kubernetes/shared_frontend/README.md (2)

18-21: Port-forward: pin to the frontend namespace.

Service name may vary by operator, but namespace should be dynamo.

-kubectl port-forward svc/frontend-frontend 8000:8000 -n ${NAMESPACE}
+kubectl port-forward svc/frontend-frontend 8000:8000 -n dynamo

1-8: Trailing whitespace fixed by pre-commit—keep it clean.

Pre-merge hook reported trailing whitespace on these blocks; ensure editors trim on save to avoid CI churn.

Also applies to: 20-22

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2eced09 and c0e89f5.

📒 Files selected for processing (2)
  • examples/basics/kubernetes/shared_frontend/README.md (1 hunks)
  • examples/basics/kubernetes/shared_frontend/shared_frontend.yaml (1 hunks)
🧰 Additional context used
🪛 GitHub Actions: Pre Merge Validation of (ai-dynamo/dynamo/refs/pull/3008/merge) by hutm.
examples/basics/kubernetes/shared_frontend/README.md

[error] 1-8: Trailing whitespace detected by pre-commit (hook: trailing-whitespace). The hook fixed issues in this file.


[error] 20-22: Trailing whitespace detected by pre-commit (hook: trailing-whitespace) on code fence line. The hook fixed issues.

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Build and Test - dynamo
🔇 Additional comments (2)
examples/basics/kubernetes/shared_frontend/shared_frontend.yaml (2)

43-46: No change required — CRD expects resources.limits.gpu
DynamoGraphDeployment CRD uses plain "gpu" under resources.limits (not "nvidia.com/gpu"); the YAML snippet is correct.


16-18: ```shell
#!/bin/bash
set -euo pipefail

echo "PWD: $(pwd)"
TARGET="examples/basics/kubernetes/shared_frontend/shared_frontend.yaml"

echo
if [ -f "$TARGET" ]; then
echo "---- $TARGET (full) ----"
sed -n '1,200p' "$TARGET"
else
echo "FILE NOT FOUND: $TARGET"
fi

echo
echo "---- Search for 'dynamoNamespace' ----"
rg -n --hidden -S "dynamoNamespace" || true

echo
echo "---- Search for 'dynamo' (word) ----"
rg -n --hidden -S "\bdynamo\b" || true

echo
echo "---- YAML files under examples/basics/kubernetes ----"
find examples/basics/kubernetes -maxdepth 4 -type f -name "*.yaml" -print | sed -n '1,200p' || true

echo
echo "---- Search for 'namespace:' occurrences in examples (show context) ----"
rg -n -C2 --hidden -S "namespace:" examples || true

echo
echo "---- Show 'metadata:' blocks in that examples path (context) ----"
rg -n -C2 --hidden -S "metadata:" examples/basics/kubernetes || true


</blockquote></details>

</blockquote></details>

</details>

<!-- This is an auto-generated comment by CodeRabbit for review status -->

@hutm hutm force-pushed the mkhadkevich/addSeparateFrontEndExample branch from c0e89f5 to f01958f Compare September 11, 2025 03:00
@hutm hutm force-pushed the mkhadkevich/addSeparateFrontEndExample branch from ef108af to f01958f Compare September 24, 2025 23:19
@hutm hutm requested a review from a team as a code owner September 24, 2025 23:19
@hutm hutm force-pushed the mkhadkevich/addSeparateFrontEndExample branch from f01958f to d79fabf Compare September 24, 2025 23:24
@hutm hutm force-pushed the mkhadkevich/addSeparateFrontEndExample branch 2 times, most recently from 925ff6a to a34b630 Compare October 2, 2025 23:35
Copy link
Contributor

@biswapanda biswapanda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some minor typo. changes lgtm otherwise

@biswapanda
Copy link
Contributor

Thanks @hutm !
Few minor typos - lgtm otherwise

Copy link
Contributor

@biswapanda biswapanda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

@hutm hutm enabled auto-merge (squash) October 3, 2025 18:50
@hutm hutm force-pushed the mkhadkevich/addSeparateFrontEndExample branch from eb149a1 to 956251e Compare October 3, 2025 19:29
@biswapanda
Copy link
Contributor

/ok to test 956251e

@hutm hutm merged commit 13bf5d9 into main Oct 8, 2025
15 checks passed
@hutm hutm deleted the mkhadkevich/addSeparateFrontEndExample branch October 8, 2025 01:47
ptarasiewiczNV pushed a commit that referenced this pull request Oct 8, 2025
nv-tusharma pushed a commit that referenced this pull request Oct 20, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants