-
Notifications
You must be signed in to change notification settings - Fork 691
docs: added example for a frontend shared across multiple models #3008
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughAdds a new Kubernetes example for a shared Dynamo frontend. Introduces a README with deployment steps and a manifest defining a PVC, a frontend deployment, a vLLM aggregation worker, and an agg-qwen stack (encode, VLM, processor), all using a shared HF cache and token secret. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
actor User
participant Frontend as Frontend (dynamo)
participant vLLMAgg as VllmDecodeWorker (vllm-agg)
participant AggQwen as agg-qwen Services
participant HFCache as Shared PVC (/root/.cache/huggingface)
participant HF as Hugging Face Hub
User->>Frontend: HTTP request (/v1/*)
alt Text decode
Frontend->>vLLMAgg: Generate/Decode request
vLLMAgg-->>HFCache: Read/Write model weights
HFCache-->>HF: Fetch missing weights (via HF token)
vLLMAgg-->>Frontend: Tokens/Result
else Multimodal pipeline
Frontend->>AggQwen: Encode request
AggQwen->>AggQwen: EncodeWorker → VLMWorker(prefill) → Processor
AggQwen-->>HFCache: Read/Write model weights
HFCache-->>HF: Fetch missing weights (via HF token)
AggQwen-->>Frontend: Pipeline result
end
Frontend-->>User: Response
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Poem
Tip 👮 Agentic pre-merge checks are now available in preview!Pro plan users can now enable pre-merge checks in their settings to enforce checklists before merging PRs.
Please see the documentation for more information. Example: reviews:
pre_merge_checks:
custom_checks:
- name: "Undocumented Breaking Changes"
mode: "warning"
instructions: |
Pass/fail criteria: All breaking changes to public APIs, CLI flags, environment variables, configuration keys, database schemas, or HTTP/GraphQL endpoints must be documented in the "Breaking Change" section of the PR description and in CHANGELOG.md. Exclude purely internal or private changes (e.g., code not exported from package entry points or explicitly marked as internal).Please share your feedback with us on this Discord post. Pre-merge checks (2 passed, 1 inconclusive)❌ Failed checks (1 inconclusive)
✅ Passed checks (2 passed)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 7
🧹 Nitpick comments (4)
examples/basics/kubernetes/shared_frontend/shared_frontend.yaml (2)
39-41: HF token secret must exist in each runtime namespace.envFromSecret assumes hf-token-secret resides in the pod namespace. Ensure the secret is created in: vllm-agg, agg-qwen (and dynamo if frontend needs it). Update README accordingly. I can provide a patch.
Also applies to: 68-70, 89-91, 110-112
43-46: Add CPU/memory requests for reliable scheduling.Only GPU limits are set. Add cpu/memory requests (and limits as needed). Example (tune values):
resources: - limits: - gpu: "1" + requests: + cpu: "2" + memory: "8Gi" + limits: + gpu: "1" + memory: "16Gi"Also applies to: 72-75, 94-96, 115-116
examples/basics/kubernetes/shared_frontend/README.md (2)
18-21: Port-forward: pin to the frontend namespace.Service name may vary by operator, but namespace should be
dynamo.-kubectl port-forward svc/frontend-frontend 8000:8000 -n ${NAMESPACE} +kubectl port-forward svc/frontend-frontend 8000:8000 -n dynamo
1-8: Trailing whitespace fixed by pre-commit—keep it clean.Pre-merge hook reported trailing whitespace on these blocks; ensure editors trim on save to avoid CI churn.
Also applies to: 20-22
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
examples/basics/kubernetes/shared_frontend/README.md(1 hunks)examples/basics/kubernetes/shared_frontend/shared_frontend.yaml(1 hunks)
🧰 Additional context used
🪛 GitHub Actions: Pre Merge Validation of (ai-dynamo/dynamo/refs/pull/3008/merge) by hutm.
examples/basics/kubernetes/shared_frontend/README.md
[error] 1-8: Trailing whitespace detected by pre-commit (hook: trailing-whitespace). The hook fixed issues in this file.
[error] 20-22: Trailing whitespace detected by pre-commit (hook: trailing-whitespace) on code fence line. The hook fixed issues.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Build and Test - dynamo
🔇 Additional comments (2)
examples/basics/kubernetes/shared_frontend/shared_frontend.yaml (2)
43-46: No change required — CRD expects resources.limits.gpu
DynamoGraphDeployment CRD uses plain "gpu" under resources.limits (not "nvidia.com/gpu"); the YAML snippet is correct.
16-18: ```shell
#!/bin/bash
set -euo pipefailecho "PWD: $(pwd)"
TARGET="examples/basics/kubernetes/shared_frontend/shared_frontend.yaml"echo
if [ -f "$TARGET" ]; then
echo "---- $TARGET (full) ----"
sed -n '1,200p' "$TARGET"
else
echo "FILE NOT FOUND: $TARGET"
fiecho
echo "---- Search for 'dynamoNamespace' ----"
rg -n --hidden -S "dynamoNamespace" || trueecho
echo "---- Search for 'dynamo' (word) ----"
rg -n --hidden -S "\bdynamo\b" || trueecho
echo "---- YAML files under examples/basics/kubernetes ----"
find examples/basics/kubernetes -maxdepth 4 -type f -name "*.yaml" -print | sed -n '1,200p' || trueecho
echo "---- Search for 'namespace:' occurrences in examples (show context) ----"
rg -n -C2 --hidden -S "namespace:" examples || trueecho
echo "---- Show 'metadata:' blocks in that examples path (context) ----"
rg -n -C2 --hidden -S "metadata:" examples/basics/kubernetes || true</blockquote></details> </blockquote></details> </details> <!-- This is an auto-generated comment by CodeRabbit for review status -->
c0e89f5 to
f01958f
Compare
ef108af to
f01958f
Compare
f01958f to
d79fabf
Compare
925ff6a to
a34b630
Compare
biswapanda
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
some minor typo. changes lgtm otherwise
|
Thanks @hutm ! |
biswapanda
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you!
Signed-off-by: Maksim Khadkevich <[email protected]>
Signed-off-by: Maksim Khadkevich <[email protected]>
eb149a1 to
956251e
Compare
|
/ok to test 956251e |
Signed-off-by: Maksim Khadkevich <[email protected]> Signed-off-by: Piotr Tarasiewicz <[email protected]>
Signed-off-by: Maksim Khadkevich <[email protected]>
Overview:
added example for a frontend shared across multiple models
Details:
added example for a frontend shared across multiple models
Where should the reviewer start?
review all the files
Summary by CodeRabbit