-
Notifications
You must be signed in to change notification settings - Fork 690
feat: update DynamoGraphDeployments for vllm_v1 #1890
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: update DynamoGraphDeployments for vllm_v1 #1890
Conversation
…-create-deploy-crds-for-vllm_v1-example
… of github.com:ai-dynamo/dynamo into hannahz/dep-216-create-deploy-crds-for-vllm_v1-example
… of https://github.com/ai-dynamo/dynamo into hannahz/dep-216-create-deploy-crds-for-vllm_v1-example
WalkthroughThe updates introduce and revise Kubernetes deployment manifests for various vLLM serving configurations, add detailed health probes, update container images and startup commands, and remove obsolete services. Documentation is expanded with a new section on Kubernetes deployment, outlining prerequisites, usage, and deployment instructions for the provided YAML manifests. Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant Frontend (Pod)
participant Worker (Pod)
participant Kubernetes
User->>Kubernetes: Apply YAML manifest (kubectl apply)
Kubernetes->>Frontend (Pod): Schedule and start container
Frontend (Pod)->>Frontend (Pod): Liveness/readiness probes (HTTP GET /health)
Frontend (Pod)->>Worker (Pod): Send request for model inference
Worker (Pod)->>Worker (Pod): Liveness/readiness probes (log check/exec)
Worker (Pod)-->>Frontend (Pod): Return inference result
Frontend (Pod)-->>User: Return response
Poem
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
Documentation and Community
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🔭 Outside diff range comments (1)
examples/vllm/deploy/agg.yaml (1)
98-100: Command passed as a single argument – shell features (pipe, redirection) will not runBecause the entire string is one list element, Kubernetes will exec the container’s entrypoint with one argument that literally contains pipes and redirects.
Use/bin/sh -cso the shell interprets| tee:- args: - - "python3 components/main.py --model Qwen/Qwen3-0.6B --enforce-eager 2>&1 | tee /tmp/vllm.log" + args: + - /bin/sh + - -c + - | + python3 components/main.py \ + --model Qwen/Qwen3-0.6B \ + --enforce-eager 2>&1 | tee /tmp/vllm.log
♻️ Duplicate comments (3)
examples/vllm/deploy/disagg_router.yaml (1)
95-100: Same shell-vs-args problem as inagg_router.yamlApply the
/bin/sh -cpattern here too; otherwise decode workers won’t start.examples/vllm/deploy/disagg_planner.yaml (1)
52-60: Shell piping without shell – will failReplicate the command/args fix shown for
agg_router.yaml.examples/vllm/deploy/disagg.yaml (1)
52-60: Worker container will not start due to missing shellSame fix as earlier manifests required.
🧹 Nitpick comments (8)
examples/vllm/README.md (2)
137-140: Prefer port-forwarding the Service, not an individual Deployment-pod
kubectl port-forward deployment/vllm-v1-disagg-frontend-<pod-uuid-info> …requires knowing an ephemeral pod name and breaks after every rollout.
Forward the stable Service instead:-kubectl port-forward deployment/vllm-v1-disagg-frontend-<pod-uuid-info> 8080:8000 +kubectl port-forward svc/vllm-v1-disagg-frontend 8080:8000
146-149: Repository path looks wrong
cd ~/dynamo/examples/vllm/deploydoes not match the tree (examples/vllm/deploy).
Drop the home-prefixed path to avoid copy-paste errors.-cd ~/dynamo/examples/vllm/deploy +cd examples/vllm/deployexamples/vllm/deploy/disagg_router.yaml (1)
100-138: Readiness probe may never succeed if the log format changesCoupling readiness to an exact log regex (
VllmWorker.*has been initialized) is brittle. Consider exposing an HTTP/healthendpoint from the worker and probing that instead.examples/vllm/deploy/disagg_planner.yaml (1)
25-33: Probes start after 60 s with 60 s periods – slow feedbackA crashed pod may stay un-restarted for a full minute. Tighten to e.g.,
initialDelaySeconds: 15,periodSeconds: 10unless cold-start time dictates otherwise.examples/vllm/deploy/disagg.yaml (1)
25-42: Security context not setPer team guidance (see learning on PodSecurityContext), set a pod-level
runAsUser: 1000,runAsGroup: 1000,fsGroup: 1000and override only when root is truly required.examples/vllm/deploy/agg.yaml (3)
55-60: Minor:out=dynlooks suspicious – is this intentional?The old command exposed HTTP output; the new one sets
out=dyn, which is an internal transport that is not externally accessible. If the intention is to expose an HTTP endpoint, this should probably remainout=http.- - out=dyn + - out=httpDouble-check this before shipping the manifest.
72-81: Race condition in workerreadinessProbeThe probe greps for a one-shot log line; once it scrolls out of
/tmp/vllm.log, the probe flips to Unready permanently on log rotation.
Instead, expose an HTTP/healthzin the worker or write a sentinel file after the worker is initialised and test for its existence.
62-62: Secret requirement not documented
envFromSecret: hf-token-secrethard-wires a secret name that must exist in the target namespace.
Add a short note in the README or the manifest’s annotations so users know to create it before applying.
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
examples/vllm/README.md(1 hunks)examples/vllm/deploy/agg.yaml(3 hunks)examples/vllm/deploy/agg_router.yaml(1 hunks)examples/vllm/deploy/disagg.yaml(4 hunks)examples/vllm/deploy/disagg_planner.yaml(1 hunks)examples/vllm/deploy/disagg_router.yaml(1 hunks)
🧰 Additional context used
🧠 Learnings (2)
examples/vllm/deploy/disagg.yaml (1)
Learnt from: julienmancuso
PR: ai-dynamo/dynamo#1474
File: deploy/cloud/operator/internal/controller/dynamocomponent_controller.go:1302-1306
Timestamp: 2025-06-11T21:18:00.425Z
Learning: In the Dynamo operator, the project’s preferred security posture is to set a Pod-level `PodSecurityContext` with `runAsUser`, `runAsGroup`, and `fsGroup` all set to `1000`, and then selectively override the user at the individual container level (e.g., `RunAsUser: 0` for Kaniko) when root is required.
examples/vllm/deploy/agg.yaml (1)
Learnt from: nnshah1
PR: ai-dynamo/dynamo#1444
File: tests/fault_tolerance/configs/agg_tp_1_dp_8.yaml:31-38
Timestamp: 2025-07-01T15:33:53.262Z
Learning: In fault tolerance test configurations, the `resources` section under `ServiceArgs` specifies resources per individual worker, not total resources for all workers. So `workers: 8` with `gpu: '1'` means 8 workers × 1 GPU each = 8 GPUs total.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Build and Test - vllm
🔇 Additional comments (4)
examples/vllm/deploy/agg_router.yaml (2)
86-93: Non-standard GPU resource keyUse the standard
nvidia.com/gpuresource unlessDynamoGraphDeploymentmutates it internally. Sticking to the canonical key avoids scheduler mis-accounting on vanilla clusters.
52-60:argsstring uses shell features but no shell is invoked
| tee /tmp/vllm.logrequires a shell. Because onlyargsis supplied, Kubernetes passes the whole string to the image ENTRYPOINT.
Unless the image’sENTRYPOINTis/bin/sh -c, the container will fail.- args: - - "python3 components/main.py --model Qwen/Qwen3-0.6B --enforce-eager 2>&1 | tee /tmp/vllm.log" + command: ["/bin/sh", "-c"] + args: + - python3 components/main.py --model Qwen/Qwen3-0.6B --enforce-eager 2>&1 | tee /tmp/vllm.logLikely an incorrect or invalid review comment.
examples/vllm/deploy/agg.yaml (2)
52-60: Verify that the working directory exists inside the new imageThe image was bumped and the working directory changed from
/workspace/examples/vllm_v1to/workspace/examples/vllm.
If that directory is missing,dynamowill fail to locate model/asset files at container start-up.Please confirm the new image layout or revert the path.
30-38: Replace exec readinessProbe with httpGet to avoid relying onjq
- File:
examples/vllm/deploy/agg.yaml(lines 30–38)- Verify whether the
nvcr.io/nvidian/nim-llm-dev/vllm_v1-runtime:dep-216.4image includesjq; if not, the current probe will always fail.- readinessProbe: - exec: - command: - - /bin/sh - - -c - - 'curl -s http://localhost:8000/health | jq -e ".status == \"healthy\""' + readinessProbe: + httpGet: + path: /health + port: 8000 initialDelaySeconds: 60 periodSeconds: 60 timeoutSeconds: 30Using an
httpGetprobe removes the external-binary dependency. If you truly need JSON parsing, consider bakingjqinto a custom image, but that increases attack surface.
|
Approved to unblock. |
nnshah1
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
health and ready checks will be revised when health and ready endpoints are exposed from rust runtime
|
fyi the changes in the .yaml seem to point to private images |
Co-authored-by: mohammedabdulwahhab <[email protected]>
commit e330d96 Author: Yan Ru Pei <[email protected]> Date: Fri Jul 18 13:40:54 2025 -0700 feat: enable / disable chunked prefill for mockers (#2015) Signed-off-by: Yan Ru Pei <[email protected]> Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> commit 353146e Author: GuanLuo <[email protected]> Date: Fri Jul 18 13:33:36 2025 -0700 feat: add vLLM v1 multi-modal example. Add llama4 Maverick example (#1990) Signed-off-by: GuanLuo <[email protected]> Co-authored-by: krishung5 <[email protected]> commit 1f07dab Author: Jacky <[email protected]> Date: Fri Jul 18 13:04:20 2025 -0700 feat: Add migration to LLM requests (#1930) commit 5f17918 Author: Tanmay Verma <[email protected]> Date: Fri Jul 18 12:59:34 2025 -0700 refactor: Migrate to new UX2 for python launch (#2003) commit fc12436 Author: Graham King <[email protected]> Date: Fri Jul 18 14:52:57 2025 -0400 feat(frontend): router-mode settings (#2001) commit dc75cf1 Author: ptarasiewiczNV <[email protected]> Date: Fri Jul 18 18:47:28 2025 +0200 chore: Move NIXL repo clone to Dockerfiles (#2009) commit f6f392c Author: Iman Tabrizian <[email protected]> Date: Thu Jul 17 18:44:17 2025 -0700 Remove link to the fix for disagg + eagle3 for TRT-LLM example (#2006) Signed-off-by: Iman Tabrizian <[email protected]> commit cc90ca6 Author: atchernych <[email protected]> Date: Thu Jul 17 18:34:40 2025 -0700 feat: Create a convenience script to uninstall Dynamo Deploy CRDs (#1933) commit 267b422 Author: Greg Clark <[email protected]> Date: Thu Jul 17 20:44:21 2025 -0400 chore: loosed python requirement versions (#1998) Signed-off-by: Greg Clark <[email protected]> commit b8474e5 Author: ishandhanani <[email protected]> Date: Thu Jul 17 16:35:05 2025 -0700 chore: update cmake and gap installation and sgl in wideep container (#1991) commit 157a3b0 Author: Biswa Panda <[email protected]> Date: Thu Jul 17 15:38:12 2025 -0700 fix: incorrect helm upgrade command (#2000) commit 0dfca2c Author: Ryan McCormick <[email protected]> Date: Thu Jul 17 15:33:33 2025 -0700 ci: Update trtllm gitlab triggers for new components directory and test script (#1992) commit f3fb09e Author: Kris Hung <[email protected]> Date: Thu Jul 17 14:59:59 2025 -0700 fix: Fix syntax for tokio-console (#1997) commit dacffb8 Author: Biswa Panda <[email protected]> Date: Thu Jul 17 14:57:10 2025 -0700 fix: use non-dev golang image for operator (#1993) commit 2b29a0a Author: zaristei <[email protected]> Date: Thu Jul 17 13:10:42 2025 -0700 fix: Working Arm Build Dockerfile for Vllm_v1 (#1844) commit 2430d89 Author: Ryan McCormick <[email protected]> Date: Thu Jul 17 12:57:46 2025 -0700 test: Add trtllm kv router tests (#1988) commit 1eadc01 Author: Graham King <[email protected]> Date: Thu Jul 17 15:07:41 2025 -0400 feat(runtime): Support tokio-console (#1986) commit b62e633 Author: GuanLuo <[email protected]> Date: Thu Jul 17 11:16:28 2025 -0700 feat: support separate chat_template.jinja file (#1853) commit 8ae3719 Author: Hongkuan Zhou <[email protected]> Date: Thu Jul 17 11:12:35 2025 -0700 chore: add some details to dynamo deploy quickstart and fix deploy.sh (#1978) Signed-off-by: Hongkuan Zhou <[email protected]> Co-authored-by: julienmancuso <[email protected]> commit 08891ff Author: Ryan McCormick <[email protected]> Date: Thu Jul 17 10:57:42 2025 -0700 fix: Update trtllm tests to use new scripts instead of dynamo serve (#1979) commit 49b7a0d Author: Ryan Olson <[email protected]> Date: Thu Jul 17 08:35:04 2025 -0600 feat: record + analyze logprobs (#1957) commit 6d2be14 Author: Biswa Panda <[email protected]> Date: Thu Jul 17 00:17:58 2025 -0700 refactor: replace vllm with vllm_v1 container (#1953) Co-authored-by: alec-flowers <[email protected]> commit 4d2a31a Author: ishandhanani <[email protected]> Date: Wed Jul 16 18:04:09 2025 -0700 chore: add port reservation to utils (#1980) commit 1e3e4a0 Author: Alec <[email protected]> Date: Wed Jul 16 15:54:04 2025 -0700 fix: port race condition through deterministic ports (#1937) commit 4ad281f Author: Tanmay Verma <[email protected]> Date: Wed Jul 16 14:33:51 2025 -0700 refactor: Move TRTLLM example to the component/backends (#1976) commit 57d24a1 Author: Misha Chornyi <[email protected]> Date: Wed Jul 16 14:10:24 2025 -0700 build: Removing shell configuration violations. It's bad practice to hardcod… (#1973) commit 182d3b5 Author: Graham King <[email protected]> Date: Wed Jul 16 16:12:40 2025 -0400 chore(bindings): Remove mistralrs / llama.cpp (#1970) commit def6eaa Author: Harrison Saturley-Hall <[email protected]> Date: Wed Jul 16 15:50:23 2025 -0400 feat: attributions for debian deps of sglang, trtllm, vllm runtime containers (#1971) commit f31732a Author: Yan Ru Pei <[email protected]> Date: Wed Jul 16 11:22:15 2025 -0700 feat: integrate mocker with dynamo-run and python cli (#1927) commit aba6099 Author: Graham King <[email protected]> Date: Wed Jul 16 12:26:32 2025 -0400 perf(router): Remove lock from router hot path (#1963) commit b212103 Author: Hongkuan Zhou <[email protected]> Date: Wed Jul 16 08:55:33 2025 -0700 docs: add notes in docs to deprecate local connector (#1959) commit 7b325ee Author: Biswa Panda <[email protected]> Date: Tue Jul 15 18:52:00 2025 -0700 fix: vllm router examples (#1942) commit a50be1a Author: hhzhang16 <[email protected]> Date: Tue Jul 15 17:58:01 2025 -0700 feat: update CODEOWNERS (#1926) commit e260fdf Author: Harrison Saturley-Hall <[email protected]> Date: Tue Jul 15 18:49:21 2025 -0400 feat: add bitnami helm chart attribution (#1943) Signed-off-by: Harrison Saturley-Hall <[email protected]> Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> commit 1c03404 Author: Biswa Panda <[email protected]> Date: Tue Jul 15 14:26:24 2025 -0700 fix: update inference gateway deployment instructions (#1940) commit 5ca570f Author: Graham King <[email protected]> Date: Tue Jul 15 16:54:03 2025 -0400 chore: Rename dynamo.ingress to dynamo.frontend (#1944) commit 7b9182f Author: Graham King <[email protected]> Date: Tue Jul 15 16:33:07 2025 -0400 chore: Move examples/cli to lib/bindings/examples/cli (#1952) commit 40d40dd Author: Graham King <[email protected]> Date: Tue Jul 15 16:02:19 2025 -0400 chore(multi-modal): Rename frontend.py to web.py (#1951) commit a9e0891 Author: Ryan Olson <[email protected]> Date: Tue Jul 15 12:30:30 2025 -0600 feat: adding http clients and recorded response stream (#1919) commit 4128d58 Author: Biswa Panda <[email protected]> Date: Tue Jul 15 10:30:47 2025 -0700 feat: allow helm upgrade using deploy script (#1936) commit 4da078b Author: Graham King <[email protected]> Date: Tue Jul 15 12:57:38 2025 -0400 fix: Remove OpenSSL dependency, use Rust TLS (#1945) commit fc004d4 Author: jthomson04 <[email protected]> Date: Tue Jul 15 08:45:42 2025 -0700 fix: Fix TRT-LLM container build when using a custom pip wheel (#1825) commit 3c6fc6f Author: ishandhanani <[email protected]> Date: Mon Jul 14 22:35:20 2025 -0700 chore: fix typo (#1938) commit de7fe38 Author: Alec <[email protected]> Date: Mon Jul 14 21:47:12 2025 -0700 feat: add vllm e2e integration tests (#1935) commit 860f3f7 Author: Keiven C <[email protected]> Date: Mon Jul 14 21:44:19 2025 -0700 chore: metrics endpoint variables renamed from HTTP_SERVER->SYSTEM (#1934) Co-authored-by: Keiven Chang <[email protected]> commit fc402a3 Author: Biswa Panda <[email protected]> Date: Mon Jul 14 21:21:20 2025 -0700 feat: configurable namespace for vllm v1 example (#1909) commit df40d2c Author: ZichengMa <[email protected]> Date: Mon Jul 14 21:11:29 2025 -0700 docs: fix typo and add mount-workspace to vllm doc (#1931) Signed-off-by: ZichengMa <[email protected]> Co-authored-by: Alec <[email protected]> commit 901715b Author: Tanmay Verma <[email protected]> Date: Mon Jul 14 20:14:51 2025 -0700 refactor: Refactor the TRTLLM examples remove dynamo SDK (#1884) commit 5bf23d5 Author: hhzhang16 <[email protected]> Date: Mon Jul 14 18:29:19 2025 -0700 feat: update DynamoGraphDeployments for vllm_v1 (#1890) Co-authored-by: mohammedabdulwahhab <[email protected]> commit 9e76590 Author: ishandhanani <[email protected]> Date: Mon Jul 14 17:29:56 2025 -0700 docs: organize sglang readme (#1910) commit ef59ac8 Author: KrishnanPrash <[email protected]> Date: Mon Jul 14 16:16:44 2025 -0700 docs: TRTLLM Example of Llama4+Eagle3 (Speculative Decoding) (#1828) Signed-off-by: KrishnanPrash <[email protected]> Co-authored-by: Iman Tabrizian <[email protected]> commit 053041e Author: Jorge António <[email protected]> Date: Tue Jul 15 00:06:38 2025 +0100 fix: resolve incorrect finish reason propagation (#1857) commit 3733f58 Author: Graham King <[email protected]> Date: Mon Jul 14 19:04:22 2025 -0400 feat(backends): Python llama.cpp engine (#1925) commit 6a1350c Author: Tushar Sharma <[email protected]> Date: Mon Jul 14 14:56:36 2025 -0700 build: minor improvements to sglang dockerfile (#1917) commit e2a619b Author: Neelay Shah <[email protected]> Date: Mon Jul 14 14:52:53 2025 -0700 fix: remove environment variable passing (#1911) Signed-off-by: Neelay Shah <[email protected]> Co-authored-by: Neelay Shah <[email protected]> commit 3d17a49 Author: Schwinn Saereesitthipitak <[email protected]> Date: Mon Jul 14 14:41:56 2025 -0700 refactor: remove dynamo build (#1778) Signed-off-by: Schwinn Saereesitthipitak <[email protected]> commit 3e0cb07 Author: Anant Sharma <[email protected]> Date: Mon Jul 14 15:43:48 2025 -0400 fix: copy attributions and license to trtllm runtime container (#1916) commit fc36bf5 Author: ishandhanani <[email protected]> Date: Mon Jul 14 12:31:49 2025 -0700 feat: receive kvmetrics from sglang scheduler (#1789) Co-authored-by: zixuanzhang226 <[email protected]> commit df91fce Author: Yan Ru Pei <[email protected]> Date: Mon Jul 14 12:24:04 2025 -0700 feat: prefill aware routing (#1895) commit ad8ad66 Author: Graham King <[email protected]> Date: Mon Jul 14 15:20:35 2025 -0400 feat: Shrink the ai-dynamo wheel by 35 MiB (#1918) Remove http and llmctl binaries. They have been unused for a while. commit 480b41d Author: Graham King <[email protected]> Date: Mon Jul 14 15:06:45 2025 -0400 feat: Python frontend / ingress node (#1912)
commit cb6de94 Author: ptarasiewiczNV <[email protected]> Date: Sun Jul 20 22:34:50 2025 +0200 chore: Install vLLM and WideEP kernels in vLLM runtime container (#2010) Signed-off-by: Alec <[email protected]> Co-authored-by: Alec <[email protected]> Co-authored-by: alec-flowers <[email protected]> commit fe63c17 Author: Alec <[email protected]> Date: Fri Jul 18 17:45:08 2025 -0700 fix: Revert "feat: add vLLM v1 multi-modal example. Add llama4 Maverick ex… (#2017) commit bf1998f Author: jthomson04 <[email protected]> Date: Fri Jul 18 17:23:50 2025 -0700 fix: Don't detokenize twice in TRT-LLM examples (#1955) commit 343a481 Author: Ryan Olson <[email protected]> Date: Fri Jul 18 16:22:43 2025 -0600 feat: http disconnects (#2014) commit e330d96 Author: Yan Ru Pei <[email protected]> Date: Fri Jul 18 13:40:54 2025 -0700 feat: enable / disable chunked prefill for mockers (#2015) Signed-off-by: Yan Ru Pei <[email protected]> Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> commit 353146e Author: GuanLuo <[email protected]> Date: Fri Jul 18 13:33:36 2025 -0700 feat: add vLLM v1 multi-modal example. Add llama4 Maverick example (#1990) Signed-off-by: GuanLuo <[email protected]> Co-authored-by: krishung5 <[email protected]> commit 1f07dab Author: Jacky <[email protected]> Date: Fri Jul 18 13:04:20 2025 -0700 feat: Add migration to LLM requests (#1930) commit 5f17918 Author: Tanmay Verma <[email protected]> Date: Fri Jul 18 12:59:34 2025 -0700 refactor: Migrate to new UX2 for python launch (#2003) commit fc12436 Author: Graham King <[email protected]> Date: Fri Jul 18 14:52:57 2025 -0400 feat(frontend): router-mode settings (#2001) commit dc75cf1 Author: ptarasiewiczNV <[email protected]> Date: Fri Jul 18 18:47:28 2025 +0200 chore: Move NIXL repo clone to Dockerfiles (#2009) commit f6f392c Author: Iman Tabrizian <[email protected]> Date: Thu Jul 17 18:44:17 2025 -0700 Remove link to the fix for disagg + eagle3 for TRT-LLM example (#2006) Signed-off-by: Iman Tabrizian <[email protected]> commit cc90ca6 Author: atchernych <[email protected]> Date: Thu Jul 17 18:34:40 2025 -0700 feat: Create a convenience script to uninstall Dynamo Deploy CRDs (#1933) commit 267b422 Author: Greg Clark <[email protected]> Date: Thu Jul 17 20:44:21 2025 -0400 chore: loosed python requirement versions (#1998) Signed-off-by: Greg Clark <[email protected]> commit b8474e5 Author: ishandhanani <[email protected]> Date: Thu Jul 17 16:35:05 2025 -0700 chore: update cmake and gap installation and sgl in wideep container (#1991) commit 157a3b0 Author: Biswa Panda <[email protected]> Date: Thu Jul 17 15:38:12 2025 -0700 fix: incorrect helm upgrade command (#2000) commit 0dfca2c Author: Ryan McCormick <[email protected]> Date: Thu Jul 17 15:33:33 2025 -0700 ci: Update trtllm gitlab triggers for new components directory and test script (#1992) commit f3fb09e Author: Kris Hung <[email protected]> Date: Thu Jul 17 14:59:59 2025 -0700 fix: Fix syntax for tokio-console (#1997) commit dacffb8 Author: Biswa Panda <[email protected]> Date: Thu Jul 17 14:57:10 2025 -0700 fix: use non-dev golang image for operator (#1993) commit 2b29a0a Author: zaristei <[email protected]> Date: Thu Jul 17 13:10:42 2025 -0700 fix: Working Arm Build Dockerfile for Vllm_v1 (#1844) commit 2430d89 Author: Ryan McCormick <[email protected]> Date: Thu Jul 17 12:57:46 2025 -0700 test: Add trtllm kv router tests (#1988) commit 1eadc01 Author: Graham King <[email protected]> Date: Thu Jul 17 15:07:41 2025 -0400 feat(runtime): Support tokio-console (#1986) commit b62e633 Author: GuanLuo <[email protected]> Date: Thu Jul 17 11:16:28 2025 -0700 feat: support separate chat_template.jinja file (#1853) commit 8ae3719 Author: Hongkuan Zhou <[email protected]> Date: Thu Jul 17 11:12:35 2025 -0700 chore: add some details to dynamo deploy quickstart and fix deploy.sh (#1978) Signed-off-by: Hongkuan Zhou <[email protected]> Co-authored-by: julienmancuso <[email protected]> commit 08891ff Author: Ryan McCormick <[email protected]> Date: Thu Jul 17 10:57:42 2025 -0700 fix: Update trtllm tests to use new scripts instead of dynamo serve (#1979) commit 49b7a0d Author: Ryan Olson <[email protected]> Date: Thu Jul 17 08:35:04 2025 -0600 feat: record + analyze logprobs (#1957) commit 6d2be14 Author: Biswa Panda <[email protected]> Date: Thu Jul 17 00:17:58 2025 -0700 refactor: replace vllm with vllm_v1 container (#1953) Co-authored-by: alec-flowers <[email protected]> commit 4d2a31a Author: ishandhanani <[email protected]> Date: Wed Jul 16 18:04:09 2025 -0700 chore: add port reservation to utils (#1980) commit 1e3e4a0 Author: Alec <[email protected]> Date: Wed Jul 16 15:54:04 2025 -0700 fix: port race condition through deterministic ports (#1937) commit 4ad281f Author: Tanmay Verma <[email protected]> Date: Wed Jul 16 14:33:51 2025 -0700 refactor: Move TRTLLM example to the component/backends (#1976) commit 57d24a1 Author: Misha Chornyi <[email protected]> Date: Wed Jul 16 14:10:24 2025 -0700 build: Removing shell configuration violations. It's bad practice to hardcod… (#1973) commit 182d3b5 Author: Graham King <[email protected]> Date: Wed Jul 16 16:12:40 2025 -0400 chore(bindings): Remove mistralrs / llama.cpp (#1970) commit def6eaa Author: Harrison Saturley-Hall <[email protected]> Date: Wed Jul 16 15:50:23 2025 -0400 feat: attributions for debian deps of sglang, trtllm, vllm runtime containers (#1971) commit f31732a Author: Yan Ru Pei <[email protected]> Date: Wed Jul 16 11:22:15 2025 -0700 feat: integrate mocker with dynamo-run and python cli (#1927) commit aba6099 Author: Graham King <[email protected]> Date: Wed Jul 16 12:26:32 2025 -0400 perf(router): Remove lock from router hot path (#1963) commit b212103 Author: Hongkuan Zhou <[email protected]> Date: Wed Jul 16 08:55:33 2025 -0700 docs: add notes in docs to deprecate local connector (#1959) commit 7b325ee Author: Biswa Panda <[email protected]> Date: Tue Jul 15 18:52:00 2025 -0700 fix: vllm router examples (#1942) commit a50be1a Author: hhzhang16 <[email protected]> Date: Tue Jul 15 17:58:01 2025 -0700 feat: update CODEOWNERS (#1926) commit e260fdf Author: Harrison Saturley-Hall <[email protected]> Date: Tue Jul 15 18:49:21 2025 -0400 feat: add bitnami helm chart attribution (#1943) Signed-off-by: Harrison Saturley-Hall <[email protected]> Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> commit 1c03404 Author: Biswa Panda <[email protected]> Date: Tue Jul 15 14:26:24 2025 -0700 fix: update inference gateway deployment instructions (#1940) commit 5ca570f Author: Graham King <[email protected]> Date: Tue Jul 15 16:54:03 2025 -0400 chore: Rename dynamo.ingress to dynamo.frontend (#1944) commit 7b9182f Author: Graham King <[email protected]> Date: Tue Jul 15 16:33:07 2025 -0400 chore: Move examples/cli to lib/bindings/examples/cli (#1952) commit 40d40dd Author: Graham King <[email protected]> Date: Tue Jul 15 16:02:19 2025 -0400 chore(multi-modal): Rename frontend.py to web.py (#1951) commit a9e0891 Author: Ryan Olson <[email protected]> Date: Tue Jul 15 12:30:30 2025 -0600 feat: adding http clients and recorded response stream (#1919) commit 4128d58 Author: Biswa Panda <[email protected]> Date: Tue Jul 15 10:30:47 2025 -0700 feat: allow helm upgrade using deploy script (#1936) commit 4da078b Author: Graham King <[email protected]> Date: Tue Jul 15 12:57:38 2025 -0400 fix: Remove OpenSSL dependency, use Rust TLS (#1945) commit fc004d4 Author: jthomson04 <[email protected]> Date: Tue Jul 15 08:45:42 2025 -0700 fix: Fix TRT-LLM container build when using a custom pip wheel (#1825) commit 3c6fc6f Author: ishandhanani <[email protected]> Date: Mon Jul 14 22:35:20 2025 -0700 chore: fix typo (#1938) commit de7fe38 Author: Alec <[email protected]> Date: Mon Jul 14 21:47:12 2025 -0700 feat: add vllm e2e integration tests (#1935) commit 860f3f7 Author: Keiven C <[email protected]> Date: Mon Jul 14 21:44:19 2025 -0700 chore: metrics endpoint variables renamed from HTTP_SERVER->SYSTEM (#1934) Co-authored-by: Keiven Chang <[email protected]> commit fc402a3 Author: Biswa Panda <[email protected]> Date: Mon Jul 14 21:21:20 2025 -0700 feat: configurable namespace for vllm v1 example (#1909) commit df40d2c Author: ZichengMa <[email protected]> Date: Mon Jul 14 21:11:29 2025 -0700 docs: fix typo and add mount-workspace to vllm doc (#1931) Signed-off-by: ZichengMa <[email protected]> Co-authored-by: Alec <[email protected]> commit 901715b Author: Tanmay Verma <[email protected]> Date: Mon Jul 14 20:14:51 2025 -0700 refactor: Refactor the TRTLLM examples remove dynamo SDK (#1884) commit 5bf23d5 Author: hhzhang16 <[email protected]> Date: Mon Jul 14 18:29:19 2025 -0700 feat: update DynamoGraphDeployments for vllm_v1 (#1890) Co-authored-by: mohammedabdulwahhab <[email protected]> commit 9e76590 Author: ishandhanani <[email protected]> Date: Mon Jul 14 17:29:56 2025 -0700 docs: organize sglang readme (#1910) commit ef59ac8 Author: KrishnanPrash <[email protected]> Date: Mon Jul 14 16:16:44 2025 -0700 docs: TRTLLM Example of Llama4+Eagle3 (Speculative Decoding) (#1828) Signed-off-by: KrishnanPrash <[email protected]> Co-authored-by: Iman Tabrizian <[email protected]> commit 053041e Author: Jorge António <[email protected]> Date: Tue Jul 15 00:06:38 2025 +0100 fix: resolve incorrect finish reason propagation (#1857) commit 3733f58 Author: Graham King <[email protected]> Date: Mon Jul 14 19:04:22 2025 -0400 feat(backends): Python llama.cpp engine (#1925) commit 6a1350c Author: Tushar Sharma <[email protected]> Date: Mon Jul 14 14:56:36 2025 -0700 build: minor improvements to sglang dockerfile (#1917) commit e2a619b Author: Neelay Shah <[email protected]> Date: Mon Jul 14 14:52:53 2025 -0700 fix: remove environment variable passing (#1911) Signed-off-by: Neelay Shah <[email protected]> Co-authored-by: Neelay Shah <[email protected]> commit 3d17a49 Author: Schwinn Saereesitthipitak <[email protected]> Date: Mon Jul 14 14:41:56 2025 -0700 refactor: remove dynamo build (#1778) Signed-off-by: Schwinn Saereesitthipitak <[email protected]> commit 3e0cb07 Author: Anant Sharma <[email protected]> Date: Mon Jul 14 15:43:48 2025 -0400 fix: copy attributions and license to trtllm runtime container (#1916) commit fc36bf5 Author: ishandhanani <[email protected]> Date: Mon Jul 14 12:31:49 2025 -0700 feat: receive kvmetrics from sglang scheduler (#1789) Co-authored-by: zixuanzhang226 <[email protected]> commit df91fce Author: Yan Ru Pei <[email protected]> Date: Mon Jul 14 12:24:04 2025 -0700 feat: prefill aware routing (#1895) commit ad8ad66 Author: Graham King <[email protected]> Date: Mon Jul 14 15:20:35 2025 -0400 feat: Shrink the ai-dynamo wheel by 35 MiB (#1918) Remove http and llmctl binaries. They have been unused for a while. commit 480b41d Author: Graham King <[email protected]> Date: Mon Jul 14 15:06:45 2025 -0400 feat: Python frontend / ingress node (#1912)
commit d4b5414 Author: atchernych <[email protected]> Date: Mon Jul 21 13:10:24 2025 -0700 fix: mypy error (#2029) commit 79337c7 Author: Ryan McCormick <[email protected]> Date: Mon Jul 21 12:12:16 2025 -0700 build: support custom TRTLLM build for commits not on main branch (#2021) commit 95dd942 Author: atchernych <[email protected]> Date: Mon Jul 21 12:09:33 2025 -0700 docs: Post-Merge cleanup of the deploy documentation (#1922) commit cb6de94 Author: ptarasiewiczNV <[email protected]> Date: Sun Jul 20 22:34:50 2025 +0200 chore: Install vLLM and WideEP kernels in vLLM runtime container (#2010) Signed-off-by: Alec <[email protected]> Co-authored-by: Alec <[email protected]> Co-authored-by: alec-flowers <[email protected]> commit fe63c17 Author: Alec <[email protected]> Date: Fri Jul 18 17:45:08 2025 -0700 fix: Revert "feat: add vLLM v1 multi-modal example. Add llama4 Maverick ex… (#2017) commit bf1998f Author: jthomson04 <[email protected]> Date: Fri Jul 18 17:23:50 2025 -0700 fix: Don't detokenize twice in TRT-LLM examples (#1955) commit 343a481 Author: Ryan Olson <[email protected]> Date: Fri Jul 18 16:22:43 2025 -0600 feat: http disconnects (#2014) commit e330d96 Author: Yan Ru Pei <[email protected]> Date: Fri Jul 18 13:40:54 2025 -0700 feat: enable / disable chunked prefill for mockers (#2015) Signed-off-by: Yan Ru Pei <[email protected]> Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> commit 353146e Author: GuanLuo <[email protected]> Date: Fri Jul 18 13:33:36 2025 -0700 feat: add vLLM v1 multi-modal example. Add llama4 Maverick example (#1990) Signed-off-by: GuanLuo <[email protected]> Co-authored-by: krishung5 <[email protected]> commit 1f07dab Author: Jacky <[email protected]> Date: Fri Jul 18 13:04:20 2025 -0700 feat: Add migration to LLM requests (#1930) commit 5f17918 Author: Tanmay Verma <[email protected]> Date: Fri Jul 18 12:59:34 2025 -0700 refactor: Migrate to new UX2 for python launch (#2003) commit fc12436 Author: Graham King <[email protected]> Date: Fri Jul 18 14:52:57 2025 -0400 feat(frontend): router-mode settings (#2001) commit dc75cf1 Author: ptarasiewiczNV <[email protected]> Date: Fri Jul 18 18:47:28 2025 +0200 chore: Move NIXL repo clone to Dockerfiles (#2009) commit f6f392c Author: Iman Tabrizian <[email protected]> Date: Thu Jul 17 18:44:17 2025 -0700 Remove link to the fix for disagg + eagle3 for TRT-LLM example (#2006) Signed-off-by: Iman Tabrizian <[email protected]> commit cc90ca6 Author: atchernych <[email protected]> Date: Thu Jul 17 18:34:40 2025 -0700 feat: Create a convenience script to uninstall Dynamo Deploy CRDs (#1933) commit 267b422 Author: Greg Clark <[email protected]> Date: Thu Jul 17 20:44:21 2025 -0400 chore: loosed python requirement versions (#1998) Signed-off-by: Greg Clark <[email protected]> commit b8474e5 Author: ishandhanani <[email protected]> Date: Thu Jul 17 16:35:05 2025 -0700 chore: update cmake and gap installation and sgl in wideep container (#1991) commit 157a3b0 Author: Biswa Panda <[email protected]> Date: Thu Jul 17 15:38:12 2025 -0700 fix: incorrect helm upgrade command (#2000) commit 0dfca2c Author: Ryan McCormick <[email protected]> Date: Thu Jul 17 15:33:33 2025 -0700 ci: Update trtllm gitlab triggers for new components directory and test script (#1992) commit f3fb09e Author: Kris Hung <[email protected]> Date: Thu Jul 17 14:59:59 2025 -0700 fix: Fix syntax for tokio-console (#1997) commit dacffb8 Author: Biswa Panda <[email protected]> Date: Thu Jul 17 14:57:10 2025 -0700 fix: use non-dev golang image for operator (#1993) commit 2b29a0a Author: zaristei <[email protected]> Date: Thu Jul 17 13:10:42 2025 -0700 fix: Working Arm Build Dockerfile for Vllm_v1 (#1844) commit 2430d89 Author: Ryan McCormick <[email protected]> Date: Thu Jul 17 12:57:46 2025 -0700 test: Add trtllm kv router tests (#1988) commit 1eadc01 Author: Graham King <[email protected]> Date: Thu Jul 17 15:07:41 2025 -0400 feat(runtime): Support tokio-console (#1986) commit b62e633 Author: GuanLuo <[email protected]> Date: Thu Jul 17 11:16:28 2025 -0700 feat: support separate chat_template.jinja file (#1853) commit 8ae3719 Author: Hongkuan Zhou <[email protected]> Date: Thu Jul 17 11:12:35 2025 -0700 chore: add some details to dynamo deploy quickstart and fix deploy.sh (#1978) Signed-off-by: Hongkuan Zhou <[email protected]> Co-authored-by: julienmancuso <[email protected]> commit 08891ff Author: Ryan McCormick <[email protected]> Date: Thu Jul 17 10:57:42 2025 -0700 fix: Update trtllm tests to use new scripts instead of dynamo serve (#1979) commit 49b7a0d Author: Ryan Olson <[email protected]> Date: Thu Jul 17 08:35:04 2025 -0600 feat: record + analyze logprobs (#1957) commit 6d2be14 Author: Biswa Panda <[email protected]> Date: Thu Jul 17 00:17:58 2025 -0700 refactor: replace vllm with vllm_v1 container (#1953) Co-authored-by: alec-flowers <[email protected]> commit 4d2a31a Author: ishandhanani <[email protected]> Date: Wed Jul 16 18:04:09 2025 -0700 chore: add port reservation to utils (#1980) commit 1e3e4a0 Author: Alec <[email protected]> Date: Wed Jul 16 15:54:04 2025 -0700 fix: port race condition through deterministic ports (#1937) commit 4ad281f Author: Tanmay Verma <[email protected]> Date: Wed Jul 16 14:33:51 2025 -0700 refactor: Move TRTLLM example to the component/backends (#1976) commit 57d24a1 Author: Misha Chornyi <[email protected]> Date: Wed Jul 16 14:10:24 2025 -0700 build: Removing shell configuration violations. It's bad practice to hardcod… (#1973) commit 182d3b5 Author: Graham King <[email protected]> Date: Wed Jul 16 16:12:40 2025 -0400 chore(bindings): Remove mistralrs / llama.cpp (#1970) commit def6eaa Author: Harrison Saturley-Hall <[email protected]> Date: Wed Jul 16 15:50:23 2025 -0400 feat: attributions for debian deps of sglang, trtllm, vllm runtime containers (#1971) commit f31732a Author: Yan Ru Pei <[email protected]> Date: Wed Jul 16 11:22:15 2025 -0700 feat: integrate mocker with dynamo-run and python cli (#1927) commit aba6099 Author: Graham King <[email protected]> Date: Wed Jul 16 12:26:32 2025 -0400 perf(router): Remove lock from router hot path (#1963) commit b212103 Author: Hongkuan Zhou <[email protected]> Date: Wed Jul 16 08:55:33 2025 -0700 docs: add notes in docs to deprecate local connector (#1959) commit 7b325ee Author: Biswa Panda <[email protected]> Date: Tue Jul 15 18:52:00 2025 -0700 fix: vllm router examples (#1942) commit a50be1a Author: hhzhang16 <[email protected]> Date: Tue Jul 15 17:58:01 2025 -0700 feat: update CODEOWNERS (#1926) commit e260fdf Author: Harrison Saturley-Hall <[email protected]> Date: Tue Jul 15 18:49:21 2025 -0400 feat: add bitnami helm chart attribution (#1943) Signed-off-by: Harrison Saturley-Hall <[email protected]> Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> commit 1c03404 Author: Biswa Panda <[email protected]> Date: Tue Jul 15 14:26:24 2025 -0700 fix: update inference gateway deployment instructions (#1940) commit 5ca570f Author: Graham King <[email protected]> Date: Tue Jul 15 16:54:03 2025 -0400 chore: Rename dynamo.ingress to dynamo.frontend (#1944) commit 7b9182f Author: Graham King <[email protected]> Date: Tue Jul 15 16:33:07 2025 -0400 chore: Move examples/cli to lib/bindings/examples/cli (#1952) commit 40d40dd Author: Graham King <[email protected]> Date: Tue Jul 15 16:02:19 2025 -0400 chore(multi-modal): Rename frontend.py to web.py (#1951) commit a9e0891 Author: Ryan Olson <[email protected]> Date: Tue Jul 15 12:30:30 2025 -0600 feat: adding http clients and recorded response stream (#1919) commit 4128d58 Author: Biswa Panda <[email protected]> Date: Tue Jul 15 10:30:47 2025 -0700 feat: allow helm upgrade using deploy script (#1936) commit 4da078b Author: Graham King <[email protected]> Date: Tue Jul 15 12:57:38 2025 -0400 fix: Remove OpenSSL dependency, use Rust TLS (#1945) commit fc004d4 Author: jthomson04 <[email protected]> Date: Tue Jul 15 08:45:42 2025 -0700 fix: Fix TRT-LLM container build when using a custom pip wheel (#1825) commit 3c6fc6f Author: ishandhanani <[email protected]> Date: Mon Jul 14 22:35:20 2025 -0700 chore: fix typo (#1938) commit de7fe38 Author: Alec <[email protected]> Date: Mon Jul 14 21:47:12 2025 -0700 feat: add vllm e2e integration tests (#1935) commit 860f3f7 Author: Keiven C <[email protected]> Date: Mon Jul 14 21:44:19 2025 -0700 chore: metrics endpoint variables renamed from HTTP_SERVER->SYSTEM (#1934) Co-authored-by: Keiven Chang <[email protected]> commit fc402a3 Author: Biswa Panda <[email protected]> Date: Mon Jul 14 21:21:20 2025 -0700 feat: configurable namespace for vllm v1 example (#1909) commit df40d2c Author: ZichengMa <[email protected]> Date: Mon Jul 14 21:11:29 2025 -0700 docs: fix typo and add mount-workspace to vllm doc (#1931) Signed-off-by: ZichengMa <[email protected]> Co-authored-by: Alec <[email protected]> commit 901715b Author: Tanmay Verma <[email protected]> Date: Mon Jul 14 20:14:51 2025 -0700 refactor: Refactor the TRTLLM examples remove dynamo SDK (#1884) commit 5bf23d5 Author: hhzhang16 <[email protected]> Date: Mon Jul 14 18:29:19 2025 -0700 feat: update DynamoGraphDeployments for vllm_v1 (#1890) Co-authored-by: mohammedabdulwahhab <[email protected]> commit 9e76590 Author: ishandhanani <[email protected]> Date: Mon Jul 14 17:29:56 2025 -0700 docs: organize sglang readme (#1910) commit ef59ac8 Author: KrishnanPrash <[email protected]> Date: Mon Jul 14 16:16:44 2025 -0700 docs: TRTLLM Example of Llama4+Eagle3 (Speculative Decoding) (#1828) Signed-off-by: KrishnanPrash <[email protected]> Co-authored-by: Iman Tabrizian <[email protected]> commit 053041e Author: Jorge António <[email protected]> Date: Tue Jul 15 00:06:38 2025 +0100 fix: resolve incorrect finish reason propagation (#1857) commit 3733f58 Author: Graham King <[email protected]> Date: Mon Jul 14 19:04:22 2025 -0400 feat(backends): Python llama.cpp engine (#1925) commit 6a1350c Author: Tushar Sharma <[email protected]> Date: Mon Jul 14 14:56:36 2025 -0700 build: minor improvements to sglang dockerfile (#1917) commit e2a619b Author: Neelay Shah <[email protected]> Date: Mon Jul 14 14:52:53 2025 -0700 fix: remove environment variable passing (#1911) Signed-off-by: Neelay Shah <[email protected]> Co-authored-by: Neelay Shah <[email protected]> commit 3d17a49 Author: Schwinn Saereesitthipitak <[email protected]> Date: Mon Jul 14 14:41:56 2025 -0700 refactor: remove dynamo build (#1778) Signed-off-by: Schwinn Saereesitthipitak <[email protected]> commit 3e0cb07 Author: Anant Sharma <[email protected]> Date: Mon Jul 14 15:43:48 2025 -0400 fix: copy attributions and license to trtllm runtime container (#1916) commit fc36bf5 Author: ishandhanani <[email protected]> Date: Mon Jul 14 12:31:49 2025 -0700 feat: receive kvmetrics from sglang scheduler (#1789) Co-authored-by: zixuanzhang226 <[email protected]> commit df91fce Author: Yan Ru Pei <[email protected]> Date: Mon Jul 14 12:24:04 2025 -0700 feat: prefill aware routing (#1895) commit ad8ad66 Author: Graham King <[email protected]> Date: Mon Jul 14 15:20:35 2025 -0400 feat: Shrink the ai-dynamo wheel by 35 MiB (#1918) Remove http and llmctl binaries. They have been unused for a while. commit 480b41d Author: Graham King <[email protected]> Date: Mon Jul 14 15:06:45 2025 -0400 feat: Python frontend / ingress node (#1912)
Overview:
Details:
Demo:
2025-07-14-deploy-vllm-v1-disagg.mp4
Where should the reviewer start?
Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)
Summary by CodeRabbit
Documentation
New Features
Refactor