Merged
36 commits
75101b6
fix: wip
mohammedabdulwahhab Aug 11, 2025
1323498
fix: fix
mohammedabdulwahhab Aug 11, 2025
78ca1cc
fix: fix
mohammedabdulwahhab Aug 11, 2025
8514e2a
Merge branch 'main' of https://github.com/ai-dynamo/dynamo into mabdu…
mohammedabdulwahhab Aug 11, 2025
903baf3
fix: refactor main component type to frontend
mohammedabdulwahhab Aug 11, 2025
86759aa
fix: tests partially fixed
mohammedabdulwahhab Aug 12, 2025
045bae6
fix: parameterize component factory with single vs multinode and fix …
mohammedabdulwahhab Aug 12, 2025
bac8c3f
fix: update vllm yamls to use defaults
mohammedabdulwahhab Aug 12, 2025
b7fb92c
fix: update sglang yamls
mohammedabdulwahhab Aug 12, 2025
a618c0c
fix: trtllm yamls
mohammedabdulwahhab Aug 12, 2025
c7d5ad4
fix: add planner component defaults
mohammedabdulwahhab Aug 12, 2025
97b96c5
fix: set planner defaults
mohammedabdulwahhab Aug 12, 2025
286069f
fix: ai lint yaml files
mohammedabdulwahhab Aug 12, 2025
a2f4110
fix: more tee removals
mohammedabdulwahhab Aug 12, 2025
2ebaf90
fix: more lint
mohammedabdulwahhab Aug 12, 2025
3a79d26
Update components/backends/vllm/deploy/disagg_planner.yaml
mohammedabdulwahhab Aug 12, 2025
fe9c153
fix: fix
mohammedabdulwahhab Aug 12, 2025
4cf9394
Merge branch 'mabdulwahhab/defaults' of https://github.com/ai-dynamo/…
mohammedabdulwahhab Aug 12, 2025
880d2b4
fix: fix merge conflicts
mohammedabdulwahhab Aug 12, 2025
b9f9c43
fix: remove backend param
mohammedabdulwahhab Aug 12, 2025
d3eb5d3
Apply suggestions from code review
mohammedabdulwahhab Aug 12, 2025
352a4e7
fix: remove multinode guard and fix tests
mohammedabdulwahhab Aug 12, 2025
04919b7
Merge branch 'mabdulwahhab/defaults' of https://github.com/ai-dynamo/…
mohammedabdulwahhab Aug 12, 2025
1a58890
fix: fix role
mohammedabdulwahhab Aug 12, 2025
e84d253
fix: planner should add a service account
mohammedabdulwahhab Aug 13, 2025
cbd90e9
fix: add startup probe overrides, add checkMainContainerOverrides
mohammedabdulwahhab Aug 13, 2025
042092e
fix: restore prometheus comp in disagg_planner to use componentType f…
mohammedabdulwahhab Aug 13, 2025
0a738a8
Merge branch 'main' of https://github.com/ai-dynamo/dynamo into mabdu…
mohammedabdulwahhab Aug 13, 2025
66dbc51
fix: update prometheus for sglang as well
mohammedabdulwahhab Aug 13, 2025
d5f6b2d
Merge branch 'main' of https://github.com/ai-dynamo/dynamo into mabdu…
mohammedabdulwahhab Aug 13, 2025
1a05dab
fix: remove validate main container
mohammedabdulwahhab Aug 14, 2025
bf8db83
Merge branch 'main' of https://github.com/ai-dynamo/dynamo into mabdu…
mohammedabdulwahhab Aug 14, 2025
e54b451
fix: fix sglang disagg planner
mohammedabdulwahhab Aug 14, 2025
20e84e5
Apply suggestions from code review
mohammedabdulwahhab Aug 14, 2025
117b0ce
Merge branch 'main' of https://github.com/ai-dynamo/dynamo into mabdu…
mohammedabdulwahhab Aug 14, 2025
7aa3627
Merge branch 'mabdulwahhab/defaults' of https://github.com/ai-dynamo/…
mohammedabdulwahhab Aug 14, 2025
48 changes: 1 addition & 47 deletions components/backends/sglang/deploy/agg.yaml
@@ -8,26 +8,8 @@ metadata:
spec:
services:
Frontend:
livenessProbe:
httpGet:
path: /health
port: 8000
initialDelaySeconds: 20
periodSeconds: 5
timeoutSeconds: 5
failureThreshold: 3
readinessProbe:
exec:
command:
- /bin/sh
- -c
- 'curl -s http://localhost:8000/health | jq -e ".status == \"healthy\""'
initialDelaySeconds: 60
periodSeconds: 60
timeoutSeconds: 30
failureThreshold: 10
dynamoNamespace: sglang-agg
componentType: main
componentType: frontend
replicas: 1
resources:
requests:
@@ -45,21 +27,6 @@ spec:
- "python3 -m dynamo.sglang.utils.clear_namespace --namespace sglang-agg && python3 -m dynamo.frontend --http-port=8000"
SGLangDecodeWorker:
envFromSecret: hf-token-secret
livenessProbe:
httpGet:
path: /live
port: 9090
periodSeconds: 5
timeoutSeconds: 30
failureThreshold: 1
readinessProbe:
exec:
httpGet:
path: /health
port: 9090
periodSeconds: 10
timeoutSeconds: 30
failureThreshold: 60
dynamoNamespace: sglang-agg
componentType: worker
replicas: 1
@@ -72,21 +39,8 @@ spec:
cpu: "32"
memory: "80Gi"
gpu: "1"
envs:
- name: DYN_SYSTEM_ENABLED
value: "true"
- name: DYN_SYSTEM_USE_ENDPOINT_HEALTH_STATUS
value: "[\"generate\"]"
- name: DYN_SYSTEM_PORT
value: "9090"
extraPodSpec:
mainContainer:
startupProbe:
httpGet:
path: /live
port: 9090
periodSeconds: 10
failureThreshold: 60
image: my-registry/sglang-runtime:my-tag
workingDir: /workspace/components/backends/sglang
command:
48 changes: 1 addition & 47 deletions components/backends/sglang/deploy/agg_router.yaml
@@ -8,26 +8,8 @@ metadata:
spec:
services:
Frontend:
livenessProbe:
httpGet:
path: /health
port: 8000
initialDelaySeconds: 20
periodSeconds: 5
timeoutSeconds: 5
failureThreshold: 3
readinessProbe:
exec:
command:
- /bin/sh
- -c
- 'curl -s http://localhost:8000/health | jq -e ".status == \"healthy\""'
initialDelaySeconds: 60
periodSeconds: 60
timeoutSeconds: 30
failureThreshold: 10
dynamoNamespace: sglang-agg-router
componentType: main
componentType: frontend
replicas: 1
resources:
requests:
@@ -45,21 +27,6 @@ spec:
- "python3 -m dynamo.sglang.utils.clear_namespace --namespace sglang-agg-router && python3 -m dynamo.frontend --http-port=8000 --router-mode kv"
SGLangDecodeWorker:
envFromSecret: hf-token-secret
livenessProbe:
httpGet:
path: /live
port: 9090
periodSeconds: 5
timeoutSeconds: 30
failureThreshold: 1
readinessProbe:
exec:
httpGet:
path: /health
port: 9090
periodSeconds: 10
timeoutSeconds: 30
failureThreshold: 60
dynamoNamespace: sglang-agg-router
componentType: worker
replicas: 1
@@ -72,21 +39,8 @@ spec:
cpu: "32"
memory: "80Gi"
gpu: "1"
envs:
- name: DYN_SYSTEM_ENABLED
value: "true"
- name: DYN_SYSTEM_USE_ENDPOINT_HEALTH_STATUS
value: "[\"generate\"]"
- name: DYN_SYSTEM_PORT
value: "9090"
extraPodSpec:
mainContainer:
startupProbe:
httpGet:
path: /live
port: 9090
periodSeconds: 10
failureThreshold: 60
image: my-registry/sglang-runtime:my-tag
workingDir: /workspace/components/backends/sglang
command:
76 changes: 1 addition & 75 deletions components/backends/sglang/deploy/disagg.yaml
@@ -8,26 +8,8 @@ metadata:
spec:
services:
Frontend:
livenessProbe:
httpGet:
path: /health
port: 8000
initialDelaySeconds: 20
periodSeconds: 5
timeoutSeconds: 5
failureThreshold: 3
readinessProbe:
exec:
command:
- /bin/sh
- -c
- 'curl -s http://localhost:8000/health | jq -e ".status == \"healthy\""'
initialDelaySeconds: 60
periodSeconds: 60
timeoutSeconds: 30
failureThreshold: 10
dynamoNamespace: sglang-disagg
componentType: main
componentType: frontend
replicas: 1
resources:
requests:
@@ -45,21 +27,6 @@ spec:
- "python3 -m dynamo.sglang.utils.clear_namespace --namespace sglang-disagg && python3 -m dynamo.frontend --http-port=8000"
SGLangDecodeWorker:
envFromSecret: hf-token-secret
livenessProbe:
httpGet:
path: /live
port: 9090
periodSeconds: 5
timeoutSeconds: 30
failureThreshold: 1
readinessProbe:
exec:
httpGet:
path: /health
port: 9090
periodSeconds: 10
timeoutSeconds: 30
failureThreshold: 60
dynamoNamespace: sglang-disagg
componentType: worker
replicas: 1
@@ -72,21 +39,8 @@ spec:
cpu: "32"
memory: "80Gi"
gpu: "1"
envs:
- name: DYN_SYSTEM_ENABLED
value: "true"
- name: DYN_SYSTEM_USE_ENDPOINT_HEALTH_STATUS
value: "[\"generate\"]"
- name: DYN_SYSTEM_PORT
value: "9090"
extraPodSpec:
mainContainer:
startupProbe:
httpGet:
path: /live
port: 9090
periodSeconds: 10
failureThreshold: 60
image: nvcr.io/nvidian/nim-llm-dev/sglang-runtime:hzhou-0808-07
workingDir: /workspace/components/backends/sglang
command:
@@ -112,21 +66,6 @@ spec:
- "nixl"
SGLangPrefillWorker:
envFromSecret: hf-token-secret
livenessProbe:
httpGet:
path: /live
port: 9090
periodSeconds: 5
timeoutSeconds: 30
failureThreshold: 1
readinessProbe:
exec:
httpGet:
path: /health
port: 9090
periodSeconds: 10
timeoutSeconds: 30
failureThreshold: 60
dynamoNamespace: sglang-disagg
componentType: worker
replicas: 1
@@ -139,21 +78,8 @@ spec:
cpu: "32"
memory: "80Gi"
gpu: "1"
envs:
- name: DYN_SYSTEM_ENABLED
value: "true"
- name: DYN_SYSTEM_USE_ENDPOINT_HEALTH_STATUS
value: "[\"generate\"]"
- name: DYN_SYSTEM_PORT
value: "9090"
extraPodSpec:
mainContainer:
startupProbe:
httpGet:
path: /health
port: 9090
periodSeconds: 10
failureThreshold: 60
image: nvcr.io/nvidian/nim-llm-dev/sglang-runtime:hzhou-0808-07
workingDir: /workspace/components/backends/sglang
command:
78 changes: 3 additions & 75 deletions components/backends/sglang/deploy/disagg_planner.yaml
@@ -16,25 +16,7 @@ spec:
services:
Frontend:
dynamoNamespace: dynamo
livenessProbe:
httpGet:
path: /health
port: 8000
initialDelaySeconds: 20
periodSeconds: 5
timeoutSeconds: 5
failureThreshold: 3
readinessProbe:
exec:
command:
- /bin/sh
- -c
- 'curl -s http://localhost:8000/health | jq -e ".status == \"healthy\""'
initialDelaySeconds: 60
periodSeconds: 60
timeoutSeconds: 30
failureThreshold: 10
componentType: main
componentType: frontend
replicas: 1
resources:
requests:
@@ -97,9 +79,9 @@ spec:
- --backend=sglang
- --adjustment-interval=60
- --profile-results-dir=/workspace/profiling_results
Prometheus:
Prometheus: # NOTE: this is set on Prometheus to ensure a service is created for the Prometheus component. This is a workaround and should be managed differently.
dynamoNamespace: dynamo
componentType: main
componentType: frontend
replicas: 1
envs:
- name: PYTHONPATH
@@ -142,20 +124,6 @@ spec:
SGLangDecodeWorker:
dynamoNamespace: dynamo
envFromSecret: hf-token-secret
livenessProbe:
httpGet:
path: /live
port: 9090
periodSeconds: 5
timeoutSeconds: 30
failureThreshold: 1
readinessProbe:
httpGet:
path: /health
port: 9090
periodSeconds: 10
timeoutSeconds: 30
failureThreshold: 60
componentType: worker
replicas: 2
resources:
@@ -167,21 +135,8 @@ spec:
cpu: "32"
memory: "80Gi"
gpu: "1"
envs:
- name: DYN_SYSTEM_ENABLED
value: "true"
- name: DYN_SYSTEM_USE_ENDPOINT_HEALTH_STATUS
value: "[\"generate\"]"
- name: DYN_SYSTEM_PORT
value: "9090"
extraPodSpec:
mainContainer:
startupProbe:
httpGet:
path: /live
port: 9090
periodSeconds: 10
failureThreshold: 60
image: nvcr.io/nvidian/nim-llm-dev/sglang-runtime:hzhou-0811-1
workingDir: /workspace/components/backends/sglang
args:
@@ -205,20 +160,6 @@ spec:
SGLangPrefillWorker:
dynamoNamespace: dynamo
envFromSecret: hf-token-secret
livenessProbe:
httpGet:
path: /live
port: 9090
periodSeconds: 5
timeoutSeconds: 30
failureThreshold: 1
readinessProbe:
httpGet:
path: /health
port: 9090
periodSeconds: 10
timeoutSeconds: 30
failureThreshold: 60
componentType: worker
replicas: 2
resources:
@@ -230,21 +171,8 @@ spec:
cpu: "32"
memory: "80Gi"
gpu: "1"
envs:
- name: DYN_SYSTEM_ENABLED
value: "true"
- name: DYN_SYSTEM_USE_ENDPOINT_HEALTH_STATUS
value: "[\"generate\"]"
- name: DYN_SYSTEM_PORT
value: "9090"
extraPodSpec:
mainContainer:
startupProbe:
httpGet:
path: /health
port: 9090
periodSeconds: 10
failureThreshold: 60
image: nvcr.io/nvidian/nim-llm-dev/sglang-runtime:hzhou-0811-1
workingDir: /workspace/components/backends/sglang
args: