Description
Describe the Bug
Problem Description
The SGLang planner fails to detect and manage SGLang workers during autoscaling operations, causing division by zero errors and preventing proper scaling functionality.
Root Cause
There's a mismatch between the component names that SGLang workers register under and what the planner expects to find:
SGLang workers register as:
- Default workers: instances/dynamo/backend/generate
- Prefill workers: instances/dynamo/prefill/generate (when using --disaggregation-mode prefill)
But the SGLang planner expects to find:
- Prefill workers: instances/dynamo/worker/generate
- Decode workers: instances/dynamo/decode/generate
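The mismatch can be made concrete with a small set comparison. The sketch below is illustrative only: endpoint_key is a hypothetical helper, and the instances/<namespace>/<component>/<endpoint> layout is inferred from the paths quoted in this report, not taken from the Dynamo source.

```python
def endpoint_key(component: str, namespace: str = "dynamo", endpoint: str = "generate") -> str:
    """Hypothetical helper: build the key prefix seen in the worker logs."""
    return f"instances/{namespace}/{component}/{endpoint}"


# Component names SGLang workers register under, per this report.
registered = {endpoint_key("backend"), endpoint_key("prefill")}

# Component names the SGLang planner looks up, per this report.
expected = {endpoint_key("worker"), endpoint_key("decode")}

# Keys the planner queries that no worker ever registers.
missing = sorted(expected - registered)
print(missing)
```

Because the two sets share no keys, every lookup the planner performs comes back empty, which is consistent with the worker counts of 0 in the logs.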
The expected names are defined in components/planner/src/dynamo/planner/defaults.py:
class SGLangComponentName:
    prefill_worker_component_name = "worker"  # ← Should be "prefill"
    decode_worker_component_name = "decode"  # ← Should be "backend"
Comparison with vLLM
vLLM works correctly because its component names align with worker registration:
class VllmComponentName:
    prefill_worker_component_name = "prefill"  # Matches registration
    decode_worker_component_name = "backend"  # Matches registration
Logs
[INFO] planner_core.observe_metrics: Number of prefill workers: 0, number of decode workers: 0
[ERROR] planner_core.make_adjustments: Failed to correct prediction factors: float division by zero
Meanwhile, SGLang worker logs show successful registration:
[DEBUG] Starting endpoint: instances/dynamo/backend/generate:40239932a4066516
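The division by zero is what one would expect when a per-worker quantity is computed with a worker count of 0. The planner's actual correction logic is not shown in this report, so the following is only a sketch of a defensive guard, using a hypothetical helper named per_worker_load. Such a guard would not fix the discovery problem, but it would surface "no workers found" explicitly instead of a float division error.

```python
from typing import Optional


def per_worker_load(total_load: float, num_workers: int) -> Optional[float]:
    """Hypothetical helper, not the planner's code.

    Returns the load per worker, or None when no workers were discovered,
    so the caller can skip the adjustment and log a clear message.
    """
    if num_workers <= 0:
        return None
    return total_load / num_workers


# With zero discovered workers the caller gets None rather than an exception.
print(per_worker_load(10.0, 0))
print(per_worker_load(10.0, 4))
```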
Environment
- Dynamo version: 0.4.1
- Backend: SGLang
- Deployment: Kubernetes with DynamoGraphDeployment
- Planner: SLA-based autoscaling
Workaround
The current workaround is to manually specify --endpoint arguments in the SGLang worker configurations to override the default component names.
Steps to Reproduce
- Build the 0.4.1 image from the canonical SGLang Dockerfile.
- Deploy with a custom disagg_planner.yaml so that the planner runs alongside the workers.
- Observe that the planner does not find any workers.
Expected Behavior
The SGLang planner should detect workers automatically, just as the vLLM planner does. That is, it should look for workers at the endpoints they actually register under:
- Default workers: instances/dynamo/backend/generate
- Prefill workers: instances/dynamo/prefill/generate (when using --disaggregation-mode prefill)
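One way to meet this expectation, following the inline comments in the Root Cause section, is to align the SGLang component names with the vLLM ones. This is a sketch of that suggestion, not a confirmed patch: only the two attributes quoted in this report are shown, the real classes in defaults.py may define more, and component_names is a hypothetical helper added for the comparison.

```python
class VllmComponentName:
    # Values quoted in this report; they match worker registration.
    prefill_worker_component_name = "prefill"
    decode_worker_component_name = "backend"


class SGLangComponentName:
    # Proposed values from the report's inline comments (was "worker" / "decode").
    prefill_worker_component_name = "prefill"
    decode_worker_component_name = "backend"


def component_names(cls) -> tuple:
    """Hypothetical helper: return (prefill, decode) component names of a class."""
    return (cls.prefill_worker_component_name, cls.decode_worker_component_name)


# After the change, both backends resolve to the same component names.
aligned = component_names(SGLangComponentName) == component_names(VllmComponentName)
print(aligned)
```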
Actual Behavior
- SGLang workers start and register successfully
- Planner starts and looks for workers at the wrong endpoints
- Planner reports "Number of prefill workers: 0, number of decode workers: 0"
- Scaling calculations fail with "Failed to correct prediction factors: float division by zero"
- No autoscaling occurs