@@ -114,7 +114,12 @@ async def generate(
 }

 input_param = self._get_input_param(inner_request)
-priority = (inner_request.get("routing") or {}).get("priority")
+routing = inner_request.get("routing") or {}
+priority = routing.get("priority")
+dp_rank = routing.get("dp_rank")
+
+if dp_rank is not None and dp_rank == 2**32 - 1:
+    dp_rank = None
Comment on lines +117 to +122

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check decode_handler.py for dp_rank handling
rg -n -C5 "dp_rank" components/src/dynamo/sglang/request_handlers/llm/decode_handler.py

Repository: ai-dynamo/dynamo

Length of output: 2097


🏁 Script executed:

# Check how async_generate uses data_parallel_rank
rg -n -B2 -A5 "data_parallel_rank" components/src/dynamo/sglang/request_handlers/llm/ --type py | head -100

Repository: ai-dynamo/dynamo

Length of output: 2735


🏁 Script executed:

# Check vllm/handlers.py to understand the lenient routing pattern mentioned in learnings
rg -n -B3 -A8 "_to_local_dp_rank" components/src/dynamo/vllm/handlers.py

Repository: ai-dynamo/dynamo

Length of output: 2187


🏁 Script executed:

# Check if there are other handlers that normalize dp_rank
fd -e py components/src/dynamo/sglang/request_handlers | xargs rg -l "dp_rank" | head -20

Repository: ai-dynamo/dynamo

Length of output: 1396


🏁 Script executed:

# Search for any shared normalization logic or base handler class
rg -n "def.*dp_rank" components/src/dynamo/sglang/request_handlers/ --type py

Repository: ai-dynamo/dynamo

Length of output: 42


🏁 Script executed:

# Check the base handler class if one exists
fd -e py . components/src/dynamo/sglang/request_handlers | xargs rg -l "class.*Handler" | head -5

Repository: ai-dynamo/dynamo

Length of output: 454


🏁 Script executed:

# Check when/if the normalization was added to prefill_handler
git log -p --all -S "2**32 - 1" -- components/src/dynamo/sglang/request_handlers/llm/prefill_handler.py | head -150

Repository: ai-dynamo/dynamo

Length of output: 42


Apply the same u32::MAX normalization in decode_handler.py.

The prefill_handler normalizes the sentinel value 2**32 - 1 (u32::MAX) to None at lines 121–122 before passing dp_rank to async_generate. However, decode_handler.py extracts dp_rank from routing at lines 138 and 180 without performing this normalization, then passes it directly to async_generate at lines 150 and 190.

Both handlers use the same async_generate interface and should normalize the sentinel value consistently. Add the same normalization check in decode_handler.py:

if dp_rank is not None and dp_rank == 2**32 - 1:
    dp_rank = None

This keeps the two handlers consistent: neither one forwards the u32::MAX sentinel to async_generate as if it were a real data-parallel rank.
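To avoid repeating the check at each call site, the normalization could live in one small helper that both handlers call. The sketch below is only an illustration: `DP_RANK_UNSET`, `normalize_dp_rank`, and `extract_dp_rank` are hypothetical names that do not exist in the repository, and it assumes the sentinel means "no rank selected".

```python
from typing import Any, Dict, Optional

# Hypothetical constant: u32::MAX, assumed here to mean "no dp_rank selected".
DP_RANK_UNSET = 2**32 - 1


def normalize_dp_rank(dp_rank: Optional[int]) -> Optional[int]:
    """Map a missing value or the u32::MAX sentinel to None; pass real ranks through."""
    if dp_rank is None or dp_rank == DP_RANK_UNSET:
        return None
    return dp_rank


def extract_dp_rank(request: Dict[str, Any]) -> Optional[int]:
    """Read routing.dp_rank from a request dict, tolerating a missing or null routing block."""
    routing = request.get("routing") or {}
    return normalize_dp_rank(routing.get("dp_rank"))
```

With a helper like this, rank 0 stays a valid rank while the sentinel and an absent value both become None, so each call to async_generate receives the same normalized value.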

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@components/src/dynamo/sglang/request_handlers/llm/prefill_handler.py` around
lines 117 - 122, decode_handler.py fails to normalize the sentinel u32::MAX
value for dp_rank before calling async_generate; find the places where
routing.get("dp_rank") is read (the dp_rank extractions around the decode
handler branches that later call async_generate) and add the same check used in
prefill_handler: if dp_rank is not None and dp_rank == 2**32 - 1: dp_rank = None
so that dp_rank passed into async_generate is normalized consistently across
handlers.


 trace_header = self._get_trace_header(context) if self.enable_trace else None

@@ -127,6 +132,7 @@ async def generate(
     bootstrap_room=bootstrap_room,
     external_trace_header=trace_header,
     rid=trace_id,
+    data_parallel_rank=dp_rank,
     **self._priority_kwargs(priority),
 )

2 changes: 1 addition & 1 deletion lib/llm/src/kv_router/prefill_router.rs
@@ -537,7 +537,7 @@ impl PrefillRouter {
 r.peek_next_worker()
 }
 .ok_or_else(|| anyhow::anyhow!("No workers available for prefill"))?;
-Ok((worker_id, 0))
+Ok((worker_id, u32::MAX))
 }
 }
 }