fix: dp_rank always 0 in non-KV router mode (#7984)
Conversation
🧹 Nitpick comments (1)
lib/llm/src/kv_router/prefill_router/execution.rs (1)
193-207: The `unwrap_or(0)` on `dp_rank` is never used in practice. The function returns `Option<(u64, u32)>` containing the extracted dp_rank, but both call sites intentionally discard this value:
- mod.rs:171: uses `_worker_info` (explicit discard pattern)
- execution.rs:235-240: uses `Ok(_)` (discards the entire result)

The return value is built but never consumed. If the PR theme is standardizing dp_rank handling, consider either removing this extraction entirely or documenting why it's kept for future use.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@lib/llm/src/kv_router/prefill_router/execution.rs` around lines 193 - 207, The extracted dp_rank inside the disaggregated_params parsing (variable prefill_worker_info built from disaggregated_params.get("worker_id") and fields prefill_worker_id / prefill_dp_rank) is never used; either remove dp_rank extraction and simplify the tuple to only return the needed worker_id or drop the entire prefill_worker_info extraction if callers ignore the result. Update the code in execution.rs where prefill_worker_info is constructed: remove the prefill_dp_rank.get("prefill_dp_rank") mapping and the map/unwrap_or(0) use, and adjust the Some((worker_id, dp_rank)) to only return Some(worker_id) (and change the Option type accordingly) or delete the block and any references to prefill_worker_info so no unused value is produced; if you keep the field for future use, add a clear TODO comment near disaggregated_params / prefill_worker_info explaining why dp_rank is retained.
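If the extraction is simplified as the comment suggests, it could look roughly like the sketch below. This is illustrative only: the field name comes from the review comment, and a plain string map stands in for the real `disaggregated_params` structure in execution.rs.

```rust
use std::collections::HashMap;

// Illustrative sketch of the suggested simplification: return only the
// worker_id and drop the never-consumed dp_rank mapping, narrowing the
// return type from Option<(u64, u32)> to Option<u64>.
fn extract_prefill_worker_id(params: &HashMap<String, String>) -> Option<u64> {
    params.get("prefill_worker_id")?.parse().ok()
}

fn main() {
    let mut params = HashMap::new();
    params.insert("prefill_worker_id".to_string(), "42".to_string());
    params.insert("prefill_dp_rank".to_string(), "3".to_string());

    // The dp_rank entry is intentionally ignored.
    assert_eq!(extract_prefill_worker_id(&params), Some(42));
    assert_eq!(extract_prefill_worker_id(&HashMap::new()), None);
    println!("ok");
}
```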
📒 Files selected for processing (3)
- lib/llm/src/kv_router/prefill_router/execution.rs
- lib/llm/src/kv_router/prefill_router/mod.rs
- lib/llm/src/kv_router/prefill_router/types.rs
Verified this fix with a 1P2D disaggregated benchmark.
This gives ~2.2x throughput recovery and ~4.2x TTFT improvement, closing the perf gap reported here: #7984 (comment)
…print PrefillRouter::query_prefill_worker returns Option<u32> for dp_rank. The C FFI wrapper was declaring u32, causing E0308 in clippy. Map None to u32::MAX (NO_DP_RANK sentinel) so the Python side sees _DP_RANK_UNSET.
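A minimal sketch of that mapping follows. The `NO_DP_RANK` constant name comes from the commit message above; the helper function name is hypothetical, not the actual FFI wrapper code.

```rust
// Sentinel used where Option<u32> cannot cross the C FFI boundary; the
// Python side is described as translating it back to _DP_RANK_UNSET.
const NO_DP_RANK: u32 = u32::MAX;

// Hypothetical helper showing the flattening: None becomes the sentinel,
// Some(rank) passes through unchanged.
fn encode_dp_rank_for_ffi(dp_rank: Option<u32>) -> u32 {
    dp_rank.unwrap_or(NO_DP_RANK)
}

fn main() {
    assert_eq!(encode_dp_rank_for_ffi(Some(2)), 2);
    assert_eq!(encode_dp_rank_for_ffi(None), u32::MAX);
    println!("ok");
}
```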
None for dp rank in round robin
wow, what a catch, congrats
according to codex
regression test here
We should plan a follow-up to clean up the dp_rank plumbing in the codebase.
Problem
In non-KV-router mode (e.g. NATS with round-robin), all prefill requests are routed with dp_rank=0, pinning 100% of prefill work to DP rank 0 while ranks 1-3 sit idle. This wastes most of the available prefill capacity.

Root cause
PR #6736 attempted to fix this with a `u32::MAX` sentinel value in the `prefill_router.rs` SimpleRouter path, converted to `None` on the Python side. However, the sentinel never reaches the Python handler: the value arrives as 0 through the NATS transport path, bypassing the sentinel logic. Notably, this regression only surfaced during benchmarking with the next PR, #7617; we will investigate why in a follow-up.

Evidence (from GB200 cluster, 4P8D DeepSeek-R1 FP4, C=4096)
HEAD (dp_rank=0 bug):
This PR (dp_rank=None):
Decode workers are unaffected — they distribute evenly in both cases.
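The failure mode behind the root cause can be illustrated in miniature. This is not the actual transport code (the struct names are invented); it only shows why a sentinel baked into a plain u32 is fragile, while Option<u32> keeps "unset" explicit at every hop.

```rust
// Hypothetical wire structs. With a plain u32, any hop that rebuilds the
// message from defaults collapses the u32::MAX sentinel into 0, which is
// indistinguishable from a genuine DP rank 0.
#[derive(Clone, Copy, Debug, PartialEq, Default)]
struct PrefillMsgU32 {
    dp_rank: u32,
}

// With Option<u32>, the default is None, so "no rank selected" survives
// without relying on a sentinel convention.
#[derive(Clone, Copy, Debug, PartialEq, Default)]
struct PrefillMsgOpt {
    dp_rank: Option<u32>,
}

fn main() {
    let dropped = PrefillMsgU32::default();
    assert_eq!(dropped.dp_rank, 0); // sentinel gone: looks like rank 0

    let kept = PrefillMsgOpt::default();
    assert_eq!(kept.dp_rank, None); // still explicitly unset
    println!("ok");
}
```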
Fix
Change `dp_rank` from `u32` to `Option<u32>` in the Rust prefill router types. This properly propagates `None` when no specific DP rank is selected, instead of relying on a sentinel value that gets lost in transit.

Changes
- `types.rs`: `dp_rank: u32` → `dp_rank: Option<u32>` in `PrefillResolveDecision`
- `execution.rs`: remove the `.unwrap_or(0)` fallback, pass the `Option` through
- `mod.rs`: assign `dp_rank` directly as an `Option`

Test plan
- dp_rank arrives as `None` (not `0`) in prefill handler logs

Context
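For reference, the shape of the type change can be sketched as follows. The `PrefillResolveDecision` field name comes from the PR summary; the surrounding function is illustrative, not the actual module code.

```rust
// Illustrative struct mirroring the change in types.rs: dp_rank is now
// Option<u32> rather than u32.
#[derive(Debug, PartialEq)]
struct PrefillResolveDecision {
    worker_id: u64,
    dp_rank: Option<u32>, // was: dp_rank: u32
}

// Hypothetical resolve path: the Option is passed through untouched,
// instead of the old `.unwrap_or(0)` that silently pinned DP rank 0.
fn resolve(worker_id: u64, dp_rank: Option<u32>) -> PrefillResolveDecision {
    PrefillResolveDecision { worker_id, dp_rank }
}

fn main() {
    assert_eq!(resolve(1, None).dp_rank, None);
    assert_eq!(resolve(1, Some(3)).dp_rank, Some(3));
    println!("ok");
}
```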