feat(sglang): disagg DP rank routing + backwards-compatible network imports#6736
feat(sglang): disagg DP rank routing + backwards-compatible network imports#6736ishandhanani merged 11 commits intomainfrom
Conversation
The prefill handler was missing the data_parallel_rank parameter in its async_generate call, preventing DP rank-aware routing from working in disaggregated mode. The decode handler already passes this correctly. Extract dp_rank from the routing info (set by the KV router in prefill_router.rs) and forward it to SGLang's engine so the prefill scheduler directs work to the correct DP rank. This works in conjunction with sgl-project/sglang#19168, which adds per-request DP rank resolution on the SGLang side -- the decode worker can now resolve the prefill DP rank via the bootstrap server rather than relying on bootstrap_room % dp_size.
- Revert publisher.py changes (PR #6736 handles the SGLang compat) - Unify /// doc comments to // regular comments in reasoning parser tests Signed-off-by: Matej Kosec <mkosec@nvidia.com>
- Revert publisher.py changes (PR #6736 handles the SGLang compat) - Unify /// doc comments to // regular comments in reasoning parser tests Signed-off-by: Matej Kosec <mkosec@nvidia.com>
SGLang post-0.5.9 moved network helpers to sglang.srt.utils.network and introduced NetworkAddress. This adds _compat.py that tries the new path first and falls back to the old path with a minimal polyfill, so Dynamo works with both sglang 0.5.9 and main.
Each fallback branch must note which version it supports and when it can be removed. Old fallbacks are cleaned up when that version falls outside the 1-version-back support window.
WalkthroughThis change introduces a SGLang network compatibility layer ( Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes 🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
components/src/dynamo/sglang/publisher.py (1)
30-51:⚠️ Potential issue | 🟠 MajorOnly replace wildcard-style hosts here.
This rewrite now discards any explicit host in
endpoint_templateand always substitutesip_address. That changes the old wildcard-replacement contract and breaks configs liketcp://127.0.0.1:5557, where the subscriber needs to keep the configured loopback host.🔧 Suggested fix
def format_zmq_endpoint(endpoint_template: str, ip_address: str) -> str: @@ parsed = urlparse(endpoint_template) - if parsed.scheme != "tcp" or parsed.port is None: + if parsed.scheme != "tcp" or parsed.port is None or parsed.hostname is None: raise ValueError(f"Expected tcp://host:port endpoint, got {endpoint_template!r}") - return NetworkAddress(ip_address, parsed.port).to_tcp() + host = parsed.hostname + if host in {"*", "0.0.0.0", "::"}: + host = ip_address + return NetworkAddress(host, parsed.port).to_tcp()🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@components/src/dynamo/sglang/publisher.py` around lines 30 - 51, format_zmq_endpoint currently always replaces the host with ip_address, breaking explicit hosts; change it to only substitute when the template host is a wildcard (e.g., '*' or the universal bind addresses '0.0.0.0' or '::'). In format_zmq_endpoint, inspect parsed.hostname (from urlparse(endpoint_template)) and if it is not a wildcard/universal value, return the original endpoint_template (or reconstruct and return parsed.scheme + host + port) unchanged; otherwise call NetworkAddress(ip_address, parsed.port).to_tcp() as before. Ensure you still validate scheme == "tcp" and parsed.port is present and only perform substitution when hostname indicates a wildcard bind.components/src/dynamo/sglang/register.py (1)
8-15:⚠️ Potential issue | 🟡 MinorCommit the
isortrewrite for this import block.Pre-merge is already failing on this file, so the new
_compatimport still needs the repository’s standard ordering applied before this can merge.As per coding guidelines, "Follow import ordering via isort (stdlib → third-party → first-party) and run
ruff format/ruff checkor pre-commit on touched Python files."🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@components/src/dynamo/sglang/register.py` around lines 8 - 15, The import block in register.py is out of isort order; reorder imports into stdlib → third-party → first-party groups and run the project's formatter (isort/ruff) so the new `dynamo.sglang._compat` import is placed in the first-party group. Specifically, group and sort the imports that reference sglang (sgl, ServerArgs), dynamo.sglang._compat (NetworkAddress, get_local_ip_auto), dynamo._core (Endpoint), dynamo.common.utils.output_modalities (get_output_modalities), dynamo.llm (ModelInput, ModelRuntimeConfig, ModelType, register_model), and dynamo.sglang.args (DynamoConfig) according to the repo's import ordering rules and then run the pre-commit/ruff checks to ensure formatting passes.
🧹 Nitpick comments (1)
components/src/dynamo/sglang/register.py (1)
89-110: Extract bootstrap host normalization into one helper.The
NetworkAddress(...).to_host_port_str().rsplit(":", 1)[0]sequence is now duplicated here and incomponents/src/dynamo/sglang/request_handlers/handler_base.py. Since discovery registration and the runtime bootstrap payload need to stay identical, centralizing it next to the shim would reduce drift risk.As per coding guidelines, "Keep functions and methods concise and focused on a single responsibility."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@components/src/dynamo/sglang/register.py` around lines 89 - 110, Duplicate bootstrap host normalization logic (NetworkAddress(...).to_host_port_str().rsplit(":", 1)[0]) exists in register.py and handler_base.py; extract this into a single helper (e.g., normalize_bootstrap_host or build_bootstrap_host) placed next to the shim so both discovery registration and runtime bootstrap payload use the same code. Update components/src/dynamo/sglang/register.py (the code paths that set bootstrap_host from dist_init.resolved() and from get_local_ip_auto()/local_addr) to call the new helper, and likewise replace the duplicated sequence in components/src/dynamo/sglang/request_handlers/handler_base.py with the same helper; ensure the helper accepts a NetworkAddress (or host and port) and returns the normalized host string, preserving IPv6 bracket handling.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@components/src/dynamo/sglang/request_handlers/handler_base.py`:
- Around line 21-23: Reorder the import block to follow isort ordering (stdlib →
third-party → first-party): ensure any stdlib imports come first, then
third-party like "import sglang as sgl", and then the local package import "from
dynamo.sglang._compat import NetworkAddress, get_local_ip_auto"; run isort/ruff
format or pre-commit on handler_base.py (referencing the top-level imports in
this file) to apply and persist the change before merging.
---
Outside diff comments:
In `@components/src/dynamo/sglang/publisher.py`:
- Around line 30-51: format_zmq_endpoint currently always replaces the host with
ip_address, breaking explicit hosts; change it to only substitute when the
template host is a wildcard (e.g., '*' or the universal bind addresses '0.0.0.0'
or '::'). In format_zmq_endpoint, inspect parsed.hostname (from
urlparse(endpoint_template)) and if it is not a wildcard/universal value, return
the original endpoint_template (or reconstruct and return parsed.scheme + host +
port) unchanged; otherwise call NetworkAddress(ip_address, parsed.port).to_tcp()
as before. Ensure you still validate scheme == "tcp" and parsed.port is present
and only perform substitution when hostname indicates a wildcard bind.
In `@components/src/dynamo/sglang/register.py`:
- Around line 8-15: The import block in register.py is out of isort order;
reorder imports into stdlib → third-party → first-party groups and run the
project's formatter (isort/ruff) so the new `dynamo.sglang._compat` import is
placed in the first-party group. Specifically, group and sort the imports that
reference sglang (sgl, ServerArgs), dynamo.sglang._compat (NetworkAddress,
get_local_ip_auto), dynamo._core (Endpoint),
dynamo.common.utils.output_modalities (get_output_modalities), dynamo.llm
(ModelInput, ModelRuntimeConfig, ModelType, register_model), and
dynamo.sglang.args (DynamoConfig) according to the repo's import ordering rules
and then run the pre-commit/ruff checks to ensure formatting passes.
---
Nitpick comments:
In `@components/src/dynamo/sglang/register.py`:
- Around line 89-110: Duplicate bootstrap host normalization logic
(NetworkAddress(...).to_host_port_str().rsplit(":", 1)[0]) exists in register.py
and handler_base.py; extract this into a single helper (e.g.,
normalize_bootstrap_host or build_bootstrap_host) placed next to the shim so
both discovery registration and runtime bootstrap payload use the same code.
Update components/src/dynamo/sglang/register.py (the code paths that set
bootstrap_host from dist_init.resolved() and from
get_local_ip_auto()/local_addr) to call the new helper, and likewise replace the
duplicated sequence in
components/src/dynamo/sglang/request_handlers/handler_base.py with the same
helper; ensure the helper accepts a NetworkAddress (or host and port) and
returns the normalized host string, preserving IPv6 bracket handling.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: ce1bbe20-a510-49e7-8eb6-8cf6fddb40e8
📒 Files selected for processing (6)
components/src/dynamo/sglang/CLAUDE.mdcomponents/src/dynamo/sglang/_compat.pycomponents/src/dynamo/sglang/publisher.pycomponents/src/dynamo/sglang/register.pycomponents/src/dynamo/sglang/request_handlers/handler_base.pycomponents/src/dynamo/sglang/request_handlers/llm/prefill_handler.py
|
@ishandhanani Hi, If router_mode is not kv, prefill handler will always get a dp rank "0". |
Good catch. Having @PeaBrane review your change. If it makes sense we can cherry pick it into this PR and get all changes in at once. As always will give you credit for the work 😄 |
When router_mode is not KV, the SimpleRouter path was hardcoding dp_rank to 0, causing prefill to always target the first data parallel rank. Use u32::MAX as a sentinel value instead, and treat it as None on the Python side so SGLang picks the correct rank. Cherry-picked from #7214. Co-Authored-By: huitian bai <baihuitian.bht@gmail.com>
Avoid recomputing 2**32-1 on every request. Co-Authored-By: huitian bai <baihuitian.bht@gmail.com>
258f8df to
bb1a919
Compare
- Revert publisher.py changes (PR #6736 handles the SGLang compat) - Unify /// doc comments to // regular comments in reasoning parser tests Signed-off-by: Matej Kosec <mkosec@nvidia.com>
Summary
data_parallel_rankfrom routing info to SGLang's prefill handler for disagg DP routing_compat.pyshim for SGLang network imports -- works with both 0.5.9 and mainNetworkAddress(real or polyfill) for IPv6-safe address handling in publisher, register, and handler_baseChanges
Disagg DP rank routing (
prefill_handler.py)Extracts
dp_rankfrom the routing dict and passesdata_parallel_ranktoengine.async_generate(). The decode handler already does this; this closes the gap for disaggregated serving.SGLang backwards compatibility (
_compat.py)SGLang post-0.5.9 moved
get_local_ip_auto,get_zmq_socketfromsglang.srt.utilstosglang.srt.utils.networkand introducedNetworkAddress. Rather than pinning to one version,_compat.pytries the new import path first and falls back to the old path with a minimalNetworkAddresspolyfill.All SGLang network imports in the component now go through
_compat.pyinstead of importing directly fromsglang.srt.utils*.Supersedes PR #7597
PR #7597 (Fix SGLang network helper imports) is a strict subset of this work. If this merges first, #7597 can be closed.
Validation
Tested all 4 combinations:
Summary by CodeRabbit
New Features
dp_rank(data parallel rank) parameter in request routing, enabling more granular control over distributed execution.Documentation
Refactor