chore(sglang): bump to 0.5.12.post1 by ishandhanani · Pull Request #9677 · ai-dynamo/dynamo

ishandhanani · 2026-05-18T12:41:40Z

Summary

Bump dynamo's SGLang backend from 0.5.11 → 0.5.12.post1.

0.5.12.post1 is a patch release that cherry-picks upstream sgl-project/sglang#25699 (via #25731) onto release/v0.5.12, fixing the NIXL PD-disaggregation regression that hung disagg.sh on dense models. See the disagg section below. With post1 the bump is clean across all testable launch scripts — no downstream workaround needed.

- pyproject.toml: sglang[diffusion]==0.5.11 → 0.5.12.post1; also pins accelerate>=0.17.0 in the [sglang] extra, since SGLang's [diffusion] extra dropped it in 0.5.12 and diffusers still requires it for enable_model_cpu_offload.
- container/context.yaml: runtime image tags v0.5.11-{cu129,cu130}-runtime → v0.5.12.post1-{cu129,cu130}-runtime (both variants verified on Docker Hub before bumping).
- container/compliance/README.md: updated the base-image reference table rows.
- container/templates/sglang_runtime.Dockerfile: refreshed the version reference in the openai/distro workaround comment.
- Unskipped 3 deepseek_v4 unit tests in frontend/tests/test_sglang_processor_unit.py that were gated on the 0.5.12 dispatch path landing (DYN-3049). Switched reasoning_parser_name from "deepseek_v4" → "deepseek-v4" (the canonical key in sglang.srt.parser.reasoning_parser.ReasoningParser.DetectorMap in 0.5.12).

_compat.py: no changes. The current shim is signature-/hasattr-probing only with no version-tagged fallback branches, so the N=0.5.12 / N-1=0.5.11 deprecation pass (skill Step 10) is a no-op.

Launch-script walk (2x NVIDIA L40S, venv `.venv-sgl-0.5.12`)

| # | Script | Status | Notes |
|---|---|---|---|
| 1 | `agg.sh` | PASS | Qwen3-0.6B, 16-token chat |
| 2 | `agg_embed.sh` | PASS | Qwen3-Embedding-4B, dim=2560 |
| 3 | `agg_router.sh` | PASS | Two workers, KV router selected both worker IDs across requests |
| 4 | `agg_vision.sh` | PASS | Qwen3-VL-2B-Instruct, local base64 image |
| 5 | `disagg.sh` | PASS on `0.5.12.post1` | Hung on plain `v0.5.12` (upstream NIXL regression `d7f4761a4`); fixed in post1. Re-verified end-to-end on the post1 wheel; see below. |
| 6 | `diffusion_llada.sh` | PASS | LLaDA2.0-mini-preview, 48-token diffusion-LM |
| 7 | `image_diffusion.sh` | PASS | FLUX.1-dev, 183 KB PNG (4 inference steps). See env note below. |
| 8 | `text-to-video-diffusion.sh` | PASS | Wan2.1-T2V-1.3B-Diffusers, 446 KB MP4 (2 inference steps) |
| 9 | `multimodal_epd.sh` | PASS | Re-ran with `--model Qwen/Qwen3-VL-2B-Instruct --chat-template qwen2-vl` since the default Qwen2.5-VL-7B isn't cached on this box |
| 10 | `disagg_router.sh` | SKIP | Needs 4 GPUs, this box has 2 |
| 11 | `multimodal_disagg.sh` | SKIP | Needs 3 GPUs, this box has 2 |
| 12 | `disagg_same_gpu.sh` | SKIP | Optional per skill (default skip) |

9/9 testable scripts PASS; 3 SKIP (two for GPU count, one optional).

`disagg.sh` NIXL regression — fixed in 0.5.12.post1

On plain v0.5.12, disagg.sh hung forever for NIXL disagg on dense LLMs (Qwen3, LLaMA, Gemma, …). Investigated under the debug-session skill; full worklog: disagg-sglang-0512-nixl.md (kept out of the PR diff). Bisected to sgl-project/sglang@d7f4761a4 ([PD] Refactor hybrid state transfer (#24932)), which introduced two asymmetries:

- `NixlKVManager.transfer_worker` gated the aux RDMA write inside `if kv_chunk.is_last and kv_chunk.state_indices:`. For dense models `state_indices` is `[]` (falsy), so the whole branch short-circuited and `send_aux` was never called; decode never received the `{room}_aux` notification.
- `NixlKVReceiver.send_metadata` set `expects_state=True` whenever `state_indices is not None`, but decode receives `state_indices=[]` (non-None, empty), so `expects_state` flipped on. Prefill uses a truthy check and (correctly) never sent a state notification for dense models, so `is_done()` waited forever.

Either fix alone is insufficient; both are required. Both landed upstream in #25699 (split the aux send out of the state-gated branch; match the decode-side truthy check), cherry-picked onto release/v0.5.12 as #25731 and shipped in 0.5.12.post1. This PR pins to post1 rather than vendoring a downstream monkey-patch or image sed.
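
The interaction between the two asymmetries can be modelled in isolation. This is a minimal sketch, not SGLang's code: every function name here is a hypothetical stand-in, and `decode_completes` is an assumed simplification of the decode-side completion condition. Only the truthy-versus-`is not None` distinction and the gating of the aux send are taken from the description above.

```python
from itertools import product
from typing import Optional, Sequence

Indices = Optional[Sequence[int]]


def sends_aux_buggy(is_last: bool, state_indices: Indices) -> bool:
    # Aux write nested under the state-gated condition (as described for plain v0.5.12).
    return bool(is_last and state_indices)


def sends_aux_fixed(is_last: bool, state_indices: Indices) -> bool:
    # Assumed shape of the fix: the aux send depends only on the last chunk.
    return is_last


def prefill_sends_state(state_indices: Indices) -> bool:
    # Prefill side uses a truthy check, so [] means "no state to send".
    return bool(state_indices)


def expects_state_buggy(state_indices: Indices) -> bool:
    # Decode side as described for plain v0.5.12: [] is not None, so this is True.
    return state_indices is not None


def expects_state_fixed(state_indices: Indices) -> bool:
    # Assumed shape of the fix: match the prefill-side truthy check.
    return bool(state_indices)


def decode_completes(aux_sent: bool, state_sent: bool, state_expected: bool) -> bool:
    # Simplified model: decode finishes once aux arrived and any expected state arrived.
    return aux_sent and (state_sent or not state_expected)


dense: list = []  # dense models: empty but not None

for aux_fn, expects_fn in product(
    (sends_aux_buggy, sends_aux_fixed),
    (expects_state_buggy, expects_state_fixed),
):
    done = decode_completes(
        aux_fn(True, dense), prefill_sends_state(dense), expects_fn(dense)
    )
    print(f"{aux_fn.__name__:16} {expects_fn.__name__:20} completes={done}")
```

Under this model only the combination of both fixed checks completes for an empty `state_indices`, which matches the observation that either fix alone is insufficient.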

Mooncake was unaffected — it gates state and aux on independent if blocks and has no expects_state field.

Env dependency change: `accelerate`

sglang[diffusion] no longer pulls in accelerate in 0.5.12 (it now only appears in the test extra). Without it, image_diffusion.sh crashes at startup:

ImportError: enable_model_cpu_offload requires accelerate v0.17.0 or higher.

The container template at container/templates/sglang_runtime.Dockerfile:57-60 already pins accelerate==1.13.0 defensively, so the runtime image was unaffected. To make fresh local venvs work too, this PR adds accelerate>=0.17.0 to the [sglang] extra in dynamo's pyproject.toml with a one-line comment.
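
For a local venv, a quick pre-flight check can confirm that accelerate is present and meets the 0.17.0 floor before launching a diffusion worker. This is a minimal sketch using only the standard library; the helper names are hypothetical, and the version parsing is deliberately simplified (it reads the leading numeric release segment only and is not a full PEP 440 parser).

```python
from importlib import metadata
from itertools import takewhile
from typing import Optional, Tuple

MIN_ACCELERATE = (0, 17, 0)  # floor quoted in the ImportError above


def parse_release(version: str) -> Tuple[int, ...]:
    """Return the leading numeric release as a tuple padded to three parts."""
    parts = []
    for piece in version.split("."):
        digits = "".join(takewhile(str.isdigit, piece))
        if not digits:
            break
        parts.append(int(digits))
        if digits != piece:  # stop at suffixes such as "0rc1" or "post1"
            break
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def installed_accelerate_version() -> Optional[str]:
    """Return the installed accelerate version string, or None if it is absent."""
    try:
        return metadata.version("accelerate")
    except metadata.PackageNotFoundError:
        return None


def accelerate_satisfies_minimum(installed: Optional[str]) -> bool:
    """True only if accelerate is installed and at least MIN_ACCELERATE."""
    return installed is not None and parse_release(installed) >= MIN_ACCELERATE


if __name__ == "__main__":
    found = installed_accelerate_version()
    status = "ok" if accelerate_satisfies_minimum(found) else "missing or too old"
    print(f"accelerate: {found} ({status})")
```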

Pre-flight env vars (unchanged from 0.5.11)

- `SGLANG_DISABLE_CUDNN_CHECK=1` is still required for any vision / multimodal worker.
- `HF_HUB_OFFLINE=1` is needed when running the gated diffusion models with a stale `HF_TOKEN`; the cached snapshots load fine.
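
One way to apply these variables consistently is to build the worker environment in a small helper before invoking a launch script. This is a sketch under assumptions: `preflight_env` and its flags are hypothetical names, and only the two variable names and their values come from the list above.

```python
import os
from typing import Dict, Mapping


def preflight_env(
    base: Mapping[str, str], *, multimodal: bool, offline_hf: bool
) -> Dict[str, str]:
    """Return a copy of `base` with the pre-flight variables applied."""
    env = dict(base)
    if multimodal:
        # Required for vision / multimodal workers.
        env["SGLANG_DISABLE_CUDNN_CHECK"] = "1"
    if offline_hf:
        # Load gated models from cached snapshots when HF_TOKEN is stale.
        env["HF_HUB_OFFLINE"] = "1"
    return env


env = preflight_env(os.environ, multimodal=True, offline_hf=False)
print(env["SGLANG_DISABLE_CUDNN_CHECK"])
```

The returned mapping can be passed as the `env` argument of `subprocess.run` when starting a launch script; the original `os.environ` is left unmodified.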

Test plan

- Launch-script walk (9/9 testable PASS)
- Unskipped deepseek_v4 unit tests pass locally (`pytest -k deepseek_v4` → 5/5)
- disagg.sh end-to-end on the 0.5.12.post1 PyPI wheel (2x L40S): tokens returned in ~5s, no NIXL KVReceiver timeout, clean prefill→decode KV transfer
- Upstream fix merged and released: sgl-project/sglang #25699 → #25731 → v0.5.12.post1
- `uvx pre-commit run --files <touched>` clean
- CI green

Summary by CodeRabbit

Chores
- Updated SGLang framework and Docker runtime images to v0.5.12.post1
Tests
- Improved DeepSeek-V4 testing by enabling previously skipped test cases and updating parser configuration

- pyproject pin sglang[diffusion]==0.5.11 → 0.5.12
- container runtime tags v0.5.11-{cu129,cu130}-runtime → v0.5.12 for both context.yaml and compliance/README.md
- container template comment refreshed to reference 0.5.12 server_args path for the openai/distro workaround
- unskip 3 deepseek_v4 tests (DYN-3049): the speculative dispatch path now exists upstream in 0.5.12; tests use the canonical hyphenated reasoning_parser_name 'deepseek-v4' to match SGLang's ReasoningParser DetectorMap

Launch-script walk on 2x L40S: 8/9 PASS, 1 FAIL. disagg.sh trips a NIXL KVReceiver waiting_timeout on the decode side: an upstream regression in the v0.5.12 NIXL refactor (#23967, #22536), not in dynamo code. Same script PASSED on the 0.5.11 bump in this env. Flagging for follow-up; agg, agg_embed, agg_router, agg_vision, diffusion_llada, image_diffusion, text-to-video-diffusion, and multimodal_epd all pass.

SGLang's [diffusion] extra dropped accelerate in 0.5.12 (it now lives only in their [test] extra), but diffusers still requires it for enable_model_cpu_offload(). Without it, image_diffusion workers crash at startup with ImportError. Pin explicitly so dynamo's sglang extra stays self-contained regardless of how upstream restructures its extras.

….12 base-image table update # Conflicts: # container/compliance/README.md

0.5.12.post1 cherry-picks upstream #25699 (#25731) onto release/v0.5.12, fixing two NIXL PD-disaggregation regressions from #24932 that hung disagg.sh on dense models (Qwen3/LLaMA/Gemma):

- prefill skipped the aux RDMA write when state_indices was empty
- decode set expects_state=True for empty (non-None) state_indices, so is_done() waited forever for a state notif prefill never sent

Bumps the PyPI pin and the cu129/cu130 runtime image tags. Verified disagg.sh end-to-end on 2x L40S: request returns tokens, no NIXL KVReceiver timeout.

ishandhanani · 2026-05-25T06:10:45Z

Bumped the pin from 0.5.12 → 0.5.12.post1 (PyPI wheel + cu129/cu130 runtime image tags).

0.5.12.post1 cherry-picks upstream #25699 (via #25731) onto release/v0.5.12, fixing the two NIXL PD-disaggregation regressions from #24932 that hung disagg.sh on dense models (Qwen3/LLaMA/Gemma):

- prefill skipped the aux RDMA write when state_indices was empty
- decode set expects_state=True for an empty (non-None) state_indices, so is_done() waited forever for a state notif prefill never sent

Validated examples/backends/sglang/launch/disagg.sh end-to-end on 2x L40S with the post1 wheel: request returns tokens in ~5s, no NIXL KVReceiver timeout, clean prefill→decode KV transfer.

github-actions · 2026-05-25T06:12:51Z

🌿 Fern Docs Preview: https://nvidia-preview-42ef3cda-1962-45e1-872c-fb3423a0ce4c.docs.buildwithfern.com/dynamo/dev

coderabbitai · 2026-05-25T06:16:13Z

Walkthrough

This PR enables DeepSeek-V4 reasoning tests by removing a skip decorator and correcting the parser parameter format, while simultaneously upgrading SGLang from v0.5.11 to v0.5.12.post1 across all dependency declarations and Docker configurations.

Changes

SGLang v0.5.12.post1 Upgrade and Test Fixes

| Layer / File(s) | Summary |
|---|---|
| DeepSeek-V4 Test Fixes `components/src/dynamo/frontend/tests/test_sglang_processor_unit.py` | Removes the `pytest.mark.skip` decorator from the DeepSeek-V4 encoder-path test, allowing it to execute. Corrects `reasoning_parser_name` parameter from `"deepseek_v4"` (underscore) to `"deepseek-v4"` (hyphen) in both the encoder-path test and the named tool-choice filtering test. |
| SGLang Version Upgrade Across Dependencies and Configs `pyproject.toml`, `container/context.yaml`, `container/compliance/README.md`, `container/templates/sglang_runtime.Dockerfile` | Updates the `sglang[diffusion]` dependency from `0.5.11` to `0.5.12.post1` in pyproject.toml, with a note that this version drops the `accelerate` dependency (kept separately at `>=0.17.0`). Updates Docker runtime image tags for CUDA 12.9 and 13.0 variants from `v0.5.11` to `v0.5.12.post1` in context.yaml. Refreshes base image references in the compliance documentation README. Updates the sglang_runtime.Dockerfile comment to reference the new version. |

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title accurately summarizes the main change: bumping SGLang from 0.5.11 to 0.5.12.post1 across the codebase, which is the primary objective of this PR. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Description check | ✅ Passed | The PR description comprehensively covers all required template sections with detailed context. |


pull-request-size Bot added the size/S label May 18, 2026

github-actions Bot added the documentation, frontend, and container labels May 18, 2026

copy-pr-bot Bot temporarily deployed to GITLAB May 18, 2026 13:07 Inactive

ishandhanani force-pushed the idhanani/sgl-to-0.5.12 branch from 5484d1d to 62521d8 Compare May 18, 2026 13:08

copy-pr-bot Bot temporarily deployed to GITLAB May 18, 2026 13:08 Inactive

copy-pr-bot Bot temporarily deployed to GITLAB May 18, 2026 13:17 Inactive

dmitry-tokarev-nv changed the title ~~sglang: bump to 0.5.12~~ chore(sglang): bump to 0.5.12 May 18, 2026

github-actions Bot added the chore label May 18, 2026

ishandhanani mentioned this pull request May 19, 2026

[Bug][PD][NIXL] cherry-pick #25699 onto release/v0.5.12 sgl-project/sglang#25731

Merged

Merge main into idhanani/sgl-to-0.5.12 to pick up sglang 0.5.11 → 0.5…

ff4d59d

….12 base-image table update # Conflicts: # container/compliance/README.md

copy-pr-bot Bot temporarily deployed to GITLAB May 21, 2026 19:44 Inactive

copy-pr-bot Bot temporarily deployed to GITLAB May 21, 2026 19:45 Inactive

copy-pr-bot Bot temporarily deployed to GITLAB May 21, 2026 21:07 Inactive

copy-pr-bot Bot temporarily deployed to GITLAB May 21, 2026 21:19 Inactive

dmitry-tokarev-nv mentioned this pull request May 22, 2026

fix(sglang): restore NetworkAddress compat shim for dsv4 base image #9842

Closed

3 tasks

copy-pr-bot Bot temporarily deployed to GITLAB May 25, 2026 06:10 Inactive

ishandhanani marked this pull request as ready for review May 25, 2026 06:10

ishandhanani requested review from a team as code owners May 25, 2026 06:10

copy-pr-bot Bot temporarily deployed to GITLAB May 25, 2026 06:11 Inactive

ishandhanani changed the title ~~chore(sglang): bump to 0.5.12~~ chore(sglang): bump to 0.5.12.post1 May 25, 2026

ishandhanani enabled auto-merge (squash) May 25, 2026 06:14

dillon-cullinan approved these changes May 25, 2026

ishandhanani merged commit 2af67e8 into main May 25, 2026
99 checks passed

ishandhanani deleted the idhanani/sgl-to-0.5.12 branch May 25, 2026 07:18

ishandhanani merged 4 commits into main from idhanani/sgl-to-0.5.12

3 participants
