
chore(sglang): bump to 0.5.12.post1 #9677

Merged
ishandhanani merged 4 commits into main from idhanani/sgl-to-0.5.12
May 25, 2026

Conversation


@ishandhanani ishandhanani commented May 18, 2026

Summary

Bump dynamo's SGLang backend from 0.5.11 → 0.5.12.post1.

0.5.12.post1 is a patch release that cherry-picks upstream sgl-project/sglang#25699 (via #25731) onto release/v0.5.12, fixing the NIXL PD-disaggregation regression that hung disagg.sh on dense models. See the disagg section below. With post1 the bump is clean across all testable launch scripts — no downstream workaround needed.

  • pyproject.toml: sglang[diffusion]==0.5.11 → 0.5.12.post1; also pin accelerate>=0.17.0 in the [sglang] extra since SGLang's [diffusion] extra dropped it in 0.5.12 and diffusers still requires it for enable_model_cpu_offload.
  • container/context.yaml: runtime image tags v0.5.11-{cu129,cu130}-runtime → v0.5.12.post1-{cu129,cu130}-runtime (both variants verified on Docker Hub before bumping)
  • container/compliance/README.md: base-image reference table rows
  • container/templates/sglang_runtime.Dockerfile: refreshed the version reference in the openai/distro workaround comment
  • Unskipped 3 deepseek_v4 unit tests in frontend/tests/test_sglang_processor_unit.py that were gated on the 0.5.12 dispatch path landing (DYN-3049). Switched reasoning_parser_name from "deepseek_v4" → "deepseek-v4" (the canonical key in sglang.srt.parser.reasoning_parser.ReasoningParser.DetectorMap in 0.5.12).

_compat.py: no changes. The current shim is signature-/hasattr-probing only with no version-tagged fallback branches, so the N=0.5.12 / N-1=0.5.11 deprecation pass (skill Step 10) is a no-op.
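For readers unfamiliar with the pattern, here is a minimal sketch of what a signature-/hasattr-probing shim can look like. It is not the contents of _compat.py; the helper names and the old_api/new_api functions are hypothetical and exist only for illustration.

```python
import inspect


def call_with_supported_kwargs(func, **kwargs):
    """Call func, passing only the keyword arguments its signature accepts."""
    params = inspect.signature(func).parameters
    # If func takes **kwargs, everything can be forwarded as-is.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return func(**kwargs)
    supported = {k: v for k, v in kwargs.items() if k in params}
    return func(**supported)


def get_first_attr(obj, *names, default=None):
    """Return the first attribute in names that exists on obj."""
    for name in names:
        if hasattr(obj, name):
            return getattr(obj, name)
    return default


# Hypothetical old and new upstream signatures, for illustration only.
def old_api(prompt):
    return ("old", prompt)


def new_api(prompt, reasoning_parser=None):
    return ("new", prompt, reasoning_parser)


print(call_with_supported_kwargs(old_api, prompt="hi", reasoning_parser="deepseek-v4"))
print(call_with_supported_kwargs(new_api, prompt="hi", reasoning_parser="deepseek-v4"))
```

Because a shim of this kind inspects what the installed library exposes instead of branching on a version string, there is nothing version-specific to delete when N-1 support is dropped.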

Launch-script walk (2x NVIDIA L40S, venv .venv-sgl-0.5.12)

| # | Script | Status | Notes |
|---|--------|--------|-------|
| 1 | agg.sh | PASS | Qwen3-0.6B, 16-token chat |
| 2 | agg_embed.sh | PASS | Qwen3-Embedding-4B, dim=2560 |
| 3 | agg_router.sh | PASS | Two workers, KV router selected both worker IDs across requests |
| 4 | agg_vision.sh | PASS | Qwen3-VL-2B-Instruct, local base64 image |
| 5 | disagg.sh | PASS on 0.5.12.post1 | Hung on plain v0.5.12 (upstream NIXL regression d7f4761a4); fixed in post1. Re-verified end-to-end on post1 wheel; see below. |
| 6 | diffusion_llada.sh | PASS | LLaDA2.0-mini-preview, 48-token diffusion-LM |
| 7 | image_diffusion.sh | PASS | FLUX.1-dev, 183 KB PNG (4 inference steps). See env note below. |
| 8 | text-to-video-diffusion.sh | PASS | Wan2.1-T2V-1.3B-Diffusers, 446 KB MP4 (2 inference steps) |
| 9 | multimodal_epd.sh | PASS | Re-ran with --model Qwen/Qwen3-VL-2B-Instruct --chat-template qwen2-vl since the default Qwen2.5-VL-7B isn't cached on this box |
| 10 | disagg_router.sh | SKIP | Needs 4 GPUs, this box has 2 |
| 11 | multimodal_disagg.sh | SKIP | Needs 3 GPUs, this box has 2 |
| 12 | disagg_same_gpu.sh | SKIP | Optional per skill (default skip) |

9/9 testable PASS; 3 SKIP (2 for GPU count, 1 optional).

disagg.sh NIXL regression — fixed in 0.5.12.post1

On plain v0.5.12, disagg.sh hung forever for NIXL disagg on dense LLMs (Qwen3, LLaMA, Gemma, …). Investigated under the debug-session skill; full worklog: disagg-sglang-0512-nixl.md (kept out of the PR diff). Bisected to sgl-project/sglang@d7f4761a4 ([PD] Refactor hybrid state transfer (#24932)), which introduced two asymmetries:

  1. NixlKVManager.transfer_worker gated the aux RDMA write inside if kv_chunk.is_last and kv_chunk.state_indices:. For dense models state_indices is [] (falsy), so the whole branch short-circuited and send_aux was never called — decode never received the {room}_aux notification.
  2. NixlKVReceiver.send_metadata set expects_state=True whenever state_indices is not None — but decode receives state_indices=[] (non-None, empty), so expects_state flipped on. Prefill uses a truthy check and (correctly) never sent a state notif for dense models, so is_done() waited forever.

Either fix alone is insufficient; both are required. Both landed upstream in #25699 (split the aux send out of the state-gated branch; match the decode-side truthy check), cherry-picked onto release/v0.5.12 as #25731 and shipped in 0.5.12.post1. This PR pins to post1 rather than vendoring a downstream monkey-patch or image sed.
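To make the "both are required" point concrete, here is a toy model of the handshake. It is a sketch under simplifying assumptions, not the SGLang code: Decode, prefill_sends, decode_expects_state, and transfer_completes are hypothetical names, and notifications are modeled as plain strings.

```python
from dataclasses import dataclass, field


@dataclass
class Decode:
    """Toy decode-side receiver: done once every expected notification arrives."""
    expects_state: bool
    received: set = field(default_factory=set)

    def is_done(self):
        needed = {"kv", "aux"} | ({"state"} if self.expects_state else set())
        return needed <= self.received


def prefill_sends(state_indices, fixed):
    """Notifications the toy prefill side emits for the last KV chunk."""
    sent = {"kv"}
    if fixed:
        sent.add("aux")              # aux is sent unconditionally
        if state_indices:
            sent.add("state")
    elif state_indices:              # regressed: aux gated on state_indices
        sent.update({"state", "aux"})
    return sent


def decode_expects_state(state_indices, fixed):
    if fixed:
        return bool(state_indices)       # truthy check, matches prefill
    return state_indices is not None     # regressed: [] counts as "has state"


def transfer_completes(state_indices, prefill_fixed, decode_fixed):
    decode = Decode(expects_state=decode_expects_state(state_indices, decode_fixed))
    decode.received |= prefill_sends(state_indices, prefill_fixed)
    return decode.is_done()


dense = []  # dense models carry no hybrid state
for pf in (False, True):
    for df in (False, True):
        done = transfer_completes(dense, pf, df)
        print(f"prefill_fixed={pf} decode_fixed={df} -> completes={done}")
```

In this model only the combination with both sides fixed completes for an empty state_indices, while a non-empty list completes even with both regressions in place, which is consistent with the hang being limited to dense models.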

Mooncake was unaffected — it gates state and aux on independent if blocks and has no expects_state field.

Env dependency change: accelerate

sglang[diffusion] no longer pulls in accelerate in 0.5.12 (it now only appears in the test extra). Without it, image_diffusion.sh crashes at startup:

ImportError: enable_model_cpu_offload requires accelerate v0.17.0 or higher.

The container template at container/templates/sglang_runtime.Dockerfile:57-60 already pins accelerate==1.13.0 defensively, so the runtime image was unaffected. To make fresh local venvs work too, this PR adds accelerate>=0.17.0 to the [sglang] extra in dynamo's pyproject.toml with a one-line comment.
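For a local venv, a quick pre-flight check along these lines can confirm the requirement before launching image_diffusion.sh. This is a stdlib-only sketch: parse_release and accelerate_ok are hypothetical helpers, and the version parsing is deliberately simplified (it ignores pre-release ordering).

```python
from importlib import metadata

MIN_ACCELERATE = (0, 17, 0)


def parse_release(version):
    """Parse the leading numeric release segment, e.g. '1.13.0' -> (1, 13, 0)."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def accelerate_ok(installed=None):
    """Return True if accelerate is installed at MIN_ACCELERATE or newer."""
    if installed is None:
        try:
            installed = metadata.version("accelerate")
        except metadata.PackageNotFoundError:
            return False
    release = parse_release(installed)
    # Pad so that '1.0' compares as (1, 0, 0).
    release += (0,) * (len(MIN_ACCELERATE) - len(release))
    return release >= MIN_ACCELERATE


print("accelerate OK:", accelerate_ok())
```

Passing a version string explicitly, for example accelerate_ok("1.13.0"), checks a candidate pin without touching the environment.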

Pre-flight env vars (unchanged from 0.5.11)

  • SGLANG_DISABLE_CUDNN_CHECK=1 still required for any vision / multimodal worker
  • HF_HUB_OFFLINE=1 needed when running the gated diffusion models with stale HF_TOKEN; the cached snapshots load fine
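One way to apply these variables when driving the launch scripts from Python is sketched below. The preflight_env helper and its multimodal/offline_hub flags are hypothetical; only the two variable names come from the list above.

```python
import os


def preflight_env(base=None, multimodal=False, offline_hub=False):
    """Return a copy of the environment with the pre-flight variables applied."""
    env = dict(os.environ if base is None else base)
    if multimodal:
        # Required for any vision / multimodal worker.
        env["SGLANG_DISABLE_CUDNN_CHECK"] = "1"
    if offline_hub:
        # Load cached snapshots instead of contacting the Hub with a stale token.
        env["HF_HUB_OFFLINE"] = "1"
    return env


env = preflight_env(multimodal=True, offline_hub=True)
print({k: env[k] for k in ("SGLANG_DISABLE_CUDNN_CHECK", "HF_HUB_OFFLINE")})
```

The returned mapping can be passed as env= to subprocess.run when starting a launch script, which leaves the parent shell's environment untouched.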

Test plan

  • Launch-script walk (9/9 testable PASS)
  • Unskipped deepseek_v4 unit tests pass locally (pytest -k deepseek_v4 → 5/5)
  • disagg.sh end-to-end on 0.5.12.post1 PyPI wheel (2x L40S) — tokens returned in ~5s, no NIXL KVReceiver timeout, clean prefill→decode KV transfer
  • Upstream fix merged + released: sgl-project/sglang #25699 → #25731 → v0.5.12.post1
  • uvx pre-commit run --files <touched> clean
  • CI green

Summary by CodeRabbit

  • Chores

    • Updated SGLang framework and Docker runtime images to v0.5.12.post1
  • Tests

    • Improved DeepSeek-V4 testing by enabling previously skipped test cases and updating parser configuration


- pyproject pin sglang[diffusion]==0.5.11 → 0.5.12
- container runtime tags v0.5.11-{cu129,cu130}-runtime → v0.5.12 for both
  context.yaml and compliance/README.md
- container template comment refreshed to reference 0.5.12 server_args path
  for the openai/distro workaround
- unskip 3 deepseek_v4 tests (DYN-3049): the speculative dispatch path now
  exists upstream in 0.5.12; tests use the canonical hyphenated
  reasoning_parser_name 'deepseek-v4' to match SGLang's ReasoningParser
  DetectorMap

Launch-script walk on 2x L40S: 8/9 PASS, 1 FAIL.
disagg.sh trips a NIXL KVReceiver waiting_timeout on the decode side —
upstream regression in the v0.5.12 NIXL refactor (#23967, #22536), not in
dynamo code. Same script PASSED on the 0.5.11 bump in this env. Flagging
for follow-up; agg, agg_embed, agg_router, agg_vision, diffusion_llada,
image_diffusion, text-to-video-diffusion, and multimodal_epd all pass.
@github-actions github-actions Bot added the documentation, frontend, and container labels May 18, 2026
SGLang's [diffusion] extra dropped accelerate in 0.5.12 (it now lives only
in their [test] extra), but diffusers still requires it for
enable_model_cpu_offload(). Without it, image_diffusion workers crash at
startup with ImportError. Pin explicitly so dynamo's sglang extra stays
self-contained regardless of how upstream restructures its extras.
@ishandhanani ishandhanani force-pushed the idhanani/sgl-to-0.5.12 branch from 5484d1d to 62521d8 Compare May 18, 2026 13:08
@dmitry-tokarev-nv dmitry-tokarev-nv changed the title sglang: bump to 0.5.12 → chore(sglang): bump to 0.5.12 May 18, 2026
@github-actions github-actions Bot added the chore label May 18, 2026
….12 base-image table update

# Conflicts:
#	container/compliance/README.md
0.5.12.post1 cherry-picks upstream #25699 (#25731) onto release/v0.5.12,
fixing two NIXL PD-disaggregation regressions from #24932 that hung
disagg.sh on dense models (Qwen3/LLaMA/Gemma):

- prefill skipped the aux RDMA write when state_indices was empty
- decode set expects_state=True for empty (non-None) state_indices, so
  is_done() waited forever for a state notif prefill never sent

Bumps the PyPI pin and the cu129/cu130 runtime image tags. Verified
disagg.sh end-to-end on 2x L40S: request returns tokens, no NIXL
KVReceiver timeout.
@ishandhanani ishandhanani marked this pull request as ready for review May 25, 2026 06:10
@ishandhanani
Contributor Author

Bumped the pin from 0.5.12 → 0.5.12.post1 (PyPI wheel + cu129/cu130 runtime image tags).

0.5.12.post1 cherry-picks upstream #25699 (via #25731) onto release/v0.5.12, fixing the two NIXL PD-disaggregation regressions from #24932 that hung disagg.sh on dense models (Qwen3/LLaMA/Gemma):

  • prefill skipped the aux RDMA write when state_indices was empty
  • decode set expects_state=True for an empty (non-None) state_indices, so is_done() waited forever for a state notif prefill never sent

Validated examples/backends/sglang/launch/disagg.sh end-to-end on 2x L40S with the post1 wheel: request returns tokens in ~5s, no NIXL KVReceiver timeout, clean prefill→decode KV transfer.

@ishandhanani ishandhanani requested review from a team as code owners May 25, 2026 06:10
@ishandhanani ishandhanani changed the title chore(sglang): bump to 0.5.12 → chore(sglang): bump to 0.5.12.post1 May 25, 2026
@ishandhanani ishandhanani enabled auto-merge (squash) May 25, 2026 06:14

coderabbitai Bot commented May 25, 2026

Walkthrough

This PR enables DeepSeek-V4 reasoning tests by removing a skip decorator and correcting the parser parameter format, while simultaneously upgrading SGLang from v0.5.11 to v0.5.12.post1 across all dependency declarations and Docker configurations.

Changes

SGLang v0.5.12.post1 Upgrade and Test Fixes

| Layer / File(s) | Summary |
|-----------------|---------|
| DeepSeek-V4 Test Fixes (components/src/dynamo/frontend/tests/test_sglang_processor_unit.py) | Removes the pytest.mark.skip decorator from the DeepSeek-V4 encoder-path test, allowing it to execute. Corrects reasoning_parser_name parameter from "deepseek_v4" (underscore) to "deepseek-v4" (hyphen) in both the encoder-path test and the named tool-choice filtering test. |
| SGLang Version Upgrade Across Dependencies and Configs (pyproject.toml, container/context.yaml, container/compliance/README.md, container/templates/sglang_runtime.Dockerfile) | Updates the sglang[diffusion] dependency from 0.5.11 to 0.5.12.post1 in pyproject.toml, with a note that this version drops the accelerate dependency (kept separately at >=0.17.0). Updates Docker runtime image tags for CUDA 12.9 and 13.0 variants from v0.5.11 to v0.5.12.post1 in context.yaml. Refreshes base image references in the compliance documentation README. Updates the sglang_runtime.Dockerfile comment to reference the new version. |

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|------------|--------|-------------|
| Title check | ✅ Passed | The title accurately summarizes the main change: bumping SGLang from 0.5.11 to 0.5.12.post1 across the codebase, which is the primary objective of this PR. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Description check | ✅ Passed | The PR description comprehensively covers all required template sections with detailed context. |


@ishandhanani ishandhanani merged commit 2af67e8 into main May 25, 2026
99 checks passed
@ishandhanani ishandhanani deleted the idhanani/sgl-to-0.5.12 branch May 25, 2026 07:18

Labels

chore, container, documentation, frontend, size/S


3 participants