chore(sglang): bump to 0.5.12.post1 #9677
- pyproject pin sglang[diffusion]==0.5.11 → 0.5.12
- container runtime tags v0.5.11-{cu129,cu130}-runtime → v0.5.12 for both
context.yaml and compliance/README.md
- container template comment refreshed to reference 0.5.12 server_args path
for the openai/distro workaround
- unskip 3 deepseek_v4 tests (DYN-3049): the speculative dispatch path now
exists upstream in 0.5.12; tests use the canonical hyphenated
reasoning_parser_name 'deepseek-v4' to match SGLang's ReasoningParser
DetectorMap
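The parser-name change in the last bullet is an exact-key match: the hyphenated spelling is found and the underscore spelling is not. A minimal sketch of that lookup follows, assuming a hypothetical stand-in set of keys; `KNOWN_PARSER_KEYS` and `is_known_parser` are illustrative names, not SGLang's API, and the only key taken from this PR is 'deepseek-v4'.

```python
# Hypothetical stand-in for the keys of SGLang's reasoning-parser registry.
# Only "deepseek-v4" comes from this PR; the real mapping lives upstream.
KNOWN_PARSER_KEYS = frozenset({"deepseek-v4"})


def is_known_parser(name: str, known: frozenset = KNOWN_PARSER_KEYS) -> bool:
    """Return True only for an exact key match (no underscore/hyphen folding)."""
    return name in known


print(is_known_parser("deepseek-v4"))  # hyphenated, canonical spelling
print(is_known_parser("deepseek_v4"))  # underscore spelling used before this PR
```

Under this assumption the tests have to pass the hyphenated name verbatim, since nothing in the lookup normalizes separators.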
Launch-script walk on 2x L40S: 8/9 PASS, 1 FAIL.
disagg.sh trips a NIXL KVReceiver waiting_timeout on the decode side —
upstream regression in the v0.5.12 NIXL refactor (#23967, #22536), not in
dynamo code. Same script PASSED on the 0.5.11 bump in this env. Flagging
for follow-up; agg, agg_embed, agg_router, agg_vision, diffusion_llada,
image_diffusion, text-to-video-diffusion, and multimodal_epd all pass.
SGLang's [diffusion] extra dropped accelerate in 0.5.12 (it now lives only in their [test] extra), but diffusers still requires it for enable_model_cpu_offload(). Without it, image_diffusion workers crash at startup with ImportError. Pin explicitly so dynamo's sglang extra stays self-contained regardless of how upstream restructures its extras.
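A pre-flight check along the following lines could catch the missing package in a fresh venv before a worker starts. This is a sketch only: `missing_modules` is a hypothetical helper, not part of dynamo or SGLang, and it only tests whether a top-level module can be located.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be located for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# The result depends on the active environment, so no output is shown here.
missing = missing_modules(["accelerate", "diffusers"])
if missing:
    print("missing packages:", ", ".join(missing))
else:
    print("diffusion dependencies are importable")
```

`find_spec` locates the module without importing it, so the check stays cheap and does not trigger any GPU initialization.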
….12 base-image table update # Conflicts: # container/compliance/README.md
0.5.12.post1 cherry-picks upstream #25699 (#25731) onto release/v0.5.12,
fixing two NIXL PD-disaggregation regressions from #24932 that hung
disagg.sh on dense models (Qwen3/LLaMA/Gemma):
- prefill skipped the aux RDMA write when state_indices was empty
- decode set expects_state=True for empty (non-None) state_indices, so
is_done() waited forever for a state notif prefill never sent
Bumps the PyPI pin and the cu129/cu130 runtime image tags. Verified
disagg.sh end-to-end on 2x L40S: request returns tokens, no NIXL
KVReceiver timeout.
Walkthrough
This PR enables DeepSeek-V4 reasoning tests by removing a skip decorator and correcting the parser parameter format, while simultaneously upgrading SGLang from v0.5.11 to v0.5.12.post1 across all dependency declarations and Docker configurations.
Summary
Bump dynamo's SGLang backend from 0.5.11 → 0.5.12.post1.
0.5.12.post1 is a patch release that cherry-picks upstream sgl-project/sglang#25699 (via #25731) onto release/v0.5.12, fixing the NIXL PD-disaggregation regression that hung disagg.sh on dense models. See the disagg section below. With post1 the bump is clean across all testable launch scripts, with no downstream workaround needed.
- pyproject.toml: sglang[diffusion]==0.5.11 → 0.5.12.post1; also pin accelerate>=0.17.0 in the [sglang] extra, since SGLang's [diffusion] extra dropped it in 0.5.12 and diffusers still requires it for enable_model_cpu_offload.
- container/context.yaml: runtime image tags v0.5.11-{cu129,cu130}-runtime → v0.5.12.post1-{cu129,cu130}-runtime (both variants verified on Docker Hub before bumping)
- container/compliance/README.md: base-image reference table rows
- container/templates/sglang_runtime.Dockerfile: refreshed the version reference in the openai/distro workaround comment
- Unskipped the 3 deepseek_v4 unit tests in frontend/tests/test_sglang_processor_unit.py that were gated on the 0.5.12 dispatch path landing (DYN-3049). Switched reasoning_parser_name from "deepseek_v4" → "deepseek-v4" (the canonical key in sglang.srt.parser.reasoning_parser.ReasoningParser.DetectorMap in 0.5.12).
- _compat.py: no changes. The current shim is signature-/hasattr-probing only with no version-tagged fallback branches, so the N=0.5.12 / N-1=0.5.11 deprecation pass (skill Step 10) is a no-op.

Launch-script walk (2x NVIDIA L40S, venv .venv-sgl-0.5.12)

| Script | Result | Notes |
| --- | --- | --- |
| agg.sh | PASS | |
| agg_embed.sh | PASS | |
| agg_router.sh | PASS | |
| agg_vision.sh | PASS | |
| disagg.sh | PASS on 0.5.12.post1 | Hung on v0.5.12 (upstream NIXL regression d7f4761a4); fixed in post1. Re-verified end-to-end on post1 wheel; see below. |
| diffusion_llada.sh | PASS | |
| image_diffusion.sh | PASS | |
| text-to-video-diffusion.sh | PASS | |
| multimodal_epd.sh | PASS | Run with --model Qwen/Qwen3-VL-2B-Instruct --chat-template qwen2-vl since the default Qwen2.5-VL-7B isn't cached on this box |
| disagg_router.sh | SKIP | |
| multimodal_disagg.sh | SKIP | |
| disagg_same_gpu.sh | SKIP | |

9/9 testable PASS, 3 SKIP for GPU count + optional.
disagg.sh NIXL regression (fixed in 0.5.12.post1)
On plain v0.5.12, disagg.sh hung forever for NIXL disagg on dense LLMs (Qwen3, LLaMA, Gemma, …). Investigated under the debug-session skill; full worklog: disagg-sglang-0512-nixl.md (kept out of the PR diff). Bisected to sgl-project/sglang@d7f4761a4 ([PD] Refactor hybrid state transfer (#24932)), which introduced two asymmetries:
1. NixlKVManager.transfer_worker gated the aux RDMA write inside "if kv_chunk.is_last and kv_chunk.state_indices:". For dense models state_indices is [] (falsy), so the whole branch short-circuited and send_aux was never called; decode never received the {room}_aux notification.
2. NixlKVReceiver.send_metadata set expects_state=True whenever state_indices is not None, but decode receives state_indices=[] (non-None, empty), so expects_state flipped on. Prefill uses a truthy check and (correctly) never sent a state notif for dense models, so is_done() waited forever.

Either fix alone is insufficient; both are required. Both landed upstream in #25699 (split the aux send out of the state-gated branch; make the decode-side check truthy to match prefill), cherry-picked onto release/v0.5.12 as #25731 and shipped in 0.5.12.post1. This PR pins to post1 rather than vendoring a downstream monkey-patch or image sed.

Mooncake was unaffected: it gates state and aux on independent if blocks and has no expects_state field.

Env dependency change: accelerate
sglang[diffusion] no longer pulls in accelerate in 0.5.12 (it now only appears in the test extra). Without it, image_diffusion.sh crashes at startup with an ImportError.

The container template at container/templates/sglang_runtime.Dockerfile:57-60 already pins accelerate==1.13.0 defensively, so the runtime image was unaffected. To make fresh local venvs work too, this PR adds accelerate>=0.17.0 to the [sglang] extra in dynamo's pyproject.toml with a one-line comment.

Pre-flight env vars (unchanged from 0.5.11)
- SGLANG_DISABLE_CUDNN_CHECK=1 still required for any vision / multimodal worker
- HF_HUB_OFFLINE=1 needed when running the gated diffusion models with stale HF_TOKEN; the cached snapshots load fine

Test plan
- deepseek_v4 unit tests pass locally (pytest -k deepseek_v4 → 5/5)
- disagg.sh end-to-end on 0.5.12.post1 PyPI wheel (2x L40S): tokens returned in ~5s, no NIXL KVReceiver timeout, clean prefill→decode KV transfer
- v0.5.12.post1
- uvx pre-commit run --files <touched> clean
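The two asymmetries behind the disagg.sh hang both come down to how an empty list is tested. The sketch below models only the checks as described in this PR; every function name is a hypothetical stand-in rather than SGLang's actual code, and the non-empty index values are made up for illustration.

```python
from typing import List, Optional


def aux_sent_buggy(is_last: bool, state_indices: Optional[List[int]]) -> bool:
    """v0.5.12 prefill: the aux send sat inside the state-gated branch."""
    return bool(is_last and state_indices)


def aux_sent_fixed(is_last: bool, state_indices: Optional[List[int]]) -> bool:
    """After the fix: the aux send depends only on the last chunk."""
    return is_last


def state_sent(state_indices: Optional[List[int]]) -> bool:
    """Prefill side: truthy check, so an empty list sends no state notif."""
    return bool(state_indices)


def expects_state_buggy(state_indices: Optional[List[int]]) -> bool:
    """v0.5.12 decode: `is not None`, so an empty list still expects state."""
    return state_indices is not None


def expects_state_fixed(state_indices: Optional[List[int]]) -> bool:
    """After the fix: truthy check, matching the prefill side."""
    return bool(state_indices)


def decode_completes(aux: bool, sent: bool, expected: bool) -> bool:
    """Decode finishes only if aux arrived and any expected state arrived."""
    return aux and (sent or not expected)


dense: List[int] = []  # dense models: empty but non-None state_indices

both_bugs = decode_completes(
    aux_sent_buggy(True, dense), state_sent(dense), expects_state_buggy(dense)
)
aux_fix_only = decode_completes(
    aux_sent_fixed(True, dense), state_sent(dense), expects_state_buggy(dense)
)
both_fixes = decode_completes(
    aux_sent_fixed(True, dense), state_sent(dense), expects_state_fixed(dense)
)
print(both_bugs, aux_fix_only, both_fixes)
```

In this model a dense request completes only when both checks are fixed, which is consistent with the note that either fix alone is insufficient.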