Fix PD bootstrap failure handling #24772
Conversation
Co-Authored-By: Cheng Wan <chwan@rice.edu>
Force-pushed from 43a6b02 to 827389c
/tag-and-rerun-ci
Ported from 48135b2 Co-Authored-By: Cheng Wan <chwan@rice.edu>
```python
# In PD-prefill mode the cross-engine contract is `bootstrap_room`:
# the decode-side KV receiver locates the prefill DP rank via
# `bootstrap_room % prefill_dp_size`. Honoring an externally-set
# `routed_dp_rank` here breaks that contract whenever the two
# diverge (e.g., dynamo's KV router picks a rank for load-balance
# reasons that has no relation to `bootstrap_room`). Fall through
# to follow_bootstrap_room dispatch to keep prefill ↔ decode aligned.
if (
    self.server_args.disaggregation_mode == "prefill"
    and req.bootstrap_room is not None
):
    return False
```
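For context, the comment above describes the rank-lookup contract the decode side relies on. A minimal illustrative sketch (the function name and parameters here are assumptions for demonstration, not the actual SGLang API):

```python
# Illustrative sketch of the follow_bootstrap_room contract: both the
# prefill and decode engines must derive the SAME DP rank from the shared
# bootstrap_room, so the mapping is a plain modulo over the prefill DP
# world size. Names are hypothetical stand-ins.
def locate_prefill_dp_rank(bootstrap_room: int, prefill_dp_size: int) -> int:
    # Any externally chosen routed_dp_rank that differs from this value
    # would break prefill/decode alignment.
    return bootstrap_room % prefill_dp_size
```

Because the mapping is deterministic on both sides, no extra coordination message is needed to agree on the rank.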
Not sure about this — if `routed_dp_rank` is not None, shouldn't we respect it?
If it is only assigned when the strategy is `follow_bootstrap_room`, then maybe we should add that to the condition as well, something like:
```diff
 # In PD-prefill mode the cross-engine contract is `bootstrap_room`:
 # the decode-side KV receiver locates the prefill DP rank via
 # `bootstrap_room % prefill_dp_size`. Honoring an externally-set
 # `routed_dp_rank` here breaks that contract whenever the two
 # diverge (e.g., dynamo's KV router picks a rank for load-balance
 # reasons that has no relation to `bootstrap_room`). Fall through
 # to follow_bootstrap_room dispatch to keep prefill ↔ decode aligned.
 if (
     self.server_args.disaggregation_mode == "prefill"
     and req.bootstrap_room is not None
+    and self.load_balance_method == "follow_bootstrap_room"
 ):
     return False
```
I am not sure about this; we should ping liangsheng on it.
Thanks for the review! You're right — this is already handled better on main via #23882, so I've reverted it.
ShangmingCai left a comment
The common backend modification looks good.
This reverts commit 702e0c5.
/rerun-stage stage-c-test-8-gpu-h20

✅ Triggered

/rerun-test test/registered/disaggregation/test_disaggregation_basic.py

✅

No need to run full CI; we only need these two, which should be enough.
```
* main: (87 commits)
  [Fix] Disable FlashInfer allreduce fusion under deterministic inference (sgl-project#24629)
  fix: STANDALONE spec-decode hidden-size mismatch crash (sgl-project#24217)
  Followup fix for Custom AR V2 in non NVL scenarios (sgl-project#24742)
  Fix reduce_scatterv producer contract for SUM_LEN (sgl-project#24785)
  [NPU] Documentation update for communications quantization feature (sgl-project#24668)
  [Session R3] Add routed_experts_start_len for absolute routing slice control (sgl-project#24851)
  [Model] Add MiniCPM-V 4.6 support (sgl-project#24855)
  Support Intern-S2-Preview (sgl-project#24875)
  [PD] Unify dsv4 dispatch with swa (sgl-project#24888)
  Optimize MHC pipeline: DeepGemm, fused norm, fused hc_head (sgl-project#24775)
  Fix PD bootstrap failure handling (sgl-project#24772)
  [Spec] Cleanup idle stub and shape-check patterns (sgl-project#24881)
  [Bug] Add dsv4 state_type branch to mooncake disaggregation (sgl-project#24878)
  [Spec V1] Split draft-extend phase from `EagleDraftInput` into new `EagleDraftExtendInput` (sgl-project#24859)
  [Gemma4] Optimize Gemm4 with fused Q/K/V RMSNorm + per-expert FP8 ckpt loader (sgl-project#24696)
  [spec decoding] support kimi-k2.5-eagle3-mla (sgl-project#24826)
  [SPEC V2] fix: skip stale state updates in spec-v2 overlap (sgl-project#23456)
  [RL] Call torch.cuda.empty_cache() for `in-place` pause mode to avoid OOM (sgl-project#24854)
  [diffusion] CI: add cache-dit CI tests (sgl-project#19213)
  [Utils] Make request dump robust to unpicklable server_args and large meta_info (sgl-project#24767)
  ...

# Conflicts:
#	python/sglang/srt/utils/common.py
```
Summary
- Set `self.bootstrap_infos = None` on bootstrap info fetch failure so downstream code hits the None-check instead of `AttributeError`
- Skip `update_status(WaitingForInput)` when `_setup_bootstrap_infos` already marked the request as `Failed`

Ported from 48135b2
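The guard pattern described in the summary can be sketched as follows (a simplified illustration with hypothetical class and method names, not the actual SGLang code):

```python
# Sketch of the failure-handling pattern: on a failed bootstrap-info fetch,
# store None so downstream callers hit an explicit None-check rather than an
# AttributeError, and skip the WaitingForInput transition once the request
# is already marked Failed. All names here are illustrative stand-ins.
class KVSender:
    def __init__(self):
        self.bootstrap_infos = None
        self.failed = False

    def setup_bootstrap_infos(self, fetch):
        try:
            self.bootstrap_infos = fetch()
        except Exception:
            # Leave a well-defined None instead of an unset attribute.
            self.bootstrap_infos = None
            self.failed = True

    def dispatch(self):
        if self.bootstrap_infos is None:
            # Request was already marked Failed; do not move it back
            # to WaitingForInput.
            return "failed"
        return "waiting_for_input"
```

The key design choice is that failure leaves the object in a fully initialized state, so every downstream code path can branch on `bootstrap_infos is None` instead of wrapping attribute access in try/except.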
Test plan