56 commits
3d17c4a
initial staging buffer
YAMY1234 Mar 4, 2026
6299e2a
fix(staging): add RDMA ordering barrier before scatter
YAMY1234 Mar 5, 2026
d242bde
ring buffer optimization with over commit
YAMY1234 Mar 6, 2026
f2cdab7
revert dynamo int64 workaround for bootstrap_room
YAMY1234 Mar 6, 2026
7d71474
watermark fix
YAMY1234 Mar 7, 2026
05ee729
chunk prefill support
YAMY1234 Mar 7, 2026
c088ec9
gqa fix
YAMY1234 Mar 7, 2026
c5147f9
kvhead offset fix
YAMY1234 Mar 7, 2026
27cbb8d
long ctx fix
YAMY1234 Mar 8, 2026
22f4697
chunk prefill concurrency fix
YAMY1234 Mar 8, 2026
2fdcff5
code refine
YAMY1234 Mar 8, 2026
d844dcb
format
YAMY1234 Mar 8, 2026
5ba8b67
draft fix for dep2
YAMY1234 Mar 9, 2026
9ce4ca8
staging comp
YAMY1234 Mar 9, 2026
dd2f0bb
war
YAMY1234 Mar 9, 2026
97015a0
staging comp2
YAMY1234 Mar 9, 2026
70720eb
watermark deadlock fix
YAMY1234 Mar 9, 2026
41dd9e3
recover staging view fix
YAMY1234 Mar 9, 2026
55bc263
ring buffer debugging
YAMY1234 Mar 10, 2026
9bd7f69
small fixes
YAMY1234 Mar 12, 2026
b2ea68f
on-demand alloc and debug prints
YAMY1234 Mar 12, 2026
1f97535
logging and debugging optimize
YAMY1234 Mar 12, 2026
a2b562a
allocate after prefill start instead of pre-alloc
YAMY1234 Mar 12, 2026
dcdd34e
all fixes in with dep4 and dep2
YAMY1234 Mar 14, 2026
a98abcb
format
YAMY1234 Mar 14, 2026
308228e
debugging clean up
YAMY1234 Mar 14, 2026
3431811
remove dead code
YAMY1234 Mar 14, 2026
807f310
refraction
YAMY1234 Mar 14, 2026
b0a6507
timing fix
YAMY1234 Mar 14, 2026
b39a91a
refine datastructure & handling
YAMY1234 Mar 14, 2026
0bd1a3f
furthur refinement
YAMY1234 Mar 14, 2026
d760362
unused _pending_chunk_count & code quality
YAMY1234 Mar 14, 2026
80ae05f
format & naming
YAMY1234 Mar 14, 2026
e766b9c
decode logic optimize
YAMY1234 Mar 14, 2026
b2c4c53
recovery & cleanup
YAMY1234 Mar 14, 2026
4b16885
small naming optimize
YAMY1234 Mar 14, 2026
c84554b
refract & format
YAMY1234 Mar 15, 2026
345d371
file name and remove war
YAMY1234 Mar 15, 2026
8a6b74a
rename staging as staging_buffer
YAMY1234 Mar 15, 2026
f6a6e84
isolate mamba slice fix
YAMY1234 Mar 16, 2026
98a030c
gather kernel for staging buffer
YAMY1234 Mar 17, 2026
3669d72
decode scatter triton kernel
YAMY1234 Mar 18, 2026
47d7869
isolate scatter to decode thread and cleanup
YAMY1234 Mar 20, 2026
8a41c82
format & triton env
YAMY1234 Mar 21, 2026
048b63c
Merge branch 'main' into staging_buffer
YAMY1234 Mar 23, 2026
87fb6d9
resolve code review and naming
YAMY1234 Mar 26, 2026
296ecb6
use enable_staging to check if possible
YAMY1234 Mar 26, 2026
f9824bd
resolve code review: replace getattr hacks with direct env var, clean…
YAMY1234 Mar 26, 2026
cc28a82
support hetero prefill instances with staging & dead path cleanup
YAMY1234 Mar 28, 2026
d6f5846
put all_reduce_with_staging in disaggregation/utils.py
YAMY1234 Mar 28, 2026
3d32443
Merge upstream main into staging_buffer
YAMY1234 Mar 31, 2026
8469d2b
Move prefill_attn_tp_size and require_staging into CommonKVReceiver
YAMY1234 Mar 31, 2026
4980bb3
remove all staging_handler none checks
YAMY1234 Mar 31, 2026
ea12a60
Merge branch 'main' into staging_buffer
ShangmingCai Mar 31, 2026
8678533
better naming & small adjustment
YAMY1234 Mar 31, 2026
a32b4fb
Merge origin/staging_buffer into staging_buffer
YAMY1234 Mar 31, 2026
8 changes: 8 additions & 0 deletions python/sglang/srt/disaggregation/common/conn.py
@@ -148,6 +148,7 @@ def __init__(
             # These timeout requests should be aborted to release the tree cache.
             self.bootstrap_timeout = envs.SGLANG_DISAGGREGATION_BOOTSTRAP_TIMEOUT.get()
         elif self.disaggregation_mode == DisaggregationMode.DECODE:
+            self.enable_staging: bool = False
             self.connection_pool: Dict[str, Dict[str, Union[str, int]]] = {}
             self.connection_lock = threading.Lock()
             self.required_prefill_response_num_table: Dict[int, int] = {}
@@ -501,6 +502,7 @@ def __init__(
         self.bootstrap_addr = bootstrap_addr
         self.kv_mgr = mgr
         self.conclude_state: Optional[KVPoll] = None
+        self.require_staging: bool = False
         self.kv_mgr.addr_to_rooms_tracker[self.bootstrap_addr].add(self.bootstrap_room)
         self.kv_mgr.update_status(self.bootstrap_room, KVPoll.Bootstrapping)

@@ -529,6 +531,12 @@ def init(self, prefill_dp_rank: int):
             self.required_prefill_response_num
         )

+        if self.kv_mgr.enable_staging:
+            self.require_staging = (
+                self.prefill_info.attn_tp_size != 0
+                and self.prefill_info.attn_tp_size != self.kv_mgr.attn_tp_size
+            )
+
         self.prefill_dp_rank = prefill_dp_rank
         self._setup_bootstrap_infos()
         self.kv_mgr.update_status(self.bootstrap_room, KVPoll.WaitingForInput)
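
For review purposes, the condition added to `init` can be read as a small pure function. The sketch below is not code from this PR: `needs_staging` and its parameter names are hypothetical, and it assumes that an `attn_tp_size` of 0 means the prefill side has not reported a value, which is how the `!= 0` guard appears to be used.

```python
def needs_staging(
    enable_staging: bool,
    prefill_attn_tp_size: int,
    decode_attn_tp_size: int,
) -> bool:
    """Hypothetical restatement of the require_staging check in the diff.

    Staging is requested only when the manager has staging enabled, the
    prefill attention TP size is nonzero (assumed here to mean "known"),
    and it differs from the decode side's attention TP size.
    """
    if not enable_staging:
        # Mirrors the default: require_staging stays False.
        return False
    return (
        prefill_attn_tp_size != 0
        and prefill_attn_tp_size != decode_attn_tp_size
    )


if __name__ == "__main__":
    # Mismatched TP sizes with staging enabled: staging is required.
    print(needs_staging(True, 4, 2))
    # Matching TP sizes: no staging needed.
    print(needs_staging(True, 2, 2))
```

Written this way, the three cases that leave `require_staging` at `False` are easy to see: staging disabled on the manager, a prefill TP size of 0, or equal TP sizes on both sides.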