Closed
92 commits
7e1cd0e
rename FutureMap to Relayer
hnyls2002 May 18, 2026
3a73471
relayer: introduce 5-channel kit with named relay methods
hnyls2002 May 18, 2026
91d300c
relayer: add cpu_future_indices to GenerationBatchResult
hnyls2002 May 18, 2026
4a926de
relayer: alloc cpu_value slots alongside gpu future indices in run_batch
hnyls2002 May 18, 2026
5b622ff
relayer: store kv_committed_delta and finished status to cpu_value ch…
hnyls2002 May 18, 2026
560c484
schedule_batch: annotate seq_lens and verify_done relay migration paths
hnyls2002 May 18, 2026
33a69a0
scheduler: add cross-stream barrier helper and SB lockstep assertion
hnyls2002 May 18, 2026
c739559
relayer: cross-stream sync via cuda event in gpu_scalar channel
hnyls2002 May 18, 2026
dfa2476
schedule_batch: replace verify_done CPU sync with stream-level wait
hnyls2002 May 18, 2026
0476942
relayer: route kv_committed_delta through cpu_value channel resolve
hnyls2002 May 18, 2026
ccc8a31
relayer: filter_batch reads finished status from cpu_value channel
hnyls2002 May 18, 2026
6e5a699
relayer: store post-decode seq_lens family to gpu_scalar channel
hnyls2002 May 18, 2026
dc200dc
relayer: stash sampling_info via state_obj channel during overlap for…
hnyls2002 May 18, 2026
2995268
relayer: wire assert_lockstep at filter_batch exit under env flag
hnyls2002 May 18, 2026
7b0d3da
scheduler: delete SB snapshot/restore in _overlap_forward_isolation
hnyls2002 May 18, 2026
7837db2
relayer: SB auto-mirrors seq_lens family to channel via __setattr__
hnyls2002 May 18, 2026
ce2b2c9
forward_batch_info: read seq_lens family via Relayer channel resolve
hnyls2002 May 18, 2026
b0c4595
schedule_batch: Req routes kv_committed_len reads through relayer cpu…
hnyls2002 May 18, 2026
9b350fb
scheduler: factor spec V2 relay output application into _apply_spec_v…
hnyls2002 May 18, 2026
9ecbee0
relayer: take ownership of 2-iter Python ref pin ring (legacy batch_r…
hnyls2002 May 18, 2026
1ca2d00
sampling_info: rename copy_for_forward to derive_forward_view; state …
hnyls2002 May 18, 2026
0ccd185
schedule_batch: wait on relayer producer event before maybe_evict_swa
hnyls2002 May 18, 2026
23e5350
schedule_batch: assert_lockstep at merge_batch and prepare_for_decode…
hnyls2002 May 18, 2026
9d33c7b
disagg decode: read kv_committed_len via Relayer channel resolve in c…
hnyls2002 May 18, 2026
ddf2879
scheduler: route worker keep-alive refs through Relayer iter pin
hnyls2002 May 18, 2026
95d94c4
forward_batch: carry Relayer ctx + add relayer_resolve_* helpers for …
hnyls2002 May 18, 2026
2f675f3
mem_cache: read seq_lens family via Relayer channel resolve in alloc_…
hnyls2002 May 18, 2026
257e38d
mem_cache: read seq_lens via Relayer channel resolve in alloc_for_decode
hnyls2002 May 18, 2026
287c456
scheduler: bs derived via channel-resolved seq_lens in overlap run_batch
hnyls2002 May 18, 2026
d5fd614
scheduler_output_processor_mixin: attach Req relayer kv_committed ctx…
hnyls2002 May 18, 2026
f093de7
eagle_draft_cuda_graph_runner: read seq_lens via Relayer channel for …
hnyls2002 May 18, 2026
59a3bde
spec cuda graph runners: read seq_lens via Relayer channel resolve in…
hnyls2002 May 18, 2026
0a23dc3
cuda_graph_runner: read seq_lens via Relayer channel resolve in repla…
hnyls2002 May 18, 2026
6c8837f
scheduler: bind Relayer ctx before chunked-prefill prepare_for_decode
hnyls2002 May 18, 2026
de8b0cd
eagle_info_v2: annotate draft-extend SB mutations as Relayer-mirror s…
hnyls2002 May 18, 2026
b9c4380
disagg decode: route kv_committed_len reads through Relayer in retrac…
hnyls2002 May 18, 2026
1ed0f8f
refactor: add ForwardData snapshot type
hnyls2002 May 16, 2026
4690ffd
schedule_batch: add to_forward_data() snapshot constructor returning …
hnyls2002 May 18, 2026
d5b57f5
scheduler+tp_worker: forward path consumes ForwardData snapshot inste…
hnyls2002 May 18, 2026
ade0962
scheduler: non-overlap forward path also builds ForwardData snapshot …
hnyls2002 May 18, 2026
0e618eb
eagle_worker_v2: forward_batch_generation accepts ScheduleBatch | For…
hnyls2002 May 18, 2026
f63b3e0
spec workers: forward_batch_generation accepts ScheduleBatch | Forwar…
hnyls2002 May 18, 2026
8f6e7bb
scheduler+disagg: annotate batch_record_buf as legacy alias; clear re…
hnyls2002 May 18, 2026
40c3b65
overlap_utils: add module-level docstring documenting Relayer mechani…
hnyls2002 May 18, 2026
5f250fd
kv_committed reuse checks: route through Relayer channel resolve
hnyls2002 May 18, 2026
09302e5
overlap_utils: add docstrings to alloc_future_indices and is_empty_slice
hnyls2002 May 18, 2026
593c849
overlap_utils: docstring for FutureIndices explaining slot handle sem…
hnyls2002 May 18, 2026
a146af5
overlap_utils: trim store_to_map docstring to concise
hnyls2002 May 18, 2026
6acf102
schedule_batch: prepare_for_decode now also returns ForwardData snapshot
hnyls2002 May 18, 2026
c744726
schedule_batch: prepare_for_extend returns ForwardData + assert_locks…
hnyls2002 May 18, 2026
b693135
eagle_worker_v2: annotate mid-forward spec_info rebind as Relayer-mirror
hnyls2002 May 18, 2026
f163901
eagle_worker_v2: annotate verify_forward_batch keep-alive as Relayer …
hnyls2002 May 18, 2026
bbdfd13
relayer: drop dead channels; per-buffer producer event
hnyls2002 May 18, 2026
5e1ee80
schedule_batch: drop __setattr__ auto-mirror; bind clears stale ctx; …
hnyls2002 May 18, 2026
1ebcafc
scheduler: batch_record_buf is a Relayer ring property; drop legacy f…
hnyls2002 May 18, 2026
8885c6d
relayer: cpu_value channel is sole source for spec-V2 kv_committed_de…
hnyls2002 May 18, 2026
f2134d4
scheduler: forward-view sampling_info built in to_forward_data; drop …
hnyls2002 May 18, 2026
7ee811a
relayer: pin FD into iter_pin_ring; drop record_stream defenses in sp…
hnyls2002 May 18, 2026
3f67019
forward_batch_info: trim FD docstrings; drop FD migration step refs
hnyls2002 May 18, 2026
722d8bd
scheduler: spec V2 draft input resolved at apply boundary; SB carries…
hnyls2002 May 18, 2026
cd2ba14
schedule_batch: drop maybe_wait_verify_done + verify_done event; chan…
hnyls2002 May 18, 2026
5d0ea69
ci fix: FD path consumes capture_hidden_mode/seq_lens_cpu_cache/retur…
hnyls2002 May 18, 2026
d7eb671
schedule_batch: merge_batch clears relayer ctx; old cpu_value slot no…
hnyls2002 May 18, 2026
04d4dea
fd: add batch_size() method on ForwardData; dflash uses target_worker…
hnyls2002 May 18, 2026
74366b4
fd: extend_lens / prefix_lens properties get setters so spec V2 worke…
hnyls2002 May 18, 2026
ccece61
cpu_value channel: wrap at future_limit (mirrors gpu allocator); fixe…
hnyls2002 May 18, 2026
bdcd76a
relayer: slot-level ready guard for resolve fallback; assert_lockstep…
hnyls2002 May 18, 2026
a0868f2
relayer: same-iter schedule consumers read SB attribute directly; cha…
hnyls2002 May 18, 2026
e645bfd
Merge remote-tracking branch 'origin/main' into lsyin/r3-rm-mwb
hnyls2002 May 18, 2026
62c9fba
strict relayer guard: raise on SB volatile attr read from worker stack
hnyls2002 May 18, 2026
0d34341
spec v1 path: FD carries reqs; scheduler propagates worker spec_info …
hnyls2002 May 18, 2026
1c3c6f1
filter_batch: fallback to req.finished() when channel slot empty
hnyls2002 May 18, 2026
f6ba2c6
non-overlap path: revert to direct SB pass; Relayer scope = overlap only
hnyls2002 May 18, 2026
0f1d921
to_forward_data: aggregate per-req grammars so FD path can install th…
hnyls2002 May 18, 2026
ce25ef3
to_forward_data: pass all_extend_in_batch through FD path for downstr…
hnyls2002 May 18, 2026
bf044e4
spec v1 decode: clear SB.output_ids; accept_tokens flat shape != bs t…
hnyls2002 May 18, 2026
a91e206
strict shim: only enforce FD boundary on forward_batch_generation
hnyls2002 May 18, 2026
ce932b1
spec v2: resolve_future also refreshes batch.spec_info; cached fields…
hnyls2002 May 18, 2026
1887b82
Merge origin/main into lsyin/r3-rm-mwb
hnyls2002 May 18, 2026
b934050
filter_batch: OR channel finished + req.finished(); channel snapshot …
hnyls2002 May 18, 2026
beb154f
cpu_value channel: clear slot on alloc; stale finished_status from wr…
hnyls2002 May 18, 2026
8e2aae0
spec V2 relay rebind: defer in delay-sample path until after store_to…
hnyls2002 May 18, 2026
c304bee
DEBUG: dump reusing req state on alloc assert
hnyls2002 May 18, 2026
4a0fb9e
DEBUG: fix tensor truthiness in alloc debug print
hnyls2002 May 18, 2026
ab1db48
spec V2 overlap: skip ctx rebind for retracted reqs
hnyls2002 May 18, 2026
062d883
resolve_draft_input_from_channel: skip on empty indices (mirrors empt…
hnyls2002 May 18, 2026
0f70953
FD: pass reqs/device/token_to_kv_pool_allocator for spec V2 worker re…
hnyls2002 May 18, 2026
59542ac
restore record_stream for spec V2 verify and channel resolve
hnyls2002 May 18, 2026
13075a1
Merge branch 'main' into lsyin/r3-rm-mwb
hnyls2002 May 18, 2026
869488b
spec v2: req.kv_committed_len += delta in-place (main-style)
hnyls2002 May 19, 2026
b5cc280
Merge remote-tracking branch 'origin/main' into lsyin/r3-rm-mwb
hnyls2002 May 19, 2026
46aaf98
test: rename scheduler.future_map mock to relayer
hnyls2002 May 19, 2026
25 changes: 21 additions & 4 deletions python/sglang/srt/disaggregation/decode.py
@@ -151,6 +151,9 @@ def alloc(self, reqs: List["Req"]) -> Optional[List[int]]:
assert (
len(reusing) <= 1
), "only one chunked request may reuse req_pool_idx in a batch"
+ # ``req.kv_committed_len`` is updated in place by
+ # _resolve_spec_overlap_tokens (main-style); attribute is the
+ # post-iter value, no channel resolve needed.
assert all(
reqs[i].inflight_middle_chunks > 0 or reqs[i].kv_committed_len > 0
for i in reusing
@@ -1216,7 +1219,13 @@ def _pre_alloc(

fill_len = len(req.origin_input_ids) + max(len(req.output_ids) - 1, 0)
req.kv_allocated_len = fill_len
+ # disagg-init kv_committed_len: this is the schedule-side
+ # initialization of a freshly-received KV-ready req, not a
+ # cross-iter relay value, so it remains a direct attribute write.
+ # The Relayer cpu_value channel only owns the per-iter delta
+ # produced by process_batch_result.
req.kv_committed_len = fill_len
+ req.clear_relayer_kv_committed_ctx()

if prefix_len > 0:
self.req_to_token_pool.write(
@@ -1331,7 +1340,12 @@ def _pre_alloc(
# Truncate fill_ids to kv_committed_len so cache_unfinished_req only
# inserts committed KV into the radix tree. The last output token
# hasn't had KV committed yet (fill_ids is 1 ahead).
- req.fill_ids = (req.origin_input_ids + req.output_ids)[: req.kv_committed_len]
+ # Route through Relayer cpu_value channel when a kv_committed ctx is
+ # attached to ``req`` (the channel resolve returns ``baseline +
+ # delta`` from the per-iter store); falls back to the attribute.
+ req.fill_ids = (req.origin_input_ids + req.output_ids)[
+ : req.relayer_resolve_kv_committed_len()
+ ]
# Set prefix_indices so downstream consumers (init_next_round_input,
# prepare_for_extend) see the correct prefix length. In the agg path
# this is done inside init_next_round_input, but decode-disagg needs
@@ -1733,8 +1747,11 @@ def get_new_prebuilt_batch(self: Scheduler) -> Optional[ScheduleBatch]:
req.init_next_round_input(tree_cache)
# Truncate fill_ids to kv_committed_len so cache_unfinished_req
# only sees committed KV (fill_ids includes one uncommitted token).
- if req.kv_committed_len is not None:
- req.fill_ids = req.fill_ids[: req.kv_committed_len]
+ # Route through Relayer cpu_value channel when a kv_committed
+ # ctx is attached to the req; falls back to the attribute.
+ committed_len = req.relayer_resolve_kv_committed_len()
+ if committed_len is not None:
+ req.fill_ids = req.fill_ids[:committed_len]
req.set_extend_input_len(
len(req.fill_ids) - len(req.prefix_indices)
)
@@ -1760,7 +1777,7 @@ def get_new_prebuilt_batch(self: Scheduler) -> Optional[ScheduleBatch]:

# construct fake completed prefill
new_batch.prepare_for_prebuilt()
- new_batch.process_prebuilt(self.server_args, self.future_map)
+ new_batch.process_prebuilt(self.server_args, self.relayer)

return new_batch
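
The comments in the hunks above describe the resolve rule as "baseline + delta" from the per-iter store, with a fallback to the plain attribute when no ctx is attached. A minimal sketch of that rule follows, under stated assumptions: `ToyCpuValueChannel`, `ToyKvCommittedCtx`, and `ToyReq` are hypothetical stand-ins, not sglang classes, and the fallback on an unwritten slot is an assumption rather than something the diff shows. Only the method names `relayer_resolve_kv_committed_len` and `clear_relayer_kv_committed_ctx` are taken from the diff.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ToyCpuValueChannel:
    """Hypothetical stand-in for the cpu_value channel: slot -> per-iter delta."""

    deltas: Dict[int, int] = field(default_factory=dict)

    def store(self, slot: int, delta: int) -> None:
        self.deltas[slot] = delta

    def resolve(self, slot: int) -> Optional[int]:
        # None means the slot has not been written yet.
        return self.deltas.get(slot)


@dataclass
class ToyKvCommittedCtx:
    """Hypothetical ctx: which slot to read and the length before this iter."""

    channel: ToyCpuValueChannel
    slot: int
    baseline: int


@dataclass
class ToyReq:
    origin_input_ids: List[int]
    output_ids: List[int]
    kv_committed_len: int = 0
    kv_ctx: Optional[ToyKvCommittedCtx] = None

    def clear_relayer_kv_committed_ctx(self) -> None:
        self.kv_ctx = None

    def relayer_resolve_kv_committed_len(self) -> int:
        if self.kv_ctx is not None:
            delta = self.kv_ctx.channel.resolve(self.kv_ctx.slot)
            if delta is not None:
                return self.kv_ctx.baseline + delta
        # No ctx attached (or, by assumption, slot not written): use the attribute.
        return self.kv_committed_len


def committed_fill_ids(req: ToyReq) -> List[int]:
    # Truncate so only tokens with committed KV remain.
    all_ids = req.origin_input_ids + req.output_ids
    return all_ids[: req.relayer_resolve_kv_committed_len()]


channel = ToyCpuValueChannel()
req = ToyReq(origin_input_ids=[1, 2, 3, 4], output_ids=[5, 6, 7], kv_committed_len=4)

fill_from_attr = committed_fill_ids(req)  # no ctx: attribute path

req.kv_ctx = ToyKvCommittedCtx(channel, slot=0, baseline=4)
channel.store(0, 2)  # this iteration committed two more tokens
fill_from_channel = committed_fill_ids(req)  # channel path: 4 + 2

req.clear_relayer_kv_committed_ctx()
fill_after_clear = committed_fill_ids(req)  # back to the attribute path
```

Clearing the ctx right after a direct attribute write, as `_pre_alloc` does in the diff, keeps a stale slot from overriding the freshly initialized length in this toy model.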

(file header not captured; the hunks below belong to a second file)
@@ -13,7 +13,7 @@
logger = logging.getLogger(__name__)

if TYPE_CHECKING:
- from sglang.srt.managers.overlap_utils import FutureMap
+ from sglang.srt.managers.overlap_utils import Relayer
from sglang.srt.managers.schedule_batch import ScheduleBatch
from sglang.srt.server_args import ServerArgs

@@ -104,7 +104,7 @@ def prepare_for_prebuilt(self: ScheduleBatch):
def process_prebuilt(
self: ScheduleBatch,
server_args: ServerArgs,
- future_map: FutureMap,
+ relayer: Relayer,
):
"""Assign the buffered last input id to schedule batch"""
self.output_ids = []
@@ -176,10 +176,8 @@ def process_prebuilt(
spec_info.prepare_for_extend(self)
spec_info.capture_hidden_mode = CaptureHiddenMode.LAST
if self.enable_overlap:
- spec_info.future_indices = future_map.alloc_future_indices(
+ spec_info.future_indices = relayer.alloc_future_indices(
len(self.seq_lens)
)
- future_map.store_to_map_for_new_batch(
- spec_info.future_indices, spec_info
- )
+ relayer.store_to_map_for_new_batch(spec_info.future_indices, spec_info)
self.spec_info = spec_info
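
The hunk above reserves one future slot per sequence via `alloc_future_indices(len(self.seq_lens))` and then stores into those slots. Two commit messages in this PR mention that the cpu_value channel wraps at `future_limit` and clears a slot on alloc so stale values do not leak. A rough sketch of that allocation pattern is below; `ToySlotRing` and its methods are hypothetical and only illustrate the idea, and the real `Relayer` allocator may differ in every detail.

```python
from typing import Any, List, Optional, Sequence


class ToySlotRing:
    """Hypothetical wrap-around slot allocator; not sglang's Relayer."""

    def __init__(self, future_limit: int) -> None:
        self.future_limit = future_limit
        self.next_slot = 0
        self.values: List[Optional[Any]] = [None] * future_limit

    def alloc(self, bs: int) -> range:
        assert 0 < bs <= self.future_limit
        # Wrap to the start when the request would run past the end.
        if self.next_slot + bs > self.future_limit:
            self.next_slot = 0
        start = self.next_slot
        self.next_slot += bs
        indices = range(start, start + bs)
        # Clear on alloc so a value left by an earlier lap is never read back.
        for i in indices:
            self.values[i] = None
        return indices

    def store(self, indices: Sequence[int], vals: Sequence[Any]) -> None:
        for i, v in zip(indices, vals):
            self.values[i] = v

    def resolve(self, indices: Sequence[int]) -> List[Optional[Any]]:
        return [self.values[i] for i in indices]


ring = ToySlotRing(future_limit=4)
first = ring.alloc(3)             # slots 0, 1, 2
ring.store(first, ["x", "y", "z"])
second = ring.alloc(2)            # 3 + 2 > 4, so this wraps to slots 0, 1
second_before_store = ring.resolve(second)
```

In this toy version a wrapped allocation reuses slots that an older batch may still expect to read, so a real implementation needs some guarantee that the old consumer is done before the ring laps it.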
8 changes: 8 additions & 0 deletions python/sglang/srt/environ.py
@@ -218,6 +218,14 @@ class Envs:
SGLANG_TEST_RETRACT_NO_PREFILL_BS = EnvInt(2 ** 31)
SGLANG_ENABLE_STRICT_MEM_CHECK_DURING_BUSY = EnvInt(0)
SGLANG_ENABLE_STRICT_MEM_CHECK_DURING_IDLE = EnvBool(True)
+ SGLANG_RELAYER_LOCKSTEP_ASSERT = EnvBool(True)
+ # Strict mode: raise on any ScheduleBatch cross-iter volatile attribute
+ # read from a worker-stream entry frame. Worker reads must go through
+ # ForwardData snapshot or Relayer channel resolve. Default-on so the
+ # "worker never touches live SB" invariant is locked in CI without
+ # contributors having to remember to flip it; set 0 to opt out if the
+ # stack-walk cost shows up in profiling.
+ SGLANG_RELAYER_DEBUG_STRICT = EnvBool(True)

# Scheduler: new token ratio hyperparameters
SGLANG_INIT_NEW_TOKEN_RATIO = EnvFloat(0.7)
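
The comment on `SGLANG_RELAYER_DEBUG_STRICT` describes a guard that walks the call stack and raises when a volatile batch attribute is read from a worker entry frame. The sketch below shows one way such a guard could look, assuming a toy batch class. `ToyBatch`, `VOLATILE_ATTRS`, `env_bool`, and the string parsing are all hypothetical, and the entry-frame name `forward_batch_generation` is borrowed from a commit message rather than from the guard's actual code, which this diff does not show.

```python
import inspect
import os

# Hypothetical set of attributes treated as cross-iter volatile.
VOLATILE_ATTRS = frozenset({"seq_lens"})
# Assumed worker entry point name, taken from a commit message in this PR.
WORKER_ENTRY = "forward_batch_generation"


def env_bool(name: str, default: bool) -> bool:
    # Assumed parsing rule; sglang's EnvBool may accept different spellings.
    raw = os.environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in ("1", "true", "yes", "on")


def called_from_worker() -> bool:
    # Walk outward from the caller looking for the worker entry frame.
    frame = inspect.currentframe()
    while frame is not None:
        if frame.f_code.co_name == WORKER_ENTRY:
            return True
        frame = frame.f_back
    return False


class ToyBatch:
    def __init__(self, seq_lens):
        self.seq_lens = seq_lens

    def __getattribute__(self, name):
        if (
            name in VOLATILE_ATTRS
            and env_bool("SGLANG_RELAYER_DEBUG_STRICT", True)
            and called_from_worker()
        ):
            raise RuntimeError(f"worker read of live batch attribute {name!r}")
        return object.__getattribute__(self, name)


def forward_batch_generation(batch):
    return batch.seq_lens  # worker-side read of the live batch


def schedule_step(batch):
    return batch.seq_lens  # scheduler-side read, always allowed


batch = ToyBatch([3, 5])
os.environ["SGLANG_RELAYER_DEBUG_STRICT"] = "1"
scheduler_view = schedule_step(batch)
try:
    forward_batch_generation(batch)
    worker_raised = False
except RuntimeError:
    worker_raised = True

os.environ["SGLANG_RELAYER_DEBUG_STRICT"] = "0"  # opt out, as the comment suggests
opt_out_view = forward_batch_generation(batch)
```

The stack walk runs on every guarded attribute read, which is the cost the source comment warns may show up in profiling.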