refactor: rename FutureMap to Relayer by hnyls2002 · Pull Request #24823 · sgl-project/sglang

hnyls2002 · 2026-05-09T10:14:01Z

Rename FutureMap class (and future_map / create_future_map references) to Relayer / relayer / create_relayer. Pure rename, no behavior change. Sets up Relayer as the named home for cross-iter relay channels; subsequent work can add channels for CPU per-req values and deferred actions behind the same alloc/store/resolve API.
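To make the alloc/store/resolve shape concrete, here is a minimal sketch of how a producer and a consumer in different iterations might share a slot. `ToyRelayer` and its fields are hypothetical names for illustration only; this is not the sglang `Relayer` class, and the real signatures may differ.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class ToyRelayer:
    """Hypothetical stand-in for a relay channel; not the sglang class."""

    _next_slot: int = 0
    _slots: Dict[int, Any] = field(default_factory=dict)

    def alloc(self) -> int:
        # Reserve a slot handle now; the value is published later.
        slot = self._next_slot
        self._next_slot += 1
        return slot

    def store(self, slot: int, value: Any) -> None:
        # Producer side: publish the value for a previously allocated slot.
        self._slots[slot] = value

    def resolve(self, slot: int) -> Any:
        # Consumer side: read the value and release the slot.
        return self._slots.pop(slot)


relayer = ToyRelayer()
handle = relayer.alloc()              # iteration N: reserve a slot
relayer.store(handle, [11, 12])       # iteration N: producer publishes its output
next_input = relayer.resolve(handle)  # iteration N+1: consumer picks it up
```

The point of the handle is that the scheduler can pass it around before the value exists; only `resolve` needs the value to be present.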

CI States

Latest PR Test (Base): ❌ Run #26071755588
Latest PR Test (Extra): ⚠️ Not enabled -- add run-ci-extra label to opt in.


After a retract, reset_for_retract clears _relayer_kv_committed_ctx and zeros kv_committed_len. The next process_batch_result iteration re-binds ctx on all batch.reqs, including retracted ones, with baseline=0 + delta=0 (the retracted branch stores 0 in the cpu_value slot). On the next prefill of the retracted req, StreamingSession.restore_to_req sets the attribute correctly, but _free_tail uses relayer_resolve_kv_committed_len, which still returns the stale ctx (0+0=0). That trims kv_committed_len to 0 and trips the alloc 'reusing must have committed KV' assert.

Fix: skip the ctx rebind for retracted reqs in _resolve_spec_overlap_tokens. Their channel slot data is unused; ctx stays None until the next bind.
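The failure mode and the fix can be sketched as follows. This is a simplified model, not the sglang code: `ToyReq`, `ctx`, `rebind_ctx` and the value 128 are hypothetical, and the only assumption carried over from the commit message is that a bound ctx (baseline + delta) takes precedence over the attribute when resolving.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ToyReq:
    kv_committed_len: int
    is_retracted: bool = False
    # (baseline, delta) bound for the current iteration, or None if unbound.
    ctx: Optional[Tuple[int, int]] = None


def resolve_kv_committed_len(req: ToyReq) -> int:
    # A bound ctx wins over the attribute, which is why a stale ctx is harmful.
    if req.ctx is not None:
        baseline, delta = req.ctx
        return baseline + delta
    return req.kv_committed_len


def rebind_ctx(reqs: List[ToyReq], deltas: List[int]) -> None:
    for req, delta in zip(reqs, deltas):
        if req.is_retracted:
            continue  # the fix: leave ctx as None until the next real bind
        req.ctx = (req.kv_committed_len, delta)


normal = ToyReq(kv_committed_len=10)
retracted = ToyReq(kv_committed_len=0, is_retracted=True)
rebind_ctx([normal, retracted], [2, 0])

# Later, the retracted req is prefilled again and its attribute is restored.
retracted.kv_committed_len = 128
```

Without the `continue`, the retracted req would carry ctx `(0, 0)` and resolve to 0 even after its attribute had been restored.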

PR-7/8 (Schedule + forward producers both store to Relayer; SB / FD only hold handles, not raw tensor refs) is not in place yet, so:

- spec V2 verify mid-forward rebind (FD.input_ids = predict; then = draft_token; rebind out_cache_loc) drops FD's only ref to the original tensor while fwd_stream still reads it.
- Relayer.resolve_draft_input_from_channel replaces spec_info on the forward stream while the old spec_info's future_indices is still in use, losing its only Python ref.

add_iter_pin(FD) preserves the FD object but not tensors that FD itself has rebound away from. Restore record_stream defenses until PR-7/8 routes these through Relayer handles.

PR moved kv_committed_delta to a Relayer cpu_value channel with the intent of letting the next bind_relayer_for_iter promote the delta into req.kv_committed_len once per iter. But many schedule-side consumers (filter_batch, retract path, mamba_radix_cache_finished, update_running_batch's check_decode_mem, ...) read the attribute directly without going through relayer_resolve_kv_committed_len. In the same iter where _resolve_spec_overlap_tokens runs, these readers saw the stale iter-start baseline (delta not yet promoted), producing wrong KV/seq_len accounting and a measurable spec V2 accuracy drop:

test_eagle_infer_beta gsm8k:
  main : score=0.762 (latency 39s)
  PR   : score=0.687 (latency 73s) -- accept_len 1.36 vs main 1.77
  fix  : score=0.759 (latency 43s)

Apply main's update path: in _resolve_spec_overlap_tokens, mutate req.kv_committed_len in place (+= accept_lens[i] - 1 for normal, -= 1 for finished bonus pre-claim, 0 for retracted). Channel store_kv_committed_delta is kept for any out-of-iter consumer; ctx rebind is dropped since the attribute is already authoritative.
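The in-place update rule can be sketched as below. This is one reading of the commit message, not the actual `_resolve_spec_overlap_tokens` body: it assumes the three cases are mutually exclusive and that "0 for retracted" means a delta of 0 (no change). `ToyReq` and `apply_committed_delta` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ToyReq:
    kv_committed_len: int
    finished: bool = False
    retracted: bool = False


def apply_committed_delta(reqs: List[ToyReq], accept_lens: List[int]) -> None:
    """Mutate kv_committed_len in place so same-iter readers see the new value."""
    for i, req in enumerate(reqs):
        if req.retracted:
            continue  # assumed: delta 0, attribute left untouched
        if req.finished:
            req.kv_committed_len -= 1  # finished bonus pre-claim
        else:
            req.kv_committed_len += accept_lens[i] - 1  # normal case


reqs = [ToyReq(10), ToyReq(10, finished=True), ToyReq(0, retracted=True)]
apply_committed_delta(reqs, accept_lens=[3, 2, 0])
committed = [r.kv_committed_len for r in reqs]
```

Because the attribute itself is updated, consumers that read `req.kv_committed_len` directly in the same iteration no longer depend on a later promotion step.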

hnyls2002 force-pushed the lsyin/r3-rm-mwb branch from bfb2929 to 77016d1 Compare May 9, 2026 11:09

hnyls2002 requested review from Fridge003, Qiaolin-Yu, ShangmingCai, Ying1123, ispobock, kpham-sgl, merrymercy, xiezhq-hermann and yeahdongcn as code owners May 9, 2026 11:09

hnyls2002 changed the title ~~Lock ScheduleBatch to ModelWorkerBatch field mapping~~ Remove ModelWorkerBatch May 9, 2026

hnyls2002 added run-ci bypass-fastfail high priority labels May 9, 2026

hnyls2002 force-pushed the lsyin/r3-rm-mwb branch from a4c70ca to c8ab107 Compare May 15, 2026 23:15

hnyls2002 added the run-ci-extra label May 15, 2026

hnyls2002 mentioned this pull request May 16, 2026

verify_done: wait not synchronize #25465

Merged

hnyls2002 force-pushed the lsyin/r3-rm-mwb branch from 564fd3b to 9d5ff08 Compare May 18, 2026 03:02

rename FutureMap to Relayer

7e1cd0e

hnyls2002 force-pushed the lsyin/r3-rm-mwb branch from 9d5ff08 to 7e1cd0e Compare May 18, 2026 06:19

hnyls2002 requested a review from ByronHsu as a code owner May 18, 2026 06:19

hnyls2002 changed the title ~~Remove ModelWorkerBatch~~ refactor: rename FutureMap to Relayer May 18, 2026

hnyls2002 added 7 commits May 17, 2026 23:49

relayer: introduce 5-channel kit with named relay methods

3a73471

relayer: add cpu_future_indices to GenerationBatchResult

91d300c

relayer: alloc cpu_value slots alongside gpu future indices in run_batch

4a926de

relayer: store kv_committed_delta and finished status to cpu_value channel

5b622ff

schedule_batch: annotate seq_lens and verify_done relay migration paths

560c484

scheduler: add cross-stream barrier helper and SB lockstep assertion

33a69a0

relayer: cross-stream sync via cuda event in gpu_scalar channel

c739559

hnyls2002 and others added 29 commits May 18, 2026 03:12

fd: add batch_size() method on ForwardData; dflash uses target_worker.model_runner.req_to_token_pool instead of batch.req_to_token_pool

04d4dea

fd: extend_lens / prefix_lens properties get setters so spec V2 worker can mutate FD

74366b4

cpu_value channel: wrap at future_limit (mirrors gpu allocator); fixes interval-straddles-wrap IndexError at slot reuse

ccece61
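A minimal sketch of the wrap behavior this commit title describes, assuming a contiguous-interval allocator over a fixed-size buffer. `ToyIntervalAllocator`, `limit` and `cursor` are hypothetical names, and the real allocator may differ in how it reserves or wraps.

```python
from typing import List, Optional


class ToyIntervalAllocator:
    """Hands out contiguous index intervals from a ring of `limit` slots."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.cursor = 0
        self.buf: List[Optional[int]] = [None] * limit

    def alloc(self, n: int) -> range:
        if n > self.limit:
            raise ValueError("request larger than the ring")
        if self.cursor + n > self.limit:
            # Wrap before handing out, so no interval straddles the end.
            self.cursor = 0
        start = self.cursor
        self.cursor += n
        return range(start, start + n)


allocator = ToyIntervalAllocator(limit=8)
first = allocator.alloc(5)   # slots 0..4
second = allocator.alloc(5)  # 5 + 5 > 8, so this wraps back to slots 0..4
third = allocator.alloc(3)   # slots 5..7 fit exactly

# Every returned index is valid for the buffer, so writes cannot raise IndexError.
for idx in second:
    allocator.buf[idx] = idx
```

If the wrap check were skipped, `second` would cover indices 5..9 and the write loop would fail on index 8.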

relayer: slot-level ready guard for resolve fallback; assert_lockstep default-on

bdcd76a

relayer: same-iter schedule consumers read SB attribute directly; channel resolve reserved for cross-stream consumers

a0868f2

Merge remote-tracking branch 'origin/main' into lsyin/r3-rm-mwb

e645bfd

# Conflicts:
#   python/sglang/srt/managers/scheduler.py
#   python/sglang/srt/managers/scheduler_output_processor_mixin.py

strict relayer guard: raise on SB volatile attr read from worker stack

62c9fba

spec v1 path: FD carries reqs; scheduler propagates worker spec_info writes back to SB

0d34341

filter_batch: fallback to req.finished() when channel slot empty

1c3c6f1

non-overlap path: revert to direct SB pass; Relayer scope = overlap only

f6ba2c6

to_forward_data: aggregate per-req grammars so FD path can install them on sampling_info

0f1d921

to_forward_data: pass all_extend_in_batch through FD path for downstream-fork parity

ce25ef3

spec v1 decode: clear SB.output_ids; accept_tokens flat shape != bs trips lockstep assert

bf044e4

strict shim: only enforce FD boundary on forward_batch_generation

a91e206

spec v2: resolve_future also refreshes batch.spec_info; cached fields drift across merge/filter on running_batch

ce932b1

Merge origin/main into lsyin/r3-rm-mwb

1887b82

filter_batch: OR channel finished + req.finished(); channel snapshot precedes check_finished

b934050
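The combination rule named in this commit title can be sketched as a small predicate. This assumes the channel slot holds `None` when nothing was stored and that the snapshot may predate the finish check; `is_finished` is a hypothetical helper, not a function from the PR.

```python
from typing import Optional


def is_finished(channel_value: Optional[bool], req_finished: bool) -> bool:
    """Treat a req as finished if either source says so.

    An empty slot (None) or a snapshot taken before the finish check (False)
    carries no positive signal, so the request's own state is also consulted.
    """
    return bool(channel_value) or req_finished


cases = {
    "empty slot, req finished": is_finished(None, True),
    "stale snapshot, req finished": is_finished(False, True),
    "channel finished only": is_finished(True, False),
    "neither": is_finished(None, False),
}
```

OR-ing the two sources means a stale or empty channel slot can delay nothing: the request's own finished state is always enough to filter it.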

cpu_value channel: clear slot on alloc; stale finished_status from wrap caused filter_batch to drop reqs without cache_finished_req, leaking KV

beb154f
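A minimal sketch of why clearing on alloc matters once slots wrap, assuming a one-slot-per-alloc ring. `ToyCpuValueChannel` and its methods are hypothetical names for illustration, not the PR's channel implementation.

```python
from typing import List, Optional


class ToyCpuValueChannel:
    """Ring of per-request slots holding an optional finished flag."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.cursor = 0
        self.finished: List[Optional[bool]] = [None] * limit

    def alloc(self) -> int:
        slot = self.cursor
        self.cursor = (self.cursor + 1) % self.limit
        # Drop whatever the previous owner of this slot left behind.
        self.finished[slot] = None
        return slot

    def store_finished(self, slot: int, value: bool) -> None:
        self.finished[slot] = value

    def read_finished(self, slot: int) -> Optional[bool]:
        return self.finished[slot]


channel = ToyCpuValueChannel(limit=2)
old_slot = channel.alloc()
channel.store_finished(old_slot, True)  # previous owner finished
channel.alloc()                         # second slot
reused_slot = channel.alloc()           # wraps back to the first slot
```

Without the reset in `alloc`, the new owner of `reused_slot` would inherit `True` and look finished before it had produced anything.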

spec V2 relay rebind: defer in delay-sample path until after store_to_map

8e2aae0

DEBUG: dump reusing req state on alloc assert

c304bee

DEBUG: fix tensor truthiness in alloc debug print

4a0fb9e

resolve_draft_input_from_channel: skip on empty indices (mirrors empty-intv store skip)

062d883

FD: pass reqs/device/token_to_kv_pool_allocator for spec V2 worker read-only consumption

0f70953

Merge branch 'main' into lsyin/r3-rm-mwb

13075a1

Merge remote-tracking branch 'origin/main' into lsyin/r3-rm-mwb

b5cc280

# Conflicts:
#   python/sglang/srt/disaggregation/decode.py
#   python/sglang/srt/managers/schedule_batch.py
#   python/sglang/srt/mem_cache/memory_pool.py

test: rename scheduler.future_map mock to relayer

46aaf98

PR renamed Scheduler.future_map to Scheduler.relayer; mainline test fixture still set the old attribute name, so the new code path in get_new_prebuilt_batch (process_prebuilt reads self.relayer) tripped AttributeError on the mock.
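The failure can be reproduced in miniature with a plain attribute bag standing in for the scheduler mock. This is a sketch only: `read_relayer` is a hypothetical stand-in for the code path that reads `self.relayer`, and the mocks are not the project's test fixture.

```python
from types import SimpleNamespace


def read_relayer(scheduler):
    # Hypothetical reader that, like the renamed code path, touches scheduler.relayer.
    return scheduler.relayer


stale_mock = SimpleNamespace(future_map=object())  # fixture with the old name
fixed_mock = SimpleNamespace(relayer=object())     # fixture after the rename

try:
    read_relayer(stale_mock)
    stale_mock_works = True
except AttributeError:
    # The old attribute name is never looked up, so the read fails.
    stale_mock_works = False

resolved = read_relayer(fixed_mock)
```

Renaming the attribute on the fixture is enough because the reader only cares that `relayer` exists.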

hnyls2002 closed this Jun 10, 2026

hnyls2002 wants to merge 92 commits into main from lsyin/r3-rm-mwb
