refactor: rename FutureMap to Relayer #24823

Closed
hnyls2002 wants to merge 92 commits into main from lsyin/r3-rm-mwb

hnyls2002 (Collaborator) commented May 9, 2026

Rename FutureMap class (and future_map / create_future_map references) to Relayer / relayer / create_relayer. Pure rename, no behavior change. Sets up Relayer as the named home for cross-iter relay channels; subsequent work can add channels for CPU per-req values and deferred actions behind the same alloc/store/resolve API.
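
To make the alloc/store/resolve shape concrete, here is a minimal sketch of a relay channel with that three-step API. `ToyRelayer`, its slot dictionary, and the integer handles are all hypothetical names chosen for illustration; this is not sglang's `Relayer` implementation, only one plausible reading of the description above.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class ToyRelayer:
    """Hypothetical stand-in for a cross-iteration relay channel."""

    _slots: Dict[int, Any] = field(default_factory=dict)
    _next_handle: int = 0

    def alloc(self) -> int:
        """Reserve a slot and return its handle before the value exists."""
        handle = self._next_handle
        self._next_handle += 1
        self._slots[handle] = None
        return handle

    def store(self, handle: int, value: Any) -> None:
        """Producer side: fill a previously allocated slot."""
        if handle not in self._slots:
            raise KeyError(f"unknown handle {handle}")
        self._slots[handle] = value

    def resolve(self, handle: int) -> Any:
        """Consumer side: read the stored value and release the slot."""
        if self._slots.get(handle) is None:
            raise RuntimeError(f"handle {handle} resolved before store")
        return self._slots.pop(handle)


relayer = ToyRelayer()
handle = relayer.alloc()            # iteration N: reserve a slot
relayer.store(handle, [1, 2, 3])    # producer writes the value later
value = relayer.resolve(handle)     # iteration N+1: consumer reads it
```

In this sketch the consumer holds only a handle, never the value itself, which is the property the description relies on when it talks about adding further channels behind the same API.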


CI States

Latest PR Test (Base): ❌ Run #26071755588
Latest PR Test (Extra): ⚠️ Not enabled -- add run-ci-extra label to opt in.

gemini-code-assist (Contributor)

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

hnyls2002 requested a review from ByronHsu as a code owner May 18, 2026 06:19
hnyls2002 changed the title from "Remove ModelWorkerBatch" to "refactor: rename FutureMap to Relayer" May 18, 2026
hnyls2002 and others added 29 commits May 18, 2026 03:12
….model_runner.req_to_token_pool instead of batch.req_to_token_pool
…s interval-straddles-wrap IndexError at slot reuse
…nnel resolve reserved for cross-stream consumers
# Conflicts:
#	python/sglang/srt/managers/scheduler.py
#	python/sglang/srt/managers/scheduler_output_processor_mixin.py
…ap caused filter_batch to drop reqs without cache_finished_req, leaking KV
After retract -> reset_for_retract clears _relayer_kv_committed_ctx and
zeros kv_committed_len. The next process_batch_result iter re-binds ctx
on all batch.reqs including retracted ones with baseline=0 + delta=0
(retracted branch stores 0 in the cpu_value slot). On the next prefill
of the retracted req, StreamingSession.restore_to_req sets the
attribute correctly, but _free_tail uses relayer_resolve_kv_committed_len
which still returns the stale ctx (0+0=0), trimming kv_committed_len to
0 and tripping the alloc 'reusing must have committed KV' assert.

Fix: skip ctx rebind for retracted reqs in _resolve_spec_overlap_tokens.
Their channel slot data is unused; ctx stays None until next bind.
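
The stale-ctx sequence in the commit message above can be replayed in a small model. Everything here is a simplified assumption: `RetractReq`, the `(baseline, delta)` tuple layout, and the fallback to the attribute when no ctx is bound are illustrative stand-ins, not the real sglang data structures.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class RetractReq:
    kv_committed_len: int = 0
    ctx: Optional[Tuple[int, int]] = None  # hypothetical (baseline, delta) pair
    retracted: bool = False


def resolve_kv_committed_len(req: RetractReq) -> int:
    # Assumption: with no ctx bound, the attribute itself is authoritative.
    if req.ctx is None:
        return req.kv_committed_len
    baseline, delta = req.ctx
    return baseline + delta


def rebind(req: RetractReq, delta: int, skip_retracted: bool) -> None:
    # With the fix enabled, retracted reqs keep ctx=None until the next bind.
    if skip_retracted and req.retracted:
        return
    req.ctx = (req.kv_committed_len, delta)


def simulate(skip_retracted: bool) -> int:
    req = RetractReq(kv_committed_len=128, ctx=(128, 0))
    # Retract: ctx is cleared and the committed length is zeroed.
    req.ctx = None
    req.kv_committed_len = 0
    req.retracted = True
    # Next result-processing iteration rebinds ctx on every req in the batch.
    rebind(req, delta=0, skip_retracted=skip_retracted)
    # Next prefill restores the attribute to its correct value.
    req.kv_committed_len = 128
    return resolve_kv_committed_len(req)


before_fix = simulate(skip_retracted=False)  # resolves through the stale ctx
after_fix = simulate(skip_retracted=True)    # resolves through the attribute
```

Under these assumptions the unfixed path resolves to 0 + 0 = 0 even though the attribute was restored, while the fixed path returns the restored value.
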
PR-7/8 (Schedule + forward producers both store to Relayer; SB / FD
only hold handles, not raw tensor refs) is not in place yet, so:

 - spec V2 verify mid-forward rebind (FD.input_ids = predict; then
   = draft_token; rebind out_cache_loc) drops FD's only ref to the
   original tensor while fwd_stream still reads it.
 - Relayer.resolve_draft_input_from_channel replaces spec_info on the
   forward stream while the old spec_info's future_indices is still
   in use, losing its only Python ref.

add_iter_pin(FD) preserves the FD object but not tensors that FD
itself has rebound away from. Restore record_stream defenses until
PR-7/8 routes these through Relayer handles.
PR moved kv_committed_delta to a Relayer cpu_value channel with the
intent of letting next bind_relayer_for_iter promote the delta
into req.kv_committed_len once per iter. But many schedule-side
consumers (filter_batch, retract path, mamba_radix_cache_finished,
update_running_batch's check_decode_mem, ...) read the attribute
directly without going through relayer_resolve_kv_committed_len.
In the same iter where _resolve_spec_overlap_tokens runs, these
readers saw the stale iter-start baseline (delta not yet promoted),
producing wrong KV/seq_len accounting and a measurable spec V2
accuracy drop:

  test_eagle_infer_beta gsm8k:
    main  : score=0.762 (latency 39s)
    PR    : score=0.687 (latency 73s) -- accept_len 1.36 vs main 1.77
    fix   : score=0.759 (latency 43s)

Apply main's update path: in _resolve_spec_overlap_tokens, mutate
req.kv_committed_len in place (+= accept_lens[i] - 1 for normal,
-= 1 for finished bonus pre-claim, 0 for retracted). Channel
store_kv_committed_delta is kept for any out-of-iter consumer; ctx
rebind is dropped since the attribute is already authoritative.
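
The in-place update rule described above can be sketched as follows. `SpecReq` and `apply_spec_overlap` are hypothetical names, and the order in which the retracted and finished cases are checked is an assumption; only the three adjustments themselves (+= accept_len - 1, -= 1, and no change) come from the commit message.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SpecReq:
    kv_committed_len: int
    finished: bool = False
    retracted: bool = False


def apply_spec_overlap(reqs: List[SpecReq], accept_lens: List[int]) -> None:
    """Mutate kv_committed_len in place so same-iteration readers see it."""
    for req, accept_len in zip(reqs, accept_lens):
        if req.retracted:
            continue                               # retracted: no change
        if req.finished:
            req.kv_committed_len -= 1              # finished bonus pre-claim
        else:
            req.kv_committed_len += accept_len - 1  # normal request


batch = [
    SpecReq(kv_committed_len=10),
    SpecReq(kv_committed_len=10, finished=True),
    SpecReq(kv_committed_len=0, retracted=True),
]
apply_spec_overlap(batch, accept_lens=[3, 2, 1])
committed = [req.kv_committed_len for req in batch]
```

Because the attribute is updated directly, any reader that accesses `kv_committed_len` without going through a resolve helper sees the post-update value in the same iteration, which is the point of the change.
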
# Conflicts:
#	python/sglang/srt/disaggregation/decode.py
#	python/sglang/srt/managers/schedule_batch.py
#	python/sglang/srt/mem_cache/memory_pool.py
PR renamed Scheduler.future_map to Scheduler.relayer; mainline test
fixture still set the old attribute name, so the new code path in
get_new_prebuilt_batch (process_prebuilt reads self.relayer) tripped
AttributeError on the mock.
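
The fixture failure can be reproduced in miniature. This sketch assumes the test double is a plain attribute container; `read_relayer` is a hypothetical stand-in for the code path that reads `self.relayer`, and `types.SimpleNamespace` replaces whatever mock type the real test uses.

```python
from types import SimpleNamespace


def read_relayer(scheduler):
    # Hypothetical stand-in for a code path that reads scheduler.relayer.
    return scheduler.relayer


def trips_attribute_error(scheduler) -> bool:
    try:
        read_relayer(scheduler)
    except AttributeError:
        return True
    return False


stale_fixture = SimpleNamespace(future_map=object())  # old attribute name
fixed_fixture = SimpleNamespace(relayer=object())     # renamed attribute

stale_fails = trips_attribute_error(stale_fixture)
fixed_fails = trips_attribute_error(fixed_fixture)
```

A fixture that auto-creates attributes on access would hide this kind of rename mismatch instead of raising, so a strict container is used here.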
hnyls2002 closed this Jun 10, 2026