refactor: rename FutureMap to Relayer #24823
Closed
hnyls2002 wants to merge 92 commits into
a4c70ca to c8ab107
564fd3b to 9d5ff08
9d5ff08 to 7e1cd0e
….model_runner.req_to_token_pool instead of batch.req_to_token_pool
…s interval-straddles-wrap IndexError at slot reuse
…nnel resolve reserved for cross-stream consumers
# Conflicts:
# python/sglang/srt/managers/scheduler.py
# python/sglang/srt/managers/scheduler_output_processor_mixin.py
…writes back to SB
…em on sampling_info
…rips lockstep assert
… drift across merge/filter on running_batch
…precedes check_finished
…ap caused filter_batch to drop reqs without cache_finished_req, leaking KV
After retract, reset_for_retract clears _relayer_kv_committed_ctx and zeros kv_committed_len. The next process_batch_result iter then re-binds ctx on every req in batch.reqs, retracted ones included, with baseline=0 and delta=0 (the retracted branch stores 0 in the cpu_value slot).

On the next prefill of the retracted req, StreamingSession.restore_to_req sets the attribute correctly, but _free_tail reads through relayer_resolve_kv_committed_len, which still returns the stale ctx value (0 + 0 = 0). That trims kv_committed_len to 0 and trips the alloc 'reusing must have committed KV' assert.

Fix: skip the ctx rebind for retracted reqs in _resolve_spec_overlap_tokens. Their channel slot data is unused, and ctx stays None until the next bind.
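The stale-ctx sequence described above can be replayed in a few lines. This is a minimal sketch, not the sglang code: Req, bind_ctx, resolve_kv_committed_len, and replay are hypothetical stand-ins, the (baseline, delta) tuple only approximates _relayer_kv_committed_ctx, and the restored length 128 is an arbitrary value.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Req:
    """Hypothetical request holding only the fields the commit message mentions."""
    kv_committed_len: int = 0
    is_retracted: bool = False
    # (baseline, delta) pair standing in for _relayer_kv_committed_ctx
    ctx: Optional[Tuple[int, int]] = None


def resolve_kv_committed_len(req: Req) -> int:
    # Assumed behavior: a bound ctx takes precedence over the attribute.
    if req.ctx is None:
        return req.kv_committed_len
    baseline, delta = req.ctx
    return baseline + delta


def bind_ctx(reqs: List[Req], deltas: List[int], skip_retracted: bool) -> None:
    for req, delta in zip(reqs, deltas):
        if skip_retracted and req.is_retracted:
            continue  # the fix: leave ctx as None until the next real bind
        req.ctx = (req.kv_committed_len, delta)


def replay(skip_retracted: bool) -> int:
    req = Req(kv_committed_len=0, is_retracted=True)  # state after the retract reset
    bind_ctx([req], [0], skip_retracted)              # next result-processing iter
    req.kv_committed_len = 128                        # restore sets the attribute
    return resolve_kv_committed_len(req)              # what the tail-free path would read
```

In this toy model, replay(False) resolves to 0 (the stale ctx wins over the restored attribute), while replay(True) resolves to the restored 128.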
…y-intv store skip)
…ad-only consumption
PR-7/8 (Schedule + forward producers both store to Relayer; SB / FD only hold handles, not raw tensor refs) is not in place yet, so:
- spec V2 verify mid-forward rebind (FD.input_ids = predict; then = draft_token; rebind out_cache_loc) drops FD's only ref to the original tensor while fwd_stream still reads it.
- Relayer.resolve_draft_input_from_channel replaces spec_info on the forward stream while the old spec_info's future_indices is still in use, losing its only Python ref.

add_iter_pin(FD) preserves the FD object but not tensors that FD itself has rebound away from. Restore record_stream defenses until PR-7/8 routes these through Relayer handles.
PR moved kv_committed_delta to a Relayer cpu_value channel with the
intent of letting next bind_relayer_for_iter promote the delta
into req.kv_committed_len once per iter. But many schedule-side
consumers (filter_batch, retract path, mamba_radix_cache_finished,
update_running_batch's check_decode_mem, ...) read the attribute
directly without going through relayer_resolve_kv_committed_len.
In the same iter where _resolve_spec_overlap_tokens runs, these
readers saw the stale iter-start baseline (delta not yet promoted),
producing wrong KV/seq_len accounting and a measurable spec V2
accuracy drop:
test_eagle_infer_beta gsm8k:
main : score=0.762 (latency 39s)
PR : score=0.687 (latency 73s) -- accept_len 1.36 vs main 1.77
fix : score=0.759 (latency 43s)
Apply main's update path: in _resolve_spec_overlap_tokens, mutate
req.kv_committed_len in place (+= accept_lens[i] - 1 for normal,
-= 1 for finished bonus pre-claim, 0 for retracted). Channel
store_kv_committed_delta is kept for any out-of-iter consumer; ctx
rebind is dropped since the attribute is already authoritative.
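The in-place update rule in the last paragraph can be sketched as follows. This is an illustration under assumptions, not the body of _resolve_spec_overlap_tokens: the Req dataclass, the finished / retracted flags, the function name, and the order in which the two flags are checked are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Req:
    """Hypothetical request with only the fields this rule touches."""
    kv_committed_len: int
    finished: bool = False
    retracted: bool = False


def apply_accept_lens(reqs: List[Req], accept_lens: List[int]) -> None:
    """Mutate kv_committed_len in place so direct attribute readers see fresh values."""
    for req, accept_len in zip(reqs, accept_lens):
        if req.retracted:
            continue                            # delta 0 for retracted reqs
        if req.finished:
            req.kv_committed_len -= 1           # undo the finished bonus pre-claim
        else:
            req.kv_committed_len += accept_len - 1  # normal decode step
```

For example, three reqs starting at length 10 (one normal with accept length 3, one finished, one retracted) end at 12, 9, and 10. Because the attribute is updated immediately, readers such as filter_batch do not need to go through a resolve helper in the same iter.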
# Conflicts:
# python/sglang/srt/disaggregation/decode.py
# python/sglang/srt/managers/schedule_batch.py
# python/sglang/srt/mem_cache/memory_pool.py
PR renamed Scheduler.future_map to Scheduler.relayer; mainline test fixture still set the old attribute name, so the new code path in get_new_prebuilt_batch (process_prebuilt reads self.relayer) tripped AttributeError on the mock.
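The fixture mismatch can be reproduced with a plain namespace object. This sketch assumes a fixture that behaves like types.SimpleNamespace (missing attributes raise AttributeError); the real test's mock type is not shown in the commit message, and reads_relayer is a hypothetical stand-in for the code path that reads self.relayer.

```python
from types import SimpleNamespace


def reads_relayer(scheduler) -> bool:
    """Return True if the renamed attribute is present on the fixture."""
    try:
        scheduler.relayer
        return True
    except AttributeError:
        return False


# Fixture written before the rename: only the old attribute name is set.
old_fixture = SimpleNamespace(future_map=object())
# Fixture updated for the rename.
new_fixture = SimpleNamespace(relayer=object())
```

Here reads_relayer(old_fixture) is False and reads_relayer(new_fixture) is True, which is the same failure mode as the one described: the code was renamed, the fixture was not.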
Rename FutureMap class (and future_map / create_future_map references) to Relayer / relayer / create_relayer. Pure rename, no behavior change. Sets up Relayer as the named home for cross-iter relay channels; subsequent work can add channels for CPU per-req values and deferred actions behind the same alloc/store/resolve API.

CI States
Latest PR Test (Base): ❌ Run #26071755588
Latest PR Test (Extra): ⚠️ Not enabled -- add run-ci-extra label to opt in.
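For readers unfamiliar with the alloc/store/resolve pattern named in the description, a toy relay channel might look like the sketch below. Everything here is hypothetical: ToyRelayer, its method signatures, and the dict-backed storage are illustrative assumptions and do not reflect the actual Relayer class in sglang.

```python
from typing import Any, Dict


class ToyRelayer:
    """Hypothetical relay channel: a producer stores a value in one iter,
    and a consumer resolves it by handle in a later iter."""

    def __init__(self) -> None:
        self._slots: Dict[int, Any] = {}
        self._next_handle = 0

    def alloc(self) -> int:
        # Reserve a slot and hand back an opaque handle.
        handle = self._next_handle
        self._next_handle += 1
        self._slots[handle] = None
        return handle

    def store(self, handle: int, value: Any) -> None:
        # Only previously allocated handles may be written.
        if handle not in self._slots:
            raise KeyError(handle)
        self._slots[handle] = value

    def resolve(self, handle: int) -> Any:
        # Consume the value and release the slot.
        return self._slots.pop(handle)
```

A consumer holds only the integer handle between iterations, so the producer side can change how values are stored without touching the consumer.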