[Spec] Route seq_lens through FutureMap; drop verify_done.wait #25879
Merged
Code Review
This pull request updates the prepare_for_extend_to_fill_draft_kvcache function in eagle_info_v2.py to materialize sequence lengths from the GPU to the CPU using a single transfer. This change calculates seq_lens_cpu, seq_lens_sum, and prefix_lens directly from the materialized data, which removes the previous dependency on refresh_seq_lens_cpu and simplifies the batch metadata updates. I have no feedback to provide as there were no review comments.
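A minimal sketch of the single-transfer idea summarized above, in plain Python. `FakeDeviceTensor` and `materialize_lens` are hypothetical names used only for illustration, and the relation `prefix_len = seq_len - extend_len` is an assumption made here, not something taken from the PR's diff.

```python
from typing import List, Sequence, Tuple


class FakeDeviceTensor:
    """Hypothetical stand-in for a GPU tensor that counts device-to-host copies."""

    def __init__(self, values: Sequence[int]) -> None:
        self._values = list(values)
        self.transfers = 0

    def tolist(self) -> List[int]:
        # Each call models one D2H transfer.
        self.transfers += 1
        return list(self._values)


def materialize_lens(
    seq_lens: FakeDeviceTensor, extend_lens: Sequence[int]
) -> Tuple[List[int], int, List[int]]:
    """Derive all host-side metadata from one materialized copy."""
    seq_lens_cpu = seq_lens.tolist()  # the only transfer in this function
    seq_lens_sum = sum(seq_lens_cpu)  # computed on the host copy
    # Assumption for this sketch: prefix + extend == seq_len per request.
    prefix_lens = [s - e for s, e in zip(seq_lens_cpu, extend_lens)]
    return seq_lens_cpu, seq_lens_sum, prefix_lens
```

Calling `materialize_lens` once leaves `transfers` at 1, which is the property the review summary describes: every derived value comes from the same host copy rather than from a separate refresh step.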
/tag-and-rerun-ci extra
dea0b25 to d3331c9
d3331c9 to 512d164
hnyls2002 added a commit that referenced this pull request on May 21, 2026:

SGLANG_SPEC_V2_NO_VERIFY_SYNC=ON fully skips the remaining D2H sync on top of #25879's FutureMap design:
- scheduler.py: gate FutureMap.resolve_seq_lens_cpu on the env so batch.seq_lens_cpu stays None across the schedule prep
- eagle_info_v2.prepare_for_extend_to_fill_draft_kvcache: add gpu_only branch (triggered when batch.seq_lens_cpu is None) that produces extend_lens / prefix_lens as device tensors directly, avoiding .tolist() + later H2D inside ForwardBatch.init_new
- forward_batch_info.init_new: tolerate None seq_lens_cpu and accept Tensor extend_seq_lens / extend_prefix_lens unchanged
- eagle_worker_v2 / multi_layer_eagle_worker_v2: lazily compute seq_lens_sum just before build_tree_kernel_efficient when no preallocated mask buf forces the value
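The `gpu_only` branch described in this commit message can be sketched roughly as follows. This is an illustration only: `FakeDeviceTensor` and `prepare_lens` are hypothetical, and the fixed per-request `num_draft_tokens` extension is an assumption of the sketch rather than the commit's actual logic.

```python
from typing import Iterable, List, Optional, Tuple, Union


class FakeDeviceTensor:
    """Hypothetical stand-in for a device tensor that counts D2H copies."""

    def __init__(self, values: Iterable[int]) -> None:
        self.values = list(values)
        self.d2h_copies = 0

    def __sub__(self, other: int) -> "FakeDeviceTensor":
        # Elementwise arithmetic stays "on device": no host copy is counted.
        return FakeDeviceTensor(v - other for v in self.values)

    def tolist(self) -> List[int]:
        self.d2h_copies += 1
        return list(self.values)


Lens = Union[List[int], FakeDeviceTensor]


def prepare_lens(
    seq_lens: FakeDeviceTensor,
    seq_lens_cpu: Optional[List[int]],
    num_draft_tokens: int,
) -> Tuple[Lens, Lens]:
    """Return (extend_lens, prefix_lens); device values when seq_lens_cpu is None."""
    if seq_lens_cpu is None:
        # gpu_only branch: never call .tolist(), so no sync is forced.
        extend = FakeDeviceTensor([num_draft_tokens] * len(seq_lens.values))
        prefix = seq_lens - num_draft_tokens
        return extend, prefix
    # Host branch: reuse the already-resolved CPU lengths.
    extend_cpu = [num_draft_tokens] * len(seq_lens_cpu)
    prefix_cpu = [s - num_draft_tokens for s in seq_lens_cpu]
    return extend_cpu, prefix_cpu
```

The point of the sketch is the gating condition: when `seq_lens_cpu` is `None`, both outputs keep the device type and the copy counter stays at zero, so a downstream consumer has to accept either representation.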
hnyls2002 added a commit that referenced this pull request on May 21, 2026
fzyzcjy added a commit to fzyzcjy/sglang that referenced this pull request on May 25, 2026:

- schedule_batch.py: drop the self.maybe_wait_verify_done() call in merge_batch, since upstream removed verify_done.wait via FutureMap routing (sgl-project#25879); keep our branch's assert against chunked/dllm reqs in other.reqs.
- test/registered/unit/managers/test_scheduler_chunked_req_gate.py: keep HEAD's deletion (v1 gate removed in v2); upstream's array.array migration is moot since the file goes away.
Summary

- Drop the `verify_done.wait()` cross-stream barrier: route `seq_lens` through `FutureMap` so schedule-stream consumers gate on a forward-stream `publish_ready` event instead
- `FutureMap` buf writes by consumer: `publish` (schedule-consumed `new_seq_lens_buf`, fence-gated) vs `stash` (forward-only fields, FIFO-covered)
- `on_verify_complete` callback (fires between sample and `draft_extend`), preserving schedule prep / `draft_extend` overlap

Mechanism

Between iters (schedule stream)

- `batch.seq_lens = -future_indices.indices`: schedule-stream sentinel
- `FutureMap.resolve_seq_lens_cpu(batch)` pulls `seq_lens_cpu` from `new_seq_lens_buf` via D2H, gated on `publish_ready`

Inside isolation (forward stream)

- `resolve_future` reassigns `batch.seq_lens` from `new_seq_lens_buf[indices]`
- `on_verify_complete(new_seq_lens)` between sample-end and `draft_extend` → `FutureMap.publish` writes `new_seq_lens_buf` + records `publish_ready`
- After `draft_extend`, `FutureMap.stash` writes forward-only fields (topk / hidden / bonus); same-stream FIFO covers the next iter's `resolve_future` read

Changes

FutureMap (`overlap_utils.py`)

- `publish` / `stash` methods (replace `store_to_map`)
- `resolve_seq_lens_cpu` for schedule-stream D2H of `new_seq_lens_buf`
- `new_seq_lens_buf` eager-allocated (fixed shape/dtype); forward-only bufs stay lazy
- `publish_ready` event lives on `FutureMap` (no per-`FutureIndices` event)

Workers (`eagle_worker_v2.py`, `multi_layer_eagle_worker_v2.py`)

- `forward_batch_generation` accepts `on_verify_complete` kwarg

Scheduler (`scheduler.py`)

- `batch.seq_lens = -future_indices.indices` between iters
- `resolve_seq_lens_cpu` pre-isolation, gated by `batch.is_spec_v2`
- `functools.partial` to bind `future_indices` for the callback

ScheduleBatch (`schedule_batch.py`)

- Remove the `refresh_seq_lens_cpu` helper; inline `seq_lens_sum = int(seq_lens_cpu.sum())` at call sites
- Remove `maybe_wait_verify_done` (replaced by the FutureMap fence)

EagleDraftInput (`eagle_info.py`)

- Remove the `verify_done` field (fence moved to `FutureMap`)

CI States
Latest PR Test (Base): 🚫 Run #26205726688
Latest PR Test (Extra): ❌ Run #26205726614
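To make the publish / stash split from the PR description concrete, here is a toy model in plain Python. It is a sketch under stated assumptions, not the sglang implementation: `ToyFutureMap`, `forward_step`, and `run_demo` are hypothetical, a `threading.Event` stands in for the forward-stream `publish_ready` event, Python lists stand in for device buffers, and only a single iteration is modeled.

```python
import functools
import threading
from typing import Dict, List, Optional, Sequence, Tuple


class ToyFutureMap:
    """Toy model of the publish/stash split (hypothetical, one iteration only)."""

    def __init__(self, size: int) -> None:
        # Eagerly allocated with a fixed shape; read by the schedule side.
        self.new_seq_lens_buf: List[int] = [0] * size
        # Forward-only fields are created lazily on first stash.
        self.forward_only: Dict[str, Dict[int, object]] = {}
        # Stand-in for the forward-stream publish_ready event.
        self.publish_ready = threading.Event()

    def publish(self, indices: Sequence[int], new_seq_lens: Sequence[int]) -> None:
        # Write the schedule-consumed buffer, then record the fence.
        for i, n in zip(indices, new_seq_lens):
            self.new_seq_lens_buf[i] = n
        self.publish_ready.set()

    def stash(self, indices: Sequence[int], **fields: Sequence[object]) -> None:
        # Forward-only data: no fence, same-thread ordering is enough here.
        for name, values in fields.items():
            slot = self.forward_only.setdefault(name, {})
            for i, v in zip(indices, values):
                slot[i] = v

    def resolve_seq_lens_cpu(
        self, sentinel: Sequence[int], timeout: Optional[float] = None
    ) -> List[int]:
        # The schedule side blocks on the fence, then decodes the negated indices.
        if not self.publish_ready.wait(timeout):
            raise TimeoutError("publish_ready was not recorded in time")
        return [self.new_seq_lens_buf[-s] for s in sentinel]


def on_verify_complete(
    fmap: ToyFutureMap, indices: Sequence[int], new_seq_lens: Sequence[int]
) -> None:
    fmap.publish(indices, new_seq_lens)


def forward_step(fmap: ToyFutureMap, indices: Sequence[int], callback) -> None:
    new_seq_lens = [13, 9]            # pretend verify accepted some tokens
    callback(new_seq_lens)            # fires before the draft-extend phase
    fmap.stash(indices, topk=[3, 5])  # forward-only fields written afterwards


def run_demo() -> Tuple[List[int], Dict[str, Dict[int, object]]]:
    fmap = ToyFutureMap(size=4)
    indices = [1, 2]                  # nonzero so negation is a usable sentinel
    sentinel = [-i for i in indices]  # mirrors seq_lens = -future_indices.indices
    callback = functools.partial(on_verify_complete, fmap, indices)
    worker = threading.Thread(target=forward_step, args=(fmap, indices, callback))
    worker.start()
    seq_lens_cpu = fmap.resolve_seq_lens_cpu(sentinel, timeout=5.0)
    worker.join()
    return seq_lens_cpu, fmap.forward_only
```

In this model the schedule side can resume as soon as `publish` fires, while the forward thread is still stashing its own fields, which is the overlap the `on_verify_complete` callback is meant to preserve. `functools.partial` binds the indices up front so the worker only has to pass `new_seq_lens`.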