[Disagg] Layer-pipelined KV transfer: overlap RDMA with GPU compute #23515
Open. michael7193 wants to merge 60 commits into sgl-project:main from michael7193:feature/layer-pipelined-kv-transfer
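The core idea in the title is to start shipping each layer's KV cache as soon as that layer's prefill finishes, instead of waiting for the whole forward pass. Below is a minimal producer/consumer sketch of that overlap. It uses plain Python threads in place of GPU streams and RDMA. `run_layer_pipelined_prefill`, `compute_layer`, and `send_group` are hypothetical names. The queue hand-off stands in for the per-layer CUDA events that the commit list refers to (layer_id/cuda_event). This is not the PR's code.

```python
import queue
import threading
from typing import List


def run_layer_pipelined_prefill(num_layers: int, group_size: int) -> List[List[int]]:
    """Hypothetical simulation of overlapping per-layer KV sends with compute.

    Returns the layer groups in the order they were "sent".
    """
    if num_layers <= 0 or group_size <= 0:
        raise ValueError("num_layers and group_size must be positive")

    ready: queue.Queue = queue.Queue()  # layer ids whose KV is ready to send
    sent_groups: List[List[int]] = []

    def compute_layer(layer_id: int) -> None:
        pass  # placeholder for one layer's GPU forward pass

    def send_group(layers: List[int]) -> None:
        sent_groups.append(list(layers))  # placeholder for one RDMA write

    def sender() -> None:
        pending: List[int] = []
        while True:
            layer_id = ready.get()
            if layer_id is None:  # compute finished: flush the last partial group
                if pending:
                    send_group(pending)
                return
            pending.append(layer_id)
            if len(pending) == group_size:  # a full group is ready: send it now
                send_group(pending)
                pending = []

    worker = threading.Thread(target=sender)
    worker.start()
    for layer_id in range(num_layers):
        compute_layer(layer_id)
        ready.put(layer_id)  # analogous to recording a CUDA event for this layer
    ready.put(None)  # sentinel: no more layers
    worker.join()
    return sent_groups
```

The sender consumes layer ids as soon as they are produced, so early groups can be transferred while later layers are still computing. A larger `group_size` means fewer, larger transfers, but the first transfer starts later.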
Commits (60):
8c76d49 feat(disagg): add layer-pipelined KV transfer for PD disaggregation
7e9921d style: apply black formatting to layer-pipelined KV transfer code
35986c3 feat(disagg): register pipelined KV transfer env vars in environ.py
b8d3b26 feat(disagg): adaptive pipeline group_size based on prompt length
ce6e669 feat(disagg): support different TP in layer-pipelined KV transfer
15da6d1 fix(disagg): disable layer-pipelined mode for Mamba/SWA/NSA models
144c7f5 feat(disagg): support Mamba/SWA/NSA state in layer-pipelined mode
02558c3 feat(disagg): add multimodal fallback guard for layer-pipelined mode
215ef2d Update python/sglang/srt/disaggregation/mooncake/conn.py
1069bb8 feat(disagg): add forward_split_prefill for Qwen3.5 layer-pipelined mode (UNIDY2002)
10c4d08 refactor(disagg): refine multimodal guard to allow VL models with for…
4f0a327 fix(models): add missing import and fix formatting in qwen3_5.py
ff8e1a7 test(disagg): add CI tests for layer-pipelined KV transfer
346c1b0 feat(disagg): universal guard + FalconH1 forward_split_prefill
79d504e fix(disagg): align pipelined result handler with normal path
4a4f0eb fix(falcon_h1): import LogitsProcessorOutput to fix ruff F821 lint error
955d100 style: fix formatting in falcon_h1.py
e47eed2 fix(disagg): correct import path for kv_to_page_indices in scheduler.py
823a013 fix(disagg): use BaseSWAKVPool for isinstance check to fix F821 lint
184404c style: use black instead of ruff for falcon_h1.py formatting
923f7ea fix(disagg): add missing imports for HybridLinearKVPool, BaseSWAKVPoo…
2e8424a docs: add MOONCAKE_DEVICE isolation note for pipelined + hicache
383b95d ci: re-trigger CI (flaky H20/H200/AMD/NPU infra failures)
cffad0c fix: adapt split_init to upstream ModelWorkerBatch removal (#25516)
3a792e8 style: format single-arg call to one line (black)
a2a719e refactor: address review feedback on layer-pipelined KV transfer
10ee204 fix(ci): use correct register_cuda_ci params for pipelined test
3c6582d merge: resolve scheduler.py conflict with upstream component refactor
8a20fb4 fix: remove unused time import in kv_pacing.py
f16464e Merge origin/main: resolve conflict in mooncake/conn.py
d0008bb Address ShangmingCai review comments
af99f46 Fix lint: remove unused imports, fix line length
1f6def3 cleanup: remove stale hisparse code and unrelated formatting changes
5e86a71 fix: correct chunked-prefill guard and add transfer metrics recording
90d372d fix: route pipelined path through process_batch_result for full side …
f7bf65c fix: disable overlap schedule when pipelining enabled + add sanity ch…
9408ded Resolve merge conflict: move layer_id/cuda_event to common TransferKV…
9b4cc8c Fix AttributeError: use profiler_manager._profile_batch_predicate
f76f39a Remove undefined set_prefill_run_batch_start_time call
6183bf7 Fix StateType.NSA -> StateType.DSA in _prepare_pipelined_state_indices
2bc9e69 feat(pipelined): replace step-function with continuous adaptive formula
13dd540 docs: improve pipeline formula comments with precise pipeline model
2a9ec22 fix(pipelined): swap min/max iters if user misconfigures
397d8ef fix: remove hardcoded default group_size=10 and fix NSA→DSA comment
f06bf37 fix(pipelined): set forward_context in forward_split_prefill
8eaa3df docs: clarify GROUP_SIZE default and fix formula comment in environ.py
05e5826 docs: move env var docs from docs/ to docs_new/ (fix lint)
375ae8d fix(pipelined): harden _get_pipeline_group_size for edge cases
34225bc fix(pipelined): add guards for DP attention, EAGLE, and input_embeds
da1c019 fix(pipelined): add EPLB guard to prevent lost routing statistics
e235b92 [Disagg] Add empty batch guard in _get_pipeline_group_size
de425de fix(pipelined): add transfer early-abort and configurable saturation …
1286256 fix(pipelined): guard against division-by-zero when SAT_MULTIPLIER <=…
4e012a6 fix(pipelined): handle transfer fallback edge cases
090ebc7 Merge origin/main into pipelined KV transfer PR
6707f10 fix(pipelined): finalize metadata after layer transfers
5f1dee2 fix(pipelined): materialize inputs before split prefill
0d9b628 fix(pipelined): align edge-case semantics
5cbeca7 fix(pipelined): use split-prefill capability guard
c59d819 fix(pipelined): address split prefill review notes
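Several commits make the pipeline group size adapt to prompt length (b8d3b26, 2bc9e69) and harden it against edge cases: misconfigured min/max iterations (2a9ec22), empty batches (e235b92), and a division by zero (1286256). The PR's actual formula is not shown on this page. The sketch below therefore uses an invented formula purely to illustrate those kinds of guards. `pipeline_group_size_sketch`, its parameters, and its defaults are hypothetical. They are not the PR's environment variables.

```python
import math
from typing import Optional, Sequence


def pipeline_group_size_sketch(
    prompt_lens: Sequence[int],
    num_layers: int,
    min_iters: int = 2,
    max_iters: int = 8,
    tokens_per_iter: int = 1024,
) -> Optional[int]:
    """Hypothetical adaptive group size; None means "use the normal transfer path".

    Invented formula: the number of transfer iterations grows with the total
    prompt length, clamped to [min_iters, max_iters], and group_size is the
    number of layers sent per iteration.
    """
    total_tokens = sum(prompt_lens)
    # Empty batch or no layers: nothing to overlap, so do not pipeline.
    if total_tokens <= 0 or num_layers <= 0:
        return None
    # Tolerate a misconfigured range by swapping the bounds.
    if min_iters > max_iters:
        min_iters, max_iters = max_iters, min_iters
    min_iters = max(1, min_iters)
    max_iters = max(min_iters, max_iters)
    # Clamp the divisor so a non-positive setting cannot raise ZeroDivisionError.
    tokens_per_iter = max(1, tokens_per_iter)
    # Longer prompts mean more compute per layer, so more (smaller) transfers.
    iters = math.ceil(total_tokens / tokens_per_iter)
    iters = min(max(iters, min_iters), max_iters, num_layers)
    # Round up so every layer belongs to some group.
    return math.ceil(num_layers / iters)
```

Each edge case maps to a defined result (a group size or `None` for fallback) instead of an exception. Backend and feature checks are kept separate here for clarity.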
Need to implement this for the common backend (as a `pass`) as well, and we might need to check in server args that the Mooncake backend is in use before enabling this feature.
Maybe also add a `send_layer` that raises `NotImplementedError` to the base backend.
Done in a2a719e:
- Added `send_layer()` with `NotImplementedError` to `BaseKVSender`, so unsupported backends fail loudly instead of silently breaking.
- Added a backend check to `_get_pipeline_group_size`: pipelining now only activates for the `MOONCAKE` and `FAKE` backends. Other backends (nixl, mori, ascend) safely fall back to the normal transfer path.
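The reply above restricts pipelining to specific transfer backends and lets the others fall back. The commit list adds similar fallback guards for DP attention, EAGLE, input_embeds, EPLB, and models without split-prefill support. Here is a minimal sketch of such a gate, assuming each guard simply means "use the normal path". `TransferBackend`, `PipelineConfig`, and `use_layer_pipelining` are hypothetical names and are not sglang's server args.

```python
from dataclasses import dataclass
from enum import Enum


class TransferBackend(Enum):
    # Hypothetical mirror of the backend names mentioned in the review thread.
    MOONCAKE = "mooncake"
    FAKE = "fake"
    NIXL = "nixl"
    MORI = "mori"
    ASCEND = "ascend"


# Per the review reply, only these backends take the pipelined path.
PIPELINED_BACKENDS = frozenset({TransferBackend.MOONCAKE, TransferBackend.FAKE})


@dataclass
class PipelineConfig:
    # Illustrative flags only; real configuration is named differently.
    backend: TransferBackend
    model_supports_split_prefill: bool = True
    enable_dp_attention: bool = False
    speculative_eagle: bool = False
    has_input_embeds: bool = False
    enable_eplb: bool = False


def use_layer_pipelining(cfg: PipelineConfig) -> bool:
    """Return True only if every guard passes; otherwise use the normal path."""
    if cfg.backend not in PIPELINED_BACKENDS:
        return False  # nixl / mori / ascend fall back safely
    if not cfg.model_supports_split_prefill:
        return False  # the model has no split-prefill forward
    # Feature combinations that the commit history guards against.
    blockers = (
        cfg.enable_dp_attention,
        cfg.speculative_eagle,
        cfg.has_input_embeds,
        cfg.enable_eplb,
    )
    return not any(blockers)
```

Keeping the gate in one place means a new unsupported combination only needs one extra check. Everything that fails the gate still works, just without overlap.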
Done. `BaseKVSender.send_layer()` now raises `NotImplementedError` by default. See a2a719e.
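The pattern agreed on in this thread is a base-class `send_layer` that raises `NotImplementedError` by default. The following sketch shows that pattern. `KVSenderSketch` and its subclasses are hypothetical stand-ins, and the `send_layer(layer_id, kv_indices)` signature is an assumption, not the real `BaseKVSender` interface.

```python
from typing import List


class KVSenderSketch:
    """Hypothetical stand-in for a KV sender base class."""

    def send(self, kv_indices: List[int]) -> None:
        raise NotImplementedError

    def send_layer(self, layer_id: int, kv_indices: List[int]) -> None:
        # Default: per-layer transfer is unsupported. Raising here makes a
        # backend that never implemented it fail loudly instead of silently
        # skipping layers.
        raise NotImplementedError(
            f"{type(self).__name__} does not support layer-pipelined transfer"
        )


class PipelinedSenderSketch(KVSenderSketch):
    """Backend that supports per-layer sends (records them for illustration)."""

    def __init__(self) -> None:
        self.sent: List[tuple] = []

    def send(self, kv_indices: List[int]) -> None:
        self.sent.append(("all", tuple(kv_indices)))

    def send_layer(self, layer_id: int, kv_indices: List[int]) -> None:
        self.sent.append((layer_id, tuple(kv_indices)))


class LegacySenderSketch(KVSenderSketch):
    """Backend that only implements whole-request transfer."""

    def send(self, kv_indices: List[int]) -> None:
        pass  # placeholder for a whole-request transfer
```

The first comment suggested a no-op `pass` for the common backend. That would avoid crashes, but a mistakenly enabled backend could then drop layers silently. Raising by default, combined with the backend gate, surfaces that mistake immediately.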