Expose Model Parallelism Information #16860
Closed
JD-ETH wants to merge 11 commits into sgl-project:main from
Collaborator
|
/tag-and-rerun-ci |
648bd9d to
14e5c06
Compare
3 tasks
Contributor
Author
|
/rerun-failed-ci |
Collaborator
|
Collaborator
|
/rerun-failed-ci |
Contributor
Author
|
add to run suite |
slin1237
requested changes
Jan 19, 2026
Conflicts resolved:
- model_runner.py: keep both RankParallelismConfig and use_symmetric_memory imports
- scheduler.py: add parallelism_config_info to get_init_info() (upstream refactored init info into this method)
- http_server.py: add parallelism_config_info to _GlobalState in _setup_and_run_http_server() (upstream extracted server startup into this function)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
amysaq2023
reviewed
Mar 12, 2026
Contributor
Author
|
refactored to new PR without all the piping: #20907 |
2 tasks
Motivation
We want to enable model shards to be initialized outside of sglang in a way that's identical to sglang's ModelLoader.
The external initialization workflow will look something like this:
The primary use case is to enable train -> inference weight transfer. With 14997, we already have the transfer engine and weight registration, but a transfer can only happen between weights of identical shape. One option is to instantiate the model shards outside sglang, register the weights, and send them to sglang via the existing registrations.
I will, however, leave draft_tp for future work. The test currently verifies TP, EP, and MoE with DP attention; the CI test covers only the simplest TP case.
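To illustrate why the identical-shape requirement matters, here is a minimal sketch of how an external initializer might work out the shape of its local shard before registering it. It assumes an even, contiguous split of the output dimension across TP ranks, which is a common tensor-parallel layout but is not confirmed by this PR for every layer in sglang. The function names are hypothetical and are not part of sglang.

```python
from typing import Tuple


def column_parallel_shard_shape(full_shape: Tuple[int, int], tp_size: int) -> Tuple[int, int]:
    """Per-rank shape when dim 0 (output features) is split evenly across TP ranks.

    Assumption: an even, contiguous split; real layers may shard differently.
    """
    out_features, in_features = full_shape
    if tp_size <= 0:
        raise ValueError("tp_size must be positive")
    if out_features % tp_size != 0:
        raise ValueError("output dimension must be divisible by tp_size")
    return (out_features // tp_size, in_features)


def shard_rows(out_features: int, tp_size: int, tp_rank: int) -> range:
    """Rows of the full weight owned by tp_rank under the same assumed split."""
    if not 0 <= tp_rank < tp_size:
        raise ValueError("tp_rank must satisfy 0 <= tp_rank < tp_size")
    if out_features % tp_size != 0:
        raise ValueError("output dimension must be divisible by tp_size")
    rows_per_rank = out_features // tp_size
    start = tp_rank * rows_per_rank
    return range(start, start + rows_per_rank)
```

If the sender and the receiver disagree on tp_size or tp_rank, the shapes computed this way differ and the registered buffers no longer line up, which is the situation the exposed parallelism information is meant to avoid.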
Modifications
Whenever the transfer engine is enabled, we additionally expose the full parallelism configuration (tp/pp/ep/attn_tp/attn_dp) through the HTTP API. The info is propagated to the _global_states at initialization time, just as the RDMA weight registration is, so there is no performance impact.
A small fix was made to pass dp_parallel_controller's scheduler_infos back to the global engine instance. This allows remote_weight_info APIs to work with dp>1.
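As a rough sketch of how a client might consume the exposed configuration, the snippet below parses one rank's entry into a typed record and checks that each rank lies within its group size. The field names are taken from the sample response in this description. The class name ParallelismInfo and the helper parse_parallelism_info are hypothetical, and the assumption that a single rank's entry is one flat JSON object may not match the full response format.

```python
import json
from dataclasses import dataclass, fields

# Sample payload copied from the response shown in this PR description.
SAMPLE = (
    '{"rank":0,"tp_size":1,"tp_rank":0,"pp_size":1,"pp_rank":0,'
    '"ep_size":1,"ep_rank":0,"attn_tp_size":1,"attn_tp_rank":0,'
    '"attn_dp_size":1,"attn_dp_rank":0,"world_size":1,"global_rank":0}'
)


@dataclass(frozen=True)
class ParallelismInfo:  # hypothetical name, not the class used inside sglang
    rank: int
    tp_size: int
    tp_rank: int
    pp_size: int
    pp_rank: int
    ep_size: int
    ep_rank: int
    attn_tp_size: int
    attn_tp_rank: int
    attn_dp_size: int
    attn_dp_rank: int
    world_size: int
    global_rank: int

    def __post_init__(self) -> None:
        # Every rank must index into its own parallel group.
        for prefix in ("tp", "pp", "ep", "attn_tp", "attn_dp"):
            size = getattr(self, f"{prefix}_size")
            rank = getattr(self, f"{prefix}_rank")
            if not 0 <= rank < size:
                raise ValueError(f"{prefix}_rank={rank} is outside [0, {size})")


def parse_parallelism_info(payload: str) -> ParallelismInfo:
    """Parse one rank's JSON object, ignoring keys this sketch does not know."""
    data = json.loads(payload)
    known = {f.name for f in fields(ParallelismInfo)}
    return ParallelismInfo(**{k: v for k, v in data.items() if k in known})


info = parse_parallelism_info(SAMPLE)
```

Unknown keys are ignored so that the sketch keeps working if the response grows extra fields, while a missing key raises a TypeError from the dataclass constructor.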
Accuracy Tests
python -m sglang.launch_server --model-path qwen/qwen2.5-0.5b-instruct --remote-instance-weight-loader-start-seed-via-transfer-engine
curl http://localhost:30000/get_parallelism_config
returns:
{"rank":0,"tp_size":1,"tp_rank":0,"pp_size":1,"pp_rank":0,"ep_size":1,"ep_rank":0,"attn_tp_size":1,"attn_tp_rank":0,"attn_dp_size":1,"attn_dp_rank":0,"world_size":1,"global_rank":0}
etc.
The test test_parallelism_context_integration.py shows the intended use case and verifies that the model parameters match.
Checklist
Review Process
/tag-run-ci-label, /rerun-failed-ci, /tag-and-rerun-ci