Expose Model Parallelism Information #16860

Closed
JD-ETH wants to merge 11 commits into sgl-project:main from JD-ETH:feature/parallelism_context_for_model_replica

@JD-ETH
Contributor

@JD-ETH JD-ETH commented Jan 10, 2026

Motivation

We want to enable model shards to be initialized outside of sglang in a way that's identical to sglang's ModelLoader.
The external initialization workflow will look something like this:

sglang.srt.server_args._global_server_args = server_args
model_parallelism_info = engine.get_parallelism_config(rank)
with ParallelismContext(RankParallelismConfig.from_dict(model_parallelism_info)):
    model = get_model(
        model_config=model_config,
        load_config=load_config,
        device_config=device_config,
    )

The primary use case is to enable train -> inference weight transfer. With 14997, we already have a transfer engine and weight registration, but transfers can only happen between weights of identical shape. One option is to instantiate the model shards externally, register the weights, and send them to sglang via the existing registrations.
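To illustrate why identical shapes matter, the sketch below computes the per-rank shape of a weight that is split evenly along one dimension. It is an illustration only: `shard_shape` is a hypothetical helper, the tensor dimensions are made up, and real models shard different layers along different dimensions.

```python
from typing import Tuple


def shard_shape(full_shape: Tuple[int, ...], tp_size: int, shard_dim: int) -> Tuple[int, ...]:
    """Shape of one rank's slice when `shard_dim` is split evenly across `tp_size` ranks."""
    dim = full_shape[shard_dim]
    if dim % tp_size != 0:
        raise ValueError(f"dimension {dim} is not divisible by tp_size={tp_size}")
    shape = list(full_shape)
    shape[shard_dim] = dim // tp_size
    return tuple(shape)


full = (4096, 1024)  # made-up weight shape (out_features, in_features)

# A sender sharded with tp=2 and a receiver sharded with tp=4 hold
# differently shaped slices, so a shape-checked transfer cannot pair them.
sender_shard = shard_shape(full, tp_size=2, shard_dim=0)
receiver_shard = shard_shape(full, tp_size=4, shard_dim=0)
shapes_match = sender_shard == receiver_shard

# Building the sender's shard with the receiver's layout makes them line up.
replica_shard = shard_shape(full, tp_size=4, shard_dim=0)
replica_matches = replica_shard == receiver_shard
```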

I will, however, leave draft_tp for future work. The test currently verifies TP, EP, and MoE with DP attention; the CI test covers only the simplest TP case.

Modifications

Whenever the transfer engine is enabled, we additionally expose the full parallelism configuration (tp/pp/ep/attn_tp/attn_dp) through the HTTP API. The info is propagated to the _global_states at initialization time, just as the RDMA weight registration is, so there is no performance impact.

A small fix was made to pass dp_parallel_controller's scheduler_infos back to the global engine instance. This allows remote_weight_info APIs to work with dp>1.

Accuracy Tests

python -m sglang.launch_server --model-path qwen/qwen2.5-0.5b-instruct --remote-instance-weight-loader-start-seed-via-transfer-engine
curl http://localhost:30000/get_parallelism_config
returns:
{"rank":0,"tp_size":1,"tp_rank":0,"pp_size":1,"pp_rank":0,"ep_size":1,"ep_rank":0,"attn_tp_size":1,"attn_tp_rank":0,"attn_dp_size":1,"attn_dp_rank":0,"world_size":1,"global_rank":0} etc
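A client could read this response along the following lines. This is a sketch under assumptions: it treats the response as a single JSON object with the fields shown above (the trailing "etc" suggests the real output may contain more), and `ParallelismInfo`, `parse_parallelism_info`, and `fetch_parallelism_info` are hypothetical names rather than part of sglang.

```python
import json
from dataclasses import dataclass, fields
from urllib.request import urlopen


@dataclass(frozen=True)
class ParallelismInfo:
    """Hypothetical container mirroring the fields in the sample response."""
    rank: int
    tp_size: int
    tp_rank: int
    pp_size: int
    pp_rank: int
    ep_size: int
    ep_rank: int
    attn_tp_size: int
    attn_tp_rank: int
    attn_dp_size: int
    attn_dp_rank: int
    world_size: int
    global_rank: int


def parse_parallelism_info(payload: str) -> ParallelismInfo:
    """Parse one JSON object; unknown keys are ignored, missing keys raise."""
    data = json.loads(payload)
    names = {f.name for f in fields(ParallelismInfo)}
    missing = names - set(data)
    if missing:
        raise KeyError(f"missing fields: {sorted(missing)}")
    return ParallelismInfo(**{name: data[name] for name in names})


def fetch_parallelism_info(base_url: str = "http://localhost:30000") -> ParallelismInfo:
    """Query a running server; requires the server from the command above."""
    with urlopen(f"{base_url}/get_parallelism_config") as resp:
        return parse_parallelism_info(resp.read().decode("utf-8"))


# Offline check against the sample response from this PR description.
sample = (
    '{"rank":0,"tp_size":1,"tp_rank":0,"pp_size":1,"pp_rank":0,'
    '"ep_size":1,"ep_rank":0,"attn_tp_size":1,"attn_tp_rank":0,'
    '"attn_dp_size":1,"attn_dp_rank":0,"world_size":1,"global_rank":0}'
)
info = parse_parallelism_info(sample)
```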

test_parallelism_context_integration.py shows the intended use case and verifies that the model parameters match.

Checklist

Review Process

  1. Ping Merge Oncalls to start the PR flow. See the PR Merge Process.
  2. Get approvals from CODEOWNERS and other reviewers.
  3. Trigger CI tests with comments or contact authorized users to do so.
    • /tag-run-ci-label, /rerun-failed-ci, /tag-and-rerun-ci
  4. After green CI and required approvals, ask Merge Oncalls to merge.

@gemini-code-assist
Contributor

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

@stmatengss
Collaborator

/tag-and-rerun-ci

@JD-ETH JD-ETH force-pushed the feature/parallelism_context_for_model_replica branch from 648bd9d to 14e5c06 on January 18, 2026 18:05
@JD-ETH
Contributor Author

JD-ETH commented Jan 19, 2026

/rerun-failed-ci

@zhaochenyang20
Collaborator

  1. Add the HTTP(S) commands.
  2. Update the sglang documentation.

@zhaochenyang20
Collaborator

/rerun-failed-ci

@JD-ETH
Contributor Author

JD-ETH commented Jan 19, 2026

add to run suite

JD-ETH and others added 4 commits January 21, 2026 17:02
Conflicts resolved:
- model_runner.py: keep both RankParallelismConfig and
  use_symmetric_memory imports
- scheduler.py: add parallelism_config_info to get_init_info()
  (upstream refactored init info into this method)
- http_server.py: add parallelism_config_info to _GlobalState
  in _setup_and_run_http_server() (upstream extracted server
  startup into this function)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@JD-ETH
Contributor Author

JD-ETH commented Mar 19, 2026

refactored to new PR without all the piping: #20907
