Support PP for zmq_to_scheduler #15312
Pull request overview
This PR enables Pipeline Parallelism (PP) support for the zmq_to_scheduler encoder transfer backend in encoder-decoder disaggregated inference scenarios. The key insight is that MM (multimodal) processing only needs to occur at the first PP stage (pp_rank 0), with subsequent stages receiving pre-processed requests from upstream stages.
Key changes:
- Restricted MM receiver initialization and processing to pp_rank 0 only
- Changed synchronization scope from world_size (all ranks) to tp_size (TP ranks within a PP stage)
- Removed the embedding_ports mechanism in favor of direct encoder-to-scheduler communication per TP group
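The first two changes can be illustrated with a minimal, framework-free sketch. It is not the code from this PR: `RankInfo`, `needs_mm_receiver`, and `sync_group_ranks` are hypothetical names, and the rank layout (global rank = `pp_rank * tp_size + tp_rank`) is an assumption made only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class RankInfo:
    """Hypothetical description of one worker's position in the TP x PP grid."""

    tp_rank: int
    tp_size: int
    pp_rank: int
    pp_size: int

    @property
    def world_size(self) -> int:
        # Total number of workers across all PP stages.
        return self.tp_size * self.pp_size


def needs_mm_receiver(info: RankInfo) -> bool:
    """Only the first PP stage receives multimodal embeddings from the encoder.

    Later stages are assumed to get already-processed requests from the
    stage upstream of them, so they skip the receiver entirely.
    """
    return info.pp_rank == 0


def sync_group_ranks(info: RankInfo) -> List[int]:
    """Global ranks that must synchronize with this worker.

    The scope is the TP group inside the worker's own PP stage (tp_size
    ranks), not every rank in the job (world_size ranks). Assumes TP is
    the innermost dimension of the rank layout.
    """
    base = info.pp_rank * info.tp_size
    return list(range(base, base + info.tp_size))
```

With `tp_size=2` and `pp_size=2`, the workers on stage 0 set up a receiver and synchronize over ranks 0 and 1 only, while the workers on stage 1 skip the receiver altogether.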
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| python/sglang/srt/server_args.py | Removed validation that prevented PP when using zmq_to_scheduler |
| python/sglang/srt/models/qwen2_5_vl.py | Simplified weight loading to skip missing weights consistently across modes |
| python/sglang/srt/managers/io_struct.py | Removed embedding_ports field from request input structures |
| python/sglang/srt/managers/tokenizer_manager.py | Removed embedding_ports parameter from tokenized object creation |
| python/sglang/srt/managers/scheduler.py | Added pp_rank checks for MM receiver initialization and processing; added tp_group parameter |
| python/sglang/srt/disaggregation/encode_receiver.py | Changed synchronization to TP-only scope, improved device placement, removed embedding_port logic |
/tag-and-rerun-ci
ShangmingCai
left a comment
LGTM, let's wait for the CI.
/rerun-failed-ci
ShangmingCai
left a comment
LGTM. The scheduler changes are safe because they only touch EPD-related code.
Full green CI: https://github.com/sgl-project/sglang/actions/runs/20449993152/job/58772672138?pr=15312

Motivation
Follow-up PR to #12263 that adds PP support for zmq_to_scheduler.