Support PP for zmq_to_scheduler #15312

Merged
ShangmingCai merged 6 commits into sgl-project:main from gty111:fix_pp_scheduler
Dec 23, 2025

@gty111
Contributor

@gty111 gty111 commented Dec 17, 2025

Motivation

Follow-up to #12263: support pipeline parallelism (PP) for zmq_to_scheduler.

Modifications

Accuracy Tests

Benchmarking and Profiling

Checklist

Copilot AI review requested due to automatic review settings December 17, 2025 07:36
Contributor

Copilot AI left a comment

Pull request overview

This PR enables Pipeline Parallelism (PP) support for the zmq_to_scheduler encoder transfer backend in encoder-decoder disaggregated inference scenarios. The key insight is that MM (multimodal) processing only needs to occur at the first PP stage (pp_rank 0), with subsequent stages receiving pre-processed requests from upstream stages.

Key changes:

  • Restricted MM receiver initialization and processing to pp_rank 0 only
  • Changed synchronization scope from world_size (all ranks) to tp_size (TP ranks within a PP stage)
  • Removed the embedding_ports mechanism in favor of direct encoder-to-scheduler communication per TP group
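
The gating described above can be illustrated with a minimal sketch. It is not the code from this PR: `RankInfo`, `should_init_mm_receiver`, `sync_group_size`, and `receiver_ranks` are hypothetical names, and the sketch only assumes that each rank knows its own pp_rank, tp_rank, pp_size, and tp_size.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class RankInfo:
    """Hypothetical rank descriptor; not a type from the sglang codebase."""

    pp_rank: int
    tp_rank: int
    pp_size: int
    tp_size: int


def should_init_mm_receiver(info: RankInfo) -> bool:
    # Only the first pipeline stage handles multimodal inputs;
    # later stages get already-processed requests from upstream.
    return info.pp_rank == 0


def sync_group_size(info: RankInfo) -> int:
    # Synchronization spans the TP ranks of one PP stage (tp_size),
    # not every rank in the job (pp_size * tp_size).
    return info.tp_size


def receiver_ranks(pp_size: int, tp_size: int) -> List[Tuple[int, int]]:
    # Enumerate the (pp_rank, tp_rank) pairs that would host a receiver.
    return [
        (pp, tp)
        for pp in range(pp_size)
        for tp in range(tp_size)
        if should_init_mm_receiver(RankInfo(pp, tp, pp_size, tp_size))
    ]
```

Under these assumptions, a job with pp_size=2 and tp_size=4 has eight ranks, of which only the four in the first stage would host a receiver, and each synchronization step would involve four ranks rather than eight.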

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| python/sglang/srt/server_args.py | Removed validation that prevented PP when using zmq_to_scheduler |
| python/sglang/srt/models/qwen2_5_vl.py | Simplified weight loading to skip missing weights consistently across modes |
| python/sglang/srt/managers/io_struct.py | Removed embedding_ports field from request input structures |
| python/sglang/srt/managers/tokenizer_manager.py | Removed embedding_ports parameter from tokenized object creation |
| python/sglang/srt/managers/scheduler.py | Added pp_rank checks for MM receiver initialization and processing; added tp_group parameter |
| python/sglang/srt/disaggregation/encode_receiver.py | Changed synchronization to TP-only scope, improved device placement, removed embedding_port logic |

@ShangmingCai
Collaborator

/tag-and-rerun-ci

Collaborator

@ShangmingCai ShangmingCai left a comment

LGTM, let's wait for the CI.

@ShangmingCai
Collaborator

/rerun-failed-ci

Collaborator

@ShangmingCai ShangmingCai left a comment

LGTM. The modification of the scheduler part is safe since it is all EPD-related code.

Full green CI: https://github.com/sgl-project/sglang/actions/runs/20449993152/job/58772672138?pr=15312

@ShangmingCai ShangmingCai merged commit fa29669 into sgl-project:main Dec 23, 2025
150 of 155 checks passed
jiaming1130 pushed a commit to zhuyijie88/sglang that referenced this pull request Dec 25, 2025
YChange01 pushed a commit to YChange01/sglang that referenced this pull request Jan 13, 2026
3 participants