[parallel state Refactor 3/n] Unify communicator interface #21098

Open: DarkSharpness wants to merge 6 commits into sgl-project:main from DarkSharpness:refactor_parallel_state

Conversation

@DarkSharpness (Collaborator)

Motivation

#20866, #20871

Modifications

This PR refactors the distributed communication stack in srt to make communicator behavior explicit, composable, and easier to extend across backends.

The main change is introducing a unified BaseCommunicator abstraction and moving backend selection logic out of the large ad hoc branches in parallel_state.py into a reusable CommunicatorImpl dispatcher. This keeps the platform-specific implementations local to each communicator while preserving the existing runtime behavior at the group level.

Previously, communicator behavior was spread across backend-specific helpers and parallel_state.py, with different backends exposing slightly different methods such as custom_all_reduce, backend-specific should_* predicates, or implicit in-place assumptions.

With BaseCommunicator, each backend now exposes the same public contract:

  • can_all_reduce() reports whether a communicator can handle the input, and if so whether it supports in-place, out-of-place, or both modes.
  • all_reduce(..., inplace=...) is the single all-reduce entry point for all backends.
  • should_use_custom_op() tells the dispatcher whether the communicator should run through registered custom ops for torch.compile / piecewise-cuda-graph-friendly execution.
  • graph_capture_context() lets each communicator define any graph-capture-specific setup in one place.

This makes communicator selection data-driven instead of backend-name-driven, and removes a large amount of backend-specific branching from the group coordinator.

Accuracy Tests

Benchmarking and Profiling

Checklist

Review Process

  1. Ping Merge Oncalls to start the PR flow. See the PR Merge Process.
  2. Get approvals from CODEOWNERS and other reviewers.
  3. Trigger CI tests with comments or contact authorized users to do so.
    • /tag-run-ci-label, /rerun-failed-ci, /tag-and-rerun-ci
  4. After green CI and required approvals, ask Merge Oncalls to merge.

@gemini-code-assist (Contributor)

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a significant refactoring of the distributed communication stack to enhance modularity, extensibility, and maintainability. By establishing a unified BaseCommunicator interface and a CommunicatorImpl dispatcher, it abstracts away backend-specific details, allowing for more consistent and data-driven selection of communication methods. This change simplifies the core parallel_state logic, making it easier to integrate new communication backends and manage existing ones, ultimately improving the overall robustness and flexibility of the distributed system.

Highlights

  • Unified Communicator Interface: Introduced a new BaseCommunicator abstract class and an AllReduceMode enum to define a standardized contract for distributed communication operations. This centralizes common logic and makes backend-specific implementations more explicit and consistent.
  • Centralized Dispatching Logic: Implemented a CommunicatorImpl class that acts as a dispatcher, selecting the appropriate backend communicator based on the operation and tensor properties. This moves complex selection logic out of parallel_state.py and into a reusable, data-driven component.
  • Refactored Existing Communicators: Updated CustomAllreduce, CustomAllReduceV2, HpuCommunicator, PyMscclppCommunicator, PyNcclCommunicator, QuickAllReduce, and TorchSymmMemCommunicator to inherit from BaseCommunicator and conform to its new interface, including can_all_reduce and a unified all_reduce method.
  • New Communicator Implementations: Added TorchDefaultCommunicator for standard PyTorch distributed operations, AiterCustomAllReduce as a wrapper for Aiter's custom all-reduce, and PyNcclSymmMemCommunicator for NCCL with symmetric memory allocation.
  • Simplified Parallel State Management: The GroupCoordinator in parallel_state.py was significantly refactored to leverage the new CommunicatorImpl, reducing its complexity and making it easier to manage various communication backends. Direct calls to specific communicators were replaced with calls to the unified self.impl interface.
  • Removed Deprecated Components: The NpuCommunicator and all_reduce_utils.py files were removed, streamlining the codebase and consolidating functionality within the new architecture.



@gemini-code-assist (Contributor) left a comment


Code Review

This pull request introduces a significant and well-executed refactoring of the distributed communication stack. The new BaseCommunicator abstraction and CommunicatorImpl dispatcher create a unified, data-driven, and more extensible system for handling different communication backends. The changes are clean, modular, and align perfectly with the stated goals. I have one suggestion regarding a potential issue with torch.compile compatibility in the new custom op registration.

@DarkSharpness (Collaborator, Author)

/gemini review

@gemini-code-assist (Contributor)

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

@DarkSharpness (Collaborator, Author)

DarkSharpness commented Apr 3, 2026

/tag-and-rerun-ci try again

@github-actions github-actions bot added the run-ci label Apr 3, 2026
_GROUPS: Dict[str, Callable[[], Optional[CommunicatorImpl]]] = {}


def _register_group(group: CommunicatorImpl) -> None:
Collaborator:

I want to know why we use weakref here?

if mode is not None and _is_mode_supported(mode, inplace):
    if not comm.should_use_custom_op():
        return comm.all_reduce(input_, inplace=inplace)
    use_inplace = _can_use_inplace(mode) if inplace is None else inplace
Collaborator:

nits: I’m wondering whether users can tell that we’re using inplace operations here, and whether there’s a risk of it being misused.

Collaborator (Author):

If inplace is specified (not None), it must be strictly followed. Otherwise, it depends on the backend (it will choose the fastest possible implementation).
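The semantics described in this reply reduce to a small resolution rule, sketched below with hypothetical names: an explicit inplace flag is honored strictly, while inplace=None defers to whatever the backend reports as fastest.

```python
# Hedged sketch of the inplace-resolution rule described above.
def resolve_inplace(backend_prefers_inplace: bool, inplace=None) -> bool:
    if inplace is not None:
        return inplace  # caller's explicit choice is strictly followed
    return backend_prefers_inplace  # backend picks its fastest mode
```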

@BBuf (Collaborator) left a comment

This is a large change, so before merging we should make sure all CI checks pass.
