refactor context parallel state #17213
Conversation
Summary of Changes

This pull request refactors the context parallel state to support attention and MoE context parallelism. It introduces new group coordinators and updates initialization functions to manage these parallel processing capabilities. The changes also include modifications to scheduler processes and the addition of new server arguments for configuration.
Code Review
This pull request introduces context parallelism for attention and MoE layers, which is a significant refactoring. The changes are extensive, touching many files to plumb through the new configuration and rank information. The core logic for creating the new parallel groups has been added.
My review focuses on the correctness of the new parallelism group initialization. While the logic for attention context parallelism (_ATTN_CP) and other groups seems correct, I've found a critical issue in the initialization of the MoE context parallel group (_MOE_CP). The current implementation appears to create groups across pipeline stages instead of within a single stage, which is incorrect for context parallelism.
I've provided a detailed comment with a suggested fix for this issue. Please address this to ensure the correctness of MoE context parallelism.
Also, there is a small typo in the pull request title: "refatcor" should be "refactor".
```python
for i in range(num_tensor_model_parallel_groups):
    for j in range(moe_tp_size * moe_ep_size):
        st = i * tensor_model_parallel_size + j
        en = (i + 1) * tensor_model_parallel_size + j
        ranks = list(range(st, en, moe_tp_size * moe_ep_size))
        group_ranks.append(ranks)
```
The logic for creating the _MOE_CP (MoE Context Parallel) group appears to be incorrect. It seems to be creating groups across pipeline parallel stages, similar to how pipeline parallel groups are formed, rather than creating context parallel groups within a single pipeline stage.
A context parallel group for MoE should group ranks that handle different parts of the context but the same expert and tensor slice. The current implementation:
```python
for i in range(num_tensor_model_parallel_groups):
    for j in range(moe_tp_size * moe_ep_size):
        st = i * tensor_model_parallel_size + j
        en = (i + 1) * tensor_model_parallel_size + j
        ranks = list(range(st, en, moe_tp_size * moe_ep_size))
        group_ranks.append(ranks)
```

Here, `i` iterates through pipeline stages, and `en` points to a rank in the next pipeline stage, which is incorrect for a context parallel group.
A correct implementation should iterate within a single pipeline stage. Assuming a rank layout of (cp, ep, tp) within a tensor parallel group, the logic should be something like this:
```python
for i in range(num_tensor_model_parallel_groups):
    for j in range(moe_ep_size):
        for k in range(moe_tp_size):
            # Assuming a rank layout of (cp, ep, tp)
            base = i * tensor_model_parallel_size + j * moe_tp_size + k
            stride = moe_ep_size * moe_tp_size
            ranks = [base + c * stride for c in range(moe_cp_size)]
            group_ranks.append(ranks)
```
Please make sure this PR can pass this unit test.

Both tests already passed. Thanks!
Force-pushed 3d7a871 to 71be3c3.
/tag-and-rerun-ci
Co-authored-by: Shunkang <182541032+Shunkangz@users.noreply.github.com>
PR #17213 added attn_cp_rank and moe_dp_rank parameters to run_scheduler_process, but the gRPC scheduler_launcher was not updated, causing a startup failure due to missing arguments.
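The breakage described above is a generic Python pitfall when new required positional parameters are added to a function without updating every call site. A minimal sketch of the failure mode and one mitigation; the function names and signatures here are illustrative stand-ins, not the actual scheduler code (only the parameter names attn_cp_rank and moe_dp_rank come from the comment above):

```python
# Illustrative stand-in for a scheduler entry point that gained new
# required parameters (attn_cp_rank, moe_dp_rank).
def run_scheduler_process(gpu_id, tp_rank, attn_cp_rank, moe_dp_rank):
    return (gpu_id, tp_rank, attn_cp_rank, moe_dp_rank)

# An un-updated launcher still using the old call shape fails at startup:
try:
    run_scheduler_process(0, 0)
except TypeError as exc:
    print("startup failure:", exc)

# Giving the new parameters defaults keeps old call sites working
# until every launcher is migrated:
def run_scheduler_process_compat(gpu_id, tp_rank, attn_cp_rank=0, moe_dp_rank=0):
    return (gpu_id, tp_rank, attn_cp_rank, moe_dp_rank)

print(run_scheduler_process_compat(0, 0))  # (0, 0, 0, 0)
```

Whether defaults are acceptable here depends on whether a rank of 0 is a safe fallback; updating the gRPC launcher to pass the new arguments explicitly is the more robust fix.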
This PR disables PP+CP; will this be supported in the future?
Motivation
Context parallelism is essential for long-context LLM inference. It splits a long input sequence across multiple GPUs so attention can be computed in parallel, drastically reducing latency and enabling practical million-token context windows.
Previously, DeepSeek-V3.2 already supported CP and could use it together with DP. We aim to support the combination of CP, DP, and TP, and make it easier to apply to other models. To achieve this, we first refactored the original implementation.
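The core idea of splitting the sequence can be sketched in a few lines. This is a contiguous split for illustration only; real implementations often use balanced or zig-zag partitioning (e.g. for ring attention), and `shard_sequence` is a hypothetical helper, not part of this PR:

```python
def shard_sequence(tokens, cp_size, cp_rank):
    """Contiguous split of a token sequence across cp_size context parallel ranks."""
    chunk = (len(tokens) + cp_size - 1) // cp_size  # ceil division
    return tokens[cp_rank * chunk : (cp_rank + 1) * chunk]

# A 10-token sequence split across 4 CP ranks:
shards = [shard_sequence(list(range(10)), 4, r) for r in range(4)]
print(shards)  # [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]
```

Each CP rank then computes attention over its shard, exchanging keys/values (or partial attention results) with the other ranks in its CP group.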
Modifications
Accuracy Tests
Benchmarking and Profiling
Checklist
Review Process
/tag-run-ci-label,/rerun-failed-ci,/tag-and-rerun-ci