fix: support PP2+CP8+TP8 (PP with context parallelism) #19548

Merged
whybeyoung merged 7 commits into sgl-project:main from whybeyoung:fix-cp8pp2tp8
Mar 16, 2026

Conversation

@whybeyoung
Collaborator

@whybeyoung whybeyoung commented Feb 28, 2026

As #19504 (comment) mentioned:
I fixed it on H20 * 8 * 2.

  • scheduler_pp_mixin: only TP0+CP0 rank does pyobj send/recv to next PP stage; after TP broadcast, add CP broadcast so all CP ranks get data.
  • server_args: set attn_cp_size=tp_size in NSA prefill CP path; allow PP with CP when enable_nsa_prefill_context_parallel is set.

Ref: 98e9ecb (fix pp new cp)

CC @ShangmingCai @Fridge003 @xu-yfei

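The scheduler_pp_mixin change above can be sketched as a pure-Python simulation of the recv-then-broadcast pattern: only the (TP0, CP0) rank receives the pyobj from the previous PP stage, it is broadcast along TP, and then (the fix) also along CP. Function and variable names here are illustrative assumptions, not the actual sglang scheduler code.

```python
# Hypothetical sketch: simulate the (tp, cp) rank grid with a dict
# instead of real torch.distributed process groups.

def distribute_pp_payload(payload, tp_size, cp_size):
    """Deliver a PP-stage pyobj to every (tp, cp) rank.

    Step 1: only rank (tp=0, cp=0) holds the payload after the PP recv.
    Step 2: broadcast from tp=0 within the cp=0 group (TP broadcast).
    Step 3: broadcast from cp=0 within each TP group (CP broadcast --
            without this, cp > 0 ranks never see the payload and hang).
    """
    grid = {(tp, cp): None for tp in range(tp_size) for cp in range(cp_size)}
    grid[(0, 0)] = payload                       # step 1: PP recv on (0, 0)

    for tp in range(tp_size):                    # step 2: TP broadcast
        grid[(tp, 0)] = grid[(0, 0)]

    for tp in range(tp_size):                    # step 3: CP broadcast (the fix)
        for cp in range(cp_size):
            grid[(tp, cp)] = grid[(tp, 0)]
    return grid


ranks = distribute_pp_payload({"batch": 1}, tp_size=8, cp_size=8)
assert all(v == {"batch": 1} for v in ranks.values())
```

Dropping step 3 reproduces the reported symptom: every cp > 0 rank is left with `None` and the pipeline stalls waiting for data that never arrives.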

Collaborator

@ShangmingCai ShangmingCai left a comment


LGTM for the broadcast part

Collaborator

@ShangmingCai ShangmingCai left a comment


need to fix lint

@ShangmingCai
Collaborator

/tag-and-rerun-ci

@yiakwy-xpu-ml-framework-team
Contributor

Yes, I just manually copied the modification and PP2+CP works as expected (otherwise, it does not return generated outputs).

The diff under review in `_handle_context_parallelism` (server_args):

```diff
@@ -2142,7 +2117,8 @@ def _handle_context_parallelism(self):
         assert (
             self.tp_size % (self.dp_size * self.attn_cp_size) == 0
         ), "tp_size must be divisible by dp_size * attn_cp_size"
-        assert self.pp_size == 1, "PP is not supported with context parallelism"
+        if not self.enable_nsa_prefill_context_parallel:
+            assert self.pp_size == 1, "PP is not supported with context parallelism"
```

Contributor comment on the `attn_cp_size` assertion:

Can we clarify `attn_cp_size`? Currently CP is used as `attn_dp_size = tp / dp`, which can be confusing.

Contributor comment on the `pp_size` assertion:

This should be removed.

With TP=2, we can support CP for H800/H20x8.

For H200, there is no such constraint.
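A minimal sketch of the validation flow that the diff above changes, assuming the described behavior (CP spans the whole TP group on the NSA prefill path, and PP is only allowed with CP on that path). The `Args` class and its defaults are illustrative assumptions, not the actual sglang `ServerArgs`.

```python
from dataclasses import dataclass


@dataclass
class Args:
    # Hypothetical stand-in for ServerArgs; field names mirror the diff.
    tp_size: int = 8
    dp_size: int = 1
    pp_size: int = 1
    attn_cp_size: int = 1
    enable_nsa_prefill_context_parallel: bool = False

    def handle_context_parallelism(self):
        if self.enable_nsa_prefill_context_parallel:
            # NSA prefill CP path: CP spans the whole TP group.
            self.attn_cp_size = self.tp_size
        assert self.tp_size % (self.dp_size * self.attn_cp_size) == 0, (
            "tp_size must be divisible by dp_size * attn_cp_size"
        )
        if not self.enable_nsa_prefill_context_parallel:
            # PP + CP is only allowed on the NSA prefill CP path.
            assert self.pp_size == 1, "PP is not supported with context parallelism"


# PP2 + CP8 + TP8 now passes when the NSA prefill CP path is enabled.
args = Args(pp_size=2, enable_nsa_prefill_context_parallel=True)
args.handle_context_parallelism()
```

With the flag off, `pp_size=2` still trips the assertion, which matches the conservative behavior the original check enforced.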

@ShangmingCai
Collaborator

/rerun-failed-ci

@whybeyoung whybeyoung enabled auto-merge (squash) March 15, 2026 00:46
@whybeyoung whybeyoung merged commit 289cbcf into sgl-project:main Mar 16, 2026
89 of 94 checks passed
Wangzheee pushed a commit to Wangzheee/sglang that referenced this pull request Mar 21, 2026
0-693 pushed a commit to 0-693/sglang that referenced this pull request Mar 25, 2026