Bundle Scheduler rank/size fields into a frozen ParallelState #25444

Merged
fzyzcjy merged 1 commit into main from tom/pr_chain/tom_refactor_202605a/primary/mech_preflight/parallel-state on May 16, 2026
fzyzcjy (Collaborator) commented May 16, 2026

Introduce a frozen `@dataclass(frozen=True, slots=True, kw_only=True)`
`ParallelState` value object that bundles the 17 rank/size fields
historically scattered across Scheduler (`tp_rank`, `tp_size`,
`pp_rank`, `pp_size`, `dp_rank`, `dp_size`, `attn_tp_*`, `attn_cp_*`,
`attn_dp_*`, `moe_ep_*`, `moe_dp_*`, `gpu_id`). All consumers across
scheduler / disaggregation / dp_attention / ray / observability paths
now read these through `self.ps.<field>` instead of `self.<field>`,
making it explicit that the rank/size tuple belongs together and
never mutates independently after init.

Also folds in the original `fix-disagg-prefill-rank` commit, which
caught two stragglers in `SchedulerDisaggregationPrefillMixin`
(`self.pp_rank` in a log message and `self.tp_rank` in an error
message) that were missed in the first rename pass. Squashing the fix
back into the parent keeps the chain identifier ↔ PR mapping clean
(one commit = one logical refactor step = one PR in the eventual
chain).
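
Stragglers like the two above are easy to miss because they sit inside log and error strings. One possible way to catch them mechanically is sketched below; this is an illustration rather than the method used for this PR. `MOVED_FIELDS` is an illustrative subset of the moved names, and `find_stragglers` and `SAMPLE` are hypothetical.

```python
import ast

# Illustrative subset of the fields that moved onto ParallelState.
MOVED_FIELDS = {"tp_rank", "tp_size", "pp_rank", "pp_size"}


def find_stragglers(source: str) -> list[tuple[int, str]]:
    """Return (line, field) for each direct `self.<field>` access."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Attribute)
            and node.attr in MOVED_FIELDS
            # Only flag `self.<field>`; `self.ps.<field>` has an
            # Attribute (not a Name) as its value and is skipped.
            and isinstance(node.value, ast.Name)
            and node.value.id == "self"
        ):
            hits.append((node.lineno, node.attr))
    return sorted(hits)


SAMPLE = '''
class Mixin:
    def log(self):
        print(f"stage pp_rank={self.pp_rank}")
        return self.ps.tp_rank
'''

for lineno, field in find_stragglers(SAMPLE):
    print(f"line {lineno}: self.{field} should be self.ps.{field}")
```

Because the scan walks the parsed syntax tree, it also sees attribute accesses inside f-string placeholders, which a quick visual review of call sites may overlook.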

Refactor chain ID: parallel-state


CI States

Latest PR Test: ❌ Missing run-ci label — add it to run CI tests.
Latest PR Test (Extra): ❌ Blocked: run-ci is required first.

@fzyzcjy fzyzcjy merged commit 43797cc into main May 16, 2026
57 of 67 checks passed
@fzyzcjy fzyzcjy deleted the tom/pr_chain/tom_refactor_202605a/primary/mech_preflight/parallel-state branch May 16, 2026 01:23
fzyzcjy added a commit to fzyzcjy/sglang that referenced this pull request May 19, 2026
Upstream PR sgl-project#25444 moved Scheduler.pp_size onto a frozen ParallelState
container (self.ps.pp_size). My branch's chunked-resume PP code still
referenced the old direct attribute, causing
AttributeError: 'Scheduler' object has no attribute 'pp_size'
in _in_flight_other_mb_rids and abort_request.
Shunkangz pushed a commit to Shunkangz/sglang that referenced this pull request May 27, 2026