
Fix Overlap R3 #23349

Closed

zyzshishui wants to merge 1 commit into sgl-project:overlap_r3 from zyzshishui:fix-overlap-r3

Fix Overlap R3#23349
zyzshishui wants to merge 1 commit into
sgl-project:overlap_r3from
zyzshishui:fix-overlap-r3

Conversation

@zyzshishui zyzshishui commented Apr 21, 2026

Motivation

Merged into #22911

Modifications

Accuracy Tests

Speed Tests and Profiling

Checklist

Review and Merge Process

  1. Ping Merge Oncalls to start the process. See the PR Merge Process.
  2. Get approvals from CODEOWNERS and other reviewers.
  3. Trigger CI tests with comments or contact authorized users to do so.
    • Common commands include /tag-and-rerun-ci, /tag-run-ci-label, /rerun-failed-ci
  4. After green CI and required approvals, ask Merge Oncalls or people with Write permission to merge the PR.


@gemini-code-assist (Bot) left a comment


Code Review

This pull request modifies the copy_to_cpu method in python/sglang/srt/managers/utils.py by removing the conditional check for return_routed_experts when copying routed expert outputs to the CPU. Feedback indicates that this change makes the return_routed_experts parameter unused and leads to inefficient memory management and unnecessary data transfers. It is recommended to restore the original conditional logic to maintain efficiency.

Comment on lines +94 to 96

    if self.routed_experts_output is not None:
        self.routed_experts_output.copy_to_cpu()
    else:
        self.routed_experts_output = None



medium

The parameter return_routed_experts is now unused in this function. Removing the return_routed_experts check and the else block that clears self.routed_experts_output causes unnecessary D2H copies and keeps GPU tensors alive even when the user has not requested them. If the intention is to always copy these experts, the parameter should be removed from the function signature; otherwise, the previous conditional logic should be restored to preserve efficiency and proper memory management.

Suggested change

    - if self.routed_experts_output is not None:
    -     self.routed_experts_output.copy_to_cpu()
    - else:
    -     self.routed_experts_output = None
    + if self.routed_experts_output is not None and return_routed_experts:
    +     self.routed_experts_output.copy_to_cpu()
    + else:
    +     self.routed_experts_output = None


@chatgpt-codex-connector (Bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1a1c4236d6

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +94 to 95

    if self.routed_experts_output is not None:
        self.routed_experts_output.copy_to_cpu()


P1: Restore routed-expert copy gating by request flag

copy_to_cpu() now always calls self.routed_experts_output.copy_to_cpu() whenever the capturer is enabled, even when no request asked for routed experts. In overlap mode, batch.return_routed_experts is computed as any(req.return_routed_experts for req in reqs) (see schedule_batch.py), so this removed guard turns an optional D2H path into a per-batch cost. For MoE models this can add large host transfers and finalize work on every step, materially reducing throughput/latency for workloads that enable routed-expert support but only occasionally request it.

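The gating the review asks to restore can be illustrated with a minimal, self-contained sketch. The Req and Batch classes below are simplified stand-ins for illustration only, not the actual sglang classes; the copies counter simulates the per-batch D2H transfer cost that the flag is meant to avoid.

```python
from dataclasses import dataclass

@dataclass
class Req:
    # Per-request flag, mirroring req.return_routed_experts.
    return_routed_experts: bool = False

@dataclass
class Batch:
    reqs: list
    copies: int = 0  # counts simulated device-to-host transfers

    @property
    def return_routed_experts(self) -> bool:
        # Mirrors the computation described for schedule_batch.py:
        # any request asking for routed experts enables the copy path.
        return any(r.return_routed_experts for r in self.reqs)

    def copy_to_cpu(self, routed_experts_output):
        # Gated path: only pay the D2H cost when some request wants the data.
        if routed_experts_output is not None and self.return_routed_experts:
            self.copies += 1
            return routed_experts_output  # pretend this now lives on the host
        return None  # nothing requested: skip the transfer, drop the reference

# No request asks for routed experts: the copy is skipped.
batch = Batch(reqs=[Req(False), Req(False)])
assert batch.copy_to_cpu(object()) is None
assert batch.copies == 0

# One request asks: the whole batch pays for (and gets) the copy.
batch2 = Batch(reqs=[Req(False), Req(True)])
assert batch2.copy_to_cpu(object()) is not None
assert batch2.copies == 1
```

With the guard removed, every batch would take the `copies += 1` branch whenever the output tensor exists, which is the per-batch D2H cost the reviewer flags.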

@zyzshishui changed the title from "1" to "Fix Overlap R3" on Apr 21, 2026
Comment thread on python/sglang/srt/managers/utils.py
@zyzshishui closed this Apr 21, 2026
@zyzshishui deleted the fix-overlap-r3 branch May 10, 2026 17:44
@zyzshishui restored the fix-overlap-r3 branch May 10, 2026 17:44


2 participants