Fix Overlap R3 #23349
Conversation
Code Review
This pull request modifies the copy_to_cpu method in python/sglang/srt/managers/utils.py by removing the conditional check for return_routed_experts when copying routed expert outputs to the CPU. Feedback indicates that this change makes the return_routed_experts parameter unused and leads to inefficient memory management and unnecessary data transfers. It is recommended to restore the original conditional logic to maintain efficiency.
```python
if self.routed_experts_output is not None:
    self.routed_experts_output.copy_to_cpu()
else:
    self.routed_experts_output = None
```
The parameter return_routed_experts is now unused in this function. Without that check, the guard that decides between copying and clearing self.routed_experts_output no longer depends on whether the user requested routed experts, which leads to unnecessary D2H copies and keeps GPU tensors alive when they are not needed. If the intention was to always copy these experts, the parameter should be removed from the function signature; otherwise, the previous conditional logic should be restored to preserve efficiency and proper memory management.
Suggested change:

```diff
-if self.routed_experts_output is not None:
+if self.routed_experts_output is not None and return_routed_experts:
     self.routed_experts_output.copy_to_cpu()
 else:
     self.routed_experts_output = None
```
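To make the requested behavior concrete, here is a minimal runnable sketch of the gated pattern. The RoutedExpertsCapturer and _DeviceTensor names are hypothetical illustrations, not sglang's actual classes; only the conditional mirrors the suggested change.

```python
from typing import Optional


class _DeviceTensor:
    """Hypothetical stand-in for a GPU-resident tensor wrapper."""

    def copy_to_cpu(self) -> None:
        # In sglang this would issue a device-to-host (D2H) transfer;
        # here it is stubbed out for illustration.
        print("D2H copy issued")


class RoutedExpertsCapturer:
    """Hypothetical holder for captured routed-expert outputs."""

    def __init__(self) -> None:
        self.routed_experts_output: Optional[_DeviceTensor] = None

    def copy_to_cpu(self, return_routed_experts: bool) -> None:
        # The gating the review asks to restore: transfer only on request.
        if self.routed_experts_output is not None and return_routed_experts:
            self.routed_experts_output.copy_to_cpu()
        else:
            # Drop the reference so the GPU tensor can be freed promptly.
            self.routed_experts_output = None


capturer = RoutedExpertsCapturer()
capturer.routed_experts_output = _DeviceTensor()
capturer.copy_to_cpu(return_routed_experts=False)  # no transfer; tensor released
```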
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 1a1c4236d6
```python
if self.routed_experts_output is not None:
    self.routed_experts_output.copy_to_cpu()
```
Restore routed-expert copy gating by request flag
copy_to_cpu() now always calls self.routed_experts_output.copy_to_cpu() whenever the capturer is enabled, even when no request asked for routed experts. In overlap mode, batch.return_routed_experts is computed as any(req.return_routed_experts for req in reqs) (see schedule_batch.py), so this removed guard turns an optional D2H path into a per-batch cost. For MoE models this can add large host transfers and finalize work on every step, materially reducing throughput/latency for workloads that enable routed-expert support but only occasionally request it.
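The any(...) aggregation the comment cites can be illustrated with a small sketch. The Req and Batch dataclasses below are simplified stand-ins for the structures in schedule_batch.py, shown only to make clear why the guard turns the D2H copy into a per-request opt-in rather than an unconditional per-batch cost.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Req:
    """Simplified stand-in for a scheduled request."""

    return_routed_experts: bool = False


@dataclass
class Batch:
    """Simplified stand-in for a scheduled batch in overlap mode."""

    reqs: List[Req] = field(default_factory=list)

    @property
    def return_routed_experts(self) -> bool:
        # Mirrors the aggregation cited from schedule_batch.py: the flag
        # is set iff at least one request in the batch opted in.
        return any(req.return_routed_experts for req in self.reqs)


batch = Batch(reqs=[Req(), Req(return_routed_experts=True)])
assert batch.return_routed_experts  # one opted-in request enables the copy
```

With the guard restored, capturer.copy_to_cpu(batch.return_routed_experts) issues a transfer only when at least one request in the batch opted in.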
Motivation
Merged into #22911
Modifications
Accuracy Tests
Speed Tests and Profiling
Checklist
Review and Merge Process
/tag-and-rerun-ci, /tag-run-ci-label, /rerun-failed-ci