4 changes: 1 addition & 3 deletions python/sglang/srt/managers/utils.py
```diff
@@ -91,10 +91,8 @@ def copy_to_cpu(self, return_logprob: bool, return_routed_experts: bool):
         if self.accept_lens is not None:
             self.accept_lens = self.accept_lens.to("cpu", non_blocking=True)

-        if self.routed_experts_output is not None and return_routed_experts:
+        if self.routed_experts_output is not None:
             self.routed_experts_output.copy_to_cpu()
```
Comment on lines +94 to 95
P1: Restore routed-expert copy gating by request flag

copy_to_cpu() now always calls self.routed_experts_output.copy_to_cpu() whenever the capturer is enabled, even when no request asked for routed experts. In overlap mode, batch.return_routed_experts is computed as any(req.return_routed_experts for req in reqs) (see schedule_batch.py), so this removed guard turns an optional D2H path into a per-batch cost. For MoE models this can add large host transfers and finalize work on every step, materially reducing throughput/latency for workloads that enable routed-expert support but only occasionally request it.
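
The per-batch cost described above can be sketched as follows. `Req` and `batch_flag` are simplified stand-ins for illustration, not the real objects in `schedule_batch.py`; the point is that the batch-level flag is the OR of the per-request flags, so it is often `False`, and the removed guard is what lets those batches skip the D2H copy.

```python
# Hedged sketch: in overlap mode the batch-level flag is the OR of the
# per-request flags. Req is a simplified stand-in for the request object
# in schedule_batch.py, not the real class.
from dataclasses import dataclass

@dataclass
class Req:
    return_routed_experts: bool

def batch_flag(reqs):
    # Mirrors: any(req.return_routed_experts for req in reqs)
    return any(req.return_routed_experts for req in reqs)

print(batch_flag([Req(False), Req(False)]))  # False: guard would skip the D2H copy
print(batch_flag([Req(False), Req(True)]))   # True: copy is actually needed
```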


```diff
-        else:
-            self.routed_experts_output = None
```

Comment on lines +94 to 96
Medium

The parameter return_routed_experts is now unused in this function. Removing the check on return_routed_experts, along with the else block that clears self.routed_experts_output, leads to unnecessary D2H copies and keeps GPU tensors alive even when the user did not request them. If the intention was to always copy these experts, the parameter should be removed from the function signature. Otherwise, the previous conditional logic should be restored to maintain efficiency and proper memory management.

Suggested change
```diff
-        if self.routed_experts_output is not None:
-            self.routed_experts_output.copy_to_cpu()
-        else:
-            self.routed_experts_output = None
+        if self.routed_experts_output is not None and return_routed_experts:
+            self.routed_experts_output.copy_to_cpu()
+        else:
+            self.routed_experts_output = None
```
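
The effect of the suggested guard can be sketched in isolation. `FakeRoutedExpertsOutput` and `Batch` below are hypothetical stand-ins (not the real sglang classes); the counter stands in for an expensive device-to-host transfer:

```python
# Minimal sketch of the guarded copy path, assuming simplified stand-in
# classes rather than the real sglang objects.

class FakeRoutedExpertsOutput:
    def __init__(self):
        self.d2h_copies = 0

    def copy_to_cpu(self):
        # Stands in for an expensive device-to-host (D2H) transfer.
        self.d2h_copies += 1


class Batch:
    def __init__(self, routed_experts_output):
        self.routed_experts_output = routed_experts_output

    def copy_to_cpu(self, return_routed_experts: bool):
        # Guarded path: only pay the D2H cost when some request asked for
        # routed experts; otherwise drop the reference so the GPU-side
        # tensor can be freed.
        if self.routed_experts_output is not None and return_routed_experts:
            self.routed_experts_output.copy_to_cpu()
        else:
            self.routed_experts_output = None


out = FakeRoutedExpertsOutput()
batch = Batch(out)
batch.copy_to_cpu(return_routed_experts=False)
print(out.d2h_copies)               # 0: no transfer when not requested
print(batch.routed_experts_output)  # None: GPU-side reference released
```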

```diff
         if (x := self.expert_distribution_metrics) is not None:
             x.copy_to_cpu()
```