4 changes: 2 additions & 2 deletions python/sglang/srt/managers/scheduler.py
@@ -1438,7 +1438,7 @@ def get_next_batch_to_run(self) -> Optional[ScheduleBatch]:
        if need_dp_attn_preparation and not self.spec_algorithm.is_none():
            # In speculative decoding, prefill batches and decode batches cannot be processed in the same DP attention group.
            # We prepare idle batches in advance to skip preparing decode batches when there are prefill batches in the group.
-           new_batch, _ = self.prepare_dp_attn_batch(new_batch)
+           new_batch, _ = self.prepare_mlp_sync_batch(new_batch)
Contributor review comment (severity: medium):
This change replaces prepare_dp_attn_batch with prepare_mlp_sync_batch. Ensure that prepare_mlp_sync_batch covers all of the logic previously handled by prepare_dp_attn_batch, so the substitution does not regress DP attention preparation. Please confirm it is functionally equivalent (or an improvement) in the context of speculative decoding and MLP synchronization.

Suggested change:
-           new_batch, _ = self.prepare_mlp_sync_batch(new_batch)
+           new_batch, _ = self.prepare_mlp_sync_batch(new_batch)  # Ensure this function correctly handles DP attention preparation
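If the rename is purely mechanical, one way to address this concern is to keep the old entry point as a deprecated alias that forwards to the new method, which makes the intended equivalence explicit and protects any external callers. A minimal sketch, assuming prepare_mlp_sync_batch fully subsumes the old DP attention preparation; the shim itself is hypothetical and not part of this PR:

import warnings

class Scheduler:
    ...

    def prepare_dp_attn_batch(self, new_batch):
        # Hypothetical compatibility shim (not in this PR): forward the old
        # entry point to the new one so both names stay behaviorally identical.
        warnings.warn(
            "prepare_dp_attn_batch is deprecated; use prepare_mlp_sync_batch",
            DeprecationWarning,
            stacklevel=2,
        )
        return self.prepare_mlp_sync_batch(new_batch)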

        need_dp_attn_preparation = new_batch is None

        if new_batch is not None:
@@ -1454,7 +1454,7 @@ def get_next_batch_to_run(self) -> Optional[ScheduleBatch]:

        # Handle DP attention
        if need_dp_attn_preparation:
-           ret, _ = self.prepare_dp_attn_batch(ret)
+           ret, _ = self.prepare_mlp_sync_batch(ret)
Contributor review comment (severity: medium):

As in the previous comment, this change replaces prepare_dp_attn_batch with prepare_mlp_sync_batch. Verify that prepare_mlp_sync_batch correctly handles DP attention in this context as well.

Suggested change:
-           ret, _ = self.prepare_mlp_sync_batch(ret)
+           ret, _ = self.prepare_mlp_sync_batch(ret)  # Ensure this function correctly handles DP attention
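One lightweight way to verify the routing (rather than re-testing the full DP attention behavior) is to spy on the new method and assert that get_next_batch_to_run reaches it on the DP attention path. A sketch only: make_test_scheduler and its enable_dp_attention flag are hypothetical stand-ins for however the test suite constructs a scheduler.

from unittest.mock import patch

def test_dp_attn_path_uses_mlp_sync_batch():
    scheduler = make_test_scheduler(enable_dp_attention=True)  # hypothetical helper
    with patch.object(
        scheduler, "prepare_mlp_sync_batch", wraps=scheduler.prepare_mlp_sync_batch
    ) as spy:
        scheduler.get_next_batch_to_run()
    # With DP attention enabled, preparation should go through the new entry point.
    spy.assert_called()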


        return ret
