Merged
6 changes: 6 additions & 0 deletions python/sglang/srt/disaggregation/decode.py
@@ -416,6 +416,12 @@ def pop_preallocated(self) -> List[DecodeRequest]:

         return preallocated_reqs

+    @property
+    def num_tokens_pre_allocated(self):
Contributor review comment [medium]:

The property name num_tokens_pre_allocated is ambiguous. It could be interpreted as the total number of all pre-allocated tokens, but the implementation counts tokens only for requests in the transfer queue. Renaming it to num_tokens_in_transfer would improve clarity.

    @property
    def num_tokens_in_transfer(self):

+        return sum(
+            len(decode_req.req.fill_ids) for decode_req in self.transfer_queue.queue
+        )
+
     def _allocatable_tokens(
         self, retractable_tokens: Optional[int] = None, count_retracted: bool = True
     ) -> int:
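For readers following the thread, here is a minimal sketch of what the new property computes. The classes `Req`, `DecodeRequest`, `TransferQueue`, and `PreallocQueue` are hypothetical stand-ins written for this illustration, not the real sglang classes; only the attribute names `fill_ids`, `req`, `transfer_queue`, and `queue` come from the diff above. The point is that the count is derived from the queue on every access rather than stored in a separate counter.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Req:
    # Hypothetical stand-in for the real request object; only fill_ids matters here.
    fill_ids: List[int] = field(default_factory=list)


@dataclass
class DecodeRequest:
    req: Req


@dataclass
class TransferQueue:
    queue: List[DecodeRequest] = field(default_factory=list)


class PreallocQueue:
    """Hypothetical, stripped-down stand-in for the decode prealloc queue."""

    def __init__(self, transfer_queue: TransferQueue):
        self.transfer_queue = transfer_queue

    @property
    def num_tokens_pre_allocated(self) -> int:
        # Recomputed on each access, so it always reflects the current queue contents.
        return sum(len(d.req.fill_ids) for d in self.transfer_queue.queue)


tq = TransferQueue()
pq = PreallocQueue(tq)
tq.queue.append(DecodeRequest(Req(fill_ids=[1, 2, 3])))
tq.queue.append(DecodeRequest(Req(fill_ids=[4, 5])))
total_after_two = pq.num_tokens_pre_allocated

tq.queue.pop(0)  # the first request leaves the transfer queue
total_after_pop = pq.num_tokens_pre_allocated
```

Because the value is a property over the transfer queue, it tracks requests entering and leaving that queue without any extra bookkeeping by the caller.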
5 changes: 1 addition & 4 deletions python/sglang/srt/managers/scheduler.py
@@ -707,9 +707,6 @@ def init_disaggregation(self):
                 transfer_backend=self.transfer_backend,
             )

-            # Metric for pre-allocation
-            self.num_tokens_pre_allocated = 0
-
         elif self.disaggregation_mode == DisaggregationMode.PREFILL:
             # *2 for the headroom.
             buffer_size = self.max_running_requests * 2
@@ -1372,7 +1369,7 @@ def log_decode_stats(
             msg += f"accept len: {spec_accept_length:.2f}, "

         if self.disaggregation_mode == DisaggregationMode.DECODE:
-            msg += f"pre-allocated usage: {self.num_tokens_pre_allocated / self.max_total_num_tokens:.2f}, "
+            msg += f"pre-allocated usage: {self.disagg_decode_prealloc_queue.num_tokens_pre_allocated / self.max_total_num_tokens:.2f}, "
Contributor review comment [medium]:

To align with the suggested name change in python/sglang/srt/disaggregation/decode.py, this call should be updated to use the new property name num_tokens_in_transfer.

Suggested change
- msg += f"pre-allocated usage: {self.disagg_decode_prealloc_queue.num_tokens_pre_allocated / self.max_total_num_tokens:.2f}, "
+ msg += f"pre-allocated usage: {self.disagg_decode_prealloc_queue.num_tokens_in_transfer / self.max_total_num_tokens:.2f}, "

             msg += f"#retracted-req: {len(self.disagg_decode_prealloc_queue.retracted_queue)}, "

         msg += (
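As a rough illustration of the log fragment touched in `log_decode_stats`, the sketch below formats the same ratio in isolation. The helper name `format_prealloc_usage` and its parameters are assumptions made for this example; the real code builds the string inline from the queue property and `self.max_total_num_tokens`.

```python
def format_prealloc_usage(tokens_in_transfer: int, max_total_num_tokens: int) -> str:
    # Hypothetical helper: fraction of the token budget held by requests in transfer,
    # rendered with two decimals, matching the f-string in the diff.
    usage = tokens_in_transfer / max_total_num_tokens
    return f"pre-allocated usage: {usage:.2f}, "


quarter = format_prealloc_usage(1024, 4096)
empty = format_prealloc_usage(0, 4096)
```

With 1024 of 4096 tokens in transfer the fragment reads `pre-allocated usage: 0.25, `; with an empty transfer queue it reads `pre-allocated usage: 0.00, `.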