Merged
3 changes: 3 additions & 0 deletions python/sglang/srt/disaggregation/decode.py
@@ -918,6 +918,9 @@ def _commit_transfer_to_req(self, decode_req: DecodeRequest) -> bool:
# Case 3: Success - commit the transfer
decode_req.req.output_ids.append(output_id[0].item())
decode_req.req.cached_tokens = cached_tokens[0].item()
+ decode_req.req.cached_tokens_device = cached_tokens[1].item()
+ decode_req.req.cached_tokens_host = cached_tokens[2].item()
+ decode_req.req.cached_tokens_storage = cached_tokens[3].item()
Comment on lines +921 to +923
Contributor

medium

The literal indices 1, 2, and 3 into cached_tokens are magic numbers: nothing at the call site says which slot holds which counter, which makes the code harder to read and maintain. Consider defining named constants for these indices (e.g., CACHED_TOKENS_DEVICE_IDX, CACHED_TOKENS_HOST_IDX, etc.) in a shared location such as python/sglang/srt/disaggregation/utils.py, and using them both here and in MetadataBuffers.set_buf.
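
A minimal sketch of what named indices could look like. The enum `CachedTokensIdx` and the helper `read_cached_tokens` are hypothetical names, not part of sglang, and a plain list stands in for the tensor row (so no `.item()` call is needed here).

```python
from enum import IntEnum


class CachedTokensIdx(IntEnum):
    """Hypothetical slot layout for one row of the cached_tokens buffer."""

    TOTAL = 0    # req.cached_tokens
    DEVICE = 1   # req.cached_tokens_device
    HOST = 2     # req.cached_tokens_host
    STORAGE = 3  # req.cached_tokens_storage


def read_cached_tokens(row):
    """Return the four counters from one buffer row, keyed by slot name."""
    return {idx.name.lower(): int(row[idx]) for idx in CachedTokensIdx}


row = [12, 8, 3, 1]  # stand-in for a single cached_tokens row
counts = read_cached_tokens(row)
```

Because `IntEnum` members are integers, they can index the buffer directly, so the call sites would change only in spelling (for example `cached_tokens[CachedTokensIdx.DEVICE]` instead of `cached_tokens[1]`).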

if not self.spec_algorithm.is_none():
decode_req.req.output_topk_p = output_topk_p
decode_req.req.output_topk_index = output_topk_index
3 changes: 3 additions & 0 deletions python/sglang/srt/disaggregation/utils.py
@@ -227,6 +227,9 @@ def set_buf(self, req: Req):

self.output_ids[req.metadata_buffer_index][0] = req.output_ids[0]
self.cached_tokens[req.metadata_buffer_index][0] = req.cached_tokens
+ self.cached_tokens[req.metadata_buffer_index][1] = req.cached_tokens_device
+ self.cached_tokens[req.metadata_buffer_index][2] = req.cached_tokens_host
+ self.cached_tokens[req.metadata_buffer_index][3] = req.cached_tokens_storage
Comment on lines +230 to +232
Contributor

medium

As in decode.py, the magic-number indices (1, 2, 3) into cached_tokens should be replaced with named constants to improve clarity and maintainability. Defining the constants once in this file would let the writer here and the reader in decode.py use them consistently.
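
One way to keep the writer and the reader in step is to drive both from a single field-order tuple. The sketch below is an assumption-laden illustration: `CACHED_TOKENS_FIELDS`, `store_counters`, and `load_counters` are hypothetical names, and `SimpleNamespace` objects and a plain list stand in for `Req` and the tensor row.

```python
from types import SimpleNamespace

# Hypothetical single source of truth for the slot order:
# slot i of a row holds the attribute named CACHED_TOKENS_FIELDS[i].
CACHED_TOKENS_FIELDS = (
    "cached_tokens",
    "cached_tokens_device",
    "cached_tokens_host",
    "cached_tokens_storage",
)


def store_counters(row, req):
    """Writer side (the role set_buf plays): copy counters from req into row."""
    for slot, name in enumerate(CACHED_TOKENS_FIELDS):
        row[slot] = getattr(req, name)


def load_counters(row, req):
    """Reader side (the role _commit_transfer_to_req plays): copy row into req."""
    for slot, name in enumerate(CACHED_TOKENS_FIELDS):
        setattr(req, name, int(row[slot]))


# Round trip: what the writer stores is what the reader recovers.
sent = SimpleNamespace(
    cached_tokens=12,
    cached_tokens_device=8,
    cached_tokens_host=3,
    cached_tokens_storage=1,
)
row = [0] * len(CACHED_TOKENS_FIELDS)
store_counters(row, sent)

received = SimpleNamespace()
load_counters(row, received)
```

With this shape, adding a fifth counter would mean extending the tuple in one place rather than editing two files in matching ways.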

if req.return_logprob:
if req.output_token_logprobs_val: # not none or empty list
self.output_token_logprobs_val[req.metadata_buffer_index][0] = (
18 changes: 10 additions & 8 deletions python/sglang/srt/managers/scheduler_output_processor_mixin.py
@@ -55,15 +55,10 @@ def _get_cached_tokens_details(self, req: Req) -> Optional[dict]:
"""Get detailed cache breakdown for a request, if available.

Returns:
- - None if HiCache is not enabled
- - {"device": X, "host": Y} if HiCache enabled but L3 storage is not
- - {"device": X, "host": Y, "storage": Z, "storage_backend": "..."} if L3 enabled
+ - None if no cached tokens at all
+ - {"device": X, "host": Y} without storage breakdown
+ - {"device": X, "host": Y, "storage": Z} with storage breakdown
"""
- # Only show details if HiCache is enabled
- if not getattr(self, "enable_hierarchical_cache", False):
- return None
-
- # Only show if there are any cached tokens
if (
req.cached_tokens_device > 0
or req.cached_tokens_host > 0
@@ -78,6 +73,13 @@ def _get_cached_tokens_details(self, req: Req) -> Optional[dict]:
details["storage"] = req.cached_tokens_storage
details["storage_backend"] = self._get_storage_backend_type()
return details
+
+ if req.cached_tokens > 0:
+ return {
+ "device": req.cached_tokens,
+ "host": 0,
+ }
+
return None

def process_batch_result_prebuilt(self: Scheduler, batch: ScheduleBatch):
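
The control flow this hunk introduces can be sketched as a standalone function. This is not the sglang method: `cached_tokens_details` is a hypothetical name, the counters are passed as plain integers, and the `storage_backend` entry is left out because it depends on the scheduler. The lines between the two hunks are collapsed in the diff, so the exact first condition and the rule for adding `"storage"` are assumptions here (any nonzero per-tier counter enables the breakdown; `"storage"` is added only when nonzero).

```python
from typing import Optional


def cached_tokens_details(
    total: int, device: int, host: int, storage: int
) -> Optional[dict]:
    """Standalone sketch of the post-change fallback order."""
    # Assumed condition: a per-tier breakdown exists if any tier counter is set.
    if device > 0 or host > 0 or storage > 0:
        details = {"device": device, "host": host}
        if storage > 0:  # assumed rule for including the storage entry
            details["storage"] = storage
        return details

    # Fallback added by the diff: only the total is known,
    # so all of it is attributed to the device tier.
    if total > 0:
        return {"device": total, "host": 0}

    # No cached tokens at all.
    return None


no_cache = cached_tokens_details(0, 0, 0, 0)
total_only = cached_tokens_details(5, 0, 0, 0)
two_tiers = cached_tokens_details(5, 3, 2, 0)
three_tiers = cached_tokens_details(9, 3, 2, 4)
```

The second case is the one the removed `enable_hierarchical_cache` guard used to turn into `None`: a request with a nonzero total but no per-tier counters now reports the whole total under `"device"`.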