25 changes: 19 additions & 6 deletions python/sglang/srt/layers/moe/topk.py
@@ -129,6 +129,7 @@ def fused_topk_deepseek(
 if _use_aiter:
     try:
         from aiter import biased_grouped_topk as aiter_biased_grouped_topk
+        from aiter import topk_softmax as aiter_topk_softmax
     except ImportError:
         raise ImportError("aiter is required when SGLANG_USE_AITER is set to True")

@@ -511,12 +512,24 @@ def fused_topk(
     topk_ids = torch.empty(M, topk, dtype=torch.int32, device=hidden_states.device)

     if scoring_func == "softmax":
-        topk_softmax(
-            topk_weights,
-            topk_ids,
-            gating_output,
-            renormalize,
-        )
+        if _use_aiter:
+            token_expert_indices = torch.empty(
+                M, topk, dtype=torch.int32, device=hidden_states.device
+            )
Comment on lines +516 to +518

Contributor (review severity: medium):
The token_expert_indices tensor is allocated here but its value is not used after the call to aiter_topk_softmax. This results in an unnecessary memory allocation on every call to fused_topk when _use_aiter is enabled. A similar pattern is present in fused_topk_deepseek with aiter_biased_grouped_topk.

If this tensor is a required output parameter for the aiter kernels that is not needed by sglang, consider checking if the aiter API allows passing None to avoid the allocation. If it's a workspace, it might be possible to manage it more efficiently, for example by using a pre-allocated buffer from a memory pool.

+            aiter_topk_softmax(
+                topk_weights,
+                topk_ids,
+                token_expert_indices,
+                gating_output,
+                renormalize,
+            )
+        else:
+            topk_softmax(
+                topk_weights,
+                topk_ids,
+                gating_output,
+                renormalize,
+            )
     elif scoring_func == "sigmoid":
         topk_sigmoid(
             topk_weights,
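
As a rough illustration of the reviewer's buffer-reuse suggestion, the sketch below caches one scratch tensor per device and hands out a view of it on each call. The helper name, the cache layout, and the assumption that the aiter kernel treats token_expert_indices as a write-only output (so stale contents are harmless) are all hypothetical, not part of this PR or of the aiter API.

import torch

# Hypothetical per-device scratch cache illustrating the review suggestion;
# not part of the PR. Buffers are grown monotonically and never shrunk.
_token_expert_indices_cache: dict = {}


def _get_token_expert_indices(M: int, topk: int, device: torch.device) -> torch.Tensor:
    """Return a reusable (M, topk) int32 buffer on `device`.

    Assumes the consuming kernel overwrites every element, so the stale
    contents of a reused buffer are harmless.
    """
    buf = _token_expert_indices_cache.get(device)
    if buf is None or buf.numel() < M * topk:
        buf = torch.empty(M * topk, dtype=torch.int32, device=device)
        _token_expert_indices_cache[device] = buf
    # A slice of a contiguous 1-D tensor is contiguous, so the view is valid.
    return buf[: M * topk].view(M, topk)

fused_topk could then call _get_token_expert_indices(M, topk, hidden_states.device) instead of torch.empty, trading the per-call allocation for a small amount of retained memory. Whether this is safe depends on the aiter kernel's contract for this argument, which would need to be confirmed upstream.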