Merged
7 changes: 7 additions & 0 deletions python/sglang/srt/server_args.py
@@ -2042,6 +2042,13 @@ def _handle_mamba_radix_cache(
                == 0
            ), f"For SSM models with extra buffer, either FLA_CHUNK_SIZE or page_size must be divisible by the other, got {FLA_CHUNK_SIZE=}, {self.page_size=}"
        elif not self.disable_radix_cache:  # no_buffer
+           if self.page_size is not None and self.page_size != 1:
+               logger.warning(
+                   f"{model_arch} with radix cache requires page_size=1 in the current "
+                   f"Mamba scheduling mode (no_buffer), but got {self.page_size}. "
+                   "Automatically setting page_size=1."
+               )
+               self.page_size = 1
Comment on lines +2045 to +2051
Contributor

Severity: high
While this change correctly identifies that page_size should be 1 for this Mamba mode, setting it here can be overridden by subsequent logic, potentially leading to the same crash this PR aims to prevent.

Specifically, _handle_attention_backend_compatibility() is called after this, and it may enforce a different page_size for certain attention backends (e.g., cutlass_mla, trtllm_mla), causing MambaRadixCache to fail its page_size == 1 assertion.

A more robust approach would be to first check for such conflicts. If an incompatible attention backend is used, you could disable the radix cache, similar to how it's already handled for trtllm_mha below (lines 2058-2064). This would ensure settings are consistent. Consider expanding that logic to cover other incompatible backends and moving the check before this page_size correction.
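The ordering the reviewer suggests can be sketched as follows. This is a minimal, self-contained illustration of the idea, not sglang's actual implementation: `ServerArgsSketch`, `PAGE_SIZE_FORCING_BACKENDS`, and `handle_mamba_radix_cache` are hypothetical stand-ins, and the backend names (`cutlass_mla`, `trtllm_mla`) are taken from the comment above purely as examples.

```python
import logging

logger = logging.getLogger(__name__)

# Hypothetical set of attention backends that may later force a
# page_size other than 1 (names taken from the review comment).
PAGE_SIZE_FORCING_BACKENDS = {"cutlass_mla", "trtllm_mla"}


class ServerArgsSketch:
    """Minimal stand-in for sglang's ServerArgs, for illustration only."""

    def __init__(self, attention_backend=None, page_size=None,
                 disable_radix_cache=False):
        self.attention_backend = attention_backend
        self.page_size = page_size
        self.disable_radix_cache = disable_radix_cache

    def handle_mamba_radix_cache(self):
        # Reviewer's suggestion: resolve backend conflicts *before*
        # forcing page_size=1, so that later backend-compatibility
        # logic cannot silently undo the correction and trip the
        # MambaRadixCache page_size == 1 assertion.
        if self.attention_backend in PAGE_SIZE_FORCING_BACKENDS:
            logger.warning(
                "%s forces page_size != 1; disabling radix cache for the "
                "Mamba no_buffer mode instead.", self.attention_backend)
            self.disable_radix_cache = True
            return
        if not self.disable_radix_cache and self.page_size != 1:
            logger.warning(
                "Mamba no_buffer radix cache requires page_size=1, got %s; "
                "setting page_size=1.", self.page_size)
            self.page_size = 1
```

With this ordering, an incompatible backend disables the radix cache and leaves `page_size` untouched, while a compatible backend still gets the `page_size=1` correction from the diff above.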

            if self.speculative_algorithm is None:
                logger.warning(
                    "Disabling overlap schedule since mamba no_buffer is not compatible with "