6 changes: 3 additions & 3 deletions docs_new/docs/sglang-diffusion/attention_backends.mdx
@@ -16,7 +16,7 @@ When using the diffusers backend, `--attention-backend` is passed through to dif
 - **CUDA**: prefers FlashAttention (FA3/FA4) when supported; otherwise falls back to PyTorch SDPA.
 - **ROCm**: uses FlashAttention when available; otherwise falls back to PyTorch SDPA.
 - **Intel XPU**: uses XPU Flash Attention backend (fp16/bf16, head sizes 64/96/128/192/256); otherwise falls back to PyTorch SDPA.
-- **MUSA**: uses FlashAttention when available; otherwise falls back to PyTorch SDPA.
+- **MUSA**: uses FlashAttention when available; also supports Sage Attention when installed; otherwise falls back to PyTorch SDPA.
 - **MPS**: always uses PyTorch SDPA.
 - **NPU**: for ring attention uses FA otherwise uses PyTorch SDPA.
 
@@ -349,10 +349,10 @@ Some backends require additional configuration. You can pass these parameters vi
 <td style={{padding: "9px 12px", backgroundColor: "rgba(255,255,255,0.05)"}}>Yes</td>
 <td style={{padding: "9px 12px", backgroundColor: "rgba(255,255,255,0.02)"}}>No</td>
 <td style={{padding: "9px 12px", backgroundColor: "rgba(255,255,255,0.05)"}}>No</td>
-<td style={{padding: "9px 12px", backgroundColor: "rgba(255,255,255,0.02)"}}>No</td>
+<td style={{padding: "9px 12px", backgroundColor: "rgba(255,255,255,0.02)"}}>Yes</td>
 <td style={{padding: "9px 12px", backgroundColor: "rgba(255,255,255,0.05)"}}>❌</td>
 <td style={{padding: "9px 12px", backgroundColor: "rgba(255,255,255,0.05)"}}>❌</td>
-<td style={{padding: "9px 12px", backgroundColor: "rgba(255,255,255,0.05)"}}>CUDA-only (optional dependency).</td>
+<td style={{padding: "9px 12px", backgroundColor: "rgba(255,255,255,0.05)"}}>Optional dependency on CUDA and MUSA. Falls back to FlashAttention if <code>sageattention</code> is not installed.</td>
 </tr>
 <tr>
 <td style={{padding: "9px 12px", fontWeight: 500, backgroundColor: "rgba(255,255,255,0.02)"}}>`sage_attn_3`</td>
2 changes: 1 addition & 1 deletion python/sglang/multimodal_gen/README.md
@@ -25,7 +25,7 @@ SGLang Diffusion supports AMD Instinct GPUs through ROCm. On AMD platforms, we u
 
 ### Moore Threads/MUSA Support
 
-SGLang Diffusion supports Moore Threads GPUs (MTGPU) through the MUSA software stack. On MUSA platforms, we use the Torch SDPA backend for attention. See the [installation guide](https://github.com/sgl-project/sglang/tree/main/docs/diffusion/installation.md) for setup instructions.
+SGLang Diffusion supports Moore Threads GPUs (MTGPU) through the MUSA software stack. On MUSA platforms, we use FlashAttention (FA3) when available; also supports Sage Attention when installed; otherwise falls back to the Torch SDPA backend. See the [installation guide](https://github.com/sgl-project/sglang/tree/main/docs/diffusion/installation.md) for setup instructions.
 
 ### Apple MPS Support
 
19 changes: 18 additions & 1 deletion python/sglang/multimodal_gen/runtime/platforms/musa.py
@@ -160,6 +160,23 @@ def get_attn_backend_cls_str(
         if selected_backend == AttentionBackendEnum.TORCH_SDPA:
             logger.info("Using Torch SDPA backend")
             return "sglang.multimodal_gen.runtime.layers.attention.backends.sdpa.SDPABackend"
+        elif selected_backend == AttentionBackendEnum.SAGE_ATTN:
+            try:
+                from sageattention import sageattn  # noqa: F401
+
+                from sglang.multimodal_gen.runtime.layers.attention.backends.sage_attn import (  # noqa: F401
+                    SageAttentionBackend,
+                )
+
+                logger.info("Using Sage Attention backend")
+
+                return "sglang.multimodal_gen.runtime.layers.attention.backends.sage_attn.SageAttentionBackend"
+            except ImportError as e:
+                logger.info(e)
+                logger.info(
+                    "Sage Attention backend is not installed (To install it, run `pip install sageattention>=0.1.0`). Falling back to Flash Attention."
+                )
+                target_backend = AttentionBackendEnum.FA
         elif selected_backend in [
             AttentionBackendEnum.FA,
         ]:
@@ -208,7 +225,7 @@ def get_attn_backend_cls_str(
             logger.info("Using Torch SDPA backend")
             return "sglang.multimodal_gen.runtime.layers.attention.backends.sdpa.SDPABackend"
 
-        logger.info("Using FlashAttention (FA3) backend on MUSA")
+        logger.info("Using FlashAttention (FA3) backend")
         return "sglang.multimodal_gen.runtime.layers.attention.backends.flash_attn.FlashAttentionBackend"
 
     @classmethod
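
The fallback order this change documents for MUSA (Sage Attention, then FlashAttention, then Torch SDPA) can be sketched in isolation. This is a minimal illustration and not the code in `musa.py`: the `Backend` enum, the `REQUIRED_MODULE` and `FALLBACK` tables, and `resolve_backend` are hypothetical names, and `flash_attn` is only an assumed module name for the FlashAttention package. Only the `sageattention` module name comes from the diff above.

```python
import importlib.util
import logging
from enum import Enum, auto

logger = logging.getLogger(__name__)


class Backend(Enum):
    """Hypothetical stand-in for AttentionBackendEnum."""

    SAGE_ATTN = auto()
    FA = auto()
    TORCH_SDPA = auto()


# Optional module each backend needs. "flash_attn" is an assumed name.
REQUIRED_MODULE = {
    Backend.SAGE_ATTN: "sageattention",
    Backend.FA: "flash_attn",
}

# Next backend to try when the required module is missing.
FALLBACK = {
    Backend.SAGE_ATTN: Backend.FA,
    Backend.FA: Backend.TORCH_SDPA,
}


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(module_name) is not None


def resolve_backend(requested: Backend, installed=is_installed) -> Backend:
    """Walk the fallback chain until a backend's dependency is available.

    TORCH_SDPA has no entry in REQUIRED_MODULE, so the loop always terminates.
    """
    backend = requested
    while backend in REQUIRED_MODULE and not installed(REQUIRED_MODULE[backend]):
        fallback = FALLBACK[backend]
        logger.info("%s is unavailable, falling back to %s", backend.name, fallback.name)
        backend = fallback
    return backend
```

Passing the availability check as a parameter keeps the chain testable on machines without any of the optional packages, for example `resolve_backend(Backend.SAGE_ATTN, installed=lambda name: False)`.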