[AMD] Fix FA3 support check crash on ROCm (torch.version.cuda is None)#22335
Conversation
Warning: You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!
@bingxche it's a problem for XPU also
Waiting for CI test https://github.com/sgl-project/sglang/actions/runs/24124568923/job/70386014759; once the test passes I will mark it as ready for review.
@bingxche it seems the test failed, please check
Force-pushed 22988bf to 0095eda
…SDPA

The ROCm multimodal-gen platform incorrectly selects the FlashAttention backend for the text encoder when the flash_attn package is installed. The FA backend routes through FA3 (sgl-kernel), which is CUDA-only, causing a crash on ROCm.

Add an explicit `_is_fa3_supported()` check in the ROCm platform backend selector. When FA3 is not supported (`torch.version.cuda` is `None` on ROCm), fall back to the Torch SDPA backend instead.

Regression introduced by 1a8eb89 ("Kernels community fa3 (#20796)").

Made-with: Cursor
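The fallback logic in the commit above can be sketched as follows (a minimal sketch: the enum, function name, and parameters are hypothetical stand-ins for the real ROCm platform backend selector, and the FA3 check is reduced to the `None` guard this PR adds):

```python
from enum import Enum
from typing import Optional


class AttnBackend(Enum):
    FA = "flash_attention"      # routes through FA3 (sgl-kernel), CUDA-only
    TORCH_SDPA = "torch_sdpa"   # portable PyTorch scaled_dot_product_attention


def select_text_encoder_backend(
    flash_attn_installed: bool, cuda_version: Optional[str]
) -> AttnBackend:
    """Pick the text-encoder attention backend on the ROCm platform.

    cuda_version mirrors torch.version.cuda, which is None on ROCm builds.
    """
    # Explicit support check: having flash_attn installed is not enough,
    # because the FA backend dispatches into CUDA-only FA3 kernels.
    fa3_supported = cuda_version is not None
    if flash_attn_installed and fa3_supported:
        return AttnBackend.FA
    # ROCm (or any non-CUDA build): fall back to Torch SDPA.
    return AttnBackend.TORCH_SDPA
```

With this shape, installing flash_attn on a ROCm box no longer flips the selector into the crashing FA3 path.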
Force-pushed 0095eda to f75b6e3
Motivation
PR #20796 (1a8eb890f6, "Kernels community fa3", merged 2026-04-07) introduced a new unified flash attention dispatch layer (`python/sglang/jit_kernel/flash_attention.py` and `flash_attention_v3.py`). As part of this refactor, `multimodal_gen`'s flash attention backend was changed from directly importing `sgl_kernel.flash_attn.flash_attn_varlen_func` to importing from the new `sglang.jit_kernel.flash_attention` module.

The new code path unconditionally calls `_is_fa3_supported()` before dispatching, which checks the CUDA version. On ROCm (AMD GPUs), `torch.version.cuda` is `None`, causing a `TypeError`.

This crashes all diffusion tests that use Qwen-Image or Z-Image-Turbo models during the `TextEncodingStage`, because the text encoder's attention layer goes through the new FA3 code path.

Impact
All 3 multimodal-gen test jobs in the AMD AITER Scout #29 workflow failed:

| Job | Test | Model |
| --- | --- | --- |
| `multimodal-gen-test-1-gpu-amd` (part 0) | `qwen_image_t2i` | Qwen/Qwen-Image |
| `multimodal-gen-test-1-gpu-amd` (part 1) | `qwen_image_t2i_cache_dit_enabled` | Qwen/Qwen-Image |
| `multimodal-gen-test-2-gpu-amd` (part 0) | `fsdp-inference` | Tongyi-MAI/Z-Image-Turbo |

Each job retried 6 times and exhausted all retries with the same `TypeError`.

Fix
Guard `_is_fa3_supported()` to return `False` when `torch.version.cuda` is `None` (i.e., on ROCm), since FA3 is a CUDA-only feature.

Test Plan
- `multimodal-gen-test-1-gpu-amd` passes (Qwen-Image T2I tests)
- `multimodal-gen-test-2-gpu-amd` passes (Z-Image-Turbo FSDP test)
- `_is_fa3_supported()` no longer crashes when `torch.version.cuda` is `None`

Made with Cursor