[AMD] Fix FA3 support check crash on ROCm (torch.version.cuda is None)#22335

Draft
bingxche wants to merge 1 commit into main from bingxche/fix-amd-multimodal

Conversation

@bingxche
Collaborator

@bingxche bingxche commented Apr 8, 2026

Motivation

PR #20796 (1a8eb890f6, "Kernels community fa3", merged 2026-04-07) introduced a new unified flash attention dispatch layer (python/sglang/jit_kernel/flash_attention.py and flash_attention_v3.py). As part of this refactor, multimodal_gen's flash attention backend was changed from directly importing sgl_kernel.flash_attn.flash_attn_varlen_func to importing from the new sglang.jit_kernel.flash_attention module.

The new code path unconditionally calls _is_fa3_supported() before dispatching, which checks:

return (torch.version.cuda >= "12.3") and (...)

On ROCm (AMD GPUs), torch.version.cuda is None, causing:

TypeError: '>=' not supported between instances of 'NoneType' and 'str'
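The failure mode can be reproduced in isolation. A minimal standalone sketch (the literal comparison, not the actual sglang code) of why the check blows up when `torch.version.cuda` is `None`:

```python
# Standalone illustration of the failure mode: on ROCm builds of PyTorch,
# torch.version.cuda is None, and comparing None against a string raises.
cuda_version = None  # stand-in for torch.version.cuda on ROCm

try:
    cuda_version >= "12.3"
    crashed = False
except TypeError as exc:
    crashed = True
    print(exc)  # '>=' not supported between instances of 'NoneType' and 'str'

assert crashed
```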

This crashes all diffusion tests that use Qwen-Image or Z-Image-Turbo models during the TextEncodingStage, because the text encoder's attention layer goes through the new FA3 code path.

Impact

All 3 multimodal-gen test jobs in the AMD AITER Scout #29 workflow failed:

| Job | Failed Test | Model |
| --- | --- | --- |
| multimodal-gen-test-1-gpu-amd (part 0) | qwen_image_t2i | Qwen/Qwen-Image |
| multimodal-gen-test-1-gpu-amd (part 1) | qwen_image_t2i_cache_dit_enabled | Qwen/Qwen-Image |
| multimodal-gen-test-2-gpu-amd (part 0) | fsdp-inference | Tongyi-MAI/Z-Image-Turbo |

Each job retried 6 times and exhausted all retries with the same TypeError.

Fix

Guard _is_fa3_supported() to return False when torch.version.cuda is None (i.e., on ROCm), since FA3 is a CUDA-only feature.
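A minimal sketch of such a guard (the `cuda_version` parameter stands in for `torch.version.cuda`; the real `_is_fa3_supported()` also performs the other checks elided as `(...)` above):

```python
def _is_fa3_supported(cuda_version) -> bool:
    """Illustrative guard; cuda_version stands in for torch.version.cuda.

    FA3 is a CUDA-only feature: on ROCm (and any non-CUDA build) the
    reported CUDA version is None, so return False before comparing.
    """
    if cuda_version is None:
        return False
    return cuda_version >= "12.3"  # plus the other checks elided here

assert _is_fa3_supported(None) is False    # ROCm: no crash, FA3 disabled
assert _is_fa3_supported("12.4") is True   # CUDA >= 12.3: eligible
assert _is_fa3_supported("12.1") is False  # older CUDA: not eligible
```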

Test Plan

  • Verify multimodal-gen-test-1-gpu-amd passes (Qwen-Image T2I tests)
  • Verify multimodal-gen-test-2-gpu-amd passes (Z-Image-Turbo FSDP test)
  • No impact on CUDA paths (the guard only triggers when torch.version.cuda is None)

Made with Cursor



@polisettyvarma
Contributor

@bingxche it's a problem for XPU also
when can this PR be marked ready for review ?

@bingxche
Collaborator Author

bingxche commented Apr 8, 2026

> @bingxche it's a problem for XPU also when can this PR be marked ready for review ?

Waiting for CI test https://github.com/sgl-project/sglang/actions/runs/24124568923/job/70386014759, after the test passed I will mark it as ready for review.

@polisettyvarma
Contributor

> > @bingxche it's a problem for XPU also when can this PR be marked ready for review ?
>
> Waiting for CI test https://github.com/sgl-project/sglang/actions/runs/24124568923/job/70386014759, after the test passed I will mark it as ready for review.

@bingxche seems the test failed, please check

@bingxche force-pushed the bingxche/fix-amd-multimodal branch 2 times, most recently from 22988bf to 0095eda on April 8, 2026 at 15:51
…SDPA

The ROCm multimodal-gen platform incorrectly selects the FlashAttention
backend for the text encoder when the flash_attn package is installed.
The FA backend routes through FA3 (sgl-kernel), which is CUDA-only,
causing a crash on ROCm.

Add an explicit _is_fa3_supported() check in the ROCm platform backend
selector. When FA3 is not supported (torch.version.cuda is None on
ROCm), fall back to Torch SDPA backend instead.

Regression introduced by 1a8eb89 ("Kernels community fa3 (#20796)").

Made-with: Cursor
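The selector change described in the commit message above can be sketched as follows (function and backend names are hypothetical, not the exact sglang ROCm platform code):

```python
def select_attention_backend(flash_attn_installed: bool, fa3_supported: bool) -> str:
    """Pick the text-encoder attention backend (illustrative names).

    Previously, having the flash_attn package installed was enough to
    select the FA backend, which then routed into CUDA-only FA3 and
    crashed on ROCm. Gating on FA3 support falls back to Torch SDPA.
    """
    if flash_attn_installed and fa3_supported:
        return "flash_attn"
    return "torch_sdpa"

# On ROCm, torch.version.cuda is None, so fa3_supported is False:
assert select_attention_backend(True, fa3_supported=False) == "torch_sdpa"
assert select_attention_backend(True, fa3_supported=True) == "flash_attn"
```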
@bingxche force-pushed the bingxche/fix-amd-multimodal branch from 0095eda to f75b6e3 on April 8, 2026 at 15:53