[AMD] Fix FA3 support check crash on ROCm (torch.version.cuda is None)#22335
Conversation
Warning: You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!
@bingxche it's a problem for XPU also
Waiting for CI test https://github.com/sgl-project/sglang/actions/runs/24124568923/job/70386014759; once the test passes I will mark it as ready for review.
@bingxche it seems the test failed, please check
Force-pushed 22988bf to 0095eda
…SDPA

The ROCm multimodal-gen platform incorrectly selects the FlashAttention backend for the text encoder when the flash_attn package is installed. The FA backend routes through FA3 (sgl-kernel), which is CUDA-only, causing a crash on ROCm.

Add an explicit `_is_fa3_supported()` check in the ROCm platform backend selector. When FA3 is not supported (`torch.version.cuda` is `None` on ROCm), fall back to the Torch SDPA backend instead.

Regression introduced by 1a8eb89 ("Kernels community fa3 (#20796)").

Made-with: Cursor
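The fallback logic in the commit above can be sketched as follows (a minimal sketch: the enum, function name, and parameters are hypothetical stand-ins for the real ROCm platform backend selector, and the FA3 check is reduced to the `None` guard this PR adds):

```python
from enum import Enum
from typing import Optional


class AttnBackend(Enum):
    FA = "flash_attention"      # routes through FA3 (sgl-kernel), CUDA-only
    TORCH_SDPA = "torch_sdpa"   # portable PyTorch scaled_dot_product_attention


def select_text_encoder_backend(
    flash_attn_installed: bool, cuda_version: Optional[str]
) -> AttnBackend:
    """Pick the text-encoder attention backend on the ROCm platform.

    cuda_version mirrors torch.version.cuda, which is None on ROCm builds.
    """
    # Explicit support check: having flash_attn installed is not enough,
    # because the FA backend dispatches into CUDA-only FA3 kernels.
    fa3_supported = cuda_version is not None
    if flash_attn_installed and fa3_supported:
        return AttnBackend.FA
    # ROCm (or any non-CUDA build): fall back to Torch SDPA.
    return AttnBackend.TORCH_SDPA
```

With this shape, installing flash_attn on a ROCm box no longer flips the selector into the crashing FA3 path.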
Force-pushed 0095eda to f75b6e3
Motivation
PR #20796 (1a8eb890f6, "Kernels community fa3", merged 2026-04-07) introduced a new unified flash attention dispatch layer (`python/sglang/jit_kernel/flash_attention.py` and `flash_attention_v3.py`). As part of this refactor, `multimodal_gen`'s flash attention backend was changed from directly importing `sgl_kernel.flash_attn.flash_attn_varlen_func` to importing from the new `sglang.jit_kernel.flash_attention` module.

The new code path unconditionally calls `_is_fa3_supported()` before dispatching, which checks the CUDA version. On ROCm (AMD GPUs), `torch.version.cuda` is `None`, causing a `TypeError`.

This crashes all diffusion tests that use Qwen-Image or Z-Image-Turbo models during the `TextEncodingStage`, because the text encoder's attention layer goes through the new FA3 code path.

Impact
All 3 multimodal-gen test jobs in the AMD AITER Scout #29 workflow failed:

| Job | Test | Model |
| --- | --- | --- |
| `multimodal-gen-test-1-gpu-amd` (part 0) | `qwen_image_t2i` | Qwen/Qwen-Image |
| `multimodal-gen-test-1-gpu-amd` (part 1) | `qwen_image_t2i_cache_dit_enabled` | Qwen/Qwen-Image |
| `multimodal-gen-test-2-gpu-amd` (part 0) | `fsdp-inference` | Tongyi-MAI/Z-Image-Turbo |

Each job retried 6 times and exhausted all retries with the same `TypeError`.

Fix
Guard `_is_fa3_supported()` to return `False` when `torch.version.cuda` is `None` (i.e., on ROCm), since FA3 is a CUDA-only feature.

Test Plan
- `multimodal-gen-test-1-gpu-amd` passes (Qwen-Image T2I tests)
- `multimodal-gen-test-2-gpu-amd` passes (Z-Image-Turbo FSDP test)
- `_is_fa3_supported()` no longer crashes when `torch.version.cuda` is `None`

Made with Cursor