
vulkan: fix fp16 Flash Attention on Windows AMD RDNA2 and below#19921

Merged
0cc4m merged 1 commit into master from 0cc4m/vulkan-fix-fa-amd-windows on Feb 26, 2026

Conversation

0cc4m (Contributor) commented Feb 26, 2026

For some reason, subgroupShuffleXor on an f16vec4 is broken on RDNA2 and earlier. I found a workaround: shuffling vec4 instead. This also fixes fp16 Flash Attention on AMD GCN, so I removed the fp32 fallback.

Fixes #19881 and also the issue reported here: #19625 (comment)

@masamaru-san @DeryabinIvan Please try this fix and let me know if it works for you.
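The shader change itself is not shown in the thread; as a rough illustration of the idea behind the workaround (Python standing in for GLSL, with a hypothetical subgroup size of 8 and made-up lane values), here is a xor-shuffle butterfly reduction in which each lane's f16vec4 is exchanged as a raw bit pattern rather than as a half-precision vector; the `struct` round-trip plays the role of the type reinterpretation that sidesteps the broken f16vec4 shuffle:

```python
import struct

SUBGROUP = 8  # hypothetical subgroup size for illustration

def shuffle_xor(lane_values, mask):
    # subgroupShuffleXor: each lane i reads the value held by lane (i ^ mask)
    return [lane_values[i ^ mask] for i in range(len(lane_values))]

def pack_f16vec4(v):
    # reinterpret an f16vec4 as raw bits ('e' = IEEE half precision)
    return struct.pack('<4e', *v)

def unpack_f16vec4(b):
    return list(struct.unpack('<4e', b))

# lane i starts with the vector [i, i, i, i]
lanes = [[float(i)] * 4 for i in range(SUBGROUP)]

# butterfly reduction: after log2(SUBGROUP) xor-shuffle steps,
# every lane holds the subgroup-wide sum
mask = SUBGROUP // 2
while mask:
    # exchange the bit pattern instead of the f16vec4 itself,
    # mirroring the driver workaround
    packed = [pack_f16vec4(v) for v in lanes]
    other = shuffle_xor(packed, mask)
    lanes = [[a + b for a, b in zip(v, unpack_f16vec4(o))]
             for v, o in zip(lanes, other)]
    mask //= 2

print(lanes[0])  # every lane ends with the sum 0+1+...+7 = 28.0
```

Because the shuffle only moves bits, reinterpreting the payload as a different (working) vector type leaves the reduction result unchanged; that is why the workaround is safe.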

@0cc4m 0cc4m requested a review from jeffbolznv February 26, 2026 09:43
@github-actions github-actions bot added the Vulkan (Issues specific to the Vulkan backend) and ggml (changes relating to the ggml tensor library for machine learning) labels Feb 26, 2026
@DeryabinIvan commented

Everything works as expected on my side.
[image]

@0cc4m 0cc4m merged commit 723c710 into master Feb 26, 2026
78 checks passed
@0cc4m 0cc4m deleted the 0cc4m/vulkan-fix-fa-amd-windows branch February 26, 2026 18:11
bartowski1182 pushed a commit to bartowski1182/llama.cpp that referenced this pull request Mar 2, 2026
ArberSephirotheca pushed a commit to ArberSephirotheca/llama.cpp that referenced this pull request Mar 3, 2026

Development

Successfully merging this pull request may close: Eval bug: Garbage output after #19625