vulkan: fix fp16 Flash Attention on Windows AMD RDNA2 and below by 0cc4m · Pull Request #19921 · ggml-org/llama.cpp

0cc4m · 2026-02-26T09:43:20Z

For some reason, an f16vec4 subgroupShuffleXor is broken on RDNA2 and lower. I found a workaround: shuffle a vec4 instead. This also fixes fp16 Flash Attention on AMD GCN, so I removed the fp32 fallback.
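To illustrate why shuffling a vec4 in place of an f16vec4 should not change results, here is a minimal CPU-side sketch in Python. It is not the shader code from this PR. It models subgroupShuffleXor as "lane i reads from lane i ^ mask" and relies on the fact that every fp16 value is exactly representable in fp32, so widening before the shuffle and narrowing afterwards is lossless. The helper names (`to_f16`, `to_f32`, `shuffle_xor`, `shuffle_xor_via_f32`) and the subgroup size of 8 are assumptions made for illustration only.

```python
import struct


def to_f16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_f32(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def shuffle_xor(lanes, mask):
    """Model of a subgroup XOR shuffle: lane i reads the value held by lane i ^ mask.

    Assumes len(lanes) is a power of two and mask < len(lanes).
    """
    return [lanes[i ^ mask] for i in range(len(lanes))]


def shuffle_xor_via_f32(lanes_f16, mask):
    """Hypothetical model of the workaround: widen each 4-component fp16
    vector to fp32, shuffle the fp32 vectors, then narrow back to fp16."""
    widened = [[to_f32(c) for c in vec] for vec in lanes_f16]
    shuffled = shuffle_xor(widened, mask)
    return [[to_f16(c) for c in vec] for vec in shuffled]


SUBGROUP_SIZE = 8  # illustrative only; real subgroup sizes depend on the GPU

# One 4-component fp16 vector per lane, with values that are not all integers.
lanes = [[to_f16(lane + 0.1 * c) for c in range(4)] for lane in range(SUBGROUP_SIZE)]

direct = shuffle_xor(lanes, 4)              # what a working fp16 shuffle would return
workaround = shuffle_xor_via_f32(lanes, 4)  # shuffle performed on fp32 values instead

print(workaround == direct)
```

In this model the two paths agree bit for bit, which is consistent with the workaround being a drop-in replacement. The cost on real hardware is that the shuffled payload is twice as wide.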

Fixes #19881 and also the issue reported here: #19625 (comment)

@masamaru-san @DeryabinIvan Please try this fix and let me know if it works for you.

DeryabinIvan · 2026-02-26T10:49:48Z

Everything works as expected on my side

ggml/src/ggml-vulkan/vulkan-shaders/flash_attn.comp

…-org#19921)

vulkan: fix fp16 Flash Attention on Windows AMD RDNA2 and below

e432922

0cc4m requested a review from jeffbolznv February 26, 2026 09:43

github-actions bot added the Vulkan (Issues specific to the Vulkan backend) and ggml (changes relating to the ggml tensor library for machine learning) labels Feb 26, 2026

JohnLoveJoy mentioned this pull request Feb 26, 2026

[Issue]: GCN5 support on llama.cpp - The AMD proprietary driver has an issue with FP16. ROCm/TheRock#3604

Open

jeffbolznv approved these changes Feb 26, 2026


0cc4m merged commit 723c710 into master Feb 26, 2026
78 checks passed

0cc4m deleted the 0cc4m/vulkan-fix-fa-amd-windows branch February 26, 2026 18:11

bartowski1182 pushed a commit to bartowski1182/llama.cpp that referenced this pull request Mar 2, 2026

vulkan: fix fp16 Flash Attention on Windows AMD RDNA2 and below (ggml…

23d636d

…-org#19921)

ArberSephirotheca pushed a commit to ArberSephirotheca/llama.cpp that referenced this pull request Mar 3, 2026

vulkan: fix fp16 Flash Attention on Windows AMD RDNA2 and below (ggml…

66a0a75

…-org#19921)
