
ggml-webgpu: only use subgroup-matrix path when head dims are divisib…#23020

Merged
reeselevine merged 1 commit into
ggml-org:masterfrom
ArberSephirotheca:webgpu-fattn-sgmat-dim-guard
May 13, 2026

Conversation

@ArberSephirotheca
Contributor

Overview

Previously, WebGPU FlashAttention selected the subgroup matrix path whenever subgroup matrix support was available. This fails when the head dimensions are not divisible by the corresponding subgroup matrix dimensions. For example, Jetson Thor’s smallest supported subgroup matrix shape is 16x16x16, which is incompatible with head dimensions such as 40 and 72.
This change adds a shape guard that must pass before the subgroup matrix path is selected. Specifically, it requires:
head_dim_qk % sg_mat_k == 0 and head_dim_v % sg_mat_n == 0.
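
The guard described above can be sketched roughly as follows. This is a minimal illustration and not the code from this PR: the struct `SubgroupMatrixShape`, the function `can_use_subgroup_matrix_path`, and the `supported` flag are hypothetical names chosen for the example. Only the two divisibility conditions come from the description.

```cpp
#include <cstdint>

// Hypothetical container for a device's subgroup matrix tile shape (M x N x K).
// The real ggml-webgpu backend may store this differently.
struct SubgroupMatrixShape {
    uint32_t m;
    uint32_t n;
    uint32_t k;
};

// Returns true only if subgroup matrices are available AND both head
// dimensions are whole multiples of the matching tile dimensions:
//   head_dim_qk % sg_mat_k == 0  and  head_dim_v % sg_mat_n == 0
// The zero checks are an extra assumption to avoid a modulo by zero.
constexpr bool can_use_subgroup_matrix_path(bool                supported,
                                            uint32_t            head_dim_qk,
                                            uint32_t            head_dim_v,
                                            SubgroupMatrixShape shape) {
    return supported && shape.k != 0 && shape.n != 0 &&
           head_dim_qk % shape.k == 0 &&
           head_dim_v  % shape.n == 0;
}
```

With a 16x16x16 tile, head dimensions of 64 or 128 pass the check, while 40 (40 % 16 = 8) and 72 (72 % 16 = 8) do not, so those cases would fall back to a path that does not use subgroup matrices.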

Requirements

I used an AI agent to help me understand why the tests are failing on my Jetson Thor machine.

@ArberSephirotheca requested a review from a team as a code owner May 13, 2026 18:22
@reeselevine
Contributor

Nice. I wonder if this is the same failure I'm seeing right now while trying to enable the NVIDIA CI: https://github.com/ggml-org/llama.cpp/actions/runs/25816362883/job/75845993489?pr=22976#step:4:13081

@ArberSephirotheca
Contributor Author

Yeah, very likely. These tests also failed on my Jetson Thor because they have hsv = 40, which is not divisible by 16.

@github-actions (bot) added the ggml and WebGPU labels May 13, 2026
@reeselevine reeselevine merged commit 4c1c3ac into ggml-org:master May 13, 2026
47 checks passed
xxmustafacooTR pushed a commit to xxPlayground/llama-cpp-turboquant that referenced this pull request May 14, 2026
dandm1 pushed a commit to dandm1/llama.cpp that referenced this pull request May 16, 2026
rsenthilkumar6 pushed a commit to rsenthilkumar6/llama.cpp that referenced this pull request May 19, 2026
baramofme pushed a commit to baramofme/llama-cpp-turboquant that referenced this pull request May 23, 2026
winstonma pushed a commit to winstonma/llama.cpp that referenced this pull request May 27, 2026
fewtarius pushed a commit to fewtarius/llama.cpp that referenced this pull request May 30, 2026