ggml-webgpu: add vectorized flash attention #20709
Merged: reeselevine merged 35 commits into ggml-org:master from ArberSephirotheca:backup/subgroup_size_agnostic_rebased_ggml_master_20260317 on Apr 2, 2026.
Commits (35):
976ebc6 (ArberSephirotheca) naive vectorized version
94abbac (ArberSephirotheca) add vectorized flash attention
1033085 (ArberSephirotheca) update vec version
c307a4b (ArberSephirotheca) remove unused path and shader
f8e317c (ArberSephirotheca) remove unused helper functions
52709dd (ArberSephirotheca) add comments
df6ef45 (ArberSephirotheca) remove pad path
838306f (ArberSephirotheca) ggml-webgpu: fix flash-attn vec nwg=1 path and tighten vec specializa…
d61ec8f (ArberSephirotheca) change back to vec4
042a1a5 (ArberSephirotheca) enable multi split
b61e63d (ArberSephirotheca) enable vec path when:
3602743 (ArberSephirotheca) update flast_attn_vec_split.wgsl to reduce redundant workgroup barrie…
356d6ff (ArberSephirotheca) enable vec path for q4 and q8
1ae041d (ArberSephirotheca) flash-attn vec nwg=1 fast path (skip tmp/reduce staging)
33a547e (ArberSephirotheca) use packed f16 K loads in flash-attn vec split
638c49b (ArberSephirotheca) use packed f16 K loads in flash-attn vec split on host side
0abac39 (ArberSephirotheca) tune flash-attn vec f16 VEC_NE by head dim
83a42b3 (ArberSephirotheca) cleanup
2595b1a (ArberSephirotheca) cleanup
25096b9 (ArberSephirotheca) keep host side clean
3d6bfe0 (ArberSephirotheca) cleanup host side
68fa272 (ArberSephirotheca) change back to original host wait/submit behavior
5065dc6 (ArberSephirotheca) formatting
03d0625 (ArberSephirotheca) reverted param-buffer pool r ecfactor
5dd2a4b (ArberSephirotheca) add helper functions
1e0d856 (ArberSephirotheca) ggml-webgpu: move flash-attn vec pipeline caching back into shader lib
88bf352 (ArberSephirotheca) ggml-webgpu: remove duplicate functions
5c2fefe (ArberSephirotheca) ggml-webgpu: reserve flash-attn vec scratch in dst buffer allocation
59aa7d8 (ArberSephirotheca) ggml-webgpu: revert unrelated change
cac8500 (ArberSephirotheca) ggml-webgpu: revert deleted comment
ff11e38 (ArberSephirotheca) Merge branch 'master' into backup/subgroup_size_agnostic_rebased_ggml…
4e0100b (ArberSephirotheca) disable uniformity check
56fee6e (ArberSephirotheca) remove unnecessary change
29c09c2 (reeselevine) Update ggml/src/ggml-webgpu/wgsl-shaders/flash_attn_vec_split.wgsl
f40c9e7 (reeselevine) Update ggml/src/ggml-webgpu/ggml-webgpu.cpp