CUDA: Factor out and re-use block_reduce function
#18785
Merged: am17an merged 17 commits into ggml-org:master from ORippler:osimons/factor_out_two_stage_warp_reductions on Jan 15, 2026
Commits
418fb72 CUDA: Refactor and expose two_stage_warp_reduce_* function (ORippler)
1ebe58d Use `two_stage_warp_reduce` also in softmax kernel, move smem out of it (ORippler)
c63c148 Update ggml/src/ggml-cuda/common.cuh (ORippler)
41bab53 Use two_stage_warp_reduce in group_norm_f32 (ORippler)
a40b15f Use two_stage_warp_reduce in rms_norm_f32 (ORippler)
dbd449e Fix smem calculation which expects bytes (ORippler)
7f43e64 Make `two_stage_warp_reduce` accept all values warp_reduce accepts (ORippler)
67a9c13 Use two_stage_warp_reduce in l2_norm_f32 (ORippler)
612874f Use type traits for block reduction for better legibility (ORippler)
82a3458 Make norm tests cover all cuda paths (ORippler)
8bc326a Mark columns % WARP_SIZE !=0 as supported for RMS_NORM_BACK (ORippler)
bd6ffff Use `enum class` for `block_reduce_method` (ORippler)
c1a048b Rename variables as suggested in code review by @am17an (ORippler)
767eba9 Rename two_stage_warp_reduce -> block_reduce (ORippler)
0ed3721 Fix trailing whitespace in common.cuh (ORippler)
b275e42 Make condition of static_assert type-dependent (ORippler)
38e3040 Inline definitions (ORippler)
Maybe this should be called `block_reduce_1d`; users might expect `block_reduce` to reduce over any dimension of the block. Or perhaps we can add an assert that `blockDim.y == 1`.
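
To make the suggestion concrete, here is a minimal sketch of a two-stage block reduction that guards against multi-dimensional launches. This is not the code from this PR: `block_reduce_sum_1d`, `sum_kernel`, `block_sum_host`, and the local `warp_reduce_sum` are hypothetical names written for illustration, and the sketch assumes `blockDim.x` is a multiple of 32 and at most 1024.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

constexpr int WARP_SIZE = 32;

// Stage helper: butterfly shuffle so every lane ends up with the warp-wide sum.
__device__ __forceinline__ float warp_reduce_sum(float x) {
    for (int offset = WARP_SIZE / 2; offset > 0; offset >>= 1) {
        x += __shfl_xor_sync(0xffffffff, x, offset, WARP_SIZE);
    }
    return x;
}

// Hypothetical 1D block reduction; every thread returns the block-wide sum.
// Assumptions: blockDim.x is a multiple of WARP_SIZE and <= 1024, and
// `shared` points to at least WARP_SIZE floats of shared memory.
__device__ float block_reduce_sum_1d(float x, float * shared) {
    // The guard proposed in the review: only 1D blocks are supported.
    assert(blockDim.y == 1 && blockDim.z == 1);

    x = warp_reduce_sum(x);                      // stage 1: reduce within each warp

    const int lane    = threadIdx.x % WARP_SIZE;
    const int warp    = threadIdx.x / WARP_SIZE;
    const int n_warps = blockDim.x / WARP_SIZE;

    if (lane == 0) {
        shared[warp] = x;                        // one partial sum per warp
    }
    __syncthreads();

    x = lane < n_warps ? shared[lane] : 0.0f;    // unused slots contribute zero
    return warp_reduce_sum(x);                   // stage 2: reduce the partials
}

// Single-block kernel that sums n floats and writes the result to *out.
__global__ void sum_kernel(const float * in, float * out, int n) {
    __shared__ float shared[WARP_SIZE];
    float partial = 0.0f;
    for (int i = threadIdx.x; i < n; i += blockDim.x) {
        partial += in[i];
    }
    const float total = block_reduce_sum_1d(partial, shared);
    if (threadIdx.x == 0) {
        *out = total;
    }
}

// Host wrapper (error checking omitted for brevity).
float block_sum_host(const std::vector<float> & values) {
    float * d_in  = nullptr;
    float * d_out = nullptr;
    cudaMalloc(&d_in, values.size() * sizeof(float));
    cudaMalloc(&d_out, sizeof(float));
    cudaMemcpy(d_in, values.data(), values.size() * sizeof(float), cudaMemcpyHostToDevice);

    sum_kernel<<<1, 256>>>(d_in, d_out, (int) values.size());  // 1D block, 8 warps

    float result = 0.0f;
    cudaMemcpy(&result, d_out, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return result;
}
```

With the assert in place, launching the same kernel with a block such as `dim3(32, 8)` would trap in builds where device asserts are enabled, rather than silently reducing only along `x`. Renaming to `block_reduce_1d` documents the same restriction at the call site without a runtime check; the two options can also be combined.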