Conversation
Force-pushed 5742bb0 to 632f364
This was referenced Jan 5, 2026
drisspg added a commit to drisspg/flash-attention that referenced this pull request on Jan 5, 2026
Adds block-sparse support to the SM90 backward pass:
- Block-sparse iteration with process_tile and get_block_sparse_iteration_info_bwd
- m_block_safe clamping for loads when subtile_factor > 1
- Zero-fill for KV tiles with no contributing Q blocks
- dQaccum_store with a blocksparse_tensors parameter
- bwd_subtile_factor=2 for SM90 block sparsity (matches the BlockMask granularity of 128)
- Tile size m_block_size=64 when using block sparsity

stack-info: PR: Dao-AILab#2136, branch: drisspg/stack/7
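The iteration scheme the commit describes can be sketched as follows. This is an illustrative Python model, not the kernel code: the function and parameter names (`bwd_kv_tile_loop`, `q_block_counts`, `q_block_index`, `accumulate`, `zero_fill`) are assumptions standing in for the CUDA-side `process_tile` / `get_block_sparse_iteration_info_bwd` machinery.

```python
# Hypothetical sketch of the block-sparse backward iteration: for each
# KV tile, a per-tile count says how many Q blocks contribute to it.
# Tiles with a zero count are zero-filled (their dK/dV never receive a
# contribution) instead of being accumulated.
def bwd_kv_tile_loop(q_block_counts, q_block_index, accumulate, zero_fill):
    for n_block, count in enumerate(q_block_counts):
        if count == 0:
            zero_fill(n_block)  # KV tile has no contributing Q blocks
            continue
        # Visit only the Q blocks listed for this KV tile, in order
        for m_block in q_block_index[n_block][:count]:
            accumulate(n_block, m_block)  # analogous to process_tile

# Tiny demo: KV tile 1 has no Q blocks and gets zero-filled
visited = []
bwd_kv_tile_loop(
    q_block_counts=[2, 0, 1],
    q_block_index=[[0, 3], [], [1]],
    accumulate=lambda n, m: visited.append(("acc", n, m)),
    zero_fill=lambda n: visited.append(("zero", n)),
)
print(visited)
```

The zero-fill branch matters because a KV tile skipped by every Q block would otherwise leave its dK/dV accumulator uninitialized.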
drisspg commented Jan 5, 2026
flash_attn/cute/interface.py (outdated)

    use_block_sparsity = block_sparse_tensors is not None
    ...
    # For SM90 with block sparsity, use tile_m=64 with subtile_factor=2 to match
drisspg (Collaborator, Author):

This was mostly about finding the GCD between an m_block_size that would fit, the base block_m of 128 from the forward pass, and the block-sparse size used for subtiling.
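The tile-size relationship being described can be written out explicitly. This is a sketch under stated assumptions (the 128 forward block_m and the 64 backward m_block_size come from the commit message; the variable names are illustrative), not the selection logic in interface.py:

```python
from math import gcd

# Sketch of the tile-size constraint: the BlockMask / forward pass works
# at a granularity of block_m = 128, while the SM90 backward kernel uses
# m_block_size = 64. Subtiling works when the backward tile evenly
# divides the mask granularity, i.e. when it equals their GCD.
mask_granularity = 128   # base block_m from fwd / BlockMask granularity
bwd_m_block_size = 64    # m_block_size that fits the SM90 backward kernel

assert gcd(mask_granularity, bwd_m_block_size) == bwd_m_block_size

# Each 128-wide BlockMask entry then covers this many backward tiles
subtile_factor = mask_granularity // bwd_m_block_size
print(subtile_factor)  # 2
```

With these numbers each mask entry splits cleanly into two backward sub-tiles, which is where bwd_subtile_factor=2 comes from.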
drisspg commented Jan 5, 2026
flash_attn/cute/interface.py
Outdated
| expected_count_shape, expected_index_shape = get_block_sparse_expected_shapes_bwd( | ||
| batch_size, num_head, seqlen_q, seqlen_k, | ||
| m_block_size, n_block_size, subtile_factor, | ||
| m_block_size, n_block_size, bwd_subtile_factor, |
drisspg (Collaborator, Author):

nb: bwd_subtile_factor is always 2 for now, but we could make it larger in a follow-up and allow smaller tile sizes.
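To make the role of the subtile factor concrete, here is a hypothetical stand-in for the shape computation. The actual semantics of `get_block_sparse_expected_shapes_bwd` are not shown in this thread, so the function below (`block_sparse_shapes_bwd`) and its shape layout are assumptions used only to illustrate how a subtile factor scales the block counts:

```python
from math import ceil

# Hypothetical sketch (NOT the flash-attention helper): the backward
# pass iterates over KV tiles, and each entry of the mask spans
# m_block * subtile_factor rows of Q, so a larger subtile_factor
# shrinks the number of distinct Q-block entries per KV tile.
def block_sparse_shapes_bwd(batch, heads, seqlen_q, seqlen_k,
                            m_block, n_block, subtile_factor):
    num_m_blocks = ceil(seqlen_q / (m_block * subtile_factor))
    num_n_blocks = ceil(seqlen_k / n_block)
    count_shape = (batch, heads, num_n_blocks)               # Q blocks per KV tile
    index_shape = (batch, heads, num_n_blocks, num_m_blocks) # which Q blocks
    return count_shape, index_shape

# m_block=64 with subtile_factor=2 indexes at the 128 mask granularity
print(block_sparse_shapes_bwd(2, 8, 1024, 1024, 64, 128, 2))
```

Under this sketch, raising the subtile factor (as the comment suggests for a follow-up) would let m_block shrink further while keeping the metadata indexed at the coarser BlockMask granularity.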
drisspg commented Jan 5, 2026
Force-pushed 632f364 → 246cde5 → 2cc732e → 4e91b34
drisspg commented Jan 5, 2026
Force-pushed 1958cdc to da4b3e8
drisspg added a commit to drisspg/flash-attention that referenced this pull request on Jan 7, 2026

stack-info: PR: Dao-AILab#2136, branch: drisspg/stack/7
v0i0 approved these changes Jan 9, 2026
drisspg added a commit to drisspg/flash-attention that referenced this pull request on Jan 9, 2026

stack-info: PR: Dao-AILab#2136, branch: drisspg/stack/7
Force-pushed da4b3e8 → 7be008a → d592b8d
drisspg added a commit to drisspg/flash-attention that referenced this pull request on Jan 9, 2026

stack-info: PR: Dao-AILab#2136, branch: drisspg/stack/7
drisspg added a commit to drisspg/flash-attention that referenced this pull request on Jan 10, 2026

stack-info: PR: Dao-AILab#2136, branch: drisspg/stack/7
Stacked PRs:
- block-sparse backward SM90