Conversation
Force-pushed 5742bb0 to 632f364
This was referenced Jan 5, 2026
drisspg added a commit to drisspg/flash-attention that referenced this pull request on Jan 5, 2026
Adds block-sparse support to the SM90 backward pass:
- Block-sparse iteration with process_tile and get_block_sparse_iteration_info_bwd
- m_block_safe clamping for loads when subtile_factor > 1
- Zero-fill for KV tiles with no contributing Q blocks
- dQaccum_store with a blocksparse_tensors parameter
- bwd_subtile_factor=2 for SM90 block sparsity (matches the BlockMask granularity of 128)
- Tile size m_block_size=64 when using block sparsity

stack-info: PR: Dao-AILab#2136, branch: drisspg/stack/7
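The iteration scheme the commit describes can be sketched as follows. This is an illustrative Python model, not the kernel code: the function and parameter names (`bwd_kv_tile_loop`, `q_block_counts`, `q_block_index`, `accumulate`, `zero_fill`) are assumptions standing in for the CUDA-side `process_tile` / `get_block_sparse_iteration_info_bwd` machinery.

```python
# Hypothetical sketch of the block-sparse backward iteration: for each
# KV tile, a per-tile count says how many Q blocks contribute to it.
# Tiles with a zero count are zero-filled (their dK/dV never receive a
# contribution) instead of being accumulated.
def bwd_kv_tile_loop(q_block_counts, q_block_index, accumulate, zero_fill):
    for n_block, count in enumerate(q_block_counts):
        if count == 0:
            zero_fill(n_block)  # KV tile has no contributing Q blocks
            continue
        # Visit only the Q blocks listed for this KV tile, in order
        for m_block in q_block_index[n_block][:count]:
            accumulate(n_block, m_block)  # analogous to process_tile

# Tiny demo: KV tile 1 has no Q blocks and gets zero-filled
visited = []
bwd_kv_tile_loop(
    q_block_counts=[2, 0, 1],
    q_block_index=[[0, 3], [], [1]],
    accumulate=lambda n, m: visited.append(("acc", n, m)),
    zero_fill=lambda n: visited.append(("zero", n)),
)
print(visited)
```

The zero-fill branch matters because a KV tile skipped by every Q block would otherwise leave its dK/dV accumulator uninitialized.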
drisspg commented Jan 5, 2026
flash_attn/cute/interface.py (outdated)

    use_block_sparsity = block_sparse_tensors is not None
    ...
    # For SM90 with block sparsity, use tile_m=64 with subtile_factor=2 to match
drisspg (Collaborator, Author):

This was mostly about finding the GCD between an m_block_size that would fit, the base block_m of 128 from the forward pass, and the block-sparse size used for subtiling.
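The tile-size relationship being described can be written out explicitly. This is a sketch under stated assumptions (the 128 forward block_m and the 64 backward m_block_size come from the commit message; the variable names are illustrative), not the selection logic in interface.py:

```python
from math import gcd

# Sketch of the tile-size constraint: the BlockMask / forward pass works
# at a granularity of block_m = 128, while the SM90 backward kernel uses
# m_block_size = 64. Subtiling works when the backward tile evenly
# divides the mask granularity, i.e. when it equals their GCD.
mask_granularity = 128   # base block_m from fwd / BlockMask granularity
bwd_m_block_size = 64    # m_block_size that fits the SM90 backward kernel

assert gcd(mask_granularity, bwd_m_block_size) == bwd_m_block_size

# Each 128-wide BlockMask entry then covers this many backward tiles
subtile_factor = mask_granularity // bwd_m_block_size
print(subtile_factor)  # 2
```

With these numbers each mask entry splits cleanly into two backward sub-tiles, which is where bwd_subtile_factor=2 comes from.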
drisspg commented Jan 5, 2026
flash_attn/cute/interface.py
Outdated
| expected_count_shape, expected_index_shape = get_block_sparse_expected_shapes_bwd( | ||
| batch_size, num_head, seqlen_q, seqlen_k, | ||
| m_block_size, n_block_size, subtile_factor, | ||
| m_block_size, n_block_size, bwd_subtile_factor, |
drisspg (Collaborator, Author):

nb: bwd_subtile_factor is always 2 for now, but we could make it larger in a follow-up and allow smaller tile sizes.
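To make the role of the subtile factor concrete, here is a hypothetical stand-in for the shape computation. The actual semantics of `get_block_sparse_expected_shapes_bwd` are not shown in this thread, so the function below (`block_sparse_shapes_bwd`) and its shape layout are assumptions used only to illustrate how a subtile factor scales the block counts:

```python
from math import ceil

# Hypothetical sketch (NOT the flash-attention helper): the backward
# pass iterates over KV tiles, and each entry of the mask spans
# m_block * subtile_factor rows of Q, so a larger subtile_factor
# shrinks the number of distinct Q-block entries per KV tile.
def block_sparse_shapes_bwd(batch, heads, seqlen_q, seqlen_k,
                            m_block, n_block, subtile_factor):
    num_m_blocks = ceil(seqlen_q / (m_block * subtile_factor))
    num_n_blocks = ceil(seqlen_k / n_block)
    count_shape = (batch, heads, num_n_blocks)               # Q blocks per KV tile
    index_shape = (batch, heads, num_n_blocks, num_m_blocks) # which Q blocks
    return count_shape, index_shape

# m_block=64 with subtile_factor=2 indexes at the 128 mask granularity
print(block_sparse_shapes_bwd(2, 8, 1024, 1024, 64, 128, 2))
```

Under this sketch, raising the subtile factor (as the comment suggests for a follow-up) would let m_block shrink further while keeping the metadata indexed at the coarser BlockMask granularity.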
drisspg commented Jan 5, 2026
Force-pushed 632f364 → 246cde5 → 2cc732e → 4e91b34
drisspg commented Jan 5, 2026
Force-pushed 1958cdc to da4b3e8
drisspg added a commit to drisspg/flash-attention that referenced this pull request on Jan 7, 2026

stack-info: PR: Dao-AILab#2136, branch: drisspg/stack/7
v0i0 approved these changes Jan 9, 2026
drisspg added a commit to drisspg/flash-attention that referenced this pull request on Jan 9, 2026

stack-info: PR: Dao-AILab#2136, branch: drisspg/stack/7
Force-pushed da4b3e8 → 7be008a → d592b8d
drisspg added a commit to drisspg/flash-attention that referenced this pull request on Jan 9, 2026

stack-info: PR: Dao-AILab#2136, branch: drisspg/stack/7
drisspg added a commit to drisspg/flash-attention that referenced this pull request on Jan 10, 2026

stack-info: PR: Dao-AILab#2136, branch: drisspg/stack/7
Stacked PRs:
- block-sparse backward SM90