Various bug fixes / enable subtile > 2 #2411

Merged
drisspg merged 1 commit into main from drisspg/stack/31
Mar 30, 2026

@drisspg drisspg commented Mar 30, 2026

Various bug fixes / enable subtile > 2

  1. For small headdim we were expanding block_m up to 192, even with the default block-q of 128. Updated to take the block-sparse q size into account for both fwd and bwd.
  2. Added get_sparse_q_block_size. Ideally this just returns None if there is no sparsity, or the size set from the input. I am about 99% sure everything has been swapped over to explicitly setting the block-sparse data. We have already switched over in PT, but I want to give it one more release, so we still infer the size just in case.
  3. When enabling a viable (192, 128) configuration, I noticed a kernel hang in the bwd. This was the first time we were hitting dQ_single_wg, so I fixed it to use the correct wg count.
  4. Also enabled subtiling > 2, which is the case for a block Q of 192 and an m_tile of 64 in the bwd.
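
For readers following items 1 and 4, here is a rough sketch of the arithmetic involved. It is not the code from this PR: `choose_tile_m` and `subtile_count` are hypothetical helpers, and the rule that a tile must evenly divide the sparse Q block is an assumption made for illustration.

```python
from typing import Optional, Sequence


def choose_tile_m(candidates: Sequence[int], sparse_q_block: Optional[int]) -> int:
    """Hypothetical helper: pick the largest candidate M tile.

    Assumption (not confirmed by the PR): when block sparsity is in use,
    a tile is only viable if it evenly divides the sparse Q block size.
    """
    if sparse_q_block is None:
        # No sparsity information, so any candidate tile is allowed.
        return max(candidates)
    viable = [t for t in candidates if sparse_q_block % t == 0]
    if not viable:
        raise ValueError(f"no candidate tile divides sparse Q block {sparse_q_block}")
    return max(viable)


def subtile_count(sparse_q_block: int, tile_m: int) -> int:
    """Hypothetical helper: number of M tiles covering one sparse Q block."""
    if sparse_q_block % tile_m != 0:
        raise ValueError("tile_m must evenly divide the sparse Q block")
    return sparse_q_block // tile_m


if __name__ == "__main__":
    candidates = [64, 128, 192]
    print(choose_tile_m(candidates, None))  # no sparsity
    print(choose_tile_m(candidates, 128))   # default block-q of 128
    print(subtile_count(192, 64))           # block Q 192, m_tile 64
```

Under these assumptions, a block Q of 192 with an m_tile of 64 gives three subtiles per sparse block, which is the "subtile > 2" case, and a block-q of 128 rules out a 192-wide tile.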

drisspg added a commit that referenced this pull request Mar 30, 2026
stack-info: PR: #2411, branch: drisspg/stack/31
Comment thread flash_attn/cute/interface.py
@drisspg drisspg marked this pull request as draft March 30, 2026 19:12
@drisspg drisspg marked this pull request as ready for review March 30, 2026 19:12
@drisspg drisspg merged commit 66bedce into main Mar 30, 2026
1 check passed
@drisspg drisspg deleted the drisspg/stack/31 branch March 31, 2026 02:30

2 participants