[Cute][Flex] Fix kernel hang w/ multiple empty tiles #2258
Merged
drisspg added a commit that referenced this pull request on Feb 13, 2026
stack-info: PR: #2258, branch: drisspg/stack/17
894de40 to c69a02f
Member
Do we have some convention here about which mbarriers are triggered regardless, and which mbarriers are triggered only if the tiles are non-empty? If so we should write it down explicitly as comments for next time.
tridao approved these changes on Feb 15, 2026
Collaborator, Author
This is a good call out, I'll document the semantics here before landing.
stack-info: PR: #2258, branch: drisspg/stack/17
c69a02f to 3f814e8
5t4r1i9ht pushed a commit to 5t4r1i9ht/flash-attention that referenced this pull request on Mar 15, 2026
stack-info: PR: Dao-AILab#2258, branch: drisspg/stack/17
LoserCheems pushed a commit to HKUSTDial/flash-sparse-attention that referenced this pull request on Mar 24, 2026
stack-info: PR: Dao-AILab/flash-attention#2258, branch: drisspg/stack/17
ussoewwin pushed a commit to ussoewwin/flash-attention that referenced this pull request on May 13, 2026
stack-info: PR: Dao-AILab#2258, branch: drisspg/stack/17

[Cute][Flex] Fix kernel hang w/ multiple empty tiles
I think these weren't surfaced because I didn't have any test that goes from one empty row to another empty row.
This fixes basically 3 bugs with our phasing related to empty rows:
- handle_block_sparse_empty_tile_correction_sm100: for empty tiles we still need to flush the correction. We were arriving on the rescaled-P barrier, but the MMA warp doesn't wait in the empty case.
- softmax_block_sparse_sm100: for empty rows we were again arriving on the rescaling barriers with no matching waiter. For mbar_softmax_corr_empty we were arriving, though we should only be waiting on it; it gets arrived on in the correction loop.
- mbar_softmax_corr_empty_offset: adjusted if block-sparse, since you can jump from row chunks that have work to ones that don't, and we want to ensure we aren't flushing wrong values from previous iterations.
- o_corr_consumer_phase: for empty tiles the MMA warp never arrives on this barrier and the correction warp never consumes (there is nothing from the MMA warp to wait for). If an odd number of empty tiles was followed by a tile with work, the phase would have been flip-flopped and we hang.
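The phase flip-flop described for o_corr_consumer_phase can be illustrated with a toy model. This is a minimal sketch in plain Python, not the CuTe DSL or the kernel code: `ToyMbarrier` and `first_hang` are hypothetical names, the barrier is assumed to take a single arrival per phase, and producer and consumer are run sequentially, so a failed wait stands in for a hang. It assumes the bug was the consumer toggling its tracked phase on empty tiles where the producer never arrives.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class ToyMbarrier:
    """Hypothetical single-arrival barrier: every arrive() completes a phase."""

    current_phase: int = 0  # parity of the phase that has not completed yet

    def arrive(self) -> None:
        # One arrival completes the in-flight phase and starts the next one.
        self.current_phase ^= 1

    def try_wait(self, parity: int) -> bool:
        # Succeeds only if the phase with this parity has already completed,
        # i.e. the barrier has moved on to the other parity.
        return self.current_phase != parity


def first_hang(tiles: Sequence[bool], flip_on_empty: bool) -> Optional[int]:
    """Return the index of the first tile whose wait can never succeed.

    tiles[i] is True when tile i has work. The producer (standing in for the
    MMA warp) arrives only on tiles with work. The consumer (standing in for
    the correction warp) waits only on tiles with work.
    """
    bar = ToyMbarrier()
    consumer_phase = 0
    for i, has_work in enumerate(tiles):
        if has_work:
            bar.arrive()
            if not bar.try_wait(consumer_phase):
                return i  # no further arrival is coming: modeled as a hang
            consumer_phase ^= 1
        elif flip_on_empty:
            # Assumed buggy behaviour: toggle the phase with no matching arrive.
            consumer_phase ^= 1
    return None


if __name__ == "__main__":
    print(first_hang([False, True], flip_on_empty=True))
    print(first_hang([False, False, True], flip_on_empty=True))
    print(first_hang([False, True], flip_on_empty=False))
```

In this model an even number of empty tiles happens to leave the tracked parity consistent, which would be in line with the hang only showing up for an odd count. On real hardware a mismatched parity can also let a wait pass early on a stale phase, which the sequential toy does not capture.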