Skip to content

[Cute][Flex] Fix kernel hang w/ multiple empty tiles #2258

Merged: drisspg merged 1 commit into main from drisspg/stack/17 on Feb 16, 2026


drisspg (Collaborator) commented Feb 13, 2026

[Cute][Flex] Fix kernel hang w/ multiple empty tiles

I think these weren't surfaced because I didn't have any test that goes from an empty row to another empty row.

This fixes what are basically three bugs in our mbarrier phasing related to empty rows, plus one related phase fix:

  1. handle_block_sparse_empty_tile_correction_sm100: for empty tiles we still need to flush the correction. We were arriving on the rescaled-P barrier, but the MMA warp doesn't wait on it in the empty case.
  2. softmax_block_sparse_sm100: for empty rows we were again arriving on the rescaling barriers with no matching waiter. We were also arriving on mbar_softmax_corr_empty, even though softmax should only wait on it; it gets arrived on in the correction loop.
  3. Always wait on mbar_softmax_corr_empty_offset when block sparsity is enabled, since you can jump from row chunks that have work to ones that don't, and we want to ensure we aren't flushing stale values from previous iterations.
  4. Also fixed a spurious flip of o_corr_consumer_phase. For empty tiles the MMA warp never arrives on this barrier and correction never consumes it (there is nothing from the MMA warp to wait for). So after an odd number of empty tiles, the phase would have been flipped out of sync by the time a tile with work came along, and we would hang.
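The phase mismatch in item 4 can be illustrated with a toy, single-threaded model. This is a sketch under stated assumptions, not the kernel code: ParityBarrier and run_tiles are hypothetical names, each barrier phase is assumed to complete after exactly one arrival, and try_wait loosely mirrors the parity check of PTX mbarrier.try_wait.parity (it succeeds once the phase with the requested parity has completed).

```python
from dataclasses import dataclass


@dataclass
class ParityBarrier:
    """Hypothetical toy mbarrier with one expected arrival per phase."""
    completed: int = 0  # number of phases that have completed so far

    def arrive(self) -> None:
        # One expected arrival per phase, so each arrive completes a phase.
        self.completed += 1

    def try_wait(self, parity: int) -> bool:
        # Succeeds once the phase with this parity has completed, i.e. the
        # in-progress phase (index == self.completed) has the other parity.
        return (self.completed & 1) != parity


def run_tiles(has_work: list[bool], flip_on_empty: bool) -> bool:
    """Return True if every consumer wait succeeds, False if it would hang.

    The barrier is a hypothetical stand-in for the MMA -> correction
    handoff that o_corr_consumer_phase tracks.
    """
    barrier = ParityBarrier()
    consumer_phase = 0
    for work in has_work:
        if work:
            barrier.arrive()  # producer (MMA) only arrives on tiles with work
            if not barrier.try_wait(consumer_phase):
                return False  # on hardware this wait would spin forever
            consumer_phase ^= 1  # consumed one phase, so flip
        elif flip_on_empty:
            consumer_phase ^= 1  # buggy: flips although nothing was consumed
    return True
```

For example, run_tiles([False, True], flip_on_empty=True) fails after one empty tile, while run_tiles([False, False, True], flip_on_empty=True) happens to pass, matching the odd/even behavior described above. Flipping the consumer phase only when a wait actually happened keeps producer and consumer in lockstep regardless of how many empty tiles appear.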

drisspg added a commit that referenced this pull request Feb 13, 2026
stack-info: PR: #2258, branch: drisspg/stack/17
drisspg (Collaborator, Author) commented Feb 14, 2026

With this change, my fuzz tester for deterministic bwd passes on all tests.

tridao (Member) commented Feb 15, 2026

Do we have some convention here about which mbarriers are triggered regardless, and which mbarriers are triggered only if the tiles are non-empty? If so, we should write it down explicitly as comments for next time.

drisspg (Collaborator, Author) commented Feb 15, 2026

This is a good callout; I'll document the semantics here before landing.

drisspg marked this pull request as draft February 16, 2026 04:08
drisspg marked this pull request as ready for review February 16, 2026 04:08
drisspg merged commit fec3a6a into main Feb 16, 2026
5t4r1i9ht pushed a commit to 5t4r1i9ht/flash-attention that referenced this pull request Mar 15, 2026
LoserCheems pushed a commit to HKUSTDial/flash-sparse-attention that referenced this pull request Mar 24, 2026
drisspg deleted the drisspg/stack/17 branch March 31, 2026 02:30
ussoewwin pushed a commit to ussoewwin/flash-attention that referenced this pull request May 13, 2026