
score-mod backward SM90 #2137

Merged
drisspg merged 1 commit into main from
drisspg/stack/8
Jan 10, 2026
drisspg (Collaborator) commented Jan 5, 2026

drisspg added a commit to drisspg/flash-attention that referenced this pull request Jan 5, 2026
Adds score_mod and mask_mod support to SM90 backward pass:
- score_mod, score_mod_bwd, mask_mod, has_aux_tensors parameters
- apply_score_mod() and apply_score_mod_bwd() methods
- fastdiv_mods and aux_tensors plumbing through kernel/mma
- mask_mod application in mask_fn for both block-sparse and dense paths
- Score modification in mma_one_m_block before softmax

stack-info: PR: Dao-AILab#2137, branch: drisspg/stack/8
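
To make the commit description concrete, here is a minimal pure-Python sketch of how a score_mod, its matching score_mod_bwd, and a mask_mod interact around softmax for a single query row. It is not the SM90 kernel code. The callback signatures follow the FlexAttention-style convention (score, b, h, q_idx, kv_idx), which is an assumption here, and the names softcap_score_mod, softcap_score_mod_bwd, causal_mask_mod, softmax_row, raw_score_grads, and SOFTCAP are hypothetical and chosen only for illustration.

```python
import math

SOFTCAP = 5.0  # hypothetical soft-capping constant, for illustration only


def softcap_score_mod(score, b, h, q_idx, kv_idx):
    # Example score_mod: tanh soft-capping of the raw attention score.
    return SOFTCAP * math.tanh(score / SOFTCAP)


def softcap_score_mod_bwd(grad_out, score, b, h, q_idx, kv_idx):
    # Matching backward: chain rule through the soft-cap.
    # d/ds [c * tanh(s / c)] = 1 - tanh(s / c) ** 2
    t = math.tanh(score / SOFTCAP)
    return grad_out * (1.0 - t * t)


def causal_mask_mod(b, h, q_idx, kv_idx):
    # Example mask_mod: a query may attend only to keys at or before it.
    return kv_idx <= q_idx


def softmax_row(raw_scores, q_idx, b=0, h=0):
    # Forward for one query row: modify scores, mask, then softmax.
    # Assumes at least one key is visible (true for the causal mask).
    modified = []
    for kv_idx, s in enumerate(raw_scores):
        if causal_mask_mod(b, h, q_idx, kv_idx):
            modified.append(softcap_score_mod(s, b, h, q_idx, kv_idx))
        else:
            modified.append(-math.inf)
    row_max = max(modified)
    exps = [math.exp(s - row_max) for s in modified]
    total = sum(exps)
    return [e / total for e in exps]


def raw_score_grads(raw_scores, grad_probs, q_idx, b=0, h=0):
    # Backward for one query row: softmax backward gives the gradient
    # w.r.t. the modified scores, then score_mod_bwd maps it back to
    # the raw scores. Masked positions receive zero gradient.
    probs = softmax_row(raw_scores, q_idx, b, h)
    dot = sum(p * g for p, g in zip(probs, grad_probs))
    grads = []
    for kv_idx, (p, g, s) in enumerate(zip(probs, grad_probs, raw_scores)):
        d_modified = p * (g - dot)
        if causal_mask_mod(b, h, q_idx, kv_idx):
            grads.append(softcap_score_mod_bwd(d_modified, s, b, h, q_idx, kv_idx))
        else:
            grads.append(0.0)
    return grads
```

Calling raw_score_grads([0.5, -1.0, 2.0, 0.3], [1.0, 0.0, -0.5, 2.0], q_idx=2) returns one gradient per key position, with the masked position (kv_idx 3) receiving zero. A finite-difference check against softmax_row is a simple way to validate a hand-written score_mod_bwd of this kind.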
@drisspg drisspg marked this pull request as draft January 5, 2026 05:42
@drisspg drisspg changed the base branch from drisspg/stack/7 to main January 5, 2026 05:42
@drisspg drisspg changed the base branch from main to drisspg/stack/7 January 5, 2026 05:42
@drisspg drisspg marked this pull request as ready for review January 5, 2026 05:43
stack-info: PR: #2137, branch: drisspg/stack/8
@drisspg drisspg marked this pull request as draft January 5, 2026 17:04
@drisspg drisspg changed the base branch from drisspg/stack/7 to main January 5, 2026 17:04
drisspg added a commit that referenced this pull request Jan 5, 2026
stack-info: PR: #2137, branch: drisspg/stack/8
@drisspg drisspg changed the base branch from main to drisspg/stack/7 January 5, 2026 17:04
@drisspg drisspg marked this pull request as ready for review January 5, 2026 17:04
@drisspg drisspg marked this pull request as draft January 5, 2026 19:08
@drisspg drisspg changed the base branch from drisspg/stack/7 to main January 6, 2026 17:59
drisspg added a commit that referenced this pull request Jan 6, 2026
stack-info: PR: #2137, branch: drisspg/stack/8
@drisspg drisspg changed the base branch from main to drisspg/stack/7 January 6, 2026 17:59
@drisspg drisspg marked this pull request as ready for review January 6, 2026 17:59
@drisspg drisspg marked this pull request as draft January 6, 2026 18:02
@drisspg drisspg changed the base branch from drisspg/stack/7 to main January 6, 2026 18:02
@drisspg drisspg changed the base branch from main to drisspg/stack/7 January 6, 2026 18:02
@drisspg drisspg marked this pull request as ready for review January 6, 2026 18:02
@drisspg drisspg marked this pull request as draft January 6, 2026 18:49
@drisspg drisspg changed the base branch from drisspg/stack/7 to main January 6, 2026 18:50
@drisspg drisspg changed the base branch from main to drisspg/stack/7 January 6, 2026 18:50
@drisspg drisspg marked this pull request as ready for review January 6, 2026 18:50
@drisspg drisspg marked this pull request as draft January 6, 2026 18:57
@drisspg drisspg changed the base branch from drisspg/stack/7 to main January 6, 2026 18:57
@drisspg drisspg changed the base branch from main to drisspg/stack/7 January 6, 2026 18:57
@drisspg drisspg marked this pull request as ready for review January 6, 2026 18:57
@drisspg drisspg marked this pull request as draft January 6, 2026 19:02
@drisspg drisspg changed the base branch from drisspg/stack/7 to main January 6, 2026 19:02
@drisspg drisspg changed the base branch from main to drisspg/stack/7 January 6, 2026 19:02
@drisspg drisspg marked this pull request as ready for review January 6, 2026 19:02
assert cu_seqlens_q is None and cu_seqlens_k is None, (
    "varlen + score_mod not supported in bwd yet"
)
assert compute_capability in [10, 11], "score_mod in bwd only supported on SM100/SM110 for now"
keep but make it [9,10,11]?
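
The suggestion above could look roughly like the sketch below. It wraps the two assertions in a standalone helper so it can be run in isolation. The helper name check_score_mod_bwd_supported is hypothetical and not from the PR, compute_capability is assumed to be the major version as an integer, and the error message is adjusted to mention SM90, which the review comment does not spell out.

```python
def check_score_mod_bwd_supported(compute_capability, cu_seqlens_q=None, cu_seqlens_k=None):
    # Hypothetical helper: variable-length inputs are still rejected.
    assert cu_seqlens_q is None and cu_seqlens_k is None, (
        "varlen + score_mod not supported in bwd yet"
    )
    # Widened list as suggested in review: 9 (SM90) added next to 10 and 11.
    assert compute_capability in [9, 10, 11], (
        "score_mod in bwd only supported on SM90/SM100/SM110 for now"
    )
```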

v0i0 (Collaborator) commented Jan 9, 2026

Why is Triton faster for small sizes?
