[CuTe,Bwd,Sm100] don't disable 2cta due to cuda 12 in bwd #2543

Merged
jayhshah merged 1 commit into Dao-AILab:main from reubenconducts:rstern/fix-disable-2cta-bwd
May 6, 2026

Conversation

@reubenconducts
Contributor

No description provided.

@jayhshah
Collaborator

jayhshah commented May 6, 2026

This PR follows up on #2461 so that 2cta is disabled only for fwd on CUDA 12. For bwd, 2cta is still faster in general and is required for hdim 192, so it should not be disabled there because of the CUDA version.
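
For readers who want the gating rule spelled out, here is a minimal sketch of the behavior described above. It is not the code from this PR: the function name `should_use_2cta`, its parameters, and the reading of "cuda 12" as a major version equal to 12 are all assumptions made for illustration.

```python
def should_use_2cta(is_forward: bool, cuda_major: int, requested: bool = True) -> bool:
    """Hypothetical helper: decide whether to keep 2cta enabled.

    Assumption: "cuda 12" means a CUDA toolkit major version of exactly 12.
    """
    if not requested:
        # The caller did not ask for 2cta at all.
        return False
    if is_forward and cuda_major == 12:
        # Per the comment above: 2cta is disabled only for fwd on CUDA 12.
        return False
    # bwd keeps 2cta regardless of the CUDA version
    # (faster in general, and necessary for hdim 192).
    return True
```

Under these assumptions, `should_use_2cta(is_forward=False, cuda_major=12)` keeps 2cta on, while `should_use_2cta(is_forward=True, cuda_major=12)` turns it off.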

@jayhshah jayhshah self-requested a review May 6, 2026 23:18
@jayhshah jayhshah merged commit c263382 into Dao-AILab:main May 6, 2026
reubenconducts added a commit to reubenconducts/flash-attention that referenced this pull request Jun 2, 2026

2 participants