
[Sm75] Add README link for initial Turing support#2379

Merged

tridao merged 1 commit into Dao-AILab:main from ssiu:turing on Mar 25, 2026


Conversation

@ssiu (Contributor) commented Mar 21, 2026

FlashAttention Turing

This PR adds a link to the flash-attention-turing repo that provides support for Turing (SM75) architecture in FlashAttention, following #1533.

Features

Supports:

  • forward (fwd) and backward (bwd) passes
  • head dimensions 64 and 128
  • causal mask
  • grouped-query attention (GQA)
  • variable sequence lengths (varlen)

Does not support:

  • dropout
  • local mask
  • kv cache

Performance

Benchmarks were measured on NVIDIA T4 GPUs.

Forward pass

Up to 2.19x and 1.95x faster than PyTorch's Attention for non-causal and causal workloads, respectively.

On Turing GPUs, PyTorch's Attention uses Memory-Efficient Attention from xformers, since FlashAttention does not provide optimized kernels for SM75.

For long sequences, the forward kernel reaches up to 66% of peak compute throughput.
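For reference, utilization figures of this kind are typically computed as achieved FLOPS divided by the GPU's peak FLOPS. A rough sketch of that arithmetic (the ~65 TFLOPS FP16 tensor-core peak commonly cited for the T4, and the batch/heads/seqlen/runtime below, are illustrative assumptions, not the PR's actual measurements):

```python
def attn_fwd_flops(batch, heads, seqlen, head_dim, causal=False):
    # Two matmuls (Q @ K^T and P @ V), 2 FLOPs per multiply-add each.
    flops = 4 * batch * heads * seqlen * seqlen * head_dim
    return flops // 2 if causal else flops  # causal skips ~half the tiles

T4_PEAK_FP16_FLOPS = 65e12  # commonly cited T4 tensor-core FP16 peak
flops = attn_fwd_flops(batch=8, heads=16, seqlen=4096, head_dim=64)
runtime_s = 13e-3           # hypothetical measured kernel time
print(f"utilization: {flops / runtime_s / T4_PEAK_FP16_FLOPS:.1%}")
# prints "utilization: 65.1%" with these assumed numbers
```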

Forward pass benchmark for head dimension 128

Forward pass benchmark for head dimension 64

Backward pass

The backward pass is split into two kernels: one for dQ and one for dK and dV.
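The math behind that split can be written in a few lines: dK and dV both accumulate over query rows, while dQ accumulates over keys per query row, so the two groups parallelize differently and are naturally computed by separate kernels. A simplified single-head NumPy sketch of the attention backward pass (illustrating the math only, not the PR's CUDA kernels), checked against finite differences:

```python
import numpy as np

def attention_fwd(Q, K, V, scale):
    # Single-head attention: S = scale * Q K^T, P = softmax(S), O = P V.
    S = scale * (Q @ K.T)
    P = np.exp(S - S.max(axis=-1, keepdims=True))
    P /= P.sum(axis=-1, keepdims=True)
    return P @ V, P

def attention_bwd(Q, K, V, P, dO, scale):
    dV = P.T @ dO                 # accumulates over query rows
    dP = dO @ V.T
    dS = P * (dP - (dP * P).sum(axis=-1, keepdims=True))  # softmax backward
    dQ = scale * (dS @ K)         # accumulates over keys, per query row
    dK = scale * (dS.T @ Q)       # accumulates over query rows
    return dQ, dK, dV

# Check dQ against central finite differences of L = sum(O * dO).
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((4, 8)) for _ in range(3))
dO = rng.standard_normal((4, 8))
scale = Q.shape[-1] ** -0.5
_, P = attention_fwd(Q, K, V, scale)
dQ, dK, dV = attention_bwd(Q, K, V, P, dO, scale)

eps = 1e-6
num_dQ = np.zeros_like(Q)
for i in range(Q.shape[0]):
    for j in range(Q.shape[1]):
        Qp, Qm = Q.copy(), Q.copy()
        Qp[i, j] += eps
        Qm[i, j] -= eps
        Lp = (attention_fwd(Qp, K, V, scale)[0] * dO).sum()
        Lm = (attention_fwd(Qm, K, V, scale)[0] * dO).sum()
        num_dQ[i, j] = (Lp - Lm) / (2 * eps)
assert np.abs(num_dQ - dQ).max() < 1e-6
```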

Up to 1.35x and 1.51x faster than PyTorch's Attention for non-causal and causal workloads, respectively.

For long sequences, the backward kernels reach up to 49% of peak compute throughput for the dK/dV kernel and 45% for the dQ kernel.

Backward pass benchmark for head dimension 128

Backward pass benchmark for head dimension 64

Correctness and numerical differences

In our tests in test_flash_attn.py, we consistently observe maximum and mean absolute differences of ~1e-3 and ~1e-5, respectively, relative to PyTorch's attention kernels.
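A check of this kind can be reproduced outside the test suite. The sketch below is an illustrative NumPy stand-in (not the actual test_flash_attn.py test): it runs the same attention computation in float16, as a low-precision kernel would, and in a float64 reference, then reports the max and mean absolute differences:

```python
import numpy as np

def attention(Q, K, V, dtype):
    # Run the whole computation (matmuls and softmax) in the given precision.
    Q, K, V = (x.astype(dtype) for x in (Q, K, V))
    S = (Q @ K.T) * dtype(Q.shape[-1] ** -0.5)
    P = np.exp(S - S.max(axis=-1, keepdims=True))
    P = (P / P.sum(axis=-1, keepdims=True)).astype(dtype)
    return P @ V

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((128, 64)) for _ in range(3))
ref = attention(Q, K, V, np.float64)                      # reference
out = attention(Q, K, V, np.float16).astype(np.float64)   # "kernel" precision
diff = np.abs(out - ref)
print(f"max abs diff:  {diff.max():.1e}")
print(f"mean abs diff: {diff.mean():.1e}")
```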

Thanks!

@tridao (Member) commented Mar 21, 2026

Thanks!
Given that we're switching to Cute-DSL for current and future development, we won't support Turing in this repo (cute-dsl requires Ampere+). You're welcome to have the Turing impl in another repo and we could link to it.

@ssiu (Contributor, Author) commented Mar 21, 2026

Thanks for the comment!

Sure, I’ll keep the Turing implementation in a separate repo and share it here once it’s cleaned up.

@ssiu changed the title from "[Sm75] Initial Turing support" to "[Sm75] Add README link for initial Turing support" on Mar 23, 2026
@ssiu (Contributor, Author) commented Mar 23, 2026

Hi @tridao ,

I just cleaned up the Turing repo. I think it's good to go now. Thanks again!

@tridao tridao merged commit b8eda39 into Dao-AILab:main Mar 25, 2026