[Sm75] Add README link for initial Turing support #2379
Merged
tridao merged 1 commit into Dao-AILab:main on Mar 25, 2026
Conversation
Member
Thanks!
Contributor (Author)
Thanks for the comment! Sure, I’ll keep the Turing implementation in a separate repo and share it here once it’s cleaned up.
Contributor (Author)
Hi @tridao, I just cleaned up the Turing repo. I think it's good to go now. Thanks again!
tridao approved these changes on Mar 25, 2026
FlashAttention Turing
This PR adds a link to the flash-attention-turing repo, which provides support for the Turing (SM75) architecture in FlashAttention, following #1533.
Features
Supports:
Does not support:
Performance
Benchmarks are reported on Nvidia T4 GPUs.
Forward pass
Up to 2.19x and 1.95x faster than PyTorch's Attention for non-causal and causal workloads, respectively.
On Turing GPUs, PyTorch's Attention uses Memory-Efficient Attention from xformers, since FlashAttention does not provide optimized kernels for SM75.
For long sequences, the forward kernel reaches up to 66% of peak compute throughput.
Backward pass
The backward pass is split into two kernels: one for `dQ` and one for `dK` and `dV`.
Up to 1.35x and 1.51x faster than PyTorch's Attention for non-causal and causal workloads, respectively.
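The dQ versus dK/dV split mirrors the structure of the backward math: once `dS` is known, `dQ` comes from one matmul and `dK`/`dV` from two independent ones. A hedged NumPy sketch for a single head (my own illustrative functions, not the actual kernels):

```python
import numpy as np

def attn_fwd(q, k, v):
    """Single-head forward pass; also returns the probability matrix P,
    which the backward sketch below consumes."""
    scale = q.shape[1] ** -0.5
    s = (q @ k.T) * scale
    p = np.exp(s - s.max(axis=1, keepdims=True))
    p = p / p.sum(axis=1, keepdims=True)
    return p @ v, p

def attn_bwd(q, k, v, p, do):
    """Attention backward for O = P @ V with P = softmax(Q K^T * scale).
    dQ depends only on (dS, K), while dK and dV depend on transposed
    products; the two groups can therefore run as separate kernels."""
    scale = q.shape[1] ** -0.5
    dv = p.T @ do                                        # (seqlen_k, d)
    dp = do @ v.T                                        # (seqlen_q, seqlen_k)
    ds = p * (dp - (dp * p).sum(axis=1, keepdims=True))  # softmax backward
    dq = (ds @ k) * scale
    dk = (ds.T @ q) * scale
    return dq, dk, dv
```

A finite-difference check on small random inputs is an easy way to validate each of the three gradients independently.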
For long sequences, the backward kernels reach up to 49% of peak compute throughput for `dK` and `dV`, and 45% for `dQ`.
Correctness and numerical differences
From our tests in `test_flash_attn.py`, we consistently observe maximum and mean absolute differences of ~1e-3 and ~1e-5, respectively, relative to PyTorch's attention kernels.
Thanks!