diff --git a/README.md b/README.md
index b352dada6c8..ec7076e3f3b 100755
--- a/README.md
+++ b/README.md
@@ -122,9 +122,7 @@
 We recommend the container from Nvidia, which has all the required tools
 to install FlashAttention.
 
 FlashAttention-2 with CUDA currently supports:
-1. Ampere, Ada, or Hopper GPUs (e.g., A100, RTX 3090, RTX 4090, H100). Support for Turing
-   GPUs (T4, RTX 2080) is coming soon, please use FlashAttention 1.x for Turing
-   GPUs for now.
+1. Ampere, Ada, or Hopper GPUs (e.g., A100, RTX 3090, RTX 4090, H100). For Turing GPUs (T4, RTX 2080), see the separate [flash-attention-turing](https://github.com/ssiu/flash-attention-turing) repo, which supports a core subset of FlashAttention features on Turing.
 2. Datatype fp16 and bf16 (bf16 requires Ampere, Ada, or Hopper GPUs).
 3. All head dimensions up to 256. ~~Head dim > 192 backward requires A100/A800 or H100/H800~~. Head dim 256 backward now works on consumer GPUs (if there's no dropout) as of flash-attn 2.5.5.
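
For reference, a minimal sketch (not part of the patch) of how the constraints documented in the updated README list translate into a flash-attn call. `flash_attn_func` is the library's real public entry point; the specific shapes, the compute-capability check, and the choice of fp16 are illustrative assumptions, not part of the patch:

```python
import torch
from flash_attn import flash_attn_func

# FlashAttention-2 needs an Ampere, Ada, or Hopper GPU (compute capability >= 8.0).
# Turing GPUs report (7, 5); per the README, use the flash-attention-turing repo there.
major, minor = torch.cuda.get_device_capability()
assert (major, minor) >= (8, 0), "Turing GPU: see flash-attention-turing instead"

# Only fp16 and bf16 are supported (bf16 also needs Ampere or newer);
# head dimension must be <= 256. Shapes here are (batch, seqlen, nheads, headdim).
q = torch.randn(2, 1024, 8, 128, device="cuda", dtype=torch.float16)
k = torch.randn_like(q)
v = torch.randn_like(q)

# dropout_p=0.0 matters for the head-dim-256 backward pass on consumer GPUs,
# which the README notes works only without dropout as of flash-attn 2.5.5.
out = flash_attn_func(q, k, v, dropout_p=0.0, causal=True)
```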