This repository provides simplified implementations of the attention mechanism for self-study purposes. It includes:
| Implementation | File Location |
|---|---|
| Naive Attention | `csrc/naive_attention.cu` |
| CUDA core Flash Attention 1 (FA1) | `csrc/flash_attn_1.cu` |
| CUDA core Flash Attention 2 (FA2) | `csrc/flash_attn_2.cu` |
| Tensor core FA2 using CUTLASS CuTe | `csrc/flash_attn/` |
A Python binding for the Tensor core FA2 (CUTLASS CuTe) implementation is available separately: CuTe FA2 python binding.
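All of the kernels compute the same operation, softmax(QK^T / sqrt(d)) · V. As a point of reference for what the naive baseline does, below is a minimal, illustrative CUDA sketch. This is not the code in `csrc/naive_attention.cu`; the row-major `[batch, heads, seq_len, dim]` layout, the names, and the launch configuration are assumptions.

```cuda
// Minimal sketch of a naive attention kernel (illustrative only, not the
// kernel in csrc/naive_attention.cu). One thread handles one query row and
// materializes its full score row in local memory.
// Assumed layout: Q, K, V, O are float32 tensors of shape
// [batch, heads, seq_len, dim], contiguous in row-major order.
// Assumed launch: grid = dim3(ceil(seq_len / 128.0), batch * heads),
//                 block = dim3(128).
#include <cuda_runtime.h>
#include <math.h>

#define MAX_SEQ_LEN 1024  // assumed upper bound for the per-thread score buffer

__global__ void naive_attention_kernel(const float* Q, const float* K,
                                       const float* V, float* O,
                                       int seq_len, int dim) {
    int q_idx = blockIdx.x * blockDim.x + threadIdx.x;  // query position
    int bh    = blockIdx.y;                             // fused batch*head index
    if (q_idx >= seq_len || seq_len > MAX_SEQ_LEN) return;

    const float* q = Q + ((long)bh * seq_len + q_idx) * dim;
    const float* k = K + (long)bh * seq_len * dim;
    const float* v = V + (long)bh * seq_len * dim;
    float*       o = O + ((long)bh * seq_len + q_idx) * dim;

    float scores[MAX_SEQ_LEN];
    float scale = rsqrtf((float)dim);

    // 1) scores[j] = (q . k_j) / sqrt(dim), tracking the row max for a
    //    numerically stable softmax
    float row_max = -INFINITY;
    for (int j = 0; j < seq_len; ++j) {
        float s = 0.f;
        for (int d = 0; d < dim; ++d) s += q[d] * k[j * dim + d];
        s *= scale;
        scores[j] = s;
        row_max = fmaxf(row_max, s);
    }

    // 2) unnormalized softmax over the row
    float denom = 0.f;
    for (int j = 0; j < seq_len; ++j) {
        scores[j] = expf(scores[j] - row_max);
        denom += scores[j];
    }

    // 3) output row = softmax(scores) @ V
    for (int d = 0; d < dim; ++d) {
        float acc = 0.f;
        for (int j = 0; j < seq_len; ++j) acc += scores[j] * v[j * dim + d];
        o[d] = acc / denom;
    }
}
```

The flash attention variants avoid materializing the full score row: they process K/V in tiles kept in shared memory or registers and maintain the softmax statistics (running max and sum) online, which is where their speedup over the naive kernel comes from.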
Use the provided `CMakeLists.txt` to build and run the CUDA programs:

```bash
git submodule init
git submodule update
cmake -B build
cmake --build build
./build/csrc/profile-attention
```
Example output:

```
Device name: NVIDIA A100 80GB PCIe MIG 1g.10gb
Global memory size: 9 GB
Peak memory bandwidth: 241.92 GB/s
batch size = 8
sequence length = 256
number of heads = 16
dimension = 64
-------------------------------------------------
implementation: cuda core flash attention 01
all-close check passed
naive attention latency = 53.7733 ms
latency = 28.1272 ms
speedup = 191.179%
-------------------------------------------------
implementation: cuda core flash attention 02
all-close check passed
naive attention latency = 53.7846 ms
latency = 25.7505 ms
speedup = 208.868%
-------------------------------------------------
implementation: cute flash attention 02
all-close check passed
naive attention latency = 53.7733 ms
latency = 0.16384 ms
speedup = 32820.6%
```
Customize the problem size by modifying the config in `csrc/profile.cu`. Note that this implementation supports head dimensions of 32 and 64 only.
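The config in `csrc/profile.cu` is not reproduced here; as a rough, hypothetical sketch, the tunable problem-size parameters correspond to the values printed in the output above (the identifier names below are assumptions, not the actual ones in the file):

```cuda
// Hypothetical sketch only -- the real config lives in csrc/profile.cu and
// may use different names. These are the knobs reflected in the output above.
constexpr int kBatchSize = 8;    // batch size
constexpr int kSeqLen    = 256;  // sequence length
constexpr int kNumHeads  = 16;   // number of attention heads
constexpr int kHeadDim   = 64;   // head dimension; only 32 and 64 are supported
```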
The implementation in this repository follows the ideas and approaches presented in the following works:
- [Dao-AILab/flash-attention](https://github.com/Dao-AILab/flash-attention)
- [66RING/tiny-flash-attention](https://github.com/66RING/tiny-flash-attention)
- [tspeterkim/flash-attention-minimal](https://github.com/tspeterkim/flash-attention-minimal)
- [leloykun/flash-hyperbolic-attention-minimal](https://github.com/leloykun/flash-hyperbolic-attention-minimal)