Commit 1ce8bc6
Update on "introduce triton sdpa kernel to cuda backend"
**Introduce Triton SDPA Kernel to CUDA Backend**
This diff adds a Triton-optimized scaled dot-product attention (SDPA) kernel to the CUDA backend. During graph transformation, the new kernel replaces the default Edge SDPA operator, which accelerates model inference and removes the need for SDPA decomposition.
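For readers unfamiliar with the operator, the computation that any SDPA kernel performs is `softmax(Q K^T * scale) V`. The following is a minimal pure-Python reference for a single attention head, intended only to illustrate the math. It is not the Triton kernel from this diff; the function name `sdpa_reference` and the default scale of `1/sqrt(d)` are assumptions chosen for the example.

```python
import math


def sdpa_reference(q, k, v, scale=None):
    """Naive scaled dot-product attention for one head (illustrative only).

    q: list of L_q rows, each of length d
    k: list of L_k rows, each of length d
    v: list of L_k rows, each of length d_v
    """
    d = len(q[0])
    # Assumed default: the conventional 1/sqrt(d) scaling.
    scale = 1.0 / math.sqrt(d) if scale is None else scale
    out = []
    for q_row in q:
        # Scaled similarity of this query against every key.
        scores = [scale * sum(a * b for a, b in zip(q_row, k_row)) for k_row in k]
        # Numerically stable softmax: subtract the max before exponentiating.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Attention output is the weighted sum of the value rows.
        out.append(
            [sum(w * v_row[j] for w, v_row in zip(weights, v)) for j in range(len(v[0]))]
        )
    return out


# A zero query attends equally to both keys, so the output is the mean of the values.
result = sdpa_reference([[0.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]], [[1.0, 2.0], [3.0, 4.0]])
```

A fused GPU kernel computes the same result without materializing the full score matrix, which is where the speedup over a decomposed implementation typically comes from.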
**Changes**
* Added a new file `sdpa.py` to the `fbcode/executorch/backends/cuda/triton/kernels` directory, containing the Triton-optimized SDPA kernel implementation.
* Added a new file `__init__.py` to `fbcode/executorch/backends/cuda/triton/replacement_pass`, which replaces the specified Edge ops with their target Triton kernels.
* Added tests for exporting SDPA with the Triton kernel. Without the Triton kernel, the SDPA model cannot be exported.
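The idea behind an op replacement pass can be sketched with a toy graph. This is only an illustration of the pattern (walk the nodes, swap any target found in a mapping), not the code in `replacement_pass`. The `Node` class, the `REPLACEMENTS` table, and the operator names such as `edge.scaled_dot_product_attention` and `triton.sdpa` are all hypothetical; a real pass would operate on the exported program's graph.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Toy graph node (hypothetical stand-in for a real graph node)."""
    name: str
    target: str          # operator identifier, e.g. an Edge op name
    args: tuple = ()


# Hypothetical mapping from Edge op names to replacement kernel names.
REPLACEMENTS = {
    "edge.scaled_dot_product_attention": "triton.sdpa",
}


def replace_ops(nodes, replacements=REPLACEMENTS):
    """Return (new node list, number of replaced nodes); input is not mutated."""
    replaced = 0
    result = []
    for node in nodes:
        new_target = replacements.get(node.target)
        if new_target is None:
            # No replacement registered for this op: keep the node unchanged.
            result.append(node)
        else:
            # Same name and arguments, different operator target.
            result.append(Node(node.name, new_target, node.args))
            replaced += 1
    return result, replaced


graph = [
    Node("q_proj", "edge.linear", ("x",)),
    Node("attn", "edge.scaled_dot_product_attention", ("q", "k", "v")),
    Node("out_proj", "edge.linear", ("attn",)),
]
new_graph, count = replace_ops(graph)
```

Keeping the mapping in a table means that supporting another kernel only requires a new entry, and returning the replacement count gives tests a simple way to check that the pass actually fired.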
**Purpose**
The purpose of this diff is to provide a high-performance SDPA kernel for the CUDA backend, which can be used to accelerate attention-based models on NVIDIA GPUs.
Differential Revision: [D87259044](https://our.internmc.facebook.com/intern/diff/D87259044/)
[ghstack-poisoned]
File tree
40 files changed, +765 -517 lines changed
- .ci/scripts
- .github/workflows
- backends
- arm
- _passes
- operators
- scripts
- test
- models
- ops
- passes
- tester
- tosa
- test/suite
- flows
- examples
- arm/executor_runner
- qualcomm/qaihub_scripts/llama/runner
- extension/llm/runner
- website