Commit 20157ff
Update base for Update on "introduce triton sdpa kernel to cuda backend"
**Introduce Triton SDPA Kernel to CUDA Backend**
This diff introduces a Triton-optimized implementation of the scaled dot-product attention (SDPA) kernel for the CUDA backend. The new kernel is designed to replace the default Edge SDPA operator during graph transformation, accelerating model inference and avoiding the SDPA decomposition.
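For orientation, the sketch below shows the general shape of a Triton SDPA forward kernel. It is not the kernel added in this diff: it assumes contiguous `[batch, heads, seq_len, head_dim]` inputs, no attention mask or dropout, and a sequence length and head dimension that each fit in a single block (and are at least 16, as `tl.dot` requires).

```python
# Minimal, illustrative Triton SDPA forward pass (not the kernel from this diff).
import math

import torch
import triton
import triton.language as tl


@triton.jit
def _sdpa_fwd(q_ptr, k_ptr, v_ptr, o_ptr,
              seq_len, head_dim, scale,
              BLOCK_SEQ: tl.constexpr, BLOCK_DIM: tl.constexpr):
    # One program handles one (batch * head) slice of shape [seq_len, head_dim].
    pid = tl.program_id(0)
    offs_m = tl.arange(0, BLOCK_SEQ)   # query (rows) and key (cols) positions
    offs_d = tl.arange(0, BLOCK_DIM)   # head-dimension positions
    base = pid * seq_len * head_dim
    ptrs = base + offs_m[:, None] * head_dim + offs_d[None, :]
    mask = (offs_m[:, None] < seq_len) & (offs_d[None, :] < head_dim)

    q = tl.load(q_ptr + ptrs, mask=mask, other=0.0)
    k = tl.load(k_ptr + ptrs, mask=mask, other=0.0)
    v = tl.load(v_ptr + ptrs, mask=mask, other=0.0)

    # scores = Q @ K^T * scale, softmax over the key axis, then @ V.
    scores = tl.dot(q, tl.trans(k)) * scale
    valid = (offs_m[:, None] < seq_len) & (offs_m[None, :] < seq_len)
    scores = tl.where(valid, scores, float("-inf"))
    probs = tl.exp(scores - tl.max(scores, axis=1)[:, None])
    probs = probs / tl.sum(probs, axis=1)[:, None]
    out = tl.dot(probs.to(v.dtype), v)
    tl.store(o_ptr + ptrs, out, mask=mask)


def triton_sdpa(q, k, v):
    """q, k, v: contiguous [batch, heads, seq_len, head_dim] CUDA tensors."""
    batch, heads, seq_len, head_dim = q.shape
    out = torch.empty_like(q)
    scale = 1.0 / math.sqrt(head_dim)
    grid = (batch * heads,)
    _sdpa_fwd[grid](q, k, v, out, seq_len, head_dim, scale,
                    BLOCK_SEQ=triton.next_power_of_2(seq_len),
                    BLOCK_DIM=triton.next_power_of_2(head_dim))
    return out
```

A production kernel would instead tile over key/value blocks with an online softmax (flash-attention style) so long sequences do not need to fit in a single block.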
**Changes**
* Added a new file `sdpa.py` under `fbcode/executorch/backends/cuda/triton/kernels`, which contains the Triton-optimized SDPA kernel implementation.
* Added a new file `__init__.py` under `fbcode/executorch/backends/cuda/triton/replacement_pass`, which implements the pass that replaces the given existing edge ops with the target Triton kernels (see the sketch after this list).
* Added tests for SDPA export with the Triton kernel; without the Triton kernel, the SDPA model cannot be exported.
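As a rough illustration of the replacement-pass idea: such a pass walks the graph and re-points SDPA call sites at the Triton-backed implementation. The sketch below is hypothetical; the op set, argument handling, and the `triton_sdpa` wrapper (from the kernel sketch above) are assumptions, not the diff's actual API, and the real pass operates on the Edge dialect ops rather than the ATen op shown here.

```python
# Hypothetical sketch of a graph replacement pass; names are illustrative only.
import torch
from torch.fx import GraphModule


def replace_sdpa_with_triton(gm: GraphModule) -> GraphModule:
    """Re-point SDPA call sites at the Triton-backed implementation."""
    sdpa_targets = {torch.ops.aten.scaled_dot_product_attention.default}
    for node in gm.graph.nodes:
        if node.op == "call_function" and node.target in sdpa_targets:
            # Keep only (q, k, v); this sketch ignores masks, dropout, and is_causal.
            node.args = node.args[:3]
            node.kwargs = {}
            node.target = triton_sdpa  # wrapper defined in the kernel sketch above
    gm.graph.lint()
    gm.recompile()
    return gm
```

In the backend's graph transformation stage, a pass of this kind would run before delegation, so the SDPA node never has to be decomposed into its constituent ops.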
**Purpose**
The purpose of this diff is to provide a high-performance SDPA kernel for the CUDA backend, which can be used to accelerate attention-based models on NVIDIA GPUs.
Differential Revision: [D87259044](https://our.internmc.facebook.com/intern/diff/D87259044/)
[ghstack-poisoned]