
Commit 7a573e2

committed
Update on "introduce triton sdpa kernel to cuda backend"
**Introduce Triton SDPA Kernel to CUDA Backend**

This diff introduces a Triton-optimized implementation of the scaled dot-product attention (SDPA) kernel to the CUDA backend. The new kernel replaces the default Edge SDPA operator during graph transformation, accelerating model inference and avoiding SDPA decomposition.

**Changes**

* Added a new file `sdpa.py` to the `fbcode/executorch/backends/cuda/triton/kernels` directory, containing the Triton-optimized SDPA kernel implementation.
* Added a new file `__init__.py` to `fbcode/executorch/backends/cuda/triton/replacement_pass`, which replaces the given existing edge ops with the target Triton kernels.
* Added tests for SDPA export with the Triton kernel. Without the Triton kernel, the SDPA model cannot be exported.

**Purpose**

The purpose of this diff is to provide a high-performance SDPA kernel for the CUDA backend, which can be used to accelerate attention-based models on NVIDIA GPUs.

Differential Revision: [D87259044](https://our.internmc.facebook.com/intern/diff/D87259044/)

[ghstack-poisoned]
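For reference, the computation the SDPA kernel implements is softmax(QKᵀ/√d)V. The following is a minimal plain-Python sketch of that math, not the Triton kernel itself; `sdpa_reference` is a hypothetical name introduced here for illustration:

```python
import math


def sdpa_reference(q, k, v):
    """Reference scaled dot-product attention: softmax(Q K^T / sqrt(d)) V.

    q, k, v are lists of row vectors (seq_len x head_dim). A plain-Python
    sketch of the math the Triton kernel computes, not the kernel itself.
    """
    d = len(q[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for qi in q:
        # Scaled dot products of this query against every key row.
        scores = [scale * sum(a * b for a, b in zip(qi, kj)) for kj in k]
        # Numerically stable softmax over the scores.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        weights = [e / z for e in exps]
        # Output row is the attention-weighted sum of the value rows.
        out.append([
            sum(w * vj[c] for w, vj in zip(weights, v))
            for c in range(len(v[0]))
        ])
    return out
```

With a single key/value pair the softmax weight is exactly 1, so the output equals the value row; with identical value rows, the output equals that row regardless of the attention weights.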
2 parents 8bdb6b5 + 81875ab commit 7a573e2

File tree

1 file changed: 0 additions, 2 deletions


backends/cuda/triton/replacement_pass.py

Lines changed: 0 additions & 2 deletions
@@ -92,8 +92,6 @@ def _should_replace_node(self, node: Node) -> bool:
         if node.op != "call_function":
             return False
 
-        print("Checking:", node.target)
-
         return node.target in EDGE_TO_TRITON_KERNELS
 
     def _replace_node_with_triton(self, graph_module: GraphModule, node: Node) -> None:
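The predicate in the diff gates the replacement pass: only `call_function` nodes whose target appears in the edge-to-Triton mapping are swapped. A minimal self-contained sketch of that logic follows; the real pass operates on `torch.fx` graph nodes, and `Node`, `EDGE_TO_TRITON_KERNELS`, and `triton_sdpa` here are stand-ins for illustration:

```python
from dataclasses import dataclass


def triton_sdpa(*args):
    """Placeholder for the Triton SDPA kernel entry point (hypothetical)."""


# Maps edge operators (by name, in this sketch) to their Triton replacements.
EDGE_TO_TRITON_KERNELS = {
    "aten.scaled_dot_product_attention.default": triton_sdpa,
}


@dataclass
class Node:
    """Stand-in for a torch.fx Node: an opcode plus a call target."""
    op: str
    target: str


def should_replace_node(node: Node) -> bool:
    # Only call_function nodes invoke an operator that a kernel can replace;
    # placeholders, get_attr, and output nodes are left untouched.
    if node.op != "call_function":
        return False
    return node.target in EDGE_TO_TRITON_KERNELS
```

The membership check keeps the pass conservative: any op not explicitly listed in the mapping passes through the graph unchanged.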
