
Commit 28c7b6c

[PHI] Major fix for gather/scatter related CUDA kernels (#74922)
* [PHI] gather_scatter kernel largely refactored for correctness
* [PHI] gather/scatter kernel rigorously tested
* [PHI] Fixed CUDA error 700 (illegal memory access) in 4 cases. 5184 forward tests passed; 432 comparisons against torch failed due to `mean` with int and fp16 inputs
* [PHI] Resolved conflicts for scatter/gather kernels
* [PHI] Reformatted with `__restrict__`
* [PHI] Fixed bug where shared memory (smem) for `amin` was not allocated
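The commit message does not spell out the kernel semantics. As a hedged illustration of what a host-side reference for this kind of correctness testing might compute, here is a plain-Python sketch of 1-D gather and scatter-with-reduction. It is not Paddle's implementation. The function names `gather_1d` and `scatter_reduce_1d` are hypothetical, and the convention that the original value takes part in every reduction (including the `mean` count) is an assumption modeled loosely on `include_self=True` in `torch.Tensor.scatter_reduce`. The explicit bounds checks mark the situation where a GPU kernel would instead read or write out of range, which is how CUDA error 700 typically arises.

```python
from typing import List, Sequence


def gather_1d(x: Sequence[float], index: Sequence[int]) -> List[float]:
    """Hypothetical reference gather: out[i] = x[index[i]]."""
    for i in index:
        # On a GPU an unchecked out-of-range index is an illegal memory access.
        if not 0 <= i < len(x):
            raise IndexError(f"index {i} out of range for length {len(x)}")
    return [x[i] for i in index]


def scatter_reduce_1d(self_: Sequence[float], index: Sequence[int],
                      src: Sequence[float], reduce: str) -> List[float]:
    """Hypothetical reference scatter: fold src[i] into out[index[i]].

    Assumes include_self semantics: each output slot starts from its own
    self_ value, which also counts toward the divisor for "mean".
    """
    if len(index) != len(src):
        raise ValueError("index and src must have the same length")
    out = list(self_)
    counts = [1] * len(out)  # every slot already holds one value (self_)
    for i, dst in enumerate(index):
        if not 0 <= dst < len(out):
            raise IndexError(f"index {dst} out of range for length {len(out)}")
        v = src[i]
        if reduce in ("add", "mean"):
            out[dst] += v  # "mean" accumulates a sum, divided at the end
        elif reduce == "mul":
            out[dst] *= v
        elif reduce == "amin":
            out[dst] = min(out[dst], v)
        elif reduce == "amax":
            out[dst] = max(out[dst], v)
        else:
            raise ValueError(f"unsupported reduce: {reduce}")
        counts[dst] += 1
    if reduce == "mean":
        out = [s / c for s, c in zip(out, counts)]
    return out


if __name__ == "__main__":
    base, idx, src = [0.0, 10.0, 20.0], [0, 0, 2], [1.0, 3.0, 5.0]
    for op in ("add", "mul", "mean", "amin", "amax"):
        print(op, scatter_reduce_1d(base, idx, src, op))
```

Running the usage block on `base = [0.0, 10.0, 20.0]`, `index = [0, 0, 2]`, `src = [1.0, 3.0, 5.0]` shows each reduction: for example, `add` gives `[4.0, 10.0, 25.0]` and `amin` gives `[0.0, 10.0, 5.0]`. Slot 1 receives no updates and keeps its original value. Doing the `mean` division in floating point sidesteps integer rounding questions. Those questions are one plausible reason int inputs behave differently, but that is an assumption, not something the commit states.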
1 parent 2fd8a7e commit 28c7b6c


1 file changed, 917 insertions(+), 768 deletions(-)
