Commit 28c7b6c
[PHI] Major fix for gather/scatter related CUDA kernels (#74922)
* [PHI] gather_scatter kernel largely refactored for correctness
* [PHI] gather scatter kernel rigorously tested
* [PHI] Fixed CUDA error 700 (illegal memory access) in 4 cases.
5184 forward tests passed; 432 comparisons against torch failed because of mean with int and fp16.
* [PHI] Resolve conflicts for scatter/gather kernels
* [PHI] Reformatted with __restrict__
* [PHI] Fix amin bug where shared memory (smem) was not allocated
1 parent 2fd8a7e
1 file changed, 917 insertions(+), 768 deletions(-)
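This sketch is a minimal CPU reference for row-wise gather and scatter with a few reduce modes. It illustrates the kind of semantics these kernels implement and is not code from this commit. Several things here are assumptions:

- the function names `gather_rows_reference` and `scatter_rows_reference`;
- the `Reduce` enum;
- the convention that reductions combine only the updates and discard the original values of targeted rows. This is similar in spirit to `include_self=False` in torch's `scatter_reduce`.

Indices are validated up front because on a GPU an unchecked out-of-range index is a typical route to CUDA error 700. The commit does not say which cause applied in the four cases it fixed.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical reduce modes for this sketch; names are illustrative only.
enum class Reduce { Assign, Add, Mean, Amin };

// Gather: out row i = x row index[i]. x is [n x d], row-major.
template <typename T>
std::vector<T> gather_rows_reference(const std::vector<T>& x, std::int64_t n, std::int64_t d,
                                     const std::vector<std::int64_t>& index) {
  std::vector<T> out;
  out.reserve(index.size() * static_cast<std::size_t>(d));
  for (std::int64_t row : index) {
    if (row < 0 || row >= n) throw std::out_of_range("gather index out of range");
    out.insert(out.end(), x.begin() + row * d, x.begin() + (row + 1) * d);
  }
  return out;
}

// Scatter rows of `updates` ([m x d]) into `out` ([n x d]) at rows `index` (length m).
// Assumed convention: targeted rows are replaced by the reduction of their updates
// only; rows that receive no update keep their original values.
template <typename T>
void scatter_rows_reference(std::vector<T>& out, std::int64_t n, std::int64_t d,
                            const std::vector<std::int64_t>& index,
                            const std::vector<T>& updates, Reduce mode) {
  const std::int64_t m = static_cast<std::int64_t>(index.size());
  if (static_cast<std::int64_t>(out.size()) != n * d ||
      static_cast<std::int64_t>(updates.size()) != m * d) {
    throw std::invalid_argument("shape mismatch");
  }
  // Validate every index first: on a GPU, an unchecked out-of-range index
  // becomes an out-of-bounds memory access.
  for (std::int64_t i = 0; i < m; ++i) {
    if (index[i] < 0 || index[i] >= n) {
      throw std::out_of_range("scatter index " + std::to_string(index[i]) + " out of range");
    }
  }
  std::vector<T> acc(static_cast<std::size_t>(n * d), T{});       // per-row accumulators
  std::vector<std::int64_t> count(static_cast<std::size_t>(n), 0);  // updates seen per row
  for (std::int64_t i = 0; i < m; ++i) {
    const std::int64_t row = index[i];
    for (std::int64_t j = 0; j < d; ++j) {
      T& a = acc[row * d + j];
      const T u = updates[i * d + j];
      if (count[row] == 0 || mode == Reduce::Assign) {
        a = u;  // first update initializes; for Assign the last write wins
      } else if (mode == Reduce::Amin) {
        a = std::min(a, u);
      } else {
        a = a + u;  // Add and Mean both accumulate a sum
      }
    }
    ++count[row];
  }
  for (std::int64_t r = 0; r < n; ++r) {
    if (count[r] == 0) continue;  // untouched rows keep their original values
    for (std::int64_t j = 0; j < d; ++j) {
      T v = acc[r * d + j];
      // For integer T this division truncates toward zero.
      if (mode == Reduce::Mean) v = v / static_cast<T>(count[r]);
      out[r * d + j] = v;
    }
  }
}
```

`Reduce::Mean` divides with C++ `/`, so for `int` a row whose updates sum to `-7` over two updates becomes `-3`. An implementation that floors would produce `-4`. Integer means are therefore one place where two frameworks can legitimately disagree.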
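The commit message says the kernels were reformatted with `__restrict__` but does not show which pointers were qualified. As a hedged illustration, the hypothetical `gather_rows_restrict` below is a CPU analogue of a gather inner loop. nvcc, GCC and Clang all accept the same qualifier. It promises the compiler that the buffers do not overlap, which can let it keep loaded values in registers and reorder loads and stores. In CUDA, marking read-only inputs as `const T* __restrict__` may additionally let the compiler route them through the read-only data cache.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical CPU analogue of a gather inner loop: dst row i = src row index[i].
// `__restrict__` asserts that src, index and dst never alias; passing
// overlapping buffers is undefined behavior. Indices are assumed already validated.
template <typename T>
void gather_rows_restrict(const T* __restrict__ src,
                          const std::int64_t* __restrict__ index,
                          T* __restrict__ dst, std::size_t num_index, std::size_t d) {
  for (std::size_t i = 0; i < num_index; ++i) {
    const std::size_t base = static_cast<std::size_t>(index[i]) * d;
    for (std::size_t j = 0; j < d; ++j) {
      dst[i * d + j] = src[base + j];  // no aliasing, so loads may be reordered freely
    }
  }
}
```

The promise is one-way. If a caller passes overlapping buffers, such as an in-place gather with `dst == src`, the result is undefined. The qualifier is therefore only safe where a kernel's inputs and outputs are guaranteed to be distinct allocations.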