MoYe.jl
MoYe.jl is an implementation of NVIDIA's CUTLASS/CuTe in Julia.
I developed this library primarily to learn CuTe.
The name Mo Ye comes from an ancient Chinese legend about swordsmiths.
The documentation consists mostly of my learning notes. Please refer to CuTe's documentation for more details.
The GEMM implementation still lacks two key performance optimizations:
- Swizzling the shared-memory layout to avoid bank conflicts.
- An efficient epilogue, which stages results from registers into shared memory and then writes them back to global memory with vectorized copies.
Since I sold my old laptop, I no longer have access to an NVIDIA GPU, so development of this library is on hold indefinitely.