Changelog
- Add cublasStrsm_v2 and cublasSgelsBatched (see the sketch after this list).
- [Runtime] Do nothing instead of panicking (release mode only).
- Minor fixes.
- [Nightly] Enable FP16 Conv2d.
- [Nightly] Enable PyTorch Flash Attention 2.
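For the new cuBLAS entry points, here is a hedged sketch of one way to reach them from PyTorch. It assumes a CUDA build of PyTorch running on top of ZLUDA; whether these calls actually land in cublasStrsm_v2 / cublasSgelsBatched depends on PyTorch's internal dispatch, so treat that mapping as an assumption rather than a guarantee.

```python
# Sketch only: triangular solve and batched least squares on the GPU.
# Assumption: PyTorch's CUDA backend forwards these to cuBLAS routines such as
# cublasStrsm_v2 / cublasSgelsBatched; the exact dispatch is not guaranteed.
import torch

device = "cuda"  # ZLUDA presents the AMD GPU to PyTorch as a CUDA device

# Triangular solve: A is lower triangular, solve A @ X = B.
A = torch.randn(64, 64, device=device).tril() + 64 * torch.eye(64, device=device)
B = torch.randn(64, 8, device=device)
X = torch.linalg.solve_triangular(A, B, upper=False)

# Batched least squares: minimize ||A_i @ x_i - b_i|| for each batch element.
Ab = torch.randn(16, 32, 8, device=device)
bb = torch.randn(16, 32, 1, device=device)
sol = torch.linalg.lstsq(Ab, bb).solution

print(X.shape, sol.shape)
```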
cuDNN on Windows
Starting with ZLUDA v3.9.0, the nightly build includes FP16 Conv2d and Flash Attention 2 support.
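As a quick usage illustration, here is a minimal PyTorch sketch of both features. It assumes a CUDA build of PyTorch running on top of the ZLUDA nightly with the cuDNN extension described below installed; the tensor shapes are purely illustrative.

```python
# Minimal sketch: FP16 convolution and scaled dot-product attention under ZLUDA.
import torch
import torch.nn.functional as F

device = "cuda"  # ZLUDA presents the AMD GPU to PyTorch as a CUDA device

# FP16 Conv2d (the nightly build can serve this through cuDNN)
x = torch.randn(1, 3, 224, 224, device=device, dtype=torch.float16)
w = torch.randn(64, 3, 3, 3, device=device, dtype=torch.float16)
y = F.conv2d(x, w, padding=1)

# Scaled dot-product attention; PyTorch selects the Flash Attention 2 backend
# when it is available for the given dtypes and shapes.
q = torch.randn(1, 8, 128, 64, device=device, dtype=torch.float16)
k = torch.randn_like(q)
v = torch.randn_like(q)
out = F.scaled_dot_product_attention(q, k, v)

print(y.shape, out.shape)
```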
Supported Architectures
- gfx908, gfx90a
- gfx940, gfx941, gfx942
- gfx1030
- gfx1100, gfx1101, gfx1102
- gfx1150
To enable cuDNN acceleration on a supported device, download the HIP SDK extension and unpack it over your existing HIP SDK 6.2 installation.
HIP SDK extension: DOWNLOAD
(unzip and copy the folders into path/to/AMD/ROCm/6.2)
※ The HIP SDK extension does not include hipBLASLt.
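After copying the folders, a quick sanity check from PyTorch (assuming PyTorch is installed and runs through ZLUDA) can confirm that cuDNN is picked up:

```python
# Sanity check, assuming PyTorch is installed and running through ZLUDA.
import torch

print(torch.cuda.is_available())            # True if ZLUDA is detected as a CUDA runtime
print(torch.cuda.get_device_name(0))        # name of the underlying AMD GPU
print(torch.backends.cudnn.is_available())  # True once the cuDNN extension is in place
print(torch.backends.cudnn.version())       # cuDNN version reported to PyTorch
```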