HIP: fix RDNA3 FP16/BF16 matrix multiplication #17817
JohannesGaessler merged 1 commit into ggml-org:master
Conversation
Beinsezii left a comment
Full test-backend-ops is green now on gfx1100.
Looking at the build command in #17576 (comment), they have
Yeah. Using a realistic workload, I compared all MMQ commits plus this one (https://github.com/Beinsezii/llama.cpp/tree/rdna3_perf_mmq) against all recent MMQ commits reverted. I still think this should get merged first to stop the failures, but maybe I should open a new issue for the perf regression? I assume it'll be fixed by #17495 eventually, since that will replace the rocWMMA path.
Any differences you see should be from MMQ vs. rocBLAS. If you compile with
Rebuilt against this PR; confirmed perf is good with cuBLAS. It might be worth making that the default again until the other PR is ready, since people will really notice a third of their throughput gone.
I don't know if it is within the scope of this PR, but building with GGML_HIP_ROCWMMA_FATTN=ON is still broken.
This reverts commit f334b79.
Fixes #17797 by adding an explicit RDNA4 requirement to MMF. @jiachengjason, as outlined in https://github.com/ggml-org/llama.cpp/blob/master/CONTRIBUTING.md#pull-requests-for-contributors--collaborators, please test changes to the CUDA/HIP backend for correctness using test-backend-ops.
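For readers following along, a sketch of how one might build the HIP backend and run the correctness suite discussed above. This assumes an RDNA3 card (gfx1100, as reported in the thread) and follows the general shape of the llama.cpp HIP build instructions; exact flag names and paths may differ across versions, so treat it as a starting point rather than the canonical recipe.

```shell
# Hypothetical reproduction steps; gfx1100 matches the RDNA3 GPU
# reported in this thread. Adjust AMDGPU_TARGETS for your hardware.
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp

# Configure the HIP backend. GGML_HIP_ROCWMMA_FATTN=ON is the
# configuration one commenter reports as still broken.
HIPCXX=clang++ cmake -B build \
    -DGGML_HIP=ON \
    -DAMDGPU_TARGETS=gfx1100 \
    -DGGML_HIP_ROCWMMA_FATTN=ON

cmake --build build --config Release -j

# Run the backend op correctness tests, as CONTRIBUTING.md asks
# for any change to the CUDA/HIP backend.
./build/bin/test-backend-ops test
```

A fully green run (as Beinsezii reports above) is the expected outcome on a fixed build; op-level failures point at broken kernels for that architecture.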