HIP: adjust RDNA3.5 MMQ kernel selection logic #18666
JohannesGaessler merged 1 commit into ggml-org:master
}

// For some quantization types MMQ can have lower peak TOPS than hipBLAS
// so it's only faster for sufficiently small batch sizes:
This is intentional since the sentence is spanning multiple lines.
Grepping around in the codebase, this is not the style used, which makes it a bit awkward, but it's not a big deal.
Beinsezii left a comment:
I don't have the chance to test at the moment, but it looks good. Surprised that 3_0 is so much worse in MMQ than everything else.
For CDNA, MMQ is also a mixed bag. Generally gfx1100, CDNA1, and CDNA2 have the best-tuned Tensile kernels, so I think it's more a case of BLAS doing better there than MMQ doing worse.
Probably a visit to q2/q6 perf would help everyone then.
IIRC from previous discussions, the q2 performance anomaly also exists on CUDA + MMQ. Someone could take a look at those kernels specifically; I haven't because I don't find the q2 variants a very interesting datatype.
For me Q6 is the one that hurts, as it's perfect for Mistral 3.2 on 24 GiB. Otherwise I probably wouldn't have ever found this problem.
Follow-up to #18537.
I was able to solve the technical issues I was having with my Strix Halo system and tested the performance change:
Details
This PR changes the kernel selection logic to use MMQ if either the hipBLAS path performs worse, or its speedup is so small that it would not be worth the increase in memory use.
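For readers who want the selection rule spelled out, here is a minimal sketch of how a per-type batch-size cutoff could be derived from benchmark numbers. This is not the code in the PR: the `Sample` struct, the function names, and the `min_speedup` threshold are all hypothetical, and the rule simply restates "use MMQ unless hipBLAS is faster by a worthwhile margin".

```cpp
#include <cstdint>
#include <vector>

// One benchmark sample for a single quantization type (hypothetical struct,
// not part of ggml).
struct Sample {
    int64_t batch_size;
    double  mmq_tps;   // measured throughput with MMQ
    double  blas_tps;  // measured throughput with the hipBLAS path
};

// Prefer MMQ when hipBLAS is slower, or faster by less than min_speedup.
// min_speedup is an assumed tuning knob, e.g. 1.1 for "at least 10% faster".
bool prefer_mmq(const Sample & s, double min_speedup) {
    return s.blas_tps < s.mmq_tps * min_speedup;
}

// Largest batch size up to which MMQ is preferred for every sample so far.
// Assumes samples are sorted by ascending batch_size.
// Returns 0 if MMQ is not preferred even at the smallest batch size.
int64_t mmq_batch_cutoff(const std::vector<Sample> & samples, double min_speedup) {
    int64_t cutoff = 0;
    for (const Sample & s : samples) {
        if (!prefer_mmq(s, min_speedup)) {
            break;
        }
        cutoff = s.batch_size;
    }
    return cutoff;
}
```

Raising `min_speedup` biases the choice toward MMQ, trading a small amount of throughput for lower memory use. With a threshold of exactly 1.0 the rule degenerates to "pick whichever path is faster".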