HIP: adjust RDNA3.5 MMQ kernel selction logic by JohannesGaessler · Pull Request #18666 · ggml-org/llama.cpp

JohannesGaessler · 2026-01-07T12:58:52Z

Follow-up to #18537.

I was able to solve the technical issues I was having with my Strix Halo system and tested the performance change:

Details

GPU	Model	Microbatch size	Test	t/s b7644	t/s b7645	Speedup
Radeon 8060S Graphics	llama 8B IQ1_S - 1.5625 bpw	1	pp2048	80.49	80.66	1.00
Radeon 8060S Graphics	llama 8B IQ1_S - 1.5625 bpw	2	pp2048	135.34	135.54	1.00
Radeon 8060S Graphics	llama 8B IQ1_S - 1.5625 bpw	4	pp2048	198.06	198.68	1.00
Radeon 8060S Graphics	llama 8B IQ1_S - 1.5625 bpw	8	pp2048	242.29	243.07	1.00
Radeon 8060S Graphics	llama 8B IQ1_S - 1.5625 bpw	16	pp2048	478.25	479.90	1.00
Radeon 8060S Graphics	llama 8B IQ1_S - 1.5625 bpw	32	pp2048	655.46	658.93	1.01
Radeon 8060S Graphics	llama 8B IQ1_S - 1.5625 bpw	64	pp2048	862.67	865.35	1.00
Radeon 8060S Graphics	llama 8B IQ1_S - 1.5625 bpw	128	pp2048	977.95	983.60	1.01
Radeon 8060S Graphics	llama 8B IQ1_S - 1.5625 bpw	256	pp2048	1026.28	1022.60	1.00
Radeon 8060S Graphics	llama 8B IQ1_S - 1.5625 bpw	512	pp2048	1034.86	1042.04	1.01
Radeon 8060S Graphics	llama 8B IQ1_S - 1.5625 bpw	1024	pp2048	1081.21	1093.03	1.01
Radeon 8060S Graphics	llama 8B IQ1_S - 1.5625 bpw	2048	pp2048	1088.63	1101.54	1.01
Radeon 8060S Graphics	llama 8B IQ2_S - 2.5 bpw	1	pp2048	57.53	57.68	1.00
Radeon 8060S Graphics	llama 8B IQ2_S - 2.5 bpw	2	pp2048	100.87	101.06	1.00
Radeon 8060S Graphics	llama 8B IQ2_S - 2.5 bpw	4	pp2048	154.50	154.99	1.00
Radeon 8060S Graphics	llama 8B IQ2_S - 2.5 bpw	8	pp2048	190.41	190.68	1.00
Radeon 8060S Graphics	llama 8B IQ2_S - 2.5 bpw	16	pp2048	282.39	284.73	1.01
Radeon 8060S Graphics	llama 8B IQ2_S - 2.5 bpw	32	pp2048	584.52	587.79	1.01
Radeon 8060S Graphics	llama 8B IQ2_S - 2.5 bpw	64	pp2048	791.65	791.29	1.00
Radeon 8060S Graphics	llama 8B IQ2_S - 2.5 bpw	128	pp2048	832.41	833.40	1.00
Radeon 8060S Graphics	llama 8B IQ2_S - 2.5 bpw	256	pp2048	862.35	620.52	0.72
Radeon 8060S Graphics	llama 8B IQ2_S - 2.5 bpw	512	pp2048	868.23	795.88	0.92
Radeon 8060S Graphics	llama 8B IQ2_S - 2.5 bpw	1024	pp2048	912.68	905.39	0.99
Radeon 8060S Graphics	llama 8B IQ2_S - 2.5 bpw	2048	pp2048	917.08	977.26	1.07
Radeon 8060S Graphics	llama 8B IQ2_XS - 2.3125 bpw	1	pp2048	60.98	61.03	1.00
Radeon 8060S Graphics	llama 8B IQ2_XS - 2.3125 bpw	2	pp2048	104.00	104.47	1.00
Radeon 8060S Graphics	llama 8B IQ2_XS - 2.3125 bpw	4	pp2048	156.41	157.16	1.00
Radeon 8060S Graphics	llama 8B IQ2_XS - 2.3125 bpw	8	pp2048	186.89	187.82	1.00
Radeon 8060S Graphics	llama 8B IQ2_XS - 2.3125 bpw	16	pp2048	277.61	280.44	1.01
Radeon 8060S Graphics	llama 8B IQ2_XS - 2.3125 bpw	32	pp2048	590.42	592.67	1.00
Radeon 8060S Graphics	llama 8B IQ2_XS - 2.3125 bpw	64	pp2048	779.88	782.53	1.00
Radeon 8060S Graphics	llama 8B IQ2_XS - 2.3125 bpw	128	pp2048	805.76	808.56	1.00
Radeon 8060S Graphics	llama 8B IQ2_XS - 2.3125 bpw	256	pp2048	837.09	589.52	0.70
Radeon 8060S Graphics	llama 8B IQ2_XS - 2.3125 bpw	512	pp2048	848.95	760.39	0.90
Radeon 8060S Graphics	llama 8B IQ2_XS - 2.3125 bpw	1024	pp2048	891.73	882.25	0.99
Radeon 8060S Graphics	llama 8B IQ2_XS - 2.3125 bpw	2048	pp2048	895.69	964.22	1.08
Radeon 8060S Graphics	llama 8B IQ2_XXS - 2.0625 bpw	1	pp2048	47.89	47.88	1.00
Radeon 8060S Graphics	llama 8B IQ2_XXS - 2.0625 bpw	2	pp2048	86.35	86.22	1.00
Radeon 8060S Graphics	llama 8B IQ2_XXS - 2.0625 bpw	4	pp2048	139.38	139.74	1.00
Radeon 8060S Graphics	llama 8B IQ2_XXS - 2.0625 bpw	8	pp2048	169.64	170.55	1.01
Radeon 8060S Graphics	llama 8B IQ2_XXS - 2.0625 bpw	16	pp2048	346.54	347.67	1.00
Radeon 8060S Graphics	llama 8B IQ2_XXS - 2.0625 bpw	32	pp2048	522.02	524.26	1.00
Radeon 8060S Graphics	llama 8B IQ2_XXS - 2.0625 bpw	64	pp2048	796.76	800.68	1.00
Radeon 8060S Graphics	llama 8B IQ2_XXS - 2.0625 bpw	128	pp2048	961.07	964.32	1.00
Radeon 8060S Graphics	llama 8B IQ2_XXS - 2.0625 bpw	256	pp2048	999.78	1003.37	1.00
Radeon 8060S Graphics	llama 8B IQ2_XXS - 2.0625 bpw	512	pp2048	1011.26	1024.84	1.01
Radeon 8060S Graphics	llama 8B IQ2_XXS - 2.0625 bpw	1024	pp2048	1064.53	1075.43	1.01
Radeon 8060S Graphics	llama 8B IQ2_XXS - 2.0625 bpw	2048	pp2048	1070.38	1085.14	1.01
Radeon 8060S Graphics	llama 8B IQ3_S - 3.4375 bpw	1	pp2048	44.60	44.57	1.00
Radeon 8060S Graphics	llama 8B IQ3_S - 3.4375 bpw	2	pp2048	81.16	81.12	1.00
Radeon 8060S Graphics	llama 8B IQ3_S - 3.4375 bpw	4	pp2048	136.74	136.84	1.00
Radeon 8060S Graphics	llama 8B IQ3_S - 3.4375 bpw	8	pp2048	181.99	182.07	1.00
Radeon 8060S Graphics	llama 8B IQ3_S - 3.4375 bpw	16	pp2048	322.14	322.08	1.00
Radeon 8060S Graphics	llama 8B IQ3_S - 3.4375 bpw	32	pp2048	486.18	487.20	1.00
Radeon 8060S Graphics	llama 8B IQ3_S - 3.4375 bpw	64	pp2048	788.96	789.56	1.00
Radeon 8060S Graphics	llama 8B IQ3_S - 3.4375 bpw	128	pp2048	977.89	977.91	1.00
Radeon 8060S Graphics	llama 8B IQ3_S - 3.4375 bpw	256	pp2048	1018.10	1014.87	1.00
Radeon 8060S Graphics	llama 8B IQ3_S - 3.4375 bpw	512	pp2048	1019.21	1021.68	1.00
Radeon 8060S Graphics	llama 8B IQ3_S - 3.4375 bpw	1024	pp2048	1072.28	1069.02	1.00
Radeon 8060S Graphics	llama 8B IQ3_S - 3.4375 bpw	2048	pp2048	1068.64	1072.60	1.00
Radeon 8060S Graphics	llama 8B IQ3_S mix - 3.66 bpw	1	pp2048	44.28	44.30	1.00
Radeon 8060S Graphics	llama 8B IQ3_S mix - 3.66 bpw	2	pp2048	79.72	79.67	1.00
Radeon 8060S Graphics	llama 8B IQ3_S mix - 3.66 bpw	4	pp2048	131.11	131.06	1.00
Radeon 8060S Graphics	llama 8B IQ3_S mix - 3.66 bpw	8	pp2048	170.25	170.14	1.00
Radeon 8060S Graphics	llama 8B IQ3_S mix - 3.66 bpw	16	pp2048	335.28	336.16	1.00
Radeon 8060S Graphics	llama 8B IQ3_S mix - 3.66 bpw	32	pp2048	499.62	503.04	1.01
Radeon 8060S Graphics	llama 8B IQ3_S mix - 3.66 bpw	64	pp2048	795.08	796.11	1.00
Radeon 8060S Graphics	llama 8B IQ3_S mix - 3.66 bpw	128	pp2048	977.52	978.78	1.00
Radeon 8060S Graphics	llama 8B IQ3_S mix - 3.66 bpw	256	pp2048	1015.43	1015.62	1.00
Radeon 8060S Graphics	llama 8B IQ3_S mix - 3.66 bpw	512	pp2048	1021.13	1020.96	1.00
Radeon 8060S Graphics	llama 8B IQ3_S mix - 3.66 bpw	1024	pp2048	1069.80	1067.72	1.00
Radeon 8060S Graphics	llama 8B IQ3_S mix - 3.66 bpw	2048	pp2048	1068.59	1073.08	1.00
Radeon 8060S Graphics	llama 8B IQ3_XS - 3.3 bpw	1	pp2048	49.95	49.87	1.00
Radeon 8060S Graphics	llama 8B IQ3_XS - 3.3 bpw	2	pp2048	89.89	90.24	1.00
Radeon 8060S Graphics	llama 8B IQ3_XS - 3.3 bpw	4	pp2048	145.94	146.12	1.00
Radeon 8060S Graphics	llama 8B IQ3_XS - 3.3 bpw	8	pp2048	187.61	188.10	1.00
Radeon 8060S Graphics	llama 8B IQ3_XS - 3.3 bpw	16	pp2048	360.69	360.83	1.00
Radeon 8060S Graphics	llama 8B IQ3_XS - 3.3 bpw	32	pp2048	524.34	525.70	1.00
Radeon 8060S Graphics	llama 8B IQ3_XS - 3.3 bpw	64	pp2048	822.41	825.63	1.00
Radeon 8060S Graphics	llama 8B IQ3_XS - 3.3 bpw	128	pp2048	1005.10	1003.98	1.00
Radeon 8060S Graphics	llama 8B IQ3_XS - 3.3 bpw	256	pp2048	1046.09	1043.41	1.00
Radeon 8060S Graphics	llama 8B IQ3_XS - 3.3 bpw	512	pp2048	1052.57	1053.20	1.00
Radeon 8060S Graphics	llama 8B IQ3_XS - 3.3 bpw	1024	pp2048	1103.97	1102.13	1.00
Radeon 8060S Graphics	llama 8B IQ3_XS - 3.3 bpw	2048	pp2048	1104.34	1103.92	1.00
Radeon 8060S Graphics	llama 8B IQ3_XXS - 3.0625 bpw	1	pp2048	54.26	54.23	1.00
Radeon 8060S Graphics	llama 8B IQ3_XXS - 3.0625 bpw	2	pp2048	96.89	96.86	1.00
Radeon 8060S Graphics	llama 8B IQ3_XXS - 3.0625 bpw	4	pp2048	152.05	152.09	1.00
Radeon 8060S Graphics	llama 8B IQ3_XXS - 3.0625 bpw	8	pp2048	188.68	188.74	1.00
Radeon 8060S Graphics	llama 8B IQ3_XXS - 3.0625 bpw	16	pp2048	372.62	374.57	1.01
Radeon 8060S Graphics	llama 8B IQ3_XXS - 3.0625 bpw	32	pp2048	558.53	559.39	1.00
Radeon 8060S Graphics	llama 8B IQ3_XXS - 3.0625 bpw	64	pp2048	836.24	838.77	1.00
Radeon 8060S Graphics	llama 8B IQ3_XXS - 3.0625 bpw	128	pp2048	1007.78	1008.26	1.00
Radeon 8060S Graphics	llama 8B IQ3_XXS - 3.0625 bpw	256	pp2048	1053.00	1013.16	0.96
Radeon 8060S Graphics	llama 8B IQ3_XXS - 3.0625 bpw	512	pp2048	1066.48	1031.93	0.97
Radeon 8060S Graphics	llama 8B IQ3_XXS - 3.0625 bpw	1024	pp2048	1111.23	1095.85	0.99
Radeon 8060S Graphics	llama 8B IQ3_XXS - 3.0625 bpw	2048	pp2048	1111.26	1115.50	1.00
Radeon 8060S Graphics	llama 8B IQ4_NL - 4.5 bpw	1	pp2048	49.48	49.47	1.00
Radeon 8060S Graphics	llama 8B IQ4_NL - 4.5 bpw	2	pp2048	94.60	94.48	1.00
Radeon 8060S Graphics	llama 8B IQ4_NL - 4.5 bpw	4	pp2048	168.98	168.88	1.00
Radeon 8060S Graphics	llama 8B IQ4_NL - 4.5 bpw	8	pp2048	249.00	249.22	1.00
Radeon 8060S Graphics	llama 8B IQ4_NL - 4.5 bpw	16	pp2048	457.06	457.92	1.00
Radeon 8060S Graphics	llama 8B IQ4_NL - 4.5 bpw	32	pp2048	516.60	519.00	1.00
Radeon 8060S Graphics	llama 8B IQ4_NL - 4.5 bpw	64	pp2048	944.19	943.83	1.00
Radeon 8060S Graphics	llama 8B IQ4_NL - 4.5 bpw	128	pp2048	1072.45	1072.78	1.00
Radeon 8060S Graphics	llama 8B IQ4_NL - 4.5 bpw	256	pp2048	1138.03	1137.18	1.00
Radeon 8060S Graphics	llama 8B IQ4_NL - 4.5 bpw	512	pp2048	1148.44	1150.13	1.00
Radeon 8060S Graphics	llama 8B IQ4_NL - 4.5 bpw	1024	pp2048	1202.56	1191.82	0.99
Radeon 8060S Graphics	llama 8B IQ4_NL - 4.5 bpw	2048	pp2048	1194.05	1192.39	1.00
Radeon 8060S Graphics	llama 8B IQ4_XS - 4.25 bpw	1	pp2048	52.82	52.85	1.00
Radeon 8060S Graphics	llama 8B IQ4_XS - 4.25 bpw	2	pp2048	100.68	100.71	1.00
Radeon 8060S Graphics	llama 8B IQ4_XS - 4.25 bpw	4	pp2048	179.94	180.12	1.00
Radeon 8060S Graphics	llama 8B IQ4_XS - 4.25 bpw	8	pp2048	260.81	261.26	1.00
Radeon 8060S Graphics	llama 8B IQ4_XS - 4.25 bpw	16	pp2048	493.87	494.99	1.00
Radeon 8060S Graphics	llama 8B IQ4_XS - 4.25 bpw	32	pp2048	389.10	391.24	1.01
Radeon 8060S Graphics	llama 8B IQ4_XS - 4.25 bpw	64	pp2048	947.96	948.85	1.00
Radeon 8060S Graphics	llama 8B IQ4_XS - 4.25 bpw	128	pp2048	1088.40	1087.98	1.00
Radeon 8060S Graphics	llama 8B IQ4_XS - 4.25 bpw	256	pp2048	1153.12	1152.56	1.00
Radeon 8060S Graphics	llama 8B IQ4_XS - 4.25 bpw	512	pp2048	1162.22	1163.81	1.00
Radeon 8060S Graphics	llama 8B IQ4_XS - 4.25 bpw	1024	pp2048	1215.60	1207.64	0.99
Radeon 8060S Graphics	llama 8B IQ4_XS - 4.25 bpw	2048	pp2048	1202.71	1204.21	1.00
Radeon 8060S Graphics	llama 8B Q2_K_S	1	pp2048	65.70	65.86	1.00
Radeon 8060S Graphics	llama 8B Q2_K_S	2	pp2048	96.45	96.56	1.00
Radeon 8060S Graphics	llama 8B Q2_K_S	4	pp2048	118.90	119.08	1.00
Radeon 8060S Graphics	llama 8B Q2_K_S	8	pp2048	113.61	113.57	1.00
Radeon 8060S Graphics	llama 8B Q2_K_S	16	pp2048	225.46	227.19	1.01
Radeon 8060S Graphics	llama 8B Q2_K_S	32	pp2048	372.21	374.90	1.01
Radeon 8060S Graphics	llama 8B Q2_K_S	64	pp2048	521.49	524.02	1.00
Radeon 8060S Graphics	llama 8B Q2_K_S	128	pp2048	566.55	569.93	1.01
Radeon 8060S Graphics	llama 8B Q2_K_S	256	pp2048	598.39	606.65	1.01
Radeon 8060S Graphics	llama 8B Q2_K_S	512	pp2048	628.21	772.53	1.23
Radeon 8060S Graphics	llama 8B Q2_K_S	1024	pp2048	682.67	893.04	1.31
Radeon 8060S Graphics	llama 8B Q2_K_S	2048	pp2048	688.09	978.68	1.42
Radeon 8060S Graphics	llama 8B Q3_K_S	1	pp2048	48.52	48.69	1.00
Radeon 8060S Graphics	llama 8B Q3_K_S	2	pp2048	81.08	81.52	1.01
Radeon 8060S Graphics	llama 8B Q3_K_S	4	pp2048	113.30	113.65	1.00
Radeon 8060S Graphics	llama 8B Q3_K_S	8	pp2048	112.39	112.71	1.00
Radeon 8060S Graphics	llama 8B Q3_K_S	16	pp2048	324.66	326.53	1.01
Radeon 8060S Graphics	llama 8B Q3_K_S	32	pp2048	628.08	632.08	1.01
Radeon 8060S Graphics	llama 8B Q3_K_S	64	pp2048	822.62	824.57	1.00
Radeon 8060S Graphics	llama 8B Q3_K_S	128	pp2048	961.21	963.00	1.00
Radeon 8060S Graphics	llama 8B Q3_K_S	256	pp2048	1010.43	1010.83	1.00
Radeon 8060S Graphics	llama 8B Q3_K_S	512	pp2048	1048.95	1052.41	1.00
Radeon 8060S Graphics	llama 8B Q3_K_S	1024	pp2048	1072.77	1073.93	1.00
Radeon 8060S Graphics	llama 8B Q3_K_S	2048	pp2048	1073.66	1073.30	1.00
Radeon 8060S Graphics	llama 8B Q4_0	1	pp2048	50.08	50.17	1.00
Radeon 8060S Graphics	llama 8B Q4_0	2	pp2048	96.16	96.54	1.00
Radeon 8060S Graphics	llama 8B Q4_0	4	pp2048	172.93	173.57	1.00
Radeon 8060S Graphics	llama 8B Q4_0	8	pp2048	248.61	251.83	1.01
Radeon 8060S Graphics	llama 8B Q4_0	16	pp2048	450.84	459.31	1.02
Radeon 8060S Graphics	llama 8B Q4_0	32	pp2048	343.86	349.90	1.02
Radeon 8060S Graphics	llama 8B Q4_0	64	pp2048	905.11	918.95	1.02
Radeon 8060S Graphics	llama 8B Q4_0	128	pp2048	1053.75	1070.02	1.02
Radeon 8060S Graphics	llama 8B Q4_0	256	pp2048	1110.67	1126.46	1.01
Radeon 8060S Graphics	llama 8B Q4_0	512	pp2048	1119.08	1141.89	1.02
Radeon 8060S Graphics	llama 8B Q4_0	1024	pp2048	1175.03	1194.38	1.02
Radeon 8060S Graphics	llama 8B Q4_0	2048	pp2048	1172.29	1187.18	1.01
Radeon 8060S Graphics	llama 8B Q4_1	1	pp2048	45.03	45.03	1.00
Radeon 8060S Graphics	llama 8B Q4_1	2	pp2048	88.58	88.61	1.00
Radeon 8060S Graphics	llama 8B Q4_1	4	pp2048	162.98	163.23	1.00
Radeon 8060S Graphics	llama 8B Q4_1	8	pp2048	253.76	254.94	1.00
Radeon 8060S Graphics	llama 8B Q4_1	16	pp2048	439.88	441.15	1.00
Radeon 8060S Graphics	llama 8B Q4_1	32	pp2048	675.67	678.62	1.00
Radeon 8060S Graphics	llama 8B Q4_1	64	pp2048	889.72	895.45	1.01
Radeon 8060S Graphics	llama 8B Q4_1	128	pp2048	959.38	965.49	1.01
Radeon 8060S Graphics	llama 8B Q4_1	256	pp2048	1005.98	1017.63	1.01
Radeon 8060S Graphics	llama 8B Q4_1	512	pp2048	1028.11	1040.51	1.01
Radeon 8060S Graphics	llama 8B Q4_1	1024	pp2048	1083.72	1092.39	1.01
Radeon 8060S Graphics	llama 8B Q4_1	2048	pp2048	1092.66	1096.33	1.00
Radeon 8060S Graphics	llama 8B Q4_K_S	1	pp2048	42.71	42.59	1.00
Radeon 8060S Graphics	llama 8B Q4_K_S	2	pp2048	69.58	69.36	1.00
Radeon 8060S Graphics	llama 8B Q4_K_S	4	pp2048	98.88	98.76	1.00
Radeon 8060S Graphics	llama 8B Q4_K_S	8	pp2048	116.96	117.04	1.00
Radeon 8060S Graphics	llama 8B Q4_K_S	16	pp2048	458.83	461.10	1.00
Radeon 8060S Graphics	llama 8B Q4_K_S	32	pp2048	662.41	667.17	1.01
Radeon 8060S Graphics	llama 8B Q4_K_S	64	pp2048	869.02	873.42	1.01
Radeon 8060S Graphics	llama 8B Q4_K_S	128	pp2048	988.74	992.93	1.00
Radeon 8060S Graphics	llama 8B Q4_K_S	256	pp2048	1037.34	1038.45	1.00
Radeon 8060S Graphics	llama 8B Q4_K_S	512	pp2048	1044.03	1047.98	1.00
Radeon 8060S Graphics	llama 8B Q4_K_S	1024	pp2048	1098.50	1101.53	1.00
Radeon 8060S Graphics	llama 8B Q4_K_S	2048	pp2048	1104.48	1103.43	1.00
Radeon 8060S Graphics	llama 8B Q5_0	1	pp2048	42.14	42.25	1.00
Radeon 8060S Graphics	llama 8B Q5_0	2	pp2048	81.06	81.45	1.00
Radeon 8060S Graphics	llama 8B Q5_0	4	pp2048	148.24	148.21	1.00
Radeon 8060S Graphics	llama 8B Q5_0	8	pp2048	226.26	226.33	1.00
Radeon 8060S Graphics	llama 8B Q5_0	16	pp2048	398.10	398.76	1.00
Radeon 8060S Graphics	llama 8B Q5_0	32	pp2048	294.15	296.09	1.01
Radeon 8060S Graphics	llama 8B Q5_0	64	pp2048	866.06	866.26	1.00
Radeon 8060S Graphics	llama 8B Q5_0	128	pp2048	1040.95	1043.27	1.00
Radeon 8060S Graphics	llama 8B Q5_0	256	pp2048	1104.15	1106.47	1.00
Radeon 8060S Graphics	llama 8B Q5_0	512	pp2048	1124.53	1127.04	1.00
Radeon 8060S Graphics	llama 8B Q5_0	1024	pp2048	1167.65	1169.41	1.00
Radeon 8060S Graphics	llama 8B Q5_0	2048	pp2048	1155.58	1158.23	1.00
Radeon 8060S Graphics	llama 8B Q5_1	1	pp2048	36.42	36.53	1.00
Radeon 8060S Graphics	llama 8B Q5_1	2	pp2048	71.83	72.02	1.00
Radeon 8060S Graphics	llama 8B Q5_1	4	pp2048	135.70	135.90	1.00
Radeon 8060S Graphics	llama 8B Q5_1	8	pp2048	220.35	220.84	1.00
Radeon 8060S Graphics	llama 8B Q5_1	16	pp2048	306.96	306.97	1.00
Radeon 8060S Graphics	llama 8B Q5_1	32	pp2048	540.63	543.56	1.01
Radeon 8060S Graphics	llama 8B Q5_1	64	pp2048	795.80	797.04	1.00
Radeon 8060S Graphics	llama 8B Q5_1	128	pp2048	912.04	914.19	1.00
Radeon 8060S Graphics	llama 8B Q5_1	256	pp2048	973.76	975.76	1.00
Radeon 8060S Graphics	llama 8B Q5_1	512	pp2048	1000.48	996.70	1.00
Radeon 8060S Graphics	llama 8B Q5_1	1024	pp2048	1054.89	1051.86	1.00
Radeon 8060S Graphics	llama 8B Q5_1	2048	pp2048	1058.78	1060.83	1.00
Radeon 8060S Graphics	llama 8B Q5_K_S	1	pp2048	38.61	38.78	1.00
Radeon 8060S Graphics	llama 8B Q5_K_S	2	pp2048	64.82	64.83	1.00
Radeon 8060S Graphics	llama 8B Q5_K_S	4	pp2048	93.97	94.17	1.00
Radeon 8060S Graphics	llama 8B Q5_K_S	8	pp2048	113.83	113.88	1.00
Radeon 8060S Graphics	llama 8B Q5_K_S	16	pp2048	453.38	454.37	1.00
Radeon 8060S Graphics	llama 8B Q5_K_S	32	pp2048	673.74	674.86	1.00
Radeon 8060S Graphics	llama 8B Q5_K_S	64	pp2048	882.35	885.74	1.00
Radeon 8060S Graphics	llama 8B Q5_K_S	128	pp2048	965.68	969.79	1.00
Radeon 8060S Graphics	llama 8B Q5_K_S	256	pp2048	1009.28	1011.65	1.00
Radeon 8060S Graphics	llama 8B Q5_K_S	512	pp2048	1025.61	1028.44	1.00
Radeon 8060S Graphics	llama 8B Q5_K_S	1024	pp2048	1086.24	1083.86	1.00
Radeon 8060S Graphics	llama 8B Q5_K_S	2048	pp2048	1089.73	1091.21	1.00
Radeon 8060S Graphics	llama 8B Q6_K	1	pp2048	34.17	34.23	1.00
Radeon 8060S Graphics	llama 8B Q6_K	2	pp2048	65.05	65.15	1.00
Radeon 8060S Graphics	llama 8B Q6_K	4	pp2048	112.52	112.28	1.00
Radeon 8060S Graphics	llama 8B Q6_K	8	pp2048	145.02	144.71	1.00
Radeon 8060S Graphics	llama 8B Q6_K	16	pp2048	338.73	339.38	1.00
Radeon 8060S Graphics	llama 8B Q6_K	32	pp2048	488.43	490.40	1.00
Radeon 8060S Graphics	llama 8B Q6_K	64	pp2048	637.44	635.32	1.00
Radeon 8060S Graphics	llama 8B Q6_K	128	pp2048	654.69	654.49	1.00
Radeon 8060S Graphics	llama 8B Q6_K	256	pp2048	683.40	573.48	0.84
Radeon 8060S Graphics	llama 8B Q6_K	512	pp2048	695.59	755.10	1.09
Radeon 8060S Graphics	llama 8B Q6_K	1024	pp2048	735.84	877.38	1.19
Radeon 8060S Graphics	llama 8B Q6_K	2048	pp2048	746.75	953.73	1.28
Radeon 8060S Graphics	llama 8B Q8_0	1	pp2048	28.30	28.31	1.00
Radeon 8060S Graphics	llama 8B Q8_0	2	pp2048	55.89	56.06	1.00
Radeon 8060S Graphics	llama 8B Q8_0	4	pp2048	105.92	106.56	1.01
Radeon 8060S Graphics	llama 8B Q8_0	8	pp2048	188.80	189.22	1.00
Radeon 8060S Graphics	llama 8B Q8_0	16	pp2048	336.63	337.55	1.00
Radeon 8060S Graphics	llama 8B Q8_0	32	pp2048	384.23	389.19	1.01
Radeon 8060S Graphics	llama 8B Q8_0	64	pp2048	821.79	825.11	1.00
Radeon 8060S Graphics	llama 8B Q8_0	128	pp2048	972.84	981.31	1.01
Radeon 8060S Graphics	llama 8B Q8_0	256	pp2048	1027.77	1031.53	1.00
Radeon 8060S Graphics	llama 8B Q8_0	512	pp2048	1051.68	1054.52	1.00
Radeon 8060S Graphics	llama 8B Q8_0	1024	pp2048	1112.74	1110.91	1.00
Radeon 8060S Graphics	llama 8B Q8_0	2048	pp2048	1112.56	1112.48	1.00

This PR changes the kernel selection logic to use MMQ if either the hipBLAS path performs worse or its speedup is so small that it would not really be worth the increase in memory use.
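The rule described above can be sketched as a small decision function. This is only an illustration under stated assumptions, not the code from this PR: `prefer_mmq` and the `min_blas_speedup` knob are hypothetical names, and the throughputs are assumed to be measured in tokens per second at one batch size for one quantization type.

```cpp
// Hypothetical helper, not part of llama.cpp.
// mmq_tps / blas_tps: measured throughput (t/s) of the MMQ and hipBLAS paths.
// min_blas_speedup:   assumed tuning knob, e.g. 1.10 means hipBLAS must be
//                     at least 10% faster before it is worth its extra memory.
static bool prefer_mmq(double mmq_tps, double blas_tps, double min_blas_speedup) {
    if (blas_tps <= mmq_tps) {
        return true; // hipBLAS is not faster, so MMQ wins outright
    }
    // hipBLAS is faster, but only switch if the gain clears the threshold:
    return blas_tps / mmq_tps < min_blas_speedup;
}
```

With an assumed threshold of 1.10, a 5% hipBLAS advantage would still select MMQ, while a 40% advantage would select hipBLAS. The actual selection in the PR is hard-coded per quantization type and batch size, not computed at runtime from measurements.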

IMbackK

otherwise lgtm

IMbackK · 2026-01-07T17:32:00Z

ggml/src/ggml-cuda/mmq.cu

            }

+            // For some quantization types MMQ can have lower peak TOPS than hipBLAS
+            //     so it's only faster for sufficiently small batch sizes:


extra spaces

This is intentional since the sentence is spanning multiple lines.

Grepping around in the codebase, this is not the style used, which makes it a bit awkward, but it's not a big deal.
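For readers without the full diff, the comment under review describes a per-type batch-size cutoff: MMQ is kept only while the batch is small enough that hipBLAS's higher peak throughput does not pay off. A minimal sketch of that idea follows, assuming a placeholder enum and made-up cutoff values. Neither the names nor the numbers are taken from `mmq.cu`.

```cpp
#include <cstdint>

// Illustrative subset of quantization types; placeholder names only.
enum class quant_type { Q2_K, Q6_K, Q4_0 };

// Hypothetical cutoffs, chosen for illustration and NOT the values from the PR.
// Returns true if MMQ should be used for this type at this batch size.
static bool mmq_for_batch(quant_type type, int64_t batch_size) {
    // For some quantization types MMQ can have lower peak TOPS than hipBLAS
    //     so it's only faster for sufficiently small batch sizes:
    switch (type) {
        case quant_type::Q2_K:
            return batch_size <= 128; // assumed cutoff
        case quant_type::Q6_K:
            return batch_size <= 256; // assumed cutoff
        default:
            return true; // all other types always use MMQ in this sketch
    }
}
```

In this sketch, types without a known regression always take the MMQ path, so only the special cases need tuning per architecture.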

Beinsezii

don't have the chance to test at the moment but it looks good. surprised that 3_0 is so much worse in mmq than everything else

IMbackK · 2026-01-07T21:41:38Z

For CDNA, MMQ is also a mixed bag. Generally gfx1100, CDNA1, and CDNA2 have the best-tuned Tensile kernels, so I think it's more a case of BLAS doing better there than MMQ doing worse.

Beinsezii · 2026-01-07T21:48:06Z

Probably a visit to q2/q6 perf would help everyone then.

IMbackK · 2026-01-07T21:50:03Z

IIRC from previous discussions, the q2 performance anomaly also exists on CUDA + MMQ. Someone could take a look at those kernels specifically; I haven't because I don't find the q2 variants a very interesting datatype.

Beinsezii · 2026-01-07T23:14:17Z

I haven't because I don't find the q2 variants a very interesting datatype.

For me Q6 is the one that hurts as it's perfect for Mistral 3.2 on 24GiB. Otherwise I probably wouldn't have ever found this problem.

HIP: adjust RDNA3.5 MMQ kernel selction logic

39a4e83

github-actions bot added the "Nvidia GPU" (Issues specific to Nvidia GPUs) and "ggml" (changes relating to the ggml tensor library for machine learning) labels Jan 7, 2026

IMbackK approved these changes Jan 7, 2026

Beinsezii approved these changes Jan 7, 2026

am17an approved these changes Jan 10, 2026

JohannesGaessler merged commit d2ff4e2 into ggml-org:master Jan 10, 2026
75 checks passed

gary149 pushed a commit to gary149/llama-agent that referenced this pull request Jan 13, 2026

HIP: adjust RDNA3.5 MMQ kernel selction logic (ggml-org#18666)

8a07d1c

dillon-blake pushed a commit to Boxed-Logic/llama.cpp that referenced this pull request Jan 15, 2026

HIP: adjust RDNA3.5 MMQ kernel selction logic (ggml-org#18666)

6f4f9d4

JohannesGaessler merged 1 commit into ggml-org:master from JohannesGaessler:hip-mmq-tune
