HIP: tune mmq/rocblas switching for RDNA4 #18816

Open
jiachengjason wants to merge 7 commits into ggml-org:master from jiachengjason:fix/jiachengjason/rocm7.x_regression

@jiachengjason (Contributor) commented Jan 13, 2026

Following a similar approach to #18537, this PR tunes the mmq/rocBLAS switching for RDNA4 to improve performance at microbatch sizes > 256 and at microbatch size 8 for most models (+9% to +230% perf gain).

Test setup:

HIPCXX="$(hipconfig -l)/clang" HIP_PATH="$(hipconfig -R)" \
  cmake -S . -B build \
    -DGGML_HIP=ON \
    -DGGML_CUDA_FORCE_MMQ=OFF \
    -DGGML_HIP_UMA=OFF \
    -DGGML_HIP_ROCWMMA_FATTN=ON \
    -DGPU_TARGETS="gfx1201" \
    -DGGML_HIP_GRAPHS=OFF \
    -DLLAMA_CURL=OFF \
    -DGGML_CUDA_FORCE_CUBLAS=OFF \
    -DCMAKE_BUILD_TYPE=Release &&
  cmake --build build --config Release -- -j 32
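The build above hard-codes `-DGPU_TARGETS="gfx1201"` (the RDNA4 target of the R9700). When reproducing on another card, a small helper like the one below could pick the target up from `rocminfo` instead. This is a sketch that assumes `rocminfo` lists GPU agents with a `Name:` line containing the `gfxNNNN` ISA name; `first_gfx_target` is a hypothetical helper, not part of llama.cpp.

```shell
# Print the first gfx target found in rocminfo output read from stdin.
# -E: extended regex, -o: print only the match, -m1: stop after the first matching line.
first_gfx_target() {
  grep -E -o -m1 'gfx[0-9a-f]+'
}

# Usage (requires ROCm):
#   cmake -S . -B build -DGGML_HIP=ON -DGPU_TARGETS="$(rocminfo | first_gfx_target)" ...
```

With several GPUs installed, this returns only the first one, which matches the single-device `HIP_VISIBLE_DEVICES=0` setup used for the benchmarks.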

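# Note: model_name is not set in this snippet; unless set beforehand it expands to an empty string.
for q in q4_0 q4_1 q5_1 q8_0 q2_k_s q3_k_s q4_k_s q5_k_s q6_k iq1_s iq2_xxs iq2_xs iq2_s iq3_xxs iq3_xs iq3_s iq3_m iq4_nl iq4_xs; do
  echo "$q"
  # One repetition (-r 1), flash attention on, prompt processing only (-n 0, -p 2048),
  # microbatch sizes 1, 2, 4, ..., 2048; results are appended to a SQLite database.
  HIP_VISIBLE_DEVICES=0 ./build/bin/llama-bench \
    --model "/mnt/nas_share/models/gguf/llama-8b/llama-8b${model_name}-${q}.gguf" \
    -r 1 -fa 1 -n 0 -p 2048 -ub "1-2048*2" --progress -o sql | sqlite3 llama-bench.sqlite
  sleep 10
done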

python3 scripts/compare-llama-bench.py -s gpu_info,model_type,n_ubatch -i llama-bench.sqlite -b 557515be1e93ed8939dd8a7c7d08765fdbe8be31 -c fix/jiachengjason/rocm7.x_regression | tee benchout.txt
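The "Speedup" column in the results is consistent with the branch throughput divided by the baseline throughput (t/s of the PR branch over t/s of 557515b). The sketch below reproduces that ratio for one row and also expresses it as a percentage change; the function names `speedup` and `gain_pct` are illustrative only.

```shell
# Speedup column: t/s of the compared branch divided by t/s of the baseline commit.
speedup() {
  awk -v base="$1" -v new="$2" 'BEGIN { printf "%.2f\n", new / base }'
}

# The same ratio expressed as a percentage change relative to the baseline.
gain_pct() {
  awk -v base="$1" -v new="$2" 'BEGIN { printf "%+.1f%%\n", (new / base - 1) * 100 }'
}

# llama 8B Q2_K_S, microbatch size 512: baseline 1637.18 t/s, branch 3629.01 t/s
speedup 1637.18 3629.01    # prints 2.22
gain_pct 1637.18 3629.01   # prints +121.7%
```

A speedup of 1.00 therefore means no change, and a speedup of 2.22 corresponds to a gain of roughly +122%.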
Performance results for llama-bench (revised):
GPU Model Microbatch size Test t/s 557515b t/s fix/jiachengjason/rocm7.x_regression Speedup
AI PRO R9700 gpt-oss 20B MXFP4 MoE 1 pp2048 162.46 162.34 1.00
AI PRO R9700 gpt-oss 20B MXFP4 MoE 2 pp2048 196.52 196.23 1.00
AI PRO R9700 gpt-oss 20B MXFP4 MoE 4 pp2048 340.05 342.96 1.01
AI PRO R9700 gpt-oss 20B MXFP4 MoE 8 pp2048 588.83 593.07 1.01
AI PRO R9700 gpt-oss 20B MXFP4 MoE 16 pp2048 901.07 907.40 1.01
AI PRO R9700 gpt-oss 20B MXFP4 MoE 32 pp2048 1323.43 1343.47 1.02
AI PRO R9700 gpt-oss 20B MXFP4 MoE 64 pp2048 1868.19 1890.00 1.01
AI PRO R9700 gpt-oss 20B MXFP4 MoE 128 pp2048 2321.56 2363.62 1.02
AI PRO R9700 gpt-oss 20B MXFP4 MoE 256 pp2048 3486.04 3502.67 1.00
AI PRO R9700 gpt-oss 20B MXFP4 MoE 512 pp2048 4658.84 4675.65 1.00
AI PRO R9700 gpt-oss 20B MXFP4 MoE 1024 pp2048 5472.59 5483.37 1.00
AI PRO R9700 gpt-oss 20B MXFP4 MoE 2048 pp2048 5780.93 5783.82 1.00
AI PRO R9700 granitehybrid 1B Q4_K_M 1 pp2048 153.61 153.62 1.00
AI PRO R9700 granitehybrid 1B Q4_K_M 2 pp2048 257.92 257.23 1.00
AI PRO R9700 granitehybrid 1B Q4_K_M 4 pp2048 442.47 444.39 1.00
AI PRO R9700 granitehybrid 1B Q4_K_M 8 pp2048 728.52 728.29 1.00
AI PRO R9700 granitehybrid 1B Q4_K_M 16 pp2048 1516.27 1527.16 1.01
AI PRO R9700 granitehybrid 1B Q4_K_M 32 pp2048 2476.66 2472.96 1.00
AI PRO R9700 granitehybrid 1B Q4_K_M 64 pp2048 3437.79 3427.70 1.00
AI PRO R9700 granitehybrid 1B Q4_K_M 128 pp2048 4192.80 4180.49 1.00
AI PRO R9700 granitehybrid 1B Q4_K_M 256 pp2048 5437.43 5542.64 1.02
AI PRO R9700 granitehybrid 1B Q4_K_M 512 pp2048 6062.00 6234.81 1.03
AI PRO R9700 granitehybrid 1B Q4_K_M 1024 pp2048 6278.55 6636.16 1.06
AI PRO R9700 granitehybrid 1B Q4_K_M 2048 pp2048 6414.74 6902.19 1.08
AI PRO R9700 llama 8B IQ1_S - 1.5625 bpw 1 pp2048 139.16 138.97 1.00
AI PRO R9700 llama 8B IQ1_S - 1.5625 bpw 2 pp2048 230.32 230.44 1.00
AI PRO R9700 llama 8B IQ1_S - 1.5625 bpw 4 pp2048 329.50 330.13 1.00
AI PRO R9700 llama 8B IQ1_S - 1.5625 bpw 8 pp2048 432.61 431.86 1.00
AI PRO R9700 llama 8B IQ1_S - 1.5625 bpw 16 pp2048 912.03 909.88 1.00
AI PRO R9700 llama 8B IQ1_S - 1.5625 bpw 32 pp2048 1442.29 1448.85 1.00
AI PRO R9700 llama 8B IQ1_S - 1.5625 bpw 64 pp2048 2035.74 2043.31 1.00
AI PRO R9700 llama 8B IQ1_S - 1.5625 bpw 128 pp2048 2609.81 2608.93 1.00
AI PRO R9700 llama 8B IQ1_S - 1.5625 bpw 256 pp2048 3248.39 3250.36 1.00
AI PRO R9700 llama 8B IQ1_S - 1.5625 bpw 512 pp2048 3389.34 3710.75 1.09
AI PRO R9700 llama 8B IQ1_S - 1.5625 bpw 1024 pp2048 3527.28 4331.00 1.23
AI PRO R9700 llama 8B IQ1_S - 1.5625 bpw 2048 pp2048 3681.67 4562.44 1.24
AI PRO R9700 llama 8B IQ2_S - 2.5 bpw 1 pp2048 105.07 105.19 1.00
AI PRO R9700 llama 8B IQ2_S - 2.5 bpw 2 pp2048 174.93 175.26 1.00
AI PRO R9700 llama 8B IQ2_S - 2.5 bpw 4 pp2048 262.14 262.51 1.00
AI PRO R9700 llama 8B IQ2_S - 2.5 bpw 8 pp2048 401.26 397.39 0.99
AI PRO R9700 llama 8B IQ2_S - 2.5 bpw 16 pp2048 535.07 535.12 1.00
AI PRO R9700 llama 8B IQ2_S - 2.5 bpw 32 pp2048 1207.78 1208.22 1.00
AI PRO R9700 llama 8B IQ2_S - 2.5 bpw 64 pp2048 1799.92 1799.85 1.00
AI PRO R9700 llama 8B IQ2_S - 2.5 bpw 128 pp2048 2245.64 2246.34 1.00
AI PRO R9700 llama 8B IQ2_S - 2.5 bpw 256 pp2048 2827.70 2827.94 1.00
AI PRO R9700 llama 8B IQ2_S - 2.5 bpw 512 pp2048 2811.90 3655.19 1.30
AI PRO R9700 llama 8B IQ2_S - 2.5 bpw 1024 pp2048 3013.02 4250.02 1.41
AI PRO R9700 llama 8B IQ2_S - 2.5 bpw 2048 pp2048 3092.72 4407.12 1.42
AI PRO R9700 llama 8B IQ2_XS - 2.3125 bpw 1 pp2048 109.13 109.42 1.00
AI PRO R9700 llama 8B IQ2_XS - 2.3125 bpw 2 pp2048 178.45 178.40 1.00
AI PRO R9700 llama 8B IQ2_XS - 2.3125 bpw 4 pp2048 263.81 261.82 0.99
AI PRO R9700 llama 8B IQ2_XS - 2.3125 bpw 8 pp2048 403.02 401.19 1.00
AI PRO R9700 llama 8B IQ2_XS - 2.3125 bpw 16 pp2048 519.62 516.73 0.99
AI PRO R9700 llama 8B IQ2_XS - 2.3125 bpw 32 pp2048 1194.44 1194.34 1.00
AI PRO R9700 llama 8B IQ2_XS - 2.3125 bpw 64 pp2048 1749.61 1749.69 1.00
AI PRO R9700 llama 8B IQ2_XS - 2.3125 bpw 128 pp2048 2176.35 2172.21 1.00
AI PRO R9700 llama 8B IQ2_XS - 2.3125 bpw 256 pp2048 2697.83 2701.54 1.00
AI PRO R9700 llama 8B IQ2_XS - 2.3125 bpw 512 pp2048 2722.91 3685.15 1.35
AI PRO R9700 llama 8B IQ2_XS - 2.3125 bpw 1024 pp2048 2924.14 4328.51 1.48
AI PRO R9700 llama 8B IQ2_XS - 2.3125 bpw 2048 pp2048 3036.59 4563.77 1.50
AI PRO R9700 llama 8B IQ2_XXS - 2.0625 bpw 1 pp2048 91.24 91.35 1.00
AI PRO R9700 llama 8B IQ2_XXS - 2.0625 bpw 2 pp2048 159.21 159.28 1.00
AI PRO R9700 llama 8B IQ2_XXS - 2.0625 bpw 4 pp2048 257.17 257.29 1.00
AI PRO R9700 llama 8B IQ2_XXS - 2.0625 bpw 8 pp2048 354.12 388.46 1.10
AI PRO R9700 llama 8B IQ2_XXS - 2.0625 bpw 16 pp2048 760.06 757.19 1.00
AI PRO R9700 llama 8B IQ2_XXS - 2.0625 bpw 32 pp2048 1190.56 1186.73 1.00
AI PRO R9700 llama 8B IQ2_XXS - 2.0625 bpw 64 pp2048 1852.51 1850.62 1.00
AI PRO R9700 llama 8B IQ2_XXS - 2.0625 bpw 128 pp2048 2656.25 2657.58 1.00
AI PRO R9700 llama 8B IQ2_XXS - 2.0625 bpw 256 pp2048 3273.80 3273.80 1.00
AI PRO R9700 llama 8B IQ2_XXS - 2.0625 bpw 512 pp2048 3413.87 3690.95 1.08
AI PRO R9700 llama 8B IQ2_XXS - 2.0625 bpw 1024 pp2048 3546.36 4336.35 1.22
AI PRO R9700 llama 8B IQ2_XXS - 2.0625 bpw 2048 pp2048 3706.44 4561.43 1.23
AI PRO R9700 llama 8B IQ3_S - 3.4375 bpw 1 pp2048 85.43 85.38 1.00
AI PRO R9700 llama 8B IQ3_S - 3.4375 bpw 2 pp2048 152.96 152.31 1.00
AI PRO R9700 llama 8B IQ3_S - 3.4375 bpw 4 pp2048 251.55 251.39 1.00
AI PRO R9700 llama 8B IQ3_S - 3.4375 bpw 8 pp2048 352.94 368.87 1.05
AI PRO R9700 llama 8B IQ3_S - 3.4375 bpw 16 pp2048 726.41 726.45 1.00
AI PRO R9700 llama 8B IQ3_S - 3.4375 bpw 32 pp2048 1178.58 1181.92 1.00
AI PRO R9700 llama 8B IQ3_S - 3.4375 bpw 64 pp2048 1886.56 1888.61 1.00
AI PRO R9700 llama 8B IQ3_S - 3.4375 bpw 128 pp2048 2704.42 2704.91 1.00
AI PRO R9700 llama 8B IQ3_S - 3.4375 bpw 256 pp2048 3369.34 3373.59 1.00
AI PRO R9700 llama 8B IQ3_S - 3.4375 bpw 512 pp2048 3506.73 3617.26 1.03
AI PRO R9700 llama 8B IQ3_S - 3.4375 bpw 1024 pp2048 3567.82 4220.77 1.18
AI PRO R9700 llama 8B IQ3_S - 3.4375 bpw 2048 pp2048 3493.89 4387.21 1.26
AI PRO R9700 llama 8B IQ3_S mix - 3.66 bpw 1 pp2048 86.12 86.16 1.00
AI PRO R9700 llama 8B IQ3_S mix - 3.66 bpw 2 pp2048 153.41 153.21 1.00
AI PRO R9700 llama 8B IQ3_S mix - 3.66 bpw 4 pp2048 247.53 247.45 1.00
AI PRO R9700 llama 8B IQ3_S mix - 3.66 bpw 8 pp2048 340.64 355.02 1.04
AI PRO R9700 llama 8B IQ3_S mix - 3.66 bpw 16 pp2048 734.47 743.71 1.01
AI PRO R9700 llama 8B IQ3_S mix - 3.66 bpw 32 pp2048 1199.34 1201.05 1.00
AI PRO R9700 llama 8B IQ3_S mix - 3.66 bpw 64 pp2048 1895.80 1895.58 1.00
AI PRO R9700 llama 8B IQ3_S mix - 3.66 bpw 128 pp2048 2661.39 2660.55 1.00
AI PRO R9700 llama 8B IQ3_S mix - 3.66 bpw 256 pp2048 3364.58 3358.42 1.00
AI PRO R9700 llama 8B IQ3_S mix - 3.66 bpw 512 pp2048 3489.56 3611.79 1.04
AI PRO R9700 llama 8B IQ3_S mix - 3.66 bpw 1024 pp2048 3557.18 4207.80 1.18
AI PRO R9700 llama 8B IQ3_S mix - 3.66 bpw 2048 pp2048 3684.32 4372.08 1.19
AI PRO R9700 llama 8B IQ3_XS - 3.3 bpw 1 pp2048 93.48 93.96 1.01
AI PRO R9700 llama 8B IQ3_XS - 3.3 bpw 2 pp2048 161.93 161.90 1.00
AI PRO R9700 llama 8B IQ3_XS - 3.3 bpw 4 pp2048 256.11 256.31 1.00
AI PRO R9700 llama 8B IQ3_XS - 3.3 bpw 8 pp2048 361.25 396.08 1.10
AI PRO R9700 llama 8B IQ3_XS - 3.3 bpw 16 pp2048 779.78 780.10 1.00
AI PRO R9700 llama 8B IQ3_XS - 3.3 bpw 32 pp2048 1247.60 1247.70 1.00
AI PRO R9700 llama 8B IQ3_XS - 3.3 bpw 64 pp2048 1978.17 1978.33 1.00
AI PRO R9700 llama 8B IQ3_XS - 3.3 bpw 128 pp2048 2772.71 2773.59 1.00
AI PRO R9700 llama 8B IQ3_XS - 3.3 bpw 256 pp2048 3471.33 3466.34 1.00
AI PRO R9700 llama 8B IQ3_XS - 3.3 bpw 512 pp2048 3612.33 3631.13 1.01
AI PRO R9700 llama 8B IQ3_XS - 3.3 bpw 1024 pp2048 3688.49 4229.88 1.15
AI PRO R9700 llama 8B IQ3_XS - 3.3 bpw 2048 pp2048 3828.19 4397.23 1.15
AI PRO R9700 llama 8B IQ3_XXS - 3.0625 bpw 1 pp2048 102.38 101.94 1.00
AI PRO R9700 llama 8B IQ3_XXS - 3.0625 bpw 2 pp2048 169.33 169.30 1.00
AI PRO R9700 llama 8B IQ3_XXS - 3.0625 bpw 4 pp2048 259.40 258.97 1.00
AI PRO R9700 llama 8B IQ3_XXS - 3.0625 bpw 8 pp2048 364.98 369.44 1.01
AI PRO R9700 llama 8B IQ3_XXS - 3.0625 bpw 16 pp2048 726.17 725.29 1.00
AI PRO R9700 llama 8B IQ3_XXS - 3.0625 bpw 32 pp2048 1255.87 1256.94 1.00
AI PRO R9700 llama 8B IQ3_XXS - 3.0625 bpw 64 pp2048 1971.54 1970.93 1.00
AI PRO R9700 llama 8B IQ3_XXS - 3.0625 bpw 128 pp2048 2733.36 2729.99 1.00
AI PRO R9700 llama 8B IQ3_XXS - 3.0625 bpw 256 pp2048 3436.32 3520.26 1.02
AI PRO R9700 llama 8B IQ3_XXS - 3.0625 bpw 512 pp2048 3606.48 3625.19 1.01
AI PRO R9700 llama 8B IQ3_XXS - 3.0625 bpw 1024 pp2048 3684.54 4214.49 1.14
AI PRO R9700 llama 8B IQ3_XXS - 3.0625 bpw 2048 pp2048 3817.58 4389.69 1.15
AI PRO R9700 llama 8B IQ4_NL - 4.5 bpw 1 pp2048 101.49 101.74 1.00
AI PRO R9700 llama 8B IQ4_NL - 4.5 bpw 2 pp2048 181.27 182.33 1.01
AI PRO R9700 llama 8B IQ4_NL - 4.5 bpw 4 pp2048 302.48 302.48 1.00
AI PRO R9700 llama 8B IQ4_NL - 4.5 bpw 8 pp2048 416.18 415.04 1.00
AI PRO R9700 llama 8B IQ4_NL - 4.5 bpw 16 pp2048 956.98 956.31 1.00
AI PRO R9700 llama 8B IQ4_NL - 4.5 bpw 32 pp2048 1566.79 1566.57 1.00
AI PRO R9700 llama 8B IQ4_NL - 4.5 bpw 64 pp2048 2289.63 2286.18 1.00
AI PRO R9700 llama 8B IQ4_NL - 4.5 bpw 128 pp2048 3089.22 3089.75 1.00
AI PRO R9700 llama 8B IQ4_NL - 4.5 bpw 256 pp2048 3887.17 3884.52 1.00
AI PRO R9700 llama 8B IQ4_NL - 4.5 bpw 512 pp2048 4066.82 4056.70 1.00
AI PRO R9700 llama 8B IQ4_NL - 4.5 bpw 1024 pp2048 4169.81 4238.42 1.02
AI PRO R9700 llama 8B IQ4_NL - 4.5 bpw 2048 pp2048 4445.39 4506.93 1.01
AI PRO R9700 llama 8B IQ4_XS - 4.25 bpw 1 pp2048 106.10 106.00 1.00
AI PRO R9700 llama 8B IQ4_XS - 4.25 bpw 2 pp2048 188.27 188.54 1.00
AI PRO R9700 llama 8B IQ4_XS - 4.25 bpw 4 pp2048 314.26 313.89 1.00
AI PRO R9700 llama 8B IQ4_XS - 4.25 bpw 8 pp2048 454.46 453.87 1.00
AI PRO R9700 llama 8B IQ4_XS - 4.25 bpw 16 pp2048 1004.19 1002.26 1.00
AI PRO R9700 llama 8B IQ4_XS - 4.25 bpw 32 pp2048 1619.41 1616.43 1.00
AI PRO R9700 llama 8B IQ4_XS - 4.25 bpw 64 pp2048 2373.13 2365.48 1.00
AI PRO R9700 llama 8B IQ4_XS - 4.25 bpw 128 pp2048 3139.54 3141.60 1.00
AI PRO R9700 llama 8B IQ4_XS - 4.25 bpw 256 pp2048 3944.01 3943.90 1.00
AI PRO R9700 llama 8B IQ4_XS - 4.25 bpw 512 pp2048 4106.43 4109.13 1.00
AI PRO R9700 llama 8B IQ4_XS - 4.25 bpw 1024 pp2048 4225.78 4262.98 1.01
AI PRO R9700 llama 8B IQ4_XS - 4.25 bpw 2048 pp2048 4483.94 4522.85 1.01
AI PRO R9700 llama 8B Q2_K_S 1 pp2048 127.51 127.58 1.00
AI PRO R9700 llama 8B Q2_K_S 2 pp2048 183.80 183.68 1.00
AI PRO R9700 llama 8B Q2_K_S 4 pp2048 227.46 227.36 1.00
AI PRO R9700 llama 8B Q2_K_S 8 pp2048 265.31 265.31 1.00
AI PRO R9700 llama 8B Q2_K_S 16 pp2048 527.80 528.91 1.00
AI PRO R9700 llama 8B Q2_K_S 32 pp2048 786.36 785.94 1.00
AI PRO R9700 llama 8B Q2_K_S 64 pp2048 1116.22 1115.59 1.00
AI PRO R9700 llama 8B Q2_K_S 128 pp2048 1544.86 1541.81 1.00
AI PRO R9700 llama 8B Q2_K_S 256 pp2048 1699.24 1703.96 1.00
AI PRO R9700 llama 8B Q2_K_S 512 pp2048 1637.18 3629.01 2.22
AI PRO R9700 llama 8B Q2_K_S 1024 pp2048 1852.90 4276.27 2.31
AI PRO R9700 llama 8B Q2_K_S 2048 pp2048 1930.35 4541.01 2.35
AI PRO R9700 llama 8B Q3_K_S 1 pp2048 93.17 93.61 1.00
AI PRO R9700 llama 8B Q3_K_S 2 pp2048 154.20 154.77 1.00
AI PRO R9700 llama 8B Q3_K_S 4 pp2048 212.58 212.97 1.00
AI PRO R9700 llama 8B Q3_K_S 8 pp2048 268.51 269.00 1.00
AI PRO R9700 llama 8B Q3_K_S 16 pp2048 765.06 768.12 1.00
AI PRO R9700 llama 8B Q3_K_S 32 pp2048 1261.81 1260.81 1.00
AI PRO R9700 llama 8B Q3_K_S 64 pp2048 1838.62 1838.73 1.00
AI PRO R9700 llama 8B Q3_K_S 128 pp2048 2470.88 2473.23 1.00
AI PRO R9700 llama 8B Q3_K_S 256 pp2048 3070.46 3068.99 1.00
AI PRO R9700 llama 8B Q3_K_S 512 pp2048 3187.17 3475.47 1.09
AI PRO R9700 llama 8B Q3_K_S 1024 pp2048 3273.10 4213.28 1.29
AI PRO R9700 llama 8B Q3_K_S 2048 pp2048 3425.49 4541.10 1.33
AI PRO R9700 llama 8B Q4_0 1 pp2048 102.60 102.46 1.00
AI PRO R9700 llama 8B Q4_0 2 pp2048 182.26 182.25 1.00
AI PRO R9700 llama 8B Q4_0 4 pp2048 308.53 308.89 1.00
AI PRO R9700 llama 8B Q4_0 8 pp2048 432.80 432.97 1.00
AI PRO R9700 llama 8B Q4_0 16 pp2048 924.51 924.82 1.00
AI PRO R9700 llama 8B Q4_0 32 pp2048 1487.20 1485.29 1.00
AI PRO R9700 llama 8B Q4_0 64 pp2048 2201.51 2201.66 1.00
AI PRO R9700 llama 8B Q4_0 128 pp2048 2991.19 2990.74 1.00
AI PRO R9700 llama 8B Q4_0 256 pp2048 3791.67 3786.13 1.00
AI PRO R9700 llama 8B Q4_0 512 pp2048 3933.45 3930.52 1.00
AI PRO R9700 llama 8B Q4_0 1024 pp2048 4047.04 4044.64 1.00
AI PRO R9700 llama 8B Q4_0 2048 pp2048 4291.14 4287.12 1.00
AI PRO R9700 llama 8B Q4_1 1 pp2048 96.59 96.61 1.00
AI PRO R9700 llama 8B Q4_1 2 pp2048 174.06 173.62 1.00
AI PRO R9700 llama 8B Q4_1 4 pp2048 295.64 295.89 1.00
AI PRO R9700 llama 8B Q4_1 8 pp2048 458.61 458.33 1.00
AI PRO R9700 llama 8B Q4_1 16 pp2048 939.36 936.44 1.00
AI PRO R9700 llama 8B Q4_1 32 pp2048 1529.37 1529.71 1.00
AI PRO R9700 llama 8B Q4_1 64 pp2048 2167.02 2165.97 1.00
AI PRO R9700 llama 8B Q4_1 128 pp2048 2482.65 2481.50 1.00
AI PRO R9700 llama 8B Q4_1 256 pp2048 3263.35 3263.19 1.00
AI PRO R9700 llama 8B Q4_1 512 pp2048 3396.83 3393.68 1.00
AI PRO R9700 llama 8B Q4_1 1024 pp2048 3522.93 3520.91 1.00
AI PRO R9700 llama 8B Q4_1 2048 pp2048 3678.67 3677.86 1.00
AI PRO R9700 llama 8B Q4_K_S 1 pp2048 99.53 99.48 1.00
AI PRO R9700 llama 8B Q4_K_S 2 pp2048 168.54 168.60 1.00
AI PRO R9700 llama 8B Q4_K_S 4 pp2048 223.74 223.91 1.00
AI PRO R9700 llama 8B Q4_K_S 8 pp2048 271.76 272.02 1.00
AI PRO R9700 llama 8B Q4_K_S 16 pp2048 905.29 904.67 1.00
AI PRO R9700 llama 8B Q4_K_S 32 pp2048 1483.56 1483.84 1.00
AI PRO R9700 llama 8B Q4_K_S 64 pp2048 2090.50 2093.09 1.00
AI PRO R9700 llama 8B Q4_K_S 128 pp2048 2610.16 2611.72 1.00
AI PRO R9700 llama 8B Q4_K_S 256 pp2048 3369.53 3374.37 1.00
AI PRO R9700 llama 8B Q4_K_S 512 pp2048 3511.90 3611.18 1.03
AI PRO R9700 llama 8B Q4_K_S 1024 pp2048 3644.83 4276.09 1.17
AI PRO R9700 llama 8B Q4_K_S 2048 pp2048 3814.72 4547.60 1.19
AI PRO R9700 llama 8B Q5_1 1 pp2048 87.80 87.92 1.00
AI PRO R9700 llama 8B Q5_1 2 pp2048 157.08 157.08 1.00
AI PRO R9700 llama 8B Q5_1 4 pp2048 271.03 271.07 1.00
AI PRO R9700 llama 8B Q5_1 8 pp2048 481.45 480.92 1.00
AI PRO R9700 llama 8B Q5_1 16 pp2048 693.04 692.30 1.00
AI PRO R9700 llama 8B Q5_1 32 pp2048 1212.45 1211.42 1.00
AI PRO R9700 llama 8B Q5_1 64 pp2048 1865.91 1871.66 1.00
AI PRO R9700 llama 8B Q5_1 128 pp2048 2429.61 2437.17 1.00
AI PRO R9700 llama 8B Q5_1 256 pp2048 3148.45 3152.74 1.00
AI PRO R9700 llama 8B Q5_1 512 pp2048 3310.21 3311.64 1.00
AI PRO R9700 llama 8B Q5_1 1024 pp2048 3438.49 3442.25 1.00
AI PRO R9700 llama 8B Q5_1 2048 pp2048 3600.85 3603.21 1.00
AI PRO R9700 llama 8B Q5_K_S 1 pp2048 87.88 87.95 1.00
AI PRO R9700 llama 8B Q5_K_S 2 pp2048 155.77 155.80 1.00
AI PRO R9700 llama 8B Q5_K_S 4 pp2048 216.10 217.14 1.00
AI PRO R9700 llama 8B Q5_K_S 8 pp2048 267.32 267.41 1.00
AI PRO R9700 llama 8B Q5_K_S 16 pp2048 884.19 886.08 1.00
AI PRO R9700 llama 8B Q5_K_S 32 pp2048 1441.82 1441.87 1.00
AI PRO R9700 llama 8B Q5_K_S 64 pp2048 2085.77 2086.25 1.00
AI PRO R9700 llama 8B Q5_K_S 128 pp2048 2538.90 2533.59 1.00
AI PRO R9700 llama 8B Q5_K_S 256 pp2048 3266.99 3263.13 1.00
AI PRO R9700 llama 8B Q5_K_S 512 pp2048 3411.74 3477.53 1.02
AI PRO R9700 llama 8B Q5_K_S 1024 pp2048 3530.90 4179.95 1.18
AI PRO R9700 llama 8B Q5_K_S 2048 pp2048 3699.44 4497.06 1.22
AI PRO R9700 llama 8B Q6_K 1 pp2048 80.14 80.07 1.00
AI PRO R9700 llama 8B Q6_K 2 pp2048 142.74 142.75 1.00
AI PRO R9700 llama 8B Q6_K 4 pp2048 220.62 220.19 1.00
AI PRO R9700 llama 8B Q6_K 8 pp2048 299.41 298.36 1.00
AI PRO R9700 llama 8B Q6_K 16 pp2048 694.25 696.38 1.00
AI PRO R9700 llama 8B Q6_K 32 pp2048 1025.43 1028.10 1.00
AI PRO R9700 llama 8B Q6_K 64 pp2048 1391.83 1391.96 1.00
AI PRO R9700 llama 8B Q6_K 128 pp2048 1710.23 1706.08 1.00
AI PRO R9700 llama 8B Q6_K 256 pp2048 2077.78 2469.52 1.19
AI PRO R9700 llama 8B Q6_K 512 pp2048 2051.08 3493.28 1.70
AI PRO R9700 llama 8B Q6_K 1024 pp2048 2237.69 4179.11 1.87
AI PRO R9700 llama 8B Q6_K 2048 pp2048 2294.96 4487.10 1.96
AI PRO R9700 llama 8B Q8_0 1 pp2048 68.61 68.62 1.00
AI PRO R9700 llama 8B Q8_0 2 pp2048 127.59 127.57 1.00
AI PRO R9700 llama 8B Q8_0 4 pp2048 225.68 225.76 1.00
AI PRO R9700 llama 8B Q8_0 8 pp2048 394.17 394.39 1.00
AI PRO R9700 llama 8B Q8_0 16 pp2048 758.14 757.74 1.00
AI PRO R9700 llama 8B Q8_0 32 pp2048 1361.10 1359.20 1.00
AI PRO R9700 llama 8B Q8_0 64 pp2048 1982.91 1977.08 1.00
AI PRO R9700 llama 8B Q8_0 128 pp2048 2826.59 2837.91 1.00
AI PRO R9700 llama 8B Q8_0 256 pp2048 3699.40 3699.17 1.00
AI PRO R9700 llama 8B Q8_0 512 pp2048 3906.16 3900.19 1.00
AI PRO R9700 llama 8B Q8_0 1024 pp2048 4075.83 4149.35 1.02
AI PRO R9700 llama 8B Q8_0 2048 pp2048 4353.68 4466.01 1.03
AI PRO R9700 qwen3 14B IQ1_S - 1.5625 bpw 1 pp2048 88.45 88.31 1.00
AI PRO R9700 qwen3 14B IQ1_S - 1.5625 bpw 2 pp2048 138.57 138.83 1.00
AI PRO R9700 qwen3 14B IQ1_S - 1.5625 bpw 4 pp2048 188.69 188.36 1.00
AI PRO R9700 qwen3 14B IQ1_S - 1.5625 bpw 8 pp2048 244.38 240.53 0.98
AI PRO R9700 qwen3 14B IQ1_S - 1.5625 bpw 16 pp2048 539.20 539.38 1.00
AI PRO R9700 qwen3 14B IQ1_S - 1.5625 bpw 32 pp2048 823.47 824.30 1.00
AI PRO R9700 qwen3 14B IQ1_S - 1.5625 bpw 64 pp2048 1165.75 1163.84 1.00
AI PRO R9700 qwen3 14B IQ1_S - 1.5625 bpw 128 pp2048 1433.86 1432.05 1.00
AI PRO R9700 qwen3 14B IQ1_S - 1.5625 bpw 256 pp2048 1737.17 1734.99 1.00
AI PRO R9700 qwen3 14B IQ1_S - 1.5625 bpw 512 pp2048 1801.18 2112.87 1.17
AI PRO R9700 qwen3 14B IQ1_S - 1.5625 bpw 1024 pp2048 1969.73 2436.01 1.24
AI PRO R9700 qwen3 14B IQ1_S - 1.5625 bpw 2048 pp2048 2045.01 2679.23 1.31
AI PRO R9700 qwen3 14B IQ2_S - 2.5 bpw 1 pp2048 61.20 61.26 1.00
AI PRO R9700 qwen3 14B IQ2_S - 2.5 bpw 2 pp2048 100.65 100.54 1.00
AI PRO R9700 qwen3 14B IQ2_S - 2.5 bpw 4 pp2048 149.82 149.78 1.00
AI PRO R9700 qwen3 14B IQ2_S - 2.5 bpw 8 pp2048 230.72 225.19 0.98
AI PRO R9700 qwen3 14B IQ2_S - 2.5 bpw 16 pp2048 288.77 288.69 1.00
AI PRO R9700 qwen3 14B IQ2_S - 2.5 bpw 32 pp2048 657.77 657.30 1.00
AI PRO R9700 qwen3 14B IQ2_S - 2.5 bpw 64 pp2048 1017.96 1016.29 1.00
AI PRO R9700 qwen3 14B IQ2_S - 2.5 bpw 128 pp2048 1227.95 1227.58 1.00
AI PRO R9700 qwen3 14B IQ2_S - 2.5 bpw 256 pp2048 1485.19 1486.27 1.00
AI PRO R9700 qwen3 14B IQ2_S - 2.5 bpw 512 pp2048 1565.19 2105.61 1.35
AI PRO R9700 qwen3 14B IQ2_S - 2.5 bpw 1024 pp2048 1686.24 2417.61 1.43
AI PRO R9700 qwen3 14B IQ2_S - 2.5 bpw 2048 pp2048 1724.92 2629.55 1.52
AI PRO R9700 qwen3 14B IQ2_XS - 2.3125 bpw 1 pp2048 63.95 63.89 1.00
AI PRO R9700 qwen3 14B IQ2_XS - 2.3125 bpw 2 pp2048 103.22 103.10 1.00
AI PRO R9700 qwen3 14B IQ2_XS - 2.3125 bpw 4 pp2048 150.76 150.98 1.00
AI PRO R9700 qwen3 14B IQ2_XS - 2.3125 bpw 8 pp2048 230.58 230.70 1.00
AI PRO R9700 qwen3 14B IQ2_XS - 2.3125 bpw 16 pp2048 281.73 281.82 1.00
AI PRO R9700 qwen3 14B IQ2_XS - 2.3125 bpw 32 pp2048 653.69 654.07 1.00
AI PRO R9700 qwen3 14B IQ2_XS - 2.3125 bpw 64 pp2048 992.38 992.05 1.00
AI PRO R9700 qwen3 14B IQ2_XS - 2.3125 bpw 128 pp2048 1159.31 1157.99 1.00
AI PRO R9700 qwen3 14B IQ2_XS - 2.3125 bpw 256 pp2048 1413.57 1413.29 1.00
AI PRO R9700 qwen3 14B IQ2_XS - 2.3125 bpw 512 pp2048 1497.69 2122.42 1.42
AI PRO R9700 qwen3 14B IQ2_XS - 2.3125 bpw 1024 pp2048 1622.15 2441.45 1.51
AI PRO R9700 qwen3 14B IQ2_XS - 2.3125 bpw 2048 pp2048 1674.03 2696.34 1.61
AI PRO R9700 qwen3 14B IQ2_XXS - 2.0625 bpw 1 pp2048 51.76 51.72 1.00
AI PRO R9700 qwen3 14B IQ2_XXS - 2.0625 bpw 2 pp2048 90.64 90.61 1.00
AI PRO R9700 qwen3 14B IQ2_XXS - 2.0625 bpw 4 pp2048 146.35 146.14 1.00
AI PRO R9700 qwen3 14B IQ2_XXS - 2.0625 bpw 8 pp2048 199.15 214.62 1.08
AI PRO R9700 qwen3 14B IQ2_XXS - 2.0625 bpw 16 pp2048 423.37 423.64 1.00
AI PRO R9700 qwen3 14B IQ2_XXS - 2.0625 bpw 32 pp2048 662.50 662.22 1.00
AI PRO R9700 qwen3 14B IQ2_XXS - 2.0625 bpw 64 pp2048 1070.05 1070.70 1.00
AI PRO R9700 qwen3 14B IQ2_XXS - 2.0625 bpw 128 pp2048 1447.96 1445.76 1.00
AI PRO R9700 qwen3 14B IQ2_XXS - 2.0625 bpw 256 pp2048 1749.16 1749.04 1.00
AI PRO R9700 qwen3 14B IQ2_XXS - 2.0625 bpw 512 pp2048 1811.15 2126.07 1.17
AI PRO R9700 qwen3 14B IQ2_XXS - 2.0625 bpw 1024 pp2048 1982.87 2445.90 1.23
AI PRO R9700 qwen3 14B IQ2_XXS - 2.0625 bpw 2048 pp2048 2055.64 2698.61 1.31
AI PRO R9700 qwen3 14B IQ3_S - 3.4375 bpw 1 pp2048 48.59 48.49 1.00
AI PRO R9700 qwen3 14B IQ3_S - 3.4375 bpw 2 pp2048 87.02 87.11 1.00
AI PRO R9700 qwen3 14B IQ3_S - 3.4375 bpw 4 pp2048 144.38 144.14 1.00
AI PRO R9700 qwen3 14B IQ3_S - 3.4375 bpw 8 pp2048 201.21 203.21 1.01
AI PRO R9700 qwen3 14B IQ3_S - 3.4375 bpw 16 pp2048 402.44 402.17 1.00
AI PRO R9700 qwen3 14B IQ3_S - 3.4375 bpw 32 pp2048 647.95 647.61 1.00
AI PRO R9700 qwen3 14B IQ3_S - 3.4375 bpw 64 pp2048 1079.68 1078.44 1.00
AI PRO R9700 qwen3 14B IQ3_S - 3.4375 bpw 128 pp2048 1503.94 1506.41 1.00
AI PRO R9700 qwen3 14B IQ3_S - 3.4375 bpw 256 pp2048 1795.33 1795.48 1.00
AI PRO R9700 qwen3 14B IQ3_S - 3.4375 bpw 512 pp2048 1849.29 2074.79 1.12
AI PRO R9700 qwen3 14B IQ3_S - 3.4375 bpw 1024 pp2048 2019.41 2382.54 1.18
AI PRO R9700 qwen3 14B IQ3_S - 3.4375 bpw 2048 pp2048 2071.77 2594.24 1.25
AI PRO R9700 qwen3 14B IQ3_XXS - 3.0625 bpw 1 pp2048 59.29 59.13 1.00
AI PRO R9700 qwen3 14B IQ3_XXS - 3.0625 bpw 2 pp2048 97.85 97.78 1.00
AI PRO R9700 qwen3 14B IQ3_XXS - 3.0625 bpw 4 pp2048 148.19 147.92 1.00
AI PRO R9700 qwen3 14B IQ3_XXS - 3.0625 bpw 8 pp2048 206.79 212.42 1.03
AI PRO R9700 qwen3 14B IQ3_XXS - 3.0625 bpw 16 pp2048 420.79 420.78 1.00
AI PRO R9700 qwen3 14B IQ3_XXS - 3.0625 bpw 32 pp2048 709.17 708.53 1.00
AI PRO R9700 qwen3 14B IQ3_XXS - 3.0625 bpw 64 pp2048 1152.28 1151.52 1.00
AI PRO R9700 qwen3 14B IQ3_XXS - 3.0625 bpw 128 pp2048 1530.47 1530.98 1.00
AI PRO R9700 qwen3 14B IQ3_XXS - 3.0625 bpw 256 pp2048 1845.71 1913.24 1.04
AI PRO R9700 qwen3 14B IQ3_XXS - 3.0625 bpw 512 pp2048 1948.44 2084.17 1.07
AI PRO R9700 qwen3 14B IQ3_XXS - 3.0625 bpw 1024 pp2048 2089.21 2388.79 1.14
AI PRO R9700 qwen3 14B IQ3_XXS - 3.0625 bpw 2048 pp2048 2146.83 2600.82 1.21
AI PRO R9700 qwen3 14B IQ4_NL - 4.5 bpw 1 pp2048 61.83 61.63 1.00
AI PRO R9700 qwen3 14B IQ4_NL - 4.5 bpw 2 pp2048 108.97 108.86 1.00
AI PRO R9700 qwen3 14B IQ4_NL - 4.5 bpw 4 pp2048 188.03 187.68 1.00
AI PRO R9700 qwen3 14B IQ4_NL - 4.5 bpw 8 pp2048 237.19 236.88 1.00
AI PRO R9700 qwen3 14B IQ4_NL - 4.5 bpw 16 pp2048 577.69 577.50 1.00
AI PRO R9700 qwen3 14B IQ4_NL - 4.5 bpw 32 pp2048 929.66 928.58 1.00
AI PRO R9700 qwen3 14B IQ4_NL - 4.5 bpw 64 pp2048 1382.97 1378.78 1.00
AI PRO R9700 qwen3 14B IQ4_NL - 4.5 bpw 128 pp2048 1762.55 1761.80 1.00
AI PRO R9700 qwen3 14B IQ4_NL - 4.5 bpw 256 pp2048 2119.89 2113.23 1.00
AI PRO R9700 qwen3 14B IQ4_NL - 4.5 bpw 512 pp2048 2316.67 2310.17 1.00
AI PRO R9700 qwen3 14B IQ4_NL - 4.5 bpw 1024 pp2048 2388.71 2392.21 1.00
AI PRO R9700 qwen3 14B IQ4_NL - 4.5 bpw 2048 pp2048 2494.71 2636.65 1.06
AI PRO R9700 qwen3 14B IQ4_XS - 4.25 bpw 1 pp2048 65.57 65.57 1.00
AI PRO R9700 qwen3 14B IQ4_XS - 4.25 bpw 2 pp2048 116.58 116.79 1.00
AI PRO R9700 qwen3 14B IQ4_XS - 4.25 bpw 4 pp2048 194.72 194.36 1.00
AI PRO R9700 qwen3 14B IQ4_XS - 4.25 bpw 8 pp2048 262.17 261.75 1.00
AI PRO R9700 qwen3 14B IQ4_XS - 4.25 bpw 16 pp2048 602.71 602.89 1.00
AI PRO R9700 qwen3 14B IQ4_XS - 4.25 bpw 32 pp2048 940.83 946.04 1.01
AI PRO R9700 qwen3 14B IQ4_XS - 4.25 bpw 64 pp2048 1434.16 1436.56 1.00
AI PRO R9700 qwen3 14B IQ4_XS - 4.25 bpw 128 pp2048 1789.33 1789.60 1.00
AI PRO R9700 qwen3 14B IQ4_XS - 4.25 bpw 256 pp2048 2149.84 2149.55 1.00
AI PRO R9700 qwen3 14B IQ4_XS - 4.25 bpw 512 pp2048 2336.26 2339.30 1.00
AI PRO R9700 qwen3 14B IQ4_XS - 4.25 bpw 1024 pp2048 2414.17 2398.50 0.99
AI PRO R9700 qwen3 14B IQ4_XS - 4.25 bpw 2048 pp2048 2516.52 2654.57 1.05
AI PRO R9700 qwen3 14B Q2_K_M 1 pp2048 67.14 66.97 1.00
AI PRO R9700 qwen3 14B Q2_K_M 2 pp2048 98.69 98.43 1.00
AI PRO R9700 qwen3 14B Q2_K_M 4 pp2048 125.38 125.05 1.00
AI PRO R9700 qwen3 14B Q2_K_M 8 pp2048 148.81 148.30 1.00
AI PRO R9700 qwen3 14B Q2_K_M 16 pp2048 321.11 322.16 1.00
AI PRO R9700 qwen3 14B Q2_K_M 32 pp2048 468.21 468.02 1.00
AI PRO R9700 qwen3 14B Q2_K_M 64 pp2048 704.74 703.50 1.00
AI PRO R9700 qwen3 14B Q2_K_M 128 pp2048 904.05 903.52 1.00
AI PRO R9700 qwen3 14B Q2_K_M 256 pp2048 998.80 997.30 1.00
AI PRO R9700 qwen3 14B Q2_K_M 512 pp2048 1064.20 2070.48 1.95
AI PRO R9700 qwen3 14B Q2_K_M 1024 pp2048 1189.24 2406.23 2.02
AI PRO R9700 qwen3 14B Q2_K_M 2048 pp2048 1229.45 2672.19 2.17
AI PRO R9700 qwen3 14B Q3_K_M 1 pp2048 55.77 55.68 1.00
AI PRO R9700 qwen3 14B Q3_K_M 2 pp2048 91.09 90.72 1.00
AI PRO R9700 qwen3 14B Q3_K_M 4 pp2048 122.42 121.44 0.99
AI PRO R9700 qwen3 14B Q3_K_M 8 pp2048 152.01 150.67 0.99
AI PRO R9700 qwen3 14B Q3_K_M 16 pp2048 477.43 477.99 1.00
AI PRO R9700 qwen3 14B Q3_K_M 32 pp2048 770.16 770.12 1.00
AI PRO R9700 qwen3 14B Q3_K_M 64 pp2048 1123.53 1124.25 1.00
AI PRO R9700 qwen3 14B Q3_K_M 128 pp2048 1406.27 1407.97 1.00
AI PRO R9700 qwen3 14B Q3_K_M 256 pp2048 1673.59 1673.71 1.00
AI PRO R9700 qwen3 14B Q3_K_M 512 pp2048 1737.00 2042.14 1.18
AI PRO R9700 qwen3 14B Q3_K_M 1024 pp2048 1897.91 2391.67 1.26
AI PRO R9700 qwen3 14B Q3_K_M 2048 pp2048 1968.36 2661.94 1.35
AI PRO R9700 qwen3 14B Q4_0 1 pp2048 63.15 63.06 1.00
AI PRO R9700 qwen3 14B Q4_0 2 pp2048 111.50 111.32 1.00
AI PRO R9700 qwen3 14B Q4_0 4 pp2048 191.08 191.47 1.00
AI PRO R9700 qwen3 14B Q4_0 8 pp2048 252.67 252.89 1.00
AI PRO R9700 qwen3 14B Q4_0 16 pp2048 546.09 546.03 1.00
AI PRO R9700 qwen3 14B Q4_0 32 pp2048 867.70 868.40 1.00
AI PRO R9700 qwen3 14B Q4_0 64 pp2048 1325.79 1324.34 1.00
AI PRO R9700 qwen3 14B Q4_0 128 pp2048 1712.08 1713.05 1.00
AI PRO R9700 qwen3 14B Q4_0 256 pp2048 2038.27 2044.90 1.00
AI PRO R9700 qwen3 14B Q4_0 512 pp2048 2224.74 2225.07 1.00
AI PRO R9700 qwen3 14B Q4_0 1024 pp2048 2298.66 2297.12 1.00
AI PRO R9700 qwen3 14B Q4_0 2048 pp2048 2397.74 2394.58 1.00
AI PRO R9700 qwen3 14B Q4_1 1 pp2048 59.00 59.07 1.00
AI PRO R9700 qwen3 14B Q4_1 2 pp2048 106.88 106.80 1.00
AI PRO R9700 qwen3 14B Q4_1 4 pp2048 186.14 186.03 1.00
AI PRO R9700 qwen3 14B Q4_1 8 pp2048 266.71 266.68 1.00
AI PRO R9700 qwen3 14B Q4_1 16 pp2048 601.76 600.82 1.00
AI PRO R9700 qwen3 14B Q4_1 32 pp2048 935.00 934.83 1.00
AI PRO R9700 qwen3 14B Q4_1 64 pp2048 1312.32 1309.31 1.00
AI PRO R9700 qwen3 14B Q4_1 128 pp2048 1450.15 1445.24 1.00
AI PRO R9700 qwen3 14B Q4_1 256 pp2048 1726.94 1712.03 0.99
AI PRO R9700 qwen3 14B Q4_1 512 pp2048 1797.02 1791.48 1.00
AI PRO R9700 qwen3 14B Q4_1 1024 pp2048 1973.21 1964.82 1.00
AI PRO R9700 qwen3 14B Q4_1 2048 pp2048 2047.04 2040.09 1.00
AI PRO R9700 qwen3 14B Q4_K_M 1 pp2048 57.84 58.00 1.00
AI PRO R9700 qwen3 14B Q4_K_M 2 pp2048 96.98 97.19 1.00
AI PRO R9700 qwen3 14B Q4_K_M 4 pp2048 128.57 128.62 1.00
AI PRO R9700 qwen3 14B Q4_K_M 8 pp2048 155.93 156.14 1.00
AI PRO R9700 qwen3 14B Q4_K_M 16 pp2048 542.07 540.84 1.00
AI PRO R9700 qwen3 14B Q4_K_M 32 pp2048 812.98 810.63 1.00
AI PRO R9700 qwen3 14B Q4_K_M 64 pp2048 1144.81 1143.60 1.00
AI PRO R9700 qwen3 14B Q4_K_M 128 pp2048 1343.76 1343.08 1.00
AI PRO R9700 qwen3 14B Q4_K_M 256 pp2048 1588.71 1737.02 1.09
AI PRO R9700 qwen3 14B Q4_K_M 512 pp2048 1711.28 2047.32 1.20
AI PRO R9700 qwen3 14B Q4_K_M 1024 pp2048 1867.44 2393.28 1.28
AI PRO R9700 qwen3 14B Q4_K_M 2048 pp2048 1938.66 2661.19 1.37
AI PRO R9700 qwen3 14B Q5_0 1 pp2048 53.76 53.89 1.00
AI PRO R9700 qwen3 14B Q5_0 2 pp2048 98.05 98.33 1.00
AI PRO R9700 qwen3 14B Q5_0 4 pp2048 168.78 168.39 1.00
AI PRO R9700 qwen3 14B Q5_0 8 pp2048 242.69 241.75 1.00
AI PRO R9700 qwen3 14B Q5_0 16 pp2048 457.95 457.84 1.00
AI PRO R9700 qwen3 14B Q5_0 32 pp2048 739.30 739.72 1.00
AI PRO R9700 qwen3 14B Q5_0 64 pp2048 1189.99 1192.07 1.00
AI PRO R9700 qwen3 14B Q5_0 128 pp2048 1588.94 1596.27 1.00
AI PRO R9700 qwen3 14B Q5_0 256 pp2048 1899.71 1903.30 1.00
AI PRO R9700 qwen3 14B Q5_0 512 pp2048 1953.07 1955.20 1.00
AI PRO R9700 qwen3 14B Q5_0 1024 pp2048 2142.79 2149.88 1.00
AI PRO R9700 qwen3 14B Q5_0 2048 pp2048 2234.45 2235.30 1.00
AI PRO R9700 qwen3 14B Q5_1 1 pp2048 51.49 51.64 1.00
AI PRO R9700 qwen3 14B Q5_1 2 pp2048 95.34 95.42 1.00
AI PRO R9700 qwen3 14B Q5_1 4 pp2048 167.41 167.20 1.00
AI PRO R9700 qwen3 14B Q5_1 8 pp2048 286.20 284.91 1.00
AI PRO R9700 qwen3 14B Q5_1 16 pp2048 521.04 520.87 1.00
AI PRO R9700 qwen3 14B Q5_1 32 pp2048 829.97 829.18 1.00
AI PRO R9700 qwen3 14B Q5_1 64 pp2048 1212.58 1208.33 1.00
AI PRO R9700 qwen3 14B Q5_1 128 pp2048 1418.38 1418.20 1.00
AI PRO R9700 qwen3 14B Q5_1 256 pp2048 1685.55 1686.12 1.00
AI PRO R9700 qwen3 14B Q5_1 512 pp2048 1755.91 1760.00 1.00
AI PRO R9700 qwen3 14B Q5_1 1024 pp2048 1920.31 1922.74 1.00
AI PRO R9700 qwen3 14B Q5_1 2048 pp2048 1989.70 1989.42 1.00
AI PRO R9700 qwen3 14B Q5_K_M 1 pp2048 51.59 51.56 1.00
AI PRO R9700 qwen3 14B Q5_K_M 2 pp2048 90.56 90.42 1.00
AI PRO R9700 qwen3 14B Q5_K_M 4 pp2048 125.00 124.73 1.00
AI PRO R9700 qwen3 14B Q5_K_M 8 pp2048 153.07 152.60 1.00
AI PRO R9700 qwen3 14B Q5_K_M 16 pp2048 512.25 511.70 1.00
AI PRO R9700 qwen3 14B Q5_K_M 32 pp2048 785.92 785.94 1.00
AI PRO R9700 qwen3 14B Q5_K_M 64 pp2048 1129.06 1129.35 1.00
AI PRO R9700 qwen3 14B Q5_K_M 128 pp2048 1306.09 1304.79 1.00
AI PRO R9700 qwen3 14B Q5_K_M 256 pp2048 1540.41 1680.76 1.09
AI PRO R9700 qwen3 14B Q5_K_M 512 pp2048 1665.72 2006.47 1.20
AI PRO R9700 qwen3 14B Q5_K_M 1024 pp2048 1812.42 2364.19 1.30
AI PRO R9700 qwen3 14B Q5_K_M 2048 pp2048 1878.64 2628.45 1.40
AI PRO R9700 qwen3 14B Q6_K 1 pp2048 46.06 46.09 1.00
AI PRO R9700 qwen3 14B Q6_K 2 pp2048 83.32 83.37 1.00
AI PRO R9700 qwen3 14B Q6_K 4 pp2048 125.32 125.16 1.00
AI PRO R9700 qwen3 14B Q6_K 8 pp2048 167.87 167.59 1.00
AI PRO R9700 qwen3 14B Q6_K 16 pp2048 389.45 387.53 1.00
AI PRO R9700 qwen3 14B Q6_K 32 pp2048 572.15 569.51 1.00
AI PRO R9700 qwen3 14B Q6_K 64 pp2048 774.56 773.38 1.00
AI PRO R9700 qwen3 14B Q6_K 128 pp2048 887.25 889.35 1.00
AI PRO R9700 qwen3 14B Q6_K 256 pp2048 986.31 1402.96 1.42
AI PRO R9700 qwen3 14B Q6_K 512 pp2048 1150.54 2020.39 1.76
AI PRO R9700 qwen3 14B Q6_K 1024 pp2048 1202.45 2370.73 1.97
AI PRO R9700 qwen3 14B Q6_K 2048 pp2048 1239.87 2632.27 2.12
AI PRO R9700 qwen3 14B Q8_0 1 pp2048 38.43 38.54 1.00
AI PRO R9700 qwen3 14B Q8_0 2 pp2048 71.81 71.72 1.00
AI PRO R9700 qwen3 14B Q8_0 4 pp2048 128.53 128.50 1.00
AI PRO R9700 qwen3 14B Q8_0 8 pp2048 220.44 220.50 1.00
AI PRO R9700 qwen3 14B Q8_0 16 pp2048 447.53 447.43 1.00
AI PRO R9700 qwen3 14B Q8_0 32 pp2048 779.58 780.27 1.00
AI PRO R9700 qwen3 14B Q8_0 64 pp2048 1229.35 1228.14 1.00
AI PRO R9700 qwen3 14B Q8_0 128 pp2048 1671.64 1671.91 1.00
AI PRO R9700 qwen3 14B Q8_0 256 pp2048 2004.81 2005.77 1.00
AI PRO R9700 qwen3 14B Q8_0 512 pp2048 2125.96 2128.48 1.00
AI PRO R9700 qwen3 14B Q8_0 1024 pp2048 2247.91 2336.46 1.04
AI PRO R9700 qwen3 14B Q8_0 2048 pp2048 2336.96 2605.04 1.11
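The Speedup column is the PR branch's t/s divided by the baseline's t/s for the same row. For example, Q6_K at micro-batch 2048 gives 2632.27 / 1239.87 ≈ 2.12. The minimal Python sketch below recomputes the column as a consistency check. It assumes the flattened row layout above, where the last three fields are baseline t/s, PR t/s and speedup.

```python
def parse_row(line: str) -> dict:
    """Parse a flattened comparison row. GPU and model names contain spaces,
    so the numeric fields are taken from the right-hand end of the line."""
    fields = line.split()
    return {
        "n_ubatch": int(fields[-5]),
        "test": fields[-4],
        "ts_base": float(fields[-3]),
        "ts_pr": float(fields[-2]),
        "speedup": float(fields[-1]),
    }


def speedup(ts_base: float, ts_pr: float) -> float:
    """Throughput of the PR branch divided by the baseline, rounded to 2 decimals."""
    return round(ts_pr / ts_base, 2)


# Three rows copied from the table above.
rows = [
    "AI PRO R9700 qwen3 14B Q6_K 256 pp2048 986.31 1402.96 1.42",
    "AI PRO R9700 qwen3 14B Q6_K 2048 pp2048 1239.87 2632.27 2.12",
    "AI PRO R9700 qwen3 14B Q8_0 1024 pp2048 2247.91 2336.46 1.04",
]

for line in rows:
    r = parse_row(line)
    print(r["n_ubatch"], speedup(r["ts_base"], r["ts_pr"]), "reported:", r["speedup"])
```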

@JohannesGaessler
Contributor

I don't think the kernel selection logic should be changed like this. For batch sizes < 1024 you are reporting at most a marginal speedup, which I don't think is worth the increased memory use from dequantizing the weights to FP16.
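The memory cost mentioned here can be estimated with simple arithmetic. The minimal Python sketch below assumes the BLAS path materializes one FP16 copy of a quantized weight matrix at a time. That per-matrix buffer is an assumption about the implementation, not something stated in this thread. The block sizes are the ggml layouts for q4_0 (18 bytes per 32 weights), q8_0 (34 per 32) and q6_K (210 per 256). The 14336 x 4096 shape is only illustrative, similar to an FFN projection in an 8B-class model.

```python
# (bytes per block, weights per block) for a few ggml quantization types
BLOCK_LAYOUT = {"q4_0": (18, 32), "q8_0": (34, 32), "q6_K": (210, 256)}


def quant_bytes(qtype: str, n_weights: int) -> int:
    """Storage size of n_weights values in the given quantized format."""
    nbytes, per_block = BLOCK_LAYOUT[qtype]
    if n_weights % per_block != 0:
        raise ValueError("weight count must be a multiple of the block size")
    return n_weights // per_block * nbytes


def fp16_bytes(n_weights: int) -> int:
    """Size of a dequantized FP16 copy (2 bytes per weight)."""
    return 2 * n_weights


rows, cols = 14336, 4096  # illustrative matrix shape, not taken from the PR
n = rows * cols
for qtype in BLOCK_LAYOUT:
    q, f = quant_bytes(qtype, n), fp16_bytes(n)
    print(f"{qtype}: quantized {q / 2**20:.1f} MiB, FP16 copy {f / 2**20:.1f} MiB ({f / q:.2f}x)")
```

Under these assumptions, the FP16 copy of a Q6_K matrix is roughly 2.4 times the size of the quantized data it is derived from.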

github-actions bot added the Nvidia GPU (Issues specific to Nvidia GPUs) and ggml (changes relating to the ggml tensor library for machine learning) labels on Jan 13, 2026
@slojosic-amd
Contributor

slojosic-amd commented Jan 15, 2026

@jiachengjason Could you please repeat your testing with -fa 1? I saw that you used -fa 0:
[screenshot]

@JohannesGaessler I built the latest llama.cpp with GGML_HIP_ROCWMMA_FATTN=ON and this PR included. For the Q4_K_M versions of Llama-3.2-3B, Llama-3.1-8B and Qwen3-4B, I got some performance improvement at larger n_ubatch values:

With this PR included:

[three screenshots]

Without this PR included:

[three screenshots]

@JohannesGaessler
Copy link
Contributor

Okay, but clearly this depends on factors that this PR does not account for. As I've said before, the default kernel selection logic should be applicable to the default way to run the software. If it depends on, e.g., environment variables being set, that needs an explicit check in the code.
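One way to read the request for an explicit check is to keep the default selection unchanged and apply the tuned thresholds only behind a visible condition. The minimal Python sketch below illustrates that idea and is hypothetical throughout. `use_mmq`, the `HYPOTHETICAL_PREFER_BLAS` variable and the threshold table are illustrative names, not ggml code. The thresholds are loosely taken from where the R9700 rows above start to show speedups: micro-batch 256 for Q5_K_M and Q6_K, and 1024 for Q8_0.

```python
import os
from typing import Mapping, Optional

# Hypothetical micro-batch sizes at or above which a dequantize + BLAS GEMM
# path would be preferred over MMQ, keyed by tensor type (illustrative values).
BLAS_MIN_UBATCH = {"q5_K": 256, "q6_K": 256, "q8_0": 1024}


def use_mmq(qtype: str, n_ubatch: int, env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True if MMQ should be used for this type and micro-batch size."""
    env = os.environ if env is None else env
    # Explicit, visible check: without the (hypothetical) opt-in variable,
    # the default selection stays MMQ regardless of the tuned thresholds.
    if env.get("HYPOTHETICAL_PREFER_BLAS", "0") != "1":
        return True
    threshold = BLAS_MIN_UBATCH.get(qtype)
    # Types without a tuned threshold keep using MMQ.
    return threshold is None or n_ubatch < threshold


opt_in = {"HYPOTHETICAL_PREFER_BLAS": "1"}
print(use_mmq("q6_K", 512, env={}), use_mmq("q6_K", 512, env=opt_in))
```

Tying the changed behaviour to a named condition keeps the default path identical for users who run with default settings, which is the concern raised above.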

jiachengjason marked this pull request as draft on January 19, 2026 17:32
@jiachengjason
Contributor Author

Did some further tuning so that most models get a significant performance gain for micro-batch sizes > 256 and at micro-batch size 8 (+9% to +230%).

jiachengjason marked this pull request as ready for review on January 26, 2026 23:10
@jiachengjason
Copy link
Contributor Author

Hi @JohannesGaessler, just following up on this PR. As mentioned above, I did some further tuning so that most models get a larger performance gain for micro-batch sizes > 256 and at micro-batch size 8. Thank you.

@JohannesGaessler
Copy link
Contributor

When I do a quick test on my RX 9060 XT:

GPU Model Microbatch size Test t/s b7819 t/s 08d4445 Speedup
RX 9060 XT llama 8B Q6_K 1 pp512 39.43 39.49 1.00
RX 9060 XT llama 8B Q6_K 2 pp512 75.07 75.06 1.00
RX 9060 XT llama 8B Q6_K 4 pp512 126.04 126.68 1.01
RX 9060 XT llama 8B Q6_K 8 pp512 161.52 162.33 1.01
RX 9060 XT llama 8B Q6_K 16 pp512 449.10 449.84 1.00
RX 9060 XT llama 8B Q6_K 32 pp512 605.89 606.89 1.00
RX 9060 XT llama 8B Q6_K 64 pp512 932.50 935.15 1.00
RX 9060 XT llama 8B Q6_K 128 pp512 1159.37 1165.26 1.01
RX 9060 XT llama 8B Q6_K 256 pp512 1255.62 336.54 0.27
RX 9060 XT llama 8B Q6_K 512 pp512 1275.10 351.27 0.28

This is with ROCm 7.1.1 at the default settings and environment variables, where this PR is clearly detrimental. If you are changing anything in your environment, that will need an explicit check in the code.

@jiachengjason
Contributor Author

Hi @JohannesGaessler, I used this build command:
HIPCXX="$(hipconfig -l)/clang" HIP_PATH="$(hipconfig -R)" cmake -S . -B build -DGGML_HIP=ON -DGGML_CUDA_FORCE_MMQ=OFF -DGGML_HIP_UMA=OFF -DGPU_TARGETS="gfx1201" -DGGML_HIP_GRAPHS=OFF -DLLAMA_CURL=OFF -DGGML_CUDA_FORCE_CUBLAS=OFF -DCMAKE_BUILD_TYPE=Release && cmake --build build --config Release -- -j 32

and the following default run command:
HIP_VISIBLE_DEVICES=0 ./build/bin/llama-bench -m /mnt/nas_share/models/gguf/llama-8b/llama-8b-q6_k.gguf -n 0 -ub "1-2048*2"

On ROCm 7.1.1 I don't see the large regression you observed at micro-batch sizes 256 and 512. What build and run commands did you use?

This tuning increases the performance gain when used with flash attention, and should maintain default performance without it.

GPU Model Microbatch size Test t/s master (8bece) t/s fix/jiachengjason/rocm7.x_regression Speedup
AI PRO R9700 llama 8B Q6_K 1 pp512 75.92 75.67 1.00
AI PRO R9700 llama 8B Q6_K 2 pp512 135.33 135.60 1.00
AI PRO R9700 llama 8B Q6_K 4 pp512 215.82 216.67 1.00
AI PRO R9700 llama 8B Q6_K 8 pp512 302.92 304.30 1.00
AI PRO R9700 llama 8B Q6_K 16 pp512 685.74 687.68 1.00
AI PRO R9700 llama 8B Q6_K 32 pp512 47.16 47.33 1.00
AI PRO R9700 llama 8B Q6_K 64 pp512 94.48 94.89 1.00
AI PRO R9700 llama 8B Q6_K 128 pp512 190.08 189.63 1.00
AI PRO R9700 llama 8B Q6_K 256 pp512 376.06 376.35 1.00
AI PRO R9700 llama 8B Q6_K 512 pp512 734.63 737.49 1.00
AI PRO R9700 llama 8B Q6_K 1024 pp512 739.59 739.47 1.00
AI PRO R9700 llama 8B Q6_K 2048 pp512 736.55 736.53 1.00

@JohannesGaessler
Contributor

On Linux 6.12 I used this build command:

cmake -DCMAKE_BUILD_TYPE=Release -DGGML_HIP=ON .. && time cmake --build . -j 32 -- --quiet && echo -e "\a"
export mn=llama_3-8b && export q=q6_k
./build/bin/llama-bench --model models/opt/${mn}-${q}.gguf -fa 1 -r 1 -n 0 -ub "1-512*2" -o sql|sqlite3 llama-bench.sqlite
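The `-ub "1-512*2"` argument sweeps micro-batch sizes geometrically, which is why the tables list 1, 2, 4 and so on, up to the upper bound. The minimal Python sketch below reproduces that expansion. It is not llama-bench's own parser. The `+step` and plain `first-last` forms are included on the assumption that they behave analogously to the `*mult` form used in this thread, and comma-separated lists are not handled.

```python
def expand_range(spec: str) -> list[int]:
    """Expand 'first-last*mult', 'first-last+step', 'first-last' or a single value."""
    if "-" not in spec:
        return [int(spec)]
    first_s, rest = spec.split("-", 1)
    first = int(first_s)
    if "*" in rest:
        op = "*"
    elif "+" in rest:
        op = "+"
    else:
        op, rest = "+", rest + "+1"  # plain 'first-last' steps by 1 (assumed)
    last_s, arg_s = rest.split(op)
    last, arg = int(last_s), int(arg_s)
    # Guard against sequences that would never reach the upper bound.
    if op == "*" and (arg < 2 or first < 1):
        raise ValueError("multiplicative ranges need mult >= 2 and first >= 1")
    if op == "+" and arg < 1:
        raise ValueError("additive ranges need step >= 1")
    values, v = [], first
    while v <= last:
        values.append(v)
        v = v * arg if op == "*" else v + arg
    return values


print(expand_range("1-512*2"))
```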

Looking at the raw numbers, the MMQ performance you're reporting is very bad relative to the specs of the card, so I think something else is wrong.
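A rough way to compare raw throughput with a card's specs is to convert tokens/s into achieved FLOP/s. The common approximation is about 2 FLOPs per weight per token for the matrix multiplications, ignoring attention and other ops. The minimal sketch below assumes about 8.0e9 parameters for "llama 8B". The peak figure to compare against is not given in this thread.

```python
def achieved_tflops(tokens_per_s: float, n_params: float) -> float:
    """Approximate matmul throughput: ~2 FLOPs per weight per token (attention ignored)."""
    return 2.0 * n_params * tokens_per_s / 1e12


N_PARAMS = 8.0e9  # assumed parameter count for "llama 8B"

# Baseline pp512 throughputs at micro-batch 512, quoted earlier in this thread.
samples = {
    "RX 9060 XT (b7819)": 1275.10,
    "AI PRO R9700 (master 8bece)": 734.63,
}
for name, tps in samples.items():
    print(f"{name}: ~{achieved_tflops(tps, N_PARAMS):.1f} TFLOPS")
```

Under this approximation, the two results correspond to roughly 20 and 12 TFLOPS respectively. Comparing such figures with each card's published FP16 peak is one way to judge whether the numbers are plausible.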

@jiachengjason
Copy link
Contributor Author

Hi @JohannesGaessler, running your exact same build and run commands gives me the following results. This is my environment (AMDSMI Tool: 26.2.1+fc0010cf6a | AMDSMI Library version: 26.2.1 | ROCm version: 7.2.0 | amdgpu version: 6.16.6 | hsmp version: N/A)

GPU Model Microbatch size Test t/s b7819 t/s fix/jiachengjason/rocm7.x_regression Speedup
AI PRO R9700 llama 8B Q6_K 1 pp512 25.41 25.45 1.00
AI PRO R9700 llama 8B Q6_K 2 pp512 48.81 48.70 1.00
AI PRO R9700 llama 8B Q6_K 4 pp512 91.34 91.13 1.00
AI PRO R9700 llama 8B Q6_K 8 pp512 166.96 168.41 1.01
AI PRO R9700 llama 8B Q6_K 16 pp512 263.67 266.10 1.01
AI PRO R9700 llama 8B Q6_K 32 pp512 183.16 183.50 1.00
AI PRO R9700 llama 8B Q6_K 64 pp512 643.70 646.30 1.00
AI PRO R9700 llama 8B Q6_K 128 pp512 823.50 823.80 1.00
AI PRO R9700 llama 8B Q6_K 256 pp512 1077.52 929.57 0.86
AI PRO R9700 llama 8B Q6_K 512 pp512 1218.08 1655.18 1.36
