ggml-cpu: Faster IQ1 mul_mat_vec on AVX2 using BMI2 instructions #12154
slaren merged 6 commits into ggml-org:master
Please also add an option to enable it manually, add a check in cpu-feats-x86.cpp, and add it to the CPU variant list in llama.cpp/ggml/src/CMakeLists.txt (lines 308 to 312 in cc473ca). You could also check for Zen 2 in |
|
https://github.com/zwegner/zp7 Integrating something like the ZP7 (Zach's Peppy Parallel-Prefix-Popcountin' PEXT/PDEP Polyfill) into llama.cpp could be a smart way to address the performance issues with PDEP and PEXT on AMD Zen 2 and earlier CPUs while maintaining compatibility and efficiency across platforms. Just a polite suggestion. |
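To make the polyfill idea concrete, here is a minimal sketch of software PDEP and PEXT in C. It is a plain bit-by-bit loop, not the parallel-prefix algorithm that ZP7 uses, and the function names `pdep_u64_portable` and `pext_u64_portable` are hypothetical (they appear in neither llama.cpp nor ZP7). It only illustrates what the two instructions compute.

```c
#include <stdint.h>

/* Hypothetical portable PDEP: deposit the low bits of src, in order,
 * into the positions where mask has set bits. */
static uint64_t pdep_u64_portable(uint64_t src, uint64_t mask) {
    uint64_t result = 0;
    for (uint64_t bit = 1; mask != 0; bit <<= 1) {
        uint64_t lowest = mask & (~mask + 1); /* isolate lowest set bit of mask */
        if (src & bit) {
            result |= lowest;
        }
        mask &= mask - 1; /* clear that bit and move to the next one */
    }
    return result;
}

/* Hypothetical portable PEXT: gather the bits of src selected by mask
 * and pack them into the low bits of the result. */
static uint64_t pext_u64_portable(uint64_t src, uint64_t mask) {
    uint64_t result = 0;
    for (uint64_t bit = 1; mask != 0; bit <<= 1) {
        uint64_t lowest = mask & (~mask + 1);
        if (src & lowest) {
            result |= bit;
        }
        mask &= mask - 1;
    }
    return result;
}
```

Each loop runs once per set bit in the mask, so this form is only meant to show the semantics; a real polyfill would likely need something faster.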
dd8f10c to d1aeed0
|
Updated with the CMakeLists changes (no Zen 2-specific case; a separate PR could perhaps add AMD microarchitectures). |
071c312 to a3db575
|
Looks good, thanks. It would also be necessary to add a [...] here: llama.cpp/ggml/src/ggml-cpu/ggml-cpu.cpp, line 488 in 1a24c46. I suspect that MSVC will enable BMI2 with [...]: llama.cpp/ggml/src/ggml-cpu/CMakeLists.txt, lines 209 to 212 in a3db575. I can check for you if you don't have access to a machine with MSVC. |
|
Done. |
|
13900k:
|
…l-org#12154)
* ggml-cpu: Faster IQ1 mul_mat_vec on AVX2 using BMI2 instructions
* cmake: Add GGML_BMI2 build option
* ggml: enable BMI2 on relevant CPU variants
* ggml-cpu: include BMI2 in backend score
* ggml-cpu: register BMI2 in ggml_backend_cpu_get_features
* ggml-cpu: add __BMI2__ define when using MSVC
|
Hello @slaren, @remyoudompheng, it seems that after this PR the x86 AVX2 build with MSVC is failing:
cmake command:
Do you have any recommendation on how to fix this issue? |
Never mind, I just disabled BMI2 support on Win32 using |
|
Hey guys, I'm having issues with this commit and I don't know why. I put together all the relevant information and what I could find about the issue. I tried compiling with various CUDA versions and worked my way to the current commit. |
|
Just a heads up: I can confirm that the BMI2 detection is probably wrong, because it forces BMI2 on a non-BMI2 CPU. |
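For anyone who wants to check what their own CPU reports, the following is a minimal sketch of a runtime BMI2 query in C. It assumes GCC or Clang on x86 (it relies on `__get_cpuid_count` from `<cpuid.h>`) and is not the detection code used by ggml; the names `ebx_reports_bmi2` and `cpu_has_bmi2` are hypothetical. BMI2 support is reported in CPUID leaf 7, subleaf 0, EBX bit 8.

```c
#include <stdbool.h>
#include <stdint.h>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>
#define HAVE_GNU_CPUID 1
#endif

/* Hypothetical helper: decode the BMI2 flag (bit 8) from the EBX value
 * returned by CPUID leaf 7, subleaf 0. */
static bool ebx_reports_bmi2(uint32_t leaf7_ebx) {
    return ((leaf7_ebx >> 8) & 1u) != 0;
}

/* Hypothetical helper: ask the running CPU whether it reports BMI2.
 * Returns false on non-x86 targets or compilers without <cpuid.h>. */
static bool cpu_has_bmi2(void) {
#if defined(HAVE_GNU_CPUID)
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx)) {
        return false; /* leaf 7 is not available on this CPU */
    }
    return ebx_reports_bmi2(ebx);
#else
    return false;
#endif
}
```

A check like this only answers whether the hardware supports BMI2. A binary compiled with BMI2 enabled can still fault on a CPU where this returns false, which would match the symptom described above.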


AFAIK the CPU backend does not contain any x86 BMI2 instructions yet.
Is it fine to introduce code using BMI2 instructions?
Is it fine to simply rely on the `__BMI2__` macro, since the "NATIVE" build is now the standard?

Some numbers on Zen 4 (new code is about 50% faster)
Note that some older CPUs (AMD Zen 2 and earlier) support BMI2 but emulate some of its instructions (PDEP and PEXT) in microcode, resulting in catastrophic slowdowns. Owners of such hardware would need to disable BMI2 manually in the compiler using `-mno-bmi2`.
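As an illustration of the compile-time approach asked about above, here is a minimal sketch in C of guarding a BMI2 code path with `__BMI2__` and keeping a scalar fallback for builds without it (for example with `-mno-bmi2`). The helper `spread_bits_to_bytes` is hypothetical and is not taken from the PR; it only shows the pattern, using `_pdep_u64` from `<immintrin.h>` to place each of 8 input bits in the lowest bit of its own byte.

```c
#include <stdint.h>

/* _pdep_u64 is only available on 64-bit x86 targets built with BMI2. */
#if defined(__BMI2__) && (defined(__x86_64__) || defined(_M_X64))
#include <immintrin.h>
#define USE_BMI2_PATH 1
#endif

/* Hypothetical helper: bit i of `bits` becomes the lowest bit of byte i
 * in the 64-bit result. */
static uint64_t spread_bits_to_bytes(uint8_t bits) {
#if defined(USE_BMI2_PATH)
    /* One PDEP deposits the 8 source bits at bit positions 0, 8, ..., 56. */
    return _pdep_u64((uint64_t)bits, 0x0101010101010101ULL);
#else
    /* Scalar fallback used when BMI2 is disabled or unavailable. */
    uint64_t out = 0;
    for (int i = 0; i < 8; i++) {
        out |= (uint64_t)((bits >> i) & 1u) << (8 * i);
    }
    return out;
#endif
}
```

Both paths return the same value, so the choice only affects speed. Because the selection happens at compile time, the resulting binary must run on a CPU that matches the flags it was built with.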