kleidiai: add CPU feature detection to CI run script #20394

Merged: ggerganov merged 3 commits into ggml-org:master from martin-klacer-arm:feature/ci_script_cpu_detection on Apr 1, 2026

@martin-klacer-arm
Contributor

This patch adds CPU feature detection to the KleidiAI build in the CI run script. Previously, the -march flags for the KleidiAI build were chosen based only on what the compiler supports, without checking which features the CPU running the build actually has. This patch addresses that.
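
To make the distinction concrete, here is a minimal sketch of CPU-aware flag selection in a shell script. It is not the code from this patch: the helper name `march_from_features`, the `armv8.2-a` baseline, and the choice of which features to map are all assumptions made for illustration. It relies on the `Features` line that Linux exposes in `/proc/cpuinfo` on AArch64 and on the `+dotprod`, `+i8mm`, `+sve` and `+sme` extension modifiers accepted by GCC and Clang.

```shell
#!/bin/sh
# Hypothetical helper (not from the patch): map a Linux AArch64 "Features"
# string to a -march value. The armv8.2-a baseline is an assumption.
march_from_features() {
    features=" $1 "          # pad with spaces so whole tokens can be matched
    march="armv8.2-a"
    case "$features" in *" asimddp "*) march="${march}+dotprod" ;; esac
    case "$features" in *" i8mm "*)    march="${march}+i8mm" ;; esac
    case "$features" in *" sve "*)     march="${march}+sve" ;; esac
    case "$features" in *" sme "*)     march="${march}+sme" ;; esac
    printf '%s\n' "$march"
}

# Read the first "Features" line of the running CPU, if there is one.
cpu_features=""
if [ -r /proc/cpuinfo ]; then
    cpu_features=$(sed -n 's/^Features[[:space:]]*:[[:space:]]*//p' /proc/cpuinfo | head -n 1)
fi
echo "-march=$(march_from_features "$cpu_features")"
```

A compiler-only check would add every extension the toolchain accepts. The sketch instead adds an extension only when the host CPU reports it, so the resulting binary should not use instructions the CI machine cannot execute.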

Signed-off-by: Martin Klacer <martin.klacer@arm.com>
Change-Id: I663adc3a7691a98e7dac5488962c13cc344f034a
@github-actions (bot) added the labels "python" (python script changes) and "devops" (improvements to build systems and github actions) on Mar 11, 2026
Comment thread requirements/requirements-tool_bench.txt Outdated
Signed-off-by: Martin Klacer <martin.klacer@arm.com>
@martin-klacer-arm requested a review from CISC on March 11, 2026 16:51
@martin-klacer-arm
Contributor Author

Hi, just wondering if there are any updates on this? I'd appreciate another look if you have the time. Thank you.

@ggerganov
Member

Could you try to utilize and extend (if needed) the existing GGML_CPU_ALL_VARIANTS functionality to support this?

```cmake
ggml_add_backend(CPU)
if (GGML_CPU_ALL_VARIANTS)
    if (NOT GGML_BACKEND_DL)
        message(FATAL_ERROR "GGML_CPU_ALL_VARIANTS requires GGML_BACKEND_DL")
    elseif (GGML_CPU_ARM_ARCH)
        message(FATAL_ERROR "Cannot use both GGML_CPU_ARM_ARCH and GGML_CPU_ALL_VARIANTS")
    endif()
    if (GGML_SYSTEM_ARCH STREQUAL "x86")
        ggml_add_cpu_backend_variant(x64)
        ggml_add_cpu_backend_variant(sse42        SSE42)
        ggml_add_cpu_backend_variant(sandybridge  SSE42 AVX)
        if (NOT MSVC)
            # __FMA__ and __F16C__ are not defined in MSVC, however they are implied with AVX2/AVX512
            ggml_add_cpu_backend_variant(ivybridge    SSE42 AVX F16C)
            ggml_add_cpu_backend_variant(piledriver   SSE42 AVX F16C FMA)
        endif()
        ggml_add_cpu_backend_variant(haswell      SSE42 AVX F16C FMA AVX2 BMI2)
        ggml_add_cpu_backend_variant(skylakex     SSE42 AVX F16C FMA AVX2 BMI2 AVX512)
        ggml_add_cpu_backend_variant(cannonlake   SSE42 AVX F16C FMA AVX2 BMI2 AVX512 AVX512_VBMI)
        ggml_add_cpu_backend_variant(cascadelake  SSE42 AVX F16C FMA AVX2 BMI2 AVX512 AVX512_VNNI)
        ggml_add_cpu_backend_variant(icelake      SSE42 AVX F16C FMA AVX2 BMI2 AVX512 AVX512_VBMI AVX512_VNNI)
        if (NOT MSVC)
            # MSVC 2022 doesn't support BF16 intrinsics without `/arch:AVX10.1` ?!
            # https://learn.microsoft.com/en-us/cpp/intrinsics/x64-amd64-intrinsics-list?view=msvc-170
            # https://learn.microsoft.com/en-us/cpp/build/reference/arch-x64?view=msvc-170
            ggml_add_cpu_backend_variant(cooperlake   SSE42 AVX F16C FMA AVX2 BMI2 AVX512 AVX512_VNNI AVX512_BF16)
            ggml_add_cpu_backend_variant(zen4         SSE42 AVX F16C FMA AVX2 BMI2 AVX512 AVX512_VBMI AVX512_VNNI AVX512_BF16)
        endif()
        ggml_add_cpu_backend_variant(alderlake    SSE42 AVX F16C FMA AVX2 BMI2 AVX_VNNI)
        if (NOT MSVC)
            # MSVC doesn't support AMX
            ggml_add_cpu_backend_variant(sapphirerapids SSE42 AVX F16C FMA AVX2 BMI2 AVX512 AVX512_VBMI AVX512_VNNI AVX512_BF16 AMX_TILE AMX_INT8)
        endif()
    elseif(GGML_SYSTEM_ARCH STREQUAL "ARM")
        if (CMAKE_SYSTEM_NAME MATCHES "Linux")
            # Many of these features are optional so we build versions with popular
            # combinations and name the backends based on the version they were
            # first released with
            ggml_add_cpu_backend_variant(armv8.0_1)
            ggml_add_cpu_backend_variant(armv8.2_1    DOTPROD)
            ggml_add_cpu_backend_variant(armv8.2_2    DOTPROD FP16_VECTOR_ARITHMETIC)
            ggml_add_cpu_backend_variant(armv8.2_3    DOTPROD FP16_VECTOR_ARITHMETIC SVE)
            ggml_add_cpu_backend_variant(armv8.6_1    DOTPROD FP16_VECTOR_ARITHMETIC SVE MATMUL_INT8)
            ggml_add_cpu_backend_variant(armv8.6_2    DOTPROD FP16_VECTOR_ARITHMETIC SVE MATMUL_INT8 SVE2)
            ggml_add_cpu_backend_variant(armv9.2_1    DOTPROD FP16_VECTOR_ARITHMETIC SVE MATMUL_INT8 SME)
            ggml_add_cpu_backend_variant(armv9.2_2    DOTPROD FP16_VECTOR_ARITHMETIC SVE MATMUL_INT8 SVE2 SME)
        elseif (CMAKE_SYSTEM_NAME MATCHES "Android")
            # Android-specific backends with SoC-compatible feature sets
            ggml_add_cpu_backend_variant(android_armv8.0_1)
            ggml_add_cpu_backend_variant(android_armv8.2_1    DOTPROD)
            ggml_add_cpu_backend_variant(android_armv8.2_2    DOTPROD FP16_VECTOR_ARITHMETIC)
            ggml_add_cpu_backend_variant(android_armv8.6_1    DOTPROD FP16_VECTOR_ARITHMETIC MATMUL_INT8)
            ggml_add_cpu_backend_variant(android_armv9.0_1    DOTPROD MATMUL_INT8 FP16_VECTOR_ARITHMETIC SVE2)
            ggml_add_cpu_backend_variant(android_armv9.2_1    DOTPROD MATMUL_INT8 FP16_VECTOR_ARITHMETIC SVE SME)
            ggml_add_cpu_backend_variant(android_armv9.2_2    DOTPROD MATMUL_INT8 FP16_VECTOR_ARITHMETIC SVE SVE2 SME)
        elseif (APPLE)
            ggml_add_cpu_backend_variant(apple_m1     DOTPROD)
            ggml_add_cpu_backend_variant(apple_m2_m3  DOTPROD MATMUL_INT8)
            ggml_add_cpu_backend_variant(apple_m4     DOTPROD MATMUL_INT8 NOSVE SME)
        else()
            message(FATAL_ERROR "Unsupported ARM target OS: ${CMAKE_SYSTEM_NAME}")
        endif()
    elseif (GGML_SYSTEM_ARCH STREQUAL "PowerPC")
        if (CMAKE_SYSTEM_NAME MATCHES "Linux")
            ggml_add_cpu_backend_variant(power0)
            ggml_add_cpu_backend_variant(power7_1 POWER7)
            ggml_add_cpu_backend_variant(power7_2 POWER7  VSX)
            ggml_add_cpu_backend_variant(power8_1 POWER8)
            ggml_add_cpu_backend_variant(power8_2 POWER8  VSX)
            ggml_add_cpu_backend_variant(power9   POWER9  VSX)
            ggml_add_cpu_backend_variant(power10  POWER10 VSX)
            ggml_add_cpu_backend_variant(power11  POWER11 VSX)
        else()
            message(FATAL_ERROR "Unsupported PowerPC target OS: ${CMAKE_SYSTEM_NAME}")
        endif()
    elseif (GGML_SYSTEM_ARCH STREQUAL "s390x")
        if (CMAKE_SYSTEM_NAME MATCHES "Linux")
            ggml_add_cpu_backend_variant(z15 Z15 VXE2)
            ggml_add_cpu_backend_variant(z16 Z16 VXE2 NNPA)
        else()
            message(FATAL_ERROR "Unsupported s390x target OS: ${CMAKE_SYSTEM_NAME}")
        endif()
    elseif (GGML_SYSTEM_ARCH STREQUAL "riscv64")
        if (CMAKE_SYSTEM_NAME MATCHES "Linux")
            ggml_add_cpu_backend_variant(riscv64_0)
            ggml_add_cpu_backend_variant(riscv64_v RVV)
        else()
            message(FATAL_ERROR "Unsupported RISC-V target OS: ${CMAKE_SYSTEM_NAME}")
        endif()
    else()
        message(FATAL_ERROR "GGML_CPU_ALL_VARIANTS not yet supported with ${GGML_SYSTEM_ARCH} on ${CMAKE_SYSTEM_NAME}")
    endif()
elseif (GGML_CPU)
    ggml_add_cpu_backend_variant_impl("")
endif()
```

The idea is to avoid adding feature detection logic in the ci/run.sh script and instead consolidate it in the existing CMake implementation for building the possible CPU backends.
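
As a rough illustration of that split, a CI script could pass only high-level switches and leave every per-feature decision to CMake. The sketch below is an assumption about how such a script might look, not the contents of ci/run.sh: the helper name `cmake_variant_args` is hypothetical, and only the two options named in the CMake snippet above are used. A real build may need further options (for example, to enable KleidiAI or to turn off native tuning), which are deliberately left out here.

```shell
#!/bin/sh
# Hypothetical helper (not from ci/run.sh): the script carries no -march
# logic and only emits the switches that let CMake build all CPU variants.
cmake_variant_args() {
    # GGML_CPU_ALL_VARIANTS requires GGML_BACKEND_DL (see the FATAL_ERROR
    # check in the CMake snippet), so the two are always emitted together.
    printf '%s\n' \
        "-DGGML_BACKEND_DL=ON" \
        "-DGGML_CPU_ALL_VARIANTS=ON"
}

# Print the configure command instead of running it; a real script would
# execute it from the repository root. Word splitting here is intentional.
echo cmake -B build $(cmake_variant_args)
```

With this arrangement the list of feature combinations lives in one place, the `ggml_add_cpu_backend_variant` calls, and the script does not have to be updated when a new variant is added.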

 * As per the maintainers' suggestion, removed cpu feature detection
   from CI run script as CMake handles it already

Signed-off-by: Martin Klacer <martin.klacer@arm.com>
@martin-klacer-arm
Contributor Author

Thank you for the comment! That's a very good point. After further investigation, I verified that the CMake implementation already handles this satisfactorily, and that the feature detection and -march setting code that existed before my patch is actually redundant.

In the latest patch, I removed all of the previously existing CPU -march detection. This addresses the same issue as my original patch did and is, I assume, the preferred option.

@ggerganov
Member

@CISC Do you know if I can start the CI workflow on this branch? Or do I need to clone it in a repo branch to do that?

@CISC
Member

CISC commented Mar 26, 2026

> @CISC Do you know if I can start the CI workflow on this branch? Or do I need to clone it in a repo branch to do that?

I have not found a way to manually trigger workflows on PR-branches, would like to know myself. :)

@ggerganov
Member

Running here: https://github.com/ggml-org/llama.cpp/actions/runs/23847023744/job/69516801996. If green, we can merge.

@ggerganov merged commit 6de97b9 into ggml-org:master on Apr 1, 2026
64 of 73 checks passed
slartibardfast pushed a commit to slartibardfast/llama.cpp that referenced this pull request Apr 12, 2026
* kleidiai: add cpu feature detection to CI run script

Signed-off-by: Martin Klacer <martin.klacer@arm.com>
Change-Id: I663adc3a7691a98e7dac5488962c13cc344f034a

* kleidiai: revert unrelated requirements change

Signed-off-by: Martin Klacer <martin.klacer@arm.com>

* kleidiai: removed cpu feature detection from CI run script

 * As per the maintainers' suggestion, removed cpu feature detection
   from CI run script as CMake handles it already

Signed-off-by: Martin Klacer <martin.klacer@arm.com>

---------

Signed-off-by: Martin Klacer <martin.klacer@arm.com>
Seunghhon pushed a commit to Seunghhon/llama.cpp that referenced this pull request Apr 26, 2026
rsenthilkumar6 pushed a commit to rsenthilkumar6/llama.cpp that referenced this pull request May 1, 2026
ljubomirj pushed a commit to ljubomirj/llama.cpp that referenced this pull request May 6, 2026