UPSTREAM PR #17690: ggml-zendnn : add ZenDNN backend for AMD CPUs #402
Explore the complete analysis inside the Version Insights
Performance Analysis Summary - PR #402: ZenDNN Backend Integration
Overview
This PR adds ZenDNN backend support for AMD EPYC CPUs through 19,728 additions across 12 files. The changes introduce a new backend registration path without modifying core inference functions. Analysis shows a startup latency increase with no impact on inference performance.
Key Findings
Performance-Critical Area: Backend Initialization
The function [...]
Inference Performance Impact
No changes detected in core inference functions: [...]
Power Consumption Analysis
Binary-level analysis shows minimal power consumption changes: libggml.so decreased by 6.6 nJ and llama-bench increased by 31 nJ. The net change is effectively zero, indicating no measurable power impact from the backend registration code itself.
Code Changes
The implementation adds ZenDNN to the backend registry through two paths: static registration via [...]
Mirrored from ggml-org/llama.cpp#17690
This PR adds ZenDNN backend support for accelerated inference on AMD EPYC™ CPUs.
Background
ZenDNN is AMD's optimized deep learning library for EPYC processors, providing high-performance primitives for inference workloads. It uses the LowOHA (Low Overhead High-performance) MatMul operator for efficient matrix multiplication.
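For orientation, the operation being accelerated can be written as a naive reference loop. This is only a sketch of the arithmetic, not ZenDNN's or ggml's implementation: the function name `mul_mat_reference` and the flat row-major `std::vector<float>` layout are assumptions made for illustration. It follows the ggml `GGML_OP_MUL_MAT` convention, in which both inputs store rows of length k and each output element is the dot product of one row from each input.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical reference helper (not part of ggml or ZenDNN).
// a: n rows of k floats (e.g. weights), row-major
// b: m rows of k floats (e.g. activations), row-major
// Returns m rows of n floats, where out[j * n + i] = dot(a row i, b row j).
std::vector<float> mul_mat_reference(const std::vector<float>& a,
                                     const std::vector<float>& b,
                                     std::size_t n, std::size_t m, std::size_t k) {
    assert(a.size() == n * k);
    assert(b.size() == m * k);

    std::vector<float> out(m * n, 0.0f);
    for (std::size_t j = 0; j < m; ++j) {          // each row of b
        for (std::size_t i = 0; i < n; ++i) {      // each row of a
            float acc = 0.0f;
            for (std::size_t p = 0; p < k; ++p) {  // shared inner dimension
                acc += a[i * k + p] * b[j * k + p];
            }
            out[j * n + i] = acc;
        }
    }
    return out;
}
```

An optimized backend replaces this triple loop with blocked, vectorized kernels; a reference like this is mainly useful for checking results on small inputs.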
Changes
Backend implementation:
- ggml/src/ggml-zendnn/
- GGML_OP_MUL_MAT acceleration using ZenDNN primitives
Build system:
- -DGGML_ZENDNN=ON
- -DGGML_ZENDNN_PATH=/path/to/zendnn
Documentation:
- docs/backend/ZenDNN.md
- docs/build.md
Hardware Support
Performance Notes
- export ZENDNNL_MATMUL_ALGO=2 (Blocked AOCL BLIS backend)
Testing
Tested on AMD EPYC systems with llama-server and llama-cli using various models (LLaMA, Mistral, Qwen).
Related
AI usage disclosure: AI assistance was used for documentation writing, formatting and CMake syntax. All code logic, implementation decisions, backend integration, and testing were done manually. The core ZenDNN backend implementation, performance optimizations, and benchmark testing were human-authored and validated.