ggml-zendnn : add ZenDNN backend for AMD CPUs #17690
base: master
Conversation
I was thinking of creating a backend with https://github.com/amd/blis (with FBGEMM), but ZenDNN is good too.
Can you also include the benchmark results from #17684 in this PR?
@taronaeo Updated the PR description with benchmark results.
@Djip007 Thanks! AMD BLIS is actually what ZenDNN uses under the hood.
return &ggml_backend_zendnn_device;
GGML_UNUSED(reg);
It looks like both reg and index are used, so these GGML_UNUSED calls are not needed.
ZenDNN provides optimized deep learning primitives for AMD EPYC™ CPUs. It accelerates matrix multiplication operations for inference workloads.
### Compilation
I needed to install LIBXSMM to compile: `sudo apt-get install libxsmm-dev`. Perhaps this should be mentioned somewhere if this is the case.
static bool ggml_zendnn_sgemm(ggml_backend_zendnn_context * ctx, int64_t m, int64_t n, int64_t k,
                              const void * A, int64_t lda, const void * B, int64_t ldb, void * C,
                              int64_t ldc, int Atype, int Btype, int Ctype) {
Nit: The coding convention is to use snake_case, so perhaps something like A_type for the parameters instead?
This PR adds ZenDNN backend support for accelerated inference on AMD EPYC™ CPUs.
Background
ZenDNN is AMD's optimized deep learning library for EPYC processors, providing high-performance primitives for inference workloads. It uses the LowOHA (Low Overhead High-performance) MatMul operator for efficient matrix multiplication.
Changes
Backend implementation:
- ggml/src/ggml-zendnn/: GGML_OP_MUL_MAT acceleration using ZenDNN primitives

Build system:
- -DGGML_ZENDNN=ON
- -DGGML_ZENDNN_PATH=/path/to/zendnn

Documentation:
- docs/backend/ZenDNN.md
- docs/build.md

Hardware Support
Performance Notes
- export ZENDNNL_MATMUL_ALGO=2 (Blocked AOCL BLIS backend)

Testing
Tested on AMD EPYC systems with llama-server and llama-cli using various models (LLaMA, Mistral, Qwen).
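For readers who want to try the backend, the build flags and runtime knob described in this PR can be combined roughly as follows. This is only a sketch: the install path /opt/zendnn and the model filename are placeholders, not values from the PR.

```shell
# Configure and build llama.cpp with the ZenDNN backend enabled.
# /opt/zendnn is a placeholder for the local ZenDNN install path.
cmake -B build -DGGML_ZENDNN=ON -DGGML_ZENDNN_PATH=/opt/zendnn
cmake --build build --config Release -j

# Select the Blocked AOCL BLIS matmul algorithm (the setting used
# for the benchmarks below) before launching llama-cli/llama-server.
export ZENDNNL_MATMUL_ALGO=2
./build/bin/llama-cli -m model.gguf -p "Hello"
```

Note that ZENDNNL_MATMUL_ALGO is read from the environment at run time, so it can be changed between runs without rebuilding.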
Performance Results
Test Configuration
- ZENDNNL_MATMUL_ALGO=2 (Blocked AOCL BLIS)

Benchmark Results
LLaMA 3.1 8B (BF16)
LLaMA 3.1 8B (F32)
Qwen2 7B (BF16)
Qwen2 7B (F32)
LLaMA 2 7B (BF16)
LLaMA 2 7B (F32)
LLaMA 2 13B (BF16)
LLaMA 2 13B (F32)
Mixtral 8x7B (BF16)
Key Observations:
Related
AI usage disclosure: AI assistance was used for documentation writing, formatting and CMake syntax. All code logic, implementation decisions, backend integration, and testing were done manually. The core ZenDNN backend implementation, performance optimizations, and benchmark testing were human-authored and validated.