UPSTREAM PR #17690: ggml-zendnn : add ZenDNN backend for AMD CPUs by loci-dev · Pull Request #402 · auroralabs-loci/llama.cpp

loci-dev · 2025-12-02T14:40:24Z

This PR adds ZenDNN backend support for accelerated inference on AMD EPYC™ CPUs.

Background

ZenDNN is AMD's optimized deep learning library for EPYC processors, providing high-performance primitives for inference workloads. It uses the LowOHA (Low Overhead High-performance) MatMul operator for efficient matrix multiplication.

Changes

Backend implementation:
- New ZenDNN backend in ggml/src/ggml-zendnn/
- Implements GGML_OP_MUL_MAT acceleration using ZenDNN primitives
- Supports FP32 and BF16 data types
- Auto-converts types
Build system:
- CMake integration with automatic download/build option: -DGGML_ZENDNN=ON
- Custom installation path support: -DGGML_ZENDNN_PATH=/path/to/zendnn
- Uses ZenDNN's CMake package for clean dependency management
Documentation:
- Comprehensive backend documentation in docs/backend/ZenDNN.md
- Build instructions added to docs/build.md
- Covers hardware support, setup, performance tuning, and profiling

Hardware Support

AMD EPYC 9005 Series (Turin/Zen 5)
AMD EPYC 9004 Series (Zen 4) - Recommended (best BF16 performance)
AMD EPYC 7003 Series (Milan/Zen 3)
AMD Ryzen AI MAX (Strix Halo)

Performance Notes

Best performance with export ZENDNNL_MATMUL_ALGO=2 (Blocked AOCL BLIS backend)
Optimized for BF16 inference on Zen 4/5 processors
Automatic parallel dispatch using OpenMP

Testing

Tested on AMD EPYC systems with llama-server and llama-cli using various models (LLaMA, Mistral, Qwen).

AI usage disclosure: AI assistance was used for documentation writing, formatting and CMake syntax. All code logic, implementation decisions, backend integration, and testing were done manually. The core ZenDNN backend implementation, performance optimizations, and benchmark testing were human-authored and validated.

loci-review · 2025-12-02T15:49:00Z

Explore the complete analysis inside the Version Insights

Performance Analysis Summary - PR #402: ZenDNN Backend Integration

Overview

This PR adds ZenDNN backend support for AMD EPYC CPUs through 19,728 additions across 12 files. The changes introduce a new backend registration path without modifying core inference functions. Analysis shows startup latency increase with no impact on inference performance.

Key Findings

Performance-Critical Area: Backend Initialization

The function ggml_backend_load_all_from_path in ggml/src/ggml-backend-reg.cpp shows response time increase of 133,671 ns due to the added ggml_backend_load_best("zendnn", ...) call. This breaks down as: filesystem scan (30,000 ns), dynamic library loading (50,000 ns), symbol resolution (15,000 ns), and backend initialization (38,000 ns). The throughput time increased by 5 ns from conditional compilation logic. This is a one-time startup cost that does not affect runtime inference.

Inference Performance Impact

No changes detected in core inference functions: llama_decode, llama_encode, or llama_tokenize. These functions remain unmodified with identical response time and throughput metrics between versions. Tokens per second performance is unchanged as the ZenDNN integration only affects backend registration during initialization, not the inference execution path.

Power Consumption Analysis

Binary-level analysis shows minimal power consumption changes: libggml.so decreased by 6.6 nJ and llama-bench increased by 31 nJ. The net change is effectively zero, indicating no measurable power impact from the backend registration code itself.

Code Changes

The implementation adds ZenDNN to the backend registry through two paths: static registration via ggml_backend_registry() constructor when compiled with GGML_USE_ZENDNN, and dynamic loading via ggml_backend_load_all_from_path(). The backend is positioned second in loading priority after BLAS. No modifications to public API headers or core inference logic were made, ensuring backward compatibility.

loci-dev temporarily deployed to PROD__AL_DEMO December 2, 2025 14:40 — with GitHub Actions Inactive

loci-dev force-pushed the main branch 28 times, most recently from bdacbc7 to ca9e0d2 Compare December 5, 2025 00:36

loci-dev force-pushed the main branch 27 times, most recently from 048ad94 to 6c1fde6 Compare February 3, 2026 13:32

loci-dev force-pushed the main branch 3 times, most recently from ef7afbe to d4c3480 Compare February 14, 2026 02:17

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

UPSTREAM PR #17690: ggml-zendnn : add ZenDNN backend for AMD CPUs#402

UPSTREAM PR #17690: ggml-zendnn : add ZenDNN backend for AMD CPUs#402
loci-dev wants to merge 3 commits intomainfrom
upstream-PR17690-branch_z-vishal-ggml-zendnn

loci-dev commented Dec 2, 2025

Uh oh!

loci-review bot commented Dec 2, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

loci-dev commented Dec 2, 2025

Background

Changes

Hardware Support

Performance Notes

Testing

Related

Uh oh!

loci-review bot commented Dec 2, 2025

Performance Analysis Summary - PR #402: ZenDNN Backend Integration

Overview

Key Findings

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants