docs : update ZenDNN docs for Q8 support by truecoder34 · Pull Request #23791 · ggml-org/llama.cpp

truecoder34 · 2026-05-27T19:12:42Z

Overview

This PR updates docs/ZenDNN.md following the recently merged Q8_0 support in the ggml-zendnn backend.

Additional information

Hi @z-vishal, @z-sachin, could you please review the proposed updates?

Requirements

I have read and agree with the contributing guidelines
AI usage disclosure:

z-vishal · 2026-05-30T00:11:58Z

Hi @truecoder34, could you please have a look at the CONTRIBUTING.md guidelines for commit messages and PR titles? For this PR, something like "docs : update ZenDNN docs for Q8 support" would fit better :)
Thanks for updating the doc.

truecoder34 · 2026-05-30T10:37:01Z

@avinashcpandey @Jiten1parmar @z-vishal, could you please check the doc updates? If they look OK, can we call for approvers?

z-vishal · 2026-05-30T18:17:31Z

 |:----------------------:|:-------:|:---------------------------------------------:|
 | FP32                   | Support | Full precision floating point                 |
 | BF16                   | Support | BFloat16 (best performance on Zen 4/Zen 5)    |
+| Q8_0                   | Support | Quantized 8-bit weights accelerated through ZenDNN MulMat branch |


Small suggestion: maybe we can change the note to "8-bit quantized weights via dynamic quantization" and link directly to the ZenDNN doc for more context?

Yes, that sounds good; a direct link to the ZenDNN doc is useful.
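For readers unfamiliar with the format named in the table row, here is a minimal sketch of how one Q8_0 block is formed. It assumes the usual ggml layout (32 values per block, one fp16 scale plus 32 signed 8-bit quants); the function names are illustrative and are not part of ggml or ZenDNN.

```python
import struct

QK8_0 = 32  # values per Q8_0 block (ggml convention)


def to_fp16(x):
    """Round a Python float to half precision and back (the scale is stored as fp16)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def quantize_q8_0_block(values):
    """Quantize one block of 32 floats into (fp16 scale, list of int8 quants).

    Illustrative helper, not a ggml function. Python's round() uses
    round-half-to-even, which can differ from C's roundf() on exact ties.
    """
    if len(values) != QK8_0:
        raise ValueError(f"expected {QK8_0} values, got {len(values)}")
    amax = max(abs(v) for v in values)
    d = amax / 127.0                      # scale so the largest value maps to +/-127
    inv = 1.0 / d if d else 0.0
    qs = [max(-127, min(127, round(v * inv))) for v in values]
    return to_fp16(d), qs


def dequantize_q8_0_block(d, qs):
    """Recover approximate floats: each value is scale * quant."""
    return [d * q for q in qs]


if __name__ == "__main__":
    block = [float(i - 16) for i in range(QK8_0)]
    d, qs = quantize_q8_0_block(block)
    restored = dequantize_q8_0_block(d, qs)
    print("scale:", d)
    print("max abs error:", max(abs(a - b) for a, b in zip(block, restored)))
```

Each block therefore occupies 2 bytes of scale plus 32 bytes of quants, i.e. 8.5 bits per weight.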

z-vishal · 2026-05-30T18:29:09Z


 - **BF16** provides best performance on Zen 4 and Zen 5 EPYC™ processors (Genoa, Turin).
+- **Q8_0** support is available for quantized model weights in supported ZenDNN Mul Mat branch.
+- Other quantization formats may fall back to the standard CPU backend unless explicitly supported by the ZenDNN backend.


Small suggestion: "may fall back" feels a bit misleading, since other formats always fall back to the CPU backend. What do you think?

You are right, it might lead to misunderstanding. Changing "may fall back" to "fall back".
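The point of this thread, that the fallback is unconditional rather than optional, can be sketched as a simple routing rule. The set of supported types is taken from the doc table under review (FP32, BF16, Q8_0); the function and constant names are hypothetical and do not reflect the actual ggml-zendnn code.

```python
# Weight types the doc table lists as supported by the ZenDNN backend.
# Names below are hypothetical, for illustration only.
ZENDNN_SUPPORTED_TYPES = frozenset({"FP32", "BF16", "Q8_0"})


def pick_backend(weight_type):
    """Return which backend handles a tensor of the given weight type.

    Any type not in the supported set always goes to the CPU backend;
    there is no "maybe" in the decision.
    """
    return "zendnn" if weight_type in ZENDNN_SUPPORTED_TYPES else "cpu"


if __name__ == "__main__":
    for t in ("FP32", "BF16", "Q8_0", "Q4_K", "Q5_0"):
        print(f"{t:>5} -> {pick_backend(t)}")
```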

z-vishal · 2026-05-30T18:38:19Z

 *Notes:*

 - **BF16** provides best performance on Zen 4 and Zen 5 EPYC™ processors (Genoa, Turin).
+- **Q8_0** support is available for quantized model weights in supported ZenDNN Mul Mat branch.


"supported ZenDNN Mul Mat branch" is a bit unclear: Q8_0 works through ZenDNN's dynamic quantization in the LowOHA MatMul op, not a specific branch. What do you think?

Tried to provide a better explanation and added a direct link to the MatMul op.
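As background on the term used in this thread, the following is a generic sketch of dynamic quantization in a matrix multiply: weights are already stored as 8-bit integers with scales, activations are quantized at run time, the dot products are accumulated in integers, and the result is rescaled to floats. It is not ZenDNN's LowOHA MatMul implementation; all names are illustrative, and it uses one scale per row for brevity, whereas Q8_0 keeps one scale per 32-value block.

```python
def quantize_row(row):
    """Symmetric 8-bit quantization of one row: returns (scale, int quants)."""
    amax = max((abs(v) for v in row), default=0.0)
    scale = amax / 127.0
    inv = 1.0 / scale if scale else 0.0
    return scale, [round(v * inv) for v in row]


def dynamic_quant_matmul(activations, weights_q, weight_scales):
    """Compute activations (M x K floats) times the transpose of the weights.

    weights_q holds N rows of K pre-quantized integers and weight_scales
    holds their N scales. Returns an M x N list of floats.
    """
    out = []
    for row in activations:
        a_scale, a_q = quantize_row(row)  # the "dynamic" step, done at run time
        out_row = []
        for w_q, w_scale in zip(weights_q, weight_scales):
            acc = sum(a * w for a, w in zip(a_q, w_q))  # integer accumulation
            out_row.append(acc * a_scale * w_scale)     # rescale to float
        out.append(out_row)
    return out


if __name__ == "__main__":
    weights = [[1.0, 2.0], [-1.0, 0.5]]
    packed = [quantize_row(w) for w in weights]
    scales = [s for s, _ in packed]
    quants = [q for _, q in packed]
    print(dynamic_quant_matmul([[0.5, -1.0]], quants, scales))
```

The float reference for the sample input is [-1.5, -1.0]; the quantized result should land close to it, with the gap bounded by the 8-bit rounding error.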

truecoder34 · 2026-05-30T22:11:31Z

Hi team, @ggerganov, @CISC, could you please approve the doc changes, so that the docs and the implementation are on the same page?
Thank you in advance.

taronaeo · 2026-05-31T01:15:02Z

We'll approve when @z-vishal is satisfied with the changes. Please wait for his approval :)

* docs zendnn added information about Q8 support
* docs zendnn rm unnecessary data
* docs update, links to ZenDNN docs provided
* docs zenDNN update: clarified explanation
* docs zenDNN update: one more explanation clarified

---------

Co-authored-by: plotnikov.v10 <plotnikov.v10@wb.ru>

github-actions Bot added the documentation Improvements or additions to documentation label May 27, 2026

truecoder34 changed the title ~~[ZenDNN] docs zendnn added information about Q8 support~~ docs : update ZenDNN docs for Q8 support May 30, 2026

z-vishal reviewed May 30, 2026

plotnikov.v10 added 4 commits May 31, 2026 01:03

docs zendnn added information about Q8 support

fdaf196

docs zendnn rm unnecessary data

63b95f4

docs update, links to ZenDNN docs provided

6fec66c

docs zenDNN update: clarified explanation

738a887

docs zenDNN update: one more explanation clarified

3d57d0c

truecoder34 force-pushed the docs-zendnn-q8-support branch from dc8b180 to 3d57d0c Compare May 30, 2026 22:15

z-vishal approved these changes May 31, 2026

taronaeo approved these changes May 31, 2026

CISC approved these changes May 31, 2026

CISC merged commit e6123e2 into ggml-org:master May 31, 2026
3 checks passed

CISC merged 5 commits into ggml-org:master from truecoder34:docs-zendnn-q8-support
