docs : update ZenDNN docs for Q8 support #23791

Merged
CISC merged 5 commits into ggml-org:master from truecoder34:docs-zendnn-q8-support
May 31, 2026

Conversation

Contributor

truecoder34 commented May 27, 2026

Overview

This PR updates docs/backend/ZenDNN.md to reflect the recently merged Q8_0 support in the ggml-zendnn backend.

Additional information

Hi @z-vishal, @z-sachin, would you be so kind as to review the proposed updates?

Requirements

github-actions bot added the documentation label May 27, 2026
Contributor

z-vishal commented May 30, 2026

Hi @truecoder34, could you please have a look at the CONTRIBUTING.md guidelines for commit messages and PR titles? For this PR, something like `docs : update ZenDNN docs for Q8 support` would fit better :)
Thanks for updating the doc.

truecoder34 changed the title from "[ZenDNN] docs zendnn added information about Q8 support" to "docs : update ZenDNN docs for Q8 support" May 30, 2026
@truecoder34
Contributor Author

@avinashcpandey @Jiten1parmar @z-vishal, could you please check the doc updates? If they look OK, can we call for approvers?

Comment thread docs/backend/ZenDNN.md Outdated
|:----------------------:|:-------:|:---------------------------------------------:|
| FP32 | Support | Full precision floating point |
| BF16 | Support | BFloat16 (best performance on Zen 4/Zen 5) |
| Q8_0 | Support | Quantized 8-bit weights accelerated through ZenDNN MulMat branch |
Contributor

Small suggestion: maybe we can change the note to "8-bit quantized weights via dynamic quantization" and link directly to the ZenDNN doc for more context?

Contributor Author

Yes, that sounds good; a direct link to the ZenDNN doc is useful.

Comment thread docs/backend/ZenDNN.md Outdated

- **BF16** provides best performance on Zen 4 and Zen 5 EPYC™ processors (Genoa, Turin).
- **Q8_0** support is available for quantized model weights in supported ZenDNN Mul Mat branch.
- Other quantization formats may fall back to the standard CPU backend unless explicitly supported by the ZenDNN backend.
Contributor

Small suggestion: "may fall back" feels a bit misleading, since other formats always fall back to the CPU backend. What do you think?

Contributor Author

You are right, it might lead to misunderstanding: "may fall back" --> "fall back".

Comment thread docs/backend/ZenDNN.md Outdated
*Notes:*

- **BF16** provides best performance on Zen 4 and Zen 5 EPYC™ processors (Genoa, Turin).
- **Q8_0** support is available for quantized model weights in supported ZenDNN Mul Mat branch.
Contributor

"supported ZenDNN Mul Mat branch" is a bit unclear: Q8_0 works through ZenDNN's dynamic quantization in the LowOHA MatMul op, not a specific branch. What do you think?

Contributor Author

Tried to provide a better explanation and added a direct link to the MatMul op.
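
For readers unfamiliar with the format discussed in this thread: in ggml, Q8_0 stores weights in blocks of 32 values, each block holding one half-precision scale and 32 signed 8-bit integers. The snippet below is a minimal Python sketch of that block layout, for illustration only. The function names `quantize_q8_0` and `dequantize_q8_0` are hypothetical, the arithmetic runs in double precision rather than float32, and nothing here depicts how ZenDNN's MatMul op handles or dynamically quantizes data internally.

```python
import math
import struct

QK8_0 = 32  # number of values per Q8_0 block in ggml


def quantize_q8_0(values):
    """Quantize floats into Q8_0-style blocks of (fp16 scale, list of int8)."""
    if len(values) % QK8_0 != 0:
        raise ValueError("length must be a multiple of 32")
    blocks = []
    for start in range(0, len(values), QK8_0):
        block = values[start:start + QK8_0]
        amax = max(abs(v) for v in block)        # largest magnitude in the block
        d = amax / 127.0                         # scale mapping amax to 127
        inv_d = 1.0 / d if d != 0.0 else 0.0     # all-zero block: avoid dividing by 0
        qs = []
        for v in block:
            # round half away from zero, then clamp to the int8 range used here
            q = int(math.copysign(math.floor(abs(v * inv_d) + 0.5), v))
            qs.append(max(-127, min(127, q)))
        # the scale is stored in half precision, so round-trip it through fp16
        d16 = struct.unpack("<e", struct.pack("<e", d))[0]
        blocks.append((d16, qs))
    return blocks


def dequantize_q8_0(blocks):
    """Reconstruct approximate floats: each value is scale * int8."""
    return [d * q for d, qs in blocks for q in qs]
```

Dequantizing returns each value to within roughly half a scale step of the original, which is why Q8_0 is generally treated as a near-lossless weight format.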

Contributor Author

truecoder34 commented May 30, 2026

Hi team, @ggerganov, @CISC, could you please approve the doc changes, so that the docs and the implementation are on the same page?
Thank you in advance.

truecoder34 force-pushed the docs-zendnn-q8-support branch from dc8b180 to 3d57d0c May 30, 2026 22:15
@taronaeo
Member

We'll approve when @z-vishal is satisfied with the changes. Please wait for his approval :)

CISC merged commit e6123e2 into ggml-org:master May 31, 2026
3 checks passed
o7si added a commit to o7si/llama.cpp that referenced this pull request May 31, 2026
…wercase

* upstream/master: (27 commits)
  vocab : add tokenizer support for jina-embeddings-v2-base-zh (ggml-org#18756)
  ui: fix ETag truncation with MSVC compiler (ggml-org#23917)
  docs : update ZenDNN docs for Q8 support (ggml-org#23791)
  llama: only use one iGPU device by default (ggml-org#23897)
  webui: add custom CSS injection via config (ggml-org#23904)
  Support `-fa auto` in llama-bench (ggml-org#23714)
  opencl: support bf16 by converting to f16 (ggml-org#23839)
  ui: exclude generated build dirs from prettier and eslint so lint errors stop being masked (ggml-org#23910)
  TP: fix granularity for Qwen 3.5/3.6 + 3 GPUs (ggml-org#23843)
  metal : restore im2col implementation for large kernels (ggml-org#23901)
  test: (test-llama-archs) log the config name first (ggml-org#23885)
  ci : update ios-xcode release job to macos-26 (ggml-org#23906)
  ggml : add some lsx support (ggml-org#23798)
  vulkan: add Flash Attention support for BFloat16 KV cache (ggml-org#23420)
  ci : fix s390x release job (ggml-org#23898)
  ci : clear cache instead of "no timestamp" keys + fix macos (ggml-org#23895)
  llama : do not skip iGPU when only RPC devices are present (ggml-org#23868)
  server: in SSE mode, send HTTP headers when slot starts (ggml-org#23884)
  ggml-webgpu: Check earlier for WebGPU required features (ggml-org#23879)
  ggml-webgpu: add q4_0/q8_0 SET_ROWS (ggml-org#23760)
  ...

# Conflicts:
#	gguf-py/gguf/vocab.py
#	src/llama-vocab.cpp
turbo-tan pushed a commit to turbo-tan/llama.cpp-tq3 that referenced this pull request Jun 2, 2026
* docs zendnn added information about Q8 support

* docs zendnn rm unnecessary data

* docs update, links to ZenDNN docs provided

* docs zenDNN update: clarified explanation

* docs zenDNN update: one more explanation clarified

---------

Co-authored-by: plotnikov.v10 <plotnikov.v10@wb.ru>
Labels

documentation Improvements or additions to documentation

4 participants