docs : update ZenDNN docs for Q8 support #23791
Hi @truecoder34, could you please have a look at the
@avinashcpandey @Jiten1parmar @z-vishal, could you please check the doc updates? If they look OK, can we call for approvers?
|:----------------------:|:-------:|:---------------------------------------------:|
| FP32 | Support | Full precision floating point |
| BF16 | Support | BFloat16 (best performance on Zen 4/Zen 5) |
| Q8_0 | Support | Quantized 8-bit weights accelerated through ZenDNN MulMat branch |
Small suggestion: maybe we can change the note to "8-bit quantized weights via dynamic quantization"? Linking directly to the ZenDNN docs would give more context.
Yes, that sounds good; a direct link to the ZenDNN docs is useful.
- **BF16** provides best performance on Zen 4 and Zen 5 EPYC™ processors (Genoa, Turin).
- **Q8_0** support is available for quantized model weights in supported ZenDNN Mul Mat branch.
- Other quantization formats may fall back to the standard CPU backend unless explicitly supported by the ZenDNN backend.
Small suggestion: "may fall back" feels a bit misleading, since other formats always fall back to the CPU backend. What do you think?
You are right, that might lead to misunderstanding. Changed "may fall back" to "fall back".
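To make the point of this thread concrete, here is a minimal sketch of the fallback rule as discussed: weight types listed in the table under review go to ZenDNN, everything else always goes to the CPU backend. The names `ZENDNN_TYPES` and `pick_backend` are hypothetical and are not taken from the ggml-zendnn source.

```python
# Illustration only: hypothetical names, not the ggml-zendnn dispatch code.
# The set mirrors the data types listed as supported in the table under review.
ZENDNN_TYPES = {"FP32", "BF16", "Q8_0"}


def pick_backend(weight_type: str) -> str:
    """Return the backend that would handle a matmul with this weight type.

    Unsupported types do not "maybe" fall back; they always fall back to CPU.
    """
    return "zendnn" if weight_type.upper() in ZENDNN_TYPES else "cpu"


if __name__ == "__main__":
    for t in ("FP32", "BF16", "Q8_0", "Q4_K"):
        print(t, "->", pick_backend(t))
```

With this rule the wording "fall back" (rather than "may fall back") matches the behavior: the outcome depends only on whether the type is in the supported set.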
*Notes:*

- **BF16** provides best performance on Zen 4 and Zen 5 EPYC™ processors (Genoa, Turin).
- **Q8_0** support is available for quantized model weights in supported ZenDNN Mul Mat branch.
"supported ZenDNN Mul Mat branch" is a bit unclear: Q8_0 works through ZenDNN's dynamic quantization in the LowOHA MatMul op, not a specific branch. What do you think?
Tried to provide a better explanation and added a direct link to the MatMul op.
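For readers unfamiliar with the format discussed here, the following is a rough sketch of how ggml's Q8_0 stores weights: blocks of 32 values, each block holding one half-precision scale and 32 signed 8-bit integers. It illustrates only the weight format; it makes no claim about how ZenDNN's LowOHA MatMul op performs dynamic quantization internally. The function names are hypothetical, and Python's `round` uses round-half-to-even, which can differ from C's `roundf` on exact ties.

```python
import struct

QK8_0 = 32  # number of values per Q8_0 block in ggml


def to_fp16(x: float) -> float:
    """Round-trip a float through IEEE half precision, as the stored scale would be."""
    return struct.unpack("e", struct.pack("e", x))[0]


def quantize_q8_0(values):
    """Quantize a flat list of floats into (scale, int8 list) blocks of 32 values."""
    if len(values) % QK8_0 != 0:
        raise ValueError("length must be a multiple of 32")
    blocks = []
    for start in range(0, len(values), QK8_0):
        chunk = values[start:start + QK8_0]
        amax = max(abs(v) for v in chunk)   # largest magnitude in the block
        d = amax / 127.0                    # scale so that amax maps to +/-127
        inv = 1.0 / d if d else 0.0         # all-zero block: quantize to zeros
        qs = [int(round(v * inv)) for v in chunk]
        blocks.append((to_fp16(d), qs))
    return blocks


def dequantize_q8_0(blocks):
    """Reconstruct approximate floats: value = scale * quantized integer."""
    return [d * q for d, qs in blocks for q in qs]


if __name__ == "__main__":
    data = [float(i - 16) for i in range(32)]
    restored = dequantize_q8_0(quantize_q8_0(data))
    print(max(abs(a - b) for a, b in zip(data, restored)))
```

The per-block scale is what keeps the error small: each group of 32 weights is scaled by its own maximum, so a few large weights elsewhere in the tensor do not reduce precision for the rest.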
Hi team, @ggerganov, @CISC, could you please approve the doc changes, so that the docs and the implementation are on the same page?
dc8b180 to 3d57d0c
We'll approve when @z-vishal is satisfied with the changes. Please wait for his approval :)
…wercase

* upstream/master: (27 commits)
  vocab : add tokenizer support for jina-embeddings-v2-base-zh (ggml-org#18756)
  ui: fix ETag truncation with MSVC compiler (ggml-org#23917)
  docs : update ZenDNN docs for Q8 support (ggml-org#23791)
  llama: only use one iGPU device by default (ggml-org#23897)
  webui: add custom CSS injection via config (ggml-org#23904)
  Support `-fa auto` in llama-bench (ggml-org#23714)
  opencl: support bf16 by converting to f16 (ggml-org#23839)
  ui: exclude generated build dirs from prettier and eslint so lint errors stop being masked (ggml-org#23910)
  TP: fix granularity for Qwen 3.5/3.6 + 3 GPUs (ggml-org#23843)
  metal : restore im2col implementation for large kernels (ggml-org#23901)
  test: (test-llama-archs) log the config name first (ggml-org#23885)
  ci : update ios-xcode release job to macos-26 (ggml-org#23906)
  ggml : add some lsx support (ggml-org#23798)
  vulkan: add Flash Attention support for BFloat16 KV cache (ggml-org#23420)
  ci : fix s390x release job (ggml-org#23898)
  ci : clear cache instead of "no timestamp" keys + fix macos (ggml-org#23895)
  llama : do not skip iGPU when only RPC devices are present (ggml-org#23868)
  server: in SSE mode, send HTTP headers when slot starts (ggml-org#23884)
  ggml-webgpu: Check earlier for WebGPU required features (ggml-org#23879)
  ggml-webgpu: add q4_0/q8_0 SET_ROWS (ggml-org#23760)
  ...

# Conflicts:
#	gguf-py/gguf/vocab.py
#	src/llama-vocab.cpp
* docs zendnn added information about Q8 support
* docs zendnn rm unnecessary data
* docs update, links to ZenDNN docs provided
* docs zenDNN update: clarified explanation
* docs zenDNN update: one more explanation clarified

---------

Co-authored-by: plotnikov.v10 <plotnikov.v10@wb.ru>
Overview
This PR updates docs/ZenDNN.md following the recently merged Q8_0 support in the ggml-zendnn backend.
Additional information
Hi @z-vishal, @z-sachin, would you be so kind as to review the proposed updates?
Requirements