chore(args): mark tbq3_0 / tbq4_0 KV-cache as experimental (closes #70) #71
Open
marksverdhei wants to merge 1 commit into
Measured perplexity on Qwen3.5-0.8B-BF16 / wikitext-2 / ctx=512:

| cache-type | PPL     | vs f16    |
|------------|---------|-----------|
| f16        | 19.08   | baseline  |
| q8_0       | 19.08   | lossless  |
| tbq3_0     | 1252.30 | 65x worse |
| tbq4_0     | 1393.00 | 73x worse |

TBQ KV-cache produces near-random output. Likely root cause is statistical: TBQ's rotated-domain codebook was calibrated for weight distributions, not the K/V tensor distributions seen during inference. The encoding scheme itself cannot faithfully represent KV values.

Snoop-kube's cluster audit confirms zero deployments use tbq* KV-cache (every host uses q8_0 or q4_0). DFlash also defaults to q8_0 (PR #65). No production consumer exists.

This PR adds a one-line experimental note to the --cache-type-k/v and --cache-type-k-draft/v-draft help text, referencing issue #70 for the full data + recommendation. The code path stays in place: Markus may have roadmap intent I'm not aware of, and this just stops anyone reading --help from assuming tbq* is a usable choice without checking.

Follow-ups if Markus prefers full removal:

* drop tbq3_0/tbq4_0 from common/arg.cpp's kv_cache_types list
* keep the ftypes (TBQ weight quantization is separate from KV use)
* close issues ggml-org#124 + ggml-org#125 as wont-fix
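For illustration only, the first follow-up (dropping the tbq* entries from the list of accepted KV-cache types) could look roughly like the sketch below. It assumes the list is a plain vector of type names; `without_tbq` is a hypothetical helper, and the real kv_cache_types list in common/arg.cpp may be typed and laid out differently.

```cpp
#include <algorithm>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper (not part of this PR or of llama.cpp):
// returns a copy of `types` without any entry whose name starts with "tbq".
std::vector<std::string> without_tbq(const std::vector<std::string>& types) {
    std::vector<std::string> kept;
    std::copy_if(types.begin(), types.end(), std::back_inserter(kept),
                 [](const std::string& t) {
                     // rfind(prefix, 0) == 0 is true only when t starts with prefix
                     return t.rfind("tbq", 0) != 0;
                 });
    return kept;
}
```

Filtering by prefix keeps the change local to the CLI list, so the weight-quantization ftypes mentioned in the second follow-up would be unaffected.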
This was referenced Jun 5, 2026
Closes #70. One-line note in 4 cache-type help strings.
What
Adds an experimental marker to the help text of --cache-type-k, --cache-type-v, --cache-type-k-draft, and --cache-type-v-draft.
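As a rough sketch of the idea, and not the actual diff, appending such a marker to an existing help string might look like this. The helper name `with_experimental_note` and the note wording are assumptions made for illustration; the real strings live in common/arg.cpp and may be phrased differently.

```cpp
#include <string>

// Hypothetical helper: appends an experimental warning to an existing help string.
// The wording below is illustrative, not the text added by this PR.
std::string with_experimental_note(const std::string& help,
                                   const std::string& issue_ref) {
    return help + "\nnote: tbq3_0 / tbq4_0 are experimental for KV-cache use (see " +
           issue_ref + ")";
}
```

Keeping the note in one helper would let all four flags share the same wording, so a later change to the recommendation touches a single place.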
Why
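Measured PPL on Qwen3.5-0.8B-BF16 / wikitext-2 / ctx=512:

| cache-type | PPL     | vs f16    |
|------------|---------|-----------|
| f16        | 19.08   | baseline  |
| q8_0       | 19.08   | lossless  |
| tbq3_0     | 1252.30 | 65x worse |
| tbq4_0     | 1393.00 | 73x worse |
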
TBQ KV-cache produces near-random output. Full data + cluster audit + likely root cause in issue #70.
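For readers who want to check the "vs f16" figures, here is a minimal sketch of the arithmetic. It uses the standard definition of perplexity, exp of the negative mean token log-probability, and a plain ratio against the f16 baseline. The function names are illustrative and are not taken from the llama.cpp perplexity tool.

```cpp
#include <cmath>
#include <limits>
#include <vector>

// Perplexity from per-token natural-log probabilities: exp(-mean(log p)).
// Returns NaN for an empty input, where the mean is undefined.
double perplexity(const std::vector<double>& token_logprobs) {
    if (token_logprobs.empty()) {
        return std::numeric_limits<double>::quiet_NaN();
    }
    double sum = 0.0;
    for (double lp : token_logprobs) {
        sum += lp;
    }
    return std::exp(-sum / static_cast<double>(token_logprobs.size()));
}

// How many times higher a cache type's perplexity is than the baseline's.
double ppl_ratio(double ppl, double baseline_ppl) {
    return ppl / baseline_ppl;
}
```

With the measured values, `ppl_ratio(1252.30, 19.08)` is about 65.6 and `ppl_ratio(1393.00, 19.08)` is about 73.0, in line with the reported 65x and 73x.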
Why not just remove
Markus may have roadmap intent I'm not aware of (TBQ landed via PR #52 with substantial CPU + CUDA kernel work). This PR is the cheapest reversible step: the code stays, but anyone reading --help now knows tbq* is experimental and is pointed to the data in issue #70. Markus can choose to escalate (remove from CLI / rip code entirely) as a follow-up.
Verified
[ 0%] Built target llama-common-base
[ 0%] Built target cpp-httplib
[ 3%] Built target ggml-base
[ 9%] Built target ggml-cpu
[ 11%] Built target ggml
[ 68%] Built target llama
[ 79%] Built target llama-common
[ 79%] Building CXX object tools/mtmd/CMakeFiles/mtmd.dir/mtmd-audio.cpp.o
[ 79%] Building CXX object tools/mtmd/CMakeFiles/mtmd.dir/mtmd.cpp.o
[ 79%] Building CXX object tools/mtmd/CMakeFiles/mtmd.dir/mtmd-image.cpp.o
[ 79%] Linking CXX executable llama-ui-embed
[ 79%] Built target llama-ui-embed
[ 79%] Building CXX object tools/mtmd/CMakeFiles/mtmd.dir/mtmd-helper.cpp.o
[ 79%] Building CXX object tools/mtmd/CMakeFiles/mtmd.dir/clip.cpp.o
[ 79%] Provisioning UI assets
-- UI: using committed heierchat snapshot from /home/me/ht/forks/ht-llama.cpp/tools/server/public
[ 79%] Built target llama-ui-assets
[ 81%] Building CXX object tools/mtmd/CMakeFiles/mtmd.dir/models/cogvlm.cpp.o
[ 81%] Building CXX object tools/ui/CMakeFiles/llama-ui.dir/ui.cpp.o
[ 81%] Building CXX object tools/mtmd/CMakeFiles/mtmd.dir/models/conformer.cpp.o
[ 83%] Linking CXX static library libllama-ui.a
[ 83%] Built target llama-ui
[ 83%] Building CXX object tools/mtmd/CMakeFiles/mtmd.dir/models/dotsocr.cpp.o
[ 83%] Building CXX object tools/mtmd/CMakeFiles/mtmd.dir/models/exaone4_5.cpp.o
[ 83%] Building CXX object tools/mtmd/CMakeFiles/mtmd.dir/models/gemma4a.cpp.o
[ 85%] Building CXX object tools/mtmd/CMakeFiles/mtmd.dir/models/gemma4v.cpp.o
[ 85%] Building CXX object tools/mtmd/CMakeFiles/mtmd.dir/models/gemma4ua.cpp.o
[ 85%] Building CXX object tools/mtmd/CMakeFiles/mtmd.dir/models/gemma4uv.cpp.o
[ 85%] Building CXX object tools/mtmd/CMakeFiles/mtmd.dir/models/glm4v.cpp.o
[ 85%] Building CXX object tools/mtmd/CMakeFiles/mtmd.dir/models/granite-speech.cpp.o
[ 87%] Building CXX object tools/mtmd/CMakeFiles/mtmd.dir/models/hunyuanvl.cpp.o
[ 87%] Building CXX object tools/mtmd/CMakeFiles/mtmd.dir/models/internvl.cpp.o
[ 87%] Building CXX object tools/mtmd/CMakeFiles/mtmd.dir/models/kimivl.cpp.o
[ 87%] Building CXX object tools/mtmd/CMakeFiles/mtmd.dir/models/kimik25.cpp.o
[ 87%] Building CXX object tools/mtmd/CMakeFiles/mtmd.dir/models/nemotron-v2-vl.cpp.o
[ 87%] Building CXX object tools/mtmd/CMakeFiles/mtmd.dir/models/llama4.cpp.o
[ 88%] Building CXX object tools/mtmd/CMakeFiles/mtmd.dir/models/llava.cpp.o
[ 88%] Building CXX object tools/mtmd/CMakeFiles/mtmd.dir/models/minicpmv.cpp.o
[ 88%] Building CXX object tools/mtmd/CMakeFiles/mtmd.dir/models/paddleocr.cpp.o
[ 88%] Building CXX object tools/mtmd/CMakeFiles/mtmd.dir/models/pixtral.cpp.o
[ 88%] Building CXX object tools/mtmd/CMakeFiles/mtmd.dir/models/qwen2vl.cpp.o
[ 90%] Building CXX object tools/mtmd/CMakeFiles/mtmd.dir/models/qwen3vl.cpp.o
[ 90%] Building CXX object tools/mtmd/CMakeFiles/mtmd.dir/models/mimovl.cpp.o
[ 90%] Building CXX object tools/mtmd/CMakeFiles/mtmd.dir/models/qwen3a.cpp.o
[ 90%] Building CXX object tools/mtmd/CMakeFiles/mtmd.dir/models/step3vl.cpp.o
[ 90%] Building CXX object tools/mtmd/CMakeFiles/mtmd.dir/models/siglip.cpp.o
[ 92%] Building CXX object tools/mtmd/CMakeFiles/mtmd.dir/models/whisper-enc.cpp.o
[ 92%] Building CXX object tools/mtmd/CMakeFiles/mtmd.dir/models/deepseekocr.cpp.o
[ 92%] Building CXX object tools/mtmd/CMakeFiles/mtmd.dir/models/deepseekocr2.cpp.o
[ 92%] Building CXX object tools/mtmd/CMakeFiles/mtmd.dir/models/mobilenetv5.cpp.o
[ 92%] Building CXX object tools/mtmd/CMakeFiles/mtmd.dir/models/youtuvl.cpp.o
[ 94%] Building CXX object tools/mtmd/CMakeFiles/mtmd.dir/models/yasa2.cpp.o
[ 94%] Linking CXX shared library ../../bin/libmtmd.so
[ 94%] Built target mtmd
[ 94%] Building CXX object tools/server/CMakeFiles/server-context.dir/server-task.cpp.o
[ 94%] Building CXX object tools/server/CMakeFiles/server-context.dir/server-queue.cpp.o
[ 94%] Building CXX object tools/server/CMakeFiles/server-context.dir/server-chat.cpp.o
[ 94%] Building CXX object tools/server/CMakeFiles/server-context.dir/server-common.cpp.o
[ 96%] Building CXX object tools/server/CMakeFiles/server-context.dir/server-context.cpp.o
[ 96%] Building CXX object tools/server/CMakeFiles/server-context.dir/server-tools.cpp.o
[ 96%] Linking CXX static library libserver-context.a
[ 96%] Built target server-context
[ 96%] Building CXX object tools/server/CMakeFiles/llama-server-impl.dir/server-http.cpp.o
[ 96%] Building CXX object tools/server/CMakeFiles/llama-server-impl.dir/server-models.cpp.o
[ 96%] Building CXX object tools/server/CMakeFiles/llama-server-impl.dir/server.cpp.o
[ 98%] Linking CXX shared library ../../bin/libllama-server-impl.so
[ 98%] Built target llama-server-impl
[100%] Building CXX object tools/server/CMakeFiles/llama-server.dir/main.cpp.o
[100%] Linking CXX executable ../../bin/llama-server
[100%] Built target llama-server
Out of scope (separate decisions for Markus)