chore(args): mark tbq3_0 / tbq4_0 KV-cache as experimental (closes #70) by marksverdhei · Pull Request #71 · heiervang-technologies/ht-llama.cpp

marksverdhei · 2026-06-04T21:22:08Z

Closes #70. Adds a one-line note to 4 cache-type help strings.

What

Adds an experimental marker to the help text of --cache-type-k, --cache-type-v, --cache-type-k-draft, and --cache-type-v-draft:

KV cache data type for K
allowed values: f32, f16, bf16, q8_0, q4_0, q4_1, iq4_nl, q5_0, q5_1, tbq3_0, tbq4_0
(default: f16)
note: tbq3_0 / tbq4_0 are experimental — measured ~65-73x worse perplexity vs q8_0 on Qwen3.5-0.8B (issue #70)
(env: LLAMA_ARG_CACHE_TYPE_K)
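A minimal, self-contained C++ sketch of how a help string of this shape could be assembled, with the experimental note appended after the default. The helper `build_cache_type_help` and its parameters are hypothetical; the real string presumably lives in common/arg.cpp and is built by llama.cpp's own argument framework, which this sketch does not reproduce. The note wording is shortened here.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper (not the real common/arg.cpp code): joins the allowed
// cache types and appends the experimental note added by this PR.
std::string build_cache_type_help(const std::string & tensor,   // "K" or "V"
                                  const std::vector<std::string> & types,
                                  const std::string & def,
                                  const std::string & env) {
    assert(!types.empty() && "need at least one allowed cache type");
    std::string allowed;
    for (size_t i = 0; i < types.size(); ++i) {
        if (i > 0) {
            allowed += ", ";
        }
        allowed += types[i];
    }
    return "KV cache data type for " + tensor + "\n"
           "allowed values: " + allowed + "\n"
           "(default: " + def + ")\n"
           "note: tbq3_0 / tbq4_0 are experimental (issue #70)\n"
           "(env: " + env + ")";
}
```

Generating the note from one helper would keep the four flags (K, V, and their draft variants) from drifting apart; editing four string literals by hand, as a one-line chore, is the cheaper alternative.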

Why

Measured PPL on Qwen3.5-0.8B-BF16 / wikitext-2 / ctx=512:

| cache-type | PPL | vs f16 |
|------------|---------|-----------|
| f16 | 19.08 | baseline |
| q8_0 | 19.08 | lossless |
| tbq3_0 | 1252.30 | 65x worse |
| tbq4_0 | 1393.00 | 73x worse |

TBQ KV-cache produces near-random output. Full data, the cluster audit, and the likely root cause are in issue #70.
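The "vs f16" column is a plain ratio against the f16 baseline. Because perplexity is conventionally exp(mean per-token negative log-likelihood), the log of that ratio is the increase in mean NLL. A minimal C++ sketch of both computations; the helper names `ppl_ratio` and `nll_increase_nats` are illustrative and not part of llama.cpp.

```cpp
#include <cassert>
#include <cmath>

// Ratio of a cache type's perplexity to the f16 baseline
// (the "vs f16" column of the table above).
double ppl_ratio(double ppl, double baseline_ppl) {
    assert(baseline_ppl > 0.0 && "baseline perplexity must be positive");
    return ppl / baseline_ppl;
}

// PPL = exp(mean NLL), so log(PPL / PPL_baseline) is the increase in
// mean per-token negative log-likelihood, measured in nats.
double nll_increase_nats(double ppl, double baseline_ppl) {
    return std::log(ppl_ratio(ppl, baseline_ppl));
}
```

With the measured numbers, `ppl_ratio(1252.30, 19.08)` is about 65.6 and `ppl_ratio(1393.00, 19.08)` about 73.0, matching the rounded 65x and 73x above; the corresponding mean-NLL increases are roughly 4.18 and 4.29 nats per token.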

Why not just remove

Markus may have roadmap intent I'm not aware of (TBQ landed via PR #52 with substantial CPU + CUDA kernel work). This PR is the cheapest reversible step: the code stays, but anyone reading --help now sees that tbq* is experimental, with a link to the data. Markus can choose to escalate (remove from the CLI or rip out the code entirely) as a follow-up.

Verified

✅ build clean (full build log):

```
[ 0%] Building CXX object tools/ui/CMakeFiles/llama-ui-embed.dir/embed.cpp.o
[ 0%] Built target llama-common-base
[ 0%] Built target cpp-httplib
[ 3%] Built target ggml-base
[ 9%] Built target ggml-cpu
[ 11%] Built target ggml
[ 68%] Built target llama
[ 79%] Built target llama-common
[ 79%] Building CXX object tools/mtmd/CMakeFiles/mtmd.dir/mtmd-audio.cpp.o
[ 79%] Building CXX object tools/mtmd/CMakeFiles/mtmd.dir/mtmd.cpp.o
[ 79%] Building CXX object tools/mtmd/CMakeFiles/mtmd.dir/mtmd-image.cpp.o
[ 79%] Linking CXX executable llama-ui-embed
[ 79%] Built target llama-ui-embed
[ 79%] Building CXX object tools/mtmd/CMakeFiles/mtmd.dir/mtmd-helper.cpp.o
[ 79%] Building CXX object tools/mtmd/CMakeFiles/mtmd.dir/clip.cpp.o
[ 79%] Provisioning UI assets
-- UI: using committed heierchat snapshot from /home/me/ht/forks/ht-llama.cpp/tools/server/public
[ 79%] Built target llama-ui-assets
[ 81%] Building CXX object tools/mtmd/CMakeFiles/mtmd.dir/models/cogvlm.cpp.o
[ 81%] Building CXX object tools/ui/CMakeFiles/llama-ui.dir/ui.cpp.o
[ 81%] Building CXX object tools/mtmd/CMakeFiles/mtmd.dir/models/conformer.cpp.o
[ 83%] Linking CXX static library libllama-ui.a
[ 83%] Built target llama-ui
[ 83%] Building CXX object tools/mtmd/CMakeFiles/mtmd.dir/models/dotsocr.cpp.o
[ 83%] Building CXX object tools/mtmd/CMakeFiles/mtmd.dir/models/exaone4_5.cpp.o
[ 83%] Building CXX object tools/mtmd/CMakeFiles/mtmd.dir/models/gemma4a.cpp.o
[ 85%] Building CXX object tools/mtmd/CMakeFiles/mtmd.dir/models/gemma4v.cpp.o
[ 85%] Building CXX object tools/mtmd/CMakeFiles/mtmd.dir/models/gemma4ua.cpp.o
[ 85%] Building CXX object tools/mtmd/CMakeFiles/mtmd.dir/models/gemma4uv.cpp.o
[ 85%] Building CXX object tools/mtmd/CMakeFiles/mtmd.dir/models/glm4v.cpp.o
[ 85%] Building CXX object tools/mtmd/CMakeFiles/mtmd.dir/models/granite-speech.cpp.o
[ 87%] Building CXX object tools/mtmd/CMakeFiles/mtmd.dir/models/hunyuanvl.cpp.o
[ 87%] Building CXX object tools/mtmd/CMakeFiles/mtmd.dir/models/internvl.cpp.o
[ 87%] Building CXX object tools/mtmd/CMakeFiles/mtmd.dir/models/kimivl.cpp.o
[ 87%] Building CXX object tools/mtmd/CMakeFiles/mtmd.dir/models/kimik25.cpp.o
[ 87%] Building CXX object tools/mtmd/CMakeFiles/mtmd.dir/models/nemotron-v2-vl.cpp.o
[ 87%] Building CXX object tools/mtmd/CMakeFiles/mtmd.dir/models/llama4.cpp.o
[ 88%] Building CXX object tools/mtmd/CMakeFiles/mtmd.dir/models/llava.cpp.o
[ 88%] Building CXX object tools/mtmd/CMakeFiles/mtmd.dir/models/minicpmv.cpp.o
[ 88%] Building CXX object tools/mtmd/CMakeFiles/mtmd.dir/models/paddleocr.cpp.o
[ 88%] Building CXX object tools/mtmd/CMakeFiles/mtmd.dir/models/pixtral.cpp.o
[ 88%] Building CXX object tools/mtmd/CMakeFiles/mtmd.dir/models/qwen2vl.cpp.o
[ 90%] Building CXX object tools/mtmd/CMakeFiles/mtmd.dir/models/qwen3vl.cpp.o
[ 90%] Building CXX object tools/mtmd/CMakeFiles/mtmd.dir/models/mimovl.cpp.o
[ 90%] Building CXX object tools/mtmd/CMakeFiles/mtmd.dir/models/qwen3a.cpp.o
[ 90%] Building CXX object tools/mtmd/CMakeFiles/mtmd.dir/models/step3vl.cpp.o
[ 90%] Building CXX object tools/mtmd/CMakeFiles/mtmd.dir/models/siglip.cpp.o
[ 92%] Building CXX object tools/mtmd/CMakeFiles/mtmd.dir/models/whisper-enc.cpp.o
[ 92%] Building CXX object tools/mtmd/CMakeFiles/mtmd.dir/models/deepseekocr.cpp.o
[ 92%] Building CXX object tools/mtmd/CMakeFiles/mtmd.dir/models/deepseekocr2.cpp.o
[ 92%] Building CXX object tools/mtmd/CMakeFiles/mtmd.dir/models/mobilenetv5.cpp.o
[ 92%] Building CXX object tools/mtmd/CMakeFiles/mtmd.dir/models/youtuvl.cpp.o
[ 94%] Building CXX object tools/mtmd/CMakeFiles/mtmd.dir/models/yasa2.cpp.o
[ 94%] Linking CXX shared library ../../bin/libmtmd.so
[ 94%] Built target mtmd
[ 94%] Building CXX object tools/server/CMakeFiles/server-context.dir/server-task.cpp.o
[ 94%] Building CXX object tools/server/CMakeFiles/server-context.dir/server-queue.cpp.o
[ 94%] Building CXX object tools/server/CMakeFiles/server-context.dir/server-chat.cpp.o
[ 94%] Building CXX object tools/server/CMakeFiles/server-context.dir/server-common.cpp.o
[ 96%] Building CXX object tools/server/CMakeFiles/server-context.dir/server-context.cpp.o
[ 96%] Building CXX object tools/server/CMakeFiles/server-context.dir/server-tools.cpp.o
[ 96%] Linking CXX static library libserver-context.a
[ 96%] Built target server-context
[ 96%] Building CXX object tools/server/CMakeFiles/llama-server-impl.dir/server-http.cpp.o
[ 96%] Building CXX object tools/server/CMakeFiles/llama-server-impl.dir/server-models.cpp.o
[ 96%] Building CXX object tools/server/CMakeFiles/llama-server-impl.dir/server.cpp.o
[ 98%] Linking CXX shared library ../../bin/libllama-server-impl.so
[ 98%] Built target llama-server-impl
[100%] Building CXX object tools/server/CMakeFiles/llama-server.dir/main.cpp.o
[100%] Linking CXX executable ../../bin/llama-server
[100%] Built target llama-server
```
✅ renders the new note correctly for all 4 cache-type flags
8 lines added, 4 lines reformatted; no behavior change

Out of scope (separate decisions for Markus)

ggml-org/llama.cpp#124 (test-quantize-perf windows-x64 segfault): pre-existing TBQ-on-Windows bug class
ggml-org/llama.cpp#125 (TBQ KV-cache CPU first-decode segfault): heap stomp diagnosed during this work
Full TBQ KV-cache removal: needs roadmap input

Measured perplexity on Qwen3.5-0.8B-BF16 / wikitext-2 / ctx=512:

| cache-type | PPL | vs f16 |
|------------|---------|-----------|
| f16 | 19.08 | baseline |
| q8_0 | 19.08 | lossless |
| tbq3_0 | 1252.30 | 65x worse |
| tbq4_0 | 1393.00 | 73x worse |

TBQ KV-cache produces near-random output. The likely root cause is statistical: TBQ's rotated-domain codebook was calibrated for weight distributions, not for the K/V tensor distributions seen during inference, so the encoding cannot faithfully represent KV values.

Snoop-kube's cluster audit confirms that no deployment uses a tbq* KV-cache (every host uses q8_0 or q4_0). DFlash also defaults to q8_0 (PR #65). No production consumer exists.

This PR adds a one-line experimental note to the --cache-type-k/v and --cache-type-k-draft/v-draft help text, referencing issue #70 for the full data and recommendation. The code path stays in place: Markus may have roadmap intent I'm not aware of, and this just stops anyone reading --help from assuming tbq* is a usable choice without checking.

Follow-ups if Markus prefers full removal:
* drop tbq3_0/tbq4_0 from common/arg.cpp's kv_cache_types list
* keep the ftypes (TBQ weight quantization is separate from KV use)
* close issues ggml-org#124 + ggml-org#125 as won't-fix
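The first two follow-ups (drop tbq3_0/tbq4_0 from the KV-cache list but keep the TBQ ftypes) amount to filtering a KV allowlist that is separate from the full type table. A hypothetical, self-contained C++ sketch of that split; the `CacheType` enum, the `all_types` and `kv_cache_types` vectors, and `parse_kv_cache_type` are illustrative stand-ins, not ggml's `ggml_type` or the actual common/arg.cpp code.

```cpp
#include <algorithm>
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Illustrative stand-in for the real type enum (not ggml's ggml_type).
enum class CacheType { F32, F16, BF16, Q8_0, Q4_0, TBQ3_0, TBQ4_0 };

struct TypeName {
    CacheType    type;
    const char * name;
};

// Full type table: TBQ stays here because weight quantization still uses it.
const std::vector<TypeName> all_types = {
    {CacheType::F32, "f32"},       {CacheType::F16, "f16"},
    {CacheType::BF16, "bf16"},     {CacheType::Q8_0, "q8_0"},
    {CacheType::Q4_0, "q4_0"},     {CacheType::TBQ3_0, "tbq3_0"},
    {CacheType::TBQ4_0, "tbq4_0"},
};

// KV-cache allowlist with tbq3_0 / tbq4_0 dropped.
const std::vector<CacheType> kv_cache_types = {
    CacheType::F32, CacheType::F16, CacheType::BF16, CacheType::Q8_0, CacheType::Q4_0,
};

// Parses a --cache-type-* value; known types outside the allowlist are rejected.
CacheType parse_kv_cache_type(const std::string & s) {
    assert(!s.empty() && "cache type string must not be empty");
    for (const auto & tn : all_types) {
        if (s == tn.name) {
            const bool allowed = std::find(kv_cache_types.begin(), kv_cache_types.end(),
                                           tn.type) != kv_cache_types.end();
            if (!allowed) {
                throw std::invalid_argument("unsupported KV cache type: " + s);
            }
            return tn.type;
        }
    }
    throw std::invalid_argument("unknown cache type: " + s);
}
```

Keeping the full type table separate from the KV allowlist is what would let the TBQ ftypes survive for weight quantization while the CLI stops accepting them for the KV-cache.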

This was referenced Jun 5, 2026:

Hivemind Maintenance Tasks Epoch 1 #73 (closed)
Hivemind Maintenance Tasks Epoch 2 #79 (closed)
Hivemind Maintenance Tasks Epoch 3 #81 (closed)
Hivemind Maintenance Tasks Epoch 4 #86 (closed)


marksverdhei wants to merge 1 commit into ht from chore/tbq-kv-experimental-marker