test-qvac-lib-infer-nmtcpp#43
Closed
andretetherio wants to merge 250 commits into
chore(qvac-cli): testing pr-request-trigger
add public reusable for qvac-cli
Qvac cli integration
testing qvac-cli workflow
testing qvac-cli workflow
testing qvac-cli-integration
added pull-requests: write permission to qvac-cli workflow
added qvac-lib-dl-hyperdrive trigger-reusable-lb workflow
testing registry server workflow
updated workflow secrets
testing qvac-lib-registry
updated workflow secrets
added work dir
Contributor
🧪 C++ Test Coverage Report
Coverage: 📊 Detailed Coverage
Contributor
❌ E2E Mobile Test Results - Android
Overall Status: FAILED
Automated E2E mobile testing powered by AWS Device Farm
Contributor
❌ E2E Mobile Test Results - iOS
Overall Status: FAILED
Automated E2E mobile testing powered by AWS Device Farm
ogad-tether added a commit to ogad-tether/qvac that referenced this pull request on Jun 12, 2026
…x default 4096
Builds on the upstream kv_cache_type support
(qvac-ext-lib-whisper.cpp#43): the T3 KV cache is allocated up-front
at nCtx, and q8_0 stores it at ~27% of the f32 size. The new defaults
(nCtx=4096 + kvCacheType="q8_0", ~210 MB of KV for ~160 s of audio
per synthesize() call) therefore use roughly half the memory of the
previous f32@2048 plan while doubling the usable context.
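The memory claim can be checked with a little arithmetic. The sketch below assumes the whole KV cache is stored in the selected type and uses ggml's q8_0 block layout (32 int8 values plus one fp16 scale, 34 bytes per block). The names `KvType`, `kvBytes` and `newPlanOverOldPlan` are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <cstddef>

// Illustrative names only; none of these are tts-cpp or ggml identifiers.
enum class KvType { F32, F16, Q8_0 };

// ggml's q8_0 block: 32 int8 quants plus one fp16 scale = 34 bytes per 32 values.
constexpr std::size_t kQ8BlockValues = 32;
constexpr std::size_t kQ8BlockBytes = 34;

// Bytes needed to hold `values` cache elements in the given storage type.
inline std::size_t kvBytes(KvType type, std::size_t values) {
    switch (type) {
        case KvType::F32: return values * 4;
        case KvType::F16: return values * 2;
        case KvType::Q8_0:
            // Quantised data is stored in whole blocks.
            assert(values % kQ8BlockValues == 0);
            return values / kQ8BlockValues * kQ8BlockBytes;
    }
    return 0;  // unreachable for valid enum values
}

// Size of the new plan (q8_0 at nCtx=4096) relative to the old one (f32 at nCtx=2048).
// `valuesPerPosition` is model dependent and cancels out of the ratio.
inline double newPlanOverOldPlan(std::size_t valuesPerPosition) {
    const double newBytes =
        static_cast<double>(kvBytes(KvType::Q8_0, 4096 * valuesPerPosition));
    const double oldBytes =
        static_cast<double>(kvBytes(KvType::F32, 2048 * valuesPerPosition));
    return newBytes / oldBytes;
}
```

Per value, q8_0 costs 34/128 ≈ 0.266 of f32, which matches the ~27% figure. Doubling the context at that cost gives about 0.53 of the old footprint, which is where "roughly half" comes from.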
- New `kvCacheType` constructor option ('f32'|'f16'|'q8_0'), plumbed
  JS -> JSAdapter -> ChatterboxConfig -> EngineOptions. Unknown
  values are rejected at construction (tts-cpp's own fallback would
  silently revert to f32 and change the memory profile the caller
  asked for). kvCacheType:"f32" restores bit-exact pre-quantisation
  behaviour.
- nCtx default 2048 -> 4096 (cheaper than the old default AND longer,
  per the review suggestion to raise ctx alongside the q8_0 KV cache).
- vcpkg tts-cpp pin -> 2026-06-12. This pin is Android-safe: the
  revision removes the last direct ggml_backend_is_cpu /
  ggml_get_type_traits_cpu references from tts-cpp (the unresolved
  undefined-symbol (UND) dlopen crash behind the 0.2.2 revert), routing
  them through the backend registry + ggml_quantize_chunk (ggml-base).
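The reject-at-construction behaviour from the first bullet can be sketched as follows. This is a minimal illustration, not the actual ChatterboxConfig or EngineOptions code: `KvCacheType`, `parseKvCacheType` and `EngineOptionsSketch` are hypothetical names, and only the accepted strings and the defaults (nCtx=4096, q8_0) come from the commit message.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical enum mirroring the three accepted option strings.
enum class KvCacheType { F32, F16, Q8_0 };

// Map the user-supplied string to a cache type, throwing on anything else
// instead of silently falling back to f32.
inline KvCacheType parseKvCacheType(const std::string& value) {
    if (value == "f32") return KvCacheType::F32;
    if (value == "f16") return KvCacheType::F16;
    if (value == "q8_0") return KvCacheType::Q8_0;
    throw std::invalid_argument("unsupported kvCacheType '" + value +
                                "' (expected f32, f16 or q8_0)");
}

// Hypothetical options struct carrying the new defaults.
struct EngineOptionsSketch {
    int nCtx = 4096;
    KvCacheType kvCacheType = KvCacheType::Q8_0;
};

// Build options from an optional user string; an empty string keeps the default.
inline EngineOptionsSketch makeOptions(const std::string& kvCacheType = "") {
    EngineOptionsSketch opts;
    if (!kvCacheType.empty()) {
        opts.kvCacheType = parseKvCacheType(kvCacheType);
    }
    return opts;
}
```

Failing loudly here means a typo such as "q8" surfaces as an error at construction rather than as an unexpectedly large f32 cache at runtime.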
Upstream validation on real GGUFs (see tetherto#43): Turbo greedy token
sequences byte-identical across f32/f16/q8_0 on CPU and Metal; MTL
CFG can flip a near-tie argmax (same class of variation as a seed
change; whisper transcribes the q8_0 output to the exact input
text); Metal decode 20-30% faster from the KV bandwidth saving.
Tests: gtest covers the q8_0 default, explicit forwarding, the f32
escape hatch, and unknown-value rejection (42/42 against tts-cpp
2026-06-12); JS unit suite 63/63; lint clean.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
No description provided.