tts-cpp: publish 2026-06-12 - QVAC-19557 chatterbox memory (PR #43) + Android-safe symbols #188

Draft
ogad-tether wants to merge 2 commits into main from tts-cpp-2026-06-12

@ogad-tether

Publishes tts-cpp@2026-06-12, pinning tetherto/qvac-ext-lib-whisper.cpp@8b012789 — the head of tetherto/qvac-ext-lib-whisper.cpp#43 (QVAC-19557), on top of master 1c75d6e9.

What the pin brings:

  • Streamed GGUF tensor loads — no full-file host staging during chatterbox model loads (removes the +0.5–1 GB transient per load behind the iOS SDK jetsam kills).
  • EngineOptions::kv_cache_type (f32|f16|q8_0) — selectable T3 KV-cache dtype on a token-major slab; q8_0 stores the cache at ~27% of f32 and decodes 20-30% faster on Metal. Upstream-validated: f32 byte-identical to the old layout; Turbo greedy decoding byte-identical across all three dtypes (CPU + Metal).
  • Android-safe symbols — removes the last direct ggml_backend_is_cpu / ggml_get_type_traits_cpu references from tts-cpp (backend registry + ggml_quantize_chunk instead). nm -u libtts-cpp.a is clean of both symbols, so static tts-cpp no longer leaves unresolvable UND symbols in Android GGML_BACKEND_DL=ON addon builds — the dlopen crash that forced the tts-ggml 0.2.2 revert and pinned everything back to 2026-06-03.
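
The "~27% of f32" figure for q8_0 can be checked with a little arithmetic. The sketch below assumes ggml's usual q8_0 block layout (one 16-bit scale followed by 32 signed 8-bit quants, 34 bytes per 32 values). The `KvType` enum and `kv_bytes` helper are hypothetical names used for illustration only; they are not part of tts-cpp.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical enum mirroring the three kv_cache_type choices named above.
enum class KvType { F32, F16, Q8_0 };

// Assumed q8_0 block layout: one 16-bit scale + 32 int8 quants = 34 bytes.
constexpr std::size_t kBlockElems   = 32;
constexpr std::size_t kQ8BlockBytes =
    sizeof(std::uint16_t) + kBlockElems * sizeof(std::int8_t);

// Bytes needed to store n_elems cache values.
// For Q8_0, n_elems is expected to be a multiple of 32.
constexpr std::size_t kv_bytes(KvType t, std::size_t n_elems) {
    return t == KvType::F32 ? n_elems * 4
         : t == KvType::F16 ? n_elems * 2
                            : (n_elems / kBlockElems) * kQ8BlockBytes;
}

// Cache size for a given dtype as a fraction of the f32 size.
constexpr double kv_ratio_vs_f32(KvType t) {
    return static_cast<double>(kv_bytes(t, kBlockElems)) /
           static_cast<double>(kv_bytes(KvType::F32, kBlockElems));
}

static_assert(kQ8BlockBytes == 34, "q8_0 block is 34 bytes under this layout");
static_assert(kv_bytes(KvType::F32, 32) == 128, "32 f32 values take 128 bytes");
```

Under these assumptions `kv_ratio_vs_f32(KvType::Q8_0)` is 34/128, about 0.266, which is consistent with the roughly 27% quoted above; f16 comes out at exactly one half.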

Consumed by: @qvac/tts-ggml (tetherto/qvac#2527: kvCacheType knob + q8_0/nCtx=4096 chatterbox defaults). The addon's full gtest suite (42/42) passes when built against this revision via a local overlay port.

Draft until tetherto/qvac-ext-lib-whisper.cpp#43 merges — the pinned SHA is the PR-branch head (fetchable now and after the merge; happy to re-pin to the master merge commit instead once it lands if that's preferred).

🤖 Generated with Claude Code

… Android-safe symbols

Pins tetherto/qvac-ext-lib-whisper.cpp@8b012789 (PR #43 on top of
master 1c75d6e9):

- Streamed GGUF tensor loads: no full-file host staging during
  chatterbox model loads (removes the +0.5..1 GB transient per load
  behind the iOS SDK jetsam kills).
- EngineOptions::kv_cache_type (f32|f16|q8_0): selectable T3 KV-cache
  dtype on a token-major slab; q8_0 stores the cache at ~27% of f32
  and decodes 20-30% faster on Metal.  Validated upstream: f32 is
  byte-identical to the previous layout, Turbo greedy decoding is
  byte-identical across all three dtypes on CPU and Metal.
- Removes the last direct ggml_backend_is_cpu /
  ggml_get_type_traits_cpu references from tts-cpp (backend registry
  + ggml_quantize_chunk instead), so a static tts-cpp no longer
  leaves unresolvable UND symbols in Android GGML_BACKEND_DL=ON
  addon builds — the dlopen crash that forced the tts-ggml 0.2.2
  revert and pinned everything back to 2026-06-03.

Consumed by @qvac/tts-ggml (kvCacheType knob + q8_0/nCtx=4096
chatterbox defaults, qvac PR #2527).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…bility probe)

Updates the 2026-06-12 publish (still in-PR, not yet on registry main)
from 8b012789 to c8620cf9, which adds on top of the streaming-loads +
q8_0-KV + Android-symbol work:

- chatterbox_resolve_kv_type: load-time ggml_backend_supports_op probe
  that falls back to f32 when a backend rejects the requested f16/q8_0
  K/V (review follow-up on PR #43).
- Vulkan guard: quantized K/V forced to f32 on Vulkan, since
  ggml-vulkan's supports_op advertises q8_0 K/V FA but the NV_coopmat2
  kernel faults at compute (toggle-confirmed q8_0 SIGSEGV vs f32 pass on
  NVIDIA coopmat2 CI runners).  f16 + Metal/CPU q8_0 unaffected.

Same source repo/subfolder; new archive SHA512.
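
The fallback logic described in this commit can be sketched roughly as follows. This is a minimal illustration, not the actual `chatterbox_resolve_kv_type` implementation: `BackendInfo`, its `supports_kv` callback (standing in for a `ggml_backend_supports_op` probe), and the `resolve_kv_type` signature are all hypothetical, and the real code may identify the Vulkan backend differently.

```cpp
#include <functional>
#include <string>

// Hypothetical enum mirroring the three kv_cache_type choices.
enum class KvType { F32, F16, Q8_0 };

// Hypothetical stand-in for a backend handle; not a ggml type.
struct BackendInfo {
    std::string name;                         // e.g. "CPU", "Metal", "Vulkan"
    std::function<bool(KvType)> supports_kv;  // stand-in for a supports_op probe
};

// Returns the dtype to actually use for the K/V cache.
KvType resolve_kv_type(const BackendInfo & be, KvType requested) {
    // f32 is the baseline and needs no probing.
    if (requested == KvType::F32) {
        return KvType::F32;
    }
    // Vulkan guard: the probe is not trusted for quantized K/V there,
    // so q8_0 is forced to f32 before the probe is consulted.
    if (be.name == "Vulkan" && requested == KvType::Q8_0) {
        return KvType::F32;
    }
    // Load-time probe: fall back to f32 when the backend rejects the dtype.
    if (!be.supports_kv || !be.supports_kv(requested)) {
        return KvType::F32;
    }
    return requested;
}
```

The guard sits ahead of the probe because, per the commit message, the backend can advertise support and still fault at compute time, so a positive probe result alone is not sufficient on Vulkan.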