Add tts-cpp port + bump parakeet-cpp / ggml-speech to port-version 1 by GustavoA1604 · Pull Request #137 · tetherto/qvac-registry-vcpkg

GustavoA1604 · 2026-05-07T18:46:03Z

Summary

Atomic registry flip for the QVAC speech stack: registers the new tts-cpp port (Resemble Chatterbox Turbo + Multilingual + Supertonic TTS, sourced from the tts-cpp/ subtree of tetherto/qvac-ext-lib-whisper.cpp), and bumps the two existing speech-stack ports it ships with — parakeet-cpp and ggml-speech — to port-version 1.

All three changes consume ggml from the same ggml-speech baseline and are tested together by the downstream qvac3/packages/tts-ggml and qvac3/packages/parakeet-ggml addons. Splitting them across multiple PRs would force coordinated registry flips for one addon release, which is why they ship as one bundled change here.

3 commits
10 files changed, +192/-8

What's in the PR

1. `parakeet-cpp` → port-version 1 (commit `cf8dc01`)

Repoints the port at upstream tetherto/qvac-ext-lib-whisper.cpp@761eca0, which adds a new bool starts_word field to parakeet::StreamingSegment. The flag is true when the piece of the segment's first token carries the SentencePiece ▁ (U+2581) word-boundary marker. This lets streaming consumers distinguish a wordpiece continuation across a chunk boundary ("ctuation" after "pun": glue without a space, yielding "punctuation") from a fresh word ("if" after "see": insert a space, yielding "see if") without re-implementing the SentencePiece detokenizer rules.
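A minimal sketch of how a streaming consumer might use the flag to join segments. `SegmentSketch` and its `text` field are hypothetical stand-ins for illustration; only `starts_word` is named in this PR, and the real `parakeet::StreamingSegment` layout may differ.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for parakeet::StreamingSegment; only the two fields
// used here are modelled, and `text` is an assumed field name.
struct SegmentSketch {
    std::string text;
    bool starts_word;
};

// Append segments in arrival order, inserting a space only before a segment
// that begins a new word (and never before the very first segment).
std::string join_segments(const std::vector<SegmentSketch>& segments) {
    std::string out;
    for (const SegmentSketch& seg : segments) {
        if (seg.starts_word && !out.empty()) out += ' ';
        out += seg.text;
    }
    return out;
}
```

With this rule, "pun" followed by a continuation "ctuation" is glued together, while "see" followed by a word-start "if" gets a separating space.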

Also exposes bool token_is_word_start(BpeVocab, int32_t) from sentencepiece_bpe.h so other engines that build their own segments (EOU per-utterance, attributed) can stamp the flag the same way.
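For engines that stamp the flag themselves, the underlying check can be sketched as a prefix test on the token's piece. `piece_starts_word` is a hypothetical helper that works on a raw piece string; it is not the `token_is_word_start(BpeVocab, int32_t)` function from `sentencepiece_bpe.h`, whose vocabulary lookup is not shown here.

```cpp
#include <string_view>

// UTF-8 encoding of U+2581, the SentencePiece word-boundary marker.
constexpr std::string_view kWordBoundary = "\xE2\x96\x81";

// Hypothetical helper: true when a token's piece begins with the marker.
bool piece_starts_word(std::string_view piece) {
    // substr clamps the length, so pieces shorter than the marker are safe.
    return piece.substr(0, kWordBoundary.size()) == kWordBoundary;
}
```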

Purely additive change at the source level; existing callers that don't read the field need no code changes and behave the same once rebuilt.

git-tree for ports/parakeet-cpp: 4d5c4f8e101129537413aacc7caf38c3548dcd19.

2. `ggml-speech` → port-version 1 (commit `a27775c`)

Repoints ggml-speech at the latest tetherto/qvac-ext-ggml@speech tip (de7a55e3eea087bed6484607b518d60a3366acbe), which folds in two functional improvements landed since pv0:

- Hybrid dynamic backends for Android. The Android variant now ships libqvac-speech-ggml-vulkan.so + libqvac-speech-ggml-opencl.so as dlopen-able accelerator backends alongside the always-loaded libqvac-speech-ggml-cpu-android_armv*_n.so set, matching the packaging shape the LLM addon already uses on Adreno devices. The portfile sets GGML_BACKEND_DL=ON + GGML_NATIVE=OFF for the speech build and emits the per-CPU-feature shared libs the Bare loader expects.
- Vulkan pipeline-cache rename(2) → std::filesystem::rename. The speech-stack Vulkan path relied on POSIX rename(2) semantics for the .pcache flush, which silently fails on Windows when the destination already exists; switching to std::filesystem::rename gives portable overwrite-on-rename semantics. This was previously planned as a separate pv2 bump but is rolled into pv1 here so consumers only see one new entry.
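The write-then-rename pattern behind the pipeline-cache fix can be sketched as follows. This is an illustration under assumptions, not the code in qvac-ext-ggml: `flush_cache` and the ".tmp" suffix are hypothetical. It relies on `std::filesystem::rename` replacing an existing regular file at the destination, which the C++17 standard specifies.

```cpp
#include <filesystem>
#include <fstream>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Write `bytes` to a sibling temporary file, then rename it over `dest`.
// Returns false if the write or the rename fails.
// The ".tmp" suffix is an assumption made for this sketch.
bool flush_cache(const fs::path& dest, const std::string& bytes) {
    fs::path tmp = dest;
    tmp += ".tmp";
    {
        std::ofstream out(tmp, std::ios::binary | std::ios::trunc);
        if (!out) return false;
        out.write(bytes.data(), static_cast<std::streamsize>(bytes.size()));
        out.flush();
        if (!out) return false;
    }  // stream is closed here, before the rename

    std::error_code ec;
    fs::rename(tmp, dest, ec);  // replaces an existing regular file at dest
    if (ec) {
        fs::remove(tmp, ec);    // best-effort cleanup of the temporary file
        return false;
    }
    return true;
}
```

Calling it twice on the same destination should leave the second payload in place, which is the case a plain C `rename` fails on Windows.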

The speech branch otherwise stays at the same set of speech-stack patches (chatterbox metal ops, opencl whitelist relaxation + persistent kernel cache, ggml-backend filename-prefix support, GGML_LIB_OUTPUT_PREFIX).

git-tree for ports/ggml-speech: ee31a48f4420f6cca6f6962f3c5f690f82d6eb2d.

3. New port: `tts-cpp` at `version-date 2026-05-07`, port-version 0 (commit `bb5a664`)

New port that builds the tts-cpp/ subtree of tetherto/qvac-ext-lib-whisper.cpp@0b446740 as a standalone vcpkg-cmake package. Ships Resemble Chatterbox (Turbo + Multilingual variants) and Supertonic engines under a unified tts_cpp::Engine API and consumes the ggml-speech port for the speech-flavoured ggml backends.

Source pinned at 0b446740, which is the latest tts-cpp tip on tetherto/qvac-ext-lib-whisper.cpp@master (post-merge of tetherto/qvac-ext-lib-whisper.cpp#14). Two correctness fixes beyond the initial subtree drop are already in this commit:

- Multilingual T3 SOT/EOT wrap. Engine::run_t3 for MTL now wraps text tokens with start_text_token (255) / stop_text_token, matching chatterbox_cli.cpp's tokenisation path. Without it, the autoregressive decode dropped the first speech tokens of every MTL utterance, audible as a missing leading syllable on the addon path that doesn't go through the CLI. Turbo is unaffected.
- 3-consecutive-identical-token early-stop heuristic. Mirrors the CLI's existing AlignmentStreamAnalyzer::token_repetition guard inside Engine::run_t3 (gated on is_mtl). Without the guard, MTL T3 occasionally emits an end-of-speech silence cadence mid-utterance and then hallucinates ~40 s of trailing low-energy junk until it hits n_predict=1000. Most reliably reproduced on German with seed=42.
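Both fixes are small enough to sketch in isolation. The functions below are hypothetical illustrations, not the code in Engine::run_t3: the start id 255 comes from the description above, while the stop id is passed in as a parameter because its value is not stated here.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Wrap text tokens as [start, tokens..., stop] before feeding them to T3.
std::vector<int32_t> wrap_text_tokens(const std::vector<int32_t>& text,
                                      int32_t start_token, int32_t stop_token) {
    std::vector<int32_t> out;
    out.reserve(text.size() + 2);
    out.push_back(start_token);
    out.insert(out.end(), text.begin(), text.end());
    out.push_back(stop_token);
    return out;
}

// True when the last `run` generated tokens are all identical; a decode loop
// could stop early when this fires instead of running on to n_predict.
bool tail_repeats(const std::vector<int32_t>& generated, std::size_t run = 3) {
    if (run == 0 || generated.size() < run) return false;
    const int32_t last = generated.back();
    for (std::size_t i = generated.size() - run; i < generated.size(); ++i) {
        if (generated[i] != last) return false;
    }
    return true;
}
```

In a decode loop, `tail_repeats` would be checked after each sampled speech token, with the check gated on the multilingual model as described above.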

Port shape (all parameters mirror the existing parakeet-cpp port for consistency, since both build the same kind of *-cpp subtree under the same qvac-ext-lib-whisper.cpp umbrella):

- Default-features by triplet:
  - metal on osx | ios
  - opencl on android
  - vulkan on windows | linux | android
  - cuda opt-in only.

  Each feature only activates the matching ggml-speech feature so the system-ggml-speech link line stays consistent (no cuda on tts-cpp without cuda on ggml-speech, etc.).
- Linkage is forced static (BUILD_SHARED_LIBS=OFF + TTS_CPP_BUILD_SHARED=OFF) regardless of triplet preference; VCPKG_POLICY_MISMATCHED_NUMBER_OF_BINARIES is enabled to suppress the warning on dynamic triplets.
- VCPKG_BUILD_TYPE=release (no debug build): matches parakeet-cpp precedent. A debug static archive of the engines balloons disk usage and isn't used by either addon.
- TTS_CPP_OPENMP=OFF is forced (see Design notes).
- TTS_CPP_USE_SYSTEM_GGML=ON (the only supported mode in this subtree; the bundled-ggml dev path was deleted from the in-tree subtree by design and lives in the standalone gianni-cor/chatterbox.cpp repo).
- version>= constraint on ggml-speech is pinned at 2026-04-09#1, i.e. the new pv1 from this same PR, locking in the Android hybrid backends + Vulkan rename Windows fix that the addon path needs.

Versions / baseline:

- versions/t-/tts-cpp.json: single entry { "git-tree": "0e910c4d21965556cb40d4fdd991ab104bcdaff0", "version-date": "2026-05-07", "port-version": 0 }.
- versions/baseline.json: register tts-cpp at 2026-05-07#0 alphabetically between tokenizers-cpp and vcpkg-cmake; bump ggml-speech and parakeet-cpp baselines to port-version: 1.

Design notes (preempting review questions)

These call out the deliberate choices that look unusual at first glance — flagging here so re-review doesn't re-litigate them.

Why ship all three changes in one PR?

parakeet-cpp, tts-cpp, and ggml-speech are three rungs of a single speech-stack baseline:

- qvac3/packages/parakeet-ggml (existing addon) consumes parakeet-cpp + ggml-speech.
- qvac3/packages/tts-ggml (new addon) consumes tts-cpp + ggml-speech.

Both share the same ggml-speech port-version, so the registry flip is atomic.

Splitting these into three PRs would mean three coordinated registry merges for one addon release, with each intermediate state under-tested (e.g. a baseline where parakeet-cpp pv1 ships against ggml-speech pv0, or tts-cpp is registered before its dependency exists at the right port-version). The author tested all three together; the registry should land them together too.

Why is `-DTTS_CPP_OPENMP=OFF` forced in the portfile?

Mirrors the -DPARAKEET_OPENMP=OFF flag in the sibling parakeet-cpp port. The upstream tts-cpp/CMakeLists.txt exposes option(TTS_CPP_OPENMP ... ON) and conditionally appends find_dependency(OpenMP COMPONENTS CXX) to the generated share/tts-cpp/tts-cppConfig.cmake when OpenMP_CXX_FOUND is true at install time.

Leaving the option at its upstream default would have two effects on Linux / macOS / Android / iOS triplets (where OpenMP is auto-discovered by the vcpkg toolchain):

- The static libtts-cpp.a ends up linking libomp/libgomp transitively into the consumer's binary.
- The generated tts-cppConfig.cmake carries find_dependency(OpenMP COMPONENTS CXX), forcing every downstream find_package(tts-cpp) consumer to also locate OpenMP at configure time.

Forcing OFF keeps the consumer surface uniform across triplets. The Windows non-MinGW guard upstream auto-disables it anyway, so this only normalizes the non-Windows triplets to the same shape. CAMPPlus is the only TU that uses #pragma omp parallel for and runs once per voice-encode at session init — bounded perf cost.

Why pin `tts-cpp`'s `version>=` to `ggml-speech 2026-04-09#1` (not the looser `2026-04-09`)?

tts-cpp's addon path on Android requires the hybrid dynamic backend mode (libqvac-speech-ggml-vulkan.so + libqvac-speech-ggml-opencl.so as dlopen-able modules alongside the per-CPU-feature .sos) and on Windows requires the Vulkan .pcache rename fix to avoid the cache being frozen at first-write size. Both land in ggml-speech pv1 from this PR.

Pinning version>= to 2026-04-09#1 (with the explicit #1 port-version separator vcpkg supports) means a downstream that overrides the registry baseline can't accidentally drop tts-cpp against ggml-speech pv0 — a config the author hasn't validated.
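To make the effect of the `#1` suffix concrete, here is a small model of the comparison. It is a sketch under an assumed ordering (version-date first, then port-version, with a missing suffix meaning port-version 0) and is not vcpkg's resolver; `PortVersion`, `parse_port_version` and `satisfies_minimum` are hypothetical names.

```cpp
#include <cstddef>
#include <string>
#include <tuple>

// A version-date plus port-version, e.g. "2026-04-09#1".
struct PortVersion {
    std::string date;  // ISO YYYY-MM-DD, so string order equals date order
    int port_version;
};

// A missing "#N" suffix is treated as port-version 0.
PortVersion parse_port_version(const std::string& spec) {
    const std::size_t hash = spec.find('#');
    if (hash == std::string::npos) return {spec, 0};
    return {spec.substr(0, hash), std::stoi(spec.substr(hash + 1))};
}

// Assumed ordering: compare the date first, then the port-version.
bool satisfies_minimum(const std::string& candidate, const std::string& minimum) {
    const PortVersion c = parse_port_version(candidate);
    const PortVersion m = parse_port_version(minimum);
    return std::tie(c.date, c.port_version) >= std::tie(m.date, m.port_version);
}
```

Under this model, ggml-speech 2026-04-09 (pv0) fails a minimum of 2026-04-09#1 but passes the looser 2026-04-09, which is the difference the pin is meant to enforce.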

Why is `parakeet-cpp`'s `version>=` on `ggml-speech` left at the looser `2026-04-09` (no port-version pin)?

Asymmetric on purpose. parakeet-cpp pv1 is a pure source-level addition (bool starts_word on StreamingSegment) that doesn't actually require any new ggml-speech functionality to work — the new field is set by parakeet-cpp's own SentencePiece detokenizer code, not by ggml. So a consumer of parakeet-cpp pv1 against ggml-speech pv0 is technically a valid configuration.

Bumping the constraint to 2026-04-09#1 for parakeet-cpp would force a parakeet-cpp pv2 (because parakeet-cpp/vcpkg.json is already registered in versions/p-/parakeet-cpp.json at pv1 with the current tree). That extra port-version churn isn't worth it for a soft tightening — the registry baseline update in this PR already moves all default consumers to ggml-speech pv1.

Why does the `tts-cpp` portfile force `BUILD_SHARED_LIBS=OFF` regardless of triplet linkage?

Three reasons:

- The upstream tts-cpp/CMakeLists.txt declares option(TTS_CPP_BUILD_SHARED ... OFF) separately from BUILD_SHARED_LIBS because ggml's own CMake declares its own option(BUILD_SHARED_LIBS) which pollutes the cache with a platform-dependent default once any configure has run. Forcing both off keeps the linkage deterministic per the project-namespaced option.
- The tts-cpp test harnesses (test-mtl-tokenizer, test-supertonic-*) link against tts-cpp directly and use detail-namespaced symbols outside the TTS_CPP_API public surface. SHARED hides them and disables those targets, but the port already sets TTS_CPP_BUILD_TESTS=OFF, so this is moot here.
- Both addons (parakeet-ggml, tts-ggml) link the static archive into a single Bare addon .so/.dll/.dylib. A shared tts-cpp would mean two GGML symbol exports in the addon (one through tts-cpp.so and one through whatever else picks up ggml-speech), which the loader doesn't deduplicate cleanly.

VCPKG_POLICY_MISMATCHED_NUMBER_OF_BINARIES at the top of the portfile suppresses the warning vcpkg would otherwise emit on dynamic triplets when the produced binary count doesn't match the triplet's expectation.

Why is the `cuda` feature on `tts-cpp` valid even though the portfile doesn't probe `nvcc`?

tts-cpp builds entirely on top of ggml-speech via find_package(ggml CONFIG) (TTS_CPP_USE_SYSTEM_GGML=ON). It compiles no CUDA source itself; passing -DGGML_CUDA=ON to its CMake only flips the GGML_USE_CUDA define on the tts-cpp-backend-defs INTERFACE library so consumers can dispatch through the appropriate backend headers. The actual CUDA backend lives in ggml-speech, which does the nvcc lookup in its portfile. Feature-dependency chaining (cuda on tts-cpp implies cuda on ggml-speech) ensures the runtime CUDA backend exists when the consumer enables the feature.
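As an illustration of what a compile-time define like this enables on the consumer side, the sketch below reports which backends a translation unit was built to know about. `compiled_backends` is hypothetical and calls no ggml API; only the GGML_USE_CUDA define name comes from the description above.

```cpp
#include <string>
#include <vector>

// Lists the backends this translation unit was compiled to dispatch to.
// The CPU entry is always present; "cuda" appears only when the build
// passed the GGML_USE_CUDA define through.
std::vector<std::string> compiled_backends() {
    std::vector<std::string> names{"cpu"};
#ifdef GGML_USE_CUDA
    names.push_back("cuda");
#endif
    return names;
}
```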

Why don't the default-features include `cuda` anywhere?

CUDA is opt-in across the registry's speech-stack ports. Most desktop / laptop targets that have a CUDA-capable GPU also have a working Vulkan stack, and Vulkan covers more of the ggml-speech test matrix (Linux + Windows + Android). Forcing CUDA into default-features would also force ggml-speech[cuda] into every Linux / Windows install, which requires a working CUDA toolkit on the build host. Mirrors the existing parakeet-cpp precedent.

Test plan

Related

- Upstream tts-cpp source-of-truth: tetherto/qvac-ext-lib-whisper.cpp#14, which adds the tts-cpp/ subtree, the MTL SOT/EOT + 3-token-stop fixes, and the parakeet-cpp starts_word field, all in one merge.
- Upstream ggml-speech source-of-truth: tetherto/qvac-ext-ggml#7, which adds persistent VkPipelineCache, the hybrid backend packaging cherry-pick, and the Windows-correct .pcache rename.
- Standalone tts-cpp dev repo: gianni-cor/chatterbox.cpp, a bundled-ggml flow with setup-ggml.sh + patches/, used as the source-of-truth for development before each subtree drop.
- Downstream consumers: qvac3/packages/tts-ggml (new), qvac3/packages/parakeet-ggml (existing).
- Per-addon ggml prefix policy (libqvac-ggml-* vs libqvac-speech-ggml-* vs libqvac-diffusion-ggml-*): see ggml-speech portfile and the upstream qvac-ext-ggml#7 design notes.

Repoints the port at the upstream commit tetherto/qvac-ext-lib-whisper.cpp@0b44674, which adds the new `bool starts_word` field on parakeet::StreamingSegment. Lets streaming consumers tell apart a chunk-boundary wordpiece continuation ("ctuation" after "pun" -> glue without space, yielding "punctuation") from a fresh word ("if" after "see" -> insert space) without re-implementing the SentencePiece detokenizer rules.

Picks the same commit as the new tts-cpp port in this series even though the parakeet-cpp/ subtree is byte-identical between the StreamingSegment.starts_word commit (761eca0c) and the latest tts-cpp tip (0b446740) -- only files under tts-cpp/ moved between the two. Reusing one REF means consumers that build both ports in the same triplet only download tetherto/qvac-ext-lib-whisper.cpp once instead of fetching two separate tarballs at adjacent commits.

Also tightens the ggml-speech dependency from `2026-04-09` to `2026-04-09#1` for parity with tts-cpp; both speech ports rely on the new pv1 (Android hybrid dlopen backends + Vulkan pipeline-cache Windows fix) and the explicit port-version separator stops a downstream registry overriding the baseline from silently falling back to ggml-speech pv0.

git-tree for ports/parakeet-cpp: 59736cc.

Co-authored-by: Cursor <cursoragent@cursor.com>

…s fix)

Repoints ggml-speech at the latest tetherto/qvac-ext-ggml@speech tip (de7a55e3eea087bed6484607b518d60a3366acbe), which folds in two functional improvements landed since pv0:

- Hybrid dynamic backends for Android. The Android variant now ships libqvac-speech-ggml-vulkan.so + libqvac-speech-ggml-opencl.so as dlopen-able accelerator backends alongside the always-loaded libqvac-speech-ggml-cpu-android_armv*_n.so set, matching the same packaging shape the LLM addon already uses on Adreno devices. The portfile flips on GGML_BACKEND_DL + GGML_NATIVE=OFF for the speech build and emits the per-CPU-feature shared libs the Bare loader expects.
- Vulkan pipeline-cache rename(2) -> std::filesystem::rename. The speech-stack Vulkan path used POSIX rename(2) semantics for the pipeline cache flush, which silently fails on Windows when the destination already exists; switching to std::filesystem::rename gives portable overwrite-on-rename semantics. No build / API change. This was previously planned as a separate pv2 bump but is rolled into pv1 here so consumers only see one new entry.

The speech branch otherwise stays at the same set of speech-stack patches (chatterbox metal ops, opencl whitelist relaxation + persistent kernel cache, ggml-backend filename-prefix support, GGML_LIB_OUTPUT_PREFIX).

git-tree for ports/ggml-speech: ee31a48.

Co-authored-by: Cursor <cursoragent@cursor.com>

New port that builds the tts-cpp/ subtree of tetherto/qvac-ext-lib-whisper.cpp as a standalone vcpkg-cmake package. Ships Resemble Chatterbox (Turbo + Multilingual variants) and Supertonic engines under a unified tts_cpp::Engine API and consumes the existing ggml-speech port for the speech-flavoured ggml backends.

Highlights:

- Source pulled at the latest tts-cpp tip on tetherto upstream (tetherto/qvac-ext-lib-whisper.cpp@0b44674), which already includes:
  * Multilingual T3 input wrapped with start_text_token / stop_text_token (fixes the "missing leading syllable" bug when the addon path consumed Engine::run_t3 without the CLI's SOT/EOT wrapping).
  * 3-consecutive-identical-token early-stop heuristic in Engine::run_t3 (fixes the German MTL "40 s of silence / hissing / random tokens up to n_predict" runaway by bringing the Engine path to parity with the CLI's existing guard).
- Default features mirror ggml-speech's GPU set: metal on Apple, opencl on Android, vulkan on Windows / Linux / Android; cuda is opt-in. Each feature only activates the matching ggml-speech feature so the system-ggml-speech link line stays consistent.
- Configure flags: passes -DTTS_CPP_OPENMP=OFF alongside -DGGML_OPENMP=OFF for parity with parakeet-cpp. Without it the upstream tts-cpp/CMakeLists.txt defaults TTS_CPP_OPENMP=ON, which on macOS / Linux / Android / iOS pulls libomp/libgomp into the static libtts-cpp.a transitively and emits find_dependency(OpenMP COMPONENTS CXX) into the generated share/tts-cpp/tts-cppConfig.cmake, forcing every downstream find_package(tts-cpp) consumer to also locate OpenMP. The speech-stack consumers don't want that.
- Pins ggml-speech >= 2026-04-09#1 (the new pv1 that ships the Android hybrid backends + Vulkan pipeline-cache Windows fix); explicit port-version separator guarantees a downstream registry overriding the baseline can't silently fall back to pv0.

Versions / baseline:

- versions/t-/tts-cpp.json: single entry { "git-tree": "0e910c4d21965556cb40d4fdd991ab104bcdaff0", "version-date": "2026-05-07", "port-version": 0 }.
- versions/baseline.json: register tts-cpp at 2026-05-07#0 alphabetically between tokenizers-cpp and vcpkg-cmake.

Co-authored-by: Cursor <cursoragent@cursor.com>

GustavoA1604 force-pushed the add-tts-cpp branch 2 times, most recently from e7602dd to bb5a664 Compare May 7, 2026 19:19

GustavoA1604 changed the title ~~Add tts cpp~~ Add tts-cpp port + bump parakeet-cpp / ggml-speech to port-version 1 May 7, 2026

GustavoA1604 and others added 3 commits May 7, 2026 16:32

GustavoA1604 force-pushed the add-tts-cpp branch from bb5a664 to ff90945 Compare May 7, 2026 19:33

gianni-cor approved these changes May 7, 2026

GustavoA1604 merged commit 74d2dfd into main May 7, 2026
2 checks passed

gianni-cor deleted the add-tts-cpp branch May 9, 2026 06:58
