
Add tts-cpp port + bump parakeet-cpp / ggml-speech to port-version 1 #137

Merged: GustavoA1604 merged 3 commits into main from add-tts-cpp on May 7, 2026.


GustavoA1604 commented on May 7, 2026


Summary

Atomic registry flip for the QVAC speech stack: registers the new tts-cpp port (Resemble Chatterbox Turbo + Multilingual + Supertonic TTS, sourced from the tts-cpp/ subtree of tetherto/qvac-ext-lib-whisper.cpp), and bumps the two existing speech-stack ports it ships with — parakeet-cpp and ggml-speech — to port-version 1.

All three changes sit on the same ggml-speech baseline and are tested together by the downstream qvac3/packages/tts-ggml and qvac3/packages/parakeet-ggml addons. Splitting them across multiple PRs would force coordinated registry flips for a single addon release, which is why they ship as one bundled change here.

  • 3 commits
  • 10 files changed, +192/-8

What's in the PR

1. parakeet-cpp → port-version 1 (commit cf8dc01)

Repoints the port at upstream tetherto/qvac-ext-lib-whisper.cpp@761eca0, which adds a new bool starts_word field on parakeet::StreamingSegment. The flag is true when the piece of the segment's first token carries the SentencePiece ▁ (U+2581) word-boundary marker. This lets streaming consumers distinguish a chunk-boundary wordpiece continuation ("ctuation" after "pun" → glue without a space, yielding "punctuation") from a fresh word ("if" after "see" → insert a space, yielding "see if") without re-implementing the SentencePiece detokenizer rules.
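A minimal sketch of how a streaming consumer might use the flag when appending segments to a transcript. The Segment struct is a hypothetical stand-in for parakeet::StreamingSegment: only starts_word comes from this PR, and the text field (assumed to hold the piece with the word-boundary marker already stripped) is an assumption.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for parakeet::StreamingSegment. Only `starts_word`
// comes from the PR; `text` is an assumed field holding the detokenized
// piece with the leading word-boundary marker already stripped.
struct Segment {
    std::string text;
    bool starts_word;
};

// Append streamed segments to a transcript: insert a space only when a
// segment starts a new word and the transcript is non-empty; glue
// continuation pieces directly onto the previous text.
std::string join_segments(const std::vector<Segment>& segments) {
    std::string out;
    for (const Segment& seg : segments) {
        if (seg.starts_word && !out.empty()) {
            out += ' ';
        }
        out += seg.text;
    }
    return out;
}
```

With segments "see", "if", "pun" (all word starts) followed by "ctuation" (a continuation), this yields "see if punctuation".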

Also exposes bool token_is_word_start(BpeVocab, int32_t) from sentencepiece_bpe.h so other engines that build their own segments (EOU per-utterance, attributed) can stamp the flag the same way.

Pure additive ABI change; existing callers that don't read the field are byte-equivalent.
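Conceptually, a word-start check of this kind tests whether a vocabulary piece begins with the UTF-8 encoding of U+2581 (bytes E2 96 81). The sketch below models the vocabulary as a plain id-to-piece table; sketch_token_is_word_start is a hypothetical name and does not reproduce the real BpeVocab-based signature from sentencepiece_bpe.h.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>
#include <vector>

// UTF-8 encoding of U+2581 (LOWER ONE EIGHTH BLOCK), the SentencePiece
// word-boundary marker.
constexpr std::string_view kWordBoundary = "\xE2\x96\x81";

// Hypothetical sketch: the real helper takes a BpeVocab; here the vocabulary
// is assumed to be a simple table mapping token id -> piece string.
bool sketch_token_is_word_start(const std::vector<std::string>& vocab,
                                std::int32_t id) {
    if (id < 0 || static_cast<std::size_t>(id) >= vocab.size()) {
        return false;  // treat out-of-range ids as non-word-start
    }
    const std::string_view piece = vocab[static_cast<std::size_t>(id)];
    // substr(0, n) never throws at position 0; shorter pieces simply fail
    // the comparison.
    return piece.substr(0, kWordBoundary.size()) == kWordBoundary;
}
```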

  • git-tree for ports/parakeet-cpp: 4d5c4f8e101129537413aacc7caf38c3548dcd19.

2. ggml-speech → port-version 1 (commit a27775c)

Repoints ggml-speech at the latest tetherto/qvac-ext-ggml@speech tip (de7a55e3eea087bed6484607b518d60a3366acbe), which folds in two functional improvements that have landed since pv0:

  • Hybrid dynamic backends for Android. The Android variant now ships libqvac-speech-ggml-vulkan.so + libqvac-speech-ggml-opencl.so as dlopen-able accelerator backends alongside the always-loaded libqvac-speech-ggml-cpu-android_armv*_n.so set, matching the same packaging shape the LLM addon already uses on Adreno devices. The portfile flips on GGML_BACKEND_DL=ON + GGML_NATIVE=OFF for the speech build and emits the per-CPU-feature shared libs the Bare loader expects.

  • Vulkan pipeline-cache rename(2) → std::filesystem::rename. The speech-stack Vulkan path flushed the .pcache with a plain rename(), relying on POSIX rename(2) overwrite semantics. On Windows that call fails when the destination already exists, and the failure went unnoticed; switching to std::filesystem::rename gives portable overwrite-on-rename semantics. This was previously planned as a separate pv2 bump but is rolled into pv1 here, so consumers see only one new entry.
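A minimal sketch of the write-to-temp-then-rename flush pattern the fix relies on. flush_cache_file is a hypothetical helper, not the ggml-speech code; it only illustrates that std::filesystem::rename replaces an existing destination file, which the C runtime's rename() does not do on Windows.

```cpp
#include <filesystem>
#include <fstream>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical cache flush: write the bytes to "<dest>.tmp", then rename the
// temp file over `dest`. std::filesystem::rename is specified with POSIX
// rename(2) semantics, so an existing `dest` is replaced rather than causing
// the call to fail.
bool flush_cache_file(const fs::path& dest, const std::vector<char>& bytes) {
    fs::path tmp = dest;
    tmp += ".tmp";
    {
        std::ofstream out(tmp, std::ios::binary | std::ios::trunc);
        if (!out) return false;
        out.write(bytes.data(), static_cast<std::streamsize>(bytes.size()));
        if (!out) return false;
    }  // stream closed (and flushed) here, before the rename

    std::error_code ec;
    fs::rename(tmp, dest, ec);  // overwrites dest if it already exists
    if (ec) {
        fs::remove(tmp, ec);  // best-effort cleanup of the temp file
        return false;
    }
    return true;
}
```

Because the second flush replaces the first file, the cache can grow past its first-write size, which is the behaviour the Windows test below checks for.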

The speech branch otherwise stays at the same set of speech-stack patches (chatterbox metal ops, opencl whitelist relaxation + persistent kernel cache, ggml-backend filename-prefix support, GGML_LIB_OUTPUT_PREFIX).

  • git-tree for ports/ggml-speech: ee31a48f4420f6cca6f6962f3c5f690f82d6eb2d.

3. New port: tts-cpp at version-date 2026-05-07, port-version 0 (commit bb5a664)

New port that builds the tts-cpp/ subtree of tetherto/qvac-ext-lib-whisper.cpp@0b446740 as a standalone vcpkg-cmake package. Ships Resemble Chatterbox (Turbo + Multilingual variants) and Supertonic engines under a unified tts_cpp::Engine API and consumes the ggml-speech port for the speech-flavoured ggml backends.

Source pinned at 0b446740, which is the latest tts-cpp tip on tetherto/qvac-ext-lib-whisper.cpp@master (post-merge of tetherto/qvac-ext-lib-whisper.cpp#14). Two correctness fixes beyond the initial subtree drop are already in this commit:

  • Multilingual T3 SOT/EOT wrap. Engine::run_t3 for MTL now wraps text tokens with start_text_token (255) / stop_text_token, matching chatterbox_cli.cpp's tokenisation path. Without it, the autoregressive decode dropped the first speech tokens of every MTL utterance, audible as a missing leading syllable on the addon path that doesn't go through the CLI. Turbo is unaffected.
  • 3-consecutive-identical-token early-stop heuristic. Mirrors the CLI's existing AlignmentStreamAnalyzer::token_repetition guard inside Engine::run_t3 (gated on is_mtl). MTL T3 occasionally emits an end-of-speech silence cadence mid-utterance and then hallucinates ~40 s of trailing low-energy junk until it reaches n_predict=1000. This reproduces most reliably on German with seed=42.
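The two MTL fixes can be sketched as follows. All names here (wrap_text_tokens, repeated_token_stop, decode) are hypothetical and are not the tts_cpp::Engine API; start_text_token = 255 comes from the description above, while stop_text_token is passed in because its value isn't stated. The sketch omits the normal end-of-speech token check and does not trim the repeated tokens.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical: wrap text tokens with start/stop markers before the T3 decode.
std::vector<std::int32_t> wrap_text_tokens(const std::vector<std::int32_t>& text,
                                           std::int32_t start_text_token,
                                           std::int32_t stop_text_token) {
    std::vector<std::int32_t> out;
    out.reserve(text.size() + 2);
    out.push_back(start_text_token);
    out.insert(out.end(), text.begin(), text.end());
    out.push_back(stop_text_token);
    return out;
}

// Hypothetical early-stop guard: true once the last `run` generated speech
// tokens are all identical.
bool repeated_token_stop(const std::vector<std::int32_t>& generated,
                         std::size_t run = 3) {
    if (run == 0 || generated.size() < run) return false;
    const std::int32_t last = generated.back();
    for (std::size_t i = generated.size() - run; i < generated.size(); ++i) {
        if (generated[i] != last) return false;
    }
    return true;
}

// Hypothetical decode loop: generate until n_predict, stopping early on the
// repetition guard only when is_mtl is set (Turbo is unaffected).
template <typename NextToken>
std::vector<std::int32_t> decode(NextToken next_token, std::size_t n_predict,
                                 bool is_mtl) {
    std::vector<std::int32_t> generated;
    while (generated.size() < n_predict) {
        generated.push_back(next_token(generated));
        if (is_mtl && repeated_token_stop(generated)) break;
    }
    return generated;
}
```

A generator that keeps emitting the same token stops after three steps when is_mtl is true, and runs all the way to n_predict otherwise.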

Port shape (all parameters mirror the existing parakeet-cpp port for consistency, since both build the same kind of *-cpp subtree under the same qvac-ext-lib-whisper.cpp umbrella):

  • Default-features by triplet:
    • metal on osx | ios
    • opencl on android
    • vulkan on windows | linux | android
    • cuda opt-in only.
  • Each feature only activates the matching ggml-speech feature so the system-ggml-speech link line stays consistent (no cuda on tts-cpp without cuda on ggml-speech, etc.).
  • Linkage is forced static (BUILD_SHARED_LIBS=OFF + TTS_CPP_BUILD_SHARED=OFF) regardless of triplet preference; VCPKG_POLICY_MISMATCHED_NUMBER_OF_BINARIES is enabled to suppress the warning on dynamic triplets.
  • VCPKG_BUILD_TYPE=release (no debug build): matches parakeet-cpp precedent — a debug static archive of the engines balloons disk usage and isn't used by either addon.
  • TTS_CPP_OPENMP=OFF is forced (see Design notes).
  • TTS_CPP_USE_SYSTEM_GGML=ON (the only supported mode in this subtree; the bundled-ggml dev path was deleted from the in-tree subtree by design and lives in the standalone gianni-cor/chatterbox.cpp repo).
  • version>= constraint on ggml-speech is pinned at 2026-04-09#1 — i.e. the new pv1 from this same PR, locking in the Android hybrid backends + Vulkan rename Windows fix that the addon path needs.

Versions / baseline:

  • versions/t-/tts-cpp.json: single entry { "git-tree": "0e910c4d21965556cb40d4fdd991ab104bcdaff0", "version-date": "2026-05-07", "port-version": 0 }.
  • versions/baseline.json: register tts-cpp at 2026-05-07#0 alphabetically between tokenizers-cpp and vcpkg-cmake; bump ggml-speech and parakeet-cpp baselines to port-version: 1.

Design notes (preempting review questions)

These call out the deliberate choices that look unusual at first glance — flagging here so re-review doesn't re-litigate them.

Why ship all three changes in one PR?

parakeet-cpp, tts-cpp, and ggml-speech are three rungs of a single speech-stack baseline:

  • qvac3/packages/parakeet-ggml (existing addon) consumes parakeet-cpp + ggml-speech.
  • qvac3/packages/tts-ggml (new addon) consumes tts-cpp + ggml-speech.
  • Both share the same ggml-speech port-version, so the registry flip is atomic.

Splitting these into three PRs would mean three coordinated registry merges for one addon release, with each intermediate state under-tested (e.g. a baseline where parakeet-cpp pv1 ships against ggml-speech pv0, or tts-cpp is registered before its dependency exists at the right port-version). The author tested all three together; the registry should land them together too.

Why is -DTTS_CPP_OPENMP=OFF forced in the portfile?

Mirrors the -DPARAKEET_OPENMP=OFF flag in the sibling parakeet-cpp port. The upstream tts-cpp/CMakeLists.txt exposes option(TTS_CPP_OPENMP ... ON) and conditionally appends find_dependency(OpenMP COMPONENTS CXX) to the generated share/tts-cpp/tts-cppConfig.cmake when OpenMP_CXX_FOUND is true at install time.

Leaving the option at its upstream default would have two effects on Linux / macOS / Android / iOS triplets (where OpenMP is auto-discovered by the vcpkg toolchain):

  1. The static libtts-cpp.a ends up linking libomp/libgomp transitively into the consumer's binary.
  2. The generated tts-cppConfig.cmake carries find_dependency(OpenMP COMPONENTS CXX), forcing every downstream find_package(tts-cpp) consumer to also locate OpenMP at configure time.

Forcing OFF keeps the consumer surface uniform across triplets. Upstream already auto-disables OpenMP on Windows non-MinGW builds, so this only brings the non-Windows triplets to the same shape. CAMPPlus is the only translation unit that uses #pragma omp parallel for, and it runs once per voice encode at session init, so the performance cost of running it single-threaded is bounded.
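To illustrate why disabling OpenMP only affects speed and not results, here is a hypothetical CAMPPlus-style loop (scale_frames is illustrative, not the real translation unit). The assumption is that TTS_CPP_OPENMP=OFF means the TU is compiled without -fopenmp or /openmp; in that case _OPENMP is undefined, the pragma is ignored (at most with an unknown-pragma warning), and the loop runs serially with the same output.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical element-wise loop. With OpenMP enabled the iterations are
// split across threads; without it the pragma is ignored and the loop runs
// serially, producing identical output because iterations are independent.
std::vector<float> scale_frames(const std::vector<float>& in, float gain) {
    std::vector<float> out(in.size());
    // Signed loop index, as older OpenMP versions require.
    const std::ptrdiff_t n = static_cast<std::ptrdiff_t>(in.size());
#pragma omp parallel for
    for (std::ptrdiff_t i = 0; i < n; ++i) {
        out[static_cast<std::size_t>(i)] = in[static_cast<std::size_t>(i)] * gain;
    }
    return out;
}

// Reports whether this translation unit was compiled with OpenMP enabled.
constexpr bool built_with_openmp() {
#ifdef _OPENMP
    return true;
#else
    return false;
#endif
}
```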

Why pin tts-cpp's version>= to ggml-speech 2026-04-09#1 (not the looser 2026-04-09)?

tts-cpp's addon path on Android requires the hybrid dynamic backend mode (libqvac-speech-ggml-vulkan.so + libqvac-speech-ggml-opencl.so as dlopen-able modules alongside the per-CPU-feature .sos) and on Windows requires the Vulkan .pcache rename fix to avoid the cache being frozen at first-write size. Both land in ggml-speech pv1 from this PR.

Pinning version>= to 2026-04-09#1 (with the explicit #1 port-version separator vcpkg supports) means a downstream that overrides the registry baseline can't accidentally drop tts-cpp against ggml-speech pv0 — a config the author hasn't validated.
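The ordering a version>= constraint relies on can be modelled as: compare the date first, then the port-version, with a missing #N treated as port-version 0. This is a simplified model, not vcpkg's implementation, and it assumes plain YYYY-MM-DD dates (vcpkg's version-date scheme also permits dotted numeric suffixes, which this ignores).

```cpp
#include <string>
#include <tuple>

// Simplified model of a vcpkg "version-date#port-version" pair.
// Assumption: dates are plain YYYY-MM-DD, so string comparison orders them
// correctly; the port-version defaults to 0 when "#N" is absent.
struct DateVersion {
    std::string date;
    int port_version = 0;
};

DateVersion parse_date_version(const std::string& s) {
    const auto hash = s.find('#');
    if (hash == std::string::npos) return {s, 0};
    return {s.substr(0, hash), std::stoi(s.substr(hash + 1))};
}

// True if `candidate` satisfies a `version>=` constraint of `minimum`:
// ordered by date first, then by port-version.
bool satisfies_minimum(const std::string& candidate, const std::string& minimum) {
    const DateVersion c = parse_date_version(candidate);
    const DateVersion m = parse_date_version(minimum);
    return std::tie(c.date, c.port_version) >= std::tie(m.date, m.port_version);
}
```

Under this model, 2026-04-09#0 fails a 2026-04-09#1 minimum while any later date passes, which is the pv0 exclusion described above; the looser 2026-04-09 minimum still accepts pv0.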

Why is parakeet-cpp's version>= on ggml-speech left at the looser 2026-04-09 (no port-version pin)?

Asymmetric on purpose. parakeet-cpp pv1 is a pure source-level addition (bool starts_word on StreamingSegment) that doesn't actually require any new ggml-speech functionality to work — the new field is set by parakeet-cpp's own SentencePiece detokenizer code, not by ggml. So a consumer of parakeet-cpp pv1 against ggml-speech pv0 is technically a valid configuration.

Bumping the constraint to 2026-04-09#1 for parakeet-cpp would force a parakeet-cpp pv2 (because parakeet-cpp/vcpkg.json is already registered in versions/p-/parakeet-cpp.json at pv1 with the current tree). That extra port-version churn isn't worth it for a soft tightening — the registry baseline update in this PR already moves all default consumers to ggml-speech pv1.

Why does the tts-cpp portfile force BUILD_SHARED_LIBS=OFF regardless of triplet linkage?

Three reasons:

  1. The upstream tts-cpp/CMakeLists.txt declares option(TTS_CPP_BUILD_SHARED ... OFF) separately from BUILD_SHARED_LIBS because ggml's CMake declares its own option(BUILD_SHARED_LIBS), which pollutes the cache with a platform-dependent default once any configure has run. Forcing both off keeps the linkage deterministic under the project-namespaced option.
  2. The tts-cpp test harnesses (test-mtl-tokenizer, test-supertonic-*) link against tts-cpp directly and use detail-namespaced symbols outside the TTS_CPP_API public surface. A SHARED build hides those symbols and disables the targets, but the port already sets TTS_CPP_BUILD_TESTS=OFF, so this point is moot here.
  3. Both addons (parakeet-ggml, tts-ggml) link the static archive into a single Bare addon .so/.dll/.dylib. A shared tts-cpp would mean two GGML symbol exports in the addon (one through tts-cpp.so and one through whatever else picks up ggml-speech), which the loader doesn't deduplicate cleanly.

VCPKG_POLICY_MISMATCHED_NUMBER_OF_BINARIES at the top of the portfile suppresses the warning vcpkg would otherwise emit on dynamic triplets when the produced binary count doesn't match the triplet's expectation.

Why is the cuda feature on tts-cpp valid even though the portfile doesn't probe nvcc?

tts-cpp builds entirely on top of ggml-speech via find_package(ggml CONFIG) (TTS_CPP_USE_SYSTEM_GGML=ON). It compiles no CUDA source itself; passing -DGGML_CUDA=ON to its CMake only flips the GGML_USE_CUDA define on the tts-cpp-backend-defs INTERFACE library so consumers can dispatch through the appropriate backend headers. The actual CUDA backend lives in ggml-speech, which does the nvcc lookup in its portfile. Feature-dependency chaining (cuda on tts-cpp implies cuda on ggml-speech) ensures the runtime CUDA backend exists when the consumer enables the feature.
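A hypothetical consumer-side sketch of what compile-time dispatch on the propagated define might look like. GGML_USE_CUDA is named in the PR; GGML_USE_METAL and GGML_USE_VULKAN are assumed to follow the same pattern, and the branch order is arbitrary. Real code would include the matching ggml backend header in each branch rather than return a label.

```cpp
#include <string>

// Hypothetical helper: picks a backend label from whichever GGML_USE_* define
// the consumer sees (assumed to be propagated by tts-cpp's INTERFACE target).
// Without any of these defines, it falls back to the CPU backend.
std::string preferred_backend() {
#if defined(GGML_USE_CUDA)
    return "cuda";
#elif defined(GGML_USE_METAL)
    return "metal";
#elif defined(GGML_USE_VULKAN)
    return "vulkan";
#else
    return "cpu";
#endif
}
```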

Why don't the default-features include cuda anywhere?

CUDA is opt-in across the registry's speech-stack ports. Most desktop / laptop targets that have a CUDA-capable GPU also have a working Vulkan stack, and Vulkan covers more of the ggml-speech test matrix (Linux + Windows + Android). Forcing CUDA into default-features would also force ggml-speech[cuda] into every Linux / Windows install, which requires a working CUDA toolkit on the build host. Mirrors the existing parakeet-cpp precedent.

Test plan

  • All three referenced upstream tarballs hash-verify against the SHA512 strings in the portfiles:
    • tetherto/qvac-ext-lib-whisper.cpp@0b446740 (tts-cpp source) → 9eb64d8d…405964ff.
    • tetherto/qvac-ext-lib-whisper.cpp@761eca0c (parakeet-cpp source) → ffe69b99…7eb7524d.
    • tetherto/qvac-ext-ggml@de7a55e3 (ggml-speech source) → 16058815…4fc52a74.
  • All three port-tree git-tree ids in versions/{g,p,t}-/*.json match git rev-parse <pr-head>:ports/<port> for ggml-speech, parakeet-cpp, tts-cpp.
  • versions/baseline.json insertion is alphabetically correct (tts-cpp between tokenizers-cpp and vcpkg-cmake); both pv1 bumps are reflected.
  • vcpkg install tts-cpp from a manifest-mode consumer pulls ggml-speech pv1 + tts-cpp pv0 and produces a clean share/tts-cpp/tts-cppConfig.cmake that depends only on find_dependency(ggml CONFIG) (no transitive find_dependency(OpenMP) after the -DTTS_CPP_OPENMP=OFF enforcement).
  • find_package(tts-cpp CONFIG REQUIRED) from a downstream CMake project resolves tts-cpp::tts-cpp as a static archive that links against ggml::ggml from ggml-speech pv1, on Linux (Vulkan), macOS (Metal), Windows (Vulkan), and Android (opencl + vulkan).
  • vcpkg install ggml-speech on arm64-android produces both libqvac-speech-ggml-vulkan.so and libqvac-speech-ggml-opencl.so in the lib/ install directory alongside the per-CPU-feature libqvac-speech-ggml-cpu-android_armv*_n.so set (proof of the new hybrid dynamic-backend mode landing).
  • vcpkg install ggml-speech on Windows (Vulkan): repeated chatterbox.cpp runs grow .pcache past its first-write size on the second and later flushes (proof of the rename(2) → std::filesystem::rename fix landing).
  • qvac3/packages/parakeet-ggml builds against parakeet-cpp pv1 + ggml-speech pv1; StreamingSegment.starts_word is plumbed through to the JS API.
  • qvac3/packages/tts-ggml builds against the new tts-cpp port + ggml-speech pv1 on Linux x86_64 (Vulkan), macOS (Metal), Windows MSVC (Vulkan), Android arm64 (OpenCL + Vulkan); chatterbox-mtl German seed-42 reproduction (the worst case for the 3-identical-token bug) ends cleanly without the trailing 40 s of silence/hissing.
  • Existing parakeet-cpp consumers that don't read starts_word are byte-equivalent — additive ABI only.

Related

  • Upstream tts-cpp source-of-truth: tetherto/qvac-ext-lib-whisper.cpp#14 — adds the tts-cpp/ subtree, the MTL SOT/EOT + 3-token-stop fixes, and the parakeet-cpp starts_word field, all in one merge.
  • Upstream ggml-speech source-of-truth: tetherto/qvac-ext-ggml#7 — adds persistent VkPipelineCache, the hybrid backend packaging cherry-pick, and the Windows-correct .pcache rename.
  • Standalone tts-cpp dev repo: gianni-cor/chatterbox.cpp — bundled-ggml flow with setup-ggml.sh + patches/, used as the source-of-truth for development before each subtree drop.
  • Downstream consumers: qvac3/packages/tts-ggml (new), qvac3/packages/parakeet-ggml (existing).
  • Per-addon ggml prefix policy (libqvac-ggml-* vs libqvac-speech-ggml-* vs libqvac-diffusion-ggml-*): see ggml-speech portfile and the upstream qvac-ext-ggml#7 design notes.

GustavoA1604 force-pushed the add-tts-cpp branch 2 times, most recently from e7602dd to bb5a664, on May 7, 2026 19:19.
GustavoA1604 changed the title from "Add tts cpp" to "Add tts-cpp port + bump parakeet-cpp / ggml-speech to port-version 1" on May 7, 2026.
GustavoA1604 and others added 3 commits May 7, 2026 16:32
Repoints the port at the upstream commit
tetherto/qvac-ext-lib-whisper.cpp@0b44674,
which adds the new `bool starts_word` field on
parakeet::StreamingSegment.  Lets streaming consumers tell apart a
chunk-boundary wordpiece continuation ("ctuation" after "pun" -> glue
without space, yielding "punctuation") from a fresh word ("if" after
"see" -> insert space) without re-implementing the SentencePiece
detokenizer rules.

Picks the same commit as the new tts-cpp port in this series even
though the parakeet-cpp/ subtree is byte-identical between the
StreamingSegment.starts_word commit (761eca0c) and the latest
tts-cpp tip (0b446740) -- only files under tts-cpp/ moved between
the two.  Reusing one REF means consumers that build both ports in
the same triplet only download tetherto/qvac-ext-lib-whisper.cpp
once instead of fetching two separate tarballs at adjacent commits.

Also tightens the ggml-speech dependency from `2026-04-09` to
`2026-04-09#1` for parity with tts-cpp; both speech ports rely on
the new pv1 (Android hybrid dlopen backends + Vulkan pipeline-cache
Windows fix) and the explicit port-version separator stops a
downstream registry overriding the baseline from silently falling
back to ggml-speech pv0.

git-tree for ports/parakeet-cpp: 59736cc.

Co-authored-by: Cursor <cursoragent@cursor.com>
…s fix)

Repoints ggml-speech at the latest tetherto/qvac-ext-ggml@speech tip
(de7a55e3eea087bed6484607b518d60a3366acbe), which folds in two
functional improvements landed since pv0:

- Hybrid dynamic backends for Android.  The Android variant now ships
  libqvac-speech-ggml-vulkan.so + libqvac-speech-ggml-opencl.so as
  dlopen-able accelerator backends alongside the always-loaded
  libqvac-speech-ggml-cpu-android_armv*_n.so set, matching the same
  packaging shape the LLM addon already uses on Adreno devices.  The
  portfile flips on GGML_BACKEND_DL + GGML_NATIVE=OFF for the speech
  build and emits the per-CPU-feature shared libs the Bare loader
  expects.

- Vulkan pipeline-cache rename(2) -> std::filesystem::rename.  The
  speech-stack Vulkan path used POSIX rename(2) semantics for the
  pipeline cache flush, which silently fails on Windows when the
  destination already exists; switching to std::filesystem::rename
  gives portable overwrite-on-rename semantics.  No build / API
  change.  This was previously planned as a separate pv2 bump but is
  rolled into pv1 here so consumers only see one new entry.

The speech branch otherwise stays at the same set of speech-stack
patches (chatterbox metal ops, opencl whitelist relaxation +
persistent kernel cache, ggml-backend filename-prefix support,
GGML_LIB_OUTPUT_PREFIX).

git-tree for ports/ggml-speech: ee31a48.

Co-authored-by: Cursor <cursoragent@cursor.com>
New port that builds the tts-cpp/ subtree of
tetherto/qvac-ext-lib-whisper.cpp as a standalone vcpkg-cmake package.
Ships Resemble Chatterbox (Turbo + Multilingual variants) and
Supertonic engines under a unified tts_cpp::Engine API and consumes
the existing ggml-speech port for the speech-flavoured ggml backends.

Highlights:
  - Source pulled at the latest tts-cpp tip on tetherto upstream
    (tetherto/qvac-ext-lib-whisper.cpp@0b44674),
    which already includes:
      * Multilingual T3 input wrapped with start_text_token /
        stop_text_token (fixes the "missing leading syllable" bug
        when the addon path consumed Engine::run_t3 without the CLI's
        SOT/EOT wrapping).
      * 3-consecutive-identical-token early-stop heuristic in
        Engine::run_t3 (fixes the German MTL "40 s of silence /
        hissing / random tokens up to n_predict" runaway by bringing
        the Engine path to parity with the CLI's existing guard).
  - Default features mirror ggml-speech's GPU set: metal on Apple,
    opencl on Android, vulkan on Windows / Linux / Android; cuda is
    opt-in.  Each feature only activates the matching ggml-speech
    feature so the system-ggml-speech link line stays consistent.
  - Configure flags: passes -DTTS_CPP_OPENMP=OFF alongside
    -DGGML_OPENMP=OFF for parity with parakeet-cpp.  Without it the
    upstream tts-cpp/CMakeLists.txt defaults TTS_CPP_OPENMP=ON,
    which on macOS / Linux / Android / iOS pulls libomp/libgomp into
    the static libtts-cpp.a transitively and emits
    find_dependency(OpenMP COMPONENTS CXX) into the generated
    share/tts-cpp/tts-cppConfig.cmake, forcing every downstream
    find_package(tts-cpp) consumer to also locate OpenMP.  The
    speech-stack consumers don't want that.
  - Pins ggml-speech >= 2026-04-09#1 (the new pv1 that ships the
    Android hybrid backends + Vulkan pipeline-cache Windows fix);
    explicit port-version separator guarantees a downstream registry
    overriding the baseline can't silently fall back to pv0.

Versions / baseline:
  - versions/t-/tts-cpp.json: single entry { "git-tree":
    "0e910c4d21965556cb40d4fdd991ab104bcdaff0", "version-date":
    "2026-05-07", "port-version": 0 }.
  - versions/baseline.json: register tts-cpp at 2026-05-07#0
    alphabetically between tokenizers-cpp and vcpkg-cmake.

Co-authored-by: Cursor <cursoragent@cursor.com>
GustavoA1604 merged commit 74d2dfd into main on May 7, 2026.
2 checks passed
gianni-cor deleted the add-tts-cpp branch on May 9, 2026 06:58.