Add tts-cpp port + bump parakeet-cpp / ggml-speech to port-version 1 #137
Merged
Repoints the port at the upstream commit tetherto/qvac-ext-lib-whisper.cpp@0b44674, which adds the new `bool starts_word` field on parakeet::StreamingSegment. Lets streaming consumers tell apart a chunk-boundary wordpiece continuation ("ctuation" after "pun" -> glue without space, yielding "punctuation") from a fresh word ("if" after "see" -> insert space) without re-implementing the SentencePiece detokenizer rules.

Picks the same commit as the new tts-cpp port in this series even though the parakeet-cpp/ subtree is byte-identical between the StreamingSegment.starts_word commit (761eca0c) and the latest tts-cpp tip (0b446740) -- only files under tts-cpp/ moved between the two. Reusing one REF means consumers that build both ports in the same triplet only download tetherto/qvac-ext-lib-whisper.cpp once instead of fetching two separate tarballs at adjacent commits.

Also tightens the ggml-speech dependency from `2026-04-09` to `2026-04-09#1` for parity with tts-cpp; both speech ports rely on the new pv1 (Android hybrid dlopen backends + Vulkan pipeline-cache Windows fix) and the explicit port-version separator stops a downstream registry overriding the baseline from silently falling back to ggml-speech pv0.

git-tree for ports/parakeet-cpp: 59736cc.

Co-authored-by: Cursor <cursoragent@cursor.com>
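A minimal sketch of how a streaming consumer might use the flag described in this commit to rebuild a transcript. The `Segment` struct and `join_segments` helper are hypothetical stand-ins, not the parakeet-cpp API; the sketch also assumes each segment's text carries no leading space of its own.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for parakeet::StreamingSegment. Only the two fields
// this sketch needs are modeled; the real struct may look different.
struct Segment {
    std::string text;   // assumed to carry no leading space
    bool starts_word;   // true when the segment begins a fresh word
};

// Appends each segment to the transcript, inserting a space only when the
// segment starts a new word and the transcript is not empty.
std::string join_segments(const std::vector<Segment>& segments) {
    std::string out;
    for (const Segment& seg : segments) {
        if (seg.starts_word && !out.empty()) {
            out += ' ';
        }
        out += seg.text;
    }
    return out;
}
```

With this shape, the consumer never inspects wordpieces itself: the decision to glue or to space is carried entirely by the boolean.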
…s fix)

Repoints ggml-speech at the latest tetherto/qvac-ext-ggml@speech tip (de7a55e3eea087bed6484607b518d60a3366acbe), which folds in two functional improvements landed since pv0:

- Hybrid dynamic backends for Android. The Android variant now ships libqvac-speech-ggml-vulkan.so + libqvac-speech-ggml-opencl.so as dlopen-able accelerator backends alongside the always-loaded libqvac-speech-ggml-cpu-android_armv*_n.so set, matching the same packaging shape the LLM addon already uses on Adreno devices. The portfile flips on GGML_BACKEND_DL + GGML_NATIVE=OFF for the speech build and emits the per-CPU-feature shared libs the Bare loader expects.
- Vulkan pipeline-cache rename(2) -> std::filesystem::rename. The speech-stack Vulkan path used POSIX rename(2) semantics for the pipeline cache flush, which silently fails on Windows when the destination already exists; switching to std::filesystem::rename gives portable overwrite-on-rename semantics. No build / API change. This was previously planned as a separate pv2 bump but is rolled into pv1 here so consumers only see one new entry.

The speech branch otherwise stays at the same set of speech-stack patches (chatterbox metal ops, opencl whitelist relaxation + persistent kernel cache, ggml-backend filename-prefix support, GGML_LIB_OUTPUT_PREFIX).

git-tree for ports/ggml-speech: ee31a48.

Co-authored-by: Cursor <cursoragent@cursor.com>
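To illustrate the overwrite-on-rename behaviour this commit relies on, here is a hedged sketch of a write-then-rename flush. `flush_cache` and the `.tmp` suffix are hypothetical and are not taken from qvac-ext-ggml; the only library behaviour assumed is that `std::filesystem::rename` replaces an existing regular file at the destination.

```cpp
#include <filesystem>
#include <fstream>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper: writes `data` next to `dest`, then renames the
// temporary file over `dest`. Throws fs::filesystem_error if the rename fails.
void flush_cache(const fs::path& dest, const std::string& data) {
    fs::path tmp = dest;
    tmp += ".tmp";
    {
        std::ofstream out(tmp, std::ios::binary | std::ios::trunc);
        out.write(data.data(), static_cast<std::streamsize>(data.size()));
    }  // stream is closed here, so the rename sees the full contents
    fs::rename(tmp, dest);  // replaces an existing regular file at `dest`
}
```

Calling the helper twice on the same path leaves the second payload on disk, which is the property a repeated cache flush needs.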
New port that builds the tts-cpp/ subtree of
tetherto/qvac-ext-lib-whisper.cpp as a standalone vcpkg-cmake package.
Ships Resemble Chatterbox (Turbo + Multilingual variants) and
Supertonic engines under a unified tts_cpp::Engine API and consumes
the existing ggml-speech port for the speech-flavoured ggml backends.
Highlights:
- Source pulled at the latest tts-cpp tip on tetherto upstream
(tetherto/qvac-ext-lib-whisper.cpp@0b44674),
which already includes:
* Multilingual T3 input wrapped with start_text_token /
stop_text_token (fixes the "missing leading syllable" bug
when the addon path consumed Engine::run_t3 without the CLI's
SOT/EOT wrapping).
* 3-consecutive-identical-token early-stop heuristic in
Engine::run_t3 (fixes the German MTL "40 s of silence /
hissing / random tokens up to n_predict" runaway by bringing
the Engine path to parity with the CLI's existing guard).
- Default features mirror ggml-speech's GPU set: metal on Apple,
opencl on Android, vulkan on Windows / Linux / Android; cuda is
opt-in. Each feature only activates the matching ggml-speech
feature so the system-ggml-speech link line stays consistent.
- Configure flags: passes -DTTS_CPP_OPENMP=OFF alongside
-DGGML_OPENMP=OFF for parity with parakeet-cpp. Without it the
upstream tts-cpp/CMakeLists.txt defaults TTS_CPP_OPENMP=ON,
which on macOS / Linux / Android / iOS pulls libomp/libgomp into
the static libtts-cpp.a transitively and emits
find_dependency(OpenMP COMPONENTS CXX) into the generated
share/tts-cpp/tts-cppConfig.cmake, forcing every downstream
find_package(tts-cpp) consumer to also locate OpenMP. The
speech-stack consumers don't want that.
- Pins ggml-speech >= 2026-04-09#1 (the new pv1 that ships the
Android hybrid backends + Vulkan pipeline-cache Windows fix);
explicit port-version separator guarantees a downstream registry
overriding the baseline can't silently fall back to pv0.
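The early-stop heuristic mentioned in the highlights above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in Engine::run_t3: `ends_with_repeated_token` and `decode_with_early_stop` are hypothetical names, `next_token` stands in for one sampling step, and whether the real guard keeps or trims the repeated tokens is not stated in the source.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns true when the last `run` tokens in the sequence are identical.
bool ends_with_repeated_token(const std::vector<int32_t>& tokens,
                              std::size_t run = 3) {
    if (run == 0 || tokens.size() < run) {
        return false;
    }
    const int32_t last = tokens.back();
    for (std::size_t i = tokens.size() - run; i < tokens.size(); ++i) {
        if (tokens[i] != last) {
            return false;
        }
    }
    return true;
}

// Hypothetical decode loop: stops at n_predict tokens, or earlier once the
// same token has been produced three times in a row.
template <typename NextToken>
std::vector<int32_t> decode_with_early_stop(NextToken next_token,
                                            std::size_t n_predict) {
    std::vector<int32_t> tokens;
    while (tokens.size() < n_predict) {
        tokens.push_back(next_token());
        if (ends_with_repeated_token(tokens)) {
            break;  // treat the repetition as a runaway and stop decoding
        }
    }
    return tokens;
}
```

The point of the guard is that the loop no longer depends on `n_predict` alone to terminate a degenerate sequence.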
Versions / baseline:
- versions/t-/tts-cpp.json: single entry { "git-tree":
"0e910c4d21965556cb40d4fdd991ab104bcdaff0", "version-date":
"2026-05-07", "port-version": 0 }.
- versions/baseline.json: register tts-cpp at 2026-05-07#0
alphabetically between tokenizers-cpp and vcpkg-cmake.
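The `2026-04-09#1` pin and the `2026-05-07#0` baseline entry above both use the `version#port-version` form. A simplified model of how such a minimum constraint orders versions, assuming plain ISO dates and treating a missing `#N` as port-version 0, might look like this. `PortVersion`, `parse_port_version`, and `satisfies_minimum` are hypothetical names and this is not vcpkg's implementation.

```cpp
#include <string>
#include <tuple>

// Simplified model of a "version-date#port-version" string.
struct PortVersion {
    std::string date;      // e.g. "2026-04-09"; ISO dates order lexicographically
    int port_version = 0;  // an omitted "#N" is treated as 0
};

PortVersion parse_port_version(const std::string& s) {
    PortVersion v;
    const auto hash = s.find('#');
    v.date = s.substr(0, hash);  // whole string when there is no '#'
    if (hash != std::string::npos) {
        v.port_version = std::stoi(s.substr(hash + 1));
    }
    return v;
}

// True when `candidate` satisfies a `version>=` constraint of `minimum`:
// the date is compared first, then the port-version breaks ties.
bool satisfies_minimum(const PortVersion& candidate, const PortVersion& minimum) {
    return std::tie(candidate.date, candidate.port_version) >=
           std::tie(minimum.date, minimum.port_version);
}
```

Under this model, `2026-04-09` (pv0) fails a `2026-04-09#1` minimum but passes a bare `2026-04-09` minimum, which is the difference the explicit separator is meant to enforce.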
Co-authored-by: Cursor <cursoragent@cursor.com>
gianni-cor
approved these changes
May 7, 2026
Summary
Atomic registry flip for the QVAC speech stack: registers the new `tts-cpp` port (Resemble Chatterbox Turbo + Multilingual + Supertonic TTS, sourced from the `tts-cpp/` subtree of `tetherto/qvac-ext-lib-whisper.cpp`), and bumps the two existing speech-stack ports it ships with, `parakeet-cpp` and `ggml-speech`, to port-version `1`.

All three changes consume ggml from the same `ggml-speech` baseline and are tested together by the downstream `qvac3/packages/tts-ggml` and `qvac3/packages/parakeet-ggml` addons. Splitting them across multiple PRs would force coordinated registry flips for one addon release, which is why they ship as one bundled change here.

What's in the PR
1. `parakeet-cpp` → port-version 1 (commit `cf8dc01`)

Repoints the port at upstream `tetherto/qvac-ext-lib-whisper.cpp@761eca0`, which adds a new `bool starts_word` field on `parakeet::StreamingSegment`. The flag is `true` when the segment's first token's piece carries the SentencePiece `▁` (U+2581) word-boundary marker. Lets streaming consumers tell apart a chunk-boundary wordpiece continuation (`"ctuation"` after `"pun"` → glue without space, yielding `"punctuation"`) from a fresh word (`"if"` after `"see"` → insert space, yielding `"see if"`) without re-implementing the SentencePiece detokenizer rules.

Also exposes `bool token_is_word_start(BpeVocab, int32_t)` from `sentencepiece_bpe.h` so other engines that build their own segments (EOU per-utterance, attributed) can stamp the flag the same way.

Pure additive ABI change; existing callers that don't read the field are byte-equivalent.

`git-tree` for `ports/parakeet-cpp`: `4d5c4f8e101129537413aacc7caf38c3548dcd19`.

2. `ggml-speech` → port-version 1 (commit `a27775c`)

Repoints `ggml-speech` at the latest `tetherto/qvac-ext-ggml@speech` tip (`de7a55e3eea087bed6484607b518d60a3366acbe`), which folds in two functional improvements landed since pv0:

- Hybrid dynamic backends for Android. The Android variant now ships `libqvac-speech-ggml-vulkan.so` + `libqvac-speech-ggml-opencl.so` as `dlopen`-able accelerator backends alongside the always-loaded `libqvac-speech-ggml-cpu-android_armv*_n.so` set, matching the same packaging shape the LLM addon already uses on Adreno devices. The portfile flips on `GGML_BACKEND_DL=ON` + `GGML_NATIVE=OFF` for the speech build and emits the per-CPU-feature shared libs the Bare loader expects.
- Vulkan pipeline-cache `rename(2)` → `std::filesystem::rename`. The speech-stack Vulkan path used POSIX `rename(2)` semantics for the `.pcache` flush, which silently fails on Windows when the destination already exists; switching to `std::filesystem::rename` gives portable overwrite-on-rename semantics. This was previously planned as a separate pv2 bump but is rolled into pv1 here so consumers only see one new entry.

The speech branch otherwise stays at the same set of speech-stack patches (chatterbox metal ops, opencl whitelist relaxation + persistent kernel cache, ggml-backend filename-prefix support, `GGML_LIB_OUTPUT_PREFIX`).

`git-tree` for `ports/ggml-speech`: `ee31a48f4420f6cca6f6962f3c5f690f82d6eb2d`.

3. New port: `tts-cpp` at `version-date 2026-05-07`, port-version 0 (commit `bb5a664`)

New port that builds the `tts-cpp/` subtree of `tetherto/qvac-ext-lib-whisper.cpp@0b446740` as a standalone vcpkg-cmake package. Ships Resemble Chatterbox (Turbo + Multilingual variants) and Supertonic engines under a unified `tts_cpp::Engine` API and consumes the `ggml-speech` port for the speech-flavoured ggml backends.

Source pinned at `0b446740`, which is the latest `tts-cpp` tip on `tetherto/qvac-ext-lib-whisper.cpp@master` (post-merge of `tetherto/qvac-ext-lib-whisper.cpp#14`). Two correctness fixes beyond the initial subtree drop are already in this commit:

- `Engine::run_t3` for MTL now wraps text tokens with `start_text_token` (255) / `stop_text_token`, matching `chatterbox_cli.cpp`'s tokenisation path. Without it, the autoregressive decode dropped the first speech tokens of every MTL utterance, audible as a missing leading syllable on the addon path that doesn't go through the CLI. Turbo is unaffected.
- `AlignmentStreamAnalyzer::token_repetition` guard inside `Engine::run_t3` (gated on `is_mtl`). MTL T3 occasionally emits an end-of-speech silence cadence mid-utterance and then hallucinates ~40 s of trailing low-energy junk until `n_predict=1000`. Most reliably reproduced on German with `seed=42`.

Port shape (all parameters mirror the existing `parakeet-cpp` port for consistency, since both build the same kind of `*-cpp` subtree under the same `qvac-ext-lib-whisper.cpp` umbrella):

- Default features: `metal` on `osx | ios`, `opencl` on `android`, `vulkan` on `windows | linux | android`; `cuda` opt-in only.
- Each feature only activates the matching `ggml-speech` feature so the system-ggml-speech link line stays consistent (no `cuda` on tts-cpp without `cuda` on ggml-speech, etc.).
- Static build (`BUILD_SHARED_LIBS=OFF` + `TTS_CPP_BUILD_SHARED=OFF`) regardless of triplet preference; `VCPKG_POLICY_MISMATCHED_NUMBER_OF_BINARIES` is enabled to suppress the warning on dynamic triplets.
- `VCPKG_BUILD_TYPE=release` (no debug build): matches `parakeet-cpp` precedent, since a debug static archive of the engines balloons disk usage and isn't used by either addon.
- `TTS_CPP_OPENMP=OFF` is forced (see Design notes).
- `TTS_CPP_USE_SYSTEM_GGML=ON` (the only supported mode in this subtree; the bundled-ggml dev path was deleted from the in-tree subtree by design and lives in the standalone `gianni-cor/chatterbox.cpp` repo).
- `version>=` constraint on `ggml-speech` is pinned at `2026-04-09#1`, i.e. the new pv1 from this same PR, locking in the Android hybrid backends + Vulkan rename Windows fix that the addon path needs.

Versions / baseline:

- `versions/t-/tts-cpp.json`: single entry `{ "git-tree": "0e910c4d21965556cb40d4fdd991ab104bcdaff0", "version-date": "2026-05-07", "port-version": 0 }`.
- `versions/baseline.json`: register `tts-cpp` at `2026-05-07#0` alphabetically between `tokenizers-cpp` and `vcpkg-cmake`; bump `ggml-speech` and `parakeet-cpp` baselines to `port-version: 1`.

Design notes (preempting review questions)
These call out the deliberate choices that look unusual at first glance — flagging here so re-review doesn't re-litigate them.
Why ship all three changes in one PR?
`parakeet-cpp`, `tts-cpp`, and `ggml-speech` are three rungs of a single speech-stack baseline:

- `qvac3/packages/parakeet-ggml` (existing addon) consumes `parakeet-cpp` + `ggml-speech`.
- `qvac3/packages/tts-ggml` (new addon) consumes `tts-cpp` + `ggml-speech`.
- Both addons build against the same `ggml-speech` port-version, so the registry flip is atomic.

Splitting these into three PRs would mean three coordinated registry merges for one addon release, with each intermediate state under-tested (e.g. a baseline where `parakeet-cpp` pv1 ships against `ggml-speech` pv0, or `tts-cpp` is registered before its dependency exists at the right port-version). The author tested all three together; the registry should land them together too.

Why is `-DTTS_CPP_OPENMP=OFF` forced in the portfile?

Mirrors the `-DPARAKEET_OPENMP=OFF` flag in the sibling `parakeet-cpp` port. The upstream `tts-cpp/CMakeLists.txt` exposes `option(TTS_CPP_OPENMP ... ON)` and conditionally appends `find_dependency(OpenMP COMPONENTS CXX)` to the generated `share/tts-cpp/tts-cppConfig.cmake` when `OpenMP_CXX_FOUND` is true at install time.

Leaving the option at its upstream default would have two effects on Linux / macOS / Android / iOS triplets (where OpenMP is auto-discovered by the vcpkg toolchain):

- `libtts-cpp.a` ends up linking libomp/libgomp transitively into the consumer's binary.
- `tts-cppConfig.cmake` carries `find_dependency(OpenMP COMPONENTS CXX)`, forcing every downstream `find_package(tts-cpp)` consumer to also locate OpenMP at configure time.

Forcing `OFF` keeps the consumer surface uniform across triplets. The Windows non-MinGW guard upstream auto-disables it anyway, so this only normalizes the non-Windows triplets to the same shape. CAMPPlus is the only TU that uses `#pragma omp parallel for` and runs once per voice-encode at session init, so the perf cost is bounded.

Why pin `tts-cpp`'s `version>=` to `ggml-speech 2026-04-09#1` (not the looser `2026-04-09`)?

`tts-cpp`'s addon path on Android requires the hybrid dynamic backend mode (`libqvac-speech-ggml-vulkan.so` + `libqvac-speech-ggml-opencl.so` as `dlopen`-able modules alongside the per-CPU-feature `.so`s) and on Windows requires the Vulkan `.pcache` rename fix to avoid the cache being frozen at first-write size. Both land in `ggml-speech` pv1 from this PR.

Pinning `version>=` to `2026-04-09#1` (with the explicit `#1` port-version separator vcpkg supports) means a downstream that overrides the registry baseline can't accidentally drop `tts-cpp` against `ggml-speech` pv0, a config the author hasn't validated.

Why is `parakeet-cpp`'s `version>=` on `ggml-speech` left at the looser `2026-04-09` (no port-version pin)?

Asymmetric on purpose. `parakeet-cpp` pv1 is a pure source-level addition (`bool starts_word` on `StreamingSegment`) that doesn't actually require any new `ggml-speech` functionality to work: the new field is set by parakeet-cpp's own SentencePiece detokenizer code, not by ggml. So a consumer of `parakeet-cpp` pv1 against `ggml-speech` pv0 is technically a valid configuration.

Bumping the constraint to `2026-04-09#1` for parakeet-cpp would force a `parakeet-cpp` pv2 (because `parakeet-cpp/vcpkg.json` is already registered in `versions/p-/parakeet-cpp.json` at pv1 with the current tree). That extra port-version churn isn't worth it for a soft tightening; the registry baseline update in this PR already moves all default consumers to `ggml-speech` pv1.

Why does the `tts-cpp` portfile force `BUILD_SHARED_LIBS=OFF` regardless of triplet linkage?

Three reasons:

- `tts-cpp/CMakeLists.txt` declares `option(TTS_CPP_BUILD_SHARED ... OFF)` separately from `BUILD_SHARED_LIBS` because ggml's own CMake declares its own `option(BUILD_SHARED_LIBS)` which pollutes the cache with a platform-dependent default once any configure has run. Forcing both off keeps the linkage deterministic per the project-namespaced option.
- `tts-cpp` test harnesses (`test-mtl-tokenizer`, `test-supertonic-*`) link against `tts-cpp` directly and use detail-namespaced symbols outside the `TTS_CPP_API` public surface. SHARED hides them and disables those targets, but the port already sets `TTS_CPP_BUILD_TESTS=OFF`, so this is moot here.
- The downstream addons (`parakeet-ggml`, `tts-ggml`) link the static archive into a single Bare addon `.so` / `.dll` / `.dylib`. A shared `tts-cpp` would mean two GGML symbol exports in the addon (one through `tts-cpp.so` and one through whatever else picks up `ggml-speech`), which the loader doesn't deduplicate cleanly.

`VCPKG_POLICY_MISMATCHED_NUMBER_OF_BINARIES` at the top of the portfile suppresses the warning vcpkg would otherwise emit on dynamic triplets when the produced binary count doesn't match the triplet's expectation.

Why is the `cuda` feature on `tts-cpp` valid even though the portfile doesn't probe `nvcc`?

`tts-cpp` builds entirely on top of `ggml-speech` via `find_package(ggml CONFIG)` (`TTS_CPP_USE_SYSTEM_GGML=ON`). It compiles no CUDA source itself; passing `-DGGML_CUDA=ON` to its CMake only flips the `GGML_USE_CUDA` define on the `tts-cpp-backend-defs` INTERFACE library so consumers can dispatch through the appropriate backend headers. The actual CUDA backend lives in `ggml-speech`, which does the `nvcc` lookup in its portfile. Feature-dependency chaining (`cuda` on `tts-cpp` implies `cuda` on `ggml-speech`) ensures the runtime CUDA backend exists when the consumer enables the feature.

Why don't the default-features include `cuda` anywhere?

CUDA is opt-in across the registry's speech-stack ports. Most desktop / laptop targets that have a CUDA-capable GPU also have a working Vulkan stack, and Vulkan covers more of the `ggml-speech` test matrix (Linux + Windows + Android). Forcing CUDA into default-features would also force `ggml-speech[cuda]` into every Linux / Windows install, which requires a working CUDA toolkit on the build host. Mirrors the existing `parakeet-cpp` precedent.

Test plan
- `tetherto/qvac-ext-lib-whisper.cpp@0b446740` (tts-cpp source) → `9eb64d8d…405964ff`.
- `tetherto/qvac-ext-lib-whisper.cpp@761eca0c` (parakeet-cpp source) → `ffe69b99…7eb7524d`.
- `tetherto/qvac-ext-ggml@de7a55e3` (ggml-speech source) → `16058815…4fc52a74`.
- `versions/{g,p,t}-/*.json` match `git rev-parse <pr-head>:ports/<port>` for `ggml-speech`, `parakeet-cpp`, `tts-cpp`.
- `versions/baseline.json` insertion is alphabetically correct (`tts-cpp` between `tokenizers-cpp` and `vcpkg-cmake`); both pv1 bumps are reflected.
- `vcpkg install tts-cpp` from a manifest-mode consumer pulls `ggml-speech` pv1 + `tts-cpp` pv0 and produces a clean `share/tts-cpp/tts-cppConfig.cmake` that depends only on `find_dependency(ggml CONFIG)` (no transitive `find_dependency(OpenMP)` after the `-DTTS_CPP_OPENMP=OFF` enforcement).
- `find_package(tts-cpp CONFIG REQUIRED)` from a downstream CMake project resolves `tts-cpp::tts-cpp` as a static archive that links against `ggml::ggml` from `ggml-speech` pv1, on Linux (Vulkan), macOS (Metal), Windows (Vulkan), and Android (opencl + vulkan).
- `vcpkg install ggml-speech` on `arm64-android` produces both `libqvac-speech-ggml-vulkan.so` and `libqvac-speech-ggml-opencl.so` in the `lib/` install directory alongside the per-CPU-feature `libqvac-speech-ggml-cpu-android_armv*_n.so` set (proof of the new hybrid dynamic-backend mode landing).
- `vcpkg install ggml-speech` on Windows (Vulkan): repeated `chatterbox.cpp` runs grow `.pcache` past first-write size on the second-and-later flushes (proof of the `rename(2)` → `std::filesystem::rename` fix landing).
- `qvac3/packages/parakeet-ggml` builds against `parakeet-cpp` pv1 + `ggml-speech` pv1; `StreamingSegment.starts_word` is plumbed through to the JS API.
- `qvac3/packages/tts-ggml` builds against the new `tts-cpp` port + `ggml-speech` pv1 on Linux x86_64 (Vulkan), macOS (Metal), Windows MSVC (Vulkan), Android arm64 (OpenCL + Vulkan); chatterbox-mtl German seed-42 reproduction (the worst case for the 3-identical-token bug) ends cleanly without the trailing 40 s of silence/hissing.
- `parakeet-cpp` consumers that don't read `starts_word` are byte-equivalent; additive ABI only.

Related
- `tetherto/qvac-ext-lib-whisper.cpp#14`: adds the `tts-cpp/` subtree, the MTL SOT/EOT + 3-token-stop fixes, and the `parakeet-cpp` `starts_word` field, all in one merge.
- `tetherto/qvac-ext-ggml#7`: adds persistent `VkPipelineCache`, the hybrid backend packaging cherry-pick, and the Windows-correct `.pcache` rename.
- `gianni-cor/chatterbox.cpp`: bundled-ggml flow with `setup-ggml.sh` + `patches/`, used as the source-of-truth for development before each subtree drop.
- `qvac3/packages/tts-ggml` (new), `qvac3/packages/parakeet-ggml` (existing).
- `libqvac-ggml-*` vs `libqvac-speech-ggml-*` vs `libqvac-diffusion-ggml-*`: see the `ggml-speech` portfile and the upstream `qvac-ext-ggml#7` design notes.