Add tts-cpp/ subtree (Chatterbox Turbo + Multilingual + Supertonic TTS) + integration fixes #14
Merged
…l-org#6)
The standalone setup-ggml.sh + patches/ tooling was dropped from qvac-ext-lib-whisper.cpp/tts-cpp/ in the integration commit, but the CMakeLists.txt still:
* defaulted TTS_CPP_USE_SYSTEM_GGML=OFF, and
* unconditionally compile-defined GGML_BACKEND_DL_PROJECT_PREFIX="speech-" on the bundled ggml target.
That combination quietly broke standalone bundled-ggml builds: the filename-prefix patch was no longer applied, so libspeech-ggml-*.so files existed on disk but ggml's runtime loader still searched for libggml-*.so under GGML_BACKEND_DL=ON. Vulkan / OpenCL / CUDA backends silently failed to load on Android.
Fix per reviewer guidance: converge the speech stack on a single ggml source-of-truth. Standalone-bundled-ggml is no longer a supported build mode out of this in-tree subtree; the canonical path is `-DTTS_CPP_USE_SYSTEM_GGML=ON` against the QVAC speech-stack `ggml-speech` vcpkg port (qvac-ext-ggml/speech branch), which ships the patches pre-applied.
Edits:
- TTS_CPP_USE_SYSTEM_GGML default flipped from OFF to ON in this tree. Docstring spells out the rationale + points users at the standalone github.com/gianni-cor/chatterbox.cpp repo if they need a bundled-ggml dev build with patches/ present.
- The bundled-ggml branch of `if (NOT TARGET ggml)` now refuses to configure when patches/ is absent: a FATAL_ERROR points at the right consumption path (vcpkg ggml-speech) and the standalone fallback. Doesn't break in-tree-with-patches builds (parakeet-cpp in this same repo still ships patches/, so its bundled path is unaffected by this guard inside tts-cpp).
- Verified locally: `cmake -S tts-cpp -B build` (no flags) errors out at find_package(ggml CONFIG REQUIRED) with our new message pointing at the ggml-speech port; `cmake -S tts-cpp -B build -DTTS_CPP_USE_SYSTEM_GGML=OFF` errors out at the patches/ guard with the no-patches message.
- tts-cpp/scripts/setup-ggml.sh deleted: it referenced patches/ that no longer exist; running it would have errored out anyway. The standalone repo keeps its own setup-ggml.sh; only the in-tree subtree drops it.
The standalone chatterbox.cpp repo (the one tts-cpp/ was copied from) keeps TTS_CPP_USE_SYSTEM_GGML=OFF default + the patches/ folder + scripts/setup-ggml.sh. This commit is therefore an integration-time delta against that source, not a change to the standalone build flow.
Co-authored-by: Cursor <cursoragent@cursor.com>
)
The README was a verbatim copy of the standalone chatterbox.cpp repo, which makes it read as 'I cloned the wrong repo' to anyone landing on tts-cpp/ inside qvac-ext-lib-whisper.cpp. Per the reviewer's two-line ask: rewrite section 1 + global s/chatterbox.cpp/tts-cpp where it's a directory or repo-name reference (kept where it points at the upstream chatterbox.cpp project itself).
Edits:
- Title changes from `# chatterbox.cpp` to `# tts-cpp` plus a blockquote note up top: this is the in-tree subtree of github.com/gianni-cor/chatterbox.cpp; the integration drops setup-ggml.sh + patches/, ggml comes through the qvac-ext-ggml speech-branch vcpkg port, see section 1 for the build flow.
- Section 1 (was '## 1. Clone and build', the standalone clone + setup-ggml.sh + patches/ flow) replaced with '## 1. Build from the qvac speech stack':
  * one find_package(tts-cpp CONFIG REQUIRED) cmake snippet for downstream consumption;
  * one cmake -S tts-cpp -B build -DCMAKE_TOOLCHAIN_FILE=vcpkg.cmake flow for in-tree dev;
  * pointer at the standalone github.com/gianni-cor/chatterbox.cpp repo for anyone needing a bundled-ggml dev build.
  Drops the entire setup-ggml.sh paragraph + GPU-acceleration paragraph that referenced patches/.
- 'Useful CMake options' table: TTS_CPP_USE_SYSTEM_GGML row default flipped from OFF to 'ON (this in-tree subtree)', cell explains that flipping OFF is rejected here (no patches/) and points at the standalone repo for the OFF default.
- 'Alternative: consume ggml from vcpkg' subsection collapsed to 'How TTS_CPP_USE_SYSTEM_GGML=ON resolves ggml' since it's now the canonical path, not the alternative. Drops the now-stale 'preserves the standalone flow above untouched, opt-in escape hatch for package-manager-driven builds' paragraph.
- 'Consumer integration' subsection rewritten from the wrapper-port perspective ('this in-tree subtree IS the wrapper port') instead of the standalone perspective ('downstream projects consume through the wrapper port').
- Benchmark tables (Mac M3 Ultra + Linux RTX 5090): four '`chatterbox.cpp` Q4_0' implementation-name cells become '`tts-cpp` Q4_0'; the '`chatterbox.cpp` (Metal) is...' / '`chatterbox.cpp` (Vulkan) is...' captions follow.
- Repository layout tree: root dir name `chatterbox.cpp/` becomes `tts-cpp/` with a one-line caveat naming the standalone source-of-truth. Drops the `ggml/` entry (no bundled ggml in this subtree by default), drops the `setup-ggml.sh` line under scripts/ (the file no longer exists - removed in the previous commit), updates the chatterbox_cli.cpp comment from 'tts-cli + chatterbox binaries' to 'tts-cli binary' since the back-compat chatterbox alias is dropped in the standalone source too.
- One '# Build chatterbox.cpp, then:' bash comment in the reproduction snippet becomes '# Build tts-cpp, then:'.
- Lower 'tts-cli / chatterbox binaries' API-overview phrasing becomes 'tts-cli binary' to match the actual built artefact.
Five `chatterbox.cpp` references stay on purpose: the title-card URL, the section-1 'use the standalone repo' pointer, the useful-cmake-options note about the OFF default, the how-system-ggml-resolves prose, and the repo-layout caveat. Each one points at the upstream project github.com/gianni-cor/chatterbox.cpp by URL/name, not at this directory.
No code changes; README.md only.
Co-authored-by: Cursor <cursoragent@cursor.com>
…gml-org#27)
Adds a small 'QVAC speech-stack ports' section between the upstream whisper.cpp intro media and the 'Quick start' section, pointing at the two in-tree subtrees this fork carries:
- tts-cpp/ - Chatterbox (Turbo + Multilingual) + Supertonic TTS, in-tree subtree of github.com/gianni-cor/chatterbox.cpp.
- parakeet-cpp/ - NVIDIA Parakeet FastConformer ASR + Sortformer diarization, in-tree subtree of the parakeet.cpp standalone repo.
Both consume ggml through the `ggml-speech` vcpkg port (the qvac-ext-ggml/speech branch). Each subtree has its own README, build flow, and public C++ API; the upstream whisper.cpp build below the new section is unaffected.
Closes review ggml-org#27 ('one-line pointer to tts-cpp/ from the top-level qvac-ext-lib-whisper.cpp/README.md'). The reviewer specifically asked for tts-cpp; included parakeet-cpp at the same time so a future 'fix the un-fixed parakeet-cpp version of this bullet' commit doesn't need to revisit the same paragraph.
Co-authored-by: Cursor <cursoragent@cursor.com>
Re-syncs the in-tree subtree with the standalone chatterbox.cpp
source-of-truth after seven round-3 review items landed there. The
diff was generated from chatterbox.cpp commits 2d3632b..0a5ad2d and
applied with `git apply --directory=tts-cpp/`; no path-level
conflicts because the subtree was last copied from the same source.
Mirrored commits (chatterbox.cpp side):
- ef0eb36 supertonic: alive-registry guards thread_local cache
teardown vs freed backend (N1)
- fcbff16 engine: Turbo BPE try/catch + drop dead cached_text_lc +
clarify view-vs-copy log (N3 + N6 + N7)
- 055ce84 log: drop dead g_sink_* state, soften thread-safety
docstring (N2)
- 75fbd22 s3gen: cancel checkpoint between STFT and HiFT + tighten
Engine::cancel() doc (N4)
- 0a5ad2d s3gen: document s3gen_preload/unload refcount semantics
on the public header (N5)
Files touched (11):
include/tts-cpp/chatterbox/engine.h (N4 docstring)
include/tts-cpp/chatterbox/s3gen_pipeline.h (N5 docstring)
include/tts-cpp/log.h (N2 docstring)
src/chatterbox_engine.cpp (N3 try/catch)
src/chatterbox_tts.cpp (N4 stft cancel + N7 log)
src/log.cpp (N2 dead-state drop)
src/supertonic_gguf.cpp (N1 alive-registry)
src/supertonic_internal.h (N1 helper API)
src/supertonic_text_encoder.cpp (N1 free-cache gate)
src/supertonic_vector_estimator.cpp (N1 + N6)
src/supertonic_vocoder.cpp (N1 free-cache gate)
The two integration-only review items (N8 unreachable LIB_PREFIX
block, N10 stale patches/ refs in README) land in separate commits
on this branch since they don't correspond to chatterbox.cpp
changes. N9 (per-call seed override) and N11 (richer backend_name)
were dropped per user direction.
Build verification was done on chatterbox.cpp's standalone build (the
source-of-truth); not re-built here because TTS_CPP_USE_SYSTEM_GGML
defaults ON in this in-tree subtree and requires the ggml-speech
vcpkg port installed to configure.
Co-authored-by: Cursor <cursoragent@cursor.com>
After commit fa0d490 (review ggml-org#5+ggml-org#6) made bundled-add_subdirectory(ggml) hard-error in this in-tree subtree when patches/ is absent, the TTS_CPP_GGML_LIB_PREFIX block became dead code:
    if (TTS_CPP_GGML_LIB_PREFIX AND NOT TTS_CPP_USE_SYSTEM_GGML)
NOT TTS_CPP_USE_SYSTEM_GGML can never reach this `if` here - configure has already FATAL_ERROR'd at the patches/-absent guard. The option, the helper function, the foreach loop, the GGML_BACKEND_DL_PROJECT_PREFIX define, and the STATUS message were all unreachable. The next maintainer flipping -DTTS_CPP_GGML_LIB_PREFIX=OFF to disable prefixing would have been silently confused when nothing changed.
Edits:
tts-cpp/CMakeLists.txt:
- The option() declaration at line 22 removed. Replaced with a one-paragraph cross-reference to the standalone chatterbox.cpp repo for the locally-rename flow + the rationale (ggml-speech vcpkg port emits the libspeech-ggml-* filenames itself).
- The 41-line block at lines 131-176 (tts_cpp_apply_ggml_prefix function + foreach + target_compile_definitions + STATUS message) replaced with a 9-line note telling future readers where the standalone counterpart lives.
tts-cpp/README.md:
- Useful CMake options table row for TTS_CPP_GGML_LIB_PREFIX rewritten with a strikethrough + "n/a in this subtree" cell: explains the standalone option exists at chatterbox.cpp upstream, why it's unnecessary here (ggml-speech vcpkg port handles the rename at its own build time), and that the file-prefix surface is whatever vcpkg installs.
Doc-only behavior visible to consumers: the integrated subtree no longer has a TTS_CPP_GGML_LIB_PREFIX option at all. Build behaviour unchanged - the vcpkg find_package path was already taking effect and emitting libspeech-ggml-* as designed.
Co-authored-by: Cursor <cursoragent@cursor.com>
Two spots in the README still pointed at a `patches/` directory that
isn't in this in-tree subtree (deleted in the integration commit;
the ggml-speech vcpkg port carries the equivalent pre-applied):
(a) §3.24-§3.30 Metal optimisation explanation: "Patch
`patches/ggml-metal-chatterbox-ops.patch` (1088 lines) applies
cleanly on a fresh ggml clone at pinned `58c38058`." Reads as
if the file lives at this subtree's patches/ today.
(b) The "Repository layout" project-tree diagram listed
`patches/ggml-metal-chatterbox-ops.patch` /
`ggml-opencl-chatterbox-ops.patch` / `README.md` as if they were
here.
Edits:
(a) Reworded to "the 1088-line ggml-metal patch backing these
kernel changes is shipped pre-applied by the `ggml-speech`
vcpkg port (qvac-ext-ggml/speech branch); the standalone
chatterbox.cpp repo carries it under
`patches/ggml-metal-chatterbox-ops.patch` against pinned ggml
`58c38058`." Same technical claim, accurate provenance for
this subtree.
(b) The patches/ block in the project-tree diagram replaced with a
parenthetical note pointing at the standalone repo for the
locally-applied flow.
The other five `patches/` mentions in the README (lines 5, 352, 362,
376, 434) are deliberate cross-references to the standalone
chatterbox.cpp repo or describe the
"flipping TTS_CPP_USE_SYSTEM_GGML=OFF rejected because patches/ is
absent here" rationale. Those stay.
Doc-only; no code or build behaviour change.
Co-authored-by: Cursor <cursoragent@cursor.com>
….cpp
System-ggml build of this in-tree subtree was failing in the
ggml-speech vcpkg port because the standalone source included the
internal ggml/src/ggml-quants.h header which isn't installed by
ggml-speech. The standalone chatterbox.cpp source was just bumped
to use ggml_get_type_traits() + tr->to_float instead, mirroring the
parakeet.cpp pattern.
Mirrored from chatterbox.cpp commit edf9e50 via
`git apply --directory=tts-cpp` against the standalone diff.
src/supertonic_gguf.cpp:
- Drop `#include "ggml-quants.h"`.
- expand_supertonic_tensor_to_f32() now uses
ggml_get_type_traits(src->type)->to_float instead of the
direct ggml_fp16_to_fp32_row / dequantize_row_q8_0 calls.
No public API change; runtime behaviour is bit-equivalent because
to_float dispatches into the same row dequantizers internally.
The qvac-registry-vcpkg/ports/tts-cpp portfile + version bump to
pick up this commit lands in a follow-up.
Co-authored-by: Cursor <cursoragent@cursor.com>
…rbox.cpp
Mirrors chatterbox.cpp commit e481901 to the in-tree subtree.
tts-cpp builds as a STATIC archive by default and links OpenMP as
PRIVATE; install(EXPORT) records that as an
IMPORTED_LINK_DEPENDENT_LIBRARY in tts-cppTargets.cmake, so
consumers doing find_package(tts-cpp CONFIG REQUIRED) failed at
target-property time with
The link interface of target "tts-cpp::tts-cpp" contains:
OpenMP::OpenMP_CXX
but the target was not found.
That hit qvac3/packages/tts-ggml after the integrated tts-cpp
vcpkg port @ 2026-05-07#0 finally compiled and installed.
Fix: tts-cppConfig.cmake re-imports OpenMP via find_dependency
before including tts-cppTargets.cmake; conditionally injected so
the dep is only required of consumers when OpenMP was actually
found and linked at build time.
tts-cpp/cmake/tts-cppConfig.cmake.in:
Add @TTS_CPP_OPTIONAL_DEPS@ substitution slot directly after
the existing find_dependency(ggml CONFIG).
tts-cpp/CMakeLists.txt (install block):
Build TTS_CPP_OPTIONAL_DEPS by appending "find_dependency(OpenMP)\n"
iff OpenMP_CXX_FOUND, otherwise leave empty;
configure_package_config_file substitutes it in. Backwards-
compatible with builds where OpenMP isn't available
(find_package(OpenMP) is non-REQUIRED).
The qvac-registry-vcpkg/ports/tts-cpp port-version 0 entry will be
amended in place to point at this commit (pre-merge convention:
single squashed commit + force-push until upstream merge).
Co-authored-by: Cursor <cursoragent@cursor.com>
…rbox.cpp
Mirrors chatterbox.cpp commit c91f2d9. Follow-up to commit e8f6065.
The unscoped find_dependency(OpenMP) emitted into tts-cppConfig.cmake by the previous fix made consumers' CMake also probe OpenMP_C, which fails on bare-make's clang-cl-style toolchain even when CXX-side OpenMP is fine:
    Could NOT find OpenMP_C (missing: OpenMP_C_FLAGS OpenMP_C_LIB_NAMES)
    ...share/tts-cpp/tts-cppConfig.cmake:29 (find_dependency)
tts-cpp only links OpenMP::OpenMP_CXX, never the C variant.
Fix: in tts-cpp/CMakeLists.txt install block, change the line that appends to TTS_CPP_OPTIONAL_DEPS so it emits find_dependency(OpenMP COMPONENTS CXX) instead of bare find_dependency(OpenMP). CMake's FindOpenMP module respects COMPONENTS and scopes the probe to that language only; OpenMP::OpenMP_CXX is still imported, OpenMP_C is not required.
The qvac-registry-vcpkg/ports/tts-cpp port-version 0 entry will be amended in place to point at this commit (pre-merge convention).
Co-authored-by: Cursor <cursoragent@cursor.com>
Mirrors chatterbox.cpp commit e6031b2.
Replaces the bare find_package(OpenMP) call with parakeet-style
gating so OpenMP auto-defaults OFF on Windows non-MinGW (the
toolchain combination where vcpkg's MSVC port build ends up
linking OpenMP::OpenMP_CXX into the static-archive transitive
interface, only for consumers - including bare-make's clang-cl
CMake - to fail re-probing OpenMP_CXX or OpenMP_C at
find_package(tts-cpp) time).
Edit (tts-cpp/CMakeLists.txt, ~line 150):
    option(TTS_CPP_OPENMP "tts-cpp: enable OpenMP for the tts-cpp target" ON)
    if (WIN32 AND NOT MINGW AND TTS_CPP_OPENMP
        AND NOT DEFINED CACHE{TTS_CPP_OPENMP_USER_OVERRIDE})
      set(TTS_CPP_OPENMP OFF CACHE BOOL "" FORCE)
      message(STATUS "...")
    endif()
    if (TTS_CPP_OPENMP)
      find_package(OpenMP)
    endif()
Net effect inside the qvac-registry-vcpkg/ports/tts-cpp port
build (x64-windows triplet, vcpkg's MSVC toolchain): OpenMP_CXX
is never searched, the target_link_libraries(... PRIVATE
OpenMP::OpenMP_CXX) lines are skipped, the install(EXPORT)
emits no OpenMP transitive dep, and tts-cppConfig.cmake's
@TTS_CPP_OPTIONAL_DEPS@ slot stays empty (no
find_dependency(OpenMP) is generated). Consumer toolchains with
broken or missing OpenMP detection are no longer blocked.
Trade-off: the 9 #pragma omp parallel for loops in
src/campplus.cpp run serially in this build mode. CAMPPlus
preprocessing is a small fraction of total synth time; the perf
delta is bounded. Override available via
-DTTS_CPP_OPENMP_USER_OVERRIDE=ON -DTTS_CPP_OPENMP=ON for
toolchains that do have working CXX OpenMP.
The qvac-registry-vcpkg/ports/tts-cpp port-version 0 entry will
be amended in place to point at this commit (pre-merge
convention: single squashed commit + force-push until upstream
merge).
Co-authored-by: Cursor <cursoragent@cursor.com>
Mirrors chatterbox.cpp commit 8c849cc. Adds tts-cpp/include/tts-cpp/backend.h with the BackendDevice enum (CPU = 0, GPU = 1) and a backend_device() method on both chatterbox::Engine and supertonic::Engine. Implementation routes through the ggml backend registry (ggml_backend_get_device + ggml_backend_dev_type) so it works in both GGML_BACKEND_DL modes. Same shape as parakeet.cpp's parakeet::Engine::backend_device(), matched intentionally so the qvac3 tts-ggml addon can mirror ParakeetModel's load-time backend resolution (read backend_device() + backend_name(), map to backendIdFromName(), expose both on JS via RuntimeStats). See chatterbox.cpp commit 8c849cc message for the full technical rationale. The qvac-registry-vcpkg/ports/tts-cpp port-version 0 entry will be amended in place to point at this commit (pre-merge convention). Co-authored-by: Cursor <cursoragent@cursor.com>
… chatterbox.cpp
Mirrors chatterbox.cpp commit 78ae3c5.
Engine::synthesize_batch now gates apply_trim_fade on the actual
presence of a voice override (reference_audio path or voice_dir).
When both are empty - i.e. the chatterbox::Engine built-in-voice
default that loads s3gen/builtin/{embedding,prompt_token,prompt_feat}
from the GGUF - apply_trim_fade is false so the first 40 ms of
synthesized speech is no longer zeroed + faded.
This unblocks the chatterbox-mtl variant in particular: its
upstream conds.pt produces audio with zero leading silence, and
the previous unconditional apply_trim_fade was clipping the
leading consonant of the first word ("Hello" -> "lo", "El" -> "l",
"A" -> nothing) under that configuration. See chatterbox.cpp
commit 78ae3c5 for the full diagnosis + empirical confirmation.
Reference-audio / voice_dir paths keep apply_trim_fade=true and
behave exactly as before; streaming path is unchanged.
The qvac-registry-vcpkg/ports/tts-cpp port-version 0 entry will
be amended in place to point at this commit (pre-merge convention:
single squashed commit + force-push until upstream merge).
Co-authored-by: Cursor <cursoragent@cursor.com>
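The gating rule described in this commit can be sketched in a few lines. This is only an illustration, not the code in `Engine::synthesize_batch`: the `VoiceOverride` struct and `should_apply_trim_fade` helper are hypothetical names, and the only behaviour taken from the commit message is "fade when a reference-audio path or voice_dir is set, skip it for the built-in voice".

```cpp
#include <string>

// Hypothetical stand-in for the voice-override inputs named in the
// commit message; the real engine options may be shaped differently.
struct VoiceOverride {
    std::string reference_audio;  // path to a reference clip, empty if unset
    std::string voice_dir;        // directory with a saved voice, empty if unset
};

// Trim/fade only when the caller supplied a voice override. With both
// fields empty (built-in voice from the GGUF) the leading audio is kept.
bool should_apply_trim_fade(const VoiceOverride& v) {
    return !v.reference_audio.empty() || !v.voice_dir.empty();
}
```

Under this sketch a default-constructed `VoiceOverride` yields `false`, which corresponds to the built-in-voice path that previously lost its leading consonant.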
…ment
Adds `bool starts_word` to `parakeet::StreamingSegment`, set true when the segment's first token's piece carries the SentencePiece "▁" word-boundary marker (U+2581) and false when it is a wordpiece continuation.
Streaming consumers can use the flag to decide whether to insert a space between successive segments without re-parsing whitespace from `seg.text` (the inner detokenizer strips leading whitespace at the session level, which loses the signal for the chunk that opens a session). With the flag, "see" + "if" stays as "see if" while the chunk-boundary split "pun" + "ctuation" rejoins as "punctuation".
Also exposes `bool token_is_word_start(BpeVocab, int32_t)` from sentencepiece_bpe.h so other engines that build their own segments (EOU per-utterance, attributed) can stamp the flag the same way.
Defaults `starts_word = true` so existing callers that ignore the field see no behavioural change.
Co-authored-by: Cursor <cursoragent@cursor.com>
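A consumer-side sketch of how the flag might be used when stitching streamed text back together. The `Segment` struct below is a hypothetical stand-in that models only the two fields mentioned in the commit message (`text` and `starts_word`); it is not the real `parakeet::StreamingSegment`, and `join_segments` is an illustrative helper rather than part of the library.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for parakeet::StreamingSegment (two fields only).
struct Segment {
    std::string text;
    bool starts_word = true;  // same default the commit message describes
};

// Concatenate segment texts, inserting a space only before segments that
// open a new word. Continuation pieces are glued to the previous text.
std::string join_segments(const std::vector<Segment>& segs) {
    std::string out;
    for (const Segment& seg : segs) {
        if (seg.text.empty()) continue;
        if (!out.empty() && seg.starts_word) out += ' ';
        out += seg.text;
    }
    return out;
}
```

With this helper, segments "see" and "if" (both word starts) join as "see if", while "pun" followed by a continuation "ctuation" joins as "punctuation", matching the two cases given in the commit message.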
…_token
Mirrors src/chatterbox_cli.cpp's MTL tokenisation path and the Python
ChatterboxMultilingualTTS.generate reference (chatterbox-ref/src/chatterbox/
mtl_tts.py:288-291). The MTL T3 prompt graph anchors position 0 on
start_text_token (255); without it the autoregressive decode drops the
first speech tokens, audible as a missing leading syllable
("Hello" -> "lo from the multilingual").
Turbo (gpt2_bpe) is unaffected and keeps the existing single-line tokenise
+ punc_norm path.
Co-authored-by: Cursor <cursoragent@cursor.com>
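The wrapping step can be pictured as follows. This is a minimal sketch, not the engine code: `wrap_text_tokens` is a hypothetical helper, the start id 255 comes from the commit message, and the stop id is passed in by the caller because its value is not stated here.

```cpp
#include <cstdint>
#include <vector>

// Surround the tokenised text with start/stop markers so that position 0
// of the prompt holds the start token, as the MTL T3 graph expects.
std::vector<int32_t> wrap_text_tokens(const std::vector<int32_t>& text_tokens,
                                      int32_t start_text_token,
                                      int32_t stop_text_token) {
    std::vector<int32_t> out;
    out.reserve(text_tokens.size() + 2);
    out.push_back(start_text_token);
    out.insert(out.end(), text_tokens.begin(), text_tokens.end());
    out.push_back(stop_text_token);
    return out;
}
```

A caller on the MTL path would pass 255 as `start_text_token`; the Turbo path would not call this helper at all, since it keeps the existing tokenise + punc_norm flow.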
…gine::run_t3 MTL T3 occasionally emits a plausible end-of-speech silence cadence (three identical tokens in a row) mid-utterance and then hallucinates low-energy content -- silence, hissing, garbage tokens -- until n_predict (1000) is reached, producing ~40 s of trailing junk on a short input. chatterbox_cli.cpp already guards against this via the AlignmentStreamAnalyzer token_repetition port, but Engine::run_t3 was missing the same check, so the addon path (which doesn't go through the CLI) saw the regression on whichever language/seed combinations happen to hit the cadence (most reliably reproduced on German with the default seed=42). Mirrors the CLI's existing guard 1:1, gated on is_mtl since the Turbo codebook has a different cadence signature. Co-authored-by: Cursor <cursoragent@cursor.com>
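The early-stop idea can be sketched as below, under stated assumptions: `ends_with_repeated_run` and `decode_with_guard` are hypothetical names, `next_token` stands in for one T3 sampling step, and the real guard (a port of the AlignmentStreamAnalyzer token_repetition check) may apply extra conditions that are not modelled here.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// True when the last `run` tokens in the sequence are all identical.
bool ends_with_repeated_run(const std::vector<int32_t>& tokens,
                            std::size_t run = 3) {
    if (run == 0 || tokens.size() < run) return false;
    const int32_t last = tokens.back();
    for (std::size_t i = tokens.size() - run; i < tokens.size(); ++i) {
        if (tokens[i] != last) return false;
    }
    return true;
}

// Hypothetical decode loop: stops at n_predict, or earlier when the
// multilingual path sees three identical speech tokens in a row.
template <typename NextToken>
std::vector<int32_t> decode_with_guard(NextToken next_token,
                                       int n_predict, bool is_mtl) {
    std::vector<int32_t> out;
    for (int i = 0; i < n_predict; ++i) {
        out.push_back(static_cast<int32_t>(next_token(i)));
        if (is_mtl && ends_with_repeated_run(out)) break;
    }
    return out;
}
```

Gating on `is_mtl` mirrors the commit's note that the Turbo codebook has a different cadence signature, so the same check is not applied there.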
gianni-cor pushed a commit that referenced this pull request on May 28, 2026: Add tts-cpp/ subtree (Chatterbox Turbo + Multilingual + Supertonic TTS) + integration fixes
pratiknarola-t added a commit that referenced this pull request on May 28, 2026:
…in init_gpu_backend On Adreno + PR #14/#15 the policy correctly picks OpenCL and Chatterbox runs to completion. On Vulkan-on-Mali (Google Pixel 9 Pro XL / Tensor G4) ggml_backend_dev_init throws an unhandled C++ exception during pipeline init, which bubbles up to libc++abi::terminate() and SIGABRT crashes the host process before the caller can react. Wrap the call in try-catch inside try_init: on any exception, log verbosely and 'continue' to the next candidate; if every candidate in a bucket throws or returns null, the lambda returns nullptr and the policy proceeds to the next bucket. After all buckets fail init_gpu_backend returns nullptr and the caller falls back to CPU -- which is exactly what 'no usable GPU available' should mean. Defensive layer that handles any future bad-GPU vendor (not Mali specific): SIGABRT during GPU init is never an acceptable failure mode for a TTS engine that has a working CPU path. Validated against Pixel 9 Pro XL on AWS Device Farm via the QVAC-19254 [DO NOT MERGE] test PR (tetherto/qvac#2320). QVAC-19254
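The control flow described above (catch per candidate, move to the next bucket, return null so the caller falls back to CPU) can be sketched independently of ggml. Everything here is a hypothetical stand-in: `Backend` replaces the real backend handle, `InitFn` replaces the call into `ggml_backend_dev_init`, and in the actual commit `try_init` is a lambda inside `init_gpu_backend` rather than a free function.

```cpp
#include <exception>
#include <functional>
#include <iostream>
#include <stdexcept>
#include <vector>

struct Backend { int id; };                 // hypothetical backend handle
using InitFn = std::function<Backend*()>;   // hypothetical device-init call

// Try each candidate in one bucket; an exception or a null result both
// mean "skip this device", never "abort the process".
Backend* try_init(const std::vector<InitFn>& candidates) {
    for (const InitFn& init : candidates) {
        try {
            if (Backend* b = init()) return b;
        } catch (const std::exception& e) {
            std::cerr << "GPU init threw: " << e.what() << ", trying next\n";
        } catch (...) {
            std::cerr << "GPU init threw an unknown exception, trying next\n";
        }
    }
    return nullptr;
}

// Walk the buckets in policy order; nullptr tells the caller to use CPU.
Backend* init_gpu_backend(const std::vector<std::vector<InitFn>>& buckets) {
    for (const std::vector<InitFn>& bucket : buckets) {
        if (Backend* b = try_init(bucket)) return b;
    }
    return nullptr;
}
```

The catch-all handler is included because a driver-level failure is not guaranteed to derive from `std::exception`; whether the real patch catches `...` as well is an assumption of this sketch.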
Summary
Adds the
tts-cpp/in-tree subtree alongside the existingparakeet-cpp/subtree, completing the QVAC speech stack inside this whisper.cpp fork.tts-cpp/is a port ofgianni-cor/chatterbox.cpp— Resemble AI's Chatterbox TTS (Turbo, English-only; Multilingual, 18 tier-1 languages) plus Supertonic TTS — running onggmlwith CPU / Metal / Vulkan / CUDA backends, no Python/PyTorch dependency.The PR also wires the subtree into the QVAC speech-stack consumption pattern (
ggml-speechvcpkg port fromqvac-ext-ggml/speech), folds in the round-3 review fixes that landed in the standalone repo after the initial port, and adds one parakeet-cpp streaming-API change required by downstream consumers.tts-cpp→masterWhat's in the PR
1.
tts-cpp/subtree drop (commitef840d5)Initial squashed import from the standalone
chatterbox.cppsource-of-truth. Layout followsparakeet-cpp/:tts-cpp/include/tts-cpp/— public C++ API (chatterbox/engine.h,supertonic/engine.h,backend.h,log.h,tts-cpp.h,export.h).tts-cpp/src/— implementation (Chatterbox Turbo + MTL T3, Supertonic flow + vocoder, S3Gen, S3Tokenizer, CAMPPlus voice encoder, mel2wav HiFT, MTL tokenizer, GPT-2 BPE, log).tts-cpp/test/— gtest suite mirroring each engine stage (T3, S3Gen, vocoder, CAMPPlus, MTL tokenizer, streaming, etc.).tts-cpp/scripts/— Python conversion scripts (HF safetensors → GGUF for T3-Turbo / T3-MTL / S3Gen / Supertonic2; reference-trace dumps for diffing against PyTorch; voice extraction).tts-cpp/CMakeLists.txt+tts-cpp/cmake/tts-cppConfig.cmake.in— install/export, OpenMP detection,find_package(tts-cpp CONFIG)consumer surface.tts-cpp/README.md+tts-cpp/PROGRESS.md+tts-cpp/PROGRESS_SUPERTONIC.md— developer docs.Performance summary (full table in
tts-cpp/README.md):--cfm-steps 7: 1.05 s wall, 48.4× faster than ONNX Runtime CPU Q4.2. Top-level
README.mdpointer (commita2f2dd6)Adds a small "QVAC speech-stack ports" section between the upstream whisper.cpp intro media and the Quick start, pointing at both
tts-cpp/andparakeet-cpp/as in-tree subtrees with their own READMEs / build flows / public C++ APIs. Upstream whisper.cpp build below is unaffected.3. Integration deltas vs the standalone repo (commits
fa0d490,ae34c58,8ba10a6,e673182)The standalone
chatterbox.cppships withscripts/setup-ggml.sh+patches/for bundled-ggml dev builds. Per reviewer guidance, the in-tree subtree is converged on a single ggml source-of-truth — the QVAC speech-stackggml-speechvcpkg port (which ships the patches pre-applied) — instead of carrying a parallel patches tree:fa0d490(review #5+#6): defaultTTS_CPP_USE_SYSTEM_GGML=ONin this subtree; bundled-ggml branch hard-errors with a pointer at the canonical consumption path; deletestts-cpp/scripts/setup-ggml.sh(it referenced patches/ that no longer exist; it would have errored out anyway).ae34c58(review #26): rewritetts-cpp/README.md§1 from "clone + setup-ggml.sh + patches" to "build via the QVAC speech stack"; flip theTTS_CPP_USE_SYSTEM_GGMLrow default in the cmake-options table; rewriteConsumer integrationfrom the standalone perspective to the wrapper-port perspective; relabel benchmark and code-tree references fromchatterbox.cpp/totts-cpp/where they refer to this directory (kept where they refer to the upstream project itself).8ba10a6(review N8): drop the now-unreachableTTS_CPP_GGML_LIB_PREFIXblock (thefind_package(ggml)path resolveslibspeech-ggml-*natively from the vcpkg port, the local-rename helper is dead code in this subtree). README row updated with a strikethrough + "n/a in this subtree" note pointing at the standalone repo for the OFF path.e673182(review N10): scrub two stalepatches/references fromtts-cpp/README.md(§3.24 Metal-optimisation explanation and the repository-layout tree); rewrite to attribute the patch to theggml-speechvcpkg port. Five otherpatches/mentions stay on purpose — they're cross-references to the standalone repo or describe the rejection rationale.4. Round-3 review fixes mirrored from standalone (commits
4b5d2d7,28ef67d,e8f6065,04b87ea,64abb81,1963f9f,942686d)Re-syncs the subtree with the standalone source-of-truth for fixes that landed there after the initial port. Each commit is a 1:1
`git apply --directory=tts-cpp/` of the corresponding upstream diff:

- `4b5d2d7` (review N1–N7): Supertonic alive-registry guards `thread_local` cache teardown vs freed backend (N1); Engine BPE try/catch + dead-state cleanup (N3, N6, N7); log docstring softening (N2); s3gen cancel checkpoint between STFT and HiFT + `Engine::cancel()` doc (N4); `s3gen_preload/unload` refcount semantics on the public header (N5).
- `28ef67d`: drop internal `ggml-quants.h` include from `supertonic_gguf.cpp`; use `ggml_get_type_traits()->to_float` instead. Required because `ggml-speech` doesn't install internal ggml headers. Bit-equivalent runtime.
- `e8f6065`: `tts-cppConfig.cmake.in` re-imports OpenMP via `find_dependency` before including the targets file. Without this, consumers of the static archive failed at target-property time with `OpenMP::OpenMP_CXX … target was not found`.
- `04b87ea`: scope the `find_dependency(OpenMP)` to `COMPONENTS CXX`. Bare `find_dependency(OpenMP)` was probing OpenMP_C in consumers, which fails on bare-make's clang-cl-style toolchain even when CXX-side OpenMP is fine.
- `64abb81`: `TTS_CPP_OPENMP` option, defaults OFF on Windows non-MinGW (where vcpkg's MSVC toolchain links OpenMP transitively but consumers can't re-probe it). Override via `-DTTS_CPP_OPENMP_USER_OVERRIDE=ON -DTTS_CPP_OPENMP=ON`. Trade-off: 9 `#pragma omp parallel for` loops in `campplus.cpp` run serially in this build mode; CAMPPlus is a small fraction of total synth time.
- `1963f9f`: `Engine::backend_device()` public API + `BackendDevice` enum on `chatterbox::Engine` and `supertonic::Engine`, mirroring `parakeet::Engine::backend_device()`. Routes through `ggml_backend_get_device` + `ggml_backend_dev_type`; works in both `GGML_BACKEND_DL` modes. Required by qvac3's `tts-ggml` addon to mirror ParakeetModel's load-time backend resolution.
- `942686d`: `synthesize_batch` gates `apply_trim_fade` on the actual presence of a voice override. The previous unconditional fade was clipping the leading consonant of the first word ("Hello" → "lo", "El" → "l", "A" → nothing) for the chatterbox-mtl built-in-voice path. Reference-audio / `voice_dir` paths keep the fade.

5. MTL Chatterbox correctness fixes (commits `db87f42`, `0b44674`)

Two issues that surfaced once chatterbox-mtl ran end-to-end on real workloads:

- `db87f42`: `Engine::run_t3` for MTL wraps text tokens with `start_text_token` (255) / `stop_text_token`, matching `chatterbox_cli.cpp`'s tokenisation path and the Python `ChatterboxMultilingualTTS.generate` reference (`mtl_tts.py:288-291`). Without it, the autoregressive decode dropped the first speech tokens, audible as a missing leading syllable. Turbo path is unaffected.
- `0b44674`: port the CLI's existing 3-identical-token early-stop guard into `Engine::run_t3` (gated on `is_mtl`). MTL T3 occasionally emits an end-of-speech silence cadence mid-utterance and then hallucinates ~40 s of trailing low-energy junk until `n_predict=1000`. The CLI was already guarded; the addon path (which doesn't go through the CLI) was hitting the regression on certain language/seed combinations (most reliably reproduced on German with `seed=42`).

6. `parakeet-cpp/` SentencePiece word-start signal (commit `761eca0`)

Adds `bool starts_word` on `parakeet::StreamingSegment`, set true when the segment's first token's piece carries the SentencePiece `▁` (U+2581) word-boundary marker. Streaming consumers can use it to decide whether to insert a space between successive segments without re-parsing whitespace from `seg.text` (the inner detokenizer strips leading whitespace at the session level). Also exposes `bool token_is_word_start(BpeVocab, int32_t)` from `sentencepiece_bpe.h` so other engines that build their own segments (EOU per-utterance, attributed) can stamp the flag the same way. Defaults `starts_word = true` so existing callers are byte-equivalent.

Bundled into this PR rather than its own because the `parakeet-cpp/` consumer in `qvac3/packages/parakeet-ggml` and the `tts-cpp/` consumer in `qvac3/packages/tts-ggml` ship via the same `ggml-speech` vcpkg port version bump; splitting them would force two coordinated registry flips for a single addon release.

Design notes (preempting review questions)
These call out the deliberate choices that look unusual at first glance — flagging here so re-review doesn't re-litigate them.
Why is `TTS_CPP_USE_SYSTEM_GGML=ON` the default in this subtree?

Reviewer guidance (#5, #6): converge the speech stack on a single ggml source-of-truth. The `ggml-speech` vcpkg port (`qvac-ext-ggml/speech`) ships the chatterbox-specific Metal / OpenCL patches pre-applied; carrying a parallel `patches/` tree inside this subtree would mean two sources of truth for the same patches and a coordination tax on every ggml bump. The bundled-ggml dev flow is preserved in the standalone `chatterbox.cpp` repo, which keeps `setup-ggml.sh` + `patches/` + `TTS_CPP_USE_SYSTEM_GGML=OFF` default; this subtree is the integrated artefact, not a parallel dev environment. `-DTTS_CPP_USE_SYSTEM_GGML=OFF` here intentionally hard-errors at configure time with a pointer at both the canonical consumption path (`ggml-speech` vcpkg port) and the standalone repo for users who need bundled ggml.

Why mirror commits from `chatterbox.cpp` instead of squashing into the initial port?

Two reasons:

`ef840d5`) was force-pushed five times during the integration cycle (review-iteration round 1 → round 3); each round-3 fix landed on the standalone repo after this PR's force-push window closed. Mirroring them as separate commits keeps the diff against the standalone source-of-truth reviewable: `chatterbox.cpp` commit hash → tts-cpp commit, one-to-one.

If a squashed history is preferred at merge time, GitHub's "Squash and merge" handles it; the per-commit messages are written to survive that.
Why is the `parakeet-cpp/` change in this PR?

`parakeet-cpp/` and `tts-cpp/` ship through the same `ggml-speech` vcpkg port version bump (single port-version flip in `qvac-registry-vcpkg`). Consumers (`qvac3/packages/parakeet-ggml`, `qvac3/packages/tts-ggml`) bump together. Splitting `parakeet-cpp/`'s `starts_word` change into its own PR would force two coordinated registry flips for one release. The change is small (+15 lines of API surface, no behaviour change for existing callers) and gated by an additive bool field that defaults to `true`.

Why is OpenMP defaulted OFF on Windows non-MinGW?
vcpkg's MSVC toolchain port build links `OpenMP::OpenMP_CXX` into the static archive's transitive interface; consumers, including bare-make's clang-cl CMake, then re-probe `OpenMP_CXX` (or `OpenMP_C`) at `find_package(tts-cpp)` time, and that probe fails on toolchains where CXX-OpenMP isn't auto-detected. Defaulting OFF on the affected toolchain combination keeps the consumer surface portable; the perf cost is bounded to 9 `#pragma omp parallel for` loops in `campplus.cpp` (CAMPPlus runs once per voice-encode at session init, small fraction of total synth time). Override available for users on toolchains with working CXX OpenMP.

Why ~57 K added lines?
The bulk is the standalone `chatterbox.cpp` source dropped under `tts-cpp/` (`ef840d5`). Major contributors:

- `tts-cpp/src/mtl_unicode_tables.inc`: autogenerated NFKD lookup tables (one-time, regenerable via `tts-cpp/scripts/gen-nfkd-table.py`).
- `tts-cpp/src/dr_wav.h`, `tts-cpp/src/npy.h`: vendored single-header libs (verbatim upstream copies; their licences are in `tts-cpp/NOTICE`).
- `tts-cpp/test/`: gtest suite (~22 test files, one per engine stage).
- `tts-cpp/PROGRESS.md`, `tts-cpp/PROGRESS_SUPERTONIC.md`: developer notebooks.

Implementation source under `tts-cpp/src/` (excluding the autogenerated table and vendored headers) is roughly 12 K lines.

Why doesn't this PR bump `qvac-registry-vcpkg/ports/tts-cpp`?

Per the standalone repo's pre-merge convention: while the PR is open, the port-version 0 entry is force-amended in place to point at the latest tip of this branch. The actual port bump (port-version 0 → port-version 1, or new commit hash for version 0) lands in `qvac-registry-vcpkg` after this PR merges to `master`, in a follow-up PR there.

Why are some `chatterbox.cpp` references kept verbatim in `tts-cpp/README.md`?

Five intentional ones survived the §1 rewrite (commit `ae34c58`). Each one points at github.com/gianni-cor/chatterbox.cpp by URL or repo name, not at this directory. They stay because the standalone repo is the development source-of-truth for the engine code and we want a single grep to find it.
Why is `BackendDevice` shaped exactly like `parakeet::Engine::backend_device()`?

Intentional API parallelism so the qvac3 addons can share their backend-resolution code path between `tts-ggml` and `parakeet-ggml`. Both addons read `backend_device()` + `backend_name()` at session init, map through a shared `backendIdFromName()`, and expose the same `RuntimeStats` shape on the JS API. Diverging the C++ API would mean two parallel addon-side wrappers for the same data.

Test plan
- `tts-cpp/` builds cleanly via vcpkg with `-DTTS_CPP_USE_SYSTEM_GGML=ON` against the `ggml-speech` port (`qvac-ext-ggml/speech` branch).
- `-DTTS_CPP_USE_SYSTEM_GGML=OFF` configure attempt errors out at the `patches/`-absent guard with the documented message pointing at the standalone repo and the vcpkg port.
- `cmake -S tts-cpp -B build` (no flags) errors at `find_package(ggml CONFIG REQUIRED)` with the reviewer-asked message pointing at `ggml-speech`.
- `tts-cpp/test/` gtest suite passes on Linux (CPU + Vulkan), macOS (CPU + Metal), Windows (CPU; Metal/Vulkan N/A).
- `tts-cpp/README.md` performance table (Vulkan RTX 5090, Metal M3 Ultra, CPU Ryzen 9 9950X, CPU M3 Ultra NEON).
- `--cfm-steps 7` and default-cfm-steps rows on Metal M3 Ultra and M4.
- `0b44674`): no trailing-silence-then-hallucination tail; output ends cleanly at sentence end.
- `db87f42`): "Hello from the multilingual" no longer renders as "lo from the multilingual" on the addon path.
- `apply_trim_fade` gate from `942686d`): "El", "A", "Hello" first words preserved on the chatterbox-mtl built-in path; reference-audio path unchanged.
- `find_package(tts-cpp CONFIG REQUIRED)` from a downstream CMake project resolves the static archive plus the OpenMP transitive dep with the `COMPONENTS CXX` scope (`04b87ea`), on Linux, macOS, and Windows MSVC + clang-cl.
- `Engine::backend_device()` on Vulkan/CUDA/Metal returns `BackendDevice::GPU`; on CPU returns `BackendDevice::CPU`. Matches `parakeet::Engine::backend_device()` shape.
- `parakeet-cpp/` `starts_word` signal: streaming consumer rebuild of "see if" stays as "see if"; chunk-boundary split "pun" + "ctuation" rejoins as "punctuation"; default-true callers unchanged.
- `make`/`cmake` builds and tests pass identically to `master`).

Related
- `tts-cpp/` source-of-truth: github.com/gianni-cor/chatterbox.cpp.
- `parakeet-cpp/` source-of-truth: standalone `parakeet.cpp` repo.
- `qvac-ext-ggml/speech` branch via the `ggml-speech` vcpkg port.
- `qvac3/packages/tts-ggml`, `qvac3/packages/parakeet-ggml`.
- `qvac-registry-vcpkg/ports/tts-cpp` port-version 0 in place; the actual port bump lands in `qvac-registry-vcpkg` in a follow-up after this PR merges to `master`.
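
Consumer-side illustration of the `starts_word` signal from section 6: a minimal sketch of how a streaming consumer might rejoin segments, matching the "see if" and "pun" + "ctuation" cases in the test plan. `Segment` and `join_segments` are hypothetical names invented for this example. `Segment` only mirrors the `text` and `starts_word` fields described in this PR and is not the real `parakeet::StreamingSegment` definition.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for parakeet::StreamingSegment. Only the two fields
// this sketch relies on are mirrored; the real struct is assumed to carry more.
struct Segment {
    std::string text;   // detokenized text, leading whitespace already stripped
    bool starts_word;   // true when the first token carries the U+2581 marker
};

// Concatenate streamed segments, inserting one space before every segment
// that begins a new word, except when nothing has been emitted yet.
std::string join_segments(const std::vector<Segment>& segments) {
    std::string out;
    for (const Segment& seg : segments) {
        if (seg.text.empty()) {
            continue;  // nothing to append for an empty segment
        }
        if (!out.empty() && seg.starts_word) {
            out += ' ';
        }
        out += seg.text;
    }
    return out;
}
```

With this rule, two word-initial segments "see" and "if" join with a space, while a chunk-boundary split whose second half has `starts_word == false` is glued back together without one. A caller that never sets the flag sees every segment treated as word-initial, in line with the default of `true` stated in section 6.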