Add tts-cpp/ subtree (Chatterbox Turbo + Multilingual + Supertonic TTS) + integration fixes #14

Merged
GustavoA1604 merged 16 commits into tetherto:master from GustavoA1604:tts-cpp on May 7, 2026

GustavoA1604 commented on May 6, 2026

Summary

Adds the tts-cpp/ in-tree subtree alongside the existing parakeet-cpp/ subtree, completing the QVAC speech stack inside this whisper.cpp fork. tts-cpp/ is a port of gianni-cor/chatterbox.cpp — Resemble AI's Chatterbox TTS (Turbo, English-only; Multilingual, 18 tier-1 languages) plus Supertonic TTS — running on ggml with CPU / Metal / Vulkan / CUDA backends, no Python/PyTorch dependency.

The PR also wires the subtree into the QVAC speech-stack consumption pattern (ggml-speech vcpkg port from qvac-ext-ggml/speech), folds in the round-3 review fixes that landed in the standalone repo after the initial port, and adds one parakeet-cpp streaming-API change required by downstream consumers.

  • 16 commits
  • 103 files changed, +57 184 lines (subtree drop is the bulk; the integration / review commits are small)
  • Branch: tts-cpp → master

What's in the PR

1. tts-cpp/ subtree drop (commit ef840d5)

Initial squashed import from the standalone chatterbox.cpp source-of-truth. Layout follows parakeet-cpp/:

  • tts-cpp/include/tts-cpp/ — public C++ API (chatterbox/engine.h, supertonic/engine.h, backend.h, log.h, tts-cpp.h, export.h).
  • tts-cpp/src/ — implementation (Chatterbox Turbo + MTL T3, Supertonic flow + vocoder, S3Gen, S3Tokenizer, CAMPPlus voice encoder, mel2wav HiFT, MTL tokenizer, GPT-2 BPE, log).
  • tts-cpp/test/ — gtest suite mirroring each engine stage (T3, S3Gen, vocoder, CAMPPlus, MTL tokenizer, streaming, etc.).
  • tts-cpp/scripts/ — Python conversion scripts (HF safetensors → GGUF for T3-Turbo / T3-MTL / S3Gen / Supertonic2; reference-trace dumps for diffing against PyTorch; voice extraction).
  • tts-cpp/CMakeLists.txt + tts-cpp/cmake/tts-cppConfig.cmake.in — install/export, OpenMP detection, find_package(tts-cpp CONFIG) consumer surface.
  • tts-cpp/README.md + tts-cpp/PROGRESS.md + tts-cpp/PROGRESS_SUPERTONIC.md — developer docs.

Performance summary (full table in tts-cpp/README.md):

  • Turbo, Vulkan RTX 5090, Q4_0: 463 ms wall, 13.8× faster than ONNX Runtime CPU Q4.
  • Multilingual, Metal M3 Ultra, Q4_0 + --cfm-steps 7: 1.05 s wall, 48.4× faster than ONNX Runtime CPU Q4.

2. Top-level README.md pointer (commit a2f2dd6)

Adds a small "QVAC speech-stack ports" section between the upstream whisper.cpp intro media and the Quick start, pointing at both tts-cpp/ and parakeet-cpp/ as in-tree subtrees with their own READMEs / build flows / public C++ APIs. The upstream whisper.cpp build documented below that section is unaffected.

3. Integration deltas vs the standalone repo (commits fa0d490, ae34c58, 8ba10a6, e673182)

The standalone chatterbox.cpp ships with scripts/setup-ggml.sh + patches/ for bundled-ggml dev builds. Per reviewer guidance, the in-tree subtree is converged on a single ggml source-of-truth — the QVAC speech-stack ggml-speech vcpkg port (which ships the patches pre-applied) — instead of carrying a parallel patches tree:

  • fa0d490 (review #5+#6): default TTS_CPP_USE_SYSTEM_GGML=ON in this subtree; bundled-ggml branch hard-errors with a pointer at the canonical consumption path; deletes tts-cpp/scripts/setup-ggml.sh (it referenced patches/ that no longer exist; it would have errored out anyway).
  • ae34c58 (review #26): rewrite tts-cpp/README.md §1 from "clone + setup-ggml.sh + patches" to "build via the QVAC speech stack"; flip the TTS_CPP_USE_SYSTEM_GGML row default in the cmake-options table; rewrite Consumer integration from the standalone perspective to the wrapper-port perspective; relabel benchmark and code-tree references from chatterbox.cpp/ to tts-cpp/ where they refer to this directory (kept where they refer to the upstream project itself).
  • 8ba10a6 (review N8): drop the now-unreachable TTS_CPP_GGML_LIB_PREFIX block (the find_package(ggml) path resolves libspeech-ggml-* natively from the vcpkg port, so the local-rename helper is dead code in this subtree). The README row is updated with a strikethrough + "n/a in this subtree" note pointing at the standalone repo for the OFF path.
  • e673182 (review N10): scrub two stale patches/ references from tts-cpp/README.md (§3.24 Metal-optimisation explanation and the repository-layout tree); rewrite to attribute the patch to the ggml-speech vcpkg port. Five other patches/ mentions stay on purpose — they're cross-references to the standalone repo or describe the rejection rationale.

4. Round-3 review fixes mirrored from standalone (commits 4b5d2d7, 28ef67d, e8f6065, 04b87ea, 64abb81, 1963f9f, 942686d)

Re-syncs the subtree with the standalone source-of-truth for fixes that landed there after the initial port. Each commit is a 1:1 git apply --directory=tts-cpp/ of the corresponding upstream diff:

  • 4b5d2d7 (review N1–N7): Supertonic alive-registry guards thread_local cache teardown vs freed backend (N1); Engine BPE try/catch + dead-state cleanup (N3, N6, N7); log docstring softening (N2); s3gen cancel checkpoint between STFT and HiFT + Engine::cancel() doc (N4); s3gen_preload/unload refcount semantics on the public header (N5).
  • 28ef67d: drop internal ggml-quants.h include from supertonic_gguf.cpp; use ggml_get_type_traits()->to_float instead. Required because ggml-speech doesn't install internal ggml headers. Bit-equivalent runtime.
  • e8f6065: tts-cppConfig.cmake.in re-imports OpenMP via find_dependency before including the targets file. Without this, consumers of the static archive failed at target-property time with OpenMP::OpenMP_CXX … target was not found.
  • 04b87ea: scope the find_dependency(OpenMP) to COMPONENTS CXX. Bare find_dependency(OpenMP) was probing OpenMP_C in consumers, which fails on bare-make's clang-cl-style toolchain even when CXX-side OpenMP is fine.
  • 64abb81: TTS_CPP_OPENMP option, defaults OFF on Windows non-MinGW (where vcpkg's MSVC toolchain links OpenMP transitively but consumers can't re-probe it). Override via -DTTS_CPP_OPENMP_USER_OVERRIDE=ON -DTTS_CPP_OPENMP=ON. Trade-off: 9 #pragma omp parallel for loops in campplus.cpp run serially in this build mode; CAMPPlus is a small fraction of total synth time.
  • 1963f9f: Engine::backend_device() public API + BackendDevice enum on chatterbox::Engine and supertonic::Engine, mirroring parakeet::Engine::backend_device(). Routes through ggml_backend_get_device + ggml_backend_dev_type; works in both GGML_BACKEND_DL modes. Required by qvac3's tts-ggml addon to mirror ParakeetModel's load-time backend resolution.
  • 942686d: synthesize_batch gates apply_trim_fade on the actual presence of a voice override. The previous unconditional fade was clipping the leading consonant of the first word ("Hello" → "lo", "El" → "l", "A" → nothing) for the chatterbox-mtl built-in-voice path. Reference-audio / voice_dir paths keep the fade.
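
The gating described for 942686d can be sketched roughly as follows. This is a minimal illustration, not the code in the subtree: `fade_in`, `finalize_audio`, and `has_voice_override` are hypothetical names, and the linear fade-in is an assumption (the real apply_trim_fade may also trim and may use a different curve).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for apply_trim_fade: a linear fade-in over the first
// `fade_samples` samples. Assumption only; the real helper may differ.
inline void fade_in(std::vector<float>& pcm, std::size_t fade_samples) {
    const std::size_t n = fade_samples < pcm.size() ? fade_samples : pcm.size();
    for (std::size_t i = 0; i < n; ++i) {
        pcm[i] *= static_cast<float>(i) / static_cast<float>(n);
    }
}

// Apply the fade only when a voice override (reference audio / voice_dir)
// is actually present; the built-in-voice path is left untouched so the
// leading consonant of the first word survives.
inline void finalize_audio(std::vector<float>& pcm,
                           bool has_voice_override,
                           std::size_t fade_samples) {
    if (has_voice_override) {
        fade_in(pcm, fade_samples);
    }
}
```

With `has_voice_override == false` the buffer is returned unchanged, which is the behaviour the built-in-voice path relies on.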

5. MTL Chatterbox correctness fixes (commits db87f42, 0b44674)

Two issues that surfaced once chatterbox-mtl ran end-to-end on real workloads:

  • db87f42: Engine::run_t3 for MTL wraps text tokens with start_text_token (255) / stop_text_token, matching chatterbox_cli.cpp's tokenisation path and the Python ChatterboxMultilingualTTS.generate reference (mtl_tts.py:288-291). Without it, the autoregressive decode dropped the first speech tokens, audible as a missing leading syllable. Turbo path is unaffected.
  • 0b44674: port the CLI's existing 3-identical-token early-stop guard into Engine::run_t3 (gated on is_mtl). MTL T3 occasionally emits an end-of-speech silence cadence mid-utterance and then hallucinates ~40 s of trailing low-energy junk until n_predict=1000. The CLI was already guarded; the addon path (which doesn't go through the CLI) was hitting the regression on certain language/seed combinations (most reliably reproduced on German with seed=42).
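
Both fixes are small enough to sketch in isolation. The helpers below are hypothetical (`wrap_text_tokens` and `ends_with_repeated_token` are not names from the subtree), and the stop-token id is left as a parameter because only the start id (255) is stated above. The sketch assumes the guard fires when the last three generated tokens are identical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Wrap text tokens as [start, tokens..., stop], as described for db87f42.
// The token ids are supplied by the caller.
inline std::vector<int32_t> wrap_text_tokens(const std::vector<int32_t>& tokens,
                                             int32_t start_token,
                                             int32_t stop_token) {
    std::vector<int32_t> out;
    out.reserve(tokens.size() + 2);
    out.push_back(start_token);
    out.insert(out.end(), tokens.begin(), tokens.end());
    out.push_back(stop_token);
    return out;
}

// Early-stop check in the spirit of 0b44674: true when the last `run`
// generated tokens are all identical.
inline bool ends_with_repeated_token(const std::vector<int32_t>& generated,
                                     std::size_t run = 3) {
    if (run == 0 || generated.size() < run) {
        return false;
    }
    const int32_t last = generated.back();
    for (std::size_t i = 1; i < run; ++i) {
        if (generated[generated.size() - 1 - i] != last) {
            return false;
        }
    }
    return true;
}
```

A decode loop would call `ends_with_repeated_token` after appending each speech token and break out when it returns true, only on the multilingual path.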

6. parakeet-cpp/ SentencePiece word-start signal (commit 761eca0)

Adds bool starts_word on parakeet::StreamingSegment, set true when the segment's first token's piece carries the SentencePiece (U+2581) word-boundary marker. Streaming consumers can use it to decide whether to insert a space between successive segments without re-parsing whitespace from seg.text (the inner detokenizer strips leading whitespace at the session level). Also exposes bool token_is_word_start(BpeVocab, int32_t) from sentencepiece_bpe.h so other engines that build their own segments (EOU per-utterance, attributed) can stamp the flag the same way. Defaults starts_word = true so existing callers are byte-equivalent.
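
A consumer-side sketch of how the flag might be used. `Segment` is a simplified stand-in for parakeet::StreamingSegment (only the two fields mentioned above), and `piece_is_word_start` / `join_segments` are hypothetical helpers, not the functions exposed by sentencepiece_bpe.h. The only fact relied on is that U+2581 encodes to the UTF-8 bytes E2 96 81.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Simplified stand-in for parakeet::StreamingSegment (illustration only).
struct Segment {
    std::string text;
    bool starts_word = true;  // default mirrors the description above
};

// True when a UTF-8 token piece begins with the SentencePiece word-boundary
// marker U+2581, i.e. the byte sequence E2 96 81.
inline bool piece_is_word_start(const std::string& piece) {
    return piece.size() >= 3 &&
           static_cast<unsigned char>(piece[0]) == 0xE2 &&
           static_cast<unsigned char>(piece[1]) == 0x96 &&
           static_cast<unsigned char>(piece[2]) == 0x81;
}

// Concatenate streamed segments, inserting a space only before segments
// that start a new word.
inline std::string join_segments(const std::vector<Segment>& segs) {
    std::string out;
    for (const Segment& s : segs) {
        if (!out.empty() && s.starts_word) {
            out += ' ';
        }
        out += s.text;
    }
    return out;
}
```

With this joiner, "see" + "if" (both word starts) yields "see if", while a chunk-boundary split "pun" + "ctuation" (second segment not a word start) rejoins as "punctuation".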

Bundled into this PR rather than its own because the parakeet-cpp/ consumer in qvac3/packages/parakeet-ggml and the tts-cpp/ consumer in qvac3/packages/tts-ggml ship via the same ggml-speech vcpkg port version bump; splitting them would force two coordinated registry flips for a single addon release.

Design notes (preempting review questions)

These call out the deliberate choices that look unusual at first glance — flagging here so re-review doesn't re-litigate them.

Why is TTS_CPP_USE_SYSTEM_GGML=ON the default in this subtree?

Reviewer guidance (#5, #6): converge the speech stack on a single ggml source-of-truth. The ggml-speech vcpkg port (qvac-ext-ggml/speech) ships the chatterbox-specific Metal / OpenCL patches pre-applied — carrying a parallel patches/ tree inside this subtree would mean two sources of truth for the same patches and a coordination tax on every ggml bump. The bundled-ggml dev flow is preserved in the standalone chatterbox.cpp repo, which keeps setup-ggml.sh + patches/ + TTS_CPP_USE_SYSTEM_GGML=OFF default; this subtree is the integrated artefact, not a parallel dev environment.

-DTTS_CPP_USE_SYSTEM_GGML=OFF here intentionally hard-errors at configure time with a pointer at both the canonical consumption path (ggml-speech vcpkg port) and the standalone repo for users who need bundled ggml.

Why mirror commits from chatterbox.cpp instead of squashing into the initial port?

Two reasons:

  1. The initial port (ef840d5) was force-pushed five times during the integration cycle (review-iteration round 1 → round 3); each round-3 fix landed on the standalone repo after this PR's force-push window closed. Mirroring them as separate commits keeps the diff against the standalone source-of-truth reviewable: chatterbox.cpp commit hash → tts-cpp commit, one-to-one.
  2. Each mirror commit captures the upstream commit hash in its message, so a future bisect or audit can trace back to the standalone PR / discussion.

If a squashed history is preferred at merge time, GitHub's "Squash and merge" handles it; the per-commit messages are written to survive that.

Why is the parakeet-cpp/ change in this PR?

parakeet-cpp/ and tts-cpp/ ship through the same ggml-speech vcpkg port version bump (single port-version flip in qvac-registry-vcpkg). Consumers (qvac3/packages/parakeet-ggml, qvac3/packages/tts-ggml) bump together. Splitting parakeet-cpp/'s starts_word change into its own PR would force two coordinated registry flips for one release. The change is small (+15 lines of API surface, no behaviour change for existing callers) and gated by an additive bool field that defaults to true.

Why is OpenMP defaulted OFF on Windows non-MinGW?

vcpkg's MSVC toolchain port build links OpenMP::OpenMP_CXX into the static archive's transitive interface; consumers — including bare-make's clang-cl CMake — then re-probe OpenMP_CXX (or OpenMP_C) at find_package(tts-cpp) time, and that probe fails on toolchains where CXX-OpenMP isn't auto-detected. Defaulting OFF on the affected toolchain combination keeps the consumer surface portable; the perf cost is bounded to 9 #pragma omp parallel for loops in campplus.cpp (CAMPPlus runs once per voice-encode at session init, small fraction of total synth time). Override available for users on toolchains with working CXX OpenMP.
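
To illustrate why the OFF default changes speed but not output, here is a generic sketch of an OpenMP-annotated loop. It is not taken from campplus.cpp; `scale_frames` and `built_with_openmp` are hypothetical names. When the compiler is not invoked with OpenMP support, `_OPENMP` is undefined, the pragma is skipped, and the loop runs serially with the same result.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Scale every sample by `gain`. Iterations are independent, so the loop can
// be split across threads when OpenMP is enabled and run serially otherwise.
inline void scale_frames(std::vector<float>& frames, float gain) {
    const long n = static_cast<long>(frames.size());
#ifdef _OPENMP
#pragma omp parallel for
#endif
    for (long i = 0; i < n; ++i) {
        frames[static_cast<std::size_t>(i)] *= gain;
    }
}

// Reports whether this translation unit was compiled with OpenMP enabled.
inline bool built_with_openmp() {
#ifdef _OPENMP
    return true;
#else
    return false;
#endif
}
```

Because each iteration writes a distinct element, the serial and parallel builds produce identical buffers; only wall time differs.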

Why ~57 K added lines?

The bulk is the standalone chatterbox.cpp source dropped under tts-cpp/ (ef840d5). Major contributors:

  • tts-cpp/src/mtl_unicode_tables.inc — autogenerated NFKD lookup tables (one-time, regenerable via tts-cpp/scripts/gen-nfkd-table.py).
  • tts-cpp/src/dr_wav.h, tts-cpp/src/npy.h — vendored single-header libs (verbatim upstream copies; their licences are in tts-cpp/NOTICE).
  • tts-cpp/test/ — gtest suite (~22 test files, one per engine stage).
  • tts-cpp/PROGRESS.md, tts-cpp/PROGRESS_SUPERTONIC.md — developer notebooks.

Implementation source under tts-cpp/src/ (excluding the autogenerated table and vendored headers) is roughly 12 K lines.

Why doesn't this PR bump qvac-registry-vcpkg/ports/tts-cpp?

Per the standalone repo's pre-merge convention: while the PR is open, the port-version 0 entry is force-amended in place to point at the latest tip of this branch. The actual port bump (port-version 0 → port-version 1, or new commit hash for version 0) lands in qvac-registry-vcpkg after this PR merges to master, in a follow-up PR there.

Why are some chatterbox.cpp references kept verbatim in tts-cpp/README.md?

Five intentional ones survived the §1 rewrite (commit ae34c58):

  1. The title-card upstream URL.
  2. The §1 "use the standalone repo for bundled-ggml dev builds" pointer.
  3. The cmake-options note about the OFF default (lives upstream).
  4. The "How TTS_CPP_USE_SYSTEM_GGML=ON resolves ggml" prose (cross-references the standalone build flow).
  5. The repo-layout caveat naming the source-of-truth.

Each one points at github.com/gianni-cor/chatterbox.cpp by URL or repo name, not at this directory. They stay because the standalone repo is the development source-of-truth for the engine code and we want a single grep to find it.

Why is BackendDevice shaped exactly like parakeet::Engine::backend_device()?

Intentional API parallelism so the qvac3 addons can share their backend-resolution code path between tts-ggml and parakeet-ggml. Both addons read backend_device() + backend_name() at session init, map through a shared backendIdFromName(), and expose the same RuntimeStats shape on the JS API. Diverging the C++ API would mean two parallel addon-side wrappers for the same data.
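
The benefit of the shared shape can be sketched with a template that works for any engine exposing `backend_device()`. Everything here is illustrative: the two fake engines and `describe_backend` are hypothetical, and the enum is reduced to the CPU and GPU values named in the test plan (the real BackendDevice may carry more).

```cpp
#include <cassert>
#include <string>

// Reduced, hypothetical version of the BackendDevice enum.
enum class BackendDevice { CPU, GPU };

// Toy engines standing in for a TTS engine and an ASR engine that expose
// the same accessor shape.
struct FakeTtsEngine {
    BackendDevice backend_device() const { return BackendDevice::GPU; }
};
struct FakeAsrEngine {
    BackendDevice backend_device() const { return BackendDevice::CPU; }
};

// One code path for every engine type that provides backend_device().
template <typename Engine>
std::string describe_backend(const Engine& engine) {
    switch (engine.backend_device()) {
        case BackendDevice::CPU: return "cpu";
        case BackendDevice::GPU: return "gpu";
    }
    return "unknown";
}
```

If the two engines diverged in accessor name or enum type, the wrapper would need one specialisation per engine, which is the duplication the parallel API avoids.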

Test plan

  • tts-cpp/ builds cleanly via vcpkg with -DTTS_CPP_USE_SYSTEM_GGML=ON against the ggml-speech port (qvac-ext-ggml/speech branch).
  • -DTTS_CPP_USE_SYSTEM_GGML=OFF configure attempt errors out at the patches/-absent guard with the documented message pointing at the standalone repo and the vcpkg port.
  • cmake -S tts-cpp -B build (no flags) errors at find_package(ggml CONFIG REQUIRED) with the reviewer-asked message pointing at ggml-speech.
  • tts-cpp/test/ gtest suite passes on Linux (CPU + Vulkan), macOS (CPU + Metal), Windows (CPU; Metal/Vulkan N/A).
  • Turbo end-to-end synth on a short English sentence with reference-voice cloning matches the tts-cpp/README.md performance table (Vulkan RTX 5090, Metal M3 Ultra, CPU Ryzen 9 9950X, CPU M3 Ultra NEON).
  • Multilingual end-to-end synth on the documented Spanish prompt + built-in voice, seed 42, matches the --cfm-steps 7 and default-cfm-steps rows on Metal M3 Ultra and M4.
  • German seed-42 reproduction (the worst case for the 3-identical-token bug from 0b44674): no trailing-silence-then-hallucination tail; output ends cleanly at sentence end.
  • MTL leading-syllable regression (db87f42): "Hello from the multilingual" no longer renders as "lo from the multilingual" on the addon path.
  • Built-in voice (apply_trim_fade gate from 942686d): "El", "A", "Hello" first words preserved on the chatterbox-mtl built-in path; reference-audio path unchanged.
  • find_package(tts-cpp CONFIG REQUIRED) from a downstream CMake project resolves the static archive plus the OpenMP transitive dep with the COMPONENTS CXX scope (04b87ea), on Linux, macOS, and Windows MSVC + clang-cl.
  • Engine::backend_device() on Vulkan/CUDA/Metal returns BackendDevice::GPU; on CPU returns BackendDevice::CPU. Matches parakeet::Engine::backend_device() shape.
  • parakeet-cpp/ starts_word signal: streaming consumer rebuild of "see if" stays as "see if"; chunk-boundary split "pun" + "ctuation" rejoins as "punctuation"; default-true callers unchanged.
  • Upstream whisper.cpp build below the new top-level README section is unaffected (existing make / cmake builds and tests pass identically to master).

Related

  • tts-cpp/ source-of-truth: github.com/gianni-cor/chatterbox.cpp.
  • parakeet-cpp/ source-of-truth: standalone parakeet.cpp repo.
  • ggml consumption: qvac-ext-ggml/speech branch via the ggml-speech vcpkg port.
  • Downstream consumers: qvac3/packages/tts-ggml, qvac3/packages/parakeet-ggml.
  • Pre-merge port-version policy: amend qvac-registry-vcpkg/ports/tts-cpp port-version 0 in place; the actual port bump lands in qvac-registry-vcpkg in a follow-up after this PR merges to master.

GustavoA1604 requested review from a team as code owners on May 6, 2026 16:34
GustavoA1604 force-pushed the tts-cpp branch 5 times, most recently from fe33e3b to 99188e5, on May 6, 2026 23:37
GustavoA1604 and others added 15 commits May 6, 2026 20:57
…l-org#6)

The standalone setup-ggml.sh + patches/ tooling was dropped from
qvac-ext-lib-whisper.cpp/tts-cpp/ in the integration commit, but the
CMakeLists.txt still:
  * defaulted TTS_CPP_USE_SYSTEM_GGML=OFF, and
  * unconditionally compile-defined GGML_BACKEND_DL_PROJECT_PREFIX="speech-"
    on the bundled ggml target.
That combination quietly broke standalone bundled-ggml builds: the
filename-prefix patch was no longer applied, so libspeech-ggml-*.so
files existed on disk but ggml's runtime loader still searched for
libggml-*.so under GGML_BACKEND_DL=ON.  Vulkan / OpenCL / CUDA
backends silently failed to load on Android.

Fix per reviewer guidance: converge the speech stack on a single ggml
source-of-truth.  Standalone-bundled-ggml is no longer a supported
build mode out of this in-tree subtree; the canonical path is
`-DTTS_CPP_USE_SYSTEM_GGML=ON` against the QVAC speech-stack
`ggml-speech` vcpkg port (qvac-ext-ggml/speech branch), which ships
the patches pre-applied.

Edits:

- TTS_CPP_USE_SYSTEM_GGML default flipped from OFF to ON in this
  tree.  Docstring spells out the rationale + points users at the
  standalone github.com/gianni-cor/chatterbox.cpp repo if they need
  a bundled-ggml dev build with patches/ present.

- The bundled-ggml branch of `if (NOT TARGET ggml)` now refuses to
  configure when patches/ is absent: a FATAL_ERROR points at the
  right consumption path (vcpkg ggml-speech) and the standalone
  fallback.  Doesn't break in-tree-with-patches builds (parakeet-cpp
  in this same repo still ships patches/, so its bundled path is
  unaffected by this guard inside tts-cpp).

- Verified locally: `cmake -S tts-cpp -B build` (no flags) errors
  out at find_package(ggml CONFIG REQUIRED) with our new message
  pointing at the ggml-speech port; `cmake -S tts-cpp -B build
  -DTTS_CPP_USE_SYSTEM_GGML=OFF` errors out at the patches/ guard
  with the no-patches message.

- tts-cpp/scripts/setup-ggml.sh deleted: it referenced patches/
  that no longer exist; running it would have errored out anyway.
  The standalone repo keeps its own setup-ggml.sh; only the in-tree
  subtree drops it.

The standalone chatterbox.cpp repo (the one tts-cpp/ was copied
from) keeps TTS_CPP_USE_SYSTEM_GGML=OFF default + the patches/
folder + scripts/setup-ggml.sh.  This commit is therefore an
integration-time delta against that source, not a change to the
standalone build flow.

Co-authored-by: Cursor <cursoragent@cursor.com>
)

The README was a verbatim copy of the standalone chatterbox.cpp repo,
which makes it read as 'I cloned the wrong repo' to anyone landing
on tts-cpp/ inside qvac-ext-lib-whisper.cpp.  Per the reviewer's
two-line ask: rewrite section 1 + global s/chatterbox.cpp/tts-cpp
where it's a directory or repo-name reference (kept where it points
at the upstream chatterbox.cpp project itself).

Edits:

- Title changes from `# chatterbox.cpp` to `# tts-cpp` plus a
  blockquote note up top: this is the in-tree subtree of
  github.com/gianni-cor/chatterbox.cpp; the integration drops
  setup-ggml.sh + patches/, ggml comes through the qvac-ext-ggml
  speech-branch vcpkg port, see section 1 for the build flow.

- Section 1 (was '## 1. Clone and build', the standalone clone +
  setup-ggml.sh + patches/ flow) replaced with '## 1. Build from
  the qvac speech stack':
    * one find_package(tts-cpp CONFIG REQUIRED) cmake snippet for
      downstream consumption;
    * one cmake -S tts-cpp -B build -DCMAKE_TOOLCHAIN_FILE=vcpkg.cmake
      flow for in-tree dev;
    * pointer at the standalone github.com/gianni-cor/chatterbox.cpp
      repo for anyone needing a bundled-ggml dev build.
  Drops the entire setup-ggml.sh paragraph + GPU-acceleration
  paragraph that referenced patches/.

- 'Useful CMake options' table: TTS_CPP_USE_SYSTEM_GGML row default
  flipped from OFF to 'ON (this in-tree subtree)', cell explains
  that flipping OFF is rejected here (no patches/) and points at the
  standalone repo for the OFF default.

- 'Alternative: consume ggml from vcpkg' subsection collapsed to
  'How TTS_CPP_USE_SYSTEM_GGML=ON resolves ggml' since it's now
  the canonical path, not the alternative.  Drops the now-stale
  'preserves the standalone flow above untouched, opt-in escape
  hatch for package-manager-driven builds' paragraph.

- 'Consumer integration' subsection rewritten from the wrapper-port
  perspective ('this in-tree subtree IS the wrapper port') instead
  of the standalone perspective ('downstream projects consume
  through the wrapper port').

- Benchmark tables (Mac M3 Ultra + Linux RTX 5090): four
  '`chatterbox.cpp` Q4_0' implementation-name cells become
  '`tts-cpp` Q4_0'; the '`chatterbox.cpp` (Metal) is...' /
  '`chatterbox.cpp` (Vulkan) is...' captions follow.

- Repository layout tree: root dir name `chatterbox.cpp/` becomes
  `tts-cpp/` with a one-line caveat naming the standalone source-
  of-truth.  Drops the `ggml/` entry (no bundled ggml in this
  subtree by default), drops the `setup-ggml.sh` line under
  scripts/ (the file no longer exists - removed in the previous
  commit), updates the chatterbox_cli.cpp comment from
  'tts-cli + chatterbox binaries' to 'tts-cli binary' since the
  back-compat chatterbox alias is dropped in the standalone source
  too.

- One '# Build chatterbox.cpp, then:' bash comment in the
  reproduction snippet becomes '# Build tts-cpp, then:'.

- Lower 'tts-cli / chatterbox binaries' API-overview phrasing
  becomes 'tts-cli binary' to match the actual built artefact.

Five `chatterbox.cpp` references stay on purpose: the title-card
URL, the section-1 'use the standalone repo' pointer, the
useful-cmake-options note about the OFF default, the
how-system-ggml-resolves prose, and the repo-layout caveat.  Each
one points at the upstream project github.com/gianni-cor/chatterbox.cpp
by URL/name, not at this directory.

No code changes; README.md only.

Co-authored-by: Cursor <cursoragent@cursor.com>
…gml-org#27)

Adds a small 'QVAC speech-stack ports' section between the upstream
whisper.cpp intro media and the 'Quick start' section, pointing at
the two in-tree subtrees this fork carries:

- tts-cpp/ - Chatterbox (Turbo + Multilingual) + Supertonic TTS,
  in-tree subtree of github.com/gianni-cor/chatterbox.cpp.
- parakeet-cpp/ - NVIDIA Parakeet FastConformer ASR + Sortformer
  diarization, in-tree subtree of the parakeet.cpp standalone repo.

Both consume ggml through the `ggml-speech` vcpkg port (the
qvac-ext-ggml/speech branch).  Each subtree has its own README, build
flow, and public C++ API; the upstream whisper.cpp build below the new
section is unaffected.

Closes review #27 ('one-line pointer to tts-cpp/ from the top-level
qvac-ext-lib-whisper.cpp/README.md').  The reviewer specifically asked
for tts-cpp; included parakeet-cpp at the same time so a future
'fix the un-fixed parakeet-cpp version of this bullet' commit doesn't
need to revisit the same paragraph.

Co-authored-by: Cursor <cursoragent@cursor.com>
Re-syncs the in-tree subtree with the standalone chatterbox.cpp
source-of-truth after seven round-3 review items landed there.  The
diff was generated from chatterbox.cpp commits 2d3632b..0a5ad2d and
applied with `git apply --directory=tts-cpp/`; no path-level
conflicts because the subtree was last copied from the same source.

Mirrored commits (chatterbox.cpp side):

- ef0eb36  supertonic: alive-registry guards thread_local cache
           teardown vs freed backend (N1)
- fcbff16  engine: Turbo BPE try/catch + drop dead cached_text_lc +
           clarify view-vs-copy log (N3 + N6 + N7)
- 055ce84  log: drop dead g_sink_* state, soften thread-safety
           docstring (N2)
- 75fbd22  s3gen: cancel checkpoint between STFT and HiFT + tighten
           Engine::cancel() doc (N4)
- 0a5ad2d  s3gen: document s3gen_preload/unload refcount semantics
           on the public header (N5)

Files touched (11):

  include/tts-cpp/chatterbox/engine.h         (N4 docstring)
  include/tts-cpp/chatterbox/s3gen_pipeline.h (N5 docstring)
  include/tts-cpp/log.h                       (N2 docstring)
  src/chatterbox_engine.cpp                   (N3 try/catch)
  src/chatterbox_tts.cpp                      (N4 stft cancel + N7 log)
  src/log.cpp                                 (N2 dead-state drop)
  src/supertonic_gguf.cpp                     (N1 alive-registry)
  src/supertonic_internal.h                   (N1 helper API)
  src/supertonic_text_encoder.cpp             (N1 free-cache gate)
  src/supertonic_vector_estimator.cpp         (N1 + N6)
  src/supertonic_vocoder.cpp                  (N1 free-cache gate)

The two integration-only review items (N8 unreachable LIB_PREFIX
block, N10 stale patches/ refs in README) land in separate commits
on this branch since they don't correspond to chatterbox.cpp
changes.  N9 (per-call seed override) and N11 (richer backend_name)
were dropped per user direction.

Build verification was done on chatterbox.cpp's standalone build (the
source-of-truth); not re-built here because TTS_CPP_USE_SYSTEM_GGML
defaults ON in this in-tree subtree and requires the ggml-speech
vcpkg port installed to configure.

Co-authored-by: Cursor <cursoragent@cursor.com>
After commit fa0d490 (review #5+#6) made bundled-add_subdirectory(ggml)
hard-error in this in-tree subtree when patches/ is absent, the
TTS_CPP_GGML_LIB_PREFIX block became dead code:

  if (TTS_CPP_GGML_LIB_PREFIX AND NOT TTS_CPP_USE_SYSTEM_GGML)

NOT TTS_CPP_USE_SYSTEM_GGML can never reach this `if` here -
configure has already FATAL_ERROR'd at the patches/-absent guard.
The option, the helper function, the foreach loop, the
GGML_BACKEND_DL_PROJECT_PREFIX define, and the STATUS message were
all unreachable.  The next maintainer flipping
-DTTS_CPP_GGML_LIB_PREFIX=OFF to disable prefixing would have been
silently confused when nothing changed.

Edits:

tts-cpp/CMakeLists.txt:
  - The option() declaration at line 22 removed.  Replaced with a
    one-paragraph cross-reference to the standalone chatterbox.cpp
    repo for the locally-rename flow + the rationale (ggml-speech
    vcpkg port emits the libspeech-ggml-* filenames itself).
  - The 41-line block at lines 131-176 (tts_cpp_apply_ggml_prefix
    function + foreach + target_compile_definitions +
    STATUS message) replaced with a 9-line note telling future
    readers where the standalone counterpart lives.

tts-cpp/README.md:
  - Useful CMake options table row for TTS_CPP_GGML_LIB_PREFIX
    rewritten with a strikethrough + "n/a in this subtree" cell:
    explains the standalone option exists at chatterbox.cpp upstream,
    why it's unnecessary here (ggml-speech vcpkg port handles the
    rename at its own build time), and that the file-prefix surface
    is whatever vcpkg installs.

Only consumer-visible change: the integrated subtree no longer has
a TTS_CPP_GGML_LIB_PREFIX option at all.  Build behaviour is
unchanged - the vcpkg find_package path was already taking effect
and emitting libspeech-ggml-* as designed.

Co-authored-by: Cursor <cursoragent@cursor.com>
Two spots in the README still pointed at a `patches/` directory that
isn't in this in-tree subtree (deleted in the integration commit;
the ggml-speech vcpkg port carries the equivalent pre-applied):

(a) §3.24-§3.30 Metal optimisation explanation: "Patch
    `patches/ggml-metal-chatterbox-ops.patch` (1088 lines) applies
    cleanly on a fresh ggml clone at pinned `58c38058`."  Reads as
    if the file lives at this subtree's patches/ today.

(b) The "Repository layout" project-tree diagram listed
    `patches/ggml-metal-chatterbox-ops.patch` /
    `ggml-opencl-chatterbox-ops.patch` / `README.md` as if they were
    here.

Edits:

(a) Reworded to "the 1088-line ggml-metal patch backing these
    kernel changes is shipped pre-applied by the `ggml-speech`
    vcpkg port (qvac-ext-ggml/speech branch); the standalone
    chatterbox.cpp repo carries it under
    `patches/ggml-metal-chatterbox-ops.patch` against pinned ggml
    `58c38058`."  Same technical claim, accurate provenance for
    this subtree.

(b) The patches/ block in the project-tree diagram replaced with a
    parenthetical note pointing at the standalone repo for the
    locally-applied flow.

The other five `patches/` mentions in the README (lines 5, 352, 362,
376, 434) are deliberate cross-references to the standalone
chatterbox.cpp repo or describe the
"flipping TTS_CPP_USE_SYSTEM_GGML=OFF rejected because patches/ is
absent here" rationale.  Those stay.

Doc-only; no code or build behaviour change.

Co-authored-by: Cursor <cursoragent@cursor.com>
….cpp

System-ggml build of this in-tree subtree was failing in the
ggml-speech vcpkg port because the standalone source included the
internal ggml/src/ggml-quants.h header which isn't installed by
ggml-speech.  The standalone chatterbox.cpp source was just bumped
to use ggml_get_type_traits() + tr->to_float instead, mirroring the
parakeet.cpp pattern.

Mirrored from chatterbox.cpp commit edf9e50 via
`git apply --directory=tts-cpp` against the standalone diff.

src/supertonic_gguf.cpp:
  - Drop `#include "ggml-quants.h"`.
  - expand_supertonic_tensor_to_f32() now uses
    ggml_get_type_traits(src->type)->to_float instead of the
    direct ggml_fp16_to_fp32_row / dequantize_row_q8_0 calls.

No public API change; runtime behaviour is bit-equivalent because
to_float dispatches into the same row dequantizers internally.

The qvac-registry-vcpkg/ports/tts-cpp portfile + version bump to
pick up this commit lands in a follow-up.

Co-authored-by: Cursor <cursoragent@cursor.com>
…rbox.cpp

Mirrors chatterbox.cpp commit e481901 to the in-tree subtree.

tts-cpp builds as a STATIC archive by default and links OpenMP as
PRIVATE; install(EXPORT) records that as an
IMPORTED_LINK_DEPENDENT_LIBRARY in tts-cppTargets.cmake, so
consumers doing find_package(tts-cpp CONFIG REQUIRED) failed at
target-property time with

  The link interface of target "tts-cpp::tts-cpp" contains:
    OpenMP::OpenMP_CXX
  but the target was not found.

That hit qvac3/packages/tts-ggml after the integrated tts-cpp
vcpkg port @ 2026-05-07#0 finally compiled and installed.

Fix: tts-cppConfig.cmake re-imports OpenMP via find_dependency
before including tts-cppTargets.cmake; conditionally injected so
the dep is only required of consumers when OpenMP was actually
found and linked at build time.

  tts-cpp/cmake/tts-cppConfig.cmake.in:
    Add @TTS_CPP_OPTIONAL_DEPS@ substitution slot directly after
    the existing find_dependency(ggml CONFIG).

  tts-cpp/CMakeLists.txt (install block):
    Build TTS_CPP_OPTIONAL_DEPS by appending "find_dependency(OpenMP)\n"
    iff OpenMP_CXX_FOUND, otherwise leave empty;
    configure_package_config_file substitutes it in.  Backwards-
    compatible with builds where OpenMP isn't available
    (find_package(OpenMP) is non-REQUIRED).

The qvac-registry-vcpkg/ports/tts-cpp port-version 0 entry will be
amended in place to point at this commit (pre-merge convention:
single squashed commit + force-push until upstream merge).

Co-authored-by: Cursor <cursoragent@cursor.com>
…rbox.cpp

Mirrors chatterbox.cpp commit c91f2d9.

Follow-up to commit e8f6065.  The unscoped find_dependency(OpenMP)
emitted into tts-cppConfig.cmake by the previous fix made
consumers' CMake also probe OpenMP_C, which fails on bare-make's
clang-cl-style toolchain even when CXX-side OpenMP is fine:

  Could NOT find OpenMP_C (missing: OpenMP_C_FLAGS
                          OpenMP_C_LIB_NAMES)
  ...share/tts-cpp/tts-cppConfig.cmake:29 (find_dependency)

tts-cpp only links OpenMP::OpenMP_CXX, never the C variant.

Fix: in tts-cpp/CMakeLists.txt install block, change the line that
appends to TTS_CPP_OPTIONAL_DEPS so it emits

  find_dependency(OpenMP COMPONENTS CXX)

instead of bare find_dependency(OpenMP).  CMake's FindOpenMP module
respects COMPONENTS and scopes the probe to that language only;
OpenMP::OpenMP_CXX is still imported, OpenMP_C is not required.

The qvac-registry-vcpkg/ports/tts-cpp port-version 0 entry will be
amended in place to point at this commit (pre-merge convention).

Co-authored-by: Cursor <cursoragent@cursor.com>
Mirrors chatterbox.cpp commit e6031b2.

Replaces the bare find_package(OpenMP) call with parakeet-style
gating so OpenMP defaults to OFF on Windows non-MinGW.  That is the
toolchain combination where vcpkg's MSVC port build links
OpenMP::OpenMP_CXX into the static archive's transitive interface,
and consumers (including bare-make's clang-cl CMake) then fail
re-probing OpenMP_CXX or OpenMP_C at find_package(tts-cpp) time.

Edit (tts-cpp/CMakeLists.txt, ~line 150):

  option(TTS_CPP_OPENMP "tts-cpp: enable OpenMP for the tts-cpp target" ON)
  if (WIN32 AND NOT MINGW AND TTS_CPP_OPENMP
      AND NOT DEFINED CACHE{TTS_CPP_OPENMP_USER_OVERRIDE})
      set(TTS_CPP_OPENMP OFF CACHE BOOL "" FORCE)
      message(STATUS "...")
  endif()
  if (TTS_CPP_OPENMP)
      find_package(OpenMP)
  endif()

Net effect inside the qvac-registry-vcpkg/ports/tts-cpp port
build (x64-windows triplet, vcpkg's MSVC toolchain): OpenMP_CXX
is never searched, the target_link_libraries(... PRIVATE
OpenMP::OpenMP_CXX) lines are skipped, the install(EXPORT)
emits no OpenMP transitive dep, and tts-cppConfig.cmake's
@TTS_CPP_OPTIONAL_DEPS@ slot stays empty (no
find_dependency(OpenMP) is generated).  Consumer toolchains with
broken or missing OpenMP detection are no longer blocked.

Trade-off: the 9 #pragma omp parallel for loops in
src/campplus.cpp run serially in this build mode.  CAMPPlus
preprocessing is a small fraction of total synth time; the perf
delta is bounded.  Override available via
-DTTS_CPP_OPENMP_USER_OVERRIDE=ON -DTTS_CPP_OPENMP=ON for
toolchains that do have working CXX OpenMP.
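
The serial fallback in the trade-off above follows from how OpenMP pragmas behave when the compiler is not given OpenMP flags. The sketch below is a generic illustration, not a loop from src/campplus.cpp; `scale_frames` is a hypothetical function. It produces the same values with or without OpenMP.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Scales each sample independently. With OpenMP enabled the iterations are
// split across threads; without it the pragma is compiled out and the loop
// runs serially, producing the same values.
std::vector<float> scale_frames(const std::vector<float> & in, float gain) {
    std::vector<float> out(in.size());
    const long n = static_cast<long>(in.size());
#if defined(_OPENMP)
    #pragma omp parallel for
#endif
    for (long i = 0; i < n; ++i) {
        out[static_cast<std::size_t>(i)] = in[static_cast<std::size_t>(i)] * gain;
    }
    return out;
}
```

Only wall-clock time differs between the two build modes, because the iterations do not depend on each other.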

The qvac-registry-vcpkg/ports/tts-cpp port-version 0 entry will
be amended in place to point at this commit (pre-merge
convention: single squashed commit + force-push until upstream
merge).

Co-authored-by: Cursor <cursoragent@cursor.com>
Mirrors chatterbox.cpp commit 8c849cc.

Adds tts-cpp/include/tts-cpp/backend.h with the BackendDevice enum
(CPU = 0, GPU = 1) and a backend_device() method on both
chatterbox::Engine and supertonic::Engine.  Implementation routes
through the ggml backend registry (ggml_backend_get_device +
ggml_backend_dev_type) so it works in both GGML_BACKEND_DL modes.

Same shape as parakeet.cpp's parakeet::Engine::backend_device(),
matched intentionally so the qvac3 tts-ggml addon can mirror
ParakeetModel's load-time backend resolution (read backend_device()
+ backend_name(), map to backendIdFromName(), expose both on JS via
RuntimeStats).  See chatterbox.cpp commit 8c849cc message for the
full technical rationale.
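
A minimal sketch of the enum and the mapping, assuming a stand-in device-type enum. `DemoDevType`, `to_backend_device` and `backend_device_label` are hypothetical names, and treating every non-GPU device as CPU is an assumption of this sketch, not something the commit states.

```cpp
#include <cassert>
#include <string>

// Values taken from the commit message: CPU = 0, GPU = 1.
enum class BackendDevice : int { CPU = 0, GPU = 1 };

// Hypothetical stand-in for the device type reported by the ggml backend
// registry; the real code reads it via ggml_backend_dev_type().
enum class DemoDevType { Cpu, Gpu, Accel };

// Assumption: anything that is not a GPU is reported as CPU.
BackendDevice to_backend_device(DemoDevType t) {
    return t == DemoDevType::Gpu ? BackendDevice::GPU : BackendDevice::CPU;
}

// Illustrative label a consumer might expose in runtime stats.
const char * backend_device_label(BackendDevice d) {
    return d == BackendDevice::GPU ? "gpu" : "cpu";
}
```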

The qvac-registry-vcpkg/ports/tts-cpp port-version 0 entry will be
amended in place to point at this commit (pre-merge convention).

Co-authored-by: Cursor <cursoragent@cursor.com>
… chatterbox.cpp

Mirrors chatterbox.cpp commit 78ae3c5.

Engine::synthesize_batch now gates apply_trim_fade on the actual
presence of a voice override (reference_audio path or voice_dir).
When both are empty - i.e. the chatterbox::Engine built-in-voice
default that loads s3gen/builtin/{embedding,prompt_token,prompt_feat}
from the GGUF - apply_trim_fade is false so the first 40 ms of
synthesized speech is no longer zeroed + faded.

This unblocks the chatterbox-mtl variant in particular: its
upstream conds.pt produces audio with zero leading silence, and
the previous unconditional apply_trim_fade was clipping the
leading consonant of the first word ("Hello" -> "lo", "El" -> "l",
"A" -> nothing) under that configuration.  See chatterbox.cpp
commit 78ae3c5 for the full diagnosis + empirical confirmation.

Reference-audio / voice_dir paths keep apply_trim_fade=true and
behave exactly as before; streaming path is unchanged.
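
The gating rule can be sketched as a single predicate. The field names follow the commit message, but the `VoiceOptions` struct and `should_apply_trim_fade` function are hypothetical and only illustrate the condition.

```cpp
#include <cassert>
#include <string>

// Hypothetical subset of the synthesis options; field names follow the
// commit message, the struct itself is illustrative.
struct VoiceOptions {
    std::string reference_audio;  // path to a reference clip, empty if unused
    std::string voice_dir;        // directory with a saved voice, empty if unused
};

// Trim + fade only when a voice override is present; the built-in voice
// (both fields empty) is left untouched.
bool should_apply_trim_fade(const VoiceOptions & opts) {
    return !opts.reference_audio.empty() || !opts.voice_dir.empty();
}
```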

The qvac-registry-vcpkg/ports/tts-cpp port-version 0 entry will
be amended in place to point at this commit (pre-merge convention:
single squashed commit + force-push until upstream merge).

Co-authored-by: Cursor <cursoragent@cursor.com>
…ment

Adds `bool starts_word` to `parakeet::StreamingSegment`, set true when
the segment's first token's piece carries the SentencePiece "▁" word-
boundary marker (U+2581) and false when it is a wordpiece continuation.

Streaming consumers can use the flag to decide whether to insert a
space between successive segments without re-parsing whitespace from
`seg.text` (the inner detokenizer strips leading whitespace at the
session level, which loses the signal for the chunk that opens a
session). With the flag, "see" + "if" stays as "see if" while the
chunk-boundary split "pun" + "ctuation" rejoins as "punctuation".

Also exposes `bool token_is_word_start(BpeVocab, int32_t)` from
sentencepiece_bpe.h so other engines that build their own segments
(EOU per-utterance, attributed) can stamp the flag the same way.

Defaults `starts_word = true` so existing callers that ignore the
field see no behavioural change.
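
A minimal sketch of how a consumer might use the flag, assuming a reduced segment type. `DemoSegment`, `piece_is_word_start` and `join_segments` are hypothetical; the real `parakeet::StreamingSegment` and `token_is_word_start` have their own definitions.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Illustrative mirror of the two fields used here; the real
// parakeet::StreamingSegment carries more data.
struct DemoSegment {
    std::string text;
    bool starts_word = true;  // same default as described above
};

// True when a SentencePiece piece begins with U+2581, encoded in UTF-8 as
// the bytes E2 96 81.
bool piece_is_word_start(const std::string & piece) {
    return piece.compare(0, 3, "\xE2\x96\x81") == 0;
}

// Insert a space only before segments that open a new word.
std::string join_segments(const std::vector<DemoSegment> & segs) {
    std::string out;
    for (const DemoSegment & s : segs) {
        if (!out.empty() && s.starts_word) out += ' ';
        out += s.text;
    }
    return out;
}
```

With this rule, the two cases from the commit message come out as "see if" and "punctuation" respectively.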

Co-authored-by: Cursor <cursoragent@cursor.com>
…_token

Mirrors src/chatterbox_cli.cpp's MTL tokenisation path and the Python
ChatterboxMultilingualTTS.generate reference (chatterbox-ref/src/chatterbox/
mtl_tts.py:288-291).  The MTL T3 prompt graph anchors position 0 on
start_text_token (255); without it the autoregressive decode drops the
first speech tokens, audible as a missing leading syllable
("Hello" -> "lo from the multilingual").

Turbo (gpt2_bpe) is unaffected and keeps the existing single-line tokenise
+ punc_norm path.
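
The prepend step can be sketched as below. Only the token id 255 comes from the commit message; `kStartTextToken` and `with_start_text_token` are hypothetical names, and any end-of-text handling is deliberately left out.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Token id quoted in the commit message for the MTL tokenizer.
constexpr int32_t kStartTextToken = 255;

// Returns the text tokens with start_text_token at position 0. Any
// end-of-text handling is out of scope for this sketch.
std::vector<int32_t> with_start_text_token(const std::vector<int32_t> & text_tokens) {
    std::vector<int32_t> out;
    out.reserve(text_tokens.size() + 1);
    out.push_back(kStartTextToken);
    out.insert(out.end(), text_tokens.begin(), text_tokens.end());
    return out;
}
```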

Co-authored-by: Cursor <cursoragent@cursor.com>
…gine::run_t3

MTL T3 occasionally emits a plausible end-of-speech silence cadence
(three identical tokens in a row) mid-utterance and then hallucinates
low-energy content -- silence, hissing, garbage tokens -- until
n_predict (1000) is reached, producing ~40 s of trailing junk on a
short input.  chatterbox_cli.cpp already guards against this via the
AlignmentStreamAnalyzer token_repetition port, but Engine::run_t3 was
missing the same check, so the addon path (which doesn't go through the
CLI) saw the regression on whichever language/seed combinations happen
to hit the cadence (most reliably reproduced on German with the default
seed=42).

Mirrors the CLI's existing guard 1:1, gated on is_mtl since the Turbo
codebook has a different cadence signature.
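
A minimal sketch of such a guard, assuming it only needs the generated token history. `ends_with_repeated_token` and `should_stop_decode` are hypothetical names. The run length of three follows the description above and is not read from the actual implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// True when the last `run` generated tokens are all identical. run = 3
// matches the "three identical tokens in a row" cadence described above.
bool ends_with_repeated_token(const std::vector<int32_t> & tokens, std::size_t run = 3) {
    if (run == 0 || tokens.size() < run) return false;
    const int32_t last = tokens.back();
    for (std::size_t i = tokens.size() - run; i < tokens.size(); ++i) {
        if (tokens[i] != last) return false;
    }
    return true;
}

// Hypothetical decode-loop predicate: only the multilingual model is guarded.
bool should_stop_decode(bool is_mtl, const std::vector<int32_t> & tokens) {
    return is_mtl && ends_with_repeated_token(tokens);
}
```

Checking this predicate after each sampled token lets the loop exit well before n_predict when the cadence appears.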

Co-authored-by: Cursor <cursoragent@cursor.com>
@GustavoA1604 GustavoA1604 changed the title Add tts-cpp files Add tts-cpp/ subtree (Chatterbox Turbo + Multilingual + Supertonic TTS) + integration fixes May 7, 2026
@GustavoA1604 GustavoA1604 merged commit be913c8 into tetherto:master May 7, 2026
58 of 66 checks passed
gianni-cor pushed a commit that referenced this pull request May 28, 2026
Add tts-cpp/ subtree (Chatterbox Turbo + Multilingual + Supertonic TTS) + integration fixes
pratiknarola-t added a commit that referenced this pull request May 28, 2026
…in init_gpu_backend

On Adreno + PR #14/#15 the policy correctly picks OpenCL and Chatterbox
runs to completion. On Vulkan-on-Mali (Google Pixel 9 Pro XL / Tensor
G4) ggml_backend_dev_init throws an unhandled C++ exception during
pipeline init, which bubbles up to libc++abi::terminate() and SIGABRT
crashes the host process before the caller can react.

Wrap the call in try-catch inside try_init: on any exception, log
verbosely and 'continue' to the next candidate; if every candidate in
a bucket throws or returns null, the lambda returns nullptr and the
policy proceeds to the next bucket. After all buckets fail
init_gpu_backend returns nullptr and the caller falls back to CPU --
which is exactly what 'no usable GPU available' should mean.
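
The control flow described above can be sketched with stand-in types. `DemoBackend` and `DemoInitFn` are hypothetical, and the function bodies are a simplified illustration rather than the actual tts-cpp code, which calls ggml_backend_dev_init on real devices.

```cpp
#include <cassert>
#include <cstdio>
#include <exception>
#include <functional>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for a backend handle; nullptr means "no backend".
using DemoBackend = const char *;
using DemoInitFn  = std::function<DemoBackend()>;

// Try each candidate in order; a throw or a null result moves on to the next.
DemoBackend try_init(const std::vector<DemoInitFn> & candidates) {
    for (const DemoInitFn & init : candidates) {
        try {
            if (DemoBackend b = init()) return b;
        } catch (const std::exception & e) {
            std::fprintf(stderr, "backend init threw: %s\n", e.what());
            continue;
        } catch (...) {
            std::fprintf(stderr, "backend init threw a non-standard exception\n");
            continue;
        }
    }
    return nullptr;
}

// Walk the buckets in priority order; nullptr tells the caller to use CPU.
DemoBackend init_gpu_backend(const std::vector<std::vector<DemoInitFn>> & buckets) {
    for (const auto & bucket : buckets) {
        if (DemoBackend b = try_init(bucket)) return b;
    }
    return nullptr;
}
```

Catching at the candidate level keeps one failing driver from taking down the host process while later candidates still get a chance.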

This is a defensive layer for any misbehaving GPU vendor, not only
Mali: SIGABRT during GPU init is never an acceptable failure
mode for a TTS engine that has a working CPU path. Validated against
Pixel 9 Pro XL on AWS Device Farm via the QVAC-19254 [DO NOT MERGE]
test PR (tetherto/qvac#2320).

QVAC-19254