47 changes: 46 additions & 1 deletion packages/tts-ggml/CHANGELOG.md
@@ -5,7 +5,52 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [0.2.1] - 2026-06-05
## [0.2.2] - 2026-06-09

### Fixed

- **Android: revert the `tts-cpp` `2026-06-05` bump (introduced in 0.2.1)
that crashed the addon at `dlopen` during bootstrap, taking down every
Android e2e run.** `tts-cpp` `2026-06-05` pins upstream
`qvac-ext-lib-whisper.cpp@128dae42` (the QVAC-19254 "sched + cpu_backend
refactor"), which added direct `ggml_backend_is_cpu` /
`ggml_get_type_traits_cpu` calls inside the statically-linked `tts-cpp`
library. On Android the shared `ggml-speech` port builds the CPU backend
as runtime-`dlopen`'d per-microarch MODULE `.so` variants
(`GGML_CPU_ALL_VARIANTS=ON` + `GGML_BACKEND_DL=ON`; no static CPU
archive), so those two symbols are left `UND` in
`libqvac__tts-ggml.*.so`'s dynamic symbol table with no `DT_NEEDED` able
to resolve them: the CPU variant libraries are only `dlopen`'d lazily
inside Engine construction, long after Bare loads the addon. Bare's
resolver therefore fails to register the addon
(`ADDON_NOT_FOUND: linked:libqvac__tts-ggml.*.so` / `dlopen failed`) and
the unhandled rejection aborts the process (SIGABRT) ~1 s into
bootstrap. iOS and desktop (Linux/macOS/Windows) statically link the CPU
backend and were never affected. Pin `tts-cpp` back to `2026-06-03#1`
(the last-known-good revision, the one 0.2.0 shipped) so the Android
addon loads cleanly again.
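
One way to catch this class of regression before a release is to scan the addon's dynamic symbol table for the two CPU-backend symbols named above. The sketch below is a hypothetical helper, not part of this package: it assumes the text output of `nm -D --undefined-only` for the built `.so` has already been captured into a string, and the function and constant names are illustrative.

```javascript
// Hypothetical pre-release check (not part of @qvac/tts-ggml).
// Input: text captured from `nm -D --undefined-only <addon>.so`.
// Output: the watched CPU-backend symbols that appear as undefined.
const CPU_BACKEND_SYMBOLS = ['ggml_backend_is_cpu', 'ggml_get_type_traits_cpu']

function findUnresolvedCpuSymbols (nmOutput, watched = CPU_BACKEND_SYMBOLS) {
  const undefinedSymbols = new Set()
  for (const line of nmOutput.split('\n')) {
    // Undefined entries look like "                 U symbol_name";
    // a trailing "@VERSION" suffix, if present, is dropped.
    const match = /^\s*U\s+([^\s@]+)/.exec(line)
    if (match) undefinedSymbols.add(match[1])
  }
  // Preserve the order of the watch list so the report is stable.
  return watched.filter((name) => undefinedSymbols.has(name))
}
```

A non-empty result is only a warning sign: an undefined symbol is harmless when some `DT_NEEDED` library provides it. On the Android build described here no such library exists for these two symbols, which is why a hit would be worth failing a build on.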

### Reverted

- Reverts the 0.2.1 Supertonic GPU enablement (QVAC-19255, #2473) in full:
the `tts-cpp` pin, the `SupertonicModel.cpp` / `index.js` `useGPU` /
`nGpuLayers` gate removals, the flipped C++ unit tests and
`gpu-smoke.test.js` integration test, and the README / `index.d.ts` /
examples docs. With `tts-cpp` back at `2026-06-03#1` Supertonic is
CPU-only again, so the rejection gates and the CPU-only contract are
restored to keep the package internally consistent. The Supertonic GPU
work should re-land once the Android CPU-backend linkage is fixed
upstream (QVAC-19254 follow-up against `tts-cpp` / `ggml-speech`, e.g.
by statically linking `ggml-cpu` into the addon on Android the way
desktop/iOS already do).
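
The restored rejection gate can be restated as a small standalone function. This is a sketch for illustration only: `assertSupertonicCpuOnly` is a hypothetical name that does not exist in `index.js`, and the error text is abbreviated rather than copied from the package.

```javascript
// Illustrative restatement of the restored Supertonic gate.
// `config` is the runtime config object; `nGpuLayers` is the optional layer count.
function assertSupertonicCpuOnly (config = {}, nGpuLayers) {
  // GPU intent is either an explicit useGPU: true or any non-zero layer count.
  const wantsGpu =
    config.useGPU === true ||
    (nGpuLayers != null && nGpuLayers !== 0)
  if (wantsGpu) {
    throw new Error(
      'Supertonic is CPU-only today. Pass config: { useGPU: false } ' +
      'and leave nGpuLayers unset or 0.'
    )
  }
  // Unspecified intent resolves to CPU; an explicit value is left as given.
  return {
    ...config,
    useGPU: config.useGPU === undefined ? false : config.useGPU
  }
}
```

Callers that leave both fields unset get `useGPU: false` back, so existing CPU-only consumers need no changes after the revert.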

## [0.2.1] - 2026-06-05 - superseded by 0.2.2

> **Broken on Android.** The `tts-cpp` `2026-06-05` dependency this release
> introduced crashes the addon at load time (`dlopen` failure → SIGABRT)
> on Android ARM64; iOS and desktop are unaffected. Reverted in 0.2.2 (see
> above). The entry below describes what 0.2.1 attempted and is retained
> for history.

### Added

2 changes: 1 addition & 1 deletion packages/tts-ggml/README.md
@@ -250,7 +250,7 @@ backend persist its compiled program cache across launches.
| `backendsDir` | string | `path.join(__dirname, 'prebuilds')` | Root dir the addon scans for dynamically-loaded ggml backend `.so` files. Required on Android (host should pass `path.join(__dirname, 'prebuilds')`); ignored on platforms that statically link the backend |
| `openclCacheDir` | string | unset | Android-only: directory where the OpenCL backend persists its compiled program-binary cache. Setting it across runs avoids re-JITing the kernels on every fresh process |
| `config.language` | string | `"en"` | Chatterbox MTL accepts `es/fr/de/pt/it/zh/ja/ko/...`; turbo & Supertonic are English |
| `config.useGPU` | boolean | `false` | Set to `true` to route through Metal / Vulkan / OpenCL if available, on either Chatterbox or Supertonic. Backend selection follows tts-cpp's `init_gpu_backend` tier policy (Adreno 700+ → OpenCL, otherwise Vulkan/Metal/CUDA via the registry walk, otherwise CPU) |
| `config.useGPU` | boolean | `false` | Set to `true` to route through Metal / Vulkan / OpenCL if available. Ignored on Android (forced to CPU at the C++ engine boundary); rejected by Supertonic at construction time (engine is CPU-only today) |
| `config.outputSampleRate` | number | 24000 | Resample native 24 kHz output |
| `opts.stats` | boolean | `false` | Populate `response.stats` with RTF, `backendDevice` (0=CPU, 1=GPU), `backendId` (0=CPU, 1=Metal, 3=Vulkan, 4=OpenCL, 99=other) etc. |
| `opts.exclusiveRun` | boolean | `false` | Serialize overlapping streaming runs |
@@ -19,13 +19,10 @@ struct SupertonicConfig {
* Tri-state GPU intent (mirrors ChatterboxConfig::useGpu):
* - std::nullopt: unspecified, let the engine use its library default.
* - true: if nGpuLayers unset, maps to nGpuLayers=99.
* Honoured as of tts-cpp@2026-06-05 (QVAC-18605
* Supertonic Vulkan/Metal optimisations + QVAC-19254
* sched/cpu_backend refactor for Adreno OpenCL).
* Backend selection follows tts-cpp's init_gpu_backend
* tier policy (Adreno 700+ -> OpenCL, otherwise
* Vulkan/Metal/CUDA via the registry walk, otherwise
* CPU).
* Note: SupertonicModel::validateConfig still rejects
* any GPU intent today because the Supertonic
* engine is CPU-only ("CPU only today"; see
* tts-cpp include/tts-cpp/supertonic/engine.h).
* - false: if nGpuLayers unset, forces nGpuLayers=0 (CPU).
*
* Conflicts with nGpuLayers (true + 0, or false + !=0) are rejected
@@ -126,12 +126,19 @@ void SupertonicModel::validateConfig(const SupertonicConfig& cfg) {
"(useGPU:true + nGpuLayers!=0, or useGPU:false + nGpuLayers=0).");
}
}
// GPU execution is supported as of tts-cpp@2026-06-05 (QVAC-18605
// Supertonic Vulkan/Metal optimisations + QVAC-19254 sched/cpu_backend
// refactor for Adreno OpenCL). Backend selection follows tts-cpp's
// init_gpu_backend tier policy: Adreno 700+ -> OpenCL, otherwise
// Vulkan/Metal/CUDA via the registry walk, otherwise CPU. Caller
// intent (useGPU / nGpuLayers) is honoured.
const bool wantsGpu =
cfg.useGpu.value_or(false) ||
(cfg.nGpuLayers.has_value() && *cfg.nGpuLayers != 0);
if (wantsGpu) {
throw StatusError(
general_error::InvalidArgument,
"SupertonicModel: GPU execution is not supported by the Supertonic "
"engine yet (see tts-cpp include/tts-cpp/supertonic/engine.h: \"CPU "
"only today\"). GPU output is currently silently wrong "
"(~4x quieter, slightly truncated) on the Vulkan vector-estimator "
"+ vocoder path. Pass useGPU: false (and leave nGpuLayers unset or "
"0) when constructing a Supertonic model.");
}
}

void SupertonicModel::load() {
@@ -153,13 +160,23 @@ void SupertonicModel::reload() {
void SupertonicModel::loadLocked() {
if (engine_) return;

// Android GPU policy is delegated to tts-cpp's init_gpu_backend tier
// policy as of QVAC-19254: it allowlists Qualcomm Adreno (OpenCL on
// Adreno 700+, falls through to Vulkan / CPU on other tiers) and
// skips Mali / non-Adreno GPUs that would abort ggml_backend_graph_
// compute. No extra force-off at this boundary; consumers asking
// for useGPU=true on Android will get Adreno-OpenCL when available
// and CPU otherwise.
// Force useGPU to false on Android until Vulkan (Mali) and OpenCL (Adreno)
// stabilize for the Supertonic graph.
#ifdef __ANDROID__
{
const bool wantsGpu =
cfg_.useGpu.value_or(false) ||
(cfg_.nGpuLayers.has_value() && *cfg_.nGpuLayers != 0);
if (wantsGpu) {
QLOG(logger::Priority::WARNING,
"Supertonic: useGPU=true is currently ignored on Android "
"(GPU backends disabled at engine boundary pending Vulkan/Mali "
"and OpenCL/Adreno driver fixes); falling back to CPU.");
}
cfg_.useGpu = false;
cfg_.nGpuLayers = 0;
}
#endif

try {
engine_ = std::make_shared<tts_cpp::supertonic::Engine>(toEngineOptions(cfg_));
65 changes: 25 additions & 40 deletions packages/tts-ggml/addon/tests/test_supertonic_config.cpp
@@ -80,65 +80,50 @@ TEST(SupertonicValidate, NonexistentNoiseNpyRejected) {
EXPECT_THROW(SupertonicModel{cfg}, StatusError);
}

TEST(SupertonicValidate, UseGpuTrueAcceptedAtConstruction) {
// QVAC-19255 (companion to PR-bump-to-tts-cpp-128dae42): Supertonic
// gained Vulkan/Metal GPU support in tts-cpp@2026-06-05 (QVAC-18605
// rounds 1-13 + QVAC-19254 sched). validateConfig must now ACCEPT
// useGPU=true at construction time. The stub GGUF file still fails
// parsing on load() (that's exercised below), but construction
// itself no longer rejects on GPU intent.
TEST(SupertonicValidate, UseGpuTrueRejectedWithExplanation) {
auto cfg = minimallyValidStubConfig();
cfg.useGpu = true;
std::unique_ptr<SupertonicModel> m;
EXPECT_NO_THROW(m = std::make_unique<SupertonicModel>(cfg));
ASSERT_NE(m, nullptr);
EXPECT_FALSE(m->isLoaded());
}

TEST(SupertonicValidate, NGpuLayersGreaterThanZeroAccepted) {
// Companion to UseGpuTrueAcceptedAtConstruction: explicit
// nGpuLayers > 0 is no longer rejected at validation. Loading the
// stub will throw on GGUF parse, but the constructor must succeed.
auto cfg = minimallyValidStubConfig();
cfg.nGpuLayers = 99;
std::unique_ptr<SupertonicModel> m;
EXPECT_NO_THROW(m = std::make_unique<SupertonicModel>(cfg));
ASSERT_NE(m, nullptr);
EXPECT_FALSE(m->isLoaded());
}

TEST(SupertonicValidate, UseGpuNGpuLayersConflictStillRejected) {
// The cross-field conflict check (useGPU=true + nGpuLayers=0, or
// useGPU=false + nGpuLayers!=0) is still enforced after the GPU
// gate was lifted, so callers can't silently get the opposite
// backend they asked for.
auto cfg = minimallyValidStubConfig();
cfg.useGpu = true;
cfg.nGpuLayers = 0;
bool threw = false;
try {
SupertonicModel m(cfg);
} catch (const StatusError& e) {
threw = true;
const std::string what = e.what();
EXPECT_NE(what.find("conflicts with nGpuLayers"), std::string::npos)
<< "error should explain the conflict; got: " << what;
EXPECT_NE(what.find("GPU"), std::string::npos)
<< "error should mention GPU; got: " << what;
EXPECT_NE(what.find("Supertonic"), std::string::npos)
<< "error should mention Supertonic engine; got: " << what;
}
EXPECT_TRUE(threw);
}

TEST(SupertonicValidate, NGpuLayersGreaterThanZeroRejected) {
auto cfg = minimallyValidStubConfig();
cfg.nGpuLayers = 99;
EXPECT_THROW(SupertonicModel{cfg}, StatusError);
}

TEST(SupertonicValidate, NGpuLayersZeroAcceptedAndDeferredLoad) {
auto cfg = minimallyValidStubConfig();
cfg.nGpuLayers = 0;
// Validation passes (CPU path); the stub file then fails GGUF
// parsing on load() (not at construction; load is deferred to
// waitForLoadInitialization). Locks the contract that construction
// succeeds for any internally-consistent CPU config.
// Validation passes (CPU-only path); the stub file then fails GGUF
// parsing on load() (not at construction; load is now deferred to
// waitForLoadInitialization). The eventual throw must NOT be the
// GPU-rejection branch.
std::unique_ptr<SupertonicModel> m;
EXPECT_NO_THROW(m = std::make_unique<SupertonicModel>(cfg));
ASSERT_NE(m, nullptr);
EXPECT_FALSE(m->isLoaded());
EXPECT_THROW(m->load(), StatusError);
bool threw = false;
try {
m->load();
} catch (const StatusError& e) {
threw = true;
const std::string what = e.what();
EXPECT_EQ(what.find("GPU"), std::string::npos)
<< "nGpuLayers=0 should not trigger the GPU-rejection path; got: " << what;
}
EXPECT_TRUE(threw);
EXPECT_FALSE(m->isLoaded());
}
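
The tri-state `useGpu` / `nGpuLayers` rules that these tests exercise can be restated compactly. The following is a JavaScript sketch for illustration, based only on the `SupertonicConfig` comment in this change: `resolveGpuLayers` is a hypothetical name, `undefined` stands in for `std::nullopt`, and the error wording is an assumption apart from the "conflicts with nGpuLayers" phrase the tests look for.

```javascript
// Illustrative only: maps the tri-state useGpu flag plus an optional
// nGpuLayers value to the effective layer count.
function resolveGpuLayers (useGpu, nGpuLayers) {
  const layersSet = nGpuLayers != null
  // Cross-field conflicts: true + 0, or false + non-zero.
  if (useGpu === true && layersSet && nGpuLayers === 0) {
    throw new Error('useGPU: true conflicts with nGpuLayers = 0')
  }
  if (useGpu === false && layersSet && nGpuLayers !== 0) {
    throw new Error('useGPU: false conflicts with nGpuLayers != 0')
  }
  if (layersSet) return nGpuLayers // an explicit count wins when consistent
  if (useGpu === true) return 99   // "move everything" to the GPU
  if (useGpu === false) return 0   // force CPU
  return undefined                 // unspecified: engine uses its library default
}
```

Under the CPU-only contract restored here, Supertonic then rejects any GPU intent, so for that engine only an unset flag or an explicit CPU request passes validation.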

4 changes: 2 additions & 2 deletions packages/tts-ggml/examples/supertonic-mtl-sweep-tts.js
@@ -31,8 +31,8 @@
* `bash scripts/convert-models.sh -t supertonic-mtl`). The
* English-pinned single-sentence entry point lives in supertonic-tts.js.
*
* NOTE: Supertonic gained GPU support in tts-cpp@2026-06-05. This
* example keeps useGPU=false so it runs identically everywhere.
* NOTE: Supertonic is CPU-only in tts-cpp today. This example sets
* useGPU=false explicitly to match.
*/

const fs = require('bare-fs')
5 changes: 2 additions & 3 deletions packages/tts-ggml/examples/supertonic-mtl-tts.js
@@ -37,9 +37,8 @@
* supertonic-mtl-sweep-tts.js; for the simpler English-pinned entry
* point see supertonic-tts.js.
*
* NOTE: Supertonic gained GPU support in tts-cpp@2026-06-05. This
* example keeps useGPU=false so it runs identically everywhere; flip
* to true on GPU-capable hosts to engage Metal / Vulkan / Adreno-OpenCL.
* NOTE: Supertonic is CPU-only in tts-cpp today. This example sets
* useGPU=false explicitly to match.
*/

const fs = require('bare-fs')
@@ -20,9 +20,9 @@
* Expects the Supertonic GGUF at:
* models/supertonic.gguf
*
* NOTE: Supertonic gained GPU support in tts-cpp@2026-06-05; this
* example keeps useGPU=false so it runs identically everywhere. See
* supertonic-tts.js for the GPU opt-in pattern.
* NOTE: Supertonic is CPU-only in tts-cpp today; this example sets
* useGPU=false explicitly. See supertonic-tts.js for the full
* limitation context.
*/

const fs = require('bare-fs')
10 changes: 5 additions & 5 deletions packages/tts-ggml/examples/supertonic-tts.js
@@ -29,11 +29,11 @@
* ONNX bundle into a single .gguf via
* scripts/convert-supertonic2-to-gguf.py --arch supertonic.
*
* NOTE: Supertonic gained GPU support in tts-cpp@2026-06-05 (QVAC-18605
* Vulkan/Metal optimisations + QVAC-19254 Adreno OpenCL sched). Pass
* useGPU=true on GPU-capable hosts to engage Metal / Vulkan / CUDA /
* Adreno-OpenCL via the tts-cpp init_gpu_backend tier policy; this
* example keeps useGPU=false so it runs identically everywhere.
* NOTE: Supertonic is CPU-only in tts-cpp today (engine docstring at
* include/tts-cpp/supertonic/engine.h: "CPU only today"). Passing
* useGPU=true throws at construction with a message pointing at the
* limitation; the example explicitly sets useGPU=false. Chatterbox
* (turbo + MTL) keeps GPU enabled by default.
*/

const fs = require('bare-fs')
4 changes: 2 additions & 2 deletions packages/tts-ggml/index.d.ts
@@ -49,7 +49,7 @@ declare interface TTSGgmlFiles {
declare interface TTSGgmlRuntimeConfig {
/** Language code; default "en". Chatterbox MTL accepts es/fr/de/pt/it/zh/ja/ko/... */
language?: string
/** Route inference through a GPU backend (Metal / Vulkan / CUDA / OpenCL) if available, on either Chatterbox or Supertonic. Defaults to `false` for both engines (opt-in via `useGPU: true` on GPU-capable hosts). */
/** Route inference through a GPU backend (Metal / Vulkan / CUDA / OpenCL) if available. Defaults to `false` for both engines (opt-in via `useGPU: true` on GPU-capable hosts). Supertonic still rejects `useGPU: true` at construction time (engine is CPU-only today). */
useGPU?: boolean
/** Resample the engine's native rate (24 kHz Chatterbox, 44.1 kHz Supertonic) to this rate before emitting (8000-192000 Hz). */
outputSampleRate?: number
@@ -68,7 +68,7 @@ declare interface TTSGgmlOptions {
voiceDir?: string
/** RNG seed for CFM initial noise + SineGen excitation (Chatterbox) / vector-estimator latent (Supertonic). */
seed?: number
/** Move N layers to the GPU backend. Chatterbox + Supertonic: pass 99 to move everything. */
/** Move N layers to the GPU backend. Chatterbox: pass 99 to move everything. Supertonic: must be 0 / unset (engine is CPU-only today). */
nGpuLayers?: number
/** Override `std::thread::hardware_concurrency()`. */
threads?: number
21 changes: 16 additions & 5 deletions packages/tts-ggml/index.js
@@ -362,11 +362,22 @@ class TTSGgml {
'agnostic runStream() / runStreaming() / run({ streamOutput: true }) APIs.'
)
}
// GPU is supported as of tts-cpp@2026-06-05 (QVAC-18605 Supertonic
// Vulkan/Metal optimisations + QVAC-19254 sched/cpu_backend for
// Adreno OpenCL). Default-off mirrors Chatterbox; callers opt in
// with config: { useGPU: true } on GPU-capable hosts.
if (this._config.useGPU === undefined && this._nGpuLayers == null) {
const wantsGpu =
this._config.useGPU === true ||
(this._nGpuLayers != null && this._nGpuLayers !== 0)
if (wantsGpu) {
throw new Error(
'tts-ggml: GPU execution is not supported by the Supertonic engine yet ' +
'(see tts-cpp include/tts-cpp/supertonic/engine.h: "CPU only today"). ' +
'GPU output is currently silently wrong (~4x quieter, slightly truncated) ' +
'because the Vulkan path of the supertonic vector-estimator + vocoder is ' +
'not yet validated. Pass config: { useGPU: false } (and leave nGpuLayers ' +
'unset, or set it to 0) when constructing a Supertonic model. ' +
'Chatterbox also defaults to CPU now; opt in with ' +
'config: { useGPU: true } on GPU-capable hosts.'
)
}
if (this._config.useGPU === undefined) {
this._config.useGPU = false
}
} else if (this._config.useGPU === undefined && this._nGpuLayers == null) {
2 changes: 1 addition & 1 deletion packages/tts-ggml/package.json
@@ -1,6 +1,6 @@
{
"name": "@qvac/tts-ggml",
"version": "0.2.1",
"version": "0.2.2",
"description": "Text to Speech (TTS) addon for qvac (ggml backend, wrapping the chatterbox + supertonic engines from tts-cpp)",
"addon": true,
"engines": {
52 changes: 17 additions & 35 deletions packages/tts-ggml/test/integration/gpu-smoke.test.js
@@ -164,43 +164,25 @@ test('Chatterbox GPU smoke - useGPU=true must engage the GPU backend on GPU-capa
}
})

test('Supertonic GPU smoke - useGPU=true must engage the GPU backend on GPU-capable platforms', { timeout: 600000, skip: NO_GPU }, async (t) => {
// QVAC-19255: Supertonic gained Vulkan/Metal/Adreno-OpenCL support
// in tts-cpp@2026-06-05 (QVAC-18605 rounds 1-13 + QVAC-19254 sched).
// This test mirrors the Chatterbox GPU smoke above: useGPU=true on
// a GPU-capable platform must resolve to a real GPU backend, not
// silently fall back to CPU.
const baseDir = getBaseDir()
const modelsDir = path.join(baseDir, 'models')

const download = await ensureSupertonicModel({ targetDir: modelsDir })
if (!download || !download.success) {
t.fail('Supertonic GGUF not available - registry fetch failed. Run `npm run download-models:registry` or stage models locally.')
return
}

const supertonicPath = download.path ||
path.join(modelsDir, 'supertonic.gguf')

const model = await loadSupertonicTTS({
supertonicModelPath: supertonicPath,
language: 'en',
voice: 'F1',
useGPU: true
})
test('Supertonic GPU smoke - useGPU=true is rejected at constructor (engine is CPU-only today)', { timeout: 60000 }, async (t) => {
const TTSGgml = require('@qvac/tts-ggml')
let threw = false
try {
const result = await runSupertonicTTS(
model,
{ text: 'GPU smoke check.' },
{ minSamples: 5000 }
)
console.log(result.output)
t.ok(result.passed, 'Supertonic/GPU produced expected sample count')
t.ok(result.data.sampleCount > 0, 'Supertonic/GPU produced audio')
assertGpuBackend(t, 'Supertonic', result.data.stats)
} finally {
try { await model.unload() } catch (_e) {}
/* eslint no-new: 0 */
new TTSGgml({
engine: TTSGgml.ENGINE_SUPERTONIC,
files: { supertonicModel: '/dev/null' },
voice: 'F1',
config: { language: 'en', useGPU: true }
})
} catch (e) {
threw = true
t.ok(/CPU only today/.test(e.message),
'rejection message references the engine docstring')
t.ok(/Pass config:.*useGPU: false/.test(e.message),
'rejection message tells user how to fix')
}
t.ok(threw, 'TTSGgml constructor should throw on Supertonic + useGPU:true')
})

// CPU smoke: useGPU:false must actually pin the engine to CPU on every