feat(tts): integrate LavaSR audio enhancer as opt-in post-processing by sharmaraju352 · Pull Request #1142 · tetherto/qvac

sharmaraju352 · 2026-03-26T08:26:49Z

🎯 What problem does this PR solve?

TTS addon outputs audio at native engine sample rates (Chatterbox 24kHz, Supertonic 44.1kHz) with no way to improve quality beyond what the engine produces
Users have no control over output sample rate; higher-quality applications (48kHz, cleaner output) require manual post-processing outside QVAC

📝 How does it solve it?

Integrates LavaSR, a lightweight neural speech enhancement model, as an opt-in post-processing step with three independent config flags:

enhance (bool) — runs Vocos-based neural bandwidth extension to 48kHz (2 ONNX sessions, ~55MB)
denoise (bool) — runs UL-UNAS denoiser at 16kHz before enhancement (~1.7MB, 1 ONNX session)
outputSampleRate (int) — sets final output rate (must be in 8000–192000 range); with enhance, upscales neurally first then resamples conventionally

Pipeline: Engine synthesize → [denoise at 16kHz] → [enhance to 48kHz] → [resample to target rate]

All flags default to off — zero performance cost and full backward compatibility.
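The stage ordering above can be sketched as a small planning function. This is an illustration only: `planPostProcess` and its return shape are hypothetical names, not part of the addon, and the sketch assumes that a denoise-only job keeps the engine's native rate (the description does not say).

```javascript
// Hypothetical helper: lists the post-processing stages that would run for a
// given flag combination, plus the resulting output sample rate.
const DENOISER_RATE = 16000 // denoiser operates at 16 kHz (from the PR text)
const ENHANCED_RATE = 48000 // enhancer output rate (from the PR text)

function planPostProcess (engineRate, { enhance = false, denoise = false, outputSampleRate } = {}) {
  const stages = []
  let rate = engineRate

  // Assumption: denoising alone does not change the reported rate.
  if (denoise) stages.push(`denoise@${DENOISER_RATE}`)

  if (enhance) {
    stages.push(`enhance->${ENHANCED_RATE}`)
    rate = ENHANCED_RATE
  }

  // Conventional resampling runs last, and only if the target rate differs.
  if (outputSampleRate !== undefined && outputSampleRate !== rate) {
    stages.push(`resample->${outputSampleRate}`)
    rate = outputSampleRate
  }
  return { stages, rate }
}

console.log(planPostProcess(24000)) // all flags off: no stages, rate unchanged
console.log(planPostProcess(24000, { enhance: true, outputSampleRate: 22050 }))
```

With every flag off the plan is empty, which matches the zero-cost default described above.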

C++ implementation:

Pure DSP utilities (Lanczos resampler, radix-2 FFT, STFT/ISTFT, Slaney mel filterbank, spectral crossover merge) — no ML dependency, tested in isolation
LavaSR ONNX wrappers (denoiser with chunked overlap-add, enhancer with backbone + spec head + FastLR merge)
TTSModel::postProcess() integration after engine synthesize(), with lazy session loading for per-job toggles
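For readers unfamiliar with the Lanczos resampler named in the DSP utilities, a minimal JavaScript sketch of the technique follows. It is not the PR's C++ code: the function names, the kernel size `a = 3`, the edge clamping, and the weight normalization are all assumptions made for illustration.

```javascript
// Lanczos kernel: sinc(x) * sinc(x / a) for |x| < a, otherwise 0.
function lanczos (x, a) {
  if (x === 0) return 1
  if (Math.abs(x) >= a) return 0
  const px = Math.PI * x
  return (a * Math.sin(px) * Math.sin(px / a)) / (px * px)
}

// Resample a mono Float32Array from srcRate to dstRate.
// Note: no additional low-pass is applied when downsampling, so this sketch
// is only alias-free for upsampling.
function resampleLanczos (input, srcRate, dstRate, a = 3) {
  const outLen = Math.round(input.length * dstRate / srcRate)
  const out = new Float32Array(outLen)
  const step = srcRate / dstRate
  for (let i = 0; i < outLen; i++) {
    const pos = i * step // position in source samples
    const center = Math.floor(pos)
    let acc = 0
    let wsum = 0
    for (let k = center - a + 1; k <= center + a; k++) {
      const idx = Math.min(input.length - 1, Math.max(0, k)) // clamp at edges
      const w = lanczos(pos - k, a)
      acc += input[idx] * w
      wsum += w
    }
    // Normalizing by the weight sum keeps a constant signal constant.
    out[i] = wsum !== 0 ? acc / wsum : 0
  }
  return out
}

const upsampled = resampleLanczos(new Float32Array(240).fill(1), 24000, 48000)
console.log(upsampled.length) // 480
```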

JS bridge: enhance/denoise/outputSampleRate + individual model paths (backbonePath, specHeadPath, denoiserPath) flow through AddonJs.hpp → index.js → index.d.ts

Nice-to-haves included:

sampleRate exposed in JS output callback (data.sampleRate) and runtimeStats
Per-job enhance/denoise/outputSampleRate toggle via run() input (lazy ONNX session loading on first use)
Benchmark tests: enhancer ~22x realtime, denoiser ~48x realtime on Apple Silicon
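As a reading aid for the "x realtime" figures, the metric is the audio duration divided by the processing time. A small sketch, with `realtimeFactor` as a hypothetical name and made-up input numbers:

```javascript
// Realtime factor: seconds of audio processed per second of wall-clock time.
function realtimeFactor (numSamples, sampleRate, elapsedMs) {
  const audioSec = numSamples / sampleRate
  return audioSec / (elapsedMs / 1000)
}

// Example: 10 s of 48 kHz audio processed in 500 ms is 20x realtime.
console.log(realtimeFactor(480000, 48000, 500))
```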

🧪 How was it tested?

C++ (187 total, 183 pass, 4 skip for missing Supertonic models):

43 new unit tests — DSP round-trip validation (FFT, STFT/ISTFT, resampler, mel filterbank, spectral merge), LavaSR wrapper lifecycle, TTSModel config parsing + resampling + per-job toggle + backward compat
6 new integration tests with real ONNX models — enhancer/denoiser individually, full Chatterbox→LavaSR pipeline, denoise+enhance combined, enhance+outputSampleRate
3 benchmark tests measuring latency at 1s/3s/5s/10s durations
Zero regressions on all pre-existing tests

JS:

outputSampleRate resampling test added to addon.test.js (runs in CI without LavaSR models)
6 LavaSR integration tests in addon.test.js (Chatterbox+enhance, denoise+enhance, outputSampleRate, enhance+downsample, Supertonic+enhance, backward compat) — all assert sampleRate via callback
Locally verified: outputSampleRate=16000 produces 16kHz output with sampleRate correctly reported in callback

Manual verification:

Ran comparison example producing 3 WAV files (raw 24kHz, enhanced 48kHz, denoised+enhanced 48kHz) — enhanced output is audibly clearer

🔌 API Changes

const tts = new ONNXTTS({
  files: {
    tokenizerPath: 'models/chatterbox/tokenizer.json',
    speechEncoderPath: 'models/chatterbox/speech_encoder.onnx',
    embedTokensPath: 'models/chatterbox/embed_tokens.onnx',
    conditionalDecoderPath: 'models/chatterbox/conditional_decoder.onnx',
    languageModelPath: 'models/chatterbox/language_model.onnx',
  },
  referenceAudio,
  config: {
    language: 'en',
    outputSampleRate: 22050, // optional: resample after enhancement (8000–192000)
  },
  // LavaSR enhancement (nested enhancer config)
  enhancer: {
    type: 'lavasr',
    enhance: true,
    denoise: true,
    backbonePath: 'models/lavasr/enhancer_backbone.onnx',
    specHeadPath: 'models/lavasr/enhancer_spec_head.onnx',
    denoiserPath: 'models/lavasr/denoiser_core_legacy_fixed63.onnx',
  },
})

// Per-job toggle (lazy loads ONNX sessions on first use)
const response = await tts.run({
  type: 'text',
  input: 'Hello world',
  enhancer: { type: 'lavasr', enhance: true, denoise: false },
  outputSampleRate: 22050, // per-job override
})

// Output callback now includes sampleRate
response.onUpdate(data => {
  console.log(data.outputArray)  // Int16Array
  console.log(data.sampleRate)   // 22050 here; 48000 when enhanced without outputSampleRate
})
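On the consumer side, the reported `sampleRate` is what makes the PCM chunks interpretable. A possible way to collect them is sketched below; `createPcmCollector` is a hypothetical helper, and the sketch assumes mono audio and that each update carries `outputArray` and `sampleRate` as shown above.

```javascript
// Hypothetical helper: accumulates Int16Array chunks delivered via onUpdate
// and remembers the sample rate reported alongside them.
function createPcmCollector () {
  const chunks = []
  let sampleRate = null
  return {
    push (data) {
      if (data.outputArray) chunks.push(Int16Array.from(data.outputArray)) // copy
      if (data.sampleRate) sampleRate = data.sampleRate
    },
    finish () {
      const total = chunks.reduce((n, c) => n + c.length, 0)
      const pcm = new Int16Array(total)
      let offset = 0
      for (const c of chunks) {
        pcm.set(c, offset)
        offset += c.length
      }
      // Duration is only known once a sample rate has been reported (mono assumed).
      const durationSec = sampleRate ? total / sampleRate : null
      return { pcm, sampleRate, durationSec }
    }
  }
}

const collector = createPcmCollector()
collector.push({ outputArray: new Int16Array(24000), sampleRate: 48000 })
collector.push({ outputArray: new Int16Array(24000), sampleRate: 48000 })
console.log(collector.finish().durationSec) // 1
```

In real use, `collector.push` would be passed to `response.onUpdate`.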

New TypeScript interfaces:

interface LavaSREnhancerConfig {
  type: 'lavasr'
  enhance?: boolean
  denoise?: boolean
  backbonePath?: string
  specHeadPath?: string
  denoiserPath?: string
}

type EnhancerConfig = LavaSREnhancerConfig

// Added to ONNXTTSOptions:
interface ONNXTTSOptions {
  enhancer?: EnhancerConfig
  // ...existing fields
}

// Added to ONNXTTSRuntimeConfig:
interface ONNXTTSRuntimeConfig {
  outputSampleRate?: number  // 8000–192000
  // ...existing fields
}

// Per-job run input:
interface TTSRunInput {
  enhancer?: { type: 'lavasr'; enhance?: boolean; denoise?: boolean }
  outputSampleRate?: number
  // ...existing fields
}
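The 8000–192000 bound on `outputSampleRate` could be enforced with a check along these lines. This is a sketch under assumptions: the function name, the integer requirement, and the use of `RangeError` are illustrative, and the actual constructor may report the error differently.

```javascript
const MIN_OUTPUT_RATE = 8000
const MAX_OUTPUT_RATE = 192000

// Returns the rate unchanged when valid; undefined means "not set".
function validateOutputSampleRate (value) {
  if (value === undefined) return undefined
  if (!Number.isInteger(value) || value < MIN_OUTPUT_RATE || value > MAX_OUTPUT_RATE) {
    throw new RangeError(
      `outputSampleRate must be an integer in [${MIN_OUTPUT_RATE}, ${MAX_OUTPUT_RATE}], got ${value}`
    )
  }
  return value
}

console.log(validateOutputSampleRate(22050)) // 22050
```

Rejecting zero and negative values in the same branch covers the "reject all out-of-range values, not just positive ones" case mentioned in the commit history.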

🔄 CI Run

Latest full CI pipeline: https://github.com/tetherto/qvac/actions/runs/24237672067

GustavoA1604

Need JS linting
Also I think automatic on-pr run of this PR got broken, please run manually and share results

Add neural speech enhancement (LavaSR) to the TTS package with three independent, opt-in config flags: `enhance` (Vocos BWE to 48kHz), `denoise` (UL-UNAS denoiser), and `outputSampleRate` (arbitrary target rate with smart algorithm selection). All flags default to off so backward compatibility is preserved; no new dependencies introduced.

C++ implementation:
- DSP utilities: Lanczos resampler, radix-2 FFT, windowed STFT/ISTFT, Slaney mel filterbank, spectral crossover merge (pure C++, no ML dep)
- LavaSRDenoiser: chunked STFT-domain ONNX inference with overlap-add
- LavaSREnhancer: backbone + spec head ONNX sessions with DSP pipeline
- TTSModel::postProcess() pipeline: denoise -> enhance -> resample

JS bridge:
- AddonJs.hpp: 6 new config keys (enhance, denoise, outputSampleRate, enhancerBackbonePath, enhancerSpecHeadPath, denoiserPath)
- index.js: constructor params, _getLavaSRParams(), download integration
- index.d.ts: LavaSROptions interface, extended type declarations

Testing:
- 40 new C++ unit tests (DSP, LavaSR wrappers, TTSModel integration)
- 3 C++ integration tests with real ONNX models (enhancer, denoiser, full Chatterbox+enhance pipeline verified producing 48kHz output)
- JS integration test scaffolding for 6 enhancement scenarios
- Model download helper for LavaSR from GitHub releases

Made-with: Cursor

- Expose sampleRate in JS output callback: outputArray now includes a sampleRate field so consumers know the actual output rate. Also added to runtimeStats (JobEnded event). Uses shared atomic<int> between TTSModel and JsAudioOutputHandler.
- Per-job enhance/denoise toggle: _runInternal now passes enhance, denoise, outputSampleRate from the job input to the native config, enabling per-utterance control. LavaSR sessions are lazily loaded on first use when toggled on per-job.
- Benchmark tests: enhancer, denoiser, and resampler latency across 1s/3s/5s/10s audio durations. Results: enhancer ~22x realtime, denoiser ~48x realtime on Apple Silicon.

Made-with: Cursor

- Remove unused constants to pass standard linter - Add release-notes/v0.6.2.md for release-notes-check workflow Made-with: Cursor

- Add outputSampleRate-only + backward-compat tests to addon.test.js (runs in CI without LavaSR models, validates resampling path)
- Add per-job outputSampleRate toggle C++ test via AnyInput config
- Add denoise+enhance combined pipeline C++ integration test
- Add enhance+outputSampleRate combined path C++ integration test

Made-with: Cursor

Follow existing codebase pattern where each model path is passed individually (enhancerBackbonePath, enhancerSpecHeadPath, denoiserPath) rather than via a directory. Consistent with how Chatterbox and Supertonic engines accept their model paths. Made-with: Cursor

- C++: guard padReflect against N<=1 input, hoist frame/chunk/shape vectors out of hot loops in StftProcessor and LavaSRDenoiser, cache StftProcessor in MelFilterbank instead of recreating per call
- C++: parseLavaSRConfig only overwrites fields present in configMap so reload events don't clear existing paths with empty strings
- C++: add cancel check before postProcess step
- JS: forward enhancerBackbonePath/specHeadPath/denoiserPath in per-job config so run()-level overrides reach the C++ layer
- Tests: move LavaSR tests from separate file into addon.test.js, switch outputSampleRate test to Supertonic (faster), verify reported sampleRate via callback assertions instead of only checking output is non-empty
- Rename example-lavasr-compare.js to example-enhanced-audio.js

Made-with: Cursor
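The padReflect guard matters because reflect padding mirrors around the edge samples with period 2(N-1), which is zero when N is 1. A JavaScript sketch of the idea follows; it is not the C++ code from this commit, and the fallback choices (zeros for empty input, repetition for a single sample) are assumptions.

```javascript
// Reflect-pad `x` by `pad` samples on each side without repeating the edge
// sample, e.g. [1,2,3,4] with pad=2 becomes [3,2,1,2,3,4,3,2].
function padReflect (x, pad) {
  const n = x.length
  const out = new Float32Array(n + 2 * pad)
  if (n === 0) return out            // nothing to mirror: leave zeros
  if (n === 1) return out.fill(x[0]) // period would be 0: repeat the sample
  const period = 2 * (n - 1)
  for (let i = 0; i < out.length; i++) {
    let j = (i - pad) % period
    if (j < 0) j += period
    out[i] = x[j < n ? j : period - j] // fold the second half of the period back
  }
  return out
}

console.log(Array.from(padReflect(Float32Array.from([1, 2, 3, 4]), 2)))
```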

…on, JS path handling

Address remaining PR review findings for LavaSR integration:
- Fix FastLRMerge division-by-zero when transition band is a single bin
- Add input size validation in FastLRMerge for mismatched enhanced/original vectors
- Fix PCM16 conversion asymmetry: use 32768.0f for symmetric [-1, 1) mapping
- Hoist istft() frame allocation out of hot loop to avoid per-frame heap alloc
- Send LavaSR model paths to C++ unconditionally (not gated behind boolean flags)
- Update reload() to persist enhancer/denoiser model path changes
- Log warning instead of silently swallowing stoi parse failures for outputSampleRate
- Add clarifying comments on ORT_DISABLE_ALL, magnitude spectrogram, and cutoff logic
- Add trailing newline to TTSModel.cpp

Made-with: Cursor
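To illustrate how a crossover merge can hit a division by zero, here is a hypothetical per-bin linear crossfade between an original and an enhanced spectrum. The real FastLRMerge is not shown in this PR text, so the function name, the linear ramp, and the bin-index parameters `lo` and `hi` are all assumptions; the empty-input and length checks follow the behaviour described in the commit messages.

```javascript
// Hypothetical crossover: keep `original` below bin `lo`, use `enhanced` from
// bin `hi` upward, and blend linearly in between.
function crossoverMerge (original, enhanced, lo, hi) {
  if (original.length === 0) return Float32Array.from(enhanced)
  if (original.length !== enhanced.length) {
    throw new RangeError('original and enhanced must have the same length')
  }
  const width = hi - lo
  const out = new Float32Array(enhanced.length)
  for (let k = 0; k < out.length; k++) {
    let w
    if (k < lo) w = 0
    else if (k >= hi) w = 1
    else w = (k - lo) / width // only reached when width > 0, so no 0/0
    out[k] = (1 - w) * original[k] + w * enhanced[k]
  }
  return out
}

const low = Float32Array.from([1, 1, 1, 1])
const high = Float32Array.from([3, 3, 3, 3])
console.log(Array.from(crossoverMerge(low, high, 1, 3))) // [1, 1, 2, 3]
console.log(Array.from(crossoverMerge(low, high, 2, 2))) // [1, 1, 3, 3]
```

Ordering the branches so the ramp is evaluated only strictly inside the band is one way to make the degenerate `lo === hi` case safe without a separate check.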

- Fix PCM16 overflow: clamp scaled value to [-32768, 32767] before int16_t cast to avoid UB when audio reaches +1.0f
- Add FastLRMerge input size validation: throw on mismatched lengths, return enhanced directly when original is empty
- Add outputSampleRate range validation [8000, 192000] in both C++ (parseLavaSRConfig warns and rejects) and JS (constructor throws)
- Extract shared DspConstants.hpp with inline constexpr PI, replacing duplicate anonymous-namespace definitions in 3 source files
- Extract shared DspTestHelpers.hpp with generateSine, rms, maxAbsDiff, replacing duplicate definitions across 6 test files
- Improve LavaSREnhancer CONFIG_SAMPLE_RATE comment with upstream ref URL
- Make LavaSR integration tests fail loudly via t.fail() when model downloads fail, instead of silently returning

Made-with: Cursor
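The PCM16 fix combines a symmetric scale factor of 32768 with a clamp, because +1.0 scales to 32768, one past the largest int16 value. A JavaScript sketch of that conversion is below; the function name is hypothetical, and rounding to nearest is an assumption (the C++ code may truncate instead).

```javascript
// Convert float samples in [-1, 1] to 16-bit PCM.
function floatToPcm16 (samples) {
  const out = new Int16Array(samples.length)
  for (let i = 0; i < samples.length; i++) {
    const scaled = Math.round(samples[i] * 32768)
    // Clamp before storing: +1.0 would otherwise land on 32768 and overflow.
    out[i] = Math.max(-32768, Math.min(32767, scaled))
  }
  return out
}

console.log(Array.from(floatToPcm16([1, -1, 0, 0.5]))) // [32767, -32768, 0, 16384]
```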

Replace flat enhance/denoise/model-path constructor fields with a structured enhancer config object using a type discriminator:

enhancer: { type: 'lavasr', enhance, denoise, backbonePath, specHeadPath, denoiserPath }

- Constructor: accepts enhancer object + outputSampleRate at top level
- Per-job run(): accepts enhancer with type + boolean toggles
- reload(): merges enhancer config incrementally
- Internally flattens to C++ addon's flat config map
- Future enhancers added as new type values

Made-with: Cursor
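The flattening step described in this commit might look roughly like the sketch below. The flat key names are the ones listed in the first commit message (enhancerBackbonePath, enhancerSpecHeadPath, denoiserPath); the function name and the error type are hypothetical, and the real index.js may differ.

```javascript
// Maps structured enhancer fields to the flat keys the native addon reads.
const LAVASR_KEY_MAP = {
  enhance: 'enhance',
  denoise: 'denoise',
  backbonePath: 'enhancerBackbonePath',
  specHeadPath: 'enhancerSpecHeadPath',
  denoiserPath: 'denoiserPath'
}

function flattenEnhancerConfig (enhancer, outputSampleRate) {
  const flat = {}
  if (enhancer !== undefined) {
    // The type discriminator leaves room for other enhancers later.
    if (enhancer.type !== 'lavasr') {
      throw new TypeError(`unsupported enhancer type: ${enhancer.type}`)
    }
    for (const [from, to] of Object.entries(LAVASR_KEY_MAP)) {
      // Copy only the fields that are present, so absent ones stay untouched.
      if (enhancer[from] !== undefined) flat[to] = enhancer[from]
    }
  }
  if (outputSampleRate !== undefined) flat.outputSampleRate = outputSampleRate
  return flat
}

console.log(flattenEnhancerConfig(
  { type: 'lavasr', enhance: true, backbonePath: 'models/lavasr/enhancer_backbone.onnx' },
  22050
))
```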

…erns, bump to 0.9.0

- Fix undeclared `accepted` variable in _runInternal (ReferenceError in strict mode)
- Fix outputSampleRate validation: reject all out-of-range values, not just positive ones
- Update all LavaSR integration tests and example to use 0.8.0 constructor API (files:{}, config:{}, enhancer:{} instead of flat top-level props)
- Add cancel check before postProcess to allow early abort before expensive enhancement
- Save/restore lavaSRConfig_ around per-job overrides to prevent state pollution
- Guard negative outputSampleRate values in C++ parseLavaSRConfig
- Remove BUILD_CLI option from CMakeLists.txt (out of scope for this PR)
- Bump version to 0.9.0
- Remove double blank lines in index.js

Made-with: Cursor
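The save/restore pattern for per-job overrides can be sketched in JavaScript as follows. The PR applies it to the C++ member lavaSRConfig_; `withJobOverrides` and the `state.config` shape here are hypothetical stand-ins, and the sketch covers synchronous jobs only.

```javascript
// Runs `job` with per-job overrides applied, then restores the saved config
// even if the job throws, so one job's settings never leak into the next.
function withJobOverrides (state, overrides, job) {
  const saved = { ...state.config }
  Object.assign(state.config, overrides)
  try {
    return job(state.config)
  } finally {
    state.config = saved
  }
}

const state = { config: { enhance: false, denoise: false } }
const usedEnhance = withJobOverrides(state, { enhance: true }, cfg => cfg.enhance)
console.log(usedEnhance, state.config.enhance) // true false
```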

The second cancel check was dead code — exchange(false) on the first check already cleared the flag, so the adjacent second check always saw false. Made-with: Cursor
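The reasoning in this commit rests on the semantics of an atomic exchange: it returns the previous value and stores the new one in a single step. The same behaviour can be shown with JavaScript's `Atomics.exchange`; the flag layout here is illustrative and is not the addon's cancelRequested_ member.

```javascript
// A one-element shared flag: 1 means "cancel requested", 0 means "clear".
const cancelFlag = new Int32Array(new SharedArrayBuffer(4))

Atomics.store(cancelFlag, 0, 1) // a cancel request arrives

// First check: reads 1 and resets the flag to 0 atomically.
const firstCheck = Atomics.exchange(cancelFlag, 0, 0)
// Adjacent second check: the flag was already cleared, so this reads 0.
const secondCheck = Atomics.exchange(cancelFlag, 0, 0)

console.log(firstCheck, secondCheck) // 1 0
```

Unless another cancel request lands between the two calls, the second check can never observe the request, which is why it was removed.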

- Fix all LavaSR tests: use onUpdate callback pattern instead of accessing undefined result.data.outputArray
- Move lavasrEnhancerConfig and loadReferenceAudio to test/utils/
- Switch LavaSR tests from Chatterbox to Supertonic (faster execution)
- Remove separate backward-compat test; add reportedSampleRate assertion to existing Chatterbox and Supertonic tests
- Rename section header to 'outputSampleRate tests'
- Capture reportedSampleRate in runTTS utility
- Apply clang-format to all changed C++ files
- Add error handling to multilingual test reload/unload

Made-with: Cursor

clang-format was accidentally run on the CMake file, breaking its syntax. Restore the original version. Made-with: Cursor

Revert clang-format noise in SupertonicEngine.cpp, ChatterboxEngine.cpp, OnnxInferSession.cpp/hpp — these files have no functional changes in this PR, only formatting from an overly broad clang-format run. Made-with: Cursor

LavaSR ONNX models (~58MB) are not bundled in the mobile test app and downloading them on device farm is unreliable. Skip all 5 LavaSR tests when running on iOS/Android. Made-with: Cursor

github-code-quality Bot found potential problems Mar 26, 2026

Comment thread packages/qvac-lib-infer-onnx-tts/test/integration/lavasr-enhance.test.js Fixed

sharmaraju352 force-pushed the feat/tts-lavasr-enhancer branch from e662283 to 3f13749 Compare March 26, 2026 08:54

sharmaraju352 added the verify label Mar 26, 2026

GustavoA1604 requested changes Apr 1, 2026

sharmaraju352 force-pushed the feat/tts-lavasr-enhancer branch from ec000af to 46bb092 Compare April 1, 2026 13:20

sharmaraju352 temporarily deployed to release April 1, 2026 13:20 — with GitHub Actions Inactive

sharmaraju352 had a problem deploying to release April 1, 2026 13:20 — with GitHub Actions Failure

sharmaraju352 temporarily deployed to release April 1, 2026 13:29 — with GitHub Actions Inactive

sharmaraju352 had a problem deploying to release April 1, 2026 13:29 — with GitHub Actions Failure

sharmaraju352 had a problem deploying to release April 1, 2026 15:46 — with GitHub Actions Failure

sharmaraju352 mentioned this pull request Apr 2, 2026

feat[notask|api]: add LavaSR speech enhancement support to TTS plugin #1310

Closed

4 tasks

GustavoA1604 requested changes Apr 8, 2026

Comment thread packages/qvac-lib-infer-onnx-tts/package.json

Comment thread packages/qvac-lib-infer-onnx-tts/CMakeLists.txt Outdated

Comment thread packages/qvac-lib-infer-onnx-tts/index.js Outdated

Comment thread packages/qvac-lib-infer-onnx-tts/test/integration/addon.test.js

GustavoA1604 requested changes Apr 9, 2026

GustavoA1604 requested changes Apr 13, 2026

Comment thread .github/workflows/on-pr-qvac-lib-infer-onnx-tts.yml Outdated

Comment thread packages/qvac-lib-infer-onnx-tts/addon/src/model-interface/SupertonicEngine.cpp Outdated

Raju and others added 24 commits April 15, 2026 16:22

chore(tts): fix lint, add release notes for CI

9ee7aa4

- Remove unused constants to pass standard linter - Add release-notes/v0.6.2.md for release-notes-check workflow Made-with: Cursor

fix(tts): restore sample rate constants for lint compliance

fd268d7

Made-with: Cursor

style(tts): apply clang-format to LavaSR C++ files

d336948

Made-with: Cursor

fix(tts): construct ONNXTTS directly in outputSampleRate test

379887b

Made-with: Cursor

fix(tts): use onUpdate pattern in outputSampleRate integration test

de27da1

Made-with: Cursor

chore(tts): move release notes to CHANGELOG, remove release-notes file

ae6d2ad

Made-with: Cursor

fix(tts): remove duplicate cancelRequested_ check in postProcess path

c0a37fe

The second cancel check was dead code — exchange(false) on the first check already cleared the flag, so the adjacent second check always saw false. Made-with: Cursor

chore(tts): update changelog heading from Unreleased to 0.9.0

ae3b580

Made-with: Cursor

fix(tts): restore CMakeLists.txt mangled by clang-format

f7d31f0

clang-format was accidentally run on the CMake file, breaking its syntax. Restore the original version. Made-with: Cursor

test(tts): skip LavaSR integration tests on mobile

24b0f1e

LavaSR ONNX models (~58MB) are not bundled in the mobile test app and downloading them on device farm is unreliable. Skip all 5 LavaSR tests when running on iOS/Android. Made-with: Cursor

chore: retrigger CI

06de859

Made-with: Cursor

Merge branch 'main' into feat/tts-lavasr-enhancer

3e2de14

Change version bump to 0.8.3

5414b8b

GustavoA1604 approved these changes Apr 15, 2026

moromisato approved these changes Apr 15, 2026

GustavoA1604 merged 24 commits into main from feat/tts-lavasr-enhancer

4 participants