
feat(tts): integrate LavaSR audio enhancer as opt-in post-processing #1142

Merged: GustavoA1604 merged 24 commits into main from feat/tts-lavasr-enhancer on Apr 15, 2026

@sharmaraju352 sharmaraju352 commented Mar 26, 2026

Contributor

🎯 What problem does this PR solve?

  • TTS addon outputs audio at native engine sample rates (Chatterbox 24kHz, Supertonic 44.1kHz) with no way to improve quality beyond what the engine produces
  • Users have no control over output sample rate; higher-quality applications (48kHz, cleaner output) require manual post-processing outside QVAC

📝 How does it solve it?

Integrates LavaSR, a lightweight neural speech enhancement model, as an opt-in post-processing step with three independent config flags:

  • enhance (bool) — runs Vocos-based neural bandwidth extension to 48kHz (2 ONNX sessions, ~55MB)
  • denoise (bool) — runs UL-UNAS denoiser at 16kHz before enhancement (~1.7MB, 1 ONNX session)
  • outputSampleRate (int) — sets final output rate (must be in 8000–192000 range); with enhance, upscales neurally first then resamples conventionally

Pipeline: Engine synthesize → [denoise at 16kHz] → [enhance to 48kHz] → [resample to target rate]

All flags default to off — zero performance cost and full backward compatibility.
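
The stage order and the resulting sample rate can be sketched in a few lines. This is an illustration only: `planPostProcess` is a hypothetical helper, not part of the addon, and skipping the resample step when the target already matches the current rate is an assumption.

```javascript
// Hypothetical helper that mirrors the stage order described above.
// 48000 is the enhancer output rate stated in this PR.
const ENHANCED_RATE = 48000

function planPostProcess (engineRate, opts = {}) {
  const { enhance = false, denoise = false, outputSampleRate } = opts
  const stages = []
  let rate = engineRate

  if (denoise) stages.push('denoise') // runs before enhancement
  if (enhance) {
    stages.push('enhance')
    rate = ENHANCED_RATE
  }
  // Assumption: no resampling when the target equals the current rate.
  if (outputSampleRate !== undefined && outputSampleRate !== rate) {
    stages.push('resample')
    rate = outputSampleRate
  }
  return { stages, sampleRate: rate }
}
```

With all flags off the plan is empty and the engine rate is returned unchanged, which matches the backward-compatibility claim.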

C++ implementation:

  • Pure DSP utilities (Lanczos resampler, radix-2 FFT, STFT/ISTFT, Slaney mel filterbank, spectral crossover merge) — no ML dependency, tested in isolation
  • LavaSR ONNX wrappers (denoiser with chunked overlap-add, enhancer with backbone + spec head + FastLR merge)
  • TTSModel::postProcess() integration after engine synthesize(), with lazy session loading for per-job toggles
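
For readers unfamiliar with the radix-2 FFT named among the DSP utilities, the following is a generic iterative Cooley-Tukey sketch in JavaScript. It is not the PR's C++ code; the function name `fftInPlace` and the split real/imaginary layout are assumptions made for illustration.

```javascript
// Generic in-place radix-2 FFT (illustrative, not the PR's implementation).
// re and im hold the real and imaginary parts; length must be a power of two.
function fftInPlace (re, im, inverse = false) {
  const n = re.length
  if (n !== im.length || n === 0 || (n & (n - 1)) !== 0) {
    throw new RangeError('length must be a non-zero power of two')
  }
  // Bit-reversal permutation.
  for (let i = 1, j = 0; i < n; i++) {
    let bit = n >> 1
    for (; j & bit; bit >>= 1) j ^= bit
    j ^= bit
    if (i < j) {
      const tr = re[i]; re[i] = re[j]; re[j] = tr
      const ti = im[i]; im[i] = im[j]; im[j] = ti
    }
  }
  // Butterfly passes over blocks of doubling length.
  for (let len = 2; len <= n; len <<= 1) {
    const ang = (inverse ? 2 : -2) * Math.PI / len
    const half = len >> 1
    for (let start = 0; start < n; start += len) {
      for (let k = 0; k < half; k++) {
        const wr = Math.cos(ang * k)
        const wi = Math.sin(ang * k)
        const a = start + k
        const b = a + half
        const tr = re[b] * wr - im[b] * wi
        const ti = re[b] * wi + im[b] * wr
        re[b] = re[a] - tr
        im[b] = im[a] - ti
        re[a] += tr
        im[a] += ti
      }
    }
  }
  // The inverse transform is scaled by 1/n so a round trip is the identity.
  if (inverse) {
    for (let i = 0; i < n; i++) {
      re[i] /= n
      im[i] /= n
    }
  }
}
```

A forward transform followed by an inverse transform should reproduce the input up to floating-point error, which is the kind of round-trip property the unit tests in this PR are described as checking.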

JS bridge: enhance/denoise/outputSampleRate + individual model paths (backbonePath, specHeadPath, denoiserPath) flow through AddonJs.hpp, index.js, and index.d.ts

Nice-to-haves included:

  • sampleRate exposed in JS output callback (data.sampleRate) and runtimeStats
  • Per-job enhance/denoise/outputSampleRate toggle via run() input (lazy ONNX session loading on first use)
  • Benchmark tests: enhancer ~22x realtime, denoiser ~48x realtime on Apple Silicon
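
As a reading aid for the benchmark figures, the sketch below assumes that "Nx realtime" means audio duration divided by processing time. `realtimeFactor` is a hypothetical helper, not a function from the test suite.

```javascript
// Hypothetical helper: realtime factor = seconds of audio / seconds of processing.
function realtimeFactor (numSamples, sampleRate, elapsedMs) {
  if (sampleRate <= 0) throw new RangeError('sampleRate must be positive')
  if (elapsedMs <= 0) throw new RangeError('elapsedMs must be positive')
  const audioSeconds = numSamples / sampleRate
  return audioSeconds / (elapsedMs / 1000)
}
```

Under that definition, processing 10 s of 48 kHz audio in 500 ms gives `realtimeFactor(480000, 48000, 500)`, which evaluates to 20.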

🧪 How was it tested?

C++ (187 total, 183 pass, 4 skip for missing Supertonic models):

  • 43 new unit tests — DSP round-trip validation (FFT, STFT/ISTFT, resampler, mel filterbank, spectral merge), LavaSR wrapper lifecycle, TTSModel config parsing + resampling + per-job toggle + backward compat
  • 6 new integration tests with real ONNX models — enhancer/denoiser individually, full Chatterbox→LavaSR pipeline, denoise+enhance combined, enhance+outputSampleRate
  • 3 benchmark tests measuring latency at 1s/3s/5s/10s durations
  • Zero regressions on all pre-existing tests

JS:

  • outputSampleRate resampling test added to addon.test.js (runs in CI without LavaSR models)
  • 6 LavaSR integration tests in addon.test.js (Chatterbox+enhance, denoise+enhance, outputSampleRate, enhance+downsample, Supertonic+enhance, backward compat) — all assert sampleRate via callback
  • Locally verified: outputSampleRate=16000 produces 16kHz output with sampleRate correctly reported in callback
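
The outputSampleRate path relies on a Lanczos resampler implemented in C++. The sketch below shows the general technique in JavaScript so the expected output length and behaviour are easier to reason about. The window size `a = 3`, the weight normalisation, and the edge handling are assumptions, not details taken from the PR.

```javascript
// Lanczos window: sinc(x) * sinc(x / a) for |x| < a, else 0.
function lanczosKernel (x, a) {
  if (x === 0) return 1
  if (Math.abs(x) >= a) return 0
  const px = Math.PI * x
  return a * Math.sin(px) * Math.sin(px / a) / (px * px)
}

// Illustrative resampler; input is a Float32Array of mono samples.
function resampleLanczos (input, srcRate, dstRate, a = 3) {
  const ratio = dstRate / srcRate
  const outLen = Math.round(input.length * ratio)
  const out = new Float32Array(outLen)
  // When downsampling, stretch the kernel so it also low-pass filters.
  const scale = Math.min(1, ratio)
  const radius = a / scale
  for (let i = 0; i < outLen; i++) {
    const center = i / ratio // position of output sample i on the input grid
    const lo = Math.ceil(center - radius)
    const hi = Math.floor(center + radius)
    let acc = 0
    let norm = 0
    for (let j = lo; j <= hi; j++) {
      if (j < 0 || j >= input.length) continue // samples outside the input are skipped
      const w = lanczosKernel((j - center) * scale, a)
      acc += input[j] * w
      norm += w
    }
    // Normalising by the weight sum keeps constant signals constant.
    out[i] = norm !== 0 ? acc / norm : 0
  }
  return out
}
```

For example, 2400 input samples at 24 kHz resampled to 16 kHz yield 1600 output samples, the same 3:2 ratio as the 16 kHz case verified locally above.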

Manual verification:

  • Ran comparison example producing 3 WAV files (raw 24kHz, enhanced 48kHz, denoised+enhanced 48kHz) — enhanced output is audibly clearer

🔌 API Changes

const tts = new ONNXTTS({
  files: {
    tokenizerPath: 'models/chatterbox/tokenizer.json',
    speechEncoderPath: 'models/chatterbox/speech_encoder.onnx',
    embedTokensPath: 'models/chatterbox/embed_tokens.onnx',
    conditionalDecoderPath: 'models/chatterbox/conditional_decoder.onnx',
    languageModelPath: 'models/chatterbox/language_model.onnx',
  },
  referenceAudio,
  config: {
    language: 'en',
    outputSampleRate: 22050, // optional: resample after enhancement (8000–192000)
  },
  // LavaSR enhancement (nested enhancer config)
  enhancer: {
    type: 'lavasr',
    enhance: true,
    denoise: true,
    backbonePath: 'models/lavasr/enhancer_backbone.onnx',
    specHeadPath: 'models/lavasr/enhancer_spec_head.onnx',
    denoiserPath: 'models/lavasr/denoiser_core_legacy_fixed63.onnx',
  },
})

// Per-job toggle (lazy loads ONNX sessions on first use)
const response = await tts.run({
  type: 'text',
  input: 'Hello world',
  enhancer: { type: 'lavasr', enhance: true, denoise: false },
  outputSampleRate: 22050, // per-job override
})

// Output callback now includes sampleRate
response.onUpdate(data => {
  console.log(data.outputArray)  // Int16Array
  console.log(data.sampleRate)   // 48000 (if enhanced)
})
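
Because each update now carries `sampleRate`, a consumer can assemble the chunks and derive the duration without knowing which engine or enhancer ran. The collector below is a hypothetical sketch; only the `{ outputArray, sampleRate }` shape comes from the example above.

```javascript
// Hypothetical collector for onUpdate payloads of the form
// { outputArray: Int16Array, sampleRate: number }.
function createPcmCollector () {
  const chunks = []
  let sampleRate = null
  let totalSamples = 0
  return {
    onUpdate (data) {
      if (!data || !data.outputArray) return // ignore updates without audio
      chunks.push(Int16Array.from(data.outputArray)) // copy in case the buffer is reused
      totalSamples += data.outputArray.length
      if (typeof data.sampleRate === 'number') sampleRate = data.sampleRate
    },
    finish () {
      const pcm = new Int16Array(totalSamples)
      let offset = 0
      for (const chunk of chunks) {
        pcm.set(chunk, offset)
        offset += chunk.length
      }
      const durationSeconds = sampleRate ? totalSamples / sampleRate : null
      return { pcm, sampleRate, durationSeconds }
    }
  }
}
```

It could be wired up as `response.onUpdate(collector.onUpdate)`, since the methods do not depend on `this`.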

New TypeScript interfaces:

interface LavaSREnhancerConfig {
  type: 'lavasr'
  enhance?: boolean
  denoise?: boolean
  backbonePath?: string
  specHeadPath?: string
  denoiserPath?: string
}

type EnhancerConfig = LavaSREnhancerConfig

// Added to ONNXTTSOptions:
interface ONNXTTSOptions {
  enhancer?: EnhancerConfig
  // ...existing fields
}

// Added to ONNXTTSRuntimeConfig:
interface ONNXTTSRuntimeConfig {
  outputSampleRate?: number  // 8000–192000
  // ...existing fields
}

// Per-job run input:
interface TTSRunInput {
  enhancer?: { type: 'lavasr'; enhance?: boolean; denoise?: boolean }
  outputSampleRate?: number
  // ...existing fields
}
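
The 8000–192000 constraint on `outputSampleRate` can be expressed as a small guard. This is a sketch only: the PR states that the JS constructor throws on out-of-range values, but the helper name, the error type, and the message here are assumptions.

```javascript
const MIN_OUTPUT_RATE = 8000
const MAX_OUTPUT_RATE = 192000

// Hypothetical validator; returns the value unchanged when it is acceptable.
function validateOutputSampleRate (value) {
  if (value === undefined) return undefined // option omitted: keep the native rate
  if (!Number.isInteger(value) || value < MIN_OUTPUT_RATE || value > MAX_OUTPUT_RATE) {
    throw new RangeError(
      `outputSampleRate must be an integer in [${MIN_OUTPUT_RATE}, ${MAX_OUTPUT_RATE}], got ${value}`
    )
  }
  return value
}
```

Zero and negative values fail the same range check as any other out-of-range value, so no separate branch is needed for them.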

🔄 CI Run

Latest full CI pipeline: https://github.com/tetherto/qvac/actions/runs/24237672067

Comment thread packages/qvac-lib-infer-onnx-tts/test/integration/addon.test.js
Comment thread packages/qvac-lib-infer-onnx-tts/index.js
Comment thread packages/qvac-lib-infer-onnx-tts/addon/src/model-interface/TTSModel.cpp Outdated
Comment thread packages/qvac-lib-infer-onnx-tts/examples/example-enhanced-audio.js
Comment thread packages/qvac-lib-infer-onnx-tts/addon/src/model-interface/dsp/StftProcessor.cpp Outdated
Comment thread packages/qvac-lib-infer-onnx-tts/addon/src/model-interface/LavaSRDenoiser.cpp Outdated
Comment thread packages/qvac-lib-infer-onnx-tts/addon/src/model-interface/LavaSRDenoiser.cpp Outdated
Comment thread packages/qvac-lib-infer-onnx-tts/addon/src/model-interface/dsp/MelFilterbank.cpp Outdated
@sharmaraju352 sharmaraju352 force-pushed the feat/tts-lavasr-enhancer branch from ec000af to 46bb092 Compare April 1, 2026 13:20

@GustavoA1604 GustavoA1604 left a comment

Contributor

  • JS linting is needed
  • Also, I think the automatic on-pr run for this PR is broken; please run it manually and share the results

Comment thread packages/qvac-lib-infer-onnx-tts/package.json
Comment thread packages/qvac-lib-infer-onnx-tts/CMakeLists.txt Outdated
Comment thread packages/qvac-lib-infer-onnx-tts/index.js Outdated
Comment thread packages/qvac-lib-infer-onnx-tts/test/integration/addon.test.js
Comment thread packages/qvac-lib-infer-onnx-tts/test/integration/addon.test.js Outdated
Comment thread packages/qvac-lib-infer-onnx-tts/test/integration/addon.test.js Outdated
Comment thread packages/qvac-lib-infer-onnx-tts/test/integration/addon.test.js
Comment thread packages/qvac-lib-infer-onnx-tts/test/integration/addon.test.js Outdated
Comment thread .github/workflows/on-pr-qvac-lib-infer-onnx-tts.yml Outdated
Comment thread packages/qvac-lib-infer-onnx-tts/addon/src/model-interface/SupertonicEngine.cpp Outdated
Raju and others added 24 commits April 15, 2026 16:22
Add neural speech enhancement (LavaSR) to the TTS package with three
independent, opt-in config flags: `enhance` (Vocos BWE to 48kHz),
`denoise` (UL-UNAS denoiser), and `outputSampleRate` (arbitrary target
rate with smart algorithm selection). All flags default to off so
backward compatibility is preserved; no new dependencies introduced.

C++ implementation:
- DSP utilities: Lanczos resampler, radix-2 FFT, windowed STFT/ISTFT,
  Slaney mel filterbank, spectral crossover merge (pure C++, no ML dep)
- LavaSRDenoiser: chunked STFT-domain ONNX inference with overlap-add
- LavaSREnhancer: backbone + spec head ONNX sessions with DSP pipeline
- TTSModel::postProcess() pipeline: denoise -> enhance -> resample

JS bridge:
- AddonJs.hpp: 6 new config keys (enhance, denoise, outputSampleRate,
  enhancerBackbonePath, enhancerSpecHeadPath, denoiserPath)
- index.js: constructor params, _getLavaSRParams(), download integration
- index.d.ts: LavaSROptions interface, extended type declarations

Testing:
- 40 new C++ unit tests (DSP, LavaSR wrappers, TTSModel integration)
- 3 C++ integration tests with real ONNX models (enhancer, denoiser,
  full Chatterbox+enhance pipeline verified producing 48kHz output)
- JS integration test scaffolding for 6 enhancement scenarios
- Model download helper for LavaSR from GitHub releases

Made-with: Cursor
- Expose sampleRate in JS output callback: outputArray now includes a
  sampleRate field so consumers know the actual output rate. Also added
  to runtimeStats (JobEnded event). Uses shared atomic<int> between
  TTSModel and JsAudioOutputHandler.
- Per-job enhance/denoise toggle: _runInternal now passes enhance,
  denoise, outputSampleRate from the job input to the native config,
  enabling per-utterance control. LavaSR sessions are lazily loaded on
  first use when toggled on per-job.
- Benchmark tests: enhancer, denoiser, and resampler latency across
  1s/3s/5s/10s audio durations. Results: enhancer ~22x realtime,
  denoiser ~48x realtime on Apple Silicon.

Made-with: Cursor
- Remove unused constants to pass standard linter
- Add release-notes/v0.6.2.md for release-notes-check workflow

Made-with: Cursor
- Add outputSampleRate-only + backward-compat tests to addon.test.js
  (runs in CI without LavaSR models, validates resampling path)
- Add per-job outputSampleRate toggle C++ test via AnyInput config
- Add denoise+enhance combined pipeline C++ integration test
- Add enhance+outputSampleRate combined path C++ integration test

Made-with: Cursor
Follow existing codebase pattern where each model path is passed
individually (enhancerBackbonePath, enhancerSpecHeadPath, denoiserPath)
rather than via a directory. Consistent with how Chatterbox and
Supertonic engines accept their model paths.

Made-with: Cursor
- C++: guard padReflect against N<=1 input, hoist frame/chunk/shape
  vectors out of hot loops in StftProcessor and LavaSRDenoiser,
  cache StftProcessor in MelFilterbank instead of recreating per call
- C++: parseLavaSRConfig only overwrites fields present in configMap
  so reload events don't clear existing paths with empty strings
- C++: add cancel check before postProcess step
- JS: forward enhancerBackbonePath/specHeadPath/denoiserPath in
  per-job config so run()-level overrides reach the C++ layer
- Tests: move LavaSR tests from separate file into addon.test.js,
  switch outputSampleRate test to Supertonic (faster), verify
  reported sampleRate via callback assertions instead of only
  checking output is non-empty
- Rename example-lavasr-compare.js to example-enhanced-audio.js

Made-with: Cursor
…on, JS path handling

Address remaining PR review findings for LavaSR integration:

- Fix FastLRMerge division-by-zero when transition band is a single bin
- Add input size validation in FastLRMerge for mismatched enhanced/original vectors
- Fix PCM16 conversion asymmetry: use 32768.0f for symmetric [-1, 1) mapping
- Hoist istft() frame allocation out of hot loop to avoid per-frame heap alloc
- Send LavaSR model paths to C++ unconditionally (not gated behind boolean flags)
- Update reload() to persist enhancer/denoiser model path changes
- Log warning instead of silently swallowing stoi parse failures for outputSampleRate
- Add clarifying comments on ORT_DISABLE_ALL, magnitude spectrogram, and cutoff logic
- Add trailing newline to TTSModel.cpp

Made-with: Cursor
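
The FastLRMerge fixes in the commit messages here concern a crossover between the original and enhanced spectra. The actual merge logic is not shown on this page, so the following is only a sketch of a linear crossover with the same guards: a single-bin transition band, mismatched lengths, and an empty original. The function name, the bin-index parameters, and the 0.5 weight for the single-bin case are all assumptions.

```javascript
// Illustrative linear crossover: original below startBin, enhanced above endBin,
// linear ramp in between. Not the PR's FastLRMerge implementation.
function crossoverMerge (original, enhanced, startBin, endBin) {
  if (original.length === 0) return Float32Array.from(enhanced)
  if (original.length !== enhanced.length) {
    throw new RangeError('original and enhanced must have the same length')
  }
  const out = new Float32Array(enhanced.length)
  const span = endBin - startBin // 0 when the transition band is a single bin
  for (let k = 0; k < out.length; k++) {
    let w // weight given to the enhanced signal
    if (k < startBin) w = 0
    else if (k > endBin) w = 1
    else w = span > 0 ? (k - startBin) / span : 0.5 // guard against 0 / 0
    out[k] = (1 - w) * original[k] + w * enhanced[k]
  }
  return out
}
```

Without the `span > 0` guard, a single-bin band would compute `0 / 0` and write `NaN` into the merged spectrum.
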
- Fix PCM16 overflow: clamp scaled value to [-32768, 32767] before
  int16_t cast to avoid UB when audio reaches +1.0f
- Add FastLRMerge input size validation: throw on mismatched lengths,
  return enhanced directly when original is empty
- Add outputSampleRate range validation [8000, 192000] in both C++
  (parseLavaSRConfig warns and rejects) and JS (constructor throws)
- Extract shared DspConstants.hpp with inline constexpr PI, replacing
  duplicate anonymous-namespace definitions in 3 source files
- Extract shared DspTestHelpers.hpp with generateSine, rms, maxAbsDiff,
  replacing duplicate definitions across 6 test files
- Improve LavaSREnhancer CONFIG_SAMPLE_RATE comment with upstream ref URL
- Make LavaSR integration tests fail loudly via t.fail() when model
  downloads fail, instead of silently returning

Made-with: Cursor
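
The PCM16 conversion described in these commits (scale by 32768, then clamp to [-32768, 32767]) looks roughly like this in JavaScript. Rounding to nearest is an assumption; the C++ code may truncate instead.

```javascript
// Convert float samples in [-1, 1] to 16-bit PCM.
function floatToPcm16 (samples) {
  const out = new Int16Array(samples.length)
  for (let i = 0; i < samples.length; i++) {
    const scaled = Math.round(samples[i] * 32768) // assumption: round to nearest
    // Clamp before storing: +1.0 scales to 32768, which does not fit in int16.
    out[i] = Math.max(-32768, Math.min(32767, scaled))
  }
  return out
}
```

In JavaScript an unclamped 32768 stored into an `Int16Array` wraps to -32768 rather than triggering undefined behaviour, but the audible result is the same kind of click, so the clamp is still needed.
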
Replace flat enhance/denoise/model-path constructor fields with a
structured enhancer config object using a type discriminator:

  enhancer: { type: 'lavasr', enhance, denoise, backbonePath, specHeadPath, denoiserPath }

- Constructor: accepts enhancer object + outputSampleRate at top level
- Per-job run(): accepts enhancer with type + boolean toggles
- reload(): merges enhancer config incrementally
- Internally flattens to C++ addon's flat config map
- Future enhancers added as new type values

Made-with: Cursor
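
The flattening step from the nested `enhancer` object to the flat config keys can be sketched as follows. The flat key names are the ones listed for AddonJs.hpp in the first commit; the helper name, the error type, and the choice to omit absent fields are assumptions.

```javascript
// Hypothetical flattening of { type: 'lavasr', ... } into the flat key set.
const LAVASR_KEY_MAP = {
  enhance: 'enhance',
  denoise: 'denoise',
  backbonePath: 'enhancerBackbonePath',
  specHeadPath: 'enhancerSpecHeadPath',
  denoiserPath: 'denoiserPath'
}

function flattenEnhancerConfig (enhancer) {
  if (enhancer === undefined) return {}
  if (enhancer.type !== 'lavasr') {
    throw new TypeError(`unsupported enhancer type: ${enhancer.type}`)
  }
  const flat = {}
  for (const [from, to] of Object.entries(LAVASR_KEY_MAP)) {
    // Only forward fields that were actually provided, so a partial
    // per-job or reload config does not overwrite existing values.
    if (enhancer[from] !== undefined) flat[to] = enhancer[from]
  }
  return flat
}
```

A new enhancer would add its own `type` value and key map without changing the flat interface seen by the native layer.
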
…erns, bump to 0.9.0

- Fix undeclared `accepted` variable in _runInternal (ReferenceError in strict mode)
- Fix outputSampleRate validation: reject all out-of-range values, not just positive ones
- Update all LavaSR integration tests and example to use 0.8.0 constructor API
  (files:{}, config:{}, enhancer:{} instead of flat top-level props)
- Add cancel check before postProcess to allow early abort before expensive enhancement
- Save/restore lavaSRConfig_ around per-job overrides to prevent state pollution
- Guard negative outputSampleRate values in C++ parseLavaSRConfig
- Remove BUILD_CLI option from CMakeLists.txt (out of scope for this PR)
- Bump version to 0.9.0
- Remove double blank lines in index.js

Made-with: Cursor
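
The save/restore of `lavaSRConfig_` around per-job overrides happens in C++. A JavaScript analogue of the same pattern is shown below to illustrate why it prevents one job's toggles from leaking into the next. The `state` object and `withJobOverrides` are hypothetical names.

```javascript
// Apply per-job overrides for the duration of one job, then restore the
// previous configuration even if the job throws.
function withJobOverrides (state, overrides, job) {
  const saved = { ...state.config }
  state.config = { ...state.config, ...overrides }
  try {
    return job(state.config)
  } finally {
    state.config = saved
  }
}
```

The `finally` block plays the role of the restore step, so a failed or cancelled job leaves the model-level configuration as it was.
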
The second cancel check was dead code — exchange(false) on the first
check already cleared the flag, so the adjacent second check always
saw false.

Made-with: Cursor
- Fix all LavaSR tests: use onUpdate callback pattern instead of
  accessing undefined result.data.outputArray
- Move lavasrEnhancerConfig and loadReferenceAudio to test/utils/
- Switch LavaSR tests from Chatterbox to Supertonic (faster execution)
- Remove separate backward-compat test; add reportedSampleRate
  assertion to existing Chatterbox and Supertonic tests
- Rename section header to 'outputSampleRate tests'
- Capture reportedSampleRate in runTTS utility
- Apply clang-format to all changed C++ files
- Add error handling to multilingual test reload/unload

Made-with: Cursor
clang-format was accidentally run on the CMake file, breaking its
syntax. Restore the original version.

Made-with: Cursor
Revert clang-format noise in SupertonicEngine.cpp, ChatterboxEngine.cpp,
OnnxInferSession.cpp/hpp — these files have no functional changes in
this PR, only formatting from an overly broad clang-format run.

Made-with: Cursor
LavaSR ONNX models (~58MB) are not bundled in the mobile test app
and downloading them on device farm is unreliable. Skip all 5 LavaSR
tests when running on iOS/Android.

Made-with: Cursor
Made-with: Cursor