feat(tts): integrate LavaSR audio enhancer as opt-in post-processing #1142
Merged
Review history:
- Force-pushed from e662283 to 3f13749
- GustavoA1604 requested changes (Apr 1, 2026)
- Force-pushed from ec000af to 46bb092
- GustavoA1604 requested changes (Apr 8, 2026)
- GustavoA1604 commented:
  - Need JS linting.
  - The automatic on-PR run for this PR seems to be broken; please run it manually and share the results.
- GustavoA1604 requested changes (Apr 9, 2026)
- GustavoA1604 requested changes (Apr 13, 2026)
Add neural speech enhancement (LavaSR) to the TTS package with three independent, opt-in config flags: `enhance` (Vocos BWE to 48kHz), `denoise` (UL-UNAS denoiser), and `outputSampleRate` (arbitrary target rate with smart algorithm selection). All flags default to off, so backward compatibility is preserved, and no new dependencies are introduced.

C++ implementation:
- DSP utilities: Lanczos resampler, radix-2 FFT, windowed STFT/ISTFT, Slaney mel filterbank, spectral crossover merge (pure C++, no ML dependency)
- LavaSRDenoiser: chunked STFT-domain ONNX inference with overlap-add
- LavaSREnhancer: backbone + spec head ONNX sessions with DSP pipeline
- TTSModel::postProcess() pipeline: denoise -> enhance -> resample

JS bridge:
- AddonJs.hpp: 6 new config keys (enhance, denoise, outputSampleRate, enhancerBackbonePath, enhancerSpecHeadPath, denoiserPath)
- index.js: constructor params, _getLavaSRParams(), download integration
- index.d.ts: LavaSROptions interface, extended type declarations

Testing:
- 40 new C++ unit tests (DSP, LavaSR wrappers, TTSModel integration)
- 3 C++ integration tests with real ONNX models (enhancer, denoiser, and the full Chatterbox+enhance pipeline, verified to produce 48kHz output)
- JS integration test scaffolding for 6 enhancement scenarios
- Model download helper for LavaSR from GitHub releases

Made-with: Cursor
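The commit lists a Lanczos resampler among the pure-C++ DSP utilities. The sketch below shows one common way to build such a resampler. It assumes a kernel half-width of `a = 3` and handles the signal edges by renormalizing the taps. `lanczosKernel` and `lanczosResample` are illustrative names, not the PR's actual functions.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Lanczos kernel L(x) = sinc(x) * sinc(x / a) for |x| < a, and 0 otherwise.
inline double lanczosKernel(double x, int a) {
    const double pi = 3.14159265358979323846;
    if (x == 0.0) return 1.0;
    if (std::fabs(x) >= a) return 0.0;
    const double px = pi * x;
    return a * std::sin(px) * std::sin(px / a) / (px * px);
}

// Resample `in` from srcRate to dstRate (Hz) with a Lanczos windowed-sinc filter.
// When downsampling, the kernel is stretched by srcRate / dstRate so it also acts as
// an anti-aliasing low-pass filter at the new Nyquist frequency.
std::vector<float> lanczosResample(const std::vector<float>& in, int srcRate,
                                   int dstRate, int a = 3) {
    assert(srcRate > 0 && dstRate > 0 && a > 0);
    if (in.empty() || srcRate == dstRate) return in;
    const double ratio = static_cast<double>(dstRate) / srcRate;  // output samples per input sample
    const double scale = ratio < 1.0 ? ratio : 1.0;              // cutoff relative to input Nyquist
    const double support = a / scale;                            // kernel half-width, in input samples
    const auto outLen = static_cast<std::size_t>(std::llround(in.size() * ratio));
    const long inLen = static_cast<long>(in.size());
    std::vector<float> out(outLen);
    for (std::size_t n = 0; n < outLen; ++n) {
        const double center = n / ratio;  // output sample n, in input-sample units
        const long lo = static_cast<long>(std::ceil(center - support));
        const long hi = static_cast<long>(std::floor(center + support));
        double acc = 0.0, wsum = 0.0;
        for (long k = lo; k <= hi; ++k) {
            if (k < 0 || k >= inLen) continue;  // skip taps outside the signal
            const double w = lanczosKernel((center - k) * scale, a);
            acc += w * in[static_cast<std::size_t>(k)];
            wsum += w;
        }
        // Dividing by the tap sum keeps unity DC gain, including at the edges.
        out[n] = static_cast<float>(wsum != 0.0 ? acc / wsum : 0.0);
    }
    return out;
}
```

The `scale` factor stretches the kernel when downsampling. That stretching lets the same routine serve both upsampling and reducing 48kHz enhancer output to a lower `outputSampleRate` without aliasing.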
- Expose sampleRate in the JS output callback: outputArray now includes a sampleRate field so consumers know the actual output rate. It is also added to runtimeStats (JobEnded event). Uses a shared atomic<int> between TTSModel and JsAudioOutputHandler.
- Per-job enhance/denoise toggle: _runInternal now passes enhance, denoise, and outputSampleRate from the job input to the native config, enabling per-utterance control. LavaSR sessions are lazily loaded on first use when toggled on per job.
- Benchmark tests: enhancer, denoiser, and resampler latency across 1s/3s/5s/10s audio durations. Results: enhancer ~22x realtime, denoiser ~48x realtime on Apple Silicon.

Made-with: Cursor
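The commit notes that LavaSR sessions are lazily loaded on first use when toggled on per job. The following is a minimal sketch of that pattern. It uses a hypothetical `Session` type in place of an ONNX Runtime session and a caller-supplied loader. `LazySession` is an illustrative name, not a class from the PR.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <mutex>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-in for an ONNX Runtime session (real code would own an Ort::Session).
struct Session {
    std::string modelPath;
};

// Defers loading until the first job that actually enables the stage, so jobs that never
// toggle enhance/denoise on pay no model-loading cost.
class LazySession {
public:
    using Loader = std::function<std::unique_ptr<Session>(const std::string&)>;

    LazySession(std::string path, Loader loader)
        : path_(std::move(path)), loader_(std::move(loader)) {}

    // Returns the session, loading it on the first call; the mutex makes the load thread-safe.
    Session& get() {
        std::lock_guard<std::mutex> lock(mutex_);
        if (!session_) {
            session_ = loader_(path_);
            if (!session_) throw std::runtime_error("failed to load " + path_);
        }
        return *session_;
    }

    bool loaded() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return session_ != nullptr;
    }

private:
    std::string path_;
    Loader loader_;
    mutable std::mutex mutex_;
    std::unique_ptr<Session> session_;
};
```

The mutex is an assumption about threading. It only matters if two jobs can trigger the first load concurrently.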
- Remove unused constants to pass the standard linter
- Add release-notes/v0.6.2.md for the release-notes-check workflow

Made-with: Cursor
- Add outputSampleRate-only and backward-compat tests to addon.test.js (runs in CI without LavaSR models, validates the resampling path)
- Add per-job outputSampleRate toggle C++ test via AnyInput config
- Add denoise+enhance combined pipeline C++ integration test
- Add enhance+outputSampleRate combined path C++ integration test

Made-with: Cursor
Follow the existing codebase pattern where each model path is passed individually (enhancerBackbonePath, enhancerSpecHeadPath, denoiserPath) rather than via a directory, consistent with how the Chatterbox and Supertonic engines accept their model paths.

Made-with: Cursor
- C++: guard padReflect against N<=1 input; hoist frame/chunk/shape vectors out of hot loops in StftProcessor and LavaSRDenoiser; cache StftProcessor in MelFilterbank instead of recreating it per call
- C++: parseLavaSRConfig only overwrites fields present in configMap, so reload events don't clear existing paths with empty strings
- C++: add cancel check before the postProcess step
- JS: forward enhancerBackbonePath/specHeadPath/denoiserPath in per-job config so run()-level overrides reach the C++ layer
- Tests: move LavaSR tests from a separate file into addon.test.js, switch the outputSampleRate test to Supertonic (faster), and verify the reported sampleRate via callback assertions instead of only checking that output is non-empty
- Rename example-lavasr-compare.js to example-enhanced-audio.js

Made-with: Cursor
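Reflect padding mirrors the signal around its edge samples without repeating them. This is the usual way to pad before a centered STFT. Two input sizes need special handling. With N == 1, the reflection period 2(N - 1) is zero, so the index arithmetic would divide by zero. With N == 0, there is no sample to read at all.

The sketch below assumes the guard falls back to replicating the single sample (or zeros for empty input). The commit does not say what the real guard returns, and this `padReflect` is illustrative.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Reflect-pad `x` by `pad` samples on each side, mirroring around the edge samples
// without repeating them: [a b c d] with pad = 2 -> [c b a b c d c b].
// Assumption: for N <= 1 there is nothing to mirror, so the single sample (or zeros for
// empty input) is replicated instead of computing indices modulo 2(N - 1) = 0.
std::vector<float> padReflect(const std::vector<float>& x, std::size_t pad) {
    const std::size_t n = x.size();
    std::vector<float> out(n + 2 * pad, 0.0f);
    if (n <= 1) {
        if (n == 1) std::fill(out.begin(), out.end(), x[0]);
        return out;
    }
    const long period = 2 * static_cast<long>(n) - 2;  // reflection repeats every 2(N - 1) samples
    for (std::size_t i = 0; i < out.size(); ++i) {
        long j = static_cast<long>(i) - static_cast<long>(pad);  // unfolded index into x
        j %= period;
        if (j < 0) j += period;                         // fold into [0, period)
        if (j >= static_cast<long>(n)) j = period - j;  // mirror the second half back
        out[i] = x[static_cast<std::size_t>(j)];
    }
    return out;
}
```

Folding the index modulo 2(N - 1) also handles pads longer than the signal. A single mirror step would run off the far end in that case.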
…on, JS path handling

Address remaining PR review findings for LavaSR integration:
- Fix FastLRMerge division-by-zero when the transition band is a single bin
- Add input size validation in FastLRMerge for mismatched enhanced/original vectors
- Fix PCM16 conversion asymmetry: use 32768.0f for a symmetric [-1, 1) mapping
- Hoist istft() frame allocation out of the hot loop to avoid a heap allocation per frame
- Send LavaSR model paths to C++ unconditionally (not gated behind boolean flags)
- Update reload() to persist enhancer/denoiser model path changes
- Log a warning instead of silently swallowing stoi parse failures for outputSampleRate
- Add clarifying comments on ORT_DISABLE_ALL, the magnitude spectrogram, and the cutoff logic
- Add trailing newline to TTSModel.cpp

Made-with: Cursor
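The division-by-zero fix is easiest to see in a per-bin crossfade. This sketch assumes the merge does three things: it keeps the original spectrum at or below a crossover band, uses the enhanced spectrum above it, and blends linearly in between. The real FastLRMerge may work differently. `crossoverMerge`, `Spectrum`, `binLo`, and `binHi` are hypothetical names.

The sketch also includes the size checks described here and in the follow-up commit. It throws on mismatched lengths and returns the enhanced spectrum when the original is empty.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <stdexcept>
#include <vector>

using Spectrum = std::vector<std::complex<float>>;

// Hypothetical per-bin crossover: original at or below binLo, enhanced at or above binHi,
// linear crossfade strictly in between.
Spectrum crossoverMerge(const Spectrum& enhanced, const Spectrum& original,
                        std::size_t binLo, std::size_t binHi) {
    if (original.empty()) return enhanced;  // nothing to preserve from the original
    if (enhanced.size() != original.size())
        throw std::invalid_argument("crossoverMerge: spectrum sizes differ");
    Spectrum out(enhanced.size());
    for (std::size_t k = 0; k < out.size(); ++k) {
        float w;  // weight of the enhanced bin
        if (k <= binLo) {
            w = 0.0f;
        } else if (k >= binHi) {
            w = 1.0f;
        } else {
            // Only reached when binLo < k < binHi, so binHi - binLo >= 2 and the
            // denominator is never zero, even for a single-bin transition band.
            w = static_cast<float>(k - binLo) / static_cast<float>(binHi - binLo);
        }
        out[k] = (1.0f - w) * original[k] + w * enhanced[k];
    }
    return out;
}
```

Ordering the branches so the interpolation runs only strictly inside the band removes the single-bin case without a separate special case.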
- Fix PCM16 overflow: clamp the scaled value to [-32768, 32767] before the int16_t cast to avoid UB when audio reaches +1.0f
- Add FastLRMerge input size validation: throw on mismatched lengths, return enhanced directly when original is empty
- Add outputSampleRate range validation [8000, 192000] in both C++ (parseLavaSRConfig warns and rejects) and JS (constructor throws)
- Extract shared DspConstants.hpp with inline constexpr PI, replacing duplicate anonymous-namespace definitions in 3 source files
- Extract shared DspTestHelpers.hpp with generateSine, rms, maxAbsDiff, replacing duplicate definitions across 6 test files
- Improve the LavaSREnhancer CONFIG_SAMPLE_RATE comment with an upstream reference URL
- Make LavaSR integration tests fail loudly via t.fail() when model downloads fail, instead of silently returning

Made-with: Cursor
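The PCM16 fix comes down to where the clamp sits relative to the cast. The sketch below makes two assumptions: round-to-nearest, and NaN samples mapped to silence. The commit only specifies the 32768.0f scale and the clamp to [-32768, 32767]. The function names are illustrative.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// float [-1, 1] -> int16 PCM. Scaling by 32768 maps -1.0f exactly to -32768, but +1.0f
// becomes 32768, which does not fit in int16_t. Converting an out-of-range float to an
// integer type is undefined behavior, so the clamp must happen before the cast.
std::vector<std::int16_t> floatToPcm16(const std::vector<float>& in) {
    std::vector<std::int16_t> out;
    out.reserve(in.size());
    for (float s : in) {
        if (std::isnan(s)) s = 0.0f;                       // NaN would pass through clamp (assumption: treat as silence)
        float scaled = std::round(s * 32768.0f);           // round-to-nearest (assumption)
        scaled = std::clamp(scaled, -32768.0f, 32767.0f);  // keep within int16_t range
        out.push_back(static_cast<std::int16_t>(scaled));
    }
    return out;
}

// int16 PCM -> float in [-1, 1), using the same 32768 scale for a symmetric mapping.
std::vector<float> pcm16ToFloat(const std::vector<std::int16_t>& in) {
    std::vector<float> out;
    out.reserve(in.size());
    for (std::int16_t s : in) out.push_back(static_cast<float>(s) / 32768.0f);
    return out;
}
```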
Replace flat enhance/denoise/model-path constructor fields with a
structured enhancer config object using a type discriminator:
enhancer: { type: 'lavasr', enhance, denoise, backbonePath, specHeadPath, denoiserPath }
- Constructor: accepts enhancer object + outputSampleRate at top level
- Per-job run(): accepts enhancer with type + boolean toggles
- reload(): merges enhancer config incrementally
- Internally flattens to the C++ addon's flat config map
- Future enhancers can be added as new type values
Made-with: Cursor
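On the native side, the structured `enhancer` object ends up as flat key/value pairs. This sketch follows the parser behavior described in the earlier commits:
- It applies only the keys present, so a reload does not blank out existing paths.
- It warns on unparsable input.
- It rejects rates outside [8000, 192000].

Several details are assumptions for illustration, not the addon's real types: `LavaSRConfig`, the string-valued `std::map`, the `"true"` encoding for booleans, and 0 meaning "not set".

```cpp
#include <cassert>
#include <iostream>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical mirror of the six flat LavaSR keys.
struct LavaSRConfig {
    bool enhance = false;
    bool denoise = false;
    int outputSampleRate = 0;  // 0 = not set (assumption)
    std::string enhancerBackbonePath;
    std::string enhancerSpecHeadPath;
    std::string denoiserPath;
};

// Applies only keys present in configMap, so a reload that omits a path keeps the old value.
void parseLavaSRConfig(const std::map<std::string, std::string>& configMap, LavaSRConfig& cfg) {
    auto has = [&](const char* key) { return configMap.count(key) != 0; };
    if (has("enhance")) cfg.enhance = configMap.at("enhance") == "true";
    if (has("denoise")) cfg.denoise = configMap.at("denoise") == "true";
    if (has("enhancerBackbonePath")) cfg.enhancerBackbonePath = configMap.at("enhancerBackbonePath");
    if (has("enhancerSpecHeadPath")) cfg.enhancerSpecHeadPath = configMap.at("enhancerSpecHeadPath");
    if (has("denoiserPath")) cfg.denoiserPath = configMap.at("denoiserPath");
    if (has("outputSampleRate")) {
        const std::string& raw = configMap.at("outputSampleRate");
        try {
            const int rate = std::stoi(raw);
            if (rate >= 8000 && rate <= 192000) {
                cfg.outputSampleRate = rate;
            } else {  // also rejects negative values
                std::cerr << "warning: outputSampleRate " << rate
                          << " outside [8000, 192000], ignored\n";
            }
        } catch (const std::exception&) {  // std::invalid_argument or std::out_of_range from stoi
            std::cerr << "warning: could not parse outputSampleRate '" << raw << "', ignored\n";
        }
    }
}
```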
…erns, bump to 0.9.0
- Fix undeclared `accepted` variable in _runInternal (ReferenceError in strict mode)
- Fix outputSampleRate validation: reject all out-of-range values, not just positive ones
- Update all LavaSR integration tests and example to use 0.8.0 constructor API
(files:{}, config:{}, enhancer:{} instead of flat top-level props)
- Add cancel check before postProcess to allow early abort before expensive enhancement
- Save/restore lavaSRConfig_ around per-job overrides to prevent state pollution
- Guard negative outputSampleRate values in C++ parseLavaSRConfig
- Remove BUILD_CLI option from CMakeLists.txt (out of scope for this PR)
- Bump version to 0.9.0
- Remove double blank lines in index.js
Made-with: Cursor
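One way to save and restore `lavaSRConfig_` around per-job overrides is an RAII guard. The guard snapshots the config on entry and restores it on scope exit, so the restore also happens if synthesis throws. `ScopedConfigOverride`, `JobConfig`, and `runJob` are hypothetical names used only for this sketch.

```cpp
#include <cassert>
#include <stdexcept>
#include <utility>

// Snapshots a config on construction and restores it on destruction, so a per-job
// override cannot leak into later jobs, even when the job exits via an exception.
template <typename Config>
class ScopedConfigOverride {
public:
    explicit ScopedConfigOverride(Config& live) : live_(live), saved_(live) {}
    ~ScopedConfigOverride() { live_ = std::move(saved_); }
    ScopedConfigOverride(const ScopedConfigOverride&) = delete;
    ScopedConfigOverride& operator=(const ScopedConfigOverride&) = delete;

private:
    Config& live_;
    Config saved_;
};

// Minimal config used to exercise the guard.
struct JobConfig {
    bool enhance = false;
    int outputSampleRate = 0;
};

// Simulated job: applies per-job overrides, then optionally fails mid-run.
void runJob(JobConfig& cfg, bool fail) {
    ScopedConfigOverride<JobConfig> guard(cfg);
    cfg.enhance = true;
    cfg.outputSampleRate = 48000;
    if (fail) throw std::runtime_error("synthesis failed");
}
```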
The second cancel check was dead code: exchange(false) on the first check already cleared the flag, so the adjacent second check always saw false.

Made-with: Cursor
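This small example shows why the second check could never fire. `std::atomic<bool>::exchange(false)` returns the previous value and clears the flag in one step. An immediately following check therefore sees `false` unless another thread calls `cancel()` in between. `CancelToken` is a hypothetical type.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical cancel flag: consume() reads and clears the request atomically,
// so each cancel() is observed by at most one check.
struct CancelToken {
    std::atomic<bool> requested{false};
    void cancel() { requested.store(true); }
    bool consume() { return requested.exchange(false); }
};
```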
- Fix all LavaSR tests: use the onUpdate callback pattern instead of accessing the undefined result.data.outputArray
- Move lavasrEnhancerConfig and loadReferenceAudio to test/utils/
- Switch LavaSR tests from Chatterbox to Supertonic (faster execution)
- Remove the separate backward-compat test; add a reportedSampleRate assertion to the existing Chatterbox and Supertonic tests
- Rename section header to 'outputSampleRate tests'
- Capture reportedSampleRate in the runTTS utility
- Apply clang-format to all changed C++ files
- Add error handling to multilingual test reload/unload

Made-with: Cursor
clang-format was accidentally run on the CMake file, breaking its syntax. Restore the original version.

Made-with: Cursor
Revert clang-format noise in SupertonicEngine.cpp, ChatterboxEngine.cpp, and OnnxInferSession.cpp/hpp. These files have no functional changes in this PR, only formatting from an overly broad clang-format run.

Made-with: Cursor
LavaSR ONNX models (~58MB) are not bundled in the mobile test app, and downloading them on the device farm is unreliable. Skip all 5 LavaSR tests when running on iOS/Android.

Made-with: Cursor
- GustavoA1604 approved these changes (Apr 15, 2026)
- moromisato approved these changes (Apr 15, 2026)
🎯 What problem does this PR solve?
📝 How does it solve it?
Integrates LavaSR, a lightweight neural speech enhancement model, as an opt-in post-processing step with three independent config flags:
- `enhance` (bool): runs Vocos-based neural bandwidth extension to 48kHz (2 ONNX sessions, ~55MB)
- `denoise` (bool): runs the UL-UNAS denoiser at 16kHz before enhancement (1 ONNX session, ~1.7MB)
- `outputSampleRate` (int): sets the final output rate (must be in the 8000–192000 range); with `enhance`, audio is first upscaled neurally and then resampled conventionally

Pipeline:
Engine synthesize → [denoise at 16kHz] → [enhance to 48kHz] → [resample to target rate]

All flags default to off, so callers who don't opt in pay zero performance cost and keep full backward compatibility.
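The stage ordering and the all-off fast path can be sketched as a function that applies only the enabled stages. The stage callables stand in for LavaSRDenoiser, LavaSREnhancer, and the resampler. Several details are assumptions: the names `Audio`, `PostProcessFlags`, and `Stages`, and treating `outputSampleRate == 0` as "unchanged". This is not the actual signature of `TTSModel::postProcess()`.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Mono PCM buffer tagged with its sample rate (illustrative type).
struct Audio {
    std::vector<float> samples;
    int sampleRate = 0;
};

// Per-job flags; outputSampleRate == 0 means "leave the rate as is" (assumption).
struct PostProcessFlags {
    bool denoise = false;
    bool enhance = false;
    int outputSampleRate = 0;
};

// Hypothetical hooks standing in for LavaSRDenoiser, LavaSREnhancer and the resampler.
struct Stages {
    std::function<Audio(const Audio&)> denoise;        // UL-UNAS, runs internally at 16kHz
    std::function<Audio(const Audio&)> enhance;        // Vocos BWE, produces 48kHz
    std::function<Audio(const Audio&, int)> resample;  // conventional resampling to a target rate
};

// Applies only the enabled stages, in the documented order: denoise -> enhance -> resample.
// With every flag off, the engine output is returned untouched.
Audio postProcess(Audio audio, const PostProcessFlags& flags, const Stages& stages) {
    if (flags.denoise) audio = stages.denoise(audio);
    if (flags.enhance) audio = stages.enhance(audio);
    if (flags.outputSampleRate > 0 && flags.outputSampleRate != audio.sampleRate)
        audio = stages.resample(audio, flags.outputSampleRate);
    return audio;
}
```

In this sketch, the resample step is skipped when the current rate already matches the target, for example when `enhance` is on and `outputSampleRate` is 48000.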
C++ implementation:
- `TTSModel::postProcess()` integration after engine `synthesize()`, with lazy session loading for per-job toggles

JS bridge:
- `enhance`/`denoise`/`outputSampleRate` plus individual model paths (`backbonePath`, `specHeadPath`, `denoiserPath`) flow through `AddonJs.hpp` → `index.js` → `index.d.ts`

Nice-to-haves included:
- `sampleRate` exposed in the JS output callback (`data.sampleRate`) and in `runtimeStats`
- Per-job `enhance`/`denoise`/`outputSampleRate` toggle via `run()` input (lazy ONNX session loading on first use)

🧪 How was it tested?
C++ (187 tests total: 183 pass, 4 skipped due to missing Supertonic models):
JS:
- `outputSampleRate` resampling test added to `addon.test.js` (runs in CI without LavaSR models)
- `addon.test.js` (Chatterbox+enhance, denoise+enhance, outputSampleRate, enhance+downsample, Supertonic+enhance, backward compat): all assert `sampleRate` via callback
- `sampleRate` correctly reported in callback

Manual verification:
🔌 API Changes
New TypeScript interfaces:
🔄 CI Run
Latest full CI pipeline: https://github.com/tetherto/qvac/actions/runs/24237672067