feat[notask|api]: add LavaSR speech enhancement support to TTS plugin #1310

Closed
sharmaraju352 wants to merge 11 commits into main from feat/sdk-lavasr-tts-support

Conversation

@sharmaraju352 sharmaraju352 commented Apr 2, 2026

Summary

  • Wire LavaSR neural speech enhancement (enhance, denoise) through the SDK's TTS plugin so users can opt into enhanced audio output at load time
  • Adds a nested enhancer config object to both Chatterbox and Supertonic model configs, with a discriminated union schema (type: "lavasr") to support future enhancer types
  • Bumps @qvac/tts-onnx to ^0.8.3 (LavaSR support landed in feat(tts): integrate LavaSR audio enhancer as opt-in post-processing #1142)

Changes

Schemas (schemas/text-to-speech.ts):

  • lavaSREnhancerRuntimeSchema: runtime-only flags (type, enhance, denoise) — no model sources
  • lavaSREnhancerConfigSchema: extends runtime schema with backboneSrc, specHeadSrc, denoiserSrc (optional) for load-time config
  • ttsEnhancerConfigSchema: discriminated union on type with refine validation (denoiserSrc required when denoise is true)
  • Both Chatterbox and Supertonic runtime + load-time config schemas gain an optional enhancer field
  • New exported types: TtsEnhancerRuntimeConfig, TtsEnhancerConfig, LavaSREnhancerConfig
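
The validation rules in the list above can be illustrated without the real schema file. What follows is a minimal sketch in plain TypeScript, not the PR's schema code: `parseEnhancerConfig`, `ParseResult`, and the error strings are hypothetical, and only the field names (`type`, `enhance`, `denoise`, `backboneSrc`, `specHeadSrc`, `denoiserSrc`) are taken from the description. It shows the discriminated branch on `type` and the refinement that requires `denoiserSrc` when `denoise` is true.

```typescript
// Hypothetical stand-in for the load-time enhancer schema described above.
// This hand-written check only mirrors the rules; it is not the PR's code.
interface LavaSREnhancerConfig {
  type: "lavasr";
  enhance?: boolean;
  denoise?: boolean;
  backboneSrc: string;
  specHeadSrc: string;
  denoiserSrc?: string;
}

type ParseResult =
  | { ok: true; value: LavaSREnhancerConfig }
  | { ok: false; error: string };

const fail = (error: string): ParseResult => ({ ok: false, error });

function parseEnhancerConfig(input: unknown): ParseResult {
  if (typeof input !== "object" || input === null) {
    return fail("enhancer must be an object");
  }
  const raw = input as Record<string, unknown>;
  const kind = raw["type"];
  // Discriminate on `type` so a future enhancer kind gets its own branch.
  switch (kind) {
    case "lavasr": {
      const backboneSrc = raw["backboneSrc"];
      const specHeadSrc = raw["specHeadSrc"];
      const denoiserSrc = raw["denoiserSrc"];
      const enhance = raw["enhance"];
      const denoise = raw["denoise"];
      if (typeof backboneSrc !== "string") return fail("backboneSrc is required");
      if (typeof specHeadSrc !== "string") return fail("specHeadSrc is required");
      if (denoiserSrc !== undefined && typeof denoiserSrc !== "string") {
        return fail("denoiserSrc must be a string");
      }
      if (enhance !== undefined && typeof enhance !== "boolean") {
        return fail("enhance must be a boolean");
      }
      if (denoise !== undefined && typeof denoise !== "boolean") {
        return fail("denoise must be a boolean");
      }
      // Refinement: denoising cannot run without a denoiser model source.
      if (denoise === true && denoiserSrc === undefined) {
        return fail("denoiserSrc is required when denoise is true");
      }
      const value: LavaSREnhancerConfig = { type: "lavasr", backboneSrc, specHeadSrc };
      if (typeof enhance === "boolean") value.enhance = enhance;
      if (typeof denoise === "boolean") value.denoise = denoise;
      if (typeof denoiserSrc === "string") value.denoiserSrc = denoiserSrc;
      return { ok: true, value };
    }
    default:
      return fail(`unknown enhancer type: ${String(kind)}`);
  }
}
```

Returning a result object instead of throwing keeps the sketch close to how a schema library reports refinement failures, so the same cases can be asserted in unit tests.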

Plugin (server/bare/plugins/onnx-tts/plugin.ts):

  • resolveEnhancerArtifacts() resolves model sources to paths (runs in parallel with main artifact resolution)
  • buildRuntimeEnhancer() strips model sources to produce runtime config
  • buildEnhancerArg() builds the addon-facing enhancer argument from runtime config + resolved artifacts
  • Both engine resolveConfig and createModel functions forward LavaSR config/artifacts to ONNXTTS
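
The plugin helpers listed above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in plugin.ts: the function bodies, the `ResolveSource` callback, the `resolveAllArtifacts` wrapper, and the `enhancerDenoiserPath` key are hypothetical. Only the helper names `resolveEnhancerArtifacts` and `buildRuntimeEnhancer` and the artifact keys `enhancerBackbonePath` and `enhancerSpecHeadPath` appear in this PR.

```typescript
// Field names follow the PR description; everything else is an assumption.
interface EnhancerLoadConfig {
  type: "lavasr";
  enhance?: boolean;
  denoise?: boolean;
  backboneSrc: string;
  specHeadSrc: string;
  denoiserSrc?: string;
}

interface EnhancerRuntimeConfig {
  type: "lavasr";
  enhance?: boolean;
  denoise?: boolean;
}

type Artifacts = Record<string, string | undefined>;

// Hypothetical resolver: turns a model source into a local file path.
type ResolveSource = (src: string) => Promise<string>;

// Strip the load-time model sources, keeping only the runtime flags.
function buildRuntimeEnhancer(cfg: EnhancerLoadConfig): EnhancerRuntimeConfig {
  const runtime: EnhancerRuntimeConfig = { type: cfg.type };
  if (cfg.enhance !== undefined) runtime.enhance = cfg.enhance;
  if (cfg.denoise !== undefined) runtime.denoise = cfg.denoise;
  return runtime;
}

// Resolve all enhancer model sources concurrently.
async function resolveEnhancerArtifacts(
  cfg: EnhancerLoadConfig,
  resolve: ResolveSource,
): Promise<Artifacts> {
  const [backbone, specHead, denoiser] = await Promise.all([
    resolve(cfg.backboneSrc),
    resolve(cfg.specHeadSrc),
    cfg.denoiserSrc !== undefined ? resolve(cfg.denoiserSrc) : Promise.resolve(undefined),
  ]);
  return {
    enhancerBackbonePath: backbone,
    enhancerSpecHeadPath: specHead,
    enhancerDenoiserPath: denoiser, // hypothetical key name
  };
}

// Run enhancer resolution alongside main artifact resolution, then merge.
async function resolveAllArtifacts(
  resolveMain: () => Promise<Artifacts>,
  resolve: ResolveSource,
  cfg?: EnhancerLoadConfig,
): Promise<Artifacts> {
  const none: Artifacts = {};
  const [main, enhancer] = await Promise.all([
    resolveMain(),
    cfg !== undefined ? resolveEnhancerArtifacts(cfg, resolve) : Promise.resolve(none),
  ]);
  return { ...main, ...enhancer };
}
```

Starting both resolutions before awaiting either means the enhancer downloads do not add to load time when the main model artifacts take longer.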

Tests (test/unit/tts-schemas.test.ts):

  • 24 schema validation tests covering all new enhancer schemas (load-time, runtime, discriminated union)
  • Boundary tests verifying runtime schemas strip load-time model source fields
  • Denoise/denoiserSrc refinement propagation through parent schemas

Example (examples/tts/chatterbox-enhanced.ts):

  • New A/B comparison example: raw Chatterbox (24 kHz) vs Chatterbox + LavaSR enhancement (48 kHz)
  • Uses registry model constants (TTS_ENHANCER_BACKBONE_LAVASR_FP32, TTS_ENHANCER_SPEC_HEAD_LAVASR_FP32, TTS_DENOISER_LAVASR_FP32)

New API

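// loadModel, textToSpeech, and TTS_TOKENIZER_EN_CHATTERBOX are used below;
// they are assumed here to be exported from "@qvac/sdk" like the enhancer constants.
import {
  loadModel,
  textToSpeech,
  TTS_TOKENIZER_EN_CHATTERBOX,
  TTS_ENHANCER_BACKBONE_LAVASR_FP32,
  TTS_ENHANCER_SPEC_HEAD_LAVASR_FP32,
  TTS_DENOISER_LAVASR_FP32,
} from "@qvac/sdk";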

const modelId = await loadModel({
  modelSrc: TTS_TOKENIZER_EN_CHATTERBOX.src,
  modelType: "tts",
  modelConfig: {
    ttsEngine: "chatterbox",
    language: "en",
    ttsTokenizerSrc: TTS_TOKENIZER_EN_CHATTERBOX.src,
    // ... other model sources ...
    referenceAudioSrc: "path/to/ref.wav",
    enhancer: {
      type: "lavasr",
      enhance: true,
      denoise: true,
      backboneSrc: TTS_ENHANCER_BACKBONE_LAVASR_FP32.src,
      specHeadSrc: TTS_ENHANCER_SPEC_HEAD_LAVASR_FP32.src,
      denoiserSrc: TTS_DENOISER_LAVASR_FP32.src,
    },
  },
});

const result = textToSpeech({
  modelId,
  text: "Hello world",
  inputType: "text",
  stream: false,
});

const buffer = await result.buffer;

Test plan

  • bun run build passes (lint + typecheck + compile)
  • Unit tests for all new enhancer schemas (24 tests in tts-schemas.test.ts)
  • Integration test with @qvac/tts-onnx v0.8.3 — chatterbox example runs end-to-end
  • Verify enhanced audio output produces higher quality WAV compared to raw (manual A/B with chatterbox-enhanced.ts)

@sharmaraju352 sharmaraju352 requested review from a team as code owners April 2, 2026 09:45
kinsta Bot commented Apr 7, 2026

Preview deployments for qvac-docs-staging ⚡️

Status Branch preview Commit preview
✅ Ready Visit preview Visit preview

Commit: 4e99f5600424a1726838a86dcb6df3ff0347c376

Deployment ID: 3211d724-57c8-414b-83e0-5806225da1f3

Static site name: qvac-docs-staging-fazwv

@sharmaraju352 sharmaraju352 force-pushed the feat/sdk-lavasr-tts-support branch from 4e99f56 to 29ff9df Compare April 20, 2026 10:29
@sharmaraju352 sharmaraju352 changed the title feat(sdk)[api]: add LavaSR speech enhancement support to TTS plugin feat[notask][api]: add LavaSR speech enhancement support to TTS plugin Apr 20, 2026
@sharmaraju352 sharmaraju352 changed the title feat[notask][api]: add LavaSR speech enhancement support to TTS plugin feat[notask]: add LavaSR speech enhancement support to TTS plugin Apr 20, 2026
- Fix Supertonic test fixtures using stale pre-0.8.x schema field names
- Parallelize enhancer artifact resolution with main model artifacts
- Use TtsEnhancerRuntimeConfig type instead of inline type in buildEnhancerArg
- Add exhaustive switch defaults in enhancer helper functions
- Fix example error format to match convention

Made-with: Cursor

@ogad-tether ogad-tether left a comment

Reviewed the SDK-side LavaSR integration and the @qvac/tts-onnx 0.8.3 bump. The schema additions, artifact resolution plumbing, and new example line up with the addon API, and I did not find a blocking issue in the plugin path.

@ogad-tether ogad-tether left a comment

Reviewed the LavaSR SDK changes end-to-end. The schema split between load-time and runtime config is consistent with the plugin wiring, the new example matches the existing TTS example pattern, and the added schema coverage exercises the new enhancer surface well. I did not find a PR-specific regression in the changed files. One remaining non-code note: the main build job is still pending as of this review.

case "lavasr": {
  const backbonePath = artifacts["enhancerBackbonePath"];
  const specHeadPath = artifacts["enhancerSpecHeadPath"];
  if (!backbonePath || !specHeadPath) return undefined;

I think this should be an explicit exception that specifies which artifact is missing.
If the user has asked for the lavasr enhancer, we shouldn't silently fall back to no enhancement without any error; that would be surprising behavior.
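
One way the suggestion in the review comment above could look is sketched below. `requireEnhancerArtifacts`, the `Artifacts` type, and the error wording are hypothetical; only the two artifact keys come from the quoted hunk. The idea is to collect every missing key and throw once, instead of returning `undefined`.

```typescript
// Assumed shape of the resolved artifact map (not confirmed by the PR).
type Artifacts = Record<string, string | undefined>;

// Hypothetical helper: fail loudly and name each missing enhancer artifact.
function requireEnhancerArtifacts(
  artifacts: Artifacts,
): { backbonePath: string; specHeadPath: string } {
  const backbonePath = artifacts["enhancerBackbonePath"];
  const specHeadPath = artifacts["enhancerSpecHeadPath"];

  const missing: string[] = [];
  if (!backbonePath) missing.push("enhancerBackbonePath");
  if (!specHeadPath) missing.push("enhancerSpecHeadPath");

  if (!backbonePath || !specHeadPath) {
    throw new Error(
      `LavaSR enhancer requested but artifacts are missing: ${missing.join(", ")}`,
    );
  }
  // Both values are narrowed to string here.
  return { backbonePath, specHeadPath };
}
```

With this shape, a caller that asked for enhancement either gets both paths or an error naming what failed to resolve.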

registerAddonLogger(modelId, ModelType.onnxTts, logger);
const referenceAudio = loadReferenceAudioAt24k(referenceAudioPath);
const enhancerArg = buildEnhancerArg(config.enhancer, artifacts);
const model = new ONNXTTS({

One thing to fix with these ONNXTTS args is the `{...} as never` cast. It should be avoided, because any change in the underlying library could then break at runtime without a compile error.
The addon ships types, so it would be nice to use them.
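
To illustrate the point in the comment above, here is a sketch that derives the argument type from a constructor rather than casting. `FakeTts`, `FakeTtsOptions`, and `FakeEnhancerArg` are stand-ins invented for this example; the real ONNXTTS option names and types are not shown in this PR and are not assumed here. `ConstructorParameters` is a built-in TypeScript utility type.

```typescript
// Stand-in types: the real addon's option shape is NOT known from this PR.
interface FakeEnhancerArg {
  type: "lavasr";
  enhance: boolean;
  denoise: boolean;
  backbonePath: string;
  specHeadPath: string;
}

interface FakeTtsOptions {
  modelPath: string;
  enhancer?: FakeEnhancerArg;
}

// Stand-in for the addon class.
class FakeTts {
  readonly options: FakeTtsOptions;
  constructor(options: FakeTtsOptions) {
    this.options = options;
  }
}

// Derive the argument type from the class, so a change in the library's
// constructor signature surfaces as a compile error at the call site.
type FakeTtsArgs = ConstructorParameters<typeof FakeTts>[0];

function createModel(modelPath: string, enhancer?: FakeEnhancerArg): FakeTts {
  const args: FakeTtsArgs =
    enhancer === undefined ? { modelPath } : { modelPath, enhancer };
  return new FakeTts(args); // no `as never` cast needed
}
```

If the library renamed or removed a field, the annotated `args` object would stop type-checking, which is the safety the cast currently hides.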

Comment on lines +62 to +63
enhance: enhancer.enhance ?? false,
denoise: enhancer.denoise ?? false,

A user who writes enhancer: { type: "lavasr", backboneSrc: ..., specHeadSrc: ... } (omitting the boolean flags) will download and load the enhancer models but then neither enhancement nor denoising actually runs. The enhancer config schema has both as optional() with no default. Worth asking: should enhance default to true when the user bothers to configure an enhancer? In this case, also reveal it in the schema perhaps?
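
The defaulting question raised above can be made concrete with a small sketch. `withEnhancerDefaults` and both interfaces are hypothetical, and defaulting `enhance` to true is only one possible answer to the reviewer's question, not a decision recorded in this PR.

```typescript
// Hypothetical flag shapes; only the names `enhance` and `denoise` come from the PR.
interface EnhancerFlags {
  enhance?: boolean;
  denoise?: boolean;
}

interface ResolvedEnhancerFlags {
  enhance: boolean;
  denoise: boolean;
}

// One possible policy: configuring an enhancer implies enhancement unless the
// user opts out, while denoising stays opt-in because it needs an extra model.
function withEnhancerDefaults(flags: EnhancerFlags): ResolvedEnhancerFlags {
  return {
    enhance: flags.enhance ?? true,
    denoise: flags.denoise ?? false,
  };
}
```

Under this policy, a config that omits both flags still runs enhancement, while an explicit `enhance: false` is respected.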

export type TtsRuntimeConfig = z.infer<typeof ttsRuntimeConfigSchema>;
export type TtsEnhancerRuntimeConfig = z.infer<typeof ttsEnhancerRuntimeConfigSchema>;
export type TtsEnhancerConfig = z.infer<typeof ttsEnhancerConfigSchema>;
export type LavaSREnhancerConfig = z.infer<typeof lavaSREnhancerConfigSchema>;

Does this actually need to be a public export?

It doesn't seem to be used anywhere, and I don't see an obvious use for it.

@NamelsKing NamelsKing changed the title feat[notask]: add LavaSR speech enhancement support to TTS plugin feat[notask|api]: add LavaSR speech enhancement support to TTS plugin Apr 22, 2026
@NamelsKing NamelsKing marked this pull request as draft April 22, 2026 10:14
6 participants