feat[notask|api]: add LavaSR speech enhancement support to TTS plugin #1310
sharmaraju352 wants to merge 11 commits into
4e99f56 to 29ff9df
- Fix Supertonic test fixtures using stale pre-0.8.x schema field names
- Parallelize enhancer artifact resolution with main model artifacts
- Use TtsEnhancerRuntimeConfig type instead of inline type in buildEnhancerArg
- Add exhaustive switch defaults in enhancer helper functions
- Fix example error format to match convention

Made-with: Cursor
ogad-tether left a comment
Reviewed the SDK-side LavaSR integration and the @qvac/tts-onnx 0.8.3 bump. The schema additions, artifact resolution plumbing, and new example line up with the addon API, and I did not find a blocking issue in the plugin path.
ogad-tether left a comment
Reviewed the LavaSR SDK changes end-to-end. The schema split between load-time and runtime config is consistent with the plugin wiring, the new example matches the existing TTS example pattern, and the added schema coverage exercises the new enhancer surface well. I did not find a PR-specific regression in the changed files. One remaining non-code note: the main build job is still pending as of this review.
| case "lavasr": { | ||
| const backbonePath = artifacts["enhancerBackbonePath"]; | ||
| const specHeadPath = artifacts["enhancerSpecHeadPath"]; | ||
| if (!backbonePath || !specHeadPath) return undefined; |
I think this should throw an explicit exception that specifies which artifact is missing.
If the user has asked for the lavasr enhancer, we shouldn't silently fall back to no enhancement without any error; that would be surprising behavior.
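A minimal sketch of what the explicit failure could look like. The `Artifacts`, `LavaSRRuntimeConfig`, and `LavaSREnhancerArg` shapes and the name `buildEnhancerArgStrict` are hypothetical stand-ins, not the plugin's actual types; only the two artifact keys come from the diff above.

```typescript
// Hypothetical shapes for illustration; the real types live in the SDK and the addon.
type Artifacts = Record<string, string | undefined>;

interface LavaSRRuntimeConfig {
  type: "lavasr";
  enhance?: boolean;
  denoise?: boolean;
}

// Assumed addon-facing argument; these field names are not confirmed by the PR.
interface LavaSREnhancerArg {
  type: "lavasr";
  backbonePath: string;
  specHeadPath: string;
  enhance: boolean;
  denoise: boolean;
}

function buildEnhancerArgStrict(
  enhancer: LavaSRRuntimeConfig | undefined,
  artifacts: Artifacts,
): LavaSREnhancerArg | undefined {
  // No enhancer requested: returning undefined here is the expected path.
  if (!enhancer) return undefined;

  const backbonePath = artifacts["enhancerBackbonePath"];
  const specHeadPath = artifacts["enhancerSpecHeadPath"];

  // Collect every missing key so the error names all of them at once.
  const missing: string[] = [];
  if (!backbonePath) missing.push("enhancerBackbonePath");
  if (!specHeadPath) missing.push("enhancerSpecHeadPath");

  if (!backbonePath || !specHeadPath) {
    throw new Error(
      `LavaSR enhancer was requested but these artifacts are missing: ${missing.join(", ")}`,
    );
  }

  return {
    type: "lavasr",
    backbonePath,
    specHeadPath,
    enhance: enhancer.enhance ?? false,
    denoise: enhancer.denoise ?? false,
  };
}
```

With this shape, `undefined` only ever means "no enhancer was configured", and a misresolved artifact surfaces at load time instead of producing unenhanced audio.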
| registerAddonLogger(modelId, ModelType.onnxTts, logger); | ||
| const referenceAudio = loadReferenceAudioAt24k(referenceAudioPath); | ||
| const enhancerArg = buildEnhancerArg(config.enhancer, artifacts); | ||
| const model = new ONNXTTS({ |
One thing to fix in these ONNXTTS args is the {...} as never cast. It should be avoided, because any change in the underlying library could then break at runtime with no compile-time warning.
The addon ships types, so please use them. This would be nice to have.
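One way to drop the cast, sketched with a stand-in class because the real `ONNXTTS` constructor signature is not shown in this PR. `FakeOnnxTts`, `FakeTtsArgs`, and `createTypedModel` are hypothetical names; the technique itself (deriving the argument type with the built-in `ConstructorParameters` utility) is standard TypeScript.

```typescript
// Stand-in for the addon's argument type; the field names are assumptions.
interface FakeTtsArgs {
  modelPath: string;
  enhancer?: { type: "lavasr"; enhance: boolean; denoise: boolean };
}

// Stand-in for the addon's ONNXTTS class.
class FakeOnnxTts {
  readonly args: FakeTtsArgs;

  constructor(args: FakeTtsArgs) {
    this.args = args;
  }
}

// Derive the argument type from the constructor itself, so it tracks library changes.
type OnnxTtsArgs = ConstructorParameters<typeof FakeOnnxTts>[0];

function createTypedModel(modelPath: string, enhance: boolean): FakeOnnxTts {
  const args: OnnxTtsArgs = {
    modelPath,
    enhancer: { type: "lavasr", enhance, denoise: false },
  };
  // No cast needed: a renamed or removed field would now fail at compile time.
  return new FakeOnnxTts(args);
}
```

If the addon exports a named argument type, importing it directly would be simpler than deriving it; the derived form is only useful when no such export exists.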
| enhance: enhancer.enhance ?? false, | ||
| denoise: enhancer.denoise ?? false, |
A user who writes enhancer: { type: "lavasr", backboneSrc: ..., specHeadSrc: ... } (omitting the boolean flags) will download and load the enhancer models, but then neither enhancement nor denoising actually runs. The enhancer config schema declares both flags as optional() with no default. Worth asking: should enhance default to true when the user bothers to configure an enhancer? If so, perhaps the schema should state that default as well?
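A sketch of one possible answer, assuming `enhance` should default to `true` and `denoise` should stay opt-in. The interface and function names are hypothetical, and the defaults are a suggestion rather than the PR's current behavior (which falls back to `false` for both).

```typescript
// Hypothetical input: the optional flags as a user would write them.
interface LavaSRFlagsInput {
  type: "lavasr";
  enhance?: boolean;
  denoise?: boolean;
}

// Hypothetical output: flags with defaults applied.
interface LavaSRFlags {
  type: "lavasr";
  enhance: boolean;
  denoise: boolean;
}

function resolveEnhancerFlags(input: LavaSRFlagsInput): LavaSRFlags {
  return {
    type: input.type,
    // Assumed default: configuring an enhancer implies the user wants enhancement.
    enhance: input.enhance ?? true,
    // Denoising stays opt-in because it needs an extra model (denoiserSrc).
    denoise: input.denoise ?? false,
  };
}
```

An explicit `enhance: false` is still honored, so users can load the models and toggle enhancement off per request.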
| export type TtsRuntimeConfig = z.infer<typeof ttsRuntimeConfigSchema>; | ||
| export type TtsEnhancerRuntimeConfig = z.infer<typeof ttsEnhancerRuntimeConfigSchema>; | ||
| export type TtsEnhancerConfig = z.infer<typeof ttsEnhancerConfigSchema>; | ||
| export type LavaSREnhancerConfig = z.infer<typeof lavaSREnhancerConfigSchema>; |
Does this actually need to be a public export?
It doesn't seem to be used anywhere, and I don't see an obvious use for it.
Summary
- Add `enhancer` config object to both Chatterbox and Supertonic model configs, with a discriminated union schema (`type: "lavasr"`) to support future enhancer types
- Bump `@qvac/tts-onnx` to `^0.8.3` (LavaSR support landed in feat(tts): integrate LavaSR audio enhancer as opt-in post-processing #1142)

Changes
Schemas (`schemas/text-to-speech.ts`):
- `lavaSREnhancerRuntimeSchema`: runtime-only flags (`type`, `enhance`, `denoise`), no model sources
- `lavaSREnhancerConfigSchema`: extends runtime schema with `backboneSrc`, `specHeadSrc`, `denoiserSrc` (optional) for load-time config
- `ttsEnhancerConfigSchema`: discriminated union on `type` with refine validation (`denoiserSrc` required when `denoise` is true)
- `enhancer` field
- `TtsEnhancerRuntimeConfig`, `TtsEnhancerConfig`, `LavaSREnhancerConfig`

Plugin (`server/bare/plugins/onnx-tts/plugin.ts`):
- `resolveEnhancerArtifacts()` resolves model sources to paths (runs in parallel with main artifact resolution)
- `buildRuntimeEnhancer()` strips model sources to produce runtime config
- `buildEnhancerArg()` builds the addon-facing enhancer argument from runtime config + resolved artifacts
- `resolveConfig` and `createModel` functions forward LavaSR config/artifacts to `ONNXTTS`

Tests (`test/unit/tts-schemas.test.ts`):

Example (`examples/tts/chatterbox-enhanced.ts`):
- (`TTS_ENHANCER_BACKBONE_LAVASR_FP32`, `TTS_ENHANCER_SPEC_HEAD_LAVASR_FP32`, `TTS_DENOISER_LAVASR_FP32`)

New API

Test plan
- `bun run build` passes (lint + typecheck + compile)
- (`tts-schemas.test.ts`)
- `@qvac/tts-onnx` v0.8.3: chatterbox example runs end-to-end
- (`chatterbox-enhanced.ts`)
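
The refine rule and the source-stripping step listed under Changes can be illustrated without any dependencies. This is a plain-TypeScript sketch, not the zod schemas from the PR: the interface names, the `ModelSrc` alias, and both function names are hypothetical, and model sources are assumed to be strings only for the sake of the example.

```typescript
// Assumption: model sources are plain strings here; the SDK's real source type may differ.
type ModelSrc = string;

// Runtime-only flags, mirroring the description of lavaSREnhancerRuntimeSchema.
interface EnhancerRuntimeSketch {
  type: "lavasr";
  enhance?: boolean;
  denoise?: boolean;
}

// Load-time config: runtime flags plus model sources.
interface EnhancerLoadConfigSketch extends EnhancerRuntimeSketch {
  backboneSrc: ModelSrc;
  specHeadSrc: ModelSrc;
  denoiserSrc?: ModelSrc;
}

// Stand-in for the refine validation: denoiserSrc is required when denoise is true.
function validateEnhancerConfig(config: EnhancerLoadConfigSketch): string[] {
  const issues: string[] = [];
  if (config.denoise === true && config.denoiserSrc === undefined) {
    issues.push("denoiserSrc is required when denoise is true");
  }
  return issues;
}

// Stand-in for buildRuntimeEnhancer(): keep the flags, drop the model sources.
function toRuntimeEnhancer(config: EnhancerLoadConfigSketch): EnhancerRuntimeSketch {
  const runtime: EnhancerRuntimeSketch = { type: config.type };
  if (config.enhance !== undefined) runtime.enhance = config.enhance;
  if (config.denoise !== undefined) runtime.denoise = config.denoise;
  return runtime;
}
```

Keeping the runtime shape free of model sources means per-request options cannot accidentally trigger a new download; only the load-time config carries paths to resolve.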