QVAC-17236 [Chatterbox] Investigate possibilities of reducing RTF by Zbig9000 · Pull Request #1674 · tetherto/qvac

Zbig9000 · 2026-04-20T11:23:44Z

Previous Session Summary: Chatterbox RTF Reduction
Goal: Reduce the Real-Time Factor (RTF) for the Chatterbox TTS model in packages/qvac-lib-infer-onnx-tts.

Bottleneck Analysis
The inference pipeline was profiled and broken down:

| Phase | English (q4) | Multilingual |
|---|---|---|
| Speech Encoder | ~2.7s | ~2.7s |
| LM Generation | ~25s (62%) | ~23s (17%) |
| Conditional Decoder | ~16s (37%) | ~109s (82%) |

Optimizations Implemented

Speech Encoder Output Caching (HIGH IMPACT)
- Added SpeechEncoderCache struct to store audio features, prompt tokens, speaker embeddings, and speaker features
- The speech encoder runs once and its results are cached (a later commit on this PR, ce57810, notes that this happens during load(), not on the first synthesize() call)
- Subsequent calls (e.g., in runStream() multi-chunk mode) skip the encoder entirely
- Saves ~2.7s per subsequent call
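
A minimal sketch of the caching idea described above, not the code from this PR. The field names and types inside `SpeechEncoderCache`, the `CachedSpeechEncoder` wrapper, and the `invalidate()` hook are all assumptions made for illustration; only the struct name and the four cached items come from the description. It assumes C++17 for `std::optional`.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical layout: field names and element types are assumptions.
struct SpeechEncoderCache {
    std::vector<float> audioFeatures;
    std::vector<std::int64_t> promptTokens;
    std::vector<float> speakerEmbeddings;
    std::vector<float> speakerFeatures;
};

// Hypothetical wrapper: runs the expensive encoder at most once
// and returns the stored result on every later call.
class CachedSpeechEncoder {
public:
    using EncodeFn = std::function<SpeechEncoderCache()>;

    explicit CachedSpeechEncoder(EncodeFn encode) : encode_(std::move(encode)) {}

    const SpeechEncoderCache& get() {
        assert(encode_ && "encoder callback must be set");
        if (!cache_) {
            cache_ = encode_();  // expensive path, taken only on a cache miss
            ++encoderRuns_;
        }
        return *cache_;
    }

    // Drop the cached result, e.g. when the reference audio changes.
    void invalidate() { cache_.reset(); }

    int encoderRuns() const { return encoderRuns_; }

private:
    EncodeFn encode_;
    std::optional<SpeechEncoderCache> cache_;
    int encoderRuns_ = 0;
};
```

With this shape, every chunk processed after the first `get()` reuses the stored features, which is where the per-call saving would come from.
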
Optimized Vector Operations
- Replaced expensive insert(begin, ...) prepends in prepareCfgEmbeddings with reserve+append pattern using std::move
- Replaced wav.erase(begin, ...) in trimPromptFromWaveform with std::move + resize
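
The two vector patterns named above can be sketched roughly as follows. This is an illustration under assumptions, not the PR's code: `prependWithReserve` and `trimFront` are hypothetical helper names, and the real `prepareCfgEmbeddings` and `trimPromptFromWaveform` may use different element types and signatures.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Build "prefix + body" in a fresh, pre-sized buffer instead of calling
// body.insert(body.begin(), ...), so there is one allocation and no
// shifting of elements that are already in place.
std::vector<float> prependWithReserve(const std::vector<float>& prefix,
                                      std::vector<float> body) {
    std::vector<float> out;
    out.reserve(prefix.size() + body.size());
    out.insert(out.end(), prefix.begin(), prefix.end());
    out.insert(out.end(),
               std::make_move_iterator(body.begin()),
               std::make_move_iterator(body.end()));
    assert(out.size() == prefix.size() + body.size());
    return out;
}

// Drop the first n samples: move the tail to the front, then shrink.
void trimFront(std::vector<float>& wav, std::size_t n) {
    if (n >= wav.size()) {
        wav.clear();
        return;
    }
    std::move(wav.begin() + static_cast<std::ptrdiff_t>(n), wav.end(), wav.begin());
    wav.resize(wav.size() - n);
}
```

For plain `float` elements the move iterators behave like copies; they matter more when each element is itself a buffer, such as a per-token embedding vector.
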
Configurable Thread Count for ONNX Sessions
- Added numThreads config parameter flowing from JS through to OnnxInferSession
- Default remains 1 thread for backward compatibility; users can set higher values
- Benchmark result: RTF reduced by 25.1% with 4 threads (20.92 -> 15.67 for English)
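
For reference, the arithmetic behind the benchmark figure can be checked with a small sketch. It assumes the usual definition of RTF (synthesis wall-clock time divided by the duration of the generated audio); how the benchmark scripts in this PR compute it is not shown here, and both helper names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// RTF = wall-clock synthesis time / duration of the generated audio.
// Lower is better; values above 1 mean slower than real time.
double realTimeFactor(double synthesisSeconds, double audioSeconds) {
    assert(audioSeconds > 0.0);
    return synthesisSeconds / audioSeconds;
}

// Relative reduction, in percent, when a metric goes from `before` to `after`.
double percentReduction(double before, double after) {
    assert(before > 0.0);
    return (before - after) / before * 100.0;
}
```

Plugging in the reported values, `percentReduction(20.92, 15.67)` gives (20.92 - 15.67) / 20.92, which is approximately 25.1%.
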
Per-Phase Timing Instrumentation
- Added std::chrono timing around speech encoder, LM generation, and conditional decoder phases
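
One possible way to structure per-phase timing with std::chrono is a scoped timer, sketched below. `PhaseTimings`, `ScopedPhaseTimer`, and the phase labels are hypothetical names for illustration; the PR may simply take timestamps inline around each phase.

```cpp
#include <cassert>
#include <chrono>
#include <map>
#include <string>
#include <utility>

using Clock = std::chrono::steady_clock;  // monotonic, suitable for intervals

// Accumulated wall-clock seconds per phase name.
struct PhaseTimings {
    std::map<std::string, double> seconds;
};

// Records the lifetime of the enclosing scope under the given phase name.
class ScopedPhaseTimer {
public:
    ScopedPhaseTimer(PhaseTimings& sink, std::string phase)
        : sink_(sink), phase_(std::move(phase)), start_(Clock::now()) {
        assert(!phase_.empty());
    }

    ~ScopedPhaseTimer() {
        const std::chrono::duration<double> elapsed = Clock::now() - start_;
        sink_.seconds[phase_] += elapsed.count();
    }

    ScopedPhaseTimer(const ScopedPhaseTimer&) = delete;
    ScopedPhaseTimer& operator=(const ScopedPhaseTimer&) = delete;

private:
    PhaseTimings& sink_;
    std::string phase_;
    Clock::time_point start_;
};
```

Wrapping each phase in its own block with a `ScopedPhaseTimer` keeps the measurement next to the code it measures, and repeated phases (for example one LM generation per chunk) accumulate into the same entry.
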

Test Results
- All 160 C++ unit tests pass (156 original + 4 new SpeechEncoderCacheTest tests)
- Benchmark scripts confirmed caching and threading improvements
- Integration test failure was pre-existing (model files not at expected path)

Remaining Opportunities Identified (not yet implemented)
- KV cache optimization: Avoiding redundant tensor copies between ORT and CPU vectors (major effort)
- ORT IO binding: Directly chaining output tensors as inputs to next step
- Conditional decoder: Dominates multilingual RTF (82%) but is a single large model — limited optimization without model-level changes

Pls take a look: @freddy311082, @GustavoA1604

GustavoA1604 · 2026-04-22T16:17:51Z

/review

github-actions · 2026-04-22T16:18:21Z

Tier-based Approval Status

**PR Tier:** TIER1

**Current Status:** ✅ APPROVED

**Requirements:**
- 1 Team Member approval ✅ (2/1)
- 1 Team Lead OR Management approval ✅ (1/1)



---
*This comment is automatically updated when reviews change.*

Zbig9000 requested review from GustavoA1604, freddy311082 and mario-rei April 20, 2026 11:23

Zbig9000 requested review from a team as code owners April 20, 2026 11:23

Zbig9000 had a problem deploying to release April 20, 2026 11:24 — with GitHub Actions Failure

Zbig9000 temporarily deployed to release April 20, 2026 14:20 — with GitHub Actions Inactive

Zbig9000 had a problem deploying to release April 20, 2026 14:20 — with GitHub Actions Failure

Zbig9000 temporarily deployed to release April 20, 2026 14:20 — with GitHub Actions Inactive

Zbig9000 had a problem deploying to release April 20, 2026 14:42 — with GitHub Actions Failure

GustavoA1604 had a problem deploying to release April 20, 2026 19:42 — with GitHub Actions Failure

Zbig9000 temporarily deployed to release April 21, 2026 08:51 — with GitHub Actions Inactive

GustavoA1604 previously approved these changes Apr 21, 2026

Zbig9000 had a problem deploying to release April 21, 2026 08:57 — with GitHub Actions Failure

Zbig9000 mentioned this pull request Apr 21, 2026

Qvac 17489 io binding for kv cache chaining #1686

Merged

GustavoA1604 previously approved these changes Apr 21, 2026

ogad-tether previously approved these changes Apr 22, 2026

Zbig9000 added 3 commits April 22, 2026 10:14

QVAC-17236 [Chatterbox] Investigate possibilities of reducing RTF

9dcbf31

QVAC-17236 Rewrote the "Speech encoder output caching" bullet (and intro paragraph) to reflect that the encoder runs during load(), not on first synthesize()

ce57810

Merge branch 'main' into QVAC-17236-Investigate-possibilities-of-reducing-RTF

69a0053

ogad-tether previously approved these changes Apr 22, 2026

mario-rei previously approved these changes Apr 22, 2026

Zbig9000 added 3 commits April 22, 2026 15:21

Merge branch 'main' into QVAC-17236-Investigate-possibilities-of-reducing-RTF

59eefb9

QVAC-17236-Investigate-possibilities-of-reducing-RTF version @qvac/tts-onnx: bump "0.8.4" → "version": "0.8.5"

89f8a03

Merge branch 'main' into QVAC-17236-Investigate-possibilities-of-reducing-RTF

4607a5d

GustavoA1604 approved these changes Apr 22, 2026

mario-rei approved these changes Apr 22, 2026

ogad-tether approved these changes Apr 22, 2026

Merge branch 'main' into QVAC-17236-Investigate-possibilities-of-reducing-RTF

3d64060

Zbig9000 mentioned this pull request Apr 24, 2026

[Chatterbox] Investigate possibilities of reducing RTF optimization #1745

Merged

QVAC-17236 [Chatterbox] Investigate possibilities of reducing RTF #1674
GustavoA1604 merged 7 commits into tetherto:main from Zbig9000:QVAC-17236-Investigate-possibilities-of-reducing-RTF

4 participants
