Qvac 17489 io binding for kv cache chaining by Zbig9000 · Pull Request #1686 · tetherto/qvac

Zbig9000 · 2026-04-21T11:17:06Z

What problem does this PR solve?

Chatterbox's autoregressive LM loop copied every present.* tensor back into its matching past_key_values.* input on every step. Each step therefore paid an O(past_len × heads × head_dim) copy per tensor, which grows linearly with sequence length and gives a quadratic total cost over an N-step generation.
That per-step copy was a measurable share of LM generation RTF, particularly on the English (non-CFG) path where LM dominates total time.
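To make the quadratic growth concrete, here is a rough count of copied elements. It assumes the cache holds t positions at step t, with H heads, head dimension D, and L layers that each carry one key and one value tensor (L and the factor 2 are illustrative assumptions, not figures from this PR):

```latex
C_{\text{step}}(t) = 2\,L\,H\,D\,t,
\qquad
C_{\text{total}}(N) = \sum_{t=1}^{N} 2\,L\,H\,D\,t = L\,H\,D\,N(N+1) = O(N^{2})
```

Doubling the number of generated tokens therefore roughly quadruples the elements copied, which is the cost that chaining removes.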

How does it solve it?

Add setOutputToInputChain / clearChainedInputs / isInputChained to IOnnxInferSession. After each run() the session std::moves output Ort::Values directly into the matching input slots — no copy, no intermediate user-space vector.
OnnxInferSession::initInputTensors() preserves chained slots across re-init so the moved tensors survive.
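The move-based chaining and the re-init behaviour described above can be sketched with a stand-in tensor type. This is a minimal illustration, not the PR's implementation: `MockTensor` and `MockSession` are hypothetical, only the three method names come from the PR description, and the real code operates on `Ort::Value` rather than a `std::vector<float>`.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Stand-in for Ort::Value: a move-only buffer (hypothetical, not the ONNX Runtime API).
struct MockTensor {
  std::vector<float> data;
  MockTensor() = default;
  explicit MockTensor(std::vector<float> d) : data(std::move(d)) {}
  MockTensor(MockTensor&&) noexcept = default;
  MockTensor& operator=(MockTensor&&) noexcept = default;
  MockTensor(const MockTensor&) = delete;
  MockTensor& operator=(const MockTensor&) = delete;
};

// Hypothetical miniature session; bodies are illustrative only.
class MockSession {
 public:
  MockSession(std::size_t numInputs, std::size_t numOutputs)
      : inputs_(numInputs), outputs_(numOutputs) {}

  // Register that output `outputIdx` feeds input `inputIdx` on the next step.
  void setOutputToInputChain(std::size_t outputIdx, std::size_t inputIdx) {
    chain_[outputIdx] = inputIdx;
  }
  void clearChainedInputs() { chain_.clear(); }
  bool isInputChained(std::size_t inputIdx) const {
    for (const auto& link : chain_) {
      if (link.second == inputIdx) return true;
    }
    return false;
  }

  // Reset caller-owned inputs, leaving chained slots (and their moved tensors) alone.
  void initInputTensors() {
    for (std::size_t i = 0; i < inputs_.size(); ++i) {
      if (!isInputChained(i)) inputs_[i] = MockTensor{};
    }
  }

  // Pretend inference: accept the produced outputs, then move chained outputs
  // into their input slots. Moving transfers buffer ownership; nothing is copied.
  void run(std::vector<MockTensor> produced) {
    outputs_ = std::move(produced);
    for (const auto& link : chain_) {
      if (link.first < outputs_.size() && link.second < inputs_.size()) {
        inputs_[link.second] = std::move(outputs_[link.first]);
      }
    }
  }

  const MockTensor& input(std::size_t i) const { return inputs_[i]; }

 private:
  std::vector<MockTensor> inputs_;
  std::vector<MockTensor> outputs_;
  std::map<std::size_t, std::size_t> chain_;  // output index -> input index
};
```

In this sketch the chained input ends up owning the same buffer the output was built with, so the per-step cost is independent of the cache length.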
ChatterboxEngine::enableKvCacheChaining() builds the {present.i → past_key_values.i} mapping from session I/O names (inputs [keyValueOffset_..] map 1:1 onto outputs [1..], since outputs[0] is logits). Wired into both generateSpeechTokens (non-CFG) and generateSpeechTokensWithCfg (CFG) paths. writeKvToTensors and cachePastKeyValues now skip chained inputs.
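The index arithmetic behind that mapping can be sketched as follows. The function name, its return type, and the choice to stop at whichever list runs out first are assumptions made for illustration; the PR only states that inputs from `keyValueOffset_` onward pair 1:1 with outputs from index 1 onward.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Pair each KV output with the input it should feed on the next step.
// Assumption from the PR text: outputs[0] is logits, so output (1 + k)
// pairs with input (kvOffset + k). The name and signature are hypothetical.
std::vector<std::pair<std::string, std::string>> buildKvChainMapping(
    const std::vector<std::string>& inputNames,
    const std::vector<std::string>& outputNames,
    std::size_t kvOffset) {
  std::vector<std::pair<std::string, std::string>> mapping;
  for (std::size_t k = 0;; ++k) {
    const std::size_t in = kvOffset + k;
    const std::size_t out = 1 + k;
    // Stop as soon as either list runs out (covers truncated outputs).
    if (in >= inputNames.size() || out >= outputNames.size()) break;
    mapping.emplace_back(outputNames[out], inputNames[in]);
  }
  return mapping;
}
```

With an offset of 3 and a single layer, this would pair `present.0.key` with `past_key_values.0.key` and `present.0.value` with `past_key_values.0.value`; if the output list is shorter than expected, only the pairs that exist on both sides are produced.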
Add ChatterboxConfig.kvCacheChaining (default true) wired through TTSModel, AddonJs, and index.js so the optimization can be toggled for A/B benchmarking or an emergency disable.

How was it tested?

Full unit suite (qvac-lib-inference-tts-unit-test, integration filters excluded) is green: 214 passing, 0 failing, with 2 unrelated LavaSR benchmark tests skipped.
New coverage:
- 5 KvCacheChainingTest cases: English offset=3 mapping, multilingual offset=2 mapping, truncated-outputs edge case, writeKvToTensors skip, cachePastKeyValues skip.
- 3 OnnxInferSessionMockTest cases for the new mock methods.
A/B benchmark on device (Linux, 4 cores, q4 models, jfk.wav reference, kvCacheChaining toggled):
- English (non-CFG): mean RTF 1.245 → 1.102 (−11.5%), totalTime −12.1% over 3 runs.
- Multilingual ES (CFG): mean RTF 4.145 → 3.987 (−3.8%), totalTime −3.8% over 2 runs (smaller share because the conditional decoder dominates multilingual).
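As a quick check of the reported percentages, taking RTF in its usual sense of processing time divided by audio duration (the PR does not spell out the definition, so that reading is an assumption):

```latex
\frac{1.102 - 1.245}{1.245} \approx -0.115,
\qquad
\frac{3.987 - 4.145}{4.145} \approx -0.038
```

Both match the −11.5% and −3.8% mean RTF changes quoted above.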
Listened to audio output with chaining ON on both paths — no artifacts vs the pre-change baseline.
JS lint (standard) clean; C++ lints clean.

API Changes
New non-breaking kvCacheChaining option on the ONNXTTS constructor (default true, i.e. optimization on).

const model = new ONNXTTS({
  engine: 'chatterbox',
  files: { /* ... */ },
  referenceAudio,
  config: { language: 'en' },
  kvCacheChaining: false // opt-out, e.g. for benchmarking or emergency disable
})

GustavoA1604

There is no need to add kvCacheChaining as a JS option; it can just always stay true. We can compare with the previous version by running the old version of the addon.

GustavoA1604 · 2026-04-23T15:49:31Z

/review

github-actions · 2026-04-23T15:49:55Z

Tier-based Approval Status

**PR Tier:** TIER1

**Current Status:** ✅ APPROVED

**Requirements:**
- 1 Team Member approval ✅ (1/1)
- 1 Team Lead OR Management approval ✅ (1/1)



---
*This comment is automatically updated when reviews change.*

Zbig9000 requested review from GustavoA1604, freddy311082, ishanvohra2, mario-rei and sharmaraju352 April 21, 2026 11:17

Zbig9000 requested review from a team as code owners April 21, 2026 11:17

Zbig9000 had a problem deploying to release April 21, 2026 11:17 — with GitHub Actions Failure

Zbig9000 temporarily deployed to release April 21, 2026 11:17 — with GitHub Actions Inactive

Zbig9000 had a problem deploying to release April 21, 2026 11:26 — with GitHub Actions Failure

Zbig9000 temporarily deployed to release April 21, 2026 12:31 — with GitHub Actions Inactive

Zbig9000 had a problem deploying to release April 21, 2026 12:37 — with GitHub Actions Failure

mario-rei previously approved these changes Apr 23, 2026

GustavoA1604 requested changes Apr 23, 2026

Zbig9000 added 5 commits April 23, 2026 14:50

QVAC-17489 [Chatterbox] IO binding for KV cache chaining

fc056af

Qvac 17489 io binding for kv cache chaining part 2

13cceb1

after review fixes

a99ed6e

clang format fix

51070c2

after review fix

e9b1faf

GustavoA1604 approved these changes Apr 23, 2026

mario-rei approved these changes Apr 23, 2026

GustavoA1604 approved these changes Apr 23, 2026

Zbig9000 merged 5 commits into tetherto:main from Zbig9000:QVAC-17489-IO-binding-for-KV-cache-chaining

3 participants
