testing qvac-cli-integration by Proletter · Pull Request #6 · tetherto/qvac

Proletter · 2026-01-08T13:47:39Z

No description provided.

Polish the remaining review nits on the TTS client streaming surface. - #3 TtsMulticast.pump now rejects the `done` promise with the fatal error instead of resolving `false`. An internal `.catch(() => {})` silences unhandled-rejection warnings when the caller only iterates the buffer/chunk streams and never awaits `done`; re-awaits still see the rejection. - #6 TextToSpeechStreamSession[Symbol.asyncIterator] no longer throws synchronously on a second iteration; it returns an iterator whose first `.next()` rejects, so `for await` surfaces the error in the normal async control flow rather than the iterator protocol. - #9 plainTtsBufferStream / collectTtsBuffer wrap the RPC loop in try/catch/finally so `done` always settles: resolve(true) on the terminal frame, reject with the real error on exceptions, and resolve(false) on early consumer break. Previously `await done` could hang forever when the consumer bailed out early. - #11 Skip per-frame ttsResponseSchema.parse() in all three paths; rely on the discriminated-union narrowing at the RPC boundary. Drops the per-PCM-frame Zod validation cost for large sentences. Made-with: Cursor

…peech (#1590) * feat: Add runStream() which takes input as a stream * add integration tests * uncomment cb tests * chore: Add cb streaming example * feat: Add TTS streaming funcitonality and example * Update tts addon version * Remove chatterbox example * add new error code for tts streaming fail * Move common code to util * fix: Use z.infer to define TextToSpeechStreamClientParams * Move TextToSpeechStreamSession to schemas * Track subscriber current index and trim queue when all subscribers consumed past items * add missing unit tests * fix: drive done promise from multicast pump lifecycle * fix: Forward chunkIndex and sentenceChunk in sentence-stream mode to client * fix: Use correct error code for tts stream failure * chore: Add supertonic stream test in tts-tests.ts * fix: Make tts client more readable * Remove closures and inline async generators * fix: Subscribe eagerly in sentenceStreamTts to avoid late-subscriber data loss TtsMulticast.pump() starts in a microtask on construction, while the returned async generators only call subscribe() when first iterated. If the consumer iterated one generator before the other, the first subscriber could trim the queue before the second ever registered, silently dropping earlier frames. Subscribe synchronously for both bufferStream and chunkUpdates before returning, so both subscriber indexes are in place before pump pushes its first item. Made-with: Cursor * fix: Close TTS stream on server-sent done frame Remove the dead `null` sentinel from `processTextToSpeechStreamLine` and instead close `parseTextToSpeechStreamLines` after yielding the terminal `done: true` frame, so consumers don't rely on the server closing the socket to stop iteration. Made-with: Cursor * fix: Reject sentenceStream without stream in textToSpeech Previously `sentenceStream: true` combined with `stream: false` fell through to the collect path, silently dropping the sentence-stream parameters and returning no `chunkUpdates`. Fail fast at the dispatcher with a clear error so the contract mismatch surfaces to the caller instead of being swallowed. Made-with: Cursor * fix: Release TtsMulticast subscriber slot on early break Wire a try/finally into drain() so that when a consumer breaks out of the for-await (or the generator is .return()'d / throws), the slot is parked at +Infinity via unsubscribe(). This prevents a stale low min-index from permanently pinning trimConsumed, which otherwise leaked the queue for the entire RPC stream. Made-with: Cursor * fix: Guard TTS stream write after close and preserve UTF-8 boundaries Client: - Track a `closed` flag in `textToSpeechStream` duplex session, set by `end()` / `destroy()`. Subsequent `write()` calls now throw a typed `TextToSpeechStreamFailedError` instead of propagating a raw Bare/Node "write after end" stream error. - `end()` is idempotent so accidental double-close no longer errors. Server: - `buffersToUtf8Fragments` previously decoded each incoming Buffer via `toString("utf8")`, which corrupts any multi-byte codepoint whose bytes straddle a chunk boundary (common with CJK / emoji / accented scripts emitted as LLM token deltas). Added a small tail-buffer that finds the last complete UTF-8 codepoint end in the combined buffer and defers trailing incomplete bytes to the next chunk. Any dangling partial sequence is flushed on stream end. Made-with: Cursor * fix: Order TEXT_TO_SPEECH_STREAM_FAILED code and document it - Move TEXT_TO_SPEECH_STREAM_FAILED (52415) to the end of the 52400 Model Operations block so the ordering in SDK_SERVER_ERROR_CODES matches the numeric sequence (…52413, 52414, 52415). - Add the missing row for 52415 to the (latest) errors.mdx table, per the sdk/docs-freshness rule that the error table stay in sync whenever a new code is introduced. Made-with: Cursor * fix: Register operation metrics for textToSpeechStream Only `textToSpeech` was registered in `operation-metrics.ts`, so the duplex `textToSpeechStream` path silently skipped `modelExecutionTime`, `audioDuration`, and `totalSamples` gauges even though the server already collects the same `TtsStats` via `collectTtsStats()` on the final chunk. Mirror the non-streaming registration so the streaming path has parity observability. Made-with: Cursor * fix: Harden TTS client done-promise, iterator, and parse cost Polish the remaining review nits on the TTS client streaming surface. - #3 TtsMulticast.pump now rejects the `done` promise with the fatal error instead of resolving `false`. An internal `.catch(() => {})` silences unhandled-rejection warnings when the caller only iterates the buffer/chunk streams and never awaits `done`; re-awaits still see the rejection. - #6 TextToSpeechStreamSession[Symbol.asyncIterator] no longer throws synchronously on a second iteration; it returns an iterator whose first `.next()` rejects, so `for await` surfaces the error in the normal async control flow rather than the iterator protocol. - #9 plainTtsBufferStream / collectTtsBuffer wrap the RPC loop in try/catch/finally so `done` always settles: resolve(true) on the terminal frame, reject with the real error on exceptions, and resolve(false) on early consumer break. Previously `await done` could hang forever when the consumer bailed out early. - #11 Skip per-frame ttsResponseSchema.parse() in all three paths; rely on the discriminated-union narrowing at the RPC boundary. Drops the per-PCM-frame Zod validation cost for large sentences. Made-with: Cursor * fix: Tighten textToSpeechStream schema surface - Add .positive() to maxBufferScalars and flushAfterMs to match the existing constraint on sentenceStreamMaxChunkScalars. Previously a caller could pass negative values straight through to the addon. - Un-export textToSpeechStreamRequestBaseSchema — consumers only need the finalized textToSpeechStreamRequestSchema, and the base is an implementation detail of the shared object shape. The exported type alias TextToSpeechStreamClientParams continues to derive from the base via `typeof`, so nothing on the public type surface changes. Made-with: Cursor * fix: Cross-platform tmp path and safer PCM append in TTS examples - playPcmInt16Chunk now writes the intermediate WAV chunk under os.tmpdir() / path.join instead of a hard-coded /tmp/qvac-tts-chunk-… path. The previous code's Windows branch was unreachable in practice because the POSIX /tmp directory doesn't exist there; this uses %TEMP% on Windows automatically. - appendPcmSamples switches from `target.push(...chunk.slice(i, end))` to `Array.prototype.push.apply(target, chunk.slice(i, end))`. Same semantics, but avoids allocating the spread rest array per batch and is closer to a memcpy-style concat in V8. Made-with: Cursor * fix: Catch zero-chunk regressions in TTS sentence-stream test - TtsExecutor.makeSentenceStream now returns `{ passed: false, ... }` when the chunkUpdates iterator yields no chunks / no samples. The previous executor always returned a formatted string regardless of counts, so a regression that silently emitted zero chunks would still have looked like a pass. - ttsSupertonicSentenceStream's expectation upgraded from `{ validation: "type", expectedType: "string" }` to `{ validation: "contains-all", contains: ["sentence-streamed", "chunks", "samples"] }`. The executor's zero-case failure string lacks "sentence-streamed", so the contains-all match fails on regression. Made-with: Cursor * fix: Apply stream default locally and throw typed error on tts mismatch Previous guard only rejected the explicit `stream: false + sentenceStream: true` combination. A caller passing `{ modelId, text, sentenceStream: true }` with `stream` omitted silently fell through to `collectTts` while the server's Zod `.default(true)` still ran the sentence-stream branch and emitted chunk frames — which the client then discarded, dropping all chunk metadata. - Resolve the `stream` default locally (`params.stream ?? true`) so the client's dispatch routing matches the server's Zod-applied routing, and an omitted `stream` now correctly lands in `sentenceStreamTts` or `plainStreamTts`. - Only the explicit `sentenceStream: true + stream: false` combination is rejected, and it now throws `TextToSpeechStreamFailedError` (code 52415) instead of a bare `new Error(...)` so callers can discriminate by error code like everywhere else in the SDK. Made-with: Cursor * remove inline defaults for sentenceStream and stream * Use TtsMulticast in unit test instead of mock --------- Co-authored-by: Ishan Vohra <ishanvohra@Ishans-MacBook-Air.local>

…xt-ggml@2026-01-30 The Wan-Metal work that was carried as a local overlay has all landed upstream on tetherto/qvac-ext-ggml's 2026-01-30 branch: - bc053644 metal: IM2COL_3D op + PAD left-padding for Wan video (#5) - 512e1773 cmake: support qvac hybrid backend packaging (static CPU + dynamic GPU backends, GGML_MAX_NAME prop, graceful no-OpenCL-device fallback, public ggml-opencl.h install -- previously six local overlay patches) - 6d2d24bb / b1923e29 / 05afdc59 metal: tighten IM2COL_3D supports_op to match the CPU-reference invariants (#6) Repin vcpkg/ports/ggml from PR #5's head (bc053644) to PR #6's merge commit (05afdc59) on 2026-01-30, drop all seven local overlay patches since their content is now upstream verbatim, and bump port-version 102 -> 104 to force a clean rebuild of ggml. Net diff: +22 / -201; the overlay now exists only as a baseline pin that overrides the registry's ggml-org/ggml@a8db410a (which still lacks the Wan-required Metal ops). Once the registry baseline catches up to a ref containing this work, vcpkg/ports/ggml/ can be deleted entirely. Verified with npm run build on darwin-arm64: ggml@2026-01-30#104 builds fresh from 05afdc59 with zero patches applied, addon links and tests compile, prebuild installed. Co-authored-by: Cursor <cursoragent@cursor.com>

Bundle of correctness, hygiene, and CI-doc fixes from the recent code review. Each item below has its own paragraph in the diff comments. - #1 files-array: add test/utils/runSupertonicTTS.js + test/data/sentences-{medium,long}.js to package.json so consumers running the integration tests from the npm tarball don't crash with `Cannot find module ../utils/runSupertonicTTS`. - #2 deps: move @qvac/langdetect-text from runtime dependencies to devDependencies (it's only referenced from examples/, which aren't in the published files list). - #3 race-fix: ChatterboxModel::process()'s post-synthesize streaming detection used to read engine_->options() outside engineMu_, racing with reload(). synthesize() now returns SynthesizeResult { pcm, wasStreaming } where wasStreaming is captured under the engine lock against the local shared_ptr so process() doesn't have to touch engine_ again. - #4 deferred-load: ChatterboxModel + SupertonicModel constructors used to call load() eagerly, so JsInterface::createInstance() (sync on the JS thread) was parsing ~370 MB of GGUF on the Bare event loop. Both models now implement IModelAsyncLoad: constructors validate + return; the actual load is deferred to waitForLoadInitialization(), which the new addon_js::activate wraps inside JsAsyncTask::run so the parse runs on a worker thread. binding.cpp registers addon_js::activate in place of JsInterface::activate; tts.js now awaits the resulting promise. - #5 dead code: drop _resolvePath (unused), drop the (void)inputObj read in AddonJs.hpp::runJob, document FAILED_TO_PAUSE / FAILED_TO_STOP / JOB_ALREADY_RUNNING in lib/error.js as reserved-but- not-thrown so future maintainers don't delete them blindly (the unit suite asserts the values). - #6 cancel-reset: SupertonicModel grew Chatterbox's cancelRequested_ reset pattern: cancel() sets it, synthesize() fast-fails on it, process() resets it per call so a stale cancel doesn't poison the next run. - #7 useGPU comment: explain in JSAdapter::buildChatterboxConfig that the JS layer is the source of truth for useGPU and nGpuLayers wins downstream; left a pointer to std::optional<bool> if a future caller ever needs to distinguish "absent" from "explicit false". - #10 fork pointers: README.md and test/utils/downloadModel.js no longer point at GustavoA1604/chatterbox.cpp; both reference the upstream tetherto/qvac-ext-lib-whisper.cpp/tts-cpp tree now. - #9 doc: integration-mobile-test-tts-ggml.yml gained a header comment on the build-and-test job documenting that continue-on-error is the early-days landing posture (merge-guard treats success || skipped as pass), with a pointer to tighten once Device Farm provisioning is stable. Nits: - 'use strict' added to addonLogging.js (matches every other .js). - node-vs-bare runtime banners on scripts/{generate,validate}-mobile-integration-tests.js. - ttsOutputDebugString no longer JSON.stringify's the full PCM Int16Array on every chunk-streaming event; emits a tiny summary ({sampleRate, chunkIndex, isLast, sentenceChunk, outputArrayLen}) instead. Tests: 35 passing (33 -> 35; two new assertions cover the deferred-load contract); 4 skipped real-GGUF tests behind the existing QVAC_TEST_CHATTERBOX_T3_GGUF / QVAC_TEST_CHATTERBOX_S3GEN_GGUF / QVAC_TEST_SUPERTONIC_GGUF env-var gates. Lint clean. Co-authored-by: Cursor <cursoragent@cursor.com>

…#1983) * feat: add @qvac/tts-ggml package (Chatterbox English on qvac-tts.cpp) New Bare addon wrapping the `qvac-tts::qvac-tts` static library (backed by the `tts-cpp` port added in tetherto/qvac-registry-vcpkg). API-compatible with the Chatterbox engine exposed by `@qvac/tts-onnx` so downstream consumers can swap backends without touching orchestration code. ## Scope * First iteration. Supports Chatterbox **English** only. Chatterbox multilingual, LavaSR enhancer, Supertonic engine, and streaming are out of scope and remain in `@qvac/tts-onnx`. They'll land alongside the evolution of qvac-tts.cpp. * Native backend is the static `qvac-tts` library from the QVAC vcpkg registry (`ports/tts-cpp`, baseline `2026-04-21`). No ONNX Runtime dependency. ## JS surface * `@qvac/tts-ggml` exports `TTSGgml` with the same method shape as `ONNXTTS`: `run` / `runStream` / `runStreaming` / `reload` / `unload` / `destroy`. * `files: { modelDir }` looks for `chatterbox-t3-turbo.gguf` + `chatterbox-s3gen.gguf` side-by-side; `files.t3Model` / `files.s3genModel` override the defaults. * Options: `referenceAudio`, `voiceDir` (baked profile), `seed`, `nGpuLayers`, `threads`, `outputSampleRate`, plus placeholders for the upcoming streaming flags (`streamChunkTokens`, `streamFirstChunkTokens`, `cfmSteps`). * Shared reusable lib code (`lib/textChunker.js`, `lib/textStreamAccumulator.js`, `addonLogging.*`) is copied verbatim from `@qvac/tts-onnx`. * New error class `QvacErrorAddonTTSGgml` uses codes **13001–14000** to avoid collisions with `@qvac/tts-onnx` (7001–7011) when both packages are loaded in the same Bare process. ## Native addon * `addon/src/model-interface/chatterbox/ChatterboxModel.{hpp,cpp}` — `IModel` + `IModelCancel` implementation. First-iteration strategy: assemble argv for `qvac_tts_cli_main` with a scratch `.wav` output path, call it synchronously, then parse the resulting 16-bit mono PCM wav back into `std::vector<int16_t>` for the JS handler. Consequences: every job re-loads the model (~700 ms + inference time), no mid-synthesis cancellation, no streaming. The follow-up milestone replaces this with a persistent, struct-based API once qvac-tts.cpp exposes one. * `addon/src/js-interface/{JSAdapter.{hpp,cpp}, binding.cpp}` — JS-to-C++ config bridging (same string-map pattern as `@qvac/tts-onnx`) and the `BARE_MODULE(qvac_tts_ggml, ...)` registration exposing `createInstance` / `runJob` / `reload` / `activate` / `cancel` / `destroyInstance` / `loadWeights` / `setLogger` / `releaseLogger`. * `addon/src/addon/AddonJs.hpp` — JS-facing `createInstance` / `runJob` / `reload` wrappers that register a `JsAudioOutputHandler` emitting `{ outputArray: Int16Array, sampleRate: number }` to JS. ## Build / registry * `CMakeLists.txt` uses `find_package(qvac-tts-cpp CONFIG REQUIRED)` and the standard `cmake-bare` + `cmake-vcpkg` scaffolding (shape matches `@qvac/transcription-whispercpp`). * `vcpkg.json` depends on `tts-cpp` (with a `vulkan` feature passthrough) plus `qvac-lib-inference-addon-cpp`, `qvac-lint-cpp`, and `gtest`. * `vcpkg-configuration.json` points at tetherto/qvac-registry-vcpkg. NOTE: the baseline pin here is inherited from `@qvac/transcription-whispercpp` and **must be bumped** to a commit that contains the `tts-cpp` port once that registry PR lands. A follow-up commit will update it. ## Tests & examples * Integration + unit test files for Chatterbox English are copied verbatim from `@qvac/tts-onnx` with only mechanical renames (`ONNXTTS` -> `TTSGgml`, `QvacErrorAddonTTS` -> `QvacErrorAddonTTSGgml`, `@qvac/tts-onnx/text-chunker` -> `../../lib/textChunker.js`). Some paths in `test/integration/addon.test.js` still import Supertonic / LavaSR helpers that don't exist in this package — those test blocks will fail fast when the file loads, which is expected until those backends get their own ggml packages. * Examples: `chatterbox-tts.js`, `chatterbox-streaming-tts.js`, plus shared `wav-helper.js` + `pcm-chunk-player.js`. ## What's not in this PR (known gaps) * No docs: README, NOTICE, CHANGELOG, PULL_REQUEST_TEMPLATE changes will land in a single documentation pass once the registry + fork commits have merged upstream. * `vcpkg-configuration.json` baseline needs to point at a qvac-registry-vcpkg commit that ships `tts-cpp` (pending the registry PR). * Actual `npm run build` requires the registry and fork commits to be on `main` of their respective upstream repos. * chore: point tts-ggml vcpkg baseline at the tts-cpp-bearing registry commit Bumps `vcpkg-configuration.json` to GustavoA1604/qvac-registry-vcpkg at commit 1e2839680b6be8d8ffff889a9c29b966c176098c — the commit that adds the `tts-cpp` port. Paired with the `qvac-tts` library already pinned in the port's `portfile.cmake` (GustavoA1604/chatterbox.cpp @ 0fe4a521618cc30358040b29d75d4261b31cbb60). Will be re-pointed at tetherto/qvac-registry-vcpkg once the registry PR lands upstream. * chore: tts-ggml: trim tests + examples to Chatterbox English, restore mobile wrapper Second pass over @qvac/tts-ggml after the build started passing: prune everything that only made sense for the ONNX-era multi-engine scope and adapt the remaining Chatterbox-English bits to the GGUF + file-path reference-audio contract. Restores `test/mobile/` so the Android build has something to point at. ## C++ * `ChatterboxModel.cpp`: the `ArgvBuilder::buildArgv` doc comment contained `**/` which closed the block comment early and broke the build. Rewrote as a `//` comment. ## Examples * `examples/chatterbox-tts.js` — rewrite for v0 contract: single `<text>` argv, `files: { modelDir }` pointing at the two GGUFs, `referenceAudio` is now a wav **path** (addon passes it to `--reference-audio`) instead of a Float32Array. Drops english/multilingual arg and the CHATTERBOX_VARIANT switch that picked which `.onnx` files to load. * Removed `examples/chatterbox-streaming-tts.js` + `examples/pcm-chunk-player.js`. The v0 addon re-loads the model per `run()` call — exposing streaming would mislead. Both come back alongside the persistent-engine milestone. * `package.json`: `npm run example` now passes a default text so it runs without extra args. ## Tests ### Kept as-is (engine-agnostic) * `test/unit/textChunker.test.js` * `test/mock/{MockedBinding,utils}.js` * `test/utils/{wav-helper,pcmConcatenator,loader.fake,runWhisper,runTTS}.js` * `test/reference-audio/jfk.wav`, `test/data/sentences-*.js` ### Mechanical fixes * `test/unit/tts.error.test.js` — fix error-code assertions to the tts-ggml range (`13001–14000`); was still checking the `@qvac/tts-onnx` range (`7001–7011`). * `test/unit/tts-ggml.lifecycle.test.js` — fix stale `QvacErrorAddonTTS` import to `QvacErrorAddonTTSGgml`; switch the stubbed model to `{ t3Model, s3genModel }` GGUFs and drop the non-existent `engine: 'chatterbox'` option. * `test/unit/tts-ggml.sentence-stream.test.js` — same GGUF/engine cleanup. ### Rewritten * `test/unit/chatterbox.inference.test.js` — drop tests that asserted the old ONNX file shape (`tokenizer / speechEncoder / embedTokens / conditionalDecoder / languageModel`), the removed `engine` detection and the wrong `getModelKey` return value (`'onnx-tts'` -> `'tts-ggml'`). New tests cover: `modelDir` derives the two GGUF paths; explicit `t3Model` / `s3genModel` override the defaults. The mocked-binding run/reload/cancel flow stays. * `test/integration/addon.test.js` — fresh, ~180 LoC, Chatterbox-English only. Ensures the GGUFs are present, runs the short sentence set through `loadChatterboxTTS` + `runChatterboxTTS[WithSplit]`, and (on darwin only) runs a whisper-based WER check via the existing `runWhisper` util. Drops the Chatterbox-multilingual block + every Supertonic + LavaSR block that doesn't apply to this package. * `test/utils/runChatterboxTTS.js` — rewrite for the GGUF contract: `files: { modelDir, t3Model, s3genModel }`, `referenceAudio` as a file path that falls back to `test/reference-audio/jfk.wav` (or the mobile test-asset when `global.assetPaths` is present). No more WAV decode / resample on the JS side. * `test/utils/downloadModel.js` — trim from 1007 LoC to 280. Drops the Supertonic + LavaSR + Chatterbox-multilingual + Cangjie downloaders. Keeps the shared HTTP/curl infrastructure and `ensureWhisperModel` (still used by the integration WER check). `ensureChatterboxModels` is now **check-only**: it verifies `chatterbox-t3-turbo.gguf` + `chatterbox-s3gen.gguf` exist locally and, if missing, prints the exact commands for generating them from the qvac-tts.cpp (née chatterbox.cpp) conversion scripts. Once the GGUFs land on a canonical HuggingFace repo we'll wire up download URLs here. ## Scripts * `scripts/ensure-chatterbox.js` — simplify to a single invocation against `./models/`. Drops the variant / language matrix that the ONNX downloader needed. * `scripts/ensure-models.js` — now a thin alias to `ensure-chatterbox.js`. Drops the Supertonic + LavaSR orchestration. ## Mobile * Restored `test/mobile/{integration.auto.cjs, integration-runtime.cjs, testAssets/jfk.wav}` so the Android build has a wrapper to point at. * `package.json`: re-added `test/mobile` to the `files` list. ## Gitignore * Ignore generated `.clang-format` / `.clang-tidy` / `.valgrind.supp` (produced by the top-level `configure_file(...)` calls) and `build_*/` dirs (bare-make convention). ## Verified locally * `npx standard "test/**/*.js" "*.js" "lib/*.js"` — clean. * `npm run test:unit` — 38/38 pass (105/105 asserts). * `npm run build && bare examples/chatterbox-tts.js "Hello from qvac tts ggml."` produces a 24 kHz wav as expected. * Add streaming support * Update ggml backend to use separate ggml repo * tts-ggml: consume renamed tts-cpp library (2026-04-24#1) Upstream chatterbox.cpp renamed the package + namespace + target from qvac-tts to tts-cpp and tightened the library boundary; pick up the new artefacts here: - find_package(qvac-tts-cpp CONFIG REQUIRED) -> find_package(tts-cpp CONFIG REQUIRED) - qvac-tts::qvac-tts -> tts-cpp::tts-cpp - qvac_tts::chatterbox -> tts_cpp::chatterbox (engine ptrs, EngineOptions, SynthesisResult, forward-decls in ChatterboxModel.hpp) - #include <qvac-tts/chatterbox/engine.h> -> #include <tts-cpp/chatterbox/engine.h> - Doxygen / inline doc references to the old names refreshed alongside the code changes. vcpkg wiring: - vcpkg-configuration.json baseline bumped to qvac-registry-vcpkg commit bc30b0b (ports/tts-cpp renamed and repointed at chatterbox.cpp@f8f9145). - vcpkg.json tts-cpp constraint bumped to 2026-04-24#1 (the port that carries the rename + namespace + install(EXPORT) changes). Verified with a cold bare-make generate + bare-make build against the new port, and the addon's existing unit + integration test suites. Made-with: Cursor * tts-ggml: bump tts-cpp port to 2026-05-07 + registry baseline Picks up the round-3 review-fix wave landed on the tts-cpp port: e673182 scrub stale patches/ refs from README (N10) 8ba10a6 drop unreachable TTS_CPP_GGML_LIB_PREFIX block (N8) 4b5d2d7 mirror N1-N7 fixes from chatterbox.cpp source-of-truth - N1 supertonic alive-registry guard against freed-backend gallocr_free assert on hot-swap (Vulkan/Metal/CUDA) - N2 drop dead g_sink_* state, soften log_set docstring - N3 Turbo BPE try/catch (exception-safe Engine ctor) - N4 STFT cancel checkpoint + tighter Engine::cancel() doc - N5 document s3gen_preload/unload refcount semantics - N6 drop dead cached_text_lc Supertonic shim - N7 fix misleading "no copy" view-vs-copy log wording Plus the integrated-port-only round-2 fixes that landed earlier: fa0d490 close patches/-deleted regression: TTS_CPP_USE_SYSTEM_GGML now defaults ON; bundled-without-patches hard-errors at configure time with a pointer at the ggml-speech vcpkg port. ae34c58 README rewritten for integrated/vcpkg context. a2f2dd6 top-level qvac-ext-lib-whisper.cpp README points at the tts-cpp/ subtree (alongside parakeet-cpp/). Public API used by ChatterboxModel (tts_cpp::chatterbox::Engine / EngineOptions / SynthesisResult / s3gen_preload / s3gen_unload) is backward-compatible: the new port adds Engine::backend_name(), MTL-variant fields on EngineOptions (language / cfg_weight / min_p / exaggeration), and a separate tts_cpp::supertonic::Engine class, but nothing this consumer was already calling has changed. Edits: packages/tts-ggml/vcpkg.json - tts-cpp dep: version>=2026-04-24#1 -> version>=2026-05-07. packages/tts-ggml/vcpkg-configuration.json - default-registry baseline: bc30b0b (April 2026 fork-only state) -> 16b91afdcfd59baea60e81f3da94f49311ef2a97. The new baseline pulls in the post-tetherto-merge state (parakeet-cpp port at 932d5d9, ggml-speech port-version 1 at f07bdd0) plus the new tts-cpp port (16b91af) on the developer's GustavoA1604 registry fork. Smoke-test plan: after running `vcpkg install` against the new baseline, the tts-cpp port's vcpkg_from_github resolves at GustavoA1604/qvac-ext-lib-whisper.cpp@e673182 (tts-cpp branch) until the upstream PR merges. ChatterboxModel should build and synthesize identically; expanding to Multilingual + Supertonic flows is the follow-up commit on the package side. Co-authored-by: Cursor <cursoragent@cursor.com> * Add chatterbox multilingual and supertonic * Add mobile integration tests * tts-ggml: drop clang-19 pin in linux-clang toolchain The toolchain hardcoded `clang-19` / `clang++-19` (versioned binary names) since the package's first commit (0a2c978). Linux CI hadn't exercised this path before — the new on-pr-tts-ggml.yml -> integration matrix is the first time it does, and it fails on every linux runner (ai-run-ubuntu-22.04, ai-run-linux-gpu, ubuntu-24.04-arm) at vcpkg's "detect_compiler" step because none of the GH-hosted images ship a `clang-19` symlink: Detecting compiler hash for triplet x64-linux... error: while detecting compiler information: ... CMake Error at scripts/cmake/vcpkg_execute_required_process.cmake:127 (message): Command failed: ... -DVCPKG_CHAINLOAD_TOOLCHAIN_FILE= .../tts-ggml/vcpkg/triplets/../toolchains/linux-clang.cmake ... Match parakeet's working pattern (qvac-lib-infer-parakeet/vcpkg/ toolchains/linux-clang.cmake): use unversioned `clang` / `clang++` so each runner picks up its image's default clang (clang-15 on ubuntu-22.04, clang-18 on ubuntu-24.04, whatever the AI runners ship). The `-stdlib=libc++` flag added by x64-linux.cmake / arm64-linux.cmake is honoured by every reasonable clang version. Co-authored-by: Cursor <cursoragent@cursor.com> * Add C++ tests and coverage; fix linux build * tts-ggml: address PR review feedback Bundle of correctness, hygiene, and CI-doc fixes from the recent code review. Each item below has its own paragraph in the diff comments. - #1 files-array: add test/utils/runSupertonicTTS.js + test/data/sentences-{medium,long}.js to package.json so consumers running the integration tests from the npm tarball don't crash with `Cannot find module ../utils/runSupertonicTTS`. - #2 deps: move @qvac/langdetect-text from runtime dependencies to devDependencies (it's only referenced from examples/, which aren't in the published files list). - #3 race-fix: ChatterboxModel::process()'s post-synthesize streaming detection used to read engine_->options() outside engineMu_, racing with reload(). synthesize() now returns SynthesizeResult { pcm, wasStreaming } where wasStreaming is captured under the engine lock against the local shared_ptr so process() doesn't have to touch engine_ again. - #4 deferred-load: ChatterboxModel + SupertonicModel constructors used to call load() eagerly, so JsInterface::createInstance() (sync on the JS thread) was parsing ~370 MB of GGUF on the Bare event loop. Both models now implement IModelAsyncLoad: constructors validate + return; the actual load is deferred to waitForLoadInitialization(), which the new addon_js::activate wraps inside JsAsyncTask::run so the parse runs on a worker thread. binding.cpp registers addon_js::activate in place of JsInterface::activate; tts.js now awaits the resulting promise. - #5 dead code: drop _resolvePath (unused), drop the (void)inputObj read in AddonJs.hpp::runJob, document FAILED_TO_PAUSE / FAILED_TO_STOP / JOB_ALREADY_RUNNING in lib/error.js as reserved-but- not-thrown so future maintainers don't delete them blindly (the unit suite asserts the values). - #6 cancel-reset: SupertonicModel grew Chatterbox's cancelRequested_ reset pattern: cancel() sets it, synthesize() fast-fails on it, process() resets it per call so a stale cancel doesn't poison the next run. - #7 useGPU comment: explain in JSAdapter::buildChatterboxConfig that the JS layer is the source of truth for useGPU and nGpuLayers wins downstream; left a pointer to std::optional<bool> if a future caller ever needs to distinguish "absent" from "explicit false". - #10 fork pointers: README.md and test/utils/downloadModel.js no longer point at GustavoA1604/chatterbox.cpp; both reference the upstream tetherto/qvac-ext-lib-whisper.cpp/tts-cpp tree now. - #9 doc: integration-mobile-test-tts-ggml.yml gained a header comment on the build-and-test job documenting that continue-on-error is the early-days landing posture (merge-guard treats success || skipped as pass), with a pointer to tighten once Device Farm provisioning is stable. Nits: - 'use strict' added to addonLogging.js (matches every other .js). - node-vs-bare runtime banners on scripts/{generate,validate}-mobile-integration-tests.js. - ttsOutputDebugString no longer JSON.stringify's the full PCM Int16Array on every chunk-streaming event; emits a tiny summary ({sampleRate, chunkIndex, isLast, sentenceChunk, outputArrayLen}) instead. Tests: 35 passing (33 -> 35; two new assertions cover the deferred-load contract); 4 skipped real-GGUF tests behind the existing QVAC_TEST_CHATTERBOX_T3_GGUF / QVAC_TEST_CHATTERBOX_S3GEN_GGUF / QVAC_TEST_SUPERTONIC_GGUF env-var gates. Lint clean. Co-authored-by: Cursor <cursoragent@cursor.com> * tts-ggml: unblock CI integration tests on every desktop runner Four independent failures, one per platform: 1. linux-x64 / linux-arm64: addon load crashed at `libomp.so.5: cannot open shared object file`. tts-cpp's binary is built with clang under the linux-clang toolchain and links against libomp (LLVM OpenMP runtime); only `libgomp1` (GNU OpenMP) was being apt-installed. Add `libomp5` so libomp.so.5 is on the loader path. 2. darwin-arm64: convert-models.sh aborted at line 200 with `hf_args[@]: unbound variable`. macOS's system bash is 3.2 which treats `"${arr[@]}"` as nounset access when the array is empty under `set -u`; with HF_TOKEN unset we hit it on every fresh runner. Use the `${arr[@]+"${arr[@]}"}` idiom (defined-or-nothing) at all six call sites and add a header comment so the next maintainer doesn't accidentally regress. 3. darwin-x64: pip install bombed building `llvmlite` from source because the macos-15-large runner has no LLVM 15 development install. Root cause: librosa pulls in numba 0.65+, which stopped shipping darwin-x86_64 wheels for Python 3.12. Pin Python to 3.11 in the Setup Python step; 3.11 has prebuilt wheels for the entire numba/llvmlite/librosa stack on darwin-x64 and is fine for every other converter dependency. 4. windows-2022: ChatterboxModel::load threw `vk::createInstance: ErrorIncompatibleDriver`. Root cause: the addon's index.js::_validateConfig defaults `useGPU = true` when neither useGPU nor nGpuLayers is specified, so the test ran with n_gpu_layers=99 -> ggml_backend_vk_init -> vk::createInstance -> ErrorIncompatibleDriver on the runner's no-Vulkan-driver image. runChatterboxTTS.js now honours `process.env.NO_GPU === 'true'` (set on the no-GPU matrix entries) and forces useGPU=false on exactly those runners; the other test runners (chatterbox-mtl, gpu-smoke, multiple-runs) already had this guard. Also documents the `mesa-vulkan-drivers` apt package (already pulled in) as the software ICD that lets the Vulkan-built prebuild's runtime backend probe enumerate at least one device on linux runners. Co-authored-by: Cursor <cursoragent@cursor.com> * tts-ggml: drop Chatterbox from mobile bundle (Metro V8 string limit) Mobile build failed at `:app:createBundleReleaseJsAndAssets` with: SyntaxError: assets/testAssets/chatterbox-s3gen.gguf: Cannot create a string longer than 0x1fffffe8 characters Root cause: Metro's bundler reads every asset under `test/mobile/testAssets/` via `Buffer.toString()`. V8's max string length is 0x1fffffe8 (~512 MiB). chatterbox-s3gen.gguf is ~1 GiB even with --quant q4_0 because the s3gen converter only quantizes attention weights and leaves the bulk of the s3gen graph in fp16 ("0/291 weight tensors quantized" in the converter log). Fix: bundle ONLY supertonic.gguf (~125 MiB, comfortably under the limit) on mobile. Mobile Chatterbox tests degrade cleanly to `t.pass('Skipped: Chatterbox GGUFs not available')` via the existing `ensureChatterboxModels` helper -- it already returns { success: false } when the GGUFs aren't on disk. Cache key bumped to v2 so existing v1 cache entries (which include the chatterbox files) are evicted on the next run. Bundling Chatterbox on mobile requires either: - adding `gguf` to qvac-test-addon-mobile's metro `assetExts` so the JS-string read is skipped (then the s3gen file can flow through the bundle as a raw asset), or - pushing the chatterbox GGUFs to the device via `adb push` outside the bundle and surfacing the path through downloadModel.js's existing ANDROID_CANDIDATE_DIRS fallback. Both are outside the scope of this PR; documented inline above the cache step for the next maintainer. Co-authored-by: Cursor <cursoragent@cursor.com> * Bump hash of vcpkg * Consume vcpkg from tetherto repository * Fix integration tests failures in all platforms * Further fix tests * fix: Make useGPU flag more meaningful (#1953) * fix[api]: make useGPU flag actually force CPU/GPU and reject useGPU/nGpuLayers conflicts * add gpu smoke test * resolve comments --------- Co-authored-by: Ishan Vohra <ishanvohra@Ishans-MacBook-Air.local> * Update dependencies after monorepo directory changes * Further drop qvac-lib- prefix * Add CHANGELOG.md --------- Co-authored-by: Cursor <cursoragent@cursor.com> Co-authored-by: Ishan Vohra <ishanvohra2@gmail.com> Co-authored-by: Ishan Vohra <ishanvohra@Ishans-MacBook-Air.local>

* feat(diffusion): refactor download scripts and add Wan 2.1 support - Extract shared dl() function into reusable dl-functions.sh module - Update all download-model-*.sh scripts to source shared utilities - Add download-model-wan.sh for Wan 2.1 video generation models - Reduces code duplication and improves maintainability Wan 2.1 downloads (~8.3 GB): - wan2.1_t2v_1.3B_fp16.safetensors (diffusion model) - wan_2.1_vae.safetensors (VAE encoder/decoder) - umt5_xxl_fp16.safetensors (text encoder) Co-authored-by: Cursor <cursoragent@cursor.com> * feat(diffusion): Wan video foundation -- ctx/vid handlers, AVI muxer, shared parsers Phase 1-4 of Wan 2.1 / 2.2 video generation support in the diffusion-cpp addon. Configuration + parsing layer only; dispatch + callback plumbing + JS surface land in follow-up commits on this branch. SdCtxConfig: - Add highNoiseDiffusionModelPath for Wan 2.2 MoE high-noise expert (leave empty for Wan 2.1 and all non-Wan models) - Add previewMode / previewInterval / previewDenoised / previewNoisy for optional mid-denoising preview frames via sd_set_preview_callback - Wire both through SdCtxHandlers (new JS keys: preview_mode, preview_interval, preview_denoised, preview_noisy) and AddonJs (highNoiseDiffusionModelPath in args map) AviWriter (new utility): - addon/src/utils/AviWriter.{hpp,cpp} ports the upstream avi_writer.h MJPG encoder onto an in-memory std::vector<uint8_t> sink (no stdio, no temp files) so video bytes flow through the existing OutputCallBackJs queue - Full input validation (numFrames, fps, jpegQuality, channel count, frame homogeneity, null data) -- StatusError on any rejection SdParsers (new shared module): - Extract parseSampler / parseScheduler / parseCacheMode / parseVaeTileSize / parseCachePreset / requireNum/Str/Bool from SdGenHandlers into addon/src/handlers/SdParsers.{hpp,cpp} - Reused by both SdGenHandlers (image) and SdVidGenHandlers (video) SdVidGenHandlers (new): - SdVidGenConfig struct with full Wan 2.1 + 2.2 surface: mode (txt2vid/img2vid/flf2vid), prompts, dimensions, videoFrames (4k+1 validated), fps, seed, low-noise expert sample params, high-noise expert sample params, moeBoundary, strength, vaceStrength, VAE tiling, cache mode/preset/threshold - 22 JSON handlers with validation for each field Tests (all pass): - 5 new SdCtxHandlers tests for preview_* + high_noise path default - 18 new AviWriter tests covering happy path, RIFF header structure, all validation rejections, JPEG round-trip - 54 new SdVidGenHandlers tests covering every field + integration payload + defaults - Zero regressions across existing 144 fast-unit tests No user-facing JS API changes yet. Co-authored-by: Cursor <cursoragent@cursor.com> * feat(diffusion): Wan video generation -- dispatch, processVideo, JS wrapper + examples Builds on the Wan foundation commit by wiring the video path end-to-end from JS to C++ and back. Adds txt2vid / img2vid / flf2vid generation via a new VideoStableDiffusion class that shares the single native binding with the existing ImgStableDiffusion class. Native: - SdModel::process() dispatches on the JSON "mode" field to processImage() (existing) or the new processVideo() path. - processVideo() applies SdVidGenHandlers, validates mode-vs-inputs invariants (img2vid requires init_image; flf2vid requires both; txt2vid rejects both; end_image only valid on flf2vid), decodes init/end/control frames, fills sd_vid_gen_params_t, and encodes the returned sd_image_t* sequence to an in-memory MJPG AVI. - SdVideoFrames RAII wrapper extracted to addon/src/utils/ so it can be unit-tested without a loaded model. - GenerationJob grows endImageBytes and controlFramesBytes plus an optional per-frame frameCallback (unused from JS in this PR; reserved for the preview follow-up). - AddonJs::runJob reads endImageBuffer (single Uint8Array) and controlFramesBuffers (Array of Uint8Array) as typed-array args, no JSON encoding. JS surface: - video.js / video.d.ts: new VideoStableDiffusion class with full per-mode validation, 4k+1 frame-count rule, fps range, moe_boundary range, Uint8Array type checks, and warning when high_noise_* params are set without files.highNoiseDiffusionModel. - addon.js: SdInterface.runJob threads end_image and control_frames through to the native runJob without round-tripping through JSON. - index.js / index.d.ts: unchanged -- image wrapper continues to work exactly as before. Both classes compose the same SdInterface and hit the same binding.cpp entry points. - package.json: exports "./video", ships video.js / video.d.ts, adds generate:video / generate:img2vid / generate:flf2vid scripts. Examples: - examples/generate-video-wan.js (txt2vid @ 832x480, 33 frames) - examples/img2vid-wan.js (reuses assets/von-neumann.jpg as first frame) - examples/flf2vid-wan.js (expects flf-first.png / flf-last.png) Tests: - test_sd_video_frames.cpp: 12 RAII tests (empty states, destruction of 4k+1 production sizes, null-pixel tolerance, bounds-checked operator[], compile-time copy/move deletion). - test_wan_video.cpp: 12 validation tests reusing the SD2.1 context to satisfy isLoaded() and exercise every processVideo() guard before generate_video() runs; plus an opt-in happy-path smoke test (SD_RUN_WAN_SMOKE=1) gated off by default because ggml-metal lacks IM2COL_3D for Wan's 3D convs. Gates: npm run lint, npm run test:dts, npm run build, and the fast subset of addon-test (178/178) all pass. Co-authored-by: Cursor <cursoragent@cursor.com> * feat(diffusion): Wan video tests, ggml overlay, example tuning Add a vcpkg overlay-port for ggml at vcpkg/ports/ggml/ that pins tetherto/qvac-ext-ggml @ feature/metal-pr-16669-clean (commit bc053644). The fork adds Metal kernels for IM2COL_3D and 3-axis PAD-left, both required by Wan 2.1 / 2.2 video generation; without them ggml hard-aborts mid-run with "unsupported op 'IM2COL_3D'". Rationale lives in portfile.cmake -- the overlay is transient and will be removed once the registry baseline rolls forward. Add JS test coverage for VideoStableDiffusion: - test/unit/video-validation.test.js: 63 input-validation cases mirroring the existing input-validation.test.js pattern. - test/integration/generate-video-wan.test.js: opt-in (WAN_INTEGRATION=1) end-to-end T2V smoke test plus sniffAvi self-tests. Tune the Wan examples: - generate-video-wan.js: env-var-driven (PROMPT, FRAMES, STEPS, SEED, CFG_SCALE, FLOW_SHIFT, ...), inline frame-count cheat sheet, (4*k+1) pre-flight check, default FRAMES bumped to 81 (Wan 1.3B's native training length). - img2vid-wan.js, flf2vid-wan.js: flow_shift 5.0 -> 3.0 to match the upstream test-wan reference scripts. Refresh the C++ smoke-test gating doc in test_wan_video.cpp to reflect that Metal works once the overlay is in place. Drop build.md: the vcpkg overlay rationale already lives next to the overlay (portfile.cmake header), and transient infrastructure doesn't earn its own long-form doc. Co-authored-by: Cursor <cursoragent@cursor.com> * docs(diffusion-cpp): restore build.md The earlier deletion conflated build.md with the vcpkg overlay rationale, but build.md is the package's standalone build guide (prerequisites, build pipeline, cross-compilation, troubleshooting) and is still the target of README.md's "Building from Source" link. Restore it from main, which also picks up the LLVM 19 -> 22 bump. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(diffusion-cpp): address PR review feedback for Wan video gen * Flip default video dimensions to 480x832 portrait (phone-screen friendly). Wan 2.1 T2V 1.3B handles both orientations equally well; the previous 832x480 landscape default disagreed with the example. * Document the flow_shift=0 fall-through sentinel in JSDoc, .d.ts, and C++ struct/handler comments; correct stale "5-8" recommendation to the actually-used 3.0 (matches example + ref scripts). * Make video_frames error messages consistent JS<->C++ and list the full valid set up to 81 (Wan 1.3B native training cap). * Fix frame-duration arithmetic (33 frames is ~2s @ default 16 fps, not ~1.3s @ 24 fps). * Warn when upscaler_* keys are passed to VideoStableDiffusion -- ESRGAN upscale is image-only and was being silently ignored. * Annotate addon.js end_image / control_frames forwarding to call out the typed-array transport (avoids JSON byte-array bloat). * Document the two-level concurrency model around _hasActiveResponse (the busy guard isn't dead under exclusiveRunQueue -- it covers overlap between the released queue lock and an in-flight response). * Update C++ defaults test + JS suggestion-fallback test for the new portrait orientation. Co-authored-by: Cursor <cursoragent@cursor.com> * chore(diffusion-cpp): retarget ggml overlay to merged tetherto/qvac-ext-ggml@2026-01-30 The Wan-Metal work that was carried as a local overlay has all landed upstream on tetherto/qvac-ext-ggml's 2026-01-30 branch: - bc053644 metal: IM2COL_3D op + PAD left-padding for Wan video (#5) - 512e1773 cmake: support qvac hybrid backend packaging (static CPU + dynamic GPU backends, GGML_MAX_NAME prop, graceful no-OpenCL-device fallback, public ggml-opencl.h install -- previously six local overlay patches) - 6d2d24bb / b1923e29 / 05afdc59 metal: tighten IM2COL_3D supports_op to match the CPU-reference invariants (#6) Repin vcpkg/ports/ggml from PR #5's head (bc053644) to PR #6's merge commit (05afdc59) on 2026-01-30, drop all seven local overlay patches since their content is now upstream verbatim, and bump port-version 102 -> 104 to force a clean rebuild of ggml. Net diff: +22 / -201; the overlay now exists only as a baseline pin that overrides the registry's ggml-org/ggml@a8db410a (which still lacks the Wan-required Metal ops). Once the registry baseline catches up to a ref containing this work, vcpkg/ports/ggml/ can be deleted entirely. Verified with npm run build on darwin-arm64: ggml@2026-01-30#104 builds fresh from 05afdc59 with zero patches applied, addon links and tests compile, prebuild installed. Co-authored-by: Cursor <cursoragent@cursor.com> * chore(diffusion-cpp): drop local ggml overlay now that registry serves 2026-01-30#7 The previous commit (04a6496) repointed the local ggml overlay at the merge of tetherto/qvac-ext-ggml#6 (05afdc59) so Wan video generation on Metal would stop aborting with `unsupported op 'IM2COL_3D'`. That same ref has now been promoted into the registry: tetherto/qvac-registry-vcpkg#134 landed on main as d1b2497b, bumping ggml port-version 6 -> 7 against the identical REF + SHA512 the overlay was carrying. This means the diffusion-cpp-local overlay is now strictly redundant -- and slightly behind, since the registry's port-version 7 also picks up two improvements the overlay didn't have: - iOS gets `-DGGML_BLAS=OFF -DGGML_ACCELERATE=OFF` to keep the build off the Apple Accelerate / BLAS path that breaks the iOS toolchain. - The Android backend-glob now also matches `libqvac-ggml-*.so` in addition to `libggml-*.so`, so the qvac-prefixed DL backends get installed alongside the upstream-named ones. So we delete the entire `vcpkg/ports/ggml/` overlay (portfile.cmake, vcpkg.json, usage, android-vulkan-version.cmake) and: - Bump `vcpkg-configuration.json`'s default-registry baseline from a9eae49a -> d1b2497b (the merge commit of registry PR #134), which is the first registry SHA that serves ggml@2026-01-30#7. - Tighten `vcpkg.json`'s ggml constraint from `version>=: 2026-01-30#5` to `version>=: 2026-01-30#7` so any later baseline bump can't silently drop us back below the Wan-Metal pin. The `overlay-ports: ["vcpkg/ports"]` entry and the `vcpkg/ports/.gitkeep` marker are kept in place so future overlays can be added without a config flap. Verified end-to-end on darwin-arm64: clean `npm run build` (bare-make generate + build + install) with the build/ tree wiped. vcpkg resolves ggml[core,metal]:arm64-osx@2026-01-30#7 -- git+https://github.com/tetherto/qvac-registry-vcpkg.git@f1632875... straight from the registry (no overlay), all 8 ports install in 47s, the addon links cleanly against the registry-supplied libggml*.a, and prebuilds/darwin-arm64/qvac__diffusion-cpp.bare is rewritten. Net diff: +2 / -283. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(diffusion-cpp): satisfy standard quotes rule in validateVideoFrames The middle line of the validateVideoFrames Error message was a template literal with no `${...}` interpolation, so `standard` (configured via `npm run lint`) flags it as `quotes`: video.js:39:7: Strings must use singlequote. Adjacent lines 37, 38 use single quotes, and line 40 legitimately uses backticks for `${n}`. Just the one stray backtick-string -- swap to single quotes; no behaviour change. Sanity-checks job 74830306544 on PR #1879 fails on this single line; `npm run lint` passes locally after the swap. Co-authored-by: Cursor <cursoragent@cursor.com> * diffusion-cpp: enable diffusion FA in examples and fix addon paths - Set diffusion_fa: true across SD, FLUX, and integration test ImgStableDiffusion configs so diffusion flash attention matches WAN video examples. - Pass highNoiseDiffusionModelPath (empty when unset) from index.js so native createInstance validation succeeds for image mode; document optional files.highNoiseDiffusionModel in index.d.ts and validate absolute paths. Co-authored-by: Cursor <cursoragent@cursor.com> * diffusion-cpp(video): pass esrganPath to native createInstance VideoStableDiffusion omitted esrganPath while the binding validates it as a string; mirror image-mode by forwarding files.esrgan or empty string. Co-authored-by: Cursor <cursoragent@cursor.com> * diffusion-cpp: align C++ includes and image codec with inference-addon-cpp - Switch remaining qvac-lib-inference-addon-cpp includes to inference-addon-cpp (vcpkg installs headers under the shorter prefix). - Use image_codec::decodeImage / encodeToPng in processVideo after ImageCodec API rename from decodePng. Co-authored-by: Cursor <cursoragent@cursor.com> * diffusion-cpp: apply clang-format to changed C++ sources Run git-clang-format against ce2ea93 to satisfy the repo formatter on the video addon, image codec, and Wan tests. No behavior changes. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(diffusion-cpp/video): address review comments 1-3 1. Use global addonLogging instead of per-instance setLogger/releaseLogger - Eliminates process-global logger collision (was reintroduced in video.js) - Mirrors fix from ImgStableDiffusion / EsrganUpscaler - video.js no longer manages per-instance logger state 2. Reject width/height values <= 0 in JS validation - Now validates that width > 0 and height > 0 before alignment check - Error message updated to say "positive multiples of 8" - Updated test expectations to match new message 3. Validate double values are integers before casting in C++ - All int casts now check std::floor(d) == d first - Affects: width, height, video_frames, fps handlers - Prevents silent truncation (e.g. 8.5 -> 8) All 70 unit tests pass; build/lint/dts all clean. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(diffusion-cpp/video): address review comments 4-7 4. Validate end_image / control_frames dimensions match video dimensions - Added dimension checks in processVideo() before generate_video() - Rejects mismatched frame sizes with clear error messages - Prevents silent corruption or undefined behavior in native layer 5. Use ImageCodec ownership helper instead of raw free() - Replaced FrameBuffersGuard with unique_ptr<uint8_t, FreeDeleter> - Consistent with existing image_codec ownership pattern - Automatic cleanup on exception; no manual free() calls 6. Regenerate mobile integration test manifest - Ran npm run test:mobile:generate - Updated test/mobile/integration.auto.cjs with new runners 7. Add checked buffer size calculation in AviWriter - Validates width * height overflow before multiplication - Validates numFrames * bytesPerFrame overflow - Rejects allocations that would exceed SIZE_MAX - Prevents silent integer overflow in reserve() call All 70 unit tests pass; build/lint/dts all clean. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(diffusion-cpp/video): harden int validation, ownership, AVI overflow Follow-up tightening on top of the review fixes for #1879. SdVidGenHandlers: - Extract a single requireInt() helper used by width / height / video_frames / fps / requirePositiveInt. The helper rejects NaN, +/-inf, fractional doubles, and values outside [INT_MIN, INT_MAX] before static_cast<int>, so casts to int are always well-defined and no JSON value silently truncates (e.g. 8.5 -> 8). - Add <cmath>/<climits> includes that were transitively available. SdModel::processVideo: - Replace the bespoke FrameBuffersGuard struct with three plain unique_ptr<uint8_t, image_codec::FreeDeleter> values (initData / endData / controlData). Same lifetime semantics, less custom code, and the control-frame dimension mismatch path now takes ownership *before* the check so a throw can no longer leak the freshly-decoded buffer. AviWriter::encodeFramesToAvi: - Reserve calculation is now step-wise overflow-checked against SIZE_MAX (width vs height vs *3 vs *numFrames) instead of a single multiply that could wrap. - Add a hard upper bound at UINT32_MAX (AVI 1.0 RIFF size header is a uint32_t -- anything past 4 GB cannot be addressed by the spec). - Re-check the final size before patching the RIFF header in case JPEG output overshoots the pre-flight estimate. Tests: - SdVidGenHandlers: new IntCoercion suite covers fractional doubles, out-of-int-range doubles, picojson's own NaN/inf rejection at the JSON layer, and integer-valued doubles (the common case from JSON). - AviWriter: new tests for the overflow guard and the 4 GB RIFF cap, both fire before any encoding starts. - test_wan_video: pin width/height in the existing CorruptControlFrame test so the new dimension check passes for frame [0] and we still exercise the decode-failure path at frame [1]. Add two new cases covering end_image and control_frames dimension mismatch. All 211 C++ tests, 70 JS unit tests, lint and tsc --dts pass. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(diffusion-cpp/video): don't eager-require binding via addonLogging CI sanity-checks (JS unit tests on a runner with no native prebuild) was crashing with `AddonError: ADDON_NOT_FOUND` because the top-level `require('./addonLogging')` introduced in e6b13ae transitively pulled in `binding.js` -> `libqvac__diffusion-cpp.so`. The unit tests only exercise JS-side validation and never call `load()`, so they used to work without the prebuilt addon -- this regression broke that. Match `ImgStableDiffusion` instead: drop the per-instance native logger plumbing entirely (it's dead code anyway after the e6b13ae refactor, since `_connectNativeLogger` was no longer called), and document in the constructor JSDoc that callers wire up native C++ logs once globally via `addonLogging.setLogger(...)`. Net diff: - Remove `const addonLogging = require('./addonLogging')` at top. - Remove `_connectNativeLogger` / `_releaseNativeLogger` methods and their two stale call sites. - Remove `LOG_METHODS` (only used by the removed method) and `this._binding` (used to keep a handle for the removed release path; the binding is now scoped to `_createAddon` only, matching `ImgStableDiffusion::_createAddon`). - JSDoc on `args.logger` now mirrors `index.js` and points users at `addonLogging.setLogger`. Verified: JS unit tests 70/70 pass with the prebuilds directory moved aside, lint clean, tsc --dts clean. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(diffusion-cpp/video): validate init_image dims; reject unsupported lora Two reviewer-flagged regressions on PR #1879: 1. blocker (gabrielgrigoras-serv): processVideo() validates dimensions for end_image and every control_frames[i] but not for init_image. A caller passing width/height that don't match the decoded init_image would hand mismatched (width, height) and frame pixel stride to generate_video(), producing inconsistent frame data downstream (and risking VAE segfaults). Fix: add the same dimension check in SdModel.cpp processVideo() right after the init_image decode, throwing StatusError on mismatch -- consistent with the existing end_image / control_frames checks. All three checks now compare against vid.width / vid.height as the single source of truth for the video's final dimensions. Ownership of the freshly-decoded init pixel buffer is taken into the unique_ptr *before* the dim check, mirroring the control_frames path so a mismatch can't leak the buffer. 2. gianni-cor: params.lora silently dropped on the video path -- video.js validated it as a non-empty absolute path and video.d.ts advertised `lora?: string`, but SD_VID_GEN_HANDLERS has no "lora" entry and SdModel::processVideo never touches sd_vid_gen_params_t::loras, so any LoRA passed through was swallowed by the unknown-keys branch in applySdVidGenHandlers and silently produced LoRA-less output. Fix B applied (reviewer's preferred "out of scope" option): - video.js: replaced the absolute-path validation with a loud TypeError('params.lora is not supported for video generation yet'), so existing callers fail at the JS boundary instead of getting silent LoRA-less output. - video.d.ts: dropped `lora?: string` from VideoGenerationParams. - video-validation.test.js: collapsed the four old lora cases (empty / non-string / relative / absolute) into one parametrised test that asserts the new TypeError fires for every shape, so a future re-introduction of the JS validation can't bring back the silent-drop regression. When LoRA-on-video is wired through native (mirror of processImage's prepareLoras() + sd_img_gen_params_t::loras), the right path is to restore the absolute-path validation here and add a "lora" handler to SD_VID_GEN_HANDLERS, NOT to revert the d.ts. C++ test changes: - new Img2VidRejectsInitImageWithWrongDimensions covers the blocker. - Flf2VidRejectsCorruptEndImage pinned width/height to 64 so the new init dim check passes for the 64x64 init and we still reach the intended end-decode-failure path (same approach as the existing Img2VidRejectsCorruptControlFrame fixture). Verified: 67/67 JS unit tests pass with and without prebuilds, 176/176 C++ tests pass (1 opt-in Wan smoke skipped, requires ~8GB weights), lint and tsc --dts clean. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(diffusion-cpp): regression + 7 review-batch fixes (NaN/Inf guards, cancel, etc.) Addresses all 8 outstanding comments on PR #1879 (one regression from commit 59f2663 plus a CHANGES_REQUESTED batch of seven items). Major points below; per-file rationale in the inline comments. == Regression fix (highest priority) * gianni-cor flagged that the new init_image strict-equality check from commit 59f2663 rejects every off-grid frame with a confusing error citing wrapper-picked dims. Root cause: addon.js _fillDimsFromImage was silently doing Math.ceil(d/8)*8, so a 100x100 init_image got dispatched as 104x104 and the native check then threw "100x100 != 104x104" -- citing a value the caller never passed. Fixes: - addon.js _fillDimsFromImage now passes dims through verbatim (no rounding). The image SDEdit path already realigns internally (SdModel.cpp ~600) and the FLUX2 ref path uses auto_resize_ref_image, so dropping the rounding is safe across every path. - video.js _runInternal pre-empts the cryptic native error with a JS-layer off-grid probe: when width/height aren't explicit it reads init_image / end_image / control_frames[i] dimensions and throws a clear "your image is off-grid, pre-align or pass explicit dims" message naming the exact buffer. - Removes the ceil-vs-round inconsistency wart between _fillDimsFromImage (ceil) and the user-facing validator (round). - Three new JS regression tests for off-grid init / end / control, plus one positive test for explicit aligned dims overriding the probe. == JS hardening * params.prompt is documented Required but was never validated -- undefined / "" / 42 each produced a different failure mode (silent noise, silent noise, far-away C++ error). video.js now throws a loud TypeError at the wrapper boundary. Four new prompt-validation tests. * mapAddonEvent JobEnded fallback accepted every typed-array view -- works today only because uint8_t is the sole registered TypedArrayOutputHandler. When frameCallback (SdModel.hpp:139) gets wired through to JS, every per-frame event would have been misclassified as JobEnded and the response stream would have closed after the first frame. One-token fix: add `&& !ArrayBuffer.isView(rawData)` to the discriminator. ArrayBuffer.isView is true for every TypedArray + DataView, false for plain objects -- exactly the discrimination needed for the runtime-stats POJO. == C++ parser hardening (NaN / Inf / int64 / range) * Promoted requireInt from SdVidGenHandlers.cpp's anonymous namespace into parsers::, and added two siblings: - requireFiniteFloat: rejects NaN / +inf / -inf before the float cast (NaN compares false against every bound, so range checks of the form `f < lo || f > hi` previously let it sneak through). - requireInt64: same finite + integer guards as requireInt, range check against representable [INT64_MIN, INT64_MAX] doubles. - requireFiniteFloatInRange: convenience wrapper for [lo, hi] checks. * Routed every previously-vulnerable cast through the new helpers: - SdVidGenHandlers.cpp: seed (int64), cfg_scale, flow_shift, high_noise_cfg_scale, high_noise_flow_shift, vae_tile_overlap, cache_threshold, moe_boundary, strength, vace_strength - SdGenHandlers.cpp (image path, reviewer asked for symmetric fix): eta, cfg_scale, guidance, img_cfg_scale, seed, batch_count, strength, clip_skip, vae_tile_overlap, cache_threshold, width, height, steps, parseUpscaleRepeats * parseVaeTileSize (SdParsers.cpp): numeric form now routes through requireInt (rejects NaN/Inf/fractional/out-of-range), and BOTH forms (numeric and "WxH" string) now reject <= 0. Five new tests. == Cancellation gap + typed status * SdModel.cpp processVideo cancelRequested_ was checked exactly once after generate_video() returns -- the slow tail (per-frame PNG fan-out + AVI mux, multi-second on 81-frame 832x480 videos) had no cancellation visibility. Added 2 checks: top of frame-callback loop body, and immediately before encodeFramesToAvi. * Switched both Job cancelled throws (image path at SdModel.cpp:730, video path at :987, plus the 2 new C1 sites) from bare std::runtime_error to StatusError tagged with localCodeMsg="Cancelled", so the JS layer can discriminate cancel from real internal failures via codeString() ("[ General :: Cancelled ]") instead of string-matching the exception message. Note: this PR deliberately does NOT add `Cancelled = 6` to the shared inference-addon-cpp Errors.hpp enum, because that header ships via vcpkg to every package in the monorepo and a cross-package coordinated change is out of scope. Instead we use the 3-arg StatusError ctor (addonId, localCodeMsg, errorMsg) which produces the same codeString without touching the shared enum. When the enum is updated later, the 4 call sites can switch to the 2-arg ctor in a one-line follow-up. == C5 (preview_*) -- product decision deferred * The header comment at SdCtxHandlers.hpp:112 claimed preview_mode et al are "Wired to sd_set_preview_callback() in SdModel::process()", but a grep across packages/diffusion-cpp for sd_set_preview_callback returns zero matches -- the four config keys are validated and stored but the upstream callback is never installed, so they're a silent no-op end-to-end. Downgraded the misleading comment to an explicit TODO(QVAC-18026 follow-up) documenting the gap and the two viable resolution paths (wire it up alongside sd_set_abort_callback, OR remove the handlers + fields + tests). Reviewer asked which path is intended; this commit picks neither and just stops claiming the wiring exists. The choice can land in a separate PR without holding this one up. == Test surface * +8 JS tests (prompt validation x4, off-grid probe x4) * +5 C++ tests (vae_tile_size zero/negative/fractional/out-of-range rejection, plus the existing IntCoercion suite carried over to the promoted helpers transparently) * Cancel-context test updated to assert the typed "[ General :: Cancelled ]" codeString in addition to the message. Verified locally: JS unit tests: 75/75 pass with prebuild, 75/75 also without (CI sanity-checks mode, no native binary loaded) C++ unit tests: 209/210 pass, 1 opt-in skip (SdWanHappyPathTest needs ~8GB Wan weights) npm run lint: clean npm run test:dts: clean Co-authored-by: Cursor <cursoragent@cursor.com> * chore(diffusion-cpp): release 0.8.0 Bumps @qvac/diffusion-cpp to 0.8.0 and documents the Wan 2.1 / Wan 2.2 video pipeline shipped since 0.7.0: new VideoStableDiffusion class (txt2vid / img2vid / flf2vid), MoE high-noise expert routing, streaming MJPG AVI muxer, refactored download helpers + Wan model script, plus the supporting JS + C++ test coverage and validation hardening. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(diffusion-cpp): re-align auto-detected img dims to multiple of 8 _fillDimsFromImage was passing raw image dimensions through verbatim since fe4d10f, but the native SdGenHandlers validates width/height % 8 == 0 before the downstream alignment in SdModel::processImage ever runs. Any img2img call with a non-aligned source image (e.g. the bundled 500x627 von-neumann.jpg used by the FLUX2 i2i integration test) therefore failed with: height must be a positive multiple of 8, got: 627 Restore the Math.ceil(d/8)*8 round-up that was removed in fe4d10f. The original motivation for the removal -- avoiding a spurious dim mismatch on the video path where processVideo strict-compares decoded frame dims against vid.width/vid.height -- is already handled at the JS layer by VideoStableDiffusion's off-grid pre-validation in video.js, which runs before this helper and rejects unaligned init/end/control frames with a clear caller-facing error. The ceil() is therefore a no-op on the video path. Co-authored-by: Cursor <cursoragent@cursor.com> * style(diffusion-cpp): apply clang-format to drifted C++ sources cpp-lint surfaced clang-format drift in 4 files that accumulated across recent Wan-video commits. No semantic changes -- only mechanical line-wrap / arg-break placement to match the project's .clang-format. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(diffusion-cpp/test): use package export for video module in wan integration test The generate-video-wan.test.js test was using a relative import (require('../../video')) that breaks when test files are bundled and relocated to the test-framework backend directory during mobile test setup. Change to the package export pattern (@qvac/diffusion-cpp/video) used by other integration tests, which remains valid regardless of file location. Fixes: https://github.com/tetherto/qvac/actions/runs/25929776543/job/76221440417 Co-authored-by: Cursor <cursoragent@cursor.com> * fix(diffusion-cpp): expose video API from package root Co-authored-by: Cursor <cursoragent@cursor.com> * fix(diffusion-cpp): repair variable names in SdModel after merge Co-authored-by: Cursor <cursoragent@cursor.com> * style(diffusion-cpp): apply git-clang-format Co-authored-by: Cursor <cursoragent@cursor.com> --------- Co-authored-by: Cursor <cursoragent@cursor.com> Co-authored-by: gianni-cor <gianfranco.cordella@tether.io>

…xt-ggml@2026-01-30 The Wan-Metal work that was carried as a local overlay has all landed upstream on tetherto/qvac-ext-ggml's 2026-01-30 branch: - bc053644 metal: IM2COL_3D op + PAD left-padding for Wan video (#5) - 512e1773 cmake: support qvac hybrid backend packaging (static CPU + dynamic GPU backends, GGML_MAX_NAME prop, graceful no-OpenCL-device fallback, public ggml-opencl.h install -- previously six local overlay patches) - 6d2d24bb / b1923e29 / 05afdc59 metal: tighten IM2COL_3D supports_op to match the CPU-reference invariants (#6) Repin vcpkg/ports/ggml from PR #5's head (bc053644) to PR #6's merge commit (05afdc59) on 2026-01-30, drop all seven local overlay patches since their content is now upstream verbatim, and bump port-version 102 -> 104 to force a clean rebuild of ggml. Net diff: +22 / -201; the overlay now exists only as a baseline pin that overrides the registry's ggml-org/ggml@a8db410a (which still lacks the Wan-required Metal ops). Once the registry baseline catches up to a ref containing this work, vcpkg/ports/ggml/ can be deleted entirely. Verified with npm run build on darwin-arm64: ggml@2026-01-30#104 builds fresh from 05afdc59 with zero patches applied, addon links and tests compile, prebuild installed. Co-authored-by: Cursor <cursoragent@cursor.com>

…peech (#1590) * feat: Add runStream() which takes input as a stream * add integration tests * uncomment cb tests * chore: Add cb streaming example * feat: Add TTS streaming funcitonality and example * Update tts addon version * Remove chatterbox example * add new error code for tts streaming fail * Move common code to util * fix: Use z.infer to define TextToSpeechStreamClientParams * Move TextToSpeechStreamSession to schemas * Track subscriber current index and trim queue when all subscribers consumed past items * add missing unit tests * fix: drive done promise from multicast pump lifecycle * fix: Forward chunkIndex and sentenceChunk in sentence-stream mode to client * fix: Use correct error code for tts stream failure * chore: Add supertonic stream test in tts-tests.ts * fix: Make tts client more readable * Remove closures and inline async generators * fix: Subscribe eagerly in sentenceStreamTts to avoid late-subscriber data loss TtsMulticast.pump() starts in a microtask on construction, while the returned async generators only call subscribe() when first iterated. If the consumer iterated one generator before the other, the first subscriber could trim the queue before the second ever registered, silently dropping earlier frames. Subscribe synchronously for both bufferStream and chunkUpdates before returning, so both subscriber indexes are in place before pump pushes its first item. Made-with: Cursor * fix: Close TTS stream on server-sent done frame Remove the dead `null` sentinel from `processTextToSpeechStreamLine` and instead close `parseTextToSpeechStreamLines` after yielding the terminal `done: true` frame, so consumers don't rely on the server closing the socket to stop iteration. Made-with: Cursor * fix: Reject sentenceStream without stream in textToSpeech Previously `sentenceStream: true` combined with `stream: false` fell through to the collect path, silently dropping the sentence-stream parameters and returning no `chunkUpdates`. Fail fast at the dispatcher with a clear error so the contract mismatch surfaces to the caller instead of being swallowed. Made-with: Cursor * fix: Release TtsMulticast subscriber slot on early break Wire a try/finally into drain() so that when a consumer breaks out of the for-await (or the generator is .return()'d / throws), the slot is parked at +Infinity via unsubscribe(). This prevents a stale low min-index from permanently pinning trimConsumed, which otherwise leaked the queue for the entire RPC stream. Made-with: Cursor * fix: Guard TTS stream write after close and preserve UTF-8 boundaries Client: - Track a `closed` flag in `textToSpeechStream` duplex session, set by `end()` / `destroy()`. Subsequent `write()` calls now throw a typed `TextToSpeechStreamFailedError` instead of propagating a raw Bare/Node "write after end" stream error. - `end()` is idempotent so accidental double-close no longer errors. Server: - `buffersToUtf8Fragments` previously decoded each incoming Buffer via `toString("utf8")`, which corrupts any multi-byte codepoint whose bytes straddle a chunk boundary (common with CJK / emoji / accented scripts emitted as LLM token deltas). Added a small tail-buffer that finds the last complete UTF-8 codepoint end in the combined buffer and defers trailing incomplete bytes to the next chunk. Any dangling partial sequence is flushed on stream end. Made-with: Cursor * fix: Order TEXT_TO_SPEECH_STREAM_FAILED code and document it - Move TEXT_TO_SPEECH_STREAM_FAILED (52415) to the end of the 52400 Model Operations block so the ordering in SDK_SERVER_ERROR_CODES matches the numeric sequence (…52413, 52414, 52415). - Add the missing row for 52415 to the (latest) errors.mdx table, per the sdk/docs-freshness rule that the error table stay in sync whenever a new code is introduced. Made-with: Cursor * fix: Register operation metrics for textToSpeechStream Only `textToSpeech` was registered in `operation-metrics.ts`, so the duplex `textToSpeechStream` path silently skipped `modelExecutionTime`, `audioDuration`, and `totalSamples` gauges even though the server already collects the same `TtsStats` via `collectTtsStats()` on the final chunk. Mirror the non-streaming registration so the streaming path has parity observability. Made-with: Cursor * fix: Harden TTS client done-promise, iterator, and parse cost Polish the remaining review nits on the TTS client streaming surface. - #3 TtsMulticast.pump now rejects the `done` promise with the fatal error instead of resolving `false`. An internal `.catch(() => {})` silences unhandled-rejection warnings when the caller only iterates the buffer/chunk streams and never awaits `done`; re-awaits still see the rejection. - #6 TextToSpeechStreamSession[Symbol.asyncIterator] no longer throws synchronously on a second iteration; it returns an iterator whose first `.next()` rejects, so `for await` surfaces the error in the normal async control flow rather than the iterator protocol. - #9 plainTtsBufferStream / collectTtsBuffer wrap the RPC loop in try/catch/finally so `done` always settles: resolve(true) on the terminal frame, reject with the real error on exceptions, and resolve(false) on early consumer break. Previously `await done` could hang forever when the consumer bailed out early. - #11 Skip per-frame ttsResponseSchema.parse() in all three paths; rely on the discriminated-union narrowing at the RPC boundary. Drops the per-PCM-frame Zod validation cost for large sentences. Made-with: Cursor * fix: Tighten textToSpeechStream schema surface - Add .positive() to maxBufferScalars and flushAfterMs to match the existing constraint on sentenceStreamMaxChunkScalars. Previously a caller could pass negative values straight through to the addon. - Un-export textToSpeechStreamRequestBaseSchema — consumers only need the finalized textToSpeechStreamRequestSchema, and the base is an implementation detail of the shared object shape. The exported type alias TextToSpeechStreamClientParams continues to derive from the base via `typeof`, so nothing on the public type surface changes. Made-with: Cursor * fix: Cross-platform tmp path and safer PCM append in TTS examples - playPcmInt16Chunk now writes the intermediate WAV chunk under os.tmpdir() / path.join instead of a hard-coded /tmp/qvac-tts-chunk-… path. The previous code's Windows branch was unreachable in practice because the POSIX /tmp directory doesn't exist there; this uses %TEMP% on Windows automatically. - appendPcmSamples switches from `target.push(...chunk.slice(i, end))` to `Array.prototype.push.apply(target, chunk.slice(i, end))`. Same semantics, but avoids allocating the spread rest array per batch and is closer to a memcpy-style concat in V8. Made-with: Cursor * fix: Catch zero-chunk regressions in TTS sentence-stream test - TtsExecutor.makeSentenceStream now returns `{ passed: false, ... }` when the chunkUpdates iterator yields no chunks / no samples. The previous executor always returned a formatted string regardless of counts, so a regression that silently emitted zero chunks would still have looked like a pass. - ttsSupertonicSentenceStream's expectation upgraded from `{ validation: "type", expectedType: "string" }` to `{ validation: "contains-all", contains: ["sentence-streamed", "chunks", "samples"] }`. The executor's zero-case failure string lacks "sentence-streamed", so the contains-all match fails on regression. Made-with: Cursor * fix: Apply stream default locally and throw typed error on tts mismatch Previous guard only rejected the explicit `stream: false + sentenceStream: true` combination. A caller passing `{ modelId, text, sentenceStream: true }` with `stream` omitted silently fell through to `collectTts` while the server's Zod `.default(true)` still ran the sentence-stream branch and emitted chunk frames — which the client then discarded, dropping all chunk metadata. - Resolve the `stream` default locally (`params.stream ?? true`) so the client's dispatch routing matches the server's Zod-applied routing, and an omitted `stream` now correctly lands in `sentenceStreamTts` or `plainStreamTts`. - Only the explicit `sentenceStream: true + stream: false` combination is rejected, and it now throws `TextToSpeechStreamFailedError` (code 52415) instead of a bare `new Error(...)` so callers can discriminate by error code like everywhere else in the SDK. Made-with: Cursor * remove inline defaults for sentenceStream and stream * Use TtsMulticast in unit test instead of mock --------- Co-authored-by: Ishan Vohra <ishanvohra@Ishans-MacBook-Air.local>

…#1983) * feat: add @qvac/tts-ggml package (Chatterbox English on qvac-tts.cpp) New Bare addon wrapping the `qvac-tts::qvac-tts` static library (backed by the `tts-cpp` port added in tetherto/qvac-registry-vcpkg). API-compatible with the Chatterbox engine exposed by `@qvac/tts-onnx` so downstream consumers can swap backends without touching orchestration code. ## Scope * First iteration. Supports Chatterbox **English** only. Chatterbox multilingual, LavaSR enhancer, Supertonic engine, and streaming are out of scope and remain in `@qvac/tts-onnx`. They'll land alongside the evolution of qvac-tts.cpp. * Native backend is the static `qvac-tts` library from the QVAC vcpkg registry (`ports/tts-cpp`, baseline `2026-04-21`). No ONNX Runtime dependency. ## JS surface * `@qvac/tts-ggml` exports `TTSGgml` with the same method shape as `ONNXTTS`: `run` / `runStream` / `runStreaming` / `reload` / `unload` / `destroy`. * `files: { modelDir }` looks for `chatterbox-t3-turbo.gguf` + `chatterbox-s3gen.gguf` side-by-side; `files.t3Model` / `files.s3genModel` override the defaults. * Options: `referenceAudio`, `voiceDir` (baked profile), `seed`, `nGpuLayers`, `threads`, `outputSampleRate`, plus placeholders for the upcoming streaming flags (`streamChunkTokens`, `streamFirstChunkTokens`, `cfmSteps`). * Shared reusable lib code (`lib/textChunker.js`, `lib/textStreamAccumulator.js`, `addonLogging.*`) is copied verbatim from `@qvac/tts-onnx`. * New error class `QvacErrorAddonTTSGgml` uses codes **13001–14000** to avoid collisions with `@qvac/tts-onnx` (7001–7011) when both packages are loaded in the same Bare process. ## Native addon * `addon/src/model-interface/chatterbox/ChatterboxModel.{hpp,cpp}` — `IModel` + `IModelCancel` implementation. First-iteration strategy: assemble argv for `qvac_tts_cli_main` with a scratch `.wav` output path, call it synchronously, then parse the resulting 16-bit mono PCM wav back into `std::vector<int16_t>` for the JS handler. Consequences: every job re-loads the model (~700 ms + inference time), no mid-synthesis cancellation, no streaming. The follow-up milestone replaces this with a persistent, struct-based API once qvac-tts.cpp exposes one. * `addon/src/js-interface/{JSAdapter.{hpp,cpp}, binding.cpp}` — JS-to-C++ config bridging (same string-map pattern as `@qvac/tts-onnx`) and the `BARE_MODULE(qvac_tts_ggml, ...)` registration exposing `createInstance` / `runJob` / `reload` / `activate` / `cancel` / `destroyInstance` / `loadWeights` / `setLogger` / `releaseLogger`. * `addon/src/addon/AddonJs.hpp` — JS-facing `createInstance` / `runJob` / `reload` wrappers that register a `JsAudioOutputHandler` emitting `{ outputArray: Int16Array, sampleRate: number }` to JS. ## Build / registry * `CMakeLists.txt` uses `find_package(qvac-tts-cpp CONFIG REQUIRED)` and the standard `cmake-bare` + `cmake-vcpkg` scaffolding (shape matches `@qvac/transcription-whispercpp`). * `vcpkg.json` depends on `tts-cpp` (with a `vulkan` feature passthrough) plus `qvac-lib-inference-addon-cpp`, `qvac-lint-cpp`, and `gtest`. * `vcpkg-configuration.json` points at tetherto/qvac-registry-vcpkg. NOTE: the baseline pin here is inherited from `@qvac/transcription-whispercpp` and **must be bumped** to a commit that contains the `tts-cpp` port once that registry PR lands. A follow-up commit will update it. ## Tests & examples * Integration + unit test files for Chatterbox English are copied verbatim from `@qvac/tts-onnx` with only mechanical renames (`ONNXTTS` -> `TTSGgml`, `QvacErrorAddonTTS` -> `QvacErrorAddonTTSGgml`, `@qvac/tts-onnx/text-chunker` -> `../../lib/textChunker.js`). Some paths in `test/integration/addon.test.js` still import Supertonic / LavaSR helpers that don't exist in this package — those test blocks will fail fast when the file loads, which is expected until those backends get their own ggml packages. * Examples: `chatterbox-tts.js`, `chatterbox-streaming-tts.js`, plus shared `wav-helper.js` + `pcm-chunk-player.js`. ## What's not in this PR (known gaps) * No docs: README, NOTICE, CHANGELOG, PULL_REQUEST_TEMPLATE changes will land in a single documentation pass once the registry + fork commits have merged upstream. * `vcpkg-configuration.json` baseline needs to point at a qvac-registry-vcpkg commit that ships `tts-cpp` (pending the registry PR). * Actual `npm run build` requires the registry and fork commits to be on `main` of their respective upstream repos. * chore: point tts-ggml vcpkg baseline at the tts-cpp-bearing registry commit Bumps `vcpkg-configuration.json` to GustavoA1604/qvac-registry-vcpkg at commit 1e2839680b6be8d8ffff889a9c29b966c176098c — the commit that adds the `tts-cpp` port. Paired with the `qvac-tts` library already pinned in the port's `portfile.cmake` (GustavoA1604/chatterbox.cpp @ 0fe4a521618cc30358040b29d75d4261b31cbb60). Will be re-pointed at tetherto/qvac-registry-vcpkg once the registry PR lands upstream. * chore: tts-ggml: trim tests + examples to Chatterbox English, restore mobile wrapper Second pass over @qvac/tts-ggml after the build started passing: prune everything that only made sense for the ONNX-era multi-engine scope and adapt the remaining Chatterbox-English bits to the GGUF + file-path reference-audio contract. Restores `test/mobile/` so the Android build has something to point at. ## C++ * `ChatterboxModel.cpp`: the `ArgvBuilder::buildArgv` doc comment contained `**/` which closed the block comment early and broke the build. Rewrote as a `//` comment. ## Examples * `examples/chatterbox-tts.js` — rewrite for v0 contract: single `<text>` argv, `files: { modelDir }` pointing at the two GGUFs, `referenceAudio` is now a wav **path** (addon passes it to `--reference-audio`) instead of a Float32Array. Drops english/multilingual arg and the CHATTERBOX_VARIANT switch that picked which `.onnx` files to load. * Removed `examples/chatterbox-streaming-tts.js` + `examples/pcm-chunk-player.js`. The v0 addon re-loads the model per `run()` call — exposing streaming would mislead. Both come back alongside the persistent-engine milestone. * `package.json`: `npm run example` now passes a default text so it runs without extra args. ## Tests ### Kept as-is (engine-agnostic) * `test/unit/textChunker.test.js` * `test/mock/{MockedBinding,utils}.js` * `test/utils/{wav-helper,pcmConcatenator,loader.fake,runWhisper,runTTS}.js` * `test/reference-audio/jfk.wav`, `test/data/sentences-*.js` ### Mechanical fixes * `test/unit/tts.error.test.js` — fix error-code assertions to the tts-ggml range (`13001–14000`); was still checking the `@qvac/tts-onnx` range (`7001–7011`). * `test/unit/tts-ggml.lifecycle.test.js` — fix stale `QvacErrorAddonTTS` import to `QvacErrorAddonTTSGgml`; switch the stubbed model to `{ t3Model, s3genModel }` GGUFs and drop the non-existent `engine: 'chatterbox'` option. * `test/unit/tts-ggml.sentence-stream.test.js` — same GGUF/engine cleanup. ### Rewritten * `test/unit/chatterbox.inference.test.js` — drop tests that asserted the old ONNX file shape (`tokenizer / speechEncoder / embedTokens / conditionalDecoder / languageModel`), the removed `engine` detection and the wrong `getModelKey` return value (`'onnx-tts'` -> `'tts-ggml'`). New tests cover: `modelDir` derives the two GGUF paths; explicit `t3Model` / `s3genModel` override the defaults. The mocked-binding run/reload/cancel flow stays. * `test/integration/addon.test.js` — fresh, ~180 LoC, Chatterbox-English only. Ensures the GGUFs are present, runs the short sentence set through `loadChatterboxTTS` + `runChatterboxTTS[WithSplit]`, and (on darwin only) runs a whisper-based WER check via the existing `runWhisper` util. Drops the Chatterbox-multilingual block + every Supertonic + LavaSR block that doesn't apply to this package. * `test/utils/runChatterboxTTS.js` — rewrite for the GGUF contract: `files: { modelDir, t3Model, s3genModel }`, `referenceAudio` as a file path that falls back to `test/reference-audio/jfk.wav` (or the mobile test-asset when `global.assetPaths` is present). No more WAV decode / resample on the JS side. * `test/utils/downloadModel.js` — trim from 1007 LoC to 280. Drops the Supertonic + LavaSR + Chatterbox-multilingual + Cangjie downloaders. Keeps the shared HTTP/curl infrastructure and `ensureWhisperModel` (still used by the integration WER check). `ensureChatterboxModels` is now **check-only**: it verifies `chatterbox-t3-turbo.gguf` + `chatterbox-s3gen.gguf` exist locally and, if missing, prints the exact commands for generating them from the qvac-tts.cpp (née chatterbox.cpp) conversion scripts. Once the GGUFs land on a canonical HuggingFace repo we'll wire up download URLs here. ## Scripts * `scripts/ensure-chatterbox.js` — simplify to a single invocation against `./models/`. Drops the variant / language matrix that the ONNX downloader needed. * `scripts/ensure-models.js` — now a thin alias to `ensure-chatterbox.js`. Drops the Supertonic + LavaSR orchestration. ## Mobile * Restored `test/mobile/{integration.auto.cjs, integration-runtime.cjs, testAssets/jfk.wav}` so the Android build has a wrapper to point at. * `package.json`: re-added `test/mobile` to the `files` list. ## Gitignore * Ignore generated `.clang-format` / `.clang-tidy` / `.valgrind.supp` (produced by the top-level `configure_file(...)` calls) and `build_*/` dirs (bare-make convention). ## Verified locally * `npx standard "test/**/*.js" "*.js" "lib/*.js"` — clean. * `npm run test:unit` — 38/38 pass (105/105 asserts). * `npm run build && bare examples/chatterbox-tts.js "Hello from qvac tts ggml."` produces a 24 kHz wav as expected. * Add streaming support * Update ggml backend to use separate ggml repo * tts-ggml: consume renamed tts-cpp library (2026-04-24#1) Upstream chatterbox.cpp renamed the package + namespace + target from qvac-tts to tts-cpp and tightened the library boundary; pick up the new artefacts here: - find_package(qvac-tts-cpp CONFIG REQUIRED) -> find_package(tts-cpp CONFIG REQUIRED) - qvac-tts::qvac-tts -> tts-cpp::tts-cpp - qvac_tts::chatterbox -> tts_cpp::chatterbox (engine ptrs, EngineOptions, SynthesisResult, forward-decls in ChatterboxModel.hpp) - #include <qvac-tts/chatterbox/engine.h> -> #include <tts-cpp/chatterbox/engine.h> - Doxygen / inline doc references to the old names refreshed alongside the code changes. vcpkg wiring: - vcpkg-configuration.json baseline bumped to qvac-registry-vcpkg commit bc30b0b (ports/tts-cpp renamed and repointed at chatterbox.cpp@f8f9145). - vcpkg.json tts-cpp constraint bumped to 2026-04-24#1 (the port that carries the rename + namespace + install(EXPORT) changes). Verified with a cold bare-make generate + bare-make build against the new port, and the addon's existing unit + integration test suites. Made-with: Cursor * tts-ggml: bump tts-cpp port to 2026-05-07 + registry baseline Picks up the round-3 review-fix wave landed on the tts-cpp port: e673182 scrub stale patches/ refs from README (N10) 8ba10a6 drop unreachable TTS_CPP_GGML_LIB_PREFIX block (N8) 4b5d2d7 mirror N1-N7 fixes from chatterbox.cpp source-of-truth - N1 supertonic alive-registry guard against freed-backend gallocr_free assert on hot-swap (Vulkan/Metal/CUDA) - N2 drop dead g_sink_* state, soften log_set docstring - N3 Turbo BPE try/catch (exception-safe Engine ctor) - N4 STFT cancel checkpoint + tighter Engine::cancel() doc - N5 document s3gen_preload/unload refcount semantics - N6 drop dead cached_text_lc Supertonic shim - N7 fix misleading "no copy" view-vs-copy log wording Plus the integrated-port-only round-2 fixes that landed earlier: fa0d490 close patches/-deleted regression: TTS_CPP_USE_SYSTEM_GGML now defaults ON; bundled-without-patches hard-errors at configure time with a pointer at the ggml-speech vcpkg port. ae34c58 README rewritten for integrated/vcpkg context. a2f2dd6 top-level qvac-ext-lib-whisper.cpp README points at the tts-cpp/ subtree (alongside parakeet-cpp/). Public API used by ChatterboxModel (tts_cpp::chatterbox::Engine / EngineOptions / SynthesisResult / s3gen_preload / s3gen_unload) is backward-compatible: the new port adds Engine::backend_name(), MTL-variant fields on EngineOptions (language / cfg_weight / min_p / exaggeration), and a separate tts_cpp::supertonic::Engine class, but nothing this consumer was already calling has changed. Edits: packages/tts-ggml/vcpkg.json - tts-cpp dep: version>=2026-04-24#1 -> version>=2026-05-07. packages/tts-ggml/vcpkg-configuration.json - default-registry baseline: bc30b0b (April 2026 fork-only state) -> 16b91afdcfd59baea60e81f3da94f49311ef2a97. The new baseline pulls in the post-tetherto-merge state (parakeet-cpp port at 932d5d9, ggml-speech port-version 1 at f07bdd0) plus the new tts-cpp port (16b91af) on the developer's GustavoA1604 registry fork. Smoke-test plan: after running `vcpkg install` against the new baseline, the tts-cpp port's vcpkg_from_github resolves at GustavoA1604/qvac-ext-lib-whisper.cpp@e673182 (tts-cpp branch) until the upstream PR merges. ChatterboxModel should build and synthesize identically; expanding to Multilingual + Supertonic flows is the follow-up commit on the package side. Co-authored-by: Cursor <cursoragent@cursor.com> * Add chatterbox multilingual and supertonic * Add mobile integration tests * tts-ggml: drop clang-19 pin in linux-clang toolchain The toolchain hardcoded `clang-19` / `clang++-19` (versioned binary names) since the package's first commit (0a2c978). Linux CI hadn't exercised this path before — the new on-pr-tts-ggml.yml -> integration matrix is the first time it does, and it fails on every linux runner (ai-run-ubuntu-22.04, ai-run-linux-gpu, ubuntu-24.04-arm) at vcpkg's "detect_compiler" step because none of the GH-hosted images ship a `clang-19` symlink: Detecting compiler hash for triplet x64-linux... error: while detecting compiler information: ... CMake Error at scripts/cmake/vcpkg_execute_required_process.cmake:127 (message): Command failed: ... -DVCPKG_CHAINLOAD_TOOLCHAIN_FILE= .../tts-ggml/vcpkg/triplets/../toolchains/linux-clang.cmake ... Match parakeet's working pattern (qvac-lib-infer-parakeet/vcpkg/ toolchains/linux-clang.cmake): use unversioned `clang` / `clang++` so each runner picks up its image's default clang (clang-15 on ubuntu-22.04, clang-18 on ubuntu-24.04, whatever the AI runners ship). The `-stdlib=libc++` flag added by x64-linux.cmake / arm64-linux.cmake is honoured by every reasonable clang version. Co-authored-by: Cursor <cursoragent@cursor.com> * Add C++ tests and coverage; fix linux build * tts-ggml: address PR review feedback Bundle of correctness, hygiene, and CI-doc fixes from the recent code review. Each item below has its own paragraph in the diff comments. - #1 files-array: add test/utils/runSupertonicTTS.js + test/data/sentences-{medium,long}.js to package.json so consumers running the integration tests from the npm tarball don't crash with `Cannot find module ../utils/runSupertonicTTS`. - #2 deps: move @qvac/langdetect-text from runtime dependencies to devDependencies (it's only referenced from examples/, which aren't in the published files list). - #3 race-fix: ChatterboxModel::process()'s post-synthesize streaming detection used to read engine_->options() outside engineMu_, racing with reload(). synthesize() now returns SynthesizeResult { pcm, wasStreaming } where wasStreaming is captured under the engine lock against the local shared_ptr so process() doesn't have to touch engine_ again. - #4 deferred-load: ChatterboxModel + SupertonicModel constructors used to call load() eagerly, so JsInterface::createInstance() (sync on the JS thread) was parsing ~370 MB of GGUF on the Bare event loop. Both models now implement IModelAsyncLoad: constructors validate + return; the actual load is deferred to waitForLoadInitialization(), which the new addon_js::activate wraps inside JsAsyncTask::run so the parse runs on a worker thread. binding.cpp registers addon_js::activate in place of JsInterface::activate; tts.js now awaits the resulting promise. - #5 dead code: drop _resolvePath (unused), drop the (void)inputObj read in AddonJs.hpp::runJob, document FAILED_TO_PAUSE / FAILED_TO_STOP / JOB_ALREADY_RUNNING in lib/error.js as reserved-but- not-thrown so future maintainers don't delete them blindly (the unit suite asserts the values). - #6 cancel-reset: SupertonicModel grew Chatterbox's cancelRequested_ reset pattern: cancel() sets it, synthesize() fast-fails on it, process() resets it per call so a stale cancel doesn't poison the next run. - #7 useGPU comment: explain in JSAdapter::buildChatterboxConfig that the JS layer is the source of truth for useGPU and nGpuLayers wins downstream; left a pointer to std::optional<bool> if a future caller ever needs to distinguish "absent" from "explicit false". - #10 fork pointers: README.md and test/utils/downloadModel.js no longer point at GustavoA1604/chatterbox.cpp; both reference the upstream tetherto/qvac-ext-lib-whisper.cpp/tts-cpp tree now. - #9 doc: integration-mobile-test-tts-ggml.yml gained a header comment on the build-and-test job documenting that continue-on-error is the early-days landing posture (merge-guard treats success || skipped as pass), with a pointer to tighten once Device Farm provisioning is stable. Nits: - 'use strict' added to addonLogging.js (matches every other .js). - node-vs-bare runtime banners on scripts/{generate,validate}-mobile-integration-tests.js. - ttsOutputDebugString no longer JSON.stringify's the full PCM Int16Array on every chunk-streaming event; emits a tiny summary ({sampleRate, chunkIndex, isLast, sentenceChunk, outputArrayLen}) instead. Tests: 35 passing (33 -> 35; two new assertions cover the deferred-load contract); 4 skipped real-GGUF tests behind the existing QVAC_TEST_CHATTERBOX_T3_GGUF / QVAC_TEST_CHATTERBOX_S3GEN_GGUF / QVAC_TEST_SUPERTONIC_GGUF env-var gates. Lint clean. Co-authored-by: Cursor <cursoragent@cursor.com> * tts-ggml: unblock CI integration tests on every desktop runner Four independent failures, one per platform: 1. linux-x64 / linux-arm64: addon load crashed at `libomp.so.5: cannot open shared object file`. tts-cpp's binary is built with clang under the linux-clang toolchain and links against libomp (LLVM OpenMP runtime); only `libgomp1` (GNU OpenMP) was being apt-installed. Add `libomp5` so libomp.so.5 is on the loader path. 2. darwin-arm64: convert-models.sh aborted at line 200 with `hf_args[@]: unbound variable`. macOS's system bash is 3.2 which treats `"${arr[@]}"` as nounset access when the array is empty under `set -u`; with HF_TOKEN unset we hit it on every fresh runner. Use the `${arr[@]+"${arr[@]}"}` idiom (defined-or-nothing) at all six call sites and add a header comment so the next maintainer doesn't accidentally regress. 3. darwin-x64: pip install bombed building `llvmlite` from source because the macos-15-large runner has no LLVM 15 development install. Root cause: librosa pulls in numba 0.65+, which stopped shipping darwin-x86_64 wheels for Python 3.12. Pin Python to 3.11 in the Setup Python step; 3.11 has prebuilt wheels for the entire numba/llvmlite/librosa stack on darwin-x64 and is fine for every other converter dependency. 4. windows-2022: ChatterboxModel::load threw `vk::createInstance: ErrorIncompatibleDriver`. Root cause: the addon's index.js::_validateConfig defaults `useGPU = true` when neither useGPU nor nGpuLayers is specified, so the test ran with n_gpu_layers=99 -> ggml_backend_vk_init -> vk::createInstance -> ErrorIncompatibleDriver on the runner's no-Vulkan-driver image. runChatterboxTTS.js now honours `process.env.NO_GPU === 'true'` (set on the no-GPU matrix entries) and forces useGPU=false on exactly those runners; the other test runners (chatterbox-mtl, gpu-smoke, multiple-runs) already had this guard. Also documents the `mesa-vulkan-drivers` apt package (already pulled in) as the software ICD that lets the Vulkan-built prebuild's runtime backend probe enumerate at least one device on linux runners. Co-authored-by: Cursor <cursoragent@cursor.com> * tts-ggml: drop Chatterbox from mobile bundle (Metro V8 string limit) Mobile build failed at `:app:createBundleReleaseJsAndAssets` with: SyntaxError: assets/testAssets/chatterbox-s3gen.gguf: Cannot create a string longer than 0x1fffffe8 characters Root cause: Metro's bundler reads every asset under `test/mobile/testAssets/` via `Buffer.toString()`. V8's max string length is 0x1fffffe8 (~512 MiB). chatterbox-s3gen.gguf is ~1 GiB even with --quant q4_0 because the s3gen converter only quantizes attention weights and leaves the bulk of the s3gen graph in fp16 ("0/291 weight tensors quantized" in the converter log). Fix: bundle ONLY supertonic.gguf (~125 MiB, comfortably under the limit) on mobile. Mobile Chatterbox tests degrade cleanly to `t.pass('Skipped: Chatterbox GGUFs not available')` via the existing `ensureChatterboxModels` helper -- it already returns { success: false } when the GGUFs aren't on disk. Cache key bumped to v2 so existing v1 cache entries (which include the chatterbox files) are evicted on the next run. Bundling Chatterbox on mobile requires either: - adding `gguf` to qvac-test-addon-mobile's metro `assetExts` so the JS-string read is skipped (then the s3gen file can flow through the bundle as a raw asset), or - pushing the chatterbox GGUFs to the device via `adb push` outside the bundle and surfacing the path through downloadModel.js's existing ANDROID_CANDIDATE_DIRS fallback. Both are outside the scope of this PR; documented inline above the cache step for the next maintainer. Co-authored-by: Cursor <cursoragent@cursor.com> * Bump hash of vcpkg * Consume vcpkg from tetherto repository * Fix integration tests failures in all platforms * Further fix tests * fix: Make useGPU flag more meaningful (#1953) * fix[api]: make useGPU flag actually force CPU/GPU and reject useGPU/nGpuLayers conflicts * add gpu smoke test * resolve comments --------- Co-authored-by: Ishan Vohra <ishanvohra@Ishans-MacBook-Air.local> * Update dependencies after monorepo directory changes * Further drop qvac-lib- prefix * Add CHANGELOG.md --------- Co-authored-by: Cursor <cursoragent@cursor.com> Co-authored-by: Ishan Vohra <ishanvohra2@gmail.com> Co-authored-by: Ishan Vohra <ishanvohra@Ishans-MacBook-Air.local>

* feat(diffusion): refactor download scripts and add Wan 2.1 support - Extract shared dl() function into reusable dl-functions.sh module - Update all download-model-*.sh scripts to source shared utilities - Add download-model-wan.sh for Wan 2.1 video generation models - Reduces code duplication and improves maintainability Wan 2.1 downloads (~8.3 GB): - wan2.1_t2v_1.3B_fp16.safetensors (diffusion model) - wan_2.1_vae.safetensors (VAE encoder/decoder) - umt5_xxl_fp16.safetensors (text encoder) Co-authored-by: Cursor <cursoragent@cursor.com> * feat(diffusion): Wan video foundation -- ctx/vid handlers, AVI muxer, shared parsers Phase 1-4 of Wan 2.1 / 2.2 video generation support in the diffusion-cpp addon. Configuration + parsing layer only; dispatch + callback plumbing + JS surface land in follow-up commits on this branch. SdCtxConfig: - Add highNoiseDiffusionModelPath for Wan 2.2 MoE high-noise expert (leave empty for Wan 2.1 and all non-Wan models) - Add previewMode / previewInterval / previewDenoised / previewNoisy for optional mid-denoising preview frames via sd_set_preview_callback - Wire both through SdCtxHandlers (new JS keys: preview_mode, preview_interval, preview_denoised, preview_noisy) and AddonJs (highNoiseDiffusionModelPath in args map) AviWriter (new utility): - addon/src/utils/AviWriter.{hpp,cpp} ports the upstream avi_writer.h MJPG encoder onto an in-memory std::vector<uint8_t> sink (no stdio, no temp files) so video bytes flow through the existing OutputCallBackJs queue - Full input validation (numFrames, fps, jpegQuality, channel count, frame homogeneity, null data) -- StatusError on any rejection SdParsers (new shared module): - Extract parseSampler / parseScheduler / parseCacheMode / parseVaeTileSize / parseCachePreset / requireNum/Str/Bool from SdGenHandlers into addon/src/handlers/SdParsers.{hpp,cpp} - Reused by both SdGenHandlers (image) and SdVidGenHandlers (video) SdVidGenHandlers (new): - SdVidGenConfig struct with full Wan 2.1 + 2.2 surface: mode (txt2vid/img2vid/flf2vid), prompts, dimensions, videoFrames (4k+1 validated), fps, seed, low-noise expert sample params, high-noise expert sample params, moeBoundary, strength, vaceStrength, VAE tiling, cache mode/preset/threshold - 22 JSON handlers with validation for each field Tests (all pass): - 5 new SdCtxHandlers tests for preview_* + high_noise path default - 18 new AviWriter tests covering happy path, RIFF header structure, all validation rejections, JPEG round-trip - 54 new SdVidGenHandlers tests covering every field + integration payload + defaults - Zero regressions across existing 144 fast-unit tests No user-facing JS API changes yet. Co-authored-by: Cursor <cursoragent@cursor.com> * feat(diffusion): Wan video generation -- dispatch, processVideo, JS wrapper + examples Builds on the Wan foundation commit by wiring the video path end-to-end from JS to C++ and back. Adds txt2vid / img2vid / flf2vid generation via a new VideoStableDiffusion class that shares the single native binding with the existing ImgStableDiffusion class. Native: - SdModel::process() dispatches on the JSON "mode" field to processImage() (existing) or the new processVideo() path. - processVideo() applies SdVidGenHandlers, validates mode-vs-inputs invariants (img2vid requires init_image; flf2vid requires both; txt2vid rejects both; end_image only valid on flf2vid), decodes init/end/control frames, fills sd_vid_gen_params_t, and encodes the returned sd_image_t* sequence to an in-memory MJPG AVI. - SdVideoFrames RAII wrapper extracted to addon/src/utils/ so it can be unit-tested without a loaded model. - GenerationJob grows endImageBytes and controlFramesBytes plus an optional per-frame frameCallback (unused from JS in this PR; reserved for the preview follow-up). - AddonJs::runJob reads endImageBuffer (single Uint8Array) and controlFramesBuffers (Array of Uint8Array) as typed-array args, no JSON encoding. JS surface: - video.js / video.d.ts: new VideoStableDiffusion class with full per-mode validation, 4k+1 frame-count rule, fps range, moe_boundary range, Uint8Array type checks, and warning when high_noise_* params are set without files.highNoiseDiffusionModel. - addon.js: SdInterface.runJob threads end_image and control_frames through to the native runJob without round-tripping through JSON. - index.js / index.d.ts: unchanged -- image wrapper continues to work exactly as before. Both classes compose the same SdInterface and hit the same binding.cpp entry points. - package.json: exports "./video", ships video.js / video.d.ts, adds generate:video / generate:img2vid / generate:flf2vid scripts. Examples: - examples/generate-video-wan.js (txt2vid @ 832x480, 33 frames) - examples/img2vid-wan.js (reuses assets/von-neumann.jpg as first frame) - examples/flf2vid-wan.js (expects flf-first.png / flf-last.png) Tests: - test_sd_video_frames.cpp: 12 RAII tests (empty states, destruction of 4k+1 production sizes, null-pixel tolerance, bounds-checked operator[], compile-time copy/move deletion). - test_wan_video.cpp: 12 validation tests reusing the SD2.1 context to satisfy isLoaded() and exercise every processVideo() guard before generate_video() runs; plus an opt-in happy-path smoke test (SD_RUN_WAN_SMOKE=1) gated off by default because ggml-metal lacks IM2COL_3D for Wan's 3D convs. Gates: npm run lint, npm run test:dts, npm run build, and the fast subset of addon-test (178/178) all pass. Co-authored-by: Cursor <cursoragent@cursor.com> * feat(diffusion): Wan video tests, ggml overlay, example tuning Add a vcpkg overlay-port for ggml at vcpkg/ports/ggml/ that pins tetherto/qvac-ext-ggml @ feature/metal-pr-16669-clean (commit bc053644). The fork adds Metal kernels for IM2COL_3D and 3-axis PAD-left, both required by Wan 2.1 / 2.2 video generation; without them ggml hard-aborts mid-run with "unsupported op 'IM2COL_3D'". Rationale lives in portfile.cmake -- the overlay is transient and will be removed once the registry baseline rolls forward. Add JS test coverage for VideoStableDiffusion: - test/unit/video-validation.test.js: 63 input-validation cases mirroring the existing input-validation.test.js pattern. - test/integration/generate-video-wan.test.js: opt-in (WAN_INTEGRATION=1) end-to-end T2V smoke test plus sniffAvi self-tests. Tune the Wan examples: - generate-video-wan.js: env-var-driven (PROMPT, FRAMES, STEPS, SEED, CFG_SCALE, FLOW_SHIFT, ...), inline frame-count cheat sheet, (4*k+1) pre-flight check, default FRAMES bumped to 81 (Wan 1.3B's native training length). - img2vid-wan.js, flf2vid-wan.js: flow_shift 5.0 -> 3.0 to match the upstream test-wan reference scripts. Refresh the C++ smoke-test gating doc in test_wan_video.cpp to reflect that Metal works once the overlay is in place. Drop build.md: the vcpkg overlay rationale already lives next to the overlay (portfile.cmake header), and transient infrastructure doesn't earn its own long-form doc. Co-authored-by: Cursor <cursoragent@cursor.com> * docs(diffusion-cpp): restore build.md The earlier deletion conflated build.md with the vcpkg overlay rationale, but build.md is the package's standalone build guide (prerequisites, build pipeline, cross-compilation, troubleshooting) and is still the target of README.md's "Building from Source" link. Restore it from main, which also picks up the LLVM 19 -> 22 bump. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(diffusion-cpp): address PR review feedback for Wan video gen * Flip default video dimensions to 480x832 portrait (phone-screen friendly). Wan 2.1 T2V 1.3B handles both orientations equally well; the previous 832x480 landscape default disagreed with the example. * Document the flow_shift=0 fall-through sentinel in JSDoc, .d.ts, and C++ struct/handler comments; correct stale "5-8" recommendation to the actually-used 3.0 (matches example + ref scripts). * Make video_frames error messages consistent JS<->C++ and list the full valid set up to 81 (Wan 1.3B native training cap). * Fix frame-duration arithmetic (33 frames is ~2s @ default 16 fps, not ~1.3s @ 24 fps). * Warn when upscaler_* keys are passed to VideoStableDiffusion -- ESRGAN upscale is image-only and was being silently ignored. * Annotate addon.js end_image / control_frames forwarding to call out the typed-array transport (avoids JSON byte-array bloat). * Document the two-level concurrency model around _hasActiveResponse (the busy guard isn't dead under exclusiveRunQueue -- it covers overlap between the released queue lock and an in-flight response). * Update C++ defaults test + JS suggestion-fallback test for the new portrait orientation. Co-authored-by: Cursor <cursoragent@cursor.com> * chore(diffusion-cpp): retarget ggml overlay to merged tetherto/qvac-ext-ggml@2026-01-30 The Wan-Metal work that was carried as a local overlay has all landed upstream on tetherto/qvac-ext-ggml's 2026-01-30 branch: - bc053644 metal: IM2COL_3D op + PAD left-padding for Wan video (#5) - 512e1773 cmake: support qvac hybrid backend packaging (static CPU + dynamic GPU backends, GGML_MAX_NAME prop, graceful no-OpenCL-device fallback, public ggml-opencl.h install -- previously six local overlay patches) - 6d2d24bb / b1923e29 / 05afdc59 metal: tighten IM2COL_3D supports_op to match the CPU-reference invariants (#6) Repin vcpkg/ports/ggml from PR #5's head (bc053644) to PR #6's merge commit (05afdc59) on 2026-01-30, drop all seven local overlay patches since their content is now upstream verbatim, and bump port-version 102 -> 104 to force a clean rebuild of ggml. Net diff: +22 / -201; the overlay now exists only as a baseline pin that overrides the registry's ggml-org/ggml@a8db410a (which still lacks the Wan-required Metal ops). Once the registry baseline catches up to a ref containing this work, vcpkg/ports/ggml/ can be deleted entirely. Verified with npm run build on darwin-arm64: ggml@2026-01-30#104 builds fresh from 05afdc59 with zero patches applied, addon links and tests compile, prebuild installed. Co-authored-by: Cursor <cursoragent@cursor.com> * chore(diffusion-cpp): drop local ggml overlay now that registry serves 2026-01-30#7 The previous commit (04a6496) repointed the local ggml overlay at the merge of tetherto/qvac-ext-ggml#6 (05afdc59) so Wan video generation on Metal would stop aborting with `unsupported op 'IM2COL_3D'`. That same ref has now been promoted into the registry: tetherto/qvac-registry-vcpkg#134 landed on main as d1b2497b, bumping ggml port-version 6 -> 7 against the identical REF + SHA512 the overlay was carrying. This means the diffusion-cpp-local overlay is now strictly redundant -- and slightly behind, since the registry's port-version 7 also picks up two improvements the overlay didn't have: - iOS gets `-DGGML_BLAS=OFF -DGGML_ACCELERATE=OFF` to keep the build off the Apple Accelerate / BLAS path that breaks the iOS toolchain. - The Android backend-glob now also matches `libqvac-ggml-*.so` in addition to `libggml-*.so`, so the qvac-prefixed DL backends get installed alongside the upstream-named ones. So we delete the entire `vcpkg/ports/ggml/` overlay (portfile.cmake, vcpkg.json, usage, android-vulkan-version.cmake) and: - Bump `vcpkg-configuration.json`'s default-registry baseline from a9eae49a -> d1b2497b (the merge commit of registry PR #134), which is the first registry SHA that serves ggml@2026-01-30#7. - Tighten `vcpkg.json`'s ggml constraint from `version>=: 2026-01-30#5` to `version>=: 2026-01-30#7` so any later baseline bump can't silently drop us back below the Wan-Metal pin. The `overlay-ports: ["vcpkg/ports"]` entry and the `vcpkg/ports/.gitkeep` marker are kept in place so future overlays can be added without a config flap. Verified end-to-end on darwin-arm64: clean `npm run build` (bare-make generate + build + install) with the build/ tree wiped. vcpkg resolves ggml[core,metal]:arm64-osx@2026-01-30#7 -- git+https://github.com/tetherto/qvac-registry-vcpkg.git@f1632875... straight from the registry (no overlay), all 8 ports install in 47s, the addon links cleanly against the registry-supplied libggml*.a, and prebuilds/darwin-arm64/qvac__diffusion-cpp.bare is rewritten. Net diff: +2 / -283. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(diffusion-cpp): satisfy standard quotes rule in validateVideoFrames The middle line of the validateVideoFrames Error message was a template literal with no `${...}` interpolation, so `standard` (configured via `npm run lint`) flags it as `quotes`: video.js:39:7: Strings must use singlequote. Adjacent lines 37, 38 use single quotes, and line 40 legitimately uses backticks for `${n}`. Just the one stray backtick-string -- swap to single quotes; no behaviour change. Sanity-checks job 74830306544 on PR #1879 fails on this single line; `npm run lint` passes locally after the swap. Co-authored-by: Cursor <cursoragent@cursor.com> * diffusion-cpp: enable diffusion FA in examples and fix addon paths - Set diffusion_fa: true across SD, FLUX, and integration test ImgStableDiffusion configs so diffusion flash attention matches WAN video examples. - Pass highNoiseDiffusionModelPath (empty when unset) from index.js so native createInstance validation succeeds for image mode; document optional files.highNoiseDiffusionModel in index.d.ts and validate absolute paths. Co-authored-by: Cursor <cursoragent@cursor.com> * diffusion-cpp(video): pass esrganPath to native createInstance VideoStableDiffusion omitted esrganPath while the binding validates it as a string; mirror image-mode by forwarding files.esrgan or empty string. Co-authored-by: Cursor <cursoragent@cursor.com> * diffusion-cpp: align C++ includes and image codec with inference-addon-cpp - Switch remaining qvac-lib-inference-addon-cpp includes to inference-addon-cpp (vcpkg installs headers under the shorter prefix). - Use image_codec::decodeImage / encodeToPng in processVideo after ImageCodec API rename from decodePng. Co-authored-by: Cursor <cursoragent@cursor.com> * diffusion-cpp: apply clang-format to changed C++ sources Run git-clang-format against 2c4dc65 to satisfy the repo formatter on the video addon, image codec, and Wan tests. No behavior changes. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(diffusion-cpp/video): address review comments 1-3 1. Use global addonLogging instead of per-instance setLogger/releaseLogger - Eliminates process-global logger collision (was reintroduced in video.js) - Mirrors fix from ImgStableDiffusion / EsrganUpscaler - video.js no longer manages per-instance logger state 2. Reject width/height values <= 0 in JS validation - Now validates that width > 0 and height > 0 before alignment check - Error message updated to say "positive multiples of 8" - Updated test expectations to match new message 3. Validate double values are integers before casting in C++ - All int casts now check std::floor(d) == d first - Affects: width, height, video_frames, fps handlers - Prevents silent truncation (e.g. 8.5 -> 8) All 70 unit tests pass; build/lint/dts all clean. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(diffusion-cpp/video): address review comments 4-7 4. Validate end_image / control_frames dimensions match video dimensions - Added dimension checks in processVideo() before generate_video() - Rejects mismatched frame sizes with clear error messages - Prevents silent corruption or undefined behavior in native layer 5. Use ImageCodec ownership helper instead of raw free() - Replaced FrameBuffersGuard with unique_ptr<uint8_t, FreeDeleter> - Consistent with existing image_codec ownership pattern - Automatic cleanup on exception; no manual free() calls 6. Regenerate mobile integration test manifest - Ran npm run test:mobile:generate - Updated test/mobile/integration.auto.cjs with new runners 7. Add checked buffer size calculation in AviWriter - Validates width * height overflow before multiplication - Validates numFrames * bytesPerFrame overflow - Rejects allocations that would exceed SIZE_MAX - Prevents silent integer overflow in reserve() call All 70 unit tests pass; build/lint/dts all clean. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(diffusion-cpp/video): harden int validation, ownership, AVI overflow Follow-up tightening on top of the review fixes for #1879. SdVidGenHandlers: - Extract a single requireInt() helper used by width / height / video_frames / fps / requirePositiveInt. The helper rejects NaN, +/-inf, fractional doubles, and values outside [INT_MIN, INT_MAX] before static_cast<int>, so casts to int are always well-defined and no JSON value silently truncates (e.g. 8.5 -> 8). - Add <cmath>/<climits> includes that were transitively available. SdModel::processVideo: - Replace the bespoke FrameBuffersGuard struct with three plain unique_ptr<uint8_t, image_codec::FreeDeleter> values (initData / endData / controlData). Same lifetime semantics, less custom code, and the control-frame dimension mismatch path now takes ownership *before* the check so a throw can no longer leak the freshly-decoded buffer. AviWriter::encodeFramesToAvi: - Reserve calculation is now step-wise overflow-checked against SIZE_MAX (width vs height vs *3 vs *numFrames) instead of a single multiply that could wrap. - Add a hard upper bound at UINT32_MAX (AVI 1.0 RIFF size header is a uint32_t -- anything past 4 GB cannot be addressed by the spec). - Re-check the final size before patching the RIFF header in case JPEG output overshoots the pre-flight estimate. Tests: - SdVidGenHandlers: new IntCoercion suite covers fractional doubles, out-of-int-range doubles, picojson's own NaN/inf rejection at the JSON layer, and integer-valued doubles (the common case from JSON). - AviWriter: new tests for the overflow guard and the 4 GB RIFF cap, both fire before any encoding starts. - test_wan_video: pin width/height in the existing CorruptControlFrame test so the new dimension check passes for frame [0] and we still exercise the decode-failure path at frame [1]. Add two new cases covering end_image and control_frames dimension mismatch. All 211 C++ tests, 70 JS unit tests, lint and tsc --dts pass. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(diffusion-cpp/video): don't eager-require binding via addonLogging CI sanity-checks (JS unit tests on a runner with no native prebuild) was crashing with `AddonError: ADDON_NOT_FOUND` because the top-level `require('./addonLogging')` introduced in e6b13ae transitively pulled in `binding.js` -> `libqvac__diffusion-cpp.so`. The unit tests only exercise JS-side validation and never call `load()`, so they used to work without the prebuilt addon -- this regression broke that. Match `ImgStableDiffusion` instead: drop the per-instance native logger plumbing entirely (it's dead code anyway after the e6b13ae refactor, since `_connectNativeLogger` was no longer called), and document in the constructor JSDoc that callers wire up native C++ logs once globally via `addonLogging.setLogger(...)`. Net diff: - Remove `const addonLogging = require('./addonLogging')` at top. - Remove `_connectNativeLogger` / `_releaseNativeLogger` methods and their two stale call sites. - Remove `LOG_METHODS` (only used by the removed method) and `this._binding` (used to keep a handle for the removed release path; the binding is now scoped to `_createAddon` only, matching `ImgStableDiffusion::_createAddon`). - JSDoc on `args.logger` now mirrors `index.js` and points users at `addonLogging.setLogger`. Verified: JS unit tests 70/70 pass with the prebuilds directory moved aside, lint clean, tsc --dts clean. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(diffusion-cpp/video): validate init_image dims; reject unsupported lora Two reviewer-flagged regressions on PR #1879: 1. blocker (gabrielgrigoras-serv): processVideo() validates dimensions for end_image and every control_frames[i] but not for init_image. A caller passing width/height that don't match the decoded init_image would hand mismatched (width, height) and frame pixel stride to generate_video(), producing inconsistent frame data downstream (and risking VAE segfaults). Fix: add the same dimension check in SdModel.cpp processVideo() right after the init_image decode, throwing StatusError on mismatch -- consistent with the existing end_image / control_frames checks. All three checks now compare against vid.width / vid.height as the single source of truth for the video's final dimensions. Ownership of the freshly-decoded init pixel buffer is taken into the unique_ptr *before* the dim check, mirroring the control_frames path so a mismatch can't leak the buffer. 2. gianni-cor: params.lora silently dropped on the video path -- video.js validated it as a non-empty absolute path and video.d.ts advertised `lora?: string`, but SD_VID_GEN_HANDLERS has no "lora" entry and SdModel::processVideo never touches sd_vid_gen_params_t::loras, so any LoRA passed through was swallowed by the unknown-keys branch in applySdVidGenHandlers and silently produced LoRA-less output. Fix B applied (reviewer's preferred "out of scope" option): - video.js: replaced the absolute-path validation with a loud TypeError('params.lora is not supported for video generation yet'), so existing callers fail at the JS boundary instead of getting silent LoRA-less output. - video.d.ts: dropped `lora?: string` from VideoGenerationParams. - video-validation.test.js: collapsed the four old lora cases (empty / non-string / relative / absolute) into one parametrised test that asserts the new TypeError fires for every shape, so a future re-introduction of the JS validation can't bring back the silent-drop regression. When LoRA-on-video is wired through native (mirror of processImage's prepareLoras() + sd_img_gen_params_t::loras), the right path is to restore the absolute-path validation here and add a "lora" handler to SD_VID_GEN_HANDLERS, NOT to revert the d.ts. C++ test changes: - new Img2VidRejectsInitImageWithWrongDimensions covers the blocker. - Flf2VidRejectsCorruptEndImage pinned width/height to 64 so the new init dim check passes for the 64x64 init and we still reach the intended end-decode-failure path (same approach as the existing Img2VidRejectsCorruptControlFrame fixture). Verified: 67/67 JS unit tests pass with and without prebuilds, 176/176 C++ tests pass (1 opt-in Wan smoke skipped, requires ~8GB weights), lint and tsc --dts clean. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(diffusion-cpp): regression + 7 review-batch fixes (NaN/Inf guards, cancel, etc.) Addresses all 8 outstanding comments on PR #1879 (one regression from commit 59f2663 plus a CHANGES_REQUESTED batch of seven items). Major points below; per-file rationale in the inline comments. == Regression fix (highest priority) * gianni-cor flagged that the new init_image strict-equality check from commit 59f2663 rejects every off-grid frame with a confusing error citing wrapper-picked dims. Root cause: addon.js _fillDimsFromImage was silently doing Math.ceil(d/8)*8, so a 100x100 init_image got dispatched as 104x104 and the native check then threw "100x100 != 104x104" -- citing a value the caller never passed. Fixes: - addon.js _fillDimsFromImage now passes dims through verbatim (no rounding). The image SDEdit path already realigns internally (SdModel.cpp ~600) and the FLUX2 ref path uses auto_resize_ref_image, so dropping the rounding is safe across every path. - video.js _runInternal pre-empts the cryptic native error with a JS-layer off-grid probe: when width/height aren't explicit it reads init_image / end_image / control_frames[i] dimensions and throws a clear "your image is off-grid, pre-align or pass explicit dims" message naming the exact buffer. - Removes the ceil-vs-round inconsistency wart between _fillDimsFromImage (ceil) and the user-facing validator (round). - Three new JS regression tests for off-grid init / end / control, plus one positive test for explicit aligned dims overriding the probe. == JS hardening * params.prompt is documented Required but was never validated -- undefined / "" / 42 each produced a different failure mode (silent noise, silent noise, far-away C++ error). video.js now throws a loud TypeError at the wrapper boundary. Four new prompt-validation tests. * mapAddonEvent JobEnded fallback accepted every typed-array view -- works today only because uint8_t is the sole registered TypedArrayOutputHandler. When frameCallback (SdModel.hpp:139) gets wired through to JS, every per-frame event would have been misclassified as JobEnded and the response stream would have closed after the first frame. One-token fix: add `&& !ArrayBuffer.isView(rawData)` to the discriminator. ArrayBuffer.isView is true for every TypedArray + DataView, false for plain objects -- exactly the discrimination needed for the runtime-stats POJO. == C++ parser hardening (NaN / Inf / int64 / range) * Promoted requireInt from SdVidGenHandlers.cpp's anonymous namespace into parsers::, and added two siblings: - requireFiniteFloat: rejects NaN / +inf / -inf before the float cast (NaN compares false against every bound, so range checks of the form `f < lo || f > hi` previously let it sneak through). - requireInt64: same finite + integer guards as requireInt, range check against representable [INT64_MIN, INT64_MAX] doubles. - requireFiniteFloatInRange: convenience wrapper for [lo, hi] checks. * Routed every previously-vulnerable cast through the new helpers: - SdVidGenHandlers.cpp: seed (int64), cfg_scale, flow_shift, high_noise_cfg_scale, high_noise_flow_shift, vae_tile_overlap, cache_threshold, moe_boundary, strength, vace_strength - SdGenHandlers.cpp (image path, reviewer asked for symmetric fix): eta, cfg_scale, guidance, img_cfg_scale, seed, batch_count, strength, clip_skip, vae_tile_overlap, cache_threshold, width, height, steps, parseUpscaleRepeats * parseVaeTileSize (SdParsers.cpp): numeric form now routes through requireInt (rejects NaN/Inf/fractional/out-of-range), and BOTH forms (numeric and "WxH" string) now reject <= 0. Five new tests. == Cancellation gap + typed status * SdModel.cpp processVideo cancelRequested_ was checked exactly once after generate_video() returns -- the slow tail (per-frame PNG fan-out + AVI mux, multi-second on 81-frame 832x480 videos) had no cancellation visibility. Added 2 checks: top of frame-callback loop body, and immediately before encodeFramesToAvi. * Switched both Job cancelled throws (image path at SdModel.cpp:730, video path at :987, plus the 2 new C1 sites) from bare std::runtime_error to StatusError tagged with localCodeMsg="Cancelled", so the JS layer can discriminate cancel from real internal failures via codeString() ("[ General :: Cancelled ]") instead of string-matching the exception message. Note: this PR deliberately does NOT add `Cancelled = 6` to the shared inference-addon-cpp Errors.hpp enum, because that header ships via vcpkg to every package in the monorepo and a cross-package coordinated change is out of scope. Instead we use the 3-arg StatusError ctor (addonId, localCodeMsg, errorMsg) which produces the same codeString without touching the shared enum. When the enum is updated later, the 4 call sites can switch to the 2-arg ctor in a one-line follow-up. == C5 (preview_*) -- product decision deferred * The header comment at SdCtxHandlers.hpp:112 claimed preview_mode et al are "Wired to sd_set_preview_callback() in SdModel::process()", but a grep across packages/diffusion-cpp for sd_set_preview_callback returns zero matches -- the four config keys are validated and stored but the upstream callback is never installed, so they're a silent no-op end-to-end. Downgraded the misleading comment to an explicit TODO(QVAC-18026 follow-up) documenting the gap and the two viable resolution paths (wire it up alongside sd_set_abort_callback, OR remove the handlers + fields + tests). Reviewer asked which path is intended; this commit picks neither and just stops claiming the wiring exists. The choice can land in a separate PR without holding this one up. == Test surface * +8 JS tests (prompt validation x4, off-grid probe x4) * +5 C++ tests (vae_tile_size zero/negative/fractional/out-of-range rejection, plus the existing IntCoercion suite carried over to the promoted helpers transparently) * Cancel-context test updated to assert the typed "[ General :: Cancelled ]" codeString in addition to the message. Verified locally: JS unit tests: 75/75 pass with prebuild, 75/75 also without (CI sanity-checks mode, no native binary loaded) C++ unit tests: 209/210 pass, 1 opt-in skip (SdWanHappyPathTest needs ~8GB Wan weights) npm run lint: clean npm run test:dts: clean Co-authored-by: Cursor <cursoragent@cursor.com> * chore(diffusion-cpp): release 0.8.0 Bumps @qvac/diffusion-cpp to 0.8.0 and documents the Wan 2.1 / Wan 2.2 video pipeline shipped since 0.7.0: new VideoStableDiffusion class (txt2vid / img2vid / flf2vid), MoE high-noise expert routing, streaming MJPG AVI muxer, refactored download helpers + Wan model script, plus the supporting JS + C++ test coverage and validation hardening. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(diffusion-cpp): re-align auto-detected img dims to multiple of 8 _fillDimsFromImage was passing raw image dimensions through verbatim since fe4d10f, but the native SdGenHandlers validates width/height % 8 == 0 before the downstream alignment in SdModel::processImage ever runs. Any img2img call with a non-aligned source image (e.g. the bundled 500x627 von-neumann.jpg used by the FLUX2 i2i integration test) therefore failed with: height must be a positive multiple of 8, got: 627 Restore the Math.ceil(d/8)*8 round-up that was removed in fe4d10f. The original motivation for the removal -- avoiding a spurious dim mismatch on the video path where processVideo strict-compares decoded frame dims against vid.width/vid.height -- is already handled at the JS layer by VideoStableDiffusion's off-grid pre-validation in video.js, which runs before this helper and rejects unaligned init/end/control frames with a clear caller-facing error. The ceil() is therefore a no-op on the video path. Co-authored-by: Cursor <cursoragent@cursor.com> * style(diffusion-cpp): apply clang-format to drifted C++ sources cpp-lint surfaced clang-format drift in 4 files that accumulated across recent Wan-video commits. No semantic changes -- only mechanical line-wrap / arg-break placement to match the project's .clang-format. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(diffusion-cpp/test): use package export for video module in wan integration test The generate-video-wan.test.js test was using a relative import (require('../../video')) that breaks when test files are bundled and relocated to the test-framework backend directory during mobile test setup. Change to the package export pattern (@qvac/diffusion-cpp/video) used by other integration tests, which remains valid regardless of file location. Fixes: https://github.com/tetherto/qvac/actions/runs/25929776543/job/76221440417 Co-authored-by: Cursor <cursoragent@cursor.com> * fix(diffusion-cpp): expose video API from package root Co-authored-by: Cursor <cursoragent@cursor.com> * fix(diffusion-cpp): repair variable names in SdModel after merge Co-authored-by: Cursor <cursoragent@cursor.com> * style(diffusion-cpp): apply git-clang-format Co-authored-by: Cursor <cursoragent@cursor.com> --------- Co-authored-by: Cursor <cursoragent@cursor.com> Co-authored-by: gianni-cor <gianfranco.cordella@tether.io>

… overlay PR #10 (Wan 2.1 I2V VAE-tiling fix) is merged into the 2026-03-01 branch of qvac-ext-stable-diffusion.cpp and published to the registry as 2026-03-01#6. Remove the temporary package-local stable-diffusion-cpp vcpkg overlay port and its overlay-ports entry, bump the dependency to #6, and point the registry baseline at the commit that publishes it. Registry bump: tetherto/qvac-registry-vcpkg#175 Co-authored-by: Cursor <cursoragent@cursor.com>

…encoder (#2237) * feat(diffusion-cpp): add Wan 2.1 I2V model download, FLF2V helpers, and VAE tiling patch Adds tooling and assets to support image-to-video (img2vid) and frame-to-frame interpolation (FLF2V) generation with the Wan 2.1 I2V 14B model in GGUF format. Additions: - scripts/download-model-wan-i2v.sh: downloads city96/Wan2.1-I2V-14B-480P-gguf Q4_K_M (~11 GB) plus VAE, T5-XXL, and CLIP ViT-H/14 vision encoder - examples/generate-shannon-flux.js: FLUX2-klein img2img helper to generate an end-frame at matching resolution (FLF2V requires both frames to share dims) - examples/generate-flf-end-frame.js: alternative img2vid-based frame generator - addon/examples/img2vid-wan-example.cpp + CMakeLists.txt: native C++ usage example - vcpkg/ports/patches/wan-i2v-encode-video-bypass-tiling.patch: patches stable-diffusion.cpp to skip 2D VAE tiling for 4D video tensors (avoids GGML_ASSERT failure during VAE encode in img2vid/flf2vid) - assets/claude-shannon-resized.jpg, assets/maks-original.jpg: example assets Note: This PR adds only NEW files; the corresponding C++ wiring for clipVision in addon/src/* and JS bindings in addon.js/video.js/index.js is tracked separately in feature/itv (b0e32e0) and will be ported in a follow-up PR once compatible with the post-history-rewrite addon refactor. Co-authored-by: Cursor <cursoragent@cursor.com> * feat(diffusion-cpp): port Wan 2.1 I2V C++ wiring and JS bindings from feature/itv - Port full addon/src C++ implementation: clipVisionPath support in SdCtxHandlers, AddonJs, and SdModel; FLF2V (first-last-frame-to-video) handlers in SdVidGenHandlers; updated AviWriter and SdVideoFrames for video generation - Add clipVisionPath to video.js and index.js configurationParams so the native addon receives the CLIP vision encoder path for I2V/FLF2V modes - Update img2vid-wan.js to default to the dedicated Wan 2.1 I2V 14B GGUF checkpoint with CLIP vision, replacing the T2V 1.3B placeholder - Update flf2vid-wan.js with production-ready FLF2V defaults, crossfade prompt, and releaseLogger() in finally block to prevent process hang - Update img2img-flux2.js and img2img-flux2-f16.js with clipVisionPath passthrough fix Co-authored-by: Cursor <cursoragent@cursor.com> * feat(diffusion-cpp): remove FLF2V interpolation, deliver I2V only Remove first-last-frame-to-video (flf2vid) mode from the public API: - Delete examples/flf2vid-wan.js and examples/generate-flf-end-frame.js - Remove 'flf2vid' from VIDEO_MODES and all end_image validation in video.js - Remove VideoMode 'flf2vid' and end_image field from video.d.ts Co-authored-by: Cursor <cursoragent@cursor.com> * feat(diffusion-cpp): remove flf2vid from C++ addon entirely Remove first-last-frame-to-video from the native layer: - SdModel.cpp: remove flf2vid mode branch, end_image decode/resize path, vidParams.end_image assignment, and endImg/endData locals - SdModel.hpp: remove endImageBytes field from GenerationJob - SdVidGenHandlers.cpp/.hpp: remove flf2vid from valid mode set and comments - AddonJs.hpp: remove endImageBuffer parsing - SdCtxHandlers.hpp: remove FLF2V references from clipVisionPath comment Supported video modes are now strictly txt2vid and img2vid. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(diffusion-cpp): Address all critical C1–C7 issues + implement High priority fixes **Critical Issues (C1–C7):** - C1: Thread-local callbacks already implemented (tl_progressCtx, tl_abortModel) - C2: Gate unused preview_mode config (parsed but never wired) - C3: Fix memory leak on generate_image() exception paths using RAII wrappers - C4: Null-check generate_image/video returns, throw StatusError on failure - C5: Implement applyFluxImg2ImgDimDefaults() for FLUX img2img dimension defaults - C6: Harden VideoStableDiffusion (LoRA rejection; end_image/flf2vid deferred) - C7: Harden mapAddonEvent with explicit Uint8Array checks and documentation **High Priority (H1–H12) - Previously completed:** - Shared integer parsing (requireInt, requirePositiveInt, etc.) with overflow guards - Standardized cancellation errors via makeCancelledError() - JS input validation (dimensions, prompts, image coercion) - Overflow checks in image resizing & AVI encoding - Cooperative cancellation in video post-generation - TypeScript .d.ts synchronization **Infrastructure:** - Scaffold local vcpkg overlay port for Wan I2V VAE-tiling patch - Restore portfile.cmake + supporting config files - Pin to stable-diffusion-cpp@00cd2a09 (registry #4) for SD_BACKEND_PREF_AUTO **Files Changed:** C++ handlers, model interface, utilities: integer parsing, error handling, memory safety JavaScript: input validation, FLUX dimension defaults, video params, event mapping TypeScript: type definitions for new exports and corrected runtime behavior vcpkg: local overlay + patch machinery for I2V fix Closes #HIGH-PRIORITY, fixes i2v model loading via patched VAE tiling. Co-authored-by: Cursor <cursoragent@cursor.com> * Merge origin/main with C1-C7 critical fixes (excluding flf2vid) Co-authored-by: Cursor <cursoragent@cursor.com> * style(diffusion-cpp): clang-format C++ files changed vs main Co-authored-by: Cursor <cursoragent@cursor.com> * fix(diffusion-cpp): fix unit test failures after flf2vid removal - video.js: add peekImageDims helper; reject off-grid init_image / control_frames dimensions when caller omits explicit width/height; unify control_frames error message to 'must be a non-empty Uint8Array' - test: remove flf2vid-specific tests (29,40,56,58,64-66); update test 63 error-message regex; update test 29 mode list regex Co-authored-by: Cursor <cursoragent@cursor.com> * fix(diffusion-cpp): fix cpp-tests build failures - overlay portfile: bump stable-diffusion-cpp pin from 00cd2a09 (#4) to 747a1801 (#5) so EsrganUpscaler.cpp's sd_upscaler_device_t and new_upscaler_ctx_with_device resolve; patch still applies cleanly - SdModel.cpp processVideo: revert init_image / control_frames dimension mismatch from resize to throw, matching C++ unit test expectations - test_wan_video.cpp: remove all flf2vid and endImageBytes tests (flf2vid was removed from the C++ layer); update ValidationThrowClearsThreadLocalState to use img2vid instead Co-authored-by: Cursor <cursoragent@cursor.com> * fix(diffusion-cpp): pass clipVisionPath to addon in ImgStableDiffusion Co-authored-by: Cursor <cursoragent@cursor.com> * fix(diffusion-cpp): align init_images error messages with integration test expectations Co-authored-by: Cursor <cursoragent@cursor.com> * fix(diffusion-cpp): fix 10 failing cpp-tests unit tests - Restore diffusionFlashAttn/diffusionConvDirect/vaeConvDirect defaults to true - Restore preview handlers (mode/interval/denoised/noisy) — revert C2 gating - Remove flf2vid from AcceptsTxt2VidImg2VidFlf2Vid test (renamed) - Add zero/negative/fractional/out-of-range validation to parseVaeTileSize Co-authored-by: Cursor <cursoragent@cursor.com> * fix(diffusion-cpp): apply FLUX img2img 1024 defaults when prediction is in load config Co-authored-by: Cursor <cursoragent@cursor.com> * fix(diffusion-cpp): address PR review comments (jpgaribotti, jesusmb1995) - Remove generate:flf2vid npm script (example file was deleted) - Fix img2vid-wan-example.cpp default to GGUF path (not fp8_scaled) - Align Wan I2V spatial constraint to 16 (was 8) in video.js - Throw (not warn) when files.clipVision missing for img2vid - Remove endImageBuffer dead code from addon.js - Scrub stale flf2vid/end_image references from JSDoc and comments Co-authored-by: Cursor <cursoragent@cursor.com> * fix(diffusion-cpp): update video-validation tests for alignTo=16 (Wan spatial multiple) Co-authored-by: Cursor <cursoragent@cursor.com> * fix(diffusion-cpp): fix unit test regressions from alignTo=16 and clipVision throw - Add FAKE_CLIP_VISION to makeWanModel defaults so img2vid tests pass the new 'files.clipVision required' guard - Fix test 41: width/height 104 -> 112 (first multiple of 16 > 100) Co-authored-by: Cursor <cursoragent@cursor.com> * chore(diffusion-cpp): scrub all remaining FLF2V/end_image references Remove every comment, JSDoc, test, and CHANGELOG mention of flf2vid, FLF2V, first-last-frame, and end_image across the package. Also removes the end_image validation blocks in video.js and the two corresponding unit tests, since end_image was only ever used by the now-removed flf2vid mode. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(ci): remove stale vcpkg dir before clone on macOS self-hosted runners Self-hosted macOS runners persist the parent directory between runs, so a leftover vcpkg/ from a previous job causes `git clone` to fail with "destination path 'vcpkg' already exists". Add `rm -rf vcpkg` before the clone to ensure a clean state. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(ci): update setup-vcpkg SHA to include stale-dir rm fix All workflow callers were pinned to 6e8d3c3 (original action commit) which didn't include the rm -rf vcpkg cleanup. Update all 7 callers to 80fdb78 so CI picks up the fix on macOS self-hosted runners. Co-authored-by: Cursor <cursoragent@cursor.com> * revert(ci): remove rm -rf vcpkg patch from setup-vcpkg action Runner-level cleanup to be handled by DevOps. Keeping the SHA bump in workflow callers to stay in sync with the current action commit. Co-authored-by: Cursor <cursoragent@cursor.com> * test(diffusion-cpp): add Wan 2.1 I2V smoke integration test Adds a CI smoke test for img2vid mode alongside the existing txt2vid test in generate-video-wan.test.js. Downloads the I2V 14B Q4_K_M GGUF, shared VAE/T5-XXL, and clip_vision_h models on demand; uses the existing von-neumann-colorized.jpg asset as init_image; runs 2 steps at 480x272 to keep wall-clock under 5 minutes on GPU runners. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(diffusion-cpp): use city96 public repo for Wan I2V GGUF model download bartowski's wan2.1-i2v-14b-480p-GGUF repo requires authentication (401). Switch to city96/Wan2.1-I2V-14B-480P-gguf which is public (gated: false) and is the same source used by the download-model-wan-i2v.sh script. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(diffusion-cpp): resolve init_image dimension mismatch in I2V video generation - Remove hardcoded 480x272 dimensions from I2V test to prevent mismatch with 512x512 init_image - Infer video dimensions from init_image header when width/height are omitted - Add early JavaScript validation to catch dimension mismatches before C++ execution - Provide helpful error message guiding users to either omit dimensions or pre-scale the image Fixes Windows CI failure: "init_image dimensions 512x512 do not match video dimensions 480x272" Co-authored-by: Cursor <cursoragent@cursor.com> * ci(diffusion-cpp): skip Wan tests on CPU-only runners, enable on GPU darwin-arm64 - Remove blanket darwin skip to allow Wan tests on GPU-enabled darwin-arm64 - Only skip Wan tests on mobile and CPU-only runners (NO_GPU=true) - Fixes darwin-x64 CI timeout by skipping Wan tests on CPU-only macos-15-large - Allows Wan tests to run on GPU-enabled mac-mini-m4 (darwin-arm64) Resolves: darwin-x64 integration test taking 50+ minutes Co-authored-by: Cursor <cursoragent@cursor.com> * ci: add debug logging for Wan test skip behavior - Add workflow step to log NO_GPU and test configuration before tests run - Add console.log in Wan test module to show skip decision - Helps diagnose why darwin-x64 integration tests are taking too long This will show us: - If NO_GPU env var is properly set - Whether Wan tests are actually being skipped or running Co-authored-by: Cursor <cursoragent@cursor.com> * fix: resolve linting quote style error in Wan I2V test Co-authored-by: Cursor <cursoragent@cursor.com> * fix: revert overly strict init_image dimension validation The dimension mismatch check was catching a valid use case where: - caller passes off-grid init_image (e.g. 100x100) - caller explicitly specifies aligned width/height (e.g. 112x112) - caller handles alignment themselves Removing this check restores the original behavior and allows callers to intentionally provide mismatched dimensions. The C++ layer will catch truly invalid combinations. Fixes failing unit test: "accepts off-grid init_image when caller passes explicit aligned width/height" Co-authored-by: Cursor <cursoragent@cursor.com> * fix: correct workspace cleanup condition for all self-hosted runners Replace restrictive startsWith(matrix.runner, 'qvac-') check with runner.environment != 'github-hosted' to properly apply workspace cleanup to ALL self-hosted runners, including mac-mini-m4-gpu and other runners that don't follow the qvac- naming convention. This ensures self-hosted runners (whether qvac-*, mac-mini-*, or others) get proper workspace cleanup, while github-hosted runners skip it. Co-authored-by: Cursor <cursoragent@cursor.com> * fix: refine workspace cleanup condition to avoid GitHub-hosted ARM runners Use explicit exclusion of standard GitHub runner prefixes (ubuntu-, macos-, windows-) instead of runner.environment check, which may not work reliably with GitHub-hosted ARM runners like ubuntu-24.04-arm and ubuntu-22.04-arm. This ensures: - Self-hosted runners (qvac-*, mac-mini-*, etc.) get cleanup (✓) - GitHub-hosted runners (ubuntu-*, macos-*, windows-*) skip cleanup (✓) - GitHub-hosted ARM runners (ubuntu-*-arm) skip cleanup (✓) Co-authored-by: Cursor <cursoragent@cursor.com> * chore: sync CI/CD workflows from main Pulls latest workflow files from main branch to ensure feature/wan-i2v uses the current CI/CD configurations, including the workspace cleanup fixes for self-hosted macOS runners. Co-authored-by: Cursor <cursoragent@cursor.com> * fix: use correct workspace cleanup condition instead of failed runner.environment The runner.environment != 'github-hosted' condition caused failures on GitHub-hosted ARM runners (ubuntu-*-arm). Use explicit prefix exclusion instead: - Skip cleanup for GitHub-provided runners (ubuntu-*, macos-*, windows-*) - Apply cleanup to all self-hosted runners (qvac-*, mac-mini-*, etc.) This is the correct fix that should have been in PR #2359. Co-authored-by: Cursor <cursoragent@cursor.com> * chore: sync workflows with main Pull all workflow files from main to keep feature/wan-i2v workflows identical to main. No custom CI/CD changes on this branch. Co-authored-by: Cursor <cursoragent@cursor.com> * chore: update vcpkg overlay to point to fix/wan-i2v-vae-tiling PR branch Point the stable-diffusion-cpp portfile to the fix/wan-i2v-vae-tiling branch from qvac-ext-stable-diffusion.cpp PR #9 instead of applying the patch overlay. This allows testing the upstream fix before it's merged. Once the PR is merged and published in the qvac registry, this overlay can be removed entirely. GitHub PR: tetherto/qvac-ext-stable-diffusion.cpp#9 Co-authored-by: Cursor <cursoragent@cursor.com> * fix: pin vcpkg overlay to exact commit SHA instead of branch name Using a branch name REF without SHA512 causes vcpkg to fail. Pin to exact commit 793d377 (HEAD of fix/wan-i2v-vae-tiling branch) with the correct SHA512 hash. Co-authored-by: Cursor <cursoragent@cursor.com> * fix: point vcpkg overlay to clean cherry-pick on 2026-03-01 base Previous branch was based off master and included 9 upstream commits that shouldn't be in the PR (CI workflow changes, docs, etc.). New clean branch fix/wan-i2v-vae-tiling-clean is based directly off 2026-03-01 with only the VAE tiling fix cherry-picked. PR: tetherto/qvac-ext-stable-diffusion.cpp#10 Co-authored-by: Cursor <cursoragent@cursor.com> * fix: correct SHA512 to use zip hash (vcpkg downloads .zip not .tar.gz) Co-authored-by: Cursor <cursoragent@cursor.com> * chore: remove patch file — fix is baked into the pinned commit The portfile now points directly to the commit that already contains the VAE tiling fix, so the patch file is redundant and has been removed. Co-authored-by: Cursor <cursoragent@cursor.com> * fix: use tar.gz SHA512 — vcpkg downloads .tar.gz not .zip Co-authored-by: Cursor <cursoragent@cursor.com> * fix(diffusion-cpp): use 256x256 init image for Wan I2V to fit Metal GPU budget The Wan I2V 14B test OOM'd on the Mac mini M4 Metal backend during diffusion compute (kIOGPUCommandBufferCallbackErrorOutOfMemory). The 512x512 init image (inferred as the video resolution) was ~2x the pixels of the original 480x272 config and exceeded the GPU memory budget. Add a pre-resized 256x256 init image asset and point the I2V smoke test at it, shrinking the video latent/activation footprint so the 14B model fits in GPU memory on the Mac mini M4 runner. Co-authored-by: Cursor <cursoragent@cursor.com> * test(diffusion-cpp): skip Wan video tests on macOS/Metal due to GPU OOM The Wan 14B I2V model OOMs the Mac mini M4 Metal GPU during diffusion compute (kIOGPUCommandBufferCallbackErrorOutOfMemory), even after dropping the init image to 256x256. Exclude darwin entirely from the Wan suite; the tests still run on Linux/Windows GPU runners. Co-authored-by: Cursor <cursoragent@cursor.com> * test(diffusion-cpp): remove unused 256x256 init image Wan tests are now skipped on macOS/Metal, so the smaller init image added to work around the Metal GPU OOM is no longer needed. Revert the I2V smoke test back to the original 512x512 init image and delete the resized asset. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(diffusion-cpp): satisfy clang-tidy identifier-naming in addon clang-tidy readability-identifier-naming flagged six globals introduced by the Wan I2V wiring. Rename to match the package .clang-tidy convention: - global constants -> UPPER_CASE: kMaxSafeJsonInt, kAddonId, kCancelled, kJobCancelledMessage - thread_local globals -> g_ prefix: tl_progressCtx, tl_abortModel Co-authored-by: Cursor <cursoragent@cursor.com> * fix(diffusion-cpp): restore root VideoStableDiffusion export VideoStableDiffusion was dropped from index.js when the Wan 2.1 I2V bindings were ported (ca07e91), leaving require('@qvac/diffusion-cpp').VideoStableDiffusion undefined even though index.d.ts still declares it as a named export. Re-export it from the barrel to realign the runtime export with the type declarations. The subpath entry point (@qvac/diffusion-cpp/video) was unaffected. Co-authored-by: Cursor <cursoragent@cursor.com> * build(diffusion-cpp): consume sd.cpp 2026-03-01#6 from registry, drop overlay PR #10 (Wan 2.1 I2V VAE-tiling fix) is merged into the 2026-03-01 branch of qvac-ext-stable-diffusion.cpp and published to the registry as 2026-03-01#6. Remove the temporary package-local stable-diffusion-cpp vcpkg overlay port and its overlay-ports entry, bump the dependency to #6, and point the registry baseline at the commit that publishes it. Registry bump: tetherto/qvac-registry-vcpkg#175 Co-authored-by: Cursor <cursoragent@cursor.com> * build(diffusion-cpp): repoint vcpkg baseline to merged registry commit Registry PR tetherto/qvac-registry-vcpkg#175 is merged. Update the default-registry baseline from the temporary PR-branch commit to the registry main merge commit (8693af45) that publishes stable-diffusion-cpp 2026-03-01#6. Co-authored-by: Cursor <cursoragent@cursor.com> * Update vcpkg-configuration.json * Update vcpkg-configuration.json * Update CHANGELOG.md * bump version to 0.11.0 * fix(diffusion-cpp): remove broken Wan C++ example Co-authored-by: Cursor <cursoragent@cursor.com> * fix(diffusion-cpp): address PR review on Wan I2V video bindings - Standardize video dimensions on multiples of 16 end-to-end: C++ width/height handlers and video.d.ts now match the JS wrapper. - requireRange: reject non-finite values (NaN/Inf) before range check. - Video seed uses requireInt64 (parity with image path); no silent truncation of fractional/out-of-range seeds. - Use typed makeCancelledError() at all diffusion cancel sites. - Docs: clipVision is required for img2vid and throws; preview-callback options are parsed but not yet wired. Co-authored-by: Cursor <cursoragent@cursor.com> * test(diffusion-cpp): update unit tests for 16-aligned dims and typed cancel - SdVidGenHandlers dimension tests now expect multiples of 16 (reject multiples of 8 that aren't 16-aligned), matching the handler change. - Cancel-context test expects the typed [ Diffusion :: Cancelled ] code emitted by makeCancelledError() at all diffusion cancel sites. Co-authored-by: Cursor <cursoragent@cursor.com> --------- Co-authored-by: Cursor <cursoragent@cursor.com> Co-authored-by: gianni-cor <gianfranco.cordella@tether.io>

Lifecycle correctness: - Spawn lock: steal only when the owner pid is dead (with an mtime fallback for an unreadable lock), so a legitimate multi-minute cold start no longer loses its lock after 30s and spawns a duplicate runner/serve (tetherto#1). - close(): the fetch path now bails out instead of re-resolving once closed, so a request racing close() can't silently re-add a consumer / spawn a runner (tetherto#3). - sweepServes: when an orphaned serve's pid is alive but its health check fails, keep the record instead of dropping it — dropping stranded a live serve with no registry trace. We only reap once it answers as ours, or drop once its pid dies (tetherto#4). - servePort: fold a pinned port into the fleet key so pinned-port callers don't reuse an auto-allocated serve on a different port, and distinct pins don't collide (tetherto#5). - Respawn: expose baseURL/port/pid as getters over live state, updated on every reconnect, so diagnostics/external clients see the real serve after recovery (tetherto#6). - retargetUrl now handles Request inputs (not just string/URL) so a respawn stays transparent if the SDK ever switches input shapes (tetherto#8). Docs: - README + docs-site: direct-baseURL tools (OpenCode/Cline/Aider) don't extend liveness; document the long-lived-sentinel/wrapper pattern and fix the misleading "the script doesn't have to stay running" note (tetherto#2). - Reconcile version wording: README/changelog now describe managed mode as unreleased (package is 0.1.0); docs-site integration page documents managed mode + the async overload (tetherto#7). Tests: spawn-lock steal/keep matrix, fleet-key pinned-port sensitivity, and the runner-dead + serve-alive + health-failing sweep case. Build + suite green (60 pass / 1 integration skip).

* feat[api]: add managed mode to @qvac/ai-sdk-provider (QVAC-19900) Add `mode: 'managed'` so the provider can synthesize an ephemeral qvac.config.json from a model-constant list, spawn and supervise `qvac serve` on a free port, and tear it down on host exit. External mode is unchanged and stays synchronous; the managed supervisor is lazily dynamic-imported so external-mode users pay no startup cost. @qvac/cli becomes an optional peer dependency. * fix: resolve @qvac/cli via main entry when its exports block package.json (QVAC-19900) The published @qvac/cli ships a string `exports` field ("./dist/index.js"), which makes the `./package.json` subpath non-resolvable (ERR_PACKAGE_PATH_NOT_EXPORTED). Managed mode relied on resolving `@qvac/cli/package.json` to locate the bin, so it would fail to find the CLI on a clean install. Fall back to resolving the package main entry, which for @qvac/cli is the same file as the `qvac` bin. * doc: update ai-sdk provider agent setup after queue (QVAC-19900) * QVAC-19900 feat[api]: per-model config for managed mode Managed mode `models` now accepts spec objects ({ name, config, preload, default }) alongside bare constant names, so callers can set per-model serve options — notably `ctx_size` and `reasoning_budget` — that coding agents like OpenCode require. The synthesized qvac.config.json carries the config block, honors explicit `preload`/`default`, and validates names inside spec objects. Exports the new `QvacManagedModel` type and documents per-model config plus a managed-mode OpenCode example in the README. * QVAC-19900 feat[api]: shared idle-reaped managed serve daemon Rework managed mode from a per-provider supervisor into a shared, self-cleaning serve daemon so it is robust standalone and usable by any tool, not just a single session. - Reuse via a fleet key (model set + per-model config + host) keyed in a cross-process registry under ~/.qvac/managed-serves/; createQvac attaches to a matching healthy serve instead of cold-starting a duplicate. - A detached runner owns the qvac serve child and reaps it once no consumer process has been alive for serveIdleTimeout (default 5m). Liveness, not request traffic, is the signal, so it works for tools that hit baseURL directly (OpenCode/Cline/Aider). - close() now detaches (deregisters the consumer) instead of killing; a shared serve survives until its last user is gone. - Sweep only reaps dead/orphaned serves, never a healthy serve a live process owns (fixes a second session SIGKILLing a downloading serve). - Respawn-on-failure: fetch re-resolves and retries once on ECONNREFUSED. - reuse:false (or a pinned servePort) yields a private serve reaped as soon as its owner exits. Refactor into serve-process.ts (spawn/health/stop), registry.ts, fleet-key.ts, runner.ts; remove supervisor.ts and pid-tracker.ts. Add reuse and serveIdleTimeout options. Rewrite tests and add reuse/idle-reap end-to-end coverage; document the shared lifecycle in the README. * QVAC-19900 fix: reject duplicate model names in managed mode Each managed model maps to a single serve alias keyed by its name, so a repeated name silently overwrote the earlier entry — and could drop its `default: true`. Reject duplicates up front with DuplicateManagedModelError instead of resolving them ambiguously. Addresses PR review feedback. * QVAC-19900 fix[api]: address managed-mode self-review findings - Per-instance consumer markers (<pid>.<rand>) so two providers in one process sharing a fleet key don't deregister each other on close (A). - Restrict respawn retry to ECONNREFUSED so an in-flight completion is never blindly replayed on ECONNRESET/EPIPE (C). - Health-check the recorded baseURL before SIGTERM-ing an orphaned serve, guarding against killing a recycled pid (D). - Use dirname() instead of a posix-only regex for ephemeral config cleanup (E). - Fold serveBinPath into the fleet key so distinct local builds don't share a serve (G). - Export managed error classes + QvacManagedErrorCode for instanceof checks (H). - Reject more than one explicit default: true (I). - Deregister the consumer if resolveServe throws (F); drop dead firstConsumerPid runner param (J). Tests: per-instance markers, health-gated orphan sweep (kills serving orphan, spares non-serving stranger pid), fleet-key serveBinPath sensitivity, multiple-default rejection. README updated. * QVAC-19900 fix[api]: address managed-mode lifecycle review (round 2) Lifecycle correctness: - Spawn lock: steal only when the owner pid is dead (with an mtime fallback for an unreadable lock), so a legitimate multi-minute cold start no longer loses its lock after 30s and spawns a duplicate runner/serve (#1). - close(): the fetch path now bails out instead of re-resolving once closed, so a request racing close() can't silently re-add a consumer / spawn a runner (#3). - sweepServes: when an orphaned serve's pid is alive but its health check fails, keep the record instead of dropping it — dropping stranded a live serve with no registry trace. We only reap once it answers as ours, or drop once its pid dies (#4). - servePort: fold a pinned port into the fleet key so pinned-port callers don't reuse an auto-allocated serve on a different port, and distinct pins don't collide (#5). - Respawn: expose baseURL/port/pid as getters over live state, updated on every reconnect, so diagnostics/external clients see the real serve after recovery (#6). - retargetUrl now handles Request inputs (not just string/URL) so a respawn stays transparent if the SDK ever switches input shapes (#8). Docs: - README + docs-site: direct-baseURL tools (OpenCode/Cline/Aider) don't extend liveness; document the long-lived-sentinel/wrapper pattern and fix the misleading "the script doesn't have to stay running" note (#2). - Reconcile version wording: README/changelog now describe managed mode as unreleased (package is 0.1.0); docs-site integration page documents managed mode + the async overload (#7). Tests: spawn-lock steal/keep matrix, fleet-key pinned-port sensitivity, and the runner-dead + serve-alive + health-failing sweep case. Build + suite green (60 pass / 1 integration skip). * docs: use canonical qvac.tether.io URL in ai-sdk-provider README * QVAC-19900 feat[api]: public model catalog + catalog-id aliases in managed mode Add `models.qvacCatalog`, a public models.dev-style catalog that maps friendly ids (`qwen3.5-9b`) to the SDK constant the serve loads (`QWEN3_5_9B_MULTIMODAL_Q4_K_M`), so the id a user picks from models.dev resolves end-to-end with no translation layer in front of the serve. Managed mode now accepts catalog ids as model names: the synthesized serve config keys the alias by the friendly id while `model` resolves to the underlying SDK constant, so the serve answers `qwen3.5-9b` directly. Bare SDK constants keep working unchanged. A drift unit test fails CI if any catalog constant disappears from the generated SDK catalog. * QVAC-19900 feat[api]: process-group serve teardown + closeOnParentExit Harden managed-mode lifecycle so a managed serve never leaks its `bare` inference worker or outlives the process that owns it. - Process-group teardown: spawn `qvac serve` detached (its own group) and, when stopServe must escalate past the grace window, SIGKILL the whole group. A plain SIGKILL of the serve pid never cascades to the grandchild bare worker, so previously a wedged serve orphaned the worker. The graceful SIGTERM is still sent to the serve process only, so a healthy serve orchestrates its own shutdown and releases the global worker lock (no stale lock left behind); the group SIGKILL is the wedged-path fallback. - `closeOnParentExit` option: for a daemon-style host whose sole job is to keep a managed serve alive for a parent process (e.g. an editor/agent plugin). The provider watches its parent pid and, the moment the parent exits (on POSIX we are reparented to init, ppid → 1), closes itself — deregistering the consumer so the runner reaps the serve — and exits. Without it a hard-killed parent would leave a reparented host alive, keeping its consumer marker forever so the serve was never reaped. Tests: a stubborn-grandchild fake serve proves group teardown reaps the worker; `parentIsGone` unit-tests the parent-watch decision. * QVAC-19900 fix: keep managed serve lifecycle correct under close() race and crash-respawn - Undo the consumer re-registration when close() wins the race against an in-flight fetch retry: resolveServe re-adds the marker after close() removed it, which would keep the shared serve warm until the process exits. - Preserve live consumer markers when sweepServes reaps a crashed/orphaned serve, so a respawned runner inherits the still-alive sessions instead of idle-reaping the fresh serve out from under them. - docs: bump managed-mode ctx_size examples to 32768 for agent-sized prompts. * QVAC-19900 fix: rename reresolve result to resolved for clarity in managed fetch * QVAC-19900 mod: collapse redundant sync/async registry teardown helpers removeConsumer/removeConsumerSync and removeRecord/removeRecordSync were a confusing sync/async mirror: the async removeConsumer was only ever called right after the sync one (a guaranteed no-op), and the removeRecord pair was really two teardown semantics under near-identical names. Marker/record teardown is a single unlink/rm, cheap enough to be synchronous everywhere — including process 'exit' handlers where async can't run — so collapse each pair into one sync function. No behaviour change; addresses review feedback on #2408. * QVAC-19900 mod: trim verbose comments in managed registry Tighten the sync-rationale comments on removeRecord/removeConsumer and drop a stale, broken leftover comment above ensureDirSync. Keeps the non-obvious intent (why sync, preserveConsumers semantics) without the narration. * QVAC-19900 mod: drop unused DEFAULT_SERVE_BIN and ephemeralConfigName Both were dead: DEFAULT_SERVE_BIN was never imported (serve-process spawns the resolved CLI path verbatim) and ephemeralConfigName was an unused helper (writeEphemeralConfig uses a fixed name inside an mkdtemp dir). Removing the latter also drops the now-unused randomBytes import.

testing qvac-cli-integration

67e9392

Proletter merged commit 266e6d5 into main Jan 8, 2026
3 of 4 checks passed

gianni-cor mentioned this pull request May 5, 2026

QVAC-17990 Add standalone ESRGAN upscaler API #1901

Merged

simon-iribarren mentioned this pull request May 13, 2026

QVAC-17876 feat[bc]: replace onnx-tts with ggml-tts #1992

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

testing qvac-cli-integration#6

testing qvac-cli-integration#6
Proletter merged 1 commit into
mainfrom
qvac-cli-integration-test2

Proletter commented Jan 8, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

Proletter commented Jan 8, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant