QVAC-19094 feat[api]: expose Sortformer v2.1 + AOSC streaming knobs in @qvac/sdk #2176
Closed
pratiknarola-t wants to merge 1309 commits into
Installs rustup on Windows when missing and runs `rustup target add` for the matrix entry (iOS device+sim arm64/x64, Android arm64/armv7).
…urns (#1737)
* fix: skip kv-cache savedCount on cancelled or zero-token turns
When a completion was cancelled mid-decode (or returned zero tokens) with `kvCache` enabled, the SDK was still recording `history.length + 1` in `cachedMessageCounts`. On the user's next prompt, `prepareMessagesForCache` would then slice the history against that stale count and hand the model an empty payload, producing a silent no-response. See QVAC-17780 for the repro.
Fix:
- Split the pure state + decision logic into a new `kv-cache-state.ts` module (no bare-* imports so it can be unit tested under bun).
- Add a per-model cancel counter; `cancel.ts` bumps it before calling the addon, and `completion()` snapshots it around `processModelResponse` to detect cancellation.
- `processModelResponse` now reports `producedTokens`; a new `shouldRecordSavedCount(wasCancelled, producedTokens)` helper gates both the custom-key and auto-keyed `recordCacheSaveCount` call sites. On cancel or zero-token turns, the entry is deleted instead of being poisoned.
- Harden `decideCachedHistorySlice` so that a `savedCount` which would slice the history to `[]` is treated as stale: fall back to sending the system-stripped full history and clear the bad count. Protects against any future path that records an off-by-one count.
* chore: drop ticket/PR references from kv-cache fix comments
Keep only the comments that explain intent or behaviour; the "why" already lives in the commit log and PR body.
* fix: thread platform path.sep into clearCachedMessageCounts
The pure state module's "/" default was load-bearing only for unit tests under bun. Runtime callers go through delete-cache.ts → completion-stream.ts, where bare-path is available, so `path.sep` is injected there and directory-prefix clears match correctly on every target (including Windows).
* fix: drop stale cache file + registry entry on cancelled custom-key turn
The custom-key cancel branch only cleared `cachedMessageCounts` while leaving the on-disk cache file (and the in-memory `initializedCaches` entry) in place. The addon writes the cache unconditionally on `saveCacheToDisk` turns, including cancellations, so what's left on disk holds partial decode state past the user prompt. Next turn would then load that stale KV state on top of the new prompt and reply incoherently.
Mirror what the auto-key branch already does for the same outcome: unlink the cache file and clear the cache registry entry. Without the registry clear, `customCacheExists` would still return true (it checks the in-memory flag before the file) and the SDK would skip the system re-prime, asking the addon to load a now-deleted file.
Next turn now re-primes cleanly: a one-turn perf hit, but no risk of corrupted KV state.
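The gating and stale-count guard described in the first commit can be sketched roughly as follows. Only the names `shouldRecordSavedCount` and `decideCachedHistorySlice` come from the commit message; the `Message` and `SliceDecision` shapes, and the behaviour when no count is recorded, are assumptions made for illustration and may not match `kv-cache-state.ts`.

```typescript
// Hypothetical sketch; types and the undefined-count branch are assumed.
type Message = { role: "system" | "user" | "assistant"; content: string };

// Record a saved count only when the turn ran to completion and produced output.
function shouldRecordSavedCount(wasCancelled: boolean, producedTokens: number): boolean {
  return !wasCancelled && producedTokens > 0;
}

type SliceDecision = { messages: Message[]; clearSavedCount: boolean };

function decideCachedHistorySlice(
  history: Message[],
  savedCount: number | undefined,
): SliceDecision {
  if (savedCount === undefined) {
    // Assumption: with no recorded count, send the history unchanged.
    return { messages: history, clearSavedCount: false };
  }
  const sliced = history.slice(savedCount);
  if (sliced.length === 0) {
    // A count that slices the history to [] is treated as stale:
    // send the system-stripped full history and ask the caller to drop the count.
    const withoutSystem = history.filter((m) => m.role !== "system");
    return { messages: withoutSystem, clearSavedCount: true };
  }
  return { messages: sliced, clearSavedCount: false };
}
```

Keeping both functions free of runtime imports is what would allow them to be unit tested outside the Bare worker, as the commit notes.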
infra: unify native prebuild workflows behind a single reusable workflow
Each of the 9 native-prebuild workflows (parakeet, onnx, nmtcpp,
diffusion-cpp, llamacpp-llm, llamacpp-embed, whispercpp, onnx-tts,
ocr-onnx) shipped its own ~200-line copy of the same generate / build /
install / strip / upload sequence, with subtle drift across them
(different matrix entries, different cmake defines, different artifact
prefixes, dead inputs). This made cross-cutting CI changes (action SHA
bumps, runner moves, security patches) costly and obscured the small
set of legitimate per-package differences behind boilerplate.
- Add `.github/workflows/reusable-prebuilds.yml`: a `workflow_call`
reusable that owns the canonical 9-entry matrix (linux x64/arm64,
android arm64, ios arm64 + sim arm64/x64, darwin arm64/x64,
win32 x64), the standard step sequence (setup-build-host, checkout,
setup-aws-prebuild, setup-vcpkg, setup-bare-tooling, setup-apple-clang,
optional Vulkan SDK, npm install, optional Rust toolchain, compute
defines, bare-make generate/build/install, strip `.a` + debug symbols),
and the merge job that uploads the well-known `prebuilds` artifact.
- Convert all 9 prebuild workflows into thin (~45-line) wrappers around
the reusable. Net effect across the 10 touched files: +436 / -1534.
- Inputs let each wrapper opt into legitimate per-package needs without
forking the workflow:
- `workdir`, `ref`, `repository`,
- `artifact-name-prefix` (must match `PREBUILD_ARTIFACT_PREFIX`
consumed by the mobile integration tests),
- `linux-extra-packages`, `mac-brew-packages`,
- `include-vulkan-sdk` (diffusion-cpp / nmtcpp / llamacpp-llm /
llamacpp-embed / whispercpp),
- `extra-cmake-defines` (verbatim) and `platform-cmake-defines`
(per-platform overlay; used by whispercpp for `WHISPER_USE_METAL=ON`
on darwin/ios only),
- `setup-python-on-windows` (parakeet, onnx-tts),
- `setup-rust-toolchain`, delegating to the new
`setup-rust-prebuild` composite action (PR #1803, SHA-pinned)
that bootstraps rustup on Windows when missing and runs the right
`rustup target add` calls for ios / android matrix entries
(used by onnx-tts).
- Centralise CMake defines: `BUILD_TESTING=OFF` is now applied globally
in a dedicated `Compute cmake defines` step; `ANDROID_STL=c++_shared`
is added there too. `VK_PROFILING=ON` is no longer a custom input —
callers pass it through `extra-cmake-defines` when needed.
- Drop dead leftovers along the way:
- `prebuilds-ocr-onnx.yml` no longer pulls OCR models from S3 nor
uploads a `models` artifact: a repo-wide search shows zero
consumers; downstream OCR flows fetch a different model set
(`rec_dyn`) directly from S3.
- The `tar-run-id` input on `prebuilds-qvac-lib-infer-onnx.yml` (and
formerly on `prebuilds-ocr-onnx.yml`) was declared but never read
or passed by any caller — removed.
- The `Verify UIKit Frameworks` diagnostic step (present in 5
workflows) was a one-off iOS bring-up check that no longer adds
value — removed.
- Bump `actions/setup-python` from v5 to commit-pinned 6.2.0 in the
wrappers that need it, clearing the Node.js 20 deprecation warning.
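The layering of CMake defines described above (global defaults, then the verbatim `extra-cmake-defines`, then the `platform-cmake-defines` overlay) can be illustrated with a short TypeScript sketch. The workflow presumably does this in a shell step; the function name `computeCmakeDefines`, the `Platform` type, and the choice to apply `ANDROID_STL=c++_shared` only to Android entries are assumptions, not taken from the workflow file.

```typescript
// Hypothetical illustration of the define-merging order; not the workflow's code.
type Platform = "linux" | "android" | "ios" | "darwin" | "win32";

function computeCmakeDefines(
  platform: Platform,
  extraDefines: string[],
  platformDefines: Partial<Record<Platform, string[]>>,
): string[] {
  // Applied to every matrix entry.
  const defines = ["BUILD_TESTING=OFF"];
  // Assumption: the Android STL define is only relevant on Android entries.
  if (platform === "android") {
    defines.push("ANDROID_STL=c++_shared");
  }
  // Caller-supplied defines, passed through verbatim.
  defines.push(...extraDefines);
  // Per-platform overlay, e.g. WHISPER_USE_METAL=ON on darwin/ios only.
  defines.push(...(platformDefines[platform] ?? []));
  return defines;
}
```

A whispercpp-style caller would pass the Metal flag under both `darwin` and `ios`, so a `linux` entry receives only the global and verbatim defines.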
…on with folded errors and aggregated release notes (#1753)
* feat: add Nunjucks rendering pipeline, AI augmentation, release CI, and link validation
* fix: update remaining things
* fix: fix codeql issue
* fix: fix build issue
* fix: update review comments
* fix: update api ref generation
* fix: update review comments
* fix: update api ref generation
* fix: deleted the unused resolveViaTypeScript function
* fix: update api gen missing functions and values
* fix: revert sdk updates
* fix: update the api gen
* fix: update git pipeline
* fix: removed the dead code
* doc: collapse API reference into single-page summary per version with folded errors and aggregated release notes
* fix: fix code security alert issues
* fix: update the review comments
* fix: update the merge conflicts and docs-workflow
* infra: pin docs release pipeline to trigger commit via dual checkout to close concurrent-main race
* fix: update review comments
…ktop integration tests (#1785) * QVAC-17892 fix[ci]: use refreshed base-memory bergamot models for desktop integration tests Bumps S3 path date from 2025-12-18 -> 2026-04-28 for all four Bergamot pairs (enit, esen, fren, enes) used by integration-test-qvac-lib-infer-nmtcpp.yml. The 2025-12-18 snapshot held the `tiny` Bergamot variant for `enit` and `esen` while every other lane (mobile CI, lib/bergamot-model-fetcher.js, real SDK consumers) used `base-memory` from Firefox CDN. That mismatch showed up in QVAC-17474 Phase B perf-reports as a 3pp chrF++ drop on [Bergamot] [CPU] and a 33pp drop on [Pivot es->en->it] [CPU]. Olya re-uploaded all four pairs as `base-memory` (32MB v2.x from firefox-translations-models) under the new 2026-04-28/ path. This commit points the workflow at those bytes so desktop quality scores match mobile. Made-with: Cursor * QVAC-17892 fix[ci,registry]: align workflow paths and registry manifest with refreshed bergamot bytes Two follow-up corrections to the original 4-line workflow date bump: - Workflow: revert bergamot-fren and bergamot-enes back to 2025-12-18/. Those two pairs were already base-memory (32MB) in the old snapshot and were not part of the 2026-04-28/ refresh. The previous 4-line bump succeeded only because the test's ensureBergamotModel() falls back to Firefox CDN at runtime when the dir is empty (silent fallback). Reverting makes the workflow honest about what exists in S3 and removes the runtime-fallback dependency. Net diff vs main: 2 lines (enit + esen only - the only pairs whose S3 bytes actually flipped variant). - registry-server data/models.prod.json: align enit and esen entries with the refreshed S3 bytes: * 8 source-line date bumps (2025-12-18/ -> 2026-04-28/) * 8 link-line corrections (firefox-translations-models/tree/main/ models/tiny/<pair> -> .../base-memory/<pair>) The link field previously claimed "tiny" but those uploads are now the base-memory variant per Olya's 2026-04-28 re-upload. 
- registry-server client/NOTICE: hand-patch lines 158 and 168 to reflect the variant change (tiny -> base-memory) for enit and esen. Equivalent to what notice-generate would emit when re-run against the corrected manifest, but minimal and predictable diff. Other Bergamot pairs keep their 2025-12-18/ S3 paths in both workflow and manifest because those bytes have always been correct (base-memory). Made-with: Cursor
…1728) Co-authored-by: Yauheni Pankratovich <ypankratovich@wallarm.com> Co-authored-by: Proletter <40578159+Proletter@users.noreply.github.com>
…ythonic, json) with override (#1802) * feat: add native tool-call dialect routing (hermes, pythonic, json) with override * doc: add native tool-calling pythonic example * doc: add expected output examples to toolDialectSchema * fix: address review feedback on native tool-call dialect routing
* fix(ci): pull bare_console.log from iOS app sandbox on crash
When a test crashes on the AWS Device Farm iOS hosts (jetsam OOM kills, SIGTRAP/SIGABRT, watchdog) the wdio `before` hook trips `checkAppCrash` / `crashMonitor`, which call `process.exit(1)` immediately. The wdio `after` hook never runs, and the C++ side `bare_console.log` (where the BareKit / native side prints stdout) is still sitting inside the application sandbox. Device Farm's `Customer_Artifacts.zip` is sealed off the host filesystem only, so when the test fails the C++ log is missing from the artifact tarball, which is exactly the case Olya / Gustavo flagged on https://github.com/tetherto/qvac/actions/runs/25101382346/job/73555220736.
Fix:
* iOS: hoist the existing `after`-hook `pull_file` block into a `global.flushBareLog(reason)` helper. `checkAppCrash` and `crashMonitor` now `await flushBareLog("crash-…")` (with a short `browser.pause(1500)` first, so any in-flight stdout has a chance to hit disk) before exiting. The `after` hook reuses the same helper on the normal completion path.
* iOS + Android: replace the bare `process.exit(1)` in `checkAppCrash` / `crashMonitor` with `setTimeout(function(){process.exit(1);},5000)`. This gives os_log / logcat / bare stdout a 5 s drain window before Appium is torn down, which on Android is enough by itself (logcat is captured by Device Farm directly) and on iOS bookends the Appium pull above.
Both WDIO configs were syntax-checked with `vm.Script` and the workflow round-trips through PyYAML. The biggest `run:` block in the workflow is now 18,408 chars, well under the 21,000-char GH Actions expression budget.
This change was originally made on the `feature-qvac-17830-vlm-perf-metrics` branch (so that PR's mobile delta perf data would survive a fruit-plate crash) and is now extracted to a standalone branch so it can land independently of the perf work.
Made-with: Cursor
* fix(ci): roll out bare_console.log crash flush to all mobile addons
The first commit on this branch fixed the missing C++ side log on iOS Device Farm crashes for the llamacpp-llm workflow. The same wdio crash-detection pattern is used verbatim across every other mobile addon workflow (and they all currently lose bare_console.log on crash in exactly the same way). This commit ports the fix to all of them so the artifact tarball contains the C++ log regardless of which addon crashes.
Workflows updated (8):
* integration-mobile-test-qvac-lib-infer-llamacpp-embed.yml
* integration-mobile-test-qvac-lib-infer-onnx-tts.yml
* integration-mobile-test-qvac-lib-infer-parakeet.yml
* integration-mobile-test-qvac-lib-infer-whispercpp.yml
* integration-mobile-test-decoder-audio.yml
* integration-mobile-test-diffusion-cpp.yml (preserves the generated-images pull in `after:`)
* integration-mobile-test-qvac-lib-infer-nmtcpp.yml (heredoc shape + _healthInterval crash machinery, uses browser.pullFile)
* integration-mobile-test-ocr-onnx.yml (heredoc shape, simple checkAppCrash, raw-HTTP pull_file like llm)
Each iOS WDIO config now hoists the inline `after`-hook bare_console.log pull into a `global.flushBareLog(reason)` helper, calls it from `checkAppCrash` (and `crashMonitor` where present) before `process.exit(1)`, and wraps the exit in `setTimeout(function(){process.exit(1);},5000)` so os_log + bare stdout have time to drain. Android side gets the same 5 s drain wrap (logcat is captured directly by Device Farm so no `pull_file` needed there).
Validation (run locally before push):
* All 18 patched WDIO bodies parse cleanly via vm.Script.
* All 9 mobile workflow YAML files round-trip through PyYAML.
* Biggest `run:` block in any workflow is now 18,408 chars (the LLM workflow, unchanged), well under the 21,000-char GH Actions expression budget.
The LLM workflow on this PR will be triggered via workflow_dispatch as the regression smoke test: it's the most exercised mobile workflow and the one where the original fix was developed, so a passing run there confirms the rollout didn't break anything.
Made-with: Cursor
* fix(ci): bound iOS crash flushBareLog so process.exit always fires
@gianni-cor pointed out (PR #1804 review) that the crash-exit setTimeout was registered AFTER `await global.flushBareLog(...)` in both `checkAppCrash` and `crashMonitor`. If Appium/WDA was already unhealthy after an app crash, the raw HTTP `pull_file` (or `browser.pullFile`) call can hang indefinitely, so the `setTimeout(function(){process.exit(1);},5000)` was never scheduled and a fast crash turned into a long CI timeout.
Fix:
* Pre-register `setTimeout(function(){process.exit(1);},5000)` BEFORE any awaits on the crash path. This is the hard upper bound: even if `browser.pause` and the flush both wedge, libuv still fires the timer and exits the runner at T+5 s.
* Wrap the flush itself in `Promise.race` against a 3 s timeout so the await chain itself is bounded:
    try {
      await browser.pause(1500)
      await Promise.race([
        global.flushBareLog('crash-' + stage),
        new Promise(function (_, rj) {
          setTimeout(function () {
            rj(new Error('bare-log flush timed out'))
          }, 3000)
        })
      ])
    } catch (_) {}
This keeps the flush best-effort: if WDA is alive we still get bare_console.log into Customer_Artifacts.zip; if it's wedged we abandon at T+3 s and the pre-registered exit fires shortly after.
Workflows updated (9):
* integration-mobile-test-qvac-lib-infer-llamacpp-llm.yml (checkAppCrash + crashMonitor, plus a refreshed design-intent comment that explains the pre-registered exit invariant)
* integration-mobile-test-qvac-lib-infer-llamacpp-embed.yml
* integration-mobile-test-qvac-lib-infer-onnx-tts.yml
* integration-mobile-test-qvac-lib-infer-parakeet.yml
* integration-mobile-test-qvac-lib-infer-whispercpp.yml
* integration-mobile-test-qvac-lib-infer-nmtcpp.yml (checkAppCrash variant that also clears `_healthInterval` before exiting)
* integration-mobile-test-decoder-audio.yml
* integration-mobile-test-diffusion-cpp.yml
* integration-mobile-test-ocr-onnx.yml
Total: 10 call sites (8 `checkAppCrash` + 1 `crashMonitor` + 1 `_healthInterval`-clearing checkAppCrash).
Validation (run locally before push):
* All 18 patched WDIO bodies (9 workflows x Android + iOS) parse cleanly via vm.Script.
* All 9 mobile workflow YAML files round-trip through PyYAML.
* Biggest `run:` block in any workflow is now 18,912 chars (the LLM workflow, +504 chars from the Promise.race wrappers + refreshed comment), still well under the 21,000-char GH Actions expression budget.
Made-with: Cursor
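The "pre-register the exit, then bound the flush" invariant from the last commit can be sketched in isolation. This is a minimal TypeScript sketch, not the WDIO config code: `exitAfterCrash` and the `CrashDeps` shape are hypothetical names, and the flush, pause and exit functions are injected so the pattern can be exercised without Appium or `process.exit`.

```typescript
// Hypothetical sketch of the bounded crash-exit pattern.
type CrashDeps = {
  flushBareLog: (reason: string) => Promise<void>;
  pause: (ms: number) => Promise<void>;
  exit: (code: number) => void;
};

async function exitAfterCrash(
  stage: string,
  deps: CrashDeps,
  hardLimitMs = 5000,
  flushLimitMs = 3000,
  settleMs = 1500,
): Promise<void> {
  // Registered BEFORE any await, so the exit fires even if everything below hangs.
  setTimeout(() => {
    deps.exit(1);
  }, hardLimitMs);

  try {
    await deps.pause(settleMs);
    // Bound the flush itself so the await chain cannot wedge on a dead session.
    await Promise.race([
      deps.flushBareLog("crash-" + stage),
      new Promise<never>((_, reject) => {
        setTimeout(() => {
          reject(new Error("bare-log flush timed out"));
        }, flushLimitMs);
      }),
    ]);
  } catch {
    // Best-effort: a failed or timed-out flush must not block the exit.
  }
}
```

With the defaults, a wedged flush is abandoned at roughly T+3 s after the pause and the pre-registered timer still ends the run at T+5 s.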
…re SDK execution layer" (#1795) * doc: content update - sdk - completion * doc: content new - SDK - runtime lifecycle * doc: update sidebar - add new page - runtime lifecycle * doc: replacement for PR 1735 to be closed * doc: fix broken link in completion page
…1811) `getRPC()` was using `void tracked.finally(...)` to clear the `inflightConnections` map entry once the connect attempt settled. `Promise.prototype.finally(cb)` returns a *new* promise that re-rejects with the original error on rejection — and `void` does not attach a handler, so when `tracked` rejected (e.g. `PEER_NOT_FOUND` / `DelegateConnectionFailedError`) the chain produced by `.finally()` was an orphaned rejected promise. The Bare worker's global `process.on("unhandledRejection")` handler treats any unhandled rejection as fatal: it calls `shutdownBareDirectWorker("unhandled-rejection")`, which runs `cleanupDownloads()` (cancelling all in-flight downloads, including the legitimate fallback-to-local download started from `handleLoadModelDelegated`'s catch block) and `destroySwarm()` (poisoning the DHT for every subsequent delegated call). That sequence is exactly the regression reported in QVAC-18162: - `delegated-load-model-fallback-local` fails with `Failed to load model: Download was cancelled` instead of falling back successfully — the fallback download is cancelled by `cleanupDownloads`. - `delegated-heartbeat-provider`, `delegated-cancel-download`, `delegated-connection-failure`, ... all hang at `🌐 Waiting for DHT to fully bootstrap...` until their per-test timeout — `getSwarm()` still returns the destroyed swarm and `swarm.dht.fullyBootstrapped()` never resolves. Fix: replace `void tracked.finally(clearInflight)` with `tracked.then(clearInflight, clearInflight)`. This registers the cleanup on `tracked` itself (not on a derived promise), so the rejection is *observed* — no orphan chain is created and no unhandled rejection is emitted. The caller still receives the original rejection through `await withTimeout(inflight, options.timeout)` two lines below. Regression introduced in #1729 (QVAC-18144). The topic-removal in #1729 is correct; this is a defect in the new `inflightConnections` deduplication logic added alongside it. 
A separate follow-up should harden the worker-level `unhandledRejection` policy so a single leaked promise can no longer take down the entire worker, but that change is broader than this fix. Co-authored-by: Opanin Akuffo <46673050+opaninakuffo@users.noreply.github.com>
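The difference between the two cleanup registrations can be shown with a small self-contained TypeScript sketch. `getConnection` and the string-valued connection are stand-ins invented for the example; only `inflightConnections`, `tracked` and `clearInflight` mirror names from the commit message.

```typescript
// Hypothetical stand-in for the deduplicating part of getRPC().
const inflightConnections = new Map<string, Promise<string>>();

function getConnection(key: string, connect: () => Promise<string>): Promise<string> {
  const existing = inflightConnections.get(key);
  if (existing !== undefined) {
    return existing; // share the in-flight attempt
  }

  const tracked = connect();
  inflightConnections.set(key, tracked);

  const clearInflight = (): void => {
    inflightConnections.delete(key);
  };

  // `void tracked.finally(clearInflight)` would create a derived promise that
  // re-rejects with the original error and has no handler attached.
  // Passing the cleanup as both callbacks handles the rejection on this branch:
  // the derived promise fulfils, so nothing is left unhandled.
  tracked.then(clearInflight, clearInflight);

  // The caller still observes the original rejection through `tracked`.
  return tracked;
}
```

Because the cleanup handlers are attached before the promise is returned, the map entry is already cleared by the time a caller's own rejection handler runs.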
* feat: add mobile Parakeet RTF reporting Run Parakeet RTF benchmarks through the mobile Device Farm workflow and combine desktop and mobile artifacts into a single report so cross-platform performance is visible in one place. Made-with: Cursor * fix: resolve mobile RTF benchmark shared module path Allow the mobile benchmark entrypoint to load the shared benchmark helper from either the source test layout or the generated Device Farm backend bundle so test app packaging succeeds. Made-with: Cursor * fix: increase Parakeet mobile Device Farm timeouts Give the mobile integration workflow enough time to finish the longer Device Farm test runs now that the RTF benchmark path is included, instead of being force-stopped at the 60 minute job timeout. Made-with: Cursor * fix: split Parakeet mobile perf and regular runs Mirror OCR's mobile reporting approach by isolating the RTF benchmark into dedicated Device Farm perf runs while keeping the regular mobile suite separate, then only extract benchmark artifacts from the perf lane. Made-with: Cursor * fix: make Parakeet mobile split workflow portable Replace bash mapfile usage with portable while-read loops for macOS runners and refresh AWS credentials before the long Device Farm monitor and log download phases so the split perf/regular workflow can run to completion. Made-with: Cursor * fix: honor mobile test filters in Parakeet addon Make the addon-side mobile wrappers read testFilter.txt and skip non-selected tests with zero-failure summaries so the perf and regular Device Farm lanes can actually execute different subsets without requiring framework changes. 
Made-with: Cursor * fix: use shared Parakeet mobile perf pipeline Made-with: Cursor * fix: extend Parakeet mobile performance matrix Made-with: Cursor * fix: keep iOS Parakeet mobile perf on TDT Made-with: Cursor * fix: split Parakeet mobile perf cases by model Made-with: Cursor * fix: address Parakeet PR bot findings Made-with: Cursor * fix: restore Parakeet mobile perf matrix shape Made-with: Cursor * Revert "fix: restore Parakeet mobile perf matrix shape" This reverts commit df12527. * fix: quarantine iOS Sortformer GPU perf case Made-with: Cursor --------- Co-authored-by: GustavoA1604 <54457676+GustavoA1604@users.noreply.github.com>
* feat[api]: route responseFormat via per-request generationParams (QVAC-17939)
Adds an OpenAI-compatible `responseFormat` to `completion()`:
- `{ type: 'text' }` (default, free-form)
- `{ type: 'json_object' }` (any valid JSON)
- `{ type: 'json_schema', json_schema: { name, schema, ... } }`
Forwards the schema to the addon as a per-request
`generationParams.json_schema`, which the addon converts to GBNF and
applies for the duration of the request only — avoiding the previous
shared-`modelConfig.grammar` mutation, which was unsafe under concurrent
completions and didn't actually flow per-request anyway.
`tools` and a non-text `responseFormat` are mutually exclusive at the
schema layer (tools already constrain output via their parameter
schema). Bumps `@qvac/llm-llamacpp` to ^0.17.1 for the new addon API.
Includes `examples/llamacpp-structured-output.ts` demonstrating all
three modes against `QWEN3_600M_INST_Q4`.
* review: address #1768 comments — events/final example + json_schema field docs (QVAC-17939)
- Migrate `examples/llamacpp-structured-output.ts` from the legacy
`tokenStream` surface to the canonical `events` / `final` API
recommended for new SDK code. Streaming now consumes
`contentDelta` events and aggregates via `final.contentText`.
- Document `json_schema.description` and `json_schema.strict` as
accepted-for-OpenAI-compatibility-only on `responseFormatSchema`
via `.describe()` annotations, and on the `responseFormat` JSDoc
in `client/api/completion-stream.ts`. Both fields are accepted
by the schema but not forwarded to the addon —
`getResponseFormatJsonSchema()` only forwards `json_schema.schema`,
and `strict: true` does NOT trigger OpenAI's auto-tightening
(implicit `additionalProperties: false`, all properties required).
Callers wanting strict validation must encode it explicitly in
`schema`. Honoring `strict` semantics natively is tracked as a
follow-up — out of scope for this PR.
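Since `strict: true` is accepted but not acted on, a caller who wants OpenAI-style tightening has to write it into `schema`. The sketch below is one possible way to do that, under stated assumptions: the `ResponseFormat` type is reconstructed from the three modes listed in this PR rather than copied from the SDK, the body of `getResponseFormatJsonSchema` is a guess at the behaviour described (only `json_schema.schema` is forwarded), and `withExplicitStrictness` is a hypothetical helper that only tightens the top-level object.

```typescript
// Hypothetical types and helpers; not the SDK's actual definitions.
type JsonSchema = Record<string, unknown>;

type ResponseFormat =
  | { type: "text" }
  | { type: "json_object" }
  | {
      type: "json_schema";
      json_schema: { name: string; schema: JsonSchema; description?: string; strict?: boolean };
    };

// Only `schema` is forwarded; `description` and `strict` are dropped.
// How `json_object` is mapped is not covered by this sketch.
function getResponseFormatJsonSchema(format: ResponseFormat | undefined): JsonSchema | undefined {
  if (format === undefined || format.type !== "json_schema") {
    return undefined;
  }
  return format.json_schema.schema;
}

// Encode strictness explicitly: every property required, no extras allowed.
// Top level only; nested object schemas are left untouched.
function withExplicitStrictness(schema: JsonSchema): JsonSchema {
  const properties = schema["properties"];
  if (schema["type"] !== "object" || typeof properties !== "object" || properties === null) {
    return schema;
  }
  return { ...schema, required: Object.keys(properties), additionalProperties: false };
}
```

A caller would then pass `withExplicitStrictness(mySchema)` as `json_schema.schema` instead of relying on the `strict` flag.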
* review: drop double blank line in completion-stream.ts (QVAC-17939)
…moke (#1797) * feat: add pre-terminate cleanup signal for SDK clients Lets a client request a clean addon teardown before tearing the bare runtime down, so addon static state (e.g. js_ref_t handles into the worker V8 isolate) is released while that env is still alive. Without this, tearing down a runtime whose addons retain isolate-bound refs trips a V8 GlobalHandles assertion (brk 0 / SIGTRAP) inside the next runtime that re-imports the same .bare files in the same OS process — the JsLogger.setLogger path in qvac-lib-inference-addon-cpp is the reproducer (every addon that links it has the same retention). - worker-core.ts: extract the existing shutdown body into a reusable cleanupForTerminate() that runs the same registry / model / resource cleanup but skips releaseWorkerLock() and process.exit(). The full shutdownBareDirectWorker still runs both for desktop signal and exit paths. - handler-utils.ts + handle-request.ts: new internal __shutdown__ message dispatched alongside __init_config. Bypasses the schema, awaits cleanupForTerminate(), and replies success. Lazy-imports the worker-core function to break the handler-utils -> worker-core -> create-server -> handle-request import cycle. - bare-client.ts: mirror the message in the in-process mock RPC for desktop direct-mode (Pear-style) consumers. - expo-rpc-client.ts: close() is now async; sends __shutdown__ over RPC and awaits the success reply (with a 10s timeout safety) before calling worklet.terminate(). Best-effort: timeouts log a warning and proceed with terminate. The auto-close path in unload-model.ts already awaits close(), so this is non-breaking for that caller. * test: stabilise mobile smoke run via eviction-on-none and post-unload settle Two related fixes that together let the mobile smoke run progress past the "previous heavy model still resident" memory ceiling: - resource-lifecycle: tests with dependency:none used to skip evictExcept and leave whatever was loaded by the previous test resident. 
Now treated as evictExcept([]), so a heavy model from the prior test gets unloaded before the next one starts allocating. Empirically this is what kept tripping sharded-model-load right after translation-afriquegemma-sw-en (afriquegemma 4B leaves ~550 MB resident; sharded then asks for multi-GB on top and hits the iOS memory limit). - resource-manager: new ResourceManager({ unloadSettleMs }) option that sleeps for the configured duration after a successful unloadModel (only on success — failure path returns immediately). Lets the kernel release pages before the next load starts allocating. Defaults to 0 (off, desktop is fine without it). Mobile consumer opts in to 100ms. Mobile consumer also picks up SkipExecutor entries for the lifecycle-suspend tests; suspend hangs the runner indefinitely on mobile because the lifecycle coordinator pauses MQTT and never resumes within the test timeout. * chore: bump qvac-test-suite to ^0.6.2 Picks up: - in-app memory poller in mobile-consumer template - desktop in-app memory poller (process-tree RSS) - Memory tab + per-test memory metrics in HTML/JSON reports - bucket results by metadata.category instead of testId-prefix split Required by the eviction / settle work in this PR; both depend on the new MemorySummary fields and the corrected category bucketing. * fix: split cleanupRan and isShuttingDown so shutdown still releases lock cleanupForTerminate previously set the same isShuttingDown flag that shutdownBareDirectWorker uses as its early-return guard. After a __shutdown__ message ran the pre-terminate cleanup, a subsequent SIGTERM / SIGINT / uncaught-exception in desktop direct mode would early-return at the guard and skip releaseWorkerLock() + process.exit(). Result: lock file leak and no graceful exit. Mobile is unaffected because each Worklet has its own module instance (fresh isShuttingDown per worklet). 
The bug only bites the bare-client mock-RPC path (Pear-style consumers where the worker shares the host process for its lifetime). Two flags now: - cleanupRan: idempotent guard around runCleanup body - isShuttingDown: only set by shutdownBareDirectWorker; cleanupForTerminate must NOT set it shutdownBareDirectWorker still calls runCleanup which is now a no-op when cleanupRan is already true. * fix: serialise expo-rpc-client.close() to avoid duplicate __shutdown__ races If two callers race close() (or one calls close() while another getRPC() is mid-flight), the second sees rpcInstance still set, fires a redundant __shutdown__, then re-enters the terminate block on already-null state. Wrap the body in a singleton closingPromise; concurrent callers share the same in-flight close. Reset to null in finally so a fresh worker brought up later can be cleanly closed again. The auto-close path in unload-model.ts is naturally serialised today so this is robustness rather than fixing an active bug, but the cost is minimal and the failure mode (double __shutdown__ after terminate) is annoying to diagnose. * fix: skip Worklet.terminate() on non-iOS platforms Worklet.terminate() crashes on Android: addon dlclose unmaps the lib but pthread_key_t destructors registered by some addons (likely rocksdb-native, libbare-tls, libbare-crypto) are never pthread_key_delete'd before unload, so libc's per-thread cleanup table points at unmapped memory and the next pthread_exit SIGSEGVs in pthread_key_clean_all(). iOS dyld no-ops dlclose for already-loaded third-party libs, so the dangling-destructor problem cannot manifest there. The terminate path stays enabled on iOS. On non-iOS, fall back to the legacy refs-only close: drop rpcInstance and rpcPromise, leave workletInstance + workletInitialized intact so the next getRPC() reuses the live worklet. 
Skip the __shutdown__ roundtrip too -- it would clear the worker plugin registry without a follow-up terminate, leaving the worker unusable for subsequent loadModel. Trade-off: Android tests no longer recover memory between heavy tests the way iOS now does, so memory accumulates across the smoke run. On Pixel-class devices (8+ GB RAM) this is fine; smaller-RAM Android devices may regress vs the pre-PR baseline. Acceptable until the upstream holepunchto/bare exposes a per-addon unload hook. Platform is resolved via the existing getRuntimeContext() path (getDeviceInfo handles a missing expo-device safely via dynamic import + try/catch), so no new react-native imports are added. * test: skip diffusion-streaming-progress on mobile The test reliably times out on mobile (Android Pixel 10 Pro hit the 600s timeout in the latest smoke run). Test framework drops the await on timeout but the underlying streaming inference keeps running on the Bare worker side, leaving the diffusion model "in use" from the runtime's perspective. Knock-on effect: any later test whose modelSetup needs to evict diffusion (e.g. wrong-model-transcribe-on-llm via ResourceManager.evictExcept) blocks indefinitely waiting for the stream to finish. Observed in local-android-smoke: 86/88 tests completed, then the runner stuck for 50+ minutes inside the eviction of diffusion at test 86's setup. Skipping unblocks the smoke run end-to-end. The proper fixes (framework-side cancel-on-timeout, resource-manager bounded waits) are tracked separately.
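The `closingPromise` serialisation from the `expo-rpc-client.close()` commit follows a common single-flight pattern, sketched below in TypeScript. `makeClose` and `doClose` are hypothetical names; in the SDK the body would be the `__shutdown__` round-trip plus `worklet.terminate()`, which is not reproduced here.

```typescript
// Hypothetical single-flight wrapper around an async close routine.
function makeClose(doClose: () => Promise<void>): () => Promise<void> {
  let closingPromise: Promise<void> | null = null;

  return function close(): Promise<void> {
    // Concurrent callers share the close that is already in flight.
    if (closingPromise !== null) {
      return closingPromise;
    }
    closingPromise = doClose().finally(() => {
      // Reset so a worker brought up later can be closed again.
      closingPromise = null;
    });
    return closingPromise;
  };
}
```

The promise returned from `finally` is handed to every caller, so a failure inside `doClose` is observed by them rather than left unhandled.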
* fix[api]: deterministic decoding for LLM translate
Force greedy decoding with a fixed seed and bounded output length on every
LLM translate call (non-African branch) so output is reproducible across
calls and runaway generations cannot blow ctx_size on the next call.
Background: with @qvac/llm-llamacpp 0.17.x, calling `translate()` against
Salamandra (loaded with no decoding params) intermittently produced
verbatim source echo, "Translation in Spanish:" preambles, or
`processPromptImpl: context overflow` on tiny inputs like "bank". The
flake was non-deterministic across runs on the same input, masked in the
smoke suite by `contains-any` validators that still matched a Spanish
keyword inside a preamble.
The change is one call site: when the model is llamacpp-completion and
the prompt is not the AfriqueGemma path, pass per-call generationParams
overriding sampling for that runJob:
- temp/top_k/top_p collapse to greedy
- repeat_penalty: 1.3 breaks single-token echo loops
(e.g. greedy "bank" -> "bank\nbank\n...")
- seed: 42 pins any residual sampling
- predict: 256 caps output so a runaway can't accumulate KV state
Prompt template, NMT branch, and African branch are unchanged.
AfriqueGemma is loaded with its own deterministic config + stop_sequences
already, so we skip the override there.
Verified locally on @qvac/llm-llamacpp 0.17.1 with 30 calls
(streaming + en-es + context, 10 iterations each):
- before: 23/30 pass with 2 echoes, 2 ctx-overflow, 3 echoes
- after: 30/30 pass, all outputs identical across iterations
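A minimal sketch of what such a per-call override could look like. The `repeat_penalty`, `seed`, and `predict` values come from the commit message above; the concrete greedy values for `temp`, `top_k`, and `top_p` (0, 1, 1) and the names `TranslateGenerationParams`, `TRANSLATE_PARAMS`, and `withTranslateOverrides` are assumptions for illustration, not the SDK's actual code.

```typescript
// Hypothetical shape: six per-call sampling fields, all required, so an
// omitted or misspelled field is a compile error.
type TranslateGenerationParams = {
  temp: number;
  top_k: number;
  top_p: number;
  repeat_penalty: number;
  seed: number;
  predict: number;
};

const TRANSLATE_PARAMS: Readonly<TranslateGenerationParams> = Object.freeze({
  temp: 0, // assumed greedy value
  top_k: 1, // assumed greedy value
  top_p: 1, // assumed greedy value
  repeat_penalty: 1.3, // breaks single-token echo loops
  seed: 42, // pins any residual sampling
  predict: 256, // caps output length
});

// Per-call overrides win over whatever the model was loaded with.
function withTranslateOverrides(
  loadTime: Partial<TranslateGenerationParams>,
): TranslateGenerationParams {
  return { ...loadTime, ...TRANSLATE_PARAMS };
}
```

Spreading the constant last means a model loaded with, say, `temp: 0.8` still decodes greedily for the translate call, while its load-time config is left untouched for other calls.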
* refactor: extract LLM translate generation params into named constant
Pull the per-call sampling overrides for LLM translate out of the call
site into a top-of-file constant with a comment that explains the
purpose of each field. No behavior change — values are identical to the
previous commit.
Adding a third translate-friendly LLM model later still goes through
this single constant unless it needs different sampling, in which case
it would warrant a small profile lookup keyed on model family. That
restructure is deferred until a concrete second profile lands.
* refactor[api]: skip per-call sampling override for AfriqueGemma
Apply the per-call deterministic-decoding override only to non-AFRICAN_*
LLM models. AfriqueGemma's load-time `modelConfig` carries
`stop_sequences: ["\n"]` and `repeat_penalty: 1`, and these values must
not be overridden mid-call: with `repeat_penalty: 1.3`, the addon
penalises "\n" and the stop never fires, so generation runs all the way
to `predict` and produces non-translation output. The earlier attempt
to dispatch by `afriquePrompt` (language-pair-derived) silently did
nothing for the actual AfriqueGemma traffic: `isAfrican("sw")` returns
`false` because `AFRICAN_LANGUAGES_MAP` is keyed by FLORES codes
(`"swh_Latn"`), not the ISO codes the smoke tests pass.
This commit dispatches by model name (entry.local.name starts with
"AFRICAN_") and falls back to `model.run(input)` with no override —
identical to the pre-fix call shape — so AfriqueGemma's behaviour is
preserved exactly as it is on main. A latent AfriqueGemma garbage-output
issue exists at HEAD regardless of this PR; that is out of scope.
The constant is renamed `LLM_TRANSLATE_GENERATION_PARAMS` since it now
applies to every non-skipped LLM, not just Salamandra.
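The difference between the two dispatch strategies can be sketched as follows. This is an illustration under stated assumptions: the map contents, the helper names `isAfricanByLanguage` and `shouldApplySamplingOverride`, and the model names used in the usage note are hypothetical; only the FLORES-keyed lookup and the `AFRICAN_` prefix check are taken from the commit message.

```typescript
// Illustrative map keyed by FLORES codes. The single entry is an example,
// not the SDK's actual table.
const AFRICAN_LANGUAGES_BY_FLORES: Record<string, string> = {
  swh_Latn: "Swahili",
};

// Language-based check: an ISO code such as "sw" is not a key in a
// FLORES-keyed map, so the lookup fails for it.
function isAfricanByLanguage(code: string): boolean {
  return Object.prototype.hasOwnProperty.call(AFRICAN_LANGUAGES_BY_FLORES, code);
}

// Model-name check (hypothetical helper): skip the override for any model
// whose name starts with "AFRICAN_", regardless of the language codes passed.
function shouldApplySamplingOverride(modelName: string): boolean {
  return !modelName.startsWith("AFRICAN_");
}
```

With these definitions, `isAfricanByLanguage("sw")` is `false` while `isAfricanByLanguage("swh_Latn")` is `true`, which is the mismatch described above. The name-based check does not depend on which code family the caller uses: `shouldApplySamplingOverride("AFRICAN_EXAMPLE")` is `false` and `shouldApplySamplingOverride("SALAMANDRA_EXAMPLE")` is `true`.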
* refactor: tighten typing on per-call generation params
Pull `RunOptions` and `GenerationParams` from `@qvac/llm-llamacpp` and
use them in place of the loose `Record<string, number>` cast in
`translate()`. Define a `LlmTranslateGenerationParams` alias as the
specific subset of `GenerationParams` we set per call (six fields,
required) so a typo on any of them is a compile error. The cast on
`model.run.bind(model)` now references the addon's `RunOptions` shape
directly, which keeps us protected if the addon's option shape changes.
No behaviour change.
---------
Co-authored-by: Dmytro Medvinskyi <functionsilence@gmail.com>
…ios, raise c… (#1773) * test[notask]: stabilize mobile e2e (skip afriquegemma on ios, raise chatterbox timeout) * test[notask]: inactivity timeout bumps * test[notask]: wait out addon busy throw in logging tests * test[notask]: increase timeouts * test[notask]: skip diffusion on ios, drop kv-cache math assertion, revert heartbeat to 300s * test[notask]: bump parakeet-ctc-mp3 timeout for mobile cold-load * test[notask]: fix mobile sentence-stream + parakeet-ctc-mp3 timeout + stop-sequences flake * test[notask]: update types for tts executors * test[notask]: bump mobile e2e timeouts (device-farm 90m, consumer 1200s) * test[notask]: android skip diffusion tests * test[notask]: revert consumer-inactivity-timeout input for mobile workflows
…at/completions (#1810) * feat[api]: response_format support in OpenAI-compat /v1/chat/completions (QVAC-17939) Wires OpenAI's `response_format` field through the CLI serve adapter: - `{ type: "text" }` (default, free-form) - `{ type: "json_object" }` (any valid JSON) - `{ type: "json_schema", json_schema: { name, schema, ... } }` The adapter validates the shape, removes `response_format` from `UNSUPPORTED_PARAMS`, and forwards it to the SDK's `responseFormat` parameter, which converts the JSON Schema to GBNF in the addon and applies it for the duration of the request only. Depends on the SDK PR #1768 (`@qvac/sdk@>=0.10.0`) for the actual constraint to take effect end-to-end. Without that landed, the SDK silently ignores `responseFormat`. Includes 11 new `extractResponseFormat` unit tests in `test/translate.test.ts` covering valid shapes, optional fields, and validation errors. * chore: bump @qvac/sdk dep floor to ^0.10.0 CLI now consumes the per-request `responseFormat` API added in qvac-sdk 0.10.0 (PR #1768 / QVAC-17939). Caret-pinned to ^0.10.0 to keep upgrades within the same minor line. * chore: bump runtime MIN_SDK_VERSION to 0.10.0 Matches the package.json devDep floor bumped in the previous commit. The runtime guard in serve/core/sdk.ts otherwise still accepted SDK 0.9.x, which lacks the responseFormat parameter. Addresses opaninakuffo's review on PR #1810.
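The shape validation described for `response_format` can be sketched roughly as below. This is not the adapter's `extractResponseFormat`; the function name `parseResponseFormat`, the error messages, and the choice to require both `name` and `schema` are assumptions made for the example.

```typescript
type ResponseFormat =
  | { type: "text" }
  | { type: "json_object" }
  | { type: "json_schema"; json_schema: { name: string; schema: Record<string, unknown> } };

function isPlainObject(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Hypothetical validator: accepts the three documented shapes, rejects the rest.
function parseResponseFormat(raw: unknown): ResponseFormat {
  if (raw === undefined) return { type: "text" }; // default: free-form text
  if (!isPlainObject(raw)) throw new Error("response_format must be an object");
  switch (raw.type) {
    case "text":
      return { type: "text" };
    case "json_object":
      return { type: "json_object" };
    case "json_schema": {
      const js = raw.json_schema;
      // Assumption: this sketch requires a string name and an object schema.
      if (!isPlainObject(js) || typeof js.name !== "string" || !isPlainObject(js.schema)) {
        throw new Error("json_schema requires a string name and an object schema");
      }
      return { type: "json_schema", json_schema: { name: js.name, schema: js.schema } };
    }
    default:
      throw new Error(`unsupported response_format.type: ${String(raw.type)}`);
  }
}
```

Returning a narrowed union rather than the raw object keeps unknown extra fields from being forwarded to the SDK by accident.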
infra: align cpp-lint with reusable-prebuilds and gate it on PR-changed files Refactor the reusable cpp-lint workflow so its bootstrap is the same as `reusable-prebuilds.yml`, and tighten the linters so the step actually fails PRs on findings. Toolchain bootstrap: - Replace inline LLVM/Vulkan/vcpkg/Node setup with the shared composite actions used by prebuilds (`setup-build-host`, `setup-aws-prebuild`, `setup-vcpkg`, `setup-bare-tooling`, `setup-vulkan-sdk`). - Drop divergent scaffolding no caller exercises: the ssh/https `git insteadOf` rewrites, the `@tetherto:registry` npmrc setup, the `~/.npm` cache, and the hardcoded `node-version: 18.x` override (default `lts/*` now matches prebuilds). - Move `runs-on` from `ai-run-linux` to `ubuntu-22.04` to match prebuilds and decommission self-hosted usage for this job. - Add an optional `include-vulkan-sdk` boolean input (default `true`) mirroring the same input on `reusable-prebuilds.yml` so ONNX/decoder addons can opt out in a follow-up. Inputs / refs: - Resolve `BASE_SHA` / `HEAD_REF` / `HEAD_REPO` up-front with documented fallbacks (`inputs.sha → github.event.before → HEAD~1`) so `workflow_dispatch` runs work without a PR context. The `sha` input is now optional; existing callers already pass it. Lint scope and gating: - Drop `bare-make build` / `bare-make test` (duplicated the Linux x64 prebuild work; ctest finds nothing without `BUILD_TESTING=ON`, and the gtest binary belongs to the dedicated `cpp-tests-*.yml` workflows). - Keep `bare-make generate -D ENABLE_VULKAN=OFF` because clang-tidy needs the resulting `compile_commands.json`. `ENABLE_VULKAN=OFF` is honoured today only by `qvac-lib-infer-whispercpp` and is a no-op CMake variable elsewhere. - Scope `clang-tidy` to the C/C++ translation units touched by the PR by intersecting `git diff $BASE_SHA HEAD` with the TU list emitted by `clang-tidy-helper.py --files`. Header-only and removed-source diffs short-circuit cleanly. 
Reuse the helper's `--clang-tidy-cmd` output for the `-p` build dir and `--header-filter` regex; only the file list is replaced. - Pass `--warnings-as-errors='*'` to `clang-tidy-19` so every check enabled in the shared `qvac-lint-cpp/.clang-tidy` `Checks:` is promoted to an error and the step's exit code reflects findings (the shared config sets `WarningsAsErrors: ''`, which by itself would let clang-tidy print warnings and exit 0). - Wire `clang-tidy` into the existing failure tally alongside `cpp_files_fmt`. Toolchain pinning: - Pin both linters to their LLVM-19 binaries explicitly: `git-clang-format-19 --binary clang-format-19` and `clang-tidy-19`. Removes the dependency on whichever unversioned `clang-format` / `clang-tidy` wins PATH resolution (Ubuntu's stock 18.x was being picked up before this fix). The `::error` annotation contributors copy-paste locally is updated to match. cpp-lint remains advisory at the on-pr workflow level — no `prebuild`, `cpp-tests*`, integration test, or ts-checks job lists cpp-lint in `needs:`, so a failing cpp-lint doesn't block the addon build/test path; merge-guard semantics are unchanged.
* fix Vulkan SDK installation verification * run vulkaninfo on win11-rtx4000-hetzner * add job build-win11-nvidia-image * fix missing double quoted string value in powershell * run vulkaninfo on ai-run-windows11-gpu * install vulkan sdk on ai-run-windows11-gpu * fix typo * fix typo * correct vulkaninfoSDK.exe path * Force WDDM Mode for Vulkan to work * nvidia-smi -fdm 0 * install nvidia grid drivers instead of datacenter driver * use .NET HttpClient with Progress Tracking * fix httpclient instance creation for newer powershell * Add-Type -AssemblyName System.Net.Http * try wget * test run vulkaninfo on ai-run-windows11-gpu * install git in win11-nvidia-grid-image * disable interactivity when installing git with winget * git version separate step after winget install Git * test * test * install winget before winget install git * install winget from PSGallery * fix * fix2 * use Cyberboss/install-winget@v1 * add --source winget to winget install * add --custom '/o:PathOption=CmdTools' to winget installation of git for windows * do not verify git installation in path in same job where it gets installed (needs new job run) * verify that git is in the path for ai-run-windows11-gpu * uncomment needed lines * Install Chocolatey on win11-nvidia-grid-image * s/pwsh/powershell/ * verify choco installation * verify vcpkg * clone vcpkg * vulkaninfo check * install visual studio build tools, LLVM, and Vulkan SDK in base image for win11 gpu runner * do not install vulkan sdk on base win11 gpu image * rename vulkaninfo to win11-nvidia-image-builder.yml * add vulkaninfo.yml
…1796) * doc: content update - sdk - completion * doc: content new - SDK - runtime lifecycle * doc: update sidebar - add new page - runtime lifecycle * doc: content update - SDK - diffusion - add img2img gen * doc: replacement for PR 1735 to be closed * doc: new code example - SDK - img gen img2img with flux2 * doc: fix broken link in completion page * doc: sdk - create new example - img2img with klein --------- Co-authored-by: gianni-cor <gianfranco.cordella@tether.io>
* fix: handle Harmony <|call|> EOG token for GPT-OSS tool calling GPT-OSS models use <|call|> as a frame delimiter in Harmony tool-call protocol. This token is in the EOG set, causing generation to stop silently before tool calls reach the SDK. Add Harmony model detection and <|call|>-specific handling in the generation loop: render the token as visible text (special=true) so the SDK can parse frame boundaries, then stop generation for the turn-based tool execution protocol. * Example added to showcase GPT-OSS multi-turn tool call * CHANGELOG added, package version bumped to 0.19.1 * CPP lint applied --------- Co-authored-by: gianni-cor <gianfranco.cordella@tether.io>
* doc: create new page - sdk - new ai capability - voice assistant * doc: create new page - sdk - voice assistant * doc: remove temporary doc that leaked into commit * doc: content new - sdk - voice assistant - PR review
…1834) GitHub only registers `workflow_dispatch` for workflow files that live on the default branch. The full benchmark orchestrator is wired in PR #1677 (QVAC-17092) but cannot be triggered from the Actions UI or `gh workflow run` until the file exists on `main`. This stub adds the file with a no-op job so the workflow becomes dispatchable. Dispatching with `--ref <branch>` runs the *branch's* version of the YAML, which is how we'll validate PR #1677's orchestrator end-to-end before merging it. The stub will be overwritten by PR #1677's real workflow on merge.
* test[notask]: fix android sharded-model-resume scudo oom
Tag sharded-model {detection, hash-validation, progress, resume,
cancellation} with dependency "sharded-embeddings" instead of "none".
With dep:none and the default loadSharded handler, modelSetup evicts
sharded-embeddings then immediately re-mmaps the same 5 shards; on
Android (Pixel 10 Pro) Scudo's mmap fails with "internal map failure"
before the kernel reclaims the prior maps, killing the worklet and
cascading the rest of the sharded category.
Also bump mobile unloadSettleMs 100 -> 200 ms to keep some slack for
remaining same-model unload / reload paths.
* infra[notask]: pass --report-dir to CI sdk producer runs
Without --report-dir, BatchOrchestrator skips writing app-mem.ndjson
and test-timeline.ndjson, so mobile in-app memory samples published
on qvac/app-memory get dropped silently and the per-test memory rows
/ chart / suite peak never make it into the report. run:local already
passes --report-dir; the three CI workflows did not.
Pin --report-dir=./reports in test-android-sdk.yml, test-ios-sdk.yml
and test-desktop-sdk.yml. The existing "Upload results" step already
uploads ${working-directory}/reports/ so the new files ride along.
---------
Co-authored-by: Victor-Rodzko <victor.rodzko@itrexgroup.com>
Co-authored-by: Opanin Akuffo <46673050+opaninakuffo@users.noreply.github.com>
…se/workaround bare double creation issue, harden addon-cpp workflows (#1825) * add JS integration CI for inference addon Run addon JS integration packages across desktop targets so callback lifetime and platform-specific runtime issues are exercised in PR checks. * fix output callback lifetime cleanup Keep callback state alive until the libuv async handle is closed so pending JS output delivery cannot observe freed addon storage during teardown. * add JS number creation integration test Add a minimal integration package that exercises first double creation through js::Number and keeps js_create_int32 as a control across the JS test matrix. * work around Windows js_create_double first call Route addon double creation through js::Number so win32 can burn the first js_create_double call observed to produce an invalid value on GitHub Azure runners. * version * fix Windows double burn-in guard Use a function-local static initializer for the Windows-only js_create_double burn-in. This keeps the workaround process-wide without carrying template-level atomic state before returning the requested double value. * harden PR JS test workflow Limit the addon-cpp JS PR tests to read-only repository access and avoid release secrets, inherited secrets, and persisted checkout credentials. Install fixture dependencies without lifecycle scripts so PR-controlled package code cannot run during setup. * fix PR JS workflow trust boundary Run addon-cpp JS tests from the unprivileged pull_request workflow so PR-controlled code is not executed from pull_request_target. Keep a lightweight verify-label gate for external forks while leaving the privileged workflow focused on authorization and native tests. * harden addon-cpp PR workflows Move PR-controlled native test execution out of pull_request_target so cache restore and builds run without write-capable credentials, inherited secrets, or PAT-backed checkouts. 
Keep pull_request_target limited to clearing the verify label when new commits arrive, and make unverified PR events skip the native/JS matrices without failing the workflow. --------- Co-authored-by: gianni-cor <gianfranco.cordella@tether.io>
…flow_dispatch input (#1824) Co-authored-by: Matt Cavanagh <1789097+darkynt@users.noreply.github.com> Co-authored-by: tamer-hassan-tether <tamer.hassan@tether.io>
The publish step now uses npm trusted publishing (OIDC), so the legacy NPM_TOKEN auth in the build step's .npmrc and bunfig.toml is no longer needed. @qvac/* packages on npmjs.org are public, so anonymous reads work for installs. Aligns with #1618 (DEVOPS-2062) which migrated the publish step to trusted publishing but left the build-step NPM_TOKEN references in place.
… CMakeLists (#1852) The version in CMakeLists.txt kept drifting from vcpkg.json (currently 1.1.4 vs 1.1.5) but was never consumed: no downstream package calls find_package(qvac-lib-inference-addon-cpp <version> ...) — they all use find_path() to locate headers. Remove the project VERSION and the generated ConfigVersion.cmake so vcpkg.json stays the single source of truth. Bump vcpkg.json version to 1.1.6, as it drifted from the version in the registry.
…dk publish doesn't auto-skip (#1853) publish-npm needs [build, publish-logic, release-merge-guard]. On a manual workflow_dispatch from a release-sdk-* branch, the guard's if: rejected the event (push only), so the guard was skipped, and GitHub Actions' implicit success() check on needs auto-skipped publish-npm before its if: with the explicit needs.release-merge-guard.result == 'skipped' branch could even be evaluated. Allow the guard to run on workflow_dispatch too. The guard already handles workflow_dispatch safely: github.event.before is empty, so base-sha is empty, so isInitialPush is true and the changelog diff check is skipped. The branch-name pattern check and the package.json-version-matches-branch check still run, which is what we want for a manual release publish. Net effect: manual publish-sdk dispatches on release branches now actually reach the publish-npm job instead of silently skipping.
ishanvohra2
reviewed
May 21, 2026
ishanvohra2
left a comment
Just one minor comment
Reviewer ask: the new example ran top-level await without the
repo's example error wrapper, so a failure during load/transcribe
exited 0 (unhandled-rejection trace, but no non-zero exit code).
Align with .cursor/rules/sdk/example.mdc and the sibling
parakeet-tdt-filesystem.ts pattern -- wrap load -> transcribe ->
unload in try { ... process.exit(0); }, with catch printing
console.error("❌ Error:", error) before process.exit(1).
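The wrapper pattern requested here can be sketched as below. The `Steps` type, the `runExample` name, and the injected `exit` callback are assumptions made so the sketch is self-contained and testable; the real example calls the SDK's own load, transcribe, and unload functions and `process.exit` directly.

```typescript
// Hypothetical step signatures standing in for the SDK calls.
type Steps = {
  load: () => Promise<string>;
  transcribe: (modelId: string) => Promise<string>;
  unload: (modelId: string) => Promise<void>;
};

// Runs load -> transcribe -> unload and always reports an explicit exit code:
// 0 on success, 1 on any failure along the way.
async function runExample(steps: Steps, exit: (code: number) => void): Promise<void> {
  try {
    const modelId = await steps.load();
    const text = await steps.transcribe(modelId);
    console.log(text);
    await steps.unload(modelId);
    exit(0);
  } catch (error) {
    console.error("❌ Error:", error);
    exit(1);
  }
}
```

In a real script the second argument would be `(code) => process.exit(code)`. Passing it in keeps the control flow identical while letting a test observe the exit code instead of terminating the process.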
ishanvohra2
approved these changes
May 21, 2026
NamelsKing
approved these changes
May 22, 2026
83a7f13 to
6d5ab04
Compare
QVAC-19094 feat[api]: expose Sortformer v2.1 + AOSC streaming knobs in @qvac/sdk
🎯 What problem does this PR solve?
`@qvac/transcription-parakeet@0.5.0` added v2.1 Sortformer + AOSC (Audio-Online Speaker Cache) support natively, but
`@qvac/sdk` exposed no way to opt into the new streaming/AOSC knobs. Consumers
wanting v2.1's stable speaker-slot anchoring across silence and
re-entry had to drop down to
`@qvac/transcription-parakeet` directly, bypassing the SDK surface.
so the addon could land first; this PR re-applies it now that
0.5.0 is published and the registry bump is merged.
📝 How does it solve it?
- New fields on `parakeetRuntimeConfigSchema` covering 7 streaming-session knobs + 6 AOSC knobs; names, primitive types, and positive/non-negative refinements mirror the published addon's `ParakeetConfig`.
- The new fields are passed through `parakeet-transcription/plugin.ts` into the addon's `parakeetConfig` literal. The local `ParakeetModelConfig` type is extended with matching declarations so the SDK's strict TS settings (`noPropertyAccessFromIndexSignature`, etc.) accept the new `config.streaming*` accesses.
- New `parakeet-sortformer-streaming.ts` example demonstrating the streaming + AOSC path with a CLI-provided model source. Once the v2.1 GGUF entry lands in `models.prod.json`, the example switches to the auto-generated `PARAKEET_SORTFORMER_STREAMING_V21_AOSC` constant.
🧪 How was it tested?
- `./node_modules/.bin/tsc --noEmit -p tsconfig.json`: clean (catches the `ParakeetModelConfig` extension; would fail without it under strict TS).
- `bun run lint` (eslint + typecheck): clean.
- `bun run test/unit/parakeet-schemas.test.ts`: 8/8 pass, including two new cases: happy-path streaming + AOSC config, and rejection of negative `streamingSpkCacheLen`.
- `bun run test/unit/plugin-cancel-capability.test.ts`: 7/7 (touches the parakeet plugin import surface, unchanged).
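The two schema test cases (a valid streaming + AOSC config, and a rejected negative `streamingSpkCacheLen`) can be illustrated with a dependency-free validator. This is a sketch, not the SDK's schema: only `streamingSpkCacheLen` is named in this PR text, `streamingChunkLen` is a hypothetical stand-in for the other knobs, and the integer requirement and the specific bounds are assumptions.

```typescript
// Only `streamingSpkCacheLen` is named in the PR; `streamingChunkLen` is a
// hypothetical knob used to show a "positive" refinement.
type StreamingKnobs = {
  streamingSpkCacheLen?: number;
  streamingChunkLen?: number;
};

// Returns an error message, or null when the value is absent or valid.
function checkOptionalInt(name: string, value: unknown, min: number): string | null {
  if (value === undefined) return null; // optional: absent is fine
  if (typeof value !== "number" || !Number.isInteger(value)) return `${name} must be an integer`;
  if (value < min) return `${name} must be >= ${min}`;
  return null;
}

function validateStreamingKnobs(config: StreamingKnobs): string[] {
  return [
    checkOptionalInt("streamingSpkCacheLen", config.streamingSpkCacheLen, 0), // non-negative (assumed)
    checkOptionalInt("streamingChunkLen", config.streamingChunkLen, 1), // positive (assumed)
  ].filter((e): e is string => e !== null);
}
```

Because every field is optional, an empty config passes unchanged, which mirrors the no-breakage property for existing callers.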
🔌 API Changes
New optional fields on `parakeetConfigSchema` / `ParakeetConfig`.
All `.optional()`, so no breakage for existing callers. Auto-detected on v2.1 GGUFs via the `parakeet.model_variant` metadata tag; ignored by v1/v2 Sortformer and non-Sortformer engines.
Depends on `@qvac/transcription-parakeet >= 0.5.0`, already pinned on `main`.
`packages/registry-server/data/models.prod.json` (separate PR; drives the auto-generated `PARAKEET_SORTFORMER_STREAMING_V21_AOSC` constant).