test[notask]: validate verified-only label gate end-to-end #2178

Closed
Proletter wants to merge 1309 commits into tetherto:main from Proletter:test/qvac-19143-verified-label-validation

@Proletter

Collaborator

Purpose

TESTING ONLY — DO NOT MERGE. Will be closed without merging.

Throwaway PR to validate the post-#2102 state, where the legacy `verify` label has been retired and `verified` is now the single label authorising:

  1. Secret-bearing CI jobs (via label-gate)
  2. The previously-verify-gated heavy validation paths
  3. The public-pr.yml merge-quality gate

What it touches

  • packages/inference-addon-cpp/README.md — single trailing comment line, triggers:
    • pr-test-inference-addon-cpp.yml (the inline authorize job gates run-unit-tests on verified)
    • pr-test-inference-addon-cpp-js.yml (same gate)
  • packages/registry-server/README.md — single trailing comment line, triggers:
    • pr-models-validation-registry-server.yml (label-gate action + authorize-pr action both check verified)
    • public-pr.yml (hard verified merge gate)

No source / config / workflow files are modified.

Test plan

| Step | Action | Expected |
|------|--------|----------|
| 1 | Open PR with no label | `run-unit-tests` jobs SKIPPED in `pr-test-inference-addon-cpp*`. `validate-json` / `test` skipped in `pr-models-validation-registry-server`. `public-pr` fails with `PR is not verified`. |
| 2 | Apply `verified` | All previously-skipped gated jobs FLIP TO RUNNING. `public-pr` passes the verified-label check. |
| 3 | Apply (deprecated) `verify` on top | No additional jobs trigger. No workflow re-runs. `verify` is a no-op label. |
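
The expected outcomes of the three steps can be modelled as a small truth function. This is only a sketch of the intended behaviour: `evaluateGate` and `GateOutcome` are hypothetical names, and the real checks live in the `label-gate` / `authorize-pr` actions and `public-pr.yml`.

```typescript
// Hypothetical model of the gate, not the workflow implementation.
const AUTHORISING_LABEL = "verified";

interface GateOutcome {
  gatedJobsRun: boolean;   // run-unit-tests, validate-json, test
  publicPrPasses: boolean; // public-pr merge-quality gate
}

function evaluateGate(labels: readonly string[]): GateOutcome {
  // Only `verified` authorises; the retired `verify` label is ignored.
  const verified = labels.includes(AUTHORISING_LABEL);
  return { gatedJobsRun: verified, publicPrPasses: verified };
}
```

Under this model, step 3 is a no-op because adding `verify` never changes the result of the `includes` check.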

Caveats

  • Author (Proletter) is a member of qvac-internal-dev + qvac-internal-merge, so this PR will NOT exercise the "untrusted-fork synchronize-strip" or "apply-by-non-trusted-actor strip" paths in label-gate. Those need a non-trusted actor's PR — out of scope for this happy-path validation.

Disposition

Will be closed (not merged) once observations are recorded. Both README touches are revertible no-ops.

jpgaribotti and others added 30 commits April 29, 2026 13:36
…erto#1803)

Installs rustup on Windows when missing and runs `rustup target add` for the
matrix entry (iOS device+sim arm64/x64, Android arm64/armv7).
…urns (tetherto#1737)

* fix: skip kv-cache savedCount on cancelled or zero-token turns

When a completion was cancelled mid-decode (or returned zero tokens)
with `kvCache` enabled, the SDK was still recording
`history.length + 1` in `cachedMessageCounts`. On the user's next
prompt, `prepareMessagesForCache` would then slice the history against
that stale count and hand the model an empty payload, producing a
silent no-response. See QVAC-17780 for the repro.

Fix:
- Split the pure state + decision logic into a new
  `kv-cache-state.ts` module (no bare-* imports so it can be unit
  tested under bun).
- Add a per-model cancel counter; `cancel.ts` bumps it before calling
  the addon, and `completion()` snapshots it around
  `processModelResponse` to detect cancellation.
- `processModelResponse` now reports `producedTokens`; a new
  `shouldRecordSavedCount(wasCancelled, producedTokens)` helper gates
  both the custom-key and auto-keyed `recordCacheSaveCount` call
  sites. On cancel or zero-token turns, the entry is deleted
  instead of being poisoned.
- Harden `decideCachedHistorySlice` so that a `savedCount` which
  would slice the history to `[]` is treated as stale: fall back to
  sending the system-stripped full history and clear the bad count.
  Protects against any future path that records an off-by-one count.
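
A minimal sketch of the two decision helpers described above. The function names come from this commit message; the bodies, the `Message` shape and the `SliceDecision` return type are assumptions made for illustration and may differ from `kv-cache-state.ts`.

```typescript
// Sketch only: names follow the commit message, bodies are assumed.
interface Message {
  role: "system" | "user" | "assistant";
  content: string;
}

interface SliceDecision {
  messages: Message[];
  clearSavedCount: boolean; // true when the stored count was judged stale
}

// Record a saved count only for turns that finished and produced output.
function shouldRecordSavedCount(wasCancelled: boolean, producedTokens: number): boolean {
  return !wasCancelled && producedTokens > 0;
}

// Decide what to send to the model given the count recorded on the last turn.
function decideCachedHistorySlice(
  history: Message[],
  savedCount: number | undefined,
): SliceDecision {
  if (savedCount === undefined) {
    // Assumed behaviour when nothing was recorded: send everything.
    return { messages: history, clearSavedCount: false };
  }
  const unseen = history.slice(savedCount);
  if (unseen.length === 0) {
    // Stale count: fall back to the full history minus system messages.
    const withoutSystem = history.filter((m) => m.role !== "system");
    return { messages: withoutSystem, clearSavedCount: true };
  }
  return { messages: unseen, clearSavedCount: false };
}
```

Keeping both helpers free of runtime imports is what lets them be unit tested in isolation, as the first bullet notes.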

* chore: drop ticket/PR references from kv-cache fix comments

Keep only the comments that explain intent or behaviour; the "why"
already lives in the commit log and PR body.

* fix: thread platform path.sep into clearCachedMessageCounts

The pure state module's "/" default was load-bearing only for unit tests
under bun. Runtime callers go through delete-cache.ts → completion-stream.ts,
where bare-path is available — inject `path.sep` there so directory-prefix
clears match correctly on every target (including Windows).
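
To illustrate why the separator matters, here is a sketch of a directory-prefix clear with an injectable separator. The signature, the `Map` shape and the return value are assumptions; only the function name and the `"/"` default come from this commit message.

```typescript
// Hypothetical signature: the commit only states that the separator is injected.
function clearCachedMessageCounts(
  counts: Map<string, number>,
  target: string,
  sep: string = "/",
): number {
  const dirPrefix = target.endsWith(sep) ? target : target + sep;
  let removed = 0;
  for (const key of Array.from(counts.keys())) {
    // Clear the exact key, or any key that lives under the target directory.
    if (key === target || key.startsWith(dirPrefix)) {
      counts.delete(key);
      removed += 1;
    }
  }
  return removed;
}
```

With the `"/"` default, a Windows-style key such as `C:\cache\a.bin` never matches the prefix `C:\cache/`, so nothing is cleared. Passing the platform separator makes the same call match.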

* fix: drop stale cache file + registry entry on cancelled custom-key turn

The custom-key cancel branch only cleared `cachedMessageCounts` while
leaving the on-disk cache file (and the in-memory `initializedCaches`
entry) in place. The addon writes the cache unconditionally on
`saveCacheToDisk` turns — including cancellations — so what's left on
disk holds partial decode state past the user prompt. Next turn would
then load that stale KV state on top of the new prompt and reply
incoherently.

Mirror what the auto-key branch already does for the same outcome:
unlink the cache file and clear the cache registry entry. Without the
registry clear, `customCacheExists` would still return true (it checks
the in-memory flag before the file) and the SDK would skip the system
re-prime, asking the addon to load a now-deleted file. Next turn now
re-primes cleanly — a one-turn perf hit, but no risk of corrupted KV
state.
infra: unify native prebuild workflows behind a single reusable workflow

Each of the 9 native-prebuild workflows (parakeet, onnx, nmtcpp,
diffusion-cpp, llamacpp-llm, llamacpp-embed, whispercpp, onnx-tts,
ocr-onnx) shipped its own ~200-line copy of the same generate / build /
install / strip / upload sequence, with subtle drift across them
(different matrix entries, different cmake defines, different artifact
prefixes, dead inputs). This made cross-cutting CI changes (action SHA
bumps, runner moves, security patches) costly and obscured the small
set of legitimate per-package differences behind boilerplate.

- Add `.github/workflows/reusable-prebuilds.yml`: a `workflow_call`
  reusable that owns the canonical 9-entry matrix (linux x64/arm64,
  android arm64, ios arm64 + sim arm64/x64, darwin arm64/x64,
  win32 x64), the standard step sequence (setup-build-host, checkout,
  setup-aws-prebuild, setup-vcpkg, setup-bare-tooling, setup-apple-clang,
  optional Vulkan SDK, npm install, optional Rust toolchain, compute
  defines, bare-make generate/build/install, strip `.a` + debug symbols),
  and the merge job that uploads the well-known `prebuilds` artifact.

- Convert all 9 prebuild workflows into thin (~45-line) wrappers around
  the reusable. Net effect across the 10 touched files: +436 / -1534.

- Inputs let each wrapper opt into legitimate per-package needs without
  forking the workflow:
  - `workdir`, `ref`, `repository`,
  - `artifact-name-prefix` (must match `PREBUILD_ARTIFACT_PREFIX`
    consumed by the mobile integration tests),
  - `linux-extra-packages`, `mac-brew-packages`,
  - `include-vulkan-sdk` (diffusion-cpp / nmtcpp / llamacpp-llm /
    llamacpp-embed / whispercpp),
  - `extra-cmake-defines` (verbatim) and `platform-cmake-defines`
    (per-platform overlay; used by whispercpp for `WHISPER_USE_METAL=ON`
    on darwin/ios only),
  - `setup-python-on-windows` (parakeet, onnx-tts),
  - `setup-rust-toolchain`, delegating to the new
    `setup-rust-prebuild` composite action (PR tetherto#1803, SHA-pinned)
    that bootstraps rustup on Windows when missing and runs the right
    `rustup target add` calls for ios / android matrix entries
    (used by onnx-tts).

- Centralise CMake defines: `BUILD_TESTING=OFF` is now applied globally
  in a dedicated `Compute cmake defines` step; `ANDROID_STL=c++_shared`
  is added there too. `VK_PROFILING=ON` is no longer a custom input —
  callers pass it through `extra-cmake-defines` when needed.

- Drop dead leftovers along the way:
  - `prebuilds-ocr-onnx.yml` no longer pulls OCR models from S3 nor
    uploads a `models` artifact: a repo-wide search shows zero
    consumers; downstream OCR flows fetch a different model set
    (`rec_dyn`) directly from S3.
  - The `tar-run-id` input on `prebuilds-qvac-lib-infer-onnx.yml` (and
    formerly on `prebuilds-ocr-onnx.yml`) was declared but never read
    or passed by any caller — removed.
  - The `Verify UIKit Frameworks` diagnostic step (present in 5
    workflows) was a one-off iOS bring-up check that no longer adds
    value — removed.

- Bump `actions/setup-python` from v5 to commit-pinned 6.2.0 in the
  wrappers that need it, clearing the Node.js 20 deprecation warning.
…on with folded errors and aggregated release notes (tetherto#1753)

* feat: add Nunjucks rendering pipeline, AI augmentation, release CI, and link validation

* fix: update remaining things

* fix: fix codeql issue

* fix: fix build issue

* fix: update review comments

* fix: update api ref generation

* fix: update review comments

* fix: update api ref generation

* fix: deleted the unused resolveViaTypeScript function

* fix: update api gen missing functions and values

* fix: revert sdk updates

* fix: update the api gen

* fix: update git pipeline

* fix: removed the dead code

* doc: collapse API reference into single-page summary per version with folded errors and aggregated release notes

* fix: fix code security alert issues

* fix: update the review comments

* fix: update the merge conflicts and docs-workflow

* infra: pin docs release pipeline to trigger commit via dual checkout to close concurrent-main race

* fix: update review comments
…ktop integration tests (tetherto#1785)

* QVAC-17892 fix[ci]: use refreshed base-memory bergamot models for desktop integration tests

Bumps S3 path date from 2025-12-18 -> 2026-04-28 for all four Bergamot
pairs (enit, esen, fren, enes) used by integration-test-qvac-lib-infer-nmtcpp.yml.

The 2025-12-18 snapshot held the `tiny` Bergamot variant for `enit` and
`esen` while every other lane (mobile CI, lib/bergamot-model-fetcher.js,
real SDK consumers) used `base-memory` from Firefox CDN. That mismatch
showed up in QVAC-17474 Phase B perf-reports as a 3pp chrF++ drop on
[Bergamot] [CPU] and a 33pp drop on [Pivot es->en->it] [CPU].

Olya re-uploaded all four pairs as `base-memory` (32MB v2.x from
firefox-translations-models) under the new 2026-04-28/ path. This commit
points the workflow at those bytes so desktop quality scores match
mobile.

Made-with: Cursor

* QVAC-17892 fix[ci,registry]: align workflow paths and registry manifest with refreshed bergamot bytes

Two follow-up corrections to the original 4-line workflow date bump:

- Workflow: revert bergamot-fren and bergamot-enes back to 2025-12-18/.
  Those two pairs were already base-memory (32MB) in the old snapshot
  and were not part of the 2026-04-28/ refresh. The previous 4-line
  bump succeeded only because the test's ensureBergamotModel() falls
  back to Firefox CDN at runtime when the dir is empty (silent fallback).
  Reverting makes the workflow honest about what exists in S3 and removes
  the runtime-fallback dependency. Net diff vs main: 2 lines (enit + esen
  only - the only pairs whose S3 bytes actually flipped variant).

- registry-server data/models.prod.json: align enit and esen entries
  with the refreshed S3 bytes:
    * 8 source-line date bumps (2025-12-18/ -> 2026-04-28/)
    * 8 link-line corrections (firefox-translations-models/tree/main/
      models/tiny/<pair> -> .../base-memory/<pair>)
  The link field previously claimed "tiny" but those uploads are now
  the base-memory variant per Olya's 2026-04-28 re-upload.

- registry-server client/NOTICE: hand-patch lines 158 and 168 to
  reflect the variant change (tiny -> base-memory) for enit and esen.
  Equivalent to what notice-generate would emit when re-run against
  the corrected manifest, but minimal and predictable diff.

Other Bergamot pairs keep their 2025-12-18/ S3 paths in both workflow
and manifest because those bytes have always been correct (base-memory).

Made-with: Cursor
…etherto#1728)

Co-authored-by: Yauheni Pankratovich <ypankratovich@wallarm.com>
Co-authored-by: Proletter <40578159+Proletter@users.noreply.github.com>
…ythonic, json) with override (tetherto#1802)

* feat: add native tool-call dialect routing (hermes, pythonic, json) with override

* doc: add native tool-calling pythonic example

* doc: add expected output examples to toolDialectSchema

* fix: address review feedback on native tool-call dialect routing
…o#1804)

* fix(ci): pull bare_console.log from iOS app sandbox on crash

When a test crashes on the AWS Device Farm iOS hosts (jetsam OOM kills,
SIGTRAP/SIGABRT, watchdog) the wdio `before` hook trips
`checkAppCrash` / `crashMonitor`, which call `process.exit(1)`
immediately. The wdio `after` hook never runs, and the C++ side
`bare_console.log` (where the BareKit / native side prints stdout) is
still sitting inside the application sandbox. Device Farm's
`Customer_Artifacts.zip` is built from the host filesystem only, so when
the test fails the C++ log is missing from the artifact tarball, which
is exactly the case Olya / Gustavo flagged on
https://github.com/tetherto/qvac/actions/runs/25101382346/job/73555220736.

Fix:

* iOS — hoist the existing `after`-hook `pull_file` block into a
  `global.flushBareLog(reason)` helper. `checkAppCrash` and
  `crashMonitor` now `await flushBareLog("crash-…")` (with a short
  `browser.pause(1500)` first, so any in-flight stdout has a chance to
  hit disk) before exiting. The `after` hook reuses the same helper on
  the normal completion path.

* iOS + Android — replace the bare `process.exit(1)` in
  `checkAppCrash` / `crashMonitor` with
  `setTimeout(function(){process.exit(1);},5000)`. This gives os_log /
  logcat / bare stdout a 5 s drain window before Appium is torn down,
  which on Android is enough by itself (logcat is captured by Device
  Farm directly) and on iOS bookends the Appium pull above.

Both WDIO configs were syntax-checked with `vm.Script` and the workflow
round-trips through PyYAML. The biggest `run:` block in the workflow is
now 18,408 chars, well under the 21,000-char GH Actions expression
budget.

This change was originally made on the
`feature-qvac-17830-vlm-perf-metrics` branch (so that PR's mobile delta
perf data would survive a fruit-plate crash) and is now extracted to a
standalone branch so it can land independently of the perf work.

Made-with: Cursor

* fix(ci): roll out bare_console.log crash flush to all mobile addons

The first commit on this branch fixed the missing C++ side log on iOS
Device Farm crashes for the llamacpp-llm workflow. The same wdio
crash-detection pattern is used verbatim across every other mobile
addon workflow (and they all currently lose bare_console.log on crash
in exactly the same way). This commit ports the fix to all of them so
the artifact tarball contains the C++ log regardless of which addon
crashes.

Workflows updated (8):

* integration-mobile-test-qvac-lib-infer-llamacpp-embed.yml
* integration-mobile-test-qvac-lib-infer-onnx-tts.yml
* integration-mobile-test-qvac-lib-infer-parakeet.yml
* integration-mobile-test-qvac-lib-infer-whispercpp.yml
* integration-mobile-test-decoder-audio.yml
* integration-mobile-test-diffusion-cpp.yml          (preserves the
                                                      generated-images
                                                      pull in `after:`)
* integration-mobile-test-qvac-lib-infer-nmtcpp.yml  (heredoc shape +
                                                      _healthInterval
                                                      crash machinery,
                                                      uses
                                                      browser.pullFile)
* integration-mobile-test-ocr-onnx.yml               (heredoc shape,
                                                      simple checkApp-
                                                      Crash, raw-HTTP
                                                      pull_file like
                                                      llm)

Each iOS WDIO config now hoists the inline `after`-hook bare_console.log
pull into a `global.flushBareLog(reason)` helper, calls it from
`checkAppCrash` (and `crashMonitor` where present) before
`process.exit(1)`, and wraps the exit in
`setTimeout(function(){process.exit(1);},5000)` so os_log + bare stdout
have time to drain. Android side gets the same 5 s drain wrap (logcat
is captured directly by Device Farm so no `pull_file` needed there).

Validation (run locally before push):

* All 18 patched WDIO bodies parse cleanly via vm.Script.
* All 9 mobile workflow YAML files round-trip through PyYAML.
* Biggest `run:` block in any workflow is now 18,408 chars (the LLM
  workflow, unchanged), well under the 21,000-char GH Actions
  expression budget.

The LLM workflow on this PR will be triggered via workflow_dispatch as
the regression smoke test — it's the most exercised mobile workflow and
the one where the original fix was developed, so a passing run there
confirms the rollout didn't break anything.

Made-with: Cursor

* fix(ci): bound iOS crash flushBareLog so process.exit always fires

@gianni-cor pointed out (PR tetherto#1804 review) that the crash-exit
setTimeout was registered AFTER `await global.flushBareLog(...)` in
both `checkAppCrash` and `crashMonitor`. If Appium/WDA was already
unhealthy after an app crash, the raw HTTP `pull_file` (or
`browser.pullFile`) call could hang indefinitely, so the
`setTimeout(function(){process.exit(1);},5000)` was never scheduled
and a fast crash turned into a long CI timeout.

Fix:

* Pre-register `setTimeout(function(){process.exit(1);},5000)` BEFORE
  any awaits on the crash path. This is the hard upper bound — even if
  `browser.pause` and the flush both wedge, libuv still fires the
  timer and exits the runner at T+5 s.

* Wrap the flush in `Promise.race` against a 3 s timeout so the
  await chain is also bounded:

      try {
        await browser.pause(1500)
        await Promise.race([
          global.flushBareLog('crash-' + stage),
          new Promise(function (_, rj) {
            setTimeout(function () {
              rj(new Error('bare-log flush timed out'))
            }, 3000)
          })
        ])
      } catch (_) {}

  This keeps the flush best-effort: if WDA is alive we still get
  bare_console.log into Customer_Artifacts.zip; if it's wedged we
  abandon at T+3 s and the pre-registered exit fires shortly after.

Workflows updated (9):

* integration-mobile-test-qvac-lib-infer-llamacpp-llm.yml
  (checkAppCrash + crashMonitor, plus a refreshed design-intent
  comment that explains the pre-registered exit invariant)
* integration-mobile-test-qvac-lib-infer-llamacpp-embed.yml
* integration-mobile-test-qvac-lib-infer-onnx-tts.yml
* integration-mobile-test-qvac-lib-infer-parakeet.yml
* integration-mobile-test-qvac-lib-infer-whispercpp.yml
* integration-mobile-test-qvac-lib-infer-nmtcpp.yml (checkAppCrash
  variant that also clears `_healthInterval` before exiting)
* integration-mobile-test-decoder-audio.yml
* integration-mobile-test-diffusion-cpp.yml
* integration-mobile-test-ocr-onnx.yml

Total: 10 call sites (8 `checkAppCrash` + 1 `crashMonitor` + 1
`_healthInterval`-clearing checkAppCrash).

Validation (run locally before push):

* All 18 patched WDIO bodies (9 workflows x Android + iOS) parse
  cleanly via vm.Script.
* All 9 mobile workflow YAML files round-trip through PyYAML.
* Biggest `run:` block in any workflow is now 18,912 chars (the LLM
  workflow, +504 chars from the Promise.race wrappers + refreshed
  comment), still well under the 21,000-char GH Actions expression
  budget.

Made-with: Cursor
…re SDK execution layer" (tetherto#1795)

* doc: content update - sdk - completion

* doc: content new - SDK - runtime lifecycle

* doc: update sidebar - add new page - runtime lifecycle

* doc: replacement for PR 1735 to be closed

* doc: fix broken link in completion page
…etherto#1811)

`getRPC()` was using `void tracked.finally(...)` to clear the
`inflightConnections` map entry once the connect attempt settled.
`Promise.prototype.finally(cb)` returns a *new* promise that re-rejects
with the original error on rejection — and `void` does not attach a
handler, so when `tracked` rejected (e.g. `PEER_NOT_FOUND` /
`DelegateConnectionFailedError`) the chain produced by `.finally()` was
an orphaned rejected promise.

The Bare worker's global `process.on("unhandledRejection")` handler
treats any unhandled rejection as fatal: it calls
`shutdownBareDirectWorker("unhandled-rejection")`, which runs
`cleanupDownloads()` (cancelling all in-flight downloads, including the
legitimate fallback-to-local download started from
`handleLoadModelDelegated`'s catch block) and `destroySwarm()` (poisoning
the DHT for every subsequent delegated call).

That sequence is exactly the regression reported in QVAC-18162:

- `delegated-load-model-fallback-local` fails with
  `Failed to load model: Download was cancelled` instead of falling back
  successfully — the fallback download is cancelled by `cleanupDownloads`.
- `delegated-heartbeat-provider`, `delegated-cancel-download`,
  `delegated-connection-failure`, ... all hang at
  `🌐 Waiting for DHT to fully bootstrap...` until their per-test timeout
  — `getSwarm()` still returns the destroyed swarm and
  `swarm.dht.fullyBootstrapped()` never resolves.

Fix: replace `void tracked.finally(clearInflight)` with
`tracked.then(clearInflight, clearInflight)`. This registers the cleanup
on `tracked` itself (not on a derived promise), so the rejection is
*observed* — no orphan chain is created and no unhandled rejection is
emitted. The caller still receives the original rejection through
`await withTimeout(inflight, options.timeout)` two lines below.
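
The difference between the two forms can be sketched in isolation. `inflight` and `trackInflight` below are hypothetical stand-ins for the `inflightConnections` map and the cleanup wiring inside `getRPC()`; the actual code is not reproduced here.

```typescript
// Hypothetical stand-in for the SDK's inflightConnections bookkeeping.
const inflight = new Map<string, Promise<unknown>>();

function trackInflight<T>(key: string, tracked: Promise<T>): Promise<T> {
  inflight.set(key, tracked);
  const clearInflight = (): void => {
    inflight.delete(key);
  };
  // Leaky form: `void tracked.finally(clearInflight)` derives a promise that
  // re-rejects with the original error and has no handler attached.
  // Safe form: both callbacks return undefined, so the derived promise always
  // fulfils, and the rejection of `tracked` is observed by this handler.
  tracked.then(clearInflight, clearInflight);
  // The caller still sees the original rejection when it awaits `tracked`.
  return tracked;
}
```

The caller is still expected to await or catch the returned promise; the sketch only removes the second, orphaned rejection.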

Regression introduced in tetherto#1729 (QVAC-18144). The topic-removal in tetherto#1729
is correct; this is a defect in the new `inflightConnections`
deduplication logic added alongside it.

A separate follow-up should harden the worker-level
`unhandledRejection` policy so a single leaked promise can no longer
take down the entire worker, but that change is broader than this fix.

Co-authored-by: Opanin Akuffo <46673050+opaninakuffo@users.noreply.github.com>
* feat: add mobile Parakeet RTF reporting

Run Parakeet RTF benchmarks through the mobile Device Farm workflow and combine desktop and mobile artifacts into a single report so cross-platform performance is visible in one place.

Made-with: Cursor

* fix: resolve mobile RTF benchmark shared module path

Allow the mobile benchmark entrypoint to load the shared benchmark helper from either the source test layout or the generated Device Farm backend bundle so test app packaging succeeds.

Made-with: Cursor

* fix: increase Parakeet mobile Device Farm timeouts

Give the mobile integration workflow enough time to finish the longer Device Farm test runs now that the RTF benchmark path is included, instead of being force-stopped at the 60 minute job timeout.

Made-with: Cursor

* fix: split Parakeet mobile perf and regular runs

Mirror OCR's mobile reporting approach by isolating the RTF benchmark into dedicated Device Farm perf runs while keeping the regular mobile suite separate, then only extract benchmark artifacts from the perf lane.

Made-with: Cursor

* fix: make Parakeet mobile split workflow portable

Replace bash mapfile usage with portable while-read loops for macOS runners and refresh AWS credentials before the long Device Farm monitor and log download phases so the split perf/regular workflow can run to completion.

Made-with: Cursor

* fix: honor mobile test filters in Parakeet addon

Make the addon-side mobile wrappers read testFilter.txt and skip non-selected tests with zero-failure summaries so the perf and regular Device Farm lanes can actually execute different subsets without requiring framework changes.

Made-with: Cursor

* fix: use shared Parakeet mobile perf pipeline

Made-with: Cursor

* fix: extend Parakeet mobile performance matrix

Made-with: Cursor

* fix: keep iOS Parakeet mobile perf on TDT

Made-with: Cursor

* fix: split Parakeet mobile perf cases by model

Made-with: Cursor

* fix: address Parakeet PR bot findings

Made-with: Cursor

* fix: restore Parakeet mobile perf matrix shape

Made-with: Cursor

* Revert "fix: restore Parakeet mobile perf matrix shape"

This reverts commit df12527.

* fix: quarantine iOS Sortformer GPU perf case

Made-with: Cursor

---------

Co-authored-by: GustavoA1604 <54457676+GustavoA1604@users.noreply.github.com>
…rto#1768)

* feat[api]: route responseFormat via per-request generationParams (QVAC-17939)

Adds an OpenAI-compatible `responseFormat` to `completion()`:
  - `{ type: 'text' }` (default, free-form)
  - `{ type: 'json_object' }` (any valid JSON)
  - `{ type: 'json_schema', json_schema: { name, schema, ... } }`

Forwards the schema to the addon as a per-request
`generationParams.json_schema`, which the addon converts to GBNF and
applies for the duration of the request only — avoiding the previous
shared-`modelConfig.grammar` mutation, which was unsafe under concurrent
completions and didn't actually flow per-request anyway.

`tools` and a non-text `responseFormat` are mutually exclusive at the
schema layer (tools already constrain output via their parameter
schema). Bumps `@qvac/llm-llamacpp` to ^0.17.1 for the new addon API.

Includes `examples/llamacpp-structured-output.ts` demonstrating all
three modes against `QWEN3_600M_INST_Q4`.

* review: address tetherto#1768 comments — events/final example + json_schema field docs (QVAC-17939)

- Migrate `examples/llamacpp-structured-output.ts` from the legacy
  `tokenStream` surface to the canonical `events` / `final` API
  recommended for new SDK code. Streaming now consumes
  `contentDelta` events and aggregates via `final.contentText`.

- Document `json_schema.description` and `json_schema.strict` as
  accepted-for-OpenAI-compatibility-only on `responseFormatSchema`
  via `.describe()` annotations, and on the `responseFormat` JSDoc
  in `client/api/completion-stream.ts`. Both fields are accepted
  by the schema but not forwarded to the addon —
  `getResponseFormatJsonSchema()` only forwards `json_schema.schema`,
  and `strict: true` does NOT trigger OpenAI's auto-tightening
  (implicit `additionalProperties: false`, all properties required).
  Callers wanting strict validation must encode it explicitly in
  `schema`. Honoring `strict` semantics natively is tracked as a
  follow-up — out of scope for this PR.
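
The shapes discussed in this PR can be sketched as plain TypeScript types. The three `responseFormat` variants and the `json_schema` field names come from the commit messages above; `CompletionParams`, `checkExclusivity`, `toGenerationParams` and the treatment of an empty `tools` array are assumptions for illustration, not the SDK's zod schemas.

```typescript
type JsonSchema = Record<string, unknown>;

interface JsonSchemaFormat {
  type: "json_schema";
  json_schema: {
    name: string;
    schema: JsonSchema;
    description?: string; // accepted for OpenAI compatibility, not forwarded
    strict?: boolean;     // accepted for OpenAI compatibility, not forwarded
  };
}

type ResponseFormat = { type: "text" } | { type: "json_object" } | JsonSchemaFormat;

interface CompletionParams {
  responseFormat?: ResponseFormat;
  tools?: unknown[];
}

// Returns an error message when the combination is rejected, otherwise null.
function checkExclusivity(params: CompletionParams): string | null {
  const formatType = params.responseFormat?.type ?? "text";
  const hasTools = (params.tools?.length ?? 0) > 0; // assumption: [] counts as no tools
  return hasTools && formatType !== "text"
    ? "tools and a non-text responseFormat are mutually exclusive"
    : null;
}

// Only the inner schema is forwarded as generationParams.json_schema.
function toGenerationParams(format: JsonSchemaFormat): { json_schema: JsonSchema } {
  return { json_schema: format.json_schema.schema };
}

const strictPerson: JsonSchemaFormat = {
  type: "json_schema",
  json_schema: {
    name: "person",
    strict: true, // has no effect on its own
    schema: {
      type: "object",
      properties: { name: { type: "string" } },
      required: ["name"],
      additionalProperties: false, // strictness must be spelled out here
    },
  },
};
```

`strictPerson` shows the point made about `strict`: the constraint only takes effect because `required` and `additionalProperties: false` are written into `schema` itself.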

* review: drop double blank line in completion-stream.ts (QVAC-17939)
…moke (tetherto#1797)

* feat: add pre-terminate cleanup signal for SDK clients

Lets a client request a clean addon teardown before tearing the bare
runtime down, so addon static state (e.g. js_ref_t handles into the
worker V8 isolate) is released while that env is still alive.

Without this, tearing down a runtime whose addons retain
isolate-bound refs trips a V8 GlobalHandles assertion (brk 0 / SIGTRAP)
inside the next runtime that re-imports the same .bare files in the
same OS process — the JsLogger.setLogger path in
qvac-lib-inference-addon-cpp is the reproducer (every addon that
links it has the same retention).

- worker-core.ts: extract the existing shutdown body into a reusable
  cleanupForTerminate() that runs the same registry / model / resource
  cleanup but skips releaseWorkerLock() and process.exit(). The full
  shutdownBareDirectWorker still runs both for desktop signal and
  exit paths.
- handler-utils.ts + handle-request.ts: new internal __shutdown__
  message dispatched alongside __init_config. Bypasses the schema,
  awaits cleanupForTerminate(), and replies success. Lazy-imports the
  worker-core function to break the handler-utils -> worker-core ->
  create-server -> handle-request import cycle.
- bare-client.ts: mirror the message in the in-process mock RPC for
  desktop direct-mode (Pear-style) consumers.
- expo-rpc-client.ts: close() is now async; sends __shutdown__ over
  RPC and awaits the success reply (with a 10s timeout safety) before
  calling worklet.terminate(). Best-effort: timeouts log a warning
  and proceed with terminate. The auto-close path in unload-model.ts
  already awaits close(), so this is non-breaking for that caller.

* test: stabilise mobile smoke run via eviction-on-none and post-unload settle

Two related fixes that together let the mobile smoke run progress
past the "previous heavy model still resident" memory ceiling:

- resource-lifecycle: tests with dependency:none used to skip
  evictExcept and leave whatever was loaded by the previous test
  resident. Now treated as evictExcept([]), so a heavy model from
  the prior test gets unloaded before the next one starts allocating.
  Empirically this is what kept tripping sharded-model-load right
  after translation-afriquegemma-sw-en (afriquegemma 4B leaves ~550 MB
  resident; sharded then asks for multi-GB on top and hits the iOS
  memory limit).

- resource-manager: new ResourceManager({ unloadSettleMs }) option
  that sleeps for the configured duration after a successful
  unloadModel (only on success — failure path returns immediately).
  Lets the kernel release pages before the next load starts allocating.
  Defaults to 0 (off, desktop is fine without it). Mobile consumer
  opts in to 100ms.
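
Both behaviours can be sketched with two small helpers. `Dependency`, `modelsToEvict` and `unloadWithSettle` are hypothetical names for illustration; only `evictExcept`, `unloadModel` and the `unloadSettleMs` option are named in this commit message.

```typescript
// Hypothetical helpers, not the test suite's API.
type Dependency = "none" | string[];

// dependency:none now behaves like evictExcept([]): nothing is kept.
function modelsToEvict(loaded: string[], dependency: Dependency): string[] {
  const keep: string[] = dependency === "none" ? [] : dependency;
  return loaded.filter((id) => !keep.includes(id));
}

const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Sleep only after a successful unload; a rejected unload propagates at once.
async function unloadWithSettle(
  unload: () => Promise<void>,
  unloadSettleMs: number = 0,
): Promise<void> {
  await unload();
  if (unloadSettleMs > 0) await sleep(unloadSettleMs);
}
```

A mobile consumer would pass `100` for `unloadSettleMs`, while the default of `0` keeps desktop runs unchanged.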

Mobile consumer also picks up SkipExecutor entries for the
lifecycle-suspend tests; suspend hangs the runner indefinitely on
mobile because the lifecycle coordinator pauses MQTT and never
resumes within the test timeout.

* chore: bump qvac-test-suite to ^0.6.2

Picks up:
- in-app memory poller in mobile-consumer template
- desktop in-app memory poller (process-tree RSS)
- Memory tab + per-test memory metrics in HTML/JSON reports
- bucket results by metadata.category instead of testId-prefix split

Required by the eviction / settle work in this PR; both depend on
the new MemorySummary fields and the corrected category bucketing.

* fix: split cleanupRan and isShuttingDown so shutdown still releases lock

cleanupForTerminate previously set the same isShuttingDown flag that
shutdownBareDirectWorker uses as its early-return guard. After a
__shutdown__ message ran the pre-terminate cleanup, a subsequent
SIGTERM / SIGINT / uncaught-exception in desktop direct mode would
early-return at the guard and skip releaseWorkerLock() + process.exit().
Result: lock file leak and no graceful exit.

Mobile is unaffected because each Worklet has its own module instance
(fresh isShuttingDown per worklet). The bug only bites the bare-client
mock-RPC path (Pear-style consumers where the worker shares the host
process for its lifetime).

Two flags now:
- cleanupRan: idempotent guard around runCleanup body
- isShuttingDown: only set by shutdownBareDirectWorker; cleanupForTerminate
  must NOT set it
shutdownBareDirectWorker still calls runCleanup which is now a no-op
when cleanupRan is already true.
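
The interplay of the two flags can be sketched as follows. The flag and function names follow this commit message; the bodies and the `events` log are assumptions, and the sketch is synchronous even though the real paths are awaited.

```typescript
// Synchronous stand-in for the worker's shutdown paths.
let cleanupRan = false;
let isShuttingDown = false;
const events: string[] = [];

function runCleanup(): void {
  if (cleanupRan) return; // idempotent: a second call is a no-op
  cleanupRan = true;
  events.push("cleanup");
}

// Pre-terminate path (the __shutdown__ message): cleanup only.
function cleanupForTerminate(): void {
  runCleanup(); // must NOT set isShuttingDown
}

// Signal / exit path: cleanup, then release the lock and exit.
function shutdownBareDirectWorker(reason: string): void {
  if (isShuttingDown) return;
  isShuttingDown = true;
  runCleanup();
  events.push("release-lock", "exit:" + reason);
}
```

Calling `cleanupForTerminate()` and then `shutdownBareDirectWorker("SIGTERM")` runs the cleanup once and still reaches the lock release, which is the sequence the single-flag version skipped.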

* fix: serialise expo-rpc-client.close() to avoid duplicate __shutdown__ races

If two callers race close() (or one calls close() while another getRPC()
is mid-flight), the second sees rpcInstance still set, fires a redundant
__shutdown__, then re-enters the terminate block on already-null state.

Wrap the body in a singleton closingPromise; concurrent callers share
the same in-flight close. Reset to null in finally so a fresh worker
brought up later can be cleanly closed again.

The auto-close path in unload-model.ts is naturally serialised today
so this is robustness rather than fixing an active bug, but the cost
is minimal and the failure mode (double __shutdown__ after terminate)
is annoying to diagnose.
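
A minimal sketch of the singleton pattern described here. `closingPromise` is named in this commit message; `doClose` and `shutdownsSent` are hypothetical stand-ins for the `__shutdown__` round trip and the terminate call.

```typescript
let closingPromise: Promise<void> | null = null;
let shutdownsSent = 0;

// Stand-in for sending __shutdown__ and terminating the worklet.
async function doClose(): Promise<void> {
  shutdownsSent += 1;
  await Promise.resolve();
}

function close(): Promise<void> {
  // Concurrent callers share the close that is already in flight.
  if (closingPromise !== null) return closingPromise;
  const inFlight = (async (): Promise<void> => {
    try {
      await doClose();
    } finally {
      // Reset so a worker brought up later can be closed again.
      closingPromise = null;
    }
  })();
  closingPromise = inFlight;
  return inFlight;
}
```

Two back-to-back `close()` calls return the same promise and trigger a single `doClose()`, so the second caller never re-enters the terminate block.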

* fix: skip Worklet.terminate() on non-iOS platforms

Worklet.terminate() crashes on Android: addon dlclose unmaps the lib
but pthread_key_t destructors registered by some addons (likely
rocksdb-native, libbare-tls, libbare-crypto) are never
pthread_key_delete'd before unload, so libc's per-thread cleanup table
points at unmapped memory and the next pthread_exit SIGSEGVs in
pthread_key_clean_all().

iOS dyld no-ops dlclose for already-loaded third-party libs, so the
dangling-destructor problem cannot manifest there. The terminate path
stays enabled on iOS.

On non-iOS, fall back to the legacy refs-only close: drop rpcInstance
and rpcPromise, leave workletInstance + workletInitialized intact so
the next getRPC() reuses the live worklet. Skip the __shutdown__
roundtrip too -- it would clear the worker plugin registry without a
follow-up terminate, leaving the worker unusable for subsequent
loadModel.
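
A rough sketch of the platform split, assuming a simplified client state object. The `ClientState` shape and the returned action list are hypothetical and exist only to show which fields each path touches.

```typescript
interface ClientState {
  rpcInstance: object | null;
  rpcPromise: Promise<object> | null;
  workletInstance: object | null;
  workletInitialized: boolean;
}

// Returns the teardown actions taken, for illustration only.
function closeClient(platform: string, state: ClientState): string[] {
  const actions: string[] = [];
  if (platform === "ios") {
    // Full teardown is only attempted where terminate is known to be safe.
    actions.push("__shutdown__", "Worklet.terminate()");
    state.workletInstance = null;
    state.workletInitialized = false;
  }
  // Both paths drop the RPC refs; non-iOS keeps the live worklet for reuse.
  state.rpcInstance = null;
  state.rpcPromise = null;
  return actions;
}
```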

Trade-off: Android tests no longer recover memory between heavy tests
the way iOS now does, so memory accumulates across the smoke run. On
Pixel-class devices (8+ GB RAM) this is fine; smaller-RAM Android
devices may regress vs the pre-PR baseline. Acceptable until the
upstream holepunchto/bare exposes a per-addon unload hook.

Platform is resolved via the existing getRuntimeContext() path
(getDeviceInfo handles a missing expo-device safely via dynamic
import + try/catch), so no new react-native imports are added.

* test: skip diffusion-streaming-progress on mobile

The test reliably times out on mobile (Android Pixel 10 Pro hit the
600s timeout in the latest smoke run). Test framework drops the await
on timeout but the underlying streaming inference keeps running on the
Bare worker side, leaving the diffusion model "in use" from the
runtime's perspective.

Knock-on effect: any later test whose modelSetup needs to evict
diffusion (e.g. wrong-model-transcribe-on-llm via
ResourceManager.evictExcept) blocks indefinitely waiting for the
stream to finish. Observed in local-android-smoke: 86/88 tests
completed, then the runner stuck for 50+ minutes inside the eviction
of diffusion at test 86's setup.

Skipping unblocks the smoke run end-to-end. The proper fixes
(framework-side cancel-on-timeout, resource-manager bounded waits)
are tracked separately.
* fix[api]: deterministic decoding for LLM translate

Force greedy decoding with a fixed seed and bounded output length on every
LLM translate call (non-African branch) so output is reproducible across
calls and runaway generations cannot blow ctx_size on the next call.

Background: with @qvac/llm-llamacpp 0.17.x, calling `translate()` against
Salamandra (loaded with no decoding params) intermittently produced
verbatim source echo, "Translation in Spanish:" preambles, or
`processPromptImpl: context overflow` on tiny inputs like "bank". The
flake was non-deterministic across runs on the same input, masked in the
smoke suite by `contains-any` validators that still matched a Spanish
keyword inside a preamble.

The change is one call site: when the model is llamacpp-completion and
the prompt is not the AfriqueGemma path, pass per-call generationParams
overriding sampling for that runJob:
- temp/top_k/top_p collapse to greedy
- repeat_penalty: 1.3 breaks single-token echo loops
  (e.g. greedy "bank" -> "bank\nbank\n...")
- seed: 42 pins any residual sampling
- predict: 256 caps output so a runaway can't accumulate KV state

Prompt template, NMT branch, and African branch are unchanged.
AfriqueGemma is loaded with its own deterministic config + stop_sequences
already, so we skip the override there.
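
For illustration, the override could look like the object below. `repeat_penalty`, `seed`, and `predict` come from the list above; the concrete greedy values for `temp`, `top_k`, and `top_p` are assumptions, and `buildRunOptions` is a hypothetical helper, not the SDK's call shape.

```typescript
// Assumed greedy settings: temp 0, top_k 1, top_p 1 (values not stated in the commit).
const translateGenerationParams = {
  temp: 0,
  top_k: 1,
  top_p: 1,
  repeat_penalty: 1.3, // breaks single-token echo loops
  seed: 42,            // pins any residual sampling
  predict: 256,        // caps output length
};

// Hypothetical helper: attach the override only when requested.
function buildRunOptions(applyOverride: boolean): { generationParams?: typeof translateGenerationParams } {
  return applyOverride ? { generationParams: translateGenerationParams } : {};
}
```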

Verified locally on @qvac/llm-llamacpp 0.17.1 with 30 calls
(streaming + en-es + context, 10 iterations each):
- before: 23/30 pass with 2 echoes, 2 ctx-overflow, 3 echoes
- after:  30/30 pass, all outputs identical across iterations

* refactor: extract LLM translate generation params into named constant

Pull the per-call sampling overrides for LLM translate out of the call
site into a top-of-file constant with a comment that explains the
purpose of each field. No behavior change — values are identical to the
previous commit.

Adding a third translate-friendly LLM model later still goes through
this single constant unless it needs different sampling, in which case
it would warrant a small profile lookup keyed on model family. That
restructure is deferred until a concrete second profile lands.

* refactor[api]: skip per-call sampling override for AfriqueGemma

Apply the per-call deterministic-decoding override only to non-AFRICAN_*
LLM models. AfriqueGemma's load-time `modelConfig` carries
`stop_sequences: ["\n"]` and `repeat_penalty: 1`, and these values must
not be overridden mid-call: with `repeat_penalty: 1.3`, the addon
penalises "\n" and the stop never fires, so generation runs all the way
to `predict` and produces non-translation output. The earlier attempt
to dispatch by `afriquePrompt` (language-pair-derived) silently did
nothing for the actual AfriqueGemma traffic: `isAfrican("sw")` returns
`false` because `AFRICAN_LANGUAGES_MAP` is keyed by FLORES codes
(`"swh_Latn"`), not the ISO codes the smoke tests pass.

This commit dispatches by model name (entry.local.name starts with
"AFRICAN_") and falls back to `model.run(input)` with no override —
identical to the pre-fix call shape — so AfriqueGemma's behaviour is
preserved exactly as it is on main. A latent AfriqueGemma garbage-output
issue exists at HEAD regardless of this PR; that is out of scope.

The constant is renamed `LLM_TRANSLATE_GENERATION_PARAMS` since it now
applies to every non-skipped LLM, not just Salamandra.
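
The key mismatch and the name-based dispatch can be sketched as below. The map contents and the model name `AFRICAN_EXAMPLE` are illustrative assumptions; only the `AFRICAN_` prefix and the `swh_Latn` key come from the text above.

```typescript
// Illustrative excerpt: the map is keyed by FLORES codes, not ISO codes.
const AFRICAN_LANGUAGES_MAP: Record<string, string> = { swh_Latn: "Swahili" };

function isAfrican(code: string): boolean {
  return Object.prototype.hasOwnProperty.call(AFRICAN_LANGUAGES_MAP, code);
}

// Dispatch by model name instead of by language pair.
function shouldApplyOverride(modelName: string): boolean {
  return !modelName.startsWith("AFRICAN_");
}
```

With these definitions, `isAfrican("sw")` is `false` while `isAfrican("swh_Latn")` is `true`, which is why the language-based dispatch never matched the smoke-test traffic.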

* refactor: tighten typing on per-call generation params

Pull `RunOptions` and `GenerationParams` from `@qvac/llm-llamacpp` and
use them in place of the loose `Record<string, number>` cast in
`translate()`. Define a `LlmTranslateGenerationParams` alias as the
specific subset of `GenerationParams` we set per call (six fields,
required) so a typo on any of them is a compile error. The cast on
`model.run.bind(model)` now references the addon's `RunOptions` shape
directly, which keeps us protected if the addon's option shape changes.

No behaviour change.
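
A sketch of the alias idea, using a local stand-in for the addon's `GenerationParams` (the real type lives in `@qvac/llm-llamacpp`; the field list and values here are assumptions).

```typescript
// Local stand-in for the addon's type; the real one may carry more fields.
interface GenerationParams {
  temp?: number;
  top_k?: number;
  top_p?: number;
  repeat_penalty?: number;
  seed?: number;
  predict?: number;
}

// The six per-call fields, all required.
type LlmTranslateGenerationParams = Required<
  Pick<GenerationParams, "temp" | "top_k" | "top_p" | "repeat_penalty" | "seed" | "predict">
>;

// Misspelling a key in this literal (for example `repeat_penalti`) or omitting
// one fails to compile, unlike the looser Record<string, number>.
const typedParams: LlmTranslateGenerationParams = {
  temp: 0,
  top_k: 1,
  top_p: 1,
  repeat_penalty: 1.3,
  seed: 42,
  predict: 256,
};
```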

---------

Co-authored-by: Dmytro Medvinskyi <functionsilence@gmail.com>
…ios, raise c… (tetherto#1773)

* test[notask]: stabilize mobile e2e (skip afriquegemma on ios, raise chatterbox timeout)

* test[notask]: inactivity timeout bumps

* test[notask]: wait out addon busy throw in logging tests

* test[notask]: increase timeouts

* test[notask]: skip diffusion on ios, drop kv-cache math assertion, revert heartbeat to 300s

* test[notask]: bump parakeet-ctc-mp3 timeout for mobile cold-load

* test[notask]: fix mobile sentence-stream + parakeet-ctc-mp3 timeout + stop-sequences flake

* test[notask]: update types for tts executors

* test[notask]: bump mobile e2e timeouts (device-farm 90m, consumer 1200s)

* test[notask]: android skip diffusion tests

* test[notask]: revert consumer-inactivity-timeout input for mobile workflows
…at/completions (tetherto#1810)

* feat[api]: response_format support in OpenAI-compat /v1/chat/completions (QVAC-17939)

Wires OpenAI's `response_format` field through the CLI serve adapter:
  - `{ type: "text" }` (default, free-form)
  - `{ type: "json_object" }` (any valid JSON)
  - `{ type: "json_schema", json_schema: { name, schema, ... } }`

The adapter validates the shape, removes `response_format` from
`UNSUPPORTED_PARAMS`, and forwards it to the SDK's `responseFormat`
parameter, which converts the JSON Schema to GBNF in the addon and
applies it for the duration of the request only.

Depends on the SDK PR tetherto#1768 (`@qvac/sdk@>=0.10.0`) for the actual
constraint to take effect end-to-end. Without that landed, the SDK
silently ignores `responseFormat`.

Includes 11 new `extractResponseFormat` unit tests in
`test/translate.test.ts` covering valid shapes, optional fields,
and validation errors.
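
A hedged sketch of what shape validation for the three variants might look like. This is not the adapter's actual `extractResponseFormat`; the error messages and the choice to require both `name` and `schema` are assumptions.

```typescript
type ResponseFormat =
  | { type: "text" }
  | { type: "json_object" }
  | { type: "json_schema"; json_schema: { name: string; schema: Record<string, unknown> } };

function extractResponseFormat(body: Record<string, unknown>): ResponseFormat | undefined {
  const raw = body["response_format"];
  if (raw === undefined) return undefined; // field omitted: free-form text
  if (typeof raw !== "object" || raw === null) {
    throw new Error("response_format must be an object");
  }
  const { type, json_schema } = raw as { type?: unknown; json_schema?: unknown };
  if (type === "text") return { type: "text" };
  if (type === "json_object") return { type: "json_object" };
  if (type === "json_schema") {
    if (typeof json_schema !== "object" || json_schema === null) {
      throw new Error("json_schema must be an object");
    }
    const { name, schema } = json_schema as { name?: unknown; schema?: unknown };
    if (typeof name !== "string" || typeof schema !== "object" || schema === null) {
      throw new Error("json_schema requires a string name and an object schema");
    }
    return { type: "json_schema", json_schema: { name, schema: schema as Record<string, unknown> } };
  }
  throw new Error("unsupported response_format.type");
}
```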

* chore: bump @qvac/sdk dep floor to ^0.10.0

CLI now consumes the per-request `responseFormat` API added in
qvac-sdk 0.10.0 (PR tetherto#1768 / QVAC-17939). Caret-pinned to ^0.10.0
to keep upgrades within the same minor line.

* chore: bump runtime MIN_SDK_VERSION to 0.10.0

Matches the package.json devDep floor bumped in the previous commit.
The runtime guard in serve/core/sdk.ts otherwise still accepted
SDK 0.9.x, which lacks the responseFormat parameter.
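
The guard needs a numeric comparison, since a plain string comparison would rank "0.9.5" above "0.10.0". A minimal sketch, assuming simple `major.minor.patch` versions; `isSdkSupported` is a hypothetical name, not the function in `serve/core/sdk.ts`.

```typescript
const MIN_SDK_VERSION = "0.10.0";

// Split "0.10.0" or "0.10.0-beta.1" into numeric [major, minor, patch].
function parseVersion(version: string): number[] {
  const core = version.split("-")[0] ?? "";
  return core.split(".").map((part) => Number.parseInt(part, 10) || 0);
}

function isSdkSupported(installed: string, minimum: string = MIN_SDK_VERSION): boolean {
  const a = parseVersion(installed);
  const b = parseVersion(minimum);
  for (let i = 0; i < 3; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    if (x !== y) return x > y; // first differing component decides
  }
  return true; // equal versions are supported
}
```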

Addresses opaninakuffo's review on PR tetherto#1810.
infra: align cpp-lint with reusable-prebuilds and gate it on PR-changed files

Refactor the reusable cpp-lint workflow so its bootstrap is the same
as `reusable-prebuilds.yml`, and tighten the linters so the step
actually fails PRs on findings.

Toolchain bootstrap:
- Replace inline LLVM/Vulkan/vcpkg/Node setup with the shared
  composite actions used by prebuilds (`setup-build-host`,
  `setup-aws-prebuild`, `setup-vcpkg`, `setup-bare-tooling`,
  `setup-vulkan-sdk`).
- Drop divergent scaffolding no caller exercises: the ssh/https
  `git insteadOf` rewrites, the `@tetherto:registry` npmrc setup,
  the `~/.npm` cache, and the hardcoded `node-version: 18.x`
  override (default `lts/*` now matches prebuilds).
- Move `runs-on` from `ai-run-linux` to `ubuntu-22.04` to match
  prebuilds and decommission self-hosted usage for this job.
- Add an optional `include-vulkan-sdk` boolean input (default
  `true`) mirroring the same input on `reusable-prebuilds.yml` so
  ONNX/decoder addons can opt out in a follow-up.

Inputs / refs:
- Resolve `BASE_SHA` / `HEAD_REF` / `HEAD_REPO` up-front with
  documented fallbacks (`inputs.sha → github.event.before →
  HEAD~1`) so `workflow_dispatch` runs work without a PR context.
  The `sha` input is now optional; existing callers already pass it.
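
The fallback chain reads as a simple first-non-empty selection. The workflow itself is YAML and shell; the TypeScript below is only an illustration, and `resolveBaseSha` is a hypothetical name.

```typescript
// inputs.sha -> github.event.before -> HEAD~1
function resolveBaseSha(inputSha?: string, eventBefore?: string): string {
  return inputSha || eventBefore || "HEAD~1";
}
```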

Lint scope and gating:
- Drop `bare-make build` / `bare-make test` (duplicated the Linux
  x64 prebuild work; ctest finds nothing without `BUILD_TESTING=ON`,
  and the gtest binary belongs to the dedicated `cpp-tests-*.yml`
  workflows).
- Keep `bare-make generate -D ENABLE_VULKAN=OFF` because clang-tidy
  needs the resulting `compile_commands.json`. `ENABLE_VULKAN=OFF`
  is honoured today only by `qvac-lib-infer-whispercpp` and is a
  no-op CMake variable elsewhere.
- Scope `clang-tidy` to the C/C++ translation units touched by the
  PR by intersecting `git diff $BASE_SHA HEAD` with the TU list
  emitted by `clang-tidy-helper.py --files`. Header-only and
  removed-source diffs short-circuit cleanly. Reuse the helper's
  `--clang-tidy-cmd` output for the `-p` build dir and
  `--header-filter` regex; only the file list is replaced.
- Pass `--warnings-as-errors='*'` to `clang-tidy-19` so every check
  enabled in the shared `qvac-lint-cpp/.clang-tidy` `Checks:` is
  promoted to an error and the step's exit code reflects findings
  (the shared config sets `WarningsAsErrors: ''`, which by itself
  would let clang-tidy print warnings and exit 0).
- Wire `clang-tidy` into the existing failure tally alongside
  `cpp_files_fmt`.
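
The scoping step amounts to a set intersection. A small sketch, assuming both inputs are already lists of repo-relative paths; `selectLintTargets` is a hypothetical name and the real step runs in the workflow's shell.

```typescript
// Keep only changed files that are also known translation units.
function selectLintTargets(changedFiles: string[], translationUnits: string[]): string[] {
  const known = new Set(translationUnits);
  return changedFiles.filter((file) => known.has(file));
}

const targets = selectLintTargets(
  ["src/a.cpp", "include/a.hpp", "src/removed.cpp", "README.md"],
  ["src/a.cpp", "src/b.cpp"],
);
```

Headers and removed sources never appear in the translation-unit list, so a header-only or removal-only diff yields an empty result and clang-tidy can be skipped.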

Toolchain pinning:
- Pin both linters to their LLVM-19 binaries explicitly:
  `git-clang-format-19 --binary clang-format-19` and
  `clang-tidy-19`. Removes the dependency on whichever unversioned
  `clang-format` / `clang-tidy` wins PATH resolution (Ubuntu's
  stock 18.x was being picked up before this fix). The `::error`
  annotation contributors copy-paste locally is updated to match.

cpp-lint remains advisory at the on-pr workflow level — no
`prebuild`, `cpp-tests*`, integration test, or ts-checks job lists
cpp-lint in `needs:`, so a failing cpp-lint doesn't block the addon
build/test path; merge-guard semantics are unchanged.
* fix Vulkan SDK installation verification

* run vulkaninfo on win11-rtx4000-hetzner

* add job build-win11-nvidia-image

* fix missing double-quoted string value in PowerShell

* run vulkaninfo on ai-run-windows11-gpu

* install vulkan sdk on ai-run-windows11-gpu

* fix typo

* fix typo

* correct vulkaninfoSDK.exe path

* Force WDDM Mode for Vulkan to work

* nvidia-smi -fdm 0

* install nvidia grid drivers instead of datacenter driver

* use .NET HttpClient with progress tracking

* fix httpclient instance creation for newer powershell

* Add-Type -AssemblyName System.Net.Http

* try wget

* test run vulkaninfo on ai-run-windows11-gpu

* install git in win11-nvidia-grid-image

* disable interactivity when installing git with winget

* git version separate step after winget install Git

* test

* test

* install winget before winget install git

* install winget from PSGallery

* fix

* fix2

* use Cyberboss/install-winget@v1

* add --source winget to winget install

* add --custom '/o:PathOption=CmdTools' to winget installation of git for windows

* do not verify git installation in path in same job where it gets installed (needs new job run)

* verify that git is in the path for ai-run-windows11-gpu

* uncomment needed lines

* Install Chocolatey on win11-nvidia-grid-image

* s/pwsh/powershell/

* verify choco installation

* verify vcpkg

* clone vcpkg

* vulkaninfo check

* install visual studio build tools, LLVM, and Vulkan SDK in base image for win11 gpu runner

* do not install vulkan sdk on base win11 gpu image

* rename vulkaninfo to win11-nvidia-image-builder.yml

* add vulkaninfo.yml
…etherto#1796)

* doc: content update - sdk - completion

* doc: content new - SDK - runtime lifecycle

* doc: update sidebar - add new page - runtime lifecycle

* doc: content update - SDK - diffusion - add img2img gen

* doc: replacement for PR 1735 to be closed

* doc: new code example - SDK - img gen img2img with flux2

* doc: fix broken link in completion page

* doc: sdk - create new example - img2img with klein

---------

Co-authored-by: gianni-cor <gianfranco.cordella@tether.io>
…herto#1812)

* fix: handle Harmony <|call|> EOG token for GPT-OSS tool calling

GPT-OSS models use <|call|> as a frame delimiter in Harmony tool-call
protocol. This token is in the EOG set, causing generation to stop
silently before tool calls reach the SDK.

Add Harmony model detection and <|call|>-specific handling in the
generation loop: render the token as visible text (special=true) so the
SDK can parse frame boundaries, then stop generation for the turn-based
tool execution protocol.

* Example added to showcase GPT-OSS multi-turn tool call

* CHANGELOG added, package version bumped to 0.19.1

* CPP lint applied

---------

Co-authored-by: gianni-cor <gianfranco.cordella@tether.io>
* doc: create new page - sdk - new ai capability - voice assistant

* doc: create new page - sdk - voice assistant

* doc: remove temporary doc that leaked into commit

* doc: content new - sdk - voice assistant - PR review
…etherto#1834)

GitHub only registers `workflow_dispatch` for workflow files that live
on the default branch. The full benchmark orchestrator is wired in
PR tetherto#1677 (QVAC-17092) but cannot be triggered from the Actions UI or
`gh workflow run` until the file exists on `main`.

This stub adds the file with a no-op job so the workflow becomes
dispatchable. Dispatching with `--ref <branch>` runs the *branch's*
version of the YAML, which is how we'll validate PR tetherto#1677's
orchestrator end-to-end before merging it.

The stub will be overwritten by PR tetherto#1677's real workflow on merge.
* test[notask]: fix android sharded-model-resume scudo oom

Tag sharded-model {detection, hash-validation, progress, resume,
cancellation} with dependency "sharded-embeddings" instead of "none".
With dep:none and the default loadSharded handler, modelSetup evicts
sharded-embeddings then immediately re-mmaps the same 5 shards; on
Android (Pixel 10 Pro) Scudo's mmap fails with "internal map failure"
before the kernel reclaims the prior maps, killing the worklet and
cascading the rest of the sharded category.

Also bump mobile unloadSettleMs 100 -> 200 ms to keep some slack for
remaining same-model unload / reload paths.

* infra[notask]: pass --report-dir to CI sdk producer runs

Without --report-dir, BatchOrchestrator skips writing app-mem.ndjson
and test-timeline.ndjson, so mobile in-app memory samples published
on qvac/app-memory get dropped silently and the per-test memory rows
/ chart / suite peak never make it into the report. run:local already
passes --report-dir; the three CI workflows did not.

Pin --report-dir=./reports in test-android-sdk.yml, test-ios-sdk.yml
and test-desktop-sdk.yml. The existing "Upload results" step already
uploads ${working-directory}/reports/ so the new files ride along.

---------

Co-authored-by: Victor-Rodzko <victor.rodzko@itrexgroup.com>
Co-authored-by: Opanin Akuffo <46673050+opaninakuffo@users.noreply.github.com>
…se/workaround bare double creation issue, harden addon-cpp workflows (tetherto#1825)

* add JS integration CI for inference addon

Run addon JS integration packages across desktop targets so callback lifetime and platform-specific runtime issues are exercised in PR checks.

* fix output callback lifetime cleanup

Keep callback state alive until the libuv async handle is closed so pending JS output delivery cannot observe freed addon storage during teardown.

* add JS number creation integration test

Add a minimal integration package that exercises first double creation through js::Number and keeps js_create_int32 as a control across the JS test matrix.

* work around Windows js_create_double first call

Route addon double creation through js::Number so win32 can burn the first js_create_double call observed to produce an invalid value on GitHub Azure runners.

* version

* fix Windows double burn-in guard

Use a function-local static initializer for the Windows-only js_create_double burn-in. This keeps the workaround process-wide without carrying template-level atomic state before returning the requested double value.

* harden PR JS test workflow

Limit the addon-cpp JS PR tests to read-only repository access and avoid release secrets, inherited secrets, and persisted checkout credentials. Install fixture dependencies without lifecycle scripts so PR-controlled package code cannot run during setup.

* fix PR JS workflow trust boundary

Run addon-cpp JS tests from the unprivileged pull_request workflow so PR-controlled code is not executed from pull_request_target. Keep a lightweight verify-label gate for external forks while leaving the privileged workflow focused on authorization and native tests.

* harden addon-cpp PR workflows

Move PR-controlled native test execution out of pull_request_target so cache restore and builds run without write-capable credentials, inherited secrets, or PAT-backed checkouts.

Keep pull_request_target limited to clearing the verify label when new commits arrive, and make unverified PR events skip the native/JS matrices without failing the workflow.

---------

Co-authored-by: gianni-cor <gianfranco.cordella@tether.io>
…flow_dispatch input (tetherto#1824)

Co-authored-by: Matt Cavanagh <1789097+darkynt@users.noreply.github.com>
Co-authored-by: tamer-hassan-tether <tamer.hassan@tether.io>
…to#1849)

The publish step now uses npm trusted publishing (OIDC), so the legacy
NPM_TOKEN auth in the build step's .npmrc and bunfig.toml is no longer
needed. @qvac/* packages on npmjs.org are public, so anonymous reads
work for installs.

Aligns with tetherto#1618 (DEVOPS-2062) which migrated the publish step to
trusted publishing but left the build-step NPM_TOKEN references in
place.
… CMakeLists (tetherto#1852)

The version in CMakeLists.txt kept drifting from vcpkg.json (currently
1.1.4 vs 1.1.5) but was never consumed: no downstream package calls
find_package(qvac-lib-inference-addon-cpp <version> ...) — they all use
find_path() to locate headers. Remove the project VERSION and the
generated ConfigVersion.cmake so vcpkg.json stays the single source of
truth.

Bump vcpkg.json version to 1.1.6, as it drifted from the version in the
registry.
…dk publish doesn't auto-skip (tetherto#1853)

publish-npm needs [build, publish-logic, release-merge-guard]. On a
manual workflow_dispatch from a release-sdk-* branch, the guard's
if: rejected the event (push only), so the guard was skipped, and
GitHub Actions' implicit success() check on needs auto-skipped
publish-npm before its if: with the explicit
needs.release-merge-guard.result == 'skipped' branch could even be
evaluated.

Allow the guard to run on workflow_dispatch too. The guard already
handles workflow_dispatch safely: github.event.before is empty, so
base-sha is empty, so isInitialPush is true and the changelog diff
check is skipped. The branch-name pattern check and the
package.json-version-matches-branch check still run, which is what
we want for a manual release publish.

Net effect: manual publish-sdk dispatches on release branches now
actually reach the publish-npm job instead of silently skipping.
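
The guard's behaviour on a manual dispatch can be sketched as below. The check names are hypothetical labels; only the empty-`before` reasoning comes from the description above.

```typescript
// Decide which guard checks run for a given github.event.before value.
function plannedGuardChecks(eventBefore: string | undefined): string[] {
  const baseSha = eventBefore ?? "";  // empty on workflow_dispatch
  const isInitialPush = baseSha === "";
  const checks = ["branch-name-pattern", "package-version-matches-branch"];
  if (!isInitialPush) checks.push("changelog-diff"); // needs a base to diff against
  return checks;
}
```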
yuranich and others added 25 commits May 19, 2026 19:01
…rto#2120)

Co-authored-by: Proletter <40578159+Proletter@users.noreply.github.com>
…etherto#2115)

Add `qvac-collabora` to the default `teams` input of the `label-gate`
composite action so that Collabora-authored PRs (e.g.
tetherto#2088 from @zoq) do not deauthorise on every
`synchronize` event from the PR author.

Without this team, every force-push from a Collabora contributor on a
`verified`-labelled PR triggers the action's "synchronize from
non-trusted sender" path, which strips the label and denies. The
operational result is that CI stops re-running on each push until an
internal Tether engineer manually re-applies the label, which has been
slowing down Collabora's iteration loop and was raised as a blocker on
PR tetherto#2088.

`qvac-collabora` is the broad Collabora contributor team; everyone who
ships against this repo from Collabora's side is in it, so a single
entry is sufficient to unblock the synchronize path. No companion
`-merge` tier is added — the existing `qvac-internal-merge` /
`-release` teams continue to gate the merge / release flows on the
Tether side.

The action's policy logic is unchanged; only the default team list
grows. All 58 existing tests pass.
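
A simplified model of the synchronize path, to show why adding a team changes the outcome. The team list and the function are illustrative assumptions, not the action's real implementation or its full default.

```typescript
// Illustrative trusted set; the real default list may contain other teams.
const TRUSTED_TEAMS = ["qvac-internal-dev", "qvac-internal-merge", "qvac-collabora"];

interface GateResult {
  labels: string[];
  authorized: boolean;
}

function onSynchronize(senderTeams: string[], labels: string[], trusted: string[] = TRUSTED_TEAMS): GateResult {
  const senderTrusted = senderTeams.some((team) => trusted.includes(team));
  if (senderTrusted) {
    // Trusted senders keep the label; authorization follows the label.
    return { labels, authorized: labels.includes("verified") };
  }
  // Non-trusted sender: strip the label and deny.
  return { labels: labels.filter((label) => label !== "verified"), authorized: false };
}
```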

Updates the supporting docs (`docs/ci/TEAMS.md`, `docs/ci/LABELS.md`,
`label-gate/README.md`) and the `devops-why-my-pr-not` diagnostic
skill so the new trusted set is discoverable.
…etherto#2097)

* chore: bump qvac-lib-inference-addon-cpp to 1.2.0 across addons

Update addon package vcpkg constraints from 1.1.7#1 to 1.2.0 to align on the latest lint-cpp checks and provide access to TransparentStringMap for string api conventions.

* Added vla to the addon list

---------

Co-authored-by: gianni-cor <gianfranco.cordella@tether.io>
…i browser (tetherto#2119)

* doc: infographic - bug rendering in Safari browser

* doc: infographic - bug rendering in Safari browser
…sumers (tetherto#2049)

Replace the per-definition preLoadUnload warm-up (which back-to-back loaded
and unloaded each "model with companion" through the SDK plugin at bootstrap)
with a recursive pre-download walk over every ModelConstant a contributing
definition references — root `constant` plus any constant nested inside
`config` (whisper VAD, chatterbox s3gen, parakeet decoder/vocab/preprocessor,
diffusion VAE/LLM/upscaler, bergamot pivot, vision projection, etc.).

Why
- The warm-up serialized a load+unload cycle for every flagged def at
  consumer bootstrap. On iOS this kept enough resident memory across
  cycles to be a contributor to memory pressure during ggml-tts model
  loads.
- It also overlapped with what each test's own `ensureLoaded` already
  does, so it was redundant for "did this addon link correctly" once
  the SDK is wired up.
- And it only pre-downloaded the root constant per def — companion
  models (whisper VAD, chatterbox s3gen, etc.) silently downloaded
  lazily on first test, inconsistent with the "all pre-cached" log line.

What
- ResourceManager.downloadAllOnce now discovers ModelConstants by walking
  `constant` + resolved `config` recursively, dedupes by `modelId`, and
  downloads everything in parallel (same Promise.allSettled shape).
- `skipPreDownload: true` still excludes a whole def. A constant
  referenced by both a skipped def and a non-skipped def is downloaded
  once via the non-skipped one.
- Removes `preLoadUnload` from the ModelDefinition interface and from
  every desktop/mobile consumer entry that used it.

No SDK API surface change; this is tests-qvac infra only.
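
The discovery walk can be sketched as follows. The type shapes are assumptions (a `ModelConstant` is treated as any object with a string `modelId`), and the real `ResourceManager.downloadAllOnce` also performs the parallel downloads, which this sketch omits.

```typescript
interface ModelConstant { modelId: string }
interface ModelDefinition {
  constant: ModelConstant;
  config?: unknown;
  skipPreDownload?: boolean;
}

function isModelConstant(value: object): value is ModelConstant {
  return typeof (value as { modelId?: unknown }).modelId === "string";
}

// Depth-first walk; `seen` guards against cycles and repeated objects.
function collect(value: unknown, found: Map<string, ModelConstant>, seen: Set<object>): void {
  if (typeof value !== "object" || value === null || seen.has(value)) return;
  seen.add(value);
  if (isModelConstant(value)) {
    if (!found.has(value.modelId)) found.set(value.modelId, value); // dedupe by modelId
    return;
  }
  for (const child of Object.values(value)) collect(child, found, seen);
}

function discoverDownloads(defs: ModelDefinition[]): ModelConstant[] {
  const found = new Map<string, ModelConstant>();
  const seen = new Set<object>();
  for (const def of defs) {
    if (def.skipPreDownload) continue; // skipped defs contribute nothing themselves
    collect(def.constant, found, seen);
    collect(def.config, found, seen);
  }
  return Array.from(found.values());
}
```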
…es (tetherto#2089)

Two related changes to the SDK's `peerDependencies` block, both motivated
by the same TD-HOLEPUNCH-PEER-DEPS follow-up effort: keep the peer block
to what the SDK actually owns, and stop shipping duplicate copies of
stateful Holepunch / bare-* singletons in consumer trees.

Drifted ranges bumped:
- compact-encoding ^2.19.0 -> ^3.0.0  (major dedup: 17+ nested ^3.x copies
                                        across hyperswarm/corestore/hypercore/
                                        hyperdb/hyperblobs/hyperdht/hyperschema/
                                        hyperdispatch/protomux/... collapse to
                                        a single top-level; silences the
                                        `npm ls invalid` warning on every
                                        consumer install)

Redundant peer entries removed:
- bare-dns       (transitive via bare-net; SDK doesn't import it; no need
                  to re-declare a floor on what bare-net already pins)
- bare-http1     (transitive via bare-fetch; same reasoning)
- bare-https     (transitive via bare-fetch; removing entirely instead of
                  bumping ^2.1.3 -> ^3.0.0 — once SDK stops declaring its
                  own floor, bare-fetch's ^3.0.0 wins and the duplication
                  resolves)

Moved out of peerDependencies (still listed where SDK source needs them):
- events       -> devDependencies    (only used by `.d.ts` type stubs;
                                       @types/node already provides the
                                       EventEmitter type; consumers don't
                                       need the npm `events` package at
                                       runtime — Node has it built-in and
                                       Bare maps it to bare-events)
- tar-stream   -> dependencies       (used in server/utils/archive.ts; no
                                       other consumer in the install tree
                                       so no dedup concern from making it
                                       a hard dep)

llm-splitter KEPT as a required SDK peer. @qvac/rag declares it as an
*optional* peer in its own manifest, which means npm skips installation
unless --include=optional is passed. Removing the SDK-level required peer
broke the LLMChunkAdapter path at runtime (caught by E2E smoke: rag-
embeddings-small-chunks, rag-large-document-32kb, rag-medium-document-10kb
all failed with "Required dependency missing: llm-splitter is required
for LLMChunkAdapter"). The SDK-layer required peer is what guarantees
the package is installed in consumer trees. Revisit if @qvac/rag ever
promotes llm-splitter from optional to required, or bundles it as a hard
dep.

bare-subprocess v6 still deferred for the same reason as before: the only
v6 consumer in the SDK tree today is bare-link@3.2.2 (5-day-old release
at original audit time), and bare-runtime@1.x is still firmly on ^5.0.0.
Revisit when bare-runtime catches up.

This is the first step of the broader peer-deps cleanup effort tracked
in the QVAC SDK & Platforms Asana project. Future steps may consider
moving compact-encoding out of peers entirely and auditing the remaining
optional peers, but those are out of scope here.

No source changes; package.json metadata only.

Validation:
- `bun install` in packages/sdk -> clean.
- `bun run build` -> clean.
- Dedup check via `/tmp/sdk-dedup-check.sh` against packed SDK -> ✅
  no Holepunch singletons duplicated.
- E2E smoke catch on initial revision (no llm-splitter) confirmed the
  removal was wrong; reverted.

Co-authored-by: Cursor <cursoragent@cursor.com>
…o#2134)

The package was scaffolded directly at 0.2.0 but never published to npm.
Rewinding to 0.1.0 so the public npm history starts at the proper initial
release. Folds the substantive user-facing items from the old 0.2.0 entry
(return shape, input validation, GGUF guards, integration test) into a
single initial-release CHANGELOG entry; drops internal refactor /
dep-bump rationale that doesn't belong in a first-release changelog.
…on crashes (tetherto#1815)

* fix: improve mobile download robustness and prevent unhandled rejection crashes

* test: remove test-only changelog entry from llm-llamacpp

---------

Co-authored-by: tamer-hassan-tether <tamer.hassan@tether.io>
Co-authored-by: gianni-cor <gianfranco.cordella@tether.io>
…2128)

* feat[mod]: add wan 2.1 video generation models to registry

* chore[mod]: changed model links to HF

---------

Co-authored-by: Yury Samarin <yuri.a.samarin@gmail.com>
…erto#2108)

* chore[notask]: remove unused npm: aliases from llm-llamacpp

os, process, tty, and util were aliased to bare shims but never imported
anywhere in the package source, tests, or examples.

* chore[notask]: remove npm: aliases from remaining packages

Removes unused or unnecessary npm: module aliases from tts-onnx,
tts-ggml, bci-whispercpp, transcription-whispercpp,
transcription-parakeet, langdetect-text-cld2, and registry-server.

Most aliases (fs, os, tty, util, stream, worker_threads, path, module,
buffer, crypto, http, https, readline) were never imported in any Bare
context. registry-server is a Node.js server (engines: node >=20) so
bare shims in dependencies were incorrect entirely.

tts-onnx and tts-ggml unit tests used require('process') via the alias;
updated to require('bare-process') directly, which is already an
explicit devDependency in both packages.

* chore[notask]: restore aliases needed at runtime in Bare

Restore "util": "npm:bare-utils" in tts-onnx, tts-ggml,
transcription-parakeet, and transcription-whispercpp — sinon requires
'util' internally and these packages run their unit tests under
brittle-bare.

Restore "module": "npm:bare-module" in langdetect-text-cld2 — index.js
uses `import { createRequire } from 'module'` which requires the alias
in Bare.

* chore[notask]: replace sinon with manual stubs and import bare-module directly

Remove sinon from tts-onnx, tts-ggml, transcription-parakeet, and
transcription-whispercpp — replace every sinon.stub / sinon.spy call
with direct property assignment (instance stubs) and a small
module-level spy helper that wraps the prototype method with a counter.
This eliminates the only remaining reason those four packages kept
"util": "npm:bare-utils" in devDependencies, which is now removed along
with sinon itself.

Change langdetect-text-cld2/index.js to import createRequire from
'bare-module' instead of 'module', and declare bare-module explicitly in
dependencies. This removes the last "module": "npm:bare-module" alias.

All unit tests pass across all five packages after the changes.

* chore[notask]: fix review findings from sinon removal

Remove dead _origValidateModelFiles captures that triggered no-unused-vars
lint errors, drop stale ?.restore?.() guard calls from integration tests
that became no-ops once sinon stubs were replaced with plain prototype
assignments, add bare-process to transcription-whispercpp devDependencies
explicitly, and tighten the reload-spy assertion to t.is for clarity.

* chore[notask]: address code review findings

langdetect-text-cld2: add engines restriction (bare>=1.0.0), bump to
0.4.0, add CHANGELOG entry, and update README to remove stale Node.js
compatibility claims. The import of createRequire from bare-module makes
this package explicitly Bare-only; prior docs implied Node.js support.

tts-onnx, tts-ggml: capture MockedBinding.prototype.runJob once at
module level so spyRunJob() always wraps the pristine original rather
than stacking on previous wrappers. Eliminates cross-test prototype
contamination when a test throws before restore() is called.
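
A sketch of the spy helper with the original captured once. `MockedBinding` here is a stand-in class and the helper's return shape is an assumption; the point is that every spy wraps the pristine method rather than a previous wrapper.

```typescript
class MockedBinding {
  runJob(input: string): string {
    return `ran:${input}`;
  }
}

// Captured once at module level, before any spy is installed.
const originalRunJob = MockedBinding.prototype.runJob;

function spyRunJob(): { callCount: () => number; restore: () => void } {
  let count = 0;
  MockedBinding.prototype.runJob = function (this: MockedBinding, input: string): string {
    count += 1;
    return originalRunJob.call(this, input); // always the pristine original
  };
  return {
    callCount: () => count,
    restore: () => {
      MockedBinding.prototype.runJob = originalRunJob;
    },
  };
}
```

If a test throws before calling `restore()`, the next `spyRunJob()` replaces the stale wrapper instead of stacking on top of it.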

---------

Co-authored-by: gianni-cor <gianfranco.cordella@tether.io>
Rename 23 repo skills under .cursor/skills/ to qv-* directories and
slash commands, update cross-references in rules and _lib scripts, add
qv-skill-list catalog, and align .agent orchestrate/release skill refs.
…therto#2138)

* chore: Add parakeet sortformer model to qvac registry

* remove deprecated parameter

---------

Co-authored-by: Ishan Vohra <ishanvohra@Ishans-MacBook-Air.local>
…herto#2141)

The release pipeline was missing the native-prebuild build step. Without
it, integration tests across all 5 platforms could not load
`libqvac__classification-ggml.{so,dylib,dll}` (ADDON_NOT_FOUND), and the
npm publish step would have shipped a tarball with no `prebuilds/`
directory.

This mirrors the working `on-merge-vla.yml` pattern (and matches what
`on-pr-classification-ggml.yml` already does for PR validation):

  * Add a `prebuild` job that reuses
    `.github/workflows/prebuilds-classification-ggml.yml`.
  * Make `run-integration-tests`, `mobile-integration-tests`,
    `publish-gpr`, and `publish-release-npm` depend on `prebuild`.
  * Gate `publish-gpr` and `publish-release-npm` on
    `needs.prebuild.result == 'success'` so a prebuild failure does not
    cascade into a broken npm publish.
  * Have `publish-gpr` and `publish-release-npm` download the merged
    `prebuilds` artifact produced by `reusable-prebuilds.yml` before
    invoking the publish action.

Repro of the original failure:
https://github.com/tetherto/qvac/actions/runs/26159115931 -- all five
`run-integration-tests` matrix legs and both mobile legs fail with
`ADDON_NOT_FOUND`; `publish-release-npm` then also fails (separately --
new package needs npm-scope grant; tracked outside this PR).
…odel registry (tetherto#2046)

* feat(sdk): expose diffusion_fa in sdcppConfigSchema

Adds diffusion_fa to sdcppConfigSchema so callers can explicitly
control per-transformer flash attention. The addon enables this by
default (required for FLUX.2 to avoid materialising the full Q·Kᵀ
attention matrix); the field is a no-op escape hatch for backends
that don't support ggml_flash_attn_ext.

The plugin's ...rest spread already forwards it to the native layer;
no plugin changes required.
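
As a rough illustration of why no plugin change is needed, the sketch below shows how a rest spread forwards fields the plugin does not handle itself. Only `diffusion_fa` comes from the commit; `threads`, `n_threads`, and `toNativeConfig` are hypothetical names chosen for the example:

```typescript
// Hypothetical config shape; only diffusion_fa is taken from the commit text.
interface SdcppConfig {
  threads?: number;
  diffusion_fa?: boolean;
  [key: string]: unknown;
}

// Fields the plugin handles itself are destructured out; everything else
// rides along in `rest` and reaches the native layer untouched.
function toNativeConfig(config: SdcppConfig): Record<string, unknown> {
  const { threads, ...rest } = config;
  return { n_threads: threads ?? 4, ...rest };
}
```

An explicit `diffusion_fa: false` survives the spread unchanged, which is the opt-out case the later E2E test exercises.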

* fix(sdk): remove flux_flow from prediction enum

flux_flow (FLUX.1) was never a supported model family — only flux2_flow
(FLUX.2) is. Remove the stale enum value so the SDK schema matches the
diffusion addon surface.

* fix(sdk): simplify diffusion_fa description and add unit test coverage

Shorten the describe() string to match the terse style of adjacent
boolean fields. Add diffusion_fa to the "accepts valid full config"
fixture in sdcpp-plugin.test.ts so the field has schema-parse coverage.

* test(sdk): add diffusion_fa E2E test to tests-qvac

Adds a dedicated 'diffusion-fa' resource in the desktop consumer loaded
with diffusion_fa: true, a matching executor method that calls
ensureLoaded('diffusion-fa'), and a test definition
'diffusion-fa-accepted' that generates a 256x256 image through the full
SDK -> plugin -> addon path, confirming the field is accepted and
forwarded without breaking inference.

* test(sdk): remove misleading comment from diffusion-fa resource

* test(sdk): add rejection tests for flux_flow and diffusion_fa type; fix E2E test name and remove redundant preload

Add two missing schema rejection tests: non-boolean diffusion_fa and the
removed flux_flow prediction value. Rename diffusion-fa-accepted to
diffusion-fa-loads-and-runs to match what the test actually verifies
(load + generate, not FA effect). Remove preLoadUnload from the
diffusion-fa resource: it reuses the same Flux2 model files as the
diffusion resource, so the extra load+unload at bootstrap is redundant
cost.

* feat[mod](sdk): add Gemma4-E2B/E4B/31B, Qwen3.5-0.8B/2B/4B/9B, Qwen3.6-27B/35B-A3B to SDK registry

* fix[notask]: bump @qvac/diffusion-cpp to ^0.8.0

* test(sdk): prove diffusion_fa:false override path end-to-end

Unit test verifies sdcppConfigSchema preserves false through parsing (not
just rejects non-booleans). E2E adds diffusion-fa-disabled resource with
diffusion_fa:false and a matching test so the full SDK→plugin→addon path
is exercised for the opt-out case, not just the addon default.

* fix(sdk): replace hardcoded HF URLs with registry constants; add qwen35/gemma4 dialect E2E tests

Examples llamacpp-tools-qwen35 and llamacpp-tools-gemma4 were using raw
HuggingFace URLs as fallback defaults because the registry had not yet been
seeded with Qwen3.5 and Gemma4 models. Now that those constants exist
(QWEN3_5_0_8B_MULTIMODAL_Q8_0, GEMMA4_2B_MULTIMODAL_Q4_K_M), use them
directly, matching the pattern of all other SDK examples.

Adds tools-qwen35 and tools-gemma4 resources to the desktop consumer and
two dialect-specific E2E tests (tools-simple-function-qwen35,
tools-simple-function-gemma4). PR tetherto#1974 wired toolDialect and resourceKey
through ToolsExecutor and createToolsTest specifically to enable these tests
once constants were available.

---------

Co-authored-by: gianni-cor <gianfranco.cordella@tether.io>
…-parakeet (tetherto#2142)

* Add android dynamic backend loading

* Disable failing GPU

* Bypass gpu smoke test for android

* Fix cpp lint

* Address review comments
…posite action (NMT/OCR/LLM) (tetherto#1913)

* QVAC-18168: refactor NMT/OCR/LLM mobile integration tests onto shared composites

Refactors the mobile integration-test workflows for translation-nmtcpp,
ocr-onnx, and llm-llamacpp onto a new set of composite actions under
.github/actions/run-mobile-integration-tests/ (setup, build-mobile-app,
upload-to-devicefarm, schedule-test-run, monitor-test-run,
collect-and-upload-logs, extract-addon-perf, combine-perf-reports,
comment-on-pr).

Supporting changes:
- scripts/perf-report/extract-from-log.js: strip RUN_NAME prefixes so
  combined perf reports show clean device names across single, sharded
  (Perf/Regular), and dual-flagship (iPhone17/Samsung) configurations.
- packages/translation-nmtcpp/scripts/provision-mobile-models.sh: new
  provisioning script for NMT mobile tests (Bergamot + IndicTrans);
  writes indictrans-model-urls.json to match the addon's runtime loader.
- packages/llm-llamacpp/test/mobile/test-groups.json and
  packages/ocr-onnx/test/mobile/test-groups.json: refactor test groups
  for sharded scheduling under the new composite-driven runner.
- on-pr-llm-llamacpp.yml, on-pr-translation-nmtcpp.yml,
  on-pr-ocr-onnx.yml: minor fallback hardening so workflow_dispatch
  inputs work when pull_request context is unavailable.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(mobile-llm): move Extend Android Timeouts step after build-mobile-app

The Extend Android mobile test timeouts step patches the generated
e2e/tests/app.test.js. That file is generated by build-mobile-app
via 'npm run build', so the step has to run AFTER build, not before.
Previously it ran between setup and build, hit ENOENT on readFileSync
and exited 1.

Also adds an existsSync guard so the failure message points at the
real cause if it ever moves again.

Co-authored-by: Cursor <cursoragent@cursor.com>

* refactor(mobile-composites): move LLM-only perf-extract + Android timeout into shared composites

Adds two optional inputs so any addon can opt in with one flag:

- build-mobile-app: android-per-test-timeout-minutes. Patches the
  generated e2e/tests/app.test.js (timeout: 1200000 + the matching
  '20 minutes' string) to the requested ceiling on Android.
  LLM passes '30'; NMT/OCR omit it (no-op).

- extract-addon-perf: merge + unzip-customer-artifacts. merge=true
  threads --merge through to extract-from-log.js so multi-group
  per-device payloads union (needed for VLM image-*.test.js).
  unzip-customer-artifacts=true expands Customer_Artifacts.zip in
  the log dir before scanning so markers buried inside
  bare_console.log become visible.

Defaults preserve previous behaviour: NMT/OCR pass neither input,
so the new build step is skipped and extract-from-log.js is invoked
with identical args to before (just an EXTRA_ARGS variable rename).
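
The timeout patch described above can be sketched as a pure string rewrite. The matched text (`timeout: 1200000` and `20 minutes`) is taken from the commit message; the function name and signature are hypothetical, and the real step additionally reads and writes the generated file:

```typescript
// Rewrites the default 20-minute per-test ceiling in generated test source.
// The function name and signature are illustrative assumptions.
function patchPerTestTimeout(source: string, minutes: number): string {
  if (Math.floor(minutes) !== minutes || minutes <= 0) {
    throw new Error(`invalid timeout: ${minutes}`);
  }
  const millis = minutes * 60 * 1000;
  return source
    .replace(/timeout:\s*1200000/g, `timeout: ${millis}`)
    .replace(/20 minutes/g, `${minutes} minutes`);
}
```

With `minutes = 30` (the value LLM passes), `timeout: 1200000` becomes `timeout: 1800000`. Source that contains neither pattern is returned unchanged.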

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(mobile-upload-to-devicefarm): write temp files under $RUNNER_TEMP, not /tmp

Persistent self-hosted runners (qvac-ubuntu*-x64) keep /tmp across jobs,
so the fixed-name files we were writing —

    /tmp/wdio-after-snippet.js
    /tmp/extra-pre-test.sh
    /tmp/extra-post-test.sh
    /tmp/wdio.config.devicefarm.js

— could already exist from a prior job under a different uid, and a
follow-up job's `>` redirection then failed with `Permission denied`.
Caught on NMT Android in https://github.com/tetherto/qvac/actions/runs/25859918425/job/75996556235
("Upload to Device Farm" step). OCR's Android leg only happened to
pass because it landed on a runner with a clean /tmp.

Fix: write to $RUNNER_TEMP (GHA-default, per-job, current-user-owned,
auto-cleaned) and export the four paths so the python heredoc, base64
encoder, and generate-testspec.sh all read the same files. The script
keeps a /tmp/* fallback for direct local-dev invocations.

Behaviour-compatible with NMT/OCR/LLM (all three pass empty values for
the after-snippet / extra-pre/post-test inputs today; only the directory
moved).
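
The actual fix lives in the composite's shell steps. The TypeScript sketch below only illustrates the path-resolution rule (prefer `$RUNNER_TEMP`, fall back to `/tmp` for local runs); `resolveTempPaths` and the `Env` type are hypothetical names:

```typescript
type Env = Record<string, string | undefined>;

// The four fixed-name files listed in the commit message.
const TEMP_FILE_NAMES = [
  "wdio-after-snippet.js",
  "extra-pre-test.sh",
  "extra-post-test.sh",
  "wdio.config.devicefarm.js",
];

// Prefer the per-job RUNNER_TEMP directory; fall back to /tmp when the
// variable is unset or empty (for example, a direct local invocation).
function resolveTempPaths(env: Env): string[] {
  const runnerTemp = env["RUNNER_TEMP"];
  const dir = runnerTemp !== undefined && runnerTemp !== "" ? runnerTemp : "/tmp";
  const base = dir.replace(/\/+$/, "");
  return TEMP_FILE_NAMES.map((name) => `${base}/${name}`);
}
```

Resolving all four paths from one function mirrors the point of exporting them once: every later consumer reads the same location.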

Co-authored-by: Cursor <cursoragent@cursor.com>

* feat[api]: surface model name + GPU in LLM perf report (QVAC-17830 follow-up)

Olya pointed out (Slack, 14 May) that the LLM benchmark report didn't
tell reviewers which model produced each row, nor which GPU the run
used. The GPU was already collected by `_detectGpu()` on every reporter
boot — we just weren't rendering it in the per-job Step Summary. Model
was genuinely missing everywhere.

What changed
- `scripts/test-utils/performance-reporter.js`: accept `extra.model` on
  `record()` and stash it on the entry alongside `scenario` and
  `execution_provider`. Renderer-side fallback is `-` so reports from
  call sites that don't set it stay unchanged.
- `scripts/perf-report/render-step-summary.js`: subtitle now includes
  `| GPU: <name>` when `device.gpu` is populated. Detail table grows a
  `Model` column only when at least one row in the report carries a
  model id, so non-LLM addons render identically to before.
- `scripts/perf-report/utils.js`: aggregator now mirrors `model` onto
  the categorical map for `(device, test)`. Both the Markdown and HTML
  per-device detail tables grow a `Model` column conditionally, scoped
  per-scenario so a scenario with no model ids skips the column
  entirely.
- `packages/llm-llamacpp/test/integration/_perf-helper.js`: forward
  `extra.model` from `recordPerformance()` into the reporter, and
  include it in the mobile inline fallback's entry shape +
  `writeToConsole` delta slice so iOS/Android Device Farm logs reflect
  the model on every chunk.
- `packages/llm-llamacpp/test/integration/_image-common.js`:
  `setupMultimodalInference()` now returns `{ inference, modelName }`
  (`.gguf` stripped). `runImageRecognitionTest()` threads it into the
  perf record.
- `packages/llm-llamacpp/test/integration/bitnet.test.js`: pass
  `BITNET_MODEL.name` (without `.gguf`).
- `packages/llm-llamacpp/test/integration/tool-calling.test.js`: pass
  `modelVariant.modelName` (without `.gguf`) on both the batch and
  followup rows.
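
A simplified sketch of the conditional `Model` column, assuming a flat row shape. `PerfRow`, `tokensPerSec`, and `renderDetailTable` are illustrative names, not the actual renderer API:

```typescript
// Hypothetical row shape; only the optional `model` field mirrors the commit.
interface PerfRow {
  test: string;
  tokensPerSec: number;
  model?: string;
}

// The Model column appears only when at least one row carries a model id;
// rows without one fall back to "-".
function renderDetailTable(rows: PerfRow[]): string {
  const showModel = rows.some((r) => r.model !== undefined && r.model !== "");
  const header = showModel ? ["Model", "Test", "Tokens/s"] : ["Test", "Tokens/s"];
  const lines = [
    `| ${header.join(" | ")} |`,
    `| ${header.map(() => "---").join(" | ")} |`,
  ];
  for (const r of rows) {
    const cells = [r.test, String(r.tokensPerSec)];
    if (showModel) cells.unshift(r.model || "-");
    lines.push(`| ${cells.join(" | ")} |`);
  }
  return lines.join("\n");
}
```

Reports with no model ids render exactly as they would without the feature, which is the property the commit relies on for non-LLM addons.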

Why this is safe
- Renderer changes are conditional — non-LLM addons and legacy reports
  produce byte-identical output to `main` (verified with synthetic
  fixtures for both vision + tts addon types).
- `extract-from-log.js` already preserves unknown row fields verbatim
  via `base.results.push(row)`, so mobile delta reconstruction picks
  up the new `model` field with no code change.
- `_VISION_DETAIL_COLUMNS` is untouched — Model is inserted as a
  pre-column header, not in the metric column list, so the existing
  per-stage timer plumbing (Vision Enc / Img Prefill follow-up in
  QVAC-18103) stays orthogonal.

Lint findings on `utils.js` / `render-step-summary.js` (camelcase,
brace-style, multiline-ternary) already exist on `main` and are not
introduced by this change; confirmed by re-linting the `origin/main`
files before and after.

Co-authored-by: Cursor <cursoragent@cursor.com>

* feat(mobile-monitor): port Device Farm queue/run visibility + per-job deep dive into shared composite (QVAC-18943)

PR tetherto#2072 added per-slot queue vs run timing, a periodic per-job deep
dive, a legend, and a polished first-poll attach notice — but scoped
the change to the OLD monolithic integration-mobile-test-llm-llamacpp.yml
because that workflow is what was live on main. Its own description
called the broader rollout out by name:

  "Scope still limited to integration-mobile-test-llm-llamacpp.yml;
   broader rollout under QVAC-18168."

This commit IS that broader rollout. Drops the same logic into the
shared monitor-test-run composite so every addon (NMT, OCR, LLM, and
the 7 mobile workflows still to migrate) gets the same diagnostics
the moment they pick up the composite:

- Legend group printed once at startup explaining every status,
  result, counter abbreviation, deep-dive field and transition
  notice. Wrapped in ::group:: so it collapses by default.

- Per-slot transition tracking via PREV_STATUS[] / RUNNING_AT[]
  bash arrays (cleaner than the eval-based STATUS_$i style we'd
  have inherited verbatim from PR tetherto#2072). Two ::notice:: lines
  surface at the top of the job UI:
    * <state> -> RUNNING after Qs   slot acquired a device
    * queued Qs | executed Xs | total Ts   slot finished
  Every other transition (PENDING_CONCURRENCY, SCHEDULING ->
  PREPARING, ...) goes to a plain echo to keep notice density low.

- Polished first-observation: emits
    🔌 Monitor attached: Run N starting from state=<cur> (t=<s>s)
  instead of the misleading "<empty> -> RUNNING after 0s" arrow.

- Periodic per-job deep dive every 180s, wrapped in ::group:: so it
  collapses. For every slot in run-level RUNNING, calls
  `aws devicefarm list-jobs` and dumps each child Job's
  device/state/result/counters/started/message. This is what
  exposes device-pool starvation (PENDING_DEVICE), app-install
  hangs (PROCESSING), and per-test progress (counters=PFES of T)
  live, without waiting for the 2h CI cap.
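
The composite implements transition tracking with bash arrays. The TypeScript sketch below models the same bookkeeping for one slot, under the assumption that timestamps are seconds since the monitor started; `SlotState` and `observe` are hypothetical names:

```typescript
// Hypothetical per-slot state, playing the role of PREV_STATUS[] / RUNNING_AT[].
interface SlotState {
  prevStatus?: string;
  firstSeenAt: number;
  runningAt?: number;
}

// Returns a notice line for the transitions worth surfacing, or null for
// everything else (which would go to a plain echo).
function observe(slot: SlotState, status: string, now: number): string | null {
  const prev = slot.prevStatus;
  if (prev === status) return null;
  slot.prevStatus = status;
  if (status === "RUNNING" && slot.runningAt === undefined) slot.runningAt = now;
  if (prev === undefined) {
    // First observation: report attachment instead of a fake transition.
    return `Monitor attached: starting from state=${status} (t=${now}s)`;
  }
  if (status === "RUNNING") {
    return `${prev} -> RUNNING after ${now - slot.firstSeenAt}s`;
  }
  if (status === "COMPLETED") {
    const startedAt = slot.runningAt !== undefined ? slot.runningAt : now;
    const queued = startedAt - slot.firstSeenAt;
    const executed = now - startedAt;
    return `queued ${queued}s | executed ${executed}s | total ${queued + executed}s`;
  }
  return null;
}
```

Keeping `runningAt` separate from `firstSeenAt` is what lets the final notice split queue time from execution time.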

Operational cost: ~0.06 RPS extra against the Device Farm API in the
worst case (all 12 slots in RUNNING simultaneously), well under the
account quota. Polling is fully out-of-band relative to the phones
running the tests, so test execution time on-device is unaffected.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(mobile-setup): respect PR tetherto#2021 — skip github-hosted-only setup on self-hosted runners

PR tetherto#2021 (infra: align CI workflows with self-hosted runner strategy)
moved every mobile-test Android job onto qvac-ubuntu*-x64 self-hosted
runners. Those runners ship pinned Node + bare runtime + Expo CLI
pre-installed on PATH and don't have /usr/share/dotnet,
/opt/hostedtoolcache, /usr/local/lib/android/sdk/ndk, etc.

That PR gated three steps in the monolithic LLM mobile workflow:
  - "Free up disk space" — removed entirely (no-op on self-hosted)
  - "Setup Node.js" — gated to iOS only (skipped on Android self-hosted)
  - "Install global dependencies" (@expo/cli) — gated to iOS only

When QVAC-18168 extracted the setup phase into the shared composite,
those gates were lost. The composite was running setup-node + npm
install -g on every job, which on qvac-ubuntu*-x64:
  - clobbers the pre-installed Node PATH (breaks bare runtime resolution)
  - writes to a global node_modules dir the runner user may not own
  - re-downloads ~1 GB of Expo CLI we don't need on Android (Android
    uses Gradle, not Expo)

Reinstate the PR tetherto#2021 gates inside the composite, but use the
generic `runner.environment == 'github-hosted'` check rather than
`matrix.platform == 'iOS'`. That keeps the behaviour identical for
today's matrix (iOS = github-hosted macOS, Android = self-hosted
ubuntu) while staying correct if a future addon adds a github-hosted
Android matrix entry or moves iOS to a self-hosted Mac.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(mobile-build): preemptively gate iOS sudo + global-install ops for self-hosted Mac (PR tetherto#2021 policy)

Same self-hosted-runner policy as the previous setup/action.yml gate,
but applied to the four iOS-side privileged ops in build-mobile-app:

  - `brew install ninja`               → gate to github-hosted only
  - `sudo xcode-select -s /Applications/Xcode_*.app` (×3 fallbacks)
                                       → gate to github-hosted only
  - `sudo gem install cocoapods`       → gate to github-hosted only

iOS lives on macos-14 (github-hosted) today so these still run as
before. The reviewer flagged that the self-hosted-runner migration
will reach Mac next ("soon for mac and ios as well; end goal is to
have no runners running as root / admin"). With these gates the
composite is already correct for that day — `runner.environment !=
'github-hosted'` skips every sudo/brew/gem call and trusts the
pre-installed pinned toolchain.

Each gated step now has a self-hosted twin that does a loud Verify*
on PATH (ninja, xcodebuild + iOS SDK, pod). A misconfigured future
Mac runner therefore fails at the verify step with a precise error
instead of silently breaking the build halfway through.

No behaviour change on today's github-hosted macos-14 — same steps
run, same sudo calls happen there.

Co-authored-by: Cursor <cursoragent@cursor.com>

* chore(mobile-composites): drop redundant PR tetherto#2021 policy explainer comments

Step names already self-document the gate (e.g. "Install Ninja
build tool (iOS, github-hosted only)" / "Verify Ninja on PATH (iOS,
self-hosted)") and the `if:` shows the exact predicate. The
preceding 5–6-line policy paragraphs added no information that wasn't
already conveyed by those two lines.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(mobile-setup): remove workspace cleanup from composite — breaks checkout-first consumer pattern

PR tetherto#2107's 'rm -rf $GITHUB_WORKSPACE' step works in monoliths
because it runs as the very first step, before any checkout. In
our composite architecture the consumers check out the composite
actions + addon BEFORE calling the setup composite, so nuking
$GITHUB_WORKSPACE inside the composite deletes the composite
YAMLs + addon source that are already checked out, causing:

  Can't find 'action.yml' under .github/actions/run-mobile-integration-tests/...

The consumer's own actions/checkout (with clean: true default)
already gives us a fresh workspace. Removing the step.

Co-authored-by: Cursor <cursoragent@cursor.com>

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
…etherto#2137)

* feat[api]: add Sortformer v2.1 + AOSC streaming diarization support

Bumps @qvac/transcription-parakeet 0.4.0 -> 0.5.0 (MINOR -- additive
API only; no breaking changes).

## 🎯 What problem does this PR solve?

- v1 Sortformer streaming uses a fixed-size sliding-history window; once
  a speaker goes silent long enough to roll out of the window, their
  slot identity drifts onto a different physical voice when they return.
- Continuous single-speaker stretches collapse all voices onto
  `sortformer_0` once two speakers have been seen, breaking live
  speaker-tagged transcripts.
- v2.1 + AOSC (Audio-Online Speaker Cache, NeMo-ported) fixes this in
  parakeet-cpp, but until now there was no way to consume it from the
  JS layer.

## 📝 How does it solve it?

- Bump `parakeet-cpp` to `version>= 2026-05-20` (the qvac-registry-vcpkg
  bump in PR tetherto#156 pulls in PRs tetherto#22 / tetherto#24 of qvac-ext-lib-whisper.cpp).
- Plumb 6 AOSC knobs (`streamingSpkCacheEnable`, `streamingSpkCacheLen`,
  `streamingFifoLen`, `streamingChunkLeftContextMs`,
  `streamingChunkRightContextMs`, `streamingSpkCacheUpdatePeriod`) from
  JS through `ParakeetConfig` -> `ParakeetModel` /
  `ParakeetStreamingProcessor` -> `parakeet::SortformerStreamingOptions`,
  for both the in-process Mode-3 streaming path and the duplex
  `runStreaming()` processor.
- v2.1 is auto-detected by the engine via the GGUF metadata tag
  `parakeet.model_variant`; AOSC defaults mirror parakeet-cpp's
  NeMo-port tuning (188 / 188 / 80 / 560 / 144, enabled).
- Defaults: v2.1 becomes the streaming Sortformer; v1 stays the offline
  default. Both GGUFs remain registered.
- New `examples/live-mic-diarized-aosc.js` exposes every AOSC knob as a
  CLI flag for A/B comparison against the v1 sliding-window path.

## 🧪 How was it tested?

- Built locally against a vcpkg overlay pointing at the PR tetherto#156 branch;
  addon compiled cleanly with all 6 new AOSC field references through
  `ParakeetStreamingProcessor.cpp`, `ParakeetModel.cpp`, `AddonJs.hpp`,
  and `JSAdapter.cpp`.
- Full integration suite: **37/37 tests pass, 72/72 assertions in 145s**
  (macOS arm64, all q8_0 GGUFs staged including v2.1 Sortformer).
- New `test/integration/sortformer-aosc-streaming.test.js` covers
  default-AOSC streaming + `streamingSpkCacheEnable=false` fallback to
  the v1 sliding-window code path. Confirmed via engine logs that the
  override actually disables the cache (`Sortformer AOSC enabled` line
  only prints when AOSC is active).
- v1 Sortformer desktop integration + GPU smoke tests still pass -- no
  regression to the existing diarization path.

## 🔌 API Changes

New optional fields on `ParakeetConfig`, mirrored as per-call overrides
on `StreamingRunConfig`. All default to parakeet-cpp's NeMo-port
tuning; specifying them is opt-in. Ignored on v1 / v2 Sortformer and on
non-Sortformer engines (no-op forwarding is safe).

```typescript
import { TranscriptionParakeet } from "@qvac/transcription-parakeet";

const model = new TranscriptionParakeet({
  files: { model: "diar_streaming_sortformer_4spk-v2.1.q8_0.gguf" },
  config: {
    parakeetConfig: {
      streaming: true,
      streamingChunkMs: 2000,
      // AOSC (v2.1+ only; auto-detected via GGUF metadata)
      streamingSpkCacheEnable: true,         // default
      streamingSpkCacheLen: 188,             // long-term cache rows
      streamingFifoLen: 188,                 // warmup FIFO rows
      streamingChunkLeftContextMs: 80,       // ~1 encoder frame
      streamingChunkRightContextMs: 560,     // ~7 encoder frames
      streamingSpkCacheUpdatePeriod: 144,    // FIFO-overflow pop count
    },
  },
});
```
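
A sketch of how the three layers could combine, assuming per-call overrides take precedence over the constructor config, which in turn takes precedence over the defaults. The default values come from the example above; `AoscOptions`, `AOSC_DEFAULTS`, and `resolveAosc` are hypothetical names, and the real defaulting happens inside the engine:

```typescript
interface AoscOptions {
  streamingSpkCacheEnable: boolean;
  streamingSpkCacheLen: number;
  streamingFifoLen: number;
  streamingChunkLeftContextMs: number;
  streamingChunkRightContextMs: number;
  streamingSpkCacheUpdatePeriod: number;
}

// Default values as listed in the commit (188 / 188 / 80 / 560 / 144, enabled).
const AOSC_DEFAULTS: AoscOptions = {
  streamingSpkCacheEnable: true,
  streamingSpkCacheLen: 188,
  streamingFifoLen: 188,
  streamingChunkLeftContextMs: 80,
  streamingChunkRightContextMs: 560,
  streamingSpkCacheUpdatePeriod: 144,
};

// Assumed precedence: per-call override, then constructor config, then default.
// `??` keeps an explicit `false` instead of replacing it with the default.
function resolveAosc(config: Partial<AoscOptions> = {}, perCall: Partial<AoscOptions> = {}): AoscOptions {
  const d = AOSC_DEFAULTS;
  return {
    streamingSpkCacheEnable: perCall.streamingSpkCacheEnable ?? config.streamingSpkCacheEnable ?? d.streamingSpkCacheEnable,
    streamingSpkCacheLen: perCall.streamingSpkCacheLen ?? config.streamingSpkCacheLen ?? d.streamingSpkCacheLen,
    streamingFifoLen: perCall.streamingFifoLen ?? config.streamingFifoLen ?? d.streamingFifoLen,
    streamingChunkLeftContextMs: perCall.streamingChunkLeftContextMs ?? config.streamingChunkLeftContextMs ?? d.streamingChunkLeftContextMs,
    streamingChunkRightContextMs: perCall.streamingChunkRightContextMs ?? config.streamingChunkRightContextMs ?? d.streamingChunkRightContextMs,
    streamingSpkCacheUpdatePeriod: perCall.streamingSpkCacheUpdatePeriod ?? config.streamingSpkCacheUpdatePeriod ?? d.streamingSpkCacheUpdatePeriod,
  };
}
```

Passing `{ streamingSpkCacheEnable: false }` as the per-call layer corresponds to the fallback case covered by the new integration test.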

## Depends on

- qvac-registry-vcpkg tetherto#156 (parakeet-cpp 2026-05-20 bump). CI will not
  resolve the new `version>=` constraint until that PR merges.
- Separate registry-server PR for the v2.1 GGUF entry in
  `models.prod.json` (out of scope for this PR -- handled independently).
- Upload of `diar_streaming_sortformer_4spk-v2.1.q8_0.gguf` to S3 (the
  GGUF the new test resolves via `MODEL_CONFIGS.sortformerStreaming`).

## Follow-up (separate PR, not in scope here)

SDK adoption (`@qvac/sdk` schema + plugin + example) lands in a
separate PR after this addon is published and the v2.1 GGUF entry has
synced into `sdk/models/registry/models.ts`. The SDK needs both pieces
in place before its schema can meaningfully forward AOSC knobs.

* chore[notask]: address review — setup-models v2.1 + CHANGELOG [Unreleased]

Two reviewer follow-ups on the v2.1 + AOSC PR:

1. `npm run setup-models` now fetches + converts v2.1 sortformer.
   - download-models.sh: new `sortformer-streaming-v2.1` type pulling
     from
     https://huggingface.co/nvidia/diar_streaming_sortformer_4spk-v2.1/resolve/main/diar_streaming_sortformer_4spk-v2.1.nemo
   - convert-nemo.sh: matching type maps .nemo ->
     `diar_streaming_sortformer_4spk-v2.1.${q}.gguf`.
   - `--type all` (default) now includes the new type, so
     `npm run setup-models` stages v2.1 alongside the other models.
   - convert-nemo-to-gguf.py: surgically picked up PR tetherto#24's variant
     emission (the `detect_sortformer_variant(ckpt)` helper +
     `writer.add_string("parakeet.model_variant", ...)` call) without
     touching local qvac divergences (vendored attribution header,
     descriptive docstrings, `--quant f16` default, and the
     huggingface_hub import-error helper). The C++ engine's strict
     v2.1 detection now matches on `parakeet.model_variant ==
     "sortformer-streaming-v2.1-aosc"` instead of falling back to
     the encoder-shape heuristic.
   - Verified end-to-end locally: `bash scripts/convert-nemo.sh
     --type sortformer-streaming-v2.1 --quant q8_0 --force` produces
     models/diar_streaming_sortformer_4spk-v2.1.q8_0.gguf and the
     resulting GGUF carries `parakeet.model_variant =
     "sortformer-streaming-v2.1-aosc"` (confirmed via gguf reader).

2. CHANGELOG entry moved under `## [Unreleased]`; version bumps in
   package.json + vcpkg.json reverted to 0.4.0. The release PR will
   promote `[Unreleased]` -> `[0.5.0]` and bump the versions then.

* fix[notask]: pin parakeet-cpp to 2026-05-20#1 to avoid orphan tree

The registry's parakeet-cpp.json lists both 2026-05-20#0 and
2026-05-20#1 (PR tetherto#156 introduced both port-versions in its two
commits before squash-merging). vcpkg's minimum-version-selection
picks #0 when the manifest says `version>=: 2026-05-20`, but the
#0 git-tree is orphaned by the squash merge -- unreachable from
main, so `git fetch HEAD` doesn't pull it in. CI fails with:

  fatal: failed to unpack tree object 91a6fc169003b70dcc66b82ca8d1d23445343127
  note: while loading parakeet-cpp@2026-05-20

Pinning `version>=: 2026-05-20#1` skips the orphan and resolves
to the actual port content on main (tree 69619b43...). Matches
the existing `qvac-lint-cpp >= 1.4.4#3` precedent in the same
file.

Local clean build (no overlay, no cached registry) succeeds.
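
The selection behaviour described above can be illustrated with a deliberately simplified model: date-style versions with an optional `#port-version` suffix, and a resolver that picks the lowest listed entry satisfying the `version>=` floor. This is a sketch of the idea only, not vcpkg's actual resolution algorithm, and all names are hypothetical:

```typescript
interface PortVersion {
  version: string;
  portVersion: number;
}

// "2026-05-20#1" -> { version: "2026-05-20", portVersion: 1 }; no suffix means #0.
function parseSpec(spec: string): PortVersion {
  const parts = spec.split("#");
  const port = parts[1];
  return { version: parts[0] ?? "", portVersion: port === undefined ? 0 : Number(port) };
}

// ISO dates order correctly as plain strings; ties fall back to port-version.
function compareSpec(a: PortVersion, b: PortVersion): number {
  if (a.version !== b.version) return a.version < b.version ? -1 : 1;
  return a.portVersion - b.portVersion;
}

// Pick the minimum listed version that satisfies the floor.
function selectMinimum(available: string[], floor: string): string | undefined {
  const min = parseSpec(floor);
  const candidates = available.filter((s) => compareSpec(parseSpec(s), min) >= 0);
  candidates.sort((x, y) => compareSpec(parseSpec(x), parseSpec(y)));
  return candidates[0];
}
```

Under this model a floor of `2026-05-20` admits `#0` and therefore selects it, while a floor of `2026-05-20#1` excludes it.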

* cpp lint format

* Bump version

---------

Co-authored-by: Pratik Narola <pratiknarola@Mac.bbrouter>
Co-authored-by: GustavoA1604 <54457676+GustavoA1604@users.noreply.github.com>
Co-authored-by: GustavoA1604 <gustavogefa@hotmail.com>
* fix: drop SDK peerDependencies; enforce in CI

Completes the peer-deps cleanup trajectory started in tetherto#2089 by removing
the peerDependencies + peerDependenciesMeta blocks from @qvac/sdk
entirely and adding a CI gate that asserts the invariant holds on
every SDK pod PR.

Policy:
- dependencies: every package directly imported by SDK source.
- devDependencies: build / typecheck / lint-only modules.
- No peer declarations -- host owns anything SDK doesn't import.

CI gate (.github/workflows/pr-checks-sdk-pod.yml, sdk-only):
- Packs the tarball and installs it into a fresh consumer dir.
- Fails on any ERESOLVE / `npm warn peer` line on install.
- Fails if any of corestore / hyperswarm / hyperdrive / hyperdb /
  hyperblobs / hyperdht resolves to more than one copy.
- Smoke-imports @qvac/sdk and asserts >= 50 named exports.

* fix: split SDK conditional modules into optionalDependencies

Promote the 7 packages that are runtime-conditional (per-platform or
per-feature) from devDependencies into optionalDependencies:

- @modelcontextprotocol/sdk    (MCP, only if hosting an MCP server)
- bare-link                    (Bare-only linker shim)
- compact-encoding             (pin to ^3 to dedupe Holepunch v3 tree)
- expo-device                  (Expo runtime only)
- expo-file-system             (Expo runtime only)
- pear-pipe                    (Pear runtime only)
- react-native-bare-kit        (RN/Expo runtime only)

Why not regular dependencies:
- Backend consumers (npm install --omit=optional) get a 182-package
  tree instead of 790; mobile/Pear consumers get the plug-n-play
  default with all 7 auto-installed.

Why not peerDependencies:
- npm 7+ auto-installs peers and emits ERESOLVE on range drift, which
  is the exact failure mode this PR is fixing for Keet.

Validated:
- Keet-shape repro (cross-worker-bare-kit@^2 + @qvac/sdk): 0 ERESOLVE.
- Default install: 0 peer warnings, 7/7 optionals present.
- --omit=optional: 0 peer warnings, lean tree, Holepunch invariant
  still holds (single copy of corestore / hyperswarm / hyperdrive /
  hyperdb / hyperblobs / hyperdht / react-native-bare-kit).

* test[ci]: add --omit=optional install gate to SDK consumer check

The default-install gate validates the plug-n-play path (all optionals
present); it does not exercise the lean backend path that consumers
get with `npm install --omit=optional`. That path was implicitly trusted
when we adopted optionalDependencies in the previous commit.

Refactor the inline gate into a check_consumer() helper and call it
twice:

1. Default install (plug-n-play): all 7 optionalDeps installed. Catches
   Keet-style ERESOLVE from optional deps' peer ranges colliding with
   other deps. Validates mobile/Pear consumer profile.

2. Lean install (--omit=optional): no optionalDeps. Catches
   (a) backend-required packages accidentally classified as optional,
   (b) SDK entry-point eagerly importing an optional module.

Both scenarios run the same three assertions:
- No ERESOLVE / `npm warn peer` lines.
- Single copy of corestore / hyperswarm / hyperdrive / hyperdb /
  hyperblobs / hyperdht.
- Smoke import yields >= 50 named exports.

Validated locally: lean install yields 385 named exports (well above
threshold), Holepunch invariant holds, 0 peer warnings. Adds ~60s of
CI per SDK pod PR.
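
The three assertions can be sketched as a single function over facts collected from a consumer install. The real gate is a shell helper in the workflow; `ConsumerInstall`, its fields, and `checkConsumer` are hypothetical names for illustration:

```typescript
// Hypothetical summary of one consumer install.
interface ConsumerInstall {
  installLog: string;
  installedPaths: string[]; // e.g. "node_modules/corestore"
  namedExportCount: number;
}

const SINGLE_COPY = ["corestore", "hyperswarm", "hyperdrive", "hyperdb", "hyperblobs", "hyperdht"];

// Returns a list of failures; an empty list means the gate passes.
function checkConsumer(install: ConsumerInstall): string[] {
  const failures: string[] = [];
  if (/ERESOLVE|npm warn peer/i.test(install.installLog)) {
    failures.push("peer resolution warning or error in install log");
  }
  for (const pkg of SINGLE_COPY) {
    const pattern = new RegExp("(^|/)node_modules/" + pkg + "$");
    const copies = install.installedPaths.filter((p) => pattern.test(p)).length;
    if (copies > 1) failures.push(`${pkg} resolved to ${copies} copies`);
  }
  if (install.namedExportCount < 50) {
    failures.push(`only ${install.namedExportCount} named exports`);
  }
  return failures;
}
```

Running the same function against both the default and the `--omit=optional` install is what keeps the two scenarios held to identical assertions.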

* update dev deps

* fix: address review — restore expo-build-properties; revert bare-subprocess major bump

- Add expo-build-properties to optionalDependencies. withQvacSDK wires it
  by string into the Expo plugin chain, so dropping it entirely broke
  plug-n-play for Expo consumers (the consumer-install gate runs in a
  Node consumer dir, so it didn't catch the missing Expo plugin module).
  Keeping it optional preserves the previous behavior while letting
  Node-only backends skip it via --omit=optional.

- Revert bare-subprocess to ^5.2.3 (was bumped to ^6.0.0 by bun add).
  v5→v6 is a major bump for what is only a dev-time shim consumed by
  scripts/bare-bootstrap.js; staying on the prior major avoids dragging
  drift into the dev tree and keeps NOTICE accurate.

Validated:
- bun install + bun run build clean.
- Consumer install gate (default + --omit=optional): both green,
  Holepunch invariant holds, 385 named exports.

* chore: drop hyperdb + hyperblobs from SDK dependencies

Neither package is imported by SDK source — they were guarantor pins
for @qvac/registry-client, which declares both as non-optional
peerDependencies. npm 7+ auto-installs non-optional peers, so the
consumer install graph is unchanged: hyperdb and hyperblobs still
resolve to a single copy in both default and --omit=optional installs,
satisfied by registry-client's own peer ranges.

Aligns with the "declare what you import, nothing else" policy.

Validated:
- bun install + bun run build clean.
- Default install (790 pkgs): single copy of corestore / hyperswarm /
  hyperdrive / hyperdb / hyperblobs / hyperdht.
- --omit=optional install (182 pkgs): same.
- 0 peer warnings in both scenarios; 385 named exports.
…o#2102)

The verified label already gates every secret-bearing workflow via
label-gate (108 workflows since QVAC-18612). The legacy verify label was
still in use on five paths for non-secret heavy CI and a per-package
merge assertion, forcing reviewers to apply two labels for the same
trust ceremony. Collapse onto verified everywhere.

- public-pr.yml merge gate now reads verified.
- public-reusable-npm.yml integration step now reads verified.
- pr-test-inference-addon-cpp.yml + -js.yml replace their bespoke
  "verify must be freshly applied" dance with a verified-presence check
  that still denies on fork synchronize (pending label-gate strip in
  sibling pull_request_target workflows). Trusted same-repo pushes now
  re-trigger automatically instead of requiring re-labelling.
- pr-test-inference-addon-cpp-verify.yml deleted; its sole purpose was
  to strip verify on every push, which would actively conflict with
  label-gate's verified strip policy.
- pr-models-validation-registry-server.yml comment refreshed; its
  authorize-pr invocation picks up the new default.
- authorize-pr composite action default flipped from verify to verified.
  Affects 17 consumers that all already pair authorize-pr with a
  label-gate job requiring verified, so the change removes the
  double-label awkwardness for fork PRs without altering the trust
  model.
- Description strings on six on-pr-*.yml workflow_dispatch inputs and
  two integration-mobile-test comments updated for consistency
  (run_verify variable kept to avoid breaking dispatch scripts).
- docs/ci/LABELS.md collapses the deprecated verify row and expands
  the verified section to cover the broader scope.
- devops-why-my-pr-not SKILL.md C6 row drops the verify-deprecation
  caveat.
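
The verified-presence check described for the inference-addon-cpp workflows can be sketched as a small decision function. This is an illustration, not the workflows' actual expression: PullRequestEvent, its field names, and is_authorized are hypothetical, and the sketch assumes that any non-synchronize event on a labelled PR is allowed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PullRequestEvent:
    """Hypothetical, simplified view of a pull_request event."""
    action: str        # e.g. "opened", "labeled", "synchronize"
    labels: frozenset  # label names currently on the PR
    is_fork: bool      # True when the head repo differs from the base repo


def is_authorized(event, required_label="verified"):
    """Return True when the gated jobs should run for this event."""
    # Presence check: the label must be on the PR right now.
    if required_label not in event.labels:
        return False
    # A push from a fork is still denied even if the label is present.
    if event.is_fork and event.action == "synchronize":
        return False
    # Trusted same-repo pushes re-trigger without re-labelling.
    return True
```

Under this model a same-repo synchronize on a verified PR is authorized, while the same event from a fork is not.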

Validation:
- 58/58 label-gate unit tests pass.
- actionlint issue count unchanged (30) across the five edited critical
  workflows; every remaining warning is pre-existing shellcheck noise
  in the PowerShell/CMake matrix steps.
- yaml.safe_load round-trips every modified workflow.
- Grep for remaining verify-label references in .github/ returns only
  the human-facing run_verify workflow_dispatch input names (kept) and
  the unrelated qvac verify CLI subcommand bats test.
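
The grep for leftover verify-label references could be approximated as below. This is a sketch under stated assumptions, not the command that was run: remaining_legacy_refs is a hypothetical helper, and it treats run_verify and the qvac verify CLI subcommand as the only allowed mentions.

```python
import re

# Whole-word "verify"; "verified" and "run_verify" do not match.
LEGACY_LABEL = re.compile(r"\bverify\b")
# Mentions of the unrelated CLI subcommand are ignored.
CLI_USAGE = re.compile(r"\bqvac\s+verify\b")


def remaining_legacy_refs(text):
    """Return (line_number, line) pairs that still mention the legacy label."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        without_cli = CLI_USAGE.sub("", line)
        if LEGACY_LABEL.search(without_cli):
            hits.append((number, line.strip()))
    return hits
```

An empty result for every file under .github/ would correspond to the outcome reported above.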

Behavioural changes worth flagging:
1. inference-addon-cpp heavy tests now re-run on every trusted push to
   a verified PR (previously needed a remove+re-add label dance).
   Bounded by the existing paths filter.
2. The GitHub label verify itself is NOT deleted by this PR; run
   gh label delete verify --repo tetherto/qvac after merge so in-flight
   PRs with the legacy label aren't surprised.

Throwaway test PR to validate that after retiring the legacy `verify`
label (tetherto#2102), the `verified` label is the single authorisation gate
across the previously-`verify`-gated surfaces.

Touches only `packages/inference-addon-cpp/README.md` and
`packages/registry-server/README.md`, to trigger:

- pr-test-inference-addon-cpp.yml (authorize step gates run-unit-tests)
- pr-test-inference-addon-cpp-js.yml (authorize step gates the matrix)
- pr-models-validation-registry-server.yml (label-gate + authorize-pr)
- public-pr.yml (verified is a hard merge gate)

Will be closed without merging once the gate behaviour is confirmed:

1. Pre-label: gated jobs SKIPPED, public-pr fails with "PR is not verified"
2. Apply `verified`: gated jobs FLIP TO RUNNING, public-pr passes
3. Apply (deprecated) `verify`: NO additional jobs run; verify is a no-op
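
The three expected states can be written down as a tiny model to compare against the observed job results. This is a sketch only: expected_outcome and its result strings are hypothetical and do not reflect the workflows' real job names or outputs.

```python
def expected_outcome(labels):
    """Map the set of labels on the PR to the expected gate behaviour."""
    verified = "verified" in labels
    return {
        # Gated jobs run only when `verified` is present.
        "gated_jobs": "running" if verified else "skipped",
        # public-pr enforces `verified` as a hard merge gate.
        "public_pr": "pass" if verified else "fail: PR is not verified",
    }
```

Adding the deprecated `verify` label does not change the result in this model, which is the no-op behaviour step 3 is meant to confirm.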

Co-authored-by: Cursor <cursoragent@cursor.com>
@Proletter Proletter requested review from a team as code owners May 21, 2026 09:34
@Proletter Proletter added the verified Authorize secrets / label-gate in PR workflows label May 21, 2026
@Proletter Proletter closed this May 24, 2026