[DRAFT] QVAC b8828 by zoq · Pull Request #2088 · tetherto/qvac

zoq · 2026-05-15T18:19:46Z

Pull qvac-fabric to PR 126 via per-package overlay ports, and add the addon-side adjustments needed against the rebased fabric API.

… bootstrap (tetherto#1899)
* test: pre-load multi-file model companion sets on bootstrap
  Switch chatterbox/supertonic/parakeet/vision/diffusion to preLoadUnload so loadModel() at bootstrap fetches every companion file (encoder, decoder, vocab, projection, etc.). Previously these were fetched lazily inside the first test, which caused the tts-chatterbox-short-text Android flake (5 ONNX files + ONNX Runtime cold init blew through the 600s test watchdog).
  Add an async config resolver to ResourceManager so chatterbox can resolve its referenceAudioSrc from the bundled RN asset registry at bootstrap time; the result is cached per dependency. Remove the now-obsolete patchChatterboxReferenceAudio workaround from MobileTtsExecutor.
  Extend the iOS transcribe() skip list to catch the call sites that slipped past the ^transcription- regex (config-reload-then-transcribe, error-transcription-failed).
  Co-authored-by: Cursor <cursoragent@cursor.com>
* test: extract shared resolveBundledAssetUri helper
  Pull asset resolution + file:// stripping out of the duplicated copies in mobile/consumer.ts and mobile/executors/model-asset-executor.ts into a single mobile/asset-uri.ts helper. Both sites now delegate, so future changes to expo-asset handling live in one place. Tighter idioms in the helper itself: regex strip instead of substring(7), ?? instead of ||, no mutable let.
  Co-authored-by: Cursor <cursoragent@cursor.com>
* chore: clean stale @qvac/sdk snapshots before consumer install
  Add `clean:sdk-snapshot` to wipe the cached @qvac/sdk copies left over in tests-qvac/node_modules and the iOS/Android consumer build dirs by previous `npm install --install-links` runs. Wire it into `install:build:full` so a full rebuild always pulls a fresh SDK snapshot after `prepare:sdk` rebuilds the SDK itself.
  Co-authored-by: Cursor <cursoragent@cursor.com>
* test: dispatch mobile TTS tests by metadata.dependency
  MobileTtsExecutor used a string-prefix heuristic that mapped every `tts-supertonic-*` test to `tts-supertonic`, so the new `tts-supertonic-multilingual` resource was preloaded but never exercised by `tts-supertonic-multilingual-text`. Switch to `test.metadata?.dependency` to match the desktop TtsExecutor.
--------- Co-authored-by: Cursor <cursoragent@cursor.com>
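To make the prefix-heuristic bug concrete, here is a minimal TypeScript sketch contrasting the two dispatch strategies. The `TestCase` shape, the `resolveByPrefix`/`resolveByMetadata` helpers, and the dependency list are hypothetical stand-ins; only the `metadata.dependency` field name comes from the commit.

```typescript
// Hypothetical test-case shape; only `metadata.dependency` mirrors the real field name.
interface TestCase {
  name: string;
  metadata?: { dependency?: string };
}

// Old behaviour (sketch): return the first known dependency the test name starts with.
function resolveByPrefix(test: TestCase, knownDeps: string[]): string | undefined {
  return knownDeps.find((dep) => test.name.startsWith(dep));
}

// New behaviour (sketch): trust the dependency declared on the test itself.
function resolveByMetadata(test: TestCase): string | undefined {
  return test.metadata?.dependency;
}

const knownDeps = ["tts-chatterbox", "tts-supertonic", "tts-supertonic-multilingual"];
const multilingualTest: TestCase = {
  name: "tts-supertonic-multilingual-text",
  metadata: { dependency: "tts-supertonic-multilingual" },
};

console.log(resolveByPrefix(multilingualTest, knownDeps)); // → tts-supertonic (shorter prefix matches first)
console.log(resolveByMetadata(multilingualTest)); // → tts-supertonic-multilingual
```

The metadata lookup makes routing explicit per test, so adding a resource whose name extends an existing prefix cannot silently steal or lose tests.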

- Bump packages/sdk/package.json: 0.10.1 -> 0.10.2.
- Add packages/sdk/changelog/0.10.2/{CHANGELOG.md, CHANGELOG_LLM.md}.
- Rebuild root packages/sdk/CHANGELOG.md aggregate with v0.10.2 at top.
Hotfix release for the delegated-inference connection regression introduced in v0.10.0 (tetherto#1934). NOTICE file unchanged: no dependency changes since v0.10.1.
(cherry picked from commit a4f7225)

* Add tts-ggml workflows
* Add explicit file-level permissions to tts-ggml workflows
  Addresses GitHub Advanced Security findings on PR tetherto#1946 flagging the 4 benchmark stubs and the two integration workflows for the CodeQL actions/missing-workflow-permissions rule: every workflow file should declare a least-privilege `permissions:` block at the top so that any job added later inherits read-only by default instead of the implicit read/write GITHUB_TOKEN.
  - benchmark-{chatterbox,performance,rtf,supertonic}-tts-ggml.yml: add top-level `permissions: contents: read` plus a job-level mirror on the noop body (which only writes to stdout).
  - integration-test-tts-ggml.yml: add top-level `permissions: contents: read; packages: read` matching the existing job-level scope on `run-integration-tests`.
  - integration-mobile-test-tts-ggml.yml: add top-level `permissions: contents: read`; the existing `build-and-test` job continues to widen this to packages:read + pull-requests:write + id-token:write for the prebuild artifact pull and Device Farm hooks.
  The other six tts-ggml workflows (cpp-test-coverage, create-github-release, on-merge, on-pr, on-pr-close, prebuilds) already had top-level permissions declared and are untouched.
  Co-authored-by: Cursor <cursoragent@cursor.com>
* Pin mobile prebuild downloads to current run-id (CodeQL artifact poisoning)
  Addresses CodeQL alert tetherto#796 (`actions/artifact-poisoning`) on PR tetherto#1946, flagged on the two `Download {Android,iOS} prebuilds (from artifacts)` steps in integration-mobile-test-tts-ggml.yml. The rule fires because the workflow has a `workflow_dispatch` entry point and downloads an artifact via `actions/download-artifact@v8` without an explicit `run-id`. Without that input, CodeQL has to assume the artifact could come from any prior run on the branch (including one uploaded by a fork PR's prebuild step), which is the poisoning surface we are explicitly NOT exposing.
  Setting `run-id: ${{ github.run_id }}` (the action's existing default) plus `github-token: ${{ secrets.GITHUB_TOKEN }}` makes the trust boundary explicit at the call site so the analyzer can see the artifact is current-run only. No behavioural change: in `workflow_call` from on-pr-tts-ggml the parent workflow already produced the artifact in the same run, and `workflow_dispatch` falls into the `!inputs.package_spec` -> npm-pack branch since `package_spec` defaults to `@qvac/tts-ggml@latest` for that trigger.
  The four `actions/missing-workflow-permissions` findings on the benchmark stubs are addressed by the previous commit (ef8d4e2).
  Co-authored-by: Cursor <cursoragent@cursor.com>
--------- Co-authored-by: Cursor <cursoragent@cursor.com>
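The missing-permissions rule can be expressed as a small check. This sketch assumes the workflow YAML has already been parsed into a plain object; the `Workflow` type, the `jobsWithoutPermissions` helper, and the `noop` job id are hypothetical, chosen only to mirror the benchmark-stub case. It reports jobs that would fall back to the implicit GITHUB_TOKEN scope because neither the file nor the job declares `permissions:`.

```typescript
// Hypothetical model of a parsed workflow file (e.g. the output of a YAML parser).
type Permissions = string | Record<string, "read" | "write" | "none">;

interface Workflow {
  permissions?: Permissions;
  jobs: Record<string, { permissions?: Permissions }>;
}

// Ids of jobs that would run with the implicit default token scope:
// no top-level block to inherit and no job-level block of their own.
function jobsWithoutPermissions(wf: Workflow): string[] {
  if (wf.permissions !== undefined) return []; // every job inherits the top-level block
  return Object.entries(wf.jobs)
    .filter(([, job]) => job.permissions === undefined)
    .map(([id]) => id);
}

const benchmarkStub: Workflow = { jobs: { noop: {} } };
const fixedStub: Workflow = {
  permissions: { contents: "read" },
  jobs: { noop: { permissions: { contents: "read" } } },
};

console.log(jobsWithoutPermissions(benchmarkStub)); // [ 'noop' ]
console.log(jobsWithoutPermissions(fixedStub)); // []
```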

Co-authored-by: Ishan Vohra <ishanvohra@Ishans-MacBook-Air.local>

@lauripiisang

…o#1915)
* QVAC-18524 fix[api]: avoid Node-only Buffer in RN duplex RPC path
  On RN/Hermes the bare-rpc duplex stream polyfill calls `Buffer.from(chunk, encoding)` for string writes, which throws `Property 'Buffer' doesn't exist` because Hermes has no global `Buffer`. This blocks every transcribeStream() call on iOS/Android.
  - expo-rpc-client: pre-encode the JSON payload to Uint8Array via TextEncoder so the polyfill's binary branch is taken everywhere.
  - rpc-client: drop Buffer.from in the profiled response generator; widen DuplexWritable/DuplexReadable to accept Uint8Array/string.
  - transcribe API + transcription schemas: widen all TranscribeStream*Session.write(audioChunk) signatures from `Buffer` to `Buffer | Uint8Array` so RN callers can pass Uint8Array directly.
  - tests-qvac shared runner: stop wrapping Uint8Array slices in Buffer.from before writing.
  - tests-qvac mobile consumer: skip transcribe-stream-events-* on iOS under the existing QVAC-18460 TODO (same native Silero/Whisper crash path as transcription-*).
  - tests-qvac.mdc: add a one-liner rule about avoiding Node-only globals in shared/mobile test code.
  Co-authored-by: Cursor <cursoragent@cursor.com>
* test: narrow duplex write() type to Uint8Array
  Replace the `Buffer | Uint8Array` parameter unions on `TranscribeStream*Session.write()` and `DuplexWritable.write()` with plain `Uint8Array`. Node `Buffer` extends `Uint8Array`, so existing `Buffer`-passing callers keep typechecking, but the public surface no longer mentions a Node-only type. Per review feedback from @lauripiisang.
--------- Co-authored-by: Cursor <cursoragent@cursor.com>
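A minimal sketch of the pre-encoding idea, assuming a duplex whose `write()` accepts only `Uint8Array`. The `DuplexWritable` interface here is a simplified stand-in, and `writeJson` and `CollectingSink` are hypothetical helpers, not the expo-rpc-client code. Encoding with `TextEncoder` (which the commit relies on) means the stream never receives a string, so no `Buffer.from(chunk, encoding)` call is reached.

```typescript
// Simplified, hypothetical duplex surface: binary chunks only.
interface DuplexWritable {
  write(chunk: Uint8Array): boolean;
}

// Encode the JSON payload to UTF-8 bytes up front, so the stream only ever sees binary data.
function writeJson(stream: DuplexWritable, payload: unknown): boolean {
  const bytes = new TextEncoder().encode(JSON.stringify(payload));
  return stream.write(bytes);
}

// In-memory sink that records what reaches the stream.
class CollectingSink implements DuplexWritable {
  readonly chunks: Uint8Array[] = [];

  write(chunk: Uint8Array): boolean {
    this.chunks.push(chunk);
    return true;
  }
}

const sink = new CollectingSink();
writeJson(sink, { method: "transcribeStream", id: 1 });
console.log(sink.chunks[0] instanceof Uint8Array); // true
```

Because Node's `Buffer` is a `Uint8Array` subclass, a `write(chunk: Uint8Array)` signature still accepts Buffers from Node callers while keeping the type surface free of Node-only names.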

… paths (tetherto#1950)
* fix: add info-level logging to ocr-onnx load and inference paths
* fix: add config and stats info logging to ocr-onnx for full addon parity
* test: add unit test for info-level addon logging
* chore: bump ocr-onnx to 0.4.5
--------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

…o#1917)
* infra: add scheduled SDK install-check pipeline on main
* fix: update the review comments
* infra: surface npm lifecycle script output via foreground-scripts in install-check

…etherto#1942)
* doc: add architecture manifesto and principles to docs/architecture
* doc: drop internal North-Star/OKR/Google Doc references for public repo

tetherto#1908)
* infra: auto-decide npm dist-tag so backports don't clobber latest
* fix: update the review comments
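The commit states the goal (a backport must not move `latest`) but not the exact rule. One plausible rule, sketched here as an assumption: publish to `latest` only when the new version is at least the current `latest`, and otherwise to a per-line tag. The `release-<major>.<minor>` tag name and the helper names are hypothetical, and pre-release suffixes are deliberately out of scope.

```typescript
type Version = [number, number, number];

// Parse "MAJOR.MINOR.PATCH"; pre-release/build suffixes are not handled in this sketch.
function parseVersion(v: string): Version {
  const m = /^(\d+)\.(\d+)\.(\d+)$/.exec(v.trim());
  if (!m) throw new Error(`unsupported version: ${v}`);
  return [Number(m[1]), Number(m[2]), Number(m[3])];
}

// Negative if a < b, zero if equal, positive if a > b.
function compareVersions(a: Version, b: Version): number {
  for (let i = 0; i < 3; i++) {
    if (a[i] !== b[i]) return a[i] - b[i];
  }
  return 0;
}

// Hypothetical rule: only a version >= the current `latest` may take over `latest`;
// anything older (e.g. a hotfix on a maintenance line) gets a per-line tag instead.
function chooseDistTag(publishing: string, currentLatest: string | undefined): string {
  if (currentLatest === undefined) return "latest"; // first publish
  const next = parseVersion(publishing);
  if (compareVersions(next, parseVersion(currentLatest)) >= 0) return "latest";
  return `release-${next[0]}.${next[1]}`;
}

console.log(chooseDistTag("0.11.0", "0.10.2")); // latest
console.log(chooseDistTag("0.10.3", "0.11.0")); // release-0.10
```

In a pipeline, `currentLatest` would come from the registry, for example via something like `npm view <pkg> dist-tags.latest`.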

…g page (tetherto#1935)
* doc: examples changed path in sdk/examples
* doc: CLI - doctor - fix issue + create new page troubleshooting
* doc: CLI - doctor - fix issue + create new page troubleshooting
* doc: fix sitemap generation in staging env
* doc: frontend - make it possible to have ToCs with fewer subsection headings
* doc: add tip - requested by reviewer
* doc: add tip - requested by reviewer
--------- Co-authored-by: Yury Samarin <yuri.a.samarin@gmail.com>

…etherto#1932)
* fix: route bare-crypto and bare-fetch through imports map
* fix(rag): harden crypto and fetch shims
  Remove uuid-random and avoid mutating global crypto for ID generation. Require secure randomness via globalThis.crypto, with a #crypto fallback. Add crypto-browserify as an optional peer and clarify crypto/fetch errors.
--------- Co-authored-by: Yury Samarin <yuri.a.samarin@gmail.com>
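A sketch of the "read the global, never patch it, fail loudly" approach to ID generation described above. It assumes a `RandomSource` shape matching Web Crypto's `getRandomValues`, takes candidate sources in priority order (for example `globalThis.crypto`, then whatever the `#crypto` import resolves to), and throws rather than falling back to `Math.random`. `resolveRandomSource` and `generateId` are hypothetical names, not the rag package's API.

```typescript
// Minimal structural type for the one Web Crypto method this sketch needs.
interface RandomSource {
  getRandomValues(array: Uint8Array): Uint8Array;
}

// Pick the first usable secure source; never assign to globals, never use Math.random.
function resolveRandomSource(candidates: Array<RandomSource | undefined>): RandomSource {
  const source = candidates.find((c) => c !== undefined && typeof c.getRandomValues === "function");
  if (!source) {
    throw new Error("No secure random source: expected globalThis.crypto or a crypto fallback");
  }
  return source;
}

// 16 random bytes rendered as 32 lowercase hex characters.
function generateId(source: RandomSource): string {
  const bytes = source.getRandomValues(new Uint8Array(16));
  return Array.from(bytes, (b) => b.toString(16).padStart(2, "0")).join("");
}

// Read the global without mutating it.
const globalCrypto = (globalThis as unknown as { crypto?: RandomSource }).crypto;
const id = generateId(resolveRandomSource([globalCrypto]));
console.log(id.length); // 32
```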

* QVAC-17990 Add standalone ESRGAN upscaler API
* QVAC-17990 Address standalone upscaler review feedback
* QVAC-17990 Raise error when upscaler thread detection fails
* QVAC-17990 Share diffusion backend loading
* QVAC-17990 Honor cancel during ESRGAN upscale
* QVAC-17990 Tighten upscaler validation
* QVAC-17990 Add ESRGAN e2e integration coverage
* QVAC-17990 Add standalone upscaler changelog and model links
* QVAC-17990 Add ESRGAN coexistence integration coverage
* QVAC-17990 Share ESRGAN helpers and warn on dropped output
* QVAC-17990 Add ESRGAN cancel integration coverage
* QVAC-17990 Add ESRGAN cancel and coexistence tests
* QVAC-17990 Fix C++ lint issues
* QVAC-17990 Sync mobile integration manifest
* QVAC-17990 Use global native logging setup
* QVAC-17990 Document standalone EsrganUpscaler API
  Add a standalone-esrgan-upscale example and a README usage section covering the new EsrganUpscaler named export. The previous CHANGELOG entry was the only user-facing reference to the new public class; this commit makes it discoverable from the README index, the Other Examples list, and a runnable example script that mirrors the existing generate-image-esrgan-upscale flow but without the diffusion phase.
* QVAC-17990 Fix standalone ESRGAN example lint
--------- Co-authored-by: gianni-cor <gianfrancocordella@gmail.com>
Co-authored-by: gianni-cor <gianfranco.cordella@tether.io>

…orepo (tetherto#1860)
* mv qvac-lib-decoder-audio -> decoder-audio
* chore: mv lib-infer-diffusion -> infer-diffusion
* chore: remove qvac-lib- prefix from diagnostics pkg
* chore: remove qvac-lib prefix from infer-base
* Rename llamacpp-embed to embed-llamacpp
* chore: align llm-llamacpp folder to pkgname
* chore: align translation-nmtcpp with package.json canonical name
* chore: align folder for tts-onnx with pkgname
* chore: align onnx directory with pkg name, add cleanup
* chore: align dirname with pkgname transcription-parakeet
* chore: align transcription-whispercpp
* chore: mv langdetect-text-cld2 to canonical foldername
* chore: align foldername -> pkgname langdetect-text
* chore: align registry-server folder/pkgnames
* chore: rename pkg lint-cpp to match structure
* chore: align pkgname inference-addon-cpp
* chore: file update remove lib-* prefix
* chore: re-align to content of package.json > name canonical pkgname
* chore: completes rename to diffusion-cpp
* chore: align workflow name
* chore: wrap up lib-diagnostics rename
* chore: mv reusable-workflow{
* chore: mv llama-embed to embed-llama for workflows
* chore: mv llm-llamacpp workflows
* chore: mv translation-cpp workflow files
* chore: rename tts-onnx workflow files
* chore: rename onnx workflow files
* chore: mv transcription-parakeet workflows
* chore: rename whispercpp workflows
* chore: mv langdetect cld2 workflow
* chore: rename langdetect workflow
* chore: mv registry-server workflow
* chore: rename remaining files
* Tidyup: remaining files
* Tidyup: remaining renames
* chore: fixup moved embed-llamacpp
* fixup: additional renames
* fixup: revert package changes to upstream vcpkg
* fixup: remove non-existing upstreams

…1 / tetherto#1860) (tetherto#1959)
PR tetherto#1860 (commit 1d1d8c3) renamed the inference-addon-cpp package and updated its CMakeLists.txt to look up the package config template at cmake/inference-addon-cppConfig.cmake.in, but the template file itself was never moved off the legacy filename. As a result every vcpkg port build of qvac-lib-inference-addon-cpp@1.1.7#1 fails at configure time:

    CMake Error at .../cmake/CMakePackageConfigHelpers.cmake:519 (configure_file):
      configure_file Problem configuring file
    Call Stack (most recent call first):
      CMakeLists.txt:41 (configure_package_config_file)

Just complete the rename: git mv the template to the short post-rename name. Contents are template-only (`@PACKAGE_INIT@` + `@PROJECT_NAME@`) and need no edits.
Co-authored-by: gianni-cor <gianfranco.cordella@tether.io>
Co-authored-by: Cursor <cursoragent@cursor.com>

…inference-addon-cpp/ (tetherto#1960)
PR tetherto#1860 renamed the inference-addon-cpp include namespace and tetherto#1959 landed the matching cmake.in / vcpkg.json updates, but EsrganUpscalerModel was overlooked and still pulled the old namespace headers, breaking the diffusion-cpp build now that the registry advertises 1.1.7#1 (which installs under include/inference-addon-cpp/):

    .../diffusion-cpp/addon/src/model-interface/EsrganUpscalerModel.hpp:10:10: fatal error: 'qvac-lib-inference-addon-cpp/ModelInterfaces.hpp' file not found

Just align the three remaining #include directives.
Co-authored-by: Cursor <cursoragent@cursor.com>

…1957)
* chore: Add chatterbox ggml models
* chore: Add supertonic ggml models
--------- Co-authored-by: Ishan Vohra <ishanvohra@Ishans-MacBook-Air.local>

…etherto#1922)
* infra[notask]: rename desktop test runner labels for sdk tests-qvac
  Update the GPU runner labels used by the SDK desktop tests workflows to the new naming scheme:
  - ai-run-windows11-gpu -> qvac-win25-x64-gpu
  - ai-run-linux-gpu -> qvac-ubuntu2204-x64-gpu
  - mac-mini-m4-gpu (unchanged)
  Affected:
  - .github/workflows/test-sdk.yml: update default for desktop-platforms in workflow_dispatch and workflow_call inputs, plus the inline fallback used when calling test-desktop-sdk.yml.
  - .github/workflows/test-desktop-sdk.yml: refresh the runner-label example in the platforms input description for consistency.
  Co-authored-by: Cursor <cursoragent@cursor.com>
* infra[notask]: enable cross-os archive for desktop sdk model cache
--------- Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Dmytro Medvinskyi <functionsilence@gmail.com>
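The label change is a plain old-to-new mapping. As a sketch, a hypothetical helper that migrates a list of runner labels (for example in a local dispatch script) could look like this; the mapping comes from the commit, while `migrateRunnerLabels` and the array-of-labels input are assumptions.

```typescript
// Old -> new GPU runner labels, as listed in the commit message.
const RUNNER_LABEL_RENAMES = new Map<string, string>([
  ["ai-run-windows11-gpu", "qvac-win25-x64-gpu"],
  ["ai-run-linux-gpu", "qvac-ubuntu2204-x64-gpu"],
]);

// Hypothetical helper: rewrite legacy labels and pass everything else through unchanged
// (mac-mini-m4-gpu keeps its name, and already-migrated labels are left as-is).
function migrateRunnerLabels(labels: readonly string[]): string[] {
  return labels.map((label) => RUNNER_LABEL_RENAMES.get(label) ?? label);
}

console.log(migrateRunnerLabels(["ai-run-windows11-gpu", "ai-run-linux-gpu", "mac-mini-m4-gpu"]));
// [ 'qvac-win25-x64-gpu', 'qvac-ubuntu2204-x64-gpu', 'mac-mini-m4-gpu' ]
```

Using a `Map` rather than a plain object avoids accidental hits on inherited keys such as `constructor`.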

…ate (tetherto#1926)
* QVAC-18394 infra: add devops pod conventions, team file, and PR template
  Baseline DevOps pod metadata and conventions to unblock the QVAC-18394 skill subtasks (Stale-Prs, Create-pr, Daily-update, Pr-review). Documentation and config only; no behavior change.
  Files:
  - .github/teams/devops.json: pod metadata (leads, members, ownedPaths)
  - .cursor/rules/devops/main.mdc: pod entry point + operating principles
  - .cursor/rules/devops/github-actions.mdc: workflow/action conventions
  - .cursor/rules/devops/secrets-and-credentials.mdc: secrets handling + leak-response playbook
  - .cursor/rules/devops/agentic-automation.mdc: read-only-default, plan-then-apply, validation-before-success for AI-driven work
  - .cursor/rules/devops/commit-and-pr-format.mdc: commit/PR title format scoped to .github/** and scripts/** (sdk pod's rule is package-scoped)
  - .github/PULL_REQUEST_TEMPLATE/devops.md: PR body template mirroring sdk-pod.md / addon.md discipline (flat sections only)
  Validated:
  - All .mdc frontmatter parses cleanly (description, globs, alwaysApply)
  - devops.json parses cleanly
  - No linter errors, no secret patterns matched
  - PR template structure mirrors existing templates (no H3 nesting, no tables, no HTML)
  Co-authored-by: Cursor <cursoragent@cursor.com>
* QVAC-18394 chore: expand devops pod roster with 5 team members
  Adds the rest of the active DevOps engineers to .github/teams/devops.json so /devops-pr-status correctly partitions reviewers between "Reviews:" (team) and "Other:" (outside) buckets. Without this, every team-member review currently lands in "Other:" and the dashboard reports approvals as still-needed.
  Members (alphabetical, case-insensitive):
  - darkynt (Matt Cavanagh)
  - GiacomoSorbiWork (Giacomo)
  - sidj-thr
  - tamer-hassan-tether
  - yauhenipankratovich-web
  Removes Proletter from members per the cross-pod convention (lead is listed in `leads` only; see .github/teams/sdk.json).
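The roster update matters because the dashboard buckets each reviewer by team membership. Here is a minimal sketch of that partition, assuming a team file with `leads` and `members` arrays and case-insensitive login matching. The `TeamFile` type and `partitionReviewers` helper are hypothetical (not the pr-status.mjs code), counting leads as "team" is an assumption, and the roster below is only a partial, illustrative one.

```typescript
// Hypothetical shape of .github/teams/<pod>.json; only `leads` and `members` are used.
interface TeamFile {
  leads: string[];
  members: string[];
}

interface Buckets {
  reviews: string[]; // pod reviewers ("Reviews:")
  other: string[]; // everyone else ("Other:")
}

// Assumption: leads and members both count as the team; logins compare case-insensitively.
function partitionReviewers(team: TeamFile, reviewers: string[]): Buckets {
  const roster = new Set([...team.leads, ...team.members].map((login) => login.toLowerCase()));
  const buckets: Buckets = { reviews: [], other: [] };
  for (const reviewer of reviewers) {
    (roster.has(reviewer.toLowerCase()) ? buckets.reviews : buckets.other).push(reviewer);
  }
  return buckets;
}

// Illustrative roster: the lead and the five members named in the commit above.
const devops: TeamFile = {
  leads: ["Proletter"],
  members: ["darkynt", "GiacomoSorbiWork", "sidj-thr", "tamer-hassan-tether", "yauhenipankratovich-web"],
};
console.log(partitionReviewers(devops, ["DARKYNT", "sidj-thr", "some-outside-reviewer"]));
// { reviews: [ 'DARKYNT', 'sidj-thr' ], other: [ 'some-outside-reviewer' ] }
```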
Validation: - JSON parses; pr-status.mjs --pod devops --mode team loads the new roster without error. - No code/path changes, data-only update. Co-authored-by: Cursor <cursoragent@cursor.com> * QVAC-18394 chore: drop commit-and-pr-format rule (skill-only) Per review feedback: rules auto-attach via globs and pollute the context window on every devops surface. The format spec is already encoded in devops-pr-create (regex validation, allowed prefixes/tags, trigger detection) and devops-pr-review (title validation against the same regex) — both invoked explicitly, never autoloaded. - Delete .cursor/rules/devops/commit-and-pr-format.mdc (5 KB). - main.mdc: drop the rule from the related-rules table and replace the "Commit messages and PR titles" section with a one-line pointer to the devops-pr-create skill. Skill-side cross-references to the deleted rule are cleaned up on PR tetherto#1929 (next in the stack) since that's where the skills live. Co-authored-by: Cursor <cursoragent@cursor.com> * QVAC-18394 feat: add devops pod skills (pr-status, pr-create, daily-update, pr-review) (tetherto#1929) * QVAC-18394 feat: add devops pod skills (pr-status, pr-create, daily-update, pr-review) Resolves the four QVAC-18394 subtasks by adding the DevOps pod's user-facing Cursor skills on top of the conventions and team file landed in the prereq branch. The skills lean on the existing _lib/pr-skills/ shared library for pod discovery, PR enumeration, Slack-handle mapping, and worktree management, so no new shared infra is added — only thin SKILL.md surfaces and DevOps-specific workflows. Files: - .cursor/skills/devops-pr-status/SKILL.md — Stale-Prs subtask. Thin wrapper invoking pr-status.mjs --pod devops --mode team. The shared script already segregates PRs into needs-your-re-review / stale (>3d) / needs-review and flags merge conflicts; no separate stale-only mode is needed. - .cursor/skills/devops-pr-create/SKILL.md — Create-pr subtask. 
Generates TICKET prefix[tag]?: subject titles + devops.md PR body, with trigger detection (action-pinning / permissions / IaC plan / [bc]) driving which template sections are required. Client-side title validation since no pr-validation-devops.yml exists yet. - .cursor/skills/devops-daily-update/SKILL.md — Daily-update subtask. Aggregates yesterday's merged PRs, today's open PRs, reviews owed, and recent CI runs into a Slack/Asana-ready message. Bounded to <=6 shell calls. Read-only; never posts. Includes a secret-pattern scrub before writing the temp file. - .cursor/skills/devops-pr-review/SKILL.md — Pr-review subtask, absorbs gha-audit. Wraps /pr-review (does NOT fork it) and layers a deterministic GitHub Actions security audit (15 checks A1-A15) sourced verbatim from .cursor/rules/devops/github-actions.mdc and secrets-and-credentials.mdc. Findings flow into the same pending-review payload the user confirms. All four skills: - disable-model-invocation: true (state-changing or PR-posting flows) - Reference rules and team file landed by the prereq PR - Inherit safety + efficiency rules from .cursor/rules/devops/agentic-automation.mdc (read-only by default, plan-then-apply for state changes, bounded shell calls) Validated: - All four SKILL.md frontmatter parses (name matches directory; non-trivial description) - All 12 cross-file references resolve (rules, team file, PR template, shared lib, parent skills) - gh search prs / gh run list flags + JSON fields verified against gh CLI 2.x help output - ReadLints clean - No formatter mangling Co-authored-by: Cursor <cursoragent@cursor.com> * QVAC-18394 fix: align devops-daily-update output to team's slack template The first draft used a generic Markdown layout (`## Yesterday`, `## Today`, `## Blockers`, `_(none)_` for empty sections, GitHub-flavored links). 
  The team's actual daily-update format on Slack is different:
      🔨 *Done today*
      - QVAC-XXXXX: <past-tense action>
          - <optional sub-bullet>
      📅 *Planned for tomorrow*
      - QVAC-XXXXX: <forward-looking action>
      - QVAC-YYYYY
      🚧 *Blockers / risks*
      - N/A
  Changes:
  - Replaced the section names and added the canonical 🔨 / 📅 / 🚧 emoji
  - Switched from Markdown headings to Slack-bold (`*Section*`) so the output renders correctly when pasted into Slack (Slack does not render `##`)
  - Empty sections now render `- N/A` (literal), not `_(none)_`
  - Bullets lead with `TICKET:` (auto-linked by the workspace's Asana app), not `#<pr-num>`; they fall back to `#<num>` only when no ticket can be extracted from the PR title or branch name
  - Sub-bullets at 4-space indent for ticket-level context
  - Default `--format` is now `slack` (not `markdown`): Slack is the primary destination; chat preview keeps the Markdown form
  - Temp file extension changed `.md` → `.txt` to reflect Slack mrkdwn (not GitHub-flavored Markdown) as the canonical form
  - Added ticket-extraction rules (PR title → branch name → `#<pr-num>`)
  - Added a per-section routing table (merged-today / pushed-today / open-no-recent-commits / reviews-owed / conflicting / stale-review / CI-failing) so the agent knows which bucket each item lands in
  Lookback default unchanged at "yesterday 00:00 local", which covers both a late-evening EOD post and a 7am morning standup without a manual `--since`.
  Quality gates updated to enforce the new layout (correct emoji + section names; `- N/A` for empty; no Markdown headings in Slack form; no GitHub-style links).
  The skill is still read-only and never posts. The user copies from the temp file and pastes into Slack manually.
  Co-authored-by: Cursor <cursoragent@cursor.com>
* QVAC-18394 chore: align devops skills to sdk-pod conventions
  Self-audit pass against `.cursor/rules/sdk/skill-authoring-guidelines.mdc` and the SDK pod's reference skills (sdk-pr-status, sdk-pr-create, sdk-changelog, sdk-backmerge).
  Documentation-only.
  Description tightening:
  - devops-pr-status: 341 → 275 chars
  - devops-pr-create: 269 → 231 chars
  - devops-daily-update: 398 → 255 chars
  - devops-pr-review: 386 → 271 chars
  Reference: sdk-pr-status's description is 256 chars. All four are now in the same 230–280 range, vs the prior 270–400 range. WHAT/WHEN preserved on each.
  Heading consistency:
  - "## Quality gates" → "## Quality Checklist" in devops-daily-update, devops-pr-review (sdk-changelog / sdk-backmerge / sdk-pr-create all use "Quality Checklist")
  - "## Validation gate (CLIENT-SIDE)" → "## Validation" in devops-pr-create (no SDK skill uses uppercase parenthetical scope qualifiers in headings)
  Editorial cleanup:
  - devops-pr-status: dropped the "Resolves the Stale-Prs subtask of QVAC-18394 …" paragraph (skill bodies should not reference their own PR/ticket; SDK skills never do)
  - devops-daily-update: dropped the upfront "## Canonical template" section (~25 lines). Step 8's "#### Slack form (canonical)" is the single source of truth for the format. Folded the one unique line (bare-ticket bullets allowed when self-evident) into Step 8. Reduced devops-daily-update from 269 → 242 lines. Other line-counts stable (46, 183, 140).
  No behaviour changes. Cross-file references still resolve. Frontmatter parses; name matches dir; disable-model-invocation: true preserved on all four. ReadLints clean.
  Co-authored-by: Cursor <cursoragent@cursor.com>
* QVAC-18394 fix: devops skill issues found during test pass
  - github-actions.mdc § Permissions: accept top-level OR per-job permissions blocks as equivalent (per-job is the more secure narrower-scope pattern).
  - github-actions.mdc § File layout: add integration-<scope>-<pkg>.yml to the canonical filename list (existing repo convention).
  - devops-pr-review SKILL.md: tighten A2 + A15 check descriptions to mirror the loosened rule (audit becomes more permissive; no consumers break).
- devops-daily-update SKILL.md: trim merged-PRs gh-search --json field set to what the API actually exposes (closedAt, not mergedAt/ additions/deletions); add cap of 5 most-recently-updated reviews to the standup output with overflow line. - devops-pr-create SKILL.md + devops.md PR template: drop the redundant "be concise" Note line from the template head. All issues uncovered by the end-to-end test session of the four new devops skills on this branch. Co-authored-by: Cursor <cursoragent@cursor.com> * QVAC-18394 fix: emit paste-ready output files in devops pr-status + pr-create - devops-pr-status: tee dashboard stdout to /tmp/devops-pr-status-<date>.txt and redirect stderr to a sibling .stderr file. Print pbcopy/xclip/wl-copy commands so the operator can paste the dashboard straight into a Slack thread (Slack auto-renders the indented plain text as nested bullets and turns #<num> into PR auto-links). - devops-pr-create: add an explicit step 8 to write the assembled PR body to /tmp/pr-body.md (the gh CLI Integration section already cat's that path). Add the pbcopy/xclip/wl-copy commands as step 9 for direct paste into the GitHub PR-create form. Discovered during the test pass — the dashboard output was useful but the operator had to manually copy from the terminal. Now there's a single pbcopy command to grab paste-ready content. Co-authored-by: Cursor <cursoragent@cursor.com> * QVAC-18394 chore: drop commit-and-pr-format rule cross-references in skills Mirror the rule deletion on PR tetherto#1926 — remove dead links from devops-pr-create and devops-pr-review SKILL.md, and inline the title regex / allowed prefixes / allowed tags so the skills stay self-contained without auto-loading anything via globs. - devops-pr-create: Format References now points at the inline Validation regex; the "see rule" parenthetical in Validation is replaced with a one-line note that no pr-validation-devops.yml exists yet; the References bullet for the deleted rule is removed. 
- devops-pr-review: drop commit-and-pr-format from the auto-load list in step 4 (it's deleted, no longer auto-loads); inline the format spec in step 5 (regex + prefixes + tags); replace the rule bullet in References with a pointer to devops-pr-create as the canonical home for the format spec. No behavior changes — same regex, same prefix/tag list, same validation logic. Co-authored-by: Cursor <cursoragent@cursor.com> --------- Co-authored-by: Cursor <cursoragent@cursor.com> --------- Co-authored-by: Cursor <cursoragent@cursor.com>
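The daily-update formatting rules in this PR (ticket from PR title, then branch name, then `#<pr-num>`; Slack-bold section headers with emoji; a literal `- N/A` for empty sections) are easy to get subtly wrong, so here is a small sketch of them. The skill itself is a SKILL.md prompt rather than code; the `QVAC-<digits>` ticket pattern, the `PrItem` shape, the sample PRs (the no-ticket item and PR #42 are made up), and the helper names are assumptions for illustration.

```typescript
// Hypothetical minimal view of a PR as the daily-update step might see it.
interface PrItem {
  number: number;
  title: string;
  branch: string;
  summary: string;
}

// Assumed ticket format, e.g. "QVAC-18394" (matched case-insensitively in branch names).
const TICKET_RE = /\bQVAC-\d+\b/i;

// Fallback order: PR title -> branch name -> "#<pr-num>".
function extractTicket(pr: PrItem): string {
  const match = TICKET_RE.exec(pr.title) ?? TICKET_RE.exec(pr.branch);
  return match ? match[0].toUpperCase() : `#${pr.number}`;
}

// Slack mrkdwn: bold header via *...*, no "##" headings, literal "- N/A" when empty.
function renderSection(emoji: string, title: string, items: PrItem[]): string {
  const lines = [`${emoji} *${title}*`];
  if (items.length === 0) {
    lines.push("- N/A");
  } else {
    for (const pr of items) lines.push(`- ${extractTicket(pr)}: ${pr.summary}`);
  }
  return lines.join("\n");
}

const done: PrItem[] = [
  { number: 1926, title: "QVAC-18394 infra: add devops pod conventions", branch: "devops-pod", summary: "added devops pod conventions" },
  { number: 1929, title: "add devops pod skills", branch: "qvac-18394-devops-skills", summary: "added devops pod skills" },
  { number: 42, title: "fix typo", branch: "typo-fix", summary: "fixed a typo" },
];
console.log(renderSection("🔨", "Done today", done));
console.log(renderSection("🚧", "Blockers / risks", []));
```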

@jpgaribotti

* Restore Qwen3.5 / Gemma4 / PaddleOCR-VL tests + Mali coopmat fix
  Stack of three logical changes squashed into one commit so the test ports stay self-consistent with the build/runtime they depend on:
  1. qvac-fabric overlay ports (LLM + embed + nmtcpp):
     - Pin to fabric 78db8bf4 (PR tetherto/qvac-fabric-llm.cpp#121 HEAD, includes c79a8851 "ggml-vulkan: Fix NaN outputs on Mali").
     - Drop -DGGML_VULKAN_DISABLE_COOPMAT*=ON for Android so coopmat shaders are compiled in. With coopmat off, runtime device->coopmat_support is false and the Mali fix's ARM-gated branches were skipped, leaving Qwen3-Q8_0 finetuning NaN on Pixel 9 Pro Mali.
     - Wire up overlay-ports in each package's vcpkg-configuration.json.
     - Add find_package(OpenSSL) before find_package(llama) in the LLM CMakeLists so llama-targets.cmake's transitive OpenSSL::SSL reference (via cpp-httplib) resolves on local builds.
  2. utils.js downloadFile redirect race:
     - Track a handedOff flag set when the redirect branch hands off dest to a recursive call. All cleanup paths now skip fs.unlink once ownership is transferred, so a late error from the outer writestream can't delete the freshly-downloaded file (Pixel ENOENT after "successful" mmproj download).
  3. Three new integration tests + their mobile harness wiring:
     - qwen3-5.test.js: basic / multi-turn / tool-calling
     - gemma4.test.js: text / multi-turn / image (forced to CPU on darwin + mobile because gemma4v projector SIGSEGVs on Metal and Adreno OpenCL) / tool-calling
     - ocr-paddle.test.js: OCR; mobile maxTokens capped to 768
     - Ported to the new addon API (files: { model: [absPath], projectionModel?: absPath }, config: …).
     - Added matching unit test test_text_llm_context_qwen3.cpp.
     - integration.auto.cjs registers runQwen35Test, runGemma4Test, runOcrPaddleTest dispatchers.
     - test-groups.json: iOS heavy4 cluster (Gemma4+OcrLighton+OcrPaddle), iOS lightB adds Qwen35, Android groupB has Qwen35 first then Gemma4 / OcrPaddle.
- Workflow: Android GroupB Device Farm jobTimeout 60→90 min. * API port + Gemma4 tool-call fix. Signed-off-by: Marcus Edel <marcus.edel@collabora.com> * Wire addon/src/patches ahead of the vcpkg include path to pick up the LlamacppUtils.hpp ptr-API override. Signed-off-by: Marcus Edel <marcus.edel@collabora.com> * API port + Gemma4 tool-call fix. Signed-off-by: Marcus Edel <marcus.edel@collabora.com> * Split iOS heavy4 into three single-test specs (heavy4 = OcrLighton, new heavy7 = Gemma4, new heavy8 = OcrPaddle) and schedule them as separate Device Farm runs to avoid memory pressure. Signed-off-by: Marcus Edel <marcus.edel@collabora.com> * Drop LlamacppUtils.hpp patch override; bump addon-cpp to 1.1.7 The LlamacppUtils.hpp common_init_result_ptr API now ships in qvac-lib-inference-addon-cpp 1.1.7 (PR tetherto#1887), so the local addon/src/patches/qvac-lib-inference-addon-cpp/LlamacppUtils.hpp shim is no longer needed in the embed and llm addons. - Delete the patch headers in embed and llm. - Drop the BEFORE PRIVATE addon/src/patches include path from the embed/llm production and unit-test CMakeLists. - Bump qvac-lib-inference-addon-cpp version>= to 1.1.7 in the embed, llm, and nmtcpp vcpkg.json files so they pick up the upstream ptr-API header from the registry. The OpenSSL find_package() addition stays — it's an unrelated local-build fix. Co-authored-by: Cursor <cursoragent@cursor.com> * Cap ocr-lighton predict to 1800 (desktop) / 768 (mobile) so the LightOnOCR response can't overrun ctx_size=4096. Signed-off-by: Marcus Edel <marcus.edel@collabora.com> * Rewrite sliding-context test to use the post-GGML_PAD effective n_ctx (512) and retune n_predict / n_discarded so all 8 cases match the current ContextSlider semantics. Signed-off-by: Marcus Edel <marcus.edel@collabora.com> * Allow embed batching test to override ctx_size and pin gte-large to batch_size=512 / ctx_size=384 to probe the Mali Vulkan first-submit ErrorDeviceLost. 
Signed-off-by: Marcus Edel <marcus.edel@collabora.com> * Fix reverse-prompt scenario by removing comma, space, listing both 'pizza' and 'Pizza', and lowercasing the assertion comparisons to match 'Pizza' and 'pizza'. Signed-off-by: Marcus Edel <marcus.edel@collabora.com> * Sanitize media Uint8Array prompts before logging to avoid V8 Zone OOM. Signed-off-by: Marcus Edel <marcus.edel@collabora.com> * Use Qwen3 family chat-template to fix Qwen3.5-0.8B gibberish output on macOS Metal. Signed-off-by: Marcus Edel <marcus.edel@collabora.com> * Update portfiles to point to the latest fabric. Signed-off-by: Marcus Edel <marcus.edel@collabora.com> * Revert "Allow embed batching test to override ctx_size and pin gte-large to batch_size=512 / ctx_size=384 to probe the Mali Vulkan first-submit ErrorDeviceLost." This reverts commit 1408896. Signed-off-by: Marcus Edel <marcus.edel@collabora.com> * Raise AfriqueGemma cancel maxWait to 60s, and apply the use_jinja gate-drop so Qwen3-family models always pick the fixed jinja template. Signed-off-by: Marcus Edel <marcus.edel@collabora.com> * Drop the retired AfriqueGemma integration tests. Signed-off-by: Marcus Edel <marcus.edel@collabora.com> * Update portfiles to point to the latest head. Signed-off-by: Marcus Edel <marcus.edel@collabora.com> * Update portfiles to point to the latest head. Signed-off-by: Marcus Edel <marcus.edel@collabora.com> * Drop qwen35 from the Qwen3-template detection and the supported-finetune-architecture list since neither path is actually validated for Qwen3.5. Signed-off-by: Marcus Edel <marcus.edel@collabora.com> * Update portfiles to point to the latest head. Signed-off-by: Marcus Edel <marcus.edel@collabora.com> * Enable coopmat. Signed-off-by: Marcus Edel <marcus.edel@collabora.com> * Drop the Qwen3 use_jinja override pairing now that qwen35 is no longer treated as Qwen3-family. 
Signed-off-by: Marcus Edel <marcus.edel@collabora.com> * Use only general.architecture for Qwen3 detection so Qwen3.5 stops getting the Qwen3 chat-template via the model-name substring fallback. Drop modelNameLooksLikeQwen3 / getModelName and the modelName parameter from supportsToolsCompactForModelMetadata and selectToolsCompactMarkerForModelMetadata. The substring match on general.name treated "Qwen3.5-..." as Qwen3 and overrode the model's embedded tokenizer.chat_template, contradicting the recent decision to keep qwen35 out of the Qwen3 family. Update the LlamaModel call site and unit tests; add explicit qwen35/nullopt negative cases. Co-authored-by: Cursor <cursoragent@cursor.com> * Accept HuggingFace function-call XML in extractToolCalls so the Qwen3.5 tool-calling integration test parses the model's native <tool_call><function=...><parameter=...>...</parameter></function></tool_call> envelope produced by its embedded chat template, in addition to the Qwen3-style JSON envelope. Co-authored-by: Cursor <cursoragent@cursor.com> * Bump n_predict in the Qwen3.5 basic and multi-turn integration tests so the embedded chat-template's reasoning block has room to finish before the answer on slower CI backends. Co-authored-by: Cursor <cursoragent@cursor.com> * Enable coopmat and point to the latest fabric. Signed-off-by: Marcus Edel <marcus.edel@collabora.com> * Route Qwen3.5 inference and all finetuning on Mali to CPU, disable Vulkan coopmat at build time, halve mobile finetune workload to account for CPU training. Signed-off-by: Marcus Edel <marcus.edel@collabora.com> * Point to the latest fabric version. Signed-off-by: Marcus Edel <marcus.edel@collabora.com> * Force Bert to the CPU on Mali. Signed-off-by: Marcus Edel <marcus.edel@collabora.com> * Run finetuning on Mali GPU. Signed-off-by: Marcus Edel <marcus.edel@collabora.com> * Run Qwen 3.5 on Mali GPU. 
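The PR text shows only the shape of the Qwen3.5 envelope (`<tool_call><function=...><parameter=...>...</parameter></function></tool_call>`), not the helper that parses it. Here is a minimal regex-based sketch, assuming parameter values may be wrapped in newlines and should be trimmed. `parseQwenXmlToolCalls` is a hypothetical name. It is not the test suite's existing `parseXmlToolCall`.

```typescript
// One parsed call: the function name plus its raw (trimmed) parameter values.
interface XmlToolCall {
  name: string;
  arguments: Record<string, string>;
}

// Parse every <tool_call><function=NAME><parameter=KEY>VALUE</parameter>...</function></tool_call>
// envelope in a completion. Whitespace and newlines between tags are tolerated.
function parseQwenXmlToolCalls(text: string): XmlToolCall[] {
  const calls: XmlToolCall[] = [];
  const callRe = /<tool_call>\s*<function=([^>\s]+)>([\s\S]*?)<\/function>\s*<\/tool_call>/g;
  let call: RegExpExecArray | null;
  while ((call = callRe.exec(text)) !== null) {
    const args: Record<string, string> = {};
    const paramRe = /<parameter=([^>\s]+)>([\s\S]*?)<\/parameter>/g;
    let param: RegExpExecArray | null;
    while ((param = paramRe.exec(call[2])) !== null) {
      args[param[1]] = param[2].trim();
    }
    calls.push({ name: call[1], arguments: args });
  }
  return calls;
}

const qwenOutput =
  "<tool_call>\n<function=get_weather>\n<parameter=city>\nParis\n</parameter>\n" +
  "<parameter=unit>\ncelsius\n</parameter>\n</function>\n</tool_call>";
console.log(JSON.stringify(parseQwenXmlToolCalls(qwenOutput)));
// [{"name":"get_weather","arguments":{"city":"Paris","unit":"celsius"}}]
```

The lazy `[\s\S]*?` groups keep two back-to-back calls in one completion from being merged into a single match.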
Signed-off-by: Marcus Edel <marcus.edel@collabora.com> * Point to the latest fabric version and enable coopmat path. Signed-off-by: Marcus Edel <marcus.edel@collabora.com> * vcpkg: drop per-package qvac-fabric overlays Removes the qvac-fabric overlay-ports infrastructure from the LLM, Embed, and NMT manifests. The default-registry baseline is left untouched, so vcpkg now resolves qvac-fabric directly from the registry at the existing baseline (7248.2.3). Bumping to fabric 8189.0.0 will be handled by a separate baseline update; this commit only undoes the overlay-based development setup that was no longer needed. - vcpkg-configuration.json (3x): drop "overlay-ports" entry. - vcpkg/ports/qvac-fabric/ (3x): remove overlay portfile.cmake, vcpkg.json, and android-vulkan-version.cmake. Co-authored-by: Cursor <cursoragent@cursor.com> * vcpkg: bump qvac-fabric version constraint to 8189.0.0 Updates the consumer manifests in the LLM, Embed, and NMT packages to require qvac-fabric >= 8189.0.0. The default-registry baseline is intentionally left untouched. Co-authored-by: Cursor <cursoragent@cursor.com> * llm/embed/nmtcpp: bump versions for qvac-fabric 8189.0.0 - qvac-lib-infer-llamacpp-llm: 0.19.2 -> 0.20.0 (minor) - qvac-lib-infer-llamacpp-embed: 0.15.0 -> 0.16.0 (minor) - qvac-lib-infer-nmtcpp: 2.1.1 -> 3.0.0 (major) The nmtcpp major bump reflects a real behavioural regression: the previous overlay built ggml unconditionally with every GPU backend the platform supported (Vulkan/Metal/OpenCL); switching to the upstream registry port with the existing "default-features": false in nmtcpp's vcpkg.json now disables the new "gpu-backends" feature, so out-of-the-box ggml exposes only the CPU backend. Consumers that rely on GPU-accelerated nmt inference must add '"features": ["gpu-backends"]' to the qvac-fabric block of their nmtcpp build manifest. CHANGELOG entries added in all three packages. 
Co-authored-by: Cursor <cursoragent@cursor.com> * nmtcpp: opt into qvac-fabric gpu-backends feature; downgrade bump to 2.2.0 The previous commit (3.0.0) flagged a breaking change: switching from the always-on overlay to the registry port with default-features:false disabled GPU backends in ggml. Adding "features": ["gpu-backends"] to nmtcpp's qvac-fabric dep restores the previous Vulkan/Metal/OpenCL behaviour, so the bump is now a non-breaking minor (2.2.0) and the BREAKING note in the changelog is replaced with a plain Changed entry. Co-authored-by: Cursor <cursoragent@cursor.com> * nmtcpp: re-bump to 3.0.0 (major) Restores the major version bump for nmtcpp. The new fabric port schema (features split between gpu-backends/llama) and the move from a vendored overlay to the upstream registry are large enough downstream changes that consumers should treat this as a major release, even though runtime behaviour is preserved by opting into "gpu-backends". Co-authored-by: Cursor <cursoragent@cursor.com> * vcpkg: pin qvac-fabric to >=8189.0.0#1 The 8189.0.0 (port-version 0) qvac-fabric port shipped a configure-time bug for consumers without the "llama" feature (i.e. nmtcpp): -DLLAMA_MTMD=ON was passed unconditionally, which transitively enables LLAMA_BUILD_COMMON, which makes upstream call license_generate(common) -- but BUILD_LLAMA=OFF skips defining the 'common' target, so the cmake configure aborts. The fix landed in tetherto/qvac-registry-vcpkg#136 as qvac-fabric port-version 1. Bumping the consumer constraint from "version>=": "8189.0.0" to "version>=": "8189.0.0#1" forces vcpkg to pick the fixed port-version (otherwise it picks the lowest satisfying version, which is the broken #0). Validated: nmtcpp arm64-android cross-build now configures and builds end-to-end against the upstream registry, no overlay needed. 
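The `8189.0.0#1` pin works because of how vcpkg picks versions. Below is a simplified TypeScript model, assuming vcpkg's minimum-version selection: the baseline and every `version>=` constraint act as floors, and vcpkg resolves to the lowest version that satisfies all of them, which is the largest floor (assuming that version exists in the registry). Only the dotted-numeric scheme with an optional `#port-version` suffix is modelled. vcpkg's other version schemes are ignored, and the function names are hypothetical.

```typescript
// A dotted numeric version plus an optional "#N" port-version suffix,
// e.g. "8189.0.0" (port-version 0) or "8189.0.0#1".
interface PortVersion {
  parts: number[];
  portVersion: number;
}

function parsePortVersion(v: string): PortVersion {
  const hash = v.indexOf("#");
  const version = hash < 0 ? v : v.slice(0, hash);
  const portVersion = hash < 0 ? 0 : Number(v.slice(hash + 1));
  return { parts: version.split(".").map(Number), portVersion };
}

// Negative if a < b, positive if a > b, zero if equal. The port-version only
// breaks ties between otherwise identical versions.
function comparePortVersions(a: string, b: string): number {
  const pa = parsePortVersion(a);
  const pb = parsePortVersion(b);
  const n = Math.max(pa.parts.length, pb.parts.length);
  for (let i = 0; i < n; i++) {
    const x = i < pa.parts.length ? pa.parts[i] : 0;
    const y = i < pb.parts.length ? pb.parts[i] : 0;
    if (x !== y) return x - y;
  }
  return pa.portVersion - pb.portVersion;
}

// Minimum version selection: the baseline and every "version>=" are floors,
// so the lowest version satisfying all of them is the largest floor.
function resolveVersion(baseline: string, minimums: string[]): string {
  return minimums.reduce((best, v) => (comparePortVersions(v, best) > 0 ? v : best), baseline);
}

console.log(resolveVersion("7248.2.3", ["8189.0.0"]));   // 8189.0.0 (port-version 0, the broken port)
console.log(resolveVersion("7248.2.3", ["8189.0.0#1"])); // 8189.0.0#1 (the fixed port)
```

In this model, a bare `8189.0.0` floor can never reach port-version 1 on its own, because nothing pushes the selection above the lowest satisfying version.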
Co-authored-by: Cursor <cursoragent@cursor.com> * docs: drop overlay-removal note from changelogs Removes the changelog bullet describing the deletion of the per-package qvac-fabric vcpkg overlay. The overlay teardown is mechanical packaging plumbing rather than a user-facing change worth documenting. Co-authored-by: Cursor <cursoragent@cursor.com> * test/llm: restore AfriqueGemma integration tests (desktop-only) Reverts e257a19's deletion of the afriquegemma-edge-cases and afriquegemma-translation integration tests, and adds a 'desktopOnly' opt-out so they're skipped on mobile without breaking the per-test group coverage invariant. - packages/qvac-lib-infer-llamacpp-llm/test/integration/afriquegemma-edge-cases.test.js: restored. - packages/qvac-lib-infer-llamacpp-llm/test/integration/afriquegemma-translation.test.js: restored. - test/mobile/test-groups.json: new top-level "desktopOnly" array listing runAfriquegemmaEdgeCasesTest and runAfriquegemmaTranslationTest. - scripts/generate-mobile-integration-tests.js: validateGroups now reads the desktopOnly list; entries are still emitted into integration.auto.cjs (so validate-mobile-tests stays happy) but excluded from the per-platform "missing" check, so the mobile runners never invoke them. - test/mobile/integration.auto.cjs: regenerated by `npm run test:mobile:generate`. - CHANGELOG note in qvac-lib-infer-llamacpp-llm under Tests. Validated via `npm run test:mobile:generate` + `npm run test:mobile:validate`. Co-authored-by: Cursor <cursoragent@cursor.com> * docs(llm): drop AfriqueGemma test restoration changelog note Co-authored-by: Cursor <cursoragent@cursor.com> * test/llm: switch AfriqueGemma desktop-only skip to in-test pattern Per review: don't change generate-mobile-integration-tests.js. 
Use the same skip:isMobile pattern other tests already use (config-parameters, tool-calling, image), and keep the AfriqueGemma functions in the iOS lightA / Android groupA groups so the existing per-test coverage invariant stays intact. - packages/qvac-lib-infer-llamacpp-llm/scripts/generate-mobile-integration-tests.js: reverted to upstream/main (drops the desktopOnly opt-out plumbing). - test/mobile/test-groups.json: drops 'desktopOnly', adds runAfriquegemmaEdgeCasesTest and runAfriquegemmaTranslationTest back to ios.lightA and android.groupA. - test/integration/afriquegemma-edge-cases.test.js, test/integration/afriquegemma-translation.test.js: add isMobile = platform === 'ios' || platform === 'android', and skip:isMobile to every test() options object (13 total). - test/mobile/integration.auto.cjs: regenerated. Validators both green: npm run test:mobile:generate -> "all tests assigned for every platform" npm run test:mobile:validate -> ok Co-authored-by: Cursor <cursoragent@cursor.com> * test/llm: skip ocr-lighton on mobile Adds skip:isMobile to the single test in ocr-lighton.test.js, matching the AfriqueGemma / config-parameters / tool-calling pattern. isMobile is already defined in this file. The test stays in ios.heavy4 / android.groupB so per-platform group coverage is unaffected; the brittle test itself just skips on mobile. Co-authored-by: Cursor <cursoragent@cursor.com> * ci: revert workflow timeout change for llm mobile integration Drops PR tetherto#1874's edit to .github/workflows/integration-mobile-test-qvac-lib-infer-llamacpp-llm.yml (parameterised jobTimeoutMinutes + 90-minute override for Android GroupB). Workflow is restored to the upstream/main version. 
Co-authored-by: Cursor <cursoragent@cursor.com> * addons: disable flash-attn by default on the OpenCL backend Flash attention is not reliably supported by the OpenCL ggml backend (Adreno path), so when the chosen GPU backend ends up being OpenCL the addons now force "flash-attn=off" unless the user explicitly passed flash-attn / flash_attn in their config. LLM (LlamaModel.cpp / LlamaModel.hpp): - Add a bool isOpenCl parameter to tuneConfigMap (defaulted to false to keep the existing test_tune_config_map.cpp call sites working). - Mirror the BitNet-disabling branch with an else-if for OpenCL + notUserSet("flash-attn", "flash_attn"). - At the call site, read chosenBackend.first/second after chooseBackend returns and pass isOpenCl through. Embed (BertModel.cpp): - No tuneConfigMap equivalent here. Inject the same logic inline immediately after chooseBackend, before configFilemap is serialised into configVector. Honour user-set "flash-attn"/"flash_attn". Both packages compile cleanly via bare-make build on macOS-arm64. Co-authored-by: Cursor <cursoragent@cursor.com> * fixup! tuneConfigMap: keep ABI for existing 4-arg test callers CI failure on cpp-tests-darwin-arm64 (PR tetherto#1874): test/unit/test_tune_config_map.cpp:199:43: fatal error: no viable conversion from 'FtOverrides' to 'bool' The previous commit inserted bool isOpenCl as the 4th parameter of tuneConfigMap, but several existing tests pass FtOverrides{...} as the 4th positional argument (relying on it being finetuneOverrides). Swap the order so the new isOpenCl parameter comes after the existing finetuneOverrides; both stay defaulted, so all old 3-arg and 4-arg call sites compile unchanged. The production call site in LlamaModel.cpp is updated accordingly. 
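The OpenCL default amounts to a small rule: on OpenCL, set `flash-attn` to `off` unless the user already set `flash-attn` or `flash_attn`. Below is a TypeScript model of that rule, assuming the config is a flat string-to-string map. The real logic is C++ (`tuneConfigMap` in the LLM addon, inline after `chooseBackend` in `BertModel.cpp`), and `applyOpenClFlashAttnDefault` is a hypothetical name. The assertions mirror the four new `TuneConfigMapTest` cases.

```typescript
// Config as flat string key/value pairs (an assumption about the map's shape).
type ConfigMap = Record<string, string>;

// Force flash attention off on OpenCL unless the user set it under either
// spelling; every other backend leaves the config untouched.
function applyOpenClFlashAttnDefault(config: ConfigMap, isOpenCl: boolean): ConfigMap {
  const tuned: ConfigMap = { ...config };
  const userSet = "flash-attn" in config || "flash_attn" in config;
  if (isOpenCl && !userSet) {
    tuned["flash-attn"] = "off";
  }
  return tuned;
}

console.log(JSON.stringify(applyOpenClFlashAttnDefault({ "ctx-size": "4096" }, true)));
// {"ctx-size":"4096","flash-attn":"off"}
```

Returning a copy rather than mutating the input keeps the user's original config available for later checks.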
Also adds 4 new TuneConfigMapTest cases covering the OpenCL branch: - OpenCl_NonBitnet_FlashAttnDisabledByDefault - OpenCl_UserSetFlashAttnHyphen_Respected - OpenCl_UserSetFlashAttnUnderscore_Respected - NotOpenCl_NonBitnet_FlashAttnUnchanged All 53 TuneConfigMapTest cases pass locally on macOS-arm64. Co-authored-by: Cursor <cursoragent@cursor.com> * Add Qwen 3.5 vision test. Signed-off-by: Marcus Edel <marcus.edel@collabora.com> * Route vision models with mmproj to CPU on Apple M1. Signed-off-by: Marcus Edel <marcus.edel@collabora.com> * Route only the projector to CPU on Apple M1. Signed-off-by: Marcus Edel <marcus.edel@collabora.com> * Run qwen3-5.test.js on iOS GPU. * JS lint. * Recognize Gemma 4 channel reasoning markers in Qwen3ReasoningUtils, and bump gemma4 basic-test n_predict so the answer fits after the thinking preamble. Signed-off-by: Marcus Edel <marcus.edel@collabora.com> * Wire reasoning-budget config to inputs.enable_thinking so passing reasoning-budget=0 disables the model's <think> reasoning channel, and add coverage for Qwen3, Qwen3.5, and Gemma 4. Signed-off-by: Marcus Edel <marcus.edel@collabora.com> * vcpkg: bump qvac-fabric to >=8189.0.1 The 8189.0.1 port (tetherto/qvac-registry-vcpkg#138) drops port-version 1's BUILD_LLAMA=OFF portfile workaround and ships the new fabric tip 739b309ae. Notable upstream fixes pulled in: - Inject enable_thinking into the Jinja template context so Qwen 3.5 and Gemma 4 actually emit <think> reasoning content. - GGML_OP_DELTA_NET_AR Vulkan compute shader (Qwen 3.5 / DeltaNet decode no longer falls back to CPU per token). - vulkan: f32 src1 strided cpy fix (embedding-model crash). Validated on macOS-arm64: vcpkg resolves qvac-fabric[core,gpu-backends,llama]:arm64-osx@8189.0.1 and the addon builds end-to-end. Co-authored-by: Cursor <cursoragent@cursor.com> * Disable the embed addon's BERT-on-Mali CPU override.
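The reasoning-budget commits describe a small contract:
- `-1` (the default) leaves thinking on.
- `0` turns `enable_thinking` off.
- The later per-request override in `run()` rejects any other value, and `applyGenerationParamsToContext` snapshots, overrides, and restores the load-time value.

The TypeScript sketch below models only that contract. `ContextParams`, `validateReasoningBudget`, `enableThinking`, and `runWithReasoningBudget` are hypothetical names. The real wiring is C++ (`AddonJs::runJob`, `applyGenerationParamsToContext`, `tokenizeChat`).

```typescript
// Hypothetical subset of the context parameters touched by the override.
interface ContextParams {
  n_predict: number;
  reasoning_budget: number; // -1 = default (thinking allowed), 0 = disabled
}

// Only -1 and 0 are accepted, as described for the per-request override.
function validateReasoningBudget(value: unknown): -1 | 0 {
  if (value === -1) return -1;
  if (value === 0) return 0;
  throw new RangeError(`reasoning_budget must be -1 or 0, got ${String(value)}`);
}

// reasoning_budget = 0 maps to enable_thinking = false in the template inputs.
function enableThinking(params: ContextParams): boolean {
  return params.reasoning_budget !== 0;
}

// Apply a per-request override, run the request, then restore the load-time
// value so later requests are unaffected (validation happens before mutation).
function runWithReasoningBudget<T>(params: ContextParams, override: unknown, run: () => T): T {
  const budget = validateReasoningBudget(override);
  const saved = params.reasoning_budget;
  params.reasoning_budget = budget;
  try {
    return run();
  } finally {
    params.reasoning_budget = saved;
  }
}

const ctxParams: ContextParams = { n_predict: 512, reasoning_budget: -1 };
const thinkingDuringRequest = runWithReasoningBudget(ctxParams, 0, () => enableThinking(ctxParams));
console.log(thinkingDuringRequest, ctxParams.reasoning_budget); // false -1
```

Validating before touching `params` means a rejected value never leaves the context half-modified.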
Signed-off-by: Marcus Edel <marcus.edel@collabora.com> * Prepend <think> opener to the visible stream when the chat template force-opens the reasoning channel. Signed-off-by: Marcus Edel <marcus.edel@collabora.com> * Remove the Mali detection plumbing from the embed addon now that BERT runs on Mali GPU. Signed-off-by: Marcus Edel <marcus.edel@collabora.com> * Bump n_predict and ctx_size in the Qwen3.5 reasoning-budget baseline so the model reliably reaches </think>. Signed-off-by: Marcus Edel <marcus.edel@collabora.com> * Restore the mobile finetune dataset to 8 samples. Signed-off-by: Marcus Edel <marcus.edel@collabora.com> * test: drop AfriqueGemma + MedGemma + Dolphin-MoE tests Per review: cull tests that exercise models we no longer want covered in the LLM/SDK CI matrix. LLM (packages/llm-llamacpp): - Delete integration tests: - test/integration/afriquegemma-edge-cases.test.js - test/integration/afriquegemma-translation.test.js - test/integration/moe.test.js (dolphin-mixtral-2x7b) - Delete docs/afriquegemma-translation.md (only documents the now-removed integration tests). - Strip the medgemma-4b-it variant from: - test/integration/tool-calling.test.js (collapses ALL_TOOL_MODEL_VARIANTS / TOOL_MODEL_VARIANTS to qwen3-1.7b only, drops the now-unused isMobile derived var). - test/integration/finetuning-pause-resume.test.js (drops the medgemma-4b-it-q4_0 entry from FINETUNE_MODELS). - test/unit/test_model_metadata.cpp: drop the gemma3Model_ fixture + the two Gemma3-specific TEST_F cases (DiskSingleFile_Gemma3Arch_*); update the comment block listing exercised arches accordingly. - test/unit/pick-primary-gguf-path.test.js: keep the tensors.txt-first ordering test, but rebase the fixture filenames on Qwen3-4B-Q4_K_M-* so no medgemma names remain in the test corpus. 
- test/mobile/test-groups.json + test/mobile/integration.auto.cjs: drop runAfriquegemmaEdgeCasesTest, runAfriquegemmaTranslationTest, runMoeTest from both ios and android groups; auto.cjs trimmed to match. `validate-mobile-tests.js` is green. SDK (packages/sdk/tests-qvac): - Delete tests/translation-afriquegemma-tests.ts. - tests/test-definitions.ts: drop translationAfriquegemmaTests import + spread. - tests/shared/executors/translation-executor.ts: drop the import, the spread, and the |afriquegemma branch from the dispatch regex. - tests/mobile/consumer.ts + tests/desktop/consumer.ts: drop the AFRICAN_4B_TRANSLATION_Q4_K_M import and the resources.define("afriquegemma", ...) block; mobile also drops the afriquegemma-only SkipExecutor. - tests/shared/resource-lifecycle.ts: rephrase the eviction-comment example to a generic "large translation model" so it no longer references the deleted resource. Not touched: NOTICE/CHANGELOG (auto-generated/historical), sdk/models/registry/* (model constants in the registry are data, not tests), sdk/examples/translation/translation-llm-afriquegemma.ts (consumer-facing example, not a test). * Revert "test: drop AfriqueGemma references from packages/sdk/tests-qvac" Per review: keep packages/sdk/tests-qvac/ untouched. Restore the SDK afriquegemma test file, the test-definitions / translation-executor / desktop+mobile consumer / resource-lifecycle edits to their state prior to commit 36de6ec. Only the LLM-side cull (packages/llm-llamacpp + the deleted afrique / moe / medgemma test files there) from 36de6ec is kept. * Restore packages/llm-llamacpp/docs/afriquegemma-translation.md Per review: keep the AfriqueGemma translation doc. Commit 36de6ec removed it together with the LLM AfriqueGemma test files; restore it unchanged from the merge tip (e29836d). 
* chore: pin qvac-fabric to 8189.0.2 via overlay-ports for testing Adds an overlay port copy of qvac-fabric pointing at v8189.0.2 of tetherto/qvac-fabric-llm.cpp (tetherto/qvac-registry-vcpkg#140) to llm-llamacpp, embed-llamacpp, and translation-nmtcpp, declared via each package's vcpkg-configuration.json. Lets this PR exercise the new fabric build (incl. the Mali coopmat1 BitNet TQ NaN fix) without waiting for the registry baseline bump. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore: pin overlay qvac-fabric to temp-8189 tip f686a1324 Point REF at the latest qvac-fabric-llm.cpp temp-8189 commit (f686a1324e13184d3257cb74c1ba17f9cf8ef575) instead of v8189.0.2 so the overlay tracks branch tip while the branch is still moving. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * ci: extend Android LLM mobile test timeouts Allow slower Android Device Farm runs to finish model-heavy LLM tests before the harness marks them as timed out. Co-authored-by: Cursor <cursoragent@cursor.com> * vcpkg: drop qvac-fabric overlay-ports, bump version>= to 8189.0.2 tetherto/qvac-registry-vcpkg#140 publishes qvac-fabric@8189.0.2 in the default registry, so the temporary per-package overlay we used while the new fabric build was still being shaken out is no longer necessary. For llm-llamacpp, embed-llamacpp, and translation-nmtcpp: - Delete `packages/<pkg>/vcpkg/ports/qvac-fabric/` (portfile.cmake, vcpkg.json, android-vulkan-version.cmake) — the overlay copy. - Drop the `overlay-ports` entry from each package's vcpkg-configuration.json. The `default-registry` baseline is left untouched intentionally; the `version>=` constraints below are what forces vcpkg to resolve to the new fabric revision against the unchanged baseline. - Bump the `qvac-fabric` `version>=` pin from `8189.0.1` -> `8189.0.2` in each package's vcpkg.json. 
Co-authored-by: Cursor <cursoragent@cursor.com> * refactor(llm): drop dead sawMali plumbing from BackendSelection `sawMali` was threaded through `emplaceIfValidDevice` / `tryEmplaceDevice` / `chooseBackend` but never read by any caller — leftover from the earlier "Force BERT/Qwen3.5 to CPU on Mali" iterations. The embed-side cleanup already landed in 2ac5de0 ("Remove the Mali detection plumbing from the embed addon now that BERT runs on Mali GPU."); this finishes the symmetric removal on the LLM side. `sawAppleM1` plumbing is preserved unchanged. Co-authored-by: Cursor <cursoragent@cursor.com> * docs(llm): explain why MtmdLlmContext skips inside_reasoning flip TextLlmContext flips reasoningState_.inside_reasoning = true alongside the forced "<think>\n" opener; MtmdLlmContext doesn't because it doesn't carry a reasoningState_ today. Add an inline note so the asymmetry isn't read as a bug, and point at the symmetric site to update if reasoning-aware EOS replacement is later added on the multimodal path. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(llm): narrow tool-call args quoter to leading bare key only The previous post-generation regex (`([{,])(\s*)([A-Za-z_]…)(\s*):` -> quote the ident) was too broad: it also matched `, ident:` substrings sitting inside JSON string values, so a tool call with a free-form string argument like `{"query":"phase one, step: validate"}` came out corrupted as `{"query":"phase one, "step": validate"}`, which then failed JSON.parse on the consumer side. In practice the rewrite is only needed for one upstream quirk: the Gemma 4 parser's `gemma4_args_to_json` (common/chat-parser.cpp) uses an `at_key_start()` helper that peeks backwards in the output buffer for a `{`/`,` -- so the very first top-level key is left bare while every nested or post-comma key is already quoted. All other tool dialects reach us via `json::dump()` upstream and already start with a quoted key. 
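The corruption and the anchored fix can be reproduced in a few lines of TypeScript. The commit elides the exact identifier character class (`[A-Za-z_]…`), so `[A-Za-z_][A-Za-z0-9_]*` below is an assumption, and `quoteBroad` / `quoteLeadingKey` are hypothetical names for the C++ post-processing described here. Run on the adversarial argument from the commit, the broad pattern yields exactly the invalid JSON quoted in it. The `^\{`-anchored pattern leaves the string alone and only quotes a bare first key.

```typescript
// Assumed identifier pattern; the commit message elides the exact class.
const IDENT = "[A-Za-z_][A-Za-z0-9_]*";

// Broad rewrite: quote any bare identifier that follows '{' or ','.
// It also fires inside string values such as "phase one, step: validate".
const broad = new RegExp(`([{,])(\\s*)(${IDENT})(\\s*):`, "g");
function quoteBroad(args: string): string {
  return args.replace(broad, '$1$2"$3"$4:');
}

// Anchored rewrite: only the very first key, right after the opening '{'.
const anchored = new RegExp(`^\\{(\\s*)(${IDENT})(\\s*):`);
function quoteLeadingKey(args: string): string {
  return args.replace(anchored, '{$1"$2"$3:');
}

const risky = '{"query":"phase one, step: validate"}';
console.log(quoteBroad(risky));                  // {"query":"phase one, "step": validate"}
console.log(quoteLeadingKey(risky));             // {"query":"phase one, step: validate"}
console.log(quoteLeadingKey('{query:"Paris"}')); // {"query":"Paris"}
```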
Replace the broad regex with one anchored at `^\{(\s*)<ident>\s*:`, which fixes exactly that single leading-bare-key case and cannot match anywhere inside a JSON string value. Verified end-to-end on linux-x64 against gemma-4-E2B-it-Q8_0 (CPU): - Adversarial prompt forcing `phase one, step: validate` as a tool arg string: baseline produced invalid JSON `{"query":"phase one, "step": validate"}` (parse fail at pos 55); this fix yields `{"query":"phase one, step: validate"}` and the test passes 7/7 assertions. - Existing simple-args happy path (`get_weather` with city/unit) still passes 5/5. Co-authored-by: Cursor <cursoragent@cursor.com> * revert(llm): drop synthetic <tool_call>{json}</tool_call> post-processing Each model now streams only its own native tool-call dialect: - Qwen3 / Hermes: <tool_call>{json}</tool_call> (already canonical) - Qwen3.5: <tool_call><function=name><parameter=k>v</parameter></function></tool_call> - Gemma 4: <|tool_call>call:NAME{key:<|"|>val<|"|>,...}<tool_call|> - Mistral, DeepSeek-R1, Functionary, GPT-OSS, etc. emit their own markers. The previous PR added a post-generation common_chat_parse pass that appended a uniform <tool_call>{json}</tool_call> envelope for every detected call. That duplicated tokens for Hermes-shape models (the envelope is already in the native stream) and inflated Gemma 4 output by ~14% with two synthetic copies per call. The leading-bare-key handling for Gemma 4's tc.arguments was also a constant source of sharp edges (broad regex corrupted string values containing ", ident:"; narrow anchored regex still required follow-up). Per-dialect parsing belongs at the SDK consumer layer, not in the addon. Removed: - Post-generation block in LlamaModel::processPromptImpl (synthesizer). - needsOutputCapture widening to include !resolved.tools.empty(). - LlmContext::getLastChatFormat() virtual. - lastChatFormat_ members + overrides in TextLlmContext, MtmdLlmContext. - common_chat_format* outFormat parameter from getPrompt(). 
- <regex> include in LlamaModel.cpp (no remaining users). Kept: - outThinkingForcedOpen mechanism (independent reasoning-channel feature). - toolsCompact_ controller and KV-cache trim logic. - All other PR work. Validated on linux-x64/CPU after incremental rebuild: - Gemma 4 (gemma-4-E2B-it-Q8_0): 6/6 asserts pass with native-dialect parser, no synthetic envelope leaks, output 941 chars (down from ~1100 with synthesizer). - Qwen3.5 (Qwen3.5-0.8B-Q8_0): 5/5 asserts pass with the existing parseXmlToolCall path, output 394 chars. Co-authored-by: Cursor <cursoragent@cursor.com> * test(llm): parse Gemma 4 native tool-call dialect in gemma4.test.js Without the synthetic <tool_call>{json}</tool_call> envelope reverted in the previous commit, Gemma 4 emits its own dialect: <|tool_call>call:NAME{key:<|"|>val<|"|>,...}<tool_call|> Strings are wrapped in <|"|>...<|"|> instead of "...", keys are bare, and the closing tag is <tool_call|> (trailing pipe, no slash). extractToolCalls now matches that shape directly and returns { name, argsRaw }. argsContainStringValue() helper checks the args body for a Gemma-4-quoted string literal. Substring-based assertion is sufficient to verify the model called the right tool with the right argument values; full dialect-to-JSON conversion lives upstream in fabric's gemma4_args_to_json and is not the addon test's job. qwen3-5.test.js was unchanged: Qwen3.5 wraps its <function=name> <parameter=k>v</parameter></function> XML in <tool_call>...</tool_call> natively, so the existing parseXmlToolCall path keeps working. Validated on linux-x64/CPU against gemma-4-E2B-it-Q8_0: 4/4 tests, 13/13 asserts (3 synthetic-input parser sanity checks + 1 live LLM run). 
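The Gemma 4 dialect described above (`<|tool_call>call:NAME{key:<|"|>val<|"|>,...}<tool_call|>`) can be matched with one regex plus a substring check. Below is a sketch in that spirit, assuming the arguments body never contains the literal closing sequence `}<tool_call|>`. `extractGemma4ToolCalls` and `argsContainQuotedValue` are hypothetical stand-ins, not the test's actual `extractToolCalls` / `argsContainStringValue`.

```typescript
interface Gemma4ToolCall {
  name: string;
  argsRaw: string; // body between the braces, still in Gemma 4 quoting
}

// <|tool_call>call:NAME{...}<tool_call|>  (note the trailing-pipe close tag)
const GEMMA4_CALL_RE = /<\|tool_call>call:([^{\s]+)\{([\s\S]*?)\}<tool_call\|>/g;

function extractGemma4ToolCalls(text: string): Gemma4ToolCall[] {
  const calls: Gemma4ToolCall[] = [];
  GEMMA4_CALL_RE.lastIndex = 0; // the regex is global and shared across calls
  let m: RegExpExecArray | null;
  while ((m = GEMMA4_CALL_RE.exec(text)) !== null) {
    calls.push({ name: m[1], argsRaw: m[2] });
  }
  return calls;
}

// Gemma 4 wraps string literals in <|"|>...<|"|> instead of double quotes.
function argsContainQuotedValue(argsRaw: string, value: string): boolean {
  return argsRaw.indexOf(`<|"|>${value}<|"|>`) !== -1;
}

const gemmaOutput = '<|tool_call>call:get_weather{city:<|"|>Paris<|"|>,unit:<|"|>celsius<|"|>}<tool_call|>';
for (const call of extractGemma4ToolCalls(gemmaOutput)) {
  console.log(call.name, argsContainQuotedValue(call.argsRaw, "Paris"));
}
```

As the commit notes, a substring check is enough to confirm the right tool and argument values. Full conversion of the dialect to JSON belongs upstream.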
Co-authored-by: Cursor <cursoragent@cursor.com> * revert(llm): drop Apple M1 detection + projector-CPU routing The PR added an Apple-M1-specific code path that detected the chip via the GPU description string and routed `params.mmproj_use_gpu = false` so the vision projector ran on CPU instead of Metal, working around a SIGSEGV in the projector's image-encoding kernel observed on M1 Metal at the time. Re-tested on M1 with the current fabric tip: no SIGSEGV, projector runs fine on Metal end-to-end. The carve-out is no longer needed. Removed: - BackendSelection: `isAppleM1Device()` helper, `bool& sawAppleM1` plumbing through `emplaceIfValidDevice` / `tryEmplaceDevice` / `chooseBackend`, and `bool* outSawAppleM1` parameter on both `chooseBackend` overloads. - LlamaModel: the `bool sawAppleM1 = false` local, the call-site argument, and the `params.mmproj_use_gpu = !sawAppleM1` ternary; mmproj now uses GPU on every desktop platform (Android still hardcoded to false). - test_backend_selection.cpp: `APPLE_M{1,2,3,4}_DESC` constants, `chooseBackendWithM1Flag()` helper, and the four `AppleM*_*` test cases. - gemma4.test.js / qwen3-5.test.js: the comment blocks describing the M1 carve-out; `useCpuForVision` semantics are unchanged (`useCpu || isMobile` on gemma4 and `useCpu` on qwen3-5). Verified on linux-x64/CPU after rebuild: 148/148 C++ unit tests pass (BackendSelectionTest, TuneConfigMapTest, ChatTemplateUtilsTest). Co-authored-by: Cursor <cursoragent@cursor.com> * revert(llm): drop dead Gemma 4 markers from updateQwen3ReasoningBuffer The PR added two extra substring scans for Gemma 4's reasoning channel markers (<|channel>thought open, <channel|> close) to updateQwen3ReasoningBuffer. The intent was to extend the EOS-rescue path (handleQwen3ReasoningEOS rewrites EOS-while-thinking into a closing tag) to Gemma 4. 
That never actually fires though: both the buffer-update call and the EOS-rescue call in TextLlmContext are gated by `if (isQwen3Model_)`, and isQwen3Model_ resolves to `general.architecture == "qwen3"` only. Gemma 4 reports architecture "gemma4", so the gate never opens, the markers never get scanned, and the rescue path never runs for Gemma 4. In live runs Gemma 4 always emits <channel|> cleanly before <eos>, so the rescue isn't needed on the happy path; if Gemma 4 ever truncates mid-thought under context pressure we will need a real dialect-aware rescue (per-arch close-tag token + extended gate) and a follow-up will add that. For this PR we just want the dead code gone so it doesn't mislead future readers about what's actually wired up. Net: -9 lines, file is now identical to upstream main. Co-authored-by: Cursor <cursoragent@cursor.com> * test(llm): switch gemma4 fixtures from unsloth to bartowski The unsloth GGUF pack (huggingface.co/unsloth/gemma-4-E2B-it-GGUF) tags <turn|> as the EOG token in tokenizer.ggml.eos_token_id and leaves <eos> classified as a regular text token. Gemma 4's training-baked behaviour after assistant content is to emit a few <eos> tokens before <turn|>, so with that pack the addon's generation loop -- which terminates on llama_vocab_is_eog -- doesn't stop until <turn|> arrives. We were observing ~9 spurious <eos> tokens trailing every Gemma 4 response, eating into n_predict and KV cache for no gain. bartowski's GGUF (huggingface.co/bartowski/google_gemma-4-E2B-it-GGUF) ships the exact same vocabulary but tags <eos> as EOG (matching the base google/gemma-4-E2B-it tokenizer config). With that pack the addon terminates on the first <eos> -- empirically 0 trailing tokens, ~30 % shorter completions on the same prompt, same dialect output that the native-dialect parser added in 87e6c35 handles unchanged. 
Verified on linux-x64/CPU (qvac-dev-linux-x64) with the same get_weather tool prompt:
unsloth Q8_0    : 941 chars, 9 trailing <eos>, EOG = {<turn|>, </s>}
bartowski Q4_K_M: 676 chars, 0 trailing <eos>, EOG = {<eos>, </s>}
Note: the unsloth metadata bug deserves an upstream issue against the unsloth pack maintainers; this PR's scope is just to stop our tests paying the wasted-tokens tax. Co-authored-by: Cursor <cursoragent@cursor.com> * test(llm): unblock gemma4 image test on mobile + fix ctx overflow Three changes to packages/llm-llamacpp/test/integration/gemma4.test.js (image-describe subtest):
1. Drop the mobile CPU-vision carve-out. useCpuForVision used to force `device: 'cpu'` on Android/iOS to dodge Adreno OpenCL SIGABRT and Mali Vulkan instability that bit us with the unsloth mmproj. With bartowski's mmproj (now the fixture in 787c3322) we want CI to actually exercise the device-farm GPU code path for vision -- if that path regresses on a real Adreno or Mali chip we want to find out from CI, not by accident in production. Desktop x64-darwin / linux-arm64 keep CPU fallback because those hosts don't have a working GPU stack here.
2. Bump ctx_size 2048 -> 8192. A single elephant.jpg encodes to ~260 mtmd image tokens. With ctx_size=2048 plus Gemma 4's verbose CoT preamble the generation loop overflowed nPast > n_ctx during sampling (MtmdLlmContext.cpp:452), throwing 'processPromptImpl: context overflow'. 8192 leaves comfortable headroom on every backend.
3. Set reasoning-budget=0 for this test. We literally ask the model "Answer in one word" -- the <|channel>thought ...<channel|> CoT preamble that Gemma 4 wants to emit by default is wasted tokens here, and was the actual cause of the overflow above (CoT was running 8k+ tokens before the model reached the one-word answer and emitted <eos>). Disabling thinking gives us a deterministic ~10-token "Elephant" + <eos> response, which is what the substring-based assertion is testing for anyway.
Verified on linux-x64 (qvac-dev-linux-x64, 2x RTX 5090, Vulkan backend) end-to-end:
output: "Elephant"
asserts: 3/3
total time: ~2 s
Co-authored-by: Cursor <cursoragent@cursor.com> * refactor(llm): drop dead selectToolsCompactMarker(string) overload selectToolsCompactMarker(const std::string& architecture) had no production callers anywhere -- only its two unit tests (SelectToolsCompactMarkerForQwen3, SelectToolsCompactMarkerForUnsupportedArchitecture) referenced it. Live production code goes through selectToolsCompactMarkerForModelMetadata (LlamaModel::resolveToolsCompactConfig calls that one), which takes std::optional<std::string> and is the only path that ever reaches the "qwen3" -> "<tool_call>" mapping at runtime. Removed the .cpp definition, the .hpp declaration, and the two unit tests. selectToolsCompactMarkerForModelMetadata is unchanged and still covered by SelectToolsCompactMarkerForModelMetadataUsesArchitecture. ChatTemplateUtilsTest now runs 19/19 tests on linux-x64 (was 21/21). Co-authored-by: Cursor <cursoragent@cursor.com> * test(llm): drop redundant useCpuForVision alias; vision runs on GPU on mobile After we removed the per-mobile CPU carve-out for Gemma 4 vision (commit 2843297) and never had one for Qwen3.5 vision, useCpuForVision was just a no-op alias of useCpu used at exactly one call site each. Inline it. Net effect on the device routing matrix is unchanged but explicit:
platform/arch            useCpu  device used
--------------------------------------------------------
darwin-x64               true    cpu (no working GPU here)
linux-arm64              true    cpu (no working GPU here)
darwin-arm64 (M-series)  false   gpu (Metal)
linux-x64                false   gpu (Vulkan/OpenCL)
ios                      false   gpu (Metal -- device farm)
android                  false   gpu (Adreno OpenCL / Mali Vulkan -- device farm)
So on iOS / Android the gemma4 and qwen3-5 image-describe subtests run through the actual GPU vision path -- the same path users hit -- and will surface any regression from CI rather than from production.
Co-authored-by: Cursor <cursoragent@cursor.com> * docs(llm): correct thinkingForcedOpen_ comment re: gemma4 Gemma4 does not hit this code path: upstream common_chat_params_init_gemma4 explicitly leaves thinking_forced_open unset because gemma4's reasoning channel is model-emitted. Drop the misleading reference and call out the actual templates that trigger this path. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(changelog): refresh PR-1874 entries to reflect actual shipped scope The original CHANGELOG entries for llm-llamacpp 0.20.0, embed-llamacpp 0.16.0, and translation-nmtcpp 3.0.0 were drafted before the synthesizer revert, the M1 / sawMali / dead-code cleanups, the bartowski fixture swap, the native-dialect tool-call parsing, the reasoning-budget knob, the thinkingForcedOpen synthetic-opener, the new integration tests, and the move from 8189.0.0 to 8189.0.2. They now match what the PR actually ships. Compressed every entry to a flat bullet list grouped by Keep-a-Changelog section (Changed / Added / Removed / Fixed / Deprecated / Internals) and bumped the date to 2026-05-10. Co-authored-by: Cursor <cursoragent@cursor.com> * docs(changelog): trim items that round-trip to net-zero in the PR Removed lines that described code that's neither in upstream/main nor in the PR head (so it has no observable impact on consumers): - llm-llamacpp 0.20.0: * "tool-call streaming: each model now streams its native dialect / no re-shaping" -- main already streamed native dialects; the PR-internal synthesizer never shipped, so this is a non-change. * "Dropped sawMali plumbing / Apple-M1 detection / dead Gemma 4 markers in Qwen3ReasoningUtils" -- all three were added and removed inside this PR's commit history; net diff is zero. - embed-llamacpp 0.16.0: * "Dropped Mali-detection plumbing" -- same: added and removed within this PR's history, net diff is zero. Kept genuine net removals against upstream/main: - Qwen3 model-name-based fallback. 
- Dead `selectToolsCompactMarker(std::string)` overload (was pre-existing in main, only ever called from unit tests).

Co-authored-by: Cursor <cursoragent@cursor.com>

* docs(notice): regenerate NOTICE for embed-llamacpp, llm-llamacpp, translation-nmtcpp

Re-ran the notice-generate skill (.cursor/skills/notice-generate) for the three addons whose dependency surfaces changed in this PR:
- qvac-fabric bumped from 7248.x to 8189.0.2 -- different transitive C++ license set.
- holepunch / hyperswarm libs moved to peerDependencies on main, so the JS attribution lists shrink accordingly.
- @qvac/infer-base bumped to 0.4.1.

Per-package C++ resolution after the run:

  embed-llamacpp     : opencl/qvac-fabric/qvac-lib-inference-addon-cpp/qvac-lint-cpp + libc++ (5 deps)
  llm-llamacpp       : the above + picojson + nlohmann-json (7 deps)
  translation-nmtcpp : bergamot-translator/sentencepiece/ssplit/qvac-fabric/qvac-lib-inference-addon-cpp/qvac-lint-cpp + libc++ (7 deps)

Net: +206 / -585 lines across the three NOTICE files (mostly transitive JS attribution shrink from the holepunch peerDeps refactor).

Co-authored-by: Cursor <cursoragent@cursor.com>

* test(llm): make gemma4 reasoning-budget test tolerate model-emitted reasoning

Gemma 4's reasoning channel is model-emitted (no template force-open), so the model decides per-prompt whether to engage reasoning. For trivial prompts like "What is the capital of France?" the model can short-circuit and skip the <|channel>thought…<channel|> markers, which made the test flaky on CI.

Gate the marker / length assertions on the baseline actually emitting the opening marker; if it didn't, log a comment and skip the dependent checks instead of failing.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * types(llm): declare reasoning_budget in LlamaConfig The C++ config parser already accepts `reasoning_budget` (and the kebab-case `reasoning-budget` alias), but neither was a typed property on `LlamaConfig` — they only typechecked via the catch-all index signature. Add a typed entry with JSDoc so TypeScript consumers get autocomplete and the accepted values (-1 default, 0 disabled). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(llm): allow per-request reasoning_budget override in run() `reasoning_budget` was load-time only. Add it to `GenerationParams` so `model.run(messages, { generationParams: { reasoning_budget: 0 } })` can disable reasoning for a single request without re-loading the model — same shape as `temp` / `top_p` / `seed` overrides. Wiring: - `LlmContext::GenerationParams` gains an optional `reasoning_budget` field and `hasOverrides()` covers it. - `applyGenerationParamsToContext` snapshots / overrides / restores `params.reasoning_budget` alongside `n_predict`. - `AddonJs::runJob` parses `generationParams.reasoning_budget` from JS and rejects values other than `-1` or `0`. - `index.d.ts` exposes `reasoning_budget?: -1 | 0` on `GenerationParams` with a JSDoc note. `tokenizeChat` already reads `params_.reasoning_budget`, so no change is needed in `TextLlmContext` / `MtmdLlmContext` — the temporary override naturally propagates to `inputs.enable_thinking`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(llm): cover per-request reasoning_budget override on Qwen3.5 Validates the new per-request `generationParams.reasoning_budget` override end-to-end in two runs against a single loaded model: 1. `reasoning_budget: 0` override suppresses the `<think>…</think>` reasoning markers for that one request. 2. 
The next `run()` with no override restores the load-time default (reasoning enabled), proving the override is request-scoped and not sticky. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(llm): case-insensitive antiprompt substring matching `checkAntiprompt` now lowercases both the recent output window and each antiprompt before the `find()` so a single `Pizza` entry catches the model's `pizza`, `Pizza`, `PIZZA`, etc. Callers no longer need to list every casing variant. Applied identically in `TextLlmContext` and `MtmdLlmContext`. The token-level early-exit path is unchanged (BPE tokens are case-specific; the substring path is the authoritative check). Also drop the stale comment on the `Reverse prompt stops generation` scenario in `config-parameters.test.js`: it claimed the addon split on `,` without trimming, but `LlamaModel.cpp::split()` already trims and drops empty segments. Replaced with a brief note that documents the new (current) behaviour and simplified the antiprompt list to `'network, Pizza, bitcoin, blockchain'` so the test exercises both the trim and the case-insensitive match. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(llm): stress case-insensitive antiprompt with PiZzA mixed-case entry Swap the `Pizza` reverse_prompt entry for `PiZzA`. With case-sensitive matching `PiZzA` would never match the model's `pizza` / `Pizza` output; only case-insensitive comparison fires the stop. Verified locally — the test still completes with output length 5, so the antiprompt trips on the first emitted "Pizza". Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(llm): validate reasoning_budget before truncating to int Address @jpgaribotti's review: previously the value was cast to int *before* the `0` / `-1` check, so fractional inputs like `0.5` or `-1.1` would silently truncate to a "valid" 0 / -1 and pass through. 
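The case-insensitive `checkAntiprompt` change described above (lowercase both the recent output window and each antiprompt, then `find()`) can be sketched as follows. This is an illustration, not the addon's `TextLlmContext` / `MtmdLlmContext` code: the function name, the `windowChars` parameter, and the ASCII-only lowercasing are all assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// ASCII lowercase copy; std::tolower must receive an unsigned char value.
std::string toLowerAscii(std::string_view s) {
  std::string out(s);
  std::transform(out.begin(), out.end(), out.begin(), [](unsigned char c) {
    return static_cast<char>(std::tolower(c));
  });
  return out;
}

// Hypothetical helper: true if any antiprompt occurs, ignoring case, in the
// last `windowChars` characters of the generated output.
bool checkAntipromptCaseInsensitive(std::string_view output,
                                    const std::vector<std::string>& antiprompts,
                                    std::size_t windowChars) {
  const std::size_t start =
      output.size() > windowChars ? output.size() - windowChars : 0;
  const std::string window = toLowerAscii(output.substr(start));
  for (const auto& ap : antiprompts) {
    if (ap.empty()) continue;  // split() already drops empties; be defensive
    if (window.find(toLowerAscii(ap)) != std::string::npos) return true;
  }
  return false;
}
```

A single `PiZzA` entry therefore stops on `pizza`, `Pizza`, or `PIZZA`, which is what the updated `Reverse prompt stops generation` scenario relies on.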
Validate against the exact double values (both `0` and `-1` are exactly representable in IEEE-754, so `==` comparison is safe) before casting to int when storing in `ov.reasoning_budget`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(llm): use std::from_chars for reasoning_budget load-time parse Address @jpgaribotti's review: `std::stoi` silently accepts trailing garbage (`"0abc"` → `0`) and throws an uncaught `std::out_of_range` for inputs that overflow `int`. Switch to `std::from_chars`, which fails clean on non-numeric input, overflow (`errc::result_out_of_range`), and trailing garbage (`ptr != end`), then validate against the allowed `-1` / `0` values in the same check. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Signed-off-by: Marcus Edel <marcus.edel@collabora.com> Co-authored-by: gianni-cor <gianfranco.cordella@tether.io> Co-authored-by: Cursor <cursoragent@cursor.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
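Both `reasoning_budget` fixes come down to "validate the exact input before narrowing to `int`". The sketch below is illustrative rather than the addon's parser, and the function names are hypothetical. The runtime path compares the JS number as a `double` before casting. The load-time path uses `std::from_chars`, requiring both `ec == std::errc{}` and `ptr == end` so that non-numeric input, overflow, and trailing garbage are all rejected.

```cpp
#include <cassert>
#include <charconv>
#include <optional>
#include <string_view>
#include <system_error>

// Runtime override (JS number): check the exact double first, so 0.5 or -1.1
// are rejected instead of truncating to a "valid" 0 / -1.
std::optional<int> reasoningBudgetFromDouble(double v) {
  if (v == 0.0 || v == -1.0) return static_cast<int>(v);  // exact in IEEE-754
  return std::nullopt;
}

// Load-time config (string): from_chars fails cleanly on non-numeric input,
// on int overflow (errc::result_out_of_range) and, via ptr != last, on
// trailing garbage such as "0abc".
std::optional<int> reasoningBudgetFromString(std::string_view s) {
  int value = 0;
  const char* first = s.data();
  const char* last = s.data() + s.size();
  const auto [ptr, ec] = std::from_chars(first, last, value);
  if (ec != std::errc{} || ptr != last) return std::nullopt;
  if (value != 0 && value != -1) return std::nullopt;  // only -1 / 0 allowed
  return value;
}
```

Unlike `std::stoi`, `std::from_chars` never throws, so an overflowing string becomes an ordinary validation failure rather than an uncaught `std::out_of_range`.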

…therto#1956)

- Bump @qvac/rag to 0.5.0.
- Add packages/rag/changelog/0.5.0/{CHANGELOG.md,CHANGELOG_LLM.md}.
- Prepend [0.5.0] entry in root packages/rag/CHANGELOG.md (Keep a Changelog format).
- Regenerate packages/rag/NOTICE.

(cherry picked from commit cbdbaea)

Co-authored-by: Cursor <cursoragent@cursor.com>

…1966) Migrates `environment: release` -> `environment: npm` on every job that invokes `./.github/actions/publish-library-to-npm`, in lockstep with the github-ops repo config (qvac/repos.json npm trustedPublishing.environment) and the npmjs Trusted Publisher records (QVAC-18610).

Scope: only the npm-publishing jobs are flipped. Build, GPR-publish, publish-logic, release-merge-guard, lint-and-test and other jobs that reference `environment: release` for `secrets.PAT_TOKEN` access are left untouched. `id-token: write` is preserved on every flipped job.

Files: 16 changed, 18 jobs flipped:
- publish-sdk.yml: publish-npm
- publish-registry-server.yml: publish-schema-npm, publish-client-npm
- on-merge-{bci-whispercpp,decoder-audio,diffusion-cpp,embed-llamacpp,llm-llamacpp,ocr-onnx,onnx,transcription-parakeet,transcription-whispercpp,translation-nmtcpp,tts-ggml,tts-onnx}.yml: publish-(release-)?npm
- trigger-reusable-lib-cli.yml: publish-release-npm
- public-reusable-npm.yml: pull-request-event, push-event

Co-authored-by: Cursor <cursoragent@cursor.com>

…1968)

* QVAC-18608 feat(actions): add label-gate composite action (Node 20)

Introduces a new `.github/actions/label-gate` action that authorises secret-bearing workflow jobs based on whether a trusted actor has applied a "verified" label to the pull request. Replaces per-job environment approvals as the primary trust gate for PR-triggered workflows.

Trust model:
- Trusted events (push, workflow_dispatch, workflow_call, schedule, release) -> always authorised.
- PR events (pull_request, pull_request_target) -> authorised iff the applier of the configured `label` is in the `users` allowlist OR is an active member of any team in `teams`. Login comparison is case-insensitive.
- Synchronize from a non-trusted sender -> strip the label, deny.
- Anything else -> fail closed.

Inputs:
- label (default: "verified")
- teams (default: qvac-internal-{dev,merge,release})
- users (default: empty) -- new explicit allowlist
- github-token (required; needs read:org and PR-label write)

Output:
- authorised ("true" | "false") -- downstream jobs gate via `if: needs.<id>.outputs.authorised == 'true'`.

Implementation:
- Pure-Node 20 action; no npm dependencies, no bundler, no `dist/` artifact to maintain. Three small ESM modules in src/.
- github-client.mjs: native-fetch wrapper for the three endpoints used (team membership, issue timeline, label deletion); retry-with-exponential-backoff on 5xx and 429; full pagination on the timeline; idempotent label strip; URL-encodes inputs.
- gate.mjs: pure async decision function with an injected client; never throws on policy denials.
- index.mjs: action entrypoint; reads INPUT_* env, writes `authorised=` to $GITHUB_OUTPUT, emits structured `::notice::` / `::warning::` / `::error::` annotations. Hard misconfig (missing token, unreadable event payload, unhandled API error) exits non-zero so the gate job goes red. Soft denials exit 0 with `authorised=false`.

Tests:
- 41 tests via the built-in `node:test` runner (no test deps).
  - test/gate.test.mjs: 26 policy tests covering every event type, both fast-path and timeline-path resolution, synchronize protection, user-allowlist precedence, empty-config denial, and input validation.
  - test/github-client.test.mjs: 15 HTTP tests covering retry policy, pagination, 404-as-not-member, idempotent strip, URL encoding, and constructor validation. Uses an injected fetch stub.
  - test/fixtures/: 8 hand-rolled GitHub event payloads, including a new `labeled-by-allowlisted-user` case for the `users` input.

Run locally: `node --test .github/actions/label-gate/test/*.test.mjs`.

Note: the existing `authorize-pr` action is intentionally left in place and unchanged. Migration to label-gate will happen workflow-by-workflow in follow-up PRs to allow incremental rollout against production.

Refs: https://app.asana.com/1/45238840754660/project/1214153063536860/task/1214612672233087

Co-authored-by: Cursor <cursoragent@cursor.com>

* QVAC-18608 fix(label-gate): deny when gate label not currently applied

Pre-merge audit caught a label-strip bypass:

1. Alice (team member) labels PR with 'verified' -> labeled event -> gate authorises -> secrets used
2. Mallory (any contributor with triage) removes the label off-band -> unlabeled event fires but no workflow subscribes to it
3. Alice (or anyone) pushes a new commit -> synchronize event -> gate runs -> sender = alice, isTrustedActor(alice) = true, falls through -> findLabelApplier walks the timeline and finds Alice's old labeled event from step 1 (the unlabeled doesn't disqualify it) -> applier = alice = trusted -> AUTHORISED -> ...even though the label is no longer on the PR.

Fix: consult `payload.pull_request.labels` (the authoritative current label state) before trusting the timeline. If the gate label is not currently applied, deny without making any GitHub API calls.
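The label-gate decision order, including the bypass fix (check the current labels before any API call, and check the label before sender trust on synchronize), can be restated as a pure function with an injected client. The real gate.mjs is asynchronous JavaScript; this C++ sketch is a simplified illustration in which the `PrEvent` / `Client` types and their field names are hypothetical, and case-insensitive login matching is left to `isTrusted`.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <optional>
#include <string>
#include <vector>

struct PrEvent {
  std::string eventName;                   // e.g. "pull_request", "push"
  std::string action;                      // e.g. "synchronize", "labeled"
  std::string sender;
  std::vector<std::string> currentLabels;  // payload.pull_request.labels
};

struct Client {  // injected, like gate.mjs's client
  std::function<bool(const std::string&)> isTrusted;             // allowlist or team
  std::function<std::optional<std::string>()> findLabelApplier;  // timeline walk
  std::function<void()> stripLabel;
};

bool authorise(const PrEvent& ev, const std::string& label, const Client& c) {
  static const std::vector<std::string> trustedEvents{
      "push", "workflow_dispatch", "workflow_call", "schedule", "release"};
  if (std::find(trustedEvents.begin(), trustedEvents.end(), ev.eventName) !=
      trustedEvents.end())
    return true;
  if (ev.eventName != "pull_request" && ev.eventName != "pull_request_target")
    return false;  // anything else: fail closed
  // Authoritative current state first: label absent -> deny, zero API calls.
  if (std::find(ev.currentLabels.begin(), ev.currentLabels.end(), label) ==
      ev.currentLabels.end())
    return false;
  if (ev.action == "synchronize" && !c.isTrusted(ev.sender)) {
    c.stripLabel();
    return false;
  }
  const auto applier = c.findLabelApplier();
  return applier.has_value() && c.isTrusted(*applier);
}
```

Replaying the audit scenario (label removed, then Alice pushes) returns `false` before the injected client is ever touched, which is the "zero API calls" property the regression tests pin.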
Also restructured the synchronize handler so the label-applied check runs BEFORE the sender-trust API calls, avoiding 3 wasted team-membership lookups per PR that doesn't actually have the gate label.

Tests:
- REGRESSION: synchronize after label was removed -> deny even if timeline still shows trusted applier (must short-circuit before timeline lookup)
- REGRESSION: opened/reopened PR with stale labeled timeline but no current label -> deny
- synchronize from non-trusted with no label currently applied -> deny with zero API calls

44 tests now pass (was 41); end-to-end smoke against the new logic verifies both the bypass scenario (denied, zero API calls) and the happy path (allowlisted user labels -> authorised, zero team API calls).

Refs: https://app.asana.com/1/45238840754660/project/1214153063536860/task/1214612672233087

Co-authored-by: Cursor <cursoragent@cursor.com>

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
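The label-gate commit lists "retry-with-exponential-backoff on 5xx and 429" for github-client.mjs without giving parameters. Here is a minimal C++ sketch of that policy. The attempt count, base delay, cap, and the absence of jitter are all assumptions rather than the action's actual values, and the real client is JavaScript built on native `fetch`.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Retryable per the commit message: any 5xx, plus 429 (rate limited).
bool isRetryable(int status) {
  return status == 429 || (status >= 500 && status <= 599);
}

// Delay before retry number `attempt` (0-based): base * 2^attempt, capped.
std::chrono::milliseconds backoffDelay(int attempt, std::chrono::milliseconds base,
                                       std::chrono::milliseconds cap) {
  const std::chrono::milliseconds d = base * (1LL << std::min(attempt, 20));
  return std::min(d, cap);
}

// Calls `request` (returning an HTTP status) until it yields a non-retryable
// status or `maxAttempts` calls have been made; returns the last status.
int withRetry(const std::function<int()>& request, int maxAttempts,
              std::chrono::milliseconds base, std::chrono::milliseconds cap) {
  int status = request();
  for (int attempt = 0; attempt + 1 < maxAttempts && isRetryable(status); ++attempt) {
    std::this_thread::sleep_for(backoffDelay(attempt, base, cap));
    status = request();
  }
  return status;
}
```

Non-retryable statuses such as 404 return immediately, which is consistent with the tests treating 404 as "not a member" rather than as a transient failure.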

…etherto#1972) * doc: addons - diffusion - update * doc: addon - diffusion - put flux as main model to be used

Co-authored-by: Ishan Vohra <ishanvohra@Ishans-MacBook-Air.local>

…etherto#1973)

The QVAC-18612 canary (PR tetherto#1971, run id 25672483584) hard-failed with "required input 'github-token' is missing" even though the workflow clearly passed `github-token: ${{ secrets.GITHUB_TOKEN }}`.

Root cause: `getInput` in src/index.mjs was uppercasing the input name AND replacing hyphens with underscores, looking up `INPUT_GITHUB_TOKEN`. The GitHub Actions runner (and @actions/core) preserve hyphens; only spaces are replaced, so the runner sets `INPUT_GITHUB-TOKEN`. The action never found the token and threw a missing-input error.

The local smoke test that "passed" before merge set `INPUT_GITHUB_TOKEN=...` (matching the buggy lookup), so both sides were wrong in the same direction. This is exactly the failure mode the canary was meant to surface; without it, the gate would have failed across all 75 secret-bearing workflows on the first PR after the QVAC-18612 fan-out.

Fix:
- getInput now uses `name.replace(/ /g, '_').toUpperCase()`, matching the runner / @actions/core convention exactly.
- getInput is exported from src/index.mjs (with an injectable env arg) so the convention can be unit-tested.
- Top-level main() is gated on `import.meta.url === argv[1]` so importing index.mjs from tests no longer triggers a real run.

Tests:
- 9 new tests in test/index.test.mjs pin the env-var-name resolution:
  * INPUT_GITHUB-TOKEN (hyphen preserved) -> resolves
  * INPUT_GITHUB_TOKEN (hyphen replaced) -> does NOT resolve (locks the contract against accidental "helpful" rewrite)
  * spaces are still replaced with underscores
  * trim, missing-required, defaults-to-process.env
- Total: 53/53 pass via `node --test`.
- End-to-end smoke against the runner-correct env-var name (INPUT_GITHUB-TOKEN=...) confirms exit 0 and authorised=false on the no-label deny path.

Refs: https://app.asana.com/1/45238840754660/project/1214153063536860/task/1214612672233087
Related: tetherto#1971

Co-authored-by: Cursor <cursoragent@cursor.com>
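The env-var naming contract behind this fix is small enough to state as code. This C++ function restates what `name.replace(/ /g, '_').toUpperCase()` plus the `INPUT_` prefix produces, for illustration only (`inputEnvName` is a hypothetical name, and the action itself is JavaScript). Only spaces become underscores; hyphens survive.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <string_view>

// Mirrors the @actions/core / runner convention: "INPUT_" + name with spaces
// replaced by '_' and ASCII letters uppercased. Hyphens are NOT rewritten.
std::string inputEnvName(std::string_view name) {
  std::string out = "INPUT_";
  for (unsigned char ch : name) {
    out += (ch == ' ') ? '_' : static_cast<char>(std::toupper(ch));
  }
  return out;
}
```

For `github-token` this yields `INPUT_GITHUB-TOKEN`, the name the runner actually sets, and never `INPUT_GITHUB_TOKEN`, the name the buggy lookup searched for.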

github-actions · 2026-05-22T12:38:37Z

🧪 C++ Test Coverage Report

Coverage:

📊 Detailed Coverage

Filename                         Regions    Missed Regions     Cover   Functions  Missed Functions  Executed       Lines      Missed Lines     Cover    Branches   Missed Branches     Cover
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
NmtLazyInitializeBackend.cpp          99                20    79.80%          11                 1    90.91%         157                36    77.07%          58                18    68.97%
NmtLazyInitializeBackend.hpp           2                 0   100.00%           1                 0   100.00%           1                 0   100.00%           0                 0         -
TranslationModel.cpp                 296               168    43.24%          28                 8    71.43%         506               213    57.91%         152               106    30.26%
TranslationModel.hpp                   1                 0   100.00%           1                 0   100.00%           1                 0   100.00%           0                 0         -
nmt.cpp                               72                22    69.44%           9                 1    88.89%         137                28    79.56%          38                12    68.42%
nmt.hpp                               51                 4    92.16%          11                 2    81.82%          53                 4    92.45%          28                 0   100.00%
nmt_beam_search.cpp                  116                25    78.45%          10                 3    70.00%         254                32    87.40%          74                17    77.03%
nmt_graph_decoder.cpp                164                78    52.44%          15                 7    53.33%         540               161    70.19%         112                69    38.39%
nmt_graph_encoder.cpp                 54                13    75.93%           3                 0   100.00%         268                33    87.69%          36                15    58.33%
nmt_loader.cpp                       270                67    75.19%          14                 0   100.00%         774                97    87.47%         138                61    55.80%
nmt_state_backend.cpp                253                94    62.85%          21                 0   100.00%         489               128    73.82%         154                80    48.05%
nmt_tokenization.cpp                  88                21    76.14%           8                 0   100.00%         135                36    73.33%          58                25    56.90%
nmt_utils.cpp                        120                89    25.83%           8                 3    62.50%         180               134    25.56%          72                57    20.83%
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
TOTAL                               1586               601    62.11%         140                25    82.14%        3495               902    74.19%         920               460    50.00%
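Each `Cover` column in the llvm-cov report above is covered / total × 100, where covered = total − missed; a `-` appears when a file has nothing of that kind to cover (for example, zero branches). The small C++ check below reproduces several TOTAL-row figures; the helper names are illustrative.

```cpp
#include <cassert>
#include <cmath>
#include <optional>

// Cover % as reported: (total - missed) / total * 100.
// nullopt corresponds to the "-" the report prints when total == 0.
std::optional<double> coverPercent(long total, long missed) {
  if (total == 0) return std::nullopt;
  return 100.0 * static_cast<double>(total - missed) / static_cast<double>(total);
}

// True if `value` rounds to the two-decimal figure `shown` in the table.
bool matchesReport(double value, double shown) {
  return std::fabs(value - shown) < 0.005;
}
```

For the TOTAL regions, for instance, (1586 − 601) / 1586 = 985 / 1586 ≈ 62.106%, which the report displays as 62.11%.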

@GustavoA1604

…c GGML backends (tetherto#2124)

* transcription-whispercpp: bump to 0.7.1 with whisper-cpp 1.8.4.3#1 (QVAC-18993)

Pull in the consolidated vcpkg PR (whisper-cpp 1.8.4.3#1 + ggml-speech 2026-05-18#1) that covers four Asana tickets:

- QVAC-18991: whisper.cpp upstream-sync from ggml-org/master to v1.8.4.3. Adds upstream's VAD streaming API (whisper_vad_detect_speech_no_reset, whisper_vad_reset_state) with a regression test, the macOS Vulkan persistent-pipeline cache, and various BCI / bindings fixes.
- QVAC-18300: enables OpenCL on Whisper for Android, gated behind a new `opencl` feature. This package now declares an android-only `opencl` feature that wires through to the whisper-cpp port's opencl feature, so a transcription addon built for android-arm64 can ship the Adreno backend without forcing it on non-Adreno consumers.
- QVAC-18992: rebases the speech-stack ggml (qvac-ext-ggml@speech) onto the same upstream v0.10.2 baseline that whisper.cpp's bundled ggml uses, so the QVAC speech stack (whisper + parakeet + tts-cpp) consumes a coherent ggml API surface. No direct dependency from this package -- transitive via other speech-stack addons sharing the Android process.
- QVAC-18993: switches the Android build to pure dynamic-backend mode: GGML_BACKEND_DL=ON + GGML_CPU_ALL_VARIANTS=ON on both the whisper-cpp port and ggml-speech port, so the addon's .bare prebuild ships one libggml-cpu-android_armv*_*.so per microarchitecture plus dynamically-loaded libggml-vulkan.so / libggml-opencl.so. ggml's loader picks the highest-feature CPU variant (armv9.2_2 .. armv8.0_1) plus the right GPU backend (Adreno 700+ -> OpenCL, everything else -> Vulkan) at runtime, so a single APK serves the whole device matrix without per-device builds.
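The runtime selection described for QVAC-18993 can be modeled as "keep the best-scoring CPU variant that the device supports, and pick the GPU backend by the stated Adreno rule". This is a simplified, hypothetical C++ model and not ggml's loader code: the `CpuVariant` type and the numeric scores are illustrative, with a score of 0 meaning that a variant's required CPU features are missing.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical model: each CPU variant reports a score for the current CPU,
// 0 meaning "unsupported here"; the loader keeps the highest score.
struct CpuVariant {
  std::string name;  // e.g. "armv9.2_2", "armv8.0_1"
  int score;
};

std::optional<std::string> pickBestVariant(const std::vector<CpuVariant>& variants) {
  std::optional<std::string> best;
  int bestScore = 0;
  for (const auto& v : variants) {
    if (v.score > bestScore) {
      bestScore = v.score;
      best = v.name;
    }
  }
  return best;  // nullopt if no variant is usable on this CPU
}

// GPU rule as stated in the commit: Adreno 700+ -> OpenCL, otherwise Vulkan.
std::string pickGpuBackend(bool isAdreno, int adrenoSeries) {
  return (isAdreno && adrenoSeries >= 700) ? "OpenCL" : "Vulkan";
}
```

Shipping every variant and choosing at runtime is what allows one APK to run on both a newer armv9 device and an older armv8.0 device.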
vcpkg-configuration.json is TEMPORARILY pointed at Zbig9000/qvac-registry-vcpkg.git @ b5a5e199 (= QVAC-vcpkg-speech-stack-android-dynamic-backend HEAD on Zbig9000's fork) because the consolidated port versions don't exist on tetherto/main yet. Once the vcpkg PR lands the default-registry block must be re-pointed back to https://github.com/tetherto/qvac-registry-vcpkg.git with the post-merge tetherto/main SHA as baseline. Devicefarm: the asana asks for GPU testing on mobile to verify S25 picks OpenCL and Pixel 9 picks Vulkan. Those tests live outside this addon (in qvac CI's integration-mobile-test workflow) and depend on device-farm config that I can't validate locally; the addon code side is unchanged in this bump (CPU dispatcher + dynamic backend `.so` files are already wired by the whisper-cpp port's prebuild output, and the JS layer already enumerates ggml_backend_devs at init). * transcription-whispercpp: bump to 0.7.2 with whisper-cpp 1.8.4.3#2 (QVAC-18993) Picks up the Android per-arch CPU dlopen fallback patch added to the whisper-cpp port (mirrors qvac-ext-ggml@speech 9562ed04). Without this, every APK consumer with `useLegacyPackaging=false` (AGP 3.6+ default) would silently lose CPU init: the directory iterator finds nothing inside compressed APK libs, and the existing on-disk filename fallback never composes the per-arch `libggml-cpu-android_armv*_*.so` names that `GGML_CPU_ALL_VARIANTS=ON` produces. Re-pins the Zbig9000/qvac-registry-vcpkg default-registry baseline to 86257dc376ca043c67cc4805ab8d1e74a94b7eda so both whisper-cpp 1.8.4.3#2 and ggml-speech 2026-05-19#0 are reachable. 
Co-authored-by: Cursor <cursoragent@cursor.com> * transcription-whispercpp: bump to 0.7.3 → whisper-cpp 1.8.4.3#3 (QVAC-18993) Pure follow-up to 0.7.2 -- the two Android dynamic-backend ggml fixes the 0.7.2 release pulled in via vcpkg patches are now upstreamed as commits on tetherto/qvac-ext-lib-whisper.cpp PR tetherto#26 ("ggml + tts-cpp Android dynamic-backend overlays") instead of being carried in the vcpkg port's patches/ tree. Plus a tts-cpp `<atomic>` include fix that closes the parallel speech-stack consumer's build under the day-2 ggml-speech merge. Build output is bit-identical to 0.7.2 (whisper-cpp 1.8.4.3#3 SOURCE == 1.8.4.3#2 SOURCE+PATCHES, verified by hashing all libggml-cpu-android_armv*_*.so files from the NDK r29 cross-compile). Registry baseline bumped to 965f5e5a so the new port-version (1.8.4.3#3) is reachable. PRs in the cross-repo set: whisper.cpp tetherto#26 (Zbig9000:QVAC-18993-bundled-ggml-android-dynamic-backend) vcpkg tetherto#152 (Zbig9000:QVAC-vcpkg-speech-stack-android-dynamic-backend) Co-authored-by: Cursor <cursoragent@cursor.com> * transcription-whispercpp: bridge ggml dlopen backends as IMPORTED targets (QVAC-18993) `bare-make generate` failed on android-arm64 with CMake Error: get_target_property() called with non-existent target "ggml::ggml-cpu-android_armv8.0_1" (… 8 backends total) after enabling `GGML_BACKEND_DL=ON` on the `whisper-cpp` port. With dynamic- backend mode, ggml builds the per-arch CPU + GPU backends as standalone MODULE libraries that ggml dlopens at runtime; upstream ggml's `install(TARGETS … EXPORT)` deliberately skips them, so the consumer's `BACKEND_DL_LIBS` loop in `CMakeLists.txt` referenced targets that don't exist. Wrap the existing loop with a `if(NOT TARGET ggml::${_backend})` fallback that locates the `.so` under `${VCPKG_INSTALLED_PATH}/bin` via `find_library` and materialises a `SHARED IMPORTED` target locally with `IMPORTED_NO_SONAME=TRUE` — then bundle via the existing `INSTALL TARGET` path. 
Mirrors the pattern that already ships in `packages/diffusion-cpp` for the same Android-dlopen build mode. Static backends (any platform that links ggml in directly) still find their imported target via ggml-config.cmake on the first branch, so non-Android prebuilds are byte-identical. Co-authored-by: Cursor <cursoragent@cursor.com> * transcription-whispercpp: re-pin baseline to vcpkg PR tetherto#152 rebased HEAD 8c6ca188 (QVAC-18993) tetherto/qvac-registry-vcpkg/main moved forward yesterday with tetherto#156 (parakeet-cpp 2026-05-20 + ggml-speech 2026-04-09#2 bumps), so vcpkg PR tetherto#152 was rebased onto the new base 0e75457. Update the default- registry baseline pointer from the old PR tetherto#152 HEAD (dffaaf6) to the rebased HEAD (8c6ca188) so the version-resolver still finds `ggml-speech 2026-05-19#3` (now layered on top of the just-landed 2026-04-09#2) and `whisper-cpp 1.8.4.3#3` (unchanged content, correct SHA512). No other changes --- the resolver picks up the same final versions of every package as before, just with the rebased baseline as the search root. Co-authored-by: Cursor <cursoragent@cursor.com> * transcription-whispercpp: consume whisper-cpp 1.8.4.3#4 + ggml-speech 2026-05-19#4 (QVAC-18993, QVAC-18992) Picks up the MSVC `/I` fix in the spirv-headers include-shim (vcpkg PR tetherto#152 commit 5cd209c) so prebuild / win32-x64 stops dying with `c1xx: fatal error C1083: Cannot open source file: '.../x64-windows/include'` on the `whisper-cpp[vulkan]` configure step. The shim now emits the MSVC-style `/I<path>` on Windows and keeps `-isystem <path>` (with warning suppression) on GCC/Clang elsewhere. whisper-cpp override bumped 1.8.4.3#3 -> 1.8.4.3#4. Default-registry baseline bumped 8c6ca188 -> 5cd209c1. 
Co-authored-by: Cursor <cursoragent@cursor.com> * transcription-whispercpp: wire ENABLE_OPENCL so Android prebuilds ship libggml-opencl.so (QVAC-18300) The `opencl` feature was declared in `packages/transcription-whispercpp/vcpkg.json` (gated to `platform: android`) and the `whisper-cpp` port's `opencl` feature correctly enables `-DGGML_OPENCL=ON` on Android — but the consumer's `CMakeLists.txt` only appended `"tests"` and `"vulkan"` to `VCPKG_MANIFEST_FEATURES`. The `opencl` feature was therefore never activated, so vcpkg resolved `whisper-cpp` without `[opencl]`, ggml was built without `GGML_OPENCL=ON`, and the `android-arm64` prebuild silently shipped CPU + Vulkan backends only (no `libggml-opencl.so`) — defeating the entire point of QVAC-18300. Add an `ENABLE_OPENCL` option (default `ON` on Android, `OFF` elsewhere — the `vcpkg.json` feature is `platform: android` gated so non-Android is a no-op anyway) that appends `"opencl"` to `VCPKG_MANIFEST_FEATURES`. Mirrors the `SD_OPENCL` pattern in `packages/diffusion-cpp/CMakeLists.txt` and keeps the GPU-feature wiring uniform across the three GPU backends (Metal auto, Vulkan toggle, OpenCL toggle). After this commit, the `android-arm64` prebuild's `qvac__transcription-whispercpp/` directory should ship `libggml-opencl.so` alongside the existing 7 per-microarch CPU variants and `libggml-vulkan.so`. Co-authored-by: Cursor <cursoragent@cursor.com> * transcription-whispercpp: default ENABLE_OPENCL ON unconditionally (QVAC-18300) Previous commit (6b42bc0) wired ENABLE_OPENCL but gated it on `_qvac_whispercpp_target_os STREQUAL "Android"`, mirroring the existing ENABLE_VULKAN block. 
CI re-run (26172345624) exposed that the gate is broken: at top-level CMakeLists.txt time, `CMAKE_SYSTEM_NAME` is not yet set --- the bare-make Android toolchain file is loaded by `project()` (which runs *after* the option block), so `_qvac_whispercpp_target_os` falls through to the host OS ("Linux") and ENABLE_OPENCL stayed OFF on the android-arm64 prebuild.

Evidence from run 26172345624's android-arm64 build log (the resolved feature list is `[core,vulkan]`, with no `[opencl]`):

    Installing 9/9 whisper-cpp[core,vulkan]:arm64-android@1.8.4.3#4...

ENABLE_VULKAN works only by coincidence: Vulkan is also default-ON on the Linux host detection branch, so the wrong target detection produces the right behaviour. For Android-only features there is no such overlap.

Fix: default ENABLE_OPENCL ON unconditionally and let the actual platform gating happen where it can: (1) the `platform: android` clause on the `whisper-cpp[opencl]` dep in `vcpkg.json`, and (2) the `VCPKG_TARGET_IS_ANDROID` check in the `whisper-cpp` portfile that gates `-DGGML_OPENCL=ON`. Adding `"opencl"` to `VCPKG_MANIFEST_FEATURES` on non-Android is a guaranteed no-op because the feature's only dep is platform-gated --- mirrors the layered gating that `whisper-cpp[vulkan]` already uses (the `vulkan` feature's deps are `!osx & !ios` gated and the portfile's `-DGGML_VULKAN=ON` is also target-gated).

After this commit, the android-arm64 install plan should resolve as `whisper-cpp[core,vulkan,opencl]` and the prebuild tarball should contain `libggml-opencl.so` alongside the 7 per-microarch CPU `.so`s and `libggml-vulkan.so`.

Co-authored-by: Cursor <cursoragent@cursor.com>

* transcription-whispercpp: call ggml_backend_load_all_from_path before whisper_init (QVAC-18993)

Android mobile-test E2E crashed inside whisper_init_from_file_with_params with SIGABRT on PR tetherto#2124 / run 26173084690 (both Pixel 9 Pro + Samsung S25 Ultra, 132 ms after the `Downloaded model: ggml-tiny.bin` log line).
Stack:

    abort
      → ggml_abort+228
      → ggml_backend_dev_backend_reg+48
      → whisper_init_with_params_no_state+480
      → whisper_init_from_file_with_params_no_state+212
      → whisper_init_from_file_with_params+48
      → WhisperModel::load()+460

Root cause: the addon never called ggml_backend_load_all*(). With the QVAC-18993 GGML_BACKEND_DL=ON build, the bundled ggml-base no longer defines GGML_USE_CPU, so the static ggml_backend_registry ctor registers zero backends. whisper.cpp's first ggml_backend_init_by_type(CPU) returns NULL → ggml_backend_dev_backend_reg(NULL) trips GGML_ASSERT(device).

This is the same crash signature on both the pre-OpenCL run 26170576156 and the post-OpenCL run 26173084690, so it is independent of the recent OpenCL enablement. The mobile workflow last passed on tmp-whisper-184-3-validation back on 2026-05-11, which predates GGML_BACKEND_DL=ON.

Mirror the pattern used by every other ggml-based addon in the monorepo (packages/{diffusion-cpp,llm-llamacpp,classification-ggml,…}):

* CMakeLists.txt -- emit BACKENDS_SUBDIR (<bare_target>/<module_name>) compile def via bare_target / bare_module_target.
* WhisperConfig -- add backendsDir field (sibling of the handler-driven maps so it bypasses WHISPER_CONTEXT_HANDLERS.at()).
* JSAdapter -- read top-level backendsDir string directly from configurationParams into config.backendsDir.
* WhisperModel::load -- on __ANDROID__, std::call_once → ggml_backend_load_all_from_path(backendsDir/BACKENDS_SUBDIR) before whisper_init.
* index.js -- require('bare-path'); pass backendsDir: path.join(__dirname, 'prebuilds') in _load + reload.

No diff on non-Android (Linux/macOS/Windows/iOS): ggml's static ctor keeps registering CPU there as before.

aiDocs/15-android-mobile-test-crash-fix.md has the full investigation (crash extraction, layered root-cause, why every other ggml addon already does this, follow-ups).
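The `WhisperModel::load` fix registers the dlopen-able backends exactly once per process before the first `whisper_init_*` call. Below is a minimal sketch of that `std::call_once` pattern. To let it compile without ggml, a counting stand-in (`loadBackendsFromPath`, hypothetical) replaces `ggml_backend_load_all_from_path`. The helper `ensureBackendsLoaded` and its path joining are also illustrative, and the commit's `__ANDROID__` guard is omitted so the sketch runs on any host.

```cpp
#include <cassert>
#include <mutex>
#include <string>

// Stand-in for ggml_backend_load_all_from_path(dir); only counts calls here.
static int g_loadCalls = 0;
void loadBackendsFromPath(const std::string& dir) {
  (void)dir;  // the real call would dlopen every backend .so found in dir
  ++g_loadCalls;
}

// Hypothetical helper: the first caller registers backends from
// <backendsDir>/<subdir>; later calls (e.g. on reload) are no-ops, so
// concurrent or repeated model loads never register backends twice.
void ensureBackendsLoaded(const std::string& backendsDir, const std::string& subdir) {
  static std::once_flag once;
  std::call_once(once, [&] { loadBackendsFromPath(backendsDir + "/" + subdir); });
}
```

Using `std::call_once` rather than a plain `static bool` flag also makes the one-time registration thread-safe if two model instances load at the same time.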
Co-authored-by: Cursor <cursoragent@cursor.com>
* transcription-whispercpp: re-pin vcpkg baseline to cleaned PR tetherto#152 head (QVAC-18993)
PR tetherto#152 (qvac-registry-vcpkg) was rebased today to drop the ggml-speech port bump (b4cf7b2) and the matching ggml-speech-side MSVC shim. Only the whisper-cpp bump + whisper-cpp portfile MSVC `/I` fix remain. The consumer-side migration to ggml-speech (QVAC-18992 / PR tetherto#13) stays open on the speech branch but is no longer a prerequisite for this Android dynamic-backend rollout.
New PR tetherto#152 HEAD: 9f4e8e20072d8a7a1e118a49c36aacf6af6b3e0d
Old (pre-cleanup): 5cd209c145a1d61636f1d44b4afe37868c298a8c
This addon does not depend on `ggml-speech` (it consumes the bundled ggml inside `whisper-cpp`), so the dependency closure is unchanged. Updated CHANGELOG to record the new baseline + the reason ggml-speech got dropped.
Co-authored-by: Cursor <cursoragent@cursor.com>
* transcription-whispercpp: fix cpp-lint failures (clang-format + clang-tidy)
The prior CI run skipped cpp-lint entirely because the recent PR commits only touched CMakeLists.txt / CHANGELOG.md. The new ea298cf commit (QVAC-18993 mobile-test fix) added the first C++ diff in this branch, so cpp-lint now runs full clang-format + clang-tidy and surfaces three issues:
1. clang-format: JSAdapter.cpp had a declaration that fits on one line split across two lines (with LLVM PointerAlignment=Left + AlignAfterOpenBracket, clang-format collapses it onto one line). Reformatted in place.
2. clang-tidy [readability-identifier-naming]: WhisperHandlers.hpp:9: local `const int LANG_ID` violates the variable case style. Renamed to `langId` (lowerCamelCase, matches `checkLanguage` two lines above). Latent issue; never reported before because cpp-lint was a no-op on every prior PR commit.
3. clang-tidy [readability-identifier-naming]: WhisperModel.hpp:100: unused `set_weights_for_file(span, bool)` stub kept for parity with `transcription-parakeet` (which uses snake_case extensively for this exact API). Renaming would diverge from the parakeet pattern, so suppress with a single NOLINTNEXTLINE rather than touching the API surface.
Local repro: `cp packages/lint-cpp/.clang-format packages/transcription-whispercpp/.clang-format` then `git-clang-format --diff $(git merge-base HEAD origin/main) -- packages/transcription-whispercpp` reports `did not modify any files`. The .clang-format copy is normally produced by `packages/transcription-whispercpp/CMakeLists.txt:58 (configure_file COPYONLY)` during CMake configure.
[QVAC-18993][QVAC-19071]
Co-authored-by: Cursor <cursoragent@cursor.com>
* transcription-whispercpp: reference QVAC-19071 in CHANGELOG
QVAC-19071 ([Whisper] Update qvac-registry-vcpkg and addon with new port versions) is the meta task that bundles the registry-side port bump (qvac-registry-vcpkg PR tetherto#152: whisper-cpp 1.8.4.3#4) with the consumer-side addon bump (qvac PR tetherto#2124: transcription-whispercpp 0.7.3, baseline re-pin). No code changes; the work itself was already covered by PR tetherto#152 + this PR. Adds the cross-reference so the Asana ticket can be closed out this release cycle. The QVAC-18992 ggml-speech migration (PR tetherto#13 + ggml-speech port bump) stays deferred per the 2026-05-21 plan; it will land as a follow-up port bump under the same QVAC-19071 umbrella.
[QVAC-19071]
Co-authored-by: Cursor <cursoragent@cursor.com>
* transcription-whispercpp: re-pin baseline to consume whisper-cpp 1.8.4.3#5 (REF flipped to tetherto/master)
[whisper-cpp PR tetherto#28](tetherto/qvac-ext-lib-whisper.cpp#28) (QVAC-18993 bundled-ggml: Android dynamic backend + per-arch CPU dlopen fallback) was merged today (2026-05-21, merge commit `f3102199` on `tetherto/qvac-ext-lib-whisper.cpp/master`).
With it merged, `tetherto/master` now carries every commit the registry's `whisper-cpp` port previously pulled from the temporary `Zbig9000/qvac-ext-lib-whisper.cpp@14620c8857` branch:
- PR tetherto#25 (QVAC-18991, upstream whisper.cpp sync), merged 2026-05-20
- PR tetherto#27 (QVAC-18966, tts-cpp chatterbox <atomic> fix), merged 2026-05-20
- PR tetherto#28 (QVAC-18993, ggml-backend android dynamic backend), merged 2026-05-21
[qvac-registry-vcpkg PR tetherto#152](tetherto/qvac-registry-vcpkg#152) HEAD (`f2870372`) bumps `whisper-cpp` to port-version `1.8.4.3#5` with the REF repoint; the source tarball is byte-identical outside `parakeet-cpp/` and `tts-cpp/` (separate vcpkg ports). This commit just re-pins the consumer-side baseline so the addon resolves against the new port-version.
vcpkg-configuration.json default-registry baseline:
9f4e8e20072d8a7a1e118a49c36aacf6af6b3e0d (MSVC fix only, whisper-cpp 1.8.4.3#4)
-> f2870372965e899ae1f8a221154d2b243a6c3d30 (+ whisper-cpp 1.8.4.3#5 REF repoint)
No code change in this monorepo; pure baseline re-pin. CHANGELOG updated to record both the new baseline and the (now superseded) intermediate `9f4e8e2` pin.
Closes the consumer-side half of [QVAC-19071](https://tetherapp.atlassian.net/browse/QVAC-19071) ("Update qvac-registry-vcpkg and addon with new port versions"). Registry-side half = vcpkg PR tetherto#152 commit `f287037`.
[QVAC-18993][QVAC-19071]
Co-authored-by: Cursor <cursoragent@cursor.com>
* transcription-whispercpp: re-pin baseline to whisper-cpp 1.8.4.3#0 (PR tetherto#152 review fixes)
@GustavoA1604 review on [qvac-registry-vcpkg PR tetherto#152](tetherto/qvac-registry-vcpkg#152) requested three changes on the registry side:
1. Drop the explanatory comment block at the top of `ports/whisper-cpp/portfile.cmake`.
2. Reset `port-version` 5 -> 0 (treat the tetherto REF repoint as a fresh start, not a continuation of the Zbig9000-branch series).
3. Collapse the three historical `1.8.4.3` entries (`port-version` 3, 4, 5, never consumed off-fork) in `versions/w-/whisper-cpp.json` into a single `port-version: 0` entry with the new git-tree.
All three landed in PR tetherto#152 commit `ee71ecb`. This commit is the consumer-side mirror:
vcpkg-configuration.json default-registry baseline:
f2870372965e899ae1f8a221154d2b243a6c3d30 (1.8.4.3#5, pre-review)
-> ee71ecb5b286224377313e5a50558d11adbef3ac (1.8.4.3#0, post-review)
CHANGELOG entry updated: "1.8.4.3#5" -> "1.8.4.3#0", plus a note about the port-version reset and history collapse; the supersession line now covers both prior pins (`9f4e8e2` MSVC fix, `f287037` 1.8.4.3#5).
No code change in this monorepo; pure baseline re-pin. The underlying whisper.cpp source bytes are unchanged (REPO + REF + SHA512 in the portfile are identical between `1.8.4.3#5` and `1.8.4.3#0`), so the produced binary is bit-for-bit equivalent.
[QVAC-18993][QVAC-19071]
Co-authored-by: Cursor <cursoragent@cursor.com>
* transcription-whispercpp: 0.8.0 — address PR review
Collapses the 0.7.1/0.7.2/0.7.3 work into a single 0.8.0 release and folds in Gustavo's PR tetherto#2124 review feedback:
- Bump version to 0.8.0; collapse CHANGELOG into a single 0.8.0 entry
- Bump whisper-cpp override to 1.8.4.3#0 (matches PR tetherto#152 collapse)
- Repoint default-registry to tetherto/qvac-registry-vcpkg @ a9d7e924 (PR tetherto#152 merge commit on tetherto/main)
- vcpkg.json: model GPU features on transcription-parakeet's pattern: platform-gated whisper-cpp deps select [opencl,vulkan] on android, [vulkan] on linux/windows, and no GPU feature on apple. Drop the addon-side opencl/vulkan feature sections; CMake no longer carries ENABLE_OPENCL / ENABLE_VULKAN option indirection.
- index.js: nest backendsDir under whisperConfig (mirrors parakeet's parakeetConfig.backendsDir). Strip it from the wire-format whisperConfig map and surface it as top-level configurationParams.backendsDir before handing the config to the addon. Fix the stale _createAddon JSDoc that still described "LLM-specific settings".
- index.d.ts + README.md: document whisperConfig.backendsDir; drop the ENABLE_VULKAN build instructions (now controlled by vcpkg.json).
- Compact all the addon-side comments (CMakeLists.txt, JSAdapter.cpp, WhisperConfig.hpp, WhisperModel.cpp); drop every QVAC asana ticket reference; standardise the C++ log wording on "configurationParams.backendsDir".
- Drop "-D ENABLE_VULKAN=OFF" from the test:cpp:build / coverage:cpp:build npm scripts (no-op now that the option is gone).
Co-authored-by: Cursor <cursoragent@cursor.com>
* transcription-whispercpp: 0.9.0 -> 0.8.0 (fold into single release)
Reverts the 0.8.0 -> 0.9.0 bump from the merge commit: per request, this PR's release notes are folded into the existing 0.8.0 entry rather than shipping as a separate semver step. Order: Added -> Changed -> Fixed (from this PR) -> Removed (the OutputCallbackJs revert that landed on main as 0.8.0 via tetherto#2133). package.json bumped back to 0.8.0.
Co-authored-by: Cursor <cursoragent@cursor.com>
---------
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: GustavoA1604 <54457676+GustavoA1604@users.noreply.github.com>
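The backendsDir routing described in these commits follows a simple pattern. JS strips the key from the nested whisperConfig, and the adapter reads it top-level and routes everything else through the handler table. This minimal C++17 sketch uses plain string maps in place of the JS values the addon actually handles. `WhisperConfigSketch`, `stripBackendsDir`, `adapt`, and the two handler keys are all hypothetical. The sketch also assumes the real handler table is a std-style map, so that `.at()` throws `std::out_of_range` for an unknown key, which is why backendsDir has to bypass it.

```cpp
#include <functional>
#include <map>
#include <optional>
#include <stdexcept>
#include <string>

// Hypothetical stand-ins: the real addon works on JS values, not string maps.
using StringMap = std::map<std::string, std::string>;

struct WhisperConfigSketch {
  StringMap applied;        // values accepted by the handler table
  std::string backendsDir;  // sibling field, never routed through the handlers
};

using Handler = std::function<void(WhisperConfigSketch&, const std::string&)>;

// Hypothetical handler table, keyed the way WHISPER_CONTEXT_HANDLERS is.
const std::map<std::string, Handler> kHandlers = {
    {"language", [](WhisperConfigSketch& c, const std::string& v) { c.applied["language"] = v; }},
    {"translate", [](WhisperConfigSketch& c, const std::string& v) { c.applied["translate"] = v; }},
};

// index.js step: pull backendsDir out of the nested whisperConfig so the
// wire-format map only carries keys the handler table knows about.
std::optional<std::string> stripBackendsDir(StringMap& whisperConfig) {
  auto it = whisperConfig.find("backendsDir");
  if (it == whisperConfig.end()) return std::nullopt;
  std::string dir = it->second;
  whisperConfig.erase(it);
  return dir;
}

// JSAdapter step: the top-level backendsDir is copied directly; every other
// key goes through .at(), which throws std::out_of_range for unknown keys.
WhisperConfigSketch adapt(const StringMap& wireConfig,
                          const std::optional<std::string>& backendsDir) {
  WhisperConfigSketch cfg;
  if (backendsDir) cfg.backendsDir = *backendsDir;
  for (const auto& [key, value] : wireConfig) kHandlers.at(key)(cfg, value);
  return cfg;
}
```

If `stripBackendsDir` were skipped, `backendsDir` would reach the handler lookup and the load would throw. That is the failure the "sibling of the handler-driven maps" design avoids.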

github-actions · 2026-05-22T13:08:36Z

🧪 C++ Test Coverage Report

Coverage:

📊 Detailed Coverage

Filename                         Regions    Missed Regions     Cover   Functions  Missed Functions  Executed       Lines      Missed Lines     Cover    Branches   Missed Branches     Cover
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
NmtLazyInitializeBackend.cpp          99                20    79.80%          11                 1    90.91%         157                36    77.07%          58                18    68.97%
NmtLazyInitializeBackend.hpp           2                 0   100.00%           1                 0   100.00%           1                 0   100.00%           0                 0         -
TranslationModel.cpp                 296               168    43.24%          28                 8    71.43%         506               213    57.91%         152               106    30.26%
TranslationModel.hpp                   1                 0   100.00%           1                 0   100.00%           1                 0   100.00%           0                 0         -
nmt.cpp                               72                22    69.44%           9                 1    88.89%         137                28    79.56%          38                12    68.42%
nmt.hpp                               51                 4    92.16%          11                 2    81.82%          53                 4    92.45%          28                 0   100.00%
nmt_beam_search.cpp                  116                25    78.45%          10                 3    70.00%         254                32    87.40%          74                17    77.03%
nmt_graph_decoder.cpp                164                78    52.44%          15                 7    53.33%         540               161    70.19%         112                69    38.39%
nmt_graph_encoder.cpp                 54                13    75.93%           3                 0   100.00%         268                33    87.69%          36                15    58.33%
nmt_loader.cpp                       270                67    75.19%          14                 0   100.00%         774                97    87.47%         138                61    55.80%
nmt_state_backend.cpp                253                94    62.85%          21                 0   100.00%         489               128    73.82%         154                80    48.05%
nmt_tokenization.cpp                  88                21    76.14%           8                 0   100.00%         135                36    73.33%          58                25    56.90%
nmt_utils.cpp                        120                89    25.83%           8                 3    62.50%         180               134    25.56%          72                57    20.83%
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
TOTAL                               1586               601    62.11%         140                25    82.14%        3495               902    74.19%         920               460    50.00%
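Every Cover column in this llvm-cov table is (total - missed) / total × 100. The Executed column is llvm-cov's function-coverage percentage. The minimal C++17 sketch below reproduces the TOTAL row. The helper names are hypothetical and not part of any coverage tooling.

```cpp
#include <cmath>
#include <optional>

// Cover% as printed by llvm-cov report: (total - missed) / total * 100.
// Returns std::nullopt when there is nothing to cover, which the table
// prints as "-" (e.g. the Branches column for TranslationModel.hpp).
std::optional<double> coverPercent(int total, int missed) {
  if (total == 0) return std::nullopt;
  return 100.0 * (total - missed) / total;
}

// Rounds to the two decimals the report displays.
double round2(double value) { return std::round(value * 100.0) / 100.0; }
```

For the TOTAL row, regions give 985 / 1586 ≈ 62.11%, functions 115 / 140 ≈ 82.14%, lines 2593 / 3495 ≈ 74.19%, and branches 460 / 920 = 50.00%. The low per-file numbers, such as nmt_utils.cpp at 25.83% of regions, come from the same formula.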

github-actions · 2026-05-22T13:12:20Z

🧪 C++ Test Coverage Report: identical to the previous report.

This reverts commit 613934a.

…est.js. Signed-off-by: Marcus Edel <marcus.edel@collabora.com>

github-actions · 2026-05-22T14:19:35Z

🧪 C++ Test Coverage Report: identical to the previous report.

Signed-off-by: Marcus Edel <marcus.edel@collabora.com>

github-actions · 2026-05-22T15:21:10Z

🧪 C++ Test Coverage Report: identical to the previous report.

github-actions · 2026-05-22T18:04:11Z

🧪 C++ Test Coverage Report: identical to the previous report.

…o#2210) The shared composite refactor in tetherto#2153 dropped the `npm run mobile:copy-prebuilds` step that the old monolith workflow ran before building the APK/IPA. That script copies `weights/mobilenetv3_3class_v3_fp16.gguf` into `test/mobile/testAssets/mobilenetv3_3class_v3_fp16.gguf.bin` (and also stages the test images) so the React Native bundler packs them as assets and `global.assetPaths` exposes them on-device. Without it, every Android E2E run since 2026-05-20 has aborted inside Bare at startup with `Uncaught (in promise) Error: Mobile asset not found in global.assetPaths: mobilenetv3_3class_v3_fp16.gguf.bin` (SIGABRT from libbare-kit in `js_callback_s::on_call`). Last known-good run on main: 26168109084 (2026-05-20 14:10Z). All Android E2E runs since the refactor have failed identically across Pixel 9 Pro, Samsung S25 Ultra, and S26 Ultra. This restores the step, mirroring the existing pattern in integration-mobile-test-vla.yml.
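The staging step that went missing can be sketched in C++17 with `std::filesystem`. This is a hypothetical re-expression of what the copy does, not the actual `mobile:copy-prebuilds` script, whose implementation is not shown in this PR. `stageAsset` and `assetStaged` are illustrative names. The test-image staging is left out.

```cpp
#include <filesystem>
#include <fstream>
#include <string>

namespace fs = std::filesystem;

// Hypothetical sketch of the staging step: copy a model into the mobile test
// asset folder under "<original file name>.bin" so the React Native bundler
// packs it as an asset, e.g. mobilenetv3_3class_v3_fp16.gguf becomes
// mobilenetv3_3class_v3_fp16.gguf.bin. Returns the staged path.
fs::path stageAsset(const fs::path& source, const fs::path& assetDir) {
  fs::create_directories(assetDir);
  const fs::path target = assetDir / (source.filename().string() + ".bin");
  fs::copy_file(source, target, fs::copy_options::overwrite_existing);
  return target;
}

// The on-device lookup is keyed by the staged file name, so a skipped staging
// step only shows up at startup as "asset not found".
bool assetStaged(const fs::path& assetDir, const std::string& assetName) {
  return fs::exists(assetDir / assetName);
}
```

The failure surfaces late because the build itself never needs the asset. Only the runtime lookup of the `.gguf.bin` name does, which is why the regression appeared as a startup SIGABRT rather than a build error.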

…#2208) Co-authored-by: Bruno Campana <7632562+BrunoCampana@users.noreply.github.com>

…etherto#2212)
* infra[notask]: route classification cpp-tests linux to self-hosted
Move the classification-ggml cpp-tests Linux entry off GitHub-hosted ubuntu-22.04 onto qvac-ubuntu2404-x64 (ubuntu-24.04), matching every other addon's cpp-tests workflow (vla, llm, embed, diffusion).
Why: on GitHub-hosted jammy, `setup-llvm` apt-installs clang-19 + libc++-19 but does not run `update-alternatives`, so the unversioned `/usr/bin/clang++` keeps resolving to the system default clang-14. The vcpkg build then compiles ggml with clang-14 against libc++-19 headers and fails on the new `__verbose_abort` / `_LIBCPP_BEGIN_NAMESPACE_STD` macros (the warning literally says "Libc++ only supports Clang 17 and later"). Self-hosted qvac-ubuntu2404-x64 is provisioned with clang-22 and the alternatives wired correctly per CLAUDE.md's setup-llvm guidance, so the unversioned `clang++` invocations resolve as expected.
Windows + macOS entries are left as-is; they're currently green.
* infra[notask]: also route classification cpp-tests windows to self-hosted
Per review feedback on PR tetherto#2212: bump windows-2022 → windows-2025 and add runner: qvac-win25-x64, matching llm/embed/vla/diffusion cpp-tests.
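The clang-14 vs. libc++-19 mismatch described above is the kind of problem a preflight check can catch before the vcpkg build starts. The fix in this PR is the self-hosted runner, not such a check. This minimal C++17 sketch is a hypothetical helper that parses the first line of `clang++ --version` and rejects majors below 17, the floor quoted in the libc++ warning. `clangMajor` and `compilerMatchesLibcxx` are illustrative names, and the version strings in the usage note are examples.

```cpp
#include <optional>
#include <regex>
#include <string>

// Hypothetical CI preflight helper: extract the major version from the first
// line of `clang++ --version`, e.g. "Ubuntu clang version 14.0.0-1ubuntu1.1".
// Returns std::nullopt when the line does not come from clang.
std::optional<int> clangMajor(const std::string& versionLine) {
  static const std::regex re(R"(clang version (\d+)\.)");
  std::smatch m;
  if (!std::regex_search(versionLine, m, re)) return std::nullopt;
  return std::stoi(m[1].str());
}

// The libc++ headers in the failing build only support Clang 17 and later, so
// an unversioned clang++ that still resolves to clang-14 should fail fast here
// instead of deep inside the vcpkg ggml build.
bool compilerMatchesLibcxx(const std::string& versionLine, int minMajor = 17) {
  const std::optional<int> major = clangMajor(versionLine);
  return major.has_value() && *major >= minMajor;
}
```

Fed the clang-14 banner from a jammy image, `compilerMatchesLibcxx` returns false. A clang-19 or clang-22 banner passes.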

github-actions · 2026-05-22T20:57:38Z

🧪 C++ Test Coverage Report: identical to the previous report.

…js too (part 1) (tetherto#2216)

)

github-actions · 2026-05-23T05:53:38Z

🧪 C++ Test Coverage Report: identical to the previous report.

Replace duplicated per-OS vcpkg clone/bootstrap steps with the shared setup-vcpkg action and drop the obsolete Windows user-profile override from reusable-prebuilds.

Co-authored-by: Cursor <cursoragent@cursor.com>

…tetherto#2184)

* feat[bc]: migrate SDK parakeet transcription to 0.6.0 GGML
  Replace ONNX multi-file parakeet loading with single-GGUF models, duplex transcribeStream, and Q8_0 registry constants. Legacy ONNX modelConfig fields raise LegacyParakeetModelDeprecatedError. Wire local @qvac/transcription-parakeet 0.6.0 until publish.
  Co-authored-by: Cursor <cursoragent@cursor.com>
* doc: standardize parakeet transcription example headers
  Align all six parakeet examples on a consistent file header: title, usage, brief description, and requirements only when needed.
  Co-authored-by: Cursor <cursoragent@cursor.com>
* fix: address PR review - restore API docs, legacy endOfTurn wire
  - Restore reference/api/index.mdx from main; keep only parakeet error update
  - Preprocess endOfTurn to accept legacy whisper frames without source
  - Whisper plugin: forward legacy addon endOfTurn (was silently dropped)
  - Document wire compatibility in transcription.mdx and 0.11.0 breaking.md
  Co-authored-by: Cursor <cursoragent@cursor.com>
* fix: stabilize parakeet-stream-iterator-throw on iOS e2e
  Bare-RN tears down transcribeStream sessions asynchronously over JSI. After a consumer-side iterator throw, wait and retry before opening a recovery session so Device Farm does not see zero text events.
  Co-authored-by: Cursor <cursoragent@cursor.com>
* feat[api]: wire Sortformer v2.1 streaming example and model constant
  Use transcribeStream with AOSC load config and add PARAKEET_SORTFORMER_STREAMING_4SPK_V2_1_Q8_0 to models.ts (blob metadata placeholder until update-models after registry sync).
* feat[api]: adopt Sortformer v2.1 registry models in SDK
  Run update-models for PARAKEET_SORTFORMER_4SPK_V2_1_* (f16/q4/q8). Wire streaming example to v2.1 + transcribeStream/AOSC; point batch example and e2e parakeet-sortformer resource at v2.1 q8_0.
* chore: remove manual parakeet entry from 0.11.0 breaking changelog
* fix unit test

---------

Co-authored-by: Ishan Vohra <ishanvohra@Ishans-MacBook-Air.local>
Co-authored-by: Cursor <cursoragent@cursor.com>

Co-authored-by: Cursor <cursoragent@cursor.com>

github-actions · 2026-05-24T07:07:06Z

🧪 C++ Test Coverage Report

Coverage:

📊 Detailed Coverage

Filename                         Regions    Missed Regions     Cover   Functions  Missed Functions  Executed       Lines      Missed Lines     Cover    Branches   Missed Branches     Cover
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
NmtLazyInitializeBackend.cpp          99                20    79.80%          11                 1    90.91%         157                36    77.07%          58                18    68.97%
NmtLazyInitializeBackend.hpp           2                 0   100.00%           1                 0   100.00%           1                 0   100.00%           0                 0         -
TranslationModel.cpp                 296               168    43.24%          28                 8    71.43%         506               213    57.91%         152               106    30.26%
TranslationModel.hpp                   1                 0   100.00%           1                 0   100.00%           1                 0   100.00%           0                 0         -
nmt.cpp                               72                22    69.44%           9                 1    88.89%         137                28    79.56%          38                12    68.42%
nmt.hpp                               51                 4    92.16%          11                 2    81.82%          53                 4    92.45%          28                 0   100.00%
nmt_beam_search.cpp                  116                25    78.45%          10                 3    70.00%         254                32    87.40%          74                17    77.03%
nmt_graph_decoder.cpp                164                78    52.44%          15                 7    53.33%         540               161    70.19%         112                69    38.39%
nmt_graph_encoder.cpp                 54                13    75.93%           3                 0   100.00%         268                33    87.69%          36                15    58.33%
nmt_loader.cpp                       270                67    75.19%          14                 0   100.00%         774                97    87.47%         138                61    55.80%
nmt_state_backend.cpp                253                94    62.85%          21                 0   100.00%         489               128    73.82%         154                80    48.05%
nmt_tokenization.cpp                  88                21    76.14%           8                 0   100.00%         135                36    73.33%          58                25    56.90%
nmt_utils.cpp                        120                89    25.83%           8                 3    62.50%         180               134    25.56%          72                57    20.83%
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
TOTAL                               1586               601    62.11%         140                25    82.14%        3495               902    74.19%         920               460    50.00%
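A note on reading the percentage columns: the column layout appears to match `llvm-cov report` output, where each Cover value (and the functions' Executed value) is (total - missed) / total, rounded to two decimals. The sketch below is an illustrative check under that assumption, not part of the PR's CI tooling; the helper name `cover_pct` is hypothetical. It recomputes the TOTAL row from the raw counts in the table.

```python
# Recompute the percentage columns of the TOTAL row in the coverage report.
# Assumption: each percentage is (total - missed) / total * 100, rounded to 2 decimals.

def cover_pct(total: int, missed: int) -> float:
    """Percentage of covered items, or NaN when there is nothing to cover."""
    if total == 0:
        return float("nan")  # the report prints "-" for such cells
    return round(100.0 * (total - missed) / total, 2)

# (total, missed) pairs copied from the TOTAL row of the report.
TOTAL_ROW = {
    "regions": (1586, 601),
    "functions": (140, 25),
    "lines": (3495, 902),
    "branches": (920, 460),
}

for column, (total, missed) in TOTAL_ROW.items():
    print(f"{column:<10} {cover_pct(total, missed):6.2f}%")
```

The same formula reproduces the per-file rows too, e.g. nmt_utils.cpp has 89 of 120 regions missed, giving 25.83% region coverage.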

Victor-Rodzko and others added 30 commits May 7, 2026 17:01

feat[mod]: add RealESRGAN x4 upscaler models to registry (tetherto#1943)

a4d3e73

QVAC-18543 fix: stabilize sdk e2e reports (tetherto#1920)

edef8fa

chore: Remove failing parakeet models (tetherto#1951)

78914b4

Co-authored-by: Ishan Vohra <ishanvohra@Ishans-MacBook-Air.local>

chore: Add BCI models to registry server (tetherto#1938)

a6a62d4

Co-authored-by: Ishan Vohra <ishanvohra@Ishans-MacBook-Air.local>

Update registry model constants (tetherto#1948)

d186d96

QVAC-17345: add scheduled SDK install-check pipeline on main (tetherto#1917)

d4edfd8

* infra: add scheduled SDK install-check pipeline on main
* fix: update the review comments
* infra: surface npm lifecycle script output via foreground-scripts in install-check

doc: add architecture manifesto and principles to docs/architecture (tetherto#1942)

a8c6785

* doc: add architecture manifesto and principles to docs/architecture
* doc: drop internal North-Star/OKR/Google Doc references for public repo

QVAC-17357: auto-decide npm dist-tag so backports don't clobber latest (tetherto#1908)

31310d7

* infra: auto-decide npm dist-tag so backports don't clobber latest
* fix: update the review comments

QVAC-17877 feat[mod]: add TTS GGUF models to prod registry (tetherto#1957)

734f896

* chore: Add chatterbox ggml models
* chore: Add supertonic ggml models

---------

Co-authored-by: Ishan Vohra <ishanvohra@Ishans-MacBook-Air.local>

Bump version (tetherto#1970)

4f183f2

QVAC-18723 Error in Diffusion addon docs - quickstart doesn't work (tetherto#1972)

3651038

* doc: addons - diffusion - update
* doc: addon - diffusion - put flux as main model to be used

chore: Add parakeet GGML models to qvac reg server (tetherto#1964)

3c62bc2

Co-authored-by: Ishan Vohra <ishanvohra@Ishans-MacBook-Air.local>

Merge branch 'main' into qvac-b8828

dd41751

zoq added 2 commits May 22, 2026 14:11

Revert "test: we don't use that in the CI."

8594b22

This reverts commit 613934a.

test: re-apply iOS stress-test budget trims in embed-llamacpp addon.test.js.

98706c5

Signed-off-by: Marcus Edel <marcus.edel@collabora.com>

Point to the latest fabric changes.

4b3cf80

Signed-off-by: Marcus Edel <marcus.edel@collabora.com>

Merge branch 'main' into qvac-b8828

f463d1e

olyasir and others added 5 commits May 22, 2026 21:53

Merge branch 'main' into qvac-b8828

98aa444

doc[notask]: checkout docs-release-setup before invoking it (tetherto#2208)

fde5979

Co-authored-by: Bruno Campana <7632562+BrunoCampana@users.noreply.github.com>

Merge branch 'main' into qvac-b8828

d589e00

tamer-hassan-tether added 4 commits May 23, 2026 00:20

infra: fix windows bare tooling setup on windows when installing nodejs too (part 1) (tetherto#2216)

1210623

infra: fix windows bare tooling setup on windows (part 2) (tetherto#2217)

9490f49

infra: setup-build-host on GH hosted runners only (tetherto#2218)

67139c4

Merge branch 'main' into qvac-b8828

4dce43d

tamer-hassan-tether and others added 6 commits May 23, 2026 12:27

infra: use setup-vcpkg action in cpp test workflows (tetherto#2220)

bbbbc2b

Replace duplicated per-OS vcpkg clone/bootstrap steps with the shared setup-vcpkg action and drop the obsolete Windows user-profile override from reusable-prebuilds.

release: bump PR 2088 package versions

771b261

Co-authored-by: Cursor <cursoragent@cursor.com>

revert inference job runner change

abb683f

Co-authored-by: Cursor <cursoragent@cursor.com>

release: pin qvac fabric overlay to GDN OpenCL fix

8060ff7

Co-authored-by: Cursor <cursoragent@cursor.com>

Merge branch 'main' into qvac-b8828

0f8726b

zoq wants to merge 1396 commits into tetherto:main from zoq:qvac-b8828

zoq commented May 15, 2026

github-actions Bot commented May 22, 2026

github-actions Bot commented May 22, 2026

github-actions Bot commented May 22, 2026

github-actions Bot commented May 22, 2026

github-actions Bot commented May 22, 2026

github-actions Bot commented May 22, 2026

github-actions Bot commented May 22, 2026

github-actions Bot commented May 23, 2026

github-actions Bot commented May 24, 2026

19 participants
