
[DRAFT] QVAC b8828#2088

Closed
zoq wants to merge 1396 commits into
tetherto:main from
zoq:qvac-b8828

Conversation

zoq commented May 15, 2026

Collaborator

Pull qvac-fabric to PR 126 via per-package overlay ports, and add the addon-side adjustments needed against the rebased fabric API.

Victor-Rodzko and others added 30 commits May 7, 2026 17:01
… bootstrap (tetherto#1899)

* test: pre-load multi-file model companion sets on bootstrap

Switch chatterbox/supertonic/parakeet/vision/diffusion to preLoadUnload
so loadModel() at bootstrap fetches every companion file (encoder,
decoder, vocab, projection, etc.) — otherwise they were lazily fetched
inside the first test, which caused the tts-chatterbox-short-text
Android flake (5 ONNX files + ONNX Runtime cold init blew through the
600s test watchdog).

Add async config resolver to ResourceManager so chatterbox can resolve
its referenceAudioSrc from the bundled RN asset registry at bootstrap
time; cached per-dep. Remove the now-obsolete
patchChatterboxReferenceAudio workaround from MobileTtsExecutor.

Extend the iOS transcribe() skip list to catch the call sites that
slipped past the ^transcription- regex
(config-reload-then-transcribe, error-transcription-failed).

Co-authored-by: Cursor <cursoragent@cursor.com>

* test: extract shared resolveBundledAssetUri helper

Pull asset resolution + file:// stripping out of the duplicated copies in
mobile/consumer.ts and mobile/executors/model-asset-executor.ts into a
single mobile/asset-uri.ts helper. Both sites now delegate, so future
changes to expo-asset handling live in one place.

Tighter idioms in the helper itself: regex strip instead of substring(7),
?? instead of ||, no mutable let.
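The idioms named above can be sketched roughly as follows. This is not the actual contents of mobile/asset-uri.ts: the function names and the `ResolvedAsset` shape are hypothetical, chosen only to show why a regex strip and `??` are safer than `substring(7)` and `||`.

```typescript
// Hypothetical shape of a resolved asset; the real helper's input type is not shown in the source.
interface ResolvedAsset {
  localUri?: string | null;
  uri: string;
}

// Strips a leading file:// scheme only when it is actually present.
// substring(7) would blindly drop the first seven characters of any input.
function stripFileScheme(uri: string): string {
  return uri.replace(/^file:\/\//, "");
}

// Hypothetical wrapper combining both idioms.
function toFilesystemPath(asset: ResolvedAsset): string {
  // ?? falls back only on null/undefined; || would also discard an empty string.
  const resolved = asset.localUri ?? asset.uri;
  return stripFileScheme(resolved);
}
```

With this sketch, a path that never had a `file://` prefix passes through unchanged, which is the failure mode a fixed-offset `substring` cannot handle.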

Co-authored-by: Cursor <cursoragent@cursor.com>

* chore: clean stale @qvac/sdk snapshots before consumer install

Add `clean:sdk-snapshot` to wipe the cached @qvac/sdk copies left over
in tests-qvac/node_modules and the iOS/Android consumer build dirs by
previous `npm install --install-links` runs. Wire it into
`install:build:full` so a full rebuild always pulls a fresh SDK
snapshot after `prepare:sdk` rebuilds the SDK itself.

Co-authored-by: Cursor <cursoragent@cursor.com>

* test: dispatch mobile TTS tests by metadata.dependency

Mobile MobileTtsExecutor used a string-prefix heuristic that mapped
every `tts-supertonic-*` test to `tts-supertonic`, so the new
`tts-supertonic-multilingual` resource was preloaded but never
exercised by `tts-supertonic-multilingual-text`. Switch to
`test.metadata?.dependency` to match the desktop TtsExecutor.

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
- Bump packages/sdk/package.json: 0.10.1 -> 0.10.2.
- Add packages/sdk/changelog/0.10.2/{CHANGELOG.md, CHANGELOG_LLM.md}.
- Rebuild root packages/sdk/CHANGELOG.md aggregate with v0.10.2 at top.

Hotfix release for the delegated-inference connection regression introduced
in v0.10.0 (tetherto#1934). NOTICE file unchanged — no dependency changes since v0.10.1.

(cherry picked from commit a4f7225)
* Add tts-ggml workflows

* Add explicit file-level permissions to tts-ggml workflows

Addresses GitHub Advanced Security findings on PR tetherto#1946 flagging the
4 benchmark stubs and the two integration workflows for the CodeQL
actions/missing-workflow-permissions rule: every workflow file should
declare a least-privilege `permissions:` block at the top so that any
job added later inherits read-only by default instead of the implicit
read/write GITHUB_TOKEN.

- benchmark-{chatterbox,performance,rtf,supertonic}-tts-ggml.yml: add
  top-level `permissions: contents: read` plus a job-level mirror on the
  noop body (which only writes to stdout).
- integration-test-tts-ggml.yml: add top-level
  `permissions: contents: read; packages: read` matching the existing
  job-level scope on `run-integration-tests`.
- integration-mobile-test-tts-ggml.yml: add top-level
  `permissions: contents: read`; the existing `build-and-test` job
  continues to widen this to packages:read + pull-requests:write +
  id-token:write for the prebuild artifact pull and Device Farm hooks.

The other six tts-ggml workflows (cpp-test-coverage, create-github-release,
on-merge, on-pr, on-pr-close, prebuilds) already had top-level
permissions declared and are untouched.

Co-authored-by: Cursor <cursoragent@cursor.com>

* Pin mobile prebuild downloads to current run-id (CodeQL artifact poisoning)

Addresses CodeQL alert tetherto#796 (`actions/artifact-poisoning`) on PR tetherto#1946
flagged on the two `Download {Android,iOS} prebuilds (from artifacts)`
steps in integration-mobile-test-tts-ggml.yml.

The rule fires because the workflow has a `workflow_dispatch` entry
point and downloads an artifact via `actions/download-artifact@v8`
without an explicit `run-id`.  Without that input, CodeQL has to
assume the artifact could come from any prior run on the branch
(including one uploaded by a fork PR's prebuild step) — which is the
poisoning surface we are explicitly NOT exposing.

Setting `run-id: ${{ github.run_id }}` (the action's existing default)
plus `github-token: ${{ secrets.GITHUB_TOKEN }}` makes the trust
boundary explicit at the call site so the analyzer can see the
artifact is current-run only.  No behavioural change: in `workflow_call`
from on-pr-tts-ggml the parent workflow already produced the artifact
in the same run, and `workflow_dispatch` falls into the
`!inputs.package_spec` -> npm-pack branch since `package_spec`
defaults to `@qvac/tts-ggml@latest` for that trigger.

The four `actions/missing-workflow-permissions` findings on the
benchmark stubs are addressed by the previous commit (ef8d4e2).

Co-authored-by: Cursor <cursoragent@cursor.com>

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Ishan Vohra <ishanvohra@Ishans-MacBook-Air.local>
…o#1915)

* QVAC-18524 fix[api]: avoid Node-only Buffer in RN duplex RPC path

On RN/Hermes the bare-rpc duplex stream polyfill calls
`Buffer.from(chunk, encoding)` for string writes, which throws
`Property 'Buffer' doesn't exist` because Hermes has no global
`Buffer`. This blocks every transcribeStream() call on iOS/Android.

- expo-rpc-client: pre-encode the JSON payload to Uint8Array via
  TextEncoder so the polyfill's binary branch is taken everywhere.
- rpc-client: drop Buffer.from in the profiled response generator,
  widen DuplexWritable/DuplexReadable to accept Uint8Array/string.
- transcribe API + transcription schemas: widen all
  TranscribeStream*Session.write(audioChunk) signatures from `Buffer`
  to `Buffer | Uint8Array` so RN callers can pass Uint8Array directly.
- tests-qvac shared runner: stop wrapping Uint8Array slices in
  Buffer.from before writing.
- tests-qvac mobile consumer: skip transcribe-stream-events-* on iOS
  under the existing QVAC-18460 TODO (same native Silero/Whisper
  crash path as transcription-*).
- tests-qvac.mdc: add a one-liner rule about avoiding Node-only
  globals in shared/mobile test code.
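A minimal sketch of the pre-encoding idea from the expo-rpc-client bullet, assuming a global `TextEncoder` is available in the runtime. The function names are hypothetical and this is not the actual client code; the decode half exists only to show the round trip.

```typescript
// Encodes a JSON-serialisable payload to bytes so a stream write never
// needs Buffer.from(string, encoding) on runtimes without a global Buffer.
function encodeJsonPayload(payload: unknown): Uint8Array {
  return new TextEncoder().encode(JSON.stringify(payload));
}

// Reverse direction, for illustration; assumes TextDecoder is also available.
function decodeJsonPayload<T>(bytes: Uint8Array): T {
  return JSON.parse(new TextDecoder().decode(bytes)) as T;
}
```

Because the writer only ever sees a `Uint8Array`, the string branch of the polyfill (the one that calls `Buffer.from`) is never reached.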

Co-authored-by: Cursor <cursoragent@cursor.com>

* test: narrow duplex write() type to Uint8Array

Replace the `Buffer | Uint8Array` parameter unions on
`TranscribeStream*Session.write()` and `DuplexWritable.write()` with
plain `Uint8Array`. Node `Buffer` extends `Uint8Array`, so existing
`Buffer`-passing callers keep typechecking, but the public surface no
longer mentions a Node-only type. Per review feedback from @lauripiisang.

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
… paths (tetherto#1950)

* fix: add info-level logging to ocr-onnx load and inference paths
* fix: add config and stats info logging to ocr-onnx for full addon parity
* test: add unit test for info-level addon logging
* chore: bump ocr-onnx to 0.4.5

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
…o#1917)

* infra: add scheduled SDK install-check pipeline on main

* fix: update the review comments

* infra: surface npm lifecycle script output via foreground-scripts in install-check
…etherto#1942)

* doc: add architecture manifesto and principles to docs/architecture

* doc: drop internal North-Star/OKR/Google Doc references for public repo
tetherto#1908)

* infra: auto-decide npm dist-tag so backports don't clobber latest

* fix: update the review comments
…g page (tetherto#1935)

* doc: examples changed path in sdk/examples

* doc: CLI - doctor - fix issue + create new page troubleshooting

* doc: CLI - doctor - fix issue + create new page troubleshooting

* doc: fix sitemap generation in staging env

* doc: frontend - make it possible to have ToCs using less subsections headings

* doc: add tip - requested by reviewer

* doc: add tip - requested by reviewer

---------

Co-authored-by: Yury Samarin <yuri.a.samarin@gmail.com>
…etherto#1932)

* fix: route bare-crypto and bare-fetch through imports map

* fix(rag): harden crypto and fetch shims

Remove uuid-random and avoid mutating global crypto for ID generation.
Require secure randomness with globalThis.crypto or #crypto fallback.
Add crypto-browserify as an optional peer and clarify crypto/fetch errors.

---------

Co-authored-by: Yury Samarin <yuri.a.samarin@gmail.com>
* QVAC-17990 Add standalone ESRGAN upscaler API

* QVAC-17990 Address standalone upscaler review feedback

* QVAC-17990 Raise error when upscaler thread detection fails

* QVAC-17990 Share diffusion backend loading

* QVAC-17990 Honor cancel during ESRGAN upscale

* QVAC-17990 Tighten upscaler validation

* QVAC-17990 Add ESRGAN e2e integration coverage

* QVAC-17990 Add standalone upscaler changelog and model links

* QVAC-17990 Add ESRGAN coexistence integration coverage

* QVAC-17990 Share ESRGAN helpers and warn on dropped output

* QVAC-17990 Add ESRGAN cancel integration coverage

* QVAC-17990 Add ESRGAN cancel and coexistence tests

* QVAC-17990 Fix C++ lint issues

* QVAC-17990 Sync mobile integration manifest

* QVAC-17990 Use global native logging setup

* QVAC-17990 Document standalone EsrganUpscaler API

Add a standalone-esrgan-upscale example and a README usage section
covering the new EsrganUpscaler named export. The previous CHANGELOG
entry was the only user-facing reference to the new public class;
this commit makes it discoverable from the README index, the Other
Examples list, and a runnable example script that mirrors the existing
generate-image-esrgan-upscale flow but without the diffusion phase.

* QVAC-17990 Fix standalone ESRGAN example lint

---------

Co-authored-by: gianni-cor <gianfrancocordella@gmail.com>
Co-authored-by: gianni-cor <gianfranco.cordella@tether.io>
…orepo (tetherto#1860)

* mv qvac-lib-decoder-audio -> decoder-audio

* chore: mv lib-infer-diffusion -> infer-diffusion

* chore: remove qvac-lib- prefix from diagnostics pkg

* chore: remove qvac-lib prefix from infer-base

* Rename llamacpp-embed to embed-llamacpp

* chore: align llm-llamacpp folder to pkgname

* chore: align translation-nmtcpp with package.json canonical name

* chore: align folder for tts-onnx with pkgname

* chore: align onnx directory with pkg name, add cleanup

* chore: align dirname with pkgname transcription-parakeet

* chore: align transcription-whispercpp

* chore: mv langdetect-text-cld2 to canonical foldername

* chore: align foldername -> pkgname langdetect-text

* chore: align registry-server folder/pkgnames

* chore: rename pkg lint-cpp to match structure

* chore: align pkgname inference-addon-cpp

* chore: file update remove lib-* prefix

* chore: re-align to content of package.json > name canonical pkgname

* chore: completes rename to diffusion-cpp

* chore: align workflow name

* chore: wrap up lib-diagnostics rename

* chore: mv reusable-workflow{

* chore: mv llama-embed to embed-llama for workflows

* chore: mv llm-llamacpp workflows

* chore: mv translation-cpp workflow files

* chore: rename tts-onnx workflow files

* chore: rename onnx workflow files

* chore: mv transcription-parakeet workflows

* chore: rename whispercpp workflows

* chore: mv langdetect cld2 workflow

* chore: rename langdetect workflow

* chore: mv registry-server workflow

* chore: rename remaining files

* Tidyup: remaining files

* Tidyup: remaining renames

* chore: fixup moved embed-llamacpp

* fixup: additional renames

* fixup: revert package changes to upstream vcpkg

* fixup: remove non-existing upstreams
…1 / tetherto#1860) (tetherto#1959)

PR tetherto#1860 (commit 1d1d8c3) renamed the inference-addon-cpp package and
updated its CMakeLists.txt to look up the package config template at
cmake/inference-addon-cppConfig.cmake.in, but the template file itself
was never moved off the legacy filename. As a result every vcpkg port
build of qvac-lib-inference-addon-cpp@1.1.7#1 fails at configure time:

    CMake Error at .../cmake/CMakePackageConfigHelpers.cmake:519 (configure_file):
      configure_file Problem configuring file
    Call Stack (most recent call first):
      CMakeLists.txt:41 (configure_package_config_file)

Just complete the rename: git mv the template to the short post-rename
name. Contents are template-only (`@PACKAGE_INIT@` + `@PROJECT_NAME@`)
and need no edits.

Co-authored-by: gianni-cor <gianfranco.cordella@tether.io>
Co-authored-by: Cursor <cursoragent@cursor.com>
…inference-addon-cpp/ (tetherto#1960)

PR tetherto#1860 renamed the inference-addon-cpp include namespace and tetherto#1959
landed the matching cmake.in / vcpkg.json updates, but EsrganUpscalerModel
was overlooked and still pulled the old namespace headers, breaking the
diffusion-cpp build now that the registry advertises 1.1.7#1 (which
installs under include/inference-addon-cpp/):

    .../diffusion-cpp/addon/src/model-interface/EsrganUpscalerModel.hpp:10:10:
      fatal error: 'qvac-lib-inference-addon-cpp/ModelInterfaces.hpp' file not found

Just align the three remaining #include directives.

Co-authored-by: Cursor <cursoragent@cursor.com>
…1957)

* chore: Add chatterbox ggml models

* chore: Add supertonic ggml models

---------

Co-authored-by: Ishan Vohra <ishanvohra@Ishans-MacBook-Air.local>
…etherto#1922)

* infra[notask]: rename desktop test runner labels for sdk tests-qvac

Update the GPU runner labels used by the SDK desktop tests workflows to
the new naming scheme:

- ai-run-windows11-gpu  -> qvac-win25-x64-gpu
- ai-run-linux-gpu      -> qvac-ubuntu2204-x64-gpu
- mac-mini-m4-gpu       (unchanged)

Affected:

- .github/workflows/test-sdk.yml: update default for desktop-platforms
  in workflow_dispatch and workflow_call inputs, plus the inline
  fallback used when calling test-desktop-sdk.yml.
- .github/workflows/test-desktop-sdk.yml: refresh the runner-label
  example in the platforms input description for consistency.

Co-authored-by: Cursor <cursoragent@cursor.com>

* infra[notask]: enable cross-os archive for desktop sdk model cache

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Dmytro Medvinskyi <functionsilence@gmail.com>
…ate (tetherto#1926)

* QVAC-18394 infra: add devops pod conventions, team file, and PR template

Baseline DevOps pod metadata and conventions to unblock the QVAC-18394
skill subtasks (Stale-Prs, Create-pr, Daily-update, Pr-review).
Documentation and config only; no behavior change.

Files:
- .github/teams/devops.json — pod metadata (leads, members, ownedPaths)
- .cursor/rules/devops/main.mdc — pod entry point + operating principles
- .cursor/rules/devops/github-actions.mdc — workflow/action conventions
- .cursor/rules/devops/secrets-and-credentials.mdc — secrets handling
  + leak-response playbook
- .cursor/rules/devops/agentic-automation.mdc — read-only-default,
  plan-then-apply, validation-before-success for AI-driven work
- .cursor/rules/devops/commit-and-pr-format.mdc — commit/PR title format
  scoped to .github/** and scripts/** (sdk pod's rule is package-scoped)
- .github/PULL_REQUEST_TEMPLATE/devops.md — PR body template mirroring
  sdk-pod.md / addon.md discipline (flat sections only)

Validated:
- All .mdc frontmatter parses cleanly (description, globs, alwaysApply)
- devops.json parses cleanly
- No linter errors, no secret patterns matched
- PR template structure mirrors existing templates (no H3 nesting,
  no tables, no HTML)

Co-authored-by: Cursor <cursoragent@cursor.com>

* QVAC-18394 chore: expand devops pod roster with 5 team members

Adds the rest of the active DevOps engineers to .github/teams/devops.json
so /devops-pr-status correctly partitions reviewers between "Reviews:"
(team) and "Other:" (outside) buckets. Without this, every team-member
review currently lands in "Other:" and the dashboard reports approvals
as still-needed.
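The partitioning described above might look like the sketch below. It is illustrative only and is not taken from pr-status.mjs; the function name, the bucket field names, and the case-insensitive login comparison are assumptions.

```typescript
// Hypothetical result shape mirroring the "Reviews:" and "Other:" buckets.
interface ReviewBuckets {
  reviews: string[]; // reviewers who are on the pod roster
  other: string[];   // reviewers outside the pod
}

// Splits reviewer logins by roster membership, comparing case-insensitively.
function partitionReviewers(reviewers: string[], roster: string[]): ReviewBuckets {
  const team = new Set(roster.map((login) => login.toLowerCase()));
  const buckets: ReviewBuckets = { reviews: [], other: [] };
  for (const login of reviewers) {
    (team.has(login.toLowerCase()) ? buckets.reviews : buckets.other).push(login);
  }
  return buckets;
}
```

With an incomplete roster every reviewer falls into `other`, which matches the symptom the commit describes.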

Members (alphabetical, case-insensitive):
- darkynt (Matt Cavanagh)
- GiacomoSorbiWork (Giacomo)
- sidj-thr
- tamer-hassan-tether
- yauhenipankratovich-web

Removes Proletter from members per the cross-pod convention (lead is
listed in `leads` only — see .github/teams/sdk.json).

Validation:
- JSON parses; pr-status.mjs --pod devops --mode team loads the new
  roster without error.
- No code/path changes, data-only update.

Co-authored-by: Cursor <cursoragent@cursor.com>

* QVAC-18394 chore: drop commit-and-pr-format rule (skill-only)

Per review feedback: rules auto-attach via globs and pollute the
context window on every devops surface. The format spec is already
encoded in devops-pr-create (regex validation, allowed prefixes/tags,
trigger detection) and devops-pr-review (title validation against the
same regex) — both invoked explicitly, never autoloaded.

- Delete .cursor/rules/devops/commit-and-pr-format.mdc (5 KB).
- main.mdc: drop the rule from the related-rules table and replace the
  "Commit messages and PR titles" section with a one-line pointer to
  the devops-pr-create skill.

Skill-side cross-references to the deleted rule are cleaned up on
PR tetherto#1929 (next in the stack) since that's where the skills live.

Co-authored-by: Cursor <cursoragent@cursor.com>

* QVAC-18394 feat: add devops pod skills (pr-status, pr-create, daily-update, pr-review) (tetherto#1929)

* QVAC-18394 feat: add devops pod skills (pr-status, pr-create, daily-update, pr-review)

Resolves the four QVAC-18394 subtasks by adding the DevOps pod's user-facing
Cursor skills on top of the conventions and team file landed in the
prereq branch.

The skills lean on the existing _lib/pr-skills/ shared library for pod
discovery, PR enumeration, Slack-handle mapping, and worktree management,
so no new shared infra is added — only thin SKILL.md surfaces and
DevOps-specific workflows.

Files:
- .cursor/skills/devops-pr-status/SKILL.md — Stale-Prs subtask. Thin wrapper
  invoking pr-status.mjs --pod devops --mode team. The shared script already
  segregates PRs into needs-your-re-review / stale (>3d) / needs-review and
  flags merge conflicts; no separate stale-only mode is needed.
- .cursor/skills/devops-pr-create/SKILL.md — Create-pr subtask. Generates
  TICKET prefix[tag]?: subject titles + devops.md PR body, with trigger
  detection (action-pinning / permissions / IaC plan / [bc]) driving which
  template sections are required. Client-side title validation since no
  pr-validation-devops.yml exists yet.
- .cursor/skills/devops-daily-update/SKILL.md — Daily-update subtask.
  Aggregates yesterday's merged PRs, today's open PRs, reviews owed,
  and recent CI runs into a Slack/Asana-ready message. Bounded to <=6
  shell calls. Read-only; never posts. Includes a secret-pattern scrub
  before writing the temp file.
- .cursor/skills/devops-pr-review/SKILL.md — Pr-review subtask, absorbs
  gha-audit. Wraps /pr-review (does NOT fork it) and layers a deterministic
  GitHub Actions security audit (15 checks A1-A15) sourced verbatim from
  .cursor/rules/devops/github-actions.mdc and secrets-and-credentials.mdc.
  Findings flow into the same pending-review payload the user confirms.

All four skills:
- disable-model-invocation: true (state-changing or PR-posting flows)
- Reference rules and team file landed by the prereq PR
- Inherit safety + efficiency rules from .cursor/rules/devops/agentic-automation.mdc
  (read-only by default, plan-then-apply for state changes, bounded shell calls)

Validated:
- All four SKILL.md frontmatter parses (name matches directory; non-trivial description)
- All 12 cross-file references resolve (rules, team file, PR template, shared lib, parent skills)
- gh search prs / gh run list flags + JSON fields verified against gh CLI 2.x help output
- ReadLints clean
- No formatter mangling

Co-authored-by: Cursor <cursoragent@cursor.com>

* QVAC-18394 fix: align devops-daily-update output to team's slack template

The first draft used a generic Markdown layout (`## Yesterday`, `## Today`,
`## Blockers`, `_(none)_` for empty sections, GitHub-flavored links).
The team's actual daily-update format on Slack is different:

  🔨 *Done today*
  - QVAC-XXXXX: <past-tense action>
      - <optional sub-bullet>

  📅 *Planned for tomorrow*
  - QVAC-XXXXX: <forward-looking action>
  - QVAC-YYYYY

  🚧 *Blockers / risks*
  - N/A

Changes:
- Replaced the section names and added the canonical 🔨 / 📅 / 🚧 emoji
- Switched from Markdown headings to Slack-bold (`*Section*`) so the output
  renders correctly when pasted into Slack (Slack does not render `##`)
- Empty sections now render `- N/A` (literal), not `_(none)_`
- Bullets lead with `TICKET:` (auto-linked by the workspace's Asana app),
  not `#<pr-num>` — falls back to `#<num>` only when no ticket can be
  extracted from PR title or branch name
- Sub-bullets at 4-space indent for ticket-level context
- Default `--format` is now `slack` (not `markdown`) — Slack is the primary
  destination; chat preview keeps the Markdown form
- Temp file extension changed `.md` → `.txt` to reflect Slack mrkdwn (not
  GitHub-flavored Markdown) as the canonical form
- Added ticket-extraction rules (PR title → branch name → `#<pr-num>`)
- Added a per-section routing table (merged-today / pushed-today /
  open-no-recent-commits / reviews-owed / conflicting / stale-review /
  CI-failing) so the agent knows which bucket each item lands in
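The ticket-extraction fallback chain (PR title, then branch name, then `#<pr-num>`) can be sketched as below. The regex and the function name are assumptions based on the `QVAC-XXXXX` form shown in the template; the skill's real rules may differ.

```typescript
// Assumed ticket shape: "QVAC-" followed by digits, matched case-insensitively
// so lowercase branch names such as "feature/qvac-18524-foo" still resolve.
const TICKET_PATTERN = /\bQVAC-\d+\b/i;

// Returns the bullet label for a PR: ticket from the title, else ticket from
// the branch name, else the "#<pr-num>" fallback.
function extractTicketLabel(prTitle: string, branchName: string, prNumber: number): string {
  const match = TICKET_PATTERN.exec(prTitle) ?? TICKET_PATTERN.exec(branchName);
  return match ? match[0].toUpperCase() : `#${prNumber}`;
}
```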

Lookback default unchanged at "yesterday 00:00 local" — covers both an EOD
post late evening and a morning standup at 7am without manual `--since`.

Quality gates updated to enforce the new layout (correct emoji + section
names; `- N/A` for empty; no Markdown headings in Slack form; no GitHub-
style links).

The skill is still read-only and never posts. The user copies from the
temp file and pastes into Slack manually.

Co-authored-by: Cursor <cursoragent@cursor.com>

* QVAC-18394 chore: align devops skills to sdk-pod conventions

Self-audit pass against `.cursor/rules/sdk/skill-authoring-guidelines.mdc`
and the SDK pod's reference skills (sdk-pr-status, sdk-pr-create,
sdk-changelog, sdk-backmerge). Documentation-only.

Description tightening:
- devops-pr-status:    341 → 275 chars
- devops-pr-create:    269 → 231 chars
- devops-daily-update: 398 → 255 chars
- devops-pr-review:    386 → 271 chars

Reference: sdk-pr-status's description is 256 chars. All four are now in
the same 230–280 range, vs the prior 270–400 range. WHAT/WHEN preserved
on each.

Heading consistency:
- "## Quality gates" → "## Quality Checklist" in devops-daily-update,
  devops-pr-review (sdk-changelog / sdk-backmerge / sdk-pr-create all use
  "Quality Checklist")
- "## Validation gate (CLIENT-SIDE)" → "## Validation" in devops-pr-create
  (no SDK skill uses uppercase parenthetical scope qualifiers in headings)

Editorial cleanup:
- devops-pr-status: dropped the "Resolves the Stale-Prs subtask of
  QVAC-18394 …" paragraph (skill bodies should not reference their own
  PR/ticket; SDK skills never do)
- devops-daily-update: dropped the upfront "## Canonical template"
  section (~25 lines). Step 8's "#### Slack form (canonical)" is the
  single source of truth for the format. Folded the one unique line —
  bare-ticket bullets allowed when self-evident — into Step 8.

Reduced devops-daily-update from 269 → 242 lines. Other line-counts
stable (46, 183, 140).

No behaviour changes. Cross-file references still resolve. Frontmatter
parses; name matches dir; disable-model-invocation: true preserved on
all four. ReadLints clean.

Co-authored-by: Cursor <cursoragent@cursor.com>

* QVAC-18394 fix: devops skill issues found during test pass

- github-actions.mdc § Permissions: accept top-level OR per-job
  permissions blocks as equivalent (per-job is the more secure
  narrower-scope pattern).
- github-actions.mdc § File layout: add integration-<scope>-<pkg>.yml
  to the canonical filename list (existing repo convention).
- devops-pr-review SKILL.md: tighten A2 + A15 check descriptions to
  mirror the loosened rule (audit becomes more permissive — no
  consumers break).
- devops-daily-update SKILL.md: trim merged-PRs gh-search --json
  field set to what the API actually exposes (closedAt, not mergedAt/
  additions/deletions); add cap of 5 most-recently-updated reviews
  to the standup output with overflow line.
- devops-pr-create SKILL.md + devops.md PR template: drop the
  redundant "be concise" Note line from the template head.

All issues uncovered by the end-to-end test session of the four new
devops skills on this branch.

Co-authored-by: Cursor <cursoragent@cursor.com>

* QVAC-18394 fix: emit paste-ready output files in devops pr-status + pr-create

- devops-pr-status: tee dashboard stdout to /tmp/devops-pr-status-<date>.txt
  and redirect stderr to a sibling .stderr file. Print pbcopy/xclip/wl-copy
  commands so the operator can paste the dashboard straight into a Slack
  thread (Slack auto-renders the indented plain text as nested bullets and
  turns #<num> into PR auto-links).
- devops-pr-create: add an explicit step 8 to write the assembled PR body
  to /tmp/pr-body.md (the gh CLI Integration section already cat's that
  path). Add the pbcopy/xclip/wl-copy commands as step 9 for direct paste
  into the GitHub PR-create form.

Discovered during the test pass — the dashboard output was useful but the
operator had to manually copy from the terminal. Now there's a single
pbcopy command to grab paste-ready content.

Co-authored-by: Cursor <cursoragent@cursor.com>

* QVAC-18394 chore: drop commit-and-pr-format rule cross-references in skills

Mirror the rule deletion on PR tetherto#1926 — remove dead links from
devops-pr-create and devops-pr-review SKILL.md, and inline the
title regex / allowed prefixes / allowed tags so the skills stay
self-contained without auto-loading anything via globs.

- devops-pr-create: Format References now points at the inline
  Validation regex; the "see rule" parenthetical in Validation is
  replaced with a one-line note that no pr-validation-devops.yml
  exists yet; the References bullet for the deleted rule is
  removed.
- devops-pr-review: drop commit-and-pr-format from the auto-load
  list in step 4 (it's deleted, no longer auto-loads); inline the
  format spec in step 5 (regex + prefixes + tags); replace the
  rule bullet in References with a pointer to devops-pr-create as
  the canonical home for the format spec.

No behavior changes — same regex, same prefix/tag list, same
validation logic.

Co-authored-by: Cursor <cursoragent@cursor.com>

---------

Co-authored-by: Cursor <cursoragent@cursor.com>

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
* Restore Qwen3.5 / Gemma4 / PaddleOCR-VL tests + Mali coopmat fix

Stack of three logical changes squashed into one commit so the test
ports stay self-consistent with the build/runtime they depend on:

1. qvac-fabric overlay ports (LLM + embed + nmtcpp):
   - Pin to fabric 78db8bf4 (PR tetherto/qvac-fabric-llm.cpp#121 HEAD,
     includes c79a8851 "ggml-vulkan: Fix NaN outputs on Mali").
   - Drop -DGGML_VULKAN_DISABLE_COOPMAT*=ON for Android so coopmat
     shaders are compiled in. With coopmat off, runtime
     device->coopmat_support is false and the Mali fix's ARM-gated
     branches were skipped, leaving Qwen3-Q8_0 finetuning NaN on
     Pixel 9 Pro Mali.
   - Wire up overlay-ports in each package's vcpkg-configuration.json.
   - Add find_package(OpenSSL) before find_package(llama) in the LLM
     CMakeLists so llama-targets.cmake's transitive OpenSSL::SSL
     reference (via cpp-httplib) resolves on local builds.

2. utils.js downloadFile redirect race:
   - Track a handedOff flag set when the redirect branch hands off
     dest to a recursive call. All cleanup paths now skip fs.unlink
     once ownership is transferred, so a late error from the outer
     writestream can't delete the freshly-downloaded file (Pixel
     ENOENT after "successful" mmproj download).

3. Three new integration tests + their mobile harness wiring:
   - qwen3-5.test.js — basic / multi-turn / tool-calling
   - gemma4.test.js — text / multi-turn / image (forced to CPU on
     darwin + mobile because gemma4v projector SIGSEGVs on Metal and
     Adreno OpenCL) / tool-calling
   - ocr-paddle.test.js — OCR; mobile maxTokens capped to 768
   - Ported to the new addon API (files: { model: [absPath],
     projectionModel?: absPath }, config: …).
   - Added matching unit test test_text_llm_context_qwen3.cpp.
   - integration.auto.cjs registers runQwen35Test, runGemma4Test,
     runOcrPaddleTest dispatchers.
   - test-groups.json: iOS heavy4 cluster
     (Gemma4+OcrLighton+OcrPaddle), iOS lightB adds Qwen35,
     Android groupB has Qwen35 first then Gemma4 / OcrPaddle.
   - Workflow: Android GroupB Device Farm jobTimeout 60→90 min.
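The `handedOff` ownership flag from item 2 can be illustrated in isolation. This sketch is not the utils.js implementation: the factory, its method names, and the injected `unlink` callback are hypothetical, used so the pattern can be shown without real network or filesystem calls.

```typescript
type Unlink = (path: string) => void;

// Models one download attempt that owns `dest` until it hands it off.
function makeDownloadAttempt(dest: string, unlink: Unlink) {
  let handedOff = false;
  return {
    // Called when the redirect branch passes `dest` to a recursive download.
    handOff(): void {
      handedOff = true;
    },
    // Called from every error/cleanup path of the outer attempt.
    // Once ownership has been transferred, the file must not be deleted.
    cleanup(): void {
      if (!handedOff) unlink(dest);
    },
  };
}
```

A late error on the outer attempt then calls `cleanup()` harmlessly, instead of removing the file the redirected download has just written.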

* API port + Gemma4 tool-call fix.

Signed-off-by: Marcus Edel <marcus.edel@collabora.com>

* Wire addon/src/patches ahead of the vcpkg include path to pick up the LlamacppUtils.hpp ptr-API override.

Signed-off-by: Marcus Edel <marcus.edel@collabora.com>

* API port + Gemma4 tool-call fix.

Signed-off-by: Marcus Edel <marcus.edel@collabora.com>

* Split iOS heavy4 into three single-test specs (heavy4 = OcrLighton, new heavy7 = Gemma4, new heavy8 = OcrPaddle) and schedule them as separate Device Farm runs to avoid memory pressure.

Signed-off-by: Marcus Edel <marcus.edel@collabora.com>

* Drop LlamacppUtils.hpp patch override; bump addon-cpp to 1.1.7

The LlamacppUtils.hpp common_init_result_ptr API now ships in
qvac-lib-inference-addon-cpp 1.1.7 (PR tetherto#1887), so the local
addon/src/patches/qvac-lib-inference-addon-cpp/LlamacppUtils.hpp
shim is no longer needed in the embed and llm addons.

- Delete the patch headers in embed and llm.
- Drop the BEFORE PRIVATE addon/src/patches include path from the
  embed/llm production and unit-test CMakeLists.
- Bump qvac-lib-inference-addon-cpp version>= to 1.1.7 in the embed,
  llm, and nmtcpp vcpkg.json files so they pick up the upstream
  ptr-API header from the registry.

The OpenSSL find_package() addition stays — it's an unrelated
local-build fix.

Co-authored-by: Cursor <cursoragent@cursor.com>

* Cap ocr-lighton predict to 1800 (desktop) / 768 (mobile) so the LightOnOCR response can't overrun ctx_size=4096.

Signed-off-by: Marcus Edel <marcus.edel@collabora.com>

* Rewrite sliding-context test to use the post-GGML_PAD effective n_ctx (512) and retune n_predict / n_discarded so all 8 cases match the current ContextSlider semantics.

Signed-off-by: Marcus Edel <marcus.edel@collabora.com>

* Allow embed batching test to override ctx_size and pin gte-large to batch_size=512 / ctx_size=384 to probe the Mali Vulkan first-submit ErrorDeviceLost.

Signed-off-by: Marcus Edel <marcus.edel@collabora.com>

* Fix the reverse-prompt scenario by removing the comma and space, listing both 'pizza' and 'Pizza', and lowercasing the assertion comparisons so that either casing matches.

Signed-off-by: Marcus Edel <marcus.edel@collabora.com>

* Sanitize media Uint8Array prompts before logging to avoid V8 Zone OOM.

Signed-off-by: Marcus Edel <marcus.edel@collabora.com>

* Use Qwen3 family chat-template to fix Qwen3.5-0.8B gibberish output on macOS Metal.

Signed-off-by: Marcus Edel <marcus.edel@collabora.com>

* Update portfiles to point to the latest fabric.

Signed-off-by: Marcus Edel <marcus.edel@collabora.com>

* Revert "Allow embed batching test to override ctx_size and pin gte-large to batch_size=512 / ctx_size=384 to probe the Mali Vulkan first-submit ErrorDeviceLost."

This reverts commit 1408896.

Signed-off-by: Marcus Edel <marcus.edel@collabora.com>

* Raise AfriqueGemma cancel maxWait to 60s, and apply the use_jinja gate-drop so Qwen3-family models always pick the fixed jinja template.

Signed-off-by: Marcus Edel <marcus.edel@collabora.com>

* Drop the retired AfriqueGemma integration tests.

Signed-off-by: Marcus Edel <marcus.edel@collabora.com>

* Update portfiles to point to the latest head.

Signed-off-by: Marcus Edel <marcus.edel@collabora.com>

* Update portfiles to point to the latest head.

Signed-off-by: Marcus Edel <marcus.edel@collabora.com>

* Drop qwen35 from the Qwen3-template detection and the supported-finetune-architecture list since neither path is actually validated for Qwen3.5.

Signed-off-by: Marcus Edel <marcus.edel@collabora.com>

* Update portfiles to point to the latest head.

Signed-off-by: Marcus Edel <marcus.edel@collabora.com>

* Enable coopmat.

Signed-off-by: Marcus Edel <marcus.edel@collabora.com>

* Drop the Qwen3 use_jinja override pairing now that qwen35 is no longer treated as Qwen3-family.

Signed-off-by: Marcus Edel <marcus.edel@collabora.com>

* Use only general.architecture for Qwen3 detection so Qwen3.5 stops getting the Qwen3 chat-template via the model-name substring fallback.

Drop modelNameLooksLikeQwen3 / getModelName and the modelName parameter from supportsToolsCompactForModelMetadata and selectToolsCompactMarkerForModelMetadata. The substring match on general.name treated "Qwen3.5-..." as Qwen3 and overrode the model's embedded tokenizer.chat_template, contradicting the recent decision to keep qwen35 out of the Qwen3 family. Update the LlamaModel call site and unit tests; add explicit qwen35/nullopt negative cases.

Co-authored-by: Cursor <cursoragent@cursor.com>

* Accept HuggingFace function-call XML in extractToolCalls so the Qwen3.5 tool-calling integration test parses the model's native <tool_call><function=...><parameter=...>...</parameter></function></tool_call> envelope produced by its embedded chat template, in addition to the Qwen3-style JSON envelope.

Co-authored-by: Cursor <cursoragent@cursor.com>
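
A minimal sketch of a parser that accepts both envelopes named in the commit above: the Qwen3-style JSON body and the function/parameter XML body. The function bodies and regular expressions here are assumptions written for illustration; they are not the code in the integration test.

```javascript
// Parses <function=NAME><parameter=KEY>VALUE</parameter>...</function>.
function parseXmlToolCall(body) {
  const fn = /<function=([^>\s]+)>([\s\S]*?)<\/function>/.exec(body);
  if (!fn) return null;
  const args = {};
  const paramRe = /<parameter=([^>\s]+)>([\s\S]*?)<\/parameter>/g;
  for (const m of fn[2].matchAll(paramRe)) {
    args[m[1]] = m[2].trim(); // values are kept as strings in this sketch
  }
  return { name: fn[1], arguments: args };
}

// Finds every <tool_call>...</tool_call> block and parses its body as
// JSON when it starts with "{", otherwise as the XML form.
function extractToolCalls(text) {
  const calls = [];
  for (const m of text.matchAll(/<tool_call>([\s\S]*?)<\/tool_call>/g)) {
    const body = m[1].trim();
    let call = null;
    if (body.startsWith('{')) {
      try { call = JSON.parse(body); } catch { call = null; }
    } else {
      call = parseXmlToolCall(body);
    }
    if (call) calls.push(call);
  }
  return calls;
}
```

Dispatching on the first character of the body keeps the JSON path unchanged for models that already emit it.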

* Bump n_predict in the Qwen3.5 basic and multi-turn integration tests so the embedded chat-template's reasoning block has room to finish before the answer on slower CI backends.

Co-authored-by: Cursor <cursoragent@cursor.com>

* Enable coopmat and point to the latest fabric.

Signed-off-by: Marcus Edel <marcus.edel@collabora.com>

* Route Qwen3.5 inference and all finetuning on Mali to CPU, disable Vulkan coopmat at build time, halve mobile finetune workload to account for CPU training.

Signed-off-by: Marcus Edel <marcus.edel@collabora.com>

* Point to the latest fabric version.

Signed-off-by: Marcus Edel <marcus.edel@collabora.com>

* Force Bert to the CPU on Mali.

Signed-off-by: Marcus Edel <marcus.edel@collabora.com>

* Run finetuning on Mali GPU.

Signed-off-by: Marcus Edel <marcus.edel@collabora.com>

* Run Qwen 3.5 on Mali GPU.

Signed-off-by: Marcus Edel <marcus.edel@collabora.com>

* Point to the latest fabric version and enable coopmat path.

Signed-off-by: Marcus Edel <marcus.edel@collabora.com>

* vcpkg: drop per-package qvac-fabric overlays

Removes the qvac-fabric overlay-ports infrastructure from the LLM,
Embed, and NMT manifests. The default-registry baseline is left
untouched, so vcpkg now resolves qvac-fabric directly from the
registry at the existing baseline (7248.2.3).

Bumping to fabric 8189.0.0 will be handled by a separate baseline
update; this commit only undoes the overlay-based development setup
that was no longer needed.

- vcpkg-configuration.json (3x): drop "overlay-ports" entry.
- vcpkg/ports/qvac-fabric/ (3x): remove overlay portfile.cmake,
  vcpkg.json, and android-vulkan-version.cmake.

Co-authored-by: Cursor <cursoragent@cursor.com>

* vcpkg: bump qvac-fabric version constraint to 8189.0.0

Updates the consumer manifests in the LLM, Embed, and NMT packages
to require qvac-fabric >= 8189.0.0. The default-registry baseline
is intentionally left untouched.

Co-authored-by: Cursor <cursoragent@cursor.com>

* llm/embed/nmtcpp: bump versions for qvac-fabric 8189.0.0

- qvac-lib-infer-llamacpp-llm: 0.19.2 -> 0.20.0 (minor)
- qvac-lib-infer-llamacpp-embed: 0.15.0 -> 0.16.0 (minor)
- qvac-lib-infer-nmtcpp: 2.1.1 -> 3.0.0 (major)

The nmtcpp major bump reflects a real behavioural regression: the
previous overlay built ggml unconditionally with every GPU backend
the platform supported (Vulkan/Metal/OpenCL); switching to the
upstream registry port with the existing "default-features": false
in nmtcpp's vcpkg.json now disables the new "gpu-backends" feature,
so out-of-the-box ggml exposes only the CPU backend. Consumers that
rely on GPU-accelerated nmt inference must add
'"features": ["gpu-backends"]' to the qvac-fabric block of their
nmtcpp build manifest.

CHANGELOG entries added in all three packages.

Co-authored-by: Cursor <cursoragent@cursor.com>

* nmtcpp: opt into qvac-fabric gpu-backends feature; downgrade bump to 2.2.0

The previous commit (3.0.0) flagged a breaking change: switching from
the always-on overlay to the registry port with default-features:false
disabled GPU backends in ggml. Adding "features": ["gpu-backends"]
to nmtcpp's qvac-fabric dep restores the previous Vulkan/Metal/OpenCL
behaviour, so the bump is now a non-breaking minor (2.2.0) and the
BREAKING note in the changelog is replaced with a plain Changed entry.

Co-authored-by: Cursor <cursoragent@cursor.com>

* nmtcpp: re-bump to 3.0.0 (major)

Restores the major version bump for nmtcpp. The new fabric port schema
(features split between gpu-backends/llama) and the move from a vendored
overlay to the upstream registry are large enough downstream changes
that consumers should treat this as a major release, even though
runtime behaviour is preserved by opting into "gpu-backends".

Co-authored-by: Cursor <cursoragent@cursor.com>

* vcpkg: pin qvac-fabric to >=8189.0.0#1

The 8189.0.0 (port-version 0) qvac-fabric port shipped a
configure-time bug for consumers without the "llama" feature
(i.e. nmtcpp): -DLLAMA_MTMD=ON was passed unconditionally, which
transitively enables LLAMA_BUILD_COMMON, which makes upstream call
license_generate(common) -- but BUILD_LLAMA=OFF skips defining the
'common' target, so the cmake configure aborts.

The fix landed in tetherto/qvac-registry-vcpkg#136 as
qvac-fabric port-version 1. Bumping the consumer constraint from
"version>=": "8189.0.0" to "version>=": "8189.0.0#1" forces vcpkg
to pick the fixed port-version (otherwise it picks the lowest
satisfying version, which is the broken #0).

Validated: nmtcpp arm64-android cross-build now configures and
builds end-to-end against the upstream registry, no overlay needed.

Co-authored-by: Cursor <cursoragent@cursor.com>
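
The effect of the `#1` suffix in the commit above can be modelled with a small version comparison. This is a simplified sketch of minimum-version selection, assuming dotted numeric versions and an optional `#port-version`; vcpkg's real resolver handles more version schemes and is not reproduced here. `parseVersion`, `compareVersions`, and `resolveMinimum` are hypothetical names.

```javascript
// Splits "8189.0.0#1" into numeric parts and a port-version (default 0).
function parseVersion(spec) {
  const [version, port = '0'] = spec.split('#');
  return { parts: version.split('.').map(Number), portVersion: Number(port) };
}

// Returns -1, 0, or 1; the port-version only breaks ties between equal versions.
function compareVersions(a, b) {
  const va = parseVersion(a);
  const vb = parseVersion(b);
  const length = Math.max(va.parts.length, vb.parts.length);
  for (let i = 0; i < length; i++) {
    const diff = (va.parts[i] ?? 0) - (vb.parts[i] ?? 0);
    if (diff !== 0) return Math.sign(diff);
  }
  return Math.sign(va.portVersion - vb.portVersion);
}

// Picks the lowest version that satisfies the baseline and every
// "version>=" constraint, i.e. the highest of the stated minimums.
function resolveMinimum(baseline, constraints) {
  return constraints.reduce(
    (best, c) => (compareVersions(c, best) > 0 ? c : best),
    baseline
  );
}

console.log(resolveMinimum('7248.2.3', ['8189.0.0']));
console.log(resolveMinimum('7248.2.3', ['8189.0.0#1']));
```

Under this model, a constraint of `8189.0.0` resolves to port-version 0, while `8189.0.0#1` raises the minimum to the fixed port-version.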

* docs: drop overlay-removal note from changelogs

Removes the changelog bullet describing the deletion of the per-package
qvac-fabric vcpkg overlay. The overlay teardown is mechanical packaging
plumbing rather than a user-facing change worth documenting.

Co-authored-by: Cursor <cursoragent@cursor.com>

* test/llm: restore AfriqueGemma integration tests (desktop-only)

Reverts e257a19's deletion of the afriquegemma-edge-cases and
afriquegemma-translation integration tests, and adds a 'desktopOnly'
opt-out so they're skipped on mobile without breaking the per-test
group coverage invariant.

- packages/qvac-lib-infer-llamacpp-llm/test/integration/afriquegemma-edge-cases.test.js: restored.
- packages/qvac-lib-infer-llamacpp-llm/test/integration/afriquegemma-translation.test.js: restored.
- test/mobile/test-groups.json: new top-level "desktopOnly" array
  listing runAfriquegemmaEdgeCasesTest and runAfriquegemmaTranslationTest.
- scripts/generate-mobile-integration-tests.js: validateGroups now
  reads the desktopOnly list; entries are still emitted into
  integration.auto.cjs (so validate-mobile-tests stays happy) but
  excluded from the per-platform "missing" check, so the mobile
  runners never invoke them.
- test/mobile/integration.auto.cjs: regenerated by
  `npm run test:mobile:generate`.
- CHANGELOG note in qvac-lib-infer-llamacpp-llm under Tests.

Validated via `npm run test:mobile:generate` + `npm run test:mobile:validate`.

Co-authored-by: Cursor <cursoragent@cursor.com>

* docs(llm): drop AfriqueGemma test restoration changelog note

Co-authored-by: Cursor <cursoragent@cursor.com>

* test/llm: switch AfriqueGemma desktop-only skip to in-test pattern

Per review: don't change generate-mobile-integration-tests.js. Use the
same skip:isMobile pattern other tests already use (config-parameters,
tool-calling, image), and keep the AfriqueGemma functions in the iOS
lightA / Android groupA groups so the existing per-test coverage
invariant stays intact.

- packages/qvac-lib-infer-llamacpp-llm/scripts/generate-mobile-integration-tests.js:
  reverted to upstream/main (drops the desktopOnly opt-out plumbing).
- test/mobile/test-groups.json: drops 'desktopOnly', adds
  runAfriquegemmaEdgeCasesTest and runAfriquegemmaTranslationTest
  back to ios.lightA and android.groupA.
- test/integration/afriquegemma-edge-cases.test.js,
  test/integration/afriquegemma-translation.test.js: add
  isMobile = platform === 'ios' || platform === 'android', and
  skip:isMobile to every test() options object (13 total).
- test/mobile/integration.auto.cjs: regenerated.

Validators both green:
  npm run test:mobile:generate -> "all tests assigned for every platform"
  npm run test:mobile:validate -> ok

Co-authored-by: Cursor <cursoragent@cursor.com>
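
The in-test skip pattern described above can be sketched as follows. The `test(name, options, fn)` function here is a stand-in written for this example, not the repository's test runner, and `runSuite` is a hypothetical wrapper; only the `isMobile` expression and the `skip: isMobile` option come from the commit message.

```javascript
// Stand-in runner: records whether each test ran or was skipped.
function makeRunner() {
  const log = [];
  function test(name, options, fn) {
    if (options.skip) {
      log.push(`${name}: skipped`);
      return;
    }
    fn();
    log.push(`${name}: ran`);
  }
  return { test, log };
}

// Registers one test the way the commit describes: the test stays in
// its mobile group, but skips itself when the platform is ios/android.
function runSuite(platform) {
  const isMobile = platform === 'ios' || platform === 'android';
  const { test, log } = makeRunner();
  test('afriquegemma translation', { skip: isMobile }, () => {});
  return log;
}

console.log(runSuite('android'));
console.log(runSuite('linux'));
```

Because the test is still registered on every platform, a generator that checks per-platform group coverage sees it, while the body never runs on mobile.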

* test/llm: skip ocr-lighton on mobile

Adds skip:isMobile to the single test in ocr-lighton.test.js,
matching the AfriqueGemma / config-parameters / tool-calling
pattern. isMobile is already defined in this file. The test stays
in ios.heavy4 / android.groupB so per-platform group coverage is
unaffected; the brittle test itself just skips on mobile.

Co-authored-by: Cursor <cursoragent@cursor.com>

* ci: revert workflow timeout change for llm mobile integration

Drops PR tetherto#1874's edit to
.github/workflows/integration-mobile-test-qvac-lib-infer-llamacpp-llm.yml
(parameterised jobTimeoutMinutes + 90-minute override for Android
GroupB). Workflow is restored to the upstream/main version.

Co-authored-by: Cursor <cursoragent@cursor.com>

* addons: disable flash-attn by default on the OpenCL backend

Flash attention is not reliably supported by the OpenCL ggml backend
(Adreno path), so when the chosen GPU backend ends up being OpenCL
the addons now force "flash-attn=off" unless the user explicitly
passed flash-attn / flash_attn in their config.

LLM (LlamaModel.cpp / LlamaModel.hpp):

- Add a bool isOpenCl parameter to tuneConfigMap (defaulted to false
  to keep the existing test_tune_config_map.cpp call sites working).
- Mirror the BitNet-disabling branch with an else-if for OpenCL +
  notUserSet("flash-attn", "flash_attn").
- At the call site, read chosenBackend.first/second after chooseBackend
  returns and pass isOpenCl through.

Embed (BertModel.cpp):

- No tuneConfigMap equivalent here. Inject the same logic inline
  immediately after chooseBackend, before configFilemap is serialised
  into configVector. Honour user-set "flash-attn"/"flash_attn".

Both packages compile cleanly via bare-make build on macOS-arm64.

Co-authored-by: Cursor <cursoragent@cursor.com>
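
The OpenCL rule above reduces to a small decision over the config map. The addon implements it in C++; the JavaScript below is only a sketch of the logic, with `applyOpenClFlashAttnDefault` as a hypothetical name and `notUserSet` modelled on the helper the commit mentions. The BitNet branch is deliberately left out.

```javascript
// True when the user supplied none of the given keys.
function notUserSet(config, ...keys) {
  return keys.every((key) => !(key in config));
}

// Returns a copy of the config with flash attention forced off when the
// chosen backend is OpenCL and the user expressed no preference.
function applyOpenClFlashAttnDefault(config, isOpenCl = false) {
  const tuned = { ...config };
  if (isOpenCl && notUserSet(tuned, 'flash-attn', 'flash_attn')) {
    tuned['flash-attn'] = 'off';
  }
  return tuned;
}

console.log(applyOpenClFlashAttnDefault({}, true));
console.log(applyOpenClFlashAttnDefault({ flash_attn: 'on' }, true));
```

Both spellings of the key are checked, so an explicit user setting is respected whichever form it takes.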

* fixup! tuneConfigMap: keep source compatibility for existing 4-arg test callers

CI failure on cpp-tests-darwin-arm64 (PR tetherto#1874):
  test/unit/test_tune_config_map.cpp:199:43: fatal error: no viable
  conversion from 'FtOverrides' to 'bool'

The previous commit inserted bool isOpenCl as the 4th parameter of
tuneConfigMap, but several existing tests pass FtOverrides{...} as
the 4th positional argument (relying on it being finetuneOverrides).

Swap the order so the new isOpenCl parameter comes after the existing
finetuneOverrides; both stay defaulted, so all old 3-arg and 4-arg
call sites compile unchanged. The production call site in
LlamaModel.cpp is updated accordingly.

Also adds 4 new TuneConfigMapTest cases covering the OpenCL branch:
- OpenCl_NonBitnet_FlashAttnDisabledByDefault
- OpenCl_UserSetFlashAttnHyphen_Respected
- OpenCl_UserSetFlashAttnUnderscore_Respected
- NotOpenCl_NonBitnet_FlashAttnUnchanged

All 53 TuneConfigMapTest cases pass locally on macOS-arm64.

Co-authored-by: Cursor <cursoragent@cursor.com>

* Add Qwen 3.5 vision test.

Signed-off-by: Marcus Edel <marcus.edel@collabora.com>

* Route vision models with mmproj to CPU on Apple M1.

Signed-off-by: Marcus Edel <marcus.edel@collabora.com>

* Route only the projector to CPU on Apple M1.

Signed-off-by: Marcus Edel <marcus.edel@collabora.com>

* Run qwen3-5.test.js on iOS GPU.

* js lint

* Recognize Gemma 4 channel reasoning markers in Qwen3ReasoningUtils, and bump gemma4 basic-test n_predict so the answer fits after the thinking preamble.

Signed-off-by: Marcus Edel <marcus.edel@collabora.com>

* Wire reasoning-budget config to inputs.enable_thinking so passing reasoning-budget=0 disables the model's <think> reasoning channel, and add coverage for Qwen3, Qwen3.5, and Gemma 4.

Signed-off-by: Marcus Edel <marcus.edel@collabora.com>

* vcpkg: bump qvac-fabric to >=8189.0.1

The 8189.0.1 port (tetherto/qvac-registry-vcpkg#138) drops
port-version 1's BUILD_LLAMA=OFF portfile workaround and ships the
new fabric tip 739b309ae. Notable upstream fixes pulled in:

- Inject enable_thinking into the Jinja template context so Qwen 3.5
  and Gemma 4 actually emit <think> reasoning content.
- GGML_OP_DELTA_NET_AR Vulkan compute shader (Qwen 3.5 / DeltaNet
  decode no longer falls back to CPU per token).
- vulkan: f32 src1 strided cpy fix (embedding-model crash).

Validated on macOS-arm64: vcpkg resolves
qvac-fabric[core,gpu-backends,llama]:arm64-osx@8189.0.1 and the
addon builds end-to-end.

Co-authored-by: Cursor <cursoragent@cursor.com>

* Disable the embed addon's BERT-on-Mali CPU override.

Signed-off-by: Marcus Edel <marcus.edel@collabora.com>

* Prepend <think> opener to the visible stream when the chat template force-opens the reasoning channel.

Signed-off-by: Marcus Edel <marcus.edel@collabora.com>

* Remove the Mali detection plumbing from the embed addon now that BERT runs on Mali GPU.

Signed-off-by: Marcus Edel <marcus.edel@collabora.com>

* Bump n_predict and ctx_size in the Qwen3.5 reasoning-budget baseline so the model reliably reaches </think>.

Signed-off-by: Marcus Edel <marcus.edel@collabora.com>

* Restore the mobile finetune dataset to 8 samples.

Signed-off-by: Marcus Edel <marcus.edel@collabora.com>

* test: drop AfriqueGemma + MedGemma + Dolphin-MoE tests

Per review: cull tests that exercise models we no longer want covered
in the LLM/SDK CI matrix.

LLM (packages/llm-llamacpp):
- Delete integration tests:
  - test/integration/afriquegemma-edge-cases.test.js
  - test/integration/afriquegemma-translation.test.js
  - test/integration/moe.test.js (dolphin-mixtral-2x7b)
- Delete docs/afriquegemma-translation.md (only documents the
  now-removed integration tests).
- Strip the medgemma-4b-it variant from:
  - test/integration/tool-calling.test.js (collapses
    ALL_TOOL_MODEL_VARIANTS / TOOL_MODEL_VARIANTS to qwen3-1.7b only,
    drops the now-unused isMobile derived var).
  - test/integration/finetuning-pause-resume.test.js (drops the
    medgemma-4b-it-q4_0 entry from FINETUNE_MODELS).
- test/unit/test_model_metadata.cpp: drop the gemma3Model_ fixture +
  the two Gemma3-specific TEST_F cases
  (DiskSingleFile_Gemma3Arch_*); update the comment block listing
  exercised arches accordingly.
- test/unit/pick-primary-gguf-path.test.js: keep the tensors.txt-first
  ordering test, but rebase the fixture filenames on
  Qwen3-4B-Q4_K_M-* so no medgemma names remain in the test corpus.
- test/mobile/test-groups.json + test/mobile/integration.auto.cjs:
  drop runAfriquegemmaEdgeCasesTest, runAfriquegemmaTranslationTest,
  runMoeTest from both ios and android groups; auto.cjs trimmed to
  match. `validate-mobile-tests.js` is green.

SDK (packages/sdk/tests-qvac):
- Delete tests/translation-afriquegemma-tests.ts.
- tests/test-definitions.ts: drop translationAfriquegemmaTests
  import + spread.
- tests/shared/executors/translation-executor.ts: drop the import,
  the spread, and the |afriquegemma branch from the dispatch regex.
- tests/mobile/consumer.ts + tests/desktop/consumer.ts: drop the
  AFRICAN_4B_TRANSLATION_Q4_K_M import and the
  resources.define("afriquegemma", ...) block; mobile also drops the
  afriquegemma-only SkipExecutor.
- tests/shared/resource-lifecycle.ts: rephrase the eviction-comment
  example to a generic "large translation model" so it no longer
  references the deleted resource.

Not touched: NOTICE/CHANGELOG (auto-generated/historical),
sdk/models/registry/* (model constants in the registry are data, not
tests), sdk/examples/translation/translation-llm-afriquegemma.ts
(consumer-facing example, not a test).

* Revert "test: drop AfriqueGemma references from packages/sdk/tests-qvac"

Per review: keep packages/sdk/tests-qvac/ untouched. Restore the SDK
afriquegemma test file, the test-definitions / translation-executor /
desktop+mobile consumer / resource-lifecycle edits to their state
prior to commit 36de6ec.

Only the LLM-side cull (packages/llm-llamacpp + the deleted afrique /
moe / medgemma test files there) from 36de6ec is kept.

* Restore packages/llm-llamacpp/docs/afriquegemma-translation.md

Per review: keep the AfriqueGemma translation doc. Commit 36de6ec
removed it together with the LLM AfriqueGemma test files; restore it
unchanged from the merge tip (e29836d).

* chore: pin qvac-fabric to 8189.0.2 via overlay-ports for testing

Adds an overlay port copy of qvac-fabric pointing at v8189.0.2 of
tetherto/qvac-fabric-llm.cpp (tetherto/qvac-registry-vcpkg#140)
to llm-llamacpp, embed-llamacpp, and translation-nmtcpp, declared via
each package's vcpkg-configuration.json. Lets this PR exercise the new
fabric build (incl. the Mali coopmat1 BitNet TQ NaN fix) without
waiting for the registry baseline bump.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: pin overlay qvac-fabric to temp-8189 tip f686a1324

Point REF at the latest qvac-fabric-llm.cpp temp-8189 commit
(f686a1324e13184d3257cb74c1ba17f9cf8ef575) instead of v8189.0.2 so the
overlay tracks branch tip while the branch is still moving.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* ci: extend Android LLM mobile test timeouts

Allow slower Android Device Farm runs to finish model-heavy LLM tests before the harness marks them as timed out.

Co-authored-by: Cursor <cursoragent@cursor.com>

* vcpkg: drop qvac-fabric overlay-ports, bump version>= to 8189.0.2

tetherto/qvac-registry-vcpkg#140 publishes qvac-fabric@8189.0.2 in the
default registry, so the temporary per-package overlay we used while the
new fabric build was still being shaken out is no longer necessary.

For llm-llamacpp, embed-llamacpp, and translation-nmtcpp:

- Delete `packages/<pkg>/vcpkg/ports/qvac-fabric/` (portfile.cmake,
  vcpkg.json, android-vulkan-version.cmake) — the overlay copy.
- Drop the `overlay-ports` entry from each package's
  vcpkg-configuration.json. The `default-registry` baseline is left
  untouched intentionally; the `version>=` constraints below are what
  forces vcpkg to resolve to the new fabric revision against the
  unchanged baseline.
- Bump the `qvac-fabric` `version>=` pin from `8189.0.1` -> `8189.0.2`
  in each package's vcpkg.json.

Co-authored-by: Cursor <cursoragent@cursor.com>

* refactor(llm): drop dead sawMali plumbing from BackendSelection

`sawMali` was threaded through `emplaceIfValidDevice` / `tryEmplaceDevice` /
`chooseBackend` but never read by any caller — leftover from the earlier
"Force BERT/Qwen3.5 to CPU on Mali" iterations. The embed-side cleanup
already landed in 2ac5de0 ("Remove the Mali detection plumbing from the
embed addon now that BERT runs on Mali GPU."); this finishes the symmetric
removal on the LLM side. `sawAppleM1` plumbing is preserved unchanged.

Co-authored-by: Cursor <cursoragent@cursor.com>

* docs(llm): explain why MtmdLlmContext skips inside_reasoning flip

TextLlmContext flips reasoningState_.inside_reasoning = true alongside the
forced "<think>\n" opener; MtmdLlmContext doesn't because it doesn't carry a
reasoningState_ today. Add an inline note so the asymmetry isn't read as a
bug, and point at the symmetric site to update if reasoning-aware EOS
replacement is later added on the multimodal path.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(llm): narrow tool-call args quoter to leading bare key only

The previous post-generation regex (`([{,])(\s*)([A-Za-z_]…)(\s*):` -> quote
the ident) was too broad: it also matched `, ident:` substrings sitting
inside JSON string values, so a tool call with a free-form string argument
like `{"query":"phase one, step: validate"}` came out corrupted as
`{"query":"phase one, "step": validate"}`, which then failed JSON.parse on
the consumer side.

In practice the rewrite is only needed for one upstream quirk: the Gemma 4
parser's `gemma4_args_to_json` (common/chat-parser.cpp) uses an
`at_key_start()` helper that peeks backwards in the output buffer for a
`{`/`,` -- so the very first top-level key is left bare while every nested
or post-comma key is already quoted. All other tool dialects reach us via
`json::dump()` upstream and already start with a quoted key.

Replace the broad regex with one anchored at `^\{(\s*)<ident>\s*:`, which
fixes exactly that single leading-bare-key case and cannot match anywhere
inside a JSON string value.

Verified end-to-end on linux-x64 against gemma-4-E2B-it-Q8_0 (CPU):

- Adversarial prompt forcing `phase one, step: validate` as a tool arg
  string: baseline produced invalid JSON
  `{"query":"phase one, "step": validate"}` (parse fail at pos 55);
  this fix yields `{"query":"phase one, step: validate"}` and the test
  passes 7/7 assertions.
- Existing simple-args happy path (`get_weather` with city/unit) still
  passes 5/5.

Co-authored-by: Cursor <cursoragent@cursor.com>
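
The difference between the two patterns can be reproduced in a few lines. The addon code is C++; this JavaScript sketch only demonstrates the regex behaviour. The identifier pattern is elided in the commit message, so `[A-Za-z_][A-Za-z0-9_]*` below is an assumption, and `quoteKeysBroad` / `quoteLeadingBareKey` are hypothetical names.

```javascript
// Assumed identifier shape; the commit message elides the exact class.
const IDENT = '[A-Za-z_][A-Za-z0-9_]*';

// Broad form: quotes any identifier that follows "{" or "," anywhere.
const BROAD = new RegExp(`([{,])(\\s*)(${IDENT})(\\s*):`, 'g');

// Anchored form: quotes only a bare key directly after the opening "{".
const ANCHORED = new RegExp(`^\\{(\\s*)(${IDENT})(\\s*):`);

function quoteKeysBroad(args) {
  return args.replace(BROAD, '$1$2"$3"$4:');
}

function quoteLeadingBareKey(args) {
  return args.replace(ANCHORED, '{$1"$2"$3:');
}

const tricky = '{"query":"phase one, step: validate"}';
console.log(quoteKeysBroad(tricky));      // corrupts the string value
console.log(quoteLeadingBareKey(tricky)); // leaves valid JSON untouched
console.log(quoteLeadingBareKey('{city:"Berlin","unit":"celsius"}'));
```

Because the anchored pattern can only match at offset 0, it never reaches into a string value, which is the property the commit relies on.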

* revert(llm): drop synthetic <tool_call>{json}</tool_call> post-processing

Each model now streams only its own native tool-call dialect:
- Qwen3 / Hermes: <tool_call>{json}</tool_call> (already canonical)
- Qwen3.5: <tool_call><function=name><parameter=k>v</parameter></function></tool_call>
- Gemma 4: <|tool_call>call:NAME{key:<|"|>val<|"|>,...}<tool_call|>
- Mistral, DeepSeek-R1, Functionary, GPT-OSS, etc. emit their own markers.

The previous PR added a post-generation common_chat_parse pass that
appended a uniform <tool_call>{json}</tool_call> envelope for every
detected call. That duplicated tokens for Hermes-shape models (the
envelope is already in the native stream) and inflated Gemma 4 output
by ~14% with two synthetic copies per call. The leading-bare-key
handling for Gemma 4's tc.arguments was also a constant source of sharp
edges (broad regex corrupted string values containing ", ident:";
narrow anchored regex still required follow-up). Per-dialect parsing
belongs at the SDK consumer layer, not in the addon.

Removed:
- Post-generation block in LlamaModel::processPromptImpl (synthesizer).
- needsOutputCapture widening to include !resolved.tools.empty().
- LlmContext::getLastChatFormat() virtual.
- lastChatFormat_ members + overrides in TextLlmContext, MtmdLlmContext.
- common_chat_format* outFormat parameter from getPrompt().
- <regex> include in LlamaModel.cpp (no remaining users).

Kept:
- outThinkingForcedOpen mechanism (independent reasoning-channel feature).
- toolsCompact_ controller and KV-cache trim logic.
- All other PR work.

Validated on linux-x64/CPU after incremental rebuild:
- Gemma 4 (gemma-4-E2B-it-Q8_0): 6/6 asserts pass with native-dialect
  parser, no synthetic envelope leaks, output 941 chars (down from
  ~1100 with synthesizer).
- Qwen3.5 (Qwen3.5-0.8B-Q8_0): 5/5 asserts pass with the existing
  parseXmlToolCall path, output 394 chars.

Co-authored-by: Cursor <cursoragent@cursor.com>

* test(llm): parse Gemma 4 native tool-call dialect in gemma4.test.js

Without the synthetic <tool_call>{json}</tool_call> envelope reverted
in the previous commit, Gemma 4 emits its own dialect:

  <|tool_call>call:NAME{key:<|"|>val<|"|>,...}<tool_call|>

Strings are wrapped in <|"|>...<|"|> instead of "...", keys are bare,
and the closing tag is <tool_call|> (trailing pipe, no slash).

extractToolCalls now matches that shape directly and returns
{ name, argsRaw }. argsContainStringValue() helper checks the args
body for a Gemma-4-quoted string literal. Substring-based assertion
is sufficient to verify the model called the right tool with the
right argument values; full dialect-to-JSON conversion lives upstream
in fabric's gemma4_args_to_json and is not the addon test's job.

qwen3-5.test.js was unchanged: Qwen3.5 wraps its <function=name>
<parameter=k>v</parameter></function> XML in <tool_call>...</tool_call>
natively, so the existing parseXmlToolCall path keeps working.

Validated on linux-x64/CPU against gemma-4-E2B-it-Q8_0:
4/4 tests, 13/13 asserts (3 synthetic-input parser sanity checks +
1 live LLM run).

Co-authored-by: Cursor <cursoragent@cursor.com>
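
A sketch of how the native Gemma 4 envelope described above could be matched. The regular expression and the function name `extractGemma4ToolCalls` are assumptions made for this example; `argsContainStringValue` follows the helper named in the commit, but its body here is a guess at the simplest substring check.

```javascript
// Matches <|tool_call>call:NAME{...}<tool_call|> and captures NAME and
// the raw brace-delimited argument body.
const GEMMA4_TOOL_CALL = /<\|tool_call>call:([^\s{]+)(\{[\s\S]*?\})<tool_call\|>/g;

// Returns [{ name, argsRaw }] without converting the args to JSON.
function extractGemma4ToolCalls(text) {
  const calls = [];
  for (const m of text.matchAll(GEMMA4_TOOL_CALL)) {
    calls.push({ name: m[1], argsRaw: m[2] });
  }
  return calls;
}

// Checks for a string literal wrapped in Gemma 4's <|"|> quote markers.
function argsContainStringValue(argsRaw, value) {
  return argsRaw.includes(`<|"|>${value}<|"|>`);
}

const output =
  'Checking. <|tool_call>call:get_weather{city:<|"|>Tokyo<|"|>,unit:<|"|>celsius<|"|>}<tool_call|>';
const [call] = extractGemma4ToolCalls(output);
console.log(call.name, argsContainStringValue(call.argsRaw, 'Tokyo'));
```

Keeping `argsRaw` as a string matches the commit's point that a substring assertion is enough for the test and that dialect-to-JSON conversion is left to upstream.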

* revert(llm): drop Apple M1 detection + projector-CPU routing

The PR added an Apple-M1-specific code path that detected the chip via the
GPU description string and routed `params.mmproj_use_gpu = false` so the
vision projector ran on CPU instead of Metal, working around a SIGSEGV in
the projector's image-encoding kernel observed on M1 Metal at the time.

Re-tested on M1 with the current fabric tip: no SIGSEGV, projector runs
fine on Metal end-to-end. The carve-out is no longer needed.

Removed:
- BackendSelection: `isAppleM1Device()` helper, `bool& sawAppleM1` plumbing
  through `emplaceIfValidDevice` / `tryEmplaceDevice` / `chooseBackend`,
  and `bool* outSawAppleM1` parameter on both `chooseBackend` overloads.
- LlamaModel: the `bool sawAppleM1 = false` local, the call-site argument,
  and the `params.mmproj_use_gpu = !sawAppleM1` ternary; mmproj now uses
  GPU on every desktop platform (Android still hardcoded to false).
- test_backend_selection.cpp: `APPLE_M{1,2,3,4}_DESC` constants,
  `chooseBackendWithM1Flag()` helper, and the four `AppleM*_*` test cases.
- gemma4.test.js / qwen3-5.test.js: the comment blocks describing the M1
  carve-out; `useCpuForVision` semantics are unchanged (`useCpu || isMobile`
  on gemma4 and `useCpu` on qwen3-5).

Verified on linux-x64/CPU after rebuild: 148/148 C++ unit tests pass
(BackendSelectionTest, TuneConfigMapTest, ChatTemplateUtilsTest).

Co-authored-by: Cursor <cursoragent@cursor.com>

* revert(llm): drop dead Gemma 4 markers from updateQwen3ReasoningBuffer

The PR added two extra substring scans for Gemma 4's reasoning channel
markers (<|channel>thought open, <channel|> close) to
updateQwen3ReasoningBuffer. The intent was to extend the EOS-rescue
path (handleQwen3ReasoningEOS rewrites EOS-while-thinking into a
closing tag) to Gemma 4. That never actually fires though: both the
buffer-update call and the EOS-rescue call in TextLlmContext are gated
by `if (isQwen3Model_)`, and isQwen3Model_ resolves to
`general.architecture == "qwen3"` only. Gemma 4 reports architecture
"gemma4", so the gate never opens, the markers never get scanned, and
the rescue path never runs for Gemma 4.

In live runs Gemma 4 always emits <channel|> cleanly before <eos>, so
the rescue isn't needed on the happy path; if Gemma 4 ever truncates
mid-thought under context pressure we will need a real dialect-aware
rescue (per-arch close-tag token + extended gate) and a follow-up will
add that. For this PR we just want the dead code gone so it doesn't
mislead future readers about what's actually wired up.

Net: -9 lines, file is now identical to upstream main.

Co-authored-by: Cursor <cursoragent@cursor.com>

* test(llm): switch gemma4 fixtures from unsloth to bartowski

The unsloth GGUF pack
(huggingface.co/unsloth/gemma-4-E2B-it-GGUF) tags <turn|> as the EOG token
in tokenizer.ggml.eos_token_id and leaves <eos> classified as a regular
text token. Gemma 4's training-baked behaviour after assistant content is
to emit a few <eos> tokens before <turn|>, so with that pack the addon's
generation loop -- which terminates on llama_vocab_is_eog -- doesn't stop
until <turn|> arrives. We were observing ~9 spurious <eos> tokens
trailing every Gemma 4 response, eating into n_predict and KV cache for
no gain.

bartowski's GGUF
(huggingface.co/bartowski/google_gemma-4-E2B-it-GGUF) ships the exact
same vocabulary but tags <eos> as EOG (matching the base
google/gemma-4-E2B-it tokenizer config). With that pack the addon
terminates on the first <eos> -- empirically 0 trailing tokens, ~30 %
shorter completions on the same prompt, same dialect output that the
native-dialect parser added in 87e6c35 handles unchanged.

Verified on linux-x64/CPU (qvac-dev-linux-x64) with the same
get_weather tool prompt:

  unsloth Q8_0    : 941 chars, 9 trailing <eos>, EOG = {<turn|>, </s>}
  bartowski Q4_K_M:  676 chars, 0 trailing <eos>, EOG = {<eos>,    </s>}

Note: the unsloth metadata bug deserves an upstream issue against the
unsloth pack maintainers; this PR's scope is just to stop our tests
paying the wasted-tokens tax.

Co-authored-by: Cursor <cursoragent@cursor.com>

* test(llm): unblock gemma4 image test on mobile + fix ctx overflow

Three changes to packages/llm-llamacpp/test/integration/gemma4.test.js
(image-describe subtest):

1. Drop the mobile CPU-vision carve-out.
   useCpuForVision used to force `device: 'cpu'` on Android/iOS to dodge
   Adreno OpenCL SIGABRT and Mali Vulkan instability that bit us with
   the unsloth mmproj. With bartowski's mmproj (now the fixture in
   787c3322) we want CI to actually exercise the device-farm GPU code
   path for vision -- if that path regresses on a real Adreno or Mali
   chip we want to find out from CI, not by accident in production.
   Desktop x64-darwin / linux-arm64 keep CPU fallback because those
   hosts don't have a working GPU stack here.

2. Bump ctx_size 2048 -> 8192. A single elephant.jpg encodes to ~260
   mtmd image tokens. With ctx_size=2048 plus Gemma 4's verbose CoT
   preamble the generation loop overflowed nPast > n_ctx during
   sampling (MtmdLlmContext.cpp:452), throwing
   'processPromptImpl: context overflow'. 8192 leaves comfortable
   headroom on every backend.

3. Set reasoning-budget=0 for this test. We literally ask the model
   "Answer in one word" -- the <|channel>thought ...<channel|> CoT
   preamble that Gemma 4 wants to emit by default is wasted tokens
   here, and was the actual cause of the overflow above (CoT was
   running 8k+ tokens before the model reached the one-word answer
   and emitted <eos>). Disabling thinking gives us a deterministic
   ~10-token "Elephant" + <eos> response, which is what the
   substring-based assertion is testing for anyway.

Verified on linux-x64 (qvac-dev-linux-x64, 2x RTX 5090, Vulkan
backend) end-to-end:
  output: "Elephant"
  asserts: 3/3
  total time: ~2 s

Co-authored-by: Cursor <cursoragent@cursor.com>

* refactor(llm): drop dead selectToolsCompactMarker(string) overload

selectToolsCompactMarker(const std::string& architecture) had no production
callers anywhere -- only its two unit tests
(SelectToolsCompactMarkerForQwen3,
SelectToolsCompactMarkerForUnsupportedArchitecture) referenced it. Live
production code goes through selectToolsCompactMarkerForModelMetadata
(LlamaModel::resolveToolsCompactConfig calls that one), which takes
std::optional<std::string> and is the only path that ever reaches the
"qwen3" -> "<tool_call>" mapping at runtime.

Removed the .cpp definition, the .hpp declaration, and the two unit
tests. selectToolsCompactMarkerForModelMetadata is unchanged and still
covered by SelectToolsCompactMarkerForModelMetadataUsesArchitecture.

ChatTemplateUtilsTest now runs 19/19 tests on linux-x64 (was 21/21).

Co-authored-by: Cursor <cursoragent@cursor.com>

* test(llm): drop redundant useCpuForVision alias; vision runs on GPU on mobile

Gemma 4 vision lost its per-mobile CPU carve-out in commit 2843297, and
Qwen3.5 vision never had one, so useCpuForVision had become a no-op alias
of useCpu, used at exactly one call site each. Inline it.

Net effect on the device routing matrix is unchanged but explicit:

  platform/arch              useCpu  device used
  --------------------------------------------------------
  darwin-x64                 true    cpu  (no working GPU here)
  linux-arm64                true    cpu  (no working GPU here)
  darwin-arm64 (M-series)    false   gpu  (Metal)
  linux-x64                  false   gpu  (Vulkan/OpenCL)
  ios                        false   gpu  (Metal -- device farm)
  android                    false   gpu  (Adreno OpenCL / Mali Vulkan -- device farm)

So on iOS / Android the gemma4 and qwen3-5 image-describe subtests run
through the actual GPU vision path -- the same path users hit -- and
will surface any regression from CI rather than from production.

Co-authored-by: Cursor <cursoragent@cursor.com>

* docs(llm): correct thinkingForcedOpen_ comment re: gemma4

Gemma4 does not hit this code path: upstream
common_chat_params_init_gemma4 explicitly leaves thinking_forced_open
unset because gemma4's reasoning channel is model-emitted. Drop the
misleading reference and call out the actual templates that trigger
this path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(changelog): refresh PR-1874 entries to reflect actual shipped scope

The original CHANGELOG entries for llm-llamacpp 0.20.0, embed-llamacpp
0.16.0, and translation-nmtcpp 3.0.0 were drafted before the synthesizer
revert, the M1 / sawMali / dead-code cleanups, the bartowski fixture
swap, the native-dialect tool-call parsing, the reasoning-budget knob,
the thinkingForcedOpen synthetic-opener, the new integration tests, and
the move from 8189.0.0 to 8189.0.2. They now match what the PR
actually ships.

Compressed every entry to a flat bullet list grouped by Keep-a-Changelog
section (Changed / Added / Removed / Fixed / Deprecated / Internals)
and bumped the date to 2026-05-10.

Co-authored-by: Cursor <cursoragent@cursor.com>

* docs(changelog): trim items that round-trip to net-zero in the PR

Removed lines that described code that's neither in upstream/main nor in
the PR head (so it has no observable impact on consumers):

- llm-llamacpp 0.20.0:
  * "tool-call streaming: each model now streams its native dialect /
     no re-shaping" -- main already streamed native dialects; the
     PR-internal synthesizer never shipped, so this is a non-change.
  * "Dropped sawMali plumbing / Apple-M1 detection / dead Gemma 4
     markers in Qwen3ReasoningUtils" -- all three were added and
     removed inside this PR's commit history; net diff is zero.

- embed-llamacpp 0.16.0:
  * "Dropped Mali-detection plumbing" -- same: added and removed
     within this PR's history, net diff is zero.

Kept genuine net removals against upstream/main:
- Qwen3 model-name-based fallback.
- Dead `selectToolsCompactMarker(std::string)` overload (was
  pre-existing in main, only ever called from unit tests).

Co-authored-by: Cursor <cursoragent@cursor.com>

* docs(notice): regenerate NOTICE for embed-llamacpp, llm-llamacpp, translation-nmtcpp

Re-ran the notice-generate skill (.cursor/skills/notice-generate) for the
three addons whose dependency surfaces changed in this PR:

- qvac-fabric bumped from 7248.x to 8189.0.2 -- different transitive C++
  license set.
- holepunch / hyperswarm libs moved to peerDependencies on main, so the
  JS attribution lists shrink accordingly.
- @qvac/infer-base bumped to 0.4.1.

Per-package C++ resolution after the run:

  embed-llamacpp        : opencl/qvac-fabric/qvac-lib-inference-addon-cpp/
                          qvac-lint-cpp + libc++          (5 deps)
  llm-llamacpp          : the above + picojson + nlohmann-json (7 deps)
  translation-nmtcpp    : bergamot-translator/sentencepiece/ssplit/
                          qvac-fabric/qvac-lib-inference-addon-cpp/
                          qvac-lint-cpp + libc++          (7 deps)

Net: +206 / -585 lines across the three NOTICE files (mostly transitive
JS attribution shrink from the holepunch peerDeps refactor).

Co-authored-by: Cursor <cursoragent@cursor.com>

* test(llm): make gemma4 reasoning-budget test tolerate model-emitted reasoning

Gemma 4's reasoning channel is model-emitted (no template force-open),
so the model decides per-prompt whether to engage reasoning. For
trivial prompts like "What is the capital of France?" the model can
short-circuit and skip the <|channel>thought…<channel|> markers, which
made the test flaky on CI.

Gate the marker / length assertions on the baseline actually emitting
the opening marker; if it didn't, log a comment and skip the dependent
checks instead of failing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* types(llm): declare reasoning_budget in LlamaConfig

The C++ config parser already accepts `reasoning_budget` (and the
kebab-case `reasoning-budget` alias), but neither was a typed property
on `LlamaConfig` — they only typechecked via the catch-all index
signature. Add a typed entry with JSDoc so TypeScript consumers get
autocomplete and the accepted values (-1 default, 0 disabled).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(llm): allow per-request reasoning_budget override in run()

`reasoning_budget` was load-time only. Add it to `GenerationParams` so
`model.run(messages, { generationParams: { reasoning_budget: 0 } })`
can disable reasoning for a single request without re-loading the
model — same shape as `temp` / `top_p` / `seed` overrides.

Wiring:
- `LlmContext::GenerationParams` gains an optional `reasoning_budget`
  field and `hasOverrides()` covers it.
- `applyGenerationParamsToContext` snapshots / overrides /
  restores `params.reasoning_budget` alongside `n_predict`.
- `AddonJs::runJob` parses `generationParams.reasoning_budget` from
  JS and rejects values other than `-1` or `0`.
- `index.d.ts` exposes `reasoning_budget?: -1 | 0` on
  `GenerationParams` with a JSDoc note.

`tokenizeChat` already reads `params_.reasoning_budget`, so no change
is needed in `TextLlmContext` / `MtmdLlmContext` — the temporary
override naturally propagates to `inputs.enable_thinking`.
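
The snapshot / override / restore step can be pictured as a scope guard.
This is a minimal sketch under stated assumptions: `Params`,
`GenerationOverrides`, and `ScopedParamsOverride` are hypothetical
stand-ins, not the addon's actual types, and the real
`applyGenerationParamsToContext` may be structured differently.

```cpp
#include <optional>

// Hypothetical stand-in for the context's long-lived parameters.
struct Params {
  int n_predict = -1;
  int reasoning_budget = -1;  // -1 = default, 0 = reasoning disabled
};

// Hypothetical stand-in for the per-request overrides parsed from JS.
struct GenerationOverrides {
  std::optional<int> n_predict;
  std::optional<int> reasoning_budget;

  bool hasOverrides() const {
    return n_predict.has_value() || reasoning_budget.has_value();
  }
};

// Snapshots the parameters on construction, applies whichever overrides
// are set, and restores the snapshot on destruction.
class ScopedParamsOverride {
 public:
  ScopedParamsOverride(Params& params, const GenerationOverrides& overrides)
      : params_(params), saved_(params) {
    if (overrides.n_predict) params_.n_predict = *overrides.n_predict;
    if (overrides.reasoning_budget) {
      params_.reasoning_budget = *overrides.reasoning_budget;
    }
  }
  ~ScopedParamsOverride() { params_ = saved_; }

  ScopedParamsOverride(const ScopedParamsOverride&) = delete;
  ScopedParamsOverride& operator=(const ScopedParamsOverride&) = delete;

 private:
  Params& params_;
  Params saved_;
};
```

Because the restore runs in the destructor, the load-time value comes
back even if the request exits early, which is what keeps the override
request-scoped rather than sticky.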

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(llm): cover per-request reasoning_budget override on Qwen3.5

Validates the new per-request `generationParams.reasoning_budget`
override end-to-end in two runs against a single loaded model:

1. `reasoning_budget: 0` override suppresses the `<think>…</think>`
   reasoning markers for that one request.
2. The next `run()` with no override restores the load-time default
   (reasoning enabled), proving the override is request-scoped and
   not sticky.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(llm): case-insensitive antiprompt substring matching

`checkAntiprompt` now lowercases both the recent output window and each
antiprompt before the `find()` so a single `Pizza` entry catches the
model's `pizza`, `Pizza`, `PIZZA`, etc. Callers no longer need to list
every casing variant. Applied identically in `TextLlmContext` and
`MtmdLlmContext`. The token-level early-exit path is unchanged (BPE
tokens are case-specific; the substring path is the authoritative
check).
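
The substring path described above can be sketched roughly as follows.
`toLowerAscii` and `containsAntiprompt` are hypothetical names used for
illustration, and the byte-wise ASCII case folding is an assumption; the
real `checkAntiprompt` may window the output and fold case differently.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <vector>

// Lowercases a copy of the input byte by byte (ASCII folding only).
std::string toLowerAscii(std::string text) {
  std::transform(text.begin(), text.end(), text.begin(), [](unsigned char c) {
    return static_cast<char>(std::tolower(c));
  });
  return text;
}

// Returns true if any antiprompt occurs in the recent output, ignoring case.
bool containsAntiprompt(const std::string& recentOutput,
                        const std::vector<std::string>& antiprompts) {
  const std::string haystack = toLowerAscii(recentOutput);
  for (const std::string& antiprompt : antiprompts) {
    if (antiprompt.empty()) continue;  // an empty needle would always match
    if (haystack.find(toLowerAscii(antiprompt)) != std::string::npos) {
      return true;
    }
  }
  return false;
}
```

With this shape a single `PiZzA` entry stops generation on `pizza`,
`Pizza`, or `PIZZA` alike.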

Also drop the stale comment on the `Reverse prompt stops generation`
scenario in `config-parameters.test.js`: it claimed the addon split
on `,` without trimming, but `LlamaModel.cpp::split()` already
trims and drops empty segments. Replaced with a brief note that
documents the new (current) behaviour and simplified the antiprompt
list to `'network, Pizza, bitcoin, blockchain'` so the test exercises
both the trim and the case-insensitive match.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(llm): stress case-insensitive antiprompt with PiZzA mixed-case entry

Swap the `Pizza` reverse_prompt entry for `PiZzA`. With case-sensitive
matching `PiZzA` would never match the model's `pizza` / `Pizza`
output; only case-insensitive comparison fires the stop. Verified
locally — the test still completes with output length 5, so the
antiprompt trips on the first emitted "Pizza".

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(llm): validate reasoning_budget before truncating to int

Address @jpgaribotti's review: previously the value was cast to int
*before* the `0` / `-1` check, so fractional inputs like `0.5` or
`-1.1` would silently truncate to a "valid" 0 / -1 and pass through.

Validate against the exact double values (both `0` and `-1` are
exactly representable in IEEE-754, so `==` comparison is safe) before
casting to int when storing in `ov.reasoning_budget`.
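
A minimal sketch of the validate-then-cast order, assuming the JS number
arrives as a `double`. `toReasoningBudget` is a hypothetical helper name,
not the function used in `AddonJs::runJob`.

```cpp
#include <optional>

// Accepts only the exact values 0 and -1. Fractional inputs such as 0.5
// or -1.1 are rejected here instead of being truncated to a valid value.
std::optional<int> toReasoningBudget(double value) {
  if (value == 0.0) return 0;
  if (value == -1.0) return -1;
  return std::nullopt;  // also covers NaN, which compares unequal to both
}
```

Returning the `int` only after the comparison succeeds means there is no
cast whose truncation could mask an invalid input.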

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(llm): use std::from_chars for reasoning_budget load-time parse

Address @jpgaribotti's review: `std::stoi` silently accepts trailing
garbage (`"0abc"` → `0`) and throws an uncaught `std::out_of_range`
for inputs that overflow `int`. Switch to `std::from_chars`, which
fails cleanly on non-numeric input, overflow (`errc::result_out_of_range`),
and trailing garbage (`ptr != end`), then validate against the
allowed `-1` / `0` values in the same check.
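
The three failure modes can be folded into one check, roughly like this.
`parseReasoningBudget` is a hypothetical name and the sketch assumes
C++17; the load-time parser in the addon may report errors differently.

```cpp
#include <charconv>
#include <optional>
#include <string_view>
#include <system_error>

// Parses a load-time reasoning_budget string. Returns std::nullopt for
// non-numeric input, overflow, trailing garbage, or a disallowed value.
std::optional<int> parseReasoningBudget(std::string_view text) {
  if (text.empty()) return std::nullopt;

  int value = 0;
  const char* begin = text.data();
  const char* end = begin + text.size();
  const auto [ptr, ec] = std::from_chars(begin, end, value);

  // ec is set on non-numeric input and on overflow;
  // ptr != end means characters were left over after the number.
  if (ec != std::errc{} || ptr != end) return std::nullopt;
  if (value != -1 && value != 0) return std::nullopt;
  return value;
}
```

Unlike `std::stoi`, `std::from_chars` does not throw and does not skip
leading whitespace, so an input such as `" 0"` is also rejected here.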

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Signed-off-by: Marcus Edel <marcus.edel@collabora.com>
Co-authored-by: gianni-cor <gianfranco.cordella@tether.io>
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…therto#1956)

- Bump @qvac/rag to 0.5.0.
- Add packages/rag/changelog/0.5.0/{CHANGELOG.md,CHANGELOG_LLM.md}.
- Prepend [0.5.0] entry in root packages/rag/CHANGELOG.md (Keep a Changelog format).
- Regenerate packages/rag/NOTICE.


(cherry picked from commit cbdbaea)

Co-authored-by: Cursor <cursoragent@cursor.com>
…1966)

Migrates `environment: release` -> `environment: npm` on every job that
invokes `./.github/actions/publish-library-to-npm`, in lockstep with the
github-ops repo config (qvac/repos.json npm trustedPublishing.environment)
and the npmjs Trusted Publisher records (QVAC-18610).

Scope: only the npm-publishing jobs are flipped. Build, GPR-publish,
publish-logic, release-merge-guard, lint-and-test and other jobs that
reference `environment: release` for `secrets.PAT_TOKEN` access are left
untouched. `id-token: write` is preserved on every flipped job.

Files: 16 changed, 18 jobs flipped:
- publish-sdk.yml: publish-npm
- publish-registry-server.yml: publish-schema-npm, publish-client-npm
- on-merge-{bci-whispercpp,decoder-audio,diffusion-cpp,embed-llamacpp,
  llm-llamacpp,ocr-onnx,onnx,transcription-parakeet,
  transcription-whispercpp,translation-nmtcpp,tts-ggml,tts-onnx}.yml:
  publish-(release-)?npm
- trigger-reusable-lib-cli.yml: publish-release-npm
- public-reusable-npm.yml: pull-request-event, push-event

Co-authored-by: Cursor <cursoragent@cursor.com>
…1968)

* QVAC-18608 feat(actions): add label-gate composite action (Node 20)

Introduces a new `.github/actions/label-gate` action that authorises
secret-bearing workflow jobs based on whether a trusted actor has applied
a "verified" label to the pull request. Replaces per-job environment
approvals as the primary trust gate for PR-triggered workflows.

Trust model:
  - Trusted events (push, workflow_dispatch, workflow_call, schedule,
    release) -> always authorised.
  - PR events (pull_request, pull_request_target) -> authorised iff the
    applier of the configured `label` is in the `users` allowlist OR is
    an active member of any team in `teams`. Login comparison is
    case-insensitive.
  - Synchronize from a non-trusted sender -> strip the label, deny.
  - Anything else -> fail closed.

Inputs:
  - label         (default: "verified")
  - teams         (default: qvac-internal-{dev,merge,release})
  - users         (default: empty) -- new explicit allowlist
  - github-token  (required; needs read:org and PR-label write)

Output:
  - authorised ("true" | "false") -- downstream jobs gate via
    `if: needs.<id>.outputs.authorised == 'true'`.

Implementation:
  - Pure-Node 20 action; no npm dependencies, no bundler, no `dist/`
    artifact to maintain. Three small ESM modules in src/.
  - github-client.mjs: native-fetch wrapper for the three endpoints used
    (team membership, issue timeline, label deletion); retry-with-
    exponential-backoff on 5xx and 429; full pagination on the timeline;
    idempotent label strip; URL-encodes inputs.
  - gate.mjs: pure async decision function with an injected client;
    never throws on policy denials.
  - index.mjs: action entrypoint; reads INPUT_* env, writes
    `authorised=` to $GITHUB_OUTPUT, emits structured `::notice::` /
    `::warning::` / `::error::` annotations. Hard misconfig (missing
    token, unreadable event payload, unhandled API error) exits non-zero
    so the gate job goes red. Soft denials exit 0 with
    `authorised=false`.

Tests:
  - 41 tests via the built-in `node:test` runner (no test deps).
  - test/gate.test.mjs: 26 policy tests covering every event type, both
    fast-path and timeline-path resolution, synchronize protection,
    user-allowlist precedence, empty-config denial, and input
    validation.
  - test/github-client.test.mjs: 15 HTTP tests covering retry policy,
    pagination, 404-as-not-member, idempotent strip, URL encoding, and
    constructor validation. Uses an injected fetch stub.
  - test/fixtures/: 8 hand-rolled GitHub event payloads, including a new
    `labeled-by-allowlisted-user` case for the `users` input.

Run locally: `node --test .github/actions/label-gate/test/*.test.mjs`.

Note: the existing `authorize-pr` action is intentionally left in place
and unchanged. Migration to label-gate will happen workflow-by-workflow
in follow-up PRs to allow incremental rollout against production.

Refs: https://app.asana.com/1/45238840754660/project/1214153063536860/task/1214612672233087
Co-authored-by: Cursor <cursoragent@cursor.com>

* QVAC-18608 fix(label-gate): deny when gate label not currently applied

Pre-merge audit caught a label-strip bypass:

  1. Alice (team member) labels PR with 'verified'
     -> labeled event -> gate authorises -> secrets used
  2. Mallory (any contributor with triage) removes the label out-of-band
     -> unlabeled event fires but no workflow subscribes to it
  3. Alice (or anyone) pushes a new commit
     -> synchronize event -> gate runs
     -> sender = alice, isTrustedActor(alice) = true, falls through
     -> findLabelApplier walks the timeline and finds Alice's old
        labeled event from step 1 (the unlabeled doesn't disqualify it)
     -> applier = alice = trusted -> AUTHORISED
     -> ...even though the label is no longer on the PR.

Fix: consult `payload.pull_request.labels` (the authoritative current
label state) before trusting the timeline. If the gate label is not
currently applied, deny without making any GitHub API calls.

Also restructured the synchronize handler so the label-applied check
runs BEFORE the sender-trust API calls, avoiding 3 wasted team-membership
lookups per PR that doesn't actually have the gate label.

Tests:
  - REGRESSION: synchronize after label was removed -> deny even if
    timeline still shows trusted applier (must short-circuit before
    timeline lookup)
  - REGRESSION: opened/reopened PR with stale labeled timeline but no
    current label -> deny
  - synchronize from non-trusted with no label currently applied ->
    deny with zero API calls

44 tests now pass (was 41); end-to-end smoke against the new logic
verifies both the bypass scenario (denied, zero API calls) and the
happy path (allowlisted user labels -> authorised, zero team API calls).

Refs: https://app.asana.com/1/45238840754660/project/1214153063536860/task/1214612672233087
Co-authored-by: Cursor <cursoragent@cursor.com>

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
…etherto#1972)

* doc: addons - diffusion - update

* doc: addon - diffusion - put flux as main model to be used
Co-authored-by: Ishan Vohra <ishanvohra@Ishans-MacBook-Air.local>
…etherto#1973)

The QVAC-18612 canary (PR tetherto#1971, run id 25672483584) hard-failed with
"required input 'github-token' is missing" even though the workflow
clearly passed `github-token: ${{ secrets.GITHUB_TOKEN }}`.

Root cause: `getInput` in src/index.mjs was uppercasing the input
name AND replacing hyphens with underscores, looking up
`INPUT_GITHUB_TOKEN`. The GitHub Actions runner (and @actions/core)
preserve hyphens — only spaces are replaced — so the runner sets
`INPUT_GITHUB-TOKEN`. The action never found the token and threw a
missing-input error.

The local smoke test that "passed" before merge set
`INPUT_GITHUB_TOKEN=...` (matching the buggy lookup) so both sides
were wrong in the same direction. This is exactly the failure mode
the canary was meant to surface; without it, the gate would have
failed across all 75 secret-bearing workflows on first PR after the
QVAC-18612 fan-out.

Fix:
  - getInput now uses `name.replace(/ /g, '_').toUpperCase()` —
    matching the runner / @actions/core convention exactly.
  - getInput is exported from src/index.mjs (with an injectable env
    arg) so the convention can be unit-tested.
  - Top-level main() is gated on `import.meta.url === argv[1]` so
    importing index.mjs from tests no longer triggers a real run.
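
The naming rule itself is language-independent. The action is JavaScript;
the sketch below only restates the mapping, and `inputEnvName` is a
hypothetical helper name.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Maps an action input name to the environment variable it is read from:
// spaces become underscores, letters are uppercased, hyphens are kept.
std::string inputEnvName(std::string name) {
  std::replace(name.begin(), name.end(), ' ', '_');
  std::transform(name.begin(), name.end(), name.begin(), [](unsigned char c) {
    return static_cast<char>(std::toupper(c));
  });
  return "INPUT_" + name;
}
```

Under this rule `github-token` resolves to `INPUT_GITHUB-TOKEN`, never to
`INPUT_GITHUB_TOKEN`.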

Tests:
  - 9 new tests in test/index.test.mjs pin the env-var-name resolution:
      * INPUT_GITHUB-TOKEN (hyphen preserved) -> resolves
      * INPUT_GITHUB_TOKEN (hyphen replaced) -> does NOT resolve
        (locks the contract against accidental "helpful" rewrite)
      * spaces are still replaced with underscores
      * trim, missing-required, defaults-to-process.env
  - Total: 53/53 pass via `node --test`.
  - End-to-end smoke against the runner-correct env-var name
    (INPUT_GITHUB-TOKEN=...) confirms exit 0 and authorised=false
    on the no-label deny path.

Refs: https://app.asana.com/1/45238840754660/project/1214153063536860/task/1214612672233087
Related: tetherto#1971

Co-authored-by: Cursor <cursoragent@cursor.com>
@github-actions

Contributor

🧪 C++ Test Coverage Report

Coverage:

📊 Detailed Coverage
Filename                         Regions    Missed Regions     Cover   Functions  Missed Functions  Executed       Lines      Missed Lines     Cover    Branches   Missed Branches     Cover
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
NmtLazyInitializeBackend.cpp          99                20    79.80%          11                 1    90.91%         157                36    77.07%          58                18    68.97%
NmtLazyInitializeBackend.hpp           2                 0   100.00%           1                 0   100.00%           1                 0   100.00%           0                 0         -
TranslationModel.cpp                 296               168    43.24%          28                 8    71.43%         506               213    57.91%         152               106    30.26%
TranslationModel.hpp                   1                 0   100.00%           1                 0   100.00%           1                 0   100.00%           0                 0         -
nmt.cpp                               72                22    69.44%           9                 1    88.89%         137                28    79.56%          38                12    68.42%
nmt.hpp                               51                 4    92.16%          11                 2    81.82%          53                 4    92.45%          28                 0   100.00%
nmt_beam_search.cpp                  116                25    78.45%          10                 3    70.00%         254                32    87.40%          74                17    77.03%
nmt_graph_decoder.cpp                164                78    52.44%          15                 7    53.33%         540               161    70.19%         112                69    38.39%
nmt_graph_encoder.cpp                 54                13    75.93%           3                 0   100.00%         268                33    87.69%          36                15    58.33%
nmt_loader.cpp                       270                67    75.19%          14                 0   100.00%         774                97    87.47%         138                61    55.80%
nmt_state_backend.cpp                253                94    62.85%          21                 0   100.00%         489               128    73.82%         154                80    48.05%
nmt_tokenization.cpp                  88                21    76.14%           8                 0   100.00%         135                36    73.33%          58                25    56.90%
nmt_utils.cpp                        120                89    25.83%           8                 3    62.50%         180               134    25.56%          72                57    20.83%
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
TOTAL                               1586               601    62.11%         140                25    82.14%        3495               902    74.19%         920               460    50.00%

…c GGML backends (tetherto#2124)

* transcription-whispercpp: bump to 0.7.1 with whisper-cpp 1.8.4.3#1 (QVAC-18993)

Pull in the consolidated vcpkg PR (whisper-cpp 1.8.4.3#1 +
ggml-speech 2026-05-18#1) that covers four Asana tickets:

- QVAC-18991: whisper.cpp upstream-sync from ggml-org/master to
  v1.8.4.3.  Adds upstream's VAD streaming API
  (whisper_vad_detect_speech_no_reset, whisper_vad_reset_state)
  with a regression test, the macOS Vulkan persistent-pipeline
  cache, and various BCI / bindings fixes.
- QVAC-18300: enables OpenCL on Whisper for Android, gated
  behind a new `opencl` feature.  This package now declares an
  android-only `opencl` feature that wires through to the
  whisper-cpp port's opencl feature, so a transcription addon
  built for android-arm64 can ship the Adreno backend without
  forcing it on non-Adreno consumers.
- QVAC-18992: rebases the speech-stack ggml (qvac-ext-ggml@speech)
  onto the same upstream v0.10.2 baseline that whisper.cpp's
  bundled ggml uses, so the QVAC speech stack (whisper +
  parakeet + tts-cpp) consumes a coherent ggml API surface.
  No direct dependency from this package -- transitive via
  other speech-stack addons sharing the Android process.
- QVAC-18993: switches the Android build to pure
  dynamic-backend mode: GGML_BACKEND_DL=ON + GGML_CPU_ALL_VARIANTS=ON
  on both the whisper-cpp port and ggml-speech port, so the
  addon's .bare prebuild ships one libggml-cpu-android_armv*_*.so
  per microarchitecture plus dynamically-loaded
  libggml-vulkan.so / libggml-opencl.so.  ggml's loader picks
  the highest-feature CPU variant (armv9.2_2 .. armv8.0_1) plus
  the right GPU backend (Adreno 700+ -> OpenCL, everything else
  -> Vulkan) at runtime, so a single APK serves the whole device
  matrix without per-device builds.

vcpkg-configuration.json is TEMPORARILY pointed at
Zbig9000/qvac-registry-vcpkg.git @ b5a5e199 (= QVAC-vcpkg-speech-stack-android-dynamic-backend
HEAD on Zbig9000's fork) because the consolidated port versions
don't exist on tetherto/main yet.  Once the vcpkg PR lands the
default-registry block must be re-pointed back to
https://github.com/tetherto/qvac-registry-vcpkg.git with the
post-merge tetherto/main SHA as baseline.

Devicefarm: the Asana ticket asks for GPU testing on mobile to verify
S25 picks OpenCL and Pixel 9 picks Vulkan.  Those tests live
outside this addon (in qvac CI's integration-mobile-test
workflow) and depend on device-farm config that I can't validate
locally; the addon code side is unchanged in this bump (CPU
dispatcher + dynamic backend `.so` files are already wired by
the whisper-cpp port's prebuild output, and the JS layer
already enumerates ggml_backend_devs at init).

* transcription-whispercpp: bump to 0.7.2 with whisper-cpp 1.8.4.3#2 (QVAC-18993)

Picks up the Android per-arch CPU dlopen fallback patch added to the
whisper-cpp port (mirrors qvac-ext-ggml@speech 9562ed04). Without
this, every APK consumer with `useLegacyPackaging=false` (AGP 3.6+
default) would silently lose CPU init: the directory iterator finds
nothing inside compressed APK libs, and the existing on-disk filename
fallback never composes the per-arch `libggml-cpu-android_armv*_*.so`
names that `GGML_CPU_ALL_VARIANTS=ON` produces.

Re-pins the Zbig9000/qvac-registry-vcpkg default-registry baseline to
86257dc376ca043c67cc4805ab8d1e74a94b7eda so both whisper-cpp 1.8.4.3#2
and ggml-speech 2026-05-19#0 are reachable.

Co-authored-by: Cursor <cursoragent@cursor.com>

* transcription-whispercpp: bump to 0.7.3 → whisper-cpp 1.8.4.3#3 (QVAC-18993)

Pure follow-up to 0.7.2 -- the two Android dynamic-backend ggml fixes
the 0.7.2 release pulled in via vcpkg patches are now upstreamed as
commits on tetherto/qvac-ext-lib-whisper.cpp PR tetherto#26 ("ggml + tts-cpp
Android dynamic-backend overlays") instead of being carried in the
vcpkg port's patches/ tree. It also includes a tts-cpp `<atomic>` include
fix that repairs the parallel speech-stack consumer's build under the
day-2 ggml-speech merge.

Build output is bit-identical to 0.7.2 (whisper-cpp 1.8.4.3#3 SOURCE
== 1.8.4.3#2 SOURCE+PATCHES, verified by hashing all
libggml-cpu-android_armv*_*.so files from the NDK r29 cross-compile).

Registry baseline bumped to 965f5e5a so the new port-version
(1.8.4.3#3) is reachable.

PRs in the cross-repo set:
  whisper.cpp tetherto#26 (Zbig9000:QVAC-18993-bundled-ggml-android-dynamic-backend)
  vcpkg tetherto#152 (Zbig9000:QVAC-vcpkg-speech-stack-android-dynamic-backend)

Co-authored-by: Cursor <cursoragent@cursor.com>

* transcription-whispercpp: bridge ggml dlopen backends as IMPORTED targets (QVAC-18993)

`bare-make generate` failed on android-arm64 with

    CMake Error: get_target_property() called with non-existent target
    "ggml::ggml-cpu-android_armv8.0_1"  (… 8 backends total)

after enabling `GGML_BACKEND_DL=ON` on the `whisper-cpp` port. With dynamic-
backend mode, ggml builds the per-arch CPU + GPU backends as standalone MODULE
libraries that ggml dlopens at runtime; upstream ggml's `install(TARGETS … EXPORT)`
deliberately skips them, so the consumer's `BACKEND_DL_LIBS` loop in
`CMakeLists.txt` referenced targets that don't exist.

Wrap the existing loop with an `if(NOT TARGET ggml::${_backend})` fallback that
locates the `.so` under `${VCPKG_INSTALLED_PATH}/bin` via `find_library` and
materialises a `SHARED IMPORTED` target locally with `IMPORTED_NO_SONAME=TRUE`
— then bundle via the existing `INSTALL TARGET` path. Mirrors the pattern that
already ships in `packages/diffusion-cpp` for the same Android-dlopen
build mode.

Static backends (any platform that links ggml in directly) still find their
imported target via ggml-config.cmake on the first branch, so non-Android
prebuilds are byte-identical.

Co-authored-by: Cursor <cursoragent@cursor.com>

* transcription-whispercpp: re-pin baseline to vcpkg PR tetherto#152 rebased HEAD 8c6ca188 (QVAC-18993)

tetherto/qvac-registry-vcpkg/main moved forward yesterday with tetherto#156
(parakeet-cpp 2026-05-20 + ggml-speech 2026-04-09#2 bumps), so vcpkg
PR tetherto#152 was rebased onto the new base 0e75457. Update the default-
registry baseline pointer from the old PR tetherto#152 HEAD (dffaaf6) to the
rebased HEAD (8c6ca188) so the version-resolver still finds
`ggml-speech 2026-05-19#3` (now layered on top of the just-landed
2026-04-09#2) and `whisper-cpp 1.8.4.3#3` (unchanged content,
correct SHA512).

No other changes --- the resolver picks up the same final versions
of every package as before, just with the rebased baseline as the
search root.

Co-authored-by: Cursor <cursoragent@cursor.com>

* transcription-whispercpp: consume whisper-cpp 1.8.4.3#4 + ggml-speech 2026-05-19#4 (QVAC-18993, QVAC-18992)

Picks up the MSVC `/I` fix in the spirv-headers include-shim (vcpkg
PR tetherto#152 commit 5cd209c) so prebuild / win32-x64 stops dying with
`c1xx: fatal error C1083: Cannot open source file: '.../x64-windows/include'`
on the `whisper-cpp[vulkan]` configure step. The shim now emits the
MSVC-style `/I<path>` on Windows and keeps `-isystem <path>` (with
warning suppression) on GCC/Clang elsewhere.

whisper-cpp override bumped 1.8.4.3#3 -> 1.8.4.3#4.
Default-registry baseline bumped 8c6ca188 -> 5cd209c1.

Co-authored-by: Cursor <cursoragent@cursor.com>

* transcription-whispercpp: wire ENABLE_OPENCL so Android prebuilds ship libggml-opencl.so (QVAC-18300)

The `opencl` feature was declared in `packages/transcription-whispercpp/vcpkg.json`
(gated to `platform: android`) and the `whisper-cpp` port's `opencl` feature
correctly enables `-DGGML_OPENCL=ON` on Android — but the consumer's
`CMakeLists.txt` only appended `"tests"` and `"vulkan"` to
`VCPKG_MANIFEST_FEATURES`. The `opencl` feature was therefore never activated,
so vcpkg resolved `whisper-cpp` without `[opencl]`, ggml was built without
`GGML_OPENCL=ON`, and the `android-arm64` prebuild silently shipped CPU + Vulkan
backends only (no `libggml-opencl.so`) — defeating the entire point of
QVAC-18300.

Add an `ENABLE_OPENCL` option (default `ON` on Android, `OFF` elsewhere — the
`vcpkg.json` feature is `platform: android` gated so non-Android is a no-op
anyway) that appends `"opencl"` to `VCPKG_MANIFEST_FEATURES`. Mirrors the
`SD_OPENCL` pattern in `packages/diffusion-cpp/CMakeLists.txt` and keeps the
GPU-feature wiring uniform across the three GPU backends (Metal auto, Vulkan
toggle, OpenCL toggle).

After this commit, the `android-arm64` prebuild's
`qvac__transcription-whispercpp/` directory should ship `libggml-opencl.so`
alongside the existing 7 per-microarch CPU variants and `libggml-vulkan.so`.

Co-authored-by: Cursor <cursoragent@cursor.com>

* transcription-whispercpp: default ENABLE_OPENCL ON unconditionally (QVAC-18300)

Previous commit (6b42bc0) wired ENABLE_OPENCL but gated it on
`_qvac_whispercpp_target_os STREQUAL "Android"`, mirroring the existing
ENABLE_VULKAN block. CI re-run (26172345624) exposed that the gate is broken:
at top-level CMakeLists.txt time, `CMAKE_SYSTEM_NAME` is not yet set --- the
bare-make Android toolchain file is loaded by `project()` (which runs *after*
the option block), so `_qvac_whispercpp_target_os` falls through to the host
OS ("Linux") and ENABLE_OPENCL stayed OFF on the android-arm64 prebuild.

Evidence from run 26172345624's android-arm64 build log:
  `Installing 9/9 whisper-cpp[core,vulkan]:arm64-android@1.8.4.3#4...`
                                ^^^^^^^^ no `[opencl]`

ENABLE_VULKAN works only by coincidence: Vulkan is also default-ON on the
Linux host detection branch, so the wrong target detection produces the right
behaviour. For Android-only features there is no such overlap.

Fix: default ENABLE_OPENCL ON unconditionally and let the actual platform
gating happen where it can: (1) the `platform: android` clause on the
`whisper-cpp[opencl]` dep in `vcpkg.json`, and (2) the `VCPKG_TARGET_IS_ANDROID`
check in the `whisper-cpp` portfile that gates `-DGGML_OPENCL=ON`. Adding
`"opencl"` to `VCPKG_MANIFEST_FEATURES` on non-Android is a guaranteed no-op
because the feature's only dep is platform-gated --- mirrors the layered
gating that `whisper-cpp[vulkan]` already uses (the `vulkan` feature's deps
are `!osx & !ios` gated and the portfile's `-DGGML_VULKAN=ON` is also
target-gated).

After this commit, the android-arm64 install plan should resolve as
`whisper-cpp[core,vulkan,opencl]` and the prebuild tarball should contain
`libggml-opencl.so` alongside the 7 per-microarch CPU `.so`s and
`libggml-vulkan.so`.

Co-authored-by: Cursor <cursoragent@cursor.com>

* transcription-whispercpp: call ggml_backend_load_all_from_path before whisper_init (QVAC-18993)

Android mobile-test E2E crashed inside whisper_init_from_file_with_params
with SIGABRT on PR tetherto#2124 / run 26173084690 (both Pixel 9 Pro + Samsung S25
Ultra, 132 ms after Downloaded model: ggml-tiny.bin). Stack:

  abort → ggml_abort+228 → ggml_backend_dev_backend_reg+48
       → whisper_init_with_params_no_state+480
       → whisper_init_from_file_with_params_no_state+212
       → whisper_init_from_file_with_params+48
       → WhisperModel::load()+460

Root cause: the addon never called ggml_backend_load_all*(). With the
QVAC-18993 GGML_BACKEND_DL=ON build, the bundled ggml-base no longer
defines GGML_USE_CPU, so the static ggml_backend_registry ctor registers
zero backends. whisper.cpp's first ggml_backend_dev_by_type(CPU) lookup returns
NULL → ggml_backend_dev_backend_reg(NULL) trips GGML_ASSERT(device).
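
The failure mode can be reproduced in miniature. This is a toy model only, not ggml code: `ToyRegistry`, `ToyDevice`, `loadFrom` and the `TOY_USE_CPU` macro are hypothetical stand-ins for the backend registry, a device, the dynamic loader call and the compile definition the static constructor depends on.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-ins; none of these names exist in ggml.
struct ToyDevice {
  std::string name;
};

class ToyRegistry {
public:
  ToyRegistry() {
#ifdef TOY_USE_CPU
    // A static build registers the CPU device at construction time.
    devices_.push_back(ToyDevice{"CPU"});
#endif
  }

  // Mimics a dynamic-loading call that registers devices at runtime.
  void loadFrom(const std::string& dir) {
    if (!dir.empty()) {
      devices_.push_back(ToyDevice{"CPU"});
    }
  }

  // Returns nullptr when no device with that name is registered.
  const ToyDevice* find(const std::string& name) const {
    for (const ToyDevice& dev : devices_) {
      if (dev.name == name) {
        return &dev;
      }
    }
    return nullptr;
  }

private:
  std::vector<ToyDevice> devices_;
};

// Checks the lookup result instead of passing a null device onwards.
inline bool hasCpuDevice(const ToyRegistry& registry) {
  return registry.find("CPU") != nullptr;
}
```

Built without `TOY_USE_CPU`, a fresh `ToyRegistry` has no CPU device until `loadFrom()` is called, which is the ordering problem described above.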

This is the same crash signature on both the pre-OpenCL run 26170576156
and the post-OpenCL run 26173084690, so it is independent of the recent
OpenCL enablement. The mobile workflow last passed on
tmp-whisper-184-3-validation back on 2026-05-11, which predates
GGML_BACKEND_DL=ON.

Mirror the pattern used by every other ggml-based addon in the monorepo
(packages/{diffusion-cpp,llm-llamacpp,classification-ggml,…}):

* CMakeLists.txt — emit BACKENDS_SUBDIR (<bare_target>/<module_name>)
  compile def via bare_target / bare_module_target.
* WhisperConfig — add backendsDir field (sibling of the handler-driven
  maps so it bypasses WHISPER_CONTEXT_HANDLERS.at()).
* JSAdapter — read top-level backendsDir string directly from
  configurationParams into config.backendsDir.
* WhisperModel::load — on __ANDROID__, std::call_once →
  ggml_backend_load_all_from_path(backendsDir/BACKENDS_SUBDIR) before
  whisper_init.
* index.js — require('bare-path'); pass
  backendsDir: path.join(__dirname, 'prebuilds') in _load + reload.
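
Why `backendsDir` sits beside the handler-driven maps rather than inside them can be sketched as follows. The names here (`ToyConfig`, `ToyContext`, `contextHandlers`, `applyContextParams`, the `use_gpu` key) are illustrative assumptions, not the addon's real types; the only point is that a table lookup via `.at()` throws for any key without a handler, while a plain field never reaches the table.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <stdexcept>
#include <string>

struct ToyContext {
  bool useGpu = false;
};

struct ToyConfig {
  // Handler-driven: every key must have an entry in the handler table.
  std::map<std::string, std::string> contextParams;
  // Plain sibling field: read directly, never looked up in the table.
  std::string backendsDir;
};

using Handler = std::function<void(ToyContext&, const std::string&)>;

inline const std::map<std::string, Handler>& contextHandlers() {
  static const std::map<std::string, Handler> handlers = {
      {"use_gpu", [](ToyContext& ctx, const std::string& value) {
         ctx.useGpu = (value == "true");
       }},
  };
  return handlers;
}

// Throws std::out_of_range for any key that has no handler.
inline ToyContext applyContextParams(const ToyConfig& config) {
  ToyContext ctx;
  for (const auto& entry : config.contextParams) {
    contextHandlers().at(entry.first)(ctx, entry.second);
  }
  return ctx;
}
```

Putting `backendsDir` into `contextParams` in this sketch would throw, which is the behaviour the sibling field avoids.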

No diff on non-Android (Linux/macOS/Windows/iOS): ggml's static ctor
keeps registering CPU there as before.
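
A minimal sketch of the load-once guard, under stated assumptions: `loadBackendsFrom` is a hypothetical stand-in for `ggml_backend_load_all_from_path`, the subdirectory value is invented for illustration, and the `__ANDROID__` gate is omitted so the sketch runs on any host.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <string>

// Records how many times the stand-in loader ran and with which path.
struct LoadLog {
  std::atomic<int> calls{0};
  std::string lastPath;
};

inline LoadLog& loadLog() {
  static LoadLog log;
  return log;
}

// Hypothetical stand-in for ggml_backend_load_all_from_path().
inline void loadBackendsFrom(const std::string& path) {
  loadLog().lastPath = path;
  loadLog().calls.fetch_add(1);
}

// Joins the JS-supplied directory with the compile-time subdirectory.
inline std::string backendsPath(const std::string& backendsDir,
                                const std::string& subdir) {
  return backendsDir + "/" + subdir;
}

// Runs the loader at most once per process, however many models are loaded.
inline void ensureBackendsLoaded(const std::string& backendsDir,
                                 const std::string& subdir) {
  static std::once_flag once;
  std::call_once(once, [&] {
    loadBackendsFrom(backendsPath(backendsDir, subdir));
  });
}
```

A second call with different arguments is ignored, so the first `backendsDir` seen by the process wins; that matches a one-shot `std::call_once` guard but is worth keeping in mind for reload paths.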

aiDocs/15-android-mobile-test-crash-fix.md has the full investigation
(crash extraction, layered root-cause, why every other ggml addon
already does this, follow-ups).

Co-authored-by: Cursor <cursoragent@cursor.com>

* transcription-whispercpp: re-pin vcpkg baseline to cleaned PR tetherto#152 head (QVAC-18993)

PR tetherto#152 (qvac-registry-vcpkg) was rebased today to drop the ggml-speech
port bump (b4cf7b2) and the matching ggml-speech-side MSVC shim. Only
the whisper-cpp bump + whisper-cpp portfile MSVC `/I` fix remain. The
consumer-side migration to ggml-speech (QVAC-18992 / PR tetherto#13) stays open
on the speech branch but is no longer a prerequisite for this Android
dynamic-backend rollout.

New PR tetherto#152 HEAD: 9f4e8e20072d8a7a1e118a49c36aacf6af6b3e0d
Old (pre-cleanup): 5cd209c145a1d61636f1d44b4afe37868c298a8c

This addon does not depend on `ggml-speech` (it consumes the bundled
ggml inside `whisper-cpp`), so the dependency closure is unchanged.
Updated CHANGELOG to record the new baseline + the reason ggml-speech
got dropped.

Co-authored-by: Cursor <cursoragent@cursor.com>

* transcription-whispercpp: fix cpp-lint failures (clang-format + clang-tidy)

The prior CI run skipped cpp-lint entirely because the recent PR
commits only touched CMakeLists.txt / CHANGELOG.md. The new
ea298cf commit (QVAC-18993 mobile-test fix) added the first C++
diff in this branch, so cpp-lint now runs full clang-format
+ clang-tidy and surfaces three issues:

1. clang-format: JSAdapter.cpp had a one-line declaration broken
   across two lines (LLVM PointerAlignment=Left + AlignAfterOpenBracket
   collapsed it). Reformatted in place.

2. clang-tidy [readability-identifier-naming]:
   WhisperHandlers.hpp:9 -- local `const int LANG_ID` violates the
   variable case style. Renamed to `langId` (lowerCamelCase, matches
   `checkLanguage` two lines above). Latent issue; never reported
   before because cpp-lint was a no-op on every prior PR commit.

3. clang-tidy [readability-identifier-naming]:
   WhisperModel.hpp:100 -- unused `set_weights_for_file(span, bool)`
   stub kept for parity with `transcription-parakeet` (which uses
   snake_case extensively for this exact API). Renaming would
   diverge from the parakeet pattern, so suppress with a single
   NOLINTNEXTLINE rather than touching the API surface.
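
The two clang-tidy fixes follow a common shape, sketched below. The function bodies and signatures are invented for illustration (the real `set_weights_for_file` takes a span); only the `langId` rename and the `NOLINTNEXTLINE(readability-identifier-naming)` suppression reflect the changes described above.

```cpp
#include <cassert>
#include <cstddef>

// Local constant renamed to lowerCamelCase to satisfy the naming check.
inline int clampLanguageId(int requested, int maxId) {
  const int langId = requested < 0 ? 0 : requested;
  return langId > maxId ? maxId : langId;
}

// snake_case is kept on purpose to match a sibling API, so the check is
// suppressed for the single declaration that follows.
// NOLINTNEXTLINE(readability-identifier-naming)
inline bool set_weights_for_file(const unsigned char* data, std::size_t size,
                                 bool completed) {
  (void)data;
  (void)size;
  return completed;
}
```

Naming the check inside the parentheses keeps the suppression narrow: other diagnostics on the same line are still reported.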

Local repro: `cp packages/lint-cpp/.clang-format
packages/transcription-whispercpp/.clang-format` then
`git-clang-format --diff $(git merge-base HEAD origin/main) --
packages/transcription-whispercpp` reports `did not modify any
files`. The .clang-format copy is normally produced by
`packages/transcription-whispercpp/CMakeLists.txt:58
(configure_file COPYONLY)` during CMake configure.

[QVAC-18993][QVAC-19071]

Co-authored-by: Cursor <cursoragent@cursor.com>

* transcription-whispercpp: reference QVAC-19071 in CHANGELOG

QVAC-19071 ([Whisper] Update qvac-registry-vcpkg and addon with new
port versions) is the meta task that bundles the registry-side port
bump (qvac-registry-vcpkg PR tetherto#152: whisper-cpp 1.8.4.3#4) with the
consumer-side addon bump (qvac PR tetherto#2124: transcription-whispercpp
0.7.3, baseline re-pin). No code changes; the work itself was
already covered by PR tetherto#152 + this PR. Adds the cross-reference so
the Asana ticket can be closed off this release cycle.

The QVAC-18992 ggml-speech migration (PR tetherto#13 + ggml-speech port
bump) stays deferred per the 2026-05-21 plan; it will land as a
follow-up port bump under the same QVAC-19071 umbrella.

[QVAC-19071]

Co-authored-by: Cursor <cursoragent@cursor.com>

* transcription-whispercpp: re-pin baseline to consume whisper-cpp 1.8.4.3#5 (REF flipped to tetherto/master)

[whisper-cpp PR tetherto#28](tetherto/qvac-ext-lib-whisper.cpp#28)
(QVAC-18993 bundled-ggml --- Android dynamic backend + per-arch CPU
dlopen fallback) was merged today (2026-05-21, merge commit
`f3102199` on `tetherto/qvac-ext-lib-whisper.cpp/master`). With it
merged, `tetherto/master` now carries every commit the registry's
`whisper-cpp` port previously pulled from the temporary
`Zbig9000/qvac-ext-lib-whisper.cpp@14620c8857` branch:

  - PR tetherto#25 (QVAC-18991, upstream whisper.cpp sync) --- merged 2026-05-20
  - PR tetherto#27 (QVAC-18966, tts-cpp chatterbox <atomic> fix) --- merged 2026-05-20
  - PR tetherto#28 (QVAC-18993, ggml-backend android dynamic backend) --- merged 2026-05-21

[qvac-registry-vcpkg PR tetherto#152](tetherto/qvac-registry-vcpkg#152)
HEAD (`f2870372`) bumps `whisper-cpp` to port-version `1.8.4.3#5`
with the REF repoint --- byte-identical source tarball outside
`parakeet-cpp/` and `tts-cpp/` (separate vcpkg ports). This commit
just re-pins the consumer-side baseline so the addon resolves
against the new port-version.

  vcpkg-configuration.json default-registry baseline:
    9f4e8e20072d8a7a1e118a49c36aacf6af6b3e0d   (MSVC fix only, whisper-cpp 1.8.4.3#4)
      -> f2870372965e899ae1f8a221154d2b243a6c3d30  (+ whisper-cpp 1.8.4.3#5 REF repoint)

No code change in this monorepo --- pure baseline re-pin. CHANGELOG
updated to record both the new baseline and the (now superseded)
intermediate `9f4e8e2` pin.

Closes the consumer-side half of [QVAC-19071](https://tetherapp.atlassian.net/browse/QVAC-19071)
("Update qvac-registry-vcpkg and addon with new port versions").
Registry-side half = vcpkg PR tetherto#152 commit `f287037`.

[QVAC-18993][QVAC-19071]

Co-authored-by: Cursor <cursoragent@cursor.com>

* transcription-whispercpp: re-pin baseline to whisper-cpp 1.8.4.3#0 (PR tetherto#152 review fixes)

@GustavoA1604 review on [qvac-registry-vcpkg PR tetherto#152](tetherto/qvac-registry-vcpkg#152)
requested three changes on the registry side:

  1. Drop the explanatory comment block at top of
     `ports/whisper-cpp/portfile.cmake`.
  2. Reset `port-version` 5 -> 0 (treat the tetherto REF repoint as
     a fresh start, not a continuation of the Zbig9000-branch series).
  3. Collapse the three historical `1.8.4.3` entries
     (`port-version` 3, 4, 5 -- never consumed off-fork) in
     `versions/w-/whisper-cpp.json` into a single `port-version: 0`
     entry with the new git-tree.

All three landed in PR tetherto#152 commit `ee71ecb`. This commit is the
consumer-side mirror:

  vcpkg-configuration.json default-registry baseline:
    f2870372965e899ae1f8a221154d2b243a6c3d30  (1.8.4.3#5, pre-review)
      -> ee71ecb5b286224377313e5a50558d11adbef3ac  (1.8.4.3#0, post-review)

  CHANGELOG entry updated:
    "1.8.4.3#5" -> "1.8.4.3#0" + note about port-version reset and
    history collapse + supersession line covers both prior pins
    (`9f4e8e2` MSVC fix, `f287037` 1.8.4.3#5).

No code change in this monorepo --- pure baseline re-pin. The
underlying whisper.cpp source bytes are unchanged (REPO + REF +
SHA512 in the portfile are identical between `1.8.4.3#5` and
`1.8.4.3#0`), so the port builds from identical sources and the
produced binary should be functionally equivalent.

[QVAC-18993][QVAC-19071]

Co-authored-by: Cursor <cursoragent@cursor.com>

* transcription-whispercpp: 0.8.0 — address PR review

Collapses the 0.7.1/0.7.2/0.7.3 work into a single 0.8.0 release and
folds in Gustavo's PR tetherto#2124 review feedback:

- Bump version to 0.8.0; collapse CHANGELOG into a single 0.8.0 entry
- Bump whisper-cpp override to 1.8.4.3#0 (matches PR tetherto#152 collapse)
- Repoint default-registry to tetherto/qvac-registry-vcpkg @ a9d7e924
  (PR tetherto#152 merge commit on tetherto/main)
- vcpkg.json: model GPU features on transcription-parakeet's pattern —
  platform-gated whisper-cpp deps select [opencl,vulkan] on android,
  [vulkan] on linux/windows, and no GPU feature on apple. Drop the
  addon-side opencl/vulkan feature sections; CMake no longer carries
  ENABLE_OPENCL / ENABLE_VULKAN option indirection.
- index.js: nest backendsDir under whisperConfig (mirrors parakeet's
  parakeetConfig.backendsDir). Strip it from the wire-format whisperConfig
  map and surface it as top-level configurationParams.backendsDir before
  handing the config to the addon. Fix the stale _createAddon JSDoc that
  still described "LLM-specific settings".
- index.d.ts + README.md: document whisperConfig.backendsDir; drop the
  ENABLE_VULKAN build instructions (now controlled by vcpkg.json).
- Compact all the addon-side comments (CMakeLists.txt, JSAdapter.cpp,
  WhisperConfig.hpp, WhisperModel.cpp); drop every QVAC asana ticket
  reference; standardise the C++ log wording on
  "configurationParams.backendsDir".
- Drop "-D ENABLE_VULKAN=OFF" from the test:cpp:build / coverage:cpp:build
  npm scripts (no-op now that the option is gone).

Co-authored-by: Cursor <cursoragent@cursor.com>

* transcription-whispercpp: 0.9.0 -> 0.8.0 (fold into single release)

Reverts the 0.8.0 -> 0.9.0 bump from the merge commit: per request, this
PR's release notes are folded into the existing 0.8.0 entry rather than
shipping as a separate semver step. Order: Added -> Changed -> Fixed
(from this PR) -> Removed (the OutputCallbackJs revert that landed on
main as 0.8.0 via tetherto#2133).

package.json bumped back to 0.8.0.

Co-authored-by: Cursor <cursoragent@cursor.com>

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: GustavoA1604 <54457676+GustavoA1604@users.noreply.github.com>
@github-actions

Contributor

🧪 C++ Test Coverage Report

Coverage:

📊 Detailed Coverage
Filename                         Regions    Missed Regions     Cover   Functions  Missed Functions  Executed       Lines      Missed Lines     Cover    Branches   Missed Branches     Cover
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
NmtLazyInitializeBackend.cpp          99                20    79.80%          11                 1    90.91%         157                36    77.07%          58                18    68.97%
NmtLazyInitializeBackend.hpp           2                 0   100.00%           1                 0   100.00%           1                 0   100.00%           0                 0         -
TranslationModel.cpp                 296               168    43.24%          28                 8    71.43%         506               213    57.91%         152               106    30.26%
TranslationModel.hpp                   1                 0   100.00%           1                 0   100.00%           1                 0   100.00%           0                 0         -
nmt.cpp                               72                22    69.44%           9                 1    88.89%         137                28    79.56%          38                12    68.42%
nmt.hpp                               51                 4    92.16%          11                 2    81.82%          53                 4    92.45%          28                 0   100.00%
nmt_beam_search.cpp                  116                25    78.45%          10                 3    70.00%         254                32    87.40%          74                17    77.03%
nmt_graph_decoder.cpp                164                78    52.44%          15                 7    53.33%         540               161    70.19%         112                69    38.39%
nmt_graph_encoder.cpp                 54                13    75.93%           3                 0   100.00%         268                33    87.69%          36                15    58.33%
nmt_loader.cpp                       270                67    75.19%          14                 0   100.00%         774                97    87.47%         138                61    55.80%
nmt_state_backend.cpp                253                94    62.85%          21                 0   100.00%         489               128    73.82%         154                80    48.05%
nmt_tokenization.cpp                  88                21    76.14%           8                 0   100.00%         135                36    73.33%          58                25    56.90%
nmt_utils.cpp                        120                89    25.83%           8                 3    62.50%         180               134    25.56%          72                57    20.83%
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
TOTAL                               1586               601    62.11%         140                25    82.14%        3495               902    74.19%         920               460    50.00%

zoq added 2 commits May 22, 2026 14:11
…est.js.

Signed-off-by: Marcus Edel <marcus.edel@collabora.com>

Signed-off-by: Marcus Edel <marcus.edel@collabora.com>

olyasir and others added 5 commits May 22, 2026 21:53
…o#2210)

The shared composite refactor in tetherto#2153 dropped the `npm run mobile:copy-prebuilds`
step that the old monolith workflow ran before building the APK/IPA. That script
copies `weights/mobilenetv3_3class_v3_fp16.gguf` into
`test/mobile/testAssets/mobilenetv3_3class_v3_fp16.gguf.bin` (also staging test
images) so the React Native bundler packs them as assets and `global.assetPaths`
exposes them on-device.

Without it, every Android E2E run since 2026-05-20 has crashed Bare on startup
with `Uncaught (in promise) Error: Mobile asset not found in global.assetPaths:
mobilenetv3_3class_v3_fp16.gguf.bin` (SIGABRT from libbare-kit in
`js_callback_s::on_call`).

Last known-good run on main: 26168109084 (2026-05-20 14:10Z). All Android E2E
runs since the refactor have failed identically across Pixel 9 Pro, Samsung
S25 Ultra, and S26 Ultra.

Mirrors the existing pattern in integration-mobile-test-vla.yml.
…#2208)

Co-authored-by: Bruno Campana <7632562+BrunoCampana@users.noreply.github.com>
…etherto#2212)

* infra[notask]: route classification cpp-tests linux to self-hosted

Move the classification-ggml cpp-tests Linux entry off GitHub-hosted
ubuntu-22.04 onto qvac-ubuntu2404-x64 (ubuntu-24.04), matching every
other addon's cpp-tests workflow (vla, llm, embed, diffusion).

Why: on GitHub-hosted jammy, `setup-llvm` apt-installs clang-19 + libc++-19
but does not run `update-alternatives`, so the unversioned `/usr/bin/clang++`
keeps resolving to the system default clang-14. The vcpkg build then
compiles ggml with clang-14 against libc++-19 headers and fails on the
new `__verbose_abort` / `_LIBCPP_BEGIN_NAMESPACE_STD` macros (warning
literally says "Libc++ only supports Clang 17 and later").
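
A quick way to confirm which compiler and standard library a runner actually pairs is to compile a small probe with the same unversioned `clang++`. This is a sketch: `describeToolchain` and `clangMeetsMinimum` are hypothetical helpers, and the minimum of 17 is taken from the warning text quoted above.

```cpp
#include <cassert>
#include <string>

// Reports the compiler major version and the raw standard-library version
// macro, using only predefined or library-provided macros.
inline std::string describeToolchain() {
  std::string out;
#if defined(__clang__)
  out += "clang " + std::to_string(__clang_major__);
#elif defined(__GNUC__)
  out += "gcc " + std::to_string(__GNUC__);
#else
  out += "unknown compiler";
#endif
#if defined(_LIBCPP_VERSION)
  out += ", libc++ " + std::to_string(_LIBCPP_VERSION);
#elif defined(__GLIBCXX__)
  out += ", libstdc++ " + std::to_string(__GLIBCXX__);
#else
  out += ", unknown standard library";
#endif
  return out;
}

// True when the compiler major version satisfies the library's minimum.
inline bool clangMeetsMinimum(int clangMajor, int minimumMajor) {
  return clangMajor >= minimumMajor;
}
```

On the mismatched runner described above the probe itself would likely fail to compile, which is already a useful signal; on a healthy runner the printed pair should show a Clang major version at or above the library's minimum.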

Self-hosted qvac-ubuntu2404-x64 is provisioned with clang-22 and the
alternatives wired correctly per CLAUDE.md's setup-llvm guidance, so
the unversioned `clang++` invocations resolve as expected.

Windows + macOS entries left as-is — they're currently green.

* infra[notask]: also route classification cpp-tests windows to self-hosted

Per review feedback on PR tetherto#2212 — bump windows-2022 → windows-2025 and
add runner: qvac-win25-x64, matching llm/embed/vla/diffusion cpp-tests.
@github-actions

Contributor

🧪 C++ Test Coverage Report

Coverage:

📊 Detailed Coverage
Filename                         Regions    Missed Regions     Cover   Functions  Missed Functions  Executed       Lines      Missed Lines     Cover    Branches   Missed Branches     Cover
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
NmtLazyInitializeBackend.cpp          99                20    79.80%          11                 1    90.91%         157                36    77.07%          58                18    68.97%
NmtLazyInitializeBackend.hpp           2                 0   100.00%           1                 0   100.00%           1                 0   100.00%           0                 0         -
TranslationModel.cpp                 296               168    43.24%          28                 8    71.43%         506               213    57.91%         152               106    30.26%
TranslationModel.hpp                   1                 0   100.00%           1                 0   100.00%           1                 0   100.00%           0                 0         -
nmt.cpp                               72                22    69.44%           9                 1    88.89%         137                28    79.56%          38                12    68.42%
nmt.hpp                               51                 4    92.16%          11                 2    81.82%          53                 4    92.45%          28                 0   100.00%
nmt_beam_search.cpp                  116                25    78.45%          10                 3    70.00%         254                32    87.40%          74                17    77.03%
nmt_graph_decoder.cpp                164                78    52.44%          15                 7    53.33%         540               161    70.19%         112                69    38.39%
nmt_graph_encoder.cpp                 54                13    75.93%           3                 0   100.00%         268                33    87.69%          36                15    58.33%
nmt_loader.cpp                       270                67    75.19%          14                 0   100.00%         774                97    87.47%         138                61    55.80%
nmt_state_backend.cpp                253                94    62.85%          21                 0   100.00%         489               128    73.82%         154                80    48.05%
nmt_tokenization.cpp                  88                21    76.14%           8                 0   100.00%         135                36    73.33%          58                25    56.90%
nmt_utils.cpp                        120                89    25.83%           8                 3    62.50%         180               134    25.56%          72                57    20.83%
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
TOTAL                               1586               601    62.11%         140                25    82.14%        3495               902    74.19%         920               460    50.00%
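The Cover columns above can be reproduced from the raw counts. The sketch below assumes each percentage is (total - missed) / total, rounded to two decimals, and that a zero total is shown as "-", which matches the rows in this report. The function name `coverPercent` is hypothetical and not part of any coverage tool.

```typescript
// Hypothetical helper: recomputes a "Cover" cell from a total and a missed count.
// Assumption: cover = (total - missed) / total, shown with two decimals.
function coverPercent(total: number, missed: number): string {
  // Rows with nothing to measure (e.g. 0 branches) are shown as "-" in the report.
  if (total === 0) return "-";
  return (((total - missed) / total) * 100).toFixed(2) + "%";
}

// TOTAL row of the report above.
console.log(coverPercent(1586, 601)); // regions   -> "62.11%"
console.log(coverPercent(140, 25));   // functions -> "82.14%"
console.log(coverPercent(3495, 902)); // lines     -> "74.19%"
console.log(coverPercent(920, 460));  // branches  -> "50.00%"
```

Recomputing the TOTAL row this way is a quick consistency check when comparing two coverage comments on the same PR.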


tamer-hassan-tether and others added 6 commits May 23, 2026 12:27
Replace duplicated per-OS vcpkg clone/bootstrap steps with the shared setup-vcpkg action and drop the obsolete Windows user-profile override from reusable-prebuilds.
Co-authored-by: Cursor <cursoragent@cursor.com>
…tetherto#2184)

* feat[bc]: migrate SDK parakeet transcription to 0.6.0 GGML

Replace ONNX multi-file parakeet loading with single-GGUF models,
duplex transcribeStream, and Q8_0 registry constants. Legacy ONNX
modelConfig fields raise LegacyParakeetModelDeprecatedError. Wire
local @qvac/transcription-parakeet 0.6.0 until publish.

Co-authored-by: Cursor <cursoragent@cursor.com>

* doc: standardize parakeet transcription example headers

Align all six parakeet examples on a consistent file header:
title, usage, brief description, and requirements only when needed.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix: address PR review — restore API docs, legacy endOfTurn wire

- Restore reference/api/index.mdx from main; keep only parakeet error update
- Preprocess endOfTurn to accept legacy whisper frames without source
- Whisper plugin: forward legacy addon endOfTurn (was silently dropped)
- Document wire compatibility in transcription.mdx and 0.11.0 breaking.md

Co-authored-by: Cursor <cursoragent@cursor.com>
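As an illustration of the endOfTurn preprocessing described in the commit above, here is a minimal sketch of accepting a legacy frame that has no `source` field. The frame shape, the `EndOfTurnSource` values, and the function name `preprocessEndOfTurn` are assumptions made for this example; the actual SDK schema is not shown in this PR text.

```typescript
// Hypothetical wire shapes, for illustration only.
type EndOfTurnSource = "whisper" | "parakeet";

interface EndOfTurnFrame {
  type: "endOfTurn";
  source: EndOfTurnSource;
}

// Normalizes a decoded frame into an EndOfTurnFrame.
// Returns null when the value is not a recognizable endOfTurn frame.
function preprocessEndOfTurn(raw: unknown): EndOfTurnFrame | null {
  if (typeof raw !== "object" || raw === null) return null;
  const frame = raw as { type?: unknown; source?: unknown };
  if (frame.type !== "endOfTurn") return null;

  // Assumption: legacy whisper frames omit `source`, so default it
  // instead of rejecting the frame.
  if (frame.source === undefined) {
    return { type: "endOfTurn", source: "whisper" };
  }
  if (frame.source === "whisper" || frame.source === "parakeet") {
    return { type: "endOfTurn", source: frame.source };
  }
  return null; // unknown source value
}
```

Defaulting the missing field at the boundary keeps downstream consumers on a single frame shape, which is one plausible way to avoid the silent drop mentioned in the commit.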

* fix: stabilize parakeet-stream-iterator-throw on iOS e2e

Bare-RN tears down transcribeStream sessions asynchronously over JSI.
After a consumer-side iterator throw, wait and retry before opening a
recovery session so Device Farm does not see zero text events.

Co-authored-by: Cursor <cursoragent@cursor.com>
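The wait-and-retry approach from the commit above could look roughly like the following. This is a sketch under stated assumptions, not the test's actual code: `openWithRetry`, its parameters, and the default attempt count and delay are all hypothetical, and `open` stands in for whatever opens the recovery session and collects its events.

```typescript
// Hypothetical helper: waits before each attempt so an asynchronous teardown
// can finish, then retries until the result is usable or attempts run out.
async function openWithRetry<T>(
  open: () => Promise<T>,
  isUsable: (result: T) => boolean,
  maxAttempts = 3,
  delayMs = 500,
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    // Give the previous session time to tear down before opening a new one.
    await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
    const result = await open();
    // Return the first usable result, or the last result once attempts are spent.
    if (isUsable(result) || attempt >= maxAttempts) return result;
  }
}
```

A caller would pass a predicate such as "at least one text event was received", so a recovery session that produced no text is retried rather than reported as a failure straight away.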

* feat[api]: wire Sortformer v2.1 streaming example and model constant

Use transcribeStream with AOSC load config and add
PARAKEET_SORTFORMER_STREAMING_4SPK_V2_1_Q8_0 to models.ts (blob
metadata placeholder until update-models after registry sync).

* feat[api]: adopt Sortformer v2.1 registry models in SDK

Run update-models for PARAKEET_SORTFORMER_4SPK_V2_1_* (f16/q4/q8).
Wire streaming example to v2.1 + transcribeStream/AOSC; point batch
example and e2e parakeet-sortformer resource at v2.1 q8_0.

* chore: remove manual parakeet entry from 0.11.0 breaking changelog

* fix unit test

---------

Co-authored-by: Ishan Vohra <ishanvohra@Ishans-MacBook-Air.local>
Co-authored-by: Cursor <cursoragent@cursor.com>


Labels

verified Authorize secrets / label-gate in PR workflows

Projects

None yet

Development

Successfully merging this pull request may close these issues.