
QVAC-17481 feat[api]: integrate @qvac/classification-ggml into SDK #2056

Closed
DmitryMalishev wants to merge 1233 commits into tetherto:main from DmitryMalishev:qvac-17481-classification-ggml-sdk

@DmitryMalishev DmitryMalishev commented May 14, 2026

Contributor

SDK integration for the newly added Image Classification GGML addon.

Standalone addon PR merged: #1727

What this PR ships

  • ggml-classification plugin — wraps @qvac/classification-ggml (ImageClassifier) as a standard SDK plugin with skipPrimaryModelPathValidation: true (model is bundled inside the addon package, no download required)
  • classify() client API — accepts a Uint8Array image (JPEG/PNG or raw RGB), encodes to base64 for RPC transport, returns ClassificationResult[]
  • New schemas: classificationConfigSchema, classifyRequestSchema, classifyResponseSchema, ClassifyClientParams, ClassificationResult
  • PLUGIN_CLASSIFICATION added to SDK_DEFAULT_PLUGINS; ADDON_CLASSIFICATION = "@qvac/classification-ggml" added as an addon constant
  • ggmlClassification ModelType registered with "classification" alias, engine-to-addon map entry, registry engine enum, and ModelInfo.addon enum
  • loadModel supports optional modelSrc — omit it to use the bundled weights
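
The base64 step in the classify() bullet above can be sketched as follows. This is a minimal illustration, not the SDK's actual transport code: `encodeImageForRpc` and `decodeImageFromRpc` are hypothetical names, and the sketch assumes a runtime that exposes the standard `btoa`/`atob` globals.

```typescript
// Hypothetical helpers: the SDK's real encoding code is not shown in this PR description.

// Turn raw image bytes into a base64 string that can travel in a JSON RPC payload.
function encodeImageForRpc(image: Uint8Array): string {
  let binary = "";
  for (let i = 0; i < image.length; i++) {
    binary += String.fromCharCode(image[i]);
  }
  return btoa(binary);
}

// Reverse the encoding on the receiving side.
function decodeImageFromRpc(payload: string): Uint8Array {
  const binary = atob(payload);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  return bytes;
}

// The first four bytes of a typical JPEG file, used here as sample input.
const jpegHeader = new Uint8Array([0xff, 0xd8, 0xff, 0xe0]);
const wirePayload = encodeImageForRpc(jpegHeader);
const restoredBytes = decodeImageFromRpc(wirePayload);
```

Base64 inflates the payload by roughly a third, which is the usual trade-off for keeping binary data inside a text-based RPC message.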

API usage

import fs from "node:fs/promises";
import { QvacClient, classify } from "@qvac/sdk";

const client = new QvacClient();

// Load with bundled model (no modelSrc needed)
await client.loadModel({ modelType: "classification" });

// Or load a custom GGUF
await client.loadModel({
  modelType: "classification",
  modelSrc: "./my-classifier.gguf",
  modelConfig: { topK: 3 },
});

// Classify an image
const imageBytes = await fs.readFile("photo.jpg");
const results = await classify({
  modelId: "classification",
  image: imageBytes,       // Uint8Array — JPEG, PNG, or raw RGB
  topK: 3,
});

// results: [{ label: "food", confidence: 0.91 }, ...]
console.log(results);
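
A caller may want to post-filter the returned array. The sketch below assumes only the `{ label, confidence }` shape shown in the example output above; `pickTop` and its `minConfidence` parameter are hypothetical and not part of the SDK.

```typescript
// Shape taken from the example output above; everything else is illustrative.
interface ClassificationResult {
  label: string;
  confidence: number;
}

// Hypothetical helper: keep the k highest-confidence results at or above a threshold.
function pickTop(
  results: ClassificationResult[],
  k: number,
  minConfidence = 0,
): ClassificationResult[] {
  return results
    .filter((r) => r.confidence >= minConfidence) // filter() copies, so the input is not mutated
    .sort((a, b) => b.confidence - a.confidence)
    .slice(0, k);
}

// Made-up sample data, for illustration only.
const sampleResults: ClassificationResult[] = [
  { label: "document", confidence: 0.02 },
  { label: "food", confidence: 0.91 },
  { label: "person", confidence: 0.07 },
];
const topTwo = pickTop(sampleResults, 2, 0.05);
```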

lauripiisang and others added 30 commits April 22, 2026 13:15
…lows for SDK (tetherto#1653)

* infra: add suite filtering and PR-triggered e2e test workflows for SDK

- Add workflow_call trigger + suite/exclude-suite inputs to test-sdk.yml
- Thread suite/exclude-suite through desktop, android, and iOS reusable workflows
- Create on-pr-test-sdk.yml for label-based and release-branch triggers
- Add suite choice dropdown (full/smoke/custom) for manual dispatch

* infra: add Device Farm artifact download and upload to Android SDK test workflow

* infra: fix CodeQL alert in on-pr-test-sdk.yml

Remove git fetch/diff of PR head refs that triggered "checkout of
untrusted code in trusted context" alert. Use sparse checkout (only
authorize-pr action) and rely on the trigger-level paths filter for
SDK change detection on release branch PRs.

* infra: improve Android Device Farm logging with continuous background capture

Replace inline logcat consumption with background capture to a persistent
log file, matching the iOS pymobiledevice3 pattern. Adds full unfiltered
logcat dump and React Native log extraction in post_test.

* infra: increase Device Farm cleanup wait to 30 min for artifact download

The STOPPING state can take 10+ minutes on Device Farm. Previous 5-min
wait (30x10s) was insufficient, resulting in empty artifact downloads.
Now waits up to 30 min (120x15s) and merges stop+wait+download into a
single step. Applied to both iOS and Android cleanup jobs.
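
The wait budget described above (120 polls at 15 s each, 30 minutes in total) follows the usual bounded-polling pattern. The sketch below is an illustration in TypeScript, not the workflow's actual shell step; the state names are simplified and `getState` is a hypothetical callback standing in for the Device Farm status query.

```typescript
// Simplified run states: an assumption for illustration.
type RunState = "RUNNING" | "STOPPING" | "COMPLETED";

// Poll until the run completes or the attempt budget is exhausted.
// Defaults mirror the commit message: 120 attempts x 15 s = 30 min.
async function waitUntilStopped(
  getState: () => Promise<RunState>,
  maxAttempts = 120,
  intervalMs = 15_000,
): Promise<boolean> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    if ((await getState()) === "COMPLETED") return true;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  return false; // timed out; the caller decides whether to download anyway
}
```

Returning a boolean rather than throwing lets the caller still attempt the artifact download after a timeout.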

* fix: fix security issue caused by allowing out-of-date labeled prs
…1704)

Nested android/ios jobs in test-sdk.yml require id-token:write for AWS
OIDC credential exchange. The caller workflow permissions are the
ceiling for all nested reusable workflows — without this, GitHub
rejects the entire workflow chain at startup with startup_failure.
tetherto#1583)

* QVAC-17057 feat: add bci-whispercpp package for BCI neural signal transcription

Add a new @qvac/bci-whispercpp addon that transcribes brain-computer
interface neural signals into text using a modified whisper.cpp backend.

This POC includes:
- C++ native addon with BCI model inference (NeuralProcessor, BCIModel,
  BCIConfig) built on the qvac addon-cpp framework
- CMake + vcpkg build system with whisper-cpp overlay ports carrying
  BCI-specific patches (variable conv1 kernel, windowed attention)
- JavaScript API: BCIWhispercpp class with batch transcribeFile/transcribe
- Integration tests for load/destroy and batch transcription
- Example script and model conversion tooling
- WER utility for accuracy measurement
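
For readers unfamiliar with the metric, word error rate is the word-level edit distance between reference and hypothesis, divided by the number of reference words. The sketch below is a generic implementation and not the package's own utility; `computeWordErrorRate` is a hypothetical name, and it applies no case or punctuation normalization.

```typescript
// Generic WER sketch: word-level Levenshtein distance / reference word count.
function computeWordErrorRate(reference: string, hypothesis: string): number {
  const ref = reference.trim().split(/\s+/).filter(Boolean);
  const hyp = hypothesis.trim().split(/\s+/).filter(Boolean);
  // Convention chosen for this sketch: an empty reference scores 0 only
  // when the hypothesis is empty as well.
  if (ref.length === 0) return hyp.length === 0 ? 0 : 1;

  // prev[j] holds the distance between the first i-1 reference words
  // and the first j hypothesis words.
  let prev: number[] = Array.from({ length: hyp.length + 1 }, (_, j) => j);
  for (let i = 1; i <= ref.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= hyp.length; j++) {
      const substitution = ref[i - 1] === hyp[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1, // deletion
        curr[j - 1] + 1, // insertion
        prev[j - 1] + substitution,
      );
    }
    prev = curr;
  }
  return prev[hyp.length] / ref.length;
}
```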

Streaming transcription will be added in a follow-up PR (QVAC-17062).

Made-with: Cursor

* fix[api](bci): address review feedback, refactor to infer-base pattern, fix Linux linkage

- Refactor BCIWhispercpp to use createJobHandler + exclusiveRunQueue
  from @qvac/infer-base instead of manual promise plumbing, matching
  the TranscriptionWhispercpp / LlmLlamacpp addon pattern
- Constructor now takes { files: { model }, logger, opts } (was { modelPath })
- transcribe/transcribeFile return QvacResponse
- Add unload(), getState(), exclusiveRunQueue-serialized destroy()
- Add @qvac/infer-base dependency

Address all review feedback from Gustavo (PR tetherto#1583):
- Remove unused END_OF_INPUT, totalSamples_, sleep_for(1ms)
- Use QvacErrorAddonBCI for model-not-found, add BUFFER_LIMIT_EXCEEDED
- Fix n_threads/duration_ms double→int conversion in BCIConfig.cpp
- Add bounds validation for all BCIConfig numeric params
- Throw on unknown config keys (was silently ignored)
- Consume gpu_device in context params
- Collect whisper timings in runtimeStats()
- Trim unused BCIErrors enum values, map codes to distinct names
- Add MAX_BUFFERED_BYTES guard and nextSafeId in bci.js
- Fix _activeJobId race: set after native acceptance
- Remove unimplemented bciConfig params from JS whitelist + index.d.ts
- Promote hardcoded kernel-trim threshold to named constant
- Pre-allocate dummyAudioPad_ as class member (avoid repeated allocs)
- Rename bci-addon.test.js → addon.test.js
- Replace t.skip() with proper assertions
- Fix day_idx handling in tests/examples (group by day, pass to config)
- Generate comprehensive NOTICE file
- Update vcpkg overlay to v1.8.4 description

Fix Linux C++ test linkage:
- Add vcpkg triplets (x64-linux, arm64-linux) with -stdlib=libc++
- Add linux-clang toolchain (clang-19)
- Set VCPKG_OVERLAY_TRIPLETS in CMakeLists.txt for Linux builds

Made-with: Cursor

* perf(bci): bump whisper-cpp overlay to include mask caching and per-layer flash attn

Update whisper-cpp overlay to 5645ad60 which includes:
- Cached window_mask recompute for exp_n_audio_ctx overrides
- Per-layer flash attention (upper encoder layers use FA even with BCI)
- std::abs instead of C abs in mask computation

Made-with: Cursor

* chore(bci): bump whisper-cpp overlay to include jpgaribotti review fixes

Update overlay to tetherto/qvac-ext-lib-whisper.cpp@3e91e3a4 which
addresses jpgaribotti's review on PR tetherto#10:

1. Extract compute_window_mask() helper to eliminate duplicated
   O(n_ctx^2) mask fill logic
2. Guard encode-time mask block with hparams.is_bci
3. Add is_bci to graph builder window_mask guard
4. Validate BCI hparams (conv1_kernel > 0, window_size >= 0)
5. Document n_mels > 256 threshold convention

Bump port-version to 3.

Made-with: Cursor

* fix(bci): add test fixture download to download-models.sh

Address Gustavo's review feedback: test fixtures (neural_sample_*.bin)
are gitignored but the PR had no way for developers to obtain them.

Rewrite download-models.sh to fetch both models and test fixtures from
the bci-test-assets-v0.1.0 GitHub release. Supports --models,
--fixtures, or both (default).

Made-with: Cursor

* fix(bci): address review findings — version mismatch, test indexing, cleanup

- Bump whisper-cpp override in vcpkg.json from 1.7.5.1 to 1.8.4 to
  match the overlay port version
- Move gtest to a vcpkg "tests" feature so it is only pulled when
  BUILD_TESTING=ON
- Fix PaddedFramesAreZero test: use mel-major indexing
  (data[bin * n_frames + frame]) matching the actual processToMel layout
- Remove four unused overlay patch files (0001–0004) now that
  portfile.cmake fetches from the tetherto fork with patches baked in
- Add TODO comment in download-models.sh noting the temporary personal
  fork for release assets
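
The mel-major layout mentioned in the PaddedFramesAreZero fix stores all frames of one mel bin contiguously, so element (bin, frame) lives at `bin * n_frames + frame`. The sketch below illustrates that indexing in TypeScript; the real test is C++, and `melIndex` and `paddedFramesAreZero` are hypothetical names.

```typescript
// Mel-major indexing as quoted in the commit message: data[bin * n_frames + frame].
function melIndex(bin: number, frame: number, nFrames: number): number {
  return bin * nFrames + frame;
}

// Check that every frame at or beyond validFrames is zero in every mel bin.
function paddedFramesAreZero(
  data: Float32Array,
  nMels: number,
  nFrames: number,
  validFrames: number,
): boolean {
  for (let bin = 0; bin < nMels; bin++) {
    for (let frame = validFrames; frame < nFrames; frame++) {
      if (data[melIndex(bin, frame, nFrames)] !== 0) return false;
    }
  }
  return true;
}
```

Reading the same buffer with frame-major indexing (`frame * n_mels + bin`) would inspect the wrong elements, which is the kind of mismatch the fix addresses.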

Made-with: Cursor

* fix(bci): address review findings — race guard, cross-platform path, docs accuracy

- Wrap transcribe() in exclusiveRunQueue to prevent race between
  inference and unload/destroy
- Use find_last_of("/\\") in loadEmbedderIfNeeded for Windows compat
- Add empty-buffer guard in bci.js append() before end-of-job
- Update download-models.sh to use tetherto/qvac release repo
- Add transformers to NOTICE and README model conversion prerequisites
- Fix README WER table to match actual live test results (6.0% avg)
- Fix BCI_V184_COMPAT.md stale test filename and overlay ref
- Remove unused bci_wer_vs_expected field from manifest.json
- Update whisper.cpp patches section to reflect fork-based overlay

Made-with: Cursor

* fix(bci): harden lifecycle, type safety, and C++ code quality

- Fix unload/destroy race: call destroyInstance() before _job.fail()
  so the native side stops before the JS job is failed, and remove
  redundant cancel() call (destroyInstance already cancels internally)
- Wrap BCIInterface construction in try/catch so a native init failure
  sets addon=null and throws a structured QvacErrorAddonBCI
- Change JSAdapter loadContextParams/loadMiscParams/loadBCIParams to
  return void (callers already mutate via reference, return was dead)
- Add dayIdx bounds-check warning in BCIModel::process when the value
  falls outside [0, numDays-1] before silent clamping
- Promote hardcoded gaussian smoothing params (std=2.0, kernel=100) to
  named constants K_SMOOTH_KERNEL_STD / K_SMOOTH_KERNEL_SIZE
- Add NeuralProcessor::getNumDays() accessor for the bounds check
- Remove [key: string]: unknown escape hatch from WhisperConfig in
  index.d.ts; enumerate all valid keys explicitly
- Fix test:cpp:run script to use direct path instead of cd && chain

Made-with: Cursor

* chore(bci): point whisper-cpp overlay to merged master (2b1e04f)

qvac-ext-lib-whisper.cpp PR tetherto#10 has been merged. Update the overlay
to reference the merge commit on master instead of the feature branch
commit, so the overlay remains valid if the branch is deleted.

Bump port-version to 4.

Made-with: Cursor

* fix(bci): serialize inference lifetime and export low-level subpaths

Address ogad-tether review feedback on PR tetherto#1583:

1. Inference queue: transcribe() now holds its slot until the response
   settles via _enqueueInference(), matching the pattern from
   TranscriptionWhispercpp._enqueueExclusiveRunResponse(). Previously
   the exclusiveRunQueue released the slot as soon as runJob() was
   accepted, allowing a second concurrent transcribe() to race in and
   either clobber the first response or get rejected by the native side.

2. Exports map: add ./bci, ./bci.js, and ./binding subpath exports so
   the low-level BCIInterface API documented in the README is accessible
   after publish. The exports map previously only exposed ./binding.js,
   blocking require('@qvac/bci-whispercpp/bci').
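
The queueing change in item 1 can be illustrated with a promise chain in which the next task starts only after the previous one has settled. This is a generic sketch, not the `exclusiveRunQueue` from @qvac/infer-base; `ExclusiveQueue` is a hypothetical class.

```typescript
// Hypothetical serial queue: each task holds its slot until its promise settles.
class ExclusiveQueue {
  private tail: Promise<void> = Promise.resolve();

  run<T>(task: () => Promise<T>): Promise<T> {
    const result = this.tail.then(task);
    // The next task waits for this one to finish, whether it succeeds or fails,
    // so a rejected task does not block the queue forever.
    this.tail = result.then(
      () => undefined,
      () => undefined,
    );
    return result;
  }
}
```

If the slot were released as soon as the task was accepted rather than when it settled, a second call could start while the first was still running, which is the race the commit describes.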

Made-with: Cursor

* refactor[bc](bci): rename to qvac-lib-infer-bci-whispercpp and address review

Align the BCI package with the inference-addon family conventions and resolve
the review findings that accumulated across PR tetherto#1583.

Breaking changes
- Package directory renamed from packages/bci-whispercpp to
  packages/qvac-lib-infer-bci-whispercpp (npm name @qvac/bci-whispercpp
  unchanged).
- Error codes moved from 7001-7013 (collided with @qvac/tts-onnx and the
  @qvac/transcription-parakeet fallback range) to the dedicated 26001-27000
  range. Also adds FAILED_TO_START_JOB, INVALID_CONFIG, and
  EMBEDDER_WEIGHTS_INVALID for cases that were previously swallowed.

Pattern / standard alignment with peer addons
- Add addonLogging.js + addonLogging.d.ts + ./addonLogging subpath export.
- Add CHANGELOG.md, PULL_REQUEST_TEMPLATE.md, tsconfig.dts.json.
- Pin qvac-lib-inference-addon-cpp vcpkg dep to 1.1.5#1 (port-version).
- vcpkg default-registry switched from git@github.com: to https://
  (fixes anonymous clones and CI runners without an SSH deploy key).
- Lint glob now covers lib/**/*.js.
- bare engine bumped from >=1.19.0 to >=1.24.0 to match llamacpp-llm/embed.
- VCPKG_OVERLAY_TRIPLETS set unconditionally and preserves external value.
- Remove test:unit script that pointed at a non-existent dir; add
  build:pack, lint-cpp, test:dts scripts matching peer conventions.
- package.json files array now includes README.md, CHANGELOG.md, and
  addonLogging artifacts; repository.directory + homepage point at the
  renamed path.

PR review fixes (Gustavo, ogad-tether, github-code-quality bot)
- day_idx default aligned: C++ runtime default is now 0 (matches the public
  JS/TS docs and NeuralProcessor header default).
- BCIInterface.runJob rewrap now uses FAILED_TO_START_JOB instead of the
  misleading FAILED_TO_APPEND; input is validated (Uint8Array, non-empty).
- day_idx: -1 passthrough mode is now explicitly documented in
  configChecker, README, and index.d.ts, and values < -1 are rejected at
  the JS boundary.
- JS _load no longer sets suppress_nst/temperature defaults that fought the
  BCI-tuned C++ defaults in toWhisperFullParams.
- Duplicate checkConfig call in BCIWhispercpp._load removed; validation
  now happens once inside the BCIInterface constructor.
- whisper_log_set guarded by std::once_flag so it does not clobber any log
  handler a coexisting whisper-based addon installed in the same process.
- Embedder weight loader now checks the stream state after every read and
  returns false on truncation instead of silently marking the weights as
  loaded and producing garbage at inference time.
- NeuralProcessor day projection is now memoized per day_idx; same-day
  batch inference no longer rebuilds the O(nf^2 * r) dense matrix.
- cancelRequested_.store(false) now runs before reset() in
  BCIModel::process(const std::any&) to avoid a window where a cancel() is
  dropped on the floor.
- _addonOutputCallback now unpacks transcript arrays so response.await()
  yields flat segments (matches TranscriptionWhispercpp).
- examples/transcribe-neural.js identical-branch ternary fixed.
- README broken whisper.cpp link fixed; docs/BCI_V184_COMPAT.md stale
  overlay commit ref updated.
- Integration test honours BCI_REQUIRE_MODEL=1 to turn missing-model into
  a loud failure for CI (default behaviour unchanged: local dev still
  skips).
- index.d.ts now imports QvacResponse from @qvac/infer-base/src/QvacResponse
  and LoggerInterface from @qvac/logging instead of hand-rolling them.
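
The per-day memoization mentioned in the review fixes above amounts to caching the expensive projection matrix keyed by day_idx. The sketch below shows the idea in TypeScript; the actual change is in the C++ NeuralProcessor, and `DayProjectionCache` and its `build` callback are hypothetical.

```typescript
// Hypothetical cache: build the projection for a given day once, then reuse it.
class DayProjectionCache {
  private readonly cache = new Map<number, Float32Array>();
  private readonly build: (dayIdx: number) => Float32Array;

  constructor(build: (dayIdx: number) => Float32Array) {
    this.build = build;
  }

  get(dayIdx: number): Float32Array {
    let matrix = this.cache.get(dayIdx);
    if (matrix === undefined) {
      matrix = this.build(dayIdx); // expensive step, done once per day
      this.cache.set(dayIdx, matrix);
    }
    return matrix;
  }
}
```

With this shape, a batch of samples from the same day triggers a single build, at the cost of keeping one matrix per distinct day in memory.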

Tests
- Clean rebuild from scratch (rm -rf build prebuilds && bare-make
  generate/build/install) succeeds.
- npm run lint: clean (now covers lib/**).
- npm run test:dts: clean.
- npm run test:integration: 3/3 pass, 10/10 asserts, 6.0% average WER
  (matches baseline).
- npm run test:cpp: 18/18 pass (was 7; +11 new tests covering unknown-key
  rejection, numeric double-to-int coercion, range validation,
  ContextGpuDevice bounds, passthrough mode, invalid embedder handling).
- bare examples/transcribe-neural.js --batch: 5/5 samples, 6.0% avg WER.
- bare examples/transcribe-neural.js test/fixtures/neural_sample_0.bin:
  output unchanged ("You can see the good at this point as well.").

Made-with: Cursor

* fix[api](bci): restore bci-whispercpp package path and harden runtime validation

Move the addon package back to packages/bci-whispercpp, remove unneeded overlay/docs files requested in review, and tighten JS/C++ lifecycle/config safety checks to prevent invalid-state and malformed-input issues.

Made-with: Cursor

* fix[api](bci): address code review findings across JS, C++, and build config

- Replace cd && chain in test:cpp:run with direct path (CLAUDE.md compliance)
- Route whisper_log_set through addon-cpp logger instead of silencing with
  once_flag, preventing inter-addon log handler clobber when BCI and
  transcription-whispercpp coexist in the same process
- Fix stats heuristic in bci.js _addonOutputCallback to match actual
  BCIModel::runtimeStats keys (tokensPerSecond/totalWallMs, not the
  audio-addon keys audioDurationMs/totalSamples)
- Drain _inferenceQueueWaiter in unload()/destroy() before calling
  destroyInstance(), closing the race where destroy could fire while
  process() is mid-execution on the native thread
- Remove auto-load in BCIModel::process — throw immediately if context
  is null instead of lazy-loading outside the controlled lifecycle
- Remove dead set_weights_for_file snake_case stub and unused <span>
- Add qvac-lint-cpp to vcpkg.json dependencies (matches all peer addons)
- Remove empty qvac-lint-cpp overlay directory (per Gustavo review)
- Remove stale bci_wer/bci_transcription from manifest.json
- Stop gitignoring package-lock.json (match monorepo convention)
- Move computeWER into BCIWhispercpp namespace in index.d.ts
- Downgrade @types/node to ^22.15.3, remove bare-fs from devDeps
- Fix PR template code blocks from typescript to javascript

Made-with: Cursor

* fix[api](bci): address review findings — standards alignment, structured errors, lifecycle safety

Align bci-whispercpp with monorepo conventions and fix code quality issues
found during thorough review of the POC implementation.

Build/config:
- .gitignore aligned with peer addons (package-lock.json, .npmrc, IDE files,
  vcpkg cache, generated test bundles)
- vcpkg.json: use "version" instead of deprecated "version-string"
- package.json: replace $(find) in lint-cpp with explicit file list, remove
  unused bare-stream/bare-tty deps, add bare-fs to production deps
- CHANGELOG.md: add date per Keep a Changelog format

JS fixes:
- Move fs.existsSync model check from constructor to _load(), matching
  TranscriptionWhispercpp lifecycle pattern
- Remove dead PAUSED/STOPPED state enum values from bci.js
- Add explicit event name matching alongside heuristic fallback in
  _addonOutputCallback (matches peer whisper.js pattern with BCI stat keys)
- Add miscConfig.caption_enabled boolean type validation in configChecker
- Extract duplicated flattenSegments into shared lib/util.js
- Fix index.d.ts import from fragile internal path to stable @qvac/infer-base

C++ fixes:
- Guard whisper_log_set with std::once_flag to prevent clobbering log handlers
  from coexisting whisper-based addons in the same process
- Replace std::runtime_error with structured StatusError/bci_error::makeStatus
  in BCIModel::load() and loadEmbedderIfNeeded() for proper JS error mapping
- Use std::move in process(const std::any&) to avoid copying multi-MB neural
  signal buffers on every inference call

Made-with: Cursor

* fix[api](bci): align with peer addon standards and remove unused code

- Add qvac-lint-cpp configure_file block to CMakeLists.txt (copies
  .clang-format, .clang-tidy, .valgrind.supp from vcpkg into build tree,
  matching qvac-lib-infer-whispercpp pattern)
- Extend lint-cpp script to cover all .hpp header files
- Match peer index.d.ts QvacResponse import path (deep import from
  @qvac/infer-base/src/QvacResponse)
- Replace brittle string-matching in _isConfigurationError with
  structured error detection (TypeError, ERR_ASSERTION code checks)
- Remove stale configChecker comments about unimplemented BCI params
  (smooth_kernel_std, smooth_kernel_size, sample_rate)
- Remove unused error codes: FAILED_TO_GET_STATUS, FAILED_TO_RESET,
  FAILED_TO_PAUSE and their addCodes registrations
- Remove unused K_SAMPLES_PER_SECOND constant from BCIModel.cpp
- Remove unused <span> include from AddonJs.hpp
- Add qvac-lib-inference-addon-cpp to NOTICE C++ dependencies
- Add cpp-test-results.xml to .gitignore

Made-with: Cursor

* chore(bci): remove whisper-cpp overlay, consume v1.8.4.2 from registry

The BCI patches (variable conv1 kernel, windowed attention) are now
merged into tetherto/qvac-ext-lib-whisper.cpp master and tagged as
v1.8.4.2. The local overlay that pinned a specific fork commit is no
longer needed.

- Delete vcpkg-overlays/whisper-cpp/ (portfile.cmake + vcpkg.json)
- Remove VCPKG_OVERLAY_PORTS from CMakeLists.txt
- Bump whisper-cpp override from 1.8.4 to 1.8.4.2
- Point vcpkg-configuration.json at personal fork registry
  (sharmaraju352/qvac-registry-vcpkg) temporarily until
  tetherto/qvac-registry-vcpkg#125 merges, then swap back
- Update README whisper.cpp patches section

Verified: clean build from scratch + 18/18 C++ tests + 3/3 integration
tests (10/10 asserts, 6.0% avg WER) + batch example all pass.

Made-with: Cursor

* chore(bci): point vcpkg registry back to tetherto upstream

Registry PR tetherto/qvac-registry-vcpkg#125 has been merged. Swap
vcpkg-configuration.json from the personal fork back to the upstream
tetherto/qvac-registry-vcpkg and update the baseline to the merge
commit.

Verified: clean build from scratch + all tests pass on both
bci-whispercpp (18/18 C++, 3/3 integration, 6.0% WER) and
transcription-whispercpp (106/106 C++, 28/28 unit, 10/10 integration,
all extended suites).

Made-with: Cursor

* fix(bci): address Gustavo review — error types, lifecycle, error codes

- Reset is_warmed_up_ in BCIModel::unload() so re-load triggers warmup
- Add FailedToLoadModel and EmbedderWeightsNotFound error codes to
  BCIErrors.hpp; use them instead of InvalidNeuralSignal for context
  init failure (BCIModel.cpp:116) and missing embedder (BCIModel.cpp:90)
- Wrap addon.activate() in try-catch in index.js _load(), throwing
  FAILED_TO_ACTIVATE with structured error on failure
- Make all JS error codes sequential (26001-26013, no gaps)

Made-with: Cursor

* Remove date from changelog

---------

Co-authored-by: Raju <raju.sharma>
Co-authored-by: Ishan Vohra <ishanvohra2@gmail.com>
Co-authored-by: GustavoA1604 <54457676+GustavoA1604@users.noreply.github.com>
…2232) (tetherto#1699)

Bumps @qvac/registry-client to ^0.4.1 and passes corestoreOpts: { wait: true }
when constructing QVACRegistryClient in the SDK's bare registry bootstrap.
This switches the underlying Corestore from tryLock to waitForLock semantics,
so concurrent SDK instances on the same machine no longer collide on
~/.qvac/registry-corestore/<key> with 'File descriptor could not be locked'.

Internal bootstrap tweak only — no public SDK surface changes.
…nmtcpp 2.0.1 (tetherto#1563)

feat: update SDK NMTCPP plugin to support @qvac/translation-nmtcpp@2.0.1, which moves away from base inference inheritance
…herto#1633)

* QVAC-17020: Integrate new cache api into SDK

* Bumping LLM add-on version in SDK to get new cache API

* Adding updated bun.lock

* fix: verify kv-cache file persisted before recording saved message count

* refactor: source CacheRunOptions from @qvac/llm-llamacpp RunOptions

* fix(llamacpp-completion): preserve explicit saveCacheToDisk: false in run options

* fix(sdk): persist KV cache during system prompt prime to keep init marker consistent with disk

* mod: dedupe model.run cast into a typed runModel helper in completion-stream

* fix(sdk/examples): log cleanup errors in llamacpp-cache instead of swallowing them

---------

Co-authored-by: Ridwan Taiwo <donriddo@gmail.com>
…rto#1662)

* feat[api]: add img2img support to SDK diffusion API

* fix(sdk): enforce img_cfg_scale default of -1 at the schema level

* test(sdk): split sdcpp diffusion dispatcher tests into focused cases and extract plugin mock into withMockDiffusionPlugin helper

* chore(test/diffusion): cleanup the comments

* test: add diffusion img2img test definition with init_image param

* fix(test/diffusion): split shared executor into separate ones for mobile and desktop due to file system logic diff

---------

Co-authored-by: Simon Iribarren <simon.ig13@gmail.com>
…therto#1674)

* QVAC-17236 [Chatterbox] Investigate possibilities of reducing RTF

* QVAC-17236 Rewrote the "Speech encoder output caching" bullet (and intro paragraph) to reflect that the encoder runs during load(), not on first synthesize()

* QVAC-17236 Bump @qvac/tts-onnx version "0.8.4" → "0.8.5"
Minor version bump for the new dynamic GGML backend loading feature
(tetherto#1617) which unblocks GPU-backed inference on Android via `backendsDir`
and adds `openclCacheDir` for faster OpenCL startup.
…to#1668)

* feat: add Parakeet performance benchmark workflows

Parameterize the RTF benchmark across models and devices, and add workflow/reporting support to collect consolidated performance findings across CI and manual backends.

Made-with: Cursor

* fix: export mobile benchmark entrypoints

Export the generated mobile integration functions and dedicated mobile benchmark entrypoints so code-quality checks do not flag the benchmark harness as unused.

Made-with: Cursor

* fix: package mobile benchmark runtime with custom tests

Keep the dedicated mobile benchmark suite self-contained so the Device Farm build can resolve its local runtime helper after extraction into the mobile test framework.

Made-with: Cursor

* fix: run benchmark matrix reliably on Windows

Use shell-backed npm spawning on Windows and surface process errors so the desktop benchmark workflow can produce the missing win32 DirectML artifacts.

Made-with: Cursor

* fix: resolve mobile benchmark extractor from addon checkout

Run the mobile benchmark log-extraction script from the addon checkout path so the Device Farm workflow can publish structured mobile RTF artifacts after the run completes.

Made-with: Cursor

* fix: align mobile benchmark dir with integration harness

Use an integration.auto.cjs entrypoint in the custom mobile benchmark directory so the mobile test framework bundles and resolves the benchmark module through the same integration loader path it expects for other mobile suites.

Made-with: Cursor

* fix: route mobile benchmark through copied integration modules

Align the custom mobile benchmark suite with the qvac-test-addon-mobile loader by pointing it at a copied integration wrapper module and resolving the extractor from the addon checkout root.

Made-with: Cursor

* fix: make mobile benchmark module self-contained

Use a mobile-specific benchmark integration module that imports through the copied mobile-aware helpers path instead of the desktop benchmark file's addon-relative imports.

Made-with: Cursor

* fix: make raw mobile log upload non-blocking

Do not fail the mobile benchmark job when the final Device Farm log artifact upload times out after the benchmark execution and extraction steps have already completed.

Made-with: Cursor

* fix: make mobile RTF extraction file-based and logcat-safe

Write mobile benchmark reports to stable device paths, emit chunked OCR-compatible console markers, and update the Device Farm testspec plus extractor so Android and iOS can surface structured RTF artifacts reliably.

Made-with: Cursor

* fix: persist mobile perf reports per benchmark

Made-with: Cursor

* fix: copy mobile perf helper from repo root

Made-with: Cursor

* fix: harden Parakeet report pipeline

Made-with: Cursor

* fix: retry mobile perf report pulls

Made-with: Cursor

* fix: scope Parakeet perf changes to desktop

Made-with: Cursor

---------

Co-authored-by: GustavoA1604 <54457676+GustavoA1604@users.noreply.github.com>
Made-with: Cursor

Co-authored-by: GustavoA1604 <54457676+GustavoA1604@users.noreply.github.com>
…to#1706)

Backmerges the v0.4.1 release metadata (package.json version +
CHANGELOG entry) from release-qvac-registry-client-0.4.1 into main,
per docs/gitflow.md step 4. Released and published via tetherto#1698.
…KEN (tetherto#1618)

* refactor: use npm trusted publishing and remove NPM_TOKEN

* fix: node version for the npm trusted publishing

* refactor: remove secrets-token since it will no longer be needed

* refactor: remove setting package scope using npm access public

* removed: deprecated steps

* refactor: removed unused inputs of the action

* refactor: removed unused inputs of the action

* refactor: removed unused inputs of the action

* refactor: removed unused inputs of the action

* feat: added id-token: write permission for the publishing

---------

Co-authored-by: Proletter <40578159+Proletter@users.noreply.github.com>
…e() api (tetherto#1691)

* feat: add public state() lifecycle api

* fix: route pre-handler errors on streaming and progress rpcs to stream channel

* feat: enforce lifecycle gate and tighten partial resume failure

* chore: demonstrate state() and blocked operations in suspend-resume example

* doc[api]: add tsdoc for public state() lifecycle api

* doc: expand suspend/resume tsdoc with in-flight behavior matrix
…ed vocab resolution (tetherto#1707)

* feat: extract BERGAMOT_MODEL_RE and BERGAMOT_CJK_LANG_PAIRS to shared schemas

* feat: add Bergamot NMT companion-set grouping in codegen

* feat: refactor NMT plugin to path-based vocab with colocated derivation fallback

* feat: add legacy flat-cache probe for non-ONNX companion sets

* test: add Bergamot companion detection unit tests
* feat(diffusion): add LoRA support via run config (JS → native → sd.cpp)

* test(diffusion): add real LoRA integration test

* chore(diffusion): bump version for LoRA support

* Validate diffusion LoRA paths

* Export generated mobile integration runners

---------

Co-authored-by: gianni-cor <gianfranco.cordella@tether.io>
…lows (tetherto#1714)

* test: point LLM and OCR mobile workflows at ios-cpp-log-capture branch

Throwaway commit to validate C++ log capture on Device Farm.
Do NOT merge -- delete branch after verification.

Made-with: Cursor

* test: add bare_console.log pullFile to iOS WDIO after hooks

Pull bare_console.log from the app's documents directory at the end
of each iOS test run and write it to $DEVICEFARM_LOG_DIR so it
appears in customer artifacts.

Throwaway -- delete branch after verification.

Made-with: Cursor

* fix: add 3s pause before pullFile to avoid log flush race condition

The Bare worklet flushes logs to disk on a 2s timer. Without a pause,
pullFile in the WDIO after hook could retrieve a stale file missing the
final log entries (inference completion, errors).

* fix: prevent download step hang from pullFile base64 log bloat

browser.pullFile returns base64 which WDIO debug-logs in full, bloating
the Appium output artifact. Replace with raw HTTP to Appium to bypass
WDIO command logging. Also add --max-time 300 to all curl commands in
the download steps as a safety net against any future hangs.

* feat: add iOS bare_console.log capture to all addon mobile workflows

Wire up the same bare-log pullFile (via raw HTTP to bypass WDIO debug
log bloat) and 3s flush pause in the iOS WDIO after hooks for whisper,
NMT, parakeet, onnx-tts, and decoder-audio workflows. Also add
--max-time 300 to all curl commands in their download steps.

* feat: split Device Farm artifacts into console logs and full logs

Add two-tier artifact downloads to all 9 mobile test workflows:
- "Console Logs" (small): bare_console.log (iOS C++ logs), test spec
  output, and logcat (Android) for quick debugging
- "Full Device Farm Logs" (big): all artifacts for deep investigation

Also adds bare_console.log capture to llamacpp-embed and diffusion
workflows, curl --max-time 300 safety net, and resets TEST_FRAMEWORK_REF
back to main now that qvac-test-addon-mobile#36 is merged.

Made-with: Cursor

* fix: extract bare_console.log from Customer_Artifacts.zip in console logs

Device Farm bundles $DEVICEFARM_LOG_DIR files into Customer_Artifacts.zip,
so bare_console.log was not appearing as a standalone file. The extract
step now unzips Customer_Artifacts.zip files and pulls bare_console.log
into the console-logs artifact with device-prefixed names.

Made-with: Cursor

* fix: simplify console logs to only bare_console, logcat, and appium logs

Console logs artifact now contains only the essentials:
- bare_console.log (iOS C++ logs, extracted from Customer_Artifacts.zip)
- Logcat (Android native logs)
- appium.log (Appium server logs, extracted from Customer_Artifacts.zip)

Removed test spec output from console logs -- those stay in full logs.

Made-with: Cursor

* fix: add .github/actions to sparse-checkout in OCR and ONNX on-pr workflows

The sanity-checks job uses sparse-checkout but only included the package
directory. Custom actions (yamlfmt, run-lint-and-unit-tests) were missing
from the checkout, causing "Can't find action.yml" errors.

Made-with: Cursor

* Revert "fix: add .github/actions to sparse-checkout in OCR and ONNX on-pr workflows"

This reverts commit 0428671.

* fix: align TTS console-logs extract path with download path (include variant suffix)

Made-with: Cursor

* fix: use find for nested zip extraction in OCR console-logs

Device Farm zips nest files under Host_Machine_Files/$DEVICEFARM_LOG_DIR/
so flat path checks never found bare_console.log or appium.log.

Made-with: Cursor

* fix: use find for nested zip extraction in all 8 remaining workflows

Device Farm Customer_Artifacts.zip nests files under
Host_Machine_Files/$DEVICEFARM_LOG_DIR/ - use find to locate
bare_console.log and appium.log at any depth inside the zip.

Made-with: Cursor
…tadata (tetherto#1700)

* feat[mod]: regenerate model registry with companion-set metadata

* chore: regenerate model registry with companion-set metadata

* feat[QVAC-17474]: port OCR mobile perf-report pipeline to NMT (Phase 2)

Closes the "Mobile Phase 2" follow-up from PR tetherto#1684: surface the
chrF++ + perf numbers from NMT mobile integration tests in a
dedicated artifact + GitHub Step Summary, matching the OCR pattern
Tobi set up in tetherto#1625.

Changes:

1. .github/workflows/integration-mobile-test-qvac-lib-infer-nmtcpp.yml
   Added three steps between the existing "Download Device Farm
   Logs" and "Upload Device Farm Logs":
   a) "Extract performance report from Device Farm logs" — runs
      scripts/perf-report/extract-from-log.js against the downloaded
      Device Farm logs, scanning for [PERF_REPORT_START]...[PERF_
      REPORT_END] markers (and PERF_CHUNK fallback for large
      payloads), then runs scripts/perf-report/aggregate.js to
      generate the HTML / MD / summary-json report. Extraction is
      best-effort: if no markers are found (e.g. Device Farm logs
      were empty or tests crashed early), the step logs a warning
      and continues so the raw logs still get uploaded by the
      following step.
   b) "Write mobile perf report to GitHub Step Summary" — appends
      the generated MD report (performance + quality sections) to
      $GITHUB_STEP_SUMMARY so the iOS/Android job pages render
      chrF++ + perf tables inline, same way the desktop job does.
   c) "Upload mobile performance report" — dedicated artifact
      perf-report-nmtcpp-mobile-<platform>-<run>, 90-day retention,
      containing performance-report.json/.html/.md and
      performance-summary.json. Mirrors the perf-report-nmtcpp-*
      artifact that already exists for desktop.

2. packages/qvac-lib-infer-nmtcpp/test/integration/utils.js
   Dual-store chrfpp in both `metrics` and `quality` fields on each
   reporter entry. Needed because:
   - writeStepSummary reads `metrics.chrfpp` via METRIC_COLUMNS.
     translation (that's what Olya's single table uses) → unchanged.
   - aggregate.js Quality Summary section reads `result.quality.*`
     via qKeys — without `quality.chrfpp`, the mobile (and desktop)
     HTML / MD reports were showing only a "Mean Total Time" column
     with no chrF++ anywhere. Adding `quality: {chrfpp, reference}`
     in the extra lets aggregate.js render the chrF++ column as
     a percentage in the Quality Summary.
   - Mobile inline reporter now also threads extra.quality into
     entry.quality, mirroring the desktop reporter in
     scripts/test-utils/performance-reporter.js, so the on-device
     [PERF_REPORT_START] JSON carries the quality field.
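
The marker scan in step 1a can be sketched roughly as follows. This is an illustration only, not the contents of extract-from-log.js: the marker strings come from the message above, while the function name `extractPerfReports` and the policy of silently skipping invalid payloads are assumptions. The PERF_CHUNK fallback is not modelled here.

```javascript
// Marker strings as named in the commit message.
const START = "[PERF_REPORT_START]";
const END = "[PERF_REPORT_END]";

// Hypothetical helper: returns every JSON payload found between markers.
function extractPerfReports(logText) {
  const reports = [];
  let from = 0;
  while (true) {
    const start = logText.indexOf(START, from);
    if (start === -1) break;
    const end = logText.indexOf(END, start + START.length);
    if (end === -1) break; // unterminated block: stop scanning
    const payload = logText.slice(start + START.length, end).trim();
    try {
      reports.push(JSON.parse(payload));
    } catch {
      // Best-effort: skip payloads that are not valid JSON.
    }
    from = end + END.length;
  }
  return reports;
}
```

An empty result corresponds to the "no markers found" case, where the workflow step would log a warning and continue.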

Out of scope (deliberately, can be follow-up if useful):
- "Combined Performance Report" job that aggregates across multiple
  devices (OCR's pattern). NMT mobile currently runs a single
  device per platform so a per-platform report is the natural unit.
- Splitting the Device Farm run into perf vs regular subsets (OCR
  does this to bound perf-test wall time). NMT mobile has a small
  test count already; no need to split.
- Device-name display in the extracted report. NMT's existing
  Device Farm log download flattens files to <device>_<suite>_
  <artifact>.<ext> instead of OCR's <device>/<artifact> layout, so
  extract-from-log.js's device-from-path inference returns
  "unknown". Reports still render correctly, just with a generic
  column header. Fix is a small tweak to either the download layout
  or extract-from-log.js; not blocking this work.

Verified locally:
- YAML validates
- Synthetic Device Farm log containing a real-shape
  [PERF_REPORT_START]...[PERF_REPORT_END] marker →
  extract-from-log.js produces 3-result performance-report.json →
  aggregate.js produces .md / .html / summary.json with both
  Performance Summary (total time) and Quality Summary (chrF++
  column, percentages) sections populated:

    | Test                   | EP  | chrF++ |
    | [Bergamot] [CPU]       | CPU | 97.0%  |
    | [IndicTrans] [CPU]     | CPU | 63.0%  |
    | [Pivot es→en→it] [CPU] | CPU | 71.0%  |

Made-with: Cursor

* feat[QVAC-17474]: extend WDIO after-hook to extract perf report from device (Mobile Phase 2)

Follow-up on tetherto#1697: the workflow-side extract step never had input
because Bare-runtime's console.log on iOS doesn't reach iOS Syslog
captured by Device Farm. Port the OCR tetherto#1625 approach end-to-end so
the perf report actually leaves the device.

Changes to .github/workflows/integration-mobile-test-qvac-lib-infer-nmtcpp.yml:

1. Android WDIO `after:` hook — after stopping the health monitor,
   executes the full OCR tetherto#1625 extraction routine:
   - Poll 6 on-device paths via `mobile: shell cat` for stability
     (48 retries x 5s, waits until result-count is stable for 6
     consecutive reads) — bounds Device Farm flakiness.
   - Try `browser.pullFile` across 9 paths including the app
     sandbox (`@<bundleId>/files/...`), `/sdcard/Android/data/
     <bundleId>/files/...`, `/data/data/<bundleId>/...`, etc.
   - Fall back to `browser.getLogs("logcat")` — parses
     [PERF_REPORT_START]...[END] markers AND reassembles
     PERF_CHUNK: chunks when logcat per-entry truncation splits
     the JSON across lines.
   - Fall back to `mobile: shell cat` on the same path set.
   - Fall back to `mobile: shell run-as <bundleId> cat` for files
     inside the app's private sandbox.
   - On success: write extracted JSON to
     $DEVICEFARM_LOG_DIR/perf-report-extract.json AND echo it
     wrapped in [PERF_REPORT_START]...[END] to the testspec output
     stream.

2. iOS WDIO `after:` hook — after stopping the health monitor,
   calls `browser.pullFile("@<bundleId>:documents/perf-report.json")`
   (with `:library/` fallback), writes the result to $DEVICEFARM_
   LOG_DIR/perf-report-extract.json and echoes markers.

3. Testspec `post_test:` phase — added a fallback that reads
   perf-report-extract.json from $DEVICEFARM_LOG_DIR or $DEVICEFARM_
   TEST_PACKAGE_PATH and re-emits it wrapped in markers to stdout,
   so the downstream extract-from-log.js picks it up from the
   TESTSPEC_OUTPUT.txt artifact even if the WDIO-emitted console
   markers were lost in the iOS Syslog stream.
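
The stability poll from change 1 (48 retries x 5s, stable for 6 consecutive reads) might look like the sketch below. The function and option names are hypothetical, and `readCount` stands in for whatever reads the result count from the device; the real hook issues `mobile: shell cat` calls instead.

```javascript
// Resolves once readCount() returns the same non-null value `stableReads`
// times in a row, or returns null after `maxRetries` reads.
async function waitForStableCount(readCount, opts = {}) {
  const { maxRetries = 48, stableReads = 6, delayMs = 5000 } = opts;
  // `sleep` is injectable so callers (and tests) can skip real waiting.
  const sleep = opts.sleep || ((ms) => new Promise((r) => setTimeout(r, ms)));
  let last = null;
  let streak = 0;
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    const count = await readCount();
    streak = count !== null && count === last ? streak + 1 : 1;
    last = count;
    if (count !== null && streak >= stableReads) return count;
    await sleep(delayMs);
  }
  return null; // never stabilised within the retry budget
}
```

Requiring several identical reads, rather than a single successful one, guards against pulling the report while tests are still appending results.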

Note on escaping: the WDIO_CONFIG remains in NMT's existing
bash-single-quoted format (rather than switching to OCR's heredoc
pattern). Literal `'` in the JS regex (logcat chunk prefix matcher)
uses the Unicode escape `\u0027` to avoid breaking the outer bash
single-quote. Final JS regex on-device reads
`/^\u0027\[Bare\]\u0027,\s*\u0027/` which matches `'[Bare]',<ws>'`
— same semantics as OCR's original.
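
As a small check of the escaping note, the pattern can be exercised on its own. The regex is the one quoted above; the helper name `stripBarePrefix` is made up for illustration.

```javascript
// \u0027 is the Unicode escape for an apostrophe, so the pattern source
// contains no literal single quote and survives bash single-quoting.
const BARE_PREFIX = /^\u0027\[Bare\]\u0027,\s*\u0027/;

// Hypothetical helper: removes the leading logcat wrapper if present.
function stripBarePrefix(msg) {
  return msg.replace(BARE_PREFIX, "");
}
```

Messages that do not start with the wrapper are returned unchanged.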

Verified locally:
- YAML parses cleanly.
- Simulated bash eval of WDIO_CONFIG with APP_BUNDLE_ID stub
  produces valid JS: perf-extract block ends with `ALL methods
  failed");}},afterTest:...` (correct brace balance); regex reads
  `msg.replace(/^\u0027\[Bare\]\u0027,\s*\u0027/,"")` (proper
  single-backslash escapes throughout).

Not done in this commit (already in place from tetherto#1697):
- extract-from-log.js workflow step, aggregate.js step, Step
  Summary write, and dedicated artifact upload. Those were already
  added; this commit completes the input side of that pipeline.

Made-with: Cursor

* fix[QVAC-17474]: externalize WDIO config to template files (fix max-expression-length 21000)

GitHub Actions reports `Exceeded max expression length 21000` when
queuing `On PR Trigger (NMTCPP)` on this branch. Root cause: the
`Create and Upload Test Spec` step's `run:` block had grown to 24,350
bytes after adding the Android perf-extract logic inline. GitHub
Actions treats a `run:` script that contains `${{ }}` substitutions
as a single template expression, and the step was over the 21,000-
char limit.

Fix: move the two WDIO configs out of the workflow YAML into
template files checked into the repo, loaded at step-time with sed:

- packages/qvac-lib-infer-nmtcpp/test/mobile/wdio-config-android.js.template
- packages/qvac-lib-infer-nmtcpp/test/mobile/wdio-config-ios.js.template

Workflow step now does:

    WDIO_CONFIG=$(sed "s#__BUNDLE_ID__#${{ env.APP_BUNDLE_ID }}#g" \
      packages/qvac-lib-infer-nmtcpp/test/mobile/wdio-config-<platform>.js.template)

Step size: 24,350 → 12,423 bytes. No change to the final WDIO config
that lands on-device — the template is the exact same content that
was being constructed inline. Bundle-ID substitution is now done
via sed on a placeholder `__BUNDLE_ID__` (OCR's tetherto#1625 pattern)
instead of bash-single-quote gymnastics.

Both templates contain the full WDIO config (`before:` hook with
crash detection + health-monitor, `afterTest:`, `after:` with perf
extraction — Android: poll + pullFile + logcat + chunks + run-as;
iOS: pullFile from app sandbox).

Made-with: Cursor

* fix[QVAC-17474]: split Create and Upload Test Spec step to fix max-expression-length limit (no new files, no hardcoding)

Previous attempt introduced two .template files to work around GitHub
Actions' 21,000-char expression-length limit for the `Create and
Upload Test Spec` step. Tobi's OCR tetherto#1625 does NOT use template files —
he keeps everything inline via the heredoc pattern (`<< 'WDIO_EOF'`).
His step naturally fits under 21,000 because OCR's WDIO config is
smaller than NMT's (NMT has an additional health-monitor setInterval
in the before: hook + multi-level crash detection).

Cleaner solution: split the single large step into three smaller ones,
mirroring OCR's heredoc approach inline:

1. New step "Build WDIO config for Android" (8.7 KB, matrix-gated
   `if: matrix.platform == 'Android'`) — builds and base64-encodes the
   Android WDIO JS, exports WDIO_CONFIG_B64 via $GITHUB_ENV.
2. New step "Build WDIO config for iOS" (4.3 KB, matrix-gated) — same
   for iOS.
3. Existing "Create and Upload Test Spec" step (11 KB) — no longer
   builds the WDIO config, just consumes $WDIO_CONFIG_B64 from env
   alongside the existing per-platform testspec-metadata branching
   (PLATFORM, AUTOMATION, HOST_LINE).

All three steps are well under 21,000 bytes. No new checked-in files.
All paths use the workflow's standard env vars (${{ env.APP_BUNDLE_ID }}),
no hardcoded package paths.

Deleted the previous commit's template files since they are no longer
needed:
- packages/qvac-lib-infer-nmtcpp/test/mobile/wdio-config-android.js.template
- packages/qvac-lib-infer-nmtcpp/test/mobile/wdio-config-ios.js.template

Made-with: Cursor

* fix[QVAC-17474]: cross-platform sed (pipe form) in Build WDIO config steps

iOS job failed with 'sed: -I or -i may not be used with stdin' because
the iOS runner is macOS and BSD sed does not accept GNU-style -i
without a backup-extension argument. Rewrote the substitution as a
single pipe that works on both GNU sed (Linux, Android runner) and
BSD sed (macOS, iOS runner):

Before:
  sed -i "s#__BUNDLE_ID__#...#g" /tmp/wdio-config.js
  WDIO_CONFIG_B64=$(base64 < /tmp/wdio-config.js | tr -d '\n')

After:
  WDIO_CONFIG_B64=$(sed "s#__BUNDLE_ID__#...#g" /tmp/wdio-config.js | base64 | tr -d '\n')

Applied to both Android and iOS Build WDIO config steps.

Made-with: Cursor

* fix(mobile): flush perf-report.json after every record() on mobile

The Bare process is hosted inside the native test-addon-mobile app and
does not exit between WDIO specs, so `process.on('exit')` never fires —
meaning `writeReport()` was never called on iOS/Android and the
`perf-report.json` did not exist when WDIO's `after:` hook called
`pullFile`, which returned OBJECT_NOT_FOUND.

Mirror the OCR pattern (packages/ocr-onnx/test/integration/utils.js):
- On mobile, call `writeReport()` + `writeToConsole()` after every
  `_perfReporter.record()` in `formatPerformanceMetrics` so the file
  is present on disk ahead of the `after:` hook.
- Extend the mobile inline `writeReport()` dirs list with `os.tmpdir()`
  for iOS (maps to the app's tmp container, reachable as
  `@<bundle>:tmp/perf-report.json`).
- Extend the iOS WDIO `after:` hook to additionally try
  `@<bundle>:tmp/perf-report.json` and to wait 3s before pulling so
  async flushes have time to hit the filesystem.
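
A minimal sketch of the flush-on-record idea, assuming a reporter whose output sink is injected. The names `createPerfReporter` and `write` are hypothetical; in the real harness the sink would write perf-report.json to each candidate directory, including `os.tmpdir()`.

```javascript
// Hypothetical reporter: `write` receives the full serialized report.
function createPerfReporter(write) {
  const results = [];
  return {
    record(entry) {
      results.push(entry);
      // Flush immediately: the host process may never emit an exit event,
      // so waiting for process exit would leave nothing on disk.
      write(JSON.stringify({ results }, null, 2));
    },
    count() {
      return results.length;
    },
  };
}
```

Each flush rewrites the whole report, so the file on disk is always a complete, parseable snapshot of the results recorded so far.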

Made-with: Cursor

* fix(mobile-ci): resolve perf-report scripts under monorepo/ checkout

The mobile workflow checks the monorepo out at `./monorepo`, so
`scripts/perf-report/extract-from-log.js` is not at the workspace root.
The extract step was running from the default cwd and failing with
`Cannot find module '/Users/runner/work/qvac/qvac/scripts/perf-report/
extract-from-log.js'`, which caused `extracted=false` and skipped the
Step Summary write + produced an empty perf-report-mobile artifact.

Reference the scripts via `monorepo/scripts/perf-report/...` and guard
with a presence check so the step surfaces a clear warning instead of a
generic module-not-found if the checkout ever changes.

Made-with: Cursor

* feat(mobile): unify mobile Step Summary with desktop integration format

Previously the mobile Step Summary used `aggregate.js`'s multi-device
comparison layout (one column per device) and split perf + quality into
two separate tables — the quality table also carried OCR-only columns
(CER/WER/KW/KV) with '-' placeholders that made it hard to read.

The desktop integration Step Summary renders a single compact table per
run using `performance-reporter.js::writeStepSummary()`:

  | Test | EP | Total Time (ms) | Decode (ms) | Tokens | TPS | chrF++ |

This change makes the mobile Step Summary use that exact same layout:

1. New `scripts/perf-report/render-step-summary.js` reads a single-device
   perf-report.json and emits the desktop-style single-table markdown,
   reusing METRIC_COLUMNS / QUALITY_COLUMNS from performance-reporter.js
   so both surfaces stay in lockstep. It suppresses the quality section
   when all quality keys are already covered by metric columns (so NMT
   no longer gets the empty CER/WER/KW/KV columns).

2. `fix(extract): derive device name from Device Farm flat filename layout`.
   Device Farm artifacts come in as `<logDir>/<Device>_Tests_Suite_*.txt`
   rather than `<logDir>/<Device>/*`, so `deriveDeviceName` was returning
   null and the device column rendered as "unknown". Add a fallback that
   parses the filename prefix (stops before `Tests_Suite | Setup_Suite |
   Teardown_Suite | job` phase separator), yielding e.g. "Apple iPhone
   16 Pro".

3. Update the mobile workflow's "Write mobile perf report to GitHub Step
   Summary" step to call the new renderer against `performance-report.json`
   instead of catting the aggregated `performance-report.md`.
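
The filename fallback from item 2 could be approximated like this. It is a sketch, not the actual `deriveDeviceName` code: the function name is hypothetical, and it assumes the input is a bare filename with underscore-separated words, as in the layout described above.

```javascript
// Phase separators named in the commit message.
const PHASE_SEPARATOR = /_(?:Tests_Suite|Setup_Suite|Teardown_Suite|job)(?:_|\.|$)/;

// Hypothetical helper: returns the device name, or null if no phase
// separator is found in the filename.
function deviceNameFromFilename(filename) {
  const match = PHASE_SEPARATOR.exec(filename);
  if (!match || match.index === 0) return null;
  return filename.slice(0, match.index).replace(/_/g, " ");
}
```

Returning null for unrecognised names lets the caller keep its existing directory-based inference as the first choice and fall through to "unknown" only when both fail.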

Made-with: Cursor

* fix(mobile): prevent SIGABRT from unhandled model-download rejection

Root-cause of the Samsung Galaxy S25 Ultra failure in CI run 1212:

  10:08:51.434  # IndicTrans backend [CPU]             ← next test starts
  10:08:51.434  Downloading: https://.../qvac_mod...   ← re-downloads 200MB
  10:08:51.580  E bare: Uncaught (in promise) FetchError: NETWORK_ERROR
                  [cause]: HTTPError: CONNECTION_LOST: Socket hung up
  10:08:51.582  F libc: SIGABRT in libbare-kit.so::js_callback_s::on_call

The IndicTrans [GPU] variant already downloaded the 200MB model, then the
[CPU] variant unnecessarily re-downloaded it. Samsung's Device Farm lane
hit a transient socket drop on the second download; bare-fetch emitted
an unhandled promise rejection; Bare's default handler called abort().
The stack tip was BareKit dispatching the rejection to JS, which is why
the backtrace misleadingly looked like a BareKit-internal crash.

Google Pixel 9a in the same matrix ran to completion on the same commit
because its Device Farm lane didn't drop the second download.

Three fixes, all in the test harness for this package:

1. `ensureIndicTransModel()` — cache the 200MB model on mobile's writable
   root (`global.testDir`). Skip redownload when the existing file is
   within the expected size range. Eliminates wasted bandwidth and the
   second-download failure window.

2. `downloadFile()` — retry transient network errors up to 3 times with
   exponential backoff (500ms / 1s / 2s). HTTP status errors still fail
   fast since they are deterministic.

3. `bergamot.test.js` + `indictrans.test.js` — mirror pivot-bergamot's
   defensive `Bare.on('unhandledRejection', ...)` handler so a future
   uncaught rejection logs loudly instead of calling abort(). Keeps the
   perf-report pipeline able to record whatever data was captured up to
   that point rather than losing the whole run.

Fixes (2) and (3) are defense-in-depth; fix (1) is the specific remedy for
the observed Samsung crash.
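
The retry policy in fix 2 can be sketched as below. This is an assumption-laden illustration rather than the real `downloadFile()`: `withRetry`, `backoffDelays` and the `httpStatus` property used to mark deterministic errors are all hypothetical names.

```javascript
// Delays before each retry: 500ms, 1s, 2s for the defaults.
function backoffDelays(retries = 3, baseMs = 500) {
  return Array.from({ length: retries }, (_, i) => baseMs * 2 ** i);
}

// Hypothetical convention: errors carrying `httpStatus` are deterministic
// and must fail fast; everything else is treated as transient.
function isTransient(err) {
  return err.httpStatus === undefined;
}

async function withRetry(task, opts = {}) {
  const delays = backoffDelays(opts.retries, opts.baseMs);
  const sleep = opts.sleep || ((ms) => new Promise((r) => setTimeout(r, ms)));
  for (let attempt = 0; ; attempt++) {
    try {
      return await task();
    } catch (err) {
      // Rethrow when the error is deterministic or retries are exhausted.
      if (!isTransient(err) || attempt >= delays.length) throw err;
      await sleep(delays[attempt]);
    }
  }
}
```

Because the final error is rethrown to the caller, it is still rejected in a place where it can be awaited and handled, instead of surfacing as an unhandled rejection.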

Made-with: Cursor

* fix(lint): declare Bare as a global in bergamot/indictrans tests

sanity-checks (standard@17) flagged 'Bare' is not defined (no-undef) in
the new unhandledRejection handlers added in 53f137f. Mirrors
pivot-bergamot.test.js which uses '/* global Bare */'.

Made-with: Cursor

* fix(mobile): cache Bergamot model files + dedupe Firefox records

Follow-up to the Samsung Galaxy S25 Ultra timeout in CI run 24796639547.
The SIGABRT in the previous run was fixed by 0b2094f / afdb3b1, but
`runPivotBergamot` still timed out at 20 minutes because the Bergamot
model fetcher was re-downloading the same files many times:

1. Within a single `downloadBergamotFromFirefox` invocation, Firefox's
   translations-models records collection exposes multiple variants
   (production + dev/beta) that share the same `filename`, so the loop
   was downloading and overwriting e.g. `lex.50.50.enit.s2t.bin` 2–3
   times per call — once with the 3.9MB production variant, once with
   the 4.3MB dev variant.
2. Across pivot sub-tests (GPU variant, CPU variant, stats-no-hang,
   batch — ×2 language pairs) the test re-invokes the fetcher with the
   same destDir, and `ensureModelPair` was calling the raw
   `downloadBergamotFromFirefox` instead of the cached
   `ensureBergamotModelFiles` wrapper, so each sub-test re-downloaded
   the full pair (~70MB × 8 sub-test invocations on Samsung's slower
   Device Farm lane ≫ 20-min per-test timeout).

Fixes:

* `bergamot-model-fetcher.js`
  - Skip per-file download when destPath already exists with
    non-trivial size (≥1KB, guards against zero-byte stubs from failed
    earlier runs).
  - Dedupe by filename within one `downloadBergamotFromFirefox` call so
    the dev/beta variant doesn't overwrite the production one.
  - Log "(cached)" when a skip happens so CI logs show what saved time.
* `pivot-bergamot.test.js::ensureModelPair`
  - Call `ensureBergamotModelFiles` (which does `hasBergamotModelFiles`
    destDir check) instead of `downloadBergamotFromFirefox` directly,
    so repeat sub-tests skip the Firefox records endpoint entirely.
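
The two fetcher guards can be illustrated with a short sketch. The helper names are hypothetical, the record shape (`filename`) follows the description above, and keeping the first record per filename assumes the production variant is listed before the dev/beta one.

```javascript
const MIN_CACHED_BYTES = 1024; // "non-trivial size" threshold (>= 1KB)

// Keep only the first record per filename.
function dedupeByFilename(records) {
  const seen = new Set();
  return records.filter((rec) => {
    if (seen.has(rec.filename)) return false;
    seen.add(rec.filename);
    return true;
  });
}

// `sizeOf(path)` is an injected lookup returning bytes, or null if missing.
function isCached(destPath, sizeOf) {
  const size = sizeOf(destPath);
  return size !== null && size >= MIN_CACHED_BYTES;
}
```

Injecting `sizeOf` keeps the sketch independent of any particular filesystem API; on-device it would wrap a stat call.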

Expected effect: Samsung `runPivotBergamot` completes in <5 min instead
of timing out at 20 min; Google Pixel / iPhone finish faster too.

Made-with: Cursor

* fix(mobile-ci): full 8-row perf table on Android Step Summary

Two bugs surfaced on run 24820356343, both on Android only:

1. `extract-from-log.js` settled on a 6-row chunked report instead of
   the final 8-row one. Root cause: the pivot fr→en→es test uses the
   input "Bonjour, comment allez-vous aujourd'hui?", and the ReactNativeJS
   bridge wraps logcat output in a JS single-quoted string literal,
   escaping the apostrophe as `\'`. That's a valid JS escape but NOT a
   valid JSON escape, so `JSON.parse` on the reassembled 8-row chunk set
   bailed with `Bad escaped character in JSON`. The extractor silently
   fell back to the earlier 6-row chunk set that didn't contain
   fr→en→es yet.

   Fix: after stripping the `'[Bare]', '…'` wrapper in
   `cleanJsonFromLogcat`, unescape `\'` → `'`. Verified locally against
   the Samsung + Pixel logcats — both now reassemble all 8 results.

2. The "Write mobile perf report to GitHub Step Summary" step only
   looked for `<OUTPUT_DIR>/performance-report.json` at the root, but
   `extract-from-log.js` writes per-device subdirs when ≥2 devices are
   present (Android matrix: Pixel + Samsung). iOS (single device) wrote
   to the root path and worked; Android fell into the `::warning::`
   branch and skipped the Step Summary entirely.

   Fix: when the root file is missing, walk `<OUTPUT_DIR>/*/` and render
   one table per device, using the directory name (underscores →
   spaces) as the Device suffix in the heading.
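
The escape problem from bug 1 is easy to reproduce in isolation. The sketch below uses a hypothetical helper name and is not the actual `cleanJsonFromLogcat` code; it also ignores the corner case of an escaped backslash directly followed by an apostrophe.

```javascript
// Hypothetical helper: mirrors the described fix.
function parseLogcatJson(raw) {
  // \' is legal in a JS string literal but not in JSON, so drop the
  // backslash before handing the text to JSON.parse.
  const cleaned = raw.replace(/\\'/g, "'");
  return JSON.parse(cleaned);
}
```

Calling `JSON.parse` directly on a payload that contains `\'` throws a SyntaxError, which is the failure that made the extractor fall back to the earlier chunk set.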

Made-with: Cursor

* chore(quality): mark translation reference fixtures as validated (A.5)

Drop the "placeholder baseline — verify with native speaker" note on
the four reference translations used by the integration-test chrF++
scoring and replace with "validated 2026-04-23" plus the register
actually used (informal / formal).

Closes A.5 of QVAC-17474. A.6 (N>1 sentences per test case) and
A.7 (chrF++ quality gate) are explicitly out of scope for this ticket
— chrF++ stays observational-only with a single reference per pair.

Fixtures updated:
- bergamot.quality.json          en  → it    (informal)
- indictrans.quality.json        en  → hi    (formal, आप)
- pivot-bergamot.quality.json#1  es  → it    (informal)
- pivot-bergamot.quality.json#2  fr  → es    (formal, usted)

The inline mobile copy in test/integration/utils.js is updated
byte-for-byte in sync (mobile fallback has no fs access to the
on-disk JSONs at runtime under bare-pack).

Made-with: Cursor

---------

Co-authored-by: Alok-Ranjan23 <Alok-Ranjan23@users.noreply.github.com>
Co-authored-by: olyasir <sirkinolya@gmail.com>
* feat: anchored tools placement for multi-round tool chains

Replace tools-at-end placement with anchored placement: tools are
positioned after the last user message and stay in the KV cache
across chain rounds instead of being removed and re-added each round.

Changes:
- Template: anchor tools after last user message (two-pass Jinja2)
- PostInfer: keep tools when output contains <tool_call>, remove
  only when chain completes (no tool call in output)
- Boundary tracking: recordToolBoundary sets anchor once, preserves
  across chain rounds
- Streaming: capture output when toolsAtEnd is active for tool call
  detection
- Stats: forward nPastBeforeTools, firstMsgTokens, toolsTrimmed
- Generation prompt: treat role "tool" same as "user" for
  add_generation_prompt (fixes empty response on tool chain
  continuation)

* fix: prevent output duplication in streaming mode with toolsAtEnd

Use captured output only for internal tool call detection, don't set
it as the return value when streaming. Prevents the JobRunner from
queuing the full text again after it was already streamed token by
token, which caused the SDK to see every tool call twice.

* fix: avoid unnecessary string copy for non-tool completions

Move captured output construction inside the toolsAtEnd guard so
non-tool completions pay zero string overhead. Only the oss.str()
call and tool_call detection happen when dynamic tools are active.

* fix: context sliding with tools_at_end corrupts tool boundary tracking

When context sliding occurs with tools_at_end enabled, the
nPastBeforeTools boundary was not adjusted after token discard.
This left stale tool tokens in the KV cache, causing incorrect
trim after generation.

Changes:
- Limit discard to conversation-only region (never eat tool tokens)
- Adjust nPastBeforeTools after sliding by the discard delta
- Reset DynamicToolsState in fallback discard path
- Applied to both TextLlmContext and MtmdLlmContext
- Add regression test for sliding during generation with large tools

* refactor: extract sliding helpers into DynamicToolsState, harden edge cases

- Extract clampDiscard() and adjustAfterSlide() into DynamicToolsState
  to eliminate 4x duplicated clamping/adjustment blocks
- Remove redundant std::max(safeLimit, 0) — guard already ensures > 0
- Add discard == 0 early return in applyContextDiscard to skip no-op
  KV cache operations
- Guard fallback reset() with toolsAtEnd() check for consistency
- Add comment explaining eval vs generation fallback asymmetry
- Use n_predict=-2 (fill context) in test to guarantee sliding

* test: update sliding test for anchored tools behavior

With anchored tools, postInfer keeps tools in cache when the model
produces <tool_call> in output. Update the sliding regression test
to check toolsTrimmed stat instead of assuming tools are always
removed after generation.

* test: two-phase sliding test verifies adjustAfterSlide

Replace single-phase sliding test with two-phase comparison:
  Phase 1 (baseline): large context, n_predict=0 → no sliding.
    Records nPastBeforeTools as the original anchor.
  Phase 2 (sliding): small context, n_predict=-2 → sliding fires.
    After trim, nPastBeforeTools must be less than baseline.

Without adjustAfterSlide: both phases have equal nPastBeforeTools → FAIL.
With adjustAfterSlide: phase 2 anchor is smaller → PASS.

* test: exact sliding anchor assertion with session and clamped discard

Three-phase test using session cache:
  Phase 1: init session (small firstMsgTokens)
  Phase 2: baseline — large context, n_predict=0, records anchor
  Phase 3: sliding — small context, n_predict=-2, sliding fires

Simulates per-slide clamped discard (min(nDiscarded, safeLimit))
and asserts slideNPBT == expectedNPBT with exact values. Verifies
adjustAfterSlide reduces anchor by the correct amount per slide.
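
The per-slide arithmetic the test simulates can be modelled in a few lines. This is a JavaScript approximation of the C++ helpers, not their source: in particular, treating `safeLimit` as `nPastBeforeTools - firstMsgTokens` is an assumption based on the "conversation-only region" description.

```javascript
// Hypothetical model of clampDiscard(): never discard more than safeLimit.
function clampDiscard(nDiscarded, safeLimit) {
  return Math.max(0, Math.min(nDiscarded, safeLimit));
}

// Assumption: safeLimit is the conversation region between the first
// message and the tool anchor, so tool tokens are never discarded.
function simulateSlides(anchor, firstMsgTokens, nDiscarded, slides) {
  let nPastBeforeTools = anchor;
  for (let i = 0; i < slides; i++) {
    const safeLimit = nPastBeforeTools - firstMsgTokens;
    const discard = clampDiscard(nDiscarded, safeLimit);
    if (discard === 0) break; // nothing left to slide
    nPastBeforeTools -= discard; // the adjustAfterSlide step
  }
  return nPastBeforeTools;
}
```

With an anchor of 350, 50 first-message tokens and n_discarded=100, two slides are unclamped and leave the anchor at 150; with an anchor of 120 the first slide is clamped to 70.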

* test: add unclamped sliding test with long conversation

Second sliding test with longer user message and smaller n_discarded
(20). Verifies at least 1 slide discards the full n_discarded amount
(unclamped). Both tests simulate per-slide clamped discard and assert
exact nPastBeforeTools values.

* test: use n_discarded=100 with long conversation for unclamped sliding

Longer user message (~300 tokens) ensures the conversation region
exceeds n_discarded=100. Each slide discards the full 100 tokens
without clamping. Simpler and more direct than using small n_discarded.

* fix: don't add generation prompt on system-only prefill

When nPast=0 and the only message is a system prompt (role=system),
don't set add_generation_prompt=true. This was adding a stale
<|im_start|>assistant token to the cache that the model would see
as an empty assistant turn before the actual user message.

Now check the actual last message role instead of hardcoding true.
Saves 3 tokens in the cache prefix.
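
The role check can be summarised as a small predicate. The real logic lives in the C++ addon; this JavaScript version uses a hypothetical function name, folds in the earlier change that treats role "tool" like "user", and assumes an empty message list yields false.

```javascript
// Hypothetical helper: decides whether add_generation_prompt should be set.
function shouldAddGenerationPrompt(messages) {
  if (messages.length === 0) return false;
  const lastRole = messages[messages.length - 1].role;
  // "tool" results continue a chain, so they are treated like a user turn.
  return lastRole === "user" || lastRole === "tool";
}
```

A system-only prefill therefore returns false, which avoids caching an empty assistant turn ahead of the first user message.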

* chore: remove debug prompt logging

* chore: add debug log for tokenizeChat generation prompt flag

Logs nPast, lastRole, nMsgs, nTools, addGenPrompt at DEBUG verbosity.
Helps diagnose issues with stale generation prompt in cache.

* (fix) llamacpp-llm: "tool" role generate prompt tests

* (fix) llamacpp-llm: no "think" blocks in assistant history

* (internal) llamacpp-llm: test qwen3 dynamic tools template

* (chore) llamacpp-llm: upgrade package version

* fix: skip dispatch validation when called via workflow_call

The Validate Dispatch Inputs step fails when the mobile integration
workflow is invoked via workflow_call from a workflow_dispatch parent,
because github.event.inputs.package is empty in that context.

* fix: align prebuild download path with verify step in LLM mobile workflow

Prebuilds are downloaded to runner.temp/qvac-lib-infer-llamacpp-llm but
the verify step looked in runner.temp/prebuilds-download, so prebuilds
were never found.

* (internal) llamacpp-llm: runtimeDebugStats internal method

* (chore) llamacpp-llm: tools_at_end rename to tools_compact

* (improvement) llamacpp-llm: tools_compact feature docs

* (chore) llamacpp-llm: fix test

* (chore) llamacpp-llm: rename, cleanup, tests assertions

* (internal) llamacpp-llm: improve tests

* (internal) llamacpp-llm: reduce test flakiness with 0 temp

* (internal) llamacpp-llm: test rename

* (internal) llamacpp-llm: generate tests correct

* (internal) llamacpp-llm: improve sliding ctx tests

* (chore) llamacpp-llm: version bump

* (chore) llamacpp-llm: clang-format

* (fix) llamacpp-llm: qwen3 template perf and debug null guard

* (chore) llamacpp-llm: discard tokens warning

* (chore) llamacpp-llm: reuse getStatValue at tests

* (fix) llamacpp-llm: first msg sliding guard

* (improvement) llamacpp-llm: tools_compact require tools always

* (chore) llamacpp-llm: fix linter

* (fix) llamacpp-llm: guard regression, integration tests

* (internal) llamacpp-llm: remove over-defensive checks, fix test

* (chore) llamacpp-llm: cleanup linter and unused tests

* refactoring: anchored tools structured (tetherto#1658)

* (doc) llamacpp-llm: structure proposal

* (doc) llamacpp-llm: refactoring plan

* (internal) llamacpp-llm: extract tools compact controller from llm contexts

* (internal) llamacpp-llm: extract shared context slider for text and mtmd

* (internal) llamacpp-llm: ContextSlider testable, more tests

* (internal) llamacpp-llm: migrate tools compact coverage to deterministic unit tests

* (chore) llamacpp-llm: follow up minor fixes

* (internal) llamacpp-llm: improve multi-model portability

* (internal) llamacpp-llm: decouple ChatTemplateUtils

* (internal) llamacpp-llm: tools_compact contract, tests

* (internal) llamacpp-llm: ToolsCompactController tests and comments

* (doc) llamacpp-llm: tools_compact refine verify

* (internal) llamacpp-llm: tools compact profile resolution improved

* (chore) llamacpp-llm: clang format

* (chore) llamacpp-llm: tools-compact test improved

* (chore) llamacpp-llm: test condition check style

Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>

* (chore) llamacpp-llm: bump version, remove nested namespace

* (chore) llamacpp-llm: changelog improved

* (chore) llamacpp-llm: cleanup, test tool token count comment

* (chore) llamacpp-llm: tests useless conditional

Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>

* (chore) llamacpp-llm: tests refactor and remove redundant

* (chore) llamacpp-llm: deduplicate cache management tests, context slider edge coverage

* (chore) llamacpp-llm: clang format

* (fix) llamacpp-llm: ToolsCompact tools_calls check

* (internal) llamacpp-llm: oss string handle optimization

* (internal) llamacpp-llm: compute user msg index at cpp

* Revert "(internal) llamacpp-llm: compute user msg index at cpp"

This reverts commit 872eb47.

* (internal) llamacpp-llm: qwen3 dynamic template loop perf improved

* (chore) llamacpp-llm: clang format

---------

Co-authored-by: olyasir <sirkinolya@gmail.com>
Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>
Co-authored-by: gianni-cor <gianfranco.cordella@tether.io>
…stry server (tetherto#1724)

* QVAC-17131 feat: add Prometheus metrics monitoring to registry server (tetherto#1600)

* feat: add Prometheus metrics monitoring to registry server

* fix: restrict registry ping RPC to role and timestamp to avoid exposing operational data

* fix: make metrics bind host configurable and move off port 9090

* feat: replace per-model size gauge with view-derived total blob bytes (tetherto#1689)

* feat[bc]: rename gauges, add seeder metrics, and eagerly open blob core on indexers (tetherto#1692)

* feat[bc]: rename gauge metrics off _total suffix and pre-initialise rpc counters

* feat: add core seeder metrics and eagerly open blob core on indexers

* style: drop eslint-disable directives via helper function for gauge registration

* refactor[bc]: drop core_name label from blob core metrics and use median for view-derived stat panels

* style: drop noisy comment above registerGauge helper

* feat[bc]: replace blob_core_fully_downloaded with length/contiguous_length pair and drop blind-peer metrics (tetherto#1702)

* feat: expand Grafana dashboard with blob-core replication, seeders, and Holepunch P2P panels (tetherto#1716)

* feat: expand Grafana dashboard with blob-core replication, seeders, and Holepunch P2P panels

* fix: use vm_name label in QVAC and Holepunch panel legends instead of raw instance IP:port

* fix: apply $vm template filter to QVAC and Holepunch selectors for consistent per-node filtering


* chore[docs]: tighten registry Grafana dashboard panels based on staging review (tetherto#1718)

* chore[docs]: tighten registry Grafana dashboard panels based on staging review

* chore[docs]: drop redundant Blob Core Contiguous stat, cluster blob panels near the top

* chore[docs]: promote View Core Replication and Blob Core Bytes to the top of the metrics section (tetherto#1719)

* chore[docs]: promote View Core Replication and Blob Core Bytes to the top of the metrics section

* chore[docs]: split View Core Replication into length, contiguous, and gap panels

* chore: remove dead blind-peer helpers and fix stale metrics docs

- Drop unreferenced getConnectedBlindPeerKeys / getConfiguredBlindPeerKeys /
  isBlindPeerConnected chain and the _peerConnectionCounts map that only
  existed to back isBlindPeerConnected. Left over from the dropped
  blob_core_blind_peers gauge (1de851b).
- Fix DEPLOYMENT_GUIDE.md: default metrics port is 9210, not 9090; drop
  the hypermetrics reference since it is not a dependency (abandoned,
  incompatible with Hypercore v11) and per-core visibility is provided
  by the registry_blob_core_* / registry_view_core_* gauges.
* chore: Initial test-removal of environments for PR runs, remove unnecessary npmrcs

---------

Co-authored-by: Proletter <40578159+Proletter@users.noreply.github.com>
…ape (tetherto#1688)

* chore[bc|notask]: migrate SDK plugins to new addon constructor shape

The three addon refactors (tetherto#1493 embed, tetherto#1494 LLM, tetherto#1496 diffusion)
landed on main without the matching SDK plugin migration. Their published
releases (`@qvac/embed-llamacpp@0.14.0`, `@qvac/llm-llamacpp@0.16.0`,
`@qvac/diffusion-cpp@0.3.0`) dropped the `BaseInference` / `WeightsProvider`
loader-and-disk-path constructor and replaced it with a single-argument
`{ files, config, logger, opts }` shape that takes pre-resolved absolute paths.

Plugin changes (llamacpp-completion, llamacpp-embedding, sdcpp-generation):
- Construct the addon with the new single-argument shape. LLM and embed
  pass `files.model: string[]`; diffusion passes `files.model: string`
  plus renamed companion keys (`clipL`/`clipG`/`t5Xxl`/`llm`/`vae`).
- Drop `FilesystemDL`, `parseModelPath`, and the `asLoader` adapter from
  each plugin. Addons now receive absolute paths directly.
- Return `{ model }` instead of `{ model, loader }`.

Loader field removal across the plugin-registry contract:
- `PluginModelResult.loader?:` removed from the interface.
- `LocalOptions.loader?: FilesystemDL` and the `registerModel` options
  slot for it both dropped.
- `unloadAllModels` and `unloadModel` no longer call `entry.local.loader.close()`.
- `loadModel`'s cast tightened to `{ model: AnyModel }` and the conditional
  loader spread into `registerModel` removed.
- Non-migrated plugins (nmt, whisper, ocr, tts, parakeet) also simplified
  to `return { model }`. Their addons already stopped accepting a loader
  in earlier refactors; the pass-through was dead code. Parakeet drops
  its now-unused `new FilesystemDL({ dirPath })` + `parseModelPath` call.
- `server/bare/utils/loader-adapter.ts` (asLoader) and
  `server/utils/model-path.ts` (parseModelPath) deleted. Both had no
  remaining callers after the plugin cleanup.

Sharded-GGUF helper and tests:
- New `packages/sdk/server/utils/expand-gguf-shards.ts` turns a single
  sharded GGUF path into the ordered list the new addon contract expects
  (`.tensors.txt` companion first, then `-NNNNN-of-NNNNN.gguf` shards).
  Pure string manipulation, POSIX and Windows separator handling.
- 9 unit tests cover non-sharded paths, first-shard input, non-first-shard
  input, nested directories, single-shard (1-of-1), relative paths,
  Windows backslash separators, and a substring-match regression test
  (filename containing a shard-like pattern mid-basename must not match).
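
The ordering described above can be sketched roughly as follows. This is an illustration, not the contents of `expand-gguf-shards.ts`: the function name `expandShardsSketch`, the regular expression, the companion file name `<base>.tensors.txt`, and returning `[path]` for non-sharded input are all assumptions made for the example.

```typescript
// Hypothetical sketch of sharded-GGUF expansion; names and regex are assumptions.
// Groups: 1 = directory (POSIX or Windows separators), 2 = base name,
// 3 = zero-padded total shard count. The pattern is anchored at the end so a
// shard-like substring in the middle of a basename does not match.
const SHARD_SUFFIX_SKETCH = /^(.*[\\/])?(.+)-\d{5}-of-(\d{5})\.gguf$/;

function expandShardsSketch(modelPath: string): string[] {
  const match = SHARD_SUFFIX_SKETCH.exec(modelPath);
  if (!match) return [modelPath]; // not a sharded path: pass through

  const dir = match[1] ?? "";
  const base = match[2];
  const totalDigits = match[3]; // already a 5-digit zero-padded string
  const totalShards = parseInt(totalDigits, 10);
  if (totalShards <= 0) return [modelPath]; // e.g. "-00000-of-00000.gguf"

  // Companion file first (assumed naming), then every shard in order.
  const files: string[] = [`${dir}${base}.tensors.txt`];
  for (let i = 1; i <= totalShards; i++) {
    const index = String(i).padStart(5, "0");
    files.push(`${dir}${base}-${index}-of-${totalDigits}.gguf`);
  }
  return files;
}
```

Because the function is pure string manipulation, it can be called with any shard of a set (first or not) and yields the same ordered list.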

Dependency changes:
- Bump SDK deps to the published addon versions:
  `@qvac/diffusion-cpp ^0.3.0`, `@qvac/embed-llamacpp ^0.14.0`,
  `@qvac/llm-llamacpp ^0.16.0`.
- Drop `@qvac/dl-filesystem` from `dependencies`; no remaining consumer
  in the SDK.
- `bun.lock` refreshed; `bun install --frozen-lockfile` clean.

Docs: the custom-plugin example in `docs/website/.../write-custom-plugin.mdx`
drops the stale `loader: null` return to match the new `{ model }` shape.

Test fixtures updated: `plugin-system.test.ts` and `sdcpp-plugin.test.ts`
drop `loader: {}` / `loader: undefined` from mock `createModel` returns and
from `registerModel` calls (the field no longer exists on either interface).

Verified: `bun run build` clean (`--max-warnings=0`), 423/423 unit tests
pass, all three diffusion examples and 19 LLM examples and 4 RAG examples
run end-to-end against the real addons.

Supersedes tetherto#1510, which carried stale merge history from the
addon-refactor side-branches and could not be rebased onto current main
cleanly.

* fix(examples): set FLUX guidance params on diffusion examples

Both `diffusion-txt2img.ts` and `diffusion-flux2-klein.ts` default to
FLUX.2 Klein but neither was sending the right guidance knobs.
stable-diffusion.cpp gates the unconditional inference branch on
`guidance.txt_cfg != 1.0` (`stable-diffusion.cpp:3304`) and logs "use
cfg-scale=1 for distilled models" (`:1667`); FLUX is a distilled model.
The addon's `GenParams.cfgScale` defaults to `7.0f`
(`SdGenHandlers.hpp:44`) and is assigned straight into
`sample_params.guidance.txt_cfg` at `SdModel.cpp:499`, so omitting or
leaving `cfg_scale` at the default forces the full CFG path on FLUX
every step for zero quality benefit. `guidance: 3.5` is the FLUX
distilled-guidance default and also needs to be set explicitly.

Setting `guidance: 3.5, cfg_scale: 1` in both examples halves generation
cost per step on FLUX. Measured on an RTX 5080 (Vulkan): 20-step txt2img
drops from 17.1s to 9.1s; flux2-klein drops from 17.4s to 9.1s.

* fix: simplify SDK shard expansion helpers

Keep `expandGGUFIntoShards()` Bun-testable without introducing a separate shared shard-pattern module. This reduces refactor churn while preserving the sharded GGUF behavior expected by the SDK plugins.

* fix(examples): remove unintended docs and FLUX comment churn

Restore the custom-plugin docs example so this PR stays scoped to the SDK addon integration work. Remove the extra FLUX.2 guidance comments from the diffusion examples as requested.

* fix: restore shard-utils JSDoc

* fix: address review feedback from PR tetherto#1688

- `packages/sdk/server/bare/plugins/nmtcpp-translation/plugin.ts`:
  inline `path.dirname(modelPath)` and `path.basename(modelPath)` in
  `deriveColocatedBergamotVocabPaths` instead of importing
  `parseModelPath`. After rebasing onto main, `parseModelPath` no
  longer exists (this PR deletes `server/utils/model-path.ts`) but
  `tetherto#1707` reintroduced its usage here. `bare-path` is already
  imported in this file, so the inline form is a direct swap.
- `packages/sdk/server/utils/expand-gguf-shards.ts`: drop the
  redundant `totalDigits = String(totalShards).padStart(5, "0")`
  round-trip and use the regex capture `match[3]` directly (it is
  already a 5-digit zero-padded string). Also drop the
  `!Number.isFinite(totalShards)` guard: a 5-digit regex match plus
  base-10 `parseInt` always produces a finite integer, so the
  check is dead defense. The `<= 0` guard is kept (pins the
  `00000-of-00000.gguf` edge case).
- `packages/sdk/test/unit/expand-gguf-shards.test.ts`: add a test
  that pins the zero-total shard-count branch; `expandGGUFIntoShards`
  must return the input path unchanged for `empty-00000-of-00000.gguf`
  rather than an empty shard list.

* chore: route expand-gguf-shards import through server/utils barrel

- `packages/sdk/server/utils/index.ts`: add `expand-gguf-shards` to
  the barrel re-exports alongside the other utility modules.
- `packages/sdk/server/bare/plugins/llamacpp-completion/plugin.ts`
  and `llamacpp-embedding/plugin.ts`: import `expandGGUFIntoShards`
  from `@/server/utils` instead of the module path directly.
- `packages/sdk/test/unit/expand-gguf-shards.test.ts`: same.

No behavior change; keeps a single re-export point for server utils.

* fix: import expandGGUFIntoShards directly in unit test

The unit test runs under Bun. Importing through the `@/server/utils`
barrel drags in `checksum.ts` and `formatting.ts`, both of which
`import crypto from "bare-crypto"`. Bun does not implement Bare's
`require.addon()`, so evaluating that import chain throws
`TypeError: require.addon is not a function` at load time and the
test run exits non-zero before any assertion executes.

The plugin files (`llamacpp-completion/plugin.ts` and
`llamacpp-embedding/plugin.ts`) keep the barrel import because they
only evaluate under Bare. Only the Bun-loaded unit test needs to go
through the module path directly.

---------

Co-authored-by: Yury Samarin <yuri.a.samarin@gmail.com>
…es (tetherto#1712)

* infra: distinguish [ios] / [android] / [desktop] jobs in sdk test workflows

Add display `name:` fields to jobs in the three reusable SDK test
workflows so the Actions UI can tell platforms apart when they run under
the same umbrella (test-sdk.yml). Job IDs, needs: graphs, outputs, and
artifact names are unchanged.

* chore: bump qvac-test-suite to ^0.6.0 and add install:build script

- Bump @tetherto/qvac-test-suite from ^0.5.1 to ^0.6.0 to pick up the
  run:local:desktop / run:local:android / run:local:ios commands and
  the suite + bootstrap features used by the refreshed local flow.
- Add install:build script (npm install --install-links && npm run
  build) for a one-shot reinstall + rebuild after SDK changes.

* doc: rewrite tests-qvac readme for local-first workflow and ci triggers

- Lead with run:local:* one-liners instead of the old manual iOS flow.
- Document the MQTT broker requirement (ws:8080 + mqtt:1883) and the
  embedded aedes + websocket-stream fallback behaviour.
- Document the PR label triggers (test-e2e-smoke, test-e2e-full) and
  the manual workflow_dispatch entry point, including the non-obvious
  workflow-branch vs test-version distinction.
- Add a "Developing new tests" section with executor placement guidance
  (shared / desktop / mobile) and the smoke-suite policy (1-2 tests per
  feature, only when no existing smoke coverage).
- Keep manual Xcode fallback only as a troubleshooting bullet.

* doc: add cursor rule for tests-qvac e2e impact and authoring conventions

New cursor rule scoped to packages/sdk/** that enforces:

- Evaluate e2e test suite impact on any SDK source change.
- Rebuild tests-qvac via `npm run install:build` on SDK API or model
  constant changes; adapt or add tests accordingly.
- Executor placement decision tree (tests/shared vs tests/desktop vs
  tests/mobile) with the hard rule that node:* imports are banned from
  shared/ and mobile/.
- Smoke-suite policy: 1-2 tests per feature, only when no existing
  smoke coverage, stable on both desktop and mobile.
- Points at tests-qvac/README.md for the local-run and CI-trigger
  details.

* chore: gitignore tests-qvac local secrets and rag-hyperdb data

Prevent accidental commits of local run artefacts:

- .env / .env.bak-* may contain MQTT credentials copied from
  .env.example.
- rag-hyperdb/ holds generated HyperDB corestore data from RAG tests.

* chore: add install:build:full / prepare:sdk scripts and document rebuild flows

- package.json: prepare:sdk (bun install + build in packages/sdk/) and
  install:build:full (prepare:sdk + install:build) for one-shot
  SDK + tests-qvac rebuilds.
- README: new "Rebuilding after changes" section with a decision table
  covering SDK source changes, test-code-only changes, and
  producer-side-only changes (--skip-build). Clarifies that mobile
  always needs a fresh APK/IPA to pick up SDK or test-code changes and
  that --skip-build is strictly for re-runs with different suites or
  filters.
- Cursor rule now points at the README section and references
  install:build:full alongside install:build.

* doc: add sdk-e2e-create skill for e2e test planning

New Cursor skill under .cursor/skills/sdk-e2e-create/ that guides
planning and scaffolding of e2e tests in packages/sdk/tests-qvac for
new or changed public SDK APIs.

- Investigate-first flow: read the feature from code and existing
  tests, then present a concrete plan and ask targeted clarifying
  questions only where genuine ambiguity remains.
- Enforces happy / sad / error coverage for every public API feature.
- Ranks model-output validation strategies from deterministic
  keyword assertions down to shape-only fallbacks, to avoid weak
  coverage by default.
- Covers executor placement (shared / desktop / mobile) with mobile
  memory / filesystem / platform constraints.
- Smoke-suite selection rules: 1-2 tests per feature, only when no
  existing smoke coverage, stable across platforms.
- Includes scaffolding templates and the exact run:local:desktop
  --filter command to hand back to the user for local verification.
Victor-Rodzko and others added 17 commits May 13, 2026 21:05
…ag (PR-1701) (tetherto#1883)

* test: add e2e coverage for transcribe()/transcribeStream() per-segment metadata (PR-1701)

- transcription-metadata-batch / transcription-metadata-streaming:
  validate TranscribeSegment[] shape returned by Whisper when metadata: true
- parakeet-tdt-metadata-rejected: assert metadata flag is rejected by
  non-Whisper engines (Parakeet)
- shared validateSegments() helper, content-agnostic shape check
- desktop + mobile executor parity for transcription and parakeet flows

Also folds in a tooling fix surfaced while authoring these tests:
tests-qvac/package.json gains clean:sdk-snapshot which wipes
node_modules/@qvac/sdk plus the iOS/Android consumer build snapshots
before reinstalling, so install:build:full no longer reuses a stale
@qvac/sdk copy on any platform.

Co-authored-by: Cursor <cursoragent@cursor.com>

* test[skiplog]: exercise duplex metadata flow and pin parakeet rejection reason

* test[skiplog]: apply code review suggestions

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
… cache invalidation (tetherto#2004)

Adds 2 e2e tests in tests-qvac (translation-bergamot-fr-en-cache-reload,
translation-bergamot-en-fr-cache-reload) covering the QVAC-18420 regression
where shared vocab files for bidirectional Bergamot pairs were silently
re-downloaded on every loadModel call.

Each test does load -> unload (Round 1, warm cache) then load with onProgress
-> unload (Round 2, must be a pure cache hit). Cache-hit detection is
platform-agnostic via partial-percentage progress event counting (no
node:fs snapshots).

Skipped on mobile via SkipExecutor since the bug lives in server-side Bare
code that is bit-identical across platforms.

Co-authored-by: Cursor <cursoragent@cursor.com>
…ex streaming (tetherto#2018)

* feat: Implement parakeet ggml backend

* add duplex streaming

* clean up

* chore: gitignore local *-output.wav demo artifacts

* Add ^ before parakeet version

* QVAC-17869 feat[bc]: address PR review for parakeet 0.4.0 GGML migration

Resolves blockers and should-fix items from PR tetherto#2018 review:

- Revert TTS GGML bleed-in from desktop/mobile e2e consumers so the PR
  is parakeet-scoped.
- Add structured LegacyParakeetModelDeprecatedError (server code 52210)
  and parakeetLoadConfigSchema that allow-lists legacy ONNX modelConfig
  fields so they reach resolveParakeetConfig and surface a clear
  migration message instead of a generic Zod failure. Public
  parakeetConfigSchema.strict() still rejects them.
- Refactor endOfTurnEventSchema into a discriminated union on `source`
  ("whisper" requires silenceDurationMs; "parakeet" omits it). Threads
  the discriminator through transcribe op, both transcription plugins,
  the client API, examples, and unit tests.
- Set skipPrimaryModelPathValidation to false in the parakeet plugin
  now that modelSrc is a real GGUF the framework can validate.
- Add e2e tests parakeetStreamDestroyMidUtterance and
  parakeetStreamIteratorThrow (wired into desktop + mobile executors)
  to cover session.destroy() mid-utterance and consumer iterator unwind.
- Add TODO(QVAC-17869-followup) in the parakeet duplex handler about
  wiring AbortSignal for cancellation in the next iteration.
- Update transcription docs for the new single-GGUF loadModel shape,
  the LegacyParakeetModelDeprecatedError migration note, and the new
  transcribeStream() discriminated endOfTurn events. Update error code
  52210 entry in the API reference.
- Resolve package.json merge conflict (keep ^0.4.0 for
  @qvac/transcription-parakeet; take main's whispercpp ^0.7.0 and
  nmtcpp ^3.0.0).
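
The discriminated `endOfTurn` shape from the list above can be pictured with a plain TypeScript union. This is a minimal sketch that assumes only what the bullet states (`source` is the discriminator, and only the "whisper" variant carries `silenceDurationMs`). The type and function names are hypothetical, and the SDK's own schema definition is not reproduced here.

```typescript
// Hypothetical stand-in for the event shape; the real schema may carry more fields.
type EndOfTurnEventSketch =
  | { source: "whisper"; silenceDurationMs: number }
  | { source: "parakeet" };

// Narrowing on `source` lets the compiler prove that `silenceDurationMs`
// is only read on the variant that actually has it.
function describeEndOfTurnSketch(event: EndOfTurnEventSketch): string {
  switch (event.source) {
    case "whisper":
      return `whisper end of turn after ${event.silenceDurationMs} ms of silence`;
    case "parakeet":
      return "parakeet end of turn";
  }
}
```

A consumer written this way fails to compile if it reads `silenceDurationMs` without first checking `source`, which is the main benefit of the union over an optional field.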

Co-authored-by: Cursor <cursoragent@cursor.com>

* QVAC-17869 fix: drop stale skipPrimaryModelPathValidation on parakeet plugin

Address PR tetherto#2018 review feedback: the multi-file ONNX-era flag is no
longer meaningful in 0.4.0 where the top-level `modelSrc` is the
actual GGUF the addon mmaps. Remove the explicit `false` so the plugin
relies on the framework default (which runs primary-path validation),
and leave a short comment recording the intent so the omission is not
misread as an oversight.

Co-authored-by: Cursor <cursoragent@cursor.com>

* QVAC-17869 fix: cast parakeet metadata-rejection probe via unknown

`transcribeStream({ metadata: true })` resolves to an `AsyncGenerator`,
not a duplex `TranscribeStreamSession`. The two types have no overlap,
so a direct `as TranscribeStreamSession` is rejected by TS5
(TS2352). The only consumer here is the negative-path probe in
`runParakeetStreamMetadataRejected`, which expects the call to throw
before the cast is ever observed at runtime, so go through `unknown`
to satisfy the type system without changing the test's behaviour.

Co-authored-by: Cursor <cursoragent@cursor.com>

* QVAC-17869 doc: fix broken parakeet-cpp engine link in transcription docs

The PR review on tetherto#2018 flagged that
https://github.com/tetherto/qvac-parakeet.cpp does not exist under
the tetherto org. The parakeet-cpp engine actually lives as a
subdirectory inside qvac-ext-lib-whisper.cpp (consistent with the
attribution URL already used in packages/transcription-parakeet/NOTICE).

Update the transcription overview to link to the real location.

Co-authored-by: Cursor <cursoragent@cursor.com>

* QVAC-17869 fix: make parakeet-stream e2e tests pass end-to-end

Four interlocking fixes for the new `parakeet-stream-*` e2e suite:

1. Executor dispatch order (desktop + mobile consumer.ts). The
   `ParakeetExecutor` pattern `/^parakeet-/` was registered before
   `ParakeetStreamExecutor` (pattern `/^parakeet-stream-/`), so the
   broader matcher won every dispatch and stream test ids landed in the
   wrong executor (which lacks stream handlers, surfacing as
   "Unknown test"). Swap the order so the more specific pattern wins.

2. Audio fixture sample rate (parakeet-stream-tests.ts). The duplex
   runner feeds raw PCM directly into the parakeet session with no
   FFmpegDecoder hop, so the fixture must already be 16 kHz mono. The
   previous `transcription-short-wav.wav` (48 kHz stereo) was rejected
   by the runner's `sampleRate !== 16000` precondition. Switched the
   happy/reject/teardown/throw tests to `diarization-sample-16k.wav`.

3. Wall-clock pacing in the runner (parakeet-stream-runner.ts). The
   native parakeet `StreamSession` only commits transcript segments
   when audio arrives at roughly real-time cadence — flushing all
   chunks synchronously starves its internal segmenter and yields zero
   events. Made `writeInChunks` async and added a `delayMs` parameter
   that paces writes at the test's configured `chunkMs`; matches the
   addon's own `live-stream-simulation.test.js` /
   `duplex-streaming.test.js` pacing model. All call sites updated to
   `await` it.

4. EOU fixture (parakeet-stream-tests.ts). The EOU detector fires
   `<EOU>` based on sentence-final / turn-boundary linguistic patterns
   (see the addon-level `eou-streaming.test.js` regression note), and
   `diarization-sample-16k.wav` is continuous multi-speaker overlap —
   transcript text comes back but no `isEndOfTurn` segments surface.
   Switched the EOU test alone to `two-speakers-16k.wav`, the
   alternating two-speaker conversation fixture, which provides the
   turn-boundary stimulus the EOU head is trained on.
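
The dispatch-order problem in item 1 is easy to reproduce in isolation. The sketch below assumes a first-match-wins registry, which is what the description implies; `dispatchSketch` and the `ExecutorSketch` shape are hypothetical names, and only the two patterns are taken from the text.

```typescript
// Hypothetical first-match-wins dispatcher; only the two regexes come from the commit message.
interface ExecutorSketch {
  name: string;
  pattern: RegExp;
}

function dispatchSketch(executors: ExecutorSketch[], testId: string): string | undefined {
  // The first registered executor whose pattern matches handles the test.
  return executors.find((executor) => executor.pattern.test(testId))?.name;
}

const parakeet: ExecutorSketch = { name: "ParakeetExecutor", pattern: /^parakeet-/ };
const parakeetStream: ExecutorSketch = {
  name: "ParakeetStreamExecutor",
  pattern: /^parakeet-stream-/,
};

// Broad pattern first: stream test ids are swallowed by the generic executor.
const broadFirst = [parakeet, parakeetStream];
// Specific pattern first: stream test ids reach the stream executor.
const specificFirst = [parakeetStream, parakeet];
```

With `broadFirst`, an id such as `parakeet-stream-happy` resolves to `ParakeetExecutor`; with `specificFirst`, it resolves to `ParakeetStreamExecutor`, while non-stream ids still fall through to the generic executor.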

Result on desktop: 21/21 parakeet stream + dependent tests pass (was
0/5 stream tests pre-fix, then 16/21, then 20/21).
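
The wall-clock pacing from item 3 might look something like the following. This is a sketch under stated assumptions: the real `writeInChunks` signature is not shown in this PR description, so `writeInChunksSketch`, its parameters, and the `write` callback are hypothetical. Only the idea of awaiting a per-chunk delay comes from the text.

```typescript
// Hypothetical paced writer; the real runner's signature may differ.
const sleepSketch = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

async function writeInChunksSketch(
  pcm: Uint8Array,
  chunkBytes: number,
  delayMs: number,
  write: (chunk: Uint8Array) => void,
): Promise<number> {
  if (chunkBytes <= 0) throw new RangeError("chunkBytes must be positive");
  let written = 0;
  for (let offset = 0; offset < pcm.length; offset += chunkBytes) {
    // subarray creates a view, so no audio data is copied per chunk.
    write(pcm.subarray(offset, Math.min(offset + chunkBytes, pcm.length)));
    written++;
    // Pause for roughly one chunk duration so audio arrives at near real-time cadence.
    if (delayMs > 0) await sleepSketch(delayMs);
  }
  return written;
}
```

Passing `delayMs` equal to the chunk duration approximates a live microphone; passing `0` reproduces the old synchronous flush, which is useful for comparing the two behaviours.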

Co-authored-by: Cursor <cursoragent@cursor.com>

* QVAC-17869 fix: explicit session destroy in parakeet stream iterator-throw test

`runParakeetStreamIteratorThrow` previously relied on the for-await
sentinel throw to invoke the async iterator's `return()` and, through
it, tear down the native `StreamSession` before opening the recovery
session against the same model. On Node/Bare-desktop the unwind runs
to completion synchronously enough that the next `transcribeStream`
call sees a fully released model. On the Bare-RN bridge (iOS /
Android) the iterator-return → native-destroy chain crosses JSI and
is best-effort, so the recovery session opens while the previous
native session is still alive, the model stays wedged, and the
recovery iteration yields zero events — surfacing as `assertHappy`
reporting `expected at least one text event, got: {}`.

Call `throwingSession.destroy()` explicitly between catching the
sentinel and opening the recovery session. This matches the contract
real SDK consumers should follow when abandoning a stream
mid-iteration (the iterator's `return()` is intentionally best-effort
across runtimes) and keeps the test focused on the recovery contract
rather than JSI return-propagation timing.

Co-authored-by: Cursor <cursoragent@cursor.com>

---------

Co-authored-by: Ishan Vohra <ishanvohra@Ishans-MacBook-Air.local>
Co-authored-by: Cursor <cursoragent@cursor.com>
… ABI verification (tetherto#1984)

* fix: handle repeated package names in buildNestedPathIndex
Replace `key.indexOf(marker)` with `match.index` so each regex match
maps to its actual package root. Previously, a resolution key with
the same package name appearing twice (e.g.
`node_modules/foo/node_modules/bar/node_modules/foo/index.js`)
collapsed the nested `foo` back to the top-level path.
Also exports `buildNestedPathIndex` so it can be reused by the
upcoming `qvac verify bundle` command.
Adds regression tests covering single, nested, and repeated-name
resolution keys.
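
The difference between `indexOf` and `match.index` shows up as soon as a package name repeats in one key. The sketch below is not the CLI's `buildNestedPathIndex`; the function name, return shape, and regular expression are assumptions, and it only handles forward-slash keys.

```typescript
// Hypothetical helper: list every package root that appears in a resolution key.
interface PackageRootSketch {
  name: string;
  root: string;
}

function packageRootsSketch(key: string): PackageRootSketch[] {
  // Matches "node_modules/<name>" and "node_modules/@scope/<name>".
  const marker = /node_modules\/((?:@[^/]+\/)?[^/]+)/g;
  const roots: PackageRootSketch[] = [];
  for (const match of key.matchAll(marker)) {
    // match.index is the position of THIS occurrence. Using
    // key.indexOf(match[0]) instead would always return the first occurrence
    // and collapse a nested duplicate back to the top-level path.
    const end = (match.index ?? 0) + match[0].length;
    roots.push({ name: match[1], root: key.slice(0, end) });
  }
  return roots;
}
```

For the key from the commit message, `node_modules/foo/node_modules/bar/node_modules/foo/index.js`, the third entry keeps its full nested root instead of being mapped back to `node_modules/foo`.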

* feat[api]: add qvac verify bundle command for prebuild and ABI verification
New `qvac verify bundle` subcommand under `qvac verify`. Validates
the actual artifacts (prebuilds + ABI compatibility) of a worker
bundle or installed node_modules tree before shipping.
Accepts a `worker.bundle.js` (bare-pack tree-shaken output) or a
`node_modules` directory via `--addons-source`; source kind is
auto-detected.
Per addon per `--host`:
- prebuild presence: `<packageRoot>/prebuilds/<host>/*.bare`
- ABI compatibility: addon's `engines.bare` must satisfy the resolved
  Bare runtime version. Resolution order: `--bare-runtime-version`
  flag -> `bare-runtime/package.json` -> `bare/package.json`. Mobile/Expo
  CI should pass `--bare-runtime-version` explicitly;
  `react-native-bare-kit` does not currently expose embedded runtime
  metadata.
Structured issue codes for CI consumption:
- error: `missing-prebuild`, `abi-mismatch`, `invalid-runtime-version`,
  `invalid-source`
- warning: `unknown-runtime-version`, `malformed-engines-bare`
Exit 1 on any error-level issue, 0 otherwise.
Tests: unit + Bats smoke for both source kinds, prebuild/ABI checks,
runtime auto-resolution, malformed-engines-bare warning path
(surfaced even when runtime is unknown), and regression coverage for
nested-only bundle resolutions and multi-instance retention.
Validated end-to-end against qvac-app-workbench-mobile: 33 addons in
a 10MB worker.bundle.js across 5 hosts; 45 addons in node_modules
with 2 legitimate prebuild gaps surfaced; bare-os@3.6.2 (top-level)
and bare-os@3.9.0 (nested under @qvac/tts-onnx) correctly
distinguished.
Adds `semver` as a runtime dependency.

* feat[api]: read bareRuntimeVersion from qvac.config in verify bundle

Adds `--config <path>` and auto-detection of `qvac.config.{json,js,mjs,ts}`
so projects can pin the Bare runtime in a committed file (works in any
runtime, including the future Pear pre-hook where env vars don't).

Resolution: `--bare-runtime-version` > config `bareRuntimeVersion` >
`bare-runtime/package.json` > `bare/package.json`. Both flag and config
values share the same semver validation; malformed values emit
`invalid-runtime-version` carrying `source: 'flag' | 'config'`, with the
config file's actual path in the message. Explicit `--config` to a
missing/unreadable file emits `invalid-source`; auto-detect failures and
non-string config values fall through silently.

Tests: 7 unit + 1 bats covering auto-detect, explicit `--config`, flag
precedence, malformed/non-string config values, and config-path label in
error messages.
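
The resolution order can be read as a simple first-defined-wins chain. A minimal sketch, assuming the four candidate values have already been read from their sources; the type and function names are hypothetical, and semver validation (which the real command applies to flag and config values) is deliberately left out.

```typescript
// Hypothetical precedence chain: flag > config > bare-runtime/package.json > bare/package.json.
type RuntimeSourceSketch = "flag" | "config" | "bare-runtime" | "bare";

interface RuntimeCandidatesSketch {
  flag?: string; // value of --bare-runtime-version
  config?: unknown; // bareRuntimeVersion from qvac.config.*, may be any type
  bareRuntimePkgVersion?: string; // version field of bare-runtime/package.json
  barePkgVersion?: string; // version field of bare/package.json
}

function resolveRuntimeSketch(
  candidates: RuntimeCandidatesSketch,
): { version: string; source: RuntimeSourceSketch } | undefined {
  if (candidates.flag !== undefined) {
    return { version: candidates.flag, source: "flag" };
  }
  // Non-string config values fall through silently, as described above.
  if (typeof candidates.config === "string") {
    return { version: candidates.config, source: "config" };
  }
  if (candidates.bareRuntimePkgVersion !== undefined) {
    return { version: candidates.bareRuntimePkgVersion, source: "bare-runtime" };
  }
  if (candidates.barePkgVersion !== undefined) {
    return { version: candidates.barePkgVersion, source: "bare" };
  }
  return undefined; // unknown runtime version
}
```

Returning the source alongside the version is what lets an error message say whether a malformed value came from the flag or from the config file.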

* fix[api]: address verify bundle review feedback

- node-modules-source.ts + prebuilds.ts: treat symlinked package
  directories and prebuilds as valid (isDirectory || isSymbolicLink)
  so pnpm / yarn-pnp layouts don't silently pass verification.
- abi.ts: pass `{ includePrerelease: true }` to `semver.coerce` so
  RC runtimes like 1.16.0-rc.1 aren't silently coerced to 1.16.0.
- index.ts: emit `config-load-failed` warning when an auto-detected
  qvac.config.* exists but fails to parse, instead of swallowing the
  error. Explicit `--config` still errors via `invalid-source`.

Tests: 3 regressions covering each fix.

* fix[api]: address verify bundle review feedback (Simon)

- index.ts: invalid-runtime-version no longer short-circuits the
  prebuild walk; a typo in `--bare-runtime-version` or a malformed
  config `bareRuntimeVersion` no longer hides real missing-prebuild
  errors. ABI resolution stays skipped when the runtime version is
  malformed.
- addon-source.ts + bundle-source.ts + node-modules-source.ts:
  thread `CollectDiagnostics` so malformed `package.json` records
  (parse error, non-object, missing `name` on `addon: true`) and
  empty bare-pack resolutions are no longer swallowed.
- index.ts: surface diagnostics as two new warning issues
  `invalid-package-json` and `empty-bundle-resolutions`, both
  warning-level (exit 0). Formatters added; `VerifyBundleIssue`
  union extended.

Tests: 2 orchestrator regressions (no-short-circuit,
empty-bundle-resolutions) + extended `readAddonPackageJson`
malformed-JSON unit to assert the new `invalid` record.

* feat[api]: add --json flag to qvac verify bundle

Emit the verification result as pretty-printed JSON on stdout instead of
the human-readable summary. Exit codes are unchanged. Mirrors the
`qvac doctor --json` convention so CI scripts, dashboards, and other
downstream tooling can consume the structured issue codes (`addons`,
`runtime`, `issues`, `hosts`, `sourceKind`, etc.) directly.

`--quiet` is ignored when `--json` is set (a JSON consumer explicitly
asked for output).

README updated under `verify bundle` options, exit codes, and issue codes
(also documents the three new warning codes added in the previous commit:
`invalid-package-json`, `empty-bundle-resolutions`, `config-load-failed`).

* doc: tighten --json option row in verify bundle README

Drop the field list and the "exit codes unchanged" sentence; align voice
with the other rows (and with the existing `qvac doctor --json` blurb).
…ache via KvCacheSession (tetherto#2007)

* QVAC-18182 feat[api]: typed cancel outcomes on the wire + atomic KV-cache via KvCacheSession

Builds on QVAC-18181's request lifecycle primitives (DisposableScope,
RequestContext, RequestRegistry) to deliver the M2 milestone:

- Typed cancel outcomes: `stopReason: "cancelled"` on `completionDone`
  events, and `InferenceCancelledError(requestId, partial)` thrown from
  CompletionRun promise-aggregates (`final` / `text` / `toolCalls` /
  `stats`). The wire stream still ends normally so iterating
  `run.events` is unaffected — the typed error lives on the aggregate
  promises that callers `await` for the final result.

- KvCacheSession
  (`server/bare/plugins/llamacpp-completion/ops/kv-cache-session.ts`):
  single atomic owner of the three KV-cache
  layers (`cachedMessageCounts`, `initializedCaches`, on-disk `.bin`
  files). `beginTurn` / `commitTurn` / `rollback` collapse the three
  duplicated cleanup blocks in `completion-stream.ts` into one
  scope.defer hook. Cross-model administrative deletion lives at the
  module level as `deleteKvCacheState(...)`, called by the RPC
  `handleDeleteCache` handler.

- Stop-button race close — `RequestRegistry` now keeps a bounded
  cancelled-before-begin map (128 entries, 30s TTL). A
  `cancel({ requestId })` that lands before the server's `begin(...)` ran is
  applied retroactively when begin lands, so same-tick stop clicks no
  longer disappear into the void. Internal-only — the wire surface for
  `cancel` is unchanged (Option A in the brief).
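
One way to picture the cancelled-before-begin map is a small insertion-ordered map with a size cap and a per-entry expiry. The sketch below is not the `RequestRegistry` implementation: the class and method names are hypothetical, the eviction policy (oldest first) is an assumption, and only the 128-entry bound and 30s TTL come from the text above.

```typescript
// Hypothetical bounded TTL set for cancels that arrive before begin(...).
class CancelledBeforeBeginSketch {
  private readonly expiries = new Map<string, number>(); // requestId -> expiry timestamp
  private readonly maxEntries: number;
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(maxEntries = 128, ttlMs = 30_000, now: () => number = () => Date.now()) {
    this.maxEntries = maxEntries;
    this.ttlMs = ttlMs;
    this.now = now; // injectable clock keeps the sketch testable
  }

  // Record a cancel whose request has not begun yet.
  noteCancel(requestId: string): void {
    this.evictExpired();
    this.expiries.delete(requestId); // re-insert so the entry becomes the newest
    this.expiries.set(requestId, this.now() + this.ttlMs);
    while (this.expiries.size > this.maxEntries) {
      const oldest = this.expiries.keys().next().value;
      if (oldest === undefined) break;
      this.expiries.delete(oldest); // Map iterates in insertion order
    }
  }

  // Called when begin(...) lands: true means "abort this request immediately".
  consumeOnBegin(requestId: string): boolean {
    this.evictExpired();
    return this.expiries.delete(requestId);
  }

  private evictExpired(): void {
    const current = this.now();
    for (const [id, expiry] of this.expiries) {
      if (expiry <= current) this.expiries.delete(id);
    }
  }
}
```

The bound and TTL keep the map from growing when cancels arrive for request ids that never begin, while still covering the same-tick stop-click case.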

Cursor rules updated in the same PR so the request-lifecycle and
KV-cache topic docs stay in sync with the implementation.

Tests:
- unit: KvCacheSession (bareTest-gated, runs in the Bare consumer),
  RequestRegistry race + bounded-set eviction, completion-event schema
  cancelled cases.
- e2e: cancellation-tests.ts adds three definitions — mid-stream cancel
  (events.stopReason === "cancelled", final rejects with
  InferenceCancelledError, partial.text matches concatenated
  contentDelta), cancel-before-begin (retroactive abort), and
  cancel-then-resume-kv-cache (rollback wiped the three layers, the
  next turn re-primes cleanly).

* chore: drop planning labels (Mx/Dx) from QVAC-18182 comments

Strips milestone (`M1`/`M2`/`M3a`...) and deliverable (`D2`/`D5`/`D7`)
labels from comments and test titles introduced with the typed-cancel
outcomes + KvCacheSession work. The substantive descriptions of the
contracts (Stop-button race, cancelled-before-begin map, three-layer
session ownership, etc.) are preserved; only the planning-doc
references are removed so the code reads cleanly without the pitch
context. Durable `QVAC-XXXXX` ticket references are kept.

No behavior or API surface changes.

* chore: drop Asana ticket references from QVAC-18182 code comments

Strips QVAC-XXXXX inline ticket references from code/test comments
introduced by the typed-cancel-outcomes work. Concept names
(Stop-button race, cancelled-before-begin, etc.) and prose
descriptions of the contracts are preserved; only the ticket-tag
suffixes go. Also renames a test cache key from
`qvac-18182-cancel-resume-kvcache` to `cancel-then-resume-kvcache` so
the cache key reads as a stable identifier rather than a ticket
reference.

No behavior or API surface changes.

* QVAC-18182 doc: clarify error>cancelled precedence + deleteKvCacheState concurrency

Address non-blocking review nits on PR tetherto#2007:

- aggregate-events: explain why a wire event carrying both error and
  cancelled signals resolves to error (closes brief open question tetherto#3).
- kv-cache-session: doc-comment on deleteKvCacheState explaining the
  ordering guarantee under concurrent in-flight turns -- delete is
  wire-async, in-flight turns roll back idempotently when their commit
  probe finds the file gone (closes brief open question tetherto#4).

Comments only; no behavior changes.

* QVAC-18182 doc: demonstrate typed cancel outcomes in cancel example

Enhance the existing cancel-by-request-id example to demonstrate the
two M2 cancel-outcome channels:

- run.events ends normally with completionDone carrying
  stopReason: "cancelled" -- show reading it inside the iteration loop.
- run.text rejects with InferenceCancelledError(requestId, partial) on
  cancel -- show the instanceof check and consuming partial.text,
  partial.toolCalls, partial.stats.

Also update the header to remove the now-stale "logged as a no-match"
sentence (same-tick cancels are no longer dropped after M2's race
close).

Pure documentation enhancement; no API or behavior changes.

* QVAC-18182 fix: address PR review — partial-prime cleanup + parent-aborted state

Two follow-ups from Opanin's review on PR tetherto#2007:

1. KvCacheSession.beginTurn: if `primeIfMissing` throws after the
   addon has partially written a `.bin` to disk, the next
   `beginCustom` would `fsPromises.access(cachePath)` → true and
   trust the half-primed file as a valid cache (no rollback hook is
   registered yet — the handler hasn't seen the `TurnHandle`). Wrap
   both `beginCustom` and `beginAuto` prime calls in a shared
   `primeOrCleanup` helper that best-effort unlinks the partial file
   before re-throwing the original prime error. Adds a bare-only unit
   test asserting the on-disk file is removed and the init flag stays
   unset on the failed-prime path.

2. RequestRegistry.begin: when `parentSignal` was already aborted at
   begin time, line 271 aborts the controller but the `state` ternary
   still landed `"running"`, exactly the "momentarily-running with
   already-aborted signal" the preCancel branch was guarding against.
   Extend the ternary to cover both inputs and the existing
   `parentSignal already aborted` test now also asserts
   `ctx.state === "cancelling"`.

No behavior change on the happy path. Lint + typecheck + 351-test
unit suite green locally on the changed files.

* QVAC-18182 fix: prime is atomic — addon writes to .prime.tmp + atomic rename

Upgrade the previous reactive cleanup workaround (PR tetherto#2007 review by
@opaninakuffo) into a proactive atomic-by-construction design:

  - The session steers `model.run({ saveSessionPath })` to a sibling
    `cachePath + ".prime.tmp"` path.
  - Only after the prime closure resolves successfully do we promote
    the temp file to the canonical `cachePath` via `fsPromises.rename`
    (atomic same-volume on every host we target).
  - The canonical cache path is therefore *never* observable in a
    partial state — a thrown prime is indistinguishable on disk from
    a never-attempted prime, so the next existence probe (in-process
    or cross-process worker restart) cannot trust corrupt bytes.

Defensive details:
  - We unlink any leftover `.prime.tmp` *before* invoking the closure,
    so a deferred-write addon path can't accidentally promote
    stale-from-crash bytes left by a prior worker.
  - On prime success we probe the temp path before renaming. If the
    addon deferred its disk write (some llama.cpp paths flush lazily),
    the temp doesn't exist and we leave the canonical path absent —
    `verifySaveAndRecord` in `commitTurn` is the authoritative check.
  - On rename failure we unlink the temp and surface the rename error;
    rename atomicity guarantees the canonical path was untouched.
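
The temp-then-rename flow above could be sketched roughly as follows. This is an illustration under assumptions: the helper name, its signature, and the cleanup on a thrown prime are hypothetical, and `prime` stands in for the closure that calls `model.run({ saveSessionPath })`.

```typescript
import { promises as fsPromises } from "node:fs";

// Hypothetical helper: name, signature and error handling are assumptions.
async function primeAtomically(
  cachePath: string,
  prime: (savePath: string) => Promise<void>,
): Promise<void> {
  const tmpPath = cachePath + ".prime.tmp";

  // Sweep any leftover temp file from a crashed worker before priming.
  await fsPromises.rm(tmpPath, { force: true });

  try {
    await prime(tmpPath);
  } catch (err) {
    // Best-effort cleanup only; the canonical path was never written.
    await fsPromises.rm(tmpPath, { force: true }).catch(() => {});
    throw err;
  }

  // The addon may have deferred its write; if so, leave the canonical path absent.
  try {
    await fsPromises.access(tmpPath);
  } catch {
    return;
  }

  try {
    // Promote the fully written temp file to the canonical name.
    await fsPromises.rename(tmpPath, cachePath);
  } catch (err) {
    await fsPromises.rm(tmpPath, { force: true }).catch(() => {});
    throw err;
  }
}
```

The point of the shape is that a failed cleanup can only leave a stray `.prime.tmp` behind, which nothing trusts, rather than a half-written file at `cachePath`.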

Why this is better than the prior `primeOrCleanup`:
  - Best-effort `unlink` was load-bearing for correctness in the old
    design — a failed unlink left a half-primed canonical file the
    next `beginCustom` would trust. The new design moves the only
    possible "partial" file to a non-trusted name, so failed cleanup
    cannot corrupt the canonical name by construction.
  - The unit test no longer mocks the workaround surface; it asserts
    the actual invariant ("canonical path was never written") plus
    the positive rename and the leftover-sweep guarantees.

Tests: 3 bare-only kv-cache-session unit tests (throw-leaves-canonical-
untouched, success-promotes-via-rename, leftover-from-crash-is-swept).
Lint + typecheck + 351-test unit suite green locally on the changed files.

Long-term, the right fix is one layer down — the llama.cpp addon should
write transactionally itself and surface save errors instead of
swallowing them. When that lands, this helper collapses to a direct
`prime(cachePath)` call and the `verifySaveAndRecord` access-probe
fallback (TODO already documented) can be retired together. Filed as
a separate follow-up; out of scope for this PR.

* QVAC-18182 fix: replace prime-atomic helper with verifyPrimedFile post-prime probe

Audit of the llama.cpp addon (`CacheManager::writeCacheFile` →
`llama_state_save_file`, return value swallowed; `LlamaModel::
processPromptImpl` lines 575-599) shows the bug shape Opanin flagged
on PR tetherto#2007 — "primeIfMissing throws after a partial save" — does not
actually fire. The save call is the very last operation on the
prefill path, the addon ignores its return value, and any earlier
throw means no save was attempted. So:

  - `primeOrCleanup` (`ac8d2d74e`) and the upgrade to
    `primeAtomically` (`a7420f3e6`) defended against a code path that
    the addon does not produce.
  - The real corruption shape is silent partial writes (addon's
    `llama_state_save_file` returns false, addon ignores it, file is
    half-written or empty). Atomic temp+rename did NOT close this
    gap — on a "silent partial" the closure resolves successfully and
    the helper would happily promote the partial `.prime.tmp` to the
    canonical path.

Replace both helpers with a small `verifyPrimedFile` that mirrors the
existing `verifySaveAndRecord` access-probe pattern used at commit
time, applied at prime time:

  - After a successful prime closure, `fsPromises.stat` the canonical
    path. If it doesn't exist (addon was interrupted before save) or
    has size 0 (addon save call produced an empty file), throw and
    best-effort unlink the empty leftover so the next existence probe
    doesn't trust it.
  - This catches the two failure modes Opanin's concern was a proxy
    for (cancelled-mid-prime; addon save quietly produced nothing)
    without claiming defense against partial-but-nonzero writes,
    which can only be closed at the addon layer.
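
A rough sketch of the post-prime probe described above, assuming it only needs `fsPromises.stat` and a best-effort unlink. The signature and error messages are hypothetical; the real helper's error type is not shown in this commit message.

```typescript
import { promises as fsPromises } from "node:fs";

// Hypothetical sketch of the probe; not the SDK's actual implementation.
async function verifyPrimedFile(cachePath: string): Promise<void> {
  let size: number;
  try {
    size = (await fsPromises.stat(cachePath)).size;
  } catch {
    // Simplification: any stat failure is treated as "file missing".
    throw new Error(`prime produced no cache file at ${cachePath}`);
  }
  if (size === 0) {
    // Best-effort unlink so the next existence probe does not trust an empty file.
    await fsPromises.unlink(cachePath).catch(() => {});
    throw new Error(`prime produced an empty cache file at ${cachePath}`);
  }
  // A non-empty file passes; a partial-but-nonzero write is not detectable here.
}
```

As the commit message notes, this catches only the missing-file and empty-file cases; a truncated file with some bytes in it still passes the check.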

The `RequestRegistry` parent-aborted-state fix (`ctx.state` ternary
covers `opts.parentSignal?.aborted`) from `ac8d2d74e` is preserved
unchanged — it stands on its own as a correct response to Opanin's
second comment.

Long-term root cause stays the addon: have
`CacheManager::writeCacheFile` check `llama_state_save_file`'s return
value and throw on failure. When that lands, both `verifyPrimedFile`
and `verifySaveAndRecord`'s access-probes can be retired together.
Filed as a separate follow-up — out of scope for this PR.

Tests: 3 prior bare-only prime-atomic tests removed; 2 new bare-only
tests added (no-file and empty-file rejection paths). Lint +
typecheck + 330-test unit suite green locally on the changed files
(pre-existing sdcpp-generation lint errors unchanged).

* QVAC-18182 doc: kv-cache rule documents addon non-transactional save + matched access-probes

Extend the "Cache Initialization (primeIfMissing)" section in
.cursor/rules/sdk/docs/kv-cache-system.mdc with the corrected
addon-contract analysis:

  - The llama.cpp addon's CacheManager::writeCacheFile discards
    llama_state_save_file's bool return; maybeSaveCacheToDisk is the
    last call on the prefill path. So no closure-rejection path can
    coexist with a partial file on disk.
  - Document the four real outcomes as a table (interrupted /
    success / silent partial write / pre-eval throw) so future
    readers can see why the SDK takes the shape it does.
  - Pin both SDK-side defenses as a matched pair: verifyPrimedFile
    at prime time (added in this PR) and verifySaveAndRecord at
    commit time (existing). Both are honest about what they catch
    (missing / empty file) and what they don't (partial-but-nonzero,
    only addon fix can close that).
  - Reference the addon-layer follow-up
    (1214778658064488 / "throw on llama_state_save_file failure")
    so the next contributor knows both probes will be retired
    together when the addon throws on save failure.

No code change — rule-only update.
…#2011)

* chore: Bump @qvac/rag from ^0.4.4 to ^0.5.0 to pick up its package.json imports-map fix for bare-crypto / bare-fetch.

* fix: Use bundler-visible asset references in node-rpc-client so static analysis can pull worker.js into the bundle.
…2021)

* infra: introduce new self-hosted runners to diffusion and sdk

* infra: try new runners in cpp-tests-llm

* infra: try new runners in LLM desktop and mobile integration tests

* fix: install mesa-vulkan-drivers on github-hosted ubuntu arm

* infra: use non-gpu ubuntu2404 and windows-2025 in diffusion cpp tests

* infra: use ubuntu2404 gpu runner in diffusion integration tests

* infra: sort out self-hosted runners for LLM cpp tests and integration test

* use ubuntu2204 gpu runner besides the ubuntu2404 gpu runner in diffusion integration test

* fix: correct ubuntu 22.04 usage in diffusion mobile integration test

* infra: add qvac-ubuntu2404-x64-gpu to list of runners to test in sdk

* infra: use self-hosted runners in embed cpp tests and integration tests

* infra: use self-hosted runners in ocr and tts integration tests

* fix: correct typo in step name in embed integration test

* fix: correct typo in input description in sdk desktop test

* infra: use qvac-ubuntu2204-x64 for android builds in sdk android test

* infra: use self-hosted runners in nmtcpp tests

* infra: use self-hosted runners in parakeet and whisper integration tests

* infra: use self-hosted runners in decoder audio integration test

* infra: use self-hosted runners in pr-test addon

* infra: use self-hosted runners in reusable-prebuilds.yml

* fix: set vulkan sdk path on windows & linux x64 in reusable-prebuilds.yml

* infra: use self-hosted runners in cpp lint & cpp tests in nmtcpp

* fix: revert hard-coding temp branch as ref for the cpp lint and tests workflows in nmtcpp

* test: pin temp-self-hosted-runners branch for uses: workflows

* fix: runs-on matrix.runner not matrix.os in reusable-prebuilds

* fix: cpp-lint rework setup bare tooling

* fix: nmt int mobile test don't install node on linux

* infra: bare tooling and expo ensured pre-installed on ubuntu runners

* fix: only install expo/cli for iOS platform in nmt mobile int test

* fix: bare tooling linux-x64 set path, and non-linux-x64 install otherwise

* fix: clean matrix of os/runners in reusable-prebuilds

* fix: android arm64 in reusable-prebuilds fixes

* fix: matrix.include.os for ubuntu-24.04-arm

* fix: set vcpkg install root var for windows runners

* fix: reusable prebuilds fixes for android arm64 on ubuntu-24.04

* fix: setup bare tooling (windows) in reusable-prebuilds

* infra: bare tooling is ensured to be installed on windows-2025 in reusable-prebuilds

* infra: fix stripping static libraries (.a) from prebuilds (Windows)

* fix: yaml indentation issue in reusable-prebuilds

* infra: npm and bare tooling ensured on x64 ubuntu and windows in nmt int test

* fix: runs-on matrix.runner before os in nmt int tests

* fix: use ${{ runner.temp }} instead of /tmp in nmt int test

* fix: create ${{ runner.temp }}/tmp dir before using it

* fix: use tmp in local dir in nmt int test

* infra: runs-on: ${{ matrix.runner || matrix.os }} in nmt mobile int test

* infra: use self-hosted runners for all transcription-whispercpp workflows

* infra: use tmp-self-hosted-runners branch for whisper mobile int test

* infra: add bare tooling to GH PATH in cpp-test-coverage-transcription-whisper

* infra: Add bare tooling to PATH (Linux x64) in integration-test-transcription-whispercpp

* infra: use tmp-self-hosted-runners branch for embed

* fix: correct ubuntu version check in cpp-tests-embed

* fix: correct vcpkg cache to work on windows in cpp-tests-embed

* test: fix linux cpp-tests-embed

* fix: override .lsan-suppressions.txt path relative to workdir in cpp-tests-embed

* infra: review on-pr-diffusion-cpp to ensure using self-hosted runners

* fix: add missing arch: x64 in 2 places

* infra: review on-pr-llm-llamacpp to ensure using self-hosted runners

* fix: cpp-tests-llm .lsan-suppressions.txt should be in workdir for linux x64 to succeed

* infra: review on-pr-tts-onnx to ensure using self-hosted runners

* infra: review on-pr-ocr-onnx to ensure using self-hosted runners

* infra: review on-pr-onnx to ensure using self-hosted runners

* infra: ensured python 3.12 on windows for tts-onnx, possibly others

* fix: no sudo in cpp-test-coverage-tts-onnx

* infra: ensure self-hosted runner usage in integration tests and ensure consistency in not using global tmp directory

* infra: ensure consistency in reviewed mobile integration tests

* fix: don't use /tmp in integration-mobile-test-ocr-onnx

* infra: ensure self-hosted runners are used in integration-test-tts-onnx

* fix: don't write in global /tmp on linux in integration-mobile-test-ocr-onnx

* fix: missed replacing ai-run-linux-gpu w/ qvac-ubuntu2404-x64-gpu in integration-test-transcription-whispercpp

* fix: increase timeout-minutes to 30 in integration-test-ocr-onnx run integration test step

* fix: timeout-minutes at the job level instead of step level in integration-test-ocr-onnx

* infra: switch to non-gpu runners in integration-test-ocr-onnx

* infra: review on-pr-tts-ggml to ensure using self-hosted runners

* infra: increase timeout-minutes from 60 to 120 in integration-test-tts-ggml

* infra: review on-pr-transcription-parakeet to ensure self-hosted runners

* infra: missed self-hosted runner in cpp-test-coverage-transcription-parakeet

* infra: review on-pr-decoder-audio to ensure self-hosted runners

* fix: typo in integration-test-decoder-audio

* infra: review on-pr-bci-whispercpp to ensure self-hosted runners

* fix: revert name ocr and tts in on-pr-ocr-onnx and on-pr-tts-onnx

* fix: replace inputs.platform and inputs.arch with matrix.platform and matrix.arch in reusable-prebuilds.yml

* fix: clone stable vcpkg branch 2025.12.12 in cpp-tests- diffusion, embed, llm

* fix: replace matrix.platform == 'x64' with matrix.arch == 'x64'

* fix: suffix gpu matrix job names

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix: replace matrix.os with matrix.runner in integration-test-transcription-whispercpp

* chore: remove commented-out workflow blocks

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix: align setup-node gating in workflows

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix: use runner temp for unix prebuild extraction

Co-authored-by: Cursor <cursoragent@cursor.com>

* infra: add workflow_dispatch to on-pr-test-sdk and pin test-sdk workflow to branch

* fix: only one per platform in test-sdk

* fix: fix logic in integration tests, and fix in cpp-tests-diffusion

* infra: revert adding workflow_dispatch to on-pr-test-sdk and pinning of test-sdk workflow to branch

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
…assification (tetherto#1727)

* QVAC-17481 feat: add @qvac/classification-ggml MobileNetV3 image classification addon

Introduces a new inference addon that classifies images into three
classes (food / report / other) using a fine-tuned MobileNetV3-Small
CNN running on the libggml CPU backend. Follows the established QVAC
addon pattern (see qvac-lib-infer-nmtcpp, lib-infer-diffusion).

## What this PR ships

- New package `packages/qvac-lib-infer-ggml-classification/` publishing
  as `@qvac/classification-ggml`:
  - Native addon: custom 34-layer MobileNetV3-Small compute graph built
    directly against the public `ggml.h` / `ggml-backend.h` API — no
    llama.cpp application-layer dependency, so the addon remains
    forward-compatible with future `libggml` upstream merges.
  - Load-time BatchNorm fold with `eps = 0.001` (the architecture-
    correct value; `1e-5` causes normalisation drift across all 34
    layers). Depthwise separable convolutions, squeeze-and-excite
    blocks, HardSwish / HardSigmoid / ReLU activations all wired
    through `ggml_conv_2d`, `ggml_conv_2d_dw`, `ggml_pool_2d`,
    `ggml_hardswish`, `ggml_hardsigmoid`.
  - FP16 GGUF weights bundled inside the package (2.94 MB); class
    labels are read from the GGUF `mobilenet.class_N` metadata so a
    future fine-tune can ship different class names without a code
    change.
  - Public JS API: `new ImageClassifier({ modelPath?, logger?,
    threads?, nativeLogger? })` + `load()` / `classify(buffer, opts?)`
    / `unload()` / `destroy()`. Accepts JPEG, PNG, or raw-RGB input;
    validates at the JS layer before reaching native code so no bad
    input reaches libggml.
  - `nativeLogger` opt-in (default `false`): the underlying
    `qvac-lib-inference-addon-cpp` JsLogger holds a process-wide
    static `uv_async_t` that is not safe across rapid create/destroy
    cycles, so the native C++→JS log bridge is disabled unless the
    caller explicitly opts in. JS-level logging always flows through
    the caller's `logger`.
  - Image preprocessing via vendored-through-vcpkg `stb_image` +
    `stb_image_resize2` (bilinear resize to 224×224, ImageNet
    normalisation, WHCN layout).
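
For readers unfamiliar with the load-time BatchNorm fold mentioned above, here is a minimal sketch of the standard folding arithmetic in TypeScript. It is not the addon's C++ code: the data layout (one flat weight array per output channel) and all names are assumptions made for illustration; only `eps = 0.001` comes from the text.

```typescript
interface BatchNormParams {
  gamma: number[];    // per-channel scale
  beta: number[];     // per-channel shift
  mean: number[];     // running mean
  variance: number[]; // running variance
}

// weights[c] holds every weight of output channel c; bias[c] is 0 when the
// convolution has no bias. Layout and names are hypothetical.
function foldBatchNorm(
  weights: number[][],
  bias: number[],
  bn: BatchNormParams,
  eps = 1e-3,
): { foldedWeights: number[][]; foldedBias: number[] } {
  // scale_c = gamma_c / sqrt(var_c + eps)
  const scale = bn.gamma.map((g, c) => g / Math.sqrt(bn.variance[c] + eps));
  // w'_c = w_c * scale_c
  const foldedWeights = weights.map((w, c) => w.map((v) => v * scale[c]));
  // b'_c = beta_c + (b_c - mean_c) * scale_c
  const foldedBias = bias.map((b, c) => bn.beta[c] + (b - bn.mean[c]) * scale[c]);
  return { foldedWeights, foldedBias };
}
```

Because `eps` sits inside the square root, folding with a different value than the one used in training changes every channel's scale slightly, which is the drift the bullet above warns about.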

## Build + tests

- `bare-make` + `cmake-bare` + `cmake-vcpkg` build, targeting
  `ggml::ggml` / `ggml::ggml-base` / `ggml::ggml-cpu` and `stb` from
  the shared QVAC vcpkg registry.
- C++ GoogleTest suite covering graph shape (34 conv + 2 linear + 9
  SE blocks), load + inference, determinism, `topK` filter, BN
  epsilon guard, and full preprocessor behaviour.
- brittle + bare JS integration tests covering load, classify (all 6
  public sample images under `test/images/`), `topK`, raw RGB input,
  and every error path: null, empty buffer, corrupted JPEG,
  unsupported format (BMP), mismatched dimensions, pre-load /
  post-unload, tiny upscale, load/unload cycles.
- Mobile test scaffolding following the shared convention:
  `scripts/generate-mobile-integration-tests.js`,
  `scripts/validate-mobile-tests.js`, `test/mobile/
  {integration-runtime.cjs, integration.auto.cjs, README.md,
  testAssets/.gitignore}`. The auto-generated `integration.auto.cjs`
  wraps every `test/integration/*.test.js` so the shared
  `qvac-test-addon-mobile` framework picks them up on Android and iOS
  automatically.
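
The JS-layer input validation mentioned under the public API, and exercised by the error-path tests above, might look something like the sketch below. The function name, the `RawDims` option shape, and the error messages are hypothetical; the magic-byte prefixes for JPEG and PNG are the standard ones.

```typescript
type ImageKind = "jpeg" | "png" | "raw-rgb";

// Hypothetical options shape for raw input; the addon's real option names may differ.
interface RawDims {
  width: number;
  height: number;
}

function startsWith(buf: Uint8Array, magic: number[]): boolean {
  return buf.length >= magic.length && magic.every((b, i) => buf[i] === b);
}

function detectImageKind(buf: Uint8Array | null | undefined, raw?: RawDims): ImageKind {
  if (!buf || buf.length === 0) {
    throw new TypeError("image buffer is null or empty");
  }
  if (raw) {
    // Raw RGB carries no header, so the only check is the byte count.
    if (buf.length !== raw.width * raw.height * 3) {
      throw new RangeError("raw RGB length does not match width * height * 3");
    }
    return "raw-rgb";
  }
  if (startsWith(buf, [0xff, 0xd8, 0xff])) return "jpeg";
  if (startsWith(buf, [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a])) return "png";
  // Anything else (for example BMP, which starts with "BM") is rejected here.
  throw new TypeError("unsupported image format (expected JPEG, PNG, or raw RGB)");
}
```

A check like this only inspects the header, so a corrupted JPEG with an intact prefix would still pass it and would have to be caught by the decoder.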

## CI workflows

Four addon-scoped workflows (path-filtered to this package):

- `on-pr-qvac-lib-infer-ggml-classification.yml` — authorize, sanity
  checks, TypeScript declaration check, C++ lint, prebuild matrix,
  desktop integration tests, mobile integration tests, merge-guard.
- `prebuilds-qvac-lib-infer-ggml-classification.yml` — Linux x64,
  Linux arm64, Android arm64, macOS arm64, iOS arm64, Windows x64
  prebuild matrix.
- `integration-test-qvac-lib-infer-ggml-classification.yml` — desktop
  end-to-end tests with the shared performance reporter writing a
  GitHub step summary.
- `integration-mobile-test-qvac-lib-infer-ggml-classification.yml` —
  AWS Device Farm Android + iOS runs via the
  `tetherto/qvac-test-addon-mobile` framework.

## Public-data / test-image policy

All public correctness assertions in this package are scoped to the 6
test images under `test/images/` (2 per class). No confidential
fine-tuning numbers, validation-set sizes, per-class metrics, or
references to any internal validation dataset appear in this PR, in
any file it ships, or in CI logs. Internal numerical-equivalence
gating against an ONNX FP32 reference is handled pre-release by a
development-only script that is not part of this PR.

## Out of scope for this PR

- SDK plugin / schema integration (`packages/sdk/**`) lands in a
  follow-up PR after `@qvac/classification-ggml@0.1.0` is published
  to npm. This mirrors the diffusion rollout (#656 → release → #1021).
- GPU backends (Vulkan / Metal / CUDA): CPU-only for v1.0.

Made-with: Cursor

* QVAC-17481 fix(ci): correct setup-bare-tooling action name in classification workflows

The prebuild and integration-test workflows for @qvac/classification-ggml
referenced `tetherto/qvac/.github/actions/setup-bare-toolchain`, which
does not exist. The action is named `setup-bare-tooling` (same name used
by the llamacpp-llm, nmtcpp, and diffusion addons at the identical
pinned SHA). All 6 prebuild matrix jobs failed at step 1 with
"Can't find 'action.yml' ... for action 'setup-bare-toolchain'" until
this rename is in place.

Files: .github/workflows/prebuilds-qvac-lib-infer-ggml-classification.yml
  .github/workflows/integration-test-qvac-lib-infer-ggml-classification.yml
Made-with: Cursor

* QVAC-17481 fix(ci): add per-platform vcpkg/NDK/Apple-clang setup to classification prebuilds

The classification prebuilds workflow was missing the per-platform
toolchain steps that sibling addons (diffusion, nmtcpp) have after
`setup-vcpkg-cache`. As a result, `VCPKG_ROOT` was never exported,
CMake couldn't locate the vcpkg toolchain, and `bare-make build`
failed on every platform.

Changes to .github/workflows/prebuilds-qvac-lib-infer-ggml-classification.yml:

  - setup-vcpkg-cache: drop unknown inputs `vcpkg-path` and
    `github-packages-token` (action only accepts platform, arch,
    s3-bucket-path). Was silently ignored but emitted warnings.

  - Add per-OS vcpkg bootstrap / configuration:
      macOS (darwin, ios):  clone microsoft/vcpkg tag 2025.12.12,
                            bootstrap, export VCPKG_ROOT.
      Linux (linux, android runners): export
                            VCPKG_ROOT=$VCPKG_INSTALLATION_ROOT.
      Windows:              export VCPKG_ROOT from
                            $env:VCPKG_INSTALLATION_ROOT with
                            backslash-to-forward-slash normalisation.

  - Windows-only: set CMAKE_GENERATOR="Visual Studio 17 2022" and,
    for the x64 matrix row, CMAKE_GENERATOR_PLATFORM=x64.

  - Android-only: export ANDROID_NDK / ANDROID_NDK_HOME /
    ANDROID_NDK_ROOT from ANDROID_NDK_LATEST_HOME, derive
    ANDROID_TOOLCHAIN_ROOT, set ANDROID_NATIVE_API_LEVEL=24.

  - iOS and darwin: move Homebrew llvm / llvm@18 aside so the Apple
    toolchain clang is on PATH (matches diffusion).

All additions mirror the working pattern in
prebuilds-lib-infer-diffusion.yml and
prebuilds-qvac-lib-infer-nmtcpp.yml at the same pinned action SHA.
No Vulkan or apt X11 steps were added: this addon is CPU-only ggml
and has no graphics dependencies.

Made-with: Cursor

* QVAC-17481 fix: add missing <limits> include and CI build-failure diagnostics

Two related changes to unstick the prebuild matrix:

1. addon/src/model-interface/ImagePreprocessor.cpp uses
   std::numeric_limits<int>::max() but does not #include <limits>.
   MSVC pulls <limits> in transitively (via <algorithm> in its STL),
   but libc++ and libstdc++ on clang/gcc do not. This is the most
   plausible reason all five non-Windows prebuild jobs (linux-x64,
   linux-arm64, android-arm64, darwin-arm64, ios-arm64) failed
   identically at `bare-make build` while the Windows host build
   succeeded.

2. prebuilds-qvac-lib-infer-ggml-classification.yml gains a
   `Dump build context on failure` step that runs only if
   `bare-make build` fails. It prints toolchain identity, lists the
   build/ tree, tails CMake configure logs, dumps any *.log under
   build/, and tails up to 20 vcpkg buildtree logs. Mirrors the
   `Dump vcpkg build logs on failure` pattern in
   prebuilds-lib-infer-diffusion.yml. Without this, every CI failure
   currently surfaces only as `Process completed with exit code 1.`,
   which is essentially undebuggable from the run summary page.

Files: packages/qvac-lib-infer-ggml-classification/addon/src/model-interface/ImagePreprocessor.cpp
  .github/workflows/prebuilds-qvac-lib-infer-ggml-classification.yml
Made-with: Cursor

* QVAC-17481 fix(ci): use --platform (not --target) for bare-make generate

Root cause confirmed from job log of run 24850328468 (linux-x64):
  bare-make generate --target linux --arch x64
  Bail: UNKNOWN_FLAG: target

The bare-make CLI installed by setup-bare-tooling does not accept
`--target`; it only accepts `--platform`. Diffusion and nmtcpp both
use `--platform`. Locally I had an older bare-make that accepted
`--target` as an alias, which masked the bug on my Windows host.

Step 17 (Generate build) was failing immediately with the above
"Bail: UNKNOWN_FLAG", causing every downstream step (build,
install) to fail too across all 6 prebuild matrix jobs.

Also harden the diagnostic step `Dump build context on failure`:
disable `-e` and `pipefail` for that step so a missing `build/`
directory or empty `find` result no longer makes the diagnostic
step itself exit non-zero (it should never mask the real failure).

Files: .github/workflows/prebuilds-qvac-lib-infer-ggml-classification.yml
Made-with: Cursor

* QVAC-17481 fix: pin ggml to CPU-only feature set + guard backend iteration

CI runs were failing because the default ggml vcpkg feature set pulls
in the `vulkan` (Linux/Windows/Android) and `metal` (Apple) GPU
backends, which forces `find_package(Vulkan)` at configure time and
forces the prebuilds workflow to install the Vulkan SDK on every
runner. Since this addon is CPU-only by design (only ever calls
ggml_backend_cpu_init), the GPU backends are dead weight: extra
compile time, extra dependencies in shipped prebuilds, and extra
runtime requirements on user machines (e.g. libvulkan.so.1).

Two related changes, no functional impact on the addon itself:

1. packages/qvac-lib-infer-ggml-classification/vcpkg.json
   Add `"default-features": false` to the ggml dependency. This
   opts out of vulkan / metal / cuda / opencl while keeping the
   core CPU backend (which is the implicit base, not a named
   feature). Verified locally on win32-x64: vcpkg rebuilt
   `ggml:x64-windows@2026-01-30#5` from source in 26s without
   Vulkan, generate + build + install all green, and the JS
   integration test ran the model end-to-end producing correct
   top labels (food/report/other) for every sample image.

2. packages/qvac-lib-infer-ggml-classification/CMakeLists.txt
   Guard the GGML_AVAILABLE_BACKENDS iteration with
   `if(TARGET ggml::${_backend})`. The upstream variable
   advertises every backend the port knows about, but real
   CMake targets only exist for backends that were actually
   built. Without the guard, add_bare_module's
   get_target_property() crashes on Android (where Vulkan and
   OpenCL are listed as available but not built). Defensive
   change; no behavioural difference when targets do exist.

Local artifact size: prebuilds/win32-x64/qvac__classification-ggml.bare
is 1.6 MB; no shipped vulkan loader.

Made-with: Cursor

* QVAC-17481 fix(ci): match prebuild- artifact prefix in mobile tests

The mobile integration workflow downloaded artifacts with patterns
`android-*` / `ios-*` (PREBUILD_ARTIFACT_PREFIX was empty), but the
prebuilds workflow names artifacts `prebuild-android-arm64` /
`prebuild-ios-arm64`. Result: `Total of 0 artifact(s) downloaded`,
followed by "ERROR: No prebuilds found!" — both Android and iOS
mobile jobs failed at this exact step in run 24891210942.

Set PREBUILD_ARTIFACT_PREFIX to "prebuild-" so the resulting patterns
become `prebuild-android-*` and `prebuild-ios-*`, matching the actual
artifact names. Mirrors how the desktop integration workflow already
filters (it uses `prebuild-${platform}-${arch}*` directly).

File: .github/workflows/integration-mobile-test-qvac-lib-infer-ggml-classification.yml
Made-with: Cursor

* QVAC-17481 fix(model): zero-input warmup pass to defeat cold-inference NaN

ggml's backend graph allocator leaves intermediate tensor buffers and
the input/output tensors uninitialised after `buildGraph` returns.
Whatever stale heap residue happens to occupy those addresses can
leak into the very first inference and produce non-finite logits
on a heap-state-dependent basis.

CI run 24891210942 caught this on win32-x64: meal_1.jpg (the first
sample classified after instance creation) failed assert 9
(`Math.abs(sum - 1) < 1e-3` -- probabilities sum was not ~1) and
assert 10 (`result[0].confidence >= result[1].confidence` -- sort
comparison broke because the first confidence was NaN). Asserts 11..72
covering the other five sample images all passed: by then the second
inference had overwritten the dirty buffers with real data.
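
To see why a single NaN trips both assertions at once, the sketch below restates the two checks as a small TypeScript helper. The `label` field and the helper name are assumptions; `confidence` and the `1e-3` tolerance come from the assertions quoted above.

```typescript
interface ClassificationResult {
  label: string;      // assumed field name
  confidence: number;
}

// Hypothetical helper mirroring the two failing assertions.
function describeResultProblems(results: ClassificationResult[]): string[] {
  const problems: string[] = [];
  if (results.some((r) => !Number.isFinite(r.confidence))) {
    problems.push("non-finite confidence");
  }
  // A NaN anywhere makes the sum NaN, and every comparison with NaN is false.
  const sum = results.reduce((acc, r) => acc + r.confidence, 0);
  if (!(Math.abs(sum - 1) < 1e-3)) {
    problems.push("probabilities do not sum to ~1");
  }
  for (let i = 1; i < results.length; i++) {
    if (!(results[i - 1].confidence >= results[i].confidence)) {
      problems.push("not sorted by descending confidence");
      break;
    }
  }
  return problems;
}
```

With a NaN in the first slot, the sum check and the ordering check both fail even if the remaining entries are well formed, which matches the asserts 9 and 10 pattern reported for `meal_1.jpg`.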

This is a classic uninit-memory bug: behaviour depends on whatever
the heap happens to contain at process start. My local Windows
build did not trip on it (different heap layout); the Azure CI
runner did. Same compiler family, same code, different result.

Fix: at the end of `ClassificationModel::load()`, run one full
forward pass with a zero-filled input tensor and discard the output.
This forces ggml's compute graph to write every backend buffer with
a deterministic value before any user-visible classify() call ever
sees the model. Cost is one cold inference per `load()` (~50-200 ms
on a CPU runner), paid once at addon startup, never visible to the
caller.

Local validation on win32-x64 with this change: integration test 1
(72/72 asserts including all sum-to-one and sort-desc checks) now
passes deterministically across rebuilds. The unrelated lifecycle
SIGSEGV between separate ImageClassifier instances (likely in
qvac-lib-inference-addon-cpp's JobRunner / OutputCallbackJs uv_
resources, not addressed here) still surfaces, just later in the
test run -- that needs a separate investigation in addon-cpp.

File: packages/qvac-lib-infer-ggml-classification/addon/src/model-interface/ClassificationModel.cpp
Made-with: Cursor

* QVAC-17481 fix(model): full-pipeline warmup eliminates win32 cold-inference NaN

The previous zero-input warmup (commit af12cdd1) wrote zeros directly
to the input tensor and ran ggml_backend_graph_compute. CI run
24892803959 showed it was insufficient: win32-x64 still failed
asserts 9 + 10 on meal_1.jpg with NaN in result[0].confidence,
while linux-arm64 / darwin / linux-x64 all passed.

Hypothesis: ggml's CPU backend on MSVC has lazy-init code paths
(SIMD kernel JIT / FP state setup) that only trigger on non-trivial
inputs reaching the post-preprocess range, and the zero-input
warmup didn't exercise them. The bug therefore surfaces on the
first real classify() with an ImageNet-normalised image.

Fix: replace the synthetic warmup with one that goes through the
EXACT same pipeline classify() uses end-to-end:
  1. Synthesise a small (32x32) raw RGB buffer with a deterministic
     non-zero gradient pattern (uint8 values from `(i * 7) & 0xFF`).
  2. Run preprocess::preprocessToTensor on it (resize to 224x224 +
     ImageNet normalise + channel reorder to WHCN).
  3. ggml_backend_tensor_set the result, run the full compute graph,
     and read the output back via ggml_backend_tensor_get.

Cost: one full classify-equivalent pass at load() time
(~50-200 ms on a CPU runner), paid once per ImageClassifier instance,
never visible to the caller. Output is discarded; the goal is to
leave every backend buffer fully written and every lazy-init code
path exercised before user-visible classify() runs.

Local validation on win32-x64: 14/14 integration tests pass with
this change (was failing test 1 asserts 9 + 10 on meal_1 before).
Also applies the clang-format-19 layout the cpp-lint check expected,
unblocking that job.

File: packages/qvac-lib-infer-ggml-classification/addon/src/model-interface/ClassificationModel.cpp
Made-with: Cursor

* QVAC-17481 fix(addon): drain in-flight job in unload(); persistent perf reporting

Two related changes that together unblock multi-instance integration
tests across linux-x64 / darwin-arm64 / android / ios and address
the inference-latency-visibility ask.

1. addon.js — make unload() wait for the in-flight job to settle

   The previous unload() flow rejected this._pending immediately and
   then synchronously called binding.destroyInstance(). The native
   side (qvac-lib-inference-addon-cpp's JobRunner uses a worker
   thread; OutputCallBackJs uses a uv_async_t handle) often still
   had a callback pending at that moment, and destroying the
   instance underneath the in-flight callback raced with the
   uv_close lifecycle. The result was a SIGSEGV (use-after-free)
   observed across linux-x64 (both ubuntu-22.04 + 24.04),
   darwin-arm64, and the on-device Android/iOS Device Farm jobs
   in CI runs 24891210942 and 24892803959. linux-arm64 happened to
   win the race on those runs but the bug is fundamentally
   non-deterministic.

   Fix: track a separate `_pendingSettled` Promise that resolves
   the moment _outputCallback fires (whether the user-facing
   classify() Promise resolved or rejected). unload() now awaits
   that signal before calling destroyInstance, so the worker
   thread / async handle have provably finished when the native
   teardown runs. The user-facing classify() Promise contract is
   unchanged.

   This is a correctness improvement to the ImageClassifier API
   contract: after `await classifier.unload()` returns, native
   resources are now genuinely released (not "scheduled to be
   released, please don't peek").
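
A minimal sketch of the settle-then-destroy ordering described above. `ClassifierSketch` and `createFakeBinding` are hypothetical stand-ins (the fake binding simulates the native callback with a timer); only `_pending`, `_pendingSettled`, `_outputCallback`, `unload()` and `destroyInstance()` are names taken from this commit message, and the bodies are assumptions.

```javascript
// Hypothetical stand-in for the native binding: fires its callback later.
function createFakeBinding(log) {
  return {
    runJob(input, cb) {
      setTimeout(() => { log.push('callback'); cb(null, []); }, 10);
    },
    destroyInstance() { log.push('destroy'); },
  };
}

class ClassifierSketch {
  constructor(binding) {
    this._binding = binding;
    this._pending = null;
    this._settle = null;
    this._pendingSettled = Promise.resolve();
  }

  classify(input) {
    // Resolves when the native callback fires, whatever the outcome.
    this._pendingSettled = new Promise((resolve) => { this._settle = resolve; });
    return new Promise((resolve, reject) => {
      this._pending = { resolve, reject };
      this._binding.runJob(input, (err, out) => this._outputCallback(err, out));
    });
  }

  _outputCallback(err, out) {
    const pending = this._pending;
    this._pending = null;
    if (pending) {
      if (err) pending.reject(err);
      else pending.resolve(out);
    }
    if (this._settle) this._settle();
  }

  async unload() {
    if (this._pending) {
      // User-facing Promise is still rejected immediately, as before.
      this._pending.reject(new Error('unloaded'));
      this._pending = null;
    }
    await this._pendingSettled; // the in-flight callback has fired by now
    this._binding.destroyInstance();
  }
}
```

With this shape, `destroyInstance()` can only run after the in-flight callback has been delivered, which is the property the fix relies on.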

2. test/integration/utils.js + classify.test.js — crash-survivable
   inference-latency reporting + load-time metric

   The performance-report.json was previously only flushed in
   process.on('exit'), so any SIGSEGV mid-test discarded all
   collected metrics. Now we additionally flush the JSON file
   after every recorded metric. Even a partial run leaves a usable
   per-platform latency snapshot in the uploaded artifact.

   Also adds recordLoadTime(label, ms) to capture the cost of
   constructing + load()ing an ImageClassifier (warmup + GGML
   graph build + weights read), and threads it into the first
   integration test as `load:cold`. This complements the per-image
   classify timings already recorded as `classify:<file>` and
   uploaded as artifact `classification-perf-report-{platform}-{arch}`.
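
The flush-after-every-metric idea, sketched under assumptions: the report path, the JSON shape and the `recordMetric` helper are hypothetical; only `recordLoadTime(label, ms)` and the `load:cold` / `classify:<file>` labels come from this message.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical location; the real test utils choose their own path.
const reportPath = path.join(os.tmpdir(), 'performance-report.json');
const metrics = [];

function flushReport() {
  fs.writeFileSync(reportPath, JSON.stringify({ results: metrics }, null, 2));
}

function recordMetric(label, ms) {
  metrics.push({ label, ms });
  flushReport(); // written synchronously, so a later crash cannot lose it
}

// Same signature as the helper named above; the body is an assumption.
function recordLoadTime(label, ms) {
  recordMetric(label, ms);
}

recordLoadTime('load:cold', 123);
recordMetric('classify:meal_1.jpg', 45);
```

Rewriting the whole file on each metric is wasteful in general, but with only a handful of entries per run the cost is negligible next to an inference.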

Local validation on win32-x64: 14/14 tests pass cleanly with this
change set; performance-report.json contains 7 results
(load:cold + 6 classify:<file>) on disk before the process exits.

Files: packages/qvac-lib-infer-ggml-classification/addon.js
  packages/qvac-lib-infer-ggml-classification/test/integration/utils.js
  packages/qvac-lib-infer-ggml-classification/test/integration/classify.test.js
Made-with: Cursor

* QVAC-17481 fix(addon): defer OutputCallBackJs destruction to avoid use-after-free race

Root cause (in `qvac-lib-inference-addon-cpp:OutputCallBackJs.hpp`):
  The upstream destructor calls `uv_close(asyncHandle, deleter)` --
  which is asynchronous -- and then IMMEDIATELY runs
  `js_delete_reference` on its JS handle/callback refs before returning.
  When a `jsOutputCallback` invocation was queued by a
  `uv_async_send` from the worker thread just before destruction, it
  fires on a later libuv iteration and dereferences the freed
  `OutputCallBackJs` and its already-deleted JS refs.

  This explained the SIGSEGV (linux-x64 24.04, darwin-arm64) and the
  on-device APP CRASH (Android / iOS Device Farm) observed across rapid
  ImageClassifier create/destroy cycles in CI runs 24891210942,
  24892803959, 24897445066. The bug is timing-dependent, which is why
  linux-arm64 consistently wins the race and passes while other
  platforms fail.

Fix (this commit, in our binding.cpp only):
  Introduce a `DeferredOutputCallBackJs` wrapper that implements
  `addon_cpp::OutputCallBackInterface` by composing the upstream
  `addon_cpp::OutputCallBackJs` as a `unique_ptr` and forwarding
  `initializeProcessingThread / notify / stop` calls to it. The
  wrapper is what `AddonCpp` now owns; the inner upstream callback
  is owned by our wrapper.

  AddonCpp field destruction order is:
    1. `~AddonCpp` body: `outputCallback_->stop()` (our wrapper's
       stop forwards to inner).
    2. `jobRunner_` destroyed: JOINS the worker thread. No new
       `uv_async_send` can happen from this point on.
    3. `outputCallback_` destroyed: our wrapper's destructor runs.
    4. There may still be `uv_async_send` callbacks QUEUED before
       step 2 that are pending on the libuv loop.

  Our destructor releases ownership of the inner callback into a
  heap-allocated `uv_check_t` whose callback (firing AFTER the poll
  phase on the next libuv iteration -- i.e. after any queued async
  callback has fired safely against the still-alive inner) deletes
  the inner, then closes and deletes itself. The check handle is
  unref'd so it does not keep the libuv loop alive on its own.

  This is a real lifetime-management fix, not a timing workaround.
  When upstream's destructor is corrected, the wrapper becomes a
  pass-through with no functional effect. We will also submit the
  fix upstream.

Local validation on win32-x64:
  14/14 integration tests pass, 90/90 asserts, including test 14
  (`load -> unload -> load cycles do not leak handles`) which
  explicitly exercises the pattern that was racing the upstream bug.

File: packages/qvac-lib-infer-ggml-classification/addon/src/addon/AddonJs.hpp
Made-with: Cursor

* QVAC-17481 fix(model,test): defensive softmax/sort + per-inference diagnostic trace

Five related changes that together (a) make the classification
output well-formed under any numerical edge case and (b) give us
first-class visibility into whatever the model actually returns on
every CI platform. No workarounds or test-masking -- the C++ changes
apply uniformly to production classify() calls and the diagnostic
logs are plain stderr output behind an opt-in env var (plus always-on
per-image t.comment() in tests).

1. addon/src/model-interface/ClassificationModel.cpp -- softmax()

   Previously:
     - Called std::max_element on a span that could contain NaN
       (max_element behaviour on NaN is unspecified).
     - Skipped normalization when sum <= 0 but RETURNED the
       unnormalized probs (could leave callers with all-zero or
       non-sum-to-1 probabilities).

   Now:
     - Finds max by explicit isfinite() walk, defaulting to -inf if
       every logit is non-finite.
     - If max is non-finite (all NaN/Inf), returns a uniform
       distribution (1/N per class) so callers always see a valid
       probability vector that sums to 1.
     - Per-element exp() input is skipped when non-finite (produces 0
       for that element rather than NaN).
     - If the exponential sum is not finite or <= 0, falls back to
       uniform distribution instead of returning unnormalized zeros.

   This is defence in depth. MobileNetV3-Small on well-normalized
   input never produces NaN logits in practice, but if upstream ggml
   CPU backend ever surfaces a numerical bug (or a future quantised
   model does) we now cannot silently corrupt the user-visible
   probability distribution.
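
The "Now" rules above, ported to JavaScript purely for illustration (the real softmax() is C++, and `safeSoftmax` is a hypothetical name):

```javascript
// Illustrative port of the defensive softmax rules listed above.
function safeSoftmax(logits) {
  const n = logits.length;
  const uniform = () => new Array(n).fill(1 / n);

  // Explicit finite-only max; stays -Infinity if no logit is finite.
  let max = -Infinity;
  for (const x of logits) {
    if (Number.isFinite(x) && x > max) max = x;
  }
  if (!Number.isFinite(max)) return uniform();

  let sum = 0;
  const exps = logits.map((x) => {
    const e = Number.isFinite(x) ? Math.exp(x - max) : 0; // non-finite -> 0
    sum += e;
    return e;
  });
  if (!Number.isFinite(sum) || sum <= 0) return uniform();
  return exps.map((e) => e / sum);
}

console.log(safeSoftmax([1, NaN, 1]));   // the NaN entry gets probability 0
console.log(safeSoftmax([NaN, NaN, NaN])); // uniform fallback
```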

2. addon/src/model-interface/ClassificationModel.cpp -- std::sort

   Added explicit is-finite guards in the comparator. Non-finite
   confidences now compare as less than any finite value, giving
   strict-weak-ordering even with degenerate inputs. Previously, any
   NaN in the confidences would make the comparator non-strict-weak
   and std::sort behaviour undefined (one observed symptom: top
   class label at index 0 but some later index carrying a higher
   confidence).
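
The same comparator idea in JavaScript, as an illustration only (the commit changes the C++ std::sort comparator; `byConfidenceDesc` is a hypothetical name):

```javascript
// Descending by confidence; non-finite values rank below every finite one
// and compare equal to each other, which keeps the ordering consistent.
function byConfidenceDesc(a, b) {
  const aFinite = Number.isFinite(a.confidence);
  const bFinite = Number.isFinite(b.confidence);
  if (aFinite && bFinite) return b.confidence - a.confidence;
  if (aFinite) return -1; // a is finite, b is not: a first
  if (bFinite) return 1;  // b is finite, a is not: b first
  return 0;
}

const sorted = [
  { label: 'drink', confidence: 0.2 },
  { label: 'other', confidence: NaN },
  { label: 'food', confidence: 0.7 },
].sort(byConfidenceDesc);
console.log(sorted.map((r) => r.label));
```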

3. addon/src/model-interface/ClassificationModel.cpp -- trace hook

   New `QVAC_CLASSIFICATION_TRACE=1` env var toggles a per-inference
   stderr print of:
     - raw logits as read from the ggml output tensor
     - probabilities immediately after softmax (pre-sort)
     - final sorted results
   Off by default -- production users see nothing. Enabled in our CI
   integration-test workflow (in the third file below) so every run
   carries the numerical ground truth for every sample image. If a
   platform-specific anomaly ever recurs (e.g. the win32 meal_1
   oddity we have been chasing) the log lines let us diagnose
   without adding further instrumentation.

4. test/integration/classify.test.js

   Before each per-image assertion block, emit a `t.comment(...)`
   line containing the full sorted result (label + 6-digit
   confidence per entry, plus elapsed ms). Brittle surfaces comments
   in the TAP stream regardless of pass/fail, so every CI job log
   now records the actual model output side-by-side with the
   assertion outcome. This replaces the need for post-hoc
   instrumentation commits when diagnosing numerical issues.
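
A possible shape for that diagnostic line. `formatResults` and the exact layout are assumptions; only the content (label, 6-digit confidence, elapsed ms) comes from the description above.

```javascript
// Hypothetical formatter for the per-image diagnostic comment.
function formatResults(file, results, elapsedMs) {
  const entries = results
    .map((r) => `${r.label}:${r.confidence.toFixed(6)}`)
    .join(', ');
  return `${file} -> [${entries}] (${elapsedMs} ms)`;
}

// In a brittle test this string would be passed to t.comment(...).
console.log(formatResults('meal_1.jpg', [{ label: 'food', confidence: 0.707883 }], 42));
```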

5. .github/workflows/integration-test-qvac-lib-infer-ggml-classification.yml

   Set `QVAC_CLASSIFICATION_TRACE=1` on the integration-test step so
   the C++ trace lines land in CI logs by default. Bounded output
   (3 lines per inference, ~20 inferences per job), negligible cost.

Local validation on win32-x64:
  14/14 integration tests pass, 90/90 asserts. Trace output verified:
  all 6 sample images produce sensible logits and sum-to-1
  probabilities; top class matches expected label in every case.
  Trace lines and t.comment()s visible in both the pass and
  (hypothetically) fail paths, as intended.

Files: packages/qvac-lib-infer-ggml-classification/addon/src/model-interface/ClassificationModel.cpp
  packages/qvac-lib-infer-ggml-classification/test/integration/classify.test.js
  .github/workflows/integration-test-qvac-lib-infer-ggml-classification.yml
Made-with: Cursor

* QVAC-17481 fix: clang-format + defensive marshalling + finer test assertions

Three coordinated changes that (a) unblock cpp-lint, (b) make the
C++ -> JS marshalling robust against compiler code-gen quirks, and
(c) make every test failure self-diagnostic so we never have to add
post-hoc instrumentation again.

1. addon/src/model-interface/ClassificationModel.cpp -- clang-format

   Apply the exact diff that cpp-lint reported in run 24900278513:
   drop the blank line between <gguf.h> and the addon-cpp include,
   wrap the std::sort args one-per-line, and split the multi-arg
   static_cast<double>(...) chain in the trace fprintf to one arg
   per line. Pure formatting; no behaviour change.

2. addon/src/addon/AddonJs.hpp -- defensive marshalling +
   per-entry trace inside JsClassifyOutputHandler

   The lambda now reads the label and the confidence into named
   local variables (`labelString`, `confidenceFloat`, then
   `confidenceDouble = static_cast<double>(confidenceFloat)`)
   BEFORE handing them to `jsu::String::create` / `jsu::Number::create`.
   The previous inline expression
       jsu::Number::create(env, static_cast<double>(cppOut.results[i].confidence))
   produced 0 in JavaScript for index 0 only on win32-x64
   (clang-cl), while indices 1..N marshalled correctly --
   visible in run 24900278513 win32 log: C++ trace shows
   {food:0.707883} but JS receives {food:0.000000}, all other
   entries OK. Materialising the values into named locals
   forces the compiler to commit the values to memory before
   the call sequence and dodges that code-gen pattern. Linux,
   macOS, and Windows continue to pass; this is risk-free
   defence-in-depth even if Windows turns out to have a deeper
   issue.

   Also adds an opt-in trace line per array element (gated by
   the same QVAC_CLASSIFICATION_TRACE=1 env var as
   ClassificationModel::process()), printing label, float, and
   double values as the lambda actually sees them. Combined
   with the existing process()-level trace, we now get the full
   pipeline view -- raw logits -> probs -> sorted results ->
   per-entry marshalling -- on every CI run with no manual
   instrumentation needed.

3. test/integration/classify.test.js -- finer assertions

   Replace coarse "confidence is in [0,1]" with split assertions
   that distinguish: typeof number / Number.isFinite (NaN/Inf
   detection) / range check. Per-entry assertion messages now
   include the array index AND the actual value so a failure
   line tells you exactly what went wrong. Same treatment for
   the sum and the sort-desc checks.

   topK / sequential / raw-RGB tests gain explicit
   Number.isFinite checks plus t.comment() output of the full
   result, so they no longer silently swallow the kind of
   value-corruption bug that was hidden in test 2 of the
   previous CI run.
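
The split checks could look roughly like this. It is a framework-free sketch: `checkConfidence` is a hypothetical helper that returns failure messages, whereas the actual tests use brittle assertions.

```javascript
// Returns one diagnostic per failed check; an empty array means the value is fine.
function checkConfidence(value, index) {
  if (typeof value !== 'number') {
    return [`results[${index}].confidence is ${typeof value}, expected number`];
  }
  if (!Number.isFinite(value)) {
    return [`results[${index}].confidence is not finite: ${value}`];
  }
  if (value < 0 || value > 1) {
    return [`results[${index}].confidence is outside [0,1]: ${value}`];
  }
  return [];
}

console.log(checkConfidence(NaN, 0));
console.log(checkConfidence(0.707883, 0));
```

Each message carries both the index and the offending value, so a NaN at index 0 reads differently from an out-of-range value at index 2.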

Local validation on win32-x64:
  14/14 tests pass; assertion count went from 90/90 to 140/140
  with the new finite-checks. Marshalling trace verified emitting
  label / float / double per element under
  QVAC_CLASSIFICATION_TRACE=1.

Files: packages/qvac-lib-infer-ggml-classification/addon/src/model-interface/ClassificationModel.cpp
  packages/qvac-lib-infer-ggml-classification/addon/src/addon/AddonJs.hpp
  packages/qvac-lib-infer-ggml-classification/test/integration/classify.test.js
Made-with: Cursor

* QVAC-17481 fix(mobile,addon): mobile model path via testAssets + cpp-lint uv.h order

- `test/integration/utils.js`: add `resolveModelPath()` that resolves
  the GGUF weights via `global.assetPaths` on iOS/Android (the bare
  worklet runs from a packed `app.bundle/...` virtual root and cannot
  read the npm package's `weights/` directory), and falls back to the
  bundled desktop path otherwise. Throw a clear synchronous error when
  the asset is missing so it surfaces as a brittle assertion instead of
  an unhandled-promise-rejection that aborts the bare worklet.
- `test/integration/classify.test.js`, `test/integration/error-cases.test.js`:
  use `resolveModelPath()` for every `ImageClassifier` instance.
- `scripts/copy-mobile-test-assets.js`: replace the inline shell
  `mobile:copy-prebuilds` script with a portable Node script that
  fans out the single arm64 prebuild into the per-flavour directories
  the qvac-test-addon-mobile framework expects.
- `package.json`: wire the new script in as `mobile:copy-prebuilds`.
- `addon/src/addon/AddonJs.hpp`: include `<uv.h>` and reorder includes
  to satisfy `clang-format-19`'s grouping rules so cpp-lint passes in CI.
- `.gitignore`: keep downloaded Device Farm logs (`remote_logs/`) and
  ad-hoc validation scripts out of the working tree.

Made-with: Cursor

* QVAC-17481 fix(mobile,addon): testAssets .gguf.bin extension + win32 burn-one js_create_double

- `scripts/copy-mobile-test-assets.js` + `test/integration/utils.js`:
  copy the GGUF weights into `test/mobile/testAssets/` with a `.gguf.bin`
  suffix and look them up by that key. The qvac-test-addon-mobile
  framework's metro.config.js does not register `.gguf` as an asset
  extension, so a raw `.gguf` file is treated as a JS-source request
  and the bundler aborts at `:app:createBundleReleaseJsAndAssets`.
  `.bin` is in the framework's accepted list and ggml's
  `gguf_init_from_file` does not validate the file extension.
- `addon/src/addon/AddonJs.hpp`: add a defensive "burn one"
  `js_create_double(env, 0.0, &dummy)` call at the top of the
  classification result lambda. On Win32 (clang-cl + bare runtime
  + V8) the very first `js_create_double` call inside a fresh handle
  scope returned 0 for index 0 even though the C++ side passed the
  correct value; consuming that slot unblocks every subsequent call.
  Gated trace output behind `QVAC_CLASSIFICATION_TRACE=1`.

Made-with: Cursor

* QVAC-17481 fix(mobile): copy test images to mobile testAssets to fix Android/iOS ENOENT

`test/integration/utils.js:loadImage()` previously read every test
image with `fs.readFileSync(path.join('test','images',name))`. On
mobile that resolves into the packed `app.bundle/...` virtual root,
where `test/images/` is not present, and the bare runtime aborts
with `FileError: ENOENT, open "/app.bundle/backend/test/images/<file>"`
right after the model loads (Pixel 9 Pro logcat from the previous CI
run pinpointed this).

Fixed by:

- `scripts/copy-mobile-test-assets.js`: also copy every
  `test/images/*.{jpg,jpeg,png}` into `test/mobile/testAssets/`. JPEG
  and PNG are part of metro's default `assetExts`, so no rename is
  needed (unlike the GGUF blob).
- `test/integration/utils.js`: add `_resolveImagePath()` that on
  mobile reads from `global.assetPaths['../../testAssets/<name>']`
  with the same key fallbacks as `resolveModelPath()`, and on desktop
  returns `test/images/<name>`. Throw with sample asset keys when the
  lookup fails so the failure is a brittle assertion.
- `test/mobile/testAssets/.gitignore`: also ignore `*.jpg`/`*.jpeg`/
  `*.png` so the populated images are not committed.
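
A sketch of the lookup, under assumptions: `assetPaths` is passed in as a parameter so the snippet runs anywhere (the commit reads `global.assetPaths`), and every fallback key after the first is hypothetical.

```javascript
const path = require('path');

// Hypothetical version of the image-path resolver described above.
function resolveImagePath(name, assetPaths) {
  if (!assetPaths) return path.join('test', 'images', name); // desktop

  // First key is from the commit text; the others are assumed fallbacks.
  const candidates = [`../../testAssets/${name}`, `testAssets/${name}`, name];
  for (const key of candidates) {
    if (assetPaths[key]) return assetPaths[key];
  }
  const sample = Object.keys(assetPaths).slice(0, 5).join(', ');
  throw new Error(`test image "${name}" not found in assetPaths; sample keys: ${sample}`);
}

console.log(resolveImagePath('meal_1.jpg'));
```

Throwing synchronously with a few sample keys keeps the failure inside the test body, where it shows up as a failed assertion.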

Made-with: Cursor

* QVAC-17481 docs: README revisions for mobile assets, FP16, topK and prose reflow

- Document new `npm run mobile:copy-prebuilds` flow that populates
  `test/mobile/testAssets/` with prebuilds, the `.gguf.bin` weights blob,
  and the integration test images (fixes mobile ENOENT crash).
- Replace the obsolete "Cold start" claim with a "First-call overhead"
  note that reflects the full-pipeline warmup added in `load()` and the
  remaining JS/JIT/decoder/page-cache effects.
- Add a "Why FP16 weights?" subsection capturing the precision-vs-size
  rationale (FP16 matches FP32 accuracy on the validation set; more
  aggressive quantizations degraded noticeably).
- Expand the topK section with a plain-language one-liner.
- Add a runtime trade-off paragraph under "Why a custom GGML graph?":
  GGML CPU is slower than PyTorch/ONNX at this scale, but the absolute
  gap is negligible for a ~2.5 M-param model; larger classifiers would
  need extra graph-level optimisation.
- Fix `funetuned` -> `fine-tuned` typo.
- Reflow paragraphs to single lines so markdown viewers can soft-wrap.

Made-with: Cursor

* QVAC-17481 fix(graph): validate GGUF num_classes and assert output shape (review #1727)

Addresses two `[BUG]` review comments from @olyasir on tetherto/qvac#1727
about the hardcoded `kNumClasses = 3` not being validated against either
the loaded GGUF's `mobilenet.num_classes` metadata or the actual element
count of the constructed output tensor. Both are downstream-safety
problems for the per-inference path:

  float logits[graph::kNumClasses] = {0.0F};
  ggml_backend_tensor_get(impl_->compute.output, logits, 0, sizeof(logits));

`sizeof(logits)` is fixed at compile time. With a mismatched GGUF, this
either reads OOB (numClasses < kNumClasses) or silently truncates
(numClasses > kNumClasses); on the FC-weight-upload side the
`classifier.3.weight = [1024, kNumClasses]` shape would also fail to
match the GGUF tensor and corrupt the classifier.

Changes:

1. addon/src/model-interface/MobileNetGraph.cpp -- graph::loadWeights()

   Right after reading `numClasses` from `mobilenet.num_classes`,
   compare against `kNumClasses` and `throw StatusError(InvalidArgument, ...)`
   with a descriptive message (actual vs expected count, plus a hint to
   rebuild the addon or use a matching GGUF). This is the primary fix
   olyasir requested in `MobileNetGraph.cpp`.

   The error path is reachable from `ClassificationModel::load()`'s call
   to `graph::loadWeights(...)`, which already runs inside the JS-side
   `await classifier.load()` Promise; the `StatusError(InvalidArgument)`
   propagates as a structured rejection on the JS side, matching how
   every other config-time validation error in this addon surfaces.

2. addon/src/model-interface/MobileNetGraph.cpp -- graph::buildGraph()

   At the end of the graph build, before we hand the
   `ComputeGraph::output` tensor over to the backend allocator, assert
   `ggml_nelements(cg.output) == kNumClasses` and `raise(...)` (which
   throws `StatusError(InternalError, ...)`) if the invariant is
   violated. This is the defence-in-depth fix olyasir requested in the
   second `[BUG]` comment in `ClassificationModel.cpp`: it makes the
   12-byte stack-array `ggml_backend_tensor_get` read provably safe
   regardless of how the output tensor was constructed.

   This second check is not redundant with #1: it also catches a future
   accidental edit to the classifier wiring above (where the tail
   `classifier.3` linear is what determines the output element count),
   an upstream ggml change to how `mul_mat` shapes its result, or a
   GGUF that lacks the `mobilenet.num_classes` metadata key entirely
   and falls back to `kNumClasses` but ships mismatched FC weights.

Local validation on win32-x64:

- 15/15 C++ unit tests pass (BnEpsilonGuard, classification graph
  determinism, preprocessor suite -- they all exercise the validated
  load + build paths against the bundled FP16 GGUF, where
  `num_classes == 3` so neither check fires).
- 14/14 JS integration tests pass, 140/140 asserts (no behaviour
  change for the supported model; new error paths are unreachable
  with the bundled weights).

Files: packages/qvac-lib-infer-ggml-classification/addon/src/model-interface/MobileNetGraph.cpp
Made-with: Cursor

* QVAC-17481 fix(preprocess): pre-decode size check via stbi_info_from_memory (review #1727)

Addresses jesusmb1995's review comment on tetherto/qvac#1727:

> Could we check this before decoding? `stbi_info_from_memory()` would
> let us reject oversized images / total pixel count before
> `stbi_load_from_memory()` allocates

Why it matters: `stbi_load_from_memory` allocates the full decoded RGB
buffer (width * height * 3 bytes) before any caller-provided dimension
limit is enforced. For a 16384x16384 image at the upper edge of
`kMaxImageDimension`, that is 768 MiB (~805 MB) of heap allocated before
we see the dimension and reject -- enough to OOM a memory-constrained
device or trigger an oversized free.

`stbi_info_from_memory` parses only the image header (a few hundred
bytes) and reports the dimensions cheaply, so we can reject oversized
inputs up-front. The post-decode dimension check is kept as
belt-and-braces in case `stbi_info` and `stbi_load` ever disagree
(e.g. truncated streams that parse a valid header but fail mid-decode);
it is a correctness check, not the primary OOM defence.
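
The arithmetic and the header-first idea, sketched in JavaScript for PNG only. This is not the stb_image code path: `readPngDimensions` and `assertDecodable` are hypothetical, and the 16384 limit is inferred from the example above.

```javascript
// Assumed from the text above: at most 16384 pixels per side.
const K_MAX_IMAGE_DIMENSION = 16384;

// Bytes a full 3-channel RGB decode of width x height would allocate.
function decodedRgbBytes(width, height) {
  return width * height * 3;
}

// PNG: 8-byte signature, then the IHDR chunk; width and height are
// big-endian uint32 values at byte offsets 16 and 20.
function readPngDimensions(bytes) {
  const sig = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];
  if (bytes.length < 24 || !sig.every((b, i) => bytes[i] === b)) return null;
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  return { width: view.getUint32(16), height: view.getUint32(20) };
}

function assertDecodable(bytes) {
  const dims = readPngDimensions(bytes);
  if (!dims) return; // header not parsed: let the decoder report why
  if (dims.width > K_MAX_IMAGE_DIMENSION || dims.height > K_MAX_IMAGE_DIMENSION) {
    throw new RangeError(
      `image is ${dims.width}x${dims.height}, limit is ${K_MAX_IMAGE_DIMENSION} per side`);
  }
}

console.log(decodedRgbBytes(16384, 16384) / 2 ** 20); // 768 (MiB)
```

Only the first 24 bytes are inspected, so an oversized image is rejected before any pixel buffer is allocated.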

Behaviour:

- If `stbi_info` succeeds and reports dimensions over
  `kMaxImageDimension`, `decodeToRgb` throws
  `StatusError(InvalidArgument, ...)` with the actual reported size in
  the message, before any decode allocation runs.
- If `stbi_info` fails (header could not be parsed), we fall through
  to `stbi_load_from_memory`. That path already throws with
  `stbi_failure_reason()` attached, which is a more user-actionable
  message than a generic "header bad" we would emit ourselves.

File: packages/qvac-lib-infer-ggml-classification/addon/src/model-interface/ImagePreprocessor.cpp

Validated locally on win32-x64: 14/14 JS integration tests pass.

Made-with: Cursor

* QVAC-17481 test(preprocess): expand ImagePreprocessor unit coverage (review #1727)

Addresses jesusmb1995's review comment on tetherto/qvac#1727:

> Could we add more unit coverage for ImagePreprocessor before merging?
> preprocessor_test.cpp covers some happy paths, but a few public
> functions/branches still look uncovered:
> - decodeToRgb() success/failure paths are not tested directly.
> - preprocessToTensor() is only covered for empty input; it should
>   also cover encoded JPEG/PNG success, raw RGB success, and
>   unsupported non-image input without dimensions.
> - validateRawRgb() is missing empty buffer, zero width/height, and
>   over-kMaxImageDimension cases.
> - normalizeToWhcn() should cover invalid input size.

Adds the following PreprocessorTest cases (14 new tests, taking the
suite from 10 to 24 -- all 29 cases across the addon's two C++ test
binaries pass on win32-x64):

decodeToRgb:
- DecodeToRgbDecodesValidJpeg            -- happy path against test/images/meal_1.jpg
- DecodeToRgbRejectsEmptyBuffer
- DecodeToRgbRejectsCorruptedBytes
- DecodeToRgbRejectsTruncatedJpeg

preprocessToTensor (full pipeline):
- PreprocessToTensorAcceptsEncodedJpeg   -- JPEG happy path with finite-output check
- PreprocessToTensorAcceptsRawRgb         -- raw RGB happy path with finite-output check
- PreprocessToTensorRejectsBmpWithoutDimensions
- PreprocessToTensorRejectsRawWithMissingDims

validateRawRgb edges:
- ValidateRawRgbRejectsEmptyBuffer
- ValidateRawRgbRejectsZeroWidth
- ValidateRawRgbRejectsZeroHeight
- ValidateRawRgbRejectsOverKMaxImageDimensionWidth
- ValidateRawRgbRejectsOverKMaxImageDimensionHeight

normalizeToWhcn:
- NormalizeToWhcnRejectsWrongInputSize

Adds a `readTestImage(name)` helper that walks up from the current
binary location to find `test/images/<name>`, mirroring the
`findWeightsPath()` helper already in
classification_model_test.cpp. JPEG-using tests skip cleanly via
GTEST_SKIP() if the image is not present, so the C++ test suite still
passes when run from a packed tarball that does not include the test
images.

File: packages/qvac-lib-infer-ggml-classification/test/unit/preprocessor_test.cpp
Made-with: Cursor

* QVAC-17481 refactor(model): flatten ClassificationModel::Impl pigeonhole (review #1727)

Addresses jesusmb1995's review comment on tetherto/qvac#1727:

> Why one extra level of indirection with `Impl`? Maybe style, but I
> see no strong benefit and it just scatters the code around and
> makes it harder to track. I would prefer a straightforward class
> where all these variables can be directly under
> `ClassificationModel` private variables.

The PIMPL was originally there to keep ggml types out of the public
header. In practice this header is only included by the addon's own
`AddonJs.hpp`, which already pulls in the entire
qvac-lib-inference-addon-cpp framework, so there is no header-fanout
benefit from hiding ggml. Flattening the impl removes one level of
heap indirection, lets all members be visible at a glance, and lets
clang-tidy / IDE navigation jump straight to the field declarations.

Changes:

1. addon/src/model-interface/ClassificationModel.hpp

   - Pull in `<ggml-backend.h>` and the local `MobileNetGraph.hpp`
     (which exposes `WeightsBundle` / `ComputeGraph` definitions
     used by the new direct members).
   - Replace `struct Impl;` forward declaration and
     `std::unique_ptr<Impl> impl_;` with the eight direct private
     members the Impl previously held: `modelPath_`, `backend_`,
     `weights_`, `compute_`, `labels_`, `numThreads_`, `loaded_`,
     `lastInferenceUs_`. Member ordering is documented in a comment:
     ggml requires every backend buffer to be released BEFORE the
     backend it was allocated on, and `~ClassificationModel`
     enforces that ordering explicitly with `compute_.reset();
     weights_.reset();` before `ggml_backend_free(backend_)`.

2. addon/src/model-interface/ClassificationModel.cpp

   - Remove the `struct ClassificationModel::Impl { ... };`
     definition and the `std::make_unique<Impl>()` from the
     constructor body.
   - Replace every `impl_->X` with `X_` (34 references). No
     functional change.
   - Drop redundant `if (!impl_)` guards in `setNumThreads()`,
     `load()`, `runtimeStats()`, and `process()`. The class is non-
     copyable and non-movable (it carries a `std::mutex` member,
     which suppresses implicit move ctors/assignment), so `impl_`
     was always non-null between construction and destruction;
     the guards were dead code.

Local validation on win32-x64:

- `bare-make build` clean (warnings unchanged from before refactor;
  no new errors).
- `npm run test:cpp` -- 29/29 tests pass (3 ClassificationModelTest +
  24 PreprocessorTest + 1 BnEpsilonGuard + 1 architecture sanity).
- `npm run test:integration` -- 14/14 tests pass, 140/140 asserts.

Files: packages/qvac-lib-infer-ggml-classification/addon/src/model-interface/ClassificationModel.hpp
  packages/qvac-lib-infer-ggml-classification/addon/src/model-interface/ClassificationModel.cpp
Made-with: Cursor

* QVAC-17481 refactor(addon,binding): single-place arg validation in C++ AddonJs (review #1727)

Addresses jesusmb1995's review comments on tetherto/qvac#1727:

> Why normalizing here instead of just throwing at `AddonJs` and
> having a central place where to do the validation? I had previous
> conversations with Gianfranco (and Nidhin) on LLM we agreed it
> makes sense to do parsing/validation at on place, namely at AddonJs
> construction, and throw there if wrong/invalid arguments directly
> at c++.
>
> For construction/config arguments, `createInstance()` should be the
> place that parses and validates the JS values before building the
> native model: model path, threads, and any other config should
> either produce a valid C++ configuration or throw immediately
> there. That keeps the JS wrapper thin and avoids having two
> different sources of truth for what is valid.
>
> For per-call image arguments, the same principle applies at the
> native job boundary before `ClassificationModel`: parse the JS
> input once, construct an explicit validated `ClassifyInput`, and
> then let the model/preprocessor operate on that clean shape. That
> removes the duplicated JS normalization/magic-byte checks and
> avoids relying on weak `0` sentinel values for "not provided".

Changes:

1. addon/src/model-interface/ClassificationModel.hpp

   - Replace the three sentinel-zero dimension fields (`width = 0`,
     `height = 0`, `channels = 0`, each overloaded as "not provided")
     with an explicit `std::optional<RawRgbDims>` member that captures
     the "is the input raw RGB or encoded?" decision in a type the
     compiler can check.
   - `topK = 0` remains a sentinel because it has a meaningful
     "no filter" interpretation; a provided `topK` is validated > 0
     at the binding boundary.

2. addon/src/model-interface/ClassificationModel.cpp

   - Translate `optional<RawRgbDims>` -> the existing
     `(declaredWidth, declaredHeight, declaredChannels)` triplet
     consumed by `preprocess::preprocessToTensor`. The preprocessor's
     internal "0 means not-provided" convention is preserved (it is
     a private API; the JS-facing one is the explicit optional).

3. addon/src/addon/AddonJs.hpp

   - `createInstance` now validates:
       * `path` must be a non-empty string,
       * `config.threads` (when provided) must be a positive integer.
     These were previously not enforced; non-positive thread counts
     would have silently passed through to libggml and raw negatives
     would int-truncate.
   - `runJob` is now the single source of truth for per-call
     validation:
       * `content` rejection message rephrased to include the
         substring "required" so the JS test
         `t.exception.all(..., /required|null|undefined/i)` keeps
         passing without relying on a separate JS-side TypeError.
       * Dimension triplet enforcement: caller must provide either
         all of {width, height, channels} or none of them; partial
         shapes are rejected with an explicit message rather than
         leaking through as a buffer-size mismatch downstream.
       * Each dim is range-checked as int32_t before being committed
         to ClassifyInput's optional<RawRgbDims>, so a negative
         JS Number cannot wrap to ~4 billion via uint32_t cast and
         tunnel into validateRawRgb.
       * `topK` is range-checked > 0 if provided.
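
The per-call rules above, written out in JavaScript purely to make them concrete. In this commit they are enforced in C++ `runJob`, not in JS; `parseClassifyArgs` is a hypothetical name, and requiring integer values is an assumption.

```javascript
// Why the range check matters: an unsigned 32-bit cast wraps negatives.
console.log(-1 >>> 0); // 4294967295

const INT32_MAX = 2147483647;

// Hypothetical rendering of the all-or-none, int32-range and topK rules.
function parseClassifyArgs({ width, height, channels, topK } = {}) {
  const dims = [width, height, channels];
  const provided = dims.filter((d) => d !== undefined).length;
  if (provided !== 0 && provided !== 3) {
    throw new TypeError('width, height and channels must be given together or not at all');
  }
  for (const d of dims) {
    if (d !== undefined && (!Number.isInteger(d) || d < 0 || d > INT32_MAX)) {
      throw new RangeError(`dimension out of int32 range: ${d}`);
    }
  }
  if (topK !== undefined && (!Number.isInteger(topK) || topK <= 0)) {
    throw new RangeError(`topK must be a positive integer, got ${topK}`);
  }
  return {
    // null plays the role of an empty optional<RawRgbDims>.
    rawRgb: provided === 3 ? { width, height, channels } : null,
    topK: topK === undefined ? 0 : topK, // 0 keeps its "no filter" meaning
  };
}
```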

4. test/unit/classification_model_test.cpp

   - Migrate the three `input.width = ...; input.height = ...;
     input.channels = ...;` blocks to the new
     `input.rawRgb = qcc::RawRgbDims{...};` shape. No behavioural
     change.

5. index.js

   - Strip every JS-side validation helper that duplicated C++ work:
     `assertBuffer`, `normaliseDimensionOptions`, `isSupportedEncoded`,
     `startsWith`, `JPEG_MAGIC`, `PNG_MAGIC`. The classify() body now
     literally builds `{ type, content, [width, height, channels,
     topK] }` from the caller's arguments and forwards to the
     binding.
   - Lifecycle checks (`!this._addon || !this.state.configLoaded`)
     and the file-existence check in `load()` stay in JS:
       * lifecycle is a JS-managed state, not a value-shape
         question;
       * the existence-check delivers a more actionable error
         message ("MobileNet GGUF weights not found at: <path>")
         than letting the load reach C++ and throw "Failed to open
         GGUF file: <path>" downstream.
   - Module-level comment documents the JS-as-thin-pass-through
     contract so a future contributor cannot re-introduce the
     duplicated validation by mistake.
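
The per-call dimension checks that item 3 assigns to `runJob` can be illustrated with a small sketch. This is written in TypeScript purely for illustration (the real checks live in C++ in `AddonJs.hpp`); the names `RunJobDims`, `checkDim` and `parseRawRgbDims`, the error messages, and the "must be positive" rule are assumptions, not the addon's actual code.

```typescript
// Hypothetical sketch of the all-or-none + int32 range check described in item 3.
interface RawRgbDims {
  width: number;
  height: number;
  channels: number;
}

interface RunJobDims {
  width?: number;
  height?: number;
  channels?: number;
}

const INT32_MAX = 2147483647;

// Assumption: a valid dimension is a positive integer that fits in int32_t.
function checkDim(name: string, value: number): number {
  if (!Number.isInteger(value) || value < 1 || value > INT32_MAX) {
    throw new RangeError(`${name} must be a positive 32-bit integer, got ${value}`);
  }
  return value;
}

function parseRawRgbDims(args: RunJobDims): RawRgbDims | undefined {
  const { width, height, channels } = args;
  // None provided: the content is treated as an encoded image.
  if (width === undefined && height === undefined && channels === undefined) {
    return undefined;
  }
  // Partial shapes are rejected explicitly instead of surfacing later
  // as a buffer-size mismatch.
  if (width === undefined || height === undefined || channels === undefined) {
    throw new TypeError("width, height and channels must be provided together or not at all");
  }
  return {
    width: checkDim("width", width),
    height: checkDim("height", height),
    channels: checkDim("channels", channels),
  };
}

// Why the range check comes first: an unsigned 32-bit conversion wraps negatives.
const wrappedNegative = -1 >>> 0;
```

Here `wrappedNegative` is 4294967295, which is the "~4 billion" wrap the commit message refers to.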

Local validation on win32-x64:

- `bare-make build` clean.
- `npm run test:cpp` -- 29/29 (incl. the migrated raw-RGB
  ClassificationModelTest cases).
- `npm run lint` -- clean.
- `npm run test:integration` -- 14/14 tests, 140/140 asserts. All
  existing brittle regex matchers in `error-cases.test.js`
  (`/required|null|undefined/i`, `/empty/i`, `/format|invalid/i`,
  `/decode|jpeg|invalid/i`, `/match|size|width|height|raw/i`,
  `/format|jpeg|png|bmp/i`, `/not loaded|load\(\)/i`,
  `/not loaded|destroyed|state/i`) match the new C++-issued error
  messages, so no test regex needed updating.

Files: packages/qvac-lib-infer-ggml-classification/addon/src/addon/AddonJs.hpp
  packages/qvac-lib-infer-ggml-classification/addon/src/model-interface/ClassificationModel.hpp
  packages/qvac-lib-infer-ggml-classification/addon/src/model-interface/ClassificationModel.cpp
  packages/qvac-lib-infer-ggml-classification/test/unit/classification_model_test.cpp
  packages/qvac-lib-infer-ggml-classification/index.js
Made-with: Cursor

* QVAC-17481 chore(test,docs): post-sync audit follow-ups (consistency + uniform url strip + readme)

Picks up the lower-risk consistency / correctness items from the
post-sync self-audit. None of these change observable behaviour;
they remove duplication and small footguns that would otherwise
surface as drift in future maintenance.

1. test/integration/utils.js -- single source of truth for the mobile
   asset-key heuristic + uniform `file://` strip.

   - Extract `_resolveMobileAsset(filename)` from the two
     duplicate-by-design loops in `resolveModelPath()` and
     `_resolveImagePath()`. Both used the same four-element
     candidate-key array (`../../testAssets/${name}`,
     `../mobile/testAssets/${name}`, `testAssets/${name}`,
     `../testAssets/${name}`); future framework key-shape changes
     now land in one place instead of being silently inconsistent.

   - Extract `_stripFileUrlPrefix(mapped)` and switch from
     `mapped.slice('file://'.length)` to
     `mapped.replace(/^file:\/\//, '')`. The regex is anchored, so
     it removes the scheme only when it is actually present and
     leaves any other input untouched, whereas a bare slice drops
     the first seven characters whatever they are. (For a
     triple-slash `file:///abs/...` URL both forms yield
     `/abs/...`: correct on POSIX-mobile, but a hypothetical
     Windows-mobile target would still need drive-letter handling.)

   - Add `makeClassifier(overrides)` -- the standard test-instance
     factory. Centralises model-path + logger wiring so any future
     constructor-arg change in the addon lands in one place
     instead of N inline `new ImageClassifier(...)` callsites.

2. test/integration/classify.test.js + error-cases.test.js -- adopt
   the shared factory.

   - classify.test.js drops the inline
     `new ImageClassifier({ modelPath: resolveModelPath(),
     logger: createLogger() })` (4 callsites) in favour of
     `makeClassifier()`. Imports trimmed accordingly: drops
     `ImageClassifier`, `createLogger`, `resolveModelPath` from
     the destructure (unused after refactor; standardjs would
     have flagged them anyway).

   - error-cases.test.js drops its local `makeClassifier()` (which
     was a duplicate of what now lives in utils.js) and imports
     the shared one. Net: -1 module-level function.

3. README.md -- fix the `**threads**` markdown bullet.

   The line `- \`**threads**\` -- ...` wraps the bold markers in
   backticks, which renders the asterisks literally inside an
   inline-code span (`**threads**` instead of bold **threads**).
   The replacement, `- **\`threads\`** -- ...`, renders as bold
   inline code, matching the intent of the surrounding bullets.
   This was a pre-existing bug noted as "out-of-scope" in the
   line-reflow pass but is trivial to fix.
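
As a quick check of the `file://` strip from item 1, the sketch below runs both strategies over three input shapes. The helper names `stripWithSlice` and `stripWithRegex` and the sample paths are hypothetical; they only recreate the two expressions quoted above and are not the harness's actual helper.

```typescript
// Hypothetical re-creation of the two strip strategies from item 1.
const FILE_URL_PREFIX = "file://";

function stripWithSlice(mapped: string): string {
  // Drops the first seven characters unconditionally.
  return mapped.slice(FILE_URL_PREFIX.length);
}

function stripWithRegex(mapped: string): string {
  // Anchored: removes the scheme only when the string starts with it.
  return mapped.replace(/^file:\/\//, "");
}

const samples = [
  "file://abs/asset.jpg",  // double-slash form
  "file:///abs/asset.jpg", // triple-slash form
  "/abs/asset.jpg",        // already a plain path
];

const comparison = samples.map((input) => ({
  input,
  slice: stripWithSlice(input),
  regex: stripWithRegex(input),
}));
```

In this sketch both strategies agree on the two `file://` shapes; they differ only for input that has no prefix, where the slice version cuts into the path and the regex version leaves it alone.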

Local validation on win32-x64:

- `npm run lint` clean.
- `npm run test:cpp` -- 29/29 (no behavioural change, just
  end-to-end smoke that the test-utils refactor did not break the
  C++ harness paths).
- `npm run test:integration` -- 14/14, 140/140 asserts (run twice
  to confirm; one in-between-test SIGSEGV observed on the first
  run is the known upstream `OutputCallBackJs` UAF the hack
  branch deliberately leaves un-papered-over, not caused by this
  commit).

Files: packages/qvac-lib-infer-ggml-classification/test/integration/utils.js
  packages/qvac-lib-infer-ggml-classification/test/integration/classify.test.js
  packages/qvac-lib-infer-ggml-classification/test/integration/error-cases.test.js
  packages/qvac-lib-infer-ggml-classification/README.md
Made-with: Cursor

* QVAC-17481 chore: rename addon directory to packages/classification-ggml

Aligns the addon's directory and CI-workflow filenames with the
published package name (`@qvac/classification-ggml`) so that the
folder and the npm scope read consistently. Per a reviewer-style
naming convention request:

    Package name: @qvac/classification-ggml
    Addon folder: classification-ggml

Renames (53 files via `git mv`, all rename detection clean -- 31
insertions / 31 deletions across 54 files):

  packages/qvac-lib-infer-ggml-classification/
      -> packages/classification-ggml/

  .github/workflows/integration-mobile-test-qvac-lib-infer-ggml-classification.yml
      -> .github/workflows/integration-mobile-test-classification-ggml.yml
  .github/workflows/integration-test-qvac-lib-infer-ggml-classification.yml
      -> .github/workflows/integration-test-classification-ggml.yml
  .github/workflows/prebuilds-qvac-lib-infer-ggml-classification.yml
      -> .github/workflows/prebuilds-classification-ggml.yml

In-file text updates (paths only -- no functional change):

  - All four workflows (`integration-mobile-test-classification-ggml.yml`,
    `integration-test-classification-ggml.yml`,
    `prebuilds-classification-ggml.yml`, plus the hack-branch
    `on-pr-qvac-lib-infer-llamacpp-llm.yml`) now reference the new
    `packages/classification-ggml/**` path filter,
    `PKG_DIR=packages/classification-ggml` env, the renamed sibling
    workflow filenames, and the new `addon/packages/classification-ggml`
    `ADDON_WORKDIR` for the mobile harness.
  - `packages/classification-ggml/CMakeLists.txt` -- `project(...)`,
    `add_bare_module(...)`, and every `${...}` target reference
    renamed to `classification-ggml`. The bare module's output
    filename (`qvac__classification-ggml.bare`) is unchanged because
    bare derives it from `package.json` `name` (`@qvac/classification-ggml`),
    not from the CMake project name.
  - `packages/classification-ggml/package.json` -- repository.directory,
    homepage URL.
  - `packages/classification-ggml/README.md`, `index.js`, and
    `docs/onnx-to-gguf-conversion.md` -- doc paths.

Deliberately NOT renamed (out of scope -- code-level identifiers,
not file paths):

  - C++ namespace `qvac_lib_infer_ggml_classification` (8 files).
    Other addons in this monorepo do NOT tie their C++ namespace to
    the folder name (e.g. `qvac::ttslib::lavasr` lives under
    `packages/qvac-lib-infer-onnx-tts/`), so the namespace is a
    code-style choice rather than a path-consistency one. Can be
    folded into a follow-up if reviewers want full consistency
    there too.

Local validation on win32-x64 (in the renamed
`packages/classification-ggml/` directory):

  - `npm install` clean.
  - `bare-make generate` + `bare-make build` + `bare-make install`
    succeed; `qvac__classification-ggml.bare` produced under
    `prebuilds/win32-x64/` (filename unchanged).
  - `npm run lint` clean.
  - `npm run test:cpp` 29/29.
  - `npm run test:integration` 14/14, 140/140 asserts (perf-report
    correctly written under
    `packages/classification-ggml/test/results/`).

Made-with: Cursor

* QVAC-17481 fix(addon,test): align upstream-bug workarounds with monorepo convention

Two upstream issues block the addon's CI without local mitigations. Both
are paper-trailed in detail in `remote_logs/issues_report.md` (gitignored,
internal). Inline comments at the workaround sites are kept short to match
how other addons in the monorepo handle the same races.

1. `OutputCallBackJs` use-after-free race
   ----------------------------------------
   `qvac_lib_inference_addon_cpp::~OutputCallBackJs` deletes JS refs
   synchronously while `uv_close` on its async handle is asynchronous
   (queue/OutputCallbackJs.hpp:48-58); a `uv_async_send` queued just
   before destruction fires against dead refs and crashes in
   `js_open_handle_scope`. Reproduced as SIGSEGV (linux-x64/-arm64,
   darwin-arm64), `Fatal signal 11` (Android logcat), and
   `EXC_BAD_ACCESS @ 0x1a0` (iOS crash report) across rapid create/
   destroy cycles.

   Other addons in this monorepo paper over the same race in their
   integration suites with sleep-around-unload, e.g.
     ocr-onnx/test/integration/lifecycle.test.js:56,85,115
     ocr-onnx/test/integration/full-ocr-suite.test.js:107,115,123
     qvac-lib-infer-llamacpp-llm/test/integration/sliding-context.test.js:163,355

   We adopt the same pattern via `cleanupClassifier()` in
   `test/integration/utils.js` (two-phase: 500-1000ms pre-unload
   yield + 2000-3000ms post-unload drain). The pre-unload yield is
   required for our addon specifically because `await classify()`
   resolves on the first `Output` event while the worker thread
   keeps queuing follow-up events (`RuntimeStats`,
   `JobCompleted`); without it the follow-ups land DURING
   `~OutputCallBackJs`. Every classify() call in the integration
   tests was migrated to `cleanupClassifier()`.

   The removed local C++ wrapper (`DeferredOutputCallBackJs`) was
   a real lifetime fix but kept us out of step with how the rest
   of the monorepo handles this; once upstream is patched the
   sleeps drop everywhere at once.

2. Win32-x64 first-`js_create_double` returns 0.0
   ----------------------------------------------
   The very first `js_create_double` call in the process returns
   0.0 on the Azure GitHub-hosted `windows-2022` runner (clang-cl
   + bare-runtime + V8). Subsequent calls in the same handle scope
   are correct. No local Windows repro; only the CI runner image
   is affected.

   Other addons accidentally dodge the symptom because their first
   emitted number is naturally 0 (whisper/parakeet
   `segment.start`), they assert only `typeof === 'number'` /
   `!isNaN` (llamacpp-llm stats), they never assert the value
   (ocr-onnx bbox coords), or they emit no numbers at all
   (lib-infer-diffusion / llamacpp-embed). Our 3-class softmax
   sort + sum-to-1 assertions catch the corruption immediately, so
   no test-side workaround is possible.

   Local C++ "burn one" workaround in `JsClassifyOutputHandler`'s
   lambda preamble: a throwaway `js_create_double(env, 0.0,
   &dummy)` call consumes the broken first slot so the per-element
   `Number::create` calls below produce the correct value at index
   0. Cost is one ephemeral js_number per classify() call.

Other follow-ups in this commit (none disturb code paths above):

  - `addon.js` lifecycle: `unload()` no longer waits on the
    pending-job promise. The post-unload sleep in
    `cleanupClassifier` covers the same window, so `unload()`
    becomes a thin pass-through (matches what every other addon
    in the monorepo does).
  - Top-of-file workaround comment in `AddonJs.hpp` consolidated
    to a 2-line note at the burn-one site (matches the comment
    density other addons use; full root cause in the report).
  - `cleanupClassifier` doc trimmed to 3 lines pointing at the
    report.

Local validation on win32-x64:
  - bare-make build clean
  - npm run lint clean
  - npm run test:cpp 29/29
  - npm run test:integration 14/14 + 140/140 asserts

Files: packages/classification-ggml/addon.js
  packages/classification-ggml/addon/src/addon/AddonJs.hpp
  packages/classification-ggml/addon/src/js-interface/binding.cpp
  packages/classification-ggml/test/integration/classify.test.js
  packages/classification-ggml/test/integration/error-cases.test.js
  packages/classification-ggml/test/integration/utils.js
Made-with: Cursor

* QVAC-17481 chore: adopt upstream WA fixes from PR #1825

Bumps qvac-lib-inference-addon-cpp from 1.1.5#1 to 1.1.6 (the version
shipped by PR #1825) and removes the two local workarounds that the
upstream fixes make unnecessary:

- Win32 burn-one js_create_double in JsClassifyOutputHandler is gone;
  upstream's JsUtils::Number::createDouble now applies a process-wide
  burn-once guard via static-init.
- Two-phase sleep around unload() in cleanupClassifier is gone;
  upstream's ~OutputCallBackJs now defers js_delete_reference into the
  uv_close callback via a heap-owned State.

Local Win32 validation: 14/14 integration tests + 29/29 C++ unit
tests pass; in particular the index-0 marshalling assertions and the
back-to-back load/unload cycle test that previously SIGSEGV'd both
pass without their prior workarounds.

Resolves T1 + T10 from the audit; details in remote_logs/issues_report.md.

Made-with: Cursor

* QVAC-17481 chore[api]: align lifecycle with llamacpp-llm pattern

Re-shape the JS layer so request orchestration mirrors the LLM addon
(closes T5-T9 from PR #1727 review):

- addon.js becomes a thin C++ binding wrapper (mirrors LlamaInterface):
  constructor takes `(binding, configurationParams, outputCb, logger)`,
  exposes `activate()` / `runJob()` / `cancel()` / `unload()`. The
  bespoke `_pending` Promise + `_outputCallback` are gone; export a
  shared `mapAddonEvent(rawEvent, rawData, rawError)` instead.
- index.js becomes the orchestration layer (mirrors LlmLlamacpp): one
  `exclusiveRunQueue()` serialises load/classify/unload, one
  `createJobHandler()` owns the active QvacResponse, and the output
  callback fans events through `_handleAddonOutputEvent`.
- load() now does try/catch around `activate()` and best-effort
  `_addon.unload()` on failure so a partial init never leaves a
  zombie native handle (T6).
- classify() resolves on the terminal stats event rather than the
  first ClassifyOutput, eliminating the orphan-callback risk that
  motivated the `_pending` drain in the previous design (T7, T8).
  Public shape unchanged: still `Promise<Array<{label,confidence}>>`.
- unload() runs through the same queue, calls native `cancel()` on
  in-flight work, fails the active JS request with `Model was unloaded`,
  then destroys the native handle (T9).

mapAddonEvent is keyed on payload shape (Array → Output, plain object
→ JobEnded terminal) because the upstream JobRunner emits the stats
trailer with a raw `std::vector<std::pair<...>>` RTTI name rather than
a literal `*JobEnded` event. Documented inline.
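
A minimal sketch of shape-keyed event mapping, assuming the `(rawEvent, rawData, rawError)` signature named above. The `MappedEvent` type, the `Error` and `Unknown` branches, and the field names are illustrative assumptions rather than the addon's actual implementation.

```typescript
// Hypothetical sketch: classify an addon callback by payload shape, not by event name.
type MappedEvent =
  | { type: "Output"; data: unknown[] }
  | { type: "JobEnded"; stats: Record<string, unknown> }
  | { type: "Error"; error: unknown }
  | { type: "Unknown"; rawEvent: string };

function mapAddonEvent(rawEvent: string, rawData: unknown, rawError: unknown): MappedEvent {
  // Assumption: a non-null error wins over any payload.
  if (rawError !== undefined && rawError !== null) {
    return { type: "Error", error: rawError };
  }
  // Array payload -> classification output.
  if (Array.isArray(rawData)) {
    return { type: "Output", data: rawData };
  }
  // Plain object payload -> terminal stats trailer, whatever the raw event name is.
  if (typeof rawData === "object" && rawData !== null) {
    return { type: "JobEnded", stats: rawData as Record<string, unknown> };
  }
  return { type: "Unknown", rawEvent };
}
```

Keying on shape means the mapping keeps working even though the raw event name for the stats trailer is an RTTI string rather than a stable identifier.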

Local validation: 14/14 integration + 140/140 asserts in 2.8s
(down from 8.2s in Group A — the LLM-style cancel/unload is much
faster than the prior drain-then-destroy pattern); 29/29 C++ unit
tests; standard lint clean.

Made-with: Cursor

* QVAC-17481 infra: add canonical on-pr + on-pr-close workflows for classification-ggml

Adds the two missing top-level workflow files so the addon now has the
full 5-file layout used by every other modern addon in the monorepo
(`decoder-audio`, `diffusion-cpp`, `ocr-onnx`, `bci-whispercpp`):

- `on-pr-classification-ggml.yml` -- canonical PR trigger router.
  authorize -> changes -> sanity / ts-checks / cpp-lint / prebuild ->
  integration / mobile -> merge-guard. Path filters scope to
  `packages/classification-ggml/**` and the addon's own workflow files.
- `on-pr-close-classification-ggml.yml` -- mirror of
  `on-pr-close-decoder-audio.yml`. Triggers `public-delete-npm-versions`
  with `packages: classification-ggml` to clean up per-PR npm pre-releases
  on PR close.

Closes T11 from PR #1727 review (olyasir: "rename in same format as other
pipelines"). The legacy-named `on-pr-qvac-lib-infer-ggml-classification.yml`
on the fork PR-1 branch will be removed at sync-to-PR-1 time.

The hack-branch dispatch swap (`on-pr-qvac-lib-infer-llamacpp-llm.yml`
hijacked + `*-temp.yml` parking) is intentionally left untouched here:
new workflows aren't dispatchable from the GitHub Actions UI until they
exist on `main`, so the swap is still our only working dispatch path
for hack-branch CI runs.

Validation: both files parse with `yaml.safe_load`; every workflow /
composite-action reference resolves on disk.

Co-authored-by: Cursor <cursoragent@cursor.com>

* QVAC-17481 doc: trim verbose AI-style comments across the addon

Closes T2/T3/T4 from PR #1727 (jesusmb1995: "Please remove this
comment, its unnecessary... LLM's are too verbose"), and applies the
same four cleanup rules across the rest of …
…(re-land) (tetherto#2023)

* QVAC-18612 infra: gate every secret-bearing workflow with label-gate

Re-land of the label-gate fan-out after PR tetherto#1997 was reverted on
2026-05-13 (commit 919850c). Re-architected to fix the caller-cap
permissions violation that broke 30+ on-pr-* workflows the moment a
verified label was applied.

Architecture: caller-gates-callee
  - Reusable workflows (workflow_call invokees) are NOT modified. PR tetherto#1997
    embedded a label-gate job inside each reusable callee with
    `pull-requests: write`, which violates the caller-cap rule for any
    caller that scopes the call to `pull-requests: read|none`. GitHub
    enforces this at parse time; the affected workflow files won't even
    load.
  - Callers get a label-gate job at the top of `jobs:` with
    `pull-requests: write` (which never crosses a caller-cap boundary).
    Each `uses:` invocation that targets a secret-bearing reusable, plus
    every standalone secret-bearing job in the same workflow, gains
    `needs: [..., label-gate]` and an `if:` prepended with
    `needs.label-gate.outputs.authorised == 'true'`.
  - When the gate denies on a `uses:` job, the entire reusable invocation
    is skipped — the callee runner never starts, no secrets are exposed,
    and no caller-cap validation can fire because the workflow_call
    payload is never sent.

The label-gate action checks out from the default branch via sparse
checkout, which is the same Tanstack-class supply-chain mitigation
landed in the canary fix on PR tetherto#1971 / tetherto#1973.

Workflow-by-workflow stats:
  - 59 caller workflows migrated (label-gate + needs/if updates)
  - 56 reusable callees, exempt workflows, and no-secret workflows
    intentionally left UNCHANGED on disk
  - Pre-existing `authorize-pr` peer jobs preserved (belt-and-suspenders;
    removal is a follow-up after a soak period)
  - approval-worker.yml and approval-check-worker.yml exempt (gating them
    creates a deadlock; we explicitly do not touch them)

Pre-flight verification before push:
  - `python3 .github/scripts/audit-workflow-permissions.py` -> 0 hard
    violations across 162 caller-callee edges (vs. 21 hard violations
    after the naive PR tetherto#1997-style migration; the audit was added in the
    previous commit precisely to catch this regression class)
  - `actionlint .github/workflows/*.{yml,yaml}` reports identical issue
    counts before and after the migration: 1832 shellcheck (pre-existing),
    9 expression (pre-existing), 5 action (down from 7 pre-existing)

End-to-end validated in the qvac-internal sandbox with real org teams:
  - tetherto/qvac-internal#12 (caller-gates-callee + standalone gating
    against the actual qvac-internal-{dev,merge,release} teams)
  - Olutest/qvac-tests (public mirror; same harness, single-user
    allowlist)
  - Validation matrix: 9/9 scenarios pass, including the strip-on-
    non-trusted-apply case

Co-authored-by: Cursor <cursoragent@cursor.com>
Signed-off-by: Proletter <40578159+Proletter@users.noreply.github.com>

* QVAC-18612 infra: gate on-pr-close-* workflows with label-gate

Closes a release-env exposure surfaced when auditing tetherto#2023:
public-delete-npm-versions.yml (environment: release, packages: write)
is invoked by 12 on-pr-close-* workflows, but only embed-llamacpp had
label-gate. Ten of the others fire on `pull_request: types: [closed]`
and reach the release env without authorisation (the remaining one is
dispatch-only; see the note on translation-nmtcpp below).

This is currently held back only by the manual approval on the release
environment. Once that approval is dropped (the goal of QVAC-18612), the
label-gate becomes the sole control. This commit makes label-gate that
control everywhere.

Pattern is identical to on-pr-close-embed-llamacpp.yml (already on this
branch): inline label-gate job (caller side) + needs/if on the
delete-npm-versions-trigger reusable call. Reusable callee
(public-delete-npm-versions.yml) is unchanged.

on-pr-close-translation-nmtcpp.yml deliberately not modified - it has
only workflow_dispatch (no pull_request trigger) and is intrinsically
gated by repo-write access.
…rto#2026)

* fix(registry-server): harden HF downloads against socket drops

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(registry-server): gate HF retries, drop unhandledRejection swallow, pin hf-hub floor

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
Bump @qvac/cli to 0.4.0 and add the v0.4.0 changelog set.

Includes all 5 cli-scoped PRs landed on release-cli-0.4.0 since cli-v0.3.0:

- QVAC-18677 feat[api]: qvac verify deps (tetherto#1969)
- QVAC-18717 feat[api]: Qwen3.5 / Gemma4 tool-call dialects + reasoning_budget (tetherto#1974)
- QVAC-18678 feat[api]: qvac verify bundle (tetherto#1984)
- QVAC-18730 feat[api]: POST /v1/images/generations on qvac serve (tetherto#2008)
- chore: consolidate PR templates and hide style note in HTML comment (tetherto#1924)

PR tetherto#1924's title lacked a ticket or [notask], so the changelog generator's
strict validator dropped it. It is added manually under the Chores section
to keep the changelog truthful to what shipped on release-cli-0.4.0.

(cherry picked from commit 22462c8)
…penAI adapter (tetherto#2031)

* feat[api]: add POST /v1/audio/translations to qvac serve OpenAI adapter

* test[api]: add e2e + flatten whisper translate config

- e2e.bats: cover POST /v1/audio/translations with WHISPER_EN_TINY_Q8_0
  alias, assert it rejects transcription-only and chat aliases, and that
  DELETE unloads both whisper aliases.
- serve/config.ts: flatten whisperConfig into top-level modelConfig keys
  for whispercpp-audio-translation (whisper loadModel expects flat fields,
  not nested whisperConfig); force translate=true and warn otherwise.
- config.test.ts: assert flat translate/language/n_threads and no
  whisperConfig key; cover top-level translate=false override.
- docs/serve-openai.md: clarify src accepts SDK model constants and show
  the flat config shape.

* fix[api]: allow type override on constant serve.models entries

The virtual `whispercpp-audio-translation` type previously required the
explicit `{ type, src }` shape, but `src` is passed to the SDK verbatim
so an SDK constant name like `WHISPER_EN_TINY_Q8_0` failed with
MODEL_NOT_FOUND. Allow constant entries to carry an optional `type`
override instead, so `{ "model": "WHISPER_EN_TINY_Q8_0", "type":
"whispercpp-audio-translation" }` resolves the constant via the
registry and then runs through the virtual-type mapping
(`whispercpp-transcription` + audio-translation + translate=true).

- serve/config.ts: ConstantModelEntry gains optional `type`;
  resolveModelConstant routes the override through
  resolveExplicitServeModel. Explicit `{ type, src }` branch is
  unchanged (src is still a literal modelSrc).
- config.test.ts: exports + covers natural-addon resolution, the
  whisper → audio-translation override, and unknown-constant errors.
- e2e.bats: test-whisper-translate now uses the model+type shape.
- docs/serve-openai.md: recommend the model+type shorthand; note that
  explicit src is for non-registry weights only.
…structured logging (tetherto#2036)

Lands the three M3a framework primitives so subsequent handler migration
sub-PRs (M3b/M3c) have a single, declarative contract to slot into:

1. `PluginHandlerDefinition.cancel: { scope: "request" | "model" | "none"; hard?: boolean }`
   - Added to `schemas/plugin.ts` (`PluginHandlerCancel`, `PluginHandlerCancelScope`)
     + runtime schema validation on `pluginHandlerDefinitionRuntimeSchema`.
   - Declared on every built-in plugin manifest (llamacpp-completion,
     llamacpp-embedding, whispercpp/parakeet-transcription, nmtcpp-translation,
     onnx-tts/ocr, sdcpp-generation). The truth-table assignment is pinned by
     `test/unit/plugin-cancel-capability.test.ts`.

2. `RequestRegistry.policy({ kind, oneAtATimePerModel })`:
   - Admission control runs before scope/controller allocation in `begin(...)`.
     Rejecting a request raises `RequestRejectedByPolicyError` (52420) carrying
     `requestId`, `kind`, `modelId`, `reason` — re-exported from `@qvac/sdk` for
     `instanceof` checks.
   - The worker singleton installs `{ kind: "completion", oneAtATimePerModel: true }`
     on first access, matching the llama.cpp addon's single-decode-loop reality.

3. Structured `[request-lifecycle]` emits at begin/cancel/end:
   - Fixed log shape `requestId=<id> kind=<kind> modelId=<id|"-"> state=<state>` so
     `grep "requestId=abc"` returns the full per-request story chronologically.
   - `withRequestContext(logger, ctx)` extends the same prefix to handler-level
     emits; threaded through `completion(...)` and into `KvCacheSession` so
     KV-cache turn lifecycle shares the request's correlation tuple.
   - Single-cancel-emit guard suppresses duplicate cancel lines when
     `cancel({ requestId })` is invoked twice.
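
To make primitives 2 and 3 concrete, here is a heavily simplified sketch of one-at-a-time-per-model admission plus the fixed log shape. `MiniRequestRegistry`, `lifecycleLine`, the `"begin"` / `"end"` state names and the rejection reason are hypothetical; the SDK's real `RequestRegistry` is richer and may differ in every detail except the error code and log format quoted above.

```typescript
// Hypothetical miniature of the policy + structured-logging primitives.
type Policy = { kind: string; oneAtATimePerModel: boolean };

class RequestRejectedByPolicyError extends Error {
  readonly code = 52420;
  constructor(
    readonly requestId: string,
    readonly kind: string,
    readonly modelId: string,
    readonly reason: string,
  ) {
    super(`request ${requestId} rejected by policy: ${reason}`);
  }
}

function lifecycleLine(requestId: string, kind: string, modelId: string | undefined, state: string): string {
  return `[request-lifecycle] requestId=${requestId} kind=${kind} modelId=${modelId ?? "-"} state=${state}`;
}

class MiniRequestRegistry {
  private policies: Policy[] = [];
  private active: Record<string, string> = {}; // "kind:modelId" -> requestId
  readonly log: string[] = [];

  policy(p: Policy): void {
    this.policies.push(p);
  }

  begin(requestId: string, kind: string, modelId: string): void {
    const key = `${kind}:${modelId}`;
    const limited = this.policies.some((p) => p.kind === kind && p.oneAtATimePerModel);
    // Admission control runs before anything is allocated for the request.
    if (limited && this.active[key] !== undefined) {
      throw new RequestRejectedByPolicyError(requestId, kind, modelId, "model busy");
    }
    this.active[key] = requestId;
    this.log.push(lifecycleLine(requestId, kind, modelId, "begin"));
  }

  end(requestId: string, kind: string, modelId: string): void {
    const key = `${kind}:${modelId}`;
    if (this.active[key] === requestId) delete this.active[key];
    this.log.push(lifecycleLine(requestId, kind, modelId, "end"));
  }
}

const registry = new MiniRequestRegistry();
registry.policy({ kind: "completion", oneAtATimePerModel: true });
registry.begin("r1", "completion", "m1");
```

With this setup a second `begin(..., "completion", "m1")` is rejected until `end("r1", ...)` runs, while a request against a different model is admitted.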

Verification (from `packages/sdk/`):
- `bun run lint` (eslint + tsc): clean.
- `bun run test:unit`: 49 files / all asserts pass, including the 4 M3a test
  files (`plugin-cancel-capability` 7/7, `request-registry` 41/41,
  `request-lifecycle-logging` 6/6, `with-request-context` 5/5).

Cursor rules updated alongside the code:
- `request-lifecycle-primitives.mdc`: cancel-capability declaration table,
  concurrency-policy contract, structured-logging shape, error-codes table now
  carries 52420.
- `docs/request-lifecycle-system.mdc`: migration-roadmap table reflects M3a
  shipped; three new FAQ entries explain *why* each primitive was chosen;
  implementation files table covers the new modules.
- `error-handling.mdc`: 52420 row added.

This PR is framework-only — no handler is migrated onto `registry.begin(...)` here
beyond the completion handler that landed in M2. Handler migrations follow in
M3b (inference handlers), M3c (non-inference / addon handlers), and M3d (CLI
cancel bridge + cancelHandler retirement).
… test workflow (tetherto#1669)

* chore[notask]: SHA-pin actions/upload-artifact in llamacpp-llm mobile test

Replace bare @v4 tag with pinned SHA for actions/upload-artifact in the
Upload Device Farm Logs step of integration-mobile-test-qvac-lib-infer-llamacpp-llm.yml,
consistent with the project's supply-chain security policy.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* chore[notask]: add # v7.0.0 trailing comment to upload-artifact SHA pin

---------
…ification into SDK

- Add ggml-classification plugin with bundled model support (skipPrimaryModelPathValidation)
- Export classify() client API, encoding images as base64 for RPC transport
- New schemas: classificationConfigSchema, classifyRequestSchema, classifyResponseSchema
- Register ggmlClassification ModelType with "classification" alias and addon map entry
- Add PLUGIN_CLASSIFICATION and ADDON_CLASSIFICATION constants to SDK_DEFAULT_PLUGINS
- loadModel supports optional modelSrc (classification ships bundled GGUF weights)
- Wire classifyRequest/Response into common RPC request/response unions
- Add "classification" to ModelInfo.addon and model registry enums

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@DmitryMalishev DmitryMalishev requested review from a team as code owners May 14, 2026 11:20
@DmitryMalishev DmitryMalishev added tier1 test-e2e-smoke Triggers smoke e2e test suite [Currently SDK-only] and removed tier1 labels May 14, 2026
Comment thread packages/sdk/package.json Outdated
"changelog:generate": "node ../../scripts/sdk/generate-changelog-sdk-pod.cjs --package=sdk && prettier --write changelog"
},
"dependencies": {
"@qvac/classification-ggml": "file:../classification-ggml",

Contributor

@DmitryMalishev you need to publish the package first and refer to it here

Contributor Author

Done!

@qvac/classification-ggml@0.1.0 is now published on the public npm
registry (https://www.npmjs.com/package/@qvac/classification-ggml), so
the SDK no longer needs the `file:../classification-ggml` workspace
reference. Switching to `^0.1.0` to match every other addon dependency
in this manifest.
})
.strict()
.transform((data) => ({
type: "loadModel" as const,

Contributor

This transforms classification load options into a loadModel request with modelType: ModelType.ggmlClassification, but the server-facing loadModelSrcRequestSchema below does not include a loadClassificationModelRequestSchema arm.

Repro path:

  • await loadModel({ modelType: "classification" }) parses here and produces { type: "loadModel", modelType: "ggml-classification", modelSrc: "", ... }.
  • rpc-client then runs requestSchema.parse(request) before sending.
  • requestSchema delegates to loadModelRequestSchema -> loadModelSrcRequestSchema, whose union currently includes llm/whisper/parakeet/embedding/nmt/tts/ocr/diffusion/custom only.
  • Because ggml-classification is now a built-in model type, the custom-plugin arm also rejects it via !builtInModelTypes.has(val).

Please add the classification request schema to loadModelSrcRequestSchema as well, and ideally add a schema test for loading without modelSrc so this path stays covered.

...(params.channels !== undefined && { channels: params.channels }),
};

for await (const response of streamRpc(request)) {

Contributor

The client sends a top-level RPC request with type: "classify", but the server registry does not register a classify handler.

Repro path:

  • classify() calls streamRpc(request) with type: "classify".
  • requestSchema accepts the new request because classifyRequestSchema was added to schemas/common.ts.
  • server/rpc/handle-request.ts then looks up registry[request.type].
  • server/rpc/handler-registry.ts has no classify entry, so the request fails with RPCUnknownRequestTypeError before it can reach classificationPlugin.handlers.classify.

Please add a server RPC handler that dispatches classify to the plugin handler, and wire it into both the handler exports/map and handler-registry.ts.
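
The failure mode can be reproduced in miniature. The sketch below is hypothetical: the `Handler` type, the `dispatch` function and the error message are stand-ins for the lookup in `server/rpc/handle-request.ts`, which is not shown in this thread.

```typescript
// Hypothetical miniature of a type-keyed RPC handler registry.
type RpcRequest = { type: string };
type Handler = (request: RpcRequest) => string;

class RPCUnknownRequestTypeError extends Error {
  constructor(readonly requestType: string) {
    super(`Unknown RPC request type: ${requestType}`);
  }
}

function dispatch(registry: Record<string, Handler>, request: RpcRequest): string {
  const handler = registry[request.type];
  // A request type that passed schema validation can still have no handler.
  if (!handler) throw new RPCUnknownRequestTypeError(request.type);
  return handler(request);
}

// Registry before the fix: no "classify" entry.
const registryBefore: Record<string, Handler> = {
  loadModel: () => "loaded",
};

// Registry after the fix: "classify" forwards to the plugin handler (stubbed here).
const registryAfter: Record<string, Handler> = {
  ...registryBefore,
  classify: () => "classified",
};
```

Dispatching `{ type: "classify" }` against `registryBefore` throws, while the same request against `registryAfter` reaches the stubbed handler.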

loadConfigSchema: classificationConfigSchema,
skipPrimaryModelPathValidation: true,

createModel(params: CreateModelParams): PluginModelResult {

Contributor

classificationConfigSchema exposes topK as load-time model config, but this createModel() implementation never passes it into ImageClassifier or stores it for later. The only topK that affects inference is request.topK in the classify operation, so loadModel({ modelType: "classification", modelConfig: { topK: 3 } }) is accepted but silently ignored.

Please either remove topK from the load config schema, or apply it consistently as the default classification option when a request does not provide its own topK.
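
If the second option is taken, one possible shape is a resolver that falls back to the load-time value. This is only a sketch: `createTopKResolver` and the two interfaces are hypothetical names, not part of the SDK or the plugin.

```typescript
// Hypothetical sketch: load-time topK as the default for per-request topK.
interface ClassificationLoadConfig {
  topK?: number;
}

interface ClassifyRequest {
  topK?: number;
}

function createTopKResolver(loadConfig: ClassificationLoadConfig) {
  return function resolveTopK(request: ClassifyRequest): number | undefined {
    // `??` rather than `||` so an explicit request-level 0 is not overridden.
    return request.topK ?? loadConfig.topK;
  };
}

const resolveTopK = createTopKResolver({ topK: 3 });
```

With this resolver a request without `topK` inherits 3, and a request that sets its own `topK` keeps it.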

PLUGIN_TTS,
PLUGIN_OCR,
PLUGIN_DIFFUSION,
PLUGIN_CLASSIFICATION,

Contributor

Adding classification to SDK_DEFAULT_PLUGINS is only half of the default-plugin path. Pear bundling has its own built-in plugin list/export map in packages/sdk/pear/pre.ts, and that file still only knows about the previous built-ins.

Impact: generated Pear workers that rely on default plugins will not import/register @qvac/sdk/ggml-classification/plugin, so loadModel({ modelType: "classification" }) will fail there even after the SDK-level default list includes it.

Please add @qvac/sdk/ggml-classification/plugin to BUILTIN_PLUGINS and map "ggml-classification" to classificationPlugin in BUILTIN_PLUGIN_EXPORTS.


Labels

test-e2e-smoke Triggers smoke e2e test suite [Currently SDK-only] verify
