chore(qvac-cli): testing pr-request-trigger#1

Merged
Proletter merged 2 commits into main from qvac-cli-integration on Jan 8, 2026

Conversation

@Proletter

Collaborator

No description provided.

@Proletter

Proletter commented Jan 8, 2026

Collaborator Author

/review

@Proletter Proletter merged commit 30e99ec into main Jan 8, 2026
maxim-smotrov added a commit that referenced this pull request Feb 28, 2026
Try #1. Adding tokenizer proxy to provide vocab size.
maxim-smotrov added a commit that referenced this pull request Mar 1, 2026
maxim-smotrov added a commit that referenced this pull request Mar 4, 2026
gianni-cor added a commit that referenced this pull request Mar 5, 2026
* Try #1. Adding tokenizer proxy to provide vocab size.

* Try #2. More fixes and logs.

* Try #3. Limit device to only cpu or gpu.

* Revert "Try #2. More fixes and logs."

This reverts commit a461e69.

* Revert "Try #1. Adding tokenizer proxy to provide vocab size."

This reverts commit 9951195.

* Fixing pipeline logging

* Add more logs

* Fixing bench logging

* Add more error handling and logging

* Improve error handling on the server. Added retry in case of context overflow.

* Make retries self-adjustable

* Adding some more checks and limiting the datasets temporarily

* Test: trying to narrow down the error

* Exclude failing datasets from embed benchmark

* Clean up the code

* Changing bench model for LLM

* Minor fixes for clarity

* Removing unused vars

* Removing unused imports

* Removing unused python deps

---------

Co-authored-by: gianni <gianfranco.cordella@tether.io>
donriddo added a commit to donriddo/qvac that referenced this pull request Apr 17, 2026
Previous commit 979a070 reworded only my own addition (line 251) but
the block still failed at the same position because the surrounding
pre-existing message bodies still used ; as a statement separator.
Mermaid sequenceDiagram parses ; as end-of-statement, so every message
containing it broke the diagram.

Replace ; with , or a separator word across all four affected lines
(block #1 lines 251, 256, 266 and block #2 line 296) so the finetune
and pause flow diagrams render on GitHub.
gianni-cor added a commit that referenced this pull request Apr 17, 2026
…ightsProvider (#1494)

* chore[bc]: remove BaseInference inheritance and WeightsProvider from LLM addon

Replace class inheritance with composable utilities from @qvac/infer-base@0.4.0:
- createJobHandler() for single-job lifecycle management
- exclusiveRunQueue() for run serialization
- Direct shard streaming via bare-fs instead of WeightsProvider

Constructor now takes { files: { model: string[], projectionModel?: string }, config, logger, opts }
instead of { loader, diskPath, modelName, projectionModel } + config.

All finetune, media, and filtered logger functionality preserved.

* fix: correct FinetuneProgress and finetune terminal handling in output callback

FinetuneProgress must call updateStats(data.stats), not updateOutput(data).
Finetune terminal JobEnded must call ended(data) as result, not updateStats.

* fix: update all LLM examples and model-loading test to new constructor shape

Update 13 examples and sharded model test to use files: { model: [...] } pattern.
Remove FilesystemDL dependency from all examples and tests.

* fix: update sharded model test to download shards to disk first

The network loader test used the old loader-based constructor.
Rewritten to download shards via HttpDL to disk, then pass absolute paths.

* fix: update LLM benchmark tooling to new constructor shape

* fix: update LLM perf benchmark sweep and judge to new constructor shape

* docs: update LLM README, finetuning, and afriquegemma docs for new constructor

* fix: update LLM prepare-prompts and verify-prompts to new constructor

* fix: update LLM finetuning unit tests to new constructor and exclusiveRunQueue

* docs: update LLM architecture, data-flows, finetuning, README sharded contract

* docs: align LLM finetuning docs and mobile README with new constructor

* chore[bc]: address PR #1494 review findings and bump to 0.15.0

Bumps `@qvac/llm-llamacpp` to `0.15.0` per the addon-changelog
process — minor bump on a pre-1.0 package signals the breaking
constructor change to consumers using semver ranges. Adds the
matching `0.15.0` block to `CHANGELOG.md` documenting the new
single-object constructor with `files`, the removal of
`BaseInference` + `WeightsProvider`, the dropped `destroy()`
method, the dependency churn, and every behaviour change in this
release.

Hardens the JS layer based on the review:

- Constructor now throws a clear `TypeError` when `files` /
  `files.model` is missing or empty, instead of crashing with an
  opaque "cannot read properties of undefined" later.
- `_runInternal` now throws "Addon not initialized. Call load()
  first." when invoked before `load()`, matching `finetune()` and
  the diffusion addon.
- `_load()` wraps `_streamShards` + `addon.activate()` in a
  try/catch that best-effort-unloads the partially-initialized
  native instance and resets `this.addon = null` so a subsequent
  `load()` does not leak a zombie addon.
- `createJobHandler({ cancel })` closure uses optional chaining so
  a stale `response.cancel()` after `unload()` is a no-op rather
  than a `TypeError`.
- `unload()` sets `this.addon = null` after `addon.unload()`, so
  the new `if (!this.addon)` guard in `_runInternal` is also
  effective post-unload.
- `pause()` and `cancel()` re-add the defensive `?.cancel` check.
- The `_load()` primary-path selection now picks the first entry
  matching the shard regex, replacing the fragile `[length - 1]`
  index. This stays compatible with the documented sharded order
  (`tensors.txt` first, shards second) and with the non-sharded
  single-file path; an inline comment explains the contract.
- The `_handleAddonOutputEvent` error log line now passes the
  `Error` object directly so loggers can format the full stack.
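
The lifecycle guards listed above can be sketched roughly as follows. This is a minimal illustration, not the addon's code: `FakeNativeAddon`, `SketchModel`, and the injected `createAddon` factory are hypothetical names, and the real `_load()` also streams shards before activating.

```javascript
// Hypothetical stand-in for the native binding; not the real @qvac API.
class FakeNativeAddon {
  constructor ({ failActivate = false } = {}) {
    this.failActivate = failActivate
    this.unloaded = false
  }

  async activate () {
    if (this.failActivate) throw new Error('activate failed')
  }

  async unload () { this.unloaded = true }
  async runJob (prompt) { return `echo: ${prompt}` }
}

class SketchModel {
  constructor ({ createAddon }) {
    this.createAddon = createAddon // injected so the sketch is testable
    this.addon = null
  }

  async load () {
    this.addon = this.createAddon()
    try {
      await this.addon.activate()
    } catch (err) {
      // Best-effort unload of the partial instance, then reset the field
      // so a later load() does not build on a zombie addon.
      const partial = this.addon
      this.addon = null
      try { await partial.unload() } catch (_) {}
      throw err
    }
  }

  async run (prompt) {
    if (!this.addon) throw new Error('Addon not initialized. Call load() first.')
    return this.addon.runJob(prompt)
  }

  async unload () {
    if (!this.addon) return
    await this.addon.unload()
    this.addon = null // keeps the run() guard effective after unload
  }
}
```

Because `unload()` nulls the field, calling `run()` before `load()` and calling it after `unload()` fail with the same message.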

Drops dead `_isSuppressedNoResponseLog` /
`_createFilteredLogger` / `_originalLogger` plumbing. Those
existed to swallow `'No response found for job'` warnings emitted
by the old `BaseInference._jobToResponse` Map; the new
`createJobHandler`-based architecture cannot emit that message,
so the filter, the wrapped logger, and the `_originalLogger`
indirection are all gone. The user-supplied logger is now used
directly.

Restores JSDoc on every `FinetuneOptions` field in `index.d.ts`,
including default values (`numberOfEpochs = 1`,
`learningRate = 1e-4`, `batchSize = 128`, …) so IDE tooltips show
them without needing to read `docs/finetuning.md`.

* refactor: move LLM C++ event normalization into addon.js

Per the team-2 task doc (`TD-ADDON-INTERFACE-LLM-EMBED-SD.md`,
LLM section): "Move event name normalization from `index.js`
`_addonOutputCallback` into `addon.js` `LlamaInterface` — the
native binding wrapper should own the mapping from raw C++ events
to Output / Error / JobEnded / FinetuneProgress."

Adds `mapAddonEvent(rawEvent, data, error, state)` as a free
export from `addon.js`, co-located with `LlamaInterface`. The
function normalizes the C++-mangled event vocabulary into one of
`Output` / `Error` / `JobEnded` / `FinetuneProgress`, including:

- TPS-shaped runtime stats → JobEnded with `backendDevice`
  mapped from `0/1` to `'cpu'/'gpu'`.
- Finetune terminal payloads (`{op:'finetune', status, stats?}`)
  → JobEnded carrying the finetune payload, and arms the skip
  flag so the trailing TPS stats from the finetune are not
  dispatched as a fresh inference terminal.
- `finetune_progress` payloads → FinetuneProgress.
- Anything else with an `Error`-flavored event name → Error.
- String payloads → Output.
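
A rough sketch of the mapping rules listed above, in the same order. The payload field names (`TPS`, `type`), the substring test on the event name, and the `{ type, data, error }` return shape are all assumptions made for illustration; the real `mapAddonEvent` in `addon.js` may detect each case differently.

```javascript
// Sketch only: field names and return shape are assumptions, not the
// actual addon.js contract.
function mapAddonEventSketch (rawEvent, data, error, state) {
  const isObject = data !== null && typeof data === 'object'

  // TPS-shaped runtime stats: JobEnded with a readable backendDevice.
  if (isObject && 'TPS' in data) {
    if (state.skipNextRuntimeStats) {
      state.skipNextRuntimeStats = false
      return null // dropped: trailing stats of a finetune that already ended
    }
    const stats = { ...data }
    if (stats.backendDevice === 0) stats.backendDevice = 'cpu'
    else if (stats.backendDevice === 1) stats.backendDevice = 'gpu'
    return { type: 'JobEnded', data: stats, error: null }
  }

  // Finetune terminal payload: JobEnded, and arm the skip flag so the
  // stats that follow are not dispatched as a second terminal event.
  if (isObject && data.op === 'finetune' && 'status' in data) {
    state.skipNextRuntimeStats = true
    return { type: 'JobEnded', data, error: null }
  }

  if (isObject && data.type === 'finetune_progress') {
    return { type: 'FinetuneProgress', data, error: null }
  }

  if (typeof rawEvent === 'string' && rawEvent.includes('Error')) {
    return { type: 'Error', data, error }
  }

  if (typeof data === 'string') {
    return { type: 'Output', data, error: null }
  }

  return { type: rawEvent, data, error } // default fall-through
}
```

Keeping the skip flag on a caller-owned `state` object, rather than in module scope, is what lets each model instance (and each unit test) control it independently.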

`LlmLlamacpp._addonOutputCallback` becomes a thin shim that
imports `mapAddonEvent`, hands it the per-instance state object
(now `this._addonEventState = { skipNextRuntimeStats }` instead
of the bare `_skipNextRuntimeStats` field), and forwards the
mapped event to `_handleAddonOutputEvent`.

Stateful flag lives on the model so unit tests can still poke at
it via `model._addonEventState.skipNextRuntimeStats`. Updated all
9 references in `test/unit/finetuning.test.js`. All 31 unit
tests still pass; lint and dts checks clean.

Also fixes the misleading JSDoc on `LlamaInterface.loadWeights`:
the native binding reads the JS property name `chunk` (verified
in `qvac-lib-inference-addon-cpp/JsBlobsStream.hpp::appendBlob`,
lines 41–42 and 66–67), not `contents`. The C++ local variable
is named `contents`, which is what the proposal text was
referencing — but the on-the-wire JS property name is `chunk`
and the JS layer call sites are correct.

* fix: address PR #1494 second-round review findings

1. `test/integration/http-loader.js` no longer extends
   `@qvac/dl-base`. The base class was only providing a `close()`
   shim around `_close()`, and the package's devDependencies no
   longer list `@qvac/dl-base` after the loader-removal refactor.
   The helper now stands on its own — `getStream()` and `close()`
   are the only methods the sharded model-loading test calls, so
   the rest of the BaseDL surface (including the unused
   `getFileSize` and `list`) is dropped. Removes the dangling
   require that would break a clean install of this package and
   block the sharded test in CI.

2. `examples/multiModal.js` no longer passes `content: imageFilePath`
   on the second `media` message. The native binding only accepts
   `Uint8Array` payloads on `media` messages — file paths were
   silently broken after the loader removal. The example now
   reuses the same `imageBuffer` for both inferences and uses a
   different prompt on the second one to keep the example
   pedagogically distinct.

3. `index.d.ts` `AddonMessage` now exposes the optional
   `generationParams?: GenerationParams` field. The runtime path
   in `LlmLlamacpp._runInternal` already serializes this field
   onto every text message it forwards through `addon.runJob`,
   but the published transport type omitted it — IDE consumers
   building their own message-shaped payloads would lose the
   per-call overrides. The field documents that it is forwarded
   from `RunOptions.generationParams` and is the canonical way
   to vary sampling per request without re-loading the model.

* fix: extract pickPrimaryGgufPath, restore multiModal example, fix docs

- Extract shard-picker logic into named pickPrimaryGgufPath() with unit
  tests documenting the contract (tensors.txt-first ordering, single-file
  fallback). Move SHARD_REGEX inside the function.
- Revert multiModal.js to original: first inference uses Uint8Array,
  second uses string path. Both C++ code paths work. Remove false comment
  claiming file paths are not supported.
- Restore stripped JSDoc on FinetuneValidationSplit.fraction and
  FinetuneValidationDataset.path in index.d.ts.
- Fix docs/architecture.md and docs/data-flows-detailed.md: 4 occurrences
  incorrectly said "last" shard is the primary path; actual code picks
  the first shard regex match.
- Hardcode shard filenames in model-loading integration test instead of
  generating them via regex.
- Add network streaming capability loss note to CHANGELOG.
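
The shard-picking contract described above might look like the sketch below. The regex follows the llama.cpp split naming (`<name>-00001-of-00005.gguf`) and the fall-back to the first entry is an assumption; the addon's real `pickPrimaryGgufPath()` may use a different pattern or fallback.

```javascript
function pickPrimaryGgufPathSketch (modelPaths) {
  // Assumed shard pattern, kept inside the function as the commit describes.
  const SHARD_REGEX = /-\d{5}-of-\d{5}\.gguf$/
  // First regex match wins, so a leading tensors.txt entry is skipped.
  const firstShard = modelPaths.find((p) => SHARD_REGEX.test(p))
  // Assumed fallback for the non-sharded case: the first (only) entry.
  return firstShard !== undefined ? firstShard : modelPaths[0]
}

// Usage with hypothetical paths:
const sharded = [
  '/models/big.tensors.txt',
  '/models/big-00001-of-00003.gguf',
  '/models/big-00002-of-00003.gguf',
  '/models/big-00003-of-00003.gguf'
]
const primary = pickPrimaryGgufPathSketch(sharded)
```

Picking the first match rather than indexing `[length - 1]` means the result no longer depends on how many shards follow.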

* fix: correct version in architecture.md and remove stale dl-filesystem benchmark dep

- docs/architecture.md header: v0.14.3 → v0.15.0 to match package.json
- benchmarks/performance/package.json: remove @qvac/dl-filesystem (no
  longer used after FilesystemDL references were removed from all
  benchmark JS files)

* fix: align _hasActiveResponse clearing with embed pattern

Remove the synchronous clear in _handleAddonOutputEvent on JobEnded/Error.
The .finally() on response.await() already clears the flag when the response
promise settles, and exclusiveRunQueue serializes _runInternal so the next
call cannot race the current one. Matches the embed addon's pattern, where
.finally() is the sole clear path outside of unload().
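
To illustrate why a single `.finally()` clear is enough once runs are serialized, here is a minimal sketch. `createExclusiveQueue` is a hypothetical promise-chain queue, not the real `exclusiveRunQueue()` from `@qvac/infer-base`, and in this sketch each queued task resolves only when its job settles, which may be stricter than what the addon does.

```javascript
// Hypothetical queue: each task starts only after the previous one settles.
function createExclusiveQueue () {
  let tail = Promise.resolve()
  return function run (task) {
    const result = tail.then(() => task())
    tail = result.catch(() => {}) // keep the chain alive after a rejection
    return result
  }
}

class SketchRunner {
  constructor () {
    this._run = createExclusiveQueue()
    this._hasActiveResponse = false
    this.log = []
  }

  run (label, ms) {
    return this._run(() => {
      // Unreachable while the queue serializes runs; kept as a safety net.
      if (this._hasActiveResponse) throw new Error('a job is already active')
      this._hasActiveResponse = true
      this.log.push(`start ${label}`)
      return new Promise((resolve) => setTimeout(resolve, ms, label))
        .finally(() => {
          // Sole clear path: runs whether the job resolves or rejects.
          this._hasActiveResponse = false
          this.log.push(`end ${label}`)
        })
    })
  }
}
```

Two overlapping `run()` calls on the same `SketchRunner` execute back to back, so the second one always observes a cleared flag.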

* fix: throw on second load(), log rejected responses, add mapAddonEvent unit test

- load(): throw if already loaded. Caller must unload() first. Aligns
  with the team consensus (Yury/Gianfranco/Gustavo) — silent reload
  masks caller bugs. unload() already clears configLoaded.
- _runInternal / finetune: replace silent `finalized.catch(() => {})`
  with a warn-level log so rejected responses are not swallowed when
  the caller does not await.
- test/unit/map-addon-event.test.js: new unit test covering TPS stats
  mapping + backendDevice translation, skipNextRuntimeStats dropping,
  finetune terminal + skip-flag arming, finetune_progress, Error event,
  string-as-token Output, and default fall-through.
- CHANGELOG 0.15.0: document the load() throw.

* fix: restore JSDoc on run() that was dropped during BaseInference removal

The JSDoc documenting run()'s prompt and runOptions parameters was
accidentally removed during the BaseInference removal refactor when
run() was split into run() + _runInternal(). Restore it on the public
run() method, and reference the full RunOptions type (which already
documents prefill / generationParams / cacheKey / saveCacheToDisk in
index.d.ts) so the docs stay authoritative in one place.

* fix: migrate afriquegemma-edge-cases test to new addon constructor

The afriquegemma-edge-cases.test.js file came in via the upstream/main
merge but still used the pre-refactor constructor shape:
  new LlmLlamacpp({ loader, modelName, diskPath, ... }, config)
with a FilesystemDL loader. All 7 tests in the file are now migrated to:
  new LlmLlamacpp({ files: { model: [path.join(dirPath, modelName)] },
                    config, logger, opts })
Removed FilesystemDL import and all loader.close() calls. Added
isMobile skip flag matching the pattern in afriquegemma-translation.

Caught by the qvac-staff-code-reviewer agent as a "merge brought in a
new consumer of the old API" — restore-the-class issue across the family.

* fix: make load() idempotent when already loaded

Second load() on an already-loaded instance returns immediately instead
of throwing. Matches the ReadyResource pattern used elsewhere in QVAC:
open/load is idempotent; explicit unload() is required to swap weights.

CHANGELOG updated.

* test: regenerate mobile integration auto.cjs

Integration test files were touched during the refactor and the
generated mobile harness was not regenerated. `npm run test:mobile:generate`
output committed so `validate-mobile-tests.js` passes.

* doc: document missing breaking changes from BaseInference removal

Address feedback to report all breaking changes from the BaseInference
refactor, not just the constructor shape:

- getState() narrows from {configLoaded, weightsLoaded, destroyed}
  to {configLoaded} only
- LlmLlamacpp public methods removed: downloadWeights, unpause, stop,
  status, destroy, getApiDefinition (destroy was already mentioned;
  other five were missing)
- load() takes no arguments (was (closeLoader, onDownloadProgress))
- Type exports removed from index.d.ts: ReportProgressCallback,
  Loader, DownloadWeightsOptions, DownloadResult

Also fix the stale (0.15.0) version marker in the AFTER code block.

* fix: address lifecycle, validation, and CI-surface review findings

- load() now runs through `this._run()` so concurrent calls on the same
  instance serialize instead of racing past the `configLoaded` guard.
  Two overlapping loads could previously both allocate a native addon
  and clobber `this.addon`, leaking one native handle.
- Constructor now validates each `files.model` entry with
  `path.isAbsolute()` and applies the same check to the optional
  `files.projectionModel` (which previously had no validation at all).
  Relative paths are rejected at construction time instead of bubbling
  up from bare-fs / native load.
- `pickPrimaryGgufPath` is now declared in `index.d.ts` so the TS
  surface matches the CommonJS export at `index.js`.
- Add `test:unit` and `test:unit:generate` scripts that run the JS
  unit tests under `test/unit/*.test.js` via brittle + bare. Wire
  `test:unit` into `test:all` and into the PR workflow's ts-checks
  job so `map-addon-event.test.js`, `pick-primary-gguf-path.test.js`,
  and the pre-existing `finetuning.test.js` all run on every PR.
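
The constructor-time path validation described above could be sketched like this. `validateFilesSketch` is a hypothetical helper, the error messages are invented, and the sketch uses Node's `path` module, whereas the addon itself runs under bare and may import a different path module.

```javascript
const path = require('path')

function validateFilesSketch (files) {
  if (!files || !Array.isArray(files.model) || files.model.length === 0) {
    throw new TypeError('files.model must be a non-empty array of absolute paths')
  }
  for (const entry of files.model) {
    if (typeof entry !== 'string' || !path.isAbsolute(entry)) {
      throw new TypeError(`files.model entry must be an absolute path: ${entry}`)
    }
  }
  // The optional projection model gets the same check.
  const proj = files.projectionModel
  if (proj !== undefined && (typeof proj !== 'string' || !path.isAbsolute(proj))) {
    throw new TypeError(`files.projectionModel must be an absolute path: ${proj}`)
  }
  return files
}

// Usage: resolve relative locations before handing them over.
const modelsDir = path.resolve('./models')
const validated = validateFilesSketch({ model: [path.join(modelsDir, 'model.gguf')] })
```

Rejecting relative paths up front turns a late filesystem or native-load failure into an immediate, readable error at the call site.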

* doc: add CHANGELOG entries for load() serialization and absolute-path validation

* fix[ci]: run test:unit via run-lint-and-unit-tests action

Replace my hand-rolled test:unit step (which invoked `bare` in a job
that never installs it) with the existing run-lint-and-unit-tests
external action. Same pattern qvac-lib-infer-onnx and ocr-onnx already
use. The action installs bare globally and runs
`npm run test:unit --if-present`.

Also chain test:unit into the `test` script for local dev convenience,
matching the standalone-repo precedent (qvac-lib-inference-addon-base,
qvac-lib-dl-filesystem, etc.).

* doc: fix mermaid parsing errors in architecture.md and finetuning.md

architecture.md:159 — mermaid classDiagram uses { } as class-body
delimiters; the inline destructured-object syntax in the constructor
signature broke parsing. Replace with the canonical named type
`LlmLlamacppArgs` from index.d.ts so the class diagram renders.

finetuning.md:251 — sequence-diagram message contained `(_run)` and
`_hasActiveResponse` where the leading underscore was being
interpreted as mermaid italic-open, and slashes in
`validationSplit/useEvalDatasetForValidation/evalDatasetPath` made
the message ambiguous. Reword to use prose-style commas and drop the
leading-underscore identifiers.

Reported by maxim-smotrov.

* chore[ci]: rename step to reflect what the action actually runs

The run-lint-and-unit-tests action runs `npm run lint` and
`npm run test:unit` (and installs bare in between). The step name
"Run JavaScript tests" hides the lint half. Rename to
"Run lint and unit tests" and update the step id accordingly.

* fix: readme, finetune lifecycle, multimodal type

README quickstart, sharded, and OCR examples now use `path.resolve('./models')`
so the resulting `files.model` entries and `files.projectionModel` are
absolute. The refactored constructor rejects relative paths, which meant
the README snippets threw `TypeError` when copied verbatim.

`finetune()` moves the `!this.addon` readiness check and the
`_checkpointSaveDir` assignment inside the `this._run(...)` closure,
matching the pattern `run()` uses via `_runInternal`. If `unload()` is
already queued ahead of `finetune()`, the guard now runs after
`unload()` nulls `this.addon` instead of before, so the caller gets the
intended "Call load() first." error rather than a null-dereference
crash inside the queued body.

`UserMediaMessage.content` widens from `Uint8Array` to `Uint8Array | string`.
The C++ layer has always accepted both (raw bytes go through `parseMedia`;
string paths go through `loadMedia` in LlamaModel.cpp), and the OCR /
multimodal examples exercise the string-path form. The d.ts was
inadvertently narrower than the runtime contract.

* fix: preserve LogMsg event name in mapAddonEvent

Native `JsLogMsgOutputHandler` emits log events whose payload is a
plain string (`js::String::create(env, logMsg)`). The old mapping had
a generic `typeof rawData === 'string'` fallback that remapped every
string-payload event to `Output`, so any native LogMsg was quietly
pushed into the job output stream instead of the logger. The
`_handleAddonOutputEvent` branch that routes `LogMsg` to
`this.logger.info()` was therefore unreachable.

Check the `LogMsg` event name before the string-to-Output fallback so
log messages keep their type and reach the logger. Add a unit test
covering the precedence.
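
The precedence fix amounts to ordering two checks. A minimal sketch, assuming the raw event name contains the substring `LogMsg` (the exact native event string is not shown in this message) and using a hypothetical `{ type, data }` return shape:

```javascript
function classifyStringEventSketch (rawEvent, rawData) {
  // Event-name check first: a log message keeps its type even though
  // its payload is a plain string.
  if (typeof rawEvent === 'string' && rawEvent.includes('LogMsg')) {
    return { type: 'LogMsg', data: rawData }
  }
  // Generic fallback second: any other string payload is an output token.
  if (typeof rawData === 'string') {
    return { type: 'Output', data: rawData }
  }
  return { type: rawEvent, data: rawData }
}
```

With the two checks swapped, the first branch can never be reached for string payloads, which is the unreachable-logger symptom described above.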

* doc: restore class JSDoc, method JSDoc, and media-separation comments

Restore documentation that the refactor dropped but whose content is
still accurate against the refactored code:

- Class-level JSDoc on LlmLlamacpp describing what the class does.
- Short JSDoc on pause(), cancel(), and unload() explaining each method's
  purpose, including how pause() saves a resumable checkpoint and how
  cancel() wipes it so the next finetune() starts fresh.
- Inline comments in _runInternal explaining the media/text separation:
  binary blobs go into promptMessages as type: 'media' entries in order,
  then the JSON text payload carries empty-content placeholders for each
  media item so tokenization can align.

* doc: shorten pickPrimaryGgufPath JSDoc in d.ts to a single line

Declaration-file JSDoc surfaces in IDE hover tooltips, so multi-paragraph
prose is noise. Trim to a one-liner covering the only behavior the type
hover needs to convey. The "exported for unit testing" rationale is
dropped since consumers do not need it on the type surface.

* doc: trim verbose comments added during the refactor

Tighten comments this PR introduced that drifted into over-explanation.
Leave pre-existing comments as-is.

- addon.js mapAddonEvent JSDoc: drop the multi-paragraph prose about
  C++ event naming and stateful ordering; keep the one-sentence
  contract plus the param block.
- index.js pickPrimaryGgufPath JSDoc: replace the multi-paragraph
  explanation of the caller's shard-list contract with a single-line
  summary citing the C++ regex contract.
- index.js class header on LlmLlamacpp: reduce to a single purpose line.
- index.js constructor block: shorten the lazy-deref rationale and the
  _addonEventState comment to one line each.
- index.js _addonOutputCallback: reduce the three-line comment
  pointing at addon.js to a single line. The detailed rationale is
  already in addon.js mapAddonEvent JSDoc.
- index.js media-separation comment: restore the one-line wording that
  already existed on main; earlier revision expanded it into three
  lines unnecessarily.

* doc: drop narration comment on _addonOutputCallback

The comment said "Event-name normalization lives in addon.js
(mapAddonEvent)", but the very next line imports and calls
mapAddonEvent — the code already tells the reader where event mapping
lives. Remove the line so the code speaks for itself.

* doc: restore FinetuneOptions JSDoc to pre-refactor forms

The refactor commit unintentionally rephrased FinetuneOptions JSDoc
lines that the refactor itself did not change. Revert those fields back
to main's original wording so the diff only carries structural changes
tied to the interface migration.

* doc: restore pre-refactor load/createAddon logs and JSDoc

The refactor commit silently dropped the _load() progress logs ('Creating
addon with configuration', 'Activating addon'), the 'Error during model
load' error log, and the JSDoc block on _createAddon(). Put them back so
the refactor only changes what needs to change.

* chore: drop unused 'test' script, inline into 'test:all'

The 'test' alias was only consumed by 'test:all', and neither was
referenced in CI workflows or the README. 'test:all' ran test:unit
twice because it called both test:unit and the 'test' alias. Remove
'test' and rewrite 'test:all' to run test:unit, test:integration, and
test:cpp directly.

* doc: correct pre-refactor constructor marker to <= 0.15.x

0.15.x still used the old (args, config) constructor shape; the old
example applies to any 0.15.x caller, not just 0.14.x. Align the
CHANGELOG marker with the PR body.

* test: run AfriqueGemma tests on mobile, matching main

The backmerge of upstream/main carried a stale 'skip: isMobile' from
the pre-refactor translation test into the six new translation tests
and the edge-cases migration. Main's a570189 deliberately dropped
the mobile skip; restore that intent. The isMobile constant is
unused after this and dropped.

* doc, test: fix _createAddon JSDoc and cover string-path media content

_createAddon() JSDoc referenced 'configurationParams.settings' and
omitted 'projectionPath'. The actual shape built in _load() is
{ path, projectionPath, config }; align the JSDoc with that.

UserMediaMessage.content widened to Uint8Array | string earlier in
this PR but no integration test exercised the string-path branch.
Add one elephant-image test that passes the absolute path as
message content, exercising the loadMedia(string) path through the
JS-to-C++ handoff.

* build: promote @qvac/logging to runtime dependency

index.js calls require('@qvac/logging') at runtime, so it belongs under
dependencies, not devDependencies. Previously it worked only because
another runtime dep pulled it in transitively — fragile for publish
and can break under stricter package managers.

* doc: finish finetuning.md mermaid fix

Previous commit 979a070 reworded only my own addition (line 251) but
the block still failed at the same position because the surrounding
pre-existing message bodies still used ; as a statement separator.
Mermaid sequenceDiagram parses ; as end-of-statement, so every message
containing it broke the diagram.

Replace ; with , or a separator word across all four affected lines
(block #1 lines 251, 256, 266 and block #2 line 296) so the finetune
and pause flow diagrams render on GitHub.

* fix: move addon construction into crash-safe try block

_createAddon() was outside the try so a synchronous throw in
require('./binding') or binding.createInstance() would leave
this.addon set to a partial native handle and never reach the
cleanup path. Route addon construction through the same try the
shard-streaming and activate() calls use.

---------

Co-authored-by: gianni-cor <gianfranco.cordella@tether.io>
iancris added a commit that referenced this pull request Apr 24, 2026
Previous run (8267369) landed the abort callback and synchronous
__android_log_print path, but the Samsung S25 tombstone still shows
only the backtrace — no "ggml-nmt-abort" / "GGML_ABORT:" line.

Root cause: ggml-backend loads every backend via
  dlopen(path, RTLD_NOW | RTLD_LOCAL)
(see ggml-backend-reg.cpp:159). Because libggml-base.a is statically
linked into each backend .so, the g_logger_state and g_abort_callback
symbols are PRIVATE per .so. ggml_log_set / ggml_set_abort_callback
called from the main .bare only mutates the main .bare's copy. When a
GGML_ASSERT fires inside libqvac-ggml-opencl.so (or vulkan), it runs
that .so's uninstalled callback and falls back to stderr — which is
dropped on Android. That is why the crash looks silent despite the
callback being installed in the main .bare.

The tombstone backtrace frame #1 shows ggml_abort+228 at a load-base
offset (~0x5edc000) that doesn't match the main .bare (~0x8fc8000),
confirming the abort runs out of a different mapped module.

Workaround: after ggml_backend_load_all_from_path returns, iterate
backendsDir for libqvac-ggml*.so, re-open each with
RTLD_NOW | RTLD_NOLOAD (returns the existing handle without remapping),
dlsym ggml_log_set + ggml_set_abort_callback out of the .so, and call
them with our callbacks so each backend .so's copy of the state is
also wired up. dlclose drops only our extra reference; the .so stays
loaded.

Each install is logged synchronously via __android_log_print with the
"ggml-nmt" tag so the logcat confirms which .sos received the hooks
before the crash.
DmitryMalishev added a commit that referenced this pull request Apr 28, 2026
…ape (review #1727)

Addresses two `[BUG]` review comments from @olyasir on #1727
about the hardcoded `kNumClasses = 3` not being validated against either
the loaded GGUF's `mobilenet.num_classes` metadata or the actual element
count of the constructed output tensor. Both are downstream-safety
problems for the per-inference path:

  float logits[graph::kNumClasses] = {0.0F};
  ggml_backend_tensor_get(impl_->compute.output, logits, 0, sizeof(logits));

`sizeof(logits)` is fixed at compile time. With a mismatched GGUF, this
either reads OOB (numClasses < kNumClasses) or silently truncates
(numClasses > kNumClasses); on the FC-weight-upload side the
`classifier.3.weight = [1024, kNumClasses]` shape would also fail to
match the GGUF tensor and corrupt the classifier.

Changes:

1. addon/src/model-interface/MobileNetGraph.cpp -- graph::loadWeights()

   Right after reading `numClasses` from `mobilenet.num_classes`,
   compare against `kNumClasses` and `throw StatusError(InvalidArgument, ...)`
   with a descriptive message (actual vs expected count, plus a hint to
   rebuild the addon or use a matching GGUF). This is the primary fix
   olyasir requested in `MobileNetGraph.cpp`.

   The error path is reachable from `ClassificationModel::load()`'s call
   to `graph::loadWeights(...)`, which already runs inside the JS-side
   `await classifier.load()` Promise; the `StatusError(InvalidArgument)`
   propagates as a structured rejection on the JS side, matching how
   every other config-time validation error in this addon surfaces.

2. addon/src/model-interface/MobileNetGraph.cpp -- graph::buildGraph()

   At the end of the graph build, before we hand the
   `ComputeGraph::output` tensor over to the backend allocator, assert
   `ggml_nelements(cg.output) == kNumClasses` and `raise(...)` (which
   throws `StatusError(InternalError, ...)`) if the invariant is
   violated. This is the defence-in-depth fix olyasir requested in the
   second `[BUG]` comment in `ClassificationModel.cpp`: it makes the
   12-byte stack-array `ggml_backend_tensor_get` read provably safe
   regardless of how the output tensor was constructed.

   This second check is not redundant with #1: it also catches a future
   accidental edit to the classifier wiring above (where the tail
   `classifier.3` linear is what determines the output element count),
   an upstream ggml change to how `mul_mat` shapes its result, or a
   GGUF that lacks the `mobilenet.num_classes` metadata key entirely
   and falls back to `kNumClasses` but ships mismatched FC weights.

Local validation on win32-x64:

- 15/15 C++ unit tests pass (BnEpsilonGuard, classification graph
  determinism, preprocessor suite -- they all exercise the validated
  load + build paths against the bundled FP16 GGUF, where
  `num_classes == 3` so neither check fires).
- 14/14 JS integration tests pass, 140/140 asserts (no behaviour
  change for the supported model; new error paths are unreachable
  with the bundled weights).

Files: packages/qvac-lib-infer-ggml-classification/addon/src/model-interface/MobileNetGraph.cpp
Made-with: Cursor
DmitryMalishev added a commit that referenced this pull request Apr 29, 2026
CI run 25074595106 confirmed the two-phase test-side drain
(commit f26f561) is sufficient for the upstream `OutputCallBackJs`
UAF on every platform: linux-x64/-arm64, darwin-arm64,
android-arm64, ios-arm64 all pass.

Only `win32-x64-integration-tests` still fails, and it does so for
a completely different upstream issue: the first
`js_create_double` call inside an `OutputCallBackJs` callback
returns 0.0 on win32-x64 (clang-cl + bare-runtime + V8) regardless
of the input. Subsequent calls in the same handle scope are
correct. The bug zeros out the highest-confidence value on every
classify() call, breaks the sort order, and trips
`meal_1.jpg "sorted desc [0]>=[1]"` (CI runs 24851301107,
24891210942, 24897445066, 24900278513, 25002820522, 25062157099,
25070800838, 25074595106).

There is no test-side workaround for this one. Sleeps don't help
because it isn't a lifecycle race. Other addons accidentally dodge
it for the reasons enumerated in the comment block at the top of
`AddonJs.hpp` (first emitted number is naturally 0; tests assert
only typeof / !isNaN; first number never asserted on; or no
numbers emitted at all). None of those escapes applies to our 3-class
triage assertions, so the bug remains visible in CI.

Fix: restore the local C++ "burn one" workaround that was removed
in commit 7ccb9f5. A throwaway `js_create_double(env, 0.0,
&dummy)` call at the top of `JsClassifyOutputHandler`'s lambda
consumes the broken first slot; the per-element `Number::create`
calls that follow produce the correct value at index 0. The
throwaway value is never wired into the result array; cost is one
ephemeral js_number per classify() call.
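
A minimal sketch of the burn-one idea, under stated assumptions: `FlakyScope` is a hypothetical stand-in that imitates the described symptom (the first number created in a scope comes back as 0.0), not the real `js_create_double` API, and `marshal` / `isSortedDesc` are illustrative names only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the faulty marshalling layer: the first number
// created in a scope comes back as 0.0, later ones are correct.
class FlakyScope {
public:
    double createDouble(double value) {
        const bool first = !used_;
        used_ = true;
        return first ? 0.0 : value;
    }

private:
    bool used_ = false;
};

// Marshal confidences one by one. With burnOne set, a throwaway value absorbs
// the broken first slot before any real element is created.
std::vector<double> marshal(const std::vector<double>& confidences, bool burnOne) {
    FlakyScope scope;
    if (burnOne) {
        (void)scope.createDouble(0.0);  // discarded, never stored in the result
    }
    std::vector<double> out;
    out.reserve(confidences.size());
    for (double c : confidences) {
        out.push_back(scope.createDouble(c));
    }
    return out;
}

// True when values are in non-increasing order (the "sorted desc" assertion).
bool isSortedDesc(const std::vector<double>& v) {
    for (std::size_t i = 1; i < v.size(); ++i) {
        if (v[i - 1] < v[i]) return false;
    }
    return true;
}
```

Without the throwaway call, the top confidence is zeroed and the descending-order check fails; with it, index 0 carries the real value.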

The asymmetry between issues #1 (test-side sleep is enough) and
#2 (needs C++ workaround) is now documented at the top of
AddonJs.hpp -- including the CI runs that surfaced each, why the
test-side approach worked for one and not the other, and the
explicit rationale ("removed once upstream marshalling layer is
patched") for revisiting both.

Local validation on win32-x64:
- `bare-make build` clean.
- `npm run test:integration` 14/14 tests, 140/140 asserts (was
  failing on `meal_1.jpg sorted desc [0]>=[1]` before this).

Expected CI behaviour after this commit:

- Linux x64/arm64, Darwin arm64, Android arm64, iOS arm64 should
  keep passing (this commit doesn't touch their code paths).
- win32-x64 should now pass: the burn-one consumes the broken
  first slot and every per-element confidence marshals correctly.

File: packages/qvac-lib-infer-ggml-classification/addon/src/addon/AddonJs.hpp
Made-with: Cursor
DmitryMalishev added a commit to DmitryMalishev/qvac that referenced this pull request Apr 29, 2026
…ape (review tetherto#1727)

Addresses two `[BUG]` review comments from @olyasir on tetherto#1727
about the hardcoded `kNumClasses = 3` not being validated against either
the loaded GGUF's `mobilenet.num_classes` metadata or the actual element
count of the constructed output tensor. Both are downstream-safety
problems for the per-inference path:

  float logits[graph::kNumClasses] = {0.0F};
  ggml_backend_tensor_get(impl_->compute.output, logits, 0, sizeof(logits));

`sizeof(logits)` is fixed at compile time. With a mismatched GGUF, this
either reads OOB (numClasses < kNumClasses) or silently truncates
(numClasses > kNumClasses); on the FC-weight-upload side the
`classifier.3.weight = [1024, kNumClasses]` shape would also fail to
match the GGUF tensor and corrupt the classifier.
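
The load-time guard described under "Changes" can be sketched roughly as follows. This is an illustration only: `std::invalid_argument` stands in for the addon's `StatusError(InvalidArgument, ...)`, and `validateNumClasses` / `classCountMatches` are hypothetical names, not functions from the addon.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

constexpr std::size_t kNumClasses = 3;  // compile-time size of the logits buffer

// Reject a model whose class count differs from the compiled-in buffer size.
// std::invalid_argument is an assumption standing in for StatusError.
void validateNumClasses(std::size_t numClasses) {
    if (numClasses != kNumClasses) {
        throw std::invalid_argument(
            "num_classes mismatch: model has " + std::to_string(numClasses) +
            ", addon was built for " + std::to_string(kNumClasses));
    }
}

// Non-throwing convenience wrapper around the check above.
bool classCountMatches(std::size_t numClasses) {
    try {
        validateNumClasses(numClasses);
        return true;
    } catch (const std::invalid_argument&) {
        return false;
    }
}
```

Failing at load time keeps both mismatch directions (too few or too many classes) away from the fixed-size read in the per-inference path.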

Changes:

1. addon/src/model-interface/MobileNetGraph.cpp -- graph::loadWeights()

   Right after reading `numClasses` from `mobilenet.num_classes`,
   compare against `kNumClasses` and `throw StatusError(InvalidArgument, ...)`
   with a descriptive message (actual vs expected count, plus a hint to
   rebuild the addon or use a matching GGUF). This is the primary fix
   olyasir requested in `MobileNetGraph.cpp`.

   The error path is reachable from `ClassificationModel::load()`'s call
   to `graph::loadWeights(...)`, which already runs inside the JS-side
   `await classifier.load()` Promise; the `StatusError(InvalidArgument)`
   propagates as a structured rejection on the JS side, matching how
   every other config-time validation error in this addon surfaces.

2. addon/src/model-interface/MobileNetGraph.cpp -- graph::buildGraph()

   At the end of the graph build, before we hand the
   `ComputeGraph::output` tensor over to the backend allocator, assert
   `ggml_nelements(cg.output) == kNumClasses` and `raise(...)` (which
   throws `StatusError(InternalError, ...)`) if the invariant is
   violated. This is the defence-in-depth fix olyasir requested in the
   second `[BUG]` comment in `ClassificationModel.cpp`: it makes the
   12-byte stack-array `ggml_backend_tensor_get` read provably safe
   regardless of how the output tensor was constructed.

   This second check is not redundant with #1: it also catches a future
   accidental edit to the classifier wiring above (where the tail
   `classifier.3` linear is what determines the output element count),
   an upstream ggml change to how `mul_mat` shapes its result, or a
   GGUF that lacks the `mobilenet.num_classes` metadata key entirely
   and falls back to `kNumClasses` but ships mismatched FC weights.

Local validation on win32-x64:

- 15/15 C++ unit tests pass (BnEpsilonGuard, classification graph
  determinism, preprocessor suite -- they all exercise the validated
  load + build paths against the bundled FP16 GGUF, where
  `num_classes == 3` so neither check fires).
- 14/14 JS integration tests pass, 140/140 asserts (no behaviour
  change for the supported model; new error paths are unreachable
  with the bundled weights).

Files: packages/qvac-lib-infer-ggml-classification/addon/src/model-interface/MobileNetGraph.cpp
Made-with: Cursor
gianni-cor added a commit to zoq/qvac-fork that referenced this pull request May 7, 2026
Updates the per-package qvac-fabric overlay (and matching consumer
version constraint) to ship the new fabric release without touching
the default-registry baseline.

- vcpkg/ports/qvac-fabric/vcpkg.json (3x): version 7248.2.3#1 -> 8189.0.0
- vcpkg/ports/qvac-fabric/portfile.cmake (3x): REF -> v${VERSION},
  SHA512 -> 95ebab85... (matches https://github.com/tetherto/
  qvac-fabric-llm.cpp/archive/refs/tags/v8189.0.0.tar.gz, the
  SSH-signed v8189.0.0 tag).
- vcpkg.json (3x): qvac-fabric version>= -> 8189.0.0.

Co-authored-by: Cursor <cursoragent@cursor.com>
GustavoA1604 added a commit that referenced this pull request May 7, 2026
Bundle of correctness, hygiene, and CI-doc fixes from the recent code
review.  Each item below has its own paragraph in the diff comments.

- #1 files-array: add test/utils/runSupertonicTTS.js + test/data/sentences-{medium,long}.js
  to package.json so consumers running the integration tests from the
  npm tarball don't crash with `Cannot find module ../utils/runSupertonicTTS`.
- #2 deps: move @qvac/langdetect-text from runtime dependencies to
  devDependencies (it's only referenced from examples/, which aren't in
  the published files list).
- #3 race-fix: ChatterboxModel::process()'s post-synthesize streaming
  detection used to read engine_->options() outside engineMu_, racing
  with reload().  synthesize() now returns SynthesizeResult { pcm,
  wasStreaming } where wasStreaming is captured under the engine lock
  against the local shared_ptr so process() doesn't have to touch
  engine_ again.
- #4 deferred-load: ChatterboxModel + SupertonicModel constructors
  used to call load() eagerly, so JsInterface::createInstance() (sync
  on the JS thread) was parsing ~370 MB of GGUF on the Bare event loop.
  Both models now implement IModelAsyncLoad: constructors validate +
  return; the actual load is deferred to waitForLoadInitialization(),
  which the new addon_js::activate wraps inside JsAsyncTask::run so the
  parse runs on a worker thread.  binding.cpp registers
  addon_js::activate in place of JsInterface::activate; tts.js now
  awaits the resulting promise.
- #5 dead code: drop _resolvePath (unused), drop the (void)inputObj
  read in AddonJs.hpp::runJob, document FAILED_TO_PAUSE /
  FAILED_TO_STOP / JOB_ALREADY_RUNNING in lib/error.js as reserved-but-
  not-thrown so future maintainers don't delete them blindly (the unit
  suite asserts the values).
- #6 cancel-reset: SupertonicModel grew Chatterbox's cancelRequested_
  reset pattern: cancel() sets it, synthesize() fast-fails on it,
  process() resets it per call so a stale cancel doesn't poison the
  next run.
- #7 useGPU comment: explain in JSAdapter::buildChatterboxConfig that
  the JS layer is the source of truth for useGPU and nGpuLayers wins
  downstream; left a pointer to std::optional<bool> if a future caller
  ever needs to distinguish "absent" from "explicit false".
- #10 fork pointers: README.md and test/utils/downloadModel.js no
  longer point at GustavoA1604/chatterbox.cpp; both reference the
  upstream tetherto/qvac-ext-lib-whisper.cpp/tts-cpp tree now.
- #9 doc: integration-mobile-test-tts-ggml.yml gained a header comment
  on the build-and-test job documenting that continue-on-error is the
  early-days landing posture (merge-guard treats success || skipped as
  pass), with a pointer to tighten once Device Farm provisioning is
  stable.
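
The race fix in item #3 can be sketched as below. This is a simplified model under assumptions: `Engine`, `EngineOptions`, and `Model` are hypothetical reductions of the real classes, and only the names `engineMu_`, `engine_`, `SynthesizeResult`, `pcm`, and `wasStreaming` come from the text above.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

struct EngineOptions { bool streaming; };

// Hypothetical engine: holds its options and returns a dummy PCM buffer.
struct Engine {
    explicit Engine(EngineOptions o) : opts(o) {}
    EngineOptions opts;
    std::vector<std::int16_t> run() const { return std::vector<std::int16_t>(4, 0); }
};

struct SynthesizeResult {
    std::vector<std::int16_t> pcm;
    bool wasStreaming;
};

class Model {
public:
    explicit Model(EngineOptions o) : engine_(std::make_shared<Engine>(o)) {}

    // Swap in a new engine; may run concurrently with synthesize().
    void reload(EngineOptions o) {
        auto fresh = std::make_shared<Engine>(o);
        std::lock_guard<std::mutex> lock(engineMu_);
        engine_ = std::move(fresh);
    }

    SynthesizeResult synthesize() {
        std::shared_ptr<Engine> local;
        bool wasStreaming = false;
        {
            std::lock_guard<std::mutex> lock(engineMu_);
            local = engine_;                       // keeps this engine alive for the call
            wasStreaming = local->opts.streaming;  // read while holding the lock
        }
        // The caller gets the flag with the result and never touches engine_ again.
        return SynthesizeResult{local->run(), wasStreaming};
    }

private:
    std::mutex engineMu_;
    std::shared_ptr<Engine> engine_;
};
```

Returning the flag alongside the PCM means the value always describes the engine that produced the audio, even if `reload()` swaps the engine immediately afterwards.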

Nits:
- 'use strict' added to addonLogging.js (matches every other .js).
- node-vs-bare runtime banners on
  scripts/{generate,validate}-mobile-integration-tests.js.
- ttsOutputDebugString no longer JSON.stringify's the full PCM
  Int16Array on every chunk-streaming event; emits a tiny summary
  ({sampleRate, chunkIndex, isLast, sentenceChunk, outputArrayLen})
  instead.

Tests: 35 passing (33 -> 35; two new assertions cover the deferred-load
contract); 4 skipped real-GGUF tests behind the existing
QVAC_TEST_CHATTERBOX_T3_GGUF / QVAC_TEST_CHATTERBOX_S3GEN_GGUF /
QVAC_TEST_SUPERTONIC_GGUF env-var gates.  Lint clean.

Co-authored-by: Cursor <cursoragent@cursor.com>
GustavoA1604 added a commit that referenced this pull request May 11, 2026
…#1983)

* feat: add @qvac/tts-ggml package (Chatterbox English on qvac-tts.cpp)

New Bare addon wrapping the `qvac-tts::qvac-tts` static library (backed
by the `tts-cpp` port added in tetherto/qvac-registry-vcpkg).  API-compatible
with the Chatterbox engine exposed by `@qvac/tts-onnx` so downstream
consumers can swap backends without touching orchestration code.

## Scope

* First iteration.  Supports Chatterbox **English** only.  Chatterbox
  multilingual, LavaSR enhancer, Supertonic engine, and streaming are
  out of scope and remain in `@qvac/tts-onnx`.  They'll land alongside
  the evolution of qvac-tts.cpp.
* Native backend is the static `qvac-tts` library from the QVAC vcpkg
  registry (`ports/tts-cpp`, baseline `2026-04-21`).  No ONNX Runtime
  dependency.

## JS surface

* `@qvac/tts-ggml` exports `TTSGgml` with the same method shape as
  `ONNXTTS`:  `run` / `runStream` / `runStreaming` / `reload` /
  `unload` / `destroy`.
* `files: { modelDir }` looks for `chatterbox-t3-turbo.gguf` +
  `chatterbox-s3gen.gguf` side-by-side; `files.t3Model` /
  `files.s3genModel` override the defaults.
* Options: `referenceAudio`, `voiceDir` (baked profile), `seed`,
  `nGpuLayers`, `threads`, `outputSampleRate`, plus placeholders for
  the upcoming streaming flags (`streamChunkTokens`,
  `streamFirstChunkTokens`, `cfmSteps`).
* Shared reusable lib code (`lib/textChunker.js`,
  `lib/textStreamAccumulator.js`, `addonLogging.*`) is copied verbatim
  from `@qvac/tts-onnx`.
* New error class `QvacErrorAddonTTSGgml` uses codes **13001–14000**
  to avoid collisions with `@qvac/tts-onnx` (7001–7011) when both
  packages are loaded in the same Bare process.

## Native addon

* `addon/src/model-interface/chatterbox/ChatterboxModel.{hpp,cpp}` —
  `IModel` + `IModelCancel` implementation.  First-iteration strategy:
  assemble argv for `qvac_tts_cli_main` with a scratch `.wav` output
  path, call it synchronously, then parse the resulting 16-bit mono
  PCM wav back into `std::vector<int16_t>` for the JS handler.
  Consequences: every job re-loads the model (~700 ms + inference
  time), no mid-synthesis cancellation, no streaming.  The follow-up
  milestone replaces this with a persistent, struct-based API once
  qvac-tts.cpp exposes one.
* `addon/src/js-interface/{JSAdapter.{hpp,cpp}, binding.cpp}` — JS-to-C++
  config bridging (same string-map pattern as `@qvac/tts-onnx`) and the
  `BARE_MODULE(qvac_tts_ggml, ...)` registration exposing
  `createInstance` / `runJob` / `reload` / `activate` / `cancel` /
  `destroyInstance` / `loadWeights` / `setLogger` / `releaseLogger`.
* `addon/src/addon/AddonJs.hpp` — JS-facing `createInstance` / `runJob`
  / `reload` wrappers that register a `JsAudioOutputHandler` emitting
  `{ outputArray: Int16Array, sampleRate: number }` to JS.

## Build / registry

* `CMakeLists.txt` uses `find_package(qvac-tts-cpp CONFIG REQUIRED)`
  and the standard `cmake-bare` + `cmake-vcpkg` scaffolding (shape
  matches `@qvac/transcription-whispercpp`).
* `vcpkg.json` depends on `tts-cpp` (with a `vulkan` feature passthrough)
  plus `qvac-lib-inference-addon-cpp`, `qvac-lint-cpp`, and `gtest`.
* `vcpkg-configuration.json` points at tetherto/qvac-registry-vcpkg.
  NOTE: the baseline pin here is inherited from
  `@qvac/transcription-whispercpp` and **must be bumped** to a commit
  that contains the `tts-cpp` port once that registry PR lands.  A
  follow-up commit will update it.

## Tests & examples

* Integration + unit test files for Chatterbox English are copied
  verbatim from `@qvac/tts-onnx` with only mechanical renames
  (`ONNXTTS` -> `TTSGgml`, `QvacErrorAddonTTS` -> `QvacErrorAddonTTSGgml`,
  `@qvac/tts-onnx/text-chunker` -> `../../lib/textChunker.js`).  Some
  paths in `test/integration/addon.test.js` still import Supertonic /
  LavaSR helpers that don't exist in this package — those test blocks
  will fail fast when the file loads, which is expected until those
  backends get their own ggml packages.
* Examples: `chatterbox-tts.js`, `chatterbox-streaming-tts.js`, plus
  shared `wav-helper.js` + `pcm-chunk-player.js`.

## What's not in this PR (known gaps)

* No docs: README, NOTICE, CHANGELOG, PULL_REQUEST_TEMPLATE changes
  will land in a single documentation pass once the registry + fork
  commits have merged upstream.
* `vcpkg-configuration.json` baseline needs to point at a
  qvac-registry-vcpkg commit that ships `tts-cpp` (pending the
  registry PR).
* Actual `npm run build` requires the registry and fork commits to be
  on `main` of their respective upstream repos.

* chore: point tts-ggml vcpkg baseline at the tts-cpp-bearing registry commit

Bumps `vcpkg-configuration.json` to GustavoA1604/qvac-registry-vcpkg
at commit 1e2839680b6be8d8ffff889a9c29b966c176098c — the commit that
adds the `tts-cpp` port.  Paired with the `qvac-tts` library already
pinned in the port's `portfile.cmake` (GustavoA1604/chatterbox.cpp
@ 0fe4a521618cc30358040b29d75d4261b31cbb60).

Will be re-pointed at tetherto/qvac-registry-vcpkg once the registry
PR lands upstream.

* chore: tts-ggml: trim tests + examples to Chatterbox English, restore mobile wrapper

Second pass over @qvac/tts-ggml after the build started passing: prune
everything that only made sense for the ONNX-era multi-engine scope and
adapt the remaining Chatterbox-English bits to the GGUF + file-path
reference-audio contract.  Restores `test/mobile/` so the Android build
has something to point at.

## C++

* `ChatterboxModel.cpp`: the `ArgvBuilder::buildArgv` doc comment
  contained `**/` which closed the block comment early and broke the
  build.  Rewrote as a `//` comment.
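
As a small illustration of why the `//` form is safe (the path and the function name `sourceGlob` are made up for this sketch):

```cpp
#include <cassert>
#include <string>

// Line comments are not terminated by "*/", so a glob such as src/**/*.cpp
// can appear here safely. Inside a block comment, the "*/" at the end of
// "**/" would close the comment and leave the rest of the line as code.
std::string sourceGlob() {
    return "src/**/*.cpp";  // inside a string literal the sequence is also harmless
}
```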

## Examples

* `examples/chatterbox-tts.js` — rewrite for v0 contract: single
  `<text>` argv, `files: { modelDir }` pointing at the two GGUFs,
  `referenceAudio` is now a wav **path** (addon passes it to
  `--reference-audio`) instead of a Float32Array.  Drops
  english/multilingual arg and the CHATTERBOX_VARIANT switch that
  picked which `.onnx` files to load.
* Removed `examples/chatterbox-streaming-tts.js` +
  `examples/pcm-chunk-player.js`.  The v0 addon re-loads the model
  per `run()` call — exposing streaming would mislead.  Both come
  back alongside the persistent-engine milestone.
* `package.json`: `npm run example` now passes a default text so it
  runs without extra args.

## Tests

### Kept as-is (engine-agnostic)

* `test/unit/textChunker.test.js`
* `test/mock/{MockedBinding,utils}.js`
* `test/utils/{wav-helper,pcmConcatenator,loader.fake,runWhisper,runTTS}.js`
* `test/reference-audio/jfk.wav`, `test/data/sentences-*.js`

### Mechanical fixes

* `test/unit/tts.error.test.js` — fix error-code assertions to the
  tts-ggml range (`13001–14000`); was still checking the
  `@qvac/tts-onnx` range (`7001–7011`).
* `test/unit/tts-ggml.lifecycle.test.js` — fix stale
  `QvacErrorAddonTTS` import to `QvacErrorAddonTTSGgml`; switch the
  stubbed model to `{ t3Model, s3genModel }` GGUFs and drop the
  non-existent `engine: 'chatterbox'` option.
* `test/unit/tts-ggml.sentence-stream.test.js` — same GGUF/engine
  cleanup.

### Rewritten

* `test/unit/chatterbox.inference.test.js` — drop tests that asserted
  the old ONNX file shape (`tokenizer / speechEncoder / embedTokens /
  conditionalDecoder / languageModel`), the removed `engine` detection
  and the wrong `getModelKey` return value (`'onnx-tts'` -> `'tts-ggml'`).
  New tests cover: `modelDir` derives the two GGUF paths; explicit
  `t3Model` / `s3genModel` override the defaults.  The mocked-binding
  run/reload/cancel flow stays.
* `test/integration/addon.test.js` — fresh, ~180 LoC, Chatterbox-English
  only.  Ensures the GGUFs are present, runs the short sentence set
  through `loadChatterboxTTS` + `runChatterboxTTS[WithSplit]`, and
  (on darwin only) runs a whisper-based WER check via the existing
  `runWhisper` util.  Drops the Chatterbox-multilingual block + every
  Supertonic + LavaSR block that doesn't apply to this package.
* `test/utils/runChatterboxTTS.js` — rewrite for the GGUF contract:
  `files: { modelDir, t3Model, s3genModel }`, `referenceAudio` as a
  file path that falls back to `test/reference-audio/jfk.wav` (or the
  mobile test-asset when `global.assetPaths` is present).  No more
  WAV decode / resample on the JS side.
* `test/utils/downloadModel.js` — trim from 1007 LoC to 280.  Drops
  the Supertonic + LavaSR + Chatterbox-multilingual + Cangjie
  downloaders.  Keeps the shared HTTP/curl infrastructure and
  `ensureWhisperModel` (still used by the integration WER check).
  `ensureChatterboxModels` is now **check-only**: it verifies
  `chatterbox-t3-turbo.gguf` + `chatterbox-s3gen.gguf` exist locally
  and, if missing, prints the exact commands for generating them
  from the qvac-tts.cpp (née chatterbox.cpp) conversion scripts.
  Once the GGUFs land on a canonical HuggingFace repo we'll wire up
  download URLs here.

## Scripts

* `scripts/ensure-chatterbox.js` — simplify to a single invocation
  against `./models/`.  Drops the variant / language matrix that the
  ONNX downloader needed.
* `scripts/ensure-models.js` — now a thin alias to
  `ensure-chatterbox.js`.  Drops the Supertonic + LavaSR orchestration.

## Mobile

* Restored `test/mobile/{integration.auto.cjs, integration-runtime.cjs,
  testAssets/jfk.wav}` so the Android build has a wrapper to point at.
* `package.json`: re-added `test/mobile` to the `files` list.

## Gitignore

* Ignore generated `.clang-format` / `.clang-tidy` / `.valgrind.supp`
  (produced by the top-level `configure_file(...)` calls) and
  `build_*/` dirs (bare-make convention).

## Verified locally

* `npx standard "test/**/*.js" "*.js" "lib/*.js"` — clean.
* `npm run test:unit` — 38/38 pass (105/105 asserts).
* `npm run build && bare examples/chatterbox-tts.js "Hello from qvac tts ggml."`
  produces a 24 kHz wav as expected.

* Add streaming support

* Update ggml backend to use separate ggml repo

* tts-ggml: consume renamed tts-cpp library (2026-04-24#1)

Upstream chatterbox.cpp renamed the package + namespace + target from
qvac-tts to tts-cpp and tightened the library boundary; pick up the
new artefacts here:

- find_package(qvac-tts-cpp CONFIG REQUIRED)
    -> find_package(tts-cpp CONFIG REQUIRED)
- qvac-tts::qvac-tts  -> tts-cpp::tts-cpp
- qvac_tts::chatterbox -> tts_cpp::chatterbox (engine ptrs, EngineOptions,
  SynthesisResult, forward-decls in ChatterboxModel.hpp)
- #include <qvac-tts/chatterbox/engine.h>
    -> #include <tts-cpp/chatterbox/engine.h>
- Doxygen / inline doc references to the old names refreshed alongside
  the code changes.

vcpkg wiring:
- vcpkg-configuration.json baseline bumped to qvac-registry-vcpkg
  commit bc30b0b (ports/tts-cpp renamed and repointed at
  chatterbox.cpp@f8f9145).
- vcpkg.json tts-cpp constraint bumped to 2026-04-24#1 (the port that
  carries the rename + namespace + install(EXPORT) changes).

Verified with a cold bare-make generate + bare-make build against the
new port, and the addon's existing unit + integration test suites.

Made-with: Cursor

* tts-ggml: bump tts-cpp port to 2026-05-07 + registry baseline

Picks up the round-3 review-fix wave landed on the tts-cpp port:

  e673182  scrub stale patches/ refs from README                (N10)
  8ba10a6  drop unreachable TTS_CPP_GGML_LIB_PREFIX block        (N8)
  4b5d2d7  mirror N1-N7 fixes from chatterbox.cpp source-of-truth
            - N1 supertonic alive-registry guard against freed-backend
              gallocr_free assert on hot-swap (Vulkan/Metal/CUDA)
            - N2 drop dead g_sink_* state, soften log_set docstring
            - N3 Turbo BPE try/catch (exception-safe Engine ctor)
            - N4 STFT cancel checkpoint + tighter Engine::cancel() doc
            - N5 document s3gen_preload/unload refcount semantics
            - N6 drop dead cached_text_lc Supertonic shim
            - N7 fix misleading "no copy" view-vs-copy log wording

Plus the integrated-port-only round-2 fixes that landed earlier:

  fa0d490  close patches/-deleted regression: TTS_CPP_USE_SYSTEM_GGML
            now defaults ON; bundled-without-patches hard-errors at
            configure time with a pointer at the ggml-speech vcpkg
            port.
  ae34c58  README rewritten for integrated/vcpkg context.
  a2f2dd6  top-level qvac-ext-lib-whisper.cpp README points at the
            tts-cpp/ subtree (alongside parakeet-cpp/).

Public API used by ChatterboxModel (tts_cpp::chatterbox::Engine /
EngineOptions / SynthesisResult / s3gen_preload / s3gen_unload) is
backward-compatible: the new port adds Engine::backend_name(),
MTL-variant fields on EngineOptions (language / cfg_weight / min_p /
exaggeration), and a separate tts_cpp::supertonic::Engine class, but
nothing this consumer was already calling has changed.

Edits:

  packages/tts-ggml/vcpkg.json
    - tts-cpp dep: version>=2026-04-24#1 -> version>=2026-05-07.

  packages/tts-ggml/vcpkg-configuration.json
    - default-registry baseline: bc30b0b (April 2026 fork-only state)
      -> 16b91afdcfd59baea60e81f3da94f49311ef2a97.  The new baseline
      pulls in the post-tetherto-merge state (parakeet-cpp port at
      932d5d9, ggml-speech port-version 1 at f07bdd0) plus the new
      tts-cpp port (16b91af) on the developer's GustavoA1604
      registry fork.

Smoke-test plan: after running `vcpkg install` against the new
baseline, the tts-cpp port's vcpkg_from_github resolves at
GustavoA1604/qvac-ext-lib-whisper.cpp@e673182 (tts-cpp branch) until the
upstream PR merges.  ChatterboxModel should build and synthesize
identically; expanding to Multilingual + Supertonic flows is the
follow-up commit on the package side.

Co-authored-by: Cursor <cursoragent@cursor.com>

* Add chatterbox multilingual and supertonic

* Add mobile integration tests

* tts-ggml: drop clang-19 pin in linux-clang toolchain

The toolchain hardcoded `clang-19` / `clang++-19` (versioned binary
names) since the package's first commit (0a2c978).  Linux CI hadn't
exercised this path before — the new on-pr-tts-ggml.yml -> integration
matrix is the first time it does, and it fails on every linux runner
(ai-run-ubuntu-22.04, ai-run-linux-gpu, ubuntu-24.04-arm) at vcpkg's
"detect_compiler" step because none of the GH-hosted images ship a
`clang-19` symlink:

  Detecting compiler hash for triplet x64-linux...
  error: while detecting compiler information:
  ...
  CMake Error at scripts/cmake/vcpkg_execute_required_process.cmake:127
  (message): Command failed: ... -DVCPKG_CHAINLOAD_TOOLCHAIN_FILE=
  .../tts-ggml/vcpkg/triplets/../toolchains/linux-clang.cmake ...

Match parakeet's working pattern (qvac-lib-infer-parakeet/vcpkg/
toolchains/linux-clang.cmake): use unversioned `clang` / `clang++` so
each runner picks up its image's default clang (clang-15 on
ubuntu-22.04, clang-18 on ubuntu-24.04, whatever the AI runners ship).
The `-stdlib=libc++` flag added by x64-linux.cmake / arm64-linux.cmake
is honoured by every reasonable clang version.

Co-authored-by: Cursor <cursoragent@cursor.com>

* Add C++ tests and coverage; fix linux build

* tts-ggml: address PR review feedback

Bundle of correctness, hygiene, and CI-doc fixes from the recent code
review.  Each item below has its own paragraph in the diff comments.

- #1 files-array: add test/utils/runSupertonicTTS.js + test/data/sentences-{medium,long}.js
  to package.json so consumers running the integration tests from the
  npm tarball don't crash with `Cannot find module ../utils/runSupertonicTTS`.
- #2 deps: move @qvac/langdetect-text from runtime dependencies to
  devDependencies (it's only referenced from examples/, which aren't in
  the published files list).
- #3 race-fix: ChatterboxModel::process()'s post-synthesize streaming
  detection used to read engine_->options() outside engineMu_, racing
  with reload().  synthesize() now returns SynthesizeResult { pcm,
  wasStreaming } where wasStreaming is captured under the engine lock
  against the local shared_ptr so process() doesn't have to touch
  engine_ again.
- #4 deferred-load: ChatterboxModel + SupertonicModel constructors
  used to call load() eagerly, so JsInterface::createInstance() (sync
  on the JS thread) was parsing ~370 MB of GGUF on the Bare event loop.
  Both models now implement IModelAsyncLoad: constructors validate +
  return; the actual load is deferred to waitForLoadInitialization(),
  which the new addon_js::activate wraps inside JsAsyncTask::run so the
  parse runs on a worker thread.  binding.cpp registers
  addon_js::activate in place of JsInterface::activate; tts.js now
  awaits the resulting promise.
- #5 dead code: drop _resolvePath (unused), drop the (void)inputObj
  read in AddonJs.hpp::runJob, document FAILED_TO_PAUSE /
  FAILED_TO_STOP / JOB_ALREADY_RUNNING in lib/error.js as reserved-but-
  not-thrown so future maintainers don't delete them blindly (the unit
  suite asserts the values).
- #6 cancel-reset: SupertonicModel grew Chatterbox's cancelRequested_
  reset pattern: cancel() sets it, synthesize() fast-fails on it,
  process() resets it per call so a stale cancel doesn't poison the
  next run.
- #7 useGPU comment: explain in JSAdapter::buildChatterboxConfig that
  the JS layer is the source of truth for useGPU and nGpuLayers wins
  downstream; left a pointer to std::optional<bool> if a future caller
  ever needs to distinguish "absent" from "explicit false".
- #10 fork pointers: README.md and test/utils/downloadModel.js no
  longer point at GustavoA1604/chatterbox.cpp; both reference the
  upstream tetherto/qvac-ext-lib-whisper.cpp/tts-cpp tree now.
- #9 doc: integration-mobile-test-tts-ggml.yml gained a header comment
  on the build-and-test job documenting that continue-on-error is the
  early-days landing posture (merge-guard treats success || skipped as
  pass), with a pointer to tighten once Device Farm provisioning is
  stable.
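
The cancel-reset pattern from item #6 might look roughly like this. It is a sketch, not the addon's code: `CancellableModel` is a hypothetical class, the string return value stands in for PCM output, and `std::runtime_error` stands in for whatever error type the addon actually raises.

```cpp
#include <atomic>
#include <cassert>
#include <stdexcept>
#include <string>

class CancellableModel {
public:
    // May be called from another thread while a job is running.
    void cancel() { cancelRequested_.store(true); }

    // Entry point for one job: clear any cancel left over from an earlier run.
    std::string process(const std::string& text) {
        cancelRequested_.store(false);
        return synthesize(text);
    }

    // Fails fast if a cancel has been requested.
    std::string synthesize(const std::string& text) {
        if (cancelRequested_.load()) {
            throw std::runtime_error("cancelled");
        }
        return "pcm:" + text;
    }

private:
    std::atomic<bool> cancelRequested_{false};
};
```

Resetting the flag at the start of `process()` is what prevents a cancel issued after one job has finished from aborting the next one.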

Nits:
- 'use strict' added to addonLogging.js (matches every other .js).
- node-vs-bare runtime banners on
  scripts/{generate,validate}-mobile-integration-tests.js.
- ttsOutputDebugString no longer JSON.stringify's the full PCM
  Int16Array on every chunk-streaming event; emits a tiny summary
  ({sampleRate, chunkIndex, isLast, sentenceChunk, outputArrayLen})
  instead.

Tests: 35 passing (33 -> 35; two new assertions cover the deferred-load
contract); 4 skipped real-GGUF tests behind the existing
QVAC_TEST_CHATTERBOX_T3_GGUF / QVAC_TEST_CHATTERBOX_S3GEN_GGUF /
QVAC_TEST_SUPERTONIC_GGUF env-var gates.  Lint clean.

Co-authored-by: Cursor <cursoragent@cursor.com>

* tts-ggml: unblock CI integration tests on every desktop runner

Four independent failures, one per platform family:

1. linux-x64 / linux-arm64: addon load crashed at
   `libomp.so.5: cannot open shared object file`.  tts-cpp's binary is
   built with clang under the linux-clang toolchain and links against
   libomp (LLVM OpenMP runtime); only `libgomp1` (GNU OpenMP) was being
   apt-installed.  Add `libomp5` so libomp.so.5 is on the loader path.

2. darwin-arm64: convert-models.sh aborted at line 200 with
   `hf_args[@]: unbound variable`.  macOS's system bash is 3.2, which
   treats `"${arr[@]}"` as nounset access when the array is empty under
   `set -u`; with HF_TOKEN unset we hit it on every fresh runner.  Use
   the `${arr[@]+"${arr[@]}"}` idiom (defined-or-nothing) at all six
   call sites and add a header comment so the next maintainer doesn't
   accidentally regress.

3. darwin-x64: pip install failed while building `llvmlite` from source
   because the macos-15-large runner has no LLVM 15 development
   install.  Root cause: librosa pulls in numba 0.65+, which stopped
   shipping darwin-x86_64 wheels for Python 3.12.  Pin Python to 3.11
   in the Setup Python step; 3.11 has prebuilt wheels for the entire
   numba/llvmlite/librosa stack on darwin-x64 and is fine for every
   other converter dependency.

4. windows-2022: ChatterboxModel::load threw
   `vk::createInstance: ErrorIncompatibleDriver`.  Root cause: the
   addon's index.js::_validateConfig defaults `useGPU = true` when
   neither useGPU nor nGpuLayers is specified, so the test ran with
   n_gpu_layers=99 -> ggml_backend_vk_init -> vk::createInstance ->
   ErrorIncompatibleDriver on the runner's no-Vulkan-driver image.
   runChatterboxTTS.js now honours `process.env.NO_GPU === 'true'`
   (set on the no-GPU matrix entries) and forces useGPU=false on
   exactly those runners; the other test runners (chatterbox-mtl,
   gpu-smoke, multiple-runs) already had this guard.

Also documents the `mesa-vulkan-drivers` apt package (already pulled
in) as the software ICD that lets the Vulkan-built prebuild's runtime
backend probe enumerate at least one device on linux runners.

Co-authored-by: Cursor <cursoragent@cursor.com>

* tts-ggml: drop Chatterbox from mobile bundle (Metro V8 string limit)

Mobile build failed at `:app:createBundleReleaseJsAndAssets` with:

  SyntaxError: assets/testAssets/chatterbox-s3gen.gguf:
    Cannot create a string longer than 0x1fffffe8 characters

Root cause: Metro's bundler reads every asset under
`test/mobile/testAssets/` via `Buffer.toString()`.  V8's max string
length is 0x1fffffe8 (~512 MiB).  chatterbox-s3gen.gguf is ~1 GiB even
with --quant q4_0 because the s3gen converter only quantizes attention
weights and leaves the bulk of the s3gen graph in fp16 ("0/291 weight
tensors quantized" in the converter log).
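
The size arithmetic can be checked with a few compile-time assertions. This sketch assumes one character per byte when the asset is read into a string, and rounds the file sizes to the approximate figures quoted in this message; `fitsInV8String` is a hypothetical helper.

```cpp
#include <cstdint>

// Limit quoted in the Metro error message, in characters.
constexpr std::uint64_t kV8MaxStringLength = 0x1fffffe8;
constexpr std::uint64_t kMiB = 1024ULL * 1024ULL;

// Whether a file of the given size could be read into a single V8 string,
// assuming one character per byte (a simplification made for this sketch).
constexpr bool fitsInV8String(std::uint64_t sizeBytes) {
    return sizeBytes <= kV8MaxStringLength;
}

static_assert(kV8MaxStringLength == 512 * kMiB - 24, "limit sits 24 below 512 Mi");
static_assert(fitsInV8String(125 * kMiB), "a ~125 MiB file fits");
static_assert(!fitsInV8String(1024 * kMiB), "a ~1 GiB file does not");
```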

Fix: bundle ONLY supertonic.gguf (~125 MiB, comfortably under the
limit) on mobile.  Mobile Chatterbox tests degrade cleanly to
`t.pass('Skipped: Chatterbox GGUFs not available')` via the existing
`ensureChatterboxModels` helper -- it already returns
{ success: false } when the GGUFs aren't on disk.

Cache key bumped to v2 so existing v1 cache entries (which include
the chatterbox files) are evicted on the next run.

Bundling Chatterbox on mobile requires either:
  - adding `gguf` to qvac-test-addon-mobile's metro `assetExts` so the
    JS-string read is skipped (then the s3gen file can flow through the
    bundle as a raw asset), or
  - pushing the chatterbox GGUFs to the device via `adb push` outside
    the bundle and surfacing the path through downloadModel.js's
    existing ANDROID_CANDIDATE_DIRS fallback.

Both are outside the scope of this PR; documented inline above the
cache step for the next maintainer.

Co-authored-by: Cursor <cursoragent@cursor.com>

* Bump hash of vcpkg

* Consume vcpkg from tetherto repository

* Fix integration tests failures in all platforms

* Further fix tests

* fix: Make useGPU flag more meaningful (#1953)

* fix[api]: make useGPU flag actually force CPU/GPU and reject useGPU/nGpuLayers conflicts

* add gpu smoke test

* resolve comments

---------

Co-authored-by: Ishan Vohra <ishanvohra@Ishans-MacBook-Air.local>

* Update dependencies after monorepo directory changes

* Further drop qvac-lib- prefix

* Add CHANGELOG.md

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Ishan Vohra <ishanvohra2@gmail.com>
Co-authored-by: Ishan Vohra <ishanvohra@Ishans-MacBook-Air.local>
olyasir added a commit that referenced this pull request May 14, 2026
…assification (#1727)

* QVAC-17481 feat: add @qvac/classification-ggml MobileNetV3 image classification addon

Introduces a new inference addon that classifies images into three
classes (food / report / other) using a fine-tuned MobileNetV3-Small
CNN running on the libggml CPU backend. Follows the established QVAC
addon pattern (see qvac-lib-infer-nmtcpp, lib-infer-diffusion).

## What this PR ships

- New package `packages/qvac-lib-infer-ggml-classification/` publishing
  as `@qvac/classification-ggml`:
  - Native addon: custom 34-layer MobileNetV3-Small compute graph built
    directly against the public `ggml.h` / `ggml-backend.h` API — no
    llama.cpp application-layer dependency, so the addon remains
    forward-compatible with future `libggml` upstream merges.
  - Load-time BatchNorm fold with `eps = 0.001` (the architecture-
    correct value; `1e-5` causes normalisation drift across all 34
    layers). Depthwise separable convolutions, squeeze-and-excite
    blocks, HardSwish / HardSigmoid / ReLU activations all wired
    through `ggml_conv_2d`, `ggml_conv_2d_dw`, `ggml_pool_2d`,
    `ggml_hardswish`, `ggml_hardsigmoid`.
  - FP16 GGUF weights bundled inside the package (2.94 MB); class
    labels are read from the GGUF `mobilenet.class_N` metadata so a
    future fine-tune can ship different class names without a code
    change.
  - Public JS API: `new ImageClassifier({ modelPath?, logger?,
    threads?, nativeLogger? })` + `load()` / `classify(buffer, opts?)`
    / `unload()` / `destroy()`. Accepts JPEG, PNG, or raw-RGB input;
    validates at the JS layer before reaching native code so no bad
    input reaches libggml.
  - `nativeLogger` opt-in (default `false`): the underlying
    `qvac-lib-inference-addon-cpp` JsLogger holds a process-wide
    static `uv_async_t` that is not safe across rapid create/destroy
    cycles, so the native C++→JS log bridge is disabled unless the
    caller explicitly opts in. JS-level logging always flows through
    the caller's `logger`.
  - Image preprocessing via vendored-through-vcpkg `stb_image` +
    `stb_image_resize2` (bilinear resize to 224×224, ImageNet
    normalisation, WHCN layout).
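
For readers unfamiliar with the load-time BatchNorm fold mentioned above, here is a minimal sketch of the standard technique. It is not the addon's actual code: the struct, the function name, and the flat `[outChannels][weightsPerChannel]` weight layout are assumptions made for illustration. Only the `eps = 0.001` default comes from this message.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Per-output-channel BatchNorm parameters (illustrative names).
struct BatchNormParams {
  std::vector<float> gamma;  // learned scale
  std::vector<float> beta;   // learned shift
  std::vector<float> mean;   // running mean
  std::vector<float> var;    // running variance
};

// Folds y = gamma * (conv(x) + bias - mean) / sqrt(var + eps) + beta into the
// convolution's own weights and bias, so no BatchNorm op is needed at runtime.
// `weights` is assumed to be laid out as [outChannels][weightsPerChannel].
void foldBatchNorm(std::vector<float>& weights, std::vector<float>& bias,
                   const BatchNormParams& bn, float eps = 1e-3f) {
  const std::size_t outChannels = bias.size();
  assert(outChannels > 0 && weights.size() % outChannels == 0);
  assert(bn.gamma.size() == outChannels && bn.beta.size() == outChannels);
  assert(bn.mean.size() == outChannels && bn.var.size() == outChannels);

  const std::size_t perChannel = weights.size() / outChannels;
  for (std::size_t c = 0; c < outChannels; ++c) {
    const float scale = bn.gamma[c] / std::sqrt(bn.var[c] + eps);
    for (std::size_t i = 0; i < perChannel; ++i) {
      weights[c * perChannel + i] *= scale;
    }
    bias[c] = (bias[c] - bn.mean[c]) * scale + bn.beta[c];
  }
}
```

Because `eps` sits inside the square root, a mismatched value rescales every folded channel slightly, and that error compounds through each folded layer.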

## Build + tests

- `bare-make` + `cmake-bare` + `cmake-vcpkg` build, targeting
  `ggml::ggml` / `ggml::ggml-base` / `ggml::ggml-cpu` and `stb` from
  the shared QVAC vcpkg registry.
- C++ GoogleTest suite covering graph shape (34 conv + 2 linear + 9
  SE blocks), load + inference, determinism, `topK` filter, BN
  epsilon guard, and full preprocessor behaviour.
- brittle + bare JS integration tests covering load, classify (all 6
  public sample images under `test/images/`), `topK`, raw RGB input,
  and every error path: null, empty buffer, corrupted JPEG,
  unsupported format (BMP), mismatched dimensions, pre-load /
  post-unload, tiny upscale, load/unload cycles.
- Mobile test scaffolding following the shared convention:
  `scripts/generate-mobile-integration-tests.js`,
  `scripts/validate-mobile-tests.js`, `test/mobile/
  {integration-runtime.cjs, integration.auto.cjs, README.md,
  testAssets/.gitignore}`. The auto-generated `integration.auto.cjs`
  wraps every `test/integration/*.test.js` so the shared
  `qvac-test-addon-mobile` framework picks them up on Android and iOS
  automatically.

## CI workflows

Four addon-scoped workflows (path-filtered to this package):

- `on-pr-qvac-lib-infer-ggml-classification.yml` — authorize, sanity
  checks, TypeScript declaration check, C++ lint, prebuild matrix,
  desktop integration tests, mobile integration tests, merge-guard.
- `prebuilds-qvac-lib-infer-ggml-classification.yml` — Linux x64,
  Linux arm64, Android arm64, macOS arm64, iOS arm64, Windows x64
  prebuild matrix.
- `integration-test-qvac-lib-infer-ggml-classification.yml` — desktop
  end-to-end tests with the shared performance reporter writing a
  GitHub step summary.
- `integration-mobile-test-qvac-lib-infer-ggml-classification.yml` —
  AWS Device Farm Android + iOS runs via the
  `tetherto/qvac-test-addon-mobile` framework.

## Public-data / test-image policy

All public correctness assertions in this package are scoped to the 6
test images under `test/images/` (2 per class). No confidential
fine-tuning numbers, validation-set sizes, per-class metrics, or
references to any internal validation dataset appear in this PR, in
any file it ships, or in CI logs. Internal numerical-equivalence
gating against an ONNX FP32 reference is handled pre-release by a
development-only script that is not part of this PR.

## Out of scope for this PR

- SDK plugin / schema integration (`packages/sdk/**`) lands in a
  follow-up PR after `@qvac/classification-ggml@0.1.0` is published
  to npm. This mirrors the diffusion rollout (#656 → release → #1021).
- GPU backends (Vulkan / Metal / CUDA): this first release is CPU-only.

Made-with: Cursor

* QVAC-17481 fix(ci): correct setup-bare-tooling action name in classification workflows

The prebuild and integration-test workflows for @qvac/classification-ggml
referenced `tetherto/qvac/.github/actions/setup-bare-toolchain`, which
does not exist. The action is named `setup-bare-tooling` (same name used
by the llamacpp-llm, nmtcpp, and diffusion addons at the identical
pinned SHA). All 6 prebuild matrix jobs failed at step 1 with
"Can't find 'action.yml' ... for action 'setup-bare-toolchain'" until
this rename is in place.

Files: .github/workflows/prebuilds-qvac-lib-infer-ggml-classification.yml
  .github/workflows/integration-test-qvac-lib-infer-ggml-classification.yml
Made-with: Cursor

* QVAC-17481 fix(ci): add per-platform vcpkg/NDK/Apple-clang setup to classification prebuilds

The classification prebuilds workflow was missing the per-platform
toolchain steps that sibling addons (diffusion, nmtcpp) have after
`setup-vcpkg-cache`. As a result, `VCPKG_ROOT` was never exported,
CMake couldn't locate the vcpkg toolchain, and `bare-make build`
failed on every platform.

Changes to .github/workflows/prebuilds-qvac-lib-infer-ggml-classification.yml:

  - setup-vcpkg-cache: drop unknown inputs `vcpkg-path` and
    `github-packages-token` (action only accepts platform, arch,
    s3-bucket-path). They were ignored at run time but emitted warnings.

  - Add per-OS vcpkg bootstrap / configuration:
      macOS (darwin, ios):  clone microsoft/vcpkg tag 2025.12.12,
                            bootstrap, export VCPKG_ROOT.
      Linux (linux, android runners): export
                            VCPKG_ROOT=$VCPKG_INSTALLATION_ROOT.
      Windows:              export VCPKG_ROOT from
                            $env:VCPKG_INSTALLATION_ROOT with
                            backslash-to-forward-slash normalisation.

  - Windows-only: set CMAKE_GENERATOR="Visual Studio 17 2022" and,
    for the x64 matrix row, CMAKE_GENERATOR_PLATFORM=x64.

  - Android-only: export ANDROID_NDK / ANDROID_NDK_HOME /
    ANDROID_NDK_ROOT from ANDROID_NDK_LATEST_HOME, derive
    ANDROID_TOOLCHAIN_ROOT, set ANDROID_NATIVE_API_LEVEL=24.

  - iOS and darwin: move Homebrew llvm / llvm@18 aside so the Apple
    toolchain clang is on PATH (matches diffusion).

All additions mirror the working pattern in
prebuilds-lib-infer-diffusion.yml and
prebuilds-qvac-lib-infer-nmtcpp.yml at the same pinned action SHA.
No Vulkan or apt X11 steps were added: this addon is CPU-only ggml
and has no graphics dependencies.

Made-with: Cursor

* QVAC-17481 fix: add missing <limits> include and CI build-failure diagnostics

Two related changes to unstick the prebuild matrix:

1. addon/src/model-interface/ImagePreprocessor.cpp uses
   std::numeric_limits<int>::max() but does not #include <limits>.
   MSVC pulls <limits> in transitively (via <algorithm> in its STL),
   but libc++ and libstdc++ on clang/gcc do not. This is the most
   plausible reason all five non-Windows prebuild jobs (linux-x64,
   linux-arm64, android-arm64, darwin-arm64, ios-arm64) failed
   identically at `bare-make build` while the Windows host build
   succeeded.

2. prebuilds-qvac-lib-infer-ggml-classification.yml gains a
   `Dump build context on failure` step that runs only if
   `bare-make build` fails. It prints toolchain identity, lists the
   build/ tree, tails CMake configure logs, dumps any *.log under
   build/, and tails up to 20 vcpkg buildtree logs. Mirrors the
   `Dump vcpkg build logs on failure` pattern in
   prebuilds-lib-infer-diffusion.yml. Without this, every CI failure
   currently surfaces only as `Process completed with exit code 1.`,
   which is essentially undebuggable from the run summary page.
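
The portability point in item 1 can be illustrated with a small sketch: include `<limits>` explicitly wherever `std::numeric_limits` is used, rather than relying on another header to pull it in. `pixelCountFitsInInt` is a hypothetical guard written for this example and is not taken from ImagePreprocessor.cpp.

```cpp
#include <limits>  // std::numeric_limits: include explicitly, never transitively

// Hypothetical guard: true when width * height * channels is positive and
// fits in an int. Uses division so the check itself cannot overflow.
bool pixelCountFitsInInt(int width, int height, int channels) {
  if (width <= 0 || height <= 0 || channels <= 0) return false;
  constexpr int kMax = std::numeric_limits<int>::max();
  if (width > kMax / height) return false;  // width * height would overflow
  const int plane = width * height;
  return channels <= kMax / plane;
}
```

Whether a missing include compiles depends on the standard library's internal header layout, which can differ between toolchains and versions, so the explicit include is the only portable form.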

Files: packages/qvac-lib-infer-ggml-classification/addon/src/model-interface/ImagePreprocessor.cpp
  .github/workflows/prebuilds-qvac-lib-infer-ggml-classification.yml
Made-with: Cursor

* QVAC-17481 fix(ci): use --platform (not --target) for bare-make generate

Root cause confirmed from job log of run 24850328468 (linux-x64):
  bare-make generate --target linux --arch x64
  Bail: UNKNOWN_FLAG: target

The bare-make CLI installed by setup-bare-tooling does not accept
`--target`; it only accepts `--platform`. Diffusion and nmtcpp both
use `--platform`. Locally I had an older bare-make that accepted
`--target` as an alias, which masked the bug on my Windows host.

Step 17 (Generate build) was failing immediately with the above
"Bail: UNKNOWN_FLAG", causing every downstream step (build,
install) to fail too across all 6 prebuild matrix jobs.

Also harden the diagnostic step `Dump build context on failure`:
disable `-e` and `pipefail` for that step so a missing `build/`
directory or empty `find` result no longer makes the diagnostic
step itself exit non-zero (it should never mask the real failure).

Files: .github/workflows/prebuilds-qvac-lib-infer-ggml-classification.yml
Made-with: Cursor

* QVAC-17481 fix: pin ggml to CPU-only feature set + guard backend iteration

CI runs were failing because the default ggml vcpkg feature set pulls
in the `vulkan` (Linux/Windows/Android) and `metal` (Apple) GPU
backends, which forces `find_package(Vulkan)` at configure time and
forces the prebuilds workflow to install the Vulkan SDK on every
runner. Since this addon is CPU-only by design (only ever calls
ggml_backend_cpu_init), the GPU backends are dead weight: extra
compile time, extra dependencies in shipped prebuilds, and extra
runtime requirements on user machines (e.g. libvulkan.so.1).

Two related changes, no functional impact on the addon itself:

1. packages/qvac-lib-infer-ggml-classification/vcpkg.json
   Add `"default-features": false` to the ggml dependency. This
   opts out of vulkan / metal / cuda / opencl while keeping the
   core CPU backend (which is the implicit base, not a named
   feature). Verified locally on win32-x64: vcpkg rebuilt
   `ggml:x64-windows@2026-01-30#5` from source in 26s without
   Vulkan, generate + build + install all green, and the JS
   integration test ran the model end-to-end producing correct
   top labels (food/report/other) for every sample image.

2. packages/qvac-lib-infer-ggml-classification/CMakeLists.txt
   Guard the GGML_AVAILABLE_BACKENDS iteration with
   `if(TARGET ggml::${_backend})`. The upstream variable
   advertises every backend the port knows about, but real
   CMake targets only exist for backends that were actually
   built. Without the guard, add_bare_module's
   get_target_property() crashes on Android (where Vulkan and
   OpenCL are listed as available but not built). Defensive
   change; no behavioural difference when targets do exist.

Local artifact size: prebuilds/win32-x64/qvac__classification-ggml.bare
is 1.6 MB; no shipped vulkan loader.

Made-with: Cursor

* QVAC-17481 fix(ci): match prebuild- artifact prefix in mobile tests

The mobile integration workflow downloaded artifacts with patterns
`android-*` / `ios-*` (PREBUILD_ARTIFACT_PREFIX was empty), but the
prebuilds workflow names artifacts `prebuild-android-arm64` /
`prebuild-ios-arm64`. Result: `Total of 0 artifact(s) downloaded`,
followed by "ERROR: No prebuilds found!" — both Android and iOS
mobile jobs failed at this exact step in run 24891210942.

Set PREBUILD_ARTIFACT_PREFIX to "prebuild-" so the resulting patterns
become `prebuild-android-*` and `prebuild-ios-*`, matching the actual
artifact names. Mirrors how the desktop integration workflow already
filters (it uses `prebuild-${platform}-${arch}*` directly).

File: .github/workflows/integration-mobile-test-qvac-lib-infer-ggml-classification.yml
Made-with: Cursor

* QVAC-17481 fix(model): zero-input warmup pass to defeat cold-inference NaN

ggml's backend graph allocator leaves intermediate tensor buffers and
the input/output tensors uninitialised after `buildGraph` returns.
Whatever stale heap residue happens to occupy those addresses can
leak into the very first inference and produce non-finite logits
on a heap-state-dependent basis.

CI run 24891210942 caught this on win32-x64: meal_1.jpg (the first
sample classified after instance creation) failed assert 9
(`Math.abs(sum - 1) < 1e-3` -- probabilities sum was not ~1) and
assert 10 (`result[0].confidence >= result[1].confidence` -- sort
comparison broke because the first confidence was NaN). Asserts 11..72
covering the other five sample images all passed: by then the second
inference had overwritten the dirty buffers with real data.

This is a classic uninit-memory bug: behaviour depends on whatever
the heap happens to contain at process start. My local Windows
build did not trip on it (different heap layout); the Azure CI
runner did. Same compiler family, same code, different result.

Fix: at the end of `ClassificationModel::load()`, run one full
forward pass with a zero-filled input tensor and discard the output.
This forces ggml's compute graph to write every backend buffer with
a deterministic value before any user-visible classify() call ever
sees the model. Cost is one cold inference per `load()` (~50-200 ms
on a CPU runner), paid once at addon startup, never visible to the
caller.

Local validation on win32-x64 with this change: integration test 1
(72/72 asserts including all sum-to-one and sort-desc checks) now
passes deterministically across rebuilds. The unrelated lifecycle
SIGSEGV between separate ImageClassifier instances (likely in
qvac-lib-inference-addon-cpp's JobRunner / OutputCallBackJs libuv
resources, not addressed here) still surfaces, just later in the
test run -- that needs a separate investigation in addon-cpp.

File: packages/qvac-lib-infer-ggml-classification/addon/src/model-interface/ClassificationModel.cpp
Made-with: Cursor

* QVAC-17481 fix(model): full-pipeline warmup eliminates win32 cold-inference NaN

The previous zero-input warmup (commit af12cdd1) wrote zeros directly
to the input tensor and ran ggml_backend_graph_compute. CI run
24892803959 showed it was insufficient: win32-x64 still failed
asserts 9 + 10 on meal_1.jpg with NaN in result[0].confidence,
while linux-arm64 / darwin / linux-x64 all passed.

Hypothesis: ggml's CPU backend on MSVC has lazy-init code paths
(SIMD kernel JIT / FP state setup) that only trigger on non-trivial
inputs reaching the post-preprocess range, and the zero-input
warmup didn't exercise them. The bug therefore surfaces on the
first real classify() with an ImageNet-normalised image.

Fix: replace the synthetic warmup with one that goes through the
EXACT same pipeline classify() uses end-to-end:
  1. Synthesise a small (32x32) raw RGB buffer with a deterministic
     non-zero gradient pattern (uint8 values from `(i * 7) & 0xFF`).
  2. Run preprocess::preprocessToTensor on it (resize to 224x224 +
     ImageNet normalise + channel reorder to WHCN).
  3. ggml_backend_tensor_set the result, run the full compute graph,
     and read the output back via ggml_backend_tensor_get.

Cost: one full classify-equivalent pass at load() time
(~50-200 ms on a CPU runner), paid once per ImageClassifier instance,
never visible to the caller. Output is discarded; the goal is to
leave every backend buffer fully written and every lazy-init code
path exercised before user-visible classify() runs.

Local validation on win32-x64: 14/14 integration tests pass with
this change (was failing test 1 asserts 9 + 10 on meal_1 before).
Also applies the clang-format-19 layout the cpp-lint check expected,
unblocking that job.

File: packages/qvac-lib-infer-ggml-classification/addon/src/model-interface/ClassificationModel.cpp
Made-with: Cursor

* QVAC-17481 fix(addon): drain in-flight job in unload(); persistent perf reporting

Two related changes that together unblock multi-instance integration
tests across linux-x64 / darwin-arm64 / android / ios and address
the inference-latency-visibility ask.

1. addon.js — make unload() wait for the in-flight job to settle

   The previous unload() flow rejected this._pending immediately and
   then synchronously called binding.destroyInstance(). The native
   side (qvac-lib-inference-addon-cpp's JobRunner uses a worker
   thread; OutputCallBackJs uses a uv_async_t handle) often still
   had a callback pending at that moment, and destroying the
   instance underneath the in-flight callback raced with the
   uv_close lifecycle. The result was a SIGSEGV (use-after-free)
   observed across linux-x64 (both ubuntu-22.04 + 24.04),
   darwin-arm64, and the on-device Android/iOS Device Farm jobs
   in CI runs 24891210942 and 24892803959. linux-arm64 happened to
   win the race on those runs but the bug is fundamentally
   non-deterministic.

   Fix: track a separate `_pendingSettled` Promise that resolves
   the moment _outputCallback fires (whether the user-facing
   classify() Promise resolved or rejected). unload() now awaits
   that signal before calling destroyInstance, so the worker
   thread / async handle have provably finished when the native
   teardown runs. The user-facing classify() Promise contract is
   unchanged.

   This is a correctness improvement to the ImageClassifier API
   contract: after `await classifier.unload()` returns, native
   resources are now genuinely released (not "scheduled to be
   released, please don't peek").

2. test/integration/utils.js + classify.test.js — crash-survivable
   inference-latency reporting + load-time metric

   The performance-report.json was previously only flushed in
   process.on('exit'), so any SIGSEGV mid-test discarded all
   collected metrics. Now we additionally flush the JSON file
   after every recorded metric. Even a partial run leaves a usable
   per-platform latency snapshot in the uploaded artifact.

   Also adds recordLoadTime(label, ms) to capture the cost of
   constructing + load()ing an ImageClassifier (warmup + GGML
   graph build + weights read), and threads it into the first
   integration test as `load:cold`. This complements the per-image
   classify timings already recorded as `classify:<file>` and
   uploaded as artifact `classification-perf-report-{platform}-{arch}`.

Local validation on win32-x64: 14/14 tests pass cleanly with this
change set; performance-report.json contains 7 results
(load:cold + 6 classify:<file>) on disk before the process exits.

Files: packages/qvac-lib-infer-ggml-classification/addon.js
  packages/qvac-lib-infer-ggml-classification/test/integration/utils.js
  packages/qvac-lib-infer-ggml-classification/test/integration/classify.test.js
Made-with: Cursor

* QVAC-17481 fix(addon): defer OutputCallBackJs destruction to avoid use-after-free race

Root cause (in `qvac-lib-inference-addon-cpp:OutputCallBackJs.hpp`):
  The upstream destructor calls `uv_close(asyncHandle, deleter)` --
  which is asynchronous -- and then IMMEDIATELY runs
  `js_delete_reference` on its JS handle/callback refs before returning.
  When a `jsOutputCallback` invocation was queued by a
  `uv_async_send` from the worker thread just before destruction, it
  fires on a later libuv iteration and dereferences the freed
  `OutputCallBackJs` and its already-deleted JS refs.

  This explained the SIGSEGV (linux-x64 24.04, darwin-arm64) and the
  on-device APP CRASH (Android / iOS Device Farm) observed across rapid
  ImageClassifier create/destroy cycles in CI runs 24891210942,
  24892803959, 24897445066. The bug is timing-dependent, which is why
  linux-arm64 consistently wins the race and passes while other
  platforms fail.

Fix (this commit, in our binding.cpp only):
  Introduce a `DeferredOutputCallBackJs` wrapper that implements
  `addon_cpp::OutputCallBackInterface` by composing the upstream
  `addon_cpp::OutputCallBackJs` as a `unique_ptr` and forwarding
  `initializeProcessingThread / notify / stop` calls to it. The
  wrapper is what `AddonCpp` now owns; the inner upstream callback
  is owned by our wrapper.

  AddonCpp field destruction order is:
    1. `~AddonCpp` body: `outputCallback_->stop()` (our wrapper's
       stop forwards to inner).
    2. `jobRunner_` destroyed: JOINS the worker thread. No new
       `uv_async_send` can happen from this point on.
    3. `outputCallback_` destroyed: our wrapper's destructor runs.
    4. There may still be `uv_async_send` callbacks QUEUED before
       step 2 that are pending on the libuv loop.

  Our destructor releases ownership of the inner callback into a
  heap-allocated `uv_check_t` whose callback (firing AFTER the poll
  phase on the next libuv iteration -- i.e. after any queued async
  callback has fired safely against the still-alive inner) deletes
  the inner, then closes and deletes itself. The check handle is
  unref'd so it does not keep the libuv loop alive on its own.

  This is a real lifetime-management fix, not a timing workaround.
  When upstream's destructor is corrected, the wrapper becomes a
  pass-through with no functional effect. We will also submit the
  fix upstream.
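
The ordering argument above can be illustrated without libuv. The sketch below uses a toy single-threaded loop in place of the real event loop. `ToyLoop`, `InnerCallback`, and `DeferredCallback` are hypothetical names, and none of this is the libuv API or the actual wrapper. It only shows why handing the inner object to a "check" callback lets an already-queued async callback run against a live object first.

```cpp
#include <deque>
#include <functional>
#include <memory>
#include <string>
#include <vector>

// Toy loop standing in for libuv (NOT the libuv API). One iteration runs
// queued "async" callbacks first, then "check" callbacks.
struct ToyLoop {
  std::deque<std::function<void()>> asyncQueue;
  std::deque<std::function<void()>> checkQueue;

  void runOnce() {
    std::deque<std::function<void()>> asyncs;
    asyncs.swap(asyncQueue);
    for (auto& callback : asyncs) callback();
    std::deque<std::function<void()>> checks;
    checks.swap(checkQueue);
    for (auto& callback : checks) callback();
  }
};

// Stand-in for the upstream callback object.
class InnerCallback {
 public:
  explicit InnerCallback(std::vector<std::string>& log) : log_(log) {}
  ~InnerCallback() { log_.push_back("inner destroyed"); }
  void onAsync() { log_.push_back("async ran on live inner"); }

 private:
  std::vector<std::string>& log_;
};

// Wrapper whose destructor defers deletion of the inner object to the
// loop's check phase instead of deleting it immediately.
class DeferredCallback {
 public:
  DeferredCallback(ToyLoop& loop, std::vector<std::string>& log)
      : loop_(loop), inner_(std::make_unique<InnerCallback>(log)) {}

  ~DeferredCallback() {
    InnerCallback* raw = inner_.release();
    loop_.checkQueue.push_back([raw] { delete raw; });
  }

  // Simulates a notification that was queued but not yet delivered.
  void queueAsync() {
    InnerCallback* raw = inner_.get();
    loop_.asyncQueue.push_back([raw] { raw->onAsync(); });
  }

 private:
  ToyLoop& loop_;
  std::unique_ptr<InnerCallback> inner_;
};
```

In this toy model the inner object leaks if the loop never runs another iteration. The real wrapper faces a similar trade-off because its check handle is unref'd and so does not keep the loop alive.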

Local validation on win32-x64:
  14/14 integration tests pass, 90/90 asserts, including test 14
  (`load -> unload -> load cycles do not leak handles`) which
  explicitly exercises the pattern that was racing the upstream bug.

File: packages/qvac-lib-infer-ggml-classification/addon/src/addon/AddonJs.hpp
Made-with: Cursor

* QVAC-17481 fix(model,test): defensive softmax/sort + per-inference diagnostic trace

Five related changes that together (a) make the classification
output well-formed under any numerical edge case and (b) give us
first-class visibility into whatever the model actually returns on
every CI platform. No workarounds or test-masking -- the C++ changes
apply uniformly to production classify() calls and the diagnostic
logs are plain stderr output behind an opt-in env var (plus always-on
per-image t.comment() in tests).

1. addon/src/model-interface/ClassificationModel.cpp -- softmax()

   Previously:
     - Called std::max_element on a span that could contain NaN
       (max_element behaviour on NaN is unspecified).
     - Skipped normalization when sum <= 0 but RETURNED the
       unnormalized probs (could leave callers with all-zero or
       non-sum-to-1 probabilities).

   Now:
     - Finds max by explicit isfinite() walk, defaulting to -inf if
       every logit is non-finite.
     - If max is non-finite (all NaN/Inf), returns a uniform
       distribution (1/N per class) so callers always see a valid
       probability vector that sums to 1.
     - Per-element exp() input is skipped when non-finite (produces 0
       for that element rather than NaN).
     - If the exponential sum is not finite or <= 0, falls back to
       uniform distribution instead of returning unnormalized zeros.

   This is defence in depth. MobileNetV3-Small on well-normalized
   input never produces NaN logits in practice, but if the upstream
   ggml CPU backend ever surfaces a numerical bug (or a future
   quantised model does), the user-visible probability distribution
   can no longer be silently corrupted.

2. addon/src/model-interface/ClassificationModel.cpp -- std::sort

   Added explicit is-finite guards in the comparator. Non-finite
   confidences now compare as less than any finite value, giving
   strict-weak-ordering even with degenerate inputs. Previously, any
   NaN in the confidences would make the comparator non-strict-weak
   and std::sort behaviour undefined (one observed symptom: top
   class label at index 0 but some later index carrying a higher
   confidence).

3. addon/src/model-interface/ClassificationModel.cpp -- trace hook

   New `QVAC_CLASSIFICATION_TRACE=1` env var toggles a per-inference
   stderr print of:
     - raw logits as read from the ggml output tensor
     - probabilities immediately after softmax (pre-sort)
     - final sorted results
   Off by default -- production users see nothing. Enabled in our CI
   integration-test workflow (in the third file below) so every run
   carries the numerical ground truth for every sample image. If a
   platform-specific anomaly ever recurs (e.g. the win32 meal_1
   oddity we have been chasing) the log lines let us diagnose
   without adding further instrumentation.

4. test/integration/classify.test.js

   Before each per-image assertion block, emit a `t.comment(...)`
   line containing the full sorted result (label + 6-digit
   confidence per entry, plus elapsed ms). Brittle surfaces comments
   in the TAP stream regardless of pass/fail, so every CI job log
   now records the actual model output side-by-side with the
   assertion outcome. This replaces the need for post-hoc
   instrumentation commits when diagnosing numerical issues.

5. .github/workflows/integration-test-qvac-lib-infer-ggml-classification.yml

   Set `QVAC_CLASSIFICATION_TRACE=1` on the integration-test step so
   the C++ trace lines land in CI logs by default. Bounded output
   (3 lines per inference, ~20 inferences per job), negligible cost.
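
The softmax and comparator behaviour described in items 1 and 2 can be sketched as below. This is an illustrative reconstruction from the description, not the code in ClassificationModel.cpp. `robustSoftmax`, `ClassResult`, `higherConfidenceFirst`, and `sortResults` are hypothetical names.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <limits>
#include <string>
#include <vector>

// Softmax that always returns a valid probability vector.
std::vector<float> robustSoftmax(const std::vector<float>& logits) {
  const std::size_t n = logits.size();
  if (n == 0) return {};
  const std::vector<float> uniform(n, 1.0f / static_cast<float>(n));

  // Max over finite logits only; stays -inf when none are finite.
  float maxLogit = -std::numeric_limits<float>::infinity();
  for (float value : logits) {
    if (std::isfinite(value) && value > maxLogit) maxLogit = value;
  }
  if (!std::isfinite(maxLogit)) return uniform;

  std::vector<float> probs(n, 0.0f);
  float sum = 0.0f;
  for (std::size_t i = 0; i < n; ++i) {
    if (!std::isfinite(logits[i])) continue;  // contributes 0, not NaN
    probs[i] = std::exp(logits[i] - maxLogit);
    sum += probs[i];
  }
  if (!std::isfinite(sum) || sum <= 0.0f) return uniform;
  for (float& p : probs) p /= sum;
  return probs;
}

struct ClassResult {
  std::string label;
  float confidence;
};

// Descending by confidence; any non-finite confidence orders after every
// finite one, which keeps the comparator a strict weak ordering.
bool higherConfidenceFirst(const ClassResult& a, const ClassResult& b) {
  const bool aFinite = std::isfinite(a.confidence);
  const bool bFinite = std::isfinite(b.confidence);
  if (aFinite != bFinite) return aFinite;
  if (!aFinite) return false;  // both non-finite: treated as equivalent
  return a.confidence > b.confidence;
}

void sortResults(std::vector<ClassResult>& results) {
  std::sort(results.begin(), results.end(), higherConfidenceFirst);
}
```

A plain `a.confidence > b.confidence` comparator makes NaN compare "equivalent" to every value, which breaks the transitivity `std::sort` requires. Treating the non-finite values as one class ordered after all finite values avoids that.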

Local validation on win32-x64:
  14/14 integration tests pass, 90/90 asserts. Trace output verified:
  all 6 sample images produce sensible logits and sum-to-1
  probabilities; top class matches expected label in every case.
  Trace lines and t.comment()s visible in both the pass and
  (hypothetically) fail paths, as intended.

Files: packages/qvac-lib-infer-ggml-classification/addon/src/model-interface/ClassificationModel.cpp
  packages/qvac-lib-infer-ggml-classification/test/integration/classify.test.js
  .github/workflows/integration-test-qvac-lib-infer-ggml-classification.yml
Made-with: Cursor

* QVAC-17481 fix: clang-format + defensive marshalling + finer test assertions

Three coordinated changes that (a) unblock cpp-lint, (b) make the
C++ -> JS marshalling robust against compiler code-gen quirks, and
(c) make every test failure self-diagnostic so we never have to add
post-hoc instrumentation again.

1. addon/src/model-interface/ClassificationModel.cpp -- clang-format

   Apply the exact diff that cpp-lint reported in run 24900278513:
   drop the blank line between <gguf.h> and the addon-cpp include,
   wrap the std::sort args one-per-line, and split the multi-arg
   static_cast<double>(...) chain in the trace fprintf to one arg
   per line. Pure formatting; no behaviour change.

2. addon/src/addon/AddonJs.hpp -- defensive marshalling +
   per-entry trace inside JsClassifyOutputHandler

   The lambda now reads the label and the confidence into named
   local variables (`labelString`, `confidenceFloat`, then
   `confidenceDouble = static_cast<double>(confidenceFloat)`)
   BEFORE handing them to `jsu::String::create` / `jsu::Number::create`.
   The previous inline expression
       jsu::Number::create(env, static_cast<double>(cppOut.results[i].confidence))
   produced 0 in JavaScript for index 0 only on win32-x64
   (clang-cl), while indices 1..N marshalled correctly --
   visible in run 24900278513 win32 log: C++ trace shows
   {food:0.707883} but JS receives {food:0.000000}, all other
   entries OK. Materialising the values into named locals
   forces the compiler to commit the values to memory before
   the call sequence and dodges that code-gen pattern. Linux,
   macOS, and Windows continue to pass; this is risk-free
   defence-in-depth even if Windows turns out to have a deeper
   issue.

   Also adds an opt-in trace line per array element (gated by
   the same QVAC_CLASSIFICATION_TRACE=1 env var as
   ClassificationModel::process()), printing label, float, and
   double values as the lambda actually sees them. Combined
   with the existing process()-level trace, we now get the full
   pipeline view -- raw logits -> probs -> sorted results ->
   per-entry marshalling -- on every CI run with no manual
   instrumentation needed.

3. test/integration/classify.test.js -- finer assertions

   Replace coarse "confidence is in [0,1]" with split assertions
   that distinguish: typeof number / Number.isFinite (NaN/Inf
   detection) / range check. Per-entry assertion messages now
   include the array index AND the actual value so a failure
   line tells you exactly what went wrong. Same treatment for
   the sum and the sort-desc checks.

   topK / sequential / raw-RGB tests gain explicit
   Number.isFinite checks plus t.comment() output of the full
   result, so they no longer silently swallow the kind of
   value-corruption bug that was hidden in test 2 of the
   previous CI run.

Local validation on win32-x64:
  14/14 tests pass; assertion count went from 90/90 to 140/140
  with the new finite-checks. Marshalling trace verified emitting
  label / float / double per element under
  QVAC_CLASSIFICATION_TRACE=1.

Files: packages/qvac-lib-infer-ggml-classification/addon/src/model-interface/ClassificationModel.cpp
  packages/qvac-lib-infer-ggml-classification/addon/src/addon/AddonJs.hpp
  packages/qvac-lib-infer-ggml-classification/test/integration/classify.test.js
Made-with: Cursor

* QVAC-17481 fix(mobile,addon): mobile model path via testAssets + cpp-lint uv.h order

- `test/integration/utils.js`: add `resolveModelPath()` that resolves
  the GGUF weights via `global.assetPaths` on iOS/Android (the bare
  worklet runs from a packed `app.bundle/...` virtual root and cannot
  read the npm package's `weights/` directory), and falls back to the
  bundled desktop path otherwise. Throw a clear synchronous error when
  the asset is missing so it surfaces as a brittle assertion instead of
  an unhandled-promise-rejection that aborts the bare worklet.
- `test/integration/classify.test.js`, `test/integration/error-cases.test.js`:
  use `resolveModelPath()` for every `ImageClassifier` instance.
- `scripts/copy-mobile-test-assets.js`: replace the inline shell
  `mobile:copy-prebuilds` script with a portable Node script that
  fans out the single arm64 prebuild into the per-flavour directories
  the qvac-test-addon-mobile framework expects.
- `package.json`: wire the new script in as `mobile:copy-prebuilds`.
- `addon/src/addon/AddonJs.hpp`: include `<uv.h>` and reorder includes
  to satisfy `clang-format-19`'s grouping rules so cpp-lint passes in CI.
- `.gitignore`: keep downloaded Device Farm logs (`remote_logs/`) and
  ad-hoc validation scripts out of the working tree.

Made-with: Cursor

* QVAC-17481 fix(mobile,addon): testAssets .gguf.bin extension + win32 burn-one js_create_double

- `scripts/copy-mobile-test-assets.js` + `test/integration/utils.js`:
  copy the GGUF weights into `test/mobile/testAssets/` with a `.gguf.bin`
  suffix and look them up by that key. The qvac-test-addon-mobile
  framework's metro.config.js does not register `.gguf` as an asset
  extension, so a raw `.gguf` file is treated as a JS-source request
  and the bundler aborts at `:app:createBundleReleaseJsAndAssets`.
  `.bin` is in the framework's accepted list and ggml's
  `gguf_init_from_file` does not validate the file extension.
- `addon/src/addon/AddonJs.hpp`: add a defensive "burn one"
  `js_create_double(env, 0.0, &dummy)` call at the top of the
  classification result lambda. On Win32 (clang-cl + bare runtime
  + V8) the very first `js_create_double` call inside a fresh handle
  scope returned 0 for index 0 even though the C++ side passed the
  correct value; consuming that slot unblocks every subsequent call.
  Gated trace output behind `QVAC_CLASSIFICATION_TRACE=1`.
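
The rename step for the weights can be sketched as follows. This is a minimal illustration only: `mobileAssetName()` and the file name `model-fp16.gguf` are hypothetical, and the real `scripts/copy-mobile-test-assets.js` is not shown in this log and may be structured differently.

```javascript
// Hypothetical helper: pick the name a file gets inside
// test/mobile/testAssets/. Only `.gguf` files gain the extra `.bin`
// suffix; every other file name is passed through unchanged.
function mobileAssetName (filename) {
  return filename.toLowerCase().endsWith('.gguf') ? `${filename}.bin` : filename
}

const names = ['model-fp16.gguf', 'meal_1.jpg'].map(mobileAssetName)
console.log(names) // [ 'model-fp16.gguf.bin', 'meal_1.jpg' ]
```

The lookup side then has to use the same `.gguf.bin` name as its asset key, which is why the commit touches both the copy script and `test/integration/utils.js`.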

Made-with: Cursor

* QVAC-17481 fix(mobile): copy test images to mobile testAssets to fix Android/iOS ENOENT

`test/integration/utils.js:loadImage()` previously read every test
image with `fs.readFileSync(path.join('test','images',name))`. On
mobile that resolves into the packed `app.bundle/...` virtual root,
where `test/images/` is not present, and the bare runtime aborts
with `FileError: ENOENT, open "/app.bundle/backend/test/images/<file>"`
right after the model loads (Pixel 9 Pro logcat from the previous CI
run pinpointed this).

Fixed by:

- `scripts/copy-mobile-test-assets.js`: also copy every
  `test/images/*.{jpg,jpeg,png}` into `test/mobile/testAssets/`. JPEG
  and PNG are part of metro's default `assetExts`, so no rename is
  needed (unlike the GGUF blob).
- `test/integration/utils.js`: add `_resolveImagePath()` that on
  mobile reads from `global.assetPaths['../../testAssets/<name>']`
  with the same key fallbacks as `resolveModelPath()`, and on desktop
  returns `test/images/<name>`. Throw with sample asset keys when the
  lookup fails so the failure is a brittle assertion.
- `test/mobile/testAssets/.gitignore`: also ignore `*.jpg`/`*.jpeg`/
  `*.png` so the populated images are not committed.

Made-with: Cursor

* QVAC-17481 docs: README revisions for mobile assets, FP16, topK and prose reflow

- Document new `npm run mobile:copy-prebuilds` flow that populates
  `test/mobile/testAssets/` with prebuilds, the `.gguf.bin` weights blob,
  and the integration test images (fixes mobile ENOENT crash).
- Replace the obsolete "Cold start" claim with a "First-call overhead"
  note that reflects the full-pipeline warmup added in `load()` and the
  remaining JS/JIT/decoder/page-cache effects.
- Add a "Why FP16 weights?" subsection capturing the precision-vs-size
  rationale (FP16 matches FP32 accuracy on the validation set; more
  aggressive quantizations degraded noticeably).
- Expand the topK section with a plain-language one-liner.
- Add a runtime trade-off paragraph under "Why a custom GGML graph?":
  GGML CPU is slower than PyTorch/ONNX at this scale, but the absolute
  gap is negligible for a ~2.5 M-param model; larger classifiers would
  need extra graph-level optimisation.
- Fix `funetuned` -> `fine-tuned` typo.
- Reflow paragraphs to single lines so markdown viewers can soft-wrap.

Made-with: Cursor

* QVAC-17481 fix(graph): validate GGUF num_classes and assert output shape (review #1727)

Addresses two `[BUG]` review comments from @olyasir on tetherto/qvac#1727
about the hardcoded `kNumClasses = 3` not being validated against either
the loaded GGUF's `mobilenet.num_classes` metadata or the actual element
count of the constructed output tensor. Both are downstream-safety
problems for the per-inference path:

  float logits[graph::kNumClasses] = {0.0F};
  ggml_backend_tensor_get(impl_->compute.output, logits, 0, sizeof(logits));

`sizeof(logits)` is fixed at compile time. With a mismatched GGUF, this
either reads OOB (numClasses < kNumClasses) or silently truncates
(numClasses > kNumClasses); on the FC-weight-upload side the
`classifier.3.weight = [1024, kNumClasses]` shape would also fail to
match the GGUF tensor and corrupt the classifier.

Changes:

1. addon/src/model-interface/MobileNetGraph.cpp -- graph::loadWeights()

   Right after reading `numClasses` from `mobilenet.num_classes`,
   compare against `kNumClasses` and `throw StatusError(InvalidArgument, ...)`
   with a descriptive message (actual vs expected count, plus a hint to
   rebuild the addon or use a matching GGUF). This is the primary fix
   olyasir requested in `MobileNetGraph.cpp`.

   The error path is reachable from `ClassificationModel::load()`'s call
   to `graph::loadWeights(...)`, which already runs inside the JS-side
   `await classifier.load()` Promise; the `StatusError(InvalidArgument)`
   propagates as a structured rejection on the JS side, matching how
   every other config-time validation error in this addon surfaces.

2. addon/src/model-interface/MobileNetGraph.cpp -- graph::buildGraph()

   At the end of the graph build, before we hand the
   `ComputeGraph::output` tensor over to the backend allocator, assert
   `ggml_nelements(cg.output) == kNumClasses` and `raise(...)` (which
   throws `StatusError(InternalError, ...)`) if the invariant is
   violated. This is the defence-in-depth fix olyasir requested in the
   second `[BUG]` comment in `ClassificationModel.cpp`: it makes the
   12-byte stack-array `ggml_backend_tensor_get` read provably safe
   regardless of how the output tensor was constructed.

   This second check is not redundant with #1: it also catches a future
   accidental edit to the classifier wiring above (where the tail
   `classifier.3` linear is what determines the output element count),
   an upstream ggml change to how `mul_mat` shapes its result, or a
   GGUF that lacks the `mobilenet.num_classes` metadata key entirely
   and falls back to `kNumClasses` but ships mismatched FC weights.

Local validation on win32-x64:

- 15/15 C++ unit tests pass (BnEpsilonGuard, classification graph
  determinism, preprocessor suite -- they all exercise the validated
  load + build paths against the bundled FP16 GGUF, where
  `num_classes == 3` so neither check fires).
- 14/14 JS integration tests pass, 140/140 asserts (no behaviour
  change for the supported model; new error paths are unreachable
  with the bundled weights).

Files: packages/qvac-lib-infer-ggml-classification/addon/src/model-interface/MobileNetGraph.cpp
Made-with: Cursor

* QVAC-17481 fix(preprocess): pre-decode size check via stbi_info_from_memory (review #1727)

Addresses jesusmb1995's review comment on tetherto/qvac#1727:

> Could we check this before decoding? `stbi_info_from_memory()` would
> let us reject oversized images / total pixel count before
> `stbi_load_from_memory()` allocates

Why it matters: `stbi_load_from_memory` allocates the full decoded RGB
buffer (width * height * 3 bytes) before any caller-provided dimension
limit is enforced. For a 16384x16384 image at the upper edge of
`kMaxImageDimension`, that is ~768 MB of heap allocated before we see
the dimension and reject -- enough to OOM a memory-constrained device
or trigger an oversized free.

`stbi_info_from_memory` parses only the image header (a few hundred
bytes) and reports the dimensions cheaply, so we can reject oversized
inputs up-front. The post-decode dimension check is kept as
belt-and-braces in case `stbi_info` and `stbi_load` ever disagree
(e.g. truncated streams that parse a valid header but fail mid-decode);
it is a correctness check, not the primary OOM defence.
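
The arithmetic behind the ~768 MB figure, and the shape of a header-only check, can be sketched in a few lines. This is an illustration in JavaScript, not the addon's C++ code: `MAX_IMAGE_DIMENSION` is taken from the 16384x16384 example above, and `decodedRgbBytes()` and `checkHeaderDimensions()` are hypothetical names.

```javascript
// Illustrative limit taken from the 16384x16384 example in the text;
// the addon's real constant lives in C++ as kMaxImageDimension.
const MAX_IMAGE_DIMENSION = 16384
const RGB_CHANNELS = 3

// Bytes a decoder would allocate for a full 8-bit RGB buffer.
function decodedRgbBytes (width, height) {
  return width * height * RGB_CHANNELS
}

// Reject using header-reported dimensions alone, before any decode allocation.
function checkHeaderDimensions (width, height) {
  if (width > MAX_IMAGE_DIMENSION || height > MAX_IMAGE_DIMENSION) {
    throw new RangeError(`image is ${width}x${height}, limit is ${MAX_IMAGE_DIMENSION} per side`)
  }
}

const worstCase = decodedRgbBytes(MAX_IMAGE_DIMENSION, MAX_IMAGE_DIMENSION)
console.log(worstCase / (1024 * 1024)) // 768
```

An image exactly at the limit still passes this check, so the pre-decode test bounds the allocation at roughly 768 MiB rather than eliminating large allocations altogether.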

Behaviour:

- If `stbi_info` succeeds and reports dimensions over
  `kMaxImageDimension`, `decodeToRgb` throws
  `StatusError(InvalidArgument, ...)` with the actual reported size in
  the message, before any decode allocation runs.
- If `stbi_info` fails (header could not be parsed), we fall through
  to `stbi_load_from_memory`. That path already throws with
  `stbi_failure_reason()` attached, which is a more user-actionable
  message than a generic "header bad" we would emit ourselves.

File: packages/qvac-lib-infer-ggml-classification/addon/src/model-interface/ImagePreprocessor.cpp

Validated locally on win32-x64: 14/14 JS integration tests pass.

Made-with: Cursor

* QVAC-17481 test(preprocess): expand ImagePreprocessor unit coverage (review #1727)

Addresses jesusmb1995's review comment on tetherto/qvac#1727:

> Could we add more unit coverage for ImagePreprocessor before merging?
> preprocessor_test.cpp covers some happy paths, but a few public
> functions/branches still look uncovered:
> - decodeToRgb() success/failure paths are not tested directly.
> - preprocessToTensor() is only covered for empty input; it should
>   also cover encoded JPEG/PNG success, raw RGB success, and
>   unsupported non-image input without dimensions.
> - validateRawRgb() is missing empty buffer, zero width/height, and
>   over-kMaxImageDimension cases.
> - normalizeToWhcn() should cover invalid input size.

Adds the following PreprocessorTest cases (14 new tests, taking the
suite from 10 to 24 -- all 29 cases across the addon's two C++ test
binaries pass on win32-x64):

decodeToRgb:
- DecodeToRgbDecodesValidJpeg            -- happy path against test/images/meal_1.jpg
- DecodeToRgbRejectsEmptyBuffer
- DecodeToRgbRejectsCorruptedBytes
- DecodeToRgbRejectsTruncatedJpeg

preprocessToTensor (full pipeline):
- PreprocessToTensorAcceptsEncodedJpeg   -- JPEG happy path with finite-output check
- PreprocessToTensorAcceptsRawRgb         -- raw RGB happy path with finite-output check
- PreprocessToTensorRejectsBmpWithoutDimensions
- PreprocessToTensorRejectsRawWithMissingDims

validateRawRgb edges:
- ValidateRawRgbRejectsEmptyBuffer
- ValidateRawRgbRejectsZeroWidth
- ValidateRawRgbRejectsZeroHeight
- ValidateRawRgbRejectsOverKMaxImageDimensionWidth
- ValidateRawRgbRejectsOverKMaxImageDimensionHeight

normalizeToWhcn:
- NormalizeToWhcnRejectsWrongInputSize

Adds a `readTestImage(name)` helper that walks up from the current
binary location to find `test/images/<name>`, mirroring the
`findWeightsPath()` helper already in
classification_model_test.cpp. JPEG-using tests skip cleanly via
GTEST_SKIP() if the image is not present, so the C++ test suite still
passes when run from a packed tarball that does not include the test
images.

File: packages/qvac-lib-infer-ggml-classification/test/unit/preprocessor_test.cpp
Made-with: Cursor

* QVAC-17481 refactor(model): flatten ClassificationModel::Impl PIMPL indirection (review #1727)

Addresses jesusmb1995's review comment on tetherto/qvac#1727:

> Why one extra level of indirection with `Impl`? Maybe style, but I
> see no strong benefit and it just scatters the code around and
> makes it harder to track. I would prefer a straightforward class
> where all these variables can be directly under
> `ClassificationModel` private variables.

The PIMPL was originally there to keep ggml types out of the public
header. In practice this header is only included by the addon's own
`AddonJs.hpp`, which already pulls in the entire
qvac-lib-inference-addon-cpp framework, so there is no header-fanout
benefit from hiding ggml. Flattening the impl removes one level of
heap indirection, lets all members be visible at a glance, and lets
clang-tidy / IDE navigation jump straight to the field declarations.

Changes:

1. addon/src/model-interface/ClassificationModel.hpp

   - Pull in `<ggml-backend.h>` and the local `MobileNetGraph.hpp`
     (which exposes `WeightsBundle` / `ComputeGraph` definitions
     used by the new direct members).
   - Replace `struct Impl;` forward declaration and
     `std::unique_ptr<Impl> impl_;` with the eight direct private
     members the Impl previously held: `modelPath_`, `backend_`,
     `weights_`, `compute_`, `labels_`, `numThreads_`, `loaded_`,
     `lastInferenceUs_`. Member ordering is documented in a comment:
     ggml requires every backend buffer to be released BEFORE the
     backend it was allocated on, and `~ClassificationModel`
     enforces that ordering explicitly with `compute_.reset();
     weights_.reset();` before `ggml_backend_free(backend_)`.

2. addon/src/model-interface/ClassificationModel.cpp

   - Remove the `struct ClassificationModel::Impl { ... };`
     definition and the `std::make_unique<Impl>()` from the
     constructor body.
   - Replace every `impl_->X` with `X_` (34 references). No
     functional change.
   - Drop redundant `if (!impl_)` guards in `setNumThreads()`,
     `load()`, `runtimeStats()`, and `process()`. The class is non-
     copyable and non-movable (it carries a `std::mutex` member,
     which suppresses implicit move ctors/assignment), so `impl_`
     was always non-null between construction and destruction;
     the guards were dead code.

Local validation on win32-x64:

- `bare-make build` clean (warnings unchanged from before refactor;
  no new errors).
- `npm run test:cpp` -- 29/29 tests pass (3 ClassificationModelTest +
  24 PreprocessorTest + 1 BnEpsilonGuard + 1 architecture sanity).
- `npm run test:integration` -- 14/14 tests pass, 140/140 asserts.

Files: packages/qvac-lib-infer-ggml-classification/addon/src/model-interface/ClassificationModel.hpp
  packages/qvac-lib-infer-ggml-classification/addon/src/model-interface/ClassificationModel.cpp
Made-with: Cursor

* QVAC-17481 refactor(addon,binding): single-place arg validation in C++ AddonJs (review #1727)

Addresses jesusmb1995's review comments on tetherto/qvac#1727:

> Why normalize here instead of just throwing at `AddonJs` and
> having one central place to do the validation? In previous
> conversations with Gianfranco (and Nidhin) on LLM, we agreed it
> makes sense to do parsing/validation in one place, namely at AddonJs
> construction, and to throw there, directly in C++, on wrong/invalid
> arguments.
>
> For construction/config arguments, `createInstance()` should be the
> place that parses and validates the JS values before building the
> native model: model path, threads, and any other config should
> either produce a valid C++ configuration or throw immediately
> there. That keeps the JS wrapper thin and avoids having two
> different sources of truth for what is valid.
>
> For per-call image arguments, the same principle applies at the
> native job boundary before `ClassificationModel`: parse the JS
> input once, construct an explicit validated `ClassifyInput`, and
> then let the model/preprocessor operate on that clean shape. That
> removes the duplicated JS normalization/magic-byte checks and
> avoids relying on weak `0` sentinel values for "not provided".

Changes:

1. addon/src/model-interface/ClassificationModel.hpp

   - Of the four sentinel-zero fields (`width = 0`, `height = 0`,
     `channels = 0`, `topK = 0`, each overloaded as "not provided"),
     replace the three dimension fields with an explicit
     `std::optional<RawRgbDims>` member that captures the
     "is the input raw RGB or encoded?" decision in a type the
     compiler can check.
   - `topK = 0` stays because it has a meaningful "no filter"
     interpretation; a caller-supplied value is validated as > 0 at
     the binding boundary.

2. addon/src/model-interface/ClassificationModel.cpp

   - Translate `optional<RawRgbDims>` -> the existing
     `(declaredWidth, declaredHeight, declaredChannels)` triplet
     consumed by `preprocess::preprocessToTensor`. The preprocessor's
     internal "0 means not-provided" convention is preserved (it is
     a private API; the JS-facing one is the explicit optional).

3. addon/src/addon/AddonJs.hpp

   - `createInstance` now validates:
       * `path` must be a non-empty string,
       * `config.threads` (when provided) must be a positive integer.
     These were previously not enforced; non-positive thread counts
     would have silently passed through to libggml and raw negatives
     would int-truncate.
   - `runJob` is now the single source of truth for per-call
     validation:
       * `content` rejection message rephrased to include the
         substring "required" so the JS test
         `t.exception.all(..., /required|null|undefined/i)` keeps
         passing without relying on a separate JS-side TypeError.
       * Dimension triplet enforcement: caller must provide either
         all of {width, height, channels} or none of them; partial
         shapes are rejected with an explicit message rather than
         leaking through as a buffer-size mismatch downstream.
       * Each dim is range-checked as int32_t before being committed
         to ClassifyInput's optional<RawRgbDims>, so a negative
         JS Number cannot wrap to ~4 billion via uint32_t cast and
         tunnel into validateRawRgb.
       * `topK` is range-checked > 0 if provided.

4. test/unit/classification_model_test.cpp

   - Migrate the three `input.width = ...; input.height = ...;
     input.channels = ...;` blocks to the new
     `input.rawRgb = qcc::RawRgbDims{...};` shape. No behavioural
     change.

5. index.js

   - Strip every JS-side validation helper that duplicated C++ work:
     `assertBuffer`, `normaliseDimensionOptions`, `isSupportedEncoded`,
     `startsWith`, `JPEG_MAGIC`, `PNG_MAGIC`. The classify() body now
     literally builds `{ type, content, [width, height, channels,
     topK] }` from the caller's arguments and forwards to the
     binding.
   - Lifecycle checks (`!this._addon || !this.state.configLoaded`)
     and the file-existence check in `load()` stay in JS:
       * lifecycle is a JS-managed state, not a value-shape
         question;
       * the existence-check delivers a more actionable error
         message ("MobileNet GGUF weights not found at: <path>")
         than letting the load reach C++ and throw "Failed to open
         GGUF file: <path>" downstream.
   - Module-level comment documents the JS-as-thin-pass-through
     contract so a future contributor cannot re-introduce the
     duplicated validation by mistake.
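
The all-or-none dimension rule and the signed range check described under `runJob` in item 3 can be illustrated as below. The commit implements these checks in C++; this JavaScript sketch only shows the rule, and `parseRawRgbDims()`, its error messages, and the requirement that each value be strictly positive at this boundary are assumptions.

```javascript
const INT32_MAX = 2147483647

// A negative Number reinterpreted as uint32 becomes a huge positive value,
// which is why each dimension is checked as a signed integer first.
console.log(-1 >>> 0) // 4294967295

// Hypothetical parser: returns null for encoded input (no dims given),
// a { width, height, channels } object for raw RGB, and throws otherwise.
function parseRawRgbDims (opts) {
  const keys = ['width', 'height', 'channels']
  const given = keys.filter((k) => opts[k] !== undefined && opts[k] !== null)
  if (given.length === 0) return null
  if (given.length !== keys.length) {
    throw new TypeError('width, height and channels must be provided together')
  }
  for (const k of keys) {
    const v = opts[k]
    if (!Number.isInteger(v) || v <= 0 || v > INT32_MAX) {
      throw new RangeError(`${k} must be a positive 32-bit integer`)
    }
  }
  return { width: opts.width, height: opts.height, channels: opts.channels }
}
```

Returning `null` versus an object plays the role that `std::optional<RawRgbDims>` plays on the C++ side: "not provided" is a distinct state rather than a zero value.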

Local validation on win32-x64:

- `bare-make build` clean.
- `npm run test:cpp` -- 29/29 (incl. the migrated raw-RGB
  ClassificationModelTest cases).
- `npm run lint` -- clean.
- `npm run test:integration` -- 14/14 tests, 140/140 asserts. All
  existing brittle regex matchers in `error-cases.test.js`
  (`/required|null|undefined/i`, `/empty/i`, `/format|invalid/i`,
  `/decode|jpeg|invalid/i`, `/match|size|width|height|raw/i`,
  `/format|jpeg|png|bmp/i`, `/not loaded|load\(\)/i`,
  `/not loaded|destroyed|state/i`) match the new C++-issued error
  messages, so no test regex needed updating.

Files: packages/qvac-lib-infer-ggml-classification/addon/src/addon/AddonJs.hpp
  packages/qvac-lib-infer-ggml-classification/addon/src/model-interface/ClassificationModel.hpp
  packages/qvac-lib-infer-ggml-classification/addon/src/model-interface/ClassificationModel.cpp
  packages/qvac-lib-infer-ggml-classification/test/unit/classification_model_test.cpp
  packages/qvac-lib-infer-ggml-classification/index.js
Made-with: Cursor

* QVAC-17481 chore(test,docs): post-sync audit follow-ups (consistency + uniform url strip + readme)

Picks up the lower-risk consistency / correctness items from the
post-sync self-audit. None of these change observable behaviour;
they remove duplication and small footguns that would otherwise
surface as drift in future maintenance.

1. test/integration/utils.js -- single source of truth for the mobile
   asset-key heuristic + uniform `file://` strip.

   - Extract `_resolveMobileAsset(filename)` from the two
     duplicate-by-design loops in `resolveModelPath()` and
     `_resolveImagePath()`. Both used the same four-element
     candidate-key array (`../../testAssets/${name}`,
     `../mobile/testAssets/${name}`, `testAssets/${name}`,
     `../testAssets/${name}`); future framework key-shape changes
     now land in one place instead of being silently inconsistent.

   - Extract `_stripFileUrlPrefix(mapped)` and switch from
     `mapped.slice('file://'.length)` to
     `mapped.replace(/^file:\/\//, '')`. The slice version leaves a
     stray leading `/` if the harness ever returns a triple-slash
     `file:///abs/...` URL (harmless on POSIX-mobile, malformed on
     a hypothetical Windows-mobile target). The regex strip is
     uniformly correct across both shapes.

   - Add `makeClassifier(overrides)` -- the standard test-instance
     factory. Centralises model-path + logger wiring so any future
     constructor-arg change in the addon lands in one place
     instead of N inline `new ImageClassifier(...)` callsites.

2. test/integration/classify.test.js + error-cases.test.js -- adopt
   the shared factory.

   - classify.test.js drops the inline
     `new ImageClassifier({ modelPath: resolveModelPath(),
     logger: createLogger() })` (4 callsites) in favour of
     `makeClassifier()`. Imports trimmed accordingly: drops
     `ImageClassifier`, `createLogger`, `resolveModelPath` from
     the destructure (unused after refactor; standardjs would
     have flagged them anyway).

   - error-cases.test.js drops its local `makeClassifier()` (which
     was a duplicate of what now lives in utils.js) and imports
     the shared one. Net: -1 module-level function.

3. README.md -- fix the `**threads**` markdown bullet.

   The line `- \`**threads**\` -- ...` wraps the bold markers in
   backticks, which renders the asterisks literally inside an
   inline-code span (`**threads**` instead of bold **threads**).
   Bare-renderable replacement: `- **\`threads\`** -- ...` reads
   as bold inline-code, matching the intent of the surrounding
   bullets. This was a pre-existing bug noted as "out-of-scope"
   in the line-reflow pass but is trivial to fix.
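
The two helpers extracted in item 1 can be sketched as below. This is a hedged illustration rather than the contents of `test/integration/utils.js`: the four candidate keys and the regex come from the text above, while the function bodies, the explicit `assetPaths` parameter, and the error wording are assumptions.

```javascript
// Candidate keys listed in the commit message, tried in order.
function candidateKeys (name) {
  return [
    `../../testAssets/${name}`,
    `../mobile/testAssets/${name}`,
    `testAssets/${name}`,
    `../testAssets/${name}`
  ]
}

// Drop a leading `file://` scheme; strings without it pass through unchanged.
function stripFileUrlPrefix (mapped) {
  return mapped.replace(/^file:\/\//, '')
}

// `assetPaths` is passed in here for testability; the real helper reads
// `global.assetPaths`, whose exact shape is an assumption in this sketch.
function resolveMobileAsset (assetPaths, name) {
  for (const key of candidateKeys(name)) {
    const mapped = assetPaths[key]
    if (typeof mapped === 'string') return stripFileUrlPrefix(mapped)
  }
  const sample = Object.keys(assetPaths).slice(0, 5).join(', ')
  throw new Error(`asset "${name}" not found; sample keys: ${sample}`)
}

const paths = { 'testAssets/meal_1.jpg': 'file:///data/app/meal_1.jpg' }
console.log(resolveMobileAsset(paths, 'meal_1.jpg')) // /data/app/meal_1.jpg
```

Including a few sample keys in the error message mirrors the "throw with sample asset keys" behaviour described for the image lookup, so a key-shape change in the harness is visible directly in the test failure.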

Local validation on win32-x64:

- `npm run lint` clean.
- `npm run test:cpp` -- 29/29 (no behavioural change, just
  end-to-end smoke that the test-utils refactor did not break the
  C++ harness paths).
- `npm run test:integration` -- 14/14, 140/140 asserts (run twice
  to confirm; one in-between-test SIGSEGV observed on the first
  run is the known upstream `OutputCallBackJs` UAF the hack
  branch deliberately leaves un-papered-over, not caused by this
  commit).

Files: packages/qvac-lib-infer-ggml-classification/test/integration/utils.js
  packages/qvac-lib-infer-ggml-classification/test/integration/classify.test.js
  packages/qvac-lib-infer-ggml-classification/test/integration/error-cases.test.js
  packages/qvac-lib-infer-ggml-classification/README.md
Made-with: Cursor

* QVAC-17481 chore: rename addon directory to packages/classification-ggml

Aligns the addon's directory and CI-workflow filenames with the
published package name (`@qvac/classification-ggml`) so that the
folder and the npm scope read consistently. Per a reviewer-style
naming convention request:

    Package name: @qvac/classification-ggml
    Addon folder: classification-ggml

Renames (53 files via `git mv`, all rename detection clean -- 31
insertions / 31 deletions across 54 files):

  packages/qvac-lib-infer-ggml-classification/
      -> packages/classification-ggml/

  .github/workflows/integration-mobile-test-qvac-lib-infer-ggml-classification.yml
      -> .github/workflows/integration-mobile-test-classification-ggml.yml
  .github/workflows/integration-test-qvac-lib-infer-ggml-classification.yml
      -> .github/workflows/integration-test-classification-ggml.yml
  .github/workflows/prebuilds-qvac-lib-infer-ggml-classification.yml
      -> .github/workflows/prebuilds-classification-ggml.yml

In-file text updates (paths only -- no functional change):

  - All four workflows (`integration-mobile-test-classification-ggml.yml`,
    `integration-test-classification-ggml.yml`,
    `prebuilds-classification-ggml.yml`, plus the hack-branch
    `on-pr-qvac-lib-infer-llamacpp-llm.yml`) now reference the new
    `packages/classification-ggml/**` path filter,
    `PKG_DIR=packages/classification-ggml` env, the renamed sibling
    workflow filenames, and the new `addon/packages/classification-ggml`
    `ADDON_WORKDIR` for the mobile harness.
  - `packages/classification-ggml/CMakeLists.txt` -- `project(...)`,
    `add_bare_module(...)`, and every `${...}` target reference
    renamed to `classification-ggml`. The bare module's output
    filename (`qvac__classification-ggml.bare`) is unchanged because
    bare derives it from `package.json` `name` (`@qvac/classification-ggml`),
    not from the CMake project name.
  - `packages/classification-ggml/package.json` -- repository.directory,
    homepage URL.
  - `packages/classification-ggml/README.md`, `index.js`, and
    `docs/onnx-to-gguf-conversion.md` -- doc paths.

Deliberately NOT renamed (out of scope -- code-level identifiers,
not file paths):

  - C++ namespace `qvac_lib_infer_ggml_classification` (8 files).
    Other addons in this monorepo do NOT tie their C++ namespace to
    the folder name (e.g. `qvac::ttslib::lavasr` lives under
    `packages/qvac-lib-infer-onnx-tts/`), so the namespace is a
    code-style choice rather than a path-consistency one. Can be
    folded into a follow-up if reviewers want full consistency
    there too.

Local validation on win32-x64 (in the renamed
`packages/classification-ggml/` directory):

  - `npm install` clean.
  - `bare-make generate` + `bare-make build` + `bare-make install`
    succeed; `qvac__classification-ggml.bare` produced under
    `prebuilds/win32-x64/` (filename unchanged).
  - `npm run lint` clean.
  - `npm run test:cpp` 29/29.
  - `npm run test:integration` 14/14, 140/140 asserts (perf-report
    correctly written under
    `packages/classification-ggml/test/results/`).

Made-with: Cursor

* QVAC-17481 fix(addon,test): align upstream-bug workarounds with monorepo convention

Two upstream issues block the addon's CI without local mitigations. Both
are paper-trailed in detail in `remote_logs/issues_report.md` (gitignored,
internal). Inline comments at the workaround sites are kept short to match
how other addons in the monorepo handle the same races.

1. `OutputCallBackJs` use-after-free race
   ----------------------------------------
   `qvac_lib_inference_addon_cpp::~OutputCallBackJs` deletes JS refs
   synchronously while `uv_close` on its async handle is asynchronous
   (queue/OutputCallbackJs.hpp:48-58); a `uv_async_send` queued just
   before destruction fires against dead refs and crashes in
   `js_open_handle_scope`. Reproduced as SIGSEGV (linux-x64/-arm64,
   darwin-arm64), `Fatal signal 11` (Android logcat), and
   `EXC_BAD_ACCESS @ 0x1a0` (iOS crash report) across rapid create/
   destroy cycles.

   Other addons in this monorepo paper over the same race in their
   integration suites with sleep-around-unload, e.g.
     ocr-onnx/test/integration/lifecycle.test.js:56,85,115
     ocr-onnx/test/integration/full-ocr-suite.test.js:107,115,123
     qvac-lib-infer-llamacpp-llm/test/integration/sliding-context.test.js:163,355

   We adopt the same pattern via `cleanupClassifier()` in
   `test/integration/utils.js` (two-phase: 500-1000ms pre-unload
   yield + 2000-3000ms post-unload drain). The pre-unload yield is
   required for our addon specifically because `await classify()`
   resolves on the first `Output` event while the worker thread
   keeps queuing follow-up events (`RuntimeStats`,
   `JobCompleted`); without it the follow-ups land DURING
   `~OutputCallBackJs`. Every classify() call in the integration
   tests was migrated to `cleanupClassifier()`.

   The removed local C++ wrapper (`DeferredOutputCallBackJs`) was
   a real lifetime fix but kept us out of step with how the rest
   of the monorepo handles this; once upstream is patched the
   sleeps drop everywhere at once.

2. Win32-x64 first-`js_create_double` returns 0.0
   ----------------------------------------------
   The very first `js_create_double` call in the process returns
   0.0 on the Azure GitHub-hosted `windows-2022` runner (clang-cl
   + bare-runtime + V8). Subsequent calls in the same handle scope
   are correct. No local Windows repro; only the CI runner image
   is affected.

   Other addons accidentally dodge the symptom because their first
   emitted number is naturally 0 (whisper/parakeet
   `segment.start`), they assert only `typeof === 'number'` /
   `!isNaN` (llamacpp-llm stats), they never assert the value
   (ocr-onnx bbox coords), or they emit no numbers at all
   (lib-infer-diffusion / llamacpp-embed). Our 3-class softmax
   sort + sum-to-1 assertions catch the corruption immediately, so
   no test-side workaround is possible.

   Local C++ "burn one" workaround in `JsClassifyOutputHandler`'s
   lambda preamble: a throwaway `js_create_double(env, 0.0,
   &dummy)` call consumes the broken first slot so the per-element
   `Number::create` calls below produce the correct value at index
   0. Cost is one ephemeral js_number per classify() call.

Other follow-ups in this commit (none disturb code paths above):

  - `addon.js` lifecycle: `unload()` no longer waits on the
    pending-job promise. The post-unload sleep in
    `cleanupClassifier` covers the same window, so `unload()`
    becomes a thin pass-through (matches what every other addon
    in the monorepo does).
  - Top-of-file workaround comment in `AddonJs.hpp` consolidated
    to a 2-line note at the burn-one site (matches the comment
    density other addons use; full root cause in the report).
  - `cleanupClassifier` doc trimmed to 3 lines pointing at the
    report.

Local validation on win32-x64:
  - bare-make build clean
  - npm run lint clean
  - npm run test:cpp 29/29
  - npm run test:integration 14/14 + 140/140 asserts

Files: packages/classification-ggml/addon.js
  packages/classification-ggml/addon/src/addon/AddonJs.hpp
  packages/classification-ggml/addon/src/js-interface/binding.cpp
  packages/classification-ggml/test/integration/classify.test.js
  packages/classification-ggml/test/integration/error-cases.test.js
  packages/classification-ggml/test/integration/utils.js
Made-with: Cursor

* QVAC-17481 chore: adopt upstream WA fixes from PR #1825

Bumps qvac-lib-inference-addon-cpp from 1.1.5#1 to 1.1.6 (the version
shipped by PR #1825) and removes the two local workarounds that the
upstream fixes make unnecessary:

- Win32 burn-one js_create_double in JsClassifyOutputHandler is gone;
  upstream's JsUtils::Number::createDouble now applies a process-wide
  burn-once guard via static-init.
- Two-phase sleep around unload() in cleanupClassifier is gone;
  upstream's ~OutputCallBackJs now defers js_delete_reference into the
  uv_close callback via a heap-owned State.

Local Win32 validation: 14/14 integration tests + 29/29 C++ unit
tests pass; in particular the index-0 marshalling assertions and the
back-to-back load/unload cycle test that previously SIGSEGV'd both
pass without their prior workarounds.

Resolves T1 + T10 from the audit; details in remote_logs/issues_report.md.

Made-with: Cursor

* QVAC-17481 chore[api]: align lifecycle with llamacpp-llm pattern

Re-shape the JS layer so request orchestration mirrors the LLM addon
(closes T5-T9 from PR #1727 review):

- addon.js becomes a thin C++ binding wrapper (mirrors LlamaInterface):
  constructor takes `(binding, configurationParams, outputCb, logger)`,
  exposes `activate()` / `runJob()` / `cancel()` / `unload()`. The
  bespoke `_pending` Promise + `_outputCallback` are gone; export a
  shared `mapAddonEvent(rawEvent, rawData, rawError)` instead.
- index.js becomes the orchestration layer (mirrors LlmLlamacpp): one
  `exclusiveRunQueue()` serialises load/classify/unload, one
  `createJobHandler()` owns the active QvacResponse, and the output
  callback fans events through `_handleAddonOutputEvent`.
- load() now does try/catch around `activate()` and best-effort
  `_addon.unload()` on failure so a partial init never leaves a
  zombie native handle (T6).
- classify() resolves on the terminal stats event rather than the
  first ClassifyOutput, eliminating the orphan-callback risk that
  motivated the `_pending` drain on the previous design (T7, T8).
  Public shape unchanged: still `Promise<Array<{label,confidence}>>`.
- unload() runs through the same queue, calls native `cancel()` on
  in-flight work, fails the active JS request with `Model was unloaded`,
  then destroys the native handle (T9).
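
The serialising behaviour attributed to `exclusiveRunQueue()` above can be illustrated with a promise chain. This is a minimal sketch under stated assumptions: `createExclusiveRunQueue()` is a hypothetical stand-in, not the helper the addon imports, and the real one may differ in API and error handling.

```javascript
// Hypothetical stand-in for the `exclusiveRunQueue()` named in the text:
// each task starts only after the previous one has settled.
function createExclusiveRunQueue () {
  let tail = Promise.resolve()
  return function run (task) {
    const result = tail.then(() => task())
    // Keep the chain alive even if a task rejects.
    tail = result.catch(() => {})
    return result
  }
}

const run = createExclusiveRunQueue()
const order = []
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms))

run(async () => { await sleep(20); order.push('load') })
run(async () => { order.push('classify') })
run(async () => { order.push('unload') }).then(() => {
  console.log(order.join(' -> ')) // load -> classify -> unload
})
```

Because `unload()` goes through the same queue, it cannot interleave with a `load()` that is still in progress, which is the property the bullets above rely on.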

mapAddonEvent is keyed on payload shape (Array → Output, plain object
→ JobEnded terminal) because the upstream JobRunner emits the stats
trailer with a raw `std::vector<std::pair<...>>` RTTI name rather than
a literal `*JobEnded` event. Documented inline.
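
Dispatching on payload shape rather than on the event name can be sketched like this. The three-argument signature comes from the text above; the function name `mapAddonEventSketch`, the returned object shape, and the `Error` and `Unknown` branches are assumptions made for illustration, and the sample label is invented.

```javascript
// Sketch only: classify a raw addon event by the shape of its payload.
function mapAddonEventSketch (rawEvent, rawData, rawError) {
  if (rawError) return { type: 'Error', error: rawError }
  // An array payload is treated as classification output.
  if (Array.isArray(rawData)) return { type: 'Output', data: rawData }
  // A plain object payload is treated as the terminal stats trailer.
  if (rawData !== null && typeof rawData === 'object') {
    return { type: 'JobEnded', data: rawData }
  }
  return { type: 'Unknown', event: rawEvent, data: rawData }
}

const output = mapAddonEventSketch('any-name', [{ label: 'food', confidence: 0.9 }], null)
console.log(output.type) // Output
```

The `Array.isArray` test has to come before the generic object test, since arrays are also objects in JavaScript.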

Local validation: 14/14 integration + 140/140 asserts in 2.8s
(down from 8.2s in Group A — the LLM-style cancel/unload is much
faster than the prior drain-then-destroy pattern); 29/29 C++ unit
tests; standard lint clean.

Made-with: Cursor

* QVAC-17481 infra: add canonical on-pr + on-pr-close workflows for classification-ggml

Adds the two missing top-level workflow files so the addon now has the
full 5-file layout used by every other modern addon in the monorepo
(`decoder-audio`, `diffusion-cpp`, `ocr-onnx`, `bci-whispercpp`):

- `on-pr-classification-ggml.yml` -- canonical PR trigger router.
  authorize -> changes -> sanity / ts-checks / cpp-lint / prebuild ->
  integration / mobile -> merge-guard. Path filters scope to
  `packages/classification-ggml/**` and the addon's own workflow files.
- `on-pr-close-classification-ggml.yml` -- mirror of
  `on-pr-close-decoder-audio.yml`. Triggers `public-delete-npm-versions`
  with `packages: classification-ggml` to clean up per-PR npm pre-releases
  on PR close.

Closes T11 from PR #1727 review (olyasir: "rename in same format as other
pipelines"). The legacy-named `on-pr-qvac-lib-infer-ggml-classification.yml`
on the fork PR-1 branch will be removed at sync-to-PR-1 time.

The hack-branch dispatch swap (`on-pr-qvac-lib-infer-llamacpp-llm.yml`
hijacked + `*-temp.yml` parking) is intentionally left untouched here:
new workflows aren't dispatchable from the GitHub Actions UI until they
exist on `main`, so the swap is still our only working dispatch path
for hack-branch CI runs.

Validation: both files parse with `yaml.safe_load`; every workflow /
composite-action reference resolves on disk.

Co-authored-by: Cursor <cursoragent@cursor.com>

* QVAC-17481 doc: trim verbose AI-style comments across the addon

Closes T2/T3/T4 from PR #1727 (jesusmb1995: "Please remove this
comment, its unnecessary... LLM's are too verbose"), and applies the
same four cleanup rules across the rest of …
Zbig9000 added a commit to Zbig9000/qvac that referenced this pull request May 19, 2026
…VAC-18993)

Pull in the consolidated vcpkg PR (whisper-cpp 1.8.4.3 #1 +
ggml-speech 2026-05-18 #1) that covers four asana tickets:

- QVAC-18991: whisper.cpp upstream-sync from ggml-org/master to
  v1.8.4.3.  Adds upstream's VAD streaming API
  (whisper_vad_detect_speech_no_reset, whisper_vad_reset_state)
  with a regression test, the macOS Vulkan persistent-pipeline
  cache, and various BCI / bindings fixes.
- QVAC-18300: enables OpenCL on Whisper for Android, gated
  behind a new `opencl` feature.  This package now declares an
  android-only `opencl` feature that wires through to the
  whisper-cpp port's opencl feature, so a transcription addon
  built for android-arm64 can ship the Adreno backend without
  forcing it on non-Adreno consumers.
- QVAC-18992: rebases the speech-stack ggml (qvac-ext-ggml@speech)
  onto the same upstream v0.10.2 baseline that whisper.cpp's
  bundled ggml uses, so the QVAC speech stack (whisper +
  parakeet + tts-cpp) consumes a coherent ggml API surface.
  No direct dependency from this package -- transitive via
  other speech-stack addons sharing the Android process.
- QVAC-18993: switches the Android build to pure
  dynamic-backend mode: GGML_BACKEND_DL=ON + GGML_CPU_ALL_VARIANTS=ON
  on both the whisper-cpp port and ggml-speech port, so the
  addon's .bare prebuild ships one libggml-cpu-android_armv*_*.so
  per microarchitecture plus dynamically-loaded
  libggml-vulkan.so / libggml-opencl.so.  ggml's loader picks
  the highest-feature CPU variant (armv9.2_2 .. armv8.0_1) plus
  the right GPU backend (Adreno 700+ -> OpenCL, everything else
  -> Vulkan) at runtime, so a single APK serves the whole device
  matrix without per-device builds.

vcpkg-configuration.json is TEMPORARILY pointed at
Zbig9000/qvac-registry-vcpkg.git @ b5a5e199 (= QVAC-vcpkg-speech-stack-android-dynamic-backend
HEAD on Zbig9000's fork) because the consolidated port versions
don't exist on tetherto/main yet.  Once the vcpkg PR lands the
default-registry block must be re-pointed back to
https://github.com/tetherto/qvac-registry-vcpkg.git with the
post-merge tetherto/main SHA as baseline.

Devicefarm: the Asana ticket asks for GPU testing on mobile to verify
S25 picks OpenCL and Pixel 9 picks Vulkan.  Those tests live
outside this addon (in qvac CI's integration-mobile-test
workflow) and depend on device-farm config that I can't validate
locally; the addon code side is unchanged in this bump (CPU
dispatcher + dynamic backend `.so` files are already wired by
the whisper-cpp port's prebuild output, and the JS layer
already enumerates ggml_backend_devs at init).
GustavoA1604 added a commit that referenced this pull request May 22, 2026
…c GGML backends (#2124)

* transcription-whispercpp: bump to 0.7.1 with whisper-cpp 1.8.4.3#1 (QVAC-18993)

Pull in the consolidated vcpkg PR (whisper-cpp 1.8.4.3 #1 +
ggml-speech 2026-05-18 #1) that covers four asana tickets:

- QVAC-18991: whisper.cpp upstream-sync from ggml-org/master to
  v1.8.4.3.  Adds upstream's VAD streaming API
  (whisper_vad_detect_speech_no_reset, whisper_vad_reset_state)
  with a regression test, the macOS Vulkan persistent-pipeline
  cache, and various BCI / bindings fixes.
- QVAC-18300: enables OpenCL on Whisper for Android, gated
  behind a new `opencl` feature.  This package now declares an
  android-only `opencl` feature that wires through to the
  whisper-cpp port's opencl feature, so a transcription addon
  built for android-arm64 can ship the Adreno backend without
  forcing it on non-Adreno consumers.
- QVAC-18992: rebases the speech-stack ggml (qvac-ext-ggml@speech)
  onto the same upstream v0.10.2 baseline that whisper.cpp's
  bundled ggml uses, so the QVAC speech stack (whisper +
  parakeet + tts-cpp) consumes a coherent ggml API surface.
  No direct dependency from this package -- transitive via
  other speech-stack addons sharing the Android process.
- QVAC-18993: switches the Android build to pure
  dynamic-backend mode: GGML_BACKEND_DL=ON + GGML_CPU_ALL_VARIANTS=ON
  on both the whisper-cpp port and ggml-speech port, so the
  addon's .bare prebuild ships one libggml-cpu-android_armv*_*.so
  per microarchitecture plus dynamically-loaded
  libggml-vulkan.so / libggml-opencl.so.  ggml's loader picks
  the highest-feature CPU variant (armv9.2_2 .. armv8.0_1) plus
  the right GPU backend (Adreno 700+ -> OpenCL, everything else
  -> Vulkan) at runtime, so a single APK serves the whole device
  matrix without per-device builds.
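
To make the expected GPU choice concrete, the rule stated in the last bullet can be restated as a small function. This is only an illustration of the rule, not ggml's loader code; `pickGpuBackend` and the renderer-string format are assumptions.

```javascript
// Illustration of the stated rule: Adreno 700-series or newer -> OpenCL,
// everything else -> Vulkan. The renderer string format is an assumption.
function pickGpuBackend (rendererName) {
  const match = /Adreno\D*(\d+)/i.exec(rendererName)
  if (match && Number(match[1]) >= 700) return 'opencl'
  return 'vulkan'
}
```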

vcpkg-configuration.json is TEMPORARILY pointed at
Zbig9000/qvac-registry-vcpkg.git @ b5a5e199 (= QVAC-vcpkg-speech-stack-android-dynamic-backend
HEAD on Zbig9000's fork) because the consolidated port versions
don't exist on tetherto/main yet.  Once the vcpkg PR lands the
default-registry block must be re-pointed back to
https://github.com/tetherto/qvac-registry-vcpkg.git with the
post-merge tetherto/main SHA as baseline.

Devicefarm: the Asana ticket asks for GPU testing on mobile to verify
S25 picks OpenCL and Pixel 9 picks Vulkan.  Those tests live
outside this addon (in qvac CI's integration-mobile-test
workflow) and depend on device-farm config that I can't validate
locally; the addon code side is unchanged in this bump (CPU
dispatcher + dynamic backend `.so` files are already wired by
the whisper-cpp port's prebuild output, and the JS layer
already enumerates ggml_backend_devs at init).

* transcription-whispercpp: bump to 0.7.2 with whisper-cpp 1.8.4.3#2 (QVAC-18993)

Picks up the Android per-arch CPU dlopen fallback patch added to the
whisper-cpp port (mirrors qvac-ext-ggml@speech 9562ed04). Without
this, every APK consumer with `useLegacyPackaging=false` (AGP 3.6+
default) would silently lose CPU init: the directory iterator finds
nothing inside compressed APK libs, and the existing on-disk filename
fallback never composes the per-arch `libggml-cpu-android_armv*_*.so`
names that `GGML_CPU_ALL_VARIANTS=ON` produces.

Re-pins the Zbig9000/qvac-registry-vcpkg default-registry baseline to
86257dc376ca043c67cc4805ab8d1e74a94b7eda so both whisper-cpp 1.8.4.3#2
and ggml-speech 2026-05-19#0 are reachable.

Co-authored-by: Cursor <cursoragent@cursor.com>

* transcription-whispercpp: bump to 0.7.3 → whisper-cpp 1.8.4.3#3 (QVAC-18993)

Pure follow-up to 0.7.2 -- the two Android dynamic-backend ggml fixes
the 0.7.2 release pulled in via vcpkg patches are now upstreamed as
commits on tetherto/qvac-ext-lib-whisper.cpp PR #26 ("ggml + tts-cpp
Android dynamic-backend overlays") instead of being carried in the
vcpkg port's patches/ tree. Also includes a tts-cpp `<atomic>` include fix
that repairs the parallel speech-stack consumer's build under the
day-2 ggml-speech merge.

Build output is bit-identical to 0.7.2 (whisper-cpp 1.8.4.3#3 SOURCE
== 1.8.4.3#2 SOURCE+PATCHES, verified by hashing all
libggml-cpu-android_armv*_*.so files from the NDK r29 cross-compile).

Registry baseline bumped to 965f5e5a so the new port-version
(1.8.4.3#3) is reachable.

PRs in the cross-repo set:
  whisper.cpp #26 (Zbig9000:QVAC-18993-bundled-ggml-android-dynamic-backend)
  vcpkg #152 (Zbig9000:QVAC-vcpkg-speech-stack-android-dynamic-backend)

Co-authored-by: Cursor <cursoragent@cursor.com>

* transcription-whispercpp: bridge ggml dlopen backends as IMPORTED targets (QVAC-18993)

`bare-make generate` failed on android-arm64 with

    CMake Error: get_target_property() called with non-existent target
    "ggml::ggml-cpu-android_armv8.0_1"  (… 8 backends total)

after enabling `GGML_BACKEND_DL=ON` on the `whisper-cpp` port. With dynamic-
backend mode, ggml builds the per-arch CPU + GPU backends as standalone MODULE
libraries that ggml dlopens at runtime; upstream ggml's `install(TARGETS … EXPORT)`
deliberately skips them, so the consumer's `BACKEND_DL_LIBS` loop in
`CMakeLists.txt` referenced targets that don't exist.

Wrap the existing loop with an `if(NOT TARGET ggml::${_backend})` fallback that
locates the `.so` under `${VCPKG_INSTALLED_PATH}/bin` via `find_library` and
materialises a `SHARED IMPORTED` target locally with `IMPORTED_NO_SONAME=TRUE`,
then bundle via the existing `INSTALL TARGET` path. This mirrors the pattern
that already ships in `packages/diffusion-cpp` for the same Android-dlopen
build mode.

Static backends (any platform that links ggml in directly) still find their
imported target via ggml-config.cmake on the first branch, so non-Android
prebuilds are byte-identical.

Co-authored-by: Cursor <cursoragent@cursor.com>

* transcription-whispercpp: re-pin baseline to vcpkg PR #152 rebased HEAD 8c6ca188 (QVAC-18993)

tetherto/qvac-registry-vcpkg/main moved forward yesterday with #156
(parakeet-cpp 2026-05-20 + ggml-speech 2026-04-09#2 bumps), so vcpkg
PR #152 was rebased onto the new base 0e75457. Update the default-
registry baseline pointer from the old PR #152 HEAD (dffaaf6) to the
rebased HEAD (8c6ca188) so the version-resolver still finds
`ggml-speech 2026-05-19#3` (now layered on top of the just-landed
2026-04-09#2) and `whisper-cpp 1.8.4.3#3` (unchanged content,
correct SHA512).

No other changes --- the resolver picks up the same final versions
of every package as before, just with the rebased baseline as the
search root.

Co-authored-by: Cursor <cursoragent@cursor.com>

* transcription-whispercpp: consume whisper-cpp 1.8.4.3#4 + ggml-speech 2026-05-19#4 (QVAC-18993, QVAC-18992)

Picks up the MSVC `/I` fix in the spirv-headers include-shim (vcpkg
PR #152 commit 5cd209c) so prebuild / win32-x64 stops dying with
`c1xx: fatal error C1083: Cannot open source file: '.../x64-windows/include'`
on the `whisper-cpp[vulkan]` configure step. The shim now emits the
MSVC-style `/I<path>` on Windows and keeps `-isystem <path>` (with
warning suppression) on GCC/Clang elsewhere.

whisper-cpp override bumped 1.8.4.3#3 -> 1.8.4.3#4.
Default-registry baseline bumped 8c6ca188 -> 5cd209c1.

Co-authored-by: Cursor <cursoragent@cursor.com>

* transcription-whispercpp: wire ENABLE_OPENCL so Android prebuilds ship libggml-opencl.so (QVAC-18300)

The `opencl` feature was declared in `packages/transcription-whispercpp/vcpkg.json`
(gated to `platform: android`) and the `whisper-cpp` port's `opencl` feature
correctly enables `-DGGML_OPENCL=ON` on Android — but the consumer's
`CMakeLists.txt` only appended `"tests"` and `"vulkan"` to
`VCPKG_MANIFEST_FEATURES`. The `opencl` feature was therefore never activated,
so vcpkg resolved `whisper-cpp` without `[opencl]`, ggml was built without
`GGML_OPENCL=ON`, and the `android-arm64` prebuild silently shipped CPU + Vulkan
backends only (no `libggml-opencl.so`) — defeating the entire point of
QVAC-18300.

Add an `ENABLE_OPENCL` option (default `ON` on Android, `OFF` elsewhere — the
`vcpkg.json` feature is `platform: android` gated so non-Android is a no-op
anyway) that appends `"opencl"` to `VCPKG_MANIFEST_FEATURES`. Mirrors the
`SD_OPENCL` pattern in `packages/diffusion-cpp/CMakeLists.txt` and keeps the
GPU-feature wiring uniform across the three GPU backends (Metal auto, Vulkan
toggle, OpenCL toggle).

After this commit, the `android-arm64` prebuild's
`qvac__transcription-whispercpp/` directory should ship `libggml-opencl.so`
alongside the existing 7 per-microarch CPU variants and `libggml-vulkan.so`.

Co-authored-by: Cursor <cursoragent@cursor.com>

* transcription-whispercpp: default ENABLE_OPENCL ON unconditionally (QVAC-18300)

Previous commit (6b42bc0) wired ENABLE_OPENCL but gated it on
`_qvac_whispercpp_target_os STREQUAL "Android"`, mirroring the existing
ENABLE_VULKAN block. CI re-run (26172345624) exposed that the gate is broken:
at top-level CMakeLists.txt time, `CMAKE_SYSTEM_NAME` is not yet set --- the
bare-make Android toolchain file is loaded by `project()` (which runs *after*
the option block), so `_qvac_whispercpp_target_os` falls through to the host
OS ("Linux") and ENABLE_OPENCL stayed OFF on the android-arm64 prebuild.

Evidence from run 26172345624's android-arm64 build log:
  `Installing 9/9 whisper-cpp[core,vulkan]:arm64-android@1.8.4.3#4...`
                                ^^^^^^^^ no `[opencl]`

ENABLE_VULKAN works only by coincidence: Vulkan is also default-ON on the
Linux host detection branch, so the wrong target detection produces the right
behaviour. For Android-only features there is no such overlap.

Fix: default ENABLE_OPENCL ON unconditionally and let the actual platform
gating happen where it can: (1) the `platform: android` clause on the
`whisper-cpp[opencl]` dep in `vcpkg.json`, and (2) the `VCPKG_TARGET_IS_ANDROID`
check in the `whisper-cpp` portfile that gates `-DGGML_OPENCL=ON`. Adding
`"opencl"` to `VCPKG_MANIFEST_FEATURES` on non-Android is a guaranteed no-op
because the feature's only dep is platform-gated --- mirrors the layered
gating that `whisper-cpp[vulkan]` already uses (the `vulkan` feature's deps
are `!osx & !ios` gated and the portfile's `-DGGML_VULKAN=ON` is also
target-gated).

After this commit, the android-arm64 install plan should resolve as
`whisper-cpp[core,vulkan,opencl]` and the prebuild tarball should contain
`libggml-opencl.so` alongside the 7 per-microarch CPU `.so`s and
`libggml-vulkan.so`.

Co-authored-by: Cursor <cursoragent@cursor.com>

* transcription-whispercpp: call ggml_backend_load_all_from_path before whisper_init (QVAC-18993)

Android mobile-test E2E crashed inside whisper_init_from_file_with_params
with SIGABRT on PR #2124 / run 26173084690 (both Pixel 9 Pro + Samsung S25
Ultra, 132 ms after the `Downloaded model: ggml-tiny.bin` log line). Stack:

  abort → ggml_abort+228 → ggml_backend_dev_backend_reg+48
       → whisper_init_with_params_no_state+480
       → whisper_init_from_file_with_params_no_state+212
       → whisper_init_from_file_with_params+48
       → WhisperModel::load()+460

Root cause: the addon never called ggml_backend_load_all*(). With the
QVAC-18993 GGML_BACKEND_DL=ON build, the bundled ggml-base no longer
defines GGML_USE_CPU, so the static ggml_backend_registry ctor registers
zero backends. whisper.cpp's first ggml_backend_init_by_type(CPU) returns
NULL → ggml_backend_dev_backend_reg(NULL) trips GGML_ASSERT(device).

This is the same crash signature on both the pre-OpenCL run 26170576156
and the post-OpenCL run 26173084690, so it is independent of the recent
OpenCL enablement. The mobile workflow last passed on
tmp-whisper-184-3-validation back on 2026-05-11, which predates
GGML_BACKEND_DL=ON.

Mirror the pattern used by every other ggml-based addon in the monorepo
(packages/{diffusion-cpp,llm-llamacpp,classification-ggml,…}):

* CMakeLists.txt — emit BACKENDS_SUBDIR (<bare_target>/<module_name>)
  compile def via bare_target / bare_module_target.
* WhisperConfig — add backendsDir field (sibling of the handler-driven
  maps so it bypasses WHISPER_CONTEXT_HANDLERS.at()).
* JSAdapter — read top-level backendsDir string directly from
  configurationParams into config.backendsDir.
* WhisperModel::load — on __ANDROID__, std::call_once →
  ggml_backend_load_all_from_path(backendsDir/BACKENDS_SUBDIR) before
  whisper_init.
* index.js — require('bare-path'); pass
  backendsDir: path.join(__dirname, 'prebuilds') in _load + reload.

No diff on non-Android (Linux/macOS/Windows/iOS): ggml's static ctor
keeps registering CPU there as before.

aiDocs/15-android-mobile-test-crash-fix.md has the full investigation
(crash extraction, layered root-cause, why every other ggml addon
already does this, follow-ups).

Co-authored-by: Cursor <cursoragent@cursor.com>

* transcription-whispercpp: re-pin vcpkg baseline to cleaned PR #152 head (QVAC-18993)

PR #152 (qvac-registry-vcpkg) was rebased today to drop the ggml-speech
port bump (b4cf7b2) and the matching ggml-speech-side MSVC shim. Only
the whisper-cpp bump + whisper-cpp portfile MSVC `/I` fix remain. The
consumer-side migration to ggml-speech (QVAC-18992 / PR #13) stays open
on the speech branch but is no longer a prerequisite for this Android
dynamic-backend rollout.

New PR #152 HEAD: 9f4e8e20072d8a7a1e118a49c36aacf6af6b3e0d
Old (pre-cleanup): 5cd209c145a1d61636f1d44b4afe37868c298a8c

This addon does not depend on `ggml-speech` (it consumes the bundled
ggml inside `whisper-cpp`), so the dependency closure is unchanged.
Updated CHANGELOG to record the new baseline + the reason ggml-speech
got dropped.

Co-authored-by: Cursor <cursoragent@cursor.com>

* transcription-whispercpp: fix cpp-lint failures (clang-format + clang-tidy)

The prior CI run skipped cpp-lint entirely because the recent PR
commits only touched CMakeLists.txt / CHANGELOG.md. The new
ea298cf commit (QVAC-18993 mobile-test fix) added the first C++
diff in this branch, so cpp-lint now runs full clang-format
+ clang-tidy and surfaces three issues:

1. clang-format: JSAdapter.cpp had a one-line declaration broken
   across two lines (LLVM PointerAlignment=Left + AlignAfterOpen
   collapsed it). Reformatted in place.

2. clang-tidy [readability-identifier-naming]:
   WhisperHandlers.hpp:9 -- local `const int LANG_ID` violates the
   variable case style. Renamed to `langId` (lowerCamelCase, matches
   `checkLanguage` two lines above). Latent issue; never reported
   before because cpp-lint was a no-op on every prior PR commit.

3. clang-tidy [readability-identifier-naming]:
   WhisperModel.hpp:100 -- unused `set_weights_for_file(span, bool)`
   stub kept for parity with `transcription-parakeet` (which uses
   snake_case extensively for this exact API). Renaming would
   diverge from the parakeet pattern, so suppress with a single
   NOLINTNEXTLINE rather than touching the API surface.

Local repro: `cp packages/lint-cpp/.clang-format
packages/transcription-whispercpp/.clang-format` then
`git-clang-format --diff $(git merge-base HEAD origin/main) --
packages/transcription-whispercpp` reports `did not modify any
files`. The .clang-format copy is normally produced by
`packages/transcription-whispercpp/CMakeLists.txt:58
(configure_file COPYONLY)` during CMake configure.

[QVAC-18993][QVAC-19071]

Co-authored-by: Cursor <cursoragent@cursor.com>

* transcription-whispercpp: reference QVAC-19071 in CHANGELOG

QVAC-19071 ([Whisper] Update qvac-registry-vcpkg and addon with new
port versions) is the meta task that bundles the registry-side port
bump (qvac-registry-vcpkg PR #152: whisper-cpp 1.8.4.3#4) with the
consumer-side addon bump (qvac PR #2124: transcription-whispercpp
0.7.3, baseline re-pin). No code changes; the work itself was
already covered by PR #152 + this PR. Adds the cross-reference so
the Asana ticket can be closed off this release cycle.

The QVAC-18992 ggml-speech migration (PR #13 + ggml-speech port
bump) stays deferred per the 2026-05-21 plan; it will land as a
follow-up port bump under the same QVAC-19071 umbrella.

[QVAC-19071]

Co-authored-by: Cursor <cursoragent@cursor.com>

* transcription-whispercpp: re-pin baseline to consume whisper-cpp 1.8.4.3#5 (REF flipped to tetherto/master)

[whisper-cpp PR #28](tetherto/qvac-ext-lib-whisper.cpp#28)
(QVAC-18993 bundled-ggml --- Android dynamic backend + per-arch CPU
dlopen fallback) was merged today (2026-05-21, merge commit
`f3102199` on `tetherto/qvac-ext-lib-whisper.cpp/master`). With it
merged, `tetherto/master` now carries every commit the registry's
`whisper-cpp` port previously pulled from the temporary
`Zbig9000/qvac-ext-lib-whisper.cpp@14620c8857` branch:

  - PR #25 (QVAC-18991, upstream whisper.cpp sync) --- merged 2026-05-20
  - PR #27 (QVAC-18966, tts-cpp chatterbox <atomic> fix) --- merged 2026-05-20
  - PR #28 (QVAC-18993, ggml-backend android dynamic backend) --- merged 2026-05-21

[qvac-registry-vcpkg PR #152](tetherto/qvac-registry-vcpkg#152)
HEAD (`f2870372`) bumps `whisper-cpp` to port-version `1.8.4.3#5`
with the REF repoint --- byte-identical source tarball outside
`parakeet-cpp/` and `tts-cpp/` (separate vcpkg ports). This commit
just re-pins the consumer-side baseline so the addon resolves
against the new port-version.

  vcpkg-configuration.json default-registry baseline:
    9f4e8e20072d8a7a1e118a49c36aacf6af6b3e0d   (MSVC fix only, whisper-cpp 1.8.4.3#4)
      -> f2870372965e899ae1f8a221154d2b243a6c3d30  (+ whisper-cpp 1.8.4.3#5 REF repoint)

No code change in this monorepo --- pure baseline re-pin. CHANGELOG
updated to record both the new baseline and the (now superseded)
intermediate `9f4e8e2` pin.

Closes the consumer-side half of [QVAC-19071](https://tetherapp.atlassian.net/browse/QVAC-19071)
("Update qvac-registry-vcpkg and addon with new port versions").
Registry-side half = vcpkg PR #152 commit `f287037`.

[QVAC-18993][QVAC-19071]

Co-authored-by: Cursor <cursoragent@cursor.com>

* transcription-whispercpp: re-pin baseline to whisper-cpp 1.8.4.3#0 (PR #152 review fixes)

@GustavoA1604 review on [qvac-registry-vcpkg PR #152](tetherto/qvac-registry-vcpkg#152)
requested three changes on the registry side:

  1. Drop the explanatory comment block at top of
     `ports/whisper-cpp/portfile.cmake`.
  2. Reset `port-version` 5 -> 0 (treat the tetherto REF repoint as
     a fresh start, not a continuation of the Zbig9000-branch series).
  3. Collapse the three historical `1.8.4.3` entries
     (`port-version` 3, 4, 5 -- never consumed off-fork) in
     `versions/w-/whisper-cpp.json` into a single `port-version: 0`
     entry with the new git-tree.

All three landed in PR #152 commit `ee71ecb`. This commit is the
consumer-side mirror:

  vcpkg-configuration.json default-registry baseline:
    f2870372965e899ae1f8a221154d2b243a6c3d30  (1.8.4.3#5, pre-review)
      -> ee71ecb5b286224377313e5a50558d11adbef3ac  (1.8.4.3#0, post-review)

  CHANGELOG entry updated:
    "1.8.4.3#5" -> "1.8.4.3#0" + note about port-version reset and
    history collapse + supersession line covers both prior pins
    (`9f4e8e2` MSVC fix, `f287037` 1.8.4.3#5).

No code change in this monorepo --- pure baseline re-pin. The
underlying whisper.cpp source bytes are unchanged (REPO + REF +
SHA512 in the portfile are identical between `1.8.4.3#5` and
`1.8.4.3#0`), so the produced binary is bit-for-bit equivalent.

[QVAC-18993][QVAC-19071]

Co-authored-by: Cursor <cursoragent@cursor.com>

* transcription-whispercpp: 0.8.0 — address PR review

Collapses the 0.7.1/0.7.2/0.7.3 work into a single 0.8.0 release and
folds in Gustavo's PR #2124 review feedback:

- Bump version to 0.8.0; collapse CHANGELOG into a single 0.8.0 entry
- Bump whisper-cpp override to 1.8.4.3#0 (matches PR #152 collapse)
- Repoint default-registry to tetherto/qvac-registry-vcpkg @ a9d7e924
  (PR #152 merge commit on tetherto/main)
- vcpkg.json: model GPU features on transcription-parakeet's pattern —
  platform-gated whisper-cpp deps select [opencl,vulkan] on android,
  [vulkan] on linux/windows, and no GPU feature on apple. Drop the
  addon-side opencl/vulkan feature sections; CMake no longer carries
  ENABLE_OPENCL / ENABLE_VULKAN option indirection.
- index.js: nest backendsDir under whisperConfig (mirrors parakeet's
  parakeetConfig.backendsDir). Strip it from the wire-format whisperConfig
  map and surface it as top-level configurationParams.backendsDir before
  handing the config to the addon. Fix the stale _createAddon JSDoc that
  still described "LLM-specific settings".
- index.d.ts + README.md: document whisperConfig.backendsDir; drop the
  ENABLE_VULKAN build instructions (now controlled by vcpkg.json).
- Compact all the addon-side comments (CMakeLists.txt, JSAdapter.cpp,
  WhisperConfig.hpp, WhisperModel.cpp); drop every QVAC asana ticket
  reference; standardise the C++ log wording on
  "configurationParams.backendsDir".
- Drop "-D ENABLE_VULKAN=OFF" from the test:cpp:build / coverage:cpp:build
  npm scripts (no-op now that the option is gone).
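
A sketch of the backendsDir handling from the index.js bullet above, under stated assumptions: `splitWhisperConfig` is a hypothetical helper name, and the default directory is passed in rather than computed so the snippet stays runtime-agnostic.

```javascript
// Hypothetical helper: pull backendsDir out of the user-facing whisperConfig
// so it never reaches the wire-format map, and surface it at the top level
// of the configuration params handed to the addon.
function splitWhisperConfig (whisperConfig, defaultBackendsDir) {
  const { backendsDir, ...wireWhisperConfig } = whisperConfig || {}
  return {
    whisperConfig: wireWhisperConfig,
    backendsDir: backendsDir ?? defaultBackendsDir
  }
}
```

In index.js the default would presumably be the `path.join(__dirname, 'prebuilds')` value introduced by the earlier mobile-test fix.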

Co-authored-by: Cursor <cursoragent@cursor.com>

* transcription-whispercpp: 0.9.0 -> 0.8.0 (fold into single release)

Reverts the 0.8.0 -> 0.9.0 bump from the merge commit: per request, this
PR's release notes are folded into the existing 0.8.0 entry rather than
shipping as a separate semver step. Order: Added -> Changed -> Fixed
(from this PR) -> Removed (the OutputCallbackJs revert that landed on
main as 0.8.0 via #2133).

package.json bumped back to 0.8.0.

Co-authored-by: Cursor <cursoragent@cursor.com>

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: GustavoA1604 <54457676+GustavoA1604@users.noreply.github.com>
Proletter pushed a commit that referenced this pull request May 24, 2026
* Try #1. Adding tokenizer proxy to provide vocab size.

* Try #2. More fixes and logs.

* Try #3. Limit device to only cpu or gpu.

* Revert "Try #2. More fixes and logs."

This reverts commit a461e69.

* Revert "Try #1. Adding tokenizer proxy to provide vocab size."

This reverts commit 9951195.

* Fixing pipeline logging

* Add more logs

* Fixing bench logging

* Add more error handling and logging

* Improve error handling on the server. Added retry in case of context overflow.

* Make retries self-adjustable

* Adding some more checks and limiting the datasets temporarily

* Test: trying to narrow down the error

* Exclude failing datasets from embed benchmark

* Clean up the code

* Changing bench model for LLM

* Minor fixes for clarity

* Removing unused vars

* Removing unused imports

* Removing unused python deps

---------

Co-authored-by: gianni <gianfranco.cordella@tether.io>
Proletter pushed a commit that referenced this pull request May 24, 2026
…ape (review #1727)

Addresses two `[BUG]` review comments from @olyasir on #1727
about the hardcoded `kNumClasses = 3` not being validated against either
the loaded GGUF's `mobilenet.num_classes` metadata or the actual element
count of the constructed output tensor. Both are downstream-safety
problems for the per-inference path:

  float logits[graph::kNumClasses] = {0.0F};
  ggml_backend_tensor_get(impl_->compute.output, logits, 0, sizeof(logits));

`sizeof(logits)` is fixed at compile time. With a mismatched GGUF, this
either reads OOB (numClasses < kNumClasses) or silently truncates
(numClasses > kNumClasses); on the FC-weight-upload side the
`classifier.3.weight = [1024, kNumClasses]` shape would also fail to
match the GGUF tensor and corrupt the classifier.

Changes:

1. addon/src/model-interface/MobileNetGraph.cpp -- graph::loadWeights()

   Right after reading `numClasses` from `mobilenet.num_classes`,
   compare against `kNumClasses` and `throw StatusError(InvalidArgument, ...)`
   with a descriptive message (actual vs expected count, plus a hint to
   rebuild the addon or use a matching GGUF). This is the primary fix
   olyasir requested in `MobileNetGraph.cpp`.

   The error path is reachable from `ClassificationModel::load()`'s call
   to `graph::loadWeights(...)`, which already runs inside the JS-side
   `await classifier.load()` Promise; the `StatusError(InvalidArgument)`
   propagates as a structured rejection on the JS side, matching how
   every other config-time validation error in this addon surfaces.

2. addon/src/model-interface/MobileNetGraph.cpp -- graph::buildGraph()

   At the end of the graph build, before we hand the
   `ComputeGraph::output` tensor over to the backend allocator, assert
   `ggml_nelements(cg.output) == kNumClasses` and `raise(...)` (which
   throws `StatusError(InternalError, ...)`) if the invariant is
   violated. This is the defence-in-depth fix olyasir requested in the
   second `[BUG]` comment in `ClassificationModel.cpp`: it makes the
   12-byte stack-array `ggml_backend_tensor_get` read provably safe
   regardless of how the output tensor was constructed.

   This second check is not redundant with #1: it also catches a future
   accidental edit to the classifier wiring above (where the tail
   `classifier.3` linear is what determines the output element count),
   an upstream ggml change to how `mul_mat` shapes its result, or a
   GGUF that lacks the `mobilenet.num_classes` metadata key entirely
   and falls back to `kNumClasses` but ships mismatched FC weights.

Local validation on win32-x64:

- 15/15 C++ unit tests pass (BnEpsilonGuard, classification graph
  determinism, preprocessor suite -- they all exercise the validated
  load + build paths against the bundled FP16 GGUF, where
  `num_classes == 3` so neither check fires).
- 14/14 JS integration tests pass, 140/140 asserts (no behaviour
  change for the supported model; new error paths are unreachable
  with the bundled weights).

Files: packages/qvac-lib-infer-ggml-classification/addon/src/model-interface/MobileNetGraph.cpp
Made-with: Cursor
Proletter pushed a commit that referenced this pull request May 24, 2026
CI run 25074595106 confirmed the two-phase test-side drain
(commit f6d1d5d) is sufficient for the upstream `OutputCallBackJs`
UAF on every platform: linux-x64/-arm64, darwin-arm64,
android-arm64, ios-arm64 all pass.

Only `win32-x64-integration-tests` still fails, and it does so for
a completely different upstream issue: the first
`js_create_double` call inside an `OutputCallBackJs` callback
returns 0.0 on win32-x64 (clang-cl + bare-runtime + V8) regardless
of the input. Subsequent calls in the same handle scope are
correct. The bug zeros out the highest-confidence value on every
classify() call, breaks the sort order, and trips
`meal_1.jpg "sorted desc [0]>=[1]"` (CI runs 24851301107,
24891210942, 24897445066, 24900278513, 25002820522, 25062157099,
25070800838, 25074595106).
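
To illustrate why a zeroed first confidence trips that assertion, here is a sketch of a descending-order check over `{label, confidence}` results. `isSortedDesc`, the labels and the numbers are made up for the example and are not copied from the integration test.

```javascript
// True when every confidence is <= the one before it.
function isSortedDesc (results) {
  return results.every((r, i) => i === 0 || results[i - 1].confidence >= r.confidence)
}

// Made-up 3-class result, already sorted by confidence.
const expected = [
  { label: 'class_a', confidence: 0.9 },
  { label: 'class_b', confidence: 0.07 },
  { label: 'class_c', confidence: 0.03 }
]

// Same result with the first marshalled double replaced by 0.0, which is
// the failure mode described above: index 0 is now smaller than index 1.
const corrupted = expected.map((r, i) => (i === 0 ? { ...r, confidence: 0 } : r))
```

A test that only checks `typeof confidence === 'number'` would accept `corrupted`, which matches the explanation of why other addons do not notice the bug.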

There is no test-side workaround for this one. Sleeps don't help
because it isn't a lifecycle race. Other addons accidentally dodge
it for the reasons enumerated in the comment block at the top of
`AddonJs.hpp` (first emitted number is naturally 0; tests assert
only typeof / !isNaN; first number never asserted on; or no
numbers emitted at all). Our 3-class triage assertions fall into
none of those categories, so the bug remains visible in CI.

Fix: restore the local C++ "burn one" workaround that was removed
in commit efbd683. A throwaway `js_create_double(env, 0.0,
&dummy)` call at the top of `JsClassifyOutputHandler`'s lambda
consumes the broken first slot; the per-element `Number::create`
calls that follow produce the correct value at index 0. The
throwaway value is never wired into the result array; cost is one
ephemeral js_number per classify() call.

The asymmetry between issues #1 (test-side sleep is enough) and
#2 (needs C++ workaround) is now documented at the top of
AddonJs.hpp -- including the CI runs that surfaced each, why the
test-side approach worked for one and not the other, and the
explicit rationale ("removed once upstream marshalling layer is
patched") for revisiting both.

Local validation on win32-x64:
- `bare-make build` clean.
- `npm run test:integration` 14/14 tests, 140/140 asserts (was
  failing on `meal_1.jpg sorted desc [0]>=[1]` before this).

Expected CI behaviour after this commit:

- Linux x64/arm64, Darwin arm64, Android arm64, iOS arm64 should
  keep passing (this commit doesn't touch their code paths).
- win32-x64 should now pass: the burn-one consumes the broken
  first slot and every per-element confidence marshals correctly.

File: packages/qvac-lib-infer-ggml-classification/addon/src/addon/AddonJs.hpp
Made-with: Cursor
Proletter pushed a commit that referenced this pull request May 24, 2026
…ightsProvider (#1494)

* chore[bc]: remove BaseInference inheritance and WeightsProvider from LLM addon

Replace class inheritance with composable utilities from @qvac/infer-base@0.4.0:
- createJobHandler() for single-job lifecycle management
- exclusiveRunQueue() for run serialization
- Direct shard streaming via bare-fs instead of WeightsProvider

Constructor now takes { files: { model: string[], projectionModel?: string }, config, logger, opts }
instead of { loader, diskPath, modelName, projectionModel } + config.

All finetune, media, and filtered logger functionality preserved.

* fix: correct FinetuneProgress and finetune terminal handling in output callback

FinetuneProgress must call updateStats(data.stats), not updateOutput(data).
Finetune terminal JobEnded must call ended(data) as result, not updateStats.

* fix: update all LLM examples and model-loading test to new constructor shape

Update 13 examples and sharded model test to use files: { model: [...] } pattern.
Remove FilesystemDL dependency from all examples and tests.

* fix: update sharded model test to download shards to disk first

The network loader test used the old loader-based constructor.
Rewritten to download shards via HttpDL to disk, then pass absolute paths.

* fix: update LLM benchmark tooling to new constructor shape

* fix: update LLM perf benchmark sweep and judge to new constructor shape

* docs: update LLM README, finetuning, and afriquegemma docs for new constructor

* fix: update LLM prepare-prompts and verify-prompts to new constructor

* fix: update LLM finetuning unit tests to new constructor and exclusiveRunQueue

* docs: update LLM architecture, data-flows, finetuning, README sharded contract

* docs: align LLM finetuning docs and mobile README with new constructor

* chore[bc]: address PR #1494 review findings and bump to 0.15.0

Bumps `@qvac/llm-llamacpp` to `0.15.0` per the addon-changelog
process — minor bump on a pre-1.0 package signals the breaking
constructor change to consumers using semver ranges. Adds the
matching `0.15.0` block to `CHANGELOG.md` documenting the new
single-object constructor with `files`, the removal of
`BaseInference` + `WeightsProvider`, the dropped `destroy()`
method, the dependency churn, and every behaviour change in this
release.

Hardens the JS layer based on the review:

- Constructor now throws a clear `TypeError` when `files` /
  `files.model` is missing or empty, instead of crashing with an
  opaque "cannot read properties of undefined" later.
- `_runInternal` now throws "Addon not initialized. Call load()
  first." when invoked before `load()`, matching `finetune()` and
  the diffusion addon.
- `_load()` wraps `_streamShards` + `addon.activate()` in a
  try/catch that best-effort-unloads the partially-initialized
  native instance and resets `this.addon = null` so a subsequent
  `load()` does not leak a zombie addon.
- `createJobHandler({ cancel })` closure uses optional chaining so
  a stale `response.cancel()` after `unload()` is a no-op rather
  than a `TypeError`.
- `unload()` sets `this.addon = null` after `addon.unload()`, so
  the new `if (!this.addon)` guard in `_runInternal` is also
  effective post-unload.
- `pause()` and `cancel()` re-add the defensive `?.cancel` check.
- The `_load()` primary-path selection now picks the first entry
  matching the shard regex, replacing the fragile `[length - 1]`
  index. This stays compatible with the documented sharded order
  (`tensors.txt` first, shards second) and with the non-sharded
  single-file path; an inline comment explains the contract.
- The `_handleAddonOutputEvent` error log line now passes the
  `Error` object directly so loggers can format the full stack.

Drops dead `_isSuppressedNoResponseLog` /
`_createFilteredLogger` / `_originalLogger` plumbing. Those
existed to swallow `'No response found for job'` warnings emitted
by the old `BaseInference._jobToResponse` Map; the new
`createJobHandler`-based architecture cannot emit that message,
so the filter, the wrapped logger, and the `_originalLogger`
indirection are all gone. The user-supplied logger is now used
directly.

Restores JSDoc on every `FinetuneOptions` field in `index.d.ts`,
including default values (`numberOfEpochs = 1`,
`learningRate = 1e-4`, `batchSize = 128`, …) so IDE tooltips show
them without needing to read `docs/finetuning.md`.

* refactor: move LLM C++ event normalization into addon.js

Per the team-2 task doc (`TD-ADDON-INTERFACE-LLM-EMBED-SD.md`,
LLM section): "Move event name normalization from `index.js`
`_addonOutputCallback` into `addon.js` `LlamaInterface` — the
native binding wrapper should own the mapping from raw C++ events
to Output / Error / JobEnded / FinetuneProgress."

Adds `mapAddonEvent(rawEvent, data, error, state)` as a free
export from `addon.js`, co-located with `LlamaInterface`. The
function normalizes the C++-mangled event vocabulary into one of
`Output` / `Error` / `JobEnded` / `FinetuneProgress`, including:

- TPS-shaped runtime stats → JobEnded with `backendDevice`
  mapped from `0/1` to `'cpu'/'gpu'`.
- Finetune terminal payloads (`{op:'finetune', status, stats?}`)
  → JobEnded carrying the finetune payload, and arms the skip
  flag so the trailing TPS stats from the finetune are not
  dispatched as a fresh inference terminal.
- `finetune_progress` payloads → FinetuneProgress.
- Anything else with an `Error`-flavored event name → Error.
- String payloads → Output.
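
The mapping rules above can be sketched roughly as follows. This is not the code in `addon.js`: the return shape (`{ type, data, error }`, or `null` for a dropped event), the numeric `TPS` field used to detect runtime stats, the `type: 'finetune_progress'` discriminator, and the `includes('Error')` test on the event name are all assumptions made for illustration.

```javascript
// Hypothetical sketch; payload shapes and the return value are assumptions.
function mapAddonEvent (rawEvent, data, error, state) {
  const isObject = data !== null && typeof data === 'object'

  // Finetune terminal payload: { op: 'finetune', status, stats? }.
  if (isObject && data.op === 'finetune') {
    state.skipNextRuntimeStats = true // trailing TPS stats belong to this job
    return { type: 'JobEnded', data, error: null }
  }

  // Finetune progress payload (assumed to carry a `type` discriminator).
  if (isObject && data.type === 'finetune_progress') {
    return { type: 'FinetuneProgress', data, error: null }
  }

  // TPS-shaped runtime stats (assumed to carry a numeric `TPS` field).
  if (isObject && typeof data.TPS === 'number') {
    if (state.skipNextRuntimeStats) {
      state.skipNextRuntimeStats = false
      return null // dropped: not a fresh inference terminal
    }
    const backendDevice = data.backendDevice === 1 ? 'gpu' : 'cpu'
    return { type: 'JobEnded', data: { ...data, backendDevice }, error: null }
  }

  if (String(rawEvent).includes('Error')) {
    return { type: 'Error', data, error }
  }

  if (typeof data === 'string') {
    return { type: 'Output', data, error: null }
  }

  return { type: rawEvent, data, error } // default fall-through
}

const eventState = { skipNextRuntimeStats: false }
const ended = mapAddonEvent('Stats', { TPS: 12.5, backendDevice: 1 }, null, eventState)
console.log(ended.type, ended.data.backendDevice) // JobEnded gpu
```

Keeping the skip flag on a caller-owned `state` object, rather than in module scope, is what lets each model instance (and each unit test) track its own finetune/inference ordering.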

`LlmLlamacpp._addonOutputCallback` becomes a thin shim that
imports `mapAddonEvent`, hands it the per-instance state object
(now `this._addonEventState = { skipNextRuntimeStats }` instead
of the bare `_skipNextRuntimeStats` field), and forwards the
mapped event to `_handleAddonOutputEvent`.

The stateful flag lives on the model so unit tests can still poke
at it via `model._addonEventState.skipNextRuntimeStats`. Updated all
9 references in `test/unit/finetuning.test.js`. All 31 unit
tests still pass; lint and dts checks are clean.

Also fixes the misleading JSDoc on `LlamaInterface.loadWeights`:
the native binding reads the JS property name `chunk` (verified
in `qvac-lib-inference-addon-cpp/JsBlobsStream.hpp::appendBlob`,
lines 41–42 and 66–67), not `contents`. The C++ local variable
is named `contents`, which is what the proposal text was
referencing — but the on-the-wire JS property name is `chunk`
and the JS layer call sites are correct.

* fix: address PR #1494 second-round review findings

1. `test/integration/http-loader.js` no longer extends
   `@qvac/dl-base`. The base class was only providing a `close()`
   shim around `_close()`, and the package's devDependencies no
   longer list `@qvac/dl-base` after the loader-removal refactor.
   The helper now stands on its own — `getStream()` and `close()`
   are the only methods the sharded model-loading test calls, so
   the rest of the BaseDL surface (including the unused
   `getFileSize` and `list`) is dropped. Removes the dangling
   require that would break a clean install of this package and
   block the sharded test in CI.

2. `examples/multiModal.js` no longer passes `content: imageFilePath`
   on the second `media` message. The native binding only accepts
   `Uint8Array` payloads on `media` messages — file paths were
   silently broken after the loader removal. The example now
   reuses the same `imageBuffer` for both inferences and uses a
   different prompt on the second one to keep the example
   pedagogically distinct.

3. `index.d.ts` `AddonMessage` now exposes the optional
   `generationParams?: GenerationParams` field. The runtime path
   in `LlmLlamacpp._runInternal` already serializes this field
   onto every text message it forwards through `addon.runJob`,
   but the published transport type omitted it — IDE consumers
   building their own message-shaped payloads would lose the
   per-call overrides. The field documents that it is forwarded
   from `RunOptions.generationParams` and is the canonical way
   to vary sampling per request without re-loading the model.

* fix: extract pickPrimaryGgufPath, restore multiModal example, fix docs

- Extract shard-picker logic into named pickPrimaryGgufPath() with unit
  tests documenting the contract (tensors.txt-first ordering, single-file
  fallback). Move SHARD_REGEX inside the function.
- Revert multiModal.js to original: first inference uses Uint8Array,
  second uses string path. Both C++ code paths work. Remove false comment
  claiming file paths are not supported.
- Restore stripped JSDoc on FinetuneValidationSplit.fraction and
  FinetuneValidationDataset.path in index.d.ts.
- Fix docs/architecture.md and docs/data-flows-detailed.md: 4 occurrences
  incorrectly said "last" shard is the primary path; actual code picks
  the first shard regex match.
- Hardcode shard filenames in model-loading integration test instead of
  generating them via regex.
- Add network streaming capability loss note to CHANGELOG.
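
A minimal sketch of the "first shard regex match, single-file fallback" contract described above. The shard pattern is an assumption modelled on the common `name-00001-of-00005.gguf` naming, and the fallback to the last (only) entry is likewise assumed; the real `pickPrimaryGgufPath()` may differ in both.

```javascript
// Hypothetical sketch of the picker; regex and fallback are assumptions.
function pickPrimaryGgufPath (files) {
  // Kept inside the function, as the commit describes for SHARD_REGEX.
  const SHARD_REGEX = /-\d{5}-of-\d{5}\.gguf$/
  const firstShard = files.find((file) => SHARD_REGEX.test(file))
  // Assumed fallback for non-sharded models: the only (last) entry.
  return firstShard ?? files[files.length - 1]
}

// Documented sharded order: tensors.txt first, shards second.
const sharded = [
  '/models/demo.tensors.txt',
  '/models/demo-00001-of-00002.gguf',
  '/models/demo-00002-of-00002.gguf'
]

console.log(pickPrimaryGgufPath(sharded)) // /models/demo-00001-of-00002.gguf
console.log(pickPrimaryGgufPath(['/models/demo.gguf'])) // /models/demo.gguf
```

Matching on the file name instead of indexing with `[length - 1]` means the result no longer depends on how many shards follow `tensors.txt`.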

* fix: correct version in architecture.md and remove stale dl-filesystem benchmark dep

- docs/architecture.md header: v0.14.3 → v0.15.0 to match package.json
- benchmarks/performance/package.json: remove @qvac/dl-filesystem (no
  longer used after FilesystemDL references were removed from all
  benchmark JS files)

* fix: align _hasActiveResponse clearing with embed pattern

Remove the synchronous clear in _handleAddonOutputEvent on JobEnded/Error.
The .finally() on response.await() already clears the flag when the response
promise settles, and exclusiveRunQueue serializes _runInternal so the next
call cannot race the current one. Matches the embed addon's pattern, where
.finally() is the sole clear path outside of unload().

* fix: throw on second load(), log rejected responses, add mapAddonEvent unit test

- load(): throw if already loaded. Caller must unload() first. Aligns
  with the team consensus (Yury/Gianfranco/Gustavo) — silent reload
  masks caller bugs. unload() already clears configLoaded.
- _runInternal / finetune: replace silent `finalized.catch(() => {})`
  with a warn-level log so rejected responses are not swallowed when
  the caller does not await.
- test/unit/map-addon-event.test.js: new unit test covering TPS stats
  mapping + backendDevice translation, skipNextRuntimeStats dropping,
  finetune terminal + skip-flag arming, finetune_progress, Error event,
  string-as-token Output, and default fall-through.
- CHANGELOG 0.15.0: document the load() throw.

* fix: restore JSDoc on run() that was dropped during BaseInference removal

The JSDoc documenting run()'s prompt and runOptions parameters was
accidentally removed during the BaseInference removal refactor when
run() was split into run() + _runInternal(). Restore it on the public
run() method, and reference the full RunOptions type (which already
documents prefill / generationParams / cacheKey / saveCacheToDisk in
index.d.ts) so the docs stay authoritative in one place.

* fix: migrate afriquegemma-edge-cases test to new addon constructor

The afriquegemma-edge-cases.test.js file came in via the upstream/main
merge but still used the pre-refactor constructor shape:
  new LlmLlamacpp({ loader, modelName, diskPath, ... }, config)
with a FilesystemDL loader. All 7 tests in the file are now migrated to:
  new LlmLlamacpp({ files: { model: [path.join(dirPath, modelName)] },
                    config, logger, opts })
Removed FilesystemDL import and all loader.close() calls. Added
isMobile skip flag matching the pattern in afriquegemma-translation.

Caught by the qvac-staff-code-reviewer agent as a "merge brought in a
new consumer of the old API" — restore-the-class issue across the family.

* fix: make load() idempotent when already loaded

Second load() on an already-loaded instance returns immediately instead
of throwing. Matches the ReadyResource pattern used elsewhere in QVAC:
open/load is idempotent; explicit unload() is required to swap weights.

CHANGELOG updated.

* test: regenerate mobile integration auto.cjs

Integration test files were touched during the refactor and the
generated mobile harness was not regenerated. `npm run test:mobile:generate`
output committed so `validate-mobile-tests.js` passes.

* doc: document missing breaking changes from BaseInference removal

Address feedback to report all breaking changes from the BaseInference
refactor, not just the constructor shape:

- getState() narrows from {configLoaded, weightsLoaded, destroyed}
  to {configLoaded} only
- LlmLlamacpp public methods removed: downloadWeights, unpause, stop,
  status, destroy, getApiDefinition (destroy was already mentioned;
  other five were missing)
- load() takes no arguments (was (closeLoader, onDownloadProgress))
- Type exports removed from index.d.ts: ReportProgressCallback,
  Loader, DownloadWeightsOptions, DownloadResult

Also fix the stale (0.15.0) version marker in the AFTER code block.

* fix: address lifecycle, validation, and CI-surface review findings

- load() now runs through `this._run()` so concurrent calls on the same
  instance serialize instead of racing past the `configLoaded` guard.
  Two overlapping loads could previously both allocate a native addon
  and clobber `this.addon`, leaking one native handle.
- Constructor now validates each `files.model` entry with
  `path.isAbsolute()` and applies the same check to the optional
  `files.projectionModel` (which previously had no validation at all).
  Relative paths are rejected at construction time instead of bubbling
  up from bare-fs / native load.
- `pickPrimaryGgufPath` is now declared in `index.d.ts` so the TS
  surface matches the CommonJS export at `index.js`.
- Add `test:unit` and `test:unit:generate` scripts that run the JS
  unit tests under `test/unit/*.test.js` via brittle + bare. Wire
  `test:unit` into `test:all` and into the PR workflow's ts-checks
  job so `map-addon-event.test.js`, `pick-primary-gguf-path.test.js`,
  and the pre-existing `finetuning.test.js` all run on every PR.
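
The load() serialization in the first bullet can be illustrated with a small promise-chain queue. `createRunQueue` is a hypothetical stand-in, not the `exclusiveRunQueue()` implementation from `@qvac/infer-base`, and `QueuedModel` only simulates native allocation with a counter.

```javascript
// Hypothetical stand-in for a run queue: tasks execute one at a time, in call order.
function createRunQueue () {
  let tail = Promise.resolve()
  return function run (task) {
    const result = tail.then(() => task())
    tail = result.catch(() => {}) // keep the chain alive after a failed task
    return result
  }
}

class QueuedModel {
  constructor () {
    this._run = createRunQueue()
    this.configLoaded = false
    this.nativeInstances = 0 // counts simulated native allocations
  }

  load () {
    return this._run(async () => {
      if (this.configLoaded) return // already loaded: nothing to do
      await new Promise((resolve) => setTimeout(resolve, 10)) // simulated async work
      this.nativeInstances++
      this.configLoaded = true
    })
  }
}

async function demo () {
  const model = new QueuedModel()
  await Promise.all([model.load(), model.load()]) // two overlapping calls
  return model.nativeInstances
}

demo().then((count) => console.log('native instances created:', count)) // 1
```

Without the queue, both calls would read `configLoaded === false` before either finished, and each would allocate its own instance.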

* doc: add CHANGELOG entries for load() serialization and absolute-path validation

* fix[ci]: run test:unit via run-lint-and-unit-tests action

Replace my hand-rolled test:unit step (which invoked `bare` in a job
that never installs it) with the existing run-lint-and-unit-tests
external action. Same pattern qvac-lib-infer-onnx and ocr-onnx already
use. The action installs bare globally and runs
`npm run test:unit --if-present`.

Also chain test:unit into the `test` script for local dev convenience,
matching the standalone-repo precedent (qvac-lib-inference-addon-base,
qvac-lib-dl-filesystem, etc.).

* doc: fix mermaid parsing errors in architecture.md and finetuning.md

architecture.md:159 — mermaid classDiagram uses { } as class-body
delimiters; the inline destructured-object syntax in the constructor
signature broke parsing. Replace with the canonical named type
`LlmLlamacppArgs` from index.d.ts so the class diagram renders.

finetuning.md:251 — sequence-diagram message contained `(_run)` and
`_hasActiveResponse` where the leading underscore was being
interpreted as mermaid italic-open, and slashes in
`validationSplit/useEvalDatasetForValidation/evalDatasetPath` made
the message ambiguous. Reword to use prose-style commas and drop the
leading-underscore identifiers.

Reported by maxim-smotrov.

* chore[ci]: rename step to reflect what the action actually runs

The run-lint-and-unit-tests action runs `npm run lint` and
`npm run test:unit` (and installs bare in between). The step name
"Run JavaScript tests" hides the lint half. Rename to
"Run lint and unit tests" and update the step id accordingly.

* fix: readme, finetune lifecycle, multimodal type

README quickstart, sharded, and OCR examples now use `path.resolve('./models')`
so the resulting `files.model` entries and `files.projectionModel` are
absolute. The refactored constructor rejects relative paths, which meant
the README snippets threw `TypeError` when copied verbatim.
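
A minimal sketch of the kind of check that rejects relative paths, and of how `path.resolve` avoids it. `validateFiles`, its error messages, and the `model.gguf` file name are hypothetical; Node's `path` module is used here as a stand-in for whatever path module the addon loads under Bare.

```javascript
const path = require('path')

// Hypothetical validator mirroring the checks described above.
function validateFiles (files) {
  if (!files || !Array.isArray(files.model) || files.model.length === 0) {
    throw new TypeError('files.model must be a non-empty array of paths')
  }
  for (const entry of files.model) {
    if (typeof entry !== 'string' || !path.isAbsolute(entry)) {
      throw new TypeError(`files.model entry must be an absolute path: ${entry}`)
    }
  }
  const projection = files.projectionModel
  if (projection !== undefined && !path.isAbsolute(projection)) {
    throw new TypeError(`files.projectionModel must be an absolute path: ${projection}`)
  }
  return files
}

// Resolving the directory first makes every derived entry absolute.
const modelDir = path.resolve('./models')
validateFiles({ model: [path.join(modelDir, 'model.gguf')] }) // passes

try {
  validateFiles({ model: ['./models/model.gguf'] }) // relative: rejected
} catch (err) {
  console.log(err instanceof TypeError) // true
}
```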

`finetune()` moves the `!this.addon` readiness check and the
`_checkpointSaveDir` assignment inside the `this._run(...)` closure,
matching the pattern `run()` uses via `_runInternal`. If `unload()` is
already queued ahead of `finetune()`, the guard now runs after
`unload()` nulls `this.addon` instead of before, so the caller gets the
intended "Call load() first." error rather than a null-dereference
crash inside the queued body.

`UserMediaMessage.content` widens from `Uint8Array` to `Uint8Array | string`.
The C++ layer has always accepted both (raw bytes go through `parseMedia`;
string paths go through `loadMedia` in LlamaModel.cpp), and the OCR /
multimodal examples exercise the string-path form. The d.ts was
inadvertently narrower than the runtime contract.

* fix: preserve LogMsg event name in mapAddonEvent

Native `JsLogMsgOutputHandler` emits log events whose payload is a
plain string (`js::String::create(env, logMsg)`). The old mapping had
a generic `typeof rawData === 'string'` fallback that remapped every
string-payload event to `Output`, so any native LogMsg was quietly
pushed into the job output stream instead of the logger. The
`_handleAddonOutputEvent` branch that routes `LogMsg` to
`this.logger.info()` was therefore unreachable.

Check the `LogMsg` event name before the string-to-Output fallback so
log messages keep their type and reach the logger. Add a unit test
covering the precedence.
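
The precedence fix reduces to the order of two checks. In this hypothetical reduction, the function name `mapStringEvent` and the exact comparison `rawEvent === 'LogMsg'` are assumptions; only the ordering is taken from the description above.

```javascript
// Hypothetical reduction of the mapping to the two branches involved.
function mapStringEvent (rawEvent, rawData) {
  // Checked first, so string-payload log events keep their type.
  if (rawEvent === 'LogMsg') return { type: 'LogMsg', data: rawData }
  // Generic fallback: any other string payload is treated as a token.
  if (typeof rawData === 'string') return { type: 'Output', data: rawData }
  return { type: rawEvent, data: rawData }
}

console.log(mapStringEvent('LogMsg', 'model loaded').type) // LogMsg
console.log(mapStringEvent('Output', 'Hello').type) // Output
```

With the two `if` statements swapped, the first call would return `Output`, which is the bug described above.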

* doc: restore class JSDoc, method JSDoc, and media-separation comments

Restore documentation that the refactor dropped but whose content is
still accurate against the refactored code:

- Class-level JSDoc on LlmLlamacpp describing what the class does.
- Short JSDoc on pause(), cancel(), and unload() explaining each method's
  purpose, including how pause() saves a resumable checkpoint and how
  cancel() wipes it so the next finetune() starts fresh.
- Inline comments in _runInternal explaining the media/text separation:
  binary blobs go into promptMessages as type: 'media' entries in order,
  then the JSON text payload carries empty-content placeholders for each
  media item so tokenization can align.

* doc: shorten pickPrimaryGgufPath JSDoc in d.ts to a single line

Declaration-file JSDoc surfaces in IDE hover tooltips, so multi-paragraph
prose is noise. Trim to a one-liner covering the only behavior the type
hover needs to convey. The "exported for unit testing" rationale is
dropped since consumers do not need it on the type surface.

* doc: trim verbose comments added during the refactor

Tighten comments this PR introduced that drifted into over-explanation.
Leave pre-existing comments as-is.

- addon.js mapAddonEvent JSDoc: drop the multi-paragraph prose about
  C++ event naming and stateful ordering; keep the one-sentence
  contract plus the param block.
- index.js pickPrimaryGgufPath JSDoc: replace the multi-paragraph
  explanation of the caller's shard-list contract with a single-line
  summary citing the C++ regex contract.
- index.js class header on LlmLlamacpp: reduce to a single purpose line.
- index.js constructor block: shorten the lazy-deref rationale and the
  _addonEventState comment to one line each.
- index.js _addonOutputCallback: reduce the three-line comment
  pointing at addon.js to a single line. The detailed rationale is
  already in addon.js mapAddonEvent JSDoc.
- index.js media-separation comment: restore the one-line wording that
  already existed on main; earlier revision expanded it into three
  lines unnecessarily.

* doc: drop narration comment on _addonOutputCallback

The comment said "Event-name normalization lives in addon.js
(mapAddonEvent)", but the very next line imports and calls
mapAddonEvent — the code already tells the reader where event mapping
lives. Remove the line so the code speaks for itself.

* doc: restore FinetuneOptions JSDoc to pre-refactor forms

The refactor commit unintentionally rephrased FinetuneOptions JSDoc
lines that the refactor itself did not change. Revert those fields back
to main's original wording so the diff only carries structural changes
tied to the interface migration.

* doc: restore pre-refactor load/createAddon logs and JSDoc

The refactor commit silently dropped the _load() progress logs ('Creating
addon with configuration', 'Activating addon'), the 'Error during model
load' error log, and the JSDoc block on _createAddon(). Put them back so
the refactor only changes what needs to change.

* chore: drop unused 'test' script, inline into 'test:all'

The 'test' alias was only consumed by 'test:all', and neither was
referenced in CI workflows or the README. 'test:all' ran test:unit
twice because it called both test:unit and the 'test' alias. Remove
'test' and rewrite 'test:all' to run test:unit, test:integration, and
test:cpp directly.

* doc: correct pre-refactor constructor marker to <= 0.15.x

0.15.x still used the old (args, config) constructor shape; the old
example applies to any 0.15.x caller, not just 0.14.x. Align the
CHANGELOG marker with the PR body.

* test: run AfriqueGemma tests on mobile, matching main

The backmerge of upstream/main carried a stale 'skip: isMobile' from
the pre-refactor translation test into the six new translation tests
and the edge-cases migration. Main's c1cc8c0 deliberately dropped
the mobile skip; restore that intent. The isMobile constant is
unused after this change and has been removed.

* doc, test: fix _createAddon JSDoc and cover string-path media content

_createAddon() JSDoc referenced 'configurationParams.settings' and
omitted 'projectionPath'. The actual shape built in _load() is
{ path, projectionPath, config }; align the JSDoc with that.

UserMediaMessage.content widened to Uint8Array | string earlier in
this PR but no integration test exercised the string-path branch.
Add one elephant-image test that passes the absolute path as
message content, exercising the loadMedia(string) path through the
JS-to-C++ handoff.

* build: promote @qvac/logging to runtime dependency

index.js calls require('@qvac/logging') at runtime, so it belongs under
dependencies, not devDependencies. Previously it worked only because
another runtime dep pulled it in transitively — fragile for publish
and can break under stricter package managers.

* doc: finish finetuning.md mermaid fix

Previous commit 979a070 reworded only my own addition (line 251) but
the block still failed at the same position because the surrounding
pre-existing message bodies still used ; as a statement separator.
Mermaid sequenceDiagram parses ; as end-of-statement, so every message
containing it broke the diagram.

Replace ; with , or a separator word across all four affected lines
(block #1 lines 251, 256, 266 and block #2 line 296) so the finetune
and pause flow diagrams render on GitHub.

* fix: move addon construction into crash-safe try block

_createAddon() was outside the try so a synchronous throw in
require('./binding') or binding.createInstance() would leave
this.addon set to a partial native handle and never reach the
cleanup path. Route addon construction through the same try the
shard-streaming and activate() calls use.
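
A minimal sketch of the shape this gives `_load()`, with construction, shard streaming, and activation sharing one try block. `LoadableModel` and its injected `createAddon` / `streamShards` steps are hypothetical, introduced only so the failure path can be exercised without a native binding.

```javascript
// Hypothetical model with injected steps so the failure path can be exercised.
class LoadableModel {
  constructor ({ createAddon, streamShards }) {
    this._createAddon = createAddon
    this._streamShards = streamShards
    this.addon = null
  }

  async _load () {
    try {
      this.addon = this._createAddon() // construction is inside the try
      await this._streamShards(this.addon)
      await this.addon.activate()
    } catch (err) {
      const partial = this.addon
      this.addon = null // a later load() starts from a clean slate
      try {
        if (partial) await partial.unload() // best-effort cleanup
      } catch (_) {
        // Ignore cleanup failures; the original error is what matters.
      }
      throw err
    }
  }
}

const unloaded = []
const failing = new LoadableModel({
  createAddon: () => ({
    activate: async () => { throw new Error('activate failed') },
    unload: async () => { unloaded.push('native') }
  }),
  streamShards: async () => {}
})

failing._load().catch((err) => {
  console.log(err.message, failing.addon, unloaded.length) // activate failed null 1
})
```

If `createAddon` itself throws, `this.addon` is still `null`, so the cleanup step is skipped and the original error propagates unchanged.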

---------

Co-authored-by: gianni-cor <gianfranco.cordella@tether.io>
Proletter pushed a commit that referenced this pull request May 24, 2026
…#1983)

* feat: add @qvac/tts-ggml package (Chatterbox English on qvac-tts.cpp)

New Bare addon wrapping the `qvac-tts::qvac-tts` static library (backed
by the `tts-cpp` port added in tetherto/qvac-registry-vcpkg).  API-compatible
with the Chatterbox engine exposed by `@qvac/tts-onnx` so downstream
consumers can swap backends without touching orchestration code.

## Scope

* First iteration.  Supports Chatterbox **English** only.  Chatterbox
  multilingual, LavaSR enhancer, Supertonic engine, and streaming are
  out of scope and remain in `@qvac/tts-onnx`.  They'll land alongside
  the evolution of qvac-tts.cpp.
* Native backend is the static `qvac-tts` library from the QVAC vcpkg
  registry (`ports/tts-cpp`, baseline `2026-04-21`).  No ONNX Runtime
  dependency.

## JS surface

* `@qvac/tts-ggml` exports `TTSGgml` with the same method shape as
  `ONNXTTS`:  `run` / `runStream` / `runStreaming` / `reload` /
  `unload` / `destroy`.
* `files: { modelDir }` looks for `chatterbox-t3-turbo.gguf` +
  `chatterbox-s3gen.gguf` side-by-side; `files.t3Model` /
  `files.s3genModel` override the defaults.
* Options: `referenceAudio`, `voiceDir` (baked profile), `seed`,
  `nGpuLayers`, `threads`, `outputSampleRate`, plus placeholders for
  the upcoming streaming flags (`streamChunkTokens`,
  `streamFirstChunkTokens`, `cfmSteps`).
* Shared reusable lib code (`lib/textChunker.js`,
  `lib/textStreamAccumulator.js`, `addonLogging.*`) is copied verbatim
  from `@qvac/tts-onnx`.
* New error class `QvacErrorAddonTTSGgml` uses codes **13001–14000**
  to avoid collisions with `@qvac/tts-onnx` (7001–7011) when both
  packages are loaded in the same Bare process.
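
The `files` resolution rule in the second bullet can be sketched as below. `resolveModelPaths` is a hypothetical helper, not the package's actual code; the two default file names are the ones listed above, and Node's `path` module stands in for the Bare equivalent.

```javascript
const path = require('path')

// Hypothetical resolver: explicit paths win, otherwise derive from modelDir.
function resolveModelPaths (files) {
  const { modelDir, t3Model, s3genModel } = files
  return {
    t3Model: t3Model ?? path.join(modelDir, 'chatterbox-t3-turbo.gguf'),
    s3genModel: s3genModel ?? path.join(modelDir, 'chatterbox-s3gen.gguf')
  }
}

// Both GGUFs side by side in one directory.
console.log(resolveModelPaths({ modelDir: '/models' }))

// Override only the T3 model; the S3Gen path still comes from modelDir.
console.log(resolveModelPaths({ modelDir: '/models', t3Model: '/custom/t3.gguf' }))
```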

## Native addon

* `addon/src/model-interface/chatterbox/ChatterboxModel.{hpp,cpp}` —
  `IModel` + `IModelCancel` implementation.  First-iteration strategy:
  assemble argv for `qvac_tts_cli_main` with a scratch `.wav` output
  path, call it synchronously, then parse the resulting 16-bit mono
  PCM wav back into `std::vector<int16_t>` for the JS handler.
  Consequences: every job re-loads the model (~700 ms + inference
  time), no mid-synthesis cancellation, no streaming.  The follow-up
  milestone replaces this with a persistent, struct-based API once
  qvac-tts.cpp exposes one.
* `addon/src/js-interface/{JSAdapter.{hpp,cpp}, binding.cpp}` — JS-to-C++
  config bridging (same string-map pattern as `@qvac/tts-onnx`) and the
  `BARE_MODULE(qvac_tts_ggml, ...)` registration exposing
  `createInstance` / `runJob` / `reload` / `activate` / `cancel` /
  `destroyInstance` / `loadWeights` / `setLogger` / `releaseLogger`.
* `addon/src/addon/AddonJs.hpp` — JS-facing `createInstance` / `runJob`
  / `reload` wrappers that register a `JsAudioOutputHandler` emitting
  `{ outputArray: Int16Array, sampleRate: number }` to JS.
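
The wav read-back step in the first bullet happens in C++ inside `ChatterboxModel`; the JavaScript below only illustrates the technique, assuming a canonical RIFF/WAVE layout with a PCM `fmt ` chunk. `parsePcm16MonoWav` is a hypothetical name, and the returned object reuses the `{ outputArray, sampleRate }` shape mentioned above purely for familiarity.

```javascript
// Hypothetical parser for a RIFF/WAVE file holding 16-bit mono PCM.
function parsePcm16MonoWav (bytes) {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength)
  const tag = (offset) => String.fromCharCode(...bytes.subarray(offset, offset + 4))
  if (tag(0) !== 'RIFF' || tag(8) !== 'WAVE') throw new Error('not a RIFF/WAVE file')

  let sampleRate = 0
  let offset = 12 // first chunk starts after the 12-byte RIFF header
  while (offset + 8 <= bytes.byteLength) {
    const id = tag(offset)
    const size = view.getUint32(offset + 4, true) // little-endian chunk size
    const body = offset + 8
    if (id === 'fmt ') {
      const format = view.getUint16(body, true) // 1 = integer PCM
      const channels = view.getUint16(body + 2, true)
      sampleRate = view.getUint32(body + 4, true)
      const bits = view.getUint16(body + 14, true)
      if (format !== 1 || channels !== 1 || bits !== 16) {
        throw new Error('expected 16-bit mono PCM')
      }
    } else if (id === 'data') {
      const samples = new Int16Array(Math.floor(size / 2))
      for (let i = 0; i < samples.length; i++) {
        samples[i] = view.getInt16(body + i * 2, true)
      }
      return { outputArray: samples, sampleRate }
    }
    offset = body + size + (size % 2) // chunks are word-aligned
  }
  throw new Error('no data chunk found')
}
```

Walking the chunk list, instead of assuming a fixed 44-byte header, keeps the sketch tolerant of extra chunks (for example `LIST`) that some writers place before `data`.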

## Build / registry

* `CMakeLists.txt` uses `find_package(qvac-tts-cpp CONFIG REQUIRED)`
  and the standard `cmake-bare` + `cmake-vcpkg` scaffolding (shape
  matches `@qvac/transcription-whispercpp`).
* `vcpkg.json` depends on `tts-cpp` (with a `vulkan` feature passthrough)
  plus `qvac-lib-inference-addon-cpp`, `qvac-lint-cpp`, and `gtest`.
* `vcpkg-configuration.json` points at tetherto/qvac-registry-vcpkg.
  NOTE: the baseline pin here is inherited from
  `@qvac/transcription-whispercpp` and **must be bumped** to a commit
  that contains the `tts-cpp` port once that registry PR lands.  A
  follow-up commit will update it.

## Tests & examples

* Integration + unit test files for Chatterbox English are copied
  verbatim from `@qvac/tts-onnx` with only mechanical renames
  (`ONNXTTS` -> `TTSGgml`, `QvacErrorAddonTTS` -> `QvacErrorAddonTTSGgml`,
  `@qvac/tts-onnx/text-chunker` -> `../../lib/textChunker.js`).  Some
  paths in `test/integration/addon.test.js` still import Supertonic /
  LavaSR helpers that don't exist in this package — those test blocks
  will fail fast when the file loads, which is expected until those
  backends get their own ggml packages.
* Examples: `chatterbox-tts.js`, `chatterbox-streaming-tts.js`, plus
  shared `wav-helper.js` + `pcm-chunk-player.js`.

## What's not in this PR (known gaps)

* No docs: README, NOTICE, CHANGELOG, PULL_REQUEST_TEMPLATE changes
  will land in a single documentation pass once the registry + fork
  commits have merged upstream.
* `vcpkg-configuration.json` baseline needs to point at a
  qvac-registry-vcpkg commit that ships `tts-cpp` (pending the
  registry PR).
* An actual `npm run build` requires the registry and fork commits
  to be on `main` of their respective upstream repos.

* chore: point tts-ggml vcpkg baseline at the tts-cpp-bearing registry commit

Bumps `vcpkg-configuration.json` to GustavoA1604/qvac-registry-vcpkg
at commit 1e2839680b6be8d8ffff889a9c29b966c176098c — the commit that
adds the `tts-cpp` port.  Paired with the `qvac-tts` library already
pinned in the port's `portfile.cmake` (GustavoA1604/chatterbox.cpp
@ 0fe4a521618cc30358040b29d75d4261b31cbb60).

Will be re-pointed at tetherto/qvac-registry-vcpkg once the registry
PR lands upstream.

* chore: tts-ggml: trim tests + examples to Chatterbox English, restore mobile wrapper

Second pass over @qvac/tts-ggml after the build started passing: prune
everything that only made sense for the ONNX-era multi-engine scope and
adapt the remaining Chatterbox-English bits to the GGUF + file-path
reference-audio contract.  Restores `test/mobile/` so the Android build
has something to point at.

## C++

* `ChatterboxModel.cpp`: the `ArgvBuilder::buildArgv` doc comment
  contained `**/` which closed the block comment early and broke the
  build.  Rewrote as a `//` comment.

## Examples

* `examples/chatterbox-tts.js` — rewrite for v0 contract: single
  `<text>` argv, `files: { modelDir }` pointing at the two GGUFs,
  `referenceAudio` is now a wav **path** (addon passes it to
  `--reference-audio`) instead of a Float32Array.  Drops
  english/multilingual arg and the CHATTERBOX_VARIANT switch that
  picked which `.onnx` files to load.
* Removed `examples/chatterbox-streaming-tts.js` +
  `examples/pcm-chunk-player.js`.  The v0 addon re-loads the model
  per `run()` call — exposing streaming would mislead.  Both come
  back alongside the persistent-engine milestone.
* `package.json`: `npm run example` now passes a default text so it
  runs without extra args.

## Tests

### Kept as-is (engine-agnostic)

* `test/unit/textChunker.test.js`
* `test/mock/{MockedBinding,utils}.js`
* `test/utils/{wav-helper,pcmConcatenator,loader.fake,runWhisper,runTTS}.js`
* `test/reference-audio/jfk.wav`, `test/data/sentences-*.js`

### Mechanical fixes

* `test/unit/tts.error.test.js` — fix error-code assertions to the
  tts-ggml range (`13001–14000`); was still checking the
  `@qvac/tts-onnx` range (`7001–7011`).
* `test/unit/tts-ggml.lifecycle.test.js` — fix stale
  `QvacErrorAddonTTS` import to `QvacErrorAddonTTSGgml`; switch the
  stubbed model to `{ t3Model, s3genModel }` GGUFs and drop the
  non-existent `engine: 'chatterbox'` option.
* `test/unit/tts-ggml.sentence-stream.test.js` — same GGUF/engine
  cleanup.

### Rewritten

* `test/unit/chatterbox.inference.test.js` — drop tests that asserted
  the old ONNX file shape (`tokenizer / speechEncoder / embedTokens /
  conditionalDecoder / languageModel`), the removed `engine` detection
  and the wrong `getModelKey` return value (`'onnx-tts'` -> `'tts-ggml'`).
  New tests cover: `modelDir` derives the two GGUF paths; explicit
  `t3Model` / `s3genModel` override the defaults.  The mocked-binding
  run/reload/cancel flow stays.
* `test/integration/addon.test.js` — fresh, ~180 LoC, Chatterbox-English
  only.  Ensures the GGUFs are present, runs the short sentence set
  through `loadChatterboxTTS` + `runChatterboxTTS[WithSplit]`, and
  (on darwin only) runs a whisper-based WER check via the existing
  `runWhisper` util.  Drops the Chatterbox-multilingual block + every
  Supertonic + LavaSR block that doesn't apply to this package.
* `test/utils/runChatterboxTTS.js` — rewrite for the GGUF contract:
  `files: { modelDir, t3Model, s3genModel }`, `referenceAudio` as a
  file path that falls back to `test/reference-audio/jfk.wav` (or the
  mobile test-asset when `global.assetPaths` is present).  No more
  WAV decode / resample on the JS side.
* `test/utils/downloadModel.js` — trim from 1007 LoC to 280.  Drops
  the Supertonic + LavaSR + Chatterbox-multilingual + Cangjie
  downloaders.  Keeps the shared HTTP/curl infrastructure and
  `ensureWhisperModel` (still used by the integration WER check).
  `ensureChatterboxModels` is now **check-only**: it verifies
  `chatterbox-t3-turbo.gguf` + `chatterbox-s3gen.gguf` exist locally
  and, if missing, prints the exact commands for generating them
  from the qvac-tts.cpp (née chatterbox.cpp) conversion scripts.
  Once the GGUFs land on a canonical HuggingFace repo we'll wire up
  download URLs here.
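
The path derivation and check-only behaviour described above can be sketched roughly as follows. This is an illustration, not the package's code: `resolveChatterboxFiles` and `checkChatterboxModels` are hypothetical names, and the sketch assumes Node's `fs` / `path` modules rather than the Bare equivalents. Only the two GGUF file names are taken from the notes.

```javascript
'use strict'

const fs = require('fs')
const path = require('path')

// Default file names as listed in the notes above.
const T3_DEFAULT = 'chatterbox-t3-turbo.gguf'
const S3GEN_DEFAULT = 'chatterbox-s3gen.gguf'

// Hypothetical helper: explicit t3Model / s3genModel win over the
// paths derived from modelDir.
function resolveChatterboxFiles ({ modelDir, t3Model, s3genModel }) {
  return {
    t3Model: t3Model || path.join(modelDir, T3_DEFAULT),
    s3genModel: s3genModel || path.join(modelDir, S3GEN_DEFAULT)
  }
}

// Hypothetical check-only helper: never downloads, only reports
// which of the resolved files are missing on disk.
function checkChatterboxModels (files) {
  const resolved = resolveChatterboxFiles(files)
  const missing = Object.values(resolved).filter(p => !fs.existsSync(p))
  return { success: missing.length === 0, missing }
}
```

A caller would print conversion instructions when `success` is `false` instead of attempting a download.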

## Scripts

* `scripts/ensure-chatterbox.js` — simplify to a single invocation
  against `./models/`.  Drops the variant / language matrix that the
  ONNX downloader needed.
* `scripts/ensure-models.js` — now a thin alias to
  `ensure-chatterbox.js`.  Drops the Supertonic + LavaSR orchestration.

## Mobile

* Restored `test/mobile/{integration.auto.cjs, integration-runtime.cjs,
  testAssets/jfk.wav}` so the Android build has a wrapper to point at.
* `package.json`: re-added `test/mobile` to the `files` list.

## Gitignore

* Ignore generated `.clang-format` / `.clang-tidy` / `.valgrind.supp`
  (produced by the top-level `configure_file(...)` calls) and
  `build_*/` dirs (bare-make convention).

## Verified locally

* `npx standard "test/**/*.js" "*.js" "lib/*.js"` — clean.
* `npm run test:unit` — 38/38 pass (105/105 asserts).
* `npm run build && bare examples/chatterbox-tts.js "Hello from qvac tts ggml."`
  produces a 24 kHz wav as expected.

* Add streaming support

* Update ggml backend to use separate ggml repo

* tts-ggml: consume renamed tts-cpp library (2026-04-24#1)

Upstream chatterbox.cpp renamed the package + namespace + target from
qvac-tts to tts-cpp and tightened the library boundary; pick up the
new artefacts here:

- find_package(qvac-tts-cpp CONFIG REQUIRED)
    -> find_package(tts-cpp CONFIG REQUIRED)
- qvac-tts::qvac-tts  -> tts-cpp::tts-cpp
- qvac_tts::chatterbox -> tts_cpp::chatterbox (engine ptrs, EngineOptions,
  SynthesisResult, forward-decls in ChatterboxModel.hpp)
- #include <qvac-tts/chatterbox/engine.h>
    -> #include <tts-cpp/chatterbox/engine.h>
- Doxygen / inline doc references to the old names refreshed alongside
  the code changes.

vcpkg wiring:
- vcpkg-configuration.json baseline bumped to qvac-registry-vcpkg
  commit bc30b0b (ports/tts-cpp renamed and repointed at
  chatterbox.cpp@f8f9145).
- vcpkg.json tts-cpp constraint bumped to 2026-04-24#1 (the port that
  carries the rename + namespace + install(EXPORT) changes).

Verified with a cold bare-make generate + bare-make build against the
new port, and the addon's existing unit + integration test suites.

Made-with: Cursor

* tts-ggml: bump tts-cpp port to 2026-05-07 + registry baseline

Picks up the round-3 review-fix wave landed on the tts-cpp port:

  e673182  scrub stale patches/ refs from README                (N10)
  8ba10a6  drop unreachable TTS_CPP_GGML_LIB_PREFIX block        (N8)
  4b5d2d7  mirror N1-N7 fixes from chatterbox.cpp source-of-truth
            - N1 supertonic alive-registry guard against freed-backend
              gallocr_free assert on hot-swap (Vulkan/Metal/CUDA)
            - N2 drop dead g_sink_* state, soften log_set docstring
            - N3 Turbo BPE try/catch (exception-safe Engine ctor)
            - N4 STFT cancel checkpoint + tighter Engine::cancel() doc
            - N5 document s3gen_preload/unload refcount semantics
            - N6 drop dead cached_text_lc Supertonic shim
            - N7 fix misleading "no copy" view-vs-copy log wording

Plus the integrated-port-only round-2 fixes that landed earlier:

  fa0d490  close patches/-deleted regression: TTS_CPP_USE_SYSTEM_GGML
            now defaults ON; bundled-without-patches hard-errors at
            configure time with a pointer at the ggml-speech vcpkg
            port.
  ae34c58  README rewritten for integrated/vcpkg context.
  a2f2dd6  top-level qvac-ext-lib-whisper.cpp README points at the
            tts-cpp/ subtree (alongside parakeet-cpp/).

Public API used by ChatterboxModel (tts_cpp::chatterbox::Engine /
EngineOptions / SynthesisResult / s3gen_preload / s3gen_unload) is
backward-compatible: the new port adds Engine::backend_name(),
MTL-variant fields on EngineOptions (language / cfg_weight / min_p /
exaggeration), and a separate tts_cpp::supertonic::Engine class, but
nothing this consumer was already calling has changed.

Edits:

  packages/tts-ggml/vcpkg.json
    - tts-cpp dep: version>=2026-04-24#1 -> version>=2026-05-07.

  packages/tts-ggml/vcpkg-configuration.json
    - default-registry baseline: bc30b0b (April 2026 fork-only state)
      -> 16b91afdcfd59baea60e81f3da94f49311ef2a97.  The new baseline
      pulls in the post-tetherto-merge state (parakeet-cpp port at
      932d5d9, ggml-speech port-version 1 at f07bdd0) plus the new
      tts-cpp port (16b91af) on the developer's GustavoA1604
      registry fork.

Smoke-test plan: after running `vcpkg install` against the new
baseline, the tts-cpp port's vcpkg_from_github resolves at
GustavoA1604/qvac-ext-lib-whisper.cpp@e673182 (tts-cpp branch) until the
upstream PR merges.  ChatterboxModel should build and synthesize
identically; expanding to Multilingual + Supertonic flows is the
follow-up commit on the package side.

Co-authored-by: Cursor <cursoragent@cursor.com>

* Add chatterbox multilingual and supertonic

* Add mobile integration tests

* tts-ggml: drop clang-19 pin in linux-clang toolchain

The toolchain hardcoded `clang-19` / `clang++-19` (versioned binary
names) since the package's first commit (0a2c978).  Linux CI hadn't
exercised this path before — the new on-pr-tts-ggml.yml -> integration
matrix is the first time it does, and it fails on every linux runner
(ai-run-ubuntu-22.04, ai-run-linux-gpu, ubuntu-24.04-arm) at vcpkg's
"detect_compiler" step because none of the GH-hosted images ship a
`clang-19` symlink:

  Detecting compiler hash for triplet x64-linux...
  error: while detecting compiler information:
  ...
  CMake Error at scripts/cmake/vcpkg_execute_required_process.cmake:127
  (message): Command failed: ... -DVCPKG_CHAINLOAD_TOOLCHAIN_FILE=
  .../tts-ggml/vcpkg/triplets/../toolchains/linux-clang.cmake ...

Match parakeet's working pattern (qvac-lib-infer-parakeet/vcpkg/
toolchains/linux-clang.cmake): use unversioned `clang` / `clang++` so
each runner picks up its image's default clang (clang-15 on
ubuntu-22.04, clang-18 on ubuntu-24.04, whatever the AI runners ship).
The `-stdlib=libc++` flag added by x64-linux.cmake / arm64-linux.cmake
is honoured by every reasonable clang version.

Co-authored-by: Cursor <cursoragent@cursor.com>

* Add C++ tests and coverage; fix linux build

* tts-ggml: address PR review feedback

Bundle of correctness, hygiene, and CI-doc fixes from the recent code
review.  Each item below has its own paragraph in the diff comments.

- #1 files-array: add test/utils/runSupertonicTTS.js + test/data/sentences-{medium,long}.js
  to package.json so consumers running the integration tests from the
  npm tarball don't crash with `Cannot find module ../utils/runSupertonicTTS`.
- #2 deps: move @qvac/langdetect-text from runtime dependencies to
  devDependencies (it's only referenced from examples/, which aren't in
  the published files list).
- #3 race-fix: ChatterboxModel::process()'s post-synthesize streaming
  detection used to read engine_->options() outside engineMu_, racing
  with reload().  synthesize() now returns SynthesizeResult { pcm,
  wasStreaming } where wasStreaming is captured under the engine lock
  against the local shared_ptr so process() doesn't have to touch
  engine_ again.
- #4 deferred-load: ChatterboxModel + SupertonicModel constructors
  used to call load() eagerly, so JsInterface::createInstance() (sync
  on the JS thread) was parsing ~370 MB of GGUF on the Bare event loop.
  Both models now implement IModelAsyncLoad: constructors validate +
  return; the actual load is deferred to waitForLoadInitialization(),
  which the new addon_js::activate wraps inside JsAsyncTask::run so the
  parse runs on a worker thread.  binding.cpp registers
  addon_js::activate in place of JsInterface::activate; tts.js now
  awaits the resulting promise.
- #5 dead code: drop _resolvePath (unused), drop the (void)inputObj
  read in AddonJs.hpp::runJob, document FAILED_TO_PAUSE /
  FAILED_TO_STOP / JOB_ALREADY_RUNNING in lib/error.js as reserved-but-
  not-thrown so future maintainers don't delete them blindly (the unit
  suite asserts the values).
- #6 cancel-reset: SupertonicModel grew Chatterbox's cancelRequested_
  reset pattern: cancel() sets it, synthesize() fast-fails on it,
  process() resets it per call so a stale cancel doesn't poison the
  next run.
- #7 useGPU comment: explain in JSAdapter::buildChatterboxConfig that
  the JS layer is the source of truth for useGPU and nGpuLayers wins
  downstream; left a pointer to std::optional<bool> if a future caller
  ever needs to distinguish "absent" from "explicit false".
- #10 fork pointers: README.md and test/utils/downloadModel.js no
  longer point at GustavoA1604/chatterbox.cpp; both reference the
  upstream tetherto/qvac-ext-lib-whisper.cpp/tts-cpp tree now.
- #9 doc: integration-mobile-test-tts-ggml.yml gained a header comment
  on the build-and-test job documenting that continue-on-error is the
  early-days landing posture (merge-guard treats success || skipped as
  pass), with a pointer to tighten once Device Farm provisioning is
  stable.
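
The cancel-reset pattern from item #6 can be sketched as below. The real change is in the C++ `SupertonicModel`; this JavaScript class (`CancellableModelSketch`, a hypothetical name) only illustrates the three roles: `cancel()` sets the flag, the synthesis step fast-fails on it, and `process()` clears it at the start of every call.

```javascript
'use strict'

class CancellableModelSketch {
  constructor () {
    this._cancelRequested = false
  }

  cancel () {
    this._cancelRequested = true
  }

  _synthesize (sentence) {
    // Fast-fail if a cancel arrived since the last reset.
    if (this._cancelRequested) throw new Error('cancelled')
    return `pcm:${sentence}`
  }

  process (sentences, onChunk = () => {}) {
    // Reset per call so a stale cancel does not poison this run.
    this._cancelRequested = false
    const chunks = []
    for (const sentence of sentences) {
      const chunk = this._synthesize(sentence)
      chunks.push(chunk)
      onChunk(chunk)
    }
    return chunks
  }
}
```

Without the reset in `process()`, a `cancel()` issued while idle would make the next run fail immediately.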

Nits:
- 'use strict' added to addonLogging.js (matches every other .js).
- node-vs-bare runtime banners on
  scripts/{generate,validate}-mobile-integration-tests.js.
- ttsOutputDebugString no longer JSON.stringify's the full PCM
  Int16Array on every chunk-streaming event; emits a tiny summary
  ({sampleRate, chunkIndex, isLast, sentenceChunk, outputArrayLen})
  instead.
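
For illustration, the summary approach might look like the sketch below. The event shape and the helper name `summarizeChunkEvent` are assumptions; only the summary field names are taken from the note above.

```javascript
'use strict'

// Hypothetical helper: log a small summary instead of the full PCM payload.
function summarizeChunkEvent (event) {
  const { sampleRate, chunkIndex, isLast, sentenceChunk, outputArray } = event
  return JSON.stringify({
    sampleRate,
    chunkIndex,
    isLast,
    sentenceChunk,
    outputArrayLen: outputArray ? outputArray.length : 0
  })
}

// Assumed event shape for the demo.
const event = {
  sampleRate: 24000,
  chunkIndex: 0,
  isLast: false,
  sentenceChunk: 'Hello from qvac tts ggml.',
  outputArray: new Int16Array(24000) // one second of silence at 24 kHz
}

console.log(summarizeChunkEvent(event).length < JSON.stringify(event).length) // true
```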

Tests: 35 passing (33 -> 35; two new assertions cover the deferred-load
contract); 4 skipped real-GGUF tests behind the existing
QVAC_TEST_CHATTERBOX_T3_GGUF / QVAC_TEST_CHATTERBOX_S3GEN_GGUF /
QVAC_TEST_SUPERTONIC_GGUF env-var gates.  Lint clean.

Co-authored-by: Cursor <cursoragent@cursor.com>

* tts-ggml: unblock CI integration tests on every desktop runner

Four independent failures, one per platform:

1. linux-x64 / linux-arm64: addon load crashed at
   `libomp.so.5: cannot open shared object file`.  tts-cpp's binary is
   built with clang under the linux-clang toolchain and links against
   libomp (LLVM OpenMP runtime); only `libgomp1` (GNU OpenMP) was being
   apt-installed.  Add `libomp5` so libomp.so.5 is on the loader path.

2. darwin-arm64: convert-models.sh aborted at line 200 with
   `hf_args[@]: unbound variable`.  macOS's system bash is 3.2, which
   under `set -u` treats expanding an empty array via `"${arr[@]}"` as
   an unbound-variable error; with HF_TOKEN unset we hit it on every
   fresh runner.  Use
   the `${arr[@]+"${arr[@]}"}` idiom (defined-or-nothing) at all six
   call sites and add a header comment so the next maintainer doesn't
   accidentally regress.

3. darwin-x64: pip install bombed building `llvmlite` from source
   because the macos-15-large runner has no LLVM 15 development
   install.  Root cause: librosa pulls in numba 0.65+, which stopped
   shipping darwin-x86_64 wheels for Python 3.12.  Pin Python to 3.11
   in the Setup Python step; 3.11 has prebuilt wheels for the entire
   numba/llvmlite/librosa stack on darwin-x64 and is fine for every
   other converter dependency.

4. windows-2022: ChatterboxModel::load threw
   `vk::createInstance: ErrorIncompatibleDriver`.  Root cause: the
   addon's index.js::_validateConfig defaults `useGPU = true` when
   neither useGPU nor nGpuLayers is specified, so the test ran with
   n_gpu_layers=99 -> ggml_backend_vk_init -> vk::createInstance ->
   ErrorIncompatibleDriver on the runner's no-Vulkan-driver image.
   runChatterboxTTS.js now honours `process.env.NO_GPU === 'true'`
   (set on the no-GPU matrix entries) and forces useGPU=false on
   exactly those runners; the other test runners (chatterbox-mtl,
   gpu-smoke, multiple-runs) already had this guard.
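
The guard from item 4 amounts to something like the sketch below. `resolveUseGPU` is a hypothetical helper, not the code in `runChatterboxTTS.js`, and it deliberately ignores `nGpuLayers`.

```javascript
'use strict'

// Hypothetical helper mirroring the guard described in item 4.
// nGpuLayers handling is intentionally not modelled here.
function resolveUseGPU (params = {}, env = process.env) {
  // No-GPU matrix entries set NO_GPU=true and must run on CPU.
  if (env.NO_GPU === 'true') return false
  // An explicit caller choice is respected on GPU-capable runners.
  if (typeof params.useGPU === 'boolean') return params.useGPU
  // Described default when neither useGPU nor nGpuLayers is given.
  return true
}
```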

Also documents the `mesa-vulkan-drivers` apt package (already pulled
in) as the software ICD that lets the Vulkan-built prebuild's runtime
backend probe enumerate at least one device on linux runners.

Co-authored-by: Cursor <cursoragent@cursor.com>

* tts-ggml: drop Chatterbox from mobile bundle (Metro V8 string limit)

Mobile build failed at `:app:createBundleReleaseJsAndAssets` with:

  SyntaxError: assets/testAssets/chatterbox-s3gen.gguf:
    Cannot create a string longer than 0x1fffffe8 characters

Root cause: Metro's bundler reads every asset under
`test/mobile/testAssets/` via `Buffer.toString()`.  V8's max string
length is 0x1fffffe8 (~512 MiB).  chatterbox-s3gen.gguf is ~1 GiB even
with --quant q4_0 because the s3gen converter only quantizes attention
weights and leaves the bulk of the s3gen graph in fp16 ("0/291 weight
tensors quantized" in the converter log).

Fix: bundle ONLY supertonic.gguf (~125 MiB, comfortably under the
limit) on mobile.  Mobile Chatterbox tests degrade cleanly to
`t.pass('Skipped: Chatterbox GGUFs not available')` via the existing
`ensureChatterboxModels` helper -- it already returns
{ success: false } when the GGUFs aren't on disk.
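
The size arithmetic can be checked with a short sketch. It assumes the asset is decoded at one character per byte, so file size in bytes is a fair proxy for the resulting string length; `fitsInV8String` is a hypothetical helper and the file sizes are the approximate figures quoted above.

```javascript
'use strict'

const V8_MAX_STRING_LENGTH = 0x1fffffe8 // limit quoted in the Metro error
const MIB = 1024 * 1024

// Assumption: one character per byte when the asset is read as a string.
function fitsInV8String (sizeBytes) {
  return sizeBytes <= V8_MAX_STRING_LENGTH
}

console.log(V8_MAX_STRING_LENGTH) // 536870888
console.log(fitsInV8String(125 * MIB)) // true  (supertonic.gguf, ~125 MiB)
console.log(fitsInV8String(1024 * MIB)) // false (chatterbox-s3gen.gguf, ~1 GiB)
```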

Cache key bumped to v2 so existing v1 cache entries (which include
the chatterbox files) are evicted on the next run.

Bundling Chatterbox on mobile requires either:
  - adding `gguf` to qvac-test-addon-mobile's metro `assetExts` so the
    JS-string read is skipped (then the s3gen file can flow through the
    bundle as a raw asset), or
  - pushing the chatterbox GGUFs to the device via `adb push` outside
    the bundle and surfacing the path through downloadModel.js's
    existing ANDROID_CANDIDATE_DIRS fallback.

Both are outside the scope of this PR; documented inline above the
cache step for the next maintainer.

Co-authored-by: Cursor <cursoragent@cursor.com>

* Bump hash of vcpkg

* Consume vcpkg from tetherto repository

* Fix integration tests failures in all platforms

* Further fix tests

* fix: Make useGPU flag more meaningful (#1953)

* fix[api]: make useGPU flag actually force CPU/GPU and reject useGPU/nGpuLayers conflicts

* add gpu smoke test

* resolve comments

---------

Co-authored-by: Ishan Vohra <ishanvohra@Ishans-MacBook-Air.local>

* Update dependencies after monorepo directory changes

* Further drop qvac-lib- prefix

* Add CHANGELOG.md

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Ishan Vohra <ishanvohra2@gmail.com>
Co-authored-by: Ishan Vohra <ishanvohra@Ishans-MacBook-Air.local>
Proletter pushed a commit that referenced this pull request May 24, 2026
…assification (#1727)

* QVAC-17481 feat: add @qvac/classification-ggml MobileNetV3 image classification addon

Introduces a new inference addon that classifies images into three
classes (food / report / other) using a fine-tuned MobileNetV3-Small
CNN running on the libggml CPU backend. Follows the established QVAC
addon pattern (see qvac-lib-infer-nmtcpp, lib-infer-diffusion).

## What this PR ships

- New package `packages/qvac-lib-infer-ggml-classification/` publishing
  as `@qvac/classification-ggml`:
  - Native addon: custom 34-layer MobileNetV3-Small compute graph built
    directly against the public `ggml.h` / `ggml-backend.h` API — no
    llama.cpp application-layer dependency, so the addon remains
    forward-compatible with future `libggml` upstream merges.
  - Load-time BatchNorm fold with `eps = 0.001` (the architecture-
    correct value; `1e-5` causes normalisation drift across all 34
    layers). Depthwise separable convolutions, squeeze-and-excite
    blocks, HardSwish / HardSigmoid / ReLU activations all wired
    through `ggml_conv_2d`, `ggml_conv_2d_dw`, `ggml_pool_2d`,
    `ggml_hardswish`, `ggml_hardsigmoid`.
  - FP16 GGUF weights bundled inside the package (2.94 MB); class
    labels are read from the GGUF `mobilenet.class_N` metadata so a
    future fine-tune can ship different class names without a code
    change.
  - Public JS API: `new ImageClassifier({ modelPath?, logger?,
    threads?, nativeLogger? })` + `load()` / `classify(buffer, opts?)`
    / `unload()` / `destroy()`. Accepts JPEG, PNG, or raw-RGB input;
    validates at the JS layer before reaching native code so no bad
    input reaches libggml.
  - `nativeLogger` opt-in (default `false`): the underlying
    `qvac-lib-inference-addon-cpp` JsLogger holds a process-wide
    static `uv_async_t` that is not safe across rapid create/destroy
    cycles, so the native C++→JS log bridge is disabled unless the
    caller explicitly opts in. JS-level logging always flows through
    the caller's `logger`.
  - Image preprocessing via vendored-through-vcpkg `stb_image` +
    `stb_image_resize2` (bilinear resize to 224×224, ImageNet
    normalisation, WHCN layout).
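
To illustrate why the BatchNorm epsilon matters at fold time, here is a minimal sketch of the standard fold, `w' = w * gamma / sqrt(var + eps)` and `b' = beta + (b - mean) * gamma / sqrt(var + eps)`. The addon does this in C++ at load time; `foldBatchNorm` and the toy numbers below are assumptions for the demo, with a deliberately small variance so the effect of a wrong `eps` is visible.

```javascript
'use strict'

// Fold y = gamma * (conv(x) - mean) / sqrt(variance + eps) + beta into the conv.
// weight: one array of kernel values per output channel; bias may be omitted.
function foldBatchNorm ({ weight, bias, gamma, beta, mean, variance }, eps = 1e-3) {
  const scale = gamma.map((g, c) => g / Math.sqrt(variance[c] + eps))
  return {
    weight: weight.map((kernel, c) => Array.from(kernel, v => v * scale[c])),
    bias: scale.map((s, c) => beta[c] + ((bias ? bias[c] : 0) - mean[c]) * s)
  }
}

// One output channel with a 1x1 kernel, so the conv is a single multiply.
const layer = { weight: [[2]], gamma: [1.5], beta: [0.25], mean: [0.5], variance: [0.003] }
const x = 3
const reference = 1.5 * (2 * x - 0.5) / Math.sqrt(0.003 + 1e-3) + 0.25

const folded = foldBatchNorm(layer)
console.log(Math.abs(folded.weight[0][0] * x + folded.bias[0] - reference) < 1e-9) // true

const wrongEps = foldBatchNorm(layer, 1e-5)
console.log(Math.abs(wrongEps.weight[0][0] * x + wrongEps.bias[0] - reference) > 1) // true
```

The second check shows the drift for a single layer; the note above is about the same mismatch compounding across all 34 layers.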

## Build + tests

- `bare-make` + `cmake-bare` + `cmake-vcpkg` build, targeting
  `ggml::ggml` / `ggml::ggml-base` / `ggml::ggml-cpu` and `stb` from
  the shared QVAC vcpkg registry.
- C++ GoogleTest suite covering graph shape (34 conv + 2 linear + 9
  SE blocks), load + inference, determinism, `topK` filter, BN
  epsilon guard, and full preprocessor behaviour.
- brittle + bare JS integration tests covering load, classify (all 6
  public sample images under `test/images/`), `topK`, raw RGB input,
  and every error path: null, empty buffer, corrupted JPEG,
  unsupported format (BMP), mismatched dimensions, pre-load /
  post-unload, tiny upscale, load/unload cycles.
- Mobile test scaffolding following the shared convention:
  `scripts/generate-mobile-integration-tests.js`,
  `scripts/validate-mobile-tests.js`, `test/mobile/
  {integration-runtime.cjs, integration.auto.cjs, README.md,
  testAssets/.gitignore}`. The auto-generated `integration.auto.cjs`
  wraps every `test/integration/*.test.js` so the shared
  `qvac-test-addon-mobile` framework picks them up on Android and iOS
  automatically.

## CI workflows

Four addon-scoped workflows (path-filtered to this package):

- `on-pr-qvac-lib-infer-ggml-classification.yml` — authorize, sanity
  checks, TypeScript declaration check, C++ lint, prebuild matrix,
  desktop integration tests, mobile integration tests, merge-guard.
- `prebuilds-qvac-lib-infer-ggml-classification.yml` — Linux x64,
  Linux arm64, Android arm64, macOS arm64, iOS arm64, Windows x64
  prebuild matrix.
- `integration-test-qvac-lib-infer-ggml-classification.yml` — desktop
  end-to-end tests with the shared performance reporter writing a
  GitHub step summary.
- `integration-mobile-test-qvac-lib-infer-ggml-classification.yml` —
  AWS Device Farm Android + iOS runs via the
  `tetherto/qvac-test-addon-mobile` framework.

## Public-data / test-image policy

All public correctness assertions in this package are scoped to the 6
test images under `test/images/` (2 per class). No confidential
fine-tuning numbers, validation-set sizes, per-class metrics, or
references to any internal validation dataset appear in this PR, in
any file it ships, or in CI logs. Internal numerical-equivalence
gating against an ONNX FP32 reference is handled pre-release by a
development-only script that is not part of this PR.

## Out of scope for this PR

- SDK plugin / schema integration (`packages/sdk/**`) lands in a
  follow-up PR after `@qvac/classification-ggml@0.1.0` is published
  to npm. This mirrors the diffusion rollout (#656 → release → #1021).
- GPU backends (Vulkan / Metal / CUDA): CPU-only for v1.0.

Made-with: Cursor

* QVAC-17481 fix(ci): correct setup-bare-tooling action name in classification workflows

The prebuild and integration-test workflows for @qvac/classification-ggml
referenced `tetherto/qvac/.github/actions/setup-bare-toolchain`, which
does not exist. The action is named `setup-bare-tooling` (same name used
by the llamacpp-llm, nmtcpp, and diffusion addons at the identical
pinned SHA). All 6 prebuild matrix jobs failed at step 1 with
"Can't find 'action.yml' ... for action 'setup-bare-toolchain'" until
this rename is in place.

Files: .github/workflows/prebuilds-qvac-lib-infer-ggml-classification.yml
  .github/workflows/integration-test-qvac-lib-infer-ggml-classification.yml
Made-with: Cursor

* QVAC-17481 fix(ci): add per-platform vcpkg/NDK/Apple-clang setup to classification prebuilds

The classification prebuilds workflow was missing the per-platform
toolchain steps that sibling addons (diffusion, nmtcpp) have after
`setup-vcpkg-cache`. As a result, `VCPKG_ROOT` was never exported,
CMake couldn't locate the vcpkg toolchain, and `bare-make build`
failed on every platform.

Changes to .github/workflows/prebuilds-qvac-lib-infer-ggml-classification.yml:

  - setup-vcpkg-cache: drop unknown inputs `vcpkg-path` and
    `github-packages-token` (action only accepts platform, arch,
    s3-bucket-path). Was silently ignored but emitted warnings.

  - Add per-OS vcpkg bootstrap / configuration:
      macOS (darwin, ios):  clone microsoft/vcpkg tag 2025.12.12,
                            bootstrap, export VCPKG_ROOT.
      Linux (linux, android runners): export
                            VCPKG_ROOT=$VCPKG_INSTALLATION_ROOT.
      Windows:              export VCPKG_ROOT from
                            $env:VCPKG_INSTALLATION_ROOT with
                            backslash-to-forward-slash normalisation.

  - Windows-only: set CMAKE_GENERATOR="Visual Studio 17 2022" and,
    for the x64 matrix row, CMAKE_GENERATOR_PLATFORM=x64.

  - Android-only: export ANDROID_NDK / ANDROID_NDK_HOME /
    ANDROID_NDK_ROOT from ANDROID_NDK_LATEST_HOME, derive
    ANDROID_TOOLCHAIN_ROOT, set ANDROID_NATIVE_API_LEVEL=24.

  - iOS and darwin: move Homebrew llvm / llvm@18 aside so the Apple
    toolchain clang is on PATH (matches diffusion).

All additions mirror the working pattern in
prebuilds-lib-infer-diffusion.yml and
prebuilds-qvac-lib-infer-nmtcpp.yml at the same pinned action SHA.
No Vulkan or apt X11 steps were added: this addon is CPU-only ggml
and has no graphics dependencies.

Made-with: Cursor

* QVAC-17481 fix: add missing <limits> include and CI build-failure diagnostics

Two related changes to unstick the prebuild matrix:

1. addon/src/model-interface/ImagePreprocessor.cpp uses
   std::numeric_limits<int>::max() but does not #include <limits>.
   MSVC pulls <limits> in transitively (via <algorithm> in its STL),
   but libc++ and libstdc++ on clang/gcc do not. This is the most
   plausible reason all five non-Windows prebuild jobs (linux-x64,
   linux-arm64, android-arm64, darwin-arm64, ios-arm64) failed
   identically at `bare-make build` while the Windows host build
   succeeded.

2. prebuilds-qvac-lib-infer-ggml-classification.yml gains a
   `Dump build context on failure` step that runs only if
   `bare-make build` fails. It prints toolchain identity, lists the
   build/ tree, tails CMake configure logs, dumps any *.log under
   build/, and tails up to 20 vcpkg buildtree logs. Mirrors the
   `Dump vcpkg build logs on failure` pattern in
   prebuilds-lib-infer-diffusion.yml. Without this, every CI failure
   currently surfaces only as `Process completed with exit code 1.`,
   which is essentially undebuggable from the run summary page.

Files: packages/qvac-lib-infer-ggml-classification/addon/src/model-interface/ImagePreprocessor.cpp
  .github/workflows/prebuilds-qvac-lib-infer-ggml-classification.yml
Made-with: Cursor

* QVAC-17481 fix(ci): use --platform (not --target) for bare-make generate

Root cause confirmed from job log of run 24850328468 (linux-x64):
  bare-make generate --target linux --arch x64
  Bail: UNKNOWN_FLAG: target

The bare-make CLI installed by setup-bare-tooling does not accept
`--target`; it only accepts `--platform`. Diffusion and nmtcpp both
use `--platform`. Locally I had an older bare-make that accepted
`--target` as an alias, which masked the bug on my Windows host.

Step 17 (Generate build) was failing immediately with the above
"Bail: UNKNOWN_FLAG", causing every downstream step (build,
install) to fail too across all 6 prebuild matrix jobs.

Also harden the diagnostic step `Dump build context on failure`:
disable `-e` and `pipefail` for that step so a missing `build/`
directory or empty `find` result no longer makes the diagnostic
step itself exit non-zero (it should never mask the real failure).

Files: .github/workflows/prebuilds-qvac-lib-infer-ggml-classification.yml
Made-with: Cursor

* QVAC-17481 fix: pin ggml to CPU-only feature set + guard backend iteration

CI runs were failing because the default ggml vcpkg feature set pulls
in the `vulkan` (Linux/Windows/Android) and `metal` (Apple) GPU
backends, which forces `find_package(Vulkan)` at configure time and
forces the prebuilds workflow to install the Vulkan SDK on every
runner. Since this addon is CPU-only by design (only ever calls
ggml_backend_cpu_init), the GPU backends are dead weight: extra
compile time, extra dependencies in shipped prebuilds, and extra
runtime requirements on user machines (e.g. libvulkan.so.1).

Two related changes, no functional impact on the addon itself:

1. packages/qvac-lib-infer-ggml-classification/vcpkg.json
   Add `"default-features": false` to the ggml dependency. This
   opts out of vulkan / metal / cuda / opencl while keeping the
   core CPU backend (which is the implicit base, not a named
   feature). Verified locally on win32-x64: vcpkg rebuilt
   `ggml:x64-windows@2026-01-30#5` from source in 26s without
   Vulkan, generate + build + install all green, and the JS
   integration test ran the model end-to-end producing correct
   top labels (food/report/other) for every sample image.

2. packages/qvac-lib-infer-ggml-classification/CMakeLists.txt
   Guard the GGML_AVAILABLE_BACKENDS iteration with
   `if(TARGET ggml::${_backend})`. The upstream variable
   advertises every backend the port knows about, but real
   CMake targets only exist for backends that were actually
   built. Without the guard, add_bare_module's
   get_target_property() crashes on Android (where Vulkan and
   OpenCL are listed as available but not built). Defensive
   change; no behavioural difference when targets do exist.

Local artifact size: prebuilds/win32-x64/qvac__classification-ggml.bare
is 1.6 MB; no shipped vulkan loader.

Made-with: Cursor

* QVAC-17481 fix(ci): match prebuild- artifact prefix in mobile tests

The mobile integration workflow downloaded artifacts with patterns
`android-*` / `ios-*` (PREBUILD_ARTIFACT_PREFIX was empty), but the
prebuilds workflow names artifacts `prebuild-android-arm64` /
`prebuild-ios-arm64`. Result: `Total of 0 artifact(s) downloaded`,
followed by "ERROR: No prebuilds found!" — both Android and iOS
mobile jobs failed at this exact step in run 24891210942.

Set PREBUILD_ARTIFACT_PREFIX to "prebuild-" so the resulting patterns
become `prebuild-android-*` and `prebuild-ios-*`, matching the actual
artifact names. Mirrors how the desktop integration workflow already
filters (it uses `prebuild-${platform}-${arch}*` directly).
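
The mismatch is easy to reproduce in isolation. The sketch below uses a simplified trailing-wildcard matcher in place of the real artifact-download globbing, and `selectArtifacts` is a hypothetical helper; the artifact names and the prefix value are the ones quoted above.

```javascript
'use strict'

// Artifact names as produced by the prebuilds workflow (quoted above).
const artifacts = ['prebuild-android-arm64', 'prebuild-ios-arm64']

// Minimal matcher for patterns of the form "<literal>*" (trailing wildcard only).
function matchesPattern (name, pattern) {
  if (!pattern.endsWith('*')) return name === pattern
  return name.startsWith(pattern.slice(0, -1))
}

function selectArtifacts (prefix, platform) {
  const pattern = `${prefix}${platform}-*`
  return artifacts.filter(name => matchesPattern(name, pattern))
}

console.log(selectArtifacts('', 'android').length) // 0
console.log(selectArtifacts('prebuild-', 'android')) // [ 'prebuild-android-arm64' ]
```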

File: .github/workflows/integration-mobile-test-qvac-lib-infer-ggml-classification.yml
Made-with: Cursor

* QVAC-17481 fix(model): zero-input warmup pass to defeat cold-inference NaN

ggml's backend graph allocator leaves intermediate tensor buffers and
the input/output tensors uninitialised after `buildGraph` returns.
Whatever stale heap residue happens to occupy those addresses can
leak into the very first inference and produce non-finite logits
on a heap-state-dependent basis.

CI run 24891210942 caught this on win32-x64: meal_1.jpg (the first
sample classified after instance creation) failed assert 9
(`Math.abs(sum - 1) < 1e-3` -- probabilities sum was not ~1) and
assert 10 (`result[0].confidence >= result[1].confidence` -- sort
comparison broke because the first confidence was NaN). Asserts 11..72
covering the other five sample images all passed: by then the second
inference had overwritten the dirty buffers with real data.
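
Why a single NaN trips both asserts can be shown in a few lines. The sketch assumes the confidences come from a softmax over the logits, which these notes do not state explicitly; the helper names are illustrative.

```javascript
'use strict'

// Assumed post-processing: numerically stable softmax over the logits.
function softmax (logits) {
  const max = Math.max(...logits)
  const exps = logits.map(v => Math.exp(v - max))
  const sum = exps.reduce((a, b) => a + b, 0)
  return exps.map(v => v / sum)
}

// The two properties the failing asserts checked, as plain predicates.
function sumsToOne (probs) {
  const sum = probs.reduce((a, b) => a + b, 0)
  return Math.abs(sum - 1) < 1e-3
}

function isSortedDescending (probs) {
  return probs.every((p, i) => i === 0 || probs[i - 1] >= p)
}

const healthy = softmax([2.0, 0.5, -1.0]).sort((a, b) => b - a)
const poisoned = softmax([NaN, 0.5, -1.0])

console.log(sumsToOne(healthy), isSortedDescending(healthy)) // true true
console.log(sumsToOne(poisoned), isSortedDescending(poisoned)) // false false
```

Every comparison involving NaN evaluates to `false`, so both the sum check and the ordering check fail even though nothing throws.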

This is a classic uninit-memory bug: behaviour depends on whatever
the heap happens to contain at process start. My local Windows
build did not trip on it (different heap layout); the Azure CI
runner did. Same compiler family, same code, different result.

Fix: at the end of `ClassificationModel::load()`, run one full
forward pass with a zero-filled input tensor and discard the output.
This forces ggml's compute graph to write every backend buffer with
a deterministic value before any user-visible classify() call ever
sees the model. Cost is one cold inference per `load()` (~50-200 ms
on a CPU runner), paid once at addon startup, never visible to the
caller.

Local validation on win32-x64 with this change: integration test 1
(72/72 asserts including all sum-to-one and sort-desc checks) now
passes deterministically across rebuilds. The unrelated lifecycle
SIGSEGV between separate ImageClassifier instances (likely in
qvac-lib-inference-addon-cpp's JobRunner / OutputCallbackJs uv_
resources, not addressed here) still surfaces, just later in the
test run -- that needs a separate investigation in addon-cpp.

File: packages/qvac-lib-infer-ggml-classification/addon/src/model-interface/ClassificationModel.cpp
Made-with: Cursor

* QVAC-17481 fix(model): full-pipeline warmup eliminates win32 cold-inference NaN

The previous zero-input warmup (commit af12cdd1) wrote zeros directly
to the input tensor and ran ggml_backend_graph_compute. CI run
24892803959 showed it was insufficient: win32-x64 still failed
asserts 9 + 10 on meal_1.jpg with NaN in result[0].confidence,
while linux-arm64 / darwin / linux-x64 all passed.

Hypothesis: ggml's CPU backend on MSVC has lazy-init code paths
(SIMD kernel JIT / FP state setup) that only trigger on non-trivial
inputs reaching the post-preprocess range, and the zero-input
warmup didn't exercise them. The bug therefore surfaces on the
first real classify() with an ImageNet-normalised image.

Fix: replace the synthetic warmup with one that goes through the
EXACT same pipeline classify() uses end-to-end:
  1. Synthesise a small (32x32) raw RGB buffer with a deterministic
     non-zero gradient pattern (uint8 values from `(i * 7) & 0xFF`).
  2. Run preprocess::preprocessToTensor on it (resize to 224x224 +
     ImageNet normalise + channel reorder to WHCN).
  3. ggml_backend_tensor_set the result, run the full compute graph,
     and read the output back via ggml_backend_tensor_get.
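
Step 1 can be sketched as below. The real warmup is implemented in C++ inside `ClassificationModel::load()`; this JavaScript version (with the hypothetical name `makeWarmupRgb`) only illustrates the deterministic buffer contents.

```javascript
'use strict'

const WARMUP_SIDE = 32 // 32x32, as described in step 1
const CHANNELS = 3 // raw RGB

// Deterministic, mostly non-zero gradient: byte i gets (i * 7) & 0xFF.
function makeWarmupRgb (side = WARMUP_SIDE) {
  const buf = new Uint8Array(side * side * CHANNELS)
  for (let i = 0; i < buf.length; i++) buf[i] = (i * 7) & 0xff
  return buf
}

const warmup = makeWarmupRgb()
console.log(warmup.length) // 3072
console.log(Array.from(warmup.slice(0, 4))) // [ 0, 7, 14, 21 ]
```

Because the pattern depends only on the byte index, every load sees the same warmup input, unlike a zero-filled or random buffer.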

Cost: one full classify-equivalent pass at load() time
(~50-200 ms on a CPU runner), paid once per ImageClassifier instance,
never visible to the caller. Output is discarded; the goal is to
leave every backend buffer fully written and every lazy-init code
path exercised before user-visible classify() runs.

Local validation on win32-x64: 14/14 integration tests pass with
this change (was failing test 1 asserts 9 + 10 on meal_1 before).
Also applies the clang-format-19 layout the cpp-lint check expected,
unblocking that job.

File: packages/qvac-lib-infer-ggml-classification/addon/src/model-interface/ClassificationModel.cpp
Made-with: Cursor

* QVAC-17481 fix(addon): drain in-flight job in unload(); persistent perf reporting

Two related changes that together unblock multi-instance integration
tests across linux-x64 / darwin-arm64 / android / ios and address
the inference-latency-visibility ask.

1. addon.js — make unload() wait for the in-flight job to settle

   The previous unload() flow rejected this._pending immediately and
   then synchronously called binding.destroyInstance(). The native
   side (qvac-lib-inference-addon-cpp's JobRunner uses a worker
   thread; OutputCallBackJs uses a uv_async_t handle) often still
   had a callback pending at that moment, and destroying the
   instance underneath the in-flight callback raced with the
   uv_close lifecycle. The result was a SIGSEGV (use-after-free)
   observed across linux-x64 (both ubuntu-22.04 + 24.04),
   darwin-arm64, and the on-device Android/iOS Device Farm jobs
   in CI runs 24891210942 and 24892803959. linux-arm64 happened to
   win the race on those runs but the bug is fundamentally
   non-deterministic.

   Fix: track a separate `_pendingSettled` Promise that resolves
   the moment _outputCallback fires (whether the user-facing
   classify() Promise resolved or rejected). unload() now awaits
   that signal before calling destroyInstance, so the worker
   thread / async handle have provably finished when the native
   teardown runs. The user-facing classify() Promise contract is
   unchanged.

   This is a correctness improvement to the ImageClassifier API
   contract: after `await classifier.unload()` returns, native
   resources are now genuinely released (not "scheduled to be
   released, please don't peek").

2. test/integration/utils.js + classify.test.js — crash-survivable
   inference-latency reporting + load-time metric

   The performance-report.json was previously only flushed in
   process.on('exit'), so any SIGSEGV mid-test discarded all
   collected metrics. Now we additionally flush the JSON file
   after every recorded metric. Even a partial run leaves a usable
   per-platform latency snapshot in the uploaded artifact.

   Also adds recordLoadTime(label, ms) to capture the cost of
   constructing + load()ing an ImageClassifier (warmup + GGML
   graph build + weights read), and threads it into the first
   integration test as `load:cold`. This complements the per-image
   classify timings already recorded as `classify:<file>` and
   uploaded as artifact `classification-perf-report-{platform}-{arch}`.

Local validation on win32-x64: 14/14 tests pass cleanly with this
change set; performance-report.json contains 7 results
(load:cold + 6 classify:<file>) on disk before the process exits.

Files: packages/qvac-lib-infer-ggml-classification/addon.js
  packages/qvac-lib-infer-ggml-classification/test/integration/utils.js
  packages/qvac-lib-infer-ggml-classification/test/integration/classify.test.js
Made-with: Cursor

* QVAC-17481 fix(addon): defer OutputCallBackJs destruction to avoid use-after-free race

Root cause (in `qvac-lib-inference-addon-cpp:OutputCallBackJs.hpp`):
  The upstream destructor calls `uv_close(asyncHandle, deleter)` --
  which is asynchronous -- and then IMMEDIATELY runs
  `js_delete_reference` on its JS handle/callback refs before returning.
  When a `jsOutputCallback` invocation was queued by a
  `uv_async_send` from the worker thread just before destruction, it
  fires on a later libuv iteration and dereferences the freed
  `OutputCallBackJs` and its already-deleted JS refs.

  This explains the SIGSEGV (linux-x64 24.04, darwin-arm64) and the
  on-device APP CRASH (Android / iOS Device Farm) observed across rapid
  ImageClassifier create/destroy cycles in CI runs 24891210942,
  24892803959, 24897445066. The bug is timing-dependent, which is why
  linux-arm64 consistently wins the race and passes while other
  platforms fail.

Fix (this commit, in our binding.cpp only):
  Introduce a `DeferredOutputCallBackJs` wrapper that implements
  `addon_cpp::OutputCallBackInterface` by composing the upstream
  `addon_cpp::OutputCallBackJs` as a `unique_ptr` and forwarding
  `initializeProcessingThread / notify / stop` calls to it. The
  wrapper is what `AddonCpp` now owns; the inner upstream callback
  is owned by our wrapper.

  AddonCpp teardown order is:
    1. `~AddonCpp` body: `outputCallback_->stop()` (our wrapper's
       stop forwards to inner).
    2. `jobRunner_` destroyed: JOINS the worker thread. No new
       `uv_async_send` can happen from this point on.
    3. `outputCallback_` destroyed: our wrapper's destructor runs.
  At step 3 there may still be `uv_async_send` callbacks, QUEUED
  before step 2, that are pending on the libuv loop.

  Our destructor releases ownership of the inner callback into a
  heap-allocated `uv_check_t` whose callback (firing AFTER the poll
  phase on the next libuv iteration -- i.e. after any queued async
  callback has fired safely against the still-alive inner) deletes
  the inner, then closes and deletes itself. The check handle is
  unref'd so it does not keep the libuv loop alive on its own.

  This is a real lifetime-management fix, not a timing workaround.
  When upstream's destructor is corrected, the wrapper becomes a
  pass-through with no functional effect. We will also submit the
  fix upstream.

Local validation on win32-x64:
  14/14 integration tests pass, 90/90 asserts, including test 14
  (`load -> unload -> load cycles do not leak handles`) which
  explicitly exercises the pattern that was racing the upstream bug.

File: packages/qvac-lib-infer-ggml-classification/addon/src/addon/AddonJs.hpp
Made-with: Cursor

* QVAC-17481 fix(model,test): defensive softmax/sort + per-inference diagnostic trace

Five related changes that together (a) make the classification
output well-formed under any numerical edge case and (b) give us
first-class visibility into whatever the model actually returns on
every CI platform. No workarounds or test-masking -- the C++ changes
apply uniformly to production classify() calls and the diagnostic
logs are plain stderr output behind an opt-in env var (plus always-on
per-image t.comment() in tests).

1. addon/src/model-interface/ClassificationModel.cpp -- softmax()

   Previously:
     - Called std::max_element on a span that could contain NaN
       (max_element behaviour on NaN is unspecified).
     - Skipped normalization when sum <= 0 but RETURNED the
       unnormalized probs (could leave callers with all-zero or
       non-sum-to-1 probabilities).

   Now:
     - Finds max by explicit isfinite() walk, defaulting to -inf if
       every logit is non-finite.
     - If max is non-finite (all NaN/Inf), returns a uniform
       distribution (1/N per class) so callers always see a valid
       probability vector that sums to 1.
     - Non-finite logits are skipped instead of being passed to
       exp(), so they contribute 0 rather than NaN.
     - If the exponential sum is not finite or <= 0, falls back to
       uniform distribution instead of returning unnormalized zeros.

   This is defence in depth. MobileNetV3-Small on well-normalized
   input never produces NaN logits in practice, but if upstream ggml
   CPU backend ever surfaces a numerical bug (or a future quantised
   model does) we now cannot silently corrupt the user-visible
   probability distribution.
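
A minimal sketch of the fallback rules listed above. The function name `safeSoftmax` and the `std::vector<float>` signature are assumptions for illustration; the addon's softmax() may be shaped differently.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Illustrative only: softmax that always returns a valid distribution.
std::vector<float> safeSoftmax(const std::vector<float>& logits) {
  const std::size_t n = logits.size();
  if (n == 0) return {};
  const std::vector<float> uniform(n, 1.0F / static_cast<float>(n));

  // Find the max over finite logits only; stays -inf if none are finite.
  float maxLogit = -std::numeric_limits<float>::infinity();
  for (float v : logits) {
    if (std::isfinite(v) && v > maxLogit) maxLogit = v;
  }
  if (!std::isfinite(maxLogit)) return uniform;  // all NaN/Inf

  std::vector<float> probs(n, 0.0F);
  float sum = 0.0F;
  for (std::size_t i = 0; i < n; ++i) {
    if (!std::isfinite(logits[i])) continue;  // contributes 0, not NaN
    probs[i] = std::exp(logits[i] - maxLogit);
    sum += probs[i];
  }
  if (!std::isfinite(sum) || sum <= 0.0F) return uniform;

  for (float& p : probs) p /= sum;
  return probs;
}
```

Subtracting the finite maximum before exp() keeps every exponent at or below zero, so the sum cannot overflow for finite inputs.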

2. addon/src/model-interface/ClassificationModel.cpp -- std::sort

   Added explicit is-finite guards in the comparator. Non-finite
   confidences now compare as less than any finite value, giving
   strict-weak-ordering even with degenerate inputs. Previously, any
   NaN in the confidences would make the comparator non-strict-weak
   and std::sort behaviour undefined (one observed symptom: top
   class label at index 0 but some later index carrying a higher
   confidence).
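
A sketch of a comparator with the guard described above. The `Result` struct and the names `byConfidenceDesc` / `sortResults` are hypothetical; only the ordering rule (non-finite compares below any finite value) comes from the commit.

```cpp
#include <algorithm>
#include <cmath>
#include <string>
#include <vector>

// Hypothetical result shape for illustration.
struct Result {
  std::string label;
  float confidence;
};

// Descending by confidence; NaN/Inf entries sort after every finite entry
// and are mutually equivalent, which keeps the ordering strict-weak.
bool byConfidenceDesc(const Result& a, const Result& b) {
  const bool aFinite = std::isfinite(a.confidence);
  const bool bFinite = std::isfinite(b.confidence);
  if (aFinite != bFinite) return aFinite;  // finite goes first
  if (!aFinite) return false;              // both non-finite: equivalent
  return a.confidence > b.confidence;
}

void sortResults(std::vector<Result>& results) {
  std::sort(results.begin(), results.end(), byConfidenceDesc);
}
```

Because the comparator never evaluates `>` on a NaN operand, it stays irreflexive and transitive even for degenerate inputs.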

3. addon/src/model-interface/ClassificationModel.cpp -- trace hook

   New `QVAC_CLASSIFICATION_TRACE=1` env var toggles a per-inference
   stderr print of:
     - raw logits as read from the ggml output tensor
     - probabilities immediately after softmax (pre-sort)
     - final sorted results
   Off by default -- production users see nothing. Enabled in our CI
   integration-test workflow (in the third file below) so every run
   carries the numerical ground truth for every sample image. If a
   platform-specific anomaly ever recurs (e.g. the win32 meal_1
   oddity we have been chasing) the log lines let us diagnose
   without adding further instrumentation.
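
One way such an opt-in gate could look. The helper names and the exact log format are assumptions; the commit only states that `QVAC_CLASSIFICATION_TRACE=1` toggles stderr output.

```cpp
#include <cstdio>
#include <cstdlib>
#include <cstring>

// Hypothetical helper: true only for the exact value "1".
bool isTraceValueEnabled(const char* value) {
  return value != nullptr && std::strcmp(value, "1") == 0;
}

// Reads the environment once and caches the answer for later calls.
bool traceEnabled() {
  static const bool enabled =
      isTraceValueEnabled(std::getenv("QVAC_CLASSIFICATION_TRACE"));
  return enabled;
}

// Illustrative trace line; the real format is not given in the commit.
void traceLogits(const float* logits, int count) {
  if (!traceEnabled()) return;
  std::fprintf(stderr, "[classification-trace] logits:");
  for (int i = 0; i < count; ++i) {
    std::fprintf(stderr, " %.6f", static_cast<double>(logits[i]));
  }
  std::fprintf(stderr, "\n");
}
```

Caching the lookup keeps the per-inference cost to a single branch when tracing is off.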

4. test/integration/classify.test.js

   Before each per-image assertion block, emit a `t.comment(...)`
   line containing the full sorted result (label + 6-digit
   confidence per entry, plus elapsed ms). Brittle surfaces comments
   in the TAP stream regardless of pass/fail, so every CI job log
   now records the actual model output side-by-side with the
   assertion outcome. This replaces the need for post-hoc
   instrumentation commits when diagnosing numerical issues.

5. .github/workflows/integration-test-qvac-lib-infer-ggml-classification.yml

   Set `QVAC_CLASSIFICATION_TRACE=1` on the integration-test step so
   the C++ trace lines land in CI logs by default. Bounded output
   (3 lines per inference, ~20 inferences per job), negligible cost.

Local validation on win32-x64:
  14/14 integration tests pass, 90/90 asserts. Trace output verified:
  all 6 sample images produce sensible logits and sum-to-1
  probabilities; top class matches expected label in every case.
  Trace lines and t.comment()s visible in both the pass and
  (hypothetically) fail paths, as intended.

Files: packages/qvac-lib-infer-ggml-classification/addon/src/model-interface/ClassificationModel.cpp
  packages/qvac-lib-infer-ggml-classification/test/integration/classify.test.js
  .github/workflows/integration-test-qvac-lib-infer-ggml-classification.yml
Made-with: Cursor

* QVAC-17481 fix: clang-format + defensive marshalling + finer test assertions

Three coordinated changes that (a) unblock cpp-lint, (b) make the
C++ -> JS marshalling robust against compiler code-gen quirks, and
(c) make every test failure self-diagnostic so we never have to add
post-hoc instrumentation again.

1. addon/src/model-interface/ClassificationModel.cpp -- clang-format

   Apply the exact diff that cpp-lint reported in run 24900278513:
   drop the blank line between <gguf.h> and the addon-cpp include,
   wrap the std::sort args one-per-line, and split the multi-arg
   static_cast<double>(...) chain in the trace fprintf to one arg
   per line. Pure formatting; no behaviour change.

2. addon/src/addon/AddonJs.hpp -- defensive marshalling +
   per-entry trace inside JsClassifyOutputHandler

   The lambda now reads the label and the confidence into named
   local variables (`labelString`, `confidenceFloat`, then
   `confidenceDouble = static_cast<double>(confidenceFloat)`)
   BEFORE handing them to `jsu::String::create` / `jsu::Number::create`.
   The previous inline expression
       jsu::Number::create(env, static_cast<double>(cppOut.results[i].confidence))
   produced 0 in JavaScript for index 0 only on win32-x64
   (clang-cl), while indices 1..N marshalled correctly --
   visible in run 24900278513 win32 log: C++ trace shows
   {food:0.707883} but JS receives {food:0.000000}, all other
   entries OK. Materialising the values into named locals
   forces the compiler to commit the values to memory before
   the call sequence and dodges that code-gen pattern. Linux,
   macOS, and Windows continue to pass; this is risk-free
   defence-in-depth even if Windows turns out to have a deeper
   issue.

   Also adds an opt-in trace line per array element (gated by
   the same QVAC_CLASSIFICATION_TRACE=1 env var as
   ClassificationModel::process()), printing label, float, and
   double values as the lambda actually sees them. Combined
   with the existing process()-level trace, we now get the full
   pipeline view -- raw logits -> probs -> sorted results ->
   per-entry marshalling -- on every CI run with no manual
   instrumentation needed.

3. test/integration/classify.test.js -- finer assertions

   Replace coarse "confidence is in [0,1]" with split assertions
   that distinguish: typeof number / Number.isFinite (NaN/Inf
   detection) / range check. Per-entry assertion messages now
   include the array index AND the actual value so a failure
   line tells you exactly what went wrong. Same treatment for
   the sum and the sort-desc checks.

   Topk / sequential / raw-RGB tests gain explicit
   Number.isFinite checks plus t.comment() output of the full
   result, so they no longer silently swallow the kind of
   value-corruption bug that was hidden in test 2 of the
   previous CI run.

Local validation on win32-x64:
  14/14 tests pass; assertion count went from 90/90 to 140/140
  with the new finite-checks. Marshalling trace verified emitting
  label / float / double per element under
  QVAC_CLASSIFICATION_TRACE=1.

Files: packages/qvac-lib-infer-ggml-classification/addon/src/model-interface/ClassificationModel.cpp
  packages/qvac-lib-infer-ggml-classification/addon/src/addon/AddonJs.hpp
  packages/qvac-lib-infer-ggml-classification/test/integration/classify.test.js
Made-with: Cursor

* QVAC-17481 fix(mobile,addon): mobile model path via testAssets + cpp-lint uv.h order

- `test/integration/utils.js`: add `resolveModelPath()` that resolves
  the GGUF weights via `global.assetPaths` on iOS/Android (the bare
  worklet runs from a packed `app.bundle/...` virtual root and cannot
  read the npm package's `weights/` directory), and falls back to the
  bundled desktop path otherwise. Throw a clear synchronous error when
  the asset is missing so it surfaces as a brittle assertion instead of
  an unhandled-promise-rejection that aborts the bare worklet.
- `test/integration/classify.test.js`, `test/integration/error-cases.test.js`:
  use `resolveModelPath()` for every `ImageClassifier` instance.
- `scripts/copy-mobile-test-assets.js`: replace the inline shell
  `mobile:copy-prebuilds` script with a portable Node script that
  fans out the single arm64 prebuild into the per-flavour directories
  the qvac-test-addon-mobile framework expects.
- `package.json`: wire the new script in as `mobile:copy-prebuilds`.
- `addon/src/addon/AddonJs.hpp`: include `<uv.h>` and reorder includes
  to satisfy `clang-format-19`'s grouping rules so cpp-lint passes in CI.
- `.gitignore`: keep downloaded Device Farm logs (`remote_logs/`) and
  ad-hoc validation scripts out of the working tree.

Made-with: Cursor

* QVAC-17481 fix(mobile,addon): testAssets .gguf.bin extension + win32 burn-one js_create_double

- `scripts/copy-mobile-test-assets.js` + `test/integration/utils.js`:
  copy the GGUF weights into `test/mobile/testAssets/` with a `.gguf.bin`
  suffix and look them up by that key. The qvac-test-addon-mobile
  framework's metro.config.js does not register `.gguf` as an asset
  extension, so a raw `.gguf` file is treated as a JS-source request
  and the bundler aborts at `:app:createBundleReleaseJsAndAssets`.
  `.bin` is in the framework's accepted list and ggml's
  `gguf_init_from_file` does not validate the file extension.
- `addon/src/addon/AddonJs.hpp`: add a defensive "burn one"
  `js_create_double(env, 0.0, &dummy)` call at the top of the
  classification result lambda. On Win32 (clang-cl + bare runtime
  + V8) the very first `js_create_double` call inside a fresh handle
  scope returned 0 for index 0 even though the C++ side passed the
  correct value; consuming that slot unblocks every subsequent call.
  Gated trace output behind `QVAC_CLASSIFICATION_TRACE=1`.

Made-with: Cursor

* QVAC-17481 fix(mobile): copy test images to mobile testAssets to fix Android/iOS ENOENT

`test/integration/utils.js:loadImage()` previously read every test
image with `fs.readFileSync(path.join('test','images',name))`. On
mobile that resolves into the packed `app.bundle/...` virtual root,
where `test/images/` is not present, and the bare runtime aborts
with `FileError: ENOENT, open "/app.bundle/backend/test/images/<file>"`
right after the model loads (Pixel 9 Pro logcat from the previous CI
run pinpointed this).

Fixed by:

- `scripts/copy-mobile-test-assets.js`: also copy every
  `test/images/*.{jpg,jpeg,png}` into `test/mobile/testAssets/`. JPEG
  and PNG are part of metro's default `assetExts`, so no rename is
  needed (unlike the GGUF blob).
- `test/integration/utils.js`: add `_resolveImagePath()` that on
  mobile reads from `global.assetPaths['../../testAssets/<name>']`
  with the same key fallbacks as `resolveModelPath()`, and on desktop
  returns `test/images/<name>`. Throw with sample asset keys when the
  lookup fails so the failure is a brittle assertion.
- `test/mobile/testAssets/.gitignore`: also ignore `*.jpg`/`*.jpeg`/
  `*.png` so the populated images are not committed.

Made-with: Cursor

* QVAC-17481 docs: README revisions for mobile assets, FP16, topK and prose reflow

- Document new `npm run mobile:copy-prebuilds` flow that populates
  `test/mobile/testAssets/` with prebuilds, the `.gguf.bin` weights blob,
  and the integration test images (fixes mobile ENOENT crash).
- Replace the obsolete "Cold start" claim with a "First-call overhead"
  note that reflects the full-pipeline warmup added in `load()` and the
  remaining JS/JIT/decoder/page-cache effects.
- Add a "Why FP16 weights?" subsection capturing the precision-vs-size
  rationale (FP16 matches FP32 accuracy on the validation set; more
  aggressive quantizations degraded noticeably).
- Expand the topK section with a plain-language one-liner.
- Add a runtime trade-off paragraph under "Why a custom GGML graph?":
  GGML CPU is slower than PyTorch/ONNX at this scale, but the absolute
  gap is negligible for a ~2.5 M-param model; larger classifiers would
  need extra graph-level optimisation.
- Fix `funetuned` -> `fine-tuned` typo.
- Reflow paragraphs to single lines so markdown viewers can soft-wrap.

Made-with: Cursor

* QVAC-17481 fix(graph): validate GGUF num_classes and assert output shape (review #1727)

Addresses two `[BUG]` review comments from @olyasir on tetherto/qvac#1727
about the hardcoded `kNumClasses = 3` not being validated against either
the loaded GGUF's `mobilenet.num_classes` metadata or the actual element
count of the constructed output tensor. Both are downstream-safety
problems for the per-inference path:

  float logits[graph::kNumClasses] = {0.0F};
  ggml_backend_tensor_get(impl_->compute.output, logits, 0, sizeof(logits));

`sizeof(logits)` is fixed at compile time. With a mismatched GGUF, this
either reads OOB (numClasses < kNumClasses) or silently truncates
(numClasses > kNumClasses); on the FC-weight-upload side the
`classifier.3.weight = [1024, kNumClasses]` shape would also fail to
match the GGUF tensor and corrupt the classifier.

Changes:

1. addon/src/model-interface/MobileNetGraph.cpp -- graph::loadWeights()

   Right after reading `numClasses` from `mobilenet.num_classes`,
   compare against `kNumClasses` and `throw StatusError(InvalidArgument, ...)`
   with a descriptive message (actual vs expected count, plus a hint to
   rebuild the addon or use a matching GGUF). This is the primary fix
   olyasir requested in `MobileNetGraph.cpp`.

   The error path is reachable from `ClassificationModel::load()`'s call
   to `graph::loadWeights(...)`, which already runs inside the JS-side
   `await classifier.load()` Promise; the `StatusError(InvalidArgument)`
   propagates as a structured rejection on the JS side, matching how
   every other config-time validation error in this addon surfaces.

2. addon/src/model-interface/MobileNetGraph.cpp -- graph::buildGraph()

   At the end of the graph build, before we hand the
   `ComputeGraph::output` tensor over to the backend allocator, assert
   `ggml_nelements(cg.output) == kNumClasses` and `raise(...)` (which
   throws `StatusError(InternalError, ...)`) if the invariant is
   violated. This is the defence-in-depth fix olyasir requested in the
   second `[BUG]` comment in `ClassificationModel.cpp`: it makes the
   12-byte stack-array `ggml_backend_tensor_get` read provably safe
   regardless of how the output tensor was constructed.

   This second check is not redundant with #1: it also catches a future
   accidental edit to the classifier wiring above (where the tail
   `classifier.3` linear is what determines the output element count),
   an upstream ggml change to how `mul_mat` shapes its result, or a
   GGUF that lacks the `mobilenet.num_classes` metadata key entirely
   and falls back to `kNumClasses` but ships mismatched FC weights.
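
The two checks can be sketched as below. This is a simplified stand-in: `std::invalid_argument` and `std::logic_error` replace the addon's `StatusError(InvalidArgument, ...)` and `StatusError(InternalError, ...)`, the function names are hypothetical, and the element count is passed in as a plain integer rather than read via `ggml_nelements`.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>

constexpr std::int64_t kNumClasses = 3;  // compile-time class count

// Check 1 (load time): GGUF metadata must match the compiled-in count.
void checkMetadataNumClasses(std::int64_t numClasses) {
  if (numClasses != kNumClasses) {
    throw std::invalid_argument(
        "GGUF mobilenet.num_classes is " + std::to_string(numClasses) +
        " but this addon was built for " + std::to_string(kNumClasses) +
        "; rebuild the addon or use a matching GGUF");
  }
}

// Check 2 (graph build): the output tensor must hold exactly kNumClasses
// elements, so a fixed float[kNumClasses] read cannot overrun or truncate.
void checkOutputElements(std::int64_t outputElements) {
  if (outputElements != kNumClasses) {
    throw std::logic_error(
        "classifier output has " + std::to_string(outputElements) +
        " elements, expected " + std::to_string(kNumClasses));
  }
}
```

With the bundled weights both calls are no-ops; they only fire for a mismatched GGUF or a miswired graph.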

Local validation on win32-x64:

- 15/15 C++ unit tests pass (BnEpsilonGuard, classification graph
  determinism, preprocessor suite -- they all exercise the validated
  load + build paths against the bundled FP16 GGUF, where
  `num_classes == 3` so neither check fires).
- 14/14 JS integration tests pass, 140/140 asserts (no behaviour
  change for the supported model; new error paths are unreachable
  with the bundled weights).

Files: packages/qvac-lib-infer-ggml-classification/addon/src/model-interface/MobileNetGraph.cpp
Made-with: Cursor

* QVAC-17481 fix(preprocess): pre-decode size check via stbi_info_from_memory (review #1727)

Addresses jesusmb1995's review comment on tetherto/qvac#1727:

> Could we check this before decoding? `stbi_info_from_memory()` would
> let us reject oversized images / total pixel count before
> `stbi_load_from_memory()` allocates

Why it matters: `stbi_load_from_memory` allocates the full decoded RGB
buffer (width * height * 3 bytes) before any caller-provided dimension
limit is enforced. For a 16384x16384 image at the upper edge of
`kMaxImageDimension`, that is ~768 MB of heap allocated before we see
the dimension and reject -- enough to OOM a memory-constrained device
or trigger an oversized free.

`stbi_info_from_memory` parses only the image header (a few hundred
bytes) and reports the dimensions cheaply, so we can reject oversized
inputs up-front. The post-decode dimension check is kept as
belt-and-braces in case `stbi_info` and `stbi_load` ever disagree
(e.g. truncated streams that parse a valid header but fail mid-decode);
it is a correctness check, not the primary OOM defence.

Behaviour:

- If `stbi_info` succeeds and reports dimensions over
  `kMaxImageDimension`, `decodeToRgb` throws
  `StatusError(InvalidArgument, ...)` with the actual reported size in
  the message, before any decode allocation runs.
- If `stbi_info` fails (header could not be parsed), we fall through
  to `stbi_load_from_memory`. That path already throws with
  `stbi_failure_reason()` attached, which is a more user-actionable
  message than a generic "header bad" we would emit ourselves.
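
To illustrate the header-only idea without depending on stb_image, the sketch below reads the dimensions from a PNG IHDR chunk by hand. It is a stand-in for `stbi_info_from_memory`, not the addon's code: the function names are hypothetical, only PNG is handled, and `std::invalid_argument` replaces `StatusError(InvalidArgument, ...)`.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <stdexcept>
#include <string>
#include <vector>

constexpr std::uint32_t kMaxImageDimension = 16384;

struct Dims {
  std::uint32_t width;
  std::uint32_t height;
};

static std::uint32_t readBe32(const std::uint8_t* p) {
  return (std::uint32_t{p[0]} << 24) | (std::uint32_t{p[1]} << 16) |
         (std::uint32_t{p[2]} << 8) | std::uint32_t{p[3]};
}

// Parses only the first 24 bytes: signature, chunk length, "IHDR", w, h.
std::optional<Dims> probePngDims(const std::vector<std::uint8_t>& bytes) {
  static const std::uint8_t kSig[8] = {0x89, 0x50, 0x4E, 0x47,
                                       0x0D, 0x0A, 0x1A, 0x0A};
  static const std::uint8_t kIhdr[4] = {0x49, 0x48, 0x44, 0x52};
  if (bytes.size() < 24) return std::nullopt;
  for (std::size_t i = 0; i < 8; ++i) {
    if (bytes[i] != kSig[i]) return std::nullopt;
  }
  for (std::size_t i = 0; i < 4; ++i) {
    if (bytes[12 + i] != kIhdr[i]) return std::nullopt;
  }
  return Dims{readBe32(&bytes[16]), readBe32(&bytes[20])};
}

// Rejects oversized images before any decode buffer is allocated.
// An unparsable header falls through so the decoder reports the error.
void rejectOversizedBeforeDecode(const std::vector<std::uint8_t>& bytes) {
  const std::optional<Dims> dims = probePngDims(bytes);
  if (!dims) return;
  if (dims->width > kMaxImageDimension || dims->height > kMaxImageDimension) {
    throw std::invalid_argument("image too large: " +
                                std::to_string(dims->width) + "x" +
                                std::to_string(dims->height));
  }
}

// Test helper: builds a 24-byte PNG prefix with the given dimensions.
std::vector<std::uint8_t> makePngHeader(std::uint32_t w, std::uint32_t h) {
  std::vector<std::uint8_t> b = {0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A,
                                 0x1A, 0x0A, 0x00, 0x00, 0x00, 0x0D,
                                 0x49, 0x48, 0x44, 0x52};
  for (std::uint32_t v : {w, h}) {
    for (int shift = 24; shift >= 0; shift -= 8) {
      b.push_back(static_cast<std::uint8_t>((v >> shift) & 0xFF));
    }
  }
  return b;
}
```

The probe touches 24 bytes of input, whereas a full decode of a 16384x16384 RGB image would allocate 16384 * 16384 * 3 = 805,306,368 bytes first.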

File: packages/qvac-lib-infer-ggml-classification/addon/src/model-interface/ImagePreprocessor.cpp

Validated locally on win32-x64: 14/14 JS integration tests pass.

Made-with: Cursor

* QVAC-17481 test(preprocess): expand ImagePreprocessor unit coverage (review #1727)

Addresses jesusmb1995's review comment on tetherto/qvac#1727:

> Could we add more unit coverage for ImagePreprocessor before merging?
> preprocessor_test.cpp covers some happy paths, but a few public
> functions/branches still look uncovered:
> - decodeToRgb() success/failure paths are not tested directly.
> - preprocessToTensor() is only covered for empty input; it should
>   also cover encoded JPEG/PNG success, raw RGB success, and
>   unsupported non-image input without dimensions.
> - validateRawRgb() is missing empty buffer, zero width/height, and
>   over-kMaxImageDimension cases.
> - normalizeToWhcn() should cover invalid input size.

Adds the following PreprocessorTest cases (14 new tests, taking the
suite from 10 to 24 -- all 29 cases across the addon's two C++ test
binaries pass on win32-x64):

decodeToRgb:
- DecodeToRgbDecodesValidJpeg            -- happy path against test/images/meal_1.jpg
- DecodeToRgbRejectsEmptyBuffer
- DecodeToRgbRejectsCorruptedBytes
- DecodeToRgbRejectsTruncatedJpeg

preprocessToTensor (full pipeline):
- PreprocessToTensorAcceptsEncodedJpeg   -- JPEG happy path with finite-output check
- PreprocessToTensorAcceptsRawRgb         -- raw RGB happy path with finite-output check
- PreprocessToTensorRejectsBmpWithoutDimensions
- PreprocessToTensorRejectsRawWithMissingDims

validateRawRgb edges:
- ValidateRawRgbRejectsEmptyBuffer
- ValidateRawRgbRejectsZeroWidth
- ValidateRawRgbRejectsZeroHeight
- ValidateRawRgbRejectsOverKMaxImageDimensionWidth
- ValidateRawRgbRejectsOverKMaxImageDimensionHeight

normalizeToWhcn:
- NormalizeToWhcnRejectsWrongInputSize

Adds a `readTestImage(name)` helper that walks up from the current
binary location to find `test/images/<name>`, mirroring the
`findWeightsPath()` helper already in
classification_model_test.cpp. JPEG-using tests skip cleanly via
GTEST_SKIP() if the image is not present, so the C++ test suite still
passes when run from a packed tarball that does not include the test
images.

File: packages/qvac-lib-infer-ggml-classification/test/unit/preprocessor_test.cpp
Made-with: Cursor

* QVAC-17481 refactor(model): flatten ClassificationModel::Impl pigeonhole (review #1727)

Addresses jesusmb1995's review comment on tetherto/qvac#1727:

> Why one extra level of indirection with `Impl`? Maybe style, but I
> see no strong benefit and it just scatters the code around and
> makes it harder to track. I would prefer a straightforward class
> where all these variables can be directly under
> `ClassificationModel` private variables.

The PIMPL was originally there to keep ggml types out of the public
header. In practice this header is only included by the addon's own
`AddonJs.hpp`, which already pulls in the entire
qvac-lib-inference-addon-cpp framework, so there is no header-fanout
benefit from hiding ggml. Flattening the impl removes one level of
heap indirection, lets all members be visible at a glance, and lets
clang-tidy / IDE navigation jump straight to the field declarations.

Changes:

1. addon/src/model-interface/ClassificationModel.hpp

   - Pull in `<ggml-backend.h>` and the local `MobileNetGraph.hpp`
     (which exposes `WeightsBundle` / `ComputeGraph` definitions
     used by the new direct members).
   - Replace `struct Impl;` forward declaration and
     `std::unique_ptr<Impl> impl_;` with the eight direct private
     members the Impl previously held: `modelPath_`, `backend_`,
     `weights_`, `compute_`, `labels_`, `numThreads_`, `loaded_`,
     `lastInferenceUs_`. Member ordering is documented in a comment:
     ggml requires every backend buffer to be released BEFORE the
     backend it was allocated on, and `~ClassificationModel`
     enforces that ordering explicitly with `compute_.reset();
     weights_.reset();` before `ggml_backend_free(backend_)`.

2. addon/src/model-interface/ClassificationModel.cpp

   - Remove the `struct ClassificationModel::Impl { ... };`
     definition and the `std::make_unique<Impl>()` from the
     constructor body.
   - Replace every `impl_->X` with `X_` (34 references). No
     functional change.
   - Drop redundant `if (!impl_)` guards in `setNumThreads()`,
     `load()`, `runtimeStats()`, and `process()`. The class is non-
     copyable and non-movable (it carries a `std::mutex` member,
     which suppresses implicit move ctors/assignment), so `impl_`
     was always non-null between construction and destruction;
     the guards were dead code.

Local validation on win32-x64:

- `bare-make build` clean (warnings unchanged from before refactor;
  no new errors).
- `npm run test:cpp` -- 29/29 tests pass (3 ClassificationModelTest +
  24 PreprocessorTest + 1 BnEpsilonGuard + 1 architecture sanity).
- `npm run test:integration` -- 14/14 tests pass, 140/140 asserts.

Files: packages/qvac-lib-infer-ggml-classification/addon/src/model-interface/ClassificationModel.hpp
  packages/qvac-lib-infer-ggml-classification/addon/src/model-interface/ClassificationModel.cpp
Made-with: Cursor

* QVAC-17481 refactor(addon,binding): single-place arg validation in C++ AddonJs (review #1727)

Addresses jesusmb1995's review comments on tetherto/qvac#1727:

> Why normalizing here instead of just throwing at `AddonJs` and
> having a central place where to do the validation? I had previous
> conversations with Gianfranco (and Nidhin) on LLM we agreed it
> makes sense to do parsing/validation in one place, namely at AddonJs
> construction, and throw there if wrong/invalid arguments directly
> at c++.
>
> For construction/config arguments, `createInstance()` should be the
> place that parses and validates the JS values before building the
> native model: model path, threads, and any other config should
> either produce a valid C++ configuration or throw immediately
> there. That keeps the JS wrapper thin and avoids having two
> different sources of truth for what is valid.
>
> For per-call image arguments, the same principle applies at the
> native job boundary before `ClassificationModel`: parse the JS
> input once, construct an explicit validated `ClassifyInput`, and
> then let the model/preprocessor operate on that clean shape. That
> removes the duplicated JS normalization/magic-byte checks and
> avoids relying on weak `0` sentinel values for "not provided".

Changes:

1. addon/src/model-interface/ClassificationModel.hpp

   - Replace the three sentinel-zero dimension fields (`width = 0`,
     `height = 0`, `channels = 0`, each overloaded as "not provided")
     with an explicit `std::optional<RawRgbDims>` member that captures
     the "is the input raw RGB or encoded?" decision in a type the
     compiler can check.
   - `topK = 0` stays as a sentinel only because it has a meaningful
     "no filter" interpretation; non-zero values are validated > 0 at
     the binding boundary.

2. addon/src/model-interface/ClassificationModel.cpp

   - Translate `optional<RawRgbDims>` -> the existing
     `(declaredWidth, declaredHeight, declaredChannels)` triplet
     consumed by `preprocess::preprocessToTensor`. The preprocessor's
     internal "0 means not-provided" convention is preserved (it is
     a private API; the JS-facing one is the explicit optional).
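
A sketch of the shape described in changes 1 and 2. The field types and the helper name `toDeclaredTriplet` are assumptions; only `RawRgbDims`, `rawRgb`, `topK`, and the "0 means not provided" convention inside the preprocessor come from the commit.

```cpp
#include <cstdint>
#include <optional>
#include <tuple>

// Assumed field types; the commit does not spell them out.
struct RawRgbDims {
  std::uint32_t width;
  std::uint32_t height;
  std::uint32_t channels;
};

struct ClassifyInput {
  std::optional<RawRgbDims> rawRgb;  // nullopt => encoded image (JPEG/PNG)
  std::uint32_t topK = 0;            // 0 => no filter
};

// Hypothetical translation to the preprocessor's private triplet, where
// (0, 0, 0) still means "dimensions not provided".
std::tuple<std::uint32_t, std::uint32_t, std::uint32_t> toDeclaredTriplet(
    const ClassifyInput& input) {
  if (!input.rawRgb) return {0, 0, 0};
  return {input.rawRgb->width, input.rawRgb->height, input.rawRgb->channels};
}
```

The sentinel survives only behind this single translation point, so callers of the public struct never have to interpret a zero.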

3. addon/src/addon/AddonJs.hpp

   - `createInstance` now validates:
       * `path` must be a non-empty string,
       * `config.threads` (when provided) must be a positive integer.
     These were previously not enforced; non-positive thread counts
     would have silently passed through to libggml and raw negatives
     would int-truncate.
   - `runJob` is now the single source of truth for per-call
     validation:
       * `content` rejection message rephrased to include the
         substring "required" so the JS test
         `t.exception.all(..., /required|null|undefined/i)` keeps
         passing without relying on a separate JS-side TypeError.
       * Dimension triplet enforcement: caller must provide either
         all of {width, height, channels} or none of them; partial
         shapes are rejected with an explicit message rather than
         leaking through as a buffer-size mismatch downstream.
       * Each dim is range-checked as int32_t before being committed
         to ClassifyInput's optional<RawRgbDims>, so a negative
         JS Number cannot wrap to ~4 billion via uint32_t cast and
         tunnel into validateRawRgb.
       * `topK` is range-checked > 0 if provided.
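
The all-or-none rule and the per-dimension range check could be sketched as follows. The names `parseDims` and `checkedDim` are hypothetical, JS Numbers are modelled as `double`, `std::invalid_argument` stands in for the addon's error type, and rejecting zero here is an assumption (the commit only states that negatives must not wrap).

```cpp
#include <cmath>
#include <cstdint>
#include <limits>
#include <optional>
#include <stdexcept>
#include <string>

struct RawRgbDims {
  std::uint32_t width;
  std::uint32_t height;
  std::uint32_t channels;
};

// Validates in the signed int32 range BEFORE converting to unsigned, so a
// negative value can never wrap to a huge uint32_t.
std::uint32_t checkedDim(double value, const char* name) {
  const double maxInt32 =
      static_cast<double>(std::numeric_limits<std::int32_t>::max());
  if (!std::isfinite(value) || value != std::floor(value) || value < 1.0 ||
      value > maxInt32) {
    throw std::invalid_argument(std::string(name) +
                                " must be a positive 32-bit integer");
  }
  return static_cast<std::uint32_t>(value);
}

// Either all three dimensions are present (raw RGB) or none (encoded).
std::optional<RawRgbDims> parseDims(std::optional<double> width,
                                    std::optional<double> height,
                                    std::optional<double> channels) {
  const int provided = static_cast<int>(width.has_value()) +
                       static_cast<int>(height.has_value()) +
                       static_cast<int>(channels.has_value());
  if (provided == 0) return std::nullopt;
  if (provided != 3) {
    throw std::invalid_argument(
        "width, height and channels must be provided together");
  }
  return RawRgbDims{checkedDim(*width, "width"), checkedDim(*height, "height"),
                    checkedDim(*channels, "channels")};
}
```

Rejecting partial shapes at this boundary turns what would otherwise surface as a buffer-size mismatch downstream into an explicit argument error.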

4. test/unit/classification_model_test.cpp

   - Migrate the three `input.width = ...; input.height = ...;
     input.channels = ...;` blocks to the new
     `input.rawRgb = qcc::RawRgbDims{...};` shape. No behavioural
     change.

5. index.js

   - Strip every JS-side validation helper that duplicated C++ work:
     `assertBuffer`, `normaliseDimensionOptions`, `isSupportedEncoded`,
     `startsWith`, `JPEG_MAGIC`, `PNG_MAGIC`. The classify() body now
     literally builds `{ type, content, [width, height, channels,
     topK] }` from the caller's arguments and forwards to the
     binding.
   - Lifecycle checks (`!this._addon || !this.state.configLoaded`)
     and the file-existence check in `load()` stay in JS:
       * lifecycle is a JS-managed state, not a value-shape
         question;
       * the existence-check delivers a more actionable error
         message ("MobileNet GGUF weights not found at: <path>")
         than letting the load reach C++ and throw "Failed to open
         GGUF file: <path>" downstream.
   - Module-level comment documents the JS-as-thin-pass-through
     contract so a future contributor cannot re-introduce the
     duplicated validation by mistake.
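
A thin pass-through of this kind might look roughly like the following. This is a sketch, not the code in `index.js`: `buildClassifyArgs` is a hypothetical helper name and the `'image'` type value in the usage note is a placeholder.

```javascript
// Sketch of a thin pass-through: copy only the fields the caller supplied
// and leave every value-shape check to the native binding.
// buildClassifyArgs is a hypothetical helper name.
function buildClassifyArgs (type, content, options = {}) {
  const args = { type, content }
  for (const key of ['width', 'height', 'channels', 'topK']) {
    if (options[key] !== undefined) args[key] = options[key]
  }
  return args
}
```

For example, `buildClassifyArgs('image', buffer, { topK: 3 })` yields an object with only `type`, `content`, and `topK`; an out-of-range value such as `width: -1` is forwarded untouched so that the binding is the one to reject it.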

Local validation on win32-x64:

- `bare-make build` clean.
- `npm run test:cpp` -- 29/29 (incl. the migrated raw-RGB
  ClassificationModelTest cases).
- `npm run lint` -- clean.
- `npm run test:integration` -- 14/14 tests, 140/140 asserts. All
  existing brittle regex matchers in `error-cases.test.js`
  (`/required|null|undefined/i`, `/empty/i`, `/format|invalid/i`,
  `/decode|jpeg|invalid/i`, `/match|size|width|height|raw/i`,
  `/format|jpeg|png|bmp/i`, `/not loaded|load\(\)/i`,
  `/not loaded|destroyed|state/i`) match the new C++-issued error
  messages, so no test regex needed updating.

Files: packages/qvac-lib-infer-ggml-classification/addon/src/addon/AddonJs.hpp
  packages/qvac-lib-infer-ggml-classification/addon/src/model-interface/ClassificationModel.hpp
  packages/qvac-lib-infer-ggml-classification/addon/src/model-interface/ClassificationModel.cpp
  packages/qvac-lib-infer-ggml-classification/test/unit/classification_model_test.cpp
  packages/qvac-lib-infer-ggml-classification/index.js
Made-with: Cursor

* QVAC-17481 chore(test,docs): post-sync audit follow-ups (consistency + uniform url strip + readme)

Picks up the lower-risk consistency / correctness items from the
post-sync self-audit. None of these change observable behaviour;
they remove duplication and small footguns that would otherwise
surface as drift in future maintenance.

1. test/integration/utils.js -- single source of truth for the mobile
   asset-key heuristic + uniform `file://` strip.

   - Extract `_resolveMobileAsset(filename)` from the two
     duplicate-by-design loops in `resolveModelPath()` and
     `_resolveImagePath()`. Both used the same four-element
     candidate-key array (`../../testAssets/${name}`,
     `../mobile/testAssets/${name}`, `testAssets/${name}`,
     `../testAssets/${name}`); future framework key-shape changes
     now land in one place instead of being silently inconsistent.

   - Extract `_stripFileUrlPrefix(mapped)` and switch from
     `mapped.slice('file://'.length)` to
     `mapped.replace(/^file:\/\//, '')`. For a triple-slash
     `file:///abs/...` URL both forms yield `/abs/...`, which is the
     correct absolute path on POSIX-mobile (a hypothetical
     Windows-mobile target would still need its own handling of the
     leading `/`). The regex form is additionally a no-op when the
     prefix is absent, so it is safe on either shape of input.

   - Add `makeClassifier(overrides)` -- the standard test-instance
     factory. Centralises model-path + logger wiring so any future
     constructor-arg change in the addon lands in one place
     instead of N inline `new ImageClassifier(...)` callsites.
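
The asset-key lookup and the prefix strip described above could be sketched as follows. The candidate keys and the regex are taken from this message; `assetMap` is an assumed stand-in for whatever lookup the mobile harness exposes, and the function bodies are illustrative rather than the code in `utils.js`.

```javascript
// Hypothetical sketch of the helpers described above. `assetMap` stands in
// for the mobile harness lookup; it is an assumption of this example.
function candidateKeys (name) {
  return [
    `../../testAssets/${name}`,
    `../mobile/testAssets/${name}`,
    `testAssets/${name}`,
    `../testAssets/${name}`
  ]
}

function stripFileUrlPrefix (mapped) {
  // Removes a leading "file://" if present; any other string is unchanged.
  return mapped.replace(/^file:\/\//, '')
}

function resolveMobileAsset (assetMap, name) {
  // Try each key shape in order and return the first mapped path found.
  for (const key of candidateKeys(name)) {
    const mapped = assetMap[key]
    if (typeof mapped === 'string') return stripFileUrlPrefix(mapped)
  }
  return null
}
```

Keeping the key list in one function means both the model lookup and the image lookup pick up a key-shape change at the same time.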

2. test/integration/classify.test.js + error-cases.test.js -- adopt
   the shared factory.

   - classify.test.js drops the inline
     `new ImageClassifier({ modelPath: resolveModelPath(),
     logger: createLogger() })` (4 callsites) in favour of
     `makeClassifier()`. Imports trimmed accordingly: drops
     `ImageClassifier`, `createLogger`, `resolveModelPath` from
     the destructure (unused after refactor; standardjs would
     have flagged them anyway).

   - error-cases.test.js drops its local `makeClassifier()` (which
     was a duplicate of what now lives in utils.js) and imports
     the shared one. Net: -1 module-level function.
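
A shared factory along these lines is one way to read the description above. In this sketch the three dependencies are injected so the example stays self-contained; in the real `utils.js` they are presumably module-level imports, and `createMakeClassifier` is a hypothetical wrapper that exists only for this example.

```javascript
// Sketch of a shared test-instance factory. Dependencies are injected here
// only to keep the example self-contained (an assumption of this sketch).
function createMakeClassifier ({ ImageClassifier, resolveModelPath, createLogger }) {
  return function makeClassifier (overrides = {}) {
    return new ImageClassifier({
      modelPath: resolveModelPath(),
      logger: createLogger(),
      // Caller-supplied fields win over the defaults above.
      ...overrides
    })
  }
}
```

With the defaults wired in one place, a test that needs a non-default option passes only that option, for example `makeClassifier({ modelPath: '/some/other.gguf' })`.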

3. README.md -- fix the `**threads**` markdown bullet.

   The line `- \`**threads**\` -- ...` wraps the bold markers in
   backticks, which renders the asterisks literally inside an
   inline-code span (`**threads**` instead of bold **threads**).
   Bare-renderable replacement: `- **\`threads\`** -- ...` reads
   as bold inline-code, matching the intent of the surrounding
   bullets. This was a pre-existing bug noted as "out-of-scope"
   in the line-reflow pass but is trivial to fix.

Local validation on win32-x64:

- `npm run lint` clean.
- `npm run test:cpp` -- 29/29 (no behavioural change, just
  end-to-end smoke that the test-utils refactor did not break the
  C++ harness paths).
- `npm run test:integration` -- 14/14, 140/140 asserts (run twice
  to confirm; one in-between-test SIGSEGV observed on the first
  run is the known upstream `OutputCallBackJs` UAF the hack
  branch deliberately leaves un-papered-over, not caused by this
  commit).

Files: packages/qvac-lib-infer-ggml-classification/test/integration/utils.js
  packages/qvac-lib-infer-ggml-classification/test/integration/classify.test.js
  packages/qvac-lib-infer-ggml-classification/test/integration/error-cases.test.js
  packages/qvac-lib-infer-ggml-classification/README.md
Made-with: Cursor

* QVAC-17481 chore: rename addon directory to packages/classification-ggml

Aligns the addon's directory and CI-workflow filenames with the
published package name (`@qvac/classification-ggml`) so that the
folder and the npm scope read consistently. Per a reviewer-style
naming convention request:

    Package name: @qvac/classification-ggml
    Addon folder: classification-ggml

Renames (53 files via `git mv`, all rename detection clean -- 31
insertions / 31 deletions across 54 files):

  packages/qvac-lib-infer-ggml-classification/
      -> packages/classification-ggml/

  .github/workflows/integration-mobile-test-qvac-lib-infer-ggml-classification.yml
      -> .github/workflows/integration-mobile-test-classification-ggml.yml
  .github/workflows/integration-test-qvac-lib-infer-ggml-classification.yml
      -> .github/workflows/integration-test-classification-ggml.yml
  .github/workflows/prebuilds-qvac-lib-infer-ggml-classification.yml
      -> .github/workflows/prebuilds-classification-ggml.yml

In-file text updates (paths only -- no functional change):

  - All four workflows (`integration-mobile-test-classification-ggml.yml`,
    `integration-test-classification-ggml.yml`,
    `prebuilds-classification-ggml.yml`, plus the hack-branch
    `on-pr-qvac-lib-infer-llamacpp-llm.yml`) now reference the new
    `packages/classification-ggml/**` path filter,
    `PKG_DIR=packages/classification-ggml` env, the renamed sibling
    workflow filenames, and the new `addon/packages/classification-ggml`
    `ADDON_WORKDIR` for the mobile harness.
  - `packages/classification-ggml/CMakeLists.txt` -- `project(...)`,
    `add_bare_module(...)`, and every `${...}` target reference
    renamed to `classification-ggml`. The bare module's output
    filename (`qvac__classification-ggml.bare`) is unchanged because
    bare derives it from `package.json` `name` (`@qvac/classification-ggml`),
    not from the CMake project name.
  - `packages/classification-ggml/package.json` -- repository.directory,
    homepage URL.
  - `packages/classification-ggml/README.md`, `index.js`, and
    `docs/onnx-to-gguf-conversion.md` -- doc paths.

Deliberately NOT renamed (out of scope -- code-level identifiers,
not file paths):

  - C++ namespace `qvac_lib_infer_ggml_classification` (8 files).
    Other addons in this monorepo do NOT tie their C++ namespace to
    the folder name (e.g. `qvac::ttslib::lavasr` lives under
    `packages/qvac-lib-infer-onnx-tts/`), so the namespace is a
    code-style choice rather than a path-consistency one. Can be
    folded into a follow-up if reviewers want full consistency
    there too.

Local validation on win32-x64 (in the renamed
`packages/classification-ggml/` directory):

  - `npm install` clean.
  - `bare-make generate` + `bare-make build` + `bare-make install`
    succeed; `qvac__classification-ggml.bare` produced under
    `prebuilds/win32-x64/` (filename unchanged).
  - `npm run lint` clean.
  - `npm run test:cpp` 29/29.
  - `npm run test:integration` 14/14, 140/140 asserts (perf-report
    correctly written under
    `packages/classification-ggml/test/results/`).

Made-with: Cursor

* QVAC-17481 fix(addon,test): align upstream-bug workarounds with monorepo convention

Two upstream issues block the addon's CI without local mitigations. Both
are paper-trailed in detail in `remote_logs/issues_report.md` (gitignored,
internal). Inline comments at the workaround sites are kept short to match
how other addons in the monorepo handle the same races.

1. `OutputCallBackJs` use-after-free race
   ----------------------------------------
   `qvac_lib_inference_addon_cpp::~OutputCallBackJs` deletes JS refs
   synchronously while `uv_close` on its async handle is asynchronous
   (queue/OutputCallbackJs.hpp:48-58); a `uv_async_send` queued just
   before destruction fires against dead refs and crashes in
   `js_open_handle_scope`. Reproduced as SIGSEGV (linux-x64/-arm64,
   darwin-arm64), `Fatal signal 11` (Android logcat), and
   `EXC_BAD_ACCESS @ 0x1a0` (iOS crash report) across rapid create/
   destroy cycles.

   Other addons in this monorepo paper over the same race in their
   integration suites with sleep-around-unload, e.g.
     ocr-onnx/test/integration/lifecycle.test.js:56,85,115
     ocr-onnx/test/integration/full-ocr-suite.test.js:107,115,123
     qvac-lib-infer-llamacpp-llm/test/integration/sliding-context.test.js:163,355

   We adopt the same pattern via `cleanupClassifier()` in
   `test/integration/utils.js` (two-phase: 500-1000ms pre-unload
   yield + 2000-3000ms post-unload drain). The pre-unload yield is
   required for our addon specifically because `await classify()`
   resolves on the first `Output` event while the worker thread
   keeps queuing follow-up events (`RuntimeStats`,
   `JobCompleted`); without it the follow-ups land DURING
   `~OutputCallBackJs`. Every classify() call in the integration
   tests was migrated to `cleanupClassifier()`.

   The removed local C++ wrapper (`DeferredOutputCallBackJs`) was
   a real lifetime fix but kept us out of step with how the rest
   of the monorepo handles this; once upstream is patched the
   sleeps drop everywhere at once.
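
   The two-phase pattern might look roughly like this. It is a sketch
   under assumptions: the default delays use the lower bounds quoted
   above, while the option names and the injectable `wait` function are
   invented for the example and are not taken from `utils.js`.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms))

// Sketch of a two-phase cleanup. preMs / postMs / wait are hypothetical
// option names; the defaults are the lower bounds quoted in the text.
async function cleanupClassifier (classifier, opts = {}) {
  const { preMs = 500, postMs = 2000, wait = sleep } = opts
  // Phase 1: give queued follow-up events time to be delivered.
  await wait(preMs)
  try {
    await classifier.unload()
  } finally {
    // Phase 2: let the asynchronous handle close finish before moving on,
    // even if unload() itself rejected.
    await wait(postMs)
  }
}
```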

2. Win32-x64 first-`js_create_double` returns 0.0
   ----------------------------------------------
   The very first `js_create_double` call in the process returns
   0.0 on the Azure GitHub-hosted `windows-2022` runner (clang-cl
   + bare-runtime + V8). Subsequent calls in the same handle scope
   are correct. No local Windows repro; only the CI runner image
   is affected.

   Other addons accidentally dodge the symptom because their first
   emitted number is naturally 0 (whisper/parakeet
   `segment.start`), they assert only `typeof === 'number'` /
   `!isNaN` (llamacpp-llm stats), they never assert the value
   (ocr-onnx bbox coords), or they emit no numbers at all
   (lib-infer-diffusion / llamacpp-embed). Our 3-class softmax
   sort + sum-to-1 assertions catch the corruption immediately, so
   no test-side workaround is possible.

   Local C++ "burn one" workaround in `JsClassifyOutputHandler`'s
   lambda preamble: a throwaway `js_create_double(env, 0.0,
   &dummy)` call consumes the broken first slot so the per-element
   `Number::create` calls below produce the correct value at index
   0. Cost is one ephemeral js_number per classify() call.

Other follow-ups in this commit (none disturb code paths above):

  - `addon.js` lifecycle: `unload()` no longer waits on the
    pending-job promise. The post-unload sleep in
    `cleanupClassifier` covers the same window, so `unload()`
    becomes a thin pass-through (matches what every other addon
    in the monorepo does).
  - Top-of-file workaround comment in `AddonJs.hpp` consolidated
    to a 2-line note at the burn-one site (matches the comment
    density other addons use; full root cause in the report).
  - `cleanupClassifier` doc trimmed to 3 lines pointing at the
    report.

Local validation on win32-x64:
  - bare-make build clean
  - npm run lint clean
  - npm run test:cpp 29/29
  - npm run test:integration 14/14 + 140/140 asserts

Files: packages/classification-ggml/addon.js
  packages/classification-ggml/addon/src/addon/AddonJs.hpp
  packages/classification-ggml/addon/src/js-interface/binding.cpp
  packages/classification-ggml/test/integration/classify.test.js
  packages/classification-ggml/test/integration/error-cases.test.js
  packages/classification-ggml/test/integration/utils.js
Made-with: Cursor

* QVAC-17481 chore: adopt upstream WA fixes from PR #1825

Bumps qvac-lib-inference-addon-cpp from 1.1.5#1 to 1.1.6 (the version
shipped by PR #1825) and removes the two local workarounds that the
upstream fixes make redundant:

- Win32 burn-one js_create_double in JsClassifyOutputHandler is gone;
  upstream's JsUtils::Number::createDouble now applies a process-wide
  burn-once guard via static-init.
- Two-phase sleep around unload() in cleanupClassifier is gone;
  upstream's ~OutputCallBackJs now defers js_delete_reference into the
  uv_close callback via a heap-owned State.

Local Win32 validation: 14/14 integration tests + 29/29 C++ unit
tests pass; in particular the index-0 marshalling assertions and the
back-to-back load/unload cycle test that previously SIGSEGV'd both
pass without their prior workarounds.

Resolves T1 + T10 from the audit; details in remote_logs/issues_report.md.

Made-with: Cursor

* QVAC-17481 chore[api]: align lifecycle with llamacpp-llm pattern

Re-shape the JS layer so request orchestration mirrors the LLM addon
(closes T5-T9 from PR #1727 review):

- addon.js becomes a thin C++ binding wrapper (mirrors LlamaInterface):
  constructor takes `(binding, configurationParams, outputCb, logger)`,
  exposes `activate()` / `runJob()` / `cancel()` / `unload()`. The
  bespoke `_pending` Promise + `_outputCallback` are gone; export a
  shared `mapAddonEvent(rawEvent, rawData, rawError)` instead.
- index.js becomes the orchestration layer (mirrors LlmLlamacpp): one
  `exclusiveRunQueue()` serialises load/classify/unload, one
  `createJobHandler()` owns the active QvacResponse, and the output
  callback fans events through `_handleAddonOutputEvent`.
- load() now does try/catch around `activate()` and best-effort
  `_addon.unload()` on failure so a partial init never leaves a
  zombie native handle (T6).
- classify() resolves on the terminal stats event rather than the
  first ClassifyOutput, eliminating the orphan-callback risk that
  motivated the `_pending` drain on the previous design (T7, T8).
  Public shape unchanged: still `Promise<Array<{label,confidence}>>`.
- unload() runs through the same queue, calls native `cancel()` on
  in-flight work, fails the active JS request with `Model was unloaded`,
  then destroys the native handle (T9).
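
To make the serialisation and the failed-load cleanup concrete, here is a generic sketch. It is an assumption about what an exclusive run queue does, not the `exclusiveRunQueue()` implementation the addon uses; `createExclusiveQueue` and `activateOrUnload` are hypothetical names.

```javascript
// Generic promise-chain serialiser: each task starts only after the
// previous one has settled. Not the addon's actual implementation.
function createExclusiveQueue () {
  let tail = Promise.resolve()
  return function run (task) {
    const result = tail.then(() => task())
    // Keep the chain alive even if a task rejects.
    tail = result.catch(() => {})
    return result
  }
}

// Best-effort cleanup when activation fails, so a partial init does not
// leave a native handle behind. The original error is always rethrown.
async function activateOrUnload (addon) {
  try {
    await addon.activate()
  } catch (err) {
    try {
      await addon.unload()
    } catch (_) {
      // Ignore cleanup failures; the activation error is the one to report.
    }
    throw err
  }
}
```

Routing load, classify, and unload through the same `run` function is what guarantees that an unload can never interleave with a half-finished load.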

mapAddonEvent is keyed on payload shape (Array → Output, plain object
→ JobEnded terminal) because the upstream JobRunner emits the stats
trailer with a raw `std::vector<std::pair<...>>` RTTI name rather than
a literal `*JobEnded` event. Documented inline.
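
A shape-keyed mapping of this kind could be sketched as below. Only the two rules (Array to Output, plain object to JobEnded) come from the description above; the returned object layout and the `Error` / `Unknown` types are assumptions made for illustration.

```javascript
// Sketch of shape-keyed event mapping. The return layout and the
// 'Error' / 'Unknown' types are assumptions of this example.
function mapAddonEvent (rawEvent, rawData, rawError) {
  if (rawError) return { type: 'Error', error: rawError }
  // An array payload is treated as classification output.
  if (Array.isArray(rawData)) return { type: 'Output', data: rawData }
  // Any other non-null object is treated as the terminal stats trailer.
  if (rawData !== null && typeof rawData === 'object') {
    return { type: 'JobEnded', data: rawData }
  }
  return { type: 'Unknown', event: rawEvent, data: rawData }
}
```

Because the decision never inspects `rawEvent`, the mapping stays stable even when the upstream event name is an opaque RTTI string.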

Local validation: 14/14 integration + 140/140 asserts in 2.8s
(down from 8.2s in Group A — the LLM-style cancel/unload is much
faster than the prior drain-then-destroy pattern); 29/29 C++ unit
tests; standard lint clean.

Made-with: Cursor

* QVAC-17481 infra: add canonical on-pr + on-pr-close workflows for classification-ggml

Adds the two missing top-level workflow files so the addon now has the
full 5-file layout used by every other modern addon in the monorepo
(`decoder-audio`, `diffusion-cpp`, `ocr-onnx`, `bci-whispercpp`):

- `on-pr-classification-ggml.yml` -- canonical PR trigger router.
  authorize -> changes -> sanity / ts-checks / cpp-lint / prebuild ->
  integration / mobile -> merge-guard. Path filters scope to
  `packages/classification-ggml/**` and the addon's own workflow files.
- `on-pr-close-classification-ggml.yml` -- mirror of
  `on-pr-close-decoder-audio.yml`. Triggers `public-delete-npm-versions`
  with `packages: classification-ggml` to clean up per-PR npm pre-releases
  on PR close.

Closes T11 from PR #1727 review (olyasir: "rename in same format as other
pipelines"). The legacy-named `on-pr-qvac-lib-infer-ggml-classification.yml`
on the fork PR-1 branch will be removed at sync-to-PR-1 time.

The hack-branch dispatch swap (`on-pr-qvac-lib-infer-llamacpp-llm.yml`
hijacked + `*-temp.yml` parking) is intentionally left untouched here:
new workflows aren't dispatchable from the GitHub Actions UI until they
exist on `main`, so the swap is still our only working dispatch path
for hack-branch CI runs.

Validation: both files parse with `yaml.safe_load`; every workflow /
composite-action reference resolves on disk.

Co-authored-by: Cursor <cursoragent@cursor.com>

* QVAC-17481 doc: trim verbose AI-style comments across the addon

Closes T2/T3/T4 from PR #1727 (jesusmb1995: "Please remove this
comment, its unnecessary... LLM's are too verbose"), and applies the
same four cleanup rules across the rest of …
Proletter pushed a commit that referenced this pull request May 24, 2026
…c GGML backends (#2124)

* transcription-whispercpp: bump to 0.7.1 with whisper-cpp 1.8.4.3#1 (QVAC-18993)

Pull in the consolidated vcpkg PR (whisper-cpp 1.8.4.3#1 +
ggml-speech 2026-05-18#1) that covers four Asana tickets:

- QVAC-18991: whisper.cpp upstream-sync from ggml-org/master to
  v1.8.4.3.  Adds upstream's VAD streaming API
  (whisper_vad_detect_speech_no_reset, whisper_vad_reset_state)
  with a regression test, the macOS Vulkan persistent-pipeline
  cache, and various BCI / bindings fixes.
- QVAC-18300: enables OpenCL on Whisper for Android, gated
  behind a new `opencl` feature.  This package now declares an
  android-only `opencl` feature that wires through to the
  whisper-cpp port's opencl feature, so a transcription addon
  built for android-arm64 can ship the Adreno backend without
  forcing it on non-Adreno consumers.
- QVAC-18992: rebases the speech-stack ggml (qvac-ext-ggml@speech)
  onto the same upstream v0.10.2 baseline that whisper.cpp's
  bundled ggml uses, so the QVAC speech stack (whisper +
  parakeet + tts-cpp) consumes a coherent ggml API surface.
  No direct dependency from this package -- transitive via
  other speech-stack addons sharing the Android process.
- QVAC-18993: switches the Android build to pure
  dynamic-backend mode: GGML_BACKEND_DL=ON + GGML_CPU_ALL_VARIANTS=ON
  on both the whisper-cpp port and ggml-speech port, so the
  addon's .bare prebuild ships one libggml-cpu-android_armv*_*.so
  per microarchitecture plus dynamically-loaded
  libggml-vulkan.so / libggml-opencl.so.  ggml's loader picks
  the highest-feature CPU variant (armv9.2_2 .. armv8.0_1) plus
  the right GPU backend (Adreno 700+ -> OpenCL, everything else
  -> Vulkan) at runtime, so a single APK serves the whole device
  matrix without per-device builds.

vcpkg-configuration.json is TEMPORARILY pointed at
Zbig9000/qvac-registry-vcpkg.git @ b5a5e199 (= QVAC-vcpkg-speech-stack-android-dynamic-backend
HEAD on Zbig9000's fork) because the consolidated port versions
don't exist on tetherto/main yet.  Once the vcpkg PR lands the
default-registry block must be re-pointed back to
https://github.com/tetherto/qvac-registry-vcpkg.git with the
post-merge tetherto/main SHA as baseline.

Device farm: the Asana ticket asks for GPU testing on mobile to verify
S25 picks OpenCL and Pixel 9 picks Vulkan.  Those tests live
outside this addon (in qvac CI's integration-mobile-test
workflow) and depend on device-farm config that I can't validate
locally; the addon code side is unchanged in this bump (CPU
dispatcher + dynamic backend `.so` files are already wired by
the whisper-cpp port's prebuild output, and the JS layer
already enumerates ggml_backend_devs at init).

* transcription-whispercpp: bump to 0.7.2 with whisper-cpp 1.8.4.3#2 (QVAC-18993)

Picks up the Android per-arch CPU dlopen fallback patch added to the
whisper-cpp port (mirrors qvac-ext-ggml@speech 9562ed04). Without
this, every APK consumer with `useLegacyPackaging=false` (AGP 3.6+
default) would silently lose CPU init: the directory iterator finds
nothing on disk because the native libs stay inside the APK instead
of being extracted, and the existing on-disk filename fallback never
composes the per-arch `libggml-cpu-android_armv*_*.so` names that
`GGML_CPU_ALL_VARIANTS=ON` produces.

Re-pins the Zbig9000/qvac-registry-vcpkg default-registry baseline to
86257dc376ca043c67cc4805ab8d1e74a94b7eda so both whisper-cpp 1.8.4.3#2
and ggml-speech 2026-05-19#0 are reachable.

Co-authored-by: Cursor <cursoragent@cursor.com>

* transcription-whispercpp: bump to 0.7.3 → whisper-cpp 1.8.4.3#3 (QVAC-18993)

Pure follow-up to 0.7.2 -- the two Android dynamic-backend ggml fixes
the 0.7.2 release pulled in via vcpkg patches are now upstreamed as
commits on tetherto/qvac-ext-lib-whisper.cpp PR #26 ("ggml + tts-cpp
Android dynamic-backend overlays") instead of being carried in the
vcpkg port's patches/ tree. It also carries a tts-cpp `<atomic>`
include fix that repairs the parallel speech-stack consumer's build
under the day-2 ggml-speech merge.

Build output is bit-identical to 0.7.2 (whisper-cpp 1.8.4.3#3 SOURCE
== 1.8.4.3#2 SOURCE+PATCHES, verified by hashing all
libggml-cpu-android_armv*_*.so files from the NDK r29 cross-compile).

Registry baseline bumped to 965f5e5a so the new port-version
(1.8.4.3#3) is reachable.

PRs in the cross-repo set:
  whisper.cpp #26 (Zbig9000:QVAC-18993-bundled-ggml-android-dynamic-backend)
  vcpkg #152 (Zbig9000:QVAC-vcpkg-speech-stack-android-dynamic-backend)

Co-authored-by: Cursor <cursoragent@cursor.com>

* transcription-whispercpp: bridge ggml dlopen backends as IMPORTED targets (QVAC-18993)

`bare-make generate` failed on android-arm64 with

    CMake Error: get_target_property() called with non-existent target
    "ggml::ggml-cpu-android_armv8.0_1"  (… 8 backends total)

after enabling `GGML_BACKEND_DL=ON` on the `whisper-cpp` port. With dynamic-
backend mode, ggml builds the per-arch CPU + GPU backends as standalone MODULE
libraries that ggml dlopens at runtime; upstream ggml's `install(TARGETS … EXPORT)`
deliberately skips them, so the consumer's `BACKEND_DL_LIBS` loop in
`CMakeLists.txt` referenced targets that don't exist.

Wrap the existing loop with an `if(NOT TARGET ggml::${_backend})` fallback that
locates the `.so` under `${VCPKG_INSTALLED_PATH}/bin` via `find_library` and
materialises a `SHARED IMPORTED` target locally with `IMPORTED_NO_SONAME=TRUE`,
then bundles it via the existing `INSTALL TARGET` path. Mirrors the pattern that
already ships in `packages/diffusion-cpp` for the same Android-dlopen
build mode.

Static backends (any platform that links ggml in directly) still find their
imported target via ggml-config.cmake on the first branch, so non-Android
prebuilds are byte-identical.

Co-authored-by: Cursor <cursoragent@cursor.com>

* transcription-whispercpp: re-pin baseline to vcpkg PR #152 rebased HEAD 8c6ca188 (QVAC-18993)

tetherto/qvac-registry-vcpkg/main moved forward yesterday with #156
(parakeet-cpp 2026-05-20 + ggml-speech 2026-04-09#2 bumps), so vcpkg
PR #152 was rebased onto the new base 0e75457. Update the default-
registry baseline pointer from the old PR #152 HEAD (dffaaf6) to the
rebased HEAD (8c6ca188) so the version-resolver still finds
`ggml-speech 2026-05-19#3` (now layered on top of the just-landed
2026-04-09#2) and `whisper-cpp 1.8.4.3#3` (unchanged content,
correct SHA512).

No other changes --- the resolver picks up the same final versions
of every package as before, just with the rebased baseline as the
search root.

Co-authored-by: Cursor <cursoragent@cursor.com>

* transcription-whispercpp: consume whisper-cpp 1.8.4.3#4 + ggml-speech 2026-05-19#4 (QVAC-18993, QVAC-18992)

Picks up the MSVC `/I` fix in the spirv-headers include-shim (vcpkg
PR #152 commit 5cd209c) so prebuild / win32-x64 stops dying with
`c1xx: fatal error C1083: Cannot open source file: '.../x64-windows/include'`
on the `whisper-cpp[vulkan]` configure step. The shim now emits the
MSVC-style `/I<path>` on Windows and keeps `-isystem <path>` (with
warning suppression) on GCC/Clang elsewhere.

whisper-cpp override bumped 1.8.4.3#3 -> 1.8.4.3#4.
Default-registry baseline bumped 8c6ca188 -> 5cd209c1.

Co-authored-by: Cursor <cursoragent@cursor.com>

* transcription-whispercpp: wire ENABLE_OPENCL so Android prebuilds ship libggml-opencl.so (QVAC-18300)

The `opencl` feature was declared in `packages/transcription-whispercpp/vcpkg.json`
(gated to `platform: android`) and the `whisper-cpp` port's `opencl` feature
correctly enables `-DGGML_OPENCL=ON` on Android — but the consumer's
`CMakeLists.txt` only appended `"tests"` and `"vulkan"` to
`VCPKG_MANIFEST_FEATURES`. The `opencl` feature was therefore never activated,
so vcpkg resolved `whisper-cpp` without `[opencl]`, ggml was built without
`GGML_OPENCL=ON`, and the `android-arm64` prebuild silently shipped CPU + Vulkan
backends only (no `libggml-opencl.so`) — defeating the entire point of
QVAC-18300.

Add an `ENABLE_OPENCL` option (default `ON` on Android, `OFF` elsewhere — the
`vcpkg.json` feature is `platform: android` gated so non-Android is a no-op
anyway) that appends `"opencl"` to `VCPKG_MANIFEST_FEATURES`. Mirrors the
`SD_OPENCL` pattern in `packages/diffusion-cpp/CMakeLists.txt` and keeps the
GPU-feature wiring uniform across the three GPU backends (Metal auto, Vulkan
toggle, OpenCL toggle).

After this commit, the `android-arm64` prebuild's
`qvac__transcription-whispercpp/` directory should ship `libggml-opencl.so`
alongside the existing 7 per-microarch CPU variants and `libggml-vulkan.so`.

Co-authored-by: Cursor <cursoragent@cursor.com>

* transcription-whispercpp: default ENABLE_OPENCL ON unconditionally (QVAC-18300)

Previous commit (6b42bc0) wired ENABLE_OPENCL but gated it on
`_qvac_whispercpp_target_os STREQUAL "Android"`, mirroring the existing
ENABLE_VULKAN block. CI re-run (26172345624) exposed that the gate is broken:
at top-level CMakeLists.txt time, `CMAKE_SYSTEM_NAME` is not yet set --- the
bare-make Android toolchain file is loaded by `project()` (which runs *after*
the option block), so `_qvac_whispercpp_target_os` falls through to the host
OS ("Linux") and ENABLE_OPENCL stayed OFF on the android-arm64 prebuild.

Evidence from run 26172345624's android-arm64 build log:
  `Installing 9/9 whisper-cpp[core,vulkan]:arm64-android@1.8.4.3#4...`
                                ^^^^^^^^ no `[opencl]`

ENABLE_VULKAN works only by coincidence: Vulkan is also default-ON on the
Linux host detection branch, so the wrong target detection produces the right
behaviour. For Android-only features there is no such overlap.

Fix: default ENABLE_OPENCL ON unconditionally and let the actual platform
gating happen where it can: (1) the `platform: android` clause on the
`whisper-cpp[opencl]` dep in `vcpkg.json`, and (2) the `VCPKG_TARGET_IS_ANDROID`
check in the `whisper-cpp` portfile that gates `-DGGML_OPENCL=ON`. Adding
`"opencl"` to `VCPKG_MANIFEST_FEATURES` on non-Android is a guaranteed no-op
because the feature's only dep is platform-gated --- mirrors the layered
gating that `whisper-cpp[vulkan]` already uses (the `vulkan` feature's deps
are `!osx & !ios` gated and the portfile's `-DGGML_VULKAN=ON` is also
target-gated).

After this commit, the android-arm64 install plan should resolve as
`whisper-cpp[core,vulkan,opencl]` and the prebuild tarball should contain
`libggml-opencl.so` alongside the 7 per-microarch CPU `.so`s and
`libggml-vulkan.so`.

Co-authored-by: Cursor <cursoragent@cursor.com>

* transcription-whispercpp: call ggml_backend_load_all_from_path before whisper_init (QVAC-18993)

Android mobile-test E2E crashed inside whisper_init_from_file_with_params
with SIGABRT on PR #2124 / run 26173084690 (both Pixel 9 Pro + Samsung S25
Ultra, 132 ms after Downloaded model: ggml-tiny.bin). Stack:

  abort → ggml_abort+228 → ggml_backend_dev_backend_reg+48
       → whisper_init_with_params_no_state+480
       → whisper_init_from_file_with_params_no_state+212
       → whisper_init_from_file_with_params+48
       → WhisperModel::load()+460

Root cause: the addon never called ggml_backend_load_all*(). With the
QVAC-18993 GGML_BACKEND_DL=ON build, the bundled ggml-base no longer
defines GGML_USE_CPU, so the static ggml_backend_registry ctor registers
zero backends. whisper.cpp's first ggml_backend_init_by_type(CPU) returns
NULL → ggml_backend_dev_backend_reg(NULL) trips GGML_ASSERT(device).

This is the same crash signature on both the pre-OpenCL run 26170576156
and the post-OpenCL run 26173084690, so it is independent of the recent
OpenCL enablement. The mobile workflow last passed on
tmp-whisper-184-3-validation back on 2026-05-11, which predates
GGML_BACKEND_DL=ON.

Mirror the pattern used by every other ggml-based addon in the monorepo
(packages/{diffusion-cpp,llm-llamacpp,classification-ggml,…}):

* CMakeLists.txt — emit BACKENDS_SUBDIR (<bare_target>/<module_name>)
  compile def via bare_target / bare_module_target.
* WhisperConfig — add backendsDir field (sibling of the handler-driven
  maps so it bypasses WHISPER_CONTEXT_HANDLERS.at()).
* JSAdapter — read top-level backendsDir string directly from
  configurationParams into config.backendsDir.
* WhisperModel::load — on __ANDROID__, std::call_once →
  ggml_backend_load_all_from_path(backendsDir/BACKENDS_SUBDIR) before
  whisper_init.
* index.js — require('bare-path'); pass
  backendsDir: path.join(__dirname, 'prebuilds') in _load + reload.
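
On the JS side, the wiring above amounts to something like the following sketch. `pathMod` is whichever path module the runtime provides (the commit uses `bare-path`); `withBackendsDir`, `backendsSearchPath`, `bareTarget`, and `moduleName` are hypothetical names, and the second function only mirrors in JavaScript the directory that the C++ side is described as composing.

```javascript
// Sketch only. `pathMod` is injected so either bare-path or another
// compatible path module can be supplied (an assumption of this example).
function withBackendsDir (pathMod, dirname, params = {}) {
  return { ...params, backendsDir: pathMod.join(dirname, 'prebuilds') }
}

function backendsSearchPath (pathMod, backendsDir, bareTarget, moduleName) {
  // Mirrors backendsDir/BACKENDS_SUBDIR, where BACKENDS_SUBDIR is
  // <bare_target>/<module_name>.
  return pathMod.join(backendsDir, bareTarget, moduleName)
}
```

Under these assumptions, an `android-arm64` build of this addon would search a directory such as `<addon dir>/prebuilds/android-arm64/qvac__transcription-whispercpp`.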

No diff on non-Android (Linux/macOS/Windows/iOS): ggml's static ctor
keeps registering CPU there as before.

aiDocs/15-android-mobile-test-crash-fix.md has the full investigation
(crash extraction, layered root-cause, why every other ggml addon
already does this, follow-ups).

Co-authored-by: Cursor <cursoragent@cursor.com>

* transcription-whispercpp: re-pin vcpkg baseline to cleaned PR #152 head (QVAC-18993)

PR #152 (qvac-registry-vcpkg) was rebased today to drop the ggml-speech
port bump (b4cf7b2) and the matching ggml-speech-side MSVC shim. Only
the whisper-cpp bump + whisper-cpp portfile MSVC `/I` fix remain. The
consumer-side migration to ggml-speech (QVAC-18992 / PR #13) stays open
on the speech branch but is no longer a prerequisite for this Android
dynamic-backend rollout.

New PR #152 HEAD: 9f4e8e20072d8a7a1e118a49c36aacf6af6b3e0d
Old (pre-cleanup): 5cd209c145a1d61636f1d44b4afe37868c298a8c

This addon does not depend on `ggml-speech` (it consumes the bundled
ggml inside `whisper-cpp`), so the dependency closure is unchanged.
Updated CHANGELOG to record the new baseline + the reason ggml-speech
got dropped.

Co-authored-by: Cursor <cursoragent@cursor.com>

* transcription-whispercpp: fix cpp-lint failures (clang-format + clang-tidy)

The prior CI run skipped cpp-lint entirely because the recent PR
commits only touched CMakeLists.txt / CHANGELOG.md. The new
ea298cf commit (QVAC-18993 mobile-test fix) added the first C++
diff in this branch, so cpp-lint now runs full clang-format
+ clang-tidy and surfaces three issues:

1. clang-format: JSAdapter.cpp had a one-line declaration broken
   across two lines (LLVM PointerAlignment=Left + AlignAfterOpen
   collapsed it). Reformatted in place.

2. clang-tidy [readability-identifier-naming]:
   WhisperHandlers.hpp:9 -- local `const int LANG_ID` violates the
   variable case style. Renamed to `langId` (lowerCamelCase, matches
   `checkLanguage` two lines above). Latent issue; never reported
   before because cpp-lint was a no-op on every prior PR commit.

3. clang-tidy [readability-identifier-naming]:
   WhisperModel.hpp:100 -- unused `set_weights_for_file(span, bool)`
   stub kept for parity with `transcription-parakeet` (which uses
   snake_case extensively for this exact API). Renaming would
   diverge from the parakeet pattern, so suppress with a single
   NOLINTNEXTLINE rather than touching the API surface.

Local repro:

  cp packages/lint-cpp/.clang-format packages/transcription-whispercpp/.clang-format
  git-clang-format --diff $(git merge-base HEAD origin/main) -- packages/transcription-whispercpp

The second command reports `did not modify any files`. The
.clang-format copy is normally produced by
`packages/transcription-whispercpp/CMakeLists.txt:58
(configure_file COPYONLY)` during CMake configure.

[QVAC-18993][QVAC-19071]

Co-authored-by: Cursor <cursoragent@cursor.com>

* transcription-whispercpp: reference QVAC-19071 in CHANGELOG

QVAC-19071 ([Whisper] Update qvac-registry-vcpkg and addon with new
port versions) is the meta task that bundles the registry-side port
bump (qvac-registry-vcpkg PR #152: whisper-cpp 1.8.4.3#4) with the
consumer-side addon bump (qvac PR #2124: transcription-whispercpp
0.7.3, baseline re-pin). No code changes; the work itself was
already covered by PR #152 + this PR. Adds the cross-reference so
the Asana ticket can be closed out in this release cycle.

The QVAC-18992 ggml-speech migration (PR #13 + ggml-speech port
bump) stays deferred per the 2026-05-21 plan; it will land as a
follow-up port bump under the same QVAC-19071 umbrella.

[QVAC-19071]

Co-authored-by: Cursor <cursoragent@cursor.com>

* transcription-whispercpp: re-pin baseline to consume whisper-cpp 1.8.4.3#5 (REF flipped to tetherto/master)

[whisper-cpp PR #28](tetherto/qvac-ext-lib-whisper.cpp#28)
(QVAC-18993 bundled-ggml: Android dynamic backend + per-arch CPU
dlopen fallback) was merged today (2026-05-21, merge commit
`f3102199` on `tetherto/qvac-ext-lib-whisper.cpp/master`). With it
merged, `tetherto/master` now carries every commit the registry's
`whisper-cpp` port previously pulled from the temporary
`Zbig9000/qvac-ext-lib-whisper.cpp@14620c8857` branch:

  - PR #25 (QVAC-18991, upstream whisper.cpp sync), merged 2026-05-20
  - PR #27 (QVAC-18966, tts-cpp chatterbox <atomic> fix), merged 2026-05-20
  - PR #28 (QVAC-18993, ggml-backend android dynamic backend), merged 2026-05-21

[qvac-registry-vcpkg PR #152](tetherto/qvac-registry-vcpkg#152)
HEAD (`f2870372`) bumps `whisper-cpp` to port-version `1.8.4.3#5`
with the REF repoint; the source tarball is byte-identical outside
`parakeet-cpp/` and `tts-cpp/` (separate vcpkg ports). This commit
just re-pins the consumer-side baseline so the addon resolves
against the new port-version.

  vcpkg-configuration.json default-registry baseline:
    9f4e8e20072d8a7a1e118a49c36aacf6af6b3e0d   (MSVC fix only, whisper-cpp 1.8.4.3#4)
      -> f2870372965e899ae1f8a221154d2b243a6c3d30  (+ whisper-cpp 1.8.4.3#5 REF repoint)

No code change in this monorepo; this is a pure baseline re-pin. CHANGELOG
updated to record both the new baseline and the (now superseded)
intermediate `9f4e8e2` pin.

Closes the consumer-side half of [QVAC-19071](https://tetherapp.atlassian.net/browse/QVAC-19071)
("Update qvac-registry-vcpkg and addon with new port versions").
Registry-side half = vcpkg PR #152 commit `f287037`.

[QVAC-18993][QVAC-19071]

Co-authored-by: Cursor <cursoragent@cursor.com>

* transcription-whispercpp: re-pin baseline to whisper-cpp 1.8.4.3#0 (PR #152 review fixes)

@GustavoA1604 review on [qvac-registry-vcpkg PR #152](tetherto/qvac-registry-vcpkg#152)
requested three changes on the registry side:

  1. Drop the explanatory comment block at top of
     `ports/whisper-cpp/portfile.cmake`.
  2. Reset `port-version` 5 -> 0 (treat the tetherto REF repoint as
     a fresh start, not a continuation of the Zbig9000-branch series).
  3. Collapse the three historical `1.8.4.3` entries
     (`port-version` 3, 4, 5 -- never consumed off-fork) in
     `versions/w-/whisper-cpp.json` into a single `port-version: 0`
     entry with the new git-tree.

All three landed in PR #152 commit `ee71ecb`. This commit is the
consumer-side mirror:

  vcpkg-configuration.json default-registry baseline:
    f2870372965e899ae1f8a221154d2b243a6c3d30  (1.8.4.3#5, pre-review)
      -> ee71ecb5b286224377313e5a50558d11adbef3ac  (1.8.4.3#0, post-review)

  CHANGELOG entry updated:
    "1.8.4.3#5" -> "1.8.4.3#0" + note about port-version reset and
    history collapse + supersession line covers both prior pins
    (`9f4e8e2` MSVC fix, `f287037` 1.8.4.3#5).

No code change in this monorepo; this is a pure baseline re-pin. The
underlying whisper.cpp source bytes are unchanged (REPO + REF +
SHA512 in the portfile are identical between `1.8.4.3#5` and
`1.8.4.3#0`), so the produced binary is bit-for-bit equivalent.

[QVAC-18993][QVAC-19071]

Co-authored-by: Cursor <cursoragent@cursor.com>

* transcription-whispercpp: 0.8.0 — address PR review

Collapses the 0.7.1/0.7.2/0.7.3 work into a single 0.8.0 release and
folds in Gustavo's PR #2124 review feedback:

- Bump version to 0.8.0; collapse CHANGELOG into a single 0.8.0 entry
- Bump whisper-cpp override to 1.8.4.3#0 (matches PR #152 collapse)
- Repoint default-registry to tetherto/qvac-registry-vcpkg @ a9d7e924
  (PR #152 merge commit on tetherto/main)
- vcpkg.json: model GPU features on transcription-parakeet's pattern —
  platform-gated whisper-cpp deps select [opencl,vulkan] on android,
  [vulkan] on linux/windows, and no GPU feature on apple. Drop the
  addon-side opencl/vulkan feature sections; CMake no longer carries
  ENABLE_OPENCL / ENABLE_VULKAN option indirection.
- index.js: nest backendsDir under whisperConfig (mirrors parakeet's
  parakeetConfig.backendsDir). Strip it from the wire-format whisperConfig
  map and surface it as top-level configurationParams.backendsDir before
  handing the config to the addon. Fix the stale _createAddon JSDoc that
  still described "LLM-specific settings".
- index.d.ts + README.md: document whisperConfig.backendsDir; drop the
  ENABLE_VULKAN build instructions (now controlled by vcpkg.json).
- Compact all the addon-side comments (CMakeLists.txt, JSAdapter.cpp,
  WhisperConfig.hpp, WhisperModel.cpp); drop every QVAC asana ticket
  reference; standardise the C++ log wording on
  "configurationParams.backendsDir".
- Drop "-D ENABLE_VULKAN=OFF" from the test:cpp:build / coverage:cpp:build
  npm scripts (no-op now that the option is gone).
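
The `backendsDir` handling described in the index.js bullet can be pictured roughly as below. This is a minimal sketch, not the addon's actual code: the helper name `splitBackendsDir` and both type shapes are assumptions made for illustration. Only `whisperConfig.backendsDir` and the top-level `configurationParams.backendsDir` come from the commit message.

```typescript
// Hypothetical shapes; the real addon types may differ.
type WhisperConfig = Record<string, unknown> & { backendsDir?: string };

interface ConfigurationParams {
  whisperConfig: Record<string, unknown>;
  backendsDir?: string;
}

// Strip backendsDir from the wire-format whisperConfig map and surface it
// as a top-level field, leaving every other key untouched.
function splitBackendsDir(whisperConfig: WhisperConfig): ConfigurationParams {
  const { backendsDir, ...wireConfig } = whisperConfig;
  return backendsDir === undefined
    ? { whisperConfig: wireConfig }
    : { whisperConfig: wireConfig, backendsDir };
}

// Example input; the path is a placeholder.
const params = splitBackendsDir({ language: "en", backendsDir: "/opt/backends" });
```

Keeping the split in one small function means the wire-format map never carries a key the native side does not expect.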

Co-authored-by: Cursor <cursoragent@cursor.com>

* transcription-whispercpp: 0.9.0 -> 0.8.0 (fold into single release)

Reverts the 0.8.0 -> 0.9.0 bump from the merge commit: per request, this
PR's release notes are folded into the existing 0.8.0 entry rather than
shipping as a separate semver step. Order: Added -> Changed -> Fixed
(from this PR) -> Removed (the OutputCallbackJs revert that landed on
main as 0.8.0 via #2133).

package.json bumped back to 0.8.0.

Co-authored-by: Cursor <cursoragent@cursor.com>

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: GustavoA1604 <54457676+GustavoA1604@users.noreply.github.com>
simon-iribarren added a commit to simon-iribarren/qvac that referenced this pull request Jun 8, 2026
Lifecycle correctness:
- Spawn lock: steal only when the owner pid is dead (with an mtime fallback for
  an unreadable lock), so a legitimate multi-minute cold start no longer loses
  its lock after 30s and spawns a duplicate runner/serve (tetherto#1).
- close(): the fetch path now bails out instead of re-resolving once closed, so
  a request racing close() can't silently re-add a consumer / spawn a runner (tetherto#3).
- sweepServes: when an orphaned serve's pid is alive but its health check fails,
  keep the record instead of dropping it — dropping stranded a live serve with
  no registry trace. We only reap once it answers as ours, or drop once its pid
  dies (tetherto#4).
- servePort: fold a pinned port into the fleet key so pinned-port callers don't
  reuse an auto-allocated serve on a different port, and distinct pins don't
  collide (tetherto#5).
- Respawn: expose baseURL/port/pid as getters over live state, updated on every
  reconnect, so diagnostics/external clients see the real serve after recovery (tetherto#6).
- retargetUrl now handles Request inputs (not just string/URL) so a respawn stays
  transparent if the SDK ever switches input shapes (tetherto#8).

Docs:
- README + docs-site: direct-baseURL tools (OpenCode/Cline/Aider) don't extend
  liveness; document the long-lived-sentinel/wrapper pattern and fix the
  misleading "the script doesn't have to stay running" note (tetherto#2).
- Reconcile version wording: README/changelog now describe managed mode as
  unreleased (package is 0.1.0); docs-site integration page documents managed
  mode + the async overload (tetherto#7).

Tests: spawn-lock steal/keep matrix, fleet-key pinned-port sensitivity, and the
runner-dead + serve-alive + health-failing sweep case. Build + suite green
(60 pass / 1 integration skip).
simon-iribarren added a commit that referenced this pull request Jun 10, 2026
* feat[api]: add managed mode to @qvac/ai-sdk-provider (QVAC-19900)

Add `mode: 'managed'` so the provider can synthesize an ephemeral
qvac.config.json from a model-constant list, spawn and supervise
`qvac serve` on a free port, and tear it down on host exit. External
mode is unchanged and stays synchronous; the managed supervisor is
lazily dynamic-imported so external-mode users pay no startup cost.

@qvac/cli becomes an optional peer dependency.

* fix: resolve @qvac/cli via main entry when its exports block package.json (QVAC-19900)

The published @qvac/cli ships a string `exports` field ("./dist/index.js"),
which makes the `./package.json` subpath non-resolvable
(ERR_PACKAGE_PATH_NOT_EXPORTED). Managed mode relied on resolving
`@qvac/cli/package.json` to locate the bin, so it would fail to find the CLI
on a clean install. Fall back to resolving the package main entry, which for
@qvac/cli is the same file as the `qvac` bin.
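
The fallback described above could look something like the following. It is a sketch under stated assumptions: `resolveCliEntry` and the injected `Resolve` type are hypothetical names, and the fake resolver and its path exist only to make the example self-contained. The error code `ERR_PACKAGE_PATH_NOT_EXPORTED` is the one named in the commit message.

```typescript
// A resolver in the style of require.resolve, injected so the sketch is testable.
type Resolve = (specifier: string) => string;

function hasCode(err: unknown, code: string): boolean {
  return typeof err === "object" && err !== null &&
    (err as { code?: unknown }).code === code;
}

// Try `<pkg>/package.json` first; if the package's `exports` field hides that
// subpath, fall back to resolving the package's main entry instead.
function resolveCliEntry(
  resolve: Resolve,
  pkg = "@qvac/cli",
): { via: "package.json" | "main"; path: string } {
  try {
    return { via: "package.json", path: resolve(`${pkg}/package.json`) };
  } catch (err) {
    if (!hasCode(err, "ERR_PACKAGE_PATH_NOT_EXPORTED")) throw err;
    return { via: "main", path: resolve(pkg) };
  }
}

// Fake resolver that mimics a package whose exports block ./package.json.
const fakeResolve: Resolve = (specifier) => {
  if (specifier.endsWith("/package.json")) {
    throw Object.assign(new Error("subpath not exported"), {
      code: "ERR_PACKAGE_PATH_NOT_EXPORTED",
    });
  }
  return "/node_modules/@qvac/cli/dist/index.js"; // placeholder path
};

const entry = resolveCliEntry(fakeResolve);
```

In a real Node process the resolver passed in would typically be `createRequire(import.meta.url).resolve` from `node:module`. Any error other than the exports one is rethrown so a genuinely missing package still fails loudly.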

* doc: update ai-sdk provider agent setup after queue (QVAC-19900)

* QVAC-19900 feat[api]: per-model config for managed mode

Managed mode `models` now accepts spec objects ({ name, config, preload,
default }) alongside bare constant names, so callers can set per-model serve
options — notably `ctx_size` and `reasoning_budget` — that coding agents like
OpenCode require. The synthesized qvac.config.json carries the config block,
honors explicit `preload`/`default`, and validates names inside spec objects.

Exports the new `QvacManagedModel` type and documents per-model config plus a
managed-mode OpenCode example in the README.
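
As a rough illustration of accepting both bare names and spec objects, here is a minimal normalization sketch. The interface below is a guess modeled on the four fields named above and is not the exported `QvacManagedModel` type. `normalizeModels` is a hypothetical helper, the `reasoning_budget` value is invented, and the name check only rejects empty names, whereas the real validation is presumably stricter.

```typescript
// Hypothetical shape modeled on the fields named in the commit message.
interface ManagedModelSpec {
  name: string;
  config?: Record<string, unknown>;
  preload?: boolean;
  default?: boolean;
}

type ManagedModelInput = string | ManagedModelSpec;

// Turn every entry into a spec object so later steps handle one shape only.
function normalizeModels(models: ManagedModelInput[]): ManagedModelSpec[] {
  return models.map((entry) => {
    const spec: ManagedModelSpec =
      typeof entry === "string" ? { name: entry } : entry;
    if (typeof spec.name !== "string" || spec.name.trim() === "") {
      throw new Error("managed model needs a non-empty name");
    }
    return spec;
  });
}

const normalized = normalizeModels([
  "QWEN3_5_9B_MULTIMODAL_Q4_K_M",
  // EXAMPLE_MODEL is a made-up name; the config values are illustrative.
  { name: "EXAMPLE_MODEL", config: { ctx_size: 32768, reasoning_budget: 0 } },
]);
```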

* QVAC-19900 feat[api]: shared idle-reaped managed serve daemon

Rework managed mode from a per-provider supervisor into a shared,
self-cleaning serve daemon so it is robust standalone and usable by any
tool, not just a single session.

- Reuse via a fleet key (model set + per-model config + host) keyed in a
  cross-process registry under ~/.qvac/managed-serves/; createQvac attaches
  to a matching healthy serve instead of cold-starting a duplicate.
- A detached runner owns the qvac serve child and reaps it once no consumer
  process has been alive for serveIdleTimeout (default 5m). Liveness, not
  request traffic, is the signal, so it works for tools that hit baseURL
  directly (OpenCode/Cline/Aider).
- close() now detaches (deregisters the consumer) instead of killing; a
  shared serve survives until its last user is gone.
- Sweep only reaps dead/orphaned serves, never a healthy serve a live
  process owns (fixes a second session SIGKILLing a downloading serve).
- Respawn-on-failure: fetch re-resolves and retries once on ECONNREFUSED.
- reuse:false (or a pinned servePort) yields a private serve reaped as soon
  as its owner exits.

Refactor into serve-process.ts (spawn/health/stop), registry.ts,
fleet-key.ts, runner.ts; remove supervisor.ts and pid-tracker.ts. Add
reuse and serveIdleTimeout options. Rewrite tests and add reuse/idle-reap
end-to-end coverage; document the shared lifecycle in the README.
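
The two ideas in this commit, a fleet key for reuse and liveness-based idle reaping, can be sketched as pure functions. Everything here is an assumption for illustration: the `FleetInput` shape, the choice to make the key insensitive to model order, and the `shouldReap` signature are not taken from the package. A real key would likely be hashed. The 5 minute default mirrors the `serveIdleTimeout` default mentioned above.

```typescript
interface FleetInput {
  host: string;
  models: { name: string; config?: Record<string, unknown> }[];
}

// Serialize with sorted object keys so logically equal inputs give equal strings.
function canonical(value: unknown): string {
  if (value === undefined) return "null";
  if (Array.isArray(value)) return `[${value.map((v) => canonical(v)).join(",")}]`;
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const body = Object.keys(obj)
      .filter((k) => obj[k] !== undefined)
      .sort()
      .map((k) => `${JSON.stringify(k)}:${canonical(obj[k])}`)
      .join(",");
    return `{${body}}`;
  }
  return JSON.stringify(value);
}

// Assumption: the model *set* matters, not the order the caller listed it in.
function fleetKey(input: FleetInput): string {
  const models = [...input.models].sort((a, b) =>
    a.name < b.name ? -1 : a.name > b.name ? 1 : 0,
  );
  return canonical({ host: input.host, models });
}

// Reap only when no consumer process is alive and the idle window has elapsed.
function shouldReap(
  consumerPids: number[],
  isAlive: (pid: number) => boolean,
  lastLiveAtMs: number,
  nowMs: number,
  idleTimeoutMs = 5 * 60 * 1000,
): boolean {
  if (consumerPids.some(isAlive)) return false;
  return nowMs - lastLiveAtMs >= idleTimeoutMs;
}
```

In Node, `isAlive` is commonly built on `process.kill(pid, 0)` inside a try/catch, treating an `EPERM` error as "alive but owned by another user".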

* QVAC-19900 fix: reject duplicate model names in managed mode

Each managed model maps to a single serve alias keyed by its name, so a
repeated name silently overwrote the earlier entry — and could drop its
`default: true`. Reject duplicates up front with DuplicateManagedModelError
instead of resolving them ambiguously. Addresses PR review feedback.
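
A minimal sketch of the up-front duplicate check follows. The class below is a local stand-in that borrows the name `DuplicateManagedModelError` from the commit message; its constructor and fields are assumptions, as is the `assertUniqueNames` helper.

```typescript
// Stand-in for the error class named in the commit; its real shape may differ.
class DuplicateManagedModelError extends Error {
  readonly modelName: string;

  constructor(modelName: string) {
    super(`duplicate managed model name: ${modelName}`);
    this.name = "DuplicateManagedModelError";
    this.modelName = modelName;
  }
}

// Fail on the first repeated name instead of letting a later entry
// silently overwrite an earlier one.
function assertUniqueNames(names: string[]): void {
  const seen = new Set<string>();
  for (const name of names) {
    if (seen.has(name)) throw new DuplicateManagedModelError(name);
    seen.add(name);
  }
}
```

Rejecting early keeps the failure next to the caller's mistake rather than surfacing later as a missing `default: true`.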

* QVAC-19900 fix[api]: address managed-mode self-review findings

- Per-instance consumer markers (<pid>.<rand>) so two providers in one
  process sharing a fleet key don't deregister each other on close (A).
- Restrict respawn retry to ECONNREFUSED so an in-flight completion is
  never blindly replayed on ECONNRESET/EPIPE (C).
- Health-check the recorded baseURL before SIGTERM-ing an orphaned serve,
  guarding against killing a recycled pid (D).
- Use dirname() instead of a posix-only regex for ephemeral config cleanup (E).
- Fold serveBinPath into the fleet key so distinct local builds don't share
  a serve (G).
- Export managed error classes + QvacManagedErrorCode for instanceof checks (H).
- Reject more than one explicit default: true (I).
- Deregister the consumer if resolveServe throws (F); drop dead
  firstConsumerPid runner param (J).

Tests: per-instance markers, health-gated orphan sweep (kills serving
orphan, spares non-serving stranger pid), fleet-key serveBinPath sensitivity,
multiple-default rejection. README updated.
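
Finding (C) above, retrying only on `ECONNREFUSED`, might be expressed as a small predicate. This is a sketch with hypothetical helper names. It assumes the error code may sit either on the error itself or on a nested `cause`, which is how Node's built-in fetch tends to report connection failures as far as I know.

```typescript
// Look for a string `code` on the error or a few levels down its `cause` chain.
function errorCode(err: unknown): string | undefined {
  let current: unknown = err;
  for (let depth = 0; depth < 3; depth++) {
    if (typeof current !== "object" || current === null) return undefined;
    const code = (current as { code?: unknown }).code;
    if (typeof code === "string") return code;
    current = (current as { cause?: unknown }).cause;
  }
  return undefined;
}

// ECONNREFUSED means nothing accepted the connection, so the request was never
// received. ECONNRESET / EPIPE can occur mid-request, where a replay could
// duplicate an in-flight completion.
function isSafeToRetry(err: unknown): boolean {
  return errorCode(err) === "ECONNREFUSED";
}

const refused = Object.assign(new TypeError("fetch failed"), {
  cause: { code: "ECONNREFUSED" },
});
const reset = Object.assign(new TypeError("fetch failed"), {
  cause: { code: "ECONNRESET" },
});
```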

* QVAC-19900 fix[api]: address managed-mode lifecycle review (round 2)

Lifecycle correctness:
- Spawn lock: steal only when the owner pid is dead (with an mtime fallback for
  an unreadable lock), so a legitimate multi-minute cold start no longer loses
  its lock after 30s and spawns a duplicate runner/serve (#1).
- close(): the fetch path now bails out instead of re-resolving once closed, so
  a request racing close() can't silently re-add a consumer / spawn a runner (#3).
- sweepServes: when an orphaned serve's pid is alive but its health check fails,
  keep the record instead of dropping it — dropping stranded a live serve with
  no registry trace. We only reap once it answers as ours, or drop once its pid
  dies (#4).
- servePort: fold a pinned port into the fleet key so pinned-port callers don't
  reuse an auto-allocated serve on a different port, and distinct pins don't
  collide (#5).
- Respawn: expose baseURL/port/pid as getters over live state, updated on every
  reconnect, so diagnostics/external clients see the real serve after recovery (#6).
- retargetUrl now handles Request inputs (not just string/URL) so a respawn stays
  transparent if the SDK ever switches input shapes (#8).

Docs:
- README + docs-site: direct-baseURL tools (OpenCode/Cline/Aider) don't extend
  liveness; document the long-lived-sentinel/wrapper pattern and fix the
  misleading "the script doesn't have to stay running" note (#2).
- Reconcile version wording: README/changelog now describe managed mode as
  unreleased (package is 0.1.0); docs-site integration page documents managed
  mode + the async overload (#7).

Tests: spawn-lock steal/keep matrix, fleet-key pinned-port sensitivity, and the
runner-dead + serve-alive + health-failing sweep case. Build + suite green
(60 pass / 1 integration skip).
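
The spawn-lock rule from item #1 can be written as a small decision function. This is a sketch under assumptions: `LockObservation`, `canStealLock`, and the `staleAfterMs` parameter are hypothetical names, and the actual threshold used for the mtime fallback is not stated in the commit.

```typescript
interface LockObservation {
  ownerPid?: number; // undefined when the lock file could not be read or parsed
  mtimeMs: number;   // last modification time of the lock file
}

// Steal only when the owner is known to be dead. Age is consulted only when the
// owner is unknown, so a slow but live cold start keeps its lock.
function canStealLock(
  lock: LockObservation,
  isAlive: (pid: number) => boolean,
  nowMs: number,
  staleAfterMs: number,
): boolean {
  if (lock.ownerPid !== undefined) return !isAlive(lock.ownerPid);
  return nowMs - lock.mtimeMs > staleAfterMs;
}
```

The point of the design is that elapsed time alone never evicts a readable lock, which is what previously produced the duplicate runner after 30s.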

* docs: use canonical qvac.tether.io URL in ai-sdk-provider README

* QVAC-19900 feat[api]: public model catalog + catalog-id aliases in managed mode

Add `models.qvacCatalog`, a public models.dev-style catalog that maps
friendly ids (`qwen3.5-9b`) to the SDK constant the serve loads
(`QWEN3_5_9B_MULTIMODAL_Q4_K_M`), so the id a user picks from models.dev
resolves end-to-end with no translation layer in front of the serve.

Managed mode now accepts catalog ids as model names: the synthesized
serve config keys the alias by the friendly id while `model` resolves to
the underlying SDK constant, so the serve answers `qwen3.5-9b` directly.
Bare SDK constants keep working unchanged. A drift unit test fails CI if
any catalog constant disappears from the generated SDK catalog.
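
One way to picture the alias resolution: the friendly id stays the serve alias while `model` points at the SDK constant. The map below is an illustrative stand-in containing only the single pair quoted above. The real `models.qvacCatalog` shape is not shown in the commit, and `toServeEntry` is a hypothetical helper.

```typescript
// Illustrative stand-in for the public catalog; only this pair is from the source.
const exampleCatalog = new Map<string, string>([
  ["qwen3.5-9b", "QWEN3_5_9B_MULTIMODAL_Q4_K_M"],
]);

interface ServeEntry {
  alias: string; // the name the serve answers to
  model: string; // the SDK constant the serve loads
}

// Catalog ids resolve to their constant; anything else is treated as a bare
// SDK constant and passed through unchanged.
function toServeEntry(name: string): ServeEntry {
  return { alias: name, model: exampleCatalog.get(name) ?? name };
}

const fromCatalog = toServeEntry("qwen3.5-9b");
const fromConstant = toServeEntry("QWEN3_5_9B_MULTIMODAL_Q4_K_M");
```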

* QVAC-19900 feat[api]: process-group serve teardown + closeOnParentExit

Harden managed-mode lifecycle so a managed serve never leaks its `bare`
inference worker or outlives the process that owns it.

- Process-group teardown: spawn `qvac serve` detached (its own group) and,
  when stopServe must escalate past the grace window, SIGKILL the whole
  group. A plain SIGKILL of the serve pid never cascades to the grandchild
  bare worker, so previously a wedged serve orphaned the worker. The
  graceful SIGTERM is still sent to the serve process only, so a healthy
  serve orchestrates its own shutdown and releases the global worker lock
  (no stale lock left behind); the group SIGKILL is the wedged-path fallback.

- `closeOnParentExit` option: for a daemon-style host whose sole job is to
  keep a managed serve alive for a parent process (e.g. an editor/agent
  plugin). The provider watches its parent pid and, the moment the parent
  exits (on POSIX we are reparented to init, ppid → 1), closes itself —
  deregistering the consumer so the runner reaps the serve — and exits.
  Without it a hard-killed parent would leave a reparented host alive,
  keeping its consumer marker forever so the serve was never reaped.

Tests: a stubborn-grandchild fake serve proves group teardown reaps the
worker; `parentIsGone` unit-tests the parent-watch decision.
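
The teardown order and the parent-watch decision can be sketched with the signal sender injected, so nothing is actually killed. All signatures here are assumptions, including the one given to `parentIsGone`. The sketch compares against the original ppid rather than hard-coding 1, which is a design choice of this example and not necessarily what the package does.

```typescript
type Signal = "SIGTERM" | "SIGKILL";
// In Node this would typically wrap process.kill; a negative target addresses
// the whole process group on POSIX.
type Kill = (target: number, signal: Signal) => void;

// Step 1: ask only the serve process to shut down gracefully.
function requestGracefulStop(servePid: number, kill: Kill): void {
  kill(servePid, "SIGTERM");
}

// Step 2: after the grace window, kill the whole group if the serve is wedged,
// so the grandchild worker cannot be orphaned.
function escalateAfterGrace(servePid: number, stillAlive: boolean, kill: Kill): void {
  if (stillAlive) kill(-servePid, "SIGKILL");
}

// The parent is considered gone once our ppid no longer matches the one we
// started with (on POSIX a reparented process usually sees ppid 1).
function parentIsGone(originalPpid: number, currentPpid: number): boolean {
  return currentPpid !== originalPpid;
}

// Record the signals instead of sending them; 4242 is a placeholder pid.
const sent: Array<[number, Signal]> = [];
const recordKill: Kill = (target, signal) => {
  sent.push([target, signal]);
};
requestGracefulStop(4242, recordKill);
escalateAfterGrace(4242, true, recordKill);
```

Group signalling only works as intended when the serve was spawned detached, so that it leads its own process group.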

* QVAC-19900 fix: keep managed serve lifecycle correct under close() race and crash-respawn

- Undo the consumer re-registration when close() wins the race against an
  in-flight fetch retry: resolveServe re-adds the marker after close() removed
  it, which would keep the shared serve warm until the process exits.
- Preserve live consumer markers when sweepServes reaps a crashed/orphaned
  serve, so a respawned runner inherits the still-alive sessions instead of
  idle-reaping the fresh serve out from under them.
- docs: bump managed-mode ctx_size examples to 32768 for agent-sized prompts.
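
The first fix above, undoing a re-registration when `close()` wins the race, can be modeled with an in-memory set standing in for the on-disk consumer markers. The class and method names are hypothetical, and the real code is asynchronous; this sketch only shows the ordering that matters.

```typescript
// In-memory model: `markers` stands in for the registry's consumer marker files.
class ConsumerHandle {
  private closed = false;
  private readonly markers: Set<string>;
  private readonly marker: string;

  constructor(markers: Set<string>, marker: string) {
    this.markers = markers;
    this.marker = marker;
    markers.add(marker);
  }

  close(): void {
    this.closed = true;
    this.markers.delete(this.marker);
  }

  // Models the tail of an in-flight re-resolve, which re-adds the marker.
  // If close() ran in the meantime, undo the re-add and report failure.
  finishReresolve(): boolean {
    this.markers.add(this.marker);
    if (this.closed) {
      this.markers.delete(this.marker);
      return false;
    }
    return true;
  }
}

const registry = new Set<string>();
const handle = new ConsumerHandle(registry, "1234.ab12"); // <pid>.<rand> style marker
handle.close();
const stillUsable = handle.finishReresolve();
```

Without the undo step the marker would linger after `close()` and keep the shared serve warm until the process exits.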

* QVAC-19900 fix: rename reresolve result to resolved for clarity in managed fetch

* QVAC-19900 mod: collapse redundant sync/async registry teardown helpers

removeConsumer/removeConsumerSync and removeRecord/removeRecordSync were a
confusing sync/async mirror: the async removeConsumer was only ever called right
after the sync one (a guaranteed no-op), and the removeRecord pair was really two
teardown semantics under near-identical names. Marker/record teardown is a single
unlink/rm, cheap enough to be synchronous everywhere — including process 'exit'
handlers where async can't run — so collapse each pair into one sync function.
No behaviour change; addresses review feedback on #2408.

* QVAC-19900 mod: trim verbose comments in managed registry

Tighten the sync-rationale comments on removeRecord/removeConsumer and drop a
stale, broken leftover comment above ensureDirSync. Keeps the non-obvious intent
(why sync, preserveConsumers semantics) without the narration.

* QVAC-19900 mod: drop unused DEFAULT_SERVE_BIN and ephemeralConfigName

Both were dead: DEFAULT_SERVE_BIN was never imported (serve-process spawns the
resolved CLI path verbatim) and ephemeralConfigName was an unused helper
(writeEphemeralConfig uses a fixed name inside an mkdtemp dir). Removing the
latter also drops the now-unused randomBytes import.