Skip to content

QVAC-13445: Feature media generation#656

Merged
gianni-cor merged 115 commits into
mainfrom
feature-media-generation
Mar 19, 2026
Merged

QVAC-13445: Feature media generation#656
gianni-cor merged 115 commits into
mainfrom
feature-media-generation

Conversation

@aegioscy

@aegioscy aegioscy commented Mar 3, 2026

Copy link
Copy Markdown
Contributor

🎯 What problem does this PR solve?

  • Adds a new @qvac/diffusion-cpp package (packages/lib-infer-diffusion) for text-to-image and image-to-image generation using stable-diffusion.cpp, built for the Bare Runtime
  • Supports SD1.x, SD2.x, SDXL, SD3, and FLUX.2 [klein] model families
  • GPU acceleration via Metal (macOS/iOS), Vulkan (Linux/Android/Windows), and OpenCL (Android/Adreno), with CPU fallback

📝 How does it solve it?

  • C++ native addon wrapping stable-diffusion.cpp via N-API, following the same architecture as qvac-lib-infer-llamacpp-llm (single-job runner, runJob/activate/cancel/destroyInstance)
  • vcpkg overlay ports for ggml and stable-diffusion-cpp with custom patches for runtime backend selection, abort callbacks, failure-path cleanup, and Android Vulkan diagnostics
  • Android dynamic backend loading (GGML_BACKEND_DL) with libqvac-diffusion-ggml-* naming to avoid symbol conflicts with system ggml libraries; CPU backends remain statically linked (GGML_CPU_STATIC)
  • JS layer (ImgStableDiffusion) extends BaseInference from @qvac/infer-base, providing load()run(prompt, params)cancel() lifecycle with progress callbacks and PNG output via QvacResponse
  • CI/CD: prebuild workflows for Linux (x64/arm64), macOS (arm64/x64), iOS, Android, and Windows; integration test workflows with GPU runner matrix
  • Documentation: README, build.md, architecture.md, CHANGELOG.md, NOTICE, examples, and TypeScript type declarations

Package structure

Layer Files Purpose
JS API index.js, index.d.ts, addon.js Public API, TypeScript types, native addon loader
C++ handlers SdCtxHandlers, SdGenHandlers Config parsing for context creation and generation params
C++ model SdModel Model lifecycle, load()/process()/cancel(), progress/abort callbacks
C++ utils BackendSelection, LoggingMacros GPU backend selection (Metal/Vulkan/OpenCL/CPU), logging
Build CMakeLists.txt, vcpkg/ cmake-bare build, vcpkg overlay ports for ggml + stable-diffusion-cpp

Supported platforms

Platform Architecture GPU Backend
macOS arm64, x64 Metal
iOS arm64 Metal
Linux arm64, x64 Vulkan
Android arm64 Vulkan, OpenCL
Windows x64 Vulkan

🧪 How was it tested?

  • C++ unit tests (test/unit/): model loading, cancel context, generation handlers, backend selection, single-step inference, full generation
  • JS integration tests (test/integration/): model loading, image generation (SD2.1, SD3, SDXL, FLUX.2), API behavior (run/cancel, busy error, cancel-then-rerun)
  • CI: prebuild workflows on all target platforms, integration tests on GPU runners
  • Manual: verified on macOS arm64 (Metal) with FLUX.2 klein and SD2.1 models

🔌 API Changes

const { ImgStableDiffusion } = require('@qvac/diffusion-cpp')

const model = new ImgStableDiffusion(
  { diskPath: '/path/to/model.gguf' },
  { threads: 8, device: 'gpu' }
)

await model.load()

const response = model.run({
  prompt: 'a photo of a cat',
  width: 512,
  height: 512,
  steps: 20,
  cfg_scale: 7.0,
  seed: 42
})

response.onUpdate((data) => {
  if (data instanceof Uint8Array) {
    // PNG image bytes
  } else if (typeof data === 'string') {
    // progress tick: {"step":5,"total":20,"elapsed_ms":1234}
  }
})

await response.await()

Nik and others added 24 commits February 24, 2026 09:29
QVAC-13445 Quick Updates for February 24th
Sd loading complete on MacBook Air.
got full sdxl to work on Mac
…usion

Resolves file-location conflicts for SD3 files added in sd-sd3 branch
by placing them under the renamed packages/qvac-lib-infer-diffusion path.

Made-with: Cursor
Rename package directory from packages/qvac-lib-infer-diffusion to
packages/lib-infer-diffusion to align with the lib-* naming convention
used across the monorepo.

Made-with: Cursor
rename: qvac-lib-infer-diffusion -> lib-infer-diffusion
@aegioscy aegioscy requested review from a team as code owners March 3, 2026 14:01
@aegioscy aegioscy added the verify label Mar 3, 2026
jpgaribotti and others added 10 commits March 18, 2026 18:56
* fix: add Android Vulkan init diagnostics

Added stable-diffusion overlay port for troubleshooting. Resolved loading issue where load by type tried GPU in a device with IGPU. Logging loop listed details of each device and attempted to initialize directly devices listed as GPU or IGPU. This resolved the failure to load by type.

* Split init loop into GPU and IGPU sections
…ey name

The callback checked for 'generation_time' in the stats object, but
the C++ side emits 'generationMs'. Match on plain-object shape instead
so the check survives future stats key renames.

Made-with: Cursor
…resolutions

Remove the circularx, circulary, and circular (both-axes shorthand) config
options from the C++ handlers, SdCtxConfig struct, SdModel param assignment,
and TypeScript index.d.ts. These were unused and added unnecessary surface area.

Fix generate-image-sd2.js example to use 768x768, which is SD2.1's native
training resolution. Using off-native resolution produces softer outputs.

Made-with: Cursor
Remove CUDA as a listed GPU backend from platform tables, architecture
diagrams, and the device config comment in index.d.ts. This package
ships Metal, Vulkan, and OpenCL backends only. The 'cuda' RNG type
references are unchanged (upstream philox RNG enum name).

Made-with: Cursor
@gianni-cor gianni-cor force-pushed the feature-media-generation branch from 1a4b228 to 95a557d Compare March 19, 2026 12:03
gianni-cor and others added 6 commits March 19, 2026 13:09
Move prerequisites, platform-specific setup, cross-compilation, and
troubleshooting from README into a dedicated build.md matching the
LLM package structure. README now links to build.md with a quick start
snippet.

Made-with: Cursor
Add an explicit guard in index.js that throws if init_image is passed,
since img2img is not yet implemented in this PR. Provides a clear error
message rather than silently falling through.

Also fix trailing comma lint in generate-image-sd2.js.

Made-with: Cursor
@gianni-cor

Copy link
Copy Markdown
Contributor

/review

@gianni-cor gianni-cor dismissed jesusmb1995’s stale review March 19, 2026 14:08

comments addressed

@gianni-cor gianni-cor merged commit c5420ea into main Mar 19, 2026
61 of 69 checks passed
@gianni-cor gianni-cor deleted the feature-media-generation branch March 19, 2026 14:41
olyasir added a commit that referenced this pull request May 14, 2026
…assification (#1727)

* QVAC-17481 feat: add @qvac/classification-ggml MobileNetV3 image classification addon

Introduces a new inference addon that classifies images into three
classes (food / report / other) using a fine-tuned MobileNetV3-Small
CNN running on the libggml CPU backend. Follows the established QVAC
addon pattern (see qvac-lib-infer-nmtcpp, lib-infer-diffusion).

## What this PR ships

- New package `packages/qvac-lib-infer-ggml-classification/` publishing
  as `@qvac/classification-ggml`:
  - Native addon: custom 34-layer MobileNetV3-Small compute graph built
    directly against the public `ggml.h` / `ggml-backend.h` API — no
    llama.cpp application-layer dependency, so the addon remains
    forward-compatible with future `libggml` upstream merges.
  - Load-time BatchNorm fold with `eps = 0.001` (the architecture-
    correct value; `1e-5` causes normalisation drift across all 34
    layers). Depthwise separable convolutions, squeeze-and-excite
    blocks, HardSwish / HardSigmoid / ReLU activations all wired
    through `ggml_conv_2d`, `ggml_conv_2d_dw`, `ggml_pool_2d`,
    `ggml_hardswish`, `ggml_hardsigmoid`.
  - FP16 GGUF weights bundled inside the package (2.94 MB); class
    labels are read from the GGUF `mobilenet.class_N` metadata so a
    future fine-tune can ship different class names without a code
    change.
  - Public JS API: `new ImageClassifier({ modelPath?, logger?,
    threads?, nativeLogger? })` + `load()` / `classify(buffer, opts?)`
    / `unload()` / `destroy()`. Accepts JPEG, PNG, or raw-RGB input;
    validates at the JS layer before reaching native code so no bad
    input reaches libggml.
  - `nativeLogger` opt-in (default `false`): the underlying
    `qvac-lib-inference-addon-cpp` JsLogger holds a process-wide
    static `uv_async_t` that is not safe across rapid create/destroy
    cycles, so the native C++→JS log bridge is disabled unless the
    caller explicitly opts in. JS-level logging always flows through
    the caller's `logger`.
  - Image preprocessing via vendored-through-vcpkg `stb_image` +
    `stb_image_resize2` (bilinear resize to 224×224, ImageNet
    normalisation, WHCN layout).

## Build + tests

- `bare-make` + `cmake-bare` + `cmake-vcpkg` build, targeting
  `ggml::ggml` / `ggml::ggml-base` / `ggml::ggml-cpu` and `stb` from
  the shared QVAC vcpkg registry.
- C++ GoogleTest suite covering graph shape (34 conv + 2 linear + 9
  SE blocks), load + inference, determinism, `topK` filter, BN
  epsilon guard, and full preprocessor behaviour.
- brittle + bare JS integration tests covering load, classify (all 6
  public sample images under `test/images/`), `topK`, raw RGB input,
  and every error path: null, empty buffer, corrupted JPEG,
  unsupported format (BMP), mismatched dimensions, pre-load /
  post-unload, tiny upscale, load/unload cycles.
- Mobile test scaffolding following the shared convention:
  `scripts/generate-mobile-integration-tests.js`,
  `scripts/validate-mobile-tests.js`, `test/mobile/
  {integration-runtime.cjs, integration.auto.cjs, README.md,
  testAssets/.gitignore}`. The auto-generated `integration.auto.cjs`
  wraps every `test/integration/*.test.js` so the shared
  `qvac-test-addon-mobile` framework picks them up on Android and iOS
  automatically.

## CI workflows

Four addon-scoped workflows (path-filtered to this package):

- `on-pr-qvac-lib-infer-ggml-classification.yml` — authorize, sanity
  checks, TypeScript declaration check, C++ lint, prebuild matrix,
  desktop integration tests, mobile integration tests, merge-guard.
- `prebuilds-qvac-lib-infer-ggml-classification.yml` — Linux x64,
  Linux arm64, Android arm64, macOS arm64, iOS arm64, Windows x64
  prebuild matrix.
- `integration-test-qvac-lib-infer-ggml-classification.yml` — desktop
  end-to-end tests with the shared performance reporter writing a
  GitHub step summary.
- `integration-mobile-test-qvac-lib-infer-ggml-classification.yml` —
  AWS Device Farm Android + iOS runs via the
  `tetherto/qvac-test-addon-mobile` framework.

## Public-data / test-image policy

All public correctness assertions in this package are scoped to the 6
test images under `test/images/` (2 per class). No confidential
fine-tuning numbers, validation-set sizes, per-class metrics, or
references to any internal validation dataset appear in this PR, in
any file it ships, or in CI logs. Internal numerical-equivalence
gating against an ONNX FP32 reference is handled pre-release by a
development-only script that is not part of this PR.

## Out of scope for this PR

- SDK plugin / schema integration (`packages/sdk/**`) lands in a
  follow-up PR after `@qvac/classification-ggml@0.1.0` is published
  to npm. This mirrors the diffusion rollout (#656 → release → #1021).
- GPU backends (Vulkan / Metal / CUDA): CPU-only for v1.0.

Made-with: Cursor

* QVAC-17481 fix(ci): correct setup-bare-tooling action name in classification workflows

The prebuild and integration-test workflows for @qvac/classification-ggml
referenced `tetherto/qvac/.github/actions/setup-bare-toolchain`, which
does not exist. The action is named `setup-bare-tooling` (same name used
by the llamacpp-llm, nmtcpp, and diffusion addons at the identical
pinned SHA). All 6 prebuild matrix jobs failed at step 1 with
"Can't find 'action.yml' ... for action 'setup-bare-toolchain'" until
this rename is in place.

Files: .github/workflows/prebuilds-qvac-lib-infer-ggml-classification.yml
  .github/workflows/integration-test-qvac-lib-infer-ggml-classification.yml
Made-with: Cursor

* QVAC-17481 fix(ci): add per-platform vcpkg/NDK/Apple-clang setup to classification prebuilds

The classification prebuilds workflow was missing the per-platform
toolchain steps that sibling addons (diffusion, nmtcpp) have after
`setup-vcpkg-cache`. As a result, `VCPKG_ROOT` was never exported,
CMake couldn't locate the vcpkg toolchain, and `bare-make build`
failed on every platform.

Changes to .github/workflows/prebuilds-qvac-lib-infer-ggml-classification.yml:

  - setup-vcpkg-cache: drop unknown inputs `vcpkg-path` and
    `github-packages-token` (action only accepts platform, arch,
    s3-bucket-path). Was silently ignored but emitted warnings.

  - Add per-OS vcpkg bootstrap / configuration:
      macOS (darwin, ios):  clone microsoft/vcpkg tag 2025.12.12,
                            bootstrap, export VCPKG_ROOT.
      Linux (linux, android runners): export
                            VCPKG_ROOT=$VCPKG_INSTALLATION_ROOT.
      Windows:              export VCPKG_ROOT from
                            $env:VCPKG_INSTALLATION_ROOT with
                            backslash-to-forward-slash normalisation.

  - Windows-only: set CMAKE_GENERATOR="Visual Studio 17 2022" and,
    for the x64 matrix row, CMAKE_GENERATOR_PLATFORM=x64.

  - Android-only: export ANDROID_NDK / ANDROID_NDK_HOME /
    ANDROID_NDK_ROOT from ANDROID_NDK_LATEST_HOME, derive
    ANDROID_TOOLCHAIN_ROOT, set ANDROID_NATIVE_API_LEVEL=24.

  - iOS and darwin: move Homebrew llvm / llvm@18 aside so the Apple
    toolchain clang is on PATH (matches diffusion).

All additions mirror the working pattern in
prebuilds-lib-infer-diffusion.yml and
prebuilds-qvac-lib-infer-nmtcpp.yml at the same pinned action SHA.
No Vulkan or apt X11 steps were added: this addon is CPU-only ggml
and has no graphics dependencies.

Made-with: Cursor

* QVAC-17481 fix: add missing <limits> include and CI build-failure diagnostics

Two related changes to unstick the prebuild matrix:

1. addon/src/model-interface/ImagePreprocessor.cpp uses
   std::numeric_limits<int>::max() but does not #include <limits>.
   MSVC pulls <limits> in transitively (via <algorithm> in its STL),
   but libc++ and libstdc++ on clang/gcc do not. This is the most
   plausible reason all five non-Windows prebuild jobs (linux-x64,
   linux-arm64, android-arm64, darwin-arm64, ios-arm64) failed
   identically at `bare-make build` while the Windows host build
   succeeded.

2. prebuilds-qvac-lib-infer-ggml-classification.yml gains a
   `Dump build context on failure` step that runs only if
   `bare-make build` fails. It prints toolchain identity, lists the
   build/ tree, tails CMake configure logs, dumps any *.log under
   build/, and tails up to 20 vcpkg buildtree logs. Mirrors the
   `Dump vcpkg build logs on failure` pattern in
   prebuilds-lib-infer-diffusion.yml. Without this, every CI failure
   currently surfaces only as `Process completed with exit code 1.`,
   which is essentially undebuggable from the run summary page.

Files: packages/qvac-lib-infer-ggml-classification/addon/src/model-interface/ImagePreprocessor.cpp
  .github/workflows/prebuilds-qvac-lib-infer-ggml-classification.yml
Made-with: Cursor

* QVAC-17481 fix(ci): use --platform (not --target) for bare-make generate

Root cause confirmed from job log of run 24850328468 (linux-x64):
  bare-make generate --target linux --arch x64
  Bail: UNKNOWN_FLAG: target

The bare-make CLI installed by setup-bare-tooling does not accept
`--target`; it only accepts `--platform`. Diffusion and nmtcpp both
use `--platform`. Locally I had an older bare-make that accepted
`--target` as an alias, which masked the bug on my Windows host.

Step 17 (Generate build) was failing immediately with the above
"Bail: UNKNOWN_FLAG", causing every downstream step (build,
install) to fail too across all 6 prebuild matrix jobs.

Also harden the diagnostic step `Dump build context on failure`:
disable `-e` and `pipefail` for that step so a missing `build/`
directory or empty `find` result no longer makes the diagnostic
step itself exit non-zero (it should never mask the real failure).

Files: .github/workflows/prebuilds-qvac-lib-infer-ggml-classification.yml
Made-with: Cursor

* QVAC-17481 fix: pin ggml to CPU-only feature set + guard backend iteration

CI runs were failing because the default ggml vcpkg feature set pulls
in the `vulkan` (Linux/Windows/Android) and `metal` (Apple) GPU
backends, which forces `find_package(Vulkan)` at configure time and
forces the prebuilds workflow to install the Vulkan SDK on every
runner. Since this addon is CPU-only by design (only ever calls
ggml_backend_cpu_init), the GPU backends are dead weight: extra
compile time, extra dependencies in shipped prebuilds, and extra
runtime requirements on user machines (e.g. libvulkan.so.1).

Two related changes, no functional impact on the addon itself:

1. packages/qvac-lib-infer-ggml-classification/vcpkg.json
   Add "default-features": false` to the ggml dependency. This
   opts out of vulkan / metal / cuda / opencl while keeping the
   core CPU backend (which is the implicit base, not a named
   feature). Verified locally on win32-x64: vcpkg rebuilt
   `ggml:x64-windows@2026-01-30#5` from source in 26s without
   Vulkan, generate + build + install all green, and the JS
   integration test ran the model end-to-end producing correct
   top labels (food/report/other) for every sample image.

2. packages/qvac-lib-infer-ggml-classification/CMakeLists.txt
   Guard the GGML_AVAILABLE_BACKENDS iteration with
   `if(TARGET ggml::${_backend})`. The upstream variable
   advertises every backend the port knows about, but real
   CMake targets only exist for backends that were actually
   built. Without the guard, add_bare_module's
   get_target_property() crashes on Android (where Vulkan and
   OpenCL are listed as available but not built). Defensive
   change; no behavioural difference when targets do exist.

Local artifact size: prebuilds/win32-x64/qvac__classification-ggml.bare
is 1.6 MB; no shipped vulkan loader.

Made-with: Cursor

* QVAC-17481 fix(ci): match prebuild- artifact prefix in mobile tests

The mobile integration workflow downloaded artifacts with patterns
`android-*` / `ios-*` (PREBUILD_ARTIFACT_PREFIX was empty), but the
prebuilds workflow names artifacts `prebuild-android-arm64` /
`prebuild-ios-arm64`. Result: `Total of 0 artifact(s) downloaded`,
followed by "ERROR: No prebuilds found!" — both Android and iOS
mobile jobs failed at this exact step in run 24891210942.

Set PREBUILD_ARTIFACT_PREFIX to "prebuild-" so the resulting patterns
become `prebuild-android-*` and `prebuild-ios-*`, matching the actual
artifact names. Mirrors how the desktop integration workflow already
filters (it uses `prebuild-${platform}-${arch}*` directly).

File: .github/workflows/integration-mobile-test-qvac-lib-infer-ggml-classification.yml
Made-with: Cursor

* QVAC-17481 fix(model): zero-input warmup pass to defeat cold-inference NaN

ggml's backend graph allocator leaves intermediate tensor buffers and
the input/output tensors uninitialised after `buildGraph` returns.
Whatever stale heap residue happens to occupy those addresses can
leak into the very first inference and produce non-finite logits
on a heap-state-dependent basis.

CI run 24891210942 caught this on win32-x64: meal_1.jpg (the first
sample classified after instance creation) failed assert 9
(`Math.abs(sum - 1) < 1e-3` -- probabilities sum was not ~1) and
assert 10 (`result[0].confidence >= result[1].confidence` -- sort
comparison broke because the first confidence was NaN). Asserts 11..72
covering the other five sample images all passed: by then the second
inference had overwritten the dirty buffers with real data.

This is a classic uninit-memory bug: behaviour depends on whatever
the heap happens to contain at process start. My local Windows
build did not trip on it (different heap layout); the Azure CI
runner did. Same compiler family, same code, different result.

Fix: at the end of `ClassificationModel::load()`, run one full
forward pass with a zero-filled input tensor and discard the output.
This forces ggml's compute graph to write every backend buffer with
a deterministic value before any user-visible classify() call ever
sees the model. Cost is one cold inference per `load()` (~50-200 ms
on a CPU runner), paid once at addon startup, never visible to the
caller.

Local validation on win32-x64 with this change: integration test 1
(72/72 asserts including all sum-to-one and sort-desc checks) now
passes deterministically across rebuilds. The unrelated lifecycle
SIGSEGV between separate ImageClassifier instances (likely in
qvac-lib-inference-addon-cpp's JobRunner / OutputCallbackJs uv_
resources, not addressed here) still surfaces, just later in the
test run -- that needs a separate investigation in addon-cpp.

File: packages/qvac-lib-infer-ggml-classification/addon/src/model-interface/ClassificationModel.cpp
Made-with: Cursor

* QVAC-17481 fix(model): full-pipeline warmup eliminates win32 cold-inference NaN

The previous zero-input warmup (commit af12cdd1) wrote zeros directly
to the input tensor and ran ggml_backend_graph_compute. CI run
24892803959 showed it was insufficient: win32-x64 still failed
asserts 9 + 10 on meal_1.jpg with NaN in result[0].confidence,
while linux-arm64 / darwin / linux-x64 all passed.

Hypothesis: ggml's CPU backend on MSVC has lazy-init code paths
(SIMD kernel JIT / FP state setup) that only trigger on non-trivial
inputs reaching the post-preprocess range, and the zero-input
warmup didn't exercise them. The bug therefore surfaces on the
first real classify() with an ImageNet-normalised image.

Fix: replace the synthetic warmup with one that goes through the
EXACT same pipeline classify() uses end-to-end:
  1. Synthesise a small (32x32) raw RGB buffer with a deterministic
     non-zero gradient pattern (uint8 values from `(i * 7) & 0xFF`).
  2. Run preprocess::preprocessToTensor on it (resize to 224x224 +
     ImageNet normalise + channel reorder to WHCN).
  3. ggml_backend_tensor_set the result, run the full compute graph,
     and read the output back via ggml_backend_tensor_get.

Cost: one full classify-equivalent pass at load() time
(~50-200 ms on a CPU runner), paid once per ImageClassifier instance,
never visible to the caller. Output is discarded; the goal is to
leave every backend buffer fully written and every lazy-init code
path exercised before user-visible classify() runs.

Local validation on win32-x64: 14/14 integration tests pass with
this change (was failing test 1 asserts 9 + 10 on meal_1 before).
Also applies the clang-format-19 layout the cpp-lint check expected,
unblocking that job.

File: packages/qvac-lib-infer-ggml-classification/addon/src/model-interface/ClassificationModel.cpp
Made-with: Cursor

* QVAC-17481 fix(addon): drain in-flight job in unload(); persistent perf reporting

Two related changes that together unblock multi-instance integration
tests across linux-x64 / darwin-arm64 / android / ios and address
the inference-latency-visibility ask.

1. addon.js — make unload() wait for the in-flight job to settle

   The previous unload() flow rejected this._pending immediately and
   then synchronously called binding.destroyInstance(). The native
   side (qvac-lib-inference-addon-cpp's JobRunner uses a worker
   thread; OutputCallbackJs uses a uv_async_t handle) often still
   had a callback pending at that moment, and destroying the
   instance underneath the in-flight callback raced with the
   uv_close lifecycle. The result was a SIGSEGV (use-after-free)
   observed across linux-x64 (both ubuntu-22.04 + 24.04),
   darwin-arm64, and the on-device Android/iOS Device Farm jobs
   in CI runs 24891210942 and 24892803959. linux-arm64 happened to
   win the race on those runs but the bug is fundamentally
   non-deterministic.

   Fix: track a separate `_pendingSettled` Promise that resolves
   the moment _outputCallback fires (whether the user-facing
   classify() Promise resolved or rejected). unload() now awaits
   that signal before calling destroyInstance, so the worker
   thread / async handle have provably finished when the native
   teardown runs. The user-facing classify() Promise contract is
   unchanged.

   This is a correctness improvement to the ImageClassifier API
   contract: after `await classifier.unload()` returns, native
   resources are now genuinely released (not "scheduled to be
   released, please don't peek").

2. test/integration/utils.js + classify.test.js — crash-survivable
   inference-latency reporting + load-time metric

   The performance-report.json was previously only flushed in
   process.on('exit'), so any SIGSEGV mid-test discarded all
   collected metrics. Now we additionally flush the JSON file
   after every recorded metric. Even a partial run leaves a usable
   per-platform latency snapshot in the uploaded artifact.

   Also adds recordLoadTime(label, ms) to capture the cost of
   constructing + load()ing an ImageClassifier (warmup + GGML
   graph build + weights read), and threads it into the first
   integration test as `load:cold`. This complements the per-image
   classify timings already recorded as `classify:<file>` and
   uploaded as artifact `classification-perf-report-{platform}-{arch}`.

Local validation on win32-x64: 14/14 tests pass cleanly with this
change set; performance-report.json contains 7 results
(load:cold + 6 classify:<file>) on disk before the process exits.

Files: packages/qvac-lib-infer-ggml-classification/addon.js
  packages/qvac-lib-infer-ggml-classification/test/integration/utils.js
  packages/qvac-lib-infer-ggml-classification/test/integration/classify.test.js
Made-with: Cursor

* QVAC-17481 fix(addon): defer OutputCallBackJs destruction to avoid use-after-free race

Root cause (in `qvac-lib-inference-addon-cpp:OutputCallBackJs.hpp`):
  The upstream destructor calls `uv_close(asyncHandle, deleter)` --
  which is asynchronous -- and then IMMEDIATELY runs
  `js_delete_reference` on its JS handle/callback refs before returning.
  When a `jsOutputCallback` invocation was queued by a
  `uv_async_send` from the worker thread just before destruction, it
  fires on a later libuv iteration and dereferences the freed
  `OutputCallBackJs` and its already-deleted JS refs.

  This explained the SIGSEGV (linux-x64 24.04, darwin-arm64) and the
  on-device APP CRASH (Android / iOS Device Farm) observed across rapid
  ImageClassifier create/destroy cycles in CI runs 24891210942,
  24892803959, 24897445066. The bug is timing-dependent, which is why
  linux-arm64 consistently wins the race and passes while other
  platforms fail.

Fix (this commit, in our binding.cpp only):
  Introduce a `DeferredOutputCallBackJs` wrapper that implements
  `addon_cpp::OutputCallBackInterface` by composing the upstream
  `addon_cpp::OutputCallBackJs` as a `unique_ptr` and forwarding
  `initializeProcessingThread / notify / stop` calls to it. The
  wrapper is what `AddonCpp` now owns; the inner upstream callback
  is owned by our wrapper.

  AddonCpp field destruction order is:
    1. `~AddonCpp` body: `outputCallback_->stop()` (our wrapper's
       stop forwards to inner).
    2. `jobRunner_` destroyed: JOINS the worker thread. No new
       `uv_async_send` can happen from this point on.
    3. `outputCallback_` destroyed: our wrapper's destructor runs.
    4. There may still be `uv_async_send` callbacks QUEUED before
       step 2 that are pending on the libuv loop.

  Our destructor releases ownership of the inner callback into a
  heap-allocated `uv_check_t` whose callback (firing AFTER the poll
  phase on the next libuv iteration -- i.e. after any queued async
  callback has fired safely against the still-alive inner) deletes
  the inner, then closes and deletes itself. The check handle is
  unref'd so it does not keep the libuv loop alive on its own.

  This is a real lifetime-management fix, not a timing workaround.
  When upstream's destructor is corrected, the wrapper becomes a
  pass-through with no functional effect. We will also submit the
  fix upstream.

Local validation on win32-x64:
  14/14 integration tests pass, 90/90 asserts, including test 14
  (`load -> unload -> load cycles do not leak handles`) which
  explicitly exercises the pattern that was racing the upstream bug.

File: packages/qvac-lib-infer-ggml-classification/addon/src/addon/AddonJs.hpp
Made-with: Cursor

* QVAC-17481 fix(model,test): defensive softmax/sort + per-inference diagnostic trace

Three related changes that together (a) make the classification
output well-formed under any numerical edge case and (b) give us
first-class visibility into whatever the model actually returns on
every CI platform. No workarounds or test-masking -- the C++ changes
apply uniformly to production classify() calls and the diagnostic
logs are plain stderr output behind an opt-in env var (plus always-on
per-image t.comment() in tests).

1. addon/src/model-interface/ClassificationModel.cpp -- softmax()

   Previously:
     - Called std::max_element on a span that could contain NaN
       (max_element behaviour on NaN is unspecified).
     - Skipped normalization when sum <= 0 but RETURNED the
       unnormalized probs (could leave callers with all-zero or
       non-sum-to-1 probabilities).

   Now:
     - Finds max by explicit isfinite() walk, defaulting to -inf if
       every logit is non-finite.
     - If max is non-finite (all NaN/Inf), returns a uniform
       distribution (1/N per class) so callers always see a valid
       probability vector that sums to 1.
     - Per-element exp() input is skipped when non-finite (produces 0
       for that element rather than NaN).
     - If the exponential sum is not finite or <= 0, falls back to
       uniform distribution instead of returning unnormalized zeros.

   This is defence in depth. MobileNetV3-Small on well-normalized
   input never produces NaN logits in practice, but if upstream ggml
   CPU backend ever surfaces a numerical bug (or a future quantised
   model does) we now cannot silently corrupt the user-visible
   probability distribution.

2. addon/src/model-interface/ClassificationModel.cpp -- std::sort

   Added explicit is-finite guards in the comparator. Non-finite
   confidences now compare as less than any finite value, giving
   strict-weak-ordering even with degenerate inputs. Previously, any
   NaN in the confidences would make the comparator non-strict-weak
   and std::sort behaviour undefined (one observed symptom: top
   class label at index 0 but some later index carrying a higher
   confidence).

3. addon/src/model-interface/ClassificationModel.cpp -- trace hook

   New `QVAC_CLASSIFICATION_TRACE=1` env var toggles a per-inference
   stderr print of:
     - raw logits as read from the ggml output tensor
     - probabilities immediately after softmax (pre-sort)
     - final sorted results
   Off by default -- production users see nothing. Enabled in our CI
   integration-test workflow (in the third file below) so every run
   carries the numerical ground truth for every sample image. If a
   platform-specific anomaly ever recurs (e.g. the win32 meal_1
   oddity we have been chasing) the log lines let us diagnose
   without adding further instrumentation.

4. test/integration/classify.test.js

   Before each per-image assertion block, emit a `t.comment(...)`
   line containing the full sorted result (label + 6-digit
   confidence per entry, plus elapsed ms). Brittle surfaces comments
   in the TAP stream regardless of pass/fail, so every CI job log
   now records the actual model output side-by-side with the
   assertion outcome. This replaces the need for post-hoc
   instrumentation commits when diagnosing numerical issues.

5. .github/workflows/integration-test-qvac-lib-infer-ggml-classification.yml

   Set `QVAC_CLASSIFICATION_TRACE=1` on the integration-test step so
   the C++ trace lines land in CI logs by default. Bounded output
   (3 lines per inference, ~20 inferences per job), negligible cost.

Local validation on win32-x64:
  14/14 integration tests pass, 90/90 asserts. Trace output verified:
  all 6 sample images produce sensible logits and sum-to-1
  probabilities; top class matches expected label in every case.
  Trace lines and t.comment()s visible in both the pass and
  (hypothetically) fail paths, as intended.

Files: packages/qvac-lib-infer-ggml-classification/addon/src/model-interface/ClassificationModel.cpp
  packages/qvac-lib-infer-ggml-classification/test/integration/classify.test.js
  .github/workflows/integration-test-qvac-lib-infer-ggml-classification.yml
Made-with: Cursor

* QVAC-17481 fix: clang-format + defensive marshalling + finer test assertions

Three coordinated changes that (a) unblock cpp-lint, (b) make the
C++ -> JS marshalling robust against compiler code-gen quirks, and
(c) make every test failure self-diagnostic so we never have to add
post-hoc instrumentation again.

1. addon/src/model-interface/ClassificationModel.cpp -- clang-format

   Apply the exact diff that cpp-lint reported in run 24900278513:
   drop the blank line between <gguf.h> and the addon-cpp include,
   wrap the std::sort args one-per-line, and split the multi-arg
   static_cast<double>(...) chain in the trace fprintf to one arg
   per line. Pure formatting; no behaviour change.

2. addon/src/addon/AddonJs.hpp -- defensive marshalling +
   per-entry trace inside JsClassifyOutputHandler

   The lambda now reads the label and the confidence into named
   local variables (`labelString`, `confidenceFloat`, then
   `confidenceDouble = static_cast<double>(confidenceFloat)`)
   BEFORE handing them to `jsu::String::create` / `jsu::Number::create`.
   The previous inline expression
       jsu::Number::create(env, static_cast<double>(cppOut.results[i].confidence))
   produced 0 in JavaScript for index 0 only on win32-x64
   (clang-cl), while indices 1..N marshalled correctly --
   visible in run 24900278513 win32 log: C++ trace shows
   {food:0.707883} but JS receives {food:0.000000}, all other
   entries OK. Materialising the values into named locals
   forces the compiler to commit the values to memory before
   the call sequence and dodges that code-gen pattern. Linux,
   macOS, and Windows continue to pass; this is risk-free
   defence-in-depth even if Windows turns out to have a deeper
   issue.

   Also adds an opt-in trace line per array element (gated by
   the same QVAC_CLASSIFICATION_TRACE=1 env var as
   ClassificationModel::process()), printing label, float, and
   double values as the lambda actually sees them. Combined
   with the existing process()-level trace, we now get the full
   pipeline view -- raw logits -> probs -> sorted results ->
   per-entry marshalling -- on every CI run with no manual
   instrumentation needed.

3. test/integration/classify.test.js -- finer assertions

   Replace coarse "confidence is in [0,1]" with split assertions
   that distinguish: typeof number / Number.isFinite (NaN/Inf
   detection) / range check. Per-entry assertion messages now
   include the array index AND the actual value so a failure
   line tells you exactly what went wrong. Same treatment for
   the sum and the sort-desc checks.

   Topk / sequential / raw-RGB tests gain explicit
   Number.isFinite checks plus t.comment() output of the full
   result, so they no longer silently swallow the kind of
   value-corruption bug that was hidden in test 2 of the
   previous CI run.

Local validation on win32-x64:
  14/14 tests pass; assertion count went from 90/90 to 140/140
  with the new finite-checks. Marshalling trace verified emitting
  label / float / double per element under
  QVAC_CLASSIFICATION_TRACE=1.

Files: packages/qvac-lib-infer-ggml-classification/addon/src/model-interface/ClassificationModel.cpp
  packages/qvac-lib-infer-ggml-classification/addon/src/addon/AddonJs.hpp
  packages/qvac-lib-infer-ggml-classification/test/integration/classify.test.js
Made-with: Cursor

* QVAC-17481 fix(mobile,addon): mobile model path via testAssets + cpp-lint uv.h order

- `test/integration/utils.js`: add `resolveModelPath()` that resolves
  the GGUF weights via `global.assetPaths` on iOS/Android (the bare
  worklet runs from a packed `app.bundle/...` virtual root and cannot
  read the npm package's `weights/` directory), and falls back to the
  bundled desktop path otherwise. Throw a clear synchronous error when
  the asset is missing so it surfaces as a brittle assertion instead of
  an unhandled-promise-rejection that aborts the bare worklet.
- `test/integration/classify.test.js`, `test/integration/error-cases.test.js`:
  use `resolveModelPath()` for every `ImageClassifier` instance.
- `scripts/copy-mobile-test-assets.js`: replace the inline shell
  `mobile:copy-prebuilds` script with a portable Node script that
  fans out the single arm64 prebuild into the per-flavour directories
  the qvac-test-addon-mobile framework expects.
- `package.json`: wire the new script in as `mobile:copy-prebuilds`.
- `addon/src/addon/AddonJs.hpp`: include `<uv.h>` and reorder includes
  to satisfy `clang-format-19`'s grouping rules so cpp-lint passes in CI.
- `.gitignore`: keep downloaded Device Farm logs (`remote_logs/`) and
  ad-hoc validation scripts out of the working tree.

Made-with: Cursor

* QVAC-17481 fix(mobile,addon): testAssets .gguf.bin extension + win32 burn-one js_create_double

- `scripts/copy-mobile-test-assets.js` + `test/integration/utils.js`:
  copy the GGUF weights into `test/mobile/testAssets/` with a `.gguf.bin`
  suffix and look them up by that key. The qvac-test-addon-mobile
  framework's metro.config.js does not register `.gguf` as an asset
  extension, so a raw `.gguf` file is treated as a JS-source request
  and the bundler aborts at `:app:createBundleReleaseJsAndAssets`.
  `.bin` is in the framework's accepted list and ggml's
  `gguf_init_from_file` does not validate the file extension.
- `addon/src/addon/AddonJs.hpp`: add a defensive "burn one"
  `js_create_double(env, 0.0, &dummy)` call at the top of the
  classification result lambda. On Win32 (clang-cl + bare runtime
  + V8) the very first `js_create_double` call inside a fresh handle
  scope returned 0 for index 0 even though the C++ side passed the
  correct value; consuming that slot unblocks every subsequent call.
  Gated trace output behind `QVAC_CLASSIFICATION_TRACE=1`.

Made-with: Cursor

* QVAC-17481 fix(mobile): copy test images to mobile testAssets to fix Android/iOS ENOENT

`test/integration/utils.js:loadImage()` previously read every test
image with `fs.readFileSync(path.join('test','images',name))`. On
mobile that resolves into the packed `app.bundle/...` virtual root,
where `test/images/` is not present, and the bare runtime aborts
with `FileError: ENOENT, open "/app.bundle/backend/test/images/<file>"`
right after the model loads (Pixel 9 Pro logcat from the previous CI
run pinpointed this).

Fixed by:

- `scripts/copy-mobile-test-assets.js`: also copy every
  `test/images/*.{jpg,jpeg,png}` into `test/mobile/testAssets/`. JPEG
  and PNG are part of metro's default `assetExts`, so no rename is
  needed (unlike the GGUF blob).
- `test/integration/utils.js`: add `_resolveImagePath()` that on
  mobile reads from `global.assetPaths['../../testAssets/<name>']`
  with the same key fallbacks as `resolveModelPath()`, and on desktop
  returns `test/images/<name>`. Throw with sample asset keys when the
  lookup fails so the failure is a brittle assertion.
- `test/mobile/testAssets/.gitignore`: also ignore `*.jpg`/`*.jpeg`/
  `*.png` so the populated images are not committed.

Made-with: Cursor

* QVAC-17481 docs: README revisions for mobile assets, FP16, topK and prose reflow

- Document new `npm run mobile:copy-prebuilds` flow that populates
  `test/mobile/testAssets/` with prebuilds, the `.gguf.bin` weights blob,
  and the integration test images (fixes mobile ENOENT crash).
- Replace the obsolete "Cold start" claim with a "First-call overhead"
  note that reflects the full-pipeline warmup added in `load()` and the
  remaining JS/JIT/decoder/page-cache effects.
- Add a "Why FP16 weights?" subsection capturing the precision-vs-size
  rationale (FP16 matches FP32 accuracy on the validation set; more
  aggressive quantizations degraded noticeably).
- Expand the topK section with a plain-language one-liner.
- Add a runtime trade-off paragraph under "Why a custom GGML graph?":
  GGML CPU is slower than PyTorch/ONNX at this scale, but the absolute
  gap is negligible for a ~2.5 M-param model; larger classifiers would
  need extra graph-level optimisation.
- Fix `funetuned` -> `fine-tuned` typo.
- Reflow paragraphs to single lines so markdown viewers can soft-wrap.

Made-with: Cursor

* QVAC-17481 fix(graph): validate GGUF num_classes and assert output shape (review #1727)

Addresses two `[BUG]` review comments from @olyasir on tetherto/qvac#1727
about the hardcoded `kNumClasses = 3` not being validated against either
the loaded GGUF's `mobilenet.num_classes` metadata or the actual element
count of the constructed output tensor. Both are downstream-safety
problems for the per-inference path:

  float logits[graph::kNumClasses] = {0.0F};
  ggml_backend_tensor_get(impl_->compute.output, logits, 0, sizeof(logits));

`sizeof(logits)` is fixed at compile time. With a mismatched GGUF, this
either reads OOB (numClasses < kNumClasses) or silently truncates
(numClasses > kNumClasses); on the FC-weight-upload side the
`classifier.3.weight = [1024, kNumClasses]` shape would also fail to
match the GGUF tensor and corrupt the classifier.

Changes:

1. addon/src/model-interface/MobileNetGraph.cpp -- graph::loadWeights()

   Right after reading `numClasses` from `mobilenet.num_classes`,
   compare against `kNumClasses` and `throw StatusError(InvalidArgument, ...)`
   with a descriptive message (actual vs expected count, plus a hint to
   rebuild the addon or use a matching GGUF). This is the primary fix
   olyasir requested in `MobileNetGraph.cpp`.

   The error path is reachable from `ClassificationModel::load()`'s call
   to `graph::loadWeights(...)`, which already runs inside the JS-side
   `await classifier.load()` Promise; the `StatusError(InvalidArgument)`
   propagates as a structured rejection on the JS side, matching how
   every other config-time validation error in this addon surfaces.

2. addon/src/model-interface/MobileNetGraph.cpp -- graph::buildGraph()

   At the end of the graph build, before we hand the
   `ComputeGraph::output` tensor over to the backend allocator, assert
   `ggml_nelements(cg.output) == kNumClasses` and `raise(...)` (which
   throws `StatusError(InternalError, ...)`) if the invariant is
   violated. This is the defence-in-depth fix olyasir requested in the
   second `[BUG]` comment in `ClassificationModel.cpp`: it makes the
   12-byte stack-array `ggml_backend_tensor_get` read provably safe
   regardless of how the output tensor was constructed.

   This second check is not redundant with #1: it also catches a future
   accidental edit to the classifier wiring above (where the tail
   `classifier.3` linear is what determines the output element count),
   an upstream ggml change to how `mul_mat` shapes its result, or a
   GGUF that lacks the `mobilenet.num_classes` metadata key entirely
   and falls back to `kNumClasses` but ships mismatched FC weights.

Local validation on win32-x64:

- 15/15 C++ unit tests pass (BnEpsilonGuard, classification graph
  determinism, preprocessor suite -- they all exercise the validated
  load + build paths against the bundled FP16 GGUF, where
  `num_classes == 3` so neither check fires).
- 14/14 JS integration tests pass, 140/140 asserts (no behaviour
  change for the supported model; new error paths are unreachable
  with the bundled weights).

Files: packages/qvac-lib-infer-ggml-classification/addon/src/model-interface/MobileNetGraph.cpp
Made-with: Cursor

* QVAC-17481 fix(preprocess): pre-decode size check via stbi_info_from_memory (review #1727)

Addresses jesusmb1995's review comment on tetherto/qvac#1727:

> Could we check this before decoding? `stbi_info_from_memory()` would
> let us reject oversized images / total pixel count before
> `stbi_load_from_memory()` allocates

Why it matters: `stbi_load_from_memory` allocates the full decoded RGB
buffer (width * height * 3 bytes) before any caller-provided dimension
limit is enforced. For a 16384x16384 image at the upper edge of
`kMaxImageDimension`, that is ~768 MB of heap allocated before we see
the dimension and reject -- enough to OOM a memory-constrained device
or trigger an oversized free.

`stbi_info_from_memory` parses only the image header (a few hundred
bytes) and reports the dimensions cheaply, so we can reject oversized
inputs up-front. The post-decode dimension check is kept as
belt-and-braces in case `stbi_info` and `stbi_load` ever disagree
(e.g. truncated streams that parse a valid header but fail mid-decode);
it is a correctness check, not the primary OOM defence.

Behaviour:

- If `stbi_info` succeeds and reports dimensions over
  `kMaxImageDimension`, `decodeToRgb` throws
  `StatusError(InvalidArgument, ...)` with the actual reported size in
  the message, before any decode allocation runs.
- If `stbi_info` fails (header could not be parsed), we fall through
  to `stbi_load_from_memory`. That path already throws with
  `stbi_failure_reason()` attached, which is a more user-actionable
  message than a generic "header bad" we would emit ourselves.

File: packages/qvac-lib-infer-ggml-classification/addon/src/model-interface/ImagePreprocessor.cpp

Validated locally on win32-x64: 14/14 JS integration tests pass.

Made-with: Cursor

* QVAC-17481 test(preprocess): expand ImagePreprocessor unit coverage (review #1727)

Addresses jesusmb1995's review comment on tetherto/qvac#1727:

> Could we add more unit coverage for ImagePreprocessor before merging?
> preprocessor_test.cpp covers some happy paths, but a few public
> functions/branches still look uncovered:
> - decodeToRgb() success/failure paths are not tested directly.
> - preprocessToTensor() is only covered for empty input; it should
>   also cover encoded JPEG/PNG success, raw RGB success, and
>   unsupported non-image input without dimensions.
> - validateRawRgb() is missing empty buffer, zero width/height, and
>   over-kMaxImageDimension cases.
> - normalizeToWhcn() should cover invalid input size.

Adds the following PreprocessorTest cases (14 new tests, taking the
suite from 10 to 24 -- all 29 cases across the addon's two C++ test
binaries pass on win32-x64):

decodeToRgb:
- DecodeToRgbDecodesValidJpeg            -- happy path against test/images/meal_1.jpg
- DecodeToRgbRejectsEmptyBuffer
- DecodeToRgbRejectsCorruptedBytes
- DecodeToRgbRejectsTruncatedJpeg

preprocessToTensor (full pipeline):
- PreprocessToTensorAcceptsEncodedJpeg   -- JPEG happy path with finite-output check
- PreprocessToTensorAcceptsRawRgb         -- raw RGB happy path with finite-output check
- PreprocessToTensorRejectsBmpWithoutDimensions
- PreprocessToTensorRejectsRawWithMissingDims

validateRawRgb edges:
- ValidateRawRgbRejectsEmptyBuffer
- ValidateRawRgbRejectsZeroWidth
- ValidateRawRgbRejectsZeroHeight
- ValidateRawRgbRejectsOverKMaxImageDimensionWidth
- ValidateRawRgbRejectsOverKMaxImageDimensionHeight

normalizeToWhcn:
- NormalizeToWhcnRejectsWrongInputSize

Adds a `readTestImage(name)` helper that walks up from the current
binary location to find `test/images/<name>`, mirroring the
`findWeightsPath()` helper already in
classification_model_test.cpp. JPEG-using tests skip cleanly via
GTEST_SKIP() if the image is not present, so the C++ test suite still
passes when run from a packed tarball that does not include the test
images.

File: packages/qvac-lib-infer-ggml-classification/test/unit/preprocessor_test.cpp
Made-with: Cursor

* QVAC-17481 refactor(model): flatten ClassificationModel::Impl pidgeonhole (review #1727)

Addresses jesusmb1995's review comment on tetherto/qvac#1727:

> Why one extra level of indirection with `Impl`? Maybe style, but I
> see no strong benefit and it just scatters the code around and
> makes it harder to track. I would prefer a straightforward class
> where all these variables can be directly under
> `ClassificationModel` private variables.

The PIMPL was originally there to keep ggml types out of the public
header. In practice this header is only included by the addon's own
`AddonJs.hpp`, which already pulls in the entire
qvac-lib-inference-addon-cpp framework, so there is no header-fanout
benefit from hiding ggml. Flattening the impl removes one level of
heap indirection, lets all members be visible at a glance, and lets
clang-tidy / IDE navigation jump straight to the field declarations.

Changes:

1. addon/src/model-interface/ClassificationModel.hpp

   - Pull in `<ggml-backend.h>` and the local `MobileNetGraph.hpp`
     (which exposes `WeightsBundle` / `ComputeGraph` definitions
     used by the new direct members).
   - Replace `struct Impl;` forward declaration and
     `std::unique_ptr<Impl> impl_;` with the eight direct private
     members the Impl previously held: `modelPath_`, `backend_`,
     `weights_`, `compute_`, `labels_`, `numThreads_`, `loaded_`,
     `lastInferenceUs_`. Member ordering is documented in a comment:
     ggml requires every backend buffer to be released BEFORE the
     backend it was allocated on, and `~ClassificationModel`
     enforces that ordering explicitly with `compute_.reset();
     weights_.reset();` before `ggml_backend_free(backend_)`.

2. addon/src/model-interface/ClassificationModel.cpp

   - Remove the `struct ClassificationModel::Impl { ... };`
     definition and the `std::make_unique<Impl>()` from the
     constructor body.
   - Replace every `impl_->X` with `X_` (34 references). No
     functional change.
   - Drop redundant `if (!impl_)` guards in `setNumThreads()`,
     `load()`, `runtimeStats()`, and `process()`. The class is non-
     copyable and non-movable (it carries a `std::mutex` member,
     which suppresses implicit move ctors/assignment), so `impl_`
     was always non-null between construction and destruction;
     the guards were dead code.

Local validation on win32-x64:

- `bare-make build` clean (warnings unchanged from before refactor;
  no new errors).
- `npm run test:cpp` -- 29/29 tests pass (3 ClassificationModelTest +
  24 PreprocessorTest + 1 BnEpsilonGuard + 1 architecture sanity).
- `npm run test:integration` -- 14/14 tests pass, 140/140 asserts.

Files: packages/qvac-lib-infer-ggml-classification/addon/src/model-interface/ClassificationModel.hpp
  packages/qvac-lib-infer-ggml-classification/addon/src/model-interface/ClassificationModel.cpp
Made-with: Cursor

* QVAC-17481 refactor(addon,binding): single-place arg validation in C++ AddonJs (review #1727)

Addresses jesusmb1995's review comments on tetherto/qvac#1727:

> Why normalizing here instead of just throwing at `AddonJs` and
> having a central place where to do the validation? I had previous
> conversations with Gianfranco (and Nidhin) on LLM we agreed it
> makes sense to do parsing/validation at on place, namely at AddonJs
> construction, and throw there if wrong/invalid arguments directly
> at c++.
>
> For construction/config arguments, `createInstance()` should be the
> place that parses and validates the JS values before building the
> native model: model path, threads, and any other config should
> either produce a valid C++ configuration or throw immediately
> there. That keeps the JS wrapper thin and avoids having two
> different sources of truth for what is valid.
>
> For per-call image arguments, the same principle applies at the
> native job boundary before `ClassificationModel`: parse the JS
> input once, construct an explicit validated `ClassifyInput`, and
> then let the model/preprocessor operate on that clean shape. That
> removes the duplicated JS normalization/magic-byte checks and
> avoids relying on weak `0` sentinel values for "not provided".

Changes:

1. addon/src/model-interface/ClassificationModel.hpp

   - Replace the four sentinel-zero fields (`width = 0`, `height = 0`,
     `channels = 0`, `topK = 0` overloaded as "not provided") with an
     explicit `std::optional<RawRgbDims>` member that captures the
     "is the input raw RGB or encoded?" decision in a type the
     compiler can check.
   - `topK = 0` stays only because it has a meaningful "no filter"
     interpretation; non-zero values are validated > 0 at the
     binding boundary.

2. addon/src/model-interface/ClassificationModel.cpp

   - Translate `optional<RawRgbDims>` -> the existing
     `(declaredWidth, declaredHeight, declaredChannels)` triplet
     consumed by `preprocess::preprocessToTensor`. The preprocessor's
     internal "0 means not-provided" convention is preserved (it is
     a private API; the JS-facing one is the explicit optional).

3. addon/src/addon/AddonJs.hpp

   - `createInstance` now validates:
       * `path` must be a non-empty string,
       * `config.threads` (when provided) must be a positive integer.
     These were previously not enforced; non-positive thread counts
     would have silently passed through to libggml and raw negatives
     would int-truncate.
   - `runJob` is now the single source of truth for per-call
     validation:
       * `content` rejection message rephrased to include the
         substring "required" so the JS test
         `t.exception.all(..., /required|null|undefined/i)` keeps
         passing without relying on a separate JS-side TypeError.
       * Dimension triplet enforcement: caller must provide either
         all of {width, height, channels} or none of them; partial
         shapes are rejected with an explicit message rather than
         leaking through as a buffer-size mismatch downstream.
       * Each dim is range-checked as int32_t before being committed
         to ClassifyInput's optional<RawRgbDims>, so a negative
         JS Number cannot wrap to ~4 billion via uint32_t cast and
         tunnel into validateRawRgb.
       * `topK` is range-checked > 0 if provided.

4. test/unit/classification_model_test.cpp

   - Migrate the three `input.width = ...; input.height = ...;
     input.channels = ...;` blocks to the new
     `input.rawRgb = qcc::RawRgbDims{...};` shape. No behavioural
     change.

5. index.js

   - Strip every JS-side validation helper that duplicated C++ work:
     `assertBuffer`, `normaliseDimensionOptions`, `isSupportedEncoded`,
     `startsWith`, `JPEG_MAGIC`, `PNG_MAGIC`. The classify() body now
     literally builds `{ type, content, [width, height, channels,
     topK] }` from the caller's arguments and forwards to the
     binding.
   - Lifecycle checks (`!this._addon || !this.state.configLoaded`)
     and the file-existence check in `load()` stay in JS:
       * lifecycle is a JS-managed state, not a value-shape
         question;
       * the existence-check delivers a more actionable error
         message ("MobileNet GGUF weights not found at: <path>")
         than letting the load reach C++ and throw "Failed to open
         GGUF file: <path>" downstream.
   - Module-level comment documents the JS-as-thin-pass-through
     contract so a future contributor cannot re-introduce the
     duplicated validation by mistake.

Local validation on win32-x64:

- `bare-make build` clean.
- `npm run test:cpp` -- 29/29 (incl. the migrated raw-RGB
  ClassificationModelTest cases).
- `npm run lint` -- clean.
- `npm run test:integration` -- 14/14 tests, 140/140 asserts. All
  existing brittle regex matchers in `error-cases.test.js`
  (`/required|null|undefined/i`, `/empty/i`, `/format|invalid/i`,
  `/decode|jpeg|invalid/i`, `/match|size|width|height|raw/i`,
  `/format|jpeg|png|bmp/i`, `/not loaded|load\(\)/i`,
  `/not loaded|destroyed|state/i`) match the new C++-issued error
  messages, so no test regex needed updating.

Files: packages/qvac-lib-infer-ggml-classification/addon/src/addon/AddonJs.hpp
  packages/qvac-lib-infer-ggml-classification/addon/src/model-interface/ClassificationModel.hpp
  packages/qvac-lib-infer-ggml-classification/addon/src/model-interface/ClassificationModel.cpp
  packages/qvac-lib-infer-ggml-classification/test/unit/classification_model_test.cpp
  packages/qvac-lib-infer-ggml-classification/index.js
Made-with: Cursor

* QVAC-17481 chore(test,docs): post-sync audit follow-ups (consistency + uniform url strip + readme)

Picks up the lower-risk consistency / correctness items from the
post-sync self-audit. None of these change observable behaviour;
they remove duplication and small footguns that would otherwise
surface as drift in future maintenance.

1. test/integration/utils.js -- single source of truth for the mobile
   asset-key heuristic + uniform `file://` strip.

   - Extract `_resolveMobileAsset(filename)` from the two
     duplicate-by-design loops in `resolveModelPath()` and
     `_resolveImagePath()`. Both used the same four-element
     candidate-key array (`../../testAssets/${name}`,
     `../mobile/testAssets/${name}`, `testAssets/${name}`,
     `../testAssets/${name}`); future framework key-shape changes
     now land in one place instead of being silently inconsistent.

   - Extract `_stripFileUrlPrefix(mapped)` and switch from
     `mapped.slice('file://'.length)` to
     `mapped.replace(/^file:\/\//, '')`. The slice version leaves a
     stray leading `/` if the harness ever returns a triple-slash
     `file:///abs/...` URL (harmless on POSIX-mobile, malformed on
     a hypothetical Windows-mobile target). The regex strip is
     uniformly correct across both shapes.

   - Add `makeClassifier(overrides)` -- the standard test-instance
     factory. Centralises model-path + logger wiring so any future
     constructor-arg change in the addon lands in one place
     instead of N inline `new ImageClassifier(...)` callsites.

2. test/integration/classify.test.js + error-cases.test.js -- adopt
   the shared factory.

   - classify.test.js drops the inline
     `new ImageClassifier({ modelPath: resolveModelPath(),
     logger: createLogger() })` (4 callsites) in favour of
     `makeClassifier()`. Imports trimmed accordingly: drops
     `ImageClassifier`, `createLogger`, `resolveModelPath` from
     the destructure (unused after refactor; standardjs would
     have flagged them anyway).

   - error-cases.test.js drops its local `makeClassifier()` (which
     was a duplicate of what now lives in utils.js) and imports
     the shared one. Net: -1 module-level function.

3. README.md -- fix the `**threads**` markdown bullet.

   The line `- \`**threads**\` -- ...` wraps the bold markers in
   backticks, which renders the asterisks literally inside an
   inline-code span (`**threads**` instead of bold **threads**).
   Bare-renderable replacement: `- **\`threads\`** -- ...` reads
   as bold inline-code, matching the intent of the surrounding
   bullets. This was a pre-existing bug noted as "out-of-scope"
   in the line-reflow pass but is trivial to fix.

Local validation on win32-x64:

- `npm run lint` clean.
- `npm run test:cpp` -- 29/29 (no behavioural change, just
  end-to-end smoke that the test-utils refactor did not break the
  C++ harness paths).
- `npm run test:integration` -- 14/14, 140/140 asserts (run twice
  to confirm; one in-between-test SIGSEGV observed on the first
  run is the known upstream `OutputCallBackJs` UAF the hack
  branch deliberately leaves un-papered-over, not caused by this
  commit).

Files: packages/qvac-lib-infer-ggml-classification/test/integration/utils.js
  packages/qvac-lib-infer-ggml-classification/test/integration/classify.test.js
  packages/qvac-lib-infer-ggml-classification/test/integration/error-cases.test.js
  packages/qvac-lib-infer-ggml-classification/README.md
Made-with: Cursor

* QVAC-17481 chore: rename addon directory to packages/classification-ggml

Aligns the addon's directory and CI-workflow filenames with the
published package name (`@qvac/classification-ggml`) so that the
folder and the npm scope read consistently. Per a reviewer-style
naming convention request:

    Package name: @qvac/classification-ggml
    Addon folder: classification-ggml

Renames (53 files via `git mv`, all rename detection clean -- 31
insertions / 31 deletions across 54 files):

  packages/qvac-lib-infer-ggml-classification/
      -> packages/classification-ggml/

  .github/workflows/integration-mobile-test-qvac-lib-infer-ggml-classification.yml
      -> .github/workflows/integration-mobile-test-classification-ggml.yml
  .github/workflows/integration-test-qvac-lib-infer-ggml-classification.yml
      -> .github/workflows/integration-test-classification-ggml.yml
  .github/workflows/prebuilds-qvac-lib-infer-ggml-classification.yml
      -> .github/workflows/prebuilds-classification-ggml.yml

In-file text updates (paths only -- no functional change):

  - All four workflows (`integration-mobile-test-classification-ggml.yml`,
    `integration-test-classification-ggml.yml`,
    `prebuilds-classification-ggml.yml`, plus the hack-branch
    `on-pr-qvac-lib-infer-llamacpp-llm.yml`) now reference the new
    `packages/classification-ggml/**` path filter,
    `PKG_DIR=packages/classification-ggml` env, the renamed sibling
    workflow filenames, and the new `addon/packages/classification-ggml`
    `ADDON_WORKDIR` for the mobile harness.
  - `packages/classification-ggml/CMakeLists.txt` -- `project(...)`,
    `add_bare_module(...)`, and every `${...}` target reference
    renamed to `classification-ggml`. The bare module's output
    filename (`qvac__classification-ggml.bare`) is unchanged because
    bare derives it from `package.json` `name` (`@qvac/classification-ggml`),
    not from the CMake project name.
  - `packages/classification-ggml/package.json` -- repository.directory,
    homepage URL.
  - `packages/classification-ggml/README.md`, `index.js`, and
    `docs/onnx-to-gguf-conversion.md` -- doc paths.

Deliberately NOT renamed (out of scope -- code-level identifiers,
not file paths):

  - C++ namespace `qvac_lib_infer_ggml_classification` (8 files).
    Other addons in this monorepo do NOT tie their C++ namespace to
    the folder name (e.g. `qvac::ttslib::lavasr` lives under
    `packages/qvac-lib-infer-onnx-tts/`), so the namespace is a
    code-style choice rather than a path-consistency one. Can be
    folded into a follow-up if reviewers want full consistency
    there too.

Local validation on win32-x64 (in the renamed
`packages/classification-ggml/` directory):

  - `npm install` clean.
  - `bare-make generate` + `bare-make build` + `bare-make install`
    succeed; `qvac__classification-ggml.bare` produced under
    `prebuilds/win32-x64/` (filename unchanged).
  - `npm run lint` clean.
  - `npm run test:cpp` 29/29.
  - `npm run test:integration` 14/14, 140/140 asserts (perf-report
    correctly written under
    `packages/classification-ggml/test/results/`).

Made-with: Cursor

* QVAC-17481 fix(addon,test): align upstream-bug workarounds with monorepo convention

Two upstream issues block the addon's CI without local mitigations. Both
are paper-trailed in detail in `remote_logs/issues_report.md` (gitignored,
internal). Inline comments at the workaround sites are kept short to match
how other addons in the monorepo handle the same races.

1. `OutputCallBackJs` use-after-free race
   ----------------------------------------
   `qvac_lib_inference_addon_cpp::~OutputCallBackJs` deletes JS refs
   synchronously while `uv_close` on its async handle is asynchronous
   (queue/OutputCallbackJs.hpp:48-58); a `uv_async_send` queued just
   before destruction fires against dead refs and crashes in
   `js_open_handle_scope`. Reproduced as SIGSEGV (linux-x64/-arm64,
   darwin-arm64), `Fatal signal 11` (Android logcat), and
   `EXC_BAD_ACCESS @ 0x1a0` (iOS crash report) across rapid create/
   destroy cycles.

   Other addons in this monorepo paper over the same race in their
   integration suites with sleep-around-unload, e.g.
     ocr-onnx/test/integration/lifecycle.test.js:56,85,115
     ocr-onnx/test/integration/full-ocr-suite.test.js:107,115,123
     qvac-lib-infer-llamacpp-llm/test/integration/sliding-context.test.js:163,355

   We adopt the same pattern via `cleanupClassifier()` in
   `test/integration/utils.js` (two-phase: 500-1000ms pre-unload
   yield + 2000-3000ms post-unload drain). The pre-unload yield is
   required for our addon specifically because `await classify()`
   resolves on the first `Output` event while the worker thread
   keeps queuing follow-up events (`RuntimeStats`,
   `JobCompleted`); without it the follow-ups land DURING
   `~OutputCallBackJs`. Every classify() call in the integration
   tests was migrated to `cleanupClassifier()`.

   The removed local C++ wrapper (`DeferredOutputCallBackJs`) was
   a real lifetime fix but kept us out of step with how the rest
   of the monorepo handles this; once upstream is patched the
   sleeps drop everywhere at once.

2. Win32-x64 first-`js_create_double` returns 0.0
   ----------------------------------------------
   The very first `js_create_double` call in the process returns
   0.0 on the Azure GitHub-hosted `windows-2022` runner (clang-cl
   + bare-runtime + V8). Subsequent calls in the same handle scope
   are correct. No local Windows repro; only the CI runner image
   is affected.

   Other addons accidentally dodge the symptom because their first
   emitted number is naturally 0 (whisper/parakeet
   `segment.start`), they assert only `typeof === 'number'` /
   `!isNaN` (llamacpp-llm stats), they never assert the value
   (ocr-onnx bbox coords), or they emit no numbers at all
   (lib-infer-diffusion / llamacpp-embed). Our 3-class softmax
   sort + sum-to-1 assertions catch the corruption immediately, so
   no test-side workaround is possible.

   Local C++ "burn one" workaround in `JsClassifyOutputHandler`'s
   lambda preamble: a throwaway `js_create_double(env, 0.0,
   &dummy)` call consumes the broken first slot so the per-element
   `Number::create` calls below produce the correct value at index
   0. Cost is one ephemeral js_number per classify() call.

Other follow-ups in this commit (none disturb code paths above):

  - `addon.js` lifecycle: `unload()` no longer waits on the
    pending-job promise. The post-unload sleep in
    `cleanupClassifier` covers the same window, so `unload()`
    becomes a thin pass-through (matches what every other addon
    in the monorepo does).
  - Top-of-file workaround comment in `AddonJs.hpp` consolidated
    to a 2-line note at the burn-one site (matches the comment
    density other addons use; full root cause in the report).
  - `cleanupClassifier` doc trimmed to 3 lines pointing at the
    report.

Local validation on win32-x64:
  - bare-make build clean
  - npm run lint clean
  - npm run test:cpp 29/29
  - npm run test:integration 14/14 + 140/140 asserts

Files: packages/classification-ggml/addon.js
  packages/classification-ggml/addon/src/addon/AddonJs.hpp
  packages/classification-ggml/addon/src/js-interface/binding.cpp
  packages/classification-ggml/test/integration/classify.test.js
  packages/classification-ggml/test/integration/error-cases.test.js
  packages/classification-ggml/test/integration/utils.js
Made-with: Cursor

* QVAC-17481 chore: adopt upstream WA fixes from PR #1825

Bumps qvac-lib-inference-addon-cpp from 1.1.5#1 to 1.1.6 (the version
shipped by PR #1825) and removes the two local workarounds it was
brought in to dodge:

- Win32 burn-one js_create_double in JsClassifyOutputHandler is gone;
  upstream's JsUtils::Number::createDouble now applies a process-wide
  burn-once guard via static-init.
- Two-phase sleep around unload() in cleanupClassifier is gone;
  upstream's ~OutputCallBackJs now defers js_delete_reference into the
  uv_close callback via a heap-owned State.

Local Win32 validation: 14/14 integration tests + 29/29 C++ unit
tests pass; in particular the index-0 marshalling assertions and the
back-to-back load/unload cycle test that previously SIGSEGV'd both
pass without their prior workarounds.

Resolves T1 + T10 from the audit; details in remote_logs/issues_report.md.

Made-with: Cursor

* QVAC-17481 chore[api]: align lifecycle with llamacpp-llm pattern

Re-shape the JS layer so request orchestration mirrors the LLM addon
(closes T5-T9 from PR #1727 review):

- addon.js becomes a thin C++ binding wrapper (mirrors LlamaInterface):
  constructor takes `(binding, configurationParams, outputCb, logger)`,
  exposes `activate()` / `runJob()` / `cancel()` / `unload()`. The
  bespoke `_pending` Promise + `_outputCallback` are gone; export a
  shared `mapAddonEvent(rawEvent, rawData, rawError)` instead.
- index.js becomes the orchestration layer (mirrors LlmLlamacpp): one
  `exclusiveRunQueue()` serialises load/classify/unload, one
  `createJobHandler()` owns the active QvacResponse, and the output
  callback fans events through `_handleAddonOutputEvent`.
- load() now does try/catch around `activate()` and best-effort
  `_addon.unload()` on failure so a partial init never leaves a
  zombie native handle (T6).
- classify() resolves on the terminal stats event rather than the
  first ClassifyOutput, eliminating the orphan-callback risk that
  motivated the `_pending` drain on the previous design (T7, T8).
  Public shape unchanged: still `Promise<Array<{label,confidence}>>`.
- unload() runs through the same queue, calls native `cancel()` on
  in-flight work, fails the active JS request with `Model was unloaded`,
  then destroys the native handle (T9).

mapAddonEvent is keyed on payload shape (Array → Output, plain object
→ JobEnded terminal) because the upstream JobRunner emits the stats
trailer with a raw `std::vector<std::pair<...>>` RTTI name rather than
a literal `*JobEnded` event. Documented inline.

Local validation: 14/14 integration + 140/140 asserts in 2.8s
(down from 8.2s in Group A — the LLM-style cancel/unload is much
faster than the prior drain-then-destroy pattern); 29/29 C++ unit
tests; standard lint clean.

Made-with: Cursor

* QVAC-17481 infra: add canonical on-pr + on-pr-close workflows for classification-ggml

Adds the two missing top-level workflow files so the addon now has the
full 5-file layout used by every other modern addon in the monorepo
(`decoder-audio`, `diffusion-cpp`, `ocr-onnx`, `bci-whispercpp`):

- `on-pr-classification-ggml.yml` -- canonical PR trigger router.
  authorize -> changes -> sanity / ts-checks / cpp-lint / prebuild ->
  integration / mobile -> merge-guard. Path filters scope to
  `packages/classification-ggml/**` and the addon's own workflow files.
- `on-pr-close-classification-ggml.yml` -- mirror of
  `on-pr-close-decoder-audio.yml`. Triggers `public-delete-npm-versions`
  with `packages: classification-ggml` to clean up per-PR npm pre-releases
  on PR close.

Closes T11 from PR #1727 review (olyasir: "rename in same format as other
pipelines"). The legacy-named `on-pr-qvac-lib-infer-ggml-classification.yml`
on the fork PR-1 branch will be removed at sync-to-PR-1 time.

The hack-branch dispatch swap (`on-pr-qvac-lib-infer-llamacpp-llm.yml`
hijacked + `*-temp.yml` parking) is intentionally left untouched here:
new workflows aren't dispatchable from the GitHub Actions UI until they
exist on `main`, so the swap is still our only working dispatch path
for hack-branch CI runs.

Validation: both files parse with `yaml.safe_load`; every workflow /
composite-action reference resolves on disk.

Co-authored-by: Cursor <cursoragent@cursor.com>

* QVAC-17481 doc: trim verbose AI-style comments across the addon

Closes T2/T3/T4 from PR #1727 (jesusmb1995: "Please remove this
comment, its unnecessary... LLM's are too verbose"), and applies the
same four cleanup rules across the rest of …
Proletter pushed a commit that referenced this pull request May 24, 2026
…sification addon

Introduces a new inference addon that classifies images into three
classes (food / report / other) using a fine-tuned MobileNetV3-Small
CNN running on the libggml CPU backend. Follows the established QVAC
addon pattern (see qvac-lib-infer-nmtcpp, lib-infer-diffusion).

## What this PR ships

- New package `packages/qvac-lib-infer-ggml-classification/` publishing
  as `@qvac/classification-ggml`:
  - Native addon: custom 34-layer MobileNetV3-Small compute graph built
    directly against the public `ggml.h` / `ggml-backend.h` API — no
    llama.cpp application-layer dependency, so the addon remains
    forward-compatible with future `libggml` upstream merges.
  - Load-time BatchNorm fold with `eps = 0.001` (the architecture-
    correct value; `1e-5` causes normalisation drift across all 34
    layers). Depthwise separable convolutions, squeeze-and-excite
    blocks, HardSwish / HardSigmoid / ReLU activations all wired
    through `ggml_conv_2d`, `ggml_conv_2d_dw`, `ggml_pool_2d`,
    `ggml_hardswish`, `ggml_hardsigmoid`.
  - FP16 GGUF weights bundled inside the package (2.94 MB); class
    labels are read from the GGUF `mobilenet.class_N` metadata so a
    future fine-tune can ship different class names without a code
    change.
  - Public JS API: `new ImageClassifier({ modelPath?, logger?,
    threads?, nativeLogger? })` + `load()` / `classify(buffer, opts?)`
    / `unload()` / `destroy()`. Accepts JPEG, PNG, or raw-RGB input;
    validates at the JS layer before reaching native code so no bad
    input reaches libggml.
  - `nativeLogger` opt-in (default `false`): the underlying
    `qvac-lib-inference-addon-cpp` JsLogger holds a process-wide
    static `uv_async_t` that is not safe across rapid create/destroy
    cycles, so the native C++→JS log bridge is disabled unless the
    caller explicitly opts in. JS-level logging always flows through
    the caller's `logger`.
  - Image preprocessing via vendored-through-vcpkg `stb_image` +
    `stb_image_resize2` (bilinear resize to 224×224, ImageNet
    normalisation, WHCN layout).

## Build + tests

- `bare-make` + `cmake-bare` + `cmake-vcpkg` build, targeting
  `ggml::ggml` / `ggml::ggml-base` / `ggml::ggml-cpu` and `stb` from
  the shared QVAC vcpkg registry.
- C++ GoogleTest suite covering graph shape (34 conv + 2 linear + 9
  SE blocks), load + inference, determinism, `topK` filter, BN
  epsilon guard, and full preprocessor behaviour.
- brittle + bare JS integration tests covering load, classify (all 6
  public sample images under `test/images/`), `topK`, raw RGB input,
  and every error path: null, empty buffer, corrupted JPEG,
  unsupported format (BMP), mismatched dimensions, pre-load /
  post-unload, tiny upscale, load/unload cycles.
- Mobile test scaffolding following the shared convention:
  `scripts/generate-mobile-integration-tests.js`,
  `scripts/validate-mobile-tests.js`, `test/mobile/
  {integration-runtime.cjs, integration.auto.cjs, README.md,
  testAssets/.gitignore}`. The auto-generated `integration.auto.cjs`
  wraps every `test/integration/*.test.js` so the shared
  `qvac-test-addon-mobile` framework picks them up on Android and iOS
  automatically.

## CI workflows

Four addon-scoped workflows (path-filtered to this package):

- `on-pr-qvac-lib-infer-ggml-classification.yml` — authorize, sanity
  checks, TypeScript declaration check, C++ lint, prebuild matrix,
  desktop integration tests, mobile integration tests, merge-guard.
- `prebuilds-qvac-lib-infer-ggml-classification.yml` — Linux x64,
  Linux arm64, Android arm64, macOS arm64, iOS arm64, Windows x64
  prebuild matrix.
- `integration-test-qvac-lib-infer-ggml-classification.yml` — desktop
  end-to-end tests with the shared performance reporter writing a
  GitHub step summary.
- `integration-mobile-test-qvac-lib-infer-ggml-classification.yml` —
  AWS Device Farm Android + iOS runs via the
  `tetherto/qvac-test-addon-mobile` framework.

## Public-data / test-image policy

All public correctness assertions in this package are scoped to the 6
test images under `test/images/` (2 per class). No confidential
fine-tuning numbers, validation-set sizes, per-class metrics, or
references to any internal validation dataset appear in this PR, in
any file it ships, or in CI logs. Internal numerical-equivalence
gating against an ONNX FP32 reference is handled pre-release by a
development-only script that is not part of this PR.

## Out of scope for this PR

- SDK plugin / schema integration (`packages/sdk/**`) lands in a
  follow-up PR after `@qvac/classification-ggml@0.1.0` is published
  to npm. This mirrors the diffusion rollout (#656 → release → #1021).
- GPU backends (Vulkan / Metal / CUDA): CPU-only for v1.0.

Made-with: Cursor
Proletter added a commit that referenced this pull request May 24, 2026
* updated for sd

* updated and successfuly built

* downloads

* updated with working loading

* updated load model js for Q4_K test

* rewrote parameter handling to support multiple params and also two different model types

* got sd inference to work

* updated for sd2

* got full sdxl to work

* rename folder to qvac-lib-infer-diffusion

* update package name

* sd3 finished

* rename: qvac-lib-infer-diffusion -> lib-infer-diffusion

Rename package directory from packages/qvac-lib-infer-diffusion to
packages/lib-infer-diffusion to align with the lib-* naming convention
used across the monorepo.

Made-with: Cursor

* updated for cuda linux

* updated for model

* have something working

* changelog

* cpp lint

* formatt

* updated model for gian

* integration test

* fixing according to boss

* fix(android): enable BUILD_SHARED_LIBS and stub pthread_cancel for GGML_BACKEND_DL

GGML_BACKEND_DL requires BUILD_SHARED_LIBS=ON so CMake can build GPU
backends as MODULE targets (.so). Previously BUILD_SHARED_LIBS was
hardcoded OFF, causing configure to fail on Android.

Also stub out pthread_cancel in ggml-backend-reg.cpp via a cmake
string replacement — pthread_cancel is unavailable in the Android NDK.
The loader thread terminates naturally without the explicit cancel.

Made-with: Cursor

* fix(android): exclude Vulkan on Android and fix pthread_cancel stub

Two portfile fixes for arm64-android cross-compile:

1. SD_VULKAN: the else() branch was enabling -DSD_VULKAN=ON for Android,
   causing find_package(Vulkan) to pick up the host x86_64 SDK during
   cross-compile and fail CMake configure. Android Vulkan support comes
   via the NDK and is handled separately; skip the flag entirely.

2. pthread_cancel: replace the fragile comment-based no-op with a proper
   inline stub guarded by #if defined(__ANDROID__), injected at the top
   of ggml-backend-reg.cpp before compilation.

Made-with: Cursor

* ci: dump vcpkg configure logs on failure for android build

Adds an always-run step that cats all config-*.log files from the
vcpkg stable-diffusion-cpp buildtrees on failure, so the exact CMake
configure error is visible inline in the CI job output.

Made-with: Cursor

* fix(android): insert pthread_cancel stub after pthread.h include

The previous stub was prepended to the top of ggml-backend-reg.cpp
before any #include, so pthread_t was undefined and the stub itself
failed to compile — leaving pthread_cancel undeclared for the actual
call site.

Fix: insert the no-op stub immediately after #include <pthread.h>
so pthread_t is available. Add a fallback that prepends both the
include and stub if <pthread.h> isn't found directly.

Also pass HAVE_PTHREAD_CANCEL=0 and GGML_HAVE_PTHREAD_CANCEL=OFF
as CMake cache variables to disable any check_function_exists tests,
and add DISABLE_PARALLEL_CONFIGURE to avoid race conditions with
source patches.

Made-with: Cursor

* fix(android): resolve BUILD_SHARED_LIBS override and pthread_cancel issues

Locally verified: stable-diffusion-cpp:arm64-android now configures and
builds successfully.

Three root causes fixed:

1. BUILD_SHARED_LIBS override: vcpkg maps VCPKG_LIBRARY_LINKAGE to
   BUILD_SHARED_LIBS, and the arm64-android triplet sets linkage to
   "static" — appending -DBUILD_SHARED_LIBS=OFF after our explicit ON.
   Additionally, stable-diffusion.cpp's CMakeLists.txt resets
   BUILD_SHARED_LIBS=OFF unless SD_BUILD_SHARED_GGML_LIB=ON.
   Fix: set VCPKG_LIBRARY_LINKAGE=dynamic for this port when DL
   backends are enabled, and pass -DSD_BUILD_SHARED_GGML_LIB=ON.

2. pthread_cancel stub redefinition: the previous stub was inserted
   via string(REPLACE) + fallback string(PREPEND), but both paths
   executed — producing a duplicate definition error. Also, vcpkg
   reuses cached source trees, so patches accumulated across builds.
   Fix: use a sentinel comment for idempotency; only one insertion
   path with the stub placed after #include <pthread.h>.

3. Removed the now-unnecessary explicit BUILD_SHARED_LIBS_OPTION
   variable since VCPKG_LIBRARY_LINKAGE handles it correctly.

Made-with: Cursor

* updated for android hopefully works

* added opencl support for android

* windows attempt fix

* attempting to fix windows again

* NORM problem with ggml operation

* attempting to patch norm

* attempting again to fix

* diagonstic step

* update for opencl

* updated for device selection

* fix(diffusion): add CI/CD workflows, test infra, and integration tests (#676)

* fix(diffusion): rebase on feature-media-generation, add CI improvements

Rebased cleanly onto feature-media-generation to pick up:
- SD_CPU_ONLY env var gate (Metal NORM op fallback to CPU)
- GGML_OPENMP=OFF (eliminates libomp.so.5 dependency)
- OpenCL support for Android

Additions on top of base:
- Add cpp-tests and ts-checks jobs to on-pr workflow
- Add image artifact upload to integration tests (traceable to source test)
- Disable win32 in prebuilds/integration/cpp-tests (C1128 /bigobj)
- Install libomp5 on Linux integration tests (safety net)
- Test infrastructure: unit tests, mobile test framework, scripts

* fix(diffusion): address PR review comments, enable win32, improve CI artifacts

- Re-enable win32 platform in prebuilds, integration-test, and cpp-tests workflows
- Remove duplicate PULL_REQUEST_TEMPLATE.md (already in repo root)
- Fix setDiff in validate-mobile-tests.js to handle non-Set inputs
- Refactor generate-image.test.js to use ensureModel from utils.js
- Save test images to modelDir for mobile permission compatibility
- Update CI to look for images in test/model/ instead of output/
- Add PR comment step to post image metadata on pull requests

* fix(diffusion): restore base branch code accidentally removed during rebase

Restores SD_CPU_ONLY patch, GGML_OPENMP=OFF, OpenCL support, Apple
keep_clip_on_cpu guard, and VCPKG_BUILD_TYPE placement that were
dropped when patches were applied on top of the reset base.

* style(diffusion): fix lint errors in examples (no-multi-spaces, indent)

* feat(diffusion): upload test images to S3 and display inline in step summary

Images are uploaded to S3 with public-read ACL, then embedded in the
step summary and PR comments via their S3 URLs so they render inline
without needing to download artifacts.

* ci(diffusion): remove libomp5 install (fixed by GGML_OPENMP=OFF in portfile)

* remove S3 upload, use simple table summary for generated images

* restore AWS env vars from base branch

* refactor(diffusion): consolidate test utils, remove helpers.js

Move detectPlatform, setupJsLogger, isPng into utils.js and update
generate-image.test.js to import from utils.js only. Add platform
detection for device selection in model-loading.test.js.

* fixed integration tests

* updated

* updated timeout

* cpp unit tests complete and tested YAY BABY

* cpp lint

* updated

* test(diffusion): add integration tests for SDXL, SD3, and FLUX.2 (#757)

* test(diffusion): add integration tests for SDXL, SD3, and FLUX.2

Add integration tests for all supported model families based on the
existing examples. Each test follows the LLM addon patterns: platform-
aware device selection, defensive cleanup with .catch(), ensureModel
for CI downloads.

- generate-image-sdxl.test.js: SDXL Base 1.0 (all-in-one GGUF, auto eps-prediction)
- generate-image-sd3.test.js: SD3 Medium (safetensors, flow prediction, euler sampler)
- generate-image-flux2.test.js: FLUX.2 klein 4B (split layout: diffusion + LLM + VAE)
- Regenerate all.js (brittle) and integration.auto.cjs (mobile)

* fix(diffusion): use CPU on all darwin platforms

Metal's GGML_OP_MUL_MAT is unsupported for stable-diffusion.cpp,
causing SIGABRT on darwin-arm64. Use isDarwin (all darwin) instead
of isDarwinX64 for the useCpu check.

* revert: keep GPU on darwin-arm64 to surface Metal errors

Don't hide GPU errors behind CPU fallback — the Metal MUL_MAT
issue needs to be visible so it gets fixed.

* test(diffusion): increase test timeouts for CPU-bound runs

FLUX.2 30min, SDXL/SD3 15min — these models are too heavy for
the default 10min timeout when running on CPU.

* chore: remove all.js from tracking (auto-generated, gitignored)

* test(diffusion): skip SDXL, SD3, and FLUX.2 tests on mobile

* QVAC-13954: Clean up vcpkg deps in lib-infer-diffusion (#781)

* refactor: split ggml into standalone vcpkg overlay port

Decouple ggml from the stable-diffusion-cpp overlay port so it can be
shared by multiple consumers with consistent ABI guarantees.

- Add standalone ggml overlay port (version-date 2026-01-30) pinned to
  the same commit used by stable-diffusion.cpp master-514-5792c66
- Refactor stable-diffusion-cpp port to use vcpkg_from_github +
  SD_USE_SYSTEM_GGML=ON instead of cloning with --recurse-submodules
- Patch ggml's src/CMakeLists.txt and cmake/ggml-config.cmake.in to
  propagate GGML_MAX_NAME=128 via INTERFACE_COMPILE_DEFINITIONS,
  ensuring all consumers share the same struct layout
- Switch both ports to version-date versioning (no upstream semver)
- Replace bundled stb headers with vcpkg stb dependency
- Auto-enable Vulkan backend on Linux via platform dependency
- Forward GPU backend features (metal/vulkan/cuda/opencl) from
  stable-diffusion-cpp to ggml through vcpkg feature

* fix(diffusion): fix ggml/sd overlay ports for Android cross-compilation

Add NDK-matched Vulkan C++ header detection so the ggml port downloads
headers matching the exact NDK Vulkan version instead of pulling a
potentially mismatched vcpkg vulkan-headers package.  Add missing
ggml-opencl.h to the public headers install list.  Auto-enable opencl
on Android and vulkan on desktop/Android via default-features in both
the ggml and stable-diffusion-cpp overlay ports.

* fix(diffusion): disable OpenMP and align ggml flags with qvac-fabric

Add GGML_OPENMP=OFF to fix Windows CI failure where OpenMP is
unavailable, and GGML_LLAMAFILE=OFF to disable unused code paths.
Add Android-specific flags for DL backends (GGML_BACKEND_DL,
CPU_ALL_VARIANTS, CPU_REPACK) and disable cooperative matrix
Vulkan extensions on mobile GPUs.

* fix(diffusion): fix ggml include dirs for DL backends and use tetherto fork

Patch ggml-config.cmake.in to set INTERFACE_INCLUDE_DIRECTORIES on the
ggml::ggml and ggml::ggml-base targets unconditionally.  When
GGML_BACKEND_DL is ON, the per-backend targets are not created and
include dirs were lost.  Also switch the SD source to the tetherto fork
and drop the qvac-diffusion- library prefix from CMakeLists.txt now
that ggml is a standalone port with standard names.

* Remove redundancies in vcpkg manifest files

* Set SD_CPU_ONLY=1 on CI env

* updated for runtime stats

* fixed connection to logger, as it was not properly connected before

* fixed for license file, validated working run on m1 air

* quickstart quick-maths

* fixed integration for windows

* fix(diffusion): add real cancel/abort support to native generation (#782)

* fix(diffusion): add real cancel/abort support to native generation

Cancel previously only set an atomic flag checked after generate_image()
returned — generation ran to full completion and output was silently
discarded. This made cancel appear to work while still burning full
compute time.

Changes:

Portfile patches (stable-diffusion.cpp):
- Add sd_abort_cb_t typedef and sd_set_abort_callback() public API
- Add sd_abort_requested() helper checked in the denoise lambda
- When abort fires, denoise returns nullptr which the sampler stack
  already treats as failure → generate_image() returns NULL
- Fix upstream bug: abort path freed wrong compute buffer
  (diffusion_model instead of work_diffusion_model), corrupting sd_ctx
  and causing segfault on reuse

SdModel.cpp:
- Wire cancelRequested_ into abort callback via thread-local (matches
  existing progress callback pattern for concurrency safety)
- Scope guard ensures callbacks are cleared on all exit paths including
  early parse/validation exceptions
- Always free results[i].data whether cancelled or not (buffer leak fix)
- Cancelled jobs throw "Job cancelled" → JobRunner emits queueException
  instead of fake success with queueResult + queueJobEnded
- Return empty std::any from process() so queueJobEnded() is the sole
  terminal stats path (fixes duplicate JobEnded events in JS)

SdModel.hpp:
- Add isCancelRequested() public accessor for the static abort callback

* fix(diffusion): disable free_params_immediately for model reuse

The upstream sd_ctx_params_init() defaults free_params_immediately=true,
which permanently frees model weight buffers after the first
generate_image() call. Any subsequent generation on the same sd_ctx
accesses freed memory and crashes (SIGSEGV).

Set the default to false so the addon supports multiple generations
on the same model instance (the expected use pattern).

This was the root cause of the "cancel then run" crash — the abort
path still runs through generate_image_internal() which calls
diffusion_model->free_params_buffer() when this flag is true.

* fix(diffusion): add code comments and rename fix-abort-cleanup patch

- Add comments to SdCtxHandlers.hpp explaining why freeParamsImmediately
  is disabled (upstream default frees weight buffers after first generation,
  causing use-after-free on model reuse)
- Add comments to both hunks in the upstream cleanup patch explaining the
  compute buffer bug and work_ctx leak
- Rename fix-abort-cleanup.patch to fix-failure-path-cleanup.patch since
  the fixes apply to any failure path, not just abort

* fix(diffusion): document cancel-as-error rationale vs LLM addon

Diffusion throws on cancel (queueException) while LLM returns normally
(queueResult). Add comment explaining the intentional difference: diffusion
has no useful partial output, so an explicit error signal is more honest
than a success with output_count=0.

* test(diffusion): add C++ unit tests for cancel/context handling

Add test_cancel_context.cpp covering the context changes from the cancel fix:

- cancel when idle is a no-op (no crash, no state corruption)
- cancel during generation throws "Job cancelled" (cancel-as-error path)
- model is reusable after cancel (validates freeParamsImmediately=false
  and compute buffer fix — the exact SIGSEGV scenario)
- multiple sequential generations succeed (normal reuse without cancel)
- cancelRequested_ flag is reset at process() entry
- process() on unloaded model throws (not segfault)
- runtime stats are populated after successful generation

* fix(diffusion): fix patch line counts and test assertion

- Fix fix-failure-path-cleanup.patch: correct hunk line counts
  (-2203,7 +2203,11 and -3796,6 +3800,13) and replace Unicode
  em-dashes with ASCII in comments
- Fix CancelWhenIdleIsNoop test: cancel() sets the flag even when
  idle, it is only cleared on process() entry

* refactor(diffusion): static ggml core with DL backends and CMakeLists cleanup (#794)

* refactor(diffusion): static ggml core with DL backends and CMakeLists cleanup

Patch ggml to support GGML_BACKEND_DL with BUILD_SHARED_LIBS=OFF by
enabling PIC and backend compile definitions when DL is on, matching
the qvac-fabric approach.  Remove VCPKG_LIBRARY_LINKAGE=dynamic
override — core libs are now static .a with PIC, backends remain
MODULE .so files.

Clean up CMakeLists.txt: remove redundant explicit linking of OpenCL,
Metal frameworks, CUDA libs, and ggml (all propagated transitively
via ggml cmake config).  Fix WIN32_LEAN_AND_MEAN typo, remove stale
comments, and drop the clang overlay triplet workaround.

* chore(diffusion): switch Linux to libc++, fix vcpkg warnings, remove dead patches

Add libc++ triplets for x64-linux and arm64-linux under vcpkg/triplets,
matching the qvac-lib-infer-llamacpp-llm layout.  Move triplet and
toolchain files from vcpkg-override-triplets to vcpkg/.  Install the
stable-diffusion-cpp usage file and suppress mismatched binary count
warnings in both overlay ports.  Remove obsolete rename-ggml-libs and
no-dlopen-without-backend-dl patches from the old submodule architecture.

* fix(diffusion): disable GGML_BACKEND_DL for Android static backends

stable-diffusion.cpp calls ggml_backend_is_cpu() and
ggml_backend_cpu_init() directly, which live in the CPU backend module.
With GGML_BACKEND_DL these become separate .so files unavailable at
link time, causing dlopen failures on device.

Statically link all backends (CPU, Vulkan, OpenCL) instead, and bundle
the OpenCL ICD loader .so on Android so the addon loads even on devices
without a system libOpenCL.

* Place the OpenCL ICD Loading library next to bare file

* fix(diffusion): graceful OpenCL fallback and backend priority reorder

Patch ggml's OpenCL backend to return nullptr instead of aborting when
no OpenCL devices are found (e.g. Pixel phones without OpenCL support).
Reorder SD backend priority to CUDA > Metal > OpenCL > Vulkan > CPU,
preferring OpenCL on Adreno devices where it outperforms Vulkan, with
if-guards so only the first successful backend is used.

* feat(diffusion): Adreno-aware backend selection for Android

Detect Adreno GPU model at runtime via ggml device enumeration and
choose the optimal backend: Adreno 800+ uses GPU (OpenCL), Adreno
600/700 is forced to CPU due to poor OpenCL performance, and
non-Adreno devices fall through to Vulkan.  Adds INFO-level logging
of detected devices and selection decisions for troubleshooting.

* fix(diffusion): statically link OpenCL ICD loader on Android

Add an overlay port for opencl that removes the dynamic-only
restriction, allowing the ICD loader to be built as a static library.
This eliminates libOpenCL.so as a NEEDED dependency so the addon
loads on all Android devices regardless of OpenCL support.  The
static ICD loader still dlopen's vendor drivers at runtime.

* Fixed formatting

* CPU only on Android

* feat(diffusion): hybrid static CPU + dynamic GPU backends for Android (#813)

* feat(diffusion): hybrid static CPU + dynamic GPU backends for Android

Add GGML_CPU_STATIC option that builds the CPU backend as a static
library linked into ggml even when GGML_BACKEND_DL is ON.  GPU
backends (Vulkan, OpenCL) remain MODULE .so files loaded at runtime
via dlopen, eliminating libOpenCL.so as a NEEDED dependency.

This lets stable-diffusion.cpp call CPU backend functions directly
(ggml_set_f32, ggml_backend_cpu_init, etc.) while GPU backends are
discovered at runtime — a single Android binary works on all devices
regardless of OpenCL/Vulkan support.

* feat(diffusion): generic backend init using ggml registry API

Replace SD's init_backend() #ifdef waterfall with generic ggml calls
(ggml_backend_init_by_type) that work with both statically linked and
dynamically loaded backends.  Load DL backend modules from the addon
via ggml_backend_load_all_from_path() when GGML_BACKEND_DL is enabled.

This eliminates SD's dependency on GPU-specific headers (ggml-opencl.h,
ggml-vulkan.h, etc.) and removes the SD_METAL/VULKAN/CUDA/OPENCL build
flags, replacing sd-cpu-only.patch and sd-backend-priority.patch with a
single sd-generic-backend-init.patch.

* feat(diffusion): prefer OpenCL on Adreno 800+ via sd_ctx backend preference

Add a new backend preference field in stable-diffusion context params and wire SdModel to request OpenCL for Adreno 800+ when available, while keeping SD_CPU_ONLY as CI-only env override.
Also fix ggml hybrid export wiring so CPU static symbols are linked for Android DL backend mode, and refresh android-arm64 prebuild artifact.

* fix(diffusion): pass backendsDir to SdCtxConfig

* Added logging to troubleshoot pixel vulkan init

* fix(diffusion): JS layer review fixes and cancel test coverage (#783)

* fix(diffusion): JS layer review fixes and cancel test coverage

Aligns the JS layer with the LLM addon patterns and adds API behavior
tests for cancel/busy/idle state transitions.

JS layer:
- Rename run() to _runInternal() (BaseInference template method pattern)
- Replace 30ms timer guard with _hasActiveResponse boolean
- Extract _getWeightFiles() to deduplicate file lists in _load/_downloadWeights
- Wrap _runGeneration in _withExclusiveRun for serialization
- Add finalized.catch(() => {}) unhandled rejection guard
- Reset _hasActiveResponse in unload()
- Filter undefined values in addon config coercion
- Remove orphaned unloadWeights() from addon.js
- Update class doc and README to match actual supported models

Types (index.d.ts):
- Fix run() signature: Txt2ImgParams (was accepting txt2vid params)
- Proper type hierarchy: Txt2ImgParams → Img2ImgParams → GenerationParams
- Add missing params: guidance, sampling_method, scheduler
- Remove unused type declarations

Tests:
- Add api-behavior.test.js with 5 cancel/busy/idle tests
- idle|run, idle|cancel, run|cancel, run|run (busy), cancel|run (rerun)
- cancel|run test requires native abort support (fix/diffusion-cancel-abort)

* fix(diffusion): cancel inside onUpdate callback matching LLM pattern

Cancel tests now fire model.cancel() inside the onUpdate callback
after the first progress tick (string data), matching the LLM addon's
runAndCancelAfterFirstToken pattern. This ensures native generation
is guaranteed to be active when cancel fires, preventing false passes.

* fix(diffusion): use const for non-reassigned chain variable

Standard JS lint requires const for variables that are never reassigned.

* fix(diffusion): update scope note instead of removing it

FLUX.1 and Wan2.x video are still not supported — keep that explicit.

* fix(diffusion): video generation is planned, not excluded

Wan2.x support is planned for the future — update scope note accordingly.

* fix(diffusion): address PR review — remove WeightsProvider, unify run API, update docs

- Remove WeightsProvider and _downloadWeights (files must be on disk)
- Unify txt2img/img2img into single run() with auto-detected mode
- Add return await to _withExclusiveRun calls (stack trace alignment)
- Strengthen run|run test to verify first response completes
- Update README: loader is optional, add t5XxlModel, fix load() docs
- Update docs/architecture.md: align with disk-local contract

* fix(diffusion): remove unused loader from constructor, tests, and examples

The diffusion addon never used the loader parameter — it was accepted
in the constructor but silently discarded. Model files are loaded
directly from disk via diskPath.

- Remove loader from ImgStableDiffusion constructor and type declarations
- Remove Loader interface and ReportProgressCallback (no remaining consumers)
- Remove FilesystemDL usage from all 6 integration tests and 7 examples
- Update README: remove data loader section, renumber steps, drop loader from args table

* fix(diffusion): remove stale loader deps and fix doc references

- Remove @qvac/dl-filesystem and @qvac/dl-hyperdrive from devDependencies
- Remove @qvac/dl-hyperdrive from peerDependencies
- Update architecture.md to reflect direct disk-path loading (no FilesystemDL)

* fix(diffusion): remove last Hyperdrive mention from architecture doc

* fix(diffusion): remove stale loadWeights from thread safety rules

* fix(diffusion): update data-flows doc to reflect unified run() API

* feat(diffusion): move stable-diffusion-cpp to registry (#865)

Support qvac ggml backend module names.

* cpp lint

* trying to fix seg faults

* fix(diffusion): Add fallback to load backend by filename (#879)

* QVAC-14129: skip generation tests on GPU-less runners (#897)

* test(diffusion): skip generation tests on GPU-less runners

Read NO_GPU env var via bare-process and skip image generation tests
when running on runners without GPUs. Model loading test still runs
on CPU-only runners with forced cpu device.

* test(diffusion): enable api-behavior tests on mobile and GPU-less runners

Address review feedback: remove skip guard so all api-behavior tests run
on mobile and GPU-less runners, add vae_on_cpu for Android, use
SHORT_PARAMS in busy-error and cancel-then-run tests, add verbosity.

* fix(diffusion): remove unused isMobile variable

* refactor[notask]: address PR review comments for lib-infer-diffusion addon

- Remove IModelAsyncLoad inheritance from SdModel; add custom activate()
  in AddonJs.hpp that calls SdModel::load() directly, bypassing the
  unused async-load interface
- Add SdModel::setProcessExiting() static method and expose it as a
  notifyProcessExit binding so JS can signal the native side before
  process exit, preventing SIGSEGV (exit 139) during Metal/Vulkan teardown
- Refactor SdGenHandlers parsers (parseSampler, parseScheduler,
  parseCacheMode, cache_preset) from if/else chains to std::unordered_map
- Extract parseVaeTileSize into a static helper using std::from_chars
  and std::string_view for exception-safe parsing
- Replace raw stats members with a CumulativeStats struct in SdModel
- Wrap generate_image results in RAII SdImageBatch to prevent memory
  leaks when outputCallback or encodeToPng throws mid-iteration
- Use optPath lambda for model path assignments in SdModel::load()
- Add braces to all single-statement if bodies in BackendSelection.cpp
- Add test_sd_gen_handlers.cpp unit tests covering all refactored changes

Made-with: Cursor

* style[notask]: apply clang-format-19 to test_sd_gen_handlers.cpp

Made-with: Cursor

* fix: remove trailing blank line in addon.js to pass standard lint (#951)

* refactor[notask]: remove public unload() from SdModel; expand TypeScript types
- Move free_sd_ctx logic inline into ~SdModel destructor and remove the
  public unload() method — object lifetime now manages GPU memory release
- Remove unloadModel() binding from AddonJs.hpp and binding.cpp (was
  dead code; JS always called destroyInstance, not unloadModel)
- Update unit tests to use scoped braces {} for destruction instead of
  explicit unload() calls; TearDownTestSuite now uses model.reset()
- Expand SamplerMethod from 8 to 14 values to match parseSampler() map;
  fix dpm++ key strings (dpm++2m not dpm++_2m)
- Expand ScheduleType from 6 to 12 values to match parseScheduler() map
- Add missing std_default to RngType

* fix: sync JS layer types  (#950)

* fix: sync JS layer with C++ addon for lib-infer-diffusion

- Wire up notifyProcessExit in binding.js to prevent SIGSEGV on
  process shutdown when GPU backends are already torn down
- Sync TypeScript types with C++ handler reality:
  - SamplerMethod: add 6 missing values, fix string literals to match
    C++ parser (dpm++2m not dpm++_2m)
  - ScheduleType: add 6 missing values, remove invalid 'default'
  - RngType: add 'std_default'
  - Add PredictionType and SdConfig.prediction field
- Fix addonLogging.d.ts to use named exports matching the .js module

* fix: complete TypeScript type coverage and relax model load timeout

- WeightType: add 6 missing quantization types (bf16, q2_k, q3_k,
  q4_k, q5_k, q6_k), rename 'default' to 'auto' to match C++ parser
- SdConfig: add 12 missing fields from C++ handler map (sampler_rng,
  diffusion_fa, mmap, offload_to_cpu, flow_shift, diffusion_conv_direct,
  vae_conv_direct, circular_x, circular_y, force_sdxl_vae_conv_scale,
  backends_dir, tensor_type_rules, lora_apply_mode)
- GenerationParams: add 7 missing fields (eta, img_cfg_scale, clip_skip,
  vae_tile_size, vae_tile_overlap, cache_mode, cache_threshold)
- Add CacheMode and LoraApplyMode type aliases
- Increase model load time assertion from 120s to 180s across all
  integration tests (Windows runner exceeded 120s at 130.9s)

* fix: remove trailing blank line in addon.js to pass standard lint

* Revert "fix: remove trailing blank line in addon.js to pass standard lint"

This reverts commit e8daa86.

* fix: remove trailing blank line in addon.js

* fix: restore addonLogging.d.ts default export for SDK compatibility

The SDK imports addonLogging as a default import, so keep the
AddonLogging interface + export default pattern.

* fix: correct config key names to match C++ handler map

- circularx/circulary (not circular_x/circular_y)
- backendsDir (not backends_dir)
- Add 'circular' shorthand for both axes

* revert: restore original 120s model load timeout in integration tests

* revert: remove notifyProcessExit wiring from binding.js

* refactor[notask]: remove notifyProcessExit mechanism from lib-infer-diffusion

Remove the JS-to-C++ process-exit signalling mechanism entirely:
- Drop g_processExiting atomic flag and setProcessExiting() static method
  from SdModel; destructor is now = default, delegating cleanup to the
  unique_ptr<sd_ctx_t> custom deleter as intended
- Remove notifyProcessExit() inline function from AddonJs.hpp and its
  binding registration from binding.cpp
- Remove notifyProcessExit JS helper and export from addon.js
- Remove the corresponding unit test from test_sd_gen_handlers.cpp

Made-with: Cursor

* ci[notask]: enable C++ unit tests on linux-x64 in cpp-tests-diffusion workflow

Made-with: Cursor

* feat(diffusion): reduce reported generation stats to primitive fields

Remove 8 derived/redundant fields from the runtimeStats payload:
generation_time, totalTime, stepsPerSecond, msPerStep,
megapixelsPerSecond, steps, output_count.

All removed fields are either aliases of a kept field (generation_time
= generationMs, steps = totalSteps, output_count = totalImages) or
trivially derivable by the caller from the remaining primitives
(totalWallMs, totalSteps, totalPixels).

The 11 remaining fields are: modelLoadMs, generationMs,
totalGenerationMs, totalWallMs, totalSteps, totalGenerations,
totalImages, totalPixels, width, height, seed.

Update test_cancel_context assertions to use the new field names.

Made-with: Cursor

* feat(diffusion): add RuntimeStats TypeScript interface and bump to 0.1.1

Expose a RuntimeStats interface in index.d.ts describing the 11 primitive
fields emitted on the QvacResponse 'stats' event: modelLoadMs, generationMs,
totalGenerationMs, totalWallMs, totalSteps, totalGenerations, totalImages,
totalPixels, width, height, seed.

Mirrors the pattern established in the embed addon (PR #937). Derivable rate
fields (stepsPerSecond, msPerStep, megapixelsPerSecond) are intentionally
omitted — callers can compute them from the retained primitives.

Bump package version to 0.1.1 and add CHANGELOG entry.

Made-with: Cursor

* fix: add Android Vulkan init diagnostics (#981)

* fix: add Android Vulkan init diagnostics

Added stable-diffusion overlay port for troubleshooting. Resolved loading issue where load by type tried GPU in a device with IGPU. Logging loop listed details of each device and attempted to initialize directly devices listed as GPU or IGPU. This resolved the failure to load by type.

* Split init loop into GPU and IGPU sections

* fix(diffusion): detect JobEnded by structural type instead of stats key name

The callback checked for 'generation_time' in the stats object, but
the C++ side emits 'generationMs'. Match on plain-object shape instead
so the check survives future stats key renames.

Made-with: Cursor

* refactor(diffusion): remove circular padding options and fix example resolutions

Remove the circularx, circulary, and circular (both-axes shorthand) config
options from the C++ handlers, SdCtxConfig struct, SdModel param assignment,
and TypeScript index.d.ts. These were unused and added unnecessary surface area.

Fix generate-image-sd2.js example to use 768x768, which is SD2.1's native
training resolution. Using off-native resolution produces softer outputs.

Made-with: Cursor

* refactor(diffusion): rename CHANGELOG to CHANGELOG.md and align format with LLM package

Made-with: Cursor

* refactor(diffusion): remove CUDA build references from docs

Remove CUDA as a listed GPU backend from platform tables, architecture
diagrams, and the device config comment in index.d.ts. This package
ships Metal, Vulkan, and OpenCL backends only. The 'cuda' RNG type
references are unchanged (upstream philox RNG enum name).

Made-with: Cursor

* fix(diffusion): remove CPU fallback from macOS x64 GPU column in README

Made-with: Cursor

* docs(diffusion): add Other Examples section to README

Made-with: Cursor

* docs(diffusion): extract build instructions into build.md

Move prerequisites, platform-specific setup, cross-compilation, and
troubleshooting from README into a dedicated build.md matching the
LLM package structure. README now links to build.md with a quick start
snippet.

Made-with: Cursor

* chore(diffusion): generate NOTICE file with third-party attributions

Made-with: Cursor

* feat(diffusion): throw early if img2img is attempted

Add an explicit guard in index.js that throws if init_image is passed,
since img2img is not yet implemented in this PR. Provides a clear error
message rather than silently falling through.

Also fix trailing comma lint in generate-image-sd2.js.

Made-with: Cursor

---------

Co-authored-by: Nik <pocucandr@MacBookAir.lan>
Co-authored-by: Nik <pocucandr@Niks-MacBook-Air.local>
Co-authored-by: aegioscy <nik@linux64vm.com>
Co-authored-by: Ridwan Taiwo <donriddo@gmail.com>
Co-authored-by: Proletter <40578159+Proletter@users.noreply.github.com>
Co-authored-by: gianni-cor <gianfranco.cordella@tether.io>
Co-authored-by: Juan Pablo Garibotti Arias <juan.arias@bitfinex.com>
Proletter pushed a commit that referenced this pull request May 24, 2026
…sification addon

Introduces a new inference addon that classifies images into three
classes (food / report / other) using a fine-tuned MobileNetV3-Small
CNN running on the libggml CPU backend. Follows the established QVAC
addon pattern (see qvac-lib-infer-nmtcpp, lib-infer-diffusion).

## What this PR ships

- New package `packages/qvac-lib-infer-ggml-classification/` publishing
  as `@qvac/classification-ggml`:
  - Native addon: custom 34-layer MobileNetV3-Small compute graph built
    directly against the public `ggml.h` / `ggml-backend.h` API — no
    llama.cpp application-layer dependency, so the addon remains
    forward-compatible with future `libggml` upstream merges.
  - Load-time BatchNorm fold with `eps = 0.001` (the architecture-
    correct value; `1e-5` causes normalisation drift across all 34
    layers). Depthwise separable convolutions, squeeze-and-excite
    blocks, HardSwish / HardSigmoid / ReLU activations all wired
    through `ggml_conv_2d`, `ggml_conv_2d_dw`, `ggml_pool_2d`,
    `ggml_hardswish`, `ggml_hardsigmoid`.
  - FP16 GGUF weights bundled inside the package (2.94 MB); class
    labels are read from the GGUF `mobilenet.class_N` metadata so a
    future fine-tune can ship different class names without a code
    change.
  - Public JS API: `new ImageClassifier({ modelPath?, logger?,
    threads?, nativeLogger? })` + `load()` / `classify(buffer, opts?)`
    / `unload()` / `destroy()`. Accepts JPEG, PNG, or raw-RGB input;
    validates at the JS layer before reaching native code so no bad
    input reaches libggml.
  - `nativeLogger` opt-in (default `false`): the underlying
    `qvac-lib-inference-addon-cpp` JsLogger holds a process-wide
    static `uv_async_t` that is not safe across rapid create/destroy
    cycles, so the native C++→JS log bridge is disabled unless the
    caller explicitly opts in. JS-level logging always flows through
    the caller's `logger`.
  - Image preprocessing via vendored-through-vcpkg `stb_image` +
    `stb_image_resize2` (bilinear resize to 224×224, ImageNet
    normalisation, WHCN layout).

## Build + tests

- `bare-make` + `cmake-bare` + `cmake-vcpkg` build, targeting
  `ggml::ggml` / `ggml::ggml-base` / `ggml::ggml-cpu` and `stb` from
  the shared QVAC vcpkg registry.
- C++ GoogleTest suite covering graph shape (34 conv + 2 linear + 9
  SE blocks), load + inference, determinism, `topK` filter, BN
  epsilon guard, and full preprocessor behaviour.
- brittle + bare JS integration tests covering load, classify (all 6
  public sample images under `test/images/`), `topK`, raw RGB input,
  and every error path: null, empty buffer, corrupted JPEG,
  unsupported format (BMP), mismatched dimensions, pre-load /
  post-unload, tiny upscale, load/unload cycles.
- Mobile test scaffolding following the shared convention:
  `scripts/generate-mobile-integration-tests.js`,
  `scripts/validate-mobile-tests.js`, `test/mobile/
  {integration-runtime.cjs, integration.auto.cjs, README.md,
  testAssets/.gitignore}`. The auto-generated `integration.auto.cjs`
  wraps every `test/integration/*.test.js` so the shared
  `qvac-test-addon-mobile` framework picks them up on Android and iOS
  automatically.

## CI workflows

Four addon-scoped workflows (path-filtered to this package):

- `on-pr-qvac-lib-infer-ggml-classification.yml` — authorize, sanity
  checks, TypeScript declaration check, C++ lint, prebuild matrix,
  desktop integration tests, mobile integration tests, merge-guard.
- `prebuilds-qvac-lib-infer-ggml-classification.yml` — Linux x64,
  Linux arm64, Android arm64, macOS arm64, iOS arm64, Windows x64
  prebuild matrix.
- `integration-test-qvac-lib-infer-ggml-classification.yml` — desktop
  end-to-end tests with the shared performance reporter writing a
  GitHub step summary.
- `integration-mobile-test-qvac-lib-infer-ggml-classification.yml` —
  AWS Device Farm Android + iOS runs via the
  `tetherto/qvac-test-addon-mobile` framework.

## Public-data / test-image policy

All public correctness assertions in this package are scoped to the 6
test images under `test/images/` (2 per class). No confidential
fine-tuning numbers, validation-set sizes, per-class metrics, or
references to any internal validation dataset appear in this PR, in
any file it ships, or in CI logs. Internal numerical-equivalence
gating against an ONNX FP32 reference is handled pre-release by a
development-only script that is not part of this PR.

## Out of scope for this PR

- SDK plugin / schema integration (`packages/sdk/**`) lands in a
  follow-up PR after `@qvac/classification-ggml@0.1.0` is published
  to npm. This mirrors the diffusion rollout (#656 → release → #1021).
- GPU backends (Vulkan / Metal / CUDA): CPU-only for v1.0.

Made-with: Cursor
Proletter pushed a commit that referenced this pull request May 24, 2026
…assification (#1727)

* QVAC-17481 feat: add @qvac/classification-ggml MobileNetV3 image classification addon

Introduces a new inference addon that classifies images into three
classes (food / report / other) using a fine-tuned MobileNetV3-Small
CNN running on the libggml CPU backend. Follows the established QVAC
addon pattern (see qvac-lib-infer-nmtcpp, lib-infer-diffusion).

## What this PR ships

- New package `packages/qvac-lib-infer-ggml-classification/` publishing
  as `@qvac/classification-ggml`:
  - Native addon: custom 34-layer MobileNetV3-Small compute graph built
    directly against the public `ggml.h` / `ggml-backend.h` API — no
    llama.cpp application-layer dependency, so the addon remains
    forward-compatible with future `libggml` upstream merges.
  - Load-time BatchNorm fold with `eps = 0.001` (the architecture-
    correct value; `1e-5` causes normalisation drift across all 34
    layers). Depthwise separable convolutions, squeeze-and-excite
    blocks, HardSwish / HardSigmoid / ReLU activations all wired
    through `ggml_conv_2d`, `ggml_conv_2d_dw`, `ggml_pool_2d`,
    `ggml_hardswish`, `ggml_hardsigmoid`.
  - FP16 GGUF weights bundled inside the package (2.94 MB); class
    labels are read from the GGUF `mobilenet.class_N` metadata so a
    future fine-tune can ship different class names without a code
    change.
  - Public JS API: `new ImageClassifier({ modelPath?, logger?,
    threads?, nativeLogger? })` + `load()` / `classify(buffer, opts?)`
    / `unload()` / `destroy()`. Accepts JPEG, PNG, or raw-RGB input;
    validates at the JS layer before reaching native code so no bad
    input reaches libggml.
  - `nativeLogger` opt-in (default `false`): the underlying
    `qvac-lib-inference-addon-cpp` JsLogger holds a process-wide
    static `uv_async_t` that is not safe across rapid create/destroy
    cycles, so the native C++→JS log bridge is disabled unless the
    caller explicitly opts in. JS-level logging always flows through
    the caller's `logger`.
  - Image preprocessing via vendored-through-vcpkg `stb_image` +
    `stb_image_resize2` (bilinear resize to 224×224, ImageNet
    normalisation, WHCN layout).

## Build + tests

- `bare-make` + `cmake-bare` + `cmake-vcpkg` build, targeting
  `ggml::ggml` / `ggml::ggml-base` / `ggml::ggml-cpu` and `stb` from
  the shared QVAC vcpkg registry.
- C++ GoogleTest suite covering graph shape (34 conv + 2 linear + 9
  SE blocks), load + inference, determinism, `topK` filter, BN
  epsilon guard, and full preprocessor behaviour.
- brittle + bare JS integration tests covering load, classify (all 6
  public sample images under `test/images/`), `topK`, raw RGB input,
  and every error path: null, empty buffer, corrupted JPEG,
  unsupported format (BMP), mismatched dimensions, pre-load /
  post-unload, tiny upscale, load/unload cycles.
- Mobile test scaffolding following the shared convention:
  `scripts/generate-mobile-integration-tests.js`,
  `scripts/validate-mobile-tests.js`, `test/mobile/
  {integration-runtime.cjs, integration.auto.cjs, README.md,
  testAssets/.gitignore}`. The auto-generated `integration.auto.cjs`
  wraps every `test/integration/*.test.js` so the shared
  `qvac-test-addon-mobile` framework picks them up on Android and iOS
  automatically.

## CI workflows

Four addon-scoped workflows (path-filtered to this package):

- `on-pr-qvac-lib-infer-ggml-classification.yml` — authorize, sanity
  checks, TypeScript declaration check, C++ lint, prebuild matrix,
  desktop integration tests, mobile integration tests, merge-guard.
- `prebuilds-qvac-lib-infer-ggml-classification.yml` — Linux x64,
  Linux arm64, Android arm64, macOS arm64, iOS arm64, Windows x64
  prebuild matrix.
- `integration-test-qvac-lib-infer-ggml-classification.yml` — desktop
  end-to-end tests with the shared performance reporter writing a
  GitHub step summary.
- `integration-mobile-test-qvac-lib-infer-ggml-classification.yml` —
  AWS Device Farm Android + iOS runs via the
  `tetherto/qvac-test-addon-mobile` framework.

## Public-data / test-image policy

All public correctness assertions in this package are scoped to the 6
test images under `test/images/` (2 per class). No confidential
fine-tuning numbers, validation-set sizes, per-class metrics, or
references to any internal validation dataset appear in this PR, in
any file it ships, or in CI logs. Internal numerical-equivalence
gating against an ONNX FP32 reference is handled pre-release by a
development-only script that is not part of this PR.

## Out of scope for this PR

- SDK plugin / schema integration (`packages/sdk/**`) lands in a
  follow-up PR after `@qvac/classification-ggml@0.1.0` is published
  to npm. This mirrors the diffusion rollout (#656 → release → #1021).
- GPU backends (Vulkan / Metal / CUDA): CPU-only for v1.0.

Made-with: Cursor

* QVAC-17481 fix(ci): correct setup-bare-tooling action name in classification workflows

The prebuild and integration-test workflows for @qvac/classification-ggml
referenced `tetherto/qvac/.github/actions/setup-bare-toolchain`, which
does not exist. The action is named `setup-bare-tooling` (same name used
by the llamacpp-llm, nmtcpp, and diffusion addons at the identical
pinned SHA). All 6 prebuild matrix jobs failed at step 1 with
"Can't find 'action.yml' ... for action 'setup-bare-toolchain'" until
this rename is in place.

Files: .github/workflows/prebuilds-qvac-lib-infer-ggml-classification.yml
  .github/workflows/integration-test-qvac-lib-infer-ggml-classification.yml
Made-with: Cursor

* QVAC-17481 fix(ci): add per-platform vcpkg/NDK/Apple-clang setup to classification prebuilds

The classification prebuilds workflow was missing the per-platform
toolchain steps that sibling addons (diffusion, nmtcpp) have after
`setup-vcpkg-cache`. As a result, `VCPKG_ROOT` was never exported,
CMake couldn't locate the vcpkg toolchain, and `bare-make build`
failed on every platform.

Changes to .github/workflows/prebuilds-qvac-lib-infer-ggml-classification.yml:

  - setup-vcpkg-cache: drop unknown inputs `vcpkg-path` and
    `github-packages-token` (action only accepts platform, arch,
    s3-bucket-path). Was silently ignored but emitted warnings.

  - Add per-OS vcpkg bootstrap / configuration:
      macOS (darwin, ios):  clone microsoft/vcpkg tag 2025.12.12,
                            bootstrap, export VCPKG_ROOT.
      Linux (linux, android runners): export
                            VCPKG_ROOT=$VCPKG_INSTALLATION_ROOT.
      Windows:              export VCPKG_ROOT from
                            $env:VCPKG_INSTALLATION_ROOT with
                            backslash-to-forward-slash normalisation.

  - Windows-only: set CMAKE_GENERATOR="Visual Studio 17 2022" and,
    for the x64 matrix row, CMAKE_GENERATOR_PLATFORM=x64.

  - Android-only: export ANDROID_NDK / ANDROID_NDK_HOME /
    ANDROID_NDK_ROOT from ANDROID_NDK_LATEST_HOME, derive
    ANDROID_TOOLCHAIN_ROOT, set ANDROID_NATIVE_API_LEVEL=24.

  - iOS and darwin: move Homebrew llvm / llvm@18 aside so the Apple
    toolchain clang is on PATH (matches diffusion).

All additions mirror the working pattern in
prebuilds-lib-infer-diffusion.yml and
prebuilds-qvac-lib-infer-nmtcpp.yml at the same pinned action SHA.
No Vulkan or apt X11 steps were added: this addon is CPU-only ggml
and has no graphics dependencies.

Made-with: Cursor

* QVAC-17481 fix: add missing <limits> include and CI build-failure diagnostics

Two related changes to unstick the prebuild matrix:

1. addon/src/model-interface/ImagePreprocessor.cpp uses
   std::numeric_limits<int>::max() but does not #include <limits>.
   MSVC pulls <limits> in transitively (via <algorithm> in its STL),
   but libc++ and libstdc++ on clang/gcc do not. This is the most
   plausible reason all five non-Windows prebuild jobs (linux-x64,
   linux-arm64, android-arm64, darwin-arm64, ios-arm64) failed
   identically at `bare-make build` while the Windows host build
   succeeded.

2. prebuilds-qvac-lib-infer-ggml-classification.yml gains a
   `Dump build context on failure` step that runs only if
   `bare-make build` fails. It prints toolchain identity, lists the
   build/ tree, tails CMake configure logs, dumps any *.log under
   build/, and tails up to 20 vcpkg buildtree logs. Mirrors the
   `Dump vcpkg build logs on failure` pattern in
   prebuilds-lib-infer-diffusion.yml. Without this, every CI failure
   currently surfaces only as `Process completed with exit code 1.`,
   which is essentially undebuggable from the run summary page.

Files: packages/qvac-lib-infer-ggml-classification/addon/src/model-interface/ImagePreprocessor.cpp
  .github/workflows/prebuilds-qvac-lib-infer-ggml-classification.yml
Made-with: Cursor

* QVAC-17481 fix(ci): use --platform (not --target) for bare-make generate

Root cause confirmed from job log of run 24850328468 (linux-x64):
  bare-make generate --target linux --arch x64
  Bail: UNKNOWN_FLAG: target

The bare-make CLI installed by setup-bare-tooling does not accept
`--target`; it only accepts `--platform`. Diffusion and nmtcpp both
use `--platform`. Locally I had an older bare-make that accepted
`--target` as an alias, which masked the bug on my Windows host.

Step 17 (Generate build) was failing immediately with the above
"Bail: UNKNOWN_FLAG", causing every downstream step (build,
install) to fail too across all 6 prebuild matrix jobs.

Also harden the diagnostic step `Dump build context on failure`:
disable `-e` and `pipefail` for that step so a missing `build/`
directory or empty `find` result no longer makes the diagnostic
step itself exit non-zero (it should never mask the real failure).

Files: .github/workflows/prebuilds-qvac-lib-infer-ggml-classification.yml
Made-with: Cursor

* QVAC-17481 fix: pin ggml to CPU-only feature set + guard backend iteration

CI runs were failing because the default ggml vcpkg feature set pulls
in the `vulkan` (Linux/Windows/Android) and `metal` (Apple) GPU
backends, which forces `find_package(Vulkan)` at configure time and
forces the prebuilds workflow to install the Vulkan SDK on every
runner. Since this addon is CPU-only by design (only ever calls
ggml_backend_cpu_init), the GPU backends are dead weight: extra
compile time, extra dependencies in shipped prebuilds, and extra
runtime requirements on user machines (e.g. libvulkan.so.1).

Two related changes, no functional impact on the addon itself:

1. packages/qvac-lib-infer-ggml-classification/vcpkg.json
   Add "default-features": false` to the ggml dependency. This
   opts out of vulkan / metal / cuda / opencl while keeping the
   core CPU backend (which is the implicit base, not a named
   feature). Verified locally on win32-x64: vcpkg rebuilt
   `ggml:x64-windows@2026-01-30#5` from source in 26s without
   Vulkan, generate + build + install all green, and the JS
   integration test ran the model end-to-end producing correct
   top labels (food/report/other) for every sample image.

2. packages/qvac-lib-infer-ggml-classification/CMakeLists.txt
   Guard the GGML_AVAILABLE_BACKENDS iteration with
   `if(TARGET ggml::${_backend})`. The upstream variable
   advertises every backend the port knows about, but real
   CMake targets only exist for backends that were actually
   built. Without the guard, add_bare_module's
   get_target_property() crashes on Android (where Vulkan and
   OpenCL are listed as available but not built). Defensive
   change; no behavioural difference when targets do exist.

Local artifact size: prebuilds/win32-x64/qvac__classification-ggml.bare
is 1.6 MB; no shipped vulkan loader.

Made-with: Cursor

* QVAC-17481 fix(ci): match prebuild- artifact prefix in mobile tests

The mobile integration workflow downloaded artifacts with patterns
`android-*` / `ios-*` (PREBUILD_ARTIFACT_PREFIX was empty), but the
prebuilds workflow names artifacts `prebuild-android-arm64` /
`prebuild-ios-arm64`. Result: `Total of 0 artifact(s) downloaded`,
followed by "ERROR: No prebuilds found!" — both Android and iOS
mobile jobs failed at this exact step in run 24891210942.

Set PREBUILD_ARTIFACT_PREFIX to "prebuild-" so the resulting patterns
become `prebuild-android-*` and `prebuild-ios-*`, matching the actual
artifact names. Mirrors how the desktop integration workflow already
filters (it uses `prebuild-${platform}-${arch}*` directly).

File: .github/workflows/integration-mobile-test-qvac-lib-infer-ggml-classification.yml
Made-with: Cursor

* QVAC-17481 fix(model): zero-input warmup pass to defeat cold-inference NaN

ggml's backend graph allocator leaves intermediate tensor buffers and
the input/output tensors uninitialised after `buildGraph` returns.
Whatever stale heap residue happens to occupy those addresses can
leak into the very first inference and produce non-finite logits
on a heap-state-dependent basis.

CI run 24891210942 caught this on win32-x64: meal_1.jpg (the first
sample classified after instance creation) failed assert 9
(`Math.abs(sum - 1) < 1e-3` -- probabilities sum was not ~1) and
assert 10 (`result[0].confidence >= result[1].confidence` -- sort
comparison broke because the first confidence was NaN). Asserts 11..72
covering the other five sample images all passed: by then the second
inference had overwritten the dirty buffers with real data.

This is a classic uninit-memory bug: behaviour depends on whatever
the heap happens to contain at process start. My local Windows
build did not trip on it (different heap layout); the Azure CI
runner did. Same compiler family, same code, different result.

Fix: at the end of `ClassificationModel::load()`, run one full
forward pass with a zero-filled input tensor and discard the output.
This forces ggml's compute graph to write every backend buffer with
a deterministic value before any user-visible classify() call ever
sees the model. Cost is one cold inference per `load()` (~50-200 ms
on a CPU runner), paid once at addon startup, never visible to the
caller.

Local validation on win32-x64 with this change: integration test 1
(72/72 asserts including all sum-to-one and sort-desc checks) now
passes deterministically across rebuilds. The unrelated lifecycle
SIGSEGV between separate ImageClassifier instances (likely in
qvac-lib-inference-addon-cpp's JobRunner / OutputCallbackJs uv_
resources, not addressed here) still surfaces, just later in the
test run -- that needs a separate investigation in addon-cpp.

File: packages/qvac-lib-infer-ggml-classification/addon/src/model-interface/ClassificationModel.cpp
Made-with: Cursor

* QVAC-17481 fix(model): full-pipeline warmup eliminates win32 cold-inference NaN

The previous zero-input warmup (commit af12cdd1) wrote zeros directly
to the input tensor and ran ggml_backend_graph_compute. CI run
24892803959 showed it was insufficient: win32-x64 still failed
asserts 9 + 10 on meal_1.jpg with NaN in result[0].confidence,
while linux-arm64 / darwin / linux-x64 all passed.

Hypothesis: ggml's CPU backend on MSVC has lazy-init code paths
(SIMD kernel JIT / FP state setup) that only trigger on non-trivial
inputs reaching the post-preprocess range, and the zero-input
warmup didn't exercise them. The bug therefore surfaces on the
first real classify() with an ImageNet-normalised image.

Fix: replace the synthetic warmup with one that goes through the
EXACT same pipeline classify() uses end-to-end:
  1. Synthesise a small (32x32) raw RGB buffer with a deterministic
     non-zero gradient pattern (uint8 values from `(i * 7) & 0xFF`).
  2. Run preprocess::preprocessToTensor on it (resize to 224x224 +
     ImageNet normalise + channel reorder to WHCN).
  3. ggml_backend_tensor_set the result, run the full compute graph,
     and read the output back via ggml_backend_tensor_get.

Cost: one full classify-equivalent pass at load() time
(~50-200 ms on a CPU runner), paid once per ImageClassifier instance,
never visible to the caller. Output is discarded; the goal is to
leave every backend buffer fully written and every lazy-init code
path exercised before user-visible classify() runs.

Local validation on win32-x64: 14/14 integration tests pass with
this change (was failing test 1 asserts 9 + 10 on meal_1 before).
Also applies the clang-format-19 layout the cpp-lint check expected,
unblocking that job.

File: packages/qvac-lib-infer-ggml-classification/addon/src/model-interface/ClassificationModel.cpp
Made-with: Cursor

* QVAC-17481 fix(addon): drain in-flight job in unload(); persistent perf reporting

Two related changes that together unblock multi-instance integration
tests across linux-x64 / darwin-arm64 / android / ios and address
the inference-latency-visibility ask.

1. addon.js — make unload() wait for the in-flight job to settle

   The previous unload() flow rejected this._pending immediately and
   then synchronously called binding.destroyInstance(). The native
   side (qvac-lib-inference-addon-cpp's JobRunner uses a worker
   thread; OutputCallbackJs uses a uv_async_t handle) often still
   had a callback pending at that moment, and destroying the
   instance underneath the in-flight callback raced with the
   uv_close lifecycle. The result was a SIGSEGV (use-after-free)
   observed across linux-x64 (both ubuntu-22.04 + 24.04),
   darwin-arm64, and the on-device Android/iOS Device Farm jobs
   in CI runs 24891210942 and 24892803959. linux-arm64 happened to
   win the race on those runs but the bug is fundamentally
   non-deterministic.

   Fix: track a separate `_pendingSettled` Promise that resolves
   the moment _outputCallback fires (whether the user-facing
   classify() Promise resolved or rejected). unload() now awaits
   that signal before calling destroyInstance, so the worker
   thread / async handle have provably finished when the native
   teardown runs. The user-facing classify() Promise contract is
   unchanged.

   This is a correctness improvement to the ImageClassifier API
   contract: after `await classifier.unload()` returns, native
   resources are now genuinely released (not "scheduled to be
   released, please don't peek").

2. test/integration/utils.js + classify.test.js — crash-survivable
   inference-latency reporting + load-time metric

   The performance-report.json was previously only flushed in
   process.on('exit'), so any SIGSEGV mid-test discarded all
   collected metrics. Now we additionally flush the JSON file
   after every recorded metric. Even a partial run leaves a usable
   per-platform latency snapshot in the uploaded artifact.

   Also adds recordLoadTime(label, ms) to capture the cost of
   constructing + load()ing an ImageClassifier (warmup + GGML
   graph build + weights read), and threads it into the first
   integration test as `load:cold`. This complements the per-image
   classify timings already recorded as `classify:<file>` and
   uploaded as artifact `classification-perf-report-{platform}-{arch}`.

Local validation on win32-x64: 14/14 tests pass cleanly with this
change set; performance-report.json contains 7 results
(load:cold + 6 classify:<file>) on disk before the process exits.

Files: packages/qvac-lib-infer-ggml-classification/addon.js
  packages/qvac-lib-infer-ggml-classification/test/integration/utils.js
  packages/qvac-lib-infer-ggml-classification/test/integration/classify.test.js
Made-with: Cursor

* QVAC-17481 fix(addon): defer OutputCallBackJs destruction to avoid use-after-free race

Root cause (in `qvac-lib-inference-addon-cpp:OutputCallBackJs.hpp`):
  The upstream destructor calls `uv_close(asyncHandle, deleter)` --
  which is asynchronous -- and then IMMEDIATELY runs
  `js_delete_reference` on its JS handle/callback refs before returning.
  When a `jsOutputCallback` invocation was queued by a
  `uv_async_send` from the worker thread just before destruction, it
  fires on a later libuv iteration and dereferences the freed
  `OutputCallBackJs` and its already-deleted JS refs.

  This explained the SIGSEGV (linux-x64 24.04, darwin-arm64) and the
  on-device APP CRASH (Android / iOS Device Farm) observed across rapid
  ImageClassifier create/destroy cycles in CI runs 24891210942,
  24892803959, 24897445066. The bug is timing-dependent, which is why
  linux-arm64 consistently wins the race and passes while other
  platforms fail.

Fix (this commit, in our binding.cpp only):
  Introduce a `DeferredOutputCallBackJs` wrapper that implements
  `addon_cpp::OutputCallBackInterface` by composing the upstream
  `addon_cpp::OutputCallBackJs` as a `unique_ptr` and forwarding
  `initializeProcessingThread / notify / stop` calls to it. The
  wrapper is what `AddonCpp` now owns; the inner upstream callback
  is owned by our wrapper.

  AddonCpp field destruction order is:
    1. `~AddonCpp` body: `outputCallback_->stop()` (our wrapper's
       stop forwards to inner).
    2. `jobRunner_` destroyed: JOINS the worker thread. No new
       `uv_async_send` can happen from this point on.
    3. `outputCallback_` destroyed: our wrapper's destructor runs.
    4. There may still be `uv_async_send` callbacks QUEUED before
       step 2 that are pending on the libuv loop.

  Our destructor releases ownership of the inner callback into a
  heap-allocated `uv_check_t` whose callback (firing AFTER the poll
  phase on the next libuv iteration -- i.e. after any queued async
  callback has fired safely against the still-alive inner) deletes
  the inner, then closes and deletes itself. The check handle is
  unref'd so it does not keep the libuv loop alive on its own.

  This is a real lifetime-management fix, not a timing workaround.
  When upstream's destructor is corrected, the wrapper becomes a
  pass-through with no functional effect. We will also submit the
  fix upstream.

Local validation on win32-x64:
  14/14 integration tests pass, 90/90 asserts, including test 14
  (`load -> unload -> load cycles do not leak handles`) which
  explicitly exercises the pattern that was racing the upstream bug.

File: packages/qvac-lib-infer-ggml-classification/addon/src/addon/AddonJs.hpp
Made-with: Cursor

* QVAC-17481 fix(model,test): defensive softmax/sort + per-inference diagnostic trace

Three related changes that together (a) make the classification
output well-formed under any numerical edge case and (b) give us
first-class visibility into whatever the model actually returns on
every CI platform. No workarounds or test-masking -- the C++ changes
apply uniformly to production classify() calls and the diagnostic
logs are plain stderr output behind an opt-in env var (plus always-on
per-image t.comment() in tests).

1. addon/src/model-interface/ClassificationModel.cpp -- softmax()

   Previously:
     - Called std::max_element on a span that could contain NaN
       (max_element behaviour on NaN is unspecified).
     - Skipped normalization when sum <= 0 but RETURNED the
       unnormalized probs (could leave callers with all-zero or
       non-sum-to-1 probabilities).

   Now:
     - Finds max by explicit isfinite() walk, defaulting to -inf if
       every logit is non-finite.
     - If max is non-finite (all NaN/Inf), returns a uniform
       distribution (1/N per class) so callers always see a valid
       probability vector that sums to 1.
     - Per-element exp() input is skipped when non-finite (produces 0
       for that element rather than NaN).
     - If the exponential sum is not finite or <= 0, falls back to
       uniform distribution instead of returning unnormalized zeros.

   This is defence in depth. MobileNetV3-Small on well-normalized
   input never produces NaN logits in practice, but if upstream ggml
   CPU backend ever surfaces a numerical bug (or a future quantised
   model does) we now cannot silently corrupt the user-visible
   probability distribution.

2. addon/src/model-interface/ClassificationModel.cpp -- std::sort

   Added explicit is-finite guards in the comparator. Non-finite
   confidences now compare as less than any finite value, giving
   strict-weak-ordering even with degenerate inputs. Previously, any
   NaN in the confidences would make the comparator non-strict-weak
   and std::sort behaviour undefined (one observed symptom: top
   class label at index 0 but some later index carrying a higher
   confidence).

3. addon/src/model-interface/ClassificationModel.cpp -- trace hook

   New `QVAC_CLASSIFICATION_TRACE=1` env var toggles a per-inference
   stderr print of:
     - raw logits as read from the ggml output tensor
     - probabilities immediately after softmax (pre-sort)
     - final sorted results
   Off by default -- production users see nothing. Enabled in our CI
   integration-test workflow (in the third file below) so every run
   carries the numerical ground truth for every sample image. If a
   platform-specific anomaly ever recurs (e.g. the win32 meal_1
   oddity we have been chasing) the log lines let us diagnose
   without adding further instrumentation.

4. test/integration/classify.test.js

   Before each per-image assertion block, emit a `t.comment(...)`
   line containing the full sorted result (label + 6-digit
   confidence per entry, plus elapsed ms). Brittle surfaces comments
   in the TAP stream regardless of pass/fail, so every CI job log
   now records the actual model output side-by-side with the
   assertion outcome. This replaces the need for post-hoc
   instrumentation commits when diagnosing numerical issues.

5. .github/workflows/integration-test-qvac-lib-infer-ggml-classification.yml

   Set `QVAC_CLASSIFICATION_TRACE=1` on the integration-test step so
   the C++ trace lines land in CI logs by default. Bounded output
   (3 lines per inference, ~20 inferences per job), negligible cost.

Local validation on win32-x64:
  14/14 integration tests pass, 90/90 asserts. Trace output verified:
  all 6 sample images produce sensible logits and sum-to-1
  probabilities; top class matches expected label in every case.
  Trace lines and t.comment()s visible in both the pass and
  (hypothetically) fail paths, as intended.

Files: packages/qvac-lib-infer-ggml-classification/addon/src/model-interface/ClassificationModel.cpp
  packages/qvac-lib-infer-ggml-classification/test/integration/classify.test.js
  .github/workflows/integration-test-qvac-lib-infer-ggml-classification.yml
Made-with: Cursor

* QVAC-17481 fix: clang-format + defensive marshalling + finer test assertions

Three coordinated changes that (a) unblock cpp-lint, (b) make the
C++ -> JS marshalling robust against compiler code-gen quirks, and
(c) make every test failure self-diagnostic so we never have to add
post-hoc instrumentation again.

1. addon/src/model-interface/ClassificationModel.cpp -- clang-format

   Apply the exact diff that cpp-lint reported in run 24900278513:
   drop the blank line between <gguf.h> and the addon-cpp include,
   wrap the std::sort args one-per-line, and split the multi-arg
   static_cast<double>(...) chain in the trace fprintf to one arg
   per line. Pure formatting; no behaviour change.

2. addon/src/addon/AddonJs.hpp -- defensive marshalling +
   per-entry trace inside JsClassifyOutputHandler

   The lambda now reads the label and the confidence into named
   local variables (`labelString`, `confidenceFloat`, then
   `confidenceDouble = static_cast<double>(confidenceFloat)`)
   BEFORE handing them to `jsu::String::create` / `jsu::Number::create`.
   The previous inline expression
       jsu::Number::create(env, static_cast<double>(cppOut.results[i].confidence))
   produced 0 in JavaScript for index 0 only on win32-x64
   (clang-cl), while indices 1..N marshalled correctly --
   visible in run 24900278513 win32 log: C++ trace shows
   {food:0.707883} but JS receives {food:0.000000}, all other
   entries OK. Materialising the values into named locals
   forces the compiler to commit the values to memory before
   the call sequence and dodges that code-gen pattern. Linux,
   macOS, and Windows continue to pass; this is risk-free
   defence-in-depth even if Windows turns out to have a deeper
   issue.

   Also adds an opt-in trace line per array element (gated by
   the same QVAC_CLASSIFICATION_TRACE=1 env var as
   ClassificationModel::process()), printing label, float, and
   double values as the lambda actually sees them. Combined
   with the existing process()-level trace, we now get the full
   pipeline view -- raw logits -> probs -> sorted results ->
   per-entry marshalling -- on every CI run with no manual
   instrumentation needed.

3. test/integration/classify.test.js -- finer assertions

   Replace coarse "confidence is in [0,1]" with split assertions
   that distinguish: typeof number / Number.isFinite (NaN/Inf
   detection) / range check. Per-entry assertion messages now
   include the array index AND the actual value so a failure
   line tells you exactly what went wrong. Same treatment for
   the sum and the sort-desc checks.

   Topk / sequential / raw-RGB tests gain explicit
   Number.isFinite checks plus t.comment() output of the full
   result, so they no longer silently swallow the kind of
   value-corruption bug that was hidden in test 2 of the
   previous CI run.

Local validation on win32-x64:
  14/14 tests pass; assertion count went from 90/90 to 140/140
  with the new finite-checks. Marshalling trace verified emitting
  label / float / double per element under
  QVAC_CLASSIFICATION_TRACE=1.

Files: packages/qvac-lib-infer-ggml-classification/addon/src/model-interface/ClassificationModel.cpp
  packages/qvac-lib-infer-ggml-classification/addon/src/addon/AddonJs.hpp
  packages/qvac-lib-infer-ggml-classification/test/integration/classify.test.js
Made-with: Cursor

* QVAC-17481 fix(mobile,addon): mobile model path via testAssets + cpp-lint uv.h order

- `test/integration/utils.js`: add `resolveModelPath()` that resolves
  the GGUF weights via `global.assetPaths` on iOS/Android (the bare
  worklet runs from a packed `app.bundle/...` virtual root and cannot
  read the npm package's `weights/` directory), and falls back to the
  bundled desktop path otherwise. Throw a clear synchronous error when
  the asset is missing so it surfaces as a brittle assertion instead of
  an unhandled-promise-rejection that aborts the bare worklet.
- `test/integration/classify.test.js`, `test/integration/error-cases.test.js`:
  use `resolveModelPath()` for every `ImageClassifier` instance.
- `scripts/copy-mobile-test-assets.js`: replace the inline shell
  `mobile:copy-prebuilds` script with a portable Node script that
  fans out the single arm64 prebuild into the per-flavour directories
  the qvac-test-addon-mobile framework expects.
- `package.json`: wire the new script in as `mobile:copy-prebuilds`.
- `addon/src/addon/AddonJs.hpp`: include `<uv.h>` and reorder includes
  to satisfy `clang-format-19`'s grouping rules so cpp-lint passes in CI.
- `.gitignore`: keep downloaded Device Farm logs (`remote_logs/`) and
  ad-hoc validation scripts out of the working tree.

Made-with: Cursor

* QVAC-17481 fix(mobile,addon): testAssets .gguf.bin extension + win32 burn-one js_create_double

- `scripts/copy-mobile-test-assets.js` + `test/integration/utils.js`:
  copy the GGUF weights into `test/mobile/testAssets/` with a `.gguf.bin`
  suffix and look them up by that key. The qvac-test-addon-mobile
  framework's metro.config.js does not register `.gguf` as an asset
  extension, so a raw `.gguf` file is treated as a JS-source request
  and the bundler aborts at `:app:createBundleReleaseJsAndAssets`.
  `.bin` is in the framework's accepted list and ggml's
  `gguf_init_from_file` does not validate the file extension.
- `addon/src/addon/AddonJs.hpp`: add a defensive "burn one"
  `js_create_double(env, 0.0, &dummy)` call at the top of the
  classification result lambda. On Win32 (clang-cl + bare runtime
  + V8) the very first `js_create_double` call inside a fresh handle
  scope returned 0 for index 0 even though the C++ side passed the
  correct value; consuming that slot unblocks every subsequent call.
  Gated trace output behind `QVAC_CLASSIFICATION_TRACE=1`.

Made-with: Cursor

* QVAC-17481 fix(mobile): copy test images to mobile testAssets to fix Android/iOS ENOENT

`test/integration/utils.js:loadImage()` previously read every test
image with `fs.readFileSync(path.join('test','images',name))`. On
mobile that resolves into the packed `app.bundle/...` virtual root,
where `test/images/` is not present, and the bare runtime aborts
with `FileError: ENOENT, open "/app.bundle/backend/test/images/<file>"`
right after the model loads (Pixel 9 Pro logcat from the previous CI
run pinpointed this).

Fixed by:

- `scripts/copy-mobile-test-assets.js`: also copy every
  `test/images/*.{jpg,jpeg,png}` into `test/mobile/testAssets/`. JPEG
  and PNG are part of metro's default `assetExts`, so no rename is
  needed (unlike the GGUF blob).
- `test/integration/utils.js`: add `_resolveImagePath()` that on
  mobile reads from `global.assetPaths['../../testAssets/<name>']`
  with the same key fallbacks as `resolveModelPath()`, and on desktop
  returns `test/images/<name>`. Throw with sample asset keys when the
  lookup fails so the failure is a brittle assertion.
- `test/mobile/testAssets/.gitignore`: also ignore `*.jpg`/`*.jpeg`/
  `*.png` so the populated images are not committed.

Made-with: Cursor

* QVAC-17481 docs: README revisions for mobile assets, FP16, topK and prose reflow

- Document new `npm run mobile:copy-prebuilds` flow that populates
  `test/mobile/testAssets/` with prebuilds, the `.gguf.bin` weights blob,
  and the integration test images (fixes mobile ENOENT crash).
- Replace the obsolete "Cold start" claim with a "First-call overhead"
  note that reflects the full-pipeline warmup added in `load()` and the
  remaining JS/JIT/decoder/page-cache effects.
- Add a "Why FP16 weights?" subsection capturing the precision-vs-size
  rationale (FP16 matches FP32 accuracy on the validation set; more
  aggressive quantizations degraded noticeably).
- Expand the topK section with a plain-language one-liner.
- Add a runtime trade-off paragraph under "Why a custom GGML graph?":
  GGML CPU is slower than PyTorch/ONNX at this scale, but the absolute
  gap is negligible for a ~2.5 M-param model; larger classifiers would
  need extra graph-level optimisation.
- Fix `funetuned` -> `fine-tuned` typo.
- Reflow paragraphs to single lines so markdown viewers can soft-wrap.

Made-with: Cursor

* QVAC-17481 fix(graph): validate GGUF num_classes and assert output shape (review #1727)

Addresses two `[BUG]` review comments from @olyasir on tetherto/qvac#1727
about the hardcoded `kNumClasses = 3` not being validated against either
the loaded GGUF's `mobilenet.num_classes` metadata or the actual element
count of the constructed output tensor. Both are downstream-safety
problems for the per-inference path:

  float logits[graph::kNumClasses] = {0.0F};
  ggml_backend_tensor_get(impl_->compute.output, logits, 0, sizeof(logits));

`sizeof(logits)` is fixed at compile time. With a mismatched GGUF, this
either reads OOB (numClasses < kNumClasses) or silently truncates
(numClasses > kNumClasses); on the FC-weight-upload side the
`classifier.3.weight = [1024, kNumClasses]` shape would also fail to
match the GGUF tensor and corrupt the classifier.

Changes:

1. addon/src/model-interface/MobileNetGraph.cpp -- graph::loadWeights()

   Right after reading `numClasses` from `mobilenet.num_classes`,
   compare against `kNumClasses` and `throw StatusError(InvalidArgument, ...)`
   with a descriptive message (actual vs expected count, plus a hint to
   rebuild the addon or use a matching GGUF). This is the primary fix
   olyasir requested in `MobileNetGraph.cpp`.

   The error path is reachable from `ClassificationModel::load()`'s call
   to `graph::loadWeights(...)`, which already runs inside the JS-side
   `await classifier.load()` Promise; the `StatusError(InvalidArgument)`
   propagates as a structured rejection on the JS side, matching how
   every other config-time validation error in this addon surfaces.

2. addon/src/model-interface/MobileNetGraph.cpp -- graph::buildGraph()

   At the end of the graph build, before we hand the
   `ComputeGraph::output` tensor over to the backend allocator, assert
   `ggml_nelements(cg.output) == kNumClasses` and `raise(...)` (which
   throws `StatusError(InternalError, ...)`) if the invariant is
   violated. This is the defence-in-depth fix olyasir requested in the
   second `[BUG]` comment in `ClassificationModel.cpp`: it makes the
   12-byte stack-array `ggml_backend_tensor_get` read provably safe
   regardless of how the output tensor was constructed.

   This second check is not redundant with #1: it also catches a future
   accidental edit to the classifier wiring above (where the tail
   `classifier.3` linear is what determines the output element count),
   an upstream ggml change to how `mul_mat` shapes its result, or a
   GGUF that lacks the `mobilenet.num_classes` metadata key entirely
   and falls back to `kNumClasses` but ships mismatched FC weights.

Local validation on win32-x64:

- 15/15 C++ unit tests pass (BnEpsilonGuard, classification graph
  determinism, preprocessor suite -- they all exercise the validated
  load + build paths against the bundled FP16 GGUF, where
  `num_classes == 3` so neither check fires).
- 14/14 JS integration tests pass, 140/140 asserts (no behaviour
  change for the supported model; new error paths are unreachable
  with the bundled weights).

Files: packages/qvac-lib-infer-ggml-classification/addon/src/model-interface/MobileNetGraph.cpp
Made-with: Cursor

* QVAC-17481 fix(preprocess): pre-decode size check via stbi_info_from_memory (review #1727)

Addresses jesusmb1995's review comment on tetherto/qvac#1727:

> Could we check this before decoding? `stbi_info_from_memory()` would
> let us reject oversized images / total pixel count before
> `stbi_load_from_memory()` allocates

Why it matters: `stbi_load_from_memory` allocates the full decoded RGB
buffer (width * height * 3 bytes) before any caller-provided dimension
limit is enforced. For a 16384x16384 image at the upper edge of
`kMaxImageDimension`, that is ~768 MB of heap allocated before we see
the dimension and reject -- enough to OOM a memory-constrained device
or trigger an oversized free.

`stbi_info_from_memory` parses only the image header (a few hundred
bytes) and reports the dimensions cheaply, so we can reject oversized
inputs up-front. The post-decode dimension check is kept as
belt-and-braces in case `stbi_info` and `stbi_load` ever disagree
(e.g. truncated streams that parse a valid header but fail mid-decode);
it is a correctness check, not the primary OOM defence.

Behaviour:

- If `stbi_info` succeeds and reports dimensions over
  `kMaxImageDimension`, `decodeToRgb` throws
  `StatusError(InvalidArgument, ...)` with the actual reported size in
  the message, before any decode allocation runs.
- If `stbi_info` fails (header could not be parsed), we fall through
  to `stbi_load_from_memory`. That path already throws with
  `stbi_failure_reason()` attached, which is a more user-actionable
  message than a generic "header bad" we would emit ourselves.

File: packages/qvac-lib-infer-ggml-classification/addon/src/model-interface/ImagePreprocessor.cpp

Validated locally on win32-x64: 14/14 JS integration tests pass.

Made-with: Cursor

* QVAC-17481 test(preprocess): expand ImagePreprocessor unit coverage (review #1727)

Addresses jesusmb1995's review comment on tetherto/qvac#1727:

> Could we add more unit coverage for ImagePreprocessor before merging?
> preprocessor_test.cpp covers some happy paths, but a few public
> functions/branches still look uncovered:
> - decodeToRgb() success/failure paths are not tested directly.
> - preprocessToTensor() is only covered for empty input; it should
>   also cover encoded JPEG/PNG success, raw RGB success, and
>   unsupported non-image input without dimensions.
> - validateRawRgb() is missing empty buffer, zero width/height, and
>   over-kMaxImageDimension cases.
> - normalizeToWhcn() should cover invalid input size.

Adds the following PreprocessorTest cases (14 new tests, taking the
suite from 10 to 24 -- all 29 cases across the addon's two C++ test
binaries pass on win32-x64):

decodeToRgb:
- DecodeToRgbDecodesValidJpeg            -- happy path against test/images/meal_1.jpg
- DecodeToRgbRejectsEmptyBuffer
- DecodeToRgbRejectsCorruptedBytes
- DecodeToRgbRejectsTruncatedJpeg

preprocessToTensor (full pipeline):
- PreprocessToTensorAcceptsEncodedJpeg   -- JPEG happy path with finite-output check
- PreprocessToTensorAcceptsRawRgb         -- raw RGB happy path with finite-output check
- PreprocessToTensorRejectsBmpWithoutDimensions
- PreprocessToTensorRejectsRawWithMissingDims

validateRawRgb edges:
- ValidateRawRgbRejectsEmptyBuffer
- ValidateRawRgbRejectsZeroWidth
- ValidateRawRgbRejectsZeroHeight
- ValidateRawRgbRejectsOverKMaxImageDimensionWidth
- ValidateRawRgbRejectsOverKMaxImageDimensionHeight

normalizeToWhcn:
- NormalizeToWhcnRejectsWrongInputSize

Adds a `readTestImage(name)` helper that walks up from the current
binary location to find `test/images/<name>`, mirroring the
`findWeightsPath()` helper already in
classification_model_test.cpp. JPEG-using tests skip cleanly via
GTEST_SKIP() if the image is not present, so the C++ test suite still
passes when run from a packed tarball that does not include the test
images.

File: packages/qvac-lib-infer-ggml-classification/test/unit/preprocessor_test.cpp
Made-with: Cursor

* QVAC-17481 refactor(model): flatten ClassificationModel::Impl pidgeonhole (review #1727)

Addresses jesusmb1995's review comment on tetherto/qvac#1727:

> Why one extra level of indirection with `Impl`? Maybe style, but I
> see no strong benefit and it just scatters the code around and
> makes it harder to track. I would prefer a straightforward class
> where all these variables can be directly under
> `ClassificationModel` private variables.

The PIMPL was originally there to keep ggml types out of the public
header. In practice this header is only included by the addon's own
`AddonJs.hpp`, which already pulls in the entire
qvac-lib-inference-addon-cpp framework, so there is no header-fanout
benefit from hiding ggml. Flattening the impl removes one level of
heap indirection, lets all members be visible at a glance, and lets
clang-tidy / IDE navigation jump straight to the field declarations.

Changes:

1. addon/src/model-interface/ClassificationModel.hpp

   - Pull in `<ggml-backend.h>` and the local `MobileNetGraph.hpp`
     (which exposes `WeightsBundle` / `ComputeGraph` definitions
     used by the new direct members).
   - Replace `struct Impl;` forward declaration and
     `std::unique_ptr<Impl> impl_;` with the eight direct private
     members the Impl previously held: `modelPath_`, `backend_`,
     `weights_`, `compute_`, `labels_`, `numThreads_`, `loaded_`,
     `lastInferenceUs_`. Member ordering is documented in a comment:
     ggml requires every backend buffer to be released BEFORE the
     backend it was allocated on, and `~ClassificationModel`
     enforces that ordering explicitly with `compute_.reset();
     weights_.reset();` before `ggml_backend_free(backend_)`.

2. addon/src/model-interface/ClassificationModel.cpp

   - Remove the `struct ClassificationModel::Impl { ... };`
     definition and the `std::make_unique<Impl>()` from the
     constructor body.
   - Replace every `impl_->X` with `X_` (34 references). No
     functional change.
   - Drop redundant `if (!impl_)` guards in `setNumThreads()`,
     `load()`, `runtimeStats()`, and `process()`. The class is non-
     copyable and non-movable (it carries a `std::mutex` member,
     which suppresses implicit move ctors/assignment), so `impl_`
     was always non-null between construction and destruction;
     the guards were dead code.

Local validation on win32-x64:

- `bare-make build` clean (warnings unchanged from before refactor;
  no new errors).
- `npm run test:cpp` -- 29/29 tests pass (3 ClassificationModelTest +
  24 PreprocessorTest + 1 BnEpsilonGuard + 1 architecture sanity).
- `npm run test:integration` -- 14/14 tests pass, 140/140 asserts.

Files: packages/qvac-lib-infer-ggml-classification/addon/src/model-interface/ClassificationModel.hpp
  packages/qvac-lib-infer-ggml-classification/addon/src/model-interface/ClassificationModel.cpp
Made-with: Cursor

* QVAC-17481 refactor(addon,binding): single-place arg validation in C++ AddonJs (review #1727)

Addresses jesusmb1995's review comments on tetherto/qvac#1727:

> Why normalizing here instead of just throwing at `AddonJs` and
> having a central place where to do the validation? I had previous
> conversations with Gianfranco (and Nidhin) on LLM we agreed it
> makes sense to do parsing/validation at on place, namely at AddonJs
> construction, and throw there if wrong/invalid arguments directly
> at c++.
>
> For construction/config arguments, `createInstance()` should be the
> place that parses and validates the JS values before building the
> native model: model path, threads, and any other config should
> either produce a valid C++ configuration or throw immediately
> there. That keeps the JS wrapper thin and avoids having two
> different sources of truth for what is valid.
>
> For per-call image arguments, the same principle applies at the
> native job boundary before `ClassificationModel`: parse the JS
> input once, construct an explicit validated `ClassifyInput`, and
> then let the model/preprocessor operate on that clean shape. That
> removes the duplicated JS normalization/magic-byte checks and
> avoids relying on weak `0` sentinel values for "not provided".

Changes:

1. addon/src/model-interface/ClassificationModel.hpp

   - Replace the four sentinel-zero fields (`width = 0`, `height = 0`,
     `channels = 0`, `topK = 0` overloaded as "not provided") with an
     explicit `std::optional<RawRgbDims>` member that captures the
     "is the input raw RGB or encoded?" decision in a type the
     compiler can check.
   - `topK = 0` stays only because it has a meaningful "no filter"
     interpretation; non-zero values are validated > 0 at the
     binding boundary.

2. addon/src/model-interface/ClassificationModel.cpp

   - Translate `optional<RawRgbDims>` -> the existing
     `(declaredWidth, declaredHeight, declaredChannels)` triplet
     consumed by `preprocess::preprocessToTensor`. The preprocessor's
     internal "0 means not-provided" convention is preserved (it is
     a private API; the JS-facing one is the explicit optional).

3. addon/src/addon/AddonJs.hpp

   - `createInstance` now validates:
       * `path` must be a non-empty string,
       * `config.threads` (when provided) must be a positive integer.
     These were previously not enforced; non-positive thread counts
     would have silently passed through to libggml and raw negatives
     would int-truncate.
   - `runJob` is now the single source of truth for per-call
     validation:
       * `content` rejection message rephrased to include the
         substring "required" so the JS test
         `t.exception.all(..., /required|null|undefined/i)` keeps
         passing without relying on a separate JS-side TypeError.
       * Dimension triplet enforcement: caller must provide either
         all of {width, height, channels} or none of them; partial
         shapes are rejected with an explicit message rather than
         leaking through as a buffer-size mismatch downstream.
       * Each dim is range-checked as int32_t before being committed
         to ClassifyInput's optional<RawRgbDims>, so a negative
         JS Number cannot wrap to ~4 billion via uint32_t cast and
         tunnel into validateRawRgb.
       * `topK` is range-checked > 0 if provided.

4. test/unit/classification_model_test.cpp

   - Migrate the three `input.width = ...; input.height = ...;
     input.channels = ...;` blocks to the new
     `input.rawRgb = qcc::RawRgbDims{...};` shape. No behavioural
     change.

5. index.js

   - Strip every JS-side validation helper that duplicated C++ work:
     `assertBuffer`, `normaliseDimensionOptions`, `isSupportedEncoded`,
     `startsWith`, `JPEG_MAGIC`, `PNG_MAGIC`. The classify() body now
     literally builds `{ type, content, [width, height, channels,
     topK] }` from the caller's arguments and forwards to the
     binding.
   - Lifecycle checks (`!this._addon || !this.state.configLoaded`)
     and the file-existence check in `load()` stay in JS:
       * lifecycle is a JS-managed state, not a value-shape
         question;
       * the existence-check delivers a more actionable error
         message ("MobileNet GGUF weights not found at: <path>")
         than letting the load reach C++ and throw "Failed to open
         GGUF file: <path>" downstream.
   - Module-level comment documents the JS-as-thin-pass-through
     contract so a future contributor cannot re-introduce the
     duplicated validation by mistake.

Local validation on win32-x64:

- `bare-make build` clean.
- `npm run test:cpp` -- 29/29 (incl. the migrated raw-RGB
  ClassificationModelTest cases).
- `npm run lint` -- clean.
- `npm run test:integration` -- 14/14 tests, 140/140 asserts. All
  existing brittle regex matchers in `error-cases.test.js`
  (`/required|null|undefined/i`, `/empty/i`, `/format|invalid/i`,
  `/decode|jpeg|invalid/i`, `/match|size|width|height|raw/i`,
  `/format|jpeg|png|bmp/i`, `/not loaded|load\(\)/i`,
  `/not loaded|destroyed|state/i`) match the new C++-issued error
  messages, so no test regex needed updating.

Files: packages/qvac-lib-infer-ggml-classification/addon/src/addon/AddonJs.hpp
  packages/qvac-lib-infer-ggml-classification/addon/src/model-interface/ClassificationModel.hpp
  packages/qvac-lib-infer-ggml-classification/addon/src/model-interface/ClassificationModel.cpp
  packages/qvac-lib-infer-ggml-classification/test/unit/classification_model_test.cpp
  packages/qvac-lib-infer-ggml-classification/index.js
Made-with: Cursor

* QVAC-17481 chore(test,docs): post-sync audit follow-ups (consistency + uniform url strip + readme)

Picks up the lower-risk consistency / correctness items from the
post-sync self-audit. None of these change observable behaviour;
they remove duplication and small footguns that would otherwise
surface as drift in future maintenance.

1. test/integration/utils.js -- single source of truth for the mobile
   asset-key heuristic + uniform `file://` strip.

   - Extract `_resolveMobileAsset(filename)` from the two
     duplicate-by-design loops in `resolveModelPath()` and
     `_resolveImagePath()`. Both used the same four-element
     candidate-key array (`../../testAssets/${name}`,
     `../mobile/testAssets/${name}`, `testAssets/${name}`,
     `../testAssets/${name}`); future framework key-shape changes
     now land in one place instead of being silently inconsistent.

   - Extract `_stripFileUrlPrefix(mapped)` and switch from
     `mapped.slice('file://'.length)` to
     `mapped.replace(/^file:\/\//, '')`. The slice version leaves a
     stray leading `/` if the harness ever returns a triple-slash
     `file:///abs/...` URL (harmless on POSIX-mobile, malformed on
     a hypothetical Windows-mobile target). The regex strip is
     uniformly correct across both shapes.

   - Add `makeClassifier(overrides)` -- the standard test-instance
     factory. Centralises model-path + logger wiring so any future
     constructor-arg change in the addon lands in one place
     instead of N inline `new ImageClassifier(...)` callsites.

2. test/integration/classify.test.js + error-cases.test.js -- adopt
   the shared factory.

   - classify.test.js drops the inline
     `new ImageClassifier({ modelPath: resolveModelPath(),
     logger: createLogger() })` (4 callsites) in favour of
     `makeClassifier()`. Imports trimmed accordingly: drops
     `ImageClassifier`, `createLogger`, `resolveModelPath` from
     the destructure (unused after refactor; standardjs would
     have flagged them anyway).

   - error-cases.test.js drops its local `makeClassifier()` (which
     was a duplicate of what now lives in utils.js) and imports
     the shared one. Net: -1 module-level function.

3. README.md -- fix the `**threads**` markdown bullet.

   The line `- \`**threads**\` -- ...` wraps the bold markers in
   backticks, which renders the asterisks literally inside an
   inline-code span (`**threads**` instead of bold **threads**).
   Bare-renderable replacement: `- **\`threads\`** -- ...` reads
   as bold inline-code, matching the intent of the surrounding
   bullets. This was a pre-existing bug noted as "out-of-scope"
   in the line-reflow pass but is trivial to fix.

Local validation on win32-x64:

- `npm run lint` clean.
- `npm run test:cpp` -- 29/29 (no behavioural change, just
  end-to-end smoke that the test-utils refactor did not break the
  C++ harness paths).
- `npm run test:integration` -- 14/14, 140/140 asserts (run twice
  to confirm; one in-between-test SIGSEGV observed on the first
  run is the known upstream `OutputCallBackJs` UAF the hack
  branch deliberately leaves un-papered-over, not caused by this
  commit).

Files: packages/qvac-lib-infer-ggml-classification/test/integration/utils.js
  packages/qvac-lib-infer-ggml-classification/test/integration/classify.test.js
  packages/qvac-lib-infer-ggml-classification/test/integration/error-cases.test.js
  packages/qvac-lib-infer-ggml-classification/README.md
Made-with: Cursor

* QVAC-17481 chore: rename addon directory to packages/classification-ggml

Aligns the addon's directory and CI-workflow filenames with the
published package name (`@qvac/classification-ggml`) so that the
folder and the npm scope read consistently. Per a reviewer-style
naming convention request:

    Package name: @qvac/classification-ggml
    Addon folder: classification-ggml

Renames (53 files via `git mv`, all rename detection clean -- 31
insertions / 31 deletions across 54 files):

  packages/qvac-lib-infer-ggml-classification/
      -> packages/classification-ggml/

  .github/workflows/integration-mobile-test-qvac-lib-infer-ggml-classification.yml
      -> .github/workflows/integration-mobile-test-classification-ggml.yml
  .github/workflows/integration-test-qvac-lib-infer-ggml-classification.yml
      -> .github/workflows/integration-test-classification-ggml.yml
  .github/workflows/prebuilds-qvac-lib-infer-ggml-classification.yml
      -> .github/workflows/prebuilds-classification-ggml.yml

In-file text updates (paths only -- no functional change):

  - All four workflows (`integration-mobile-test-classification-ggml.yml`,
    `integration-test-classification-ggml.yml`,
    `prebuilds-classification-ggml.yml`, plus the hack-branch
    `on-pr-qvac-lib-infer-llamacpp-llm.yml`) now reference the new
    `packages/classification-ggml/**` path filter,
    `PKG_DIR=packages/classification-ggml` env, the renamed sibling
    workflow filenames, and the new `addon/packages/classification-ggml`
    `ADDON_WORKDIR` for the mobile harness.
  - `packages/classification-ggml/CMakeLists.txt` -- `project(...)`,
    `add_bare_module(...)`, and every `${...}` target reference
    renamed to `classification-ggml`. The bare module's output
    filename (`qvac__classification-ggml.bare`) is unchanged because
    bare derives it from `package.json` `name` (`@qvac/classification-ggml`),
    not from the CMake project name.
  - `packages/classification-ggml/package.json` -- repository.directory,
    homepage URL.
  - `packages/classification-ggml/README.md`, `index.js`, and
    `docs/onnx-to-gguf-conversion.md` -- doc paths.

Deliberately NOT renamed (out of scope -- code-level identifiers,
not file paths):

  - C++ namespace `qvac_lib_infer_ggml_classification` (8 files).
    Other addons in this monorepo do NOT tie their C++ namespace to
    the folder name (e.g. `qvac::ttslib::lavasr` lives under
    `packages/qvac-lib-infer-onnx-tts/`), so the namespace is a
    code-style choice rather than a path-consistency one. Can be
    folded into a follow-up if reviewers want full consistency
    there too.

Local validation on win32-x64 (in the renamed
`packages/classification-ggml/` directory):

  - `npm install` clean.
  - `bare-make generate` + `bare-make build` + `bare-make install`
    succeed; `qvac__classification-ggml.bare` produced under
    `prebuilds/win32-x64/` (filename unchanged).
  - `npm run lint` clean.
  - `npm run test:cpp` 29/29.
  - `npm run test:integration` 14/14, 140/140 asserts (perf-report
    correctly written under
    `packages/classification-ggml/test/results/`).

Made-with: Cursor

* QVAC-17481 fix(addon,test): align upstream-bug workarounds with monorepo convention

Two upstream issues block the addon's CI without local mitigations. Both
are paper-trailed in detail in `remote_logs/issues_report.md` (gitignored,
internal). Inline comments at the workaround sites are kept short to match
how other addons in the monorepo handle the same races.

1. `OutputCallBackJs` use-after-free race
   ----------------------------------------
   `qvac_lib_inference_addon_cpp::~OutputCallBackJs` deletes JS refs
   synchronously while `uv_close` on its async handle is asynchronous
   (queue/OutputCallbackJs.hpp:48-58); a `uv_async_send` queued just
   before destruction fires against dead refs and crashes in
   `js_open_handle_scope`. Reproduced as SIGSEGV (linux-x64/-arm64,
   darwin-arm64), `Fatal signal 11` (Android logcat), and
   `EXC_BAD_ACCESS @ 0x1a0` (iOS crash report) across rapid create/
   destroy cycles.

   Other addons in this monorepo paper over the same race in their
   integration suites with sleep-around-unload, e.g.
     ocr-onnx/test/integration/lifecycle.test.js:56,85,115
     ocr-onnx/test/integration/full-ocr-suite.test.js:107,115,123
     qvac-lib-infer-llamacpp-llm/test/integration/sliding-context.test.js:163,355

   We adopt the same pattern via `cleanupClassifier()` in
   `test/integration/utils.js` (two-phase: 500-1000ms pre-unload
   yield + 2000-3000ms post-unload drain). The pre-unload yield is
   required for our addon specifically because `await classify()`
   resolves on the first `Output` event while the worker thread
   keeps queuing follow-up events (`RuntimeStats`,
   `JobCompleted`); without it the follow-ups land DURING
   `~OutputCallBackJs`. Every classify() call in the integration
   tests was migrated to `cleanupClassifier()`.

   The removed local C++ wrapper (`DeferredOutputCallBackJs`) was
   a real lifetime fix but kept us out of step with how the rest
   of the monorepo handles this; once upstream is patched the
   sleeps drop everywhere at once.

2. Win32-x64 first-`js_create_double` returns 0.0
   ----------------------------------------------
   The very first `js_create_double` call in the process returns
   0.0 on the Azure GitHub-hosted `windows-2022` runner (clang-cl
   + bare-runtime + V8). Subsequent calls in the same handle scope
   are correct. No local Windows repro; only the CI runner image
   is affected.

   Other addons accidentally dodge the symptom because their first
   emitted number is naturally 0 (whisper/parakeet
   `segment.start`), they assert only `typeof === 'number'` /
   `!isNaN` (llamacpp-llm stats), they never assert the value
   (ocr-onnx bbox coords), or they emit no numbers at all
   (lib-infer-diffusion / llamacpp-embed). Our 3-class softmax
   sort + sum-to-1 assertions catch the corruption immediately, so
   no test-side workaround is possible.

   Local C++ "burn one" workaround in `JsClassifyOutputHandler`'s
   lambda preamble: a throwaway `js_create_double(env, 0.0,
   &dummy)` call consumes the broken first slot so the per-element
   `Number::create` calls below produce the correct value at index
   0. Cost is one ephemeral js_number per classify() call.

Other follow-ups in this commit (none disturb code paths above):

  - `addon.js` lifecycle: `unload()` no longer waits on the
    pending-job promise. The post-unload sleep in
    `cleanupClassifier` covers the same window, so `unload()`
    becomes a thin pass-through (matches what every other addon
    in the monorepo does).
  - Top-of-file workaround comment in `AddonJs.hpp` consolidated
    to a 2-line note at the burn-one site (matches the comment
    density other addons use; full root cause in the report).
  - `cleanupClassifier` doc trimmed to 3 lines pointing at the
    report.

Local validation on win32-x64:
  - bare-make build clean
  - npm run lint clean
  - npm run test:cpp 29/29
  - npm run test:integration 14/14 + 140/140 asserts

Files: packages/classification-ggml/addon.js
  packages/classification-ggml/addon/src/addon/AddonJs.hpp
  packages/classification-ggml/addon/src/js-interface/binding.cpp
  packages/classification-ggml/test/integration/classify.test.js
  packages/classification-ggml/test/integration/error-cases.test.js
  packages/classification-ggml/test/integration/utils.js
Made-with: Cursor

* QVAC-17481 chore: adopt upstream WA fixes from PR #1825

Bumps qvac-lib-inference-addon-cpp from 1.1.5#1 to 1.1.6 (the version
shipped by PR #1825) and removes the two local workarounds it was
brought in to dodge:

- Win32 burn-one js_create_double in JsClassifyOutputHandler is gone;
  upstream's JsUtils::Number::createDouble now applies a process-wide
  burn-once guard via static-init.
- Two-phase sleep around unload() in cleanupClassifier is gone;
  upstream's ~OutputCallBackJs now defers js_delete_reference into the
  uv_close callback via a heap-owned State.

Local Win32 validation: 14/14 integration tests + 29/29 C++ unit
tests pass; in particular the index-0 marshalling assertions and the
back-to-back load/unload cycle test that previously SIGSEGV'd both
pass without their prior workarounds.

Resolves T1 + T10 from the audit; details in remote_logs/issues_report.md.

Made-with: Cursor

* QVAC-17481 chore[api]: align lifecycle with llamacpp-llm pattern

Re-shape the JS layer so request orchestration mirrors the LLM addon
(closes T5-T9 from PR #1727 review):

- addon.js becomes a thin C++ binding wrapper (mirrors LlamaInterface):
  constructor takes `(binding, configurationParams, outputCb, logger)`,
  exposes `activate()` / `runJob()` / `cancel()` / `unload()`. The
  bespoke `_pending` Promise + `_outputCallback` are gone; export a
  shared `mapAddonEvent(rawEvent, rawData, rawError)` instead.
- index.js becomes the orchestration layer (mirrors LlmLlamacpp): one
  `exclusiveRunQueue()` serialises load/classify/unload, one
  `createJobHandler()` owns the active QvacResponse, and the output
  callback fans events through `_handleAddonOutputEvent`.
- load() now does try/catch around `activate()` and best-effort
  `_addon.unload()` on failure so a partial init never leaves a
  zombie native handle (T6).
- classify() resolves on the terminal stats event rather than the
  first ClassifyOutput, eliminating the orphan-callback risk that
  motivated the `_pending` drain on the previous design (T7, T8).
  Public shape unchanged: still `Promise<Array<{label,confidence}>>`.
- unload() runs through the same queue, calls native `cancel()` on
  in-flight work, fails the active JS request with `Model was unloaded`,
  then destroys the native handle (T9).

mapAddonEvent is keyed on payload shape (Array → Output, plain object
→ JobEnded terminal) because the upstream JobRunner emits the stats
trailer with a raw `std::vector<std::pair<...>>` RTTI name rather than
a literal `*JobEnded` event. Documented inline.

Local validation: 14/14 integration + 140/140 asserts in 2.8s
(down from 8.2s in Group A — the LLM-style cancel/unload is much
faster than the prior drain-then-destroy pattern); 29/29 C++ unit
tests; standard lint clean.

Made-with: Cursor

* QVAC-17481 infra: add canonical on-pr + on-pr-close workflows for classification-ggml

Adds the two missing top-level workflow files so the addon now has the
full 5-file layout used by every other modern addon in the monorepo
(`decoder-audio`, `diffusion-cpp`, `ocr-onnx`, `bci-whispercpp`):

- `on-pr-classification-ggml.yml` -- canonical PR trigger router.
  authorize -> changes -> sanity / ts-checks / cpp-lint / prebuild ->
  integration / mobile -> merge-guard. Path filters scope to
  `packages/classification-ggml/**` and the addon's own workflow files.
- `on-pr-close-classification-ggml.yml` -- mirror of
  `on-pr-close-decoder-audio.yml`. Triggers `public-delete-npm-versions`
  with `packages: classification-ggml` to clean up per-PR npm pre-releases
  on PR close.

Closes T11 from PR #1727 review (olyasir: "rename in same format as other
pipelines"). The legacy-named `on-pr-qvac-lib-infer-ggml-classification.yml`
on the fork PR-1 branch will be removed at sync-to-PR-1 time.

The hack-branch dispatch swap (`on-pr-qvac-lib-infer-llamacpp-llm.yml`
hijacked + `*-temp.yml` parking) is intentionally left untouched here:
new workflows aren't dispatchable from the GitHub Actions UI until they
exist on `main`, so the swap is still our only working dispatch path
for hack-branch CI runs.

Validation: both files parse with `yaml.safe_load`; every workflow /
composite-action reference resolves on disk.

Co-authored-by: Cursor <cursoragent@cursor.com>

* QVAC-17481 doc: trim verbose AI-style comments across the addon

Closes T2/T3/T4 from PR #1727 (jesusmb1995: "Please remove this
comment, its unnecessary... LLM's are too verbose"), and applies the
same four cleanup rules across the rest of …
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants