QVAC-17481 feat: add new addon @qvac/classification-ggml for image classification#1727
Conversation
…sification addon
Introduces a new inference addon that classifies images into three
classes (food / report / other) using a fine-tuned MobileNetV3-Small
CNN running on the libggml CPU backend. Follows the established QVAC
addon pattern (see qvac-lib-infer-nmtcpp, lib-infer-diffusion).
## What this PR ships
- New package `packages/qvac-lib-infer-ggml-classification/` publishing
as `@qvac/classification-ggml`:
- Native addon: custom 34-layer MobileNetV3-Small compute graph built
directly against the public `ggml.h` / `ggml-backend.h` API — no
llama.cpp application-layer dependency, so the addon remains
forward-compatible with future `libggml` upstream merges.
- Load-time BatchNorm fold with `eps = 0.001` (the architecture-
correct value; `1e-5` causes normalisation drift across all 34
layers). Depthwise separable convolutions, squeeze-and-excite
blocks, HardSwish / HardSigmoid / ReLU activations all wired
through `ggml_conv_2d`, `ggml_conv_2d_dw`, `ggml_pool_2d`,
`ggml_hardswish`, `ggml_hardsigmoid`.
- FP16 GGUF weights bundled inside the package (2.94 MB); class
labels are read from the GGUF `mobilenet.class_N` metadata so a
future fine-tune can ship different class names without a code
change.
- Public JS API: `new ImageClassifier({ modelPath?, logger?,
threads?, nativeLogger? })` + `load()` / `classify(buffer, opts?)`
/ `unload()` / `destroy()`. Accepts JPEG, PNG, or raw-RGB input;
validates at the JS layer before reaching native code so no bad
input reaches libggml.
- `nativeLogger` opt-in (default `false`): the underlying
`qvac-lib-inference-addon-cpp` JsLogger holds a process-wide
static `uv_async_t` that is not safe across rapid create/destroy
cycles, so the native C++→JS log bridge is disabled unless the
caller explicitly opts in. JS-level logging always flows through
the caller's `logger`.
- Image preprocessing via vendored-through-vcpkg `stb_image` +
`stb_image_resize2` (bilinear resize to 224×224, ImageNet
normalisation, WHCN layout).
## Build + tests
- `bare-make` + `cmake-bare` + `cmake-vcpkg` build, targeting
`ggml::ggml` / `ggml::ggml-base` / `ggml::ggml-cpu` and `stb` from
the shared QVAC vcpkg registry.
- C++ GoogleTest suite covering graph shape (34 conv + 2 linear + 9
SE blocks), load + inference, determinism, `topK` filter, BN
epsilon guard, and full preprocessor behaviour.
- brittle + bare JS integration tests covering load, classify (all 6
public sample images under `test/images/`), `topK`, raw RGB input,
and every error path: null, empty buffer, corrupted JPEG,
unsupported format (BMP), mismatched dimensions, pre-load /
post-unload, tiny upscale, load/unload cycles.
- Mobile test scaffolding following the shared convention:
`scripts/generate-mobile-integration-tests.js`,
`scripts/validate-mobile-tests.js`, `test/mobile/
{integration-runtime.cjs, integration.auto.cjs, README.md,
testAssets/.gitignore}`. The auto-generated `integration.auto.cjs`
wraps every `test/integration/*.test.js` so the shared
`qvac-test-addon-mobile` framework picks them up on Android and iOS
automatically.
## CI workflows
Four addon-scoped workflows (path-filtered to this package):
- `on-pr-qvac-lib-infer-ggml-classification.yml` — authorize, sanity
checks, TypeScript declaration check, C++ lint, prebuild matrix,
desktop integration tests, mobile integration tests, merge-guard.
- `prebuilds-qvac-lib-infer-ggml-classification.yml` — Linux x64,
Linux arm64, Android arm64, macOS arm64, iOS arm64, Windows x64
prebuild matrix.
- `integration-test-qvac-lib-infer-ggml-classification.yml` — desktop
end-to-end tests with the shared performance reporter writing a
GitHub step summary.
- `integration-mobile-test-qvac-lib-infer-ggml-classification.yml` —
AWS Device Farm Android + iOS runs via the
`tetherto/qvac-test-addon-mobile` framework.
## Public-data / test-image policy
All public correctness assertions in this package are scoped to the 6
test images under `test/images/` (2 per class). No confidential
fine-tuning numbers, validation-set sizes, per-class metrics, or
references to any internal validation dataset appear in this PR, in
any file it ships, or in CI logs. Internal numerical-equivalence
gating against an ONNX FP32 reference is handled pre-release by a
development-only script that is not part of this PR.
## Out of scope for this PR
- SDK plugin / schema integration (`packages/sdk/**`) lands in a
follow-up PR after `@qvac/classification-ggml@0.1.0` is published
to npm. This mirrors the diffusion rollout (tetherto#656 → release → tetherto#1021).
- GPU backends (Vulkan / Metal / CUDA): CPU-only for v1.0.
Made-with: Cursor
…ication workflows The prebuild and integration-test workflows for @qvac/classification-ggml referenced `tetherto/qvac/.github/actions/setup-bare-toolchain`, which does not exist. The action is named `setup-bare-tooling` (same name used by the llamacpp-llm, nmtcpp, and diffusion addons at the identical pinned SHA). All 6 prebuild matrix jobs failed at step 1 with "Can't find 'action.yml' ... for action 'setup-bare-toolchain'" until this rename is in place. Files: .github/workflows/prebuilds-qvac-lib-infer-ggml-classification.yml .github/workflows/integration-test-qvac-lib-infer-ggml-classification.yml Made-with: Cursor
…lassification prebuilds
The classification prebuilds workflow was missing the per-platform
toolchain steps that sibling addons (diffusion, nmtcpp) have after
`setup-vcpkg-cache`. As a result, `VCPKG_ROOT` was never exported,
CMake couldn't locate the vcpkg toolchain, and `bare-make build`
failed on every platform.
Changes to .github/workflows/prebuilds-qvac-lib-infer-ggml-classification.yml:
- setup-vcpkg-cache: drop unknown inputs `vcpkg-path` and
`github-packages-token` (action only accepts platform, arch,
s3-bucket-path). Was silently ignored but emitted warnings.
- Add per-OS vcpkg bootstrap / configuration:
macOS (darwin, ios): clone microsoft/vcpkg tag 2025.12.12,
bootstrap, export VCPKG_ROOT.
Linux (linux, android runners): export
VCPKG_ROOT=$VCPKG_INSTALLATION_ROOT.
Windows: export VCPKG_ROOT from
$env:VCPKG_INSTALLATION_ROOT with
backslash-to-forward-slash normalisation.
- Windows-only: set CMAKE_GENERATOR="Visual Studio 17 2022" and,
for the x64 matrix row, CMAKE_GENERATOR_PLATFORM=x64.
- Android-only: export ANDROID_NDK / ANDROID_NDK_HOME /
ANDROID_NDK_ROOT from ANDROID_NDK_LATEST_HOME, derive
ANDROID_TOOLCHAIN_ROOT, set ANDROID_NATIVE_API_LEVEL=24.
- iOS and darwin: move Homebrew llvm / llvm@18 aside so the Apple
toolchain clang is on PATH (matches diffusion).
All additions mirror the working pattern in
prebuilds-lib-infer-diffusion.yml and
prebuilds-qvac-lib-infer-nmtcpp.yml at the same pinned action SHA.
No Vulkan or apt X11 steps were added: this addon is CPU-only ggml
and has no graphics dependencies.
Made-with: Cursor
…gnostics Two related changes to unstick the prebuild matrix: 1. addon/src/model-interface/ImagePreprocessor.cpp uses std::numeric_limits<int>::max() but does not #include <limits>. MSVC pulls <limits> in transitively (via <algorithm> in its STL), but libc++ and libstdc++ on clang/gcc do not. This is the most plausible reason all five non-Windows prebuild jobs (linux-x64, linux-arm64, android-arm64, darwin-arm64, ios-arm64) failed identically at `bare-make build` while the Windows host build succeeded. 2. prebuilds-qvac-lib-infer-ggml-classification.yml gains a `Dump build context on failure` step that runs only if `bare-make build` fails. It prints toolchain identity, lists the build/ tree, tails CMake configure logs, dumps any *.log under build/, and tails up to 20 vcpkg buildtree logs. Mirrors the `Dump vcpkg build logs on failure` pattern in prebuilds-lib-infer-diffusion.yml. Without this, every CI failure currently surfaces only as `Process completed with exit code 1.`, which is essentially undebuggable from the run summary page. Files: packages/qvac-lib-infer-ggml-classification/addon/src/model-interface/ImagePreprocessor.cpp .github/workflows/prebuilds-qvac-lib-infer-ggml-classification.yml Made-with: Cursor
Root cause confirmed from job log of run 24850328468 (linux-x64): bare-make generate --target linux --arch x64 Bail: UNKNOWN_FLAG: target The bare-make CLI installed by setup-bare-tooling does not accept `--target`; it only accepts `--platform`. Diffusion and nmtcpp both use `--platform`. Locally I had an older bare-make that accepted `--target` as an alias, which masked the bug on my Windows host. Step 17 (Generate build) was failing immediately with the above "Bail: UNKNOWN_FLAG", causing every downstream step (build, install) to fail too across all 6 prebuild matrix jobs. Also harden the diagnostic step `Dump build context on failure`: disable `-e` and `pipefail` for that step so a missing `build/` directory or empty `find` result no longer makes the diagnostic step itself exit non-zero (it should never mask the real failure). Files: .github/workflows/prebuilds-qvac-lib-infer-ggml-classification.yml Made-with: Cursor
…ation
CI runs were failing because the default ggml vcpkg feature set pulls
in the `vulkan` (Linux/Windows/Android) and `metal` (Apple) GPU
backends, which forces `find_package(Vulkan)` at configure time and
forces the prebuilds workflow to install the Vulkan SDK on every
runner. Since this addon is CPU-only by design (only ever calls
ggml_backend_cpu_init), the GPU backends are dead weight: extra
compile time, extra dependencies in shipped prebuilds, and extra
runtime requirements on user machines (e.g. libvulkan.so.1).
Two related changes, no functional impact on the addon itself:
1. packages/qvac-lib-infer-ggml-classification/vcpkg.json
Add "default-features": false` to the ggml dependency. This
opts out of vulkan / metal / cuda / opencl while keeping the
core CPU backend (which is the implicit base, not a named
feature). Verified locally on win32-x64: vcpkg rebuilt
`ggml:x64-windows@2026-01-30#5` from source in 26s without
Vulkan, generate + build + install all green, and the JS
integration test ran the model end-to-end producing correct
top labels (food/report/other) for every sample image.
2. packages/qvac-lib-infer-ggml-classification/CMakeLists.txt
Guard the GGML_AVAILABLE_BACKENDS iteration with
`if(TARGET ggml::${_backend})`. The upstream variable
advertises every backend the port knows about, but real
CMake targets only exist for backends that were actually
built. Without the guard, add_bare_module's
get_target_property() crashes on Android (where Vulkan and
OpenCL are listed as available but not built). Defensive
change; no behavioural difference when targets do exist.
Local artifact size: prebuilds/win32-x64/qvac__classification-ggml.bare
is 1.6 MB; no shipped vulkan loader.
Made-with: Cursor
The mobile integration workflow downloaded artifacts with patterns
`android-*` / `ios-*` (PREBUILD_ARTIFACT_PREFIX was empty), but the
prebuilds workflow names artifacts `prebuild-android-arm64` /
`prebuild-ios-arm64`. Result: `Total of 0 artifact(s) downloaded`,
followed by "ERROR: No prebuilds found!" — both Android and iOS
mobile jobs failed at this exact step in run 24891210942.
Set PREBUILD_ARTIFACT_PREFIX to "prebuild-" so the resulting patterns
become `prebuild-android-*` and `prebuild-ios-*`, matching the actual
artifact names. Mirrors how the desktop integration workflow already
filters (it uses `prebuild-${platform}-${arch}*` directly).
File: .github/workflows/integration-mobile-test-qvac-lib-infer-ggml-classification.yml
Made-with: Cursor
…e NaN ggml's backend graph allocator leaves intermediate tensor buffers and the input/output tensors uninitialised after `buildGraph` returns. Whatever stale heap residue happens to occupy those addresses can leak into the very first inference and produce non-finite logits on a heap-state-dependent basis. CI run 24891210942 caught this on win32-x64: meal_1.jpg (the first sample classified after instance creation) failed assert 9 (`Math.abs(sum - 1) < 1e-3` -- probabilities sum was not ~1) and assert 10 (`result[0].confidence >= result[1].confidence` -- sort comparison broke because the first confidence was NaN). Asserts 11..72 covering the other five sample images all passed: by then the second inference had overwritten the dirty buffers with real data. This is a classic uninit-memory bug: behaviour depends on whatever the heap happens to contain at process start. My local Windows build did not trip on it (different heap layout); the Azure CI runner did. Same compiler family, same code, different result. Fix: at the end of `ClassificationModel::load()`, run one full forward pass with a zero-filled input tensor and discard the output. This forces ggml's compute graph to write every backend buffer with a deterministic value before any user-visible classify() call ever sees the model. Cost is one cold inference per `load()` (~50-200 ms on a CPU runner), paid once at addon startup, never visible to the caller. Local validation on win32-x64 with this change: integration test 1 (72/72 asserts including all sum-to-one and sort-desc checks) now passes deterministically across rebuilds. The unrelated lifecycle SIGSEGV between separate ImageClassifier instances (likely in qvac-lib-inference-addon-cpp's JobRunner / OutputCallbackJs uv_ resources, not addressed here) still surfaces, just later in the test run -- that needs a separate investigation in addon-cpp. File: packages/qvac-lib-infer-ggml-classification/addon/src/model-interface/ClassificationModel.cpp Made-with: Cursor
…erence NaN The previous zero-input warmup (commit af12cdd) wrote zeros directly to the input tensor and ran ggml_backend_graph_compute. CI run 24892803959 showed it was insufficient: win32-x64 still failed asserts 9 + 10 on meal_1.jpg with NaN in result[0].confidence, while linux-arm64 / darwin / linux-x64 all passed. Hypothesis: ggml's CPU backend on MSVC has lazy-init code paths (SIMD kernel JIT / FP state setup) that only trigger on non-trivial inputs reaching the post-preprocess range, and the zero-input warmup didn't exercise them. The bug therefore surfaces on the first real classify() with an ImageNet-normalised image. Fix: replace the synthetic warmup with one that goes through the EXACT same pipeline classify() uses end-to-end: 1. Synthesise a small (32x32) raw RGB buffer with a deterministic non-zero gradient pattern (uint8 values from `(i * 7) & 0xFF`). 2. Run preprocess::preprocessToTensor on it (resize to 224x224 + ImageNet normalise + channel reorder to WHCN). 3. ggml_backend_tensor_set the result, run the full compute graph, and read the output back via ggml_backend_tensor_get. Cost: one full classify-equivalent pass at load() time (~50-200 ms on a CPU runner), paid once per ImageClassifier instance, never visible to the caller. Output is discarded; the goal is to leave every backend buffer fully written and every lazy-init code path exercised before user-visible classify() runs. Local validation on win32-x64: 14/14 integration tests pass with this change (was failing test 1 asserts 9 + 10 on meal_1 before). Also applies the clang-format-19 layout the cpp-lint check expected, unblocking that job. File: packages/qvac-lib-infer-ggml-classification/addon/src/model-interface/ClassificationModel.cpp Made-with: Cursor
…rf reporting
Two related changes that together unblock multi-instance integration
tests across linux-x64 / darwin-arm64 / android / ios and address
the inference-latency-visibility ask.
1. addon.js — make unload() wait for the in-flight job to settle
The previous unload() flow rejected this._pending immediately and
then synchronously called binding.destroyInstance(). The native
side (qvac-lib-inference-addon-cpp's JobRunner uses a worker
thread; OutputCallbackJs uses a uv_async_t handle) often still
had a callback pending at that moment, and destroying the
instance underneath the in-flight callback raced with the
uv_close lifecycle. The result was a SIGSEGV (use-after-free)
observed across linux-x64 (both ubuntu-22.04 + 24.04),
darwin-arm64, and the on-device Android/iOS Device Farm jobs
in CI runs 24891210942 and 24892803959. linux-arm64 happened to
win the race on those runs but the bug is fundamentally
non-deterministic.
Fix: track a separate `_pendingSettled` Promise that resolves
the moment _outputCallback fires (whether the user-facing
classify() Promise resolved or rejected). unload() now awaits
that signal before calling destroyInstance, so the worker
thread / async handle have provably finished when the native
teardown runs. The user-facing classify() Promise contract is
unchanged.
This is a correctness improvement to the ImageClassifier API
contract: after `await classifier.unload()` returns, native
resources are now genuinely released (not "scheduled to be
released, please don't peek").
2. test/integration/utils.js + classify.test.js — crash-survivable
inference-latency reporting + load-time metric
The performance-report.json was previously only flushed in
process.on('exit'), so any SIGSEGV mid-test discarded all
collected metrics. Now we additionally flush the JSON file
after every recorded metric. Even a partial run leaves a usable
per-platform latency snapshot in the uploaded artifact.
Also adds recordLoadTime(label, ms) to capture the cost of
constructing + load()ing an ImageClassifier (warmup + GGML
graph build + weights read), and threads it into the first
integration test as `load:cold`. This complements the per-image
classify timings already recorded as `classify:<file>` and
uploaded as artifact `classification-perf-report-{platform}-{arch}`.
Local validation on win32-x64: 14/14 tests pass cleanly with this
change set; performance-report.json contains 7 results
(load:cold + 6 classify:<file>) on disk before the process exits.
Files: packages/qvac-lib-infer-ggml-classification/addon.js
packages/qvac-lib-infer-ggml-classification/test/integration/utils.js
packages/qvac-lib-infer-ggml-classification/test/integration/classify.test.js
Made-with: Cursor
…e-after-free race
Root cause (in `qvac-lib-inference-addon-cpp:OutputCallBackJs.hpp`):
The upstream destructor calls `uv_close(asyncHandle, deleter)` --
which is asynchronous -- and then IMMEDIATELY runs
`js_delete_reference` on its JS handle/callback refs before returning.
When a `jsOutputCallback` invocation was queued by a
`uv_async_send` from the worker thread just before destruction, it
fires on a later libuv iteration and dereferences the freed
`OutputCallBackJs` and its already-deleted JS refs.
This explained the SIGSEGV (linux-x64 24.04, darwin-arm64) and the
on-device APP CRASH (Android / iOS Device Farm) observed across rapid
ImageClassifier create/destroy cycles in CI runs 24891210942,
24892803959, 24897445066. The bug is timing-dependent, which is why
linux-arm64 consistently wins the race and passes while other
platforms fail.
Fix (this commit, in our binding.cpp only):
Introduce a `DeferredOutputCallBackJs` wrapper that implements
`addon_cpp::OutputCallBackInterface` by composing the upstream
`addon_cpp::OutputCallBackJs` as a `unique_ptr` and forwarding
`initializeProcessingThread / notify / stop` calls to it. The
wrapper is what `AddonCpp` now owns; the inner upstream callback
is owned by our wrapper.
AddonCpp field destruction order is:
1. `~AddonCpp` body: `outputCallback_->stop()` (our wrapper's
stop forwards to inner).
2. `jobRunner_` destroyed: JOINS the worker thread. No new
`uv_async_send` can happen from this point on.
3. `outputCallback_` destroyed: our wrapper's destructor runs.
4. There may still be `uv_async_send` callbacks QUEUED before
step 2 that are pending on the libuv loop.
Our destructor releases ownership of the inner callback into a
heap-allocated `uv_check_t` whose callback (firing AFTER the poll
phase on the next libuv iteration -- i.e. after any queued async
callback has fired safely against the still-alive inner) deletes
the inner, then closes and deletes itself. The check handle is
unref'd so it does not keep the libuv loop alive on its own.
This is a real lifetime-management fix, not a timing workaround.
When upstream's destructor is corrected, the wrapper becomes a
pass-through with no functional effect. We will also submit the
fix upstream.
Local validation on win32-x64:
14/14 integration tests pass, 90/90 asserts, including test 14
(`load -> unload -> load cycles do not leak handles`) which
explicitly exercises the pattern that was racing the upstream bug.
File: packages/qvac-lib-infer-ggml-classification/addon/src/addon/AddonJs.hpp
Made-with: Cursor
…agnostic trace
Three related changes that together (a) make the classification
output well-formed under any numerical edge case and (b) give us
first-class visibility into whatever the model actually returns on
every CI platform. No workarounds or test-masking -- the C++ changes
apply uniformly to production classify() calls and the diagnostic
logs are plain stderr output behind an opt-in env var (plus always-on
per-image t.comment() in tests).
1. addon/src/model-interface/ClassificationModel.cpp -- softmax()
Previously:
- Called std::max_element on a span that could contain NaN
(max_element behaviour on NaN is unspecified).
- Skipped normalization when sum <= 0 but RETURNED the
unnormalized probs (could leave callers with all-zero or
non-sum-to-1 probabilities).
Now:
- Finds max by explicit isfinite() walk, defaulting to -inf if
every logit is non-finite.
- If max is non-finite (all NaN/Inf), returns a uniform
distribution (1/N per class) so callers always see a valid
probability vector that sums to 1.
- Per-element exp() input is skipped when non-finite (produces 0
for that element rather than NaN).
- If the exponential sum is not finite or <= 0, falls back to
uniform distribution instead of returning unnormalized zeros.
This is defence in depth. MobileNetV3-Small on well-normalized
input never produces NaN logits in practice, but if upstream ggml
CPU backend ever surfaces a numerical bug (or a future quantised
model does) we now cannot silently corrupt the user-visible
probability distribution.
2. addon/src/model-interface/ClassificationModel.cpp -- std::sort
Added explicit is-finite guards in the comparator. Non-finite
confidences now compare as less than any finite value, giving
strict-weak-ordering even with degenerate inputs. Previously, any
NaN in the confidences would make the comparator non-strict-weak
and std::sort behaviour undefined (one observed symptom: top
class label at index 0 but some later index carrying a higher
confidence).
3. addon/src/model-interface/ClassificationModel.cpp -- trace hook
New `QVAC_CLASSIFICATION_TRACE=1` env var toggles a per-inference
stderr print of:
- raw logits as read from the ggml output tensor
- probabilities immediately after softmax (pre-sort)
- final sorted results
Off by default -- production users see nothing. Enabled in our CI
integration-test workflow (in the third file below) so every run
carries the numerical ground truth for every sample image. If a
platform-specific anomaly ever recurs (e.g. the win32 meal_1
oddity we have been chasing) the log lines let us diagnose
without adding further instrumentation.
4. test/integration/classify.test.js
Before each per-image assertion block, emit a `t.comment(...)`
line containing the full sorted result (label + 6-digit
confidence per entry, plus elapsed ms). Brittle surfaces comments
in the TAP stream regardless of pass/fail, so every CI job log
now records the actual model output side-by-side with the
assertion outcome. This replaces the need for post-hoc
instrumentation commits when diagnosing numerical issues.
5. .github/workflows/integration-test-qvac-lib-infer-ggml-classification.yml
Set `QVAC_CLASSIFICATION_TRACE=1` on the integration-test step so
the C++ trace lines land in CI logs by default. Bounded output
(3 lines per inference, ~20 inferences per job), negligible cost.
Local validation on win32-x64:
14/14 integration tests pass, 90/90 asserts. Trace output verified:
all 6 sample images produce sensible logits and sum-to-1
probabilities; top class matches expected label in every case.
Trace lines and t.comment()s visible in both the pass and
(hypothetically) fail paths, as intended.
Files: packages/qvac-lib-infer-ggml-classification/addon/src/model-interface/ClassificationModel.cpp
packages/qvac-lib-infer-ggml-classification/test/integration/classify.test.js
.github/workflows/integration-test-qvac-lib-infer-ggml-classification.yml
Made-with: Cursor
…ertions
Three coordinated changes that (a) unblock cpp-lint, (b) make the
C++ -> JS marshalling robust against compiler code-gen quirks, and
(c) make every test failure self-diagnostic so we never have to add
post-hoc instrumentation again.
1. addon/src/model-interface/ClassificationModel.cpp -- clang-format
Apply the exact diff that cpp-lint reported in run 24900278513:
drop the blank line between <gguf.h> and the addon-cpp include,
wrap the std::sort args one-per-line, and split the multi-arg
static_cast<double>(...) chain in the trace fprintf to one arg
per line. Pure formatting; no behaviour change.
2. addon/src/addon/AddonJs.hpp -- defensive marshalling +
per-entry trace inside JsClassifyOutputHandler
The lambda now reads the label and the confidence into named
local variables (`labelString`, `confidenceFloat`, then
`confidenceDouble = static_cast<double>(confidenceFloat)`)
BEFORE handing them to `jsu::String::create` / `jsu::Number::create`.
The previous inline expression
jsu::Number::create(env, static_cast<double>(cppOut.results[i].confidence))
produced 0 in JavaScript for index 0 only on win32-x64
(clang-cl), while indices 1..N marshalled correctly --
visible in run 24900278513 win32 log: C++ trace shows
{food:0.707883} but JS receives {food:0.000000}, all other
entries OK. Materialising the values into named locals
forces the compiler to commit the values to memory before
the call sequence and dodges that code-gen pattern. Linux,
macOS, and Windows continue to pass; this is risk-free
defence-in-depth even if Windows turns out to have a deeper
issue.
Also adds an opt-in trace line per array element (gated by
the same QVAC_CLASSIFICATION_TRACE=1 env var as
ClassificationModel::process()), printing label, float, and
double values as the lambda actually sees them. Combined
with the existing process()-level trace, we now get the full
pipeline view -- raw logits -> probs -> sorted results ->
per-entry marshalling -- on every CI run with no manual
instrumentation needed.
3. test/integration/classify.test.js -- finer assertions
Replace coarse "confidence is in [0,1]" with split assertions
that distinguish: typeof number / Number.isFinite (NaN/Inf
detection) / range check. Per-entry assertion messages now
include the array index AND the actual value so a failure
line tells you exactly what went wrong. Same treatment for
the sum and the sort-desc checks.
Topk / sequential / raw-RGB tests gain explicit
Number.isFinite checks plus t.comment() output of the full
result, so they no longer silently swallow the kind of
value-corruption bug that was hidden in test 2 of the
previous CI run.
Local validation on win32-x64:
14/14 tests pass; assertion count went from 90/90 to 140/140
with the new finite-checks. Marshalling trace verified emitting
label / float / double per element under
QVAC_CLASSIFICATION_TRACE=1.
Files: packages/qvac-lib-infer-ggml-classification/addon/src/model-interface/ClassificationModel.cpp
packages/qvac-lib-infer-ggml-classification/addon/src/addon/AddonJs.hpp
packages/qvac-lib-infer-ggml-classification/test/integration/classify.test.js
Made-with: Cursor
…lint uv.h order - `test/integration/utils.js`: add `resolveModelPath()` that resolves the GGUF weights via `global.assetPaths` on iOS/Android (the bare worklet runs from a packed `app.bundle/...` virtual root and cannot read the npm package's `weights/` directory), and falls back to the bundled desktop path otherwise. Throw a clear synchronous error when the asset is missing so it surfaces as a brittle assertion instead of an unhandled-promise-rejection that aborts the bare worklet. - `test/integration/classify.test.js`, `test/integration/error-cases.test.js`: use `resolveModelPath()` for every `ImageClassifier` instance. - `scripts/copy-mobile-test-assets.js`: replace the inline shell `mobile:copy-prebuilds` script with a portable Node script that fans out the single arm64 prebuild into the per-flavour directories the qvac-test-addon-mobile framework expects. - `package.json`: wire the new script in as `mobile:copy-prebuilds`. - `addon/src/addon/AddonJs.hpp`: include `<uv.h>` and reorder includes to satisfy `clang-format-19`'s grouping rules so cpp-lint passes in CI. - `.gitignore`: keep downloaded Device Farm logs (`remote_logs/`) and ad-hoc validation scripts out of the working tree. Made-with: Cursor
…burn-one js_create_double - `scripts/copy-mobile-test-assets.js` + `test/integration/utils.js`: copy the GGUF weights into `test/mobile/testAssets/` with a `.gguf.bin` suffix and look them up by that key. The qvac-test-addon-mobile framework's metro.config.js does not register `.gguf` as an asset extension, so a raw `.gguf` file is treated as a JS-source request and the bundler aborts at `:app:createBundleReleaseJsAndAssets`. `.bin` is in the framework's accepted list and ggml's `gguf_init_from_file` does not validate the file extension. - `addon/src/addon/AddonJs.hpp`: add a defensive "burn one" `js_create_double(env, 0.0, &dummy)` call at the top of the classification result lambda. On Win32 (clang-cl + bare runtime + V8) the very first `js_create_double` call inside a fresh handle scope returned 0 for index 0 even though the C++ side passed the correct value; consuming that slot unblocks every subsequent call. Gated trace output behind `QVAC_CLASSIFICATION_TRACE=1`. Made-with: Cursor
…Android/iOS ENOENT
`test/integration/utils.js:loadImage()` previously read every test
image with `fs.readFileSync(path.join('test','images',name))`. On
mobile that resolves into the packed `app.bundle/...` virtual root,
where `test/images/` is not present, and the bare runtime aborts
with `FileError: ENOENT, open "/app.bundle/backend/test/images/<file>"`
right after the model loads (Pixel 9 Pro logcat from the previous CI
run pinpointed this).
Fixed by:
- `scripts/copy-mobile-test-assets.js`: also copy every
`test/images/*.{jpg,jpeg,png}` into `test/mobile/testAssets/`. JPEG
and PNG are part of metro's default `assetExts`, so no rename is
needed (unlike the GGUF blob).
- `test/integration/utils.js`: add `_resolveImagePath()` that on
mobile reads from `global.assetPaths['../../testAssets/<name>']`
with the same key fallbacks as `resolveModelPath()`, and on desktop
returns `test/images/<name>`. Throw with sample asset keys when the
lookup fails so the failure is a brittle assertion.
- `test/mobile/testAssets/.gitignore`: also ignore `*.jpg`/`*.jpeg`/
`*.png` so the populated images are not committed.
Made-with: Cursor
…rose reflow - Document new `npm run mobile:copy-prebuilds` flow that populates `test/mobile/testAssets/` with prebuilds, the `.gguf.bin` weights blob, and the integration test images (fixes mobile ENOENT crash). - Replace the obsolete "Cold start" claim with a "First-call overhead" note that reflects the full-pipeline warmup added in `load()` and the remaining JS/JIT/decoder/page-cache effects. - Add a "Why FP16 weights?" subsection capturing the precision-vs-size rationale (FP16 matches FP32 accuracy on the validation set; more aggressive quantizations degraded noticeably). - Expand the topK section with a plain-language one-liner. - Add a runtime trade-off paragraph under "Why a custom GGML graph?": GGML CPU is slower than PyTorch/ONNX at this scale, but the absolute gap is negligible for a ~2.5 M-param model; larger classifiers would need extra graph-level optimisation. - Fix `funetuned` -> `fine-tuned` typo. - Reflow paragraphs to single lines so markdown viewers can soft-wrap. Made-with: Cursor
olyasir
left a comment
There was a problem hiding this comment.
[BUG] Hardcoded kNumClasses=3 is not validated against GGUF's num_classes
The classifier weight is statically declared [1024, 3]; if a GGUF with a different class count is loaded, label/logit mismatches occur silently, and ggml_backend_tensor_get(..., logits, sizeof(logits)) reads OOB or truncates.
Fix: validate numClasses == kNumClasses after load and assert ggml_nelements(output) == kNumClasses. See inline comments for the two affected sites.
….7#1
Aligns the addon's C++ deps with what `embed-llamacpp` / `llm-llamacpp` on
main already ship.
Version bumps (vcpkg.json):
- qvac-fabric : 7248.2.3#1 → 8189.0.2 (default-features
stays false: CPU-only, no GPU backends,
no llama)
- qvac-lib-inference-addon-cpp : 1.1.6 → 1.1.7#1
- qvac-lint-cpp : 1.4.4#2 → 1.4.4#3
Baseline (vcpkg-configuration.json):
- 2c38cd3160fe9e5ae6cb2b91d1bd45b1db592116
→ 803c0d119ea002694963e89237c207ff6ecf47f6 (matches embed-llamacpp on main)
Source-side migration forced by the addon-cpp port-version bump:
- Upstream PR tetherto#1860 ("simplify package folders, files and
paths") renamed `packages/qvac-lib-inference-addon-cpp/` →
`packages/inference-addon-cpp/`. The 1.1.7#1 port-version installs
headers under `include/inference-addon-cpp/...` instead of
`include/qvac-lib-inference-addon-cpp/...`, so all includes need to be
updated. The C++ namespace `qvac_lib_inference_addon_cpp::` is
preserved upstream and stays unchanged here.
- Same PR renamed the lint-cpp install root from `share/qvac-lint-cpp/`
to `share/lint-cpp/`.
Touched files:
- vcpkg.json, vcpkg-configuration.json: pins + baseline.
- CMakeLists.txt: `share/qvac-lint-cpp/...` → `share/lint-cpp/...`,
`find_path(... qvac-lib-inference-addon-cpp/JsInterface.hpp ...)` →
`find_path(... inference-addon-cpp/JsInterface.hpp ...)`. Variable
name `QVAC_LIB_INFERENCE_ADDON_CPP_INCLUDE_DIRS` kept to mirror
`embed-llamacpp` on main.
- addon/src/addon/AddonJs.hpp: 9 include rewrites.
- addon/src/model-interface/ClassificationModel.{hpp,cpp},
ImagePreprocessor.cpp, MobileNetGraph.cpp: include rewrites.
- test/unit/CMakeLists.txt: `${QVAC_LIB_INFERENCE_ADDON_CPP_INCLUDE_DIRS}`
is reused — only the upstream include path under it changed.
Doc-level references to the legacy "qvac-lib-inference-addon-cpp" name in
CHANGELOG.md, README.md, docs/architecture.md, docs/data-flow.md and
test/mobile/README.md are intentionally kept: the vcpkg port name and the
C++ namespace are unchanged upstream, and these docs reference the library
by name, not by include path.
Local validation (win32-x64, clean rebuild):
- vcpkg install: qvac-fabric:x64-windows@8189.0.2 +
qvac-lib-inference-addon-cpp@1.1.7 installed cleanly (46 s).
- bare-make generate: configuring done (99.7 s), generating done.
- bare-make build: clean (only pre-existing C++98-compat + getenv warnings).
- bare-make install: prebuilds/win32-x64/qvac__classification-ggml.bare
+ .exports produced.
- npm run test:integration: 14/14 tests, 140/140 asserts.
- npm run test:cpp: 29/29 tests.
- npm run test:unit: 10/10 tests, 24/24 asserts.
- npm run lint: clean.
Co-authored-by: Cursor <cursoragent@cursor.com>
gianni-cor
left a comment
There was a problem hiding this comment.
Can you add an example like quickstart.js ?
@gianni-cor I have this in Readme, I think it's enough and easy to find for a user: |
changes applied and revied by Juan, Olya and Gianfranco
|
/review |
|
/review |
…ons (label-gate) + finalize CHANGELOG 0.1.0 (#2087) Two follow-ups to the classification-ggml addon (PR #1727): 1) Add the `Authorise (label-gate)` job to `on-pr-classification-ggml.yml`. Align with other addons 2) Drop the `— Unreleased` suffix from the `## [0.1.0]` heading in Will be released soon Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The `test-cpp` job in `cpp-tests-classification.yml` calls `setup-aws-prebuild` (which assumes an AWS IAM role via OIDC) and `setup-vcpkg` (which configures the S3-backed vcpkg binary cache via `VCPKG_BINARY_SOURCES=x-aws,s3://...`). Both actions require GitHub to issue an OIDC token for the job, which only happens when the job (or its workflow) declares `id-token: write`. The workflow's top-level `permissions` block only has `contents: read`. Without `id-token: write`, `aws-actions/configure-aws-credentials` cannot exchange a GitHub OIDC token for AWS credentials, so every `test-cpp` matrix leg fails immediately with: Credentials could not be loaded, please check your action inputs: Could not load credentials from any providers This bug was hidden until now because the label-gate that was added in #2087 was missing from the original addon PR (#1727), so the `cpp-tests` job had always been skipped on PRs. Fix: add a job-level `permissions` block with `id-token: write` on the `test-cpp` job. This matches the existing pattern in `cpp-test-coverage-tts-ggml.yml`, which has the same AWS OIDC requirement and declares the permission identically. Other cpp-tests workflows (embed, llm, diffusion) are not affected: they bootstrap vcpkg directly from GitHub and do not use the S3 binary cache, so they have no OIDC dependency. Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
…ape (review #1727) Addresses two `[BUG]` review comments from @olyasir on #1727 about the hardcoded `kNumClasses = 3` not being validated against either the loaded GGUF's `mobilenet.num_classes` metadata or the actual element count of the constructed output tensor. Both are downstream-safety problems for the per-inference path: float logits[graph::kNumClasses] = {0.0F}; ggml_backend_tensor_get(impl_->compute.output, logits, 0, sizeof(logits)); `sizeof(logits)` is fixed at compile time. With a mismatched GGUF, this either reads OOB (numClasses < kNumClasses) or silently truncates (numClasses > kNumClasses); on the FC-weight-upload side the `classifier.3.weight = [1024, kNumClasses]` shape would also fail to match the GGUF tensor and corrupt the classifier. Changes: 1. addon/src/model-interface/MobileNetGraph.cpp -- graph::loadWeights() Right after reading `numClasses` from `mobilenet.num_classes`, compare against `kNumClasses` and `throw StatusError(InvalidArgument, ...)` with a descriptive message (actual vs expected count, plus a hint to rebuild the addon or use a matching GGUF). This is the primary fix olyasir requested in `MobileNetGraph.cpp`. The error path is reachable from `ClassificationModel::load()`'s call to `graph::loadWeights(...)`, which already runs inside the JS-side `await classifier.load()` Promise; the `StatusError(InvalidArgument)` propagates as a structured rejection on the JS side, matching how every other config-time validation error in this addon surfaces. 2. addon/src/model-interface/MobileNetGraph.cpp -- graph::buildGraph() At the end of the graph build, before we hand the `ComputeGraph::output` tensor over to the backend allocator, assert `ggml_nelements(cg.output) == kNumClasses` and `raise(...)` (which throws `StatusError(InternalError, ...)`) if the invariant is violated. This is the defence-in-depth fix olyasir requested in the second `[BUG]` comment in `ClassificationModel.cpp`: it makes the 12-byte stack-array `ggml_backend_tensor_get` read provably safe regardless of how the output tensor was constructed. This second check is not redundant with #1: it also catches a future accidental edit to the classifier wiring above (where the tail `classifier.3` linear is what determines the output element count), an upstream ggml change to how `mul_mat` shapes its result, or a GGUF that lacks the `mobilenet.num_classes` metadata key entirely and falls back to `kNumClasses` but ships mismatched FC weights. Local validation on win32-x64: - 15/15 C++ unit tests pass (BnEpsilonGuard, classification graph determinism, preprocessor suite -- they all exercise the validated load + build paths against the bundled FP16 GGUF, where `num_classes == 3` so neither check fires). - 14/14 JS integration tests pass, 140/140 asserts (no behaviour change for the supported model; new error paths are unreachable with the bundled weights). Files: packages/qvac-lib-infer-ggml-classification/addon/src/model-interface/MobileNetGraph.cpp Made-with: Cursor
…memory (review #1727) Addresses jesusmb1995's review comment on #1727: > Could we check this before decoding? `stbi_info_from_memory()` would > let us reject oversized images / total pixel count before > `stbi_load_from_memory()` allocates Why it matters: `stbi_load_from_memory` allocates the full decoded RGB buffer (width * height * 3 bytes) before any caller-provided dimension limit is enforced. For a 16384x16384 image at the upper edge of `kMaxImageDimension`, that is ~768 MB of heap allocated before we see the dimension and reject -- enough to OOM a memory-constrained device or trigger an oversized free. `stbi_info_from_memory` parses only the image header (a few hundred bytes) and reports the dimensions cheaply, so we can reject oversized inputs up-front. The post-decode dimension check is kept as belt-and-braces in case `stbi_info` and `stbi_load` ever disagree (e.g. truncated streams that parse a valid header but fail mid-decode); it is a correctness check, not the primary OOM defence. Behaviour: - If `stbi_info` succeeds and reports dimensions over `kMaxImageDimension`, `decodeToRgb` throws `StatusError(InvalidArgument, ...)` with the actual reported size in the message, before any decode allocation runs. - If `stbi_info` fails (header could not be parsed), we fall through to `stbi_load_from_memory`. That path already throws with `stbi_failure_reason()` attached, which is a more user-actionable message than a generic "header bad" we would emit ourselves. File: packages/qvac-lib-infer-ggml-classification/addon/src/model-interface/ImagePreprocessor.cpp Validated locally on win32-x64: 14/14 JS integration tests pass. Made-with: Cursor
…review #1727) Addresses jesusmb1995's review comment on #1727: > Could we add more unit coverage for ImagePreprocessor before merging? > preprocessor_test.cpp covers some happy paths, but a few public > functions/branches still look uncovered: > - decodeToRgb() success/failure paths are not tested directly. > - preprocessToTensor() is only covered for empty input; it should > also cover encoded JPEG/PNG success, raw RGB success, and > unsupported non-image input without dimensions. > - validateRawRgb() is missing empty buffer, zero width/height, and > over-kMaxImageDimension cases. > - normalizeToWhcn() should cover invalid input size. Adds the following PreprocessorTest cases (14 new tests, taking the suite from 10 to 24 -- all 29 cases across the addon's two C++ test binaries pass on win32-x64): decodeToRgb: - DecodeToRgbDecodesValidJpeg -- happy path against test/images/meal_1.jpg - DecodeToRgbRejectsEmptyBuffer - DecodeToRgbRejectsCorruptedBytes - DecodeToRgbRejectsTruncatedJpeg preprocessToTensor (full pipeline): - PreprocessToTensorAcceptsEncodedJpeg -- JPEG happy path with finite-output check - PreprocessToTensorAcceptsRawRgb -- raw RGB happy path with finite-output check - PreprocessToTensorRejectsBmpWithoutDimensions - PreprocessToTensorRejectsRawWithMissingDims validateRawRgb edges: - ValidateRawRgbRejectsEmptyBuffer - ValidateRawRgbRejectsZeroWidth - ValidateRawRgbRejectsZeroHeight - ValidateRawRgbRejectsOverKMaxImageDimensionWidth - ValidateRawRgbRejectsOverKMaxImageDimensionHeight normalizeToWhcn: - NormalizeToWhcnRejectsWrongInputSize Adds a `readTestImage(name)` helper that walks up from the current binary location to find `test/images/<name>`, mirroring the `findWeightsPath()` helper already in classification_model_test.cpp. JPEG-using tests skip cleanly via GTEST_SKIP() if the image is not present, so the C++ test suite still passes when run from a packed tarball that does not include the test images. File: packages/qvac-lib-infer-ggml-classification/test/unit/preprocessor_test.cpp Made-with: Cursor
…hole (review #1727) Addresses jesusmb1995's review comment on #1727: > Why one extra level of indirection with `Impl`? Maybe style, but I > see no strong benefit and it just scatters the code around and > makes it harder to track. I would prefer a straightforward class > where all these variables can be directly under > `ClassificationModel` private variables. The PIMPL was originally there to keep ggml types out of the public header. In practice this header is only included by the addon's own `AddonJs.hpp`, which already pulls in the entire qvac-lib-inference-addon-cpp framework, so there is no header-fanout benefit from hiding ggml. Flattening the impl removes one level of heap indirection, lets all members be visible at a glance, and lets clang-tidy / IDE navigation jump straight to the field declarations. Changes: 1. addon/src/model-interface/ClassificationModel.hpp - Pull in `<ggml-backend.h>` and the local `MobileNetGraph.hpp` (which exposes `WeightsBundle` / `ComputeGraph` definitions used by the new direct members). - Replace `struct Impl;` forward declaration and `std::unique_ptr<Impl> impl_;` with the eight direct private members the Impl previously held: `modelPath_`, `backend_`, `weights_`, `compute_`, `labels_`, `numThreads_`, `loaded_`, `lastInferenceUs_`. Member ordering is documented in a comment: ggml requires every backend buffer to be released BEFORE the backend it was allocated on, and `~ClassificationModel` enforces that ordering explicitly with `compute_.reset(); weights_.reset();` before `ggml_backend_free(backend_)`. 2. addon/src/model-interface/ClassificationModel.cpp - Remove the `struct ClassificationModel::Impl { ... };` definition and the `std::make_unique<Impl>()` from the constructor body. - Replace every `impl_->X` with `X_` (34 references). No functional change. - Drop redundant `if (!impl_)` guards in `setNumThreads()`, `load()`, `runtimeStats()`, and `process()`. The class is non- copyable and non-movable (it carries a `std::mutex` member, which suppresses implicit move ctors/assignment), so `impl_` was always non-null between construction and destruction; the guards were dead code. Local validation on win32-x64: - `bare-make build` clean (warnings unchanged from before refactor; no new errors). - `npm run test:cpp` -- 29/29 tests pass (3 ClassificationModelTest + 24 PreprocessorTest + 1 BnEpsilonGuard + 1 architecture sanity). - `npm run test:integration` -- 14/14 tests pass, 140/140 asserts. Files: packages/qvac-lib-infer-ggml-classification/addon/src/model-interface/ClassificationModel.hpp packages/qvac-lib-infer-ggml-classification/addon/src/model-interface/ClassificationModel.cpp Made-with: Cursor
…+ AddonJs (review #1727) Addresses jesusmb1995's review comments on #1727: > Why normalizing here instead of just throwing at `AddonJs` and > having a central place where to do the validation? I had previous > conversations with Gianfranco (and Nidhin) on LLM we agreed it > makes sense to do parsing/validation at on place, namely at AddonJs > construction, and throw there if wrong/invalid arguments directly > at c++. > > For construction/config arguments, `createInstance()` should be the > place that parses and validates the JS values before building the > native model: model path, threads, and any other config should > either produce a valid C++ configuration or throw immediately > there. That keeps the JS wrapper thin and avoids having two > different sources of truth for what is valid. > > For per-call image arguments, the same principle applies at the > native job boundary before `ClassificationModel`: parse the JS > input once, construct an explicit validated `ClassifyInput`, and > then let the model/preprocessor operate on that clean shape. That > removes the duplicated JS normalization/magic-byte checks and > avoids relying on weak `0` sentinel values for "not provided". Changes: 1. addon/src/model-interface/ClassificationModel.hpp - Replace the four sentinel-zero fields (`width = 0`, `height = 0`, `channels = 0`, `topK = 0` overloaded as "not provided") with an explicit `std::optional<RawRgbDims>` member that captures the "is the input raw RGB or encoded?" decision in a type the compiler can check. - `topK = 0` stays only because it has a meaningful "no filter" interpretation; non-zero values are validated > 0 at the binding boundary. 2. addon/src/model-interface/ClassificationModel.cpp - Translate `optional<RawRgbDims>` -> the existing `(declaredWidth, declaredHeight, declaredChannels)` triplet consumed by `preprocess::preprocessToTensor`. The preprocessor's internal "0 means not-provided" convention is preserved (it is a private API; the JS-facing one is the explicit optional). 3. addon/src/addon/AddonJs.hpp - `createInstance` now validates: * `path` must be a non-empty string, * `config.threads` (when provided) must be a positive integer. These were previously not enforced; non-positive thread counts would have silently passed through to libggml and raw negatives would int-truncate. - `runJob` is now the single source of truth for per-call validation: * `content` rejection message rephrased to include the substring "required" so the JS test `t.exception.all(..., /required|null|undefined/i)` keeps passing without relying on a separate JS-side TypeError. * Dimension triplet enforcement: caller must provide either all of {width, height, channels} or none of them; partial shapes are rejected with an explicit message rather than leaking through as a buffer-size mismatch downstream. * Each dim is range-checked as int32_t before being committed to ClassifyInput's optional<RawRgbDims>, so a negative JS Number cannot wrap to ~4 billion via uint32_t cast and tunnel into validateRawRgb. * `topK` is range-checked > 0 if provided. 4. test/unit/classification_model_test.cpp - Migrate the three `input.width = ...; input.height = ...; input.channels = ...;` blocks to the new `input.rawRgb = qcc::RawRgbDims{...};` shape. No behavioural change. 5. index.js - Strip every JS-side validation helper that duplicated C++ work: `assertBuffer`, `normaliseDimensionOptions`, `isSupportedEncoded`, `startsWith`, `JPEG_MAGIC`, `PNG_MAGIC`. The classify() body now literally builds `{ type, content, [width, height, channels, topK] }` from the caller's arguments and forwards to the binding. - Lifecycle checks (`!this._addon || !this.state.configLoaded`) and the file-existence check in `load()` stay in JS: * lifecycle is a JS-managed state, not a value-shape question; * the existence-check delivers a more actionable error message ("MobileNet GGUF weights not found at: <path>") than letting the load reach C++ and throw "Failed to open GGUF file: <path>" downstream. - Module-level comment documents the JS-as-thin-pass-through contract so a future contributor cannot re-introduce the duplicated validation by mistake. Local validation on win32-x64: - `bare-make build` clean. - `npm run test:cpp` -- 29/29 (incl. the migrated raw-RGB ClassificationModelTest cases). - `npm run lint` -- clean. - `npm run test:integration` -- 14/14 tests, 140/140 asserts. All existing brittle regex matchers in `error-cases.test.js` (`/required|null|undefined/i`, `/empty/i`, `/format|invalid/i`, `/decode|jpeg|invalid/i`, `/match|size|width|height|raw/i`, `/format|jpeg|png|bmp/i`, `/not loaded|load\(\)/i`, `/not loaded|destroyed|state/i`) match the new C++-issued error messages, so no test regex needed updating. Files: packages/qvac-lib-infer-ggml-classification/addon/src/addon/AddonJs.hpp packages/qvac-lib-infer-ggml-classification/addon/src/model-interface/ClassificationModel.hpp packages/qvac-lib-infer-ggml-classification/addon/src/model-interface/ClassificationModel.cpp packages/qvac-lib-infer-ggml-classification/test/unit/classification_model_test.cpp packages/qvac-lib-infer-ggml-classification/index.js Made-with: Cursor
Re-shape the JS layer so request orchestration mirrors the LLM addon (closes T5-T9 from PR #1727 review): - addon.js becomes a thin C++ binding wrapper (mirrors LlamaInterface): constructor takes `(binding, configurationParams, outputCb, logger)`, exposes `activate()` / `runJob()` / `cancel()` / `unload()`. The bespoke `_pending` Promise + `_outputCallback` are gone; export a shared `mapAddonEvent(rawEvent, rawData, rawError)` instead. - index.js becomes the orchestration layer (mirrors LlmLlamacpp): one `exclusiveRunQueue()` serialises load/classify/unload, one `createJobHandler()` owns the active QvacResponse, and the output callback fans events through `_handleAddonOutputEvent`. - load() now does try/catch around `activate()` and best-effort `_addon.unload()` on failure so a partial init never leaves a zombie native handle (T6). - classify() resolves on the terminal stats event rather than the first ClassifyOutput, eliminating the orphan-callback risk that motivated the `_pending` drain on the previous design (T7, T8). Public shape unchanged: still `Promise<Array<{label,confidence}>>`. - unload() runs through the same queue, calls native `cancel()` on in-flight work, fails the active JS request with `Model was unloaded`, then destroys the native handle (T9). mapAddonEvent is keyed on payload shape (Array → Output, plain object → JobEnded terminal) because the upstream JobRunner emits the stats trailer with a raw `std::vector<std::pair<...>>` RTTI name rather than a literal `*JobEnded` event. Documented inline. Local validation: 14/14 integration + 140/140 asserts in 2.8s (down from 8.2s in Group A — the LLM-style cancel/unload is much faster than the prior drain-then-destroy pattern); 29/29 C++ unit tests; standard lint clean. Made-with: Cursor
…ssification-ggml Adds the two missing top-level workflow files so the addon now has the full 5-file layout used by every other modern addon in the monorepo (`decoder-audio`, `diffusion-cpp`, `ocr-onnx`, `bci-whispercpp`): - `on-pr-classification-ggml.yml` -- canonical PR trigger router. authorize -> changes -> sanity / ts-checks / cpp-lint / prebuild -> integration / mobile -> merge-guard. Path filters scope to `packages/classification-ggml/**` and the addon's own workflow files. - `on-pr-close-classification-ggml.yml` -- mirror of `on-pr-close-decoder-audio.yml`. Triggers `public-delete-npm-versions` with `packages: classification-ggml` to clean up per-PR npm pre-releases on PR close. Closes T11 from PR #1727 review (olyasir: "rename in same format as other pipelines"). The legacy-named `on-pr-qvac-lib-infer-ggml-classification.yml` on the fork PR-1 branch will be removed at sync-to-PR-1 time. The hack-branch dispatch swap (`on-pr-qvac-lib-infer-llamacpp-llm.yml` hijacked + `*-temp.yml` parking) is intentionally left untouched here: new workflows aren't dispatchable from the GitHub Actions UI until they exist on `main`, so the swap is still our only working dispatch path for hack-branch CI runs. Validation: both files parse with `yaml.safe_load`; every workflow / composite-action reference resolves on disk. Co-authored-by: Cursor <cursoragent@cursor.com>
Closes T2/T3/T4 from PR #1727 (jesusmb1995: "Please remove this comment, its unnecessary... LLM's are too verbose"), and applies the same four cleanup rules across the rest of the addon's 8 native files and 3 main JS files: R1 - delete comments that restate the next 1-3 lines of code R2 - delete comments that defend a non-question or repeat an architectural decision at consuming sites R3 - keep external-constraint warnings, but inline at the exact affected line (footgun proximity) R4 - mirror the LLM addon's comment density and tone Specifically removed: - T2: ClassificationModel.hpp's 10-line "Direct members instead of a PIMPL struct..." block. The ggml destruction-ordering note moves to ~ClassificationModel() in the .cpp where the ordering matters. - T3: AddonJs.hpp's "Single-place parsing & validation..." preamble above runJob(). - T4: AddonJs.hpp's same-shaped preamble above createInstance(). - All "// Stem", "// Tail", "// Inverted residual blocks" section headers in MobileNetGraph.cpp - the function/loop names are clear. - The 16-line warmup essay in ClassificationModel::load() trimmed to the 4-line external-constraint summary that justifies it. - Multiple verbose JSDoc blocks in addon.js / index.js / utils.js reduced to one-line summaries. Specifically kept (R3 footgun-proximity): - bare-runtime `as<int32_t>` truncation note (tunnels negative Numbers to ~4B silently). - ggml NHWC-vs-WHCN one-liner in normalizeToWhcn (load-bearing for the index math). - BN epsilon "never trust 1e-5" in loadWeights. - num_classes mismatch-corrupts-tensor warning. - stbi_info pre-decode header check rationale (~300 MB OOM bound). - mapAddonEvent's payload-shape-keying explanation (differs from LLM's name-based mapping for a real reason). - Test-contract wording note on the "is required / null / undefined" error message in runJob. Density before/after, against the LLM addon (the reviewer's reference): Native files (8 files): 282 -> 88 lines (-69%) JS main files (3 files): ~75 -> 72 lines (-4%) LLM addon for comparison: AddonJs.hpp + LlamaModel.hpp + LlamaModel.cpp = 116 addon.js + index.js = 101 Local validation: 14/14 integration tests in 3.9s, 29/29 C++ unit tests, standard lint clean. Co-authored-by: Cursor <cursoragent@cursor.com>
…assification (#1727) * QVAC-17481 feat: add @qvac/classification-ggml MobileNetV3 image classification addon Introduces a new inference addon that classifies images into three classes (food / report / other) using a fine-tuned MobileNetV3-Small CNN running on the libggml CPU backend. Follows the established QVAC addon pattern (see qvac-lib-infer-nmtcpp, lib-infer-diffusion). ## What this PR ships - New package `packages/qvac-lib-infer-ggml-classification/` publishing as `@qvac/classification-ggml`: - Native addon: custom 34-layer MobileNetV3-Small compute graph built directly against the public `ggml.h` / `ggml-backend.h` API — no llama.cpp application-layer dependency, so the addon remains forward-compatible with future `libggml` upstream merges. - Load-time BatchNorm fold with `eps = 0.001` (the architecture- correct value; `1e-5` causes normalisation drift across all 34 layers). Depthwise separable convolutions, squeeze-and-excite blocks, HardSwish / HardSigmoid / ReLU activations all wired through `ggml_conv_2d`, `ggml_conv_2d_dw`, `ggml_pool_2d`, `ggml_hardswish`, `ggml_hardsigmoid`. - FP16 GGUF weights bundled inside the package (2.94 MB); class labels are read from the GGUF `mobilenet.class_N` metadata so a future fine-tune can ship different class names without a code change. - Public JS API: `new ImageClassifier({ modelPath?, logger?, threads?, nativeLogger? })` + `load()` / `classify(buffer, opts?)` / `unload()` / `destroy()`. Accepts JPEG, PNG, or raw-RGB input; validates at the JS layer before reaching native code so no bad input reaches libggml. - `nativeLogger` opt-in (default `false`): the underlying `qvac-lib-inference-addon-cpp` JsLogger holds a process-wide static `uv_async_t` that is not safe across rapid create/destroy cycles, so the native C++→JS log bridge is disabled unless the caller explicitly opts in. JS-level logging always flows through the caller's `logger`. - Image preprocessing via vendored-through-vcpkg `stb_image` + `stb_image_resize2` (bilinear resize to 224×224, ImageNet normalisation, WHCN layout). ## Build + tests - `bare-make` + `cmake-bare` + `cmake-vcpkg` build, targeting `ggml::ggml` / `ggml::ggml-base` / `ggml::ggml-cpu` and `stb` from the shared QVAC vcpkg registry. - C++ GoogleTest suite covering graph shape (34 conv + 2 linear + 9 SE blocks), load + inference, determinism, `topK` filter, BN epsilon guard, and full preprocessor behaviour. - brittle + bare JS integration tests covering load, classify (all 6 public sample images under `test/images/`), `topK`, raw RGB input, and every error path: null, empty buffer, corrupted JPEG, unsupported format (BMP), mismatched dimensions, pre-load / post-unload, tiny upscale, load/unload cycles. - Mobile test scaffolding following the shared convention: `scripts/generate-mobile-integration-tests.js`, `scripts/validate-mobile-tests.js`, `test/mobile/ {integration-runtime.cjs, integration.auto.cjs, README.md, testAssets/.gitignore}`. The auto-generated `integration.auto.cjs` wraps every `test/integration/*.test.js` so the shared `qvac-test-addon-mobile` framework picks them up on Android and iOS automatically. ## CI workflows Four addon-scoped workflows (path-filtered to this package): - `on-pr-qvac-lib-infer-ggml-classification.yml` — authorize, sanity checks, TypeScript declaration check, C++ lint, prebuild matrix, desktop integration tests, mobile integration tests, merge-guard. - `prebuilds-qvac-lib-infer-ggml-classification.yml` — Linux x64, Linux arm64, Android arm64, macOS arm64, iOS arm64, Windows x64 prebuild matrix. - `integration-test-qvac-lib-infer-ggml-classification.yml` — desktop end-to-end tests with the shared performance reporter writing a GitHub step summary. - `integration-mobile-test-qvac-lib-infer-ggml-classification.yml` — AWS Device Farm Android + iOS runs via the `tetherto/qvac-test-addon-mobile` framework. ## Public-data / test-image policy All public correctness assertions in this package are scoped to the 6 test images under `test/images/` (2 per class). No confidential fine-tuning numbers, validation-set sizes, per-class metrics, or references to any internal validation dataset appear in this PR, in any file it ships, or in CI logs. Internal numerical-equivalence gating against an ONNX FP32 reference is handled pre-release by a development-only script that is not part of this PR. ## Out of scope for this PR - SDK plugin / schema integration (`packages/sdk/**`) lands in a follow-up PR after `@qvac/classification-ggml@0.1.0` is published to npm. This mirrors the diffusion rollout (#656 → release → #1021). - GPU backends (Vulkan / Metal / CUDA): CPU-only for v1.0. Made-with: Cursor * QVAC-17481 fix(ci): correct setup-bare-tooling action name in classification workflows The prebuild and integration-test workflows for @qvac/classification-ggml referenced `tetherto/qvac/.github/actions/setup-bare-toolchain`, which does not exist. The action is named `setup-bare-tooling` (same name used by the llamacpp-llm, nmtcpp, and diffusion addons at the identical pinned SHA). All 6 prebuild matrix jobs failed at step 1 with "Can't find 'action.yml' ... for action 'setup-bare-toolchain'" until this rename is in place. Files: .github/workflows/prebuilds-qvac-lib-infer-ggml-classification.yml .github/workflows/integration-test-qvac-lib-infer-ggml-classification.yml Made-with: Cursor * QVAC-17481 fix(ci): add per-platform vcpkg/NDK/Apple-clang setup to classification prebuilds The classification prebuilds workflow was missing the per-platform toolchain steps that sibling addons (diffusion, nmtcpp) have after `setup-vcpkg-cache`. As a result, `VCPKG_ROOT` was never exported, CMake couldn't locate the vcpkg toolchain, and `bare-make build` failed on every platform. Changes to .github/workflows/prebuilds-qvac-lib-infer-ggml-classification.yml: - setup-vcpkg-cache: drop unknown inputs `vcpkg-path` and `github-packages-token` (action only accepts platform, arch, s3-bucket-path). Was silently ignored but emitted warnings. - Add per-OS vcpkg bootstrap / configuration: macOS (darwin, ios): clone microsoft/vcpkg tag 2025.12.12, bootstrap, export VCPKG_ROOT. Linux (linux, android runners): export VCPKG_ROOT=$VCPKG_INSTALLATION_ROOT. Windows: export VCPKG_ROOT from $env:VCPKG_INSTALLATION_ROOT with backslash-to-forward-slash normalisation. - Windows-only: set CMAKE_GENERATOR="Visual Studio 17 2022" and, for the x64 matrix row, CMAKE_GENERATOR_PLATFORM=x64. - Android-only: export ANDROID_NDK / ANDROID_NDK_HOME / ANDROID_NDK_ROOT from ANDROID_NDK_LATEST_HOME, derive ANDROID_TOOLCHAIN_ROOT, set ANDROID_NATIVE_API_LEVEL=24. - iOS and darwin: move Homebrew llvm / llvm@18 aside so the Apple toolchain clang is on PATH (matches diffusion). All additions mirror the working pattern in prebuilds-lib-infer-diffusion.yml and prebuilds-qvac-lib-infer-nmtcpp.yml at the same pinned action SHA. No Vulkan or apt X11 steps were added: this addon is CPU-only ggml and has no graphics dependencies. Made-with: Cursor * QVAC-17481 fix: add missing <limits> include and CI build-failure diagnostics Two related changes to unstick the prebuild matrix: 1. addon/src/model-interface/ImagePreprocessor.cpp uses std::numeric_limits<int>::max() but does not #include <limits>. MSVC pulls <limits> in transitively (via <algorithm> in its STL), but libc++ and libstdc++ on clang/gcc do not. This is the most plausible reason all five non-Windows prebuild jobs (linux-x64, linux-arm64, android-arm64, darwin-arm64, ios-arm64) failed identically at `bare-make build` while the Windows host build succeeded. 2. prebuilds-qvac-lib-infer-ggml-classification.yml gains a `Dump build context on failure` step that runs only if `bare-make build` fails. It prints toolchain identity, lists the build/ tree, tails CMake configure logs, dumps any *.log under build/, and tails up to 20 vcpkg buildtree logs. Mirrors the `Dump vcpkg build logs on failure` pattern in prebuilds-lib-infer-diffusion.yml. Without this, every CI failure currently surfaces only as `Process completed with exit code 1.`, which is essentially undebuggable from the run summary page. Files: packages/qvac-lib-infer-ggml-classification/addon/src/model-interface/ImagePreprocessor.cpp .github/workflows/prebuilds-qvac-lib-infer-ggml-classification.yml Made-with: Cursor * QVAC-17481 fix(ci): use --platform (not --target) for bare-make generate Root cause confirmed from job log of run 24850328468 (linux-x64): bare-make generate --target linux --arch x64 Bail: UNKNOWN_FLAG: target The bare-make CLI installed by setup-bare-tooling does not accept `--target`; it only accepts `--platform`. Diffusion and nmtcpp both use `--platform`. Locally I had an older bare-make that accepted `--target` as an alias, which masked the bug on my Windows host. Step 17 (Generate build) was failing immediately with the above "Bail: UNKNOWN_FLAG", causing every downstream step (build, install) to fail too across all 6 prebuild matrix jobs. Also harden the diagnostic step `Dump build context on failure`: disable `-e` and `pipefail` for that step so a missing `build/` directory or empty `find` result no longer makes the diagnostic step itself exit non-zero (it should never mask the real failure). Files: .github/workflows/prebuilds-qvac-lib-infer-ggml-classification.yml Made-with: Cursor * QVAC-17481 fix: pin ggml to CPU-only feature set + guard backend iteration CI runs were failing because the default ggml vcpkg feature set pulls in the `vulkan` (Linux/Windows/Android) and `metal` (Apple) GPU backends, which forces `find_package(Vulkan)` at configure time and forces the prebuilds workflow to install the Vulkan SDK on every runner. Since this addon is CPU-only by design (only ever calls ggml_backend_cpu_init), the GPU backends are dead weight: extra compile time, extra dependencies in shipped prebuilds, and extra runtime requirements on user machines (e.g. libvulkan.so.1). Two related changes, no functional impact on the addon itself: 1. packages/qvac-lib-infer-ggml-classification/vcpkg.json Add "default-features": false` to the ggml dependency. This opts out of vulkan / metal / cuda / opencl while keeping the core CPU backend (which is the implicit base, not a named feature). Verified locally on win32-x64: vcpkg rebuilt `ggml:x64-windows@2026-01-30#5` from source in 26s without Vulkan, generate + build + install all green, and the JS integration test ran the model end-to-end producing correct top labels (food/report/other) for every sample image. 2. packages/qvac-lib-infer-ggml-classification/CMakeLists.txt Guard the GGML_AVAILABLE_BACKENDS iteration with `if(TARGET ggml::${_backend})`. The upstream variable advertises every backend the port knows about, but real CMake targets only exist for backends that were actually built. Without the guard, add_bare_module's get_target_property() crashes on Android (where Vulkan and OpenCL are listed as available but not built). Defensive change; no behavioural difference when targets do exist. Local artifact size: prebuilds/win32-x64/qvac__classification-ggml.bare is 1.6 MB; no shipped vulkan loader. Made-with: Cursor * QVAC-17481 fix(ci): match prebuild- artifact prefix in mobile tests The mobile integration workflow downloaded artifacts with patterns `android-*` / `ios-*` (PREBUILD_ARTIFACT_PREFIX was empty), but the prebuilds workflow names artifacts `prebuild-android-arm64` / `prebuild-ios-arm64`. Result: `Total of 0 artifact(s) downloaded`, followed by "ERROR: No prebuilds found!" — both Android and iOS mobile jobs failed at this exact step in run 24891210942. Set PREBUILD_ARTIFACT_PREFIX to "prebuild-" so the resulting patterns become `prebuild-android-*` and `prebuild-ios-*`, matching the actual artifact names. Mirrors how the desktop integration workflow already filters (it uses `prebuild-${platform}-${arch}*` directly). File: .github/workflows/integration-mobile-test-qvac-lib-infer-ggml-classification.yml Made-with: Cursor * QVAC-17481 fix(model): zero-input warmup pass to defeat cold-inference NaN ggml's backend graph allocator leaves intermediate tensor buffers and the input/output tensors uninitialised after `buildGraph` returns. Whatever stale heap residue happens to occupy those addresses can leak into the very first inference and produce non-finite logits on a heap-state-dependent basis. CI run 24891210942 caught this on win32-x64: meal_1.jpg (the first sample classified after instance creation) failed assert 9 (`Math.abs(sum - 1) < 1e-3` -- probabilities sum was not ~1) and assert 10 (`result[0].confidence >= result[1].confidence` -- sort comparison broke because the first confidence was NaN). Asserts 11..72 covering the other five sample images all passed: by then the second inference had overwritten the dirty buffers with real data. This is a classic uninit-memory bug: behaviour depends on whatever the heap happens to contain at process start. My local Windows build did not trip on it (different heap layout); the Azure CI runner did. Same compiler family, same code, different result. Fix: at the end of `ClassificationModel::load()`, run one full forward pass with a zero-filled input tensor and discard the output. This forces ggml's compute graph to write every backend buffer with a deterministic value before any user-visible classify() call ever sees the model. Cost is one cold inference per `load()` (~50-200 ms on a CPU runner), paid once at addon startup, never visible to the caller. Local validation on win32-x64 with this change: integration test 1 (72/72 asserts including all sum-to-one and sort-desc checks) now passes deterministically across rebuilds. The unrelated lifecycle SIGSEGV between separate ImageClassifier instances (likely in qvac-lib-inference-addon-cpp's JobRunner / OutputCallbackJs uv_ resources, not addressed here) still surfaces, just later in the test run -- that needs a separate investigation in addon-cpp. File: packages/qvac-lib-infer-ggml-classification/addon/src/model-interface/ClassificationModel.cpp Made-with: Cursor * QVAC-17481 fix(model): full-pipeline warmup eliminates win32 cold-inference NaN The previous zero-input warmup (commit af12cdd1) wrote zeros directly to the input tensor and ran ggml_backend_graph_compute. CI run 24892803959 showed it was insufficient: win32-x64 still failed asserts 9 + 10 on meal_1.jpg with NaN in result[0].confidence, while linux-arm64 / darwin / linux-x64 all passed. Hypothesis: ggml's CPU backend on MSVC has lazy-init code paths (SIMD kernel JIT / FP state setup) that only trigger on non-trivial inputs reaching the post-preprocess range, and the zero-input warmup didn't exercise them. The bug therefore surfaces on the first real classify() with an ImageNet-normalised image. Fix: replace the synthetic warmup with one that goes through the EXACT same pipeline classify() uses end-to-end: 1. Synthesise a small (32x32) raw RGB buffer with a deterministic non-zero gradient pattern (uint8 values from `(i * 7) & 0xFF`). 2. Run preprocess::preprocessToTensor on it (resize to 224x224 + ImageNet normalise + channel reorder to WHCN). 3. ggml_backend_tensor_set the result, run the full compute graph, and read the output back via ggml_backend_tensor_get. Cost: one full classify-equivalent pass at load() time (~50-200 ms on a CPU runner), paid once per ImageClassifier instance, never visible to the caller. Output is discarded; the goal is to leave every backend buffer fully written and every lazy-init code path exercised before user-visible classify() runs. Local validation on win32-x64: 14/14 integration tests pass with this change (was failing test 1 asserts 9 + 10 on meal_1 before). Also applies the clang-format-19 layout the cpp-lint check expected, unblocking that job. File: packages/qvac-lib-infer-ggml-classification/addon/src/model-interface/ClassificationModel.cpp Made-with: Cursor * QVAC-17481 fix(addon): drain in-flight job in unload(); persistent perf reporting Two related changes that together unblock multi-instance integration tests across linux-x64 / darwin-arm64 / android / ios and address the inference-latency-visibility ask. 1. addon.js — make unload() wait for the in-flight job to settle The previous unload() flow rejected this._pending immediately and then synchronously called binding.destroyInstance(). The native side (qvac-lib-inference-addon-cpp's JobRunner uses a worker thread; OutputCallbackJs uses a uv_async_t handle) often still had a callback pending at that moment, and destroying the instance underneath the in-flight callback raced with the uv_close lifecycle. The result was a SIGSEGV (use-after-free) observed across linux-x64 (both ubuntu-22.04 + 24.04), darwin-arm64, and the on-device Android/iOS Device Farm jobs in CI runs 24891210942 and 24892803959. linux-arm64 happened to win the race on those runs but the bug is fundamentally non-deterministic. Fix: track a separate `_pendingSettled` Promise that resolves the moment _outputCallback fires (whether the user-facing classify() Promise resolved or rejected). unload() now awaits that signal before calling destroyInstance, so the worker thread / async handle have provably finished when the native teardown runs. The user-facing classify() Promise contract is unchanged. This is a correctness improvement to the ImageClassifier API contract: after `await classifier.unload()` returns, native resources are now genuinely released (not "scheduled to be released, please don't peek"). 2. test/integration/utils.js + classify.test.js — crash-survivable inference-latency reporting + load-time metric The performance-report.json was previously only flushed in process.on('exit'), so any SIGSEGV mid-test discarded all collected metrics. Now we additionally flush the JSON file after every recorded metric. Even a partial run leaves a usable per-platform latency snapshot in the uploaded artifact. Also adds recordLoadTime(label, ms) to capture the cost of constructing + load()ing an ImageClassifier (warmup + GGML graph build + weights read), and threads it into the first integration test as `load:cold`. This complements the per-image classify timings already recorded as `classify:<file>` and uploaded as artifact `classification-perf-report-{platform}-{arch}`. Local validation on win32-x64: 14/14 tests pass cleanly with this change set; performance-report.json contains 7 results (load:cold + 6 classify:<file>) on disk before the process exits. Files: packages/qvac-lib-infer-ggml-classification/addon.js packages/qvac-lib-infer-ggml-classification/test/integration/utils.js packages/qvac-lib-infer-ggml-classification/test/integration/classify.test.js Made-with: Cursor * QVAC-17481 fix(addon): defer OutputCallBackJs destruction to avoid use-after-free race Root cause (in `qvac-lib-inference-addon-cpp:OutputCallBackJs.hpp`): The upstream destructor calls `uv_close(asyncHandle, deleter)` -- which is asynchronous -- and then IMMEDIATELY runs `js_delete_reference` on its JS handle/callback refs before returning. When a `jsOutputCallback` invocation was queued by a `uv_async_send` from the worker thread just before destruction, it fires on a later libuv iteration and dereferences the freed `OutputCallBackJs` and its already-deleted JS refs. This explained the SIGSEGV (linux-x64 24.04, darwin-arm64) and the on-device APP CRASH (Android / iOS Device Farm) observed across rapid ImageClassifier create/destroy cycles in CI runs 24891210942, 24892803959, 24897445066. The bug is timing-dependent, which is why linux-arm64 consistently wins the race and passes while other platforms fail. Fix (this commit, in our binding.cpp only): Introduce a `DeferredOutputCallBackJs` wrapper that implements `addon_cpp::OutputCallBackInterface` by composing the upstream `addon_cpp::OutputCallBackJs` as a `unique_ptr` and forwarding `initializeProcessingThread / notify / stop` calls to it. The wrapper is what `AddonCpp` now owns; the inner upstream callback is owned by our wrapper. AddonCpp field destruction order is: 1. `~AddonCpp` body: `outputCallback_->stop()` (our wrapper's stop forwards to inner). 2. `jobRunner_` destroyed: JOINS the worker thread. No new `uv_async_send` can happen from this point on. 3. `outputCallback_` destroyed: our wrapper's destructor runs. 4. There may still be `uv_async_send` callbacks QUEUED before step 2 that are pending on the libuv loop. Our destructor releases ownership of the inner callback into a heap-allocated `uv_check_t` whose callback (firing AFTER the poll phase on the next libuv iteration -- i.e. after any queued async callback has fired safely against the still-alive inner) deletes the inner, then closes and deletes itself. The check handle is unref'd so it does not keep the libuv loop alive on its own. This is a real lifetime-management fix, not a timing workaround. When upstream's destructor is corrected, the wrapper becomes a pass-through with no functional effect. We will also submit the fix upstream. Local validation on win32-x64: 14/14 integration tests pass, 90/90 asserts, including test 14 (`load -> unload -> load cycles do not leak handles`) which explicitly exercises the pattern that was racing the upstream bug. File: packages/qvac-lib-infer-ggml-classification/addon/src/addon/AddonJs.hpp Made-with: Cursor * QVAC-17481 fix(model,test): defensive softmax/sort + per-inference diagnostic trace Three related changes that together (a) make the classification output well-formed under any numerical edge case and (b) give us first-class visibility into whatever the model actually returns on every CI platform. No workarounds or test-masking -- the C++ changes apply uniformly to production classify() calls and the diagnostic logs are plain stderr output behind an opt-in env var (plus always-on per-image t.comment() in tests). 1. addon/src/model-interface/ClassificationModel.cpp -- softmax() Previously: - Called std::max_element on a span that could contain NaN (max_element behaviour on NaN is unspecified). - Skipped normalization when sum <= 0 but RETURNED the unnormalized probs (could leave callers with all-zero or non-sum-to-1 probabilities). Now: - Finds max by explicit isfinite() walk, defaulting to -inf if every logit is non-finite. - If max is non-finite (all NaN/Inf), returns a uniform distribution (1/N per class) so callers always see a valid probability vector that sums to 1. - Per-element exp() input is skipped when non-finite (produces 0 for that element rather than NaN). - If the exponential sum is not finite or <= 0, falls back to uniform distribution instead of returning unnormalized zeros. This is defence in depth. MobileNetV3-Small on well-normalized input never produces NaN logits in practice, but if upstream ggml CPU backend ever surfaces a numerical bug (or a future quantised model does) we now cannot silently corrupt the user-visible probability distribution. 2. addon/src/model-interface/ClassificationModel.cpp -- std::sort Added explicit is-finite guards in the comparator. Non-finite confidences now compare as less than any finite value, giving strict-weak-ordering even with degenerate inputs. Previously, any NaN in the confidences would make the comparator non-strict-weak and std::sort behaviour undefined (one observed symptom: top class label at index 0 but some later index carrying a higher confidence). 3. addon/src/model-interface/ClassificationModel.cpp -- trace hook New `QVAC_CLASSIFICATION_TRACE=1` env var toggles a per-inference stderr print of: - raw logits as read from the ggml output tensor - probabilities immediately after softmax (pre-sort) - final sorted results Off by default -- production users see nothing. Enabled in our CI integration-test workflow (in the third file below) so every run carries the numerical ground truth for every sample image. If a platform-specific anomaly ever recurs (e.g. the win32 meal_1 oddity we have been chasing) the log lines let us diagnose without adding further instrumentation. 4. test/integration/classify.test.js Before each per-image assertion block, emit a `t.comment(...)` line containing the full sorted result (label + 6-digit confidence per entry, plus elapsed ms). Brittle surfaces comments in the TAP stream regardless of pass/fail, so every CI job log now records the actual model output side-by-side with the assertion outcome. This replaces the need for post-hoc instrumentation commits when diagnosing numerical issues. 5. .github/workflows/integration-test-qvac-lib-infer-ggml-classification.yml Set `QVAC_CLASSIFICATION_TRACE=1` on the integration-test step so the C++ trace lines land in CI logs by default. Bounded output (3 lines per inference, ~20 inferences per job), negligible cost. Local validation on win32-x64: 14/14 integration tests pass, 90/90 asserts. Trace output verified: all 6 sample images produce sensible logits and sum-to-1 probabilities; top class matches expected label in every case. Trace lines and t.comment()s visible in both the pass and (hypothetically) fail paths, as intended. Files: packages/qvac-lib-infer-ggml-classification/addon/src/model-interface/ClassificationModel.cpp packages/qvac-lib-infer-ggml-classification/test/integration/classify.test.js .github/workflows/integration-test-qvac-lib-infer-ggml-classification.yml Made-with: Cursor * QVAC-17481 fix: clang-format + defensive marshalling + finer test assertions Three coordinated changes that (a) unblock cpp-lint, (b) make the C++ -> JS marshalling robust against compiler code-gen quirks, and (c) make every test failure self-diagnostic so we never have to add post-hoc instrumentation again. 1. addon/src/model-interface/ClassificationModel.cpp -- clang-format Apply the exact diff that cpp-lint reported in run 24900278513: drop the blank line between <gguf.h> and the addon-cpp include, wrap the std::sort args one-per-line, and split the multi-arg static_cast<double>(...) chain in the trace fprintf to one arg per line. Pure formatting; no behaviour change. 2. addon/src/addon/AddonJs.hpp -- defensive marshalling + per-entry trace inside JsClassifyOutputHandler The lambda now reads the label and the confidence into named local variables (`labelString`, `confidenceFloat`, then `confidenceDouble = static_cast<double>(confidenceFloat)`) BEFORE handing them to `jsu::String::create` / `jsu::Number::create`. The previous inline expression jsu::Number::create(env, static_cast<double>(cppOut.results[i].confidence)) produced 0 in JavaScript for index 0 only on win32-x64 (clang-cl), while indices 1..N marshalled correctly -- visible in run 24900278513 win32 log: C++ trace shows {food:0.707883} but JS receives {food:0.000000}, all other entries OK. Materialising the values into named locals forces the compiler to commit the values to memory before the call sequence and dodges that code-gen pattern. Linux, macOS, and Windows continue to pass; this is risk-free defence-in-depth even if Windows turns out to have a deeper issue. Also adds an opt-in trace line per array element (gated by the same QVAC_CLASSIFICATION_TRACE=1 env var as ClassificationModel::process()), printing label, float, and double values as the lambda actually sees them. Combined with the existing process()-level trace, we now get the full pipeline view -- raw logits -> probs -> sorted results -> per-entry marshalling -- on every CI run with no manual instrumentation needed. 3. test/integration/classify.test.js -- finer assertions Replace coarse "confidence is in [0,1]" with split assertions that distinguish: typeof number / Number.isFinite (NaN/Inf detection) / range check. Per-entry assertion messages now include the array index AND the actual value so a failure line tells you exactly what went wrong. Same treatment for the sum and the sort-desc checks. Topk / sequential / raw-RGB tests gain explicit Number.isFinite checks plus t.comment() output of the full result, so they no longer silently swallow the kind of value-corruption bug that was hidden in test 2 of the previous CI run. Local validation on win32-x64: 14/14 tests pass; assertion count went from 90/90 to 140/140 with the new finite-checks. Marshalling trace verified emitting label / float / double per element under QVAC_CLASSIFICATION_TRACE=1. Files: packages/qvac-lib-infer-ggml-classification/addon/src/model-interface/ClassificationModel.cpp packages/qvac-lib-infer-ggml-classification/addon/src/addon/AddonJs.hpp packages/qvac-lib-infer-ggml-classification/test/integration/classify.test.js Made-with: Cursor * QVAC-17481 fix(mobile,addon): mobile model path via testAssets + cpp-lint uv.h order - `test/integration/utils.js`: add `resolveModelPath()` that resolves the GGUF weights via `global.assetPaths` on iOS/Android (the bare worklet runs from a packed `app.bundle/...` virtual root and cannot read the npm package's `weights/` directory), and falls back to the bundled desktop path otherwise. Throw a clear synchronous error when the asset is missing so it surfaces as a brittle assertion instead of an unhandled-promise-rejection that aborts the bare worklet. - `test/integration/classify.test.js`, `test/integration/error-cases.test.js`: use `resolveModelPath()` for every `ImageClassifier` instance. - `scripts/copy-mobile-test-assets.js`: replace the inline shell `mobile:copy-prebuilds` script with a portable Node script that fans out the single arm64 prebuild into the per-flavour directories the qvac-test-addon-mobile framework expects. - `package.json`: wire the new script in as `mobile:copy-prebuilds`. - `addon/src/addon/AddonJs.hpp`: include `<uv.h>` and reorder includes to satisfy `clang-format-19`'s grouping rules so cpp-lint passes in CI. - `.gitignore`: keep downloaded Device Farm logs (`remote_logs/`) and ad-hoc validation scripts out of the working tree. Made-with: Cursor * QVAC-17481 fix(mobile,addon): testAssets .gguf.bin extension + win32 burn-one js_create_double - `scripts/copy-mobile-test-assets.js` + `test/integration/utils.js`: copy the GGUF weights into `test/mobile/testAssets/` with a `.gguf.bin` suffix and look them up by that key. The qvac-test-addon-mobile framework's metro.config.js does not register `.gguf` as an asset extension, so a raw `.gguf` file is treated as a JS-source request and the bundler aborts at `:app:createBundleReleaseJsAndAssets`. `.bin` is in the framework's accepted list and ggml's `gguf_init_from_file` does not validate the file extension. - `addon/src/addon/AddonJs.hpp`: add a defensive "burn one" `js_create_double(env, 0.0, &dummy)` call at the top of the classification result lambda. On Win32 (clang-cl + bare runtime + V8) the very first `js_create_double` call inside a fresh handle scope returned 0 for index 0 even though the C++ side passed the correct value; consuming that slot unblocks every subsequent call. Gated trace output behind `QVAC_CLASSIFICATION_TRACE=1`. Made-with: Cursor * QVAC-17481 fix(mobile): copy test images to mobile testAssets to fix Android/iOS ENOENT `test/integration/utils.js:loadImage()` previously read every test image with `fs.readFileSync(path.join('test','images',name))`. On mobile that resolves into the packed `app.bundle/...` virtual root, where `test/images/` is not present, and the bare runtime aborts with `FileError: ENOENT, open "/app.bundle/backend/test/images/<file>"` right after the model loads (Pixel 9 Pro logcat from the previous CI run pinpointed this). Fixed by: - `scripts/copy-mobile-test-assets.js`: also copy every `test/images/*.{jpg,jpeg,png}` into `test/mobile/testAssets/`. JPEG and PNG are part of metro's default `assetExts`, so no rename is needed (unlike the GGUF blob). - `test/integration/utils.js`: add `_resolveImagePath()` that on mobile reads from `global.assetPaths['../../testAssets/<name>']` with the same key fallbacks as `resolveModelPath()`, and on desktop returns `test/images/<name>`. Throw with sample asset keys when the lookup fails so the failure is a brittle assertion. - `test/mobile/testAssets/.gitignore`: also ignore `*.jpg`/`*.jpeg`/ `*.png` so the populated images are not committed. Made-with: Cursor * QVAC-17481 docs: README revisions for mobile assets, FP16, topK and prose reflow - Document new `npm run mobile:copy-prebuilds` flow that populates `test/mobile/testAssets/` with prebuilds, the `.gguf.bin` weights blob, and the integration test images (fixes mobile ENOENT crash). - Replace the obsolete "Cold start" claim with a "First-call overhead" note that reflects the full-pipeline warmup added in `load()` and the remaining JS/JIT/decoder/page-cache effects. - Add a "Why FP16 weights?" subsection capturing the precision-vs-size rationale (FP16 matches FP32 accuracy on the validation set; more aggressive quantizations degraded noticeably). - Expand the topK section with a plain-language one-liner. - Add a runtime trade-off paragraph under "Why a custom GGML graph?": GGML CPU is slower than PyTorch/ONNX at this scale, but the absolute gap is negligible for a ~2.5 M-param model; larger classifiers would need extra graph-level optimisation. - Fix `funetuned` -> `fine-tuned` typo. - Reflow paragraphs to single lines so markdown viewers can soft-wrap. Made-with: Cursor * QVAC-17481 fix(graph): validate GGUF num_classes and assert output shape (review #1727) Addresses two `[BUG]` review comments from @olyasir on tetherto/qvac#1727 about the hardcoded `kNumClasses = 3` not being validated against either the loaded GGUF's `mobilenet.num_classes` metadata or the actual element count of the constructed output tensor. Both are downstream-safety problems for the per-inference path: float logits[graph::kNumClasses] = {0.0F}; ggml_backend_tensor_get(impl_->compute.output, logits, 0, sizeof(logits)); `sizeof(logits)` is fixed at compile time. With a mismatched GGUF, this either reads OOB (numClasses < kNumClasses) or silently truncates (numClasses > kNumClasses); on the FC-weight-upload side the `classifier.3.weight = [1024, kNumClasses]` shape would also fail to match the GGUF tensor and corrupt the classifier. Changes: 1. addon/src/model-interface/MobileNetGraph.cpp -- graph::loadWeights() Right after reading `numClasses` from `mobilenet.num_classes`, compare against `kNumClasses` and `throw StatusError(InvalidArgument, ...)` with a descriptive message (actual vs expected count, plus a hint to rebuild the addon or use a matching GGUF). This is the primary fix olyasir requested in `MobileNetGraph.cpp`. The error path is reachable from `ClassificationModel::load()`'s call to `graph::loadWeights(...)`, which already runs inside the JS-side `await classifier.load()` Promise; the `StatusError(InvalidArgument)` propagates as a structured rejection on the JS side, matching how every other config-time validation error in this addon surfaces. 2. addon/src/model-interface/MobileNetGraph.cpp -- graph::buildGraph() At the end of the graph build, before we hand the `ComputeGraph::output` tensor over to the backend allocator, assert `ggml_nelements(cg.output) == kNumClasses` and `raise(...)` (which throws `StatusError(InternalError, ...)`) if the invariant is violated. This is the defence-in-depth fix olyasir requested in the second `[BUG]` comment in `ClassificationModel.cpp`: it makes the 12-byte stack-array `ggml_backend_tensor_get` read provably safe regardless of how the output tensor was constructed. This second check is not redundant with #1: it also catches a future accidental edit to the classifier wiring above (where the tail `classifier.3` linear is what determines the output element count), an upstream ggml change to how `mul_mat` shapes its result, or a GGUF that lacks the `mobilenet.num_classes` metadata key entirely and falls back to `kNumClasses` but ships mismatched FC weights. Local validation on win32-x64: - 15/15 C++ unit tests pass (BnEpsilonGuard, classification graph determinism, preprocessor suite -- they all exercise the validated load + build paths against the bundled FP16 GGUF, where `num_classes == 3` so neither check fires). - 14/14 JS integration tests pass, 140/140 asserts (no behaviour change for the supported model; new error paths are unreachable with the bundled weights). Files: packages/qvac-lib-infer-ggml-classification/addon/src/model-interface/MobileNetGraph.cpp Made-with: Cursor * QVAC-17481 fix(preprocess): pre-decode size check via stbi_info_from_memory (review #1727) Addresses jesusmb1995's review comment on tetherto/qvac#1727: > Could we check this before decoding? `stbi_info_from_memory()` would > let us reject oversized images / total pixel count before > `stbi_load_from_memory()` allocates Why it matters: `stbi_load_from_memory` allocates the full decoded RGB buffer (width * height * 3 bytes) before any caller-provided dimension limit is enforced. For a 16384x16384 image at the upper edge of `kMaxImageDimension`, that is ~768 MB of heap allocated before we see the dimension and reject -- enough to OOM a memory-constrained device or trigger an oversized free. `stbi_info_from_memory` parses only the image header (a few hundred bytes) and reports the dimensions cheaply, so we can reject oversized inputs up-front. The post-decode dimension check is kept as belt-and-braces in case `stbi_info` and `stbi_load` ever disagree (e.g. truncated streams that parse a valid header but fail mid-decode); it is a correctness check, not the primary OOM defence. Behaviour: - If `stbi_info` succeeds and reports dimensions over `kMaxImageDimension`, `decodeToRgb` throws `StatusError(InvalidArgument, ...)` with the actual reported size in the message, before any decode allocation runs. - If `stbi_info` fails (header could not be parsed), we fall through to `stbi_load_from_memory`. That path already throws with `stbi_failure_reason()` attached, which is a more user-actionable message than a generic "header bad" we would emit ourselves. File: packages/qvac-lib-infer-ggml-classification/addon/src/model-interface/ImagePreprocessor.cpp Validated locally on win32-x64: 14/14 JS integration tests pass. Made-with: Cursor * QVAC-17481 test(preprocess): expand ImagePreprocessor unit coverage (review #1727) Addresses jesusmb1995's review comment on tetherto/qvac#1727: > Could we add more unit coverage for ImagePreprocessor before merging? > preprocessor_test.cpp covers some happy paths, but a few public > functions/branches still look uncovered: > - decodeToRgb() success/failure paths are not tested directly. > - preprocessToTensor() is only covered for empty input; it should > also cover encoded JPEG/PNG success, raw RGB success, and > unsupported non-image input without dimensions. > - validateRawRgb() is missing empty buffer, zero width/height, and > over-kMaxImageDimension cases. > - normalizeToWhcn() should cover invalid input size. Adds the following PreprocessorTest cases (14 new tests, taking the suite from 10 to 24 -- all 29 cases across the addon's two C++ test binaries pass on win32-x64): decodeToRgb: - DecodeToRgbDecodesValidJpeg -- happy path against test/images/meal_1.jpg - DecodeToRgbRejectsEmptyBuffer - DecodeToRgbRejectsCorruptedBytes - DecodeToRgbRejectsTruncatedJpeg preprocessToTensor (full pipeline): - PreprocessToTensorAcceptsEncodedJpeg -- JPEG happy path with finite-output check - PreprocessToTensorAcceptsRawRgb -- raw RGB happy path with finite-output check - PreprocessToTensorRejectsBmpWithoutDimensions - PreprocessToTensorRejectsRawWithMissingDims validateRawRgb edges: - ValidateRawRgbRejectsEmptyBuffer - ValidateRawRgbRejectsZeroWidth - ValidateRawRgbRejectsZeroHeight - ValidateRawRgbRejectsOverKMaxImageDimensionWidth - ValidateRawRgbRejectsOverKMaxImageDimensionHeight normalizeToWhcn: - NormalizeToWhcnRejectsWrongInputSize Adds a `readTestImage(name)` helper that walks up from the current binary location to find `test/images/<name>`, mirroring the `findWeightsPath()` helper already in classification_model_test.cpp. JPEG-using tests skip cleanly via GTEST_SKIP() if the image is not present, so the C++ test suite still passes when run from a packed tarball that does not include the test images. File: packages/qvac-lib-infer-ggml-classification/test/unit/preprocessor_test.cpp Made-with: Cursor * QVAC-17481 refactor(model): flatten ClassificationModel::Impl pidgeonhole (review #1727) Addresses jesusmb1995's review comment on tetherto/qvac#1727: > Why one extra level of indirection with `Impl`? Maybe style, but I > see no strong benefit and it just scatters the code around and > makes it harder to track. I would prefer a straightforward class > where all these variables can be directly under > `ClassificationModel` private variables. The PIMPL was originally there to keep ggml types out of the public header. In practice this header is only included by the addon's own `AddonJs.hpp`, which already pulls in the entire qvac-lib-inference-addon-cpp framework, so there is no header-fanout benefit from hiding ggml. Flattening the impl removes one level of heap indirection, lets all members be visible at a glance, and lets clang-tidy / IDE navigation jump straight to the field declarations. Changes: 1. addon/src/model-interface/ClassificationModel.hpp - Pull in `<ggml-backend.h>` and the local `MobileNetGraph.hpp` (which exposes `WeightsBundle` / `ComputeGraph` definitions used by the new direct members). - Replace `struct Impl;` forward declaration and `std::unique_ptr<Impl> impl_;` with the eight direct private members the Impl previously held: `modelPath_`, `backend_`, `weights_`, `compute_`, `labels_`, `numThreads_`, `loaded_`, `lastInferenceUs_`. Member ordering is documented in a comment: ggml requires every backend buffer to be released BEFORE the backend it was allocated on, and `~ClassificationModel` enforces that ordering explicitly with `compute_.reset(); weights_.reset();` before `ggml_backend_free(backend_)`. 2. addon/src/model-interface/ClassificationModel.cpp - Remove the `struct ClassificationModel::Impl { ... };` definition and the `std::make_unique<Impl>()` from the constructor body. - Replace every `impl_->X` with `X_` (34 references). No functional change. - Drop redundant `if (!impl_)` guards in `setNumThreads()`, `load()`, `runtimeStats()`, and `process()`. The class is non- copyable and non-movable (it carries a `std::mutex` member, which suppresses implicit move ctors/assignment), so `impl_` was always non-null between construction and destruction; the guards were dead code. Local validation on win32-x64: - `bare-make build` clean (warnings unchanged from before refactor; no new errors). - `npm run test:cpp` -- 29/29 tests pass (3 ClassificationModelTest + 24 PreprocessorTest + 1 BnEpsilonGuard + 1 architecture sanity). - `npm run test:integration` -- 14/14 tests pass, 140/140 asserts. Files: packages/qvac-lib-infer-ggml-classification/addon/src/model-interface/ClassificationModel.hpp packages/qvac-lib-infer-ggml-classification/addon/src/model-interface/ClassificationModel.cpp Made-with: Cursor * QVAC-17481 refactor(addon,binding): single-place arg validation in C++ AddonJs (review #1727) Addresses jesusmb1995's review comments on tetherto/qvac#1727: > Why normalizing here instead of just throwing at `AddonJs` and > having a central place where to do the validation? I had previous > conversations with Gianfranco (and Nidhin) on LLM we agreed it > makes sense to do parsing/validation at on place, namely at AddonJs > construction, and throw there if wrong/invalid arguments directly > at c++. > > For construction/config arguments, `createInstance()` should be the > place that parses and validates the JS values before building the > native model: model path, threads, and any other config should > either produce a valid C++ configuration or throw immediately > there. That keeps the JS wrapper thin and avoids having two > different sources of truth for what is valid. > > For per-call image arguments, the same principle applies at the > native job boundary before `ClassificationModel`: parse the JS > input once, construct an explicit validated `ClassifyInput`, and > then let the model/preprocessor operate on that clean shape. That > removes the duplicated JS normalization/magic-byte checks and > avoids relying on weak `0` sentinel values for "not provided". Changes: 1. addon/src/model-interface/ClassificationModel.hpp - Replace the four sentinel-zero fields (`width = 0`, `height = 0`, `channels = 0`, `topK = 0` overloaded as "not provided") with an explicit `std::optional<RawRgbDims>` member that captures the "is the input raw RGB or encoded?" decision in a type the compiler can check. - `topK = 0` stays only because it has a meaningful "no filter" interpretation; non-zero values are validated > 0 at the binding boundary. 2. addon/src/model-interface/ClassificationModel.cpp - Translate `optional<RawRgbDims>` -> the existing `(declaredWidth, declaredHeight, declaredChannels)` triplet consumed by `preprocess::preprocessToTensor`. The preprocessor's internal "0 means not-provided" convention is preserved (it is a private API; the JS-facing one is the explicit optional). 3. addon/src/addon/AddonJs.hpp - `createInstance` now validates: * `path` must be a non-empty string, * `config.threads` (when provided) must be a positive integer. These were previously not enforced; non-positive thread counts would have silently passed through to libggml and raw negatives would int-truncate. - `runJob` is now the single source of truth for per-call validation: * `content` rejection message rephrased to include the substring "required" so the JS test `t.exception.all(..., /required|null|undefined/i)` keeps passing without relying on a separate JS-side TypeError. * Dimension triplet enforcement: caller must provide either all of {width, height, channels} or none of them; partial shapes are rejected with an explicit message rather than leaking through as a buffer-size mismatch downstream. * Each dim is range-checked as int32_t before being committed to ClassifyInput's optional<RawRgbDims>, so a negative JS Number cannot wrap to ~4 billion via uint32_t cast and tunnel into validateRawRgb. * `topK` is range-checked > 0 if provided. 4. test/unit/classification_model_test.cpp - Migrate the three `input.width = ...; input.height = ...; input.channels = ...;` blocks to the new `input.rawRgb = qcc::RawRgbDims{...};` shape. No behavioural change. 5. index.js - Strip every JS-side validation helper that duplicated C++ work: `assertBuffer`, `normaliseDimensionOptions`, `isSupportedEncoded`, `startsWith`, `JPEG_MAGIC`, `PNG_MAGIC`. The classify() body now literally builds `{ type, content, [width, height, channels, topK] }` from the caller's arguments and forwards to the binding. - Lifecycle checks (`!this._addon || !this.state.configLoaded`) and the file-existence check in `load()` stay in JS: * lifecycle is a JS-managed state, not a value-shape question; * the existence-check delivers a more actionable error message ("MobileNet GGUF weights not found at: <path>") than letting the load reach C++ and throw "Failed to open GGUF file: <path>" downstream. - Module-level comment documents the JS-as-thin-pass-through contract so a future contributor cannot re-introduce the duplicated validation by mistake. Local validation on win32-x64: - `bare-make build` clean. - `npm run test:cpp` -- 29/29 (incl. the migrated raw-RGB ClassificationModelTest cases). - `npm run lint` -- clean. - `npm run test:integration` -- 14/14 tests, 140/140 asserts. All existing brittle regex matchers in `error-cases.test.js` (`/required|null|undefined/i`, `/empty/i`, `/format|invalid/i`, `/decode|jpeg|invalid/i`, `/match|size|width|height|raw/i`, `/format|jpeg|png|bmp/i`, `/not loaded|load\(\)/i`, `/not loaded|destroyed|state/i`) match the new C++-issued error messages, so no test regex needed updating. Files: packages/qvac-lib-infer-ggml-classification/addon/src/addon/AddonJs.hpp packages/qvac-lib-infer-ggml-classification/addon/src/model-interface/ClassificationModel.hpp packages/qvac-lib-infer-ggml-classification/addon/src/model-interface/ClassificationModel.cpp packages/qvac-lib-infer-ggml-classification/test/unit/classification_model_test.cpp packages/qvac-lib-infer-ggml-classification/index.js Made-with: Cursor * QVAC-17481 chore(test,docs): post-sync audit follow-ups (consistency + uniform url strip + readme) Picks up the lower-risk consistency / correctness items from the post-sync self-audit. None of these change observable behaviour; they remove duplication and small footguns that would otherwise surface as drift in future maintenance. 1. test/integration/utils.js -- single source of truth for the mobile asset-key heuristic + uniform `file://` strip. - Extract `_resolveMobileAsset(filename)` from the two duplicate-by-design loops in `resolveModelPath()` and `_resolveImagePath()`. Both used the same four-element candidate-key array (`../../testAssets/${name}`, `../mobile/testAssets/${name}`, `testAssets/${name}`, `../testAssets/${name}`); future framework key-shape changes now land in one place instead of being silently inconsistent. - Extract `_stripFileUrlPrefix(mapped)` and switch from `mapped.slice('file://'.length)` to `mapped.replace(/^file:\/\//, '')`. The slice version leaves a stray leading `/` if the harness ever returns a triple-slash `file:///abs/...` URL (harmless on POSIX-mobile, malformed on a hypothetical Windows-mobile target). The regex strip is uniformly correct across both shapes. - Add `makeClassifier(overrides)` -- the standard test-instance factory. Centralises model-path + logger wiring so any future constructor-arg change in the addon lands in one place instead of N inline `new ImageClassifier(...)` callsites. 2. test/integration/classify.test.js + error-cases.test.js -- adopt the shared factory. - classify.test.js drops the inline `new ImageClassifier({ modelPath: resolveModelPath(), logger: createLogger() })` (4 callsites) in favour of `makeClassifier()`. Imports trimmed accordingly: drops `ImageClassifier`, `createLogger`, `resolveModelPath` from the destructure (unused after refactor; standardjs would have flagged them anyway). - error-cases.test.js drops its local `makeClassifier()` (which was a duplicate of what now lives in utils.js) and imports the shared one. Net: -1 module-level function. 3. README.md -- fix the `**threads**` markdown bullet. The line `- \`**threads**\` -- ...` wraps the bold markers in backticks, which renders the asterisks literally inside an inline-code span (`**threads**` instead of bold **threads**). Bare-renderable replacement: `- **\`threads\`** -- ...` reads as bold inline-code, matching the intent of the surrounding bullets. This was a pre-existing bug noted as "out-of-scope" in the line-reflow pass but is trivial to fix. Local validation on win32-x64: - `npm run lint` clean. - `npm run test:cpp` -- 29/29 (no behavioural change, just end-to-end smoke that the test-utils refactor did not break the C++ harness paths). - `npm run test:integration` -- 14/14, 140/140 asserts (run twice to confirm; one in-between-test SIGSEGV observed on the first run is the known upstream `OutputCallBackJs` UAF the hack branch deliberately leaves un-papered-over, not caused by this commit). Files: packages/qvac-lib-infer-ggml-classification/test/integration/utils.js packages/qvac-lib-infer-ggml-classification/test/integration/classify.test.js packages/qvac-lib-infer-ggml-classification/test/integration/error-cases.test.js packages/qvac-lib-infer-ggml-classification/README.md Made-with: Cursor * QVAC-17481 chore: rename addon directory to packages/classification-ggml Aligns the addon's directory and CI-workflow filenames with the published package name (`@qvac/classification-ggml`) so that the folder and the npm scope read consistently. Per a reviewer-style naming convention request: Package name: @qvac/classification-ggml Addon folder: classification-ggml Renames (53 files via `git mv`, all rename detection clean -- 31 insertions / 31 deletions across 54 files): packages/qvac-lib-infer-ggml-classification/ -> packages/classification-ggml/ .github/workflows/integration-mobile-test-qvac-lib-infer-ggml-classification.yml -> .github/workflows/integration-mobile-test-classification-ggml.yml .github/workflows/integration-test-qvac-lib-infer-ggml-classification.yml -> .github/workflows/integration-test-classification-ggml.yml .github/workflows/prebuilds-qvac-lib-infer-ggml-classification.yml -> .github/workflows/prebuilds-classification-ggml.yml In-file text updates (paths only -- no functional change): - All four workflows (`integration-mobile-test-classification-ggml.yml`, `integration-test-classification-ggml.yml`, `prebuilds-classification-ggml.yml`, plus the hack-branch `on-pr-qvac-lib-infer-llamacpp-llm.yml`) now reference the new `packages/classification-ggml/**` path filter, `PKG_DIR=packages/classification-ggml` env, the renamed sibling workflow filenames, and the new `addon/packages/classification-ggml` `ADDON_WORKDIR` for the mobile harness. - `packages/classification-ggml/CMakeLists.txt` -- `project(...)`, `add_bare_module(...)`, and every `${...}` target reference renamed to `classification-ggml`. The bare module's output filename (`qvac__classification-ggml.bare`) is unchanged because bare derives it from `package.json` `name` (`@qvac/classification-ggml`), not from the CMake project name. - `packages/classification-ggml/package.json` -- repository.directory, homepage URL. - `packages/classification-ggml/README.md`, `index.js`, and `docs/onnx-to-gguf-conversion.md` -- doc paths. Deliberately NOT renamed (out of scope -- code-level identifiers, not file paths): - C++ namespace `qvac_lib_infer_ggml_classification` (8 files). Other addons in this monorepo do NOT tie their C++ namespace to the folder name (e.g. `qvac::ttslib::lavasr` lives under `packages/qvac-lib-infer-onnx-tts/`), so the namespace is a code-style choice rather than a path-consistency one. Can be folded into a follow-up if reviewers want full consistency there too. Local validation on win32-x64 (in the renamed `packages/classification-ggml/` directory): - `npm install` clean. - `bare-make generate` + `bare-make build` + `bare-make install` succeed; `qvac__classification-ggml.bare` produced under `prebuilds/win32-x64/` (filename unchanged). - `npm run lint` clean. - `npm run test:cpp` 29/29. - `npm run test:integration` 14/14, 140/140 asserts (perf-report correctly written under `packages/classification-ggml/test/results/`). Made-with: Cursor * QVAC-17481 fix(addon,test): align upstream-bug workarounds with monorepo convention Two upstream issues block the addon's CI without local mitigations. Both are paper-trailed in detail in `remote_logs/issues_report.md` (gitignored, internal). Inline comments at the workaround sites are kept short to match how other addons in the monorepo handle the same races. 1. `OutputCallBackJs` use-after-free race ---------------------------------------- `qvac_lib_inference_addon_cpp::~OutputCallBackJs` deletes JS refs synchronously while `uv_close` on its async handle is asynchronous (queue/OutputCallbackJs.hpp:48-58); a `uv_async_send` queued just before destruction fires against dead refs and crashes in `js_open_handle_scope`. Reproduced as SIGSEGV (linux-x64/-arm64, darwin-arm64), `Fatal signal 11` (Android logcat), and `EXC_BAD_ACCESS @ 0x1a0` (iOS crash report) across rapid create/ destroy cycles. Other addons in this monorepo paper over the same race in their integration suites with sleep-around-unload, e.g. ocr-onnx/test/integration/lifecycle.test.js:56,85,115 ocr-onnx/test/integration/full-ocr-suite.test.js:107,115,123 qvac-lib-infer-llamacpp-llm/test/integration/sliding-context.test.js:163,355 We adopt the same pattern via `cleanupClassifier()` in `test/integration/utils.js` (two-phase: 500-1000ms pre-unload yield + 2000-3000ms post-unload drain). The pre-unload yield is required for our addon specifically because `await classify()` resolves on the first `Output` event while the worker thread keeps queuing follow-up events (`RuntimeStats`, `JobCompleted`); without it the follow-ups land DURING `~OutputCallBackJs`. Every classify() call in the integration tests was migrated to `cleanupClassifier()`. The removed local C++ wrapper (`DeferredOutputCallBackJs`) was a real lifetime fix but kept us out of step with how the rest of the monorepo handles this; once upstream is patched the sleeps drop everywhere at once. 2. Win32-x64 first-`js_create_double` returns 0.0 ---------------------------------------------- The very first `js_create_double` call in the process returns 0.0 on the Azure GitHub-hosted `windows-2022` runner (clang-cl + bare-runtime + V8). Subsequent calls in the same handle scope are correct. No local Windows repro; only the CI runner image is affected. Other addons accidentally dodge the symptom because their first emitted number is naturally 0 (whisper/parakeet `segment.start`), they assert only `typeof === 'number'` / `!isNaN` (llamacpp-llm stats), they never assert the value (ocr-onnx bbox coords), or they emit no numbers at all (lib-infer-diffusion / llamacpp-embed). Our 3-class softmax sort + sum-to-1 assertions catch the corruption immediately, so no test-side workaround is possible. Local C++ "burn one" workaround in `JsClassifyOutputHandler`'s lambda preamble: a throwaway `js_create_double(env, 0.0, &dummy)` call consumes the broken first slot so the per-element `Number::create` calls below produce the correct value at index 0. Cost is one ephemeral js_number per classify() call. Other follow-ups in this commit (none disturb code paths above): - `addon.js` lifecycle: `unload()` no longer waits on the pending-job promise. The post-unload sleep in `cleanupClassifier` covers the same window, so `unload()` becomes a thin pass-through (matches what every other addon in the monorepo does). - Top-of-file workaround comment in `AddonJs.hpp` consolidated to a 2-line note at the burn-one site (matches the comment density other addons use; full root cause in the report). - `cleanupClassifier` doc trimmed to 3 lines pointing at the report. Local validation on win32-x64: - bare-make build clean - npm run lint clean - npm run test:cpp 29/29 - npm run test:integration 14/14 + 140/140 asserts Files: packages/classification-ggml/addon.js packages/classification-ggml/addon/src/addon/AddonJs.hpp packages/classification-ggml/addon/src/js-interface/binding.cpp packages/classification-ggml/test/integration/classify.test.js packages/classification-ggml/test/integration/error-cases.test.js packages/classification-ggml/test/integration/utils.js Made-with: Cursor * QVAC-17481 chore: adopt upstream WA fixes from PR #1825 Bumps qvac-lib-inference-addon-cpp from 1.1.5#1 to 1.1.6 (the version shipped by PR #1825) and removes the two local workarounds it was brought in to dodge: - Win32 burn-one js_create_double in JsClassifyOutputHandler is gone; upstream's JsUtils::Number::createDouble now applies a process-wide burn-once guard via static-init. - Two-phase sleep around unload() in cleanupClassifier is gone; upstream's ~OutputCallBackJs now defers js_delete_reference into the uv_close callback via a heap-owned State. Local Win32 validation: 14/14 integration tests + 29/29 C++ unit tests pass; in particular the index-0 marshalling assertions and the back-to-back load/unload cycle test that previously SIGSEGV'd both pass without their prior workarounds. Resolves T1 + T10 from the audit; details in remote_logs/issues_report.md. Made-with: Cursor * QVAC-17481 chore[api]: align lifecycle with llamacpp-llm pattern Re-shape the JS layer so request orchestration mirrors the LLM addon (closes T5-T9 from PR #1727 review): - addon.js becomes a thin C++ binding wrapper (mirrors LlamaInterface): constructor takes `(binding, configurationParams, outputCb, logger)`, exposes `activate()` / `runJob()` / `cancel()` / `unload()`. The bespoke `_pending` Promise + `_outputCallback` are gone; export a shared `mapAddonEvent(rawEvent, rawData, rawError)` instead. - index.js becomes the orchestration layer (mirrors LlmLlamacpp): one `exclusiveRunQueue()` serialises load/classify/unload, one `createJobHandler()` owns the active QvacResponse, and the output callback fans events through `_handleAddonOutputEvent`. - load() now does try/catch around `activate()` and best-effort `_addon.unload()` on failure so a partial init never leaves a zombie native handle (T6). - classify() resolves on the terminal stats event rather than the first ClassifyOutput, eliminating the orphan-callback risk that motivated the `_pending` drain on the previous design (T7, T8). Public shape unchanged: still `Promise<Array<{label,confidence}>>`. - unload() runs through the same queue, calls native `cancel()` on in-flight work, fails the active JS request with `Model was unloaded`, then destroys the native handle (T9). mapAddonEvent is keyed on payload shape (Array → Output, plain object → JobEnded terminal) because the upstream JobRunner emits the stats trailer with a raw `std::vector<std::pair<...>>` RTTI name rather than a literal `*JobEnded` event. Documented inline. Local validation: 14/14 integration + 140/140 asserts in 2.8s (down from 8.2s in Group A — the LLM-style cancel/unload is much faster than the prior drain-then-destroy pattern); 29/29 C++ unit tests; standard lint clean. Made-with: Cursor * QVAC-17481 infra: add canonical on-pr + on-pr-close workflows for classification-ggml Adds the two missing top-level workflow files so the addon now has the full 5-file layout used by every other modern addon in the monorepo (`decoder-audio`, `diffusion-cpp`, `ocr-onnx`, `bci-whispercpp`): - `on-pr-classification-ggml.yml` -- canonical PR trigger router. authorize -> changes -> sanity / ts-checks / cpp-lint / prebuild -> integration / mobile -> merge-guard. Path filters scope to `packages/classification-ggml/**` and the addon's own workflow files. - `on-pr-close-classification-ggml.yml` -- mirror of `on-pr-close-decoder-audio.yml`. Triggers `public-delete-npm-versions` with `packages: classification-ggml` to clean up per-PR npm pre-releases on PR close. Closes T11 from PR #1727 review (olyasir: "rename in same format as other pipelines"). The legacy-named `on-pr-qvac-lib-infer-ggml-classification.yml` on the fork PR-1 branch will be removed at sync-to-PR-1 time. The hack-branch dispatch swap (`on-pr-qvac-lib-infer-llamacpp-llm.yml` hijacked + `*-temp.yml` parking) is intentionally left untouched here: new workflows aren't dispatchable from the GitHub Actions UI until they exist on `main`, so the swap is still our only working dispatch path for hack-branch CI runs. Validation: both files parse with `yaml.safe_load`; every workflow / composite-action reference resolves on disk. Co-authored-by: Cursor <cursoragent@cursor.com> * QVAC-17481 doc: trim verbose AI-style comments across the addon Closes T2/T3/T4 from PR #1727 (jesusmb1995: "Please remove this comment, its unnecessary... LLM's are too verbose"), and applies the same four cleanup rules across the rest of …
…ons (label-gate) + finalize CHANGELOG 0.1.0 (#2087) Two follow-ups to the classification-ggml addon (PR #1727): 1) Add the `Authorise (label-gate)` job to `on-pr-classification-ggml.yml`. Align with other addons 2) Drop the `— Unreleased` suffix from the `## [0.1.0]` heading in Will be released soon Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The `test-cpp` job in `cpp-tests-classification.yml` calls `setup-aws-prebuild` (which assumes an AWS IAM role via OIDC) and `setup-vcpkg` (which configures the S3-backed vcpkg binary cache via `VCPKG_BINARY_SOURCES=x-aws,s3://...`). Both actions require GitHub to issue an OIDC token for the job, which only happens when the job (or its workflow) declares `id-token: write`. The workflow's top-level `permissions` block only has `contents: read`. Without `id-token: write`, `aws-actions/configure-aws-credentials` cannot exchange a GitHub OIDC token for AWS credentials, so every `test-cpp` matrix leg fails immediately with: Credentials could not be loaded, please check your action inputs: Could not load credentials from any providers This bug was hidden until now because the label-gate that was added in #2087 was missing from the original addon PR (#1727), so the `cpp-tests` job had always been skipped on PRs. Fix: add a job-level `permissions` block with `id-token: write` on the `test-cpp` job. This matches the existing pattern in `cpp-test-coverage-tts-ggml.yml`, which has the same AWS OIDC requirement and declares the permission identically. Other cpp-tests workflows (embed, llm, diffusion) are not affected: they bootstrap vcpkg directly from GitHub and do not use the S3 binary cache, so they have no OIDC dependency. Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
New Image Classification Addon Based on CNN-Model
Introduces a new inference addon that classifies images into three classes (food / report / other) using a fine-tuned MobileNetV3-Small CNN running on the libggml CPU backend.
What this PR ships
packages/qvac-lib-infer-ggml-classification/publishing as@qvac/classification-ggml:ggml.h/ggml-backend.hAPI — no llama.cpp application-layer dependency, so the addon remains forward-compatible with futurelibggmlupstream merges.eps = 0.001(the architecture- correct value;1e-5causes normalisation drift across all 34 layers). Depthwise separable convolutions, squeeze-and-excite blocks, HardSwish / HardSigmoid / ReLU activations all wired throughggml_conv_2d,ggml_conv_2d_dw,ggml_pool_2d,ggml_hardswish,ggml_hardsigmoid.mobilenet.class_Nmetadata so a future fine-tune can ship different class names without a code change.new ImageClassifier({ modelPath?, logger?, threads?, nativeLogger? })+load()/classify(buffer, opts?)/unload()/destroy(). Accepts JPEG, PNG, or raw-RGB input; validates at the JS layer before reaching native code so no bad input reaches libggml.nativeLoggeropt-in (defaultfalse): the underlyingqvac-lib-inference-addon-cppJsLogger holds a process-wide staticuv_async_tthat is not safe across rapid create/destroy cycles, so the native C++→JS log bridge is disabled unless the caller explicitly opts in. JS-level logging always flows through the caller'slogger.stb_image+stb_image_resize2(bilinear resize to 224×224, ImageNet normalisation, WHCN layout).Build + tests
bare-make+cmake-bare+cmake-vcpkgbuild, targetingggml::ggml/ggml::ggml-base/ggml::ggml-cpuandstbfrom the shared QVAC vcpkg registry.topKfilter, BN epsilon guard, and full preprocessor behaviour.test/images/),topK, raw RGB input, and every error path: null, empty buffer, corrupted JPEG, unsupported format (BMP), mismatched dimensions, pre-load / post-unload, tiny upscale, load/unload cycles.scripts/generate-mobile-integration-tests.js,scripts/validate-mobile-tests.js,test/mobile/ {integration-runtime.cjs, integration.auto.cjs, README.md, testAssets/.gitignore}. The auto-generatedintegration.auto.cjswraps everytest/integration/*.test.jsso the sharedqvac-test-addon-mobileframework picks them up on Android and iOS automatically.CI workflows
Four addon-scoped workflows (path-filtered to this package):
on-pr-qvac-lib-infer-ggml-classification.yml— authorize, sanity checks, TypeScript declaration check, C++ lint, prebuild matrix, desktop integration tests, mobile integration tests, merge-guard.prebuilds-qvac-lib-infer-ggml-classification.yml— Linux x64, Linux arm64, Android arm64, macOS arm64, iOS arm64, Windows x64 prebuild matrix.integration-test-qvac-lib-infer-ggml-classification.yml— desktop end-to-end tests with the shared performance reporter writing a GitHub step summary.integration-mobile-test-qvac-lib-infer-ggml-classification.yml— AWS Device Farm Android + iOS runs via thetetherto/qvac-test-addon-mobileframework.Out of scope for this PR
packages/sdk/**) lands in a follow-up PR after@qvac/classification-ggml@0.1.0is published to npm.