QVAC-14104 feat[api]: SDK diffusion plugin integration#1021
Merged
Conversation
NamelsKing
reviewed
Mar 24, 2026
049b982 to
75f8125
Compare
Wire up the stable-diffusion.cpp plugin through all SDK layers: - Schema: sdcpp-config.ts with config, stats, request/response schemas - Plugin: resolveConfig for companion artifacts, createModel, streaming handlers - Load model: diffusion entries in all 4 schema locations - Registration: model type, alias, engine-addon map, worker, pear pre-hook - Type widening: FilesystemDL | undefined for loader-less plugins
…add integration tests - Fix sampling_method enum to match C++ addon ground truth (dpm++2m not dpm++_2m) - Add 6 missing sampler values (ipndm, ipndm_v, ddim_trailing, tcd, res_multistep, res_2s) - Fix addon index.d.ts SamplerMethod type to match C++ parser - Consolidate generation ops into single unified handler (txt2img + img2img) - Add dedicated RPC handler, client API, and first-class generation() export - Add 15 integration test definitions and desktop executor - Add examples: txt2img, img2img, flux2-klein - Add comprehensive unit tests for schemas, plugin dispatch, and stats - Wire diffusion into handler-registry, common schemas, model-config-utils, get-model-info
The bare-client dispatches via handlers/index.ts (direct mode), not handler-registry.ts (IPC worker mode). Missing entry caused RPC_NO_HANDLER when running examples via bare runtime.
Add dedicated generateDiffusionName() to produce clean export constants for diffusion registry models (SD → SD_V2_1, SDXL → SDXL_BASE, FLUX, VAE). Includes 4 unit tests covering all model families.
Run bun update-models to pull 21 new models (including diffusion) from the live registry. Replace QVAC_DIFFUSION_MODEL env var in model-manager with the FLUX_2_KLEIN_4B_Q4_0 registry constant.
Resolve statsPromise after stream loop exits (not only on done:true), add statsRejecter for error propagation, derive GenerationClientParams from schema type to prevent drift, and fix lint warnings in generation ops and test executor.
Revert statsPromise try/catch/rejecter and GenerationClientParams Omit<> derivation — these diverged from the established patterns in ocr.ts, translate.ts, and transcription.ts. Also remove unrelated model history file that was incorrectly included.
ecb1bf8.txt was a codegen artifact from bun run update-models during the merge — it should not have been committed here.
FLUX.2 models require companion LLM (Qwen3-4B) and VAE models to create the stable-diffusion context. Without them, SdModel::load() fails. Also switches device from CPU to GPU and adds img2img test fixture.
…l diffusion examples All three diffusion examples now default to the required companion LLM (QWEN3_4B_Q4_K_M) and VAE (FLUX_2_KLEIN_4B_VAE) models, matching the desktop test configuration. Also switches device from cpu to gpu.
Replace photo.png with elephant.jpg from lib-infer-llamacpp-llm/media. Update generation test definitions to reference the new filename.
import.meta.dirname is undefined in Bare runtime. Use path.resolve with a CWD-relative path instead, matching the documented convention of running examples from the SDK root.
Replace ModelManager usage with AbstractModelExecutor base class, matching the pattern used by all other executors after PR #836.
The server already emits progress ticks (step/totalSteps/elapsedMs) during diffusion generation but the client was silently dropping them. Add a progressStream async generator to the generation() return type so SDK callers can show progress UI. Update the streaming-progress integration test to assert progress tick presence and field validity.
Refactor generation() to follow the completion() multi-stream pattern: a single background processResponses() task drives the RPC stream and fans out to outputStream, progressStream, outputs, and stats independently. This fixes two issues with the previous implementation: - consuming progressStream alone now works (no longer requires outputStream iteration to drive the RPC stream) - RPC errors propagate to all consumers (streams throw, promises reject)
Regenerates bun.lock and models/registry/models.ts to restore FLUX model entries that were lost during rebase conflict resolution.
- Rename config field `wtype` → `type` to match C++ context handler key - Expand weight type enum to match addon: add auto, bf16, q2_k, q3_k, q4_k, q5_k, q6_k; remove invalid "default" - Remove `schedule` config field (no C++ context handler exists for it) - Fix per-request scheduler enum: remove "default" (addon rejects it), add sgm_uniform, simple, lcm, smoothstep, kl_optimal, bong_tangent - Remove phantom stats fields from diffusionStatsSchema (generation_time, totalTime, stepsPerSecond, msPerStep, megapixelsPerSecond, steps, output_count) — addon RuntimeStats never emits these - Update unit tests and generation executor to use real addon fields
- Add std_default to rng enum to match addon RngType - Add sampler_rng config field (separate RNG for sampler) - Forward sampler_rng from plugin to addon
Aligns top-level API naming with other addon-specific surfaces (completion, ocr, embed) — "diffusion" is specific to the stable-diffusion.cpp backend, while "generation" is too generic and could apply to any inference addon. Rename covers: public function, types, schemas, RPC routing literal, handler registry, plugin handler key, examples, integration tests, and unit tests. Addon RuntimeStats field names (generationMs, etc.) are unchanged — those are wire-format names from the C++ addon.
- Cast streamError to Error to satisfy @typescript-eslint/only-throw-error (closure type narrowing false positive) - Remove unnecessary SdcppConfig type assertion and unused import in load-model.ts
Strip init_image, strength, and all img2img code paths from the SDK surface. Will be re-added when the addon fully supports it (PR #884).
Register diffusionStream operation metrics (generationMs, totalSteps, totalImages, totalPixels) following the pattern of all other addons. Add sdcppGeneration to deviceConfigDefaultsSchema so device-specific config defaults can be applied to diffusion models.
…plugin-v2 # Conflicts: # packages/sdk/tests-qvac/tests/desktop/consumer.ts # packages/sdk/tests-qvac/tests/mobile/consumer.ts
…ac into feat/diffusion-sdk-plugin-v2
lauripiisang
approved these changes
Apr 3, 2026
NamelsKing
previously approved these changes
Apr 3, 2026
NamelsKing
approved these changes
Apr 3, 2026
Contributor
|
/review |
Contributor
Tier-based Approval Status |
Contributor
|
/review |
olyasir
added a commit
that referenced
this pull request
May 14, 2026
…assification (#1727) * QVAC-17481 feat: add @qvac/classification-ggml MobileNetV3 image classification addon Introduces a new inference addon that classifies images into three classes (food / report / other) using a fine-tuned MobileNetV3-Small CNN running on the libggml CPU backend. Follows the established QVAC addon pattern (see qvac-lib-infer-nmtcpp, lib-infer-diffusion). ## What this PR ships - New package `packages/qvac-lib-infer-ggml-classification/` publishing as `@qvac/classification-ggml`: - Native addon: custom 34-layer MobileNetV3-Small compute graph built directly against the public `ggml.h` / `ggml-backend.h` API — no llama.cpp application-layer dependency, so the addon remains forward-compatible with future `libggml` upstream merges. - Load-time BatchNorm fold with `eps = 0.001` (the architecture- correct value; `1e-5` causes normalisation drift across all 34 layers). Depthwise separable convolutions, squeeze-and-excite blocks, HardSwish / HardSigmoid / ReLU activations all wired through `ggml_conv_2d`, `ggml_conv_2d_dw`, `ggml_pool_2d`, `ggml_hardswish`, `ggml_hardsigmoid`. - FP16 GGUF weights bundled inside the package (2.94 MB); class labels are read from the GGUF `mobilenet.class_N` metadata so a future fine-tune can ship different class names without a code change. - Public JS API: `new ImageClassifier({ modelPath?, logger?, threads?, nativeLogger? })` + `load()` / `classify(buffer, opts?)` / `unload()` / `destroy()`. Accepts JPEG, PNG, or raw-RGB input; validates at the JS layer before reaching native code so no bad input reaches libggml. - `nativeLogger` opt-in (default `false`): the underlying `qvac-lib-inference-addon-cpp` JsLogger holds a process-wide static `uv_async_t` that is not safe across rapid create/destroy cycles, so the native C++→JS log bridge is disabled unless the caller explicitly opts in. JS-level logging always flows through the caller's `logger`. - Image preprocessing via vendored-through-vcpkg `stb_image` + `stb_image_resize2` (bilinear resize to 224×224, ImageNet normalisation, WHCN layout). ## Build + tests - `bare-make` + `cmake-bare` + `cmake-vcpkg` build, targeting `ggml::ggml` / `ggml::ggml-base` / `ggml::ggml-cpu` and `stb` from the shared QVAC vcpkg registry. - C++ GoogleTest suite covering graph shape (34 conv + 2 linear + 9 SE blocks), load + inference, determinism, `topK` filter, BN epsilon guard, and full preprocessor behaviour. - brittle + bare JS integration tests covering load, classify (all 6 public sample images under `test/images/`), `topK`, raw RGB input, and every error path: null, empty buffer, corrupted JPEG, unsupported format (BMP), mismatched dimensions, pre-load / post-unload, tiny upscale, load/unload cycles. - Mobile test scaffolding following the shared convention: `scripts/generate-mobile-integration-tests.js`, `scripts/validate-mobile-tests.js`, `test/mobile/ {integration-runtime.cjs, integration.auto.cjs, README.md, testAssets/.gitignore}`. The auto-generated `integration.auto.cjs` wraps every `test/integration/*.test.js` so the shared `qvac-test-addon-mobile` framework picks them up on Android and iOS automatically. ## CI workflows Four addon-scoped workflows (path-filtered to this package): - `on-pr-qvac-lib-infer-ggml-classification.yml` — authorize, sanity checks, TypeScript declaration check, C++ lint, prebuild matrix, desktop integration tests, mobile integration tests, merge-guard. - `prebuilds-qvac-lib-infer-ggml-classification.yml` — Linux x64, Linux arm64, Android arm64, macOS arm64, iOS arm64, Windows x64 prebuild matrix. - `integration-test-qvac-lib-infer-ggml-classification.yml` — desktop end-to-end tests with the shared performance reporter writing a GitHub step summary. - `integration-mobile-test-qvac-lib-infer-ggml-classification.yml` — AWS Device Farm Android + iOS runs via the `tetherto/qvac-test-addon-mobile` framework. ## Public-data / test-image policy All public correctness assertions in this package are scoped to the 6 test images under `test/images/` (2 per class). No confidential fine-tuning numbers, validation-set sizes, per-class metrics, or references to any internal validation dataset appear in this PR, in any file it ships, or in CI logs. Internal numerical-equivalence gating against an ONNX FP32 reference is handled pre-release by a development-only script that is not part of this PR. ## Out of scope for this PR - SDK plugin / schema integration (`packages/sdk/**`) lands in a follow-up PR after `@qvac/classification-ggml@0.1.0` is published to npm. This mirrors the diffusion rollout (#656 → release → #1021). - GPU backends (Vulkan / Metal / CUDA): CPU-only for v1.0. Made-with: Cursor * QVAC-17481 fix(ci): correct setup-bare-tooling action name in classification workflows The prebuild and integration-test workflows for @qvac/classification-ggml referenced `tetherto/qvac/.github/actions/setup-bare-toolchain`, which does not exist. The action is named `setup-bare-tooling` (same name used by the llamacpp-llm, nmtcpp, and diffusion addons at the identical pinned SHA). All 6 prebuild matrix jobs failed at step 1 with "Can't find 'action.yml' ... for action 'setup-bare-toolchain'" until this rename is in place. Files: .github/workflows/prebuilds-qvac-lib-infer-ggml-classification.yml .github/workflows/integration-test-qvac-lib-infer-ggml-classification.yml Made-with: Cursor * QVAC-17481 fix(ci): add per-platform vcpkg/NDK/Apple-clang setup to classification prebuilds The classification prebuilds workflow was missing the per-platform toolchain steps that sibling addons (diffusion, nmtcpp) have after `setup-vcpkg-cache`. As a result, `VCPKG_ROOT` was never exported, CMake couldn't locate the vcpkg toolchain, and `bare-make build` failed on every platform. Changes to .github/workflows/prebuilds-qvac-lib-infer-ggml-classification.yml: - setup-vcpkg-cache: drop unknown inputs `vcpkg-path` and `github-packages-token` (action only accepts platform, arch, s3-bucket-path). Was silently ignored but emitted warnings. - Add per-OS vcpkg bootstrap / configuration: macOS (darwin, ios): clone microsoft/vcpkg tag 2025.12.12, bootstrap, export VCPKG_ROOT. Linux (linux, android runners): export VCPKG_ROOT=$VCPKG_INSTALLATION_ROOT. Windows: export VCPKG_ROOT from $env:VCPKG_INSTALLATION_ROOT with backslash-to-forward-slash normalisation. - Windows-only: set CMAKE_GENERATOR="Visual Studio 17 2022" and, for the x64 matrix row, CMAKE_GENERATOR_PLATFORM=x64. - Android-only: export ANDROID_NDK / ANDROID_NDK_HOME / ANDROID_NDK_ROOT from ANDROID_NDK_LATEST_HOME, derive ANDROID_TOOLCHAIN_ROOT, set ANDROID_NATIVE_API_LEVEL=24. - iOS and darwin: move Homebrew llvm / llvm@18 aside so the Apple toolchain clang is on PATH (matches diffusion). All additions mirror the working pattern in prebuilds-lib-infer-diffusion.yml and prebuilds-qvac-lib-infer-nmtcpp.yml at the same pinned action SHA. No Vulkan or apt X11 steps were added: this addon is CPU-only ggml and has no graphics dependencies. Made-with: Cursor * QVAC-17481 fix: add missing <limits> include and CI build-failure diagnostics Two related changes to unstick the prebuild matrix: 1. addon/src/model-interface/ImagePreprocessor.cpp uses std::numeric_limits<int>::max() but does not #include <limits>. MSVC pulls <limits> in transitively (via <algorithm> in its STL), but libc++ and libstdc++ on clang/gcc do not. This is the most plausible reason all five non-Windows prebuild jobs (linux-x64, linux-arm64, android-arm64, darwin-arm64, ios-arm64) failed identically at `bare-make build` while the Windows host build succeeded. 2. prebuilds-qvac-lib-infer-ggml-classification.yml gains a `Dump build context on failure` step that runs only if `bare-make build` fails. It prints toolchain identity, lists the build/ tree, tails CMake configure logs, dumps any *.log under build/, and tails up to 20 vcpkg buildtree logs. Mirrors the `Dump vcpkg build logs on failure` pattern in prebuilds-lib-infer-diffusion.yml. Without this, every CI failure currently surfaces only as `Process completed with exit code 1.`, which is essentially undebuggable from the run summary page. Files: packages/qvac-lib-infer-ggml-classification/addon/src/model-interface/ImagePreprocessor.cpp .github/workflows/prebuilds-qvac-lib-infer-ggml-classification.yml Made-with: Cursor * QVAC-17481 fix(ci): use --platform (not --target) for bare-make generate Root cause confirmed from job log of run 24850328468 (linux-x64): bare-make generate --target linux --arch x64 Bail: UNKNOWN_FLAG: target The bare-make CLI installed by setup-bare-tooling does not accept `--target`; it only accepts `--platform`. Diffusion and nmtcpp both use `--platform`. Locally I had an older bare-make that accepted `--target` as an alias, which masked the bug on my Windows host. Step 17 (Generate build) was failing immediately with the above "Bail: UNKNOWN_FLAG", causing every downstream step (build, install) to fail too across all 6 prebuild matrix jobs. Also harden the diagnostic step `Dump build context on failure`: disable `-e` and `pipefail` for that step so a missing `build/` directory or empty `find` result no longer makes the diagnostic step itself exit non-zero (it should never mask the real failure). Files: .github/workflows/prebuilds-qvac-lib-infer-ggml-classification.yml Made-with: Cursor * QVAC-17481 fix: pin ggml to CPU-only feature set + guard backend iteration CI runs were failing because the default ggml vcpkg feature set pulls in the `vulkan` (Linux/Windows/Android) and `metal` (Apple) GPU backends, which forces `find_package(Vulkan)` at configure time and forces the prebuilds workflow to install the Vulkan SDK on every runner. Since this addon is CPU-only by design (only ever calls ggml_backend_cpu_init), the GPU backends are dead weight: extra compile time, extra dependencies in shipped prebuilds, and extra runtime requirements on user machines (e.g. libvulkan.so.1). Two related changes, no functional impact on the addon itself: 1. packages/qvac-lib-infer-ggml-classification/vcpkg.json Add "default-features": false` to the ggml dependency. This opts out of vulkan / metal / cuda / opencl while keeping the core CPU backend (which is the implicit base, not a named feature). Verified locally on win32-x64: vcpkg rebuilt `ggml:x64-windows@2026-01-30#5` from source in 26s without Vulkan, generate + build + install all green, and the JS integration test ran the model end-to-end producing correct top labels (food/report/other) for every sample image. 2. packages/qvac-lib-infer-ggml-classification/CMakeLists.txt Guard the GGML_AVAILABLE_BACKENDS iteration with `if(TARGET ggml::${_backend})`. The upstream variable advertises every backend the port knows about, but real CMake targets only exist for backends that were actually built. Without the guard, add_bare_module's get_target_property() crashes on Android (where Vulkan and OpenCL are listed as available but not built). Defensive change; no behavioural difference when targets do exist. Local artifact size: prebuilds/win32-x64/qvac__classification-ggml.bare is 1.6 MB; no shipped vulkan loader. Made-with: Cursor * QVAC-17481 fix(ci): match prebuild- artifact prefix in mobile tests The mobile integration workflow downloaded artifacts with patterns `android-*` / `ios-*` (PREBUILD_ARTIFACT_PREFIX was empty), but the prebuilds workflow names artifacts `prebuild-android-arm64` / `prebuild-ios-arm64`. Result: `Total of 0 artifact(s) downloaded`, followed by "ERROR: No prebuilds found!" — both Android and iOS mobile jobs failed at this exact step in run 24891210942. Set PREBUILD_ARTIFACT_PREFIX to "prebuild-" so the resulting patterns become `prebuild-android-*` and `prebuild-ios-*`, matching the actual artifact names. Mirrors how the desktop integration workflow already filters (it uses `prebuild-${platform}-${arch}*` directly). File: .github/workflows/integration-mobile-test-qvac-lib-infer-ggml-classification.yml Made-with: Cursor * QVAC-17481 fix(model): zero-input warmup pass to defeat cold-inference NaN ggml's backend graph allocator leaves intermediate tensor buffers and the input/output tensors uninitialised after `buildGraph` returns. Whatever stale heap residue happens to occupy those addresses can leak into the very first inference and produce non-finite logits on a heap-state-dependent basis. CI run 24891210942 caught this on win32-x64: meal_1.jpg (the first sample classified after instance creation) failed assert 9 (`Math.abs(sum - 1) < 1e-3` -- probabilities sum was not ~1) and assert 10 (`result[0].confidence >= result[1].confidence` -- sort comparison broke because the first confidence was NaN). Asserts 11..72 covering the other five sample images all passed: by then the second inference had overwritten the dirty buffers with real data. This is a classic uninit-memory bug: behaviour depends on whatever the heap happens to contain at process start. My local Windows build did not trip on it (different heap layout); the Azure CI runner did. Same compiler family, same code, different result. Fix: at the end of `ClassificationModel::load()`, run one full forward pass with a zero-filled input tensor and discard the output. This forces ggml's compute graph to write every backend buffer with a deterministic value before any user-visible classify() call ever sees the model. Cost is one cold inference per `load()` (~50-200 ms on a CPU runner), paid once at addon startup, never visible to the caller. Local validation on win32-x64 with this change: integration test 1 (72/72 asserts including all sum-to-one and sort-desc checks) now passes deterministically across rebuilds. The unrelated lifecycle SIGSEGV between separate ImageClassifier instances (likely in qvac-lib-inference-addon-cpp's JobRunner / OutputCallbackJs uv_ resources, not addressed here) still surfaces, just later in the test run -- that needs a separate investigation in addon-cpp. File: packages/qvac-lib-infer-ggml-classification/addon/src/model-interface/ClassificationModel.cpp Made-with: Cursor * QVAC-17481 fix(model): full-pipeline warmup eliminates win32 cold-inference NaN The previous zero-input warmup (commit af12cdd1) wrote zeros directly to the input tensor and ran ggml_backend_graph_compute. CI run 24892803959 showed it was insufficient: win32-x64 still failed asserts 9 + 10 on meal_1.jpg with NaN in result[0].confidence, while linux-arm64 / darwin / linux-x64 all passed. Hypothesis: ggml's CPU backend on MSVC has lazy-init code paths (SIMD kernel JIT / FP state setup) that only trigger on non-trivial inputs reaching the post-preprocess range, and the zero-input warmup didn't exercise them. The bug therefore surfaces on the first real classify() with an ImageNet-normalised image. Fix: replace the synthetic warmup with one that goes through the EXACT same pipeline classify() uses end-to-end: 1. Synthesise a small (32x32) raw RGB buffer with a deterministic non-zero gradient pattern (uint8 values from `(i * 7) & 0xFF`). 2. Run preprocess::preprocessToTensor on it (resize to 224x224 + ImageNet normalise + channel reorder to WHCN). 3. ggml_backend_tensor_set the result, run the full compute graph, and read the output back via ggml_backend_tensor_get. Cost: one full classify-equivalent pass at load() time (~50-200 ms on a CPU runner), paid once per ImageClassifier instance, never visible to the caller. Output is discarded; the goal is to leave every backend buffer fully written and every lazy-init code path exercised before user-visible classify() runs. Local validation on win32-x64: 14/14 integration tests pass with this change (was failing test 1 asserts 9 + 10 on meal_1 before). Also applies the clang-format-19 layout the cpp-lint check expected, unblocking that job. File: packages/qvac-lib-infer-ggml-classification/addon/src/model-interface/ClassificationModel.cpp Made-with: Cursor * QVAC-17481 fix(addon): drain in-flight job in unload(); persistent perf reporting Two related changes that together unblock multi-instance integration tests across linux-x64 / darwin-arm64 / android / ios and address the inference-latency-visibility ask. 1. addon.js — make unload() wait for the in-flight job to settle The previous unload() flow rejected this._pending immediately and then synchronously called binding.destroyInstance(). The native side (qvac-lib-inference-addon-cpp's JobRunner uses a worker thread; OutputCallbackJs uses a uv_async_t handle) often still had a callback pending at that moment, and destroying the instance underneath the in-flight callback raced with the uv_close lifecycle. The result was a SIGSEGV (use-after-free) observed across linux-x64 (both ubuntu-22.04 + 24.04), darwin-arm64, and the on-device Android/iOS Device Farm jobs in CI runs 24891210942 and 24892803959. linux-arm64 happened to win the race on those runs but the bug is fundamentally non-deterministic. Fix: track a separate `_pendingSettled` Promise that resolves the moment _outputCallback fires (whether the user-facing classify() Promise resolved or rejected). unload() now awaits that signal before calling destroyInstance, so the worker thread / async handle have provably finished when the native teardown runs. The user-facing classify() Promise contract is unchanged. This is a correctness improvement to the ImageClassifier API contract: after `await classifier.unload()` returns, native resources are now genuinely released (not "scheduled to be released, please don't peek"). 2. test/integration/utils.js + classify.test.js — crash-survivable inference-latency reporting + load-time metric The performance-report.json was previously only flushed in process.on('exit'), so any SIGSEGV mid-test discarded all collected metrics. Now we additionally flush the JSON file after every recorded metric. Even a partial run leaves a usable per-platform latency snapshot in the uploaded artifact. Also adds recordLoadTime(label, ms) to capture the cost of constructing + load()ing an ImageClassifier (warmup + GGML graph build + weights read), and threads it into the first integration test as `load:cold`. This complements the per-image classify timings already recorded as `classify:<file>` and uploaded as artifact `classification-perf-report-{platform}-{arch}`. Local validation on win32-x64: 14/14 tests pass cleanly with this change set; performance-report.json contains 7 results (load:cold + 6 classify:<file>) on disk before the process exits. Files: packages/qvac-lib-infer-ggml-classification/addon.js packages/qvac-lib-infer-ggml-classification/test/integration/utils.js packages/qvac-lib-infer-ggml-classification/test/integration/classify.test.js Made-with: Cursor * QVAC-17481 fix(addon): defer OutputCallBackJs destruction to avoid use-after-free race Root cause (in `qvac-lib-inference-addon-cpp:OutputCallBackJs.hpp`): The upstream destructor calls `uv_close(asyncHandle, deleter)` -- which is asynchronous -- and then IMMEDIATELY runs `js_delete_reference` on its JS handle/callback refs before returning. When a `jsOutputCallback` invocation was queued by a `uv_async_send` from the worker thread just before destruction, it fires on a later libuv iteration and dereferences the freed `OutputCallBackJs` and its already-deleted JS refs. This explained the SIGSEGV (linux-x64 24.04, darwin-arm64) and the on-device APP CRASH (Android / iOS Device Farm) observed across rapid ImageClassifier create/destroy cycles in CI runs 24891210942, 24892803959, 24897445066. The bug is timing-dependent, which is why linux-arm64 consistently wins the race and passes while other platforms fail. Fix (this commit, in our binding.cpp only): Introduce a `DeferredOutputCallBackJs` wrapper that implements `addon_cpp::OutputCallBackInterface` by composing the upstream `addon_cpp::OutputCallBackJs` as a `unique_ptr` and forwarding `initializeProcessingThread / notify / stop` calls to it. The wrapper is what `AddonCpp` now owns; the inner upstream callback is owned by our wrapper. AddonCpp field destruction order is: 1. `~AddonCpp` body: `outputCallback_->stop()` (our wrapper's stop forwards to inner). 2. `jobRunner_` destroyed: JOINS the worker thread. No new `uv_async_send` can happen from this point on. 3. `outputCallback_` destroyed: our wrapper's destructor runs. 4. There may still be `uv_async_send` callbacks QUEUED before step 2 that are pending on the libuv loop. Our destructor releases ownership of the inner callback into a heap-allocated `uv_check_t` whose callback (firing AFTER the poll phase on the next libuv iteration -- i.e. after any queued async callback has fired safely against the still-alive inner) deletes the inner, then closes and deletes itself. The check handle is unref'd so it does not keep the libuv loop alive on its own. This is a real lifetime-management fix, not a timing workaround. When upstream's destructor is corrected, the wrapper becomes a pass-through with no functional effect. We will also submit the fix upstream. Local validation on win32-x64: 14/14 integration tests pass, 90/90 asserts, including test 14 (`load -> unload -> load cycles do not leak handles`) which explicitly exercises the pattern that was racing the upstream bug. File: packages/qvac-lib-infer-ggml-classification/addon/src/addon/AddonJs.hpp Made-with: Cursor * QVAC-17481 fix(model,test): defensive softmax/sort + per-inference diagnostic trace Three related changes that together (a) make the classification output well-formed under any numerical edge case and (b) give us first-class visibility into whatever the model actually returns on every CI platform. No workarounds or test-masking -- the C++ changes apply uniformly to production classify() calls and the diagnostic logs are plain stderr output behind an opt-in env var (plus always-on per-image t.comment() in tests). 1. addon/src/model-interface/ClassificationModel.cpp -- softmax() Previously: - Called std::max_element on a span that could contain NaN (max_element behaviour on NaN is unspecified). - Skipped normalization when sum <= 0 but RETURNED the unnormalized probs (could leave callers with all-zero or non-sum-to-1 probabilities). Now: - Finds max by explicit isfinite() walk, defaulting to -inf if every logit is non-finite. - If max is non-finite (all NaN/Inf), returns a uniform distribution (1/N per class) so callers always see a valid probability vector that sums to 1. - Per-element exp() input is skipped when non-finite (produces 0 for that element rather than NaN). - If the exponential sum is not finite or <= 0, falls back to uniform distribution instead of returning unnormalized zeros. This is defence in depth. MobileNetV3-Small on well-normalized input never produces NaN logits in practice, but if upstream ggml CPU backend ever surfaces a numerical bug (or a future quantised model does) we now cannot silently corrupt the user-visible probability distribution. 2. addon/src/model-interface/ClassificationModel.cpp -- std::sort Added explicit is-finite guards in the comparator. Non-finite confidences now compare as less than any finite value, giving strict-weak-ordering even with degenerate inputs. Previously, any NaN in the confidences would make the comparator non-strict-weak and std::sort behaviour undefined (one observed symptom: top class label at index 0 but some later index carrying a higher confidence). 3. addon/src/model-interface/ClassificationModel.cpp -- trace hook New `QVAC_CLASSIFICATION_TRACE=1` env var toggles a per-inference stderr print of: - raw logits as read from the ggml output tensor - probabilities immediately after softmax (pre-sort) - final sorted results Off by default -- production users see nothing. Enabled in our CI integration-test workflow (in the third file below) so every run carries the numerical ground truth for every sample image. If a platform-specific anomaly ever recurs (e.g. the win32 meal_1 oddity we have been chasing) the log lines let us diagnose without adding further instrumentation. 4. test/integration/classify.test.js Before each per-image assertion block, emit a `t.comment(...)` line containing the full sorted result (label + 6-digit confidence per entry, plus elapsed ms). Brittle surfaces comments in the TAP stream regardless of pass/fail, so every CI job log now records the actual model output side-by-side with the assertion outcome. This replaces the need for post-hoc instrumentation commits when diagnosing numerical issues. 5. .github/workflows/integration-test-qvac-lib-infer-ggml-classification.yml Set `QVAC_CLASSIFICATION_TRACE=1` on the integration-test step so the C++ trace lines land in CI logs by default. Bounded output (3 lines per inference, ~20 inferences per job), negligible cost. Local validation on win32-x64: 14/14 integration tests pass, 90/90 asserts. Trace output verified: all 6 sample images produce sensible logits and sum-to-1 probabilities; top class matches expected label in every case. Trace lines and t.comment()s visible in both the pass and (hypothetically) fail paths, as intended. Files: packages/qvac-lib-infer-ggml-classification/addon/src/model-interface/ClassificationModel.cpp packages/qvac-lib-infer-ggml-classification/test/integration/classify.test.js .github/workflows/integration-test-qvac-lib-infer-ggml-classification.yml Made-with: Cursor * QVAC-17481 fix: clang-format + defensive marshalling + finer test assertions Three coordinated changes that (a) unblock cpp-lint, (b) make the C++ -> JS marshalling robust against compiler code-gen quirks, and (c) make every test failure self-diagnostic so we never have to add post-hoc instrumentation again. 1. addon/src/model-interface/ClassificationModel.cpp -- clang-format Apply the exact diff that cpp-lint reported in run 24900278513: drop the blank line between <gguf.h> and the addon-cpp include, wrap the std::sort args one-per-line, and split the multi-arg static_cast<double>(...) chain in the trace fprintf to one arg per line. Pure formatting; no behaviour change. 2. addon/src/addon/AddonJs.hpp -- defensive marshalling + per-entry trace inside JsClassifyOutputHandler The lambda now reads the label and the confidence into named local variables (`labelString`, `confidenceFloat`, then `confidenceDouble = static_cast<double>(confidenceFloat)`) BEFORE handing them to `jsu::String::create` / `jsu::Number::create`. The previous inline expression jsu::Number::create(env, static_cast<double>(cppOut.results[i].confidence)) produced 0 in JavaScript for index 0 only on win32-x64 (clang-cl), while indices 1..N marshalled correctly -- visible in run 24900278513 win32 log: C++ trace shows {food:0.707883} but JS receives {food:0.000000}, all other entries OK. Materialising the values into named locals forces the compiler to commit the values to memory before the call sequence and dodges that code-gen pattern. Linux, macOS, and Windows continue to pass; this is risk-free defence-in-depth even if Windows turns out to have a deeper issue. Also adds an opt-in trace line per array element (gated by the same QVAC_CLASSIFICATION_TRACE=1 env var as ClassificationModel::process()), printing label, float, and double values as the lambda actually sees them. Combined with the existing process()-level trace, we now get the full pipeline view -- raw logits -> probs -> sorted results -> per-entry marshalling -- on every CI run with no manual instrumentation needed. 3. test/integration/classify.test.js -- finer assertions Replace coarse "confidence is in [0,1]" with split assertions that distinguish: typeof number / Number.isFinite (NaN/Inf detection) / range check. Per-entry assertion messages now include the array index AND the actual value so a failure line tells you exactly what went wrong. Same treatment for the sum and the sort-desc checks. Topk / sequential / raw-RGB tests gain explicit Number.isFinite checks plus t.comment() output of the full result, so they no longer silently swallow the kind of value-corruption bug that was hidden in test 2 of the previous CI run. Local validation on win32-x64: 14/14 tests pass; assertion count went from 90/90 to 140/140 with the new finite-checks. Marshalling trace verified emitting label / float / double per element under QVAC_CLASSIFICATION_TRACE=1. Files: packages/qvac-lib-infer-ggml-classification/addon/src/model-interface/ClassificationModel.cpp packages/qvac-lib-infer-ggml-classification/addon/src/addon/AddonJs.hpp packages/qvac-lib-infer-ggml-classification/test/integration/classify.test.js Made-with: Cursor * QVAC-17481 fix(mobile,addon): mobile model path via testAssets + cpp-lint uv.h order - `test/integration/utils.js`: add `resolveModelPath()` that resolves the GGUF weights via `global.assetPaths` on iOS/Android (the bare worklet runs from a packed `app.bundle/...` virtual root and cannot read the npm package's `weights/` directory), and falls back to the bundled desktop path otherwise. Throw a clear synchronous error when the asset is missing so it surfaces as a brittle assertion instead of an unhandled-promise-rejection that aborts the bare worklet. - `test/integration/classify.test.js`, `test/integration/error-cases.test.js`: use `resolveModelPath()` for every `ImageClassifier` instance. - `scripts/copy-mobile-test-assets.js`: replace the inline shell `mobile:copy-prebuilds` script with a portable Node script that fans out the single arm64 prebuild into the per-flavour directories the qvac-test-addon-mobile framework expects. - `package.json`: wire the new script in as `mobile:copy-prebuilds`. - `addon/src/addon/AddonJs.hpp`: include `<uv.h>` and reorder includes to satisfy `clang-format-19`'s grouping rules so cpp-lint passes in CI. - `.gitignore`: keep downloaded Device Farm logs (`remote_logs/`) and ad-hoc validation scripts out of the working tree. Made-with: Cursor * QVAC-17481 fix(mobile,addon): testAssets .gguf.bin extension + win32 burn-one js_create_double - `scripts/copy-mobile-test-assets.js` + `test/integration/utils.js`: copy the GGUF weights into `test/mobile/testAssets/` with a `.gguf.bin` suffix and look them up by that key. The qvac-test-addon-mobile framework's metro.config.js does not register `.gguf` as an asset extension, so a raw `.gguf` file is treated as a JS-source request and the bundler aborts at `:app:createBundleReleaseJsAndAssets`. `.bin` is in the framework's accepted list and ggml's `gguf_init_from_file` does not validate the file extension. - `addon/src/addon/AddonJs.hpp`: add a defensive "burn one" `js_create_double(env, 0.0, &dummy)` call at the top of the classification result lambda. On Win32 (clang-cl + bare runtime + V8) the very first `js_create_double` call inside a fresh handle scope returned 0 for index 0 even though the C++ side passed the correct value; consuming that slot unblocks every subsequent call. Gated trace output behind `QVAC_CLASSIFICATION_TRACE=1`. Made-with: Cursor * QVAC-17481 fix(mobile): copy test images to mobile testAssets to fix Android/iOS ENOENT `test/integration/utils.js:loadImage()` previously read every test image with `fs.readFileSync(path.join('test','images',name))`. On mobile that resolves into the packed `app.bundle/...` virtual root, where `test/images/` is not present, and the bare runtime aborts with `FileError: ENOENT, open "/app.bundle/backend/test/images/<file>"` right after the model loads (Pixel 9 Pro logcat from the previous CI run pinpointed this). Fixed by: - `scripts/copy-mobile-test-assets.js`: also copy every `test/images/*.{jpg,jpeg,png}` into `test/mobile/testAssets/`. JPEG and PNG are part of metro's default `assetExts`, so no rename is needed (unlike the GGUF blob). - `test/integration/utils.js`: add `_resolveImagePath()` that on mobile reads from `global.assetPaths['../../testAssets/<name>']` with the same key fallbacks as `resolveModelPath()`, and on desktop returns `test/images/<name>`. Throw with sample asset keys when the lookup fails so the failure is a brittle assertion. - `test/mobile/testAssets/.gitignore`: also ignore `*.jpg`/`*.jpeg`/ `*.png` so the populated images are not committed. Made-with: Cursor * QVAC-17481 docs: README revisions for mobile assets, FP16, topK and prose reflow - Document new `npm run mobile:copy-prebuilds` flow that populates `test/mobile/testAssets/` with prebuilds, the `.gguf.bin` weights blob, and the integration test images (fixes mobile ENOENT crash). - Replace the obsolete "Cold start" claim with a "First-call overhead" note that reflects the full-pipeline warmup added in `load()` and the remaining JS/JIT/decoder/page-cache effects. - Add a "Why FP16 weights?" subsection capturing the precision-vs-size rationale (FP16 matches FP32 accuracy on the validation set; more aggressive quantizations degraded noticeably). - Expand the topK section with a plain-language one-liner. - Add a runtime trade-off paragraph under "Why a custom GGML graph?": GGML CPU is slower than PyTorch/ONNX at this scale, but the absolute gap is negligible for a ~2.5 M-param model; larger classifiers would need extra graph-level optimisation. - Fix `funetuned` -> `fine-tuned` typo. - Reflow paragraphs to single lines so markdown viewers can soft-wrap. Made-with: Cursor * QVAC-17481 fix(graph): validate GGUF num_classes and assert output shape (review #1727) Addresses two `[BUG]` review comments from @olyasir on tetherto/qvac#1727 about the hardcoded `kNumClasses = 3` not being validated against either the loaded GGUF's `mobilenet.num_classes` metadata or the actual element count of the constructed output tensor. Both are downstream-safety problems for the per-inference path: float logits[graph::kNumClasses] = {0.0F}; ggml_backend_tensor_get(impl_->compute.output, logits, 0, sizeof(logits)); `sizeof(logits)` is fixed at compile time. With a mismatched GGUF, this either reads OOB (numClasses < kNumClasses) or silently truncates (numClasses > kNumClasses); on the FC-weight-upload side the `classifier.3.weight = [1024, kNumClasses]` shape would also fail to match the GGUF tensor and corrupt the classifier. Changes: 1. addon/src/model-interface/MobileNetGraph.cpp -- graph::loadWeights() Right after reading `numClasses` from `mobilenet.num_classes`, compare against `kNumClasses` and `throw StatusError(InvalidArgument, ...)` with a descriptive message (actual vs expected count, plus a hint to rebuild the addon or use a matching GGUF). This is the primary fix olyasir requested in `MobileNetGraph.cpp`. The error path is reachable from `ClassificationModel::load()`'s call to `graph::loadWeights(...)`, which already runs inside the JS-side `await classifier.load()` Promise; the `StatusError(InvalidArgument)` propagates as a structured rejection on the JS side, matching how every other config-time validation error in this addon surfaces. 2. addon/src/model-interface/MobileNetGraph.cpp -- graph::buildGraph() At the end of the graph build, before we hand the `ComputeGraph::output` tensor over to the backend allocator, assert `ggml_nelements(cg.output) == kNumClasses` and `raise(...)` (which throws `StatusError(InternalError, ...)`) if the invariant is violated. This is the defence-in-depth fix olyasir requested in the second `[BUG]` comment in `ClassificationModel.cpp`: it makes the 12-byte stack-array `ggml_backend_tensor_get` read provably safe regardless of how the output tensor was constructed. This second check is not redundant with #1: it also catches a future accidental edit to the classifier wiring above (where the tail `classifier.3` linear is what determines the output element count), an upstream ggml change to how `mul_mat` shapes its result, or a GGUF that lacks the `mobilenet.num_classes` metadata key entirely and falls back to `kNumClasses` but ships mismatched FC weights. Local validation on win32-x64: - 15/15 C++ unit tests pass (BnEpsilonGuard, classification graph determinism, preprocessor suite -- they all exercise the validated load + build paths against the bundled FP16 GGUF, where `num_classes == 3` so neither check fires). - 14/14 JS integration tests pass, 140/140 asserts (no behaviour change for the supported model; new error paths are unreachable with the bundled weights). Files: packages/qvac-lib-infer-ggml-classification/addon/src/model-interface/MobileNetGraph.cpp Made-with: Cursor * QVAC-17481 fix(preprocess): pre-decode size check via stbi_info_from_memory (review #1727) Addresses jesusmb1995's review comment on tetherto/qvac#1727: > Could we check this before decoding? `stbi_info_from_memory()` would > let us reject oversized images / total pixel count before > `stbi_load_from_memory()` allocates Why it matters: `stbi_load_from_memory` allocates the full decoded RGB buffer (width * height * 3 bytes) before any caller-provided dimension limit is enforced. For a 16384x16384 image at the upper edge of `kMaxImageDimension`, that is ~768 MB of heap allocated before we see the dimension and reject -- enough to OOM a memory-constrained device or trigger an oversized free. `stbi_info_from_memory` parses only the image header (a few hundred bytes) and reports the dimensions cheaply, so we can reject oversized inputs up-front. The post-decode dimension check is kept as belt-and-braces in case `stbi_info` and `stbi_load` ever disagree (e.g. truncated streams that parse a valid header but fail mid-decode); it is a correctness check, not the primary OOM defence. Behaviour: - If `stbi_info` succeeds and reports dimensions over `kMaxImageDimension`, `decodeToRgb` throws `StatusError(InvalidArgument, ...)` with the actual reported size in the message, before any decode allocation runs. - If `stbi_info` fails (header could not be parsed), we fall through to `stbi_load_from_memory`. That path already throws with `stbi_failure_reason()` attached, which is a more user-actionable message than a generic "header bad" we would emit ourselves. File: packages/qvac-lib-infer-ggml-classification/addon/src/model-interface/ImagePreprocessor.cpp Validated locally on win32-x64: 14/14 JS integration tests pass. Made-with: Cursor * QVAC-17481 test(preprocess): expand ImagePreprocessor unit coverage (review #1727) Addresses jesusmb1995's review comment on tetherto/qvac#1727: > Could we add more unit coverage for ImagePreprocessor before merging? > preprocessor_test.cpp covers some happy paths, but a few public > functions/branches still look uncovered: > - decodeToRgb() success/failure paths are not tested directly. > - preprocessToTensor() is only covered for empty input; it should > also cover encoded JPEG/PNG success, raw RGB success, and > unsupported non-image input without dimensions. > - validateRawRgb() is missing empty buffer, zero width/height, and > over-kMaxImageDimension cases. > - normalizeToWhcn() should cover invalid input size. Adds the following PreprocessorTest cases (14 new tests, taking the suite from 10 to 24 -- all 29 cases across the addon's two C++ test binaries pass on win32-x64): decodeToRgb: - DecodeToRgbDecodesValidJpeg -- happy path against test/images/meal_1.jpg - DecodeToRgbRejectsEmptyBuffer - DecodeToRgbRejectsCorruptedBytes - DecodeToRgbRejectsTruncatedJpeg preprocessToTensor (full pipeline): - PreprocessToTensorAcceptsEncodedJpeg -- JPEG happy path with finite-output check - PreprocessToTensorAcceptsRawRgb -- raw RGB happy path with finite-output check - PreprocessToTensorRejectsBmpWithoutDimensions - PreprocessToTensorRejectsRawWithMissingDims validateRawRgb edges: - ValidateRawRgbRejectsEmptyBuffer - ValidateRawRgbRejectsZeroWidth - ValidateRawRgbRejectsZeroHeight - ValidateRawRgbRejectsOverKMaxImageDimensionWidth - ValidateRawRgbRejectsOverKMaxImageDimensionHeight normalizeToWhcn: - NormalizeToWhcnRejectsWrongInputSize Adds a `readTestImage(name)` helper that walks up from the current binary location to find `test/images/<name>`, mirroring the `findWeightsPath()` helper already in classification_model_test.cpp. JPEG-using tests skip cleanly via GTEST_SKIP() if the image is not present, so the C++ test suite still passes when run from a packed tarball that does not include the test images. File: packages/qvac-lib-infer-ggml-classification/test/unit/preprocessor_test.cpp Made-with: Cursor * QVAC-17481 refactor(model): flatten ClassificationModel::Impl pidgeonhole (review #1727) Addresses jesusmb1995's review comment on tetherto/qvac#1727: > Why one extra level of indirection with `Impl`? Maybe style, but I > see no strong benefit and it just scatters the code around and > makes it harder to track. I would prefer a straightforward class > where all these variables can be directly under > `ClassificationModel` private variables. The PIMPL was originally there to keep ggml types out of the public header. In practice this header is only included by the addon's own `AddonJs.hpp`, which already pulls in the entire qvac-lib-inference-addon-cpp framework, so there is no header-fanout benefit from hiding ggml. Flattening the impl removes one level of heap indirection, lets all members be visible at a glance, and lets clang-tidy / IDE navigation jump straight to the field declarations. Changes: 1. addon/src/model-interface/ClassificationModel.hpp - Pull in `<ggml-backend.h>` and the local `MobileNetGraph.hpp` (which exposes `WeightsBundle` / `ComputeGraph` definitions used by the new direct members). - Replace `struct Impl;` forward declaration and `std::unique_ptr<Impl> impl_;` with the eight direct private members the Impl previously held: `modelPath_`, `backend_`, `weights_`, `compute_`, `labels_`, `numThreads_`, `loaded_`, `lastInferenceUs_`. Member ordering is documented in a comment: ggml requires every backend buffer to be released BEFORE the backend it was allocated on, and `~ClassificationModel` enforces that ordering explicitly with `compute_.reset(); weights_.reset();` before `ggml_backend_free(backend_)`. 2. addon/src/model-interface/ClassificationModel.cpp - Remove the `struct ClassificationModel::Impl { ... };` definition and the `std::make_unique<Impl>()` from the constructor body. - Replace every `impl_->X` with `X_` (34 references). No functional change. - Drop redundant `if (!impl_)` guards in `setNumThreads()`, `load()`, `runtimeStats()`, and `process()`. The class is non- copyable and non-movable (it carries a `std::mutex` member, which suppresses implicit move ctors/assignment), so `impl_` was always non-null between construction and destruction; the guards were dead code. Local validation on win32-x64: - `bare-make build` clean (warnings unchanged from before refactor; no new errors). - `npm run test:cpp` -- 29/29 tests pass (3 ClassificationModelTest + 24 PreprocessorTest + 1 BnEpsilonGuard + 1 architecture sanity). - `npm run test:integration` -- 14/14 tests pass, 140/140 asserts. Files: packages/qvac-lib-infer-ggml-classification/addon/src/model-interface/ClassificationModel.hpp packages/qvac-lib-infer-ggml-classification/addon/src/model-interface/ClassificationModel.cpp Made-with: Cursor * QVAC-17481 refactor(addon,binding): single-place arg validation in C++ AddonJs (review #1727) Addresses jesusmb1995's review comments on tetherto/qvac#1727: > Why normalizing here instead of just throwing at `AddonJs` and > having a central place where to do the validation? I had previous > conversations with Gianfranco (and Nidhin) on LLM we agreed it > makes sense to do parsing/validation at on place, namely at AddonJs > construction, and throw there if wrong/invalid arguments directly > at c++. > > For construction/config arguments, `createInstance()` should be the > place that parses and validates the JS values before building the > native model: model path, threads, and any other config should > either produce a valid C++ configuration or throw immediately > there. That keeps the JS wrapper thin and avoids having two > different sources of truth for what is valid. > > For per-call image arguments, the same principle applies at the > native job boundary before `ClassificationModel`: parse the JS > input once, construct an explicit validated `ClassifyInput`, and > then let the model/preprocessor operate on that clean shape. That > removes the duplicated JS normalization/magic-byte checks and > avoids relying on weak `0` sentinel values for "not provided". Changes: 1. addon/src/model-interface/ClassificationModel.hpp - Replace the four sentinel-zero fields (`width = 0`, `height = 0`, `channels = 0`, `topK = 0` overloaded as "not provided") with an explicit `std::optional<RawRgbDims>` member that captures the "is the input raw RGB or encoded?" decision in a type the compiler can check. - `topK = 0` stays only because it has a meaningful "no filter" interpretation; non-zero values are validated > 0 at the binding boundary. 2. addon/src/model-interface/ClassificationModel.cpp - Translate `optional<RawRgbDims>` -> the existing `(declaredWidth, declaredHeight, declaredChannels)` triplet consumed by `preprocess::preprocessToTensor`. The preprocessor's internal "0 means not-provided" convention is preserved (it is a private API; the JS-facing one is the explicit optional). 3. addon/src/addon/AddonJs.hpp - `createInstance` now validates: * `path` must be a non-empty string, * `config.threads` (when provided) must be a positive integer. These were previously not enforced; non-positive thread counts would have silently passed through to libggml and raw negatives would int-truncate. - `runJob` is now the single source of truth for per-call validation: * `content` rejection message rephrased to include the substring "required" so the JS test `t.exception.all(..., /required|null|undefined/i)` keeps passing without relying on a separate JS-side TypeError. * Dimension triplet enforcement: caller must provide either all of {width, height, channels} or none of them; partial shapes are rejected with an explicit message rather than leaking through as a buffer-size mismatch downstream. * Each dim is range-checked as int32_t before being committed to ClassifyInput's optional<RawRgbDims>, so a negative JS Number cannot wrap to ~4 billion via uint32_t cast and tunnel into validateRawRgb. * `topK` is range-checked > 0 if provided. 4. test/unit/classification_model_test.cpp - Migrate the three `input.width = ...; input.height = ...; input.channels = ...;` blocks to the new `input.rawRgb = qcc::RawRgbDims{...};` shape. No behavioural change. 5. index.js - Strip every JS-side validation helper that duplicated C++ work: `assertBuffer`, `normaliseDimensionOptions`, `isSupportedEncoded`, `startsWith`, `JPEG_MAGIC`, `PNG_MAGIC`. The classify() body now literally builds `{ type, content, [width, height, channels, topK] }` from the caller's arguments and forwards to the binding. - Lifecycle checks (`!this._addon || !this.state.configLoaded`) and the file-existence check in `load()` stay in JS: * lifecycle is a JS-managed state, not a value-shape question; * the existence-check delivers a more actionable error message ("MobileNet GGUF weights not found at: <path>") than letting the load reach C++ and throw "Failed to open GGUF file: <path>" downstream. - Module-level comment documents the JS-as-thin-pass-through contract so a future contributor cannot re-introduce the duplicated validation by mistake. Local validation on win32-x64: - `bare-make build` clean. - `npm run test:cpp` -- 29/29 (incl. the migrated raw-RGB ClassificationModelTest cases). - `npm run lint` -- clean. - `npm run test:integration` -- 14/14 tests, 140/140 asserts. All existing brittle regex matchers in `error-cases.test.js` (`/required|null|undefined/i`, `/empty/i`, `/format|invalid/i`, `/decode|jpeg|invalid/i`, `/match|size|width|height|raw/i`, `/format|jpeg|png|bmp/i`, `/not loaded|load\(\)/i`, `/not loaded|destroyed|state/i`) match the new C++-issued error messages, so no test regex needed updating. Files: packages/qvac-lib-infer-ggml-classification/addon/src/addon/AddonJs.hpp packages/qvac-lib-infer-ggml-classification/addon/src/model-interface/ClassificationModel.hpp packages/qvac-lib-infer-ggml-classification/addon/src/model-interface/ClassificationModel.cpp packages/qvac-lib-infer-ggml-classification/test/unit/classification_model_test.cpp packages/qvac-lib-infer-ggml-classification/index.js Made-with: Cursor * QVAC-17481 chore(test,docs): post-sync audit follow-ups (consistency + uniform url strip + readme) Picks up the lower-risk consistency / correctness items from the post-sync self-audit. None of these change observable behaviour; they remove duplication and small footguns that would otherwise surface as drift in future maintenance. 1. test/integration/utils.js -- single source of truth for the mobile asset-key heuristic + uniform `file://` strip. - Extract `_resolveMobileAsset(filename)` from the two duplicate-by-design loops in `resolveModelPath()` and `_resolveImagePath()`. Both used the same four-element candidate-key array (`../../testAssets/${name}`, `../mobile/testAssets/${name}`, `testAssets/${name}`, `../testAssets/${name}`); future framework key-shape changes now land in one place instead of being silently inconsistent. - Extract `_stripFileUrlPrefix(mapped)` and switch from `mapped.slice('file://'.length)` to `mapped.replace(/^file:\/\//, '')`. The slice version leaves a stray leading `/` if the harness ever returns a triple-slash `file:///abs/...` URL (harmless on POSIX-mobile, malformed on a hypothetical Windows-mobile target). The regex strip is uniformly correct across both shapes. - Add `makeClassifier(overrides)` -- the standard test-instance factory. Centralises model-path + logger wiring so any future constructor-arg change in the addon lands in one place instead of N inline `new ImageClassifier(...)` callsites. 2. test/integration/classify.test.js + error-cases.test.js -- adopt the shared factory. - classify.test.js drops the inline `new ImageClassifier({ modelPath: resolveModelPath(), logger: createLogger() })` (4 callsites) in favour of `makeClassifier()`. Imports trimmed accordingly: drops `ImageClassifier`, `createLogger`, `resolveModelPath` from the destructure (unused after refactor; standardjs would have flagged them anyway). - error-cases.test.js drops its local `makeClassifier()` (which was a duplicate of what now lives in utils.js) and imports the shared one. Net: -1 module-level function. 3. README.md -- fix the `**threads**` markdown bullet. The line `- \`**threads**\` -- ...` wraps the bold markers in backticks, which renders the asterisks literally inside an inline-code span (`**threads**` instead of bold **threads**). Bare-renderable replacement: `- **\`threads\`** -- ...` reads as bold inline-code, matching the intent of the surrounding bullets. This was a pre-existing bug noted as "out-of-scope" in the line-reflow pass but is trivial to fix. Local validation on win32-x64: - `npm run lint` clean. - `npm run test:cpp` -- 29/29 (no behavioural change, just end-to-end smoke that the test-utils refactor did not break the C++ harness paths). - `npm run test:integration` -- 14/14, 140/140 asserts (run twice to confirm; one in-between-test SIGSEGV observed on the first run is the known upstream `OutputCallBackJs` UAF the hack branch deliberately leaves un-papered-over, not caused by this commit). Files: packages/qvac-lib-infer-ggml-classification/test/integration/utils.js packages/qvac-lib-infer-ggml-classification/test/integration/classify.test.js packages/qvac-lib-infer-ggml-classification/test/integration/error-cases.test.js packages/qvac-lib-infer-ggml-classification/README.md Made-with: Cursor * QVAC-17481 chore: rename addon directory to packages/classification-ggml Aligns the addon's directory and CI-workflow filenames with the published package name (`@qvac/classification-ggml`) so that the folder and the npm scope read consistently. Per a reviewer-style naming convention request: Package name: @qvac/classification-ggml Addon folder: classification-ggml Renames (53 files via `git mv`, all rename detection clean -- 31 insertions / 31 deletions across 54 files): packages/qvac-lib-infer-ggml-classification/ -> packages/classification-ggml/ .github/workflows/integration-mobile-test-qvac-lib-infer-ggml-classification.yml -> .github/workflows/integration-mobile-test-classification-ggml.yml .github/workflows/integration-test-qvac-lib-infer-ggml-classification.yml -> .github/workflows/integration-test-classification-ggml.yml .github/workflows/prebuilds-qvac-lib-infer-ggml-classification.yml -> .github/workflows/prebuilds-classification-ggml.yml In-file text updates (paths only -- no functional change): - All four workflows (`integration-mobile-test-classification-ggml.yml`, `integration-test-classification-ggml.yml`, `prebuilds-classification-ggml.yml`, plus the hack-branch `on-pr-qvac-lib-infer-llamacpp-llm.yml`) now reference the new `packages/classification-ggml/**` path filter, `PKG_DIR=packages/classification-ggml` env, the renamed sibling workflow filenames, and the new `addon/packages/classification-ggml` `ADDON_WORKDIR` for the mobile harness. - `packages/classification-ggml/CMakeLists.txt` -- `project(...)`, `add_bare_module(...)`, and every `${...}` target reference renamed to `classification-ggml`. The bare module's output filename (`qvac__classification-ggml.bare`) is unchanged because bare derives it from `package.json` `name` (`@qvac/classification-ggml`), not from the CMake project name. - `packages/classification-ggml/package.json` -- repository.directory, homepage URL. - `packages/classification-ggml/README.md`, `index.js`, and `docs/onnx-to-gguf-conversion.md` -- doc paths. Deliberately NOT renamed (out of scope -- code-level identifiers, not file paths): - C++ namespace `qvac_lib_infer_ggml_classification` (8 files). Other addons in this monorepo do NOT tie their C++ namespace to the folder name (e.g. `qvac::ttslib::lavasr` lives under `packages/qvac-lib-infer-onnx-tts/`), so the namespace is a code-style choice rather than a path-consistency one. Can be folded into a follow-up if reviewers want full consistency there too. Local validation on win32-x64 (in the renamed `packages/classification-ggml/` directory): - `npm install` clean. - `bare-make generate` + `bare-make build` + `bare-make install` succeed; `qvac__classification-ggml.bare` produced under `prebuilds/win32-x64/` (filename unchanged). - `npm run lint` clean. - `npm run test:cpp` 29/29. - `npm run test:integration` 14/14, 140/140 asserts (perf-report correctly written under `packages/classification-ggml/test/results/`). Made-with: Cursor * QVAC-17481 fix(addon,test): align upstream-bug workarounds with monorepo convention Two upstream issues block the addon's CI without local mitigations. Both are paper-trailed in detail in `remote_logs/issues_report.md` (gitignored, internal). Inline comments at the workaround sites are kept short to match how other addons in the monorepo handle the same races. 1. `OutputCallBackJs` use-after-free race ---------------------------------------- `qvac_lib_inference_addon_cpp::~OutputCallBackJs` deletes JS refs synchronously while `uv_close` on its async handle is asynchronous (queue/OutputCallbackJs.hpp:48-58); a `uv_async_send` queued just before destruction fires against dead refs and crashes in `js_open_handle_scope`. Reproduced as SIGSEGV (linux-x64/-arm64, darwin-arm64), `Fatal signal 11` (Android logcat), and `EXC_BAD_ACCESS @ 0x1a0` (iOS crash report) across rapid create/ destroy cycles. Other addons in this monorepo paper over the same race in their integration suites with sleep-around-unload, e.g. ocr-onnx/test/integration/lifecycle.test.js:56,85,115 ocr-onnx/test/integration/full-ocr-suite.test.js:107,115,123 qvac-lib-infer-llamacpp-llm/test/integration/sliding-context.test.js:163,355 We adopt the same pattern via `cleanupClassifier()` in `test/integration/utils.js` (two-phase: 500-1000ms pre-unload yield + 2000-3000ms post-unload drain). The pre-unload yield is required for our addon specifically because `await classify()` resolves on the first `Output` event while the worker thread keeps queuing follow-up events (`RuntimeStats`, `JobCompleted`); without it the follow-ups land DURING `~OutputCallBackJs`. Every classify() call in the integration tests was migrated to `cleanupClassifier()`. The removed local C++ wrapper (`DeferredOutputCallBackJs`) was a real lifetime fix but kept us out of step with how the rest of the monorepo handles this; once upstream is patched the sleeps drop everywhere at once. 2. Win32-x64 first-`js_create_double` returns 0.0 ---------------------------------------------- The very first `js_create_double` call in the process returns 0.0 on the Azure GitHub-hosted `windows-2022` runner (clang-cl + bare-runtime + V8). Subsequent calls in the same handle scope are correct. No local Windows repro; only the CI runner image is affected. Other addons accidentally dodge the symptom because their first emitted number is naturally 0 (whisper/parakeet `segment.start`), they assert only `typeof === 'number'` / `!isNaN` (llamacpp-llm stats), they never assert the value (ocr-onnx bbox coords), or they emit no numbers at all (lib-infer-diffusion / llamacpp-embed). Our 3-class softmax sort + sum-to-1 assertions catch the corruption immediately, so no test-side workaround is possible. Local C++ "burn one" workaround in `JsClassifyOutputHandler`'s lambda preamble: a throwaway `js_create_double(env, 0.0, &dummy)` call consumes the broken first slot so the per-element `Number::create` calls below produce the correct value at index 0. Cost is one ephemeral js_number per classify() call. Other follow-ups in this commit (none disturb code paths above): - `addon.js` lifecycle: `unload()` no longer waits on the pending-job promise. The post-unload sleep in `cleanupClassifier` covers the same window, so `unload()` becomes a thin pass-through (matches what every other addon in the monorepo does). - Top-of-file workaround comment in `AddonJs.hpp` consolidated to a 2-line note at the burn-one site (matches the comment density other addons use; full root cause in the report). - `cleanupClassifier` doc trimmed to 3 lines pointing at the report. Local validation on win32-x64: - bare-make build clean - npm run lint clean - npm run test:cpp 29/29 - npm run test:integration 14/14 + 140/140 asserts Files: packages/classification-ggml/addon.js packages/classification-ggml/addon/src/addon/AddonJs.hpp packages/classification-ggml/addon/src/js-interface/binding.cpp packages/classification-ggml/test/integration/classify.test.js packages/classification-ggml/test/integration/error-cases.test.js packages/classification-ggml/test/integration/utils.js Made-with: Cursor * QVAC-17481 chore: adopt upstream WA fixes from PR #1825 Bumps qvac-lib-inference-addon-cpp from 1.1.5#1 to 1.1.6 (the version shipped by PR #1825) and removes the two local workarounds it was brought in to dodge: - Win32 burn-one js_create_double in JsClassifyOutputHandler is gone; upstream's JsUtils::Number::createDouble now applies a process-wide burn-once guard via static-init. - Two-phase sleep around unload() in cleanupClassifier is gone; upstream's ~OutputCallBackJs now defers js_delete_reference into the uv_close callback via a heap-owned State. Local Win32 validation: 14/14 integration tests + 29/29 C++ unit tests pass; in particular the index-0 marshalling assertions and the back-to-back load/unload cycle test that previously SIGSEGV'd both pass without their prior workarounds. Resolves T1 + T10 from the audit; details in remote_logs/issues_report.md. Made-with: Cursor * QVAC-17481 chore[api]: align lifecycle with llamacpp-llm pattern Re-shape the JS layer so request orchestration mirrors the LLM addon (closes T5-T9 from PR #1727 review): - addon.js becomes a thin C++ binding wrapper (mirrors LlamaInterface): constructor takes `(binding, configurationParams, outputCb, logger)`, exposes `activate()` / `runJob()` / `cancel()` / `unload()`. The bespoke `_pending` Promise + `_outputCallback` are gone; export a shared `mapAddonEvent(rawEvent, rawData, rawError)` instead. - index.js becomes the orchestration layer (mirrors LlmLlamacpp): one `exclusiveRunQueue()` serialises load/classify/unload, one `createJobHandler()` owns the active QvacResponse, and the output callback fans events through `_handleAddonOutputEvent`. - load() now does try/catch around `activate()` and best-effort `_addon.unload()` on failure so a partial init never leaves a zombie native handle (T6). - classify() resolves on the terminal stats event rather than the first ClassifyOutput, eliminating the orphan-callback risk that motivated the `_pending` drain on the previous design (T7, T8). Public shape unchanged: still `Promise<Array<{label,confidence}>>`. - unload() runs through the same queue, calls native `cancel()` on in-flight work, fails the active JS request with `Model was unloaded`, then destroys the native handle (T9). mapAddonEvent is keyed on payload shape (Array → Output, plain object → JobEnded terminal) because the upstream JobRunner emits the stats trailer with a raw `std::vector<std::pair<...>>` RTTI name rather than a literal `*JobEnded` event. Documented inline. Local validation: 14/14 integration + 140/140 asserts in 2.8s (down from 8.2s in Group A — the LLM-style cancel/unload is much faster than the prior drain-then-destroy pattern); 29/29 C++ unit tests; standard lint clean. Made-with: Cursor * QVAC-17481 infra: add canonical on-pr + on-pr-close workflows for classification-ggml Adds the two missing top-level workflow files so the addon now has the full 5-file layout used by every other modern addon in the monorepo (`decoder-audio`, `diffusion-cpp`, `ocr-onnx`, `bci-whispercpp`): - `on-pr-classification-ggml.yml` -- canonical PR trigger router. authorize -> changes -> sanity / ts-checks / cpp-lint / prebuild -> integration / mobile -> merge-guard. Path filters scope to `packages/classification-ggml/**` and the addon's own workflow files. - `on-pr-close-classification-ggml.yml` -- mirror of `on-pr-close-decoder-audio.yml`. Triggers `public-delete-npm-versions` with `packages: classification-ggml` to clean up per-PR npm pre-releases on PR close. Closes T11 from PR #1727 review (olyasir: "rename in same format as other pipelines"). The legacy-named `on-pr-qvac-lib-infer-ggml-classification.yml` on the fork PR-1 branch will be removed at sync-to-PR-1 time. The hack-branch dispatch swap (`on-pr-qvac-lib-infer-llamacpp-llm.yml` hijacked + `*-temp.yml` parking) is intentionally left untouched here: new workflows aren't dispatchable from the GitHub Actions UI until they exist on `main`, so the swap is still our only working dispatch path for hack-branch CI runs. Validation: both files parse with `yaml.safe_load`; every workflow / composite-action reference resolves on disk. Co-authored-by: Cursor <cursoragent@cursor.com> * QVAC-17481 doc: trim verbose AI-style comments across the addon Closes T2/T3/T4 from PR #1727 (jesusmb1995: "Please remove this comment, its unnecessary... LLM's are too verbose"), and applies the same four cleanup rules across the rest of …
Proletter
pushed a commit
that referenced
this pull request
May 24, 2026
…sification addon
Introduces a new inference addon that classifies images into three
classes (food / report / other) using a fine-tuned MobileNetV3-Small
CNN running on the libggml CPU backend. Follows the established QVAC
addon pattern (see qvac-lib-infer-nmtcpp, lib-infer-diffusion).
## What this PR ships
- New package `packages/qvac-lib-infer-ggml-classification/` publishing
as `@qvac/classification-ggml`:
- Native addon: custom 34-layer MobileNetV3-Small compute graph built
directly against the public `ggml.h` / `ggml-backend.h` API — no
llama.cpp application-layer dependency, so the addon remains
forward-compatible with future `libggml` upstream merges.
- Load-time BatchNorm fold with `eps = 0.001` (the architecture-
correct value; `1e-5` causes normalisation drift across all 34
layers). Depthwise separable convolutions, squeeze-and-excite
blocks, HardSwish / HardSigmoid / ReLU activations all wired
through `ggml_conv_2d`, `ggml_conv_2d_dw`, `ggml_pool_2d`,
`ggml_hardswish`, `ggml_hardsigmoid`.
- FP16 GGUF weights bundled inside the package (2.94 MB); class
labels are read from the GGUF `mobilenet.class_N` metadata so a
future fine-tune can ship different class names without a code
change.
- Public JS API: `new ImageClassifier({ modelPath?, logger?,
threads?, nativeLogger? })` + `load()` / `classify(buffer, opts?)`
/ `unload()` / `destroy()`. Accepts JPEG, PNG, or raw-RGB input;
validates at the JS layer before reaching native code so no bad
input reaches libggml.
- `nativeLogger` opt-in (default `false`): the underlying
`qvac-lib-inference-addon-cpp` JsLogger holds a process-wide
static `uv_async_t` that is not safe across rapid create/destroy
cycles, so the native C++→JS log bridge is disabled unless the
caller explicitly opts in. JS-level logging always flows through
the caller's `logger`.
- Image preprocessing via vendored-through-vcpkg `stb_image` +
`stb_image_resize2` (bilinear resize to 224×224, ImageNet
normalisation, WHCN layout).
## Build + tests
- `bare-make` + `cmake-bare` + `cmake-vcpkg` build, targeting
`ggml::ggml` / `ggml::ggml-base` / `ggml::ggml-cpu` and `stb` from
the shared QVAC vcpkg registry.
- C++ GoogleTest suite covering graph shape (34 conv + 2 linear + 9
SE blocks), load + inference, determinism, `topK` filter, BN
epsilon guard, and full preprocessor behaviour.
- brittle + bare JS integration tests covering load, classify (all 6
public sample images under `test/images/`), `topK`, raw RGB input,
and every error path: null, empty buffer, corrupted JPEG,
unsupported format (BMP), mismatched dimensions, pre-load /
post-unload, tiny upscale, load/unload cycles.
- Mobile test scaffolding following the shared convention:
`scripts/generate-mobile-integration-tests.js`,
`scripts/validate-mobile-tests.js`, `test/mobile/
{integration-runtime.cjs, integration.auto.cjs, README.md,
testAssets/.gitignore}`. The auto-generated `integration.auto.cjs`
wraps every `test/integration/*.test.js` so the shared
`qvac-test-addon-mobile` framework picks them up on Android and iOS
automatically.
## CI workflows
Four addon-scoped workflows (path-filtered to this package):
- `on-pr-qvac-lib-infer-ggml-classification.yml` — authorize, sanity
checks, TypeScript declaration check, C++ lint, prebuild matrix,
desktop integration tests, mobile integration tests, merge-guard.
- `prebuilds-qvac-lib-infer-ggml-classification.yml` — Linux x64,
Linux arm64, Android arm64, macOS arm64, iOS arm64, Windows x64
prebuild matrix.
- `integration-test-qvac-lib-infer-ggml-classification.yml` — desktop
end-to-end tests with the shared performance reporter writing a
GitHub step summary.
- `integration-mobile-test-qvac-lib-infer-ggml-classification.yml` —
AWS Device Farm Android + iOS runs via the
`tetherto/qvac-test-addon-mobile` framework.
## Public-data / test-image policy
All public correctness assertions in this package are scoped to the 6
test images under `test/images/` (2 per class). No confidential
fine-tuning numbers, validation-set sizes, per-class metrics, or
references to any internal validation dataset appear in this PR, in
any file it ships, or in CI logs. Internal numerical-equivalence
gating against an ONNX FP32 reference is handled pre-release by a
development-only script that is not part of this PR.
## Out of scope for this PR
- SDK plugin / schema integration (`packages/sdk/**`) lands in a
follow-up PR after `@qvac/classification-ggml@0.1.0` is published
to npm. This mirrors the diffusion rollout (#656 → release → #1021).
- GPU backends (Vulkan / Metal / CUDA): CPU-only for v1.0.
Made-with: Cursor
Proletter
pushed a commit
that referenced
this pull request
May 24, 2026
* feat: add diffusion SDK plugin integration Wire up the stable-diffusion.cpp plugin through all SDK layers: - Schema: sdcpp-config.ts with config, stats, request/response schemas - Plugin: resolveConfig for companion artifacts, createModel, streaming handlers - Load model: diffusion entries in all 4 schema locations - Registration: model type, alias, engine-addon map, worker, pear pre-hook - Type widening: FilesystemDL | undefined for loader-less plugins * feat(diffusion): consolidate SDK plugin, fix sampling_method schema, add integration tests - Fix sampling_method enum to match C++ addon ground truth (dpm++2m not dpm++_2m) - Add 6 missing sampler values (ipndm, ipndm_v, ddim_trailing, tcd, res_multistep, res_2s) - Fix addon index.d.ts SamplerMethod type to match C++ parser - Consolidate generation ops into single unified handler (txt2img + img2img) - Add dedicated RPC handler, client API, and first-class generation() export - Add 15 integration test definitions and desktop executor - Add examples: txt2img, img2img, flux2-klein - Add comprehensive unit tests for schemas, plugin dispatch, and stats - Wire diffusion into handler-registry, common schemas, model-config-utils, get-model-info * fix(diffusion): register generationStream in bare-client handler map The bare-client dispatches via handlers/index.ts (direct mode), not handler-registry.ts (IPC worker mode). Missing entry caused RPC_NO_HANDLER when running examples via bare runtime. * feat(diffusion): add diffusion naming handler for update-models codegen Add dedicated generateDiffusionName() to produce clean export constants for diffusion registry models (SD → SD_V2_1, SDXL → SDXL_BASE, FLUX, VAE). Includes 4 unit tests covering all model families. * feat(diffusion): sync registry models and use FLUX constant in tests Run bun update-models to pull 21 new models (including diffusion) from the live registry. Replace QVAC_DIFFUSION_MODEL env var in model-manager with the FLUX_2_KLEIN_4B_Q4_0 registry constant. * fix(diffusion): prevent statsPromise hang and fix lint issues Resolve statsPromise after stream loop exits (not only on done:true), add statsRejecter for error propagation, derive GenerationClientParams from schema type to prevent drift, and fix lint warnings in generation ops and test executor. * revert: remove non-matching patterns from generation client Revert statsPromise try/catch/rejecter and GenerationClientParams Omit<> derivation — these diverged from the established patterns in ocr.ts, translate.ts, and transcription.ts. Also remove unrelated model history file that was incorrectly included. * chore: remove unrelated model history file ecb1bf8.txt was a codegen artifact from bun run update-models during the merge — it should not have been committed here. * fix: configure FLUX companion models and GPU device for diffusion tests FLUX.2 models require companion LLM (Qwen3-4B) and VAE models to create the stable-diffusion context. Without them, SdModel::load() fails. Also switches device from CPU to GPU and adds img2img test fixture. * fix(examples): configure FLUX companion models consistently across all diffusion examples All three diffusion examples now default to the required companion LLM (QWEN3_4B_Q4_K_M) and VAE (FLUX_2_KLEIN_4B_VAE) models, matching the desktop test configuration. Also switches device from cpu to gpu. * fix(tests): use llm addon elephant.jpg for img2img test fixture Replace photo.png with elephant.jpg from lib-infer-llamacpp-llm/media. Update generation test definitions to reference the new filename. * fix(examples): use path.resolve for img2img default image path import.meta.dirname is undefined in Bare runtime. Use path.resolve with a CWD-relative path instead, matching the documented convention of running examples from the SDK root. * fix(tests): migrate generation executor to ResourceManager pattern Replace ModelManager usage with AbstractModelExecutor base class, matching the pattern used by all other executors after PR #836. * feat(api): expose progressStream in generation() client helper The server already emits progress ticks (step/totalSteps/elapsedMs) during diffusion generation but the client was silently dropping them. Add a progressStream async generator to the generation() return type so SDK callers can show progress UI. Update the streaming-progress integration test to assert progress tick presence and field validity. * refactor(api): use background fan-out loop for generation() streams Refactor generation() to follow the completion() multi-stream pattern: a single background processResponses() task drives the RPC stream and fans out to outputStream, progressStream, outputs, and stats independently. This fixes two issues with the previous implementation: - consuming progressStream alone now works (no longer requires outputStream iteration to drive the RPC stream) - RPC errors propagate to all consumers (streams throw, promises reject) * chore: regenerate bun.lock and models registry after rebase Regenerates bun.lock and models/registry/models.ts to restore FLUX model entries that were lost during rebase conflict resolution. * fix[api]: align SDK diffusion schemas with addon contract - Rename config field `wtype` → `type` to match C++ context handler key - Expand weight type enum to match addon: add auto, bf16, q2_k, q3_k, q4_k, q5_k, q6_k; remove invalid "default" - Remove `schedule` config field (no C++ context handler exists for it) - Fix per-request scheduler enum: remove "default" (addon rejects it), add sgm_uniform, simple, lcm, smoothstep, kl_optimal, bong_tangent - Remove phantom stats fields from diffusionStatsSchema (generation_time, totalTime, stepsPerSecond, msPerStep, megapixelsPerSecond, steps, output_count) — addon RuntimeStats never emits these - Update unit tests and generation executor to use real addon fields * fix[api]: align SDK rng config with addon contract - Add std_default to rng enum to match addon RngType - Add sampler_rng config field (separate RNG for sampler) - Forward sampler_rng from plugin to addon * mod[api]: rename public API from generation() to diffusion() Aligns top-level API naming with other addon-specific surfaces (completion, ocr, embed) — "diffusion" is specific to the stable-diffusion.cpp backend, while "generation" is too generic and could apply to any inference addon. Rename covers: public function, types, schemas, RPC routing literal, handler registry, plugin handler key, examples, integration tests, and unit tests. Addon RuntimeStats field names (generationMs, etc.) are unchanged — those are wire-format names from the C++ addon. * fix: resolve pre-existing lint errors in diffusion client and load-model - Cast streamError to Error to satisfy @typescript-eslint/only-throw-error (closure type narrowing false positive) - Remove unnecessary SdcppConfig type assertion and unused import in load-model.ts * mod[api]: remove img2img functionality until addon support lands Strip init_image, strength, and all img2img code paths from the SDK surface. Will be re-added when the addon fully supports it (PR #884). * feat[api]: wire up profiler and device defaults for diffusion addon Register diffusionStream operation metrics (generationMs, totalSteps, totalImages, totalPixels) following the pattern of all other addons. Add sdcppGeneration to deviceConfigDefaultsSchema so device-specific config defaults can be applied to diffusion models. * fix[api]: align diffusion client API with actual streaming behavior C++ generate_image() is synchronous — images are delivered only after generation completes, not streamed during inference. Remove misleading outputStream generator and stream param from the client API. The correct surface is: progressStream (real-time step ticks), outputs (final images), and stats. Also update @qvac/diffusion-cpp dependency from file: link to 0.1.0 now that the package is published. * chore: clean up internal comments from public-facing API Remove implementation details (RPC wire format, C++ internals) from JSDoc and schema comments that end users would see. * fix[api]: add positive constraint to width/height, describe config fields - Add .positive() to width and height in diffusionRequestSchema - Add .describe() to companion model fields (clipLModelSrc, clipGModelSrc, t5XxlModelSrc, llmModelSrc, vaeModelSrc) documenting which architectures require them - Add .describe() to prediction, type, clip_on_cpu, vae_on_cpu, vae_tiling, flash_attn config fields - Add diffusion-simple.ts example showing minimal config with a single all-in-one GGUF model (no companion files) * fix: add missing validator to download test custom expectation DownloadExecutor constructor takes no arguments — remove resources param. download-tests custom expectation requires validator field per TestDefinition schema. * fix: add mobile diffusion support, move executor to shared, bump test timeouts - Move diffusion executor from desktop/ to shared/ (no platform-specific APIs) - Add skipPreDownload to desktop diffusion resource (companion models resolve at load time) - Add mobile consumer: SD 2.1 Q8_0 model, device gpu, threads 4, prediction v, vae_on_cpu true - Bump test timeouts: 300s default, 600s for batch/seed tests - Fix DownloadExecutor() constructor call (takes no args) * fix: use exported SDK model constant in diffusion-simple example Replace hardcoded local file path with SD_V2_1_1B_Q8_0 constant. Add prediction: "v" config required by SD 2.1 models. * fix[api]: address PR review comments for diffusion SDK integration - plugin.ts: import addon types (ImgStableDiffusionArgs, SdConfig), remove as any/as never casts. Refactor resolveConfig to use destructure + explicit Promise.all matching TTS pattern. Remove SRC_TO_ARTIFACT mapping constant. Pass config directly to addon constructor. - ops/diffusion.ts: pass params inline to model.run() matching TTS/OCR pattern. - model-registry.ts: loader field optional (loader?: FilesystemDL) with conditional spread for exactOptionalPropertyTypes. - sdcpp-config.ts: derive DiffusionClientParams from DiffusionRequest. Add descriptions to cfg_scale and guidance fields. - bun.lock: regenerated, removes file:../lib-infer-diffusion leak. - Remove shared-test-data/ directory (elephant.jpg leftover from img2img). - Remove dead verify* params from diffusion test definitions. * fix: add eslint-disable for optional MCP SDK import in example The @modelcontextprotocol/sdk is a user-installed optional dependency, not a project dependency. Suppress import/no-unresolved to unblock CI. * fix: bump diffusion-cpp to 0.1.1 for absolute path fix * fix: bump diffusion-cpp to 0.1.1 for absolute path fix
Proletter
pushed a commit
that referenced
this pull request
May 24, 2026
…sification addon
Introduces a new inference addon that classifies images into three
classes (food / report / other) using a fine-tuned MobileNetV3-Small
CNN running on the libggml CPU backend. Follows the established QVAC
addon pattern (see qvac-lib-infer-nmtcpp, lib-infer-diffusion).
## What this PR ships
- New package `packages/qvac-lib-infer-ggml-classification/` publishing
as `@qvac/classification-ggml`:
- Native addon: custom 34-layer MobileNetV3-Small compute graph built
directly against the public `ggml.h` / `ggml-backend.h` API — no
llama.cpp application-layer dependency, so the addon remains
forward-compatible with future `libggml` upstream merges.
- Load-time BatchNorm fold with `eps = 0.001` (the architecture-
correct value; `1e-5` causes normalisation drift across all 34
layers). Depthwise separable convolutions, squeeze-and-excite
blocks, HardSwish / HardSigmoid / ReLU activations all wired
through `ggml_conv_2d`, `ggml_conv_2d_dw`, `ggml_pool_2d`,
`ggml_hardswish`, `ggml_hardsigmoid`.
- FP16 GGUF weights bundled inside the package (2.94 MB); class
labels are read from the GGUF `mobilenet.class_N` metadata so a
future fine-tune can ship different class names without a code
change.
- Public JS API: `new ImageClassifier({ modelPath?, logger?,
threads?, nativeLogger? })` + `load()` / `classify(buffer, opts?)`
/ `unload()` / `destroy()`. Accepts JPEG, PNG, or raw-RGB input;
validates at the JS layer before reaching native code so no bad
input reaches libggml.
- `nativeLogger` opt-in (default `false`): the underlying
`qvac-lib-inference-addon-cpp` JsLogger holds a process-wide
static `uv_async_t` that is not safe across rapid create/destroy
cycles, so the native C++→JS log bridge is disabled unless the
caller explicitly opts in. JS-level logging always flows through
the caller's `logger`.
- Image preprocessing via vendored-through-vcpkg `stb_image` +
`stb_image_resize2` (bilinear resize to 224×224, ImageNet
normalisation, WHCN layout).
## Build + tests
- `bare-make` + `cmake-bare` + `cmake-vcpkg` build, targeting
`ggml::ggml` / `ggml::ggml-base` / `ggml::ggml-cpu` and `stb` from
the shared QVAC vcpkg registry.
- C++ GoogleTest suite covering graph shape (34 conv + 2 linear + 9
SE blocks), load + inference, determinism, `topK` filter, BN
epsilon guard, and full preprocessor behaviour.
- brittle + bare JS integration tests covering load, classify (all 6
public sample images under `test/images/`), `topK`, raw RGB input,
and every error path: null, empty buffer, corrupted JPEG,
unsupported format (BMP), mismatched dimensions, pre-load /
post-unload, tiny upscale, load/unload cycles.
- Mobile test scaffolding following the shared convention:
`scripts/generate-mobile-integration-tests.js`,
`scripts/validate-mobile-tests.js`, `test/mobile/
{integration-runtime.cjs, integration.auto.cjs, README.md,
testAssets/.gitignore}`. The auto-generated `integration.auto.cjs`
wraps every `test/integration/*.test.js` so the shared
`qvac-test-addon-mobile` framework picks them up on Android and iOS
automatically.
## CI workflows
Four addon-scoped workflows (path-filtered to this package):
- `on-pr-qvac-lib-infer-ggml-classification.yml` — authorize, sanity
checks, TypeScript declaration check, C++ lint, prebuild matrix,
desktop integration tests, mobile integration tests, merge-guard.
- `prebuilds-qvac-lib-infer-ggml-classification.yml` — Linux x64,
Linux arm64, Android arm64, macOS arm64, iOS arm64, Windows x64
prebuild matrix.
- `integration-test-qvac-lib-infer-ggml-classification.yml` — desktop
end-to-end tests with the shared performance reporter writing a
GitHub step summary.
- `integration-mobile-test-qvac-lib-infer-ggml-classification.yml` —
AWS Device Farm Android + iOS runs via the
`tetherto/qvac-test-addon-mobile` framework.
## Public-data / test-image policy
All public correctness assertions in this package are scoped to the 6
test images under `test/images/` (2 per class). No confidential
fine-tuning numbers, validation-set sizes, per-class metrics, or
references to any internal validation dataset appear in this PR, in
any file it ships, or in CI logs. Internal numerical-equivalence
gating against an ONNX FP32 reference is handled pre-release by a
development-only script that is not part of this PR.
## Out of scope for this PR
- SDK plugin / schema integration (`packages/sdk/**`) lands in a
follow-up PR after `@qvac/classification-ggml@0.1.0` is published
to npm. This mirrors the diffusion rollout (#656 → release → #1021).
- GPU backends (Vulkan / Metal / CUDA): CPU-only for v1.0.
Made-with: Cursor
Proletter
pushed a commit
that referenced
this pull request
May 24, 2026
…assification (#1727) * QVAC-17481 feat: add @qvac/classification-ggml MobileNetV3 image classification addon Introduces a new inference addon that classifies images into three classes (food / report / other) using a fine-tuned MobileNetV3-Small CNN running on the libggml CPU backend. Follows the established QVAC addon pattern (see qvac-lib-infer-nmtcpp, lib-infer-diffusion). ## What this PR ships - New package `packages/qvac-lib-infer-ggml-classification/` publishing as `@qvac/classification-ggml`: - Native addon: custom 34-layer MobileNetV3-Small compute graph built directly against the public `ggml.h` / `ggml-backend.h` API — no llama.cpp application-layer dependency, so the addon remains forward-compatible with future `libggml` upstream merges. - Load-time BatchNorm fold with `eps = 0.001` (the architecture- correct value; `1e-5` causes normalisation drift across all 34 layers). Depthwise separable convolutions, squeeze-and-excite blocks, HardSwish / HardSigmoid / ReLU activations all wired through `ggml_conv_2d`, `ggml_conv_2d_dw`, `ggml_pool_2d`, `ggml_hardswish`, `ggml_hardsigmoid`. - FP16 GGUF weights bundled inside the package (2.94 MB); class labels are read from the GGUF `mobilenet.class_N` metadata so a future fine-tune can ship different class names without a code change. - Public JS API: `new ImageClassifier({ modelPath?, logger?, threads?, nativeLogger? })` + `load()` / `classify(buffer, opts?)` / `unload()` / `destroy()`. Accepts JPEG, PNG, or raw-RGB input; validates at the JS layer before reaching native code so no bad input reaches libggml. - `nativeLogger` opt-in (default `false`): the underlying `qvac-lib-inference-addon-cpp` JsLogger holds a process-wide static `uv_async_t` that is not safe across rapid create/destroy cycles, so the native C++→JS log bridge is disabled unless the caller explicitly opts in. JS-level logging always flows through the caller's `logger`. - Image preprocessing via vendored-through-vcpkg `stb_image` + `stb_image_resize2` (bilinear resize to 224×224, ImageNet normalisation, WHCN layout). ## Build + tests - `bare-make` + `cmake-bare` + `cmake-vcpkg` build, targeting `ggml::ggml` / `ggml::ggml-base` / `ggml::ggml-cpu` and `stb` from the shared QVAC vcpkg registry. - C++ GoogleTest suite covering graph shape (34 conv + 2 linear + 9 SE blocks), load + inference, determinism, `topK` filter, BN epsilon guard, and full preprocessor behaviour. - brittle + bare JS integration tests covering load, classify (all 6 public sample images under `test/images/`), `topK`, raw RGB input, and every error path: null, empty buffer, corrupted JPEG, unsupported format (BMP), mismatched dimensions, pre-load / post-unload, tiny upscale, load/unload cycles. - Mobile test scaffolding following the shared convention: `scripts/generate-mobile-integration-tests.js`, `scripts/validate-mobile-tests.js`, `test/mobile/ {integration-runtime.cjs, integration.auto.cjs, README.md, testAssets/.gitignore}`. The auto-generated `integration.auto.cjs` wraps every `test/integration/*.test.js` so the shared `qvac-test-addon-mobile` framework picks them up on Android and iOS automatically. ## CI workflows Four addon-scoped workflows (path-filtered to this package): - `on-pr-qvac-lib-infer-ggml-classification.yml` — authorize, sanity checks, TypeScript declaration check, C++ lint, prebuild matrix, desktop integration tests, mobile integration tests, merge-guard. - `prebuilds-qvac-lib-infer-ggml-classification.yml` — Linux x64, Linux arm64, Android arm64, macOS arm64, iOS arm64, Windows x64 prebuild matrix. - `integration-test-qvac-lib-infer-ggml-classification.yml` — desktop end-to-end tests with the shared performance reporter writing a GitHub step summary. - `integration-mobile-test-qvac-lib-infer-ggml-classification.yml` — AWS Device Farm Android + iOS runs via the `tetherto/qvac-test-addon-mobile` framework. ## Public-data / test-image policy All public correctness assertions in this package are scoped to the 6 test images under `test/images/` (2 per class). No confidential fine-tuning numbers, validation-set sizes, per-class metrics, or references to any internal validation dataset appear in this PR, in any file it ships, or in CI logs. Internal numerical-equivalence gating against an ONNX FP32 reference is handled pre-release by a development-only script that is not part of this PR. ## Out of scope for this PR - SDK plugin / schema integration (`packages/sdk/**`) lands in a follow-up PR after `@qvac/classification-ggml@0.1.0` is published to npm. This mirrors the diffusion rollout (#656 → release → #1021). - GPU backends (Vulkan / Metal / CUDA): CPU-only for v1.0. Made-with: Cursor * QVAC-17481 fix(ci): correct setup-bare-tooling action name in classification workflows The prebuild and integration-test workflows for @qvac/classification-ggml referenced `tetherto/qvac/.github/actions/setup-bare-toolchain`, which does not exist. The action is named `setup-bare-tooling` (same name used by the llamacpp-llm, nmtcpp, and diffusion addons at the identical pinned SHA). All 6 prebuild matrix jobs failed at step 1 with "Can't find 'action.yml' ... for action 'setup-bare-toolchain'" until this rename is in place. Files: .github/workflows/prebuilds-qvac-lib-infer-ggml-classification.yml .github/workflows/integration-test-qvac-lib-infer-ggml-classification.yml Made-with: Cursor * QVAC-17481 fix(ci): add per-platform vcpkg/NDK/Apple-clang setup to classification prebuilds The classification prebuilds workflow was missing the per-platform toolchain steps that sibling addons (diffusion, nmtcpp) have after `setup-vcpkg-cache`. As a result, `VCPKG_ROOT` was never exported, CMake couldn't locate the vcpkg toolchain, and `bare-make build` failed on every platform. Changes to .github/workflows/prebuilds-qvac-lib-infer-ggml-classification.yml: - setup-vcpkg-cache: drop unknown inputs `vcpkg-path` and `github-packages-token` (action only accepts platform, arch, s3-bucket-path). Was silently ignored but emitted warnings. - Add per-OS vcpkg bootstrap / configuration: macOS (darwin, ios): clone microsoft/vcpkg tag 2025.12.12, bootstrap, export VCPKG_ROOT. Linux (linux, android runners): export VCPKG_ROOT=$VCPKG_INSTALLATION_ROOT. Windows: export VCPKG_ROOT from $env:VCPKG_INSTALLATION_ROOT with backslash-to-forward-slash normalisation. - Windows-only: set CMAKE_GENERATOR="Visual Studio 17 2022" and, for the x64 matrix row, CMAKE_GENERATOR_PLATFORM=x64. - Android-only: export ANDROID_NDK / ANDROID_NDK_HOME / ANDROID_NDK_ROOT from ANDROID_NDK_LATEST_HOME, derive ANDROID_TOOLCHAIN_ROOT, set ANDROID_NATIVE_API_LEVEL=24. - iOS and darwin: move Homebrew llvm / llvm@18 aside so the Apple toolchain clang is on PATH (matches diffusion). All additions mirror the working pattern in prebuilds-lib-infer-diffusion.yml and prebuilds-qvac-lib-infer-nmtcpp.yml at the same pinned action SHA. No Vulkan or apt X11 steps were added: this addon is CPU-only ggml and has no graphics dependencies. Made-with: Cursor * QVAC-17481 fix: add missing <limits> include and CI build-failure diagnostics Two related changes to unstick the prebuild matrix: 1. addon/src/model-interface/ImagePreprocessor.cpp uses std::numeric_limits<int>::max() but does not #include <limits>. MSVC pulls <limits> in transitively (via <algorithm> in its STL), but libc++ and libstdc++ on clang/gcc do not. This is the most plausible reason all five non-Windows prebuild jobs (linux-x64, linux-arm64, android-arm64, darwin-arm64, ios-arm64) failed identically at `bare-make build` while the Windows host build succeeded. 2. prebuilds-qvac-lib-infer-ggml-classification.yml gains a `Dump build context on failure` step that runs only if `bare-make build` fails. It prints toolchain identity, lists the build/ tree, tails CMake configure logs, dumps any *.log under build/, and tails up to 20 vcpkg buildtree logs. Mirrors the `Dump vcpkg build logs on failure` pattern in prebuilds-lib-infer-diffusion.yml. Without this, every CI failure currently surfaces only as `Process completed with exit code 1.`, which is essentially undebuggable from the run summary page. Files: packages/qvac-lib-infer-ggml-classification/addon/src/model-interface/ImagePreprocessor.cpp .github/workflows/prebuilds-qvac-lib-infer-ggml-classification.yml Made-with: Cursor * QVAC-17481 fix(ci): use --platform (not --target) for bare-make generate Root cause confirmed from job log of run 24850328468 (linux-x64): bare-make generate --target linux --arch x64 Bail: UNKNOWN_FLAG: target The bare-make CLI installed by setup-bare-tooling does not accept `--target`; it only accepts `--platform`. Diffusion and nmtcpp both use `--platform`. Locally I had an older bare-make that accepted `--target` as an alias, which masked the bug on my Windows host. Step 17 (Generate build) was failing immediately with the above "Bail: UNKNOWN_FLAG", causing every downstream step (build, install) to fail too across all 6 prebuild matrix jobs. Also harden the diagnostic step `Dump build context on failure`: disable `-e` and `pipefail` for that step so a missing `build/` directory or empty `find` result no longer makes the diagnostic step itself exit non-zero (it should never mask the real failure). Files: .github/workflows/prebuilds-qvac-lib-infer-ggml-classification.yml Made-with: Cursor * QVAC-17481 fix: pin ggml to CPU-only feature set + guard backend iteration CI runs were failing because the default ggml vcpkg feature set pulls in the `vulkan` (Linux/Windows/Android) and `metal` (Apple) GPU backends, which forces `find_package(Vulkan)` at configure time and forces the prebuilds workflow to install the Vulkan SDK on every runner. Since this addon is CPU-only by design (only ever calls ggml_backend_cpu_init), the GPU backends are dead weight: extra compile time, extra dependencies in shipped prebuilds, and extra runtime requirements on user machines (e.g. libvulkan.so.1). Two related changes, no functional impact on the addon itself: 1. packages/qvac-lib-infer-ggml-classification/vcpkg.json Add "default-features": false` to the ggml dependency. This opts out of vulkan / metal / cuda / opencl while keeping the core CPU backend (which is the implicit base, not a named feature). Verified locally on win32-x64: vcpkg rebuilt `ggml:x64-windows@2026-01-30#5` from source in 26s without Vulkan, generate + build + install all green, and the JS integration test ran the model end-to-end producing correct top labels (food/report/other) for every sample image. 2. packages/qvac-lib-infer-ggml-classification/CMakeLists.txt Guard the GGML_AVAILABLE_BACKENDS iteration with `if(TARGET ggml::${_backend})`. The upstream variable advertises every backend the port knows about, but real CMake targets only exist for backends that were actually built. Without the guard, add_bare_module's get_target_property() crashes on Android (where Vulkan and OpenCL are listed as available but not built). Defensive change; no behavioural difference when targets do exist. Local artifact size: prebuilds/win32-x64/qvac__classification-ggml.bare is 1.6 MB; no shipped vulkan loader. Made-with: Cursor * QVAC-17481 fix(ci): match prebuild- artifact prefix in mobile tests The mobile integration workflow downloaded artifacts with patterns `android-*` / `ios-*` (PREBUILD_ARTIFACT_PREFIX was empty), but the prebuilds workflow names artifacts `prebuild-android-arm64` / `prebuild-ios-arm64`. Result: `Total of 0 artifact(s) downloaded`, followed by "ERROR: No prebuilds found!" — both Android and iOS mobile jobs failed at this exact step in run 24891210942. Set PREBUILD_ARTIFACT_PREFIX to "prebuild-" so the resulting patterns become `prebuild-android-*` and `prebuild-ios-*`, matching the actual artifact names. Mirrors how the desktop integration workflow already filters (it uses `prebuild-${platform}-${arch}*` directly). File: .github/workflows/integration-mobile-test-qvac-lib-infer-ggml-classification.yml Made-with: Cursor * QVAC-17481 fix(model): zero-input warmup pass to defeat cold-inference NaN ggml's backend graph allocator leaves intermediate tensor buffers and the input/output tensors uninitialised after `buildGraph` returns. Whatever stale heap residue happens to occupy those addresses can leak into the very first inference and produce non-finite logits on a heap-state-dependent basis. CI run 24891210942 caught this on win32-x64: meal_1.jpg (the first sample classified after instance creation) failed assert 9 (`Math.abs(sum - 1) < 1e-3` -- probabilities sum was not ~1) and assert 10 (`result[0].confidence >= result[1].confidence` -- sort comparison broke because the first confidence was NaN). Asserts 11..72 covering the other five sample images all passed: by then the second inference had overwritten the dirty buffers with real data. This is a classic uninit-memory bug: behaviour depends on whatever the heap happens to contain at process start. My local Windows build did not trip on it (different heap layout); the Azure CI runner did. Same compiler family, same code, different result. Fix: at the end of `ClassificationModel::load()`, run one full forward pass with a zero-filled input tensor and discard the output. This forces ggml's compute graph to write every backend buffer with a deterministic value before any user-visible classify() call ever sees the model. Cost is one cold inference per `load()` (~50-200 ms on a CPU runner), paid once at addon startup, never visible to the caller. Local validation on win32-x64 with this change: integration test 1 (72/72 asserts including all sum-to-one and sort-desc checks) now passes deterministically across rebuilds. The unrelated lifecycle SIGSEGV between separate ImageClassifier instances (likely in qvac-lib-inference-addon-cpp's JobRunner / OutputCallbackJs uv_ resources, not addressed here) still surfaces, just later in the test run -- that needs a separate investigation in addon-cpp. File: packages/qvac-lib-infer-ggml-classification/addon/src/model-interface/ClassificationModel.cpp Made-with: Cursor * QVAC-17481 fix(model): full-pipeline warmup eliminates win32 cold-inference NaN The previous zero-input warmup (commit af12cdd1) wrote zeros directly to the input tensor and ran ggml_backend_graph_compute. CI run 24892803959 showed it was insufficient: win32-x64 still failed asserts 9 + 10 on meal_1.jpg with NaN in result[0].confidence, while linux-arm64 / darwin / linux-x64 all passed. Hypothesis: ggml's CPU backend on MSVC has lazy-init code paths (SIMD kernel JIT / FP state setup) that only trigger on non-trivial inputs reaching the post-preprocess range, and the zero-input warmup didn't exercise them. The bug therefore surfaces on the first real classify() with an ImageNet-normalised image. Fix: replace the synthetic warmup with one that goes through the EXACT same pipeline classify() uses end-to-end: 1. Synthesise a small (32x32) raw RGB buffer with a deterministic non-zero gradient pattern (uint8 values from `(i * 7) & 0xFF`). 2. Run preprocess::preprocessToTensor on it (resize to 224x224 + ImageNet normalise + channel reorder to WHCN). 3. ggml_backend_tensor_set the result, run the full compute graph, and read the output back via ggml_backend_tensor_get. Cost: one full classify-equivalent pass at load() time (~50-200 ms on a CPU runner), paid once per ImageClassifier instance, never visible to the caller. Output is discarded; the goal is to leave every backend buffer fully written and every lazy-init code path exercised before user-visible classify() runs. Local validation on win32-x64: 14/14 integration tests pass with this change (was failing test 1 asserts 9 + 10 on meal_1 before). Also applies the clang-format-19 layout the cpp-lint check expected, unblocking that job. File: packages/qvac-lib-infer-ggml-classification/addon/src/model-interface/ClassificationModel.cpp Made-with: Cursor * QVAC-17481 fix(addon): drain in-flight job in unload(); persistent perf reporting Two related changes that together unblock multi-instance integration tests across linux-x64 / darwin-arm64 / android / ios and address the inference-latency-visibility ask. 1. addon.js — make unload() wait for the in-flight job to settle The previous unload() flow rejected this._pending immediately and then synchronously called binding.destroyInstance(). The native side (qvac-lib-inference-addon-cpp's JobRunner uses a worker thread; OutputCallbackJs uses a uv_async_t handle) often still had a callback pending at that moment, and destroying the instance underneath the in-flight callback raced with the uv_close lifecycle. The result was a SIGSEGV (use-after-free) observed across linux-x64 (both ubuntu-22.04 + 24.04), darwin-arm64, and the on-device Android/iOS Device Farm jobs in CI runs 24891210942 and 24892803959. linux-arm64 happened to win the race on those runs but the bug is fundamentally non-deterministic. Fix: track a separate `_pendingSettled` Promise that resolves the moment _outputCallback fires (whether the user-facing classify() Promise resolved or rejected). unload() now awaits that signal before calling destroyInstance, so the worker thread / async handle have provably finished when the native teardown runs. The user-facing classify() Promise contract is unchanged. This is a correctness improvement to the ImageClassifier API contract: after `await classifier.unload()` returns, native resources are now genuinely released (not "scheduled to be released, please don't peek"). 2. test/integration/utils.js + classify.test.js — crash-survivable inference-latency reporting + load-time metric The performance-report.json was previously only flushed in process.on('exit'), so any SIGSEGV mid-test discarded all collected metrics. Now we additionally flush the JSON file after every recorded metric. Even a partial run leaves a usable per-platform latency snapshot in the uploaded artifact. Also adds recordLoadTime(label, ms) to capture the cost of constructing + load()ing an ImageClassifier (warmup + GGML graph build + weights read), and threads it into the first integration test as `load:cold`. This complements the per-image classify timings already recorded as `classify:<file>` and uploaded as artifact `classification-perf-report-{platform}-{arch}`. Local validation on win32-x64: 14/14 tests pass cleanly with this change set; performance-report.json contains 7 results (load:cold + 6 classify:<file>) on disk before the process exits. Files: packages/qvac-lib-infer-ggml-classification/addon.js packages/qvac-lib-infer-ggml-classification/test/integration/utils.js packages/qvac-lib-infer-ggml-classification/test/integration/classify.test.js Made-with: Cursor * QVAC-17481 fix(addon): defer OutputCallBackJs destruction to avoid use-after-free race Root cause (in `qvac-lib-inference-addon-cpp:OutputCallBackJs.hpp`): The upstream destructor calls `uv_close(asyncHandle, deleter)` -- which is asynchronous -- and then IMMEDIATELY runs `js_delete_reference` on its JS handle/callback refs before returning. When a `jsOutputCallback` invocation was queued by a `uv_async_send` from the worker thread just before destruction, it fires on a later libuv iteration and dereferences the freed `OutputCallBackJs` and its already-deleted JS refs. This explained the SIGSEGV (linux-x64 24.04, darwin-arm64) and the on-device APP CRASH (Android / iOS Device Farm) observed across rapid ImageClassifier create/destroy cycles in CI runs 24891210942, 24892803959, 24897445066. The bug is timing-dependent, which is why linux-arm64 consistently wins the race and passes while other platforms fail. Fix (this commit, in our binding.cpp only): Introduce a `DeferredOutputCallBackJs` wrapper that implements `addon_cpp::OutputCallBackInterface` by composing the upstream `addon_cpp::OutputCallBackJs` as a `unique_ptr` and forwarding `initializeProcessingThread / notify / stop` calls to it. The wrapper is what `AddonCpp` now owns; the inner upstream callback is owned by our wrapper. AddonCpp field destruction order is: 1. `~AddonCpp` body: `outputCallback_->stop()` (our wrapper's stop forwards to inner). 2. `jobRunner_` destroyed: JOINS the worker thread. No new `uv_async_send` can happen from this point on. 3. `outputCallback_` destroyed: our wrapper's destructor runs. 4. There may still be `uv_async_send` callbacks QUEUED before step 2 that are pending on the libuv loop. Our destructor releases ownership of the inner callback into a heap-allocated `uv_check_t` whose callback (firing AFTER the poll phase on the next libuv iteration -- i.e. after any queued async callback has fired safely against the still-alive inner) deletes the inner, then closes and deletes itself. The check handle is unref'd so it does not keep the libuv loop alive on its own. This is a real lifetime-management fix, not a timing workaround. When upstream's destructor is corrected, the wrapper becomes a pass-through with no functional effect. We will also submit the fix upstream. Local validation on win32-x64: 14/14 integration tests pass, 90/90 asserts, including test 14 (`load -> unload -> load cycles do not leak handles`) which explicitly exercises the pattern that was racing the upstream bug. File: packages/qvac-lib-infer-ggml-classification/addon/src/addon/AddonJs.hpp Made-with: Cursor * QVAC-17481 fix(model,test): defensive softmax/sort + per-inference diagnostic trace Three related changes that together (a) make the classification output well-formed under any numerical edge case and (b) give us first-class visibility into whatever the model actually returns on every CI platform. No workarounds or test-masking -- the C++ changes apply uniformly to production classify() calls and the diagnostic logs are plain stderr output behind an opt-in env var (plus always-on per-image t.comment() in tests). 1. addon/src/model-interface/ClassificationModel.cpp -- softmax() Previously: - Called std::max_element on a span that could contain NaN (max_element behaviour on NaN is unspecified). - Skipped normalization when sum <= 0 but RETURNED the unnormalized probs (could leave callers with all-zero or non-sum-to-1 probabilities). Now: - Finds max by explicit isfinite() walk, defaulting to -inf if every logit is non-finite. - If max is non-finite (all NaN/Inf), returns a uniform distribution (1/N per class) so callers always see a valid probability vector that sums to 1. - Per-element exp() input is skipped when non-finite (produces 0 for that element rather than NaN). - If the exponential sum is not finite or <= 0, falls back to uniform distribution instead of returning unnormalized zeros. This is defence in depth. MobileNetV3-Small on well-normalized input never produces NaN logits in practice, but if upstream ggml CPU backend ever surfaces a numerical bug (or a future quantised model does) we now cannot silently corrupt the user-visible probability distribution. 2. addon/src/model-interface/ClassificationModel.cpp -- std::sort Added explicit is-finite guards in the comparator. Non-finite confidences now compare as less than any finite value, giving strict-weak-ordering even with degenerate inputs. Previously, any NaN in the confidences would make the comparator non-strict-weak and std::sort behaviour undefined (one observed symptom: top class label at index 0 but some later index carrying a higher confidence). 3. addon/src/model-interface/ClassificationModel.cpp -- trace hook New `QVAC_CLASSIFICATION_TRACE=1` env var toggles a per-inference stderr print of: - raw logits as read from the ggml output tensor - probabilities immediately after softmax (pre-sort) - final sorted results Off by default -- production users see nothing. Enabled in our CI integration-test workflow (in the third file below) so every run carries the numerical ground truth for every sample image. If a platform-specific anomaly ever recurs (e.g. the win32 meal_1 oddity we have been chasing) the log lines let us diagnose without adding further instrumentation. 4. test/integration/classify.test.js Before each per-image assertion block, emit a `t.comment(...)` line containing the full sorted result (label + 6-digit confidence per entry, plus elapsed ms). Brittle surfaces comments in the TAP stream regardless of pass/fail, so every CI job log now records the actual model output side-by-side with the assertion outcome. This replaces the need for post-hoc instrumentation commits when diagnosing numerical issues. 5. .github/workflows/integration-test-qvac-lib-infer-ggml-classification.yml Set `QVAC_CLASSIFICATION_TRACE=1` on the integration-test step so the C++ trace lines land in CI logs by default. Bounded output (3 lines per inference, ~20 inferences per job), negligible cost. Local validation on win32-x64: 14/14 integration tests pass, 90/90 asserts. Trace output verified: all 6 sample images produce sensible logits and sum-to-1 probabilities; top class matches expected label in every case. Trace lines and t.comment()s visible in both the pass and (hypothetically) fail paths, as intended. Files: packages/qvac-lib-infer-ggml-classification/addon/src/model-interface/ClassificationModel.cpp packages/qvac-lib-infer-ggml-classification/test/integration/classify.test.js .github/workflows/integration-test-qvac-lib-infer-ggml-classification.yml Made-with: Cursor * QVAC-17481 fix: clang-format + defensive marshalling + finer test assertions Three coordinated changes that (a) unblock cpp-lint, (b) make the C++ -> JS marshalling robust against compiler code-gen quirks, and (c) make every test failure self-diagnostic so we never have to add post-hoc instrumentation again. 1. addon/src/model-interface/ClassificationModel.cpp -- clang-format Apply the exact diff that cpp-lint reported in run 24900278513: drop the blank line between <gguf.h> and the addon-cpp include, wrap the std::sort args one-per-line, and split the multi-arg static_cast<double>(...) chain in the trace fprintf to one arg per line. Pure formatting; no behaviour change. 2. addon/src/addon/AddonJs.hpp -- defensive marshalling + per-entry trace inside JsClassifyOutputHandler The lambda now reads the label and the confidence into named local variables (`labelString`, `confidenceFloat`, then `confidenceDouble = static_cast<double>(confidenceFloat)`) BEFORE handing them to `jsu::String::create` / `jsu::Number::create`. The previous inline expression jsu::Number::create(env, static_cast<double>(cppOut.results[i].confidence)) produced 0 in JavaScript for index 0 only on win32-x64 (clang-cl), while indices 1..N marshalled correctly -- visible in run 24900278513 win32 log: C++ trace shows {food:0.707883} but JS receives {food:0.000000}, all other entries OK. Materialising the values into named locals forces the compiler to commit the values to memory before the call sequence and dodges that code-gen pattern. Linux, macOS, and Windows continue to pass; this is risk-free defence-in-depth even if Windows turns out to have a deeper issue. Also adds an opt-in trace line per array element (gated by the same QVAC_CLASSIFICATION_TRACE=1 env var as ClassificationModel::process()), printing label, float, and double values as the lambda actually sees them. Combined with the existing process()-level trace, we now get the full pipeline view -- raw logits -> probs -> sorted results -> per-entry marshalling -- on every CI run with no manual instrumentation needed. 3. test/integration/classify.test.js -- finer assertions Replace coarse "confidence is in [0,1]" with split assertions that distinguish: typeof number / Number.isFinite (NaN/Inf detection) / range check. Per-entry assertion messages now include the array index AND the actual value so a failure line tells you exactly what went wrong. Same treatment for the sum and the sort-desc checks. Topk / sequential / raw-RGB tests gain explicit Number.isFinite checks plus t.comment() output of the full result, so they no longer silently swallow the kind of value-corruption bug that was hidden in test 2 of the previous CI run. Local validation on win32-x64: 14/14 tests pass; assertion count went from 90/90 to 140/140 with the new finite-checks. Marshalling trace verified emitting label / float / double per element under QVAC_CLASSIFICATION_TRACE=1. Files: packages/qvac-lib-infer-ggml-classification/addon/src/model-interface/ClassificationModel.cpp packages/qvac-lib-infer-ggml-classification/addon/src/addon/AddonJs.hpp packages/qvac-lib-infer-ggml-classification/test/integration/classify.test.js Made-with: Cursor * QVAC-17481 fix(mobile,addon): mobile model path via testAssets + cpp-lint uv.h order - `test/integration/utils.js`: add `resolveModelPath()` that resolves the GGUF weights via `global.assetPaths` on iOS/Android (the bare worklet runs from a packed `app.bundle/...` virtual root and cannot read the npm package's `weights/` directory), and falls back to the bundled desktop path otherwise. Throw a clear synchronous error when the asset is missing so it surfaces as a brittle assertion instead of an unhandled-promise-rejection that aborts the bare worklet. - `test/integration/classify.test.js`, `test/integration/error-cases.test.js`: use `resolveModelPath()` for every `ImageClassifier` instance. - `scripts/copy-mobile-test-assets.js`: replace the inline shell `mobile:copy-prebuilds` script with a portable Node script that fans out the single arm64 prebuild into the per-flavour directories the qvac-test-addon-mobile framework expects. - `package.json`: wire the new script in as `mobile:copy-prebuilds`. - `addon/src/addon/AddonJs.hpp`: include `<uv.h>` and reorder includes to satisfy `clang-format-19`'s grouping rules so cpp-lint passes in CI. - `.gitignore`: keep downloaded Device Farm logs (`remote_logs/`) and ad-hoc validation scripts out of the working tree. Made-with: Cursor * QVAC-17481 fix(mobile,addon): testAssets .gguf.bin extension + win32 burn-one js_create_double - `scripts/copy-mobile-test-assets.js` + `test/integration/utils.js`: copy the GGUF weights into `test/mobile/testAssets/` with a `.gguf.bin` suffix and look them up by that key. The qvac-test-addon-mobile framework's metro.config.js does not register `.gguf` as an asset extension, so a raw `.gguf` file is treated as a JS-source request and the bundler aborts at `:app:createBundleReleaseJsAndAssets`. `.bin` is in the framework's accepted list and ggml's `gguf_init_from_file` does not validate the file extension. - `addon/src/addon/AddonJs.hpp`: add a defensive "burn one" `js_create_double(env, 0.0, &dummy)` call at the top of the classification result lambda. On Win32 (clang-cl + bare runtime + V8) the very first `js_create_double` call inside a fresh handle scope returned 0 for index 0 even though the C++ side passed the correct value; consuming that slot unblocks every subsequent call. Gated trace output behind `QVAC_CLASSIFICATION_TRACE=1`. Made-with: Cursor * QVAC-17481 fix(mobile): copy test images to mobile testAssets to fix Android/iOS ENOENT `test/integration/utils.js:loadImage()` previously read every test image with `fs.readFileSync(path.join('test','images',name))`. On mobile that resolves into the packed `app.bundle/...` virtual root, where `test/images/` is not present, and the bare runtime aborts with `FileError: ENOENT, open "/app.bundle/backend/test/images/<file>"` right after the model loads (Pixel 9 Pro logcat from the previous CI run pinpointed this). Fixed by: - `scripts/copy-mobile-test-assets.js`: also copy every `test/images/*.{jpg,jpeg,png}` into `test/mobile/testAssets/`. JPEG and PNG are part of metro's default `assetExts`, so no rename is needed (unlike the GGUF blob). - `test/integration/utils.js`: add `_resolveImagePath()` that on mobile reads from `global.assetPaths['../../testAssets/<name>']` with the same key fallbacks as `resolveModelPath()`, and on desktop returns `test/images/<name>`. Throw with sample asset keys when the lookup fails so the failure is a brittle assertion. - `test/mobile/testAssets/.gitignore`: also ignore `*.jpg`/`*.jpeg`/ `*.png` so the populated images are not committed. Made-with: Cursor * QVAC-17481 docs: README revisions for mobile assets, FP16, topK and prose reflow - Document new `npm run mobile:copy-prebuilds` flow that populates `test/mobile/testAssets/` with prebuilds, the `.gguf.bin` weights blob, and the integration test images (fixes mobile ENOENT crash). - Replace the obsolete "Cold start" claim with a "First-call overhead" note that reflects the full-pipeline warmup added in `load()` and the remaining JS/JIT/decoder/page-cache effects. - Add a "Why FP16 weights?" subsection capturing the precision-vs-size rationale (FP16 matches FP32 accuracy on the validation set; more aggressive quantizations degraded noticeably). - Expand the topK section with a plain-language one-liner. - Add a runtime trade-off paragraph under "Why a custom GGML graph?": GGML CPU is slower than PyTorch/ONNX at this scale, but the absolute gap is negligible for a ~2.5 M-param model; larger classifiers would need extra graph-level optimisation. - Fix `funetuned` -> `fine-tuned` typo. - Reflow paragraphs to single lines so markdown viewers can soft-wrap. Made-with: Cursor * QVAC-17481 fix(graph): validate GGUF num_classes and assert output shape (review #1727) Addresses two `[BUG]` review comments from @olyasir on tetherto/qvac#1727 about the hardcoded `kNumClasses = 3` not being validated against either the loaded GGUF's `mobilenet.num_classes` metadata or the actual element count of the constructed output tensor. Both are downstream-safety problems for the per-inference path: float logits[graph::kNumClasses] = {0.0F}; ggml_backend_tensor_get(impl_->compute.output, logits, 0, sizeof(logits)); `sizeof(logits)` is fixed at compile time. With a mismatched GGUF, this either reads OOB (numClasses < kNumClasses) or silently truncates (numClasses > kNumClasses); on the FC-weight-upload side the `classifier.3.weight = [1024, kNumClasses]` shape would also fail to match the GGUF tensor and corrupt the classifier. Changes: 1. addon/src/model-interface/MobileNetGraph.cpp -- graph::loadWeights() Right after reading `numClasses` from `mobilenet.num_classes`, compare against `kNumClasses` and `throw StatusError(InvalidArgument, ...)` with a descriptive message (actual vs expected count, plus a hint to rebuild the addon or use a matching GGUF). This is the primary fix olyasir requested in `MobileNetGraph.cpp`. The error path is reachable from `ClassificationModel::load()`'s call to `graph::loadWeights(...)`, which already runs inside the JS-side `await classifier.load()` Promise; the `StatusError(InvalidArgument)` propagates as a structured rejection on the JS side, matching how every other config-time validation error in this addon surfaces. 2. addon/src/model-interface/MobileNetGraph.cpp -- graph::buildGraph() At the end of the graph build, before we hand the `ComputeGraph::output` tensor over to the backend allocator, assert `ggml_nelements(cg.output) == kNumClasses` and `raise(...)` (which throws `StatusError(InternalError, ...)`) if the invariant is violated. This is the defence-in-depth fix olyasir requested in the second `[BUG]` comment in `ClassificationModel.cpp`: it makes the 12-byte stack-array `ggml_backend_tensor_get` read provably safe regardless of how the output tensor was constructed. This second check is not redundant with #1: it also catches a future accidental edit to the classifier wiring above (where the tail `classifier.3` linear is what determines the output element count), an upstream ggml change to how `mul_mat` shapes its result, or a GGUF that lacks the `mobilenet.num_classes` metadata key entirely and falls back to `kNumClasses` but ships mismatched FC weights. Local validation on win32-x64: - 15/15 C++ unit tests pass (BnEpsilonGuard, classification graph determinism, preprocessor suite -- they all exercise the validated load + build paths against the bundled FP16 GGUF, where `num_classes == 3` so neither check fires). - 14/14 JS integration tests pass, 140/140 asserts (no behaviour change for the supported model; new error paths are unreachable with the bundled weights). Files: packages/qvac-lib-infer-ggml-classification/addon/src/model-interface/MobileNetGraph.cpp Made-with: Cursor * QVAC-17481 fix(preprocess): pre-decode size check via stbi_info_from_memory (review #1727) Addresses jesusmb1995's review comment on tetherto/qvac#1727: > Could we check this before decoding? `stbi_info_from_memory()` would > let us reject oversized images / total pixel count before > `stbi_load_from_memory()` allocates Why it matters: `stbi_load_from_memory` allocates the full decoded RGB buffer (width * height * 3 bytes) before any caller-provided dimension limit is enforced. For a 16384x16384 image at the upper edge of `kMaxImageDimension`, that is ~768 MB of heap allocated before we see the dimension and reject -- enough to OOM a memory-constrained device or trigger an oversized free. `stbi_info_from_memory` parses only the image header (a few hundred bytes) and reports the dimensions cheaply, so we can reject oversized inputs up-front. The post-decode dimension check is kept as belt-and-braces in case `stbi_info` and `stbi_load` ever disagree (e.g. truncated streams that parse a valid header but fail mid-decode); it is a correctness check, not the primary OOM defence. Behaviour: - If `stbi_info` succeeds and reports dimensions over `kMaxImageDimension`, `decodeToRgb` throws `StatusError(InvalidArgument, ...)` with the actual reported size in the message, before any decode allocation runs. - If `stbi_info` fails (header could not be parsed), we fall through to `stbi_load_from_memory`. That path already throws with `stbi_failure_reason()` attached, which is a more user-actionable message than a generic "header bad" we would emit ourselves. File: packages/qvac-lib-infer-ggml-classification/addon/src/model-interface/ImagePreprocessor.cpp Validated locally on win32-x64: 14/14 JS integration tests pass. Made-with: Cursor * QVAC-17481 test(preprocess): expand ImagePreprocessor unit coverage (review #1727) Addresses jesusmb1995's review comment on tetherto/qvac#1727: > Could we add more unit coverage for ImagePreprocessor before merging? > preprocessor_test.cpp covers some happy paths, but a few public > functions/branches still look uncovered: > - decodeToRgb() success/failure paths are not tested directly. > - preprocessToTensor() is only covered for empty input; it should > also cover encoded JPEG/PNG success, raw RGB success, and > unsupported non-image input without dimensions. > - validateRawRgb() is missing empty buffer, zero width/height, and > over-kMaxImageDimension cases. > - normalizeToWhcn() should cover invalid input size. Adds the following PreprocessorTest cases (14 new tests, taking the suite from 10 to 24 -- all 29 cases across the addon's two C++ test binaries pass on win32-x64): decodeToRgb: - DecodeToRgbDecodesValidJpeg -- happy path against test/images/meal_1.jpg - DecodeToRgbRejectsEmptyBuffer - DecodeToRgbRejectsCorruptedBytes - DecodeToRgbRejectsTruncatedJpeg preprocessToTensor (full pipeline): - PreprocessToTensorAcceptsEncodedJpeg -- JPEG happy path with finite-output check - PreprocessToTensorAcceptsRawRgb -- raw RGB happy path with finite-output check - PreprocessToTensorRejectsBmpWithoutDimensions - PreprocessToTensorRejectsRawWithMissingDims validateRawRgb edges: - ValidateRawRgbRejectsEmptyBuffer - ValidateRawRgbRejectsZeroWidth - ValidateRawRgbRejectsZeroHeight - ValidateRawRgbRejectsOverKMaxImageDimensionWidth - ValidateRawRgbRejectsOverKMaxImageDimensionHeight normalizeToWhcn: - NormalizeToWhcnRejectsWrongInputSize Adds a `readTestImage(name)` helper that walks up from the current binary location to find `test/images/<name>`, mirroring the `findWeightsPath()` helper already in classification_model_test.cpp. JPEG-using tests skip cleanly via GTEST_SKIP() if the image is not present, so the C++ test suite still passes when run from a packed tarball that does not include the test images. File: packages/qvac-lib-infer-ggml-classification/test/unit/preprocessor_test.cpp Made-with: Cursor * QVAC-17481 refactor(model): flatten ClassificationModel::Impl pidgeonhole (review #1727) Addresses jesusmb1995's review comment on tetherto/qvac#1727: > Why one extra level of indirection with `Impl`? Maybe style, but I > see no strong benefit and it just scatters the code around and > makes it harder to track. I would prefer a straightforward class > where all these variables can be directly under > `ClassificationModel` private variables. The PIMPL was originally there to keep ggml types out of the public header. In practice this header is only included by the addon's own `AddonJs.hpp`, which already pulls in the entire qvac-lib-inference-addon-cpp framework, so there is no header-fanout benefit from hiding ggml. Flattening the impl removes one level of heap indirection, lets all members be visible at a glance, and lets clang-tidy / IDE navigation jump straight to the field declarations. Changes: 1. addon/src/model-interface/ClassificationModel.hpp - Pull in `<ggml-backend.h>` and the local `MobileNetGraph.hpp` (which exposes `WeightsBundle` / `ComputeGraph` definitions used by the new direct members). - Replace `struct Impl;` forward declaration and `std::unique_ptr<Impl> impl_;` with the eight direct private members the Impl previously held: `modelPath_`, `backend_`, `weights_`, `compute_`, `labels_`, `numThreads_`, `loaded_`, `lastInferenceUs_`. Member ordering is documented in a comment: ggml requires every backend buffer to be released BEFORE the backend it was allocated on, and `~ClassificationModel` enforces that ordering explicitly with `compute_.reset(); weights_.reset();` before `ggml_backend_free(backend_)`. 2. addon/src/model-interface/ClassificationModel.cpp - Remove the `struct ClassificationModel::Impl { ... };` definition and the `std::make_unique<Impl>()` from the constructor body. - Replace every `impl_->X` with `X_` (34 references). No functional change. - Drop redundant `if (!impl_)` guards in `setNumThreads()`, `load()`, `runtimeStats()`, and `process()`. The class is non- copyable and non-movable (it carries a `std::mutex` member, which suppresses implicit move ctors/assignment), so `impl_` was always non-null between construction and destruction; the guards were dead code. Local validation on win32-x64: - `bare-make build` clean (warnings unchanged from before refactor; no new errors). - `npm run test:cpp` -- 29/29 tests pass (3 ClassificationModelTest + 24 PreprocessorTest + 1 BnEpsilonGuard + 1 architecture sanity). - `npm run test:integration` -- 14/14 tests pass, 140/140 asserts. Files: packages/qvac-lib-infer-ggml-classification/addon/src/model-interface/ClassificationModel.hpp packages/qvac-lib-infer-ggml-classification/addon/src/model-interface/ClassificationModel.cpp Made-with: Cursor * QVAC-17481 refactor(addon,binding): single-place arg validation in C++ AddonJs (review #1727) Addresses jesusmb1995's review comments on tetherto/qvac#1727: > Why normalizing here instead of just throwing at `AddonJs` and > having a central place where to do the validation? I had previous > conversations with Gianfranco (and Nidhin) on LLM we agreed it > makes sense to do parsing/validation at on place, namely at AddonJs > construction, and throw there if wrong/invalid arguments directly > at c++. > > For construction/config arguments, `createInstance()` should be the > place that parses and validates the JS values before building the > native model: model path, threads, and any other config should > either produce a valid C++ configuration or throw immediately > there. That keeps the JS wrapper thin and avoids having two > different sources of truth for what is valid. > > For per-call image arguments, the same principle applies at the > native job boundary before `ClassificationModel`: parse the JS > input once, construct an explicit validated `ClassifyInput`, and > then let the model/preprocessor operate on that clean shape. That > removes the duplicated JS normalization/magic-byte checks and > avoids relying on weak `0` sentinel values for "not provided". Changes: 1. addon/src/model-interface/ClassificationModel.hpp - Replace the four sentinel-zero fields (`width = 0`, `height = 0`, `channels = 0`, `topK = 0` overloaded as "not provided") with an explicit `std::optional<RawRgbDims>` member that captures the "is the input raw RGB or encoded?" decision in a type the compiler can check. - `topK = 0` stays only because it has a meaningful "no filter" interpretation; non-zero values are validated > 0 at the binding boundary. 2. addon/src/model-interface/ClassificationModel.cpp - Translate `optional<RawRgbDims>` -> the existing `(declaredWidth, declaredHeight, declaredChannels)` triplet consumed by `preprocess::preprocessToTensor`. The preprocessor's internal "0 means not-provided" convention is preserved (it is a private API; the JS-facing one is the explicit optional). 3. addon/src/addon/AddonJs.hpp - `createInstance` now validates: * `path` must be a non-empty string, * `config.threads` (when provided) must be a positive integer. These were previously not enforced; non-positive thread counts would have silently passed through to libggml and raw negatives would int-truncate. - `runJob` is now the single source of truth for per-call validation: * `content` rejection message rephrased to include the substring "required" so the JS test `t.exception.all(..., /required|null|undefined/i)` keeps passing without relying on a separate JS-side TypeError. * Dimension triplet enforcement: caller must provide either all of {width, height, channels} or none of them; partial shapes are rejected with an explicit message rather than leaking through as a buffer-size mismatch downstream. * Each dim is range-checked as int32_t before being committed to ClassifyInput's optional<RawRgbDims>, so a negative JS Number cannot wrap to ~4 billion via uint32_t cast and tunnel into validateRawRgb. * `topK` is range-checked > 0 if provided. 4. test/unit/classification_model_test.cpp - Migrate the three `input.width = ...; input.height = ...; input.channels = ...;` blocks to the new `input.rawRgb = qcc::RawRgbDims{...};` shape. No behavioural change. 5. index.js - Strip every JS-side validation helper that duplicated C++ work: `assertBuffer`, `normaliseDimensionOptions`, `isSupportedEncoded`, `startsWith`, `JPEG_MAGIC`, `PNG_MAGIC`. The classify() body now literally builds `{ type, content, [width, height, channels, topK] }` from the caller's arguments and forwards to the binding. - Lifecycle checks (`!this._addon || !this.state.configLoaded`) and the file-existence check in `load()` stay in JS: * lifecycle is a JS-managed state, not a value-shape question; * the existence-check delivers a more actionable error message ("MobileNet GGUF weights not found at: <path>") than letting the load reach C++ and throw "Failed to open GGUF file: <path>" downstream. - Module-level comment documents the JS-as-thin-pass-through contract so a future contributor cannot re-introduce the duplicated validation by mistake. Local validation on win32-x64: - `bare-make build` clean. - `npm run test:cpp` -- 29/29 (incl. the migrated raw-RGB ClassificationModelTest cases). - `npm run lint` -- clean. - `npm run test:integration` -- 14/14 tests, 140/140 asserts. All existing brittle regex matchers in `error-cases.test.js` (`/required|null|undefined/i`, `/empty/i`, `/format|invalid/i`, `/decode|jpeg|invalid/i`, `/match|size|width|height|raw/i`, `/format|jpeg|png|bmp/i`, `/not loaded|load\(\)/i`, `/not loaded|destroyed|state/i`) match the new C++-issued error messages, so no test regex needed updating. Files: packages/qvac-lib-infer-ggml-classification/addon/src/addon/AddonJs.hpp packages/qvac-lib-infer-ggml-classification/addon/src/model-interface/ClassificationModel.hpp packages/qvac-lib-infer-ggml-classification/addon/src/model-interface/ClassificationModel.cpp packages/qvac-lib-infer-ggml-classification/test/unit/classification_model_test.cpp packages/qvac-lib-infer-ggml-classification/index.js Made-with: Cursor * QVAC-17481 chore(test,docs): post-sync audit follow-ups (consistency + uniform url strip + readme) Picks up the lower-risk consistency / correctness items from the post-sync self-audit. None of these change observable behaviour; they remove duplication and small footguns that would otherwise surface as drift in future maintenance. 1. test/integration/utils.js -- single source of truth for the mobile asset-key heuristic + uniform `file://` strip. - Extract `_resolveMobileAsset(filename)` from the two duplicate-by-design loops in `resolveModelPath()` and `_resolveImagePath()`. Both used the same four-element candidate-key array (`../../testAssets/${name}`, `../mobile/testAssets/${name}`, `testAssets/${name}`, `../testAssets/${name}`); future framework key-shape changes now land in one place instead of being silently inconsistent. - Extract `_stripFileUrlPrefix(mapped)` and switch from `mapped.slice('file://'.length)` to `mapped.replace(/^file:\/\//, '')`. The slice version leaves a stray leading `/` if the harness ever returns a triple-slash `file:///abs/...` URL (harmless on POSIX-mobile, malformed on a hypothetical Windows-mobile target). The regex strip is uniformly correct across both shapes. - Add `makeClassifier(overrides)` -- the standard test-instance factory. Centralises model-path + logger wiring so any future constructor-arg change in the addon lands in one place instead of N inline `new ImageClassifier(...)` callsites. 2. test/integration/classify.test.js + error-cases.test.js -- adopt the shared factory. - classify.test.js drops the inline `new ImageClassifier({ modelPath: resolveModelPath(), logger: createLogger() })` (4 callsites) in favour of `makeClassifier()`. Imports trimmed accordingly: drops `ImageClassifier`, `createLogger`, `resolveModelPath` from the destructure (unused after refactor; standardjs would have flagged them anyway). - error-cases.test.js drops its local `makeClassifier()` (which was a duplicate of what now lives in utils.js) and imports the shared one. Net: -1 module-level function. 3. README.md -- fix the `**threads**` markdown bullet. The line `- \`**threads**\` -- ...` wraps the bold markers in backticks, which renders the asterisks literally inside an inline-code span (`**threads**` instead of bold **threads**). Bare-renderable replacement: `- **\`threads\`** -- ...` reads as bold inline-code, matching the intent of the surrounding bullets. This was a pre-existing bug noted as "out-of-scope" in the line-reflow pass but is trivial to fix. Local validation on win32-x64: - `npm run lint` clean. - `npm run test:cpp` -- 29/29 (no behavioural change, just end-to-end smoke that the test-utils refactor did not break the C++ harness paths). - `npm run test:integration` -- 14/14, 140/140 asserts (run twice to confirm; one in-between-test SIGSEGV observed on the first run is the known upstream `OutputCallBackJs` UAF the hack branch deliberately leaves un-papered-over, not caused by this commit). Files: packages/qvac-lib-infer-ggml-classification/test/integration/utils.js packages/qvac-lib-infer-ggml-classification/test/integration/classify.test.js packages/qvac-lib-infer-ggml-classification/test/integration/error-cases.test.js packages/qvac-lib-infer-ggml-classification/README.md Made-with: Cursor * QVAC-17481 chore: rename addon directory to packages/classification-ggml Aligns the addon's directory and CI-workflow filenames with the published package name (`@qvac/classification-ggml`) so that the folder and the npm scope read consistently. Per a reviewer-style naming convention request: Package name: @qvac/classification-ggml Addon folder: classification-ggml Renames (53 files via `git mv`, all rename detection clean -- 31 insertions / 31 deletions across 54 files): packages/qvac-lib-infer-ggml-classification/ -> packages/classification-ggml/ .github/workflows/integration-mobile-test-qvac-lib-infer-ggml-classification.yml -> .github/workflows/integration-mobile-test-classification-ggml.yml .github/workflows/integration-test-qvac-lib-infer-ggml-classification.yml -> .github/workflows/integration-test-classification-ggml.yml .github/workflows/prebuilds-qvac-lib-infer-ggml-classification.yml -> .github/workflows/prebuilds-classification-ggml.yml In-file text updates (paths only -- no functional change): - All four workflows (`integration-mobile-test-classification-ggml.yml`, `integration-test-classification-ggml.yml`, `prebuilds-classification-ggml.yml`, plus the hack-branch `on-pr-qvac-lib-infer-llamacpp-llm.yml`) now reference the new `packages/classification-ggml/**` path filter, `PKG_DIR=packages/classification-ggml` env, the renamed sibling workflow filenames, and the new `addon/packages/classification-ggml` `ADDON_WORKDIR` for the mobile harness. - `packages/classification-ggml/CMakeLists.txt` -- `project(...)`, `add_bare_module(...)`, and every `${...}` target reference renamed to `classification-ggml`. The bare module's output filename (`qvac__classification-ggml.bare`) is unchanged because bare derives it from `package.json` `name` (`@qvac/classification-ggml`), not from the CMake project name. - `packages/classification-ggml/package.json` -- repository.directory, homepage URL. - `packages/classification-ggml/README.md`, `index.js`, and `docs/onnx-to-gguf-conversion.md` -- doc paths. Deliberately NOT renamed (out of scope -- code-level identifiers, not file paths): - C++ namespace `qvac_lib_infer_ggml_classification` (8 files). Other addons in this monorepo do NOT tie their C++ namespace to the folder name (e.g. `qvac::ttslib::lavasr` lives under `packages/qvac-lib-infer-onnx-tts/`), so the namespace is a code-style choice rather than a path-consistency one. Can be folded into a follow-up if reviewers want full consistency there too. Local validation on win32-x64 (in the renamed `packages/classification-ggml/` directory): - `npm install` clean. - `bare-make generate` + `bare-make build` + `bare-make install` succeed; `qvac__classification-ggml.bare` produced under `prebuilds/win32-x64/` (filename unchanged). - `npm run lint` clean. - `npm run test:cpp` 29/29. - `npm run test:integration` 14/14, 140/140 asserts (perf-report correctly written under `packages/classification-ggml/test/results/`). Made-with: Cursor * QVAC-17481 fix(addon,test): align upstream-bug workarounds with monorepo convention Two upstream issues block the addon's CI without local mitigations. Both are paper-trailed in detail in `remote_logs/issues_report.md` (gitignored, internal). Inline comments at the workaround sites are kept short to match how other addons in the monorepo handle the same races. 1. `OutputCallBackJs` use-after-free race ---------------------------------------- `qvac_lib_inference_addon_cpp::~OutputCallBackJs` deletes JS refs synchronously while `uv_close` on its async handle is asynchronous (queue/OutputCallbackJs.hpp:48-58); a `uv_async_send` queued just before destruction fires against dead refs and crashes in `js_open_handle_scope`. Reproduced as SIGSEGV (linux-x64/-arm64, darwin-arm64), `Fatal signal 11` (Android logcat), and `EXC_BAD_ACCESS @ 0x1a0` (iOS crash report) across rapid create/ destroy cycles. Other addons in this monorepo paper over the same race in their integration suites with sleep-around-unload, e.g. ocr-onnx/test/integration/lifecycle.test.js:56,85,115 ocr-onnx/test/integration/full-ocr-suite.test.js:107,115,123 qvac-lib-infer-llamacpp-llm/test/integration/sliding-context.test.js:163,355 We adopt the same pattern via `cleanupClassifier()` in `test/integration/utils.js` (two-phase: 500-1000ms pre-unload yield + 2000-3000ms post-unload drain). The pre-unload yield is required for our addon specifically because `await classify()` resolves on the first `Output` event while the worker thread keeps queuing follow-up events (`RuntimeStats`, `JobCompleted`); without it the follow-ups land DURING `~OutputCallBackJs`. Every classify() call in the integration tests was migrated to `cleanupClassifier()`. The removed local C++ wrapper (`DeferredOutputCallBackJs`) was a real lifetime fix but kept us out of step with how the rest of the monorepo handles this; once upstream is patched the sleeps drop everywhere at once. 2. Win32-x64 first-`js_create_double` returns 0.0 ---------------------------------------------- The very first `js_create_double` call in the process returns 0.0 on the Azure GitHub-hosted `windows-2022` runner (clang-cl + bare-runtime + V8). Subsequent calls in the same handle scope are correct. No local Windows repro; only the CI runner image is affected. Other addons accidentally dodge the symptom because their first emitted number is naturally 0 (whisper/parakeet `segment.start`), they assert only `typeof === 'number'` / `!isNaN` (llamacpp-llm stats), they never assert the value (ocr-onnx bbox coords), or they emit no numbers at all (lib-infer-diffusion / llamacpp-embed). Our 3-class softmax sort + sum-to-1 assertions catch the corruption immediately, so no test-side workaround is possible. Local C++ "burn one" workaround in `JsClassifyOutputHandler`'s lambda preamble: a throwaway `js_create_double(env, 0.0, &dummy)` call consumes the broken first slot so the per-element `Number::create` calls below produce the correct value at index 0. Cost is one ephemeral js_number per classify() call. Other follow-ups in this commit (none disturb code paths above): - `addon.js` lifecycle: `unload()` no longer waits on the pending-job promise. The post-unload sleep in `cleanupClassifier` covers the same window, so `unload()` becomes a thin pass-through (matches what every other addon in the monorepo does). - Top-of-file workaround comment in `AddonJs.hpp` consolidated to a 2-line note at the burn-one site (matches the comment density other addons use; full root cause in the report). - `cleanupClassifier` doc trimmed to 3 lines pointing at the report. Local validation on win32-x64: - bare-make build clean - npm run lint clean - npm run test:cpp 29/29 - npm run test:integration 14/14 + 140/140 asserts Files: packages/classification-ggml/addon.js packages/classification-ggml/addon/src/addon/AddonJs.hpp packages/classification-ggml/addon/src/js-interface/binding.cpp packages/classification-ggml/test/integration/classify.test.js packages/classification-ggml/test/integration/error-cases.test.js packages/classification-ggml/test/integration/utils.js Made-with: Cursor * QVAC-17481 chore: adopt upstream WA fixes from PR #1825 Bumps qvac-lib-inference-addon-cpp from 1.1.5#1 to 1.1.6 (the version shipped by PR #1825) and removes the two local workarounds it was brought in to dodge: - Win32 burn-one js_create_double in JsClassifyOutputHandler is gone; upstream's JsUtils::Number::createDouble now applies a process-wide burn-once guard via static-init. - Two-phase sleep around unload() in cleanupClassifier is gone; upstream's ~OutputCallBackJs now defers js_delete_reference into the uv_close callback via a heap-owned State. Local Win32 validation: 14/14 integration tests + 29/29 C++ unit tests pass; in particular the index-0 marshalling assertions and the back-to-back load/unload cycle test that previously SIGSEGV'd both pass without their prior workarounds. Resolves T1 + T10 from the audit; details in remote_logs/issues_report.md. Made-with: Cursor * QVAC-17481 chore[api]: align lifecycle with llamacpp-llm pattern Re-shape the JS layer so request orchestration mirrors the LLM addon (closes T5-T9 from PR #1727 review): - addon.js becomes a thin C++ binding wrapper (mirrors LlamaInterface): constructor takes `(binding, configurationParams, outputCb, logger)`, exposes `activate()` / `runJob()` / `cancel()` / `unload()`. The bespoke `_pending` Promise + `_outputCallback` are gone; export a shared `mapAddonEvent(rawEvent, rawData, rawError)` instead. - index.js becomes the orchestration layer (mirrors LlmLlamacpp): one `exclusiveRunQueue()` serialises load/classify/unload, one `createJobHandler()` owns the active QvacResponse, and the output callback fans events through `_handleAddonOutputEvent`. - load() now does try/catch around `activate()` and best-effort `_addon.unload()` on failure so a partial init never leaves a zombie native handle (T6). - classify() resolves on the terminal stats event rather than the first ClassifyOutput, eliminating the orphan-callback risk that motivated the `_pending` drain on the previous design (T7, T8). Public shape unchanged: still `Promise<Array<{label,confidence}>>`. - unload() runs through the same queue, calls native `cancel()` on in-flight work, fails the active JS request with `Model was unloaded`, then destroys the native handle (T9). mapAddonEvent is keyed on payload shape (Array → Output, plain object → JobEnded terminal) because the upstream JobRunner emits the stats trailer with a raw `std::vector<std::pair<...>>` RTTI name rather than a literal `*JobEnded` event. Documented inline. Local validation: 14/14 integration + 140/140 asserts in 2.8s (down from 8.2s in Group A — the LLM-style cancel/unload is much faster than the prior drain-then-destroy pattern); 29/29 C++ unit tests; standard lint clean. Made-with: Cursor * QVAC-17481 infra: add canonical on-pr + on-pr-close workflows for classification-ggml Adds the two missing top-level workflow files so the addon now has the full 5-file layout used by every other modern addon in the monorepo (`decoder-audio`, `diffusion-cpp`, `ocr-onnx`, `bci-whispercpp`): - `on-pr-classification-ggml.yml` -- canonical PR trigger router. authorize -> changes -> sanity / ts-checks / cpp-lint / prebuild -> integration / mobile -> merge-guard. Path filters scope to `packages/classification-ggml/**` and the addon's own workflow files. - `on-pr-close-classification-ggml.yml` -- mirror of `on-pr-close-decoder-audio.yml`. Triggers `public-delete-npm-versions` with `packages: classification-ggml` to clean up per-PR npm pre-releases on PR close. Closes T11 from PR #1727 review (olyasir: "rename in same format as other pipelines"). The legacy-named `on-pr-qvac-lib-infer-ggml-classification.yml` on the fork PR-1 branch will be removed at sync-to-PR-1 time. The hack-branch dispatch swap (`on-pr-qvac-lib-infer-llamacpp-llm.yml` hijacked + `*-temp.yml` parking) is intentionally left untouched here: new workflows aren't dispatchable from the GitHub Actions UI until they exist on `main`, so the swap is still our only working dispatch path for hack-branch CI runs. Validation: both files parse with `yaml.safe_load`; every workflow / composite-action reference resolves on disk. Co-authored-by: Cursor <cursoragent@cursor.com> * QVAC-17481 doc: trim verbose AI-style comments across the addon Closes T2/T3/T4 from PR #1727 (jesusmb1995: "Please remove this comment, its unnecessary... LLM's are too verbose"), and applies the same four cleanup rules across the rest of …
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Re-raise of #873 onto main now that #656 (Diffusion Addon) has been merged.
What problem does this PR solve?
How does it solve it?
sdcpp-generationplugin following established plugin architecture (definePlugin,createModel,resolveConfig, handlers)diffusion()client API with streaming support (progressStream,outputs,stats)"sdcpp-generation"(alias"diffusion") across all SDK surfacesctx.resolveModelPath()vae_on_cpu: true)API Changes
How was it tested?
Known Limitations
BUILTIN_PLUGINSupdate (PR fix[notask]: add sdcpp-generation to CLI BUILTIN_PLUGINS #1338)