QVAC-17481 feat[api]: integrate @qvac/classification-ggml into SDK #2056
DmitryMalishev wants to merge 1233 commits into
…lows for SDK (tetherto#1653) * infra: add suite filtering and PR-triggered e2e test workflows for SDK - Add workflow_call trigger + suite/exclude-suite inputs to test-sdk.yml - Thread suite/exclude-suite through desktop, android, and iOS reusable workflows - Create on-pr-test-sdk.yml for label-based and release-branch triggers - Add suite choice dropdown (full/smoke/custom) for manual dispatch * infra: add Device Farm artifact download and upload to Android SDK test workflow * infra: fix CodeQL alert in on-pr-test-sdk.yml Remove git fetch/diff of PR head refs that triggered "checkout of untrusted code in trusted context" alert. Use sparse checkout (only authorize-pr action) and rely on the trigger-level paths filter for SDK change detection on release branch PRs. * infra: improve Android Device Farm logging with continuous background capture Replace inline logcat consumption with background capture to a persistent log file, matching the iOS pymobiledevice3 pattern. Adds full unfiltered logcat dump and React Native log extraction in post_test. * infra: increase Device Farm cleanup wait to 30 min for artifact download The STOPPING state can take 10+ minutes on Device Farm. Previous 5-min wait (30x10s) was insufficient, resulting in empty artifact downloads. Now waits up to 30 min (120x15s) and merges stop+wait+download into a single step. Applied to both iOS and Android cleanup jobs. * fix: fix security issue caused by allowing out-of-date labeled prs
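The cleanup-wait change above (30x10s replaced by 120x15s) is a bounded polling loop. A minimal JavaScript sketch of that pattern follows, purely as an illustration: the workflow itself is not written this way, and `pollUntil`, `maxWaitMinutes`, and `defaultSleep` are hypothetical names introduced here.

```javascript
'use strict'

// Hypothetical helpers for illustration only; not code from test-sdk.yml.

function defaultSleep (ms) {
  return new Promise(resolve => setTimeout(resolve, ms))
}

// Calls `check` up to `attempts` times, sleeping `intervalMs` after each
// failed check. Resolves to true as soon as `check` returns a truthy value,
// or to false once the attempt budget is exhausted.
async function pollUntil (check, { attempts, intervalMs, sleep = defaultSleep }) {
  for (let i = 0; i < attempts; i++) {
    if (await check()) return true
    await sleep(intervalMs)
  }
  return false
}

// Upper bound on time spent sleeping, in minutes.
function maxWaitMinutes (attempts, intervalSec) {
  return (attempts * intervalSec) / 60
}

console.log(maxWaitMinutes(30, 10)) // 5  (previous budget)
console.log(maxWaitMinutes(120, 15)) // 30 (new budget)
```

With the numbers from the commit message, the old budget tops out at 5 minutes, which is shorter than the 10+ minutes the STOPPING state can take, while the new one allows 30.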
…1704) Nested android/ios jobs in test-sdk.yml require id-token:write for AWS OIDC credential exchange. The caller workflow permissions are the ceiling for all nested reusable workflows — without this, GitHub rejects the entire workflow chain at startup with startup_failure.
tetherto#1583) * QVAC-17057 feat: add bci-whispercpp package for BCI neural signal transcription Add a new @qvac/bci-whispercpp addon that transcribes brain-computer interface neural signals into text using a modified whisper.cpp backend. This POC includes: - C++ native addon with BCI model inference (NeuralProcessor, BCIModel, BCIConfig) built on the qvac addon-cpp framework - CMake + vcpkg build system with whisper-cpp overlay ports carrying BCI-specific patches (variable conv1 kernel, windowed attention) - JavaScript API: BCIWhispercpp class with batch transcribeFile/transcribe - Integration tests for load/destroy and batch transcription - Example script and model conversion tooling - WER utility for accuracy measurement Streaming transcription will be added in a follow-up PR (QVAC-17062). Made-with: Cursor * fix[api](bci): address review feedback, refactor to infer-base pattern, fix Linux linkage - Refactor BCIWhispercpp to use createJobHandler + exclusiveRunQueue from @qvac/infer-base instead of manual promise plumbing, matching the TranscriptionWhispercpp / LlmLlamacpp addon pattern - Constructor now takes { files: { model }, logger, opts } (was { modelPath }) - transcribe/transcribeFile return QvacResponse - Add unload(), getState(), exclusiveRunQueue-serialized destroy() - Add @qvac/infer-base dependency Address all review feedback from Gustavo (PR tetherto#1583): - Remove unused END_OF_INPUT, totalSamples_, sleep_for(1ms) - Use QvacErrorAddonBCI for model-not-found, add BUFFER_LIMIT_EXCEEDED - Fix n_threads/duration_ms double→int conversion in BCIConfig.cpp - Add bounds validation for all BCIConfig numeric params - Throw on unknown config keys (was silently ignored) - Consume gpu_device in context params - Collect whisper timings in runtimeStats() - Trim unused BCIErrors enum values, map codes to distinct names - Add MAX_BUFFERED_BYTES guard and nextSafeId in bci.js - Fix _activeJobId race: set after native acceptance - Remove unimplemented bciConfig 
params from JS whitelist + index.d.ts - Promote hardcoded kernel-trim threshold to named constant - Pre-allocate dummyAudioPad_ as class member (avoid repeated allocs) - Rename bci-addon.test.js → addon.test.js - Replace t.skip() with proper assertions - Fix day_idx handling in tests/examples (group by day, pass to config) - Generate comprehensive NOTICE file - Update vcpkg overlay to v1.8.4 description Fix Linux C++ test linkage: - Add vcpkg triplets (x64-linux, arm64-linux) with -stdlib=libc++ - Add linux-clang toolchain (clang-19) - Set VCPKG_OVERLAY_TRIPLETS in CMakeLists.txt for Linux builds Made-with: Cursor * perf(bci): bump whisper-cpp overlay to include mask caching and per-layer flash attn Update whisper-cpp overlay to 5645ad60 which includes: - Cached window_mask recompute for exp_n_audio_ctx overrides - Per-layer flash attention (upper encoder layers use FA even with BCI) - std::abs instead of C abs in mask computation Made-with: Cursor * chore(bci): bump whisper-cpp overlay to include jpgaribotti review fixes Update overlay to tetherto/qvac-ext-lib-whisper.cpp@3e91e3a4 which addresses jpgaribotti's review on PR tetherto#10: 1. Extract compute_window_mask() helper to eliminate duplicated O(n_ctx^2) mask fill logic 2. Guard encode-time mask block with hparams.is_bci 3. Add is_bci to graph builder window_mask guard 4. Validate BCI hparams (conv1_kernel > 0, window_size >= 0) 5. Document n_mels > 256 threshold convention Bump port-version to 3. Made-with: Cursor * fix(bci): add test fixture download to download-models.sh Address Gustavo's review feedback: test fixtures (neural_sample_*.bin) are gitignored but the PR had no way for developers to obtain them. Rewrite download-models.sh to fetch both models and test fixtures from the bci-test-assets-v0.1.0 GitHub release. Supports --models, --fixtures, or both (default). 
Made-with: Cursor * fix(bci): address review findings — version mismatch, test indexing, cleanup - Bump whisper-cpp override in vcpkg.json from 1.7.5.1 to 1.8.4 to match the overlay port version - Move gtest to a vcpkg "tests" feature so it is only pulled when BUILD_TESTING=ON - Fix PaddedFramesAreZero test: use mel-major indexing (data[bin * n_frames + frame]) matching the actual processToMel layout - Remove four unused overlay patch files (0001–0004) now that portfile.cmake fetches from the tetherto fork with patches baked in - Add TODO comment in download-models.sh noting the temporary personal fork for release assets Made-with: Cursor * fix(bci): address review findings — race guard, cross-platform path, docs accuracy - Wrap transcribe() in exclusiveRunQueue to prevent race between inference and unload/destroy - Use find_last_of("/\\") in loadEmbedderIfNeeded for Windows compat - Add empty-buffer guard in bci.js append() before end-of-job - Update download-models.sh to use tetherto/qvac release repo - Add transformers to NOTICE and README model conversion prerequisites - Fix README WER table to match actual live test results (6.0% avg) - Fix BCI_V184_COMPAT.md stale test filename and overlay ref - Remove unused bci_wer_vs_expected field from manifest.json - Update whisper.cpp patches section to reflect fork-based overlay Made-with: Cursor * fix(bci): harden lifecycle, type safety, and C++ code quality - Fix unload/destroy race: call destroyInstance() before _job.fail() so the native side stops before the JS job is failed, and remove redundant cancel() call (destroyInstance already cancels internally) - Wrap BCIInterface construction in try/catch so a native init failure sets addon=null and throws a structured QvacErrorAddonBCI - Change JSAdapter loadContextParams/loadMiscParams/loadBCIParams to return void (callers already mutate via reference, return was dead) - Add dayIdx bounds-check warning in BCIModel::process when the value falls outside [0, numDays-1] 
before silent clamping - Promote hardcoded gaussian smoothing params (std=2.0, kernel=100) to named constants K_SMOOTH_KERNEL_STD / K_SMOOTH_KERNEL_SIZE - Add NeuralProcessor::getNumDays() accessor for the bounds check - Remove [key: string]: unknown escape hatch from WhisperConfig in index.d.ts; enumerate all valid keys explicitly - Fix test:cpp:run script to use direct path instead of cd && chain Made-with: Cursor * chore(bci): point whisper-cpp overlay to merged master (2b1e04f) qvac-ext-lib-whisper.cpp PR tetherto#10 has been merged. Update the overlay to reference the merge commit on master instead of the feature branch commit, so the overlay remains valid if the branch is deleted. Bump port-version to 4. Made-with: Cursor * fix(bci): serialize inference lifetime and export low-level subpaths Address ogad-tether review feedback on PR tetherto#1583: 1. Inference queue: transcribe() now holds its slot until the response settles via _enqueueInference(), matching the pattern from TranscriptionWhispercpp._enqueueExclusiveRunResponse(). Previously the exclusiveRunQueue released the slot as soon as runJob() was accepted, allowing a second concurrent transcribe() to race in and either clobber the first response or get rejected by the native side. 2. Exports map: add ./bci, ./bci.js, and ./binding subpath exports so the low-level BCIInterface API documented in the README is accessible after publish. The exports map previously only exposed ./binding.js, blocking require('@qvac/bci-whispercpp/bci'). Made-with: Cursor * refactor[bc](bci): rename to qvac-lib-infer-bci-whispercpp and address review Align the BCI package with the inference-addon family conventions and resolve the review findings that accumulated across PR tetherto#1583. Breaking changes - Package directory renamed from packages/bci-whispercpp to packages/qvac-lib-infer-bci-whispercpp (npm name @qvac/bci-whispercpp unchanged). 
- Error codes moved from 7001-7013 (collided with @qvac/tts-onnx and the @qvac/transcription-parakeet fallback range) to the dedicated 26001-27000 range. Also adds FAILED_TO_START_JOB, INVALID_CONFIG, and EMBEDDER_WEIGHTS_INVALID for cases that were previously swallowed. Pattern / standard alignment with peer addons - Add addonLogging.js + addonLogging.d.ts + ./addonLogging subpath export. - Add CHANGELOG.md, PULL_REQUEST_TEMPLATE.md, tsconfig.dts.json. - Pin qvac-lib-inference-addon-cpp vcpkg dep to 1.1.5#1 (port-version). - vcpkg default-registry switched from git@github.com: to https:// (fixes anonymous clones and CI runners without an SSH deploy key). - Lint glob now covers lib/**/*.js. - bare engine bumped from >=1.19.0 to >=1.24.0 to match llamacpp-llm/embed. - VCPKG_OVERLAY_TRIPLETS set unconditionally and preserves external value. - Remove test:unit script that pointed at a non-existent dir; add build:pack, lint-cpp, test:dts scripts matching peer conventions. - package.json files array now includes README.md, CHANGELOG.md, and addonLogging artifacts; repository.directory + homepage point at the renamed path. PR review fixes (Gustavo, ogad-tether, github-code-quality bot) - day_idx default aligned: C++ runtime default is now 0 (matches the public JS/TS docs and NeuralProcessor header default). - BCIInterface.runJob rewrap now uses FAILED_TO_START_JOB instead of the misleading FAILED_TO_APPEND; input is validated (Uint8Array, non-empty). - day_idx: -1 passthrough mode is now explicitly documented in configChecker, README, and index.d.ts, and values < -1 are rejected at the JS boundary. - JS _load no longer sets suppress_nst/temperature defaults that fought the BCI-tuned C++ defaults in toWhisperFullParams. - Duplicate checkConfig call in BCIWhispercpp._load removed; validation now happens once inside the BCIInterface constructor. 
- whisper_log_set guarded by std::once_flag so it does not clobber any log handler a coexisting whisper-based addon installed in the same process. - Embedder weight loader now checks the stream state after every read and returns false on truncation instead of silently marking the weights as loaded and producing garbage at inference time. - NeuralProcessor day projection is now memoized per day_idx; same-day batch inference no longer rebuilds the O(nf^2 * r) dense matrix. - cancelRequested_.store(false) now runs before reset() in BCIModel::process(const std::any&) to avoid a window where a cancel() is dropped on the floor. - _addonOutputCallback now unpacks transcript arrays so response.await() yields flat segments (matches TranscriptionWhispercpp). - examples/transcribe-neural.js identical-branch ternary fixed. - README broken whisper.cpp link fixed; docs/BCI_V184_COMPAT.md stale overlay commit ref updated. - Integration test honours BCI_REQUIRE_MODEL=1 to turn missing-model into a loud failure for CI (default behaviour unchanged: local dev still skips). - index.d.ts now imports QvacResponse from @qvac/infer-base/src/QvacResponse and LoggerInterface from @qvac/logging instead of hand-rolling them. Tests - Clean rebuild from scratch (rm -rf build prebuilds && bare-make generate/build/install) succeeds. - npm run lint: clean (now covers lib/**). - npm run test:dts: clean. - npm run test:integration: 3/3 pass, 10/10 asserts, 6.0% average WER (matches baseline). - npm run test:cpp: 18/18 pass (was 7; +11 new tests covering unknown-key rejection, numeric double-to-int coercion, range validation, ContextGpuDevice bounds, passthrough mode, invalid embedder handling). - bare examples/transcribe-neural.js --batch: 5/5 samples, 6.0% avg WER. - bare examples/transcribe-neural.js test/fixtures/neural_sample_0.bin: output unchanged ("You can see the good at this point as well."). 
Made-with: Cursor * fix[api](bci): restore bci-whispercpp package path and harden runtime validation Move the addon package back to packages/bci-whispercpp, remove unneeded overlay/docs files requested in review, and tighten JS/C++ lifecycle/config safety checks to prevent invalid-state and malformed-input issues. Made-with: Cursor * fix[api](bci): address code review findings across JS, C++, and build config - Replace cd && chain in test:cpp:run with direct path (CLAUDE.md compliance) - Route whisper_log_set through addon-cpp logger instead of silencing with once_flag, preventing inter-addon log handler clobber when BCI and transcription-whispercpp coexist in the same process - Fix stats heuristic in bci.js _addonOutputCallback to match actual BCIModel::runtimeStats keys (tokensPerSecond/totalWallMs, not the audio-addon keys audioDurationMs/totalSamples) - Drain _inferenceQueueWaiter in unload()/destroy() before calling destroyInstance(), closing the race where destroy could fire while process() is mid-execution on the native thread - Remove auto-load in BCIModel::process — throw immediately if context is null instead of lazy-loading outside the controlled lifecycle - Remove dead set_weights_for_file snake_case stub and unused <span> - Add qvac-lint-cpp to vcpkg.json dependencies (matches all peer addons) - Remove empty qvac-lint-cpp overlay directory (per Gustavo review) - Remove stale bci_wer/bci_transcription from manifest.json - Stop gitignoring package-lock.json (match monorepo convention) - Move computeWER into BCIWhispercpp namespace in index.d.ts - Downgrade @types/node to ^22.15.3, remove bare-fs from devDeps - Fix PR template code blocks from typescript to javascript Made-with: Cursor * fix[api](bci): address review findings — standards alignment, structured errors, lifecycle safety Align bci-whispercpp with monorepo conventions and fix code quality issues found during thorough review of the POC implementation. 
Build/config: - .gitignore aligned with peer addons (package-lock.json, .npmrc, IDE files, vcpkg cache, generated test bundles) - vcpkg.json: use "version" instead of deprecated "version-string" - package.json: replace $(find) in lint-cpp with explicit file list, remove unused bare-stream/bare-tty deps, add bare-fs to production deps - CHANGELOG.md: add date per Keep a Changelog format JS fixes: - Move fs.existsSync model check from constructor to _load(), matching TranscriptionWhispercpp lifecycle pattern - Remove dead PAUSED/STOPPED state enum values from bci.js - Add explicit event name matching alongside heuristic fallback in _addonOutputCallback (matches peer whisper.js pattern with BCI stat keys) - Add miscConfig.caption_enabled boolean type validation in configChecker - Extract duplicated flattenSegments into shared lib/util.js - Fix index.d.ts import from fragile internal path to stable @qvac/infer-base C++ fixes: - Guard whisper_log_set with std::once_flag to prevent clobbering log handlers from coexisting whisper-based addons in the same process - Replace std::runtime_error with structured StatusError/bci_error::makeStatus in BCIModel::load() and loadEmbedderIfNeeded() for proper JS error mapping - Use std::move in process(const std::any&) to avoid copying multi-MB neural signal buffers on every inference call Made-with: Cursor * fix[api](bci): align with peer addon standards and remove unused code - Add qvac-lint-cpp configure_file block to CMakeLists.txt (copies .clang-format, .clang-tidy, .valgrind.supp from vcpkg into build tree, matching qvac-lib-infer-whispercpp pattern) - Extend lint-cpp script to cover all .hpp header files - Match peer index.d.ts QvacResponse import path (deep import from @qvac/infer-base/src/QvacResponse) - Replace brittle string-matching in _isConfigurationError with structured error detection (TypeError, ERR_ASSERTION code checks) - Remove stale configChecker comments about unimplemented BCI params (smooth_kernel_std, 
smooth_kernel_size, sample_rate) - Remove unused error codes: FAILED_TO_GET_STATUS, FAILED_TO_RESET, FAILED_TO_PAUSE and their addCodes registrations - Remove unused K_SAMPLES_PER_SECOND constant from BCIModel.cpp - Remove unused <span> include from AddonJs.hpp - Add qvac-lib-inference-addon-cpp to NOTICE C++ dependencies - Add cpp-test-results.xml to .gitignore Made-with: Cursor * chore(bci): remove whisper-cpp overlay, consume v1.8.4.2 from registry The BCI patches (variable conv1 kernel, windowed attention) are now merged into tetherto/qvac-ext-lib-whisper.cpp master and tagged as v1.8.4.2. The local overlay that pinned a specific fork commit is no longer needed. - Delete vcpkg-overlays/whisper-cpp/ (portfile.cmake + vcpkg.json) - Remove VCPKG_OVERLAY_PORTS from CMakeLists.txt - Bump whisper-cpp override from 1.8.4 to 1.8.4.2 - Point vcpkg-configuration.json at personal fork registry (sharmaraju352/qvac-registry-vcpkg) temporarily until tetherto/qvac-registry-vcpkg#125 merges, then swap back - Update README whisper.cpp patches section Verified: clean build from scratch + 18/18 C++ tests + 3/3 integration tests (10/10 asserts, 6.0% avg WER) + batch example all pass. Made-with: Cursor * chore(bci): point vcpkg registry back to tetherto upstream Registry PR tetherto/qvac-registry-vcpkg#125 has been merged. Swap vcpkg-configuration.json from the personal fork back to the upstream tetherto/qvac-registry-vcpkg and update the baseline to the merge commit. Verified: clean build from scratch + all tests pass on both bci-whispercpp (18/18 C++, 3/3 integration, 6.0% WER) and transcription-whispercpp (106/106 C++, 28/28 unit, 10/10 integration, all extended suites). 
Made-with: Cursor * fix(bci): address Gustavo review — error types, lifecycle, error codes - Reset is_warmed_up_ in BCIModel::unload() so re-load triggers warmup - Add FailedToLoadModel and EmbedderWeightsNotFound error codes to BCIErrors.hpp; use them instead of InvalidNeuralSignal for context init failure (BCIModel.cpp:116) and missing embedder (BCIModel.cpp:90) - Wrap addon.activate() in try-catch in index.js _load(), throwing FAILED_TO_ACTIVATE with structured error on failure - Make all JS error codes sequential (26001-26013, no gaps) Made-with: Cursor * Remove date from changelog --------- Co-authored-by: Raju <raju.sharma> Co-authored-by: Ishan Vohra <ishanvohra2@gmail.com> Co-authored-by: GustavoA1604 <54457676+GustavoA1604@users.noreply.github.com>
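For readers unfamiliar with the WER figures quoted above (6.0% average), word error rate is conventionally the word-level edit distance between a reference and a hypothesis, divided by the number of reference words. The sketch below shows that standard definition in JavaScript. It is an assumption-laden illustration, not the package's `computeWER`: the function names and the lowercase/whitespace tokenization are hypothetical, and the real utility may normalize text differently.

```javascript
'use strict'

// Hypothetical tokenizer: lowercase and split on whitespace.
// The package's own WER utility may normalize punctuation or casing differently.
function tokenize (text) {
  return text.toLowerCase().trim().split(/\s+/).filter(Boolean)
}

// Word error rate = (substitutions + deletions + insertions) / reference length,
// computed with a rolling-row Levenshtein distance over words.
function wordErrorRate (reference, hypothesis) {
  const ref = tokenize(reference)
  const hyp = tokenize(hypothesis)
  // Convention chosen for this sketch: an empty reference yields 0 when the
  // hypothesis is also empty, otherwise 1.
  if (ref.length === 0) return hyp.length === 0 ? 0 : 1

  // prev[j] holds the distance between the first i-1 reference words
  // and the first j hypothesis words.
  let prev = Array.from({ length: hyp.length + 1 }, (_, j) => j)
  for (let i = 1; i <= ref.length; i++) {
    const curr = [i]
    for (let j = 1; j <= hyp.length; j++) {
      const cost = ref[i - 1] === hyp[j - 1] ? 0 : 1
      curr[j] = Math.min(
        prev[j] + 1, // deletion
        curr[j - 1] + 1, // insertion
        prev[j - 1] + cost // match or substitution
      )
    }
    prev = curr
  }
  return prev[hyp.length] / ref.length
}

const sentence = 'You can see the good at this point as well.'
console.log(wordErrorRate(sentence, sentence)) // 0
```

One substituted word in a four-word reference gives 1/4 under this definition, so a 6.0% average corresponds to roughly one word error per sixteen or seventeen reference words.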
…2232) (tetherto#1699) Bumps @qvac/registry-client to ^0.4.1 and passes corestoreOpts: { wait: true } when constructing QVACRegistryClient in the SDK's bare registry bootstrap. This switches the underlying Corestore from tryLock to waitForLock semantics, so concurrent SDK instances on the same machine no longer collide on ~/.qvac/registry-corestore/<key> with 'File descriptor could not be locked'. Internal bootstrap tweak only — no public SDK surface changes.
…nmtcpp 2.0.1 (tetherto#1563) feat: update SDK NMTCPP plugin to support @qvac/translation-nmtcpp@2.0.1, which moves away from base inference inheritance
…herto#1633) * QVAC-17020: Integrate new cache api into SDK * Bumping LLM add-on version in SDK to get new cache API * Adding updated bun.lock * fix: verify kv-cache file persisted before recording saved message count * refactor: source CacheRunOptions from @qvac/llm-llamacpp RunOptions * fix(llamacpp-completion): preserve explicit saveCacheToDisk: false in run options * fix(sdk): persist KV cache during system prompt prime to keep init marker consistent with disk * mod: dedupe model.run cast into a typed runModel helper in completion-stream * fix(sdk/examples): log cleanup errors in llamacpp-cache instead of swallowing them --------- Co-authored-by: Ridwan Taiwo <donriddo@gmail.com>
…rto#1662) * feat[api]: add img2img support to SDK diffusion API * fix(sdk): enforce img_cfg_scale default of -1 at the schema level * test(sdk): split sdcpp diffusion dispatcher tests into focused cases and extract plugin mock into withMockDiffusionPlugin helper * chore(test/diffusion): cleanup the comments * test: add diffusion img2img test definition with init_image param * fix(test/diffusion): split shared executor into separate ones for mobile and desktop due to file system logic diff --------- Co-authored-by: Simon Iribarren <simon.ig13@gmail.com>
…therto#1674) * QVAC-17236 [Chatterbox] Investigate possibilities of reducing RTF * QVAC-17236 Rewrote the "Speech encoder output caching" bullet (and intro paragraph) to reflect that the encoder runs during load(), not on first synthesize() * QVAC-17236-Investigate-possibilities-of-reducing-RTF: bump @qvac/tts-onnx version "0.8.4" → "0.8.5"
Minor version bump for the new dynamic GGML backend loading feature (tetherto#1617) which unblocks GPU-backed inference on Android via `backendsDir` and adds `openclCacheDir` for faster OpenCL startup.
…to#1668) * feat: add Parakeet performance benchmark workflows Parameterize the RTF benchmark across models and devices, and add workflow/reporting support to collect consolidated performance findings across CI and manual backends. Made-with: Cursor * fix: export mobile benchmark entrypoints Export the generated mobile integration functions and dedicated mobile benchmark entrypoints so code-quality checks do not flag the benchmark harness as unused. Made-with: Cursor * fix: package mobile benchmark runtime with custom tests Keep the dedicated mobile benchmark suite self-contained so the Device Farm build can resolve its local runtime helper after extraction into the mobile test framework. Made-with: Cursor * fix: run benchmark matrix reliably on Windows Use shell-backed npm spawning on Windows and surface process errors so the desktop benchmark workflow can produce the missing win32 DirectML artifacts. Made-with: Cursor * fix: resolve mobile benchmark extractor from addon checkout Run the mobile benchmark log-extraction script from the addon checkout path so the Device Farm workflow can publish structured mobile RTF artifacts after the run completes. Made-with: Cursor * fix: align mobile benchmark dir with integration harness Use an integration.auto.cjs entrypoint in the custom mobile benchmark directory so the mobile test framework bundles and resolves the benchmark module through the same integration loader path it expects for other mobile suites. Made-with: Cursor * fix: route mobile benchmark through copied integration modules Align the custom mobile benchmark suite with the qvac-test-addon-mobile loader by pointing it at a copied integration wrapper module and resolving the extractor from the addon checkout root. Made-with: Cursor * fix: make mobile benchmark module self-contained Use a mobile-specific benchmark integration module that imports through the copied mobile-aware helpers path instead of the desktop benchmark file's addon-relative imports. 
Made-with: Cursor * fix: make raw mobile log upload non-blocking Do not fail the mobile benchmark job when the final Device Farm log artifact upload times out after the benchmark execution and extraction steps have already completed. Made-with: Cursor * fix: make mobile RTF extraction file-based and logcat-safe Write mobile benchmark reports to stable device paths, emit chunked OCR-compatible console markers, and update the Device Farm testspec plus extractor so Android and iOS can surface structured RTF artifacts reliably. Made-with: Cursor * fix: persist mobile perf reports per benchmark Made-with: Cursor * fix: copy mobile perf helper from repo root Made-with: Cursor * fix: harden Parakeet report pipeline Made-with: Cursor * fix: retry mobile perf report pulls Made-with: Cursor * fix: scope Parakeet perf changes to desktop Made-with: Cursor --------- Co-authored-by: GustavoA1604 <54457676+GustavoA1604@users.noreply.github.com>
Made-with: Cursor Co-authored-by: GustavoA1604 <54457676+GustavoA1604@users.noreply.github.com>
…to#1706) Backmerges the v0.4.1 release metadata (package.json version + CHANGELOG entry) from release-qvac-registry-client-0.4.1 into main, per docs/gitflow.md step 4. Released and published via tetherto#1698.
…KEN (tetherto#1618) * refactor: use npm trusted publishing and remove NPM_TOKEN * fix: node version for the npm trusted publishing * refactor: remove secrets-token since it will no longer be needed * refactor: remove setting package scope using npm access public * removed: deprecated steps * refactor: removed unused inputs of the action * refactor: removed unused inputs of the action * refactor: removed unused inputs of the action * refactor: removed unused inputs of the action * feat: added id-token: write permission for the publishing --------- Co-authored-by: Proletter <40578159+Proletter@users.noreply.github.com>
…e() api (tetherto#1691) * feat: add public state() lifecycle api * fix: route pre-handler errors on streaming and progress rpcs to stream channel * feat: enforce lifecycle gate and tighten partial resume failure * chore: demonstrate state() and blocked operations in suspend-resume example * doc[api]: add tsdoc for public state() lifecycle api * doc: expand suspend/resume tsdoc with in-flight behavior matrix
…ed vocab resolution (tetherto#1707) * feat: extract BERGAMOT_MODEL_RE and BERGAMOT_CJK_LANG_PAIRS to shared schemas * feat: add Bergamot NMT companion-set grouping in codegen * feat: refactor NMT plugin to path-based vocab with colocated derivation fallback * feat: add legacy flat-cache probe for non-ONNX companion sets * test: add Bergamot companion detection unit tests
* feat(diffusion): add LoRA support via run config (JS → native → sd.cpp) * test(diffusion): add real LoRA integration test * chore(diffusion): bump version for LoRA support * Validate diffusion LoRA paths * Export generated mobile integration runners --------- Co-authored-by: gianni-cor <gianfranco.cordella@tether.io>
…lows (tetherto#1714) * test: point LLM and OCR mobile workflows at ios-cpp-log-capture branch Throwaway commit to validate C++ log capture on Device Farm. Do NOT merge -- delete branch after verification. Made-with: Cursor * test: add bare_console.log pullFile to iOS WDIO after hooks Pull bare_console.log from the app's documents directory at the end of each iOS test run and write it to $DEVICEFARM_LOG_DIR so it appears in customer artifacts. Throwaway -- delete branch after verification. Made-with: Cursor * fix: add 3s pause before pullFile to avoid log flush race condition The Bare worklet flushes logs to disk on a 2s timer. Without a pause, pullFile in the WDIO after hook could retrieve a stale file missing the final log entries (inference completion, errors). * fix: prevent download step hang from pullFile base64 log bloat browser.pullFile returns base64 which WDIO debug-logs in full, bloating the Appium output artifact. Replace with raw HTTP to Appium to bypass WDIO command logging. Also add --max-time 300 to all curl commands in the download steps as a safety net against any future hangs. * feat: add iOS bare_console.log capture to all addon mobile workflows Wire up the same bare-log pullFile (via raw HTTP to bypass WDIO debug log bloat) and 3s flush pause in the iOS WDIO after hooks for whisper, NMT, parakeet, onnx-tts, and decoder-audio workflows. Also add --max-time 300 to all curl commands in their download steps. * feat: split Device Farm artifacts into console logs and full logs Add two-tier artifact downloads to all 9 mobile test workflows: - "Console Logs" (small): bare_console.log (iOS C++ logs), test spec output, and logcat (Android) for quick debugging - "Full Device Farm Logs" (big): all artifacts for deep investigation Also adds bare_console.log capture to llamacpp-embed and diffusion workflows, curl --max-time 300 safety net, and resets TEST_FRAMEWORK_REF back to main now that qvac-test-addon-mobile#36 is merged. 
Made-with: Cursor * fix: extract bare_console.log from Customer_Artifacts.zip in console logs Device Farm bundles $DEVICEFARM_LOG_DIR files into Customer_Artifacts.zip, so bare_console.log was not appearing as a standalone file. The extract step now unzips Customer_Artifacts.zip files and pulls bare_console.log into the console-logs artifact with device-prefixed names. Made-with: Cursor * fix: simplify console logs to only bare_console, logcat, and appium logs Console logs artifact now contains only the essentials: - bare_console.log (iOS C++ logs, extracted from Customer_Artifacts.zip) - Logcat (Android native logs) - appium.log (Appium server logs, extracted from Customer_Artifacts.zip) Removed test spec output from console logs -- those stay in full logs. Made-with: Cursor * fix: add .github/actions to sparse-checkout in OCR and ONNX on-pr workflows The sanity-checks job uses sparse-checkout but only included the package directory. Custom actions (yamlfmt, run-lint-and-unit-tests) were missing from the checkout, causing "Can't find action.yml" errors. Made-with: Cursor * Revert "fix: add .github/actions to sparse-checkout in OCR and ONNX on-pr workflows" This reverts commit 0428671. * fix: align TTS console-logs extract path with download path (include variant suffix) Made-with: Cursor * fix: use find for nested zip extraction in OCR console-logs Device Farm zips nest files under Host_Machine_Files/$DEVICEFARM_LOG_DIR/ so flat path checks never found bare_console.log or appium.log. Made-with: Cursor * fix: use find for nested zip extraction in all 8 remaining workflows Device Farm Customer_Artifacts.zip nests files under Host_Machine_Files/$DEVICEFARM_LOG_DIR/ - use find to locate bare_console.log and appium.log at any depth inside the zip. Made-with: Cursor
…tadata (tetherto#1700) * feat[mod]: regenerate model registry with companion-set metadata * chore: regenerate model registry with companion-set metadata
) * feat[QVAC-17474]: port OCR mobile perf-report pipeline to NMT (Phase 2)

Closes the "Mobile Phase 2" follow-up from PR tetherto#1684: surface the chrF++ + perf numbers from NMT mobile integration tests in a dedicated artifact + GitHub Step Summary, matching the OCR pattern Tobi set up in tetherto#1625.

Changes:

1. .github/workflows/integration-mobile-test-qvac-lib-infer-nmtcpp.yml
   Added three steps between the existing "Download Device Farm Logs" and "Upload Device Farm Logs":
   a) "Extract performance report from Device Farm logs": runs scripts/perf-report/extract-from-log.js against the downloaded Device Farm logs, scanning for [PERF_REPORT_START]...[PERF_REPORT_END] markers (and PERF_CHUNK fallback for large payloads), then runs scripts/perf-report/aggregate.js to generate the HTML / MD / summary-json report. Extraction is best-effort: if no markers are found (e.g. Device Farm logs were empty or tests crashed early), the step logs a warning and continues so the raw logs still get uploaded by the following step.
   b) "Write mobile perf report to GitHub Step Summary": appends the generated MD report (performance + quality sections) to $GITHUB_STEP_SUMMARY so the iOS/Android job pages render chrF++ + perf tables inline, same way the desktop job does.
   c) "Upload mobile performance report": dedicated artifact perf-report-nmtcpp-mobile-<platform>-<run>, 90-day retention, containing performance-report.json/.html/.md and performance-summary.json. Mirrors the perf-report-nmtcpp-* artifact that already exists for desktop.
2. packages/qvac-lib-infer-nmtcpp/test/integration/utils.js
   Dual-store chrfpp in both `metrics` and `quality` fields on each reporter entry. Needed because:
   - writeStepSummary reads `metrics.chrfpp` via METRIC_COLUMNS.translation (that's what Olya's single table uses) → unchanged.
- aggregate.js Quality Summary section reads `result.quality.*` via qKeys. Without `quality.chrfpp`, the mobile (and desktop) HTML / MD reports were showing only a "Mean Total Time" column with no chrF++ anywhere. Adding `quality: {chrfpp, reference}` in the extra lets aggregate.js render the chrF++ column as a percentage in the Quality Summary.
- Mobile inline reporter now also threads extra.quality into entry.quality, mirroring the desktop reporter in scripts/test-utils/performance-reporter.js, so the on-device [PERF_REPORT_START] JSON carries the quality field.
Out of scope (deliberately, can be follow-up if useful):
- "Combined Performance Report" job that aggregates across multiple devices (OCR's pattern). NMT mobile currently runs a single device per platform so a per-platform report is the natural unit.
- Splitting the Device Farm run into perf vs regular subsets (OCR does this to bound perf-test wall time). NMT mobile has a small test count already; no need to split.
- Device-name display in the extracted report. NMT's existing Device Farm log download flattens files to <device>_<suite>_<artifact>.<ext> instead of OCR's <device>/<artifact> layout, so extract-from-log.js's device-from-path inference returns "unknown". Reports still render correctly, just with a generic column header. Fix is a small tweak to either the download layout or extract-from-log.js; not blocking this work.
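The marker-based extraction that step (a) describes can be illustrated with a minimal sketch. This is not the code in `scripts/perf-report/extract-from-log.js`; the function name `extractPerfReports` is hypothetical, and the PERF_CHUNK reassembly path is left out because its wire format is not spelled out in this message. It only shows the best-effort idea: take whatever sits between the two markers, keep the payloads that parse, and skip the rest.

```typescript
// Hypothetical sketch, not the real extract-from-log.js.
const PERF_START = "[PERF_REPORT_START]";
const PERF_END = "[PERF_REPORT_END]";

// Returns every JSON payload found between a START/END marker pair.
// Payloads that fail to parse are skipped (best-effort extraction).
function extractPerfReports(logText: string): unknown[] {
  const reports: unknown[] = [];
  let from = 0;
  while (true) {
    const start = logText.indexOf(PERF_START, from);
    if (start === -1) break;
    const end = logText.indexOf(PERF_END, start + PERF_START.length);
    if (end === -1) break; // unterminated marker: nothing more to read
    const payload = logText.slice(start + PERF_START.length, end).trim();
    try {
      reports.push(JSON.parse(payload));
    } catch {
      // Truncated or corrupted payload: ignore and keep scanning.
    }
    from = end + PERF_END.length;
  }
  return reports;
}
```

A caller that wants the most complete report would typically pick the last (or largest) entry of the returned array; that selection policy is also an assumption here.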
Verified locally:
- YAML validates
- Synthetic Device Farm log containing a real-shape [PERF_REPORT_START]...[PERF_REPORT_END] marker → extract-from-log.js produces 3-result performance-report.json → aggregate.js produces .md / .html / summary.json with both Performance Summary (total time) and Quality Summary (chrF++ column, percentages) sections populated:
| Test | EP | chrF++ |
| [Bergamot] [CPU] | CPU | 97.0% |
| [IndicTrans] [CPU] | CPU | 63.0% |
| [Pivot es→en→it] [CPU] | CPU | 71.0% |
Made-with: Cursor
* feat[QVAC-17474]: extend WDIO after-hook to extract perf report from device (Mobile Phase 2)
Follow-up on tetherto#1697: the workflow-side extract step never had input because Bare-runtime's console.log on iOS doesn't reach iOS Syslog captured by Device Farm. Port the OCR tetherto#1625 approach end-to-end so the perf report actually leaves the device.
Changes to .github/workflows/integration-mobile-test-qvac-lib-infer-nmtcpp.yml:
1. Android WDIO `after:` hook: after stopping the health monitor, executes the full OCR tetherto#1625 extraction routine:
- Poll 6 on-device paths via `mobile: shell cat` for stability (48 retries x 5s, waits until result-count is stable for 6 consecutive reads), which bounds Device Farm flakiness.
- Try `browser.pullFile` across 9 paths including the app sandbox (`@<bundleId>/files/...`), `/sdcard/Android/data/<bundleId>/files/...`, `/data/data/<bundleId>/...`, etc.
- Fall back to `browser.getLogs("logcat")`: parses [PERF_REPORT_START]...[END] markers AND reassembles PERF_CHUNK: chunks when logcat per-entry truncation splits the JSON across lines.
- Fall back to `mobile: shell cat` on the same path set.
- Fall back to `mobile: shell run-as <bundleId> cat` for files inside the app's private sandbox.
- On success: write extracted JSON to $DEVICEFARM_LOG_DIR/perf-report-extract.json AND echo it wrapped in [PERF_REPORT_START]...[END] to the testspec output stream.
2.
iOS WDIO `after:` hook: after stopping the health monitor, calls `browser.pullFile("@<bundleId>:documents/perf-report.json")` (with `:library/` fallback), writes the result to $DEVICEFARM_LOG_DIR/perf-report-extract.json and echoes markers.
3. Testspec `post_test:` phase: added a fallback that reads perf-report-extract.json from $DEVICEFARM_LOG_DIR or $DEVICEFARM_TEST_PACKAGE_PATH and re-emits it wrapped in markers to stdout, so the downstream extract-from-log.js picks it up from the TESTSPEC_OUTPUT.txt artifact even if the WDIO-emitted console markers were lost in the iOS Syslog stream.
Note on escaping: the WDIO_CONFIG remains in NMT's existing bash-single-quoted format (rather than switching to OCR's heredoc pattern). Literal `'` in the JS regex (logcat chunk prefix matcher) uses the Unicode escape `\u0027` to avoid breaking the outer bash single-quote. Final JS regex on-device reads `/^\u0027\[Bare\]\u0027,\s*\u0027/` which matches `'[Bare]',<ws>'`, with the same semantics as OCR's original.
Verified locally:
- YAML parses cleanly.
- Simulated bash eval of WDIO_CONFIG with APP_BUNDLE_ID stub produces valid JS: perf-extract block ends with `ALL methods failed");}},afterTest:...` (correct brace balance); regex reads `msg.replace(/^\u0027\[Bare\]\u0027,\s*\u0027/,"")` (proper single-backslash escapes throughout).
Not done in this commit (already in place from tetherto#1697):
- extract-from-log.js workflow step, aggregate.js step, Step Summary write, and dedicated artifact upload. Those were already added; this commit completes the input side of that pipeline.
Made-with: Cursor
* fix[QVAC-17474]: externalize WDIO config to template files (fix max-expression-length 21000)
GitHub Actions reports `Exceeded max expression length 21000` when queuing `On PR Trigger (NMTCPP)` on this branch. Root cause: the `Create and Upload Test Spec` step's `run:` block had grown to 24,350 bytes after adding the Android perf-extract logic inline.
GitHub Actions treats a `run:` script that contains `${{ }}` substitutions as a single template expression, and the step was over the 21,000-char limit.
Fix: move the two WDIO configs out of the workflow YAML into template files checked into the repo, loaded at step-time with sed:
- packages/qvac-lib-infer-nmtcpp/test/mobile/wdio-config-android.js.template
- packages/qvac-lib-infer-nmtcpp/test/mobile/wdio-config-ios.js.template
Workflow step now does:
  WDIO_CONFIG=$(sed "s#__BUNDLE_ID__#${{ env.APP_BUNDLE_ID }}#g" \
    packages/qvac-lib-infer-nmtcpp/test/mobile/wdio-config-<platform>.js.template)
Step size: 24,350 → 12,423 bytes. No change to the final WDIO config that lands on-device: the template is the exact same content that was being constructed inline. Bundle-ID substitution is now done via sed on a placeholder `__BUNDLE_ID__` (OCR's tetherto#1625 pattern) instead of bash-single-quote gymnastics.
Both templates contain the full WDIO config (`before:` hook with crash detection + health-monitor, `afterTest:`, `after:` with perf extraction; Android: poll + pullFile + logcat + chunks + run-as; iOS: pullFile from app sandbox).
Made-with: Cursor
* fix[QVAC-17474]: split Create and Upload Test Spec step to fix max-expression-length limit (no new files, no hardcoding)
Previous attempt introduced two .template files to work around GitHub Actions' 21,000-char expression-length limit for the `Create and Upload Test Spec` step. Tobi's OCR tetherto#1625 does NOT use template files; he keeps everything inline via the heredoc pattern (`<< 'WDIO_EOF'`). His step naturally fits under 21,000 because OCR's WDIO config is smaller than NMT's (NMT has an additional health-monitor setInterval in the before: hook + multi-level crash detection).
Cleaner solution: split the single large step into three smaller ones, mirroring OCR's heredoc approach inline:
1.
New step "Build WDIO config for Android" (8.7 KB, matrix-gated `if: matrix.platform == 'Android'`) — builds and base64-encodes the Android WDIO JS, exports WDIO_CONFIG_B64 via $GITHUB_ENV. 2. New step "Build WDIO config for iOS" (4.3 KB, matrix-gated) — same for iOS. 3. Existing "Create and Upload Test Spec" step (11 KB) — no longer builds the WDIO config, just consumes $WDIO_CONFIG_B64 from env alongside the existing per-platform testspec-metadata branching (PLATFORM, AUTOMATION, HOST_LINE). All three steps are well under 21,000 bytes. No new checked-in files. All paths use the workflow's standard env vars (${{ env.APP_BUNDLE_ID }}), no hardcoded package paths. Deleted the previous commit's template files since they are no longer needed: - packages/qvac-lib-infer-nmtcpp/test/mobile/wdio-config-android.js.template - packages/qvac-lib-infer-nmtcpp/test/mobile/wdio-config-ios.js.template Made-with: Cursor * fix[QVAC-17474]: cross-platform sed (pipe form) in Build WDIO config steps iOS job failed with 'sed: -I or -i may not be used with stdin' because the iOS runner is macOS and BSD sed does not accept GNU-style -i without a backup-extension argument. Rewrote the substitution as a single pipe that works on both GNU sed (Linux, Android runner) and BSD sed (macOS, iOS runner): Before: sed -i "s#__BUNDLE_ID__#...#g" /tmp/wdio-config.js WDIO_CONFIG_B64=$(base64 < /tmp/wdio-config.js | tr -d '\n') After: WDIO_CONFIG_B64=$(sed "s#__BUNDLE_ID__#...#g" /tmp/wdio-config.js | base64 | tr -d '\n') Applied to both Android and iOS Build WDIO config steps. Made-with: Cursor * fix(mobile): flush perf-report.json after every record() on mobile The Bare process is hosted inside the native test-addon-mobile app and does not exit between WDIO specs, so `process.on('exit')` never fires — meaning `writeReport()` was never called on iOS/Android and the `perf-report.json` did not exist when WDIO's `after:` hook called `pullFile`, which returned OBJECT_NOT_FOUND. 
Mirror the OCR pattern (packages/ocr-onnx/test/integration/utils.js): - On mobile, call `writeReport()` + `writeToConsole()` after every `_perfReporter.record()` in `formatPerformanceMetrics` so the file is present on disk ahead of the `after:` hook. - Extend the mobile inline `writeReport()` dirs list with `os.tmpdir()` for iOS (maps to the app's tmp container, reachable as `@<bundle>:tmp/perf-report.json`). - Extend the iOS WDIO `after:` hook to additionally try `@<bundle>:tmp/perf-report.json` and to wait 3s before pulling so async flushes have time to hit the filesystem. Made-with: Cursor * fix(mobile-ci): resolve perf-report scripts under monorepo/ checkout The mobile workflow checks the monorepo out at `./monorepo`, so `scripts/perf-report/extract-from-log.js` is not at the workspace root. The extract step was running from the default cwd and failing with `Cannot find module '/Users/runner/work/qvac/qvac/scripts/perf-report/ extract-from-log.js'`, which caused `extracted=false` and skipped the Step Summary write + produced an empty perf-report-mobile artifact. Reference the scripts via `monorepo/scripts/perf-report/...` and guard with a presence check so the step surfaces a clear warning instead of a generic module-not-found if the checkout ever changes. Made-with: Cursor * feat(mobile): unify mobile Step Summary with desktop integration format Previously the mobile Step Summary used `aggregate.js`'s multi-device comparison layout (one column per device) and split perf + quality into two separate tables — the quality table also carried OCR-only columns (CER/WER/KW/KV) with '-' placeholders that made it hard to read. The desktop integration Step Summary renders a single compact table per run using `performance-reporter.js::writeStepSummary()`: | Test | EP | Total Time (ms) | Decode (ms) | Tokens | TPS | chrF++ | This change makes the mobile Step Summary use that exact same layout: 1. 
New `scripts/perf-report/render-step-summary.js` reads a single-device perf-report.json and emits the desktop-style single-table markdown, reusing METRIC_COLUMNS / QUALITY_COLUMNS from performance-reporter.js so both surfaces stay in lockstep. It suppresses the quality section when all quality keys are already covered by metric columns (so NMT no longer gets the empty CER/WER/KW/KV columns). 2. `fix(extract): derive device name from Device Farm flat filename layout`. Device Farm artifacts come in as `<logDir>/<Device>_Tests_Suite_*.txt` rather than `<logDir>/<Device>/*`, so `deriveDeviceName` was returning null and the device column rendered as "unknown". Add a fallback that parses the filename prefix (stops before `Tests_Suite | Setup_Suite | Teardown_Suite | job` phase separator), yielding e.g. "Apple iPhone 16 Pro". 3. Update the mobile workflow's "Write mobile perf report to GitHub Step Summary" step to call the new renderer against `performance-report.json` instead of catting the aggregated `performance-report.md`. Made-with: Cursor * fix(mobile): prevent SIGABRT from unhandled model-download rejection Root-cause of the Samsung Galaxy S25 Ultra failure in CI run 1212: 10:08:51.434 # IndicTrans backend [CPU] ← next test starts 10:08:51.434 Downloading: https://.../qvac_mod... ← re-downloads 200MB 10:08:51.580 E bare: Uncaught (in promise) FetchError: NETWORK_ERROR [cause]: HTTPError: CONNECTION_LOST: Socket hung up 10:08:51.582 F libc: SIGABRT in libbare-kit.so::js_callback_s::on_call The IndicTrans [GPU] variant already downloaded the 200MB model, then the [CPU] variant unnecessarily re-downloaded it. Samsung's Device Farm lane hit a transient socket drop on the second download; bare-fetch emitted an unhandled promise rejection; Bare's default handler called abort(). The stack tip was BareKit dispatching the rejection to JS, which is why the backtrace misleadingly looked like a BareKit-internal crash. 
Google Pixel 9a in the same matrix ran to completion on the same commit because its Device Farm lane didn't drop the second download.
Three fixes, all in the test harness for this package:
1. `ensureIndicTransModel()`: cache the 200MB model on mobile's writable root (`global.testDir`). Skip redownload when the existing file is within the expected size range. Eliminates wasted bandwidth and the second-download failure window.
2. `downloadFile()`: retry transient network errors up to 3 times with exponential backoff (500ms / 1s / 2s). HTTP status errors still fail fast since they are deterministic.
3. `bergamot.test.js` + `indictrans.test.js`: mirror pivot-bergamot's defensive `Bare.on('unhandledRejection', ...)` handler so a future uncaught rejection logs loudly instead of calling abort(). Keeps the perf-report pipeline able to record whatever data was captured up to that point rather than losing the whole run.
Fixes (2) and (3) are defense-in-depth; fix (1) is the one that specifically addresses the observed Samsung crash.
Made-with: Cursor
* fix(lint): declare Bare as a global in bergamot/indictrans tests
sanity-checks (standard@17) flagged 'Bare' is not defined (no-undef) in the new unhandledRejection handlers added in 53f137f. Mirrors pivot-bergamot.test.js which uses '/* global Bare */'.
Made-with: Cursor
* fix(mobile): cache Bergamot model files + dedupe Firefox records
Follow-up to the Samsung Galaxy S25 Ultra timeout in CI run 24796639547. The SIGABRT in the previous run was fixed by 0b2094f / afdb3b1, but `runPivotBergamot` still timed out at 20 minutes because the Bergamot model fetcher was re-downloading the same files many times:
1. Within a single `downloadBergamotFromFirefox` invocation, Firefox's translations-models records collection exposes multiple variants (production + dev/beta) that share the same `filename`, so the loop was downloading and overwriting e.g.
`lex.50.50.enit.s2t.bin` 2–3 times per call — once with the 3.9MB production variant, once with the 4.3MB dev variant. 2. Across pivot sub-tests (GPU variant, CPU variant, stats-no-hang, batch — ×2 language pairs) the test re-invokes the fetcher with the same destDir, and `ensureModelPair` was calling the raw `downloadBergamotFromFirefox` instead of the cached `ensureBergamotModelFiles` wrapper, so each sub-test re-downloaded the full pair (~70MB × 8 sub-test invocations on Samsung's slower Device Farm lane ≫ 20-min per-test timeout). Fixes: * `bergamot-model-fetcher.js` - Skip per-file download when destPath already exists with non-trivial size (≥1KB, guards against zero-byte stubs from failed earlier runs). - Dedupe by filename within one `downloadBergamotFromFirefox` call so the dev/beta variant doesn't overwrite the production one. - Log "(cached)" when a skip happens so CI logs show what saved time. * `pivot-bergamot.test.js::ensureModelPair` - Call `ensureBergamotModelFiles` (which does `hasBergamotModelFiles` destDir check) instead of `downloadBergamotFromFirefox` directly, so repeat sub-tests skip the Firefox records endpoint entirely. Expected effect: Samsung `runPivotBergamot` completes in <5 min instead of timing out at 20 min; Google Pixel / iPhone finish faster too. Made-with: Cursor * fix(mobile-ci): full 8-row perf table on Android Step Summary Two bugs surfaced on run 24820356343, both on Android only: 1. `extract-from-log.js` settled on a 6-row chunked report instead of the final 8-row one. Root cause: the pivot fr→en→es test uses the input "Bonjour, comment allez-vous aujourd'hui?", and the ReactNativeJS bridge wraps logcat output in a JS single-quoted string literal, escaping the apostrophe as `\'`. That's a valid JS escape but NOT a valid JSON escape, so `JSON.parse` on the reassembled 8-row chunk set bailed with `Bad escaped character in JSON`. The extractor silently fell back to the earlier 6-row chunk set that didn't contain fr→en→es yet. 
Fix: after stripping the `'[Bare]', '…'` wrapper in `cleanJsonFromLogcat`, unescape `\\'` → `'`. Verified locally against the Samsung + Pixel logcats — both now reassemble all 8 results. 2. The "Write mobile perf report to GitHub Step Summary" step only looked for `<OUTPUT_DIR>/performance-report.json` at the root, but `extract-from-log.js` writes per-device subdirs when ≥2 devices are present (Android matrix: Pixel + Samsung). iOS (single device) wrote to the root path and worked; Android fell into the `::warning::` branch and skipped the Step Summary entirely. Fix: when the root file is missing, walk `<OUTPUT_DIR>/*/` and render one table per device, using the directory name (underscores → spaces) as the Device suffix in the heading. Made-with: Cursor * chore(quality): mark translation reference fixtures as validated (A.5) Drop the "placeholder baseline — verify with native speaker" note on the four reference translations used by the integration-test chrF++ scoring and replace with "validated 2026-04-23" plus the register actually used (informal / formal). Closes A.5 of QVAC-17474. A.6 (N>1 sentences per test case) and A.7 (chrF++ quality gate) are explicitly out of scope for this ticket — chrF++ stays observational-only with a single reference per pair. Fixtures updated: - bergamot.quality.json en → it (informal) - indictrans.quality.json en → hi (formal, आप) - pivot-bergamot.quality.json#1 es → it (informal) - pivot-bergamot.quality.json#2 fr → es (formal, usted) The inline mobile copy in test/integration/utils.js is updated byte-for-byte in sync (mobile fallback has no fs access to the on-disk JSONs at runtime under bare-pack). Made-with: Cursor --------- Co-authored-by: Alok-Ranjan23 <Alok-Ranjan23@users.noreply.github.com> Co-authored-by: olyasir <sirkinolya@gmail.com>
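The `\'` escaping issue in the Android Step Summary fix above can be reproduced in a few lines. The sketch below is an assumption-laden illustration, not the real `cleanJsonFromLogcat`: the function name `cleanLogcatJsonSketch` is hypothetical, the prefix regex follows the `'[Bare]',<ws>'` shape quoted earlier in this PR, and stripping a trailing quote is assumed rather than confirmed by the source.

```typescript
// Hypothetical sketch of the unescape step; not the real cleanJsonFromLogcat.
function cleanLogcatJsonSketch(line: string): string {
  // Strip the `'[Bare]', '` wrapper prefix the bridge adds.
  let body = line.replace(/^'\[Bare\]',\s*'/, "");
  // Assumption: the wrapper also leaves a closing single quote behind.
  if (body !== line && body.endsWith("'")) {
    body = body.slice(0, -1);
  }
  // `\'` is a valid JS string escape but not a valid JSON escape,
  // so turn it back into a bare apostrophe before parsing.
  // Caveat: a literal backslash followed by a quote would also be rewritten.
  return body.replace(/\\'/g, "'");
}

// One logcat line as the bridge would emit it (apostrophe escaped as \').
const wrappedLine = "'[Bare]', '{\"input\":\"aujourd\\'hui\"}'";
const parsedLine = JSON.parse(cleanLogcatJsonSketch(wrappedLine)) as { input: string };
```

Without the final replace, `JSON.parse` rejects the payload, which matches the silent fallback to the older chunk set described in the commit message.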
* feat: anchored tools placement for multi-round tool chains
Replace tools-at-end placement with anchored placement: tools are
positioned after the last user message and stay in the KV cache
across chain rounds instead of being removed and re-added each round.
Changes:
- Template: anchor tools after last user message (two-pass Jinja2)
- PostInfer: keep tools when output contains <tool_call>, remove
only when chain completes (no tool call in output)
- Boundary tracking: recordToolBoundary sets anchor once, preserves
across chain rounds
- Streaming: capture output when toolsAtEnd is active for tool call
detection
- Stats: forward nPastBeforeTools, firstMsgTokens, toolsTrimmed
- Generation prompt: treat role "tool" same as "user" for
add_generation_prompt (fixes empty response on tool chain
continuation)
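As a rough illustration of the anchored placement and the PostInfer rule listed above, the sketch below computes where a tools block would sit and whether it stays in the cache. The real logic lives in a Jinja2 template and in the C++ addon; the names `toolsAnchorIndex` and `shouldKeepTools`, and the fallback to the end of the message list when no user message exists, are assumptions made for this example.

```typescript
// Hypothetical sketch; the actual placement is done in the chat template.
interface ChatMessage {
  role: "system" | "user" | "assistant" | "tool";
  content: string;
}

// Index at which the tools block is anchored: right after the last user message.
// Assumption: with no user message, tools go at the end.
function toolsAnchorIndex(messages: ChatMessage[]): number {
  for (let i = messages.length - 1; i >= 0; i--) {
    const message = messages[i];
    if (message !== undefined && message.role === "user") return i + 1;
  }
  return messages.length;
}

// PostInfer rule: keep tools while the chain continues (output has a tool call),
// remove them only once the chain completes.
function shouldKeepTools(output: string): boolean {
  return output.includes("<tool_call>");
}
```

Because the anchor depends only on the last user message, later assistant and tool messages in the same chain do not move it, which is what lets the tool tokens stay in the KV cache across rounds.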
* fix: prevent output duplication in streaming mode with toolsAtEnd
Use captured output only for internal tool call detection, don't set
it as the return value when streaming. Prevents the JobRunner from
queuing the full text again after it was already streamed token by
token, which caused the SDK to see every tool call twice.
* fix: avoid unnecessary string copy for non-tool completions
Move captured output construction inside the toolsAtEnd guard so
non-tool completions pay zero string overhead. Only the oss.str()
call and tool_call detection happen when dynamic tools are active.
* fix: context sliding with tools_at_end corrupts tool boundary tracking
When context sliding occurs with tools_at_end enabled, the
nPastBeforeTools boundary was not adjusted after token discard.
This left stale tool tokens in the KV cache, causing incorrect
trim after generation.
Changes:
- Limit discard to conversation-only region (never eat tool tokens)
- Adjust nPastBeforeTools after sliding by the discard delta
- Reset DynamicToolsState in fallback discard path
- Applied to both TextLlmContext and MtmdLlmContext
- Add regression test for sliding during generation with large tools
* refactor: extract sliding helpers into DynamicToolsState, harden edge cases
- Extract clampDiscard() and adjustAfterSlide() into DynamicToolsState
to eliminate 4x duplicated clamping/adjustment blocks
- Remove redundant std::max(safeLimit, 0) — guard already ensures > 0
- Add discard == 0 early return in applyContextDiscard to skip no-op
KV cache operations
- Guard fallback reset() with toolsAtEnd() check for consistency
- Add comment explaining eval vs generation fallback asymmetry
- Use n_predict=-2 (fill context) in test to guarantee sliding
* test: update sliding test for anchored tools behavior
With anchored tools, postInfer keeps tools in cache when the model
produces <tool_call> in output. Update the sliding regression test
to check toolsTrimmed stat instead of assuming tools are always
removed after generation.
* test: two-phase sliding test verifies adjustAfterSlide
Replace single-phase sliding test with two-phase comparison:
Phase 1 (baseline): large context, n_predict=0 → no sliding.
Records nPastBeforeTools as the original anchor.
Phase 2 (sliding): small context, n_predict=-2 → sliding fires.
After trim, nPastBeforeTools must be less than baseline.
Without adjustAfterSlide: both phases have equal nPastBeforeTools → FAIL.
With adjustAfterSlide: phase 2 anchor is smaller → PASS.
* test: exact sliding anchor assertion with session and clamped discard
Three-phase test using session cache:
Phase 1: init session (small firstMsgTokens)
Phase 2: baseline — large context, n_predict=0, records anchor
Phase 3: sliding — small context, n_predict=-2, sliding fires
Simulates per-slide clamped discard (min(nDiscarded, safeLimit))
and asserts slideNPBT == expectedNPBT with exact values. Verifies
adjustAfterSlide reduces anchor by the correct amount per slide.
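The per-slide arithmetic these tests assert can be sketched as follows. This is a TypeScript illustration of the C++ helpers named above, not their actual code. In particular, defining `safeLimit` as `nPastBeforeTools - firstMsgTokens` is an assumption based on the "conversation-only region" and "first msg sliding guard" wording; only `min(nDiscarded, safeLimit)` and "adjust by the discard delta" come from the commit messages.

```typescript
// Hypothetical sketch of clampDiscard / adjustAfterSlide; field meanings assumed.
interface ToolsAnchorState {
  nPastBeforeTools: number; // KV-cache position where the tool tokens start
  firstMsgTokens: number; // protected prefix that sliding must not discard
}

// Limit a requested discard so it never reaches into the tool tokens.
function clampDiscard(state: ToolsAnchorState, nDiscarded: number): number {
  const safeLimit = state.nPastBeforeTools - state.firstMsgTokens; // assumed definition
  if (safeLimit <= 0) return 0; // nothing in the conversation region to discard
  return Math.min(nDiscarded, safeLimit);
}

// After a slide, the anchor moves left by exactly the number of discarded tokens.
function adjustAfterSlide(state: ToolsAnchorState, discarded: number): ToolsAnchorState {
  return { ...state, nPastBeforeTools: state.nPastBeforeTools - discarded };
}

// Simulate consecutive slides and return the resulting anchor.
function simulateSlides(initial: ToolsAnchorState, nDiscarded: number, slides: number): number {
  let state = initial;
  for (let i = 0; i < slides; i++) {
    state = adjustAfterSlide(state, clampDiscard(state, nDiscarded));
  }
  return state.nPastBeforeTools;
}
```

With a long conversation every slide discards the full `nDiscarded` amount (the unclamped case); with a short one the discard is clamped and the anchor settles at the protected prefix.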
* test: add unclamped sliding test with long conversation
Second sliding test with longer user message and smaller n_discarded
(20). Verifies at least 1 slide discards the full n_discarded amount
(unclamped). Both tests simulate per-slide clamped discard and assert
exact nPastBeforeTools values.
* test: use n_discarded=100 with long conversation for unclamped sliding
Longer user message (~300 tokens) ensures the conversation region
exceeds n_discarded=100. Each slide discards the full 100 tokens
without clamping. Simpler and more direct than using small n_discarded.
* fix: don't add generation prompt on system-only prefill
When nPast=0 and the only message is a system prompt (role=system),
don't set add_generation_prompt=true. This was adding a stale
<|im_start|>assistant token to the cache that the model would see
as an empty assistant turn before the actual user message.
Now check the actual last message role instead of hardcoding true.
Saves 3 tokens in the cache prefix.
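The role check described in this fix, together with the earlier "treat role tool same as user" change, amounts to a small predicate. The sketch below is hypothetical (the real check is in the addon's C++ tokenizeChat path); returning `false` for an empty list or an assistant-last history is an assumption, since the commit messages only cover the system, user, and tool cases.

```typescript
// Hypothetical sketch of the add_generation_prompt decision.
interface PromptMessage {
  role: "system" | "user" | "assistant" | "tool";
  content: string;
}

function shouldAddGenerationPrompt(messages: PromptMessage[]): boolean {
  const last = messages[messages.length - 1];
  if (last === undefined) return false; // assumption: nothing to answer yet
  // A system-only prefill must not open an assistant turn;
  // a tool result continues the chain just like a user message.
  return last.role === "user" || last.role === "tool";
}
```

For a system-only prefill this returns `false`, so no stale `<|im_start|>assistant` token lands in the cache prefix.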
* chore: remove debug prompt logging
* chore: add debug log for tokenizeChat generation prompt flag
Logs nPast, lastRole, nMsgs, nTools, addGenPrompt at DEBUG verbosity.
Helps diagnose issues with stale generation prompt in cache.
* (fix) llamacpp-llm: "tool" role generate prompt tests
* (fix) llamacpp-llm: no "think" blocks in assistant history
* (internal) llamacpp-llm: test qwen3 dynamic tools template
* (chore) llamacpp-llm: upgrade package version
* fix: skip dispatch validation when called via workflow_call
The Validate Dispatch Inputs step fails when the mobile integration
workflow is invoked via workflow_call from a workflow_dispatch parent,
because github.event.inputs.package is empty in that context.
* fix: align prebuild download path with verify step in LLM mobile workflow
Prebuilds are downloaded to runner.temp/qvac-lib-infer-llamacpp-llm but
the verify step looked in runner.temp/prebuilds-download, so prebuilds
were never found.
* (internal) llamacpp-llm: runtimeDebugStats internal method
* (chore) llamacpp-llm: tools_at_end rename to tools_compact
* (improvement) llamacpp-llm: tools_compact feature docs
* (chore) llamacpp-llm: fix test
* (chore) llamacpp-llm: rename, cleanup, tests assertions
* (internal) llamacpp-llm: improve tests
* (internal) llamacpp-llm: reduce test flakiness with 0 temp
* (internal) llamacpp-llm: test rename
* (internal) llamacpp-llm: generate tests correct
* (internal) llamacpp-llm: improve sliding ctx tests
* (chore) llamacpp-llm: version bump
* (chore) llamacpp-llm: clang-format
* (fix) llamacpp-llm: qwen3 template perf and debug null guard
* (chore) llamacpp-llm: discard tokens warning
* (chore) llamacpp-llm: reuse getStatValue at tests
* (fix) llamacpp-llm: first msg sliding guard
* (improvement) llamacpp-llm: tools_compact require tools always
* (chore) llamacpp-llm: fix linter
* (fix) llamacpp-llm: guard regression, integration tests
* (internal) llamacpp-llm: remove over-defensive checks, fix test
* (chore) llamacpp-llm: cleanup linter and unused tests
* refactoring: anchored tools structured (tetherto#1658)
* (doc) llamacpp-llm: structure proposal
* (doc) llamacpp-llm: refactoring plan
* (internal) llamacpp-llm: extract tools compact controller from llm contexts
* (internal) llamacpp-llm: extract shared context slider for text and mtmd
* (internal) llamacpp-llm: ContextSlider testable, more tests
* (internal) llamacpp-llm: migrate tools compact coverage to deterministic unit tests
* (chore) llamacpp-llm: follow up minor fixes
* (internal) llamacpp-llm: improve multi-model portability
* (internal) llamacpp-llm: decouple ChatTemplateUtils
* (internal) llamacpp-llm: tools_compact contract, tests
* (internal) llamacpp-llm: ToolsCompactController tests and comments
* (doc) llamacpp-llm: tools_compact refine verify
* (internal) llamacpp-llm: tools compact profile resolution improved
* (chore) llamacpp-llm: clang format
* (chore) llamacpp-llm: tools-compact test improved
* (chore) llamacpp-llm: test condition check style
Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>
* (chore) llamacpp-llm: bump version, remove nested namespace
* (chore) llamacpp-llm: changelog improved
* (chore) llamacpp-llm: cleanup, test tool token count comment
* (chore) llamacpp-llm: tests useless conditional
Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>
* (chore) llamacpp-llm: tests refactor and remove redundant
* (chore) llamacpp-llm: deduplicate cache management tests, context slider edge coverage
* (chore) llamacpp-llm: clang format
* (fix) llamacpp-llm: ToolsCompact tools_calls check
* (internal) llamacpp-llm: oss string handle optimization
* (internal) llamacpp-llm: compute user msg index at cpp
* Revert "(internal) llamacpp-llm: compute user msg index at cpp"
This reverts commit 872eb47.
* (internal) llamacpp-llm: qwen3 dynamic template loop perf improved
* (chore) llamacpp-llm: clang format
---------
Co-authored-by: olyasir <sirkinolya@gmail.com>
Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>
Co-authored-by: gianni-cor <gianfranco.cordella@tether.io>
…stry server (tetherto#1724) * QVAC-17131 feat: add Prometheus metrics monitoring to registry server (tetherto#1600) * feat: add Prometheus metrics monitoring to registry server * fix: restrict registry ping RPC to role and timestamp to avoid exposing operational data * fix: make metrics bind host configurable and move off port 9090 * feat: replace per-model size gauge with view-derived total blob bytes (tetherto#1689) * feat[bc]: rename gauges, add seeder metrics, and eagerly open blob core on indexers (tetherto#1692) * feat[bc]: rename gauge metrics off _total suffix and pre-initialise rpc counters * feat: add core seeder metrics and eagerly open blob core on indexers * style: drop eslint-disable directives via helper function for gauge registration * refactor[bc]: drop core_name label from blob core metrics and use median for view-derived stat panels * style: drop noisy comment above registerGauge helper * feat[bc]: replace blob_core_fully_downloaded with length/contiguous_length pair and drop blind-peer metrics (tetherto#1702) * feat: expand Grafana dashboard with blob-core replication, seeders, and Holepunch P2P panels (tetherto#1716) * feat: expand Grafana dashboard with blob-core replication, seeders, and Holepunch P2P panels * fix: use vm_name label in QVAC and Holepunch panel legends instead of raw instance IP:port * fix: apply $vm template filter to QVAC and Holepunch selectors for consistent per-node filtering * chore[docs]: tighten registry Grafana dashboard panels based on staging review (tetherto#1718) * chore[docs]: tighten registry Grafana dashboard panels based on staging review * chore[docs]: drop redundant Blob Core Contiguous stat, cluster blob panels near the top * chore[docs]: promote View Core Replication and Blob Core Bytes to the top of the metrics section (tetherto#1719) * chore[docs]: promote View Core Replication and Blob Core Bytes to the top of the metrics section * chore[docs]: split View Core Replication into length, contiguous, and 
gap panels * chore: remove dead blind-peer helpers and fix stale metrics docs - Drop unreferenced getConnectedBlindPeerKeys / getConfiguredBlindPeerKeys / isBlindPeerConnected chain and the _peerConnectionCounts map that only existed to back isBlindPeerConnected. Left over from the dropped blob_core_blind_peers gauge (1de851b). - Fix DEPLOYMENT_GUIDE.md: default metrics port is 9210, not 9090; drop the hypermetrics reference since it is not a dependency (abandoned, incompatible with Hypercore v11) and per-core visibility is provided by the registry_blob_core_* / registry_view_core_* gauges.
* chore: Initial test-removal of environments for PR runs, remove unnecessary npmrcs --------- Co-authored-by: Proletter <40578159+Proletter@users.noreply.github.com>
…ape (tetherto#1688) * chore[bc|notask]: migrate SDK plugins to new addon constructor shape The three addon refactors (tetherto#1493 embed, tetherto#1494 LLM, tetherto#1496 diffusion) landed on main without the matching SDK plugin migration. Their published releases (`@qvac/embed-llamacpp@0.14.0`, `@qvac/llm-llamacpp@0.16.0`, `@qvac/diffusion-cpp@0.3.0`) dropped the `BaseInference` / `WeightsProvider` loader-and-disk-path constructor and replaced it with a single-argument `{ files, config, logger, opts }` shape that takes pre-resolved absolute paths. Plugin changes (llamacpp-completion, llamacpp-embedding, sdcpp-generation): - Construct the addon with the new single-argument shape. LLM and embed pass `files.model: string[]`; diffusion passes `files.model: string` plus renamed companion keys (`clipL`/`clipG`/`t5Xxl`/`llm`/`vae`). - Drop `FilesystemDL`, `parseModelPath`, and the `asLoader` adapter from each plugin. Addons now receive absolute paths directly. - Return `{ model }` instead of `{ model, loader }`. Loader field removal across the plugin-registry contract: - `PluginModelResult.loader?:` removed from the interface. - `LocalOptions.loader?: FilesystemDL` and the `registerModel` options slot for it both dropped. - `unloadAllModels` and `unloadModel` no longer call `entry.local.loader.close()`. - `loadModel`'s cast tightened to `{ model: AnyModel }` and the conditional loader spread into `registerModel` removed. - Non-migrated plugins (nmt, whisper, ocr, tts, parakeet) also simplified to `return { model }`. Their addons already stopped accepting a loader in earlier refactors; the pass-through was dead code. Parakeet drops its now-unused `new FilesystemDL({ dirPath })` + `parseModelPath` call. - `server/bare/utils/loader-adapter.ts` (asLoader) and `server/utils/model-path.ts` (parseModelPath) deleted. Both had no remaining callers after the plugin cleanup. 
Sharded-GGUF helper and tests: - New `packages/sdk/server/utils/expand-gguf-shards.ts` turns a single sharded GGUF path into the ordered list the new addon contract expects (`.tensors.txt` companion first, then `-NNNNN-of-NNNNN.gguf` shards). Pure string manipulation, POSIX and Windows separator handling. - 9 unit tests cover non-sharded paths, first-shard input, non-first-shard input, nested directories, single-shard (1-of-1), relative paths, Windows backslash separators, and a substring-match regression test (filename containing a shard-like pattern mid-basename must not match). Dependency changes: - Bump SDK deps to the published addon versions: `@qvac/diffusion-cpp ^0.3.0`, `@qvac/embed-llamacpp ^0.14.0`, `@qvac/llm-llamacpp ^0.16.0`. - Drop `@qvac/dl-filesystem` from `dependencies`; no remaining consumer in the SDK. - `bun.lock` refreshed; `bun install --frozen-lockfile` clean. Docs: the custom-plugin example in `docs/website/.../write-custom-plugin.mdx` drops the stale `loader: null` return to match the new `{ model }` shape. Test fixtures updated: `plugin-system.test.ts` and `sdcpp-plugin.test.ts` drop `loader: {}` / `loader: undefined` from mock `createModel` returns and from `registerModel` calls (the field no longer exists on either interface). Verified: `bun run build` clean (`--max-warnings=0`), 423/423 unit tests pass, all three diffusion examples and 19 LLM examples and 4 RAG examples run end-to-end against the real addons. Supersedes tetherto#1510, which carried stale merge history from the addon-refactor side-branches and could not be rebased onto current main cleanly. * fix(examples): set FLUX guidance params on diffusion examples Both `diffusion-txt2img.ts` and `diffusion-flux2-klein.ts` default to FLUX.2 Klein but neither was sending the right guidance knobs. 
stable-diffusion.cpp gates the unconditional inference branch on `guidance.txt_cfg != 1.0` (`stable-diffusion.cpp:3304`) and logs "use cfg-scale=1 for distilled models" (`:1667`); FLUX is a distilled model. The addon's `GenParams.cfgScale` defaults to `7.0f` (`SdGenHandlers.hpp:44`) and is assigned straight into `sample_params.guidance.txt_cfg` at `SdModel.cpp:499`, so omitting or leaving `cfg_scale` at the default forces the full CFG path on FLUX every step for zero quality benefit. `guidance: 3.5` is the FLUX distilled-guidance default and also needs to be set explicitly. Setting `guidance: 3.5, cfg_scale: 1` in both examples halves generation cost per step on FLUX. Measured on an RTX 5080 (Vulkan): 20-step txt2img drops from 17.1s to 9.1s; flux2-klein drops from 17.4s to 9.1s.
* fix: simplify SDK shard expansion helpers
Keep `expandGGUFIntoShards()` Bun-testable without introducing a separate shared shard-pattern module. This reduces refactor churn while preserving the sharded GGUF behavior expected by the SDK plugins.
* fix(examples): remove unintended docs and FLUX comment churn
Restore the custom-plugin docs example so this PR stays scoped to the SDK addon integration work. Remove the extra FLUX.2 guidance comments from the diffusion examples as requested.
* fix: restore shard-utils JSDoc
* fix: address review feedback from PR tetherto#1688
- `packages/sdk/server/bare/plugins/nmtcpp-translation/plugin.ts`: inline `path.dirname(modelPath)` and `path.basename(modelPath)` in `deriveColocatedBergamotVocabPaths` instead of importing `parseModelPath`. After rebasing onto main, `parseModelPath` no longer exists (this PR deletes `server/utils/model-path.ts`) but `tetherto#1707` reintroduced its usage here. `bare-path` is already imported in this file, so the inline form is a direct swap.
- `packages/sdk/server/utils/expand-gguf-shards.ts`: drop the redundant `totalDigits = String(totalShards).padStart(5, "0")` round-trip and use the regex capture `match[3]` directly (it is already a 5-digit zero-padded string). Also drop the `!Number.isFinite(totalShards)` guard: a 5-digit regex match plus base-10 `parseInt` always produces a finite integer, so the check is dead defense. The `<= 0` guard is kept (pins the `00000-of-00000.gguf` edge case). - `packages/sdk/test/unit/expand-gguf-shards.test.ts`: add a test that pins the zero-total shard-count branch; `expandGGUFIntoShards` must return the input path unchanged for `empty-00000-of-00000.gguf` rather than an empty shard list. * chore: route expand-gguf-shards import through server/utils barrel - `packages/sdk/server/utils/index.ts`: add `expand-gguf-shards` to the barrel re-exports alongside the other utility modules. - `packages/sdk/server/bare/plugins/llamacpp-completion/plugin.ts` and `llamacpp-embedding/plugin.ts`: import `expandGGUFIntoShards` from `@/server/utils` instead of the module path directly. - `packages/sdk/test/unit/expand-gguf-shards.test.ts`: same. No behavior change; keeps a single re-export point for server utils. * fix: import expandGGUFIntoShards directly in unit test The unit test runs under Bun. Importing through the `@/server/utils` barrel drags in `checksum.ts` and `formatting.ts`, both of which `import crypto from "bare-crypto"`. Bun does not implement Bare's `require.addon()`, so evaluating that import chain throws `TypeError: require.addon is not a function` at load time and the test run exits non-zero before any assertion executes. The plugin files (`llamacpp-completion/plugin.ts` and `llamacpp-embedding/plugin.ts`) keep the barrel import because they only evaluate under Bare. Only the Bun-loaded unit test needs to go through the module path directly. --------- Co-authored-by: Yury Samarin <yuri.a.samarin@gmail.com>
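The shard-expansion behaviour described across these commits can be sketched roughly as below. This is an illustration under stated assumptions, not the code in `expand-gguf-shards.ts`: the name `expandShardsSketch` and the exact regex are hypothetical, and the `.tensors.txt` companion entry is left out because its naming rule is not spelled out in the commit text.

```typescript
// Hypothetical sketch; the real helper is `expandGGUFIntoShards` in the SDK.
// Anchored at both ends so a shard-like pattern mid-basename does not match.
const SHARD_PATTERN = /^(.*)-(\d{5})-of-(\d{5})\.gguf$/;

function expandShardsSketch(modelPath: string): string[] {
  // Split on the last POSIX or Windows separator using plain string operations.
  const sepIndex = Math.max(modelPath.lastIndexOf("/"), modelPath.lastIndexOf("\\"));
  const dir = modelPath.slice(0, sepIndex + 1);
  const base = modelPath.slice(sepIndex + 1);

  const match = SHARD_PATTERN.exec(base);
  if (match === null) return [modelPath]; // not a sharded file name

  const prefix = match[1] ?? "";
  const totalDigits = match[3] ?? ""; // already a 5-digit zero-padded string
  const totalShards = parseInt(totalDigits, 10);
  if (totalShards <= 0) return [modelPath]; // the `00000-of-00000.gguf` edge case

  const shards: string[] = [];
  for (let i = 1; i <= totalShards; i++) {
    const index = String(i).padStart(5, "0");
    shards.push(`${dir}${prefix}-${index}-of-${totalDigits}.gguf`);
  }
  return shards;
}
```

Passing any shard of a set (first or not) yields the full ordered list, while a non-sharded path or the zero-total edge case comes back unchanged as a one-element array.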
… key instead of wiping all entries (tetherto#1740)
…es (tetherto#1712) * infra: distinguish [ios] / [android] / [desktop] jobs in sdk test workflows Add display `name:` fields to jobs in the three reusable SDK test workflows so the Actions UI can tell platforms apart when they run under the same umbrella (test-sdk.yml). Job IDs, needs: graphs, outputs, and artifact names are unchanged. * chore: bump qvac-test-suite to ^0.6.0 and add install:build script - Bump @tetherto/qvac-test-suite from ^0.5.1 to ^0.6.0 to pick up the run:local:desktop / run:local:android / run:local:ios commands and the suite + bootstrap features used by the refreshed local flow. - Add install:build script (npm install --install-links && npm run build) for a one-shot reinstall + rebuild after SDK changes. * doc: rewrite tests-qvac readme for local-first workflow and ci triggers - Lead with run:local:* one-liners instead of the old manual iOS flow. - Document the MQTT broker requirement (ws:8080 + mqtt:1883) and the embedded aedes + websocket-stream fallback behaviour. - Document the PR label triggers (test-e2e-smoke, test-e2e-full) and the manual workflow_dispatch entry point, including the non-obvious workflow-branch vs test-version distinction. - Add a "Developing new tests" section with executor placement guidance (shared / desktop / mobile) and the smoke-suite policy (1-2 tests per feature, only when no existing smoke coverage). - Keep manual Xcode fallback only as a troubleshooting bullet. * doc: add cursor rule for tests-qvac e2e impact and authoring conventions New cursor rule scoped to packages/sdk/** that enforces: - Evaluate e2e test suite impact on any SDK source change. - Rebuild tests-qvac via `npm run install:build` on SDK API or model constant changes; adapt or add tests accordingly. - Executor placement decision tree (tests/shared vs tests/desktop vs tests/mobile) with the hard rule that node:* imports are banned from shared/ and mobile/. 
- Smoke-suite policy: 1-2 tests per feature, only when no existing smoke coverage, stable on both desktop and mobile. - Points at tests-qvac/README.md for the local-run and CI-trigger details. * chore: gitignore tests-qvac local secrets and rag-hyperdb data Prevent accidental commits of local run artefacts: - .env / .env.bak-* may contain MQTT credentials copied from .env.example. - rag-hyperdb/ holds generated HyperDB corestore data from RAG tests. * chore: add install:build:full / prepare:sdk scripts and document rebuild flows - package.json: prepare:sdk (bun install + build in packages/sdk/) and install:build:full (prepare:sdk + install:build) for one-shot SDK + tests-qvac rebuilds. - README: new "Rebuilding after changes" section with a decision table covering SDK source changes, test-code-only changes, and producer-side-only changes (--skip-build). Clarifies that mobile always needs a fresh APK/IPA to pick up SDK or test-code changes and that --skip-build is strictly for re-runs with different suites or filters. - Cursor rule now points at the README section and references install:build:full alongside install:build. * doc: add sdk-e2e-create skill for e2e test planning New Cursor skill under .cursor/skills/sdk-e2e-create/ that guides planning and scaffolding of e2e tests in packages/sdk/tests-qvac for new or changed public SDK APIs. - Investigate-first flow: read the feature from code and existing tests, then present a concrete plan and ask targeted clarifying questions only where genuine ambiguity remains. - Enforces happy / sad / error coverage for every public API feature. - Ranks model-output validation strategies from deterministic keyword assertions down to shape-only fallbacks, to avoid weak coverage by default. - Covers executor placement (shared / desktop / mobile) with mobile memory / filesystem / platform constraints. - Smoke-suite selection rules: 1-2 tests per feature, only when no existing smoke coverage, stable across platforms. 
- Includes scaffolding templates and the exact run:local:desktop --filter command to hand back to the user for local verification.
…ag (PR-1701) (tetherto#1883) * test: add e2e coverage for transcribe()/transcribeStream() per-segment metadata (PR-1701) - transcription-metadata-batch / transcription-metadata-streaming: validate TranscribeSegment[] shape returned by Whisper when metadata: true - parakeet-tdt-metadata-rejected: assert metadata flag is rejected by non-Whisper engines (Parakeet) - shared validateSegments() helper, content-agnostic shape check - desktop + mobile executor parity for transcription and parakeet flows Also folds in a tooling fix surfaced while authoring these tests: tests-qvac/package.json gains clean:sdk-snapshot which wipes node_modules/@qvac/sdk plus the iOS/Android consumer build snapshots before reinstalling, so install:build:full no longer reuses a stale @qvac/sdk copy on any platform. Co-authored-by: Cursor <cursoragent@cursor.com> * test[skiplog]: exercise duplex metadata flow and pin parakeet rejection reason * test[skiplog]: apply code review suggestions --------- Co-authored-by: Cursor <cursoragent@cursor.com>
… cache invalidation (tetherto#2004) Adds 2 e2e tests in tests-qvac (translation-bergamot-fr-en-cache-reload, translation-bergamot-en-fr-cache-reload) covering the QVAC-18420 regression where shared vocab files for bidirectional Bergamot pairs were silently re-downloaded on every loadModel call. Each test does load -> unload (Round 1, warm cache) then load with onProgress -> unload (Round 2, must be a pure cache hit). Cache-hit detection is platform-agnostic via partial-percentage progress event counting (no node:fs snapshots). Skipped on mobile via SkipExecutor since the bug lives in server-side Bare code that is bit-identical across platforms. Co-authored-by: Cursor <cursoragent@cursor.com>
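A platform-agnostic cache-hit check of the kind described above might look like the following sketch. The `LoadProgress` shape and the assumption that a pure cache hit emits no event strictly between 0% and 100% are illustrative guesses, not the SDK's documented `onProgress` contract.

```typescript
// Hypothetical progress payload; the real onProgress event shape may differ.
interface LoadProgress {
  percentage: number;
}

// Count events that report a partially completed download.
function countPartialProgress(events: LoadProgress[]): number {
  return events.filter((e) => e.percentage > 0 && e.percentage < 100).length;
}

// Assumption: a pure cache hit never reports a partial percentage,
// so zero partial events is treated as "nothing was re-downloaded".
function looksLikeCacheHit(events: LoadProgress[]): boolean {
  return countPartialProgress(events) === 0;
}
```

A test would collect the Round 2 `onProgress` events into an array and assert `looksLikeCacheHit(events)`, which needs no filesystem access and so runs the same way on every executor.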
…ex streaming (tetherto#2018) * feat: Implement parakeet ggml backend * add duplex streaming * clean up * chore: gitignore local *-output.wav demo artifacts * Add ^ before parakeet version * QVAC-17869 feat[bc]: address PR review for parakeet 0.4.0 GGML migration Resolves blockers and should-fix items from PR tetherto#2018 review: - Revert TTS GGML bleed-in from desktop/mobile e2e consumers so the PR is parakeet-scoped. - Add structured LegacyParakeetModelDeprecatedError (server code 52210) and parakeetLoadConfigSchema that allow-lists legacy ONNX modelConfig fields so they reach resolveParakeetConfig and surface a clear migration message instead of a generic Zod failure. Public parakeetConfigSchema.strict() still rejects them. - Refactor endOfTurnEventSchema into a discriminated union on `source` ("whisper" requires silenceDurationMs; "parakeet" omits it). Threads the discriminator through transcribe op, both transcription plugins, the client API, examples, and unit tests. - Set skipPrimaryModelPathValidation to false in the parakeet plugin now that modelSrc is a real GGUF the framework can validate. - Add e2e tests parakeetStreamDestroyMidUtterance and parakeetStreamIteratorThrow (wired into desktop + mobile executors) to cover session.destroy() mid-utterance and consumer iterator unwind. - Add TODO(QVAC-17869-followup) in the parakeet duplex handler about wiring AbortSignal for cancellation in the next iteration. - Update transcription docs for the new single-GGUF loadModel shape, the LegacyParakeetModelDeprecatedError migration note, and the new transcribeStream() discriminated endOfTurn events. Update error code 52210 entry in the API reference. - Resolve package.json merge conflict (keep ^0.4.0 for @qvac/transcription-parakeet; take main's whispercpp ^0.7.0 and nmtcpp ^3.0.0). 
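The discriminated union on `source` can be pictured with plain TypeScript types, as in this sketch. It deliberately avoids Zod so it stays self-contained; `parseEndOfTurn` and `describeEndOfTurn` are hypothetical names, and the real `endOfTurnEventSchema` may carry more fields or handle a stray `silenceDurationMs` on parakeet events differently.

```typescript
// "whisper" events require silenceDurationMs; "parakeet" events omit it.
type EndOfTurnEvent =
  | { source: "whisper"; silenceDurationMs: number }
  | { source: "parakeet" };

// Hypothetical runtime check standing in for the schema validation.
function parseEndOfTurn(input: unknown): EndOfTurnEvent {
  if (typeof input !== "object" || input === null) {
    throw new Error("endOfTurn event must be an object");
  }
  const record = input as Record<string, unknown>;
  const source = record["source"];
  if (source === "whisper") {
    const silence = record["silenceDurationMs"];
    if (typeof silence !== "number") {
      throw new Error("whisper endOfTurn requires silenceDurationMs");
    }
    return { source: "whisper", silenceDurationMs: silence };
  }
  if (source === "parakeet") {
    // Assumption: extra fields are simply not carried over in this sketch.
    return { source: "parakeet" };
  }
  throw new Error(`unknown endOfTurn source: ${String(source)}`);
}

// Narrowing on `source` gives typed access to silenceDurationMs.
function describeEndOfTurn(event: EndOfTurnEvent): string {
  switch (event.source) {
    case "whisper":
      return `whisper end of turn after ${event.silenceDurationMs} ms of silence`;
    case "parakeet":
      return "parakeet end of turn";
  }
}
```

Because the compiler narrows on `source`, a consumer cannot read `silenceDurationMs` from a parakeet event by accident.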
Co-authored-by: Cursor <cursoragent@cursor.com> * QVAC-17869 fix: drop stale skipPrimaryModelPathValidation on parakeet plugin Address PR tetherto#2018 review feedback: the multi-file ONNX-era flag is no longer meaningful in 0.4.0 where the top-level `modelSrc` is the actual GGUF the addon mmaps. Remove the explicit `false` so the plugin relies on the framework default (which runs primary-path validation), and leave a short comment recording the intent so the omission is not misread as an oversight. Co-authored-by: Cursor <cursoragent@cursor.com> * QVAC-17869 fix: cast parakeet metadata-rejection probe via unknown `transcribeStream({ metadata: true })` resolves to an `AsyncGenerator`, not a duplex `TranscribeStreamSession`. The two types have no overlap, so a direct `as TranscribeStreamSession` is rejected by TS5 (TS2352). The only consumer here is the negative-path probe in `runParakeetStreamMetadataRejected`, which expects the call to throw before the cast is ever observed at runtime, so go through `unknown` to satisfy the type system without changing the test's behaviour. Co-authored-by: Cursor <cursoragent@cursor.com> * QVAC-17869 doc: fix broken parakeet-cpp engine link in transcription docs The PR review on tetherto#2018 flagged that https://github.com/tetherto/qvac-parakeet.cpp does not exist under the tetherto org. The parakeet-cpp engine actually lives as a subdirectory inside qvac-ext-lib-whisper.cpp (consistent with the attribution URL already used in packages/transcription-parakeet/NOTICE). Update the transcription overview to link to the real location. Co-authored-by: Cursor <cursoragent@cursor.com> * QVAC-17869 fix: make parakeet-stream e2e tests pass end-to-end Four interlocking fixes for the new `parakeet-stream-*` e2e suite: 1. Executor dispatch order (desktop + mobile consumer.ts). 
The `ParakeetExecutor` pattern `/^parakeet-/` was registered before `ParakeetStreamExecutor` (pattern `/^parakeet-stream-/`), so the broader matcher won every dispatch and stream test ids landed in the wrong executor (which lacks stream handlers, surfacing as "Unknown test"). Swap the order so the more specific pattern wins.
2. Audio fixture sample rate (parakeet-stream-tests.ts). The duplex runner feeds raw PCM directly into the parakeet session with no FFmpegDecoder hop, so the fixture must already be 16 kHz mono. The previous `transcription-short-wav.wav` (48 kHz stereo) was rejected by the runner's `sampleRate !== 16000` precondition. Switched the happy/reject/teardown/throw tests to `diarization-sample-16k.wav`.
3. Wall-clock pacing in the runner (parakeet-stream-runner.ts). The native parakeet `StreamSession` only commits transcript segments when audio arrives at roughly real-time cadence; flushing all chunks synchronously starves its internal segmenter and yields zero events. Made `writeInChunks` async and added a `delayMs` parameter that paces writes at the test's configured `chunkMs`; matches the addon's own `live-stream-simulation.test.js` / `duplex-streaming.test.js` pacing model. All call sites updated to `await` it.
4. EOU fixture (parakeet-stream-tests.ts). The EOU detector fires `<EOU>` based on sentence-final / turn-boundary linguistic patterns (see the addon-level `eou-streaming.test.js` regression note), and `diarization-sample-16k.wav` is continuous multi-speaker overlap: transcript text comes back but no `isEndOfTurn` segments surface. Switched the EOU test alone to `two-speakers-16k.wav`, the alternating two-speaker conversation fixture, which provides the turn-boundary stimulus the EOU head is trained on.
Result on desktop: 21/21 parakeet stream + dependent tests pass (was 0/5 stream tests pre-fix, then 16/21, then 20/21).
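The dispatch-order bug from fix 1 is easy to reproduce in isolation. The sketch below assumes a first-match-wins registry; `ExecutorEntry`, `dispatch`, and the test id `parakeet-stream-happy` are hypothetical stand-ins for whatever the consumers actually use.

```typescript
// Hypothetical registry entry: an executor name and the id pattern it claims.
interface ExecutorEntry {
  name: string;
  pattern: RegExp;
}

// First registered pattern that matches wins the dispatch.
function dispatch(executors: ExecutorEntry[], testId: string): string | undefined {
  return executors.find((entry) => entry.pattern.test(testId))?.name;
}

// Broad pattern first: it shadows the more specific stream pattern.
const brokenOrder: ExecutorEntry[] = [
  { name: "ParakeetExecutor", pattern: /^parakeet-/ },
  { name: "ParakeetStreamExecutor", pattern: /^parakeet-stream-/ },
];

// Specific pattern first: stream ids reach the stream executor.
const fixedOrder: ExecutorEntry[] = [
  { name: "ParakeetStreamExecutor", pattern: /^parakeet-stream-/ },
  { name: "ParakeetExecutor", pattern: /^parakeet-/ },
];
```

With `brokenOrder`, every `parakeet-stream-*` id resolves to `ParakeetExecutor`; with `fixedOrder`, only the non-stream ids do.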
Co-authored-by: Cursor <cursoragent@cursor.com> * QVAC-17869 fix: explicit session destroy in parakeet stream iterator-throw test `runParakeetStreamIteratorThrow` previously relied on the for-await sentinel throw to invoke the async iterator's `return()` and, through it, tear down the native `StreamSession` before opening the recovery session against the same model. On Node/Bare-desktop the unwind runs to completion synchronously enough that the next `transcribeStream` call sees a fully released model. On the Bare-RN bridge (iOS / Android) the iterator-return → native-destroy chain crosses JSI and is best-effort, so the recovery session opens while the previous native session is still alive, the model stays wedged, and the recovery iteration yields zero events — surfacing as `assertHappy` reporting `expected at least one text event, got: {}`. Call `throwingSession.destroy()` explicitly between catching the sentinel and opening the recovery session. This matches the contract real SDK consumers should follow when abandoning a stream mid-iteration (the iterator's `return()` is intentionally best-effort across runtimes) and keeps the test focused on the recovery contract rather than JSI return-propagation timing. Co-authored-by: Cursor <cursoragent@cursor.com> --------- Co-authored-by: Ishan Vohra <ishanvohra@Ishans-MacBook-Air.local> Co-authored-by: Cursor <cursoragent@cursor.com>
… ABI verification (tetherto#1984) * fix: handle repeated package names in buildNestedPathIndex Replace `key.indexOf(marker)` with `match.index` so each regex match maps to its actual package root. Previously, a resolution key with the same package name appearing twice (e.g. `node_modules/foo/node_modules/bar/node_modules/foo/index.js`) collapsed the nested `foo` back to the top-level path. Also exports `buildNestedPathIndex` so it can be reused by the upcoming `qvac verify bundle` command. Adds regression tests covering single, nested, and repeated-name resolution keys. * feat[api]: add qvac verify bundle command for prebuild and ABI verification New `qvac verify bundle` subcommand under `qvac verify`. Validates the actual artifacts (prebuilds + ABI compatibility) of a worker bundle or installed node_modules tree before shipping. Accepts a `worker.bundle.js` (bare-pack tree-shaken output) or a `node_modules` directory via `--addons-source`; source kind is auto-detected. Per addon per `--host`: - prebuild presence: `<packageRoot>/prebuilds/<host>/*.bare` - ABI compatibility: addon's `engines.bare` must satisfy the resolved Bare runtime version. Resolution order: `--bare-runtime-version` flag -> `bare-runtime/package.json` -> `bare/package.json`. Mobile/Expo CI should pass `--bare-runtime-version` explicitly; `react-native-bare-kit` does not currently expose embedded runtime metadata. Structured issue codes for CI consumption: - error: `missing-prebuild`, `abi-mismatch`, `invalid-runtime-version`, `invalid-source` - warning: `unknown-runtime-version`, `malformed-engines-bare` Exit 1 on any error-level issue, 0 otherwise. Tests: unit + Bats smoke for both source kinds, prebuild/ABI checks, runtime auto-resolution, malformed-engines-bare warning path (surfaced even when runtime is unknown), and regression coverage for nested-only bundle resolutions and multi-instance retention. 
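The `indexOf` versus `match.index` difference behind the `buildNestedPathIndex` fix can be shown with a small stand-alone sketch. The function names and the regex here are assumptions for illustration, not the CLI's actual implementation.

```typescript
interface PackageHit {
  name: string;
  root: string;
}

// Matches `node_modules/<name>` or `node_modules/@scope/<name>`.
const PACKAGE_SEGMENT = /node_modules\/((?:@[^\/]+\/)?[^\/]+)/g;

// Buggy variant: indexOf always finds the FIRST occurrence of the marker,
// so a repeated package name collapses back to the top-level path.
function packageRootsByIndexOf(key: string): PackageHit[] {
  const pattern = new RegExp(PACKAGE_SEGMENT.source, "g");
  const hits: PackageHit[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(key)) !== null) {
    const marker = match[0];
    hits.push({ name: match[1] ?? "", root: key.slice(0, key.indexOf(marker) + marker.length) });
  }
  return hits;
}

// Fixed variant: match.index is the position of THIS match in the key.
function packageRootsByMatchIndex(key: string): PackageHit[] {
  const pattern = new RegExp(PACKAGE_SEGMENT.source, "g");
  const hits: PackageHit[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(key)) !== null) {
    hits.push({ name: match[1] ?? "", root: key.slice(0, match.index + match[0].length) });
  }
  return hits;
}
```

For the key `node_modules/foo/node_modules/bar/node_modules/foo/index.js`, the buggy variant reports the nested `foo` at `node_modules/foo`, while the fixed variant reports its real root three levels deep.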
Validated end-to-end against qvac-app-workbench-mobile: 33 addons in a 10MB worker.bundle.js across 5 hosts; 45 addons in node_modules with 2 legitimate prebuild gaps surfaced; bare-os@3.6.2 (top-level) and bare-os@3.9.0 (nested under @qvac/tts-onnx) correctly distinguished. Adds `semver` as a runtime dependency. * feat[api]: read bareRuntimeVersion from qvac.config in verify bundle Adds `--config <path>` and auto-detection of `qvac.config.{json,js,mjs,ts}` so projects can pin the Bare runtime in a committed file (works in any runtime, including the future Pear pre-hook where env vars don't). Resolution: `--bare-runtime-version` > config `bareRuntimeVersion` > `bare-runtime/package.json` > `bare/package.json`. Both flag and config values share the same semver validation; malformed values emit `invalid-runtime-version` carrying `source: 'flag' | 'config'`, with the config file's actual path in the message. Explicit `--config` to a missing/unreadable file emits `invalid-source`; auto-detect failures and non-string config values fall through silently. Tests: 7 unit + 1 bats covering auto-detect, explicit `--config`, flag precedence, malformed/non-string config values, and config-path label in error messages. * fix[api]: address verify bundle review feedback - node-modules-source.ts + prebuilds.ts: treat symlinked package directories and prebuilds as valid (isDirectory || isSymbolicLink) so pnpm / yarn-pnp layouts don't silently pass verification. - abi.ts: pass `{ includePrerelease: true }` to `semver.coerce` so RC runtimes like 1.16.0-rc.1 aren't silently coerced to 1.16.0. - index.ts: emit `config-load-failed` warning when an auto-detected qvac.config.* exists but fails to parse, instead of swallowing the error. Explicit `--config` still errors via `invalid-source`. Tests: 3 regressions covering each fix. 
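The resolution precedence stated above (flag, then config, then the two `package.json` fallbacks) can be sketched as a simple ordered lookup. The type and function names are hypothetical, and semver validation, which the real command applies to flag and config values, is omitted here.

```typescript
type RuntimeVersionSource =
  | "flag"
  | "config"
  | "bare-runtime/package.json"
  | "bare/package.json";

// Hypothetical bag of already-read candidate values; undefined means "not present".
interface RuntimeVersionCandidates {
  flag?: string;
  config?: string;
  bareRuntimePackage?: string;
  barePackage?: string;
}

interface ResolvedRuntimeVersion {
  version: string;
  source: RuntimeVersionSource;
}

// Walk the candidates in precedence order and return the first usable one.
function resolveRuntimeVersionSketch(
  candidates: RuntimeVersionCandidates,
): ResolvedRuntimeVersion | undefined {
  const ordered: Array<[RuntimeVersionSource, string | undefined]> = [
    ["flag", candidates.flag],
    ["config", candidates.config],
    ["bare-runtime/package.json", candidates.bareRuntimePackage],
    ["bare/package.json", candidates.barePackage],
  ];
  for (const [source, version] of ordered) {
    if (typeof version === "string" && version.length > 0) {
      return { version, source };
    }
  }
  return undefined; // nothing resolved; the caller decides how to report it
}
```

Returning the `source` alongside the version is what lets a later validation step label a malformed value as coming from the flag or from the config file.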
* fix[api]: address verify bundle review feedback (Simon)
- index.ts: invalid-runtime-version no longer short-circuits the prebuild walk; a typo in `--bare-runtime-version` or a malformed config `bareRuntimeVersion` no longer hides real missing-prebuild errors. ABI resolution stays skipped when the runtime version is malformed.
- addon-source.ts + bundle-source.ts + node-modules-source.ts: thread `CollectDiagnostics` so malformed `package.json` records (parse error, non-object, missing `name` on `addon: true`) and empty bare-pack resolutions are no longer swallowed.
- index.ts: surface diagnostics as two new warning issues `invalid-package-json` and `empty-bundle-resolutions`, both warning-level (exit 0). Formatters added; `VerifyBundleIssue` union extended.
Tests: 2 orchestrator regressions (no-short-circuit, empty-bundle-resolutions) + extended `readAddonPackageJson` malformed-JSON unit to assert the new `invalid` record.
* feat[api]: add --json flag to qvac verify bundle
Emit the verification result as pretty-printed JSON on stdout instead of the human-readable summary. Exit codes are unchanged. Mirrors the `qvac doctor --json` convention so CI scripts, dashboards, and other downstream tooling can consume the structured issue codes (`addons`, `runtime`, `issues`, `hosts`, `sourceKind`, etc.) directly. `--quiet` is ignored when `--json` is set (a JSON consumer explicitly asked for output). README updated under `verify bundle` options, exit codes, and issue codes (also documents the three new warning codes added in the previous commit: `invalid-package-json`, `empty-bundle-resolutions`, `config-load-failed`).
* doc: tighten --json option row in verify bundle README
Drop the field list and the "exit codes unchanged" sentence; align voice with the other rows (and with the existing `qvac doctor --json` blurb).
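The exit-code rule for `qvac verify bundle` (exit 1 on any error-level issue, 0 otherwise) can be summarised in a few lines. The issue codes below are the ones named in these commits; the grouping constants and helper names are hypothetical.

```typescript
// Issue codes as listed in the commit messages, grouped by severity.
const ERROR_CODES = [
  "missing-prebuild",
  "abi-mismatch",
  "invalid-runtime-version",
  "invalid-source",
] as const;

const WARNING_CODES = [
  "unknown-runtime-version",
  "malformed-engines-bare",
  "config-load-failed",
  "invalid-package-json",
  "empty-bundle-resolutions",
] as const;

type IssueCode = (typeof ERROR_CODES)[number] | (typeof WARNING_CODES)[number];

function severityOf(code: IssueCode): "error" | "warning" {
  return (ERROR_CODES as readonly string[]).includes(code) ? "error" : "warning";
}

// Warnings alone never fail the run; a single error does.
function exitCodeFor(codes: IssueCode[]): 0 | 1 {
  return codes.some((code) => severityOf(code) === "error") ? 1 : 0;
}
```

A CI step consuming the `--json` output could apply the same rule to the reported issue list if it needs to re-derive pass or fail on its own.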
…ache via KvCacheSession (tetherto#2007)
* QVAC-18182 feat[api]: typed cancel outcomes on the wire + atomic KV-cache via KvCacheSession
Builds on QVAC-18181's request lifecycle primitives (DisposableScope, RequestContext, RequestRegistry) to deliver the M2 milestone:
- Typed cancel outcomes: `stopReason: "cancelled"` on `completionDone` events, and `InferenceCancelledError(requestId, partial)` thrown from CompletionRun promise-aggregates (`final` / `text` / `toolCalls` / `stats`). The wire stream still ends normally so iterating `run.events` is unaffected; the typed error lives on the aggregate promises that callers `await` for the final result.
- KvCacheSession (`server/bare/plugins/llamacpp-completion/ops/kv-cache-session.ts`): single atomic owner of the three KV-cache layers (`cachedMessageCounts`, `initializedCaches`, on-disk `.bin` files). `beginTurn` / `commitTurn` / `rollback` collapse the three duplicated cleanup blocks in `completion-stream.ts` into one scope.defer hook. Cross-model administrative deletion lives at the module level as `deleteKvCacheState(...)`, called by the RPC `handleDeleteCache` handler.
- Stop-button race close: `RequestRegistry` now keeps a bounded cancelled-before-begin map (128 entries, 30s TTL). A `cancel({ requestId })` that lands before the server's `begin(...)` ran is applied retroactively when begin lands, so same-tick stop clicks no longer disappear into the void. Internal-only: the wire surface for `cancel` is unchanged (Option A in the brief).
Cursor rules updated in the same PR so the request-lifecycle and KV-cache topic docs stay in sync with the implementation.
Tests:
- unit: KvCacheSession (bareTest-gated, runs in the Bare consumer), RequestRegistry race + bounded-set eviction, completion-event schema cancelled cases.
- e2e: cancellation-tests.ts adds three definitions — mid-stream cancel (events.stopReason === "cancelled", final rejects with InferenceCancelledError, partial.text matches concatenated contentDelta), cancel-before-begin (retroactive abort), and cancel-then-resume-kv-cache (rollback wiped the three layers, the next turn re-primes cleanly). * chore: drop planning labels (Mx/Dx) from QVAC-18182 comments Strips milestone (`M1`/`M2`/`M3a`...) and deliverable (`D2`/`D5`/`D7`) labels from comments and test titles introduced with the typed-cancel outcomes + KvCacheSession work. The substantive descriptions of the contracts (Stop-button race, cancelled-before-begin map, three-layer session ownership, etc.) are preserved; only the planning-doc references are removed so the code reads cleanly without the pitch context. Durable `QVAC-XXXXX` ticket references are kept. No behavior or API surface changes. * chore: drop Asana ticket references from QVAC-18182 code comments Strips QVAC-XXXXX inline ticket references from code/test comments introduced by the typed-cancel-outcomes work. Concept names (Stop-button race, cancelled-before-begin, etc.) and prose descriptions of the contracts are preserved; only the ticket-tag suffixes go. Also renames a test cache key from `qvac-18182-cancel-resume-kvcache` to `cancel-then-resume-kvcache` so the cache key reads as a stable identifier rather than a ticket reference. No behavior or API surface changes. * QVAC-18182 doc: clarify error>cancelled precedence + deleteKvCacheState concurrency Address non-blocking review nits on PR tetherto#2007: - aggregate-events: explain why a wire event carrying both error and cancelled signals resolves to error (closes brief open question tetherto#3). 
- kv-cache-session: doc-comment on deleteKvCacheState explaining the ordering guarantee under concurrent in-flight turns -- delete is wire-async, in-flight turns roll back idempotently when their commit probe finds the file gone (closes brief open question tetherto#4). Comments only; no behavior changes. * QVAC-18182 doc: demonstrate typed cancel outcomes in cancel example Enhance the existing cancel-by-request-id example to demonstrate the two M2 cancel-outcome channels: - run.events ends normally with completionDone carrying stopReason: "cancelled" -- show reading it inside the iteration loop. - run.text rejects with InferenceCancelledError(requestId, partial) on cancel -- show the instanceof check and consuming partial.text, partial.toolCalls, partial.stats. Also update the header to remove the now-stale "logged as a no-match" sentence (same-tick cancels are no longer dropped after M2's race close). Pure documentation enhancement; no API or behavior changes. * QVAC-18182 fix: address PR review — partial-prime cleanup + parent-aborted state Two follow-ups from Opanin's review on PR tetherto#2007: 1. KvCacheSession.beginTurn: if `primeIfMissing` throws after the addon has partially written a `.bin` to disk, the next `beginCustom` would `fsPromises.access(cachePath)` → true and trust the half-primed file as a valid cache (no rollback hook is registered yet — the handler hasn't seen the `TurnHandle`). Wrap both `beginCustom` and `beginAuto` prime calls in a shared `primeOrCleanup` helper that best-effort unlinks the partial file before re-throwing the original prime error. Adds a bare-only unit test asserting the on-disk file is removed and the init flag stays unset on the failed-prime path. 2. RequestRegistry.begin: when `parentSignal` was already aborted at begin time, line 271 aborts the controller but the `state` ternary still landed `"running"`, exactly the "momentarily-running with already-aborted signal" the preCancel branch was guarding against. 
Extend the ternary to cover both inputs and the existing `parentSignal already aborted` test now also asserts `ctx.state === "cancelling"`. No behavior change on the happy path. Lint + typecheck + 351-test unit suite green locally on the changed files. * QVAC-18182 fix: prime is atomic — addon writes to .prime.tmp + atomic rename Upgrade the previous reactive cleanup workaround (PR tetherto#2007 review by @opaninakuffo) into a proactive atomic-by-construction design: - The session steers `model.run({ saveSessionPath })` to a sibling `cachePath + ".prime.tmp"` path. - Only after the prime closure resolves successfully do we promote the temp file to the canonical `cachePath` via `fsPromises.rename` (atomic same-volume on every host we target). - The canonical cache path is therefore *never* observable in a partial state — a thrown prime is indistinguishable on disk from a never-attempted prime, so the next existence probe (in-process or cross-process worker restart) cannot trust corrupt bytes. Defensive details: - We unlink any leftover `.prime.tmp` *before* invoking the closure, so a deferred-write addon path can't accidentally promote stale-from-crash bytes left by a prior worker. - On prime success we probe the temp path before renaming. If the addon deferred its disk write (some llama.cpp paths flush lazily), the temp doesn't exist and we leave the canonical path absent — `verifySaveAndRecord` in `commitTurn` is the authoritative check. - On rename failure we unlink the temp and surface the rename error; rename atomicity guarantees the canonical path was untouched. Why this is better than the prior `primeOrCleanup`: - Best-effort `unlink` was load-bearing for correctness in the old design — a failed unlink left a half-primed canonical file the next `beginCustom` would trust. The new design moves the only possible "partial" file to a non-trusted name, so failed cleanup cannot corrupt the canonical name by construction. 
- The unit test no longer mocks the workaround surface; it asserts the actual invariant ("canonical path was never written") plus the positive rename and the leftover-sweep guarantees.
Tests: 3 bare-only kv-cache-session unit tests (throw-leaves-canonical-untouched, success-promotes-via-rename, leftover-from-crash-is-swept). Lint + typecheck + 351-test unit suite green locally on the changed files.
Long-term, the right fix is one layer down: the llama.cpp addon should write transactionally itself and surface save errors instead of swallowing them. When that lands, this helper collapses to a direct `prime(cachePath)` call and the `verifySaveAndRecord` access-probe fallback (TODO already documented) can be retired together. Filed as a separate follow-up; out of scope for this PR.
* QVAC-18182 fix: replace prime-atomic helper with verifyPrimedFile post-prime probe
Audit of the llama.cpp addon (`CacheManager::writeCacheFile` → `llama_state_save_file`, return value swallowed; `LlamaModel::processPromptImpl` lines 575-599) shows the bug shape Opanin flagged on PR tetherto#2007 ("primeIfMissing throws after a partial save") does not actually fire. The save call is the very last operation on the prefill path, the addon ignores its return value, and any earlier throw means no save was attempted. So:
- `primeOrCleanup` (`ac8d2d74e`) and the upgrade to `primeAtomically` (`a7420f3e6`) defended against a code path that the addon does not produce.
- The real corruption shape is silent partial writes (addon's `llama_state_save_file` returns false, addon ignores it, file is half-written or empty). Atomic temp+rename did NOT close this gap: on a "silent partial" the closure resolves successfully and the helper would happily promote the partial `.prime.tmp` to the canonical path.
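As a side illustration of the Stop-button race fix from the first commit of this PR, a bounded cancelled-before-begin map with the stated limits (128 entries, 30s TTL) might be sketched as follows. The class and method names are hypothetical, timestamps are passed in explicitly to keep the sketch deterministic, and the real `RequestRegistry` may evict or expire entries differently.

```typescript
// Hypothetical sketch of a bounded, TTL-limited "cancel arrived before begin" map.
class CancelledBeforeBeginMap {
  // requestId -> time the early cancel was recorded (ms).
  private readonly entries = new Map<string, number>();
  private readonly maxEntries: number;
  private readonly ttlMs: number;

  constructor(maxEntries = 128, ttlMs = 30000) {
    this.maxEntries = maxEntries;
    this.ttlMs = ttlMs;
  }

  // Record a cancel for a request that has not begun yet.
  markCancelled(requestId: string, now: number): void {
    this.sweep(now);
    this.entries.delete(requestId); // re-insert so the entry becomes the newest
    this.entries.set(requestId, now);
    while (this.entries.size > this.maxEntries) {
      // Map iterates in insertion order, so the first key is the oldest.
      const oldest = this.entries.keys().next().value;
      if (oldest === undefined) break;
      this.entries.delete(oldest);
    }
  }

  // Called from begin(): true means "this request was already cancelled".
  consume(requestId: string, now: number): boolean {
    this.sweep(now);
    return this.entries.delete(requestId);
  }

  // Drop entries whose TTL has elapsed.
  private sweep(now: number): void {
    for (const [id, recordedAt] of this.entries) {
      if (now - recordedAt >= this.ttlMs) this.entries.delete(id);
    }
  }
}
```

In this shape, `begin(...)` would call `consume(requestId, Date.now())` and start the request in a cancelling state when it returns `true`; the size bound and TTL keep stray cancels for requests that never begin from accumulating.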
Replace both helpers with a small `verifyPrimedFile` that mirrors the existing `verifySaveAndRecord` access-probe pattern used at commit time, applied at prime time: - After a successful prime closure, `fsPromises.stat` the canonical path. If it doesn't exist (addon was interrupted before save) or has size 0 (addon save call produced an empty file), throw and best-effort unlink the empty leftover so the next existence probe doesn't trust it. - This catches the two failure modes Opanin's concern was a proxy for (cancelled-mid-prime; addon save quietly produced nothing) without claiming defense against partial-but-nonzero writes, which can only be closed at the addon layer. The `RequestRegistry` parent-aborted-state fix (`ctx.state` ternary covers `opts.parentSignal?.aborted`) from `ac8d2d74e` is preserved unchanged — it stands on its own as a correct response to Opanin's second comment. Long-term root cause stays the addon: have `CacheManager::writeCacheFile` check `llama_state_save_file`'s return value and throw on failure. When that lands, both `verifyPrimedFile` and `verifySaveAndRecord`'s access-probes can be retired together. Filed as a separate follow-up — out of scope for this PR. Tests: 3 prior bare-only prime-atomic tests removed; 2 new bare-only tests added (no-file and empty-file rejection paths). Lint + typecheck + 330-test unit suite green locally on the changed files (pre-existing sdcpp-generation lint errors unchanged). * QVAC-18182 doc: kv-cache rule documents addon non-transactional save + matched access-probes Extend the "Cache Initialization (primeIfMissing)" section in .cursor/rules/sdk/docs/kv-cache-system.mdc with the corrected addon-contract analysis: - The llama.cpp addon's CacheManager::writeCacheFile discards llama_state_save_file's bool return; maybeSaveCacheToDisk is the last call on the prefill path. So no closure-rejection path can coexist with a partial file on disk. 
- Document the four real outcomes as a table (interrupted / success / silent partial write / pre-eval throw) so future readers can see why the SDK takes the shape it does. - Pin both SDK-side defenses as a matched pair: verifyPrimedFile at prime time (added in this PR) and verifySaveAndRecord at commit time (existing). Both are honest about what they catch (missing / empty file) and what they don't (partial-but-nonzero, only addon fix can close that). - Reference the addon-layer follow-up (1214778658064488 / "throw on llama_state_save_file failure") so the next contributor knows both probes will be retired together when the addon throws on save failure. No code change — rule-only update.
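The post-prime probe described in the commits above can be sketched as follows. This is a minimal illustration, not the SDK's actual code: the `FsLike` interface, the function signature, and the error messages are assumptions made so the example stays self-contained. The commit text only states that the real helper stats the canonical path with `fsPromises.stat`, throws on a missing or zero-size file, and best-effort unlinks the empty leftover.

```typescript
// Hypothetical minimal fs surface so the sketch runs without Node typings.
interface FsLike {
  stat(path: string): Promise<{ size: number }>;
  unlink(path: string): Promise<void>;
}

// Sketch of a post-prime probe: reject a missing or empty cache file.
// It cannot detect partial-but-nonzero writes; only the addon layer can.
async function verifyPrimedFile(fs: FsLike, cachePath: string): Promise<void> {
  let size: number;
  try {
    size = (await fs.stat(cachePath)).size;
  } catch {
    // No file at all: the save never happened.
    throw new Error(`prime produced no cache file at ${cachePath}`);
  }
  if (size === 0) {
    // Best-effort cleanup so a later existence probe does not trust the file.
    await fs.unlink(cachePath).catch(() => undefined);
    throw new Error(`prime produced an empty cache file at ${cachePath}`);
  }
}
```

Injecting the fs surface is a choice made for this sketch only; it lets the two rejection paths (no file, empty file) be exercised with an in-memory fake.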
…#2011) * chore: Bump @qvac/rag from ^0.4.4 to ^0.5.0 to pick up its package.json imports-map fix for bare-crypto / bare-fetch. * fix: Use bundler-visible asset references in node-rpc-client so static analysis can pull worker.js into the bundle.
…2021) * infra: introduce new self-hosted runners to diffusion and sdk * infra: try new runners in cpp-tests-llm * infra: try new runners in LLM desktop and mobile integration tests * fix: install mesa-vulkan-drivers on github-hosted ubuntu arm * infra: use non-gpu ubuntu2404 and windows-2025 in diffusion cpp tests * infra: use ubuntu2404 gpu runner in diffusion integration tests * infra: sort out self-hosted runners for LLM cpp tests and integration test * use ubuntu2204 gpu runner besides the ubuntu2404 gpu runner in diffusion integration test * fix: correct ubuntu 22.04 usage in diffusion mobile integration test * infra: add qvac-ubuntu2404-x64-gpu to list of runners to test in sdk * infra: use self-hosted runners in embed cpp tests and integration tests * infra: use self-hosted runners in ocr and tss integration tests * fix: correct typo in step name in embed integration test * fix: correct typo in input description in sdk desktop test * infra: use qvac-ubuntu2204-x64 for android builds in sdk android test * infra: use self-hosted runners in nmtcpp tests * infra: use self-hosted runners in parakeet and whisper integration tests * infra: use self-hosted runners in decoder audio integration test * infra: use self-hosted runners in pr-test addon * infra: use self-hosted runners in reusable-prebuilds.yml * fix: set vulkan sdk path on windows & linux x64 in reusable-prebuilds.yml * infra: use self-hosted runners in cpp lint & cpp tests in nmtcpp * fix: revert hard-coding temp branch as ref for the cpp lint and tests workflows in nmtcpp * test: pin temp-self-hosted-runners branch for uses: workflows * fix: runs-on matrix.runner not matrix.os in reusable-prebuilds * fix: cpp-lint rework setup bare tooling * fix: nmt int mobile test don't install node on linux * infra: bare tooling and expo ensured pre-installed on ubuntu runners * fix: only install expo/cli for iOS platform in nmt mobile int test * fix: bare tooling linux-x64 set path, and non-linux-x64 install 
otherwise * fix: clean matrix of os/runners in reusable-prebuilds * fix: android arm64 in reusable-prebuilds fixes * fix: matrix.include.os for ubuntu-24.04-arm * fix: set vcpkg install root var for windows runners * fix: reusable prebuilds fixes for android arm64 on ubuntu-24.04 * fix: setup bare tooling (windows) in reusable-prebuilds * infra: bare tooling is ensured to be installed on windows-2025 in reusable-prebuilds * infra: fix stripping static libraries (.a) from prebuilds (Windows) * fix: yaml indentation issue in reusable-prebuilds * infra: npm and bare tooling ensured on x64 ubuntu and windows in nmt int test * fix: runs-on matrix.runner before os in nmt int tests * fix: use ${{ runner.temp }} instead of /tmp in nmt int test * fix: create ${{ runner.temp }}/tmp dir before using it * fix: use tmp in local dir in nmt int test * infra: runs-on: ${{ matrix.runner || matrix.os }} in nmt mobile int test * infra: use self-hosted runners for all transcription-whispercpp workflows * infra: use tmp-self-hosted-runners branch for whisper mobile int test * infra: add bare tooling to GH PATH in cpp-test-coverage-transcription-whisper * infra: Add bare tooling to PATH (Linux x64) in integration-test-transcription-whispercpp * infra: use tmp-self-hosted-runners branch for embed * fix: correct ubuntu version check in cpp-tests-embed * fix: correct vcpkg cache to work on windows in cpp-tests-embed * test: fix linux cpp-tests-embed * fix: override .lsan-suppressions.txt path relative to workdir in cpp-tests-embed * infra: review on-pr-diffusion-cpp to ensure using self-hosted runners * fix: add missing arch: x64 in 2 places * infra: review on-pr-llm-llamacpp to ensure using self-hosted runners * fix: cpp-tests-llm .lsan-suppressions.txt should be in workdir for linux x64 to succeed * infra: review on-pr-tts-onnx to ensure using self-hosted runners * infra: review on-pr-ocr-onnx to ensure using self-hosted runners * infra: review on-pr-onnx to ensure using self-hosted 
runners * infra: ensured python 3.12 on windows for tts-onnx, possibly others * fix: no sudo in cpp-test-coverage-tts-onnx * infra: ensure self-hosted runner usage in integration tests and ensure consistency in not using global tmp directory * infra: ensure consistency in reviewed mobile integration tests * fix: don't use /tmp in integration-mobile-test-ocr-onnx * infra: ensure self-hosted runners are used in integration-test-tts-onnx * fix: don't write in global /tmp on linux in integration-mobile-test-ocr-onnx * fix: missed replacing ai-run-linux-gpu w/ qvac-ubuntu2404-x64-gpu in integration-test-transcription-whispercpp * fix: increase timeout-minutes to 30 in integration-test-ocr-onnx run integration test step * fix: timeout-minutes at the job level instead of step level in integration-test-ocr-onnx * infra: switch to non-gpu runners in integration-test-ocr-onnx * infra: review on-pr-tts-ggml to ensure using self-hosted runners * infra: increase timeout-minutes from 60 to 120 in integration-test-tts-ggml * infra: review on-pr-transcription-parakeet to ensure self-hosted runners * infra: missed self-hosted runner in cpp-test-coverage-transcription-parakeet * infra: review on-pr-decoder-audio to ensure self-hosted runners * fix: typo in integration-test-decoder-audio * infra: review on-pr-bci-whispercpp to ensure self-hosted runners * fix: revert name ocr and tts in on-pr-ocr-onnx and on-pr-tts-onnx * fix: replace inputs.platform and inputs.arch with matrix.platform and matrix.arch in reusable-prebuilds.yml * fix: clone stable vcpkg branch 2025.12.12 in cpp-tests- diffusion, embed, llm * fix: replace matrix.platform == 'x64' with matrix.arch == 'x64' * fix: suffix gpu matrix job names Co-authored-by: Cursor <cursoragent@cursor.com> * fix: replace matrix.os with matrix.runner in integration-test-transcription-whispercpp * chore: remove commented-out workflow blocks Co-authored-by: Cursor <cursoragent@cursor.com> * fix: align setup-node gating in workflows 
Co-authored-by: Cursor <cursoragent@cursor.com> * fix: use runner temp for unix prebuild extraction Co-authored-by: Cursor <cursoragent@cursor.com> * infra: add workflow_dispatch to on-pr-test-sdk and pin test-sdk workflow to branch * fix: only one per platform in test-sdk * fix: fix logic in integration tests, and fix in cpp-tests-diffusion * infra: revert adding workflow_dispatch to on-pr-test-sdk and pinning of test-sdk workflow to branch --------- Co-authored-by: Cursor <cursoragent@cursor.com>
…assification (tetherto#1727) * QVAC-17481 feat: add @qvac/classification-ggml MobileNetV3 image classification addon Introduces a new inference addon that classifies images into three classes (food / report / other) using a fine-tuned MobileNetV3-Small CNN running on the libggml CPU backend. Follows the established QVAC addon pattern (see qvac-lib-infer-nmtcpp, lib-infer-diffusion). ## What this PR ships - New package `packages/qvac-lib-infer-ggml-classification/` publishing as `@qvac/classification-ggml`: - Native addon: custom 34-layer MobileNetV3-Small compute graph built directly against the public `ggml.h` / `ggml-backend.h` API — no llama.cpp application-layer dependency, so the addon remains forward-compatible with future `libggml` upstream merges. - Load-time BatchNorm fold with `eps = 0.001` (the architecture- correct value; `1e-5` causes normalisation drift across all 34 layers). Depthwise separable convolutions, squeeze-and-excite blocks, HardSwish / HardSigmoid / ReLU activations all wired through `ggml_conv_2d`, `ggml_conv_2d_dw`, `ggml_pool_2d`, `ggml_hardswish`, `ggml_hardsigmoid`. - FP16 GGUF weights bundled inside the package (2.94 MB); class labels are read from the GGUF `mobilenet.class_N` metadata so a future fine-tune can ship different class names without a code change. - Public JS API: `new ImageClassifier({ modelPath?, logger?, threads?, nativeLogger? })` + `load()` / `classify(buffer, opts?)` / `unload()` / `destroy()`. Accepts JPEG, PNG, or raw-RGB input; validates at the JS layer before reaching native code so no bad input reaches libggml. - `nativeLogger` opt-in (default `false`): the underlying `qvac-lib-inference-addon-cpp` JsLogger holds a process-wide static `uv_async_t` that is not safe across rapid create/destroy cycles, so the native C++→JS log bridge is disabled unless the caller explicitly opts in. JS-level logging always flows through the caller's `logger`. 
- Image preprocessing via vendored-through-vcpkg `stb_image` + `stb_image_resize2` (bilinear resize to 224×224, ImageNet normalisation, WHCN layout). ## Build + tests - `bare-make` + `cmake-bare` + `cmake-vcpkg` build, targeting `ggml::ggml` / `ggml::ggml-base` / `ggml::ggml-cpu` and `stb` from the shared QVAC vcpkg registry. - C++ GoogleTest suite covering graph shape (34 conv + 2 linear + 9 SE blocks), load + inference, determinism, `topK` filter, BN epsilon guard, and full preprocessor behaviour. - brittle + bare JS integration tests covering load, classify (all 6 public sample images under `test/images/`), `topK`, raw RGB input, and every error path: null, empty buffer, corrupted JPEG, unsupported format (BMP), mismatched dimensions, pre-load / post-unload, tiny upscale, load/unload cycles. - Mobile test scaffolding following the shared convention: `scripts/generate-mobile-integration-tests.js`, `scripts/validate-mobile-tests.js`, `test/mobile/ {integration-runtime.cjs, integration.auto.cjs, README.md, testAssets/.gitignore}`. The auto-generated `integration.auto.cjs` wraps every `test/integration/*.test.js` so the shared `qvac-test-addon-mobile` framework picks them up on Android and iOS automatically. ## CI workflows Four addon-scoped workflows (path-filtered to this package): - `on-pr-qvac-lib-infer-ggml-classification.yml` — authorize, sanity checks, TypeScript declaration check, C++ lint, prebuild matrix, desktop integration tests, mobile integration tests, merge-guard. - `prebuilds-qvac-lib-infer-ggml-classification.yml` — Linux x64, Linux arm64, Android arm64, macOS arm64, iOS arm64, Windows x64 prebuild matrix. - `integration-test-qvac-lib-infer-ggml-classification.yml` — desktop end-to-end tests with the shared performance reporter writing a GitHub step summary. - `integration-mobile-test-qvac-lib-infer-ggml-classification.yml` — AWS Device Farm Android + iOS runs via the `tetherto/qvac-test-addon-mobile` framework. 
## Public-data / test-image policy All public correctness assertions in this package are scoped to the 6 test images under `test/images/` (2 per class). No confidential fine-tuning numbers, validation-set sizes, per-class metrics, or references to any internal validation dataset appear in this PR, in any file it ships, or in CI logs. Internal numerical-equivalence gating against an ONNX FP32 reference is handled pre-release by a development-only script that is not part of this PR. ## Out of scope for this PR - SDK plugin / schema integration (`packages/sdk/**`) lands in a follow-up PR after `@qvac/classification-ggml@0.1.0` is published to npm. This mirrors the diffusion rollout (#656 → release → #1021). - GPU backends (Vulkan / Metal / CUDA): CPU-only for v1.0. Made-with: Cursor * QVAC-17481 fix(ci): correct setup-bare-tooling action name in classification workflows The prebuild and integration-test workflows for @qvac/classification-ggml referenced `tetherto/qvac/.github/actions/setup-bare-toolchain`, which does not exist. The action is named `setup-bare-tooling` (same name used by the llamacpp-llm, nmtcpp, and diffusion addons at the identical pinned SHA). All 6 prebuild matrix jobs failed at step 1 with "Can't find 'action.yml' ... for action 'setup-bare-toolchain'" until this rename is in place. Files: .github/workflows/prebuilds-qvac-lib-infer-ggml-classification.yml .github/workflows/integration-test-qvac-lib-infer-ggml-classification.yml Made-with: Cursor * QVAC-17481 fix(ci): add per-platform vcpkg/NDK/Apple-clang setup to classification prebuilds The classification prebuilds workflow was missing the per-platform toolchain steps that sibling addons (diffusion, nmtcpp) have after `setup-vcpkg-cache`. As a result, `VCPKG_ROOT` was never exported, CMake couldn't locate the vcpkg toolchain, and `bare-make build` failed on every platform. 
Changes to .github/workflows/prebuilds-qvac-lib-infer-ggml-classification.yml: - setup-vcpkg-cache: drop unknown inputs `vcpkg-path` and `github-packages-token` (action only accepts platform, arch, s3-bucket-path). Was silently ignored but emitted warnings. - Add per-OS vcpkg bootstrap / configuration: macOS (darwin, ios): clone microsoft/vcpkg tag 2025.12.12, bootstrap, export VCPKG_ROOT. Linux (linux, android runners): export VCPKG_ROOT=$VCPKG_INSTALLATION_ROOT. Windows: export VCPKG_ROOT from $env:VCPKG_INSTALLATION_ROOT with backslash-to-forward-slash normalisation. - Windows-only: set CMAKE_GENERATOR="Visual Studio 17 2022" and, for the x64 matrix row, CMAKE_GENERATOR_PLATFORM=x64. - Android-only: export ANDROID_NDK / ANDROID_NDK_HOME / ANDROID_NDK_ROOT from ANDROID_NDK_LATEST_HOME, derive ANDROID_TOOLCHAIN_ROOT, set ANDROID_NATIVE_API_LEVEL=24. - iOS and darwin: move Homebrew llvm / llvm@18 aside so the Apple toolchain clang is on PATH (matches diffusion). All additions mirror the working pattern in prebuilds-lib-infer-diffusion.yml and prebuilds-qvac-lib-infer-nmtcpp.yml at the same pinned action SHA. No Vulkan or apt X11 steps were added: this addon is CPU-only ggml and has no graphics dependencies. Made-with: Cursor * QVAC-17481 fix: add missing <limits> include and CI build-failure diagnostics Two related changes to unstick the prebuild matrix: 1. addon/src/model-interface/ImagePreprocessor.cpp uses std::numeric_limits<int>::max() but does not #include <limits>. MSVC pulls <limits> in transitively (via <algorithm> in its STL), but libc++ and libstdc++ on clang/gcc do not. This is the most plausible reason all five non-Windows prebuild jobs (linux-x64, linux-arm64, android-arm64, darwin-arm64, ios-arm64) failed identically at `bare-make build` while the Windows host build succeeded. 2. prebuilds-qvac-lib-infer-ggml-classification.yml gains a `Dump build context on failure` step that runs only if `bare-make build` fails. 
It prints toolchain identity, lists the build/ tree, tails CMake configure logs, dumps any *.log under build/, and tails up to 20 vcpkg buildtree logs. Mirrors the `Dump vcpkg build logs on failure` pattern in prebuilds-lib-infer-diffusion.yml. Without this, every CI failure currently surfaces only as `Process completed with exit code 1.`, which is essentially undebuggable from the run summary page. Files: packages/qvac-lib-infer-ggml-classification/addon/src/model-interface/ImagePreprocessor.cpp .github/workflows/prebuilds-qvac-lib-infer-ggml-classification.yml Made-with: Cursor * QVAC-17481 fix(ci): use --platform (not --target) for bare-make generate Root cause confirmed from job log of run 24850328468 (linux-x64): bare-make generate --target linux --arch x64 Bail: UNKNOWN_FLAG: target The bare-make CLI installed by setup-bare-tooling does not accept `--target`; it only accepts `--platform`. Diffusion and nmtcpp both use `--platform`. Locally I had an older bare-make that accepted `--target` as an alias, which masked the bug on my Windows host. Step 17 (Generate build) was failing immediately with the above "Bail: UNKNOWN_FLAG", causing every downstream step (build, install) to fail too across all 6 prebuild matrix jobs. Also harden the diagnostic step `Dump build context on failure`: disable `-e` and `pipefail` for that step so a missing `build/` directory or empty `find` result no longer makes the diagnostic step itself exit non-zero (it should never mask the real failure). Files: .github/workflows/prebuilds-qvac-lib-infer-ggml-classification.yml Made-with: Cursor * QVAC-17481 fix: pin ggml to CPU-only feature set + guard backend iteration CI runs were failing because the default ggml vcpkg feature set pulls in the `vulkan` (Linux/Windows/Android) and `metal` (Apple) GPU backends, which forces `find_package(Vulkan)` at configure time and forces the prebuilds workflow to install the Vulkan SDK on every runner. 
Since this addon is CPU-only by design (only ever calls ggml_backend_cpu_init), the GPU backends are dead weight: extra compile time, extra dependencies in shipped prebuilds, and extra runtime requirements on user machines (e.g. libvulkan.so.1). Two related changes, no functional impact on the addon itself: 1. packages/qvac-lib-infer-ggml-classification/vcpkg.json Add `"default-features": false` to the ggml dependency. This opts out of vulkan / metal / cuda / opencl while keeping the core CPU backend (which is the implicit base, not a named feature). Verified locally on win32-x64: vcpkg rebuilt `ggml:x64-windows@2026-01-30#5` from source in 26s without Vulkan, generate + build + install all green, and the JS integration test ran the model end-to-end producing correct top labels (food/report/other) for every sample image. 2. packages/qvac-lib-infer-ggml-classification/CMakeLists.txt Guard the GGML_AVAILABLE_BACKENDS iteration with `if(TARGET ggml::${_backend})`. The upstream variable advertises every backend the port knows about, but real CMake targets only exist for backends that were actually built. Without the guard, add_bare_module's get_target_property() crashes on Android (where Vulkan and OpenCL are listed as available but not built). Defensive change; no behavioural difference when targets do exist. Local artifact size: prebuilds/win32-x64/qvac__classification-ggml.bare is 1.6 MB; no shipped vulkan loader. Made-with: Cursor * QVAC-17481 fix(ci): match prebuild- artifact prefix in mobile tests The mobile integration workflow downloaded artifacts with patterns `android-*` / `ios-*` (PREBUILD_ARTIFACT_PREFIX was empty), but the prebuilds workflow names artifacts `prebuild-android-arm64` / `prebuild-ios-arm64`. Result: `Total of 0 artifact(s) downloaded`, followed by "ERROR: No prebuilds found!" — both Android and iOS mobile jobs failed at this exact step in run 24891210942. 
Set PREBUILD_ARTIFACT_PREFIX to "prebuild-" so the resulting patterns become `prebuild-android-*` and `prebuild-ios-*`, matching the actual artifact names. Mirrors how the desktop integration workflow already filters (it uses `prebuild-${platform}-${arch}*` directly). File: .github/workflows/integration-mobile-test-qvac-lib-infer-ggml-classification.yml Made-with: Cursor * QVAC-17481 fix(model): zero-input warmup pass to defeat cold-inference NaN ggml's backend graph allocator leaves intermediate tensor buffers and the input/output tensors uninitialised after `buildGraph` returns. Whatever stale heap residue happens to occupy those addresses can leak into the very first inference and produce non-finite logits on a heap-state-dependent basis. CI run 24891210942 caught this on win32-x64: meal_1.jpg (the first sample classified after instance creation) failed assert 9 (`Math.abs(sum - 1) < 1e-3` -- probabilities sum was not ~1) and assert 10 (`result[0].confidence >= result[1].confidence` -- sort comparison broke because the first confidence was NaN). Asserts 11..72 covering the other five sample images all passed: by then the second inference had overwritten the dirty buffers with real data. This is a classic uninit-memory bug: behaviour depends on whatever the heap happens to contain at process start. My local Windows build did not trip on it (different heap layout); the Azure CI runner did. Same compiler family, same code, different result. Fix: at the end of `ClassificationModel::load()`, run one full forward pass with a zero-filled input tensor and discard the output. This forces ggml's compute graph to write every backend buffer with a deterministic value before any user-visible classify() call ever sees the model. Cost is one cold inference per `load()` (~50-200 ms on a CPU runner), paid once at addon startup, never visible to the caller. 
Local validation on win32-x64 with this change: integration test 1 (72/72 asserts including all sum-to-one and sort-desc checks) now passes deterministically across rebuilds. The unrelated lifecycle SIGSEGV between separate ImageClassifier instances (likely in qvac-lib-inference-addon-cpp's JobRunner / OutputCallbackJs uv_ resources, not addressed here) still surfaces, just later in the test run -- that needs a separate investigation in addon-cpp. File: packages/qvac-lib-infer-ggml-classification/addon/src/model-interface/ClassificationModel.cpp Made-with: Cursor * QVAC-17481 fix(model): full-pipeline warmup eliminates win32 cold-inference NaN The previous zero-input warmup (commit af12cdd1) wrote zeros directly to the input tensor and ran ggml_backend_graph_compute. CI run 24892803959 showed it was insufficient: win32-x64 still failed asserts 9 + 10 on meal_1.jpg with NaN in result[0].confidence, while linux-arm64 / darwin / linux-x64 all passed. Hypothesis: ggml's CPU backend on MSVC has lazy-init code paths (SIMD kernel JIT / FP state setup) that only trigger on non-trivial inputs reaching the post-preprocess range, and the zero-input warmup didn't exercise them. The bug therefore surfaces on the first real classify() with an ImageNet-normalised image. Fix: replace the synthetic warmup with one that goes through the EXACT same pipeline classify() uses end-to-end: 1. Synthesise a small (32x32) raw RGB buffer with a deterministic non-zero gradient pattern (uint8 values from `(i * 7) & 0xFF`). 2. Run preprocess::preprocessToTensor on it (resize to 224x224 + ImageNet normalise + channel reorder to WHCN). 3. ggml_backend_tensor_set the result, run the full compute graph, and read the output back via ggml_backend_tensor_get. Cost: one full classify-equivalent pass at load() time (~50-200 ms on a CPU runner), paid once per ImageClassifier instance, never visible to the caller. 
Output is discarded; the goal is to leave every backend buffer fully written and every lazy-init code path exercised before user-visible classify() runs. Local validation on win32-x64: 14/14 integration tests pass with this change (was failing test 1 asserts 9 + 10 on meal_1 before). Also applies the clang-format-19 layout the cpp-lint check expected, unblocking that job. File: packages/qvac-lib-infer-ggml-classification/addon/src/model-interface/ClassificationModel.cpp Made-with: Cursor * QVAC-17481 fix(addon): drain in-flight job in unload(); persistent perf reporting Two related changes that together unblock multi-instance integration tests across linux-x64 / darwin-arm64 / android / ios and address the inference-latency-visibility ask. 1. addon.js — make unload() wait for the in-flight job to settle The previous unload() flow rejected this._pending immediately and then synchronously called binding.destroyInstance(). The native side (qvac-lib-inference-addon-cpp's JobRunner uses a worker thread; OutputCallbackJs uses a uv_async_t handle) often still had a callback pending at that moment, and destroying the instance underneath the in-flight callback raced with the uv_close lifecycle. The result was a SIGSEGV (use-after-free) observed across linux-x64 (both ubuntu-22.04 + 24.04), darwin-arm64, and the on-device Android/iOS Device Farm jobs in CI runs 24891210942 and 24892803959. linux-arm64 happened to win the race on those runs but the bug is fundamentally non-deterministic. Fix: track a separate `_pendingSettled` Promise that resolves the moment _outputCallback fires (whether the user-facing classify() Promise resolved or rejected). unload() now awaits that signal before calling destroyInstance, so the worker thread / async handle have provably finished when the native teardown runs. The user-facing classify() Promise contract is unchanged. 
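The drain-before-destroy idea for `unload()` can be sketched roughly as below. This is a simplified illustration under stated assumptions, not the contents of `addon.js`: the `NativeBinding` interface, the `DrainingAddon` class, and its `run()` / `outputCallback()` methods are hypothetical stand-ins, and the real code also has to handle rejection of the user-facing Promise. Only the core ordering is shown: a separate "settled" Promise resolves when the output callback fires, and `unload()` awaits it before the native teardown.

```typescript
// Hypothetical stand-in for the native binding; the name is illustrative only.
interface NativeBinding {
  destroyInstance(): void;
}

class DrainingAddon {
  private binding: NativeBinding;
  // Resolved by default so unload() with no job in flight proceeds at once.
  private pendingSettled: Promise<void> = Promise.resolve();
  private markSettled: () => void = () => {};
  private finishJob: (result: string) => void = () => {};

  constructor(binding: NativeBinding) {
    this.binding = binding;
  }

  // Start a job; the returned Promise is the user-facing result.
  run(): Promise<string> {
    this.pendingSettled = new Promise<void>((resolve) => {
      this.markSettled = resolve;
    });
    return new Promise<string>((resolve) => {
      this.finishJob = resolve;
    });
  }

  // Invoked when the native side delivers its output callback.
  outputCallback(result: string): void {
    this.finishJob(result);
    this.markSettled(); // no native callback is pending from here on
  }

  // Wait for the in-flight callback before tearing down native state.
  async unload(): Promise<void> {
    await this.pendingSettled;
    this.binding.destroyInstance();
  }
}
```

Keeping the settle signal separate from the user-facing Promise means `unload()` does not depend on whether the caller's Promise resolved or rejected, only on the callback having fired.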
This is a correctness improvement to the ImageClassifier API contract: after `await classifier.unload()` returns, native resources are now genuinely released (not "scheduled to be released, please don't peek"). 2. test/integration/utils.js + classify.test.js — crash-survivable inference-latency reporting + load-time metric The performance-report.json was previously only flushed in process.on('exit'), so any SIGSEGV mid-test discarded all collected metrics. Now we additionally flush the JSON file after every recorded metric. Even a partial run leaves a usable per-platform latency snapshot in the uploaded artifact. Also adds recordLoadTime(label, ms) to capture the cost of constructing + load()ing an ImageClassifier (warmup + GGML graph build + weights read), and threads it into the first integration test as `load:cold`. This complements the per-image classify timings already recorded as `classify:<file>` and uploaded as artifact `classification-perf-report-{platform}-{arch}`. Local validation on win32-x64: 14/14 tests pass cleanly with this change set; performance-report.json contains 7 results (load:cold + 6 classify:<file>) on disk before the process exits. Files: packages/qvac-lib-infer-ggml-classification/addon.js packages/qvac-lib-infer-ggml-classification/test/integration/utils.js packages/qvac-lib-infer-ggml-classification/test/integration/classify.test.js Made-with: Cursor * QVAC-17481 fix(addon): defer OutputCallBackJs destruction to avoid use-after-free race Root cause (in `qvac-lib-inference-addon-cpp:OutputCallBackJs.hpp`): The upstream destructor calls `uv_close(asyncHandle, deleter)` -- which is asynchronous -- and then IMMEDIATELY runs `js_delete_reference` on its JS handle/callback refs before returning. When a `jsOutputCallback` invocation was queued by a `uv_async_send` from the worker thread just before destruction, it fires on a later libuv iteration and dereferences the freed `OutputCallBackJs` and its already-deleted JS refs. 
This explained the SIGSEGV (linux-x64 24.04, darwin-arm64) and the on-device APP CRASH (Android / iOS Device Farm) observed across rapid ImageClassifier create/destroy cycles in CI runs 24891210942, 24892803959, 24897445066. The bug is timing-dependent, which is why linux-arm64 consistently wins the race and passes while other platforms fail. Fix (this commit, in our binding.cpp only): Introduce a `DeferredOutputCallBackJs` wrapper that implements `addon_cpp::OutputCallBackInterface` by composing the upstream `addon_cpp::OutputCallBackJs` as a `unique_ptr` and forwarding `initializeProcessingThread / notify / stop` calls to it. The wrapper is what `AddonCpp` now owns; the inner upstream callback is owned by our wrapper. AddonCpp field destruction order is: 1. `~AddonCpp` body: `outputCallback_->stop()` (our wrapper's stop forwards to inner). 2. `jobRunner_` destroyed: JOINS the worker thread. No new `uv_async_send` can happen from this point on. 3. `outputCallback_` destroyed: our wrapper's destructor runs. 4. There may still be `uv_async_send` callbacks QUEUED before step 2 that are pending on the libuv loop. Our destructor releases ownership of the inner callback into a heap-allocated `uv_check_t` whose callback (firing AFTER the poll phase on the next libuv iteration -- i.e. after any queued async callback has fired safely against the still-alive inner) deletes the inner, then closes and deletes itself. The check handle is unref'd so it does not keep the libuv loop alive on its own. This is a real lifetime-management fix, not a timing workaround. When upstream's destructor is corrected, the wrapper becomes a pass-through with no functional effect. We will also submit the fix upstream. Local validation on win32-x64: 14/14 integration tests pass, 90/90 asserts, including test 14 (`load -> unload -> load cycles do not leak handles`) which explicitly exercises the pattern that was racing the upstream bug. 
File: packages/qvac-lib-infer-ggml-classification/addon/src/addon/AddonJs.hpp

Made-with: Cursor

* QVAC-17481 fix(model,test): defensive softmax/sort + per-inference diagnostic trace

Three related changes that together (a) make the classification output well-formed under any numerical edge case and (b) give us first-class visibility into whatever the model actually returns on every CI platform. No workarounds or test-masking -- the C++ changes apply uniformly to production classify() calls and the diagnostic logs are plain stderr output behind an opt-in env var (plus always-on per-image t.comment() in tests).

1. addon/src/model-interface/ClassificationModel.cpp -- softmax()

Previously:
- Called std::max_element on a span that could contain NaN (max_element behaviour on NaN is unspecified).
- Skipped normalization when sum <= 0 but RETURNED the unnormalized probs (could leave callers with all-zero or non-sum-to-1 probabilities).

Now:
- Finds max by explicit isfinite() walk, defaulting to -inf if every logit is non-finite.
- If max is non-finite (all NaN/Inf), returns a uniform distribution (1/N per class) so callers always see a valid probability vector that sums to 1.
- Per-element exp() input is skipped when non-finite (produces 0 for that element rather than NaN).
- If the exponential sum is not finite or <= 0, falls back to uniform distribution instead of returning unnormalized zeros.

This is defence in depth. MobileNetV3-Small on well-normalized input never produces NaN logits in practice, but if upstream ggml CPU backend ever surfaces a numerical bug (or a future quantised model does) we now cannot silently corrupt the user-visible probability distribution.

2. addon/src/model-interface/ClassificationModel.cpp -- std::sort

Added explicit is-finite guards in the comparator. Non-finite confidences now compare as less than any finite value, giving strict-weak-ordering even with degenerate inputs. Previously, any NaN in the confidences would make the comparator non-strict-weak and std::sort behaviour undefined (one observed symptom: top class label at index 0 but some later index carrying a higher confidence).

3. addon/src/model-interface/ClassificationModel.cpp -- trace hook

New `QVAC_CLASSIFICATION_TRACE=1` env var toggles a per-inference stderr print of:
- raw logits as read from the ggml output tensor
- probabilities immediately after softmax (pre-sort)
- final sorted results

Off by default -- production users see nothing. Enabled in our CI integration-test workflow (in the third file below) so every run carries the numerical ground truth for every sample image. If a platform-specific anomaly ever recurs (e.g. the win32 meal_1 oddity we have been chasing) the log lines let us diagnose without adding further instrumentation.

4. test/integration/classify.test.js

Before each per-image assertion block, emit a `t.comment(...)` line containing the full sorted result (label + 6-digit confidence per entry, plus elapsed ms). Brittle surfaces comments in the TAP stream regardless of pass/fail, so every CI job log now records the actual model output side-by-side with the assertion outcome. This replaces the need for post-hoc instrumentation commits when diagnosing numerical issues.

5. .github/workflows/integration-test-qvac-lib-infer-ggml-classification.yml

Set `QVAC_CLASSIFICATION_TRACE=1` on the integration-test step so the C++ trace lines land in CI logs by default. Bounded output (3 lines per inference, ~20 inferences per job), negligible cost.

Local validation on win32-x64: 14/14 integration tests pass, 90/90 asserts. Trace output verified: all 6 sample images produce sensible logits and sum-to-1 probabilities; top class matches expected label in every case. Trace lines and t.comment()s visible in both the pass and (hypothetically) fail paths, as intended.
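A minimal sketch of the softmax and comparator rules listed in items 1 and 2 might look like the following. It is not the code in `ClassificationModel.cpp`: `safeSoftmax` and `higherConfidenceFirst` are hypothetical names, and `std::vector<float>` is an assumed container chosen only to keep the example self-contained.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Softmax that always returns a valid distribution (hypothetical helper).
std::vector<float> safeSoftmax(const std::vector<float>& logits) {
  const std::size_t n = logits.size();
  if (n == 0) return {};
  const std::vector<float> uniform(n, 1.0F / static_cast<float>(n));

  // Find the max over finite logits only; stays -inf if none is finite.
  float maxLogit = -std::numeric_limits<float>::infinity();
  for (float v : logits) {
    if (std::isfinite(v) && v > maxLogit) maxLogit = v;
  }
  if (!std::isfinite(maxLogit)) return uniform;  // all NaN/Inf

  // Non-finite inputs are skipped, so they contribute probability 0.
  std::vector<float> probs(n, 0.0F);
  float sum = 0.0F;
  for (std::size_t i = 0; i < n; ++i) {
    if (std::isfinite(logits[i])) {
      probs[i] = std::exp(logits[i] - maxLogit);
      sum += probs[i];
    }
  }
  if (!std::isfinite(sum) || sum <= 0.0F) return uniform;

  for (float& p : probs) p /= sum;
  return probs;
}

// Descending comparator that stays a strict weak ordering with NaN/Inf:
// any finite value sorts before any non-finite one.
bool higherConfidenceFirst(float a, float b) {
  const bool fa = std::isfinite(a);
  const bool fb = std::isfinite(b);
  if (fa != fb) return fa;  // exactly one is finite: the finite one wins
  if (!fa) return false;    // both non-finite: treated as equivalent
  return a > b;
}
```

With this comparator, `std::sort(c.begin(), c.end(), higherConfidenceFirst)` places non-finite confidences at the tail instead of invoking undefined behaviour.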
Files: packages/qvac-lib-infer-ggml-classification/addon/src/model-interface/ClassificationModel.cpp packages/qvac-lib-infer-ggml-classification/test/integration/classify.test.js .github/workflows/integration-test-qvac-lib-infer-ggml-classification.yml Made-with: Cursor * QVAC-17481 fix: clang-format + defensive marshalling + finer test assertions Three coordinated changes that (a) unblock cpp-lint, (b) make the C++ -> JS marshalling robust against compiler code-gen quirks, and (c) make every test failure self-diagnostic so we never have to add post-hoc instrumentation again. 1. addon/src/model-interface/ClassificationModel.cpp -- clang-format Apply the exact diff that cpp-lint reported in run 24900278513: drop the blank line between <gguf.h> and the addon-cpp include, wrap the std::sort args one-per-line, and split the multi-arg static_cast<double>(...) chain in the trace fprintf to one arg per line. Pure formatting; no behaviour change. 2. addon/src/addon/AddonJs.hpp -- defensive marshalling + per-entry trace inside JsClassifyOutputHandler The lambda now reads the label and the confidence into named local variables (`labelString`, `confidenceFloat`, then `confidenceDouble = static_cast<double>(confidenceFloat)`) BEFORE handing them to `jsu::String::create` / `jsu::Number::create`. The previous inline expression jsu::Number::create(env, static_cast<double>(cppOut.results[i].confidence)) produced 0 in JavaScript for index 0 only on win32-x64 (clang-cl), while indices 1..N marshalled correctly -- visible in run 24900278513 win32 log: C++ trace shows {food:0.707883} but JS receives {food:0.000000}, all other entries OK. Materialising the values into named locals forces the compiler to commit the values to memory before the call sequence and dodges that code-gen pattern. Linux, macOS, and Windows continue to pass; this is risk-free defence-in-depth even if Windows turns out to have a deeper issue. 
Also adds an opt-in trace line per array element (gated by the same QVAC_CLASSIFICATION_TRACE=1 env var as ClassificationModel::process()), printing label, float, and double values as the lambda actually sees them. Combined with the existing process()-level trace, we now get the full pipeline view -- raw logits -> probs -> sorted results -> per-entry marshalling -- on every CI run with no manual instrumentation needed. 3. test/integration/classify.test.js -- finer assertions Replace coarse "confidence is in [0,1]" with split assertions that distinguish: typeof number / Number.isFinite (NaN/Inf detection) / range check. Per-entry assertion messages now include the array index AND the actual value so a failure line tells you exactly what went wrong. Same treatment for the sum and the sort-desc checks. Topk / sequential / raw-RGB tests gain explicit Number.isFinite checks plus t.comment() output of the full result, so they no longer silently swallow the kind of value-corruption bug that was hidden in test 2 of the previous CI run. Local validation on win32-x64: 14/14 tests pass; assertion count went from 90/90 to 140/140 with the new finite-checks. Marshalling trace verified emitting label / float / double per element under QVAC_CLASSIFICATION_TRACE=1. Files: packages/qvac-lib-infer-ggml-classification/addon/src/model-interface/ClassificationModel.cpp packages/qvac-lib-infer-ggml-classification/addon/src/addon/AddonJs.hpp packages/qvac-lib-infer-ggml-classification/test/integration/classify.test.js Made-with: Cursor * QVAC-17481 fix(mobile,addon): mobile model path via testAssets + cpp-lint uv.h order - `test/integration/utils.js`: add `resolveModelPath()` that resolves the GGUF weights via `global.assetPaths` on iOS/Android (the bare worklet runs from a packed `app.bundle/...` virtual root and cannot read the npm package's `weights/` directory), and falls back to the bundled desktop path otherwise. 
Throw a clear synchronous error when the asset is missing so it surfaces as a brittle assertion instead of an unhandled-promise-rejection that aborts the bare worklet. - `test/integration/classify.test.js`, `test/integration/error-cases.test.js`: use `resolveModelPath()` for every `ImageClassifier` instance. - `scripts/copy-mobile-test-assets.js`: replace the inline shell `mobile:copy-prebuilds` script with a portable Node script that fans out the single arm64 prebuild into the per-flavour directories the qvac-test-addon-mobile framework expects. - `package.json`: wire the new script in as `mobile:copy-prebuilds`. - `addon/src/addon/AddonJs.hpp`: include `<uv.h>` and reorder includes to satisfy `clang-format-19`'s grouping rules so cpp-lint passes in CI. - `.gitignore`: keep downloaded Device Farm logs (`remote_logs/`) and ad-hoc validation scripts out of the working tree. Made-with: Cursor * QVAC-17481 fix(mobile,addon): testAssets .gguf.bin extension + win32 burn-one js_create_double - `scripts/copy-mobile-test-assets.js` + `test/integration/utils.js`: copy the GGUF weights into `test/mobile/testAssets/` with a `.gguf.bin` suffix and look them up by that key. The qvac-test-addon-mobile framework's metro.config.js does not register `.gguf` as an asset extension, so a raw `.gguf` file is treated as a JS-source request and the bundler aborts at `:app:createBundleReleaseJsAndAssets`. `.bin` is in the framework's accepted list and ggml's `gguf_init_from_file` does not validate the file extension. - `addon/src/addon/AddonJs.hpp`: add a defensive "burn one" `js_create_double(env, 0.0, &dummy)` call at the top of the classification result lambda. On Win32 (clang-cl + bare runtime + V8) the very first `js_create_double` call inside a fresh handle scope returned 0 for index 0 even though the C++ side passed the correct value; consuming that slot unblocks every subsequent call. Gated trace output behind `QVAC_CLASSIFICATION_TRACE=1`. 
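As an illustration of gating trace output behind `QVAC_CLASSIFICATION_TRACE=1`, a sketch could look like this. The helper names are hypothetical, and two details are assumptions of the sketch rather than facts from the commit: that only the exact value `1` enables tracing, and that the variable is read once and cached.

```cpp
#include <cstdio>
#include <cstdlib>
#include <cstring>

// Pure helper: decide from the raw variable value. Treating only the exact
// string "1" as "on" is an assumption of this sketch.
bool traceValueEnabled(const char* value) {
  return value != nullptr && std::strcmp(value, "1") == 0;
}

// Reads the environment once and caches the answer (assumed behaviour).
bool traceEnabled() {
  static const bool enabled =
      traceValueEnabled(std::getenv("QVAC_CLASSIFICATION_TRACE"));
  return enabled;
}

// Hypothetical per-entry trace line, written to stderr only when enabled.
void traceEntry(const char* label, float confidence) {
  if (!traceEnabled()) return;
  std::fprintf(stderr, "[trace] %s confidence=%.6f\n", label,
               static_cast<double>(confidence));
}
```

Keeping the decision in a pure function makes the gate easy to unit-test without touching the process environment.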
Made-with: Cursor * QVAC-17481 fix(mobile): copy test images to mobile testAssets to fix Android/iOS ENOENT `test/integration/utils.js:loadImage()` previously read every test image with `fs.readFileSync(path.join('test','images',name))`. On mobile that resolves into the packed `app.bundle/...` virtual root, where `test/images/` is not present, and the bare runtime aborts with `FileError: ENOENT, open "/app.bundle/backend/test/images/<file>"` right after the model loads (Pixel 9 Pro logcat from the previous CI run pinpointed this). Fixed by: - `scripts/copy-mobile-test-assets.js`: also copy every `test/images/*.{jpg,jpeg,png}` into `test/mobile/testAssets/`. JPEG and PNG are part of metro's default `assetExts`, so no rename is needed (unlike the GGUF blob). - `test/integration/utils.js`: add `_resolveImagePath()` that on mobile reads from `global.assetPaths['../../testAssets/<name>']` with the same key fallbacks as `resolveModelPath()`, and on desktop returns `test/images/<name>`. Throw with sample asset keys when the lookup fails so the failure is a brittle assertion. - `test/mobile/testAssets/.gitignore`: also ignore `*.jpg`/`*.jpeg`/ `*.png` so the populated images are not committed. Made-with: Cursor * QVAC-17481 docs: README revisions for mobile assets, FP16, topK and prose reflow - Document new `npm run mobile:copy-prebuilds` flow that populates `test/mobile/testAssets/` with prebuilds, the `.gguf.bin` weights blob, and the integration test images (fixes mobile ENOENT crash). - Replace the obsolete "Cold start" claim with a "First-call overhead" note that reflects the full-pipeline warmup added in `load()` and the remaining JS/JIT/decoder/page-cache effects. - Add a "Why FP16 weights?" subsection capturing the precision-vs-size rationale (FP16 matches FP32 accuracy on the validation set; more aggressive quantizations degraded noticeably). - Expand the topK section with a plain-language one-liner. 
- Add a runtime trade-off paragraph under "Why a custom GGML graph?": GGML CPU is slower than PyTorch/ONNX at this scale, but the absolute gap is negligible for a ~2.5 M-param model; larger classifiers would need extra graph-level optimisation. - Fix `funetuned` -> `fine-tuned` typo. - Reflow paragraphs to single lines so markdown viewers can soft-wrap. Made-with: Cursor * QVAC-17481 fix(graph): validate GGUF num_classes and assert output shape (review #1727) Addresses two `[BUG]` review comments from @olyasir on tetherto/qvac#1727 about the hardcoded `kNumClasses = 3` not being validated against either the loaded GGUF's `mobilenet.num_classes` metadata or the actual element count of the constructed output tensor. Both are downstream-safety problems for the per-inference path: float logits[graph::kNumClasses] = {0.0F}; ggml_backend_tensor_get(impl_->compute.output, logits, 0, sizeof(logits)); `sizeof(logits)` is fixed at compile time. With a mismatched GGUF, this either reads OOB (numClasses < kNumClasses) or silently truncates (numClasses > kNumClasses); on the FC-weight-upload side the `classifier.3.weight = [1024, kNumClasses]` shape would also fail to match the GGUF tensor and corrupt the classifier. Changes: 1. addon/src/model-interface/MobileNetGraph.cpp -- graph::loadWeights() Right after reading `numClasses` from `mobilenet.num_classes`, compare against `kNumClasses` and `throw StatusError(InvalidArgument, ...)` with a descriptive message (actual vs expected count, plus a hint to rebuild the addon or use a matching GGUF). This is the primary fix olyasir requested in `MobileNetGraph.cpp`. The error path is reachable from `ClassificationModel::load()`'s call to `graph::loadWeights(...)`, which already runs inside the JS-side `await classifier.load()` Promise; the `StatusError(InvalidArgument)` propagates as a structured rejection on the JS side, matching how every other config-time validation error in this addon surfaces. 2. 
addon/src/model-interface/MobileNetGraph.cpp -- graph::buildGraph() At the end of the graph build, before we hand the `ComputeGraph::output` tensor over to the backend allocator, assert `ggml_nelements(cg.output) == kNumClasses` and `raise(...)` (which throws `StatusError(InternalError, ...)`) if the invariant is violated. This is the defence-in-depth fix olyasir requested in the second `[BUG]` comment in `ClassificationModel.cpp`: it makes the 12-byte stack-array `ggml_backend_tensor_get` read provably safe regardless of how the output tensor was constructed. This second check is not redundant with #1: it also catches a future accidental edit to the classifier wiring above (where the tail `classifier.3` linear is what determines the output element count), an upstream ggml change to how `mul_mat` shapes its result, or a GGUF that lacks the `mobilenet.num_classes` metadata key entirely and falls back to `kNumClasses` but ships mismatched FC weights. Local validation on win32-x64: - 15/15 C++ unit tests pass (BnEpsilonGuard, classification graph determinism, preprocessor suite -- they all exercise the validated load + build paths against the bundled FP16 GGUF, where `num_classes == 3` so neither check fires). - 14/14 JS integration tests pass, 140/140 asserts (no behaviour change for the supported model; new error paths are unreachable with the bundled weights). Files: packages/qvac-lib-infer-ggml-classification/addon/src/model-interface/MobileNetGraph.cpp Made-with: Cursor * QVAC-17481 fix(preprocess): pre-decode size check via stbi_info_from_memory (review #1727) Addresses jesusmb1995's review comment on tetherto/qvac#1727: > Could we check this before decoding? `stbi_info_from_memory()` would > let us reject oversized images / total pixel count before > `stbi_load_from_memory()` allocates Why it matters: `stbi_load_from_memory` allocates the full decoded RGB buffer (width * height * 3 bytes) before any caller-provided dimension limit is enforced. 
For a 16384x16384 image at the upper edge of `kMaxImageDimension`, that is ~768 MB of heap allocated before we see the dimension and reject -- enough to OOM a memory-constrained device or trigger an oversized free. `stbi_info_from_memory` parses only the image header (a few hundred bytes) and reports the dimensions cheaply, so we can reject oversized inputs up-front. The post-decode dimension check is kept as belt-and-braces in case `stbi_info` and `stbi_load` ever disagree (e.g. truncated streams that parse a valid header but fail mid-decode); it is a correctness check, not the primary OOM defence. Behaviour: - If `stbi_info` succeeds and reports dimensions over `kMaxImageDimension`, `decodeToRgb` throws `StatusError(InvalidArgument, ...)` with the actual reported size in the message, before any decode allocation runs. - If `stbi_info` fails (header could not be parsed), we fall through to `stbi_load_from_memory`. That path already throws with `stbi_failure_reason()` attached, which is a more user-actionable message than a generic "header bad" we would emit ourselves. File: packages/qvac-lib-infer-ggml-classification/addon/src/model-interface/ImagePreprocessor.cpp Validated locally on win32-x64: 14/14 JS integration tests pass. Made-with: Cursor * QVAC-17481 test(preprocess): expand ImagePreprocessor unit coverage (review #1727) Addresses jesusmb1995's review comment on tetherto/qvac#1727: > Could we add more unit coverage for ImagePreprocessor before merging? > preprocessor_test.cpp covers some happy paths, but a few public > functions/branches still look uncovered: > - decodeToRgb() success/failure paths are not tested directly. > - preprocessToTensor() is only covered for empty input; it should > also cover encoded JPEG/PNG success, raw RGB success, and > unsupported non-image input without dimensions. > - validateRawRgb() is missing empty buffer, zero width/height, and > over-kMaxImageDimension cases. > - normalizeToWhcn() should cover invalid input size. 
Adds the following PreprocessorTest cases (14 new tests, taking the suite from 10 to 24 -- all 29 cases across the addon's two C++ test binaries pass on win32-x64):

decodeToRgb:
- DecodeToRgbDecodesValidJpeg -- happy path against test/images/meal_1.jpg
- DecodeToRgbRejectsEmptyBuffer
- DecodeToRgbRejectsCorruptedBytes
- DecodeToRgbRejectsTruncatedJpeg

preprocessToTensor (full pipeline):
- PreprocessToTensorAcceptsEncodedJpeg -- JPEG happy path with finite-output check
- PreprocessToTensorAcceptsRawRgb -- raw RGB happy path with finite-output check
- PreprocessToTensorRejectsBmpWithoutDimensions
- PreprocessToTensorRejectsRawWithMissingDims

validateRawRgb edges:
- ValidateRawRgbRejectsEmptyBuffer
- ValidateRawRgbRejectsZeroWidth
- ValidateRawRgbRejectsZeroHeight
- ValidateRawRgbRejectsOverKMaxImageDimensionWidth
- ValidateRawRgbRejectsOverKMaxImageDimensionHeight

normalizeToWhcn:
- NormalizeToWhcnRejectsWrongInputSize

Adds a `readTestImage(name)` helper that walks up from the current binary location to find `test/images/<name>`, mirroring the `findWeightsPath()` helper already in classification_model_test.cpp. JPEG-using tests skip cleanly via GTEST_SKIP() if the image is not present, so the C++ test suite still passes when run from a packed tarball that does not include the test images.

File: packages/qvac-lib-infer-ggml-classification/test/unit/preprocessor_test.cpp

Made-with: Cursor

* QVAC-17481 refactor(model): flatten ClassificationModel::Impl pigeonhole (review #1727)

Addresses jesusmb1995's review comment on tetherto/qvac#1727:

> Why one extra level of indirection with `Impl`? Maybe style, but I
> see no strong benefit and it just scatters the code around and
> makes it harder to track. I would prefer a straightforward class
> where all these variables can be directly under
> `ClassificationModel` private variables.

The PIMPL was originally there to keep ggml types out of the public header.
In practice this header is only included by the addon's own `AddonJs.hpp`, which already pulls in the entire qvac-lib-inference-addon-cpp framework, so there is no header-fanout benefit from hiding ggml. Flattening the impl removes one level of heap indirection, lets all members be visible at a glance, and lets clang-tidy / IDE navigation jump straight to the field declarations. Changes: 1. addon/src/model-interface/ClassificationModel.hpp - Pull in `<ggml-backend.h>` and the local `MobileNetGraph.hpp` (which exposes `WeightsBundle` / `ComputeGraph` definitions used by the new direct members). - Replace `struct Impl;` forward declaration and `std::unique_ptr<Impl> impl_;` with the eight direct private members the Impl previously held: `modelPath_`, `backend_`, `weights_`, `compute_`, `labels_`, `numThreads_`, `loaded_`, `lastInferenceUs_`. Member ordering is documented in a comment: ggml requires every backend buffer to be released BEFORE the backend it was allocated on, and `~ClassificationModel` enforces that ordering explicitly with `compute_.reset(); weights_.reset();` before `ggml_backend_free(backend_)`. 2. addon/src/model-interface/ClassificationModel.cpp - Remove the `struct ClassificationModel::Impl { ... };` definition and the `std::make_unique<Impl>()` from the constructor body. - Replace every `impl_->X` with `X_` (34 references). No functional change. - Drop redundant `if (!impl_)` guards in `setNumThreads()`, `load()`, `runtimeStats()`, and `process()`. The class is non- copyable and non-movable (it carries a `std::mutex` member, which suppresses implicit move ctors/assignment), so `impl_` was always non-null between construction and destruction; the guards were dead code. Local validation on win32-x64: - `bare-make build` clean (warnings unchanged from before refactor; no new errors). - `npm run test:cpp` -- 29/29 tests pass (3 ClassificationModelTest + 24 PreprocessorTest + 1 BnEpsilonGuard + 1 architecture sanity). 
- `npm run test:integration` -- 14/14 tests pass, 140/140 asserts. Files: packages/qvac-lib-infer-ggml-classification/addon/src/model-interface/ClassificationModel.hpp packages/qvac-lib-infer-ggml-classification/addon/src/model-interface/ClassificationModel.cpp Made-with: Cursor * QVAC-17481 refactor(addon,binding): single-place arg validation in C++ AddonJs (review #1727) Addresses jesusmb1995's review comments on tetherto/qvac#1727: > Why normalizing here instead of just throwing at `AddonJs` and > having a central place where to do the validation? I had previous > conversations with Gianfranco (and Nidhin) on LLM we agreed it > makes sense to do parsing/validation at on place, namely at AddonJs > construction, and throw there if wrong/invalid arguments directly > at c++. > > For construction/config arguments, `createInstance()` should be the > place that parses and validates the JS values before building the > native model: model path, threads, and any other config should > either produce a valid C++ configuration or throw immediately > there. That keeps the JS wrapper thin and avoids having two > different sources of truth for what is valid. > > For per-call image arguments, the same principle applies at the > native job boundary before `ClassificationModel`: parse the JS > input once, construct an explicit validated `ClassifyInput`, and > then let the model/preprocessor operate on that clean shape. That > removes the duplicated JS normalization/magic-byte checks and > avoids relying on weak `0` sentinel values for "not provided". Changes: 1. addon/src/model-interface/ClassificationModel.hpp - Replace the four sentinel-zero fields (`width = 0`, `height = 0`, `channels = 0`, `topK = 0` overloaded as "not provided") with an explicit `std::optional<RawRgbDims>` member that captures the "is the input raw RGB or encoded?" decision in a type the compiler can check. 
- `topK = 0` stays only because it has a meaningful "no filter" interpretation; non-zero values are validated > 0 at the binding boundary. 2. addon/src/model-interface/ClassificationModel.cpp - Translate `optional<RawRgbDims>` -> the existing `(declaredWidth, declaredHeight, declaredChannels)` triplet consumed by `preprocess::preprocessToTensor`. The preprocessor's internal "0 means not-provided" convention is preserved (it is a private API; the JS-facing one is the explicit optional). 3. addon/src/addon/AddonJs.hpp - `createInstance` now validates: * `path` must be a non-empty string, * `config.threads` (when provided) must be a positive integer. These were previously not enforced; non-positive thread counts would have silently passed through to libggml and raw negatives would int-truncate. - `runJob` is now the single source of truth for per-call validation: * `content` rejection message rephrased to include the substring "required" so the JS test `t.exception.all(..., /required|null|undefined/i)` keeps passing without relying on a separate JS-side TypeError. * Dimension triplet enforcement: caller must provide either all of {width, height, channels} or none of them; partial shapes are rejected with an explicit message rather than leaking through as a buffer-size mismatch downstream. * Each dim is range-checked as int32_t before being committed to ClassifyInput's optional<RawRgbDims>, so a negative JS Number cannot wrap to ~4 billion via uint32_t cast and tunnel into validateRawRgb. * `topK` is range-checked > 0 if provided. 4. test/unit/classification_model_test.cpp - Migrate the three `input.width = ...; input.height = ...; input.channels = ...;` blocks to the new `input.rawRgb = qcc::RawRgbDims{...};` shape. No behavioural change. 5. index.js - Strip every JS-side validation helper that duplicated C++ work: `assertBuffer`, `normaliseDimensionOptions`, `isSupportedEncoded`, `startsWith`, `JPEG_MAGIC`, `PNG_MAGIC`. 
The classify() body now literally builds `{ type, content, [width, height, channels, topK] }` from the caller's arguments and forwards to the binding. - Lifecycle checks (`!this._addon || !this.state.configLoaded`) and the file-existence check in `load()` stay in JS: * lifecycle is a JS-managed state, not a value-shape question; * the existence-check delivers a more actionable error message ("MobileNet GGUF weights not found at: <path>") than letting the load reach C++ and throw "Failed to open GGUF file: <path>" downstream. - Module-level comment documents the JS-as-thin-pass-through contract so a future contributor cannot re-introduce the duplicated validation by mistake. Local validation on win32-x64: - `bare-make build` clean. - `npm run test:cpp` -- 29/29 (incl. the migrated raw-RGB ClassificationModelTest cases). - `npm run lint` -- clean. - `npm run test:integration` -- 14/14 tests, 140/140 asserts. All existing brittle regex matchers in `error-cases.test.js` (`/required|null|undefined/i`, `/empty/i`, `/format|invalid/i`, `/decode|jpeg|invalid/i`, `/match|size|width|height|raw/i`, `/format|jpeg|png|bmp/i`, `/not loaded|load\(\)/i`, `/not loaded|destroyed|state/i`) match the new C++-issued error messages, so no test regex needed updating. Files: packages/qvac-lib-infer-ggml-classification/addon/src/addon/AddonJs.hpp packages/qvac-lib-infer-ggml-classification/addon/src/model-interface/ClassificationModel.hpp packages/qvac-lib-infer-ggml-classification/addon/src/model-interface/ClassificationModel.cpp packages/qvac-lib-infer-ggml-classification/test/unit/classification_model_test.cpp packages/qvac-lib-infer-ggml-classification/index.js Made-with: Cursor * QVAC-17481 chore(test,docs): post-sync audit follow-ups (consistency + uniform url strip + readme) Picks up the lower-risk consistency / correctness items from the post-sync self-audit. 
None of these change observable behaviour; they remove duplication and small footguns that would otherwise surface as drift in future maintenance. 1. test/integration/utils.js -- single source of truth for the mobile asset-key heuristic + uniform `file://` strip. - Extract `_resolveMobileAsset(filename)` from the two duplicate-by-design loops in `resolveModelPath()` and `_resolveImagePath()`. Both used the same four-element candidate-key array (`../../testAssets/${name}`, `../mobile/testAssets/${name}`, `testAssets/${name}`, `../testAssets/${name}`); future framework key-shape changes now land in one place instead of being silently inconsistent. - Extract `_stripFileUrlPrefix(mapped)` and switch from `mapped.slice('file://'.length)` to `mapped.replace(/^file:\/\//, '')`. The slice version leaves a stray leading `/` if the harness ever returns a triple-slash `file:///abs/...` URL (harmless on POSIX-mobile, malformed on a hypothetical Windows-mobile target). The regex strip is uniformly correct across both shapes. - Add `makeClassifier(overrides)` -- the standard test-instance factory. Centralises model-path + logger wiring so any future constructor-arg change in the addon lands in one place instead of N inline `new ImageClassifier(...)` callsites. 2. test/integration/classify.test.js + error-cases.test.js -- adopt the shared factory. - classify.test.js drops the inline `new ImageClassifier({ modelPath: resolveModelPath(), logger: createLogger() })` (4 callsites) in favour of `makeClassifier()`. Imports trimmed accordingly: drops `ImageClassifier`, `createLogger`, `resolveModelPath` from the destructure (unused after refactor; standardjs would have flagged them anyway). - error-cases.test.js drops its local `makeClassifier()` (which was a duplicate of what now lives in utils.js) and imports the shared one. Net: -1 module-level function. 3. README.md -- fix the `**threads**` markdown bullet. 
The line `- \`**threads**\` -- ...` wraps the bold markers in backticks, which renders the asterisks literally inside an inline-code span (`**threads**` instead of bold **threads**). Bare-renderable replacement: `- **\`threads\`** -- ...` reads as bold inline-code, matching the intent of the surrounding bullets. This was a pre-existing bug noted as "out-of-scope" in the line-reflow pass but is trivial to fix. Local validation on win32-x64: - `npm run lint` clean. - `npm run test:cpp` -- 29/29 (no behavioural change, just end-to-end smoke that the test-utils refactor did not break the C++ harness paths). - `npm run test:integration` -- 14/14, 140/140 asserts (run twice to confirm; one in-between-test SIGSEGV observed on the first run is the known upstream `OutputCallBackJs` UAF the hack branch deliberately leaves un-papered-over, not caused by this commit). Files: packages/qvac-lib-infer-ggml-classification/test/integration/utils.js packages/qvac-lib-infer-ggml-classification/test/integration/classify.test.js packages/qvac-lib-infer-ggml-classification/test/integration/error-cases.test.js packages/qvac-lib-infer-ggml-classification/README.md Made-with: Cursor * QVAC-17481 chore: rename addon directory to packages/classification-ggml Aligns the addon's directory and CI-workflow filenames with the published package name (`@qvac/classification-ggml`) so that the folder and the npm scope read consistently. 
Per a reviewer-style naming convention request: Package name: @qvac/classification-ggml Addon folder: classification-ggml Renames (53 files via `git mv`, all rename detection clean -- 31 insertions / 31 deletions across 54 files): packages/qvac-lib-infer-ggml-classification/ -> packages/classification-ggml/ .github/workflows/integration-mobile-test-qvac-lib-infer-ggml-classification.yml -> .github/workflows/integration-mobile-test-classification-ggml.yml .github/workflows/integration-test-qvac-lib-infer-ggml-classification.yml -> .github/workflows/integration-test-classification-ggml.yml .github/workflows/prebuilds-qvac-lib-infer-ggml-classification.yml -> .github/workflows/prebuilds-classification-ggml.yml In-file text updates (paths only -- no functional change): - All four workflows (`integration-mobile-test-classification-ggml.yml`, `integration-test-classification-ggml.yml`, `prebuilds-classification-ggml.yml`, plus the hack-branch `on-pr-qvac-lib-infer-llamacpp-llm.yml`) now reference the new `packages/classification-ggml/**` path filter, `PKG_DIR=packages/classification-ggml` env, the renamed sibling workflow filenames, and the new `addon/packages/classification-ggml` `ADDON_WORKDIR` for the mobile harness. - `packages/classification-ggml/CMakeLists.txt` -- `project(...)`, `add_bare_module(...)`, and every `${...}` target reference renamed to `classification-ggml`. The bare module's output filename (`qvac__classification-ggml.bare`) is unchanged because bare derives it from `package.json` `name` (`@qvac/classification-ggml`), not from the CMake project name. - `packages/classification-ggml/package.json` -- repository.directory, homepage URL. - `packages/classification-ggml/README.md`, `index.js`, and `docs/onnx-to-gguf-conversion.md` -- doc paths. Deliberately NOT renamed (out of scope -- code-level identifiers, not file paths): - C++ namespace `qvac_lib_infer_ggml_classification` (8 files). 
Other addons in this monorepo do NOT tie their C++ namespace to the folder name (e.g. `qvac::ttslib::lavasr` lives under `packages/qvac-lib-infer-onnx-tts/`), so the namespace is a code-style choice rather than a path-consistency one. Can be folded into a follow-up if reviewers want full consistency there too. Local validation on win32-x64 (in the renamed `packages/classification-ggml/` directory): - `npm install` clean. - `bare-make generate` + `bare-make build` + `bare-make install` succeed; `qvac__classification-ggml.bare` produced under `prebuilds/win32-x64/` (filename unchanged). - `npm run lint` clean. - `npm run test:cpp` 29/29. - `npm run test:integration` 14/14, 140/140 asserts (perf-report correctly written under `packages/classification-ggml/test/results/`). Made-with: Cursor * QVAC-17481 fix(addon,test): align upstream-bug workarounds with monorepo convention Two upstream issues block the addon's CI without local mitigations. Both are paper-trailed in detail in `remote_logs/issues_report.md` (gitignored, internal). Inline comments at the workaround sites are kept short to match how other addons in the monorepo handle the same races. 1. `OutputCallBackJs` use-after-free race ---------------------------------------- `qvac_lib_inference_addon_cpp::~OutputCallBackJs` deletes JS refs synchronously while `uv_close` on its async handle is asynchronous (queue/OutputCallbackJs.hpp:48-58); a `uv_async_send` queued just before destruction fires against dead refs and crashes in `js_open_handle_scope`. Reproduced as SIGSEGV (linux-x64/-arm64, darwin-arm64), `Fatal signal 11` (Android logcat), and `EXC_BAD_ACCESS @ 0x1a0` (iOS crash report) across rapid create/ destroy cycles. Other addons in this monorepo paper over the same race in their integration suites with sleep-around-unload, e.g. 
ocr-onnx/test/integration/lifecycle.test.js:56,85,115 ocr-onnx/test/integration/full-ocr-suite.test.js:107,115,123 qvac-lib-infer-llamacpp-llm/test/integration/sliding-context.test.js:163,355 We adopt the same pattern via `cleanupClassifier()` in `test/integration/utils.js` (two-phase: 500-1000ms pre-unload yield + 2000-3000ms post-unload drain). The pre-unload yield is required for our addon specifically because `await classify()` resolves on the first `Output` event while the worker thread keeps queuing follow-up events (`RuntimeStats`, `JobCompleted`); without it the follow-ups land DURING `~OutputCallBackJs`. Every classify() call in the integration tests was migrated to `cleanupClassifier()`. The removed local C++ wrapper (`DeferredOutputCallBackJs`) was a real lifetime fix but kept us out of step with how the rest of the monorepo handles this; once upstream is patched the sleeps drop everywhere at once. 2. Win32-x64 first-`js_create_double` returns 0.0 ---------------------------------------------- The very first `js_create_double` call in the process returns 0.0 on the Azure GitHub-hosted `windows-2022` runner (clang-cl + bare-runtime + V8). Subsequent calls in the same handle scope are correct. No local Windows repro; only the CI runner image is affected. Other addons accidentally dodge the symptom because their first emitted number is naturally 0 (whisper/parakeet `segment.start`), they assert only `typeof === 'number'` / `!isNaN` (llamacpp-llm stats), they never assert the value (ocr-onnx bbox coords), or they emit no numbers at all (lib-infer-diffusion / llamacpp-embed). Our 3-class softmax sort + sum-to-1 assertions catch the corruption immediately, so no test-side workaround is possible. Local C++ "burn one" workaround in `JsClassifyOutputHandler`'s lambda preamble: a throwaway `js_create_double(env, 0.0, &dummy)` call consumes the broken first slot so the per-element `Number::create` calls below produce the correct value at index 0. 
Cost is one ephemeral js_number per classify() call. Other follow-ups in this commit (none disturb code paths above): - `addon.js` lifecycle: `unload()` no longer waits on the pending-job promise. The post-unload sleep in `cleanupClassifier` covers the same window, so `unload()` becomes a thin pass-through (matches what every other addon in the monorepo does). - Top-of-file workaround comment in `AddonJs.hpp` consolidated to a 2-line note at the burn-one site (matches the comment density other addons use; full root cause in the report). - `cleanupClassifier` doc trimmed to 3 lines pointing at the report. Local validation on win32-x64: - bare-make build clean - npm run lint clean - npm run test:cpp 29/29 - npm run test:integration 14/14 + 140/140 asserts Files: packages/classification-ggml/addon.js packages/classification-ggml/addon/src/addon/AddonJs.hpp packages/classification-ggml/addon/src/js-interface/binding.cpp packages/classification-ggml/test/integration/classify.test.js packages/classification-ggml/test/integration/error-cases.test.js packages/classification-ggml/test/integration/utils.js Made-with: Cursor * QVAC-17481 chore: adopt upstream WA fixes from PR #1825 Bumps qvac-lib-inference-addon-cpp from 1.1.5#1 to 1.1.6 (the version shipped by PR #1825) and removes the two local workarounds it was brought in to dodge: - Win32 burn-one js_create_double in JsClassifyOutputHandler is gone; upstream's JsUtils::Number::createDouble now applies a process-wide burn-once guard via static-init. - Two-phase sleep around unload() in cleanupClassifier is gone; upstream's ~OutputCallBackJs now defers js_delete_reference into the uv_close callback via a heap-owned State. Local Win32 validation: 14/14 integration tests + 29/29 C++ unit tests pass; in particular the index-0 marshalling assertions and the back-to-back load/unload cycle test that previously SIGSEGV'd both pass without their prior workarounds. 
Resolves T1 + T10 from the audit; details in remote_logs/issues_report.md.

Made-with: Cursor

* QVAC-17481 chore[api]: align lifecycle with llamacpp-llm pattern

Re-shape the JS layer so request orchestration mirrors the LLM addon (closes T5-T9 from PR #1727 review):

- addon.js becomes a thin C++ binding wrapper (mirrors LlamaInterface): constructor takes `(binding, configurationParams, outputCb, logger)`, exposes `activate()` / `runJob()` / `cancel()` / `unload()`. The bespoke `_pending` Promise + `_outputCallback` are gone; export a shared `mapAddonEvent(rawEvent, rawData, rawError)` instead.
- index.js becomes the orchestration layer (mirrors LlmLlamacpp): one `exclusiveRunQueue()` serialises load/classify/unload, one `createJobHandler()` owns the active QvacResponse, and the output callback fans events through `_handleAddonOutputEvent`.
- load() now does try/catch around `activate()` and best-effort `_addon.unload()` on failure so a partial init never leaves a zombie native handle (T6).
- classify() resolves on the terminal stats event rather than the first ClassifyOutput, eliminating the orphan-callback risk that motivated the `_pending` drain on the previous design (T7, T8). Public shape unchanged: still `Promise<Array<{label,confidence}>>`.
- unload() runs through the same queue, calls native `cancel()` on in-flight work, fails the active JS request with `Model was unloaded`, then destroys the native handle (T9).

mapAddonEvent is keyed on payload shape (Array → Output, plain object → JobEnded terminal) because the upstream JobRunner emits the stats trailer with a raw `std::vector<std::pair<...>>` RTTI name rather than a literal `*JobEnded` event. Documented inline.

Local validation: 14/14 integration + 140/140 asserts in 2.8s (down from 8.2s in Group A; the LLM-style cancel/unload is much faster than the prior drain-then-destroy pattern); 29/29 C++ unit tests; standard lint clean.
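The payload-shape dispatch described for `mapAddonEvent` can be sketched roughly as follows. Only the function name and its `(rawEvent, rawData, rawError)` parameters come from the commit message; the `MappedEvent` return shape and the error-first branch are assumptions made for illustration, not the addon's actual code.

```typescript
// Hypothetical result shape: the commit only names the Output and JobEnded kinds.
type MappedEvent =
  | { type: "Output"; data: unknown[] }
  | { type: "JobEnded"; stats: Record<string, unknown> }
  | { type: "Error"; error: unknown }
  | null;

function mapAddonEvent(
  rawEvent: string,
  rawData: unknown,
  rawError: unknown,
): MappedEvent {
  // Assumed: a non-null error takes precedence over any payload.
  if (rawError != null) return { type: "Error", error: rawError };

  // Arrays are treated as classification output, whatever the native event name is.
  if (Array.isArray(rawData)) return { type: "Output", data: rawData };

  // A plain object is treated as the terminal stats trailer. The native event
  // name is an RTTI string in that case, so rawEvent is deliberately not inspected.
  if (typeof rawData === "object" && rawData !== null) {
    return { type: "JobEnded", stats: rawData as Record<string, unknown> };
  }

  // Anything else is ignored in this sketch.
  return null;
}
```

Keying on the payload rather than the event name keeps the JS side independent of how the native layer happens to spell its event identifiers.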
Made-with: Cursor * QVAC-17481 infra: add canonical on-pr + on-pr-close workflows for classification-ggml Adds the two missing top-level workflow files so the addon now has the full 5-file layout used by every other modern addon in the monorepo (`decoder-audio`, `diffusion-cpp`, `ocr-onnx`, `bci-whispercpp`): - `on-pr-classification-ggml.yml` -- canonical PR trigger router. authorize -> changes -> sanity / ts-checks / cpp-lint / prebuild -> integration / mobile -> merge-guard. Path filters scope to `packages/classification-ggml/**` and the addon's own workflow files. - `on-pr-close-classification-ggml.yml` -- mirror of `on-pr-close-decoder-audio.yml`. Triggers `public-delete-npm-versions` with `packages: classification-ggml` to clean up per-PR npm pre-releases on PR close. Closes T11 from PR #1727 review (olyasir: "rename in same format as other pipelines"). The legacy-named `on-pr-qvac-lib-infer-ggml-classification.yml` on the fork PR-1 branch will be removed at sync-to-PR-1 time. The hack-branch dispatch swap (`on-pr-qvac-lib-infer-llamacpp-llm.yml` hijacked + `*-temp.yml` parking) is intentionally left untouched here: new workflows aren't dispatchable from the GitHub Actions UI until they exist on `main`, so the swap is still our only working dispatch path for hack-branch CI runs. Validation: both files parse with `yaml.safe_load`; every workflow / composite-action reference resolves on disk. Co-authored-by: Cursor <cursoragent@cursor.com> * QVAC-17481 doc: trim verbose AI-style comments across the addon Closes T2/T3/T4 from PR #1727 (jesusmb1995: "Please remove this comment, its unnecessary... LLM's are too verbose"), and applies the same four cleanup rules across the rest of …
…(re-land) (tetherto#2023) * QVAC-18612 infra: gate every secret-bearing workflow with label-gate Re-land of the label-gate fan-out after PR tetherto#1997 was reverted on 2026-05-13 (commit 919850c). Re-architected to fix the caller-cap permissions violation that broke 30+ on-pr-* workflows the moment a verified label was applied. Architecture: caller-gates-callee - Reusable workflows (workflow_call invokees) are NOT modified. PR tetherto#1997 embedded a label-gate job inside each reusable callee with `pull-requests: write`, which violates the caller-cap rule for any caller that scopes the call to `pull-requests: read|none`. GitHub enforces this at parse time; the affected workflow files won't even load. - Callers get a label-gate job at the top of `jobs:` with `pull-requests: write` (which never crosses a caller-cap boundary). Each `uses:` invocation that targets a secret-bearing reusable, plus every standalone secret-bearing job in the same workflow, gains `needs: [..., label-gate]` and an `if:` prepended with `needs.label-gate.outputs.authorised == 'true'`. - When the gate denies on a `uses:` job, the entire reusable invocation is skipped — the callee runner never starts, no secrets are exposed, and no caller-cap validation can fire because the workflow_call payload is never sent. The label-gate action checks out from the default branch via sparse checkout, which is the same Tanstack-class supply-chain mitigation landed in the canary fix on PR tetherto#1971 / tetherto#1973. 
Workflow-by-workflow stats: - 59 caller workflows migrated (label-gate + needs/if updates) - 56 reusable callees, exempt workflows, and no-secret workflows intentionally left UNCHANGED on disk - Pre-existing `authorize-pr` peer jobs preserved (belt-and-suspenders; removal is a follow-up after a soak period) - approval-worker.yml and approval-check-worker.yml exempt (gating them creates a deadlock; we explicitly do not touch them) Pre-flight verification before push: - `python3 .github/scripts/audit-workflow-permissions.py` -> 0 hard violations across 162 caller-callee edges (vs. 21 hard violations after the naive PR tetherto#1997-style migration; the audit was added in the previous commit precisely to catch this regression class) - `actionlint .github/workflows/*.{yml,yaml}` reports identical issue counts before and after the migration: 1832 shellcheck (pre-existing), 9 expression (pre-existing), 5 action (down from 7 pre-existing) End-to-end validated in the qvac-internal sandbox with real org teams: - tetherto/qvac-internal#12 (caller-gates-callee + standalone gating against the actual qvac-internal-{dev,merge,release} teams) - Olutest/qvac-tests (public mirror; same harness, single-user allowlist) - Validation matrix: 9/9 scenarios pass, including the strip-on- non-trusted-apply case Co-authored-by: Cursor <cursoragent@cursor.com> Signed-off-by: Proletter <40578159+Proletter@users.noreply.github.com> * QVAC-18612 infra: gate on-pr-close-* workflows with label-gate Closes a release-env exposure surfaced when auditing tetherto#2023: public-delete-npm-versions.yml (environment: release, packages: write) is invoked by 12 on-pr-close-* workflows, but only embed-llamacpp had label-gate. The other 10 fire on `pull_request: types: [closed]` and reach the release env without authorisation. This is currently held back only by the manual approval on the release environment. Once that approval is dropped (the goal of QVAC-18612), the label-gate becomes the sole control. 
This commit makes label-gate that control everywhere. Pattern is identical to on-pr-close-embed-llamacpp.yml (already on this branch): inline label-gate job (caller side) + needs/if on the delete-npm-versions-trigger reusable call. Reusable callee (public-delete-npm-versions.yml) is unchanged. on-pr-close-translation-nmtcpp.yml deliberately not modified - it has only workflow_dispatch (no pull_request trigger) and is intrinsically gated by repo-write access.
…rto#2026) * fix(registry-server): harden HF downloads against socket drops Co-authored-by: Cursor <cursoragent@cursor.com> * fix(registry-server): gate HF retries, drop unhandledRejection swallow, pin hf-hub floor --------- Co-authored-by: Cursor <cursoragent@cursor.com>
Bump @qvac/cli to 0.4.0 and add the v0.4.0 changelog set.

Includes all 5 cli-scoped PRs landed on release-cli-0.4.0 since cli-v0.3.0:

- QVAC-18677 feat[api]: qvac verify deps (tetherto#1969)
- QVAC-18717 feat[api]: Qwen3.5 / Gemma4 tool-call dialects + reasoning_budget (tetherto#1974)
- QVAC-18678 feat[api]: qvac verify bundle (tetherto#1984)
- QVAC-18730 feat[api]: POST /v1/images/generations on qvac serve (tetherto#2008)
- chore: consolidate PR templates and hide style note in HTML comment (tetherto#1924)

PR tetherto#1924's title lacked a ticket or [notask], so the changelog generator's strict validator dropped it. It is added manually under the Chores section to keep the changelog truthful to what shipped on release-cli-0.4.0.

(cherry picked from commit 22462c8)
…penAI adapter (tetherto#2031)

* feat[api]: add POST /v1/audio/translations to qvac serve OpenAI adapter

* test[api]: add e2e + flatten whisper translate config

- e2e.bats: cover POST /v1/audio/translations with WHISPER_EN_TINY_Q8_0 alias, assert it rejects transcription-only and chat aliases, and that DELETE unloads both whisper aliases.
- serve/config.ts: flatten whisperConfig into top-level modelConfig keys for whispercpp-audio-translation (whisper loadModel expects flat fields, not nested whisperConfig); force translate=true and warn otherwise.
- config.test.ts: assert flat translate/language/n_threads and no whisperConfig key; cover top-level translate=false override.
- docs/serve-openai.md: clarify src accepts SDK model constants and show the flat config shape.

* fix[api]: allow type override on constant serve.models entries

The virtual `whispercpp-audio-translation` type previously required the explicit `{ type, src }` shape, but `src` is passed to the SDK verbatim so an SDK constant name like `WHISPER_EN_TINY_Q8_0` failed with MODEL_NOT_FOUND. Allow constant entries to carry an optional `type` override instead, so `{ "model": "WHISPER_EN_TINY_Q8_0", "type": "whispercpp-audio-translation" }` resolves the constant via the registry and then runs through the virtual-type mapping (`whispercpp-transcription` + audio-translation + translate=true).

- serve/config.ts: ConstantModelEntry gains optional `type`; resolveModelConstant routes the override through resolveExplicitServeModel. Explicit `{ type, src }` branch is unchanged (src is still a literal modelSrc).
- config.test.ts: exports + covers natural-addon resolution, the whisper → audio-translation override, and unknown-constant errors.
- e2e.bats: test-whisper-translate now uses the model+type shape.
- docs/serve-openai.md: recommend the model+type shorthand; note that explicit src is for non-registry weights only.
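The flattening step described for `serve/config.ts` might look something like the sketch below. The helper name `flattenTranslateConfig`, the warning text, and the precedence of top-level keys over nested ones are all assumptions; the commit only states that `whisperConfig` is flattened, that `translate` is forced to `true`, and that a warning is emitted otherwise.

```typescript
type ModelConfig = Record<string, unknown>;

// Hypothetical helper; the real logic lives in serve/config.ts and may differ.
function flattenTranslateConfig(
  input: ModelConfig,
  warn: (msg: string) => void = console.warn,
): ModelConfig {
  const { whisperConfig, ...rest } = input;
  const nested: ModelConfig =
    typeof whisperConfig === "object" && whisperConfig !== null
      ? (whisperConfig as ModelConfig)
      : {};

  // Lift nested keys to the top level; explicit top-level keys win (assumed precedence).
  const flat: ModelConfig = { ...nested, ...rest };

  // The audio-translation type only makes sense with translate enabled.
  if (flat.translate === false) {
    warn("translate=false is ignored for whispercpp-audio-translation");
  }
  flat.translate = true;
  return flat;
}
```

With this shape, a config such as `{ whisperConfig: { language: "en" }, translate: false }` would come out flat, without a `whisperConfig` key, and with `translate` set to `true`.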
…structured logging (tetherto#2036)

Lands the three M3a framework primitives so subsequent handler migration sub-PRs (M3b/M3c) have a single, declarative contract to slot into:

1. `PluginHandlerDefinition.cancel: { scope: "request" | "model" | "none"; hard?: boolean }`
   - Added to `schemas/plugin.ts` (`PluginHandlerCancel`, `PluginHandlerCancelScope`) + runtime schema validation on `pluginHandlerDefinitionRuntimeSchema`.
   - Declared on every built-in plugin manifest (llamacpp-completion, llamacpp-embedding, whispercpp/parakeet-transcription, nmtcpp-translation, onnx-tts/ocr, sdcpp-generation). The truth-table assignment is pinned by `test/unit/plugin-cancel-capability.test.ts`.
2. `RequestRegistry.policy({ kind, oneAtATimePerModel })`:
   - Admission control runs before scope/controller allocation in `begin(...)`. Rejecting a request raises `RequestRejectedByPolicyError` (52420) carrying `requestId`, `kind`, `modelId`, `reason`, and is re-exported from `@qvac/sdk` for `instanceof` checks.
   - The worker singleton installs `{ kind: "completion", oneAtATimePerModel: true }` on first access, matching the llama.cpp addon's single-decode-loop reality.
3. Structured `[request-lifecycle]` emits at begin/cancel/end:
   - Fixed log shape `requestId=<id> kind=<kind> modelId=<id|"-"> state=<state>` so `grep "requestId=abc"` returns the full per-request story chronologically.
   - `withRequestContext(logger, ctx)` extends the same prefix to handler-level emits; threaded through `completion(...)` and into `KvCacheSession` so KV-cache turn lifecycle shares the request's correlation tuple.
   - Single-cancel-emit guard suppresses duplicate cancel lines when `cancel({ requestId })` is invoked twice.

Verification (from `packages/sdk/`):

- `bun run lint` (eslint + tsc): clean.
- `bun run test:unit`: 49 files / all asserts pass, including the 4 M3a test files (`plugin-cancel-capability` 7/7, `request-registry` 41/41, `request-lifecycle-logging` 6/6, `with-request-context` 5/5).

Cursor rules updated alongside the code:

- `request-lifecycle-primitives.mdc`: cancel-capability declaration table, concurrency-policy contract, structured-logging shape, error-codes table now carries 52420.
- `docs/request-lifecycle-system.mdc`: migration-roadmap table reflects M3a shipped; three new FAQ entries explain *why* each primitive was chosen; implementation files table covers the new modules.
- `error-handling.mdc`: 52420 row added.

This PR is framework-only: no handler is migrated onto `registry.begin(...)` here beyond the completion handler that landed in M2. Handler migrations follow in M3b (inference handlers), M3c (non-inference / addon handlers), and M3d (CLI cancel bridge + cancelHandler retirement).
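To make the admission-control and logging primitives concrete, here is a deliberately simplified sketch. `MiniRequestRegistry`, its method signatures, and the `"one-at-a-time-per-model"` reason string are hypothetical stand-ins; only the error name, the code 52420, the policy fields, and the log-line fields are taken from the commit message, and the real `RequestRegistry` is certainly richer than this.

```typescript
// Error name and code come from the commit message; the class body is a stand-in.
class RequestRejectedByPolicyError extends Error {
  readonly code = 52420;
  constructor(
    readonly requestId: string,
    readonly kind: string,
    readonly modelId: string,
    readonly reason: string,
  ) {
    super(`request ${requestId} rejected by policy: ${reason}`);
    this.name = "RequestRejectedByPolicyError";
  }
}

interface Policy {
  kind: string;
  oneAtATimePerModel: boolean;
}

class MiniRequestRegistry {
  private policies = new Map<string, Policy>();
  private active = new Map<string, { kind: string; modelId: string }>();
  readonly log: string[] = [];

  policy(p: Policy): void {
    this.policies.set(p.kind, p);
  }

  begin(requestId: string, kind: string, modelId: string): void {
    // Admission control runs before any per-request state is allocated.
    const p = this.policies.get(kind);
    if (p !== undefined && p.oneAtATimePerModel) {
      const busy = Array.from(this.active.values()).some(
        (r) => r.kind === kind && r.modelId === modelId,
      );
      if (busy) {
        throw new RequestRejectedByPolicyError(
          requestId, kind, modelId, "one-at-a-time-per-model",
        );
      }
    }
    this.active.set(requestId, { kind, modelId });
    this.emit(requestId, kind, modelId, "begin");
  }

  end(requestId: string): void {
    const r = this.active.get(requestId);
    if (r === undefined) return;
    this.active.delete(requestId);
    this.emit(requestId, r.kind, r.modelId, "end");
  }

  // One fixed line shape per transition, so a grep on requestId tells the full story.
  private emit(requestId: string, kind: string, modelId: string, state: string): void {
    this.log.push(
      `[request-lifecycle] requestId=${requestId} kind=${kind} modelId=${modelId} state=${state}`,
    );
  }
}
```

Rejecting before allocation means a refused request leaves no state behind to clean up, which is presumably why the commit places the check ahead of scope/controller setup.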
… test workflow (tetherto#1669) * chore[notask]: SHA-pin actions/upload-artifact in llamacpp-llm mobile test Replace bare @v4 tag with pinned SHA for actions/upload-artifact in the Upload Device Farm Logs step of integration-mobile-test-qvac-lib-infer-llamacpp-llm.yml, consistent with the project's supply-chain security policy. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * chore[notask]: add # v7.0.0 trailing comment to upload-artifact SHA pin ---------
…ification into SDK

- Add ggml-classification plugin with bundled model support (skipPrimaryModelPathValidation)
- Export classify() client API, encoding images as base64 for RPC transport
- New schemas: classificationConfigSchema, classifyRequestSchema, classifyResponseSchema
- Register ggmlClassification ModelType with "classification" alias and addon map entry
- Add PLUGIN_CLASSIFICATION and ADDON_CLASSIFICATION constants to SDK_DEFAULT_PLUGINS
- loadModel supports optional modelSrc (classification ships bundled GGUF weights)
- Wire classifyRequest/Response into common RPC request/response unions
- Add "classification" to ModelInfo.addon and model registry enums

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
"changelog:generate": "node ../../scripts/sdk/generate-changelog-sdk-pod.cjs --package=sdk && prettier --write changelog"
},
"dependencies": {
  "@qvac/classification-ggml": "file:../classification-ggml",
@DmitryMalishev you need to publish the package first and refer to it here
@qvac/classification-ggml@0.1.0 is now published on the public npm registry (https://www.npmjs.com/package/@qvac/classification-ggml), so the SDK no longer needs the `file:../classification-ggml` workspace reference. Switching to `^0.1.0` to match every other addon dependency in this manifest.
})
.strict()
.transform((data) => ({
  type: "loadModel" as const,
This transforms classification load options into a loadModel request with modelType: ModelType.ggmlClassification, but the server-facing loadModelSrcRequestSchema below does not include a loadClassificationModelRequestSchema arm.
Repro path:
- `await loadModel({ modelType: "classification" })` parses here and produces `{ type: "loadModel", modelType: "ggml-classification", modelSrc: "", ... }`.
- `rpc-client` then runs `requestSchema.parse(request)` before sending.
- `requestSchema` delegates to `loadModelRequestSchema -> loadModelSrcRequestSchema`, whose union currently includes llm/whisper/parakeet/embedding/nmt/tts/ocr/diffusion/custom only.
- Because `ggml-classification` is now a built-in model type, the custom-plugin arm also rejects it via `!builtInModelTypes.has(val)`.
Please add the classification request schema to loadModelSrcRequestSchema as well, and ideally add a schema test for loading without modelSrc so this path stays covered.
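The rejection described above can be modelled without the real schemas. The sketch below is a dependency-free approximation: `makeLoadModelSrcParser` and the model-type strings in the sets are placeholders, and only `builtInModelTypes`, `ggml-classification`, and the two-arm structure come from the review comment.

```typescript
// Placeholder names; the SDK's real model-type strings are not shown in the review.
const builtInModelTypes = new Set<string>([
  "llm", "whisper", "parakeet", "embedding", "nmt", "tts", "ocr", "diffusion",
  "ggml-classification",
]);

type LoadRequest = { type: "loadModel"; modelType: string; modelSrc?: string };

// Builds a parser that behaves like a union of typed arms plus a custom-plugin arm.
function makeLoadModelSrcParser(typedArms: string[]) {
  const arms = new Set(typedArms);
  return (req: LoadRequest): LoadRequest => {
    // Arm 1: one dedicated schema per built-in model type.
    if (arms.has(req.modelType)) return req;
    // Arm 2: custom plugins, which must not collide with a built-in type.
    if (!builtInModelTypes.has(req.modelType)) return req;
    throw new Error(`no schema arm accepts modelType "${req.modelType}"`);
  };
}

const existingArms = ["llm", "whisper", "parakeet", "embedding", "nmt", "tts", "ocr", "diffusion"];
const parseBefore = makeLoadModelSrcParser(existingArms);
const parseAfter = makeLoadModelSrcParser([...existingArms, "ggml-classification"]);
```

In this model `parseBefore` throws for a `ggml-classification` request because it falls between the two arms, while `parseAfter` accepts it; a schema test for loading without `modelSrc` would exercise exactly that gap.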
  ...(params.channels !== undefined && { channels: params.channels }),
};

for await (const response of streamRpc(request)) {
The client sends a top-level RPC request with type: "classify", but the server registry does not register a classify handler.
Repro path:
- `classify()` calls `streamRpc(request)` with `type: "classify"`.
- `requestSchema` accepts the new request because `classifyRequestSchema` was added to `schemas/common.ts`.
- `server/rpc/handle-request.ts` then looks up `registry[request.type]`.
- `server/rpc/handler-registry.ts` has no `classify` entry, so the request fails with `RPCUnknownRequestTypeError` before it can reach `classificationPlugin.handlers.classify`.
Please add a server RPC handler that dispatches classify to the plugin handler, and wire it into both the handler exports/map and handler-registry.ts.
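A stripped-down model of the lookup failure, and of the fix, might look like this. The `Map`-based registry, `makeDispatcher`, and the synchronous handler signature are simplifications invented for the sketch (the real handlers stream responses); only `RPCUnknownRequestTypeError` and the `classify` request type are taken from the review.

```typescript
type RpcRequest = { type: string; [key: string]: unknown };
type Handler = (req: RpcRequest) => unknown;

// Same name as the SDK error; the body here is a stand-in.
class RPCUnknownRequestTypeError extends Error {
  constructor(requestType: string) {
    super(`unknown RPC request type: ${requestType}`);
    this.name = "RPCUnknownRequestTypeError";
  }
}

// Looks up the handler by request type, as handle-request.ts is described as doing.
function makeDispatcher(registry: Map<string, Handler>) {
  return (req: RpcRequest): unknown => {
    const handler = registry.get(req.type);
    if (handler === undefined) throw new RPCUnknownRequestTypeError(req.type);
    return handler(req);
  };
}

// Stand-in for a handler that would forward to classificationPlugin.handlers.classify.
const classifyHandler: Handler = () => [{ label: "placeholder", confidence: 1 }];

const withoutClassify = makeDispatcher(new Map<string, Handler>());
const withClassify = makeDispatcher(
  new Map<string, Handler>([["classify", classifyHandler]]),
);
```

Schema validation passing on the client says nothing about whether the server can route the request, so the registry entry has to be added alongside the schema.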
loadConfigSchema: classificationConfigSchema,
skipPrimaryModelPathValidation: true,

createModel(params: CreateModelParams): PluginModelResult {
classificationConfigSchema exposes topK as load-time model config, but this createModel() implementation never passes it into ImageClassifier or stores it for later. The only topK that affects inference is request.topK in the classify operation, so loadModel({ modelType: "classification", modelConfig: { topK: 3 } }) is accepted but silently ignored.
Please either remove topK from the load config schema, or apply it consistently as the default classification option when a request does not provide its own topK.
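If the second option is taken, the precedence could be as simple as the sketch below. `resolveTopK` and `applyTopK` are hypothetical helpers, and whether truncation happens in JS or inside the native classifier is not stated in this thread; the sketch only illustrates "request value first, load-time value as the fallback".

```typescript
interface ClassificationResult {
  label: string;
  confidence: number;
}

// Per-request topK wins; the load-time value is only a default (assumed precedence).
function resolveTopK(
  requestTopK: number | undefined,
  loadTopK: number | undefined,
): number | undefined {
  return requestTopK ?? loadTopK;
}

// Sorts by descending confidence and keeps the first topK entries, if a limit is set.
function applyTopK(
  results: ClassificationResult[],
  topK: number | undefined,
): ClassificationResult[] {
  const sorted = [...results].sort((a, b) => b.confidence - a.confidence);
  return topK === undefined ? sorted : sorted.slice(0, topK);
}
```

With this in place, `loadModel({ modelType: "classification", modelConfig: { topK: 3 } })` would have an observable effect whenever a request omits its own `topK`.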
PLUGIN_TTS,
PLUGIN_OCR,
PLUGIN_DIFFUSION,
PLUGIN_CLASSIFICATION,
Adding classification to SDK_DEFAULT_PLUGINS is only half of the default-plugin path. Pear bundling has its own built-in plugin list/export map in packages/sdk/pear/pre.ts, and that file still only knows about the previous built-ins.
Impact: generated Pear workers that rely on default plugins will not import/register @qvac/sdk/ggml-classification/plugin, so loadModel({ modelType: "classification" }) will fail there even after the SDK-level default list includes it.
Please add @qvac/sdk/ggml-classification/plugin to BUILTIN_PLUGINS and map "ggml-classification" to classificationPlugin in BUILTIN_PLUGIN_EXPORTS.
SDK integration for the newly added Image Classification GGML addon.
Standalone addon PR merged: #1727
What this PR ships
- `ggml-classification` plugin: wraps `@qvac/classification-ggml` (`ImageClassifier`) as a standard SDK plugin with `skipPrimaryModelPathValidation: true` (model is bundled inside the addon package, no download required)
- `classify()` client API: accepts a `Uint8Array` image (JPEG/PNG or raw RGB), encodes to base64 for RPC transport, returns `ClassificationResult[]`
- `classificationConfigSchema`, `classifyRequestSchema`, `classifyResponseSchema`, `ClassifyClientParams`, `ClassificationResult`
- `PLUGIN_CLASSIFICATION` added to `SDK_DEFAULT_PLUGINS`; `ADDON_CLASSIFICATION = "@qvac/classification-ggml"` added as an addon constant
- `ggmlClassification` `ModelType` registered with `"classification"` alias, engine-to-addon map entry, registry engine enum, and `ModelInfo.addon` enum
- `loadModel` supports optional `modelSrc`: omit it to use the bundled weights

API usage
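As a minimal sketch of the base64 transport step only, and not the SDK's actual `classify()` implementation: `encodeImage`, `decodeImage`, `buildClassifyRequest`, and the `modelId` / `image` field names are hypothetical. Only `type: "classify"`, the `Uint8Array` input, and the optional `topK` are mentioned in this PR.

```typescript
// Hypothetical request shape; field names other than `type` and `topK` are assumed.
interface ClassifyRequestSketch {
  type: "classify";
  modelId: string;
  image: string; // base64-encoded image bytes
  topK?: number;
}

// Encodes raw bytes as base64 so they can travel inside a JSON RPC payload.
function encodeImage(bytes: Uint8Array): string {
  let binary = "";
  for (let i = 0; i < bytes.length; i++) {
    binary += String.fromCharCode(bytes[i]);
  }
  return btoa(binary);
}

// Inverse of encodeImage, as the receiving side would apply it.
function decodeImage(b64: string): Uint8Array {
  const binary = atob(b64);
  const out = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    out[i] = binary.charCodeAt(i);
  }
  return out;
}

function buildClassifyRequest(
  modelId: string,
  image: Uint8Array,
  topK?: number,
): ClassifyRequestSketch {
  return {
    type: "classify",
    modelId,
    image: encodeImage(image),
    // Only include topK when the caller supplied one.
    ...(topK !== undefined && { topK }),
  };
}
```

Base64 inflates the payload by roughly a third, which is the usual price for carrying binary data over a text-based RPC channel.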