QVAC-18300 chore[notask]: validate whisper-cpp 1.8.4.3 upstream sync by mario-rei · Pull Request #1975 · tetherto/qvac

mario-rei · 2026-05-11T15:00:16Z

Purpose

Temporary validation branch for tetherto/qvac-ext-lib-whisper.cpp PR #12 (upstream-sync to whisper.cpp v1.8.4.3).

Not for merge.

How it is wired up

- whisper-cpp port in mario-rei/qvac-registry-vcpkg@whisper-184-3-validation repointed to mario-rei/qvac-ext-lib-whisper.cpp at commit 1318aee92eb807b32ff0419bd431cf0dbd2128b3 (head of whisper PR #12)
- whisper-cpp version bumped to 1.8.4.3

This branch updates the two consumers (packages/transcription-whispercpp and packages/bci-whispercpp):
- vcpkg-configuration.json -> mario-rei/qvac-registry-vcpkg fork, baseline pinned to the validation branch HEAD
- vcpkg.json whisper-cpp override -> 1.8.4.3

Workflows triggered

- prebuilds-transcription-whispercpp.yml
- prebuilds-bci-whispercpp.yml
- integration-mobile-test-transcription-whispercpp.yml
- integration-mobile-test-bci-whispercpp.yml

After validation

If green: land whisper PR #12 on tetherto, then bump the port on tetherto/qvac-registry-vcpkg from upstream/main, then bump the consumer overrides on main. This branch can then be deleted.
If red: keep this branch around for follow-up fixes on the whisper PR.

…1142)
* feat(tts): integrate LavaSR audio enhancer as opt-in post-processing
  Add neural speech enhancement (LavaSR) to the TTS package with three independent, opt-in config flags: `enhance` (Vocos BWE to 48kHz), `denoise` (UL-UNAS denoiser), and `outputSampleRate` (arbitrary target rate with smart algorithm selection). All flags default to off so backward compatibility is preserved; no new dependencies introduced.
  C++ implementation:
  - DSP utilities: Lanczos resampler, radix-2 FFT, windowed STFT/ISTFT, Slaney mel filterbank, spectral crossover merge (pure C++, no ML dep)
  - LavaSRDenoiser: chunked STFT-domain ONNX inference with overlap-add
  - LavaSREnhancer: backbone + spec head ONNX sessions with DSP pipeline
  - TTSModel::postProcess() pipeline: denoise -> enhance -> resample
  JS bridge:
  - AddonJs.hpp: 6 new config keys (enhance, denoise, outputSampleRate, enhancerBackbonePath, enhancerSpecHeadPath, denoiserPath)
  - index.js: constructor params, _getLavaSRParams(), download integration
  - index.d.ts: LavaSROptions interface, extended type declarations
  Testing:
  - 40 new C++ unit tests (DSP, LavaSR wrappers, TTSModel integration)
  - 3 C++ integration tests with real ONNX models (enhancer, denoiser, full Chatterbox+enhance pipeline verified producing 48kHz output)
  - JS integration test scaffolding for 6 enhancement scenarios
  - Model download helper for LavaSR from GitHub releases
  Made-with: Cursor
* feat(tts): add nice-to-haves for LavaSR integration
  - Expose sampleRate in JS output callback: outputArray now includes a sampleRate field so consumers know the actual output rate. Also added to runtimeStats (JobEnded event). Uses shared atomic<int> between TTSModel and JsAudioOutputHandler.
  - Per-job enhance/denoise toggle: _runInternal now passes enhance, denoise, outputSampleRate from the job input to the native config, enabling per-utterance control. LavaSR sessions are lazily loaded on first use when toggled on per-job.
  - Benchmark tests: enhancer, denoiser, and resampler latency across 1s/3s/5s/10s audio durations. Results: enhancer ~22x realtime, denoiser ~48x realtime on Apple Silicon.
  Made-with: Cursor
* chore(tts): fix lint, add release notes for CI
  - Remove unused constants to pass standard linter
  - Add release-notes/v0.6.2.md for release-notes-check workflow
  Made-with: Cursor
* fix(tts): restore sample rate constants for lint compliance
  Made-with: Cursor
* style(tts): apply clang-format to LavaSR C++ files
  Made-with: Cursor
* test(tts): add missing LavaSR test coverage
  - Add outputSampleRate-only + backward-compat tests to addon.test.js (runs in CI without LavaSR models, validates resampling path)
  - Add per-job outputSampleRate toggle C++ test via AnyInput config
  - Add denoise+enhance combined pipeline C++ integration test
  - Add enhance+outputSampleRate combined path C++ integration test
  Made-with: Cursor
* fix(tts): construct ONNXTTS directly in outputSampleRate test
  Made-with: Cursor
* refactor(tts): remove enhancerModelDir, use individual model paths
  Follow existing codebase pattern where each model path is passed individually (enhancerBackbonePath, enhancerSpecHeadPath, denoiserPath) rather than via a directory. Consistent with how Chatterbox and Supertonic engines accept their model paths.
  Made-with: Cursor
* fix(tts): use onUpdate pattern in outputSampleRate integration test
  Made-with: Cursor
* chore(tts): move release notes to CHANGELOG, remove release-notes file
  Made-with: Cursor
* fix(tts): address PR review feedback for LavaSR integration
  - C++: guard padReflect against N<=1 input, hoist frame/chunk/shape vectors out of hot loops in StftProcessor and LavaSRDenoiser, cache StftProcessor in MelFilterbank instead of recreating per call
  - C++: parseLavaSRConfig only overwrites fields present in configMap so reload events don't clear existing paths with empty strings
  - C++: add cancel check before postProcess step
  - JS: forward enhancerBackbonePath/specHeadPath/denoiserPath in per-job config so run()-level overrides reach the C++ layer
  - Tests: move LavaSR tests from separate file into addon.test.js, switch outputSampleRate test to Supertonic (faster), verify reported sampleRate via callback assertions instead of only checking output is non-empty
  - Rename example-lavasr-compare.js to example-enhanced-audio.js
  Made-with: Cursor
* fix(tts): fix LavaSR review issues: merge safety, perf, PCM conversion, JS path handling
  Address remaining PR review findings for LavaSR integration:
  - Fix FastLRMerge division-by-zero when transition band is a single bin
  - Add input size validation in FastLRMerge for mismatched enhanced/original vectors
  - Fix PCM16 conversion asymmetry: use 32768.0f for symmetric [-1, 1) mapping
  - Hoist istft() frame allocation out of hot loop to avoid per-frame heap alloc
  - Send LavaSR model paths to C++ unconditionally (not gated behind boolean flags)
  - Update reload() to persist enhancer/denoiser model path changes
  - Log warning instead of silently swallowing stoi parse failures for outputSampleRate
  - Add clarifying comments on ORT_DISABLE_ALL, magnitude spectrogram, and cutoff logic
  - Add trailing newline to TTSModel.cpp
  Made-with: Cursor
* fix(tts): address second-round review issues for LavaSR integration
  - Fix PCM16 overflow: clamp scaled value to [-32768, 32767] before int16_t cast to avoid UB when audio reaches +1.0f
  - Add FastLRMerge input size validation: throw on mismatched lengths, return enhanced directly when original is empty
  - Add outputSampleRate range validation [8000, 192000] in both C++ (parseLavaSRConfig warns and rejects) and JS (constructor throws)
  - Extract shared DspConstants.hpp with inline constexpr PI, replacing duplicate anonymous-namespace definitions in 3 source files
  - Extract shared DspTestHelpers.hpp with generateSine, rms, maxAbsDiff, replacing duplicate definitions across 6 test files
  - Improve LavaSREnhancer CONFIG_SAMPLE_RATE comment with upstream ref URL
  - Make LavaSR integration tests fail loudly via t.fail() when model downloads fail, instead of silently returning
  Made-with: Cursor
* refactor(tts)[api]: restructure LavaSR config as enhancer object
  Replace flat enhance/denoise/model-path constructor fields with a structured enhancer config object using a type discriminator:
  enhancer: { type: 'lavasr', enhance, denoise, backbonePath, specHeadPath, denoiserPath }
  - Constructor: accepts enhancer object + outputSampleRate at top level
  - Per-job run(): accepts enhancer with type + boolean toggles
  - reload(): merges enhancer config incrementally
  - Internally flattens to C++ addon's flat config map
  - Future enhancers added as new type values
  Made-with: Cursor
* fix(tts): address review feedback: fix runtime bugs, update API patterns, bump to 0.9.0
  - Fix undeclared `accepted` variable in _runInternal (ReferenceError in strict mode)
  - Fix outputSampleRate validation: reject all out-of-range values, not just positive ones
  - Update all LavaSR integration tests and example to use 0.8.0 constructor API (files:{}, config:{}, enhancer:{} instead of flat top-level props)
  - Add cancel check before postProcess to allow early abort before expensive enhancement
  - Save/restore lavaSRConfig_ around per-job overrides to prevent state pollution
  - Guard negative outputSampleRate values in C++ parseLavaSRConfig
  - Remove BUILD_CLI option from CMakeLists.txt (out of scope for this PR)
  - Bump version to 0.9.0
  - Remove double blank lines in index.js
  Made-with: Cursor
* fix(tts): remove duplicate cancelRequested_ check in postProcess path
  The second cancel check was dead code: exchange(false) on the first check already cleared the flag, so the adjacent second check always saw false.
  Made-with: Cursor
* chore(tts): update changelog heading from Unreleased to 0.9.0
  Made-with: Cursor
* fix(tts): fix LavaSR integration tests and address PR review feedback
  - Fix all LavaSR tests: use onUpdate callback pattern instead of accessing undefined result.data.outputArray
  - Move lavasrEnhancerConfig and loadReferenceAudio to test/utils/
  - Switch LavaSR tests from Chatterbox to Supertonic (faster execution)
  - Remove separate backward-compat test; add reportedSampleRate assertion to existing Chatterbox and Supertonic tests
  - Rename section header to 'outputSampleRate tests'
  - Capture reportedSampleRate in runTTS utility
  - Apply clang-format to all changed C++ files
  - Add error handling to multilingual test reload/unload
  Made-with: Cursor
* fix(tts): restore CMakeLists.txt mangled by clang-format
  clang-format was accidentally run on the CMake file, breaking its syntax. Restore the original version.
  Made-with: Cursor
* fix(tts): revert format-only changes in files not modified by this PR
  Revert clang-format noise in SupertonicEngine.cpp, ChatterboxEngine.cpp, OnnxInferSession.cpp/hpp; these files have no functional changes in this PR, only formatting from an overly broad clang-format run.
  Made-with: Cursor
* test(tts): skip LavaSR integration tests on mobile
  LavaSR ONNX models (~58MB) are not bundled in the mobile test app and downloading them on device farm is unreliable. Skip all 5 LavaSR tests when running on iOS/Android.
  Made-with: Cursor
* chore: retrigger CI
  Made-with: Cursor
* Change version bump to 0.8.3
---------
Co-authored-by: Raju <raju.sharma>
Co-authored-by: Gustavo Araujo <gustavogefa@hotmail.com>
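The two PCM16 fixes in the LavaSR commits (scale by 32768.0f, then clamp before the int16_t cast) can be illustrated with a minimal C++ sketch. The function name `floatToPcm16` is hypothetical and this is not the TTS package's actual conversion code; it only shows why the clamp matters: +1.0f scales to 32768, which int16_t cannot represent, and converting an out-of-range float to an integer type is undefined behavior.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper (not the TTS package's code): converts float samples
// in [-1, 1] to 16-bit PCM using a symmetric 32768.0f scale.
std::vector<int16_t> floatToPcm16(const std::vector<float>& samples) {
  std::vector<int16_t> out;
  out.reserve(samples.size());
  for (float s : samples) {
    float scaled = s * 32768.0f;
    // +1.0f scales to 32768, which int16_t cannot hold; converting an
    // out-of-range float to an integer type is undefined behavior, so
    // clamp to [-32768, 32767] before the cast.
    scaled = std::max(-32768.0f, std::min(scaled, 32767.0f));
    out.push_back(static_cast<int16_t>(scaled));
  }
  return out;
}
```

With this scheme -1.0f maps exactly to -32768, while +1.0f (and anything louder) saturates at 32767 instead of wrapping or triggering undefined behavior.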
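The dead-code fix for the duplicate `cancelRequested_` check follows from the semantics of `std::atomic<bool>::exchange`, which atomically stores the new value and returns the previous one. A minimal sketch, using a hypothetical `CancelState` struct rather than the actual TTSModel class:

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for a model's cancellation state.
struct CancelState {
  std::atomic<bool> cancelRequested{false};

  void cancel() { cancelRequested.store(true); }

  // Reads AND clears the flag in one atomic step. A second call made right
  // afterwards returns false unless cancel() ran in between, which is why
  // two adjacent checks are redundant.
  bool consumeCancel() { return cancelRequested.exchange(false); }
};
```

Because the first `consumeCancel()` already resets the flag, only one check per decision point is meaningful.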
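Several of the parseLavaSRConfig review fixes (only overwrite keys present in the config map, warn on unparseable `outputSampleRate` values instead of swallowing the error, and reject negative or out-of-range rates) fit together in one small routine. The sketch below assumes a `std::map<std::string, std::string>` config map and a hypothetical `EnhancerConfig` struct; the names are illustrative, not the addon's.

```cpp
#include <cassert>
#include <exception>
#include <iostream>
#include <map>
#include <string>

// Hypothetical config struct; names are illustrative, not the addon's.
struct EnhancerConfig {
  std::string denoiserPath;
  int outputSampleRate = 0;  // assumption: 0 = keep the engine's native rate
};

// Applies only the keys present in `m`, so a reload that omits a key keeps
// the existing value instead of clearing it with an empty string.
void applyConfig(EnhancerConfig& cfg, const std::map<std::string, std::string>& m) {
  auto it = m.find("denoiserPath");
  if (it != m.end()) cfg.denoiserPath = it->second;

  it = m.find("outputSampleRate");
  if (it == m.end()) return;
  try {
    int rate = std::stoi(it->second);
    if (rate >= 8000 && rate <= 192000) {
      cfg.outputSampleRate = rate;
    } else {
      // Rejects negatives and anything outside the supported range.
      std::cerr << "warning: outputSampleRate " << rate
                << " outside [8000, 192000], ignored\n";
    }
  } catch (const std::exception&) {
    // std::stoi throws invalid_argument or out_of_range; log instead of swallowing.
    std::cerr << "warning: cannot parse outputSampleRate '" << it->second << "'\n";
  }
}
```

Note that `std::stoi` accepts a numeric prefix, so "48000Hz" parses as 48000; a stricter parser would pass the `pos` out-parameter and reject trailing characters.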

…nditioning (#884)
* updated for sd
* updated and successfully built
* downloads
* updated with working loading
* updated load model js for Q4_K test
* rewrote parameter handling to support multiple params and also two different model types
* got sd inference to work
* updated for sd2
* got full sdxl to work
* rename folder to qvac-lib-infer-diffusion
* update package name
* sd3 finished
* rename: qvac-lib-infer-diffusion -> lib-infer-diffusion
  Rename package directory from packages/qvac-lib-infer-diffusion to packages/lib-infer-diffusion to align with the lib-* naming convention used across the monorepo.
  Made-with: Cursor
* updated for cuda linux
* updated for model
* have something working
* changelog
* cpp lint
* format
* updated model for gian
* integration test
* fixing according to boss
* fix(android): enable BUILD_SHARED_LIBS and stub pthread_cancel for GGML_BACKEND_DL
  GGML_BACKEND_DL requires BUILD_SHARED_LIBS=ON so CMake can build GPU backends as MODULE targets (.so). Previously BUILD_SHARED_LIBS was hardcoded OFF, causing configure to fail on Android. Also stub out pthread_cancel in ggml-backend-reg.cpp via a cmake string replacement, since pthread_cancel is unavailable in the Android NDK. The loader thread terminates naturally without the explicit cancel.
  Made-with: Cursor
* fix(android): exclude Vulkan on Android and fix pthread_cancel stub
  Two portfile fixes for arm64-android cross-compile:
  1. SD_VULKAN: the else() branch was enabling -DSD_VULKAN=ON for Android, causing find_package(Vulkan) to pick up the host x86_64 SDK during cross-compile and fail CMake configure. Android Vulkan support comes via the NDK and is handled separately; skip the flag entirely.
  2. pthread_cancel: replace the fragile comment-based no-op with a proper inline stub guarded by #if defined(__ANDROID__), injected at the top of ggml-backend-reg.cpp before compilation.
  Made-with: Cursor
* ci: dump vcpkg configure logs on failure for android build
  Adds an always-run step that cats all config-*.log files from the vcpkg stable-diffusion-cpp buildtrees on failure, so the exact CMake configure error is visible inline in the CI job output.
  Made-with: Cursor
* fix(android): insert pthread_cancel stub after pthread.h include
  The previous stub was prepended to the top of ggml-backend-reg.cpp before any #include, so pthread_t was undefined and the stub itself failed to compile, leaving pthread_cancel undeclared for the actual call site.
  Fix: insert the no-op stub immediately after #include <pthread.h> so pthread_t is available. Add a fallback that prepends both the include and stub if <pthread.h> isn't found directly. Also pass HAVE_PTHREAD_CANCEL=0 and GGML_HAVE_PTHREAD_CANCEL=OFF as CMake cache variables to disable any check_function_exists tests, and add DISABLE_PARALLEL_CONFIGURE to avoid race conditions with source patches.
  Made-with: Cursor
* fix(android): resolve BUILD_SHARED_LIBS override and pthread_cancel issues
  Locally verified: stable-diffusion-cpp:arm64-android now configures and builds successfully. Three root causes fixed:
  1. BUILD_SHARED_LIBS override: vcpkg maps VCPKG_LIBRARY_LINKAGE to BUILD_SHARED_LIBS, and the arm64-android triplet sets linkage to "static", appending -DBUILD_SHARED_LIBS=OFF after our explicit ON. Additionally, stable-diffusion.cpp's CMakeLists.txt resets BUILD_SHARED_LIBS=OFF unless SD_BUILD_SHARED_GGML_LIB=ON. Fix: set VCPKG_LIBRARY_LINKAGE=dynamic for this port when DL backends are enabled, and pass -DSD_BUILD_SHARED_GGML_LIB=ON.
  2. pthread_cancel stub redefinition: the previous stub was inserted via string(REPLACE) + fallback string(PREPEND), but both paths executed, producing a duplicate definition error. Also, vcpkg reuses cached source trees, so patches accumulated across builds.
  Fix: use a sentinel comment for idempotency; only one insertion path with the stub placed after #include <pthread.h>.
  3. Removed the now-unnecessary explicit BUILD_SHARED_LIBS_OPTION variable since VCPKG_LIBRARY_LINKAGE handles it correctly.
  Made-with: Cursor
* updated for android hopefully works
* added opencl support for android
* windows attempt fix
* attempting to fix windows again
* NORM problem with ggml operation
* attempting to patch norm
* attempting again to fix
* diagnostic step
* update for opencl
* updated for device selection
* fix(diffusion): add CI/CD workflows, test infra, and integration tests (#676)
* fix(diffusion): rebase on feature-media-generation, add CI improvements
  Rebased cleanly onto feature-media-generation to pick up:
  - SD_CPU_ONLY env var gate (Metal NORM op fallback to CPU)
  - GGML_OPENMP=OFF (eliminates libomp.so.5 dependency)
  - OpenCL support for Android
  Additions on top of base:
  - Add cpp-tests and ts-checks jobs to on-pr workflow
  - Add image artifact upload to integration tests (traceable to source test)
  - Disable win32 in prebuilds/integration/cpp-tests (C1128 /bigobj)
  - Install libomp5 on Linux integration tests (safety net)
  - Test infrastructure: unit tests, mobile test framework, scripts
* fix(diffusion): address PR review comments, enable win32, improve CI artifacts
  - Re-enable win32 platform in prebuilds, integration-test, and cpp-tests workflows
  - Remove duplicate PULL_REQUEST_TEMPLATE.md (already in repo root)
  - Fix setDiff in validate-mobile-tests.js to handle non-Set inputs
  - Refactor generate-image.test.js to use ensureModel from utils.js
  - Save test images to modelDir for mobile permission compatibility
  - Update CI to look for images in test/model/ instead of output/
  - Add PR comment step to post image metadata on pull requests
* fix(diffusion): restore base branch code accidentally removed during rebase
  Restores SD_CPU_ONLY patch, GGML_OPENMP=OFF, OpenCL support, Apple keep_clip_on_cpu guard, and VCPKG_BUILD_TYPE placement that were dropped when patches were applied on top of the reset base.
* style(diffusion): fix lint errors in examples (no-multi-spaces, indent)
* feat(diffusion): upload test images to S3 and display inline in step summary
  Images are uploaded to S3 with public-read ACL, then embedded in the step summary and PR comments via their S3 URLs so they render inline without needing to download artifacts.
* ci(diffusion): remove libomp5 install (fixed by GGML_OPENMP=OFF in portfile)
* remove S3 upload, use simple table summary for generated images
* restore AWS env vars from base branch
* refactor(diffusion): consolidate test utils, remove helpers.js
  Move detectPlatform, setupJsLogger, isPng into utils.js and update generate-image.test.js to import from utils.js only. Add platform detection for device selection in model-loading.test.js.
* fixed integration tests
* updated
* updated timeout
* cpp unit tests complete and tested YAY BABY
* cpp lint
* updated
* test(diffusion): add integration tests for SDXL, SD3, and FLUX.2 (#757)
* test(diffusion): add integration tests for SDXL, SD3, and FLUX.2
  Add integration tests for all supported model families based on the existing examples. Each test follows the LLM addon patterns: platform-aware device selection, defensive cleanup with .catch(), ensureModel for CI downloads.
  - generate-image-sdxl.test.js: SDXL Base 1.0 (all-in-one GGUF, auto eps-prediction)
  - generate-image-sd3.test.js: SD3 Medium (safetensors, flow prediction, euler sampler)
  - generate-image-flux2.test.js: FLUX.2 klein 4B (split layout: diffusion + LLM + VAE)
  - Regenerate all.js (brittle) and integration.auto.cjs (mobile)
* fix(diffusion): use CPU on all darwin platforms
  Metal's GGML_OP_MUL_MAT is unsupported for stable-diffusion.cpp, causing SIGABRT on darwin-arm64. Use isDarwin (all darwin) instead of isDarwinX64 for the useCpu check.
* revert: keep GPU on darwin-arm64 to surface Metal errors
  Don't hide GPU errors behind CPU fallback; the Metal MUL_MAT issue needs to be visible so it gets fixed.
* test(diffusion): increase test timeouts for CPU-bound runs
  FLUX.2 30min, SDXL/SD3 15min: these models are too heavy for the default 10min timeout when running on CPU.
* chore: remove all.js from tracking (auto-generated, gitignored)
* test(diffusion): skip SDXL, SD3, and FLUX.2 tests on mobile
* QVAC-13954: Clean up vcpkg deps in lib-infer-diffusion (#781)
* refactor: split ggml into standalone vcpkg overlay port
  Decouple ggml from the stable-diffusion-cpp overlay port so it can be shared by multiple consumers with consistent ABI guarantees.
  - Add standalone ggml overlay port (version-date 2026-01-30) pinned to the same commit used by stable-diffusion.cpp master-514-5792c66
  - Refactor stable-diffusion-cpp port to use vcpkg_from_github + SD_USE_SYSTEM_GGML=ON instead of cloning with --recurse-submodules
  - Patch ggml's src/CMakeLists.txt and cmake/ggml-config.cmake.in to propagate GGML_MAX_NAME=128 via INTERFACE_COMPILE_DEFINITIONS, ensuring all consumers share the same struct layout
  - Switch both ports to version-date versioning (no upstream semver)
  - Replace bundled stb headers with vcpkg stb dependency
  - Auto-enable Vulkan backend on Linux via platform dependency
  - Forward GPU backend features (metal/vulkan/cuda/opencl) from stable-diffusion-cpp to ggml through vcpkg feature
* fix(diffusion): fix ggml/sd overlay ports for Android cross-compilation
  Add NDK-matched Vulkan C++ header detection so the ggml port downloads headers matching the exact NDK Vulkan version instead of pulling a potentially mismatched vcpkg vulkan-headers package. Add missing ggml-opencl.h to the public headers install list. Auto-enable opencl on Android and vulkan on desktop/Android via default-features in both the ggml and stable-diffusion-cpp overlay ports.
* fix(diffusion): disable OpenMP and align ggml flags with qvac-fabric
  Add GGML_OPENMP=OFF to fix Windows CI failure where OpenMP is unavailable, and GGML_LLAMAFILE=OFF to disable unused code paths. Add Android-specific flags for DL backends (GGML_BACKEND_DL, CPU_ALL_VARIANTS, CPU_REPACK) and disable cooperative matrix Vulkan extensions on mobile GPUs.
* fix(diffusion): fix ggml include dirs for DL backends and use tetherto fork
  Patch ggml-config.cmake.in to set INTERFACE_INCLUDE_DIRECTORIES on the ggml::ggml and ggml::ggml-base targets unconditionally. When GGML_BACKEND_DL is ON, the per-backend targets are not created and include dirs were lost. Also switch the SD source to the tetherto fork and drop the qvac-diffusion- library prefix from CMakeLists.txt now that ggml is a standalone port with standard names.
* Remove redundancies in vcpkg manifest files
* Set SD_CPU_ONLY=1 on CI env
* updated for runtime stats
* fixed connection to logger, as it was not properly connected before
* fixed for license file, validated working run on m1 air
* quickstart quick-maths
* fixed integration for windows
* fix(diffusion): add real cancel/abort support to native generation (#782)
* fix(diffusion): add real cancel/abort support to native generation
  Cancel previously only set an atomic flag checked after generate_image() returned: generation ran to full completion and output was silently discarded. This made cancel appear to work while still burning full compute time.
  Changes:
  Portfile patches (stable-diffusion.cpp):
  - Add sd_abort_cb_t typedef and sd_set_abort_callback() public API
  - Add sd_abort_requested() helper checked in the denoise lambda
  - When abort fires, denoise returns nullptr which the sampler stack already treats as failure → generate_image() returns NULL
  - Fix upstream bug: abort path freed wrong compute buffer (diffusion_model instead of work_diffusion_model), corrupting sd_ctx and causing segfault on reuse
  SdModel.cpp:
  - Wire cancelRequested_ into abort callback via thread-local (matches existing progress callback pattern for concurrency safety)
  - Scope guard ensures callbacks are cleared on all exit paths including early parse/validation exceptions
  - Always free results[i].data whether cancelled or not (buffer leak fix)
  - Cancelled jobs throw "Job cancelled" → JobRunner emits queueException instead of fake success with queueResult + queueJobEnded
  - Return empty std::any from process() so queueJobEnded() is the sole terminal stats path (fixes duplicate JobEnded events in JS)
  SdModel.hpp:
  - Add isCancelRequested() public accessor for the static abort callback
* fix(diffusion): disable free_params_immediately for model reuse
  The upstream sd_ctx_params_init() defaults free_params_immediately=true, which permanently frees model weight buffers after the first generate_image() call. Any subsequent generation on the same sd_ctx accesses freed memory and crashes (SIGSEGV).
  Set the default to false so the addon supports multiple generations on the same model instance (the expected use pattern).
  This was the root cause of the "cancel then run" crash: the abort path still runs through generate_image_internal() which calls diffusion_model->free_params_buffer() when this flag is true.
* fix(diffusion): add code comments and rename fix-abort-cleanup patch
  - Add comments to SdCtxHandlers.hpp explaining why freeParamsImmediately is disabled (upstream default frees weight buffers after first generation, causing use-after-free on model reuse)
  - Add comments to both hunks in the upstream cleanup patch explaining the compute buffer bug and work_ctx leak
  - Rename fix-abort-cleanup.patch to fix-failure-path-cleanup.patch since the fixes apply to any failure path, not just abort
* fix(diffusion): document cancel-as-error rationale vs LLM addon
  Diffusion throws on cancel (queueException) while LLM returns normally (queueResult). Add comment explaining the intentional difference: diffusion has no useful partial output, so an explicit error signal is more honest than a success with output_count=0.
* test(diffusion): add C++ unit tests for cancel/context handling
  Add test_cancel_context.cpp covering the context changes from the cancel fix:
  - cancel when idle is a no-op (no crash, no state corruption)
  - cancel during generation throws "Job cancelled" (cancel-as-error path)
  - model is reusable after cancel (validates freeParamsImmediately=false and compute buffer fix; this is the exact SIGSEGV scenario)
  - multiple sequential generations succeed (normal reuse without cancel)
  - cancelRequested_ flag is reset at process() entry
  - process() on unloaded model throws (not segfault)
  - runtime stats are populated after successful generation
* fix(diffusion): fix patch line counts and test assertion
  - Fix fix-failure-path-cleanup.patch: correct hunk line counts (-2203,7 +2203,11 and -3796,6 +3800,13) and replace Unicode em-dashes with ASCII in comments
  - Fix CancelWhenIdleIsNoop test: cancel() sets the flag even when idle; it is only cleared on process() entry
* refactor(diffusion): static ggml core with DL backends and CMakeLists cleanup (#794)
* refactor(diffusion): static ggml core with DL backends and CMakeLists cleanup
  Patch ggml to support GGML_BACKEND_DL with BUILD_SHARED_LIBS=OFF by enabling PIC and backend compile definitions when DL is on, matching the qvac-fabric approach. Remove VCPKG_LIBRARY_LINKAGE=dynamic override: core libs are now static .a with PIC, backends remain MODULE .so files.
  Clean up CMakeLists.txt: remove redundant explicit linking of OpenCL, Metal frameworks, CUDA libs, and ggml (all propagated transitively via ggml cmake config). Fix WIN32_LEAN_AND_MEAN typo, remove stale comments, and drop the clang overlay triplet workaround.
* chore(diffusion): switch Linux to libc++, fix vcpkg warnings, remove dead patches
  Add libc++ triplets for x64-linux and arm64-linux under vcpkg/triplets, matching the qvac-lib-infer-llamacpp-llm layout. Move triplet and toolchain files from vcpkg-override-triplets to vcpkg/. Install the stable-diffusion-cpp usage file and suppress mismatched binary count warnings in both overlay ports. Remove obsolete rename-ggml-libs and no-dlopen-without-backend-dl patches from the old submodule architecture.
* fix(diffusion): disable GGML_BACKEND_DL for Android static backends
  stable-diffusion.cpp calls ggml_backend_is_cpu() and ggml_backend_cpu_init() directly, which live in the CPU backend module. With GGML_BACKEND_DL these become separate .so files unavailable at link time, causing dlopen failures on device. Statically link all backends (CPU, Vulkan, OpenCL) instead, and bundle the OpenCL ICD loader .so on Android so the addon loads even on devices without a system libOpenCL.
* Place the OpenCL ICD Loading library next to bare file
* fix(diffusion): graceful OpenCL fallback and backend priority reorder
  Patch ggml's OpenCL backend to return nullptr instead of aborting when no OpenCL devices are found (e.g. Pixel phones without OpenCL support). Reorder SD backend priority to CUDA > Metal > OpenCL > Vulkan > CPU, preferring OpenCL on Adreno devices where it outperforms Vulkan, with if-guards so only the first successful backend is used.
* feat(diffusion): Adreno-aware backend selection for Android
  Detect Adreno GPU model at runtime via ggml device enumeration and choose the optimal backend: Adreno 800+ uses GPU (OpenCL), Adreno 600/700 is forced to CPU due to poor OpenCL performance, and non-Adreno devices fall through to Vulkan. Adds INFO-level logging of detected devices and selection decisions for troubleshooting.
* fix(diffusion): statically link OpenCL ICD loader on Android
  Add an overlay port for opencl that removes the dynamic-only restriction, allowing the ICD loader to be built as a static library. This eliminates libOpenCL.so as a NEEDED dependency so the addon loads on all Android devices regardless of OpenCL support. The static ICD loader still dlopens vendor drivers at runtime.
* Fixed formatting
* CPU only on Android
* feat(diffusion): hybrid static CPU + dynamic GPU backends for Android (#813)
* feat(diffusion): hybrid static CPU + dynamic GPU backends for Android
  Add GGML_CPU_STATIC option that builds the CPU backend as a static library linked into ggml even when GGML_BACKEND_DL is ON. GPU backends (Vulkan, OpenCL) remain MODULE .so files loaded at runtime via dlopen, eliminating libOpenCL.so as a NEEDED dependency.
  This lets stable-diffusion.cpp call CPU backend functions directly (ggml_set_f32, ggml_backend_cpu_init, etc.) while GPU backends are discovered at runtime, so a single Android binary works on all devices regardless of OpenCL/Vulkan support.
* feat(diffusion): generic backend init using ggml registry API
  Replace SD's init_backend() #ifdef waterfall with generic ggml calls (ggml_backend_init_by_type) that work with both statically linked and dynamically loaded backends. Load DL backend modules from the addon via ggml_backend_load_all_from_path() when GGML_BACKEND_DL is enabled.
  This eliminates SD's dependency on GPU-specific headers (ggml-opencl.h, ggml-vulkan.h, etc.) and removes the SD_METAL/VULKAN/CUDA/OPENCL build flags, replacing sd-cpu-only.patch and sd-backend-priority.patch with a single sd-generic-backend-init.patch.
* feat(diffusion): prefer OpenCL on Adreno 800+ via sd_ctx backend preference
  Add a new backend preference field in stable-diffusion context params and wire SdModel to request OpenCL for Adreno 800+ when available, while keeping SD_CPU_ONLY as CI-only env override. Also fix ggml hybrid export wiring so CPU static symbols are linked for Android DL backend mode, and refresh android-arm64 prebuild artifact.
* fix(diffusion): pass backendsDir to SdCtxConfig
* Added logging to troubleshoot pixel vulkan init
* fix(diffusion): JS layer review fixes and cancel test coverage (#783)
* fix(diffusion): JS layer review fixes and cancel test coverage
  Aligns the JS layer with the LLM addon patterns and adds API behavior tests for cancel/busy/idle state transitions.
  JS layer:
  - Rename run() to _runInternal() (BaseInference template method pattern)
  - Replace 30ms timer guard with _hasActiveResponse boolean
  - Extract _getWeightFiles() to deduplicate file lists in _load/_downloadWeights
  - Wrap _runGeneration in _withExclusiveRun for serialization
  - Add finalized.catch(() => {}) unhandled rejection guard
  - Reset _hasActiveResponse in unload()
  - Filter undefined values in addon config coercion
  - Remove orphaned unloadWeights() from addon.js
  - Update class doc and README to match actual supported models
  Types (index.d.ts):
  - Fix run() signature: Txt2ImgParams (was accepting txt2vid params)
  - Proper type hierarchy: Txt2ImgParams → Img2ImgParams → GenerationParams
  - Add missing params: guidance, sampling_method, scheduler
  - Remove unused type declarations
  Tests:
  - Add api-behavior.test.js with 5 cancel/busy/idle tests
  - idle|run, idle|cancel, run|cancel, run|run (busy), cancel|run (rerun)
  - cancel|run test requires native abort support (fix/diffusion-cancel-abort)
* fix(diffusion): cancel inside onUpdate callback matching LLM pattern
  Cancel tests now fire model.cancel() inside the onUpdate callback after the first progress tick (string data), matching the LLM addon's runAndCancelAfterFirstToken pattern. This ensures native generation is guaranteed to be active when cancel fires, preventing false passes.
* fix(diffusion): use const for non-reassigned chain variable
  Standard JS lint requires const for variables that are never reassigned.
* fix(diffusion): update scope note instead of removing it
  FLUX.1 and Wan2.x video are still not supported; keep that explicit.
* fix(diffusion): video generation is planned, not excluded
  Wan2.x support is planned for the future; update scope note accordingly.
* fix(diffusion): address PR review: remove WeightsProvider, unify run API, update docs
  - Remove WeightsProvider and _downloadWeights (files must be on disk)
  - Unify txt2img/img2img into single run() with auto-detected mode
  - Add return await to _withExclusiveRun calls (stack trace alignment)
  - Strengthen run|run test to verify first response completes
  - Update README: loader is optional, add t5XxlModel, fix load() docs
  - Update docs/architecture.md: align with disk-local contract
* fix(diffusion): remove unused loader from constructor, tests, and examples
  The diffusion addon never used the loader parameter: it was accepted in the constructor but silently discarded. Model files are loaded directly from disk via diskPath.
- Remove loader from ImgStableDiffusion constructor and type declarations
- Remove Loader interface and ReportProgressCallback (no remaining consumers)
- Remove FilesystemDL usage from all 6 integration tests and 7 examples
- Update README: remove data loader section, renumber steps, drop loader from args table

* fix(diffusion): remove stale loader deps and fix doc references
- Remove @qvac/dl-filesystem and @qvac/dl-hyperdrive from devDependencies
- Remove @qvac/dl-hyperdrive from peerDependencies
- Update architecture.md to reflect direct disk-path loading (no FilesystemDL)

* fix(diffusion): remove last Hyperdrive mention from architecture doc

* fix(diffusion): remove stale loadWeights from thread safety rules

* fix(diffusion): update data-flows doc to reflect unified run() API

* feat(diffusion): move stable-diffusion-cpp to registry (#865)
Support qvac ggml backend module names.

* updated i2i

* working anime version of i2i

* cpp lint

* fixed

* feat(diffusion): unify img2img to always use in-context conditioning
Remove the traditional img2img path (VAE encode → noise → denoise) and route all image-conditioned generation through FLUX in-context conditioning (reference tokens + joint attention). The user-facing API stays simple: pass init_image → img2img mode automatically.
- addon.js: only handle init_image, always serialize as ref_image_bytes
- index.js: mode = init_image ? 'img2img' : 'txt2img' (no ref2img)
- SdModel.cpp: single img2img path using ref_images / joint attention
- SdGenHandlers.cpp: accept txt2img and img2img only
- test_ref2img.cpp: update mode from ref2img → img2img
- ref2img-flux2.js: use init_image instead of ref_image
Made-with: Cursor

* chore(diffusion): remove accidentally committed 27MB android prebuild zip
sd-cpp-android-arm64.zip was committed in e2f140e during the Android GPU backend work. Add *.zip to .gitignore to prevent recurrence.
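The serialization contract from the JS-layer review fixes earlier in this list (`_withExclusiveRun`, the `_hasActiveResponse` busy flag, and `return await` for stack traces) can be sketched as a promise-chain mutex. This is an illustration, not the addon's code: the `ExclusiveRunner` class is hypothetical and only the two member names come from the commit notes.

```javascript
// Hypothetical sketch of the _withExclusiveRun / _hasActiveResponse pattern.
class ExclusiveRunner {
  constructor () {
    this._chain = Promise.resolve() // tail of the serialization queue
    this._hasActiveResponse = false // busy flag instead of a timer guard
  }

  // Runs fn only after every previously queued fn has settled.
  async _withExclusiveRun (fn) {
    const prev = this._chain
    let release
    // The next caller waits on this promise, which settles when fn finishes.
    this._chain = new Promise(resolve => { release = resolve })
    await prev
    try {
      // `return await` keeps this frame in async stack traces.
      return await fn()
    } finally {
      release()
    }
  }

  // Rejects immediately while a response is in flight (the run|run "busy" case).
  async run (job) {
    if (this._hasActiveResponse) throw new Error('a run is already active')
    this._hasActiveResponse = true
    try {
      return await this._withExclusiveRun(job)
    } finally {
      this._hasActiveResponse = false
    }
  }
}
```

The busy flag rejects overlapping `run()` calls outright, while the chain still serializes any other work (for example load or unload) routed through `_withExclusiveRun`.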
Made-with: Cursor

* fix(diffusion): remove unload() calls from img2img/ref2img tests
SdModel on main uses RAII (default destructor + unique_ptr deleter), so unload() no longer exists. model.reset() is sufficient.
Made-with: Cursor

* refactor(diffusion): unify img2img API, add von Neumann test asset, remove ref2img/SDXL
- Add assets/von-neumann.jpg (Public Domain, U.S. DOE HD.3F.191) as the canonical test image for img2img examples and tests
- Remove ref2img as a separate concept — all image-to-image is now just "img2img" using FLUX in-context conditioning under the hood
- Delete ref2img-flux2.js example and test_ref2img.cpp unit test
- Delete img2img-sdxl.js example (FLUX-only for this delivery)
- Update all examples, integration test, C++ unit tests, and docs to use the new asset path and consistent img2img terminology
- Add image attribution to NOTICE and Credits section to README
- Round auto-detected image dimensions to nearest multiple of 8 in addon.js
- Run clang-format on modified C++ sources
Made-with: Cursor

* style(diffusion): fix standard lint violations in img2img examples
Replace backtick strings without interpolation with single quotes, remove trailing spaces, and collapse multi-space comment alignment.
Made-with: Cursor

* fix(diffusion): add bare-fs as direct dependency to resolve CI module error
Move bare-fs from devDependencies to dependencies to fix MODULE_NOT_FOUND errors in CI workflows. The package is required by the transitive dependency @qvac/dl-filesystem and by test generation scripts, and file: dependencies don't always properly resolve transitive dependencies in npm.
Made-with: Cursor

* attempting to resolve dl

* fixed pathing issue

* increased timeouts

* fix(diffusion): skip FLUX2 img2img test on CPU-only runners
Add NO_GPU environment variable check to skip FLUX2 img2img test on CPU-only runners. FLUX2 img2img requires GPU acceleration as it's too slow on CPU (VAE encoding + diffusion steps exceed 30min timeout).
This aligns with the existing FLUX2 txt2img test behavior and ensures the test only runs on GPU-enabled runners (ai-run-linux-gpu, mac-mini-m4-gpu, ai-run-windows11-gpu).
Made-with: Cursor

* fix(diffusion): only set SD_CPU_ONLY on no-GPU runners
Make SD_CPU_ONLY conditional based on matrix.no_gpu to allow GPU-enabled runners (ai-run-linux-gpu, mac-mini-m4-gpu) to use GPU acceleration. Previously, SD_CPU_ONLY was hardcoded to '1' for all Linux/macOS runners, forcing even GPU runners to use CPU. This caused FLUX2 tests to be extremely slow or time out.
Now:
- GPU runners: SD_CPU_ONLY='0' (uses GPU)
- CPU-only runners: SD_CPU_ONLY='1' (uses CPU)
Made-with: Cursor

* fix(diffusion): remove SD_CPU_ONLY env var from workflow
Remove SD_CPU_ONLY entirely from the workflow, because the C++ code checks whether the env var is set at all, not its value: setting SD_CPU_ONLY=0 still forces CPU mode. The integration tests already handle CPU/GPU selection via the NO_GPU env var and the skip logic, so SD_CPU_ONLY is not needed at the workflow level. This allows GPU runners to properly use GPU acceleration without the workflow interfering with the backend selection.
Made-with: Cursor

* fix(diffusion): remove ggml overlay port to use registry version
Remove the ggml overlay port to align with main branch (commit ba9f55e), which switched to using ggml from the registry instead of overlay ports. This ensures consistency across the codebase and avoids reintroducing the overlay port that was intentionally removed in PR #1066.
Made-with: Cursor

* changed seed and description

* fix(diffusion): increase Windows test timeout to 30 minutes
Increase Windows GPU runner timeout from 600s (10 min) to 1800s (30 min) to match the FLUX2 test timeout. Windows Vulkan backend may be slower than Linux/Mac for FLUX2 generation, and the sampling operations were timing out. This gives Windows tests sufficient time to complete FLUX2 img2img and txt2img generation without premature cancellation.
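The SD_CPU_ONLY fix above hinges on the difference between checking whether a variable is set and checking its value. The native check is C++; this JavaScript sketch only illustrates the two semantics, and both helper names are hypothetical.

```javascript
// Presence check: any value, including '0' or '', counts as "set".
// This is the behavior the commit note attributes to the C++ side.
function cpuOnlyByPresence (env) {
  return Object.prototype.hasOwnProperty.call(env, 'SD_CPU_ONLY')
}

// Value check: only an explicit truthy value forces CPU mode.
function cpuOnlyByValue (env) {
  const v = (env.SD_CPU_ONLY || '').trim().toLowerCase()
  return v === '1' || v === 'true' || v === 'yes'
}
```

Under presence semantics the only way to keep GPU enabled is to leave the variable unset, which is why removing it from the workflow, rather than setting it to '0', was the working fix.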
Made-with: Cursor

* chore(diffusion): regenerate mobile integration tests
Add FLUX2 img2img test to mobile integration test runners. The integration.auto.cjs file is auto-generated and needs to be updated whenever new integration tests are added.
Generated with: npm run test:mobile:generate
Made-with: Cursor

* feat(diffusion): change FLUX2 txt2img prompt to cartoon watercolor style
Update test prompt from photorealistic to cartoon watercolor style for more visually distinctive output. The new style better demonstrates FLUX2's artistic capabilities.
Prompt: "a red fox in a snowy forest, laying on a rock with a santa hat, cartoon, watercolor"
Made-with: Cursor

* fix(diffusion): double test timeouts on Windows
Windows Vulkan backend is significantly slower than Linux/Mac, causing integration tests to time out. Double all test timeouts (600s → 1200s) specifically on Windows platform while keeping other platforms unchanged.
Changes:
- model-loading.test.js: 10min → 20min on Windows
- api-behavior.test.js: 10min → 20min on Windows (5 tests)
This prevents premature timeout failures during diffusion model sampling on Windows GPU runners.
Made-with: Cursor

* feat(diffusion): add SD3 img2img support with SDEdit and dual-path routing
Implements image-to-image transformation for SD3 Medium using SDEdit, with automatic model-specific routing between FLUX in-context conditioning and traditional SDEdit for other model families.
Key changes:
- Add examples/img2img-sd3.js: SDEdit example with flow-matching parameters (cfg_scale 4.5, strength 0.35-0.75, euler sampling)
- Implement dual-path img2img routing in SdModel.cpp:
  * FLUX2/FLUX: ref_images with auto_resize_ref_image (in-context conditioning)
  * SD1/SD2/SDXL/SD3: init_image with SDEdit (noise + denoise)
- Add automatic 8-alignment for non-multiple-of-8 input images:
  * Aligns dimensions up to nearest multiple of 8 to match generate_image()'s internal rounding, preventing GGML_ASSERT failures
  * Uses nearest-neighbor resize for the few pixels of padding needed
- Rename ref_image_bytes to init_image_bytes in JS layer (addon.js) for clarity
- Add integration test: test/integration/generate-image-sd3-i2i.test.js
- Update README with comprehensive img2img documentation:
  * Document dual-path routing strategy
  * Add SDEdit limitations (B&W images, resolution, strength, style biases)
  * Add SD3 img2img example
- Update JSDoc comments in index.js to reflect dual routing behavior
- Fix linting error in img2img-flux2.js (remove stray text on line 13)
Technical details:
The vcpkg version of stable-diffusion.cpp's generate_image() aligns width/height up to spatial_multiple (typically 8) before creating tensors, then asserts that init_image dimensions match exactly. For JPEG/PNG images with non-8-aligned dimensions (e.g. 500×627), this caused assertion failures. The fix detects mismatches and resizes the decoded image to the aligned dimensions before passing to generate_image(). FLUX models are unaffected (use ref_images path with internal auto-resize). SD3 and other models now handle arbitrary input dimensions correctly.
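The alignment fix and dual-path routing described in the technical details can be sketched as follows. `alignUp` mirrors the C++ ceil-alignment `(w + 7) / 8 * 8`; the family strings and the `chooseImg2ImgPath` helper are hypothetical labels for illustration, not the addon's API.

```javascript
// Round a dimension up to the next multiple of `multiple`
// (same result as the integer form (w + 7) / 8 * 8 when multiple is 8).
function alignUp (value, multiple = 8) {
  return Math.ceil(value / multiple) * multiple
}

// Aligned size plus whether the decoded image must be resized before it is
// handed to generate_image(), which asserts an exact dimension match.
function alignedSize (width, height, multiple = 8) {
  const w = alignUp(width, multiple)
  const h = alignUp(height, multiple)
  return { width: w, height: h, needsResize: w !== width || h !== height }
}

// FLUX families use in-context conditioning (ref_images); everything else
// goes through SDEdit (init_image + noise + denoise).
function chooseImg2ImgPath (family) {
  return family === 'flux' || family === 'flux2' ? 'ref_images' : 'init_image'
}
```

For the 500×627 example, `alignedSize(500, 627)` yields 504×632, so only a few pixels of padding are needed. Rounding up rather than to the nearest multiple is what keeps the JS side consistent with the C++ side.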
Made-with: Cursor

* added linting fix
Made-with: Cursor

* fixed integration test

* updated cpp lint

* updated for sizing

* fix(diffusion): fix SD3 img2img integration test OOM on Vulkan CI
- Add vae_on_cpu: true to avoid GPU memory exhaustion during VAE encode/decode on CI runners with limited VRAM
- Reduce steps from 40 to 20 for faster CI execution
- Add null guard on images array to prevent crash when generation fails, producing a clear error message instead
- Regenerate mobile integration test bundle
Made-with: Cursor

* attempting pr start

* fix(diffusion): format cpp files with clang-format
Made-with: Cursor

* fix(diffusion): address PR review — image resize, error handling, alignment
- Replace manual nearest-neighbor resize with stb_image_resize2 linear filtering via a new image_utils::resizeSdImage() utility
- Add null checks with descriptive errors on malloc, resize, and image decode failures
- Throw on failed init_image decode instead of silently skipping, removing one indentation level for readability
- Fix JS/C++ alignment mismatch: Math.round → Math.ceil to match the C++ ceil-alignment ((w + 7) / 8 * 8)
- Fix potential 32-bit overflow in allocation size computation by casting all operands to size_t
Made-with: Cursor

* fix(diffusion): format C++ files with clang-format-19
Made-with: Cursor

* perf(diffusion): use stbi_info_from_memory for efficient dimension decoding
- Replace stbi_load_from_memory with stbi_info_from_memory in decodeDimensions()
- Avoids allocating and loading full pixel data when only dimensions are needed
- Significantly more efficient for image dimension detection
Made-with: Cursor

* fix(diffusion): format test_img2img.cpp with clang-format-19
Made-with: Cursor

* docs(diffusion): add comprehensive guidance scale reference for img2img
- Document CFG scale vs distilled guidance parameter differences
- Add per-model guidance scale recommendations (SD1/SD2, SDXL, SD3, FLUX.2)
- Explain architectural differences: SD3 uses standard CFG while FLUX.2 uses distilled guidance
- Include img2img-specific guidance behavior and examples for each model
- Clarify why FLUX.2 sets cfg_scale=1.0 and uses guidance instead
- Add quick reference code examples for each model family
Made-with: Cursor

* chore: update vcpkg-registry baseline commit
Made-with: Cursor

* fix(diffusion): pin ggml to port-version 4 for Vulkan LSan leak fix
Revert the registry baseline bump and instead use a vcpkg override to pull in only the ggml port-version 4 patch (qvac-registry-vcpkg#119), which fixes LeakSanitizer reports in the Vulkan device cache.
Made-with: Cursor

* fix(diffusion): revert ggml to port-version 3, port-version 4 patch is broken
The ggml-vulkan-device-cache-owned-storage.patch from port-version 4 (qvac-registry-vcpkg#119) fails to apply — the patch context does not match the ggml source at the pinned commit. Reverting to port-version 3 until the registry patch is fixed.
Made-with: Cursor

* fix(diffusion): add ggml overlay port with corrected Vulkan LSan patch
The ggml port-version 4 patch in qvac-registry-vcpkg#119 uses zero-context hunks that git-apply cannot locate. Add a local overlay port with the same fix (unique_ptr ownership for Vulkan device cache) but with proper unified-diff context lines so the patch applies cleanly.
Made-with: Cursor

* fix(diffusion): use ggml port-version 5 from jpgaribotti fork
Use the corrected ggml overlay port from jpgaribotti/qvac-registry-vcpkg, which bumps to port-version 5 with a properly formatted Vulkan device cache patch (includes unified-diff context lines).
Made-with: Cursor

* fix(diffusion): point registry to jpgaribotti fork for ggml port-version 5
Switch default-registry to jpgaribotti/qvac-registry-vcpkg, which has the corrected ggml Vulkan device cache patch (port-version 5). Remove the local overlay port since the fork provides the fix directly.
Made-with: Cursor

* fix: suppress LSAN false positives in diffusion C++ tests
Updates vcpkg registry to tetherto/qvac-registry-vcpkg main (baseline 8778399), which includes the ggml Vulkan device cache fix. Also corrects LSAN suppressions file path in CI workflow to resolve the suppression file within the package workdir.
Made-with: Cursor

* fix: add dbus leak suppressions for test initialization
Made-with: Cursor

* fix: add Windows model download step to cpp-tests workflow
Made-with: Cursor

* fix: reduce SD3 example steps from 100 to 28
SD3 Medium typically needs 20–30 steps; 100 was leftover from experimentation and makes this example ~5x slower than needed.
Made-with: Cursor

* fix: correct example image paths
- img2img-flux2.js: use assets/von-neumann.jpg (works on fresh checkout) instead of temp/von-neumann_transformed.png (doesn't exist)
- img2img-sd3.js: write output to temp/ instead of assets/ (assets are for checked-in test files, not generated images)
Made-with: Cursor

* fix: ensure temp directory exists in example scripts
Made-with: Cursor

* fix: validate init_image is Uint8Array in img2img mode
Prevents users from accidentally passing string paths (e.g., init_image: 'path/to/file.jpg'), which would be misinterpreted as raw bytes and cause cryptic C++ decoding failures. Now throws a clear error with guidance.
Made-with: Cursor

* fix: guard SdImageBatch against nullptr from generate_image()
generate_image() can return NULL on failure (OOM, abort mid-denoise). When it does, SdImageBatch was constructed with data_=nullptr but count_≥1, causing the destructor to dereference nullptr and segfault. Now the destructor, operator[], and release() all check for null before dereferencing. operator[] throws a descriptive error if called on null.
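The `init_image` type check from the validation commit above, combined with the automatic mode selection (`mode = init_image ? 'img2img' : 'txt2img'`) noted earlier, can be sketched as one helper. The function name `resolveMode` and the error wording are hypothetical; only the option name and mode strings come from the notes.

```javascript
// Picks txt2img vs img2img from the presence of init_image and rejects
// values that are not raw bytes (e.g. a file path string).
function resolveMode (params) {
  const img = params.init_image
  if (!img) return 'txt2img'
  if (!(img instanceof Uint8Array)) {
    throw new TypeError(
      'init_image must be a Uint8Array of encoded image bytes ' +
      '(read the file first, e.g. with fs.readFileSync)'
    )
  }
  return 'img2img'
}
```

Because Node's `Buffer` subclasses `Uint8Array`, the result of `fs.readFileSync` passes the check unchanged, while a path string fails fast in JS instead of reaching the C++ decoder.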
Made-with: Cursor

* fix(diffusion): format cpp files with clang-format-19

* fix(readme): clarify config vs parameter serialization

* fix: restore dbus leak suppressions removed by clang-format commit

* fix(diffusion): apply clang-format-19 to test_stb_image_security.cpp

* Update packages/lib-infer-diffusion/addon/src/model-interface/SdModel.cpp
Co-authored-by: gianni-cor <gianfranco.cordella@tether.io>

* Update packages/lib-infer-diffusion/addon/src/model-interface/SdModel.cpp
Co-authored-by: gianni-cor <gianfranco.cordella@tether.io>

* fix(diffusion): format cpp files with clang-format-19

* Revert "fix(diffusion): format cpp files with clang-format-19"
This reverts commit 8082388.

* fix(diffusion): guard FLUX img2img prediction and harden readImageDimensions
- Add JS-side guard in _runInternal() that throws when init_image is present on a FLUX model (llmModel set) but prediction is not explicitly flux2_flow or flux_flow, preventing silent fallback to SDEdit branch
- Add buffer-length checks to readImageDimensions() for truncated PNG (require >= 24 bytes) and JPEG (validate segLen >= 2, guard SOF reads)
- Update prediction docstring in index.d.ts to clarify FLUX img2img requires an explicit prediction value
- Add regression tests for all of the above (13 cases)
Made-with: Cursor

* fix(diffusion): remove FLUX.1 references from documentation
- Update prediction docstring to focus on FLUX.2 img2img guidance
- Remove FLUX.1 from encoder file name comments (keep only relevant models)
- Update error message to reference FLUX.2 only in user-facing guidance
- Keep flux_flow type in PredictionType union for backward compatibility
Made-with: Cursor

* test(diffusion): add input-validation test to mobile integration suite
Register the new input-validation regression tests in the mobile test runner so truncated image and FLUX prediction guard tests run on all platforms.
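The truncation guards added to `readImageDimensions()` can be illustrated with a small header parser for PNG and JPEG. This is a sketch of the technique, not the addon's function: it returns `null` instead of throwing, reads only the PNG IHDR fields and JPEG SOF segments (markers C0-CF except C4, C8, CC), and its exact checks are assumptions.

```javascript
// Returns { width, height } from an encoded PNG/JPEG header, or null when
// the format is unknown or the header is truncated/malformed.
function readImageDimensions (buf) {
  // PNG: 8-byte signature, then IHDR; width/height are big-endian uint32
  // at offsets 16 and 20, so at least 24 bytes are required.
  const PNG_SIG = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]
  if (buf.length >= 8 && PNG_SIG.every((b, i) => buf[i] === b)) {
    if (buf.length < 24) return null // truncated header
    const view = new DataView(buf.buffer, buf.byteOffset, buf.byteLength)
    return { width: view.getUint32(16), height: view.getUint32(20) }
  }
  // JPEG: SOI marker, then length-prefixed segments until a SOF marker.
  if (buf.length >= 2 && buf[0] === 0xff && buf[1] === 0xd8) {
    let i = 2
    while (i + 4 <= buf.length) {
      if (buf[i] !== 0xff) return null // lost marker sync
      const marker = buf[i + 1]
      if (marker === 0xff) { i++; continue } // fill byte
      if (marker === 0xd9 || marker === 0xda) return null // EOI/SOS before any SOF
      if ((marker >= 0xd0 && marker <= 0xd7) || marker === 0x01) { i += 2; continue } // no length field
      const segLen = (buf[i + 2] << 8) | buf[i + 3]
      if (segLen < 2) return null // a length below 2 would never advance
      const isSof = marker >= 0xc0 && marker <= 0xcf &&
        marker !== 0xc4 && marker !== 0xc8 && marker !== 0xcc
      if (isSof) {
        // SOF payload: precision (1 byte), height (2), width (2).
        if (i + 9 > buf.length) return null // truncated SOF
        return { width: (buf[i + 7] << 8) | buf[i + 8], height: (buf[i + 5] << 8) | buf[i + 6] }
      }
      i += 2 + segLen
    }
    return null
  }
  return null
}
```

Each guard corresponds to one failure mode from the commit note: a PNG cut off before IHDR, a JPEG segment length that would stall the scan, and a SOF marker whose fields run past the end of the buffer.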
Made-with: Cursor

* chore(diffusion): bump to 0.2.0 and update changelog
- Bump package version from 0.1.3 to 0.2.0 for img2img feature release
- Update CHANGELOG.md with 0.2.0 entry: FLUX.2 img2img, input validation, regression tests
- Remove stale CHANGELOG (keeping CHANGELOG.md as canonical source)
Made-with: Cursor

* fix(diffusion): revert vcpkg registry baseline to main
Restore default-registry baseline to a9eae49a7c95a63 (matches main). The 87783998cb67fe6 baseline was an unintended change.
Made-with: Cursor

---------

Co-authored-by: gianni-cor <gianfrancocordella@gmail.com>
Co-authored-by: aegioscy <nik@linux64vm.com>
Co-authored-by: Ridwan Taiwo <donriddo@gmail.com>
Co-authored-by: gianni <gianfranco.cordella@tether.io>
Co-authored-by: Juan Pablo Garibotti Arias <juan.arias@bitfinex.com>

* doc: point diffusion README to qvac fork
Clarify that the diffusion addon vendors qvac-ext-stable-diffusion.cpp rather than the upstream stable-diffusion.cpp repository.

Remove 32 Opus/Marian-MT model entries from models.prod.json (712 → 680 entries) and their attributions from both NOTICE files.
Removed entries: all with tags "opus", "marian", or "opus-ggml" and S3 sources containing ggml/marian/ or ggml-opus.
Kept: all IndicTrans, Bergamot, LLM, TTS, STT, OCR, diffusion entries. Also kept mariana-coelho-9 Whisper model (HuggingFace username, not Opus-related).
Made-with: Cursor
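The removal criteria can be expressed as a filter. This sketch assumes each entry has a `tags` array and a `source` string, which is a guess at the `models.prod.json` shape (field and function names are hypothetical), and it treats the tag rule and the source rule as alternatives; if the note means both must hold, replace `||` with `&&`.

```javascript
const REMOVED_TAGS = new Set(['opus', 'marian', 'opus-ggml'])
const REMOVED_SOURCE_PARTS = ['ggml/marian/', 'ggml-opus']

// True when an entry matches either removal criterion. Tags are matched
// exactly, so a name that merely contains "marian" is not affected.
function isOpusMarianEntry (entry) {
  const tags = Array.isArray(entry.tags) ? entry.tags : []
  const source = typeof entry.source === 'string' ? entry.source : ''
  return tags.some(t => REMOVED_TAGS.has(t)) ||
    REMOVED_SOURCE_PARTS.some(part => source.includes(part))
}

function pruneModels (entries) {
  return entries.filter(e => !isOpusMarianEntry(e))
}
```

Matching exact tags and specific path fragments, rather than a bare "marian" substring anywhere, is what keeps an entry like the mariana-coelho-9 Whisper model.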

#1614)

* feat: auto-download models in pivot and indictrans examples

* chore: bump @qvac/translation-nmtcpp to 2.0.3 and update CHANGELOG

---------

Co-authored-by: Ramaz Tskhadadze <bubu@Ramazs-MacBook-Pro-2.local>

Add suites: ["smoke"] to 84 test definitions across 27 files. Curated for API surface coverage, validation quality, and performance.
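Tagging definitions with a `suites` array lets a runner pick a curated subset without maintaining a separate list. A minimal sketch of that selection, assuming each test definition is a plain object with an optional `suites` array; `selectSuite` and the `name` field are hypothetical.

```javascript
// Returns the definitions tagged with `suite`; with no suite requested,
// every definition is kept (the full run).
function selectSuite (definitions, suite) {
  if (!suite) return definitions
  return definitions.filter(d => Array.isArray(d.suites) && d.suites.includes(suite))
}
```

Untagged definitions simply drop out of a smoke run, so adding the tag is the only change a test needs in order to join the subset.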

#1465)

* test: sdk ios device farm poc

* test: test ios build changes in framework, attempt to improve device logs

* fix: clarify naming, fix python logging script error for ios

* fix: attempt to get logs via pymobiledevice3

* test: remove unnecessary python script, update test suite (build improvements)

* fix: add test suite termination on consumer finish for ios

* test: add device farm link to run summary, artifact ios logs

* chore: bump test suite to stable released version

* chore: use caret version instead

…untime (#1480)

* fix: handle unhandled worker process termination in Bare runtime (QVAC-12232)
- Add uncaughtException/unhandledRejection handlers to trigger cleanup on crash
- Add IPC socket close detection to handle parent process termination
- Add PID-based lock file (~/.qvac/.worker.lock) with stale detection on startup
- Fix unloadAllModels to close FilesystemDL loaders (aligned with single-model path)
- Use exit code 1 for crash shutdowns, 0 for graceful shutdowns
Made-with: Cursor

* fix: add fd-lock retry with backoff and corestoreOpts plumbing for registry client
- Add retry with exponential backoff (3 attempts, 500/1000/2000ms) on fd-lock errors in getRegistryClient()
- Plumb corestoreOpts through QVACRegistryClient constructor to Corestore (registry-server)
- Add corestoreOpts to QVACRegistryClientOptions type definition
- SDK will pass wait: true once @qvac/registry-client dep is bumped

* fix: make getRegistryClient single-flight to prevent concurrent fd-lock contention

* fix: address review comments on worker termination PR
- Remove unused isStaleWorkerLock export from worker-lock.ts
- Use optional chaining in unloadAllModels (model-registry.ts)

* fix: instantiate Corestore in constructor, remove _corestoreOpts property

* fix: remove homeDir param from worker-lock, resolve via getEnv internally

* chore: add getQvacPath utility, consolidate ~/.qvac path resolution
Introduces server/utils/qvac-paths.ts with getQvacPath(...subPaths) as the single place that resolves paths under ~/.qvac. Updated worker-lock, cache, config-registry, and hyperdrive to use it instead of manually constructing path.join(HOME_DIR, ".qvac", ...) each time.
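The fd-lock retry and single-flight behavior of `getRegistryClient()` can be sketched generically. The schedule here assumes up to three retries after the first attempt (500, 1000, 2000 ms); `retryWithBackoff`, `singleFlight`, the `isRetryable` predicate, and the `ELOCKED` code used in testing are hypothetical stand-ins, since the real lock error's shape is not described.

```javascript
const sleep = ms => new Promise(resolve => setTimeout(resolve, ms))

// Calls fn, retrying up to `retries` times with exponential backoff
// (baseMs, 2 * baseMs, 4 * baseMs, ...) while isRetryable(err) holds.
async function retryWithBackoff (fn, { retries = 3, baseMs = 500, isRetryable = () => true } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn()
    } catch (err) {
      if (attempt >= retries || !isRetryable(err)) throw err
      await sleep(baseMs * 2 ** attempt)
    }
  }
}

// Single-flight: concurrent callers share one in-progress creation and a
// successful result is reused; a failed creation is forgotten so the next
// call can try again.
function singleFlight (create) {
  let pending = null
  return () => {
    if (!pending) {
      pending = create().catch(err => {
        pending = null
        throw err
      })
    }
    return pending
  }
}
```

Composed as `singleFlight(() => retryWithBackoff(openClient, { isRetryable }))`, with `openClient` as a placeholder, concurrent callers never race each other for the lock, and the one shared attempt absorbs transient contention from other processes.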

)

* chore: remove Opus NMT engine from SDK API surface
Remove Opus engine literal from NMT_ENGINES, opusConfigSchema, MARIAN_LANGUAGES, MarianLanguage type, generateNmtOpusName utility, Opus examples, and Marian Opus test definitions and resources.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: bump @qvac/translation-nmtcpp to ^1.0.1
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Ramaz Tskhadadze <bubu@Ramazs-MacBook-Pro-2.local>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

#1627) This reverts commit 436e29c.

* Updated SDK translation-config.ts and released-models-bergamot.txt

* Ran update-models.

* doc: api v0.9 manually written

* doc: SDK - api reference - v0.9.0

* infra: centralize Vulkan SDK setup for prebuild workflows

* Enable version selection of vulkan sdk

* Use config file to hold vulkan sdk version
This pins the vulkan sdk version in a single location so we don't have to hard-code it in a workflow, while still allowing specific prebuilds to override it if needed.

@TSV

…all platforms (#1625)

* wip: perf reporting automation and integration test updates

* feat: add clinical chemistry quality test with dedicated ground truth
Separate user-provided clinical chemistry lab result image from the existing lab_results.png (which is a different document). Adds dedicated test and ground truth file for accurate quality evaluation.
Made-with: Cursor

* feat: enhance performance reporting for mobile integration tests
- Added steps to download Device Farm artifacts and extract performance reports for mobile tests in the integration workflow.
- Updated performance report generation to include HTML and JSON outputs for mobile tests.
- Refactored performance reporter utility to support runtime module configuration for Bare compatibility.
This improves the visibility of performance metrics for mobile integration tests and ensures consistent reporting across platforms.

* feat: add liver function test image and regenerate mobile test runners
- Add liver_function_test.png (Simone's benchmark image) with ground truth
- Create doctr-liver-function.test.js integration test
- Regenerate integration.auto.cjs to include all 18 tests (was missing 3)
- Add liver_function_test.png, clinical_chemistry.png, ct_scan_report.png to mobile CI testAssets copy step

* fix: re-authenticate AWS OIDC before Device Farm log download
The OIDC token from the initial auth may expire during the 2hr Device Farm test run. Add a fresh configure-aws-credentials step before the artifact download so list-jobs/list-artifacts calls succeed.

* chore: remove unnecessary OIDC re-auth before log download
OCR tests complete well within the 2hr OIDC token lifetime.

* fix: resolve standardjs lint errors in OCR integration tests

* feat: add CPU/GPU variants to OCR medical image tests
Each test now runs twice — once with useGPU: false [CPU] and once with useGPU: true [GPU] — so the performance report clearly shows side-by-side CPU vs GPU timings per image.
Labels include [CPU]/[GPU] tags which the reporter uses to set the execution_provider field in the JSON/HTML output.

* fix: resolve mobile bundling failure and desktop test issues
- Use dynamic require via path.join for performance-reporter and quality-metrics modules so bare-pack cannot statically resolve them during mobile bundling (fixes MODULE_NOT_FOUND on iOS/Android)
- Provide no-op fallbacks when modules are unavailable in mobile bundle
- Replace 'liver' with 'pathology' in liver function test assertions since OCR reads the header as 'VER.FUNCTION' not 'LIVER'
- Flush performance report from run-with-exit.js before writing exit code, since bare is killed by run-tests.sh before exit handler fires
- Add debug logging to CI workflow HTML report generation step
Made-with: Cursor

* fix: expand expected words for OCR medical document tests
Add more reliably-detected words to each test to strengthen assertions — verified against actual CI OCR output.
- liver_function_test: +8 words (biochemistry, hospital, conjugated, unconjugated, ratio, specimen, investigation, total)
- lab_results: +10 words (medivista, hospital, biochemistry, department, arterial, gases, oxygen, electrolyte, metabolite, oximetry)
- ct_scan: +8 words (allied, medical, center, patient, heart, trachea, vascular, normal)
Made-with: Cursor

* fix: resolve desktop artifact collision and Android AAPT2 build failure
- Use matrix.os instead of matrix.platform in desktop performance report artifact names to avoid 409 Conflict when linux-x64 and linux-arm64 jobs upload to the same name
- Re-encode ct_scan_report.png, liver_function_test.png, and clinical_chemistry.png as actual PNG format — they were JPEG data with .png extensions, causing AAPT2 to fail during Android resource compilation
Made-with: Cursor

* fix: enable performance reporting on mobile Device Farm runs
The mobile no-op fallback silently discarded all metrics because scripts/test-utils/ is outside the bare-pack bundle.
Replace with a lightweight inline reporter that records metrics in memory and outputs [PERF_REPORT_START]...[PERF_REPORT_END] markers to console. On mobile, write markers after every test recording so the last (most complete) report is always available in Device Farm logs even if the process is killed before exit handlers fire. Update extract-from-log.js to find the last marker pair instead of the first, so it picks up the fully accumulated report.
Made-with: Cursor

* fix: graceful DocTR model download failure on mobile
- ensureDoctrModels returns null on mobile when downloads fail instead of letting unhandled rejection SIGABRT BareKit
- Medical test files (ct-scan, lab-results, clinical-chemistry, liver-function) skip gracefully when models unavailable
- Mobile gets 5 retries with 10s backoff (was 3/5s)
- downloadDoctrModel checks ocr-model-urls.json on mobile first for alternative URLs (future S3 presigned URL support)
- Added DocTR model URLs to generate-ocr-presigned-urls.sh output
Made-with: Cursor

* fix: download device logcat from Device Farm for perf report extraction
The workflow only downloaded --type FILE artifacts (test spec output), but app console.log goes to device logcat which is --type LOG. Performance markers were never found because they live in DEVICE_LOG and LOGCAT artifacts.
- Add --type LOG artifact download alongside --type FILE
- Update extract-from-log.js to handle JSON logcat format (Device Farm stores logcat as JSON arrays with message fields)
Made-with: Cursor

* fix: capture device logs via Appium getLogs for perf report extraction
Root cause: console.log from BareKit goes to device logcat/syslog, NOT to the Appium test spec output. The extract script was scanning TESTSPEC_OUTPUT which never contained the markers.
Fix: Add wdio after hook that calls browser.getLogs() to pull device logs into TESTSPEC_OUTPUT where extract-from-log.js can find them.
- Both Android/iOS wdio configs: add after hook using getLogs('logcat') and getLogs('syslog') to dump perf markers to testspec console
- Android post_test: adb logcat -d backup dump to DEVICEFARM_LOG_DIR
- iOS post_test: search all files in DEVICEFARM_LOG_DIR for markers
- Also download --type LOG artifacts (DEVICE_LOG/LOGCAT) as fallback
Made-with: Cursor

* fix: use Appium pullFile for mobile perf report extraction
Replace unreliable console.log-to-device-log chain with file-based approach: inline reporter writes perf JSON to disk, wdio after hook pulls it via Appium pullFile API. Android tries multiple sandbox paths, iOS uses known @bundleId:documents/ path. getLogs kept as tertiary fallback. Android post_test adds adb find+cat for path discovery.
Made-with: Cursor

* fix: prevent false positive perf marker extraction from wdio config
The cat of wdio.config.devicefarm.js in the testspec printed the literal JS code console.log("[PERF_REPORT_START]"+json+"[PERF_REPORT_END]") to TESTSPEC_OUTPUT. extract-from-log.js picked up "+json+" as valid JSON (a JSON string literal) and wrote it as the report. Two fixes:
- Remove cat of wdio config from testspec (eliminates false positive source)
- Add isValidReport() check in extract-from-log.js requiring schema_version and results array (defense in depth against any future false positives)
Made-with: Cursor

* fix: call writeReport() after each test on mobile, not just on exit
writeReport() was only called inside _flushPerfReport() which runs on process.on('exit') — unreliable on BareKit. The file never got written, so pullFile had nothing to retrieve. Now writeReport() is called after each test alongside writeToConsole(), progressively writing cumulative results to global.testDir/perf-report.json.
Made-with: Cursor

* fix: resolve incomplete mobile perf report (truncation, permissions, multi-device)
Three issues caused the Android report to show only 3 of many results:
1. Logcat ~4KB line truncation: writeToConsole included the output field (hundreds of detected text strings per test), causing the JSON to exceed the logcat line limit. Stripped input/output fields from console payload; writeReport file still has the full data.
2. pullFile permission denied: Device Farm adb can't access app sandbox. Replaced adb find+cat with run-as <pkg> cat which executes as the app user and can read private files. Wraps output in PERF markers.
3. Single-device extraction: extract-from-log.js exited after first valid report. Now scans ALL files and picks the report with the most results.
Made-with: Cursor

* fix: sanitize logcat control chars and fix run-as extraction on Device Farm
Two issues from run 614:
1. Logcat entries from getLogs contain embedded control characters (ASCII 0x00-0x1F) that break JSON.parse. Added regex sanitization in extractFromText to strip control chars before parsing.
2. The multi-line run-as block in post_test didn't expand ${PERF_JSON} on Device Farm. Replaced with simple single-line commands: run-as cat to a file in DEVICEFARM_LOG_DIR, then cat to stdout.
Made-with: Cursor

* fix: write per-device performance reports with actual device model names
Device Farm organizes artifacts by device (e.g. Apple_iPhone_16_Pro/). Previously the extract script picked only the best single report and device.name was just "ios"/"android". Now when multiple devices are found, each gets its own performance-report.json tagged with the real device name, which the aggregate script discovers and groups by device.
Made-with: Cursor

* feat: enable quality metrics (CER/WER/keyword/KV) on mobile reports
Previously quality evaluation was stubbed out on mobile because quality-metrics.js couldn't be loaded by bare-pack. Now the core algorithms (Levenshtein, CER, WER, keyword detection, KV accuracy) are inlined in the mobile fallback, and findGroundTruth reads .quality.json files from global.assetPaths.
The workflow now also copies ground truth JSON files to testAssets for mobile bundling.
Made-with: Cursor
* fix: multi-device HTML generation, artifact upload, and run_number injection
Three bugs identified during end-to-end mobile pipeline audit:
1. Workflow checked `if [ -f performance-report.json ]` at the root, but multi-device extraction writes per-device subdirectories only. Changed to `find` so aggregate.js runs for any layout.
2. Upload artifact paths only listed root-level files. Added glob to include per-device subdirectory JSONs.
3. Mobile reports lacked run_number (not available on Device Farm). Added --run-number flag to extract-from-log.js; workflow now passes github.run_number so aggregate HTML shows proper run columns.
Made-with: Cursor
* feat: split mobile OCR tests into perf + regular Device Farm runs
- Add test-groups.json to define perf (4 medical tests) and regular groups
- Run each perf test 3 times for mean + stddev averaging
- Schedule 2 parallel Device Farm runs per platform (perf + regular)
- Add __TEST_FILTER__ + __MOCHA_GREP__ for app-level and mocha-level filtering
- Monitor both runs concurrently, check both for pass/fail
- Download artifacts from perf run only for report extraction
- Fix duplicate run_number columns in aggregated reports
Made-with: Cursor
* fix: aggressive control char sanitization + filter perf results at extraction
- Strip ALL ASCII control characters (0x00-0x1F) from JSON between perf markers, fixing "Bad control character at position 1004" on Android
- Add --filter flag to extract-from-log.js to keep only results matching a regex pattern (e.g. medical test labels)
- Add perf_report_filter to test-groups.json with medical test label pattern
- Workflow passes --filter to extraction step so reports only contain perf test data even if non-perf tests also ran
Made-with: Cursor
* fix: show individual iteration columns and fix mobile quality metrics
Report tables now display Run 1, Run 2, Run 3 columns (from the values array) instead of collapsing all iterations into a single Run #NNN column. Header shows CI run numbers and iteration count separately. CER/WER computation now sorts tokens alphabetically before comparison so reading-order differences between platforms (mobile bottom-to-top vs desktop top-to-bottom) do not inflate error rates. Mobile CER drops from ~81% to ~12%, matching desktop.
Made-with: Cursor
* fix: resolve Android perf report extraction failures
Three root cause fixes for the Android performance reporting issues:
1. Mocha grep causing WDIO early exit: The grep patterns were function names from test-groups.json, NOT WDIO spec test titles. This caused WDIO to skip all spec tests and exit immediately without waiting for the app to finish running tests, producing incomplete reports (only 4 results captured instead of 15+). Fixed by setting grep to "." (match-all) and relying on post-extraction --filter for test selection.
2. JSON parse errors on Android logcat: When console.log output spans multiple logcat lines, Android injects timestamp/PID/tag prefixes into the middle of the JSON. Added regex to strip these prefixes during extraction.
3. Missing clean extraction source: Added marker-wrapped output in the Android post_test phase using run-as cat, providing a clean secondary extraction source when the WDIO after hook fails (e.g., app crash on Pixel 9 Pro).
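The sketch below shows marker-based extraction with logcat prefix stripping and control-character removal. It is a minimal illustration under several assumptions:

- The closing marker is named `[PERF_REPORT_END]` (a guess).
- Logcat uses the "threadtime" line format.
- A report is an object with a `results` array.

All function names are hypothetical. The sketch does not cover the later `[0-0]` and ReactNativeJS wrapper stripping.

```javascript
// Hypothetical sketch of perf-report extraction from a raw log dump.
const START = '[PERF_REPORT_START]';
const END = '[PERF_REPORT_END]'; // assumed name of the closing marker

// Android "threadtime" prefix, e.g. "05-11 15:00:16.123  1234  5678 I bare    : ".
const LOGCAT_PREFIX = /^\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}\s+\d+\s+\d+ [VDIWEF] [^:]*?\s*: /gm;
// Every ASCII control character 0x00-0x1F (this also joins split lines).
const CONTROL_CHARS = /[\x00-\x1F]/g;

// Strip injected logcat prefixes first (they sit at line starts), then control chars.
function cleanPayload(raw) {
  return raw.replace(LOGCAT_PREFIX, '').replace(CONTROL_CHARS, '');
}

function extractReports(text) {
  const reports = [];
  let from = 0;
  while (true) {
    const s = text.indexOf(START, from);
    if (s === -1) break;
    const e = text.indexOf(END, s + START.length);
    if (e === -1) break;
    from = e + END.length;
    const payload = cleanPayload(text.slice(s + START.length, e)).trim();
    // Skip non-JSON content, e.g. markers that appear inside logged spec source.
    if (!payload.startsWith('{') && !payload.startsWith('[')) continue;
    try {
      reports.push(JSON.parse(payload));
    } catch {
      // corrupted marker pair: ignore it
    }
  }
  return reports;
}

// Keep the report that recovered the most results.
function pickBestReport(reports) {
  return reports.reduce((best, r) => {
    const n = Array.isArray(r.results) ? r.results.length : 0;
    const bestN = best && Array.isArray(best.results) ? best.results.length : -1;
    return n > bestN ? r : best;
  }, null);
}
```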
Made-with: Cursor
* fix: write perf report to file instead of console.log to avoid interleaving
The JSON parse errors (Expected ':' after property name) are caused by WDIO debug-level logging interleaving with console.log output when the JSON string is large. Node.js stdout.write splits large strings across multiple chunks, and WDIO debug output gets inserted between chunks.
Fix: write pullFile JSON to a local file (perf-report-extract.json) via fs.writeFileSync in the WDIO after hook, then output it cleanly from the post_test phase using cat with markers. This completely avoids the console.log interleaving problem.
Also removes the getLogs('logcat') marker printing from both Android and iOS WDIO configs; these were a major source of corrupted duplicate markers. The file-based approach is the primary extraction method now.
Added diagnostic char-level logging to extract-from-log.js to capture what corruption pattern exists if any marker pairs still fail to parse.
Made-with: Cursor
* fix: correct lab_results ground truth and add quality verification tooling
lab_results.quality.json described a KIMS-ICON Hospital report but the actual test image is from Medivista Central Hospital. Rewrote the entire ground truth (reference_text, keywords, key_values) to match the image. CER drops from 39.6% to ~18%.
Added verify-quality.js script for independent metric auditing and expandable diagnostic details in the HTML quality report.
Made-with: Cursor
* fix: make keyword and KV matching order-independent for multi-word phrases
Keyword detection and KV accuracy were using substring matching which fails when OCR text regions are in spatial rather than reading order. Multi-word keywords like "ALLIED CARE EXPERTS" never match when the OCR outputs "EXPERTS", "CARE", "ALLIED" as separate tokens in reverse.
Now falls back to word-level matching: checks that every word of the keyword exists anywhere in the OCR output.
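The substring-then-word-level fallback can be sketched as below. This is a minimal illustration with hypothetical function names. It assumes each keyword word is checked with a plain substring test against the OCR text and that no case folding is applied. The real code in quality-metrics.js and utils.js may normalize differently.

```javascript
// Hypothetical sketch of order-independent keyword detection.
function keywordFound(keyword, ocrText) {
  // 1. Original behaviour: the whole phrase appears verbatim.
  if (ocrText.includes(keyword)) return true;
  // 2. Fallback: every word of the phrase appears somewhere in the OCR output,
  //    in any order (assumed: plain substring test per word).
  const words = keyword.split(/\s+/).filter(Boolean);
  return words.length > 0 && words.every((w) => ocrText.includes(w));
}

// Fraction of expected keywords detected (1 for an empty list, an assumed convention).
function keywordDetectionRate(keywords, ocrText) {
  if (keywords.length === 0) return 1;
  const hits = keywords.filter((k) => keywordFound(k, ocrText)).length;
  return hits / keywords.length;
}
```

A per-word substring test is lenient: it also accepts "CARE" inside "CAREFUL". Matching against a set of whitespace-split OCR tokens is the stricter alternative.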
Applied to both desktop (quality-metrics.js) and mobile (utils.js) evaluation paths. Local results: clinical_chemistry KW 57%→94%, KV 29%→83%; ct_scan KW 79%→97%; liver_function KV 70%→90%.
Made-with: Cursor
* fix: add detailed KV mismatch table to HTML quality diagnostics
The expandable diagnostic section now shows a table for each unmatched key-value pair with columns for expected key, expected value, and whether each was individually found in the OCR output. Helps identify whether failures are key misreads vs value misreads.
Made-with: Cursor
* feat: add quality metrics methodology section to HTML report
Adds a detailed "How We Measure" section explaining CER, WER, keyword detection, and KV accuracy: what each metric means, how it is calculated, what the formula is, and why tokens are sorted alphabetically. Includes concrete examples of OCR misreads and a note on reading-order independence.
Made-with: Cursor
* fix: Android perf report extraction failures
Three issues addressed:
1. extract-from-log.js was parsing test spec SOURCE CODE as JSON because Device Farm logs the spec before running it, and the printf statements contain literal [PERF_REPORT_START] markers. Now skips content that doesn't start with { or [ (not JSON).
2. Android WDIO after hook only tried 4 app sandbox paths but the app may write to /tmp or /data/local/tmp. Added those paths and validation that pulled content is actual JSON before writing. Also added diagnostic file listing when all paths fail.
3. Android post_test run-as fallback now tries files/, cache/, and ./ subpaths instead of just files/. Added diagnostic commands to list perf-report files in both test package dir and app sandbox.
Made-with: Cursor
* feat: add Word Recognition Rate metric alongside CER/WER
Adds Dima-style single-word detection metric (Word Recognition Rate) to both desktop and mobile quality evaluation.
This lets the team compare against the Android benchmark baseline (~97%) while also seeing the stricter CER/WER numbers. The HTML methodology section now explains both approaches and when each matters.
Made-with: Cursor
* feat: embed test images in HTML report, add CI step summary, fix clinical chemistry CPU/GPU
- Clinical chemistry test now runs both CPU and GPU variants like the other OCR tests, fixing the missing [CPU]/[GPU] tag in reports
- HTML quality report embeds source images as base64 thumbnails next to each test row (hover to zoom, click to open full size)
- Performance reports now include image_path in the JSON schema
- Added GitHub Step Summary for both desktop and mobile CI workflows so Olya and others can see results directly on the CI job page without downloading artifacts
Made-with: Cursor
* fix: limit perf report to medical image tests only, improve image lightbox
- Only the 4 medical image tests (clinical_chemistry, ct_scan, lab_results, liver_function) now write to the performance report. Other tests (basic, french, models, ocr-basic) use skipReport flag so they still log metrics but don't appear in the HTML/CI report.
- Replaced broken href-based image click with a proper JS lightbox overlay (90vw/90vh, click or Escape to close).
Made-with: Cursor
* fix: overhaul Android perf report extraction with 4-strategy approach
Previous approach relied solely on Appium pullFile which consistently fails on Android Device Farm. New post_test extraction uses:
1. Logcat path hint: grep PERF_REPORT_PATH from logcat to get the exact path the app wrote to, then adb shell cat it
2. Known system paths: /tmp and /data/local/tmp
3. run-as sandbox paths: files/, cache/, ./
4. Device-wide find: adb shell find /data -name perf-report.json
Also writes WDIO pullFile output to DEVICEFARM_LOG_DIR (absolute path) instead of relative CWD which may differ from post_test working dir.
Made-with: Cursor
* fix: split WDIO config into separate step to fix GH Actions expression limit
The inline WDIO config + testspec generation was a single run: block exceeding GitHub Actions' 21000 character expression limit. Split into two steps:
1. "Define WDIO Config": writes platform-specific WDIO config to a temp file using heredoc (no character limit issues)
2. "Create and Upload Test Spec": reads the config file and generates the Device Farm test spec YAML
Also compacted the Android post_test extraction to reduce line count while keeping all 4 extraction strategies (logcat path, system paths, run-as, device-wide find).
Made-with: Cursor
* fix: replace sed -i with portable sed + mv for macOS compatibility
BSD sed (macOS/iOS runners) requires a backup extension with the -i flag. Use sed to a temp file + mv instead, which works on both GNU and BSD.
Made-with: Cursor
* fix: add Device Farm log diagnostics and upload raw artifacts
We keep failing to extract Android perf reports but can't see WHY because the TESTSPEC_OUTPUT.txt content isn't visible. This adds:
1. "Diagnose Device Farm output" step that greps TESTSPEC_OUTPUT.txt for perf-related lines and prints the last 50 lines (post_test)
2. Uploads raw Device Farm logs as a separate artifact so they can be downloaded and inspected directly
This will tell us exactly what the Device Farm test spec outputs during post_test and whether the extraction strategies are running.
Made-with: Cursor
* fix: Android perf report extraction (search logcat and download customer artifacts)
Root cause analysis:
- The app's console.log([PERF_REPORT_START]...) goes to Android logcat, NOT to TESTSPEC_OUTPUT.txt (which only captures test spec stdout)
- writeReport() writes to global.testDir || '/tmp' which may fail on Android
- WDIO pullFile is unreliable
- post_test adb strategies all fail because the file was never written
- Result: TESTSPEC_OUTPUT.txt has NO perf markers
Three fixes:
1. Search logcat_full.txt for [PERF_REPORT_START] markers in post_test and pipe them to stdout (TESTSPEC_OUTPUT.txt) as a fallback
2. Download CUSTOMER_ARTIFACT files from Device Farm (files placed in $DEVICEFARM_LOG_DIR during post_test, including logcat_full.txt)
3. Enhanced diagnostic step showing all downloaded files, marker search across all files, post_test execution evidence, and device log contents
Also improved extract-from-log.js to list all scanned files with sizes.
Made-with: Cursor
* fix: rewrite WDIO after hook with 3 extraction strategies for Android
The after hook now has 3 strategies to get the perf report:
1. pullFile: existing, tries known device paths (often fails)
2. getLogs("logcat"): NEW, reads app console output via Appium, searches for [PERF_REPORT_START] markers the app already emits
3. mobile:shell: NEW, runs adb cat via Appium to read the file
When any strategy succeeds, the report is:
- Written to perf-report-extract.json (for post_test to find)
- Output to console.log with markers (goes directly to TESTSPEC_OUTPUT.txt, no post_test dependency needed)
Made-with: Cursor
* fix: broken artifact download (spaces in name) and macOS bash syntax
1. Artifact name field contains spaces (e.g. "Test spec output") which broke read -r TYPE NAME URL splitting. Fix: use jq @tsv with IFS=$'\t' to properly delimit fields.
2. ${PLATFORM^} is Bash 4+ syntax, fails on macOS. Removed.
Made-with: Cursor
* fix: parse Android perf reports from logcat with interleaved content
Root cause from CI logs:
- Samsung S25: logcat markers ARE in TESTSPEC_OUTPUT (52 occurrences!) but parser fails because bare runtime splits output across logcat lines, and other logs interleave between START and END markers
- Pixel 9 Pro: logcat buffer overflow, markers lost before post_test
Three fixes:
1. extract-from-log.js parser rewrite:
   - When outer START/END pair fails to parse (interleaved content), search for inner START markers closer to the END; the ReactNativeJS bridge has complete reports on single lines
   - Fix logcat prefix regex: handle "bare :" (spaces before colon)
   - Strip ReactNativeJS wrapper: '[Bare]', '...'
2. Mobile writeReport() writes to multiple locations:
   - global.testDir (app sandbox)
   - /data/local/tmp/ (world-readable, adb accessible)
   - /tmp (fallback)
   This ensures adb can read the file without root/run-as
3. Fix diagnostic grep integer comparison error
Made-with: Cursor
* fix: chunk large perf reports to avoid Android logcat truncation
Android logcat has a ~4096 byte line limit. Reports with 4+ test results exceed this and get truncated, causing the parser to only find 3 of 6 results.
- writeToConsole now splits large JSON into 2000-char numbered chunks using [PERF_CHUNK:<id>:<index>:<total>] markers
- extract-from-log.js reassembles chunks (deduplicating bare/RN tags) and picks the report with the most results
- WDIO after hook: added logcat chunk reassembly + run-as extraction
- post_test: greps for PERF_CHUNK markers in logcat_full.txt
- Diagnostic step now also counts PERF_CHUNK occurrences
Small reports (<2000 chars) still use the original START/END format.
Made-with: Cursor
* fix: robust Android chunked perf report extraction
- Strip WDIO [0-0] prefix and conditionally strip ReactNativeJS trailing quote in cleanJsonFromLogcat
- Keep longest chunk content per index instead of first-seen to mitigate logcat truncation
- Add sanitizeChunkContent and brace-boundary trimming fallback
- Reduce chunk size from 2000 to 800 chars for wider logcat margin
- Apply same fixes to WDIO after hook inline JS
- Fix missing closing brace that caused SyntaxError in workflow
Made-with: Cursor
* fix: fallback to repo test images when device paths unavailable
Device-local image_paths (e.g. /var/mobile/...) don't exist on the CI runner.
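The `[PERF_CHUNK:<id>:<index>:<total>]` scheme can be sketched from both the writer and the reader side. This is a minimal illustration under these assumptions:

- Indices are 0-based.
- A chunk's content runs to the end of its log line.
- The id contains no `:` or `]`.

The function names are hypothetical. The reader applies the "keep the longest copy per index" rule described above.

```javascript
// Hypothetical sketch of PERF_CHUNK writing and reassembly.
const CHUNK_SIZE = 800; // chunk size after the "robust ... extraction" commit

// Writer side: split one JSON string into numbered marker lines.
function toChunkLines(json, id) {
  const total = Math.ceil(json.length / CHUNK_SIZE);
  const lines = [];
  for (let i = 0; i < total; i++) {
    const part = json.slice(i * CHUNK_SIZE, (i + 1) * CHUNK_SIZE);
    lines.push(`[PERF_CHUNK:${id}:${i}:${total}]${part}`);
  }
  return lines;
}

const CHUNK_RE = /\[PERF_CHUNK:([^:\]]+):(\d+):(\d+)\](.*)/;

// Reader side: group by id, keep the longest copy of each index (duplicates
// from different log tags may be truncated differently), parse complete sets.
function reassemble(logText) {
  const sets = new Map(); // id -> { total, parts: Map<index, content> }
  for (const line of logText.split('\n')) {
    const m = CHUNK_RE.exec(line);
    if (!m) continue;
    const [, id, idxStr, totalStr, content] = m;
    const idx = Number(idxStr);
    if (!sets.has(id)) sets.set(id, { total: Number(totalStr), parts: new Map() });
    const { parts } = sets.get(id);
    const prev = parts.get(idx);
    if (prev === undefined || content.length > prev.length) parts.set(idx, content);
  }
  const reports = [];
  for (const { total, parts } of sets.values()) {
    if (parts.size !== total) continue; // incomplete set, skip it
    let json = '';
    for (let i = 0; i < total; i++) json += parts.get(i);
    try {
      reports.push(JSON.parse(json));
    } catch {
      // corrupted set: ignore it
    }
  }
  return reports;
}
```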
Fall back to packages/ocr-onnx/test/images/ by filename so thumbnails are embedded in the HTML report.
Made-with: Cursor
* fix: write perf report to console once at exit instead of per iteration
writeToConsole() was called after every test iteration, dumping the full cumulative report each time. With 24 iterations this produced ~1300 logcat entries of chunk data, exceeding the 10000-entry cap of browser.getLogs("logcat") and causing the Pixel to only recover 19/24 results.
Now writeToConsole() runs once on process exit via the existing _flushPerfReport handler, reducing chunk entries to ~70. File-based writeReport() still runs per iteration for crash safety.
Made-with: Cursor
* fix: use lightweight per-iteration perf writes to avoid logcat buffer overflow
Per-iteration writeToConsole calls now strip verbose quality arrays (words_missed, keywords_missing, key_values_unmatched, etc.) keeping only summary metrics (cer, wer, rates). This reduces logcat entries from ~838 to ~252 for 24 iterations, safely within the 10K buffer. The exit handler still writes the full report with all quality details.
Also prefer last chunk set when result counts are equal (>= vs >).
Made-with: Cursor
* fix: poll results file for stability before extracting perf report
The WDIO after hook now polls the on-device perf-report.json every 5s (up to 120s) and waits for the result count to stabilize before extracting. This fixes Pixel 9 Pro getting 20/24 results because the after hook was firing before the app finished all test iterations. Samsung is unaffected (stabilizes within 10-15s). iOS unchanged.
Made-with: Cursor
* fix: include image_path in writeToConsole output for Android reports
Android pullFile always fails on Device Farm (permission issues), so reports are extracted from logcat via writeToConsole. The writeToConsole method was stripping image_path, causing empty image_paths in Android HTML reports. Now image_path is preserved in both lightweight and full console output.
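The lightweight console payload can be sketched in a few lines. It also includes the every-6th-result checkpoint rule from the next commit. This is a minimal illustration assuming each result keeps its quality metrics under a `quality` object. The verbose key list mirrors the three names in the commit message, whose real list ends with "etc.". The function names are hypothetical.

```javascript
// Hypothetical sketch: strip verbose quality arrays except at checkpoints.
const VERBOSE_QUALITY_KEYS = ['words_missed', 'keywords_missing', 'key_values_unmatched'];
const CHECKPOINT_EVERY = 6; // matches the 6 results per test file described in the commits

// Copy a result without the verbose arrays; image_path and other fields survive.
function lightweightResult(result) {
  if (!result.quality) return result;
  const quality = { ...result.quality };
  for (const key of VERBOSE_QUALITY_KEYS) delete quality[key];
  return { ...result, quality };
}

// Full report at checkpoints, lightweight copy otherwise; the input is not mutated.
function consolePayload(report) {
  const n = report.results.length;
  const isCheckpoint = n > 0 && n % CHECKPOINT_EVERY === 0;
  return isCheckpoint ? report : { ...report, results: report.results.map(lightweightResult) };
}
```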
Only ~2 extra chunks for 24 results, which is negligible.
Made-with: Cursor
* fix: write full quality at checkpoints and bail poll early on Android
Per-iteration writeToConsole now writes full quality (words_missed, keywords_missing, key_values_unmatched, etc.) every 6th result (matching test file boundaries), and lightweight otherwise. The 24th result's checkpoint write includes full quality for ALL results. Total logcat entries: 123 (down from 252 lightweight-only).
Also fix stability poll to bail after 2 consecutive pullFile failures so Android doesn't waste 120s when pullFile is unavailable.
Made-with: Cursor
* fix: use mobile:shell for stability poll and direct file extraction
Replace pullFile (which always fails on Android Device Farm) with mobile:shell (adb cat) for the stability poll. When the poll succeeds, use the full file content directly, bypassing logcat entirely. This gives full quality data, image_path, and waits for the Pixel to finish all tests. Bail after 3 consecutive shell failures.
Made-with: Cursor
* feat: combined perf report across all platforms + unified summary table
- Add combine-reports job that runs after both Android/iOS matrix jobs, downloads both artifacts, and generates a single HTML/MD/JSON report with all 4 devices (Samsung, Pixel, iPhone 16 Pro, iPhone 17)
- Rewrite markdown summary to use combined tables: rows=test×EP, cols=devices side-by-side (replaces separate per-device tables)
- Increase Android stability poll timeout from 120s to 240s for Pixel
- Add HTML report artifact note in GitHub step summary
Made-with: Cursor
* chore: remove per-platform summaries, use combined report only
Made-with: Cursor
* fix: increase stability threshold to 30s to avoid premature poll exit
The poll was exiting after just 10s of no change (stableCnt>=2), but the gap between test files (e.g. lab_results→liver_function) on Pixel exceeds 10s due to model loading. Pixel had 18/24 results because the poll declared "stable" during the inter-file gap.
Now requires 30s of no change (stableCnt>=6) before exiting.
Made-with: Cursor
* feat: combined report includes desktop + mobile, all platforms in one table
- Download ALL perf-report-ocr-* artifacts (desktop + mobile) using pattern matching instead of individual Android/iOS downloads
- Add preprocessing step to patch desktop device names from artifact directory names (e.g. "ubuntu-24.04-x64", "macos-15-arm64")
- Strip "-xlarge" from runner size suffixes in short device names
- Combined HTML and summary table now shows all platforms side by side
Made-with: Cursor
* fix: add /tmp/ to poll paths and remove premature bail-out
Root cause: Pixel writes perf-report.json to /tmp/ (fallback) instead of /data/local/tmp/ (which Samsung uses). The poll only checked /data/local/tmp/ and /data/data/, so the file was never found. Additionally, pollMiss>=3 bailed out after just 15s of not finding the file, before the Pixel even finished writing.
Changes:
- Add /tmp/perf-report.json and /data/user/0/ paths to poll
- Remove pollMiss>=3 bail-out entirely; poll always runs to timeout or until stableCnt>=6 (30s of no change)
- Add /tmp/ to mobile:shell fallback paths
- Add periodic progress logging during poll
Made-with: Cursor
* fix: add external storage paths for Pixel perf report extraction
Pixel 9 Pro logcat has zero Bare runtime output, making logcat-based extraction impossible. The app also can't write to /data/local/tmp/ or /tmp/ on Pixel (EACCES), and ADB can't read the app's private directory.
Fix by writing perf-report.json to Android external app storage (/sdcard/Android/data/<pkg>/files/) which is writable by the app without permissions and readable by ADB on all devices.
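The final stability-poll behaviour can be sketched as follows. It reads every 5s, stops after 6 unchanged reads (30s of no change) or at the 240s timeout, and has no early bail-out on misses. This is a minimal illustration in which `readCount` and `sleep` are injected, hypothetical helpers. In the real WDIO hook the read would go through mobile:shell (adb cat), and whether a missing file resets the counter is an assumption here.

```javascript
// Hypothetical sketch of the on-device results stability poll.
async function pollUntilStable({
  readCount, // async () => number of results, or null if the file is missing/unreadable
  sleep, // async (ms) => void
  intervalMs = 5000,
  timeoutMs = 240000,
  stableNeeded = 6, // 6 unchanged reads x 5s = 30s of no change
}) {
  let last = null;
  let stableCnt = 0;
  for (let elapsed = 0; elapsed <= timeoutMs; elapsed += intervalMs) {
    if (elapsed > 0) await sleep(intervalMs);
    const count = await readCount();
    if (count !== null && count === last) {
      stableCnt++;
      if (stableCnt >= stableNeeded) return { count, stable: true, elapsed };
    } else {
      stableCnt = 0; // count changed or file missing: start over
    }
    last = count;
  }
  return { count: last, stable: false }; // timed out without stabilizing
}
```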
Made-with: Cursor
* fix: add safety nets for Pixel perf report extraction
- Increase logcat buffer to 16MB in pre_test (retains Bare runtime output that normally gets rotated out on Pixel)
- Pre-create /sdcard/Android/data/<pkg>/files/ via adb in pre_test (ensures external storage dir exists before app tries to write)
- Capture all logcat buffers (-b all) in post_test instead of just main
Made-with: Cursor
* feat: per-device HTML reports, cleaner summary, remove PR comments
- Remove per-platform PR comments (Comment PR with results step)
- Generate individual HTML reports per device in combine-reports job
- Rename combined artifact to HTML-Report-All-Platforms-{run}
- Upload per-device HTMLs as HTML-Reports-Per-Device-{run}
- Summary keeps tables at top, lists downloadable reports below
* fix: use echo instead of printf in summary to avoid format string errors
* chore: remove per-platform step summaries from desktop workflow
Only the combined summary from combine-reports should appear in the GitHub run. Desktop per-platform data is still uploaded as artifacts and included in the combined report.
* fix: remove per-device step summary from desktop test runner
The PerfReporter.writeStepSummary() was writing individual device summaries to GITHUB_STEP_SUMMARY from within each desktop test job. The combined report in combine-reports is now the sole summary source.
* doc: clarify why DocTR URLs are included in presigned URL script
* fix: address CodeQL regex injection and remove unused variables
- Escape regex special chars in filter pattern (extract-from-log.js)
- Remove unused imports in verify-quality.js
- Remove unused QUALITY_LABELS constant in utils.js
* fix: use filter pattern as regex first, escape only on invalid syntax
The perf_report_filter uses pipe alternation (a|b|c) which was broken by unconditional regex escaping. Now tries the pattern as-is first and only escapes if it's invalid regex.
* fix: replace regex filter with substring matching to resolve CodeQL alert
Use split('|') + includes() instead of RegExp construction from CLI argument. Eliminates the regex injection vector entirely while keeping the same pipe-delimited filter syntax from test-groups.json.
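The `split('|') + includes()` filter can be sketched as below. This is a minimal illustration with a hypothetical function name. Dropping empty alternatives is an added assumption: an empty needle would otherwise match every label. Because no `RegExp` is ever built from the CLI argument, metacharacters in the pattern are treated as literal text.

```javascript
// Hypothetical sketch of a pipe-delimited substring filter (no RegExp construction).
function makeResultFilter(pattern) {
  // "a|b|c" -> ["a", "b", "c"]; empty alternatives are dropped (assumption).
  const needles = pattern.split('|').filter(Boolean);
  if (needles.length === 0) return () => true; // no filter: keep everything
  return (label) => needles.some((needle) => label.includes(needle));
}
```

Usage: `results.filter((r) => keep(r.label))`, where `keep = makeResultFilter(perf_report_filter)` and `label` is whatever field holds the test label in the report. The field name is illustrative.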

…tegration test pipelines (#1638)
* QVAC-17318 infra: add Device Farm artifact downloads to all mobile integration test pipelines
Add automatic downloading and uploading of Device Farm artifacts (logs, test specs, results) as GitHub Actions artifacts across all mobile integration test workflows. Excludes video artifacts on Android to reduce artifact size, keeping video only for iOS where C++ logs are unavailable.
New artifact downloads: decoder-audio, llamacpp-embed, diffusion, ocr-onnx.
Android video exclusion: all 9 workflows including llm, nmt, tts, whisper, parakeet.
Made-with: Cursor
* QVAC-17318 infra: add generic log download alongside perf download for OCR
Keep main's perf-specific artifact download (for perf report extraction) and add the generic Device Farm log download for both perf + regular runs. Includes Android video exclusion. Both artifact sets uploaded separately (df-raw-logs for perf, devicefarm-logs for full logs).

…NOTICE, model registry, and tooling fixes (#1645)
* QVAC-17009 test[skiplog]: add finetune progress zero-drop E2E test (#1629)
* QVAC-17303 test[skiplog]: add suspend/resume lifecycle integration tests (#1619)
* QVAC-17303 test[skiplog]: add suspend/resume lifecycle integration tests
* QVAC-17303 test[skiplog]: add suspend-during-inference test and refactor error handling
* QVAC-17009 test[skiplog]: use typed defaultHandler signature with Expectation
* chore[notask]: sdk v0.9.0 release changelog, NOTICE, model registry update, and tooling fixes
- Generate v0.9.0 changelog (41 PRs, breaking/api/models detail files, CHANGELOG_LLM.md)
- Update model registry: 312 → 653 models (+341, including 295 Bergamot translation models)
- Update NOTICE with correct model licenses (fix licenseId field mapping)
- Fix changelog script: upstream-first tag fetch, minor/patch release-type detection
- Add model history fallback to changelog script for untagged [mod] PRs
- Update sdk-changelog skill with upstream tags and release-type documentation
---------
Co-authored-by: Victor-Rodzko <victor.rodzko@itrexgroup.com>

…ests (#1647)
The static require('../../../../scripts/test-utils/performance-reporter') cannot be resolved by Bare runtime on CI because the module lives outside the addon package. Use path.join to build the path (prevents bare-pack static resolution on mobile), call configure() with bare-* modules, and wrap in try/catch with a no-op fallback for environments where the script is unavailable.
Fixes MODULE_NOT_FOUND breaking all nmtcpp integration tests (desktop and mobile) and hardens the same pattern in llamacpp-llm image tests.
Made-with: Cursor
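The runtime-built require path with a no-op fallback can be sketched for Node-style `require`. This is a minimal illustration. `loadPerformanceReporter` is hypothetical, and so are the no-op methods other than `configure()`. The step that calls `configure()` with bare-* modules is omitted because its arguments are not described here.

```javascript
const path = require('path');

// Hypothetical sketch: build the specifier at runtime so a static bundler
// (such as bare-pack) does not try to resolve it, and fall back to a no-op.
function loadPerformanceReporter(baseDir) {
  try {
    const modPath = path.join(baseDir, '..', '..', '..', '..', 'scripts', 'test-utils', 'performance-reporter');
    return require(modPath);
  } catch (err) {
    // Script not available (e.g. not bundled on mobile): return a no-op reporter.
    // Method names other than configure() are illustrative only.
    return { configure() {}, writeReport() {} };
  }
}
```

This catch-all mirrors the commit's try/catch. A stricter variant would only fall back when `err.code === 'MODULE_NOT_FOUND'` and rethrow anything else, so that genuine errors inside the reporter are not hidden.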

…1646)

Drop ccache install, configure, cache restore/save, stats, and CMAKE_*_LAUNCHER exports from all prebuilds-*.yml workflows. Also drop the -D CMAKE_C_COMPILER_LAUNCHER=ccache / -D CMAKE_CXX_COMPILER_LAUNCHER=ccache flags on `bare-make generate` in prebuilds-qvac-lib-infer-onnx.yml and prebuilds-ocr-onnx.yml. Preserve unrelated Windows bootstrap that shared the ccache step (git config --system core.longpaths true, choco upgrade llvm) and extract the onnx-tts `brew unlink fmt` fmt-header workaround into its own macOS-only step. Non-prebuild workflows (cpp-test-coverage-*, benchmark-*) still use ccache and are out of scope for this change.

…x.d.ts (#1612)

…1496)
* chore[bc]: remove BaseInference inheritance from diffusion addon
Replace class inheritance with composable utilities from @qvac/infer-base@0.4.0:
- createJobHandler() for single-job lifecycle management
- exclusiveRunQueue() for run serialization
Constructor now takes { files: { model, clipL?, clipG?, t5Xxl?, llm?, vae? }, config, logger, opts } instead of { diskPath, modelName, clipLModel, ... } + config. All examples and tests updated to new constructor shape.
* fix: restore JSDoc comments in index.js and index.d.ts
* docs: update SD README for new constructor pattern
* fix: guard SD _runInternal against run-before-load with clear error
* docs: align SD architecture.md with new constructor and composition pattern
* chore[bc]: address PR #1496 review findings and bump to 0.2.0
Bumps `@qvac/diffusion-cpp` to `0.2.0` per the addon-changelog process; a minor bump on a pre-1.0 package signals the breaking constructor change to consumers using semver ranges. Adds the matching `0.2.0` block to `CHANGELOG.md` documenting the new single-object constructor with `files`, the removal of `BaseInference`, and every behaviour change in this release.
Hardens the JS layer based on the review:
- Constructor now throws a clear `TypeError` when `files` / `files.model` is missing, instead of crashing with an opaque "cannot read properties of undefined" later.
- `createJobHandler({ cancel })` closure uses optional chaining so a `response.cancel()` after `unload()` is a no-op rather than a `TypeError`.
- `unload()` sets `this.addon = null` after `addon.unload()`, so the existing `if (!this.addon)` guard in `_runInternal` is also effective post-unload.
- `cancel()` re-adds the defensive `?.cancel` check.
- `isSplitLayout` now also triggers on `clipL` / `clipG`, closing a footgun where a FLUX.1 caller passing only encoders without `t5Xxl` would silently misroute the diffusion model into the all-in-one `path` parameter and fail to load.
- `_addonOutputCallback` no longer pushes unknown event payloads into the active response output stream; unknown events are logged at debug level instead. The error log line is updated to pass the `Error` object directly so loggers can format the full stack.
Doc + test cleanup:
- README section 3 now describes `args.config` as a field of the same `args` object built in section 2 (the old wording made it sound like a separate constructor argument).
- The `api-behavior` integration test no longer calls `binding.releaseLogger()` manually in the teardown; `unload()` already releases the native logger via `_releaseNativeLogger`.
* refactor: move SD C++ event normalization into addon.js
Per the team-2 task doc (`TD-ADDON-INTERFACE-LLM-EMBED-SD.md`, applied to SD for parity with the LLM/Embed extractions): the native binding wrapper should own the mapping from raw C++ events to Output / Error / JobEnded.
Adds `mapAddonEvent(rawEvent, data, error)` as a free export from `addon.js`, co-located with `SdInterface`. The function normalizes the C++-mangled event vocabulary into one of `Output` / `Error` / `JobEnded`:
- `Error`-flavored event names → Error.
- `Uint8Array` payloads (encoded image bytes) and `string` payloads (per-step progress JSON ticks) → Output.
- Plain object payloads (RuntimeStats) → JobEnded.
- Anything else → `null` (caller logs at debug level).
`ImgStableDiffusion._addonOutputCallback` becomes a thin shim that imports `mapAddonEvent`, runs it on the raw C++ event, and dispatches the mapped logical event onto the active job. The "unhandled event" debug log is preserved at the dispatch site so a future C++ event-shape change still surfaces.
* fix: address PR #1496 second-round review findings
1. `index.d.ts` `ImgStableDiffusionArgs.config` is now optional (`config?: SdConfig`).
The README and the runtime already treated it as optional (the runtime forwards an empty config object when omitted and the C++ layer falls back to stable-diffusion.cpp defaults), but the type required it, producing a compile-time error for the call shape the runtime accepts. The JSDoc on the constructor is updated in lockstep.
2. The constructor now actually enforces the "absolute paths only" contract that the README and the error messages advertise. `assertAbsolute(key, value)` checks `path.isAbsolute` on `files.model` and on every supplied companion path (`clipL`, `clipG`, `t5Xxl`, `llm`, `vae`). Relative paths are rejected at construction time with a clear `TypeError` instead of silently being passed through to the native loader and failing later with an opaque error.
3. `docs/architecture.md` is no longer stale:
   - Version stamp updated from `v0.1.2` to `v0.2.0`.
   - The `isSplitLayout` description now matches the current code (`!!llm || !!t5Xxl || !!clipL || !!clipG`) and explains the FLUX.1 case the broader heuristic was added for.
   - The Decision 6 ("Exclusive Run Queue") section no longer claims `cancel()` is wrapped in `this._run(...)`. The "Key Relationships" table is updated to match: `cancel()` is intentionally outside the queue so it can interrupt an in-flight `run()`, with a short note explaining why.
* fix: remove task-doc reference and refactor-narration comments
- Remove task-doc reference from mapAddonEvent JSDoc in addon.js
- Remove refactor-narration comment from _addonOutputCallback in index.js
* fix: throw on second load(), log rejected responses, add mapAddonEvent unit test
- load(): throw if already loaded. Caller must unload() first. Aligns with the team consensus (Yury/Gianfranco/Gustavo): silent reload masks caller bugs. unload() already clears configLoaded.
- _runInternal: replace silent `finalized.catch(() => {})` with a warn-level log so rejected responses are not swallowed when the caller does not await.
- test/unit/map-addon-event.test.js: new unit test covering Error event mapping, Uint8Array → Output (image bytes), string → Output (progress tick), plain object → JobEnded (RuntimeStats), Error precedence over data shape, and null returns for unknown shapes.
- CHANGELOG 0.2.0: document the load() throw.
* fix: restore JSDoc on run() that was dropped during BaseInference removal
The extensive JSDoc documenting every run() parameter (prompt, steps, width, height, guidance, cfg_scale, sampling_method, scheduler, seed, batch_count, vae_tiling, cache_preset, init_image, strength) was accidentally removed during the BaseInference removal refactor when run() was split into run() + _runInternal(). Restore it on the public run() method since that is the caller-facing contract.
* fix: correct CHANGELOG error quote and remove dead files.model fallback
Address review findings from the qvac-staff-code-reviewer agent:
- CHANGELOG quoted a fabricated error message ("must be an absolute path to the main model weights") that the code does not throw. Replace with the actual messages emitted by assertAbsolute(): "must be an absolute path string" and "must be an absolute path (got: <value>)". Note that the same validation applies to the optional companion fields.
- index.js: remove `this._files.model || ''` fallbacks in _load(). The constructor's assertAbsolute('model', files.model) already guarantees a non-empty absolute string, so the fallbacks are unreachable and encode a phantom contract (empty model path) that can never hold.
* fix: make load() idempotent when already loaded
Second load() on an already-loaded instance returns immediately instead of throwing. Matches the ReadyResource pattern used elsewhere in QVAC: open/load is idempotent; explicit unload() is required to swap weights. CHANGELOG updated.
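The normalization rules that `mapAddonEvent` applies, and that the new unit test exercises, can be sketched as below. This is a minimal re-implementation under these assumptions:

- "Error-flavored" means an event name containing "error" (case-insensitive) or a non-empty `error` argument.
- The returned `{ type, data | error }` shape is hypothetical.
- The event names in the usage checks are illustrative, not the real C++ vocabulary.

```javascript
// Hypothetical sketch of raw addon event -> Output / Error / JobEnded mapping.
function isPlainObject(v) {
  return v !== null && typeof v === 'object' && Object.getPrototypeOf(v) === Object.prototype;
}

function mapAddonEvent(rawEvent, data, error) {
  // Error takes precedence over the payload shape.
  if (error || (typeof rawEvent === 'string' && /error/i.test(rawEvent))) {
    return { type: 'Error', error: error || data };
  }
  if (data instanceof Uint8Array) return { type: 'Output', data }; // encoded image bytes
  if (typeof data === 'string') return { type: 'Output', data }; // progress JSON tick
  if (isPlainObject(data)) return { type: 'JobEnded', data }; // RuntimeStats
  return null; // unknown shape: the caller logs it at debug level
}
```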
* doc: document missing breaking changes from BaseInference removal
Address feedback to report all breaking changes from the BaseInference refactor, not just the constructor shape:
- ImgStableDiffusion public methods removed: downloadWeights, pause, unpause, stop, status, destroy, getApiDefinition
- cancel() no longer accepts a jobId argument
getState() shape change and unload() addon-reference fix were already documented in prior commits.
* fix: address lifecycle, cleanup, and CI-surface review findings
- load() now runs through `this._run()` so concurrent calls on the same instance serialize instead of racing past the `configLoaded` guard. Two overlapping loads could previously both allocate a native addon and clobber `this.addon`, leaking one native handle.
- _load() now wraps `addon.activate()` in a try/catch that best-effort unloads the partially-initialized addon, releases the native logger, and resets `this.addon = null` before re-throwing. Matches the crash-safety pattern already in embed and LLM. A failing activate() no longer leaves a zombie native instance that the next load() would orphan.
- Add `test:unit` and `test:unit:generate` scripts that run the JS unit tests under `test/unit/*.test.js` via brittle + bare. Wire `test:unit` into `test:all` and into the PR workflow's ts-checks job so `map-addon-event.test.js` runs on every PR.
- `.gitignore` the generated `test/unit/all.js` brittle runner.
- CHANGELOG: document both fixes under Bug Fixes.
* fix[ci]: run test:unit inside test:integration flow
Unit tests need the bare runtime, which is only installed globally in the integration-test workflow (via npm install -g bare). My previous commit wired test:unit into the ts-checks job, which doesn't install bare, so it would have failed with command not found in CI.
Chain test:unit at the script level instead: the integration-test workflow already runs npm run test:integration, which now runs unit tests first.
Matches the standalone-repo precedent (qvac-lib-dl-filesystem, qvac-lib-decoder-audio, qvac-lib-error-base, etc.) of having the test script drive both. * fix[ci]: run test:unit via run-lint-and-unit-tests action Reviewer flagged that test:unit invoked the `bare` CLI, but the ts-checks job did not install it. My previous commit's workaround — chaining test:unit into test:integration at the script level — would have re-run unit tests on every platform in the 7-way integration matrix. Revert both. Use the existing `tetherto/oss-actions/.github/actions/run-lint-and-unit-tests` action instead, same as `qvac-lib-infer-onnx` and `ocr-onnx`. The action installs bare globally and runs `npm run test:unit --if-present` in a single fast step. * chore: test script chains test:unit + test:integration Matches the standalone-repo precedent (qvac-lib-inference-addon-base, qvac-lib-dl-filesystem, etc.) so 'npm run test' runs both flows locally for developers. * doc: fix mermaid classDiagram parsing error in architecture.md Mermaid's classDiagram uses { and } as class-body delimiters, so the inline object literal in the method signatures broke the parser and prevented the diagram from rendering on GitHub. Replace: - constructor(args: {files, config, logger?, opts?}) → constructor(args: ImgStableDiffusionArgs) (matches index.d.ts) - getState() {configLoaded} → getState() State Reported by maxim-smotrov at architecture.md:150. * chore[ci]: rename step to reflect what the action actually runs The run-lint-and-unit-tests action runs `npm run lint` and `npm run test:unit`. The step name "Run JavaScript tests" hides the lint half. Rename to "Run lint and unit tests" and update the step id accordingly. * fix: doc and type drift around img2img; dead code in SdModel.cpp - index.d.ts: replace the stale "not yet supported, throws at runtime" JSDoc on `init_image` and `strength` with accurate docs covering the FLUX.2 in-context-conditioning branch and the SD/SDEdit branch. 
This PR ships img2img support end-to-end, so the type hover docs contradicted the runtime behavior. - docs/img2img-quickstart.md: rewrite against the refactored API. Replace the two-arg constructor (`diskPath`, `modelName`, `llmModel`, `vaeModel`) with the single-object `{ files, config, logger }` shape, switch every example from `model.img2img(...)` to `model.run({ init_image, ... })`, and correct the package name from `@qvac/lib-infer-diffusion` to `@qvac/diffusion-cpp`. - docs/architecture.md: bump the package header to v0.3.0. - examples/quick-test.js: delete. It used ESM `import` syntax in a CommonJS package and imported a non-existent `diffusionAddon` named export; nothing referenced it. - addon/src/model-interface/SdModel.cpp: remove the duplicate `genParams.height = imgH;` in the `useRefImages` branch. Harmless dead store, but easy to miss in review. * doc: refresh Key Features, migration marker, and img2img JSDoc - architecture.md: Key Features list `Generation modes` now includes img2img alongside txt2img, matching the shipped runtime. - CHANGELOG.md: migration example marker changes from `BEFORE (<= 0.1.x)` to `BEFORE (<= 0.2.x)` since 0.2.x also used the old two-argument constructor. - index.d.ts: trim the init_image / strength JSDoc to verified facts. The old text said both were "not yet supported, throws at runtime", which is false on this branch (img2img ships end-to-end). Previous revision added branch-specific behavior claims; replaced with the minimal accurate description. * doc: restore JSDoc on SD cancel() and unload() Both methods had one-line JSDoc on main describing what they do ("Cancel the current generation job.", "Unload the model and release all resources."). The refactor dropped the JSDoc comments when it rewrote the method bodies. Restore them since the purpose statement is still accurate. * doc: trim verbose comments added during the refactor Tighten comments this PR introduced that drifted into over-explanation. 
Leave pre-existing comments as-is. - addon.js mapAddonEvent JSDoc: drop multi-paragraph prose; keep the one-sentence contract plus the param block. - index.js constructor: collapse the cancel-closure rationale to one line. - docs/architecture.md: bump Last Updated to 2026-04-16. * doc: restore pre-refactor createAddon JSDoc and load error log The refactor commit silently dropped the JSDoc block on _createAddon() and the 'Error during stable-diffusion model load' error log in _load(). Put them back so the refactor only changes what needs to change. * fix: release native logger when addon construction throws _load() wrapped only await this.addon.activate() in try/catch, but _createAddon() calls _connectNativeLogger() and then constructs SdInterface. If the SdInterface constructor throws, _nativeLoggerActive stays set and the native logger hook is never released; a retry on the same instance would reconnect on top of a stale hook. Move _createAddon() inside the try so the existing catch path runs _releaseNativeLogger() for every pre-activate failure. * chore: drop unused 'test' script, inline into 'test:all' The 'test' alias was only consumed by 'test:all', and neither was referenced in CI workflows or the README. 'test:all' ran test:unit twice because it called both test:unit and the 'test' alias. Remove 'test' and rewrite 'test:all' to run test:unit, test:integration, and test:cpp directly. * doc: fix 0.3.0 CHANGELOG heading depth and queue serialization scope The 0.3.0 section used ## for Breaking Changes / Features / Bug Fixes / Pull Requests, making them siblings of the version heading instead of children; bump those to ### and the leaf subsections from ### to #### so the TOC renders correctly, matching the 0.2.x entries. Two architecture.md spots still said the exclusive run queue serialises only run()/unload() even though load() now also wraps in this._run(...); align both. 
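The lifecycle fixes above route `load()` through `this._run()` and make a second `load()` a no-op. The sketch below shows that combination. It uses a simple promise-chain queue as a stand-in for `exclusiveRunQueue()` from `@qvac/infer-base`, whose real implementation is not shown here. `makeSerialQueue`, `SketchModel`, and the `nativeAllocations` counter are hypothetical names.

```javascript
// Hypothetical stand-in for exclusiveRunQueue(): each wrapped call starts
// only after the previous one has settled.
function makeSerialQueue () {
  let tail = Promise.resolve()
  return function run (fn) {
    const result = tail.then(() => fn())
    // Keep the chain alive even if fn() rejects.
    tail = result.catch(() => {})
    return result
  }
}

class SketchModel {
  constructor () {
    this._run = makeSerialQueue()
    this.configLoaded = false
    this.nativeAllocations = 0 // counts simulated native handles
  }

  load () {
    return this._run(async () => {
      if (this.configLoaded) return // second load() is a silent no-op
      await new Promise((resolve) => setTimeout(resolve, 10)) // simulated native work
      this.nativeAllocations++
      this.configLoaded = true
    })
  }

  unload () {
    return this._run(async () => {
      this.configLoaded = false
    })
  }
}
```

Without the queue, two overlapping `load()` calls could both pass the `configLoaded` check before either one sets it. Each call would then allocate a handle, which is the leak the commit describes.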
* doc: fix flowchart mermaid parse errors in data-flows Three flowchart node labels contained ( ) inside unquoted [ ] labels: RunJob[addon.runJob(paramsJson)] EncodePrompt[Encode prompt (CLIP)] InitLatents[Initialize random latents (seed)] Mermaid treats the ( as the start of a stadium-shaped node inside the rectangular node, so the diagram failed to parse. Quote each label so the parens render literally. * doc: note FLUX.2 ignores strength in GenerationParams JSDoc docs/img2img-quickstart.md already notes that the FLUX.2 in-context conditioning path does not use strength and the input image is routed through ref_images instead. The d.ts on GenerationParams.strength still implied the knob applies universally; match the doc so IDE hover docs tell the same story.
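The crash-safety fixes in this PR follow one pattern: construction moves inside the `try`, and the logger is released on any failure. In the sketch below, `createAddon`, `connectLogger`, `releaseLogger`, and the addon's `activate()`/`unload()` are injected, hypothetical stand-ins for the real binding. `releaseLogger` is assumed to be safe to call even if the connect step failed.

```javascript
class CrashSafeLoader {
  constructor ({ createAddon, connectLogger, releaseLogger }) {
    this._createAddon = createAddon
    this._connectLogger = connectLogger
    this._releaseLogger = releaseLogger
    this.addon = null
  }

  async _load () {
    try {
      // Logger hookup and construction sit inside the try, so a synchronous
      // throw from either one still reaches the cleanup path.
      this._connectLogger()
      this.addon = this._createAddon()
      await this.addon.activate()
    } catch (err) {
      if (this.addon) {
        // Best effort: a failing unload must not mask the original error.
        try { await this.addon.unload() } catch (_) {}
      }
      this._releaseLogger() // assumed safe even if connect did not complete
      this.addon = null // no zombie handle for the next load() to orphan
      throw err
    }
  }
}
```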

…WeightsProvider (#1493) * chore[bc]: remove BaseInference inheritance and WeightsProvider from embed addon Replace class inheritance with composable utilities from @qvac/infer-base@0.4.0: - createJobHandler() for single-job lifecycle management - exclusiveRunQueue() for run serialization - Direct shard streaming via bare-fs instead of WeightsProvider Constructor now takes { files: { model: string[] }, config, logger, opts } instead of { loader, diskPath, modelName }. * fix: update embed examples to use new constructor shape Remove FilesystemDL dependency, use files: { model: [...] } pattern. * fix: update embed benchmark tooling to new constructor shape * fix: pass typed config object to embed addon and restore addonCtor param name * fix: pass no-mmap as empty string flag in embed benchmark config The C++ embed addon converts the config map to llama.cpp CLI flags in `--key value` format. Boolean flags like --no-mmap must be passed with an empty string value so the C++ side emits just --no-mmap (no value). Passing the string 'true' produced --no-mmap true, which llama.cpp rejects. * docs: update embed README and data-flows for new constructor pattern * docs: update embed architecture, data-flows, README sharded contract * fix: drop loader destructuring from embed multi-instance test * chore[bc]: address PR #1493 review findings and bump to 0.14.0 Bumps `@qvac/embed-llamacpp` to `0.14.0` per the addon-changelog process — minor bump on a pre-1.0 package signals the breaking constructor change to consumers using semver ranges. Adds the matching `0.14.0` block to `CHANGELOG.md` documenting the new single-object constructor with `files.model`, the removal of `BaseInference` + `WeightsProvider`, the dependency churn, and every behaviour change in this release. Hardens the JS layer based on the review: - Constructor now throws a clear `TypeError` when `files` / `files.model` is missing or empty, instead of crashing with an opaque "cannot read properties of undefined" later.
- `_runInternal` now throws "Addon not initialized. Call load() first." when invoked before `load()`, matching the diffusion addon and giving a useful error from the public surface. - `_load()` wraps `_streamShards` + `addon.activate()` in a try/catch that best-effort-unloads the partially-initialized native instance and resets `this.addon = null` so a subsequent `load()` does not leak a zombie addon. - `createJobHandler({ cancel })` closure uses optional chaining so a stale `response.cancel()` after `unload()` is a no-op rather than a `TypeError`. - `unload()` sets `this.addon = null` after `addon.unload()`, so the new `if (!this.addon)` guard in `_runInternal` is also effective post-unload. - `cancel()` re-adds the defensive `?.cancel` check. - `_addonOutputCallback` no longer treats unknown events as embedding output. The dead `else` branch that mirrored the `Embeddings` branch is replaced with a debug log so a future C++ event surface change is surfaced rather than silently fed into `response.output`. - The `_load()` primary-path selection now picks the first entry matching the shard regex, replacing the fragile `[length - 1]` index. This stays compatible with the documented sharded order (`tensors.txt` first, shards second) and with the non-sharded single-file path; an inline comment explains the contract. - Error log line passes the `Error` object directly so loggers can format the full stack instead of `toString()`-ing it. * refactor: move embed C++ event normalization into addon.js Per the team-2 task doc (`TD-ADDON-INTERFACE-LLM-EMBED-SD.md`, embed section): "Move event normalization into `addon.js` `BertInterface` — the native binding wrapper should own the mapping from raw C++ events to Output / Error / JobEnded". Adds `mapAddonEvent(rawEvent, data, error)` as a free export from `addon.js`, co-located with `BertInterface`. 
The function normalizes the C++-mangled event vocabulary into one of `Output` / `Error` / `JobEnded`: - RuntimeStats payloads (structurally detected via the `tokens_per_second` / `total_tokens` / `total_time_ms` / `batch_size` / `context_size` keys) → JobEnded with `backendDevice` mapped from `0/1` to `'cpu'/'gpu'`. - `Error`-flavored event names → Error. - `Embeddings`-flavored event names → Output. - Anything else → `null` (caller logs at debug level). `GGMLBert._addonOutputCallback` becomes a thin shim that imports `mapAddonEvent`, runs it on the raw C++ event, and dispatches the mapped logical event onto the active job. The "unhandled event" debug log is preserved at the dispatch site so a future C++ event-shape change still surfaces. Also fixes the misleading JSDoc on `BertInterface.loadWeights`: the native binding reads the JS property name `chunk` (verified in `qvac-lib-inference-addon-cpp/JsBlobsStream.hpp::appendBlob`, lines 41–42 and 66–67), not `contents`. The C++ local variable is named `contents`, which is what the proposal text was referencing — but the on-the-wire JS property name is `chunk` and the JS layer call sites are correct. * docs: address PR #1493 second-round review findings 1. `docs/data-flows-detailed.md` no longer claims `path: lastFile` / `files.model[last]` is the primary path handed to llama.cpp. The mermaid diagram and prose now describe the actual algorithm: the addon scans `files.model` for the first entry matching the shard regex `^(.+)-(\d+)-of-(\d+)\.gguf$` and uses that as `params.model.path`, falling back to `files.model[0]` for non-sharded single-file models. The `.tensors.txt` companion is consumed by the streaming layer but is never the primary path. 2. `docs/architecture.md` Decision 6 ("Exclusive Run Queue") no longer describes `_withExclusiveRun()` as the active mechanism. 
It now correctly documents that the addon composes `exclusiveRunQueue()` from `@qvac/infer-base@^0.4.0` directly, stores it as `this._run`, and wraps `run()` / `unload()` with it. The historical reference to `_withExclusiveRun` is preserved for context. 3. `README.md` config table no longer documents CLI-style flags (`-dev`, `-ngl`, `--batch-size`, `-fa`, `--main-gpu`, etc.). It now lists the actual JS config object keys consumed by the native layer (`device`, `gpu_layers`, `batch_size`, `pooling`, `attention`, `embd_normalize`, `flash_attn`, `main-gpu`, `verbosity`), each documented as a string value because that is what `getSubmap` parses on the native side. 4. `README.md` "API behavior by state" no longer claims a second `run()` "can wait very briefly" before failing. The actual `exclusiveRunQueue` busy guard is synchronous and rejects immediately. Updated the table and the prose to match. 5. `README.md` "Tests" section no longer claims the embed test suite covers "tool calling, multimodal capabilities, cache management, chat templates". Embed has none of those; they belong to the LLM addon. Replaced with an accurate description of the actual integration test coverage (load → embed → unload, multi-instance concurrency, run/cancel lifecycle) and the C++ unit tests, plus an explicit note pointing readers at `@qvac/llm-llamacpp` for the missing features. * fix: remove internal task-doc reference from mapAddonEvent JSDoc * fix: accurate README busy-state, log rejected inferences, add mapAddonEvent unit test - README: rewrite the "run while a job is active" section to reflect actual behavior: exclusiveRunQueue serializes the call, so it waits in the queue, and then the busy guard rejects it because _hasActiveResponse is still set. The queue does buffer; the previous claim was wrong. - index.js: replace silent `finalized.catch(() => {})` with a warn-level log so inference rejections are not swallowed completely when the caller does not await the response.
- test/unit/map-addon-event.test.js: new unit test covering the stats detection, backendDevice mapping, Error/Embeddings event names, stats precedence over event name, and null-return for unknown shapes. * fix: throw on second load() instead of silently unload+reload Per team consensus (Yury, Gianfranco, Gustavo): a second load() on an already-loaded instance is almost always a caller bug. A silent unload + reload masks the mistake. Throw instead so the caller must either unload() explicitly or avoid the duplicate call. unload() already clears configLoaded, so the supported pattern is:

    await model.load()
    // ...
    await model.unload()
    await model.load() // starts fresh

Updates CHANGELOG 0.14.0 with the new behavior. * fix: extract pickPrimaryGgufPath, document network-streaming loss, warn on unknown events, revert AddonCtor rename Address review findings from the qvac-staff-code-reviewer agent: - CHANGELOG: document the capability loss (in-memory streaming from network sources — URLs, Hyperdrive — is no longer supported). Matches the LLM addon's CHANGELOG and covers what the PR description promised but the CHANGELOG omitted. - index.js: extract pickPrimaryGgufPath(files) as a named function with JSDoc (mirrors packages/qvac-lib-infer-llamacpp-llm/index.js precedent). Export via module.exports.pickPrimaryGgufPath for unit testing. - test/unit/pick-primary-gguf-path.test.js: 4 unit tests documenting the tensors.txt-first ordering contract and single-file fallback, matching the LLM addon's test shape. - index.js: unknown addon events now log at warn (not debug). The inline comment already said "reaching this branch indicates a native-layer change worth surfacing" — debug level was hiding exactly what we want to surface. - CHANGELOG: update the Bug Fixes entry to reflect the warn level. - benchmarks: revert the AddonCtor → addonCtor casing rename.
PascalCase for a constructor reference is the conventional JS style, and the rename was a while-I'm-here change unrelated to the refactor. * fix: make load() a silent no-op when already loaded (ReadyResource pattern) Per the thread review, Yury pushed back on the earlier throw behavior: - A second load() is always accidental — SDK has no code path that calls load twice on the same instance. - If it's always accidental, throwing forces SDK to try/catch and swallow, which is wasted ceremony vs. silent no-op. - Aligns with the ReadyResource pattern QVAC already uses elsewhere. Change: `load()` on an already-loaded instance now returns immediately instead of throwing. CHANGELOG updated to document idempotency. Callers that intentionally want to swap weights still must call unload() first. * doc: correct mapAddonEvent return-null contract The JSDoc claimed `null` was "currently never" returned, but the function does return `null` for unknown event names so the caller (`_addonOutputCallback` in index.js) can log a warning and skip dispatch instead of feeding the payload into the active response. * doc: document missing breaking changes from BaseInference removal Address review feedback from Maksim: the 0.14.0 CHANGELOG missed several breaking changes that fall out of removing BaseInference inheritance. 
Fill in: - getState() narrows from {configLoaded, weightsLoaded, destroyed} to {configLoaded} only - GGMLBert public methods removed: downloadWeights, pause, unpause, stop, status, destroy, getApiDefinition - load() takes no arguments (was (closeLoader, reportProgressCallback)) - Type exports removed from index.d.ts: ReportProgressCallback, Loader, GGMLArgs, DownloadWeightsOptions, DownloadResult - BertInterface outputCb drops the jobId argument - BertInterface.runJob now returns Promise<boolean> instead of Promise<void> (true = accepted, false = busy) * fix: address lifecycle, validation, and CI-surface review findings - load() now runs through `this._run()` so concurrent calls on the same instance serialize instead of racing past the `configLoaded` guard. Two overlapping loads could previously both allocate a native addon and clobber `this.addon`, leaking one native handle. - Constructor now validates each `files.model` entry with `path.isAbsolute()` (matching the existing error-message contract). Relative paths are rejected at construction time instead of bubbling up from bare-fs later. - `pickPrimaryGgufPath` is now declared in `index.d.ts` so the TS surface matches the CommonJS export at `index.js`. - Add `test:unit` and `test:unit:generate` scripts that run the JS unit tests under `test/unit/*.test.js` via brittle + bare. Wire `test:unit` into `test:all` and into the PR workflow's ts-checks job so the new unit coverage runs on every PR. - `.gitignore` the generated `test/unit/all.js` and `test/integration/all.js` brittle runners. * doc: add CHANGELOG entries for load() serialization and absolute-path validation * fix[ci]: run test:unit via run-lint-and-unit-tests action Replace hand-rolled test:unit step (which invoked `bare` in a job that never installs it) with the existing run-lint-and-unit-tests external action. Same pattern qvac-lib-infer-onnx and ocr-onnx already use. The action installs bare globally and runs `npm run test:unit --if-present`. 
Also add a `test` script that chains test:unit + test:integration for local dev convenience, matching the standalone-repo precedent. * doc: fix mermaid parsing errors in architecture.md and data-flows-detailed.md architecture.md: - Line 108: `BARE[Bare Runtime<br/>(bare-fs)]` — unquoted parens in a `graph` node label triggered mermaid's circle-shape interpretation. Wrap the label in double quotes. - Line 157: classDiagram constructor signature used an inline destructured-object literal (`{ files, config, logger, opts }`), and { } are class-body delimiters. Replace with the canonical named type `GGMLBertArgs` from index.d.ts. data-flows-detailed.md: - Line 340: flowchart node label `ArrayPath[type: 'sequences'<br/>input: string[]]` — the nested `[]` on `string[]` collided with the outer node-label brackets. Wrap the label in double quotes; do the same for the sibling StringPath node. - Line 354: edge label `-->|vector<string>|` — unquoted angle brackets rendered as HTML. Quote the label and escape the angle brackets. Reported by maxim-smotrov. * chore[ci]: rename step to reflect what the action actually runs The run-lint-and-unit-tests action runs `npm run lint` and `npm run test:unit`. The step name "Run JavaScript tests" hides the lint half. Rename to "Run lint and unit tests" and update the step id accordingly. * doc: refresh architecture.md and data-flows-detailed.md for refactored API architecture.md: - Bump package version header to v0.14.0. - Bump `@qvac/infer-base` min version in the dependency table to ^0.4.0 to match package.json. - Update the Exclusive Run Queue decision to reflect that `load()` is now serialized alongside `run()` and `unload()`. data-flows-detailed.md: - Drop the `jobId` argument from the `jsOutputCallback` and `outputCb` calls in the two inference-flow sequence diagrams. The refactor dropped `jobId` from the `BertInterface` outputCb signature; the diagrams were still showing the pre-refactor shape. 
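The benchmark fix earlier in this PR passes `--no-mmap` as an empty string. That fix depends on how the native layer turns the config map into `--key value` flags. The conversion happens in C++. The JavaScript below is only a hypothetical mirror of the convention as the commit message describes it, not the addon's code, and `configToFlags` is an invented name.

```javascript
// Hypothetical mirror of the "--key value" convention described above.
function configToFlags (config) {
  const argv = []
  for (const [key, value] of Object.entries(config)) {
    argv.push(`--${key}`)
    // An empty string marks a boolean switch: emit the flag with no value.
    if (value !== '') argv.push(String(value))
  }
  return argv
}

console.log(configToFlags({ 'no-mmap': '', 'main-gpu': '0' }))
// → [ '--no-mmap', '--main-gpu', '0' ]
console.log(configToFlags({ 'no-mmap': 'true' }))
// → [ '--no-mmap', 'true' ]  (the form llama.cpp rejects)
```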
* doc: restore class-level JSDoc and inline comments the refactor dropped Restore three pieces of documentation that were deleted when the BaseInference removal was applied but whose content is still accurate against the refactored code: - Class-level JSDoc on `GGMLBert` describing what the class does. - Inline comment in `_runInternal` explaining array-vs-string input routing (`type: 'sequences'` vs `type: 'text'`). - Inline comment noting the addon-cpp accept contract (no events fire until `runJob` returns true). - Short one-line JSDoc on `cancel()` and `unload()`. * doc: tighten d.ts JSDoc, loadWeights JSDoc, doc dates Declaration-file JSDoc surfaces in IDE hover tooltips, so multi-paragraph prose is noise. Trim `pickPrimaryGgufPath` to a one-liner covering the only behavior the type hover needs to convey. `loadWeights` in addon.js had a paragraph explaining JsBlobsStream.hpp and field-name load-bearing-ness. The pre-refactor JSDoc was a plain list of params; the field-name rationale belongs in commit history / architecture doc, not hovering on every caller. Trim to the original style with updated field names (`chunk` not `contents`). architecture.md and data-flows-detailed.md: bump Last Updated from 2026-04-07 to 2026-04-16 since both were updated on this branch. * doc: trim verbose comments added during the refactor Tighten comments this PR introduced that drifted into over-explanation. Leave pre-existing comments as-is. - addon.js mapAddonEvent JSDoc: drop the multi-paragraph prose; keep the one-sentence contract plus the param block. - addon.js stats-detection inline comment: reduce to one line. - index.js pickPrimaryGgufPath JSDoc: replace multi-paragraph prose with a single-line summary citing the C++ contract. - index.js class header: one-line purpose statement. - index.js constructor: collapse the cancel-closure rationale to one line. 
- index.js _addonOutputCallback: drop the narration comment pointing at addon.js — the very next line imports and calls mapAddonEvent, so the code itself already shows the reader where event mapping lives. - index.js unknown-event handler: reduce to two lines. * doc: restore pre-refactor load/createAddon logs and JSDoc The refactor commit silently dropped three info logs from _load() ('Creating addon with configuration', 'Activating addon') and _createAddon() ('Creating Bert interface with configuration'), plus the JSDoc block on _createAddon(). Put them back so the refactor only changes what needs to change. * doc: fix classDiagram parse error in architecture.md BertInterface.loadWeights destructure signature {filename, chunk, completed} collided with the classDiagram class-body braces. Replace with the scalar 'data' param so mermaid parses the block. * chore: drop unused 'test' script The 'test' alias was redundant with 'test:all' (which already runs test:unit, test:integration, and test:cpp directly) and had no workflow or README caller. Drop it. * doc: fix _createAddon config field and queue serialization scope _createAddon() JSDoc said 'configurationParams.settings' but the field is actually 'config' (same pre-existing typo on main; restoring it carried the typo forward). Two spots in docs/architecture.md still said the exclusive run queue serialises only run()/unload() even though load() now also wraps in this._run(...); align both. * doc: document ctx_size in the embed config table The usage example set ctx_size: '512' but the config table below listed every other key and omitted it, leaving readers without a description of what the value controls or its default. Add a row for ctx_size so the example keys are all explained. * doc: fix flowchart mermaid parse error in data-flows Inner [] inside the quoted flowchart node label ('input: string[]') tripped the mermaid parser on GitHub's renderer. Swap for plain English so the diagram renders. 
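The embed `mapAddonEvent()` rules listed earlier in this PR can be sketched as follows:

- Stats are detected by shape, and that detection wins over the event name.
- `backendDevice` is mapped from `0/1` to `'cpu'/'gpu'`.
- `Error` and `Embeddings` event names are matched.
- Anything else returns `null`.

The `{ type, data, error }` return shape, the substring matching on event names, and treating any single stats key as sufficient are assumptions. This is not the package's code.

```javascript
const STATS_KEYS = ['tokens_per_second', 'total_tokens', 'total_time_ms', 'batch_size', 'context_size']
const DEVICE_NAMES = { 0: 'cpu', 1: 'gpu' }

function isRuntimeStats (data) {
  return data !== null && typeof data === 'object' && !Array.isArray(data) &&
    !ArrayBuffer.isView(data) && STATS_KEYS.some((key) => key in data)
}

function mapAddonEvent (rawEvent, data, error) {
  // Stats are detected by shape and take precedence over the event name.
  if (isRuntimeStats(data)) {
    const stats = { ...data }
    if (stats.backendDevice in DEVICE_NAMES) stats.backendDevice = DEVICE_NAMES[stats.backendDevice]
    return { type: 'JobEnded', data: stats }
  }
  const name = String(rawEvent)
  if (name.includes('Error')) return { type: 'Error', error: error || new Error(String(data)) }
  if (name.includes('Embeddings')) return { type: 'Output', data }
  // Unknown shape: the caller logs a warning and skips dispatch.
  return null
}
```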
* fix: move addon construction into crash-safe try block _createAddon() was outside the try so a synchronous throw in require('./binding') or binding.createInstance() would leave this.addon set to a partial native handle and never reach the cleanup path. Route addon construction through the same try the shard-streaming and activate() calls use. * doc: align shard regex in data-flows with index.js data-flows-detailed.md:80 advertised the regex as '^(.+)-(\d+)-of-(\d+)\.gguf$' but index.js:19 uses the simpler non-capturing form '/-\d+-of-\d+\.gguf$/'. Update the doc to match the code so readers who copy the regex get the real contract. --------- Co-authored-by: gianni-cor <gianfranco.cordella@tether.io>
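The last commit aligns the docs with the regex in `index.js`. The sketch below shows the primary-path contract: the first shard-regex match wins, otherwise the first entry is used. It assumes the function receives the `files.model` array, and the file names in the usage lines are hypothetical. It is an illustration, not the package's source.

```javascript
// First entry matching the shard pattern wins; non-sharded models fall back
// to the first (only) entry. The tensors.txt companion never matches.
function pickPrimaryGgufPath (modelFiles) {
  const SHARD_REGEX = /-\d+-of-\d+\.gguf$/
  const shard = modelFiles.find((file) => SHARD_REGEX.test(file))
  return shard !== undefined ? shard : modelFiles[0]
}

pickPrimaryGgufPath([
  '/models/model.tensors.txt',
  '/models/model-00001-of-00002.gguf',
  '/models/model-00002-of-00002.gguf'
]) // → '/models/model-00001-of-00002.gguf'

pickPrimaryGgufPath(['/models/model.gguf']) // → '/models/model.gguf'
```

With the old `[length - 1]` index, the sharded example would have selected the second shard. The regex-based search is independent of how many shards follow the first match.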

…ightsProvider (#1494) * chore[bc]: remove BaseInference inheritance and WeightsProvider from LLM addon Replace class inheritance with composable utilities from @qvac/infer-base@0.4.0: - createJobHandler() for single-job lifecycle management - exclusiveRunQueue() for run serialization - Direct shard streaming via bare-fs instead of WeightsProvider Constructor now takes { files: { model: string[], projectionModel?: string }, config, logger, opts } instead of { loader, diskPath, modelName, projectionModel } + config. All finetune, media, and filtered logger functionality preserved. * fix: correct FinetuneProgress and finetune terminal handling in output callback FinetuneProgress must call updateStats(data.stats), not updateOutput(data). Finetune terminal JobEnded must call ended(data) as result, not updateStats. * fix: update all LLM examples and model-loading test to new constructor shape Update 13 examples and sharded model test to use files: { model: [...] } pattern. Remove FilesystemDL dependency from all examples and tests. * fix: update sharded model test to download shards to disk first The network loader test used the old loader-based constructor. Rewritten to download shards via HttpDL to disk, then pass absolute paths. 
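The `FinetuneProgress` fix above is about routing each event to the right response method. The sketch below assumes a response object that exposes the three methods the commit names (`updateOutput`, `updateStats`, `ended`). It also assumes the finetune terminal is recognized by `op === 'finetune'`, matching the `{op:'finetune', status, stats?}` payload this PR describes for the addon.js refactor. `dispatchLlmEvent` is a hypothetical name, and other events are left to handling not shown here.

```javascript
function dispatchLlmEvent (response, event, data) {
  if (event === 'Output') {
    response.updateOutput(data)
    return true
  }
  if (event === 'FinetuneProgress') {
    // Progress carries stats, not generated output.
    response.updateStats(data.stats)
    return true
  }
  if (event === 'JobEnded' && data !== null && typeof data === 'object' && data.op === 'finetune') {
    // The finetune terminal payload is the job's result, not trailing stats.
    response.ended(data)
    return true
  }
  return false // everything else: existing handling, not shown
}
```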
* fix: update LLM benchmark tooling to new constructor shape * fix: update LLM perf benchmark sweep and judge to new constructor shape * docs: update LLM README, finetuning, and afriquegemma docs for new constructor * fix: update LLM prepare-prompts and verify-prompts to new constructor * fix: update LLM finetuning unit tests to new constructor and exclusiveRunQueue * docs: update LLM architecture, data-flows, finetuning, README sharded contract * docs: align LLM finetuning docs and mobile README with new constructor * chore[bc]: address PR #1494 review findings and bump to 0.15.0 Bumps `@qvac/llm-llamacpp` to `0.15.0` per the addon-changelog process — minor bump on a pre-1.0 package signals the breaking constructor change to consumers using semver ranges. Adds the matching `0.15.0` block to `CHANGELOG.md` documenting the new single-object constructor with `files`, the removal of `BaseInference` + `WeightsProvider`, the dropped `destroy()` method, the dependency churn, and every behaviour change in this release. Hardens the JS layer based on the review: - Constructor now throws a clear `TypeError` when `files` / `files.model` is missing or empty, instead of crashing with an opaque "cannot read properties of undefined" later. - `_runInternal` now throws "Addon not initialized. Call load() first." when invoked before `load()`, matching `finetune()` and the diffusion addon. - `_load()` wraps `_streamShards` + `addon.activate()` in a try/catch that best-effort-unloads the partially-initialized native instance and resets `this.addon = null` so a subsequent `load()` does not leak a zombie addon. - `createJobHandler({ cancel })` closure uses optional chaining so a stale `response.cancel()` after `unload()` is a no-op rather than a `TypeError`. - `unload()` sets `this.addon = null` after `addon.unload()`, so the new `if (!this.addon)` guard in `_runInternal` is also effective post-unload. - `pause()` and `cancel()` re-add the defensive `?.cancel` check. 
- The `_load()` primary-path selection now picks the first entry matching the shard regex, replacing the fragile `[length - 1]` index. This stays compatible with the documented sharded order (`tensors.txt` first, shards second) and with the non-sharded single-file path; an inline comment explains the contract. - The `_handleAddonOutputEvent` error log line now passes the `Error` object directly so loggers can format the full stack. Drops dead `_isSuppressedNoResponseLog` / `_createFilteredLogger` / `_originalLogger` plumbing. Those existed to swallow `'No response found for job'` warnings emitted by the old `BaseInference._jobToResponse` Map; the new `createJobHandler`-based architecture cannot emit that message, so the filter, the wrapped logger, and the `_originalLogger` indirection are all gone. The user-supplied logger is now used directly. Restores JSDoc on every `FinetuneOptions` field in `index.d.ts`, including default values (`numberOfEpochs = 1`, `learningRate = 1e-4`, `batchSize = 128`, …) so IDE tooltips show them without needing to read `docs/finetuning.md`. * refactor: move LLM C++ event normalization into addon.js Per the team-2 task doc (`TD-ADDON-INTERFACE-LLM-EMBED-SD.md`, LLM section): "Move event name normalization from `index.js` `_addonOutputCallback` into `addon.js` `LlamaInterface` — the native binding wrapper should own the mapping from raw C++ events to Output / Error / JobEnded / FinetuneProgress." Adds `mapAddonEvent(rawEvent, data, error, state)` as a free export from `addon.js`, co-located with `LlamaInterface`. The function normalizes the C++-mangled event vocabulary into one of `Output` / `Error` / `JobEnded` / `FinetuneProgress`, including: - TPS-shaped runtime stats → JobEnded with `backendDevice` mapped from `0/1` to `'cpu'/'gpu'`. 
- Finetune terminal payloads (`{op:'finetune', status, stats?}`) → JobEnded carrying the finetune payload, and arms the skip flag so the trailing TPS stats from the finetune are not dispatched as a fresh inference terminal.
- `finetune_progress` payloads → FinetuneProgress.
- Anything else with an `Error`-flavored event name → Error.
- String payloads → Output.

`LlmLlamacpp._addonOutputCallback` becomes a thin shim that imports `mapAddonEvent`, hands it the per-instance state object (now `this._addonEventState = { skipNextRuntimeStats }` instead of the bare `_skipNextRuntimeStats` field), and forwards the mapped event to `_handleAddonOutputEvent`. The stateful flag lives on the model so unit tests can still poke at it via `model._addonEventState.skipNextRuntimeStats`. Updated all 9 references in `test/unit/finetuning.test.js`. All 31 unit tests still pass; lint and dts checks are clean.

Also fixes the misleading JSDoc on `LlamaInterface.loadWeights`: the native binding reads the JS property name `chunk` (verified in `qvac-lib-inference-addon-cpp/JsBlobsStream.hpp::appendBlob`, lines 41–42 and 66–67), not `contents`. The C++ local variable is named `contents`, which is what the proposal text was referencing, but the on-the-wire JS property name is `chunk` and the JS layer call sites are correct.

* fix: address PR #1494 second-round review findings

1. `test/integration/http-loader.js` no longer extends `@qvac/dl-base`. The base class was only providing a `close()` shim around `_close()`, and the package's devDependencies no longer list `@qvac/dl-base` after the loader-removal refactor. The helper now stands on its own: `getStream()` and `close()` are the only methods the sharded model-loading test calls, so the rest of the BaseDL surface (including the unused `getFileSize` and `list`) is dropped. This removes the dangling require that would break a clean install of this package and block the sharded test in CI.
2. `examples/multiModal.js` no longer passes `content: imageFilePath` on the second `media` message. The native binding only accepts `Uint8Array` payloads on `media` messages; file paths were silently broken after the loader removal. The example now reuses the same `imageBuffer` for both inferences and uses a different prompt on the second one to keep the example pedagogically distinct.
3. `index.d.ts` `AddonMessage` now exposes the optional `generationParams?: GenerationParams` field. The runtime path in `LlmLlamacpp._runInternal` already serializes this field onto every text message it forwards through `addon.runJob`, but the published transport type omitted it, so IDE consumers building their own message-shaped payloads would lose the per-call overrides. The field documents that it is forwarded from `RunOptions.generationParams` and is the canonical way to vary sampling per request without reloading the model.

* fix: extract pickPrimaryGgufPath, restore multiModal example, fix docs

- Extract the shard-picker logic into a named `pickPrimaryGgufPath()` with unit tests documenting the contract (tensors.txt-first ordering, single-file fallback). Move `SHARD_REGEX` inside the function.
- Revert `multiModal.js` to the original: the first inference uses `Uint8Array`, the second uses a string path. Both C++ code paths work. Remove the false comment claiming file paths are not supported.
- Restore stripped JSDoc on `FinetuneValidationSplit.fraction` and `FinetuneValidationDataset.path` in `index.d.ts`.
- Fix `docs/architecture.md` and `docs/data-flows-detailed.md`: 4 occurrences incorrectly said the "last" shard is the primary path; the actual code picks the first shard regex match.
- Hardcode shard filenames in the model-loading integration test instead of generating them via regex.
- Add a network streaming capability loss note to CHANGELOG.
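The shard-picker contract described above (the first `SHARD_REGEX` match wins; a non-sharded list falls back to its single entry) can be sketched as a standalone function. This is a minimal sketch, not the code in `index.js`: the real `SHARD_REGEX` is not shown in this PR, so the pattern below assumes llama.cpp's `gguf-split` naming (`<name>-00001-of-00003.gguf`), and the file names are illustrative.

```javascript
// Hypothetical sketch of the shard-picker contract; not the index.js source.
// Assumed shard naming follows llama.cpp gguf-split: <name>-00001-of-00003.gguf
function pickPrimaryGgufPath (modelFiles) {
  const SHARD_REGEX = /-\d{5}-of-\d{5}\.gguf$/
  // Sharded lists are ordered tensors.txt first, shards second,
  // so the first regex match is shard 1 (the primary path).
  const firstShard = modelFiles.find((file) => SHARD_REGEX.test(file))
  // Non-sharded: the list holds a single .gguf file (assumed fallback: first entry).
  return firstShard !== undefined ? firstShard : modelFiles[0]
}

const sharded = [
  '/models/m.tensors.txt',
  '/models/m-00001-of-00002.gguf',
  '/models/m-00002-of-00002.gguf'
]
console.log(pickPrimaryGgufPath(sharded)) // /models/m-00001-of-00002.gguf
console.log(pickPrimaryGgufPath(['/models/m.gguf'])) // /models/m.gguf
```

For the sharded list, the old `[length - 1]` index would have returned `m-00002-of-00002.gguf`, the last shard, rather than the first.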
* fix: correct version in architecture.md and remove stale dl-filesystem benchmark dep

- `docs/architecture.md` header: v0.14.3 → v0.15.0 to match `package.json`.
- `benchmarks/performance/package.json`: remove `@qvac/dl-filesystem` (no longer used after FilesystemDL references were removed from all benchmark JS files).

* fix: align _hasActiveResponse clearing with embed pattern

Remove the synchronous clear in `_handleAddonOutputEvent` on JobEnded/Error. The `.finally()` on `response.await()` already clears the flag when the response promise settles, and `exclusiveRunQueue` serializes `_runInternal` so the next call cannot race the current one. This matches the embed addon's pattern, where `.finally()` is the sole clear path outside of `unload()`.

* fix: throw on second load(), log rejected responses, add mapAddonEvent unit test

- `load()`: throw if already loaded. The caller must `unload()` first. Aligns with the team consensus (Yury/Gianfranco/Gustavo): a silent reload masks caller bugs. `unload()` already clears `configLoaded`.
- `_runInternal` / `finetune`: replace the silent `finalized.catch(() => {})` with a warn-level log so rejected responses are not swallowed when the caller does not await.
- `test/unit/map-addon-event.test.js`: new unit test covering TPS stats mapping + backendDevice translation, skipNextRuntimeStats dropping, finetune terminal + skip-flag arming, finetune_progress, Error event, string-as-token Output, and default fall-through.
- CHANGELOG 0.15.0: document the `load()` throw.

* fix: restore JSDoc on run() that was dropped during BaseInference removal

The JSDoc documenting `run()`'s prompt and runOptions parameters was accidentally removed during the BaseInference removal refactor, when `run()` was split into `run()` + `_runInternal()`. Restore it on the public `run()` method, and reference the full `RunOptions` type (which already documents prefill / generationParams / cacheKey / saveCacheToDisk in `index.d.ts`) so the docs stay authoritative in one place.
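The `mapAddonEvent(rawEvent, data, error, state)` normalization, and the cases its new unit test covers, can be sketched as follows. This is a minimal sketch under stated assumptions, not the `addon.js` source: runtime stats are assumed to be detected by a numeric `TPS` field, progress by `type: 'finetune_progress'`, and event names by substring because the raw C++ names are mangled; the `{ event, data, error }` return shape and `null` for a dropped event are hypothetical. The `LogMsg` check placed ahead of the string fallback reflects the later "preserve LogMsg event name" fix in this PR.

```javascript
// Hypothetical sketch of mapAddonEvent(rawEvent, data, error, state); the
// payload checks and the { event, data, error } return shape are assumptions.
const BACKEND_DEVICES = ['cpu', 'gpu'] // native reports 0 = cpu, 1 = gpu

function isObject (value) {
  return value !== null && typeof value === 'object'
}

function mapAddonEvent (rawEvent, data, error, state) {
  // Assumed: runtime stats are objects carrying a numeric TPS field.
  if (isObject(data) && typeof data.TPS === 'number') {
    if (state.skipNextRuntimeStats) {
      // Trailing stats after a finetune are not a fresh inference terminal: drop them.
      state.skipNextRuntimeStats = false
      return null
    }
    const device = BACKEND_DEVICES[data.backendDevice] ?? data.backendDevice
    return { event: 'JobEnded', data: { ...data, backendDevice: device }, error }
  }
  // Assumed progress marker for finetune_progress payloads.
  if (isObject(data) && data.type === 'finetune_progress') {
    return { event: 'FinetuneProgress', data, error }
  }
  // Finetune terminal payload: { op: 'finetune', status, stats? }
  if (isObject(data) && data.op === 'finetune' && 'status' in data) {
    state.skipNextRuntimeStats = true // arm the skip flag for the trailing stats
    return { event: 'JobEnded', data, error }
  }
  // Raw C++ names are mangled, so match on substrings.
  if (rawEvent.includes('Error')) return { event: 'Error', data, error }
  // LogMsg must be checked before the string fallback, or logs leak into Output.
  if (rawEvent.includes('LogMsg')) return { event: 'LogMsg', data, error }
  if (typeof data === 'string') return { event: 'Output', data, error }
  return { event: rawEvent, data, error } // default fall-through
}

const state = { skipNextRuntimeStats: false }
console.log(mapAddonEvent('RuntimeStats', { TPS: 42, backendDevice: 1 }, null, state).data.backendDevice) // gpu
```

Keeping the skip flag on a caller-owned `state` object, rather than inside the function, is what lets unit tests inspect and reset it between cases.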
* fix: migrate afriquegemma-edge-cases test to new addon constructor

The `afriquegemma-edge-cases.test.js` file came in via the upstream/main merge but still used the pre-refactor constructor shape, `new LlmLlamacpp({ loader, modelName, diskPath, ... }, config)` with a FilesystemDL loader. All 7 tests in the file are now migrated to `new LlmLlamacpp({ files: { model: [path.join(dirPath, modelName)] }, config, logger, opts })`. Removed the FilesystemDL import and all `loader.close()` calls. Added an `isMobile` skip flag matching the pattern in afriquegemma-translation. Caught by the qvac-staff-code-reviewer agent as a "merge brought in a new consumer of the old API" (restore-the-class) issue across the family.

* fix: make load() idempotent when already loaded

A second `load()` on an already-loaded instance returns immediately instead of throwing. This matches the ReadyResource pattern used elsewhere in QVAC: open/load is idempotent, and an explicit `unload()` is required to swap weights. CHANGELOG updated.

* test: regenerate mobile integration auto.cjs

Integration test files were touched during the refactor and the generated mobile harness was not regenerated. The `npm run test:mobile:generate` output is committed so `validate-mobile-tests.js` passes.

* doc: document missing breaking changes from BaseInference removal

Address feedback to report all breaking changes from the BaseInference refactor, not just the constructor shape:

- `getState()` narrows from `{configLoaded, weightsLoaded, destroyed}` to `{configLoaded}` only.
- LlmLlamacpp public methods removed: `downloadWeights`, `unpause`, `stop`, `status`, `destroy`, `getApiDefinition` (destroy was already mentioned; the other five were missing).
- `load()` takes no arguments (was `(closeLoader, onDownloadProgress)`).
- Type exports removed from `index.d.ts`: `ReportProgressCallback`, `Loader`, `DownloadWeightsOptions`, `DownloadResult`.

Also fix the stale (0.15.0) version marker in the AFTER code block.
* fix: address lifecycle, validation, and CI-surface review findings

- `load()` now runs through `this._run()` so concurrent calls on the same instance serialize instead of racing past the `configLoaded` guard. Two overlapping loads could previously both allocate a native addon and clobber `this.addon`, leaking one native handle.
- The constructor now validates each `files.model` entry with `path.isAbsolute()` and applies the same check to the optional `files.projectionModel` (which previously had no validation at all). Relative paths are rejected at construction time instead of bubbling up from bare-fs / native load.
- `pickPrimaryGgufPath` is now declared in `index.d.ts` so the TS surface matches the CommonJS export at `index.js`.
- Add `test:unit` and `test:unit:generate` scripts that run the JS unit tests under `test/unit/*.test.js` via brittle + bare. Wire `test:unit` into `test:all` and into the PR workflow's ts-checks job so `map-addon-event.test.js`, `pick-primary-gguf-path.test.js`, and the pre-existing `finetuning.test.js` all run on every PR.

* doc: add CHANGELOG entries for load() serialization and absolute-path validation

* fix[ci]: run test:unit via run-lint-and-unit-tests action

Replace my hand-rolled `test:unit` step (which invoked `bare` in a job that never installs it) with the existing run-lint-and-unit-tests external action, the same pattern qvac-lib-infer-onnx and ocr-onnx already use. The action installs bare globally and runs `npm run test:unit --if-present`. Also chain `test:unit` into the `test` script for local dev convenience, matching the standalone-repo precedent (qvac-lib-inference-addon-base, qvac-lib-dl-filesystem, etc.).

* doc: fix mermaid parsing errors in architecture.md and finetuning.md

- `architecture.md:159`: mermaid `classDiagram` uses `{ }` as class-body delimiters, so the inline destructured-object syntax in the constructor signature broke parsing. Replace it with the canonical named type `LlmLlamacppArgs` from `index.d.ts` so the class diagram renders.
- `finetuning.md:251`: the sequence-diagram message contained `(_run)` and `_hasActiveResponse`, where the leading underscore was being interpreted as a mermaid italic-open, and slashes in `validationSplit/useEvalDatasetForValidation/evalDatasetPath` made the message ambiguous. Reword to use prose-style commas and drop the leading-underscore identifiers.

Reported by maxim-smotrov.

* chore[ci]: rename step to reflect what the action actually runs

The run-lint-and-unit-tests action runs `npm run lint` and `npm run test:unit` (and installs bare in between). The step name "Run JavaScript tests" hides the lint half. Rename it to "Run lint and unit tests" and update the step id accordingly.

* fix: readme, finetune lifecycle, multimodal type

The README quickstart, sharded, and OCR examples now use `path.resolve('./models')` so the resulting `files.model` entries and `files.projectionModel` are absolute. The refactored constructor rejects relative paths, which meant the README snippets threw `TypeError` when copied verbatim.

`finetune()` moves the `!this.addon` readiness check and the `_checkpointSaveDir` assignment inside the `this._run(...)` closure, matching the pattern `run()` uses via `_runInternal`. If `unload()` is already queued ahead of `finetune()`, the guard now runs after `unload()` nulls `this.addon` instead of before, so the caller gets the intended "Call load() first." error rather than a null-dereference crash inside the queued body.

`UserMediaMessage.content` widens from `Uint8Array` to `Uint8Array | string`. The C++ layer has always accepted both (raw bytes go through `parseMedia`; string paths go through `loadMedia` in LlamaModel.cpp), and the OCR / multimodal examples exercise the string-path form. The d.ts was inadvertently narrower than the runtime contract.
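The lifecycle rules from these commits (absolute-path validation in the constructor, `load()` serialized through `_run()` and idempotent, and the `finetune()` readiness guard moved inside the queued closure) can be modeled with a promise-chain queue. `ModelLifecycleSketch` and its fields are hypothetical stand-ins, not `LlmLlamacpp`; the sketch uses Node's `path` module for brevity, and its `_run` is a minimal substitute for whatever `exclusiveRunQueue` actually does.

```javascript
const path = require('path')

// Hypothetical stand-in for the LlmLlamacpp lifecycle rules; not the real class.
class ModelLifecycleSketch {
  constructor ({ files }) {
    // Reject relative paths at construction time, as the refactored constructor does.
    for (const file of files.model) {
      if (!path.isAbsolute(file)) throw new TypeError(`files.model entry must be absolute: ${file}`)
    }
    if (files.projectionModel !== undefined && !path.isAbsolute(files.projectionModel)) {
      throw new TypeError(`files.projectionModel must be absolute: ${files.projectionModel}`)
    }
    this.files = files
    this.addon = null
    this.configLoaded = false
    this.createCount = 0 // how many (placeholder) native handles were allocated
    this._queue = Promise.resolve()
  }

  // Minimal exclusive queue: each task starts only after the previous one settles.
  _run (task) {
    const result = this._queue.then(() => task())
    this._queue = result.catch(() => {}) // keep the chain alive after a failure
    return result
  }

  load () {
    return this._run(async () => {
      if (this.configLoaded) return // idempotent: a second load() is a no-op
      this.createCount++
      this.addon = { id: this.createCount } // placeholder for the native addon
      this.configLoaded = true
    })
  }

  unload () {
    return this._run(async () => {
      this.addon = null
      this.configLoaded = false
    })
  }

  finetune () {
    return this._run(async () => {
      // The guard runs inside the queued task, after any unload() queued ahead of it.
      if (!this.addon) throw new Error('Call load() first.')
      return 'finetune started'
    })
  }
}

// Usage, mirroring the README fix: resolve paths before constructing.
const model = new ModelLifecycleSketch({ files: { model: [path.resolve('./models', 'model.gguf')] } })
Promise.all([model.load(), model.load()]).then(() => console.log(model.createCount)) // 1
```

Because the guard runs inside the queued task, a `finetune()` queued behind `unload()` rejects with the intended error instead of dereferencing a null addon, and two overlapping `load()` calls allocate only one handle.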
* fix: preserve LogMsg event name in mapAddonEvent

Native `JsLogMsgOutputHandler` emits log events whose payload is a plain string (`js::String::create(env, logMsg)`). The old mapping had a generic `typeof rawData === 'string'` fallback that remapped every string-payload event to `Output`, so any native LogMsg was quietly pushed into the job output stream instead of the logger. The `_handleAddonOutputEvent` branch that routes `LogMsg` to `this.logger.info()` was therefore unreachable. Check the `LogMsg` event name before the string-to-Output fallback so log messages keep their type and reach the logger. Add a unit test covering the precedence.

* doc: restore class JSDoc, method JSDoc, and media-separation comments

Restore documentation that the refactor dropped but whose content is still accurate against the refactored code:

- Class-level JSDoc on LlmLlamacpp describing what the class does.
- Short JSDoc on `pause()`, `cancel()`, and `unload()` explaining each method's purpose, including how `pause()` saves a resumable checkpoint and how `cancel()` wipes it so the next `finetune()` starts fresh.
- Inline comments in `_runInternal` explaining the media/text separation: binary blobs go into `promptMessages` as `type: 'media'` entries in order, then the JSON text payload carries empty-content placeholders for each media item so tokenization can align.

* doc: shorten pickPrimaryGgufPath JSDoc in d.ts to a single line

Declaration-file JSDoc surfaces in IDE hover tooltips, so multi-paragraph prose is noise. Trim it to a one-liner covering the only behavior the type hover needs to convey. The "exported for unit testing" rationale is dropped since consumers do not need it on the type surface.

* doc: trim verbose comments added during the refactor

Tighten comments this PR introduced that drifted into over-explanation. Leave pre-existing comments as-is.
- `addon.js` mapAddonEvent JSDoc: drop the multi-paragraph prose about C++ event naming and stateful ordering; keep the one-sentence contract plus the param block.
- `index.js` pickPrimaryGgufPath JSDoc: replace the multi-paragraph explanation of the caller's shard-list contract with a single-line summary citing the C++ regex contract.
- `index.js` class header on LlmLlamacpp: reduce to a single purpose line.
- `index.js` constructor block: shorten the lazy-deref rationale and the `_addonEventState` comment to one line each.
- `index.js` `_addonOutputCallback`: reduce the three-line comment pointing at `addon.js` to a single line. The detailed rationale is already in the `addon.js` mapAddonEvent JSDoc.
- `index.js` media-separation comment: restore the one-line wording that already existed on main; an earlier revision expanded it into three lines unnecessarily.

* doc: drop narration comment on _addonOutputCallback

The comment said "Event-name normalization lives in addon.js (mapAddonEvent)", but the very next line imports and calls mapAddonEvent, so the code already tells the reader where event mapping lives. Remove the line so the code speaks for itself.

* doc: restore FinetuneOptions JSDoc to pre-refactor forms

The refactor commit unintentionally rephrased FinetuneOptions JSDoc lines that the refactor itself did not change. Revert those fields back to main's original wording so the diff only carries structural changes tied to the interface migration.

* doc: restore pre-refactor load/createAddon logs and JSDoc

The refactor commit silently dropped the `_load()` progress logs ('Creating addon with configuration', 'Activating addon'), the 'Error during model load' error log, and the JSDoc block on `_createAddon()`. Put them back so the refactor only changes what needs to change.

* chore: drop unused 'test' script, inline into 'test:all'

The 'test' alias was only consumed by 'test:all', and neither was referenced in CI workflows or the README.
'test:all' ran test:unit twice because it called both test:unit and the 'test' alias. Remove 'test' and rewrite 'test:all' to run test:unit, test:integration, and test:cpp directly.

* doc: correct pre-refactor constructor marker to <= 0.15.x

0.15.x still used the old `(args, config)` constructor shape, so the old example applies to any 0.15.x caller, not just 0.14.x. Align the CHANGELOG marker with the PR body.

* test: run AfriqueGemma tests on mobile, matching main

The backmerge of upstream/main carried a stale 'skip: isMobile' from the pre-refactor translation test into the six new translation tests and the edge-cases migration. Main's a570189 deliberately dropped the mobile skip; restore that intent. The `isMobile` constant is unused after this and is dropped.

* doc, test: fix _createAddon JSDoc and cover string-path media content

The `_createAddon()` JSDoc referenced 'configurationParams.settings' and omitted 'projectionPath'. The actual shape built in `_load()` is `{ path, projectionPath, config }`; align the JSDoc with that.

`UserMediaMessage.content` widened to `Uint8Array | string` earlier in this PR, but no integration test exercised the string-path branch. Add one elephant-image test that passes the absolute path as message content, exercising the `loadMedia(string)` path through the JS-to-C++ handoff.

* build: promote @qvac/logging to runtime dependency

`index.js` calls `require('@qvac/logging')` at runtime, so it belongs under dependencies, not devDependencies. Previously it worked only because another runtime dep pulled it in transitively, which is fragile for publish and can break under stricter package managers.

* doc: finish finetuning.md mermaid fix

The previous commit 979a070 reworded only my own addition (line 251), but the block still failed at the same position because the surrounding pre-existing message bodies still used `;` as a statement separator. Mermaid sequenceDiagram parses `;` as end-of-statement, so every message containing it broke the diagram.
Replace `;` with `,` or a separator word across all four affected lines (block #1 lines 251, 256, 266 and block #2 line 296) so the finetune and pause flow diagrams render on GitHub.

* fix: move addon construction into crash-safe try block

`_createAddon()` was outside the try, so a synchronous throw in `require('./binding')` or `binding.createInstance()` would leave `this.addon` set to a partial native handle and never reach the cleanup path. Route addon construction through the same try the shard-streaming and `activate()` calls use.

---------

Co-authored-by: gianni-cor <gianfranco.cordella@tether.io>
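The crash-safe construction pattern can be sketched as follows. It assumes a hypothetical `createAddon` factory standing in for `_createAddon()` (the real code calls `require('./binding')` and `binding.createInstance()`) and a hypothetical `unload()` method on the partial handle. The point is that construction and `activate()` share one `try`, so any throw, synchronous or asynchronous, reaches the same cleanup path.

```javascript
// Hypothetical sketch: createAddon stands in for _createAddon(), which in the real
// code calls require('./binding') and binding.createInstance().
class CrashSafeLoaderSketch {
  constructor (createAddon) {
    this._createAddon = createAddon
    this.addon = null
  }

  async _load () {
    try {
      // Construction sits in the same try as activate(), so a synchronous throw
      // from the factory reaches the cleanup path too.
      this.addon = this._createAddon()
      await this.addon.activate()
    } catch (err) {
      await this._cleanup()
      throw err
    }
  }

  async _cleanup () {
    // Release a partially initialized handle (assumed unload() method), then drop it.
    if (this.addon && typeof this.addon.unload === 'function') await this.addon.unload()
    this.addon = null
  }
}

const loader = new CrashSafeLoaderSketch(() => { throw new Error('binding not found') })
loader._load().catch((err) => console.log(err.message, loader.addon)) // binding not found null
```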

* doc: Re-add favicon in InkeepAI widget
* doc: update release notes link
* doc: update external resources - add X QVAC profile

---------

Co-authored-by: Matteo Giardino <mat.gia.dev@gmail.com>

Picks up mario-rei/qvac-registry-vcpkg e36670e, which adds the missing spirv-headers dep on the whisper-cpp port. Unblocks linux/win32/android prebuild legs. Also routes spirv-headers to microsoft/vcpkg in packages/transcription-whispercpp/vcpkg-configuration.json so vcpkg can resolve the new transitive dep when the vulkan feature is active. bci-whispercpp does not activate whisper-cpp[vulkan] so its configuration only needs the baseline bump.
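Routing a port to a particular registry works through the `packages` list of each registry entry in `vcpkg-configuration.json`; ports no registry claims fall back to `default-registry`. The function below is a simplified, hypothetical model of that lookup for exact names only (real vcpkg also accepts glob patterns). The layout is an assumption, with the fork as the default registry and microsoft/vcpkg as a secondary registry; the actual file in this PR may be organized differently, and the baselines are placeholders.

```javascript
// Simplified, hypothetical model of vcpkg registry selection by exact port name.
// Real vcpkg also supports glob patterns in "packages"; this sketch ignores them.
function registryForPort (config, portName) {
  const registries = config.registries || []
  const claimed = registries.find((registry) => (registry.packages || []).includes(portName))
  return claimed || config['default-registry']
}

// Assumed layout: the fork is the default registry and microsoft/vcpkg serves an
// explicit list of ports. Baselines are placeholders, not the real commits.
const config = {
  'default-registry': {
    kind: 'git',
    repository: 'https://github.com/mario-rei/qvac-registry-vcpkg',
    baseline: '<validation-branch-head>'
  },
  registries: [
    {
      kind: 'git',
      repository: 'https://github.com/microsoft/vcpkg',
      baseline: '<microsoft-vcpkg-commit>',
      packages: ['spirv-headers']
    }
  ]
}

console.log(registryForPort(config, 'spirv-headers').repository) // https://github.com/microsoft/vcpkg
console.log(registryForPort(config, 'whisper-cpp').repository) // https://github.com/mario-rei/qvac-registry-vcpkg
```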

The overrides block was taking precedence over the baseline bump from 72d6c41, so vcpkg kept resolving the old port-version 0 (which lacks the spirv-headers dep), and the linux/win/android prebuild legs failed on the missing SPIRV headers. This pins port-version 1 explicitly so the registry fix (mario-rei/qvac-registry-vcpkg@e36670e) actually takes effect.
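The precedence problem above can be modeled in a few lines: an `overrides` entry pins an exact version and wins over the baseline, and an omitted `port-version` means 0. `resolvePort`, the flattened baseline shape, and the "before" manifest (an override without `port-version`) are hypothetical simplifications, not vcpkg's resolver or the real files.

```javascript
// Hypothetical, simplified model of vcpkg version selection for one port:
// an "overrides" entry pins an exact version and wins over the baseline.
function resolvePort (manifest, baseline, name) {
  const override = (manifest.overrides || []).find((entry) => entry.name === name)
  if (override) {
    // An omitted "port-version" in an override means port-version 0.
    return { version: override.version, portVersion: override['port-version'] || 0 }
  }
  return baseline[name]
}

// Flattened baseline after the registry bump (placeholder shape, not baseline.json).
const baseline = { 'whisper-cpp': { version: '1.8.4.3', portVersion: 1 } }

const before = { overrides: [{ name: 'whisper-cpp', version: '1.8.4.3' }] }
const after = { overrides: [{ name: 'whisper-cpp', version: '1.8.4.3', 'port-version': 1 }] }

console.log(resolvePort(before, baseline, 'whisper-cpp').portVersion) // 0
console.log(resolvePort(after, baseline, 'whisper-cpp').portVersion) // 1
```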

The bci-whispercpp cpp-tests job manually installed clang-19 via `sudo ./llvm.sh 19 all` but never put `/usr/lib/llvm-19/bin` on PATH. That left unversioned `clang`/`clang++` pointing at the runner's default clang-14 while `libc++-19-dev` replaced the libc++ headers under `/usr/include/c++/v1`. With the triplet's `-stdlib=libc++`, the build paired the libc++-19 headers with clang-14 and failed inside ggml-impl.h with "'std' is not a class, namespace, or enumeration" once a vcpkg cache miss forced ggml to be rebuilt from source.

The sibling transcription-whispercpp job survived because it uses the canonical `.github/actions/setup-llvm` composite, which is the single source of truth for the monorepo's clang major (currently 22) and prepends the versioned bin to PATH.

Switch the bci-whispercpp cpp-test workflow to that same composite action so clang and libc++ stay version-matched, and drop the hard-coded `llvm-cov-19` / `llvm-profdata-19` invocations in bci-whispercpp's `coverage:cpp` scripts in favour of the unversioned binaries that `setup-llvm` now exposes (mirroring the transcription-whispercpp setup).

The previous commit's workflow + package.json edits cannot take effect through this PR: `on-pr-bci-whispercpp.yml` runs as `pull_request_target`, so its `workflow_call` of `cpp-test-coverage-bci-whispercpp.yml` resolves the reusable workflow file against `main`, not the PR branch. The locked-from-main BCI cpp-tests job runs `sudo ./llvm.sh 19 all` and never adds `/usr/lib/llvm-19/bin` to PATH, so unversioned `clang`/`clang++` keep pointing at the runner image default (clang-14) while `libc++-19-dev` replaces the headers under `/usr/include/c++/v1`. When the vcpkg cache misses (as it does on every BCI run, since the binary cache is workspace-local), ggml is rebuilt from source and the libc++-19 headers fail to compile against clang-14.

Revert the previous workflow + package.json changes: they would only take effect once merged to `main`, and the package.json change would actively break coverage against the current workflow.

Instead, fix the toolchain that vcpkg actually loads from this branch: detect the highest LLVM major installed under `/usr/lib/llvm-N/bin` and pin `CMAKE_C_COMPILER` / `CMAKE_CXX_COMPILER` to that versioned bin. This keeps clang and libc++ version-matched regardless of whether the workflow pre-installs LLVM 19 (current main) or LLVM 22 (post setup-llvm). It falls back to unversioned `clang`/`clang++` on hosts that don't ship a `/usr/lib/llvm-N` tree (local macOS dev, runners that already expose the desired major via PATH).
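The detection rule above (highest `/usr/lib/llvm-N` major, else unversioned compilers from PATH) is easy to get subtly wrong with a lexical sort, since the string `llvm-9` sorts after `llvm-19`. The actual change lives in the vcpkg CMake toolchain, which is not reproduced here; the JavaScript below only sketches the selection logic, `pickClangCompilers` is a hypothetical name, and a real implementation would also confirm that `bin/clang` exists in the chosen tree.

```javascript
const fs = require('fs')

// Hypothetical JS rendering of the toolchain's compiler selection; the actual
// fix lives in the vcpkg CMake toolchain, which this does not reproduce.
function pickClangCompilers (libEntries) {
  let highest = -1
  for (const entry of libEntries) {
    const match = /^llvm-(\d+)$/.exec(entry)
    // Compare majors numerically: as strings, 'llvm-9' would sort after 'llvm-19'.
    if (match) highest = Math.max(highest, Number(match[1]))
  }
  if (highest < 0) {
    // No /usr/lib/llvm-N tree (e.g. local macOS): fall back to whatever is on PATH.
    return { cc: 'clang', cxx: 'clang++' }
  }
  const bin = `/usr/lib/llvm-${highest}/bin`
  return { cc: `${bin}/clang`, cxx: `${bin}/clang++` }
}

const entries = fs.existsSync('/usr/lib') ? fs.readdirSync('/usr/lib') : []
console.log(pickClangCompilers(entries))
```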

sharmaraju352 and others added 30 commits April 15, 2026 12:19

doc: point diffusion README to qvac fork (#1609)

8e4b6c4

* doc: point diffusion README to qvac fork

Clarify that the diffusion addon vendors qvac-ext-stable-diffusion.cpp rather than the upstream stable-diffusion.cpp repository.

Fix chunking not working in whisper (#1615)

aced7e0

chore: unify prebuild vcpkg cache setup on shared S3 backend (#1603)

c8f18c7

Fix extra brace in TTS script (#1621)

7a58f4d

test: define smoke suite for SDK test definitions (#1577)

f606702

Add `suites: ["smoke"]` to 84 test definitions across 27 files. Curated for API surface coverage, validation quality, and performance.

Fix TTS C++ failing test (#1624)

56b5901

Revert "chore: remove Opus model entries from registry server (#1602)" (#1627)

3ebee8c

This reverts commit 436e29c.

Qvac 16772 sdk integration for bergamot (#1620)

2365732

* Updated SDK translation-config.ts and released-models-bergamot.txt
* Ran update-models.

QVAC-17317 doc: write API ref SDK v0.9.0 (#1632)

5a5ea01

* doc: api v0.9 manually written
* doc: SDK - api reference - v0.9.0

feat[api]: implement cancel embed operation (QVAC-11724) (#1607)

975d06d

infra: refresh AWS credentials before Vulkan ARM64 cache upload (#1639)

9154065

infra[notask]: switch sdk desktop ci macos runner to mac-mini-m4-gpu (#1646)

24c6f6b

QVAC-14302 feat[api]: export RuntimeStats interface in OCR addon index.d.ts (#1612)

9a6e8d6

QVAC-17410 doc: Content update: minor adjustments (#1656)

391da8a


mario-rei had a problem deploying to release May 11, 2026 15:01 — with GitHub Actions Failure

mario-rei temporarily deployed to release May 11, 2026 17:09 — with GitHub Actions Inactive

mario-rei had a problem deploying to release May 11, 2026 17:09 — with GitHub Actions Failure

mario-rei temporarily deployed to release May 11, 2026 17:09 — with GitHub Actions Inactive

mario-rei had a problem deploying to release May 11, 2026 17:09 — with GitHub Actions Failure

mario-rei temporarily deployed to release May 11, 2026 17:09 — with GitHub Actions Inactive

mario-rei added 4 commits May 11, 2026 22:12


QVAC-18300 chore[notask]: validate whisper-cpp 1.8.4.3 upstream sync #1975

mario-rei wants to merge 1181 commits into main from tmp-whisper-184-3-validation

mario-rei commented May 11, 2026


20 participants
