fix: sync JS layer types #950
Merged
gianni-cor merged 10 commits into tetherto:feature-media-generation on Mar 17, 2026
- Wire up notifyProcessExit in binding.js to prevent SIGSEGV on process shutdown when GPU backends are already torn down
- Sync TypeScript types with C++ handler reality:
  - SamplerMethod: add 6 missing values, fix string literals to match C++ parser (dpm++2m not dpm++_2m)
  - ScheduleType: add 6 missing values, remove invalid 'default'
  - RngType: add 'std_default'
  - Add PredictionType and SdConfig.prediction field
- Fix addonLogging.d.ts to use named exports matching the .js module
- WeightType: add 6 missing quantization types (bf16, q2_k, q3_k, q4_k, q5_k, q6_k), rename 'default' to 'auto' to match C++ parser
- SdConfig: add 12 missing fields from C++ handler map (sampler_rng, diffusion_fa, mmap, offload_to_cpu, flow_shift, diffusion_conv_direct, vae_conv_direct, circular_x, circular_y, force_sdxl_vae_conv_scale, backends_dir, tensor_type_rules, lora_apply_mode)
- GenerationParams: add 7 missing fields (eta, img_cfg_scale, clip_skip, vae_tile_size, vae_tile_overlap, cache_mode, cache_threshold)
- Add CacheMode and LoraApplyMode type aliases
- Increase model load time assertion from 120s to 180s across all integration tests (Windows runner exceeded 120s at 130.9s)
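The type-sync items above amount to keeping string-literal unions in step with the keys the C++ parser accepts. A minimal sketch of one way to do that, assuming the unions are derived from `as const` arrays so the same list can also drive a runtime check. The array names and the `isOneOf`, `isSamplerMethod`, and `isWeightType` helpers are hypothetical; only `dpm++2m`, `auto`, and the six quantization names come from this PR, while `euler` and `euler_a` are assumed placeholders for the rest of the sampler list.

```typescript
// Illustrative subset only; the full unions live in index.d.ts.
// "euler" and "euler_a" are assumed placeholders, "dpm++2m" is from the PR.
const SAMPLER_METHODS = ["euler", "euler_a", "dpm++2m"] as const;
type SamplerMethod = (typeof SAMPLER_METHODS)[number];

// 'auto' replaces the old 'default'; the six quantization names are from the PR.
const WEIGHT_TYPES = ["auto", "bf16", "q2_k", "q3_k", "q4_k", "q5_k", "q6_k"] as const;
type WeightType = (typeof WEIGHT_TYPES)[number];

// Hypothetical helper: narrows an arbitrary string to a member of the list.
function isOneOf<T extends string>(allowed: readonly T[], value: string): value is T {
  return (allowed as readonly string[]).indexOf(value) !== -1;
}

const isSamplerMethod = (v: string): v is SamplerMethod => isOneOf(SAMPLER_METHODS, v);
const isWeightType = (v: string): v is WeightType => isOneOf(WEIGHT_TYPES, v);
```

With this shape, a stale key such as `dpm++_2m` or `default` fails both at compile time (it is not in the union) and at runtime (the guard returns false).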
gianni-cor requested changes on Mar 17, 2026
…lint" This reverts commit e8daa86.
The SDK imports addonLogging as a default import, so keep the AddonLogging interface + export default pattern.
- circularx/circulary (not circular_x/circular_y)
- backendsDir (not backends_dir)
- Add 'circular' shorthand for both axes
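To make the corrected key names concrete, here is a small sketch of how a `circular` shorthand could expand to the two per-axis keys. The `CircularConfig` interface, the boolean field types, the `resolveCircular` helper, and the rule that an explicit axis overrides the shorthand are all assumptions for illustration; only the key spellings come from the commit.

```typescript
// Hypothetical fragment of the config; key names use the corrected spelling.
interface CircularConfig {
  circular?: boolean;  // shorthand for both axes
  circularx?: boolean;
  circulary?: boolean;
  backendsDir?: string;
}

// Assumed semantics: the shorthand sets both axes unless an axis is given explicitly.
function resolveCircular(cfg: CircularConfig): { x: boolean; y: boolean } {
  const both = cfg.circular ?? false;
  return { x: cfg.circularx ?? both, y: cfg.circulary ?? both };
}
```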
… fix/diffusion-js-layer-fixes

# Conflicts:
# packages/lib-infer-diffusion/index.d.ts
c09eec8 into tetherto:feature-media-generation
14 of 15 checks passed
gianni-cor added a commit that referenced this pull request on Mar 19, 2026
* updated for sd * updated and successfuly built * downloads * updated with working loading * updated load model js for Q4_K test * rewrote parameter handling to support multiple params and also two different model types * got sd inference to work * updated for sd2 * got full sdxl to work * rename folder to qvac-lib-infer-diffusion * update package name * sd3 finished * rename: qvac-lib-infer-diffusion -> lib-infer-diffusion Rename package directory from packages/qvac-lib-infer-diffusion to packages/lib-infer-diffusion to align with the lib-* naming convention used across the monorepo. Made-with: Cursor * updated for cuda linux * updated for model * have something working * changelog * cpp lint * formatt * updated model for gian * integration test * fixing according to boss * fix(android): enable BUILD_SHARED_LIBS and stub pthread_cancel for GGML_BACKEND_DL GGML_BACKEND_DL requires BUILD_SHARED_LIBS=ON so CMake can build GPU backends as MODULE targets (.so). Previously BUILD_SHARED_LIBS was hardcoded OFF, causing configure to fail on Android. Also stub out pthread_cancel in ggml-backend-reg.cpp via a cmake string replacement — pthread_cancel is unavailable in the Android NDK. The loader thread terminates naturally without the explicit cancel. Made-with: Cursor * fix(android): exclude Vulkan on Android and fix pthread_cancel stub Two portfile fixes for arm64-android cross-compile: 1. SD_VULKAN: the else() branch was enabling -DSD_VULKAN=ON for Android, causing find_package(Vulkan) to pick up the host x86_64 SDK during cross-compile and fail CMake configure. Android Vulkan support comes via the NDK and is handled separately; skip the flag entirely. 2. pthread_cancel: replace the fragile comment-based no-op with a proper inline stub guarded by #if defined(__ANDROID__), injected at the top of ggml-backend-reg.cpp before compilation. 
Made-with: Cursor * ci: dump vcpkg configure logs on failure for android build Adds an always-run step that cats all config-*.log files from the vcpkg stable-diffusion-cpp buildtrees on failure, so the exact CMake configure error is visible inline in the CI job output. Made-with: Cursor * fix(android): insert pthread_cancel stub after pthread.h include The previous stub was prepended to the top of ggml-backend-reg.cpp before any #include, so pthread_t was undefined and the stub itself failed to compile — leaving pthread_cancel undeclared for the actual call site. Fix: insert the no-op stub immediately after #include <pthread.h> so pthread_t is available. Add a fallback that prepends both the include and stub if <pthread.h> isn't found directly. Also pass HAVE_PTHREAD_CANCEL=0 and GGML_HAVE_PTHREAD_CANCEL=OFF as CMake cache variables to disable any check_function_exists tests, and add DISABLE_PARALLEL_CONFIGURE to avoid race conditions with source patches. Made-with: Cursor * fix(android): resolve BUILD_SHARED_LIBS override and pthread_cancel issues Locally verified: stable-diffusion-cpp:arm64-android now configures and builds successfully. Three root causes fixed: 1. BUILD_SHARED_LIBS override: vcpkg maps VCPKG_LIBRARY_LINKAGE to BUILD_SHARED_LIBS, and the arm64-android triplet sets linkage to "static" — appending -DBUILD_SHARED_LIBS=OFF after our explicit ON. Additionally, stable-diffusion.cpp's CMakeLists.txt resets BUILD_SHARED_LIBS=OFF unless SD_BUILD_SHARED_GGML_LIB=ON. Fix: set VCPKG_LIBRARY_LINKAGE=dynamic for this port when DL backends are enabled, and pass -DSD_BUILD_SHARED_GGML_LIB=ON. 2. pthread_cancel stub redefinition: the previous stub was inserted via string(REPLACE) + fallback string(PREPEND), but both paths executed — producing a duplicate definition error. Also, vcpkg reuses cached source trees, so patches accumulated across builds. 
Fix: use a sentinel comment for idempotency; only one insertion path with the stub placed after #include <pthread.h>. 3. Removed the now-unnecessary explicit BUILD_SHARED_LIBS_OPTION variable since VCPKG_LIBRARY_LINKAGE handles it correctly. Made-with: Cursor * updated for android hopefully works * added opencl support for android * windows attempt fix * attempting to fix windows again * NORM problem with ggml operation * attempting to patch norm * attempting again to fix * diagonstic step * update for opencl * updated for device selection * fix(diffusion): add CI/CD workflows, test infra, and integration tests (#676) * fix(diffusion): rebase on feature-media-generation, add CI improvements Rebased cleanly onto feature-media-generation to pick up: - SD_CPU_ONLY env var gate (Metal NORM op fallback to CPU) - GGML_OPENMP=OFF (eliminates libomp.so.5 dependency) - OpenCL support for Android Additions on top of base: - Add cpp-tests and ts-checks jobs to on-pr workflow - Add image artifact upload to integration tests (traceable to source test) - Disable win32 in prebuilds/integration/cpp-tests (C1128 /bigobj) - Install libomp5 on Linux integration tests (safety net) - Test infrastructure: unit tests, mobile test framework, scripts * fix(diffusion): address PR review comments, enable win32, improve CI artifacts - Re-enable win32 platform in prebuilds, integration-test, and cpp-tests workflows - Remove duplicate PULL_REQUEST_TEMPLATE.md (already in repo root) - Fix setDiff in validate-mobile-tests.js to handle non-Set inputs - Refactor generate-image.test.js to use ensureModel from utils.js - Save test images to modelDir for mobile permission compatibility - Update CI to look for images in test/model/ instead of output/ - Add PR comment step to post image metadata on pull requests * fix(diffusion): restore base branch code accidentally removed during rebase Restores SD_CPU_ONLY patch, GGML_OPENMP=OFF, OpenCL support, Apple keep_clip_on_cpu guard, and VCPKG_BUILD_TYPE 
placement that were dropped when patches were applied on top of the reset base. * style(diffusion): fix lint errors in examples (no-multi-spaces, indent) * feat(diffusion): upload test images to S3 and display inline in step summary Images are uploaded to S3 with public-read ACL, then embedded in the step summary and PR comments via their S3 URLs so they render inline without needing to download artifacts. * ci(diffusion): remove libomp5 install (fixed by GGML_OPENMP=OFF in portfile) * remove S3 upload, use simple table summary for generated images * restore AWS env vars from base branch * refactor(diffusion): consolidate test utils, remove helpers.js Move detectPlatform, setupJsLogger, isPng into utils.js and update generate-image.test.js to import from utils.js only. Add platform detection for device selection in model-loading.test.js. * fixed integration tests * updated * updated timeout * cpp unit tests complete and tested YAY BABY * cpp lint * updated * test(diffusion): add integration tests for SDXL, SD3, and FLUX.2 (#757) * test(diffusion): add integration tests for SDXL, SD3, and FLUX.2 Add integration tests for all supported model families based on the existing examples. Each test follows the LLM addon patterns: platform- aware device selection, defensive cleanup with .catch(), ensureModel for CI downloads. - generate-image-sdxl.test.js: SDXL Base 1.0 (all-in-one GGUF, auto eps-prediction) - generate-image-sd3.test.js: SD3 Medium (safetensors, flow prediction, euler sampler) - generate-image-flux2.test.js: FLUX.2 klein 4B (split layout: diffusion + LLM + VAE) - Regenerate all.js (brittle) and integration.auto.cjs (mobile) * fix(diffusion): use CPU on all darwin platforms Metal's GGML_OP_MUL_MAT is unsupported for stable-diffusion.cpp, causing SIGABRT on darwin-arm64. Use isDarwin (all darwin) instead of isDarwinX64 for the useCpu check. 
* revert: keep GPU on darwin-arm64 to surface Metal errors Don't hide GPU errors behind CPU fallback — the Metal MUL_MAT issue needs to be visible so it gets fixed. * test(diffusion): increase test timeouts for CPU-bound runs FLUX.2 30min, SDXL/SD3 15min — these models are too heavy for the default 10min timeout when running on CPU. * chore: remove all.js from tracking (auto-generated, gitignored) * test(diffusion): skip SDXL, SD3, and FLUX.2 tests on mobile * QVAC-13954: Clean up vcpkg deps in lib-infer-diffusion (#781) * refactor: split ggml into standalone vcpkg overlay port Decouple ggml from the stable-diffusion-cpp overlay port so it can be shared by multiple consumers with consistent ABI guarantees. - Add standalone ggml overlay port (version-date 2026-01-30) pinned to the same commit used by stable-diffusion.cpp master-514-5792c66 - Refactor stable-diffusion-cpp port to use vcpkg_from_github + SD_USE_SYSTEM_GGML=ON instead of cloning with --recurse-submodules - Patch ggml's src/CMakeLists.txt and cmake/ggml-config.cmake.in to propagate GGML_MAX_NAME=128 via INTERFACE_COMPILE_DEFINITIONS, ensuring all consumers share the same struct layout - Switch both ports to version-date versioning (no upstream semver) - Replace bundled stb headers with vcpkg stb dependency - Auto-enable Vulkan backend on Linux via platform dependency - Forward GPU backend features (metal/vulkan/cuda/opencl) from stable-diffusion-cpp to ggml through vcpkg feature * fix(diffusion): fix ggml/sd overlay ports for Android cross-compilation Add NDK-matched Vulkan C++ header detection so the ggml port downloads headers matching the exact NDK Vulkan version instead of pulling a potentially mismatched vcpkg vulkan-headers package. Add missing ggml-opencl.h to the public headers install list. Auto-enable opencl on Android and vulkan on desktop/Android via default-features in both the ggml and stable-diffusion-cpp overlay ports. 
* fix(diffusion): disable OpenMP and align ggml flags with qvac-fabric Add GGML_OPENMP=OFF to fix Windows CI failure where OpenMP is unavailable, and GGML_LLAMAFILE=OFF to disable unused code paths. Add Android-specific flags for DL backends (GGML_BACKEND_DL, CPU_ALL_VARIANTS, CPU_REPACK) and disable cooperative matrix Vulkan extensions on mobile GPUs. * fix(diffusion): fix ggml include dirs for DL backends and use tetherto fork Patch ggml-config.cmake.in to set INTERFACE_INCLUDE_DIRECTORIES on the ggml::ggml and ggml::ggml-base targets unconditionally. When GGML_BACKEND_DL is ON, the per-backend targets are not created and include dirs were lost. Also switch the SD source to the tetherto fork and drop the qvac-diffusion- library prefix from CMakeLists.txt now that ggml is a standalone port with standard names. * Remove redundancies in vcpkg manifest files * Set SD_CPU_ONLY=1 on CI env * updated for runtime stats * fixed connection to logger, as it was not properly connected before * fixed for license file, validated working run on m1 air * quickstart quick-maths * fixed integration for windows * fix(diffusion): add real cancel/abort support to native generation (#782) * fix(diffusion): add real cancel/abort support to native generation Cancel previously only set an atomic flag checked after generate_image() returned — generation ran to full completion and output was silently discarded. This made cancel appear to work while still burning full compute time. 
Changes: Portfile patches (stable-diffusion.cpp): - Add sd_abort_cb_t typedef and sd_set_abort_callback() public API - Add sd_abort_requested() helper checked in the denoise lambda - When abort fires, denoise returns nullptr which the sampler stack already treats as failure → generate_image() returns NULL - Fix upstream bug: abort path freed wrong compute buffer (diffusion_model instead of work_diffusion_model), corrupting sd_ctx and causing segfault on reuse SdModel.cpp: - Wire cancelRequested_ into abort callback via thread-local (matches existing progress callback pattern for concurrency safety) - Scope guard ensures callbacks are cleared on all exit paths including early parse/validation exceptions - Always free results[i].data whether cancelled or not (buffer leak fix) - Cancelled jobs throw "Job cancelled" → JobRunner emits queueException instead of fake success with queueResult + queueJobEnded - Return empty std::any from process() so queueJobEnded() is the sole terminal stats path (fixes duplicate JobEnded events in JS) SdModel.hpp: - Add isCancelRequested() public accessor for the static abort callback * fix(diffusion): disable free_params_immediately for model reuse The upstream sd_ctx_params_init() defaults free_params_immediately=true, which permanently frees model weight buffers after the first generate_image() call. Any subsequent generation on the same sd_ctx accesses freed memory and crashes (SIGSEGV). Set the default to false so the addon supports multiple generations on the same model instance (the expected use pattern). This was the root cause of the "cancel then run" crash — the abort path still runs through generate_image_internal() which calls diffusion_model->free_params_buffer() when this flag is true. 
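The cancel semantics described above (flag checked inside the step loop, cancel surfaced as a "Job cancelled" error, flag cleared on entry so the model stays reusable) can be sketched in a few lines. This is a TypeScript stand-in for the native loop, not the addon's code: `CancelFlag` and `generate` are hypothetical names, and the real check lives in the C++ denoise lambda.

```typescript
// Hypothetical stand-in for the native cancelRequested_ flag.
class CancelFlag {
  private requested = false;
  request(): void { this.requested = true; }
  reset(): void { this.requested = false; }
  isRequested(): boolean { return this.requested; }
}

// Hypothetical stand-in for the native step loop.
function generate(totalSteps: number, flag: CancelFlag, onStep: (step: number) => void): number {
  flag.reset(); // cleared on entry, so a cancel issued while idle does not leak into this run
  for (let step = 1; step <= totalSteps; step++) {
    if (flag.isRequested()) {
      // Cancel is reported as an error: there is no useful partial output.
      throw new Error("Job cancelled");
    }
    onStep(step);
  }
  return totalSteps;
}
```

Checking the flag before each step is what stops the run early; checking it only after the loop would reproduce the original bug, where a cancelled job still paid for every step.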
* fix(diffusion): add code comments and rename fix-abort-cleanup patch - Add comments to SdCtxHandlers.hpp explaining why freeParamsImmediately is disabled (upstream default frees weight buffers after first generation, causing use-after-free on model reuse) - Add comments to both hunks in the upstream cleanup patch explaining the compute buffer bug and work_ctx leak - Rename fix-abort-cleanup.patch to fix-failure-path-cleanup.patch since the fixes apply to any failure path, not just abort * fix(diffusion): document cancel-as-error rationale vs LLM addon Diffusion throws on cancel (queueException) while LLM returns normally (queueResult). Add comment explaining the intentional difference: diffusion has no useful partial output, so an explicit error signal is more honest than a success with output_count=0. * test(diffusion): add C++ unit tests for cancel/context handling Add test_cancel_context.cpp covering the context changes from the cancel fix: - cancel when idle is a no-op (no crash, no state corruption) - cancel during generation throws "Job cancelled" (cancel-as-error path) - model is reusable after cancel (validates freeParamsImmediately=false and compute buffer fix — the exact SIGSEGV scenario) - multiple sequential generations succeed (normal reuse without cancel) - cancelRequested_ flag is reset at process() entry - process() on unloaded model throws (not segfault) - runtime stats are populated after successful generation * fix(diffusion): fix patch line counts and test assertion - Fix fix-failure-path-cleanup.patch: correct hunk line counts (-2203,7 +2203,11 and -3796,6 +3800,13) and replace Unicode em-dashes with ASCII in comments - Fix CancelWhenIdleIsNoop test: cancel() sets the flag even when idle, it is only cleared on process() entry * refactor(diffusion): static ggml core with DL backends and CMakeLists cleanup (#794) * refactor(diffusion): static ggml core with DL backends and CMakeLists cleanup Patch ggml to support GGML_BACKEND_DL with 
BUILD_SHARED_LIBS=OFF by enabling PIC and backend compile definitions when DL is on, matching the qvac-fabric approach. Remove VCPKG_LIBRARY_LINKAGE=dynamic override — core libs are now static .a with PIC, backends remain MODULE .so files. Clean up CMakeLists.txt: remove redundant explicit linking of OpenCL, Metal frameworks, CUDA libs, and ggml (all propagated transitively via ggml cmake config). Fix WIN32_LEAN_AND_MEAN typo, remove stale comments, and drop the clang overlay triplet workaround. * chore(diffusion): switch Linux to libc++, fix vcpkg warnings, remove dead patches Add libc++ triplets for x64-linux and arm64-linux under vcpkg/triplets, matching the qvac-lib-infer-llamacpp-llm layout. Move triplet and toolchain files from vcpkg-override-triplets to vcpkg/. Install the stable-diffusion-cpp usage file and suppress mismatched binary count warnings in both overlay ports. Remove obsolete rename-ggml-libs and no-dlopen-without-backend-dl patches from the old submodule architecture. * fix(diffusion): disable GGML_BACKEND_DL for Android static backends stable-diffusion.cpp calls ggml_backend_is_cpu() and ggml_backend_cpu_init() directly, which live in the CPU backend module. With GGML_BACKEND_DL these become separate .so files unavailable at link time, causing dlopen failures on device. Statically link all backends (CPU, Vulkan, OpenCL) instead, and bundle the OpenCL ICD loader .so on Android so the addon loads even on devices without a system libOpenCL. * Place the OpenCL ICD Loading library next to bare file * fix(diffusion): graceful OpenCL fallback and backend priority reorder Patch ggml's OpenCL backend to return nullptr instead of aborting when no OpenCL devices are found (e.g. Pixel phones without OpenCL support). Reorder SD backend priority to CUDA > Metal > OpenCL > Vulkan > CPU, preferring OpenCL on Adreno devices where it outperforms Vulkan, with if-guards so only the first successful backend is used. 
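The priority order CUDA > Metal > OpenCL > Vulkan > CPU with "first successful backend wins" can be sketched as a simple probe loop. This is an illustration in TypeScript, not the patched C++: `selectBackend` and the `tryInit` probe are hypothetical, and a `null` return stands in for the `nullptr` the patched OpenCL backend returns when no device is found.

```typescript
type Backend = "cuda" | "metal" | "opencl" | "vulkan" | "cpu";

// Priority order taken from the commit message.
const PRIORITY: readonly Backend[] = ["cuda", "metal", "opencl", "vulkan", "cpu"];

// tryInit is a hypothetical probe; it returns null when a backend cannot be initialised.
function selectBackend<H>(tryInit: (b: Backend) => H | null): { backend: Backend; handle: H } {
  for (const backend of PRIORITY) {
    const handle = tryInit(backend);
    if (handle !== null) {
      return { backend, handle }; // stop at the first success; lower priorities are never probed
    }
  }
  throw new Error("no backend could be initialised");
}
```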
* feat(diffusion): Adreno-aware backend selection for Android Detect Adreno GPU model at runtime via ggml device enumeration and choose the optimal backend: Adreno 800+ uses GPU (OpenCL), Adreno 600/700 is forced to CPU due to poor OpenCL performance, and non-Adreno devices fall through to Vulkan. Adds INFO-level logging of detected devices and selection decisions for troubleshooting. * fix(diffusion): statically link OpenCL ICD loader on Android Add an overlay port for opencl that removes the dynamic-only restriction, allowing the ICD loader to be built as a static library. This eliminates libOpenCL.so as a NEEDED dependency so the addon loads on all Android devices regardless of OpenCL support. The static ICD loader still dlopen's vendor drivers at runtime. * Fixed formatting * CPU only on Android * feat(diffusion): hybrid static CPU + dynamic GPU backends for Android (#813) * feat(diffusion): hybrid static CPU + dynamic GPU backends for Android Add GGML_CPU_STATIC option that builds the CPU backend as a static library linked into ggml even when GGML_BACKEND_DL is ON. GPU backends (Vulkan, OpenCL) remain MODULE .so files loaded at runtime via dlopen, eliminating libOpenCL.so as a NEEDED dependency. This lets stable-diffusion.cpp call CPU backend functions directly (ggml_set_f32, ggml_backend_cpu_init, etc.) while GPU backends are discovered at runtime — a single Android binary works on all devices regardless of OpenCL/Vulkan support. * feat(diffusion): generic backend init using ggml registry API Replace SD's init_backend() #ifdef waterfall with generic ggml calls (ggml_backend_init_by_type) that work with both statically linked and dynamically loaded backends. Load DL backend modules from the addon via ggml_backend_load_all_from_path() when GGML_BACKEND_DL is enabled. This eliminates SD's dependency on GPU-specific headers (ggml-opencl.h, ggml-vulkan.h, etc.) 
and removes the SD_METAL/VULKAN/CUDA/OPENCL build flags, replacing sd-cpu-only.patch and sd-backend-priority.patch with a single sd-generic-backend-init.patch. * feat(diffusion): prefer OpenCL on Adreno 800+ via sd_ctx backend preference Add a new backend preference field in stable-diffusion context params and wire SdModel to request OpenCL for Adreno 800+ when available, while keeping SD_CPU_ONLY as CI-only env override. Also fix ggml hybrid export wiring so CPU static symbols are linked for Android DL backend mode, and refresh android-arm64 prebuild artifact. * fix(diffusion): pass backendsDir to SdCtxConfig * Added logging to troubleshoot pixel vulkan init * fix(diffusion): JS layer review fixes and cancel test coverage (#783) * fix(diffusion): JS layer review fixes and cancel test coverage Aligns the JS layer with the LLM addon patterns and adds API behavior tests for cancel/busy/idle state transitions. JS layer: - Rename run() to _runInternal() (BaseInference template method pattern) - Replace 30ms timer guard with _hasActiveResponse boolean - Extract _getWeightFiles() to deduplicate file lists in _load/_downloadWeights - Wrap _runGeneration in _withExclusiveRun for serialization - Add finalized.catch(() => {}) unhandled rejection guard - Reset _hasActiveResponse in unload() - Filter undefined values in addon config coercion - Remove orphaned unloadWeights() from addon.js - Update class doc and README to match actual supported models Types (index.d.ts): - Fix run() signature: Txt2ImgParams (was accepting txt2vid params) - Proper type hierarchy: Txt2ImgParams → Img2ImgParams → GenerationParams - Add missing params: guidance, sampling_method, scheduler - Remove unused type declarations Tests: - Add api-behavior.test.js with 5 cancel/busy/idle tests - idle|run, idle|cancel, run|cancel, run|run (busy), cancel|run (rerun) - cancel|run test requires native abort support (fix/diffusion-cancel-abort) * fix(diffusion): cancel inside onUpdate callback matching LLM 
pattern Cancel tests now fire model.cancel() inside the onUpdate callback after the first progress tick (string data), matching the LLM addon's runAndCancelAfterFirstToken pattern. This ensures native generation is guaranteed to be active when cancel fires, preventing false passes. * fix(diffusion): use const for non-reassigned chain variable Standard JS lint requires const for variables that are never reassigned. * fix(diffusion): update scope note instead of removing it FLUX.1 and Wan2.x video are still not supported — keep that explicit. * fix(diffusion): video generation is planned, not excluded Wan2.x support is planned for the future — update scope note accordingly. * fix(diffusion): address PR review — remove WeightsProvider, unify run API, update docs - Remove WeightsProvider and _downloadWeights (files must be on disk) - Unify txt2img/img2img into single run() with auto-detected mode - Add return await to _withExclusiveRun calls (stack trace alignment) - Strengthen run|run test to verify first response completes - Update README: loader is optional, add t5XxlModel, fix load() docs - Update docs/architecture.md: align with disk-local contract * fix(diffusion): remove unused loader from constructor, tests, and examples The diffusion addon never used the loader parameter — it was accepted in the constructor but silently discarded. Model files are loaded directly from disk via diskPath. 
- Remove loader from ImgStableDiffusion constructor and type declarations - Remove Loader interface and ReportProgressCallback (no remaining consumers) - Remove FilesystemDL usage from all 6 integration tests and 7 examples - Update README: remove data loader section, renumber steps, drop loader from args table * fix(diffusion): remove stale loader deps and fix doc references - Remove @qvac/dl-filesystem and @qvac/dl-hyperdrive from devDependencies - Remove @qvac/dl-hyperdrive from peerDependencies - Update architecture.md to reflect direct disk-path loading (no FilesystemDL) * fix(diffusion): remove last Hyperdrive mention from architecture doc * fix(diffusion): remove stale loadWeights from thread safety rules * fix(diffusion): update data-flows doc to reflect unified run() API * feat(diffusion): move stable-diffusion-cpp to registry (#865) Support qvac ggml backend module names. * cpp lint * trying to fix seg faults * fix(diffusion): Add fallback to load backend by filename (#879) * QVAC-14129: skip generation tests on GPU-less runners (#897) * test(diffusion): skip generation tests on GPU-less runners Read NO_GPU env var via bare-process and skip image generation tests when running on runners without GPUs. Model loading test still runs on CPU-only runners with forced cpu device. * test(diffusion): enable api-behavior tests on mobile and GPU-less runners Address review feedback: remove skip guard so all api-behavior tests run on mobile and GPU-less runners, add vae_on_cpu for Android, use SHORT_PARAMS in busy-error and cancel-then-run tests, add verbosity. 
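The GPU-less runner rule from the first QVAC-14129 commit (skip generation tests, still run model loading on a forced CPU device) could look roughly like this. It is a sketch under assumptions: the accepted `NO_GPU` values, the `planTest` helper, and the `"gpu"`/`"cpu"` device labels are hypothetical, and the environment is passed in as a plain object rather than read through bare-process.

```typescript
type Env = Record<string, string | undefined>;

// Assumption: GPU-less runners export NO_GPU as "1" or "true".
function hasNoGpu(env: Env): boolean {
  const value = (env.NO_GPU ?? "").toLowerCase();
  return value === "1" || value === "true";
}

// Hypothetical planner: decides whether a test runs and on which device.
function planTest(kind: "generation" | "model-loading", env: Env): { skip: boolean; device: "gpu" | "cpu" } {
  const noGpu = hasNoGpu(env);
  return {
    skip: noGpu && kind === "generation", // generation is too slow without a GPU
    device: noGpu ? "cpu" : "gpu",        // model loading still runs, forced onto CPU
  };
}
```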
* fix(diffusion): remove unused isMobile variable * refactor[notask]: address PR review comments for lib-infer-diffusion addon - Remove IModelAsyncLoad inheritance from SdModel; add custom activate() in AddonJs.hpp that calls SdModel::load() directly, bypassing the unused async-load interface - Add SdModel::setProcessExiting() static method and expose it as a notifyProcessExit binding so JS can signal the native side before process exit, preventing SIGSEGV (exit 139) during Metal/Vulkan teardown - Refactor SdGenHandlers parsers (parseSampler, parseScheduler, parseCacheMode, cache_preset) from if/else chains to std::unordered_map - Extract parseVaeTileSize into a static helper using std::from_chars and std::string_view for exception-safe parsing - Replace raw stats members with a CumulativeStats struct in SdModel - Wrap generate_image results in RAII SdImageBatch to prevent memory leaks when outputCallback or encodeToPng throws mid-iteration - Use optPath lambda for model path assignments in SdModel::load() - Add braces to all single-statement if bodies in BackendSelection.cpp - Add test_sd_gen_handlers.cpp unit tests covering all refactored changes Made-with: Cursor * style[notask]: apply clang-format-19 to test_sd_gen_handlers.cpp Made-with: Cursor * fix: remove trailing blank line in addon.js to pass standard lint (#951) * refactor[notask]: remove public unload() from SdModel; expand TypeScript types - Move free_sd_ctx logic inline into ~SdModel destructor and remove the public unload() method — object lifetime now manages GPU memory release - Remove unloadModel() binding from AddonJs.hpp and binding.cpp (was dead code; JS always called destroyInstance, not unloadModel) - Update unit tests to use scoped braces {} for destruction instead of explicit unload() calls; TearDownTestSuite now uses model.reset() - Expand SamplerMethod from 8 to 14 values to match parseSampler() map; fix dpm++ key strings (dpm++2m not dpm++_2m) - Expand ScheduleType from 6 to 12 values to 
match parseScheduler() map - Add missing std_default to RngType * fix: sync JS layer types (#950) * fix: sync JS layer with C++ addon for lib-infer-diffusion - Wire up notifyProcessExit in binding.js to prevent SIGSEGV on process shutdown when GPU backends are already torn down - Sync TypeScript types with C++ handler reality: - SamplerMethod: add 6 missing values, fix string literals to match C++ parser (dpm++2m not dpm++_2m) - ScheduleType: add 6 missing values, remove invalid 'default' - RngType: add 'std_default' - Add PredictionType and SdConfig.prediction field - Fix addonLogging.d.ts to use named exports matching the .js module * fix: complete TypeScript type coverage and relax model load timeout - WeightType: add 6 missing quantization types (bf16, q2_k, q3_k, q4_k, q5_k, q6_k), rename 'default' to 'auto' to match C++ parser - SdConfig: add 12 missing fields from C++ handler map (sampler_rng, diffusion_fa, mmap, offload_to_cpu, flow_shift, diffusion_conv_direct, vae_conv_direct, circular_x, circular_y, force_sdxl_vae_conv_scale, backends_dir, tensor_type_rules, lora_apply_mode) - GenerationParams: add 7 missing fields (eta, img_cfg_scale, clip_skip, vae_tile_size, vae_tile_overlap, cache_mode, cache_threshold) - Add CacheMode and LoraApplyMode type aliases - Increase model load time assertion from 120s to 180s across all integration tests (Windows runner exceeded 120s at 130.9s) * fix: remove trailing blank line in addon.js to pass standard lint * Revert "fix: remove trailing blank line in addon.js to pass standard lint" This reverts commit e8daa86. * fix: remove trailing blank line in addon.js * fix: restore addonLogging.d.ts default export for SDK compatibility The SDK imports addonLogging as a default import, so keep the AddonLogging interface + export default pattern. 
* fix: correct config key names to match C++ handler map - circularx/circulary (not circular_x/circular_y) - backendsDir (not backends_dir) - Add 'circular' shorthand for both axes * revert: restore original 120s model load timeout in integration tests * revert: remove notifyProcessExit wiring from binding.js * refactor[notask]: remove notifyProcessExit mechanism from lib-infer-diffusion Remove the JS-to-C++ process-exit signalling mechanism entirely: - Drop g_processExiting atomic flag and setProcessExiting() static method from SdModel; destructor is now = default, delegating cleanup to the unique_ptr<sd_ctx_t> custom deleter as intended - Remove notifyProcessExit() inline function from AddonJs.hpp and its binding registration from binding.cpp - Remove notifyProcessExit JS helper and export from addon.js - Remove the corresponding unit test from test_sd_gen_handlers.cpp Made-with: Cursor * ci[notask]: enable C++ unit tests on linux-x64 in cpp-tests-diffusion workflow Made-with: Cursor * feat(diffusion): reduce reported generation stats to primitive fields Remove 8 derived/redundant fields from the runtimeStats payload: generation_time, totalTime, stepsPerSecond, msPerStep, megapixelsPerSecond, steps, output_count. All removed fields are either aliases of a kept field (generation_time = generationMs, steps = totalSteps, output_count = totalImages) or trivially derivable by the caller from the remaining primitives (totalWallMs, totalSteps, totalPixels). The 11 remaining fields are: modelLoadMs, generationMs, totalGenerationMs, totalWallMs, totalSteps, totalGenerations, totalImages, totalPixels, width, height, seed. Update test_cancel_context assertions to use the new field names. 
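Since the rate fields were dropped in favour of primitives, a caller that still wants them can recompute them. A minimal sketch, assuming the three primitives are plain numbers and that wall time is the intended time base; `StatsPrimitives` and `deriveRates` are hypothetical names, not part of the package.

```typescript
// Hypothetical view over three of the retained primitive fields.
interface StatsPrimitives {
  totalWallMs: number;
  totalSteps: number;
  totalPixels: number;
}

// Recomputes the removed rate fields; returns zeros when there is nothing to divide by.
function deriveRates(s: StatsPrimitives): { stepsPerSecond: number; msPerStep: number; megapixelsPerSecond: number } {
  const seconds = s.totalWallMs / 1000;
  if (seconds <= 0 || s.totalSteps <= 0) {
    return { stepsPerSecond: 0, msPerStep: 0, megapixelsPerSecond: 0 };
  }
  return {
    stepsPerSecond: s.totalSteps / seconds,
    msPerStep: s.totalWallMs / s.totalSteps,
    megapixelsPerSecond: s.totalPixels / 1e6 / seconds,
  };
}
```

For example, 20 steps and 2,000,000 pixels over 4000 ms of wall time give 5 steps/s, 200 ms/step, and 0.5 MP/s.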
Made-with: Cursor * feat(diffusion): add RuntimeStats TypeScript interface and bump to 0.1.1 Expose a RuntimeStats interface in index.d.ts describing the 11 primitive fields emitted on the QvacResponse 'stats' event: modelLoadMs, generationMs, totalGenerationMs, totalWallMs, totalSteps, totalGenerations, totalImages, totalPixels, width, height, seed. Mirrors the pattern established in the embed addon (PR #937). Derivable rate fields (stepsPerSecond, msPerStep, megapixelsPerSecond) are intentionally omitted — callers can compute them from the retained primitives. Bump package version to 0.1.1 and add CHANGELOG entry. Made-with: Cursor * fix: add Android Vulkan init diagnostics (#981) * fix: add Android Vulkan init diagnostics Added stable-diffusion overlay port for troubleshooting. Resolved loading issue where load by type tried GPU in a device with IGPU. Logging loop listed details of each device and attempted to initialize directly devices listed as GPU or IGPU. This resolved the failure to load by type. * Split init loop into GPU and IGPU sections * fix(diffusion): detect JobEnded by structural type instead of stats key name The callback checked for 'generation_time' in the stats object, but the C++ side emits 'generationMs'. Match on plain-object shape instead so the check survives future stats key renames. Made-with: Cursor * refactor(diffusion): remove circular padding options and fix example resolutions Remove the circularx, circulary, and circular (both-axes shorthand) config options from the C++ handlers, SdCtxConfig struct, SdModel param assignment, and TypeScript index.d.ts. These were unused and added unnecessary surface area. Fix generate-image-sd2.js example to use 768x768, which is SD2.1's native training resolution. Using off-native resolution produces softer outputs. 
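The JobEnded fix above replaces a key-name check (`'generation_time' in stats`) with a check on the payload's shape. One way to express "plain object" is shown below; `isPlainObject` is a hypothetical helper, and the assumption that image payloads arrive as typed arrays is not confirmed by the source (only that progress ticks are strings and stats are a plain object).

```typescript
// Hypothetical guard: true only for object literals and null-prototype objects.
function isPlainObject(value: unknown): value is Record<string, unknown> {
  if (value === null || typeof value !== "object") {
    return false;
  }
  const proto = Object.getPrototypeOf(value);
  return proto === Object.prototype || proto === null;
}
```

Because the guard never looks at key names, renaming `generation_time` to `generationMs` (or any later rename) does not change the outcome.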
Made-with: Cursor * refactor(diffusion): rename CHANGELOG to CHANGELOG.md and align format with LLM package Made-with: Cursor * refactor(diffusion): remove CUDA build references from docs Remove CUDA as a listed GPU backend from platform tables, architecture diagrams, and the device config comment in index.d.ts. This package ships Metal, Vulkan, and OpenCL backends only. The 'cuda' RNG type references are unchanged (upstream philox RNG enum name). Made-with: Cursor * fix(diffusion): remove CPU fallback from macOS x64 GPU column in README Made-with: Cursor * docs(diffusion): add Other Examples section to README Made-with: Cursor * docs(diffusion): extract build instructions into build.md Move prerequisites, platform-specific setup, cross-compilation, and troubleshooting from README into a dedicated build.md matching the LLM package structure. README now links to build.md with a quick start snippet. Made-with: Cursor * chore(diffusion): generate NOTICE file with third-party attributions Made-with: Cursor * feat(diffusion): throw early if img2img is attempted Add an explicit guard in index.js that throws if init_image is passed, since img2img is not yet implemented in this PR. Provides a clear error message rather than silently falling through. Also fix trailing comma lint in generate-image-sd2.js. Made-with: Cursor --------- Co-authored-by: Nik <pocucandr@MacBookAir.lan> Co-authored-by: Nik <pocucandr@Niks-MacBook-Air.local> Co-authored-by: aegioscy <nik@linux64vm.com> Co-authored-by: Ridwan Taiwo <donriddo@gmail.com> Co-authored-by: Proletter <40578159+Proletter@users.noreply.github.com> Co-authored-by: gianni-cor <gianfranco.cordella@tether.io> Co-authored-by: Juan Pablo Garibotti Arias <juan.arias@bitfinex.com>
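The RuntimeStats commit above says the rate fields were dropped because callers can compute them from the retained primitives. A minimal sketch of that derivation follows. The field names come from the commit message; the `number` types, the `deriveRates` helper, and the choice of `totalWallMs` as the time base are assumptions for illustration, not the package's actual code.

```typescript
// Field names are taken from the commit message; typing every field as
// `number` is an assumption, not a copy of the real index.d.ts.
interface RuntimeStats {
  modelLoadMs: number
  generationMs: number
  totalGenerationMs: number
  totalWallMs: number
  totalSteps: number
  totalGenerations: number
  totalImages: number
  totalPixels: number
  width: number
  height: number
  seed: number
}

// Hypothetical shape for the rates that were removed from the payload.
interface DerivedRates {
  stepsPerSecond: number
  msPerStep: number
  megapixelsPerSecond: number
}

// Hypothetical helper: recompute the removed rate fields on the caller side.
// Uses totalWallMs as the time base (an assumption) and returns 0 instead of
// dividing by zero when no time or no steps were recorded.
function deriveRates (stats: RuntimeStats): DerivedRates {
  const seconds = stats.totalWallMs / 1000
  return {
    stepsPerSecond: seconds > 0 ? stats.totalSteps / seconds : 0,
    msPerStep: stats.totalSteps > 0 ? stats.totalWallMs / stats.totalSteps : 0,
    megapixelsPerSecond: seconds > 0 ? stats.totalPixels / 1e6 / seconds : 0
  }
}
```

A caller would typically pass the object received on the 'stats' event straight into `deriveRates`; swapping `totalWallMs` for `totalGenerationMs` would measure pure generation time instead of wall time.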
Proletter
added a commit
that referenced
this pull request
May 24, 2026
* updated for sd * updated and successfuly built * downloads * updated with working loading * updated load model js for Q4_K test * rewrote parameter handling to support multiple params and also two different model types * got sd inference to work * updated for sd2 * got full sdxl to work * rename folder to qvac-lib-infer-diffusion * update package name * sd3 finished * rename: qvac-lib-infer-diffusion -> lib-infer-diffusion Rename package directory from packages/qvac-lib-infer-diffusion to packages/lib-infer-diffusion to align with the lib-* naming convention used across the monorepo. Made-with: Cursor * updated for cuda linux * updated for model * have something working * changelog * cpp lint * formatt * updated model for gian * integration test * fixing according to boss * fix(android): enable BUILD_SHARED_LIBS and stub pthread_cancel for GGML_BACKEND_DL GGML_BACKEND_DL requires BUILD_SHARED_LIBS=ON so CMake can build GPU backends as MODULE targets (.so). Previously BUILD_SHARED_LIBS was hardcoded OFF, causing configure to fail on Android. Also stub out pthread_cancel in ggml-backend-reg.cpp via a cmake string replacement — pthread_cancel is unavailable in the Android NDK. The loader thread terminates naturally without the explicit cancel. Made-with: Cursor * fix(android): exclude Vulkan on Android and fix pthread_cancel stub Two portfile fixes for arm64-android cross-compile: 1. SD_VULKAN: the else() branch was enabling -DSD_VULKAN=ON for Android, causing find_package(Vulkan) to pick up the host x86_64 SDK during cross-compile and fail CMake configure. Android Vulkan support comes via the NDK and is handled separately; skip the flag entirely. 2. pthread_cancel: replace the fragile comment-based no-op with a proper inline stub guarded by #if defined(__ANDROID__), injected at the top of ggml-backend-reg.cpp before compilation. 
Made-with: Cursor * ci: dump vcpkg configure logs on failure for android build Adds an always-run step that cats all config-*.log files from the vcpkg stable-diffusion-cpp buildtrees on failure, so the exact CMake configure error is visible inline in the CI job output. Made-with: Cursor * fix(android): insert pthread_cancel stub after pthread.h include The previous stub was prepended to the top of ggml-backend-reg.cpp before any #include, so pthread_t was undefined and the stub itself failed to compile — leaving pthread_cancel undeclared for the actual call site. Fix: insert the no-op stub immediately after #include <pthread.h> so pthread_t is available. Add a fallback that prepends both the include and stub if <pthread.h> isn't found directly. Also pass HAVE_PTHREAD_CANCEL=0 and GGML_HAVE_PTHREAD_CANCEL=OFF as CMake cache variables to disable any check_function_exists tests, and add DISABLE_PARALLEL_CONFIGURE to avoid race conditions with source patches. Made-with: Cursor * fix(android): resolve BUILD_SHARED_LIBS override and pthread_cancel issues Locally verified: stable-diffusion-cpp:arm64-android now configures and builds successfully. Three root causes fixed: 1. BUILD_SHARED_LIBS override: vcpkg maps VCPKG_LIBRARY_LINKAGE to BUILD_SHARED_LIBS, and the arm64-android triplet sets linkage to "static" — appending -DBUILD_SHARED_LIBS=OFF after our explicit ON. Additionally, stable-diffusion.cpp's CMakeLists.txt resets BUILD_SHARED_LIBS=OFF unless SD_BUILD_SHARED_GGML_LIB=ON. Fix: set VCPKG_LIBRARY_LINKAGE=dynamic for this port when DL backends are enabled, and pass -DSD_BUILD_SHARED_GGML_LIB=ON. 2. pthread_cancel stub redefinition: the previous stub was inserted via string(REPLACE) + fallback string(PREPEND), but both paths executed — producing a duplicate definition error. Also, vcpkg reuses cached source trees, so patches accumulated across builds. 
Fix: use a sentinel comment for idempotency; only one insertion path with the stub placed after #include <pthread.h>. 3. Removed the now-unnecessary explicit BUILD_SHARED_LIBS_OPTION variable since VCPKG_LIBRARY_LINKAGE handles it correctly. Made-with: Cursor * updated for android hopefully works * added opencl support for android * windows attempt fix * attempting to fix windows again * NORM problem with ggml operation * attempting to patch norm * attempting again to fix * diagonstic step * update for opencl * updated for device selection * fix(diffusion): add CI/CD workflows, test infra, and integration tests (#676) * fix(diffusion): rebase on feature-media-generation, add CI improvements Rebased cleanly onto feature-media-generation to pick up: - SD_CPU_ONLY env var gate (Metal NORM op fallback to CPU) - GGML_OPENMP=OFF (eliminates libomp.so.5 dependency) - OpenCL support for Android Additions on top of base: - Add cpp-tests and ts-checks jobs to on-pr workflow - Add image artifact upload to integration tests (traceable to source test) - Disable win32 in prebuilds/integration/cpp-tests (C1128 /bigobj) - Install libomp5 on Linux integration tests (safety net) - Test infrastructure: unit tests, mobile test framework, scripts * fix(diffusion): address PR review comments, enable win32, improve CI artifacts - Re-enable win32 platform in prebuilds, integration-test, and cpp-tests workflows - Remove duplicate PULL_REQUEST_TEMPLATE.md (already in repo root) - Fix setDiff in validate-mobile-tests.js to handle non-Set inputs - Refactor generate-image.test.js to use ensureModel from utils.js - Save test images to modelDir for mobile permission compatibility - Update CI to look for images in test/model/ instead of output/ - Add PR comment step to post image metadata on pull requests * fix(diffusion): restore base branch code accidentally removed during rebase Restores SD_CPU_ONLY patch, GGML_OPENMP=OFF, OpenCL support, Apple keep_clip_on_cpu guard, and VCPKG_BUILD_TYPE 
placement that were dropped when patches were applied on top of the reset base. * style(diffusion): fix lint errors in examples (no-multi-spaces, indent) * feat(diffusion): upload test images to S3 and display inline in step summary Images are uploaded to S3 with public-read ACL, then embedded in the step summary and PR comments via their S3 URLs so they render inline without needing to download artifacts. * ci(diffusion): remove libomp5 install (fixed by GGML_OPENMP=OFF in portfile) * remove S3 upload, use simple table summary for generated images * restore AWS env vars from base branch * refactor(diffusion): consolidate test utils, remove helpers.js Move detectPlatform, setupJsLogger, isPng into utils.js and update generate-image.test.js to import from utils.js only. Add platform detection for device selection in model-loading.test.js. * fixed integration tests * updated * updated timeout * cpp unit tests complete and tested YAY BABY * cpp lint * updated * test(diffusion): add integration tests for SDXL, SD3, and FLUX.2 (#757) * test(diffusion): add integration tests for SDXL, SD3, and FLUX.2 Add integration tests for all supported model families based on the existing examples. Each test follows the LLM addon patterns: platform- aware device selection, defensive cleanup with .catch(), ensureModel for CI downloads. - generate-image-sdxl.test.js: SDXL Base 1.0 (all-in-one GGUF, auto eps-prediction) - generate-image-sd3.test.js: SD3 Medium (safetensors, flow prediction, euler sampler) - generate-image-flux2.test.js: FLUX.2 klein 4B (split layout: diffusion + LLM + VAE) - Regenerate all.js (brittle) and integration.auto.cjs (mobile) * fix(diffusion): use CPU on all darwin platforms Metal's GGML_OP_MUL_MAT is unsupported for stable-diffusion.cpp, causing SIGABRT on darwin-arm64. Use isDarwin (all darwin) instead of isDarwinX64 for the useCpu check. 
* revert: keep GPU on darwin-arm64 to surface Metal errors Don't hide GPU errors behind CPU fallback — the Metal MUL_MAT issue needs to be visible so it gets fixed. * test(diffusion): increase test timeouts for CPU-bound runs FLUX.2 30min, SDXL/SD3 15min — these models are too heavy for the default 10min timeout when running on CPU. * chore: remove all.js from tracking (auto-generated, gitignored) * test(diffusion): skip SDXL, SD3, and FLUX.2 tests on mobile * QVAC-13954: Clean up vcpkg deps in lib-infer-diffusion (#781) * refactor: split ggml into standalone vcpkg overlay port Decouple ggml from the stable-diffusion-cpp overlay port so it can be shared by multiple consumers with consistent ABI guarantees. - Add standalone ggml overlay port (version-date 2026-01-30) pinned to the same commit used by stable-diffusion.cpp master-514-5792c66 - Refactor stable-diffusion-cpp port to use vcpkg_from_github + SD_USE_SYSTEM_GGML=ON instead of cloning with --recurse-submodules - Patch ggml's src/CMakeLists.txt and cmake/ggml-config.cmake.in to propagate GGML_MAX_NAME=128 via INTERFACE_COMPILE_DEFINITIONS, ensuring all consumers share the same struct layout - Switch both ports to version-date versioning (no upstream semver) - Replace bundled stb headers with vcpkg stb dependency - Auto-enable Vulkan backend on Linux via platform dependency - Forward GPU backend features (metal/vulkan/cuda/opencl) from stable-diffusion-cpp to ggml through vcpkg feature * fix(diffusion): fix ggml/sd overlay ports for Android cross-compilation Add NDK-matched Vulkan C++ header detection so the ggml port downloads headers matching the exact NDK Vulkan version instead of pulling a potentially mismatched vcpkg vulkan-headers package. Add missing ggml-opencl.h to the public headers install list. Auto-enable opencl on Android and vulkan on desktop/Android via default-features in both the ggml and stable-diffusion-cpp overlay ports. 
* fix(diffusion): disable OpenMP and align ggml flags with qvac-fabric Add GGML_OPENMP=OFF to fix Windows CI failure where OpenMP is unavailable, and GGML_LLAMAFILE=OFF to disable unused code paths. Add Android-specific flags for DL backends (GGML_BACKEND_DL, CPU_ALL_VARIANTS, CPU_REPACK) and disable cooperative matrix Vulkan extensions on mobile GPUs. * fix(diffusion): fix ggml include dirs for DL backends and use tetherto fork Patch ggml-config.cmake.in to set INTERFACE_INCLUDE_DIRECTORIES on the ggml::ggml and ggml::ggml-base targets unconditionally. When GGML_BACKEND_DL is ON, the per-backend targets are not created and include dirs were lost. Also switch the SD source to the tetherto fork and drop the qvac-diffusion- library prefix from CMakeLists.txt now that ggml is a standalone port with standard names. * Remove redundancies in vcpkg manifest files * Set SD_CPU_ONLY=1 on CI env * updated for runtime stats * fixed connection to logger, as it was not properly connected before * fixed for license file, validated working run on m1 air * quickstart quick-maths * fixed integration for windows * fix(diffusion): add real cancel/abort support to native generation (#782) * fix(diffusion): add real cancel/abort support to native generation Cancel previously only set an atomic flag checked after generate_image() returned — generation ran to full completion and output was silently discarded. This made cancel appear to work while still burning full compute time. 
Changes: Portfile patches (stable-diffusion.cpp): - Add sd_abort_cb_t typedef and sd_set_abort_callback() public API - Add sd_abort_requested() helper checked in the denoise lambda - When abort fires, denoise returns nullptr which the sampler stack already treats as failure → generate_image() returns NULL - Fix upstream bug: abort path freed wrong compute buffer (diffusion_model instead of work_diffusion_model), corrupting sd_ctx and causing segfault on reuse SdModel.cpp: - Wire cancelRequested_ into abort callback via thread-local (matches existing progress callback pattern for concurrency safety) - Scope guard ensures callbacks are cleared on all exit paths including early parse/validation exceptions - Always free results[i].data whether cancelled or not (buffer leak fix) - Cancelled jobs throw "Job cancelled" → JobRunner emits queueException instead of fake success with queueResult + queueJobEnded - Return empty std::any from process() so queueJobEnded() is the sole terminal stats path (fixes duplicate JobEnded events in JS) SdModel.hpp: - Add isCancelRequested() public accessor for the static abort callback * fix(diffusion): disable free_params_immediately for model reuse The upstream sd_ctx_params_init() defaults free_params_immediately=true, which permanently frees model weight buffers after the first generate_image() call. Any subsequent generation on the same sd_ctx accesses freed memory and crashes (SIGSEGV). Set the default to false so the addon supports multiple generations on the same model instance (the expected use pattern). This was the root cause of the "cancel then run" crash — the abort path still runs through generate_image_internal() which calls diffusion_model->free_params_buffer() when this flag is true. 
* fix(diffusion): add code comments and rename fix-abort-cleanup patch - Add comments to SdCtxHandlers.hpp explaining why freeParamsImmediately is disabled (upstream default frees weight buffers after first generation, causing use-after-free on model reuse) - Add comments to both hunks in the upstream cleanup patch explaining the compute buffer bug and work_ctx leak - Rename fix-abort-cleanup.patch to fix-failure-path-cleanup.patch since the fixes apply to any failure path, not just abort * fix(diffusion): document cancel-as-error rationale vs LLM addon Diffusion throws on cancel (queueException) while LLM returns normally (queueResult). Add comment explaining the intentional difference: diffusion has no useful partial output, so an explicit error signal is more honest than a success with output_count=0. * test(diffusion): add C++ unit tests for cancel/context handling Add test_cancel_context.cpp covering the context changes from the cancel fix: - cancel when idle is a no-op (no crash, no state corruption) - cancel during generation throws "Job cancelled" (cancel-as-error path) - model is reusable after cancel (validates freeParamsImmediately=false and compute buffer fix — the exact SIGSEGV scenario) - multiple sequential generations succeed (normal reuse without cancel) - cancelRequested_ flag is reset at process() entry - process() on unloaded model throws (not segfault) - runtime stats are populated after successful generation * fix(diffusion): fix patch line counts and test assertion - Fix fix-failure-path-cleanup.patch: correct hunk line counts (-2203,7 +2203,11 and -3796,6 +3800,13) and replace Unicode em-dashes with ASCII in comments - Fix CancelWhenIdleIsNoop test: cancel() sets the flag even when idle, it is only cleared on process() entry * refactor(diffusion): static ggml core with DL backends and CMakeLists cleanup (#794) * refactor(diffusion): static ggml core with DL backends and CMakeLists cleanup Patch ggml to support GGML_BACKEND_DL with 
BUILD_SHARED_LIBS=OFF by enabling PIC and backend compile definitions when DL is on, matching the qvac-fabric approach. Remove VCPKG_LIBRARY_LINKAGE=dynamic override — core libs are now static .a with PIC, backends remain MODULE .so files. Clean up CMakeLists.txt: remove redundant explicit linking of OpenCL, Metal frameworks, CUDA libs, and ggml (all propagated transitively via ggml cmake config). Fix WIN32_LEAN_AND_MEAN typo, remove stale comments, and drop the clang overlay triplet workaround. * chore(diffusion): switch Linux to libc++, fix vcpkg warnings, remove dead patches Add libc++ triplets for x64-linux and arm64-linux under vcpkg/triplets, matching the qvac-lib-infer-llamacpp-llm layout. Move triplet and toolchain files from vcpkg-override-triplets to vcpkg/. Install the stable-diffusion-cpp usage file and suppress mismatched binary count warnings in both overlay ports. Remove obsolete rename-ggml-libs and no-dlopen-without-backend-dl patches from the old submodule architecture. * fix(diffusion): disable GGML_BACKEND_DL for Android static backends stable-diffusion.cpp calls ggml_backend_is_cpu() and ggml_backend_cpu_init() directly, which live in the CPU backend module. With GGML_BACKEND_DL these become separate .so files unavailable at link time, causing dlopen failures on device. Statically link all backends (CPU, Vulkan, OpenCL) instead, and bundle the OpenCL ICD loader .so on Android so the addon loads even on devices without a system libOpenCL. * Place the OpenCL ICD Loading library next to bare file * fix(diffusion): graceful OpenCL fallback and backend priority reorder Patch ggml's OpenCL backend to return nullptr instead of aborting when no OpenCL devices are found (e.g. Pixel phones without OpenCL support). Reorder SD backend priority to CUDA > Metal > OpenCL > Vulkan > CPU, preferring OpenCL on Adreno devices where it outperforms Vulkan, with if-guards so only the first successful backend is used. 
* feat(diffusion): Adreno-aware backend selection for Android Detect Adreno GPU model at runtime via ggml device enumeration and choose the optimal backend: Adreno 800+ uses GPU (OpenCL), Adreno 600/700 is forced to CPU due to poor OpenCL performance, and non-Adreno devices fall through to Vulkan. Adds INFO-level logging of detected devices and selection decisions for troubleshooting. * fix(diffusion): statically link OpenCL ICD loader on Android Add an overlay port for opencl that removes the dynamic-only restriction, allowing the ICD loader to be built as a static library. This eliminates libOpenCL.so as a NEEDED dependency so the addon loads on all Android devices regardless of OpenCL support. The static ICD loader still dlopen's vendor drivers at runtime. * Fixed formatting * CPU only on Android * feat(diffusion): hybrid static CPU + dynamic GPU backends for Android (#813) * feat(diffusion): hybrid static CPU + dynamic GPU backends for Android Add GGML_CPU_STATIC option that builds the CPU backend as a static library linked into ggml even when GGML_BACKEND_DL is ON. GPU backends (Vulkan, OpenCL) remain MODULE .so files loaded at runtime via dlopen, eliminating libOpenCL.so as a NEEDED dependency. This lets stable-diffusion.cpp call CPU backend functions directly (ggml_set_f32, ggml_backend_cpu_init, etc.) while GPU backends are discovered at runtime — a single Android binary works on all devices regardless of OpenCL/Vulkan support. * feat(diffusion): generic backend init using ggml registry API Replace SD's init_backend() #ifdef waterfall with generic ggml calls (ggml_backend_init_by_type) that work with both statically linked and dynamically loaded backends. Load DL backend modules from the addon via ggml_backend_load_all_from_path() when GGML_BACKEND_DL is enabled. This eliminates SD's dependency on GPU-specific headers (ggml-opencl.h, ggml-vulkan.h, etc.) 
and removes the SD_METAL/VULKAN/CUDA/OPENCL build flags, replacing sd-cpu-only.patch and sd-backend-priority.patch with a single sd-generic-backend-init.patch. * feat(diffusion): prefer OpenCL on Adreno 800+ via sd_ctx backend preference Add a new backend preference field in stable-diffusion context params and wire SdModel to request OpenCL for Adreno 800+ when available, while keeping SD_CPU_ONLY as CI-only env override. Also fix ggml hybrid export wiring so CPU static symbols are linked for Android DL backend mode, and refresh android-arm64 prebuild artifact. * fix(diffusion): pass backendsDir to SdCtxConfig * Added logging to troubleshoot pixel vulkan init * fix(diffusion): JS layer review fixes and cancel test coverage (#783) * fix(diffusion): JS layer review fixes and cancel test coverage Aligns the JS layer with the LLM addon patterns and adds API behavior tests for cancel/busy/idle state transitions. JS layer: - Rename run() to _runInternal() (BaseInference template method pattern) - Replace 30ms timer guard with _hasActiveResponse boolean - Extract _getWeightFiles() to deduplicate file lists in _load/_downloadWeights - Wrap _runGeneration in _withExclusiveRun for serialization - Add finalized.catch(() => {}) unhandled rejection guard - Reset _hasActiveResponse in unload() - Filter undefined values in addon config coercion - Remove orphaned unloadWeights() from addon.js - Update class doc and README to match actual supported models Types (index.d.ts): - Fix run() signature: Txt2ImgParams (was accepting txt2vid params) - Proper type hierarchy: Txt2ImgParams → Img2ImgParams → GenerationParams - Add missing params: guidance, sampling_method, scheduler - Remove unused type declarations Tests: - Add api-behavior.test.js with 5 cancel/busy/idle tests - idle|run, idle|cancel, run|cancel, run|run (busy), cancel|run (rerun) - cancel|run test requires native abort support (fix/diffusion-cancel-abort) * fix(diffusion): cancel inside onUpdate callback matching LLM 
pattern Cancel tests now fire model.cancel() inside the onUpdate callback after the first progress tick (string data), matching the LLM addon's runAndCancelAfterFirstToken pattern. This ensures native generation is guaranteed to be active when cancel fires, preventing false passes. * fix(diffusion): use const for non-reassigned chain variable Standard JS lint requires const for variables that are never reassigned. * fix(diffusion): update scope note instead of removing it FLUX.1 and Wan2.x video are still not supported — keep that explicit. * fix(diffusion): video generation is planned, not excluded Wan2.x support is planned for the future — update scope note accordingly. * fix(diffusion): address PR review — remove WeightsProvider, unify run API, update docs - Remove WeightsProvider and _downloadWeights (files must be on disk) - Unify txt2img/img2img into single run() with auto-detected mode - Add return await to _withExclusiveRun calls (stack trace alignment) - Strengthen run|run test to verify first response completes - Update README: loader is optional, add t5XxlModel, fix load() docs - Update docs/architecture.md: align with disk-local contract * fix(diffusion): remove unused loader from constructor, tests, and examples The diffusion addon never used the loader parameter — it was accepted in the constructor but silently discarded. Model files are loaded directly from disk via diskPath. 
- Remove loader from ImgStableDiffusion constructor and type declarations - Remove Loader interface and ReportProgressCallback (no remaining consumers) - Remove FilesystemDL usage from all 6 integration tests and 7 examples - Update README: remove data loader section, renumber steps, drop loader from args table * fix(diffusion): remove stale loader deps and fix doc references - Remove @qvac/dl-filesystem and @qvac/dl-hyperdrive from devDependencies - Remove @qvac/dl-hyperdrive from peerDependencies - Update architecture.md to reflect direct disk-path loading (no FilesystemDL) * fix(diffusion): remove last Hyperdrive mention from architecture doc * fix(diffusion): remove stale loadWeights from thread safety rules * fix(diffusion): update data-flows doc to reflect unified run() API * feat(diffusion): move stable-diffusion-cpp to registry (#865) Support qvac ggml backend module names. * cpp lint * trying to fix seg faults * fix(diffusion): Add fallback to load backend by filename (#879) * QVAC-14129: skip generation tests on GPU-less runners (#897) * test(diffusion): skip generation tests on GPU-less runners Read NO_GPU env var via bare-process and skip image generation tests when running on runners without GPUs. Model loading test still runs on CPU-only runners with forced cpu device. * test(diffusion): enable api-behavior tests on mobile and GPU-less runners Address review feedback: remove skip guard so all api-behavior tests run on mobile and GPU-less runners, add vae_on_cpu for Android, use SHORT_PARAMS in busy-error and cancel-then-run tests, add verbosity. 
* fix(diffusion): remove unused isMobile variable * refactor[notask]: address PR review comments for lib-infer-diffusion addon - Remove IModelAsyncLoad inheritance from SdModel; add custom activate() in AddonJs.hpp that calls SdModel::load() directly, bypassing the unused async-load interface - Add SdModel::setProcessExiting() static method and expose it as a notifyProcessExit binding so JS can signal the native side before process exit, preventing SIGSEGV (exit 139) during Metal/Vulkan teardown - Refactor SdGenHandlers parsers (parseSampler, parseScheduler, parseCacheMode, cache_preset) from if/else chains to std::unordered_map - Extract parseVaeTileSize into a static helper using std::from_chars and std::string_view for exception-safe parsing - Replace raw stats members with a CumulativeStats struct in SdModel - Wrap generate_image results in RAII SdImageBatch to prevent memory leaks when outputCallback or encodeToPng throws mid-iteration - Use optPath lambda for model path assignments in SdModel::load() - Add braces to all single-statement if bodies in BackendSelection.cpp - Add test_sd_gen_handlers.cpp unit tests covering all refactored changes Made-with: Cursor * style[notask]: apply clang-format-19 to test_sd_gen_handlers.cpp Made-with: Cursor * fix: remove trailing blank line in addon.js to pass standard lint (#951) * refactor[notask]: remove public unload() from SdModel; expand TypeScript types - Move free_sd_ctx logic inline into ~SdModel destructor and remove the public unload() method — object lifetime now manages GPU memory release - Remove unloadModel() binding from AddonJs.hpp and binding.cpp (was dead code; JS always called destroyInstance, not unloadModel) - Update unit tests to use scoped braces {} for destruction instead of explicit unload() calls; TearDownTestSuite now uses model.reset() - Expand SamplerMethod from 8 to 14 values to match parseSampler() map; fix dpm++ key strings (dpm++2m not dpm++_2m) - Expand ScheduleType from 6 to 12 values to 
match parseScheduler() map - Add missing std_default to RngType * fix: sync JS layer types (#950) * fix: sync JS layer with C++ addon for lib-infer-diffusion - Wire up notifyProcessExit in binding.js to prevent SIGSEGV on process shutdown when GPU backends are already torn down - Sync TypeScript types with C++ handler reality: - SamplerMethod: add 6 missing values, fix string literals to match C++ parser (dpm++2m not dpm++_2m) - ScheduleType: add 6 missing values, remove invalid 'default' - RngType: add 'std_default' - Add PredictionType and SdConfig.prediction field - Fix addonLogging.d.ts to use named exports matching the .js module * fix: complete TypeScript type coverage and relax model load timeout - WeightType: add 6 missing quantization types (bf16, q2_k, q3_k, q4_k, q5_k, q6_k), rename 'default' to 'auto' to match C++ parser - SdConfig: add 12 missing fields from C++ handler map (sampler_rng, diffusion_fa, mmap, offload_to_cpu, flow_shift, diffusion_conv_direct, vae_conv_direct, circular_x, circular_y, force_sdxl_vae_conv_scale, backends_dir, tensor_type_rules, lora_apply_mode) - GenerationParams: add 7 missing fields (eta, img_cfg_scale, clip_skip, vae_tile_size, vae_tile_overlap, cache_mode, cache_threshold) - Add CacheMode and LoraApplyMode type aliases - Increase model load time assertion from 120s to 180s across all integration tests (Windows runner exceeded 120s at 130.9s) * fix: remove trailing blank line in addon.js to pass standard lint * Revert "fix: remove trailing blank line in addon.js to pass standard lint" This reverts commit e8daa86. * fix: remove trailing blank line in addon.js * fix: restore addonLogging.d.ts default export for SDK compatibility The SDK imports addonLogging as a default import, so keep the AddonLogging interface + export default pattern. 
* fix: correct config key names to match C++ handler map
  - circularx/circulary (not circular_x/circular_y)
  - backendsDir (not backends_dir)
  - Add 'circular' shorthand for both axes
* revert: restore original 120s model load timeout in integration tests
* revert: remove notifyProcessExit wiring from binding.js
* refactor[notask]: remove notifyProcessExit mechanism from lib-infer-diffusion
  Remove the JS-to-C++ process-exit signalling mechanism entirely:
  - Drop g_processExiting atomic flag and setProcessExiting() static method from SdModel; destructor is now = default, delegating cleanup to the unique_ptr<sd_ctx_t> custom deleter as intended
  - Remove notifyProcessExit() inline function from AddonJs.hpp and its binding registration from binding.cpp
  - Remove notifyProcessExit JS helper and export from addon.js
  - Remove the corresponding unit test from test_sd_gen_handlers.cpp
  Made-with: Cursor
* ci[notask]: enable C++ unit tests on linux-x64 in cpp-tests-diffusion workflow
  Made-with: Cursor
* feat(diffusion): reduce reported generation stats to primitive fields
  Remove 8 derived/redundant fields from the runtimeStats payload: generation_time, totalTime, stepsPerSecond, msPerStep, megapixelsPerSecond, steps, output_count. All removed fields are either aliases of a kept field (generation_time = generationMs, steps = totalSteps, output_count = totalImages) or trivially derivable by the caller from the remaining primitives (totalWallMs, totalSteps, totalPixels).
  The 11 remaining fields are: modelLoadMs, generationMs, totalGenerationMs, totalWallMs, totalSteps, totalGenerations, totalImages, totalPixels, width, height, seed.
  Update test_cancel_context assertions to use the new field names.
  Made-with: Cursor
* feat(diffusion): add RuntimeStats TypeScript interface and bump to 0.1.1
  Expose a RuntimeStats interface in index.d.ts describing the 11 primitive fields emitted on the QvacResponse 'stats' event: modelLoadMs, generationMs, totalGenerationMs, totalWallMs, totalSteps, totalGenerations, totalImages, totalPixels, width, height, seed.
  Mirrors the pattern established in the embed addon (PR #937). Derivable rate fields (stepsPerSecond, msPerStep, megapixelsPerSecond) are intentionally omitted; callers can compute them from the retained primitives.
  Bump package version to 0.1.1 and add CHANGELOG entry.
  Made-with: Cursor
* fix: add Android Vulkan init diagnostics (#981)
  * fix: add Android Vulkan init diagnostics
    Added stable-diffusion overlay port for troubleshooting. Resolved loading issue where load by type tried GPU in a device with IGPU. Logging loop listed details of each device and attempted to initialize directly devices listed as GPU or IGPU. This resolved the failure to load by type.
  * Split init loop into GPU and IGPU sections
* fix(diffusion): detect JobEnded by structural type instead of stats key name
  The callback checked for 'generation_time' in the stats object, but the C++ side emits 'generationMs'. Match on plain-object shape instead so the check survives future stats key renames.
  Made-with: Cursor
* refactor(diffusion): remove circular padding options and fix example resolutions
  Remove the circularx, circulary, and circular (both-axes shorthand) config options from the C++ handlers, SdCtxConfig struct, SdModel param assignment, and TypeScript index.d.ts. These were unused and added unnecessary surface area.
  Fix generate-image-sd2.js example to use 768x768, which is SD2.1's native training resolution. Using off-native resolution produces softer outputs.
  Made-with: Cursor
* refactor(diffusion): rename CHANGELOG to CHANGELOG.md and align format with LLM package
  Made-with: Cursor
* refactor(diffusion): remove CUDA build references from docs
  Remove CUDA as a listed GPU backend from platform tables, architecture diagrams, and the device config comment in index.d.ts. This package ships Metal, Vulkan, and OpenCL backends only. The 'cuda' RNG type references are unchanged (upstream philox RNG enum name).
  Made-with: Cursor
* fix(diffusion): remove CPU fallback from macOS x64 GPU column in README
  Made-with: Cursor
* docs(diffusion): add Other Examples section to README
  Made-with: Cursor
* docs(diffusion): extract build instructions into build.md
  Move prerequisites, platform-specific setup, cross-compilation, and troubleshooting from README into a dedicated build.md matching the LLM package structure. README now links to build.md with a quick start snippet.
  Made-with: Cursor
* chore(diffusion): generate NOTICE file with third-party attributions
  Made-with: Cursor
* feat(diffusion): throw early if img2img is attempted
  Add an explicit guard in index.js that throws if init_image is passed, since img2img is not yet implemented in this PR. Provides a clear error message rather than silently falling through.
  Also fix trailing comma lint in generate-image-sd2.js.
  Made-with: Cursor

---------

Co-authored-by: Nik <pocucandr@MacBookAir.lan>
Co-authored-by: Nik <pocucandr@Niks-MacBook-Air.local>
Co-authored-by: aegioscy <nik@linux64vm.com>
Co-authored-by: Ridwan Taiwo <donriddo@gmail.com>
Co-authored-by: Proletter <40578159+Proletter@users.noreply.github.com>
Co-authored-by: gianni-cor <gianfranco.cordella@tether.io>
Co-authored-by: Juan Pablo Garibotti Arias <juan.arias@bitfinex.com>
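
The commit log above says the stats payload was reduced to 11 primitive fields, that rate fields are left for callers to compute, and that JobEnded is now detected by plain-object shape rather than by a key name. The following is a minimal sketch of what that could look like on the caller side. Only the field names come from the commit messages. The number types, the choice of totalGenerationMs as the time base, and the helper names deriveRates and isPlainObject are assumptions made for illustration, not the package's actual code.

```typescript
// Field names are taken from the commit message; typing every field as
// `number` is an assumption.
interface RuntimeStats {
  modelLoadMs: number;
  generationMs: number;
  totalGenerationMs: number;
  totalWallMs: number;
  totalSteps: number;
  totalGenerations: number;
  totalImages: number;
  totalPixels: number;
  width: number;
  height: number;
  seed: number;
}

// The three rate fields that were dropped from the payload.
interface DerivedRates {
  stepsPerSecond: number;
  msPerStep: number;
  megapixelsPerSecond: number;
}

// Hypothetical helper: recompute the dropped rates from the primitives.
// Using totalGenerationMs as the denominator is an assumption; a caller
// may prefer totalWallMs.
function deriveRates(stats: RuntimeStats): DerivedRates {
  const seconds = stats.totalGenerationMs / 1000;
  // Return 0 instead of Infinity/NaN when the denominator is not positive.
  const safeDiv = (a: number, b: number): number => (b > 0 ? a / b : 0);
  return {
    stepsPerSecond: safeDiv(stats.totalSteps, seconds),
    msPerStep: safeDiv(stats.totalGenerationMs, stats.totalSteps),
    megapixelsPerSecond: safeDiv(stats.totalPixels / 1e6, seconds),
  };
}

// Hypothetical structural check: true for plain objects only, so it does
// not depend on any particular stats key being present.
function isPlainObject(value: unknown): value is Record<string, unknown> {
  if (typeof value !== "object" || value === null) return false;
  const proto: unknown = Object.getPrototypeOf(value);
  return proto === Object.prototype || proto === null;
}
```

Keeping the payload to primitives means the rates have a single definition on the caller side, and a shape-based check keeps working if a stats key is renamed again.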
🎯 What problem does this PR solve?

The TypeScript types in index.d.ts are out of sync with the C++ addon handlers: enum values, config fields, and generation params are missing, and some string literals are wrong (e.g. dpm++_2m vs dpm++2m).

📝 How does it solve it?

Sync all type unions and interfaces in index.d.ts with the C++ handler maps.

Type unions:
- SamplerMethod: add 6 missing values (ipndm, ipndm_v, ddim_trailing, tcd, res_multistep, res_2s); fix string literals to match C++ (dpm++2m, not dpm++_2m)
- WeightType: add 6 missing quantization types (bf16, q2_k, q3_k, q4_k, q5_k, q6_k); rename default → auto
- ScheduleType: add 6 missing values (sgm_uniform, simple, lcm, smoothstep, kl_optimal, bong_tangent)
- RngType: add std_default
- New type aliases: PredictionType, LoraApplyMode, CacheMode

SdConfig (15 missing fields added):
sampler_rng, diffusion_fa, mmap, offload_to_cpu, prediction, flow_shift, diffusion_conv_direct, vae_conv_direct, circularx, circulary, circular, force_sdxl_vae_conv_scale, backendsDir, tensor_type_rules, lora_apply_mode

GenerationParams (7 missing fields added):
eta, img_cfg_scale, clip_skip, vae_tile_size, vae_tile_overlap, cache_mode, cache_threshold

💥 Breaking Changes

- WeightType rename: 'default' → 'auto' to match the C++ parser.
- SamplerMethod string literals corrected:
  - 'dpm++_2m' → 'dpm++2m'
  - 'dpm++_2m_v2' → 'dpm++2mv2'
  - 'dpm++_2s_a' → 'dpm++2s_a'
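
Callers that still pass the old literals may want a small compatibility shim while they migrate. The sketch below maps the renamed values listed under Breaking Changes to their corrected forms. The rename pairs come from this PR description. The helper names migrateSamplerMethod and migrateWeightType are hypothetical and are not part of the package.

```typescript
// Old → new SamplerMethod literals, as listed under Breaking Changes.
const SAMPLER_RENAMES: ReadonlyMap<string, string> = new Map([
  ["dpm++_2m", "dpm++2m"],
  ["dpm++_2m_v2", "dpm++2mv2"],
  ["dpm++_2s_a", "dpm++2s_a"],
]);

// Old → new WeightType literals, as listed under Breaking Changes.
const WEIGHT_TYPE_RENAMES: ReadonlyMap<string, string> = new Map([
  ["default", "auto"],
]);

// Hypothetical helper: return the corrected sampler literal, or the input
// unchanged when it is not one of the renamed values.
function migrateSamplerMethod(value: string): string {
  return SAMPLER_RENAMES.get(value) ?? value;
}

// Hypothetical helper: same idea for weight types.
function migrateWeightType(value: string): string {
  return WEIGHT_TYPE_RENAMES.get(value) ?? value;
}

// Example: normalise values read from an older config file.
const legacySampler = "dpm++_2m_v2";
const legacyWeight = "default";
console.log(migrateSamplerMethod(legacySampler), migrateWeightType(legacyWeight));
```

A Map is used rather than an object literal so that lookups never hit inherited keys such as constructor. Values that are already valid pass through untouched, so the shim can be applied unconditionally.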