Updated ggml to version 2026-01-30#7 by aegioscy · Pull Request #134 · tetherto/qvac-registry-vcpkg

aegioscy · 2026-05-07T12:09:24Z

Summary

Bumps the ggml port to the merge of tetherto/qvac-ext-ggml#6 (05afdc5981031b8dcfd5f9cc979442b707b8486c).

The current pin (e16bdae2, port-version 6) carries the qvac hybrid-backend packaging work but predates the Wan-required Metal kernels. Today, any consumer hitting the Wan video path on Metal must either ship a local overlay or abort at runtime with:

ggml_metal_op_encode_impl: error: unsupported op 'IM2COL_3D'

The following commits land on top of e16bdae2 with this bump:

SHA	Description
`bc053644`	metal: `IM2COL_3D` op + `PAD` left-padding for Wan video (PR #5)
`6d2d24bb`	metal: tighten `IM2COL_3D` `supports_op` to require `src[1]->type == F32`
`b1923e29`	metal: extend `IM2COL_3D` `supports_op` for `nb[0]==sizeof(float)` and F16-dst => F16-kernel match
`05afdc59`	Merge of `tetherto/qvac-ext-ggml#6` into `2026-01-30`

The supports_op tightening commits resolve advertise-then-abort gaps where Metal returned SUPPORTED for IM2COL_3D graphs that the CPU reference would then GGML_ASSERT on.

Files changed

Path	Change
`ports/ggml/portfile.cmake`	`REF e16bdae2…` → `REF 05afdc59…`; new `SHA512`; header comment updated
`ports/ggml/vcpkg.json`	`port-version: 6` → `7`; description annotated
`versions/g-/ggml.json`	prepend `{ git-tree: f1632875…, version-date: 2026-01-30, port-version: 7 }`
`versions/baseline.json`	`ggml.port-version: 6` → `7`

The git-tree SHA was computed via git rev-parse HEAD:ports/ggml after staging the port edits.

Verification

Tarball downloaded and SHA512 recomputed from tetherto/qvac-ext-ggml@05afdc59 via curl … | shasum -a 512.
IM2COL_3D predicate at the new pin verified via the GitHub contents API to match the final form from PR #6 (tetherto/qvac-ext-ggml#6).
This same source tarball is the one currently consumed by diffusion-cpp's local overlay (port-version 104, identical REF+SHA512); that overlay builds clean with zero patches on darwin-arm64 and runs Wan2.1 1.3B txt2video end-to-end on Metal — which is what this bump enables for all registry consumers.
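
The tarball check above amounts to recomputing a SHA512 and comparing it with the value pinned in `portfile.cmake`. A minimal Node.js sketch of that comparison, assuming the tarball bytes are already in memory; `sha512Hex` and `matchesPinnedHash` are hypothetical helper names, not part of vcpkg or this registry:

```javascript
const crypto = require('crypto')

// Hypothetical helper: hex-encoded SHA-512 digest of a Buffer or string.
function sha512Hex (data) {
  return crypto.createHash('sha512').update(data).digest('hex')
}

// Hypothetical helper: compare a computed digest with the pinned value.
// The pinned hex string is lower-cased first so either case is accepted.
function matchesPinnedHash (data, pinnedHex) {
  return sha512Hex(data) === String(pinnedHex).toLowerCase()
}
```

A mismatch here would mean the downloaded archive is not the one the port was verified against.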

Follow-up

Once this is merged, qvac/packages/diffusion-cpp can drop its local vcpkg/ports/ggml/ overlay entirely (it currently exists only because the registry was 5 commits behind PR #6's merge).

Made with Cursor.

Bumps ggml port to the merge of tetherto/qvac-ext-ggml#6 (05afdc59), which lands on top of the previous pin (e16bdae2):

- bc053644 metal: IM2COL_3D op + PAD left-padding for Wan video (#5)
- 6d2d24bb metal: tighten IM2COL_3D supports_op (src[1]==F32)
- b1923e29 metal: extend IM2COL_3D supports_op for nb[0]==sizeof(float) and F16-dst => F16-kernel match
- 05afdc59 Merge pull request #6 from aegioscy

Without these the Metal backend aborts mid-Wan inference with `unsupported op 'IM2COL_3D'` and test-backend-ops support advertises invalid IM2COL_3D combos that hit CPU GGML_ASSERTs.

Verified end-to-end on darwin-arm64 via the same source tarball already used by diffusion-cpp's local overlay (now redundant after this bump): ggml@2026-01-30#7 builds with no patches, addon links against it, and Wan2.1 1.3B txt2video runs end-to-end on Metal.

Co-authored-by: Cursor <cursoragent@cursor.com>

* feat(diffusion): refactor download scripts and add Wan 2.1 support

- Extract shared dl() function into reusable dl-functions.sh module
- Update all download-model-*.sh scripts to source shared utilities
- Add download-model-wan.sh for Wan 2.1 video generation models
- Reduces code duplication and improves maintainability

Wan 2.1 downloads (~8.3 GB):
- wan2.1_t2v_1.3B_fp16.safetensors (diffusion model)
- wan_2.1_vae.safetensors (VAE encoder/decoder)
- umt5_xxl_fp16.safetensors (text encoder)

Co-authored-by: Cursor <cursoragent@cursor.com>

* feat(diffusion): Wan video foundation -- ctx/vid handlers, AVI muxer, shared parsers

Phase 1-4 of Wan 2.1 / 2.2 video generation support in the diffusion-cpp addon. Configuration + parsing layer only; dispatch + callback plumbing + JS surface land in follow-up commits on this branch.

SdCtxConfig:
- Add highNoiseDiffusionModelPath for Wan 2.2 MoE high-noise expert (leave empty for Wan 2.1 and all non-Wan models)
- Add previewMode / previewInterval / previewDenoised / previewNoisy for optional mid-denoising preview frames via sd_set_preview_callback
- Wire both through SdCtxHandlers (new JS keys: preview_mode, preview_interval, preview_denoised, preview_noisy) and AddonJs (highNoiseDiffusionModelPath in args map)

AviWriter (new utility):
- addon/src/utils/AviWriter.{hpp,cpp} ports the upstream avi_writer.h MJPG encoder onto an in-memory std::vector<uint8_t> sink (no stdio, no temp files) so video bytes flow through the existing OutputCallBackJs queue
- Full input validation (numFrames, fps, jpegQuality, channel count, frame homogeneity, null data) -- StatusError on any rejection

SdParsers (new shared module):
- Extract parseSampler / parseScheduler / parseCacheMode / parseVaeTileSize / parseCachePreset / requireNum/Str/Bool from SdGenHandlers into addon/src/handlers/SdParsers.{hpp,cpp}
- Reused by both SdGenHandlers (image) and SdVidGenHandlers (video)

SdVidGenHandlers (new):
- SdVidGenConfig struct with full Wan 2.1 + 2.2 surface: mode (txt2vid/img2vid/flf2vid), prompts, dimensions, videoFrames (4k+1 validated), fps, seed, low-noise expert sample params, high-noise expert sample params, moeBoundary, strength, vaceStrength, VAE tiling, cache mode/preset/threshold
- 22 JSON handlers with validation for each field

Tests (all pass):
- 5 new SdCtxHandlers tests for preview_* + high_noise path default
- 18 new AviWriter tests covering happy path, RIFF header structure, all validation rejections, JPEG round-trip
- 54 new SdVidGenHandlers tests covering every field + integration payload + defaults
- Zero regressions across existing 144 fast-unit tests

No user-facing JS API changes yet.

Co-authored-by: Cursor <cursoragent@cursor.com>

* feat(diffusion): Wan video generation -- dispatch, processVideo, JS wrapper + examples

Builds on the Wan foundation commit by wiring the video path end-to-end from JS to C++ and back. Adds txt2vid / img2vid / flf2vid generation via a new VideoStableDiffusion class that shares the single native binding with the existing ImgStableDiffusion class.

Native:
- SdModel::process() dispatches on the JSON "mode" field to processImage() (existing) or the new processVideo() path.
- processVideo() applies SdVidGenHandlers, validates mode-vs-inputs invariants (img2vid requires init_image; flf2vid requires both; txt2vid rejects both; end_image only valid on flf2vid), decodes init/end/control frames, fills sd_vid_gen_params_t, and encodes the returned sd_image_t* sequence to an in-memory MJPG AVI.
- SdVideoFrames RAII wrapper extracted to addon/src/utils/ so it can be unit-tested without a loaded model.
- GenerationJob grows endImageBytes and controlFramesBytes plus an optional per-frame frameCallback (unused from JS in this PR; reserved for the preview follow-up).
- AddonJs::runJob reads endImageBuffer (single Uint8Array) and controlFramesBuffers (Array of Uint8Array) as typed-array args, no JSON encoding.
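
The mode-vs-inputs invariants listed for `processVideo()` can be summarised in a few lines. This is an illustrative JavaScript sketch, not the addon's code: the function name `checkModeInputs`, the parameter shape, and the error wording are assumptions; only the rules themselves come from the commit message.

```javascript
// Hypothetical validator mirroring the rules in the commit message:
//   txt2vid rejects init_image and end_image
//   img2vid requires init_image; end_image is only valid on flf2vid
//   flf2vid requires both init_image and end_image
function checkModeInputs ({ mode, initImage, endImage }) {
  const hasInit = initImage != null
  const hasEnd = endImage != null
  switch (mode) {
    case 'txt2vid':
      if (hasInit || hasEnd) {
        throw new TypeError('txt2vid takes no init_image or end_image')
      }
      break
    case 'img2vid':
      if (!hasInit) throw new TypeError('img2vid requires init_image')
      if (hasEnd) throw new TypeError('end_image is only valid for flf2vid')
      break
    case 'flf2vid':
      if (!hasInit || !hasEnd) {
        throw new TypeError('flf2vid requires init_image and end_image')
      }
      break
    default:
      throw new TypeError(`unknown mode: ${mode}`)
  }
}
```

Rejecting the wrong combination up front keeps a misconfigured request from reaching the expensive generation call.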
JS surface:
- video.js / video.d.ts: new VideoStableDiffusion class with full per-mode validation, 4k+1 frame-count rule, fps range, moe_boundary range, Uint8Array type checks, and warning when high_noise_* params are set without files.highNoiseDiffusionModel.
- addon.js: SdInterface.runJob threads end_image and control_frames through to the native runJob without round-tripping through JSON.
- index.js / index.d.ts: unchanged -- image wrapper continues to work exactly as before. Both classes compose the same SdInterface and hit the same binding.cpp entry points.
- package.json: exports "./video", ships video.js / video.d.ts, adds generate:video / generate:img2vid / generate:flf2vid scripts.

Examples:
- examples/generate-video-wan.js (txt2vid @ 832x480, 33 frames)
- examples/img2vid-wan.js (reuses assets/von-neumann.jpg as first frame)
- examples/flf2vid-wan.js (expects flf-first.png / flf-last.png)

Tests:
- test_sd_video_frames.cpp: 12 RAII tests (empty states, destruction of 4k+1 production sizes, null-pixel tolerance, bounds-checked operator[], compile-time copy/move deletion).
- test_wan_video.cpp: 12 validation tests reusing the SD2.1 context to satisfy isLoaded() and exercise every processVideo() guard before generate_video() runs; plus an opt-in happy-path smoke test (SD_RUN_WAN_SMOKE=1) gated off by default because ggml-metal lacks IM2COL_3D for Wan's 3D convs.

Gates: npm run lint, npm run test:dts, npm run build, and the fast subset of addon-test (178/178) all pass.

Co-authored-by: Cursor <cursoragent@cursor.com>

* feat(diffusion): Wan video tests, ggml overlay, example tuning

Add a vcpkg overlay-port for ggml at vcpkg/ports/ggml/ that pins tetherto/qvac-ext-ggml @ feature/metal-pr-16669-clean (commit bc053644). The fork adds Metal kernels for IM2COL_3D and 3-axis PAD-left, both required by Wan 2.1 / 2.2 video generation; without them ggml hard-aborts mid-run with "unsupported op 'IM2COL_3D'".
Rationale lives in portfile.cmake -- the overlay is transient and will be removed once the registry baseline rolls forward.

Add JS test coverage for VideoStableDiffusion:
- test/unit/video-validation.test.js: 63 input-validation cases mirroring the existing input-validation.test.js pattern.
- test/integration/generate-video-wan.test.js: opt-in (WAN_INTEGRATION=1) end-to-end T2V smoke test plus sniffAvi self-tests.

Tune the Wan examples:
- generate-video-wan.js: env-var-driven (PROMPT, FRAMES, STEPS, SEED, CFG_SCALE, FLOW_SHIFT, ...), inline frame-count cheat sheet, (4*k+1) pre-flight check, default FRAMES bumped to 81 (Wan 1.3B's native training length).
- img2vid-wan.js, flf2vid-wan.js: flow_shift 5.0 -> 3.0 to match the upstream test-wan reference scripts.

Refresh the C++ smoke-test gating doc in test_wan_video.cpp to reflect that Metal works once the overlay is in place.

Drop build.md: the vcpkg overlay rationale already lives next to the overlay (portfile.cmake header), and transient infrastructure doesn't earn its own long-form doc.

Co-authored-by: Cursor <cursoragent@cursor.com>

* docs(diffusion-cpp): restore build.md

The earlier deletion conflated build.md with the vcpkg overlay rationale, but build.md is the package's standalone build guide (prerequisites, build pipeline, cross-compilation, troubleshooting) and is still the target of README.md's "Building from Source" link. Restore it from main, which also picks up the LLVM 19 -> 22 bump.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(diffusion-cpp): address PR review feedback for Wan video gen

  * Flip default video dimensions to 480x832 portrait (phone-screen friendly). Wan 2.1 T2V 1.3B handles both orientations equally well; the previous 832x480 landscape default disagreed with the example.
  * Document the flow_shift=0 fall-through sentinel in JSDoc, .d.ts, and C++ struct/handler comments; correct stale "5-8" recommendation to the actually-used 3.0 (matches example + ref scripts).
  * Make video_frames error messages consistent JS<->C++ and list the full valid set up to 81 (Wan 1.3B native training cap).
  * Fix frame-duration arithmetic (33 frames is ~2s @ default 16 fps, not ~1.3s @ 24 fps).
  * Warn when upscaler_* keys are passed to VideoStableDiffusion -- ESRGAN upscale is image-only and was being silently ignored.
  * Annotate addon.js end_image / control_frames forwarding to call out the typed-array transport (avoids JSON byte-array bloat).
  * Document the two-level concurrency model around _hasActiveResponse (the busy guard isn't dead under exclusiveRunQueue -- it covers overlap between the released queue lock and an in-flight response).
  * Update C++ defaults test + JS suggestion-fallback test for the new portrait orientation.

Co-authored-by: Cursor <cursoragent@cursor.com>

* chore(diffusion-cpp): retarget ggml overlay to merged tetherto/qvac-ext-ggml@2026-01-30

The Wan-Metal work that was carried as a local overlay has all landed upstream on tetherto/qvac-ext-ggml's 2026-01-30 branch:
- bc053644 metal: IM2COL_3D op + PAD left-padding for Wan video (#5)
- 512e1773 cmake: support qvac hybrid backend packaging (static CPU + dynamic GPU backends, GGML_MAX_NAME prop, graceful no-OpenCL-device fallback, public ggml-opencl.h install -- previously six local overlay patches)
- 6d2d24bb / b1923e29 / 05afdc59 metal: tighten IM2COL_3D supports_op to match the CPU-reference invariants (#6)

Repin vcpkg/ports/ggml from PR #5's head (bc053644) to PR #6's merge commit (05afdc59) on 2026-01-30, drop all seven local overlay patches since their content is now upstream verbatim, and bump port-version 102 -> 104 to force a clean rebuild of ggml.

Net diff: +22 / -201; the overlay now exists only as a baseline pin that overrides the registry's ggml-org/ggml@a8db410a (which still lacks the Wan-required Metal ops). Once the registry baseline catches up to a ref containing this work, vcpkg/ports/ggml/ can be deleted entirely.
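
The frame-count rule and the frame-duration fix from the review-feedback commit above are plain arithmetic: valid counts have the form 4k+1, and at the default 16 fps a 33-frame clip lasts 33 / 16 ≈ 2.06 s. A small sketch follows; the helper names are hypothetical, the cap of 81 is taken from the commit message, and treating 1 (k = 0) as the smallest valid count is an assumption.

```javascript
// Assumption: the smallest accepted count is 1 (k = 0). The source only
// states the 4k+1 form and a cap of 81 for Wan 1.3B.
const MAX_FRAMES = 81

// True when n is an integer of the form 4k+1 within [1, max].
function isValidFrameCount (n, max = MAX_FRAMES) {
  return Number.isInteger(n) && n >= 1 && n <= max && (n - 1) % 4 === 0
}

// Enumerate every count of the form 4k+1 up to max.
function validFrameCounts (max = MAX_FRAMES) {
  const out = []
  for (let n = 1; n <= max; n += 4) out.push(n)
  return out
}

// Clip length in seconds for a given frame count and frame rate.
function durationSeconds (frames, fps = 16) {
  return frames / fps
}
```

With these definitions `durationSeconds(33)` is 2.0625 and `durationSeconds(81)` is 5.0625.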
Verified with npm run build on darwin-arm64: ggml@2026-01-30#104 builds fresh from 05afdc59 with zero patches applied, addon links and tests compile, prebuild installed.

Co-authored-by: Cursor <cursoragent@cursor.com>

* chore(diffusion-cpp): drop local ggml overlay now that registry serves 2026-01-30#7

The previous commit (04a6496) repointed the local ggml overlay at the merge of tetherto/qvac-ext-ggml#6 (05afdc59) so Wan video generation on Metal would stop aborting with `unsupported op 'IM2COL_3D'`. That same ref has now been promoted into the registry: tetherto/qvac-registry-vcpkg#134 landed on main as d1b2497b, bumping ggml port-version 6 -> 7 against the identical REF + SHA512 the overlay was carrying.

This means the diffusion-cpp-local overlay is now strictly redundant -- and slightly behind, since the registry's port-version 7 also picks up two improvements the overlay didn't have:
- iOS gets `-DGGML_BLAS=OFF -DGGML_ACCELERATE=OFF` to keep the build off the Apple Accelerate / BLAS path that breaks the iOS toolchain.
- The Android backend-glob now also matches `libqvac-ggml-*.so` in addition to `libggml-*.so`, so the qvac-prefixed DL backends get installed alongside the upstream-named ones.

So we delete the entire `vcpkg/ports/ggml/` overlay (portfile.cmake, vcpkg.json, usage, android-vulkan-version.cmake) and:
- Bump `vcpkg-configuration.json`'s default-registry baseline from a9eae49a -> d1b2497b (the merge commit of registry PR #134), which is the first registry SHA that serves ggml@2026-01-30#7.
- Tighten `vcpkg.json`'s ggml constraint from `version>=: 2026-01-30#5` to `version>=: 2026-01-30#7` so any later baseline bump can't silently drop us back below the Wan-Metal pin.

The `overlay-ports: ["vcpkg/ports"]` entry and the `vcpkg/ports/.gitkeep` marker are kept in place so future overlays can be added without a config flap.

Verified end-to-end on darwin-arm64: clean `npm run build` (bare-make generate + build + install) with the build/ tree wiped.
vcpkg resolves ggml[core,metal]:arm64-osx@2026-01-30#7 -- git+https://github.com/tetherto/qvac-registry-vcpkg.git@f1632875... straight from the registry (no overlay), all 8 ports install in 47s, the addon links cleanly against the registry-supplied libggml*.a, and prebuilds/darwin-arm64/qvac__diffusion-cpp.bare is rewritten.

Net diff: +2 / -283.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(diffusion-cpp): satisfy standard quotes rule in validateVideoFrames

The middle line of the validateVideoFrames Error message was a template literal with no `${...}` interpolation, so `standard` (configured via `npm run lint`) flags it as `quotes`:

  video.js:39:7: Strings must use singlequote.

Adjacent lines 37, 38 use single quotes, and line 40 legitimately uses backticks for `${n}`. Just the one stray backtick-string -- swap to single quotes; no behaviour change.

Sanity-checks job 74830306544 on PR #1879 fails on this single line; `npm run lint` passes locally after the swap.

Co-authored-by: Cursor <cursoragent@cursor.com>

* diffusion-cpp: enable diffusion FA in examples and fix addon paths

- Set diffusion_fa: true across SD, FLUX, and integration test ImgStableDiffusion configs so diffusion flash attention matches WAN video examples.
- Pass highNoiseDiffusionModelPath (empty when unset) from index.js so native createInstance validation succeeds for image mode; document optional files.highNoiseDiffusionModel in index.d.ts and validate absolute paths.

Co-authored-by: Cursor <cursoragent@cursor.com>

* diffusion-cpp(video): pass esrganPath to native createInstance

VideoStableDiffusion omitted esrganPath while the binding validates it as a string; mirror image-mode by forwarding files.esrgan or empty string.

Co-authored-by: Cursor <cursoragent@cursor.com>

* diffusion-cpp: align C++ includes and image codec with inference-addon-cpp

- Switch remaining qvac-lib-inference-addon-cpp includes to inference-addon-cpp (vcpkg installs headers under the shorter prefix).
- Use image_codec::decodeImage / encodeToPng in processVideo after ImageCodec API rename from decodePng.

Co-authored-by: Cursor <cursoragent@cursor.com>

* diffusion-cpp: apply clang-format to changed C++ sources

Run git-clang-format against ce2ea93 to satisfy the repo formatter on the video addon, image codec, and Wan tests. No behavior changes.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(diffusion-cpp/video): address review comments 1-3

1. Use global addonLogging instead of per-instance setLogger/releaseLogger
   - Eliminates process-global logger collision (was reintroduced in video.js)
   - Mirrors fix from ImgStableDiffusion / EsrganUpscaler
   - video.js no longer manages per-instance logger state
2. Reject width/height values <= 0 in JS validation
   - Now validates that width > 0 and height > 0 before alignment check
   - Error message updated to say "positive multiples of 8"
   - Updated test expectations to match new message
3. Validate double values are integers before casting in C++
   - All int casts now check std::floor(d) == d first
   - Affects: width, height, video_frames, fps handlers
   - Prevents silent truncation (e.g. 8.5 -> 8)

All 70 unit tests pass; build/lint/dts all clean.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(diffusion-cpp/video): address review comments 4-7

4. Validate end_image / control_frames dimensions match video dimensions
   - Added dimension checks in processVideo() before generate_video()
   - Rejects mismatched frame sizes with clear error messages
   - Prevents silent corruption or undefined behavior in native layer
5. Use ImageCodec ownership helper instead of raw free()
   - Replaced FrameBuffersGuard with unique_ptr<uint8_t, FreeDeleter>
   - Consistent with existing image_codec ownership pattern
   - Automatic cleanup on exception; no manual free() calls
6. Regenerate mobile integration test manifest
   - Ran npm run test:mobile:generate
   - Updated test/mobile/integration.auto.cjs with new runners
7. Add checked buffer size calculation in AviWriter
   - Validates width * height overflow before multiplication
   - Validates numFrames * bytesPerFrame overflow
   - Rejects allocations that would exceed SIZE_MAX
   - Prevents silent integer overflow in reserve() call

All 70 unit tests pass; build/lint/dts all clean.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(diffusion-cpp/video): harden int validation, ownership, AVI overflow

Follow-up tightening on top of the review fixes for #1879.

SdVidGenHandlers:
- Extract a single requireInt() helper used by width / height / video_frames / fps / requirePositiveInt. The helper rejects NaN, +/-inf, fractional doubles, and values outside [INT_MIN, INT_MAX] before static_cast<int>, so casts to int are always well-defined and no JSON value silently truncates (e.g. 8.5 -> 8).
- Add <cmath>/<climits> includes that were transitively available.

SdModel::processVideo:
- Replace the bespoke FrameBuffersGuard struct with three plain unique_ptr<uint8_t, image_codec::FreeDeleter> values (initData / endData / controlData). Same lifetime semantics, less custom code, and the control-frame dimension mismatch path now takes ownership *before* the check so a throw can no longer leak the freshly-decoded buffer.

AviWriter::encodeFramesToAvi:
- Reserve calculation is now step-wise overflow-checked against SIZE_MAX (width vs height vs *3 vs *numFrames) instead of a single multiply that could wrap.
- Add a hard upper bound at UINT32_MAX (AVI 1.0 RIFF size header is a uint32_t -- anything past 4 GB cannot be addressed by the spec).
- Re-check the final size before patching the RIFF header in case JPEG output overshoots the pre-flight estimate.

Tests:
- SdVidGenHandlers: new IntCoercion suite covers fractional doubles, out-of-int-range doubles, picojson's own NaN/inf rejection at the JSON layer, and integer-valued doubles (the common case from JSON).
- AviWriter: new tests for the overflow guard and the 4 GB RIFF cap, both fire before any encoding starts.
- test_wan_video: pin width/height in the existing CorruptControlFrame test so the new dimension check passes for frame [0] and we still exercise the decode-failure path at frame [1]. Add two new cases covering end_image and control_frames dimension mismatch.

All 211 C++ tests, 70 JS unit tests, lint and tsc --dts pass.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(diffusion-cpp/video): don't eager-require binding via addonLogging

CI sanity-checks (JS unit tests on a runner with no native prebuild) was crashing with `AddonError: ADDON_NOT_FOUND` because the top-level `require('./addonLogging')` introduced in e6b13ae transitively pulled in `binding.js` -> `libqvac__diffusion-cpp.so`. The unit tests only exercise JS-side validation and never call `load()`, so they used to work without the prebuilt addon -- this regression broke that.

Match `ImgStableDiffusion` instead: drop the per-instance native logger plumbing entirely (it's dead code anyway after the e6b13ae refactor, since `_connectNativeLogger` was no longer called), and document in the constructor JSDoc that callers wire up native C++ logs once globally via `addonLogging.setLogger(...)`.

Net diff:
- Remove `const addonLogging = require('./addonLogging')` at top.
- Remove `_connectNativeLogger` / `_releaseNativeLogger` methods and their two stale call sites.
- Remove `LOG_METHODS` (only used by the removed method) and `this._binding` (used to keep a handle for the removed release path; the binding is now scoped to `_createAddon` only, matching `ImgStableDiffusion::_createAddon`).
- JSDoc on `args.logger` now mirrors `index.js` and points users at `addonLogging.setLogger`.

Verified: JS unit tests 70/70 pass with the prebuilds directory moved aside, lint clean, tsc --dts clean.
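
The guards that `requireInt()` is described as applying before `static_cast<int>` (reject NaN, +/-inf, fractional values, and anything outside `[INT_MIN, INT_MAX]`) translate directly to JavaScript. The sketch below illustrates the same checks in JS rather than reproducing the C++ helper; it assumes a 32-bit `int`, and the error messages are made up for the example.

```javascript
// Assumption: int is 32 bits wide on the target platform.
const INT_MIN = -(2 ** 31)
const INT_MAX = 2 ** 31 - 1

// Return value unchanged if it is a finite, integer-valued number that
// fits in a 32-bit int; throw otherwise so nothing is silently truncated.
function requireInt (value, key) {
  if (typeof value !== 'number' || !Number.isFinite(value)) {
    throw new TypeError(`${key} must be a finite number`)
  }
  if (!Number.isInteger(value)) {
    throw new TypeError(`${key} must be an integer, got: ${value}`)
  }
  if (value < INT_MIN || value > INT_MAX) {
    throw new RangeError(`${key} is out of int range, got: ${value}`)
  }
  return value
}
```

Integer-valued doubles such as `8.0` pass, which matches the common case of numbers arriving from JSON; `8.5` is rejected instead of becoming `8`.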
Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(diffusion-cpp/video): validate init_image dims; reject unsupported lora

Two reviewer-flagged regressions on PR #1879:

1. blocker (gabrielgrigoras-serv): processVideo() validates dimensions for end_image and every control_frames[i] but not for init_image. A caller passing width/height that don't match the decoded init_image would hand mismatched (width, height) and frame pixel stride to generate_video(), producing inconsistent frame data downstream (and risking VAE segfaults).

   Fix: add the same dimension check in SdModel.cpp processVideo() right after the init_image decode, throwing StatusError on mismatch -- consistent with the existing end_image / control_frames checks. All three checks now compare against vid.width / vid.height as the single source of truth for the video's final dimensions. Ownership of the freshly-decoded init pixel buffer is taken into the unique_ptr *before* the dim check, mirroring the control_frames path so a mismatch can't leak the buffer.

2. gianni-cor: params.lora silently dropped on the video path -- video.js validated it as a non-empty absolute path and video.d.ts advertised `lora?: string`, but SD_VID_GEN_HANDLERS has no "lora" entry and SdModel::processVideo never touches sd_vid_gen_params_t::loras, so any LoRA passed through was swallowed by the unknown-keys branch in applySdVidGenHandlers and silently produced LoRA-less output.

   Fix B applied (reviewer's preferred "out of scope" option):
   - video.js: replaced the absolute-path validation with a loud TypeError('params.lora is not supported for video generation yet'), so existing callers fail at the JS boundary instead of getting silent LoRA-less output.
   - video.d.ts: dropped `lora?: string` from VideoGenerationParams.
   - video-validation.test.js: collapsed the four old lora cases (empty / non-string / relative / absolute) into one parametrised test that asserts the new TypeError fires for every shape, so a future re-introduction of the JS validation can't bring back the silent-drop regression.

   When LoRA-on-video is wired through native (mirror of processImage's prepareLoras() + sd_img_gen_params_t::loras), the right path is to restore the absolute-path validation here and add a "lora" handler to SD_VID_GEN_HANDLERS, NOT to revert the d.ts.

C++ test changes:
- new Img2VidRejectsInitImageWithWrongDimensions covers the blocker.
- Flf2VidRejectsCorruptEndImage pinned width/height to 64 so the new init dim check passes for the 64x64 init and we still reach the intended end-decode-failure path (same approach as the existing Img2VidRejectsCorruptControlFrame fixture).

Verified: 67/67 JS unit tests pass with and without prebuilds, 176/176 C++ tests pass (1 opt-in Wan smoke skipped, requires ~8GB weights), lint and tsc --dts clean.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(diffusion-cpp): regression + 7 review-batch fixes (NaN/Inf guards, cancel, etc.)

Addresses all 8 outstanding comments on PR #1879 (one regression from commit 59f2663 plus a CHANGES_REQUESTED batch of seven items). Major points below; per-file rationale in the inline comments.

== Regression fix (highest priority)

  * gianni-cor flagged that the new init_image strict-equality check from commit 59f2663 rejects every off-grid frame with a confusing error citing wrapper-picked dims. Root cause: addon.js _fillDimsFromImage was silently doing Math.ceil(d/8)*8, so a 100x100 init_image got dispatched as 104x104 and the native check then threw "100x100 != 104x104" -- citing a value the caller never passed.

  Fixes:
  - addon.js _fillDimsFromImage now passes dims through verbatim (no rounding).
    The image SDEdit path already realigns internally (SdModel.cpp ~600) and the FLUX2 ref path uses auto_resize_ref_image, so dropping the rounding is safe across every path.
  - video.js _runInternal pre-empts the cryptic native error with a JS-layer off-grid probe: when width/height aren't explicit it reads init_image / end_image / control_frames[i] dimensions and throws a clear "your image is off-grid, pre-align or pass explicit dims" message naming the exact buffer.
  - Removes the ceil-vs-round inconsistency wart between _fillDimsFromImage (ceil) and the user-facing validator (round).
  - Three new JS regression tests for off-grid init / end / control, plus one positive test for explicit aligned dims overriding the probe.

== JS hardening

  * params.prompt is documented Required but was never validated -- undefined / "" / 42 each produced a different failure mode (silent noise, silent noise, far-away C++ error). video.js now throws a loud TypeError at the wrapper boundary. Four new prompt-validation tests.
  * mapAddonEvent JobEnded fallback accepted every typed-array view -- works today only because uint8_t is the sole registered TypedArrayOutputHandler. When frameCallback (SdModel.hpp:139) gets wired through to JS, every per-frame event would have been misclassified as JobEnded and the response stream would have closed after the first frame. One-token fix: add `&& !ArrayBuffer.isView(rawData)` to the discriminator. ArrayBuffer.isView is true for every TypedArray + DataView, false for plain objects -- exactly the discrimination needed for the runtime-stats POJO.

== C++ parser hardening (NaN / Inf / int64 / range)

  * Promoted requireInt from SdVidGenHandlers.cpp's anonymous namespace into parsers::, and added two siblings:
    - requireFiniteFloat: rejects NaN / +inf / -inf before the float cast (NaN compares false against every bound, so range checks of the form `f < lo || f > hi` previously let it sneak through).
    - requireInt64: same finite + integer guards as requireInt, range check against representable [INT64_MIN, INT64_MAX] doubles.
    - requireFiniteFloatInRange: convenience wrapper for [lo, hi] checks.
  * Routed every previously-vulnerable cast through the new helpers:
    - SdVidGenHandlers.cpp: seed (int64), cfg_scale, flow_shift, high_noise_cfg_scale, high_noise_flow_shift, vae_tile_overlap, cache_threshold, moe_boundary, strength, vace_strength
    - SdGenHandlers.cpp (image path, reviewer asked for symmetric fix): eta, cfg_scale, guidance, img_cfg_scale, seed, batch_count, strength, clip_skip, vae_tile_overlap, cache_threshold, width, height, steps, parseUpscaleRepeats
  * parseVaeTileSize (SdParsers.cpp): numeric form now routes through requireInt (rejects NaN/Inf/fractional/out-of-range), and BOTH forms (numeric and "WxH" string) now reject <= 0. Five new tests.

== Cancellation gap + typed status

  * SdModel.cpp processVideo cancelRequested_ was checked exactly once after generate_video() returns -- the slow tail (per-frame PNG fan-out + AVI mux, multi-second on 81-frame 832x480 videos) had no cancellation visibility. Added 2 checks: top of frame-callback loop body, and immediately before encodeFramesToAvi.
  * Switched both Job cancelled throws (image path at SdModel.cpp:730, video path at :987, plus the 2 new C1 sites) from bare std::runtime_error to StatusError tagged with localCodeMsg="Cancelled", so the JS layer can discriminate cancel from real internal failures via codeString() ("[ General :: Cancelled ]") instead of string-matching the exception message.

    Note: this PR deliberately does NOT add `Cancelled = 6` to the shared inference-addon-cpp Errors.hpp enum, because that header ships via vcpkg to every package in the monorepo and a cross-package coordinated change is out of scope. Instead we use the 3-arg StatusError ctor (addonId, localCodeMsg, errorMsg) which produces the same codeString without touching the shared enum.
When the enum is updated later, the 4 call sites can switch to the 2-arg ctor in a one-line follow-up. == C5 (preview_*) -- product decision deferred * The header comment at SdCtxHandlers.hpp:112 claimed preview_mode et al are "Wired to sd_set_preview_callback() in SdModel::process()", but a grep across packages/diffusion-cpp for sd_set_preview_callback returns zero matches -- the four config keys are validated and stored but the upstream callback is never installed, so they're a silent no-op end-to-end. Downgraded the misleading comment to an explicit TODO(QVAC-18026 follow-up) documenting the gap and the two viable resolution paths (wire it up alongside sd_set_abort_callback, OR remove the handlers + fields + tests). Reviewer asked which path is intended; this commit picks neither and just stops claiming the wiring exists. The choice can land in a separate PR without holding this one up. == Test surface * +8 JS tests (prompt validation x4, off-grid probe x4) * +5 C++ tests (vae_tile_size zero/negative/fractional/out-of-range rejection, plus the existing IntCoercion suite carried over to the promoted helpers transparently) * Cancel-context test updated to assert the typed "[ General :: Cancelled ]" codeString in addition to the message. Verified locally: JS unit tests: 75/75 pass with prebuild, 75/75 also without (CI sanity-checks mode, no native binary loaded) C++ unit tests: 209/210 pass, 1 opt-in skip (SdWanHappyPathTest needs ~8GB Wan weights) npm run lint: clean npm run test:dts: clean Co-authored-by: Cursor <cursoragent@cursor.com> * chore(diffusion-cpp): release 0.8.0 Bumps @qvac/diffusion-cpp to 0.8.0 and documents the Wan 2.1 / Wan 2.2 video pipeline shipped since 0.7.0: new VideoStableDiffusion class (txt2vid / img2vid / flf2vid), MoE high-noise expert routing, streaming MJPG AVI muxer, refactored download helpers + Wan model script, plus the supporting JS + C++ test coverage and validation hardening. 
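The `ArrayBuffer.isView` discriminator described in the JS hardening notes above can be illustrated with a small sketch. It is not the package's actual `mapAddonEvent`: the function name `classifyRawData` and the `'Output'` / `'Unknown'` labels are hypothetical, and only the `JobEnded` name and the `!ArrayBuffer.isView(rawData)` test come from the commit message.

```javascript
// Hypothetical classifier; the real mapAddonEvent in this package may differ.
function classifyRawData (rawData) {
  // Typed arrays and DataViews carry payload bytes, never runtime stats.
  if (ArrayBuffer.isView(rawData)) return 'Output'
  // Only a non-null plain object is treated as the runtime-stats POJO.
  if (rawData !== null && typeof rawData === 'object') return 'JobEnded'
  return 'Unknown'
}
```

With this shape, a per-frame `Uint8Array` (or any other typed-array view) can no longer be mistaken for the end-of-job stats object, which is the misclassification the commit guards against.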
Co-authored-by: Cursor <cursoragent@cursor.com> * fix(diffusion-cpp): re-align auto-detected img dims to multiple of 8 _fillDimsFromImage was passing raw image dimensions through verbatim since fe4d10f, but the native SdGenHandlers validates width/height % 8 == 0 before the downstream alignment in SdModel::processImage ever runs. Any img2img call with a non-aligned source image (e.g. the bundled 500x627 von-neumann.jpg used by the FLUX2 i2i integration test) therefore failed with: height must be a positive multiple of 8, got: 627 Restore the Math.ceil(d/8)*8 round-up that was removed in fe4d10f. The original motivation for the removal -- avoiding a spurious dim mismatch on the video path where processVideo strict-compares decoded frame dims against vid.width/vid.height -- is already handled at the JS layer by VideoStableDiffusion's off-grid pre-validation in video.js, which runs before this helper and rejects unaligned init/end/control frames with a clear caller-facing error. The ceil() is therefore a no-op on the video path. Co-authored-by: Cursor <cursoragent@cursor.com> * style(diffusion-cpp): apply clang-format to drifted C++ sources cpp-lint surfaced clang-format drift in 4 files that accumulated across recent Wan-video commits. No semantic changes -- only mechanical line-wrap / arg-break placement to match the project's .clang-format. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(diffusion-cpp/test): use package export for video module in wan integration test The generate-video-wan.test.js test was using a relative import (require('../../video')) that breaks when test files are bundled and relocated to the test-framework backend directory during mobile test setup. Change to the package export pattern (@qvac/diffusion-cpp/video) used by other integration tests, which remains valid regardless of file location. 
Fixes: https://github.com/tetherto/qvac/actions/runs/25929776543/job/76221440417

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(diffusion-cpp): expose video API from package root

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(diffusion-cpp): repair variable names in SdModel after merge

Co-authored-by: Cursor <cursoragent@cursor.com>

* style(diffusion-cpp): apply git-clang-format

Co-authored-by: Cursor <cursoragent@cursor.com>

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: gianni-cor <gianfranco.cordella@tether.io>
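The `Math.ceil(d/8)*8` round-up restored in the "re-align auto-detected img dims" commit above, and the off-grid rejection it relies on for the video path, can be sketched as follows. The helper names `alignUp` and `assertOnGrid` are hypothetical and the error wording is paraphrased; only the rounding formula and the multiple-of-8 rule come from the commit messages.

```javascript
const GRID = 8

// Round a dimension up to the next multiple of 8 (the Math.ceil(d/8)*8 rule).
function alignUp (d) {
  return Math.ceil(d / GRID) * GRID
}

// Hypothetical off-grid probe: throws when an auto-detected dimension is unaligned.
function assertOnGrid (name, width, height) {
  if (width % GRID !== 0 || height % GRID !== 0) {
    throw new Error(`${name} is off-grid (${width}x${height}); pre-align it or pass explicit dims`)
  }
}
```

For the 500x627 source image mentioned above, `alignUp` yields 504x632. Dimensions that are already multiples of 8 pass through unchanged, which is consistent with the claim that the round-up is a no-op once the video path has rejected off-grid frames.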

…s 2026-01-30#7

The previous commit (0f5c522) repointed the local ggml overlay at the merge of tetherto/qvac-ext-ggml#6 (05afdc59) so Wan video generation on Metal would stop aborting with `unsupported op 'IM2COL_3D'`. That same ref has now been promoted into the registry: tetherto/qvac-registry-vcpkg#134 landed on main as d1b2497b, bumping ggml port-version 6 -> 7 against the identical REF + SHA512 the overlay was carrying.

This means the diffusion-cpp-local overlay is now strictly redundant -- and slightly behind, since the registry's port-version 7 also picks up two improvements the overlay didn't have:

- iOS gets `-DGGML_BLAS=OFF -DGGML_ACCELERATE=OFF` to keep the build off the Apple Accelerate / BLAS path that breaks the iOS toolchain.
- The Android backend-glob now also matches `libqvac-ggml-*.so` in addition to `libggml-*.so`, so the qvac-prefixed DL backends get installed alongside the upstream-named ones.

So we delete the entire `vcpkg/ports/ggml/` overlay (portfile.cmake, vcpkg.json, usage, android-vulkan-version.cmake) and:

- Bump `vcpkg-configuration.json`'s default-registry baseline from a9eae49a -> d1b2497b (the merge commit of registry PR #134), which is the first registry SHA that serves ggml@2026-01-30#7.
- Tighten `vcpkg.json`'s ggml constraint from `version>=: 2026-01-30#5` to `version>=: 2026-01-30#7` so any later baseline bump can't silently drop us back below the Wan-Metal pin.

The `overlay-ports: ["vcpkg/ports"]` entry and the `vcpkg/ports/.gitkeep` marker are kept in place so future overlays can be added without a config flap.

Verified end-to-end on darwin-arm64: clean `npm run build` (bare-make generate + build + install) with the build/ tree wiped. vcpkg resolves ggml[core,metal]:arm64-osx@2026-01-30#7 -- git+https://github.com/tetherto/qvac-registry-vcpkg.git@f1632875... straight from the registry (no overlay), all 8 ports install in 47s, the addon links cleanly against the registry-supplied libggml*.a, and prebuilds/darwin-arm64/qvac__diffusion-cpp.bare is rewritten.

Net diff: +2 / -283.

Co-authored-by: Cursor <cursoragent@cursor.com>

* feat(diffusion): refactor download scripts and add Wan 2.1 support - Extract shared dl() function into reusable dl-functions.sh module - Update all download-model-*.sh scripts to source shared utilities - Add download-model-wan.sh for Wan 2.1 video generation models - Reduces code duplication and improves maintainability Wan 2.1 downloads (~8.3 GB): - wan2.1_t2v_1.3B_fp16.safetensors (diffusion model) - wan_2.1_vae.safetensors (VAE encoder/decoder) - umt5_xxl_fp16.safetensors (text encoder) Co-authored-by: Cursor <cursoragent@cursor.com> * feat(diffusion): Wan video foundation -- ctx/vid handlers, AVI muxer, shared parsers Phase 1-4 of Wan 2.1 / 2.2 video generation support in the diffusion-cpp addon. Configuration + parsing layer only; dispatch + callback plumbing + JS surface land in follow-up commits on this branch. SdCtxConfig: - Add highNoiseDiffusionModelPath for Wan 2.2 MoE high-noise expert (leave empty for Wan 2.1 and all non-Wan models) - Add previewMode / previewInterval / previewDenoised / previewNoisy for optional mid-denoising preview frames via sd_set_preview_callback - Wire both through SdCtxHandlers (new JS keys: preview_mode, preview_interval, preview_denoised, preview_noisy) and AddonJs (highNoiseDiffusionModelPath in args map) AviWriter (new utility): - addon/src/utils/AviWriter.{hpp,cpp} ports the upstream avi_writer.h MJPG encoder onto an in-memory std::vector<uint8_t> sink (no stdio, no temp files) so video bytes flow through the existing OutputCallBackJs queue - Full input validation (numFrames, fps, jpegQuality, channel count, frame homogeneity, null data) -- StatusError on any rejection SdParsers (new shared module): - Extract parseSampler / parseScheduler / parseCacheMode / parseVaeTileSize / parseCachePreset / requireNum/Str/Bool from SdGenHandlers into addon/src/handlers/SdParsers.{hpp,cpp} - Reused by both SdGenHandlers (image) and SdVidGenHandlers (video) SdVidGenHandlers (new): - SdVidGenConfig struct with full Wan 2.1 + 2.2 
surface: mode (txt2vid/img2vid/flf2vid), prompts, dimensions, videoFrames (4k+1 validated), fps, seed, low-noise expert sample params, high-noise expert sample params, moeBoundary, strength, vaceStrength, VAE tiling, cache mode/preset/threshold - 22 JSON handlers with validation for each field Tests (all pass): - 5 new SdCtxHandlers tests for preview_* + high_noise path default - 18 new AviWriter tests covering happy path, RIFF header structure, all validation rejections, JPEG round-trip - 54 new SdVidGenHandlers tests covering every field + integration payload + defaults - Zero regressions across existing 144 fast-unit tests No user-facing JS API changes yet. Co-authored-by: Cursor <cursoragent@cursor.com> * feat(diffusion): Wan video generation -- dispatch, processVideo, JS wrapper + examples Builds on the Wan foundation commit by wiring the video path end-to-end from JS to C++ and back. Adds txt2vid / img2vid / flf2vid generation via a new VideoStableDiffusion class that shares the single native binding with the existing ImgStableDiffusion class. Native: - SdModel::process() dispatches on the JSON "mode" field to processImage() (existing) or the new processVideo() path. - processVideo() applies SdVidGenHandlers, validates mode-vs-inputs invariants (img2vid requires init_image; flf2vid requires both; txt2vid rejects both; end_image only valid on flf2vid), decodes init/end/control frames, fills sd_vid_gen_params_t, and encodes the returned sd_image_t* sequence to an in-memory MJPG AVI. - SdVideoFrames RAII wrapper extracted to addon/src/utils/ so it can be unit-tested without a loaded model. - GenerationJob grows endImageBytes and controlFramesBytes plus an optional per-frame frameCallback (unused from JS in this PR; reserved for the preview follow-up). - AddonJs::runJob reads endImageBuffer (single Uint8Array) and controlFramesBuffers (Array of Uint8Array) as typed-array args, no JSON encoding. 
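The mode-vs-inputs invariants and the 4k+1 frame-count rule listed above can be sketched in JS. This is an illustration under stated assumptions, not the code in `processVideo()` or `video.js`: the helper names and error strings are hypothetical, "flf2vid requires both" is read as requiring both `init_image` and `end_image`, and whether a single frame (k = 0) is accepted is not stated in the source, so the sketch permits it.

```javascript
// Hypothetical helper names; only the rules themselves come from the commit message.
function isValidFrameCount (n) {
  // Valid counts have the form 4k+1 for a non-negative integer k.
  return Number.isInteger(n) && n >= 1 && (n - 1) % 4 === 0
}

function checkModeInputs (mode, { initImage, endImage } = {}) {
  const hasInit = initImage != null
  const hasEnd = endImage != null
  switch (mode) {
    case 'txt2vid':
      if (hasInit || hasEnd) throw new Error('txt2vid rejects init_image and end_image')
      break
    case 'img2vid':
      if (!hasInit) throw new Error('img2vid requires init_image')
      if (hasEnd) throw new Error('end_image is only valid on flf2vid')
      break
    case 'flf2vid':
      if (!hasInit || !hasEnd) throw new Error('flf2vid requires both init_image and end_image')
      break
    default:
      throw new Error(`unknown mode: ${mode}`)
  }
}
```

Checking these rules before any frame is decoded keeps the failure close to the caller's mistake instead of surfacing it later in the native generation call.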
JS surface: - video.js / video.d.ts: new VideoStableDiffusion class with full per-mode validation, 4k+1 frame-count rule, fps range, moe_boundary range, Uint8Array type checks, and warning when high_noise_* params are set without files.highNoiseDiffusionModel. - addon.js: SdInterface.runJob threads end_image and control_frames through to the native runJob without round-tripping through JSON. - index.js / index.d.ts: unchanged -- image wrapper continues to work exactly as before. Both classes compose the same SdInterface and hit the same binding.cpp entry points. - package.json: exports "./video", ships video.js / video.d.ts, adds generate:video / generate:img2vid / generate:flf2vid scripts. Examples: - examples/generate-video-wan.js (txt2vid @ 832x480, 33 frames) - examples/img2vid-wan.js (reuses assets/von-neumann.jpg as first frame) - examples/flf2vid-wan.js (expects flf-first.png / flf-last.png) Tests: - test_sd_video_frames.cpp: 12 RAII tests (empty states, destruction of 4k+1 production sizes, null-pixel tolerance, bounds-checked operator[], compile-time copy/move deletion). - test_wan_video.cpp: 12 validation tests reusing the SD2.1 context to satisfy isLoaded() and exercise every processVideo() guard before generate_video() runs; plus an opt-in happy-path smoke test (SD_RUN_WAN_SMOKE=1) gated off by default because ggml-metal lacks IM2COL_3D for Wan's 3D convs. Gates: npm run lint, npm run test:dts, npm run build, and the fast subset of addon-test (178/178) all pass. Co-authored-by: Cursor <cursoragent@cursor.com> * feat(diffusion): Wan video tests, ggml overlay, example tuning Add a vcpkg overlay-port for ggml at vcpkg/ports/ggml/ that pins tetherto/qvac-ext-ggml @ feature/metal-pr-16669-clean (commit bc053644). The fork adds Metal kernels for IM2COL_3D and 3-axis PAD-left, both required by Wan 2.1 / 2.2 video generation; without them ggml hard-aborts mid-run with "unsupported op 'IM2COL_3D'". 
Rationale lives in portfile.cmake -- the overlay is transient and will be removed once the registry baseline rolls forward. Add JS test coverage for VideoStableDiffusion: - test/unit/video-validation.test.js: 63 input-validation cases mirroring the existing input-validation.test.js pattern. - test/integration/generate-video-wan.test.js: opt-in (WAN_INTEGRATION=1) end-to-end T2V smoke test plus sniffAvi self-tests. Tune the Wan examples: - generate-video-wan.js: env-var-driven (PROMPT, FRAMES, STEPS, SEED, CFG_SCALE, FLOW_SHIFT, ...), inline frame-count cheat sheet, (4*k+1) pre-flight check, default FRAMES bumped to 81 (Wan 1.3B's native training length). - img2vid-wan.js, flf2vid-wan.js: flow_shift 5.0 -> 3.0 to match the upstream test-wan reference scripts. Refresh the C++ smoke-test gating doc in test_wan_video.cpp to reflect that Metal works once the overlay is in place. Drop build.md: the vcpkg overlay rationale already lives next to the overlay (portfile.cmake header), and transient infrastructure doesn't earn its own long-form doc. Co-authored-by: Cursor <cursoragent@cursor.com> * docs(diffusion-cpp): restore build.md The earlier deletion conflated build.md with the vcpkg overlay rationale, but build.md is the package's standalone build guide (prerequisites, build pipeline, cross-compilation, troubleshooting) and is still the target of README.md's "Building from Source" link. Restore it from main, which also picks up the LLVM 19 -> 22 bump. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(diffusion-cpp): address PR review feedback for Wan video gen * Flip default video dimensions to 480x832 portrait (phone-screen friendly). Wan 2.1 T2V 1.3B handles both orientations equally well; the previous 832x480 landscape default disagreed with the example. * Document the flow_shift=0 fall-through sentinel in JSDoc, .d.ts, and C++ struct/handler comments; correct stale "5-8" recommendation to the actually-used 3.0 (matches example + ref scripts). 
* Make video_frames error messages consistent JS<->C++ and list the full valid set up to 81 (Wan 1.3B native training cap). * Fix frame-duration arithmetic (33 frames is ~2s @ default 16 fps, not ~1.3s @ 24 fps). * Warn when upscaler_* keys are passed to VideoStableDiffusion -- ESRGAN upscale is image-only and was being silently ignored. * Annotate addon.js end_image / control_frames forwarding to call out the typed-array transport (avoids JSON byte-array bloat). * Document the two-level concurrency model around _hasActiveResponse (the busy guard isn't dead under exclusiveRunQueue -- it covers overlap between the released queue lock and an in-flight response). * Update C++ defaults test + JS suggestion-fallback test for the new portrait orientation. Co-authored-by: Cursor <cursoragent@cursor.com> * chore(diffusion-cpp): retarget ggml overlay to merged tetherto/qvac-ext-ggml@2026-01-30 The Wan-Metal work that was carried as a local overlay has all landed upstream on tetherto/qvac-ext-ggml's 2026-01-30 branch: - bc053644 metal: IM2COL_3D op + PAD left-padding for Wan video (#5) - 512e1773 cmake: support qvac hybrid backend packaging (static CPU + dynamic GPU backends, GGML_MAX_NAME prop, graceful no-OpenCL-device fallback, public ggml-opencl.h install -- previously six local overlay patches) - 6d2d24bb / b1923e29 / 05afdc59 metal: tighten IM2COL_3D supports_op to match the CPU-reference invariants (#6) Repin vcpkg/ports/ggml from PR #5's head (bc053644) to PR #6's merge commit (05afdc59) on 2026-01-30, drop all seven local overlay patches since their content is now upstream verbatim, and bump port-version 102 -> 104 to force a clean rebuild of ggml. Net diff: +22 / -201; the overlay now exists only as a baseline pin that overrides the registry's ggml-org/ggml@a8db410a (which still lacks the Wan-required Metal ops). Once the registry baseline catches up to a ref containing this work, vcpkg/ports/ggml/ can be deleted entirely. 
Verified with npm run build on darwin-arm64: ggml@2026-01-30#104 builds fresh from 05afdc59 with zero patches applied, addon links and tests compile, prebuild installed. Co-authored-by: Cursor <cursoragent@cursor.com> * chore(diffusion-cpp): drop local ggml overlay now that registry serves 2026-01-30#7 The previous commit (04a6496) repointed the local ggml overlay at the merge of tetherto/qvac-ext-ggml#6 (05afdc59) so Wan video generation on Metal would stop aborting with `unsupported op 'IM2COL_3D'`. That same ref has now been promoted into the registry: tetherto/qvac-registry-vcpkg#134 landed on main as d1b2497b, bumping ggml port-version 6 -> 7 against the identical REF + SHA512 the overlay was carrying. This means the diffusion-cpp-local overlay is now strictly redundant -- and slightly behind, since the registry's port-version 7 also picks up two improvements the overlay didn't have: - iOS gets `-DGGML_BLAS=OFF -DGGML_ACCELERATE=OFF` to keep the build off the Apple Accelerate / BLAS path that breaks the iOS toolchain. - The Android backend-glob now also matches `libqvac-ggml-*.so` in addition to `libggml-*.so`, so the qvac-prefixed DL backends get installed alongside the upstream-named ones. So we delete the entire `vcpkg/ports/ggml/` overlay (portfile.cmake, vcpkg.json, usage, android-vulkan-version.cmake) and: - Bump `vcpkg-configuration.json`'s default-registry baseline from a9eae49a -> d1b2497b (the merge commit of registry PR #134), which is the first registry SHA that serves ggml@2026-01-30#7. - Tighten `vcpkg.json`'s ggml constraint from `version>=: 2026-01-30#5` to `version>=: 2026-01-30#7` so any later baseline bump can't silently drop us back below the Wan-Metal pin. The `overlay-ports: ["vcpkg/ports"]` entry and the `vcpkg/ports/.gitkeep` marker are kept in place so future overlays can be added without a config flap. Verified end-to-end on darwin-arm64: clean `npm run build` (bare-make generate + build + install) with the build/ tree wiped. 
vcpkg resolves ggml[core,metal]:arm64-osx@2026-01-30#7 -- git+https://github.com/tetherto/qvac-registry-vcpkg.git@f1632875... straight from the registry (no overlay), all 8 ports install in 47s, the addon links cleanly against the registry-supplied libggml*.a, and prebuilds/darwin-arm64/qvac__diffusion-cpp.bare is rewritten. Net diff: +2 / -283. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(diffusion-cpp): satisfy standard quotes rule in validateVideoFrames The middle line of the validateVideoFrames Error message was a template literal with no `${...}` interpolation, so `standard` (configured via `npm run lint`) flags it as `quotes`: video.js:39:7: Strings must use singlequote. Adjacent lines 37, 38 use single quotes, and line 40 legitimately uses backticks for `${n}`. Just the one stray backtick-string -- swap to single quotes; no behaviour change. Sanity-checks job 74830306544 on PR #1879 fails on this single line; `npm run lint` passes locally after the swap. Co-authored-by: Cursor <cursoragent@cursor.com> * diffusion-cpp: enable diffusion FA in examples and fix addon paths - Set diffusion_fa: true across SD, FLUX, and integration test ImgStableDiffusion configs so diffusion flash attention matches WAN video examples. - Pass highNoiseDiffusionModelPath (empty when unset) from index.js so native createInstance validation succeeds for image mode; document optional files.highNoiseDiffusionModel in index.d.ts and validate absolute paths. Co-authored-by: Cursor <cursoragent@cursor.com> * diffusion-cpp(video): pass esrganPath to native createInstance VideoStableDiffusion omitted esrganPath while the binding validates it as a string; mirror image-mode by forwarding files.esrgan or empty string. Co-authored-by: Cursor <cursoragent@cursor.com> * diffusion-cpp: align C++ includes and image codec with inference-addon-cpp - Switch remaining qvac-lib-inference-addon-cpp includes to inference-addon-cpp (vcpkg installs headers under the shorter prefix). 
- Use image_codec::decodeImage / encodeToPng in processVideo after ImageCodec API rename from decodePng.

Co-authored-by: Cursor <cursoragent@cursor.com>

* diffusion-cpp: apply clang-format to changed C++ sources

Run git-clang-format against 2c4dc65 to satisfy the repo formatter on the video addon, image codec, and Wan tests. No behavior changes.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(diffusion-cpp/video): address review comments 1-3

1. Use global addonLogging instead of per-instance setLogger/releaseLogger
   - Eliminates process-global logger collision (was reintroduced in video.js)
   - Mirrors fix from ImgStableDiffusion / EsrganUpscaler
   - video.js no longer manages per-instance logger state

2. Reject width/height values <= 0 in JS validation
   - Now validates that width > 0 and height > 0 before alignment check
   - Error message updated to say "positive multiples of 8"
   - Updated test expectations to match new message

3. Validate double values are integers before casting in C++
   - All int casts now check std::floor(d) == d first
   - Affects: width, height, video_frames, fps handlers
   - Prevents silent truncation (e.g. 8.5 -> 8)

All 70 unit tests pass; build/lint/dts all clean.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(diffusion-cpp/video): address review comments 4-7

4. Validate end_image / control_frames dimensions match video dimensions
   - Added dimension checks in processVideo() before generate_video()
   - Rejects mismatched frame sizes with clear error messages
   - Prevents silent corruption or undefined behavior in native layer

5. Use ImageCodec ownership helper instead of raw free()
   - Replaced FrameBuffersGuard with unique_ptr<uint8_t, FreeDeleter>
   - Consistent with existing image_codec ownership pattern
   - Automatic cleanup on exception; no manual free() calls

6. Regenerate mobile integration test manifest
   - Ran npm run test:mobile:generate
   - Updated test/mobile/integration.auto.cjs with new runners

7. Add checked buffer size calculation in AviWriter
   - Validates width * height overflow before multiplication
   - Validates numFrames * bytesPerFrame overflow
   - Rejects allocations that would exceed SIZE_MAX
   - Prevents silent integer overflow in reserve() call

All 70 unit tests pass; build/lint/dts all clean.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(diffusion-cpp/video): harden int validation, ownership, AVI overflow

Follow-up tightening on top of the review fixes for #1879.

SdVidGenHandlers:
- Extract a single requireInt() helper used by width / height / video_frames / fps / requirePositiveInt. The helper rejects NaN, +/-inf, fractional doubles, and values outside [INT_MIN, INT_MAX] before static_cast<int>, so casts to int are always well-defined and no JSON value silently truncates (e.g. 8.5 -> 8).
- Add <cmath>/<climits> includes that were transitively available.

SdModel::processVideo:
- Replace the bespoke FrameBuffersGuard struct with three plain unique_ptr<uint8_t, image_codec::FreeDeleter> values (initData / endData / controlData). Same lifetime semantics, less custom code, and the control-frame dimension mismatch path now takes ownership *before* the check so a throw can no longer leak the freshly-decoded buffer.

AviWriter::encodeFramesToAvi:
- Reserve calculation is now step-wise overflow-checked against SIZE_MAX (width vs height vs *3 vs *numFrames) instead of a single multiply that could wrap.
- Add a hard upper bound at UINT32_MAX (AVI 1.0 RIFF size header is a uint32_t -- anything past 4 GB cannot be addressed by the spec).
- Re-check the final size before patching the RIFF header in case JPEG output overshoots the pre-flight estimate.

Tests:
- SdVidGenHandlers: new IntCoercion suite covers fractional doubles, out-of-int-range doubles, picojson's own NaN/inf rejection at the JSON layer, and integer-valued doubles (the common case from JSON).
- AviWriter: new tests for the overflow guard and the 4 GB RIFF cap, both fire before any encoding starts.
- test_wan_video: pin width/height in the existing CorruptControlFrame test so the new dimension check passes for frame [0] and we still exercise the decode-failure path at frame [1]. Add two new cases covering end_image and control_frames dimension mismatch.

All 211 C++ tests, 70 JS unit tests, lint and tsc --dts pass.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(diffusion-cpp/video): don't eager-require binding via addonLogging

CI sanity-checks (JS unit tests on a runner with no native prebuild) was crashing with `AddonError: ADDON_NOT_FOUND` because the top-level `require('./addonLogging')` introduced in e6b13ae transitively pulled in `binding.js` -> `libqvac__diffusion-cpp.so`. The unit tests only exercise JS-side validation and never call `load()`, so they used to work without the prebuilt addon -- this regression broke that.

Match `ImgStableDiffusion` instead: drop the per-instance native logger plumbing entirely (it's dead code anyway after the e6b13ae refactor, since `_connectNativeLogger` was no longer called), and document in the constructor JSDoc that callers wire up native C++ logs once globally via `addonLogging.setLogger(...)`.

Net diff:
- Remove `const addonLogging = require('./addonLogging')` at top.
- Remove `_connectNativeLogger` / `_releaseNativeLogger` methods and their two stale call sites.
- Remove `LOG_METHODS` (only used by the removed method) and `this._binding` (used to keep a handle for the removed release path; the binding is now scoped to `_createAddon` only, matching `ImgStableDiffusion::_createAddon`).
- JSDoc on `args.logger` now mirrors `index.js` and points users at `addonLogging.setLogger`.

Verified: JS unit tests 70/70 pass with the prebuilds directory moved aside, lint clean, tsc --dts clean.
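The integer-coercion guard order described in the hardening commits above (finite, then integral, then in range) and the "positive multiple of 8" dimension rule can be mirrored in JS. This is a JS-side analogue for illustration only: the real `requireInt` is a C++ helper, the function signatures and error texts below are hypothetical, and the bounds assume a 32-bit `int`.

```javascript
// Assumption: int is 32 bits wide, as on the platforms mentioned in this PR.
const INT_MIN = -2147483648
const INT_MAX = 2147483647

// JS analogue of the described guard order: finite, integral, within int range.
function requireInt (key, value) {
  if (typeof value !== 'number' || !Number.isFinite(value)) {
    throw new TypeError(`${key} must be a finite number`)
  }
  if (!Number.isInteger(value)) {
    throw new TypeError(`${key} must be an integer, got: ${value}`)
  }
  if (value < INT_MIN || value > INT_MAX) {
    throw new RangeError(`${key} is outside the 32-bit int range`)
  }
  return value
}

// Width/height rule from review comment 2: positive multiples of 8.
function requireDimension (key, value) {
  const n = requireInt(key, value)
  if (n <= 0 || n % 8 !== 0) {
    throw new RangeError(`${key} must be a positive multiple of 8, got: ${n}`)
  }
  return n
}
```

Running the finiteness test first matters because `NaN` and the infinities fail every later comparison in ways that are easy to misread; rejecting them up front keeps the remaining checks simple.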
Co-authored-by: Cursor <cursoragent@cursor.com> * fix(diffusion-cpp/video): validate init_image dims; reject unsupported lora Two reviewer-flagged regressions on PR #1879: 1. blocker (gabrielgrigoras-serv): processVideo() validates dimensions for end_image and every control_frames[i] but not for init_image. A caller passing width/height that don't match the decoded init_image would hand mismatched (width, height) and frame pixel stride to generate_video(), producing inconsistent frame data downstream (and risking VAE segfaults). Fix: add the same dimension check in SdModel.cpp processVideo() right after the init_image decode, throwing StatusError on mismatch -- consistent with the existing end_image / control_frames checks. All three checks now compare against vid.width / vid.height as the single source of truth for the video's final dimensions. Ownership of the freshly-decoded init pixel buffer is taken into the unique_ptr *before* the dim check, mirroring the control_frames path so a mismatch can't leak the buffer. 2. gianni-cor: params.lora silently dropped on the video path -- video.js validated it as a non-empty absolute path and video.d.ts advertised `lora?: string`, but SD_VID_GEN_HANDLERS has no "lora" entry and SdModel::processVideo never touches sd_vid_gen_params_t::loras, so any LoRA passed through was swallowed by the unknown-keys branch in applySdVidGenHandlers and silently produced LoRA-less output. Fix B applied (reviewer's preferred "out of scope" option): - video.js: replaced the absolute-path validation with a loud TypeError('params.lora is not supported for video generation yet'), so existing callers fail at the JS boundary instead of getting silent LoRA-less output. - video.d.ts: dropped `lora?: string` from VideoGenerationParams. 
- video-validation.test.js: collapsed the four old lora cases (empty / non-string / relative / absolute) into one parametrised test that asserts the new TypeError fires for every shape, so a future re-introduction of the JS validation can't bring back the silent-drop regression. When LoRA-on-video is wired through native (mirror of processImage's prepareLoras() + sd_img_gen_params_t::loras), the right path is to restore the absolute-path validation here and add a "lora" handler to SD_VID_GEN_HANDLERS, NOT to revert the d.ts. C++ test changes: - new Img2VidRejectsInitImageWithWrongDimensions covers the blocker. - Flf2VidRejectsCorruptEndImage pinned width/height to 64 so the new init dim check passes for the 64x64 init and we still reach the intended end-decode-failure path (same approach as the existing Img2VidRejectsCorruptControlFrame fixture). Verified: 67/67 JS unit tests pass with and without prebuilds, 176/176 C++ tests pass (1 opt-in Wan smoke skipped, requires ~8GB weights), lint and tsc --dts clean. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(diffusion-cpp): regression + 7 review-batch fixes (NaN/Inf guards, cancel, etc.) Addresses all 8 outstanding comments on PR #1879 (one regression from commit 59f2663 plus a CHANGES_REQUESTED batch of seven items). Major points below; per-file rationale in the inline comments. == Regression fix (highest priority) * gianni-cor flagged that the new init_image strict-equality check from commit 59f2663 rejects every off-grid frame with a confusing error citing wrapper-picked dims. Root cause: addon.js _fillDimsFromImage was silently doing Math.ceil(d/8)*8, so a 100x100 init_image got dispatched as 104x104 and the native check then threw "100x100 != 104x104" -- citing a value the caller never passed. Fixes: - addon.js _fillDimsFromImage now passes dims through verbatim (no rounding). 
The image SDEdit path already realigns internally (SdModel.cpp ~600) and the FLUX2 ref path uses auto_resize_ref_image, so dropping the rounding is safe across every path. - video.js _runInternal pre-empts the cryptic native error with a JS-layer off-grid probe: when width/height aren't explicit it reads init_image / end_image / control_frames[i] dimensions and throws a clear "your image is off-grid, pre-align or pass explicit dims" message naming the exact buffer. - Removes the ceil-vs-round inconsistency wart between _fillDimsFromImage (ceil) and the user-facing validator (round). - Three new JS regression tests for off-grid init / end / control, plus one positive test for explicit aligned dims overriding the probe. == JS hardening * params.prompt is documented Required but was never validated -- undefined / "" / 42 each produced a different failure mode (silent noise, silent noise, far-away C++ error). video.js now throws a loud TypeError at the wrapper boundary. Four new prompt-validation tests. * mapAddonEvent JobEnded fallback accepted every typed-array view -- works today only because uint8_t is the sole registered TypedArrayOutputHandler. When frameCallback (SdModel.hpp:139) gets wired through to JS, every per-frame event would have been misclassified as JobEnded and the response stream would have closed after the first frame. One-token fix: add `&& !ArrayBuffer.isView(rawData)` to the discriminator. ArrayBuffer.isView is true for every TypedArray + DataView, false for plain objects -- exactly the discrimination needed for the runtime-stats POJO. == C++ parser hardening (NaN / Inf / int64 / range) * Promoted requireInt from SdVidGenHandlers.cpp's anonymous namespace into parsers::, and added two siblings: - requireFiniteFloat: rejects NaN / +inf / -inf before the float cast (NaN compares false against every bound, so range checks of the form `f < lo || f > hi` previously let it sneak through). 
- requireInt64: same finite + integer guards as requireInt, range check against representable [INT64_MIN, INT64_MAX] doubles. - requireFiniteFloatInRange: convenience wrapper for [lo, hi] checks. * Routed every previously-vulnerable cast through the new helpers: - SdVidGenHandlers.cpp: seed (int64), cfg_scale, flow_shift, high_noise_cfg_scale, high_noise_flow_shift, vae_tile_overlap, cache_threshold, moe_boundary, strength, vace_strength - SdGenHandlers.cpp (image path, reviewer asked for symmetric fix): eta, cfg_scale, guidance, img_cfg_scale, seed, batch_count, strength, clip_skip, vae_tile_overlap, cache_threshold, width, height, steps, parseUpscaleRepeats * parseVaeTileSize (SdParsers.cpp): numeric form now routes through requireInt (rejects NaN/Inf/fractional/out-of-range), and BOTH forms (numeric and "WxH" string) now reject <= 0. Five new tests. == Cancellation gap + typed status * SdModel.cpp processVideo cancelRequested_ was checked exactly once after generate_video() returns -- the slow tail (per-frame PNG fan-out + AVI mux, multi-second on 81-frame 832x480 videos) had no cancellation visibility. Added 2 checks: top of frame-callback loop body, and immediately before encodeFramesToAvi. * Switched both Job cancelled throws (image path at SdModel.cpp:730, video path at :987, plus the 2 new C1 sites) from bare std::runtime_error to StatusError tagged with localCodeMsg="Cancelled", so the JS layer can discriminate cancel from real internal failures via codeString() ("[ General :: Cancelled ]") instead of string-matching the exception message. Note: this PR deliberately does NOT add `Cancelled = 6` to the shared inference-addon-cpp Errors.hpp enum, because that header ships via vcpkg to every package in the monorepo and a cross-package coordinated change is out of scope. Instead we use the 3-arg StatusError ctor (addonId, localCodeMsg, errorMsg) which produces the same codeString without touching the shared enum. 
  When the enum is updated later, the 4 call sites can switch to the 2-arg ctor in a one-line follow-up.

== C5 (preview_*) -- product decision deferred

* The header comment at SdCtxHandlers.hpp:112 claimed preview_mode et al are "Wired to sd_set_preview_callback() in SdModel::process()", but a grep across packages/diffusion-cpp for sd_set_preview_callback returns zero matches -- the four config keys are validated and stored but the upstream callback is never installed, so they're a silent no-op end-to-end. Downgraded the misleading comment to an explicit TODO(QVAC-18026 follow-up) documenting the gap and the two viable resolution paths (wire it up alongside sd_set_abort_callback, OR remove the handlers + fields + tests). Reviewer asked which path is intended; this commit picks neither and just stops claiming the wiring exists. The choice can land in a separate PR without holding this one up.

== Test surface

* +8 JS tests (prompt validation x4, off-grid probe x4)
* +5 C++ tests (vae_tile_size zero/negative/fractional/out-of-range rejection, plus the existing IntCoercion suite carried over to the promoted helpers transparently)
* Cancel-context test updated to assert the typed "[ General :: Cancelled ]" codeString in addition to the message.

Verified locally:
  JS unit tests: 75/75 pass with prebuild, 75/75 also without (CI sanity-checks mode, no native binary loaded)
  C++ unit tests: 209/210 pass, 1 opt-in skip (SdWanHappyPathTest needs ~8GB Wan weights)
  npm run lint: clean
  npm run test:dts: clean

Co-authored-by: Cursor <cursoragent@cursor.com>

* chore(diffusion-cpp): release 0.8.0

Bumps @qvac/diffusion-cpp to 0.8.0 and documents the Wan 2.1 / Wan 2.2 video pipeline shipped since 0.7.0: new VideoStableDiffusion class (txt2vid / img2vid / flf2vid), MoE high-noise expert routing, streaming MJPG AVI muxer, refactored download helpers + Wan model script, plus the supporting JS + C++ test coverage and validation hardening.
Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(diffusion-cpp): re-align auto-detected img dims to multiple of 8

_fillDimsFromImage was passing raw image dimensions through verbatim since fe4d10f, but the native SdGenHandlers validates width/height % 8 == 0 before the downstream alignment in SdModel::processImage ever runs. Any img2img call with a non-aligned source image (e.g. the bundled 500x627 von-neumann.jpg used by the FLUX2 i2i integration test) therefore failed with:

  height must be a positive multiple of 8, got: 627

Restore the Math.ceil(d/8)*8 round-up that was removed in fe4d10f. The original motivation for the removal -- avoiding a spurious dim mismatch on the video path where processVideo strict-compares decoded frame dims against vid.width/vid.height -- is already handled at the JS layer by VideoStableDiffusion's off-grid pre-validation in video.js, which runs before this helper and rejects unaligned init/end/control frames with a clear caller-facing error. The ceil() is therefore a no-op on the video path.

Co-authored-by: Cursor <cursoragent@cursor.com>

* style(diffusion-cpp): apply clang-format to drifted C++ sources

cpp-lint surfaced clang-format drift in 4 files that accumulated across recent Wan-video commits. No semantic changes -- only mechanical line-wrap / arg-break placement to match the project's .clang-format.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(diffusion-cpp/test): use package export for video module in wan integration test

The generate-video-wan.test.js test was using a relative import (require('../../video')) that breaks when test files are bundled and relocated to the test-framework backend directory during mobile test setup. Change to the package export pattern (@qvac/diffusion-cpp/video) used by other integration tests, which remains valid regardless of file location.
Fixes: https://github.com/tetherto/qvac/actions/runs/25929776543/job/76221440417

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(diffusion-cpp): expose video API from package root

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(diffusion-cpp): repair variable names in SdModel after merge

Co-authored-by: Cursor <cursoragent@cursor.com>

* style(diffusion-cpp): apply git-clang-format

Co-authored-by: Cursor <cursoragent@cursor.com>

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: gianni-cor <gianfranco.cordella@tether.io>
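
The `ArrayBuffer.isView` discriminator described in the JS hardening notes can be illustrated with a minimal sketch. The function name `isRuntimeStats` and the sample payloads are hypothetical; the actual mapAddonEvent code is not shown in the commit message, and only `ArrayBuffer.isView` itself is a standard API.

```javascript
// Hypothetical stand-in for the JobEnded fallback check. It accepts a
// plain stats object and rejects any typed-array or DataView payload.
function isRuntimeStats(rawData) {
  return (
    typeof rawData === 'object' &&
    rawData !== null &&
    !ArrayBuffer.isView(rawData) // true for every TypedArray and DataView
  );
}

const stats = { steps: 20 };                    // plain object (POJO)
const frame = new Uint8Array(4);                // e.g. per-frame pixel bytes
const floats = new Float32Array(4);             // any other typed array
const view = new DataView(new ArrayBuffer(8));  // DataView is also a view

console.log(isRuntimeStats(stats));  // true
console.log(isRuntimeStats(frame));  // false
console.log(isRuntimeStats(floats)); // false
console.log(isRuntimeStats(view));   // false
```

Without the `isView` term, all four payloads would pass the object check, which is the misclassification the commit message warns about for future per-frame events.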
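
The NaN pitfall noted in the C++ parser hardening section follows from IEEE 754 comparison rules, which JavaScript numbers share, so it can be reproduced without the C++ helpers. This is an illustration only: `naiveInRange` and `finiteInRange` are hypothetical names, not the `parsers::` functions, and the real helpers may differ in how they report errors.

```javascript
// Naive check: rejects only when a comparison is true. Every ordered
// comparison involving NaN is false, so NaN is never rejected.
function naiveInRange(f, lo, hi) {
  if (f < lo || f > hi) return false;
  return true;
}

// Guarded check: reject NaN, +Infinity and -Infinity before the range test.
function finiteInRange(f, lo, hi) {
  if (!Number.isFinite(f)) return false;
  return f >= lo && f <= hi;
}

console.log(naiveInRange(NaN, 0, 1));       // true  (NaN sneaks through)
console.log(finiteInRange(NaN, 0, 1));      // false
console.log(finiteInRange(Infinity, 0, 1)); // false
console.log(finiteInRange(0.5, 0, 1));      // true
```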
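
The `Math.ceil(d/8)*8` round-up restored in the re-align commit can be checked against the 500x627 example from that message. `alignUp8` is a hypothetical helper name; the body of `_fillDimsFromImage` is not shown here, so this sketch covers only the arithmetic.

```javascript
// Round a dimension up to the next multiple of 8.
function alignUp8(d) {
  return Math.ceil(d / 8) * 8;
}

console.log(alignUp8(500)); // 504
console.log(alignUp8(627)); // 632
console.log(alignUp8(832)); // 832 (already aligned, unchanged)
console.log(alignUp8(480)); // 480 (already aligned, unchanged)
```

Already-aligned dimensions pass through unchanged, which is consistent with the message's claim that the round-up is a no-op once off-grid frames have been rejected earlier on the video path.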

jpgaribotti approved these changes May 7, 2026

jpgaribotti merged commit d1b2497 into main May 7, 2026
2 checks passed

gianni-cor deleted the chore/ggml-2026-01-30-pv7 branch May 9, 2026 06:58

jpgaribotti merged 1 commit into main from chore/ggml-2026-01-30-pv7
