Updated ggml to version 2026-01-30#7 #134

Merged: jpgaribotti merged 1 commit into main from chore/ggml-2026-01-30-pv7 on May 7, 2026

aegioscy (Contributor) commented on May 7, 2026

Summary

Bumps the ggml port to the merge of tetherto/qvac-ext-ggml#6 (05afdc5981031b8dcfd5f9cc979442b707b8486c).

The current pin (e16bdae2, port-version 6) carries the qvac hybrid-backend packaging work but predates the Wan-required Metal kernels — so today, any consumer hitting the Wan video path on Metal has to ship a local overlay or aborts at runtime with:

ggml_metal_op_encode_impl: error: unsupported op 'IM2COL_3D'

Five commits land on top of e16bdae2 with this bump:

| SHA | Description |
| --- | --- |
| bc053644 | metal: IM2COL_3D op + PAD left-padding for Wan video (PR #5) |
| 6d2d24bb | metal: tighten IM2COL_3D supports_op to require src[1]->type == F32 |
| b1923e29 | metal: extend IM2COL_3D supports_op for nb[0]==sizeof(float) and F16-dst => F16-kernel match |
| 05afdc59 | Merge of tetherto/qvac-ext-ggml#6 into 2026-01-30 |

The supports_op tightening commits resolve advertise-then-abort gaps where Metal returned SUPPORTED for IM2COL_3D graphs that the CPU reference would then GGML_ASSERT on.

Files changed

| Path | Change |
| --- | --- |
| ports/ggml/portfile.cmake | REF e16bdae2… → REF 05afdc59…; new SHA512; header comment updated |
| ports/ggml/vcpkg.json | port-version: 6 → 7; description annotated |
| versions/g-/ggml.json | prepend { git-tree: f1632875…, version-date: 2026-01-30, port-version: 7 } |
| versions/baseline.json | ggml.port-version: 6 → 7 |

The git-tree SHA was computed via git rev-parse HEAD:ports/ggml after staging the port edits.

Verification

  • Tarball downloaded and SHA512 recomputed from tetherto/qvac-ext-ggml@05afdc59 via curl … | shasum -a 512.
  • IM2COL_3D predicate at the new pin verified via the GitHub contents API to match the final form from tetherto/qvac-ext-ggml#6.
  • This same source tarball is the one currently consumed by diffusion-cpp's local overlay (port-version 104, identical REF+SHA512); that overlay builds clean with zero patches on darwin-arm64 and runs Wan2.1 1.3B txt2video end-to-end on Metal — which is what this bump enables for all registry consumers.

Follow-up

Once this is merged, qvac/packages/diffusion-cpp can drop its local vcpkg/ports/ggml/ overlay entirely (it currently exists only because the registry was 5 commits behind PR #6's merge).

Made with Cursor.


Bumps ggml port to the merge of tetherto/qvac-ext-ggml#6 (05afdc59),
which lands on top of the previous pin (e16bdae2):

  - bc053644  metal: IM2COL_3D op + PAD left-padding for Wan video    (#5)
  - 6d2d24bb  metal: tighten IM2COL_3D supports_op (src[1]==F32)
  - b1923e29  metal: extend IM2COL_3D supports_op for nb[0]==sizeof(float)
              and F16-dst => F16-kernel match
  - 05afdc59  Merge pull request #6 from aegioscy

Without these the Metal backend aborts mid-Wan inference with
`unsupported op 'IM2COL_3D'` and test-backend-ops support advertises
invalid IM2COL_3D combos that hit CPU GGML_ASSERTs.

Verified end-to-end on darwin-arm64 via the same source tarball
already used by diffusion-cpp's local overlay (now redundant after
this bump): ggml@2026-01-30#7 builds with no patches, addon links
against it, and Wan2.1 1.3B txt2video runs end-to-end on Metal.

Co-authored-by: Cursor <cursoragent@cursor.com>
jpgaribotti merged commit d1b2497 into main on May 7, 2026
2 checks passed
aegioscy added a commit to tetherto/qvac that referenced this pull request May 7, 2026
…s 2026-01-30#7

The previous commit (04a6496) repointed the local ggml overlay at the
merge of tetherto/qvac-ext-ggml#6 (05afdc59) so Wan video generation on
Metal would stop aborting with `unsupported op 'IM2COL_3D'`. That same
ref has now been promoted into the registry: tetherto/qvac-registry-vcpkg#134
landed on main as d1b2497b, bumping ggml port-version 6 -> 7 against the
identical REF + SHA512 the overlay was carrying.

This means the diffusion-cpp-local overlay is now strictly redundant --
and slightly behind, since the registry's port-version 7 also picks up
two improvements the overlay didn't have:
  - iOS gets `-DGGML_BLAS=OFF -DGGML_ACCELERATE=OFF` to keep the build
    off the Apple Accelerate / BLAS path that breaks the iOS toolchain.
  - The Android backend-glob now also matches `libqvac-ggml-*.so` in
    addition to `libggml-*.so`, so the qvac-prefixed DL backends get
    installed alongside the upstream-named ones.

So we delete the entire `vcpkg/ports/ggml/` overlay (portfile.cmake,
vcpkg.json, usage, android-vulkan-version.cmake) and:

  - Bump `vcpkg-configuration.json`'s default-registry baseline from
    a9eae49a -> d1b2497b (the merge commit of registry PR #134), which
    is the first registry SHA that serves ggml@2026-01-30#7.
  - Tighten `vcpkg.json`'s ggml constraint from `version>=: 2026-01-30#5`
    to `version>=: 2026-01-30#7` so any later baseline bump can't
    silently drop us back below the Wan-Metal pin.

The `overlay-ports: ["vcpkg/ports"]` entry and the `vcpkg/ports/.gitkeep`
marker are kept in place so future overlays can be added without a
config flap.

Verified end-to-end on darwin-arm64: clean `npm run build`
(bare-make generate + build + install) with the build/ tree wiped.
vcpkg resolves
  ggml[core,metal]:arm64-osx@2026-01-30#7
  -- git+https://github.com/tetherto/qvac-registry-vcpkg.git@f1632875...
straight from the registry (no overlay), all 8 ports install in 47s,
the addon links cleanly against the registry-supplied libggml*.a, and
prebuilds/darwin-arm64/qvac__diffusion-cpp.bare is rewritten.

Net diff: +2 / -283.

Co-authored-by: Cursor <cursoragent@cursor.com>
gianni-cor deleted the chore/ggml-2026-01-30-pv7 branch on May 9, 2026 06:58
gianni-cor added a commit to tetherto/qvac that referenced this pull request May 18, 2026
* feat(diffusion): refactor download scripts and add Wan 2.1 support

- Extract shared dl() function into reusable dl-functions.sh module
- Update all download-model-*.sh scripts to source shared utilities
- Add download-model-wan.sh for Wan 2.1 video generation models
- Reduces code duplication and improves maintainability

Wan 2.1 downloads (~8.3 GB):
- wan2.1_t2v_1.3B_fp16.safetensors (diffusion model)
- wan_2.1_vae.safetensors (VAE encoder/decoder)
- umt5_xxl_fp16.safetensors (text encoder)

Co-authored-by: Cursor <cursoragent@cursor.com>

* feat(diffusion): Wan video foundation -- ctx/vid handlers, AVI muxer, shared parsers

Phase 1-4 of Wan 2.1 / 2.2 video generation support in the diffusion-cpp
addon. Configuration + parsing layer only; dispatch + callback plumbing +
JS surface land in follow-up commits on this branch.

SdCtxConfig:
  - Add highNoiseDiffusionModelPath for Wan 2.2 MoE high-noise expert
    (leave empty for Wan 2.1 and all non-Wan models)
  - Add previewMode / previewInterval / previewDenoised / previewNoisy
    for optional mid-denoising preview frames via sd_set_preview_callback
  - Wire both through SdCtxHandlers (new JS keys: preview_mode,
    preview_interval, preview_denoised, preview_noisy) and AddonJs
    (highNoiseDiffusionModelPath in args map)

AviWriter (new utility):
  - addon/src/utils/AviWriter.{hpp,cpp} ports the upstream avi_writer.h
    MJPG encoder onto an in-memory std::vector<uint8_t> sink (no stdio,
    no temp files) so video bytes flow through the existing
    OutputCallBackJs queue
  - Full input validation (numFrames, fps, jpegQuality, channel count,
    frame homogeneity, null data) -- StatusError on any rejection

SdParsers (new shared module):
  - Extract parseSampler / parseScheduler / parseCacheMode /
    parseVaeTileSize / parseCachePreset / requireNum/Str/Bool from
    SdGenHandlers into addon/src/handlers/SdParsers.{hpp,cpp}
  - Reused by both SdGenHandlers (image) and SdVidGenHandlers (video)

SdVidGenHandlers (new):
  - SdVidGenConfig struct with full Wan 2.1 + 2.2 surface: mode
    (txt2vid/img2vid/flf2vid), prompts, dimensions, videoFrames (4k+1
    validated), fps, seed, low-noise expert sample params, high-noise
    expert sample params, moeBoundary, strength, vaceStrength, VAE
    tiling, cache mode/preset/threshold
  - 22 JSON handlers with validation for each field
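
The 4k+1 rule for videoFrames can be illustrated with a short check. This is a minimal sketch, not the addon's handler: `isValidWanFrameCount` is a hypothetical name, and the real handler reports rejections through StatusError rather than a bool.

```cpp
#include <cassert>

// Hypothetical helper (not from the addon): true when n has the form
// 4k+1 for some k >= 0, i.e. 1, 5, 9, ..., 33, ..., 81.
constexpr bool isValidWanFrameCount(int n) {
  return n >= 1 && (n - 1) % 4 == 0;
}
```

Under this rule 33 and 81 are accepted, while 32 and 80 are rejected.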

Tests (all pass):
  - 5 new SdCtxHandlers tests for preview_* + high_noise path default
  - 18 new AviWriter tests covering happy path, RIFF header structure,
    all validation rejections, JPEG round-trip
  - 54 new SdVidGenHandlers tests covering every field + integration
    payload + defaults
  - Zero regressions across existing 144 fast-unit tests

No user-facing JS API changes yet.

Co-authored-by: Cursor <cursoragent@cursor.com>

* feat(diffusion): Wan video generation -- dispatch, processVideo, JS wrapper + examples

Builds on the Wan foundation commit by wiring the video path end-to-end
from JS to C++ and back. Adds txt2vid / img2vid / flf2vid generation
via a new VideoStableDiffusion class that shares the single native
binding with the existing ImgStableDiffusion class.

Native:
- SdModel::process() dispatches on the JSON "mode" field to
  processImage() (existing) or the new processVideo() path.
- processVideo() applies SdVidGenHandlers, validates mode-vs-inputs
  invariants (img2vid requires init_image; flf2vid requires both;
  txt2vid rejects both; end_image only valid on flf2vid), decodes
  init/end/control frames, fills sd_vid_gen_params_t, and encodes
  the returned sd_image_t* sequence to an in-memory MJPG AVI.
- SdVideoFrames RAII wrapper extracted to addon/src/utils/ so it
  can be unit-tested without a loaded model.
- GenerationJob grows endImageBytes and controlFramesBytes plus an
  optional per-frame frameCallback (unused from JS in this PR;
  reserved for the preview follow-up).
- AddonJs::runJob reads endImageBuffer (single Uint8Array) and
  controlFramesBuffers (Array of Uint8Array) as typed-array args,
  no JSON encoding.
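
The mode-vs-inputs invariants listed for processVideo() can be sketched as a small pure function. The enum, the function name, and the error strings below are illustrative assumptions; the actual code throws StatusError and works on decoded buffers rather than booleans.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical stand-in for the addon's mode field.
enum class VidMode { Txt2Vid, Img2Vid, Flf2Vid };

// Returns an error message when the inputs violate the mode's invariants,
// or std::nullopt when the combination is acceptable.
std::optional<std::string> checkModeInputs(VidMode mode, bool hasInit, bool hasEnd) {
  switch (mode) {
    case VidMode::Txt2Vid:
      if (hasInit || hasEnd) return "txt2vid takes neither init_image nor end_image";
      break;
    case VidMode::Img2Vid:
      if (!hasInit) return "img2vid requires init_image";
      if (hasEnd) return "end_image is only valid for flf2vid";
      break;
    case VidMode::Flf2Vid:
      if (!hasInit || !hasEnd) return "flf2vid requires both init_image and end_image";
      break;
  }
  return std::nullopt;
}
```

Keeping the check free of I/O is one way to exercise every guard without a loaded model, which is the same motivation given for the validation tests below.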

JS surface:
- video.js / video.d.ts: new VideoStableDiffusion class with
  full per-mode validation, 4k+1 frame-count rule, fps range,
  moe_boundary range, Uint8Array type checks, and warning when
  high_noise_* params are set without files.highNoiseDiffusionModel.
- addon.js: SdInterface.runJob threads end_image and control_frames
  through to the native runJob without round-tripping through JSON.
- index.js / index.d.ts: unchanged -- image wrapper continues to
  work exactly as before. Both classes compose the same SdInterface
  and hit the same binding.cpp entry points.
- package.json: exports "./video", ships video.js / video.d.ts,
  adds generate:video / generate:img2vid / generate:flf2vid scripts.

Examples:
- examples/generate-video-wan.js (txt2vid @ 832x480, 33 frames)
- examples/img2vid-wan.js (reuses assets/von-neumann.jpg as first frame)
- examples/flf2vid-wan.js (expects flf-first.png / flf-last.png)

Tests:
- test_sd_video_frames.cpp: 12 RAII tests (empty states, destruction
  of 4k+1 production sizes, null-pixel tolerance, bounds-checked
  operator[], compile-time copy/move deletion).
- test_wan_video.cpp: 12 validation tests reusing the SD2.1 context
  to satisfy isLoaded() and exercise every processVideo() guard
  before generate_video() runs; plus an opt-in happy-path smoke
  test (SD_RUN_WAN_SMOKE=1) gated off by default because ggml-metal
  lacks IM2COL_3D for Wan's 3D convs.

Gates: npm run lint, npm run test:dts, npm run build, and the
fast subset of addon-test (178/178) all pass.

Co-authored-by: Cursor <cursoragent@cursor.com>

* feat(diffusion): Wan video tests, ggml overlay, example tuning

Add a vcpkg overlay-port for ggml at vcpkg/ports/ggml/ that pins
tetherto/qvac-ext-ggml @ feature/metal-pr-16669-clean (commit
bc053644). The fork adds Metal kernels for IM2COL_3D and 3-axis
PAD-left, both required by Wan 2.1 / 2.2 video generation; without
them ggml hard-aborts mid-run with "unsupported op 'IM2COL_3D'".
Rationale lives in portfile.cmake -- the overlay is transient and
will be removed once the registry baseline rolls forward.

Add JS test coverage for VideoStableDiffusion:
  - test/unit/video-validation.test.js: 63 input-validation cases
    mirroring the existing input-validation.test.js pattern.
  - test/integration/generate-video-wan.test.js: opt-in
    (WAN_INTEGRATION=1) end-to-end T2V smoke test plus sniffAvi
    self-tests.

Tune the Wan examples:
  - generate-video-wan.js: env-var-driven (PROMPT, FRAMES, STEPS,
    SEED, CFG_SCALE, FLOW_SHIFT, ...), inline frame-count cheat
    sheet, (4*k+1) pre-flight check, default FRAMES bumped to 81
    (Wan 1.3B's native training length).
  - img2vid-wan.js, flf2vid-wan.js: flow_shift 5.0 -> 3.0 to match
    the upstream test-wan reference scripts.

Refresh the C++ smoke-test gating doc in test_wan_video.cpp to
reflect that Metal works once the overlay is in place.

Drop build.md: the vcpkg overlay rationale already lives next to
the overlay (portfile.cmake header), and transient infrastructure
doesn't earn its own long-form doc.

Co-authored-by: Cursor <cursoragent@cursor.com>

* docs(diffusion-cpp): restore build.md

The earlier deletion conflated build.md with the vcpkg overlay rationale,
but build.md is the package's standalone build guide (prerequisites,
build pipeline, cross-compilation, troubleshooting) and is still the
target of README.md's "Building from Source" link. Restore it from main,
which also picks up the LLVM 19 -> 22 bump.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(diffusion-cpp): address PR review feedback for Wan video gen

* Flip default video dimensions to 480x832 portrait (phone-screen
  friendly). Wan 2.1 T2V 1.3B handles both orientations equally well;
  the previous 832x480 landscape default disagreed with the example.
* Document the flow_shift=0 fall-through sentinel in JSDoc, .d.ts, and
  C++ struct/handler comments; correct stale "5-8" recommendation to
  the actually-used 3.0 (matches example + ref scripts).
* Make video_frames error messages consistent JS<->C++ and list the
  full valid set up to 81 (Wan 1.3B native training cap).
* Fix frame-duration arithmetic (33 frames is ~2s @ default 16 fps,
  not ~1.3s @ 24 fps).
* Warn when upscaler_* keys are passed to VideoStableDiffusion --
  ESRGAN upscale is image-only and was being silently ignored.
* Annotate addon.js end_image / control_frames forwarding to call
  out the typed-array transport (avoids JSON byte-array bloat).
* Document the two-level concurrency model around _hasActiveResponse
  (the busy guard isn't dead under exclusiveRunQueue -- it covers
  overlap between the released queue lock and an in-flight response).
* Update C++ defaults test + JS suggestion-fallback test for the new
  portrait orientation.
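
For reference, the frame-duration correction above is plain division of frame count by frame rate. A minimal sketch, with `clipSeconds` as a hypothetical helper name:

```cpp
#include <cassert>

// Clip length in seconds for a given frame count and frame rate.
constexpr double clipSeconds(int frames, int fps) {
  return static_cast<double>(frames) / static_cast<double>(fps);
}
```

With these inputs, 33 frames at 16 fps is 2.0625 s, while the stale figure corresponds to 33 frames at 24 fps, which is 1.375 s.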

Co-authored-by: Cursor <cursoragent@cursor.com>

* chore(diffusion-cpp): retarget ggml overlay to merged tetherto/qvac-ext-ggml@2026-01-30

The Wan-Metal work that was carried as a local overlay has all landed
upstream on tetherto/qvac-ext-ggml's 2026-01-30 branch:

  - bc053644  metal: IM2COL_3D op + PAD left-padding for Wan video       (#5)
  - 512e1773  cmake: support qvac hybrid backend packaging
              (static CPU + dynamic GPU backends, GGML_MAX_NAME prop,
               graceful no-OpenCL-device fallback, public ggml-opencl.h
               install -- previously six local overlay patches)
  - 6d2d24bb / b1923e29 / 05afdc59  metal: tighten IM2COL_3D supports_op
              to match the CPU-reference invariants                     (#6)

Repin vcpkg/ports/ggml from PR #5's head (bc053644) to PR #6's merge
commit (05afdc59) on 2026-01-30, drop all seven local overlay patches
since their content is now upstream verbatim, and bump port-version
102 -> 104 to force a clean rebuild of ggml.

Net diff: +22 / -201; the overlay now exists only as a baseline pin
that overrides the registry's ggml-org/ggml@a8db410a (which still lacks
the Wan-required Metal ops). Once the registry baseline catches up to
a ref containing this work, vcpkg/ports/ggml/ can be deleted entirely.

Verified with npm run build on darwin-arm64: ggml@2026-01-30#104 builds
fresh from 05afdc59 with zero patches applied, addon links and tests
compile, prebuild installed.

Co-authored-by: Cursor <cursoragent@cursor.com>

* chore(diffusion-cpp): drop local ggml overlay now that registry serves 2026-01-30#7

The previous commit (04a6496) repointed the local ggml overlay at the
merge of tetherto/qvac-ext-ggml#6 (05afdc59) so Wan video generation on
Metal would stop aborting with `unsupported op 'IM2COL_3D'`. That same
ref has now been promoted into the registry: tetherto/qvac-registry-vcpkg#134
landed on main as d1b2497b, bumping ggml port-version 6 -> 7 against the
identical REF + SHA512 the overlay was carrying.

This means the diffusion-cpp-local overlay is now strictly redundant --
and slightly behind, since the registry's port-version 7 also picks up
two improvements the overlay didn't have:
  - iOS gets `-DGGML_BLAS=OFF -DGGML_ACCELERATE=OFF` to keep the build
    off the Apple Accelerate / BLAS path that breaks the iOS toolchain.
  - The Android backend-glob now also matches `libqvac-ggml-*.so` in
    addition to `libggml-*.so`, so the qvac-prefixed DL backends get
    installed alongside the upstream-named ones.

So we delete the entire `vcpkg/ports/ggml/` overlay (portfile.cmake,
vcpkg.json, usage, android-vulkan-version.cmake) and:

  - Bump `vcpkg-configuration.json`'s default-registry baseline from
    a9eae49a -> d1b2497b (the merge commit of registry PR #134), which
    is the first registry SHA that serves ggml@2026-01-30#7.
  - Tighten `vcpkg.json`'s ggml constraint from `version>=: 2026-01-30#5`
    to `version>=: 2026-01-30#7` so any later baseline bump can't
    silently drop us back below the Wan-Metal pin.

The `overlay-ports: ["vcpkg/ports"]` entry and the `vcpkg/ports/.gitkeep`
marker are kept in place so future overlays can be added without a
config flap.

Verified end-to-end on darwin-arm64: clean `npm run build`
(bare-make generate + build + install) with the build/ tree wiped.
vcpkg resolves
  ggml[core,metal]:arm64-osx@2026-01-30#7
  -- git+https://github.com/tetherto/qvac-registry-vcpkg.git@f1632875...
straight from the registry (no overlay), all 8 ports install in 47s,
the addon links cleanly against the registry-supplied libggml*.a, and
prebuilds/darwin-arm64/qvac__diffusion-cpp.bare is rewritten.

Net diff: +2 / -283.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(diffusion-cpp): satisfy standard quotes rule in validateVideoFrames

The middle line of the validateVideoFrames Error message was a template
literal with no `${...}` interpolation, so `standard` (configured via
`npm run lint`) flags it as `quotes`:

  video.js:39:7: Strings must use singlequote.

Adjacent lines 37, 38 use single quotes, and line 40 legitimately uses
backticks for `${n}`. Just the one stray backtick-string -- swap to
single quotes; no behaviour change.

Sanity-checks job 74830306544 on PR #1879 fails on this single line;
`npm run lint` passes locally after the swap.

Co-authored-by: Cursor <cursoragent@cursor.com>

* diffusion-cpp: enable diffusion FA in examples and fix addon paths

- Set diffusion_fa: true across SD, FLUX, and integration test ImgStableDiffusion
  configs so diffusion flash attention matches WAN video examples.
- Pass highNoiseDiffusionModelPath (empty when unset) from index.js so native
  createInstance validation succeeds for image mode; document optional
  files.highNoiseDiffusionModel in index.d.ts and validate absolute paths.

Co-authored-by: Cursor <cursoragent@cursor.com>

* diffusion-cpp(video): pass esrganPath to native createInstance

VideoStableDiffusion omitted esrganPath while the binding validates it as a
string; mirror image-mode by forwarding files.esrgan or empty string.

Co-authored-by: Cursor <cursoragent@cursor.com>

* diffusion-cpp: align C++ includes and image codec with inference-addon-cpp

- Switch remaining qvac-lib-inference-addon-cpp includes to inference-addon-cpp
  (vcpkg installs headers under the shorter prefix).
- Use image_codec::decodeImage / encodeToPng in processVideo after ImageCodec
  API rename from decodePng.

Co-authored-by: Cursor <cursoragent@cursor.com>

* diffusion-cpp: apply clang-format to changed C++ sources

Run git-clang-format against ce2ea93 to satisfy the repo formatter on the
video addon, image codec, and Wan tests. No behavior changes.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(diffusion-cpp/video): address review comments 1-3

1. Use global addonLogging instead of per-instance setLogger/releaseLogger
   - Eliminates process-global logger collision (was reintroduced in video.js)
   - Mirrors fix from ImgStableDiffusion / EsrganUpscaler
   - video.js no longer manages per-instance logger state

2. Reject width/height values <= 0 in JS validation
   - Now validates that width > 0 and height > 0 before alignment check
   - Error message updated to say "positive multiples of 8"
   - Updated test expectations to match new message

3. Validate double values are integers before casting in C++
   - All int casts now check std::floor(d) == d first
   - Affects: width, height, video_frames, fps handlers
   - Prevents silent truncation (e.g. 8.5 -> 8)

All 70 unit tests pass; build/lint/dts all clean.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(diffusion-cpp/video): address review comments 4-7

4. Validate end_image / control_frames dimensions match video dimensions
   - Added dimension checks in processVideo() before generate_video()
   - Rejects mismatched frame sizes with clear error messages
   - Prevents silent corruption or undefined behavior in native layer

5. Use ImageCodec ownership helper instead of raw free()
   - Replaced FrameBuffersGuard with unique_ptr<uint8_t, FreeDeleter>
   - Consistent with existing image_codec ownership pattern
   - Automatic cleanup on exception; no manual free() calls

6. Regenerate mobile integration test manifest
   - Ran npm run test:mobile:generate
   - Updated test/mobile/integration.auto.cjs with new runners

7. Add checked buffer size calculation in AviWriter
   - Validates width * height overflow before multiplication
   - Validates numFrames * bytesPerFrame overflow
   - Rejects allocations that would exceed SIZE_MAX
   - Prevents silent integer overflow in reserve() call

All 70 unit tests pass; build/lint/dts all clean.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(diffusion-cpp/video): harden int validation, ownership, AVI overflow

Follow-up tightening on top of the review fixes for #1879.

SdVidGenHandlers:
- Extract a single requireInt() helper used by width / height / video_frames
  / fps / requirePositiveInt. The helper rejects NaN, +/-inf, fractional
  doubles, and values outside [INT_MIN, INT_MAX] before static_cast<int>,
  so casts to int are always well-defined and no JSON value silently
  truncates (e.g. 8.5 -> 8).
- Add explicit <cmath>/<climits> includes that were previously only
  transitively available.
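
A checked double-to-int conversion along the lines described above might look like the following. This is a sketch under assumptions: `requireIntSketch` and `isRejected` are hypothetical names, and std::invalid_argument stands in for the addon's StatusError.

```cpp
#include <cassert>
#include <climits>
#include <cmath>
#include <stdexcept>
#include <string>

// Sketch of a checked double -> int conversion (not the addon's code).
// Rejects NaN, +/-inf, fractional values, and values outside int range
// before the cast, so static_cast<int> is always well-defined.
int requireIntSketch(double d, const std::string& key) {
  if (!std::isfinite(d)) throw std::invalid_argument(key + " must be finite");
  if (std::floor(d) != d) throw std::invalid_argument(key + " must be an integer");
  if (d < static_cast<double>(INT_MIN) || d > static_cast<double>(INT_MAX)) {
    throw std::invalid_argument(key + " is out of int range");
  }
  return static_cast<int>(d);
}

// Convenience wrapper: true when the value would be rejected.
bool isRejected(double d) {
  try {
    requireIntSketch(d, "value");
    return false;
  } catch (const std::invalid_argument&) {
    return true;
  }
}
```

The finiteness check comes first because NaN would otherwise pass both the floor comparison's intent and the range comparison only by accident of ordering.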

SdModel::processVideo:
- Replace the bespoke FrameBuffersGuard struct with three plain
  unique_ptr<uint8_t, image_codec::FreeDeleter> values (initData / endData
  / controlData). Same lifetime semantics, less custom code, and the
  control-frame dimension mismatch path now takes ownership *before* the
  check so a throw can no longer leak the freshly-decoded buffer.
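
The "take ownership before the check" ordering can be sketched as follows. Everything here is a stand-in: `FreeDeleterSketch` mimics what image_codec::FreeDeleter is described as doing, and `Decoded`, `fakeDecode`, and `decodeChecked` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <memory>
#include <stdexcept>

// Stand-in deleter: releases malloc-style buffers with std::free.
struct FreeDeleterSketch {
  void operator()(std::uint8_t* p) const noexcept { std::free(p); }
};
using PixelBuffer = std::unique_ptr<std::uint8_t, FreeDeleterSketch>;

// Hypothetical decoder result: raw malloc'd RGB pixels plus dimensions.
struct Decoded {
  std::uint8_t* data;
  int width;
  int height;
};

Decoded fakeDecode(int w, int h) {
  const std::size_t bytes = static_cast<std::size_t>(w) * static_cast<std::size_t>(h) * 3;
  return {static_cast<std::uint8_t*>(std::malloc(bytes)), w, h};
}

PixelBuffer decodeChecked(int w, int h, int expectW, int expectH) {
  Decoded d = fakeDecode(w, h);
  PixelBuffer owned(d.data);  // ownership is taken first
  if (d.width != expectW || d.height != expectH) {
    // The unique_ptr destructor frees the buffer during unwinding.
    throw std::invalid_argument("frame dimensions do not match video dimensions");
  }
  return owned;
}
```

If the dimension check ran before the unique_ptr was constructed, the throw would leak the decoded buffer.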

AviWriter::encodeFramesToAvi:
- Reserve calculation is now step-wise overflow-checked against SIZE_MAX
  (width * height, then * 3, then * numFrames) instead of a single multiply
  that could wrap.
- Add a hard upper bound at UINT32_MAX (AVI 1.0 RIFF size header is a
  uint32_t -- anything past 4 GB cannot be addressed by the spec).
- Re-check the final size before patching the RIFF header in case JPEG
  output overshoots the pre-flight estimate.
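
A step-wise checked size estimate of the kind described above could be sketched like this. The names `checkedMul` and `estimateRawBytes` are hypothetical, and for brevity the sketch folds the SIZE_MAX guard and the UINT32_MAX RIFF cap into a single limit, which the real code may keep separate.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <optional>

// Multiplies a * b, returning nullopt if the product would exceed `limit`.
std::optional<std::size_t> checkedMul(std::size_t a, std::size_t b, std::size_t limit) {
  if (a != 0 && b > limit / a) return std::nullopt;
  return a * b;
}

// Pre-flight estimate of raw RGB bytes for a clip, checked at every step.
std::optional<std::size_t> estimateRawBytes(std::size_t width, std::size_t height,
                                            std::size_t numFrames) {
  constexpr std::size_t kSizeMax = std::numeric_limits<std::size_t>::max();
  constexpr std::size_t kRiffMax = std::numeric_limits<std::uint32_t>::max();
  // The RIFF size field is 32-bit, so use the smaller of the two bounds.
  constexpr std::size_t limit = kSizeMax < kRiffMax ? kSizeMax : kRiffMax;

  auto pixels = checkedMul(width, height, limit);
  if (!pixels) return std::nullopt;
  auto frameBytes = checkedMul(*pixels, 3, limit);  // 3 bytes per RGB pixel
  if (!frameBytes) return std::nullopt;
  return checkedMul(*frameBytes, numFrames, limit);
}
```

Dividing the limit by one operand before multiplying avoids ever computing a product that has already wrapped.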

Tests:
- SdVidGenHandlers: new IntCoercion suite covers fractional doubles,
  out-of-int-range doubles, picojson's own NaN/inf rejection at the
  JSON layer, and integer-valued doubles (the common case from JSON).
- AviWriter: new tests for the overflow guard and the 4 GB RIFF cap,
  both of which fire before any encoding starts.
- test_wan_video: pin width/height in the existing CorruptControlFrame
  test so the new dimension check passes for frame [0] and we still
  exercise the decode-failure path at frame [1]. Add two new cases
  covering end_image and control_frames dimension mismatch.

All 211 C++ tests, 70 JS unit tests, lint and tsc --dts pass.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(diffusion-cpp/video): don't eager-require binding via addonLogging

CI sanity-checks (JS unit tests on a runner with no native prebuild)
was crashing with `AddonError: ADDON_NOT_FOUND` because the top-level
`require('./addonLogging')` introduced in e6b13ae transitively pulled
in `binding.js` -> `libqvac__diffusion-cpp.so`. The unit tests only
exercise JS-side validation and never call `load()`, so they used to
work without the prebuilt addon -- this regression broke that.

Match `ImgStableDiffusion` instead: drop the per-instance native
logger plumbing entirely (it's dead code anyway after the e6b13ae
refactor, since `_connectNativeLogger` was no longer called), and
document in the constructor JSDoc that callers wire up native C++
logs once globally via `addonLogging.setLogger(...)`.

Net diff:
- Remove `const addonLogging = require('./addonLogging')` at top.
- Remove `_connectNativeLogger` / `_releaseNativeLogger` methods and
  their two stale call sites.
- Remove `LOG_METHODS` (only used by the removed method) and
  `this._binding` (used to keep a handle for the removed release
  path; the binding is now scoped to `_createAddon` only, matching
  `ImgStableDiffusion::_createAddon`).
- JSDoc on `args.logger` now mirrors `index.js` and points users at
  `addonLogging.setLogger`.

Verified: JS unit tests 70/70 pass with the prebuilds directory
moved aside, lint clean, tsc --dts clean.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(diffusion-cpp/video): validate init_image dims; reject unsupported lora

Two reviewer-flagged regressions on PR #1879:

1. blocker (gabrielgrigoras-serv): processVideo() validates dimensions
   for end_image and every control_frames[i] but not for init_image.
   A caller passing width/height that don't match the decoded init_image
   would hand mismatched (width, height) and frame pixel stride to
   generate_video(), producing inconsistent frame data downstream
   (and risking VAE segfaults).

   Fix: add the same dimension check in SdModel.cpp processVideo()
   right after the init_image decode, throwing StatusError on
   mismatch -- consistent with the existing end_image / control_frames
   checks. All three checks now compare against vid.width / vid.height
   as the single source of truth for the video's final dimensions.

   Ownership of the freshly-decoded init pixel buffer is taken into
   the unique_ptr *before* the dim check, mirroring the control_frames
   path so a mismatch can't leak the buffer.

2. gianni-cor: params.lora silently dropped on the video path -- video.js
   validated it as a non-empty absolute path and video.d.ts advertised
   `lora?: string`, but SD_VID_GEN_HANDLERS has no "lora" entry and
   SdModel::processVideo never touches sd_vid_gen_params_t::loras, so
   any LoRA passed through was swallowed by the unknown-keys branch
   in applySdVidGenHandlers and silently produced LoRA-less output.

   Fix B applied (reviewer's preferred "out of scope" option):
   - video.js: replaced the absolute-path validation with a loud
     TypeError('params.lora is not supported for video generation
     yet'), so existing callers fail at the JS boundary instead of
     getting silent LoRA-less output.
   - video.d.ts: dropped `lora?: string` from VideoGenerationParams.
   - video-validation.test.js: collapsed the four old lora cases
     (empty / non-string / relative / absolute) into one parametrised
     test that asserts the new TypeError fires for every shape, so a
     future re-introduction of the JS validation can't bring back the
     silent-drop regression.

   When LoRA-on-video is wired through native (mirror of processImage's
   prepareLoras() + sd_img_gen_params_t::loras), the right path is to
   restore the absolute-path validation here and add a "lora" handler
   to SD_VID_GEN_HANDLERS, NOT to revert the d.ts.

C++ test changes:
- new Img2VidRejectsInitImageWithWrongDimensions covers the blocker.
- Flf2VidRejectsCorruptEndImage pinned width/height to 64 so the new
  init dim check passes for the 64x64 init and we still reach the
  intended end-decode-failure path (same approach as the existing
  Img2VidRejectsCorruptControlFrame fixture).

Verified: 67/67 JS unit tests pass with and without prebuilds, 176/176
C++ tests pass (1 opt-in Wan smoke skipped, requires ~8GB weights),
lint and tsc --dts clean.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(diffusion-cpp): regression + 7 review-batch fixes (NaN/Inf guards, cancel, etc.)

Addresses all 8 outstanding comments on PR #1879 (one regression from
commit 59f2663 plus a CHANGES_REQUESTED batch of seven items). Major
points below; per-file rationale in the inline comments.

== Regression fix (highest priority)
* gianni-cor flagged that the new init_image strict-equality check from
  commit 59f2663 rejects every off-grid frame with a confusing error
  citing wrapper-picked dims. Root cause: addon.js _fillDimsFromImage
  was silently doing Math.ceil(d/8)*8, so a 100x100 init_image got
  dispatched as 104x104 and the native check then threw "100x100 != 104x104"
  -- citing a value the caller never passed. Fixes:
  - addon.js _fillDimsFromImage now passes dims through verbatim
    (no rounding). The image SDEdit path already realigns internally
    (SdModel.cpp ~600) and the FLUX2 ref path uses
    auto_resize_ref_image, so dropping the rounding is safe across
    every path.
  - video.js _runInternal pre-empts the cryptic native error with a
    JS-layer off-grid probe: when width/height aren't explicit it
    reads init_image / end_image / control_frames[i] dimensions and
    throws a clear "your image is off-grid, pre-align or pass explicit
    dims" message naming the exact buffer.
  - Removes the ceil-vs-round inconsistency wart between
    _fillDimsFromImage (ceil) and the user-facing validator (round).
  - Three new JS regression tests for off-grid init / end / control,
    plus one positive test for explicit aligned dims overriding the
    probe.
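
The rounding that caused the mismatch is easy to reproduce. The sketch below mirrors Math.ceil(d/8)*8 in integer arithmetic; `ceilToMultipleOf8` and `dimsMatch` are hypothetical names, not functions from addon.js or SdModel.cpp.

```cpp
#include <cassert>

// Integer equivalent of Math.ceil(d / 8) * 8 for non-negative d.
constexpr int ceilToMultipleOf8(int d) {
  return (d + 7) / 8 * 8;
}

// Strict-equality check of the kind the native layer applies.
constexpr bool dimsMatch(int decoded, int dispatched) {
  return decoded == dispatched;
}
```

A 100-pixel side rounds up to 104, so a strict comparison of the decoded 100 against the dispatched 104 fails even though the caller never asked for 104.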

== JS hardening
* params.prompt is documented Required but was never validated --
  undefined / "" / 42 each produced a different failure mode (silent
  noise, silent noise, far-away C++ error). video.js now throws a loud
  TypeError at the wrapper boundary. Four new prompt-validation tests.
* mapAddonEvent JobEnded fallback accepted every typed-array view --
  works today only because uint8_t is the sole registered
  TypedArrayOutputHandler. When frameCallback (SdModel.hpp:139) gets
  wired through to JS, every per-frame event would have been
  misclassified as JobEnded and the response stream would have closed
  after the first frame. One-token fix: add `&& !ArrayBuffer.isView(rawData)`
  to the discriminator. ArrayBuffer.isView is true for every TypedArray
  + DataView, false for plain objects -- exactly the discrimination
  needed for the runtime-stats POJO.

== C++ parser hardening (NaN / Inf / int64 / range)
* Promoted requireInt from SdVidGenHandlers.cpp's anonymous namespace
  into parsers::, and added two siblings:
  - requireFiniteFloat: rejects NaN / +inf / -inf before the float
    cast (NaN compares false against every bound, so range checks
    of the form `f < lo || f > hi` previously let it sneak through).
  - requireInt64: same finite + integer guards as requireInt, range
    check against representable [INT64_MIN, INT64_MAX] doubles.
  - requireFiniteFloatInRange: convenience wrapper for [lo, hi] checks.
* Routed every previously-vulnerable cast through the new helpers:
  - SdVidGenHandlers.cpp: seed (int64), cfg_scale, flow_shift,
    high_noise_cfg_scale, high_noise_flow_shift, vae_tile_overlap,
    cache_threshold, moe_boundary, strength, vace_strength
  - SdGenHandlers.cpp (image path, reviewer asked for symmetric fix):
    eta, cfg_scale, guidance, img_cfg_scale, seed, batch_count,
    strength, clip_skip, vae_tile_overlap, cache_threshold, width,
    height, steps, parseUpscaleRepeats
* parseVaeTileSize (SdParsers.cpp): numeric form now routes through
  requireInt (rejects NaN/Inf/fractional/out-of-range), and BOTH
  forms (numeric and "WxH" string) now reject <= 0. Five new tests.
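
The NaN behaviour that motivates requireFiniteFloat can be reproduced in a few lines. This is a JS analogue sketched for illustration, not the C++ helper itself; the function names and error messages are assumptions.

```javascript
// Naive range check: every ordered comparison against NaN is false,
// so NaN is never flagged as out of range.
function naiveInRange (f, lo, hi) {
  return !(f < lo || f > hi)
}

// Hypothetical JS analogue of requireFiniteFloatInRange: reject
// non-finite values first, then apply the range check.
function requireFiniteInRange (name, f, lo, hi) {
  if (typeof f !== 'number' || !Number.isFinite(f)) {
    throw new RangeError(`${name} must be a finite number, got: ${f}`)
  }
  if (f < lo || f > hi) {
    throw new RangeError(`${name} must be in [${lo}, ${hi}], got: ${f}`)
  }
  return f
}

console.log(naiveInRange(NaN, 0, 1)) // true: NaN sneaks through
console.log(requireFiniteInRange('strength', 0.5, 0, 1)) // 0.5
```

Checking finiteness before the bounds is the whole point: once NaN is excluded, the ordinary `f < lo || f > hi` test is sufficient.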

== Cancellation gap + typed status
* In SdModel.cpp processVideo, cancelRequested_ was checked exactly once,
  right after generate_video() returned -- the slow tail (per-frame PNG
  fan-out + AVI mux, multi-second on 81-frame 832x480 videos) had no
  cancellation visibility. Added 2 checks: at the top of the
  frame-callback loop body, and immediately before encodeFramesToAvi.
* Switched both Job cancelled throws (image path at SdModel.cpp:730,
  video path at :987, plus the 2 new C1 sites) from bare
  std::runtime_error to StatusError tagged with
  localCodeMsg="Cancelled", so the JS layer can discriminate cancel
  from real internal failures via codeString() ("[ General :: Cancelled ]")
  instead of string-matching the exception message.

  Note: this PR deliberately does NOT add `Cancelled = 6` to the
  shared inference-addon-cpp Errors.hpp enum, because that header
  ships via vcpkg to every package in the monorepo and a cross-package
  coordinated change is out of scope. Instead we use the 3-arg
  StatusError ctor (addonId, localCodeMsg, errorMsg) which produces
  the same codeString without touching the shared enum. When the
  enum is updated later, the 4 call sites can switch to the 2-arg
  ctor in a one-line follow-up.

== C5 (preview_*) -- product decision deferred
* The header comment at SdCtxHandlers.hpp:112 claimed preview_mode et
  al are "Wired to sd_set_preview_callback() in SdModel::process()",
  but a grep across packages/diffusion-cpp for sd_set_preview_callback
  returns zero matches -- the four config keys are validated and stored
  but the upstream callback is never installed, so they're a silent
  no-op end-to-end. Downgraded the misleading comment to an explicit
  TODO(QVAC-18026 follow-up) documenting the gap and the two viable
  resolution paths (wire it up alongside sd_set_abort_callback, OR
  remove the handlers + fields + tests). Reviewer asked which path is
  intended; this commit picks neither and just stops claiming the
  wiring exists. The choice can land in a separate PR without holding
  this one up.

== Test surface
* +8 JS tests (prompt validation x4, off-grid probe x4)
* +5 C++ tests (vae_tile_size zero/negative/fractional/out-of-range
  rejection, plus the existing IntCoercion suite carried over to the
  promoted helpers transparently)
* Cancel-context test updated to assert the typed
  "[ General :: Cancelled ]" codeString in addition to the message.

Verified locally:
  JS unit tests:   75/75 pass with prebuild, 75/75 also without
                   (CI sanity-checks mode, no native binary loaded)
  C++ unit tests: 209/210 pass, 1 opt-in skip
                   (SdWanHappyPathTest needs ~8GB Wan weights)
  npm run lint:    clean
  npm run test:dts: clean

Co-authored-by: Cursor <cursoragent@cursor.com>

* chore(diffusion-cpp): release 0.8.0

Bumps @qvac/diffusion-cpp to 0.8.0 and documents the Wan 2.1 / Wan 2.2
video pipeline shipped since 0.7.0: new VideoStableDiffusion class
(txt2vid / img2vid / flf2vid), MoE high-noise expert routing, streaming
MJPG AVI muxer, refactored download helpers + Wan model script, plus
the supporting JS + C++ test coverage and validation hardening.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(diffusion-cpp): re-align auto-detected img dims to multiple of 8

_fillDimsFromImage was passing raw image dimensions through verbatim
since fe4d10f, but the native SdGenHandlers validates width/height
% 8 == 0 before the downstream alignment in SdModel::processImage
ever runs. Any img2img call with a non-aligned source image (e.g.
the bundled 500x627 von-neumann.jpg used by the FLUX2 i2i integration
test) therefore failed with:

  height must be a positive multiple of 8, got: 627

Restore the Math.ceil(d/8)*8 round-up that was removed in fe4d10f.
The original motivation for the removal -- avoiding a spurious dim
mismatch on the video path where processVideo strict-compares decoded
frame dims against vid.width/vid.height -- is already handled at the
JS layer by VideoStableDiffusion's off-grid pre-validation in
video.js, which runs before this helper and rejects unaligned
init/end/control frames with a clear caller-facing error. The ceil()
is therefore a no-op on the video path.
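
For reference, the round-up applied to the dimensions mentioned above. The helper name `alignUp8` is hypothetical; the expression is the one quoted in this message.

```javascript
// Round a positive dimension up to the next multiple of 8.
function alignUp8 (d) {
  return Math.ceil(d / 8) * 8
}

console.log(alignUp8(500), alignUp8(627)) // 504 632
console.log(alignUp8(832), alignUp8(480)) // 832 480 (already aligned)
```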

Co-authored-by: Cursor <cursoragent@cursor.com>

* style(diffusion-cpp): apply clang-format to drifted C++ sources

cpp-lint surfaced clang-format drift in 4 files that accumulated
across recent Wan-video commits. No semantic changes -- only
mechanical line-wrap / arg-break placement to match the project's
.clang-format.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(diffusion-cpp/test): use package export for video module in wan integration test

The generate-video-wan.test.js test was using a relative import
(require('../../video')) that breaks when test files are bundled
and relocated to the test-framework backend directory during mobile
test setup.

Change to the package export pattern (@qvac/diffusion-cpp/video)
used by other integration tests, which remains valid regardless of
file location.

Fixes: https://github.com/tetherto/qvac/actions/runs/25929776543/job/76221440417
Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(diffusion-cpp): expose video API from package root

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(diffusion-cpp): repair variable names in SdModel after merge

Co-authored-by: Cursor <cursoragent@cursor.com>

* style(diffusion-cpp): apply git-clang-format

Co-authored-by: Cursor <cursoragent@cursor.com>

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: gianni-cor <gianfranco.cordella@tether.io>
Proletter pushed a commit to tetherto/qvac that referenced this pull request May 24, 2026
…s 2026-01-30#7

The previous commit (0f5c522) repointed the local ggml overlay at the
merge of tetherto/qvac-ext-ggml#6 (05afdc59) so Wan video generation on
Metal would stop aborting with `unsupported op 'IM2COL_3D'`. That same
ref has now been promoted into the registry: tetherto/qvac-registry-vcpkg#134
landed on main as d1b2497b, bumping ggml port-version 6 -> 7 against the
identical REF + SHA512 the overlay was carrying.

This means the diffusion-cpp-local overlay is now strictly redundant --
and slightly behind, since the registry's port-version 7 also picks up
two improvements the overlay didn't have:
  - iOS gets `-DGGML_BLAS=OFF -DGGML_ACCELERATE=OFF` to keep the build
    off the Apple Accelerate / BLAS path that breaks the iOS toolchain.
  - The Android backend-glob now also matches `libqvac-ggml-*.so` in
    addition to `libggml-*.so`, so the qvac-prefixed DL backends get
    installed alongside the upstream-named ones.

So we delete the entire `vcpkg/ports/ggml/` overlay (portfile.cmake,
vcpkg.json, usage, android-vulkan-version.cmake) and:

  - Bump `vcpkg-configuration.json`'s default-registry baseline from
    a9eae49a -> d1b2497b (the merge commit of registry PR #134), which
    is the first registry SHA that serves ggml@2026-01-30#7.
  - Tighten `vcpkg.json`'s ggml constraint from `version>=: 2026-01-30#5`
    to `version>=: 2026-01-30#7` so any later baseline bump can't
    silently drop us back below the Wan-Metal pin.

The `overlay-ports: ["vcpkg/ports"]` entry and the `vcpkg/ports/.gitkeep`
marker are kept in place so future overlays can be added without a
config flap.

Verified end-to-end on darwin-arm64: clean `npm run build`
(bare-make generate + build + install) with the build/ tree wiped.
vcpkg resolves
  ggml[core,metal]:arm64-osx@2026-01-30#7
  -- git+https://github.com/tetherto/qvac-registry-vcpkg.git@f1632875...
straight from the registry (no overlay), all 8 ports install in 47s,
the addon links cleanly against the registry-supplied libggml*.a, and
prebuilds/darwin-arm64/qvac__diffusion-cpp.bare is rewritten.

Net diff: +2 / -283.

Co-authored-by: Cursor <cursoragent@cursor.com>
Proletter pushed a commit to tetherto/qvac that referenced this pull request May 24, 2026
* feat(diffusion): refactor download scripts and add Wan 2.1 support

- Extract shared dl() function into reusable dl-functions.sh module
- Update all download-model-*.sh scripts to source shared utilities
- Add download-model-wan.sh for Wan 2.1 video generation models
- Reduces code duplication and improves maintainability

Wan 2.1 downloads (~8.3 GB):
- wan2.1_t2v_1.3B_fp16.safetensors (diffusion model)
- wan_2.1_vae.safetensors (VAE encoder/decoder)
- umt5_xxl_fp16.safetensors (text encoder)

Co-authored-by: Cursor <cursoragent@cursor.com>

* feat(diffusion): Wan video foundation -- ctx/vid handlers, AVI muxer, shared parsers

Phase 1-4 of Wan 2.1 / 2.2 video generation support in the diffusion-cpp
addon. Configuration + parsing layer only; dispatch + callback plumbing +
JS surface land in follow-up commits on this branch.

SdCtxConfig:
  - Add highNoiseDiffusionModelPath for Wan 2.2 MoE high-noise expert
    (leave empty for Wan 2.1 and all non-Wan models)
  - Add previewMode / previewInterval / previewDenoised / previewNoisy
    for optional mid-denoising preview frames via sd_set_preview_callback
  - Wire both through SdCtxHandlers (new JS keys: preview_mode,
    preview_interval, preview_denoised, preview_noisy) and AddonJs
    (highNoiseDiffusionModelPath in args map)

AviWriter (new utility):
  - addon/src/utils/AviWriter.{hpp,cpp} ports the upstream avi_writer.h
    MJPG encoder onto an in-memory std::vector<uint8_t> sink (no stdio,
    no temp files) so video bytes flow through the existing
    OutputCallBackJs queue
  - Full input validation (numFrames, fps, jpegQuality, channel count,
    frame homogeneity, null data) -- StatusError on any rejection

SdParsers (new shared module):
  - Extract parseSampler / parseScheduler / parseCacheMode /
    parseVaeTileSize / parseCachePreset / requireNum/Str/Bool from
    SdGenHandlers into addon/src/handlers/SdParsers.{hpp,cpp}
  - Reused by both SdGenHandlers (image) and SdVidGenHandlers (video)

SdVidGenHandlers (new):
  - SdVidGenConfig struct with full Wan 2.1 + 2.2 surface: mode
    (txt2vid/img2vid/flf2vid), prompts, dimensions, videoFrames (4k+1
    validated), fps, seed, low-noise expert sample params, high-noise
    expert sample params, moeBoundary, strength, vaceStrength, VAE
    tiling, cache mode/preset/threshold
  - 22 JSON handlers with validation for each field

Tests (all pass):
  - 5 new SdCtxHandlers tests for preview_* + high_noise path default
  - 18 new AviWriter tests covering happy path, RIFF header structure,
    all validation rejections, JPEG round-trip
  - 54 new SdVidGenHandlers tests covering every field + integration
    payload + defaults
  - Zero regressions across existing 144 fast-unit tests

No user-facing JS API changes yet.

Co-authored-by: Cursor <cursoragent@cursor.com>

* feat(diffusion): Wan video generation -- dispatch, processVideo, JS wrapper + examples

Builds on the Wan foundation commit by wiring the video path end-to-end
from JS to C++ and back. Adds txt2vid / img2vid / flf2vid generation
via a new VideoStableDiffusion class that shares the single native
binding with the existing ImgStableDiffusion class.

Native:
- SdModel::process() dispatches on the JSON "mode" field to
  processImage() (existing) or the new processVideo() path.
- processVideo() applies SdVidGenHandlers, validates mode-vs-inputs
  invariants (img2vid requires init_image; flf2vid requires both;
  txt2vid rejects both; end_image only valid on flf2vid), decodes
  init/end/control frames, fills sd_vid_gen_params_t, and encodes
  the returned sd_image_t* sequence to an in-memory MJPG AVI.
- SdVideoFrames RAII wrapper extracted to addon/src/utils/ so it
  can be unit-tested without a loaded model.
- GenerationJob grows endImageBytes and controlFramesBytes plus an
  optional per-frame frameCallback (unused from JS in this PR;
  reserved for the preview follow-up).
- AddonJs::runJob reads endImageBuffer (single Uint8Array) and
  controlFramesBuffers (Array of Uint8Array) as typed-array args,
  no JSON encoding.
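
The mode-vs-inputs invariants listed above can be summarised as a small guard. This is a JS sketch for illustration only: the function name and error strings are hypothetical and do not mirror the actual C++ messages.

```javascript
// Hypothetical mirror of the processVideo() mode-vs-inputs guards.
// hasInit / hasEnd say whether init_image / end_image were supplied.
function checkModeInputs (mode, hasInit, hasEnd) {
  switch (mode) {
    case 'txt2vid':
      if (hasInit || hasEnd) {
        throw new Error('txt2vid takes neither init_image nor end_image')
      }
      break
    case 'img2vid':
      if (!hasInit) throw new Error('img2vid requires init_image')
      if (hasEnd) throw new Error('end_image is only valid for flf2vid')
      break
    case 'flf2vid':
      if (!hasInit || !hasEnd) {
        throw new Error('flf2vid requires both init_image and end_image')
      }
      break
    default:
      throw new Error(`unknown mode: ${mode}`)
  }
}

checkModeInputs('txt2vid', false, false) // ok
checkModeInputs('img2vid', true, false) // ok
checkModeInputs('flf2vid', true, true) // ok
```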

JS surface:
- video.js / video.d.ts: new VideoStableDiffusion class with
  full per-mode validation, 4k+1 frame-count rule, fps range,
  moe_boundary range, Uint8Array type checks, and warning when
  high_noise_* params are set without files.highNoiseDiffusionModel.
- addon.js: SdInterface.runJob threads end_image and control_frames
  through to the native runJob without round-tripping through JSON.
- index.js / index.d.ts: unchanged -- image wrapper continues to
  work exactly as before. Both classes compose the same SdInterface
  and hit the same binding.cpp entry points.
- package.json: exports "./video", ships video.js / video.d.ts,
  adds generate:video / generate:img2vid / generate:flf2vid scripts.

Examples:
- examples/generate-video-wan.js (txt2vid @ 832x480, 33 frames)
- examples/img2vid-wan.js (reuses assets/von-neumann.jpg as first frame)
- examples/flf2vid-wan.js (expects flf-first.png / flf-last.png)

Tests:
- test_sd_video_frames.cpp: 12 RAII tests (empty states, destruction
  of 4k+1 production sizes, null-pixel tolerance, bounds-checked
  operator[], compile-time copy/move deletion).
- test_wan_video.cpp: 12 validation tests reusing the SD2.1 context
  to satisfy isLoaded() and exercise every processVideo() guard
  before generate_video() runs; plus an opt-in happy-path smoke
  test (SD_RUN_WAN_SMOKE=1) gated off by default because ggml-metal
  lacks IM2COL_3D for Wan's 3D convs.

Gates: npm run lint, npm run test:dts, npm run build, and the
fast subset of addon-test (178/178) all pass.

Co-authored-by: Cursor <cursoragent@cursor.com>

* feat(diffusion): Wan video tests, ggml overlay, example tuning

Add a vcpkg overlay-port for ggml at vcpkg/ports/ggml/ that pins
tetherto/qvac-ext-ggml @ feature/metal-pr-16669-clean (commit
bc053644). The fork adds Metal kernels for IM2COL_3D and 3-axis
PAD-left, both required by Wan 2.1 / 2.2 video generation; without
them ggml hard-aborts mid-run with "unsupported op 'IM2COL_3D'".
Rationale lives in portfile.cmake -- the overlay is transient and
will be removed once the registry baseline rolls forward.

Add JS test coverage for VideoStableDiffusion:
  - test/unit/video-validation.test.js: 63 input-validation cases
    mirroring the existing input-validation.test.js pattern.
  - test/integration/generate-video-wan.test.js: opt-in
    (WAN_INTEGRATION=1) end-to-end T2V smoke test plus sniffAvi
    self-tests.

Tune the Wan examples:
  - generate-video-wan.js: env-var-driven (PROMPT, FRAMES, STEPS,
    SEED, CFG_SCALE, FLOW_SHIFT, ...), inline frame-count cheat
    sheet, (4*k+1) pre-flight check, default FRAMES bumped to 81
    (Wan 1.3B's native training length).
  - img2vid-wan.js, flf2vid-wan.js: flow_shift 5.0 -> 3.0 to match
    the upstream test-wan reference scripts.

Refresh the C++ smoke-test gating doc in test_wan_video.cpp to
reflect that Metal works once the overlay is in place.

Drop build.md: the vcpkg overlay rationale already lives next to
the overlay (portfile.cmake header), and transient infrastructure
doesn't earn its own long-form doc.

Co-authored-by: Cursor <cursoragent@cursor.com>

* docs(diffusion-cpp): restore build.md

The earlier deletion conflated build.md with the vcpkg overlay rationale,
but build.md is the package's standalone build guide (prerequisites,
build pipeline, cross-compilation, troubleshooting) and is still the
target of README.md's "Building from Source" link. Restore it from main,
which also picks up the LLVM 19 -> 22 bump.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(diffusion-cpp): address PR review feedback for Wan video gen

* Flip default video dimensions to 480x832 portrait (phone-screen
  friendly). Wan 2.1 T2V 1.3B handles both orientations equally well;
  the previous 832x480 landscape default disagreed with the example.
* Document the flow_shift=0 fall-through sentinel in JSDoc, .d.ts, and
  C++ struct/handler comments; correct stale "5-8" recommendation to
  the actually-used 3.0 (matches example + ref scripts).
* Make video_frames error messages consistent JS<->C++ and list the
  full valid set up to 81 (Wan 1.3B native training cap).
* Fix frame-duration arithmetic (33 frames is ~2s @ default 16 fps,
  not ~1.3s @ 24 fps).
* Warn when upscaler_* keys are passed to VideoStableDiffusion --
  ESRGAN upscale is image-only and was being silently ignored.
* Annotate addon.js end_image / control_frames forwarding to call
  out the typed-array transport (avoids JSON byte-array bloat).
* Document the two-level concurrency model around _hasActiveResponse
  (the busy guard isn't dead under exclusiveRunQueue -- it covers
  overlap between the released queue lock and an in-flight response).
* Update C++ defaults test + JS suggestion-fallback test for the new
  portrait orientation.
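
The frame-count set and the duration arithmetic referenced above can be checked with a short sketch. The helper names are hypothetical, and starting the 4k+1 series at k = 0 is an assumption of this example rather than something stated in the review.

```javascript
// Frame counts of the form 4k + 1 up to a cap (81 per the note above).
// Assumption: the series starts at k = 0.
function validFrameCounts (max) {
  const out = []
  for (let n = 1; n <= max; n += 4) out.push(n)
  return out
}

// Clip length in seconds for a given frame count and frame rate.
function durationSeconds (frames, fps) {
  return frames / fps
}

console.log(validFrameCounts(81).join(', '))
console.log(durationSeconds(33, 16)) // 2.0625
console.log(durationSeconds(81, 16)) // 5.0625
```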

Co-authored-by: Cursor <cursoragent@cursor.com>

* chore(diffusion-cpp): retarget ggml overlay to merged tetherto/qvac-ext-ggml@2026-01-30

The Wan-Metal work that was carried as a local overlay has all landed
upstream on tetherto/qvac-ext-ggml's 2026-01-30 branch:

  - bc053644  metal: IM2COL_3D op + PAD left-padding for Wan video       (#5)
  - 512e1773  cmake: support qvac hybrid backend packaging
              (static CPU + dynamic GPU backends, GGML_MAX_NAME prop,
               graceful no-OpenCL-device fallback, public ggml-opencl.h
               install -- previously six local overlay patches)
  - 6d2d24bb / b1923e29 / 05afdc59  metal: tighten IM2COL_3D supports_op
              to match the CPU-reference invariants                     (#6)

Repin vcpkg/ports/ggml from PR #5's head (bc053644) to PR #6's merge
commit (05afdc59) on 2026-01-30, drop all seven local overlay patches
since their content is now upstream verbatim, and bump port-version
102 -> 104 to force a clean rebuild of ggml.

Net diff: +22 / -201; the overlay now exists only as a baseline pin
that overrides the registry's ggml-org/ggml@a8db410a (which still lacks
the Wan-required Metal ops). Once the registry baseline catches up to
a ref containing this work, vcpkg/ports/ggml/ can be deleted entirely.

Verified with npm run build on darwin-arm64: ggml@2026-01-30#104 builds
fresh from 05afdc59 with zero patches applied, addon links and tests
compile, prebuild installed.

Co-authored-by: Cursor <cursoragent@cursor.com>

* chore(diffusion-cpp): drop local ggml overlay now that registry serves 2026-01-30#7

The previous commit (04a6496) repointed the local ggml overlay at the
merge of tetherto/qvac-ext-ggml#6 (05afdc59) so Wan video generation on
Metal would stop aborting with `unsupported op 'IM2COL_3D'`. That same
ref has now been promoted into the registry: tetherto/qvac-registry-vcpkg#134
landed on main as d1b2497b, bumping ggml port-version 6 -> 7 against the
identical REF + SHA512 the overlay was carrying.

This means the diffusion-cpp-local overlay is now strictly redundant --
and slightly behind, since the registry's port-version 7 also picks up
two improvements the overlay didn't have:
  - iOS gets `-DGGML_BLAS=OFF -DGGML_ACCELERATE=OFF` to keep the build
    off the Apple Accelerate / BLAS path that breaks the iOS toolchain.
  - The Android backend-glob now also matches `libqvac-ggml-*.so` in
    addition to `libggml-*.so`, so the qvac-prefixed DL backends get
    installed alongside the upstream-named ones.

So we delete the entire `vcpkg/ports/ggml/` overlay (portfile.cmake,
vcpkg.json, usage, android-vulkan-version.cmake) and:

  - Bump `vcpkg-configuration.json`'s default-registry baseline from
    a9eae49a -> d1b2497b (the merge commit of registry PR #134), which
    is the first registry SHA that serves ggml@2026-01-30#7.
  - Tighten `vcpkg.json`'s ggml constraint from `version>=: 2026-01-30#5`
    to `version>=: 2026-01-30#7` so any later baseline bump can't
    silently drop us back below the Wan-Metal pin.

The `overlay-ports: ["vcpkg/ports"]` entry and the `vcpkg/ports/.gitkeep`
marker are kept in place so future overlays can be added without a
config flap.

Verified end-to-end on darwin-arm64: clean `npm run build`
(bare-make generate + build + install) with the build/ tree wiped.
vcpkg resolves
  ggml[core,metal]:arm64-osx@2026-01-30#7
  -- git+https://github.com/tetherto/qvac-registry-vcpkg.git@f1632875...
straight from the registry (no overlay), all 8 ports install in 47s,
the addon links cleanly against the registry-supplied libggml*.a, and
prebuilds/darwin-arm64/qvac__diffusion-cpp.bare is rewritten.

Net diff: +2 / -283.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(diffusion-cpp): satisfy standard quotes rule in validateVideoFrames

The middle line of the validateVideoFrames Error message was a template
literal with no `${...}` interpolation, so `standard` (configured via
`npm run lint`) flags it as `quotes`:

  video.js:39:7: Strings must use singlequote.

Adjacent lines 37, 38 use single quotes, and line 40 legitimately uses
backticks for `${n}`. Just the one stray backtick-string -- swap to
single quotes; no behaviour change.

Sanity-checks job 74830306544 on PR #1879 fails on this single line;
`npm run lint` passes locally after the swap.

Co-authored-by: Cursor <cursoragent@cursor.com>

* diffusion-cpp: enable diffusion FA in examples and fix addon paths

- Set diffusion_fa: true across SD, FLUX, and integration test ImgStableDiffusion
  configs so diffusion flash attention matches WAN video examples.
- Pass highNoiseDiffusionModelPath (empty when unset) from index.js so native
  createInstance validation succeeds for image mode; document optional
  files.highNoiseDiffusionModel in index.d.ts and validate absolute paths.

Co-authored-by: Cursor <cursoragent@cursor.com>

* diffusion-cpp(video): pass esrganPath to native createInstance

VideoStableDiffusion omitted esrganPath while the binding validates it as a
string; mirror image-mode by forwarding files.esrgan or empty string.

Co-authored-by: Cursor <cursoragent@cursor.com>

* diffusion-cpp: align C++ includes and image codec with inference-addon-cpp

- Switch remaining qvac-lib-inference-addon-cpp includes to inference-addon-cpp
  (vcpkg installs headers under the shorter prefix).
- Use image_codec::decodeImage / encodeToPng in processVideo after ImageCodec
  API rename from decodePng.

Co-authored-by: Cursor <cursoragent@cursor.com>

* diffusion-cpp: apply clang-format to changed C++ sources

Run git-clang-format against 2c4dc65 to satisfy the repo formatter on the
video addon, image codec, and Wan tests. No behavior changes.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(diffusion-cpp/video): address review comments 1-3

1. Use global addonLogging instead of per-instance setLogger/releaseLogger
   - Eliminates process-global logger collision (was reintroduced in video.js)
   - Mirrors fix from ImgStableDiffusion / EsrganUpscaler
   - video.js no longer manages per-instance logger state

2. Reject width/height values <= 0 in JS validation
   - Now validates that width > 0 and height > 0 before alignment check
   - Error message updated to say "positive multiples of 8"
   - Updated test expectations to match new message

3. Validate double values are integers before casting in C++
   - All int casts now check std::floor(d) == d first
   - Affects: width, height, video_frames, fps handlers
   - Prevents silent truncation (e.g. 8.5 -> 8)

All 70 unit tests pass; build/lint/dts all clean.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(diffusion-cpp/video): address review comments 4-7

4. Validate end_image / control_frames dimensions match video dimensions
   - Added dimension checks in processVideo() before generate_video()
   - Rejects mismatched frame sizes with clear error messages
   - Prevents silent corruption or undefined behavior in native layer

5. Use ImageCodec ownership helper instead of raw free()
   - Replaced FrameBuffersGuard with unique_ptr<uint8_t, FreeDeleter>
   - Consistent with existing image_codec ownership pattern
   - Automatic cleanup on exception; no manual free() calls

6. Regenerate mobile integration test manifest
   - Ran npm run test:mobile:generate
   - Updated test/mobile/integration.auto.cjs with new runners

7. Add checked buffer size calculation in AviWriter
   - Validates width * height overflow before multiplication
   - Validates numFrames * bytesPerFrame overflow
   - Rejects allocations that would exceed SIZE_MAX
   - Prevents silent integer overflow in reserve() call

All 70 unit tests pass; build/lint/dts all clean.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(diffusion-cpp/video): harden int validation, ownership, AVI overflow

Follow-up tightening on top of the review fixes for #1879.

SdVidGenHandlers:
- Extract a single requireInt() helper used by width / height / video_frames
  / fps / requirePositiveInt. The helper rejects NaN, +/-inf, fractional
  doubles, and values outside [INT_MIN, INT_MAX] before static_cast<int>,
  so casts to int are always well-defined and no JSON value silently
  truncates (e.g. 8.5 -> 8).
- Add explicit <cmath>/<climits> includes; they were previously only
  available transitively.
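
A JS analogue of the integer guard, sketched for illustration. The real requireInt() is C++; the constants below assume a 32-bit int, and the error messages are hypothetical.

```javascript
// Assumed bounds of a 32-bit C++ int.
const INT_MIN = -2147483648
const INT_MAX = 2147483647

// Reject anything that would not survive a cast to int unchanged.
function requireInt (name, d) {
  if (typeof d !== 'number' || !Number.isFinite(d)) {
    throw new TypeError(`${name} must be a finite number, got: ${d}`)
  }
  if (Math.floor(d) !== d) {
    throw new TypeError(`${name} must be an integer, got: ${d}`)
  }
  if (d < INT_MIN || d > INT_MAX) {
    throw new RangeError(`${name} is out of int range, got: ${d}`)
  }
  return d
}

console.log(requireInt('width', 832)) // 832
console.log(requireInt('fps', 16.0)) // 16 (integer-valued double is fine)
```

The order matters: the finiteness test runs first so the floor comparison never sees NaN, and the range test runs last so the eventual cast is always well-defined.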

SdModel::processVideo:
- Replace the bespoke FrameBuffersGuard struct with three plain
  unique_ptr<uint8_t, image_codec::FreeDeleter> values (initData / endData
  / controlData). Same lifetime semantics, less custom code, and the
  control-frame dimension mismatch path now takes ownership *before* the
  check so a throw can no longer leak the freshly-decoded buffer.

AviWriter::encodeFramesToAvi:
- Reserve calculation is now step-wise overflow-checked against SIZE_MAX
  (width * height, then * 3, then * numFrames) instead of a single
  multiply that could wrap.
- Add a hard upper bound at UINT32_MAX (AVI 1.0 RIFF size header is a
  uint32_t -- anything past 4 GB cannot be addressed by the spec).
- Re-check the final size before patching the RIFF header in case JPEG
  output overshoots the pre-flight estimate.
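
The step-wise check can be sketched as follows. This is a JS illustration, not the AviWriter code: the helper names are hypothetical, inputs are assumed to be non-negative integers, and a single 4 GB limit stands in for the separate SIZE_MAX and UINT32_MAX checks.

```javascript
const UINT32_MAX = 0xFFFFFFFF

// Multiply two non-negative integers, refusing any product above `limit`.
// The test runs before the multiply, so the product itself never overflows.
function checkedMul (a, b, limit) {
  if (a !== 0 && b > Math.floor(limit / a)) {
    throw new RangeError(`size overflow: ${a} * ${b} exceeds ${limit}`)
  }
  return a * b
}

// Raw RGB byte estimate: width * height, then * 3, then * numFrames.
function estimateRawBytes (width, height, numFrames) {
  let n = checkedMul(width, height, UINT32_MAX)
  n = checkedMul(n, 3, UINT32_MAX)
  return checkedMul(n, numFrames, UINT32_MAX)
}

console.log(estimateRawBytes(832, 480, 81)) // 97044480
```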

Tests:
- SdVidGenHandlers: new IntCoercion suite covers fractional doubles,
  out-of-int-range doubles, picojson's own NaN/inf rejection at the
  JSON layer, and integer-valued doubles (the common case from JSON).
- AviWriter: new tests for the overflow guard and the 4 GB RIFF cap,
  both fire before any encoding starts.
- test_wan_video: pin width/height in the existing CorruptControlFrame
  test so the new dimension check passes for frame [0] and we still
  exercise the decode-failure path at frame [1]. Add two new cases
  covering end_image and control_frames dimension mismatch.

All 211 C++ tests, 70 JS unit tests, lint and tsc --dts pass.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(diffusion-cpp/video): don't eager-require binding via addonLogging

CI sanity-checks (JS unit tests on a runner with no native prebuild)
was crashing with `AddonError: ADDON_NOT_FOUND` because the top-level
`require('./addonLogging')` introduced in e6b13ae transitively pulled
in `binding.js` -> `libqvac__diffusion-cpp.so`. The unit tests only
exercise JS-side validation and never call `load()`, so they used to
work without the prebuilt addon -- this regression broke that.

Match `ImgStableDiffusion` instead: drop the per-instance native
logger plumbing entirely (it's dead code anyway after the e6b13ae
refactor, since `_connectNativeLogger` was no longer called), and
document in the constructor JSDoc that callers wire up native C++
logs once globally via `addonLogging.setLogger(...)`.

Net diff:
- Remove `const addonLogging = require('./addonLogging')` at top.
- Remove `_connectNativeLogger` / `_releaseNativeLogger` methods and
  their two stale call sites.
- Remove `LOG_METHODS` (only used by the removed method) and
  `this._binding` (used to keep a handle for the removed release
  path; the binding is now scoped to `_createAddon` only, matching
  `ImgStableDiffusion::_createAddon`).
- JSDoc on `args.logger` now mirrors `index.js` and points users at
  `addonLogging.setLogger`.

Verified: JS unit tests 70/70 pass with the prebuilds directory
moved aside, lint clean, tsc --dts clean.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(diffusion-cpp/video): validate init_image dims; reject unsupported lora

Two reviewer-flagged regressions on PR #1879:

1. blocker (gabrielgrigoras-serv): processVideo() validates dimensions
   for end_image and every control_frames[i] but not for init_image.
   A caller passing width/height that don't match the decoded init_image
   would hand mismatched (width, height) and frame pixel stride to
   generate_video(), producing inconsistent frame data downstream
   (and risking VAE segfaults).

   Fix: add the same dimension check in SdModel.cpp processVideo()
   right after the init_image decode, throwing StatusError on
   mismatch -- consistent with the existing end_image / control_frames
   checks. All three checks now compare against vid.width / vid.height
   as the single source of truth for the video's final dimensions.

   Ownership of the freshly-decoded init pixel buffer is taken into
   the unique_ptr *before* the dim check, mirroring the control_frames
   path so a mismatch can't leak the buffer.

2. gianni-cor: params.lora silently dropped on the video path -- video.js
   validated it as a non-empty absolute path and video.d.ts advertised
   `lora?: string`, but SD_VID_GEN_HANDLERS has no "lora" entry and
   SdModel::processVideo never touches sd_vid_gen_params_t::loras, so
   any LoRA passed through was swallowed by the unknown-keys branch
   in applySdVidGenHandlers and silently produced LoRA-less output.

   Fix B applied (reviewer's preferred "out of scope" option):
   - video.js: replaced the absolute-path validation with a loud
     TypeError('params.lora is not supported for video generation
     yet'), so existing callers fail at the JS boundary instead of
     getting silent LoRA-less output.
   - video.d.ts: dropped `lora?: string` from VideoGenerationParams.
   - video-validation.test.js: collapsed the four old lora cases
     (empty / non-string / relative / absolute) into one parametrised
     test that asserts the new TypeError fires for every shape, so a
     future re-introduction of the JS validation can't bring back the
     silent-drop regression.

   When LoRA-on-video is wired through native (mirror of processImage's
   prepareLoras() + sd_img_gen_params_t::loras), the right path is to
   restore the absolute-path validation here and add a "lora" handler
   to SD_VID_GEN_HANDLERS, NOT to revert the d.ts.

C++ test changes:
- new Img2VidRejectsInitImageWithWrongDimensions covers the blocker.
- Flf2VidRejectsCorruptEndImage pinned width/height to 64 so the new
  init dim check passes for the 64x64 init and we still reach the
  intended end-decode-failure path (same approach as the existing
  Img2VidRejectsCorruptControlFrame fixture).

Verified: 67/67 JS unit tests pass with and without prebuilds, 176/176
C++ tests pass (1 opt-in Wan smoke skipped, requires ~8GB weights),
lint and tsc --dts clean.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(diffusion-cpp): regression + 7 review-batch fixes (NaN/Inf guards, cancel, etc.)

Addresses all 8 outstanding comments on PR #1879 (one regression from
commit 59f2663 plus a CHANGES_REQUESTED batch of seven items). Major
points below; per-file rationale in the inline comments.

== Regression fix (highest priority)
* gianni-cor flagged that the new init_image strict-equality check from
  commit 59f2663 rejects every off-grid frame with a confusing error
  citing wrapper-picked dims. Root cause: addon.js _fillDimsFromImage
  was silently doing Math.ceil(d/8)*8, so a 100x100 init_image got
  dispatched as 104x104 and the native check then threw "100x100 != 104x104"
  -- citing a value the caller never passed. Fixes:
  - addon.js _fillDimsFromImage now passes dims through verbatim
    (no rounding). The image SDEdit path already realigns internally
    (SdModel.cpp ~600) and the FLUX2 ref path uses
    auto_resize_ref_image, so dropping the rounding is safe across
    every path.
  - video.js _runInternal pre-empts the cryptic native error with a
    JS-layer off-grid probe: when width/height aren't explicit it
    reads init_image / end_image / control_frames[i] dimensions and
    throws a clear "your image is off-grid, pre-align or pass explicit
    dims" message naming the exact buffer.
  - Removes the ceil-vs-round inconsistency wart between
    _fillDimsFromImage (ceil) and the user-facing validator (round).
  - Three new JS regression tests for off-grid init / end / control,
    plus one positive test for explicit aligned dims overriding the
    probe.
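
The off-grid probe can be sketched roughly as follows. The function name `assertOnGrid`, the `images` map of already-decoded dimensions, and the exact message wording are assumptions for illustration; the grid size of 8 follows the `% 8` rule cited in this log.

```javascript
// Hypothetical probe (not the actual video.js code). `images` maps a buffer
// name such as 'init_image' to its decoded { width, height }.
function assertOnGrid(params, images) {
  // Assumption: "explicit" means both width and height were supplied;
  // in that case the caller's dims win and the probe is skipped.
  if (params.width !== undefined && params.height !== undefined) return;
  for (const [name, dims] of Object.entries(images)) {
    if (dims.width % 8 !== 0 || dims.height % 8 !== 0) {
      throw new Error(
        `${name} is ${dims.width}x${dims.height}, which is off-grid ` +
        '(not a multiple of 8); pre-align the image or pass explicit width/height'
      );
    }
  }
}

try {
  assertOnGrid({ prompt: 'a cat' }, { init_image: { width: 100, height: 100 } });
} catch (err) {
  console.log(err.message);
}
```

Naming the offending buffer and quoting only dimensions the caller actually supplied is the point of doing this in JS rather than relying on the native strict-equality check.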

== JS hardening
* params.prompt is documented as Required but was never validated --
  undefined and "" silently produced noise, while 42 surfaced as a
  far-away C++ error. video.js now throws a loud TypeError at the
  wrapper boundary. Four new prompt-validation tests.
* mapAddonEvent JobEnded fallback accepted every typed-array view --
  this works today only because uint8_t is the sole registered
  TypedArrayOutputHandler. Once frameCallback (SdModel.hpp:139) is
  wired through to JS, every per-frame event would be misclassified
  as JobEnded and the response stream would close after the first
  frame. One-line fix: add `&& !ArrayBuffer.isView(rawData)`
  to the discriminator. ArrayBuffer.isView is true for every TypedArray
  + DataView, false for plain objects -- exactly the discrimination
  needed for the runtime-stats POJO.
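
Both hardening checks above can be illustrated with a short sketch. `assertPrompt` and `looksLikeRuntimeStats` are hypothetical names, and the error wording is an assumption; `ArrayBuffer.isView` is the standard built-in referenced above.

```javascript
// Hypothetical prompt guard: fail at the wrapper boundary for
// undefined, '' and non-string values alike.
function assertPrompt(prompt) {
  if (typeof prompt !== 'string' || prompt.length === 0) {
    throw new TypeError('params.prompt must be a non-empty string');
  }
}

// Hypothetical discriminator: runtime stats arrive as a plain object,
// frame payloads as typed-array views, so views must not count as stats.
function looksLikeRuntimeStats(rawData) {
  return rawData !== null &&
    typeof rawData === 'object' &&
    !ArrayBuffer.isView(rawData);
}

console.log(looksLikeRuntimeStats({ steps: 20 }));       // true
console.log(looksLikeRuntimeStats(new Uint8Array(4)));   // false
console.log(looksLikeRuntimeStats(new Float32Array(4))); // false
```

Because the check does not depend on the element type, it keeps working if further typed-array handlers are registered later.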

== C++ parser hardening (NaN / Inf / int64 / range)
* Promoted requireInt from SdVidGenHandlers.cpp's anonymous namespace
  into parsers::, and added three siblings:
  - requireFiniteFloat: rejects NaN / +inf / -inf before the float
    cast (NaN compares false against every bound, so range checks
    of the form `f < lo || f > hi` previously let it sneak through).
  - requireInt64: same finite + integer guards as requireInt, range
    check against representable [INT64_MIN, INT64_MAX] doubles.
  - requireFiniteFloatInRange: convenience wrapper for [lo, hi] checks.
* Routed every previously-vulnerable cast through the new helpers:
  - SdVidGenHandlers.cpp: seed (int64), cfg_scale, flow_shift,
    high_noise_cfg_scale, high_noise_flow_shift, vae_tile_overlap,
    cache_threshold, moe_boundary, strength, vace_strength
  - SdGenHandlers.cpp (image path, reviewer asked for symmetric fix):
    eta, cfg_scale, guidance, img_cfg_scale, seed, batch_count,
    strength, clip_skip, vae_tile_overlap, cache_threshold, width,
    height, steps, parseUpscaleRepeats
* parseVaeTileSize (SdParsers.cpp): numeric form now routes through
  requireInt (rejects NaN/Inf/fractional/out-of-range), and BOTH
  forms (numeric and "WxH" string) now reject <= 0. Five new tests.
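
The NaN hole that requireFiniteFloat closes is a property of IEEE 754 comparisons, so it can be demonstrated from JavaScript as well. The sketch below is a JS analogue only, with hypothetical names; it is not the C++ helper from parsers::.

```javascript
// Naive range check: NaN compares false against every bound,
// so neither branch of the || fires and NaN is returned as "valid".
function naiveInRange(f, lo, hi) {
  if (f < lo || f > hi) throw new RangeError(`out of range: ${f}`);
  return f;
}

// Hypothetical analogue of a finite-then-range guard: reject
// NaN / +Infinity / -Infinity before looking at the bounds.
function requireFiniteInRange(f, lo, hi) {
  if (!Number.isFinite(f)) {
    throw new RangeError(`expected a finite number, got: ${f}`);
  }
  if (f < lo || f > hi) {
    throw new RangeError(`expected a value in [${lo}, ${hi}], got: ${f}`);
  }
  return f;
}

console.log(Number.isNaN(naiveInRange(NaN, 0, 1))); // true: NaN slipped through
try {
  requireFiniteInRange(NaN, 0, 1);
} catch (err) {
  console.log(err.message); // expected a finite number, got: NaN
}
```

Running the finiteness test first means the range comparison only ever sees ordinary numbers, which is what makes the `f < lo || f > hi` form safe.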

== Cancellation gap + typed status
* In SdModel.cpp processVideo, cancelRequested_ was checked exactly
  once, after generate_video() returns -- the slow tail (per-frame PNG
  fan-out + AVI mux, multi-second on 81-frame 832x480 videos) had no
  cancellation visibility. Added 2 checks: at the top of the
  frame-callback loop body, and immediately before encodeFramesToAvi.
* Switched all four "Job cancelled" throws (image path at
  SdModel.cpp:730, video path at :987, and the 2 new C1 sites) from
  bare std::runtime_error to StatusError tagged with
  localCodeMsg="Cancelled", so the JS layer can discriminate cancel
  from real internal failures via codeString() ("[ General :: Cancelled ]")
  instead of string-matching the exception message.

  Note: this PR deliberately does NOT add `Cancelled = 6` to the
  shared inference-addon-cpp Errors.hpp enum, because that header
  ships via vcpkg to every package in the monorepo and a cross-package
  coordinated change is out of scope. Instead we use the 3-arg
  StatusError ctor (addonId, localCodeMsg, errorMsg) which produces
  the same codeString without touching the shared enum. When the
  enum is updated later, the 4 call sites can switch to the 2-arg
  ctor in a one-line follow-up.
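
The placement of the two new cancellation checks can be modelled with a small sketch. This is a JavaScript model with purely illustrative names (`runSlowTail`, `emitFrame`, `muxAvi`), not the C++ in SdModel.cpp, and it throws a plain Error rather than a StatusError.

```javascript
// Hypothetical model of the slow tail after generation has finished.
function runSlowTail(frames, isCancelled, emitFrame, muxAvi) {
  for (const frame of frames) {
    // Check 1: top of the per-frame loop body.
    if (isCancelled()) throw new Error('Job cancelled');
    emitFrame(frame);
  }
  // Check 2: immediately before the (potentially long) mux step.
  if (isCancelled()) throw new Error('Job cancelled');
  return muxAvi(frames);
}

// Usage: cancel arrives while the second frame is being emitted.
let cancelled = false;
const emitted = [];
try {
  runSlowTail(
    ['f0', 'f1', 'f2'],
    () => cancelled,
    (frame) => { emitted.push(frame); if (frame === 'f1') cancelled = true; },
    () => 'avi'
  );
} catch (err) {
  console.log(err.message, emitted.length);
}
```

With only the original single check, the same cancel request would not be observed until every frame had been written and muxed.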

== C5 (preview_*) -- product decision deferred
* The header comment at SdCtxHandlers.hpp:112 claimed preview_mode et
  al are "Wired to sd_set_preview_callback() in SdModel::process()",
  but a grep across packages/diffusion-cpp for sd_set_preview_callback
  returns zero matches -- the four config keys are validated and stored
  but the upstream callback is never installed, so they're a silent
  no-op end-to-end. Downgraded the misleading comment to an explicit
  TODO(QVAC-18026 follow-up) documenting the gap and the two viable
  resolution paths (wire it up alongside sd_set_abort_callback, OR
  remove the handlers + fields + tests). Reviewer asked which path is
  intended; this commit picks neither and just stops claiming the
  wiring exists. The choice can land in a separate PR without holding
  this one up.

== Test surface
* +8 JS tests (prompt validation x4, off-grid probe x4)
* +5 C++ tests (vae_tile_size zero/negative/fractional/out-of-range
  rejection, plus the existing IntCoercion suite carried over to the
  promoted helpers transparently)
* Cancel-context test updated to assert the typed
  "[ General :: Cancelled ]" codeString in addition to the message.

Verified locally:
  JS unit tests:   75/75 pass with prebuild, 75/75 also without
                   (CI sanity-checks mode, no native binary loaded)
  C++ unit tests: 209/210 pass, 1 opt-in skip
                   (SdWanHappyPathTest needs ~8GB Wan weights)
  npm run lint:    clean
  npm run test:dts: clean

Co-authored-by: Cursor <cursoragent@cursor.com>

* chore(diffusion-cpp): release 0.8.0

Bumps @qvac/diffusion-cpp to 0.8.0 and documents the Wan 2.1 / Wan 2.2
video pipeline shipped since 0.7.0: new VideoStableDiffusion class
(txt2vid / img2vid / flf2vid), MoE high-noise expert routing, streaming
MJPG AVI muxer, refactored download helpers + Wan model script, plus
the supporting JS + C++ test coverage and validation hardening.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(diffusion-cpp): re-align auto-detected img dims to multiple of 8

_fillDimsFromImage was passing raw image dimensions through verbatim
since fe4d10f, but the native SdGenHandlers validates width/height
% 8 == 0 before the downstream alignment in SdModel::processImage
ever runs. Any img2img call with a non-aligned source image (e.g.
the bundled 500x627 von-neumann.jpg used by the FLUX2 i2i integration
test) therefore failed with:

  height must be a positive multiple of 8, got: 627

Restore the Math.ceil(d/8)*8 round-up that was removed in fe4d10f.
The original motivation for the removal -- avoiding a spurious dim
mismatch on the video path where processVideo strict-compares decoded
frame dims against vid.width/vid.height -- is already handled at the
JS layer by VideoStableDiffusion's off-grid pre-validation in
video.js, which runs before this helper and rejects unaligned
init/end/control frames with a clear caller-facing error. The ceil()
is therefore a no-op on the video path.
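
For reference, the restored round-up behaves as sketched below. The helper name `alignUp8` is an assumption; the expression itself is the `Math.ceil(d/8)*8` quoted above.

```javascript
// Round a dimension up to the next multiple of 8.
// Already-aligned values are returned unchanged.
function alignUp8(d) {
  return Math.ceil(d / 8) * 8;
}

console.log(alignUp8(500), alignUp8(627)); // 504 632
console.log(alignUp8(832), alignUp8(480)); // 832 480
```

The second line shows why the round-up is a no-op on the video path: frames that pass the off-grid pre-validation are already multiples of 8.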

Co-authored-by: Cursor <cursoragent@cursor.com>

* style(diffusion-cpp): apply clang-format to drifted C++ sources

cpp-lint surfaced clang-format drift in 4 files that accumulated
across recent Wan-video commits. No semantic changes -- only
mechanical line-wrap / arg-break placement to match the project's
.clang-format.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(diffusion-cpp/test): use package export for video module in wan integration test

The generate-video-wan.test.js test was using a relative import
(require('../../video')) that breaks when test files are bundled
and relocated to the test-framework backend directory during mobile
test setup.

Change to the package export pattern (@qvac/diffusion-cpp/video)
used by other integration tests, which remains valid regardless of
file location.

Fixes: https://github.com/tetherto/qvac/actions/runs/25929776543/job/76221440417
Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(diffusion-cpp): expose video API from package root

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(diffusion-cpp): repair variable names in SdModel after merge

Co-authored-by: Cursor <cursoragent@cursor.com>

* style(diffusion-cpp): apply git-clang-format

Co-authored-by: Cursor <cursoragent@cursor.com>

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: gianni-cor <gianfranco.cordella@tether.io>