Updated ggml to version 2026-01-30#7 #134
Merged
Bumps ggml port to the merge of tetherto/qvac-ext-ggml#6 (05afdc59), which lands on top of the previous pin (e16bdae2):
- bc053644 metal: IM2COL_3D op + PAD left-padding for Wan video (#5)
- 6d2d24bb metal: tighten IM2COL_3D supports_op (src[1]==F32)
- b1923e29 metal: extend IM2COL_3D supports_op for nb[0]==sizeof(float) and F16-dst => F16-kernel match
- 05afdc59 Merge pull request #6 from aegioscy
Without these the Metal backend aborts mid-Wan inference with `unsupported op 'IM2COL_3D'` and test-backend-ops support advertises invalid IM2COL_3D combos that hit CPU GGML_ASSERTs.
Verified end-to-end on darwin-arm64 via the same source tarball already used by diffusion-cpp's local overlay (now redundant after this bump): ggml@2026-01-30#7 builds with no patches, addon links against it, and Wan2.1 1.3B txt2video runs end-to-end on Metal.
Co-authored-by: Cursor <cursoragent@cursor.com>
jpgaribotti approved these changes on May 7, 2026
aegioscy added a commit to tetherto/qvac that referenced this pull request on May 7, 2026
…s 2026-01-30#7
The previous commit (04a6496) repointed the local ggml overlay at the merge of tetherto/qvac-ext-ggml#6 (05afdc59) so Wan video generation on Metal would stop aborting with `unsupported op 'IM2COL_3D'`. That same ref has now been promoted into the registry: tetherto/qvac-registry-vcpkg#134 landed on main as d1b2497b, bumping ggml port-version 6 -> 7 against the identical REF + SHA512 the overlay was carrying.
This means the diffusion-cpp-local overlay is now strictly redundant -- and slightly behind, since the registry's port-version 7 also picks up two improvements the overlay didn't have:
- iOS gets `-DGGML_BLAS=OFF -DGGML_ACCELERATE=OFF` to keep the build off the Apple Accelerate / BLAS path that breaks the iOS toolchain.
- The Android backend-glob now also matches `libqvac-ggml-*.so` in addition to `libggml-*.so`, so the qvac-prefixed DL backends get installed alongside the upstream-named ones.
So we delete the entire `vcpkg/ports/ggml/` overlay (portfile.cmake, vcpkg.json, usage, android-vulkan-version.cmake) and:
- Bump `vcpkg-configuration.json`'s default-registry baseline from a9eae49a -> d1b2497b (the merge commit of registry PR #134), which is the first registry SHA that serves ggml@2026-01-30#7.
- Tighten `vcpkg.json`'s ggml constraint from `version>=: 2026-01-30#5` to `version>=: 2026-01-30#7` so any later baseline bump can't silently drop us back below the Wan-Metal pin.
The `overlay-ports: ["vcpkg/ports"]` entry and the `vcpkg/ports/.gitkeep` marker are kept in place so future overlays can be added without a config flap.
Verified end-to-end on darwin-arm64: clean `npm run build` (bare-make generate + build + install) with the build/ tree wiped. vcpkg resolves
ggml[core,metal]:arm64-osx@2026-01-30#7
-- git+https://github.com/tetherto/qvac-registry-vcpkg.git@f1632875...
straight from the registry (no overlay), all 8 ports install in 47s, the addon links cleanly against the registry-supplied libggml*.a, and prebuilds/darwin-arm64/qvac__diffusion-cpp.bare is rewritten.
Net diff: +2 / -283.
Co-authored-by: Cursor <cursoragent@cursor.com>
gianni-cor added a commit to tetherto/qvac that referenced this pull request on May 18, 2026
* feat(diffusion): refactor download scripts and add Wan 2.1 support
- Extract shared dl() function into reusable dl-functions.sh module
- Update all download-model-*.sh scripts to source shared utilities
- Add download-model-wan.sh for Wan 2.1 video generation models
- Reduces code duplication and improves maintainability
Wan 2.1 downloads (~8.3 GB):
- wan2.1_t2v_1.3B_fp16.safetensors (diffusion model)
- wan_2.1_vae.safetensors (VAE encoder/decoder)
- umt5_xxl_fp16.safetensors (text encoder)
Co-authored-by: Cursor <cursoragent@cursor.com>
* feat(diffusion): Wan video foundation -- ctx/vid handlers, AVI muxer, shared parsers
Phase 1-4 of Wan 2.1 / 2.2 video generation support in the diffusion-cpp
addon. Configuration + parsing layer only; dispatch + callback plumbing +
JS surface land in follow-up commits on this branch.
SdCtxConfig:
- Add highNoiseDiffusionModelPath for Wan 2.2 MoE high-noise expert
(leave empty for Wan 2.1 and all non-Wan models)
- Add previewMode / previewInterval / previewDenoised / previewNoisy
for optional mid-denoising preview frames via sd_set_preview_callback
- Wire both through SdCtxHandlers (new JS keys: preview_mode,
preview_interval, preview_denoised, preview_noisy) and AddonJs
(highNoiseDiffusionModelPath in args map)
AviWriter (new utility):
- addon/src/utils/AviWriter.{hpp,cpp} ports the upstream avi_writer.h
MJPG encoder onto an in-memory std::vector<uint8_t> sink (no stdio,
no temp files) so video bytes flow through the existing
OutputCallBackJs queue
- Full input validation (numFrames, fps, jpegQuality, channel count,
frame homogeneity, null data) -- StatusError on any rejection
SdParsers (new shared module):
- Extract parseSampler / parseScheduler / parseCacheMode /
parseVaeTileSize / parseCachePreset / requireNum/Str/Bool from
SdGenHandlers into addon/src/handlers/SdParsers.{hpp,cpp}
- Reused by both SdGenHandlers (image) and SdVidGenHandlers (video)
SdVidGenHandlers (new):
- SdVidGenConfig struct with full Wan 2.1 + 2.2 surface: mode
(txt2vid/img2vid/flf2vid), prompts, dimensions, videoFrames (4k+1
validated), fps, seed, low-noise expert sample params, high-noise
expert sample params, moeBoundary, strength, vaceStrength, VAE
tiling, cache mode/preset/threshold
- 22 JSON handlers with validation for each field
Tests (all pass):
- 5 new SdCtxHandlers tests for preview_* + high_noise path default
- 18 new AviWriter tests covering happy path, RIFF header structure,
all validation rejections, JPEG round-trip
- 54 new SdVidGenHandlers tests covering every field + integration
payload + defaults
- Zero regressions across existing 144 fast-unit tests
No user-facing JS API changes yet.
Co-authored-by: Cursor <cursoragent@cursor.com>
* feat(diffusion): Wan video generation -- dispatch, processVideo, JS wrapper + examples
Builds on the Wan foundation commit by wiring the video path end-to-end
from JS to C++ and back. Adds txt2vid / img2vid / flf2vid generation
via a new VideoStableDiffusion class that shares the single native
binding with the existing ImgStableDiffusion class.
Native:
- SdModel::process() dispatches on the JSON "mode" field to
processImage() (existing) or the new processVideo() path.
- processVideo() applies SdVidGenHandlers, validates mode-vs-inputs
invariants (img2vid requires init_image; flf2vid requires both;
txt2vid rejects both; end_image only valid on flf2vid), decodes
init/end/control frames, fills sd_vid_gen_params_t, and encodes
the returned sd_image_t* sequence to an in-memory MJPG AVI.
- SdVideoFrames RAII wrapper extracted to addon/src/utils/ so it
can be unit-tested without a loaded model.
- GenerationJob grows endImageBytes and controlFramesBytes plus an
optional per-frame frameCallback (unused from JS in this PR;
reserved for the preview follow-up).
- AddonJs::runJob reads endImageBuffer (single Uint8Array) and
controlFramesBuffers (Array of Uint8Array) as typed-array args,
no JSON encoding.
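For readers skimming the native changes, the mode-vs-inputs invariants listed above can be sketched as follows. This is a minimal JavaScript illustration, not the C++ in processVideo(); `checkModeInputs` and its error messages are hypothetical.

```javascript
// Illustrative only: checkModeInputs is a hypothetical name, not the addon's API.
// Encodes the four invariants listed above for init_image / end_image.
function checkModeInputs (mode, { init_image: initImage, end_image: endImage } = {}) {
  const hasInit = initImage != null
  const hasEnd = endImage != null
  switch (mode) {
    case 'txt2vid':
      if (hasInit || hasEnd) throw new TypeError('txt2vid accepts neither init_image nor end_image')
      break
    case 'img2vid':
      if (!hasInit) throw new TypeError('img2vid requires init_image')
      if (hasEnd) throw new TypeError('end_image is only valid for flf2vid')
      break
    case 'flf2vid':
      if (!hasInit || !hasEnd) throw new TypeError('flf2vid requires both init_image and end_image')
      break
    default:
      throw new TypeError(`unknown mode: ${mode}`)
  }
}
```

Checking the combination up front means a bad request fails before any frame is decoded.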
JS surface:
- video.js / video.d.ts: new VideoStableDiffusion class with
full per-mode validation, 4k+1 frame-count rule, fps range,
moe_boundary range, Uint8Array type checks, and warning when
high_noise_* params are set without files.highNoiseDiffusionModel.
- addon.js: SdInterface.runJob threads end_image and control_frames
through to the native runJob without round-tripping through JSON.
- index.js / index.d.ts: unchanged -- image wrapper continues to
work exactly as before. Both classes compose the same SdInterface
and hit the same binding.cpp entry points.
- package.json: exports "./video", ships video.js / video.d.ts,
adds generate:video / generate:img2vid / generate:flf2vid scripts.
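The 4k+1 frame-count rule mentioned above can be sketched like this. It is an assumption-laden illustration, not the code in video.js: `isValidFrameCount` and `MAX_FRAMES` are hypothetical names, the cap of 81 is taken from a later commit in this series, and a minimum of 1 (k = 0) is assumed.

```javascript
// Illustrative sketch; isValidFrameCount and MAX_FRAMES are hypothetical names.
// Assumes the accepted counts are 1, 5, 9, ..., 81 (i.e. 4*k + 1 with k >= 0).
const MAX_FRAMES = 81

function isValidFrameCount (n) {
  return Number.isInteger(n) && n >= 1 && n <= MAX_FRAMES && (n - 1) % 4 === 0
}

// The full accepted set under the assumptions above.
const validFrameCounts = []
for (let k = 0; 4 * k + 1 <= MAX_FRAMES; k++) validFrameCounts.push(4 * k + 1)
```

Both frame counts used in the examples (33 and 81) satisfy the rule; 32 or 80 would not.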
Examples:
- examples/generate-video-wan.js (txt2vid @ 832x480, 33 frames)
- examples/img2vid-wan.js (reuses assets/von-neumann.jpg as first frame)
- examples/flf2vid-wan.js (expects flf-first.png / flf-last.png)
Tests:
- test_sd_video_frames.cpp: 12 RAII tests (empty states, destruction
of 4k+1 production sizes, null-pixel tolerance, bounds-checked
operator[], compile-time copy/move deletion).
- test_wan_video.cpp: 12 validation tests reusing the SD2.1 context
to satisfy isLoaded() and exercise every processVideo() guard
before generate_video() runs; plus an opt-in happy-path smoke
test (SD_RUN_WAN_SMOKE=1) gated off by default because ggml-metal
lacks IM2COL_3D for Wan's 3D convs.
Gates: npm run lint, npm run test:dts, npm run build, and the
fast subset of addon-test (178/178) all pass.
Co-authored-by: Cursor <cursoragent@cursor.com>
* feat(diffusion): Wan video tests, ggml overlay, example tuning
Add a vcpkg overlay-port for ggml at vcpkg/ports/ggml/ that pins
tetherto/qvac-ext-ggml @ feature/metal-pr-16669-clean (commit
bc053644). The fork adds Metal kernels for IM2COL_3D and 3-axis
PAD-left, both required by Wan 2.1 / 2.2 video generation; without
them ggml hard-aborts mid-run with "unsupported op 'IM2COL_3D'".
Rationale lives in portfile.cmake -- the overlay is transient and
will be removed once the registry baseline rolls forward.
Add JS test coverage for VideoStableDiffusion:
- test/unit/video-validation.test.js: 63 input-validation cases
mirroring the existing input-validation.test.js pattern.
- test/integration/generate-video-wan.test.js: opt-in
(WAN_INTEGRATION=1) end-to-end T2V smoke test plus sniffAvi
self-tests.
Tune the Wan examples:
- generate-video-wan.js: env-var-driven (PROMPT, FRAMES, STEPS,
SEED, CFG_SCALE, FLOW_SHIFT, ...), inline frame-count cheat
sheet, (4*k+1) pre-flight check, default FRAMES bumped to 81
(Wan 1.3B's native training length).
- img2vid-wan.js, flf2vid-wan.js: flow_shift 5.0 -> 3.0 to match
the upstream test-wan reference scripts.
Refresh the C++ smoke-test gating doc in test_wan_video.cpp to
reflect that Metal works once the overlay is in place.
Drop build.md: the vcpkg overlay rationale already lives next to
the overlay (portfile.cmake header), and transient infrastructure
doesn't earn its own long-form doc.
Co-authored-by: Cursor <cursoragent@cursor.com>
* docs(diffusion-cpp): restore build.md
The earlier deletion conflated build.md with the vcpkg overlay rationale,
but build.md is the package's standalone build guide (prerequisites,
build pipeline, cross-compilation, troubleshooting) and is still the
target of README.md's "Building from Source" link. Restore it from main,
which also picks up the LLVM 19 -> 22 bump.
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(diffusion-cpp): address PR review feedback for Wan video gen
* Flip default video dimensions to 480x832 portrait (phone-screen
friendly). Wan 2.1 T2V 1.3B handles both orientations equally well;
the previous 832x480 landscape default disagreed with the example.
* Document the flow_shift=0 fall-through sentinel in JSDoc, .d.ts, and
C++ struct/handler comments; correct stale "5-8" recommendation to
the actually-used 3.0 (matches example + ref scripts).
* Make video_frames error messages consistent JS<->C++ and list the
full valid set up to 81 (Wan 1.3B native training cap).
* Fix frame-duration arithmetic (33 frames is ~2s @ default 16 fps,
not ~1.3s @ 24 fps).
* Warn when upscaler_* keys are passed to VideoStableDiffusion --
ESRGAN upscale is image-only and was being silently ignored.
* Annotate addon.js end_image / control_frames forwarding to call
out the typed-array transport (avoids JSON byte-array bloat).
* Document the two-level concurrency model around _hasActiveResponse
(the busy guard isn't dead under exclusiveRunQueue -- it covers
overlap between the released queue lock and an in-flight response).
* Update C++ defaults test + JS suggestion-fallback test for the new
portrait orientation.
Co-authored-by: Cursor <cursoragent@cursor.com>
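As a quick check of the frame-duration correction in this commit, the arithmetic is simply frames divided by frame rate. `clipSeconds` is a hypothetical helper used only for illustration.

```javascript
// Clip length in seconds for a given frame count and frame rate.
function clipSeconds (frames, fps) {
  return frames / fps
}

console.log(clipSeconds(33, 16)) // 2.0625
console.log(clipSeconds(33, 24)) // 1.375
console.log(clipSeconds(81, 16)) // 5.0625
```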
* chore(diffusion-cpp): retarget ggml overlay to merged tetherto/qvac-ext-ggml@2026-01-30
The Wan-Metal work that was carried as a local overlay has all landed
upstream on tetherto/qvac-ext-ggml's 2026-01-30 branch:
- bc053644 metal: IM2COL_3D op + PAD left-padding for Wan video (#5)
- 512e1773 cmake: support qvac hybrid backend packaging
(static CPU + dynamic GPU backends, GGML_MAX_NAME prop,
graceful no-OpenCL-device fallback, public ggml-opencl.h
install -- previously six local overlay patches)
- 6d2d24bb / b1923e29 / 05afdc59 metal: tighten IM2COL_3D supports_op
to match the CPU-reference invariants (#6)
Repin vcpkg/ports/ggml from PR #5's head (bc053644) to PR #6's merge
commit (05afdc59) on 2026-01-30, drop all seven local overlay patches
since their content is now upstream verbatim, and bump port-version
102 -> 104 to force a clean rebuild of ggml.
Net diff: +22 / -201; the overlay now exists only as a baseline pin
that overrides the registry's ggml-org/ggml@a8db410a (which still lacks
the Wan-required Metal ops). Once the registry baseline catches up to
a ref containing this work, vcpkg/ports/ggml/ can be deleted entirely.
Verified with npm run build on darwin-arm64: ggml@2026-01-30#104 builds
fresh from 05afdc59 with zero patches applied, addon links and tests
compile, prebuild installed.
Co-authored-by: Cursor <cursoragent@cursor.com>
* chore(diffusion-cpp): drop local ggml overlay now that registry serves 2026-01-30#7
The previous commit (04a6496) repointed the local ggml overlay at the
merge of tetherto/qvac-ext-ggml#6 (05afdc59) so Wan video generation on
Metal would stop aborting with `unsupported op 'IM2COL_3D'`. That same
ref has now been promoted into the registry: tetherto/qvac-registry-vcpkg#134
landed on main as d1b2497b, bumping ggml port-version 6 -> 7 against the
identical REF + SHA512 the overlay was carrying.
This means the diffusion-cpp-local overlay is now strictly redundant --
and slightly behind, since the registry's port-version 7 also picks up
two improvements the overlay didn't have:
- iOS gets `-DGGML_BLAS=OFF -DGGML_ACCELERATE=OFF` to keep the build
off the Apple Accelerate / BLAS path that breaks the iOS toolchain.
- The Android backend-glob now also matches `libqvac-ggml-*.so` in
addition to `libggml-*.so`, so the qvac-prefixed DL backends get
installed alongside the upstream-named ones.
So we delete the entire `vcpkg/ports/ggml/` overlay (portfile.cmake,
vcpkg.json, usage, android-vulkan-version.cmake) and:
- Bump `vcpkg-configuration.json`'s default-registry baseline from
a9eae49a -> d1b2497b (the merge commit of registry PR #134), which
is the first registry SHA that serves ggml@2026-01-30#7.
- Tighten `vcpkg.json`'s ggml constraint from `version>=: 2026-01-30#5`
to `version>=: 2026-01-30#7` so any later baseline bump can't
silently drop us back below the Wan-Metal pin.
The `overlay-ports: ["vcpkg/ports"]` entry and the `vcpkg/ports/.gitkeep`
marker are kept in place so future overlays can be added without a
config flap.
Verified end-to-end on darwin-arm64: clean `npm run build`
(bare-make generate + build + install) with the build/ tree wiped.
vcpkg resolves
ggml[core,metal]:arm64-osx@2026-01-30#7
-- git+https://github.com/tetherto/qvac-registry-vcpkg.git@f1632875...
straight from the registry (no overlay), all 8 ports install in 47s,
the addon links cleanly against the registry-supplied libggml*.a, and
prebuilds/darwin-arm64/qvac__diffusion-cpp.bare is rewritten.
Net diff: +2 / -283.
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(diffusion-cpp): satisfy standard quotes rule in validateVideoFrames
The middle line of the validateVideoFrames Error message was a template
literal with no `${...}` interpolation, so `standard` (configured via
`npm run lint`) flags it as `quotes`:
video.js:39:7: Strings must use singlequote.
Adjacent lines 37, 38 use single quotes, and line 40 legitimately uses
backticks for `${n}`. Just the one stray backtick-string -- swap to
single quotes; no behaviour change.
Sanity-checks job 74830306544 on PR #1879 fails on this single line;
`npm run lint` passes locally after the swap.
Co-authored-by: Cursor <cursoragent@cursor.com>
* diffusion-cpp: enable diffusion FA in examples and fix addon paths
- Set diffusion_fa: true across SD, FLUX, and integration test ImgStableDiffusion
configs so diffusion flash attention matches WAN video examples.
- Pass highNoiseDiffusionModelPath (empty when unset) from index.js so native
createInstance validation succeeds for image mode; document optional
files.highNoiseDiffusionModel in index.d.ts and validate absolute paths.
Co-authored-by: Cursor <cursoragent@cursor.com>
* diffusion-cpp(video): pass esrganPath to native createInstance
VideoStableDiffusion omitted esrganPath while the binding validates it as a
string; mirror image-mode by forwarding files.esrgan or empty string.
Co-authored-by: Cursor <cursoragent@cursor.com>
* diffusion-cpp: align C++ includes and image codec with inference-addon-cpp
- Switch remaining qvac-lib-inference-addon-cpp includes to inference-addon-cpp
(vcpkg installs headers under the shorter prefix).
- Use image_codec::decodeImage / encodeToPng in processVideo after ImageCodec
API rename from decodePng.
Co-authored-by: Cursor <cursoragent@cursor.com>
* diffusion-cpp: apply clang-format to changed C++ sources
Run git-clang-format against ce2ea93 to satisfy the repo formatter on the
video addon, image codec, and Wan tests. No behavior changes.
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(diffusion-cpp/video): address review comments 1-3
1. Use global addonLogging instead of per-instance setLogger/releaseLogger
- Eliminates process-global logger collision (was reintroduced in video.js)
- Mirrors fix from ImgStableDiffusion / EsrganUpscaler
- video.js no longer manages per-instance logger state
2. Reject width/height values <= 0 in JS validation
- Now validates that width > 0 and height > 0 before alignment check
- Error message updated to say "positive multiples of 8"
- Updated test expectations to match new message
3. Validate double values are integers before casting in C++
- All int casts now check std::floor(d) == d first
- Affects: width, height, video_frames, fps handlers
- Prevents silent truncation (e.g. 8.5 -> 8)
All 70 unit tests pass; build/lint/dts all clean.
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(diffusion-cpp/video): address review comments 4-7
4. Validate end_image / control_frames dimensions match video dimensions
- Added dimension checks in processVideo() before generate_video()
- Rejects mismatched frame sizes with clear error messages
- Prevents silent corruption or undefined behavior in native layer
5. Use ImageCodec ownership helper instead of raw free()
- Replaced FrameBuffersGuard with unique_ptr<uint8_t, FreeDeleter>
- Consistent with existing image_codec ownership pattern
- Automatic cleanup on exception; no manual free() calls
6. Regenerate mobile integration test manifest
- Ran npm run test:mobile:generate
- Updated test/mobile/integration.auto.cjs with new runners
7. Add checked buffer size calculation in AviWriter
- Validates width * height overflow before multiplication
- Validates numFrames * bytesPerFrame overflow
- Rejects allocations that would exceed SIZE_MAX
- Prevents silent integer overflow in reserve() call
All 70 unit tests pass; build/lint/dts all clean.
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(diffusion-cpp/video): harden int validation, ownership, AVI overflow
Follow-up tightening on top of the review fixes for #1879.
SdVidGenHandlers:
- Extract a single requireInt() helper used by width / height / video_frames
/ fps / requirePositiveInt. The helper rejects NaN, +/-inf, fractional
doubles, and values outside [INT_MIN, INT_MAX] before static_cast<int>,
so casts to int are always well-defined and no JSON value silently
truncates (e.g. 8.5 -> 8).
- Add explicit <cmath>/<climits> includes that were previously only transitively available.
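The guards described for requireInt() can be mirrored in JavaScript as below. The real helper is C++ and its signature is not shown in this thread, so treat this as a sketch: the function shape, the error types, and the 32-bit INT_MIN / INT_MAX values are assumptions.

```javascript
// JavaScript mirror of the guards described above; the real helper is C++.
// INT_MIN / INT_MAX are assumed to be the usual 32-bit values.
const INT_MIN = -2147483648
const INT_MAX = 2147483647

function requireInt (value, key) {
  if (typeof value !== 'number' || !Number.isFinite(value)) {
    throw new RangeError(`${key} must be a finite number`)
  }
  if (Math.floor(value) !== value) {
    throw new RangeError(`${key} must be an integer, got: ${value}`)
  }
  if (value < INT_MIN || value > INT_MAX) {
    throw new RangeError(`${key} is outside the 32-bit int range`)
  }
  return value
}
```

The order matters: finiteness is checked first, so the later comparisons never see NaN or infinity.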
SdModel::processVideo:
- Replace the bespoke FrameBuffersGuard struct with three plain
unique_ptr<uint8_t, image_codec::FreeDeleter> values (initData / endData
/ controlData). Same lifetime semantics, less custom code, and the
control-frame dimension mismatch path now takes ownership *before* the
check so a throw can no longer leak the freshly-decoded buffer.
AviWriter::encodeFramesToAvi:
- Reserve calculation is now overflow-checked step by step against SIZE_MAX
(width * height, then * 3, then * numFrames) instead of a single multiply
that could wrap.
- Add a hard upper bound at UINT32_MAX (AVI 1.0 RIFF size header is a
uint32_t -- anything past 4 GB cannot be addressed by the spec).
- Re-check the final size before patching the RIFF header in case JPEG
output overshoots the pre-flight estimate.
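A step-wise checked size estimate of the kind described above might look like the following. This is a hypothetical JavaScript sketch, not AviWriter's C++: it only models the 32-bit RIFF cap (JavaScript has no SIZE_MAX), and `checkedMul` / `estimateRawBytes` are illustrative names.

```javascript
// Hypothetical sketch of a step-wise checked size estimate; names are illustrative.
const UINT32_MAX = 0xFFFFFFFF // AVI 1.0 RIFF size field is 32 bits

// Multiply two non-negative integers, failing before the product can exceed `limit`.
function checkedMul (a, b, limit) {
  if (!Number.isSafeInteger(a) || !Number.isSafeInteger(b) || a < 0 || b < 0) {
    throw new RangeError('operands must be non-negative integers')
  }
  if (a !== 0 && b > Math.floor(limit / a)) {
    throw new RangeError('size estimate exceeds limit')
  }
  return a * b
}

// width * height, then * 3 channels, then * numFrames, checked at every step.
function estimateRawBytes (width, height, numFrames) {
  const pixels = checkedMul(width, height, UINT32_MAX)
  const bytesPerFrame = checkedMul(pixels, 3, UINT32_MAX)
  return checkedMul(bytesPerFrame, numFrames, UINT32_MAX)
}
```

Comparing against `limit / a` before multiplying is what keeps the check itself from overflowing in a fixed-width language.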
Tests:
- SdVidGenHandlers: new IntCoercion suite covers fractional doubles,
out-of-int-range doubles, picojson's own NaN/inf rejection at the
JSON layer, and integer-valued doubles (the common case from JSON).
- AviWriter: new tests for the overflow guard and the 4 GB RIFF cap,
both fire before any encoding starts.
- test_wan_video: pin width/height in the existing CorruptControlFrame
test so the new dimension check passes for frame [0] and we still
exercise the decode-failure path at frame [1]. Add two new cases
covering end_image and control_frames dimension mismatch.
All 211 C++ tests, 70 JS unit tests, lint and tsc --dts pass.
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(diffusion-cpp/video): don't eager-require binding via addonLogging
CI sanity-checks (JS unit tests on a runner with no native prebuild)
was crashing with `AddonError: ADDON_NOT_FOUND` because the top-level
`require('./addonLogging')` introduced in e6b13ae transitively pulled
in `binding.js` -> `libqvac__diffusion-cpp.so`. The unit tests only
exercise JS-side validation and never call `load()`, so they used to
work without the prebuilt addon -- this regression broke that.
Match `ImgStableDiffusion` instead: drop the per-instance native
logger plumbing entirely (it's dead code anyway after the e6b13ae
refactor, since `_connectNativeLogger` was no longer called), and
document in the constructor JSDoc that callers wire up native C++
logs once globally via `addonLogging.setLogger(...)`.
Net diff:
- Remove `const addonLogging = require('./addonLogging')` at top.
- Remove `_connectNativeLogger` / `_releaseNativeLogger` methods and
their two stale call sites.
- Remove `LOG_METHODS` (only used by the removed method) and
`this._binding` (used to keep a handle for the removed release
path; the binding is now scoped to `_createAddon` only, matching
`ImgStableDiffusion::_createAddon`).
- JSDoc on `args.logger` now mirrors `index.js` and points users at
`addonLogging.setLogger`.
Verified: JS unit tests 70/70 pass with the prebuilds directory
moved aside, lint clean, tsc --dts clean.
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(diffusion-cpp/video): validate init_image dims; reject unsupported lora
Two reviewer-flagged regressions on PR #1879:
1. blocker (gabrielgrigoras-serv): processVideo() validates dimensions
for end_image and every control_frames[i] but not for init_image.
A caller passing width/height that don't match the decoded init_image
would hand mismatched (width, height) and frame pixel stride to
generate_video(), producing inconsistent frame data downstream
(and risking VAE segfaults).
Fix: add the same dimension check in SdModel.cpp processVideo()
right after the init_image decode, throwing StatusError on
mismatch -- consistent with the existing end_image / control_frames
checks. All three checks now compare against vid.width / vid.height
as the single source of truth for the video's final dimensions.
Ownership of the freshly-decoded init pixel buffer is taken into
the unique_ptr *before* the dim check, mirroring the control_frames
path so a mismatch can't leak the buffer.
2. gianni-cor: params.lora silently dropped on the video path -- video.js
validated it as a non-empty absolute path and video.d.ts advertised
`lora?: string`, but SD_VID_GEN_HANDLERS has no "lora" entry and
SdModel::processVideo never touches sd_vid_gen_params_t::loras, so
any LoRA passed through was swallowed by the unknown-keys branch
in applySdVidGenHandlers and silently produced LoRA-less output.
Fix B applied (reviewer's preferred "out of scope" option):
- video.js: replaced the absolute-path validation with a loud
TypeError('params.lora is not supported for video generation
yet'), so existing callers fail at the JS boundary instead of
getting silent LoRA-less output.
- video.d.ts: dropped `lora?: string` from VideoGenerationParams.
- video-validation.test.js: collapsed the four old lora cases
(empty / non-string / relative / absolute) into one parametrised
test that asserts the new TypeError fires for every shape, so a
future re-introduction of the JS validation can't bring back the
silent-drop regression.
When LoRA-on-video is wired through native (mirror of processImage's
prepareLoras() + sd_img_gen_params_t::loras), the right path is to
restore the absolute-path validation here and add a "lora" handler
to SD_VID_GEN_HANDLERS, NOT to revert the d.ts.
C++ test changes:
- new Img2VidRejectsInitImageWithWrongDimensions covers the blocker.
- Flf2VidRejectsCorruptEndImage pinned width/height to 64 so the new
init dim check passes for the 64x64 init and we still reach the
intended end-decode-failure path (same approach as the existing
Img2VidRejectsCorruptControlFrame fixture).
Verified: 67/67 JS unit tests pass with and without prebuilds, 176/176
C++ tests pass (1 opt-in Wan smoke skipped, requires ~8GB weights),
lint and tsc --dts clean.
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(diffusion-cpp): regression + 7 review-batch fixes (NaN/Inf guards, cancel, etc.)
Addresses all 8 outstanding comments on PR #1879 (one regression from
commit 59f2663 plus a CHANGES_REQUESTED batch of seven items). Major
points below; per-file rationale in the inline comments.
== Regression fix (highest priority)
* gianni-cor flagged that the new init_image strict-equality check from
commit 59f2663 rejects every off-grid frame with a confusing error
citing wrapper-picked dims. Root cause: addon.js _fillDimsFromImage
was silently doing Math.ceil(d/8)*8, so a 100x100 init_image got
dispatched as 104x104 and the native check then threw "100x100 != 104x104"
-- citing a value the caller never passed. Fixes:
- addon.js _fillDimsFromImage now passes dims through verbatim
(no rounding). The image SDEdit path already realigns internally
(SdModel.cpp ~600) and the FLUX2 ref path uses
auto_resize_ref_image, so dropping the rounding is safe across
every path.
- video.js _runInternal pre-empts the cryptic native error with a
JS-layer off-grid probe: when width/height aren't explicit it
reads init_image / end_image / control_frames[i] dimensions and
throws a clear "your image is off-grid, pre-align or pass explicit
dims" message naming the exact buffer.
- Removes the ceil-vs-round inconsistency wart between
_fillDimsFromImage (ceil) and the user-facing validator (round).
- Three new JS regression tests for off-grid init / end / control,
plus one positive test for explicit aligned dims overriding the
probe.
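The off-grid probe described above boils down to a multiple-of-8 check that names the offending buffer. A minimal sketch follows, assuming the dimensions have already been read from the image; `assertOnGrid` and the message text are hypothetical, not what video.js emits.

```javascript
// Illustrative only: assertOnGrid is a hypothetical name, and the message text is
// not the one used in video.js.
function assertOnGrid (bufferName, width, height) {
  if (width % 8 !== 0 || height % 8 !== 0) {
    throw new RangeError(
      `${bufferName} is ${width}x${height}, which is not a multiple of 8; ` +
      'pre-align the image or pass explicit width/height'
    )
  }
}
```

With this shape, a 100x100 init_image fails with a message citing 100x100, the value the caller actually supplied.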
== JS hardening
* params.prompt is documented Required but was never validated --
undefined / "" / 42 each produced a different failure mode (silent
noise, silent noise, far-away C++ error). video.js now throws a loud
TypeError at the wrapper boundary. Four new prompt-validation tests.
* mapAddonEvent JobEnded fallback accepted every typed-array view --
works today only because uint8_t is the sole registered
TypedArrayOutputHandler. When frameCallback (SdModel.hpp:139) gets
wired through to JS, every per-frame event would have been
misclassified as JobEnded and the response stream would have closed
after the first frame. One-token fix: add `&& !ArrayBuffer.isView(rawData)`
to the discriminator. ArrayBuffer.isView is true for every TypedArray
+ DataView, false for plain objects -- exactly the discrimination
needed for the runtime-stats POJO.
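The behaviour of `ArrayBuffer.isView` that this fix relies on can be seen in a small sketch. `isStatsObject` is a hypothetical name for the discriminator, not the function in addon.js.

```javascript
// Hypothetical discriminator; isStatsObject is an illustrative name, not the
// function in addon.js.
function isStatsObject (rawData) {
  return typeof rawData === 'object' && rawData !== null && !ArrayBuffer.isView(rawData)
}

console.log(isStatsObject({ steps: 20 })) // true
console.log(isStatsObject(new Uint8Array(4))) // false
console.log(isStatsObject(new Float32Array(4))) // false
console.log(isStatsObject(new DataView(new ArrayBuffer(4)))) // false
```

Note that a bare `ArrayBuffer` is not a view, so this sketch would still classify it as a plain object.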
== C++ parser hardening (NaN / Inf / int64 / range)
* Promoted requireInt from SdVidGenHandlers.cpp's anonymous namespace
into parsers::, and added two siblings:
- requireFiniteFloat: rejects NaN / +inf / -inf before the float
cast (NaN compares false against every bound, so range checks
of the form `f < lo || f > hi` previously let it sneak through).
- requireInt64: same finite + integer guards as requireInt, range
check against representable [INT64_MIN, INT64_MAX] doubles.
- requireFiniteFloatInRange: convenience wrapper for [lo, hi] checks.
* Routed every previously-vulnerable cast through the new helpers:
- SdVidGenHandlers.cpp: seed (int64), cfg_scale, flow_shift,
high_noise_cfg_scale, high_noise_flow_shift, vae_tile_overlap,
cache_threshold, moe_boundary, strength, vace_strength
- SdGenHandlers.cpp (image path, reviewer asked for symmetric fix):
eta, cfg_scale, guidance, img_cfg_scale, seed, batch_count,
strength, clip_skip, vae_tile_overlap, cache_threshold, width,
height, steps, parseUpscaleRepeats
* parseVaeTileSize (SdParsers.cpp): numeric form now routes through
requireInt (rejects NaN/Inf/fractional/out-of-range), and BOTH
forms (numeric and "WxH" string) now reject <= 0. Five new tests.
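The NaN pitfall described above is not specific to C++; the same comparison semantics apply in JavaScript, which makes for a compact demonstration. Both function names below are hypothetical and exist only for illustration.

```javascript
// naiveInRange reproduces the pitfall: every comparison with NaN is false,
// so the "out of range" branch never fires for NaN.
function naiveInRange (f, lo, hi) {
  if (f < lo || f > hi) return false
  return true
}

// finiteInRange rejects NaN and +/-Infinity before the range comparison.
function finiteInRange (f, lo, hi) {
  return Number.isFinite(f) && f >= lo && f <= hi
}

console.log(naiveInRange(NaN, 0, 1)) // true (NaN slips through)
console.log(finiteInRange(NaN, 0, 1)) // false
console.log(finiteInRange(0.5, 0, 1)) // true
```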
== Cancellation gap + typed status
* In SdModel.cpp, processVideo checked cancelRequested_ exactly once,
after generate_video() returned -- the slow tail (per-frame PNG
fan-out + AVI mux, multi-second on 81-frame 832x480 videos) had no
cancellation visibility. Added 2 checks: at the top of the frame-callback
loop body, and immediately before encodeFramesToAvi.
* Switched both "Job cancelled" throws (image path at SdModel.cpp:730,
video path at :987, plus the 2 new C1 sites) from bare
std::runtime_error to StatusError tagged with
localCodeMsg="Cancelled", so the JS layer can discriminate cancel
from real internal failures via codeString() ("[ General :: Cancelled ]")
instead of string-matching the exception message.
Note: this PR deliberately does NOT add `Cancelled = 6` to the
shared inference-addon-cpp Errors.hpp enum, because that header
ships via vcpkg to every package in the monorepo and a cross-package
coordinated change is out of scope. Instead we use the 3-arg
StatusError ctor (addonId, localCodeMsg, errorMsg) which produces
the same codeString without touching the shared enum. When the
enum is updated later, the 4 call sites can switch to the 2-arg
ctor in a one-line follow-up.
== C5 (preview_*) -- product decision deferred
* The header comment at SdCtxHandlers.hpp:112 claimed preview_mode et
al are "Wired to sd_set_preview_callback() in SdModel::process()",
but a grep across packages/diffusion-cpp for sd_set_preview_callback
returns zero matches -- the four config keys are validated and stored
but the upstream callback is never installed, so they're a silent
no-op end-to-end. Downgraded the misleading comment to an explicit
TODO(QVAC-18026 follow-up) documenting the gap and the two viable
resolution paths (wire it up alongside sd_set_abort_callback, OR
remove the handlers + fields + tests). Reviewer asked which path is
intended; this commit picks neither and just stops claiming the
wiring exists. The choice can land in a separate PR without holding
this one up.
== Test surface
* +8 JS tests (prompt validation x4, off-grid probe x4)
* +5 C++ tests (vae_tile_size zero/negative/fractional/out-of-range
rejection, plus the existing IntCoercion suite carried over to the
promoted helpers transparently)
* Cancel-context test updated to assert the typed
"[ General :: Cancelled ]" codeString in addition to the message.
Verified locally:
JS unit tests: 75/75 pass with prebuild, 75/75 also without
(CI sanity-checks mode, no native binary loaded)
C++ unit tests: 209/210 pass, 1 opt-in skip
(SdWanHappyPathTest needs ~8GB Wan weights)
npm run lint: clean
npm run test:dts: clean
Co-authored-by: Cursor <cursoragent@cursor.com>
* chore(diffusion-cpp): release 0.8.0
Bumps @qvac/diffusion-cpp to 0.8.0 and documents the Wan 2.1 / Wan 2.2
video pipeline shipped since 0.7.0: new VideoStableDiffusion class
(txt2vid / img2vid / flf2vid), MoE high-noise expert routing, streaming
MJPG AVI muxer, refactored download helpers + Wan model script, plus
the supporting JS + C++ test coverage and validation hardening.
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(diffusion-cpp): re-align auto-detected img dims to multiple of 8
_fillDimsFromImage was passing raw image dimensions through verbatim
since fe4d10f, but the native SdGenHandlers validates width/height
% 8 == 0 before the downstream alignment in SdModel::processImage
ever runs. Any img2img call with a non-aligned source image (e.g.
the bundled 500x627 von-neumann.jpg used by the FLUX2 i2i integration
test) therefore failed with:
height must be a positive multiple of 8, got: 627
Restore the Math.ceil(d/8)*8 round-up that was removed in fe4d10f.
The original motivation for the removal -- avoiding a spurious dim
mismatch on the video path where processVideo strict-compares decoded
frame dims against vid.width/vid.height -- is already handled at the
JS layer by VideoStableDiffusion's off-grid pre-validation in
video.js, which runs before this helper and rejects unaligned
init/end/control frames with a clear caller-facing error. The ceil()
is therefore a no-op on the video path.
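A minimal sketch of the restored round-up, assuming a standalone helper (the name `alignUp8` is hypothetical; the body of the real `_fillDimsFromImage` is not shown here):

```javascript
// Hypothetical helper illustrating the Math.ceil(d/8)*8 round-up.
function alignUp8 (d) {
  return Math.ceil(d / 8) * 8
}

// The 500x627 source image from the failing FLUX2 i2i test.
const dims = { width: alignUp8(500), height: alignUp8(627) }
console.log(dims) // { width: 504, height: 632 }

// Already-aligned dims pass through unchanged.
console.log(alignUp8(480)) // 480
```

For dimensions that are already multiples of 8 the helper returns its input unchanged.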
Co-authored-by: Cursor <cursoragent@cursor.com>
* style(diffusion-cpp): apply clang-format to drifted C++ sources
cpp-lint surfaced clang-format drift in 4 files that accumulated
across recent Wan-video commits. No semantic changes -- only
mechanical line-wrap / arg-break placement to match the project's
.clang-format.
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(diffusion-cpp/test): use package export for video module in wan integration test
The generate-video-wan.test.js test was using a relative import
(require('../../video')) that breaks when test files are bundled
and relocated to the test-framework backend directory during mobile
test setup.
Change to the package export pattern (@qvac/diffusion-cpp/video)
used by other integration tests, which remains valid regardless of
file location.
Fixes: https://github.com/tetherto/qvac/actions/runs/25929776543/job/76221440417
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(diffusion-cpp): expose video API from package root
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(diffusion-cpp): repair variable names in SdModel after merge
Co-authored-by: Cursor <cursoragent@cursor.com>
* style(diffusion-cpp): apply git-clang-format
Co-authored-by: Cursor <cursoragent@cursor.com>
---------
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: gianni-cor <gianfranco.cordella@tether.io>
Proletter pushed a commit to tetherto/qvac that referenced this pull request on May 24, 2026
…s 2026-01-30#7
The previous commit (0f5c522) repointed the local ggml overlay at the
merge of tetherto/qvac-ext-ggml#6 (05afdc59) so Wan video generation on
Metal would stop aborting with `unsupported op 'IM2COL_3D'`. That same
ref has now been promoted into the registry: tetherto/qvac-registry-vcpkg#134
landed on main as d1b2497b, bumping ggml port-version 6 -> 7 against the
identical REF + SHA512 the overlay was carrying.
This means the diffusion-cpp-local overlay is now strictly redundant --
and slightly behind, since the registry's port-version 7 also picks up
two improvements the overlay didn't have:
- iOS gets `-DGGML_BLAS=OFF -DGGML_ACCELERATE=OFF` to keep the build
off the Apple Accelerate / BLAS path that breaks the iOS toolchain.
- The Android backend-glob now also matches `libqvac-ggml-*.so` in
addition to `libggml-*.so`, so the qvac-prefixed DL backends get
installed alongside the upstream-named ones.
So we delete the entire `vcpkg/ports/ggml/` overlay (portfile.cmake,
vcpkg.json, usage, android-vulkan-version.cmake) and:
- Bump `vcpkg-configuration.json`'s default-registry baseline from
a9eae49a -> d1b2497b (the merge commit of registry PR #134), which
is the first registry SHA that serves ggml@2026-01-30#7.
- Tighten `vcpkg.json`'s ggml constraint from `version>=: 2026-01-30#5`
to `version>=: 2026-01-30#7` so any later baseline bump can't
silently drop us back below the Wan-Metal pin.
The `overlay-ports: ["vcpkg/ports"]` entry and the `vcpkg/ports/.gitkeep`
marker are kept in place so future overlays can be added without a
config flap.
Verified end-to-end on darwin-arm64: clean `npm run build`
(bare-make generate + build + install) with the build/ tree wiped.
vcpkg resolves
ggml[core,metal]:arm64-osx@2026-01-30#7
-- git+https://github.com/tetherto/qvac-registry-vcpkg.git@f1632875...
straight from the registry (no overlay), all 8 ports install in 47s,
the addon links cleanly against the registry-supplied libggml*.a, and
prebuilds/darwin-arm64/qvac__diffusion-cpp.bare is rewritten.
Net diff: +2 / -283.
Co-authored-by: Cursor <cursoragent@cursor.com>
Proletter pushed a commit to tetherto/qvac that referenced this pull request on May 24, 2026
* feat(diffusion): refactor download scripts and add Wan 2.1 support
- Extract shared dl() function into reusable dl-functions.sh module
- Update all download-model-*.sh scripts to source shared utilities
- Add download-model-wan.sh for Wan 2.1 video generation models
- Reduces code duplication and improves maintainability
Wan 2.1 downloads (~8.3 GB):
- wan2.1_t2v_1.3B_fp16.safetensors (diffusion model)
- wan_2.1_vae.safetensors (VAE encoder/decoder)
- umt5_xxl_fp16.safetensors (text encoder)
Co-authored-by: Cursor <cursoragent@cursor.com>
* feat(diffusion): Wan video foundation -- ctx/vid handlers, AVI muxer, shared parsers
Phase 1-4 of Wan 2.1 / 2.2 video generation support in the diffusion-cpp
addon. Configuration + parsing layer only; dispatch + callback plumbing +
JS surface land in follow-up commits on this branch.
SdCtxConfig:
- Add highNoiseDiffusionModelPath for Wan 2.2 MoE high-noise expert
(leave empty for Wan 2.1 and all non-Wan models)
- Add previewMode / previewInterval / previewDenoised / previewNoisy
for optional mid-denoising preview frames via sd_set_preview_callback
- Wire both through SdCtxHandlers (new JS keys: preview_mode,
preview_interval, preview_denoised, preview_noisy) and AddonJs
(highNoiseDiffusionModelPath in args map)
AviWriter (new utility):
- addon/src/utils/AviWriter.{hpp,cpp} ports the upstream avi_writer.h
MJPG encoder onto an in-memory std::vector<uint8_t> sink (no stdio,
no temp files) so video bytes flow through the existing
OutputCallBackJs queue
- Full input validation (numFrames, fps, jpegQuality, channel count,
frame homogeneity, null data) -- StatusError on any rejection
SdParsers (new shared module):
- Extract parseSampler / parseScheduler / parseCacheMode /
parseVaeTileSize / parseCachePreset / requireNum/Str/Bool from
SdGenHandlers into addon/src/handlers/SdParsers.{hpp,cpp}
- Reused by both SdGenHandlers (image) and SdVidGenHandlers (video)
SdVidGenHandlers (new):
- SdVidGenConfig struct with full Wan 2.1 + 2.2 surface: mode
(txt2vid/img2vid/flf2vid), prompts, dimensions, videoFrames (4k+1
validated), fps, seed, low-noise expert sample params, high-noise
expert sample params, moeBoundary, strength, vaceStrength, VAE
tiling, cache mode/preset/threshold
- 22 JSON handlers with validation for each field
Tests (all pass):
- 5 new SdCtxHandlers tests for preview_* + high_noise path default
- 18 new AviWriter tests covering happy path, RIFF header structure,
all validation rejections, JPEG round-trip
- 54 new SdVidGenHandlers tests covering every field + integration
payload + defaults
- Zero regressions across existing 144 fast-unit tests
No user-facing JS API changes yet.
Co-authored-by: Cursor <cursoragent@cursor.com>
* feat(diffusion): Wan video generation -- dispatch, processVideo, JS wrapper + examples
Builds on the Wan foundation commit by wiring the video path end-to-end
from JS to C++ and back. Adds txt2vid / img2vid / flf2vid generation
via a new VideoStableDiffusion class that shares the single native
binding with the existing ImgStableDiffusion class.
Native:
- SdModel::process() dispatches on the JSON "mode" field to
processImage() (existing) or the new processVideo() path.
- processVideo() applies SdVidGenHandlers, validates mode-vs-inputs
invariants (img2vid requires init_image; flf2vid requires both;
txt2vid rejects both; end_image only valid on flf2vid), decodes
init/end/control frames, fills sd_vid_gen_params_t, and encodes
the returned sd_image_t* sequence to an in-memory MJPG AVI.
- SdVideoFrames RAII wrapper extracted to addon/src/utils/ so it
can be unit-tested without a loaded model.
- GenerationJob grows endImageBytes and controlFramesBytes plus an
optional per-frame frameCallback (unused from JS in this PR;
reserved for the preview follow-up).
- AddonJs::runJob reads endImageBuffer (single Uint8Array) and
controlFramesBuffers (Array of Uint8Array) as typed-array args,
no JSON encoding.
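The mode-vs-inputs invariants listed above are enforced natively in processVideo(); the rules themselves can be sketched in JS as below. The function name, argument shape, and error wording are assumptions made for illustration, not the addon's actual code.

```javascript
// Hypothetical transcription of the mode-vs-inputs rules.
function checkModeInputs (mode, inputs = {}) {
  const hasInit = inputs.initImage != null
  const hasEnd = inputs.endImage != null
  switch (mode) {
    case 'txt2vid':
      // txt2vid rejects both frames.
      if (hasInit || hasEnd) {
        throw new TypeError('txt2vid does not accept init_image or end_image')
      }
      break
    case 'img2vid':
      if (!hasInit) throw new TypeError('img2vid requires init_image')
      // end_image is only valid on flf2vid.
      if (hasEnd) throw new TypeError('end_image is only valid for flf2vid')
      break
    case 'flf2vid':
      if (!hasInit || !hasEnd) {
        throw new TypeError('flf2vid requires both init_image and end_image')
      }
      break
    default:
      throw new TypeError(`unknown mode: ${mode}`)
  }
}

const frame = new Uint8Array([1, 2, 3])
checkModeInputs('flf2vid', { initImage: frame, endImage: frame }) // passes
```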
JS surface:
- video.js / video.d.ts: new VideoStableDiffusion class with
full per-mode validation, 4k+1 frame-count rule, fps range,
moe_boundary range, Uint8Array type checks, and warning when
high_noise_* params are set without files.highNoiseDiffusionModel.
- addon.js: SdInterface.runJob threads end_image and control_frames
through to the native runJob without round-tripping through JSON.
- index.js / index.d.ts: unchanged -- image wrapper continues to
work exactly as before. Both classes compose the same SdInterface
and hit the same binding.cpp entry points.
- package.json: exports "./video", ships video.js / video.d.ts,
adds generate:video / generate:img2vid / generate:flf2vid scripts.
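The 4k+1 frame-count rule mentioned above can be sketched like this. The helper names are hypothetical, and treating k = 0 (a single frame) as valid is an assumption; the real validator in video.js may draw the lower bound elsewhere.

```javascript
// Hypothetical check: n must be an integer of the form 4k+1 with k >= 0.
function isValidFrameCount (n) {
  return Number.isInteger(n) && n >= 1 && (n - 1) % 4 === 0
}

// Enumerate the valid counts up to a cap, e.g. for an error message.
function validFrameCounts (max) {
  const out = []
  for (let n = 1; n <= max; n += 4) out.push(n)
  return out
}

console.log(isValidFrameCount(33)) // true
console.log(isValidFrameCount(32)) // false
console.log(validFrameCounts(17)) // [ 1, 5, 9, 13, 17 ]
```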
Examples:
- examples/generate-video-wan.js (txt2vid @ 832x480, 33 frames)
- examples/img2vid-wan.js (reuses assets/von-neumann.jpg as first frame)
- examples/flf2vid-wan.js (expects flf-first.png / flf-last.png)
Tests:
- test_sd_video_frames.cpp: 12 RAII tests (empty states, destruction
of 4k+1 production sizes, null-pixel tolerance, bounds-checked
operator[], compile-time copy/move deletion).
- test_wan_video.cpp: 12 validation tests reusing the SD2.1 context
to satisfy isLoaded() and exercise every processVideo() guard
before generate_video() runs; plus an opt-in happy-path smoke
test (SD_RUN_WAN_SMOKE=1) gated off by default because ggml-metal
lacks IM2COL_3D for Wan's 3D convs.
Gates: npm run lint, npm run test:dts, npm run build, and the
fast subset of addon-test (178/178) all pass.
Co-authored-by: Cursor <cursoragent@cursor.com>
* feat(diffusion): Wan video tests, ggml overlay, example tuning
Add a vcpkg overlay-port for ggml at vcpkg/ports/ggml/ that pins
tetherto/qvac-ext-ggml @ feature/metal-pr-16669-clean (commit
bc053644). The fork adds Metal kernels for IM2COL_3D and 3-axis
PAD-left, both required by Wan 2.1 / 2.2 video generation; without
them ggml hard-aborts mid-run with "unsupported op 'IM2COL_3D'".
Rationale lives in portfile.cmake -- the overlay is transient and
will be removed once the registry baseline rolls forward.
Add JS test coverage for VideoStableDiffusion:
- test/unit/video-validation.test.js: 63 input-validation cases
mirroring the existing input-validation.test.js pattern.
- test/integration/generate-video-wan.test.js: opt-in
(WAN_INTEGRATION=1) end-to-end T2V smoke test plus sniffAvi
self-tests.
Tune the Wan examples:
- generate-video-wan.js: env-var-driven (PROMPT, FRAMES, STEPS,
SEED, CFG_SCALE, FLOW_SHIFT, ...), inline frame-count cheat
sheet, (4*k+1) pre-flight check, default FRAMES bumped to 81
(Wan 1.3B's native training length).
- img2vid-wan.js, flf2vid-wan.js: flow_shift 5.0 -> 3.0 to match
the upstream test-wan reference scripts.
Refresh the C++ smoke-test gating doc in test_wan_video.cpp to
reflect that Metal works once the overlay is in place.
Drop build.md: the vcpkg overlay rationale already lives next to
the overlay (portfile.cmake header), and transient infrastructure
doesn't earn its own long-form doc.
Co-authored-by: Cursor <cursoragent@cursor.com>
* docs(diffusion-cpp): restore build.md
The earlier deletion conflated build.md with the vcpkg overlay rationale,
but build.md is the package's standalone build guide (prerequisites,
build pipeline, cross-compilation, troubleshooting) and is still the
target of README.md's "Building from Source" link. Restore it from main,
which also picks up the LLVM 19 -> 22 bump.
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(diffusion-cpp): address PR review feedback for Wan video gen
* Flip default video dimensions to 480x832 portrait (phone-screen
friendly). Wan 2.1 T2V 1.3B handles both orientations equally well;
the previous 832x480 landscape default disagreed with the example.
* Document the flow_shift=0 fall-through sentinel in JSDoc, .d.ts, and
C++ struct/handler comments; correct stale "5-8" recommendation to
the actually-used 3.0 (matches example + ref scripts).
* Make video_frames error messages consistent JS<->C++ and list the
full valid set up to 81 (Wan 1.3B native training cap).
* Fix frame-duration arithmetic (33 frames is ~2s @ default 16 fps,
not ~1.3s @ 24 fps).
* Warn when upscaler_* keys are passed to VideoStableDiffusion --
ESRGAN upscale is image-only and was being silently ignored.
* Annotate addon.js end_image / control_frames forwarding to call
out the typed-array transport (avoids JSON byte-array bloat).
* Document the two-level concurrency model around _hasActiveResponse
(the busy guard isn't dead under exclusiveRunQueue -- it covers
overlap between the released queue lock and an in-flight response).
* Update C++ defaults test + JS suggestion-fallback test for the new
portrait orientation.
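The frame-duration correction is plain division of frame count by frame rate. A tiny sketch (the helper name is hypothetical):

```javascript
// Clip length in seconds for a given frame count and frame rate.
function clipSeconds (frames, fps) {
  return frames / fps
}

console.log(clipSeconds(33, 16)) // 2.0625 -> "~2s @ 16 fps"
console.log(clipSeconds(33, 24)) // 1.375  -> the stale "~1.3s @ 24 fps" figure
```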
Co-authored-by: Cursor <cursoragent@cursor.com>
* chore(diffusion-cpp): retarget ggml overlay to merged tetherto/qvac-ext-ggml@2026-01-30
The Wan-Metal work that was carried as a local overlay has all landed
upstream on tetherto/qvac-ext-ggml's 2026-01-30 branch:
- bc053644 metal: IM2COL_3D op + PAD left-padding for Wan video (#5)
- 512e1773 cmake: support qvac hybrid backend packaging
(static CPU + dynamic GPU backends, GGML_MAX_NAME prop,
graceful no-OpenCL-device fallback, public ggml-opencl.h
install -- previously six local overlay patches)
- 6d2d24bb / b1923e29 / 05afdc59 metal: tighten IM2COL_3D supports_op
to match the CPU-reference invariants (#6)
Repin vcpkg/ports/ggml from PR #5's head (bc053644) to PR #6's merge
commit (05afdc59) on 2026-01-30, drop all seven local overlay patches
since their content is now upstream verbatim, and bump port-version
102 -> 104 to force a clean rebuild of ggml.
Net diff: +22 / -201; the overlay now exists only as a baseline pin
that overrides the registry's ggml-org/ggml@a8db410a (which still lacks
the Wan-required Metal ops). Once the registry baseline catches up to
a ref containing this work, vcpkg/ports/ggml/ can be deleted entirely.
Verified with npm run build on darwin-arm64: ggml@2026-01-30#104 builds
fresh from 05afdc59 with zero patches applied, addon links and tests
compile, prebuild installed.
Co-authored-by: Cursor <cursoragent@cursor.com>
* chore(diffusion-cpp): drop local ggml overlay now that registry serves 2026-01-30#7
The previous commit (04a6496) repointed the local ggml overlay at the
merge of tetherto/qvac-ext-ggml#6 (05afdc59) so Wan video generation on
Metal would stop aborting with `unsupported op 'IM2COL_3D'`. That same
ref has now been promoted into the registry: tetherto/qvac-registry-vcpkg#134
landed on main as d1b2497b, bumping ggml port-version 6 -> 7 against the
identical REF + SHA512 the overlay was carrying.
This means the diffusion-cpp-local overlay is now strictly redundant --
and slightly behind, since the registry's port-version 7 also picks up
two improvements the overlay didn't have:
- iOS gets `-DGGML_BLAS=OFF -DGGML_ACCELERATE=OFF` to keep the build
off the Apple Accelerate / BLAS path that breaks the iOS toolchain.
- The Android backend-glob now also matches `libqvac-ggml-*.so` in
addition to `libggml-*.so`, so the qvac-prefixed DL backends get
installed alongside the upstream-named ones.
So we delete the entire `vcpkg/ports/ggml/` overlay (portfile.cmake,
vcpkg.json, usage, android-vulkan-version.cmake) and:
- Bump `vcpkg-configuration.json`'s default-registry baseline from
a9eae49a -> d1b2497b (the merge commit of registry PR #134), which
is the first registry SHA that serves ggml@2026-01-30#7.
- Tighten `vcpkg.json`'s ggml constraint from `version>=: 2026-01-30#5`
to `version>=: 2026-01-30#7` so any later baseline bump can't
silently drop us back below the Wan-Metal pin.
The `overlay-ports: ["vcpkg/ports"]` entry and the `vcpkg/ports/.gitkeep`
marker are kept in place so future overlays can be added without a
config flap.
Verified end-to-end on darwin-arm64: clean `npm run build`
(bare-make generate + build + install) with the build/ tree wiped.
vcpkg resolves
ggml[core,metal]:arm64-osx@2026-01-30#7
-- git+https://github.com/tetherto/qvac-registry-vcpkg.git@f1632875...
straight from the registry (no overlay), all 8 ports install in 47s,
the addon links cleanly against the registry-supplied libggml*.a, and
prebuilds/darwin-arm64/qvac__diffusion-cpp.bare is rewritten.
Net diff: +2 / -283.
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(diffusion-cpp): satisfy standard quotes rule in validateVideoFrames
The middle line of the validateVideoFrames Error message was a template
literal with no `${...}` interpolation, so `standard` (configured via
`npm run lint`) flags it as `quotes`:
video.js:39:7: Strings must use singlequote.
Adjacent lines 37, 38 use single quotes, and line 40 legitimately uses
backticks for `${n}`. Just the one stray backtick-string -- swap to
single quotes; no behaviour change.
Sanity-checks job 74830306544 on PR #1879 fails on this single line;
`npm run lint` passes locally after the swap.
Co-authored-by: Cursor <cursoragent@cursor.com>
* diffusion-cpp: enable diffusion FA in examples and fix addon paths
- Set diffusion_fa: true across SD, FLUX, and integration test ImgStableDiffusion
configs so diffusion flash attention matches WAN video examples.
- Pass highNoiseDiffusionModelPath (empty when unset) from index.js so native
createInstance validation succeeds for image mode; document optional
files.highNoiseDiffusionModel in index.d.ts and validate absolute paths.
Co-authored-by: Cursor <cursoragent@cursor.com>
* diffusion-cpp(video): pass esrganPath to native createInstance
VideoStableDiffusion omitted esrganPath while the binding validates it as a
string; mirror image-mode by forwarding files.esrgan or empty string.
Co-authored-by: Cursor <cursoragent@cursor.com>
* diffusion-cpp: align C++ includes and image codec with inference-addon-cpp
- Switch remaining qvac-lib-inference-addon-cpp includes to inference-addon-cpp
(vcpkg installs headers under the shorter prefix).
- Use image_codec::decodeImage / encodeToPng in processVideo after ImageCodec
API rename from decodePng.
Co-authored-by: Cursor <cursoragent@cursor.com>
* diffusion-cpp: apply clang-format to changed C++ sources
Run git-clang-format against 2c4dc65 to satisfy the repo formatter on the
video addon, image codec, and Wan tests. No behavior changes.
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(diffusion-cpp/video): address review comments 1-3
1. Use global addonLogging instead of per-instance setLogger/releaseLogger
- Eliminates process-global logger collision (was reintroduced in video.js)
- Mirrors fix from ImgStableDiffusion / EsrganUpscaler
- video.js no longer manages per-instance logger state
2. Reject width/height values <= 0 in JS validation
- Now validates that width > 0 and height > 0 before alignment check
- Error message updated to say "positive multiples of 8"
- Updated test expectations to match new message
3. Validate double values are integers before casting in C++
- All int casts now check std::floor(d) == d first
- Affects: width, height, video_frames, fps handlers
- Prevents silent truncation (e.g. 8.5 -> 8)
All 70 unit tests pass; build/lint/dts all clean.
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(diffusion-cpp/video): address review comments 4-7
4. Validate end_image / control_frames dimensions match video dimensions
- Added dimension checks in processVideo() before generate_video()
- Rejects mismatched frame sizes with clear error messages
- Prevents silent corruption or undefined behavior in native layer
5. Use ImageCodec ownership helper instead of raw free()
- Replaced FrameBuffersGuard with unique_ptr<uint8_t, FreeDeleter>
- Consistent with existing image_codec ownership pattern
- Automatic cleanup on exception; no manual free() calls
6. Regenerate mobile integration test manifest
- Ran npm run test:mobile:generate
- Updated test/mobile/integration.auto.cjs with new runners
7. Add checked buffer size calculation in AviWriter
- Validates width * height overflow before multiplication
- Validates numFrames * bytesPerFrame overflow
- Rejects allocations that would exceed SIZE_MAX
- Prevents silent integer overflow in reserve() call
All 70 unit tests pass; build/lint/dts all clean.
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(diffusion-cpp/video): harden int validation, ownership, AVI overflow
Follow-up tightening on top of the review fixes for #1879.
SdVidGenHandlers:
- Extract a single requireInt() helper used by width / height / video_frames
/ fps / requirePositiveInt. The helper rejects NaN, +/-inf, fractional
doubles, and values outside [INT_MIN, INT_MAX] before static_cast<int>,
so casts to int are always well-defined and no JSON value silently
truncates (e.g. 8.5 -> 8).
- Add explicit <cmath>/<climits> includes that were previously only
transitively available.
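The real requireInt() is a C++ helper that guards a static_cast<int>. The same sequence of guards can be sketched in JS as below; the error wording and the 32-bit bounds are assumptions for illustration.

```javascript
// Assumed 32-bit int bounds, standing in for C++ INT_MIN / INT_MAX.
const INT_MIN = -2147483648
const INT_MAX = 2147483647

// Hypothetical JS analogue of requireInt(): finite, integral, in int range.
function requireInt (name, value) {
  if (typeof value !== 'number' || !Number.isFinite(value)) {
    throw new RangeError(`${name} must be a finite number`)
  }
  if (!Number.isInteger(value)) {
    throw new RangeError(`${name} must be an integer, got: ${value}`)
  }
  if (value < INT_MIN || value > INT_MAX) {
    throw new RangeError(`${name} is out of int range, got: ${value}`)
  }
  return value
}

console.log(requireInt('width', 480)) // 480
```

Checking finiteness first matters because NaN and infinities would otherwise reach the range comparison, where NaN compares false against both bounds.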
SdModel::processVideo:
- Replace the bespoke FrameBuffersGuard struct with three plain
unique_ptr<uint8_t, image_codec::FreeDeleter> values (initData / endData
/ controlData). Same lifetime semantics, less custom code, and the
control-frame dimension mismatch path now takes ownership *before* the
check so a throw can no longer leak the freshly-decoded buffer.
AviWriter::encodeFramesToAvi:
- Reserve calculation is now step-wise overflow-checked against SIZE_MAX
(width * height, then * 3, then * numFrames) instead of a single
multiply that could wrap.
- Add a hard upper bound at UINT32_MAX (AVI 1.0 RIFF size header is a
uint32_t -- anything past 4 GB cannot be addressed by the spec).
- Re-check the final size before patching the RIFF header in case JPEG
output overshoots the pre-flight estimate.
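A rough JS transposition of the pre-flight size check, for illustration only. The native code checks against SIZE_MAX on size_t; here Number.isSafeInteger stands in for that guard, and the function names are hypothetical.

```javascript
const UINT32_MAX = 0xFFFFFFFF // AVI 1.0 RIFF size field is a uint32

// Multiply and fail loudly if the product is no longer exactly representable.
function checkedMul (a, b) {
  const product = a * b
  if (!Number.isSafeInteger(product)) {
    throw new RangeError('frame buffer size computation overflows')
  }
  return product
}

// Step-wise estimate: width * height, then * 3 channels, then * numFrames.
function estimateRawBytes (width, height, numFrames) {
  let bytes = checkedMul(width, height)
  bytes = checkedMul(bytes, 3)
  bytes = checkedMul(bytes, numFrames)
  if (bytes > UINT32_MAX) {
    throw new RangeError('video exceeds the 4 GB AVI 1.0 RIFF limit')
  }
  return bytes
}

console.log(estimateRawBytes(832, 480, 81)) // 97044480
```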
Tests:
- SdVidGenHandlers: new IntCoercion suite covers fractional doubles,
out-of-int-range doubles, picojson's own NaN/inf rejection at the
JSON layer, and integer-valued doubles (the common case from JSON).
- AviWriter: new tests for the overflow guard and the 4 GB RIFF cap,
both fire before any encoding starts.
- test_wan_video: pin width/height in the existing CorruptControlFrame
test so the new dimension check passes for frame [0] and we still
exercise the decode-failure path at frame [1]. Add two new cases
covering end_image and control_frames dimension mismatch.
All 211 C++ tests, 70 JS unit tests, lint and tsc --dts pass.
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(diffusion-cpp/video): don't eager-require binding via addonLogging
CI sanity-checks (JS unit tests on a runner with no native prebuild)
was crashing with `AddonError: ADDON_NOT_FOUND` because the top-level
`require('./addonLogging')` introduced in e6b13ae transitively pulled
in `binding.js` -> `libqvac__diffusion-cpp.so`. The unit tests only
exercise JS-side validation and never call `load()`, so they used to
work without the prebuilt addon -- this regression broke that.
Match `ImgStableDiffusion` instead: drop the per-instance native
logger plumbing entirely (it's dead code anyway after the e6b13ae
refactor, since `_connectNativeLogger` was no longer called), and
document in the constructor JSDoc that callers wire up native C++
logs once globally via `addonLogging.setLogger(...)`.
Net diff:
- Remove `const addonLogging = require('./addonLogging')` at top.
- Remove `_connectNativeLogger` / `_releaseNativeLogger` methods and
their two stale call sites.
- Remove `LOG_METHODS` (only used by the removed method) and
`this._binding` (used to keep a handle for the removed release
path; the binding is now scoped to `_createAddon` only, matching
`ImgStableDiffusion::_createAddon`).
- JSDoc on `args.logger` now mirrors `index.js` and points users at
`addonLogging.setLogger`.
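The underlying pattern is to resolve the native binding lazily instead of at module load. A sketch under stated assumptions: the class name is hypothetical, and the loader is injected so the example runs without a native binary (in a real package it would be something like `() => require('./binding')`).

```javascript
// Hypothetical wrapper showing deferred binding resolution.
class LazyVideoAddon {
  constructor (loadBinding) {
    this._loadBinding = loadBinding
    this._addon = null
  }

  // Pure-JS validation never touches the native layer.
  validate (params) {
    if (typeof params.prompt !== 'string' || params.prompt === '') {
      throw new TypeError('params.prompt must be a non-empty string')
    }
  }

  // The binding is resolved on first use, not when the module is required.
  load () {
    if (this._addon === null) this._addon = this._loadBinding()
    return this._addon
  }
}

let loads = 0
const addon = new LazyVideoAddon(() => { loads++; return { native: true } })
addon.validate({ prompt: 'a cat surfing' })
console.log(loads) // 0
```

With this shape, unit tests that only exercise validation can run on a machine that has no prebuilt addon.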
Verified: JS unit tests 70/70 pass with the prebuilds directory
moved aside, lint clean, tsc --dts clean.
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(diffusion-cpp/video): validate init_image dims; reject unsupported lora
Two reviewer-flagged regressions on PR #1879:
1. blocker (gabrielgrigoras-serv): processVideo() validates dimensions
for end_image and every control_frames[i] but not for init_image.
A caller passing width/height that don't match the decoded init_image
would hand mismatched (width, height) and frame pixel stride to
generate_video(), producing inconsistent frame data downstream
(and risking VAE segfaults).
Fix: add the same dimension check in SdModel.cpp processVideo()
right after the init_image decode, throwing StatusError on
mismatch -- consistent with the existing end_image / control_frames
checks. All three checks now compare against vid.width / vid.height
as the single source of truth for the video's final dimensions.
Ownership of the freshly-decoded init pixel buffer is taken into
the unique_ptr *before* the dim check, mirroring the control_frames
path so a mismatch can't leak the buffer.
2. gianni-cor: params.lora silently dropped on the video path -- video.js
validated it as a non-empty absolute path and video.d.ts advertised
`lora?: string`, but SD_VID_GEN_HANDLERS has no "lora" entry and
SdModel::processVideo never touches sd_vid_gen_params_t::loras, so
any LoRA passed through was swallowed by the unknown-keys branch
in applySdVidGenHandlers and silently produced LoRA-less output.
Fix B applied (reviewer's preferred "out of scope" option):
- video.js: replaced the absolute-path validation with a loud
TypeError('params.lora is not supported for video generation
yet'), so existing callers fail at the JS boundary instead of
getting silent LoRA-less output.
- video.d.ts: dropped `lora?: string` from VideoGenerationParams.
- video-validation.test.js: collapsed the four old lora cases
(empty / non-string / relative / absolute) into one parametrised
test that asserts the new TypeError fires for every shape, so a
future re-introduction of the JS validation can't bring back the
silent-drop regression.
When LoRA-on-video is wired through native (mirror of processImage's
prepareLoras() + sd_img_gen_params_t::loras), the right path is to
restore the absolute-path validation here and add a "lora" handler
to SD_VID_GEN_HANDLERS, NOT to revert the d.ts.
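A sketch of the fail-loud guard described in Fix B. The helper name is hypothetical, and treating only `undefined` as "not passed" is an assumption; the error message is the one quoted above.

```javascript
// Hypothetical guard: any supplied lora value is rejected at the JS boundary.
function rejectLora (params) {
  if (params.lora !== undefined) {
    throw new TypeError('params.lora is not supported for video generation yet')
  }
}

// Parametrised over the shapes the four old test cases covered.
const shapes = ['', 42, 'relative/path.safetensors', '/abs/path.safetensors']
for (const lora of shapes) {
  try {
    rejectLora({ prompt: 'x', lora })
    console.log('unexpectedly accepted', lora)
  } catch (err) {
    console.log(err instanceof TypeError) // true for every shape
  }
}
```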
C++ test changes:
- new Img2VidRejectsInitImageWithWrongDimensions covers the blocker.
- Flf2VidRejectsCorruptEndImage pinned width/height to 64 so the new
init dim check passes for the 64x64 init and we still reach the
intended end-decode-failure path (same approach as the existing
Img2VidRejectsCorruptControlFrame fixture).
Verified: 67/67 JS unit tests pass with and without prebuilds, 176/176
C++ tests pass (1 opt-in Wan smoke skipped, requires ~8GB weights),
lint and tsc --dts clean.
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(diffusion-cpp): regression + 7 review-batch fixes (NaN/Inf guards, cancel, etc.)
Addresses all 8 outstanding comments on PR #1879 (one regression from
commit 59f2663 plus a CHANGES_REQUESTED batch of seven items). Major
points below; per-file rationale in the inline comments.
== Regression fix (highest priority)
* gianni-cor flagged that the new init_image strict-equality check from
commit 59f2663 rejects every off-grid frame with a confusing error
citing wrapper-picked dims. Root cause: addon.js _fillDimsFromImage
was silently doing Math.ceil(d/8)*8, so a 100x100 init_image got
dispatched as 104x104 and the native check then threw "100x100 != 104x104"
-- citing a value the caller never passed. Fixes:
- addon.js _fillDimsFromImage now passes dims through verbatim
(no rounding). The image SDEdit path already realigns internally
(SdModel.cpp ~600) and the FLUX2 ref path uses
auto_resize_ref_image, so dropping the rounding is safe across
every path.
- video.js _runInternal pre-empts the cryptic native error with a
JS-layer off-grid probe: when width/height aren't explicit it
reads init_image / end_image / control_frames[i] dimensions and
throws a clear "your image is off-grid, pre-align or pass explicit
dims" message naming the exact buffer.
- Removes the ceil-vs-round inconsistency wart between
_fillDimsFromImage (ceil) and the user-facing validator (round).
- Three new JS regression tests for off-grid init / end / control,
plus one positive test for explicit aligned dims overriding the
probe.
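The off-grid probe can be sketched as follows. This assumes frame dimensions have already been read from the image headers; the function names, the frame descriptor shape, and the exact message text are illustrative, not the code in video.js.

```javascript
// Hypothetical check for one decoded frame descriptor.
function assertOnGrid (label, width, height) {
  if (width % 8 !== 0 || height % 8 !== 0) {
    throw new RangeError(
      `${label} is ${width}x${height}, which is off-grid; ` +
      'pre-align it to multiples of 8 or pass explicit width/height'
    )
  }
}

// The probe only runs when the caller did not pass explicit dims.
function probeFrames ({ width, height, frames }) {
  if (width !== undefined && height !== undefined) return
  for (const f of frames) assertOnGrid(f.label, f.width, f.height)
}

try {
  probeFrames({ frames: [{ label: 'init_image', width: 100, height: 100 }] })
} catch (err) {
  console.log(err.message.startsWith('init_image is 100x100')) // true
}
```

Naming the offending buffer in the message is what replaces the confusing "100x100 != 104x104" native error.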
== JS hardening
* params.prompt is documented Required but was never validated --
undefined / "" / 42 each produced a different failure mode (silent
noise, silent noise, far-away C++ error). video.js now throws a loud
TypeError at the wrapper boundary. Four new prompt-validation tests.
* mapAddonEvent JobEnded fallback accepted every typed-array view --
works today only because uint8_t is the sole registered
TypedArrayOutputHandler. When frameCallback (SdModel.hpp:139) gets
wired through to JS, every per-frame event would have been
misclassified as JobEnded and the response stream would have closed
after the first frame. One-token fix: add `&& !ArrayBuffer.isView(rawData)`
to the discriminator. ArrayBuffer.isView is true for every TypedArray
+ DataView, false for plain objects -- exactly the discrimination
needed for the runtime-stats POJO.
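A small demonstration of the discriminator. `ArrayBuffer.isView` is a standard built-in; the surrounding function and its name are hypothetical, since the full condition in mapAddonEvent is not shown here.

```javascript
// Hypothetical predicate: a runtime-stats payload is a plain object,
// never a TypedArray or DataView.
function looksLikeRuntimeStats (rawData) {
  return typeof rawData === 'object' &&
    rawData !== null &&
    !ArrayBuffer.isView(rawData)
}

console.log(looksLikeRuntimeStats({ steps: 20 })) // true
console.log(looksLikeRuntimeStats(new Uint8Array(4))) // false
console.log(looksLikeRuntimeStats(new DataView(new ArrayBuffer(4)))) // false
```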
== C++ parser hardening (NaN / Inf / int64 / range)
* Promoted requireInt from SdVidGenHandlers.cpp's anonymous namespace
into parsers::, and added three siblings:
- requireFiniteFloat: rejects NaN / +inf / -inf before the float
cast (NaN compares false against every bound, so range checks
of the form `f < lo || f > hi` previously let it sneak through).
- requireInt64: same finite + integer guards as requireInt, range
check against representable [INT64_MIN, INT64_MAX] doubles.
- requireFiniteFloatInRange: convenience wrapper for [lo, hi] checks.
* Routed every previously-vulnerable cast through the new helpers:
- SdVidGenHandlers.cpp: seed (int64), cfg_scale, flow_shift,
high_noise_cfg_scale, high_noise_flow_shift, vae_tile_overlap,
cache_threshold, moe_boundary, strength, vace_strength
- SdGenHandlers.cpp (image path, reviewer asked for symmetric fix):
eta, cfg_scale, guidance, img_cfg_scale, seed, batch_count,
strength, clip_skip, vae_tile_overlap, cache_threshold, width,
height, steps, parseUpscaleRepeats
* parseVaeTileSize (SdParsers.cpp): numeric form now routes through
requireInt (rejects NaN/Inf/fractional/out-of-range), and BOTH
forms (numeric and "WxH" string) now reject <= 0. Five new tests.
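The NaN pitfall behind requireFiniteFloat is not specific to C++: IEEE 754 comparisons behave the same way in JS. The sketch below contrasts a naive range check with a finite-first one; both function names and the error wording are illustrative.

```javascript
// Naive check: NaN compares false against both bounds, so it is "in range".
function naiveInRange (f, lo, hi) {
  return !(f < lo || f > hi)
}

// Finite-first check, mirroring the intent of requireFiniteFloatInRange.
function requireFiniteInRange (name, f, lo, hi) {
  if (!Number.isFinite(f)) {
    throw new RangeError(`${name} must be finite`)
  }
  if (f < lo || f > hi) {
    throw new RangeError(`${name} must be in [${lo}, ${hi}], got: ${f}`)
  }
  return f
}

console.log(naiveInRange(NaN, 0, 1)) // true: NaN sneaks through
console.log(requireFiniteInRange('strength', 0.75, 0, 1)) // 0.75
```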
== Cancellation gap + typed status
* In SdModel.cpp processVideo, cancelRequested_ was checked exactly once,
after generate_video() returned -- the slow tail (per-frame PNG
fan-out + AVI mux, multi-second on 81-frame 832x480 videos) had no
cancellation visibility. Added 2 checks: at the top of the frame-callback
loop body, and immediately before encodeFramesToAvi.
* Switched both Job cancelled throws (image path at SdModel.cpp:730,
video path at :987, plus the 2 new C1 sites) from bare
std::runtime_error to StatusError tagged with
localCodeMsg="Cancelled", so the JS layer can discriminate cancel
from real internal failures via codeString() ("[ General :: Cancelled ]")
instead of string-matching the exception message.
Note: this PR deliberately does NOT add `Cancelled = 6` to the
shared inference-addon-cpp Errors.hpp enum, because that header
ships via vcpkg to every package in the monorepo and a cross-package
coordinated change is out of scope. Instead we use the 3-arg
StatusError ctor (addonId, localCodeMsg, errorMsg) which produces
the same codeString without touching the shared enum. When the
enum is updated later, the 4 call sites can switch to the 2-arg
ctor in a one-line follow-up.
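A rough sketch of the pattern described in this section, under stated assumptions: SketchStatusError is a hypothetical stand-in for StatusError (only the 3-argument shape and the "[ General :: Cancelled ]" codeString come from the text above), and throwIfCancelled, emitFramesSketch, and the "Job cancelled" message string are invented for illustration. It shows the two added check positions and how a typed code lets a caller distinguish cancellation without matching on the message.

```cpp
#include <atomic>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for StatusError with an (addonId, localCodeMsg,
// errorMsg) constructor; the real class lives in inference-addon-cpp.
class SketchStatusError : public std::runtime_error {
public:
  SketchStatusError(const std::string& addonId,
                    const std::string& localCodeMsg,
                    const std::string& errorMsg)
      : std::runtime_error(errorMsg),
        code_("[ " + addonId + " :: " + localCodeMsg + " ]") {}

  const std::string& codeString() const { return code_; }

private:
  std::string code_;
};

// Throws the typed cancellation error if the flag has been set.
inline void throwIfCancelled(const std::atomic<bool>& cancelRequested) {
  if (cancelRequested.load()) {
    throw SketchStatusError("General", "Cancelled", "Job cancelled");
  }
}

// Simulated slow tail: per-frame work followed by a mux step.
// Returns the number of frames processed.
inline int emitFramesSketch(const std::vector<int>& frames,
                            const std::atomic<bool>& cancelRequested) {
  int emitted = 0;
  for (int frame : frames) {
    throwIfCancelled(cancelRequested);  // check 1: top of the loop body
    (void)frame;  // placeholder for the per-frame PNG fan-out
    ++emitted;
  }
  throwIfCancelled(cancelRequested);  // check 2: just before the mux step
  return emitted;
}
```

A caller can then branch on codeString() and treat the message as display text only.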
== C5 (preview_*) -- product decision deferred
* The header comment at SdCtxHandlers.hpp:112 claimed preview_mode et
al. are "Wired to sd_set_preview_callback() in SdModel::process()",
but a grep across packages/diffusion-cpp for sd_set_preview_callback
returns zero matches -- the four config keys are validated and stored
but the upstream callback is never installed, so they're a silent
no-op end-to-end. Downgraded the misleading comment to an explicit
TODO(QVAC-18026 follow-up) documenting the gap and the two viable
resolution paths (wire it up alongside sd_set_abort_callback, OR
remove the handlers + fields + tests). Reviewer asked which path is
intended; this commit picks neither and just stops claiming the
wiring exists. The choice can land in a separate PR without holding
this one up.
== Test surface
* +8 JS tests (prompt validation x4, off-grid probe x4)
* +5 C++ tests (vae_tile_size zero/negative/fractional/out-of-range
rejection, plus the existing IntCoercion suite carried over to the
promoted helpers transparently)
* Cancel-context test updated to assert the typed
"[ General :: Cancelled ]" codeString in addition to the message.
Verified locally:
JS unit tests: 75/75 pass with prebuild, 75/75 also without
(CI sanity-checks mode, no native binary loaded)
C++ unit tests: 209/210 pass, 1 opt-in skip
(SdWanHappyPathTest needs ~8GB Wan weights)
npm run lint: clean
npm run test:dts: clean
Co-authored-by: Cursor <cursoragent@cursor.com>
* chore(diffusion-cpp): release 0.8.0
Bumps @qvac/diffusion-cpp to 0.8.0 and documents the Wan 2.1 / Wan 2.2
video pipeline shipped since 0.7.0: new VideoStableDiffusion class
(txt2vid / img2vid / flf2vid), MoE high-noise expert routing, streaming
MJPG AVI muxer, refactored download helpers + Wan model script, plus
the supporting JS + C++ test coverage and validation hardening.
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(diffusion-cpp): re-align auto-detected img dims to multiple of 8
_fillDimsFromImage was passing raw image dimensions through verbatim
since fe4d10f, but the native SdGenHandlers validates width/height
% 8 == 0 before the downstream alignment in SdModel::processImage
ever runs. Any img2img call with a non-aligned source image (e.g.
the bundled 500x627 von-neumann.jpg used by the FLUX2 i2i integration
test) therefore failed with:
height must be a positive multiple of 8, got: 627
Restore the Math.ceil(d/8)*8 round-up that was removed in fe4d10f.
The original motivation for the removal -- avoiding a spurious dim
mismatch on the video path where processVideo strict-compares decoded
frame dims against vid.width/vid.height -- is already handled at the
JS layer by VideoStableDiffusion's off-grid pre-validation in
video.js, which runs before this helper and rejects unaligned
init/end/control frames with a clear caller-facing error. The ceil()
is therefore a no-op on the video path.
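For reference, the round-up can be written in integer arithmetic as below. The actual fix is the JS expression Math.ceil(d/8)*8 in _fillDimsFromImage; this C++ form and the name roundUpToMultipleOf8 are only an illustrative equivalent for positive integer dimensions.

```cpp
// Hypothetical C++ mirror of Math.ceil(d / 8) * 8 for positive integer d.
// Adding 7 before the truncating division rounds any non-multiple up to the
// next multiple of 8 and leaves exact multiples unchanged.
constexpr int roundUpToMultipleOf8(int d) {
  return (d + 7) / 8 * 8;
}
```

For the 500x627 test image this yields 504x632, while an already aligned 832x480 frame is returned unchanged.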
Co-authored-by: Cursor <cursoragent@cursor.com>
* style(diffusion-cpp): apply clang-format to drifted C++ sources
cpp-lint surfaced clang-format drift in 4 files that accumulated
across recent Wan-video commits. No semantic changes -- only
mechanical line-wrap / arg-break placement to match the project's
.clang-format.
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(diffusion-cpp/test): use package export for video module in wan integration test
The generate-video-wan.test.js test was using a relative import
(require('../../video')) that breaks when test files are bundled
and relocated to the test-framework backend directory during mobile
test setup.
Change to the package export pattern (@qvac/diffusion-cpp/video)
used by other integration tests, which remains valid regardless of
file location.
Fixes: https://github.com/tetherto/qvac/actions/runs/25929776543/job/76221440417
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(diffusion-cpp): expose video API from package root
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(diffusion-cpp): repair variable names in SdModel after merge
Co-authored-by: Cursor <cursoragent@cursor.com>
* style(diffusion-cpp): apply git-clang-format
Co-authored-by: Cursor <cursoragent@cursor.com>
---------
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: gianni-cor <gianfranco.cordella@tether.io>
Summary
Bumps the `ggml` port to the merge of tetherto/qvac-ext-ggml#6 (05afdc5981031b8dcfd5f9cc979442b707b8486c).
The current pin (`e16bdae2`, port-version 6) carries the qvac hybrid-backend packaging work but predates the Wan-required Metal kernels, so today any consumer hitting the Wan video path on Metal has to ship a local overlay or aborts at runtime with `unsupported op 'IM2COL_3D'`.
Five commits land on top of `e16bdae2` with this bump:
- `bc053644` metal: `IM2COL_3D` op + `PAD` left-padding for Wan video (PR #5)
- `6d2d24bb` metal: tighten `IM2COL_3D` `supports_op` to require `src[1]->type == F32`
- `b1923e29` metal: extend `IM2COL_3D` `supports_op` for `nb[0]==sizeof(float)` and F16-dst => F16-kernel match
- `05afdc59` Merge `tetherto/qvac-ext-ggml#6` into `2026-01-30`
The `supports_op` tightening commits resolve advertise-then-abort gaps where Metal returned SUPPORTED for `IM2COL_3D` graphs that the CPU reference would then `GGML_ASSERT` on.
Files changed
- `ports/ggml/portfile.cmake`: `REF e16bdae2…` → `REF 05afdc59…`; new `SHA512`; header comment updated
- `ports/ggml/vcpkg.json`: `port-version: 6` → `7`; description annotated
- `versions/g-/ggml.json`: `{ git-tree: f1632875…, version-date: 2026-01-30, port-version: 7 }`
- `versions/baseline.json`: `ggml.port-version: 6` → `7`
The `git-tree` SHA was computed via `git rev-parse HEAD:ports/ggml` after staging the port edits.
Verification
- `SHA512` recomputed from `tetherto/qvac-ext-ggml@05afdc59` via `curl … | shasum -a 512`.
- `IM2COL_3D` predicate at the new pin verified via the GitHub contents API to match the final form from PR #6.
- `diffusion-cpp`'s local overlay (port-version 104) carries the identical `REF` + `SHA512`; that overlay builds clean with zero patches on darwin-arm64 and runs Wan2.1 1.3B txt2video end-to-end on Metal, which is what this bump enables for all registry consumers.
Follow-up
Once this is merged, `qvac/packages/diffusion-cpp` can drop its local `vcpkg/ports/ggml/` overlay entirely (it currently exists only because the registry was 5 commits behind PR #6's merge).
Made with Cursor.