QVAC-18026 Feature/video#1879
Conversation
- Extract shared dl() function into reusable dl-functions.sh module - Update all download-model-*.sh scripts to source shared utilities - Add download-model-wan.sh for Wan 2.1 video generation models - Reduces code duplication and improves maintainability Wan 2.1 downloads (~8.3 GB): - wan2.1_t2v_1.3B_fp16.safetensors (diffusion model) - wan_2.1_vae.safetensors (VAE encoder/decoder) - umt5_xxl_fp16.safetensors (text encoder) Co-authored-by: Cursor <cursoragent@cursor.com>
… shared parsers
Phase 1-4 of Wan 2.1 / 2.2 video generation support in the diffusion-cpp
addon. Configuration + parsing layer only; dispatch + callback plumbing +
JS surface land in follow-up commits on this branch.
SdCtxConfig:
- Add highNoiseDiffusionModelPath for Wan 2.2 MoE high-noise expert
(leave empty for Wan 2.1 and all non-Wan models)
- Add previewMode / previewInterval / previewDenoised / previewNoisy
for optional mid-denoising preview frames via sd_set_preview_callback
- Wire both through SdCtxHandlers (new JS keys: preview_mode,
preview_interval, preview_denoised, preview_noisy) and AddonJs
(highNoiseDiffusionModelPath in args map)
AviWriter (new utility):
- addon/src/utils/AviWriter.{hpp,cpp} ports the upstream avi_writer.h
MJPG encoder onto an in-memory std::vector<uint8_t> sink (no stdio,
no temp files) so video bytes flow through the existing
OutputCallBackJs queue
- Full input validation (numFrames, fps, jpegQuality, channel count,
frame homogeneity, null data) -- StatusError on any rejection
SdParsers (new shared module):
- Extract parseSampler / parseScheduler / parseCacheMode /
parseVaeTileSize / parseCachePreset / requireNum/Str/Bool from
SdGenHandlers into addon/src/handlers/SdParsers.{hpp,cpp}
- Reused by both SdGenHandlers (image) and SdVidGenHandlers (video)
SdVidGenHandlers (new):
- SdVidGenConfig struct with full Wan 2.1 + 2.2 surface: mode
(txt2vid/img2vid/flf2vid), prompts, dimensions, videoFrames (4k+1
validated), fps, seed, low-noise expert sample params, high-noise
expert sample params, moeBoundary, strength, vaceStrength, VAE
tiling, cache mode/preset/threshold
- 22 JSON handlers with validation for each field
Tests (all pass):
- 5 new SdCtxHandlers tests for preview_* + high_noise path default
- 18 new AviWriter tests covering happy path, RIFF header structure,
all validation rejections, JPEG round-trip
- 54 new SdVidGenHandlers tests covering every field + integration
payload + defaults
- Zero regressions across existing 144 fast-unit tests
No user-facing JS API changes yet.
Co-authored-by: Cursor <cursoragent@cursor.com>
…rapper + examples Builds on the Wan foundation commit by wiring the video path end-to-end from JS to C++ and back. Adds txt2vid / img2vid / flf2vid generation via a new VideoStableDiffusion class that shares the single native binding with the existing ImgStableDiffusion class. Native: - SdModel::process() dispatches on the JSON "mode" field to processImage() (existing) or the new processVideo() path. - processVideo() applies SdVidGenHandlers, validates mode-vs-inputs invariants (img2vid requires init_image; flf2vid requires both; txt2vid rejects both; end_image only valid on flf2vid), decodes init/end/control frames, fills sd_vid_gen_params_t, and encodes the returned sd_image_t* sequence to an in-memory MJPG AVI. - SdVideoFrames RAII wrapper extracted to addon/src/utils/ so it can be unit-tested without a loaded model. - GenerationJob grows endImageBytes and controlFramesBytes plus an optional per-frame frameCallback (unused from JS in this PR; reserved for the preview follow-up). - AddonJs::runJob reads endImageBuffer (single Uint8Array) and controlFramesBuffers (Array of Uint8Array) as typed-array args, no JSON encoding. JS surface: - video.js / video.d.ts: new VideoStableDiffusion class with full per-mode validation, 4k+1 frame-count rule, fps range, moe_boundary range, Uint8Array type checks, and warning when high_noise_* params are set without files.highNoiseDiffusionModel. - addon.js: SdInterface.runJob threads end_image and control_frames through to the native runJob without round-tripping through JSON. - index.js / index.d.ts: unchanged -- image wrapper continues to work exactly as before. Both classes compose the same SdInterface and hit the same binding.cpp entry points. - package.json: exports "./video", ships video.js / video.d.ts, adds generate:video / generate:img2vid / generate:flf2vid scripts. Examples: - examples/generate-video-wan.js (txt2vid @ 832x480, 33 frames) - examples/img2vid-wan.js (reuses assets/von-neumann.jpg as first frame) - examples/flf2vid-wan.js (expects flf-first.png / flf-last.png) Tests: - test_sd_video_frames.cpp: 12 RAII tests (empty states, destruction of 4k+1 production sizes, null-pixel tolerance, bounds-checked operator[], compile-time copy/move deletion). - test_wan_video.cpp: 12 validation tests reusing the SD2.1 context to satisfy isLoaded() and exercise every processVideo() guard before generate_video() runs; plus an opt-in happy-path smoke test (SD_RUN_WAN_SMOKE=1) gated off by default because ggml-metal lacks IM2COL_3D for Wan's 3D convs. Gates: npm run lint, npm run test:dts, npm run build, and the fast subset of addon-test (178/178) all pass. Co-authored-by: Cursor <cursoragent@cursor.com>
Add a vcpkg overlay-port for ggml at vcpkg/ports/ggml/ that pins
tetherto/qvac-ext-ggml @ feature/metal-pr-16669-clean (commit
bc053644). The fork adds Metal kernels for IM2COL_3D and 3-axis
PAD-left, both required by Wan 2.1 / 2.2 video generation; without
them ggml hard-aborts mid-run with "unsupported op 'IM2COL_3D'".
Rationale lives in portfile.cmake -- the overlay is transient and
will be removed once the registry baseline rolls forward.
Add JS test coverage for VideoStableDiffusion:
- test/unit/video-validation.test.js: 63 input-validation cases
mirroring the existing input-validation.test.js pattern.
- test/integration/generate-video-wan.test.js: opt-in
(WAN_INTEGRATION=1) end-to-end T2V smoke test plus sniffAvi
self-tests.
Tune the Wan examples:
- generate-video-wan.js: env-var-driven (PROMPT, FRAMES, STEPS,
SEED, CFG_SCALE, FLOW_SHIFT, ...), inline frame-count cheat
sheet, (4*k+1) pre-flight check, default FRAMES bumped to 81
(Wan 1.3B's native training length).
- img2vid-wan.js, flf2vid-wan.js: flow_shift 5.0 -> 3.0 to match
the upstream test-wan reference scripts.
Refresh the C++ smoke-test gating doc in test_wan_video.cpp to
reflect that Metal works once the overlay is in place.
Drop build.md: the vcpkg overlay rationale already lives next to
the overlay (portfile.cmake header), and transient infrastructure
doesn't earn its own long-form doc.
Co-authored-by: Cursor <cursoragent@cursor.com>
|
(General PR comment — line 24 of The Change if (rawData && typeof rawData === 'object') {to: if (rawData && typeof rawData === 'object' && !ArrayBuffer.isView(rawData)) {
|
gianni-cor
left a comment
There was a problem hiding this comment.
Five C++-side issues from a re-review of the latest commits (one real bug + three hardening + one feature-completeness clarification). Per-comment inline below; the init_image regression and JS-side issues were posted in earlier comment threads.
…, cancel, etc.) Addresses all 8 outstanding comments on PR #1879 (one regression from commit 59f2663 plus a CHANGES_REQUESTED batch of seven items). Major points below; per-file rationale in the inline comments. == Regression fix (highest priority) * gianni-cor flagged that the new init_image strict-equality check from commit 59f2663 rejects every off-grid frame with a confusing error citing wrapper-picked dims. Root cause: addon.js _fillDimsFromImage was silently doing Math.ceil(d/8)*8, so a 100x100 init_image got dispatched as 104x104 and the native check then threw "100x100 != 104x104" -- citing a value the caller never passed. Fixes: - addon.js _fillDimsFromImage now passes dims through verbatim (no rounding). The image SDEdit path already realigns internally (SdModel.cpp ~600) and the FLUX2 ref path uses auto_resize_ref_image, so dropping the rounding is safe across every path. - video.js _runInternal pre-empts the cryptic native error with a JS-layer off-grid probe: when width/height aren't explicit it reads init_image / end_image / control_frames[i] dimensions and throws a clear "your image is off-grid, pre-align or pass explicit dims" message naming the exact buffer. - Removes the ceil-vs-round inconsistency wart between _fillDimsFromImage (ceil) and the user-facing validator (round). - Three new JS regression tests for off-grid init / end / control, plus one positive test for explicit aligned dims overriding the probe. == JS hardening * params.prompt is documented Required but was never validated -- undefined / "" / 42 each produced a different failure mode (silent noise, silent noise, far-away C++ error). video.js now throws a loud TypeError at the wrapper boundary. Four new prompt-validation tests. * mapAddonEvent JobEnded fallback accepted every typed-array view -- works today only because uint8_t is the sole registered TypedArrayOutputHandler. When frameCallback (SdModel.hpp:139) gets wired through to JS, every per-frame event would have been misclassified as JobEnded and the response stream would have closed after the first frame. One-token fix: add `&& !ArrayBuffer.isView(rawData)` to the discriminator. ArrayBuffer.isView is true for every TypedArray + DataView, false for plain objects -- exactly the discrimination needed for the runtime-stats POJO. == C++ parser hardening (NaN / Inf / int64 / range) * Promoted requireInt from SdVidGenHandlers.cpp's anonymous namespace into parsers::, and added two siblings: - requireFiniteFloat: rejects NaN / +inf / -inf before the float cast (NaN compares false against every bound, so range checks of the form `f < lo || f > hi` previously let it sneak through). - requireInt64: same finite + integer guards as requireInt, range check against representable [INT64_MIN, INT64_MAX] doubles. - requireFiniteFloatInRange: convenience wrapper for [lo, hi] checks. * Routed every previously-vulnerable cast through the new helpers: - SdVidGenHandlers.cpp: seed (int64), cfg_scale, flow_shift, high_noise_cfg_scale, high_noise_flow_shift, vae_tile_overlap, cache_threshold, moe_boundary, strength, vace_strength - SdGenHandlers.cpp (image path, reviewer asked for symmetric fix): eta, cfg_scale, guidance, img_cfg_scale, seed, batch_count, strength, clip_skip, vae_tile_overlap, cache_threshold, width, height, steps, parseUpscaleRepeats * parseVaeTileSize (SdParsers.cpp): numeric form now routes through requireInt (rejects NaN/Inf/fractional/out-of-range), and BOTH forms (numeric and "WxH" string) now reject <= 0. Five new tests. == Cancellation gap + typed status * SdModel.cpp processVideo cancelRequested_ was checked exactly once after generate_video() returns -- the slow tail (per-frame PNG fan-out + AVI mux, multi-second on 81-frame 832x480 videos) had no cancellation visibility. Added 2 checks: top of frame-callback loop body, and immediately before encodeFramesToAvi. * Switched both Job cancelled throws (image path at SdModel.cpp:730, video path at :987, plus the 2 new C1 sites) from bare std::runtime_error to StatusError tagged with localCodeMsg="Cancelled", so the JS layer can discriminate cancel from real internal failures via codeString() ("[ General :: Cancelled ]") instead of string-matching the exception message. Note: this PR deliberately does NOT add `Cancelled = 6` to the shared inference-addon-cpp Errors.hpp enum, because that header ships via vcpkg to every package in the monorepo and a cross-package coordinated change is out of scope. Instead we use the 3-arg StatusError ctor (addonId, localCodeMsg, errorMsg) which produces the same codeString without touching the shared enum. When the enum is updated later, the 4 call sites can switch to the 2-arg ctor in a one-line follow-up. == C5 (preview_*) -- product decision deferred * The header comment at SdCtxHandlers.hpp:112 claimed preview_mode et al are "Wired to sd_set_preview_callback() in SdModel::process()", but a grep across packages/diffusion-cpp for sd_set_preview_callback returns zero matches -- the four config keys are validated and stored but the upstream callback is never installed, so they're a silent no-op end-to-end. Downgraded the misleading comment to an explicit TODO(QVAC-18026 follow-up) documenting the gap and the two viable resolution paths (wire it up alongside sd_set_abort_callback, OR remove the handlers + fields + tests). Reviewer asked which path is intended; this commit picks neither and just stops claiming the wiring exists. The choice can land in a separate PR without holding this one up. == Test surface * +8 JS tests (prompt validation x4, off-grid probe x4) * +5 C++ tests (vae_tile_size zero/negative/fractional/out-of-range rejection, plus the existing IntCoercion suite carried over to the promoted helpers transparently) * Cancel-context test updated to assert the typed "[ General :: Cancelled ]" codeString in addition to the message. Verified locally: JS unit tests: 75/75 pass with prebuild, 75/75 also without (CI sanity-checks mode, no native binary loaded) C++ unit tests: 209/210 pass, 1 opt-in skip (SdWanHappyPathTest needs ~8GB Wan weights) npm run lint: clean npm run test:dts: clean Co-authored-by: Cursor <cursoragent@cursor.com>
|
Done in fe4d10f — one-token fix as you suggested. Inline comment added at the call site explaining what each side of the discriminator catches and why (TypedArray + DataView vs plain runtime-stats POJO), so the rationale is visible for whoever wires |
Bumps @qvac/diffusion-cpp to 0.8.0 and documents the Wan 2.1 / Wan 2.2 video pipeline shipped since 0.7.0: new VideoStableDiffusion class (txt2vid / img2vid / flf2vid), MoE high-noise expert routing, streaming MJPG AVI muxer, refactored download helpers + Wan model script, plus the supporting JS + C++ test coverage and validation hardening. Co-authored-by: Cursor <cursoragent@cursor.com>
_fillDimsFromImage was passing raw image dimensions through verbatim since fe4d10f, but the native SdGenHandlers validates width/height % 8 == 0 before the downstream alignment in SdModel::processImage ever runs. Any img2img call with a non-aligned source image (e.g. the bundled 500x627 von-neumann.jpg used by the FLUX2 i2i integration test) therefore failed with: height must be a positive multiple of 8, got: 627 Restore the Math.ceil(d/8)*8 round-up that was removed in fe4d10f. The original motivation for the removal -- avoiding a spurious dim mismatch on the video path where processVideo strict-compares decoded frame dims against vid.width/vid.height -- is already handled at the JS layer by VideoStableDiffusion's off-grid pre-validation in video.js, which runs before this helper and rejects unaligned init/end/control frames with a clear caller-facing error. The ceil() is therefore a no-op on the video path. Co-authored-by: Cursor <cursoragent@cursor.com>
cpp-lint surfaced clang-format drift in 4 files that accumulated across recent Wan-video commits. No semantic changes -- only mechanical line-wrap / arg-break placement to match the project's .clang-format. Co-authored-by: Cursor <cursoragent@cursor.com>
…ntegration test
The generate-video-wan.test.js test was using a relative import
(require('../../video')) that breaks when test files are bundled
and relocated to the test-framework backend directory during mobile
test setup.
Change to the package export pattern (@qvac/diffusion-cpp/video)
used by other integration tests, which remains valid regardless of
file location.
Fixes: https://github.com/tetherto/qvac/actions/runs/25929776543/job/76221440417
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
|
/review |
|
/review |
QVAC-18026 Wan 2.1 text-to-video generation support
Add complete text-to-video inference pipeline for Wan 2.1 (1.3B & 14B) models with GGML backend support:
Wan model optimizations included: flow_shift=3.0 for smooth motion, frame constraints (4k+1), and motion-explicit prompt guidance.