QVAC-14019: feat(diffusion): add img2img generation via in-context conditioning #884

Merged
gianni-cor merged 148 commits into main from feature-im2im
Apr 15, 2026

@aegioscy aegioscy commented Mar 13, 2026

Summary

  • Adds img2img (image-to-image) generation to lib-infer-diffusion using FLUX in-context conditioning — the reference image is attended to via joint attention, NOT mixed with noise
  • Wires the full JS → C++ pipeline: PNG/JPEG dimension auto-detection, Uint8Array serialization, and automatic mode selection (init_image present → img2img, otherwise → txt2img)
  • Includes C++ unit tests, JS integration tests, and example scripts

How it works

The user passes init_image (a PNG/JPEG Uint8Array) alongside a text prompt. Internally:

  1. The reference image is VAE-encoded into separate latent tokens
  2. The target starts from pure noise (not noised input)
  3. The FLUX transformer attends to both reference and target tokens via joint attention with distinct RoPE positions
  4. The model reasons about reference features (skin tone, structure, facial identity) while generating a new image guided by the prompt

This approach (matching the Iris C engine) produces significantly better results than traditional img2img (VAE encode → add noise → denoise), which loses identity features at high strength and produces artifacts at low strength.
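
To make steps 1 to 3 concrete, here is a toy sketch of how reference and target tokens could share one sequence for joint attention while keeping distinct positions. It is illustrative only: `gridPositions`, `buildJointSequence`, the `[index, row, col]` position layout, and the choice of separating the reference by a different first-axis index are all assumptions, not code from the addon or stable-diffusion.cpp.

```javascript
// Toy illustration only: hypothetical layout, not the addon's implementation.
// Each token carries a 3-axis position id [index, row, col]. Target tokens use
// index 0 and reference tokens use index 1, so the two grids never share a
// position even when their (row, col) coordinates coincide.
function gridPositions (rows, cols, index) {
  const ids = []
  for (let r = 0; r < rows; r++) {
    for (let c = 0; c < cols; c++) ids.push([index, r, c])
  }
  return ids
}

function buildJointSequence (target, reference) {
  const targetIds = gridPositions(target.rows, target.cols, 0)
  const referenceIds = gridPositions(reference.rows, reference.cols, 1)
  return {
    // In the scheme described above, target latents would start as pure noise
    // and reference latents would come from the VAE encoder; only the position
    // bookkeeping is modelled here.
    positions: targetIds.concat(referenceIds),
    targetCount: targetIds.length,
    referenceCount: referenceIds.length
  }
}

const seq = buildJointSequence({ rows: 2, cols: 2 }, { rows: 2, cols: 3 })
console.log(seq.targetCount, seq.referenceCount, seq.positions.length)
```

The point of the sketch is that the reference is never blended into the target latents: it only appears as extra tokens the transformer can attend to.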

Changes

JS layer

  • addon.js - readImageDimensions() extracts width/height from PNG IHDR or JPEG SOFx headers. runJob() serializes the init_image Uint8Array as a ref_image_bytes JSON array and auto-injects dimensions to prevent GGML tensor shape assertions.
  • index.js - _runInternal() auto-selects the mode: img2img when init_image is present, txt2img otherwise.
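
The JS-side preparation described in these two bullets can be sketched roughly as follows. This is a minimal sketch, not the code in addon.js or index.js: `selectMode`, `buildJobPayload`, and the injected `readDimensions` callback are hypothetical names, and the assumption that the mode travels in the payload as a `mode` field is mine. Only `init_image`, `ref_image_bytes`, `width`, and `height` come from the PR description.

```javascript
// Hypothetical sketch of the JS-side job preparation.
function selectMode (params) {
  return params.init_image ? 'img2img' : 'txt2img'
}

function buildJobPayload (params, readDimensions) {
  const { init_image: initImage, ...rest } = params
  const payload = { ...rest, mode: selectMode(params) }
  if (!initImage) return payload

  // JSON has no binary type, so the bytes travel as a plain array of numbers.
  payload.ref_image_bytes = Array.from(initImage)

  // Only fill in dimensions the caller did not supply.
  if (payload.width === undefined || payload.height === undefined) {
    const dims = readDimensions(initImage)
    if (payload.width === undefined) payload.width = dims.width
    if (payload.height === undefined) payload.height = dims.height
  }
  return payload
}

// Stand-in for a real header parser, used only to keep the example runnable.
const fakeReader = () => ({ width: 375, height: 500 })
const job = buildJobPayload(
  { prompt: 'a portrait', init_image: new Uint8Array([1, 2, 3]) },
  fakeReader
)
const plainJob = buildJobPayload({ prompt: 'a portrait' }, fakeReader)
console.log(JSON.stringify(job))
console.log(JSON.stringify(plainJob))
```

Passing the dimension reader in as a parameter keeps the sketch independent of any particular image parser.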

C++ layer

  • SdModel.cpp - load() sets vae_decode_only = false so the VAE encoder graph is built. process() decodes ref_image_bytes and sets ref_images + auto_resize_ref_image for FLUX joint-attention conditioning.
  • SdGenHandlers.cpp — Mode handler validates txt2img and img2img.

Tests

  • test/unit/test_img2img.cpp (301 lines) — JSON round-trip, dimension override, strength bounds, synthetic image pipeline, cancel
  • test/unit/test_ref2img.cpp (390 lines) — reference image routing, auto-resize, full FLUX2 generation with real headshot
  • test/integration/generate-image-flux2-i2i.test.js (175 lines) — end-to-end FLUX2-klein img2img

Examples

  • examples/img2img-flux2.js — FLUX2-klein Q8 img2img
  • examples/img2img-flux2-f16.js — FLUX2-klein F16 variant
  • examples/img2img-sdxl.js — SDXL img2img
  • examples/ref2img-flux2.js — In-context conditioning example

Usage

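// Assumes `fs` is already imported (e.g. bare-fs under Bare) and `model` is a loaded instance.
const response = await model.run({
  prompt: 'a soccer player version of this photo',
  init_image: fs.readFileSync('headshot.jpg'), // PNG or JPEG bytes
  steps: 15,
  guidance: 9.0
})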

Test Plan

Build

  • npm run build — native addon builds successfully
  • npm run test:cpp:build — C++ test binary compiles

C++ Unit Tests

  • npm run test:cpp:run:unit - SdModelTest + SdBackendSelectionTest
  • npm run test:cpp:run:loading - SdModelLoadingTest
  • npm run test:cpp:run:inference - SdSingleStepInferenceTest
  • npm run test:cpp:run:generation - SdFullGenerationTest
  • npm run test:cpp:run — all C++ tests (includes img2img + ref2img + cancel + gen_handlers)

JS Integration Tests

  • npm run test:integration — all JS integration tests
  • Individual: generate-image-flux2-i2i.test.js — FLUX2 img2img end-to-end
  • Individual: generate-image-flux2.test.js — FLUX2 txt2img
  • Individual: generate-image-sdxl.test.js — SDXL txt2img
  • Individual: generate-image-sd3.test.js — SD3 txt2img
  • Individual: generate-image.test.js — SD1/SD2 txt2img
  • Individual: model-loading.test.js — model load/unload
  • Individual: api-behavior.test.js — API behaviour validation

Examples (manual)

  • bare examples/img2img-flux2.js — FLUX2 Q8 img2img
  • bare examples/img2img-flux2-f16.js — FLUX2 F16 img2img
  • bare examples/img2img-sdxl.js — SDXL img2img
  • bare examples/ref2img-flux2.js — FLUX2 in-context conditioning
  • bare examples/generate-image.js — SD txt2img (regression)
  • bare examples/generate-image-sdxl.js — SDXL txt2img (regression)
  • bare examples/generate-image-sd3.js — SD3 txt2img (regression)
  • bare examples/quickstart.js — quickstart (regression)

Regression

  • txt2img workflows produce identical output (no init_image → mode stays txt2img)
  • npm run lint — JS lint passes

Nik and others added 30 commits February 24, 2026 09:29
QVAC-13445 Quick Updates for February 24th
Sd loading complete on MacBook Air.
got full sdxl to work on Mac
…usion

Resolves file-location conflicts for SD3 files added in sd-sd3 branch
by placing them under the renamed packages/qvac-lib-infer-diffusion path.

Made-with: Cursor
Rename package directory from packages/qvac-lib-infer-diffusion to
packages/lib-infer-diffusion to align with the lib-* naming convention
used across the monorepo.

Made-with: Cursor
rename: qvac-lib-infer-diffusion -> lib-infer-diffusion
maxim-smotrov previously approved these changes Apr 14, 2026
jesusmb1995 previously approved these changes Apr 14, 2026

@gianni-cor gianni-cor left a comment

Remaining nit: stats report user-requested dimensions instead of actual output dimensions

When SDEdit or FLUX override genParams.width/genParams.height (e.g. user passes explicit 768x768 but the input image is 375x500), the stats at lines 702-725 still read from gen.width/gen.height which hold the original JSON values. Fix: sync gen after each override so stats reflect what was actually generated.

Comment thread packages/lib-infer-diffusion/addon/src/model-interface/SdModel.cpp
Comment thread packages/lib-infer-diffusion/addon/src/model-interface/SdModel.cpp
aegioscy and others added 2 commits April 15, 2026 12:42
….cpp

Co-authored-by: gianni-cor <gianfranco.cordella@tether.io>
….cpp

Co-authored-by: gianni-cor <gianfranco.cordella@tether.io>

@gianni-cor gianni-cor left a comment

Please guard the FLUX img2img entry point in JS. The latest SdModel.cpp change fixed the runtime-stats mismatch, but FLUX img2img can still silently take the wrong native branch when users rely on prediction: 'auto' / omitted prediction.

The addon still decides FLUX vs SDEdit from config_.prediction, not from the model family auto-detected inside stable-diffusion.cpp, so this remains a user-facing footgun. A JS-side validation here would make the failure immediate and actionable.
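
A JS-side validation of the kind requested here might look like the sketch below. It is hypothetical: the function name, its argument shape, and the error wording are assumptions. The prediction values `flux_flow` / `flux2_flow` and the idea of detecting a FLUX model via `llmModel` being set are taken from later comments and commits in this PR.

```javascript
// Hypothetical guard; the real check lives in _runInternal() and may differ.
const FLUX_PREDICTIONS = new Set(['flux_flow', 'flux2_flow'])

function assertFluxImg2ImgPrediction ({ hasInitImage, isFluxModel, prediction }) {
  // txt2img and non-FLUX img2img are unaffected by this guard.
  if (!hasInitImage || !isFluxModel) return
  if (!FLUX_PREDICTIONS.has(prediction)) {
    throw new Error(
      'FLUX img2img requires an explicit prediction ("flux2_flow" or "flux_flow"); got ' +
      JSON.stringify(prediction ?? 'auto')
    )
  }
}

// Passes: explicit prediction on a FLUX model.
assertFluxImg2ImgPrediction({ hasInitImage: true, isFluxModel: true, prediction: 'flux2_flow' })

// Fails fast instead of silently taking the SDEdit branch.
let guardError = null
try {
  assertFluxImg2ImgPrediction({ hasInitImage: true, isFluxModel: true, prediction: undefined })
} catch (err) {
  guardError = err
}
console.log(guardError.message)
```

Throwing before the job reaches the native layer turns the silent wrong-branch case into an immediate, actionable error.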

Comment thread packages/lib-infer-diffusion/index.js
@gianni-cor

One additional docs/types issue: packages/lib-infer-diffusion/index.d.ts still says

/** Noise prediction type override (auto-detected from model by default) */
prediction?: PredictionType

That wording is misleading for FLUX img2img in the current addon implementation. Auto-detection may be sufficient inside stable-diffusion.cpp for load/inference, but it is not sufficient for the addon's FLUX-vs-SDEdit img2img branch selection. Please update this docstring to make it clear that FLUX img2img currently requires an explicit flux_flow / flux2_flow prediction.

@gianni-cor gianni-cor left a comment

One more issue that I think needs fixing before merge: readImageDimensions() currently trusts fixed PNG/JPEG offsets without verifying the buffer is long enough. Because the JS img2img path auto-injects width / height from this helper when callers omit them, a truncated/corrupt image can produce bogus dimensions and a misleading request failure instead of a clean decode error.
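
For illustration, a bounds-checked header reader could be sketched as below. This is not the PR's readImageDimensions(): the name `readImageDimensionsSafe` and the choice to return null (rather than throw) on bad input are assumptions. The byte layouts are the standard ones: a PNG stores big-endian width and height at offsets 16 and 20 inside IHDR, and a JPEG SOF segment stores height then width after a one-byte precision field.

```javascript
// Sketch only: returns null instead of guessing when the buffer is too short
// or the format is not recognised.
function readImageDimensionsSafe (bytes) {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength)

  // PNG: 8-byte signature, IHDR chunk header (4-byte length + 4-byte type),
  // then width at offset 16 and height at offset 20, so 24 bytes are needed.
  const isPng = bytes.length >= 4 && bytes[0] === 0x89 && bytes[1] === 0x50 &&
    bytes[2] === 0x4e && bytes[3] === 0x47
  if (isPng) {
    if (bytes.length < 24) return null
    return { width: view.getUint32(16), height: view.getUint32(20) }
  }

  // JPEG: SOI marker (FF D8), then a chain of segments: FF <marker> <length>.
  // Simplification: fill bytes and length-less markers are not handled.
  if (bytes.length < 4 || bytes[0] !== 0xff || bytes[1] !== 0xd8) return null
  let offset = 2
  while (offset + 4 <= bytes.length) {
    if (bytes[offset] !== 0xff) return null
    const marker = bytes[offset + 1]
    const segLen = view.getUint16(offset + 2) // includes the two length bytes
    if (segLen < 2) return null
    // SOF0..SOF15 carry the frame size, except DHT (C4), JPG (C8), DAC (CC).
    const isSof = marker >= 0xc0 && marker <= 0xcf &&
      marker !== 0xc4 && marker !== 0xc8 && marker !== 0xcc
    if (isSof) {
      if (offset + 9 > bytes.length) return null
      return { width: view.getUint16(offset + 7), height: view.getUint16(offset + 5) }
    }
    offset += 2 + segLen
  }
  return null
}

// Minimal synthetic PNG header: signature, IHDR length/type, 375 x 500.
const pngHeader = Uint8Array.from([
  0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a,
  0x00, 0x00, 0x00, 0x0d, 0x49, 0x48, 0x44, 0x52,
  0x00, 0x00, 0x01, 0x77, 0x00, 0x00, 0x01, 0xf4
])
console.log(readImageDimensionsSafe(pngHeader))
console.log(readImageDimensionsSafe(pngHeader.subarray(0, 20)))
```

With this shape, a caller that auto-injects width and height can treat a null result as a decode failure and report it directly, rather than forwarding bogus dimensions.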

Comment thread packages/lib-infer-diffusion/addon.js
@aegioscy

@gianni-cor, I've addressed the comments and added regression tests so that these two bugs cannot silently regress:

  • FLUX prediction guard: without a test, someone could remove or weaken the guard and CI would still pass, putting the silent-wrong-branch footgun back in place.
  • Truncated image dimensions: a refactor of readImageDimensions() could drop the length checks again, so corrupt images would produce bogus width/height values instead of a clean failure.

…ensions

- Add JS-side guard in _runInternal() that throws when init_image is
  present on a FLUX model (llmModel set) but prediction is not explicitly
  flux2_flow or flux_flow, preventing silent fallback to SDEdit branch
- Add buffer-length checks to readImageDimensions() for truncated PNG
  (require >= 24 bytes) and JPEG (validate segLen >= 2, guard SOF reads)
- Update prediction docstring in index.d.ts to clarify FLUX img2img
  requires an explicit prediction value
- Add regression tests for all of the above (13 cases)

Made-with: Cursor
- Update prediction docstring to focus on FLUX.2 img2img guidance
- Remove FLUX.1 from encoder file name comments (keep only relevant models)
- Update error message to reference FLUX.2 only in user-facing guidance
- Keep flux_flow type in PredictionType union for backward compatibility

Made-with: Cursor
Comment thread packages/lib-infer-diffusion/test/mobile/integration.auto.cjs
Register the new input-validation regression tests in the mobile test runner
so truncated image and FLUX prediction guard tests run on all platforms.

Made-with: Cursor
Comment thread packages/lib-infer-diffusion/test/mobile/integration.auto.cjs Dismissed
Comment thread packages/lib-infer-diffusion/CHANGELOG Outdated
Comment thread packages/lib-infer-diffusion/package.json
- Bump package version from 0.1.3 to 0.2.0 for img2img feature release
- Update CHANGELOG.md with 0.2.0 entry: FLUX.2 img2img, input validation, regression tests
- Remove stale CHANGELOG (keeping CHANGELOG.md as canonical source)

Made-with: Cursor
Comment thread packages/lib-infer-diffusion/vcpkg-configuration.json Outdated
Restore default-registry baseline to a9eae49a7c95a63 (matches main).
The 87783998cb67fe6 baseline was an unintended change.

Made-with: Cursor
@gianni-cor

/review

@github-actions

❌ E2E Mobile Test Results - iOS

Overall Status: FAILED
Device Farm Result: UNKNOWN
Platform: iOS
Addon: @qvac/translation-nmtcpp
PR: #884
Commit: e8c2237

Test Summary

| Metric | Count |
| --- | --- |
| Total Tests | 0 |
| ✅ Passed | 0 |
| ❌ Failed | 0 |
| ⏭️ Skipped | 0 |

Automated E2E mobile testing powered by AWS Device Farm
Tests located in: test/mobile/

@github-actions

❌ E2E Mobile Test Results - Android

Overall Status: FAILED
Device Farm Result: UNKNOWN
Platform: Android
Addon: @qvac/translation-nmtcpp
PR: #884
Commit: e8c2237

Test Summary

| Metric | Count |
| --- | --- |
| Total Tests | 0 |
| ✅ Passed | 0 |
| ❌ Failed | 0 |
| ⏭️ Skipped | 0 |

Automated E2E mobile testing powered by AWS Device Farm
Tests located in: test/mobile/
