QVAC-14019: feat(diffusion): add img2img generation via in-context conditioning by aegioscy · Pull Request #884 · tetherto/qvac

aegioscy · 2026-03-13T12:07:01Z

Summary

Adds img2img (image-to-image) generation to lib-infer-diffusion using FLUX in-context conditioning — the reference image is attended to via joint attention, NOT mixed with noise
Wires the full JS → C++ pipeline: PNG/JPEG dimension auto-detection, Uint8Array serialization, and automatic mode selection (init_image present → img2img, otherwise → txt2img)
Includes C++ unit tests, JS integration tests, and example scripts

How it works

The user passes init_image (a PNG/JPEG Uint8Array) alongside a text prompt. Internally:

The reference image is VAE-encoded into separate latent tokens
The target starts from pure noise (not noised input)
The FLUX transformer attends to both reference and target tokens via joint attention with distinct RoPE positions
The model reasons about reference features (skin tone, structure, facial identity) while generating a new image guided by the prompt
This approach (matching the Iris C engine) produces significantly better results than traditional img2img (VAE encode → add noise → denoise), which loses identity features at high strength and produces artifacts at low strength.
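
To make the contrast with noise-mixing img2img concrete, here is a toy sketch of the token layout described above. Everything in it is illustrative: `buildJointSequence`, the `stream`/`index` position fields, and the scalar token values are assumptions made for this example. Real FLUX latents are tensors, and the actual RoPE layout lives in the native code.

```javascript
// Toy model of in-context conditioning: reference and target tokens share one
// sequence but never share values or positions.
function buildJointSequence (refTokens, targetCount, random = Math.random) {
  // Target tokens start as pure noise; nothing is copied from the reference.
  const target = Array.from({ length: targetCount }, (_, i) => ({
    role: 'target',
    value: random(),
    pos: { stream: 0, index: i }
  }))
  // Reference tokens keep their encoded values and get a distinct position stream.
  const reference = refTokens.map((value, i) => ({
    role: 'reference',
    value,
    pos: { stream: 1, index: i }
  }))
  // Joint attention would then run over the concatenated sequence.
  return target.concat(reference)
}

const sequence = buildJointSequence([0.25, 0.5, 0.75], 4)
console.log(sequence.map(t => `${t.role}@${t.pos.stream}:${t.pos.index}`).join(' '))
```

The point of the sketch is only that the reference is never blended into the target's starting values; it is visible to the model as separate, separately positioned tokens.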

Changes

JS layer

addon.js — readImageDimensions() extracts width/height from PNG IHDR or JPEG SOFx headers. runJob() serializes init_image Uint8Array as ref_image_bytes JSON array and auto-injects dimensions to prevent GGML tensor shape assertions.
index.js — _runInternal() auto-selects mode: img2img when init_image is present, txt2img otherwise.
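
The mode selection and byte serialization described in the JS layer can be sketched roughly as follows. This is a minimal illustration, not the addon's actual code: `selectMode` and `serializeInitImage` are hypothetical helper names, and only the `init_image` and `ref_image_bytes` field names come from the description above.

```javascript
// Hypothetical helper: pick the generation mode from the request parameters.
function selectMode (params) {
  return params.init_image ? 'img2img' : 'txt2img'
}

// Hypothetical helper: replace the binary image with a JSON-friendly number array.
function serializeInitImage (params) {
  const { init_image: initImage, ...rest } = params
  const mode = selectMode(params)
  if (mode === 'txt2img') return { ...rest, mode }
  if (!(initImage instanceof Uint8Array)) {
    throw new TypeError('init_image must be a Uint8Array of PNG/JPEG bytes')
  }
  // Array.from turns each byte into a plain number so JSON.stringify can carry it.
  return { ...rest, mode, ref_image_bytes: Array.from(initImage) }
}

const job = serializeInitImage({
  prompt: 'a portrait',
  init_image: new Uint8Array([137, 80, 78, 71])
})
console.log(JSON.stringify(job))
```

A Node.js `Buffer` is a `Uint8Array` subclass, so the result of `fs.readFileSync()` would pass the type check in this sketch.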

C++ layer

SdModel.cpp — load() sets vae_decode_only = false for VAE encoder graph. process() decodes ref_image_bytes, sets ref_images + auto_resize_ref_image for FLUX joint-attention conditioning.
SdGenHandlers.cpp — Mode handler validates txt2img and img2img.

Tests

test/unit/test_img2img.cpp (301 lines) — JSON round-trip, dimension override, strength bounds, synthetic image pipeline, cancel
test/unit/test_ref2img.cpp (390 lines) — reference image routing, auto-resize, full FLUX2 generation with real headshot
test/integration/generate-image-flux2-i2i.test.js (175 lines) — end-to-end FLUX2-klein img2img

Examples

examples/img2img-flux2.js — FLUX2-klein Q8 img2img
examples/img2img-flux2-f16.js — FLUX2-klein F16 variant
examples/img2img-sdxl.js — SDXL img2img
examples/ref2img-flux2.js — In-context conditioning example

Usage

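// Assumes `fs` is already imported and `model` is a loaded diffusion model instance.
// For FLUX models, the review discussion in this PR notes that img2img also needs
// an explicit prediction ('flux2_flow' or 'flux_flow') in the model configuration.
const response = await model.run({
  prompt: 'a soccer player version of this photo',
  init_image: fs.readFileSync('headshot.jpg'), // PNG/JPEG bytes; presence selects img2img
  steps: 15,
  guidance: 9.0
})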

Test Plan

Build

npm run build — native addon builds successfully
npm run test:cpp:build — C++ test binary compiles

C++ Unit Tests

npm run test:cpp:run:unit — SdModelTest + SdBackendSelectionTest
npm run test:cpp:run:loading — SdModelLoadingTest
npm run test:cpp:run:inference — SdSingleStepInferenceTest
npm run test:cpp:run:generation — SdFullGenerationTest
npm run test:cpp:run — all C++ tests (includes img2img + ref2img + cancel + gen_handlers)

JS Integration Tests

npm run test:integration — all JS integration tests
Individual: generate-image-flux2-i2i.test.js — FLUX2 img2img end-to-end
Individual: generate-image-flux2.test.js — FLUX2 txt2img
Individual: generate-image-sdxl.test.js — SDXL txt2img
Individual: generate-image-sd3.test.js — SD3 txt2img
Individual: generate-image.test.js — SD1/SD2 txt2img
Individual: model-loading.test.js — model load/unload
Individual: api-behavior.test.js — API behaviour validation

Examples (manual)

bare examples/img2img-flux2.js — FLUX2 Q8 img2img
bare examples/img2img-flux2-f16.js — FLUX2 F16 img2img
bare examples/img2img-sdxl.js — SDXL img2img
bare examples/ref2img-flux2.js — FLUX2 in-context conditioning
bare examples/generate-image.js — SD txt2img (regression)
bare examples/generate-image-sdxl.js — SDXL txt2img (regression)
bare examples/generate-image-sd3.js — SD3 txt2img (regression)
bare examples/quickstart.js — quickstart (regression)

Regression

txt2img workflows produce identical output (no init_image → mode stays txt2img)
npm run lint — JS lint passes

gianni-cor

Remaining nit: stats report user-requested dimensions instead of actual output dimensions

When SDEdit or FLUX override genParams.width/genParams.height (e.g. user passes explicit 768x768 but the input image is 375x500), the stats at lines 702-725 still read from gen.width/gen.height which hold the original JSON values. Fix: sync gen after each override so stats reflect what was actually generated.
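
The suggested fix can be illustrated with a small sketch. It is written in JavaScript for brevity, whereas the real change belongs in the C++ `SdModel.cpp`; the function names and the assumption that the override uses the input image's own size are hypothetical.

```javascript
// `gen` stands in for the parsed request; `image` for the decoded input image.
function applyDimensionOverride (gen, image) {
  const genParams = { width: gen.width, height: gen.height }
  if (image) {
    // The pipeline overrides the requested size (assumed here: the input image size).
    genParams.width = image.width
    genParams.height = image.height
    // Sync back so anything that later reads `gen` sees the real output size.
    gen.width = genParams.width
    gen.height = genParams.height
  }
  return genParams
}

// Stats are built from `gen`, as in the review comment above.
function buildStats (gen) {
  return { width: gen.width, height: gen.height }
}

const gen = { width: 768, height: 768 }
applyDimensionOverride(gen, { width: 375, height: 500 })
console.log(buildStats(gen)) // reports 375x500, not the requested 768x768
```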

gianni-cor

Please guard the FLUX img2img entry point in JS. The latest SdModel.cpp change fixed the runtime-stats mismatch, but FLUX img2img can still silently take the wrong native branch when users rely on prediction: 'auto' / omitted prediction.

The addon still decides FLUX vs SDEdit from config_.prediction, not from the model family auto-detected inside stable-diffusion.cpp, so this remains a user-facing footgun. A JS-side validation here would make the failure immediate and actionable.
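
A JS-side validation of the kind requested here might look like the sketch below. It is an assumption-laden illustration: the function name is hypothetical, and it treats a set `llmModel` config field as the signal that a FLUX model is loaded, which is how a later commit message in this PR describes the guard.

```javascript
// Prediction values that select the FLUX in-context img2img branch.
const FLUX_PREDICTIONS = new Set(['flux_flow', 'flux2_flow'])

// Hypothetical guard: fail fast instead of silently taking the SDEdit branch.
function assertFluxImg2ImgPrediction (config, params) {
  const isImg2Img = params.init_image != null
  const isFlux = Boolean(config.llmModel) // assumed FLUX indicator
  if (isImg2Img && isFlux && !FLUX_PREDICTIONS.has(config.prediction)) {
    throw new Error(
      "FLUX img2img requires an explicit prediction ('flux2_flow' or 'flux_flow'), got: " +
      String(config.prediction)
    )
  }
}

try {
  // prediction omitted, so auto-detection would be relied on: rejected up front.
  assertFluxImg2ImgPrediction({ llmModel: 'text-encoder.gguf' }, { init_image: new Uint8Array(1) })
} catch (err) {
  console.log(err.message)
}
```

Throwing before the job reaches the native layer makes the failure immediate and tells the caller exactly which setting to change.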

gianni-cor · 2026-04-15T12:26:46Z

One additional docs/types issue: packages/lib-infer-diffusion/index.d.ts still says

/** Noise prediction type override (auto-detected from model by default) */
prediction?: PredictionType

That wording is misleading for FLUX img2img in the current addon implementation. Auto-detection may be sufficient inside stable-diffusion.cpp for load/inference, but it is not sufficient for the addon's FLUX-vs-SDEdit img2img branch selection. Please update this docstring to make it clear that FLUX img2img currently requires an explicit flux_flow / flux2_flow prediction.

gianni-cor

One more issue that I think needs fixing before merge: readImageDimensions() currently trusts fixed PNG/JPEG offsets without verifying the buffer is long enough. Because the JS img2img path auto-injects width / height from this helper when callers omit them, a truncated/corrupt image can produce bogus dimensions and a misleading request failure instead of a clean decode error.
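
For illustration, a bounds-checked header reader could look like the sketch below. It is not the PR's `readImageDimensions()`; the name `readImageDimensionsSketch` is hypothetical, and the JPEG walker is simplified (it assumes every segment before the SOF marker carries a length field, which holds for typical files but not for every valid stream).

```javascript
const PNG_SIGNATURE = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]

// SOF0..SOF15 carry the frame size, except DHT (C4), JPG (C8) and DAC (CC).
function isSofMarker (marker) {
  return marker >= 0xc0 && marker <= 0xcf &&
    marker !== 0xc4 && marker !== 0xc8 && marker !== 0xcc
}

function readImageDimensionsSketch (bytes) {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength)

  if (PNG_SIGNATURE.every((b, i) => bytes[i] === b)) {
    // 8-byte signature + 4-byte length + 'IHDR' + width (4) + height (4) = 24 bytes.
    if (bytes.length < 24) throw new Error('truncated PNG: IHDR incomplete')
    return { width: view.getUint32(16), height: view.getUint32(20) } // big-endian
  }

  if (bytes.length >= 2 && bytes[0] === 0xff && bytes[1] === 0xd8) {
    let offset = 2 // skip the SOI marker
    while (offset + 4 <= bytes.length) {
      if (bytes[offset] !== 0xff) throw new Error('corrupt JPEG: expected marker')
      const marker = bytes[offset + 1]
      const segLen = view.getUint16(offset + 2) // includes its own 2 bytes
      if (segLen < 2) throw new Error('corrupt JPEG: invalid segment length')
      if (isSofMarker(marker)) {
        // marker (2) + length (2) + precision (1) + height (2) + width (2)
        if (offset + 9 > bytes.length) throw new Error('truncated JPEG: SOF incomplete')
        return { width: view.getUint16(offset + 7), height: view.getUint16(offset + 5) }
      }
      offset += 2 + segLen
    }
    throw new Error('truncated JPEG: no SOF segment found')
  }

  throw new Error('unsupported image format')
}

try {
  // A PNG signature with nothing after it: rejected instead of yielding bogus sizes.
  readImageDimensionsSketch(new Uint8Array(PNG_SIGNATURE))
} catch (err) {
  console.log(err.message)
}
```

With checks like these, a truncated or corrupt buffer fails at the header-parsing step rather than injecting invalid `width` / `height` values into the request.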

aegioscy · 2026-04-15T12:53:31Z

@gianni-cor, comments addressed. New regression tests now keep these two bugs from silently coming back:

FLUX prediction guard: without a test, someone could remove or weaken the guard and CI would still pass, bringing back the silent-wrong-branch footgun.
Truncated image dimensions: a refactor of readImageDimensions() could drop the length checks again, causing corrupt images to produce bogus width/height values instead of a clean failure.

…ensions

- Add JS-side guard in _runInternal() that throws when init_image is present on a FLUX model (llmModel set) but prediction is not explicitly flux2_flow or flux_flow, preventing silent fallback to SDEdit branch
- Add buffer-length checks to readImageDimensions() for truncated PNG (require >= 24 bytes) and JPEG (validate segLen >= 2, guard SOF reads)
- Update prediction docstring in index.d.ts to clarify FLUX img2img requires an explicit prediction value
- Add regression tests for all of the above (13 cases)

Made-with: Cursor

- Update prediction docstring to focus on FLUX.2 img2img guidance
- Remove FLUX.1 from encoder file name comments (keep only relevant models)
- Update error message to reference FLUX.2 only in user-facing guidance
- Keep flux_flow type in PredictionType union for backward compatibility

Made-with: Cursor

- Bump package version from 0.1.3 to 0.2.0 for img2img feature release
- Update CHANGELOG.md with 0.2.0 entry: FLUX.2 img2img, input validation, regression tests
- Remove stale CHANGELOG (keeping CHANGELOG.md as canonical source)

Made-with: Cursor

gianni-cor · 2026-04-15T16:10:39Z

/review

github-actions · 2026-05-15T12:53:53Z

❌ E2E Mobile Test Results - iOS

Overall Status: FAILED
Device Farm Result: UNKNOWN
Platform: iOS
Addon: @qvac/translation-nmtcpp
PR: #884
Commit: e8c2237

Test Summary

Metric	Count
Total Tests	0
✅ Passed	0
❌ Failed	0
⏭️ Skipped	0

Links

🔗 Device Farm Run: View on AWS Device Farm
🔗 Workflow: View Details
📋 Run ARN: N/A

Automated E2E mobile testing powered by AWS Device Farm
Tests located in: test/mobile/

github-actions · 2026-05-15T12:55:40Z

❌ E2E Mobile Test Results - Android

Overall Status: FAILED
Device Farm Result: UNKNOWN
Platform: Android
Addon: @qvac/translation-nmtcpp
PR: #884
Commit: e8c2237

Test Summary

Metric	Count
Total Tests	0
✅ Passed	0
❌ Failed	0
⏭️ Skipped	0

Links

🔗 Device Farm Run: View on AWS Device Farm
🔗 Workflow: View Details
📋 Run ARN: N/A

Automated E2E mobile testing powered by AWS Device Farm
Tests located in: test/mobile/

Nik and others added 30 commits February 24, 2026 09:29

updated for sd

a7ddced

updated and successfuly built

dfa0f59

downloads

5564ec7

Merge pull request #517 from aegioscy/sd

ce927db

QVAC-13445 Quick Updates for February 24th

updated with working loading

c239c8b

Merge pull request #523 from tetherto/sd

72aa075

Sd loading complete on MacBook Air.

updated load model js for Q4_K test

e06fea6

rewrote parameter handling to support multiple params and also two di…

4713ab2

…fferent model types

got sd inference to work

ff0be14

Merge pull request #545 from tetherto/sd

6bf7db6

Sd

updated for sd2

21b2257

Merge pull request #557 from tetherto/sd

b6ee2b8

updated for sd2

got full sdxl to work

834f069

Merge pull request #593 from tetherto/sd

ee1b525

got full sdxl to work on Mac

rename folder to qvac-lib-infer-diffusion

a9cb009

update package name

8730525

sd3 finished

ba0cccd

Merge feature-media-generation: rename package to qvac-lib-infer-diff…

139b0b6

…usion Resolves file-location conflicts for SD3 files added in sd-sd3 branch by placing them under the renamed packages/qvac-lib-infer-diffusion path. Made-with: Cursor

Merge pull request #618 from tetherto/sd

d9a9642

sd3 finished

rename: qvac-lib-infer-diffusion -> lib-infer-diffusion

cc9327a

Rename package directory from packages/qvac-lib-infer-diffusion to packages/lib-infer-diffusion to align with the lib-* naming convention used across the monorepo. Made-with: Cursor

Merge pull request #620 from tetherto/media-gen-rename

40e104f

rename: qvac-lib-infer-diffusion -> lib-infer-diffusion

updated for cuda linux

1683a05

updated for model

2573b6f

have something working

9b30ad0

changelog

8287228

cpp lint

ba1068a

Merge branch 'main' into feature-media-generation

4b031d6

formatt

6d96bda

Merge branch 'feature-media-generation' of github.com:tetherto/qvac i…

38ad01d

…nto feature-media-generation

Merge branch 'main' into feature-media-generation

d284265

fix(diffusion): apply clang-format-19 to test_stb_image_security.cpp

395e115

maxim-smotrov previously approved these changes Apr 14, 2026

jesusmb1995 previously approved these changes Apr 14, 2026

gianni-cor previously requested changes Apr 15, 2026

Comment thread packages/lib-infer-diffusion/addon/src/model-interface/SdModel.cpp

Comment thread packages/lib-infer-diffusion/addon/src/model-interface/SdModel.cpp

aegioscy and others added 2 commits April 15, 2026 12:42

Update packages/lib-infer-diffusion/addon/src/model-interface/SdModel…

5046966

….cpp Co-authored-by: gianni-cor <gianfranco.cordella@tether.io>

Update packages/lib-infer-diffusion/addon/src/model-interface/SdModel…

c461f5f

….cpp Co-authored-by: gianni-cor <gianfranco.cordella@tether.io>

gianni-cor requested changes Apr 15, 2026

Comment thread packages/lib-infer-diffusion/index.js

gianni-cor requested changes Apr 15, 2026

Comment thread packages/lib-infer-diffusion/addon.js

fix(diffusion): format cpp files with clang-format-19

8082388

aegioscy added 3 commits April 15, 2026 15:54

Revert "fix(diffusion): format cpp files with clang-format-19"

0e5908e

This reverts commit 8082388.

gianni-cor requested changes Apr 15, 2026

Comment thread packages/lib-infer-diffusion/test/mobile/integration.auto.cjs

aegioscy added 2 commits April 15, 2026 16:37

test(diffusion): add input-validation test to mobile integration suite

bff36aa

Register the new input-validation regression tests in the mobile test runner so truncated image and FLUX prediction guard tests run on all platforms. Made-with: Cursor

Merge remote-tracking branch 'origin/main' into feature-im2im

fb03a68

github-code-quality Bot found potential problems Apr 15, 2026

Comment thread packages/lib-infer-diffusion/test/mobile/integration.auto.cjs Dismissed

Merge origin/main: keep img2img implementation from feature-im2im

ccff5c1

Made-with: Cursor

gianni-cor approved these changes Apr 15, 2026

gianni-cor requested changes Apr 15, 2026

Comment thread packages/lib-infer-diffusion/CHANGELOG Outdated

Comment thread packages/lib-infer-diffusion/package.json

gianni-cor requested changes Apr 15, 2026

Comment thread packages/lib-infer-diffusion/vcpkg-configuration.json Outdated

fix(diffusion): revert vcpkg registry baseline to main

749ed8c

Restore default-registry baseline to a9eae49a7c95a63 (matches main). The 87783998cb67fe6 baseline was an unintended change. Made-with: Cursor

gianni-cor approved these changes Apr 15, 2026

donriddo approved these changes Apr 15, 2026

jesusmb1995 approved these changes Apr 15, 2026

gianni-cor merged 148 commits into main from feature-im2im

8 participants
