Merged
148 commits
a7ddced
updated for sd
Feb 24, 2026
dfa0f59
updated and successfuly built
Feb 24, 2026
5564ec7
downloads
Feb 24, 2026
ce927db
Merge pull request #517 from aegioscy/sd
aegioscy Feb 24, 2026
c239c8b
updated with working loading
Feb 24, 2026
72aa075
Merge pull request #523 from tetherto/sd
aegioscy Feb 24, 2026
e06fea6
updated load model js for Q4_K test
Feb 25, 2026
4713ab2
rewrote parameter handling to support multiple params and also two di…
Feb 25, 2026
ff0be14
got sd inference to work
Feb 25, 2026
6bf7db6
Merge pull request #545 from tetherto/sd
aegioscy Feb 25, 2026
21b2257
updated for sd2
Feb 26, 2026
b6ee2b8
Merge pull request #557 from tetherto/sd
aegioscy Feb 26, 2026
834f069
got full sdxl to work
Feb 27, 2026
ee1b525
Merge pull request #593 from tetherto/sd
aegioscy Feb 27, 2026
a9cb009
rename folder to qvac-lib-infer-diffusion
gianni-cor Feb 27, 2026
8730525
update package name
gianni-cor Feb 27, 2026
ba0cccd
sd3 finished
Mar 1, 2026
139b0b6
Merge feature-media-generation: rename package to qvac-lib-infer-diff…
Mar 1, 2026
d9a9642
Merge pull request #618 from tetherto/sd
aegioscy Mar 1, 2026
cc9327a
rename: qvac-lib-infer-diffusion -> lib-infer-diffusion
Mar 2, 2026
40e104f
Merge pull request #620 from tetherto/media-gen-rename
aegioscy Mar 2, 2026
1683a05
updated for cuda linux
Mar 2, 2026
2573b6f
updated for model
Mar 2, 2026
9b30ad0
have something working
Mar 3, 2026
8287228
changelog
Mar 3, 2026
ba1068a
cpp lint
Mar 3, 2026
4b031d6
Merge branch 'main' into feature-media-generation
donriddo Mar 3, 2026
6d96bda
formatt
Mar 3, 2026
38ad01d
Merge branch 'feature-media-generation' of github.com:tetherto/qvac i…
Mar 3, 2026
d284265
Merge branch 'main' into feature-media-generation
donriddo Mar 4, 2026
f6b1417
updated model for gian
Mar 4, 2026
3874f3a
Merge branch 'feature-media-generation' of github.com:tetherto/qvac i…
Mar 4, 2026
f469311
Merge branch 'main' into feature-media-generation
Proletter Mar 4, 2026
9b5bc5f
integration test
Mar 4, 2026
c0bd401
Merge branch 'feature-media-generation' of https://github.com/tethert…
Mar 4, 2026
8a6a1d5
fixing according to boss
Mar 4, 2026
12b6f38
fix(android): enable BUILD_SHARED_LIBS and stub pthread_cancel for GG…
Mar 4, 2026
b814301
fix(android): exclude Vulkan on Android and fix pthread_cancel stub
Mar 4, 2026
ecf9a71
ci: dump vcpkg configure logs on failure for android build
Mar 4, 2026
a279e0e
fix(android): insert pthread_cancel stub after pthread.h include
Mar 4, 2026
9cd62b9
fix(android): resolve BUILD_SHARED_LIBS override and pthread_cancel i…
Mar 4, 2026
b2ca6bd
updated for android hopefully works
Mar 4, 2026
0fac183
added opencl support for android
Mar 5, 2026
23f778d
windows attempt fix
Mar 5, 2026
02c5471
attempting to fix windows again
Mar 5, 2026
e704ab6
NORM problem with ggml operation
Mar 5, 2026
63a3b61
attempting to patch norm
Mar 5, 2026
df09ddd
attempting again to fix
Mar 5, 2026
f033816
diagonstic step
Mar 5, 2026
a678b6d
update for opencl
Mar 5, 2026
94b6bcd
Merge branch 'main' into feature-media-generation
gianni-cor Mar 6, 2026
e3a1568
Merge branch 'main' into feature-media-generation
gianni-cor Mar 6, 2026
b0f3b9c
updated for device selection
Mar 6, 2026
63ce363
Merge branch 'feature-media-generation' of https://github.com/tethert…
Mar 6, 2026
75c2d36
fix(diffusion): add CI/CD workflows, test infra, and integration test…
donriddo Mar 6, 2026
7e89d1e
fixed integration tests
Mar 6, 2026
57c5404
resolved
Mar 6, 2026
c710640
updated
Mar 6, 2026
9d1c040
updated timeout
Mar 6, 2026
9c37577
cpp unit tests complete and tested YAY BABY
Mar 9, 2026
c50921c
cpp lint
Mar 9, 2026
052f222
updated
Mar 9, 2026
29db6db
test(diffusion): add integration tests for SDXL, SD3, and FLUX.2 (#757)
donriddo Mar 9, 2026
a0f033a
QVAC-13954: Clean up vcpkg deps in lib-infer-diffusion (#781)
jpgaribotti Mar 9, 2026
fd84602
updated for runtime stats
Mar 10, 2026
1fadd8d
fixed connection to logger, as it was not properly connected before
Mar 10, 2026
38320db
fixed for license file, validated working run on m1 air
Mar 10, 2026
f17de01
quickstart quick-maths
Mar 10, 2026
d2910fb
fixed integration for windows
Mar 10, 2026
c7b4f48
fix(diffusion): add real cancel/abort support to native generation (#…
donriddo Mar 10, 2026
f912111
Merge branch 'main' into feature-media-generation
gianni-cor Mar 11, 2026
f551adb
refactor(diffusion): static ggml core with DL backends and CMakeLists…
jpgaribotti Mar 11, 2026
e2f140e
feat(diffusion): hybrid static CPU + dynamic GPU backends for Android…
jpgaribotti Mar 11, 2026
809e31a
Merge branch 'main' into feature-media-generation
gianni-cor Mar 11, 2026
51081c7
fix(diffusion): JS layer review fixes and cancel test coverage (#783)
donriddo Mar 12, 2026
d47cb08
feat(diffusion): move stable-diffusion-cpp to registry (#865)
jpgaribotti Mar 12, 2026
7a3aa34
updated i2i
Mar 12, 2026
679eff2
working anime version of i2i
Mar 13, 2026
46ba818
Merge feature-media-generation into feature-im2im
Mar 13, 2026
73971d4
cpp lint
Mar 13, 2026
3396fda
fixed
Mar 25, 2026
2a57b66
Merge origin/main into feature-im2im
aegioscy Mar 27, 2026
20cb31d
feat(diffusion): unify img2img to always use in-context conditioning
aegioscy Mar 27, 2026
b471e5f
chore(diffusion): remove accidentally committed 27MB android prebuild…
aegioscy Mar 27, 2026
a5dd58f
fix(diffusion): remove unload() calls from img2img/ref2img tests
aegioscy Mar 27, 2026
fc2f146
refactor(diffusion): unify img2img API, add von Neumann test asset, r…
aegioscy Mar 30, 2026
3682f7c
style(diffusion): fix standard lint violations in img2img examples
aegioscy Mar 30, 2026
24624e3
Merge branch 'main' into feature-im2im
gianni-cor Mar 30, 2026
024f430
fix(diffusion): add bare-fs as direct dependency to resolve CI module…
aegioscy Mar 31, 2026
3d5f12e
attempting to resolve dl
aegioscy Apr 1, 2026
019264a
fixed pathing issue
aegioscy Apr 1, 2026
1b3af2b
increased timeouts
aegioscy Apr 1, 2026
2e2121e
fix(diffusion): skip FLUX2 img2img test on CPU-only runners
aegioscy Apr 1, 2026
7f5affb
fix(diffusion): only set SD_CPU_ONLY on no-GPU runners
aegioscy Apr 1, 2026
11e9b55
fix(diffusion): remove SD_CPU_ONLY env var from workflow
aegioscy Apr 2, 2026
e811013
Merge remote-tracking branch 'origin/main' into feature-im2im
aegioscy Apr 2, 2026
1de4fa2
fix(diffusion): remove ggml overlay port to use registry version
aegioscy Apr 2, 2026
92f6189
changed seed and description
aegioscy Apr 2, 2026
c0cb44e
fix(diffusion): increase Windows test timeout to 30 minutes
aegioscy Apr 2, 2026
cbbc500
chore(diffusion): regenerate mobile integration tests
aegioscy Apr 3, 2026
038e823
feat(diffusion): change FLUX2 txt2img prompt to cartoon watercolor style
aegioscy Apr 3, 2026
842e8dd
fix(diffusion): double test timeouts on Windows
aegioscy Apr 3, 2026
957432c
feat(diffusion): add SD3 img2img support with SDEdit and dual-path ro…
aegioscy Apr 8, 2026
026d9af
Merge branch 'main' into feature-im2im
aegioscy Apr 8, 2026
64d6c70
added linting fix
aegioscy Apr 9, 2026
af8ecaf
fixed integration test
aegioscy Apr 9, 2026
abd8cb3
updated cpp lint
aegioscy Apr 9, 2026
9f347ab
updated for sizing
aegioscy Apr 9, 2026
d906438
fix(diffusion): fix SD3 img2img integration test OOM on Vulkan CI
aegioscy Apr 9, 2026
d83d264
Merge branch 'main' into feature-im2im
aegioscy Apr 9, 2026
4636107
attemping pr start
aegioscy Apr 10, 2026
8c29019
fix(diffusion): format cpp files with clang-format
aegioscy Apr 10, 2026
f62500f
fix(diffusion): address PR review — image resize, error handling, ali…
aegioscy Apr 10, 2026
053aee0
fix(diffusion): format C++ files with clang-format-19
aegioscy Apr 10, 2026
01a3595
perf(diffusion): use stbi_info_from_memory for efficient dimension de…
aegioscy Apr 12, 2026
3b86fcf
fix(diffusion): format test_img2img.cpp with clang-format-19
aegioscy Apr 12, 2026
754d299
docs(diffusion): add comprehensive guidance scale reference for img2img
aegioscy Apr 12, 2026
bb91481
chore: update vcpkg-registry baseline commit
aegioscy Apr 13, 2026
32d7db8
Merge branch 'main' into feature-im2im
aegioscy Apr 13, 2026
879ff65
fix(diffusion): pin ggml to port-version 4 for Vulkan LSan leak fix
aegioscy Apr 13, 2026
6b06567
fix(diffusion): revert ggml to port-version 3, port-version 4 patch i…
aegioscy Apr 13, 2026
6bb5e8e
fix(diffusion): add ggml overlay port with corrected Vulkan LSan patch
aegioscy Apr 13, 2026
171abd8
fix(diffusion): use ggml port-version 5 from jpgaribotti fork
aegioscy Apr 13, 2026
f122bc5
fix(diffusion): point registry to jpgaribotti fork for ggml port-vers…
aegioscy Apr 13, 2026
79e4647
fix: suppress LSAN false positives in diffusion C++ tests
aegioscy Apr 13, 2026
7c930d5
fix: add dbus leak suppressions for test initialization
aegioscy Apr 13, 2026
0afcea1
fix: add Windows model download step to cpp-tests workflow
aegioscy Apr 13, 2026
c656858
fix: reduce SD3 example steps from 100 to 28
aegioscy Apr 14, 2026
44fe93c
fix: correct example image paths
aegioscy Apr 14, 2026
51a00d8
fix: ensure temp directory exists in example scripts
aegioscy Apr 14, 2026
ea5f172
fix: validate init_image is Uint8Array in img2img mode
aegioscy Apr 14, 2026
000847e
fix: guard SdImageBatch against nullptr from generate_image()
aegioscy Apr 14, 2026
0fc523b
fix(diffusion): format cpp files with clang-format-19
aegioscy Apr 14, 2026
ed4c7fa
Merge origin/main into feature-im2im: resolve test timeout conflicts
aegioscy Apr 14, 2026
e66703b
fix(readme): clarify config vs parameter serialization
aegioscy Apr 14, 2026
18f0e0b
fix: restore dbus leak suppressions removed by clang-format commit
aegioscy Apr 14, 2026
395e115
fix(diffusion): apply clang-format-19 to test_stb_image_security.cpp
aegioscy Apr 14, 2026
5046966
Update packages/lib-infer-diffusion/addon/src/model-interface/SdModel…
aegioscy Apr 15, 2026
c461f5f
Update packages/lib-infer-diffusion/addon/src/model-interface/SdModel…
aegioscy Apr 15, 2026
8082388
fix(diffusion): format cpp files with clang-format-19
aegioscy Apr 15, 2026
0e5908e
Revert "fix(diffusion): format cpp files with clang-format-19"
aegioscy Apr 15, 2026
6a62d96
fix(diffusion): guard FLUX img2img prediction and harden readImageDim…
aegioscy Apr 15, 2026
831b69c
fix(diffusion): remove FLUX.1 references from documentation
aegioscy Apr 15, 2026
bff36aa
test(diffusion): add input-validation test to mobile integration suite
aegioscy Apr 15, 2026
fb03a68
Merge remote-tracking branch 'origin/main' into feature-im2im
aegioscy Apr 15, 2026
ccff5c1
Merge origin/main: keep img2img implementation from feature-im2im
aegioscy Apr 15, 2026
da6c822
chore(diffusion): bump to 0.2.0 and update changelog
aegioscy Apr 15, 2026
749ed8c
fix(diffusion): revert vcpkg registry baseline to main
aegioscy Apr 15, 2026
16 changes: 14 additions & 2 deletions .github/workflows/cpp-tests-diffusion.yml
@@ -199,6 +199,18 @@ jobs:
cp "$MODEL_PATH" ./test/model/stable-diffusion-v2-1-Q8_0.gguf
test -f ./test/model/stable-diffusion-v2-1-Q8_0.gguf

- if: ${{ matrix.os == 'windows-11' }}
  name: Download SD2 model for C++ tests (Windows)
  working-directory: ${{ env.WORKDIR }}
  shell: powershell
  run: |
    & .\scripts\download-model-sd2.ps1
    $modelPath = Get-ChildItem -Path . -Filter "stable-diffusion-v2-1-Q8_0.gguf" -Recurse | Select-Object -First 1 -ExpandProperty FullName
    if (-not $modelPath) { throw "Model file not found" }
    New-Item -ItemType Directory -Force -Path ".\test\model" | Out-Null
    Copy-Item -Path $modelPath -Destination ".\test\model\stable-diffusion-v2-1-Q8_0.gguf" -Force
    if (-not (Test-Path ".\test\model\stable-diffusion-v2-1-Q8_0.gguf")) { throw "Failed to copy model file" }

- if: ${{ matrix.platform == 'darwin' }}
  name: Use Apple clang for Apple platform builds
  run: |
@@ -228,8 +240,8 @@ jobs:
env:
SD_TEST_MODEL_PATH: ${{ github.workspace }}/${{ env.WORKDIR }}/test/model/stable-diffusion-v2-1-Q8_0.gguf
run: |
-if [ -f "${{ github.workspace }}/.lsan-suppressions.txt" ]; then
-  export LSAN_OPTIONS="suppressions=${{ github.workspace }}/.lsan-suppressions.txt"
+if [ -f "${{ github.workspace }}/${{ env.WORKDIR }}/.lsan-suppressions.txt" ]; then
+  export LSAN_OPTIONS="suppressions=${{ github.workspace }}/${{ env.WORKDIR }}/.lsan-suppressions.txt"
fi
echo "SD_TEST_MODEL_PATH=$SD_TEST_MODEL_PATH"
ls -lh "$SD_TEST_MODEL_PATH"
@@ -72,7 +72,7 @@ jobs:
platform: win32
arch: x64
runner: ai-run-windows11-gpu
-timeout: 600
+timeout: 1800

steps:
- name: Setup Node.js
1 change: 1 addition & 0 deletions packages/lib-infer-diffusion/.gitignore
@@ -23,4 +23,5 @@ logs/
output/
temp/
*.deb
*.zip
test/integration/all.js
5 changes: 5 additions & 0 deletions packages/lib-infer-diffusion/.lsan-suppressions.txt
@@ -1,3 +1,8 @@
# Known false positive with N-API callbacks under ASan
leak:SdModel::process
leak:SdModel::load

# D-Bus library false positives during test initialization
leak:dbus_bus_register
leak:dbus_pending_call_block
leak:_dbus_message_loader_queue_messages
18 changes: 18 additions & 0 deletions packages/lib-infer-diffusion/CHANGELOG.md
@@ -1,5 +1,23 @@
# Changelog

## [0.2.0] - 2026-04-15

### Added

- FLUX.2 img2img support with in-context conditioning (`ref_images`) via `init_image` parameter
- JS-side input validation for `readImageDimensions()` with buffer-length guards for truncated PNG/JPEG
- Regression tests for FLUX img2img prediction guard and truncated image handling

### Changed

- FLUX img2img now requires explicit `prediction: 'flux2_flow'` in config to prevent silent fallback to SDEdit
- Updated `prediction` docstring to clarify auto-detection is insufficient for FLUX img2img
- Exported `readImageDimensions()` for testing and external use

### Fixed

- `readImageDimensions()` now safely handles truncated/corrupt PNG and JPEG buffers

## [0.1.3] - 2026-04-15

### Changed
1 change: 1 addition & 0 deletions packages/lib-infer-diffusion/CMakeLists.txt
@@ -95,6 +95,7 @@ add_bare_module(qvac-lib-inference-addon-sd EXPORTS ${BACKEND_DL_EXPORTS})
${PROJECT_SOURCE_DIR}/addon/src/handlers/SdCtxHandlers.cpp
${PROJECT_SOURCE_DIR}/addon/src/handlers/SdGenHandlers.cpp
${PROJECT_SOURCE_DIR}/addon/src/model-interface/SdModel.cpp
${PROJECT_SOURCE_DIR}/addon/src/utils/ImageUtils.cpp
${PROJECT_SOURCE_DIR}/addon/src/utils/LoggingMacros.cpp
${PROJECT_SOURCE_DIR}/addon/src/utils/BackendSelection.cpp
)
12 changes: 12 additions & 0 deletions packages/lib-infer-diffusion/NOTICE
@@ -300,6 +300,18 @@ JavaScript Dependencies
https://github.com/mafintosh/z32


=========================================================================
Image Assets
=========================================================================

--- public-domain (U.S. Federal Government Work) ---

assets/von-neumann.jpg
John von Neumann (1956). U.S. Department of Energy, File ID: HD.3F.191.
This image is in the Public Domain as a work of the U.S. Federal
Government (17 U.S.C. § 105). No copyright restrictions apply.
https://commons.wikimedia.org/wiki/File:JohnvonNeumann-LosAlamos.gif

=========================================================================
C++ Dependencies
=========================================================================
72 changes: 63 additions & 9 deletions packages/lib-infer-diffusion/README.md
@@ -19,6 +19,7 @@ Native C++ addon for text-to-image generation using [stable-diffusion.cpp](https
- [7. Release Resources](#7-release-resources)
- [Model File Reference](#model-file-reference)
- [FLUX.2 Implementation Notes](#flux2-implementation-notes)
- [Credits](#credits)
- [License](#license)

---
@@ -146,13 +147,16 @@ Source: [`examples/generate-image.js`](./examples/generate-image.js)

> **Performance note:** On an M1 MacBook Air (16 GB) with Metal enabled, loading takes ~15 s and 20 steps at 512 × 512 take ~10 minutes. Reduce `STEPS` to 4 for quick tests — FLUX.2's distilled model is designed for low step counts.

-## Other Exampless
+## Other Examples

- [Quickstart](./examples/quickstart.js) – Minimal text-to-image generation with SD2.1.
- [Generate Image (SD2.1)](./examples/generate-image-sd2.js) – Text-to-image with an SD2.1 all-in-one GGUF model.
- [Generate Image (SD3)](./examples/generate-image-sd3.js) – Text-to-image with SD3 Medium (safetensors, diffusion + CLIP encoders).
- [Generate Image (SDXL)](./examples/generate-image-sdxl.js) – Text-to-image with an SDXL base all-in-one GGUF model.
- [Runtime Stats](./examples/runtime-stats-sd2.js) – Run SD2.1 inference and report runtime statistics.
- [img2img FLUX2](./examples/img2img-flux2.js) – Transform an image with FLUX2-klein (Q8_0, in-context conditioning).
- [img2img FLUX2 F16](./examples/img2img-flux2-f16.js) – Transform an image with FLUX2-klein (F16 full precision).
- [img2img SD3](./examples/img2img-sd3.js) – Transform an image with SD3 Medium (SDEdit, flow-matching).

---

@@ -198,7 +202,7 @@ const config = {
}
```

-All config values are coerced to strings internally before being passed to the native layer.
+Config values are coerced to strings internally. Generation parameters (prompt, steps, seed, etc.) are JSON-serialized with their native types preserved.
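
The distinction between the two serialization paths can be sketched as follows. This is a minimal illustration, not the library's actual code: `coerceConfig` and `serializeParams` are hypothetical helper names, and `threads` is an illustrative config key that may not exist in the real config.

```js
// Hypothetical helper: coerce every config value to a string,
// mirroring what the README describes for config objects.
function coerceConfig (config) {
  const out = {}
  for (const [key, value] of Object.entries(config)) {
    out[key] = String(value)
  }
  return out
}

// Hypothetical helper: generation parameters go through JSON,
// so numbers and booleans keep their native types.
function serializeParams (params) {
  return JSON.stringify(params)
}

// `threads` is an assumed key used only for illustration.
const config = coerceConfig({ prediction: 'flux2_flow', threads: 4 })
const paramsJson = serializeParams({ prompt: 'a red fox', steps: 20, seed: 42 })

console.log(config.threads) // '4' (a string)
console.log(paramsJson) // {"prompt":"a red fox","steps":20,"seed":42}
```

In practice this means a config value such as `4` reaches the native layer as the string `'4'`, while `steps: 20` in a generation call stays a number after parsing.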

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
@@ -279,21 +283,61 @@ require('bare-fs').writeFileSync('output.png', images[0])

> **Sampler note:** Do not set `sampling_method: 'euler_a'` for FLUX.2 models — it will produce random noise. Leave the field unset to let the library auto-select `euler` for flow-matching models.

-#### Image-to-image (not yet supported)
+#### Image-to-image (`init_image`)

-> **Note:** img2img is not yet wired in the JS layer — calling `model.run()` with `init_image` will throw. The parameters below are reserved for a future release.
+Pass `init_image` (a `Uint8Array` of PNG or JPEG bytes) to transform an existing image with a text prompt. When `width` or `height` is not supplied, both are auto-detected from the image header and rounded up to the next multiple of 8.

The addon automatically selects the correct img2img strategy based on the model's prediction type:

| Model family | Prediction type | Strategy | How it works |
|-------------|----------------|----------|-------------|
| FLUX.2 | `flux2_flow` / `flux_flow` | In-context conditioning (`ref_images`) | Input image is VAE-encoded into separate latent tokens; the transformer attends to them via joint attention with distinct RoPE positions. The target starts from pure noise, so the model preserves features while generating a fully new image. |
| SD1.x / SD2.x / SDXL / SD3 | All others | SDEdit (`init_image`) | Input image is noised according to `strength` (0.0–1.0), then denoised with the text prompt. Lower strength preserves more of the original; higher strength allows more creative freedom. |
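
The mapping in the table can be restated as a small lookup. This is only a sketch of the decision rule: the real selection happens inside the native addon, and `selectImg2ImgStrategy` is a hypothetical name.

```js
// Prediction types the table associates with in-context conditioning.
const IN_CONTEXT_PREDICTIONS = new Set(['flux2_flow', 'flux_flow'])

// Hypothetical helper: return the img2img strategy for a prediction type.
// Anything not listed above, including an unset value, falls through to SDEdit.
function selectImg2ImgStrategy (prediction) {
  return IN_CONTEXT_PREDICTIONS.has(prediction) ? 'ref_images' : 'sdedit'
}

console.log(selectImg2ImgStrategy('flux2_flow')) // ref_images
console.log(selectImg2ImgStrategy(undefined)) // sdedit
```

The second call shows why leaving `prediction` unset matters for FLUX.2: an unset value lands in the "all others" row and would be treated as SDEdit.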

**FLUX.2 example (in-context conditioning):**

```js
-const inputPng = require('bare-fs').readFileSync('input.png')
+const fs = require('bare-fs')
+
+const inputImage = fs.readFileSync('assets/von-neumann.jpg')

const response = await model.run({
-  prompt: 'a photo of a cat in a snowy landscape',
-  init_image: inputPng,
-  strength: 0.75, // 0.0 = no change, 1.0 = full redraw
-  steps: 20
+  prompt: 'a modern tech CEO version of this person, professional headshot',
+  init_image: inputImage,
+  cfg_scale: 1.0,
+  steps: 20,
+  guidance: 9.0,
+  seed: 42
})
```

**SD3 example (SDEdit):**

```js
const fs = require('bare-fs')
const inputImage = fs.readFileSync('headshot.jpeg')

const response = await model.run({
prompt: 'anime portrait, same pose, studio ghibli style, soft cel shading',
negative_prompt: 'photorealistic, blurry, low quality',
init_image: inputImage,
cfg_scale: 4.5,
steps: 30,
strength: 0.75,
sampling_method: 'euler',
seed: 42
})
```

> **SDEdit img2img limitations:**
>
> - **Black-and-white input images** produce weaker results because the model must hallucinate all color information. Consider colorizing the image before feeding it in.
> - **Low-resolution images** (below ~512×512) give the model less detail to preserve identity. Upscaling beforehand helps.
> - **High `strength` values** (≥ 0.7) allow the model to deviate significantly from the input, including changing facial features, gender, or ethnicity. Use `strength` 0.35–0.55 for identity-preserving edits.
> - **Style prompts** like "anime" or "studio ghibli" carry training-data biases that can alter the subject's appearance. Anchor the prompt with terms like "same person, same face" and use the negative prompt to block unwanted changes.
> - **Non-multiple-of-8 images** are automatically aligned (nearest-neighbor resize to the next multiple of 8) before processing. For best quality, provide images with dimensions that are already multiples of 8.

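The multiple-of-8 alignment mentioned in the last limitation comes down to one line of arithmetic. A minimal sketch, assuming the same `Math.ceil(n / 8) * 8` rule that `addon.js` applies to auto-detected sizes (`alignUp8` is a hypothetical name):

```js
// Round a pixel dimension up to the next multiple of 8.
function alignUp8 (n) {
  return Math.ceil(n / 8) * 8
}

console.log(alignUp8(512)) // 512 (already aligned)
console.log(alignUp8(513)) // 520
console.log(alignUp8(770)) // 776
```

An input that is already a multiple of 8 passes through unchanged, so no resize is needed for such images.
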
The bundled test image (`assets/von-neumann.jpg`) is a 1956 portrait of John von Neumann sourced from the U.S. Department of Energy (Public Domain). See the [Credits](#credits) section for details.

### 7. Release Resources

```js
@@ -441,6 +485,16 @@ The underlying pattern across all these fixes is the same: our C++ config struct

---

## Credits

### Test Image

`assets/von-neumann.jpg` — **John von Neumann** (1956).
Source: U.S. Department of Energy, File ID: HD.3F.191.
This image is in the **Public Domain** as a work of the U.S. Federal Government.

---

## License

Apache-2.0 — see [LICENSE](./LICENSE) for details.
77 changes: 76 additions & 1 deletion packages/lib-infer-diffusion/addon.js
@@ -2,6 +2,55 @@

const path = require('bare-path')

/**
 * Extract pixel dimensions from a PNG or JPEG buffer without a full decode.
 *
 * PNG: width and height are big-endian uint32 values at file offsets 16–23, inside the IHDR chunk.
 * JPEG: scan for the first SOFx segment (0xFFCx), which stores height at +5 and width at +7 from the marker.
 *
 * Returns { width, height } or null if the format is not recognised.
 *
 * @param {Uint8Array} buf
 * @returns {{ width: number, height: number } | null}
 */
function readImageDimensions (buf) {
  if (!buf || buf.length < 4) return null

  // PNG (magic: \x89PNG\r\n\x1a\n; IHDR width/height at bytes 16–23)
  if (buf[0] === 0x89 && buf[1] === 0x50 && buf[2] === 0x4E && buf[3] === 0x47) {
    if (buf.length < 24) return null
    const w = (buf[16] << 24 | buf[17] << 16 | buf[18] << 8 | buf[19]) >>> 0
    const h = (buf[20] << 24 | buf[21] << 16 | buf[22] << 8 | buf[23]) >>> 0
    return { width: w, height: h }
  }

  // JPEG (magic: 0xFF 0xD8)
  if (buf[0] === 0xFF && buf[1] === 0xD8) {
    let i = 2
    while (i + 4 < buf.length) {
      if (buf[i] !== 0xFF) break
      const marker = buf[i + 1]
      const segLen = (buf[i + 2] << 8 | buf[i + 3])
      if (segLen < 2) break
      // SOF0–SOF3, SOF5–SOF7, SOF9–SOF11, SOF13–SOF15
      if (
        (marker >= 0xC0 && marker <= 0xC3) ||
        (marker >= 0xC5 && marker <= 0xC7) ||
        (marker >= 0xC9 && marker <= 0xCB) ||
        (marker >= 0xCD && marker <= 0xCF)
      ) {
        if (i + 8 >= buf.length) return null
        const h = (buf[i + 5] << 8 | buf[i + 6])
        const w = (buf[i + 7] << 8 | buf[i + 8])
        return { width: w, height: h }
      }
      i += 2 + segLen
    }
  }

  return null
}
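
The PNG branch can be exercised without an image file by hand-building the first 24 bytes of a PNG. The sketch below restates only the PNG branch in a standalone function so that it runs on its own; `pngDimensions` and `makePngHeader` are hypothetical names, not part of the addon.

```js
// Standalone restatement of the PNG branch of readImageDimensions().
function pngDimensions (buf) {
  if (!buf || buf.length < 24) return null
  if (buf[0] !== 0x89 || buf[1] !== 0x50 || buf[2] !== 0x4E || buf[3] !== 0x47) return null
  const w = (buf[16] << 24 | buf[17] << 16 | buf[18] << 8 | buf[19]) >>> 0
  const h = (buf[20] << 24 | buf[21] << 16 | buf[22] << 8 | buf[23]) >>> 0
  return { width: w, height: h }
}

// Build the first 24 bytes of a PNG: 8-byte signature, IHDR length (13),
// the 'IHDR' tag, then big-endian width and height.
function makePngHeader (width, height) {
  const buf = new Uint8Array(24)
  buf.set([0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A], 0)
  buf.set([0x00, 0x00, 0x00, 0x0D], 8)
  buf.set([0x49, 0x48, 0x44, 0x52], 12)
  const view = new DataView(buf.buffer)
  view.setUint32(16, width) // DataView writes big-endian by default
  view.setUint32(20, height)
  return buf
}

const header = makePngHeader(640, 480)
console.log(pngDimensions(header)) // { width: 640, height: 480 }
console.log(pngDimensions(header.subarray(0, 20))) // null (truncated header)
```

The truncated case corresponds to the buffer-length guard: a buffer that ends before offset 24 yields `null` instead of reading undefined bytes.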

/**
* JavaScript wrapper around the native stable-diffusion.cpp addon.
* Manages the native handle lifecycle and bridges JS ↔ C++.
@@ -61,6 +110,32 @@ class SdInterface {
   * @returns {Promise<boolean>} true if job was accepted, false if busy
   */
  async runJob (params) {
    // Pass init_image Uint8Array directly to C++ as a typed-array property
    // (avoids JSON-encoding every byte as a number).
    // Auto-detect width/height from the image header so the C++ tensor
    // dimensions always match the decoded image; without this, generate_image()
    // hits GGML_ASSERT(image.width == tensor->ne[0]).
    if (params.init_image) {
      const serializable = { ...params }
      const imgBuf = serializable.init_image
      delete serializable.init_image

      if (!serializable.width || !serializable.height) {
        const dims = readImageDimensions(imgBuf)
        if (dims) {
          serializable.width = Math.ceil(dims.width / 8) * 8
          serializable.height = Math.ceil(dims.height / 8) * 8
        }
      }

      const paramsJson = JSON.stringify(serializable)
      return this._binding.runJob(this._handle, {
        type: 'text',
        input: paramsJson,
        initImageBuffer: imgBuf
      })
    }

    const paramsJson = JSON.stringify(params)
    return this._binding.runJob(this._handle, { type: 'text', input: paramsJson })
  }
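
The reason for sending the image bytes separately can be seen in a few lines. This is a simplified sketch of the split performed in `runJob()`, without the native binding or dimension detection; `buildJob` is a hypothetical name.

```js
// Sketch of the split: scalar parameters travel as JSON, the image bytes
// travel as a separate typed-array property on the job object.
function buildJob (params) {
  const { init_image: imgBuf, ...serializable } = params
  const job = { type: 'text', input: JSON.stringify(serializable) }
  if (imgBuf) job.initImageBuffer = imgBuf
  return job
}

const job = buildJob({
  prompt: 'portrait',
  steps: 20,
  init_image: new Uint8Array([1, 2, 3])
})

console.log(job.input) // {"prompt":"portrait","steps":20}
console.log(job.initImageBuffer.length) // 3

// For comparison: a typed array passed through JSON becomes an object
// with one numeric entry per byte.
console.log(JSON.stringify(new Uint8Array([1, 2, 3]))) // {"0":1,"1":2,"2":3}
```

For a multi-megabyte image, the per-byte JSON form would be several times larger than the raw buffer, which is the overhead the typed-array property avoids.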
@@ -76,4 +151,4 @@ class SdInterface {
}
}

-module.exports = { SdInterface }
+module.exports = { SdInterface, readImageDimensions }
8 changes: 8 additions & 0 deletions packages/lib-infer-diffusion/addon/src/addon/AddonJs.hpp
@@ -86,6 +86,14 @@ inline js_value_t* runJob(js_env_t* env, js_callback_info_t* info) try {
  SdModel::GenerationJob job;
  job.paramsJson = paramsJson;

  auto inputObj = args.getJsObject(1, "inputObj");
  auto initBuf =
      inputObj
          .getOptionalPropertyAs<js::TypedArray<uint8_t>, std::vector<uint8_t>>(
              env, "initImageBuffer");
  if (initBuf.has_value())
    job.initImageBytes = std::move(initBuf.value());

  // Progress updates are queued as JSON strings (JsStringOutputHandler).
  job.progressCallback = [&instance](const std::string& progressJson) {
    instance.addonCpp->outputQueue->queueResult(std::any(progressJson));