Skip to content

chore[bc]: diffusion addon interface refactor - remove BaseInference#1496

Merged
gianni-cor merged 35 commits into
tetherto:mainfrom
donriddo:chore/sd-addon-interface-refactor
Apr 17, 2026
Merged

chore[bc]: diffusion addon interface refactor - remove BaseInference#1496
gianni-cor merged 35 commits into
tetherto:mainfrom
donriddo:chore/sd-addon-interface-refactor

Conversation

@donriddo

@donriddo donriddo commented Apr 9, 2026

Copy link
Copy Markdown
Contributor

🎯 What problem does this PR solve?

  • Diffusion addon extends BaseInference solely for job-management boilerplate.
  • Constructor uses diskPath + individual model-name strings, forcing callers to also pass a filesystem root.
  • Native C++ event vocabulary is normalized inline in index.js, spreading bridge logic across the class.

📝 How does it solve it?

  • Remove BaseInference inheritance; compose createJobHandler() + exclusiveRunQueue() from @qvac/infer-base@^0.4.0 directly.
  • Constructor now takes { files: { model, clipL?, clipG?, t5Xxl?, llm?, vae? }, config?, logger?, opts? } with absolute paths per component.
  • Constructor validation: assertAbsolute(...) throws a TypeError on missing/relative paths, mapping the old two-argument shape to a clear error for callers porting old code.
  • Extract native event normalization into addon.js (mapAddonEvent): Error event name → Error, Uint8ArrayOutput (image bytes), string → Output (progress tick JSON), plain object → JobEnded (RuntimeStats), unknown shapes → null (caller logs at debug level).
  • run()-before-load() guard: throws Error('Addon not initialized. Call load() first.') instead of crashing in native code.
  • load() on an already-loaded instance is now a silent no-op (ReadyResource pattern). Call unload() first if you intend to swap weights, then load() again.
  • load() is serialized through the same exclusiveRunQueue as run() and unload(), so two concurrent load() calls on the same instance no longer race past the configLoaded guard and leak a native handle.
  • Crash-safe activation: if addon.activate() throws during _load(), the partially initialized addon is best-effort-unloaded, the native logger is released, and this.addon is reset to null before the error is re-thrown.
  • Broaden isSplitLayout heuristic to also trigger when only clipL or clipG is supplied — closes a footgun where a FLUX.1 caller passing { model, clipL, clipG, vae } without t5Xxl would silently mis-route the diffusion model into the all-in-one path parameter and fail to load.
  • unload() nulls this.addon so post-unload cancel() / run() calls hit the guards instead of dereferencing a disposed native handle.
  • Rejected generation responses log at warn level instead of being silently swallowed.
  • Dependency changes: @qvac/infer-base ^0.2.2^0.4.0; new @qvac/logging runtime dependency.

🧪 How was it tested?

  • npm run lint (standard) and npm run test:dts clean.
  • New unit test test/unit/map-addon-event.test.js — 7 tests covering Error event precedence, Uint8Array → Output, string → Output, plain-object → JobEnded, and null returns for unknown shapes.
  • All 5 integration tests and 7 examples updated to the new constructor shape.

💥 Breaking Changes

BEFORE (≤ 0.2.x):

const model = new ImgStableDiffusion(
  { diskPath: '/models', modelName: 'model.gguf', llmModel: 'qwen.gguf', vaeModel: 'vae.sft', logger, opts },
  config
)

AFTER (0.3.0):

const model = new ImgStableDiffusion({
  files: {
    model: '/models/model.gguf',
    llm: '/models/qwen.gguf',
    vae: '/models/vae.sft'
  },
  config: { device: 'gpu', threads: 4 },
  logger: console,
  opts: { stats: true }
})

Additional behavior changes:

  • Second load() is a silent no-op (ReadyResource pattern). Call unload() first if you intend to swap weights.
  • Artifact keys renamed: clipLModel / clipGModel / t5XxlModel / llmModel / vaeModelclipL / clipG / t5Xxl / llm / vae.
  • Callers no longer resolve paths via diskPath + name; addon treats file paths as authoritative.

🔌 API Changes

// Constructor: single object with `files` (absolute paths per component), plus config / logger / opts
// Removed: diskPath, modelName, clipLModel, clipGModel, t5XxlModel, llmModel, vaeModel params
// Removed: BaseInference inheritance
// Kept: load(), run(), cancel(), unload()
// Added: getState()

Replace class inheritance with composable utilities from @qvac/infer-base@0.4.0:
- createJobHandler() for single-job lifecycle management
- exclusiveRunQueue() for run serialization

Constructor now takes { files: { model, clipL?, clipG?, t5Xxl?, llm?, vae? }, config, logger, opts }
instead of { diskPath, modelName, clipLModel, ... } + config.

All examples and tests updated to new constructor shape.
@donriddo donriddo marked this pull request as ready for review April 10, 2026 09:50
@donriddo donriddo requested review from a team as code owners April 10, 2026 09:50
donriddo added a commit to donriddo/qvac that referenced this pull request Apr 10, 2026
The embed, LLM, and diffusion addons (PRs tetherto#1493, tetherto#1494, tetherto#1496) drop
BaseInference + WeightsProvider in favour of composable utilities and
expose a single-arg constructor of the form
`new Addon({ files, config, logger, opts })` where `files.model` is
either a string (diffusion) or an array of absolute paths to all GGUF
shards (embed/LLM). Per the addon-loader-abstraction proposal, shard
expansion now belongs to the SDK plugin layer rather than the addon.

This commit:

- Adds `expandGGUFIntoShards()` in `server/utils`, mirroring the C++
  `GGUFShards::expandGGUFIntoShards` regex contract. It is pure string
  manipulation so it works under both bun (unit tests) and bare
  (production), and handles POSIX and Windows separators.

- Rewrites `llamacpp-embedding`, `llamacpp-completion`, and
  `sdcpp-generation` plugins to call the new constructor. FilesystemDL
  and the loader-adapter shim are dropped from these three plugins —
  they no longer create or pass a loader, returning
  `{ model, loader: undefined }` (the SD plugin already followed this
  pattern). The diffusion artifact keys are renamed
  `clipLModel`/`t5XxlModel`/etc. → `clipL`/`t5Xxl`/etc. to match the
  new `DiffusionFiles` shape.

- Pins `@qvac/embed-llamacpp`, `@qvac/llm-llamacpp`, and
  `@qvac/diffusion-cpp` to `file:../...` so the SDK consumes the
  in-monorepo refactored addons until the addon PRs land and publish
  versioned releases. These pins must be bumped back to semver ranges
  before merging.

- Adds a unit test for `expandGGUFIntoShards` covering non-sharded
  paths, sharded paths supplied at any index, nested directories,
  single-shard sharded models, relative paths, and Windows
  separators.

`@qvac/dl-filesystem` stays in `dependencies` because OCR, parakeet,
nmtcpp, and whispercpp plugins still rely on it.
Comment thread packages/lib-infer-diffusion/package.json
Bumps `@qvac/diffusion-cpp` to `0.2.0` per the addon-changelog
process — minor bump on a pre-1.0 package signals the breaking
constructor change to consumers using semver ranges. Adds the
matching `0.2.0` block to `CHANGELOG.md` documenting the new
single-object constructor with `files`, the removal of
`BaseInference`, and every behaviour change in this release.

Hardens the JS layer based on the review:

- Constructor now throws a clear `TypeError` when `files` /
  `files.model` is missing, instead of crashing with an opaque
  "cannot read properties of undefined" later.
- `createJobHandler({ cancel })` closure uses optional chaining so
  a `response.cancel()` after `unload()` is a no-op rather than a
  `TypeError`.
- `unload()` sets `this.addon = null` after `addon.unload()`, so
  the existing `if (!this.addon)` guard in `_runInternal` is also
  effective post-unload.
- `cancel()` re-adds the defensive `?.cancel` check.
- `isSplitLayout` now also triggers on `clipL` / `clipG`, closing
  a footgun where a FLUX.1 caller passing only encoders without
  `t5Xxl` would silently misroute the diffusion model into the
  all-in-one `path` parameter and fail to load.
- `_addonOutputCallback` no longer pushes unknown event payloads
  into the active response output stream — unknown events are
  logged at debug level instead. The error log line is updated to
  pass the `Error` object directly so loggers can format the full
  stack.

Doc + test cleanup:

- README section 3 now describes `args.config` as a field of the
  same `args` object built in section 2 (the old wording made it
  sound like a separate constructor argument).
- The `api-behavior` integration test no longer calls
  `binding.releaseLogger()` manually in the teardown — `unload()`
  already releases the native logger via `_releaseNativeLogger`.

@gianni-cor gianni-cor left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Requesting changes for concurrent lifecycle safety. load() should be serialized through the same queue as run() / unload() so overlapping load() calls on the same instance cannot race on shared addon state.

Comment thread packages/lib-infer-diffusion/index.js
- load() now runs through `this._run()` so concurrent calls on the same
  instance serialize instead of racing past the `configLoaded` guard.
  Two overlapping loads could previously both allocate a native addon
  and clobber `this.addon`, leaking one native handle.
- _load() now wraps `addon.activate()` in a try/catch that best-effort
  unloads the partially-initialized addon, releases the native logger,
  and resets `this.addon = null` before re-throwing. Matches the
  crash-safety pattern already in embed and LLM. A failing activate()
  no longer leaves a zombie native instance that the next load() would
  orphan.
- Add `test:unit` and `test:unit:generate` scripts that run the JS
  unit tests under `test/unit/*.test.js` via brittle + bare. Wire
  `test:unit` into `test:all` and into the PR workflow's ts-checks
  job so `map-addon-event.test.js` runs on every PR.
- `.gitignore` the generated `test/unit/all.js` brittle runner.
- CHANGELOG: document both fixes under Bug Fixes.
Unit tests need the bare runtime, which is only installed globally in
the integration-test workflow (via npm install -g bare). My previous
commit wired test:unit into the ts-checks job, which doesn't install
bare, so it would have failed with command not found in CI.

Chain test:unit at the script level instead — the integration-test
workflow already runs npm run test:integration, which now runs unit
tests first. Matches the standalone-repo precedent (qvac-lib-dl-filesystem,
qvac-lib-decoder-audio, qvac-lib-error-base, etc.) of having the test
script drive both.
Reviewer flagged that test:unit invoked the `bare` CLI, but the
ts-checks job did not install it. My previous commit's workaround —
chaining test:unit into test:integration at the script level — would
have re-run unit tests on every platform in the 7-way integration
matrix. Revert both.

Use the existing `tetherto/oss-actions/.github/actions/run-lint-and-unit-tests`
action instead, same as `qvac-lib-infer-onnx` and `ocr-onnx`. The
action installs bare globally and runs `npm run test:unit --if-present`
in a single fast step.
Matches the standalone-repo precedent (qvac-lib-inference-addon-base,
qvac-lib-dl-filesystem, etc.) so 'npm run test' runs both flows
locally for developers.

@maxim-smotrov maxim-smotrov left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please fix the mermaid diagram syntax.

Comment thread packages/lib-infer-diffusion/docs/architecture.md Outdated
donriddo added 11 commits April 16, 2026 13:34
Mermaid's classDiagram uses { and } as class-body delimiters, so the
inline object literal in the method signatures broke the parser and
prevented the diagram from rendering on GitHub. Replace:

- constructor(args: {files, config, logger?, opts?}) →
  constructor(args: ImgStableDiffusionArgs)  (matches index.d.ts)
- getState() {configLoaded} →
  getState() State

Reported by maxim-smotrov at architecture.md:150.
The run-lint-and-unit-tests action runs `npm run lint` and
`npm run test:unit`. The step name "Run JavaScript tests" hides
the lint half. Rename to "Run lint and unit tests" and update the
step id accordingly.
- index.d.ts: replace the stale "not yet supported, throws at runtime"
  JSDoc on `init_image` and `strength` with accurate docs covering the
  FLUX.2 in-context-conditioning branch and the SD/SDEdit branch. This
  PR ships img2img support end-to-end, so the type hover docs
  contradicted the runtime behavior.

- docs/img2img-quickstart.md: rewrite against the refactored API.
  Replace the two-arg constructor (`diskPath`, `modelName`,
  `llmModel`, `vaeModel`) with the single-object `{ files, config,
  logger }` shape, switch every example from `model.img2img(...)` to
  `model.run({ init_image, ... })`, and correct the package name from
  `@qvac/lib-infer-diffusion` to `@qvac/diffusion-cpp`.

- docs/architecture.md: bump the package header to v0.3.0.

- examples/quick-test.js: delete. It used ESM `import` syntax in a
  CommonJS package and imported a non-existent `diffusionAddon` named
  export; nothing referenced it.

- addon/src/model-interface/SdModel.cpp: remove the duplicate
  `genParams.height = imgH;` in the `useRefImages` branch. Harmless
  dead store, but easy to miss in review.
- architecture.md: Key Features list `Generation modes` now includes
  img2img alongside txt2img, matching the shipped runtime.
- CHANGELOG.md: migration example marker changes from
  `BEFORE (<= 0.1.x)` to `BEFORE (<= 0.2.x)` since 0.2.x also
  used the old two-argument constructor.
- index.d.ts: trim the init_image / strength JSDoc to verified facts.
  The old text said both were "not yet supported, throws at runtime",
  which is false on this branch (img2img ships end-to-end). Previous
  revision added branch-specific behavior claims; replaced with the
  minimal accurate description.
Both methods had one-line JSDoc on main describing what they do
("Cancel the current generation job.", "Unload the model and release
all resources."). The refactor dropped the JSDoc comments when it
rewrote the method bodies. Restore them since the purpose statement
is still accurate.
Tighten comments this PR introduced that drifted into over-explanation.
Leave pre-existing comments as-is.

- addon.js mapAddonEvent JSDoc: drop multi-paragraph prose; keep the
  one-sentence contract plus the param block.
- index.js constructor: collapse the cancel-closure rationale to one line.
- docs/architecture.md: bump Last Updated to 2026-04-16.
The refactor commit silently dropped the JSDoc block on _createAddon()
and the 'Error during stable-diffusion model load' error log in _load().
Put them back so the refactor only changes what needs to change.
_load() wrapped only await this.addon.activate() in try/catch, but
_createAddon() calls _connectNativeLogger() and then constructs
SdInterface. If the SdInterface constructor throws, _nativeLoggerActive
stays set and the native logger hook is never released; a retry on the
same instance would reconnect on top of a stale hook.

Move _createAddon() inside the try so the existing catch path runs
_releaseNativeLogger() for every pre-activate failure.
The 'test' alias was only consumed by 'test:all', and neither was
referenced in CI workflows or the README. 'test:all' ran test:unit
twice because it called both test:unit and the 'test' alias. Remove
'test' and rewrite 'test:all' to run test:unit, test:integration, and
test:cpp directly.
The 0.3.0 section used ## for Breaking Changes / Features / Bug Fixes
/ Pull Requests, making them siblings of the version heading instead
of children; bump those to ### and the leaf subsections from ### to
#### so the TOC renders correctly, matching the 0.2.x entries.

Two architecture.md spots still said the exclusive run queue
serialises only run()/unload() even though load() now also wraps in
this._run(...); align both.
maxim-smotrov
maxim-smotrov previously approved these changes Apr 17, 2026
Three flowchart node labels contained ( ) inside unquoted [ ] labels:
  RunJob[addon.runJob(paramsJson)]
  EncodePrompt[Encode prompt (CLIP)]
  InitLatents[Initialize random latents (seed)]
Mermaid treats the ( as the start of a stadium-shaped node inside
the rectangular node, so the diagram failed to parse. Quote each
label so the parens render literally.
yuranich
yuranich previously approved these changes Apr 17, 2026
docs/img2img-quickstart.md already notes that the FLUX.2 in-context
conditioning path does not use strength and the input image is routed
through ref_images instead. The d.ts on GenerationParams.strength
still implied the knob applies universally; match the doc so IDE
hover docs tell the same story.
@donriddo

Copy link
Copy Markdown
Contributor Author

/review

@github-actions

github-actions Bot commented Apr 17, 2026

Copy link
Copy Markdown
Contributor

Tier-based Approval Status

**PR Tier:** TIER1

**Current Status:** ✅ APPROVED

**Requirements:**
- 1 Team Member approval ✅ (3/1)
- 1 Team Lead OR Management approval ✅ (1/1)



---
*This comment is automatically updated when reviews change.*

@gianni-cor

Copy link
Copy Markdown
Contributor

/review

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants