
cleanup(diffusion): Removed overlay ports. #1066

Merged
gianni-cor merged 1 commit into
tetherto:main from
jpgaribotti:cleanup-diffusion
Mar 20, 2026

@jpgaribotti

Contributor

🎯 What problem does this PR solve?

  • Removes stale local overlay ports from lib-infer-diffusion so dependency builds come from the registry-managed ports.
  • Aligns dependency constraints to explicit registry port versions (port-version #1) for reproducible vcpkg resolution.

📝 How does it solve it?

  • Deletes packages/lib-infer-diffusion/vcpkg/ports/* (both ggml and stable-diffusion-cpp overlays), keeping only .gitkeep.
  • Updates packages/lib-infer-diffusion/vcpkg.json:
    • qvac-lint-cpp from 1.4.4 to 1.4.4#1
    • stable-diffusion-cpp from 2026-03-01 to 2026-03-01#1
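The `#1` suffix is vcpkg's port-version: a revision of the port recipe on top of the same upstream version, which defaults to 0 when omitted. The minimal C++ sketch below illustrates why `2026-03-01#1` sorts after `2026-03-01`. It rests on stated assumptions. `PortVersion`, `parsePortVersion` and `compareVersionDate` are hypothetical names. The lexical base comparison is only valid for `version-date` (YYYY-MM-DD) strings or for identical bases, and it does not handle general semver.

```cpp
#include <cassert>
#include <string>

// Hypothetical model of a vcpkg version reference such as "2026-03-01#1":
// a base version plus an optional port-version after '#' (0 when absent).
struct PortVersion {
    std::string base;
    int portVersion = 0;
};

PortVersion parsePortVersion(const std::string& ref) {
    const auto hash = ref.find('#');
    if (hash == std::string::npos) {
        return {ref, 0};
    }
    return {ref.substr(0, hash), std::stoi(ref.substr(hash + 1))};
}

// Returns <0, 0 or >0. Bases are compared as strings, which orders
// zero-padded YYYY-MM-DD dates correctly; the port-version only breaks ties.
int compareVersionDate(const std::string& a, const std::string& b) {
    const PortVersion pa = parsePortVersion(a);
    const PortVersion pb = parsePortVersion(b);
    if (pa.base != pb.base) {
        return pa.base < pb.base ? -1 : 1;
    }
    return pa.portVersion - pb.portVersion;
}
```

vcpkg itself still resolves the constraint in `vcpkg.json`. The sketch only mirrors the ordering rule.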

🧪 How was it tested?

  • Ran bare-make generate && bare-make build && bare-make install in packages/lib-infer-diffusion.
  • vcpkg resolution switched to the registry ports as intended; qvac-lint-cpp was installed from the monorepo package.

@jpgaribotti jpgaribotti self-assigned this Mar 20, 2026
@jpgaribotti jpgaribotti requested review from a team as code owners March 20, 2026 14:40
@gianni-cor

Contributor

/review

@github-actions

Contributor

Tier-based Approval Status

**PR Tier:** TIER1

**Current Status:** ✅ APPROVED

**Requirements:**
- 1 Team Member approval ✅ (1/1)
- 1 Team Lead OR Management approval ✅ (1/1)



---
*This comment is automatically updated when reviews change.*

@gianni-cor gianni-cor merged commit ba9f55e into tetherto:main Mar 20, 2026
16 of 17 checks passed
BrunoCampana added a commit that referenced this pull request Mar 23, 2026
* fix: fix race condition in LLM example download utility (#1019)

* fix: fix race condition in LLM example download utility

The redirect handler in examples/utils.js called fs.unlink fire-and-forget
then immediately recursed into downloadModel. The recursive call could find
the empty file still on disk (existsSync → true) before unlink completed,
causing an ENOENT crash on the subsequent statSync.

Port the proven download pattern from test/integration/utils.js:
- Wait for unlink callback before recursing on redirect
- Handle 307/308 redirects as well as 302 (used by HuggingFace)
- Handle relative redirect URLs
- Use safeResolve/safeReject guards to prevent double settlement
- Add response error handler and fileStream error handler
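The `safeResolve`/`safeReject` guards exist because several places can settle a download promise: the redirect, the response error, the stream error and the close event. Below is a C++ analogue of the same "first settlement wins" idea, sketched with `std::promise` and an atomic flag. `SettleOnce` is a hypothetical name, not code from the package.

```cpp
#include <atomic>
#include <cassert>
#include <exception>
#include <future>
#include <stdexcept>
#include <string>

// Hypothetical "settle once" wrapper: the first safeResolve/safeReject call
// settles the promise; later calls return false and do nothing.
class SettleOnce {
public:
    std::future<std::string> future() { return promise_.get_future(); }

    bool safeResolve(const std::string& path) {
        if (settled_.exchange(true)) return false;  // already settled
        promise_.set_value(path);
        return true;
    }

    bool safeReject(const std::string& reason) {
        if (settled_.exchange(true)) return false;  // already settled
        promise_.set_exception(std::make_exception_ptr(std::runtime_error(reason)));
        return true;
    }

private:
    std::promise<std::string> promise_;
    std::atomic<bool> settled_{false};
};
```

A second `set_value` on a `std::promise` would throw `std::future_error`. The flag turns that into a silent no-op, which matches the intent of the JS guards.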

* fix: use URL constructor for safer redirect resolution


* fix: fix race condition in embed and diffusion download utilities

Port the proven download pattern from the LLM package (PR #1019):
- Wait for fs.unlink callback before recursing on redirect
- Add safeResolve/safeReject guards to prevent double settlement
- Handle 307/308 redirects in embed examples/utils.js
- Add fileStream and response error handlers
- Use URL constructor for safer redirect resolution
- Use close event instead of finish for write completion


---------

Co-authored-by: gianni-cor <gianfranco.cordella@tether.io>

* doc: update README: add diffusion and diagnostics to the package table and add the OpenAI-compatible API to key features (#1033)

* fix: fix docs build, escape MDX curly braces in errors.mdx and remove randomly created file (#1051)

* doc: generate API docs for v0.8.0

* chore[notask]: remove accidentally committed file

* fix: fix docs build, escape MDX curly braces in errors.mdx and remove random file

* fix: revert pre-build script

---------

Co-authored-by: Bruno Campana <7632562+BrunoCampana@users.noreply.github.com>

* Fix security issues flagged by CodeQL in TTS package (#1058)

* Updated qvac-lint-cpp to match latest version from original repo (#1064)

* fix: add native job IDs to addon-cpp callbacks (#955)

* fix: preserve addon job ownership across cancel/reuse

Propagate native job IDs through addon-cpp queued callbacks so late cancel events stay attached to the cancelled job. Remove the Parakeet stale-cancel workaround and align Whisper with the shared runtime contract.

Made-with: Cursor

* chore: scope addon-cpp job-id update to 1.1.3

Limit this branch to the shared addon-cpp runtime changes and bump the package to 1.1.3. Follow-up addon consumer updates will land in separate PRs after the registry is updated.

Made-with: Cursor

* fix: move pending job state before unlock

Copy the pending job into local state before releasing the JobRunner mutex so processing and error paths no longer read job_ without synchronization.

Made-with: Cursor
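Below is a minimal sketch of the pattern in this commit. It assumes a single pending job guarded by one mutex. `JobRunner`, `Job` and `runPending` are simplified, hypothetical stand-ins for the addon-cpp types.

```cpp
#include <cassert>
#include <mutex>
#include <optional>
#include <string>
#include <utility>

// Hypothetical, simplified job and runner; the real addon-cpp types differ.
struct Job {
    int id;
    std::string input;
};

class JobRunner {
public:
    void submit(Job job) {
        std::lock_guard<std::mutex> lock(mutex_);
        job_ = std::move(job);
    }

    // Take the pending job into a local while the mutex is held, then do the
    // work on the local copy after unlocking, so no code path reads job_
    // without synchronization.
    std::optional<std::string> runPending() {
        std::optional<Job> local;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            local = std::move(job_);
            job_.reset();  // moved-from optional is still engaged; clear it
        }
        if (!local) return std::nullopt;
        return "job " + std::to_string(local->id) + ": " + local->input;
    }

private:
    std::mutex mutex_;
    std::optional<Job> job_;
};
```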

---------

Co-authored-by: GustavoA1604 <54457676+GustavoA1604@users.noreply.github.com>

* Removed overlay ports. Build from registry. (#1066)

* fix: use object config format in nativelog example (#1070)

* QVAC-13813 chore: add int8 parakeet eou and sortformer production registry entries (#1035)

* chore: Add int8 quantised models for Parakeet EOU and Sortformer

* fix: Add links for quantised parakeet models

* fix: Remove tokenizer for int8

---------

Co-authored-by: Ishan Vohra <ishanvohra@Ishans-MacBook-Air.local>
Co-authored-by: Yury Samarin <yuri.a.samarin@gmail.com>

* fix[notask]: resolve code scanning security findings in nmtcpp and ocr-onnx (#1060)

* fix[notask]: resolve code scanning security findings in nmtcpp and ocr-onnx

Fix ReDoS vulnerabilities in indic-processor URL and numeral regexes by
removing nested quantifiers. Fix ReDoS in sacremoses tokenizer protected
patterns by requiring opening quotes to eliminate ambiguous backtracking.
Fix incomplete string replacement in indic_normalize by using global
regex for pipe character substitution. Replace insecure tempfile.mktemp
with NamedTemporaryFile in ocr-onnx benchmark script.

* fix[notask]: resolve polynomial ReDoS in numeral and other patterns

Fix _NUMERAL_PATTERN by replacing ambiguous \d+\.?\d* with
\d+(?:\.\d+)? to eliminate overlapping digit quantifiers.
Fix _OTHER_PATTERN by bounding the prefix to {0,100} to prevent
polynomial backtracking when no separator is found.

* fix[notask]: bound regex quantifiers to eliminate polynomial ReDoS

Replace unbounded \d+ with \d{1,20} and \w+ with \w{1,100} in
_NUMERAL_PATTERN and _OTHER_PATTERN so the backtracking at each
starting position is bounded by a constant, independent of input
length. No real-world numeral exceeds 20 digits
and no hashtag/mention exceeds 100 chars.
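The bounded numeral pattern is illustrated below with C++ `std::regex` (ECMAScript grammar). Two points are assumptions: that the two commits combine into `\d{1,20}(?:\.\d{1,20})?` as the final `_NUMERAL_PATTERN`, and the helper `isNumeral` itself, which is hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical bounded numeral check: at most 20 integer digits and an
// optional fractional part that must contain at least one digit after the
// dot, so the integer and fractional digit runs can no longer overlap.
bool isNumeral(const std::string& s) {
    static const std::regex numeral(R"(\d{1,20}(?:\.\d{1,20})?)");
    return std::regex_match(s, numeral);
}
```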

---------

Co-authored-by: RamazTs <66473301+RamazTs@users.noreply.github.com>

* feat[whisper][notask]: add streaming VAD transcription to whisper addon (#998)

* feat: add streaming VAD transcription to whisper addon

- Add C++ StreamingProcessor with Silero VAD for speech segmentation
- StreamingProcessor runs on its own thread, buffers incoming audio,
  and uses whisper_vad_* APIs to detect speech boundaries
- RAII wrapper (VadSegmentsPtr) for automatic VAD segment cleanup
- Backpressure handling: drop oldest audio when buffer exceeds cap
- JS bindings: startStreaming, appendStreamingAudio, endStreaming
- New error codes for streaming operations (6012-6014)
- Addon state properly reset in response finally handler

Made-with: Cursor
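The backpressure rule drops the oldest audio once the buffer exceeds its cap, which can be sketched as a bounded deque. `BoundedAudioBuffer` is hypothetical and is not the addon's `StreamingProcessor`. Counting the cap in samples is also an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical bounded audio buffer: when an append would exceed the cap,
// the oldest samples are dropped so the newest audio is always kept.
class BoundedAudioBuffer {
public:
    explicit BoundedAudioBuffer(std::size_t capSamples) : cap_(capSamples) {
        assert(cap_ > 0);  // a zero cap would make pop_front() on empty UB
    }

    // Returns how many old samples were dropped to stay within the cap.
    std::size_t append(const std::vector<float>& samples) {
        std::size_t dropped = 0;
        for (float s : samples) {
            if (buffer_.size() == cap_) {
                buffer_.pop_front();
                ++dropped;
            }
            buffer_.push_back(s);
        }
        return dropped;
    }

    std::size_t size() const { return buffer_.size(); }
    float oldest() const { return buffer_.front(); }

private:
    std::size_t cap_;
    std::deque<float> buffer_;
};
```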

* fix: address PR review comments for whisper streaming VAD

- Replace g_streamingProcessors map with single-processor globals
  (one active streaming job at a time per Gustavo's feedback)
- Wire streaming cleanup into cancel and destroyInstance via
  cancelWithStreaming and destroyInstanceWithStreaming wrappers
- Add StreamingProcessor::cancel() for forceful abort with
  model cancellation and thread join
- Fix stats accumulation: use WhisperModel::process(Input&) void
  overload + takeOutput() so stats accumulate across segments
  instead of resetting per-segment
- Add WhisperModel::prepareForStreaming() to reset stats and
  cancel flag once at session start
- Propagate segment processing errors via hasError_ flag and
  queue exception at stream end
- Add streaming methods to MockedBinding (startStreaming,
  appendStreamingAudio, endStreaming, error simulation)
- Add 6 unit tests covering streaming lifecycle, stats, cancel,
  destroy, error propagation, and concurrent session rejection
- Add example.streaming-vad.js demonstrating runStreaming() API
  with fs.createReadStream as audio source

Made-with: Cursor

---------

Co-authored-by: Raju <raju.sharma>

* QVAC-14357 fix(onnx): Code clean-up and fixes (#1049)

* (feature) llamacpp-llm: dynamic tools (#706)

* (improvement) llamacpp-llm: Qwen3 dynamic tools template

* (improvement) llamacpp-llm: add llm config tools flag

* (improvement) llamacpp-llm: use template based on tools param

* (improvement) llamacpp-llm: count tools token offset with tokenizer

* (improvement) llamacpp-llm: track n-past, run Qwen3 tests, fix reset

* (improvement) llamacpp-llm: save cache with respect to tools flag

* (fix) llamacpp-llm: add Qwen3ToolsDynamicTemplate.cpp to production CMakeLists

The new source file was added to the test CMakeLists but missing from the addon and cli_tool targets, causing an undefined symbol linker error on CI win64 builds.

Made-with: Cursor

* chore: retrigger CI for CMakeLists fix

Made-with: Cursor

* (fix) llamacpp-llm: fix use-after-free SIGSEGV on process exit (linux)

Reorder TextLlmContext members so threadpools are declared before llamaInit_. C++ destroys members in reverse declaration order, so llamaInit_ (which calls llama_free) now runs while threadpools are still alive, preventing use-after-free when llama_free accesses attached threadpool pointers.

Made-with: Cursor
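This commit relied on a language rule: non-static members are destroyed in reverse declaration order, after the destructor body runs. The small C++ sketch below checks that rule. `Context`, `threadpool_` and `llamaInit_` are simplified, hypothetical stand-ins for `TextLlmContext` and its members. This commit is reverted below, so the sketch illustrates the rule rather than the final fix.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Records the order in which tracked members are destroyed.
std::vector<std::string> g_destroyed;

struct Tracked {
    std::string name;
    ~Tracked() { g_destroyed.push_back(name); }
};

// Declaring threadpool_ first means it is destroyed last, so it is still
// alive while llamaInit_ is being torn down.
struct Context {
    Tracked threadpool_{"threadpool"};
    Tracked llamaInit_{"llamaInit"};
};
```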

* Revert "(fix) llamacpp-llm: fix use-after-free SIGSEGV on process exit (linux)"

This reverts commit 7d9c237.

* (fix) llamacpp-llm: robust threadpool teardown to prevent SIGSEGV on exit

The ThreadPoolDeleter was doing ggml backend registry lookups during destruction, which is fragile during process teardown when the registry may already be torn down. Additionally, threadpools attached to llama_context could be freed before the context itself, causing use-after-free. Fix: cache ggml_threadpool_free fn pointer at construction time, and add explicit destructor that detaches threadpools before freeing them.

Made-with: Cursor

* Revert "(fix) llamacpp-llm: robust threadpool teardown to prevent SIGSEGV on exit"

This reverts commit 4e66b38.

* fix(llm): reset stale state before non-cached run after prefill

When a prefill run leaves nPast_ > 0 and the next run is a non-cached single-shot, the stale KV cache and dynamic-tools bookkeeping (nPastBeforeTools_, nConversationOnlyTokens_) caused token duplication and incorrect cache trimming. Clear state eagerly when shouldResetAfterInference is true and nPast_ is non-zero.

Made-with: Cursor

* fix(llm): trim stale tool tokens in multi-turn sessions with tools_at_end

When tools_at_end is true and a session continues without explicit save between turns, old tool+response tokens remained in the KV cache. New tool tokens were appended, causing conflicting tool definitions.

Add a guard in processPrompt() that trims from nPastBeforeTools_ to nPast_ before eval when stale tool tokens are detected. Includes new dynamic-tools integration tests covering changing tools, same tools, and single-shot regression.

Made-with: Cursor

* (fix) llamacpp-llm: dynamic tools cache trim, tmp template, debugs

* fix(llm): pass toolsAtEnd flag to context constructors to fix template selection race

The toolsAtEnd flag was set via setToolsAtEnd() after context creation,
but getChatTemplateForModel() was called during construction — always
seeing toolsAtEnd=0 and selecting the wrong Qwen3 template.

Pass the flag through createContext() into TextLlmContext and
MtmdLlmContext constructors so the correct template is selected
from the start. Also restore the conditional template selection
in ChatTemplateUtils that was previously hardcoded.

* feat(llm): strip tool_call/think blocks from re-sent assistant responses

Add stripInternalBlocks() helper to testToolRemoval.js and
benchToolsPlacement.js to remove <tool_call> and <think> blocks
from assistant responses before including them in conversation
history. Prevents model from pattern-matching on old tool calls
and hallucinating removed tools.

Also extend benchToolsPlacement to 20 turns and add HTML chart.
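`stripInternalBlocks()` lives in the JS test and benchmark scripts. Below, the same idea is sketched in C++ with `std::regex`, using ECMAScript grammar, non-greedy matches, and `[\s\S]` to span newlines. The function body is an assumption about what the helper does, based only on the description above.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical C++ version of stripInternalBlocks(): removes <tool_call>...
// and <think>... blocks (non-greedy, across newlines), then trims whitespace.
std::string stripInternalBlocks(const std::string& text) {
    static const std::regex blocks(
        R"(<tool_call>[\s\S]*?</tool_call>|<think>[\s\S]*?</think>)");
    const std::string out = std::regex_replace(text, blocks, "");
    const auto first = out.find_first_not_of(" \t\r\n");
    if (first == std::string::npos) return "";
    const auto last = out.find_last_not_of(" \t\r\n");
    return out.substr(first, last - first + 1);
}
```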

* (fix) llamacpp-llm: use correct template in tests

* (chore) llamacpp-llm: move qwen3 cache tests to own file

* (improvement) llamacpp-llm: simplify nPastBeforeTools reset, multi-turn cache tests

* (improvement) llamacpp-llm: simplify nPastBeforeTools tracking, no trim on save

* (chore) llamacpp-llm: remove redundant getters and cleanup

* (internal) llamacpp-llm: run Qwen3 context tests

* (chore) cleanup

* (chore) fix lint errors in examples

* (chore) fix remaining lint errors in benchToolsPlacement

* (chore) fix indentation in benchToolsPlacement ternary

* (chore) llamacpp-llm: remove unused example files

* (chore) remove scratch planning docs

* (doc) llamacpp-llm: tools_at_end param description

* (chore) llamacpp-llm: changelog and version bump

* refactor(llamacpp-llm): address PR #706 review comments

Implement all 10 reviewer requests from PR #706 (jesusmb1995, gianni-cor).

| # | Reviewer | Request | Result |
|---|---------|---------|--------|
| R1 | @jesusmb1995 | Extract DynamicToolsState class | Done - new class in LlmContext.hpp with toolsAtEnd_, nConversationOnlyTokens_, nPastBeforeTools_, recordToolBoundary(), reset() |
| R2 | @jesusmb1995 | Collapse 3 virtual methods into single dynamicToolsState() accessor | Done - removed setToolsAtEnd, getNPastBeforeTools, setNPastBeforeTools virtuals; added dynamicToolsState() non-virtual accessor on base class |
| R3 | @gianni-cor | Remove redundant setToolsAtEnd() after createContext() | Done - removed the 4-line block in LlamaModel::init() |
| R4 | @gianni-cor | Add assert: nConversationOnlyTokens_ <= inputTokens.size() | Done - added in TextLlmContext::tokenizeChat |
| R5 | @gianni-cor | Reset nConversationOnlyTokens_ in TextLlmContext::resetState | Done - both contexts now call dynamicToolsState().reset() which resets both values |
| R6 | @gianni-cor | Guard tools_at_end for non-Qwen3 models | Done - architecture check after config parsing, logs warning and disables flag |
| R7 | @gianni-cor | Fix off-by-A trim error (disable add_generation_prompt) | Done - both TextLlmContext and MtmdLlmContext save/restore add_generation_prompt=false during no-tools tokenization |
| R8 | @gianni-cor | Add cold-start reset in MtmdLlmContext::tokenizeChat | Done - dynamicToolsState().reset() added at cold-start path |
| R9 | @gianni-cor | Cap firstMsgTokens_ after post-eval trim | Done - setFirstMsgTokens(getNPast()) if inflated after trim |
| R10 | @gianni-cor | Remove duplicate toolsAtEnd_ from LlamaModel | Done - runtime code in processPromptImpl queries dynamicToolsState().toolsAtEnd() instead of state_->toolsAtEnd_ |

Made-with: Cursor
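The following is a hedged sketch of the state described in R1, R4 and R5, combined with the stale-tool trim from the earlier `tools_at_end` fix. Member names follow the table. Everything else is assumed for illustration: the signatures, the plain token vector that stands in for the KV cache, and `trimStaleTools`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch; member names come from the PR, signatures do not.
class DynamicToolsState {
public:
    explicit DynamicToolsState(bool toolsAtEnd) : toolsAtEnd_(toolsAtEnd) {}

    bool toolsAtEnd() const { return toolsAtEnd_; }
    int nPastBeforeTools() const { return nPastBeforeTools_; }
    std::size_t nConversationOnlyTokens() const { return nConversationOnlyTokens_; }

    // Remember where the conversation ends and the tool block begins.
    void recordToolBoundary(int nPast, std::size_t conversationOnly, std::size_t totalTokens) {
        assert(conversationOnly <= totalTokens);  // mirrors the R4 invariant
        nPastBeforeTools_ = nPast;
        nConversationOnlyTokens_ = conversationOnly;
    }

    // Clears both counters (R5) but keeps the configuration flag.
    void reset() {
        nPastBeforeTools_ = 0;
        nConversationOnlyTokens_ = 0;
    }

private:
    bool toolsAtEnd_;
    int nPastBeforeTools_ = 0;
    std::size_t nConversationOnlyTokens_ = 0;
};

// Hypothetical trim: drop cached tokens from the recorded tool boundary to
// the end, so new tool definitions are not appended after stale ones.
int trimStaleTools(std::vector<int>& kvTokens, const DynamicToolsState& dts) {
    const int boundary = dts.nPastBeforeTools();
    if (dts.toolsAtEnd() && boundary > 0 &&
        boundary < static_cast<int>(kvTokens.size())) {
        kvTokens.resize(static_cast<std::size_t>(boundary));
    }
    return static_cast<int>(kvTokens.size());
}
```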

* refactor(llamacpp-llm): remove toolsAtEnd_ from ReloadableState, single source of truth in DynamicToolsState

Made-with: Cursor

* fix(llamacpp-llm): use dts.reset() after post-eval trim for full state cleanup

Made-with: Cursor

* (draft) llamacpp-llm: dynamic tools cache tokens test debug

* (internal) llamacpp-llm: dynamic tools token count and cache match test

* Revert "(internal) llamacpp-llm: dynamic tools token count and cache match test"

This reverts commit 181b98a.

* Revert "(draft) llamacpp-llm: dynamic tools cache tokens test debug"

This reverts commit 27e6a5c.

* fix(llamacpp-llm): address PR review comments N3-N8, merge main

N3: Save/restore inputs.use_jinja around no-tools tokenization to
    prevent getPrompt() Jinja fallback from corrupting the flag.
N4: Remove dead Jinja template variables (ns.multi_step_tool,
    ns.last_query_index) from Qwen3ToolsDynamicTemplate.
N5: Add missing assert(conversationOnlyTokens <= totalTokens) in
    MtmdLlmContext::tokenizeChat, matching TextLlmContext.
N6: Document Qwen3-only model support in tools-at-end.md.
N7: Merge duplicate if(nPast_==0 && !isCacheLoaded) blocks in
    TextLlmContext::tokenizeChat.
N8: Remove unnecessary save/restore of inputs.tools and
    inputs.add_generation_prompt (locals not read after).

Also: merge main into feature branch, move dynamic-tools changelog
to separate 0.13.1 entry.

Made-with: Cursor

* style(llamacpp-llm): apply clang-format to all PR-touched C++ files

Made-with: Cursor

* style(llamacpp-llm): fix remaining clang-format-19 brace-init formatting

Made-with: Cursor

* chore: remove accidentally committed binary file

The file packages/ocr-onnx/big_and_clear_watermarks.png was unintentionally staged during merge conflict resolution.

Made-with: Cursor

* chore(llm): bump version to 0.14.0

Made-with: Cursor

* chore: remove working artifacts from feature branch

Made-with: Cursor

* chore: remove accidentally committed sdk model history file

Made-with: Cursor

* doc: add dynamic-tools examples to README

Made-with: Cursor

* fix(llm): reset use_jinja from params_ instead of save/restore

Made-with: Cursor

* fix(llm): reset use_jinja before second getPrompt call

Made-with: Cursor

---------

Co-authored-by: Dmitry Malishev <dmitry.malishev@tether.io>
Co-authored-by: olyasir <sirkinolya@gmail.com>
Co-authored-by: gianni <gianfranco.cordella@tether.io>

* [tetherto/qvac] fix(nmtcpp): fix critical C++ bugs, add lint-cpp, update README (#1071)

* fix(nmtcpp): fix critical C++ bugs, add lint-cpp, update README

- Fix UB: PivotTranslationModel::translateString missing return path
- Fix cancel propagation to sub-models in PivotTranslationModel
- Fix stopTranslation_ flag never reset after cancel
- Fix translateBatch ignoring cancellation flag
- Fix private inheritance of IModelCancel in TranslationModel and
  PivotTranslationModel (enables dynamic_cast from framework)
- Fix typo: "Invalid backed type" -> "Invalid backend type"
- Fix operator precedence in detectBackendType (add explicit parens)
- Add lint-cpp script to package.json
- Update README: fix Bare version mismatch, doc links, pause/resume
  claim, add pivot example, update clone URLs for monorepo, clarify
  Bergamot build flag

Made-with: Cursor

* delete Move Semantics

---------

Co-authored-by: olyasir <sirkinolya@gmail.com>

* chore[notask]: backmerge release @qvac/cli v0.2.2 (#1076)

* chore: trigger CLI release 0.2.2 (#1011)

* doc[notask|skiplog]: add changelog for CLI v0.2.2 (#1013)

* doc[notask|skiplog]: add changelog for CLI v0.2.2

Made-with: Cursor

* fix: preserve existing changelog history

Made-with: Cursor

---------

Co-authored-by: Lauri Piisang <lauri.piisang@gmail.com>

* QVAC-14188: langdetect-text-cld2 ISO 639-3 support (#1078)

feat: cld2 support for ISO 639-1/2/3 code inputs for getting language names

* fix: handle absolute companion model paths in diffusion addon (#1077)

The SDK's resolveConfig() resolves companion model names (clipL, clipG,
t5Xxl, llm, vae) to absolute disk paths. Previously, the addon always
joined these with diskPath, which would produce broken double-joined
paths when given an already-absolute path. Add a resolve() helper that
passes absolute paths through unchanged and only joins relative ones.
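The addon itself is JS, but the rule is easy to show with C++ `std::filesystem`. `resolveCompanion` is a hypothetical name, and the test paths assume a POSIX system.

```cpp
#include <cassert>
#include <filesystem>

namespace fs = std::filesystem;

// Hypothetical analogue of the addon's resolve() helper: absolute companion
// paths pass through unchanged; relative names are joined onto diskPath.
fs::path resolveCompanion(const fs::path& diskPath, const fs::path& name) {
    if (name.is_absolute()) {
        return name;
    }
    return diskPath / name;
}
```

Node's `path.join` concatenates two absolute paths into a double-joined one, which is the bug described above. On POSIX, `std::filesystem::path::operator/` already replaces the left side when the right side is absolute. The explicit check still keeps the intent obvious and does not rely on that behavior.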

Co-authored-by: gianni-cor <gianfranco.cordella@tether.io>

* fix: recover content gaps (#1067)

* infra[notask]: extend onnx tts mobile device farm timeouts and run q4/q4f16 matrix (#1075)

* chore: Add fp16 and q4 models in mobile integration tests

* fix: Increase timeout and run q4 and q4f16 models

---------

Co-authored-by: Ishan Vohra <ishanvohra@Ishans-MacBook-Air.local>
Co-authored-by: GustavoA1604 <54457676+GustavoA1604@users.noreply.github.com>

* fix: replace lab results test fixture image (#1063)

Update the DocTR lab results fixture to use the new realistic sample while keeping the original filename for existing test and workflow references.

Made-with: Cursor

Co-authored-by: olyasir <sirkinolya@gmail.com>

* fix: update package.json URLs to monorepo for all packages (#1088)

* fix: update package.json URLs to point to monorepo for LLM, Embed, and Diffusion addons

The repository, bugs, and homepage URLs pointed to old standalone repos
that are either private or non-existent. Update to point to the qvac
monorepo with correct directory fields for npm.

* fix: update package.json URLs to monorepo for nmtcpp, ocr-onnx, and registry-server

Same fix as the previous commit but for the remaining packages with
stale standalone repo URLs.

* fix: add repository and homepage fields to remaining JS packages

Add consistent repository, bugs, and homepage fields pointing to
the monorepo for error, dl-base, dl-filesystem, dl-hyperdrive,
infer-base, langdetect-text, and rag packages.

* fix: add monorepo metadata to remaining packages

Add repository (with directory), bugs, and homepage fields to sdk,
logging, decoder-audio, diagnostics, onnx, tts-onnx, and
langdetect-text-cld2. Fix whispercpp to include directory in
repository and package-scoped homepage.

* fix: add monorepo metadata to cli, registry-client, and registry-schema

Add homepage to cli. Add repository, bugs, and homepage to
registry-client and registry-schema sub-packages.

* feat[notask]: add download profiler for registry blob performance diagnostics (#1040)

* feat[notask]: add download profiler for registry blob performance diagnostics

Made-with: Cursor

* fix: move profiler deps from devDependencies to dependencies

Made-with: Cursor

* doc: add profile command and example to client README

Made-with: Cursor

* fix: show full peer keys in profiler output for troubleshooting

Made-with: Cursor

* fix: validate parseInt results for interval and timeout CLI flags

Made-with: Cursor

---------

Co-authored-by: Proletter <40578159+Proletter@users.noreply.github.com>
Co-authored-by: Simon Iribarren <simon.ig13@gmail.com>

* fix: resolve dependabot alerts for registry-server transitive deps (#1093)

* fix(registry-server): PBKDF2 for passphrase-derived keys (CodeQL #9) (#1065)

* fix(registry-server): derive passphrase keys with PBKDF2

Replace single-pass SHA-256 with PBKDF2-HMAC-SHA256 (310k iterations)
for deterministic test keys; addresses CodeQL js/insufficient-password-hash.

* chore(registry-server): remove passphrase migration note from guide

---------

Co-authored-by: Proletter <40578159+Proletter@users.noreply.github.com>

---------

Co-authored-by: Ridwan Taiwo <donriddo@gmail.com>
Co-authored-by: gianni-cor <gianfranco.cordella@tether.io>
Co-authored-by: Giacomo <119889121+GiacomoSorbiWork@users.noreply.github.com>
Co-authored-by: GustavoA1604 <54457676+GustavoA1604@users.noreply.github.com>
Co-authored-by: Juan Pablo Garibotti Arias <juan.arias@bitfinex.com>
Co-authored-by: ogad-tether <omar.gad@tether.io>
Co-authored-by: dev-nid <nidhinpd811@gmail.com>
Co-authored-by: Ishan Vohra <ishanvohra2@gmail.com>
Co-authored-by: Ishan Vohra <ishanvohra@Ishans-MacBook-Air.local>
Co-authored-by: Yury Samarin <yuri.a.samarin@gmail.com>
Co-authored-by: olyasir <sirkinolya@gmail.com>
Co-authored-by: RamazTs <66473301+RamazTs@users.noreply.github.com>
Co-authored-by: Raju Sharma <sharmaraju352@gmail.com>
Co-authored-by: iancris <17702377+iancris@users.noreply.github.com>
Co-authored-by: Mikhail Sotnikov <mialsot@gmail.com>
Co-authored-by: Dmitry Malishev <dmitry.malishev@tether.io>
Co-authored-by: alsrivas <40749307+Alok-Ranjan23@users.noreply.github.com>
Co-authored-by: Simon Iribarren <simon.ig13@gmail.com>
Co-authored-by: Lauri Piisang <lauri.piisang@gmail.com>
Co-authored-by: Proletter <40578159+Proletter@users.noreply.github.com>
aegioscy added a commit that referenced this pull request Apr 2, 2026
Remove the ggml overlay port to align with main branch (commit ba9f55e)
which switched to using ggml from the registry instead of overlay ports.

This ensures consistency across the codebase and avoids reintroducing
the overlay port that was intentionally removed in PR #1066.

Made-with: Cursor
gianni-cor added a commit that referenced this pull request Apr 15, 2026
…nditioning (#884)

* updated for sd

* updated and successfully built

* downloads

* updated with working loading

* updated load model js for Q4_K test

* rewrote parameter handling to support multiple params and also two different model types

* got sd inference to work

* updated for sd2

* got full sdxl to work

* rename folder to qvac-lib-infer-diffusion

* update package name

* sd3 finished

* rename: qvac-lib-infer-diffusion -> lib-infer-diffusion

Rename package directory from packages/qvac-lib-infer-diffusion to
packages/lib-infer-diffusion to align with the lib-* naming convention
used across the monorepo.

Made-with: Cursor

* updated for cuda linux

* updated for model

* have something working

* changelog

* cpp lint

* format

* updated model for gian

* integration test

* fixing according to boss

* fix(android): enable BUILD_SHARED_LIBS and stub pthread_cancel for GGML_BACKEND_DL

GGML_BACKEND_DL requires BUILD_SHARED_LIBS=ON so CMake can build GPU
backends as MODULE targets (.so). Previously BUILD_SHARED_LIBS was
hardcoded OFF, causing configure to fail on Android.

Also stub out pthread_cancel in ggml-backend-reg.cpp via a cmake
string replacement — pthread_cancel is unavailable in the Android NDK.
The loader thread terminates naturally without the explicit cancel.

Made-with: Cursor

* fix(android): exclude Vulkan on Android and fix pthread_cancel stub

Two portfile fixes for arm64-android cross-compile:

1. SD_VULKAN: the else() branch was enabling -DSD_VULKAN=ON for Android,
   causing find_package(Vulkan) to pick up the host x86_64 SDK during
   cross-compile and fail CMake configure. Android Vulkan support comes
   via the NDK and is handled separately; skip the flag entirely.

2. pthread_cancel: replace the fragile comment-based no-op with a proper
   inline stub guarded by #if defined(__ANDROID__), injected at the top
   of ggml-backend-reg.cpp before compilation.

Made-with: Cursor

* ci: dump vcpkg configure logs on failure for android build

Adds an always-run step that cats all config-*.log files from the
vcpkg stable-diffusion-cpp buildtrees on failure, so the exact CMake
configure error is visible inline in the CI job output.

Made-with: Cursor

* fix(android): insert pthread_cancel stub after pthread.h include

The previous stub was prepended to the top of ggml-backend-reg.cpp
before any #include, so pthread_t was undefined and the stub itself
failed to compile — leaving pthread_cancel undeclared for the actual
call site.

Fix: insert the no-op stub immediately after #include <pthread.h>
so pthread_t is available. Add a fallback that prepends both the
include and stub if <pthread.h> isn't found directly.

Also pass HAVE_PTHREAD_CANCEL=0 and GGML_HAVE_PTHREAD_CANCEL=OFF
as CMake cache variables to disable any check_function_exists tests,
and add DISABLE_PARALLEL_CONFIGURE to avoid race conditions with
source patches.

Made-with: Cursor

* fix(android): resolve BUILD_SHARED_LIBS override and pthread_cancel issues

Locally verified: stable-diffusion-cpp:arm64-android now configures and
builds successfully.

Three root causes fixed:

1. BUILD_SHARED_LIBS override: vcpkg maps VCPKG_LIBRARY_LINKAGE to
   BUILD_SHARED_LIBS, and the arm64-android triplet sets linkage to
   "static" — appending -DBUILD_SHARED_LIBS=OFF after our explicit ON.
   Additionally, stable-diffusion.cpp's CMakeLists.txt resets
   BUILD_SHARED_LIBS=OFF unless SD_BUILD_SHARED_GGML_LIB=ON.
   Fix: set VCPKG_LIBRARY_LINKAGE=dynamic for this port when DL
   backends are enabled, and pass -DSD_BUILD_SHARED_GGML_LIB=ON.

2. pthread_cancel stub redefinition: the previous stub was inserted
   via string(REPLACE) + fallback string(PREPEND), but both paths
   executed — producing a duplicate definition error. Also, vcpkg
   reuses cached source trees, so patches accumulated across builds.
   Fix: use a sentinel comment for idempotency; only one insertion
   path with the stub placed after #include <pthread.h>.

3. Removed the now-unnecessary explicit BUILD_SHARED_LIBS_OPTION
   variable since VCPKG_LIBRARY_LINKAGE handles it correctly.

Made-with: Cursor
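The portfile performs the real patch with CMake `string()` commands. The same idempotent, single-path logic is sketched below in C++ for clarity. The sentinel text, the stub body and `insertPthreadCancelStub` are hypothetical. Leaving the source unchanged when `<pthread.h>` is missing is also an assumption.

```cpp
#include <cassert>
#include <string>

// Hypothetical sentinel marking that the stub has already been inserted.
const std::string kSentinel = "// qvac: pthread_cancel stub";

// Inserts a no-op pthread_cancel stub right after #include <pthread.h>.
// Re-running on an already patched (cached) source tree is a no-op.
std::string insertPthreadCancelStub(std::string source) {
    if (source.find(kSentinel) != std::string::npos) {
        return source;  // already patched
    }
    const std::string include = "#include <pthread.h>";
    const auto pos = source.find(include);
    if (pos == std::string::npos) {
        return source;  // single insertion path only; nothing to anchor on
    }
    const std::string stub = "\n" + kSentinel +
        "\n#if defined(__ANDROID__)\n"
        "static inline int pthread_cancel(pthread_t) { return 0; }\n"
        "#endif";
    source.insert(pos + include.size(), stub);
    return source;
}
```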

* updated for android hopefully works

* added opencl support for android

* windows attempt fix

* attempting to fix windows again

* NORM problem with ggml operation

* attempting to patch norm

* attempting again to fix

* diagnostic step

* update for opencl

* updated for device selection

* fix(diffusion): add CI/CD workflows, test infra, and integration tests (#676)

* fix(diffusion): rebase on feature-media-generation, add CI improvements

Rebased cleanly onto feature-media-generation to pick up:
- SD_CPU_ONLY env var gate (Metal NORM op fallback to CPU)
- GGML_OPENMP=OFF (eliminates libomp.so.5 dependency)
- OpenCL support for Android

Additions on top of base:
- Add cpp-tests and ts-checks jobs to on-pr workflow
- Add image artifact upload to integration tests (traceable to source test)
- Disable win32 in prebuilds/integration/cpp-tests (C1128 /bigobj)
- Install libomp5 on Linux integration tests (safety net)
- Test infrastructure: unit tests, mobile test framework, scripts

* fix(diffusion): address PR review comments, enable win32, improve CI artifacts

- Re-enable win32 platform in prebuilds, integration-test, and cpp-tests workflows
- Remove duplicate PULL_REQUEST_TEMPLATE.md (already in repo root)
- Fix setDiff in validate-mobile-tests.js to handle non-Set inputs
- Refactor generate-image.test.js to use ensureModel from utils.js
- Save test images to modelDir for mobile permission compatibility
- Update CI to look for images in test/model/ instead of output/
- Add PR comment step to post image metadata on pull requests

* fix(diffusion): restore base branch code accidentally removed during rebase

Restores SD_CPU_ONLY patch, GGML_OPENMP=OFF, OpenCL support, Apple
keep_clip_on_cpu guard, and VCPKG_BUILD_TYPE placement that were
dropped when patches were applied on top of the reset base.

* style(diffusion): fix lint errors in examples (no-multi-spaces, indent)

* feat(diffusion): upload test images to S3 and display inline in step summary

Images are uploaded to S3 with public-read ACL, then embedded in the
step summary and PR comments via their S3 URLs so they render inline
without needing to download artifacts.

* ci(diffusion): remove libomp5 install (fixed by GGML_OPENMP=OFF in portfile)

* remove S3 upload, use simple table summary for generated images

* restore AWS env vars from base branch

* refactor(diffusion): consolidate test utils, remove helpers.js

Move detectPlatform, setupJsLogger, isPng into utils.js and update
generate-image.test.js to import from utils.js only. Add platform
detection for device selection in model-loading.test.js.

* fixed integration tests

* updated

* updated timeout

* cpp unit tests complete and tested YAY BABY

* cpp lint

* updated

* test(diffusion): add integration tests for SDXL, SD3, and FLUX.2 (#757)

* test(diffusion): add integration tests for SDXL, SD3, and FLUX.2

Add integration tests for all supported model families based on the
existing examples. Each test follows the LLM addon patterns: platform-
aware device selection, defensive cleanup with .catch(), ensureModel
for CI downloads.

- generate-image-sdxl.test.js: SDXL Base 1.0 (all-in-one GGUF, auto eps-prediction)
- generate-image-sd3.test.js: SD3 Medium (safetensors, flow prediction, euler sampler)
- generate-image-flux2.test.js: FLUX.2 klein 4B (split layout: diffusion + LLM + VAE)
- Regenerate all.js (brittle) and integration.auto.cjs (mobile)

* fix(diffusion): use CPU on all darwin platforms

Metal's GGML_OP_MUL_MAT is unsupported for stable-diffusion.cpp,
causing SIGABRT on darwin-arm64. Use isDarwin (all darwin) instead
of isDarwinX64 for the useCpu check.

* revert: keep GPU on darwin-arm64 to surface Metal errors

Don't hide GPU errors behind CPU fallback — the Metal MUL_MAT
issue needs to be visible so it gets fixed.

* test(diffusion): increase test timeouts for CPU-bound runs

FLUX.2 30min, SDXL/SD3 15min — these models are too heavy for
the default 10min timeout when running on CPU.

* chore: remove all.js from tracking (auto-generated, gitignored)

* test(diffusion): skip SDXL, SD3, and FLUX.2 tests on mobile

* QVAC-13954: Clean up vcpkg deps in lib-infer-diffusion (#781)

* refactor: split ggml into standalone vcpkg overlay port

Decouple ggml from the stable-diffusion-cpp overlay port so it can be
shared by multiple consumers with consistent ABI guarantees.

- Add standalone ggml overlay port (version-date 2026-01-30) pinned to
  the same commit used by stable-diffusion.cpp master-514-5792c66
- Refactor stable-diffusion-cpp port to use vcpkg_from_github +
  SD_USE_SYSTEM_GGML=ON instead of cloning with --recurse-submodules
- Patch ggml's src/CMakeLists.txt and cmake/ggml-config.cmake.in to
  propagate GGML_MAX_NAME=128 via INTERFACE_COMPILE_DEFINITIONS,
  ensuring all consumers share the same struct layout
- Switch both ports to version-date versioning (no upstream semver)
- Replace bundled stb headers with vcpkg stb dependency
- Auto-enable Vulkan backend on Linux via platform dependency
- Forward GPU backend features (metal/vulkan/cuda/opencl) from
  stable-diffusion-cpp to ggml through vcpkg feature

* fix(diffusion): fix ggml/sd overlay ports for Android cross-compilation

Add NDK-matched Vulkan C++ header detection so the ggml port downloads
headers matching the exact NDK Vulkan version instead of pulling a
potentially mismatched vcpkg vulkan-headers package.  Add missing
ggml-opencl.h to the public headers install list.  Auto-enable opencl
on Android and vulkan on desktop/Android via default-features in both
the ggml and stable-diffusion-cpp overlay ports.

* fix(diffusion): disable OpenMP and align ggml flags with qvac-fabric

Add GGML_OPENMP=OFF to fix Windows CI failure where OpenMP is
unavailable, and GGML_LLAMAFILE=OFF to disable unused code paths.
Add Android-specific flags for DL backends (GGML_BACKEND_DL,
CPU_ALL_VARIANTS, CPU_REPACK) and disable cooperative matrix
Vulkan extensions on mobile GPUs.

* fix(diffusion): fix ggml include dirs for DL backends and use tetherto fork

Patch ggml-config.cmake.in to set INTERFACE_INCLUDE_DIRECTORIES on the
ggml::ggml and ggml::ggml-base targets unconditionally.  When
GGML_BACKEND_DL is ON, the per-backend targets are not created and
include dirs were lost.  Also switch the SD source to the tetherto fork
and drop the qvac-diffusion- library prefix from CMakeLists.txt now
that ggml is a standalone port with standard names.

* Remove redundancies in vcpkg manifest files

* Set SD_CPU_ONLY=1 on CI env

* updated for runtime stats

* fixed connection to logger, as it was not properly connected before

* fix for license file; validated a working run on an M1 Air

* quickstart quick-maths

* fixed integration for windows

* fix(diffusion): add real cancel/abort support to native generation (#782)

* fix(diffusion): add real cancel/abort support to native generation

Cancel previously only set an atomic flag checked after generate_image()
returned — generation ran to full completion and output was silently
discarded. This made cancel appear to work while still burning full
compute time.

Changes:

Portfile patches (stable-diffusion.cpp):
- Add sd_abort_cb_t typedef and sd_set_abort_callback() public API
- Add sd_abort_requested() helper checked in the denoise lambda
- When abort fires, denoise returns nullptr which the sampler stack
  already treats as failure → generate_image() returns NULL
- Fix upstream bug: abort path freed wrong compute buffer
  (diffusion_model instead of work_diffusion_model), corrupting sd_ctx
  and causing segfault on reuse

SdModel.cpp:
- Wire cancelRequested_ into abort callback via thread-local (matches
  existing progress callback pattern for concurrency safety)
- Scope guard ensures callbacks are cleared on all exit paths including
  early parse/validation exceptions
- Always free results[i].data whether cancelled or not (buffer leak fix)
- Cancelled jobs throw "Job cancelled" → JobRunner emits queueException
  instead of fake success with queueResult + queueJobEnded
- Return empty std::any from process() so queueJobEnded() is the sole
  terminal stats path (fixes duplicate JobEnded events in JS)

SdModel.hpp:
- Add isCancelRequested() public accessor for the static abort callback
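A minimal, self-contained sketch of the cancel flow described above. It assumes a plain function-pointer abort hook that is polled once per denoise step and is invoked on the same thread that called `process()`, which is what makes a thread-local lookup valid. `Model`, `abortRequested`, `AbortScope`, and `fakeGenerate` are hypothetical names. They are not the addon's `SdModel` API or the patched `sd_set_abort_callback()` signature.

```cpp
#include <atomic>
#include <cassert>
#include <stdexcept>
#include <string>

class Model;

// In this sketch the abort hook takes no user data, so the model that is
// currently generating is published through a thread-local pointer.
thread_local Model* tlsCurrentModel = nullptr;

class Model {
 public:
  void cancel() { cancelRequested_.store(true); }
  bool isCancelRequested() const { return cancelRequested_.load(); }
  std::string process(int steps, int cancelAtStep);

 private:
  std::atomic<bool> cancelRequested_{false};
};

// Hypothetical abort callback polled by the denoise loop.
bool abortRequested() {
  return tlsCurrentModel != nullptr && tlsCurrentModel->isCancelRequested();
}

// RAII guard: clears the thread-local on every exit path, including throws.
struct AbortScope {
  explicit AbortScope(Model* m) { tlsCurrentModel = m; }
  ~AbortScope() { tlsCurrentModel = nullptr; }
  AbortScope(const AbortScope&) = delete;
  AbortScope& operator=(const AbortScope&) = delete;
};

// Stand-in for generate_image(): returns false (the sketch's NULL) on abort.
// cancelAtStep simulates another thread calling cancel() mid-generation.
bool fakeGenerate(Model& m, int steps, int cancelAtStep) {
  for (int step = 0; step < steps; ++step) {
    if (step == cancelAtStep) m.cancel();
    if (abortRequested()) return false;
  }
  return true;
}

std::string Model::process(int steps, int cancelAtStep) {
  cancelRequested_.store(false);  // a cancel issued while idle is discarded here
  AbortScope scope(this);
  if (!fakeGenerate(*this, steps, cancelAtStep)) {
    throw std::runtime_error("Job cancelled");  // cancel-as-error, no fake success
  }
  return "ok";
}
```

The flag is reset at `process()` entry, so a `cancel()` issued while idle does not poison the next run. This matches the behavior the CancelWhenIdleIsNoop test relies on.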

* fix(diffusion): disable free_params_immediately for model reuse

The upstream sd_ctx_params_init() defaults free_params_immediately=true,
which permanently frees model weight buffers after the first
generate_image() call. Any subsequent generation on the same sd_ctx
accesses freed memory and crashes (SIGSEGV).

Set the default to false so the addon supports multiple generations
on the same model instance (the expected use pattern).

This was the root cause of the "cancel then run" crash — the abort
path still runs through generate_image_internal() which calls
diffusion_model->free_params_buffer() when this flag is true.

* fix(diffusion): add code comments and rename fix-abort-cleanup patch

- Add comments to SdCtxHandlers.hpp explaining why freeParamsImmediately
  is disabled (upstream default frees weight buffers after first generation,
  causing use-after-free on model reuse)
- Add comments to both hunks in the upstream cleanup patch explaining the
  compute buffer bug and work_ctx leak
- Rename fix-abort-cleanup.patch to fix-failure-path-cleanup.patch since
  the fixes apply to any failure path, not just abort

* fix(diffusion): document cancel-as-error rationale vs LLM addon

Diffusion throws on cancel (queueException) while LLM returns normally
(queueResult). Add comment explaining the intentional difference: diffusion
has no useful partial output, so an explicit error signal is more honest
than a success with output_count=0.

* test(diffusion): add C++ unit tests for cancel/context handling

Add test_cancel_context.cpp covering the context changes from the cancel fix:

- cancel when idle is a no-op (no crash, no state corruption)
- cancel during generation throws "Job cancelled" (cancel-as-error path)
- model is reusable after cancel (validates freeParamsImmediately=false
  and compute buffer fix — the exact SIGSEGV scenario)
- multiple sequential generations succeed (normal reuse without cancel)
- cancelRequested_ flag is reset at process() entry
- process() on unloaded model throws (not segfault)
- runtime stats are populated after successful generation

* fix(diffusion): fix patch line counts and test assertion

- Fix fix-failure-path-cleanup.patch: correct hunk line counts
  (-2203,7 +2203,11 and -3796,6 +3800,13) and replace Unicode
  em-dashes with ASCII in comments
- Fix CancelWhenIdleIsNoop test: cancel() sets the flag even when
  idle; the flag is only cleared on process() entry

* refactor(diffusion): static ggml core with DL backends and CMakeLists cleanup (#794)

* refactor(diffusion): static ggml core with DL backends and CMakeLists cleanup

Patch ggml to support GGML_BACKEND_DL with BUILD_SHARED_LIBS=OFF by
enabling PIC and backend compile definitions when DL is on, matching
the qvac-fabric approach.  Remove VCPKG_LIBRARY_LINKAGE=dynamic
override — core libs are now static .a with PIC, backends remain
MODULE .so files.

Clean up CMakeLists.txt: remove redundant explicit linking of OpenCL,
Metal frameworks, CUDA libs, and ggml (all propagated transitively
via ggml cmake config).  Fix WIN32_LEAN_AND_MEAN typo, remove stale
comments, and drop the clang overlay triplet workaround.

* chore(diffusion): switch Linux to libc++, fix vcpkg warnings, remove dead patches

Add libc++ triplets for x64-linux and arm64-linux under vcpkg/triplets,
matching the qvac-lib-infer-llamacpp-llm layout.  Move triplet and
toolchain files from vcpkg-override-triplets to vcpkg/.  Install the
stable-diffusion-cpp usage file and suppress mismatched binary count
warnings in both overlay ports.  Remove obsolete rename-ggml-libs and
no-dlopen-without-backend-dl patches from the old submodule architecture.

* fix(diffusion): disable GGML_BACKEND_DL for Android static backends

stable-diffusion.cpp calls ggml_backend_is_cpu() and
ggml_backend_cpu_init() directly, which live in the CPU backend module.
With GGML_BACKEND_DL these become separate .so files unavailable at
link time, causing dlopen failures on device.

Statically link all backends (CPU, Vulkan, OpenCL) instead, and bundle
the OpenCL ICD loader .so on Android so the addon loads even on devices
without a system libOpenCL.

* Place the OpenCL ICD loader library next to the bare file

* fix(diffusion): graceful OpenCL fallback and backend priority reorder

Patch ggml's OpenCL backend to return nullptr instead of aborting when
no OpenCL devices are found (e.g. Pixel phones without OpenCL support).
Reorder SD backend priority to CUDA > Metal > OpenCL > Vulkan > CPU,
preferring OpenCL on Adreno devices where it outperforms Vulkan, with
if-guards so only the first successful backend is used.
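The priority order with if-guards can be pictured as a first-success walk over an ordered list. This sketch assumes each backend exposes an init attempt that reports failure instead of aborting, which is what the OpenCL patch above provides. `Candidate` and `pickFirstWorkingBackend` are hypothetical names.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// A backend name paired with an init attempt that returns false when the
// backend is unavailable (for example, no OpenCL devices on the phone).
using Candidate = std::pair<std::string, std::function<bool()>>;

// Tries candidates in priority order. Once one succeeds, later candidates
// are never attempted. Returns an empty string if nothing initializes.
std::string pickFirstWorkingBackend(const std::vector<Candidate>& ordered) {
  for (const auto& [name, tryInit] : ordered) {
    if (tryInit()) return name;
  }
  return "";
}
```

In the addon, the `tryInit` slots would correspond to the guarded per-backend init calls. That wiring is not shown here.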

* feat(diffusion): Adreno-aware backend selection for Android

Detect Adreno GPU model at runtime via ggml device enumeration and
choose the optimal backend: Adreno 800+ uses GPU (OpenCL), Adreno
600/700 is forced to CPU due to poor OpenCL performance, and
non-Adreno devices fall through to Vulkan.  Adds INFO-level logging
of detected devices and selection decisions for troubleshooting.
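A sketch of the Adreno selection rule. It assumes the device description string reported during enumeration contains `Adreno` followed by the model number (for example `Qualcomm(R) Adreno(TM) 830`). `adrenoModel`, `Backend`, and `chooseAndroidBackend` are hypothetical names. Treating Adreno models below 600 as CPU is also an assumption, because the commit only names 600/700 and 800+.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

enum class Backend { OpenCL, Vulkan, Cpu };

// Parses the first number after "Adreno" in a device description.
// Returns -1 for non-Adreno devices or when no number follows.
int adrenoModel(const std::string& desc) {
  const std::string tag = "Adreno";
  const std::size_t pos = desc.find(tag);
  if (pos == std::string::npos) return -1;
  std::size_t i = pos + tag.size();
  while (i < desc.size() && !std::isdigit(static_cast<unsigned char>(desc[i]))) ++i;
  if (i == desc.size()) return -1;
  int model = 0;
  while (i < desc.size() && std::isdigit(static_cast<unsigned char>(desc[i]))) {
    model = model * 10 + (desc[i] - '0');
    ++i;
  }
  return model;
}

// Adreno 800+ -> GPU via OpenCL, Adreno 600/700 -> CPU, non-Adreno -> Vulkan.
Backend chooseAndroidBackend(const std::string& gpuDesc) {
  const int model = adrenoModel(gpuDesc);
  if (model < 0) return Backend::Vulkan;
  if (model >= 800) return Backend::OpenCL;
  return Backend::Cpu;
}
```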

* fix(diffusion): statically link OpenCL ICD loader on Android

Add an overlay port for opencl that removes the dynamic-only
restriction, allowing the ICD loader to be built as a static library.
This eliminates libOpenCL.so as a NEEDED dependency so the addon
loads on all Android devices regardless of OpenCL support.  The
static ICD loader still dlopens vendor drivers at runtime.

* Fixed formatting

* CPU only on Android

* feat(diffusion): hybrid static CPU + dynamic GPU backends for Android (#813)

* feat(diffusion): hybrid static CPU + dynamic GPU backends for Android

Add GGML_CPU_STATIC option that builds the CPU backend as a static
library linked into ggml even when GGML_BACKEND_DL is ON.  GPU
backends (Vulkan, OpenCL) remain MODULE .so files loaded at runtime
via dlopen, eliminating libOpenCL.so as a NEEDED dependency.

This lets stable-diffusion.cpp call CPU backend functions directly
(ggml_set_f32, ggml_backend_cpu_init, etc.) while GPU backends are
discovered at runtime — a single Android binary works on all devices
regardless of OpenCL/Vulkan support.

* feat(diffusion): generic backend init using ggml registry API

Replace SD's init_backend() #ifdef waterfall with generic ggml calls
(ggml_backend_init_by_type) that work with both statically linked and
dynamically loaded backends.  Load DL backend modules from the addon
via ggml_backend_load_all_from_path() when GGML_BACKEND_DL is enabled.

This eliminates SD's dependency on GPU-specific headers (ggml-opencl.h,
ggml-vulkan.h, etc.) and removes the SD_METAL/VULKAN/CUDA/OPENCL build
flags, replacing sd-cpu-only.patch and sd-backend-priority.patch with a
single sd-generic-backend-init.patch.

* feat(diffusion): prefer OpenCL on Adreno 800+ via sd_ctx backend preference

Add a new backend preference field in stable-diffusion context params and wire SdModel to request OpenCL for Adreno 800+ when available, while keeping SD_CPU_ONLY as a CI-only env override.
Also fix ggml hybrid export wiring so CPU static symbols are linked in Android DL backend mode, and refresh the android-arm64 prebuild artifact.

* fix(diffusion): pass backendsDir to SdCtxConfig

* Added logging to troubleshoot Vulkan init on Pixel devices

* fix(diffusion): JS layer review fixes and cancel test coverage (#783)

* fix(diffusion): JS layer review fixes and cancel test coverage

Aligns the JS layer with the LLM addon patterns and adds API behavior
tests for cancel/busy/idle state transitions.

JS layer:
- Rename run() to _runInternal() (BaseInference template method pattern)
- Replace 30ms timer guard with _hasActiveResponse boolean
- Extract _getWeightFiles() to deduplicate file lists in _load/_downloadWeights
- Wrap _runGeneration in _withExclusiveRun for serialization
- Add finalized.catch(() => {}) unhandled rejection guard
- Reset _hasActiveResponse in unload()
- Filter undefined values in addon config coercion
- Remove orphaned unloadWeights() from addon.js
- Update class doc and README to match actual supported models

Types (index.d.ts):
- Fix run() signature: Txt2ImgParams (was accepting txt2vid params)
- Proper type hierarchy: Txt2ImgParams → Img2ImgParams → GenerationParams
- Add missing params: guidance, sampling_method, scheduler
- Remove unused type declarations

Tests:
- Add api-behavior.test.js with 5 cancel/busy/idle tests
- idle|run, idle|cancel, run|cancel, run|run (busy), cancel|run (rerun)
- cancel|run test requires native abort support (fix/diffusion-cancel-abort)

* fix(diffusion): cancel inside onUpdate callback matching LLM pattern

Cancel tests now fire model.cancel() inside the onUpdate callback
after the first progress tick (string data), matching the LLM addon's
runAndCancelAfterFirstToken pattern. This ensures native generation
is guaranteed to be active when cancel fires, preventing false passes.

* fix(diffusion): use const for non-reassigned chain variable

Standard JS lint requires const for variables that are never reassigned.

* fix(diffusion): update scope note instead of removing it

FLUX.1 and Wan2.x video are still not supported — keep that explicit.

* fix(diffusion): video generation is planned, not excluded

Wan2.x support is planned for the future — update scope note accordingly.

* fix(diffusion): address PR review — remove WeightsProvider, unify run API, update docs

- Remove WeightsProvider and _downloadWeights (files must be on disk)
- Unify txt2img/img2img into single run() with auto-detected mode
- Add return await to _withExclusiveRun calls (stack trace alignment)
- Strengthen run|run test to verify first response completes
- Update README: loader is optional, add t5XxlModel, fix load() docs
- Update docs/architecture.md: align with disk-local contract

* fix(diffusion): remove unused loader from constructor, tests, and examples

The diffusion addon never used the loader parameter — it was accepted
in the constructor but silently discarded. Model files are loaded
directly from disk via diskPath.

- Remove loader from ImgStableDiffusion constructor and type declarations
- Remove Loader interface and ReportProgressCallback (no remaining consumers)
- Remove FilesystemDL usage from all 6 integration tests and 7 examples
- Update README: remove data loader section, renumber steps, drop loader from args table

* fix(diffusion): remove stale loader deps and fix doc references

- Remove @qvac/dl-filesystem and @qvac/dl-hyperdrive from devDependencies
- Remove @qvac/dl-hyperdrive from peerDependencies
- Update architecture.md to reflect direct disk-path loading (no FilesystemDL)

* fix(diffusion): remove last Hyperdrive mention from architecture doc

* fix(diffusion): remove stale loadWeights from thread safety rules

* fix(diffusion): update data-flows doc to reflect unified run() API

* feat(diffusion): move stable-diffusion-cpp to registry (#865)

Support qvac ggml backend module names.

* updated i2i

* working anime version of i2i

* cpp lint

* fixed

* feat(diffusion): unify img2img to always use in-context conditioning

Remove the traditional img2img path (VAE encode → noise → denoise)
and route all image-conditioned generation through FLUX in-context
conditioning (reference tokens + joint attention). The user-facing
API stays simple: pass init_image → img2img mode automatically.

- addon.js: only handle init_image, always serialize as ref_image_bytes
- index.js: mode = init_image ? 'img2img' : 'txt2img' (no ref2img)
- SdModel.cpp: single img2img path using ref_images / joint attention
- SdGenHandlers.cpp: accept txt2img and img2img only
- test_ref2img.cpp: update mode from ref2img → img2img
- ref2img-flux2.js: use init_image instead of ref_image

Made-with: Cursor

* chore(diffusion): remove accidentally committed 27MB android prebuild zip

sd-cpp-android-arm64.zip was committed in e2f140e during the Android
GPU backend work. Add *.zip to .gitignore to prevent recurrence.

Made-with: Cursor

* fix(diffusion): remove unload() calls from img2img/ref2img tests

SdModel on main uses RAII (default destructor + unique_ptr deleter),
so unload() no longer exists. model.reset() is sufficient.

Made-with: Cursor

* refactor(diffusion): unify img2img API, add von Neumann test asset, remove ref2img/SDXL

- Add assets/von-neumann.jpg (Public Domain, U.S. DOE HD.3F.191) as the
  canonical test image for img2img examples and tests
- Remove ref2img as a separate concept — all image-to-image is now just
  "img2img" using FLUX in-context conditioning under the hood
- Delete ref2img-flux2.js example and test_ref2img.cpp unit test
- Delete img2img-sdxl.js example (FLUX-only for this delivery)
- Update all examples, integration test, C++ unit tests, and docs to use
  the new asset path and consistent img2img terminology
- Add image attribution to NOTICE and Credits section to README
- Round auto-detected image dimensions to nearest multiple of 8 in addon.js
- Run clang-format on modified C++ sources

Made-with: Cursor

* style(diffusion): fix standard lint violations in img2img examples

Replace backtick strings without interpolation with single quotes,
remove trailing spaces, and collapse multi-space comment alignment.

Made-with: Cursor

* fix(diffusion): add bare-fs as direct dependency to resolve CI module error

Move bare-fs from devDependencies to dependencies to fix MODULE_NOT_FOUND
errors in CI workflows. The package is required by the transitive dependency
@qvac/dl-filesystem and by test generation scripts, and file: dependencies
don't always properly resolve transitive dependencies in npm.

Made-with: Cursor

* attempting to resolve dl

* fixed pathing issue

* increased timeouts

* fix(diffusion): skip FLUX2 img2img test on CPU-only runners

Add NO_GPU environment variable check to skip FLUX2 img2img test on
CPU-only runners. FLUX2 img2img requires GPU acceleration as it's too
slow on CPU (VAE encoding + diffusion steps exceed 30min timeout).

This aligns with the existing FLUX2 txt2img test behavior and ensures
the test only runs on GPU-enabled runners (ai-run-linux-gpu,
mac-mini-m4-gpu, ai-run-windows11-gpu).

Made-with: Cursor

* fix(diffusion): only set SD_CPU_ONLY on no-GPU runners

Make SD_CPU_ONLY conditional based on matrix.no_gpu to allow GPU-enabled
runners (ai-run-linux-gpu, mac-mini-m4-gpu) to use GPU acceleration.

Previously, SD_CPU_ONLY was hardcoded to '1' for all Linux/macOS runners,
forcing even GPU runners to use CPU. This caused FLUX2 tests to be
extremely slow or timeout.

Now:
- GPU runners: SD_CPU_ONLY='0' (uses GPU)
- CPU-only runners: SD_CPU_ONLY='1' (uses CPU)

Made-with: Cursor

* fix(diffusion): remove SD_CPU_ONLY env var from workflow

Remove SD_CPU_ONLY entirely from the workflow as the C++ code checks if
the env var is set at all, not its value. Setting SD_CPU_ONLY=0 still
forces CPU mode.

The integration tests already handle CPU/GPU selection via the NO_GPU
env var and the skip logic, so SD_CPU_ONLY is not needed at the workflow
level.

This allows GPU runners to properly use GPU acceleration without the
workflow interfering with the backend selection.

Made-with: Cursor
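Why `SD_CPU_ONLY=0` still forced CPU mode: a presence check ignores the value. This sketch assumes the check is a plain `std::getenv` call, since the commit does not show the addon's code. The value-aware variant is hypothetical and is not what the addon does.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Presence check, as the commit describes: any value, even "0", forces CPU.
bool cpuOnlyByPresence() { return std::getenv("SD_CPU_ONLY") != nullptr; }

// Hypothetical value-aware variant: only a non-empty value other than "0"
// would force CPU mode.
bool cpuOnlyByValue() {
  const char* value = std::getenv("SD_CPU_ONLY");
  return value != nullptr && value[0] != '\0' && std::strcmp(value, "0") != 0;
}
```

Under presence semantics, the only way to let GPU runners pick a GPU backend is to leave the variable unset, which is why it was removed from the workflow entirely.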

* fix(diffusion): remove ggml overlay port to use registry version

Remove the ggml overlay port to align with main branch (commit ba9f55e)
which switched to using ggml from the registry instead of overlay ports.

This ensures consistency across the codebase and avoids reintroducing
the overlay port that was intentionally removed in PR #1066.

Made-with: Cursor

* changed seed and description

* fix(diffusion): increase Windows test timeout to 30 minutes

Increase Windows GPU runner timeout from 600s (10 min) to 1800s (30 min)
to match the FLUX2 test timeout. Windows Vulkan backend may be slower than
Linux/Mac for FLUX2 generation, and the sampling operations were timing out.

This gives Windows tests sufficient time to complete FLUX2 img2img and
txt2img generation without premature cancellation.

Made-with: Cursor

* chore(diffusion): regenerate mobile integration tests

Add FLUX2 img2img test to mobile integration test runners. The
integration.auto.cjs file is auto-generated and needs to be updated
whenever new integration tests are added.

Generated with: npm run test:mobile:generate

Made-with: Cursor

* feat(diffusion): change FLUX2 txt2img prompt to cartoon watercolor style

Update test prompt from photorealistic to cartoon watercolor style for
more visually distinctive output. The new style better demonstrates
FLUX2's artistic capabilities.

Prompt: "a red fox in a snowy forest, laying on a rock with a santa hat,
cartoon, watercolor"

Made-with: Cursor

* fix(diffusion): double test timeouts on Windows

Windows Vulkan backend is significantly slower than Linux/Mac, causing
integration tests to timeout. Double all test timeouts (600s → 1200s)
specifically on Windows platform while keeping other platforms unchanged.

Changes:
- model-loading.test.js: 10min → 20min on Windows
- api-behavior.test.js: 10min → 20min on Windows (5 tests)

This prevents premature timeout failures during diffusion model sampling
on Windows GPU runners.

Made-with: Cursor

* feat(diffusion): add SD3 img2img support with SDEdit and dual-path routing

Implements image-to-image transformation for SD3 Medium using SDEdit, with
automatic model-specific routing between FLUX in-context conditioning and
traditional SDEdit for other model families.

Key changes:
- Add examples/img2img-sd3.js: SDEdit example with flow-matching parameters
  (cfg_scale 4.5, strength 0.35-0.75, euler sampling)
- Implement dual-path img2img routing in SdModel.cpp:
  * FLUX2/FLUX: ref_images with auto_resize_ref_image (in-context conditioning)
  * SD1/SD2/SDXL/SD3: init_image with SDEdit (noise + denoise)
- Add automatic 8-alignment for non-multiple-of-8 input images:
  * Aligns dimensions up to nearest multiple of 8 to match generate_image()'s
    internal rounding, preventing GGML_ASSERT failures
  * Uses nearest-neighbor resize for the few pixels of padding needed
- Rename ref_image_bytes to init_image_bytes in JS layer (addon.js) for clarity
- Add integration test: test/integration/generate-image-sd3-i2i.test.js
- Update README with comprehensive img2img documentation:
  * Document dual-path routing strategy
  * Add SDEdit limitations (B&W images, resolution, strength, style biases)
  * Add SD3 img2img example
- Update JSDoc comments in index.js to reflect dual routing behavior
- Fix linting error in img2img-flux2.js (remove stray text on line 13)

Technical details:
The vcpkg version of stable-diffusion.cpp's generate_image() aligns width/height
up to spatial_multiple (typically 8) before creating tensors, then asserts that
init_image dimensions match exactly. For JPEG/PNG images with non-8-aligned
dimensions (e.g. 500×627), this caused assertion failures. The fix detects
mismatches and resizes the decoded image to the aligned dimensions before
passing to generate_image().

FLUX models are unaffected (use ref_images path with internal auto-resize).
SD3 and other models now handle arbitrary input dimensions correctly.

Made-with: Cursor

* added linting fix

Made-with: Cursor

* fixed integration test

* updated cpp lint

* updated for sizing

* fix(diffusion): fix SD3 img2img integration test OOM on Vulkan CI

- Add vae_on_cpu: true to avoid GPU memory exhaustion during VAE
  encode/decode on CI runners with limited VRAM
- Reduce steps from 40 to 20 for faster CI execution
- Add null guard on images array to prevent crash when generation
  fails, producing a clear error message instead
- Regenerate mobile integration test bundle

Made-with: Cursor

* attempting PR start

* fix(diffusion): format cpp files with clang-format

Made-with: Cursor

* fix(diffusion): address PR review — image resize, error handling, alignment

- Replace manual nearest-neighbor resize with stb_image_resize2 linear
  filtering via a new image_utils::resizeSdImage() utility
- Add null checks with descriptive errors on malloc, resize, and image
  decode failures
- Throw on failed init_image decode instead of silently skipping,
  removing one indentation level for readability
- Fix JS/C++ alignment mismatch: Math.round → Math.ceil to match the
  C++ ceil-alignment ((w + 7) / 8 * 8)
- Fix potential 32-bit overflow in allocation size computation by
  casting all operands to size_t

Made-with: Cursor
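Worked numbers for the 500×627 image mentioned in the SD3 img2img commit: ceil alignment gives 504×632. `Math.round` on the JS side gives 504×624, because 627 / 8 = 78.375 rounds down to 78, and that difference is the mismatch. A sketch; `alignUp8` and `pixelBytes` are illustrative names, not the addon's helpers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Ceil-align to a multiple of 8, the same (w + 7) / 8 * 8 used on the C++ side.
constexpr std::uint32_t alignUp8(std::uint32_t v) { return (v + 7) / 8 * 8; }

// Pixel buffer size computed entirely in size_t, so w * h * channels is never
// evaluated in 32-bit arithmetic where it could wrap before being widened.
constexpr std::size_t pixelBytes(std::uint32_t w, std::uint32_t h, std::uint32_t channels) {
  return static_cast<std::size_t>(w) * static_cast<std::size_t>(h) *
         static_cast<std::size_t>(channels);
}
```

On a 64-bit target, `pixelBytes(65536, 65536, 3)` is 12,884,901,888. The same product in `uint32_t` would wrap.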

* fix(diffusion): format C++ files with clang-format-19

Made-with: Cursor

* perf(diffusion): use stbi_info_from_memory for efficient dimension decoding

- Replace stbi_load_from_memory with stbi_info_from_memory in decodeDimensions()
- Avoids allocating and loading full pixel data when only dimensions are needed
- Significantly more efficient for image dimension detection

Made-with: Cursor

* fix(diffusion): format test_img2img.cpp with clang-format-19

Made-with: Cursor

* docs(diffusion): add comprehensive guidance scale reference for img2img

- Document CFG scale vs distilled guidance parameter differences
- Add per-model guidance scale recommendations (SD1/SD2, SDXL, SD3, FLUX.2)
- Explain architectural differences: SD3 uses standard CFG while FLUX.2 uses distilled guidance
- Include img2img-specific guidance behavior and examples for each model
- Clarify why FLUX.2 sets cfg_scale=1.0 and uses guidance instead
- Add quick reference code examples for each model family

Made-with: Cursor

* chore: update vcpkg-registry baseline commit

Made-with: Cursor

* fix(diffusion): pin ggml to port-version 4 for Vulkan LSan leak fix

Revert the registry baseline bump and instead use a vcpkg override to
pull in only the ggml port-version 4 patch (qvac-registry-vcpkg#119),
which fixes LeakSanitizer reports in the Vulkan device cache.

Made-with: Cursor

* fix(diffusion): revert ggml to port-version 3, port-version 4 patch is broken

The ggml-vulkan-device-cache-owned-storage.patch from port-version 4
(qvac-registry-vcpkg#119) fails to apply — the patch context does not
match the ggml source at the pinned commit. Reverting to port-version 3
until the registry patch is fixed.

Made-with: Cursor

* fix(diffusion): add ggml overlay port with corrected Vulkan LSan patch

The ggml port-version 4 patch in qvac-registry-vcpkg#119 uses zero-
context hunks that git-apply cannot locate. Add a local overlay port
with the same fix (unique_ptr ownership for Vulkan device cache) but
with proper unified-diff context lines so the patch applies cleanly.

Made-with: Cursor

* fix(diffusion): use ggml port-version 5 from jpgaribotti fork

Use the corrected ggml overlay port from jpgaribotti/qvac-registry-vcpkg
which bumps to port-version 5 with a properly formatted Vulkan device
cache patch (includes unified-diff context lines).

Made-with: Cursor

* fix(diffusion): point registry to jpgaribotti fork for ggml port-version 5

Switch default-registry to jpgaribotti/qvac-registry-vcpkg which has the
corrected ggml Vulkan device cache patch (port-version 5). Remove the
local overlay port since the fork provides the fix directly.

Made-with: Cursor

* fix: suppress LSAN false positives in diffusion C++ tests

Updates vcpkg registry to tetherto/qvac-registry-vcpkg main
(baseline 8778399) which includes the ggml Vulkan device cache fix.
Also corrects LSAN suppressions file path in CI workflow to resolve
the suppression file within the package workdir.

Made-with: Cursor

* fix: add dbus leak suppressions for test initialization

Made-with: Cursor

* fix: add Windows model download step to cpp-tests workflow

Made-with: Cursor

* fix: reduce SD3 example steps from 100 to 28

SD3 Medium typically needs 20–30 steps; 100 was leftover
from experimentation and makes this example ~5x slower than needed.

Made-with: Cursor

* fix: correct example image paths

- img2img-flux2.js: use assets/von-neumann.jpg (works on fresh checkout)
  instead of temp/von-neumann_transformed.png (doesn't exist)
- img2img-sd3.js: write output to temp/ instead of assets/
  (assets are for checked-in test files, not generated images)

Made-with: Cursor

* fix: ensure temp directory exists in example scripts

Made-with: Cursor

* fix: validate init_image is Uint8Array in img2img mode

Prevents users from accidentally passing string paths (e.g., init_image:
'path/to/file.jpg') which would be misinterpreted as raw bytes and cause
cryptic C++ decoding failures. Now throws a clear error with guidance.

Made-with: Cursor

* fix: guard SdImageBatch against nullptr from generate_image()

generate_image() can return NULL on failure (OOM, abort mid-denoise).
When it does, SdImageBatch was constructed with data_=nullptr but
count_ ≥ 1, causing the destructor to dereference nullptr and segfault.

Now the destructor, operator[], and release() all check for null before
dereferencing. operator[] throws a descriptive error if called on null.

Made-with: Cursor
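A sketch of the null-safe ownership pattern. It assumes the generator returns a malloc-allocated array whose per-image `data` buffers are also malloc-allocated. `Image` and `ImageBatch` are hypothetical stand-ins, and the addon's real `SdImageBatch` may differ in names and ownership details.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <stdexcept>

// Hypothetical stand-in for the library's image struct.
struct Image {
  std::uint32_t width, height, channels;
  std::uint8_t* data;
};

// Owns a generator-allocated image array; every access checks for the
// nullptr that a failed generation returns.
class ImageBatch {
 public:
  ImageBatch(Image* data, std::size_t count) : data_(data), count_(data ? count : 0) {}
  ~ImageBatch() {
    if (data_ == nullptr) return;  // failed generation: nothing to free
    for (std::size_t i = 0; i < count_; ++i) std::free(data_[i].data);
    std::free(data_);
  }
  ImageBatch(const ImageBatch&) = delete;
  ImageBatch& operator=(const ImageBatch&) = delete;

  Image& operator[](std::size_t i) {
    if (data_ == nullptr) throw std::runtime_error("ImageBatch: generation returned no images");
    if (i >= count_) throw std::out_of_range("ImageBatch: index out of range");
    return data_[i];
  }

  // Hands ownership to the caller; safe to call on an empty batch.
  Image* release() {
    Image* p = data_;
    data_ = nullptr;
    count_ = 0;
    return p;
  }

  bool empty() const { return data_ == nullptr; }

 private:
  Image* data_;
  std::size_t count_;
};
```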

* fix(diffusion): format cpp files with clang-format-19

* fix(readme): clarify config vs parameter serialization

* fix: restore dbus leak suppressions removed by clang-format commit

* fix(diffusion): apply clang-format-19 to test_stb_image_security.cpp

* Update packages/lib-infer-diffusion/addon/src/model-interface/SdModel.cpp

Co-authored-by: gianni-cor <gianfranco.cordella@tether.io>

* Update packages/lib-infer-diffusion/addon/src/model-interface/SdModel.cpp

Co-authored-by: gianni-cor <gianfranco.cordella@tether.io>

* fix(diffusion): format cpp files with clang-format-19

* Revert "fix(diffusion): format cpp files with clang-format-19"

This reverts commit 8082388.

* fix(diffusion): guard FLUX img2img prediction and harden readImageDimensions

- Add JS-side guard in _runInternal() that throws when init_image is
  present on a FLUX model (llmModel set) but prediction is not explicitly
  flux2_flow or flux_flow, preventing silent fallback to SDEdit branch
- Add buffer-length checks to readImageDimensions() for truncated PNG
  (require >= 24 bytes) and JPEG (validate segLen >= 2, guard SOF reads)
- Update prediction docstring in index.d.ts to clarify FLUX img2img
  requires an explicit prediction value
- Add regression tests for all of the above (13 cases)

Made-with: Cursor
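The truncation checks can be illustrated with a C++ version of the idea. The real `readImageDimensions()` lives in the JS layer, and `readImageDimensionsSketch` is a hypothetical name.

- **PNG:** needs 24 bytes (8-byte signature, 4-byte IHDR length, 4-byte `IHDR` tag, then 4-byte width and height).
- **JPEG:** each segment length counts its own two bytes, so a value below 2 is malformed. An SOF segment must contain the precision, height, and width bytes before they are read.

This simplified walk does not handle standalone markers or 0xFF fill bytes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <utility>
#include <vector>

using Dims = std::pair<std::uint32_t, std::uint32_t>;  // (width, height)

std::uint32_t be32(const std::uint8_t* p) {
  return (std::uint32_t(p[0]) << 24) | (std::uint32_t(p[1]) << 16) |
         (std::uint32_t(p[2]) << 8) | std::uint32_t(p[3]);
}
std::uint16_t be16(const std::uint8_t* p) {
  return static_cast<std::uint16_t>((p[0] << 8) | p[1]);
}

// Returns std::nullopt for unknown formats and for truncated or malformed input.
std::optional<Dims> readImageDimensionsSketch(const std::vector<std::uint8_t>& b) {
  const std::size_t n = b.size();
  static const std::uint8_t kPngSig[8] = {0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A};
  if (n >= 8 && std::equal(kPngSig, kPngSig + 8, b.begin())) {
    // Signature (8) + IHDR length (4) + "IHDR" (4) + width (4) + height (4).
    if (n < 24) return std::nullopt;
    return Dims{be32(&b[16]), be32(&b[20])};
  }
  if (n >= 2 && b[0] == 0xFF && b[1] == 0xD8) {  // JPEG SOI marker
    std::size_t i = 2;
    while (i + 4 <= n) {  // need marker (2) + segment length (2)
      if (b[i] != 0xFF) return std::nullopt;
      const std::uint8_t marker = b[i + 1];
      const std::uint16_t segLen = be16(&b[i + 2]);
      if (segLen < 2) return std::nullopt;  // length includes its own 2 bytes
      const bool isSof = marker >= 0xC0 && marker <= 0xCF && marker != 0xC4 &&
                         marker != 0xC8 && marker != 0xCC;
      if (isSof) {
        // After the length field: precision (1), height (2), width (2).
        if (i + 9 > n) return std::nullopt;
        return Dims{be16(&b[i + 7]), be16(&b[i + 5])};
      }
      i += 2 + static_cast<std::size_t>(segLen);
    }
  }
  return std::nullopt;
}
```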

* fix(diffusion): remove FLUX.1 references from documentation

- Update prediction docstring to focus on FLUX.2 img2img guidance
- Remove FLUX.1 from encoder file name comments (keep only relevant models)
- Update error message to reference FLUX.2 only in user-facing guidance
- Keep flux_flow type in PredictionType union for backward compatibility

Made-with: Cursor

* test(diffusion): add input-validation test to mobile integration suite

Register the new input-validation regression tests in the mobile test runner
so truncated image and FLUX prediction guard tests run on all platforms.

Made-with: Cursor

* chore(diffusion): bump to 0.2.0 and update changelog

- Bump package version from 0.1.3 to 0.2.0 for img2img feature release
- Update CHANGELOG.md with 0.2.0 entry: FLUX.2 img2img, input validation, regression tests
- Remove stale CHANGELOG (keeping CHANGELOG.md as canonical source)

Made-with: Cursor

* fix(diffusion): revert vcpkg registry baseline to main

Restore default-registry baseline to a9eae49a7c95a63 (matches main).
The 87783998cb67fe6 baseline was an unintended change.

Made-with: Cursor

---------

Co-authored-by: gianni-cor <gianfrancocordella@gmail.com>
Co-authored-by: aegioscy <nik@linux64vm.com>
Co-authored-by: Ridwan Taiwo <donriddo@gmail.com>
Co-authored-by: gianni <gianfranco.cordella@tether.io>
Co-authored-by: Juan Pablo Garibotti Arias <juan.arias@bitfinex.com>
Proletter added a commit that referenced this pull request May 24, 2026
* fix: fix race condition in LLM example download utility (#1019)

* fix: fix race condition in LLM example download utility

The redirect handler in examples/utils.js called fs.unlink fire-and-forget
then immediately recursed into downloadModel. The recursive call could find
the empty file still on disk (existsSync → true) before unlink completed,
causing an ENOENT crash on the subsequent statSync.

Port the proven download pattern from test/integration/utils.js:
- Wait for unlink callback before recursing on redirect
- Also handle 307/308 redirects (HuggingFace itself uses 302)
- Handle relative redirect URLs
- Use safeResolve/safeReject guards to prevent double settlement
- Add response error handler and fileStream error handler
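The safeResolve/safeReject guards implement a settle-once pattern. The examples code is JavaScript; the sketch below is a rough C++ analogue with a hypothetical class name. `std::promise::set_value` and `set_exception` throw `std::future_error` if called a second time, so an atomic flag lets the first settlement win and makes later calls, such as a late stream error or a duplicate redirect callback, harmless no-ops.

```cpp
#include <atomic>
#include <exception>
#include <future>
#include <string>
#include <utility>

// Hypothetical settle-once wrapper: the first resolve()/reject() wins and
// later calls are ignored, so the promise is never satisfied twice.
class SettleOnce {
 public:
  std::future<std::string> future() { return promise_.get_future(); }

  bool resolve(std::string value) {
    if (settled_.exchange(true)) return false;  // already settled
    promise_.set_value(std::move(value));
    return true;
  }

  bool reject(std::exception_ptr error) {
    if (settled_.exchange(true)) return false;  // already settled
    promise_.set_exception(std::move(error));
    return true;
  }

 private:
  std::promise<std::string> promise_;
  std::atomic<bool> settled_{false};
};
```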

* fix: use URL constructor for safer redirect resolution


* fix: fix race condition in embed and diffusion download utilities

Port the proven download pattern from the LLM package (PR #1019):
- Wait for fs.unlink callback before recursing on redirect
- Add safeResolve/safeReject guards to prevent double settlement
- Handle 307/308 redirects in embed examples/utils.js
- Add fileStream and response error handlers
- Use URL constructor for safer redirect resolution
- Use close event instead of finish for write completion


---------

Co-authored-by: gianni-cor <gianfranco.cordella@tether.io>

* doc: update README - table of packages - add diffusion and diagnostics - key features - add OpenAI-compatible API (#1033)

* fix: fix docs build and escape MDX curly braces in errors.mdx and removed randomly created (#1051)

* doc: generate API docs for v0.8.0

* chore[notask]: remove accidentally committed file

* fix: fix docs build and escape MDX curly braces in errors.mdx and removed random

* fix: revert pre-build script

---------

Co-authored-by: Bruno Campana <7632562+BrunoCampana@users.noreply.github.com>

* Fix security issues flagged by CodeQL in TTS package (#1058)

* Updated qvac-lint-cpp to match latest version from original repo (#1064)

* fix: add native job IDs to addon-cpp callbacks (#955)

* fix: preserve addon job ownership across cancel/reuse

Propagate native job IDs through addon-cpp queued callbacks so late cancel events stay attached to the cancelled job. Remove the Parakeet stale-cancel workaround and align Whisper with the shared runtime contract.

Made-with: Cursor

* chore: scope addon-cpp job-id update to 1.1.3

Limit this branch to the shared addon-cpp runtime changes and bump the package to 1.1.3. Follow-up addon consumer updates will land in separate PRs after the registry is updated.

Made-with: Cursor

* fix: move pending job state before unlock

Copy the pending job into local state before releasing the JobRunner mutex so processing and error paths no longer read job_ without synchronization.
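Here is a minimal sketch of the copy-before-unlock fix. It assumes a single pending-job slot guarded by one mutex, and the class and member names are hypothetical rather than the addon-cpp JobRunner API. The job is moved out while the lock is held, so the processing and error paths only ever touch a local value. The job also carries its ID, which lets callbacks be tagged with the job they belong to.

```cpp
#include <cstdint>
#include <mutex>
#include <optional>
#include <string>
#include <utility>

struct Job {
  uint64_t id;        // native job ID propagated to callbacks
  std::string input;
};

// Hypothetical runner with one pending-job slot.
class JobRunnerSketch {
 public:
  void submit(Job job) {
    std::lock_guard<std::mutex> lock(mutex_);
    job_ = std::move(job);
  }

  // Moves the pending job into a local while holding the lock; the caller
  // processes the returned copy after the mutex has been released.
  std::optional<Job> takePending() {
    std::optional<Job> local;
    {
      std::lock_guard<std::mutex> lock(mutex_);
      local = std::move(job_);
      job_.reset();  // a moved-from optional is still engaged
    }
    return local;
  }

 private:
  std::mutex mutex_;
  std::optional<Job> job_;
};
```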

Made-with: Cursor

---------

Co-authored-by: GustavoA1604 <54457676+GustavoA1604@users.noreply.github.com>

* Removed overlay ports. Build from registry. (#1066)

* fix: use object config format in nativelog example (#1070)

* QVAC-13813 chore: add int8 parakeet eou and sortformer production registry entries (#1035)

* chore: Add int8 quantised models for Parakeet EOU and Sortformer

* fix: Add links for quantised parakeet models

* fix: Remove tokenizer for int8

---------

Co-authored-by: Ishan Vohra <ishanvohra@Ishans-MacBook-Air.local>
Co-authored-by: Yury Samarin <yuri.a.samarin@gmail.com>

* fix[notask]: resolve code scanning security findings in nmtcpp and ocr-onnx (#1060)

* fix[notask]: resolve code scanning security findings in nmtcpp and ocr-onnx

Fix ReDoS vulnerabilities in indic-processor URL and numeral regexes by
removing nested quantifiers. Fix ReDoS in sacremoses tokenizer protected
patterns by requiring opening quotes to eliminate ambiguous backtracking.
Fix incomplete string replacement in indic_normalize by using global
regex for pipe character substitution. Replace insecure tempfile.mktemp
with NamedTemporaryFile in ocr-onnx benchmark script.

* fix[notask]: resolve polynomial ReDoS in numeral and other patterns

Fix _NUMERAL_PATTERN by replacing ambiguous \d+\.?\d* with
\d+(?:\.\d+)? to eliminate overlapping digit quantifiers.
Fix _OTHER_PATTERN by bounding the prefix to {0,100} to prevent
polynomial backtracking when no separator is found.

* fix[notask]: bound regex quantifiers to eliminate polynomial ReDoS

Replace unbounded \d+ with \d{1,20} and \w+ with \w{1,100} in
_NUMERAL_PATTERN and _OTHER_PATTERN so the backtracking work per match
attempt is capped by a constant, independent of input length. No real-world
numeral exceeds 20 digits, and no hashtag/mention exceeds 100 chars.
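The real patterns are Python. The sketch below is a hypothetical C++ port of the bounded numeral pattern using ECMAScript `std::regex`. Because the fractional part must begin with a literal `.`, the integer and fraction quantifiers can never compete for the same digits, and the `{1,20}` bounds cap how much backtracking a failed attempt can do.

```cpp
#include <regex>
#include <string>

// Hypothetical port of the bounded numeral pattern: 1-20 digits, optionally
// followed by "." and 1-20 more digits.
const std::regex kNumeral(R"(\d{1,20}(?:\.\d{1,20})?)");

bool isNumeral(const std::string& s) { return std::regex_match(s, kNumeral); }
```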

---------

Co-authored-by: RamazTs <66473301+RamazTs@users.noreply.github.com>

* feat[whisper][notask]: add streaming VAD transcription to whisper addon (#998)

* feat: add streaming VAD transcription to whisper addon

- Add C++ StreamingProcessor with Silero VAD for speech segmentation
- StreamingProcessor runs on its own thread, buffers incoming audio,
  and uses whisper_vad_* APIs to detect speech boundaries
- RAII wrapper (VadSegmentsPtr) for automatic VAD segment cleanup
- Backpressure handling: drop oldest audio when buffer exceeds cap
- JS bindings: startStreaming, appendStreamingAudio, endStreaming
- New error codes for streaming operations (6012-6014)
- Addon state properly reset in response finally handler
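The drop-oldest backpressure policy can be sketched with a bounded `std::deque`. The class is hypothetical, and the real StreamingProcessor's buffer type and cap are not shown in this log. New samples are appended first, then any excess is trimmed from the front, so the most recent audio is always kept.

```cpp
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical bounded buffer: appending past capacity drops the oldest samples.
class BoundedAudioBuffer {
 public:
  explicit BoundedAudioBuffer(std::size_t capacity) : capacity_(capacity) {}

  // Appends samples and returns how many old samples were dropped.
  std::size_t append(const std::vector<float>& samples) {
    samples_.insert(samples_.end(), samples.begin(), samples.end());
    const std::size_t excess =
        samples_.size() > capacity_ ? samples_.size() - capacity_ : 0;
    samples_.erase(samples_.begin(),
                   samples_.begin() + static_cast<std::ptrdiff_t>(excess));
    return excess;
  }

  std::size_t size() const { return samples_.size(); }
  float oldest() const { return samples_.front(); }

 private:
  std::size_t capacity_;
  std::deque<float> samples_;
};
```

Dropping old audio rather than blocking the producer favors low latency over completeness, which usually suits live transcription.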

Made-with: Cursor

* fix: address PR review comments for whisper streaming VAD

- Replace g_streamingProcessors map with single-processor globals
  (one active streaming job at a time per Gustavo's feedback)
- Wire streaming cleanup into cancel and destroyInstance via
  cancelWithStreaming and destroyInstanceWithStreaming wrappers
- Add StreamingProcessor::cancel() for forceful abort with
  model cancellation and thread join
- Fix stats accumulation: use WhisperModel::process(Input&) void
  overload + takeOutput() so stats accumulate across segments
  instead of resetting per-segment
- Add WhisperModel::prepareForStreaming() to reset stats and
  cancel flag once at session start
- Propagate segment processing errors via hasError_ flag and
  queue exception at stream end
- Add streaming methods to MockedBinding (startStreaming,
  appendStreamingAudio, endStreaming, error simulation)
- Add 6 unit tests covering streaming lifecycle, stats, cancel,
  destroy, error propagation, and concurrent session rejection
- Add example.streaming-vad.js demonstrating runStreaming() API
  with fs.createReadStream as audio source

Made-with: Cursor

---------

Co-authored-by: Raju <raju.sharma>

* QVAC-14357 fix(onnx): Code clean-up and fixes (#1049)

* (feature) llamacpp-llm: dynamic tools (#706)

* (improvement) llamacpp-llm: Qwen3 dynamic tools template

* (improvement) llamacpp-llm: add llm config tools flag

* (improvement) llamacpp-llm: use template based on tools param

* (improvement) llamacpp-llm: count tools token offset with tokenizer

* (improvement) llamacpp-llm: track n-past, run Qwen3 tests, fix reset

* (improvement) llamacpp-llm: save cache with respect to tools flag

* (fix) llamacpp-llm: add Qwen3ToolsDynamicTemplate.cpp to production CMakeLists

The new source file was added to the test CMakeLists but missing from the addon and cli_tool targets, causing an undefined symbol linker error on CI win64 builds.

Made-with: Cursor

* chore: retrigger CI for CMakeLists fix

Made-with: Cursor

* (fix) llamacpp-llm: fix use-after-free SIGSEGV on process exit (linux)

Reorder TextLlmContext members so threadpools are declared before llamaInit_. C++ destroys members in reverse declaration order, so llamaInit_ (which calls llama_free) now runs while threadpools are still alive, preventing use-after-free when llama_free accesses attached threadpool pointers.
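This commit was reverted two entries later. The language rule it relied on is still standard C++: non-static data members are destroyed in reverse declaration order. The sketch below uses hypothetical types standing in for the threadpools and llamaInit_. It shows that declaring the threadpool first keeps it alive while the context is torn down.

```cpp
#include <string>
#include <utility>
#include <vector>

// Records construction/destruction events into a shared log.
struct Tracked {
  Tracked(std::vector<std::string>& log, std::string name)
      : log_(log), name_(std::move(name)) { log_.push_back("ctor " + name_); }
  ~Tracked() { log_.push_back("dtor " + name_); }
  std::vector<std::string>& log_;
  std::string name_;
};

// Hypothetical stand-in for TextLlmContext's member layout.
struct ContextSketch {
  explicit ContextSketch(std::vector<std::string>& log)
      : threadpool(log, "threadpool"), context(log, "context") {}
  Tracked threadpool;  // declared first -> destroyed last
  Tracked context;     // declared last  -> destroyed first
};
```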

Made-with: Cursor

* Revert "(fix) llamacpp-llm: fix use-after-free SIGSEGV on process exit (linux)"

This reverts commit 7d9c237.

* (fix) llamacpp-llm: robust threadpool teardown to prevent SIGSEGV on exit

The ThreadPoolDeleter was doing ggml backend registry lookups during destruction, which is fragile during process teardown when the registry may already be torn down. Additionally, threadpools attached to llama_context could be freed before the context itself, causing use-after-free. Fix: cache ggml_threadpool_free fn pointer at construction time, and add explicit destructor that detaches threadpools before freeing them.

Made-with: Cursor

* Revert "(fix) llamacpp-llm: robust threadpool teardown to prevent SIGSEGV on exit"

This reverts commit 4e66b38.

* fix(llm): reset stale state before non-cached run after prefill

When a prefill run leaves nPast_ > 0 and the next run is a non-cached single-shot, the stale KV cache and dynamic-tools bookkeeping (nPastBeforeTools_, nConversationOnlyTokens_) caused token duplication and incorrect cache trimming. Clear state eagerly when shouldResetAfterInference is true and nPast_ is non-zero.

Made-with: Cursor

* fix(llm): trim stale tool tokens in multi-turn sessions with tools_at_end

When tools_at_end is true and a session continues without explicit save between turns, old tool+response tokens remained in the KV cache. New tool tokens were appended, causing conflicting tool definitions.

Add a guard in processPrompt() that trims from nPastBeforeTools_ to nPast_ before eval when stale tool tokens are detected. Includes new dynamic-tools integration tests covering changing tools, same tools, and single-shot regression.
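The sketch below is a simplified model of that guard. It treats the KV cache as a token vector whose positions [0, nPast) are cached. The struct is hypothetical, and the real code removes the range from the llama.cpp cache instead of resizing a vector. Tokens between nPastBeforeTools and nPast are the previous turn's tool block and response, so they are dropped before the new tools are evaluated.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical model: tokens[i] is cached for every i < nPast().
struct KvCacheSketch {
  std::vector<int> tokens;
  std::size_t nPastBeforeTools = 0;  // 0 means "no tool boundary recorded"

  std::size_t nPast() const { return tokens.size(); }

  // Returns true if stale tool tokens were trimmed.
  bool trimStaleTools() {
    if (nPastBeforeTools == 0 || nPastBeforeTools >= nPast()) return false;
    tokens.resize(nPastBeforeTools);  // drop [nPastBeforeTools, nPast)
    return true;
  }
};
```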

Made-with: Cursor

* (fix) llamacpp-llm: dynamic tools cache trim, tmp template, debugs

* fix(llm): pass toolsAtEnd flag to context constructors to fix template selection race

The toolsAtEnd flag was set via setToolsAtEnd() after context creation,
but getChatTemplateForModel() was called during construction — always
seeing toolsAtEnd=0 and selecting the wrong Qwen3 template.

Pass the flag through createContext() into TextLlmContext and
MtmdLlmContext constructors so the correct template is selected
from the start. Also restore the conditional template selection
in ChatTemplateUtils that was previously hardcoded.
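The ordering bug is easy to reproduce in miniature. If the constructor picks the template, a setter called afterwards comes too late, because the choice was already made with the default value. Passing the flag as a constructor argument fixes that. In the sketch below, the template identifiers and class name are hypothetical; the real selection happens in getChatTemplateForModel().

```cpp
#include <string>

// Hypothetical template choice that depends on the toolsAtEnd flag.
std::string chooseTemplate(bool toolsAtEnd) {
  return toolsAtEnd ? "qwen3-tools-dynamic" : "qwen3-default";
}

class TextContextSketch {
 public:
  explicit TextContextSketch(bool toolsAtEnd)
      : toolsAtEnd_(toolsAtEnd), template_(chooseTemplate(toolsAtEnd_)) {}
  const std::string& chatTemplate() const { return template_; }

 private:
  bool toolsAtEnd_;       // declared before template_, so initialized first
  std::string template_;
};
```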

* feat(llm): strip tool_call/think blocks from re-sent assistant responses

Add stripInternalBlocks() helper to testToolRemoval.js and
benchToolsPlacement.js to remove <tool_call> and <think> blocks
from assistant responses before including them in conversation
history. Prevents model from pattern-matching on old tool calls
and hallucinating removed tools.

Also extend benchToolsPlacement to 20 turns and add HTML chart.
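stripInternalBlocks() itself is a JS test helper. The sketch below is a hypothetical C++ port of the same idea, using non-greedy ECMAScript regexes. It removes every `<tool_call>` and `<think>` block from an assistant message before the message is added back to the conversation history.

```cpp
#include <regex>
#include <string>

// Hypothetical port: [\s\S]*? matches across newlines, non-greedily, so each
// block ends at its own closing tag instead of the last one in the text.
std::string stripInternalBlocksSketch(const std::string& text) {
  static const std::regex kBlocks(
      R"(<tool_call>[\s\S]*?</tool_call>|<think>[\s\S]*?</think>)");
  return std::regex_replace(text, kBlocks, "");
}
```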

* (fix) llamacpp-llm: use correct template in tests

* (chore) llamacpp-llm: move qwen3 cache tests to own file

* (improvement) llamacpp-llm: simplify nPastBeforeTools reset, multi-turn cache tests

* (improvement) llamacpp-llm: simplify nPastBeforeTools tracking, no trim on save

* (chore) llamacpp-llm: remove redundant getters and cleanup

* (internal) llamacpp-llm: run Qwen3 context tests

* (chore) cleanup

* (chore) fix lint errors in examples

* (chore) fix remaining lint errors in benchToolsPlacement

* (chore) fix indentation in benchToolsPlacement ternary

* (chore) llamacpp-llm: remove unused example files

* (chore) remove scratch planning docs

* (doc) llamacpp-llm: tools_at_end param description

* (chore) llamacpp-llm: changelog and version bump

* refactor(llamacpp-llm): address PR #706 review comments

Implement all 10 reviewer requests from PR #706 (jesusmb1995, gianni-cor).

| # | Reviewer | Request | Result |
|---|---------|---------|--------|
| R1 | @jesusmb1995 | Extract DynamicToolsState class | Done - new class in LlmContext.hpp with toolsAtEnd_, nConversationOnlyTokens_, nPastBeforeTools_, recordToolBoundary(), reset() |
| R2 | @jesusmb1995 | Collapse 3 virtual methods into single dynamicToolsState() accessor | Done - removed setToolsAtEnd, getNPastBeforeTools, setNPastBeforeTools virtuals; added dynamicToolsState() non-virtual accessor on base class |
| R3 | @gianni-cor | Remove redundant setToolsAtEnd() after createContext() | Done - removed the 4-line block in LlamaModel::init() |
| R4 | @gianni-cor | Add assert: nConversationOnlyTokens_ <= inputTokens.size() | Done - added in TextLlmContext::tokenizeChat |
| R5 | @gianni-cor | Reset nConversationOnlyTokens_ in TextLlmContext::resetState | Done - both contexts now call dynamicToolsState().reset() which resets both values |
| R6 | @gianni-cor | Guard tools_at_end for non-Qwen3 models | Done - architecture check after config parsing, logs warning and disables flag |
| R7 | @gianni-cor | Fix off-by-A trim error (disable add_generation_prompt) | Done - both TextLlmContext and MtmdLlmContext save/restore add_generation_prompt=false during no-tools tokenization |
| R8 | @gianni-cor | Add cold-start reset in MtmdLlmContext::tokenizeChat | Done - dynamicToolsState().reset() added at cold-start path |
| R9 | @gianni-cor | Cap firstMsgTokens_ after post-eval trim | Done - setFirstMsgTokens(getNPast()) if inflated after trim |
| R10 | @gianni-cor | Remove duplicate toolsAtEnd_ from LlamaModel | Done - runtime code in processPromptImpl queries dynamicToolsState().toolsAtEnd() instead of state_->toolsAtEnd_ |
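The sketch below shows a state holder of the shape R1 describes. It is based only on the member and method names listed in R1, R4, and R5. The recordToolBoundary() parameters and the way nPastBeforeTools_ is computed are assumptions for illustration, not the code that landed.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical shape of DynamicToolsState; semantics are assumed.
class DynamicToolsStateSketch {
 public:
  explicit DynamicToolsStateSketch(bool toolsAtEnd) : toolsAtEnd_(toolsAtEnd) {}

  bool toolsAtEnd() const { return toolsAtEnd_; }
  std::size_t nConversationOnlyTokens() const { return nConversationOnlyTokens_; }
  std::size_t nPastBeforeTools() const { return nPastBeforeTools_; }

  // Assumed semantics: the tool block starts right after the conversation
  // tokens of the current prompt.
  void recordToolBoundary(std::size_t nPast, std::size_t conversationOnlyTokens,
                          std::size_t totalTokens) {
    assert(conversationOnlyTokens <= totalTokens);  // invariant from R4
    nConversationOnlyTokens_ = conversationOnlyTokens;
    nPastBeforeTools_ = nPast + conversationOnlyTokens;
  }

  // One call clears both counters (R5); toolsAtEnd_ is configuration and stays.
  void reset() {
    nConversationOnlyTokens_ = 0;
    nPastBeforeTools_ = 0;
  }

 private:
  bool toolsAtEnd_;
  std::size_t nConversationOnlyTokens_ = 0;
  std::size_t nPastBeforeTools_ = 0;
};
```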

Made-with: Cursor

* refactor(llamacpp-llm): remove toolsAtEnd_ from ReloadableState, single source of truth in DynamicToolsState

Made-with: Cursor

* fix(llamacpp-llm): use dts.reset() after post-eval trim for full state cleanup

Made-with: Cursor

* (draft) llamacpp-llm: dynamic tools cache tokens test debug

* (internal) llamacpp-llm: dynamic tools token count and cache match test

* Revert "(internal) llamacpp-llm: dynamic tools token count and cache match test"

This reverts commit 181b98a.

* Revert "(draft) llamacpp-llm: dynamic tools cache tokens test debug"

This reverts commit 27e6a5c.

* fix(llamacpp-llm): address PR review comments N3-N8, merge main

N3: Save/restore inputs.use_jinja around no-tools tokenization to
    prevent getPrompt() Jinja fallback from corrupting the flag.
N4: Remove dead Jinja template variables (ns.multi_step_tool,
    ns.last_query_index) from Qwen3ToolsDynamicTemplate.
N5: Add missing assert(conversationOnlyTokens <= totalTokens) in
    MtmdLlmContext::tokenizeChat, matching TextLlmContext.
N6: Document Qwen3-only model support in tools-at-end.md.
N7: Merge duplicate if(nPast_==0 && !isCacheLoaded) blocks in
    TextLlmContext::tokenizeChat.
N8: Remove unnecessary save/restore of inputs.tools and
    inputs.add_generation_prompt (locals not read after).

Also: merge main into feature branch, move dynamic-tools changelog
to separate 0.13.1 entry.

Made-with: Cursor

* style(llamacpp-llm): apply clang-format to all PR-touched C++ files

Made-with: Cursor

* style(llamacpp-llm): fix remaining clang-format-19 brace-init formatting

Made-with: Cursor

* chore: remove accidentally committed binary file

The file packages/ocr-onnx/big_and_clear_watermarks.png was unintentionally staged during merge conflict resolution.

Made-with: Cursor

* chore(llm): bump version to 0.14.0

Made-with: Cursor

* chore: remove working artifacts from feature branch

Made-with: Cursor

* chore: remove accidentally committed sdk model history file

Made-with: Cursor

* doc: add dynamic-tools examples to README

Made-with: Cursor

* fix(llm): reset use_jinja from params_ instead of save/restore

Made-with: Cursor

* fix(llm): reset use_jinja before second getPrompt call

Made-with: Cursor

---------

Co-authored-by: Dmitry Malishev <dmitry.malishev@tether.io>
Co-authored-by: olyasir <sirkinolya@gmail.com>
Co-authored-by: gianni <gianfranco.cordella@tether.io>

* [tetherto/qvac] fix(nmtcpp): fix critical C++ bugs, add lint-cpp, update README (#1071)

* fix(nmtcpp): fix critical C++ bugs, add lint-cpp, update README

- Fix UB: PivotTranslationModel::translateString missing return path
- Fix cancel propagation to sub-models in PivotTranslationModel
- Fix stopTranslation_ flag never reset after cancel
- Fix translateBatch ignoring cancellation flag
- Fix private inheritance of IModelCancel in TranslationModel and
  PivotTranslationModel (enables dynamic_cast from framework)
- Fix typo: "Invalid backed type" -> "Invalid backend type"
- Fix operator precedence in detectBackendType (add explicit parens)
- Add lint-cpp script to package.json
- Update README: fix Bare version mismatch, doc links, pause/resume
  claim, add pivot example, update clone URLs for monorepo, clarify
  Bergamot build flag
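The actual expression in detectBackendType is not shown in this log, so the following is a generic illustration of why explicit parentheses matter there. `&&` binds tighter than `||`, so the two hypothetical helpers below disagree on the same inputs. Adding parentheses either documents the grouping the compiler already uses or, if the intent was different, changes the behavior. GCC's `-Wparentheses` (part of `-Wall`) warns about the unparenthesized mix.

```cpp
// Same as the unparenthesized `a || b && c`, because && binds tighter than ||.
constexpr bool orOfAnd(bool a, bool b, bool c) { return a || (b && c); }

// The grouping "either a or b, and also c".
constexpr bool andOfOr(bool a, bool b, bool c) { return (a || b) && c; }

// With a == true and c == false the two groupings disagree.
static_assert(orOfAnd(true, false, false) != andOfOr(true, false, false),
              "grouping changes the result");
```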

Made-with: Cursor

* delete Move Semantics

---------

Co-authored-by: olyasir <sirkinolya@gmail.com>

* chore[notask]: backmerge release @qvac/cli v0.2.2 (#1076)

* chore: trigger CLI release 0.2.2 (#1011)

* doc[notask|skiplog]: add changelog for CLI v0.2.2 (#1013)

* doc[notask|skiplog]: add changelog for CLI v0.2.2

Made-with: Cursor

* fix: preserve existing changelog history

Made-with: Cursor

---------

Co-authored-by: Lauri Piisang <lauri.piisang@gmail.com>

* QVAC-14188: langdetect-text-cld2 ISO 639-3 support (#1078)

feat: cld2 support for ISO 639-1/2/3 code inputs for getting language names
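Below is a minimal sketch of a lookup that accepts all three code families. The function and its deliberately tiny table are hypothetical, and the package's real data source is not shown in this log. ISO 639-1 codes have two letters, while ISO 639-2 and 639-3 codes have three. ISO 639-2 also has bibliographic variants such as "ger" and "fre" that differ from the 639-3 codes "deu" and "fra", so a lookup keyed on codes needs an entry for each form.

```cpp
#include <optional>
#include <string>
#include <unordered_map>

// Hypothetical code-to-name lookup covering ISO 639-1, 639-2/B and 639-3 forms.
std::optional<std::string> languageNameSketch(const std::string& code) {
  static const std::unordered_map<std::string, std::string> kNames = {
      {"en", "English"}, {"eng", "English"},
      {"de", "German"},  {"deu", "German"}, {"ger", "German"},
      {"fr", "French"},  {"fra", "French"}, {"fre", "French"},
  };
  const auto it = kNames.find(code);
  if (it == kNames.end()) return std::nullopt;
  return it->second;
}
```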

* fix: handle absolute companion model paths in diffusion addon (#1077)

The SDK's resolveConfig() resolves companion model names (clipL, clipG,
t5Xxl, llm, vae) to absolute disk paths. Previously, the addon always
joined these with diskPath, which would produce broken double-joined
paths when given an already-absolute path. Add a resolve() helper that
passes absolute paths through unchanged and only joins relative ones.
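The addon's resolve() helper is JavaScript. The sketch below expresses the same rule in C++17 under a hypothetical function name. One difference is worth knowing. Node's `path.join('/models', '/abs/vae.gguf')` yields `/models/abs/vae.gguf`, which is exactly the double-join problem described above. `std::filesystem`'s `operator/` instead replaces the left side when the right side is absolute. The explicit `is_absolute()` check keeps the intent visible either way.

```cpp
#include <filesystem>
#include <string>

// Hypothetical helper: absolute paths pass through, relative ones join diskPath.
std::string resolveCompanionPath(const std::string& diskPath, const std::string& p) {
  const std::filesystem::path candidate(p);
  if (candidate.is_absolute()) return candidate.string();
  return (std::filesystem::path(diskPath) / candidate).string();
}
```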

Co-authored-by: gianni-cor <gianfranco.cordella@tether.io>

* fix: recover content gaps (#1067)

* infra[notask]: extend onnx tts mobile device farm timeouts and run q4/q4f16 matrix (#1075)

* chore: Add fp16 and q4 models in mobile integration tests

* fix: Increase timeout and run q4 and q4f16 models

---------

Co-authored-by: Ishan Vohra <ishanvohra@Ishans-MacBook-Air.local>
Co-authored-by: GustavoA1604 <54457676+GustavoA1604@users.noreply.github.com>

* fix: replace lab results test fixture image (#1063)

Update the DocTR lab results fixture to use the new realistic sample while keeping the original filename for existing test and workflow references.

Made-with: Cursor

Co-authored-by: olyasir <sirkinolya@gmail.com>

* fix: update package.json URLs to monorepo for all packages (#1088)

* fix: update package.json URLs to point to monorepo for LLM, Embed, and Diffusion addons

The repository, bugs, and homepage URLs pointed to old standalone repos
that are either private or non-existent. Update to point to the qvac
monorepo with correct directory fields for npm.

* fix: update package.json URLs to monorepo for nmtcpp, ocr-onnx, and registry-server

Same fix as the previous commit but for the remaining packages with
stale standalone repo URLs.

* fix: add repository and homepage fields to remaining JS packages

Add consistent repository, bugs, and homepage fields pointing to
the monorepo for error, dl-base, dl-filesystem, dl-hyperdrive,
infer-base, langdetect-text, and rag packages.

* fix: add monorepo metadata to remaining packages

Add repository (with directory), bugs, and homepage fields to sdk,
logging, decoder-audio, diagnostics, onnx, tts-onnx, and
langdetect-text-cld2. Fix whispercpp to include directory in
repository and package-scoped homepage.

* fix: add monorepo metadata to cli, registry-client, and registry-schema

Add homepage to cli. Add repository, bugs, and homepage to
registry-client and registry-schema sub-packages.

* feat[notask]: add download profiler for registry blob performance diagnostics (#1040)

* feat[notask]: add download profiler for registry blob performance diagnostics

Made-with: Cursor

* fix: move profiler deps from devDependencies to dependencies

Made-with: Cursor

* doc: add profile command and example to client README

Made-with: Cursor

* fix: show full peer keys in profiler output for troubleshooting

Made-with: Cursor

* fix: validate parseInt results for interval and timeout CLI flags

Made-with: Cursor

---------

Co-authored-by: Proletter <40578159+Proletter@users.noreply.github.com>
Co-authored-by: Simon Iribarren <simon.ig13@gmail.com>

* fix: resolve dependabot alerts for registry-server transitive deps (#1093)

* fix(registry-server): PBKDF2 for passphrase-derived keys (CodeQL #9) (#1065)

* fix(registry-server): derive passphrase keys with PBKDF2

Replace single-pass SHA-256 with PBKDF2-HMAC-SHA256 (310k iterations)
for deterministic test keys; addresses CodeQL js/insufficient-password-hash.

* chore(registry-server): remove passphrase migration note from guide

---------

Co-authored-by: Proletter <40578159+Proletter@users.noreply.github.com>

---------

Co-authored-by: Ridwan Taiwo <donriddo@gmail.com>
Co-authored-by: gianni-cor <gianfranco.cordella@tether.io>
Co-authored-by: Giacomo <119889121+GiacomoSorbiWork@users.noreply.github.com>
Co-authored-by: GustavoA1604 <54457676+GustavoA1604@users.noreply.github.com>
Co-authored-by: Juan Pablo Garibotti Arias <juan.arias@bitfinex.com>
Co-authored-by: ogad-tether <omar.gad@tether.io>
Co-authored-by: dev-nid <nidhinpd811@gmail.com>
Co-authored-by: Ishan Vohra <ishanvohra2@gmail.com>
Co-authored-by: Ishan Vohra <ishanvohra@Ishans-MacBook-Air.local>
Co-authored-by: Yury Samarin <yuri.a.samarin@gmail.com>
Co-authored-by: olyasir <sirkinolya@gmail.com>
Co-authored-by: RamazTs <66473301+RamazTs@users.noreply.github.com>
Co-authored-by: Raju Sharma <sharmaraju352@gmail.com>
Co-authored-by: iancris <17702377+iancris@users.noreply.github.com>
Co-authored-by: Mikhail Sotnikov <mialsot@gmail.com>
Co-authored-by: Dmitry Malishev <dmitry.malishev@tether.io>
Co-authored-by: alsrivas <40749307+Alok-Ranjan23@users.noreply.github.com>
Co-authored-by: Simon Iribarren <simon.ig13@gmail.com>
Co-authored-by: Lauri Piisang <lauri.piisang@gmail.com>
Co-authored-by: Proletter <40578159+Proletter@users.noreply.github.com>
Proletter pushed a commit that referenced this pull request May 24, 2026
…nditioning (#884)

* updated for sd

* updated and successfully built

* downloads

* updated with working loading

* updated load model js for Q4_K test

* rewrote parameter handling to support multiple params and also two different model types

* got sd inference to work

* updated for sd2

* got full sdxl to work

* rename folder to qvac-lib-infer-diffusion

* update package name

* sd3 finished

* rename: qvac-lib-infer-diffusion -> lib-infer-diffusion

Rename package directory from packages/qvac-lib-infer-diffusion to
packages/lib-infer-diffusion to align with the lib-* naming convention
used across the monorepo.

Made-with: Cursor

* updated for cuda linux

* updated for model

* have something working

* changelog

* cpp lint

* format

* updated model for gian

* integration test

* fixing according to boss

* fix(android): enable BUILD_SHARED_LIBS and stub pthread_cancel for GGML_BACKEND_DL

GGML_BACKEND_DL requires BUILD_SHARED_LIBS=ON so CMake can build GPU
backends as MODULE targets (.so). Previously BUILD_SHARED_LIBS was
hardcoded OFF, causing configure to fail on Android.

Also stub out pthread_cancel in ggml-backend-reg.cpp via a cmake
string replacement — pthread_cancel is unavailable in the Android NDK.
The loader thread terminates naturally without the explicit cancel.

Made-with: Cursor

* fix(android): exclude Vulkan on Android and fix pthread_cancel stub

Two portfile fixes for arm64-android cross-compile:

1. SD_VULKAN: the else() branch was enabling -DSD_VULKAN=ON for Android,
   causing find_package(Vulkan) to pick up the host x86_64 SDK during
   cross-compile and fail CMake configure. Android Vulkan support comes
   via the NDK and is handled separately; skip the flag entirely.

2. pthread_cancel: replace the fragile comment-based no-op with a proper
   inline stub guarded by #if defined(__ANDROID__), injected at the top
   of ggml-backend-reg.cpp before compilation.

Made-with: Cursor

* ci: dump vcpkg configure logs on failure for android build

Adds an always-run step that cats all config-*.log files from the
vcpkg stable-diffusion-cpp buildtrees on failure, so the exact CMake
configure error is visible inline in the CI job output.

Made-with: Cursor

* fix(android): insert pthread_cancel stub after pthread.h include

The previous stub was prepended to the top of ggml-backend-reg.cpp
before any #include, so pthread_t was undefined and the stub itself
failed to compile — leaving pthread_cancel undeclared for the actual
call site.

Fix: insert the no-op stub immediately after #include <pthread.h>
so pthread_t is available. Add a fallback that prepends both the
include and stub if <pthread.h> isn't found directly.

Also pass HAVE_PTHREAD_CANCEL=0 and GGML_HAVE_PTHREAD_CANCEL=OFF
as CMake cache variables to disable any check_function_exists tests,
and add DISABLE_PARALLEL_CONFIGURE to avoid race conditions with
source patches.

Made-with: Cursor

* fix(android): resolve BUILD_SHARED_LIBS override and pthread_cancel issues

Locally verified: stable-diffusion-cpp:arm64-android now configures and
builds successfully.

Three root causes fixed:

1. BUILD_SHARED_LIBS override: vcpkg maps VCPKG_LIBRARY_LINKAGE to
   BUILD_SHARED_LIBS, and the arm64-android triplet sets linkage to
   "static" — appending -DBUILD_SHARED_LIBS=OFF after our explicit ON.
   Additionally, stable-diffusion.cpp's CMakeLists.txt resets
   BUILD_SHARED_LIBS=OFF unless SD_BUILD_SHARED_GGML_LIB=ON.
   Fix: set VCPKG_LIBRARY_LINKAGE=dynamic for this port when DL
   backends are enabled, and pass -DSD_BUILD_SHARED_GGML_LIB=ON.

2. pthread_cancel stub redefinition: the previous stub was inserted
   via string(REPLACE) + fallback string(PREPEND), but both paths
   executed — producing a duplicate definition error. Also, vcpkg
   reuses cached source trees, so patches accumulated across builds.
   Fix: use a sentinel comment for idempotency; only one insertion
   path with the stub placed after #include <pthread.h>.

3. Removed the now-unnecessary explicit BUILD_SHARED_LIBS_OPTION
   variable since VCPKG_LIBRARY_LINKAGE handles it correctly.
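The sentinel makes the patch idempotent. The portfile implements this with CMake `string()` commands, and the sketch below models the same logic in C++; the sentinel text and function name are hypothetical. If the sentinel is already present, the source is returned unchanged. Otherwise the stub is inserted exactly once, directly after `#include <pthread.h>`, and there is no fallback path that could also run.

```cpp
#include <cstddef>
#include <string>

// Hypothetical sentinel marking that the stub has already been applied.
const std::string kSentinel = "/* pthread_cancel stub (patched) */";

std::string patchPthreadCancelOnce(const std::string& source) {
  if (source.find(kSentinel) != std::string::npos) {
    return source;  // already patched: re-running on a cached tree is a no-op
  }
  const std::string include = "#include <pthread.h>";
  const std::size_t pos = source.find(include);
  if (pos == std::string::npos) {
    return source;  // single insertion path: no separate prepend fallback
  }
  // Placed after the include so pthread_t is declared; Android-only.
  const std::string stub =
      "\n" + kSentinel +
      "\n#if defined(__ANDROID__)\n"
      "static inline int pthread_cancel(pthread_t) { return 0; }\n"
      "#endif";
  std::string patched = source;
  patched.insert(pos + include.size(), stub);
  return patched;
}
```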

Made-with: Cursor

* updated for android hopefully works

* added opencl support for android

* windows attempt fix

* attempting to fix windows again

* NORM problem with ggml operation

* attempting to patch norm

* attempting again to fix

* diagnostic step

* update for opencl

* updated for device selection

* fix(diffusion): add CI/CD workflows, test infra, and integration tests (#676)

* fix(diffusion): rebase on feature-media-generation, add CI improvements

Rebased cleanly onto feature-media-generation to pick up:
- SD_CPU_ONLY env var gate (Metal NORM op fallback to CPU)
- GGML_OPENMP=OFF (eliminates libomp.so.5 dependency)
- OpenCL support for Android

Additions on top of base:
- Add cpp-tests and ts-checks jobs to on-pr workflow
- Add image artifact upload to integration tests (traceable to source test)
- Disable win32 in prebuilds/integration/cpp-tests (C1128 /bigobj)
- Install libomp5 on Linux integration tests (safety net)
- Test infrastructure: unit tests, mobile test framework, scripts

* fix(diffusion): address PR review comments, enable win32, improve CI artifacts

- Re-enable win32 platform in prebuilds, integration-test, and cpp-tests workflows
- Remove duplicate PULL_REQUEST_TEMPLATE.md (already in repo root)
- Fix setDiff in validate-mobile-tests.js to handle non-Set inputs
- Refactor generate-image.test.js to use ensureModel from utils.js
- Save test images to modelDir for mobile permission compatibility
- Update CI to look for images in test/model/ instead of output/
- Add PR comment step to post image metadata on pull requests

* fix(diffusion): restore base branch code accidentally removed during rebase

Restores SD_CPU_ONLY patch, GGML_OPENMP=OFF, OpenCL support, Apple
keep_clip_on_cpu guard, and VCPKG_BUILD_TYPE placement that were
dropped when patches were applied on top of the reset base.

* style(diffusion): fix lint errors in examples (no-multi-spaces, indent)

* feat(diffusion): upload test images to S3 and display inline in step summary

Images are uploaded to S3 with public-read ACL, then embedded in the
step summary and PR comments via their S3 URLs so they render inline
without needing to download artifacts.

* ci(diffusion): remove libomp5 install (fixed by GGML_OPENMP=OFF in portfile)

* remove S3 upload, use simple table summary for generated images

* restore AWS env vars from base branch

* refactor(diffusion): consolidate test utils, remove helpers.js

Move detectPlatform, setupJsLogger, isPng into utils.js and update
generate-image.test.js to import from utils.js only. Add platform
detection for device selection in model-loading.test.js.

* fixed integration tests

* updated

* updated timeout

* cpp unit tests complete and tested YAY BABY

* cpp lint

* updated

* test(diffusion): add integration tests for SDXL, SD3, and FLUX.2 (#757)

* test(diffusion): add integration tests for SDXL, SD3, and FLUX.2

Add integration tests for all supported model families based on the
existing examples. Each test follows the LLM addon patterns: platform-
aware device selection, defensive cleanup with .catch(), ensureModel
for CI downloads.

- generate-image-sdxl.test.js: SDXL Base 1.0 (all-in-one GGUF, auto eps-prediction)
- generate-image-sd3.test.js: SD3 Medium (safetensors, flow prediction, euler sampler)
- generate-image-flux2.test.js: FLUX.2 klein 4B (split layout: diffusion + LLM + VAE)
- Regenerate all.js (brittle) and integration.auto.cjs (mobile)

* fix(diffusion): use CPU on all darwin platforms

Metal's GGML_OP_MUL_MAT is unsupported for stable-diffusion.cpp,
causing SIGABRT on darwin-arm64. Use isDarwin (all darwin) instead
of isDarwinX64 for the useCpu check.

* revert: keep GPU on darwin-arm64 to surface Metal errors

Don't hide GPU errors behind CPU fallback — the Metal MUL_MAT
issue needs to be visible so it gets fixed.

* test(diffusion): increase test timeouts for CPU-bound runs

FLUX.2 30min, SDXL/SD3 15min — these models are too heavy for
the default 10min timeout when running on CPU.

* chore: remove all.js from tracking (auto-generated, gitignored)

* test(diffusion): skip SDXL, SD3, and FLUX.2 tests on mobile

* QVAC-13954: Clean up vcpkg deps in lib-infer-diffusion (#781)

* refactor: split ggml into standalone vcpkg overlay port

Decouple ggml from the stable-diffusion-cpp overlay port so it can be
shared by multiple consumers with consistent ABI guarantees.

- Add standalone ggml overlay port (version-date 2026-01-30) pinned to
  the same commit used by stable-diffusion.cpp master-514-5792c66
- Refactor stable-diffusion-cpp port to use vcpkg_from_github +
  SD_USE_SYSTEM_GGML=ON instead of cloning with --recurse-submodules
- Patch ggml's src/CMakeLists.txt and cmake/ggml-config.cmake.in to
  propagate GGML_MAX_NAME=128 via INTERFACE_COMPILE_DEFINITIONS,
  ensuring all consumers share the same struct layout
- Switch both ports to version-date versioning (no upstream semver)
- Replace bundled stb headers with vcpkg stb dependency
- Auto-enable Vulkan backend on Linux via platform dependency
- Forward GPU backend features (metal/vulkan/cuda/opencl) from
  stable-diffusion-cpp to ggml through vcpkg feature

* fix(diffusion): fix ggml/sd overlay ports for Android cross-compilation

Add NDK-matched Vulkan C++ header detection so the ggml port downloads
headers matching the exact NDK Vulkan version instead of pulling a
potentially mismatched vcpkg vulkan-headers package.  Add missing
ggml-opencl.h to the public headers install list.  Auto-enable opencl
on Android and vulkan on desktop/Android via default-features in both
the ggml and stable-diffusion-cpp overlay ports.

* fix(diffusion): disable OpenMP and align ggml flags with qvac-fabric

Add GGML_OPENMP=OFF to fix Windows CI failure where OpenMP is
unavailable, and GGML_LLAMAFILE=OFF to disable unused code paths.
Add Android-specific flags for DL backends (GGML_BACKEND_DL,
CPU_ALL_VARIANTS, CPU_REPACK) and disable cooperative matrix
Vulkan extensions on mobile GPUs.

* fix(diffusion): fix ggml include dirs for DL backends and use tetherto fork

Patch ggml-config.cmake.in to set INTERFACE_INCLUDE_DIRECTORIES on the
ggml::ggml and ggml::ggml-base targets unconditionally.  When
GGML_BACKEND_DL is ON, the per-backend targets are not created and
include dirs were lost.  Also switch the SD source to the tetherto fork
and drop the qvac-diffusion- library prefix from CMakeLists.txt now
that ggml is a standalone port with standard names.

* Remove redundancies in vcpkg manifest files

* Set SD_CPU_ONLY=1 on CI env

* updated for runtime stats

* fixed connection to logger, as it was not properly connected before

* fixed for license file, validated working run on M1 Air

* quickstart quick-maths

* fixed integration for windows

* fix(diffusion): add real cancel/abort support to native generation (#782)

* fix(diffusion): add real cancel/abort support to native generation

Cancel previously only set an atomic flag checked after generate_image()
returned — generation ran to full completion and output was silently
discarded. This made cancel appear to work while still burning full
compute time.

Changes:

Portfile patches (stable-diffusion.cpp):
- Add sd_abort_cb_t typedef and sd_set_abort_callback() public API
- Add sd_abort_requested() helper checked in the denoise lambda
- When abort fires, denoise returns nullptr which the sampler stack
  already treats as failure → generate_image() returns NULL
- Fix upstream bug: abort path freed wrong compute buffer
  (diffusion_model instead of work_diffusion_model), corrupting sd_ctx
  and causing segfault on reuse

SdModel.cpp:
- Wire cancelRequested_ into abort callback via thread-local (matches
  existing progress callback pattern for concurrency safety)
- Scope guard ensures callbacks are cleared on all exit paths including
  early parse/validation exceptions
- Always free results[i].data whether cancelled or not (buffer leak fix)
- Cancelled jobs throw "Job cancelled" → JobRunner emits queueException
  instead of fake success with queueResult + queueJobEnded
- Return empty std::any from process() so queueJobEnded() is the sole
  terminal stats path (fixes duplicate JobEnded events in JS)

SdModel.hpp:
- Add isCancelRequested() public accessor for the static abort callback

* fix(diffusion): disable free_params_immediately for model reuse

The upstream sd_ctx_params_init() defaults free_params_immediately=true,
which permanently frees model weight buffers after the first
generate_image() call. Any subsequent generation on the same sd_ctx
accesses freed memory and crashes (SIGSEGV).

Set the default to false so the addon supports multiple generations
on the same model instance (the expected use pattern).

This was the root cause of the "cancel then run" crash — the abort
path still runs through generate_image_internal() which calls
diffusion_model->free_params_buffer() when this flag is true.

* fix(diffusion): add code comments and rename fix-abort-cleanup patch

- Add comments to SdCtxHandlers.hpp explaining why freeParamsImmediately
  is disabled (upstream default frees weight buffers after first generation,
  causing use-after-free on model reuse)
- Add comments to both hunks in the upstream cleanup patch explaining the
  compute buffer bug and work_ctx leak
- Rename fix-abort-cleanup.patch to fix-failure-path-cleanup.patch since
  the fixes apply to any failure path, not just abort

* fix(diffusion): document cancel-as-error rationale vs LLM addon

Diffusion throws on cancel (queueException) while LLM returns normally
(queueResult). Add comment explaining the intentional difference: diffusion
has no useful partial output, so an explicit error signal is more honest
than a success with output_count=0.

* test(diffusion): add C++ unit tests for cancel/context handling

Add test_cancel_context.cpp covering the context changes from the cancel fix:

- cancel when idle is a no-op (no crash, no state corruption)
- cancel during generation throws "Job cancelled" (cancel-as-error path)
- model is reusable after cancel (validates freeParamsImmediately=false
  and compute buffer fix — the exact SIGSEGV scenario)
- multiple sequential generations succeed (normal reuse without cancel)
- cancelRequested_ flag is reset at process() entry
- process() on unloaded model throws (not segfault)
- runtime stats are populated after successful generation

* fix(diffusion): fix patch line counts and test assertion

- Fix fix-failure-path-cleanup.patch: correct hunk line counts
  (-2203,7 +2203,11 and -3796,6 +3800,13) and replace Unicode
  em-dashes with ASCII in comments
- Fix CancelWhenIdleIsNoop test: cancel() sets the flag even when
  idle, it is only cleared on process() entry

* refactor(diffusion): static ggml core with DL backends and CMakeLists cleanup (#794)

* refactor(diffusion): static ggml core with DL backends and CMakeLists cleanup

Patch ggml to support GGML_BACKEND_DL with BUILD_SHARED_LIBS=OFF by
enabling PIC and backend compile definitions when DL is on, matching
the qvac-fabric approach.  Remove VCPKG_LIBRARY_LINKAGE=dynamic
override — core libs are now static .a with PIC, backends remain
MODULE .so files.

Clean up CMakeLists.txt: remove redundant explicit linking of OpenCL,
Metal frameworks, CUDA libs, and ggml (all propagated transitively
via ggml cmake config).  Fix WIN32_LEAN_AND_MEAN typo, remove stale
comments, and drop the clang overlay triplet workaround.

* chore(diffusion): switch Linux to libc++, fix vcpkg warnings, remove dead patches

Add libc++ triplets for x64-linux and arm64-linux under vcpkg/triplets,
matching the qvac-lib-infer-llamacpp-llm layout.  Move triplet and
toolchain files from vcpkg-override-triplets to vcpkg/.  Install the
stable-diffusion-cpp usage file and suppress mismatched binary count
warnings in both overlay ports.  Remove obsolete rename-ggml-libs and
no-dlopen-without-backend-dl patches from the old submodule architecture.

* fix(diffusion): disable GGML_BACKEND_DL for Android static backends

stable-diffusion.cpp calls ggml_backend_is_cpu() and
ggml_backend_cpu_init() directly, which live in the CPU backend module.
With GGML_BACKEND_DL these become separate .so files unavailable at
link time, causing dlopen failures on device.

Statically link all backends (CPU, Vulkan, OpenCL) instead, and bundle
the OpenCL ICD loader .so on Android so the addon loads even on devices
without a system libOpenCL.

* Place the OpenCL ICD loader library next to the bare addon file

* fix(diffusion): graceful OpenCL fallback and backend priority reorder

Patch ggml's OpenCL backend to return nullptr instead of aborting when
no OpenCL devices are found (e.g. Pixel phones without OpenCL support).
Reorder SD backend priority to CUDA > Metal > OpenCL > Vulkan > CPU,
preferring OpenCL on Adreno devices where it outperforms Vulkan, with
if-guards so only the first successful backend is used.
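The if-guarded ordering can be sketched as a first-success loop over an ordered list. The backend names come from the message above. The `Candidate` type and the init lambdas are hypothetical placeholders, not ggml calls.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// A named backend plus a hypothetical init attempt that reports success.
using Candidate = std::pair<std::string, std::function<bool()>>;

// Tries candidates in priority order and returns the first one that
// initializes. Once one succeeds, the remaining candidates are never attempted.
std::string pickBackend(const std::vector<Candidate>& ordered) {
    for (const auto& candidate : ordered) {
        if (candidate.second()) return candidate.first;
    }
    return "none";  // only reachable if even the CPU fallback fails
}
```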

* feat(diffusion): Adreno-aware backend selection for Android

Detect Adreno GPU model at runtime via ggml device enumeration and
choose the optimal backend: Adreno 800+ uses GPU (OpenCL), Adreno
600/700 is forced to CPU due to poor OpenCL performance, and
non-Adreno devices fall through to Vulkan.  Adds INFO-level logging
of detected devices and selection decisions for troubleshooting.
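The selection rule fits in a small pure function over the device description string. This sketch assumes the description contains `Adreno` followed by the model number (for example `QUALCOMM Adreno(TM) 830`). That format, the `Choice` enum, and both function names are assumptions and are not taken from the addon. The message does not cover Adreno models below 600, so this sketch lets them fall through to Vulkan.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

enum class Choice { OpenCL, Cpu, Vulkan };

// Returns the first number after "Adreno" in a device description, or -1
// when the device is not an Adreno GPU or no model number follows.
int adrenoModel(const std::string& desc) {
    const std::size_t pos = desc.find("Adreno");
    if (pos == std::string::npos) return -1;
    std::size_t i = pos + 6;  // skip the word "Adreno"
    while (i < desc.size() && !std::isdigit(static_cast<unsigned char>(desc[i]))) ++i;
    int model = 0;
    bool found = false;
    while (i < desc.size() && std::isdigit(static_cast<unsigned char>(desc[i]))) {
        model = model * 10 + (desc[i] - '0');
        found = true;
        ++i;
    }
    return found ? model : -1;
}

Choice chooseBackend(const std::string& desc) {
    const int model = adrenoModel(desc);
    if (model >= 800) return Choice::OpenCL;  // Adreno 800+: GPU via OpenCL
    if (model >= 600) return Choice::Cpu;     // Adreno 600/700: OpenCL too slow
    return Choice::Vulkan;                    // non-Adreno (or unrecognized)
}
```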

* fix(diffusion): statically link OpenCL ICD loader on Android

Add an overlay port for opencl that removes the dynamic-only
restriction, allowing the ICD loader to be built as a static library.
This eliminates libOpenCL.so as a NEEDED dependency so the addon
loads on all Android devices regardless of OpenCL support.  The
static ICD loader still dlopens vendor drivers at runtime.

* Fixed formatting

* CPU only on Android

* feat(diffusion): hybrid static CPU + dynamic GPU backends for Android (#813)

* feat(diffusion): hybrid static CPU + dynamic GPU backends for Android

Add GGML_CPU_STATIC option that builds the CPU backend as a static
library linked into ggml even when GGML_BACKEND_DL is ON.  GPU
backends (Vulkan, OpenCL) remain MODULE .so files loaded at runtime
via dlopen, eliminating libOpenCL.so as a NEEDED dependency.

This lets stable-diffusion.cpp call CPU backend functions directly
(ggml_set_f32, ggml_backend_cpu_init, etc.) while GPU backends are
discovered at runtime — a single Android binary works on all devices
regardless of OpenCL/Vulkan support.

* feat(diffusion): generic backend init using ggml registry API

Replace SD's init_backend() #ifdef waterfall with generic ggml calls
(ggml_backend_init_by_type) that work with both statically linked and
dynamically loaded backends.  Load DL backend modules from the addon
via ggml_backend_load_all_from_path() when GGML_BACKEND_DL is enabled.

This eliminates SD's dependency on GPU-specific headers (ggml-opencl.h,
ggml-vulkan.h, etc.) and removes the SD_METAL/VULKAN/CUDA/OPENCL build
flags, replacing sd-cpu-only.patch and sd-backend-priority.patch with a
single sd-generic-backend-init.patch.

* feat(diffusion): prefer OpenCL on Adreno 800+ via sd_ctx backend preference

Add a new backend preference field in stable-diffusion context params and wire SdModel to request OpenCL for Adreno 800+ when available, while keeping SD_CPU_ONLY as a CI-only env override.
Also fix ggml hybrid export wiring so CPU static symbols are linked for Android DL backend mode, and refresh android-arm64 prebuild artifact.

* fix(diffusion): pass backendsDir to SdCtxConfig

* Added logging to troubleshoot Vulkan init on Pixel devices

* fix(diffusion): JS layer review fixes and cancel test coverage (#783)

* fix(diffusion): JS layer review fixes and cancel test coverage

Aligns the JS layer with the LLM addon patterns and adds API behavior
tests for cancel/busy/idle state transitions.

JS layer:
- Rename run() to _runInternal() (BaseInference template method pattern)
- Replace 30ms timer guard with _hasActiveResponse boolean
- Extract _getWeightFiles() to deduplicate file lists in _load/_downloadWeights
- Wrap _runGeneration in _withExclusiveRun for serialization
- Add finalized.catch(() => {}) unhandled rejection guard
- Reset _hasActiveResponse in unload()
- Filter undefined values in addon config coercion
- Remove orphaned unloadWeights() from addon.js
- Update class doc and README to match actual supported models

Types (index.d.ts):
- Fix run() signature: Txt2ImgParams (was accepting txt2vid params)
- Proper type hierarchy: Txt2ImgParams → Img2ImgParams → GenerationParams
- Add missing params: guidance, sampling_method, scheduler
- Remove unused type declarations

Tests:
- Add api-behavior.test.js with 5 cancel/busy/idle tests
- idle|run, idle|cancel, run|cancel, run|run (busy), cancel|run (rerun)
- cancel|run test requires native abort support (fix/diffusion-cancel-abort)

* fix(diffusion): cancel inside onUpdate callback matching LLM pattern

Cancel tests now fire model.cancel() inside the onUpdate callback
after the first progress tick (string data), matching the LLM addon's
runAndCancelAfterFirstToken pattern. This ensures native generation
is guaranteed to be active when cancel fires, preventing false passes.

* fix(diffusion): use const for non-reassigned chain variable

Standard JS lint requires const for variables that are never reassigned.

* fix(diffusion): update scope note instead of removing it

FLUX.1 and Wan2.x video are still not supported — keep that explicit.

* fix(diffusion): video generation is planned, not excluded

Wan2.x support is planned for the future — update scope note accordingly.

* fix(diffusion): address PR review — remove WeightsProvider, unify run API, update docs

- Remove WeightsProvider and _downloadWeights (files must be on disk)
- Unify txt2img/img2img into single run() with auto-detected mode
- Add return await to _withExclusiveRun calls (stack trace alignment)
- Strengthen run|run test to verify first response completes
- Update README: loader is optional, add t5XxlModel, fix load() docs
- Update docs/architecture.md: align with disk-local contract

* fix(diffusion): remove unused loader from constructor, tests, and examples

The diffusion addon never used the loader parameter — it was accepted
in the constructor but silently discarded. Model files are loaded
directly from disk via diskPath.

- Remove loader from ImgStableDiffusion constructor and type declarations
- Remove Loader interface and ReportProgressCallback (no remaining consumers)
- Remove FilesystemDL usage from all 6 integration tests and 7 examples
- Update README: remove data loader section, renumber steps, drop loader from args table

* fix(diffusion): remove stale loader deps and fix doc references

- Remove @qvac/dl-filesystem and @qvac/dl-hyperdrive from devDependencies
- Remove @qvac/dl-hyperdrive from peerDependencies
- Update architecture.md to reflect direct disk-path loading (no FilesystemDL)

* fix(diffusion): remove last Hyperdrive mention from architecture doc

* fix(diffusion): remove stale loadWeights from thread safety rules

* fix(diffusion): update data-flows doc to reflect unified run() API

* feat(diffusion): move stable-diffusion-cpp to registry (#865)

Support qvac ggml backend module names.

* updated i2i

* working anime version of i2i

* cpp lint

* fixed

* feat(diffusion): unify img2img to always use in-context conditioning

Remove the traditional img2img path (VAE encode → noise → denoise)
and route all image-conditioned generation through FLUX in-context
conditioning (reference tokens + joint attention). The user-facing
API stays simple: pass init_image → img2img mode automatically.

- addon.js: only handle init_image, always serialize as ref_image_bytes
- index.js: mode = init_image ? 'img2img' : 'txt2img' (no ref2img)
- SdModel.cpp: single img2img path using ref_images / joint attention
- SdGenHandlers.cpp: accept txt2img and img2img only
- test_ref2img.cpp: update mode from ref2img → img2img
- ref2img-flux2.js: use init_image instead of ref_image

Made-with: Cursor

* chore(diffusion): remove accidentally committed 27MB android prebuild zip

sd-cpp-android-arm64.zip was committed in e2f140e during the Android
GPU backend work. Add *.zip to .gitignore to prevent recurrence.

Made-with: Cursor

* fix(diffusion): remove unload() calls from img2img/ref2img tests

SdModel on main uses RAII (default destructor + unique_ptr deleter),
so unload() no longer exists. model.reset() is sufficient.

Made-with: Cursor

* refactor(diffusion): unify img2img API, add von Neumann test asset, remove ref2img/SDXL

- Add assets/von-neumann.jpg (Public Domain, U.S. DOE HD.3F.191) as the
  canonical test image for img2img examples and tests
- Remove ref2img as a separate concept — all image-to-image is now just
  "img2img" using FLUX in-context conditioning under the hood
- Delete ref2img-flux2.js example and test_ref2img.cpp unit test
- Delete img2img-sdxl.js example (FLUX-only for this delivery)
- Update all examples, integration test, C++ unit tests, and docs to use
  the new asset path and consistent img2img terminology
- Add image attribution to NOTICE and Credits section to README
- Round auto-detected image dimensions to nearest multiple of 8 in addon.js
- Run clang-format on modified C++ sources

Made-with: Cursor

* style(diffusion): fix standard lint violations in img2img examples

Replace backtick strings without interpolation with single quotes,
remove trailing spaces, and collapse multi-space comment alignment.

Made-with: Cursor

* fix(diffusion): add bare-fs as direct dependency to resolve CI module error

Move bare-fs from devDependencies to dependencies to fix MODULE_NOT_FOUND
errors in CI workflows. The package is required by the transitive dependency
@qvac/dl-filesystem and by test generation scripts, and file: dependencies
don't always properly resolve transitive dependencies in npm.

Made-with: Cursor

* attempting to resolve dl

* fixed pathing issue

* increased timeouts

* fix(diffusion): skip FLUX2 img2img test on CPU-only runners

Add NO_GPU environment variable check to skip FLUX2 img2img test on
CPU-only runners. FLUX2 img2img requires GPU acceleration as it's too
slow on CPU (VAE encoding + diffusion steps exceed 30min timeout).

This aligns with the existing FLUX2 txt2img test behavior and ensures
the test only runs on GPU-enabled runners (ai-run-linux-gpu,
mac-mini-m4-gpu, ai-run-windows11-gpu).

Made-with: Cursor

* fix(diffusion): only set SD_CPU_ONLY on no-GPU runners

Make SD_CPU_ONLY conditional based on matrix.no_gpu to allow GPU-enabled
runners (ai-run-linux-gpu, mac-mini-m4-gpu) to use GPU acceleration.

Previously, SD_CPU_ONLY was hardcoded to '1' for all Linux/macOS runners,
forcing even GPU runners to use CPU. This caused FLUX2 tests to be
extremely slow or timeout.

Now:
- GPU runners: SD_CPU_ONLY='0' (uses GPU)
- CPU-only runners: SD_CPU_ONLY='1' (uses CPU)

Made-with: Cursor

* fix(diffusion): remove SD_CPU_ONLY env var from workflow

Remove SD_CPU_ONLY entirely from the workflow as the C++ code checks if
the env var is set at all, not its value. Setting SD_CPU_ONLY=0 still
forces CPU mode.

The integration tests already handle CPU/GPU selection via the NO_GPU
env var and the skip logic, so SD_CPU_ONLY is not needed at the workflow
level.

This allows GPU runners to properly use GPU acceleration without the
workflow interfering with the backend selection.
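A short sketch shows the difference between checking whether the variable is present and checking its value. `cpuOnlyByPresence()` models the behavior described above, assuming the addon reads the variable with `std::getenv`. `cpuOnlyByValue()` is a hypothetical alternative included only for contrast; the addon does not implement it. Both function names are illustrative.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Presence check: any value, including "0", counts as "set".
bool cpuOnlyByPresence() { return std::getenv("SD_CPU_ONLY") != nullptr; }

// Hypothetical value-aware variant: only a non-empty value other than "0"
// forces CPU mode.
bool cpuOnlyByValue() {
    const char* v = std::getenv("SD_CPU_ONLY");
    return v != nullptr && *v != '\0' && std::strcmp(v, "0") != 0;
}
```

With the presence check, the only way to allow GPU use is to leave the variable unset. That is why removing it from the workflow was the fix, rather than setting it to `0`.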

Made-with: Cursor

* fix(diffusion): remove ggml overlay port to use registry version

Remove the ggml overlay port to align with the main branch (commit 90f72cd)
which switched to using ggml from the registry instead of overlay ports.

This ensures consistency across the codebase and avoids reintroducing
the overlay port that was intentionally removed in PR #1066.

Made-with: Cursor

* changed seed and description

* fix(diffusion): increase Windows test timeout to 30 minutes

Increase Windows GPU runner timeout from 600s (10 min) to 1800s (30 min)
to match the FLUX2 test timeout. Windows Vulkan backend may be slower than
Linux/Mac for FLUX2 generation, and the sampling operations were timing out.

This gives Windows tests sufficient time to complete FLUX2 img2img and
txt2img generation without premature cancellation.

Made-with: Cursor

* chore(diffusion): regenerate mobile integration tests

Add FLUX2 img2img test to mobile integration test runners. The
integration.auto.cjs file is auto-generated and needs to be updated
whenever new integration tests are added.

Generated with: npm run test:mobile:generate

Made-with: Cursor

* feat(diffusion): change FLUX2 txt2img prompt to cartoon watercolor style

Update test prompt from photorealistic to cartoon watercolor style for
more visually distinctive output. The new style better demonstrates
FLUX2's artistic capabilities.

Prompt: "a red fox in a snowy forest, laying on a rock with a santa hat,
cartoon, watercolor"

Made-with: Cursor

* fix(diffusion): double test timeouts on Windows

Windows Vulkan backend is significantly slower than Linux/Mac, causing
integration tests to timeout. Double all test timeouts (600s → 1200s)
specifically on Windows platform while keeping other platforms unchanged.

Changes:
- model-loading.test.js: 10min → 20min on Windows
- api-behavior.test.js: 10min → 20min on Windows (5 tests)

This prevents premature timeout failures during diffusion model sampling
on Windows GPU runners.

Made-with: Cursor

* feat(diffusion): add SD3 img2img support with SDEdit and dual-path routing

Implements image-to-image transformation for SD3 Medium using SDEdit, with
automatic model-specific routing between FLUX in-context conditioning and
traditional SDEdit for other model families.

Key changes:
- Add examples/img2img-sd3.js: SDEdit example with flow-matching parameters
  (cfg_scale 4.5, strength 0.35-0.75, euler sampling)
- Implement dual-path img2img routing in SdModel.cpp:
  * FLUX2/FLUX: ref_images with auto_resize_ref_image (in-context conditioning)
  * SD1/SD2/SDXL/SD3: init_image with SDEdit (noise + denoise)
- Add automatic 8-alignment for non-multiple-of-8 input images:
  * Aligns dimensions up to nearest multiple of 8 to match generate_image()'s
    internal rounding, preventing GGML_ASSERT failures
  * Uses nearest-neighbor resize for the few pixels of padding needed
- Rename ref_image_bytes to init_image_bytes in JS layer (addon.js) for clarity
- Add integration test: test/integration/generate-image-sd3-i2i.test.js
- Update README with comprehensive img2img documentation:
  * Document dual-path routing strategy
  * Add SDEdit limitations (B&W images, resolution, strength, style biases)
  * Add SD3 img2img example
- Update JSDoc comments in index.js to reflect dual routing behavior
- Fix linting error in img2img-flux2.js (remove stray text on line 13)

Technical details:
The vcpkg version of stable-diffusion.cpp's generate_image() aligns width/height
up to spatial_multiple (typically 8) before creating tensors, then asserts that
init_image dimensions match exactly. For JPEG/PNG images with non-8-aligned
dimensions (e.g. 500×627), this caused assertion failures. The fix detects
mismatches and resizes the decoded image to the aligned dimensions before
passing to generate_image().

FLUX models are unaffected (use ref_images path with internal auto-resize).
SD3 and other models now handle arbitrary input dimensions correctly.
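The alignment check rounds each dimension up to the next multiple of 8 using integer arithmetic. For the 500×627 example this gives 504×632, so that image must be resized. `alignUp` and `needsResize` are hypothetical helper names, and `spatialMultiple` stands in for spatial_multiple. The sketch covers only the size check, not the resize.

```cpp
#include <cassert>

// Round v up to the next multiple of m (m > 0), e.g. 627 -> 632 for m = 8.
// For m = 8 this is the (w + 7) / 8 * 8 form.
constexpr int alignUp(int v, int m) { return (v + m - 1) / m * m; }

struct Dims { int w; int h; };

// True when the decoded image differs from the dimensions generate_image()
// will align to, i.e. when it must be resized before being passed in.
bool needsResize(Dims img, int spatialMultiple = 8) {
    return alignUp(img.w, spatialMultiple) != img.w ||
           alignUp(img.h, spatialMultiple) != img.h;
}
```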

Made-with: Cursor

* added linting fix

Made-with: Cursor

* fixed integration test

* updated cpp lint

* updated for sizing

* fix(diffusion): fix SD3 img2img integration test OOM on Vulkan CI

- Add vae_on_cpu: true to avoid GPU memory exhaustion during VAE
  encode/decode on CI runners with limited VRAM
- Reduce steps from 40 to 20 for faster CI execution
- Add null guard on images array to prevent crash when generation
  fails, producing a clear error message instead
- Regenerate mobile integration test bundle

Made-with: Cursor

* attempting PR start

* fix(diffusion): format cpp files with clang-format

Made-with: Cursor

* fix(diffusion): address PR review — image resize, error handling, alignment

- Replace manual nearest-neighbor resize with stb_image_resize2 linear
  filtering via a new image_utils::resizeSdImage() utility
- Add null checks with descriptive errors on malloc, resize, and image
  decode failures
- Throw on failed init_image decode instead of silently skipping,
  removing one indentation level for readability
- Fix JS/C++ alignment mismatch: Math.round → Math.ceil to match the
  C++ ceil-alignment ((w + 7) / 8 * 8)
- Fix potential 32-bit overflow in allocation size computation by
  casting all operands to size_t
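The overflow fix depends on where the multiplication happens. In the sketch below, `rgbBytes` is a hypothetical helper. It widens each operand to `size_t` before multiplying, so a 32768×32768×3 buffer (3,221,225,472 bytes) does not overflow a 32-bit `int` partway through, which would be undefined behavior.

```cpp
#include <cassert>
#include <cstddef>

// Buggy form, shown only as a comment: w * h * c is evaluated in int and can
// overflow before the result is widened to size_t.
//   std::size_t bytes = w * h * c;

// Fixed form: widen every operand first, so all arithmetic happens in size_t.
std::size_t rgbBytes(int width, int height, int channels) {
    return static_cast<std::size_t>(width) * static_cast<std::size_t>(height) *
           static_cast<std::size_t>(channels);
}
```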

Made-with: Cursor

* fix(diffusion): format C++ files with clang-format-19

Made-with: Cursor

* perf(diffusion): use stbi_info_from_memory for efficient dimension decoding

- Replace stbi_load_from_memory with stbi_info_from_memory in decodeDimensions()
- Avoids allocating and loading full pixel data when only dimensions are needed
- Only the image header is parsed, so checking dimensions no longer costs a full decode

Made-with: Cursor

* fix(diffusion): format test_img2img.cpp with clang-format-19

Made-with: Cursor

* docs(diffusion): add comprehensive guidance scale reference for img2img

- Document CFG scale vs distilled guidance parameter differences
- Add per-model guidance scale recommendations (SD1/SD2, SDXL, SD3, FLUX.2)
- Explain architectural differences: SD3 uses standard CFG while FLUX.2 uses distilled guidance
- Include img2img-specific guidance behavior and examples for each model
- Clarify why FLUX.2 sets cfg_scale=1.0 and uses guidance instead
- Add quick reference code examples for each model family

Made-with: Cursor

* chore: update vcpkg-registry baseline commit

Made-with: Cursor

* fix(diffusion): pin ggml to port-version 4 for Vulkan LSan leak fix

Revert the registry baseline bump and instead use a vcpkg override to
pull in only the ggml port-version 4 patch (qvac-registry-vcpkg#119),
which fixes LeakSanitizer reports in the Vulkan device cache.

Made-with: Cursor

* fix(diffusion): revert ggml to port-version 3, port-version 4 patch is broken

The ggml-vulkan-device-cache-owned-storage.patch from port-version 4
(qvac-registry-vcpkg#119) fails to apply — the patch context does not
match the ggml source at the pinned commit. Reverting to port-version 3
until the registry patch is fixed.

Made-with: Cursor

* fix(diffusion): add ggml overlay port with corrected Vulkan LSan patch

The ggml port-version 4 patch in qvac-registry-vcpkg#119 uses zero-
context hunks that git-apply cannot locate. Add a local overlay port
with the same fix (unique_ptr ownership for Vulkan device cache) but
with proper unified-diff context lines so the patch applies cleanly.

Made-with: Cursor

* fix(diffusion): use ggml port-version 5 from jpgaribotti fork

Use the corrected ggml overlay port from jpgaribotti/qvac-registry-vcpkg
which bumps to port-version 5 with a properly formatted Vulkan device
cache patch (includes unified-diff context lines).

Made-with: Cursor

* fix(diffusion): point registry to jpgaribotti fork for ggml port-version 5

Switch default-registry to jpgaribotti/qvac-registry-vcpkg which has the
corrected ggml Vulkan device cache patch (port-version 5). Remove the
local overlay port since the fork provides the fix directly.

Made-with: Cursor

* fix: suppress LSAN false positives in diffusion C++ tests

Updates vcpkg registry to tetherto/qvac-registry-vcpkg main
(baseline 8778399) which includes the ggml Vulkan device cache fix.
Also corrects LSAN suppressions file path in CI workflow to resolve
the suppression file within the package workdir.

Made-with: Cursor

* fix: add dbus leak suppressions for test initialization

Made-with: Cursor

* fix: add Windows model download step to cpp-tests workflow

Made-with: Cursor

* fix: reduce SD3 example steps from 100 to 28

SD3 Medium typically needs 20–30 steps; 100 was leftover
from experimentation and makes this example ~5x slower than needed.

Made-with: Cursor

* fix: correct example image paths

- img2img-flux2.js: use assets/von-neumann.jpg (works on fresh checkout)
  instead of temp/von-neumann_transformed.png (doesn't exist)
- img2img-sd3.js: write output to temp/ instead of assets/
  (assets are for checked-in test files, not generated images)

Made-with: Cursor

* fix: ensure temp directory exists in example scripts

Made-with: Cursor

* fix: validate init_image is Uint8Array in img2img mode

Prevents users from accidentally passing string paths (e.g., init_image:
'path/to/file.jpg') which would be misinterpreted as raw bytes and cause
cryptic C++ decoding failures. Now throws a clear error with guidance.

Made-with: Cursor

* fix: guard SdImageBatch against nullptr from generate_image()

generate_image() can return NULL on failure (OOM, abort mid-denoise).
When it does, SdImageBatch was constructed with data_=nullptr but
count_ >= 1, causing the destructor to dereference nullptr and segfault.

Now the destructor, operator[], and release() all check for null before
dereferencing. operator[] throws a descriptive error if called on null.
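A null-safe owner for the returned batch can look like the sketch below. `ImageLike` and `ImageBatch` are hypothetical stand-ins for `sd_image_t` and SdImageBatch. The sketch assumes the C call returns a malloc'd array in which each entry owns a malloc'd pixel buffer.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <stdexcept>

// Hypothetical mirror of a C image struct returned in a malloc'd array.
struct ImageLike { std::uint32_t width, height, channel; std::uint8_t* data; };

// Owns the array returned by a C generate call. data_ may be nullptr even
// when count_ >= 1, because the call can fail after the count is known.
class ImageBatch {
public:
    ImageBatch(ImageLike* data, int count) : data_(data), count_(count) {}
    ~ImageBatch() {
        if (data_ == nullptr) return;  // failed generation: nothing to free
        for (int i = 0; i < count_; ++i) std::free(data_[i].data);
        std::free(data_);
    }
    ImageBatch(const ImageBatch&) = delete;
    ImageBatch& operator=(const ImageBatch&) = delete;

    bool ok() const { return data_ != nullptr; }

    const ImageLike& operator[](int i) const {
        if (data_ == nullptr) throw std::runtime_error("image generation returned no images");
        if (i < 0 || i >= count_) throw std::out_of_range("image index out of range");
        return data_[i];
    }

    // Hands ownership to the caller; yields nullptr if generation failed.
    ImageLike* release() {
        ImageLike* p = data_;
        data_ = nullptr;
        return p;
    }

private:
    ImageLike* data_;
    int count_;
};
```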

Made-with: Cursor

* fix(diffusion): format cpp files with clang-format-19

* fix(readme): clarify config vs parameter serialization

* fix: restore dbus leak suppressions removed by clang-format commit

* fix(diffusion): apply clang-format-19 to test_stb_image_security.cpp

* Update packages/lib-infer-diffusion/addon/src/model-interface/SdModel.cpp

Co-authored-by: gianni-cor <gianfranco.cordella@tether.io>

* Update packages/lib-infer-diffusion/addon/src/model-interface/SdModel.cpp

Co-authored-by: gianni-cor <gianfranco.cordella@tether.io>

* fix(diffusion): format cpp files with clang-format-19

* Revert "fix(diffusion): format cpp files with clang-format-19"

This reverts commit 8082388.

* fix(diffusion): guard FLUX img2img prediction and harden readImageDimensions

- Add JS-side guard in _runInternal() that throws when init_image is
  present on a FLUX model (llmModel set) but prediction is not explicitly
  flux2_flow or flux_flow, preventing silent fallback to SDEdit branch
- Add buffer-length checks to readImageDimensions() for truncated PNG
  (require >= 24 bytes) and JPEG (validate segLen >= 2, guard SOF reads)
- Update prediction docstring in index.d.ts to clarify FLUX img2img
  requires an explicit prediction value
- Add regression tests for all of the above (13 cases)
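The truncation checks map directly onto byte-offset bounds tests. The C++ below is for illustration only; the addon's readImageDimensions() is not shown here and may differ. The function names and the `Dimensions` struct are hypothetical. PNG needs at least 24 bytes because the IHDR width and height occupy offsets 16 to 23. Each JPEG segment length must be at least 2 because it counts its own two length bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Dimensions { std::uint32_t width; std::uint32_t height; };

static std::uint32_t be16(const std::uint8_t* p) { return (std::uint32_t(p[0]) << 8) | p[1]; }
static std::uint32_t be32(const std::uint8_t* p) {
    return (std::uint32_t(p[0]) << 24) | (std::uint32_t(p[1]) << 16) |
           (std::uint32_t(p[2]) << 8) | p[3];
}

// PNG: 8-byte signature, IHDR length and type, then big-endian width/height.
bool pngDimensions(const std::vector<std::uint8_t>& b, Dimensions& out) {
    static const std::uint8_t sig[8] = {0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A};
    if (b.size() < 24) return false;  // truncated header
    for (int i = 0; i < 8; ++i) if (b[i] != sig[i]) return false;
    out = Dimensions{be32(&b[16]), be32(&b[20])};
    return true;
}

// JPEG: walk marker segments until a SOFn header, bounds-checking every read.
bool jpegDimensions(const std::vector<std::uint8_t>& b, Dimensions& out) {
    if (b.size() < 4 || b[0] != 0xFF || b[1] != 0xD8) return false;
    std::size_t i = 2;
    while (i + 4 <= b.size()) {
        if (b[i] != 0xFF) return false;
        const std::uint8_t marker = b[i + 1];
        const std::uint32_t segLen = be16(&b[i + 2]);
        if (segLen < 2) return false;  // length includes its own two bytes
        const bool sof = marker >= 0xC0 && marker <= 0xCF &&
                         marker != 0xC4 && marker != 0xC8 && marker != 0xCC;
        if (sof) {
            if (i + 9 > b.size()) return false;  // truncated SOF header
            out = Dimensions{be16(&b[i + 7]), be16(&b[i + 5])};  // X, then Y
            return true;
        }
        i += 2 + segLen;
    }
    return false;
}
```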

Made-with: Cursor

* fix(diffusion): remove FLUX.1 references from documentation

- Update prediction docstring to focus on FLUX.2 img2img guidance
- Remove FLUX.1 from encoder file name comments (keep only relevant models)
- Update error message to reference FLUX.2 only in user-facing guidance
- Keep flux_flow type in PredictionType union for backward compatibility

Made-with: Cursor

* test(diffusion): add input-validation test to mobile integration suite

Register the new input-validation regression tests in the mobile test runner
so truncated image and FLUX prediction guard tests run on all platforms.

Made-with: Cursor

* chore(diffusion): bump to 0.2.0 and update changelog

- Bump package version from 0.1.3 to 0.2.0 for img2img feature release
- Update CHANGELOG.md with 0.2.0 entry: FLUX.2 img2img, input validation, regression tests
- Remove stale CHANGELOG (keeping CHANGELOG.md as canonical source)

Made-with: Cursor

* fix(diffusion): revert vcpkg registry baseline to main

Restore default-registry baseline to a9eae49a7c95a63 (matches main).
The 87783998cb67fe6 baseline was an unintended change.

Made-with: Cursor

---------

Co-authored-by: gianni-cor <gianfrancocordella@gmail.com>
Co-authored-by: aegioscy <nik@linux64vm.com>
Co-authored-by: Ridwan Taiwo <donriddo@gmail.com>
Co-authored-by: gianni <gianfranco.cordella@tether.io>
Co-authored-by: Juan Pablo Garibotti Arias <juan.arias@bitfinex.com>