QVAC-17810 test[skiplog]: add img2img integration tests for diffusion#2186

Closed
Victor-Rodzko wants to merge 1338 commits into main from test/qvac-17810-img2img-integration-tests

Victor-Rodzko commented May 21, 2026

🎯 What problem does this PR solve?

  • img2img shipped to the SDK in QVAC-17304 (feat[api]: add img2img support to SDK diffusion API, #1662), but tests-qvac only had unit/mock coverage; real integration coverage against loaded diffusion models was missing.
  • tests-qvac/tests/shared/executors/diffusion-executor.ts had drifted: heavy if (testId === ...) branching, unknown/any params, ad-hoc PNG-size byte checks that produce false positives on compressed images.

📝 How does it solve it?

  • New e2e cases in diffusion-tests.ts exercising the img2img path against real loaded models:
    • diffusion-img2img-vs-txt2img-baseline: proves init_image actually changes the output (byte-delta + IHDR-dimension comparison vs. the txt2img baseline).
    • diffusion-img2img-img-cfg-scale: verifies the img_cfg_scale parameter is accepted and rendered.
    • diffusion-img2img-invalid-strength: verifies that Zod rejects an out-of-range strength.
    • Reuses existing diffusion-basic-img2img.
  • Platform split (matches vision-test pattern): asset filename → Uint8Array resolution lives in desktop/executors/diffusion-executor.ts (Node fs); shared executor stays React Native-clean and only sees bytes.
  • Mobile: all diffusion tests skipped (SD 2.1 1B Q8_0 cold-load OOMs Device Farm devices, ~3GB). SkipExecutor message updated; mobile/executors/diffusion-executor.ts removed as dead code.
  • Refactor of shared/executors/diffusion-executor.ts to be a typed reference implementation:
    • Removed execute() override; replaced with a strongly-typed handlers map.
    • Required<{ [K in testId]: HandlerFn<…> }> annotation makes the map exhaustive at compile time — adding a new test without a handler is a TS error.
    • New DiffusionParams interface (no more unknown/any); buildParams/resolveParams typed end-to-end.
    • Consolidated 4 near-duplicate handlers into a single runBasic(resourceKey, …) via bind.
    • Extracted compareWithBaseline helper for img2img-vs-txt2img and fusion-vs-flux2 comparisons.
    • Extracted readPngDims / assertEqualPngDimensions (parse IHDR) so we no longer false-positive on compressed-byte length differences.
  • New minimal asset assets/images/diffusion-img2img-source-256.png (562 B, 256×256 RGB) — keeps SD 2.1 output dimensions matching requested 256×256 and minimizes resource cost.
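A minimal TypeScript sketch of IHDR-based checks in the spirit of readPngDims / assertEqualPngDimensions / compareWithBaseline. The signatures, error messages, and the "differs from baseline" rule used here (dimensions equal, bytes not identical) are assumptions, not the executor's actual code. Reading width and height at fixed offsets 16 and 20 works because the PNG format requires IHDR to be the first chunk after the 8-byte signature, so compressed-data length never enters the comparison.

```typescript
// Sketch only: helper names follow the PR description; signatures are assumed.
const PNG_SIGNATURE = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];

interface PngDims {
  width: number;
  height: number;
}

function readPngDims(bytes: Uint8Array): PngDims {
  // Signature (8) + chunk length (4) + chunk type (4) + width (4) + height (4).
  if (bytes.length < 24) throw new Error("buffer too short to hold a PNG IHDR");
  for (let i = 0; i < PNG_SIGNATURE.length; i++) {
    if (bytes[i] !== PNG_SIGNATURE[i]) throw new Error("not a PNG: bad signature");
  }
  // IHDR must be the first chunk; its type sits at bytes 12..15.
  const chunkType = String.fromCharCode(bytes[12], bytes[13], bytes[14], bytes[15]);
  if (chunkType !== "IHDR") throw new Error(`expected IHDR chunk, found ${chunkType}`);
  // Width and height are unsigned 32-bit big-endian integers.
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  return { width: view.getUint32(16, false), height: view.getUint32(20, false) };
}

function assertEqualPngDimensions(actual: Uint8Array, expected: Uint8Array): void {
  const a = readPngDims(actual);
  const e = readPngDims(expected);
  if (a.width !== e.width || a.height !== e.height) {
    throw new Error(`PNG dimensions differ: ${a.width}x${a.height} vs ${e.width}x${e.height}`);
  }
}

// Hypothetical baseline check: same dimensions, but not byte-identical.
function compareWithBaseline(output: Uint8Array, baseline: Uint8Array): void {
  assertEqualPngDimensions(output, baseline);
  const identical =
    output.length === baseline.length && output.every((byte, i) => byte === baseline[i]);
  if (identical) throw new Error("output is byte-identical to the baseline");
}
```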

🧪 How was it tested?

  • Desktop: npm run install:build:full → full diffusion suite green locally (FLUX 2 Klein).
  • iOS (single dev device): img2img cases green locally (full Device Farm run still skipped at consumer level due to OOM).
  • tsc --noEmit clean. Exhaustiveness check verified by removing a handler entry and confirming TS error: Property '"diffusion-standalone-upscaler-x4"' is missing in type … but required in type 'Required<…>'.
  • CI run: https://github.com/tetherto/qvac/actions/runs/26229867278
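The exhaustiveness check can be illustrated with a hypothetical three-entry map. The ID "diffusion-basic-txt2img", the resource key "sd21-q8", and the synchronous string-returning handlers are simplifications for illustration; the real handlers load models and are presumably async. Deleting any entry from handlers makes tsc report the missing property, as in the error quoted above.

```typescript
// Hypothetical IDs and params for illustration only.
type TestId =
  | "diffusion-basic-txt2img"
  | "diffusion-basic-img2img"
  | "diffusion-standalone-upscaler-x4";

interface DiffusionParams {
  prompt: string;
  steps?: number;
}

type HandlerFn = (params: DiffusionParams) => string;

// Shared body; bind() pre-fills the model resource key per test.
function runBasic(resourceKey: string, params: DiffusionParams): string {
  // Stand-in for "load model, generate, assert on the PNG".
  return `${resourceKey}:${params.prompt}:${params.steps ?? 20}`;
}

function runUpscaler(params: DiffusionParams): string {
  return `upscaler-x4:${params.prompt}`;
}

// The mapped type over TestId forces one handler per ID: deleting an
// entry (or adding a TestId without one) is a compile-time error.
const handlers: Required<{ [K in TestId]: HandlerFn }> = {
  "diffusion-basic-txt2img": runBasic.bind(null, "sd21-q8"),
  "diffusion-basic-img2img": runBasic.bind(null, "sd21-q8"),
  "diffusion-standalone-upscaler-x4": runUpscaler,
};

function execute(testId: TestId, params: DiffusionParams): string {
  return handlers[testId](params);
}
```

Using bind here keeps one implementation for the near-duplicate cases while each map entry still has the exact HandlerFn type.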

NamelsKing and others added 30 commits May 1, 2026 12:04
…dk publish doesn't auto-skip (#1853)

publish-npm needs [build, publish-logic, release-merge-guard]. On a
manual workflow_dispatch from a release-sdk-* branch, the guard's
if: rejected the event (push only), so the guard was skipped, and
GitHub Actions' implicit success() check on needs auto-skipped
publish-npm before its if: with the explicit
needs.release-merge-guard.result == 'skipped' branch could even be
evaluated.

Allow the guard to run on workflow_dispatch too. The guard already
handles workflow_dispatch safely: github.event.before is empty, so
base-sha is empty, so isInitialPush is true and the changelog diff
check is skipped. The branch-name pattern check and the
package.json-version-matches-branch check still run, which is what
we want for a manual release publish.

Net effect: manual publish-sdk dispatches on release branches now
actually reach the publish-npm job instead of silently skipping.
Co-authored-by: GustavoA1604 <54457676+GustavoA1604@users.noreply.github.com>
…branch dispatch (#1856)

Co-authored-by: Ishan Vohra <ishanvohra@Ishans-MacBook-Air.local>
Co-authored-by: GustavoA1604 <54457676+GustavoA1604@users.noreply.github.com>
#1835)

The bare worker leaks indefinitely when started while another SDK process
holds the registry corestore lock. Root cause: `corestoreOpts: { wait: true }`
issues a blocking `flock(LOCK_EX)` on a libuv worker thread that JS cannot
cancel, so when SIGTERM/IPC-disconnect arrives, the in-flight `client.ready()`
never resolves (cleanup early-returns with `registryClient = null`) and
`process.exit()` cannot terminate Bare while the native handle is held.
The OS process wedges forever, breaking the three `no-lingering-bare-*`
e2e tests in mixed-suite runs.

`wait: true` was deliberately added by #1480 (QVAC-12232) to tolerate
transient lock contention during another SDK's startup/shutdown; reverting
to the bare default would re-introduce that bug. Instead, switch to
`wait: false` (tryLock) and provide an equivalent JS-bounded retry budget
in the existing retry loop:

  - 8 attempts, 250 ms base backoff, capped by a 10 s deadline
  - each step is a fresh non-blocking syscall — `EBUSY` surfaces to JS
    immediately, so shutdown remains cancellable at every point
  - exhausted budget propagates the underlying error, hitting the
    existing `closeRegistryClient` early-return on `null` and letting
    `process.exit()` terminate the worker cleanly
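A sketch of the bounded retry budget described above, assuming a doubling backoff (the message gives only the 250 ms base) and an injectable clock and sleep so it can be tested without real timers. withLockRetry and tryOnce are hypothetical names; tryOnce stands for one non-blocking corestore open with wait: false.

```typescript
interface RetryOptions {
  attempts: number;    // 8 in the commit message
  baseDelayMs: number; // 250 in the commit message
  deadlineMs: number;  // 10_000 in the commit message
  sleep?: (ms: number) => Promise<void>;
  now?: () => number;
}

// Hypothetical helper: every attempt is a fresh non-blocking call, so a
// shutdown request can be honoured between attempts.
async function withLockRetry<T>(tryOnce: () => Promise<T>, opts: RetryOptions): Promise<T> {
  const sleep =
    opts.sleep ?? ((ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)));
  const now = opts.now ?? (() => Date.now());
  const deadline = now() + opts.deadlineMs;
  let lastError: unknown;
  for (let attempt = 0; attempt < opts.attempts; attempt++) {
    try {
      return await tryOnce(); // fails fast (e.g. EBUSY) instead of blocking
    } catch (err) {
      lastError = err;
    }
    const delay = opts.baseDelayMs * 2 ** attempt; // assumed doubling backoff
    // Stop when attempts run out or the next wait would cross the deadline.
    if (attempt === opts.attempts - 1 || now() + delay > deadline) break;
    await sleep(delay);
  }
  throw lastError; // exhausted budget propagates the underlying error
}
```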

As defense in depth, arm a 3-second SIGKILL safety net in
`shutdownBareDirectWorker` (unrefed timer) before calling `process.exit`,
so any future blocking-handle bug can't survive shutdown.
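The safety net can be sketched as below. armShutdownSafetyNet is a hypothetical helper, and the scheduler parameter exists only to keep the example testable. In Node the kill callback would be () => process.kill(process.pid, "SIGKILL"); how Bare exposes the equivalent is not shown here.

```typescript
type Scheduler = (fn: () => void, ms: number) => unknown;

// Hypothetical helper: arm a last-resort kill before calling process.exit().
function armShutdownSafetyNet(
  kill: () => void,
  delayMs = 3000,
  schedule: Scheduler = (fn, ms) => setTimeout(fn, ms),
): void {
  const timer = schedule(kill, delayMs) as { unref?: () => void } | undefined;
  // unref() so the timer alone never keeps the process alive; if exit
  // succeeds first, the kill never fires.
  timer?.unref?.();
}
```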

Covered by existing `no-lingering-bare-{sigterm,close,ipc-disconnect}`
e2e tests, which now pass in mixed-suite runs.

Co-authored-by: Dmytro Medvinskyi <functionsilence@gmail.com>
* doc: create Cursor rule for docs website

* docs: add robots.txt to website

* doc: website source - refactor - standardize env vars to standard used in JSON and infra envs like GH Actions

* doc: website source - add autogen sitemap.xml

* doc: website source - add JSON-LD

* doc: frontmatter improvement - add type of page to enrich metadata

* doc: content update - add missing frontmatter field for SEO

* doc: website source - robots.txt - add AI bot rules

* doc: website source - simplify SEO machinery

* doc: website source - robots.txt - add content signals
…otes (#1865)

Tooling (scripts/sdk/generate-changelog-sdk-pod.cjs):
- Backmerge filter: PRs whose subject starts with `Backmerge` or
  `Merge release ...` are skipped during processSDKPRs (same shape as
  the existing [skiplog] filter).
- Companion filter + entry-count strip: new isCompanionEntry,
  stripEntryCount, cleanModelEntries helpers applied to the inline
  [mod] summary in CHANGELOG.md and the body of models.md. Recognises
  *_LEX / *_VOCAB / *_DATA / *_METADATA constant suffixes and any
  line containing the word "companion".
- Indented continuation lines for [mod] PRs: Added/Updated/Removed
  are emitted as indented sub-rows under the bullet (capped at
  MAX_INLINE_MODELS = 5 per section, "(and N more)" for the rest)
  instead of stuffed inline.
- Announcement-post generator: new --generate-announcement-post CLI
  flag (with optional --version) parses CHANGELOG.md via
  parseChangelogMarkdown and emits the Slack template (:qvac: header,
  NPM/GitHub/changelog links, conditional :warning: Breaking Changes,
  per-section bullets with <url> link wrapping and :boom: breaking
  markers, footer). Sections cap at MAX_ANNOUNCEMENT_BULLETS = 10
  with "... And much more, see full list in changelog :memo:" only
  when strictly more than 10.
- New helpers exported: parseChangelogMarkdown, generateAnnouncementPost.
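A TypeScript approximation of the backmerge filter, companion filter, and list caps described above. The real script is CommonJS and its matching rules may differ; in particular, treating "companion" as a case-insensitive whole word and tokenizing constants with /[A-Z0-9_]+/ are assumptions. stripEntryCount is omitted because its input format is not described.

```typescript
// Hypothetical re-implementation for illustration only.
const COMPANION_SUFFIXES = ["_LEX", "_VOCAB", "_DATA", "_METADATA"];
const MAX_INLINE_MODELS = 5;
const MAX_ANNOUNCEMENT_BULLETS = 10;

function isBackmergeSubject(subject: string): boolean {
  return subject.startsWith("Backmerge") || subject.startsWith("Merge release");
}

function isCompanionEntry(line: string): boolean {
  // Constant-style identifiers such as FOO_VOCAB mark companion assets.
  const identifiers = line.match(/[A-Z0-9_]+/g) ?? [];
  if (identifiers.some((id) => COMPANION_SUFFIXES.some((suffix) => id.endsWith(suffix)))) {
    return true;
  }
  return /\bcompanion\b/i.test(line);
}

// Keep the first `max` items and summarize the rest with one trailer line.
function capList(items: string[], max: number, trailer: (hidden: number) => string): string[] {
  return items.length > max ? [...items.slice(0, max), trailer(items.length - max)] : items;
}

const capInlineModels = (models: string[]) =>
  capList(models, MAX_INLINE_MODELS, (n) => `(and ${n} more)`);

// The trailer appears only when there are strictly more than 10 bullets.
const capAnnouncementBullets = (bullets: string[]) =>
  capList(bullets, MAX_ANNOUNCEMENT_BULLETS, () => "... And much more, see full list in changelog :memo:");
```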

Skill (.cursor/skills/sdk-changelog/SKILL.md):
- Step 4 (CHANGELOG_LLM.md) is now mandatory.
- New Step 5: generate announcement-post.txt (mandatory) with the
  gitignore note and template spec.
- NOTICE renumbered to Step 6.
- Documented all new policies (backmerge, companion, entry-count
  strip, indentation, max-bullets cap).
- CLI parameters table refreshed.

.gitignore:
- Added packages/*/changelog/*/announcement-post.txt. The post is a
  Slack copy-paste working artifact, not a release deliverable.

Release notes for 0.10.0:
- New packages/sdk/changelog/0.10.0/ folder with CHANGELOG.md,
  breaking.md, api.md, models.md, CHANGELOG_LLM.md.
- Root aggregate packages/sdk/CHANGELOG.md rebuilt with v0.10.0 at
  top.
- packages/sdk/NOTICE refreshed (191 models, 179 JS deps).
- packages/sdk/package.json bumped 0.9.1 -> 0.10.0.

Backmerge of release-sdk-0.10.0 -> main is a no-op for the release
artifacts (changelog, NOTICE) because they land here directly.
…desktop runner (#1832)

* QVAC-17837 feat[ci]: surface synthetic IndicTrans [GPU] row on every desktop runner

The on-PR Step Summary previously showed [GPU] rows only on the 2 of 6
desktop runners that have a real GGML GPU backend bound today (macOS
Metal, ai-run-windows11-gpu Vulkan). The 4 hosted Linux runners showed
[CPU]-only rows because:
  - bergamot.test.js + pivot-bergamot.test.js gate their GPU probe loop
    on `if (isMobile)` so they never run GPU on desktop, and
  - indictrans.test.js does probe GPU on every platform but
    discoverGpuDevices() returns empty when GGML can't bind a backend
    (loader fix is still pending per QVAC-17640 / QVAC-17880).

This commit adds a synthetic always-running [IndicTrans] [GPU] test that
loads with use_gpu: true and no explicit gpu_device. The existing shared
runSingleTranslation helper records perf regardless of the resolved
backend; resolveExecutionProvider (now lifted into utils.js) tags the
execution_provider as 'cpu (fallback)' when GGML emitted a CPU sentinel
and as the real backend tag (vulkan/metal/opencl/...) when a GPU
resolved. So today the 4 Linux runners show CPU + GPU(cpu (fallback))
rows, and macOS / ai-run-windows11-gpu show CPU + GPU(real) rows.

Once Ian's GPU loader fix lands on a given platform, the same row's EP
auto-flips from 'cpu (fallback)' to the real backend without further CI
wiring — that's the contract QVAC-17837's description asks for.

Other clean-ups in the same file because the audit surfaced them:
  - resolveExecutionProvider now treats 'BLAS' as a CPU sentinel so the
    [CPU] row's EP no longer reports 'blas' on macOS.
  - discoverGpuDevices() now breaks on BLAS (suppresses macOS's three
    spurious [GPU:1 BLAS] / [GPU:2 BLAS] / [GPU:3 BLAS] rows) and
    dedupes by backend name (also fixes mobile Android's 4xVulkan0
    duplicates when that file is next exercised, though mobile is out
    of scope for this PR).
  - The per-device GPU test's t.not(backendName, 'CPU') hard assertion
    is loosened to a t.comment warning so a silent GPU fallback at a
    discovered device index doesn't fail CI on a perf-only test.
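The BLAS break and name-based dedup might look like the sketch below. The probe callback and maxDevices bound are hypothetical stand-ins for the addon's device enumeration, and breaking at the first BLAS entry assumes BLAS devices are listed after the real GPUs (the order dependency the reviewers point out further down).

```typescript
interface GpuDevice {
  index: number;
  backend: string;
}

// Hypothetical: probe(i) returns the backend name at device index i, or
// null once i is past the last device.
function discoverGpuDevices(probe: (index: number) => string | null, maxDevices = 8): GpuDevice[] {
  const seen = new Set<string>();
  const devices: GpuDevice[] = [];
  for (let index = 0; index < maxDevices; index++) {
    const backend = probe(index);
    // Stop at the end of the list, or at BLAS (a CPU accelerator, not a GPU).
    if (backend === null || backend === "BLAS") break;
    // Dedupe by backend name, e.g. four "Vulkan0" entries collapse to one.
    if (seen.has(backend)) continue;
    seen.add(backend);
    devices.push({ index, backend });
  }
  return devices;
}
```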

Bergamot and Pivot stay CPU-only on desktop. Bergamot is intgemm-only
and has no GPU port architecturally, so a synthetic GPU row for those
tests would be perpetual fallback noise. Mobile workflows are
unchanged.

Made-with: Cursor

* QVAC-17837 fix: address parallel-review feedback on synthetic GPU test

Two correctness/consistency follow-ups from the parallel review:

- Wrap the new synthetic [IndicTrans] [GPU] test in `if (!isMobile)`.
  D2 scope explicitly said mobile workflows are untouched, but the
  test had no mobile gate so it would have added a duplicate
  default-device GPU row alongside the existing per-device probe rows
  on Pixel/S25/iPhone. Mobile already has meaningful GPU rows; the
  synthetic row is only needed on the 6 desktop runners that today
  emit zero GPU rows for some/all tests.

- Replace the literal `backendName === 'CPU'` check in the per-device
  GPU test's soft-fallback warning with `CPU_SENTINEL_BACKENDS.has(...)`
  so the warning fires consistently for every backend treated as CPU
  by `resolveExecutionProvider` (including BLAS and Unloaded), not
  just the addon's `CPU` sentinel. Same set, same definition,
  one source of truth.
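One way a shared sentinel set could back both resolveExecutionProvider and the soft warning, as a sketch. The set contents come from the message above; the gpuRequested flag, the tag normalisation (lowercasing and stripping trailing device digits, so "Vulkan0" becomes "vulkan"), and warnOnGpuFallback are assumptions.

```typescript
// Every backend name that should be reported as CPU.
const CPU_SENTINEL_BACKENDS = new Set(["CPU", "BLAS", "Unloaded"]);

function resolveExecutionProvider(backendName: string, gpuRequested: boolean): string {
  if (CPU_SENTINEL_BACKENDS.has(backendName)) {
    // A GPU row that landed on a CPU sentinel is reported as a fallback.
    return gpuRequested ? "cpu (fallback)" : "cpu";
  }
  // Assumed normalisation: "Vulkan0" -> "vulkan", "Metal" -> "metal".
  return backendName.toLowerCase().replace(/\d+$/, "");
}

// Soft check for the per-device GPU test: warn instead of failing CI.
function warnOnGpuFallback(backendName: string, warn: (msg: string) => void): void {
  if (CPU_SENTINEL_BACKENDS.has(backendName)) {
    warn(`GPU test resolved to CPU sentinel backend "${backendName}"`);
  }
}
```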

No behaviour change on desktop; restores intended D2 scope on mobile;
self-consistent fallback definition between the helper and the
warning. Reviewers' other findings (`feat[ci]:` tag style, BLAS-break
order dependency, Bergamot/Pivot still using regex EP fallback) are
documented or pre-existing — not addressed here.

Made-with: Cursor

* QVAC-17837 fix[lint]: re-indent synthetic [GPU] test body inside if (!isMobile) block

Pure whitespace fix — `npm run lint:fix` (standard --fix). Sanity-checks
job in CI run #25166275184 was failing on ESLint indent errors because
the previous commit wrapped the test body in `if (!isMobile) {...}`
without bumping each line's indentation by 2 spaces. `git diff -w` is
empty.

Made-with: Cursor
…flows (#1728)" (#1871)

* Revert "fix: prevent code injection and untrusted checkout in CI workflows (#1728)"

Reverts commit a79602f, with two
intentional exclusions noted below.

Excluded from this revert:
- .github/actions/run-lint-and-unit-tests/action.yaml: kept at its
  current state on main; the env-var indirection #1728 introduced for
  npm-token/pat-token in the .npmrc-configuration step is preserved.
- .github/workflows/cpp-lint.yaml: net effect on this file is zero.
  PR #1829 (commit 65bd746) later rewrote the same `cpp-lint` job and
  added `id-token: write` to the `permissions` block originally
  introduced by #1728. The `permissions` block is preserved as-is
  (contents: read + id-token: write) because #1829's AWS OIDC
  integration depends on it.

All other changes from #1728 are reverted.

Co-authored-by: Cursor <cursoragent@cursor.com>

* Potential fix for pull request finding 'CodeQL / Code injection'

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
…ed addon (#1833)

* feat: add multi-GPU pipeline parallelism via split-mode config

Ports the split-mode/tensor-split feature from the LLM addon to the embed
addon. When split-mode is layer or row and a GPU backend is available, the
--device flag is omitted so llama.cpp distributes embedding model layers
across all available GPUs. Falls back to CPU silently when no GPU is found.

- Add split-mode (none|layer|row) and tensor-split config keys to setupParams,
  accepting both hyphen and underscore variants
- Omit --device in split mode so llama.cpp routes across all GPUs
- Accept main_gpu underscore variant alongside main-gpu in tryMainGpuFromMap
- Add getEffectiveGpuDeviceCount() to BackendSelection for GPU inventory
- Add split-mode and tensor-split to GGMLConfig in index.d.ts
- Bump version 0.14.0 -> 0.15.0
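A TypeScript sketch of the hyphen/underscore aliasing and case-insensitive split-mode parsing. The actual logic is C++ inside setupParams; readAliasedKey is a hypothetical helper, and returning undefined for an absent key (leaving the addon's default in charge) is an assumption.

```typescript
type SplitMode = "none" | "layer" | "row";
const SPLIT_MODES: readonly string[] = ["none", "layer", "row"];

// Read a key spelled with hyphens or underscores; setting both is an error.
function readAliasedKey(config: Record<string, string>, hyphenKey: string): string | undefined {
  const underscoreKey = hyphenKey.replace(/-/g, "_");
  const has = (key: string) => Object.prototype.hasOwnProperty.call(config, key);
  if (has(hyphenKey) && has(underscoreKey)) {
    throw new Error(`set only one of "${hyphenKey}" and "${underscoreKey}"`);
  }
  if (has(hyphenKey)) return config[hyphenKey];
  if (has(underscoreKey)) return config[underscoreKey];
  return undefined;
}

// Case-insensitive parse; undefined means "not configured".
function parseSplitMode(config: Record<string, string>): SplitMode | undefined {
  const raw = readAliasedKey(config, "split-mode");
  if (raw === undefined) return undefined;
  const mode = raw.toLowerCase();
  if (SPLIT_MODES.indexOf(mode) === -1) throw new Error(`invalid split-mode "${raw}"`);
  return mode as SplitMode;
}
```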

* test: add multi-GPU split-mode tests and benchmark example

Ports the test and example surface from the LLM multi-GPU PR to the embed
addon, matching the pattern exactly.

- Add BertModel::getCommonParams() so tests can inspect split_mode after load
- Add 8 BertModelTest split-mode cases: none, layer, row, case-insensitive,
  underscore variant, CPU fallback clears GPU params, invalid value, both
  keys reject
- Add 9 BackendSelectionTest getEffectiveGpuDeviceCount cases covering
  zero, CPU-only, single dGPU, single iGPU, two dGPUs, dGPU+iGPU, two
  dGPUs+iGPU, two iGPUs, and accel/CPU ignored
- Add test/integration/spec-logger.js for native log capture in integration
  tests
- Add test/integration/multi-gpu.test.js: 4 integration tests gated on
  QVAC_HAS_MULTI_GPU=1 (layer, row, default single-device, layer+tensor-split)
- Add examples/multiGpuBenchmark.js: single vs layer vs row throughput
  comparison using the embed model
- Regenerate test/mobile/integration.auto.cjs with runMultiGpuTest entry

* fix: harden CPU fallback and add missing main_gpu alias tests

CPU fallback in setupParams was missing two details present in the final
LLM implementation:

- Set params.main_gpu = -1 on CPU fallback so llama.cpp does not retain
  a stale GPU index.
- Reset the local splitMode variable to LLAMA_SPLIT_MODE_NONE after the
  CPU-fallback warning so the --device gate below emits --device correctly
  instead of silently suppressing it when the requested split mode was
  layer or row.

Also add two missing BackendSelection unit tests for the main_gpu underscore
alias and both-key rejection introduced in tryMainGpuFromMap, mirroring the
coverage in the LLM package.
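The CPU-fallback rule can be summarised as a small decision function. planDevices and passDeviceFlag are hypothetical names; the sketch only encodes the behaviour stated above: no GPU means main_gpu = -1 and split mode none with --device passed, while layer/row with GPUs present omits --device.

```typescript
type SplitMode = "none" | "layer" | "row";

interface DevicePlan {
  splitMode: SplitMode;
  mainGpu: number;
  passDeviceFlag: boolean;
}

// Hypothetical sketch of the decision made in setupParams (which is C++).
function planDevices(requested: SplitMode, gpuCount: number, mainGpu = 0): DevicePlan {
  if (gpuCount === 0) {
    // CPU fallback: drop the stale GPU index and reset the split mode so the
    // --device gate behaves like an ordinary single-device load.
    return { splitMode: "none", mainGpu: -1, passDeviceFlag: true };
  }
  // layer/row with GPUs present: omit --device so llama.cpp can spread the
  // model across every visible GPU.
  return { splitMode: requested, mainGpu, passDeviceFlag: requested === "none" };
}
```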

* fix: wire all integration tests into test:integration runner

test:integration was hardcoded to addon.test.js, so multi-gpu.test.js
and multi-instance.test.js were never executed in desktop CI. Switch to
the same generate-then-run-all pattern used by the LLM addon: brittle -r
generates test/integration/all.js from the full *.test.js glob, then
bare runs it.

* fix: resolve cpp-lint failures in BackendSelection and BertModel

Apply clang-format and clang-tidy fixes flagged by the cpp-lint job:
- Use std::ranges::transform in BackendSelection.cpp and BertModel.cpp
- Drop else-after-return in parseMainGpu
- Rename short iterator names (it -> foundIt/configIt/splitModeIt)
- Use designated initializers for BackendInterface and BertEmbeddings::Layout
- Drop redundant (void) on BackendInterface function pointer
- Move pointer-arithmetic NOLINT to the diagnostic line in batchDecode
- Extract parseSplitMode helper to bring setupParams cognitive complexity
  back under the threshold
- Suppress non-const-global and macro-usage diagnostics in logging.hpp
- Reorder includes in test_bert_model.cpp and collapse getCommonParams
  to a single line for clang-format

---------

Co-authored-by: gianni-cor <gianfranco.cordella@tether.io>
* QVAC-18184 chore[notask|skiplog]: backmerge release sdk 0.9.2

Brings the 0.9.2 release artifacts back into main now that
@qvac/sdk@0.9.2 has been published to npm (`latest` dist-tag,
2026-05-01 10:09 UTC).

- Bump packages/sdk/package.json: 0.9.1 -> 0.9.2
- Add packages/sdk/changelog/0.9.2/CHANGELOG.md and CHANGELOG_LLM.md
- Prepend 0.9.2 entry to aggregated packages/sdk/CHANGELOG.md

Hotfix content (z.xor -> z.union, zod floor bump) is the cherry-pick
of #1790 that already landed on main, so no source changes here.

Dependencies in package.json are intentionally NOT brought over from
the release branch — main has progressed past 0.9.1 on several
internal packages (e.g. @qvac/llm-llamacpp 0.14.4 -> 0.17.1,
@qvac/translation-nmtcpp 0.6.10 -> 2.0.1, react-native-bare-kit
0.11.5 -> 0.12.3) and a blind merge would regress them. Only the
version field is changed, matching the 0.9.1 backmerge precedent (#1726).

* chore[skiplog]: drop package.json version bump from backmerge to avoid conflict with 0.10.0 PR

PR #1865 (the 0.10.0 release) is open against main and bumps
packages/sdk/package.json version 0.9.1 -> 0.10.0. This backmerge
was bumping the same line 0.9.1 -> 0.9.2, so whichever lands second
hits a conflict on that single line.

Since main is moving to 0.10.0 directly (the 0.9.2 hotfix is a
separate release line), drop the package.json change from this
backmerge and let #1865 own the version bump. Main's package.json
will briefly say 0.9.1 while CHANGELOG.md lists 0.9.2 as the latest
shipped version, but that's transient — #1865 overwrites it to
0.10.0 anyway.

Keep the changelog artifacts (changelog/0.9.2/ folder + the
prepended ## [0.9.2] entry in aggregated CHANGELOG.md) so main
retains a record of the 0.9.2 release in its history.

---------

Co-authored-by: Dmytro Medvinskyi <functionsilence@gmail.com>
…x.d.ts (#1613)

* feat[api]: export RuntimeStats interface in NMT addon index.d.ts

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: bump @qvac/translation-nmtcpp to 2.0.2 and update changelog

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* doc: document RuntimeStats units and per-backend fields; fix README ms→s

Address review on PR #1613:
- Add JSDoc to `RuntimeStats` clarifying that `totalTime`/`encodeTime`/
  `decodeTime` are seconds while `TTFT` is milliseconds, and listing
  which fields each backend emits (Bergamot omits `encodeTime`/`TTFT`).
  Note that pivot translations use prefixed keys.
- Fix README quickstart that printed `totalTime` with a `'ms'` label
  even though the C++ producer emits seconds.
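A hypothetical TypeScript rendering of the documented units, with a formatter that labels totalTime in seconds and TTFT in milliseconds. Making every field optional and omitting the prefixed pivot keys are simplifications; the real declaration in index.d.ts may differ.

```typescript
// Field names and units from the commit message; optionality is assumed.
interface RuntimeStats {
  /** Total translation time, in seconds. */
  totalTime?: number;
  /** Encoder time, in seconds (not emitted by Bergamot). */
  encodeTime?: number;
  /** Decoder time, in seconds. */
  decodeTime?: number;
  /** Time to first token, in milliseconds (not emitted by Bergamot). */
  TTFT?: number;
}

function formatStats(stats: RuntimeStats): string {
  const parts: string[] = [];
  if (stats.totalTime !== undefined) parts.push(`total ${stats.totalTime.toFixed(2)} s`);
  if (stats.TTFT !== undefined) parts.push(`TTFT ${stats.TTFT.toFixed(0)} ms`);
  return parts.join(", ");
}
```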

---------

Co-authored-by: Ramaz Tskhadadze <bubu@Ramazs-MacBook-Pro-2.local>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… qvac-internal teams (#1877)

Repoint code ownership from `@tetherto/ai-runtime-merge` and
`@tetherto/ai-runtime-bk` to `@tetherto/qvac-internal-dev`, and add
`qvac-internal-merge` to the approval-check-worker team-lead and
team-member checks while keeping the legacy `ai-runtime-merge*` teams
in place during the transition.
…ow_dispatch (#1839)

* QVAC-18111 infra[notask]: scaffold Benchmark Performance (LLM) workflow_dispatch

GitHub requires a `workflow_dispatch` workflow to exist on the
default branch before it shows up in the Actions tab and becomes
triggerable with `--ref <feature-branch>`. This lands the LLM
benchmark workflow on `main` so the QVAC-17830 perf-metrics feature
branch can be dispatched against it for end-to-end validation.

Changes:
- `benchmark-performance-qvac-lib-infer-llamacpp-llm.yml` (new):
  manual `workflow_dispatch` only — mirrors the structure of the
  existing Parakeet / Whispercpp benchmark workflows. Calls
  `prebuilds-...yml` then `integration-test-...yml` with
  bench-mode iteration counts (`QVAC_PERF_RUNS=3`,
  `QVAC_PERF_WARMUP_RUNS=1` by default), then aggregates desktop
  artifacts into a combined HTML / step-summary. Phase-1 scope is
  desktop only — mobile (Device Farm) needs a build-time hook in
  the test app to thread env vars through to bare and is tracked
  as a QVAC-18111 follow-up.
- `integration-test-qvac-lib-infer-llamacpp-llm.yml`: thread
  `qvac_perf_runs` / `qvac_perf_warmup_runs` through `workflow_call`
  + `workflow_dispatch` and surface them as `QVAC_PERF_RUNS` /
  `QVAC_PERF_WARMUP_RUNS` on the Linux/macOS and Windows test run
  steps. Empty string => unset, so the umbrella PR workflow
  continues to honour the test-side default and PR runs are
  unaffected by this change.
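On the test side, "empty string => unset" can be handled by a reader like the one below. readPerfCount, its integer validation, and the defaults used in the example are assumptions; the variable names QVAC_PERF_RUNS and QVAC_PERF_WARMUP_RUNS come from the workflow.

```typescript
// Hypothetical reader: empty or missing values fall back to the test default.
function readPerfCount(
  env: Record<string, string | undefined>,
  name: string,
  fallback: number,
): number {
  const raw = env[name];
  if (raw === undefined || raw.trim() === "") return fallback;
  const n = Number(raw);
  if (!Number.isInteger(n) || n < 0) {
    throw new Error(`${name} must be a non-negative integer, got "${raw}"`);
  }
  return n;
}
```

In Node the call would be readPerfCount(process.env, "QVAC_PERF_RUNS", 1); the environment object is passed in so the sketch does not depend on a particular runtime.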

Per the perf policy agreed on Slack (2026-04-30): the umbrella
on-pr workflow runs perf tests at the cheap default so we don't pay
full perf cost on every PR; this dedicated workflow is the only
place we crank up the iteration counts to produce mean ± std
numbers.

Made-with: Cursor

* QVAC-18111 chore[notask]: trim chatty inline comments in benchmark workflow

Made-with: Cursor

* QVAC-18111 chore[notask]: add run_desktop toggle to benchmark workflow_dispatch

Made-with: Cursor

---------

Co-authored-by: olyasir <sirkinolya@gmail.com>
* chore(onnx-tts): bump addon-cpp to 1.1.6

Update qvac-lib-inference-addon-cpp version constraint in vcpkg.json
from 1.1.5#1 to 1.1.6 and add a corresponding CHANGELOG entry under
the existing [Unreleased] section.

Made-with: Cursor

* chore(tts): bump version to 0.8.6

Bump @qvac/tts-onnx from 0.8.5 to 0.8.6 and convert the [Unreleased]
CHANGELOG section to [0.8.6] for the addon-cpp 1.1.6 release alongside
the queued Chatterbox engine and tensor-helper changes.

Made-with: Cursor

---------

Co-authored-by: Mariusz Reichert <reichert.programming@gmail.com>
Co-authored-by: GustavoA1604 <54457676+GustavoA1604@users.noreply.github.com>
* chore(whispercpp): bump addon-cpp to 1.1.6

Update qvac-lib-inference-addon-cpp version constraint in vcpkg.json
from 1.1.5#1 to 1.1.6 and add a corresponding CHANGELOG entry.

Made-with: Cursor

* chore(whispercpp): bump version to 0.6.7

Bump @qvac/transcription-whispercpp from 0.6.6 to 0.6.7 and convert the
[Unreleased] CHANGELOG section to [0.6.7] for the addon-cpp 1.1.6 release.

Made-with: Cursor

---------

Co-authored-by: Mariusz Reichert <reichert.programming@gmail.com>
…angelog (#1867)

Brings the @qvac/cli@0.3.0 release artifacts back onto main per
gitflow.md "Keep main aligned". Same shape as #1766 (the 0.2.4
backmerge precedent).

- packages/cli/package.json: version 0.2.4 -> 0.3.0
- packages/cli/changelog/0.3.0/CHANGELOG.md: new
- packages/cli/changelog/0.3.0/api.md: new
- packages/cli/CHANGELOG.md: prepend ## [0.3.0] entry

NOTE: Opened as DRAFT because the companion release PR #1836 is also
still draft and 5 of its CI checks are failing. @qvac/cli@0.3.0 has
not yet been published to npm (latest is 0.2.4). Mark this PR ready
for review only after #1836 merges into release-cli-0.3.0 and the
GPR/npm publish completes.

The source-level changes (@qvac/sdk devDep ^0.10.0 + sdk.ts
MIN_SDK_VERSION='0.10.0') are already on main from PR #1810 — only
the release metadata needs to come back.

CLI's package.json on main has no dependency drift versus
release-cli-0.3.0, so unlike the SDK 0.9.2 backmerge (#1857) the
package.json version bump can be safely included here. There's also
no competing CLI release PR in flight on main.

Co-authored-by: Dmytro Medvinskyi <functionsilence@gmail.com>
…a pushFile (#1840)

* QVAC-18111 infra[notask]: scaffold Benchmark Performance (LLM) workflow_dispatch

GitHub requires a `workflow_dispatch` workflow to exist on the
default branch before it shows up in the Actions tab and becomes
triggerable with `--ref <feature-branch>`. This lands the LLM
benchmark workflow on `main` so the QVAC-17830 perf-metrics feature
branch can be dispatched against it for end-to-end validation.

Changes:
- `benchmark-performance-qvac-lib-infer-llamacpp-llm.yml` (new):
  manual `workflow_dispatch` only — mirrors the structure of the
  existing Parakeet / Whispercpp benchmark workflows. Calls
  `prebuilds-...yml` then `integration-test-...yml` with
  bench-mode iteration counts (`QVAC_PERF_RUNS=3`,
  `QVAC_PERF_WARMUP_RUNS=1` by default), then aggregates desktop
  artifacts into a combined HTML / step-summary. Phase-1 scope is
  desktop only — mobile (Device Farm) needs a build-time hook in
  the test app to thread env vars through to bare and is tracked
  as a QVAC-18111 follow-up.
- `integration-test-qvac-lib-infer-llamacpp-llm.yml`: thread
  `qvac_perf_runs` / `qvac_perf_warmup_runs` through `workflow_call`
  + `workflow_dispatch` and surface them as `QVAC_PERF_RUNS` /
  `QVAC_PERF_WARMUP_RUNS` on the Linux/macOS and Windows test run
  steps. Empty string => unset, so the umbrella PR workflow
  continues to honour the test-side default and PR runs are
  unaffected by this change.

Per the perf policy agreed on Slack (2026-04-30): the umbrella
on-pr workflow runs perf tests at the cheap default so we don't pay
full perf cost on every PR; this dedicated workflow is the only
place we crank up the iteration counts to produce mean ± std
numbers.

Made-with: Cursor

* QVAC-18111 chore[notask]: trim chatty inline comments in benchmark workflow

Made-with: Cursor

* QVAC-18111 chore[notask]: add run_desktop toggle to benchmark workflow_dispatch

Made-with: Cursor

* QVAC-18111 infra[notask]: bridge QVAC_PERF_RUNS to mobile test app via pushFile

Extends the mobile integration workflow with the same iteration-count
inputs as the desktop reusable workflow, and adds a `mobile-benchmarks`
job to the LLM benchmark dispatch so it covers Device Farm too.

The bare runtime on Device Farm doesn't see GitHub Actions env vars,
so we mirror the existing `testFilter.txt` pattern: when the workflow
inputs are non-empty, the WDIO before-hook pushes a `qvacPerfConfig.txt`
to the device (Android: `/data/local/tmp/`, iOS:
`@bundleId:documents/`) with the iteration overrides as KEY=VALUE
lines. The file-reading side on bare lives on the QVAC-17830 perf
branch — without that branch this PR is a no-op (orphan file), so it
is safe to land independently.

Changes:
- `integration-mobile-test-qvac-lib-infer-llamacpp-llm.yml`: add
  `qvac_perf_runs` / `qvac_perf_warmup_runs` to `workflow_call` and
  `workflow_dispatch`; add `__QVAC_PERF_RUNS__` /
  `__QVAC_PERF_WARMUP_RUNS__` placeholders to the Android + iOS
  WDIO config blobs and the corresponding pushFile block in the
  `before` hook; substitute the placeholders in `make_split`.
- `benchmark-performance-qvac-lib-infer-llamacpp-llm.yml`: add a
  `mobile-benchmarks` job calling the mobile workflow with the
  bench-mode iteration counts; have `summarize` `needs:` it; drop
  the "desktop only" caveat in the step-summary blurb.

PR runs are unchanged: empty input ⇒ empty placeholder ⇒ before-hook
skips the perf-config push.
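On the device, the pushed qvacPerfConfig.txt could be read with a parser like this sketch. Its rules (trimming, skipping blank, # comment, and malformed lines, ignoring empty values so the test default applies) are assumptions; the reader that actually ships lives on the QVAC-17830 branch.

```typescript
// Hypothetical KEY=VALUE parser for the pushed perf-config file.
function parsePerfConfig(text: string): Record<string, string> {
  const out: Record<string, string> = {};
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#")) continue;
    const eq = line.indexOf("=");
    if (eq <= 0) continue; // no key before "=": treat as malformed
    const key = line.slice(0, eq).trim();
    const value = line.slice(eq + 1).trim();
    if (value !== "") out[key] = value; // empty value keeps the test default
  }
  return out;
}
```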

Made-with: Cursor

* QVAC-18111 chore[notask]: add run_mobile toggle to benchmark workflow_dispatch

Made-with: Cursor

---------

Co-authored-by: Proletter <40578159+Proletter@users.noreply.github.com>
* QVAC-18064 feat: optimize nmtcpp for Android GPU inference

- Optimize nmtcpp for Android GPU inference with Vulkan backend support
- Move beam search KV cache pool to CPU backend
- Propagate config params after GGML context load and fix multi-GPU handling
- Disable OpenCL until upstream qvac-fabric is updated
- Prevent backend device accumulation and skip OpenCL comparison test
- Fix clang-format for ggml_backend_load_all_from_path call
- Remove Android debug logging added for Adreno 830 crash investigation
- Resolve cpp-lint clang-tidy naming and implicit-bool errors
- Address code review findings

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* Port LlamacppUtils.hpp helpers to common_init_result_ptr API.

Signed-off-by: Marcus Edel <marcus.edel@collabora.com>

* Update vcpkg.json

---------

Signed-off-by: Marcus Edel <marcus.edel@collabora.com>
Co-authored-by: gianni-cor <gianfranco.cordella@tether.io>
* QVAC-17989 Add post-generation ESRGAN upscale

* QVAC-17989 Add ESRGAN JS test and example

* QVAC-17989 Fix upscaled output stats

* Update CHANGELOG.md

* Update package.json

* QVAC-17989 Format ESRGAN handler changes

---------

Co-authored-by: gianni-cor <gianfranco.cordella@tether.io>
* Create new buckets to run tests in independent processes.

Signed-off-by: Marcus Edel <marcus.edel@collabora.com>

* ci(ios): include all run ARNs in results aggregation and log download

The two RUN_ARNS aggregation loops were hardcoded to iterate over indices
2..8, so the new Heavy7/Heavy8 runs (RUN_ARN_9, RUN_ARN_10) were silently
dropped from the final test-results summary and the Device Farm log
download. As a result, Heavy7/Heavy8 failures would not have failed the
workflow and their device logs would not have been collected.

Iterate up to RUN_COUNT instead, so any future bucket additions are
picked up automatically.

Co-authored-by: Cursor <cursoragent@cursor.com>

---------

Signed-off-by: Marcus Edel <marcus.edel@collabora.com>
Co-authored-by: gianni-cor <gianfranco.cordella@tether.io>
Co-authored-by: Cursor <cursoragent@cursor.com>
)

The "Create and Upload Test Spec" step's run: | block in
integration-mobile-test-qvac-lib-infer-llamacpp-llm.yml grew to
21,074 chars after #1889, putting it just over GitHub Actions'
21,000-char limit on a single template expression.

This breaks every reusable-workflow_call into the file, so the
On PR Trigger (LLM) workflow fails instantly with:

  error parsing called workflow ... :
  (Line: 914, Col: 14): Exceeded max expression length 21000

and no jobs run. Every open PR that touches the LLM package is
currently blocked from getting LLM CI.

Fix: remove 32 in-block comment lines that were pure narration of
already-readable code (echo/printf/sed) and verbose intent text
duplicated by the surrounding context. Brings the run-block payload
to ~19,008 chars (well under 21,000) without changing any executed
logic.

Diff is comments-only: 32 deletions, 0 additions.
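A guard against this regression could look like the following. It is a hypothetical pre-merge check, not something in the repo. It measures a run-block payload against the 21,000-character limit quoted above and shows why deleting comment-only lines brings the block back under it. Raw string length only approximates what GitHub counts. Lines beginning with `#` inside a heredoc are data, not comments, so a real check would need to exclude them.

```typescript
// Hypothetical length guard for a workflow run block (illustrative only).
const MAX_EXPRESSION_LENGTH = 21000; // limit reported in the error message

/**
 * Drop lines that contain only a shell comment. Caveat: this also removes
 * "#"-prefixed lines inside heredocs, which are data rather than comments.
 */
function stripCommentOnlyLines(block: string): string {
  return block
    .split("\n")
    .filter((line) => !line.trimStart().startsWith("#"))
    .join("\n");
}

/** True if the payload fits under the limit (approximated by string length). */
function fitsExpressionLimit(block: string, limit = MAX_EXPRESSION_LENGTH): boolean {
  return block.length <= limit;
}
```

A CI step could run `fitsExpressionLimit` on each `run: |` payload and fail early with a readable message, instead of letting the caller workflow fail at parse time with no jobs run.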

Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Proletter <40578159+Proletter@users.noreply.github.com>
…ter support to diffusion API (#1838)

* feat[api]: add FLUX.2 multi-reference fusion and LoRA adapter support to diffusion API

* doc[skiplog]: trim verbose lora docs and prune zod-builtin tests

Address PR review:
- shorten lora_apply_mode description in sdcppConfigSchema and drop the
  external file references the user can't see at usage time
- shorten the LoRA JSDoc block in diffusion.ts to the essentials
- drop unit tests that effectively re-assert zod built-ins (z.boolean(),
  z.string().min(1), individual enum members); keep the
  ABSOLUTE_PATH_PATTERN matrix, the mutual-exclusion refine, and one
  happy-path per new field

Made-with: Cursor

* test[api]: validate FLUX.2 fusion diverges from txt2img baseline and reject conflicting init_image inputs
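The rules these commits describe (an absolute-path check and a mutual-exclusion refine between `init_image` and the FLUX.2 reference inputs) can be sketched in plain TypeScript. The SDK itself uses Zod. The field names `ref_images` and `lora` and the `ABSOLUTE_PATH_PATTERN` regex here are illustrative assumptions, not the SDK's actual schema.

```typescript
// Hypothetical plain-TypeScript version of the validation rules (the SDK uses Zod).
// Assumed: POSIX absolute paths or Windows drive-letter paths count as absolute.
const ABSOLUTE_PATH_PATTERN = /^(\/|[A-Za-z]:[\\/])/;

interface DiffusionParams {
  prompt: string;
  init_image?: Uint8Array;   // img2img source image
  ref_images?: Uint8Array[]; // hypothetical name for FLUX.2 multi-reference input
  lora?: string;             // hypothetical name for a LoRA adapter path
}

type ValidationResult = { ok: true } | { ok: false; errors: string[] };

function validateDiffusionParams(p: DiffusionParams): ValidationResult {
  const errors: string[] = [];
  if (p.prompt.length === 0) errors.push("prompt must be non-empty");
  if (p.lora !== undefined && !ABSOLUTE_PATH_PATTERN.test(p.lora)) {
    errors.push("lora must be an absolute path");
  }
  // Mutual exclusion across two fields, the role a Zod .refine() plays.
  const hasRefs = (p.ref_images?.length ?? 0) > 0;
  if (p.init_image !== undefined && hasRefs) {
    errors.push("init_image cannot be combined with ref_images");
  }
  return errors.length === 0 ? { ok: true } : { ok: false, errors };
}
```

Running the path rule as a table of inputs (relative, POSIX, Windows drive letter) mirrors the "ABSOLUTE_PATH_PATTERN matrix" tests kept in review. Built-in checks like non-empty strings are the part the review chose not to re-test.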
New composite action that installs LLVM/Clang to a pinned version on
Linux and Windows runners and exposes the unversioned binaries on PATH.
Intended to become the single source of truth for the LLVM major used
across every prebuild / cpp-test / coverage / benchmark workflow in the
monorepo: bumping `version` (Linux apt major) and `windows-version`
(chocolatey full pin) defaults rolls the whole repo forward in one
place.

- Linux: install via apt.llvm.org `llvm.sh <version> all`, then prepend
  `/usr/lib/llvm-<version>/bin` to `$GITHUB_PATH` so unversioned
  `clang`, `clang++`, `clang-format`, `clang-tidy`, `git-clang-format`,
  `lld`, `llvm-cov`, `llvm-profdata`, ... resolve to the chosen major.
- Windows: `choco upgrade llvm --version=<windows-version> -y
  --allow-downgrade` (defaults to a specific patch release to avoid
  silent drift when chocolatey ships a new one) and add
  `C:\Program Files\LLVM\bin` to `$GITHUB_PATH`.
- macOS: no-op (Apple Clang is set up via setup-apple-clang).

Defaults: version=22, windows-version=22.1.0.
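The per-OS behavior above can be modeled as a pure function so the plan is easy to inspect. This is a hypothetical sketch, not the composite action's code. The commands and paths mirror the description. The type and function names are illustrative, and the script download and any `sudo` on Linux are omitted.

```typescript
// Hypothetical model of the setup-llvm action's per-OS plan (illustrative names).

type RunnerOs = "Linux" | "Windows" | "macOS";

interface LlvmSetupPlan {
  commands: string[];       // commands the action would run
  pathEntry: string | null; // directory appended to $GITHUB_PATH, if any
}

function planLlvmSetup(
  os: RunnerOs,
  version = "22",           // Linux apt major (default from the description)
  windowsVersion = "22.1.0", // chocolatey full pin (default from the description)
): LlvmSetupPlan {
  if (os === "Linux") {
    // Unversioned clang, clang-tidy, llvm-cov, ... resolve via this bin dir.
    return {
      commands: [`llvm.sh ${version} all`],
      pathEntry: `/usr/lib/llvm-${version}/bin`,
    };
  }
  if (os === "Windows") {
    // Full patch pin plus --allow-downgrade avoids silent drift.
    return {
      commands: [`choco upgrade llvm --version=${windowsVersion} -y --allow-downgrade`],
      pathEntry: "C:\\Program Files\\LLVM\\bin",
    };
  }
  // macOS: no-op, Apple Clang comes from setup-apple-clang.
  return { commands: [], pathEntry: null };
}
```

Because both defaults live in one place, `planLlvmSetup("Linux", "23")` shows how bumping `version` would move every consumer to a new major at once.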
@Victor-Rodzko Victor-Rodzko added test-e2e-smoke Triggers smoke e2e test suite [Currently SDK-only] and removed test-e2e-smoke Triggers smoke e2e test suite [Currently SDK-only] labels May 22, 2026
@github-actions

github-actions Bot commented May 22, 2026

QVAC E2E — ios — ✅ all tests passed (82/91, 990s)

Config: suite=smoke · filter=(none) · exclude=(none)
View run · Artifacts: reports · Device Farm logs

@github-actions

github-actions Bot commented May 22, 2026

QVAC E2E — windows — ✅ all tests passed (91/91, 408s)

Config: suite=smoke · filter=(none) · exclude=(none)
View run · Artifacts: reports

@github-actions

github-actions Bot commented May 22, 2026

QVAC E2E — linux — ✅ all tests passed (91/91, 244s)

Config: suite=smoke · filter=(none) · exclude=(none)
View run · Artifacts: reports

@github-actions

github-actions Bot commented May 22, 2026

QVAC E2E — android — ✅ all tests passed (83/91, 2339s)

Config: suite=smoke · filter=(none) · exclude=(none)
View run · Artifacts: reports · Device Farm logs

@github-actions

github-actions Bot commented May 22, 2026

QVAC E2E — macos⚠️ no results

Config: suite=smoke · filter=(none) · exclude=(none)
View run

The test job did not produce a results artifact. Check the run for job-level failures.

Labels

- e2e-tested: Test suite has run on this PR. Does not indicate tests pass/fail; see results in comments.
- test-e2e-smoke: Triggers smoke e2e test suite [Currently SDK-only]
- tier1
- verified: Authorize secrets / label-gate in PR workflows
- verify
