QVAC-17810 test[skiplog]: add img2img integration tests for diffusion by Victor-Rodzko · Pull Request #2186 · tetherto/qvac

Victor-Rodzko · 2026-05-21T13:18:56Z

🎯 What problem does this PR solve?

img2img was shipped to the SDK in QVAC-17304 feat[api]: add img2img support to SDK diffusion API #1662 but tests-qvac only had unit/mock coverage; real integration coverage against loaded diffusion models was missing.
tests-qvac/tests/shared/executors/diffusion-executor.ts had drifted: heavy if (testId === ...) branching, unknown/any params, ad-hoc PNG-size byte checks that produce false positives on compressed images.

📝 How does it solve it?

- New e2e cases in diffusion-tests.ts exercising the img2img path against real loaded models:
  - diffusion-img2img-vs-txt2img-baseline: proves init_image actually changes output (byte-delta + IHDR-dimension comparison vs txt2img baseline).
  - diffusion-img2img-img-cfg-scale: img_cfg_scale parameter accepted/rendered.
  - diffusion-img2img-invalid-strength: Zod rejects out-of-range strength.
  - Reuses existing diffusion-basic-img2img.
- Platform split (matches vision-test pattern): asset filename → Uint8Array resolution lives in desktop/executors/diffusion-executor.ts (Node fs); shared executor stays React Native-clean and only sees bytes.
- Mobile: all diffusion tests skipped (SD 2.1 1B Q8_0 cold-load OOMs Device Farm devices, ~3GB). SkipExecutor message updated; mobile/executors/diffusion-executor.ts removed as dead code.
- Refactor of shared/executors/diffusion-executor.ts to be a typed reference implementation:
  - Removed execute() override; replaced with a strongly-typed handlers map.
  - Required<{ [K in testId]: HandlerFn<…> }> annotation makes the map exhaustive at compile time: adding a new test without a handler is a TS error.
  - New DiffusionParams interface (no more unknown/any); buildParams/resolveParams typed end-to-end.
  - Consolidated 4 near-duplicate handlers into a single runBasic(resourceKey, …) via bind.
  - Extracted compareWithBaseline helper for img2img-vs-txt2img and fusion-vs-flux2 comparisons.
  - Extracted readPngDims / assertEqualPngDimensions (parse IHDR) so we no longer false-positive on compressed-byte length differences.
- New minimal asset assets/images/diffusion-img2img-source-256.png (562 B, 256×256 RGB): keeps SD 2.1 output dimensions matching requested 256×256 and minimizes resource cost.
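
For reviewers unfamiliar with the IHDR approach, here is a minimal sketch of how dimensions can be read from a PNG header instead of comparing byte lengths. It is not the code from this PR: the function bodies, the `PngDims` type, and the `fakePngHeader` demo helper are assumptions for illustration. Only the names `readPngDims` and `assertEqualPngDimensions` come from the description above. The byte offsets follow the PNG file layout (8-byte signature, 4-byte chunk length, 4-byte chunk type, then big-endian width and height).

```typescript
interface PngDims {
  width: number;
  height: number;
}

// Illustrative stand-in for the PR's readPngDims (the real body may differ).
function readPngDims(bytes: Uint8Array): PngDims {
  if (bytes.byteLength < 24) {
    throw new Error("buffer too short to contain a PNG IHDR chunk");
  }
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  // Bytes 0..7: fixed PNG signature 89 50 4E 47 0D 0A 1A 0A.
  if (view.getUint32(0, false) !== 0x89504e47 || view.getUint32(4, false) !== 0x0d0a1a0a) {
    throw new Error("not a PNG: bad signature");
  }
  // Bytes 12..15: type of the first chunk, which must be "IHDR".
  if (view.getUint32(12, false) !== 0x49484452) {
    throw new Error("not a PNG: first chunk is not IHDR");
  }
  // Bytes 16..19 and 20..23: width and height, big-endian.
  return { width: view.getUint32(16, false), height: view.getUint32(20, false) };
}

// Illustrative stand-in for assertEqualPngDimensions.
function assertEqualPngDimensions(a: Uint8Array, b: Uint8Array): void {
  const da = readPngDims(a);
  const db = readPngDims(b);
  if (da.width !== db.width || da.height !== db.height) {
    throw new Error(
      `PNG dimensions differ: ${da.width}x${da.height} vs ${db.width}x${db.height}`
    );
  }
}

// Demo helper (hypothetical): builds only the first 24 bytes of a PNG,
// which is enough for readPngDims but is not a decodable image.
function fakePngHeader(width: number, height: number): Uint8Array {
  const bytes = new Uint8Array(24);
  const view = new DataView(bytes.buffer);
  view.setUint32(0, 0x89504e47, false);
  view.setUint32(4, 0x0d0a1a0a, false);
  view.setUint32(8, 13, false); // IHDR payload length
  view.setUint32(12, 0x49484452, false); // "IHDR"
  view.setUint32(16, width, false);
  view.setUint32(20, height, false);
  return bytes;
}
```

Two images with identical dimensions can compress to very different byte lengths, so comparing header fields avoids the false positives mentioned above while still catching a wrong output size.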

🧪 How was it tested?

Desktop: npm run install:build:full → full diffusion suite green locally (FLUX 2 Klein).
iOS (single dev device): img2img cases green locally (full Device Farm run still skipped at consumer level due to OOM).
tsc --noEmit clean. Exhaustiveness check verified by removing a handler entry and confirming TS error: Property '"diffusion-standalone-upscaler-x4"' is missing in type … but required in type 'Required<…>'.
CI run: https://github.com/tetherto/qvac/actions/runs/26229867278
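
The exhaustiveness check described above can be illustrated with a reduced sketch. The test IDs are taken from this PR, but `DiffusionParams`, `TestResult`, `HandlerFn`, `validateStrength`, and the handler bodies are simplified assumptions, not the executor's real types. The 0..1 strength range in particular is assumed for the demo.

```typescript
// Two of the PR's test IDs; the real union is larger.
const TEST_IDS = [
  "diffusion-basic-img2img",
  "diffusion-img2img-invalid-strength",
] as const;
type TestId = (typeof TEST_IDS)[number];

// Simplified, hypothetical shapes.
interface DiffusionParams {
  prompt: string;
  strength?: number;
}
interface TestResult {
  passed: boolean;
  output: string;
}
type HandlerFn = (params: DiffusionParams) => TestResult;

// Assumed validation rule for illustration only.
function validateStrength(strength: number | undefined): void {
  if (strength !== undefined && (strength < 0 || strength > 1)) {
    throw new RangeError(`strength out of range: ${strength}`);
  }
}

// Every TestId must appear as a key. Deleting an entry makes tsc report
// a "Property ... is missing in type ..." error on this declaration.
const handlers: Required<{ [K in TestId]: HandlerFn }> = {
  "diffusion-basic-img2img": (params) => ({
    passed: true,
    output: `rendered: ${params.prompt}`,
  }),
  "diffusion-img2img-invalid-strength": (params) => {
    try {
      validateStrength(params.strength);
      return { passed: false, output: "validation unexpectedly accepted strength" };
    } catch (err) {
      return { passed: err instanceof RangeError, output: "rejected" };
    }
  },
};

// Dispatch needs no if/else chain and no runtime "unknown test" branch.
function runTest(testId: TestId, params: DiffusionParams): TestResult {
  return handlers[testId](params);
}
```

Because the lookup key is typed as `TestId`, a typo in a test ID at a call site is also a compile error rather than a silent fall-through.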

…dk publish doesn't auto-skip (#1853) publish-npm needs [build, publish-logic, release-merge-guard]. On a manual workflow_dispatch from a release-sdk-* branch, the guard's if: rejected the event (push only), so the guard was skipped, and GitHub Actions' implicit success() check on needs auto-skipped publish-npm before its if: with the explicit needs.release-merge-guard.result == 'skipped' branch could even be evaluated. Allow the guard to run on workflow_dispatch too. The guard already handles workflow_dispatch safely: github.event.before is empty, so base-sha is empty, so isInitialPush is true and the changelog diff check is skipped. The branch-name pattern check and the package.json-version-matches-branch check still run, which is what we want for a manual release publish. Net effect: manual publish-sdk dispatches on release branches now actually reach the publish-npm job instead of silently skipping.

Co-authored-by: GustavoA1604 <54457676+GustavoA1604@users.noreply.github.com>

…branch dispatch (#1856) Co-authored-by: Ishan Vohra <ishanvohra@Ishans-MacBook-Air.local>

Co-authored-by: GustavoA1604 <54457676+GustavoA1604@users.noreply.github.com>

#1835) The bare worker leaks indefinitely when started while another SDK process holds the registry corestore lock.

Root cause: `corestoreOpts: { wait: true }` issues a blocking `flock(LOCK_EX)` on a libuv worker thread that JS cannot cancel, so when SIGTERM/IPC-disconnect arrives, the in-flight `client.ready()` never resolves (cleanup early-returns with `registryClient = null`) and `process.exit()` cannot terminate Bare while the native handle is held. The OS process wedges forever, breaking the three `no-lingering-bare-*` e2e tests in mixed-suite runs.

`wait: true` was deliberately added by #1480 (QVAC-12232) to tolerate transient lock contention during another SDK's startup/shutdown; reverting to the bare default would re-introduce that bug. Instead, switch to `wait: false` (tryLock) and provide an equivalent JS-bounded retry budget in the existing retry loop:
- 8 attempts, 250 ms base backoff, capped by a 10 s deadline
- each step is a fresh non-blocking syscall: `EBUSY` surfaces to JS immediately, so shutdown remains cancellable at every point
- exhausted budget propagates the underlying error, hitting the existing `closeRegistryClient` early-return on `null` and letting `process.exit()` terminate the worker cleanly

As defense in depth, arm a 3-second SIGKILL safety net in `shutdownBareDirectWorker` (unrefed timer) before calling `process.exit`, so any future blocking-handle bug can't survive shutdown.

Covered by existing `no-lingering-bare-{sigterm,close,ipc-disconnect}` e2e tests, which now pass in mixed-suite runs.

Co-authored-by: Dmytro Medvinskyi <functionsilence@gmail.com>
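
The retry budget described in this commit can be sketched roughly as follows. This is not the worker's actual loop: `retryWithBudget`, `RetryBudget`, `Clock`, and the `isCancelled` hook are hypothetical names, the sketch retries on any error rather than only `EBUSY`, and exponential doubling of the delay is an assumption (the commit states the base delay and the deadline but not the growth rule). Only the three numbers come from the commit message.

```typescript
interface RetryBudget {
  maxAttempts: number;
  baseDelayMs: number;
  deadlineMs: number;
}

// Values quoted in the commit message.
const REGISTRY_LOCK_BUDGET: RetryBudget = {
  maxAttempts: 8,
  baseDelayMs: 250,
  deadlineMs: 10000,
};

// Injectable clock so the loop can be exercised without real waiting.
interface Clock {
  now(): number;
  sleep(ms: number): Promise<void>;
}

const realClock: Clock = {
  now: () => Date.now(),
  sleep: (ms) => new Promise<void>((resolve) => setTimeout(resolve, ms)),
};

async function retryWithBudget<T>(
  tryOnce: () => Promise<T>, // must be non-blocking, e.g. a tryLock
  budget: RetryBudget,
  isCancelled: () => boolean = () => false,
  clock: Clock = realClock
): Promise<T> {
  const deadline = clock.now() + budget.deadlineMs;
  let lastError: unknown = new Error("retry budget exhausted before first attempt");
  for (let attempt = 1; attempt <= budget.maxAttempts; attempt++) {
    // Shutdown can interrupt between attempts because nothing blocks natively.
    if (isCancelled()) throw new Error("cancelled during retry");
    try {
      return await tryOnce();
    } catch (err) {
      lastError = err;
    }
    // Assumed growth rule: the delay doubles after each failed attempt.
    const delay = budget.baseDelayMs * Math.pow(2, attempt - 1);
    if (attempt === budget.maxAttempts || clock.now() + delay > deadline) break;
    await clock.sleep(delay);
  }
  // Propagate the underlying error once the budget is spent.
  throw lastError;
}
```

The point of the design is that every wait happens in JS (`clock.sleep`), so a cancellation check or `process.exit()` is never stuck behind an uninterruptible native call.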

* doc: create Cursor rule for docs website
* docs: add robots.txt to website
* doc: website source - refactor - standardize env vars to standard used in JSON and infra envs like GH Actions
* doc: website source - add autogen sitemap.xml
* doc: website source - add JSON-LD
* doc: frontmatter improvement - add type of page to enrich metadata
* doc: content update - add missing frontmatter field for SEO
* doc: website source - robots.txt - add AI bot rules
* doc: website source - simplify SEO machinery
* doc: website source - robots.txt - add content signals

…ease prs (#1862)

…otes (#1865)

Tooling (scripts/sdk/generate-changelog-sdk-pod.cjs):
- Backmerge filter: PRs whose subject starts with `Backmerge` or `Merge release ...` are skipped during processSDKPRs (same shape as the existing [skiplog] filter).
- Companion filter + entry-count strip: new isCompanionEntry, stripEntryCount, cleanModelEntries helpers applied to the inline [mod] summary in CHANGELOG.md and the body of models.md. Recognises *_LEX / *_VOCAB / *_DATA / *_METADATA constant suffixes and any line containing the word "companion".
- Indented continuation lines for [mod] PRs: Added/Updated/Removed are emitted as indented sub-rows under the bullet (capped at MAX_INLINE_MODELS = 5 per section, "(and N more)" for the rest) instead of stuffed inline.
- Announcement-post generator: new --generate-announcement-post CLI flag (with optional --version) parses CHANGELOG.md via parseChangelogMarkdown and emits the Slack template (:qvac: header, NPM/GitHub/changelog links, conditional :warning: Breaking Changes, per-section bullets with <url> link wrapping and :boom: breaking markers, footer). Sections cap at MAX_ANNOUNCEMENT_BULLETS = 10 with "... And much more, see full list in changelog :memo:" only when strictly more than 10.
- New helpers exported: parseChangelogMarkdown, generateAnnouncementPost.

Skill (.cursor/skills/sdk-changelog/SKILL.md):
- Step 4 (CHANGELOG_LLM.md) is now mandatory.
- New Step 5: generate announcement-post.txt (mandatory) with the gitignore note and template spec.
- NOTICE renumbered to Step 6.
- Documented all new policies (backmerge, companion, entry-count strip, indentation, max-bullets cap).
- CLI parameters table refreshed.

.gitignore:
- Added packages/*/changelog/*/announcement-post.txt. The post is a Slack copy-paste working artifact, not a release deliverable.

Release notes for 0.10.0:
- New packages/sdk/changelog/0.10.0/ folder with CHANGELOG.md, breaking.md, api.md, models.md, CHANGELOG_LLM.md.
- Root aggregate packages/sdk/CHANGELOG.md rebuilt with v0.10.0 at top.
- packages/sdk/NOTICE refreshed (191 models, 179 JS deps).
- packages/sdk/package.json bumped 0.9.1 -> 0.10.0.

Backmerge of release-sdk-0.10.0 -> main is a no-op for the release artifacts (changelog, NOTICE) because they land here directly.

…desktop runner (#1832) * QVAC-17837 feat[ci]: surface synthetic IndicTrans [GPU] row on every desktop runner The on-PR Step Summary previously showed [GPU] rows only on the 2 of 6 desktop runners that have a real GGML GPU backend bound today (macOS Metal, ai-run-windows11-gpu Vulkan). The 4 hosted Linux runners showed [CPU]-only rows because: - bergamot.test.js + pivot-bergamot.test.js gate their GPU probe loop on `if (isMobile)` so they never run GPU on desktop, and - indictrans.test.js does probe GPU on every platform but discoverGpuDevices() returns empty when GGML can't bind a backend (loader fix is still pending per QVAC-17640 / QVAC-17880). This commit adds a synthetic always-running [IndicTrans] [GPU] test that loads with use_gpu: true and no explicit gpu_device. The existing shared runSingleTranslation helper records perf regardless of the resolved backend; resolveExecutionProvider (now lifted into utils.js) tags the execution_provider as 'cpu (fallback)' when GGML emitted a CPU sentinel and as the real backend tag (vulkan/metal/opencl/...) when a GPU resolved. So today the 4 Linux runners show CPU + GPU(cpu (fallback)) rows, and macOS / ai-run-windows11-gpu show CPU + GPU(real) rows. Once Ian's GPU loader fix lands on a given platform, the same row's EP auto-flips from 'cpu (fallback)' to the real backend without further CI wiring — that's the contract QVAC-17837's description asks for. Other clean-ups in the same file because the audit surfaced them: - resolveExecutionProvider now treats 'BLAS' as a CPU sentinel so the [CPU] row's EP no longer reports 'blas' on macOS. - discoverGpuDevices() now breaks on BLAS (suppresses macOS's three spurious [GPU:1 BLAS] / [GPU:2 BLAS] / [GPU:3 BLAS] rows) and dedupes by backend name (also fixes mobile Android's 4xVulkan0 duplicates when that file is next exercised, though mobile is out of scope for this PR). 
- The per-device GPU test's t.not(backendName, 'CPU') hard assertion is loosened to a t.comment warning so a silent GPU fallback at a discovered device index doesn't fail CI on a perf-only test. Bergamot and Pivot stay CPU-only on desktop. Bergamot is intgemm-only and has no GPU port architecturally, so a synthetic GPU row for those tests would be perpetual fallback noise. Mobile workflows are unchanged. Made-with: Cursor * QVAC-17837 fix: address parallel-review feedback on synthetic GPU test Two correctness/consistency follow-ups from the parallel review: - Wrap the new synthetic [IndicTrans] [GPU] test in `if (!isMobile)`. D2 scope explicitly said mobile workflows are untouched, but the test had no mobile gate so it would have added a duplicate default-device GPU row alongside the existing per-device probe rows on Pixel/S25/iPhone. Mobile already has meaningful GPU rows; the synthetic row is only needed on the 6 desktop runners that today emit zero GPU rows for some/all tests. - Replace the literal `backendName === 'CPU'` check in the per-device GPU test's soft-fallback warning with `CPU_SENTINEL_BACKENDS.has(...)` so the warning fires consistently for every backend treated as CPU by `resolveExecutionProvider` (including BLAS and Unloaded), not just the addon's `CPU` sentinel. Same set, same definition, one source of truth. No behaviour change on desktop; restores intended D2 scope on mobile; self-consistent fallback definition between the helper and the warning. Reviewers' other findings (`feat[ci]:` tag style, BLAS-break order dependency, Bergamot/Pivot still using regex EP fallback) are documented or pre-existing — not addressed here. Made-with: Cursor * QVAC-17837 fix[lint]: re-indent synthetic [GPU] test body inside if (!isMobile) block Pure whitespace fix — `npm run lint:fix` (standard --fix). 
Sanity-checks job in CI run #25166275184 was failing on ESLint indent errors because the previous commit wrapped the test body in `if (!isMobile) {...}` without bumping each line's indentation by 2 spaces. `git diff -w` is empty. Made-with: Cursor
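
The fallback tagging and device de-duplication described in this commit can be sketched as below. The signatures are hypothetical and written in TypeScript for illustration (the real helpers live in the test suite's utils.js and may take different inputs). The sentinel set members (`CPU`, `BLAS`, `Unloaded`), the `'cpu (fallback)'` tag, the break-on-BLAS rule, and de-duplication by backend name are taken from the commit text; the plain `'cpu'` tag for the CPU row and `dedupeGpuBackends` as a separate function are assumptions.

```typescript
// Backend names treated as "no real GPU", per the commit text.
const CPU_SENTINEL_BACKENDS: ReadonlySet<string> = new Set(["CPU", "BLAS", "Unloaded"]);

// Hypothetical signature: maps the backend GGML reported to an
// execution_provider tag for the perf row.
function resolveExecutionProvider(backendName: string, gpuRequested: boolean): string {
  if (CPU_SENTINEL_BACKENDS.has(backendName)) {
    return gpuRequested ? "cpu (fallback)" : "cpu";
  }
  return backendName.toLowerCase();
}

// Hypothetical helper: reduces the backend names reported at successive
// device indices to the list worth emitting [GPU:n] rows for.
function dedupeGpuBackends(probed: string[]): string[] {
  const seen = new Set<string>();
  const result: string[] = [];
  for (const name of probed) {
    // BLAS marks the end of real GPU devices, so stop probing there.
    if (name === "BLAS") break;
    if (seen.has(name)) continue;
    seen.add(name);
    result.push(name);
  }
  return result;
}
```

Sharing one sentinel set between the resolver and any fallback warning is what keeps the two definitions of "fell back to CPU" from drifting apart.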

…flows (#1728)" (#1871) * Revert "fix: prevent code injection and untrusted checkout in CI workflows (#1728)" Reverts commit a79602f, with two intentional exclusions noted below. Excluded from this revert: - .github/actions/run-lint-and-unit-tests/action.yaml: kept at its current state on main; the env-var indirection #1728 introduced for npm-token/pat-token in the .npmrc-configuration step is preserved. - .github/workflows/cpp-lint.yaml: net effect on this file is zero. PR #1829 (commit 65bd746) later rewrote the same `cpp-lint` job and added `id-token: write` to the `permissions` block originally introduced by #1728. The `permissions` block is preserved as-is (contents: read + id-token: write) because #1829's AWS OIDC integration depends on it. All other changes from #1728 are reverted. Co-authored-by: Cursor <cursoragent@cursor.com> * Potential fix for pull request finding 'CodeQL / Code injection' Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com> --------- Co-authored-by: Cursor <cursoragent@cursor.com> Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>

…ed addon (#1833) * feat: add multi-GPU pipeline parallelism via split-mode config Ports the split-mode/tensor-split feature from the LLM addon to the embed addon. When split-mode is layer or row and a GPU backend is available, the --device flag is omitted so llama.cpp distributes embedding model layers across all available GPUs. Falls back to CPU silently when no GPU is found. - Add split-mode (none|layer|row) and tensor-split config keys to setupParams, accepting both hyphen and underscore variants - Omit --device in split mode so llama.cpp routes across all GPUs - Accept main_gpu underscore variant alongside main-gpu in tryMainGpuFromMap - Add getEffectiveGpuDeviceCount() to BackendSelection for GPU inventory - Add split-mode and tensor-split to GGMLConfig in index.d.ts - Bump version 0.14.0 -> 0.15.0 * test: add multi-GPU split-mode tests and benchmark example Ports the test and example surface from the LLM multi-GPU PR to the embed addon, matching the pattern exactly. - Add BertModel::getCommonParams() so tests can inspect split_mode after load - Add 8 BertModelTest split-mode cases: none, layer, row, case-insensitive, underscore variant, CPU fallback clears GPU params, invalid value, both keys reject - Add 9 BackendSelectionTest getEffectiveGpuDeviceCount cases covering zero, CPU-only, single dGPU, single iGPU, two dGPUs, dGPU+iGPU, two dGPUs+iGPU, two iGPUs, and accel/CPU ignored - Add test/integration/spec-logger.js for native log capture in integration tests - Add test/integration/multi-gpu.test.js: 4 integration tests gated on QVAC_HAS_MULTI_GPU=1 (layer, row, default single-device, layer+tensor-split) - Add examples/multiGpuBenchmark.js: single vs layer vs row throughput comparison using the embed model - Regenerate test/mobile/integration.auto.cjs with runMultiGpuTest entry * fix: harden CPU fallback and add missing main_gpu alias tests CPU fallback in setupParams was missing two details present in the final LLM implementation: - Set params.main_gpu = 
-1 on CPU fallback so llama.cpp does not retain a stale GPU index. - Reset the local splitMode variable to LLAMA_SPLIT_MODE_NONE after the CPU-fallback warning so the --device gate below emits --device correctly instead of silently suppressing it when the requested split mode was layer or row. Also add two missing BackendSelection unit tests for the main_gpu underscore alias and both-key rejection introduced in tryMainGpuFromMap, mirroring the coverage in the LLM package. * fix: wire all integration tests into test:integration runner test:integration was hardcoded to addon.test.js, so multi-gpu.test.js and multi-instance.test.js were never executed in desktop CI. Switch to the same generate-then-run-all pattern used by the LLM addon: brittle -r generates test/integration/all.js from the full *.test.js glob, then bare runs it. * fix: resolve cpp-lint failures in BackendSelection and BertModel Apply clang-format and clang-tidy fixes flagged by the cpp-lint job: - Use std::ranges::transform in BackendSelection.cpp and BertModel.cpp - Drop else-after-return in parseMainGpu - Rename short iterator names (it -> foundIt/configIt/splitModeIt) - Use designated initializers for BackendInterface and BertEmbeddings::Layout - Drop redundant (void) on BackendInterface function pointer - Move pointer-arithmetic NOLINT to the diagnostic line in batchDecode - Extract parseSplitMode helper to bring setupParams cognitive complexity back under the threshold - Suppress non-const-global and macro-usage diagnostics in logging.hpp - Reorder includes in test_bert_model.cpp and collapse getCommonParams to a single line for clang-format --------- Co-authored-by: gianni-cor <gianfranco.cordella@tether.io>

* QVAC-18184 chore[notask|skiplog]: backmerge release sdk 0.9.2 Brings the 0.9.2 release artifacts back into main now that @qvac/sdk@0.9.2 has been published to npm (`latest` dist-tag, 2026-05-01 10:09 UTC). - Bump packages/sdk/package.json: 0.9.1 -> 0.9.2 - Add packages/sdk/changelog/0.9.2/CHANGELOG.md and CHANGELOG_LLM.md - Prepend 0.9.2 entry to aggregated packages/sdk/CHANGELOG.md Hotfix content (z.xor -> z.union, zod floor bump) is the cherry-pick of #1790 that already landed on main, so no source changes here. Dependencies in package.json are intentionally NOT brought over from the release branch — main has progressed past 0.9.1 on several internal packages (e.g. @qvac/llm-llamacpp 0.14.4 -> 0.17.1, @qvac/translation-nmtcpp 0.6.10 -> 2.0.1, react-native-bare-kit 0.11.5 -> 0.12.3) and a blind merge would regress them. Only the version field is changed, matching the 0.9.1 backmerge precedent (#1726). * chore[skiplog]: drop package.json version bump from backmerge to avoid conflict with 0.10.0 PR PR #1865 (the 0.10.0 release) is open against main and bumps packages/sdk/package.json version 0.9.1 -> 0.10.0. This backmerge was bumping the same line 0.9.1 -> 0.9.2, so whichever lands second hits a conflict on that single line. Since main is moving to 0.10.0 directly (the 0.9.2 hotfix is a separate release line), drop the package.json change from this backmerge and let #1865 own the version bump. Main's package.json will briefly say 0.9.1 while CHANGELOG.md lists 0.9.2 as the latest shipped version, but that's transient — #1865 overwrites it to 0.10.0 anyway. Keep the changelog artifacts (changelog/0.9.2/ folder + the prepended ## [0.9.2] entry in aggregated CHANGELOG.md) so main retains a record of the 0.9.2 release in its history. --------- Co-authored-by: Dmytro Medvinskyi <functionsilence@gmail.com>

…x.d.ts (#1613)

* feat[api]: export RuntimeStats interface in NMT addon index.d.ts
  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* chore: bump @qvac/translation-nmtcpp to 2.0.2 and update changelog
  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* doc: document RuntimeStats units and per-backend fields; fix README ms→s
  Address review on PR #1613:
  - Add JSDoc to `RuntimeStats` clarifying that `totalTime`/`encodeTime`/`decodeTime` are seconds while `TTFT` is milliseconds, and listing which fields each backend emits (Bergamot omits `encodeTime`/`TTFT`). Note that pivot translations use prefixed keys.
  - Fix README quickstart that printed `totalTime` with a `'ms'` label even though the C++ producer emits seconds.

---------

Co-authored-by: Ramaz Tskhadadze <bubu@Ramazs-MacBook-Pro-2.local>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
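
To make the unit mismatch concrete, here is a reduced sketch of a stats type and a formatter that labels each field with the unit stated in this commit. The interface below is an assumption that covers only the four fields named above (the exported `RuntimeStats` may declare more, and pivot translations use prefixed keys that this sketch ignores). `formatRuntimeStats` is a hypothetical helper, not part of the addon.

```typescript
// Partial, assumed shape limited to the fields the commit mentions.
interface RuntimeStatsSketch {
  totalTime: number; // seconds
  decodeTime: number; // seconds
  encodeTime?: number; // seconds; omitted by Bergamot
  TTFT?: number; // milliseconds; omitted by Bergamot
}

// Hypothetical formatter: prints each field with its documented unit so a
// seconds value is never labelled "ms".
function formatRuntimeStats(stats: RuntimeStatsSketch): string {
  const parts = [
    `total=${stats.totalTime.toFixed(2)}s`,
    `decode=${stats.decodeTime.toFixed(2)}s`,
  ];
  if (stats.encodeTime !== undefined) {
    parts.push(`encode=${stats.encodeTime.toFixed(2)}s`);
  }
  if (stats.TTFT !== undefined) {
    parts.push(`ttft=${stats.TTFT}ms`);
  }
  return parts.join(" ");
}

// A Bergamot-style result carries only the two always-present fields.
console.log(formatRuntimeStats({ totalTime: 1.5, decodeTime: 1.2 }));
```

Marking the backend-specific fields optional forces callers to handle their absence instead of printing `undefined`.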

… qvac-internal teams (#1877) Repoint code ownership from `@tetherto/ai-runtime-merge` and `@tetherto/ai-runtime-bk` to `@tetherto/qvac-internal-dev`, and add `qvac-internal-merge` to the approval-check-worker team-lead and team-member checks while keeping the legacy `ai-runtime-merge*` teams in place during the transition.

…ow_dispatch (#1839) * QVAC-18111 infra[notask]: scaffold Benchmark Performance (LLM) workflow_dispatch GitHub requires a `workflow_dispatch` workflow to exist on the default branch before it shows up in the Actions tab and becomes triggerable with `--ref <feature-branch>`. This lands the LLM benchmark workflow on `main` so the QVAC-17830 perf-metrics feature branch can be dispatched against it for end-to-end validation. Changes: - `benchmark-performance-qvac-lib-infer-llamacpp-llm.yml` (new): manual `workflow_dispatch` only — mirrors the structure of the existing Parakeet / Whispercpp benchmark workflows. Calls `prebuilds-...yml` then `integration-test-...yml` with bench-mode iteration counts (`QVAC_PERF_RUNS=3`, `QVAC_PERF_WARMUP_RUNS=1` by default), then aggregates desktop artifacts into a combined HTML / step-summary. Phase-1 scope is desktop only — mobile (Device Farm) needs a build-time hook in the test app to thread env vars through to bare and is tracked as a QVAC-18111 follow-up. - `integration-test-qvac-lib-infer-llamacpp-llm.yml`: thread `qvac_perf_runs` / `qvac_perf_warmup_runs` through `workflow_call` + `workflow_dispatch` and surface them as `QVAC_PERF_RUNS` / `QVAC_PERF_WARMUP_RUNS` on the Linux/macOS and Windows test run steps. Empty string => unset, so the umbrella PR workflow continues to honour the test-side default and PR runs are unaffected by this change. Per the perf policy agreed on Slack (2026-04-30): the umbrella on-pr workflow runs perf tests at the cheap default so we don't pay full perf cost on every PR; this dedicated workflow is the only place we crank up the iteration counts to produce mean ± std numbers. Made-with: Cursor * QVAC-18111 chore[notask]: trim chatty inline comments in benchmark workflow Made-with: Cursor * QVAC-18111 chore[notask]: add run_desktop toggle to benchmark workflow_dispatch Made-with: Cursor --------- Co-authored-by: olyasir <sirkinolya@gmail.com>

* chore(onnx-tts): bump addon-cpp to 1.1.6 Update qvac-lib-inference-addon-cpp version constraint in vcpkg.json from 1.1.5#1 to 1.1.6 and add a corresponding CHANGELOG entry under the existing [Unreleased] section. Made-with: Cursor * chore(tts): bump version to 0.8.6 Bump @qvac/tts-onnx from 0.8.5 to 0.8.6 and convert the [Unreleased] CHANGELOG section to [0.8.6] for the addon-cpp 1.1.6 release alongside the queued Chatterbox engine and tensor-helper changes. Made-with: Cursor --------- Co-authored-by: Mariusz Reichert <reichert.programming@gmail.com> Co-authored-by: GustavoA1604 <54457676+GustavoA1604@users.noreply.github.com>

* chore(whispercpp): bump addon-cpp to 1.1.6 Update qvac-lib-inference-addon-cpp version constraint in vcpkg.json from 1.1.5#1 to 1.1.6 and add a corresponding CHANGELOG entry. Made-with: Cursor * chore(whispercpp): bump version to 0.6.7 Bump @qvac/transcription-whispercpp from 0.6.6 to 0.6.7 and convert the [Unreleased] CHANGELOG section to [0.6.7] for the addon-cpp 1.1.6 release. Made-with: Cursor --------- Co-authored-by: Mariusz Reichert <reichert.programming@gmail.com>

…angelog (#1867) Brings the @qvac/cli@0.3.0 release artifacts back onto main per gitflow.md "Keep main aligned". Same shape as #1766 (the 0.2.4 backmerge precedent). - packages/cli/package.json: version 0.2.4 -> 0.3.0 - packages/cli/changelog/0.3.0/CHANGELOG.md: new - packages/cli/changelog/0.3.0/api.md: new - packages/cli/CHANGELOG.md: prepend ## [0.3.0] entry NOTE: Opened as DRAFT because the companion release PR #1836 is also still draft and 5 of its CI checks are failing. @qvac/cli@0.3.0 has not yet been published to npm (latest is 0.2.4). Mark this PR ready for review only after #1836 merges into release-cli-0.3.0 and the GPR/npm publish completes. The source-level changes (@qvac/sdk devDep ^0.10.0 + sdk.ts MIN_SDK_VERSION='0.10.0') are already on main from PR #1810 — only the release metadata needs to come back. CLI's package.json on main has no dependency drift versus release-cli-0.3.0, so unlike the SDK 0.9.2 backmerge (#1857) the package.json version bump can be safely included here. There's also no competing CLI release PR in flight on main. Co-authored-by: Dmytro Medvinskyi <functionsilence@gmail.com>

…a pushFile (#1840) * QVAC-18111 infra[notask]: scaffold Benchmark Performance (LLM) workflow_dispatch GitHub requires a `workflow_dispatch` workflow to exist on the default branch before it shows up in the Actions tab and becomes triggerable with `--ref <feature-branch>`. This lands the LLM benchmark workflow on `main` so the QVAC-17830 perf-metrics feature branch can be dispatched against it for end-to-end validation. Changes: - `benchmark-performance-qvac-lib-infer-llamacpp-llm.yml` (new): manual `workflow_dispatch` only — mirrors the structure of the existing Parakeet / Whispercpp benchmark workflows. Calls `prebuilds-...yml` then `integration-test-...yml` with bench-mode iteration counts (`QVAC_PERF_RUNS=3`, `QVAC_PERF_WARMUP_RUNS=1` by default), then aggregates desktop artifacts into a combined HTML / step-summary. Phase-1 scope is desktop only — mobile (Device Farm) needs a build-time hook in the test app to thread env vars through to bare and is tracked as a QVAC-18111 follow-up. - `integration-test-qvac-lib-infer-llamacpp-llm.yml`: thread `qvac_perf_runs` / `qvac_perf_warmup_runs` through `workflow_call` + `workflow_dispatch` and surface them as `QVAC_PERF_RUNS` / `QVAC_PERF_WARMUP_RUNS` on the Linux/macOS and Windows test run steps. Empty string => unset, so the umbrella PR workflow continues to honour the test-side default and PR runs are unaffected by this change. Per the perf policy agreed on Slack (2026-04-30): the umbrella on-pr workflow runs perf tests at the cheap default so we don't pay full perf cost on every PR; this dedicated workflow is the only place we crank up the iteration counts to produce mean ± std numbers. 
Made-with: Cursor * QVAC-18111 chore[notask]: trim chatty inline comments in benchmark workflow Made-with: Cursor * QVAC-18111 chore[notask]: add run_desktop toggle to benchmark workflow_dispatch Made-with: Cursor * QVAC-18111 infra[notask]: bridge QVAC_PERF_RUNS to mobile test app via pushFile Extends the mobile integration workflow with the same iteration-count inputs as the desktop reusable workflow, and adds a `mobile-benchmarks` job to the LLM benchmark dispatch so it covers Device Farm too. The bare runtime on Device Farm doesn't see GitHub Actions env vars, so we mirror the existing `testFilter.txt` pattern: when the workflow inputs are non-empty, the WDIO before-hook pushes a `qvacPerfConfig.txt` to the device (Android: `/data/local/tmp/`, iOS: `@bundleId:documents/`) with the iteration overrides as KEY=VALUE lines. The file-reading side on bare lives on the QVAC-17830 perf branch — without that branch this PR is a no-op (orphan file), so it is safe to land independently. Changes: - `integration-mobile-test-qvac-lib-infer-llamacpp-llm.yml`: add `qvac_perf_runs` / `qvac_perf_warmup_runs` to `workflow_call` and `workflow_dispatch`; add `__QVAC_PERF_RUNS__` / `__QVAC_PERF_WARMUP_RUNS__` placeholders to the Android + iOS WDIO config blobs and the corresponding pushFile block in the `before` hook; substitute the placeholders in `make_split`. - `benchmark-performance-qvac-lib-infer-llamacpp-llm.yml`: add a `mobile-benchmarks` job calling the mobile workflow with the bench-mode iteration counts; have `summarize` `needs:` it; drop the "desktop only" caveat in the step-summary blurb. PR runs are unchanged: empty input ⇒ empty placeholder ⇒ before-hook skips the perf-config push. Made-with: Cursor * QVAC-18111 chore[notask]: add run_mobile toggle to benchmark workflow_dispatch Made-with: Cursor --------- Co-authored-by: Proletter <40578159+Proletter@users.noreply.github.com>

* QVAC-18064 feat: optimize nmtcpp for Android GPU inference - Optimize nmtcpp for Android GPU inference with Vulkan backend support - Move beam search KV cache pool to CPU backend - Propagate config params after GGML context load and fix multi-GPU handling - Disable OpenCL until upstream qvac-fabric is updated - Prevent backend device accumulation and skip OpenCL comparison test - Fix clang-format for ggml_backend_load_all_from_path call - Remove Android debug logging added for Adreno 830 crash investigation - Resolve cpp-lint clang-tidy naming and implicit-bool errors - Address code review findings --------- Co-authored-by: Cursor <cursoragent@cursor.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* Port LlamacppUtils.hpp helpers to common_init_result_ptr API. Signed-off-by: Marcus Edel <marcus.edel@collabora.com> * Update vcpkg.json --------- Signed-off-by: Marcus Edel <marcus.edel@collabora.com> Co-authored-by: gianni-cor <gianfranco.cordella@tether.io>

* QVAC-17989 Add post-generation ESRGAN upscale * QVAC-17989 Add ESRGAN JS test and example * QVAC-17989 Fix upscaled output stats * Update CHANGELOG.md * Update package.json * QVAC-17989 Format ESRGAN handler changes --------- Co-authored-by: gianni-cor <gianfranco.cordella@tether.io>

* Create new buckets to run tests in independent processes. Signed-off-by: Marcus Edel <marcus.edel@collabora.com> * ci(ios): include all run ARNs in results aggregation and log download The two RUN_ARNS aggregation loops were hardcoded to iterate over indices 2..8, so the new Heavy7/Heavy8 runs (RUN_ARN_9, RUN_ARN_10) were silently dropped from the final test-results summary and the Device Farm log download. As a result, Heavy7/Heavy8 failures would not have failed the workflow and their device logs would not have been collected. Iterate up to RUN_COUNT instead, so any future bucket additions are picked up automatically. Co-authored-by: Cursor <cursoragent@cursor.com> --------- Signed-off-by: Marcus Edel <marcus.edel@collabora.com> Co-authored-by: gianni-cor <gianfranco.cordella@tether.io> Co-authored-by: Cursor <cursoragent@cursor.com>

) The "Create and Upload Test Spec" step's run: | block in integration-mobile-test-qvac-lib-infer-llamacpp-llm.yml grew to 21,074 chars after #1889, putting it just over GitHub Actions' 21,000-char limit on a single template expression. This breaks every reusable-workflow_call into the file, so the On PR Trigger (LLM) workflow fails instantly with: error parsing called workflow ... : (Line: 914, Col: 14): Exceeded max expression length 21000 and no jobs run. Every open PR that touches the LLM package is currently blocked from getting LLM CI. Fix: remove 32 in-block comment lines that were pure narration of already-readable code (echo/printf/sed) and verbose intent text duplicated by the surrounding context. Brings the run-block payload to ~19,008 chars (well under 21,000) without changing any executed logic. Diff is comments-only: 32 deletions, 0 additions. Co-authored-by: Cursor <cursoragent@cursor.com>

Co-authored-by: Proletter <40578159+Proletter@users.noreply.github.com>

…ter support to diffusion API (#1838) * feat[api]: add FLUX.2 multi-reference fusion and LoRA adapter support to diffusion API * doc[skiplog]: trim verbose lora docs and prune zod-builtin tests Address PR review: - shorten lora_apply_mode description in sdcppConfigSchema and drop the external file references the user can't see at usage time - shorten the LoRA JSDoc block in diffusion.ts to the essentials - drop unit tests that effectively re-assert zod built-ins (z.boolean(), z.string().min(1), individual enum members); keep the ABSOLUTE_PATH_PATTERN matrix, the mutual-exclusion refine, and one happy-path per new field Made-with: Cursor * test[api]: validate FLUX.2 fusion diverges from txt2img baseline and reject conflicting init_image inputs

New composite action that installs LLVM/Clang to a pinned version on Linux and Windows runners and exposes the unversioned binaries on PATH. Intended to become the single source of truth for the LLVM major used across every prebuild / cpp-test / coverage / benchmark workflow in the monorepo: bumping `version` (Linux apt major) and `windows-version` (chocolatey full pin) defaults rolls the whole repo forward in one place.

- Linux: install via apt.llvm.org `llvm.sh <version> all`, then prepend `/usr/lib/llvm-<version>/bin` to `$GITHUB_PATH` so unversioned `clang`, `clang++`, `clang-format`, `clang-tidy`, `git-clang-format`, `lld`, `llvm-cov`, `llvm-profdata`, ... resolve to the chosen major.
- Windows: `choco upgrade llvm --version=<windows-version> -y --allow-downgrade` (defaults to a specific patch release to avoid silent drift when chocolatey ships a new one) and add `C:\Program Files\LLVM\bin` to `$GITHUB_PATH`.
- macOS: no-op (Apple Clang is set up via setup-apple-clang).

Defaults: version=22, windows-version=22.1.0.
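The per-platform PATH behaviour described above can be summarised in a few lines. This is a minimal sketch, not the action's implementation (which is a composite action driven by shell steps): the `RunnerOs` type and `llvmBinDir` function are hypothetical names, and only the directories and the default major come from the commit message.

```typescript
// Hypothetical model of which directory the action adds to PATH per runner OS.
type RunnerOs = "Linux" | "Windows" | "macOS";

// Returns the LLVM bin directory to add to PATH, or null when the action is a no-op.
// The default major "22" mirrors the `version` default quoted above.
function llvmBinDir(os: RunnerOs, version = "22"): string | null {
  switch (os) {
    case "Linux":
      // apt.llvm.org installs each major under its own versioned prefix.
      return `/usr/lib/llvm-${version}/bin`;
    case "Windows":
      // The chocolatey package installs to a fixed, unversioned location.
      return "C:\\Program Files\\LLVM\\bin";
    case "macOS":
      // Apple Clang is configured by a separate action, so nothing is added.
      return null;
  }
}

const linuxDir = llvmBinDir("Linux");
const windowsDir = llvmBinDir("Windows");
const macDir = llvmBinDir("macOS");
```

Only the Linux path depends on the requested major; on Windows the pinned version affects which package is installed, not where it lands.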

…ancel-on-first-token (#1880)

github-actions · 2026-05-22T10:12:55Z

QVAC E2E — `ios` — ✅ all tests passed (82/91, 990s)

Config: suite=smoke · filter=(none) · exclude=(none)
View run · Artifacts: reports · Device Farm logs

github-actions · 2026-05-22T10:12:55Z

QVAC E2E — `windows` — ✅ all tests passed (91/91, 408s)

Config: suite=smoke · filter=(none) · exclude=(none)
View run · Artifacts: reports

github-actions · 2026-05-22T10:12:57Z

QVAC E2E — `linux` — ✅ all tests passed (91/91, 244s)

Config: suite=smoke · filter=(none) · exclude=(none)
View run · Artifacts: reports

github-actions · 2026-05-22T10:12:58Z

QVAC E2E — `android` — ✅ all tests passed (83/91, 2339s)

Config: suite=smoke · filter=(none) · exclude=(none)
View run · Artifacts: reports · Device Farm logs

github-actions · 2026-05-22T10:13:03Z

QVAC E2E — `macos` — ⚠️ no results

Config: suite=smoke · filter=(none) · exclude=(none)
View run

The test job did not produce a results artifact. Check the run for job-level failures.

NamelsKing and others added 30 commits May 1, 2026 12:04

QVAC-18142 [Whisper] v0.6.6 (#1830)

cc0f267

Co-authored-by: GustavoA1604 <54457676+GustavoA1604@users.noreply.github.com>

infra[QVAC-17058]: add empty BCI whispercpp workflow stubs to enable …

5e47460

…branch dispatch (#1856) Co-authored-by: Ishan Vohra <ishanvohra@Ishans-MacBook-Air.local>

doc: autogen pipeline - fix release branch prefix (#1846)

5444738

QVAC-18280 [Decoder] v0.3.9 (#1859)

1c16b5a

Co-authored-by: GustavoA1604 <54457676+GustavoA1604@users.noreply.github.com>

feat: add sdk-backmerge skill and chain it from sdk-pr-create for rel…

cafc2f9

…ease prs (#1862)

doc: release v0.10.0 - API summary + release notes (#1882)

9c12ed0

doc: fix - missing redirect for v0.9.1 (#1888)

c7a8fe6

Co-authored-by: Proletter <40578159+Proletter@users.noreply.github.com>

fix(llm): prime system-prompt KV cache via addon prefill instead of c…

0d755b4

…ancel-on-first-token (#1880)

Victor-Rodzko added test-e2e-smoke Triggers smoke e2e test suite [Currently SDK-only] and removed test-e2e-smoke Triggers smoke e2e test suite [Currently SDK-only] labels May 22, 2026

Victor-Rodzko had a problem deploying to release May 22, 2026 10:12 — with GitHub Actions Failure

Victor-Rodzko temporarily deployed to release May 22, 2026 10:12 — with GitHub Actions Inactive

Victor-Rodzko had a problem deploying to release May 22, 2026 10:12 — with GitHub Actions Error

Victor-Rodzko temporarily deployed to release May 22, 2026 10:12 — with GitHub Actions Inactive

Victor-Rodzko temporarily deployed to release May 22, 2026 10:21 — with GitHub Actions Inactive

Victor-Rodzko had a problem deploying to release May 22, 2026 10:21 — with GitHub Actions Failure

Victor-Rodzko temporarily deployed to release May 22, 2026 10:43 — with GitHub Actions Inactive

Victor-Rodzko temporarily deployed to release May 22, 2026 11:10 — with GitHub Actions Inactive

Victor-Rodzko temporarily deployed to release May 22, 2026 11:51 — with GitHub Actions Inactive

Victor-Rodzko temporarily deployed to release May 22, 2026 13:06 — with GitHub Actions Inactive

Victor-Rodzko temporarily deployed to release May 22, 2026 13:15 — with GitHub Actions Inactive

Victor-Rodzko temporarily deployed to release May 22, 2026 13:34 — with GitHub Actions Inactive

Proletter force-pushed the test/qvac-17810-img2img-integration-tests branch 2 times, most recently from 9daf47c to 39d2782 Compare May 24, 2026 18:30

Proletter closed this May 24, 2026

Proletter force-pushed the main branch from 098a0fc to b8310f6 Compare May 24, 2026 19:13

Victor-Rodzko wants to merge 1338 commits into main from test/qvac-17810-img2img-integration-tests

20 participants
