QVAC-17810 test[skiplog]: add img2img integration tests for diffusion#2186
Closed
Victor-Rodzko wants to merge 1338 commits into
…dk publish doesn't auto-skip (#1853)

`publish-npm` needs `[build, publish-logic, release-merge-guard]`. On a manual `workflow_dispatch` from a `release-sdk-*` branch, the guard's `if:` rejected the event (push only), so the guard was skipped. GitHub Actions' implicit `success()` check on `needs` then auto-skipped `publish-npm` before its `if:`, with the explicit `needs.release-merge-guard.result == 'skipped'` branch, could even be evaluated.

Allow the guard to run on `workflow_dispatch` too. The guard already handles `workflow_dispatch` safely: `github.event.before` is empty, so `base-sha` is empty, so `isInitialPush` is true and the changelog diff check is skipped. The branch-name pattern check and the package.json-version-matches-branch check still run, which is what we want for a manual release publish.

Net effect: manual publish-sdk dispatches on release branches now actually reach the `publish-npm` job instead of silently skipping.
Co-authored-by: GustavoA1604 <54457676+GustavoA1604@users.noreply.github.com>
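A minimal sketch of the guard decision described in #1853, assuming the release branch shape is `release-sdk-<major>.<minor>.<patch>` and that the inputs arrive as plain strings. `GuardInput`, `planGuardChecks` and the regex are hypothetical names, not the workflow's actual code; the sketch only shows why an empty `base-sha` skips the changelog diff while the branch and version checks still apply.

```typescript
// Hypothetical model of the release-merge-guard checks; names are illustrative.
interface GuardInput {
  baseSha: string;        // empty on workflow_dispatch (github.event.before is empty)
  branch: string;         // e.g. "release-sdk-0.10.0"
  packageVersion: string; // "version" field of packages/sdk/package.json
}

interface GuardPlan {
  isInitialPush: boolean;
  branchNameOk: boolean;
  versionMatchesBranch: boolean;
  runChangelogDiff: boolean;
}

// Assumed branch shape: release-sdk-<major>.<minor>.<patch>
const RELEASE_BRANCH = /^release-sdk-(\d+\.\d+\.\d+)$/;

function planGuardChecks(input: GuardInput): GuardPlan {
  const isInitialPush = input.baseSha === '';
  const match = RELEASE_BRANCH.exec(input.branch);
  return {
    isInitialPush,
    branchNameOk: match !== null,
    // Still enforced on manual dispatches.
    versionMatchesBranch: match !== null && match[1] === input.packageVersion,
    // Nothing to diff against without a base SHA, so the changelog check is skipped.
    runChangelogDiff: !isInitialPush,
  };
}
```

Keeping the three checks as independent booleans mirrors the commit's point: enabling the dispatch trigger only removes the diff check, not the naming and version guarantees.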
…branch dispatch (#1856) Co-authored-by: Ishan Vohra <ishanvohra@Ishans-MacBook-Air.local>
Co-authored-by: GustavoA1604 <54457676+GustavoA1604@users.noreply.github.com>
#1835)

The bare worker leaks indefinitely when started while another SDK process holds the registry corestore lock.

Root cause: `corestoreOpts: { wait: true }` issues a blocking `flock(LOCK_EX)` on a libuv worker thread that JS cannot cancel, so when SIGTERM/IPC-disconnect arrives, the in-flight `client.ready()` never resolves (cleanup early-returns with `registryClient = null`) and `process.exit()` cannot terminate Bare while the native handle is held. The OS process wedges forever, breaking the three `no-lingering-bare-*` e2e tests in mixed-suite runs.

`wait: true` was deliberately added by #1480 (QVAC-12232) to tolerate transient lock contention during another SDK's startup/shutdown; reverting to the bare default would re-introduce that bug. Instead, switch to `wait: false` (tryLock) and provide an equivalent JS-bounded retry budget in the existing retry loop:

- 8 attempts, 250 ms base backoff, capped by a 10 s deadline
- each step is a fresh non-blocking syscall: `EBUSY` surfaces to JS immediately, so shutdown remains cancellable at every point
- exhausted budget propagates the underlying error, hitting the existing `closeRegistryClient` early-return on `null` and letting `process.exit()` terminate the worker cleanly

As defense in depth, arm a 3-second SIGKILL safety net in `shutdownBareDirectWorker` (unrefed timer) before calling `process.exit`, so any future blocking-handle bug can't survive shutdown.

Covered by existing `no-lingering-bare-{sigterm,close,ipc-disconnect}` e2e tests, which now pass in mixed-suite runs.

Co-authored-by: Dmytro Medvinskyi <functionsilence@gmail.com>
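The retry budget above can be sketched as a bounded loop around a non-blocking open. This is a hypothetical sketch, not the SDK's code: `tryOpen` stands in for opening the corestore with `wait: false`, contention is assumed to surface as an error with `code === 'EBUSY'`, and the backoff is assumed to double from the 250 ms base (the commit only states the base). `sleep` and `now` are injectable so the loop can be exercised with a fake clock.

```typescript
interface RetryBudget { attempts: number; baseDelayMs: number; deadlineMs: number }

// Values from the commit message: 8 attempts, 250 ms base backoff, 10 s deadline.
const DEFAULT_BUDGET: RetryBudget = { attempts: 8, baseDelayMs: 250, deadlineMs: 10_000 };

function isLockBusy(err: unknown): boolean {
  return typeof err === 'object' && err !== null && (err as { code?: unknown }).code === 'EBUSY';
}

async function openWithTryLock<T>(
  tryOpen: () => Promise<T>, // hypothetical non-blocking open (tryLock semantics)
  budget: RetryBudget = DEFAULT_BUDGET,
  sleep: (ms: number) => Promise<void> = (ms) => new Promise<void>((r) => setTimeout(r, ms)),
  now: () => number = Date.now,
): Promise<T> {
  const start = now();
  let lastErr: unknown;
  for (let attempt = 0; attempt < budget.attempts; attempt++) {
    try {
      return await tryOpen();
    } catch (err) {
      if (!isLockBusy(err)) throw err; // only lock contention is retried
      lastErr = err;
      const remaining = budget.deadlineMs - (now() - start);
      if (attempt === budget.attempts - 1 || remaining <= 0) break;
      // Assumed exponential growth, never sleeping past the deadline.
      await sleep(Math.min(budget.baseDelayMs * 2 ** attempt, remaining));
    }
  }
  throw lastErr; // exhausted budget: propagate the underlying EBUSY
}
```

With doubling delays the 10 s deadline, not the attempt count, becomes the binding limit (the cumulative sleep passes 10 s before the eighth try), which is one reason to keep both caps. Because every await is a short timer rather than a native `flock`, a shutdown signal is never stuck behind an uncancellable call.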
* doc: create Cursor rule for docs website
* docs: add robots.txt to website
* doc: website source - refactor - standardize env vars to standard used in JSON and infra envs like GH Actions
* doc: website source - add autogen sitemap.xml
* doc: website source - add JSON-LD
* doc: frontmatter improvement - add type of page to enrich metadata
* doc: content update - add missing frontmatter field for SEO
* doc: website source - robots.txt - add AI bot rules
* doc: website source - simplify SEO machinery
* doc: website source - robots.txt - add content signals
…otes (#1865) Tooling (scripts/sdk/generate-changelog-sdk-pod.cjs): - Backmerge filter: PRs whose subject starts with `Backmerge` or `Merge release ...` are skipped during processSDKPRs (same shape as the existing [skiplog] filter). - Companion filter + entry-count strip: new isCompanionEntry, stripEntryCount, cleanModelEntries helpers applied to the inline [mod] summary in CHANGELOG.md and the body of models.md. Recognises *_LEX / *_VOCAB / *_DATA / *_METADATA constant suffixes and any line containing the word "companion". - Indented continuation lines for [mod] PRs: Added/Updated/Removed are emitted as indented sub-rows under the bullet (capped at MAX_INLINE_MODELS = 5 per section, "(and N more)" for the rest) instead of stuffed inline. - Announcement-post generator: new --generate-announcement-post CLI flag (with optional --version) parses CHANGELOG.md via parseChangelogMarkdown and emits the Slack template (:qvac: header, NPM/GitHub/changelog links, conditional :warning: Breaking Changes, per-section bullets with <url> link wrapping and :boom: breaking markers, footer). Sections cap at MAX_ANNOUNCEMENT_BULLETS = 10 with "... And much more, see full list in changelog :memo:" only when strictly more than 10. - New helpers exported: parseChangelogMarkdown, generateAnnouncementPost. Skill (.cursor/skills/sdk-changelog/SKILL.md): - Step 4 (CHANGELOG_LLM.md) is now mandatory. - New Step 5: generate announcement-post.txt (mandatory) with the gitignore note and template spec. - NOTICE renumbered to Step 6. - Documented all new policies (backmerge, companion, entry-count strip, indentation, max-bullets cap). - CLI parameters table refreshed. .gitignore: - Added packages/*/changelog/*/announcement-post.txt. The post is a Slack copy-paste working artifact, not a release deliverable. Release notes for 0.10.0: - New packages/sdk/changelog/0.10.0/ folder with CHANGELOG.md, breaking.md, api.md, models.md, CHANGELOG_LLM.md. 
- Root aggregate packages/sdk/CHANGELOG.md rebuilt with v0.10.0 at top. - packages/sdk/NOTICE refreshed (191 models, 179 JS deps). - packages/sdk/package.json bumped 0.9.1 -> 0.10.0. Backmerge of release-sdk-0.10.0 -> main is a no-op for the release artifacts (changelog, NOTICE) because they land here directly.
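Two of the changelog policies in #1865 (the companion filter and the "strictly more than N" caps) can be sketched as follows. This is a hypothetical re-implementation, not the code in `scripts/sdk/generate-changelog-sdk-pod.cjs`. It assumes the "companion" word match is case-insensitive and that companions are filtered before the inline cap is applied; `capList`, `inlineModels` and `announcementBullets` are illustrative names.

```typescript
// Constant-style names ending in one of the companion suffixes, e.g. FOO_EN_DE_VOCAB.
const COMPANION_SUFFIX = /\b[A-Z0-9_]+_(?:LEX|VOCAB|DATA|METADATA)\b/;
const COMPANION_WORD = /\bcompanion\b/i; // case-insensitivity is an assumption

function isCompanionEntry(line: string): boolean {
  return COMPANION_SUFFIX.test(line) || COMPANION_WORD.test(line);
}

// Keep the first `max` items; append an overflow line only when strictly more than `max`.
function capList(items: string[], max: number, overflow: (hidden: number) => string): string[] {
  if (items.length <= max) return [...items];
  return [...items.slice(0, max), overflow(items.length - max)];
}

const MAX_INLINE_MODELS = 5;
const MAX_ANNOUNCEMENT_BULLETS = 10;

const inlineModels = (models: string[]): string[] =>
  capList(models.filter((m) => !isCompanionEntry(m)), MAX_INLINE_MODELS, (n) => `(and ${n} more)`);

const announcementBullets = (bullets: string[]): string[] =>
  capList(bullets, MAX_ANNOUNCEMENT_BULLETS, () => '... And much more, see full list in changelog :memo:');
```

Sharing one `capList` between the inline model rows and the Slack announcement keeps the "exactly N items shows no overflow line" rule identical in both places.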
…desktop runner (#1832) * QVAC-17837 feat[ci]: surface synthetic IndicTrans [GPU] row on every desktop runner The on-PR Step Summary previously showed [GPU] rows only on the 2 of 6 desktop runners that have a real GGML GPU backend bound today (macOS Metal, ai-run-windows11-gpu Vulkan). The 4 hosted Linux runners showed [CPU]-only rows because: - bergamot.test.js + pivot-bergamot.test.js gate their GPU probe loop on `if (isMobile)` so they never run GPU on desktop, and - indictrans.test.js does probe GPU on every platform but discoverGpuDevices() returns empty when GGML can't bind a backend (loader fix is still pending per QVAC-17640 / QVAC-17880). This commit adds a synthetic always-running [IndicTrans] [GPU] test that loads with use_gpu: true and no explicit gpu_device. The existing shared runSingleTranslation helper records perf regardless of the resolved backend; resolveExecutionProvider (now lifted into utils.js) tags the execution_provider as 'cpu (fallback)' when GGML emitted a CPU sentinel and as the real backend tag (vulkan/metal/opencl/...) when a GPU resolved. So today the 4 Linux runners show CPU + GPU(cpu (fallback)) rows, and macOS / ai-run-windows11-gpu show CPU + GPU(real) rows. Once Ian's GPU loader fix lands on a given platform, the same row's EP auto-flips from 'cpu (fallback)' to the real backend without further CI wiring — that's the contract QVAC-17837's description asks for. Other clean-ups in the same file because the audit surfaced them: - resolveExecutionProvider now treats 'BLAS' as a CPU sentinel so the [CPU] row's EP no longer reports 'blas' on macOS. - discoverGpuDevices() now breaks on BLAS (suppresses macOS's three spurious [GPU:1 BLAS] / [GPU:2 BLAS] / [GPU:3 BLAS] rows) and dedupes by backend name (also fixes mobile Android's 4xVulkan0 duplicates when that file is next exercised, though mobile is out of scope for this PR). 
- The per-device GPU test's t.not(backendName, 'CPU') hard assertion is loosened to a t.comment warning so a silent GPU fallback at a discovered device index doesn't fail CI on a perf-only test. Bergamot and Pivot stay CPU-only on desktop. Bergamot is intgemm-only and has no GPU port architecturally, so a synthetic GPU row for those tests would be perpetual fallback noise. Mobile workflows are unchanged. Made-with: Cursor * QVAC-17837 fix: address parallel-review feedback on synthetic GPU test Two correctness/consistency follow-ups from the parallel review: - Wrap the new synthetic [IndicTrans] [GPU] test in `if (!isMobile)`. D2 scope explicitly said mobile workflows are untouched, but the test had no mobile gate so it would have added a duplicate default-device GPU row alongside the existing per-device probe rows on Pixel/S25/iPhone. Mobile already has meaningful GPU rows; the synthetic row is only needed on the 6 desktop runners that today emit zero GPU rows for some/all tests. - Replace the literal `backendName === 'CPU'` check in the per-device GPU test's soft-fallback warning with `CPU_SENTINEL_BACKENDS.has(...)` so the warning fires consistently for every backend treated as CPU by `resolveExecutionProvider` (including BLAS and Unloaded), not just the addon's `CPU` sentinel. Same set, same definition, one source of truth. No behaviour change on desktop; restores intended D2 scope on mobile; self-consistent fallback definition between the helper and the warning. Reviewers' other findings (`feat[ci]:` tag style, BLAS-break order dependency, Bergamot/Pivot still using regex EP fallback) are documented or pre-existing — not addressed here. Made-with: Cursor * QVAC-17837 fix[lint]: re-indent synthetic [GPU] test body inside if (!isMobile) block Pure whitespace fix — `npm run lint:fix` (standard --fix). 
Sanity-checks job in CI run #25166275184 was failing on ESLint indent errors because the previous commit wrapped the test body in `if (!isMobile) {...}` without bumping each line's indentation by 2 spaces. `git diff -w` is empty. Made-with: Cursor
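The execution-provider tagging and device discovery described in #1832 can be sketched as below. It is a hypothetical reduction of the helpers in the test `utils.js`, assuming the probe yields one backend name per device index and that a trailing device index (as in `Vulkan0`) is stripped to form the tag; only `CPU_SENTINEL_BACKENDS`, its members and the `'cpu (fallback)'` tag come from the commit message.

```typescript
// CPU, BLAS and Unloaded all mean "no GPU backend actually bound".
const CPU_SENTINEL_BACKENDS = new Set(['CPU', 'BLAS', 'Unloaded']);

function resolveExecutionProvider(backendName: string, requestedGpu: boolean): string {
  if (CPU_SENTINEL_BACKENDS.has(backendName)) {
    // A GPU row that resolved to a CPU sentinel is tagged as a fallback, not as "blas".
    return requestedGpu ? 'cpu (fallback)' : 'cpu';
  }
  // Assumption: strip a trailing device index, e.g. "Vulkan0" -> "vulkan".
  return backendName.replace(/\d+$/, '').toLowerCase();
}

// Assumes one backend name per device index, in probe order.
function discoverGpuDevices(backendNames: string[]): string[] {
  const seen = new Set<string>();
  const devices: string[] = [];
  for (const name of backendNames) {
    if (name === 'BLAS') break;   // suppresses the spurious [GPU:n BLAS] rows
    if (seen.has(name)) continue; // dedupe repeated names (e.g. 4x Vulkan0)
    seen.add(name);
    devices.push(name);
  }
  return devices;
}
```

Using the same set for tagging and for the soft-fallback warning is the "one source of truth" the follow-up commit asks for: a backend cannot be CPU for one and GPU for the other.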
…flows (#1728)" (#1871) * Revert "fix: prevent code injection and untrusted checkout in CI workflows (#1728)" Reverts commit a79602f, with two intentional exclusions noted below. Excluded from this revert: - .github/actions/run-lint-and-unit-tests/action.yaml: kept at its current state on main; the env-var indirection #1728 introduced for npm-token/pat-token in the .npmrc-configuration step is preserved. - .github/workflows/cpp-lint.yaml: net effect on this file is zero. PR #1829 (commit 65bd746) later rewrote the same `cpp-lint` job and added `id-token: write` to the `permissions` block originally introduced by #1728. The `permissions` block is preserved as-is (contents: read + id-token: write) because #1829's AWS OIDC integration depends on it. All other changes from #1728 are reverted. Co-authored-by: Cursor <cursoragent@cursor.com> * Potential fix for pull request finding 'CodeQL / Code injection' Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com> --------- Co-authored-by: Cursor <cursoragent@cursor.com> Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
…ed addon (#1833) * feat: add multi-GPU pipeline parallelism via split-mode config Ports the split-mode/tensor-split feature from the LLM addon to the embed addon. When split-mode is layer or row and a GPU backend is available, the --device flag is omitted so llama.cpp distributes embedding model layers across all available GPUs. Falls back to CPU silently when no GPU is found. - Add split-mode (none|layer|row) and tensor-split config keys to setupParams, accepting both hyphen and underscore variants - Omit --device in split mode so llama.cpp routes across all GPUs - Accept main_gpu underscore variant alongside main-gpu in tryMainGpuFromMap - Add getEffectiveGpuDeviceCount() to BackendSelection for GPU inventory - Add split-mode and tensor-split to GGMLConfig in index.d.ts - Bump version 0.14.0 -> 0.15.0 * test: add multi-GPU split-mode tests and benchmark example Ports the test and example surface from the LLM multi-GPU PR to the embed addon, matching the pattern exactly. - Add BertModel::getCommonParams() so tests can inspect split_mode after load - Add 8 BertModelTest split-mode cases: none, layer, row, case-insensitive, underscore variant, CPU fallback clears GPU params, invalid value, both keys reject - Add 9 BackendSelectionTest getEffectiveGpuDeviceCount cases covering zero, CPU-only, single dGPU, single iGPU, two dGPUs, dGPU+iGPU, two dGPUs+iGPU, two iGPUs, and accel/CPU ignored - Add test/integration/spec-logger.js for native log capture in integration tests - Add test/integration/multi-gpu.test.js: 4 integration tests gated on QVAC_HAS_MULTI_GPU=1 (layer, row, default single-device, layer+tensor-split) - Add examples/multiGpuBenchmark.js: single vs layer vs row throughput comparison using the embed model - Regenerate test/mobile/integration.auto.cjs with runMultiGpuTest entry * fix: harden CPU fallback and add missing main_gpu alias tests CPU fallback in setupParams was missing two details present in the final LLM implementation: - Set params.main_gpu = 
-1 on CPU fallback so llama.cpp does not retain a stale GPU index. - Reset the local splitMode variable to LLAMA_SPLIT_MODE_NONE after the CPU-fallback warning so the --device gate below emits --device correctly instead of silently suppressing it when the requested split mode was layer or row. Also add two missing BackendSelection unit tests for the main_gpu underscore alias and both-key rejection introduced in tryMainGpuFromMap, mirroring the coverage in the LLM package. * fix: wire all integration tests into test:integration runner test:integration was hardcoded to addon.test.js, so multi-gpu.test.js and multi-instance.test.js were never executed in desktop CI. Switch to the same generate-then-run-all pattern used by the LLM addon: brittle -r generates test/integration/all.js from the full *.test.js glob, then bare runs it. * fix: resolve cpp-lint failures in BackendSelection and BertModel Apply clang-format and clang-tidy fixes flagged by the cpp-lint job: - Use std::ranges::transform in BackendSelection.cpp and BertModel.cpp - Drop else-after-return in parseMainGpu - Rename short iterator names (it -> foundIt/configIt/splitModeIt) - Use designated initializers for BackendInterface and BertEmbeddings::Layout - Drop redundant (void) on BackendInterface function pointer - Move pointer-arithmetic NOLINT to the diagnostic line in batchDecode - Extract parseSplitMode helper to bring setupParams cognitive complexity back under the threshold - Suppress non-const-global and macro-usage diagnostics in logging.hpp - Reorder includes in test_bert_model.cpp and collapse getCommonParams to a single line for clang-format --------- Co-authored-by: gianni-cor <gianfranco.cordella@tether.io>
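The split-mode parsing and device-flag gating from #1833 are C++ in the addon; the following is a hypothetical TypeScript model of the rules the commit describes (hyphen or underscore key, case-insensitive `none|layer|row`, rejection of invalid values and of both keys, CPU fallback that clears `main_gpu` and resets split mode). Returning `undefined` for an absent key is an assumption, so the backend default is left untouched.

```typescript
type SplitMode = 'none' | 'layer' | 'row';
const SPLIT_MODES: readonly string[] = ['none', 'layer', 'row'];

function parseSplitMode(config: Record<string, string | undefined>): SplitMode | undefined {
  const hyphen = config['split-mode'];
  const underscore = config['split_mode'];
  if (hyphen !== undefined && underscore !== undefined) {
    throw new Error('specify only one of split-mode / split_mode');
  }
  const raw = hyphen ?? underscore;
  if (raw === undefined) return undefined; // assumption: keep the backend default
  const value = raw.toLowerCase();
  if (!SPLIT_MODES.includes(value)) throw new Error(`invalid split-mode: ${raw}`);
  return value as SplitMode;
}

interface DevicePlan { splitMode: SplitMode; mainGpu: number | undefined; emitDeviceFlag: boolean }

function planDevices(requested: SplitMode, mainGpu: number | undefined, gpuAvailable: boolean): DevicePlan {
  if (!gpuAvailable) {
    // CPU fallback: drop the stale GPU index and reset split mode so --device is still emitted.
    return { splitMode: 'none', mainGpu: -1, emitDeviceFlag: true };
  }
  // In layer/row mode, omit --device so llama.cpp can spread layers across all GPUs.
  return { splitMode: requested, mainGpu, emitDeviceFlag: requested === 'none' };
}
```

Resetting `splitMode` inside the fallback branch is what prevents the bug fixed in the follow-up commit, where a requested `layer`/`row` mode silently suppressed `--device` on a CPU-only host.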
* QVAC-18184 chore[notask|skiplog]: backmerge release sdk 0.9.2 Brings the 0.9.2 release artifacts back into main now that @qvac/sdk@0.9.2 has been published to npm (`latest` dist-tag, 2026-05-01 10:09 UTC). - Bump packages/sdk/package.json: 0.9.1 -> 0.9.2 - Add packages/sdk/changelog/0.9.2/CHANGELOG.md and CHANGELOG_LLM.md - Prepend 0.9.2 entry to aggregated packages/sdk/CHANGELOG.md Hotfix content (z.xor -> z.union, zod floor bump) is the cherry-pick of #1790 that already landed on main, so no source changes here. Dependencies in package.json are intentionally NOT brought over from the release branch — main has progressed past 0.9.1 on several internal packages (e.g. @qvac/llm-llamacpp 0.14.4 -> 0.17.1, @qvac/translation-nmtcpp 0.6.10 -> 2.0.1, react-native-bare-kit 0.11.5 -> 0.12.3) and a blind merge would regress them. Only the version field is changed, matching the 0.9.1 backmerge precedent (#1726). * chore[skiplog]: drop package.json version bump from backmerge to avoid conflict with 0.10.0 PR PR #1865 (the 0.10.0 release) is open against main and bumps packages/sdk/package.json version 0.9.1 -> 0.10.0. This backmerge was bumping the same line 0.9.1 -> 0.9.2, so whichever lands second hits a conflict on that single line. Since main is moving to 0.10.0 directly (the 0.9.2 hotfix is a separate release line), drop the package.json change from this backmerge and let #1865 own the version bump. Main's package.json will briefly say 0.9.1 while CHANGELOG.md lists 0.9.2 as the latest shipped version, but that's transient — #1865 overwrites it to 0.10.0 anyway. Keep the changelog artifacts (changelog/0.9.2/ folder + the prepended ## [0.9.2] entry in aggregated CHANGELOG.md) so main retains a record of the 0.9.2 release in its history. --------- Co-authored-by: Dmytro Medvinskyi <functionsilence@gmail.com>
…x.d.ts (#1613) * feat[api]: export RuntimeStats interface in NMT addon index.d.ts Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: bump @qvac/translation-nmtcpp to 2.0.2 and update changelog Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * doc: document RuntimeStats units and per-backend fields; fix README ms→s Address review on PR #1613: - Add JSDoc to `RuntimeStats` clarifying that `totalTime`/`encodeTime`/ `decodeTime` are seconds while `TTFT` is milliseconds, and listing which fields each backend emits (Bergamot omits `encodeTime`/`TTFT`). Note that pivot translations use prefixed keys. - Fix README quickstart that printed `totalTime` with a `'ms'` label even though the C++ producer emits seconds. --------- Co-authored-by: Ramaz Tskhadadze <bubu@Ramazs-MacBook-Pro-2.local> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
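The unit mismatch fixed in #1613 (durations in seconds, `TTFT` in milliseconds) is easy to reintroduce when printing. Here is a hypothetical formatter, assuming a local `RuntimeStatsSketch` shape that mirrors only the fields named above, with per-backend fields marked optional. It is not the addon's `index.d.ts` declaration. It normalizes everything to milliseconds before labelling.

```typescript
// Hypothetical shape based on the documented units; optionality is an assumption.
interface RuntimeStatsSketch {
  totalTime: number;   // seconds
  encodeTime?: number; // seconds (not emitted by Bergamot)
  decodeTime?: number; // seconds
  TTFT?: number;       // milliseconds (not emitted by Bergamot)
}

const secondsToMs = (s: number): string => `${(s * 1000).toFixed(0)} ms`;

// Convert once, label once: labels cannot drift from the units.
function formatRuntimeStats(s: RuntimeStatsSketch): string {
  const parts = [`total=${secondsToMs(s.totalTime)}`];
  if (s.encodeTime !== undefined) parts.push(`encode=${secondsToMs(s.encodeTime)}`);
  if (s.decodeTime !== undefined) parts.push(`decode=${secondsToMs(s.decodeTime)}`);
  if (s.TTFT !== undefined) parts.push(`TTFT=${s.TTFT.toFixed(0)} ms`);
  return parts.join(', ');
}
```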
… qvac-internal teams (#1877) Repoint code ownership from `@tetherto/ai-runtime-merge` and `@tetherto/ai-runtime-bk` to `@tetherto/qvac-internal-dev`, and add `qvac-internal-merge` to the approval-check-worker team-lead and team-member checks while keeping the legacy `ai-runtime-merge*` teams in place during the transition.
…ow_dispatch (#1839) * QVAC-18111 infra[notask]: scaffold Benchmark Performance (LLM) workflow_dispatch GitHub requires a `workflow_dispatch` workflow to exist on the default branch before it shows up in the Actions tab and becomes triggerable with `--ref <feature-branch>`. This lands the LLM benchmark workflow on `main` so the QVAC-17830 perf-metrics feature branch can be dispatched against it for end-to-end validation. Changes: - `benchmark-performance-qvac-lib-infer-llamacpp-llm.yml` (new): manual `workflow_dispatch` only — mirrors the structure of the existing Parakeet / Whispercpp benchmark workflows. Calls `prebuilds-...yml` then `integration-test-...yml` with bench-mode iteration counts (`QVAC_PERF_RUNS=3`, `QVAC_PERF_WARMUP_RUNS=1` by default), then aggregates desktop artifacts into a combined HTML / step-summary. Phase-1 scope is desktop only — mobile (Device Farm) needs a build-time hook in the test app to thread env vars through to bare and is tracked as a QVAC-18111 follow-up. - `integration-test-qvac-lib-infer-llamacpp-llm.yml`: thread `qvac_perf_runs` / `qvac_perf_warmup_runs` through `workflow_call` + `workflow_dispatch` and surface them as `QVAC_PERF_RUNS` / `QVAC_PERF_WARMUP_RUNS` on the Linux/macOS and Windows test run steps. Empty string => unset, so the umbrella PR workflow continues to honour the test-side default and PR runs are unaffected by this change. Per the perf policy agreed on Slack (2026-04-30): the umbrella on-pr workflow runs perf tests at the cheap default so we don't pay full perf cost on every PR; this dedicated workflow is the only place we crank up the iteration counts to produce mean ± std numbers. Made-with: Cursor * QVAC-18111 chore[notask]: trim chatty inline comments in benchmark workflow Made-with: Cursor * QVAC-18111 chore[notask]: add run_desktop toggle to benchmark workflow_dispatch Made-with: Cursor --------- Co-authored-by: olyasir <sirkinolya@gmail.com>
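The "empty string => unset" contract can be sketched on the test side as follows. This is a hypothetical helper, not the QVAC-17830 implementation; the defaults in `PLACEHOLDER_DEFAULTS` are placeholders because the PR does not state the test-side values. Treating empty and whitespace-only values as unset is what lets the umbrella PR workflow pass an empty input without overriding anything.

```typescript
interface PerfConfig { runs: number; warmupRuns: number }

// Placeholders only: the real test-side defaults are not stated in this PR.
const PLACEHOLDER_DEFAULTS: PerfConfig = { runs: 1, warmupRuns: 0 };

function readCount(env: Record<string, string | undefined>, key: string, fallback: number): number {
  const raw = env[key];
  if (raw === undefined || raw.trim() === '') return fallback; // empty string => unset
  const n = Number(raw);
  if (!Number.isInteger(n) || n < 0) throw new Error(`${key} must be a non-negative integer, got "${raw}"`);
  return n;
}

function resolvePerfConfig(env: Record<string, string | undefined>): PerfConfig {
  return {
    runs: readCount(env, 'QVAC_PERF_RUNS', PLACEHOLDER_DEFAULTS.runs),
    warmupRuns: readCount(env, 'QVAC_PERF_WARMUP_RUNS', PLACEHOLDER_DEFAULTS.warmupRuns),
  };
}
```

Usage would be `resolvePerfConfig(process.env)` under Node; rejecting non-integers loudly avoids a typo in a dispatch input silently turning into `NaN` iterations.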
* chore(onnx-tts): bump addon-cpp to 1.1.6 Update qvac-lib-inference-addon-cpp version constraint in vcpkg.json from 1.1.5#1 to 1.1.6 and add a corresponding CHANGELOG entry under the existing [Unreleased] section. Made-with: Cursor * chore(tts): bump version to 0.8.6 Bump @qvac/tts-onnx from 0.8.5 to 0.8.6 and convert the [Unreleased] CHANGELOG section to [0.8.6] for the addon-cpp 1.1.6 release alongside the queued Chatterbox engine and tensor-helper changes. Made-with: Cursor --------- Co-authored-by: Mariusz Reichert <reichert.programming@gmail.com> Co-authored-by: GustavoA1604 <54457676+GustavoA1604@users.noreply.github.com>
* chore(whispercpp): bump addon-cpp to 1.1.6 Update qvac-lib-inference-addon-cpp version constraint in vcpkg.json from 1.1.5#1 to 1.1.6 and add a corresponding CHANGELOG entry. Made-with: Cursor * chore(whispercpp): bump version to 0.6.7 Bump @qvac/transcription-whispercpp from 0.6.6 to 0.6.7 and convert the [Unreleased] CHANGELOG section to [0.6.7] for the addon-cpp 1.1.6 release. Made-with: Cursor --------- Co-authored-by: Mariusz Reichert <reichert.programming@gmail.com>
…angelog (#1867) Brings the @qvac/cli@0.3.0 release artifacts back onto main per gitflow.md "Keep main aligned". Same shape as #1766 (the 0.2.4 backmerge precedent). - packages/cli/package.json: version 0.2.4 -> 0.3.0 - packages/cli/changelog/0.3.0/CHANGELOG.md: new - packages/cli/changelog/0.3.0/api.md: new - packages/cli/CHANGELOG.md: prepend ## [0.3.0] entry NOTE: Opened as DRAFT because the companion release PR #1836 is also still draft and 5 of its CI checks are failing. @qvac/cli@0.3.0 has not yet been published to npm (latest is 0.2.4). Mark this PR ready for review only after #1836 merges into release-cli-0.3.0 and the GPR/npm publish completes. The source-level changes (@qvac/sdk devDep ^0.10.0 + sdk.ts MIN_SDK_VERSION='0.10.0') are already on main from PR #1810 — only the release metadata needs to come back. CLI's package.json on main has no dependency drift versus release-cli-0.3.0, so unlike the SDK 0.9.2 backmerge (#1857) the package.json version bump can be safely included here. There's also no competing CLI release PR in flight on main. Co-authored-by: Dmytro Medvinskyi <functionsilence@gmail.com>
…a pushFile (#1840) * QVAC-18111 infra[notask]: scaffold Benchmark Performance (LLM) workflow_dispatch GitHub requires a `workflow_dispatch` workflow to exist on the default branch before it shows up in the Actions tab and becomes triggerable with `--ref <feature-branch>`. This lands the LLM benchmark workflow on `main` so the QVAC-17830 perf-metrics feature branch can be dispatched against it for end-to-end validation. Changes: - `benchmark-performance-qvac-lib-infer-llamacpp-llm.yml` (new): manual `workflow_dispatch` only — mirrors the structure of the existing Parakeet / Whispercpp benchmark workflows. Calls `prebuilds-...yml` then `integration-test-...yml` with bench-mode iteration counts (`QVAC_PERF_RUNS=3`, `QVAC_PERF_WARMUP_RUNS=1` by default), then aggregates desktop artifacts into a combined HTML / step-summary. Phase-1 scope is desktop only — mobile (Device Farm) needs a build-time hook in the test app to thread env vars through to bare and is tracked as a QVAC-18111 follow-up. - `integration-test-qvac-lib-infer-llamacpp-llm.yml`: thread `qvac_perf_runs` / `qvac_perf_warmup_runs` through `workflow_call` + `workflow_dispatch` and surface them as `QVAC_PERF_RUNS` / `QVAC_PERF_WARMUP_RUNS` on the Linux/macOS and Windows test run steps. Empty string => unset, so the umbrella PR workflow continues to honour the test-side default and PR runs are unaffected by this change. Per the perf policy agreed on Slack (2026-04-30): the umbrella on-pr workflow runs perf tests at the cheap default so we don't pay full perf cost on every PR; this dedicated workflow is the only place we crank up the iteration counts to produce mean ± std numbers. 
Made-with: Cursor * QVAC-18111 chore[notask]: trim chatty inline comments in benchmark workflow Made-with: Cursor * QVAC-18111 chore[notask]: add run_desktop toggle to benchmark workflow_dispatch Made-with: Cursor * QVAC-18111 infra[notask]: bridge QVAC_PERF_RUNS to mobile test app via pushFile Extends the mobile integration workflow with the same iteration-count inputs as the desktop reusable workflow, and adds a `mobile-benchmarks` job to the LLM benchmark dispatch so it covers Device Farm too. The bare runtime on Device Farm doesn't see GitHub Actions env vars, so we mirror the existing `testFilter.txt` pattern: when the workflow inputs are non-empty, the WDIO before-hook pushes a `qvacPerfConfig.txt` to the device (Android: `/data/local/tmp/`, iOS: `@bundleId:documents/`) with the iteration overrides as KEY=VALUE lines. The file-reading side on bare lives on the QVAC-17830 perf branch — without that branch this PR is a no-op (orphan file), so it is safe to land independently. Changes: - `integration-mobile-test-qvac-lib-infer-llamacpp-llm.yml`: add `qvac_perf_runs` / `qvac_perf_warmup_runs` to `workflow_call` and `workflow_dispatch`; add `__QVAC_PERF_RUNS__` / `__QVAC_PERF_WARMUP_RUNS__` placeholders to the Android + iOS WDIO config blobs and the corresponding pushFile block in the `before` hook; substitute the placeholders in `make_split`. - `benchmark-performance-qvac-lib-infer-llamacpp-llm.yml`: add a `mobile-benchmarks` job calling the mobile workflow with the bench-mode iteration counts; have `summarize` `needs:` it; drop the "desktop only" caveat in the step-summary blurb. PR runs are unchanged: empty input ⇒ empty placeholder ⇒ before-hook skips the perf-config push. Made-with: Cursor * QVAC-18111 chore[notask]: add run_mobile toggle to benchmark workflow_dispatch Made-with: Cursor --------- Co-authored-by: Proletter <40578159+Proletter@users.noreply.github.com>
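The `qvacPerfConfig.txt` bridge can be sketched as a writer/reader pair. Only the file name and the KEY=VALUE line format come from the commit message; `buildPerfConfigFile` and `parsePerfConfigFile` are hypothetical names. The real reader lives on the QVAC-17830 branch and may differ. Returning `null` when both inputs are empty models "empty input ⇒ before-hook skips the push".

```typescript
// Writer (WDIO before-hook side): null means "do not push a file at all".
function buildPerfConfigFile(runs: string, warmupRuns: string): string | null {
  const lines: string[] = [];
  if (runs !== '') lines.push(`QVAC_PERF_RUNS=${runs}`);
  if (warmupRuns !== '') lines.push(`QVAC_PERF_WARMUP_RUNS=${warmupRuns}`);
  return lines.length === 0 ? null : lines.join('\n') + '\n';
}

// Reader (bare side): tolerant KEY=VALUE parser that skips blanks, comments and malformed lines.
function parsePerfConfigFile(text: string): Record<string, string> {
  const out: Record<string, string> = {};
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === '' || line.startsWith('#')) continue;
    const eq = line.indexOf('=');
    if (eq <= 0) continue; // no key before '=': ignore
    out[line.slice(0, eq).trim()] = line.slice(eq + 1).trim();
  }
  return out;
}
```

Accepting `\r\n` on the reading side is a cheap hedge in case the file is produced on a Windows runner.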
* QVAC-18064 feat: optimize nmtcpp for Android GPU inference - Optimize nmtcpp for Android GPU inference with Vulkan backend support - Move beam search KV cache pool to CPU backend - Propagate config params after GGML context load and fix multi-GPU handling - Disable OpenCL until upstream qvac-fabric is updated - Prevent backend device accumulation and skip OpenCL comparison test - Fix clang-format for ggml_backend_load_all_from_path call - Remove Android debug logging added for Adreno 830 crash investigation - Resolve cpp-lint clang-tidy naming and implicit-bool errors - Address code review findings --------- Co-authored-by: Cursor <cursoragent@cursor.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* Port LlamacppUtils.hpp helpers to common_init_result_ptr API. Signed-off-by: Marcus Edel <marcus.edel@collabora.com> * Update vcpkg.json --------- Signed-off-by: Marcus Edel <marcus.edel@collabora.com> Co-authored-by: gianni-cor <gianfranco.cordella@tether.io>
* QVAC-17989 Add post-generation ESRGAN upscale * QVAC-17989 Add ESRGAN JS test and example * QVAC-17989 Fix upscaled output stats * Update CHANGELOG.md * Update package.json * QVAC-17989 Format ESRGAN handler changes --------- Co-authored-by: gianni-cor <gianfranco.cordella@tether.io>
* Create new buckets to run tests in independent processes. Signed-off-by: Marcus Edel <marcus.edel@collabora.com> * ci(ios): include all run ARNs in results aggregation and log download The two RUN_ARNS aggregation loops were hardcoded to iterate over indices 2..8, so the new Heavy7/Heavy8 runs (RUN_ARN_9, RUN_ARN_10) were silently dropped from the final test-results summary and the Device Farm log download. As a result, Heavy7/Heavy8 failures would not have failed the workflow and their device logs would not have been collected. Iterate up to RUN_COUNT instead, so any future bucket additions are picked up automatically. Co-authored-by: Cursor <cursoragent@cursor.com> --------- Signed-off-by: Marcus Edel <marcus.edel@collabora.com> Co-authored-by: gianni-cor <gianfranco.cordella@tether.io> Co-authored-by: Cursor <cursoragent@cursor.com>
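The fix in the iOS workflow is a shell loop; a hypothetical TypeScript rendering shows the intent. `collectRunArns` is an illustrative name, and `first = 2` mirrors the original loop's starting index. Throwing on a missing `RUN_ARN_<i>` is a design choice in this sketch. It makes a dropped bucket fail loudly instead of disappearing from the summary, which is the failure mode the commit describes.

```typescript
function collectRunArns(env: Record<string, string | undefined>, runCount: number, first = 2): string[] {
  const arns: string[] = [];
  // Bound by RUN_COUNT rather than a hardcoded 8, so new buckets are picked up automatically.
  for (let i = first; i <= runCount; i++) {
    const arn = env[`RUN_ARN_${i}`];
    if (arn === undefined || arn === '') throw new Error(`RUN_ARN_${i} is missing`);
    arns.push(arn);
  }
  return arns;
}
```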
) The "Create and Upload Test Spec" step's run: | block in integration-mobile-test-qvac-lib-infer-llamacpp-llm.yml grew to 21,074 chars after #1889, putting it just over GitHub Actions' 21,000-char limit on a single template expression. This breaks every reusable-workflow_call into the file, so the On PR Trigger (LLM) workflow fails instantly with: error parsing called workflow ... : (Line: 914, Col: 14): Exceeded max expression length 21000 and no jobs run. Every open PR that touches the LLM package is currently blocked from getting LLM CI. Fix: remove 32 in-block comment lines that were pure narration of already-readable code (echo/printf/sed) and verbose intent text duplicated by the surrounding context. Brings the run-block payload to ~19,008 chars (well under 21,000) without changing any executed logic. Diff is comments-only: 32 deletions, 0 additions. Co-authored-by: Cursor <cursoragent@cursor.com>
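A hypothetical helper for keeping an eye on the 21,000-character limit before it breaks `workflow_call`. It measures a `run: |` body and how much of it is whole-line shell comments, the content the fix removed. The count is approximate: it measures the raw string, whereas GitHub measures the expression after YAML processing. It also treats any line starting with `#` as a comment, which would misclassify such lines inside a heredoc.

```typescript
const MAX_EXPRESSION_LENGTH = 21_000; // limit quoted in the error message above

interface RunBlockReport { length: number; overLimit: boolean; commentLines: number; commentChars: number }

function inspectRunBlock(body: string, limit = MAX_EXPRESSION_LENGTH): RunBlockReport {
  let commentLines = 0;
  let commentChars = 0;
  for (const line of body.split('\n')) {
    if (line.trimStart().startsWith('#')) {
      commentLines++;
      commentChars += line.length + 1; // approximate: counts the trailing newline
    }
  }
  return { length: body.length, overLimit: body.length > limit, commentLines, commentChars };
}
```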
Co-authored-by: Proletter <40578159+Proletter@users.noreply.github.com>
…ter support to diffusion API (#1838) * feat[api]: add FLUX.2 multi-reference fusion and LoRA adapter support to diffusion API * doc[skiplog]: trim verbose lora docs and prune zod-builtin tests Address PR review: - shorten lora_apply_mode description in sdcppConfigSchema and drop the external file references the user can't see at usage time - shorten the LoRA JSDoc block in diffusion.ts to the essentials - drop unit tests that effectively re-assert zod built-ins (z.boolean(), z.string().min(1), individual enum members); keep the ABSOLUTE_PATH_PATTERN matrix, the mutual-exclusion refine, and one happy-path per new field Made-with: Cursor * test[api]: validate FLUX.2 fusion diverges from txt2img baseline and reject conflicting init_image inputs
New composite action that installs LLVM/Clang to a pinned version on Linux and Windows runners and exposes the unversioned binaries on PATH. Intended to become the single source of truth for the LLVM major used across every prebuild / cpp-test / coverage / benchmark workflow in the monorepo: bumping `version` (Linux apt major) and `windows-version` (chocolatey full pin) defaults rolls the whole repo forward in one place. - Linux: install via apt.llvm.org `llvm.sh <version> all`, then prepend `/usr/lib/llvm-<version>/bin` to `$GITHUB_PATH` so unversioned `clang`, `clang++`, `clang-format`, `clang-tidy`, `git-clang-format`, `lld`, `llvm-cov`, `llvm-profdata`, ... resolve to the chosen major. - Windows: `choco upgrade llvm --version=<windows-version> -y --allow-downgrade` (defaults to a specific patch release to avoid silent drift when chocolatey ships a new one) and add `C:\Program Files\LLVM\bin` to `$GITHUB_PATH`. - macOS: no-op (Apple Clang is set up via setup-apple-clang). Defaults: version=22, windows-version=22.1.0.
…ancel-on-first-token (#1880)
9daf47c to 39d2782
🎯 What problem does this PR solve?
- `img2img` was shipped to the SDK in QVAC-17304 feat[api]: add img2img support to SDK diffusion API #1662, but `tests-qvac` only had unit/mock coverage; real integration coverage against loaded diffusion models was missing.
- `tests-qvac/tests/shared/executors/diffusion-executor.ts` had drifted: heavy `if (testId === ...)` branching, `unknown`/`any` params, and ad-hoc PNG-size byte checks that produce false positives on compressed images.

📝 How does it solve it?
- `diffusion-tests.ts` exercising the img2img path against real loaded models:
  - `diffusion-img2img-vs-txt2img-baseline`: proves `init_image` actually changes output (byte-delta + IHDR-dimension comparison vs the txt2img baseline).
  - `diffusion-img2img-img-cfg-scale`: `img_cfg_scale` parameter accepted/rendered.
  - `diffusion-img2img-invalid-strength`: Zod rejects out-of-range `strength`.
  - `diffusion-basic-img2img`.
- `Uint8Array` resolution lives in `desktop/executors/diffusion-executor.ts` (Node `fs`); the shared executor stays React Native-clean and only sees bytes.
- `SkipExecutor` message updated; `mobile/executors/diffusion-executor.ts` removed as dead code.
- `shared/executors/diffusion-executor.ts` to be a typed reference implementation:
  - `execute()` override; replaced with a strongly-typed `handlers` map.
  - `Required<{ [K in testId]: HandlerFn<…> }>` annotation makes the map exhaustive at compile time: adding a new test without a handler is a TS error.
  - `DiffusionParams` interface (no more `unknown`/`any`); `buildParams`/`resolveParams` typed end-to-end.
  - `runBasic(resourceKey, …)` via `bind`.
  - `compareWithBaseline` helper for img2img-vs-txt2img and fusion-vs-flux2 comparisons.
  - `readPngDims`/`assertEqualPngDimensions` (parse IHDR) so we no longer false-positive on compressed-byte length differences.
- `assets/images/diffusion-img2img-source-256.png` (562 B, 256×256 RGB): keeps SD 2.1 output dimensions matching the requested 256×256 and minimizes resource cost.

🧪 How was it tested?
- `npm run install:build:full` → full diffusion suite green locally (FLUX 2 Klein).
- `tsc --noEmit` clean.
- Exhaustiveness check verified by removing a handler entry and confirming the TS error: `Property '"diffusion-standalone-upscaler-x4"' is missing in type … but required in type 'Required<…>'.`
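The two typing/validation patterns in this PR can be sketched in a few lines. This is a hypothetical reduction: `TestId`, `HandlerFn` and the handler bodies are illustrative, not the tests-qvac code. The mapped-type annotation makes every test ID a required key, so removing an entry reproduces the kind of compile error quoted above. The PNG helpers read width and height from the IHDR chunk, which the PNG specification requires to come first, so they compare decoded dimensions rather than compressed byte length.

```typescript
type TestId = 'diffusion-basic-img2img' | 'diffusion-img2img-invalid-strength';
type HandlerFn = (prompt: string) => string;

// Every TestId key is mandatory: deleting an entry fails type-checking.
const handlers: Required<{ [K in TestId]: HandlerFn }> = {
  'diffusion-basic-img2img': (prompt) => `img2img: ${prompt}`,
  'diffusion-img2img-invalid-strength': () => 'expect validation error',
};

const PNG_SIGNATURE = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];

// Layout: 8-byte signature, 4-byte chunk length, "IHDR", then big-endian
// width at offset 16 and height at offset 20.
function readPngDims(bytes: Uint8Array): { width: number; height: number } {
  if (bytes.length < 24 || PNG_SIGNATURE.some((b, i) => bytes[i] !== b)) {
    throw new Error('not a PNG');
  }
  const type = String.fromCharCode(bytes[12], bytes[13], bytes[14], bytes[15]);
  if (type !== 'IHDR') throw new Error('first chunk is not IHDR');
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  return { width: view.getUint32(16), height: view.getUint32(20) }; // DataView defaults to big-endian
}

// Compare decoded dimensions instead of compressed size, which varies with image content.
function assertEqualPngDimensions(a: Uint8Array, b: Uint8Array): void {
  const da = readPngDims(a);
  const db = readPngDims(b);
  if (da.width !== db.width || da.height !== db.height) {
    throw new Error(`dimension mismatch: ${da.width}x${da.height} vs ${db.width}x${db.height}`);
  }
}
```

Reading only the first 24 bytes keeps the check cheap and dependency-free, which suits a shared executor that must stay React Native-clean.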