infra[notask]: extend qv-devops-pr-status to scan qvac-*/github-ops/oss-actions repos by Proletter · Pull Request #2183 · tetherto/qvac

Proletter · 2026-05-21T11:47:36Z

🎯 What problem does this PR solve?

The DevOps pod's /qv-devops-pr-status dashboard was scoped to a single repo (tetherto/qvac), filtered by ownedPaths inside the monorepo. PRs in DevOps-owned satellite repos (tetherto/qvac-*, tetherto/github-ops, tetherto/oss-actions) were completely invisible to the team's stale/needs-review queue — they had to be triaged out-of-band per repo.
The single-repo assumption was baked into the shared pr-skills library (collectPRActivity only ever called fetchOpenPRs once, against the configured primary repo), so every per-pod wrapper inherited it.

📝 How does it solve it?

New optional extraRepos field on .github/teams/<pod>.json. Each entry is exactly one of {"repo": "owner/name"} (explicit) or {"match": "owner/name-glob"} (single * wildcard, resolved per-run via gh repo list <owner>, archived repos auto-excluded, primary repo de-duplicated). Backward-compatible: pods without the field (e.g. sdk) keep their exact previous behavior.
collectPRActivity now fetches the primary repo plus every resolved extra repo, tagging each PR with sourceRepo / sourceIsExtra. For PRs sourced from extra repos the ownedPaths filter is intentionally bypassed — extra repos are treated as wholly pod-owned. Repos the user cannot read are skipped with a single stderr warning rather than aborting the whole run.
extraRepos is honored only by --mode team (the only mode wired through a multi-repo workflow today). --mode review (personal queue) and --mode my (used by cross-pod qv-pr-mine) stay single-repo, so existing skills' semantics are unchanged.
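The extraRepos resolution described above can be sketched roughly as follows. This is a minimal illustration, not the code in pr-skills: `ExtraRepoEntry`, `RepoInfo`, `globToRegExp`, `resolveExtraRepos`, and the injected `listRepos` callback (standing in for `gh repo list <owner>`) are all hypothetical names.

```typescript
// Hypothetical shapes; the real pr-skills types are not shown in this PR.
type ExtraRepoEntry = { repo: string } | { match: string };
interface RepoInfo {
  nameWithOwner: string;
  isArchived: boolean;
}

// Turn a glob with at most one "*" into an anchored RegExp.
// Assumption: "*" does not cross the "/" between owner and repo name.
function globToRegExp(glob: string): RegExp {
  const parts = glob.split("*");
  if (parts.length > 2) throw new Error(`only one "*" is allowed: ${glob}`);
  const escaped = parts.map((p) => p.replace(/[.+?^${}()|[\]\\]/g, "\\$&"));
  return new RegExp(`^${escaped.join("[^/]*")}$`);
}

// Expand entries into a de-duplicated list of extra repos.
// listRepos stands in for `gh repo list <owner>` (assumed to be synchronous here).
function resolveExtraRepos(
  entries: ExtraRepoEntry[],
  primary: string,
  listRepos: (owner: string) => RepoInfo[],
): string[] {
  const resolved = new Set<string>();
  for (const entry of entries) {
    if ("repo" in entry) {
      resolved.add(entry.repo); // explicit entry, taken as-is
      continue;
    }
    const [owner = ""] = entry.match.split("/");
    const pattern = globToRegExp(entry.match);
    for (const r of listRepos(owner)) {
      // Archived repos are excluded from glob expansion.
      if (!r.isArchived && pattern.test(r.nameWithOwner)) {
        resolved.add(r.nameWithOwner);
      }
    }
  }
  resolved.delete(primary); // the primary repo is never counted as an extra
  return Array.from(resolved).sort();
}
```

Injecting `listRepos` keeps the sketch testable without network access; the real implementation presumably shells out to `gh`.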
pr-status.mjs text/JSON output:
- Adds a Repos: <primary> (primary) + N extra · <list> line above the headline summary when extra repos contributed.
- PR references render as owner/repo#<num> for extra-repo PRs and stay as bare #<num> for primary-repo PRs (in both the active sections and the Excluded section).
- JSON output gains top-level extraRepos and per-PR repo / isExtraRepo.
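The render rule in the bullets above could look something like this. It is only a sketch: the `repo` / `isExtraRepo` field names come from the JSON output described above, while `PrRef`, `formatPrRef`, `formatReposLine`, and the `, ` list separator are assumptions.

```typescript
// Per-PR fields as described for the JSON output; the interface name is hypothetical.
interface PrRef {
  number: number;
  repo: string; // "owner/name"
  isExtraRepo: boolean;
}

// Primary-repo PRs stay as bare "#<num>"; extra-repo PRs get the "owner/repo" prefix.
function formatPrRef(pr: PrRef): string {
  return pr.isExtraRepo ? `${pr.repo}#${pr.number}` : `#${pr.number}`;
}

// The "Repos:" line is only emitted when extra repos contributed.
// Assumption: the list is joined with ", "; the PR does not state the separator.
function formatReposLine(primary: string, extras: string[]): string | null {
  if (extras.length === 0) return null;
  return `Repos: ${primary} (primary) + ${extras.length} extra · ${extras.join(", ")}`;
}
```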
devops.json opts in with [{"match": "tetherto/qvac-*"}, {"repo": "tetherto/github-ops"}, {"repo": "tetherto/oss-actions"}], which today resolves to 43 extra repos (44 scanned total).
qv-devops-pr-status/SKILL.md and the shared _lib/pr-skills/README.md document the new field, the owner/repo#<num> render rule, and the read-permission failure mode.

🧪 How was it tested?

End-to-end run against the live tetherto org from a clean main checkout:

node .cursor/skills/_lib/pr-skills/pr-status.mjs --pod devops --mode team --authors pod

Primary repo (tetherto/qvac) is correctly excluded from the extra-repo list (de-dup works).
The tetherto/qvac-* glob expansion, together with the explicit tetherto/github-ops and tetherto/oss-actions entries, yields 43 non-archived extra repos; with the primary repo, 44 repos are scanned in total.
Both github-ops and oss-actions surfaced PRs (github-ops#111 in Needs Review; oss-actions#19/#24/#25 and github-ops#63/#102 in Excluded), confirming the explicit {"repo": ...} entries land alongside the glob expansion.
Headline: 22 PRs need attention · 0 fully approved · 0 need your re-review · 14 stale · 4 ⚠️ merge conflicts · 85 excluded. Counts cross-check against per-section bullet counts.
Primary-repo PRs (#2169, #1657, #2050, etc.) render with bare #<num>. Extra-repo PRs (e.g. tetherto/qvac-devops#119, tetherto/qvac-ext-marian-dev#5) render with the owner/repo#<num> prefix in both Stale / Needs Review and Excluded sections.
Merge-conflict flag (⚠️ MERGE CONFLICTS!) propagates from extra-repo PRs (e.g. qvac-devops#136/#137/#144, qvac-internal#11) into both the per-PR block and the headline conflict count.
Empty stderr; no repo skipped for permissions.
Wall time: ~77s for 44 repos (sequential paged GraphQL). Acceptable for a once-a-day dashboard.

JSON-mode parity check (--json) confirms extraRepos and per-PR repo / isExtraRepo are populated as expected.

SDK pod (.github/teams/sdk.json) has no extraRepos, and dry-runs of --pod sdk --mode team confirm its output is unchanged byte-for-byte (only-primary-repo fetch path).

* feat[api]: route responseFormat via per-request generationParams (QVAC-17939) Adds an OpenAI-compatible `responseFormat` to `completion()`: - `{ type: 'text' }` (default, free-form) - `{ type: 'json_object' }` (any valid JSON) - `{ type: 'json_schema', json_schema: { name, schema, ... } }` Forwards the schema to the addon as a per-request `generationParams.json_schema`, which the addon converts to GBNF and applies for the duration of the request only — avoiding the previous shared-`modelConfig.grammar` mutation, which was unsafe under concurrent completions and didn't actually flow per-request anyway. `tools` and a non-text `responseFormat` are mutually exclusive at the schema layer (tools already constrain output via their parameter schema). Bumps `@qvac/llm-llamacpp` to ^0.17.1 for the new addon API. Includes `examples/llamacpp-structured-output.ts` demonstrating all three modes against `QWEN3_600M_INST_Q4`. * review: address #1768 comments — events/final example + json_schema field docs (QVAC-17939) - Migrate `examples/llamacpp-structured-output.ts` from the legacy `tokenStream` surface to the canonical `events` / `final` API recommended for new SDK code. Streaming now consumes `contentDelta` events and aggregates via `final.contentText`. - Document `json_schema.description` and `json_schema.strict` as accepted-for-OpenAI-compatibility-only on `responseFormatSchema` via `.describe()` annotations, and on the `responseFormat` JSDoc in `client/api/completion-stream.ts`. Both fields are accepted by the schema but not forwarded to the addon — `getResponseFormatJsonSchema()` only forwards `json_schema.schema`, and `strict: true` does NOT trigger OpenAI's auto-tightening (implicit `additionalProperties: false`, all properties required). Callers wanting strict validation must encode it explicitly in `schema`. Honoring `strict` semantics natively is tracked as a follow-up — out of scope for this PR. * review: drop double blank line in completion-stream.ts (QVAC-17939)
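As a rough illustration of the three `responseFormat` modes and the tools exclusivity rule described in the commit message above, consider the sketch below. The names `ResponseFormat`, `CompletionRequest`, and `toGenerationParams` are hypothetical, and mapping `json_object` to an empty schema is an assumption (the commit message does not say how that mode reaches the addon). Only `json_schema.schema` is forwarded, matching the note that `description` and `strict` are accepted but not passed on.

```typescript
// Hypothetical type mirroring the three documented modes.
type ResponseFormat =
  | { type: "text" }
  | { type: "json_object" }
  | {
      type: "json_schema";
      json_schema: {
        name: string;
        schema: Record<string, unknown>;
        description?: string; // accepted for compatibility, not forwarded
        strict?: boolean; // accepted for compatibility, not forwarded
      };
    };

interface CompletionRequest {
  responseFormat?: ResponseFormat;
  tools?: unknown[];
}

// Build per-request generation params instead of mutating shared model config.
function toGenerationParams(
  req: CompletionRequest,
): { json_schema?: Record<string, unknown> } {
  const format: ResponseFormat = req.responseFormat ?? { type: "text" };
  // tools and a non-text responseFormat are mutually exclusive.
  if (format.type !== "text" && req.tools !== undefined && req.tools.length > 0) {
    throw new Error("tools and a non-text responseFormat cannot be combined");
  }
  switch (format.type) {
    case "text":
      return {};
    case "json_object":
      // Assumption: an empty JSON Schema, which accepts any JSON value.
      return { json_schema: {} };
    case "json_schema":
      return { json_schema: format.json_schema.schema };
  }
}
```

Because the params are derived per call, two concurrent completions with different schemas never share state, which is the property the commit message is after.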

…moke (#1797) * feat: add pre-terminate cleanup signal for SDK clients Lets a client request a clean addon teardown before tearing the bare runtime down, so addon static state (e.g. js_ref_t handles into the worker V8 isolate) is released while that env is still alive. Without this, tearing down a runtime whose addons retain isolate-bound refs trips a V8 GlobalHandles assertion (brk 0 / SIGTRAP) inside the next runtime that re-imports the same .bare files in the same OS process — the JsLogger.setLogger path in qvac-lib-inference-addon-cpp is the reproducer (every addon that links it has the same retention). - worker-core.ts: extract the existing shutdown body into a reusable cleanupForTerminate() that runs the same registry / model / resource cleanup but skips releaseWorkerLock() and process.exit(). The full shutdownBareDirectWorker still runs both for desktop signal and exit paths. - handler-utils.ts + handle-request.ts: new internal __shutdown__ message dispatched alongside __init_config. Bypasses the schema, awaits cleanupForTerminate(), and replies success. Lazy-imports the worker-core function to break the handler-utils -> worker-core -> create-server -> handle-request import cycle. - bare-client.ts: mirror the message in the in-process mock RPC for desktop direct-mode (Pear-style) consumers. - expo-rpc-client.ts: close() is now async; sends __shutdown__ over RPC and awaits the success reply (with a 10s timeout safety) before calling worklet.terminate(). Best-effort: timeouts log a warning and proceed with terminate. The auto-close path in unload-model.ts already awaits close(), so this is non-breaking for that caller. * test: stabilise mobile smoke run via eviction-on-none and post-unload settle Two related fixes that together let the mobile smoke run progress past the "previous heavy model still resident" memory ceiling: - resource-lifecycle: tests with dependency:none used to skip evictExcept and leave whatever was loaded by the previous test resident. 
Now treated as evictExcept([]), so a heavy model from the prior test gets unloaded before the next one starts allocating. Empirically this is what kept tripping sharded-model-load right after translation-afriquegemma-sw-en (afriquegemma 4B leaves ~550 MB resident; sharded then asks for multi-GB on top and hits the iOS memory limit). - resource-manager: new ResourceManager({ unloadSettleMs }) option that sleeps for the configured duration after a successful unloadModel (only on success — failure path returns immediately). Lets the kernel release pages before the next load starts allocating. Defaults to 0 (off, desktop is fine without it). Mobile consumer opts in to 100ms. Mobile consumer also picks up SkipExecutor entries for the lifecycle-suspend tests; suspend hangs the runner indefinitely on mobile because the lifecycle coordinator pauses MQTT and never resumes within the test timeout. * chore: bump qvac-test-suite to ^0.6.2 Picks up: - in-app memory poller in mobile-consumer template - desktop in-app memory poller (process-tree RSS) - Memory tab + per-test memory metrics in HTML/JSON reports - bucket results by metadata.category instead of testId-prefix split Required by the eviction / settle work in this PR; both depend on the new MemorySummary fields and the corrected category bucketing. * fix: split cleanupRan and isShuttingDown so shutdown still releases lock cleanupForTerminate previously set the same isShuttingDown flag that shutdownBareDirectWorker uses as its early-return guard. After a __shutdown__ message ran the pre-terminate cleanup, a subsequent SIGTERM / SIGINT / uncaught-exception in desktop direct mode would early-return at the guard and skip releaseWorkerLock() + process.exit(). Result: lock file leak and no graceful exit. Mobile is unaffected because each Worklet has its own module instance (fresh isShuttingDown per worklet). 
The bug only bites the bare-client mock-RPC path (Pear-style consumers where the worker shares the host process for its lifetime). Two flags now: - cleanupRan: idempotent guard around runCleanup body - isShuttingDown: only set by shutdownBareDirectWorker; cleanupForTerminate must NOT set it shutdownBareDirectWorker still calls runCleanup which is now a no-op when cleanupRan is already true. * fix: serialise expo-rpc-client.close() to avoid duplicate __shutdown__ races If two callers race close() (or one calls close() while another getRPC() is mid-flight), the second sees rpcInstance still set, fires a redundant __shutdown__, then re-enters the terminate block on already-null state. Wrap the body in a singleton closingPromise; concurrent callers share the same in-flight close. Reset to null in finally so a fresh worker brought up later can be cleanly closed again. The auto-close path in unload-model.ts is naturally serialised today so this is robustness rather than fixing an active bug, but the cost is minimal and the failure mode (double __shutdown__ after terminate) is annoying to diagnose. * fix: skip Worklet.terminate() on non-iOS platforms Worklet.terminate() crashes on Android: addon dlclose unmaps the lib but pthread_key_t destructors registered by some addons (likely rocksdb-native, libbare-tls, libbare-crypto) are never pthread_key_delete'd before unload, so libc's per-thread cleanup table points at unmapped memory and the next pthread_exit SIGSEGVs in pthread_key_clean_all(). iOS dyld no-ops dlclose for already-loaded third-party libs, so the dangling-destructor problem cannot manifest there. The terminate path stays enabled on iOS. On non-iOS, fall back to the legacy refs-only close: drop rpcInstance and rpcPromise, leave workletInstance + workletInitialized intact so the next getRPC() reuses the live worklet. 
Skip the __shutdown__ roundtrip too -- it would clear the worker plugin registry without a follow-up terminate, leaving the worker unusable for subsequent loadModel. Trade-off: Android tests no longer recover memory between heavy tests the way iOS now does, so memory accumulates across the smoke run. On Pixel-class devices (8+ GB RAM) this is fine; smaller-RAM Android devices may regress vs the pre-PR baseline. Acceptable until the upstream holepunchto/bare exposes a per-addon unload hook. Platform is resolved via the existing getRuntimeContext() path (getDeviceInfo handles a missing expo-device safely via dynamic import + try/catch), so no new react-native imports are added. * test: skip diffusion-streaming-progress on mobile The test reliably times out on mobile (Android Pixel 10 Pro hit the 600s timeout in the latest smoke run). Test framework drops the await on timeout but the underlying streaming inference keeps running on the Bare worker side, leaving the diffusion model "in use" from the runtime's perspective. Knock-on effect: any later test whose modelSetup needs to evict diffusion (e.g. wrong-model-transcribe-on-llm via ResourceManager.evictExcept) blocks indefinitely waiting for the stream to finish. Observed in local-android-smoke: 86/88 tests completed, then the runner stuck for 50+ minutes inside the eviction of diffusion at test 86's setup. Skipping unblocks the smoke run end-to-end. The proper fixes (framework-side cancel-on-timeout, resource-manager bounded waits) are tracked separately.
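The `closingPromise` serialisation described in the `expo-rpc-client.close()` commit above can be sketched as follows. This is an illustration under assumptions, not the SDK code: `WorkerClient`, `sendShutdown`, and `terminate` are hypothetical stand-ins, and the 10s timeout and the iOS-only terminate gate are left out for brevity.

```typescript
// Hypothetical client wrapper; sendShutdown/terminate are injected stand-ins
// for the __shutdown__ RPC roundtrip and worklet.terminate().
class WorkerClient {
  private closingPromise: Promise<void> | null = null;
  private readonly sendShutdown: () => Promise<void>;
  private readonly terminate: () => void;

  constructor(sendShutdown: () => Promise<void>, terminate: () => void) {
    this.sendShutdown = sendShutdown;
    this.terminate = terminate;
  }

  // Concurrent callers share one in-flight close instead of racing.
  close(): Promise<void> {
    if (this.closingPromise) return this.closingPromise;
    const inFlight = this.doClose();
    this.closingPromise = inFlight;
    // Reset once settled so a worker brought up later can be closed again.
    const reset = (): void => {
      if (this.closingPromise === inFlight) this.closingPromise = null;
    };
    inFlight.then(reset, reset);
    return inFlight;
  }

  private async doClose(): Promise<void> {
    try {
      await this.sendShutdown(); // let the worker release addon state first
    } catch {
      // Best-effort: a failed shutdown request still proceeds to terminate.
    }
    this.terminate();
  }
}
```

The reset runs in a `then` callback rather than inline so it can never fire before `closingPromise` has been assigned.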

* fix[api]: deterministic decoding for LLM translate Force greedy decoding with a fixed seed and bounded output length on every LLM translate call (non-African branch) so output is reproducible across calls and runaway generations cannot blow ctx_size on the next call. Background: with @qvac/llm-llamacpp 0.17.x, calling `translate()` against Salamandra (loaded with no decoding params) intermittently produced verbatim source echo, "Translation in Spanish:" preambles, or `processPromptImpl: context overflow` on tiny inputs like "bank". The flake was non-deterministic across runs on the same input, masked in the smoke suite by `contains-any` validators that still matched a Spanish keyword inside a preamble. The change is one call site: when the model is llamacpp-completion and the prompt is not the AfriqueGemma path, pass per-call generationParams overriding sampling for that runJob: - temp/top_k/top_p collapse to greedy - repeat_penalty: 1.3 breaks single-token echo loops (e.g. greedy "bank" -> "bank\nbank\n...") - seed: 42 pins anything residual sampling - predict: 256 caps output so a runaway can't accumulate KV state Prompt template, NMT branch, and African branch are unchanged. AfriqueGemma is loaded with its own deterministic config + stop_sequences already, so we skip the override there. Verified locally on @qvac/llm-llamacpp 0.17.1 with 30 calls (streaming + en-es + context, 10 iterations each): - before: 23/30 pass with 2 echoes, 2 ctx-overflow, 3 echoes - after: 30/30 pass, all outputs identical across iterations * refactor: extract LLM translate generation params into named constant Pull the per-call sampling overrides for LLM translate out of the call site into a top-of-file constant with a comment that explains the purpose of each field. No behavior change — values are identical to the previous commit. 
Adding a third translate-friendly LLM model later still goes through this single constant unless it needs different sampling, in which case it would warrant a small profile lookup keyed on model family. That restructure is deferred until a concrete second profile lands. * refactor[api]: skip per-call sampling override for AfriqueGemma Apply the per-call deterministic-decoding override only to non-AFRICAN_* LLM models. AfriqueGemma's load-time `modelConfig` carries `stop_sequences: ["\n"]` and `repeat_penalty: 1`, and these values must not be overridden mid-call: with `repeat_penalty: 1.3`, the addon penalises "\n" and the stop never fires, so generation runs all the way to `predict` and produces non-translation output. The earlier attempt to dispatch by `afriquePrompt` (language-pair-derived) silently did nothing for the actual AfriqueGemma traffic: `isAfrican("sw")` returns `false` because `AFRICAN_LANGUAGES_MAP` is keyed by FLORES codes (`"swh_Latn"`), not the ISO codes the smoke tests pass. This commit dispatches by model name (entry.local.name starts with "AFRICAN_") and falls back to `model.run(input)` with no override — identical to the pre-fix call shape — so AfriqueGemma's behaviour is preserved exactly as it is on main. A latent AfriqueGemma garbage-output issue exists at HEAD regardless of this PR; that is out of scope. The constant is renamed `LLM_TRANSLATE_GENERATION_PARAMS` since it now applies to every non-skipped LLM, not just Salamandra. * refactor: tighten typing on per-call generation params Pull `RunOptions` and `GenerationParams` from `@qvac/llm-llamacpp` and use them in place of the loose `Record<string, number>` cast in `translate()`. Define a `LlmTranslateGenerationParams` alias as the specific subset of `GenerationParams` we set per call (six fields, required) so a typo on any of them is a compile error. 
The cast on `model.run.bind(model)` now references the addon's `RunOptions` shape directly, which keeps us protected if the addon's option shape changes. No behaviour change. --------- Co-authored-by: Dmytro Medvinskyi <functionsilence@gmail.com>
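For orientation, the constant and the model-name dispatch described in these commits might look roughly like this. The `repeat_penalty`, `seed`, and `predict` values are the ones quoted above; the concrete `temp` / `top_k` / `top_p` values are assumptions (the message only says they "collapse to greedy"), and `generationParamsFor` is a hypothetical helper name.

```typescript
// Six required fields, so a typo on any of them is a compile error.
interface LlmTranslateGenerationParams {
  temp: number;
  top_k: number;
  top_p: number;
  repeat_penalty: number;
  seed: number;
  predict: number;
}

const LLM_TRANSLATE_GENERATION_PARAMS: LlmTranslateGenerationParams = {
  temp: 0, // assumed greedy value
  top_k: 1, // assumed greedy value
  top_p: 1, // assumed greedy value
  repeat_penalty: 1.3, // breaks single-token echo loops
  seed: 42, // pins any residual sampling
  predict: 256, // caps output so a runaway cannot accumulate KV state
};

// Dispatch by model name: AFRICAN_* models keep their load-time config
// (stop_sequences and repeat_penalty: 1), so no per-call override is returned.
function generationParamsFor(
  modelName: string,
): LlmTranslateGenerationParams | undefined {
  return modelName.startsWith("AFRICAN_")
    ? undefined
    : LLM_TRANSLATE_GENERATION_PARAMS;
}
```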

…ios, raise c… (#1773)
* test[notask]: stabilize mobile e2e (skip afriquegemma on ios, raise chatterbox timeout)
* test[notask]: inactivity timeout bumps
* test[notask]: wait out addon busy throw in logging tests
* test[notask]: increase timeouts
* test[notask]: skip diffusion on ios, drop kv-cache math assertion, revert heartbeat to 300s
* test[notask]: bump parakeet-ctc-mp3 timeout for mobile cold-load
* test[notask]: fix mobile sentence-stream + parakeet-ctc-mp3 timeout + stop-sequences flake
* test[notask]: update types for tts executors
* test[notask]: bump mobile e2e timeouts (device-farm 90m, consumer 1200s)
* test[notask]: android skip diffusion tests
* test[notask]: revert consumer-inactivity-timeout input for mobile workflows

…at/completions (#1810) * feat[api]: response_format support in OpenAI-compat /v1/chat/completions (QVAC-17939) Wires OpenAI's `response_format` field through the CLI serve adapter: - `{ type: "text" }` (default, free-form) - `{ type: "json_object" }` (any valid JSON) - `{ type: "json_schema", json_schema: { name, schema, ... } }` The adapter validates the shape, removes `response_format` from `UNSUPPORTED_PARAMS`, and forwards it to the SDK's `responseFormat` parameter, which converts the JSON Schema to GBNF in the addon and applies it for the duration of the request only. Depends on the SDK PR #1768 (`@qvac/sdk@>=0.10.0`) for the actual constraint to take effect end-to-end. Without that landed, the SDK silently ignores `responseFormat`. Includes 11 new `extractResponseFormat` unit tests in `test/translate.test.ts` covering valid shapes, optional fields, and validation errors. * chore: bump @qvac/sdk dep floor to ^0.10.0 CLI now consumes the per-request `responseFormat` API added in qvac-sdk 0.10.0 (PR #1768 / QVAC-17939). Caret-pinned to ^0.10.0 to keep upgrades within the same minor line. * chore: bump runtime MIN_SDK_VERSION to 0.10.0 Matches the package.json devDep floor bumped in the previous commit. The runtime guard in serve/core/sdk.ts otherwise still accepted SDK 0.9.x, which lacks the responseFormat parameter. Addresses opaninakuffo's review on PR #1810.
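The runtime guard mentioned in the last commit could be sketched along these lines. The actual check in serve/core/sdk.ts is not shown in this PR, so `parseVersion` and `isSdkVersionSupported` are hypothetical; only the `MIN_SDK_VERSION` name and its `0.10.0` value come from the commit message.

```typescript
const MIN_SDK_VERSION = "0.10.0";

// Split "major.minor.patch[-prerelease]" into numeric parts.
function parseVersion(version: string): number[] {
  const [core = ""] = version.split("-");
  return core.split(".").map((part) => Number.parseInt(part, 10));
}

// Numeric, component-wise comparison against the minimum version.
function isSdkVersionSupported(
  installed: string,
  minimum: string = MIN_SDK_VERSION,
): boolean {
  const a = parseVersion(installed);
  const b = parseVersion(minimum);
  for (let i = 0; i < 3; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    if (x !== y) return x > y;
  }
  return true; // equal versions are supported
}
```

Comparing components numerically matters here: a plain string comparison would rank "0.9.5" above "0.10.0".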

infra: align cpp-lint with reusable-prebuilds and gate it on PR-changed files Refactor the reusable cpp-lint workflow so its bootstrap is the same as `reusable-prebuilds.yml`, and tighten the linters so the step actually fails PRs on findings. Toolchain bootstrap: - Replace inline LLVM/Vulkan/vcpkg/Node setup with the shared composite actions used by prebuilds (`setup-build-host`, `setup-aws-prebuild`, `setup-vcpkg`, `setup-bare-tooling`, `setup-vulkan-sdk`). - Drop divergent scaffolding no caller exercises: the ssh/https `git insteadOf` rewrites, the `@tetherto:registry` npmrc setup, the `~/.npm` cache, and the hardcoded `node-version: 18.x` override (default `lts/*` now matches prebuilds). - Move `runs-on` from `ai-run-linux` to `ubuntu-22.04` to match prebuilds and decommission self-hosted usage for this job. - Add an optional `include-vulkan-sdk` boolean input (default `true`) mirroring the same input on `reusable-prebuilds.yml` so ONNX/decoder addons can opt out in a follow-up. Inputs / refs: - Resolve `BASE_SHA` / `HEAD_REF` / `HEAD_REPO` up-front with documented fallbacks (`inputs.sha → github.event.before → HEAD~1`) so `workflow_dispatch` runs work without a PR context. The `sha` input is now optional; existing callers already pass it. Lint scope and gating: - Drop `bare-make build` / `bare-make test` (duplicated the Linux x64 prebuild work; ctest finds nothing without `BUILD_TESTING=ON`, and the gtest binary belongs to the dedicated `cpp-tests-*.yml` workflows). - Keep `bare-make generate -D ENABLE_VULKAN=OFF` because clang-tidy needs the resulting `compile_commands.json`. `ENABLE_VULKAN=OFF` is honoured today only by `qvac-lib-infer-whispercpp` and is a no-op CMake variable elsewhere. - Scope `clang-tidy` to the C/C++ translation units touched by the PR by intersecting `git diff $BASE_SHA HEAD` with the TU list emitted by `clang-tidy-helper.py --files`. Header-only and removed-source diffs short-circuit cleanly. 
Reuse the helper's `--clang-tidy-cmd` output for the `-p` build dir and `--header-filter` regex; only the file list is replaced. - Pass `--warnings-as-errors='*'` to `clang-tidy-19` so every check enabled in the shared `qvac-lint-cpp/.clang-tidy` `Checks:` is promoted to an error and the step's exit code reflects findings (the shared config sets `WarningsAsErrors: ''`, which by itself would let clang-tidy print warnings and exit 0). - Wire `clang-tidy` into the existing failure tally alongside `cpp_files_fmt`. Toolchain pinning: - Pin both linters to their LLVM-19 binaries explicitly: `git-clang-format-19 --binary clang-format-19` and `clang-tidy-19`. Removes the dependency on whichever unversioned `clang-format` / `clang-tidy` wins PATH resolution (Ubuntu's stock 18.x was being picked up before this fix). The `::error` annotation contributors copy-paste locally is updated to match. cpp-lint remains advisory at the on-pr workflow level — no `prebuild`, `cpp-tests*`, integration test, or ts-checks job lists cpp-lint in `needs:`, so a failing cpp-lint doesn't block the addon build/test path; merge-guard semantics are unchanged.
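The base-SHA fallback and the changed-file intersection described above boil down to something like the following sketch. The workflow itself does this in shell with `git diff` and `clang-tidy-helper.py --files`; `resolveBaseSha` and `selectTranslationUnits` are hypothetical names used only to illustrate the logic.

```typescript
// Fallback chain: inputs.sha, then github.event.before, then HEAD~1.
// Empty strings count as "not provided", as in a workflow_dispatch run.
function resolveBaseSha(
  inputSha: string | undefined,
  eventBefore: string | undefined,
): string {
  return inputSha || eventBefore || "HEAD~1";
}

// Keep only the translation units that the diff actually touched.
// Header-only or removed-source diffs produce an empty list, so the
// clang-tidy step can short-circuit.
function selectTranslationUnits(
  changedFiles: string[],
  translationUnits: string[],
): string[] {
  const changed = new Set(changedFiles);
  return translationUnits.filter((tu) => changed.has(tu));
}
```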

* fix Vulkan SDK installation verification
* run vulkaninfo on win11-rtx4000-hetzner
* add job build-win11-nvidia-image
* fix missing double quoted string value in powershell
* run vulkaninfo on ai-run-windows11-gpu
* install vulkan sdk on ai-run-windows11-gpu
* fix typo
* fix typo
* correct vulkaninfoSDK.exe path
* Force WDDM Mode for Vulkan to work
* nvidia-smi -fdm 0
* install nvidia grid drivers instead of datacenter driver
* use .NET HttpClient with Progress Tracking
* fix httpclient instance creation for newer powershell
* Add-Type -AssemblyName System.Net.Http
* try wget
* test run vulkaninfo on ai-run-windows11-gpu
* install git in win11-nvidia-grid-image
* disable interactivity when installing git with winget
* git version separate step after winget install Git
* test
* test
* install winget before winget install git
* install winget from PSGallery
* fix
* fix2
* use Cyberboss/install-winget@v1
* add --source winget to winget install
* add --custom '/o:PathOption=CmdTools' to winget installation of git for windows
* do not verify git installation in path in same job where it gets installed (needs new job run)
* verify that git is in the path for ai-run-windows11-gpu
* uncomment needed lines
* Install Chocolatey on win11-nvidia-grid-image
* s/pwsh/powershell/
* verify choco installation
* verify vcpkg
* clone vcpkg
* vulkaninfo check
* install visual studio build tools, LLVM, and Vulkan SDK in base image for win11 gpu runner
* do not install vulkan sdk on base win11 gpu image
* rename vulkaninfo to win11-nvidia-image-builder.yml
* add vulkaninfo.yml

…1796)
* doc: content update - sdk - completion
* doc: content new - SDK - runtime lifecycle
* doc: update sidebar - add new page - runtime lifecycle
* doc: content update - SDK - diffusion - add img2img gen
* doc: replacement for PR 1735 to be closed
* doc: new code example - SDK - img gen img2img with flux2
* doc: fix broken link in completion page
* doc: sdk - create new example - img2img with klein
---------
Co-authored-by: gianni-cor <gianfranco.cordella@tether.io>

…1819)

* fix: handle Harmony <|call|> EOG token for GPT-OSS tool calling
  GPT-OSS models use <|call|> as a frame delimiter in the Harmony tool-call protocol. This token is in the EOG set, causing generation to stop silently before tool calls reach the SDK. Add Harmony model detection and <|call|>-specific handling in the generation loop: render the token as visible text (special=true) so the SDK can parse frame boundaries, then stop generation for the turn-based tool execution protocol.
* Example added to showcase GPT-OSS multi-turn tool call
* CHANGELOG added, package version bumped to 0.19.1
* CPP lint applied
---------
Co-authored-by: gianni-cor <gianfranco.cordella@tether.io>

* doc: create new page - sdk - new ai capability - voice assistant
* doc: create new page - sdk - voice assistant
* doc: remove temporary doc that leaked into commit
* doc: content new - sdk - voice assistant - PR review

…1834) GitHub only registers `workflow_dispatch` for workflow files that live on the default branch. The full benchmark orchestrator is wired in PR #1677 (QVAC-17092) but cannot be triggered from the Actions UI or `gh workflow run` until the file exists on `main`. This stub adds the file with a no-op job so the workflow becomes dispatchable. Dispatching with `--ref <branch>` runs the *branch's* version of the YAML, which is how we'll validate PR #1677's orchestrator end-to-end before merging it. The stub will be overwritten by PR #1677's real workflow on merge.

* test[notask]: fix android sharded-model-resume scudo oom Tag sharded-model {detection, hash-validation, progress, resume, cancellation} with dependency "sharded-embeddings" instead of "none". With dep:none and the default loadSharded handler, modelSetup evicts sharded-embeddings then immediately re-mmaps the same 5 shards; on Android (Pixel 10 Pro) Scudo's mmap fails with "internal map failure" before the kernel reclaims the prior maps, killing the worklet and cascading the rest of the sharded category. Also bump mobile unloadSettleMs 100 -> 200 ms to keep some slack for remaining same-model unload / reload paths. * infra[notask]: pass --report-dir to CI sdk producer runs Without --report-dir, BatchOrchestrator skips writing app-mem.ndjson and test-timeline.ndjson, so mobile in-app memory samples published on qvac/app-memory get dropped silently and the per-test memory rows / chart / suite peak never make it into the report. run:local already passes --report-dir; the three CI workflows did not. Pin --report-dir=./reports in test-android-sdk.yml, test-ios-sdk.yml and test-desktop-sdk.yml. The existing "Upload results" step already uploads ${working-directory}/reports/ so the new files ride along. --------- Co-authored-by: Victor-Rodzko <victor.rodzko@itrexgroup.com> Co-authored-by: Opanin Akuffo <46673050+opaninakuffo@users.noreply.github.com>

…se/workaround bare double creation issue, harden addon-cpp workflows (#1825) * add JS integration CI for inference addon Run addon JS integration packages across desktop targets so callback lifetime and platform-specific runtime issues are exercised in PR checks. * fix output callback lifetime cleanup Keep callback state alive until the libuv async handle is closed so pending JS output delivery cannot observe freed addon storage during teardown. * add JS number creation integration test Add a minimal integration package that exercises first double creation through js::Number and keeps js_create_int32 as a control across the JS test matrix. * work around Windows js_create_double first call Route addon double creation through js::Number so win32 can burn the first js_create_double call observed to produce an invalid value on GitHub Azure runners. * version * fix Windows double burn-in guard Use a function-local static initializer for the Windows-only js_create_double burn-in. This keeps the workaround process-wide without carrying template-level atomic state before returning the requested double value. * harden PR JS test workflow Limit the addon-cpp JS PR tests to read-only repository access and avoid release secrets, inherited secrets, and persisted checkout credentials. Install fixture dependencies without lifecycle scripts so PR-controlled package code cannot run during setup. * fix PR JS workflow trust boundary Run addon-cpp JS tests from the unprivileged pull_request workflow so PR-controlled code is not executed from pull_request_target. Keep a lightweight verify-label gate for external forks while leaving the privileged workflow focused on authorization and native tests. * harden addon-cpp PR workflows Move PR-controlled native test execution out of pull_request_target so cache restore and builds run without write-capable credentials, inherited secrets, or PAT-backed checkouts. 
Keep pull_request_target limited to clearing the verify label when new commits arrive, and make unverified PR events skip the native/JS matrices without failing the workflow. --------- Co-authored-by: gianni-cor <gianfranco.cordella@tether.io>

…flow_dispatch input (#1824) Co-authored-by: Matt Cavanagh <1789097+darkynt@users.noreply.github.com> Co-authored-by: tamer-hassan-tether <tamer.hassan@tether.io>

The publish step now uses npm trusted publishing (OIDC), so the legacy NPM_TOKEN auth in the build step's .npmrc and bunfig.toml is no longer needed. @qvac/* packages on npmjs.org are public, so anonymous reads work for installs. Aligns with #1618 (DEVOPS-2062) which migrated the publish step to trusted publishing but left the build-step NPM_TOKEN references in place.

… CMakeLists (#1852) The version in CMakeLists.txt kept drifting from vcpkg.json (currently 1.1.4 vs 1.1.5) but was never consumed: no downstream package calls find_package(qvac-lib-inference-addon-cpp <version> ...) — they all use find_path() to locate headers. Remove the project VERSION and the generated ConfigVersion.cmake so vcpkg.json stays the single source of truth. Bump vcpkg.json version to 1.1.6, as it drifted from the version in the registry.

…dk publish doesn't auto-skip (#1853) publish-npm needs [build, publish-logic, release-merge-guard]. On a manual workflow_dispatch from a release-sdk-* branch, the guard's if: rejected the event (push only), so the guard was skipped, and GitHub Actions' implicit success() check on needs auto-skipped publish-npm before its if: with the explicit needs.release-merge-guard.result == 'skipped' branch could even be evaluated. Allow the guard to run on workflow_dispatch too. The guard already handles workflow_dispatch safely: github.event.before is empty, so base-sha is empty, so isInitialPush is true and the changelog diff check is skipped. The branch-name pattern check and the package.json-version-matches-branch check still run, which is what we want for a manual release publish. Net effect: manual publish-sdk dispatches on release branches now actually reach the publish-npm job instead of silently skipping.

Co-authored-by: GustavoA1604 <54457676+GustavoA1604@users.noreply.github.com>

…branch dispatch (#1856) Co-authored-by: Ishan Vohra <ishanvohra@Ishans-MacBook-Air.local>

Co-authored-by: GustavoA1604 <54457676+GustavoA1604@users.noreply.github.com>

#1835) The bare worker leaks indefinitely when started while another SDK process holds the registry corestore lock. Root cause: `corestoreOpts: { wait: true }` issues a blocking `flock(LOCK_EX)` on a libuv worker thread that JS cannot cancel, so when SIGTERM/IPC-disconnect arrives, the in-flight `client.ready()` never resolves (cleanup early-returns with `registryClient = null`) and `process.exit()` cannot terminate Bare while the native handle is held. The OS process wedges forever, breaking the three `no-lingering-bare-*` e2e tests in mixed-suite runs. `wait: true` was deliberately added by #1480 (QVAC-12232) to tolerate transient lock contention during another SDK's startup/shutdown; reverting to the bare default would re-introduce that bug. Instead, switch to `wait: false` (tryLock) and provide an equivalent JS-bounded retry budget in the existing retry loop: - 8 attempts, 250 ms base backoff, capped by a 10 s deadline - each step is a fresh non-blocking syscall — `EBUSY` surfaces to JS immediately, so shutdown remains cancellable at every point - exhausted budget propagates the underlying error, hitting the existing `closeRegistryClient` early-return on `null` and letting `process.exit()` terminate the worker cleanly As defense in depth, arm a 3-second SIGKILL safety net in `shutdownBareDirectWorker` (unrefed timer) before calling `process.exit`, so any future blocking-handle bug can't survive shutdown. Covered by existing `no-lingering-bare-{sigterm,close,ipc-disconnect}` e2e tests, which now pass in mixed-suite runs. Co-authored-by: Dmytro Medvinskyi <functionsilence@gmail.com>
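The JS-bounded retry budget described above can be sketched as follows. The 8 attempts, 250 ms base, and 10 s deadline come from the commit message; the doubling backoff, the `planBackoff` / `openWithRetry` names, and the injected `tryOpen` / `sleep` callbacks are assumptions. The plan also ignores time spent inside each attempt, which a real implementation would presumably measure against a clock.

```typescript
const MAX_ATTEMPTS = 8;
const BASE_BACKOFF_MS = 250;
const DEADLINE_MS = 10000;

// Waits inserted between attempts, truncated so their sum stays within the deadline.
// Assumption: the backoff doubles on each retry; the commit only states the base.
function planBackoff(
  maxAttempts: number = MAX_ATTEMPTS,
  baseMs: number = BASE_BACKOFF_MS,
  deadlineMs: number = DEADLINE_MS,
): number[] {
  const waits: number[] = [];
  let elapsed = 0;
  for (let retry = 0; retry < maxAttempts - 1; retry++) {
    const wait = baseMs * Math.pow(2, retry);
    if (elapsed + wait > deadlineMs) break;
    waits.push(wait);
    elapsed += wait;
  }
  return waits;
}

// Each attempt is a fresh non-blocking call (tryLock style), so a busy lock
// surfaces to JS immediately and shutdown can interrupt between attempts.
async function openWithRetry<T>(
  tryOpen: () => Promise<T>,
  sleep: (ms: number) => Promise<void>,
): Promise<T> {
  const waits = planBackoff();
  for (let attempt = 0; ; attempt++) {
    try {
      return await tryOpen();
    } catch (err) {
      const wait = waits[attempt];
      if (wait === undefined) throw err; // budget exhausted: surface the underlying error
      await sleep(wait);
    }
  }
}
```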

* doc: create Cursor rule for docs website
* docs: add robots.txt to website
* doc: website source - refactor - standardize env vars to standard used in JSON and infra envs like GH Actions
* doc: website source - add autogen sitemap.xml
* doc: website source - add JSON-LD
* doc: frontmatter improvement - add type of page to enrich metadata
* doc: content update - add missing frontmatter field for SEO
* doc: website source - robots.txt - add AI bot rules
* doc: website source - simplify SEO machinery
* doc: website source - robots.txt - add content signals

…ease prs (#1862)

…otes (#1865)

Tooling (scripts/sdk/generate-changelog-sdk-pod.cjs):
- Backmerge filter: PRs whose subject starts with `Backmerge` or `Merge release ...` are skipped during processSDKPRs (same shape as the existing [skiplog] filter).
- Companion filter + entry-count strip: new isCompanionEntry, stripEntryCount, cleanModelEntries helpers applied to the inline [mod] summary in CHANGELOG.md and the body of models.md. Recognises *_LEX / *_VOCAB / *_DATA / *_METADATA constant suffixes and any line containing the word "companion".
- Indented continuation lines for [mod] PRs: Added/Updated/Removed are emitted as indented sub-rows under the bullet (capped at MAX_INLINE_MODELS = 5 per section, "(and N more)" for the rest) instead of stuffed inline.
- Announcement-post generator: new --generate-announcement-post CLI flag (with optional --version) parses CHANGELOG.md via parseChangelogMarkdown and emits the Slack template (:qvac: header, NPM/GitHub/changelog links, conditional :warning: Breaking Changes, per-section bullets with <url> link wrapping and :boom: breaking markers, footer). Sections cap at MAX_ANNOUNCEMENT_BULLETS = 10 with "... And much more, see full list in changelog :memo:" only when strictly more than 10.
- New helpers exported: parseChangelogMarkdown, generateAnnouncementPost.

Skill (.cursor/skills/sdk-changelog/SKILL.md):
- Step 4 (CHANGELOG_LLM.md) is now mandatory.
- New Step 5: generate announcement-post.txt (mandatory) with the gitignore note and template spec.
- NOTICE renumbered to Step 6.
- Documented all new policies (backmerge, companion, entry-count strip, indentation, max-bullets cap).
- CLI parameters table refreshed.

.gitignore:
- Added packages/*/changelog/*/announcement-post.txt. The post is a Slack copy-paste working artifact, not a release deliverable.

Release notes for 0.10.0:
- New packages/sdk/changelog/0.10.0/ folder with CHANGELOG.md, breaking.md, api.md, models.md, CHANGELOG_LLM.md.
- Root aggregate packages/sdk/CHANGELOG.md rebuilt with v0.10.0 at top.
- packages/sdk/NOTICE refreshed (191 models, 179 JS deps).
- packages/sdk/package.json bumped 0.9.1 -> 0.10.0.

Backmerge of release-sdk-0.10.0 -> main is a no-op for the release artifacts (changelog, NOTICE) because they land here directly.
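
The filtering and capping rules listed under Tooling can be sketched roughly as follows. The helper name `isCompanionEntry` and the constant `MAX_INLINE_MODELS` come from the commit text, but the function bodies, the regular expressions, and the helpers `isBackmerge` and `capModels` are assumptions made for illustration; the real logic lives in scripts/sdk/generate-changelog-sdk-pod.cjs and may differ.

```javascript
// Sketch only: bodies and regexes are assumptions modeled on the commit text.
const MAX_INLINE_MODELS = 5
const COMPANION_SUFFIX = /_(LEX|VOCAB|DATA|METADATA)\b/

// Hypothetical: true for PR subjects the backmerge filter would skip.
function isBackmerge (subject) {
  return /^(Backmerge|Merge release)/.test(subject)
}

// True for companion constants (by suffix) or any line mentioning "companion".
function isCompanionEntry (line) {
  return COMPANION_SUFFIX.test(line) || /\bcompanion\b/i.test(line)
}

// Hypothetical: keep the first MAX_INLINE_MODELS names, summarize the rest.
function capModels (names) {
  const shown = names.slice(0, MAX_INLINE_MODELS)
  const rest = names.length - shown.length
  return rest > 0 ? [...shown, `(and ${rest} more)`] : shown
}
```

For example, `capModels(['a', 'b', 'c', 'd', 'e', 'f', 'g'])` keeps the first five names and appends `(and 2 more)`.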

…desktop runner (#1832)

* QVAC-17837 feat[ci]: surface synthetic IndicTrans [GPU] row on every desktop runner

The on-PR Step Summary previously showed [GPU] rows only on the 2 of 6 desktop runners that have a real GGML GPU backend bound today (macOS Metal, ai-run-windows11-gpu Vulkan). The 4 hosted Linux runners showed [CPU]-only rows because:

- bergamot.test.js + pivot-bergamot.test.js gate their GPU probe loop on `if (isMobile)` so they never run GPU on desktop, and
- indictrans.test.js does probe GPU on every platform but discoverGpuDevices() returns empty when GGML can't bind a backend (loader fix is still pending per QVAC-17640 / QVAC-17880).

This commit adds a synthetic always-running [IndicTrans] [GPU] test that loads with use_gpu: true and no explicit gpu_device. The existing shared runSingleTranslation helper records perf regardless of the resolved backend; resolveExecutionProvider (now lifted into utils.js) tags the execution_provider as 'cpu (fallback)' when GGML emitted a CPU sentinel and as the real backend tag (vulkan/metal/opencl/...) when a GPU resolved.

So today the 4 Linux runners show CPU + GPU(cpu (fallback)) rows, and macOS / ai-run-windows11-gpu show CPU + GPU(real) rows. Once Ian's GPU loader fix lands on a given platform, the same row's EP auto-flips from 'cpu (fallback)' to the real backend without further CI wiring; that's the contract QVAC-17837's description asks for.

Other clean-ups in the same file because the audit surfaced them:

- resolveExecutionProvider now treats 'BLAS' as a CPU sentinel so the [CPU] row's EP no longer reports 'blas' on macOS.
- discoverGpuDevices() now breaks on BLAS (suppresses macOS's three spurious [GPU:1 BLAS] / [GPU:2 BLAS] / [GPU:3 BLAS] rows) and dedupes by backend name (also fixes mobile Android's 4xVulkan0 duplicates when that file is next exercised, though mobile is out of scope for this PR).
- The per-device GPU test's t.not(backendName, 'CPU') hard assertion is loosened to a t.comment warning so a silent GPU fallback at a discovered device index doesn't fail CI on a perf-only test.

Bergamot and Pivot stay CPU-only on desktop. Bergamot is intgemm-only and has no GPU port architecturally, so a synthetic GPU row for those tests would be perpetual fallback noise. Mobile workflows are unchanged.

Made-with: Cursor

* QVAC-17837 fix: address parallel-review feedback on synthetic GPU test

Two correctness/consistency follow-ups from the parallel review:

- Wrap the new synthetic [IndicTrans] [GPU] test in `if (!isMobile)`. D2 scope explicitly said mobile workflows are untouched, but the test had no mobile gate so it would have added a duplicate default-device GPU row alongside the existing per-device probe rows on Pixel/S25/iPhone. Mobile already has meaningful GPU rows; the synthetic row is only needed on the 6 desktop runners that today emit zero GPU rows for some/all tests.
- Replace the literal `backendName === 'CPU'` check in the per-device GPU test's soft-fallback warning with `CPU_SENTINEL_BACKENDS.has(...)` so the warning fires consistently for every backend treated as CPU by `resolveExecutionProvider` (including BLAS and Unloaded), not just the addon's `CPU` sentinel. Same set, same definition, one source of truth.

No behaviour change on desktop; restores intended D2 scope on mobile; self-consistent fallback definition between the helper and the warning.

Reviewers' other findings (`feat[ci]:` tag style, BLAS-break order dependency, Bergamot/Pivot still using regex EP fallback) are documented or pre-existing and are not addressed here.

Made-with: Cursor

* QVAC-17837 fix[lint]: re-indent synthetic [GPU] test body inside if (!isMobile) block

Pure whitespace fix: `npm run lint:fix` (standard --fix). Sanity-checks job in CI run #25166275184 was failing on ESLint indent errors because the previous commit wrapped the test body in `if (!isMobile) {...}` without bumping each line's indentation by 2 spaces. `git diff -w` is empty.

Made-with: Cursor
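
The sentinel handling described in these commits can be pictured with a small sketch. The names `CPU_SENTINEL_BACKENDS` and `resolveExecutionProvider` appear in the commit text, but the signatures and bodies below are assumptions, and `uniqueGpuDevices` is a hypothetical stand-in for the dedupe step inside discoverGpuDevices(). In particular, stripping the trailing device index (`Vulkan0` to `vulkan`) and passing a `requestedGpu` flag are guesses, not the test suite's actual behaviour.

```javascript
// Sketch only: signatures and bodies are assumptions based on the commit text.
const CPU_SENTINEL_BACKENDS = new Set(['CPU', 'BLAS', 'Unloaded'])

// Map a resolved backend name to the execution_provider tag for a perf row.
function resolveExecutionProvider (backendName, { requestedGpu = false } = {}) {
  if (!backendName || CPU_SENTINEL_BACKENDS.has(backendName)) {
    return requestedGpu ? 'cpu (fallback)' : 'cpu'
  }
  // Assumed normalization: lower-case and drop a trailing device index.
  return backendName.toLowerCase().replace(/\d+$/, '')
}

// Hypothetical dedupe step: stop at the first BLAS entry, skip CPU sentinels,
// and keep one device per backend name.
function uniqueGpuDevices (backendNames) {
  const seen = new Set()
  const devices = []
  for (const [index, name] of backendNames.entries()) {
    if (name === 'BLAS') break
    if (CPU_SENTINEL_BACKENDS.has(name) || seen.has(name)) continue
    seen.add(name)
    devices.push({ index, name })
  }
  return devices
}
```

Sharing one sentinel set between the helper and the soft-fallback warning is the "one source of truth" point made in the review follow-up.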

…flows (#1728)" (#1871)

* Revert "fix: prevent code injection and untrusted checkout in CI workflows (#1728)"

Reverts commit a79602f, with two intentional exclusions noted below.

Excluded from this revert:

- .github/actions/run-lint-and-unit-tests/action.yaml: kept at its current state on main; the env-var indirection #1728 introduced for npm-token/pat-token in the .npmrc-configuration step is preserved.
- .github/workflows/cpp-lint.yaml: net effect on this file is zero. PR #1829 (commit 65bd746) later rewrote the same `cpp-lint` job and added `id-token: write` to the `permissions` block originally introduced by #1728. The `permissions` block is preserved as-is (contents: read + id-token: write) because #1829's AWS OIDC integration depends on it.

All other changes from #1728 are reverted.

Co-authored-by: Cursor <cursoragent@cursor.com>

* Potential fix for pull request finding 'CodeQL / Code injection'

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>

…ed addon (#1833)

* feat: add multi-GPU pipeline parallelism via split-mode config

Ports the split-mode/tensor-split feature from the LLM addon to the embed addon. When split-mode is layer or row and a GPU backend is available, the --device flag is omitted so llama.cpp distributes embedding model layers across all available GPUs. Falls back to CPU silently when no GPU is found.

- Add split-mode (none|layer|row) and tensor-split config keys to setupParams, accepting both hyphen and underscore variants
- Omit --device in split mode so llama.cpp routes across all GPUs
- Accept main_gpu underscore variant alongside main-gpu in tryMainGpuFromMap
- Add getEffectiveGpuDeviceCount() to BackendSelection for GPU inventory
- Add split-mode and tensor-split to GGMLConfig in index.d.ts
- Bump version 0.14.0 -> 0.15.0

* test: add multi-GPU split-mode tests and benchmark example

Ports the test and example surface from the LLM multi-GPU PR to the embed addon, matching the pattern exactly.

- Add BertModel::getCommonParams() so tests can inspect split_mode after load
- Add 8 BertModelTest split-mode cases: none, layer, row, case-insensitive, underscore variant, CPU fallback clears GPU params, invalid value, both keys reject
- Add 9 BackendSelectionTest getEffectiveGpuDeviceCount cases covering zero, CPU-only, single dGPU, single iGPU, two dGPUs, dGPU+iGPU, two dGPUs+iGPU, two iGPUs, and accel/CPU ignored
- Add test/integration/spec-logger.js for native log capture in integration tests
- Add test/integration/multi-gpu.test.js: 4 integration tests gated on QVAC_HAS_MULTI_GPU=1 (layer, row, default single-device, layer+tensor-split)
- Add examples/multiGpuBenchmark.js: single vs layer vs row throughput comparison using the embed model
- Regenerate test/mobile/integration.auto.cjs with runMultiGpuTest entry

* fix: harden CPU fallback and add missing main_gpu alias tests

CPU fallback in setupParams was missing two details present in the final LLM implementation:

- Set params.main_gpu = -1 on CPU fallback so llama.cpp does not retain a stale GPU index.
- Reset the local splitMode variable to LLAMA_SPLIT_MODE_NONE after the CPU-fallback warning so the --device gate below emits --device correctly instead of silently suppressing it when the requested split mode was layer or row.

Also add two missing BackendSelection unit tests for the main_gpu underscore alias and both-key rejection introduced in tryMainGpuFromMap, mirroring the coverage in the LLM package.

* fix: wire all integration tests into test:integration runner

test:integration was hardcoded to addon.test.js, so multi-gpu.test.js and multi-instance.test.js were never executed in desktop CI. Switch to the same generate-then-run-all pattern used by the LLM addon: brittle -r generates test/integration/all.js from the full *.test.js glob, then bare runs it.

* fix: resolve cpp-lint failures in BackendSelection and BertModel

Apply clang-format and clang-tidy fixes flagged by the cpp-lint job:

- Use std::ranges::transform in BackendSelection.cpp and BertModel.cpp
- Drop else-after-return in parseMainGpu
- Rename short iterator names (it -> foundIt/configIt/splitModeIt)
- Use designated initializers for BackendInterface and BertEmbeddings::Layout
- Drop redundant (void) on BackendInterface function pointer
- Move pointer-arithmetic NOLINT to the diagnostic line in batchDecode
- Extract parseSplitMode helper to bring setupParams cognitive complexity back under the threshold
- Suppress non-const-global and macro-usage diagnostics in logging.hpp
- Reorder includes in test_bert_model.cpp and collapse getCommonParams to a single line for clang-format

---------

Co-authored-by: gianni-cor <gianfranco.cordella@tether.io>

…2137)

* feat[api]: add Sortformer v2.1 + AOSC streaming diarization support

Bumps @qvac/transcription-parakeet 0.4.0 -> 0.5.0 (MINOR -- additive API only; no breaking changes).

## 🎯 What problem does this PR solve?

- v1 Sortformer streaming uses a fixed-size sliding-history window; once a speaker goes silent long enough to roll out of the window, their slot identity drifts onto a different physical voice when they return.
- Continuous single-speaker stretches collapse all voices onto `sortformer_0` once two speakers have been seen, breaking live speaker-tagged transcripts.
- v2.1 + AOSC (Audio-Online Speaker Cache, NeMo-ported) fixes this in parakeet-cpp, but until now there was no way to consume it from the JS layer.

## 📝 How does it solve it?

- Bump `parakeet-cpp` to `version>= 2026-05-20` (the qvac-registry-vcpkg bump in PR #156 pulls in PRs #22 / #24 of qvac-ext-lib-whisper.cpp).
- Plumb 6 AOSC knobs (`streamingSpkCacheEnable`, `streamingSpkCacheLen`, `streamingFifoLen`, `streamingChunkLeftContextMs`, `streamingChunkRightContextMs`, `streamingSpkCacheUpdatePeriod`) from JS through `ParakeetConfig` -> `ParakeetModel` / `ParakeetStreamingProcessor` -> `parakeet::SortformerStreamingOptions`, for both the in-process Mode-3 streaming path and the duplex `runStreaming()` processor.
- v2.1 is auto-detected by the engine via the GGUF metadata tag `parakeet.model_variant`; AOSC defaults mirror parakeet-cpp's NeMo-port tuning (188 / 188 / 80 / 560 / 144, enabled).
- Defaults: v2.1 becomes the streaming Sortformer; v1 stays the offline default. Both GGUFs remain registered.
- New `examples/live-mic-diarized-aosc.js` exposes every AOSC knob as a CLI flag for A/B comparison against the v1 sliding-window path.

## 🧪 How was it tested?

- Built locally against a vcpkg overlay pointing at the PR #156 branch; addon compiled cleanly with all 6 new AOSC field references through `ParakeetStreamingProcessor.cpp`, `ParakeetModel.cpp`, `AddonJs.hpp`, and `JSAdapter.cpp`.
- Full integration suite: **37/37 tests pass, 72/72 assertions in 145s** (macOS arm64, all q8_0 GGUFs staged including v2.1 Sortformer).
- New `test/integration/sortformer-aosc-streaming.test.js` covers default-AOSC streaming + `streamingSpkCacheEnable=false` fallback to the v1 sliding-window code path. Confirmed via engine logs that the override actually disables the cache (`Sortformer AOSC enabled` line only prints when AOSC is active).
- v1 Sortformer desktop integration + GPU smoke tests still pass -- no regression to the existing diarization path.

## 🔌 API Changes

New optional fields on `ParakeetConfig`, mirrored as per-call overrides on `StreamingRunConfig`. All default to parakeet-cpp's NeMo-port tuning; specifying them is opt-in. Ignored on v1 / v2 Sortformer and on non-Sortformer engines (no-op forwarding is safe).

```typescript
import { TranscriptionParakeet } from "@qvac/transcription-parakeet";

const model = new TranscriptionParakeet({
  files: { model: "diar_streaming_sortformer_4spk-v2.1.q8_0.gguf" },
  config: {
    parakeetConfig: {
      streaming: true,
      streamingChunkMs: 2000,
      // AOSC (v2.1+ only; auto-detected via GGUF metadata)
      streamingSpkCacheEnable: true, // default
      streamingSpkCacheLen: 188, // long-term cache rows
      streamingFifoLen: 188, // warmup FIFO rows
      streamingChunkLeftContextMs: 80, // ~1 encoder frame
      streamingChunkRightContextMs: 560, // ~7 encoder frames
      streamingSpkCacheUpdatePeriod: 144, // FIFO-overflow pop count
    },
  },
});
```

## Depends on

- qvac-registry-vcpkg #156 (parakeet-cpp 2026-05-20 bump). CI will not resolve the new `version>=` constraint until that PR merges.
- Separate registry-server PR for the v2.1 GGUF entry in `models.prod.json` (out of scope for this PR -- handled independently).
- Upload of `diar_streaming_sortformer_4spk-v2.1.q8_0.gguf` to S3 (the GGUF the new test resolves via `MODEL_CONFIGS.sortformerStreaming`).

## Follow-up (separate PR, not in scope here)

SDK adoption (`@qvac/sdk` schema + plugin + example) lands in a separate PR after this addon is published and the v2.1 GGUF entry has synced into `sdk/models/registry/models.ts`. The SDK needs both pieces in place before its schema can meaningfully forward AOSC knobs.

* chore[notask]: address review — setup-models v2.1 + CHANGELOG [Unreleased]

Two reviewer follow-ups on the v2.1 + AOSC PR:

1. `npm run setup-models` now fetches + converts v2.1 sortformer.
   - download-models.sh: new `sortformer-streaming-v2.1` type pulling from https://huggingface.co/nvidia/diar_streaming_sortformer_4spk-v2.1/resolve/main/diar_streaming_sortformer_4spk-v2.1.nemo
   - convert-nemo.sh: matching type maps .nemo -> `diar_streaming_sortformer_4spk-v2.1.${q}.gguf`.
   - `--type all` (default) now includes the new type, so `npm run setup-models` stages v2.1 alongside the other models.
   - convert-nemo-to-gguf.py: surgically picked up PR #24's variant emission (the `detect_sortformer_variant(ckpt)` helper + `writer.add_string("parakeet.model_variant", ...)` call) without touching local qvac divergences (vendored attribution header, descriptive docstrings, `--quant f16` default, and the huggingface_hub import-error helper). The C++ engine's strict v2.1 detection now matches on `parakeet.model_variant == "sortformer-streaming-v2.1-aosc"` instead of falling back to the encoder-shape heuristic.
   - Verified end-to-end locally: `bash scripts/convert-nemo.sh --type sortformer-streaming-v2.1 --quant q8_0 --force` produces models/diar_streaming_sortformer_4spk-v2.1.q8_0.gguf and the resulting GGUF carries `parakeet.model_variant = "sortformer-streaming-v2.1-aosc"` (confirmed via gguf reader).
2. CHANGELOG entry moved under `## [Unreleased]`; version bumps in package.json + vcpkg.json reverted to 0.4.0. The release PR will promote `[Unreleased]` -> `[0.5.0]` and bump the versions then.

* fix[notask]: pin parakeet-cpp to 2026-05-20#1 to avoid orphan tree

The registry's parakeet-cpp.json lists both 2026-05-20#0 and 2026-05-20#1 (PR #156 introduced both port-versions in its two commits before squash-merging). vcpkg's minimum-version-selection picks #0 when the manifest says `version>=: 2026-05-20`, but the #0 git-tree is orphaned by the squash merge -- unreachable from main, so `git fetch HEAD` doesn't pull it in. CI fails with:

    fatal: failed to unpack tree object 91a6fc169003b70dcc66b82ca8d1d23445343127
    note: while loading parakeet-cpp@2026-05-20

Pinning `version>=: 2026-05-20#1` skips the orphan and resolves to the actual port content on main (tree 69619b43...). Matches the existing `qvac-lint-cpp >= 1.4.4#3` precedent in the same file. Local clean build (no overlay, no cached registry) succeeds.

* cpp lint format

* Bump version

---------

Co-authored-by: Pratik Narola <pratiknarola@Mac.bbrouter>
Co-authored-by: GustavoA1604 <54457676+GustavoA1604@users.noreply.github.com>
Co-authored-by: GustavoA1604 <gustavogefa@hotmail.com>

…nx (#2110)

Co-authored-by: GustavoA1604 <54457676+GustavoA1604@users.noreply.github.com>

* fix: drop SDK peerDependencies; enforce in CI

Completes the peer-deps cleanup trajectory started in #2089 by removing the peerDependencies + peerDependenciesMeta blocks from @qvac/sdk entirely and adding a CI gate that asserts the invariant holds on every SDK pod PR.

Policy:

- dependencies: every package directly imported by SDK source.
- devDependencies: build / typecheck / lint-only modules.
- No peer declarations -- host owns anything SDK doesn't import.

CI gate (.github/workflows/pr-checks-sdk-pod.yml, sdk-only):

- Packs the tarball and installs it into a fresh consumer dir.
- Fails on any ERESOLVE / `npm warn peer` line on install.
- Fails if any of corestore / hyperswarm / hyperdrive / hyperdb / hyperblobs / hyperdht resolves to more than one copy.
- Smoke-imports @qvac/sdk and asserts >= 50 named exports.

* fix: split SDK conditional modules into optionalDependencies

Promote the 7 packages that are runtime-conditional (per-platform or per-feature) from devDependencies into optionalDependencies:

- @modelcontextprotocol/sdk (MCP, only if hosting an MCP server)
- bare-link (Bare-only linker shim)
- compact-encoding (pin to ^3 to dedupe Holepunch v3 tree)
- expo-device (Expo runtime only)
- expo-file-system (Expo runtime only)
- pear-pipe (Pear runtime only)
- react-native-bare-kit (RN/Expo runtime only)

Why not regular dependencies:

- Backend consumers (npm install --omit=optional) get a 182-package tree instead of 790; mobile/Pear consumers get the plug-n-play default with all 7 auto-installed.

Why not peerDependencies:

- npm 7+ auto-installs peers and emits ERESOLVE on range drift, which is the exact failure mode this PR is fixing for Keet.

Validated:

- Keet-shape repro (cross-worker-bare-kit@^2 + @qvac/sdk): 0 ERESOLVE.
- Default install: 0 peer warnings, 7/7 optionals present.
- --omit=optional: 0 peer warnings, lean tree, Holepunch invariant still holds (single copy of corestore / hyperswarm / hyperdrive / hyperdb / hyperblobs / hyperdht / react-native-bare-kit).

* test[ci]: add --omit=optional install gate to SDK consumer check

The default-install gate validates the plug-n-play path (all optionals present); it does not exercise the lean backend path that consumers get with `npm install --omit=optional`. That path was implicitly trusted when we adopted optionalDependencies in the previous commit.

Refactor the inline gate into a check_consumer() helper and call it twice:

1. Default install (plug-n-play): all 7 optionalDeps installed. Catches Keet-style ERESOLVE from optional deps' peer ranges colliding with other deps. Validates mobile/Pear consumer profile.
2. Lean install (--omit=optional): no optionalDeps. Catches (a) backend-required packages accidentally classified as optional, (b) SDK entry-point eagerly importing an optional module.

Both scenarios run the same three assertions:

- No ERESOLVE / `npm warn peer` lines.
- Single copy of corestore / hyperswarm / hyperdrive / hyperdb / hyperblobs / hyperdht.
- Smoke import yields >= 50 named exports.

Validated locally: lean install yields 385 named exports (well above threshold), Holepunch invariant holds, 0 peer warnings. Adds ~60s of CI per SDK pod PR.

* update dev deps

* fix: address review — restore expo-build-properties; revert bare-subprocess major bump

- Add expo-build-properties to optionalDependencies. withQvacSDK wires it by string into the Expo plugin chain, so dropping it entirely broke plug-n-play for Expo consumers (the consumer-install gate runs in a Node consumer dir, so it didn't catch the missing Expo plugin module). Keeping it optional preserves the previous behavior while letting Node-only backends skip it via --omit=optional.
- Revert bare-subprocess to ^5.2.3 (was bumped to ^6.0.0 by bun add). v5→v6 is a major bump for what is only a dev-time shim consumed by scripts/bare-bootstrap.js; staying on the prior major avoids dragging drift into the dev tree and keeps NOTICE accurate.

Validated:

- bun install + bun run build clean.
- Consumer install gate (default + --omit=optional): both green, Holepunch invariant holds, 385 named exports.

* chore: drop hyperdb + hyperblobs from SDK dependencies

Neither package is imported by SDK source; they were guarantor pins for @qvac/registry-client, which declares both as non-optional peerDependencies. npm 7+ auto-installs non-optional peers, so the consumer install graph is unchanged: hyperdb and hyperblobs still resolve to a single copy in both default and --omit=optional installs, satisfied by registry-client's own peer ranges. Aligns with the "declare what you import, nothing else" policy.

Validated:

- bun install + bun run build clean.
- Default install (790 pkgs): single copy of corestore / hyperswarm / hyperdrive / hyperdb / hyperblobs / hyperdht.
- --omit=optional install (182 pkgs): same.
- 0 peer warnings in both scenarios; 385 named exports.

The verified label already gates every secret-bearing workflow via label-gate (108 workflows since QVAC-18612). The legacy verify label was still in use on five paths for non-secret heavy CI and a per-package merge assertion, forcing reviewers to apply two labels for the same trust ceremony. Collapse onto verified everywhere.

- public-pr.yml merge gate now reads verified.
- public-reusable-npm.yml integration step now reads verified.
- pr-test-inference-addon-cpp.yml + -js.yml replace their bespoke "verify must be freshly applied" dance with a verified-presence check that still denies on fork synchronize (pending label-gate strip in sibling pull_request_target workflows). Trusted same-repo pushes now re-trigger automatically instead of requiring re-labelling.
- pr-test-inference-addon-cpp-verify.yml deleted; its sole purpose was to strip verify on every push, which would actively conflict with label-gate's verified strip policy.
- pr-models-validation-registry-server.yml comment refreshed; its authorize-pr invocation picks up the new default.
- authorize-pr composite action default flipped from verify to verified. Affects 17 consumers that all already pair authorize-pr with a label-gate job requiring verified, so the change removes the double-label awkwardness for fork PRs without altering the trust model.
- Description strings on six on-pr-*.yml workflow_dispatch inputs and two integration-mobile-test comments updated for consistency (run_verify variable kept to avoid breaking dispatch scripts).
- docs/ci/LABELS.md collapses the deprecated verify row and expands the verified section to cover the broader scope.
- devops-why-my-pr-not SKILL.md C6 row drops the verify-deprecation caveat.

Validation:

- 58/58 label-gate unit tests pass.
- actionlint issue count unchanged (30) across the five edited critical workflows; every remaining warning is pre-existing shellcheck noise in the PowerShell/CMake matrix steps.
- yaml.safe_load round-trips every modified workflow.
- Grep for remaining verify-label references in .github/ returns only the human-facing run_verify workflow_dispatch input names (kept) and the unrelated qvac verify CLI subcommand bats test.

Behavioural changes worth flagging:

1. inference-addon-cpp heavy tests now re-run on every trusted push to a verified PR (previously needed a remove+re-add label dance). Bounded by the existing paths filter.
2. The github label verify itself is NOT deleted by this PR; run gh label delete verify --repo tetherto/qvac after merge so in-flight PRs with the legacy label aren't surprised.

…bs (#2177)

* fix resume order

* fix: align resume order with new suspend order and fix unit tests

Resume now runs swarms before stores so it mirrors the new suspend order (stores before swarms) in reverse, i.e. LIFO. Updates resume / suspend log lines and the order-sensitive assertions in `test/unit/runtime-lifecycle.test.ts`.

---------

Co-authored-by: namelsking <functionsilence@gmail.com>

Co-authored-by: GustavoA1604 <54457676+GustavoA1604@users.noreply.github.com>

Co-authored-by: gianni-cor <gianfranco.cordella@tether.io> Co-authored-by: GustavoA1604 <54457676+GustavoA1604@users.noreply.github.com>

* Add dynamic backend loading for android and model download in integration tests
* Remove gguf bundling from mobile integration test
* Add missing registry-client dependency
* Remove non-working GPUs
* Fix failing test
* Point to tetherto repo
* Remove redundant comments
* Update readme

Co-authored-by: GustavoA1604 <54457676+GustavoA1604@users.noreply.github.com>

…ss-actions repos

Adds an optional `extraRepos` field to `.github/teams/<pod>.json` so a pod's `--mode team` dashboard can span repos beyond the configured monorepo. Each entry is either `{"repo": "owner/name"}` or `{"match": "owner/name-glob"}` (`*` wildcard, resolved per-run via `gh repo list <owner>`, archived repos auto-excluded, primary repo de-duplicated).

`collectPRActivity` now fetches the primary repo plus every resolved extra repo, tags each PR with its source repo, and bypasses the `ownedPaths` filter for extra-repo PRs (entire repo treated as pod-owned). Repos the caller cannot read are skipped with a single stderr warning rather than aborting the run.

Backward-compatible: pods without `extraRepos` (e.g. sdk) keep their previous single-repo behavior. Only `--mode team` honors `extraRepos`; `--mode review` and `--mode my` stay single-repo so existing skills are unchanged.

Renderer prefixes extra-repo PRs as `owner/repo#<num>` (primary-repo PRs keep the bare `#<num>` form) and emits a `Repos:` summary line above the headline when extra repos contributed. JSON output exposes top-level `extraRepos` plus per-PR `repo` / `isExtraRepo`.

DevOps pod opts in via `[{"match": "tetherto/qvac-*"}, {"repo": "tetherto/github-ops"}, {"repo": "tetherto/oss-actions"}]` (44 repos scanned today).

Co-authored-by: Cursor <cursoragent@cursor.com>
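
The `extraRepos` resolution rules in this commit (exactly one of `repo` or `match`, a single `*` wildcard, archived repos excluded, primary repo de-duplicated) can be sketched as below. This is an illustration under assumptions, not the pr-skills implementation: `globToRegExp`, `resolveExtraRepos`, and `formatPRRef` are hypothetical names, the `ownerRepos` input is assumed to be the already-parsed output of `gh repo list <owner> --json nameWithOwner,isArchived`, and `formatPRRef` reads the `sourceRepo` / `sourceIsExtra` tags named in the PR description.

```javascript
// Sketch only: helper names and input shapes are assumptions.

// Turn an "owner/name-glob" with a `*` wildcard into an anchored RegExp.
// The wildcard is not allowed to cross the owner/name slash.
function globToRegExp (glob) {
  const parts = glob.split('*').map((s) => s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'))
  return new RegExp(`^${parts.join('[^/]*')}$`)
}

// entries: the pod's extraRepos array.
// ownerRepos: [{ nameWithOwner: 'owner/name', isArchived: boolean }]
function resolveExtraRepos (entries, primaryRepo, ownerRepos) {
  const resolved = new Set()
  for (const entry of entries) {
    const hasRepo = typeof entry.repo === 'string'
    const hasMatch = typeof entry.match === 'string'
    if (hasRepo === hasMatch) {
      throw new Error('each extraRepos entry needs exactly one of "repo" or "match"')
    }
    if (hasRepo) {
      resolved.add(entry.repo)
      continue
    }
    const pattern = globToRegExp(entry.match)
    for (const repo of ownerRepos) {
      if (!repo.isArchived && pattern.test(repo.nameWithOwner)) resolved.add(repo.nameWithOwner)
    }
  }
  resolved.delete(primaryRepo) // the primary repo is always fetched separately
  return [...resolved]
}

// Render a PR reference: bare for the primary repo, qualified for extra repos.
function formatPRRef (pr) {
  return pr.sourceIsExtra ? `${pr.sourceRepo}#${pr.number}` : `#${pr.number}`
}
```

Using a `Set` keeps the result de-duplicated when an explicit `repo` entry also happens to match a glob.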

Now that the team dashboard is multi-repo, `pr.number` is no longer unique inside `needsAction`: `#159` and `tetherto/qvac-devops#159` can legitimately collide. With the old Set keyed on `pr.number`, if one of the two landed in `reReviewPRs`, the other was silently dropped from both `stalePRs` and `activePRs` by the `!reReviewSet.has(pr.number)` filter. No log line, just a disappearing PR.

Key on `pr.url` instead; it is unique across repos by construction.

Spot-checked `classifyMyPRs` and `classifyReviewPRs`; neither has the same shape of cross-collection number-keyed Set, so the fix stays localized to `classifyTeamPRs`.

Caught in self-review on #2183.

Co-authored-by: Cursor <cursoragent@cursor.com>
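
The collision is easy to reproduce in isolation. In the sketch below, `excludeReReview` is a hypothetical stand-in for the relevant filter inside `classifyTeamPRs`, and the two PR objects are made-up examples; only the `number` and `url` fields are taken from the commit message.

```javascript
// Sketch only: excludeReReview is a hypothetical stand-in for the filter
// inside classifyTeamPRs; the PR objects are illustrative.
function excludeReReview (needsAction, reReviewPRs, keyOf) {
  const reReviewSet = new Set(reReviewPRs.map(keyOf))
  return needsAction.filter((pr) => !reReviewSet.has(keyOf(pr)))
}

const primary = { number: 159, url: 'https://github.com/tetherto/qvac/pull/159' }
const extra = { number: 159, url: 'https://github.com/tetherto/qvac-devops/pull/159' }

// Old behaviour: keyed on pr.number, both PRs share the key 159.
const byNumber = excludeReReview([primary, extra], [extra], (pr) => pr.number)
// Fixed behaviour: keyed on pr.url, which differs per repo.
const byUrl = excludeReReview([primary, extra], [extra], (pr) => pr.url)

console.log(byNumber.length) // 0: the primary-repo PR vanished along with the re-review one
console.log(byUrl.length) // 1: only the re-review PR is excluded
```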

* doc: add missing DevDependency for local http-server
* doc: fix bug in page index of nested links
* doc: improve support to LLMs: llms.txt, llms-full.txt and pages served as .md
* doc: implement page actions bar
* doc: adjust page actions after cherry pick
* doc: bug - page action - button Ask AI

kinsta · 2026-05-21T12:17:58Z

Preview deployments for qvac-docs-staging ⚡️

| Status | Branch preview | Commit preview |
| --- | --- | --- |
| ✅ Ready | Visit preview | Visit preview |

Commit: ccdc428b9bf7096f68576281a68272adea3c1691

Deployment ID: 8999c790-1192-41db-9a48-1c74bcfe9b45

Static site name: qvac-docs-staging-fazwv

github-actions · 2026-05-21T12:18:24Z

Tier-based Approval Status

**PR Tier:** TIER1

**Current Status:** ✅ APPROVED

**Requirements:**
- 1 Team Member approval ✅ (2/1)
- 1 Team Lead OR Management approval ✅ (1/1)



---
*This comment is automatically updated when reviews change.*

simon-iribarren and others added 30 commits April 29, 2026 21:29

doc: sdk - configuration - add new configs (#1813)

eaf555d

doc: CLI - add new subcommand doctor + new page system requirements (#…

14da5b5

…1819)

Docs/voice assistant (#1816)

e135639

* doc: create new page - sdk - new ai capability - voice assistant
* doc: create new page - sdk - voice assistant
* doc: remove temporary doc that leaked into commit
* doc: content new - sdk - voice assistant - PR review

fix(ci): allow publishing addons under a custom npm dist-tag via work…

4db504b

…flow_dispatch input (#1824) Co-authored-by: Matt Cavanagh <1789097+darkynt@users.noreply.github.com> Co-authored-by: tamer-hassan-tether <tamer.hassan@tether.io>

QVAC-18142 [Whisper] v0.6.6 (#1830)

cc0f267

Co-authored-by: GustavoA1604 <54457676+GustavoA1604@users.noreply.github.com>

infra[QVAC-17058]: add empty BCI whispercpp workflow stubs to enable …

5e47460

…branch dispatch (#1856) Co-authored-by: Ishan Vohra <ishanvohra@Ishans-MacBook-Air.local>

doc: autogen pipeline - fix release branch prefix (#1846)

5444738

QVAC-18280 [Decoder] v0.3.9 (#1859)

1c16b5a

Co-authored-by: GustavoA1604 <54457676+GustavoA1604@users.noreply.github.com>

feat: add sdk-backmerge skill and chain it from sdk-pr-create for release prs (#1862)

cafc2f9

pratiknarola-t and others added 12 commits May 20, 2026 13:55

QVAC-18453 chore: remove unused @qvac/response dependency from ocr-onnx (#2110)

6571ebd

remove dead deps and move dev ones to dev (#2159)

d23cea8

Co-authored-by: GustavoA1604 <54457676+GustavoA1604@users.noreply.github.com>

infra: use npm environment for vla and classification-ggml publish jobs (#2177)

66ecc73

cleanup deps and add missing ones (#2171)

2c17e52

Co-authored-by: GustavoA1604 <54457676+GustavoA1604@users.noreply.github.com>

remove dead deps (#2173)

4ff7910

Co-authored-by: gianni-cor <gianfranco.cordella@tether.io>
Co-authored-by: GustavoA1604 <54457676+GustavoA1604@users.noreply.github.com>

subprocess is a dev dep (#2170)

4236335

Co-authored-by: GustavoA1604 <54457676+GustavoA1604@users.noreply.github.com>

Proletter requested review from a team as code owners May 21, 2026 11:47

Proletter and others added 5 commits May 21, 2026 13:01

Merge branch 'main' into infra/devops-pr-status-multi-repo

d4f97b6

doc: add link headers - enhance support form ai agents (#2175)

d882ae6

Merge branch 'main' into infra/devops-pr-status-multi-repo

ccdc428

kinsta Bot deployed to preview May 21, 2026 12:17

darkynt approved these changes May 21, 2026


NamelsKing approved these changes May 21, 2026


Proletter force-pushed the infra/devops-pr-status-multi-repo branch 2 times, most recently from d9ee080 to ccdc428 Compare May 24, 2026 18:30

Proletter closed this May 24, 2026

Proletter force-pushed the main branch from 098a0fc to b8310f6 Compare May 24, 2026 19:13

Proletter mentioned this pull request Jun 9, 2026

infra[notask]: add multi-repo coverage to DevOps PR status skill #2500

Merged

Proletter wants to merge 1320 commits into main from infra/devops-pr-status-multi-repo


19 participants
