QVAC-18111 infra[notask]: scaffold Benchmark Performance (LLM) workflow_dispatch by tobi-legan · Pull Request #1839 · tetherto/qvac

tobi-legan · 2026-04-30T17:22:28Z

Summary

Scaffolds the dedicated Benchmark Performance (LLM) `workflow_dispatch` workflow on `main` so the QVAC-17830 perf-metrics feature branch can be dispatched against it. Per the perf policy agreed on Slack, the umbrella on-pr workflow keeps the cheap iteration default. This workflow is the only place we crank up `QVAC_PERF_RUNS` to get mean ± std numbers.

GitHub requires a `workflow_dispatch` workflow to exist on the default branch before it shows up in the Actions tab and can be triggered with `--ref <feature-branch>`. That is why this small infra PR lands ahead of the main perf PR.
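For context on why the benchmark workflow raises the run count: a mean ± std figure only means something with several timed iterations after the warmup. The sketch below is a hypothetical JavaScript helper. `measure` and `summarize` are illustrative names, not the repo's test code. It assumes warmup iterations are discarded and uses the sample (n − 1) standard deviation, which is also an assumption.

```javascript
// Hypothetical helpers, not taken from the qvac test suite.

// Sample mean and sample standard deviation (n - 1 denominator).
// With a single sample the standard deviation is undefined, so NaN is returned.
function summarize(samples) {
  const n = samples.length;
  const mean = samples.reduce((sum, x) => sum + x, 0) / n;
  const std =
    n > 1
      ? Math.sqrt(samples.reduce((acc, x) => acc + (x - mean) ** 2, 0) / (n - 1))
      : NaN;
  return { n, mean, std };
}

// Run `warmupRuns` untimed iterations (results discarded), then time
// `runs` iterations and summarise the durations in milliseconds.
async function measure(fn, { runs = 3, warmupRuns = 1 } = {}) {
  for (let i = 0; i < warmupRuns; i++) {
    await fn();
  }
  const durations = [];
  for (let i = 0; i < runs; i++) {
    const start = performance.now();
    await fn();
    durations.push(performance.now() - start);
  }
  return summarize(durations);
}

console.log(summarize([1, 2, 3])); // → { n: 3, mean: 2, std: 1 }
```

With the PR-run counts of 1 + 1, `summarize` receives a single sample and the spread is `NaN`. The bench-mode default of 3 + 1 yields an actual ± figure.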

Changes

`benchmark-performance-qvac-lib-infer-llamacpp-llm.yml` (new)
- Manual `workflow_dispatch` only; mirrors the existing Parakeet / Whispercpp benchmark workflows.
- Inputs: `repository`, `ref`, `qvac_perf_runs` (default `3`), `qvac_perf_warmup_runs` (default `1`), `run_desktop` (default `true`).
- Jobs: `context` → `prebuild` → `desktop-benchmarks` (gated by `run_desktop`; calls `integration-test-...yml`) → `summarize` (aggregates desktop artifacts into a combined HTML report plus the GitHub step summary).
- Phase-1 scope: desktop only. Mobile (Device Farm) requires a build-time hook in the test app to thread env vars through to bare. That lands as a stacked follow-up PR, #1840 ("QVAC-18111 infra[notask]: bridge QVAC_PERF_RUNS to mobile test app via pushFile"), which adds `mobile-benchmarks` and a matching `run_mobile` toggle so the two matrices can be triggered independently.
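The step-summary half of the `summarize` job can be sketched roughly as follows. This is a minimal illustration under assumptions: the per-result fields (`platform`, `test`, `runs`, `mean`, `std`) are hypothetical, and the combined HTML report is not shown. It relies only on the standard GitHub Actions behaviour that Markdown appended to the file named by `GITHUB_STEP_SUMMARY` is rendered on the run's summary page.

```javascript
const fs = require("node:fs");

// Hypothetical result shape: one object per (platform, test) pair, e.g. as
// parsed from each desktop job's downloaded artifact.
function toMarkdownTable(results) {
  const fmt = (x) => (Number.isFinite(x) ? x.toFixed(1) : "n/a");
  const lines = [
    "| Platform | Test | Runs | Mean (ms) | Std (ms) |",
    "|---|---|---|---|---|",
    ...results.map(
      (r) => `| ${r.platform} | ${r.test} | ${r.runs} | ${fmt(r.mean)} | ${fmt(r.std)} |`
    ),
  ];
  return lines.join("\n") + "\n";
}

// Append Markdown to the job summary when running inside GitHub Actions.
// Returns false (and writes nothing) when GITHUB_STEP_SUMMARY is not set,
// e.g. when the script is run locally.
function writeStepSummary(markdown) {
  const summaryPath = process.env.GITHUB_STEP_SUMMARY;
  if (!summaryPath) return false;
  fs.appendFileSync(summaryPath, markdown);
  return true;
}

// Illustrative input only; real values would come from the artifacts.
const table = toMarkdownTable([
  { platform: "linux-x64", test: "example", runs: 3, mean: 12.345, std: 0.5 },
]);
writeStepSummary(table);
```

Rendering a missing spread as `n/a` keeps single-run rows readable instead of printing `NaN`.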

`integration-test-qvac-lib-infer-llamacpp-llm.yml`
- Thread `qvac_perf_runs` / `qvac_perf_warmup_runs` through `workflow_call` and `workflow_dispatch`.
- Surface them as the `QVAC_PERF_RUNS` / `QVAC_PERF_WARMUP_RUNS` env vars on the Linux/macOS and Windows run-test steps.
- Empty string ⇒ unset, so the umbrella PR workflow continues to honour the test-side default. Existing PR runs are unaffected.
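On the consuming side, honouring "empty string ⇒ unset" amounts to falling back to the default whenever the variable is missing or blank. The helper below is a hypothetical sketch: the name `readCount` and its `min` option are illustrative, it assumes a blank workflow input reaches the test process as an empty string, and it takes the 1 + 1 PR defaults from the test plan.

```javascript
// Hypothetical helper: read an integer iteration count (>= min) from the
// environment. A missing or blank (empty / whitespace-only) value falls back
// to `defaultValue`, so a blank workflow input behaves like "unset".
function readCount(name, defaultValue, { min = 0, env = process.env } = {}) {
  const raw = env[name];
  if (raw === undefined || raw.trim() === "") return defaultValue;
  const value = Number(raw);
  if (!Number.isInteger(value) || value < min) {
    throw new Error(`${name} must be an integer >= ${min}, got "${raw}"`);
  }
  return value;
}

// Assumed test-side defaults: 1 measured run + 1 warmup (the PR-run counts).
const runs = readCount("QVAC_PERF_RUNS", 1, { min: 1 });
const warmupRuns = readCount("QVAC_PERF_WARMUP_RUNS", 1);
console.log(`perf: ${warmupRuns} warmup + ${runs} timed run(s)`);
```

Failing loudly on a malformed value (for example `2.5`) is a deliberate choice in this sketch, so a typo in a dispatch input does not silently fall back to the cheap default.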

Test plan

- Land this PR on `main` so Benchmark Performance (LLM) appears in the Actions tab.
- Dispatch with `--ref feature-qvac-17830-vlm-perf-metrics` (the perf-metrics branch carries the actual env-var consumption in `_image-common.js` / `bitnet.test.js` / `tool-calling.test.js`) to confirm that the bench-mode 3 + 1 iteration counts (3 measured runs + 1 warmup) surface in the combined report.
- Dispatch with `run_desktop=false` to verify the desktop matrix is skipped. This is a no-op until #1840 ("QVAC-18111 infra[notask]: bridge QVAC_PERF_RUNS to mobile test app via pushFile") lands and adds `run_mobile`; until then the dispatch produces an empty summary, which is the expected behaviour.
- Confirm the umbrella on-pr LLM workflow is unchanged (PR runs still use 1 + 1).
- `actionlint` / GitHub workflow validation passes.

GitHub requires a `workflow_dispatch` workflow to exist on the default branch before it shows up in the Actions tab and becomes triggerable with `--ref <feature-branch>`. This lands the LLM benchmark workflow on `main` so the QVAC-17830 perf-metrics feature branch can be dispatched against it for end-to-end validation.

Changes:

- `benchmark-performance-qvac-lib-infer-llamacpp-llm.yml` (new): manual `workflow_dispatch` only; mirrors the structure of the existing Parakeet / Whispercpp benchmark workflows. Calls `prebuilds-...yml` then `integration-test-...yml` with bench-mode iteration counts (`QVAC_PERF_RUNS=3`, `QVAC_PERF_WARMUP_RUNS=1` by default), then aggregates desktop artifacts into a combined HTML / step summary. Phase-1 scope is desktop only; mobile (Device Farm) needs a build-time hook in the test app to thread env vars through to bare and is tracked as a QVAC-18111 follow-up.
- `integration-test-qvac-lib-infer-llamacpp-llm.yml`: thread `qvac_perf_runs` / `qvac_perf_warmup_runs` through `workflow_call` + `workflow_dispatch` and surface them as `QVAC_PERF_RUNS` / `QVAC_PERF_WARMUP_RUNS` on the Linux/macOS and Windows test run steps. Empty string => unset, so the umbrella PR workflow continues to honour the test-side default and PR runs are unaffected by this change.

Per the perf policy agreed on Slack (2026-04-30): the umbrella on-pr workflow runs perf tests at the cheap default so we don't pay full perf cost on every PR; this dedicated workflow is the only place we crank up the iteration counts to produce mean ± std numbers.

Made-with: Cursor


olyasir · 2026-05-01T07:12:44Z

/review

tobi-legan · 2026-05-04T11:40:33Z

/review

tobi-legan requested review from a team as code owners April 30, 2026 17:22

tobi-legan had a problem deploying to release April 30, 2026 17:23 — with GitHub Actions Failure

tobi-legan had a problem deploying to release April 30, 2026 17:24 — with GitHub Actions Failure

tobi-legan requested a deployment to release April 30, 2026 17:24 — with GitHub Actions Waiting

QVAC-18111 chore[notask]: trim chatty inline comments in benchmark workflow

b454173

Made-with: Cursor

tobi-legan had a problem deploying to release April 30, 2026 17:44 — with GitHub Actions Failure

tobi-legan had a problem deploying to release April 30, 2026 17:45 — with GitHub Actions Failure

olyasir had a problem deploying to release May 1, 2026 06:55 — with GitHub Actions Failure

Merge branch 'main' into infra/qvac-18111-benchmark-llm-workflow

8393492

olyasir had a problem deploying to release May 1, 2026 07:12 — with GitHub Actions Failure

olyasir had a problem deploying to release May 1, 2026 07:14 — with GitHub Actions Failure

Merge branch 'main' into infra/qvac-18111-benchmark-llm-workflow

9e3f37f

tobi-legan requested a review from a team as a code owner May 4, 2026 11:39

tobi-legan had a problem deploying to release May 4, 2026 11:40 — with GitHub Actions Failure

tobi-legan had a problem deploying to release May 4, 2026 11:41 — with GitHub Actions Failure

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

QVAC-18111 infra[notask]: scaffold Benchmark Performance (LLM) workflow_dispatch#1839

QVAC-18111 infra[notask]: scaffold Benchmark Performance (LLM) workflow_dispatch#1839
tobi-legan merged 6 commits into main from infra/qvac-18111-benchmark-llm-workflow
