QVAC-18111 infra[notask]: scaffold Benchmark Performance (LLM) workflow_dispatch #1839
Merged
…ow_dispatch

GitHub requires a `workflow_dispatch` workflow to exist on the default branch before it shows up in the Actions tab and becomes triggerable with `--ref <feature-branch>`. This lands the LLM benchmark workflow on `main` so the QVAC-17830 perf-metrics feature branch can be dispatched against it for end-to-end validation.

Changes:
- `benchmark-performance-qvac-lib-infer-llamacpp-llm.yml` (new): manual `workflow_dispatch` only; mirrors the structure of the existing Parakeet / Whispercpp benchmark workflows. Calls `prebuilds-...yml` then `integration-test-...yml` with bench-mode iteration counts (`QVAC_PERF_RUNS=3`, `QVAC_PERF_WARMUP_RUNS=1` by default), then aggregates desktop artifacts into a combined HTML / step-summary. Phase-1 scope is desktop only: mobile (Device Farm) needs a build-time hook in the test app to thread env vars through to bare and is tracked as a QVAC-18111 follow-up.
- `integration-test-qvac-lib-infer-llamacpp-llm.yml`: thread `qvac_perf_runs` / `qvac_perf_warmup_runs` through `workflow_call` + `workflow_dispatch` and surface them as `QVAC_PERF_RUNS` / `QVAC_PERF_WARMUP_RUNS` on the Linux/macOS and Windows test run steps. Empty string => unset, so the umbrella PR workflow continues to honour the test-side default and PR runs are unaffected by this change.

Per the perf policy agreed on Slack (2026-04-30): the umbrella on-pr workflow runs perf tests at the cheap default so we don't pay full perf cost on every PR; this dedicated workflow is the only place we crank up the iteration counts to produce mean ± std numbers.

Made-with: Cursor
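The policy above reserves the higher iteration counts for this workflow so that results can be reported as mean ± std. A minimal sketch of that aggregation, assuming per-run timings in milliseconds; `summarize` and `formatSummary` are hypothetical names rather than helpers from the repository, the sample standard deviation (n - 1 denominator) is an assumed choice, and the numbers in the usage comment are illustrative only:

```javascript
// Hypothetical aggregation helper; not taken from the PR or the test suite.
// Computes the mean and sample standard deviation of per-run timings.
function summarize(samplesMs) {
  const n = samplesMs.length;
  if (n === 0) throw new Error('need at least one sample');
  const mean = samplesMs.reduce((sum, x) => sum + x, 0) / n;
  // Sample variance uses an n - 1 denominator; with a single run the
  // spread cannot be estimated, so this sketch reports 0 in that case.
  const variance = n > 1
    ? samplesMs.reduce((sum, x) => sum + (x - mean) ** 2, 0) / (n - 1)
    : 0;
  return { n, mean, std: Math.sqrt(variance) };
}

// Formats a summary as "mean ± std unit" with one decimal place.
function formatSummary({ mean, std }, unit = 'ms') {
  return `${mean.toFixed(1)} ± ${std.toFixed(1)} ${unit}`;
}

// Illustrative usage with made-up timings from three measured runs:
// formatSummary(summarize([10, 12, 14]))
```

With only one measured run there is no spread to report, which is consistent with keeping the cheap default on PRs and raising the run count only in the dedicated benchmark workflow.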
…rkflow Made-with: Cursor
Contributor: /review
Contributor (Author): /review
Summary
Scaffolds the dedicated Benchmark Performance (LLM) `workflow_dispatch` on `main` so the QVAC-17830 perf-metrics feature branch can be dispatched against it. Per the perf policy agreed on Slack, the umbrella on-pr workflow keeps the cheap iteration default; this is the only place we crank up `QVAC_PERF_RUNS` to get mean ± std numbers.

GitHub requires a `workflow_dispatch` to exist on the default branch before it shows up in the Actions tab and becomes triggerable with `--ref <feature-branch>`, which is why this small infra PR lands ahead of the main perf PR.

Changes

`benchmark-performance-qvac-lib-infer-llamacpp-llm.yml` (new)
- `workflow_dispatch` only, mirrors the existing Parakeet / Whispercpp benchmark workflows
- Inputs: `repository`, `ref`, `qvac_perf_runs` (default `3`), `qvac_perf_warmup_runs` (default `1`), `run_desktop` (default `true`)
- Jobs: `context` → `prebuild` → `desktop-benchmarks` (gated by `run_desktop`, calls `integration-test-...yml`) → `summarize` (aggregates desktop artifacts into combined HTML + GitHub step summary)
- Follow-up (#1840) adds `mobile-benchmarks` + a matching `run_mobile` toggle so the two matrices can be triggered independently

`integration-test-qvac-lib-infer-llamacpp-llm.yml`
- Threads `qvac_perf_runs` / `qvac_perf_warmup_runs` through `workflow_call` + `workflow_dispatch`
- Surfaces them as `QVAC_PERF_RUNS` / `QVAC_PERF_WARMUP_RUNS` env on the Linux/macOS and Windows run-test steps

Test plan
- Land on `main` so `Benchmark Performance (LLM)` appears in the Actions tab
- Dispatch with `--ref feature-qvac-17830-vlm-perf-metrics` (the perf-metrics branch carries the actual env-var consumption in `_image-common.js` / `bitnet.test.js` / `tool-calling.test.js`) to confirm the bench-mode `3 + 1` iteration counts surface in the combined report
- Dispatch with `run_desktop=false` to verify the desktop matrix is skipped (no-op until "QVAC-18111 infra[notask]: bridge QVAC_PERF_RUNS to mobile test app via pushFile" #1840 lands and adds `run_mobile`; until then this dispatch produces an empty summary, which is the expected behaviour)
- On-PR umbrella runs keep the test-side default iteration counts (`1 + 1`)
- `actionlint` / GitHub workflow validation passes
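The "empty string => unset" convention described in this PR can be illustrated from the consuming side. This is a minimal sketch only: the actual env-var consumption lives on the perf-metrics branch and may differ. `readIterationCount` and `readPerfConfig` are hypothetical names, and the `1 + 1` fallback is an assumption based on the test plan rather than a confirmed test-side default.

```javascript
// Hypothetical helper; not taken from the PR or the perf-metrics branch.
// Reads an integer iteration count from an env-like object, falling back
// when the variable is unset, empty, not an integer, or below `min`.
function readIterationCount(env, name, fallback, min = 0) {
  const raw = env[name];
  // The workflow passes '' when no input was given; treat that as unset.
  if (raw === undefined || raw.trim() === '') return fallback;
  const parsed = Number(raw);
  if (!Number.isInteger(parsed) || parsed < min) return fallback;
  return parsed;
}

// Assumed test-side defaults: 1 measured run + 1 warmup run.
const DEFAULT_RUNS = 1;
const DEFAULT_WARMUP_RUNS = 1;

function readPerfConfig(env = process.env) {
  return {
    // At least one measured run is required to produce a number at all.
    runs: readIterationCount(env, 'QVAC_PERF_RUNS', DEFAULT_RUNS, 1),
    // Zero warmup runs is accepted as a valid explicit choice.
    warmupRuns: readIterationCount(env, 'QVAC_PERF_WARMUP_RUNS', DEFAULT_WARMUP_RUNS, 0),
  };
}
```

Under these assumptions, a PR run that passes empty strings resolves to the defaults, while a benchmark dispatch with `QVAC_PERF_RUNS=3` and `QVAC_PERF_WARMUP_RUNS=1` resolves to `3 + 1`.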