Merged
Changes from 39 commits
Commits
43 commits
bd3bd69
QVAC-17830 feat: add VLM perf metrics with multi-run averaging
tobi-legan Apr 24, 2026
9f8962b
QVAC-17830 feat: wire Mobile LLM into perf-report.yml weekly aggregator
tobi-legan Apr 24, 2026
5739149
QVAC-17830 feat: add per-run joint perf reporter to mobile LLM workflow
tobi-legan Apr 24, 2026
37afd3d
QVAC-17830 fix: preserve mobile perf data on OOM, split image tests p…
tobi-legan Apr 24, 2026
9788468
QVAC-17830 fix: preserve mobile perf data under iOS V8 Zone OOM
tobi-legan Apr 24, 2026
b11e612
QVAC-17830 fix: plug combined perf report gaps (desktop race + artifa…
tobi-legan Apr 24, 2026
2dbabba
QVAC-17830 fix: iOS fruit plate retry + consolidate Android images + …
tobi-legan Apr 24, 2026
c1cff75
QVAC-17830 fix: inline crash flush-delay, drop duplicated pull helper
tobi-legan Apr 24, 2026
562ac14
QVAC-17830 fix: iOS fruit plate warmup + merge linux legs + mobile de…
tobi-legan Apr 24, 2026
398292d
QVAC-17830 fix: iOS fruit plate 1-iter override + HTML detail tables
tobi-legan Apr 24, 2026
e422652
QVAC-17830 fix: warm process iOS heavy7 + dedupe perf legs + drop retry
tobi-legan Apr 24, 2026
d691a15
QVAC-17830 fix: warm iOS heavy7 with elephant instead of api-behavior
tobi-legan Apr 25, 2026
14b1c48
QVAC-17830 fix: shrink iOS fruit-plate to 2 inferences cold
tobi-legan Apr 25, 2026
86ef719
QVAC-17830 feat: scenario grouping, GPU probe, squashed PR summary, p…
tobi-legan Apr 28, 2026
5fabbac
feat: surface per-device detail tables in PR summary with mean ±std c…
tobi-legan Apr 28, 2026
15d06e7
refactor: drop image_prefill_time_ms from perf report
tobi-legan Apr 28, 2026
7f17c52
Merge remote-tracking branch 'origin/main' into feature-qvac-17830-vl…
tobi-legan Apr 28, 2026
4f23a05
Merge remote-tracking branch 'origin/main' into feature-qvac-17830-vl…
tobi-legan Apr 29, 2026
fa2e76d
Merge remote-tracking branch 'origin/main' into feature-qvac-17830-vl…
tobi-legan Apr 30, 2026
973b744
QVAC-17830 fix: tighten combined perf report layout (column filtering…
tobi-legan Apr 30, 2026
0fba1d0
QVAC-18111 feat: env-driven perf iterations + Benchmark Performance (…
tobi-legan Apr 30, 2026
670ee24
QVAC-17830 fix: tool-calling EP label honours NO_GPU on linux-x64-cpu…
tobi-legan Apr 30, 2026
45caf66
QVAC-17830 fix: terser perf-report legend + full metric breakdown in …
tobi-legan Apr 30, 2026
7cf5f34
Merge remote-tracking branch 'origin/main' into feature-qvac-17830-vl…
tobi-legan Apr 30, 2026
47239ce
QVAC-17830 fix: use bare-os getEnv() for QVAC_PERF_RUNS / NO_GPU lookups
tobi-legan Apr 30, 2026
a561656
QVAC-18111 chore: align Benchmark Performance (LLM) workflow with the…
tobi-legan Apr 30, 2026
102f9cd
QVAC-17830 fix: address CodeQL security findings on combined-perf-rep…
tobi-legan Apr 30, 2026
a6751c5
QVAC-17830 fix: add shell + security note on Generate combined report…
tobi-legan Apr 30, 2026
5793a04
Merge branch 'main' into feature-qvac-17830-vlm-perf-metrics
tobi-legan May 4, 2026
da03bb5
QVAC-17830 feat: bridge QVAC_PERF_RUNS overrides into mobile bare run…
tobi-legan May 4, 2026
e6f04bf
QVAC-17830 feat: gate Benchmark Performance (LLM) to perf-emitting te…
tobi-legan May 4, 2026
e0ac869
Merge remote-tracking branch 'origin/main' into feature-qvac-17830-vl…
tobi-legan May 4, 2026
c3db5cb
Merge remote-tracking branch 'origin/main' into feature-qvac-17830-vl…
tobi-legan May 4, 2026
e67fe3a
Merge remote-tracking branch 'origin/main' into feature-qvac-17830-vl…
tobi-legan May 5, 2026
706b0d4
fix[ci]: drop PR-head checkout in combine-perf-reports to clear CodeQ…
tobi-legan May 6, 2026
3102ddf
Merge branch 'main' into feature-qvac-17830-vlm-perf-metrics
tobi-legan May 7, 2026
6665416
Merge branch 'main' into feature-qvac-17830-vlm-perf-metrics
tobi-legan May 11, 2026
8f30064
fix[ci]: add runGemma4Test + runOcrPaddleTest to iOS lightB
tobi-legan May 11, 2026
cb0a21c
fix[ci]: bound mobile monitor when AWS API permanently fails
tobi-legan May 11, 2026
23e23f7
mod[notask]: isolate iOS Gemma4 and OcrPaddle into their own Device F…
tobi-legan May 11, 2026
5d16ad2
QVAC-17830 fix: apply fruit-plate iOS OOM mitigation to high-res aurora
tobi-legan May 11, 2026
03e8346
Merge branch 'main' into feature-qvac-17830-vlm-perf-metrics
gianni-cor May 12, 2026
66c24fa
Merge branch 'main' into feature-qvac-17830-vlm-perf-metrics
gianni-cor May 12, 2026
@@ -89,6 +89,7 @@ jobs:
ref: ${{ needs.context.outputs.ref }}
qvac_perf_runs: ${{ inputs.qvac_perf_runs }}
qvac_perf_warmup_runs: ${{ inputs.qvac_perf_warmup_runs }}
qvac_perf_only: true

mobile-benchmarks:
needs: [context, prebuild]
@@ -105,6 +106,7 @@ jobs:
ref: ${{ needs.context.outputs.ref }}
qvac_perf_runs: ${{ inputs.qvac_perf_runs }}
qvac_perf_warmup_runs: ${{ inputs.qvac_perf_warmup_runs }}
qvac_perf_only: true

summarize:
# `if: always()` lets summarize run even when one of the benchmark
442 changes: 299 additions & 143 deletions .github/workflows/integration-mobile-test-llm-llamacpp.yml

Large diffs are not rendered by default.

76 changes: 75 additions & 1 deletion .github/workflows/integration-test-llm-llamacpp.yml
@@ -20,6 +20,11 @@ on:
type: string
required: false
default: ""
qvac_perf_only:
description: "If true, run only the perf-emitting tests (image-elephant, image-fruit-plate, image-high-res-aurora, bitnet, tool-calling)."
type: boolean
required: false
default: false

workflow_dispatch:
inputs:
@@ -42,6 +47,11 @@ on:
type: string
required: false
default: ""
qvac_perf_only:
description: "If true, run only the perf-emitting tests."
type: boolean
required: false
default: false

jobs:
run-integration-tests:
@@ -61,39 +71,53 @@ jobs:
    strategy:
      fail-fast: false
      matrix:
        # QVAC-17830: `label` disambiguates matrix entries that share
        # the same `<platform>-<arch>` tuple so every entry produces a
        # uniquely-named perf-report artifact. Without this, the two
        # `linux-x64` entries (CPU-only vs GPU runner) and the two
        # `linux-arm64` entries (ubuntu 22 vs 24) collided on upload
        # and `actions/download-artifact` silently dropped one of each,
        # hiding GPU data in the combined report.
        include:
          - os: ubuntu-22.04
            platform: linux
            arch: x64
            runner: ubuntu-22.04
            no_gpu: 'true'
            label: linux-x64-cpu
          - os: ubuntu-24.04
            platform: linux
            arch: x64
            runner: ai-run-linux-gpu
            timeout_minutes: 480
            label: linux-x64-gpu
          - os: ubuntu-24.04-arm
            platform: linux
            arch: arm64
            runner: ubuntu-24.04-arm
            no_gpu: 'true'
            label: linux-arm64-u24
          - os: ubuntu-22.04-arm
            platform: linux
            arch: arm64
            runner: ubuntu-22.04-arm
            no_gpu: 'true'
            label: linux-arm64-u22
          - os: macos-15-xlarge
            platform: darwin
            arch: arm64
            runner: macos-15-xlarge
            label: darwin-arm64
          - os: macos-15-large
            platform: darwin
            arch: x64
            runner: macos-15-large
            label: darwin-x64
          - os: windows-11
            platform: win32
            arch: x64
            runner: ai-run-windows11-gpu
            label: win32-x64

steps:
- name: Setup Node.js
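
The collision described in the matrix comment can be reproduced offline. The sketch below is illustrative only: the `perf-report-` prefix is shortened and the `tuple` naming scheme is a hypothetical stand-in for naming artifacts by `<platform>-<arch>`; the rows themselves are copied from the matrix above.

```shell
#!/usr/bin/env bash
# Sketch: list artifact names that would be produced more than once
# under a given naming scheme. Rows are "platform arch label".
matrix_rows() {
  cat <<'EOF'
linux x64 linux-x64-cpu
linux x64 linux-x64-gpu
linux arm64 linux-arm64-u24
linux arm64 linux-arm64-u22
darwin arm64 darwin-arm64
darwin x64 darwin-x64
win32 x64 win32-x64
EOF
}

# $1 = "tuple" (hypothetical platform-arch naming) or "label".
# Prints each artifact name that occurs more than once.
duplicate_names() {
  local scheme="$1"
  matrix_rows | while read -r platform arch label; do
    if [ "$scheme" = "tuple" ]; then
      echo "perf-report-${platform}-${arch}"
    else
      echo "perf-report-${label}"
    fi
  done | sort | uniq -d
}

duplicate_names tuple   # prints perf-report-linux-arm64 and perf-report-linux-x64
duplicate_names label   # prints nothing: every label is unique
```

A check of this shape could run as a lint step before the matrix is expanded, so that a new entry with a reused label fails fast instead of silently overwriting an artifact.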
@@ -211,6 +235,17 @@ jobs:
        if: ${{ matrix.platform != 'win32' }}
        working-directory: ${{ env.WORKDIR }}
        run: |
          if [ "${{ inputs.qvac_perf_only }}" = "true" ]; then
            echo "qvac_perf_only=true: regenerating test/integration/all.js with perf-emitting tests only"
            npx brittle -r test/integration/all.js \
              test/integration/bitnet.test.js \
              test/integration/tool-calling.test.js \
              test/integration/image-elephant.test.js \
              test/integration/image-fruit-plate.test.js \
              test/integration/image-high-res-aurora.test.js
            bare test/integration/all.js --exit 2>&1 | tee test-output.log
            exit ${PIPESTATUS[0]}
          fi
          npm run test:integration 2>&1 | tee test-output.log
          exit ${PIPESTATUS[0]}
        shell: bash
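
Both branches of the step above end with `exit ${PIPESTATUS[0]}` because a pipeline's own exit status is that of its last command (`tee`), which would otherwise mask a failing test run. A minimal sketch of the pattern, assuming bash; `fake_runner` and `run_and_capture` are hypothetical names standing in for the real test command:

```shell
#!/usr/bin/env bash
# Sketch: recover the exit status of the first command in a pipeline.

# Hypothetical stand-in for the test runner: prints a line, exits with $1.
fake_runner() {
  echo "running tests"
  return "$1"
}

# Pipe the runner through tee (as the workflow does) and return the
# runner's status instead of tee's.
run_and_capture() {
  local log status
  log="$(mktemp)"
  fake_runner "$1" 2>&1 | tee "$log" >/dev/null
  status=${PIPESTATUS[0]}   # must be read immediately after the pipeline
  rm -f "$log"
  return "$status"
}

run_and_capture 3 || echo "runner failed with status $?"
```

`PIPESTATUS` is bash-specific and is overwritten by the next command, so it has to be read on the line directly after the pipeline. `set -o pipefail` is an alternative, but it reports only that some stage failed rather than the runner's exact status.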
@@ -223,10 +258,49 @@ jobs:
if: ${{ matrix.platform == 'win32' }}
working-directory: ${{ env.WORKDIR }}
run: |
npm run test:integration:generate
if ("${{ inputs.qvac_perf_only }}" -eq "true") {
Write-Host "qvac_perf_only=true: regenerating test/integration/all.js with perf-emitting tests only"
npx brittle -r test/integration/all.js `
test/integration/bitnet.test.js `
test/integration/tool-calling.test.js `
test/integration/image-elephant.test.js `
test/integration/image-fruit-plate.test.js `
test/integration/image-high-res-aurora.test.js
} else {
npm run test:integration:generate
}
bare test/integration/all.js --exit | Tee-Object test-output.log
shell: powershell
env:
QASE_API_TOKEN: ${{ secrets.QASE_API_TOKEN }}
QVAC_PERF_RUNS: ${{ inputs.qvac_perf_runs }}
QVAC_PERF_WARMUP_RUNS: ${{ inputs.qvac_perf_warmup_runs }}

      - name: Generate HTML performance report
        if: ${{ always() }}
        working-directory: ${{ env.WORKDIR }}
        shell: bash
        run: |
          if [ -f test/results/performance-report.json ]; then
            echo "Found performance-report.json, generating HTML/MD/summary..."
            node ../../scripts/perf-report/aggregate.js \
              --dir test/results \
              --output-html test/results/performance-report.html \
              --output-json test/results/performance-summary.json \
              --output test/results/performance-report.md
          else
            echo "performance-report.json not found - skipping HTML generation"
          fi

      - name: Upload performance report
        if: ${{ always() }}
        uses: actions/upload-artifact@bbbca2ddaa5d8feaa63e36b76fdaad77386f024f # 7.0.0
        with:
          name: perf-report-llamacpp-llm-${{ matrix.label }}-${{ github.run_number }}
          path: |
            ${{ env.WORKDIR }}/test/results/performance-report.json
            ${{ env.WORKDIR }}/test/results/performance-report.html
            ${{ env.WORKDIR }}/test/results/performance-summary.json
            ${{ env.WORKDIR }}/test/results/performance-report.md
          retention-days: 90
          if-no-files-found: ignore