
Add matrix regression detection CI workflow#9

Merged
intech merged 1 commit into main from ci/benchmark-workflow
Apr 19, 2026

intech commented Apr 19, 2026

Summary

Adds a GitHub Actions workflow that runs the bench-matrix.ts suite on
every PR against main, diffs the results against the latest main
baseline, and posts a sticky comment flagging >5% throughput regressions
and >10% memory regressions. Push-to-main runs refresh the authoritative
baseline artifact. Supporting scripts keep the behaviour reproducible
locally via npm run bench:matrix:ci + npm run bench:matrix:compare.
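The per-row decision can be sketched as a signed percentage delta against the baseline, with the direction flipped for memory: falling throughput is bad, but rising bytes are bad. This is a minimal sketch, not the code in compare-results.ts. The function names, the symmetric "improved" thresholds, and the use of bytesPerOp as the memory metric are assumptions. Per the follow-ups, the fixture runner does not emit bytesPerOp yet.

```typescript
// Thresholds from the PR description: flag a throughput drop of more than 5%
// and a memory increase of more than 10%. The symmetric "improved" cut-offs
// are an assumption of this sketch.
const THROUGHPUT_THRESHOLD_PCT = 5;
const MEMORY_THRESHOLD_PCT = 10;

type Verdict = "REGRESSION" | "improved" | "unchanged";

// Signed percentage change from baseline to current.
function deltaPct(baseline: number, current: number): number {
  return ((current - baseline) / baseline) * 100;
}

// Throughput (ops/sec): higher is better, so a large negative delta regresses.
function classifyThroughput(baselineOps: number, currentOps: number): Verdict {
  const d = deltaPct(baselineOps, currentOps);
  if (d < -THROUGHPUT_THRESHOLD_PCT) return "REGRESSION";
  if (d > THROUGHPUT_THRESHOLD_PCT) return "improved";
  return "unchanged";
}

// Memory (hypothetical bytesPerOp): lower is better, so a large positive delta regresses.
function classifyMemory(baselineBytes: number, currentBytes: number): Verdict {
  const d = deltaPct(baselineBytes, currentBytes);
  if (d > MEMORY_THRESHOLD_PCT) return "REGRESSION";
  if (d < -MEMORY_THRESHOLD_PCT) return "improved";
  return "unchanged";
}

console.log(classifyThroughput(1000, 900)); // -10% → REGRESSION
console.log(classifyThroughput(1000, 1091)); // +9.1% → improved
console.log(classifyMemory(2048, 2200)); // about +7.4% → unchanged
```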

This is additive: no existing benchmark code is touched beyond
honouring two new env vars (BENCH_MATRIX_TIME, BENCH_MATRIX_WARMUP)
in bench-matrix.ts, which the CI wrapper uses to get a tighter RME
(relative margin of error) than the dev-optimised defaults allow.
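Here is a minimal sketch of how a runner could honour the two variables while keeping its own defaults when they are unset or malformed. The helper names and the DEV_TIME_MS / DEV_WARMUP_MS fallback values are hypothetical. The PR states only the CI values (3000 ms / 1000 ms), not the dev-optimised defaults. The environment is passed in as a plain object so the logic can be exercised without touching process.env.

```typescript
// Hypothetical dev-optimised defaults; the real values in bench-matrix.ts may differ.
const DEV_TIME_MS = 500;
const DEV_WARMUP_MS = 100;

type Env = Record<string, string | undefined>;

// Parse a positive integer millisecond budget from the environment, falling
// back to the default when the variable is unset, empty, or not a positive integer.
function readBudgetMs(env: Env, name: string, fallback: number): number {
  const raw = env[name];
  if (raw === undefined || raw.trim() === "") return fallback;
  const value = Number(raw);
  return Number.isInteger(value) && value > 0 ? value : fallback;
}

function resolveBudgets(env: Env): { timeMs: number; warmupMs: number } {
  return {
    timeMs: readBudgetMs(env, "BENCH_MATRIX_TIME", DEV_TIME_MS),
    warmupMs: readBudgetMs(env, "BENCH_MATRIX_WARMUP", DEV_WARMUP_MS),
  };
}

// CI-style invocation with the budgets run-matrix-ci.sh passes.
console.log(resolveBudgets({ BENCH_MATRIX_TIME: "3000", BENCH_MATRIX_WARMUP: "1000" }));
// Local run with nothing set keeps the dev defaults.
console.log(resolveBudgets({}));
```

In the runner the call would presumably be resolveBudgets(process.env). Rejecting non-positive and non-integer values means a typo in the workflow falls back to the defaults rather than producing a zero-length measurement window.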

Targets feat/benchmark-matrix (PR #7) so the workflow rides in with
the matrix when that PR lands on main.

What's in the box

  • .github/workflows/benchmark.yaml — PR + push + manual triggers,
    Node 22 (matches .nvmrc consumer requirement), 25-minute timeout,
    concurrency group scoped per ref.
  • benchmarks/scripts/run-matrix-ci.sh — host profile logging +
    throwaway JIT warmup run + real measurement pass at 3000 ms time /
    1000 ms warmup, extracts the matrix JSON payload from stdout into
    bench-results.json.
  • benchmarks/scripts/compare-results.ts — delta computation against
    the baseline, configurable thresholds, emits a sticky-comment-ready
    markdown table. Exits 0 even on regression: the workflow surfaces
    the flag via a ::warning:: annotation plus the PR comment, so
    intentional throughput trade-offs are not hard-blocked. The tech lead
    can promote this to a hard failure later.
  • benchmarks/baselines/README.md — documents the two-tier storage
    decision (Actions artifact bench-baseline-main is source of truth
    with 365-day retention; committed baselines/main.json is the
    zero-network fallback and local-dev quick reference; refreshed by a
    follow-up chore PR after material main-branch moves).
  • bench:matrix:ci + bench:matrix:compare npm scripts — same
    pipeline, runnable locally.
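The sticky-comment table and the non-blocking warning could look roughly like the sketch below. Everything here is an illustrative assumption rather than the actual compare-results.ts output, including the ComparedRow shape, the fixture names, the column layout, and the hidden <!-- bench-matrix-compare --> marker. Such a marker is a common way for a workflow step to find and update its own comment. The `::warning title=<title>::<message>` line is the standard GitHub Actions workflow command format, and its message data needs %, CR and LF percent-encoded.

```typescript
// Hypothetical row shape; compare-results.ts may use different field names.
interface ComparedRow {
  fixture: string;
  baselineOps: number | null; // null when the fixture is new on this branch
  currentOps: number;
  deltaPct: number | null;
  status: "REGRESSION" | "improved" | "unchanged" | "new";
}

// Hidden HTML marker so a later run can locate and overwrite the same comment.
const STICKY_MARKER = "<!-- bench-matrix-compare -->";

function fmtOps(n: number | null): string {
  return n === null ? "n/a" : Math.round(n).toLocaleString("en-US");
}

function fmtDelta(d: number | null): string {
  if (d === null) return "n/a";
  return `${d >= 0 ? "+" : ""}${d.toFixed(1)}%`;
}

function renderMarkdown(rows: ComparedRow[]): string {
  const lines = [
    STICKY_MARKER,
    "| Fixture | Baseline ops/s | Current ops/s | Δ | Status |",
    "| --- | ---: | ---: | ---: | --- |",
  ];
  for (const r of rows) {
    lines.push(
      `| ${r.fixture} | ${fmtOps(r.baselineOps)} | ${fmtOps(r.currentOps)} | ${fmtDelta(r.deltaPct)} | ${r.status} |`,
    );
  }
  return lines.join("\n");
}

// Workflow-command message data must percent-encode %, CR and LF.
function escapeData(s: string): string {
  return s.replace(/%/g, "%25").replace(/\r/g, "%0D").replace(/\n/g, "%0A");
}

// One ::warning:: line per regressed row; printing these does not change the exit code.
function warningAnnotations(rows: ComparedRow[]): string[] {
  return rows
    .filter((r) => r.status === "REGRESSION")
    .map((r) => `::warning title=Benchmark regression::${escapeData(`${r.fixture} ${fmtDelta(r.deltaPct)}`)}`);
}

const rows: ComparedRow[] = [
  { fixture: "parse-small", baselineOps: 1000, currentOps: 900, deltaPct: -10, status: "REGRESSION" },
  { fixture: "parse-large", baselineOps: 1000, currentOps: 1091, deltaPct: 9.1, status: "improved" },
  { fixture: "stream-new", baselineOps: null, currentOps: 500, deltaPct: null, status: "new" },
];
console.log(renderMarkdown(rows));
for (const line of warningAnnotations(rows)) console.log(line);
```

In this sketch, promoting regressions to a hard failure later would only mean setting a non-zero exit code whenever warningAnnotations returns a non-empty list.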

Baseline storage decision

Two-tier. Artifacts hold the authoritative baseline because they give
trend history for free and survive repo churn. A committed
baselines/main.json quick-reference de-risks the artifact dependency
and lets developers run the comparison offline. Refresh of the
in-repo file is manual via a chore(benchmarks): refresh main baseline
PR until a follow-up automates it. Rationale in
benchmarks/baselines/README.md.
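The two-tier lookup can be sketched as an ordered list of candidate files. The artifact is tried first, the committed file serves as fallback, and "none" is returned so the comment can say "No baseline available". Both paths and the resolveBaseline name are hypothetical. In the workflow the artifact would first have to be downloaded to some local directory. File access is injected to keep the logic testable. Under Node it could be backed by existsSync / readFileSync from node:fs.

```typescript
// Hypothetical locations: where a workflow step might download the
// bench-baseline-main artifact, and the committed quick-reference file.
const ARTIFACT_BASELINE = "artifacts/bench-baseline-main/bench-results.json";
const COMMITTED_BASELINE = "benchmarks/baselines/main.json";

type BaselineSource = "artifact" | "committed" | "none";

interface ResolvedBaseline {
  source: BaselineSource;
  json: unknown; // null when no usable baseline was found
}

// readFile returns the file contents, or undefined when the file is absent.
function resolveBaseline(readFile: (path: string) => string | undefined): ResolvedBaseline {
  const candidates: Array<[Exclude<BaselineSource, "none">, string]> = [
    ["artifact", ARTIFACT_BASELINE], // authoritative tier
    ["committed", COMMITTED_BASELINE], // zero-network fallback
  ];
  for (const [source, path] of candidates) {
    const text = readFile(path);
    if (text === undefined) continue;
    try {
      return { source, json: JSON.parse(text) };
    } catch {
      // A corrupt file is treated like a missing one; try the next tier.
    }
  }
  return { source: "none", json: null };
}

// Offline developer machine: only the committed file exists.
const files = new Map<string, string>([[COMMITTED_BASELINE, '{"rows": []}']]);
console.log(resolveBaseline((p) => files.get(p)).source); // → committed
```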

Test plan

  • Workflow YAML parses (python3 -c 'import yaml; yaml.safe_load(open(".github/workflows/benchmark.yaml"))' locally — passes).
  • bash -n benchmarks/scripts/run-matrix-ci.sh — passes.
  • compare-results.ts smoke test on synthetic baseline + current
    JSON correctly flags a -10% row as REGRESSION, +9.1% as improved,
    and a new-only row as new — verified locally.
  • First CI run on this PR produces a bench-results-<n> artifact
    and posts a sticky comment. Since no bench-baseline-main artifact
    exists yet on the fork, the first comment will be informational
    ("No baseline available").
  • After this PR rides into main via PR #7 (Add benchmark matrix with realistic fixtures), the push-to-main run
    uploads the first bench-baseline-main artifact. Subsequent PRs
    get real deltas.
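The compare-script smoke test described above can be reproduced with a join keyed by fixture name. This minimal sketch assumes results reduce to a fixture-name to ops/sec map. The fixture names, the joinResults helper, and the "removed" status for baseline-only rows are hypothetical additions. The REGRESSION / improved / new labels mirror the ones in the test plan.

```typescript
// Hypothetical minimal result shape: fixture name → ops/sec.
type Results = Record<string, number>;

type RowStatus = "REGRESSION" | "improved" | "unchanged" | "new" | "removed";

interface JoinedRow {
  fixture: string;
  status: RowStatus;
  deltaPct: number | null;
}

// Join baseline and current results by fixture name. Fixtures only in the
// current run are "new"; fixtures only in the baseline are "removed".
function joinResults(baseline: Results, current: Results, thresholdPct = 5): JoinedRow[] {
  const names = new Set([...Object.keys(baseline), ...Object.keys(current)]);
  const rows: JoinedRow[] = [];
  for (const fixture of Array.from(names).sort()) {
    const before: number | undefined = baseline[fixture];
    const after: number | undefined = current[fixture];
    if (before === undefined) {
      rows.push({ fixture, status: "new", deltaPct: null });
    } else if (after === undefined) {
      rows.push({ fixture, status: "removed", deltaPct: null });
    } else {
      const deltaPct = ((after - before) / before) * 100;
      const status: RowStatus =
        deltaPct < -thresholdPct ? "REGRESSION" : deltaPct > thresholdPct ? "improved" : "unchanged";
      rows.push({ fixture, status, deltaPct });
    }
  }
  return rows;
}

// Synthetic inputs matching the smoke test: -10%, +9.1%, and a new-only row.
const baseline: Results = { "fixture-a": 1000, "fixture-b": 1000 };
const current: Results = { "fixture-a": 900, "fixture-b": 1091, "fixture-c": 400 };
for (const row of joinResults(baseline, current)) console.log(row.fixture, row.status);
```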

Follow-ups

  • Trend dashboard: small JSON-aggregation script that pulls the last
    N bench-baseline-main artifacts and renders a sparkline per fixture
    into the benchmarks README. Blocked on having N>1 baselines.
  • Self-hosted single-core runner to cut RME further. GitHub-hosted
    runners give ±2–5% RME on most fixtures; a pinned bare-metal runner
    would bring that to ±0.5% and let us drop the regression threshold
    to 3%.
  • Automate the in-repo main.json refresh by having the push-to-main
    job open a PR with the updated file instead of relying on manual
    chore PRs.
  • Wire bytesPerOp into bench-matrix.ts (currently only the compare
    script has the plumbing; the fixture runner does not yet emit a
    per-op heap delta).
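For the trend-dashboard follow-up, a per-fixture sparkline can be rendered with the eight Unicode block characters, scaling each value between the series minimum and maximum. This sketch assumes the aggregation script has already pulled one ops/sec value per fixture from each of the last N bench-baseline-main artifacts, oldest first. The sparkline helper is hypothetical.

```typescript
// Unicode block characters from lowest to highest.
const BARS = ["▁", "▂", "▃", "▄", "▅", "▆", "▇", "█"];

// Map a series (e.g. ops/sec from successive baselines) onto the eight
// block heights, scaled linearly between the series min and max.
function sparkline(values: number[]): string {
  if (values.length === 0) return "";
  const min = Math.min(...values);
  const max = Math.max(...values);
  const span = max - min;
  return values
    .map((v) => {
      // A flat series has no range to scale against; draw it at mid height.
      const idx = span === 0 ? 3 : Math.round(((v - min) / span) * (BARS.length - 1));
      return BARS[idx];
    })
    .join("");
}

console.log(sparkline([1000, 1100, 1050, 1200])); // → ▁▅▃█
```

Min/max scaling makes small wobbles on a quiet fixture look dramatic. The dashboard may therefore want to print the min and max next to each sparkline.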

intech changed the title from "ci(benchmarks): matrix regression detection workflow" to "ci(benchmarks): Add matrix regression detection workflow" Apr 19, 2026
intech changed the title from "ci(benchmarks): Add matrix regression detection workflow" to "Add matrix regression detection CI workflow" Apr 19, 2026
intech force-pushed the feat/benchmark-matrix branch from 0852010 to 63b86ef on April 19, 2026 21:35
intech force-pushed the ci/benchmark-workflow branch from 11018a3 to c54d41a on April 19, 2026 21:35
intech self-assigned this Apr 19, 2026
Adds a GitHub Actions workflow that runs the benchmark matrix on every
PR against main and flags >5% throughput regressions / >10% memory
regressions. Push-to-main runs refresh the authoritative baseline
artifact.

- .github/workflows/benchmark.yaml — PR + push + manual triggers on
  Node 22, 25-minute timeout, concurrency group scoped per ref.
- benchmarks/scripts/run-matrix-ci.sh — host profile + discarded JIT
  warmup run + measurement pass with 3000 ms time / 1000 ms warmup,
  extracts the matrix JSON payload from stdout into bench-results.json.
- benchmarks/scripts/compare-results.ts — delta computation with
  configurable thresholds, emits a sticky-comment-ready markdown table,
  exits 0 on regression (workflow surfaces via ::warning:: annotation).
- benchmarks/src/bench-matrix.ts — honour BENCH_MATRIX_TIME and
  BENCH_MATRIX_WARMUP env vars so the wrapper can tune budgets without
  editing the runner.
- benchmarks/baselines/README.md — documents the two-tier storage
  (Actions artifact is source of truth, in-repo main.json is the
  zero-network fallback and local-dev quick reference).
- bench:matrix:ci + bench:matrix:compare npm scripts — local equivalent
  of the CI flow for reproducible dev-side regression checks.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
intech force-pushed the ci/benchmark-workflow branch from c54d41a to c69ff8b on April 19, 2026 22:20
intech changed the base branch from feat/benchmark-matrix to main April 19, 2026 22:20
intech merged commit f764e81 into main Apr 19, 2026
2 checks passed
intech deleted the ci/benchmark-workflow branch April 19, 2026 22:22
intech added a commit that referenced this pull request Apr 20, 2026
Formatting drift from PR #9 surfaced by turbo format check.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>


1 participant