Merged
Adds a GitHub Actions workflow that runs the benchmark matrix on every PR against main and flags >5% throughput regressions / >10% memory regressions. Push-to-main runs refresh the authoritative baseline artifact.

- .github/workflows/benchmark.yaml: PR + push + manual triggers on Node 22, 25-minute timeout, concurrency group scoped per ref.
- benchmarks/scripts/run-matrix-ci.sh: host profile + discarded JIT warmup run + measurement pass with 3000 ms time / 1000 ms warmup, extracts the matrix JSON payload from stdout into bench-results.json.
- benchmarks/scripts/compare-results.ts: delta computation with configurable thresholds, emits a sticky-comment-ready markdown table, exits 0 even on regression (the workflow surfaces it via ::warning:: annotation).
- benchmarks/src/bench-matrix.ts: honour BENCH_MATRIX_TIME and BENCH_MATRIX_WARMUP env vars so the wrapper can tune budgets without editing the runner.
- benchmarks/baselines/README.md: documents the two-tier storage (Actions artifact is source of truth, in-repo main.json is the zero-network fallback and local-dev quick reference).
- bench:matrix:ci + bench:matrix:compare npm scripts: local equivalent of the CI flow for reproducible dev-side regression checks.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
intech added a commit that referenced this pull request on Apr 20, 2026
Formatting drift from PR #9 surfaced by turbo format check. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Adds a GitHub Actions workflow that runs the bench-matrix.ts suite on every PR against main, diffs the results against the latest main baseline, and posts a sticky comment flagging >5% throughput regressions and >10% memory regressions. Push-to-main runs refresh the authoritative baseline artifact. Supporting scripts keep the behaviour reproducible locally via npm run bench:matrix:ci + npm run bench:matrix:compare.

This is additive: no existing benchmark code is touched beyond honouring two new env vars (BENCH_MATRIX_TIME, BENCH_MATRIX_WARMUP) in bench-matrix.ts, which the CI wrapper uses to get a tighter RME (relative margin of error) than the dev-optimised defaults.
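
As a rough illustration of what honouring the two env vars could look like, here is a minimal sketch. The helper names (readMsBudget, resolveBudgets) and the dev default values are hypothetical placeholders; the actual parsing in bench-matrix.ts may differ.

```typescript
// Hypothetical helper: parse a millisecond budget from an env-like object,
// falling back to the default when the variable is unset or unusable.
function readMsBudget(
  env: Record<string, string | undefined>,
  name: string,
  fallbackMs: number,
): number {
  const raw = env[name];
  if (raw === undefined || raw.trim() === "") return fallbackMs;
  const parsed = Number(raw);
  // Reject NaN, Infinity, zero, and negative budgets instead of passing them on.
  return Number.isFinite(parsed) && parsed > 0 ? parsed : fallbackMs;
}

// Placeholder dev defaults; the real values are not stated in this PR.
const DEV_DEFAULT_TIME_MS = 1000;
const DEV_DEFAULT_WARMUP_MS = 250;

function resolveBudgets(env: Record<string, string | undefined>): {
  timeMs: number;
  warmupMs: number;
} {
  return {
    timeMs: readMsBudget(env, "BENCH_MATRIX_TIME", DEV_DEFAULT_TIME_MS),
    warmupMs: readMsBudget(env, "BENCH_MATRIX_WARMUP", DEV_DEFAULT_WARMUP_MS),
  };
}

// The CI wrapper's budgets from this PR: 3000 ms time, 1000 ms warmup.
const ciBudgets = resolveBudgets({
  BENCH_MATRIX_TIME: "3000",
  BENCH_MATRIX_WARMUP: "1000",
});
console.log(ciBudgets);
```

In the runner itself the argument would presumably be process.env. Falling back on invalid input keeps local runs working when a variable is mistyped; failing loudly would be an equally reasonable choice for CI.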
Targets feat/benchmark-matrix (PR #7) so the workflow rides in with the matrix when that PR lands on main.

What's in the box
- .github/workflows/benchmark.yaml: PR + push + manual triggers, Node 22 (matches .nvmrc consumer requirement), 25-minute timeout, concurrency group scoped per ref.
- benchmarks/scripts/run-matrix-ci.sh: host profile logging + throwaway JIT warmup run + real measurement pass at 3000 ms time / 1000 ms warmup, extracts the matrix JSON payload from stdout into bench-results.json.
- benchmarks/scripts/compare-results.ts: delta computation against the baseline, configurable thresholds, emits a sticky-comment-ready markdown table. Exits 0 even on regression; the workflow surfaces the flag via ::warning:: annotation plus the PR comment so intentional throughput trades are not hard-blocked. Tech-lead can promote to hard-fail later.
- benchmarks/baselines/README.md: documents the two-tier storage decision (Actions artifact bench-baseline-main is source of truth with 365-day retention; committed baselines/main.json is the zero-network fallback and local-dev quick reference; refreshed by a follow-up chore PR after material main-branch moves).
- bench:matrix:ci + bench:matrix:compare npm scripts: same pipeline, runnable locally.
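
The comparison step described for compare-results.ts can be sketched roughly as follows. This is an illustration under stated assumptions, not the script itself: the row shape (name, opsPerSec, bytesPerOp), the "ok" status, and the symmetric rule for "improved" are all hypothetical; only the 5% / 10% thresholds, the REGRESSION / improved / new labels, and the ::warning:: annotation come from this PR.

```typescript
// Sketch only: field names and the "improved" rule are assumptions.
type Row = { name: string; opsPerSec: number; bytesPerOp?: number };
type Status = "REGRESSION" | "improved" | "ok" | "new";

interface Delta {
  name: string;
  status: Status;
  throughputPct: number | null; // null when there is no baseline row
  memoryPct: number | null; // null when either side lacks bytesPerOp
}

interface Thresholds {
  throughputPct: number; // flag throughput drops larger than this
  memoryPct: number; // flag memory growth larger than this
}

const DEFAULT_THRESHOLDS: Thresholds = { throughputPct: 5, memoryPct: 10 };

// Assumes base is non-zero; a zero baseline would need special handling.
function pctChange(base: number, current: number): number {
  return ((current - base) / base) * 100;
}

function compareRows(
  baseline: Row[],
  current: Row[],
  t: Thresholds = DEFAULT_THRESHOLDS,
): Delta[] {
  const byName = new Map<string, Row>();
  for (const row of baseline) byName.set(row.name, row);

  return current.map((cur): Delta => {
    const base = byName.get(cur.name);
    if (base === undefined) {
      return { name: cur.name, status: "new", throughputPct: null, memoryPct: null };
    }
    const throughputPct = pctChange(base.opsPerSec, cur.opsPerSec);
    const memoryPct =
      base.bytesPerOp !== undefined && cur.bytesPerOp !== undefined
        ? pctChange(base.bytesPerOp, cur.bytesPerOp)
        : null;
    const regressed =
      throughputPct < -t.throughputPct ||
      (memoryPct !== null && memoryPct > t.memoryPct);
    const status: Status = regressed
      ? "REGRESSION"
      : throughputPct > t.throughputPct
        ? "improved"
        : "ok";
    return { name: cur.name, status, throughputPct, memoryPct };
  });
}

function formatPct(value: number | null): string {
  if (value === null) return "n/a";
  return `${value >= 0 ? "+" : ""}${value.toFixed(1)}%`;
}

// Markdown table suitable for a PR comment body.
function renderMarkdown(deltas: Delta[]): string {
  return [
    "| Fixture | Throughput | Memory | Status |",
    "| --- | --- | --- | --- |",
    ...deltas.map(
      (d) =>
        `| ${d.name} | ${formatPct(d.throughputPct)} | ${formatPct(d.memoryPct)} | ${d.status} |`,
    ),
  ].join("\n");
}

// GitHub Actions turns "::warning::" lines on stdout into annotations.
function warningLines(deltas: Delta[]): string[] {
  return deltas
    .filter((d) => d.status === "REGRESSION")
    .map((d) => `::warning::Benchmark regression in ${d.name}`);
}
```

A caller would print renderMarkdown(...) for the sticky comment and each warningLines(...) entry to stdout, then exit 0 regardless of the result, which matches the soft-fail behaviour described above.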
Baseline storage decision
Two-tier. Artifacts hold the authoritative baseline because they give trend history for free and survive repo churn. A committed baselines/main.json quick-reference de-risks the artifact dependency and lets developers run the comparison offline. Refresh of the in-repo file is manual via a chore(benchmarks): refresh main baseline PR until a follow-up automates it. Rationale in benchmarks/baselines/README.md.

Test plan
- python3 -c 'import yaml; yaml.safe_load(open(".github/workflows/benchmark.yaml"))' run locally: passes.
- bash -n benchmarks/scripts/run-matrix-ci.sh: passes.
- compare-results.ts smoke test on synthetic baseline + current JSON correctly flags a -10% row as REGRESSION, +9.1% as improved, and a new-only row as new: verified locally.
- The PR workflow run uploads a bench-results-<n> artifact and posts a sticky comment. Since no bench-baseline-main artifact exists yet on the fork, the first comment will be informational ("No baseline available").
- Once this lands on main via PR #7 (Add benchmark matrix with realistic fixtures), the push-to-main run uploads the first bench-baseline-main artifact. Subsequent PRs get real deltas.
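
The "No baseline available" case follows from how a baseline gets resolved. Below is a minimal sketch of the two-tier lookup, assuming the artifact is tried first and the committed file second. The loader signature, the function names, and the wording of the non-empty comment intro are hypothetical; the PR does not state whether the workflow itself consults the committed file.

```typescript
// Sketch only: loaders are injected so the lookup order stays testable.
// Each hypothetical loader returns parsed rows, or null when unavailable.
type Loader = () => unknown[] | null;

interface Baseline {
  source: "artifact" | "committed";
  rows: unknown[];
}

function resolveBaseline(loadArtifact: Loader, loadCommitted: Loader): Baseline | null {
  // Tier 1: the bench-baseline-main Actions artifact (source of truth).
  const fromArtifact = loadArtifact();
  if (fromArtifact !== null) return { source: "artifact", rows: fromArtifact };

  // Tier 2: the committed baselines/main.json (zero-network fallback).
  const fromFile = loadCommitted();
  if (fromFile !== null) return { source: "committed", rows: fromFile };

  return null;
}

function commentIntro(baseline: Baseline | null): string {
  return baseline === null
    ? "No baseline available"
    : `Compared against the ${baseline.source} baseline`;
}

// First run on a fork: neither tier has data yet.
console.log(commentIntro(resolveBaseline(() => null, () => null)));
```

Injecting the loaders keeps network and filesystem access out of the decision logic, so the ordering can be checked locally without an Actions run.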
Follow-ups
- A trend job that reads N bench-baseline-main artifacts and renders a sparkline per fixture into the benchmarks README. Blocked on having N>1 baselines.
- Hosted runners give ±2–5% RME on most fixtures; a pinned bare-metal runner would bring that to ±0.5% and let us drop the regression threshold to 3%.
- Automate the main.json refresh by having the push-to-main job open a PR with the updated file instead of relying on manual chore PRs.
- Wire bytesPerOp into bench-matrix.ts (currently only the compare script has the plumbing; the fixture runner does not yet emit heap delta per op).