
perf(ci): Improve benchmark stability with interleaved execution #1348

Merged
yamadashy merged 5 commits into main from perf/benchmark-interleave-extract-scripts
Mar 28, 2026

Conversation

yamadashy (Owner) commented Mar 28, 2026

Improve performance benchmark reliability by reducing variance in the PR vs main comparison, and improve maintainability by extracting inline scripts.

Changes

Interleaved execution

Switch from sequential execution (all PR runs → all main runs) to interleaved execution (PR → main alternating each iteration). This ensures both branches experience similar runner load conditions at each measurement point, significantly reducing variance in the difference between PR and main timings.

# Before (sequential): runner load changes between blocks skew the diff
PR, PR, PR, ..., main, main, main, ...

# After (interleaved): both branches share the same conditions per iteration
PR, main, PR, main, PR, main, ...
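
The alternation above can be sketched as a small schedule generator (a hypothetical helper for illustration, not the actual bench-run.mjs code):

```javascript
// Sketch of the interleaved schedule (hypothetical helper, not the
// actual bench-run.mjs implementation). Each iteration measures PR and
// main back to back, so transient runner load hits both branches
// roughly equally and mostly cancels out of the PR-minus-main diff.
function interleavedSchedule(runs) {
  const order = [];
  for (let i = 0; i < runs; i++) {
    order.push('PR', 'main');
  }
  return order;
}

console.log(interleavedSchedule(3).join(', ')); // → PR, main, PR, main, PR, main
```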

Increased measurement runs

  • Ubuntu: 10 → 20
  • macOS: 20 → 30
  • Windows: 10 → 20

More samples improve statistical stability, which matters especially because the benchmark runs on shared CI runners. The additional time (~30-50s per OS) is well within the 15-minute timeout.
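The benchmark comment reports each OS as median ± IQR, which is robust to the occasional outlier run on a shared runner. A rough sketch of those statistics (hypothetical helpers, assuming simple half-sample quartiles rather than whatever interpolation the actual scripts use):

```javascript
// Median and IQR over benchmark samples (hypothetical helpers; the
// actual scripts may use a different quartile interpolation).
function median(values) {
  const s = [...values].sort((a, b) => a - b);
  const mid = Math.floor(s.length / 2);
  return s.length % 2 ? s[mid] : (s[mid - 1] + s[mid]) / 2;
}

function iqr(values) {
  const s = [...values].sort((a, b) => a - b);
  const mid = Math.floor(s.length / 2);
  const lower = s.slice(0, mid);                       // below the median
  const upper = s.slice(s.length % 2 ? mid + 1 : mid); // above the median
  return median(upper) - median(lower);                // Q3 - Q1
}

const samples = [2.1, 2.0, 2.3, 2.1, 2.2, 2.0, 2.1];
console.log(`${median(samples)}s (±${iqr(samples).toFixed(2)}s)`); // → 2.1s (±0.20s)
```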

Extract scripts to separate files

Move inline Node.js scripts from YAML into .github/scripts/perf-benchmark/:

  • bench-run.mjs — Benchmark execution (interleaved measurement)
  • bench-pending.mjs — Pending comment generation
  • bench-comment.mjs — Results comment generation

This reduces the workflow YAML from ~370 lines to ~160 lines, enables proper syntax highlighting/linting, and makes the scripts easier to review and maintain. Jobs that only need the scripts use sparse-checkout for fast checkout.

Checklist

  • Run npm run test
  • Run npm run lint


…extract scripts

- Switch from sequential (all PR then all main) to interleaved execution
  (PR→main alternating) so both branches experience similar runner load
  conditions, reducing variance in the measured difference
- Increase measurement runs from 10/20/10 to 20/30/20 for better
  statistical stability
- Extract inline Node.js scripts from YAML into separate .mjs files
  under .github/scripts/perf-benchmark/ for maintainability
- Use sparse-checkout for jobs that only need the scripts

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
github-actions bot (Contributor) commented Mar 28, 2026

⚡ Performance Benchmark

Latest commit: 50a2cc0 fix(ci): Exit with error when all benchmark runs fail
Status: ✅ Benchmark complete!
Ubuntu: 2.08s (±0.04s) → 2.08s (±0.04s) · +0.00s (+0.0%)
macOS: 1.44s (±0.18s) → 1.39s (±0.17s) · -0.05s (-3.4%)
Windows: 2.30s (±0.12s) → 2.30s (±0.13s) · -0.00s (-0.2%)
Details
  • Packing the repomix repository with node bin/repomix.cjs
  • Warmup: 2 runs (discarded), interleaved execution
  • Measurement: 20 runs / 30 on macOS (median ± IQR)
  • Workflow run
History

1dbaebc refactor(ci): Address review feedback and fix lint errors

Ubuntu: 1.96s (±0.01s) → 1.96s (±0.02s) · +0.00s (+0.2%)
macOS: 1.70s (±0.17s) → 1.69s (±0.11s) · -0.01s (-0.6%)
Windows: 2.32s (±0.03s) → 2.33s (±0.05s) · +0.01s (+0.3%)

8e7cfe9 refactor(ci): Move history benchmark script to perf-benchmark-history/

Ubuntu: 2.16s (±0.02s) → 2.17s (±0.02s) · +0.01s (+0.3%)
macOS: 1.69s (±0.19s) → 1.73s (±0.17s) · +0.04s (+2.1%)
Windows: 2.81s (±0.06s) → 2.79s (±0.06s) · -0.02s (-0.7%)

coderabbitai bot (Contributor) commented Mar 28, 2026

Review skipped

Auto incremental reviews are disabled on this repository. To trigger a single review, invoke the @coderabbitai review command.
📝 Walkthrough

The PR refactors performance benchmarking infrastructure by extracting inline benchmark scripts from GitHub Actions workflows into dedicated Node.js files in .github/scripts/perf-benchmark and .github/scripts/perf-benchmark-history directories. Workflows are updated to invoke these external scripts instead of embedding logic inline, alongside adjustments to benchmark run counts and artifact handling.

Changes

Changes by cohort:

  • Benchmark Runner Scripts (.github/scripts/perf-benchmark-history/bench-run.mjs, .github/scripts/perf-benchmark/bench-run.mjs): Two benchmark runner scripts: one executes Repomix against a repo directory with warmup and statistical analysis (median/IQR); the other compares PR vs. main branches with per-run timing and writes results to JSON. Both perform warmup executions to stabilize the environment before measurement.
  • Benchmark Comment & History Scripts (.github/scripts/perf-benchmark/bench-comment.mjs, .github/scripts/perf-benchmark/bench-pending.mjs): Comment generation scripts: bench-comment.mjs generates completed benchmark results with history tables; bench-pending.mjs creates in-progress comments with embedded JSON history (max 50 entries) for future extraction and accumulation.
  • Workflow Updates (.github/workflows/perf-benchmark-history.yml, .github/workflows/perf-benchmark.yml): Replaced inline Node scripts with external script invocations; adjusted benchmark run counts (Ubuntu 10→20, macOS 20→30, Windows 10→20); added sparse-checkout steps in comment-generation jobs; simplified workflow logic by delegating to dedicated scripts.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks: ✅ 3 passed

  • Title check: ✅ Passed. The title clearly and specifically describes the main change (improving benchmark stability through interleaved execution), which is the primary innovation in this changeset.
  • Description check: ✅ Passed. The description provides comprehensive context: it motivates the interleaved execution approach with examples, details measurement run increases per OS, explains script extraction benefits, and includes the required checklist items.
  • Docstring Coverage: ✅ Passed. No functions found in the changed files to evaluate; skipping docstring coverage check.


- Extract inline benchmark script to bench-run-history.mjs
- Increase measurement runs from 10/20/10 to 20/30/20 to match
  perf-benchmark.yml for consistency

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
cloudflare-workers-and-pages bot commented Mar 28, 2026

Deploying repomix with Cloudflare Pages

Latest commit: 50a2cc0
Status: ✅  Deploy successful!
Preview URL: https://44a49c4b.repomix.pages.dev
Branch Preview URL: https://perf-benchmark-interleave-ex.repomix.pages.dev

View logs

gemini-code-assist[bot]

This comment was marked as resolved.

@claude

This comment has been minimized.

devin-ai-integration bot (Contributor) left a comment


✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 3 additional findings.


@claude

This comment has been minimized.

codecov bot commented Mar 28, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 87.13%. Comparing base (fe6da90) to head (50a2cc0).
⚠️ Report is 12 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1348   +/-   ##
=======================================
  Coverage   87.13%   87.13%           
=======================================
  Files         116      116           
  Lines        4393     4393           
  Branches     1020     1020           
=======================================
  Hits         3828     3828           
  Misses        565      565           

☔ View full report in Codecov by Sentry.

coderabbitai[bot]

This comment was marked as resolved.

- Alternate PR/main execution order on even/odd iterations to neutralize
  ordering bias from CPU/filesystem cache warming
- Add try/catch in measurement loops so a single failure doesn't lose
  all data; abort if all runs fail
- Extract shared esc(), extractHistory(), renderHistory() into
  bench-utils.mjs to eliminate duplication between pending and comment
- Add error logging for JSON parse failures instead of silent catch
- Fix biome lint: use template literals, sort imports, expand
  single-line try/catch blocks, avoid assignment in expressions

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
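Two of the ideas in this commit (alternating the PR/main order per iteration and surviving single-run failures) can be sketched as follows. This is hypothetical illustration code, not the actual bench-run.mjs, which spawns the CLI asynchronously; here the run functions just return a timing:

```javascript
// Sketch: alternate PR/main order on even/odd iterations to cancel
// cache-warming bias, tolerate individual run failures, and abort only
// when ALL runs of a branch fail (instead of silently reporting 0ms).
function measure(runs, runPr, runMain) {
  const pr = [];
  const main = [];
  for (let i = 0; i < runs; i++) {
    // Even iterations run PR first, odd iterations run main first.
    const order =
      i % 2 === 0 ? [[runPr, pr], [runMain, main]] : [[runMain, main], [runPr, pr]];
    for (const [run, samples] of order) {
      try {
        samples.push(run()); // one timing sample
      } catch (err) {
        console.error(`run ${i} failed: ${err.message}`); // keep going
      }
    }
  }
  if (pr.length === 0 || main.length === 0) {
    throw new Error('all benchmark runs failed'); // fail the workflow step
  }
  return { pr, main };
}

const calls = [];
const result = measure(
  2,
  () => { calls.push('PR'); return 1.0; },
  () => { calls.push('main'); return 1.1; },
);
console.log(calls.join(' ')); // → PR main main PR
```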
coderabbitai[bot]

This comment was marked as resolved.

devin-ai-integration[bot]

This comment was marked as resolved.

Add early exit guard matching bench-run-history.mjs behavior, so a
broken build fails the workflow step instead of silently reporting 0ms.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@yamadashy yamadashy merged commit bd9f343 into main Mar 28, 2026
61 checks passed
@yamadashy yamadashy deleted the perf/benchmark-interleave-extract-scripts branch March 28, 2026 15:01