perf(ci): Improve benchmark stability with interleaved execution #1348
Conversation
…extract scripts

- Switch from sequential (all PR then all main) to interleaved execution (PR→main alternating) so both branches experience similar runner load conditions, reducing variance in the measured difference
- Increase measurement runs from 10/20/10 to 20/30/20 for better statistical stability
- Extract inline Node.js scripts from YAML into separate .mjs files under `.github/scripts/perf-benchmark/` for maintainability
- Use sparse-checkout for jobs that only need the scripts

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
⚡ Performance Benchmark
📝 Walkthrough

The PR refactors performance benchmarking infrastructure by extracting inline benchmark scripts from GitHub Actions workflows into dedicated Node.js files in `.github/scripts/perf-benchmark/`.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: ✅ 3 passed
- Extract inline benchmark script to bench-run-history.mjs
- Increase measurement runs from 10/20/10 to 20/30/20 to match perf-benchmark.yml for consistency

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Deploying repomix with Cloudflare Pages

Latest commit: 50a2cc0
Status: ✅ Deploy successful!
Preview URL: https://44a49c4b.repomix.pages.dev
Branch Preview URL: https://perf-benchmark-interleave-ex.repomix.pages.dev
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Codecov Report

✅ All modified and coverable lines are covered by tests.

@@ Coverage Diff @@
## main #1348 +/- ##
=======================================
Coverage 87.13% 87.13%
=======================================
Files 116 116
Lines 4393 4393
Branches 1020 1020
=======================================
Hits 3828 3828
Misses 565 565

☔ View full report in Codecov by Sentry.
- Alternate PR/main execution order on even/odd iterations to neutralize ordering bias from CPU/filesystem cache warming
- Add try/catch in measurement loops so a single failure doesn't lose all data; abort if all runs fail
- Extract shared esc(), extractHistory(), renderHistory() into bench-utils.mjs to eliminate duplication between the pending and comment scripts
- Add error logging for JSON parse failures instead of silent catch
- Fix biome lint: use template literals, sort imports, expand single-line try/catch blocks, avoid assignment in expressions

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
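One of the shared helpers moved into bench-utils.mjs could look roughly like this; a minimal sketch of an HTML-escape helper, assuming the actual `esc()` behaves similarly (the real implementation may differ):

```javascript
// Sketch: HTML-escape helper shared between the pending-comment and
// results-comment generators, so the markup logic lives in one place.
// In the real layout this would be exported from bench-utils.mjs.
function esc(value) {
  return String(value)
    .replaceAll("&", "&amp;")   // must run first to avoid double-escaping
    .replaceAll("<", "&lt;")
    .replaceAll(">", "&gt;")
    .replaceAll('"', "&quot;");
}
```

Centralizing this avoids the two comment scripts drifting apart on edge cases like ampersands in commit messages.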
Add early exit guard matching bench-run-history.mjs behavior, so a broken build fails the workflow step instead of silently reporting 0ms. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
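The guard described above can be as simple as refusing to summarize an empty sample set; a hedged sketch with hypothetical names, not the actual script:

```javascript
// Sketch: abort with a non-zero exit code when every benchmark run
// failed, so a broken build fails the workflow step instead of
// silently reporting 0ms.
function summarize(samples) {
  if (samples.length === 0) {
    console.error("All benchmark runs failed; aborting.");
    process.exit(1);
  }
  // Mean of the successful runs, in milliseconds.
  return samples.reduce((a, b) => a + b, 0) / samples.length;
}
```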
Improve performance benchmark reliability by reducing variance in the PR vs main comparison, and improve maintainability by extracting inline scripts.
Changes
Interleaved execution
Switch from sequential execution (all PR runs → all main runs) to interleaved execution (PR → main alternating each iteration). This ensures both branches experience similar runner load conditions at each measurement point, significantly reducing variance in the difference between PR and main timings.
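The interleaved pattern can be sketched as follows; `measureOnce` is a hypothetical stand-in for one timed benchmark run, and the alternating order per iteration reflects the even/odd swap described in the commits (the real bench-run.mjs differs in detail):

```javascript
// Sketch: interleaved PR/main measurement with alternating order.
function interleavedBench(measureOnce, runs) {
  const pr = [];
  const main = [];
  for (let i = 0; i < runs; i++) {
    // Alternate which branch goes first each iteration to cancel
    // ordering bias from CPU/filesystem cache warming.
    const order = i % 2 === 0 ? ["pr", "main"] : ["main", "pr"];
    for (const branch of order) {
      const ms = measureOnce(branch); // one timed run, in milliseconds
      (branch === "pr" ? pr : main).push(ms);
    }
  }
  return { pr, main };
}
```

Because each PR sample has a main sample taken under near-identical runner conditions, noise that affects both branches cancels out of the difference.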
Increased measurement runs
Measurement runs increase from 10/20/10 to 20/30/20. More samples improve statistical stability, which matters because the benchmark runs on shared CI runners. The additional time (~30-50s per OS) is well within the 15-minute timeout.
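To see why extra samples help, note that the reported PR-vs-main delta is typically built from a robust summary statistic; a hypothetical sketch using the median (the actual script's statistic may differ):

```javascript
// Sketch: median of timing samples; less sensitive than the mean to
// outlier runs on noisy shared CI runners.
function median(samples) {
  const s = [...samples].sort((a, b) => a - b);
  const mid = Math.floor(s.length / 2);
  return s.length % 2 === 0 ? (s[mid - 1] + s[mid]) / 2 : s[mid];
}

// Relative difference of PR vs main medians, in percent.
function diffPercent(prSamples, mainSamples) {
  const m = median(mainSamples);
  return ((median(prSamples) - m) / m) * 100;
}
```

With more samples, a single slow outlier shifts the median less, so the reported percentage is steadier across reruns.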
Extract scripts to separate files
Move inline Node.js scripts from YAML into `.github/scripts/perf-benchmark/`:

- `bench-run.mjs` — Benchmark execution (interleaved measurement)
- `bench-pending.mjs` — Pending comment generation
- `bench-comment.mjs` — Results comment generation

This reduces the workflow YAML from ~370 lines to ~160 lines, enables proper syntax highlighting/linting, and makes the scripts easier to review and maintain. Jobs that only need the scripts use `sparse-checkout` for fast checkout.

Checklist

- `npm run test`
- `npm run lint`