
ci(perf): Add benchmark history with JSON-in-comment storage #1289

Merged
yamadashy merged 5 commits into main from ci/perf-benchmark-json-history on Mar 22, 2026

@yamadashy (Owner) commented Mar 22, 2026

Store benchmark history as JSON in an HTML comment within the PR comment body. Each benchmark run archives its results into a JSON array (<!-- bench-history-json [...] -->), which is parsed with JSON.parse instead of fragile HTML regex extraction.
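A minimal JavaScript sketch of the JSON-in-comment round trip, assuming a Node script. The marker text follows the `<!-- bench-history-json [...] -->` shape above; `embedHistory` and `extractHistory` are hypothetical names, not the workflow's actual code. Escaping `>` as `\u003e` is one possible way to stop a `-->` inside a commit message from closing the comment early (a later commit in this PR uses start/end delimiters instead).

```javascript
// Marker text follows the shape given in the PR description; everything else is illustrative.
const HISTORY_MARKER = 'bench-history-json';

// Serialize history into an HTML comment. JSON.stringify leaves '>' as-is, so a
// commit message containing "-->" would end the comment early. Writing '>' as the
// JSON escape \u003e keeps the payload valid JSON that decodes back to '>'.
function embedHistory(history) {
  const json = JSON.stringify(history).replace(/>/g, '\\u003e');
  return `<!-- ${HISTORY_MARKER} ${json} -->`;
}

// Pull the JSON back out of a comment body; returns [] when absent or unparsable.
function extractHistory(body) {
  const match = body.match(new RegExp(`<!-- ${HISTORY_MARKER} ([\\s\\S]*?) -->`));
  if (!match) return [];
  try {
    const parsed = JSON.parse(match[1]);
    return Array.isArray(parsed) ? parsed : [];
  } catch {
    return [];
  }
}
```

Because the serialized payload never contains `>`, the first ` -->` after the marker is always the real end of the comment, so a lazy regex is enough to find it.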

How it works

  1. post-pending job: Reads the existing comment; if it contains completed results, archives them into the JSON history array, then posts a pending comment that preserves the history
  2. comment job: Reads JSON history from the pending comment, posts final results with a History <details> section rendered from JSON
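The archive step in post-pending can be sketched as a pure function. This assumes a hypothetical entry shape `{ sha, msg, results }` stored newest first; it prepends the previous completed run, skips it when it belongs to the commit being benchmarked again (the same-SHA check added later in this PR), and applies the 5-entry cap. `archivePrevious` is an illustrative name only.

```javascript
const MAX_HISTORY = 5;

// history: existing entries, newest first.
// previous: completed results recovered from the old comment, or null if none.
// currentSha: the commit this run is about to benchmark.
function archivePrevious(history, previous, currentSha) {
  // No completed results in the old comment: keep the history unchanged (but capped).
  if (!previous) return history.slice(0, MAX_HISTORY);
  // Re-run on the same commit: the old results describe this commit, so don't archive them.
  if (previous.sha === currentSha) return history.slice(0, MAX_HISTORY);
  // Prepend the previous run and drop the oldest entries beyond the cap.
  return [previous, ...history].slice(0, MAX_HISTORY);
}
```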

Key design decisions

  • JSON-in-comment: Data and view are separated — JSON is the source of truth, HTML is rendered from it
  • File-based passing: Old comment body is saved to $RUNNER_TEMP/old-comment.txt to avoid env var size/escaping issues
  • Shell for gh api: Commit message fetching stays in shell to avoid escaping nightmares in Node
  • 5 entry cap: History is limited to 5 entries to prevent comment bloat

Checklist

  • Run npm run test
  • Run npm run lint

🤖 Generated with Claude Code



Store benchmark history as JSON in an HTML comment within the PR
comment body, replacing the need for artifact-based or HTML regex
parsing approaches.

- History data stored as `<!-- bench-history-json [...] -->`
- post-pending job archives completed results into JSON history
- comment job reads JSON history and renders History section
- Both jobs share the same renderHistory() logic
- Capped at 5 history entries to prevent comment bloat
- File-based comment passing to avoid env var escaping issues

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

coderabbitai bot (Contributor) commented Mar 22, 2026

Review skipped: auto incremental reviews are disabled on this repository.

Walkthrough

Modified .github/workflows/perf-benchmark.yml to replace shell-based PR comment generation with Node-driven rendering, adding a WORKFLOW_RUN_URL environment variable, embedded JSON history tracking with a 5-entry limit, and improved benchmark result formatting.

Changes

Benchmark Workflow Refactoring (.github/workflows/perf-benchmark.yml): Refactored PR comment generation from a shell-based to a Node-driven approach; added a WORKFLOW_RUN_URL environment variable to the post-pending and comment jobs; implemented embedded JSON history extraction and archiving (5-entry limit) via an inline Node script; improved reading benchmark results from JSON files and formatting them, with an optional History details section.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes


🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
  • Title check (✅ Passed): The title directly and clearly summarizes the main change: adding benchmark history with JSON storage in comments.
  • Description check (✅ Passed): The description comprehensively covers the change with detailed explanations of how it works, design decisions, and completed checklist items.
  • Docstring Coverage (✅ Passed): No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.


github-actions bot (Contributor) commented Mar 22, 2026

⚡ Performance Benchmark

Latest commit: 29e5b6e fix(ci): Add post-pending dependency to comment job
Status: ✅ Benchmark complete!
Ubuntu: 2.51s (±0.05s) → 2.53s (±0.02s) · +0.02s (+0.7%)
macOS: 1.60s (±0.19s) → 1.86s (±0.17s) · +0.26s (+16.2%)
Windows: 3.10s (±0.04s) → 3.10s (±0.02s) · -0.01s (-0.2%)
Details
  • Packing the repomix repository with node bin/repomix.cjs
  • Warmup: 2 runs (discarded)
  • Measurement: 10 runs / 20 on macOS (median ± IQR)
  • Workflow run
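The "median ± IQR" summary and the row format shown in the comment can be reproduced roughly as follows. Two assumptions: quartiles use linear interpolation (the workflow's exact method isn't shown here), and the ± figure is the full interquartile range rather than half of it. `summarize` and `formatRow` are hypothetical helper names.

```javascript
// Linear-interpolation quantile on an ascending-sorted array (one common convention).
function quantile(sorted, q) {
  const pos = (sorted.length - 1) * q;
  const lo = Math.floor(pos);
  const hi = Math.ceil(pos);
  return sorted[lo] + (sorted[hi] - sorted[lo]) * (pos - lo);
}

// Reduce raw timings (seconds) to median and interquartile range.
function summarize(samples) {
  const s = [...samples].sort((a, b) => a - b);
  return { median: quantile(s, 0.5), iqr: quantile(s, 0.75) - quantile(s, 0.25) };
}

// Render "OS: base (±iqr) → head (±iqr) · ±diff (±pct%)" like the rows above.
function formatRow(os, base, head) {
  const diff = head.median - base.median;
  const pct = (diff / base.median) * 100;
  const sign = diff >= 0 ? '+' : '-';
  return `${os}: ${base.median.toFixed(2)}s (±${base.iqr.toFixed(2)}s) → ` +
    `${head.median.toFixed(2)}s (±${head.iqr.toFixed(2)}s) · ` +
    `${sign}${Math.abs(diff).toFixed(2)}s (${sign}${Math.abs(pct).toFixed(1)}%)`;
}
```

Taking the sign from the unrounded difference explains rows like `-0.00s (-0.3%)`: the delta rounds to zero at two decimals but is still slightly negative.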
History

b34dab8 fix(ci): Suppress shellcheck SC2016 for inline Node scripts

Ubuntu: 2.56s (±0.01s) → 2.56s (±0.04s) · +0.01s (+0.2%)
macOS: 1.32s (±0.07s) → 1.32s (±0.10s) · -0.00s (-0.3%)
Windows: 3.03s (±0.13s) → 3.01s (±0.04s) · -0.02s (-0.7%)

ee092dd fix(ci): Skip archiving same SHA on benchmark rerun

Ubuntu: 2.46s (±0.02s) → 2.49s (±0.06s) · +0.03s (+1.3%)
macOS: 1.98s (±0.14s) → 2.01s (±0.19s) · +0.03s (+1.6%)
Windows: 3.00s (±0.05s) → 3.25s (±0.13s) · +0.26s (+8.6%)

3828132 fix(ci): Harden benchmark history JSON parsing and HTML escaping

Ubuntu: 2.62s (±0.03s) → 2.62s (±0.03s) · +0.00s (+0.0%)
macOS: 1.25s (±0.04s) → 1.28s (±0.06s) · +0.04s (+2.8%)
Windows: 3.63s (±0.68s) → 3.63s (±0.59s) · +0.01s (+0.1%)

ef865fb ci(perf): Add benchmark history with JSON-in-comment storage

Ubuntu: 2.68s (±0.05s) → 2.69s (±0.08s) · +0.01s (+0.6%)
macOS: 2.02s (±0.14s) → 2.13s (±0.49s) · +0.11s (+5.6%)
Windows: 3.37s (±0.07s) → 3.46s (±0.06s) · +0.09s (+2.6%)

cloudflare-workers-and-pages bot commented Mar 22, 2026

Deploying repomix with Cloudflare Pages

Latest commit: 29e5b6e
Status: ✅ Deploy successful!
Preview URL: https://8acfa38e.repomix.pages.dev
Branch Preview URL: https://ci-perf-benchmark-json-histo.repomix.pages.dev



- Scope OS-row regex to main table only (exclude History section)
- Wrap JSON.parse in try/catch to handle corrupted comment bodies
- HTML-escape commit messages to prevent injection and comment breakage
- Use start/end delimiters for JSON comment to avoid --> conflicts

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
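A sketch of scoping the OS-row regex to the main table, as the first bullet of this commit describes. It assumes the History section begins at a `<summary>History</summary>` element and that each row starts with the OS name followed by a colon; both are guesses about the rendered markup, and `mainTableRows` is a hypothetical name.

```javascript
// Assumed start of the History <details> block; the real markup may differ.
const HISTORY_START = '<summary>History</summary>';

// Return { OS: rowText } for rows in the main table only, ignoring History.
function mainTableRows(body) {
  const cut = body.indexOf(HISTORY_START);
  const main = cut === -1 ? body : body.slice(0, cut);
  const rows = {};
  // [ \t]* instead of \s* so a match can never run across a line break.
  for (const m of main.matchAll(/^(Ubuntu|macOS|Windows):[ \t]*(.+)$/gm)) {
    rows[m[1]] = m[2].trim();
  }
  return rows;
}
```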

Prevent duplicate history entry when a workflow is re-run on the
same commit by checking prevSha !== shortSha before archiving.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Add shellcheck disable directives for SC2016 (expressions don't
expand in single quotes) on node -e invocations where single quotes
are intentionally used to pass JavaScript code without shell expansion.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Ensure comment job waits for post-pending to finish before reading
the PR comment body, preventing a race condition where post-pending
could overwrite completed results with an in-progress status.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
claude bot (Contributor) commented Mar 22, 2026

Code Review - PR #1289 (3rd Review)

Previous reviews covered HTML escaping, double-escaping, duplication, stale comments, and consistency risks. Here are new findings only:

Issues

1. False "Benchmark complete!" when all benchmarks fail

The comment job runs with if: always() && !cancelled(), so it executes even when all benchmark jobs fail and produce no artifacts. readResult() returns null for missing files, formatResult(null) returns "-", and the comment shows a success status with all dashes. This is misleading — it claims success when everything failed. Consider checking if any results were read before displaying the success status.
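One way to address this, sketched under the assumption that each OS result is either a parsed object or `null` when its artifact is missing (mirroring the `readResult()` behavior described above). `statusLine` and the failure/partial wording are hypothetical.

```javascript
// results: { [os]: parsedResult | null }, where null means the artifact was missing.
function statusLine(results) {
  const values = Object.values(results);
  const ok = values.filter((r) => r !== null).length;
  // Nothing was produced: report failure instead of a success line full of dashes.
  if (ok === 0) return '❌ Benchmark failed: no results were produced';
  // Some platforms are missing: say so explicitly.
  if (ok < values.length) return `⚠️ Benchmark partially complete (${ok}/${values.length})`;
  return '✅ Benchmark complete!';
}
```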

2. No error gate between Node script and file read

If the inline Node script crashes (syntax error, runtime exception), the shell continues to read the temp file which either has stale content from a previous run or does not exist. The actual Node error gets buried. Consider deleting the temp file before running Node so stale reads are impossible.

3. Silent JSON parse failure with no diagnostic output

Both try/catch blocks around JSON.parse silently discard errors. If the history JSON gets corrupted, all history is lost with zero indication of why. A one-line console.error in the catch would make debugging possible.
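A minimal version of the suggested diagnostic, assuming the raw JSON string has already been pulled out of the comment. `parseHistory` is a hypothetical name; it still falls back to an empty history, but logs why to stderr so the cause shows up in the job log.

```javascript
function parseHistory(raw) {
  try {
    const parsed = JSON.parse(raw);
    if (!Array.isArray(parsed)) {
      console.error(`bench history: expected an array, got ${typeof parsed}; starting fresh`);
      return [];
    }
    return parsed;
  } catch (err) {
    // Surface the reason instead of silently dropping all history.
    console.error(`bench history: failed to parse JSON (${err.message}); starting fresh`);
    return [];
  }
}
```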

Suggestions (non-blocking)

4. Redundant API call to re-fetch comment body

Both jobs make two API calls: one paginated call to find the comment ID (which already returns the full body in the response), then a second call to fetch the body by ID. These could be combined into a single call by extracting both .id and .body from the paginated response.
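Since each object returned by GitHub's "list issue comments" endpoint already carries both `id` and `body`, the second request can be dropped. This sketch assumes the paginated responses have already been merged into one array; the marker used to recognize the benchmark comment and the `findBenchmarkComment` name are hypothetical.

```javascript
// Hypothetical hidden marker identifying the benchmark comment.
const COMMENT_MARKER = '<!-- perf-benchmark-comment -->';

// comments: merged array of comment objects from the listing endpoint.
// Returns { id, body } of the first matching comment, or null.
function findBenchmarkComment(comments) {
  const match = comments.find(
    (c) => typeof c.body === 'string' && c.body.includes(COMMENT_MARKER)
  );
  return match ? { id: match.id, body: match.body } : null;
}
```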

5. Concurrent pushes can silently lose history

If two commits are pushed to a PR in quick succession, both post-pending jobs read the same comment, compute their own histories, and write back — the second write silently overwrites the first. This is distinct from the previously-noted eventual-consistency risk. Consider using the comment ID ETag or a simple SHA-based version check.
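A sketch of the SHA-based version check, assuming the writer stamps a hypothetical `<!-- bench-version <sha> -->` marker into the comment. A job would re-read the comment just before writing and skip (or retry) if the version changed since its first read. This is check-then-write, so it narrows the race window rather than closing it.

```javascript
// Hypothetical version marker holding the commit SHA of the last writer.
const VERSION_RE = /<!-- bench-version ([0-9a-f]{7,40}) -->/;

function readVersion(body) {
  const m = body.match(VERSION_RE);
  return m ? m[1] : null;
}

// Replace the existing marker, or append one if the body has none yet.
function stampVersion(body, sha) {
  const tag = `<!-- bench-version ${sha} -->`;
  return VERSION_RE.test(body) ? body.replace(VERSION_RE, tag) : `${body}\n${tag}`;
}

// True if no other job has written since bodyAtRead was fetched.
function unchangedSince(bodyAtRead, bodyNow) {
  return readVersion(bodyAtRead) === readVersion(bodyNow);
}
```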


Reviewed with Claude Code

devin-ai-integration bot (Contributor) left a comment

Devin Review found 1 new potential issue. 6 additional findings are available in Devin Review.

Comment on lines +242 to +243
- name: Comment on PR
  if: ${{ github.event.pull_request.head.repo.fork == false }}

🟡 Step summary no longer written for fork PRs due to merged steps

In the old code, GITHUB_STEP_SUMMARY was written in the "Generate benchmark report" step which had no if guard — it ran for all PRs, including forks. In the new code, the step summary write (fs.appendFileSync(summaryFile, ...) at line 352) is inside the "Comment on PR" step which is guarded by if: ${{ github.event.pull_request.head.repo.fork == false }} at line 243. This means fork PRs lose the step summary entirely. Previously, the step summary was fork PR authors' only way to view benchmark results (since the PR comment was also fork-guarded).

Prompt for agents
In .github/workflows/perf-benchmark.yml, the "comment" job (starting around line 229) currently has a single "Comment on PR" step (line 242) guarded by the fork check. To restore the old behavior where GITHUB_STEP_SUMMARY was written regardless of fork status, split this into two steps:

1. A "Generate benchmark report" step with NO `if` condition that runs the Node script to generate the comment body and write to GITHUB_STEP_SUMMARY. This step should write the body to $RUNNER_TEMP/new-comment.md.

2. A "Comment on PR" step with `if: ${{ github.event.pull_request.head.repo.fork == false }}` that reads $RUNNER_TEMP/new-comment.md and posts/updates the PR comment.

This matches the old code's pattern where report generation (including step summary) was unconditional, and only the PR comment posting was fork-guarded.

claude bot (Contributor) commented Mar 22, 2026

Code Review - PR 1289 (4th Review)

Previous reviews (Claude 3rd review + CodeRabbit) comprehensively covered the major issues. After a fresh pass, most significant findings have already been raised. Here is what remains:

Already-raised issues still open (summary for tracking)

  • Commit messages containing the closing comment sequence (-->) can break the JSON-in-comment storage (raised by CodeRabbit; open)
  • Silent catch hides JSON corruption (raised in Claude 3rd review; open)
  • False success status when all benchmarks fail (raised in Claude 3rd review; open)
  • No error gate if the Node script crashes before writing the file (raised in Claude 3rd review; open)
  • Regex parses all tables, not just the latest (raised by CodeRabbit; open)

New finding (minor)

Asymmetric encoding of msg field between storage and rendering

In post-pending, prevMsg is extracted from already-rendered HTML (so it is HTML-escaped), then stored raw in the JSON history blob. In renderHistory, h.msg is injected directly into HTML without escaping, which works today because the value is already HTML-escaped from the prior render. But commitMsg for the current entry is escaped via esc() before rendering, then the HTML-escaped version gets stored in JSON on the next cycle.

This means history entries accumulate one layer of HTML escaping per cycle they survive. After two cycles, an ampersand becomes double-escaped. This is a concrete bug: commit messages containing special HTML characters will be progressively re-escaped in the History section.

Fix: Store the plain-text commit message in JSON (not the HTML-escaped version). When extracting prevMsg, unescape HTML entities back to plain text before storing in the JSON blob. In renderHistory, always apply esc() to h.msg before inserting into HTML.
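The suggested fix, sketched in JavaScript: keep plain text in the JSON, decode entities when recovering `prevMsg` from rendered HTML, and escape exactly once at render time. `esc` here is a stand-in using the usual five-character escape set, `unesc` and `renderHistoryLines` are hypothetical, and none of this is the workflow's actual code.

```javascript
const ESCAPES = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;' };

// Escape plain text for insertion into HTML.
function esc(text) {
  return text.replace(/[&<>"']/g, (ch) => ESCAPES[ch]);
}

// Reverse of esc(). &amp; is decoded last so "&amp;lt;" becomes "&lt;", not "<".
function unesc(html) {
  return html
    .replace(/&lt;/g, '<')
    .replace(/&gt;/g, '>')
    .replace(/&quot;/g, '"')
    .replace(/&#39;/g, "'")
    .replace(/&amp;/g, '&');
}

// History entries hold plain text; escaping happens only here, at render time.
function renderHistoryLines(history) {
  return history.map((h) => `${h.sha} ${esc(h.msg)}`).join('\n');
}
```

With this split, recovering a message from the previous render (`unesc`) and rendering it again (`esc`) is a round trip, so no extra layer of escaping can build up across cycles.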

Overall assessment

This is a well-structured CI change. The JSON-in-comment approach is clever and the file-based passing avoids env var escaping issues. The main risks are around HTML/comment escaping edge cases. Once the injection and double-escaping issues are addressed, this looks good to merge.


Reviewed with Claude Code

codecov bot commented Mar 22, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 87.18%. Comparing base (506d7cf) to head (29e5b6e).
⚠️ Report is 6 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1289   +/-   ##
=======================================
  Coverage   87.18%   87.18%           
=======================================
  Files         115      115           
  Lines        4324     4324           
  Branches     1002     1002           
=======================================
  Hits         3770     3770           
  Misses        554      554           


@yamadashy yamadashy merged commit f4a5131 into main Mar 22, 2026
63 checks passed
@yamadashy yamadashy deleted the ci/perf-benchmark-json-history branch March 22, 2026 14:14