
Restructure benchmark workflow with per-test jobs and dual runner groups #9217

Merged

michaelstaib merged 9 commits into main from mst/benchmarks-wf on Feb 25, 2026
Conversation

@michaelstaib
Member

Summary

  • Split the monolithic benchmark workflow into 12 separate matrix jobs: 3 tests (no-recursion, deep-recursion, variable-batch) x 2 modes (constant, ramping) x 2 runner groups (Benchmarking, Benchmarking-2); the full combination space is sketched after this list
  • Each job runs independently on its assigned runner group and progressively updates the PR comment as it completes, providing immediate feedback
  • The performance report distinguishes results by runner group: Constant 1 (Benchmarking) and Constant 2 (Benchmarking-2), etc.
  • Benchmarks whose job has not yet completed are shown as pending in the report
  • Baseline results are stored per runner group in the external performance data repository
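
For orientation, this is the 3 x 2 x 2 combination space the matrix expands into. The loop below only enumerates the twelve combinations named in this PR; the actual expansion is done by the workflow's matrix strategy, not by a script.

```bash
# Enumerate the 12 test/mode/runner-group combinations (illustration only;
# the workflow's matrix strategy performs the real expansion).
for test in no-recursion deep-recursion variable-batch; do
  for mode in constant ramping; do
    for runner in Benchmarking Benchmarking-2; do
      echo "$test / $mode / $runner"
    done
  done
done
```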

New files

  • run-single-benchmark.sh - Runs a single test+mode combination with median calculation for constant mode (a sketch of the median step follows this list)
  • generate-report.sh - Merges all available result JSONs into a combined markdown report
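
A minimal sketch of the constant-mode median step, assuming each pass writes a small summary JSON. The helper name run-k6-pass.sh and the .reqs_per_sec field are hypothetical placeholders, not the actual contract of run-single-benchmark.sh.

```bash
#!/usr/bin/env bash
# Sketch only: run the constant-mode benchmark several times and report the
# median req/s. The per-pass helper and the JSON field are assumptions.
set -euo pipefail

RUNS=3
values=()

for i in $(seq 1 "$RUNS"); do
  ./run-k6-pass.sh --out "result-$i.json"   # hypothetical single k6 pass
  values+=("$(jq -r '.reqs_per_sec' "result-$i.json")")
done

# Sort numerically and take the middle value (assumes an odd number of runs).
median=$(printf '%s\n' "${values[@]}" | sort -n | awk -v n="$RUNS" 'NR == int((n + 1) / 2)')
echo "Median req/s over $RUNS runs: $median"
```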

Test plan

  • Verify workflow matrix generates 12 jobs (3 tests x 2 modes x 2 runner groups)
  • Verify each job runs on the correct runner group
  • Verify PR comment is created/updated as each job completes
  • Verify the report shows "pending" for incomplete benchmarks
  • Verify final report shows all 12 results with correct runner labels
  • Verify baseline storage works on push to main

🤖 Generated with Claude Code

…nner groups

Split the monolithic benchmark workflow into separate jobs per benchmark
(no-recursion, deep-recursion, variable-batch) x mode (constant, ramping)
x runner group (Benchmarking, Benchmarking-2) using a matrix strategy.

Each of the 12 jobs runs independently and progressively updates the PR
comment as it completes, giving immediate feedback. The report table
distinguishes runner groups as "Constant 1" / "Constant 2" etc.

New scripts:
- run-single-benchmark.sh: runs a single test+mode combination
- generate-report.sh: merges available results into a combined report

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@github-actions
Contributor

github-actions Bot commented Feb 25, 2026

Fusion Gateway Performance Results

Simple Composite Query

| Scenario | Req/s | Err% |
| --- | --- | --- |
| Constant 1 (50 VUs) | 5058.48 | 0.00% |
| Constant 2 (50 VUs) | 3764.65 | 0.00% |
| Ramping 1 (0-500-0 VUs) | 5605.69 | 0.00% |
| Ramping 2 (0-500-0 VUs) | 3829.16 | 0.00% |

Response Times

| Scenario | Min | Med | Avg | P90 | P95 | Max |
| --- | --- | --- | --- | --- | --- | --- |
| Constant 1 | 0.58ms | 8.08ms | 9.74ms | 15.90ms | 21.66ms | 179.55ms |
| Constant 2 | 1.05ms | 10.58ms | 13.05ms | 21.70ms | 30.05ms | 251.05ms |
| Ramping 1 | 0.62ms | 34.61ms | 39.88ms | 74.96ms | 104.18ms | 250.85ms |
| Ramping 2 | 1.33ms | 51.55ms | 57.08ms | 100.06ms | 142.75ms | 289.04ms |

Deep Recursion Query

| Scenario | Req/s | Err% |
| --- | --- | --- |
| Constant 1 (50 VUs) | 953.66 | 0.00% |
| Constant 2 (50 VUs) | 676.05 | 0.00% |
| Ramping 1 (0-500-0 VUs) | 1146.77 | 0.00% |
| Ramping 2 (0-500-0 VUs) | 770.77 | 0.00% |

Response Times

| Scenario | Min | Med | Avg | P90 | P95 | Max |
| --- | --- | --- | --- | --- | --- | --- |
| Constant 1 | 4.87ms | 46.72ms | 50.97ms | 64.96ms | 76.68ms | 490.79ms |
| Constant 2 | 11.39ms | 64.51ms | 71.21ms | 91.26ms | 108.62ms | 765.07ms |
| Ramping 1 | 1.92ms | 159.64ms | 184.76ms | 400.69ms | 449.50ms | 707.73ms |
| Ramping 2 | 3.10ms | 241.07ms | 267.35ms | 556.33ms | 620.58ms | 982.39ms |

Variable Batching Throughput

| Scenario | Req/s | Err% |
| --- | --- | --- |
| Constant 1 (50 VUs) | 11772.28 | 0.00% |
| Constant 2 (50 VUs) | 5761.46 | 0.00% |
| Ramping 1 (0-500-0 VUs) | 9229.55 | 0.00% |
| Ramping 2 (0-500-0 VUs) | 5197.23 | 0.00% |

Response Times

| Scenario | Min | Med | Avg | P90 | P95 | Max |
| --- | --- | --- | --- | --- | --- | --- |
| Constant 1 | 0.09ms | 3.87ms | 4.20ms | 6.86ms | 9.17ms | 50.46ms |
| Constant 2 | 0.16ms | 8.10ms | 8.58ms | 14.35ms | 17.31ms | 60.32ms |
| Ramping 1 | 0.10ms | 21.74ms | 25.01ms | 47.16ms | 64.48ms | 169.17ms |
| Ramping 2 | 0.20ms | 39.49ms | 44.19ms | 82.60ms | 106.80ms | 229.01ms |

Runner 1 = Benchmarking, Runner 2 = Benchmarking-2

Run 22403122842 • Commit 60ac985 • Wed, 25 Feb 2026 19:48:42 GMT

michaelstaib and others added 8 commits February 25, 2026 13:29
Per-job updates now track a completion count in a hidden HTML marker and
only overwrite the PR comment if they have more data than what's already
posted. A separate "Final Performance Report" job runs after all
benchmarks complete (using needs + if: always()) and posts the definitive
result with all available data, guaranteeing correctness.
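
In shell terms, the "more data than what's already posted" check could look like the sketch below. The marker name benchmark-count is made up for illustration; the actual workflow performs this comparison inside a github-script step.

```bash
# Sketch of the optimistic-concurrency check: only overwrite the PR comment
# when this job has accumulated more results than the comment already shows.
set -euo pipefail

existing_comment="$1"   # body of the current PR comment
my_completed="$2"       # number of results this job has accumulated

posted=$(grep -oP '(?<=<!-- benchmark-count: )\d+' <<<"$existing_comment" || echo 0)

if (( my_completed > posted )); then
  echo "Updating comment: $my_completed > $posted completed benchmarks"
else
  echo "Skipping update: comment already reflects $posted completed benchmarks"
fi
```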

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Benchmarking-2 machines have 8 cores instead of 16. The script now
selects the CPU pinning profile based on the runner group:

Benchmarking (16 cores):
  k6: 0-1, Gateway: 2-4/2-5, Sources: 5-15/6-15, Inventory: 2-5

Benchmarking-2 (8 cores):
  k6: 0, Gateway: 1-2, Sources: 3-7, Inventory: 1-7

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…pproach

The benchmark jobs were wasting runner time on artifact downloads, report
generation, and comment updates. Now each benchmark job only does:
1. Run the benchmark
2. Upload the artifact
3. Run a lightweight github-script step that reads the local result.json,
   pulls accumulated data from a hidden JSON block in the PR comment,
   merges in its own result, regenerates the markdown inline, and updates
   the comment.

No artifact downloads, no shell script execution for reporting — just a
few GitHub API calls. The benchmark runner is freed immediately.

The accumulated results are stored as base64-encoded JSON in a hidden
HTML comment (<!-- benchmark-data:... -->) so each job can read what
previous jobs posted and build on it.
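
A shell-flavoured equivalent of what that step does, assuming the marker is literally `<!-- benchmark-data:<base64> -->` and the accumulated object is keyed by a per-job identifier. Both details are assumptions; the real step runs as JavaScript via actions/github-script.

```bash
# Sketch: pull the hidden base64 JSON out of the PR comment, merge this job's
# result under its own key, and re-encode the block for the next job to read.
set -euo pipefail

comment_body="$1"                              # current PR comment body
my_result_json="result.json"                   # this job's k6 result
my_key="no-recursion-constant-benchmarking"    # hypothetical job identifier

# Decode whatever previous jobs accumulated (empty object if nothing yet).
accumulated=$(grep -oP '(?<=<!-- benchmark-data:)[A-Za-z0-9+/=]+' <<<"$comment_body" \
  | base64 -d 2>/dev/null || echo '{}')

# Merge our own result, then re-encode for the updated comment.
merged=$(jq --arg k "$my_key" --slurpfile r "$my_result_json" \
  '. + {($k): $r[0]}' <<<"$accumulated")
echo "<!-- benchmark-data:$(base64 -w0 <<<"$merged") -->"
```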

The final "report" job on ubuntu-latest still runs after all benchmarks
complete to post the definitive result from artifacts.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The previous pinning gave the gateway only 3 cores on an 8-core machine
while source schemas got the rest — starving the component being measured.
Cores 8-15 were also referenced on machines that only have 0-7.

New pinning matches the gateways benchmark repo:
  Constant (50 VUs):  k6 core 0, Gateway cores 1-2, Sources unpinned
  Ramping (500 VUs):  k6 core 0, Gateway cores 1-3, Sources unpinned

Same layout for all runner groups — pinning is by mode, not machine size.
Helper scripts now default to no pinning when env vars are unset.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Detects total cores via nproc and pins source schemas to whatever is
left after k6 (core 0) and gateway (1-2 or 1-3):

  Constant: k6=0, Gateway=1-2, Sources=3-(N-1)
  Ramping:  k6=0, Gateway=1-3, Sources=4-(N-1)

On an 8-core machine this gives sources cores 3-7 (constant) or
4-7 (ramping). Clean separation with no overlap.
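
A sketch of that computation, assuming the helper takes the mode as an argument; variable names and the taskset example are illustrative.

```bash
# Derive core ranges from the machine size: k6 on core 0, gateway on 1-2
# (constant) or 1-3 (ramping), source schemas on everything that remains.
set -euo pipefail

MODE="${1:-constant}"        # "constant" or "ramping"
TOTAL=$(nproc)
LAST=$((TOTAL - 1))

K6_CPUS="0"
if [[ "$MODE" == "ramping" ]]; then
  GATEWAY_CPUS="1-3"
  SOURCES_CPUS="4-$LAST"
else
  GATEWAY_CPUS="1-2"
  SOURCES_CPUS="3-$LAST"
fi

echo "k6=$K6_CPUS gateway=$GATEWAY_CPUS sources=$SOURCES_CPUS"
# e.g. taskset -c "$GATEWAY_CPUS" dotnet run ...   (invocation is illustrative)
```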

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The final report job used completed:999 which always won over progressive
updates. When cancel-in-progress killed jobs mid-run, the final report
had fewer artifacts than the accumulated comment data and overwrote it.

Now the final report job:
1. Reads artifacts AND the accumulated JSON from the existing PR comment
2. Merges both (artifacts win for conflicts, comment fills gaps)
3. Uses the same optimistic concurrency check — only updates if it has
   more data than what's already posted
4. Preserves the accumulated data block for future updates
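
In jq terms, "artifacts win for conflicts, comment fills gaps" can be sketched as a per-key object merge where the artifact side takes precedence. The file names are assumptions.

```bash
# comment-data.json  : results recovered from the hidden PR-comment block
# artifact-data.json : results rebuilt from the uploaded artifacts
set -euo pipefail

# Object addition keeps every key from both sides; where a key exists in both,
# the right-hand (artifact) value wins.
jq -s '.[0] + .[1]' comment-data.json artifact-data.json > merged.json

echo "Merged report covers $(jq 'length' merged.json) of 12 benchmark results"
```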

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When a new commit triggers a new workflow run, the PR comment now
detects the mismatched run ID and discards accumulated data from
the previous run, preventing stale results from persisting.
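
A small sketch of that guard, assuming the accumulated block stores the originating run id under a top-level run_id key (the key name is an assumption; GITHUB_RUN_ID is the standard Actions environment variable).

```bash
# Discard accumulated data that belongs to a previous workflow run.
set -euo pipefail

current="$GITHUB_RUN_ID"
stored=$(jq -r '.run_id // empty' accumulated.json)

if [[ "$stored" != "$current" ]]; then
  echo "Run id changed ($stored -> $current); discarding accumulated data"
  printf '{"run_id": "%s"}\n' "$current" > accumulated.json
fi
```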

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Instead of relying on run ID detection to reset stale data, a
dedicated setup job on ubuntu-latest now posts the all-pending
comment before any benchmark jobs run. Progressive updates only
accumulate data from the matching run ID as a safety net.
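
A sketch of what the setup job could run; the comment body and the empty data marker are illustrative, and the actual step may call the GitHub API directly instead of the gh CLI.

```bash
# Post the all-pending comment before any benchmark job starts, so the
# progressive updates only ever fill in results for the current run.
set -euo pipefail

PR_NUMBER="$1"

body="## Fusion Gateway Performance Results

All 12 benchmarks are pending for run $GITHUB_RUN_ID...

<!-- benchmark-data: -->"

gh pr comment "$PR_NUMBER" --body "$body"
```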

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>