ci(bench): move CodSpeed to physical-exclusive runner and add walltime mode by upupming · Pull Request #2793 · lynx-family/lynx-stack

upupming · 2026-06-04T03:20:37Z

Summary

Switch the benchmark job from lynx-ubuntu-24.04-xlarge to the new physical-exclusive runner.
Plumb OSS cache credentials (ACCESS_KEY / SECRET_KEY / ENDPOINT / BUCKET_NAME / REGION) at job level so lynx-infra/cache can authenticate on the new machine (the xlarge VM had these baked into the image; the physical runner doesn't).
Add a second codspeed run --mode walltime step alongside the existing simulation run, so we get real-world timings on top of the deterministic instruction-count data.

Draft — exploring whether the physical runner gives stabler walltime numbers than the shared VM. If the walltime variance is small enough, we can start treating it as a reportable signal next to simulation.

Test plan

Benchmark / nodejs-benchmark job runs to completion on the new runner (no `lack params: accessKeyId/...` errors from lynx-infra/cache).
CodSpeed receives one simulation run and one walltime run for the same commit.
Compare walltime variance vs. simulation on a couple of commits to decide whether walltime is signal or noise on this hardware.
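One way to make "signal or noise" concrete is the coefficient of variation (sample standard deviation divided by the mean) of repeated timings for the same benchmark. The sketch below is only an illustration of that comparison: the helper name `cv_percent` is hypothetical, the inputs are assumed to be positive timings in a single unit, and it is not how CodSpeed itself computes variance.

```shell
# Coefficient of variation, in percent, for a list of positive timings.
# All samples must use the same unit; the result is unitless.
cv_percent() {
  # Print one sample per line so awk can accumulate them.
  printf '%s\n' "$@" | awk '
    { sum += $1; sumsq += $1 * $1; n++ }
    END {
      if (n < 2) { print "need at least 2 samples" > "/dev/stderr"; exit 1 }
      mean = sum / n
      var = (sumsq - n * mean * mean) / (n - 1)   # sample variance
      if (var < 0) var = 0                         # guard against rounding
      printf "%.2f\n", 100 * sqrt(var) / mean
    }'
}

# Made-up samples, for illustration only:
cv_percent 8 10 12    # prints 20.00
```

A lower percentage across runs of the same commit would suggest the runner is quiet enough for walltime numbers to be compared between commits.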

…e mode Switch the benchmark job to the new physical-exclusive runner, plumb the OSS cache credentials (ACCESS_KEY / SECRET_KEY / ENDPOINT / BUCKET_NAME / REGION) at job level so lynx-infra/cache can authenticate, and add a second codspeed step in walltime mode alongside the existing simulation run. Draft — exploring whether the physical runner gives stabler walltime numbers than the xlarge VM.

changeset-bot · 2026-06-04T03:20:43Z

⚠️ No Changeset found

Latest commit: 89ede06

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types


coderabbitai · 2026-06-04T03:20:45Z

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: a0f681b3-1b71-4968-9c15-971a2e19605a

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


codecov · 2026-06-04T03:23:43Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ All tests successful. No failed tests found.

The previous commit set ACCESS_KEY/SECRET_KEY/ENDPOINT/BUCKET_NAME/REGION as job env from ${{ secrets.* }}, but workflow-bench.yml is invoked via workflow_call from test.yml without a `secrets:` block, so the secrets resolved to empty strings and lynx-infra/cache failed with `lack params: accessKeyId, accessKeySecret, region`. Declare the five secrets as optional inputs on the called workflow and forward them explicitly from test.yml so the OSS cache restore can authenticate on the new physical-exclusive runner (the xlarge image had these credentials baked in; the physical runner doesn't).
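Because the secrets silently resolved to empty strings, the failure only surfaced inside lynx-infra/cache. A small guard run before the cache step would surface it earlier. The following is a sketch under assumptions: the helper name `missing_env` is hypothetical and is not part of the workflow in this PR; only the five variable names come from the commit message.

```shell
# Report which of the named environment variables are unset or empty.
# Prints the missing names on one line and returns 1 if any are missing.
missing_env() {
  missing=""
  for name in "$@"; do
    # Indirect lookup of the variable called $name (POSIX sh has no ${!name}).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      missing="${missing:+$missing }$name"
    fi
  done
  if [ -n "$missing" ]; then
    echo "empty or unset: $missing"
    return 1
  fi
}

# Possible use in a step that runs before the cache restore:
#   missing_env ACCESS_KEY SECRET_KEY ENDPOINT BUCKET_NAME REGION || exit 1
```

The check prints only variable names, never their values, so it should be safe to leave in CI logs.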

github-actions · 2026-06-04T04:00:20Z

UI Judge

GEQI weighted score: 60.6 / 100 across 8 examples.
Average visual-correctness score: 3.4 / 5.

Dimension	Weight	Average	Results	Status
Usability & Interaction	30%	3 / 5	8	OK
Visual & Aesthetics	25%	3.1 / 5	8	OK
Consistency & Standards	15%	3 / 5	8	OK
Architecture & UX Writing	15%	3 / 5	8	OK
Accessibility & Performance	15%	3 / 5	8	OK

#	Example	Visual Correctness	Usability & Interaction (30%)	Visual & Aesthetics (25%)	Consistency & Standards (15%)	Architecture & UX Writing (15%)	Accessibility & Performance (15%)	GEQI	Page	Status
1	recs	2 / 5	2 / 5	3 / 5	2 / 5	2 / 5	2 / 5	45 / 100	preview	OK
2	cast-grid	5 / 5	3 / 5	4 / 5	4 / 5	4 / 5	4 / 5	74 / 100	preview	OK
3	citywalk-list	2 / 5	2 / 5	3 / 5	2 / 5	2 / 5	3 / 5	48 / 100	preview	OK
4	fridge-search	4 / 5	3 / 5	3 / 5	3 / 5	3 / 5	3 / 5	60 / 100	preview	OK
5	trip-planner	2 / 5	2 / 5	2 / 5	2 / 5	2 / 5	2 / 5	40 / 100	preview	OK
6	weather-current	5 / 5	5 / 5	4 / 5	5 / 5	4 / 5	4 / 5	89 / 100	preview	OK
7	product-card	5 / 5	5 / 5	4 / 5	4 / 5	5 / 5	4 / 5	89 / 100	preview	OK
8	workout-plan	2 / 5	2 / 5	2 / 5	2 / 5	2 / 5	2 / 5	40 / 100	preview	OK
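The per-example GEQI values in the table are consistent with a weighted mean of the five dimension scores, rescaled from the 0-5 range to 0-100. That formula is inferred from the published numbers and is not taken from the UI Judge source, so the sketch below should be read as an assumption; the function name `geqi` is hypothetical.

```shell
# GEQI as inferred from the table: weighted mean of the five dimension scores
# (each out of 5), rescaled to 0-100. Weights are the percentages listed above.
geqi() {
  # Args: usability visual consistency architecture accessibility
  awk -v u="$1" -v v="$2" -v c="$3" -v a="$4" -v p="$5" 'BEGIN {
    weighted = 0.30 * u + 0.25 * v + 0.15 * c + 0.15 * a + 0.15 * p
    printf "%.0f\n", weighted / 5 * 100
  }'
}

geqi 2 3 2 2 2   # recs: prints 45
geqi 3 4 4 4 4   # cast-grid: prints 74
```

Under the same reading, the mean of the eight per-example values is 485 / 8 = 60.625, which matches the reported 60.6.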

Details

Result 1

Example: recs
Dimension: visual-correctness
Visual correctness: 2 / 5
GEQI dimensions:
- Usability & Interaction: 2 / 5 (30%)
- Visual & Aesthetics: 3 / 5 (25%)
- Consistency & Standards: 2 / 5 (15%)
- Architecture & UX Writing: 2 / 5 (15%)
- Accessibility & Performance: 2 / 5 (15%)
Task: The A2UI playground preview should show date-night dining recommendations for Moonlight Terrace, Pinewood Bistro, and Sea Breeze Kitchen.

Result 2

Example: cast-grid
Dimension: visual-correctness
Visual correctness: 5 / 5
GEQI dimensions:
- Usability & Interaction: 3 / 5 (30%)
- Visual & Aesthetics: 4 / 5 (25%)
- Consistency & Standards: 4 / 5 (15%)
- Architecture & UX Writing: 4 / 5 (15%)
- Accessibility & Performance: 4 / 5 (15%)
Task: The A2UI playground preview should show a cast grid for the short film Night Notes, including Lin Xia and Zhou Ning cast cards.

Result 3

Example: citywalk-list
Dimension: visual-correctness
Visual correctness: 2 / 5
GEQI dimensions:
- Usability & Interaction: 2 / 5 (30%)
- Visual & Aesthetics: 3 / 5 (25%)
- Consistency & Standards: 2 / 5 (15%)
- Architecture & UX Writing: 2 / 5 (15%)
- Accessibility & Performance: 3 / 5 (15%)
Task: The A2UI playground preview should show weekend citywalk coffee picks with Rooftop Brew Room, Corner Canvas Lab, and Late Sun Roastery.

Result 4

Example: fridge-search
Dimension: visual-correctness
Visual correctness: 4 / 5
GEQI dimensions:
- Usability & Interaction: 3 / 5 (30%)
- Visual & Aesthetics: 3 / 5 (25%)
- Consistency & Standards: 3 / 5 (15%)
- Architecture & UX Writing: 3 / 5 (15%)
- Accessibility & Performance: 3 / 5 (15%)
Task: The A2UI playground preview should show refrigerator search results with Siemens, Hualing, Haier, and Midea product cards.

Result 5

Example: trip-planner
Dimension: visual-correctness
Visual correctness: 2 / 5
GEQI dimensions:
- Usability & Interaction: 2 / 5 (30%)
- Visual & Aesthetics: 2 / 5 (25%)
- Consistency & Standards: 2 / 5 (15%)
- Architecture & UX Writing: 2 / 5 (15%)
- Accessibility & Performance: 2 / 5 (15%)
Task: The A2UI playground preview should show a Kyoto 48-hour trip planner with Day 1 and Day 2 itinerary sections, including Monkey Park Viewpoint.

Result 6

Example: weather-current
Dimension: visual-correctness
Visual correctness: 5 / 5
GEQI dimensions:
- Usability & Interaction: 5 / 5 (30%)
- Visual & Aesthetics: 4 / 5 (25%)
- Consistency & Standards: 5 / 5 (15%)
- Architecture & UX Writing: 4 / 5 (15%)
- Accessibility & Performance: 4 / 5 (15%)
Task: The A2UI playground preview should show the current weather for Austin, TX, including clear skies with light breeze.

Result 7

Example: product-card
Dimension: visual-correctness
Visual correctness: 5 / 5
GEQI dimensions:
- Usability & Interaction: 5 / 5 (30%)
- Visual & Aesthetics: 4 / 5 (25%)
- Consistency & Standards: 4 / 5 (15%)
- Architecture & UX Writing: 5 / 5 (15%)
- Accessibility & Performance: 4 / 5 (15%)
Task: The A2UI playground preview should show a Wireless Headphones Pro product card with a visible Add to Cart action.

Result 8

Example: workout-plan
Dimension: visual-correctness
Visual correctness: 2 / 5
GEQI dimensions:
- Usability & Interaction: 2 / 5 (30%)
- Visual & Aesthetics: 2 / 5 (25%)
- Consistency & Standards: 2 / 5 (15%)
- Architecture & UX Writing: 2 / 5 (15%)
- Accessibility & Performance: 2 / 5 (15%)
Task: The A2UI playground preview should show a weekly workout plan with five days from Monday Ramp-Up through Friday Conditioning.

The build job runs on lynx-ubuntu-24.04-xlarge whose OSS credentials point to a different bucket than the new physical-exclusive runner used by this benchmark job. With fail-on-cache-miss: true the cache restore hard-fails, since the cache key written by build can't be read by bench. Drop fail-on-cache-miss so the restore step is best-effort and let `pnpm turbo build` rebuild locally on the physical machine instead. Bump timeout-minutes to 45 to absorb the cold rust compile.

The physical-exclusive runner image doesn't ship cargo, so `pnpm turbo build` fails at @lynx-js/swc-plugin-reactlynx-compat#build with `/bin/sh: 1: cargo: not found` when it invokes the package's build.js (which shells out to `cargo build` to compile the SWC plugin's Rust crate). The lynx-ubuntu-24.04-xlarge image had Rust pre-installed; the new physical runner doesn't. Add the repo's ./.github/actions/rustup composite action (same one used by workflow-build.yml) ahead of TurboCache. Reuse the action's save-if='${{ github.ref_name == 'main' }}' gate so the rustup cache isn't written from PR runs.
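A missing tool such as cargo otherwise shows up only deep inside `pnpm turbo build`. A preflight check along these lines could report it at the start of the job; this is an illustrative sketch, and the helper name `missing_tools` is hypothetical rather than something the workflow defines.

```shell
# Print each named tool that is not on PATH; return 1 if any is missing.
missing_tools() {
  status=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool: not found"
      status=1
    fi
  done
  return "$status"
}

# Possible use before the build step:
#   missing_tools cargo rustup || exit 1
```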

…runner The reusable ./.github/actions/rustup action wrote `${HOME:-/.cargo}/bin` to GITHUB_PATH, but $HOME is not propagated between steps on the physical-exclusive self-hosted runner (visible in setup-uv's `Added undefined/.local/bin to the path`), so the resulting PATH entry is empty and the next step fails with `rustup: command not found`. Install rustup directly with explicit HOME=/root (the runner's true home, evidenced by the existing /root/.rustup/settings.toml the installer found) and pin /root/.cargo/bin onto GITHUB_PATH. Also export HOME=/root in the simulation/walltime steps so `. "$HOME/.cargo/env"` resolves correctly to pick up the codspeed CLI that the prepare step installed.
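The idea of pinning an explicit directory instead of trusting `$HOME` can be sketched as follows. GitHub Actions extends PATH for later steps with each line appended to the file named by `$GITHUB_PATH`. The helper `pin_cargo_bin` is hypothetical and is not the step in this PR; it only illustrates passing the home directory explicitly and refusing values that are not absolute paths, such as an empty string or `undefined`.

```shell
# Append an absolute cargo bin directory to the file named by $GITHUB_PATH,
# which GitHub Actions reads to extend PATH for later steps.
# The home directory is passed explicitly instead of relying on $HOME.
pin_cargo_bin() {
  home_dir="$1"
  case "$home_dir" in
    /*) ;;  # absolute path, accept
    *)  echo "refusing non-absolute home: '$home_dir'" >&2; return 1 ;;
  esac
  printf '%s\n' "$home_dir/.cargo/bin" >> "$GITHUB_PATH"
}

# Possible use inside a workflow step, with the home named in the commit message:
#   pin_cargo_bin /root
```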

… build After fixing PATH propagation, the next failure was @lynx-js/web-core#build:wasm aborting in rustup with `could not rename '...partial' file ... No such file or directory` (os error 2). The previous step installed the `stable` toolchain, but rust-toolchain.toml pins 1.92.0 and adds wasm32-unknown-unknown / wasm32-wasip1 targets. With turbo running many cargo invocations concurrently, multiple processes each tried to sync the 1.92.0 channel and races on /root/.rustup/downloads/*.partial corrupted the install. Install rustup with --default-toolchain none, then explicitly install the pinned toolchain and add the wasm targets up front so every parallel cargo invocation downstream sees a complete, ready-to-use install.
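The ordering described above (install the pinned toolchain and its targets once, serially, before turbo fans out) might look like the sketch below. The wrapper name `provision_rust` is hypothetical; `rustup toolchain install` and `rustup target add --toolchain` are existing rustup subcommands, and the version and targets in the usage comment are the ones named in the commit message. It assumes rustup itself was already installed with `--default-toolchain none`.

```shell
# Install one pinned toolchain and its extra targets serially, before any
# parallel build starts, so concurrent cargo processes find a complete install.
provision_rust() {
  toolchain="$1"; shift
  rustup toolchain install "$toolchain" || return 1
  for target in "$@"; do
    rustup target add --toolchain "$toolchain" "$target" || return 1
  done
}

# Possible use on a machine where rustup is already on PATH:
#   provision_rust 1.92.0 wasm32-unknown-unknown wasm32-wasip1
```

Doing this in a single step ahead of the build means no parallel cargo invocation should need to download the channel itself, which is where the `.partial` rename race came from.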

codspeed-hq · 2026-06-04T07:06:57Z

Merging this PR will not alter performance

⚠️ Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

✅ 87 untouched benchmarks
🆕 10 new benchmarks
⏩ 26 skipped benchmarks¹

Performance Changes

	Mode	Benchmark	`BASE`	`HEAD`	Efficiency
🆕	WallTime	`transform 1000 view elements`	N/A	5.7 ms	N/A
🆕	WallTime	`basic-performance-large-css`	N/A	1.1 ms	N/A
🆕	WallTime	`basic-performance-nest-level-100`	N/A	945.8 µs	N/A
🆕	WallTime	`basic-performance-small-css`	N/A	972.2 µs	N/A
🆕	WallTime	`basic-performance-div-1000`	N/A	8.8 ms	N/A
🆕	WallTime	`basic-performance-image-100`	N/A	1.3 ms	N/A
🆕	WallTime	`basic-performance-div-10000`	N/A	30.8 ms	N/A
🆕	WallTime	`basic-performance-div-100`	N/A	1 ms	N/A
🆕	WallTime	`basic-performance-text-200`	N/A	2 ms	N/A
🆕	WallTime	`basic-performance-scroll-view-100`	N/A	1.5 ms	N/A

Comparing ci/codspeed-physical-exclusive-walltime (89ede06) with main (e33c08f)

¹ 26 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, archive them to remove them from the performance reports.

upupming added 3 commits June 4, 2026 12:28

upupming mentioned this pull request Jun 4, 2026

fix(genui-a2ui-prompt): serialize build:api after build to avoid rspack cache race #2794

Merged

3 tasks

github-advanced-security AI found potential problems Jun 5, 2026

Comment thread .github/workflows/workflow-bench.yml Fixed

coolkiid force-pushed the ci/codspeed-physical-exclusive-walltime branch from dc05dc8 to 89ede06 Compare June 5, 2026 07:12

upupming wants to merge 6 commits into main from ci/codspeed-physical-exclusive-walltime
