ci(bench): move CodSpeed to physical-exclusive runner and add walltime mode #2793

Draft
upupming wants to merge 6 commits into main from ci/codspeed-physical-exclusive-walltime

@upupming
Collaborator

@upupming upupming commented Jun 4, 2026

Summary

  • Switch the benchmark job from lynx-ubuntu-24.04-xlarge to the new physical-exclusive runner.
  • Plumb OSS cache credentials (ACCESS_KEY / SECRET_KEY / ENDPOINT / BUCKET_NAME / REGION) at job level so lynx-infra/cache can authenticate on the new machine (the xlarge VM had these baked into the image; the physical runner doesn't).
  • Add a second `codspeed run --mode walltime` step alongside the existing simulation run, so we get real-world timings on top of the deterministic instruction-count data.
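A minimal sketch of the job shape these bullets describe: one job on the new runner that runs the same benchmarks twice under CodSpeed, once per mode. The runner label `physical-exclusive`, the step names, and the benchmark command `pnpm run bench` are placeholders, and the exact CLI invocation in workflow-bench.yml may differ.

```yaml
# Hypothetical excerpt in the style of workflow-bench.yml.
# Runner label, step names and the benchmark command are placeholders.
jobs:
  benchmark:
    runs-on: physical-exclusive  # placeholder for the new runner's label
    steps:
      # Existing run: deterministic, instruction-count based measurement.
      - name: Benchmark (simulation)
        run: codspeed run --mode simulation -- pnpm run bench
      # Added run: the same benchmarks timed in wall-clock time on this hardware.
      - name: Benchmark (walltime)
        run: codspeed run --mode walltime -- pnpm run bench
```

Keeping both steps in one job means both modes measure the same build of the same commit on the same machine, which is what the simulation-vs-walltime comparison in the test plan relies on.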

Draft — exploring whether the physical runner gives stabler walltime numbers than the shared VM. If the walltime variance is small enough, we can start treating it as a reportable signal next to simulation.

Test plan

  • Benchmark / nodejs-benchmark job runs to completion on the new runner (no `lack params: accessKeyId/...` errors from lynx-infra/cache).
  • CodSpeed receives one simulation run and one walltime run for the same commit.
  • Compare walltime variance vs. simulation on a couple of commits to decide whether walltime is signal or noise on this hardware.

…e mode

Switch the benchmark job to the new physical-exclusive runner, plumb the
OSS cache credentials (ACCESS_KEY / SECRET_KEY / ENDPOINT / BUCKET_NAME /
REGION) at job level so lynx-infra/cache can authenticate, and add a
second codspeed step in walltime mode alongside the existing simulation
run.

Draft — exploring whether the physical runner gives stabler walltime
numbers than the xlarge VM.
@changeset-bot

changeset-bot Bot commented Jun 4, 2026

⚠️ No Changeset found

Latest commit: 89ede06

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types


@coderabbitai
Contributor

coderabbitai Bot commented Jun 4, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: a0f681b3-1b71-4968-9c15-971a2e19605a

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


@codecov

codecov Bot commented Jun 4, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ All tests successful. No failed tests found.


The previous commit set ACCESS_KEY/SECRET_KEY/ENDPOINT/BUCKET_NAME/REGION
as job env from ${{ secrets.* }}, but workflow-bench.yml is invoked via
workflow_call from test.yml without a `secrets:` block, so the secrets
resolved to empty strings and lynx-infra/cache failed with
`lack params: accessKeyId, accessKeySecret, region`.

Declare the five secrets as optional inputs on the called workflow and
forward them explicitly from test.yml so the OSS cache restore can
authenticate on the new physical-exclusive runner (the xlarge image had
these credentials baked in; the physical runner doesn't).
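A minimal sketch of both halves of this fix, assuming the five secret names listed above. The caller job name, the runner label, and the elided benchmark steps are illustrative, and the real files may differ. The called workflow declares the secrets it accepts under `on.workflow_call.secrets` (optional via `required: false`), and the caller forwards each one explicitly:

```yaml
# .github/workflows/workflow-bench.yml (called workflow), sketch
on:
  workflow_call:
    secrets:
      ACCESS_KEY:
        required: false
      SECRET_KEY:
        required: false
      ENDPOINT:
        required: false
      BUCKET_NAME:
        required: false
      REGION:
        required: false

jobs:
  benchmark:
    runs-on: physical-exclusive  # placeholder label
    env:
      # Job-level env, as before; secrets.* now resolves to the forwarded values.
      ACCESS_KEY: ${{ secrets.ACCESS_KEY }}
      SECRET_KEY: ${{ secrets.SECRET_KEY }}
      ENDPOINT: ${{ secrets.ENDPOINT }}
      BUCKET_NAME: ${{ secrets.BUCKET_NAME }}
      REGION: ${{ secrets.REGION }}
    steps:
      - run: echo "benchmark steps elided"
---
# .github/workflows/test.yml (caller), sketch
jobs:
  benchmark:
    uses: ./.github/workflows/workflow-bench.yml
    # Without this block the called workflow sees these secrets as empty strings.
    secrets:
      ACCESS_KEY: ${{ secrets.ACCESS_KEY }}
      SECRET_KEY: ${{ secrets.SECRET_KEY }}
      ENDPOINT: ${{ secrets.ENDPOINT }}
      BUCKET_NAME: ${{ secrets.BUCKET_NAME }}
      REGION: ${{ secrets.REGION }}
```

`secrets: inherit` on the caller would be the shorter alternative; forwarding by name keeps the called workflow's access limited to these five values.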
@github-actions
Contributor

github-actions Bot commented Jun 4, 2026

UI Judge

GEQI weighted score: 60.6 / 100 across 8 examples.
Average visual-correctness score: 3.4 / 5.

| Dimension | Weight | Average | Results | Status |
| --- | --- | --- | --- | --- |
| Usability & Interaction | 30% | 3 / 5 | 8 | OK |
| Visual & Aesthetics | 25% | 3.1 / 5 | 8 | OK |
| Consistency & Standards | 15% | 3 / 5 | 8 | OK |
| Architecture & UX Writing | 15% | 3 / 5 | 8 | OK |
| Accessibility & Performance | 15% | 3 / 5 | 8 | OK |


| # | Example | Visual Correctness | Usability & Interaction (30%) | Visual & Aesthetics (25%) | Consistency & Standards (15%) | Architecture & UX Writing (15%) | Accessibility & Performance (15%) | GEQI | Page | Status |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | recs | 2 / 5 | 2 / 5 | 3 / 5 | 2 / 5 | 2 / 5 | 2 / 5 | 45 / 100 | preview | OK |
| 2 | cast-grid | 5 / 5 | 3 / 5 | 4 / 5 | 4 / 5 | 4 / 5 | 4 / 5 | 74 / 100 | preview | OK |
| 3 | citywalk-list | 2 / 5 | 2 / 5 | 3 / 5 | 2 / 5 | 2 / 5 | 3 / 5 | 48 / 100 | preview | OK |
| 4 | fridge-search | 4 / 5 | 3 / 5 | 3 / 5 | 3 / 5 | 3 / 5 | 3 / 5 | 60 / 100 | preview | OK |
| 5 | trip-planner | 2 / 5 | 2 / 5 | 2 / 5 | 2 / 5 | 2 / 5 | 2 / 5 | 40 / 100 | preview | OK |
| 6 | weather-current | 5 / 5 | 5 / 5 | 4 / 5 | 5 / 5 | 4 / 5 | 4 / 5 | 89 / 100 | preview | OK |
| 7 | product-card | 5 / 5 | 5 / 5 | 4 / 5 | 4 / 5 | 5 / 5 | 4 / 5 | 89 / 100 | preview | OK |
| 8 | workout-plan | 2 / 5 | 2 / 5 | 2 / 5 | 2 / 5 | 2 / 5 | 2 / 5 | 40 / 100 | preview | OK |

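The bot does not state how GEQI is computed, but every row in the per-example table is consistent with a weighted mean of the five dimension scores using the listed weights, rescaled from a 5-point scale to 100. Treat the following as an inference from the numbers, not the bot's documented formula. With U, V, C, A, P for Usability, Visual, Consistency, Architecture, and Accessibility:

```latex
\begin{align*}
\mathrm{GEQI} &= 100 \cdot \frac{0.30\,U + 0.25\,V + 0.15\,(C + A + P)}{5} \\
\mathrm{GEQI}_{\text{recs}} &= 100 \cdot \frac{0.30 \cdot 2 + 0.25 \cdot 3 + 0.15 \cdot (2 + 2 + 2)}{5}
  = 100 \cdot \frac{2.25}{5} = 45 \\
\overline{\mathrm{GEQI}} &= \frac{45 + 74 + 48 + 60 + 40 + 89 + 89 + 40}{8} = \frac{485}{8} \approx 60.6
\end{align*}
```

The visual-correctness score does not enter this sum: fridge-search scores 4 / 5 on it yet lands at exactly 60, which is what five 3 / 5 dimensions give on their own.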
Details

Result 1

  • Example: recs
  • Dimension: visual-correctness
  • Visual correctness: 2 / 5
  • GEQI dimensions:
    • Usability & Interaction: 2 / 5 (30%)
    • Visual & Aesthetics: 3 / 5 (25%)
    • Consistency & Standards: 2 / 5 (15%)
    • Architecture & UX Writing: 2 / 5 (15%)
    • Accessibility & Performance: 2 / 5 (15%)
  • Task: The A2UI playground preview should show date-night dining recommendations for Moonlight Terrace, Pinewood Bistro, and Sea Breeze Kitchen.

Result 2

  • Example: cast-grid
  • Dimension: visual-correctness
  • Visual correctness: 5 / 5
  • GEQI dimensions:
    • Usability & Interaction: 3 / 5 (30%)
    • Visual & Aesthetics: 4 / 5 (25%)
    • Consistency & Standards: 4 / 5 (15%)
    • Architecture & UX Writing: 4 / 5 (15%)
    • Accessibility & Performance: 4 / 5 (15%)
  • Task: The A2UI playground preview should show a cast grid for the short film Night Notes, including Lin Xia and Zhou Ning cast cards.

Result 3

  • Example: citywalk-list
  • Dimension: visual-correctness
  • Visual correctness: 2 / 5
  • GEQI dimensions:
    • Usability & Interaction: 2 / 5 (30%)
    • Visual & Aesthetics: 3 / 5 (25%)
    • Consistency & Standards: 2 / 5 (15%)
    • Architecture & UX Writing: 2 / 5 (15%)
    • Accessibility & Performance: 3 / 5 (15%)
  • Task: The A2UI playground preview should show weekend citywalk coffee picks with Rooftop Brew Room, Corner Canvas Lab, and Late Sun Roastery.

Result 4

  • Example: fridge-search
  • Dimension: visual-correctness
  • Visual correctness: 4 / 5
  • GEQI dimensions:
    • Usability & Interaction: 3 / 5 (30%)
    • Visual & Aesthetics: 3 / 5 (25%)
    • Consistency & Standards: 3 / 5 (15%)
    • Architecture & UX Writing: 3 / 5 (15%)
    • Accessibility & Performance: 3 / 5 (15%)
  • Task: The A2UI playground preview should show refrigerator search results with Siemens, Hualing, Haier, and Midea product cards.

Result 5

  • Example: trip-planner
  • Dimension: visual-correctness
  • Visual correctness: 2 / 5
  • GEQI dimensions:
    • Usability & Interaction: 2 / 5 (30%)
    • Visual & Aesthetics: 2 / 5 (25%)
    • Consistency & Standards: 2 / 5 (15%)
    • Architecture & UX Writing: 2 / 5 (15%)
    • Accessibility & Performance: 2 / 5 (15%)
  • Task: The A2UI playground preview should show a Kyoto 48-hour trip planner with Day 1 and Day 2 itinerary sections, including Monkey Park Viewpoint.

Result 6

  • Example: weather-current
  • Dimension: visual-correctness
  • Visual correctness: 5 / 5
  • GEQI dimensions:
    • Usability & Interaction: 5 / 5 (30%)
    • Visual & Aesthetics: 4 / 5 (25%)
    • Consistency & Standards: 5 / 5 (15%)
    • Architecture & UX Writing: 4 / 5 (15%)
    • Accessibility & Performance: 4 / 5 (15%)
  • Task: The A2UI playground preview should show the current weather for Austin, TX, including clear skies with light breeze.

Result 7

  • Example: product-card
  • Dimension: visual-correctness
  • Visual correctness: 5 / 5
  • GEQI dimensions:
    • Usability & Interaction: 5 / 5 (30%)
    • Visual & Aesthetics: 4 / 5 (25%)
    • Consistency & Standards: 4 / 5 (15%)
    • Architecture & UX Writing: 5 / 5 (15%)
    • Accessibility & Performance: 4 / 5 (15%)
  • Task: The A2UI playground preview should show a Wireless Headphones Pro product card with a visible Add to Cart action.

Result 8

  • Example: workout-plan
  • Dimension: visual-correctness
  • Visual correctness: 2 / 5
  • GEQI dimensions:
    • Usability & Interaction: 2 / 5 (30%)
    • Visual & Aesthetics: 2 / 5 (25%)
    • Consistency & Standards: 2 / 5 (15%)
    • Architecture & UX Writing: 2 / 5 (15%)
    • Accessibility & Performance: 2 / 5 (15%)
  • Task: The A2UI playground preview should show a weekly workout plan with five days from Monday Ramp-Up through Friday Conditioning.

Workflow run

upupming added 3 commits June 4, 2026 12:28
The build job runs on lynx-ubuntu-24.04-xlarge whose OSS credentials
point to a different bucket than the new physical-exclusive runner used
by this benchmark job. With fail-on-cache-miss: true the cache restore
hard-fails, since the cache key written by build can't be read by bench.

Drop fail-on-cache-miss so the restore step is best-effort and let
`pnpm turbo build` rebuild locally on the physical machine instead. Bump
timeout-minutes to 45 to absorb the cold Rust compile.

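A minimal sketch of the job after this change, assuming the restore step is the lynx-infra/cache action and keeps its other inputs unchanged (elided here). Only the dropped `fail-on-cache-miss` and the 45-minute timeout come from the commit; the runner label and the action ref are placeholders.

```yaml
jobs:
  benchmark:
    runs-on: physical-exclusive  # placeholder label
    # Raised to 45 minutes so a full cold Rust compile fits on a cache miss.
    timeout-minutes: 45
    steps:
      - name: Restore build cache (best effort)
        uses: lynx-infra/cache@main  # ref is a placeholder
        # Other inputs unchanged and elided; `fail-on-cache-miss: true` removed,
        # so a miss no longer fails the job.
      - name: Build
        # On a miss this rebuilds everything locally on the runner.
        run: pnpm turbo build
```

With the flag gone, a cache entry written to the xlarge runner's bucket is simply not found, and the build step does the work instead; the longer timeout pays for that.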
The physical-exclusive runner image doesn't ship cargo, so `pnpm turbo
build` fails at @lynx-js/swc-plugin-reactlynx-compat#build with
`/bin/sh: 1: cargo: not found` when it invokes the package's build.js
(which shells out to `cargo build` to compile the SWC plugin's Rust
crate). The lynx-ubuntu-24.04-xlarge image had Rust pre-installed; the
new physical runner doesn't.

Add the repo's ./.github/actions/rustup composite action (same one used
by workflow-build.yml) ahead of TurboCache. Reuse the action's
save-if='${{ github.ref_name == 'main' }}' gate so the rustup cache
isn't written from PR runs.
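A fragment of the benchmark job's `steps:` list showing the composite action placed ahead of the cache step. The step name is a placeholder; the action path and the `save-if` expression are the ones quoted in the commit.

```yaml
steps:
  - name: Install Rust  # placeholder name
    uses: ./.github/actions/rustup
    with:
      # Evaluates to "true" only on main, so PR runs never write the rustup cache.
      save-if: ${{ github.ref_name == 'main' }}
  # The TurboCache step and `pnpm turbo build` follow here.
```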
…runner

The reusable ./.github/actions/rustup action wrote `${HOME:-/.cargo}/bin`
to GITHUB_PATH, but $HOME is not propagated between steps on the
physical-exclusive self-hosted runner (visible in setup-uv's
`Added undefined/.local/bin to the path`), so the resulting PATH entry
is empty and the next step fails with `rustup: command not found`.

Install rustup directly with explicit HOME=/root (the runner's true
home, evidenced by the existing /root/.rustup/settings.toml the
installer found) and pin /root/.cargo/bin onto GITHUB_PATH. Also export
HOME=/root in the simulation/walltime steps so
`. "$HOME/.cargo/env"` resolves correctly to pick up the codspeed CLI
that the prepare step installed.
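A sketch of the two pieces this commit describes, assuming a plain `curl | sh` rustup install (rustup's standard installer at sh.rustup.rs, with `-y` for non-interactive mode). Step names and the benchmark command are placeholders, and the codspeed CLI is assumed to sit under /root/.cargo/bin after the prepare step.

```yaml
steps:
  - name: Install rustup  # placeholder name
    env:
      # Pin the runner's real home instead of relying on an inherited $HOME.
      HOME: /root
    run: |
      curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y
      # Expose cargo/rustup to later steps via a literal path, not one built from $HOME.
      echo "/root/.cargo/bin" >> "$GITHUB_PATH"
  - name: Benchmark (walltime)  # placeholder name
    run: |
      export HOME=/root
      # Loads /root/.cargo/env so the codspeed CLI from the prepare step is on PATH.
      . "$HOME/.cargo/env"
      codspeed run --mode walltime -- pnpm run bench
```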
… build

After fixing PATH propagation, the next failure was
@lynx-js/web-core#build:wasm aborting in rustup with
`could not rename '...partial' file ... No such file or directory`
(os error 2). The previous step installed the `stable` toolchain, but
rust-toolchain.toml pins 1.92.0 and adds wasm32-unknown-unknown /
wasm32-wasip1 targets. With turbo running many cargo invocations
concurrently, multiple processes each tried to sync the 1.92.0 channel
and races on /root/.rustup/downloads/*.partial corrupted the install.

Install rustup with --default-toolchain none, then explicitly install
the pinned toolchain and add the wasm targets up front so every parallel
cargo invocation downstream sees a complete, ready-to-use install.
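A sketch of the install step after this change. The toolchain version and the two wasm targets are the ones the commit cites from rust-toolchain.toml; the step name is a placeholder, and HOME is pinned to /root as in the previous commit.

```yaml
steps:
  - name: Install pinned Rust toolchain  # placeholder name
    env:
      HOME: /root  # the runner's real home, pinned explicitly
    run: |
      # Install rustup only; no toolchain is downloaded at this point.
      curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y --default-toolchain none
      # GITHUB_PATH only affects later steps, so extend PATH for this step too.
      export PATH="/root/.cargo/bin:$PATH"
      # Download the pinned toolchain and both wasm targets once, serially,
      # before turbo launches cargo builds in parallel.
      rustup toolchain install 1.92.0
      rustup target add --toolchain 1.92.0 wasm32-unknown-unknown wasm32-wasip1
      echo "/root/.cargo/bin" >> "$GITHUB_PATH"
```

Because every later cargo invocation finds 1.92.0 and its targets already installed, none of them has to download into /root/.rustup/downloads concurrently. The trade-off of this sketch is that the version now appears both here and in rust-toolchain.toml.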
@codspeed-hq

codspeed-hq Bot commented Jun 4, 2026

Merging this PR will not alter performance

⚠️ Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

✅ 87 untouched benchmarks
🆕 10 new benchmarks
⏩ 26 skipped benchmarks¹

Performance Changes

| Mode | Benchmark | BASE | HEAD | Efficiency |
| --- | --- | --- | --- | --- |
| 🆕 WallTime | transform 1000 view elements | N/A | 5.7 ms | N/A |
| 🆕 WallTime | basic-performance-large-css | N/A | 1.1 ms | N/A |
| 🆕 WallTime | basic-performance-nest-level-100 | N/A | 945.8 µs | N/A |
| 🆕 WallTime | basic-performance-small-css | N/A | 972.2 µs | N/A |
| 🆕 WallTime | basic-performance-div-1000 | N/A | 8.8 ms | N/A |
| 🆕 WallTime | basic-performance-image-100 | N/A | 1.3 ms | N/A |
| 🆕 WallTime | basic-performance-div-10000 | N/A | 30.8 ms | N/A |
| 🆕 WallTime | basic-performance-div-100 | N/A | 1 ms | N/A |
| 🆕 WallTime | basic-performance-text-200 | N/A | 2 ms | N/A |
| 🆕 WallTime | basic-performance-scroll-view-100 | N/A | 1.5 ms | N/A |

Comparing ci/codspeed-physical-exclusive-walltime (89ede06) with main (e33c08f)

Open in CodSpeed

Footnotes

  1. 26 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

Comment thread .github/workflows/workflow-bench.yml Fixed
@coolkiid coolkiid force-pushed the ci/codspeed-physical-exclusive-walltime branch from dc05dc8 to 89ede06 on June 5, 2026 07:12

Labels

None yet

Projects

None yet


2 participants