test(a2ui): use playground render preview for ui-judge tests by PupilTong · Pull Request #2673 · lynx-family/lynx-stack

PupilTong · 2026-05-20T03:30:17Z

Summary

Replace the ui-judge scratch HTTP fixture with the A2UI playground render.html preview path.
Start packages/genui/a2ui-playground with pnpm dev from a test-only helper, and cover several direct examples with speed=0.
Preserve judgePage as the only public @lynx-js/ui-judge API while keeping validation tests server-free.
Add .github/actions/ui-judge-comment, a dependency-free composite action that creates or updates a PR comment from UiJudgeResult JSON.
Wire UI Judge into the main Test workflow so relevant PRs can run the model-backed playground check and publish a PR comment.

Details

Playground test URLs now use routes such as /render.html?protocol=a2ui&demoUrl=.%2Fa2ui.web.js&theme=light&demo=recs&speed=0.
speed=0 now means no streaming delay, so all messages are processed immediately.
The model-backed test still runs only when MIDSCENE_MODEL_NAME is configured.
The PR comment action accepts result-file or result-json, supports dry-run output, updates an existing marked comment by default, and uses github.token unless a token is supplied.
The CI ui-judge job is gated to pull requests with Midscene secrets and UI Judge/A2UI/playground-related file changes; it writes UI_JUDGE_RESULT_FILE and calls the new comment action.
The ui-judge changed-file gate uses the GitHub pull request files API instead of local git diff, because the custom container runner does not expose the checkout as a Git worktree inside shell commands.
The ui-judge job now follows the web-elements Playwright pattern: it depends on the repository build job, restores the strict .turbo cache for the current commit, runs pnpm turbo build --summarize in the Playwright container, then prepares the A2UI playground Lynx artifact before testing.
If setup or build fails before the model test writes a result, CI writes a fallback result JSON so the PR comment still explains the failure.
Model-backed ui-judge failures still publish or update the PR comment, then fail the ui-judge job so the PR cannot pass on a broken judge run. Long setup, build, and judge steps have step-level timeouts so fallback comments can run before the job-level timeout.

Test Plan

pnpm turbo build:lynx --filter a2ui-playground
pnpm --filter @lynx-js/ui-judge build
UI_JUDGE_RESULT_FILE=/tmp/ui-judge-results.json pnpm --filter @lynx-js/ui-judge test (6 passed; generated a score result JSON)
INPUT_RESULT_FILE=/tmp/ui-judge-results.json INPUT_DRY_RUN=true GITHUB_REPOSITORY=lynx-family/lynx-stack GITHUB_RUN_ID=123 node .github/actions/ui-judge-comment/comment.mjs
node --check .github/actions/ui-judge-comment/comment.mjs
pnpm eslint .github/actions/ui-judge-comment/comment.mjs packages/genui/ui-judge/tests/judge-page.spec.ts --flag v10_config_lookup_from_file
pnpm biome check .github/actions/ui-judge-comment/comment.mjs packages/genui/ui-judge/tests/judge-page.spec.ts
ruby -e 'require "yaml"; YAML.load_file(".github/workflows/test.yml"); puts "yaml ok"'
env -u MIDSCENE_MODEL_NAME -u MIDSCENE_MODEL_API_KEY bash -ceu 'tmp=$(mktemp); event=$(mktemp); printf "{\"number\":2673}\n" > "$event"; export GITHUB_OUTPUT="$tmp" GITHUB_EVENT_PATH="$event" GITHUB_EVENT_NAME=pull_request; node --input-type=module <<'"'"'NODE'"'"' import { appendFileSync } from "node:fs"; let shouldRun = false; let reason = "UI Judge only comments on pull_request events."; if (process.env.GITHUB_EVENT_NAME === "pull_request") { if (!process.env.MIDSCENE_MODEL_NAME || !process.env.MIDSCENE_MODEL_API_KEY) { reason = "Midscene model secrets are not configured for this pull request."; } } appendFileSync(process.env.GITHUB_OUTPUT, should-run=${shouldRun}\n); appendFileSync(process.env.GITHUB_OUTPUT, reason=${reason}\n); console.info(reason); NODE cat "$tmp"'
git diff --check

Summary by CodeRabbit

Release Notes

New Features
- Added UI Judge GitHub Action to automatically post/update pull request comments with visual test results
- Integrated UI Judge into CI pipeline with artifact uploads
Improvements
- Playground simulation now supports speed=0 for no-delay rendering mode
- Updated visual-correctness grading mechanism for improved accuracy
Documentation
- Added comprehensive UI Judge CI integration and testing guidance
- Updated build and testing instructions

changeset-bot · 2026-05-20T03:30:22Z

⚠️ No Changeset found

Latest commit: 83162b2

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

coderabbitai · 2026-05-20T03:30:24Z

📝 Walkthrough

Walkthrough

This PR integrates UI Judge visual correctness testing into the CI workflow end-to-end: enhances the playground runtime to support zero-delay simulation and multiple protocol variants, provides a playground server test helper replacing local fixtures, creates a GitHub composite action for posting results to PR comments, wires the workflow with fallback artifact handling, and documents operational practices.

Changes

UI Judge Visual Correctness Testing Integration

Layer / File(s)	Summary
Playground Runtime & Scoring Enhancements `packages/genui/a2ui-playground/lynx-src/a2ui/App.tsx`, `packages/genui/a2ui-playground/src/render.tsx`, `packages/genui/a2ui-playground/src/utils/renderUrl.ts`, `packages/genui/ui-judge/src/index.ts`	`streamDelay` treats `speed=0` as explicit no-delay mode; `protocol` type widened to `'0.9' \| 'a2ui' \| 'openui'` with updated query parsing; speed parameter accepts `>= 0`; UI Judge scoring prompt requires Midscene's structured `Number` field instead of bare integers.
Playground Test Infrastructure `packages/genui/ui-judge/tests/helpers/playground-preview-server.ts`, `packages/genui/ui-judge/tests/judge-page.spec.ts`, `packages/genui/ui-judge/README.md`	New test helper boots A2UI playground dev server on free local port, captures logs, polls readiness endpoints, exposes baseUrl/createDemoPreviewUrl/dispose/getLogs. Test suite refactored to use preview server with `speed=0` instead of local HTTP fixture; helper functions added to wait for text in shadow/document DOM and persist results; old fixture removed.
UI Judge Comment GitHub Action `.github/actions/ui-judge-comment/README.md`, `.github/actions/ui-judge-comment/action.yml`, `.github/actions/ui-judge-comment/comment.mjs`	Composite action that posts or updates PR comments from UI Judge results. Reads JSON/file input, normalizes scores (0–5 bounds), formats Markdown table with status per result, renders expandable details (task/reference/steps/error), truncates long outputs, and uses GitHub REST API to create/update comment by marker. Includes API client, URL sanitization, Markdown escaping, and actions-run link generation with attempt suffixing.
CI Workflow Integration `.github/ui-judge-ci.instructions.md`, `.github/workflows/test.yml`, `.github/workflows/workflow-test.yml`, `.github/scripts/write-ui-judge-result.mjs`	Extends `workflow-test.yml` with MIDSCENE secrets and artifact upload inputs; simplifies build to single-line command. Adds `ui-judge` job running tests and uploading results, with fallback script writing failure result when secrets unavailable. Adds `ui-judge-comment` job downloading results and invoking action for PRs. Updates `done` job to wait on both new jobs.
Build & Operational Guidance `.gitignore`, `AGENTS.md`, `.github/ui-judge.instructions.md`	Ignores `midscene_run`; clarifies Turbo filtering in build guidance. Documents Midscene scoring requirements (`aiNumber()` structured Number field, no GRADE text), model-backed test structure (skip only model-dependent cases, keep server startup skipped), fixture patterns (prefer `page.setContent()` for static tests, use playground for network navigation), sandbox TCP restrictions, and prerequisite catalog artifact builds.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

lynx-family/lynx-stack#2629: Both PRs modify the @lynx-js/ui-judge package's judgePage implementation, test suite, documentation, and scoring prompt behavior, introducing the package integration and its visual correctness testing.
lynx-family/lynx-stack#2561: Both PRs touch the same genui/a2ui-playground render logic for protocol and demo URL handling—main PR extends protocol/speed parsing for new variants while retrieved PR refactors protocol/demo URL selection.
lynx-family/lynx-stack#2472: Main PR's updates to render.tsx query parameter parsing for protocol and speed directly extend the retrieved PR's initial render.tsx initData construction from URL parameters.

Suggested reviewers

HuJean
Sherry-hue
gaoachao

Poem

🐰 A rabbit hops through test results with glee,
Zero-delay renders flowing wild and free,
Comments posted, protocols aligned,
Playground servers ready, carefully designed,
Visual truths captured, in CI we confide! 🎨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly summarizes the main change: replacing a scratch HTTP fixture with the A2UI playground render preview for ui-judge tests. It is specific, concise, and accurately reflects the primary objective of the changeset.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch hw/codex/ui-judge-playground-preview

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

codecov · 2026-05-20T03:34:03Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ All tests successful. No failed tests found.

📢 Thoughts on this report? Let us know!

codspeed-hq · 2026-05-20T04:00:25Z

Merging this PR will degrade performance by 7.3%

⚠️

Different runtime environments detected

Some benchmarks with significant performance changes were compared across different runtime environments,
which may affect the accuracy of the results.

Open the report in CodSpeed to investigate

❌ 1 regressed benchmark
✅ 80 untouched benchmarks
⏩ 26 skipped benchmarks¹

Warning

Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

	Benchmark	`BASE`	`HEAD`	Efficiency
❌	`transform 1000 view elements`	40 ms	43.2 ms	-7.3%

Tip

Investigate this regression by commenting @codspeedbot fix this regression on this PR, or directly use the CodSpeed MCP with your agent.

_{Comparing hw/codex/ui-judge-playground-preview (83162b2) with main (7e6ff74)}

26 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports. ↩

relativeci · 2026-05-20T04:14:10Z

React External

#1678 Bundle Size — 698.01KiB (0%).

83162b2(current) vs 7e6ff74 main#1660(baseline)

Bundle metrics no changes

	Current #1678	Baseline #1660
Initial JS	`0B`	`0B`
Initial CSS	`0B`	`0B`
Cache Invalidation	`0%`	`0%`
Chunks	`0`	`0`
Assets	`3`	`3`
Modules	`17`	`17`
Duplicate Modules	`5`	`5`
Duplicate Code	`8.59%`	`8.59%`
Packages	`0`	`0`
Duplicate Packages	`0`	`0`

Bundle size by type no changes

	Current #1678	Baseline #1660
Other	`698.01KiB`	`698.01KiB`

Bundle analysis report Branch hw/codex/ui-judge-playground-pre... Project dashboard

^{Generated by RelativeCI Documentation Report issue}

relativeci · 2026-05-20T04:14:12Z

React Example with Element Template

#831 Bundle Size — 202.16KiB (0%).

83162b2(current) vs 7e6ff74 main#814(baseline)

Bundle metrics no changes

	Current #831	Baseline #814
Initial JS	`0B`	`0B`
Initial CSS	`0B`	`0B`
Cache Invalidation	`0%`	`0%`
Chunks	`0`	`0`
Assets	`4`	`4`
Modules	`100`	`100`
Duplicate Modules	`30`	`30`
Duplicate Code	`39.22%`	`39.22%`
Packages	`2`	`2`
Duplicate Packages	`0`	`0`

Bundle size by type no changes

	Current #831	Baseline #814
IMG	`145.76KiB`	`145.76KiB`
Other	`56.41KiB`	`56.41KiB`

Bundle analysis report Branch hw/codex/ui-judge-playground-pre... Project dashboard

^{Generated by RelativeCI Documentation Report issue}

relativeci · 2026-05-20T04:14:13Z

React Example

#8563 Bundle Size — 237.81KiB (0%).

83162b2(current) vs 7e6ff74 main#8545(baseline)

Bundle metrics no changes

	Current #8563	Baseline #8545
Initial JS	`0B`	`0B`
Initial CSS	`0B`	`0B`
Cache Invalidation	`0%`	`0%`
Chunks	`0`	`0`
Assets	`4`	`4`
Modules	`200`	`200`
Duplicate Modules	`80`	`80`
Duplicate Code	`44.68%`	`44.68%`
Packages	`2`	`2`
Duplicate Packages	`0`	`0`

Bundle size by type no changes

	Current #8563	Baseline #8545
IMG	`145.76KiB`	`145.76KiB`
Other	`92.05KiB`	`92.05KiB`

Bundle analysis report Branch hw/codex/ui-judge-playground-pre... Project dashboard

^{Generated by RelativeCI Documentation Report issue}

relativeci · 2026-05-20T04:14:16Z

React MTF Example

#1696 Bundle Size — 208.75KiB (0%).

83162b2(current) vs 7e6ff74 main#1678(baseline)

Bundle metrics no changes

	Current #1696	Baseline #1678
Initial JS	`0B`	`0B`
Initial CSS	`0B`	`0B`
Cache Invalidation	`0%`	`0%`
Chunks	`0`	`0`
Assets	`3`	`3`
Modules	`195`	`195`
Duplicate Modules	`77`	`77`
Duplicate Code	`44.17%`	`44.17%`
Packages	`2`	`2`
Duplicate Packages	`0`	`0`

Bundle size by type no changes

	Current #1696	Baseline #1678
IMG	`111.23KiB`	`111.23KiB`
Other	`97.52KiB`	`97.52KiB`

Bundle analysis report Branch hw/codex/ui-judge-playground-pre... Project dashboard

^{Generated by RelativeCI Documentation Report issue}

relativeci · 2026-05-20T04:14:22Z

Web Explorer

#10137 Bundle Size — 903.53KiB (0%).

83162b2(current) vs 7e6ff74 main#10119(baseline)

Bundle metrics

1 change

	Current #10137	Baseline #10119
Initial JS	`45.06KiB`	`45.06KiB`
Initial CSS	`2.22KiB`	`2.22KiB`
Cache Invalidation	`0%`	`0%`
Chunks	`9`	`9`
Assets	`11`	`11`
Modules	`230`	`230`
Duplicate Modules	`11`	`11`
Duplicate Code	`27.12%`(`-0.04%`)	`27.13%`
Packages	`10`	`10`
Duplicate Packages	`0`	`0`

Bundle size by type no changes

	Current #10137	Baseline #10119
JS	`499.15KiB`	`499.15KiB`
Other	`402.16KiB`	`402.16KiB`
CSS	`2.22KiB`	`2.22KiB`

Bundle analysis report Branch hw/codex/ui-judge-playground-pre... Project dashboard

^{Generated by RelativeCI Documentation Report issue}

github-actions · 2026-05-20T12:56:59Z

UI Judge

Average score: 2 / 5 across 1 result.

#	Dimension	Score	Page	Status
1	visual-correctness	2 / 5	preview	OK

Details

Result 1

Task: The A2UI playground preview should show date-night dining recommendations for Moonlight Terrace, Pinewood Bistro, and Sea Breeze Kitchen.

Workflow run

coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (5)

.github/workflows/test.yml (1)

109-130: 💤 Low value

De-duplicate the repeated if: expression on the three steps.

The same multi-clause expression is repeated on checkout, download-artifact, and the comment step. Hoist it into a single env var (or a single job-level guard plus a per-step skip) so it only has to be maintained in one place.

♻️ Example using a job-level env

   ui-judge-comment:
     needs: ui-judge
     if: always()
     runs-on: lynx-ubuntu-24.04-medium
     permissions:
       contents: read
       issues: write
       pull-requests: write
+    env:
+      SHOULD_COMMENT: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.repo.full_name == github.repository && needs.ui-judge.result != 'skipped' && needs.ui-judge.result != 'cancelled' }}
     steps:
       - uses: actions/checkout@93cb6efe18208431cddfb8368fd83d5badbf9bfd # v5
-        if: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.repo.full_name == github.repository && needs.ui-judge.result != 'skipped' && needs.ui-judge.result != 'cancelled' }}
+        if: ${{ env.SHOULD_COMMENT == 'true' }}
         with:
           persist-credentials: false
       - uses: actions/download-artifact@634f93cb2916e3fdff6788551b99b062d0335ce0 # v5
-        if: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.repo.full_name == github.repository && needs.ui-judge.result != 'skipped' && needs.ui-judge.result != 'cancelled' }}
+        if: ${{ env.SHOULD_COMMENT == 'true' }}
         with:
           name: ui-judge-results
       - name: Comment UI Judge result
-        if: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.repo.full_name == github.repository && needs.ui-judge.result != 'skipped' && needs.ui-judge.result != 'cancelled' }}
+        if: ${{ env.SHOULD_COMMENT == 'true' }}
         uses: ./.github/actions/ui-judge-comment
         with:
           result-file: ui-judge-results.json

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/test.yml around lines 109 - 130, The repeated multi-clause
condition used in the three steps of the ui-judge-comment job should be
centralized: create a single job-level guard or job env variable (e.g., set env
like UI_JUDGE_RUN_CONDITION or move the entire if: expression to the job-level
if for ui-judge-comment) and then remove the duplicated per-step if: on the
actions/checkout, actions/download-artifact and the "Comment UI Judge result"
step; update those steps to rely on the job-level guard (or a simple per-step
conditional that checks the new env var) so the expression is maintained in one
place and all three steps reference that single symbol instead of duplicating
the long clause.

.github/actions/ui-judge-comment/comment.mjs (1)

219-231: 💤 Low value

Simplify table row construction.

The .join(' | ').replace(/^/, '| ').replace(/$/, ' |') chain is non-obvious; a template literal expresses the same intent more clearly without the regex round-trip.

♻️ Proposed simplification

-  return [
-    String(index + 1),
-    escapeTableCell(result.dimension),
-    `${result.score} / 5`,
-    page,
-    status,
-  ].join(' | ').replace(/^/, '| ').replace(/$/, ' |');
+  const cells = [
+    String(index + 1),
+    escapeTableCell(result.dimension),
+    `${result.score} / 5`,
+    page,
+    status,
+  ];
+  return `| ${cells.join(' | ')} |`;

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/actions/ui-judge-comment/comment.mjs around lines 219 - 231, The
table row construction in formatTableRow uses an opaque join(' | ').replace(/^/,
'| ').replace(/$/, ' |') pattern; simplify by building the row with a single
template literal that inserts String(index + 1),
escapeTableCell(result.dimension), `${result.score} / 5`, the page variable
(from sanitizeUrlForMarkdown(result.url)), and status separated by " | " and
wrapped with leading and trailing pipes, ensuring the same output but clearer
intent and easier maintenance.

packages/genui/ui-judge/tests/helpers/playground-preview-server.ts (2)

159-180: 💤 Low value

Free-port allocation has an inherent TOCTOU race.

findFreePort closes the listener before pnpm dev binds, so another local process could grab the port in between and the readiness loop would silently time out on a foreign server. For a CI helper this is usually acceptable, but consider passing the listening server's port via SO_REUSEADDR-style handoff or accepting PORT via env to allow a CI-provided port range. Optional hardening only.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/genui/ui-judge/tests/helpers/playground-preview-server.ts` around
lines 159 - 180, findFreePort currently picks and closes a free port, creating a
TOCTOU race between allocation and the child process binding; to harden it,
first allow honoring an externally provided port (process.env.PORT) or an
allowed port-range argument and validate it in findFreePort, and if no env port
is provided change the API to either return a still-bound server/socket (so the
caller can hand it off to the child process or keep it open until the child
confirms binding) or implement a simple handoff (e.g., keep the listener open
and pass the port to the child via env + wait for child to accept, then close),
and update the readiness loop to use the new behavior; touch the findFreePort
implementation and any callers in the readiness loop to consume the bound server
or PORT env instead of relying on a closed-port result.
10-16: 💤 Low value

protocol option type is narrower than the playground accepts.

render.tsx now treats '0.9', 'a2ui', and 'openui' as valid, but PlaygroundDemoPreviewOptions.protocol here is 'a2ui' | 'openui'. Not a bug today (tests only use a2ui), but it makes the helper inconsistent with the runtime contract.
Suggested change
-  protocol?: 'a2ui' | 'openui';
+  protocol?: '0.9' | 'a2ui' | 'openui';
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/genui/ui-judge/tests/helpers/playground-preview-server.ts` around
lines 10 - 16, PlaygroundDemoPreviewOptions.protocol is too narrow compared to
runtime expectations in render.tsx; update the protocol type in the
PlaygroundDemoPreviewOptions interface to include '0.9' (e.g. '0.9' | 'a2ui' |
'openui') or widen it to string so the test helper matches render.tsx's accepted
values and avoids type mismatches when render.tsx treats '0.9' as valid.

packages/genui/ui-judge/tests/judge-page.spec.ts (1)

52-84: 💤 Low value

Render-only checks are gated behind MIDSCENE_MODEL_NAME.

The parameterized renders playground example … tests don't invoke judgePage or any Midscene model, but they live inside this describe block and are therefore skipped whenever MIDSCENE_MODEL_NAME is absent. If the intent is to use them as cheap smoke tests for the playground preview in regular CI, consider splitting them out of the model-gated describe so they run unconditionally; if the intent is to keep them only on model-backed runs, this is fine as-is.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/genui/ui-judge/tests/judge-page.spec.ts` around lines 52 - 84, The
tests under the describe "A2UI playground preview" are being skipped when
hasMidsceneModelConfig() is false because test.skip wraps the entire block, but
the parameterized tests that iterate PLAYGROUND_DEMO_CASES (the `renders
playground example ${demo.demoId} with speed zero` tests) do not require the
Midscene model; move those render-only tests out of the model-gated describe or
create a new describe that is not guarded by test.skip, leaving model-dependent
tests (that call judgePage/Midscene) inside the existing gated block; adjust
references to startPlaygroundPreviewServer(),
previewServer.createDemoPreviewUrl(), waitForPreviewText(), and the
PLAYGROUND_DEMO_CASES loop so the render-only loop runs unconditionally while
preserving the gated tests that actually need hasMidsceneModelConfig().

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.github/actions/ui-judge-comment/comment.mjs:
- Around line 317-324: findExistingComment currently fetches only the first page
of comments and can miss matches on busy PRs; update it to paginate the
/repos/{owner}/{repo}/issues/{prNumber}/comments endpoint by following the Link
header (rel="next") from the client response and iterate pages until you find a
comment whose body includes the marker, returning immediately when found; ensure
the loop handles the client’s response shape (comments array and response
headers) and stops when no next link is present to avoid infinite loops.

In `@packages/genui/ui-judge/README.md`:
- Around line 39-47: The README command and the prereq check message disagree:
update either the README snippet in packages/genui/ui-judge/README.md or the
error text in playground-preview-server.ts so both instruct the same canonical
build command; pick the canonical command used by your build system (e.g., pnpm
--filter `@lynx-js/a2ui-reactlynx` build or pnpm turbo build:lynx --filter
a2ui-playground), then replace the mismatched string in README.md or the
hardcoded message in playground-preview-server.ts (search for the runtime error
text "Run `pnpm --filter `@lynx-js/a2ui-reactlynx` build`..." and the README build
block) so users get one consistent prep command.

---

Nitpick comments:
In @.github/actions/ui-judge-comment/comment.mjs:
- Around line 219-231: The table row construction in formatTableRow uses an
opaque join(' | ').replace(/^/, '| ').replace(/$/, ' |') pattern; simplify by
building the row with a single template literal that inserts String(index + 1),
escapeTableCell(result.dimension), `${result.score} / 5`, the page variable
(from sanitizeUrlForMarkdown(result.url)), and status separated by " | " and
wrapped with leading and trailing pipes, ensuring the same output but clearer
intent and easier maintenance.

In @.github/workflows/test.yml:
- Around line 109-130: The repeated multi-clause condition used in the three
steps of the ui-judge-comment job should be centralized: create a single
job-level guard or job env variable (e.g., set env like UI_JUDGE_RUN_CONDITION
or move the entire if: expression to the job-level if for ui-judge-comment) and
then remove the duplicated per-step if: on the actions/checkout,
actions/download-artifact and the "Comment UI Judge result" step; update those
steps to rely on the job-level guard (or a simple per-step conditional that
checks the new env var) so the expression is maintained in one place and all
three steps reference that single symbol instead of duplicating the long clause.

In `@packages/genui/ui-judge/tests/helpers/playground-preview-server.ts`:
- Around line 159-180: findFreePort currently picks and closes a free port,
creating a TOCTOU race between allocation and the child process binding; to
harden it, first allow honoring an externally provided port (process.env.PORT)
or an allowed port-range argument and validate it in findFreePort, and if no env
port is provided change the API to either return a still-bound server/socket (so
the caller can hand it off to the child process or keep it open until the child
confirms binding) or implement a simple handoff (e.g., keep the listener open
and pass the port to the child via env + wait for child to accept, then close),
and update the readiness loop to use the new behavior; touch the findFreePort
implementation and any callers in the readiness loop to consume the bound server
or PORT env instead of relying on a closed-port result.
- Around line 10-16: PlaygroundDemoPreviewOptions.protocol is too narrow
compared to runtime expectations in render.tsx; update the protocol type in the
PlaygroundDemoPreviewOptions interface to include '0.9' (e.g. '0.9' | 'a2ui' |
'openui') or widen it to string so the test helper matches render.tsx's accepted
values and avoids type mismatches when render.tsx treats '0.9' as valid.

In `@packages/genui/ui-judge/tests/judge-page.spec.ts`:
- Around line 52-84: The tests under the describe "A2UI playground preview" are
being skipped when hasMidsceneModelConfig() is false because test.skip wraps the
entire block, but the parameterized tests that iterate PLAYGROUND_DEMO_CASES
(the `renders playground example ${demo.demoId} with speed zero` tests) do not
require the Midscene model; move those render-only tests out of the model-gated
describe or create a new describe that is not guarded by test.skip, leaving
model-dependent tests (that call judgePage/Midscene) inside the existing gated
block; adjust references to startPlaygroundPreviewServer(),
previewServer.createDemoPreviewUrl(), waitForPreviewText(), and the
PLAYGROUND_DEMO_CASES loop so the render-only loop runs unconditionally while
preserving the gated tests that actually need hasMidsceneModelConfig().

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 01f9d408-aec2-41be-afe6-3e20735c758b

📥 Commits

Reviewing files that changed from the base of the PR and between 8083591 and 1b15be2.

📒 Files selected for processing (18)

.github/actions/ui-judge-comment/README.md
.github/actions/ui-judge-comment/action.yml
.github/actions/ui-judge-comment/comment.mjs
.github/scripts/write-ui-judge-failure-result.mjs
.github/ui-judge-ci.instructions.md
.github/ui-judge.instructions.md
.github/workflows/test.yml
.github/workflows/workflow-test.yml
.gitignore
AGENTS.md
packages/genui/a2ui-playground/lynx-src/a2ui/App.tsx
packages/genui/a2ui-playground/src/render.tsx
packages/genui/a2ui-playground/src/utils/renderUrl.ts
packages/genui/ui-judge/README.md
packages/genui/ui-judge/src/index.ts
packages/genui/ui-judge/tests/fixtures/interactive.html
packages/genui/ui-judge/tests/helpers/playground-preview-server.ts
packages/genui/ui-judge/tests/judge-page.spec.ts

💤 Files with no reviewable changes (1)

packages/genui/ui-judge/tests/fixtures/interactive.html

coderabbitai

🧹 Nitpick comments (2)

.github/workflows/test.yml (1)
115-123: 💤 Low value

Consider DRYing the repeated step-level condition.

The same github.event_name == 'pull_request' && github.event.pull_request.head.repo.full_name == github.repository && needs.ui-judge.result != 'skipped' && needs.ui-judge.result != 'cancelled' is repeated verbatim on the checkout, download-artifact, and comment steps. The reason for keeping it at step-level (rather than at job-level) is presumably so the job still resolves to success for the done aggregator on non-PR/fork events; that's fine. As a minor cleanup, you could compute it once into an env value (e.g. env.SHOULD_COMMENT) at the job level and reference ${{ env.SHOULD_COMMENT == 'true' }} per step to avoid drift if the eligibility rules change. Non-blocking.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/test.yml around lines 115 - 123, The repeated complex
step-level condition used on the checkout, download-artifact, and "Comment UI
Judge result" steps should be computed once and reused; add a job-level env
variable (e.g. env.SHOULD_COMMENT) that evaluates the same expression to 'true'
or 'false' and then replace each step's if: with a simple check referencing that
env (e.g. if: ${{ env.SHOULD_COMMENT == 'true' }}), updating the steps that
include actions/checkout, actions/download-artifact@..., and the "Comment UI
Judge result" step to use the new env variable.
.github/scripts/write-ui-judge-result.mjs (1)
7-10: 💤 Low value

Guard against GITHUB_WORKSPACE being unset.

When neither UI_JUDGE_RESULT_FILE nor GITHUB_WORKSPACE is set (e.g., when running the script locally for the dry-run testing path mentioned in the PR description), join(undefined, 'ui-judge-results.json') throws a TypeError [ERR_INVALID_ARG_TYPE] with a stack trace rather than a helpful message. Adding an explicit check (or falling back to process.cwd()) keeps the failure mode clean for local invocations.
🛡️ Suggested defensive fallback
-const resultFile = process.env.UI_JUDGE_RESULT_FILE
-  || join(process.env.GITHUB_WORKSPACE, 'ui-judge-results.json');
+const workspace = process.env.GITHUB_WORKSPACE || process.cwd();
+const resultFile = process.env.UI_JUDGE_RESULT_FILE
+  || join(workspace, 'ui-judge-results.json');
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/scripts/write-ui-judge-result.mjs around lines 7 - 10, The code
assumes process.env.GITHUB_WORKSPACE is defined when computing resultFile (using
join(process.env.GITHUB_WORKSPACE, 'ui-judge-results.json')), which throws if
it's undefined; change the logic that sets resultFile to first use
process.env.UI_JUDGE_RESULT_FILE, and if absent compute the path using a safe
workspace variable (e.g., const workspace = process.env.GITHUB_WORKSPACE ||
process.cwd()) before calling join; update the resultFile assignment to use that
workspace fallback so local/dry-run invocations don't throw a TypeError.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In @.github/scripts/write-ui-judge-result.mjs:
- Around line 7-10: The code assumes process.env.GITHUB_WORKSPACE is defined
when computing resultFile (using join(process.env.GITHUB_WORKSPACE,
'ui-judge-results.json')), which throws if it's undefined; change the logic that
sets resultFile to first use process.env.UI_JUDGE_RESULT_FILE, and if absent
compute the path using a safe workspace variable (e.g., const workspace =
process.env.GITHUB_WORKSPACE || process.cwd()) before calling join; update the
resultFile assignment to use that workspace fallback so local/dry-run
invocations don't throw a TypeError.

In @.github/workflows/test.yml:
- Around line 115-123: The repeated complex step-level condition used on the
checkout, download-artifact, and "Comment UI Judge result" steps should be
computed once and reused; add a job-level env variable (e.g. env.SHOULD_COMMENT)
that evaluates the same expression to 'true' or 'false' and then replace each
step's if: with a simple check referencing that env (e.g. if: ${{
env.SHOULD_COMMENT == 'true' }}), updating the steps that include
actions/checkout, actions/download-artifact@..., and the "Comment UI Judge
result" step to use the new env variable.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 7692921e-e4e1-40c5-98cd-b823a0d3a3f1

📥 Commits

Reviewing files that changed from the base of the PR and between 1b15be2 and 83162b2.

📒 Files selected for processing (4)

.github/scripts/write-ui-judge-result.mjs
.github/ui-judge-ci.instructions.md
.github/workflows/test.yml
.github/workflows/workflow-test.yml

💤 Files with no reviewable changes (1)

.github/workflows/workflow-test.yml

✅ Files skipped from review due to trivial changes (1)

.github/ui-judge-ci.instructions.md

PupilTong force-pushed the hw/codex/ui-judge-playground-preview branch from 81ba50d to 609c9e0 Compare May 20, 2026 07:51

PupilTong changed the title ~~[codex] Use playground preview for ui-judge tests~~ [codex] Use playground render preview for ui-judge tests May 20, 2026

PupilTong force-pushed the hw/codex/ui-judge-playground-preview branch 4 times, most recently from d1209b1 to cbf445d Compare May 20, 2026 12:07

PupilTong changed the title ~~[codex] Use playground render preview for ui-judge tests~~ test: use playground render preview for ui-judge tests May 20, 2026

PupilTong force-pushed the hw/codex/ui-judge-playground-preview branch 5 times, most recently from 3423b1a to a5842bd Compare May 20, 2026 12:55

PupilTong force-pushed the hw/codex/ui-judge-playground-preview branch 5 times, most recently from dbd4430 to 0002eec Compare May 20, 2026 13:59

PupilTong self-assigned this May 21, 2026

Use playground preview for ui-judge tests

a968ffe

PupilTong force-pushed the hw/codex/ui-judge-playground-preview branch from 0002eec to a968ffe Compare May 21, 2026 04:59

PupilTong added 7 commits May 21, 2026 14:18

Show UI judge rerun attempts in PR comment

b021be0

Scope ui judge CI build steps

19935de

Build rspeedy prerequisites for ui judge CI

bed72e7

Use turbo cache for ui judge CI builds

664ae38

Raise file limit for ui judge CI

265227b

Clean up ui judge CI wiring

6383bd5

Merge origin/main into ui judge preview

1372e14

github-advanced-security AI found potential problems May 21, 2026

View reviewed changes

Comment thread .github/workflows/workflow-test.yml Fixed

PupilTong added 7 commits May 21, 2026 17:27

Make ui judge CI scripts sh-compatible

7a3b68f

Set ui judge CI workspace explicitly

0654b5d

Fix ui judge reusable preflight cwd

74997d5

Make ui judge preflight independent of git metadata

e53f0d0

Make ui judge comment action set up node

d3d19f9

Simplify ui judge secrets guard

92bd7de

Simplify ui judge reusable workflow inputs

1b15be2

PupilTong marked this pull request as ready for review May 21, 2026 12:45

PupilTong changed the title ~~test: use playground render preview for ui-judge tests~~ test（a2ui): use playground render preview for ui-judge tests May 21, 2026

PupilTong changed the title ~~test（a2ui): use playground render preview for ui-judge tests~~ test(a2ui): use playground render preview for ui-judge tests May 21, 2026

Remove ui judge failure result hook

83162b2

coderabbitai Bot reviewed May 21, 2026

View reviewed changes

Comment thread .github/actions/ui-judge-comment/comment.mjs

Comment thread packages/genui/ui-judge/README.md

coderabbitai Bot reviewed May 21, 2026

View reviewed changes

Sherry-hue approved these changes May 22, 2026

View reviewed changes

PupilTong merged commit 2d64575 into main May 22, 2026
55 of 56 checks passed

PupilTong deleted the hw/codex/ui-judge-playground-preview branch May 22, 2026 04:03

coderabbitai Bot mentioned this pull request May 22, 2026

test(ui-judge): add GEQI dimension scoring #2693

Open

Conversation

PupilTong commented May 20, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Details

Test Plan

Summary by CodeRabbit

Release Notes

Uh oh!

changeset-bot Bot commented May 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

⚠️ No Changeset found

Uh oh!

coderabbitai Bot commented May 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested reviewers

Poem

❌ Failed checks (1 warning)

Uh oh!

codecov Bot commented May 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

codspeed-hq Bot commented May 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Merging this PR will degrade performance by 7.3%

Performance Changes

Footnotes

Uh oh!

relativeci Bot commented May 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

React External

#1678 Bundle Size — 698.01KiB (0%).

Uh oh!

relativeci Bot commented May 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

React Example with Element Template

#831 Bundle Size — 202.16KiB (0%).

Uh oh!

relativeci Bot commented May 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

React Example

#8563 Bundle Size — 237.81KiB (0%).

Uh oh!

relativeci Bot commented May 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

React MTF Example

#1696 Bundle Size — 208.75KiB (0%).

Uh oh!

relativeci Bot commented May 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Web Explorer

#10137 Bundle Size — 903.53KiB (0%).

Uh oh!

github-actions Bot commented May 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

UI Judge

Result 1

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

PupilTong commented May 20, 2026 •

edited by coderabbitai Bot

Loading

changeset-bot Bot commented May 20, 2026 •

edited

Loading

coderabbitai Bot commented May 20, 2026 •

edited

Loading

codecov Bot commented May 20, 2026 •

edited

Loading

codspeed-hq Bot commented May 20, 2026 •

edited

Loading

relativeci Bot commented May 20, 2026 •

edited

Loading

relativeci Bot commented May 20, 2026 •

edited

Loading

relativeci Bot commented May 20, 2026 •

edited

Loading

relativeci Bot commented May 20, 2026 •

edited

Loading

relativeci Bot commented May 20, 2026 •

edited

Loading

github-actions Bot commented May 20, 2026 •

edited

Loading