Skip to content

test(a2ui): use playground render preview for ui-judge tests#2673

Merged
PupilTong merged 16 commits into
mainfrom
hw/codex/ui-judge-playground-preview
May 22, 2026
Merged

test(a2ui): use playground render preview for ui-judge tests#2673
PupilTong merged 16 commits into
mainfrom
hw/codex/ui-judge-playground-preview

Conversation

@PupilTong
Copy link
Copy Markdown
Collaborator

@PupilTong PupilTong commented May 20, 2026

Summary

  • Replace the ui-judge scratch HTTP fixture with the A2UI playground render.html preview path.
  • Start packages/genui/a2ui-playground with pnpm dev from a test-only helper, and cover several direct examples with speed=0.
  • Preserve judgePage as the only public @lynx-js/ui-judge API while keeping validation tests server-free.
  • Add .github/actions/ui-judge-comment, a dependency-free composite action that creates or updates a PR comment from UiJudgeResult JSON.
  • Wire UI Judge into the main Test workflow so relevant PRs can run the model-backed playground check and publish a PR comment.

Details

  • Playground test URLs now use routes such as /render.html?protocol=a2ui&demoUrl=.%2Fa2ui.web.js&theme=light&demo=recs&speed=0.
  • speed=0 now means no streaming delay, so all messages are processed immediately.
  • The model-backed test still runs only when MIDSCENE_MODEL_NAME is configured.
  • The PR comment action accepts result-file or result-json, supports dry-run output, updates an existing marked comment by default, and uses github.token unless a token is supplied.
  • The CI ui-judge job is gated to pull requests with Midscene secrets and UI Judge/A2UI/playground-related file changes; it writes UI_JUDGE_RESULT_FILE and calls the new comment action.
  • The ui-judge changed-file gate uses the GitHub pull request files API instead of local git diff, because the custom container runner does not expose the checkout as a Git worktree inside shell commands.
  • The ui-judge job now follows the web-elements Playwright pattern: it depends on the repository build job, restores the strict .turbo cache for the current commit, runs pnpm turbo build --summarize in the Playwright container, then prepares the A2UI playground Lynx artifact before testing.
  • If setup or build fails before the model test writes a result, CI writes a fallback result JSON so the PR comment still explains the failure.
  • Model-backed ui-judge failures still publish or update the PR comment, then fail the ui-judge job so the PR cannot pass on a broken judge run. Long setup, build, and judge steps have step-level timeouts so fallback comments can run before the job-level timeout.

Test Plan

  • pnpm turbo build:lynx --filter a2ui-playground
  • pnpm --filter @lynx-js/ui-judge build
  • UI_JUDGE_RESULT_FILE=/tmp/ui-judge-results.json pnpm --filter @lynx-js/ui-judge test (6 passed; generated a score result JSON)
  • INPUT_RESULT_FILE=/tmp/ui-judge-results.json INPUT_DRY_RUN=true GITHUB_REPOSITORY=lynx-family/lynx-stack GITHUB_RUN_ID=123 node .github/actions/ui-judge-comment/comment.mjs
  • node --check .github/actions/ui-judge-comment/comment.mjs
  • pnpm eslint .github/actions/ui-judge-comment/comment.mjs packages/genui/ui-judge/tests/judge-page.spec.ts --flag v10_config_lookup_from_file
  • pnpm biome check .github/actions/ui-judge-comment/comment.mjs packages/genui/ui-judge/tests/judge-page.spec.ts
  • ruby -e 'require "yaml"; YAML.load_file(".github/workflows/test.yml"); puts "yaml ok"'
  • env -u MIDSCENE_MODEL_NAME -u MIDSCENE_MODEL_API_KEY bash -ceu 'tmp=$(mktemp); event=$(mktemp); printf "{\"number\":2673}\n" > "$event"; export GITHUB_OUTPUT="$tmp" GITHUB_EVENT_PATH="$event" GITHUB_EVENT_NAME=pull_request; node --input-type=module <<'"'"'NODE'"'"' import { appendFileSync } from "node:fs"; let shouldRun = false; let reason = "UI Judge only comments on pull_request events."; if (process.env.GITHUB_EVENT_NAME === "pull_request") { if (!process.env.MIDSCENE_MODEL_NAME || !process.env.MIDSCENE_MODEL_API_KEY) { reason = "Midscene model secrets are not configured for this pull request."; } } appendFileSync(process.env.GITHUB_OUTPUT, should-run=${shouldRun}\n); appendFileSync(process.env.GITHUB_OUTPUT, reason=${reason}\n); console.info(reason); NODE cat "$tmp"'
  • git diff --check

Summary by CodeRabbit

Release Notes

  • New Features

    • Added UI Judge GitHub Action to automatically post/update pull request comments with visual test results
    • Integrated UI Judge into CI pipeline with artifact uploads
  • Improvements

    • Playground simulation now supports speed=0 for no-delay rendering mode
    • Updated visual-correctness grading mechanism for improved accuracy
  • Documentation

    • Added comprehensive UI Judge CI integration and testing guidance
    • Updated build and testing instructions

Review Change Stack

@changeset-bot
Copy link
Copy Markdown

changeset-bot Bot commented May 20, 2026

⚠️ No Changeset found

Latest commit: 83162b2

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 20, 2026

📝 Walkthrough

Walkthrough

This PR integrates UI Judge visual correctness testing into the CI workflow end-to-end: enhances the playground runtime to support zero-delay simulation and multiple protocol variants, provides a playground server test helper replacing local fixtures, creates a GitHub composite action for posting results to PR comments, wires the workflow with fallback artifact handling, and documents operational practices.

Changes

UI Judge Visual Correctness Testing Integration

Layer / File(s) Summary
Playground Runtime & Scoring Enhancements
packages/genui/a2ui-playground/lynx-src/a2ui/App.tsx, packages/genui/a2ui-playground/src/render.tsx, packages/genui/a2ui-playground/src/utils/renderUrl.ts, packages/genui/ui-judge/src/index.ts
streamDelay treats speed=0 as explicit no-delay mode; protocol type widened to '0.9' | 'a2ui' | 'openui' with updated query parsing; speed parameter accepts >= 0; UI Judge scoring prompt requires Midscene's structured Number field instead of bare integers.
Playground Test Infrastructure
packages/genui/ui-judge/tests/helpers/playground-preview-server.ts, packages/genui/ui-judge/tests/judge-page.spec.ts, packages/genui/ui-judge/README.md
New test helper boots A2UI playground dev server on free local port, captures logs, polls readiness endpoints, exposes baseUrl/createDemoPreviewUrl/dispose/getLogs. Test suite refactored to use preview server with speed=0 instead of local HTTP fixture; helper functions added to wait for text in shadow/document DOM and persist results; old fixture removed.
UI Judge Comment GitHub Action
.github/actions/ui-judge-comment/README.md, .github/actions/ui-judge-comment/action.yml, .github/actions/ui-judge-comment/comment.mjs
Composite action that posts or updates PR comments from UI Judge results. Reads JSON/file input, normalizes scores (0–5 bounds), formats Markdown table with status per result, renders expandable details (task/reference/steps/error), truncates long outputs, and uses GitHub REST API to create/update comment by marker. Includes API client, URL sanitization, Markdown escaping, and actions-run link generation with attempt suffixing.
CI Workflow Integration
.github/ui-judge-ci.instructions.md, .github/workflows/test.yml, .github/workflows/workflow-test.yml, .github/scripts/write-ui-judge-result.mjs
Extends workflow-test.yml with MIDSCENE secrets and artifact upload inputs; simplifies build to single-line command. Adds ui-judge job running tests and uploading results, with fallback script writing failure result when secrets unavailable. Adds ui-judge-comment job downloading results and invoking action for PRs. Updates done job to wait on both new jobs.
Build & Operational Guidance
.gitignore, AGENTS.md, .github/ui-judge.instructions.md
Ignores midscene_run; clarifies Turbo filtering in build guidance. Documents Midscene scoring requirements (aiNumber() structured Number field, no GRADE text), model-backed test structure (skip only model-dependent cases, keep server startup skipped), fixture patterns (prefer page.setContent() for static tests, use playground for network navigation), sandbox TCP restrictions, and prerequisite catalog artifact builds.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • lynx-family/lynx-stack#2629: Both PRs modify the @lynx-js/ui-judge package's judgePage implementation, test suite, documentation, and scoring prompt behavior, introducing the package integration and its visual correctness testing.
  • lynx-family/lynx-stack#2561: Both PRs touch the same genui/a2ui-playground render logic for protocol and demo URL handling—main PR extends protocol/speed parsing for new variants while retrieved PR refactors protocol/demo URL selection.
  • lynx-family/lynx-stack#2472: Main PR's updates to render.tsx query parameter parsing for protocol and speed directly extend the retrieved PR's initial render.tsx initData construction from URL parameters.

Suggested reviewers

  • HuJean
  • Sherry-hue
  • gaoachao

Poem

🐰 A rabbit hops through test results with glee,
Zero-delay renders flowing wild and free,
Comments posted, protocols aligned,
Playground servers ready, carefully designed,
Visual truths captured, in CI we confide! 🎨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly summarizes the main change: replacing a scratch HTTP fixture with the A2UI playground render preview for ui-judge tests. It is specific, concise, and accurately reflects the primary objective of the changeset.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch hw/codex/ui-judge-playground-preview

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@codecov
Copy link
Copy Markdown

codecov Bot commented May 20, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ All tests successful. No failed tests found.

📢 Thoughts on this report? Let us know!

@codspeed-hq
Copy link
Copy Markdown

codspeed-hq Bot commented May 20, 2026

Merging this PR will degrade performance by 7.3%

⚠️ Different runtime environments detected

Some benchmarks with significant performance changes were compared across different runtime environments,
which may affect the accuracy of the results.

Open the report in CodSpeed to investigate

❌ 1 regressed benchmark
✅ 80 untouched benchmarks
⏩ 26 skipped benchmarks1

Warning

Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

Benchmark BASE HEAD Efficiency
transform 1000 view elements 40 ms 43.2 ms -7.3%

Tip

Investigate this regression by commenting @codspeedbot fix this regression on this PR, or directly use the CodSpeed MCP with your agent.


Comparing hw/codex/ui-judge-playground-preview (83162b2) with main (7e6ff74)

Open in CodSpeed

Footnotes

  1. 26 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

@relativeci
Copy link
Copy Markdown

relativeci Bot commented May 20, 2026

React External

#1678 Bundle Size — 698.01KiB (0%).

83162b2(current) vs 7e6ff74 main#1660(baseline)

Bundle metrics  no changes
                 Current
#1678
     Baseline
#1660
No change  Initial JS 0B 0B
No change  Initial CSS 0B 0B
No change  Cache Invalidation 0% 0%
No change  Chunks 0 0
No change  Assets 3 3
No change  Modules 17 17
No change  Duplicate Modules 5 5
No change  Duplicate Code 8.59% 8.59%
No change  Packages 0 0
No change  Duplicate Packages 0 0
Bundle size by type  no changes
                 Current
#1678
     Baseline
#1660
No change  Other 698.01KiB 698.01KiB

Bundle analysis reportBranch hw/codex/ui-judge-playground-pre...Project dashboard


Generated by RelativeCIDocumentationReport issue

@relativeci
Copy link
Copy Markdown

relativeci Bot commented May 20, 2026

React Example with Element Template

#831 Bundle Size — 202.16KiB (0%).

83162b2(current) vs 7e6ff74 main#814(baseline)

Bundle metrics  no changes
                 Current
#831
     Baseline
#814
No change  Initial JS 0B 0B
No change  Initial CSS 0B 0B
No change  Cache Invalidation 0% 0%
No change  Chunks 0 0
No change  Assets 4 4
No change  Modules 100 100
No change  Duplicate Modules 30 30
No change  Duplicate Code 39.22% 39.22%
No change  Packages 2 2
No change  Duplicate Packages 0 0
Bundle size by type  no changes
                 Current
#831
     Baseline
#814
No change  IMG 145.76KiB 145.76KiB
No change  Other 56.41KiB 56.41KiB

Bundle analysis reportBranch hw/codex/ui-judge-playground-pre...Project dashboard


Generated by RelativeCIDocumentationReport issue

@relativeci
Copy link
Copy Markdown

relativeci Bot commented May 20, 2026

React Example

#8563 Bundle Size — 237.81KiB (0%).

83162b2(current) vs 7e6ff74 main#8545(baseline)

Bundle metrics  no changes
                 Current
#8563
     Baseline
#8545
No change  Initial JS 0B 0B
No change  Initial CSS 0B 0B
No change  Cache Invalidation 0% 0%
No change  Chunks 0 0
No change  Assets 4 4
No change  Modules 200 200
No change  Duplicate Modules 80 80
No change  Duplicate Code 44.68% 44.68%
No change  Packages 2 2
No change  Duplicate Packages 0 0
Bundle size by type  no changes
                 Current
#8563
     Baseline
#8545
No change  IMG 145.76KiB 145.76KiB
No change  Other 92.05KiB 92.05KiB

Bundle analysis reportBranch hw/codex/ui-judge-playground-pre...Project dashboard


Generated by RelativeCIDocumentationReport issue

@relativeci
Copy link
Copy Markdown

relativeci Bot commented May 20, 2026

React MTF Example

#1696 Bundle Size — 208.75KiB (0%).

83162b2(current) vs 7e6ff74 main#1678(baseline)

Bundle metrics  no changes
                 Current
#1696
     Baseline
#1678
No change  Initial JS 0B 0B
No change  Initial CSS 0B 0B
No change  Cache Invalidation 0% 0%
No change  Chunks 0 0
No change  Assets 3 3
No change  Modules 195 195
No change  Duplicate Modules 77 77
No change  Duplicate Code 44.17% 44.17%
No change  Packages 2 2
No change  Duplicate Packages 0 0
Bundle size by type  no changes
                 Current
#1696
     Baseline
#1678
No change  IMG 111.23KiB 111.23KiB
No change  Other 97.52KiB 97.52KiB

Bundle analysis reportBranch hw/codex/ui-judge-playground-pre...Project dashboard


Generated by RelativeCIDocumentationReport issue

@relativeci
Copy link
Copy Markdown

relativeci Bot commented May 20, 2026

Web Explorer

#10137 Bundle Size — 903.53KiB (0%).

83162b2(current) vs 7e6ff74 main#10119(baseline)

Bundle metrics  Change 1 change
                 Current
#10137
     Baseline
#10119
No change  Initial JS 45.06KiB 45.06KiB
No change  Initial CSS 2.22KiB 2.22KiB
No change  Cache Invalidation 0% 0%
No change  Chunks 9 9
No change  Assets 11 11
No change  Modules 230 230
No change  Duplicate Modules 11 11
Change  Duplicate Code 27.12%(-0.04%) 27.13%
No change  Packages 10 10
No change  Duplicate Packages 0 0
Bundle size by type  no changes
                 Current
#10137
     Baseline
#10119
No change  JS 499.15KiB 499.15KiB
No change  Other 402.16KiB 402.16KiB
No change  CSS 2.22KiB 2.22KiB

Bundle analysis reportBranch hw/codex/ui-judge-playground-pre...Project dashboard


Generated by RelativeCIDocumentationReport issue

@PupilTong PupilTong force-pushed the hw/codex/ui-judge-playground-preview branch from 81ba50d to 609c9e0 Compare May 20, 2026 07:51
@PupilTong PupilTong changed the title [codex] Use playground preview for ui-judge tests [codex] Use playground render preview for ui-judge tests May 20, 2026
@PupilTong PupilTong force-pushed the hw/codex/ui-judge-playground-preview branch 4 times, most recently from d1209b1 to cbf445d Compare May 20, 2026 12:07
@PupilTong PupilTong changed the title [codex] Use playground render preview for ui-judge tests test: use playground render preview for ui-judge tests May 20, 2026
@PupilTong PupilTong force-pushed the hw/codex/ui-judge-playground-preview branch 5 times, most recently from 3423b1a to a5842bd Compare May 20, 2026 12:55
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 20, 2026

UI Judge

Average score: 2 / 5 across 1 result.

# Dimension Score Page Status
1 visual-correctness 2 / 5 preview OK
Details

Result 1

  • Task: The A2UI playground preview should show date-night dining recommendations for Moonlight Terrace, Pinewood Bistro, and Sea Breeze Kitchen.

Workflow run

@PupilTong PupilTong force-pushed the hw/codex/ui-judge-playground-preview branch 5 times, most recently from dbd4430 to 0002eec Compare May 20, 2026 13:59
@PupilTong PupilTong self-assigned this May 21, 2026
@PupilTong PupilTong force-pushed the hw/codex/ui-judge-playground-preview branch from 0002eec to a968ffe Compare May 21, 2026 04:59
Comment thread .github/workflows/workflow-test.yml Fixed
@PupilTong PupilTong marked this pull request as ready for review May 21, 2026 12:45
@PupilTong PupilTong changed the title test: use playground render preview for ui-judge tests test(a2ui): use playground render preview for ui-judge tests May 21, 2026
@PupilTong PupilTong changed the title test(a2ui): use playground render preview for ui-judge tests test(a2ui): use playground render preview for ui-judge tests May 21, 2026
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (5)
.github/workflows/test.yml (1)

109-130: 💤 Low value

De-duplicate the repeated if: expression on the three steps.

The same multi-clause expression is repeated on checkout, download-artifact, and the comment step. Hoist it into a single env var (or a single job-level guard plus a per-step skip) so it only has to be maintained in one place.

♻️ Example using a job-level env
   ui-judge-comment:
     needs: ui-judge
     if: always()
     runs-on: lynx-ubuntu-24.04-medium
     permissions:
       contents: read
       issues: write
       pull-requests: write
+    env:
+      SHOULD_COMMENT: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.repo.full_name == github.repository && needs.ui-judge.result != 'skipped' && needs.ui-judge.result != 'cancelled' }}
     steps:
       - uses: actions/checkout@93cb6efe18208431cddfb8368fd83d5badbf9bfd # v5
-        if: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.repo.full_name == github.repository && needs.ui-judge.result != 'skipped' && needs.ui-judge.result != 'cancelled' }}
+        if: ${{ env.SHOULD_COMMENT == 'true' }}
         with:
           persist-credentials: false
       - uses: actions/download-artifact@634f93cb2916e3fdff6788551b99b062d0335ce0 # v5
-        if: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.repo.full_name == github.repository && needs.ui-judge.result != 'skipped' && needs.ui-judge.result != 'cancelled' }}
+        if: ${{ env.SHOULD_COMMENT == 'true' }}
         with:
           name: ui-judge-results
       - name: Comment UI Judge result
-        if: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.repo.full_name == github.repository && needs.ui-judge.result != 'skipped' && needs.ui-judge.result != 'cancelled' }}
+        if: ${{ env.SHOULD_COMMENT == 'true' }}
         uses: ./.github/actions/ui-judge-comment
         with:
           result-file: ui-judge-results.json
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/test.yml around lines 109 - 130, The repeated multi-clause
condition used in the three steps of the ui-judge-comment job should be
centralized: create a single job-level guard or job env variable (e.g., set env
like UI_JUDGE_RUN_CONDITION or move the entire if: expression to the job-level
if for ui-judge-comment) and then remove the duplicated per-step if: on the
actions/checkout, actions/download-artifact and the "Comment UI Judge result"
step; update those steps to rely on the job-level guard (or a simple per-step
conditional that checks the new env var) so the expression is maintained in one
place and all three steps reference that single symbol instead of duplicating
the long clause.
.github/actions/ui-judge-comment/comment.mjs (1)

219-231: 💤 Low value

Simplify table row construction.

The .join(' | ').replace(/^/, '| ').replace(/$/, ' |') chain is non-obvious; a template literal expresses the same intent more clearly without the regex round-trip.

♻️ Proposed simplification
-  return [
-    String(index + 1),
-    escapeTableCell(result.dimension),
-    `${result.score} / 5`,
-    page,
-    status,
-  ].join(' | ').replace(/^/, '| ').replace(/$/, ' |');
+  const cells = [
+    String(index + 1),
+    escapeTableCell(result.dimension),
+    `${result.score} / 5`,
+    page,
+    status,
+  ];
+  return `| ${cells.join(' | ')} |`;
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/actions/ui-judge-comment/comment.mjs around lines 219 - 231, The
table row construction in formatTableRow uses an opaque join(' | ').replace(/^/,
'| ').replace(/$/, ' |') pattern; simplify by building the row with a single
template literal that inserts String(index + 1),
escapeTableCell(result.dimension), `${result.score} / 5`, the page variable
(from sanitizeUrlForMarkdown(result.url)), and status separated by " | " and
wrapped with leading and trailing pipes, ensuring the same output but clearer
intent and easier maintenance.
packages/genui/ui-judge/tests/helpers/playground-preview-server.ts (2)

159-180: 💤 Low value

Free-port allocation has an inherent TOCTOU race.

findFreePort closes the listener before pnpm dev binds, so another local process could grab the port in between and the readiness loop would silently time out on a foreign server. For a CI helper this is usually acceptable, but consider passing the listening server's port via SO_REUSEADDR-style handoff or accepting PORT via env to allow a CI-provided port range. Optional hardening only.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/genui/ui-judge/tests/helpers/playground-preview-server.ts` around
lines 159 - 180, findFreePort currently picks and closes a free port, creating a
TOCTOU race between allocation and the child process binding; to harden it,
first allow honoring an externally provided port (process.env.PORT) or an
allowed port-range argument and validate it in findFreePort, and if no env port
is provided change the API to either return a still-bound server/socket (so the
caller can hand it off to the child process or keep it open until the child
confirms binding) or implement a simple handoff (e.g., keep the listener open
and pass the port to the child via env + wait for child to accept, then close),
and update the readiness loop to use the new behavior; touch the findFreePort
implementation and any callers in the readiness loop to consume the bound server
or PORT env instead of relying on a closed-port result.

10-16: 💤 Low value

protocol option type is narrower than the playground accepts.

render.tsx now treats '0.9', 'a2ui', and 'openui' as valid, but PlaygroundDemoPreviewOptions.protocol here is 'a2ui' | 'openui'. Not a bug today (tests only use a2ui), but it makes the helper inconsistent with the runtime contract.

Suggested change
-  protocol?: 'a2ui' | 'openui';
+  protocol?: '0.9' | 'a2ui' | 'openui';
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/genui/ui-judge/tests/helpers/playground-preview-server.ts` around
lines 10 - 16, PlaygroundDemoPreviewOptions.protocol is too narrow compared to
runtime expectations in render.tsx; update the protocol type in the
PlaygroundDemoPreviewOptions interface to include '0.9' (e.g. '0.9' | 'a2ui' |
'openui') or widen it to string so the test helper matches render.tsx's accepted
values and avoids type mismatches when render.tsx treats '0.9' as valid.
packages/genui/ui-judge/tests/judge-page.spec.ts (1)

52-84: 💤 Low value

Render-only checks are gated behind MIDSCENE_MODEL_NAME.

The parameterized renders playground example … tests don't invoke judgePage or any Midscene model, but they live inside this describe block and are therefore skipped whenever MIDSCENE_MODEL_NAME is absent. If the intent is to use them as cheap smoke tests for the playground preview in regular CI, consider splitting them out of the model-gated describe so they run unconditionally; if the intent is to keep them only on model-backed runs, this is fine as-is.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/genui/ui-judge/tests/judge-page.spec.ts` around lines 52 - 84, The
tests under the describe "A2UI playground preview" are being skipped when
hasMidsceneModelConfig() is false because test.skip wraps the entire block, but
the parameterized tests that iterate PLAYGROUND_DEMO_CASES (the `renders
playground example ${demo.demoId} with speed zero` tests) do not require the
Midscene model; move those render-only tests out of the model-gated describe or
create a new describe that is not guarded by test.skip, leaving model-dependent
tests (that call judgePage/Midscene) inside the existing gated block; adjust
references to startPlaygroundPreviewServer(),
previewServer.createDemoPreviewUrl(), waitForPreviewText(), and the
PLAYGROUND_DEMO_CASES loop so the render-only loop runs unconditionally while
preserving the gated tests that actually need hasMidsceneModelConfig().
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.github/actions/ui-judge-comment/comment.mjs:
- Around line 317-324: findExistingComment currently fetches only the first page
of comments and can miss matches on busy PRs; update it to paginate the
/repos/{owner}/{repo}/issues/{prNumber}/comments endpoint by following the Link
header (rel="next") from the client response and iterate pages until you find a
comment whose body includes the marker, returning immediately when found; ensure
the loop handles the client’s response shape (comments array and response
headers) and stops when no next link is present to avoid infinite loops.

In `@packages/genui/ui-judge/README.md`:
- Around line 39-47: The README command and the prereq check message disagree:
update either the README snippet in packages/genui/ui-judge/README.md or the
error text in playground-preview-server.ts so both instruct the same canonical
build command; pick the canonical command used by your build system (e.g., pnpm
--filter `@lynx-js/a2ui-reactlynx` build or pnpm turbo build:lynx --filter
a2ui-playground), then replace the mismatched string in README.md or the
hardcoded message in playground-preview-server.ts (search for the runtime error
text "Run `pnpm --filter `@lynx-js/a2ui-reactlynx` build`..." and the README build
block) so users get one consistent prep command.

---

Nitpick comments:
In @.github/actions/ui-judge-comment/comment.mjs:
- Around line 219-231: The table row construction in formatTableRow uses an
opaque join(' | ').replace(/^/, '| ').replace(/$/, ' |') pattern; simplify by
building the row with a single template literal that inserts String(index + 1),
escapeTableCell(result.dimension), `${result.score} / 5`, the page variable
(from sanitizeUrlForMarkdown(result.url)), and status separated by " | " and
wrapped with leading and trailing pipes, ensuring the same output but clearer
intent and easier maintenance.

In @.github/workflows/test.yml:
- Around line 109-130: The repeated multi-clause condition used in the three
steps of the ui-judge-comment job should be centralized: create a single
job-level guard or job env variable (e.g., set env like UI_JUDGE_RUN_CONDITION
or move the entire if: expression to the job-level if for ui-judge-comment) and
then remove the duplicated per-step if: on the actions/checkout,
actions/download-artifact and the "Comment UI Judge result" step; update those
steps to rely on the job-level guard (or a simple per-step conditional that
checks the new env var) so the expression is maintained in one place and all
three steps reference that single symbol instead of duplicating the long clause.

In `@packages/genui/ui-judge/tests/helpers/playground-preview-server.ts`:
- Around line 159-180: findFreePort currently picks and closes a free port,
creating a TOCTOU race between allocation and the child process binding; to
harden it, first allow honoring an externally provided port (process.env.PORT)
or an allowed port-range argument and validate it in findFreePort, and if no env
port is provided change the API to either return a still-bound server/socket (so
the caller can hand it off to the child process or keep it open until the child
confirms binding) or implement a simple handoff (e.g., keep the listener open
and pass the port to the child via env + wait for child to accept, then close),
and update the readiness loop to use the new behavior; touch the findFreePort
implementation and any callers in the readiness loop to consume the bound server
or PORT env instead of relying on a closed-port result.
- Around line 10-16: PlaygroundDemoPreviewOptions.protocol is too narrow
compared to runtime expectations in render.tsx; update the protocol type in the
PlaygroundDemoPreviewOptions interface to include '0.9' (e.g. '0.9' | 'a2ui' |
'openui') or widen it to string so the test helper matches render.tsx's accepted
values and avoids type mismatches when render.tsx treats '0.9' as valid.

In `@packages/genui/ui-judge/tests/judge-page.spec.ts`:
- Around line 52-84: The tests under the describe "A2UI playground preview" are
being skipped when hasMidsceneModelConfig() is false because test.skip wraps the
entire block, but the parameterized tests that iterate PLAYGROUND_DEMO_CASES
(the `renders playground example ${demo.demoId} with speed zero` tests) do not
require the Midscene model; move those render-only tests out of the model-gated
describe or create a new describe that is not guarded by test.skip, leaving
model-dependent tests (that call judgePage/Midscene) inside the existing gated
block; adjust references to startPlaygroundPreviewServer(),
previewServer.createDemoPreviewUrl(), waitForPreviewText(), and the
PLAYGROUND_DEMO_CASES loop so the render-only loop runs unconditionally while
preserving the gated tests that actually need hasMidsceneModelConfig().
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 01f9d408-aec2-41be-afe6-3e20735c758b

📥 Commits

Reviewing files that changed from the base of the PR and between 8083591 and 1b15be2.

📒 Files selected for processing (18)
  • .github/actions/ui-judge-comment/README.md
  • .github/actions/ui-judge-comment/action.yml
  • .github/actions/ui-judge-comment/comment.mjs
  • .github/scripts/write-ui-judge-failure-result.mjs
  • .github/ui-judge-ci.instructions.md
  • .github/ui-judge.instructions.md
  • .github/workflows/test.yml
  • .github/workflows/workflow-test.yml
  • .gitignore
  • AGENTS.md
  • packages/genui/a2ui-playground/lynx-src/a2ui/App.tsx
  • packages/genui/a2ui-playground/src/render.tsx
  • packages/genui/a2ui-playground/src/utils/renderUrl.ts
  • packages/genui/ui-judge/README.md
  • packages/genui/ui-judge/src/index.ts
  • packages/genui/ui-judge/tests/fixtures/interactive.html
  • packages/genui/ui-judge/tests/helpers/playground-preview-server.ts
  • packages/genui/ui-judge/tests/judge-page.spec.ts
💤 Files with no reviewable changes (1)
  • packages/genui/ui-judge/tests/fixtures/interactive.html

Comment thread .github/actions/ui-judge-comment/comment.mjs
Comment thread packages/genui/ui-judge/README.md
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (2)
.github/workflows/test.yml (1)

115-123: 💤 Low value

Consider DRYing the repeated step-level condition.

The same github.event_name == 'pull_request' && github.event.pull_request.head.repo.full_name == github.repository && needs.ui-judge.result != 'skipped' && needs.ui-judge.result != 'cancelled' is repeated verbatim on the checkout, download-artifact, and comment steps. The reason for keeping it at step-level (rather than at job-level) is presumably so the job still resolves to success for the done aggregator on non-PR/fork events; that's fine. As a minor cleanup, you could compute it once into an env value (e.g. env.SHOULD_COMMENT) at the job level and reference ${{ env.SHOULD_COMMENT == 'true' }} per step to avoid drift if the eligibility rules change. Non-blocking.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/test.yml around lines 115 - 123, The repeated complex
step-level condition used on the checkout, download-artifact, and "Comment UI
Judge result" steps should be computed once and reused; add a job-level env
variable (e.g. env.SHOULD_COMMENT) that evaluates the same expression to 'true'
or 'false' and then replace each step's if: with a simple check referencing that
env (e.g. if: ${{ env.SHOULD_COMMENT == 'true' }}), updating the steps that
include actions/checkout, actions/download-artifact@..., and the "Comment UI
Judge result" step to use the new env variable.
.github/scripts/write-ui-judge-result.mjs (1)

7-10: 💤 Low value

Guard against GITHUB_WORKSPACE being unset.

When neither UI_JUDGE_RESULT_FILE nor GITHUB_WORKSPACE is set (e.g., when running the script locally for the dry-run testing path mentioned in the PR description), join(undefined, 'ui-judge-results.json') throws a TypeError [ERR_INVALID_ARG_TYPE] with a stack trace rather than a helpful message. Adding an explicit check (or falling back to process.cwd()) keeps the failure mode clean for local invocations.

🛡️ Suggested defensive fallback
-const resultFile = process.env.UI_JUDGE_RESULT_FILE
-  || join(process.env.GITHUB_WORKSPACE, 'ui-judge-results.json');
+const workspace = process.env.GITHUB_WORKSPACE || process.cwd();
+const resultFile = process.env.UI_JUDGE_RESULT_FILE
+  || join(workspace, 'ui-judge-results.json');
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/scripts/write-ui-judge-result.mjs around lines 7 - 10, The code
assumes process.env.GITHUB_WORKSPACE is defined when computing resultFile (using
join(process.env.GITHUB_WORKSPACE, 'ui-judge-results.json')), which throws if
it's undefined; change the logic that sets resultFile to first use
process.env.UI_JUDGE_RESULT_FILE, and if absent compute the path using a safe
workspace variable (e.g., const workspace = process.env.GITHUB_WORKSPACE ||
process.cwd()) before calling join; update the resultFile assignment to use that
workspace fallback so local/dry-run invocations don't throw a TypeError.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In @.github/scripts/write-ui-judge-result.mjs:
- Around line 7-10: The code assumes process.env.GITHUB_WORKSPACE is defined
when computing resultFile (using join(process.env.GITHUB_WORKSPACE,
'ui-judge-results.json')), which throws if it's undefined; change the logic that
sets resultFile to first use process.env.UI_JUDGE_RESULT_FILE, and if absent
compute the path using a safe workspace variable (e.g., const workspace =
process.env.GITHUB_WORKSPACE || process.cwd()) before calling join; update the
resultFile assignment to use that workspace fallback so local/dry-run
invocations don't throw a TypeError.

In @.github/workflows/test.yml:
- Around line 115-123: The repeated complex step-level condition used on the
checkout, download-artifact, and "Comment UI Judge result" steps should be
computed once and reused; add a job-level env variable (e.g. env.SHOULD_COMMENT)
that evaluates the same expression to 'true' or 'false' and then replace each
step's if: with a simple check referencing that env (e.g. if: ${{
env.SHOULD_COMMENT == 'true' }}), updating the steps that include
actions/checkout, actions/download-artifact@..., and the "Comment UI Judge
result" step to use the new env variable.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 7692921e-e4e1-40c5-98cd-b823a0d3a3f1

📥 Commits

Reviewing files that changed from the base of the PR and between 1b15be2 and 83162b2.

📒 Files selected for processing (4)
  • .github/scripts/write-ui-judge-result.mjs
  • .github/ui-judge-ci.instructions.md
  • .github/workflows/test.yml
  • .github/workflows/workflow-test.yml
💤 Files with no reviewable changes (1)
  • .github/workflows/workflow-test.yml
✅ Files skipped from review due to trivial changes (1)
  • .github/ui-judge-ci.instructions.md

@PupilTong PupilTong merged commit 2d64575 into main May 22, 2026
55 of 56 checks passed
@PupilTong PupilTong deleted the hw/codex/ui-judge-playground-preview branch May 22, 2026 04:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants