feat: add page-based GenUI UI judge package by PupilTong · Pull Request #2629 · lynx-family/lynx-stack

PupilTong · 2026-05-14T04:28:01Z

Summary

Add @lynx-js/ui-judge under packages/genui/ui-judge with a single public judgePage API.
Let callers provide an already-prepared Playwright page; callers own navigation, viewport, cookies, route mocks, authentication, and page lifecycle.
Use Midscene aiAct/aiNumber to interact with the current page and return a JSON-serializable visual-correctness score from 0 to 5; the returned url is read from page.url().
Add package-local Playwright tests, a static interactive HTML fixture, Rslib build config, docs, workspace references, and Midscene-related pnpm build policy.
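The summary describes a JSON-serializable result that is read back from a caller-owned page. The sketch below shows how a caller might gate on that result. It is rough and assumes a lot: the types and the `meetsThreshold` helper are hypothetical, reconstructed from the field names this PR mentions (`dimension`, `score`, `url`, `steps`, `error.message`) rather than copied from `src/index.ts`, and the example values are placeholders.

```typescript
// Hypothetical sketch of the result shape described in this PR. The real
// exported types in @lynx-js/ui-judge may differ.

type UiJudgeScore = 0 | 1 | 2 | 3 | 4 | 5;

interface UiJudgeError {
  message: string;
}

interface UiJudgeResult {
  dimension: 'visual-correctness';
  score: UiJudgeScore;
  url: string; // read from page.url() on the caller-owned page
  steps: string[];
  error?: UiJudgeError;
}

// Hypothetical caller-side gate, e.g. for failing a CI job on low scores.
function meetsThreshold(result: UiJudgeResult, minScore: UiJudgeScore): boolean {
  // Internal failures come back as score 0 plus error.message, so any
  // reported error counts as a failure even when minScore is 0.
  return result.error === undefined && result.score >= minScore;
}

// Placeholder values only; a plain object like this round-trips through JSON.
const example: UiJudgeResult = {
  dimension: 'visual-correctness',
  score: 4,
  url: 'http://127.0.0.1:8080/interactive',
  steps: ['Expand the order details'],
};

console.log(meetsThreshold(example, 3)); // true
```

Treating any `error` as a failure matters because the error path reports `score: 0`. A threshold of 0 would otherwise let failed runs pass.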

Self-review

Verified the public surface remains limited to judgePage and exported TypeScript types.
Confirmed the package no longer launches a browser, creates a page, accepts url, or calls page.goto() internally.
Confirmed the scoring prompt requests a single numeric 0-5 value and does not reintroduce GRADE: or letter grades.
Confirmed runtime errors return JSON with score: 0 and error.message instead of escaping as unhandled failures.
Confirmed generated artifacts are not present in the worktree after local verification.
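The self-review states that runtime errors return JSON with `score: 0` and `error.message` instead of escaping. The sketch below is a minimal illustration of that contract. It assumes a hypothetical `run` callback standing in for the Midscene evaluation and a `readUrl` callback standing in for `page.url()`. Neither name comes from the package.

```typescript
// Hypothetical sketch of the "never throw, return score 0" contract.
// `run` and `readUrl` are illustrative parameters, not package APIs.

interface JudgeOutcome {
  score: number;
  url: string;
  steps: string[];
  error?: { message: string };
}

function toErrorMessage(error: unknown): string {
  return error instanceof Error ? error.message : String(error);
}

// Best-effort URL lookup: a closed or crashed page must not hide the
// original error, so fall back to an empty string.
function safeReadUrl(readUrl: () => string): string {
  try {
    return readUrl();
  } catch {
    return '';
  }
}

async function judgeSafely(
  steps: string[],
  readUrl: () => string,
  run: () => Promise<number>,
): Promise<JudgeOutcome> {
  try {
    const score = await run();
    return { score, url: safeReadUrl(readUrl), steps };
  } catch (error) {
    return {
      score: 0,
      url: safeReadUrl(readUrl),
      steps,
      error: { message: toErrorMessage(error) },
    };
  }
}

// Usage: a failing evaluation resolves instead of rejecting.
judgeSafely(['Expand the details'], () => 'about:blank', async () => {
  throw new Error('model unavailable');
}).then((outcome) => console.log(outcome.score, outcome.error?.message));
// logs: 0 model unavailable
```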

Validation

pnpm run build
pnpm -F @lynx-js/ui-judge build
pnpm eslint packages/genui/ui-judge --flag v10_config_lookup_from_file
pnpm -F @lynx-js/ui-judge test in the current session: 1 passed, 1 skipped because MIDSCENE_MODEL_NAME is not set in this Codex shell.
Previous real Midscene run before the page-based API adjustment: 2 passed (17.8s).
pnpm dprint check packages/genui/ui-judge .github/ui-judge.instructions.md returned exit code 0 with a sandbox cache-write warning.
git diff --check
Commit hooks passed eslint, biome, dprint, and sort-package-json.
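In the validation run above, one test was skipped because `MIDSCENE_MODEL_NAME` was unset. The sketch below shows one possible gate of that kind, built around a hypothetical `midsceneSkipReason` helper. The actual spec may check the variable differently.

```typescript
// Hypothetical helper: decide whether Midscene-backed tests can run.
// Bracket access keeps it valid under noPropertyAccessFromIndexSignature.

type Env = Record<string, string | undefined>;

function midsceneSkipReason(env: Env): string | undefined {
  const model = env['MIDSCENE_MODEL_NAME']?.trim();
  return model ? undefined : 'MIDSCENE_MODEL_NAME is not set';
}

// In a Playwright spec this could drive a conditional skip, for example:
//   const reason = midsceneSkipReason(process.env);
//   test.skip(reason !== undefined, reason);

console.log(midsceneSkipReason({})); // MIDSCENE_MODEL_NAME is not set
console.log(midsceneSkipReason({ MIDSCENE_MODEL_NAME: 'some-model' })); // undefined
```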

Summary by CodeRabbit

Release Notes

New Features
- Introduced @lynx-js/ui-judge package for automated UI evaluation
- Added judgePage API for assessing visual correctness on a 0-5 scale
- Supports Playwright-based test integration
Documentation
- Added comprehensive README with usage examples and API guidance
Chores
- Configured package build, test runner, and workspace settings

changeset-bot · 2026-05-14T04:28:08Z

⚠️ No Changeset found

Latest commit: a010ecb

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

coderabbitai · 2026-05-14T04:28:11Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: d51370ea-4eaa-4cac-9943-7ab42ce2b7ee

📥 Commits

Reviewing files that changed from the base of the PR and between 2f23edb and a010ecb.

📒 Files selected for processing (1)

packages/genui/ui-judge/rslib.config.ts

🚧 Files skipped from review as they are similar to previous changes (1)

packages/genui/ui-judge/rslib.config.ts

📝 Walkthrough

This PR introduces the @lynx-js/ui-judge package, a GenUI utility that scores Playwright pages using Midscene AI for visual correctness. The package exports a single public judgePage function that executes Midscene steps on a page and returns a numeric score from 0–5 along with metadata. It includes build configuration, TypeScript setup, tests with a local fixture server, and extension guidelines.
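The walkthrough says `judgePage` executes Midscene steps on the page. The Changes table adds that each step runs with an abortable per-step timeout. The sketch below shows one way to write that loop. It assumes a hypothetical `act` callback in place of Midscene's `aiAct`, whose real signature is not shown in this PR, and an arbitrary timeout value.

```typescript
// Hypothetical sketch of sequential steps with an abortable per-step timeout.
// `act` stands in for a Midscene call such as aiAct; its signature is assumed.

type StepRunner = (step: string, signal: AbortSignal) => Promise<void>;

function withAbortableTimeout<T>(
  task: (signal: AbortSignal) => Promise<T>,
  timeoutMs: number,
): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const controller = new AbortController();
    const timer = setTimeout(() => {
      const error = new Error(`Step timed out after ${timeoutMs}ms`);
      // Reject the outer promise, then signal the still-running task to stop.
      reject(error);
      controller.abort(error);
    }, timeoutMs);
    task(controller.signal).then(
      (value) => {
        clearTimeout(timer);
        resolve(value);
      },
      (error: unknown) => {
        clearTimeout(timer);
        reject(error);
      },
    );
  });
}

async function runSteps(
  steps: readonly string[],
  act: StepRunner,
  timeoutMs: number,
): Promise<void> {
  for (const step of steps) {
    // One step at a time; a timed-out step rejects and stops the loop.
    await withAbortableTimeout((signal) => act(step, signal), timeoutMs);
  }
}
```

Passing the `AbortSignal` into each step gives a cooperative step a way to stop its work after the timeout fires. A plain `Promise.race` would only stop waiting for it.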

Changes

@lynx-js/ui-judge Package

Layer	File(s)	Summary
Package Configuration & Workspace Integration	`packages/genui/ui-judge/package.json`, `packages/genui/ui-judge/tsconfig.json`, `packages/genui/ui-judge/rslib.config.ts`, `packages/genui/ui-judge/turbo.json`, `packages/genui/tsconfig.json`, `packages/genui/ui-judge/playwright.config.ts`	New package manifest with ESM type and Midscene/Playwright dependencies; TypeScript composite config with an ES2022 target; rslib build configuration that bundles declarations; TurboRepo task with inputs/outputs; parent workspace project reference; single-worker Playwright test config with CI-conditional retries and trace retention.
Public API Contract & Documentation	`packages/genui/ui-judge/src/index.ts` (types), `packages/genui/ui-judge/README.md`, `.github/ui-judge.instructions.md`	Exports `UiJudgeScore` (0–5), `JudgePageOptions`, `UiJudgeError`, `UiJudgeResult`, and the `judgePage` function. The README documents the caller-owned Playwright lifecycle, the JSON result shape with the `visual-correctness` dimension, and Midscene environment configuration. The extension guidelines restrict the public API to `judgePage`, require integer scoring via `aiNumber()`, define screenshot and build-policy defaults, and specify how tests behave depending on Midscene model configuration.
Core judgePage Implementation	`packages/genui/ui-judge/src/index.ts` (main functions)	`judgePage` normalizes options and delegates to `judgePageUnsafe`, which waits for network idle (best-effort), creates a PlaywrightAgent, runs each step with an abortable per-step timeout, requests a numeric score via Midscene, normalizes the score to the 0–5 range, and cleans up the agent. Errors are caught at the top level: the function returns a zero score with the error message, the normalized steps, and a best-effort page URL.
Supporting Utilities & Helpers	`packages/genui/ui-judge/src/index.ts` (helper functions)	Option normalization validates the task and filters steps; the prompt builder constructs a strict Midscene request for an integer 0–5 answer with grading criteria; the network-idle helper waits up to a timeout; the score normalizer checks finiteness and clamps to the range; the abortable timeout races a promise against an AbortController abort; the plain timeout races a promise against a timeout rejection; error conversion renders thrown values as strings; safe page-URL retrieval reads `page.url()` with an empty-string fallback.
Test Configuration & Fixtures	`packages/genui/ui-judge/tests/fixtures/interactive.html`	Interactive HTML fixture with a centered "Order confirmed" card and a collapsible details section toggled by a button click. The details reveal shipping, status, and viewport dimensions (populated by JavaScript) for testing visual-correctness assertions.
Test Suite with Midscene Integration	`packages/genui/ui-judge/tests/judge-page.spec.ts`	The test harness starts a local HTTP server serving the fixture at `/` and `/interactive` in `beforeAll` and closes it in `afterAll`. The first test is skipped unless `MIDSCENE_MODEL_NAME` is set; it navigates the page, calls `judgePage` with a viewport task and step, and asserts the dimension, URL, steps, score bounds, and absence of an error. The second test passes an empty task and asserts score 0, empty steps, and an error message.
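The helper row above mentions two pieces: a strict prompt asking for an integer from 0 to 5, and a score normalizer that checks finiteness and clamps. The sketch below shows both. The names and prompt text are hypothetical. The rounding step is an assumption based on the integer-only requirement and is not confirmed from `src/index.ts`.

```typescript
// Hypothetical sketch of the scoring helpers. Prompt wording, rounding rule,
// and function names are illustrative.

const MIN_SCORE = 0;
const MAX_SCORE = 5;

function buildScoringPrompt(task: string): string {
  return [
    `Evaluate the visual correctness of the current page for this task: ${task}`,
    `Answer with a single integer from ${MIN_SCORE} to ${MAX_SCORE}.`,
    `${MAX_SCORE} means fully correct; ${MIN_SCORE} means the task is not met at all.`,
    'Do not answer with a letter grade or any extra text.',
  ].join('\n');
}

// Reject non-numeric answers, round to the nearest integer (an assumption),
// then clamp into the 0..5 range.
function normalizeScore(raw: unknown): number {
  if (typeof raw !== 'number' || !Number.isFinite(raw)) {
    throw new Error(`Expected a finite numeric score, got ${String(raw)}`);
  }
  return Math.min(MAX_SCORE, Math.max(MIN_SCORE, Math.round(raw)));
}

console.log(normalizeScore(4.6)); // 5
console.log(normalizeScore(-1)); // 0
console.log(normalizeScore(3)); // 3
```

Throwing on non-finite input, rather than silently mapping it to 0, lets the top-level error handler attach an `error.message` explaining why the score is 0.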

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested reviewers

Sherry-hue
colinaaa

A judge package hops into view,
Scoring pages with Midscene's might true—
From Playwright's own stage it will spring,
Each visual-correctness test will ring,
And scores from 0 to 5 will ring.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly and concisely summarizes the main change: adding a new page-based UI judge package to GenUI, which aligns with the comprehensive changeset introducing the `@lynx-js/ui-judge` package with the judgePage API.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

codecov · 2026-05-14T14:38:55Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ All tests successful. No failed tests found.

codspeed-hq · 2026-05-14T14:56:46Z

Merging this PR will improve performance by 17.81%

⚡ 1 improved benchmark
✅ 80 untouched benchmarks
⏩ 26 skipped benchmarks¹

Performance Changes

	Benchmark	`BASE`	`HEAD`	Efficiency
⚡	`basic-performance-large-css`	19 ms	16.1 ms	+17.81%

Comparing PupilTong:codex/genuiuijudge-0 (a010ecb) with main (8aebe79)

¹ 26 benchmarks were skipped, so their baseline results were used instead. If these benchmarks were deleted from the codebase, archive them to remove them from the performance reports.

relativeci · 2026-05-14T15:00:51Z

Web Explorer

#10038 Bundle Size — 903.49KiB (~-0.01%).

a010ecb(current) vs 531ef76 main#10033(baseline)

Bundle metrics

2 changes

	Current #10038	Baseline #10033
Initial JS	`45.06KiB`	`45.06KiB`
Initial CSS	`2.22KiB`	`2.22KiB`
Cache Invalidation	`8.34%`	`8.33%`
Chunks	`9`	`9`
Assets	`11`	`11`
Modules	`231`(`+0.43%`)	`230`
Duplicate Modules	`11`	`11`
Duplicate Code	`27.12%`	`27.12%`
Packages	`10`	`10`
Duplicate Packages	`0`	`0`

Bundle size by type

1 change

1 improvement

	Current #10038	Baseline #10033
JS	`499.11KiB` (`~-0.01%`)	`499.15KiB`
Other	`402.16KiB`	`402.16KiB`
CSS	`2.22KiB`	`2.22KiB`

Bundle analysis report · Branch PupilTong:codex/genuiuijudge-0 · Project dashboard

Generated by RelativeCI

relativeci · 2026-05-14T15:00:54Z

React Example with Element Template

#733 Bundle Size — 200.08KiB (0%).

a010ecb(current) vs 531ef76 main#728(baseline)

Bundle metrics

2 changes

	Current #733	Baseline #728
Initial JS	`0B`	`0B`
Initial CSS	`0B`	`0B`
Cache Invalidation	`0%`	`0%`
Chunks	`0`	`0`
Assets	`4`	`4`
Modules	`91`(`-1.09%`)	`92`
Duplicate Modules	`27`	`27`
Duplicate Code	`39.78%`(`+0.05%`)	`39.76%`
Packages	`2`	`2`
Duplicate Packages	`0`	`0`

Bundle size by type: no changes

	Current #733	Baseline #728
IMG	`145.76KiB`	`145.76KiB`
Other	`54.32KiB`	`54.32KiB`

relativeci · 2026-05-14T15:00:56Z

React External

#1579 Bundle Size — 695.64KiB (0%).

a010ecb(current) vs 531ef76 main#1574(baseline)

Bundle metrics: no changes

	Current #1579	Baseline #1574
Initial JS	`0B`	`0B`
Initial CSS	`0B`	`0B`
Cache Invalidation	`0%`	`0%`
Chunks	`0`	`0`
Assets	`3`	`3`
Modules	`17`	`17`
Duplicate Modules	`5`	`5`
Duplicate Code	`8.59%`	`8.59%`
Packages	`0`	`0`
Duplicate Packages	`0`	`0`

Bundle size by type: no changes

	Current #1579	Baseline #1574
Other	`695.64KiB`	`695.64KiB`

relativeci · 2026-05-14T15:00:56Z

React Example

#8464 Bundle Size — 237.24KiB (0%).

a010ecb(current) vs 531ef76 main#8459(baseline)

Bundle metrics: no changes

	Current #8464	Baseline #8459
Initial JS	`0B`	`0B`
Initial CSS	`0B`	`0B`
Cache Invalidation	`0%`	`0%`
Chunks	`0`	`0`
Assets	`4`	`4`
Modules	`198`	`198`
Duplicate Modules	`80`	`80`
Duplicate Code	`44.74%`	`44.74%`
Packages	`2`	`2`
Duplicate Packages	`0`	`0`

Bundle size by type: no changes

	Current #8464	Baseline #8459
IMG	`145.76KiB`	`145.76KiB`
Other	`91.48KiB`	`91.48KiB`

relativeci · 2026-05-14T15:01:04Z

React MTF Example

#1597 Bundle Size — 208.18KiB (0%).

a010ecb(current) vs 531ef76 main#1592(baseline)

Bundle metrics: no changes

	Current #1597	Baseline #1592
Initial JS	`0B`	`0B`
Initial CSS	`0B`	`0B`
Cache Invalidation	`0%`	`0%`
Chunks	`0`	`0`
Assets	`3`	`3`
Modules	`193`	`193`
Duplicate Modules	`77`	`77`
Duplicate Code	`44.24%`	`44.24%`
Packages	`2`	`2`
Duplicate Packages	`0`	`0`

Bundle size by type: no changes

	Current #1597	Baseline #1592
IMG	`111.23KiB`	`111.23KiB`
Other	`96.95KiB`	`96.95KiB`

coderabbitai

🧹 Nitpick comments (2)

packages/genui/ui-judge/tests/fixtures/interactive.html (1)

108-110: ⚡ Quick win

Fix inconsistent indentation in the script block.

Line 108 is indented 4 spaces while lines 109-110 are indented 6 spaces. All variable declarations should use consistent indentation.

✨ Proposed fix for consistent indentation

     <script>
-    const details = document.getElementById('details');
+      const details = document.getElementById('details');
       const viewport = document.getElementById('viewport');
       const reveal = document.getElementById('reveal');

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/genui/ui-judge/tests/fixtures/interactive.html` around lines 108 -
110, The const declarations for the DOM elements (const details, const viewport,
const reveal) have inconsistent indentation; make all three declarations use the
same indentation level (e.g., align the leading whitespace so each line starts
with the same number of spaces or tabs) so the script block is consistently
formatted; update the lines that define document.getElementById('details'),
document.getElementById('viewport'), and document.getElementById('reveal') to
match the chosen indentation style.

packages/genui/ui-judge/package.json (1)

25-30: ⚡ Quick win

Move @playwright/test to devDependencies.

The runtime source (src/index.ts) imports Page only as a type; otherwise @playwright/test is needed only for test execution and build configuration. Moving it from dependencies to devDependencies keeps test tooling out of runtime installs.

♻️ Proposed manifest adjustment

   "dependencies": {
-    "@midscene/web": "^1.8.0",
-    "@playwright/test": "^1.58.2"
+    "@midscene/web": "^1.8.0"
   },
   "devDependencies": {
-    "@types/node": "^24.10.13"
+    "@playwright/test": "^1.58.2",
+    "@types/node": "^24.10.13"
   },

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/genui/ui-judge/package.json` around lines 25 - 30, Update the
package manifest so `@playwright/test` is listed under devDependencies instead of
dependencies: remove "@playwright/test" from the "dependencies" block and add it
to "devDependencies" (keeping the same version "^1.58.2"); this ensures runtime
imports like the type-only Page in src/index.ts do not pull test tooling into
production installs and keeps test-only packages with other dev tooling.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 0ea55898-b698-48bb-9d04-c1ebb9f31cee

📥 Commits

Reviewing files that changed from the base of the PR and between 363f9e7 and 71a8d17.

⛔ Files ignored due to path filters (1)

pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml

📒 Files selected for processing (12)

.github/ui-judge.instructions.md
packages/genui/tsconfig.json
packages/genui/ui-judge/README.md
packages/genui/ui-judge/package.json
packages/genui/ui-judge/playwright.config.ts
packages/genui/ui-judge/rslib.config.ts
packages/genui/ui-judge/src/index.ts
packages/genui/ui-judge/tests/fixtures/interactive.html
packages/genui/ui-judge/tests/judge-page.spec.ts
packages/genui/ui-judge/tsconfig.build.json
packages/genui/ui-judge/tsconfig.json
packages/genui/ui-judge/turbo.json

PupilTong changed the title ~~[codex] add GenUI UI judge package~~ [codex] add page-based GenUI UI judge package May 14, 2026

PupilTong changed the title ~~[codex] add page-based GenUI UI judge package~~ feat: add page-based GenUI UI judge package May 14, 2026

PupilTong self-assigned this May 14, 2026

PupilTong force-pushed the codex/genuiuijudge-0 branch from a01f968 to 32f64bd May 15, 2026 10:38

PupilTong added 4 commits May 15, 2026 18:58

feat(genui): add ui judge package

0c5ff70

refactor(genui): judge caller-provided pages

bf26a95

fix(genui): satisfy strict env access

9cc2c1a

ci(genui): run and report ui judge

e36f807

PupilTong force-pushed the codex/genuiuijudge-0 branch from 32f64bd to e36f807 May 15, 2026 10:59

github-advanced-security AI found potential problems May 15, 2026

Comment thread .github/workflows/ui-judge-pr-comment.yml Fixed

Revert "ci(genui): run and report ui judge"

71a8d17

This reverts commit e36f807.

PupilTong marked this pull request as ready for review May 18, 2026 13:42

coderabbitai Bot reviewed May 18, 2026

PupilTong added 2 commits May 18, 2026 22:00

chore: dedupe lockfile

0b63447

ci: reduce web core e2e file watchers

2f23edb

PupilTong requested a review from Sherry-hue as a code owner May 18, 2026 14:33

chore: trim ui judge cleanup

a010ecb

HuJean approved these changes May 19, 2026

PupilTong merged commit fba0849 into lynx-family:main May 19, 2026
113 of 117 checks passed

This was referenced May 21, 2026

test(a2ui): use playground render preview for ui-judge tests #2673

Merged

test(ui-judge): score more playground examples #2689

Merged

coderabbitai Bot mentioned this pull request May 22, 2026

test(ui-judge): add GEQI dimension scoring #2693

Open
