feat: add page-based GenUI UI judge package #2629

Merged
PupilTong merged 8 commits into lynx-family:main from PupilTong:codex/genuiuijudge-0
May 19, 2026

Conversation

@PupilTong
Collaborator

@PupilTong PupilTong commented May 14, 2026

Summary

  • Add @lynx-js/ui-judge under packages/genui/ui-judge with a single public judgePage API.
  • Let callers provide an already-prepared Playwright page; callers own navigation, viewport, cookies, route mocks, authentication, and page lifecycle.
  • Use Midscene aiAct/aiNumber to interact with the current page and return a JSON-serializable visual-correctness score from 0 to 5; the returned url is read from page.url().
  • Add package-local Playwright tests, a static interactive HTML fixture, Rslib build config, docs, workspace references, and Midscene-related pnpm build policy.

Self-review

  • Verified the public surface remains limited to judgePage and exported TypeScript types.
  • Confirmed the package no longer launches a browser, creates a page, accepts url, or calls page.goto() internally.
  • Confirmed the scoring prompt requests a single numeric 0-5 value and does not reintroduce GRADE: or letter grades.
  • Confirmed runtime errors return JSON with score: 0 and error.message instead of escaping as unhandled failures.
  • Confirmed generated artifacts are not present in the worktree after local verification.
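
The error-handling point above can be illustrated with a minimal sketch. It is not the package's actual code: `SketchResult`, `PageLike`, `toMessage`, `safeUrl`, and `judgeSafely` are hypothetical names. The only details taken from this PR are that failures produce `score: 0` with an `error.message`, and that the URL is read from `page.url()`.

```typescript
// Hypothetical result shape. Field names follow the PR description
// (score, url, error.message); the exact types are assumptions.
interface SketchResult {
  score: number;
  url: string;
  error?: { message: string };
}

// Minimal structural stand-in for the one Playwright Page method used here.
interface PageLike {
  url(): string;
}

// Render any thrown value as a string so the result stays JSON-serializable.
function toMessage(thrown: unknown): string {
  return thrown instanceof Error ? thrown.message : String(thrown);
}

// Read the page URL, falling back to an empty string if the read throws.
function safeUrl(page: PageLike): string {
  try {
    return page.url();
  } catch {
    return '';
  }
}

// Run a scoring callback and convert any failure into a score-0 result
// instead of letting the rejection escape to the caller.
async function judgeSafely(
  page: PageLike,
  run: (page: PageLike) => Promise<number>,
): Promise<SketchResult> {
  try {
    const score = await run(page);
    return { score, url: safeUrl(page) };
  } catch (thrown) {
    return { score: 0, url: safeUrl(page), error: { message: toMessage(thrown) } };
  }
}
```

Because the wrapper in this sketch never rejects, a caller could pass its result straight to `JSON.stringify` without a surrounding try/catch.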

Validation

  • pnpm run build
  • pnpm -F @lynx-js/ui-judge build
  • pnpm eslint packages/genui/ui-judge --flag v10_config_lookup_from_file
  • pnpm -F @lynx-js/ui-judge test in the current session: 1 passed, 1 skipped because MIDSCENE_MODEL_NAME is not set in this Codex shell.
  • Previous real Midscene run before the page-based API adjustment: 2 passed (17.8s).
  • pnpm dprint check packages/genui/ui-judge .github/ui-judge.instructions.md returned exit code 0 with a sandbox cache-write warning.
  • git diff --check
  • Commit hooks passed eslint, biome, dprint, and sort-package-json.

Summary by CodeRabbit

Release Notes

  • New Features

    • Introduced @lynx-js/ui-judge package for automated UI evaluation
    • Added judgePage API for assessing visual correctness on a 0-5 scale
    • Supports Playwright-based test integration
  • Documentation

    • Added comprehensive README with usage examples and API guidance
  • Chores

    • Configured package build, test runner, and workspace settings

@changeset-bot

changeset-bot Bot commented May 14, 2026

⚠️ No Changeset found

Latest commit: a010ecb

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types


@coderabbitai
Contributor

coderabbitai Bot commented May 14, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: d51370ea-4eaa-4cac-9943-7ab42ce2b7ee

📥 Commits

Reviewing files that changed from the base of the PR and between 2f23edb and a010ecb.

📒 Files selected for processing (1)
  • packages/genui/ui-judge/rslib.config.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • packages/genui/ui-judge/rslib.config.ts

📝 Walkthrough

Walkthrough

This PR introduces the @lynx-js/ui-judge package, a GenUI utility that scores Playwright pages using Midscene AI for visual correctness. The package exports a single public judgePage function that executes Midscene steps on a page and returns a numeric score from 0–5 along with metadata. It includes build configuration, TypeScript setup, tests with a local fixture server, and extension guidelines.

Changes

@lynx-js/ui-judge Package

| Layer | File(s) | Summary |
|---|---|---|
| Package Configuration & Workspace Integration | packages/genui/ui-judge/package.json, packages/genui/ui-judge/tsconfig.json, packages/genui/ui-judge/rslib.config.ts, packages/genui/ui-judge/turbo.json, packages/genui/tsconfig.json, packages/genui/ui-judge/playwright.config.ts | New package manifest with ESM type and Midscene/Playwright dependencies; TypeScript composite config with ES2022 target; rslib build configuration bundling declarations; TurboRepo task with inputs/outputs; parent workspace project reference; Playwright single-worker test config with CI-conditional retries and trace retention. |
| Public API Contract & Documentation | packages/genui/ui-judge/src/index.ts (types), packages/genui/ui-judge/README.md, .github/ui-judge.instructions.md | Exports UiJudgeScore (0–5), JudgePageOptions, UiJudgeError, UiJudgeResult, and judgePage function. README documents caller-owned Playwright lifecycle, JSON result shape with visual-correctness dimension, and Midscene env configuration. Extension guidelines restrict public API to judgePage, require integer scoring via aiNumber(), define screenshot and build-policy defaults, and specify test behavior for Midscene model configuration. |
| Core judgePage Implementation | packages/genui/ui-judge/src/index.ts (main functions) | judgePage normalizes options and delegates to judgePageUnsafe, which waits for network-idle best-effort, creates a PlaywrightAgent, executes each step with abortable per-step timeout, requests numeric score via Midscene, normalizes score to 0–5 range, and cleans up agent. Errors are caught top-level: function returns zero score with error message, normalized steps, and best-effort page URL. |
| Supporting Utilities & Helpers | packages/genui/ui-judge/src/index.ts (helper functions) | Option normalization validates task and filters steps; prompt builder constructs strict Midscene request for integer 0–5 output with grading criteria; network-idle helper waits up to timeout; score normalizer validates finiteness and clamps to range; abortable timeout races promise against AbortController abort; plain timeout races promise against timeout rejection; error conversion renders thrown values to strings; safe page URL retrieval reads page.url() with empty-string fallback. |
| Test Configuration & Fixtures | packages/genui/ui-judge/tests/fixtures/interactive.html | Interactive HTML fixture with centered "Order confirmed" card and collapsible details section toggled by button click. Details reveal shipping, status, and viewport dimensions populated by JavaScript for testing visual-correctness assertions. |
| Test Suite with Midscene Integration | packages/genui/ui-judge/tests/judge-page.spec.ts | Test harness starts local HTTP server serving fixture at / and /interactive in beforeAll, closes in afterAll. First test conditionally skips unless MIDSCENE_MODEL_NAME is set, navigates page, calls judgePage with viewport task and step, asserts dimension/url/steps/score bounds and absence of error. Second test validates empty task input, asserts score 0, empty steps, and error message presence. |
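
Two of the helpers summarized above, the score normalizer and the abortable per-step timeout, can be sketched as follows. This is an illustration under stated assumptions, not the code in src/index.ts. The names `normalizeScore` and `withAbortableTimeout` are hypothetical, and both rounding to the nearest integer and throwing on non-finite input are assumed behaviors that the walkthrough does not specify.

```typescript
// Assumed behavior: reject non-finite values, round to an integer,
// then clamp into the 0..5 range used for visual-correctness scores.
function normalizeScore(raw: unknown): number {
  if (typeof raw !== 'number' || !Number.isFinite(raw)) {
    throw new Error(`Score is not a finite number: ${String(raw)}`);
  }
  return Math.min(5, Math.max(0, Math.round(raw)));
}

// Race a unit of work against a timer. When the timer fires first, the
// AbortController is aborted so cooperative work can stop early, and the
// returned promise rejects with a timeout error.
async function withAbortableTimeout<T>(
  run: (signal: AbortSignal) => Promise<T>,
  timeoutMs: number,
): Promise<T> {
  const controller = new AbortController();
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timedOut = new Promise<never>((_, reject) => {
    timer = setTimeout(() => {
      controller.abort();
      reject(new Error(`Step timed out after ${timeoutMs} ms`));
    }, timeoutMs);
  });
  try {
    return await Promise.race([run(controller.signal), timedOut]);
  } finally {
    // Always clear the timer so a finished step leaves no pending handle.
    clearTimeout(timer);
  }
}
```

Passing the signal into `run` is a design choice of this sketch: the race alone only stops waiting, while the signal gives the underlying work a chance to cancel itself.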

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested reviewers

  • Sherry-hue
  • colinaaa

A judge package hops into view,
Scoring pages with Midscene's might true—
From Playwright's own stage it will spring,
Each visual-correctness test will ring,
And scores from 0 to 5 will ring.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and concisely summarizes the main change: adding a new page-based UI judge package to GenUI, which aligns with the comprehensive changeset introducing the @lynx-js/ui-judge package with the judgePage API. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


@PupilTong PupilTong changed the title from "[codex] add GenUI UI judge package" to "[codex] add page-based GenUI UI judge package" on May 14, 2026
@PupilTong PupilTong changed the title from "[codex] add page-based GenUI UI judge package" to "feat: add page-based GenUI UI judge package" on May 14, 2026
@PupilTong PupilTong self-assigned this on May 14, 2026
@codecov

codecov Bot commented May 14, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ All tests successful. No failed tests found.

@codspeed-hq

codspeed-hq Bot commented May 14, 2026

Merging this PR will improve performance by 17.81%

⚡ 1 improved benchmark
✅ 80 untouched benchmarks
⏩ 26 skipped benchmarks [1]

Performance Changes

| Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|
| basic-performance-large-css | 19 ms | 16.1 ms | +17.81% |


Comparing PupilTong:codex/genuiuijudge-0 (a010ecb) with main (8aebe79)

Footnotes

  1. 26 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, archive them in CodSpeed to remove them from the performance reports.

@relativeci

relativeci Bot commented May 14, 2026

Web Explorer

#10038 Bundle Size — 903.49KiB (~-0.01%).

a010ecb (current) vs 531ef76 main #10033 (baseline)

Bundle metrics (2 changes)

| Status | Metric | Current (#10038) | Baseline (#10033) |
|---|---|---|---|
| No change | Initial JS | 45.06KiB | 45.06KiB |
| No change | Initial CSS | 2.22KiB | 2.22KiB |
| Change | Cache Invalidation | 8.34% | 8.33% |
| No change | Chunks | 9 | 9 |
| No change | Assets | 11 | 11 |
| Change | Modules | 231 (+0.43%) | 230 |
| No change | Duplicate Modules | 11 | 11 |
| No change | Duplicate Code | 27.12% | 27.12% |
| No change | Packages | 10 | 10 |
| No change | Duplicate Packages | 0 | 0 |

Bundle size by type (1 change, 1 improvement)

| Status | Type | Current (#10038) | Baseline (#10033) |
|---|---|---|---|
| Improvement | JS | 499.11KiB (~-0.01%) | 499.15KiB |
| No change | Other | 402.16KiB | 402.16KiB |
| No change | CSS | 2.22KiB | 2.22KiB |


@relativeci

relativeci Bot commented May 14, 2026

React Example with Element Template

#733 Bundle Size — 200.08KiB (0%).

a010ecb (current) vs 531ef76 main #728 (baseline)

Bundle metrics (2 changes)

| Status | Metric | Current (#733) | Baseline (#728) |
|---|---|---|---|
| No change | Initial JS | 0B | 0B |
| No change | Initial CSS | 0B | 0B |
| No change | Cache Invalidation | 0% | 0% |
| No change | Chunks | 0 | 0 |
| No change | Assets | 4 | 4 |
| Change | Modules | 91 (-1.09%) | 92 |
| No change | Duplicate Modules | 27 | 27 |
| Change | Duplicate Code | 39.78% (+0.05%) | 39.76% |
| No change | Packages | 2 | 2 |
| No change | Duplicate Packages | 0 | 0 |

Bundle size by type (no changes)

| Status | Type | Current (#733) | Baseline (#728) |
|---|---|---|---|
| No change | IMG | 145.76KiB | 145.76KiB |
| No change | Other | 54.32KiB | 54.32KiB |


@relativeci

relativeci Bot commented May 14, 2026

React External

#1579 Bundle Size — 695.64KiB (0%).

a010ecb (current) vs 531ef76 main #1574 (baseline)

Bundle metrics (no changes)

| Status | Metric | Current (#1579) | Baseline (#1574) |
|---|---|---|---|
| No change | Initial JS | 0B | 0B |
| No change | Initial CSS | 0B | 0B |
| No change | Cache Invalidation | 0% | 0% |
| No change | Chunks | 0 | 0 |
| No change | Assets | 3 | 3 |
| No change | Modules | 17 | 17 |
| No change | Duplicate Modules | 5 | 5 |
| No change | Duplicate Code | 8.59% | 8.59% |
| No change | Packages | 0 | 0 |
| No change | Duplicate Packages | 0 | 0 |

Bundle size by type (no changes)

| Status | Type | Current (#1579) | Baseline (#1574) |
|---|---|---|---|
| No change | Other | 695.64KiB | 695.64KiB |


@relativeci

relativeci Bot commented May 14, 2026

React Example

#8464 Bundle Size — 237.24KiB (0%).

a010ecb (current) vs 531ef76 main #8459 (baseline)

Bundle metrics (no changes)

| Status | Metric | Current (#8464) | Baseline (#8459) |
|---|---|---|---|
| No change | Initial JS | 0B | 0B |
| No change | Initial CSS | 0B | 0B |
| No change | Cache Invalidation | 0% | 0% |
| No change | Chunks | 0 | 0 |
| No change | Assets | 4 | 4 |
| No change | Modules | 198 | 198 |
| No change | Duplicate Modules | 80 | 80 |
| No change | Duplicate Code | 44.74% | 44.74% |
| No change | Packages | 2 | 2 |
| No change | Duplicate Packages | 0 | 0 |

Bundle size by type (no changes)

| Status | Type | Current (#8464) | Baseline (#8459) |
|---|---|---|---|
| No change | IMG | 145.76KiB | 145.76KiB |
| No change | Other | 91.48KiB | 91.48KiB |


@relativeci

relativeci Bot commented May 14, 2026

React MTF Example

#1597 Bundle Size — 208.18KiB (0%).

a010ecb (current) vs 531ef76 main #1592 (baseline)

Bundle metrics (no changes)

| Status | Metric | Current (#1597) | Baseline (#1592) |
|---|---|---|---|
| No change | Initial JS | 0B | 0B |
| No change | Initial CSS | 0B | 0B |
| No change | Cache Invalidation | 0% | 0% |
| No change | Chunks | 0 | 0 |
| No change | Assets | 3 | 3 |
| No change | Modules | 193 | 193 |
| No change | Duplicate Modules | 77 | 77 |
| No change | Duplicate Code | 44.24% | 44.24% |
| No change | Packages | 2 | 2 |
| No change | Duplicate Packages | 0 | 0 |

Bundle size by type (no changes)

| Status | Type | Current (#1597) | Baseline (#1592) |
|---|---|---|---|
| No change | IMG | 111.23KiB | 111.23KiB |
| No change | Other | 96.95KiB | 96.95KiB |


@PupilTong PupilTong force-pushed the codex/genuiuijudge-0 branch from a01f968 to 32f64bd on May 15, 2026 10:38
@PupilTong PupilTong force-pushed the codex/genuiuijudge-0 branch from 32f64bd to e36f807 on May 15, 2026 10:59
Comment thread .github/workflows/ui-judge-pr-comment.yml Fixed
@PupilTong PupilTong marked this pull request as ready for review May 18, 2026 13:42
Contributor

@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (2)
packages/genui/ui-judge/tests/fixtures/interactive.html (1)

108-110: ⚡ Quick win

Fix inconsistent indentation in the script block.

Line 108 has no indentation while lines 109-110 have 6 spaces. All variable declarations should use consistent indentation.

✨ Proposed fix for consistent indentation
     <script>
-    const details = document.getElementById('details');
+      const details = document.getElementById('details');
       const viewport = document.getElementById('viewport');
       const reveal = document.getElementById('reveal');
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/genui/ui-judge/tests/fixtures/interactive.html` around lines 108 -
110, The const declarations for the DOM elements (const details, const viewport,
const reveal) have inconsistent indentation; make all three declarations use the
same indentation level (e.g., align the leading whitespace so each line starts
with the same number of spaces or tabs) so the script block is consistently
formatted; update the lines that define document.getElementById('details'),
document.getElementById('viewport'), and document.getElementById('reveal') to
match the chosen indentation style.
packages/genui/ui-judge/package.json (1)

25-30: ⚡ Quick win

Move @playwright/test to devDependencies.

The runtime source (src/index.ts) only imports Page as a type, while @playwright/test is actually needed for test execution and build configuration. Moving it from dependencies to devDependencies prevents test tooling from being included in runtime installs.

♻️ Proposed manifest adjustment
   "dependencies": {
-    "@midscene/web": "^1.8.0",
-    "@playwright/test": "^1.58.2"
+    "@midscene/web": "^1.8.0"
   },
   "devDependencies": {
-    "@types/node": "^24.10.13"
+    "@playwright/test": "^1.58.2",
+    "@types/node": "^24.10.13"
   },
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/genui/ui-judge/package.json` around lines 25 - 30, Update the
package manifest so `@playwright/test` is listed under devDependencies instead of
dependencies: remove "`@playwright/test`" from the "dependencies" block and add it
to "devDependencies" (keeping the same version "^1.58.2"); this ensures runtime
imports like the type-only Page in src/index.ts do not pull test tooling into
production installs and keeps test-only packages with other dev tooling.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 0ea55898-b698-48bb-9d04-c1ebb9f31cee

📥 Commits

Reviewing files that changed from the base of the PR and between 363f9e7 and 71a8d17.

⛔ Files ignored due to path filters (1)
  • pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (12)
  • .github/ui-judge.instructions.md
  • packages/genui/tsconfig.json
  • packages/genui/ui-judge/README.md
  • packages/genui/ui-judge/package.json
  • packages/genui/ui-judge/playwright.config.ts
  • packages/genui/ui-judge/rslib.config.ts
  • packages/genui/ui-judge/src/index.ts
  • packages/genui/ui-judge/tests/fixtures/interactive.html
  • packages/genui/ui-judge/tests/judge-page.spec.ts
  • packages/genui/ui-judge/tsconfig.build.json
  • packages/genui/ui-judge/tsconfig.json
  • packages/genui/ui-judge/turbo.json

@PupilTong PupilTong requested a review from Sherry-hue as a code owner May 18, 2026 14:33
@PupilTong PupilTong merged commit fba0849 into lynx-family:main May 19, 2026
113 of 117 checks passed
3 participants