
feat: add Kitten-Lynx Android support to UI Judge #2716

Merged
PupilTong merged 4 commits into main from hw/codex/midscene-android-ui-judge on Jun 1, 2026

Conversation

PupilTong (Collaborator) commented May 26, 2026

Summary

  • Change judgeAndroidAgent to mirror judgePage by accepting a page from @lynx-js/kitten-lynx-test-infra's Lynx.newPage() instead of a caller-supplied Midscene agent.
  • Add a Midscene core adapter for Kitten-Lynx pages that uses Lynx screenshots for scoring and CDP touch events for tap/swipe actions.
  • Split UI Judge Android coverage into a Vitest suite and wire it into the Kitten-Lynx Android emulator CI job while keeping Playwright web/model tests separate.

Validation

  • pnpm turbo build --filter @lynx-js/kitten-lynx-test-infra
  • pnpm --filter @lynx-js/kitten-lynx-test-infra test
  • pnpm --filter @lynx-js/ui-judge build
  • pnpm --filter @lynx-js/ui-judge run api-extractor
  • pnpm --filter @lynx-js/ui-judge run test:android
  • UI_JUDGE_ANDROID_INTEGRATION=1 pnpm --filter @lynx-js/ui-judge run test:android
  • env -u MIDSCENE_MODEL_NAME pnpm --filter @lynx-js/ui-judge run test:playwright
  • pnpm eslint --cache --fix --no-warn-ignored --flag v10_config_lookup_from_file ...changed files...
  • pnpm biome check ...changed files...
  • pnpm dprint check ...changed files...
  • git diff --check

Summary by CodeRabbit

  • New Features

    • Android UI Judge for evaluating Lynx screens on Android emulators
    • Kitten Lynx view exposes the current navigated bundle URL
  • Documentation

    • Guidance for extending UI Judge on Android, API/agent expectations, and test execution updated
    • README and API docs describe Android judge usage and returned URL behavior
  • Chores

    • CI updated to run Android UI Judge tests and collect reports/coverage
    • Test/build scripts and package build pipelines reorganized for GenUI packages

changeset-bot Bot commented May 26, 2026

🦋 Changeset detected

Latest commit: e2c3803

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package
| Name | Type |
| --- | --- |
| @lynx-js/kitten-lynx-test-infra | Patch |

coderabbitai Bot (Contributor) commented May 26, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Walkthrough

Adds an Android-targeted judging flow: new judgeAndroidAgent API and types, Kitten-Lynx page URL tracking, a Midscene adapter translating actions/screenshots for Kitten-Lynx, shared scoring/refactor, Vitest-based Android tests and CI steps, and supporting build/test configuration updates.
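
As a rough illustration of two behaviours named in the walkthrough (a view that remembers the URL passed to goto(), and a judge entrypoint that falls back to score 0 while still reporting the page URL when judging fails), here is a minimal sketch. `SketchView`, `SketchResult`, and `judgeSafely` are hypothetical stand-ins invented for this example; the real `KittenLynxView`, `UiJudgeResult`, and `judgeAndroidAgent` are not shown on this page and may differ.

```typescript
// Hypothetical stand-in for a view that tracks the last navigated URL.
class SketchView {
  private _url = '';

  // Record the URL; a real view would also load the bundle here.
  async goto(url: string): Promise<void> {
    this._url = url;
  }

  // Return the last URL passed to goto(), or '' before any navigation.
  url(): string {
    return this._url;
  }
}

// Hypothetical result shape (an assumption, not the PR's UiJudgeResult).
interface SketchResult {
  score: number;
  url: string;
  error?: string;
}

// Run a scoring callback and convert any failure into a score-0 result
// that still reports which page was being judged.
async function judgeSafely(
  page: SketchView,
  scorePage: (page: SketchView) => Promise<number>,
): Promise<SketchResult> {
  try {
    return { score: await scorePage(page), url: page.url() };
  } catch (error) {
    const message = error instanceof Error ? error.message : String(error);
    return { score: 0, url: page.url(), error: message };
  }
}
```

Returning a result object instead of rethrowing keeps a batch of examples running when one page fails, at the cost of callers having to check the error field.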

Changes

Android UI Judge Integration

| Layer / File(s) | Summary |
| --- | --- |
| KittenLynx URL tracking: packages/testing-library/kitten-lynx/src/KittenLynxView.ts | Adds _url field and url() getter to store and retrieve the last navigated URL from goto(). |
| Public API, types, and guidance: packages/genui/ui-judge/src/index.ts, packages/genui/ui-judge/etc/ui-judge.api.md, packages/genui/ui-judge/README.md, .changeset/quiet-views-report.md, .github/ui-judge.instructions.md | Exports KittenLynxJudgePage (screenshot + url) and JudgeAndroidAgentOptions (task, page, dimension, steps, reference, timeout); documents judgeAndroidAgent in README and API report; adds changeset; updates extension guidance requiring page.url() mapping and Android test routing. |
| Kitten-Lynx Midscene adapter: packages/genui/ui-judge/src/index.ts | Implements KittenLynxMidscenePage translating Midscene tap/swipe to Kitten-Lynx input events, requires attached channel, and provides screenshot capture with PNG/JPEG format detection and dimension parsing. |
| Shared judge execution refactor: packages/genui/ui-judge/src/index.ts | Refactors scoring into judgeWithAgentUnsafe with per-step abortable timeouts; normalizes options across Playwright and Android; generalizes dimension validation and prompt building; updates judgePage to use shared logic. |
| Android judge entrypoint: packages/genui/ui-judge/src/index.ts | Adds judgeAndroidAgent that normalizes Android options, delegates to unsafe judging, and returns a UiJudgeResult with fallback dimension, default score 0, normalized steps, and Kitten-Lynx page URL on error. |
| Build and test configuration: packages/genui/ui-judge/rslib.config.ts, packages/genui/ui-judge/tsconfig.build.json, packages/genui/ui-judge/tsconfig.json, packages/genui/ui-judge/vitest.config.ts, packages/genui/ui-judge/package.json, packages/genui/ui-judge/playwright.config.ts, .github/a2ui-catalog.instructions.md, examples/react-externals/package.json, packages/genui/package.json, packages/genui/turbo.json, packages/genui/a2ui-catalog-extractor/*, packages/genui/a2ui-prompt/*, packages/genui/a2ui/*, packages/genui/openui/* | Adds rslib tsconfigPath, src-only build tsconfig, Vitest config (node env, 60s timeouts, test discovery), splits test scripts into test:android (Vitest) and test:playwright (Playwright), adds @midscene/core/vitest/@lynx-js/kitten-lynx-test-infra deps, Playwright ignores Vitest specs, simplifies genui/react-externals build scripts, adds a2ui-catalog-extractor prerequisite, and adds Turbo api-extractor task (cache disabled). |
| Android integration test suite: packages/genui/ui-judge/tests/judge-android-agent.vitest.spec.ts | Adds validation test (missing page error) and conditional Android integration test (fixture spawn, readiness polling, ADB reverse, Lynx connect, newPage, judgeAndroidAgent, task validation); includes subprocess utilities and reliable teardown. |
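
The adapter row above mentions PNG/JPEG format detection and dimension parsing for screenshots. The following is a minimal sketch of how such detection can work from the file headers alone; `parseImageInfo` and `ImageInfo` are hypothetical names, and this is not the code from packages/genui/ui-judge/src/index.ts. It assumes a well-formed file and does not handle JPEG fill bytes or standalone markers before the start-of-frame segment.

```typescript
type ImageInfo = { format: 'png' | 'jpeg'; width: number; height: number };

const PNG_SIGNATURE = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];

function parseImageInfo(bytes: Uint8Array): ImageInfo {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);

  // PNG: 8-byte signature, then the IHDR chunk. Its data begins at offset 16
  // with big-endian 32-bit width followed by height.
  if (bytes.length >= 24 && PNG_SIGNATURE.every((b, i) => bytes[i] === b)) {
    return { format: 'png', width: view.getUint32(16), height: view.getUint32(20) };
  }

  // JPEG: starts with the SOI marker (FF D8). Dimensions live in a
  // start-of-frame (SOF) segment, so walk the segments until one is found.
  if (bytes.length >= 4 && view.getUint8(0) === 0xff && view.getUint8(1) === 0xd8) {
    let offset = 2;
    while (offset + 9 <= bytes.length) {
      if (view.getUint8(offset) !== 0xff) throw new Error('Malformed JPEG marker');
      const marker = view.getUint8(offset + 1);
      // SOF markers are C0..CF except C4 (DHT), C8 (JPG), and CC (DAC).
      const isSof = marker >= 0xc0 && marker <= 0xcf
        && marker !== 0xc4 && marker !== 0xc8 && marker !== 0xcc;
      if (isSof) {
        // Layout: marker(2) length(2) precision(1) height(2) width(2).
        return {
          format: 'jpeg',
          height: view.getUint16(offset + 5),
          width: view.getUint16(offset + 7),
        };
      }
      // Skip this segment: 2 marker bytes plus the big-endian segment length.
      offset += 2 + view.getUint16(offset + 2);
    }
    throw new Error('JPEG start-of-frame segment not found');
  }

  throw new Error('Unsupported screenshot format (expected PNG or JPEG)');
}
```

A parser like this rejects WebP input outright, which is why a type that advertises 'webp' while the runtime only understands PNG and JPEG would fail at run time rather than at compile time.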

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • lynx-family/lynx-stack#2751: Build order fix ensuring @lynx-js/genui-a2ui-catalog-extractor builds before @lynx-js/genui-a2ui's API extraction step, aligned with this PR's a2ui-catalog.instructions prerequisite update.
  • lynx-family/lynx-stack#2741: Related API Extractor and .api.md generation changes that touch ui-judge API reporting used alongside this PR's exported API additions.
  • lynx-family/lynx-stack#2712: Overlapping GenUI build/tooling changes touching Turbo/task and package build scripts.

Suggested reviewers

  • HuJean
  • gaoachao
  • fzx2666-fz

"🐰 I hopped through code with tiny paws,
Midscene prompts and Kitten-Lynx applause,
I stored the last URL with a gentle nudge,
Ran tests in Vitest, let Playwright judge,
Android screens now shine — a carrot-sized trudge!"

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'feat: add Kitten-Lynx Android support to UI Judge' accurately summarizes the main change: adding Android/Kitten-Lynx support to the UI Judge library. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

codecov Bot commented May 26, 2026

Codecov Report

❌ Patch coverage is 44.44444% with 5 lines in your changes missing coverage. Please review.
✅ All tests successful. No failed tests found.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| .../testing-library/kitten-lynx/src/KittenLynxView.ts | 44.44% | 5 Missing ⚠️ |

codspeed-hq Bot commented May 26, 2026

Merging this PR will degrade performance by 10.73%

⚠️ Different runtime environments detected

Some benchmarks with significant performance changes were compared across different runtime environments,
which may affect the accuracy of the results.

Open the report in CodSpeed to investigate

❌ 1 regressed benchmark
✅ 80 untouched benchmarks
⏩ 26 skipped benchmarks [1]

Warning

Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

| Benchmark | BASE | HEAD | Efficiency |
| --- | --- | --- | --- |
| transform 1000 view elements | 42.2 ms | 47.2 ms | -10.73% |

Comparing hw/codex/midscene-android-ui-judge (e2c3803) with main (fc217dc)

Footnotes

  1. 26 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

github-actions Bot (Contributor) commented May 26, 2026

UI Judge

GEQI weighted score: 61.8 / 100 across 8 examples.
Average visual-correctness score: 3.3 / 5.

| Dimension | Weight | Average | Results | Status |
| --- | --- | --- | --- | --- |
| Usability & Interaction | 30% | 3 / 5 | 8 | OK |
| Visual & Aesthetics | 25% | 3.1 / 5 | 8 | OK |
| Consistency & Standards | 15% | 3.4 / 5 | 8 | OK |
| Architecture & UX Writing | 15% | 3.1 / 5 | 8 | OK |
| Accessibility & Performance | 15% | 2.9 / 5 | 8 | OK |

| # | Example | Visual Correctness | Usability & Interaction (30%) | Visual & Aesthetics (25%) | Consistency & Standards (15%) | Architecture & UX Writing (15%) | Accessibility & Performance (15%) | GEQI | Page | Status |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | recs | 2 / 5 | 2 / 5 | 3 / 5 | 2 / 5 | 2 / 5 | 3 / 5 | 48 / 100 | preview | OK |
| 2 | cast-grid | 5 / 5 | 5 / 5 | 4 / 5 | 5 / 5 | 5 / 5 | 3 / 5 | 89 / 100 | preview | OK |
| 3 | citywalk-list | 2 / 5 | 2 / 5 | 3 / 5 | 3 / 5 | 2 / 5 | 2 / 5 | 48 / 100 | preview | OK |
| 4 | fridge-search | 3 / 5 | 3 / 5 | 3 / 5 | 4 / 5 | 3 / 5 | 3 / 5 | 63 / 100 | preview | OK |
| 5 | trip-planner | 2 / 5 | 3 / 5 | 2 / 5 | 2 / 5 | 3 / 5 | 2 / 5 | 49 / 100 | preview | OK |
| 6 | weather-current | 5 / 5 | 3 / 5 | 4 / 5 | 5 / 5 | 4 / 5 | 4 / 5 | 77 / 100 | preview | OK |
| 7 | product-card | 5 / 5 | 4 / 5 | 4 / 5 | 4 / 5 | 4 / 5 | 4 / 5 | 80 / 100 | preview | OK |
| 8 | workout-plan | 2 / 5 | 2 / 5 | 2 / 5 | 2 / 5 | 2 / 5 | 2 / 5 | 40 / 100 | preview | OK |

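
The report does not state how GEQI is derived from the dimension scores. The per-example values above are consistent with a weighted mean of the five dimension scores, rescaled from a 5-point scale to 100, and the overall figure is consistent with the mean of the eight per-example values. The sketch below checks that assumption against the reported numbers; `WEIGHTS`, `DimensionScores`, and `geqi` are hypothetical names, and the formula is inferred from the data rather than taken from the UI Judge source.

```typescript
// Weights in percent, as listed in the report's dimension table.
const WEIGHTS = {
  usability: 30,
  visual: 25,
  consistency: 15,
  architecture: 15,
  accessibility: 15,
} as const;

type DimensionScores = Record<keyof typeof WEIGHTS, number>;

// Assumed formula: weighted mean of the scores (each out of 5), rescaled to 0-100.
function geqi(scores: DimensionScores): number {
  let weighted = 0;
  for (const key of Object.keys(WEIGHTS) as (keyof typeof WEIGHTS)[]) {
    weighted += scores[key] * WEIGHTS[key];
  }
  // Percent weights sum to 100 and the maximum score is 5, so divide by 5.
  return weighted / 5;
}

const recs = geqi({ usability: 2, visual: 3, consistency: 2, architecture: 2, accessibility: 3 });
const castGrid = geqi({ usability: 5, visual: 4, consistency: 5, architecture: 5, accessibility: 3 });
console.log(recs, castGrid); // 48 89, matching rows 1 and 2

// Mean of the eight per-example GEQI values from the table.
const perExample = [48, 89, 48, 63, 49, 77, 80, 40];
const overall = perExample.reduce((sum, value) => sum + value, 0) / perExample.length;
console.log(overall); // 61.75, which the report shows rounded as 61.8
```

Using integer percent weights keeps the arithmetic exact for whole-number scores, which avoids floating-point noise when comparing against the reported values.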
Details

Result 1

  • Example: recs
  • Dimension: visual-correctness
  • Visual correctness: 2 / 5
  • GEQI dimensions:
    • Usability & Interaction: 2 / 5 (30%)
    • Visual & Aesthetics: 3 / 5 (25%)
    • Consistency & Standards: 2 / 5 (15%)
    • Architecture & UX Writing: 2 / 5 (15%)
    • Accessibility & Performance: 3 / 5 (15%)
  • Task: The A2UI playground preview should show date-night dining recommendations for Moonlight Terrace, Pinewood Bistro, and Sea Breeze Kitchen.

Result 2

  • Example: cast-grid
  • Dimension: visual-correctness
  • Visual correctness: 5 / 5
  • GEQI dimensions:
    • Usability & Interaction: 5 / 5 (30%)
    • Visual & Aesthetics: 4 / 5 (25%)
    • Consistency & Standards: 5 / 5 (15%)
    • Architecture & UX Writing: 5 / 5 (15%)
    • Accessibility & Performance: 3 / 5 (15%)
  • Task: The A2UI playground preview should show a cast grid for the short film Night Notes, including Lin Xia and Zhou Ning cast cards.

Result 3

  • Example: citywalk-list
  • Dimension: visual-correctness
  • Visual correctness: 2 / 5
  • GEQI dimensions:
    • Usability & Interaction: 2 / 5 (30%)
    • Visual & Aesthetics: 3 / 5 (25%)
    • Consistency & Standards: 3 / 5 (15%)
    • Architecture & UX Writing: 2 / 5 (15%)
    • Accessibility & Performance: 2 / 5 (15%)
  • Task: The A2UI playground preview should show weekend citywalk coffee picks with Rooftop Brew Room, Corner Canvas Lab, and Late Sun Roastery.

Result 4

  • Example: fridge-search
  • Dimension: visual-correctness
  • Visual correctness: 3 / 5
  • GEQI dimensions:
    • Usability & Interaction: 3 / 5 (30%)
    • Visual & Aesthetics: 3 / 5 (25%)
    • Consistency & Standards: 4 / 5 (15%)
    • Architecture & UX Writing: 3 / 5 (15%)
    • Accessibility & Performance: 3 / 5 (15%)
  • Task: The A2UI playground preview should show refrigerator search results with Siemens, Hualing, Haier, and Midea product cards.

Result 5

  • Example: trip-planner
  • Dimension: visual-correctness
  • Visual correctness: 2 / 5
  • GEQI dimensions:
    • Usability & Interaction: 3 / 5 (30%)
    • Visual & Aesthetics: 2 / 5 (25%)
    • Consistency & Standards: 2 / 5 (15%)
    • Architecture & UX Writing: 3 / 5 (15%)
    • Accessibility & Performance: 2 / 5 (15%)
  • Task: The A2UI playground preview should show a Kyoto 48-hour trip planner with Day 1 and Day 2 itinerary sections, including Monkey Park Viewpoint.

Result 6

  • Example: weather-current
  • Dimension: visual-correctness
  • Visual correctness: 5 / 5
  • GEQI dimensions:
    • Usability & Interaction: 3 / 5 (30%)
    • Visual & Aesthetics: 4 / 5 (25%)
    • Consistency & Standards: 5 / 5 (15%)
    • Architecture & UX Writing: 4 / 5 (15%)
    • Accessibility & Performance: 4 / 5 (15%)
  • Task: The A2UI playground preview should show the current weather for Austin, TX, including clear skies with light breeze.

Result 7

  • Example: product-card
  • Dimension: visual-correctness
  • Visual correctness: 5 / 5
  • GEQI dimensions:
    • Usability & Interaction: 4 / 5 (30%)
    • Visual & Aesthetics: 4 / 5 (25%)
    • Consistency & Standards: 4 / 5 (15%)
    • Architecture & UX Writing: 4 / 5 (15%)
    • Accessibility & Performance: 4 / 5 (15%)
  • Task: The A2UI playground preview should show a Wireless Headphones Pro product card with a visible Add to Cart action.

Result 8

  • Example: workout-plan
  • Dimension: visual-correctness
  • Visual correctness: 2 / 5
  • GEQI dimensions:
    • Usability & Interaction: 2 / 5 (30%)
    • Visual & Aesthetics: 2 / 5 (25%)
    • Consistency & Standards: 2 / 5 (15%)
    • Architecture & UX Writing: 2 / 5 (15%)
    • Accessibility & Performance: 2 / 5 (15%)
  • Task: The A2UI playground preview should show a weekly workout plan with five days from Monday Ramp-Up through Friday Conditioning.

Workflow run

relativeci Bot commented May 26, 2026

React Example with Element Template

#954 Bundle Size — 204.36KiB (0%).

83927eb (current) vs 81802d3 main #950 (baseline)

Bundle metrics: no changes

| Change | Metric | Current #954 | Baseline #950 |
| --- | --- | --- | --- |
| No change | Initial JS | 0B | 0B |
| No change | Initial CSS | 0B | 0B |
| No change | Cache Invalidation | 0% | 0% |
| No change | Chunks | 0 | 0 |
| No change | Assets | 4 | 4 |
| No change | Modules | 124 | 124 |
| No change | Duplicate Modules | 50 | 50 |
| No change | Duplicate Code | 45.19% | 45.19% |
| No change | Packages | 2 | 2 |
| No change | Duplicate Packages | 0 | 0 |

Bundle size by type: no changes

| Change | Type | Current #954 | Baseline #950 |
| --- | --- | --- | --- |
| No change | IMG | 145.76KiB | 145.76KiB |
| No change | Other | 58.61KiB | 58.61KiB |

relativeci Bot commented May 26, 2026

React MTF Example

#1819 Bundle Size — 208.94KiB (0%).

83927eb (current) vs 81802d3 main #1815 (baseline)

Bundle metrics: no changes

| Change | Metric | Current #1819 | Baseline #1815 |
| --- | --- | --- | --- |
| No change | Initial JS | 0B | 0B |
| No change | Initial CSS | 0B | 0B |
| No change | Cache Invalidation | 0% | 0% |
| No change | Chunks | 0 | 0 |
| No change | Assets | 3 | 3 |
| No change | Modules | 199 | 199 |
| No change | Duplicate Modules | 78 | 78 |
| No change | Duplicate Code | 44.08% | 44.08% |
| No change | Packages | 2 | 2 |
| No change | Duplicate Packages | 0 | 0 |

Bundle size by type: no changes

| Change | Type | Current #1819 | Baseline #1815 |
| --- | --- | --- | --- |
| No change | IMG | 111.23KiB | 111.23KiB |
| No change | Other | 97.71KiB | 97.71KiB |

relativeci Bot commented May 26, 2026

React Example

#8685 Bundle Size — 238KiB (0%).

83927eb (current) vs 81802d3 main #8681 (baseline)

Bundle metrics: no changes

| Change | Metric | Current #8685 | Baseline #8681 |
| --- | --- | --- | --- |
| No change | Initial JS | 0B | 0B |
| No change | Initial CSS | 0B | 0B |
| No change | Cache Invalidation | 0% | 0% |
| No change | Chunks | 0 | 0 |
| No change | Assets | 4 | 4 |
| No change | Modules | 204 | 204 |
| No change | Duplicate Modules | 81 | 81 |
| No change | Duplicate Code | 44.59% | 44.59% |
| No change | Packages | 2 | 2 |
| No change | Duplicate Packages | 0 | 0 |

Bundle size by type: no changes

| Change | Type | Current #8685 | Baseline #8681 |
| --- | --- | --- | --- |
| No change | IMG | 145.76KiB | 145.76KiB |
| No change | Other | 92.24KiB | 92.24KiB |

relativeci Bot commented May 26, 2026

React External

#1801 Bundle Size — 699.5KiB (0%).

83927eb (current) vs 81802d3 main #1797 (baseline)

Bundle metrics: no changes

| Change | Metric | Current #1801 | Baseline #1797 |
| --- | --- | --- | --- |
| No change | Initial JS | 0B | 0B |
| No change | Initial CSS | 0B | 0B |
| No change | Cache Invalidation | 0% | 0% |
| No change | Chunks | 0 | 0 |
| No change | Assets | 3 | 3 |
| No change | Modules | 17 | 17 |
| No change | Duplicate Modules | 5 | 5 |
| No change | Duplicate Code | 7.13% | 7.13% |
| No change | Packages | 0 | 0 |
| No change | Duplicate Packages | 0 | 0 |

Bundle size by type: no changes

| Change | Type | Current #1801 | Baseline #1797 |
| --- | --- | --- | --- |
| No change | Other | 699.5KiB | 699.5KiB |

relativeci Bot commented May 26, 2026

Web Explorer

#10262 Bundle Size — 903.53KiB (0%).

83927eb (current) vs 81802d3 main #10258 (baseline)

Bundle metrics: no changes

| Change | Metric | Current #10262 | Baseline #10258 |
| --- | --- | --- | --- |
| No change | Initial JS | 45.06KiB | 45.06KiB |
| No change | Initial CSS | 2.22KiB | 2.22KiB |
| No change | Cache Invalidation | 0% | 0% |
| No change | Chunks | 9 | 9 |
| No change | Assets | 11 | 11 |
| No change | Modules | 231 | 231 |
| No change | Duplicate Modules | 11 | 11 |
| No change | Duplicate Code | 27.12% | 27.12% |
| No change | Packages | 10 | 10 |
| No change | Duplicate Packages | 0 | 0 |

Bundle size by type: no changes

| Change | Type | Current #10262 | Baseline #10258 |
| --- | --- | --- | --- |
| No change | JS | 499.15KiB | 499.15KiB |
| No change | Other | 402.16KiB | 402.16KiB |
| No change | CSS | 2.22KiB | 2.22KiB |

@PupilTong PupilTong force-pushed the hw/codex/midscene-android-ui-judge branch from 83927eb to d986c4a Compare May 27, 2026 07:46
@PupilTong PupilTong changed the title [codex] add Android Midscene agent support to UI Judge [codex] add Kitten-Lynx Android support to UI Judge May 27, 2026
@PupilTong PupilTong self-assigned this May 27, 2026
@PupilTong PupilTong force-pushed the hw/codex/midscene-android-ui-judge branch 3 times, most recently from 034b63f to 84b229d Compare May 28, 2026 10:25

pkg-pr-new Bot commented May 28, 2026

npm i https://pkg.pr.new/@lynx-js/kitten-lynx-test-infra@2716

commit: e2c3803

@PupilTong PupilTong force-pushed the hw/codex/midscene-android-ui-judge branch 2 times, most recently from 2ba994b to f8af74a Compare May 28, 2026 11:34
@PupilTong PupilTong changed the title [codex] add Kitten-Lynx Android support to UI Judge feat: add Kitten-Lynx Android support to UI Judge May 28, 2026
@PupilTong PupilTong marked this pull request as ready for review May 28, 2026 11:34
coderabbitai Bot (Contributor) left a comment

Actionable comments posted: 2

🧹 Nitpick comments (1)
packages/genui/ui-judge/tests/judge-android-agent.vitest.spec.ts (1)

325-334: 💤 Low value

Promise.race resolves immediately after SIGKILL without waiting for exit.

After the timeout, SIGKILL is sent but the race resolves as soon as the sleep promise fulfills—there's no await for the process to actually terminate. This could leave orphan processes if SIGTERM didn't work.

For test cleanup this is low-impact (CI reaps orphans), but a cleaner approach would await exit after SIGKILL:

Optional fix
   await Promise.race([
     exitPromise,
-    sleep(DISPOSE_TIMEOUT_MS).then(() => {
-      if (detached) {
-        process.kill(-child.pid!, 'SIGKILL');
-      } else {
-        child.kill('SIGKILL');
-      }
-    }),
+    sleep(DISPOSE_TIMEOUT_MS),
   ]);
+
+  if (!child.killed) {
+    if (detached) {
+      process.kill(-child.pid!, 'SIGKILL');
+    } else {
+      child.kill('SIGKILL');
+    }
+    await exitPromise;
+  }
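
The teardown pattern discussed here (send SIGTERM, wait up to a grace period, escalate to SIGKILL, then wait for the actual exit) can be sketched as follows. `disposeChild` and `demo` are hypothetical helpers written for this illustration and are not the spec file's code. The sketch checks `exitCode` and `signalCode` rather than `child.killed`, because in Node.js `killed` only records that a signal was sent through `kill()`, not that the process has ended. Process-group kills for detached children are left out.

```typescript
import { spawn, type ChildProcess } from 'node:child_process';
import { once } from 'node:events';
import { setTimeout as sleep } from 'node:timers/promises';

// Hypothetical helper: ask a child to stop, escalate to SIGKILL after
// `graceMs`, and resolve only once the process has actually exited.
async function disposeChild(child: ChildProcess, graceMs: number): Promise<void> {
  // exitCode or signalCode becomes non-null once the process has ended.
  if (child.exitCode !== null || child.signalCode !== null) return;

  const exited = once(child, 'exit');
  child.kill('SIGTERM');

  // Race the exit against a cancellable timer; `true` means the timer won.
  const timer = new AbortController();
  const timedOut = await Promise.race([
    exited.then(() => false),
    sleep(graceMs, true, { signal: timer.signal }).catch(() => false),
  ]);
  timer.abort(); // stop the timer so it cannot keep the event loop alive

  if (timedOut) {
    child.kill('SIGKILL');
    await exited; // wait for the real exit, not just for the signal to be sent
  }
}

// Example: stop a child that tries to ignore SIGTERM.
async function demo(): Promise<boolean> {
  const script = "process.on('SIGTERM', () => {}); setInterval(() => {}, 1000);";
  const child = spawn(process.execPath, ['-e', script]);
  await disposeChild(child, 200);
  return child.exitCode !== null || child.signalCode !== null;
}
```

Awaiting the same `exited` promise after SIGKILL is what closes the gap described in the comment: the helper cannot resolve while the process is still running.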
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/genui/ui-judge/tests/judge-android-agent.vitest.spec.ts` around
lines 325 - 334, The current Promise.race uses sleep(DISPOSE_TIMEOUT_MS) to send
SIGKILL but resolves as soon as sleep fulfills, not when the child actually
exits; update the logic so that after sleep triggers SIGKILL (using the same
detached/child.pid/child.kill logic) you then await exitPromise before
resolving. Specifically modify the race/timeout branch around exitPromise,
sleep, DISPOSE_TIMEOUT_MS, detached, child.pid and child.kill so the kill is
sent and then exitPromise is awaited (e.g., replace the plain sleep branch with
an async block that sends SIGKILL and then awaits exitPromise) to ensure the
child process has terminated.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/genui/ui-judge/src/index.ts`:
- Around line 57-64: The interface KittenLynxJudgePage.screenshot currently
allows 'webp' but the parser used by judgeAndroidAgent only handles 'png' and
'jpeg', risking runtime failures; either remove 'webp' from the screenshot
format union in KittenLynxJudgePage (and any other similar declarations around
lines noted) or update the parsing logic in judgeAndroidAgent (and the parsing
helper used at 402-409 / 475-490) to accept and correctly decode WebP buffers;
choose one approach and make the types and the runtime parser consistent so
callers and consumers agree on supported formats.

In `@packages/genui/ui-judge/tests/judge-android-agent.vitest.spec.ts`:
- Around line 83-91: The beforeAll hook timeout is too short (90_000) compared
to READY_TIMEOUT_MS (120_000) used by the fixture readiness polling; update the
beforeAll timeout in judge-android-agent.vitest.spec.ts (the beforeAll that
calls startKittenLynxFixtureServer, reverseFixturePort, Lynx.connect, page.goto)
to be at least READY_TIMEOUT_MS plus a small buffer (for example
READY_TIMEOUT_MS + 30_000 or 150_000) so
waitForFixtureReady/startKittenLynxFixtureServer has time to surface logs and
exit state before Vitest aborts.

---

Nitpick comments:
In `@packages/genui/ui-judge/tests/judge-android-agent.vitest.spec.ts`:
- Around line 325-334: The current Promise.race uses sleep(DISPOSE_TIMEOUT_MS)
to send SIGKILL but resolves as soon as sleep fulfills, not when the child
actually exits; update the logic so that after sleep triggers SIGKILL (using the
same detached/child.pid/child.kill logic) you then await exitPromise before
resolving. Specifically modify the race/timeout branch around exitPromise,
sleep, DISPOSE_TIMEOUT_MS, detached, child.pid and child.kill so the kill is
sent and then exitPromise is awaited (e.g., replace the plain sleep branch with
an async block that sends SIGKILL and then awaits exitPromise) to ensure the
child process has terminated.
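
For the `beforeAll` timeout comment above, the relationship between a readiness poll and the hook that wraps it can be sketched like this. `waitUntilReady` and `HOOK_TIMEOUT_MS` are hypothetical names; only `READY_TIMEOUT_MS` (120_000) and the suggested 30_000 buffer come from the review comment, and the spec file's real `waitForFixtureReady` is not shown on this page.

```typescript
const READY_TIMEOUT_MS = 120_000;
// Give the hook more time than the poll so the poll's own error
// (with its diagnostics) surfaces before the test runner aborts the hook.
const HOOK_TIMEOUT_MS = READY_TIMEOUT_MS + 30_000;

// Poll `check` until it reports ready or the deadline passes.
async function waitUntilReady(
  check: () => Promise<boolean>,
  timeoutMs: number,
  intervalMs = 250,
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    if (await check()) return;
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`Fixture was not ready within ${timeoutMs} ms`);
}
```

Deriving the hook timeout from the readiness constant keeps the two values from drifting apart when one of them is changed later.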

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 5944bec7-401f-4af5-8750-3f37c1b3e081

📥 Commits

Reviewing files that changed from the base of the PR and between 353363e and f8af74a.

⛔ Files ignored due to path filters (1)
  • pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (16)
  • .changeset/quiet-views-report.md
  • .github/a2ui-catalog.instructions.md
  • .github/ui-judge.instructions.md
  • .github/workflows/test.yml
  • packages/genui/a2ui/package.json
  • packages/genui/ui-judge/README.md
  • packages/genui/ui-judge/etc/ui-judge.api.md
  • packages/genui/ui-judge/package.json
  • packages/genui/ui-judge/playwright.config.ts
  • packages/genui/ui-judge/rslib.config.ts
  • packages/genui/ui-judge/src/index.ts
  • packages/genui/ui-judge/tests/judge-android-agent.vitest.spec.ts
  • packages/genui/ui-judge/tsconfig.build.json
  • packages/genui/ui-judge/tsconfig.json
  • packages/genui/ui-judge/vitest.config.ts
  • packages/testing-library/kitten-lynx/src/KittenLynxView.ts

Comment thread packages/genui/ui-judge/src/index.ts
Comment thread packages/genui/ui-judge/tests/judge-android-agent.vitest.spec.ts
@PupilTong PupilTong force-pushed the hw/codex/midscene-android-ui-judge branch from f8af74a to 76f5a0b Compare May 28, 2026 12:05
coderabbitai Bot (Contributor) commented May 28, 2026

Actionable comments posted: 0

@PupilTong PupilTong force-pushed the hw/codex/midscene-android-ui-judge branch from 76f5a0b to 1ae4164 Compare June 1, 2026 04:47
coderabbitai Bot (Contributor) left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
packages/genui/ui-judge/etc/ui-judge.api.md (1)

9-30: ⚡ Quick win

Add explicit release tags and TSDoc to the new public Android API types/functions.

The new public symbols are exported as undocumented with ae-missing-release-tag warnings. Please add explicit TSDoc release tags (and brief docs) in packages/genui/ui-judge/src/index.ts for judgeAndroidAgent, JudgeAndroidAgentOptions, and KittenLynxJudgePage so the generated API report is clean and the contract is clearer.

Also applies to: 55-67

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/genui/ui-judge/etc/ui-judge.api.md` around lines 9 - 30, The public
API symbols judgeAndroidAgent, JudgeAndroidAgentOptions, and KittenLynxJudgePage
are missing TSDoc release tags and docs; open the declarations in
packages/genui/ui-judge/src/index.ts and add concise TSDoc comments including an
explicit release tag (e.g., `@public`) above each exported
function/interface/type, and add brief descriptions for the overall symbol and
each option property (dimension, page, reference, steps, task, timeoutMs) so the
generated API report no longer shows ae-missing-release-tag and the contract is
documented.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/genui/package.json`:
- Around line 98-99: The package.json "build" script no longer builds required
subpackages (a2ui, a2ui-prompt, a2ui-catalog-extractor, openui) which can
produce missing/stale publish artifacts; update the "build" script so it
deterministically runs the build for those subpackages before running tsc (e.g.,
invoke the monorepo package builds for a2ui, a2ui-prompt,
a2ui-catalog-extractor, openui prior to "tsc --project tsconfig.build.json"),
keeping the existing "clean" step intact; modify the "build" script entry (the
"build" NPM script) to run the subpackage builds then the TypeScript build so
exported/published artifacts are always produced.

---

Nitpick comments:
In `@packages/genui/ui-judge/etc/ui-judge.api.md`:
- Around line 9-30: The public API symbols judgeAndroidAgent,
JudgeAndroidAgentOptions, and KittenLynxJudgePage are missing TSDoc release tags
and docs; open the declarations in packages/genui/ui-judge/src/index.ts and add
concise TSDoc comments including an explicit release tag (e.g., `@public`) above
each exported function/interface/type, and add brief descriptions for the
overall symbol and each option property (dimension, page, reference, steps,
task, timeoutMs) so the generated API report no longer shows
ae-missing-release-tag and the contract is documented.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 6a5ca9fe-0678-4bbd-bb94-4e13c9e17ccd

📥 Commits

Reviewing files that changed from the base of the PR and between 76f5a0b and 1ae4164.

⛔ Files ignored due to path filters (1)
  • pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (17)
  • .changeset/quiet-views-report.md
  • .github/a2ui-catalog.instructions.md
  • .github/ui-judge.instructions.md
  • .github/workflows/test.yml
  • examples/react-externals/package.json
  • packages/genui/package.json
  • packages/genui/ui-judge/README.md
  • packages/genui/ui-judge/etc/ui-judge.api.md
  • packages/genui/ui-judge/package.json
  • packages/genui/ui-judge/playwright.config.ts
  • packages/genui/ui-judge/rslib.config.ts
  • packages/genui/ui-judge/src/index.ts
  • packages/genui/ui-judge/tests/judge-android-agent.vitest.spec.ts
  • packages/genui/ui-judge/tsconfig.build.json
  • packages/genui/ui-judge/tsconfig.json
  • packages/genui/ui-judge/vitest.config.ts
  • packages/testing-library/kitten-lynx/src/KittenLynxView.ts
✅ Files skipped from review due to trivial changes (4)
  • packages/genui/ui-judge/tsconfig.json
  • packages/genui/ui-judge/tsconfig.build.json
  • .changeset/quiet-views-report.md
  • .github/ui-judge.instructions.md
🚧 Files skipped from review as they are similar to previous changes (10)
  • .github/a2ui-catalog.instructions.md
  • .github/workflows/test.yml
  • packages/genui/ui-judge/vitest.config.ts
  • packages/genui/ui-judge/package.json
  • packages/genui/ui-judge/README.md
  • packages/genui/ui-judge/rslib.config.ts
  • packages/genui/ui-judge/playwright.config.ts
  • packages/testing-library/kitten-lynx/src/KittenLynxView.ts
  • packages/genui/ui-judge/tests/judge-android-agent.vitest.spec.ts
  • packages/genui/ui-judge/src/index.ts

Comment thread packages/genui/package.json
@PupilTong PupilTong merged commit 0adc88d into main Jun 1, 2026
74 of 78 checks passed
@PupilTong PupilTong deleted the hw/codex/midscene-android-ui-judge branch June 1, 2026 07:26