feat: add Kitten-Lynx Android support to UI Judge #2716
🦋 Changeset detected
Latest commit: e2c3803
The changes in this PR will be included in the next version bump. This PR includes changesets to release 1 package.
📝 Walkthrough
Adds an Android-targeted judging flow: new
Changes
Android UI Judge Integration
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
Codecov Report
❌ Patch coverage is
Merging this PR will degrade performance by 10.73%

| | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|
| ❌ | transform 1000 view elements | 42.2 ms | 47.2 ms | -10.73% |

Comparing hw/codex/midscene-android-ui-judge (e2c3803) with main (fc217dc)

Footnotes
- 26 benchmarks were skipped, so the baseline results were used instead.
UI Judge
GEQI weighted score: 61.8 / 100 across 8 examples.
React Example with Element Template #954 Bundle Size: 204.36KiB (0%). 83927eb (current) vs 81802d3 main #950 (baseline)

Bundle metrics

| | Current #954 | Baseline #950 |
|---|---|---|
| | 0B | 0B |
| | 0B | 0B |
| | 0% | 0% |
| | 0 | 0 |
| | 4 | 4 |
| | 124 | 124 |
| | 50 | 50 |
| | 45.19% | 45.19% |
| | 2 | 2 |
| | 0 | 0 |

Bundle size by type: no changes

| | Current #954 | Baseline #950 |
|---|---|---|
| | 145.76KiB | 145.76KiB |
| | 58.61KiB | 58.61KiB |

Bundle analysis report Branch hw/codex/midscene-android-ui-jud... Project dashboard
Generated by RelativeCI Documentation Report issue
React MTF Example #1819 Bundle Size: 208.94KiB (0%). 83927eb (current) vs 81802d3 main #1815 (baseline)

Bundle metrics

| | Current #1819 | Baseline #1815 |
|---|---|---|
| | 0B | 0B |
| | 0B | 0B |
| | 0% | 0% |
| | 0 | 0 |
| | 3 | 3 |
| | 199 | 199 |
| | 78 | 78 |
| | 44.08% | 44.08% |
| | 2 | 2 |
| | 0 | 0 |

Bundle size by type: no changes

| | Current #1819 | Baseline #1815 |
|---|---|---|
| | 111.23KiB | 111.23KiB |
| | 97.71KiB | 97.71KiB |

React Example #8685 Bundle Size: 238KiB (0%). 83927eb (current) vs 81802d3 main #8681 (baseline)

Bundle metrics

| | Current #8685 | Baseline #8681 |
|---|---|---|
| | 0B | 0B |
| | 0B | 0B |
| | 0% | 0% |
| | 0 | 0 |
| | 4 | 4 |
| | 204 | 204 |
| | 81 | 81 |
| | 44.59% | 44.59% |
| | 2 | 2 |
| | 0 | 0 |

Bundle size by type: no changes

| | Current #8685 | Baseline #8681 |
|---|---|---|
| | 145.76KiB | 145.76KiB |
| | 92.24KiB | 92.24KiB |

React External #1801 Bundle Size: 699.5KiB (0%). 83927eb (current) vs 81802d3 main #1797 (baseline)

Bundle metrics

| | Current #1801 | Baseline #1797 |
|---|---|---|
| | 0B | 0B |
| | 0B | 0B |
| | 0% | 0% |
| | 0 | 0 |
| | 3 | 3 |
| | 17 | 17 |
| | 5 | 5 |
| | 7.13% | 7.13% |
| | 0 | 0 |
| | 0 | 0 |

Web Explorer #10262 Bundle Size: 903.53KiB (0%). 83927eb (current) vs 81802d3 main #10258 (baseline)

Bundle metrics

| | Current #10262 | Baseline #10258 |
|---|---|---|
| | 45.06KiB | 45.06KiB |
| | 2.22KiB | 2.22KiB |
| | 0% | 0% |
| | 9 | 9 |
| | 11 | 11 |
| | 231 | 231 |
| | 11 | 11 |
| | 27.12% | 27.12% |
| | 10 | 10 |
| | 0 | 0 |

Bundle size by type: no changes

| | Current #10262 | Baseline #10258 |
|---|---|---|
| | 499.15KiB | 499.15KiB |
| | 402.16KiB | 402.16KiB |
| | 2.22KiB | 2.22KiB |

83927eb to d986c4a
034b63f to 84b229d
2ba994b to f8af74a
Actionable comments posted: 2
🧹 Nitpick comments (1)
packages/genui/ui-judge/tests/judge-android-agent.vitest.spec.ts (1)
325-334: 💤 Low value
`Promise.race` resolves immediately after SIGKILL without waiting for exit.

After the timeout, SIGKILL is sent, but the race resolves as soon as the sleep promise fulfills; nothing awaits the process actually terminating. This could leave orphan processes if SIGTERM didn't work.

For test cleanup this is low-impact (CI reaps orphans), but a cleaner approach would await exit after SIGKILL:

Optional fix

     await Promise.race([
       exitPromise,
    -  sleep(DISPOSE_TIMEOUT_MS).then(() => {
    -    if (detached) {
    -      process.kill(-child.pid!, 'SIGKILL');
    -    } else {
    -      child.kill('SIGKILL');
    -    }
    -  }),
    +  sleep(DISPOSE_TIMEOUT_MS),
     ]);
    +
    +if (!child.killed) {
    +  if (detached) {
    +    process.kill(-child.pid!, 'SIGKILL');
    +  } else {
    +    child.kill('SIGKILL');
    +  }
    +  await exitPromise;
    +}

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/genui/ui-judge/tests/judge-android-agent.vitest.spec.ts` around lines 325 - 334, The current Promise.race uses sleep(DISPOSE_TIMEOUT_MS) to send SIGKILL but resolves as soon as sleep fulfills, not when the child actually exits; update the logic so that after sleep triggers SIGKILL (using the same detached/child.pid/child.kill logic) you then await exitPromise before resolving. Specifically modify the race/timeout branch around exitPromise, sleep, DISPOSE_TIMEOUT_MS, detached, child.pid and child.kill so the kill is sent and then exitPromise is awaited (e.g., replace the plain sleep branch with an async block that sends SIGKILL and then awaits exitPromise) to ensure the child process has terminated.
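The escalation described in this nitpick can be sketched independently of the spec file. The block below is a minimal illustration, not the PR's code: `KillableProcess` and `disposeProcess` are hypothetical names, and the process is modelled as an object with a `kill` method and an `exited` promise so the sketch stays self-contained. It decides whether to escalate from the outcome of the race rather than from a `killed` flag, since Node documents `subprocess.killed` as indicating that a signal was sent, not that the process has terminated.

```typescript
// Hypothetical minimal view of a child process: only what this sketch needs.
interface KillableProcess {
  kill(signal: 'SIGTERM' | 'SIGKILL'): void;
  /** Resolves once the process has actually exited. */
  exited: Promise<void>;
}

/**
 * Ask the process to stop, escalate to SIGKILL after `timeoutMs`,
 * and return only once the exit has been observed.
 */
async function disposeProcess(
  proc: KillableProcess,
  timeoutMs: number,
): Promise<'graceful' | 'forced'> {
  proc.kill('SIGTERM');

  let timer: ReturnType<typeof setTimeout> | undefined;
  const timedOut = new Promise<'timeout'>((resolve) => {
    timer = setTimeout(() => resolve('timeout'), timeoutMs);
  });

  const winner = await Promise.race([
    proc.exited.then(() => 'exited' as const),
    timedOut,
  ]);
  clearTimeout(timer);

  if (winner === 'exited') {
    return 'graceful';
  }

  // The timeout won the race, so the process is still running: escalate,
  // then wait for the exit instead of returning right after the signal.
  proc.kill('SIGKILL');
  await proc.exited;
  return 'forced';
}
```

In the spec, `exited` would correspond to `exitPromise`, and the `kill` method would wrap the existing detached/non-detached branching.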
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@packages/genui/ui-judge/src/index.ts`:
- Around line 57-64: The interface KittenLynxJudgePage.screenshot currently
allows 'webp' but the parser used by judgeAndroidAgent only handles 'png' and
'jpeg', risking runtime failures; either remove 'webp' from the screenshot
format union in KittenLynxJudgePage (and any other similar declarations around
lines noted) or update the parsing logic in judgeAndroidAgent (and the parsing
helper used at 402-409 / 475-490) to accept and correctly decode WebP buffers;
choose one approach and make the types and the runtime parser consistent so
callers and consumers agree on supported formats.
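As an illustration of keeping the declared formats and the runtime parser in agreement, the sketch below detects the container format from the leading bytes so that an unsupported buffer can be rejected explicitly. It is an assumption-laden example, not the parser in `src/index.ts`: `detectImageFormat` and `ImageFormat` are hypothetical names, and only the well-known PNG, JPEG, and WebP (RIFF) signatures are checked.

```typescript
type ImageFormat = 'png' | 'jpeg' | 'webp';

// True when `bytes` contains `signature` starting at `offset`.
function hasSignature(
  bytes: Uint8Array,
  signature: number[],
  offset = 0,
): boolean {
  if (bytes.length < offset + signature.length) return false;
  return signature.every((value, i) => bytes[offset + i] === value);
}

function detectImageFormat(bytes: Uint8Array): ImageFormat | undefined {
  // PNG: fixed 8-byte signature.
  if (hasSignature(bytes, [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a])) {
    return 'png';
  }
  // JPEG: start-of-image marker FF D8, followed by the next marker's FF.
  if (hasSignature(bytes, [0xff, 0xd8, 0xff])) {
    return 'jpeg';
  }
  // WebP: RIFF container with "RIFF" at offset 0 and "WEBP" at offset 8.
  if (
    hasSignature(bytes, [0x52, 0x49, 0x46, 0x46])
    && hasSignature(bytes, [0x57, 0x45, 0x42, 0x50], 8)
  ) {
    return 'webp';
  }
  return undefined;
}
```

A parser that supports only PNG and JPEG could call this first and throw a descriptive error for `'webp'` or `undefined`, which turns a silent mismatch into an immediate failure.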
In `@packages/genui/ui-judge/tests/judge-android-agent.vitest.spec.ts`:
- Around line 83-91: The beforeAll hook timeout is too short (90_000) compared
to READY_TIMEOUT_MS (120_000) used by the fixture readiness polling; update the
beforeAll timeout in judge-android-agent.vitest.spec.ts (the beforeAll that
calls startKittenLynxFixtureServer, reverseFixturePort, Lynx.connect, page.goto)
to be at least READY_TIMEOUT_MS plus a small buffer (for example
READY_TIMEOUT_MS + 30_000 or 150_000) so
waitForFixtureReady/startKittenLynxFixtureServer has time to surface logs and
exit state before Vitest aborts.
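One way to keep the two timeouts from drifting apart is to derive the hook timeout from the readiness timeout. The sketch below uses the values quoted in the comment; `hookTimeoutMs` and `HOOK_BUFFER_MS` are hypothetical names introduced for illustration.

```typescript
// Value quoted in the review comment for the fixture readiness polling.
const READY_TIMEOUT_MS = 120_000;
// Assumed buffer so the readiness check can report logs before the hook aborts.
const HOOK_BUFFER_MS = 30_000;

// Derive the hook timeout so it always exceeds the readiness timeout.
function hookTimeoutMs(readyTimeoutMs: number, bufferMs: number): number {
  if (bufferMs <= 0) {
    throw new RangeError('bufferMs must be positive so the hook outlives the readiness poll');
  }
  return readyTimeoutMs + bufferMs;
}

const BEFORE_ALL_TIMEOUT_MS = hookTimeoutMs(READY_TIMEOUT_MS, HOOK_BUFFER_MS);

// In the spec this value would be passed as the second argument of the hook:
//   beforeAll(async () => { /* start fixture, connect, goto */ }, BEFORE_ALL_TIMEOUT_MS);
```

With these inputs the hook timeout is 150_000 ms, matching the example value suggested in the comment.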
---
Nitpick comments:
In `@packages/genui/ui-judge/tests/judge-android-agent.vitest.spec.ts`:
- Around line 325-334: The current Promise.race uses sleep(DISPOSE_TIMEOUT_MS)
to send SIGKILL but resolves as soon as sleep fulfills, not when the child
actually exits; update the logic so that after sleep triggers SIGKILL (using the
same detached/child.pid/child.kill logic) you then await exitPromise before
resolving. Specifically modify the race/timeout branch around exitPromise,
sleep, DISPOSE_TIMEOUT_MS, detached, child.pid and child.kill so the kill is
sent and then exitPromise is awaited (e.g., replace the plain sleep branch with
an async block that sends SIGKILL and then awaits exitPromise) to ensure the
child process has terminated.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 5944bec7-401f-4af5-8750-3f37c1b3e081
⛔ Files ignored due to path filters (1)
`pnpm-lock.yaml` is excluded by `!**/pnpm-lock.yaml`
📒 Files selected for processing (16)
- .changeset/quiet-views-report.md
- .github/a2ui-catalog.instructions.md
- .github/ui-judge.instructions.md
- .github/workflows/test.yml
- packages/genui/a2ui/package.json
- packages/genui/ui-judge/README.md
- packages/genui/ui-judge/etc/ui-judge.api.md
- packages/genui/ui-judge/package.json
- packages/genui/ui-judge/playwright.config.ts
- packages/genui/ui-judge/rslib.config.ts
- packages/genui/ui-judge/src/index.ts
- packages/genui/ui-judge/tests/judge-android-agent.vitest.spec.ts
- packages/genui/ui-judge/tsconfig.build.json
- packages/genui/ui-judge/tsconfig.json
- packages/genui/ui-judge/vitest.config.ts
- packages/testing-library/kitten-lynx/src/KittenLynxView.ts
f8af74a to 76f5a0b
Actionable comments posted: 0
76f5a0b to 1ae4164
Actionable comments posted: 1
🧹 Nitpick comments (1)
packages/genui/ui-judge/etc/ui-judge.api.md (1)
9-30: ⚡ Quick win
Add explicit release tags and TSDoc to the new public Android API types/functions.

The new public symbols are exported as undocumented with `ae-missing-release-tag` warnings. Please add explicit TSDoc release tags (and brief docs) in `packages/genui/ui-judge/src/index.ts` for `judgeAndroidAgent`, `JudgeAndroidAgentOptions`, and `KittenLynxJudgePage` so the generated API report is clean and the contract is clearer.

Also applies to: 55-67
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/genui/ui-judge/etc/ui-judge.api.md` around lines 9 - 30, The public API symbols judgeAndroidAgent, JudgeAndroidAgentOptions, and KittenLynxJudgePage are missing TSDoc release tags and docs; open the declarations in packages/genui/ui-judge/src/index.ts and add concise TSDoc comments including an explicit release tag (e.g., `@public`) above each exported function/interface/type, and add brief descriptions for the overall symbol and each option property (dimension, page, reference, steps, task, timeoutMs) so the generated API report no longer shows ae-missing-release-tag and the contract is documented.
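For readers unfamiliar with release tags, the sketch below shows what TSDoc comments with an explicit `@public` tag can look like. Everything in it is a placeholder: the `Sketch*` names, the property types, and the property descriptions are assumptions made for illustration, and only three of the option properties named in the comment are shown. The real declarations live in `packages/genui/ui-judge/src/index.ts`.

```typescript
/**
 * Placeholder for the page handed to the judge. The shape is an assumption
 * for this sketch, not the real interface.
 *
 * @public
 */
export interface SketchJudgePage {
  /** Captures the current screen as encoded image bytes. */
  screenshot(): Promise<Uint8Array>;
}

/**
 * Placeholder options type. Property types and meanings are guessed.
 *
 * @public
 */
export interface SketchJudgeOptions {
  /** Page to capture and judge. */
  page: SketchJudgePage;
  /** Description of the task the UI is judged against. */
  task: string;
  /** Upper bound for the judging run, in milliseconds. */
  timeoutMs?: number;
}

/**
 * Resolves the effective timeout, falling back to a caller-supplied default.
 *
 * @public
 */
export function resolveTimeoutMs(
  options: SketchJudgeOptions,
  fallbackMs: number,
): number {
  return options.timeoutMs ?? fallbackMs;
}
```

Each exported symbol carries its own release tag; the per-property comments are what surfaces as documentation in the generated API report.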
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@packages/genui/package.json`:
- Around line 98-99: The package.json "build" script no longer builds required
subpackages (a2ui, a2ui-prompt, a2ui-catalog-extractor, openui) which can
produce missing/stale publish artifacts; update the "build" script so it
deterministically runs the build for those subpackages before running tsc (e.g.,
invoke the monorepo package builds for a2ui, a2ui-prompt,
a2ui-catalog-extractor, openui prior to "tsc --project tsconfig.build.json"),
keeping the existing "clean" step intact; modify the "build" script entry (the
"build" NPM script) to run the subpackage builds then the TypeScript build so
exported/published artifacts are always produced.
---
Nitpick comments:
In `@packages/genui/ui-judge/etc/ui-judge.api.md`:
- Around line 9-30: The public API symbols judgeAndroidAgent,
JudgeAndroidAgentOptions, and KittenLynxJudgePage are missing TSDoc release tags
and docs; open the declarations in packages/genui/ui-judge/src/index.ts and add
concise TSDoc comments including an explicit release tag (e.g., `@public`) above
each exported function/interface/type, and add brief descriptions for the
overall symbol and each option property (dimension, page, reference, steps,
task, timeoutMs) so the generated API report no longer shows
ae-missing-release-tag and the contract is documented.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 6a5ca9fe-0678-4bbd-bb94-4e13c9e17ccd
⛔ Files ignored due to path filters (1)
`pnpm-lock.yaml` is excluded by `!**/pnpm-lock.yaml`
📒 Files selected for processing (17)
- .changeset/quiet-views-report.md
- .github/a2ui-catalog.instructions.md
- .github/ui-judge.instructions.md
- .github/workflows/test.yml
- examples/react-externals/package.json
- packages/genui/package.json
- packages/genui/ui-judge/README.md
- packages/genui/ui-judge/etc/ui-judge.api.md
- packages/genui/ui-judge/package.json
- packages/genui/ui-judge/playwright.config.ts
- packages/genui/ui-judge/rslib.config.ts
- packages/genui/ui-judge/src/index.ts
- packages/genui/ui-judge/tests/judge-android-agent.vitest.spec.ts
- packages/genui/ui-judge/tsconfig.build.json
- packages/genui/ui-judge/tsconfig.json
- packages/genui/ui-judge/vitest.config.ts
- packages/testing-library/kitten-lynx/src/KittenLynxView.ts
✅ Files skipped from review due to trivial changes (4)
- packages/genui/ui-judge/tsconfig.json
- packages/genui/ui-judge/tsconfig.build.json
- .changeset/quiet-views-report.md
- .github/ui-judge.instructions.md
🚧 Files skipped from review as they are similar to previous changes (10)
- .github/a2ui-catalog.instructions.md
- .github/workflows/test.yml
- packages/genui/ui-judge/vitest.config.ts
- packages/genui/ui-judge/package.json
- packages/genui/ui-judge/README.md
- packages/genui/ui-judge/rslib.config.ts
- packages/genui/ui-judge/playwright.config.ts
- packages/testing-library/kitten-lynx/src/KittenLynxView.ts
- packages/genui/ui-judge/tests/judge-android-agent.vitest.spec.ts
- packages/genui/ui-judge/src/index.ts
Summary
`judgeAndroidAgent` to mirror `judgePage` by accepting a `page` from `@lynx-js/kitten-lynx-test-infra`'s `Lynx.newPage()` instead of a caller-supplied Midscene agent.
Validation
- `pnpm turbo build --filter @lynx-js/kitten-lynx-test-infra`
- `pnpm --filter @lynx-js/kitten-lynx-test-infra test`
- `pnpm --filter @lynx-js/ui-judge build`
- `pnpm --filter @lynx-js/ui-judge run api-extractor`
- `pnpm --filter @lynx-js/ui-judge run test:android`
- `UI_JUDGE_ANDROID_INTEGRATION=1 pnpm --filter @lynx-js/ui-judge run test:android`
- `env -u MIDSCENE_MODEL_NAME pnpm --filter @lynx-js/ui-judge run test:playwright`
- `pnpm eslint --cache --fix --no-warn-ignored --flag v10_config_lookup_from_file ...changed files...`
- `pnpm biome check ...changed files...`
- `pnpm dprint check ...changed files...`
- `git diff --check`
Summary by CodeRabbit
New Features
Documentation
Chores