test(ui-judge): score more playground examples #2689
📝 Walkthrough
The UI Judge test suite expands from evaluating a single playground demo to comprehensively scoring all
Changes
UI Judge multi-demo test expansion
Estimated code review effort
🎯 2 (Simple) | ⏱️ ~12 minutes
Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Actionable comments posted: 1
🧹 Nitpick comments (1)
packages/genui/ui-judge/tests/judge-page.spec.ts (1)
150-153: ⚡ Quick win
Persist `demoId` alongside each result for a stable identifier.
The JSON entry currently depends on mutable `task` text and environment-specific URL. Including `demoId` makes downstream diffing and trend tracking deterministic.
💡 Proposed refactor

     interface JudgedPlaygroundResult {
    +  demoId: string;
       result: UiJudgeResult;
       task: string;
     }
    @@
     judgedResults.push({
    +  demoId: demo.demoId,
       result,
       task: demo.task,
     });
    @@
    -  results: judgedResults.map(({ result, task }) => ({
    +  results: judgedResults.map(({ demoId, result, task }) => ({
    +    demoId,
         ...result,
         task,
       })),

Also applies to: 205-208, 224-227
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/genui/ui-judge/tests/judge-page.spec.ts` around lines 150 - 153, Include a stable demo identifier when appending judged results: when pushing into judgedResults (the push that currently uses { result, task: demo.task }), add demoId: demo.demoId so each JSON entry contains a deterministic id; make the same change for the other two places that push results (the similar pushes around the other occurrences) to ensure all saved entries include demoId alongside result and task.
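To illustrate why a stable `demoId` helps downstream diffing, here is a minimal sketch that compares two runs entry by entry. Only `demoId`, `result`, and `task` come from the suggestion above; the `score` field on `UiJudgeResult` and the helpers `indexByDemoId` and `diffScores` are hypothetical names introduced for illustration and may not match the real test file.

```typescript
// Hypothetical shape: the real UiJudgeResult is not shown in this review,
// so a single numeric `score` field is assumed here.
interface UiJudgeResult {
  score: number;
}

interface JudgedPlaygroundResult {
  demoId: string;
  result: UiJudgeResult;
  task: string;
}

// Index one run's results by demoId so entries can be looked up directly.
function indexByDemoId(
  results: JudgedPlaygroundResult[],
): Map<string, JudgedPlaygroundResult> {
  return new Map(results.map((entry) => [entry.demoId, entry] as const));
}

// Compute the per-demo score change between two runs.
// Matching uses demoId only, so rewording a task does not break the pairing.
function diffScores(
  previous: JudgedPlaygroundResult[],
  current: JudgedPlaygroundResult[],
): Record<string, number> {
  const before = indexByDemoId(previous);
  const deltas: Record<string, number> = {};
  for (const entry of current) {
    const old = before.get(entry.demoId);
    if (old !== undefined) {
      deltas[entry.demoId] = entry.result.score - old.result.score;
    }
  }
  return deltas;
}
```

Demos that appear in only one of the two runs are skipped rather than reported, which is a design choice of this sketch and not something the review prescribes.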
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@packages/genui/ui-judge/tests/judge-page.spec.ts`:
- Around line 123-124: The global test timeout set by test.setTimeout(1_200_000)
is too small given judgePage is awaited for up to timeoutMs: 180_000 across 8
demos and additional waitForPreviewText delays; increase the timeout to a value
that covers 8 * 180_000 plus the per-demo waitForPreviewText overhead (suggest
using test.setTimeout(1_800_000) or 2_000_000) so the loop that calls
judgePage(...) and waitForPreviewText(...) has enough time to complete.
---
Nitpick comments:
In `@packages/genui/ui-judge/tests/judge-page.spec.ts`:
- Around line 150-153: Include a stable demo identifier when appending judged
results: when pushing into judgedResults (the push that currently uses { result,
task: demo.task }), add demoId: demo.demoId so each JSON entry contains a
deterministic id; make the same change for the other two places that push
results (the similar pushes around the other occurrences) to ensure all saved
entries include demoId alongside result and task.
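The arithmetic behind the timeout finding can be sketched as follows. The demo count (8), the per-call `timeoutMs` (180_000), and the current and suggested test timeouts come from the comment above; the helper `worstCaseBudgetMs` and the 30_000 ms per-demo overhead for `waitForPreviewText` are assumptions for illustration, since the review does not state the actual overhead.

```typescript
// Values quoted in the review comment.
const DEMO_COUNT = 8;
const JUDGE_TIMEOUT_MS = 180_000;
const CURRENT_TEST_TIMEOUT_MS = 1_200_000;
const SUGGESTED_TEST_TIMEOUT_MS = 1_800_000;

// Hypothetical helper: worst-case time if every demo uses its full judge
// timeout plus a fixed per-demo overhead.
function worstCaseBudgetMs(
  demoCount: number,
  judgeTimeoutMs: number,
  perDemoOverheadMs: number,
): number {
  return demoCount * (judgeTimeoutMs + perDemoOverheadMs);
}

// Judge calls alone: 8 * 180_000 = 1_440_000 ms, above the current 1_200_000.
const judgeOnlyMs = worstCaseBudgetMs(DEMO_COUNT, JUDGE_TIMEOUT_MS, 0);

// With an assumed 30_000 ms overhead per demo: 8 * 210_000 = 1_680_000 ms,
// which still fits within the suggested 1_800_000.
const withOverheadMs = worstCaseBudgetMs(DEMO_COUNT, JUDGE_TIMEOUT_MS, 30_000);

const currentIsEnough = judgeOnlyMs <= CURRENT_TEST_TIMEOUT_MS; // false
const suggestedIsEnough = withOverheadMs <= SUGGESTED_TEST_TIMEOUT_MS; // true
```

This is a worst-case bound; typical runs may finish well inside the current timeout, so the finding matters mainly when several demos approach their individual limit.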
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: b006a33e-70ed-49d3-8d83-6f60bfbe4492
📒 Files selected for processing (1)
packages/genui/ui-judge/tests/judge-page.spec.ts
Codecov Report
✅ All modified and coverable lines are covered by tests.
UI Judge
Average score: 3.6 / 5 across 8 results.
Details
Result 1
Result 2
Result 3
Result 4
Result 5
Result 6
Result 7
Result 8
Merging this PR will degrade performance by 7.16%
| | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|
| ❌ | transform 1000 view elements | 40 ms | 43.1 ms | -7.16% |
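As a rough check on the reported figure, the sketch below recomputes the change from the two displayed timings. It assumes the efficiency column is derived as base / head - 1, which is an assumption about the report and not something stated in it; `relativeChangePercent` is a hypothetical helper. Because the displayed timings are rounded, the result only approximates the reported -7.16%.

```typescript
// Hypothetical helper: relative change of HEAD versus BASE, in percent,
// assuming the formula base / head - 1 (negative means HEAD is slower).
function relativeChangePercent(baseMs: number, headMs: number): number {
  return (baseMs / headMs - 1) * 100;
}

// Timings as displayed in the table above (rounded by the report).
const changePercent = relativeChangePercent(40, 43.1);

// Roughly -7.19 with these rounded inputs, close to the reported -7.16%.
console.log(changePercent.toFixed(2));
```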
Tip
Investigate this regression by commenting @codspeedbot fix this regression on this PR, or directly use the CodSpeed MCP with your agent.
Comparing hw/codex/ui-judge-playground-preview (d2fbed3) with main (2d64575)
Footnotes
1. 26 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, archive them to remove them from the performance reports.
React Example with Element Template #850 Bundle Size: 202.16KiB (0%).
d2fbed3 (current) vs 2d64575 main #848 (baseline)
Bundle metrics
| | Current #850 | Baseline #848 |
|---|---|---|
| | 0B | 0B |
| | 0B | 0B |
| | 0% | 0% |
| | 0 | 0 |
| | 4 | 4 |
| | 100 | 100 |
| | 30 | 30 |
| | 39.22% | 39.22% |
| | 2 | 2 |
| | 0 | 0 |
Bundle size by type (no changes)
| | Current #850 | Baseline #848 |
|---|---|---|
| | 145.76KiB | 145.76KiB |
| | 56.41KiB | 56.41KiB |
Bundle analysis report Branch hw/codex/ui-judge-playground-pre... Project dashboard
Generated by RelativeCI Documentation Report issue
Web Explorer #10157 Bundle Size: 903.53KiB (0%).
d2fbed3 (current) vs 2d64575 main #10155 (baseline)
Bundle metrics
Bundle size by type
| | Current #10157 | Baseline #10155 |
|---|---|---|
| | 499.15KiB | 499.15KiB |
| | 402.16KiB | 402.16KiB |
| | 2.22KiB | 2.22KiB |
React MTF Example #1715 Bundle Size: 208.75KiB (0%).
d2fbed3 (current) vs 2d64575 main #1713 (baseline)
Bundle metrics
| | Current #1715 | Baseline #1713 |
|---|---|---|
| | 0B | 0B |
| | 0B | 0B |
| | 0% | 0% |
| | 0 | 0 |
| | 3 | 3 |
| | 195 | 195 |
| | 77 | 77 |
| | 44.17% | 44.17% |
| | 2 | 2 |
| | 0 | 0 |
Bundle size by type (no changes)
| | Current #1715 | Baseline #1713 |
|---|---|---|
| | 111.23KiB | 111.23KiB |
| | 97.52KiB | 97.52KiB |
React External #1698 Bundle Size: 698.01KiB (0%).
d2fbed3 (current) vs 2d64575 main #1696 (baseline)
Bundle metrics
| | Current #1698 | Baseline #1696 |
|---|---|---|
| | 0B | 0B |
| | 0B | 0B |
| | 0% | 0% |
| | 0 | 0 |
| | 3 | 3 |
| | 17 | 17 |
| | 5 | 5 |
| | 8.59% | 8.59% |
| | 0 | 0 |
| | 0 | 0 |
React Example #8582 Bundle Size: 237.81KiB (0%).
d2fbed3 (current) vs 2d64575 main #8580 (baseline)
Bundle metrics
| | Current #8582 | Baseline #8580 |
|---|---|---|
| | 0B | 0B |
| | 0B | 0B |
| | 0% | 0% |
| | 0 | 0 |
| | 4 | 4 |
| | 200 | 200 |
| | 80 | 80 |
| | 44.68% | 44.68% |
| | 2 | 2 |
| | 0 | 0 |
Bundle size by type (no changes)
| | Current #8582 | Baseline #8580 |
|---|---|---|
| | 145.76KiB | 145.76KiB |
| | 92.05KiB | 92.05KiB |
Summary
Test Plan
Summary by CodeRabbit
Tests
Refactor