test(ui-judge): add GEQI dimension scoring by PupilTong · Pull Request #2693 · lynx-family/lynx-stack

PupilTong · 2026-05-22T08:25:31Z

Summary

- Add GEQI scoring dimensions and dimension-specific Midscene prompts to judgePage.
- Score each A2UI playground example across the five weighted GEQI dimensions.
- Update the UI Judge PR comment to show weighted 100-point GEQI summaries while preserving raw 1-5 results.

Test Plan

./node_modules/.bin/dprint fmt packages/genui/ui-judge/src/index.ts packages/genui/ui-judge/tests/judge-page.spec.ts packages/genui/ui-judge/README.md .github/actions/ui-judge-comment/comment.mjs .github/actions/ui-judge-comment/README.md .github/scripts/write-ui-judge-result.mjs .github/ui-judge.instructions.md
./node_modules/.bin/biome check packages/genui/ui-judge/src/index.ts packages/genui/ui-judge/tests/judge-page.spec.ts .github/actions/ui-judge-comment/comment.mjs .github/scripts/write-ui-judge-result.mjs
CI=1 pnpm --filter @lynx-js/ui-judge exec tsc -p tsconfig.json
env -u MIDSCENE_MODEL_NAME -u MIDSCENE_MODEL_API_KEY -u MIDSCENE_OPENAI_INIT_CONFIG_JSON CI=1 pnpm --filter @lynx-js/ui-judge test
INPUT_DRY_RUN=true INPUT_RESULT_JSON='' node .github/actions/ui-judge-comment/comment.mjs
UI_JUDGE_RESULT_FILE=/private/tmp/ui-judge-fallback.json UI_JUDGE_RESULT_ERROR_MESSAGE='Midscene secrets are unavailable; UI Judge model test was skipped.' node .github/scripts/write-ui-judge-result.mjs && INPUT_DRY_RUN=true INPUT_RESULT_FILE=/private/tmp/ui-judge-fallback.json node .github/actions/ui-judge-comment/comment.mjs
pnpm turbo build --filter @lynx-js/ui-judge

Summary by CodeRabbit

New Features
- UI Judge now scores across five GEQI dimensions with per-example weights and shows both weighted 0–100 GEQI summaries and raw 1–5 Likert scores.
- PR comments and result tables now include example IDs, dimension labels, per-example weights, and a separate weighted GEQI summary alongside visual-correctness.
Documentation
- Updated instructions and READMEs to describe multi-dimensional GEQI scoring, required dimension labels/weights, and how results are rendered.

changeset-bot · 2026-05-22T08:25:37Z

⚠️ No Changeset found

Latest commit: 42b4346

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

coderabbitai · 2026-05-22T08:25:46Z

Walkthrough

This PR extends UI Judge from single visual-correctness scoring to multi-dimension GEQI scoring across five weighted dimensions. The core judgePage, prompt registry, result initialization, comment rendering, and tests are updated to support dimension selection, weighted aggregation, and per-result metadata (demoId, dimensionLabel, weight).

Changes

Multi-dimension GEQI scoring system

Layer / File(s)	Summary
Type contracts and dimension definitions `packages/genui/ui-judge/src/index.ts`, `packages/genui/ui-judge/tests/judge-page.spec.ts`	`UiJudgeDimension` union expands to five dimensions; `JudgePageOptions.dimension` and `UiJudgeResult.dimension` updated; tests enumerate GEQI dimension cases.
Result initialization (write defaults) `.github/scripts/write-ui-judge-result.mjs`	Adds `geqiDimensions` and initializes default JSON with a `dimensions` array containing `dimension`, `dimensionLabel`, `weight`, `score`, `error`, `steps`, and `url`.
Core judgePage normalization and error handling `packages/genui/ui-judge/src/index.ts`	Option normalization validates/resolves dimensions via `normalizeDimension`/`getResultDimension`; judgePage returns normalized dimension on success and fallback on error.
Dimension-driven prompt registry and scoring `packages/genui/ui-judge/src/index.ts`	Introduces `JUDGE_DIMENSION_PROMPTS` and `buildJudgePrompt` to compose dimension-specific rubrics; grading uses Likert-style 1–5 (plus 0).
Comment generation with weighted summaries and metadata `.github/actions/ui-judge-comment/comment.mjs`, `.github/actions/ui-judge-comment/README.md`	Normalizes `demoId`/`dimensionLabel`/`weight`, validates `weight`, computes weighted GEQI aggregates, conditionally renders GEQI weighted intro and dimension-summary table, extends result tables with Example/Weight and per-dimension columns, and centralizes table row formatting.
Test coverage for multi-dimension scoring `packages/genui/ui-judge/tests/judge-page.spec.ts`	Test loop now upserts visual and per-dimension results into a Map keyed by `demoId`, writes JSON per update, and serializes `results` with `demoId` and `dimensions` arrays.
Documentation and instructions `packages/genui/ui-judge/README.md`, `.github/ui-judge.instructions.md`	README generalizes scoring description, documents `dimension` option and supported values, and instructions describe GEQI scoring rules and payload fields required for weighted PR comment rendering.
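The default result payload described in the "Result initialization" row could look roughly like the sketch below. The dimension IDs, labels, and weights are the `geqiDimensions` tuples quoted in the review thread, and the field names come from the table above. The function name `buildDefaultResult`, the zero score, and the empty `steps` array are assumptions for illustration, not the script's confirmed behavior.

```javascript
// Hypothetical sketch of a fallback payload; only the tuples and the field
// names are taken from the PR discussion. Everything else is assumed.
const geqiDimensions = [
  ['usability-interaction', 'Usability & Interaction', 30],
  ['visual-aesthetics', 'Visual & Aesthetics', 25],
  ['consistency-standards', 'Consistency & Standards', 15],
  ['architecture-writing', 'Architecture & UX Writing', 15],
  ['accessibility-performance', 'Accessibility & Performance', 15],
];

// Build one placeholder entry per GEQI dimension for a single example.
function buildDefaultResult(demoId, url, errorMessage) {
  return {
    demoId,
    dimensions: geqiDimensions.map(([dimension, dimensionLabel, weight]) => ({
      dimension,
      dimensionLabel,
      weight,
      score: 0, // assumed: 0 marks "not scored"
      error: errorMessage,
      steps: [], // assumed: no judge steps were recorded
      url,
    })),
  };
}

const fallback = buildDefaultResult(
  'recs',
  'https://example.test/recs', // placeholder URL
  'UI Judge model test was skipped.',
);
console.log(JSON.stringify(fallback, null, 2));
```

Carrying `dimensionLabel` and `weight` on every entry is what lets the comment action render weighted summaries without a separate lookup table.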

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

lynx-family/lynx-stack#2689: Updates judge-page test coverage and multi-result writing similar to this PR's test changes.
lynx-family/lynx-stack#2629: Introduces the original judgePage visual-correctness implementation that this PR extends to multi-dimension scoring.
lynx-family/lynx-stack#2673: Previous changes to the UI Judge comment action that this PR further extends with weighted summaries and dimension metadata.

Suggested reviewers

Sherry-hue
HuJean
colinaaa

Poem

🐰 I hopped through prompts and weights today,
Five dim'ns to judge each demo's play.
Labels, demos, scores aligned in rows,
A carrot for each weighted GEQI that grows.
Cheers — the rabbit claps and twirls away.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly and specifically describes the main change: adding GEQI dimension scoring to the UI Judge test. It directly summarizes the primary objective of the PR.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

coderabbitai

🧹 Nitpick comments (2)

.github/scripts/write-ui-judge-result.mjs (1)

11-17: ⚡ Quick win

Add a fail-fast 100-point weight invariant.

A typo in geqiDimensions weights can silently skew GEQI summary percentages. Please validate total weight once before writing results.

Proposed patch

 const geqiDimensions = [
   ['usability-interaction', 'Usability & Interaction', 30],
   ['visual-aesthetics', 'Visual & Aesthetics', 25],
   ['consistency-standards', 'Consistency & Standards', 15],
   ['architecture-writing', 'Architecture & UX Writing', 15],
   ['accessibility-performance', 'Accessibility & Performance', 15],
 ];
+
+const totalWeight = geqiDimensions.reduce((sum, [, , weight]) => sum + weight, 0);
+if (totalWeight !== 100) {
+  throw new Error(`GEQI weights must sum to 100, got ${totalWeight}.`);
+}

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/scripts/write-ui-judge-result.mjs around lines 11 - 17, Validate
that the sum of the weight values in the geqiDimensions array equals 100 before
proceeding to write results; compute the total by summing the third element of
each tuple in geqiDimensions and, if the total !== 100, throw an error or log a
clear message and exit/fail-fast so the script stops rather than producing
skewed percentages—add this check early in the script (before any result
aggregation or file writes) and reference the geqiDimensions variable when
locating the change.

.github/actions/ui-judge-comment/comment.mjs (1)

258-300: ⚡ Quick win

Consider validating weight consistency across results for the same dimension.

When multiple results share the same dimension (lines 264-270), the code keeps the weight from the first result encountered and never checks whether later results for that dimension carry the same weight. If the input data contains inconsistent weights for one dimension, the aggregation silently uses whichever weight it saw first.

Given that GEQI dimension weights are model constants, all results for the same dimension should carry identical weights. Adding validation would catch input data issues early.
🛡️ Suggested validation
     const existing = dimensionsById.get(result.dimension);
     if (existing) {
+      if (existing.weight !== result.weight) {
+        throw new Error(
+          `Dimension "${result.dimension}" has inconsistent weights: ${existing.weight} vs ${result.weight}`
+        );
+      }
       existing.count += 1;
       existing.errorCount += result.error ? 1 : 0;
       existing.score += result.score;
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/actions/ui-judge-comment/comment.mjs around lines 258 - 300, In
buildWeightedSummary, validate that all results for the same dimension use the
same weight: when merging a new result into dimensionsById (the existing object
created for result.dimension), check that existing.weight === result.weight and
if not, surface a failure (throw an Error or log and return undefined) so
inconsistent input weights aren't silently accepted; update the merge logic
around the existing variable in the for loop to perform this check before
incrementing existing.count/errorCount/score.
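A self-contained version of the fail-fast check might look like the sketch below. It mirrors the merge loop quoted in the suggestion, but the function name `aggregateByDimension`, the input shape, and the derived `average` field are assumptions made for illustration; the real `buildWeightedSummary` may differ.

```javascript
// Hypothetical sketch: group results by dimension and refuse to aggregate
// when two results for the same dimension disagree on weight.
function aggregateByDimension(results) {
  const dimensionsById = new Map();

  for (const result of results) {
    const existing = dimensionsById.get(result.dimension);
    if (existing) {
      if (existing.weight !== result.weight) {
        throw new Error(
          `Dimension "${result.dimension}" has inconsistent weights: `
            + `${existing.weight} vs ${result.weight}`,
        );
      }
      existing.count += 1;
      existing.errorCount += result.error ? 1 : 0;
      existing.score += result.score;
      continue;
    }

    dimensionsById.set(result.dimension, {
      count: 1,
      dimension: result.dimension,
      errorCount: result.error ? 1 : 0,
      label: result.dimensionLabel || result.dimension,
      score: result.score,
      weight: result.weight,
    });
  }

  // Add a per-dimension average of the raw 1-5 scores.
  return [...dimensionsById.values()].map((entry) => ({
    ...entry,
    average: entry.score / entry.count,
  }));
}

const summary = aggregateByDimension([
  { dimension: 'visual-aesthetics', dimensionLabel: 'Visual & Aesthetics', weight: 25, score: 2 },
  { dimension: 'visual-aesthetics', dimensionLabel: 'Visual & Aesthetics', weight: 25, score: 3 },
]);
console.log(summary[0].average); // 2.5
```

Throwing stops the comment from being posted with a wrong total; whether that is preferable to posting a degraded comment is a judgment call for the action's maintainers.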

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 0ca79c4b-ca98-4017-8d68-c744c5e6d2b1

📥 Commits

Reviewing files that changed from the base of the PR and between 1851187 and b01553e.

📒 Files selected for processing (7)

.github/actions/ui-judge-comment/README.md
.github/actions/ui-judge-comment/comment.mjs
.github/scripts/write-ui-judge-result.mjs
.github/ui-judge.instructions.md
packages/genui/ui-judge/README.md
packages/genui/ui-judge/src/index.ts
packages/genui/ui-judge/tests/judge-page.spec.ts

codecov · 2026-05-22T08:39:21Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ All tests successful. No failed tests found.

codspeed-hq · 2026-05-22T08:47:17Z

Merging this PR will improve performance by 17.63%

⚠️ Different runtime environments detected

Some benchmarks with significant performance changes were compared across different runtime environments, which may affect the accuracy of the results.

⚡ 1 improved benchmark
✅ 80 untouched benchmarks
⏩ 26 skipped benchmarks¹

Performance Changes

	Benchmark	`BASE`	`HEAD`	Efficiency
⚡	`transform 1000 view elements`	47.3 ms	40.2 ms	+17.63%

Comparing hw/codex/ui-judge-geqi-dimensions (42b4346) with main (11ef105)²

¹ 26 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, they can be archived to remove them from the performance reports.
² No successful run was found on main (1851187) during the generation of this report, so 11ef105 was used instead as the comparison base. There might be some changes unrelated to this pull request in this report.

relativeci · 2026-05-22T08:48:31Z

React Example with Element Template

#881 Bundle Size — 201.67KiB (0%).

42b4346(current) vs e73c383 main#880(baseline)

Bundle metrics no changes

	Current #881	Baseline #880
Initial JS	`0B`	`0B`
Initial CSS	`0B`	`0B`
Cache Invalidation	`0%`	`0%`
Chunks	`0`	`0`
Assets	`4`	`4`
Modules	`99`	`99`
Duplicate Modules	`30`	`30`
Duplicate Code	`39.25%`	`39.25%`
Packages	`2`	`2`
Duplicate Packages	`0`	`0`

Bundle size by type no changes

	Current #881	Baseline #880
IMG	`145.76KiB`	`145.76KiB`
Other	`55.91KiB`	`55.91KiB`

relativeci · 2026-05-22T08:48:33Z

React External

#1729 Bundle Size — 698.01KiB (0%).

42b4346(current) vs e73c383 main#1728(baseline)

Bundle metrics no changes

	Current #1729	Baseline #1728
Initial JS	`0B`	`0B`
Initial CSS	`0B`	`0B`
Cache Invalidation	`0%`	`0%`
Chunks	`0`	`0`
Assets	`3`	`3`
Modules	`17`	`17`
Duplicate Modules	`5`	`5`
Duplicate Code	`8.59%`	`8.59%`
Packages	`0`	`0`
Duplicate Packages	`0`	`0`

Bundle size by type no changes

	Current #1729	Baseline #1728
Other	`698.01KiB`	`698.01KiB`

relativeci · 2026-05-22T08:48:38Z

React MTF Example

#1746 Bundle Size — 208.75KiB (0%).

42b4346(current) vs e73c383 main#1745(baseline)

Bundle metrics no changes

	Current #1746	Baseline #1745
Initial JS	`0B`	`0B`
Initial CSS	`0B`	`0B`
Cache Invalidation	`0%`	`0%`
Chunks	`0`	`0`
Assets	`3`	`3`
Modules	`195`	`195`
Duplicate Modules	`77`	`77`
Duplicate Code	`44.17%`	`44.17%`
Packages	`2`	`2`
Duplicate Packages	`0`	`0`

Bundle size by type no changes

	Current #1746	Baseline #1745
IMG	`111.23KiB`	`111.23KiB`
Other	`97.52KiB`	`97.52KiB`

relativeci · 2026-05-22T08:48:39Z

React Example

#8612 Bundle Size — 237.81KiB (0%).

42b4346(current) vs e73c383 main#8611(baseline)

Bundle metrics no changes

	Current #8612	Baseline #8611
Initial JS	`0B`	`0B`
Initial CSS	`0B`	`0B`
Cache Invalidation	`0%`	`0%`
Chunks	`0`	`0`
Assets	`4`	`4`
Modules	`200`	`200`
Duplicate Modules	`80`	`80`
Duplicate Code	`44.68%`	`44.68%`
Packages	`2`	`2`
Duplicate Packages	`0`	`0`

Bundle size by type no changes

	Current #8612	Baseline #8611
IMG	`145.76KiB`	`145.76KiB`
Other	`92.05KiB`	`92.05KiB`

relativeci · 2026-05-22T08:48:40Z

Web Explorer

#10188 Bundle Size — 903.53KiB (0%).

42b4346(current) vs e73c383 main#10187(baseline)

Bundle metrics no changes

	Current #10188	Baseline #10187
Initial JS	`45.06KiB`	`45.06KiB`
Initial CSS	`2.22KiB`	`2.22KiB`
Cache Invalidation	`0%`	`0%`
Chunks	`9`	`9`
Assets	`11`	`11`
Modules	`230`	`230`
Duplicate Modules	`11`	`11`
Duplicate Code	`27.12%`	`27.12%`
Packages	`10`	`10`
Duplicate Packages	`0`	`0`

Bundle size by type no changes

	Current #10188	Baseline #10187
JS	`499.15KiB`	`499.15KiB`
Other	`402.16KiB`	`402.16KiB`
CSS	`2.22KiB`	`2.22KiB`

github-actions · 2026-05-22T09:07:48Z

UI Judge

GEQI weighted score: 57.5 / 100 across 8 examples.
Average visual-correctness score: 3.1 / 5.

Dimension	Weight	Average	Results	Status
Usability & Interaction	30%	2.9 / 5	8	OK
Visual & Aesthetics	25%	2.9 / 5	8	OK
Consistency & Standards	15%	2.9 / 5	8	OK
Architecture & UX Writing	15%	2.8 / 5	8	OK
Accessibility & Performance	15%	3 / 5	8	OK

#	Example	Visual Correctness	Usability & Interaction (30%)	Visual & Aesthetics (25%)	Consistency & Standards (15%)	Architecture & UX Writing (15%)	Accessibility & Performance (15%)	GEQI	Page	Status
1	recs	2 / 5	2 / 5	2 / 5	2 / 5	2 / 5	2 / 5	40 / 100	preview	OK
2	cast-grid	5 / 5	3 / 5	3 / 5	4 / 5	5 / 5	4 / 5	72 / 100	preview	OK
3	citywalk-list	2 / 5	2 / 5	3 / 5	2 / 5	1 / 5	3 / 5	45 / 100	preview	OK
4	fridge-search	2 / 5	2 / 5	2 / 5	2 / 5	2 / 5	2 / 5	40 / 100	preview	OK
5	trip-planner	2 / 5	2 / 5	3 / 5	2 / 5	2 / 5	2 / 5	45 / 100	preview	OK
6	weather-current	5 / 5	5 / 5	4 / 5	5 / 5	4 / 5	5 / 5	92 / 100	preview	OK
7	product-card	5 / 5	5 / 5	4 / 5	4 / 5	4 / 5	4 / 5	86 / 100	preview	OK
8	workout-plan	2 / 5	2 / 5	2 / 5	2 / 5	2 / 5	2 / 5	40 / 100	preview	OK
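The per-example GEQI column is consistent with summing score × weight over the five dimensions and dividing by 5, the top of the Likert scale. The sketch below applies that formula to two rows of the table. It is one formula that reproduces the reported numbers, not necessarily the exact code in `comment.mjs`; the dimension IDs are taken from the `geqiDimensions` tuples in the review thread, and the names `GEQI_WEIGHTS` and `geqiScore` are hypothetical.

```javascript
// Weights in percent, keyed by dimension ID (from the review thread).
const GEQI_WEIGHTS = {
  'usability-interaction': 30,
  'visual-aesthetics': 25,
  'consistency-standards': 15,
  'architecture-writing': 15,
  'accessibility-performance': 15,
};

// Convert raw 1-5 scores into a 0-100 weighted score.
// Assumed formula: sum(score * weight) / 5, with weights summing to 100.
function geqiScore(scores) {
  let weightedSum = 0;
  for (const [dimension, weight] of Object.entries(GEQI_WEIGHTS)) {
    weightedSum += scores[dimension] * weight;
  }
  return weightedSum / 5;
}

// Row 2 of the table (cast-grid): 3, 3, 4, 5, 4.
const castGrid = {
  'usability-interaction': 3,
  'visual-aesthetics': 3,
  'consistency-standards': 4,
  'architecture-writing': 5,
  'accessibility-performance': 4,
};

// Row 6 of the table (weather-current): 5, 4, 5, 4, 5.
const weatherCurrent = {
  'usability-interaction': 5,
  'visual-aesthetics': 4,
  'consistency-standards': 5,
  'architecture-writing': 4,
  'accessibility-performance': 5,
};

console.log(geqiScore(castGrid)); // 72
console.log(geqiScore(weatherCurrent)); // 92
```

Averaging the eight per-example values in the table (40, 72, 45, 40, 45, 92, 86, 40) gives 460 / 8 = 57.5, which matches the headline score.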

Details

Result 1

Example: recs
Dimension: visual-correctness
Visual correctness: 2 / 5
GEQI dimensions:
- Usability & Interaction: 2 / 5 (30%)
- Visual & Aesthetics: 2 / 5 (25%)
- Consistency & Standards: 2 / 5 (15%)
- Architecture & UX Writing: 2 / 5 (15%)
- Accessibility & Performance: 2 / 5 (15%)
Task: The A2UI playground preview should show date-night dining recommendations for Moonlight Terrace, Pinewood Bistro, and Sea Breeze Kitchen.

Result 2

Example: cast-grid
Dimension: visual-correctness
Visual correctness: 5 / 5
GEQI dimensions:
- Usability & Interaction: 3 / 5 (30%)
- Visual & Aesthetics: 3 / 5 (25%)
- Consistency & Standards: 4 / 5 (15%)
- Architecture & UX Writing: 5 / 5 (15%)
- Accessibility & Performance: 4 / 5 (15%)
Task: The A2UI playground preview should show a cast grid for the short film Night Notes, including Lin Xia and Zhou Ning cast cards.

Result 3

Example: citywalk-list
Dimension: visual-correctness
Visual correctness: 2 / 5
GEQI dimensions:
- Usability & Interaction: 2 / 5 (30%)
- Visual & Aesthetics: 3 / 5 (25%)
- Consistency & Standards: 2 / 5 (15%)
- Architecture & UX Writing: 1 / 5 (15%)
- Accessibility & Performance: 3 / 5 (15%)
Task: The A2UI playground preview should show weekend citywalk coffee picks with Rooftop Brew Room, Corner Canvas Lab, and Late Sun Roastery.

Result 4

Example: fridge-search
Dimension: visual-correctness
Visual correctness: 2 / 5
GEQI dimensions:
- Usability & Interaction: 2 / 5 (30%)
- Visual & Aesthetics: 2 / 5 (25%)
- Consistency & Standards: 2 / 5 (15%)
- Architecture & UX Writing: 2 / 5 (15%)
- Accessibility & Performance: 2 / 5 (15%)
Task: The A2UI playground preview should show refrigerator search results with Siemens, Hualing, Haier, and Midea product cards.

Result 5

Example: trip-planner
Dimension: visual-correctness
Visual correctness: 2 / 5
GEQI dimensions:
- Usability & Interaction: 2 / 5 (30%)
- Visual & Aesthetics: 3 / 5 (25%)
- Consistency & Standards: 2 / 5 (15%)
- Architecture & UX Writing: 2 / 5 (15%)
- Accessibility & Performance: 2 / 5 (15%)
Task: The A2UI playground preview should show a Kyoto 48-hour trip planner with Day 1 and Day 2 itinerary sections, including Monkey Park Viewpoint.

Result 6

Example: weather-current
Dimension: visual-correctness
Visual correctness: 5 / 5
GEQI dimensions:
- Usability & Interaction: 5 / 5 (30%)
- Visual & Aesthetics: 4 / 5 (25%)
- Consistency & Standards: 5 / 5 (15%)
- Architecture & UX Writing: 4 / 5 (15%)
- Accessibility & Performance: 5 / 5 (15%)
Task: The A2UI playground preview should show the current weather for Austin, TX, including clear skies with light breeze.

Result 7

Example: product-card
Dimension: visual-correctness
Visual correctness: 5 / 5
GEQI dimensions:
- Usability & Interaction: 5 / 5 (30%)
- Visual & Aesthetics: 4 / 5 (25%)
- Consistency & Standards: 4 / 5 (15%)
- Architecture & UX Writing: 4 / 5 (15%)
- Accessibility & Performance: 4 / 5 (15%)
Task: The A2UI playground preview should show a Wireless Headphones Pro product card with a visible Add to Cart action.

Result 8

Example: workout-plan
Dimension: visual-correctness
Visual correctness: 2 / 5
GEQI dimensions:
- Usability & Interaction: 2 / 5 (30%)
- Visual & Aesthetics: 2 / 5 (25%)
- Consistency & Standards: 2 / 5 (15%)
- Architecture & UX Writing: 2 / 5 (15%)
- Accessibility & Performance: 2 / 5 (15%)
Task: The A2UI playground preview should show a weekly workout plan with five days from Monday Ramp-Up through Friday Conditioning.

Workflow run (attempt 3)

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.github/actions/ui-judge-comment/comment.mjs:
- Around line 327-342: When aggregating into dimensionsById, validate that
subsequent result.weight values match the first stored weight for that dimension
(check existing.weight vs result.weight inside the existing branch handling for
result.dimension); if they differ, do not silently ignore—record the
inconsistency by (for example) adding a weightInconsistent flag and a
weightsSeen array or incrementing a mismatch counter on the existing dimension
object and emit a warning/log entry so the downstream GEQI weighted score
calculation can detect and surface bad input data instead of using a silently
wrong weight.
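The prompt above describes a non-throwing alternative: keep aggregating, but record the disagreement so later stages can surface it. A minimal sketch of that idea follows. The `weightInconsistent` and `weightsSeen` names come from the prompt itself, while the helper name `mergeResult` and the use of `console.warn` are assumptions.

```javascript
// Hypothetical sketch: merge one result into the per-dimension map and flag
// weight mismatches instead of throwing.
function mergeResult(dimensionsById, result) {
  const existing = dimensionsById.get(result.dimension);

  if (!existing) {
    dimensionsById.set(result.dimension, {
      count: 1,
      dimension: result.dimension,
      errorCount: result.error ? 1 : 0,
      label: result.dimensionLabel || result.dimension,
      score: result.score,
      weight: result.weight,
      weightInconsistent: false,
      weightsSeen: [result.weight],
    });
    return;
  }

  if (!existing.weightsSeen.includes(result.weight)) {
    // Remember every distinct weight so the caller can report them all.
    existing.weightsSeen.push(result.weight);
    existing.weightInconsistent = true;
    console.warn(
      `Dimension "${result.dimension}" has inconsistent weights: `
        + existing.weightsSeen.join(', '),
    );
  }

  existing.count += 1;
  existing.errorCount += result.error ? 1 : 0;
  existing.score += result.score;
}

const dimensionsById = new Map();
mergeResult(dimensionsById, { dimension: 'visual-aesthetics', weight: 25, score: 3 });
mergeResult(dimensionsById, { dimension: 'visual-aesthetics', weight: 30, score: 4 });
console.log(dimensionsById.get('visual-aesthetics').weightInconsistent); // true
```

With this shape, the summary renderer can decide how to react, for example by marking the dimension's status as something other than OK.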

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: ab144c57-f234-428d-8afe-d7289f6b844e

📥 Commits

Reviewing files that changed from the base of the PR and between b01553e and 42b4346.

📒 Files selected for processing (7)

.github/actions/ui-judge-comment/README.md
.github/actions/ui-judge-comment/comment.mjs
.github/scripts/write-ui-judge-result.mjs
.github/ui-judge.instructions.md
packages/genui/ui-judge/README.md
packages/genui/ui-judge/src/index.ts
packages/genui/ui-judge/tests/judge-page.spec.ts

✅ Files skipped from review due to trivial changes (3)

.github/actions/ui-judge-comment/README.md
.github/ui-judge.instructions.md
packages/genui/ui-judge/README.md

coderabbitai · 2026-05-22T09:34:28Z

+    const existing = dimensionsById.get(result.dimension);
+    if (existing) {
+      existing.count += 1;
+      existing.errorCount += result.error ? 1 : 0;
+      existing.score += result.score;
+      continue;
+    }
+
+    dimensionsById.set(result.dimension, {
+      count: 1,
+      dimension: result.dimension,
+      errorCount: result.error ? 1 : 0,
+      label: result.dimensionLabel || result.dimension,
+      score: result.score,
+      weight: result.weight,
+    });


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Validate per-dimension weight consistency before aggregation.

On Line 341, the first encountered weight for a dimension is reused, and later conflicting weights are silently ignored. That can produce incorrect GEQI weighted scores in the PR comment without surfacing data issues.

Suggested fix

 for (const result of weightedResults) {
   const existing = dimensionsById.get(result.dimension);
   if (existing) {
+    if (existing.weight !== result.weight) {
+      throw new Error(
+        `Inconsistent weight for dimension "${result.dimension}": `
+          + `${existing.weight} vs ${result.weight}.`,
+      );
+    }
     existing.count += 1;
     existing.errorCount += result.error ? 1 : 0;
     existing.score += result.score;
     continue;
   }

Also applies to: 345-363

coderabbitai Bot reviewed May 22, 2026

test(ui-judge): add GEQI dimension scoring

42b4346

PupilTong force-pushed the hw/codex/ui-judge-geqi-dimensions branch from b01553e to 42b4346 Compare May 22, 2026 09:28

coderabbitai Bot reviewed May 22, 2026
