
test(ui-judge): add GEQI dimension scoring #2693

Open
PupilTong wants to merge 1 commit into main from hw/codex/ui-judge-geqi-dimensions


@PupilTong
Collaborator

@PupilTong PupilTong commented May 22, 2026

Summary

  • add GEQI scoring dimensions and dimension-specific Midscene prompts to judgePage
  • score each A2UI playground example across the five weighted GEQI dimensions
  • update the UI Judge PR comment to show weighted 100-point GEQI summaries while preserving raw 1-5 results
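As a rough illustration of how raw 1-5 dimension scores could be turned into the weighted 100-point GEQI summary, here is a minimal sketch. It assumes each dimension contributes (score / 5) of its percentage weight, with the 30/25/15/15/15 weights used in this PR summing to 100. The names `geqiScore` and `GEQI_WEIGHTS` are hypothetical, not the PR's code, but the formula reproduces the per-example GEQI values in the UI Judge comment on this PR (for example 72 / 100 for cast-grid).

```javascript
// Hypothetical sketch: convert raw 1-5 GEQI dimension scores into a
// weighted 0-100 score. Assumes weights are percentages summing to 100.
const GEQI_WEIGHTS = {
  'usability-interaction': 30,
  'visual-aesthetics': 25,
  'consistency-standards': 15,
  'architecture-writing': 15,
  'accessibility-performance': 15,
};

const MAX_LIKERT = 5;

function geqiScore(scores, weights = GEQI_WEIGHTS) {
  let total = 0;
  for (const [dimension, weight] of Object.entries(weights)) {
    const score = scores[dimension];
    if (typeof score !== 'number') {
      throw new Error(`Missing score for dimension "${dimension}".`);
    }
    // Each dimension contributes (score / 5) of its weight.
    total += (score / MAX_LIKERT) * weight;
  }
  // Round to one decimal place for display.
  return Math.round(total * 10) / 10;
}

// cast-grid example from the UI Judge comment on this PR.
console.log(geqiScore({
  'usability-interaction': 3,
  'visual-aesthetics': 3,
  'consistency-standards': 4,
  'architecture-writing': 5,
  'accessibility-performance': 4,
})); // 72
```

Missing dimensions throw rather than count as zero, so an incomplete result cannot quietly drag the summary down.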

Test Plan

  • ./node_modules/.bin/dprint fmt packages/genui/ui-judge/src/index.ts packages/genui/ui-judge/tests/judge-page.spec.ts packages/genui/ui-judge/README.md .github/actions/ui-judge-comment/comment.mjs .github/actions/ui-judge-comment/README.md .github/scripts/write-ui-judge-result.mjs .github/ui-judge.instructions.md
  • ./node_modules/.bin/biome check packages/genui/ui-judge/src/index.ts packages/genui/ui-judge/tests/judge-page.spec.ts .github/actions/ui-judge-comment/comment.mjs .github/scripts/write-ui-judge-result.mjs
  • CI=1 pnpm --filter @lynx-js/ui-judge exec tsc -p tsconfig.json
  • env -u MIDSCENE_MODEL_NAME -u MIDSCENE_MODEL_API_KEY -u MIDSCENE_OPENAI_INIT_CONFIG_JSON CI=1 pnpm --filter @lynx-js/ui-judge test
  • INPUT_DRY_RUN=true INPUT_RESULT_JSON='' node .github/actions/ui-judge-comment/comment.mjs
  • UI_JUDGE_RESULT_FILE=/private/tmp/ui-judge-fallback.json UI_JUDGE_RESULT_ERROR_MESSAGE='Midscene secrets are unavailable; UI Judge model test was skipped.' node .github/scripts/write-ui-judge-result.mjs && INPUT_DRY_RUN=true INPUT_RESULT_FILE=/private/tmp/ui-judge-fallback.json node .github/actions/ui-judge-comment/comment.mjs
  • pnpm turbo build --filter @lynx-js/ui-judge

Summary by CodeRabbit

  • New Features

    • UI Judge now scores across five GEQI dimensions with per-example weights and shows both weighted 0–100 GEQI summaries and raw 1–5 Likert scores.
    • PR comments and result tables now include example IDs, dimension labels, per-example weights, and a separate weighted GEQI summary alongside visual-correctness.
  • Documentation

    • Updated instructions and READMEs to describe multi-dimensional GEQI scoring, required dimension labels/weights, and how results are rendered.


@changeset-bot

changeset-bot Bot commented May 22, 2026

⚠️ No Changeset found

Latest commit: 42b4346

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types


@coderabbitai
Contributor

coderabbitai Bot commented May 22, 2026

📝 Walkthrough

This PR extends UI Judge from single visual-correctness scoring to multi-dimension GEQI scoring across five weighted dimensions. The core judgePage, prompt registry, result initialization, comment rendering, and tests are updated to support dimension selection, weighted aggregation, and per-result metadata (demoId, dimensionLabel, weight).
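The walkthrough says dimensions are validated and resolved through `normalizeDimension`/`getResultDimension`, with a fallback dimension on error. The sketch below shows one plausible shape for that logic; it is an assumption, not the PR's implementation. It assumes the option is an optional string, that `visual-correctness` stays the default, and that unknown values are rejected. `DEFAULT_DIMENSION` and `SUPPORTED_DIMENSIONS` are hypothetical names.

```javascript
// Hypothetical sketch of dimension normalization; the function names mirror
// the PR walkthrough, but the behavior shown here is an assumption.
const DEFAULT_DIMENSION = 'visual-correctness';

const SUPPORTED_DIMENSIONS = new Set([
  DEFAULT_DIMENSION,
  'usability-interaction',
  'visual-aesthetics',
  'consistency-standards',
  'architecture-writing',
  'accessibility-performance',
]);

// Resolve the requested dimension: default when none is given, reject
// anything outside the supported set.
function normalizeDimension(dimension) {
  if (dimension === undefined) return DEFAULT_DIMENSION;
  const trimmed = String(dimension).trim();
  if (!SUPPORTED_DIMENSIONS.has(trimmed)) {
    throw new Error(`Unsupported UI Judge dimension: "${dimension}".`);
  }
  return trimmed;
}

// For the error path: a result is still tagged with some dimension even
// when the requested option was invalid.
function getResultDimension(dimension) {
  try {
    return normalizeDimension(dimension);
  } catch {
    return DEFAULT_DIMENSION;
  }
}
```

Keeping the throwing and non-throwing variants separate lets option validation fail loudly while error reporting never crashes.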

Changes

Multi-dimension GEQI scoring system

Layer / File(s) Summary
Type contracts and dimension definitions
packages/genui/ui-judge/src/index.ts, packages/genui/ui-judge/tests/judge-page.spec.ts
UiJudgeDimension union expands to cover the five GEQI dimensions in addition to visual-correctness; JudgePageOptions.dimension and UiJudgeResult.dimension updated; tests enumerate GEQI dimension cases.
Result initialization (write defaults)
.github/scripts/write-ui-judge-result.mjs
Adds geqiDimensions and initializes default JSON with a dimensions array containing dimension, dimensionLabel, weight, score, error, steps, and url.
Core judgePage normalization and error handling
packages/genui/ui-judge/src/index.ts
Option normalization validates/resolves dimensions via normalizeDimension/getResultDimension; judgePage returns normalized dimension on success and fallback on error.
Dimension-driven prompt registry and scoring
packages/genui/ui-judge/src/index.ts
Introduces JUDGE_DIMENSION_PROMPTS and buildJudgePrompt to compose dimension-specific rubrics; grading uses Likert-style 1–5 (plus 0).
Comment generation with weighted summaries and metadata
.github/actions/ui-judge-comment/comment.mjs, .github/actions/ui-judge-comment/README.md
Normalizes demoId/dimensionLabel/weight, validates weight, computes weighted GEQI aggregates, conditionally renders GEQI weighted intro and dimension-summary table, extends result tables with Example/Weight and per-dimension columns, and centralizes table row formatting.
Test coverage for multi-dimension scoring
packages/genui/ui-judge/tests/judge-page.spec.ts
Test loop now upserts visual and per-dimension results into a Map keyed by demoId, writes JSON per update, and serializes results with demoId and dimensions arrays.
Documentation and instructions
packages/genui/ui-judge/README.md, .github/ui-judge.instructions.md
README generalizes scoring description, documents dimension option and supported values, and instructions describe GEQI scoring rules and payload fields required for weighted PR comment rendering.
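The test-loop change (upserting visual and per-dimension results into a Map keyed by demoId and serializing them with demoId and dimensions arrays) can be sketched roughly as follows. `upsertResult`, `serializeResults`, and the record/JSON shape are hypothetical. The real spec also rewrites the JSON file after each update, which is omitted here.

```javascript
// Hypothetical sketch: one record per demoId; visual and per-dimension
// results are merged into it as they arrive.
const resultsByDemo = new Map();

function upsertResult(demoId, result) {
  const record = resultsByDemo.get(demoId) ?? { demoId, visual: undefined, dimensions: [] };
  if (result.dimension === 'visual-correctness') {
    record.visual = result;
  } else {
    // Replace an earlier result for the same dimension, if one exists.
    const index = record.dimensions.findIndex((d) => d.dimension === result.dimension);
    if (index === -1) record.dimensions.push(result);
    else record.dimensions[index] = result;
  }
  resultsByDemo.set(demoId, record);
  return record;
}

// Map preserves insertion order, so demos serialize in the order first seen.
function serializeResults() {
  return JSON.stringify({ results: [...resultsByDemo.values()] }, null, 2);
}

upsertResult('recs', { dimension: 'visual-correctness', score: 2 });
upsertResult('recs', { dimension: 'usability-interaction', score: 2, weight: 30 });
upsertResult('recs', { dimension: 'usability-interaction', score: 3, weight: 30 });
console.log(resultsByDemo.get('recs').dimensions.length); // 1
```

Upserting instead of appending means a retried dimension overwrites its earlier result rather than being counted twice in the summary.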

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • lynx-family/lynx-stack#2689: Updates judge-page test coverage and multi-result writing similar to this PR's test changes.
  • lynx-family/lynx-stack#2629: Introduces the original judgePage visual-correctness implementation that this PR extends to multi-dimension scoring.
  • lynx-family/lynx-stack#2673: Previous changes to the UI Judge comment action that this PR further extends with weighted summaries and dimension metadata.

Suggested reviewers

  • Sherry-hue
  • HuJean
  • colinaaa

Poem

🐰 I hopped through prompts and weights today,
Five dim'ns to judge each demo's play.
Labels, demos, scores aligned in rows,
A carrot for each weighted GEQI that grows.
Cheers — the rabbit claps and twirls away.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and specifically describes the main change: adding GEQI dimension scoring to the UI Judge test. It directly summarizes the primary objective of the PR.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.


Contributor

@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (2)
.github/scripts/write-ui-judge-result.mjs (1)

11-17: ⚡ Quick win

Add a fail-fast 100-point weight invariant.

A typo in the geqiDimensions weights can silently skew the GEQI summary percentages. Please validate the total weight once before writing results.

Proposed patch
 const geqiDimensions = [
   ['usability-interaction', 'Usability & Interaction', 30],
   ['visual-aesthetics', 'Visual & Aesthetics', 25],
   ['consistency-standards', 'Consistency & Standards', 15],
   ['architecture-writing', 'Architecture & UX Writing', 15],
   ['accessibility-performance', 'Accessibility & Performance', 15],
 ];
+
+const totalWeight = geqiDimensions.reduce((sum, [, , weight]) => sum + weight, 0);
+if (totalWeight !== 100) {
+  throw new Error(`GEQI weights must sum to 100, got ${totalWeight}.`);
+}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/scripts/write-ui-judge-result.mjs around lines 11-17, validate
that the sum of the weight values in the geqiDimensions array equals 100 before
proceeding to write results; compute the total by summing the third element of
each tuple in geqiDimensions and, if the total !== 100, throw an error or log a
clear message and exit/fail-fast so the script stops rather than producing
skewed percentages—add this check early in the script (before any result
aggregation or file writes) and reference the geqiDimensions variable when
locating the change.
.github/actions/ui-judge-comment/comment.mjs (1)

258-300: ⚡ Quick win

Consider validating weight consistency across results for the same dimension.

When multiple results share the same dimension (lines 264-270), the code keeps the weight from the first result encountered and does not check whether later results for that dimension carry the same weight. If the input data contains inconsistent weights for one dimension, the aggregation silently uses whichever weight it saw first.

Given that GEQI dimension weights are model constants, all results for the same dimension should carry identical weights. Adding validation would catch input data issues early.

🛡️ Suggested validation
     const existing = dimensionsById.get(result.dimension);
     if (existing) {
+      if (existing.weight !== result.weight) {
+        throw new Error(
+          `Dimension "${result.dimension}" has inconsistent weights: ${existing.weight} vs ${result.weight}`
+        );
+      }
       existing.count += 1;
       existing.errorCount += result.error ? 1 : 0;
       existing.score += result.score;
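For reference, a self-contained sketch of the aggregation with this check applied. The name buildWeightedSummary comes from the review; its signature, the input record fields (`dimension`, `dimensionLabel`, `weight`, `score`, `error`), and the returned summary shape are assumptions, so the PR's actual function may differ.

```javascript
// Hypothetical sketch: per-dimension weighted summary that fails fast when
// two results for the same dimension disagree on weight.
function buildWeightedSummary(results) {
  const dimensionsById = new Map();

  for (const result of results) {
    const existing = dimensionsById.get(result.dimension);
    if (existing) {
      if (existing.weight !== result.weight) {
        throw new Error(
          `Dimension "${result.dimension}" has inconsistent weights: `
            + `${existing.weight} vs ${result.weight}`,
        );
      }
      existing.count += 1;
      existing.errorCount += result.error ? 1 : 0;
      existing.score += result.score;
      continue;
    }
    dimensionsById.set(result.dimension, {
      count: 1,
      dimension: result.dimension,
      errorCount: result.error ? 1 : 0,
      label: result.dimensionLabel || result.dimension,
      score: result.score,
      weight: result.weight,
    });
  }

  // Turn summed scores into averages and a 0-100 total, assuming a 1-5
  // scale and weights expressed as percentages.
  const dimensions = [...dimensionsById.values()].map((d) => ({
    ...d,
    average: d.score / d.count,
  }));
  const weighted = dimensions.reduce((sum, d) => sum + (d.average / 5) * d.weight, 0);
  return { dimensions, weighted };
}
```

Because weights are treated as model constants, throwing is the simplest way to surface a mismatch; a softer variant could record it and warn instead.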
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/actions/ui-judge-comment/comment.mjs around lines 258-300, in
buildWeightedSummary, validate that all results for the same dimension use the
same weight: when merging a new result into dimensionsById (the existing object
created for result.dimension), check that existing.weight === result.weight and
if not, surface a failure (throw an Error or log and return undefined) so
inconsistent input weights aren't silently accepted; update the merge logic
around the existing variable in the for loop to perform this check before
incrementing existing.count/errorCount/score.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 0ca79c4b-ca98-4017-8d68-c744c5e6d2b1

📥 Commits

Reviewing files that changed from the base of the PR and between 1851187 and b01553e.

📒 Files selected for processing (7)
  • .github/actions/ui-judge-comment/README.md
  • .github/actions/ui-judge-comment/comment.mjs
  • .github/scripts/write-ui-judge-result.mjs
  • .github/ui-judge.instructions.md
  • packages/genui/ui-judge/README.md
  • packages/genui/ui-judge/src/index.ts
  • packages/genui/ui-judge/tests/judge-page.spec.ts

@codecov

codecov Bot commented May 22, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ All tests successful. No failed tests found.


@codspeed-hq

codspeed-hq Bot commented May 22, 2026

Merging this PR will improve performance by 17.63%

⚠️ Different runtime environments detected

Some benchmarks with significant performance changes were compared across different runtime environments, which may affect the accuracy of the results.

Open the report in CodSpeed to investigate

⚡ 1 improved benchmark
✅ 80 untouched benchmarks
⏩ 26 skipped benchmarks¹

Performance Changes

Benchmark BASE HEAD Efficiency
transform 1000 view elements 47.3 ms 40.2 ms +17.63%



Comparing hw/codex/ui-judge-geqi-dimensions (42b4346) with main (11ef105)²


Footnotes

  1. 26 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

  2. No successful run was found on main (1851187) during the generation of this report, so 11ef105 was used instead as the comparison base. There might be some changes unrelated to this pull request in this report.

@relativeci

relativeci Bot commented May 22, 2026

React Example with Element Template

#881 Bundle Size — 201.67KiB (0%).

42b4346 (current) vs e73c383 main#880 (baseline)

Bundle metrics: no changes
Status Metric Current (#881) Baseline (#880)
No change  Initial JS 0B 0B
No change  Initial CSS 0B 0B
No change  Cache Invalidation 0% 0%
No change  Chunks 0 0
No change  Assets 4 4
No change  Modules 99 99
No change  Duplicate Modules 30 30
No change  Duplicate Code 39.25% 39.25%
No change  Packages 2 2
No change  Duplicate Packages 0 0
Bundle size by type: no changes
Status Type Current (#881) Baseline (#880)
No change  IMG 145.76KiB 145.76KiB
No change  Other 55.91KiB 55.91KiB

Bundle analysis report | Branch hw/codex/ui-judge-geqi-dimension... | Project dashboard


Generated by RelativeCI | Documentation | Report issue

@relativeci

relativeci Bot commented May 22, 2026

React External

#1729 Bundle Size — 698.01KiB (0%).

42b4346 (current) vs e73c383 main#1728 (baseline)

Bundle metrics: no changes
Status Metric Current (#1729) Baseline (#1728)
No change  Initial JS 0B 0B
No change  Initial CSS 0B 0B
No change  Cache Invalidation 0% 0%
No change  Chunks 0 0
No change  Assets 3 3
No change  Modules 17 17
No change  Duplicate Modules 5 5
No change  Duplicate Code 8.59% 8.59%
No change  Packages 0 0
No change  Duplicate Packages 0 0
Bundle size by type: no changes
Status Type Current (#1729) Baseline (#1728)
No change  Other 698.01KiB 698.01KiB


@relativeci

relativeci Bot commented May 22, 2026

React MTF Example

#1746 Bundle Size — 208.75KiB (0%).

42b4346 (current) vs e73c383 main#1745 (baseline)

Bundle metrics: no changes
Status Metric Current (#1746) Baseline (#1745)
No change  Initial JS 0B 0B
No change  Initial CSS 0B 0B
No change  Cache Invalidation 0% 0%
No change  Chunks 0 0
No change  Assets 3 3
No change  Modules 195 195
No change  Duplicate Modules 77 77
No change  Duplicate Code 44.17% 44.17%
No change  Packages 2 2
No change  Duplicate Packages 0 0
Bundle size by type: no changes
Status Type Current (#1746) Baseline (#1745)
No change  IMG 111.23KiB 111.23KiB
No change  Other 97.52KiB 97.52KiB


@relativeci

relativeci Bot commented May 22, 2026

React Example

#8612 Bundle Size — 237.81KiB (0%).

42b4346 (current) vs e73c383 main#8611 (baseline)

Bundle metrics: no changes
Status Metric Current (#8612) Baseline (#8611)
No change  Initial JS 0B 0B
No change  Initial CSS 0B 0B
No change  Cache Invalidation 0% 0%
No change  Chunks 0 0
No change  Assets 4 4
No change  Modules 200 200
No change  Duplicate Modules 80 80
No change  Duplicate Code 44.68% 44.68%
No change  Packages 2 2
No change  Duplicate Packages 0 0
Bundle size by type: no changes
Status Type Current (#8612) Baseline (#8611)
No change  IMG 145.76KiB 145.76KiB
No change  Other 92.05KiB 92.05KiB


@relativeci

relativeci Bot commented May 22, 2026

Web Explorer

#10188 Bundle Size — 903.53KiB (0%).

42b4346 (current) vs e73c383 main#10187 (baseline)

Bundle metrics: no changes
Status Metric Current (#10188) Baseline (#10187)
No change  Initial JS 45.06KiB 45.06KiB
No change  Initial CSS 2.22KiB 2.22KiB
No change  Cache Invalidation 0% 0%
No change  Chunks 9 9
No change  Assets 11 11
No change  Modules 230 230
No change  Duplicate Modules 11 11
No change  Duplicate Code 27.12% 27.12%
No change  Packages 10 10
No change  Duplicate Packages 0 0
Bundle size by type: no changes
Status Type Current (#10188) Baseline (#10187)
No change  JS 499.15KiB 499.15KiB
No change  Other 402.16KiB 402.16KiB
No change  CSS 2.22KiB 2.22KiB


@github-actions
Contributor

github-actions Bot commented May 22, 2026

UI Judge

GEQI weighted score: 57.5 / 100 across 8 examples.
Average visual-correctness score: 3.1 / 5.

Dimension Weight Average Results Status
Usability & Interaction 30% 2.9 / 5 8 OK
Visual & Aesthetics 25% 2.9 / 5 8 OK
Consistency & Standards 15% 2.9 / 5 8 OK
Architecture & UX Writing 15% 2.8 / 5 8 OK
Accessibility & Performance 15% 3 / 5 8 OK
# Example Visual Correctness Usability & Interaction (30%) Visual & Aesthetics (25%) Consistency & Standards (15%) Architecture & UX Writing (15%) Accessibility & Performance (15%) GEQI Page Status
1 recs 2 / 5 2 / 5 2 / 5 2 / 5 2 / 5 2 / 5 40 / 100 preview OK
2 cast-grid 5 / 5 3 / 5 3 / 5 4 / 5 5 / 5 4 / 5 72 / 100 preview OK
3 citywalk-list 2 / 5 2 / 5 3 / 5 2 / 5 1 / 5 3 / 5 45 / 100 preview OK
4 fridge-search 2 / 5 2 / 5 2 / 5 2 / 5 2 / 5 2 / 5 40 / 100 preview OK
5 trip-planner 2 / 5 2 / 5 3 / 5 2 / 5 2 / 5 2 / 5 45 / 100 preview OK
6 weather-current 5 / 5 5 / 5 4 / 5 5 / 5 4 / 5 5 / 5 92 / 100 preview OK
7 product-card 5 / 5 5 / 5 4 / 5 4 / 5 4 / 5 4 / 5 86 / 100 preview OK
8 workout-plan 2 / 5 2 / 5 2 / 5 2 / 5 2 / 5 2 / 5 40 / 100 preview OK
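The summary figures can be cross-checked against the per-example rows. Averaging the eight per-example GEQI scores gives 460 / 8 = 57.5, and the per-dimension averages (for example 23 / 8 ≈ 2.9 for Usability & Interaction, 22 / 8 = 2.75 ≈ 2.8 for Architecture & UX Writing) match the dimension table. A small sketch of that check, with scores copied from the table above and assuming per-example GEQI = Σ (score / 5) × weight; the data layout and names are illustrative only.

```javascript
// Raw scores copied from the UI Judge table, in the order
// [usability, visual, consistency, architecture, accessibility].
const WEIGHTS = [30, 25, 15, 15, 15];
const rows = {
  'recs': [2, 2, 2, 2, 2],
  'cast-grid': [3, 3, 4, 5, 4],
  'citywalk-list': [2, 3, 2, 1, 3],
  'fridge-search': [2, 2, 2, 2, 2],
  'trip-planner': [2, 3, 2, 2, 2],
  'weather-current': [5, 4, 5, 4, 5],
  'product-card': [5, 4, 4, 4, 4],
  'workout-plan': [2, 2, 2, 2, 2],
};

// Weighted 0-100 score for one example: each score contributes score/5 of its weight.
const exampleGeqi = (scores) =>
  scores.reduce((sum, score, i) => sum + (score / 5) * WEIGHTS[i], 0);

const perExample = Object.values(rows).map(exampleGeqi);
const overall = perExample.reduce((a, b) => a + b, 0) / perExample.length;

// Per-dimension averages across all examples.
const dimensionAverages = WEIGHTS.map((_, i) =>
  Object.values(rows).reduce((sum, scores) => sum + scores[i], 0) / perExample.length);

console.log(overall.toFixed(1)); // 57.5
console.log(dimensionAverages.map((avg) => avg.toFixed(1)));
```

The overall figure comes out the same whether you average per-example GEQI scores or weight the per-dimension averages, because the weighted sum is linear in the scores.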
Details

Result 1

  • Example: recs
  • Dimension: visual-correctness
  • Visual correctness: 2 / 5
  • GEQI dimensions:
    • Usability & Interaction: 2 / 5 (30%)
    • Visual & Aesthetics: 2 / 5 (25%)
    • Consistency & Standards: 2 / 5 (15%)
    • Architecture & UX Writing: 2 / 5 (15%)
    • Accessibility & Performance: 2 / 5 (15%)
  • Task: The A2UI playground preview should show date-night dining recommendations for Moonlight Terrace, Pinewood Bistro, and Sea Breeze Kitchen.

Result 2

  • Example: cast-grid
  • Dimension: visual-correctness
  • Visual correctness: 5 / 5
  • GEQI dimensions:
    • Usability & Interaction: 3 / 5 (30%)
    • Visual & Aesthetics: 3 / 5 (25%)
    • Consistency & Standards: 4 / 5 (15%)
    • Architecture & UX Writing: 5 / 5 (15%)
    • Accessibility & Performance: 4 / 5 (15%)
  • Task: The A2UI playground preview should show a cast grid for the short film Night Notes, including Lin Xia and Zhou Ning cast cards.

Result 3

  • Example: citywalk-list
  • Dimension: visual-correctness
  • Visual correctness: 2 / 5
  • GEQI dimensions:
    • Usability & Interaction: 2 / 5 (30%)
    • Visual & Aesthetics: 3 / 5 (25%)
    • Consistency & Standards: 2 / 5 (15%)
    • Architecture & UX Writing: 1 / 5 (15%)
    • Accessibility & Performance: 3 / 5 (15%)
  • Task: The A2UI playground preview should show weekend citywalk coffee picks with Rooftop Brew Room, Corner Canvas Lab, and Late Sun Roastery.

Result 4

  • Example: fridge-search
  • Dimension: visual-correctness
  • Visual correctness: 2 / 5
  • GEQI dimensions:
    • Usability & Interaction: 2 / 5 (30%)
    • Visual & Aesthetics: 2 / 5 (25%)
    • Consistency & Standards: 2 / 5 (15%)
    • Architecture & UX Writing: 2 / 5 (15%)
    • Accessibility & Performance: 2 / 5 (15%)
  • Task: The A2UI playground preview should show refrigerator search results with Siemens, Hualing, Haier, and Midea product cards.

Result 5

  • Example: trip-planner
  • Dimension: visual-correctness
  • Visual correctness: 2 / 5
  • GEQI dimensions:
    • Usability & Interaction: 2 / 5 (30%)
    • Visual & Aesthetics: 3 / 5 (25%)
    • Consistency & Standards: 2 / 5 (15%)
    • Architecture & UX Writing: 2 / 5 (15%)
    • Accessibility & Performance: 2 / 5 (15%)
  • Task: The A2UI playground preview should show a Kyoto 48-hour trip planner with Day 1 and Day 2 itinerary sections, including Monkey Park Viewpoint.

Result 6

  • Example: weather-current
  • Dimension: visual-correctness
  • Visual correctness: 5 / 5
  • GEQI dimensions:
    • Usability & Interaction: 5 / 5 (30%)
    • Visual & Aesthetics: 4 / 5 (25%)
    • Consistency & Standards: 5 / 5 (15%)
    • Architecture & UX Writing: 4 / 5 (15%)
    • Accessibility & Performance: 5 / 5 (15%)
  • Task: The A2UI playground preview should show the current weather for Austin, TX, including clear skies with light breeze.

Result 7

  • Example: product-card
  • Dimension: visual-correctness
  • Visual correctness: 5 / 5
  • GEQI dimensions:
    • Usability & Interaction: 5 / 5 (30%)
    • Visual & Aesthetics: 4 / 5 (25%)
    • Consistency & Standards: 4 / 5 (15%)
    • Architecture & UX Writing: 4 / 5 (15%)
    • Accessibility & Performance: 4 / 5 (15%)
  • Task: The A2UI playground preview should show a Wireless Headphones Pro product card with a visible Add to Cart action.

Result 8

  • Example: workout-plan
  • Dimension: visual-correctness
  • Visual correctness: 2 / 5
  • GEQI dimensions:
    • Usability & Interaction: 2 / 5 (30%)
    • Visual & Aesthetics: 2 / 5 (25%)
    • Consistency & Standards: 2 / 5 (15%)
    • Architecture & UX Writing: 2 / 5 (15%)
    • Accessibility & Performance: 2 / 5 (15%)
  • Task: The A2UI playground preview should show a weekly workout plan with five days from Monday Ramp-Up through Friday Conditioning.

Workflow run (attempt 3)

@PupilTong PupilTong force-pushed the hw/codex/ui-judge-geqi-dimensions branch from b01553e to 42b4346 on May 22, 2026 09:28
Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.github/actions/ui-judge-comment/comment.mjs:
- Around line 327-342: When aggregating into dimensionsById, validate that
subsequent result.weight values match the first stored weight for that dimension
(check existing.weight vs result.weight inside the existing branch handling for
result.dimension); if they differ, do not silently ignore—record the
inconsistency by (for example) adding a weightInconsistent flag and a
weightsSeen array or incrementing a mismatch counter on the existing dimension
object and emit a warning/log entry so the downstream GEQI weighted score
calculation can detect and surface bad input data instead of using a silently
wrong weight.
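This comment suggests a softer alternative to throwing: record the mismatch on the dimension entry and emit a warning so the weighted GEQI score can be flagged downstream. Below is a minimal sketch of that merge step. It assumes the same dimension-entry shape as the snippet under review. `weightInconsistent` and `weightsSeen` are the names proposed in the comment, not existing fields, and `mergeIntoDimension` is a hypothetical helper.

```javascript
// Hypothetical helper: merge one result into dimensionsById, recording
// (rather than throwing on) conflicting weights for the same dimension.
function mergeIntoDimension(dimensionsById, result, warn = console.warn) {
  const existing = dimensionsById.get(result.dimension);
  if (!existing) {
    dimensionsById.set(result.dimension, {
      count: 1,
      dimension: result.dimension,
      errorCount: result.error ? 1 : 0,
      label: result.dimensionLabel || result.dimension,
      score: result.score,
      weight: result.weight,
      weightInconsistent: false,
      weightsSeen: [result.weight],
    });
    return;
  }
  if (!existing.weightsSeen.includes(result.weight)) {
    // Keep the first weight for aggregation, but flag and report the conflict.
    existing.weightsSeen.push(result.weight);
    existing.weightInconsistent = true;
    warn(
      `Dimension "${result.dimension}" has conflicting weights: `
        + existing.weightsSeen.join(', '),
    );
  }
  existing.count += 1;
  existing.errorCount += result.error ? 1 : 0;
  existing.score += result.score;
}
```

A renderer could then check `weightInconsistent` and annotate the GEQI summary instead of failing the whole comment.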

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: ab144c57-f234-428d-8afe-d7289f6b844e

📥 Commits

Reviewing files that changed from the base of the PR and between b01553e and 42b4346.

📒 Files selected for processing (7)
  • .github/actions/ui-judge-comment/README.md
  • .github/actions/ui-judge-comment/comment.mjs
  • .github/scripts/write-ui-judge-result.mjs
  • .github/ui-judge.instructions.md
  • packages/genui/ui-judge/README.md
  • packages/genui/ui-judge/src/index.ts
  • packages/genui/ui-judge/tests/judge-page.spec.ts
✅ Files skipped from review due to trivial changes (3)
  • .github/actions/ui-judge-comment/README.md
  • .github/ui-judge.instructions.md
  • packages/genui/ui-judge/README.md

Comment on lines +327 to +342
const existing = dimensionsById.get(result.dimension);
if (existing) {
  existing.count += 1;
  existing.errorCount += result.error ? 1 : 0;
  existing.score += result.score;
  continue;
}

dimensionsById.set(result.dimension, {
  count: 1,
  dimension: result.dimension,
  errorCount: result.error ? 1 : 0,
  label: result.dimensionLabel || result.dimension,
  score: result.score,
  weight: result.weight,
});
Contributor


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Validate per-dimension weight consistency before aggregation.

On line 341, the first weight encountered for a dimension is reused, and later conflicting weights are silently ignored. That can produce incorrect GEQI weighted scores in the PR comment without surfacing the data issue.

Suggested fix
   for (const result of weightedResults) {
     const existing = dimensionsById.get(result.dimension);
     if (existing) {
+      if (existing.weight !== result.weight) {
+        throw new Error(
+          `Inconsistent weight for dimension "${result.dimension}": `
+          + `${existing.weight} vs ${result.weight}.`,
+        );
+      }
       existing.count += 1;
       existing.errorCount += result.error ? 1 : 0;
       existing.score += result.score;
       continue;
     }

Also applies to: 345-363

