test(ui-judge): add GEQI dimension scoring#2693
Conversation
|
📝 WalkthroughWalkthroughThis PR extends UI Judge from single visual-correctness scoring to multi-dimension GEQI scoring across five weighted dimensions. The core judgePage, prompt registry, result initialization, comment rendering, and tests are updated to support dimension selection, weighted aggregation, and per-result metadata (demoId, dimensionLabel, weight). ChangesMulti-dimension GEQI scoring system
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (2)
.github/scripts/write-ui-judge-result.mjs (1)
11-17: ⚡ Quick winAdd a fail-fast 100-point weight invariant.
A typo in
geqiDimensionsweights can silently skew GEQI summary percentages. Please validate total weight once before writing results.Proposed patch
const geqiDimensions = [ ['usability-interaction', 'Usability & Interaction', 30], ['visual-aesthetics', 'Visual & Aesthetics', 25], ['consistency-standards', 'Consistency & Standards', 15], ['architecture-writing', 'Architecture & UX Writing', 15], ['accessibility-performance', 'Accessibility & Performance', 15], ]; + +const totalWeight = geqiDimensions.reduce((sum, [, , weight]) => sum + weight, 0); +if (totalWeight !== 100) { + throw new Error(`GEQI weights must sum to 100, got ${totalWeight}.`); +}🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.github/scripts/write-ui-judge-result.mjs around lines 11 - 17, Validate that the sum of the weight values in the geqiDimensions array equals 100 before proceeding to write results; compute the total by summing the third element of each tuple in geqiDimensions and, if the total !== 100, throw an error or log a clear message and exit/fail-fast so the script stops rather than producing skewed percentages—add this check early in the script (before any result aggregation or file writes) and reference the geqiDimensions variable when locating the change..github/actions/ui-judge-comment/comment.mjs (1)
258-300: ⚡ Quick winConsider validating weight consistency across results for the same dimension.
When multiple results share the same dimension (lines 264-270), the code uses the weight from the first result encountered and doesn't check whether subsequent results for that dimension have the same weight. If the input data contains inconsistent weights for the same dimension, the aggregation will silently use an arbitrary first weight.
Given that GEQI dimension weights are model constants, all results for the same dimension should carry identical weights. Adding validation would catch input data issues early.
🛡️ Suggested validation
const existing = dimensionsById.get(result.dimension); if (existing) { + if (existing.weight !== result.weight) { + throw new Error( + `Dimension "${result.dimension}" has inconsistent weights: ${existing.weight} vs ${result.weight}` + ); + } existing.count += 1; existing.errorCount += result.error ? 1 : 0; existing.score += result.score;🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.github/actions/ui-judge-comment/comment.mjs around lines 258 - 300, In buildWeightedSummary, validate that all results for the same dimension use the same weight: when merging a new result into dimensionsById (the existing object created for result.dimension), check that existing.weight === result.weight and if not, surface a failure (throw an Error or log and return undefined) so inconsistent input weights aren't silently accepted; update the merge logic around the existing variable in the for loop to perform this check before incrementing existing.count/errorCount/score.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In @.github/actions/ui-judge-comment/comment.mjs:
- Around line 258-300: In buildWeightedSummary, validate that all results for
the same dimension use the same weight: when merging a new result into
dimensionsById (the existing object created for result.dimension), check that
existing.weight === result.weight and if not, surface a failure (throw an Error
or log and return undefined) so inconsistent input weights aren't silently
accepted; update the merge logic around the existing variable in the for loop to
perform this check before incrementing existing.count/errorCount/score.
In @.github/scripts/write-ui-judge-result.mjs:
- Around line 11-17: Validate that the sum of the weight values in the
geqiDimensions array equals 100 before proceeding to write results; compute the
total by summing the third element of each tuple in geqiDimensions and, if the
total !== 100, throw an error or log a clear message and exit/fail-fast so the
script stops rather than producing skewed percentages—add this check early in
the script (before any result aggregation or file writes) and reference the
geqiDimensions variable when locating the change.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 0ca79c4b-ca98-4017-8d68-c744c5e6d2b1
📒 Files selected for processing (7)
.github/actions/ui-judge-comment/README.md.github/actions/ui-judge-comment/comment.mjs.github/scripts/write-ui-judge-result.mjs.github/ui-judge.instructions.mdpackages/genui/ui-judge/README.mdpackages/genui/ui-judge/src/index.tspackages/genui/ui-judge/tests/judge-page.spec.ts
Codecov Report✅ All modified and coverable lines are covered by tests. 📢 Thoughts on this report? Let us know! |
Merging this PR will improve performance by 17.63%
|
| Benchmark | BASE |
HEAD |
Efficiency | |
|---|---|---|---|---|
| ⚡ | transform 1000 view elements |
47.3 ms | 40.2 ms | +17.63% |
Tip
Curious why this is faster? Comment @codspeedbot explain why this is faster on this PR, or directly use the CodSpeed MCP with your agent.
Comparing hw/codex/ui-judge-geqi-dimensions (42b4346) with main (11ef105)2
Footnotes
-
26 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports. ↩
-
No successful run was found on
main(1851187) during the generation of this report, so 11ef105 was used instead as the comparison base. There might be some changes unrelated to this pull request in this report. ↩
React Example with Element Template#881 Bundle Size — 201.67KiB (0%).42b4346(current) vs e73c383 main#880(baseline) Bundle metrics
|
| Current #881 |
Baseline #880 |
|
|---|---|---|
0B |
0B |
|
0B |
0B |
|
0% |
0% |
|
0 |
0 |
|
4 |
4 |
|
99 |
99 |
|
30 |
30 |
|
39.25% |
39.25% |
|
2 |
2 |
|
0 |
0 |
Bundle size by type no changes
| Current #881 |
Baseline #880 |
|
|---|---|---|
145.76KiB |
145.76KiB |
|
55.91KiB |
55.91KiB |
Bundle analysis report Branch hw/codex/ui-judge-geqi-dimension... Project dashboard
Generated by RelativeCI Documentation Report issue
React External#1729 Bundle Size — 698.01KiB (0%).42b4346(current) vs e73c383 main#1728(baseline) Bundle metrics
|
| Current #1729 |
Baseline #1728 |
|
|---|---|---|
0B |
0B |
|
0B |
0B |
|
0% |
0% |
|
0 |
0 |
|
3 |
3 |
|
17 |
17 |
|
5 |
5 |
|
8.59% |
8.59% |
|
0 |
0 |
|
0 |
0 |
Bundle analysis report Branch hw/codex/ui-judge-geqi-dimension... Project dashboard
Generated by RelativeCI Documentation Report issue
React MTF Example#1746 Bundle Size — 208.75KiB (0%).42b4346(current) vs e73c383 main#1745(baseline) Bundle metrics
|
| Current #1746 |
Baseline #1745 |
|
|---|---|---|
0B |
0B |
|
0B |
0B |
|
0% |
0% |
|
0 |
0 |
|
3 |
3 |
|
195 |
195 |
|
77 |
77 |
|
44.17% |
44.17% |
|
2 |
2 |
|
0 |
0 |
Bundle size by type no changes
| Current #1746 |
Baseline #1745 |
|
|---|---|---|
111.23KiB |
111.23KiB |
|
97.52KiB |
97.52KiB |
Bundle analysis report Branch hw/codex/ui-judge-geqi-dimension... Project dashboard
Generated by RelativeCI Documentation Report issue
React Example#8612 Bundle Size — 237.81KiB (0%).42b4346(current) vs e73c383 main#8611(baseline) Bundle metrics
|
| Current #8612 |
Baseline #8611 |
|
|---|---|---|
0B |
0B |
|
0B |
0B |
|
0% |
0% |
|
0 |
0 |
|
4 |
4 |
|
200 |
200 |
|
80 |
80 |
|
44.68% |
44.68% |
|
2 |
2 |
|
0 |
0 |
Bundle size by type no changes
| Current #8612 |
Baseline #8611 |
|
|---|---|---|
145.76KiB |
145.76KiB |
|
92.05KiB |
92.05KiB |
Bundle analysis report Branch hw/codex/ui-judge-geqi-dimension... Project dashboard
Generated by RelativeCI Documentation Report issue
Web Explorer#10188 Bundle Size — 903.53KiB (0%).42b4346(current) vs e73c383 main#10187(baseline) Bundle metrics
|
| Current #10188 |
Baseline #10187 |
|
|---|---|---|
45.06KiB |
45.06KiB |
|
2.22KiB |
2.22KiB |
|
0% |
0% |
|
9 |
9 |
|
11 |
11 |
|
230 |
230 |
|
11 |
11 |
|
27.12% |
27.12% |
|
10 |
10 |
|
0 |
0 |
Bundle size by type no changes
| Current #10188 |
Baseline #10187 |
|
|---|---|---|
499.15KiB |
499.15KiB |
|
402.16KiB |
402.16KiB |
|
2.22KiB |
2.22KiB |
Bundle analysis report Branch hw/codex/ui-judge-geqi-dimension... Project dashboard
Generated by RelativeCI Documentation Report issue
UI JudgeGEQI weighted score: 57.5 / 100 across 8 examples.
DetailsResult 1
Result 2
Result 3
Result 4
Result 5
Result 6
Result 7
Result 8
|
b01553e to
42b4346
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In @.github/actions/ui-judge-comment/comment.mjs:
- Around line 327-342: When aggregating into dimensionsById, validate that
subsequent result.weight values match the first stored weight for that dimension
(check existing.weight vs result.weight inside the existing branch handling for
result.dimension); if they differ, do not silently ignore—record the
inconsistency by (for example) adding a weightInconsistent flag and a
weightsSeen array or incrementing a mismatch counter on the existing dimension
object and emit a warning/log entry so the downstream GEQI weighted score
calculation can detect and surface bad input data instead of using a silently
wrong weight.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: ab144c57-f234-428d-8afe-d7289f6b844e
📒 Files selected for processing (7)
.github/actions/ui-judge-comment/README.md.github/actions/ui-judge-comment/comment.mjs.github/scripts/write-ui-judge-result.mjs.github/ui-judge.instructions.mdpackages/genui/ui-judge/README.mdpackages/genui/ui-judge/src/index.tspackages/genui/ui-judge/tests/judge-page.spec.ts
✅ Files skipped from review due to trivial changes (3)
- .github/actions/ui-judge-comment/README.md
- .github/ui-judge.instructions.md
- packages/genui/ui-judge/README.md
| const existing = dimensionsById.get(result.dimension); | ||
| if (existing) { | ||
| existing.count += 1; | ||
| existing.errorCount += result.error ? 1 : 0; | ||
| existing.score += result.score; | ||
| continue; | ||
| } | ||
|
|
||
| dimensionsById.set(result.dimension, { | ||
| count: 1, | ||
| dimension: result.dimension, | ||
| errorCount: result.error ? 1 : 0, | ||
| label: result.dimensionLabel || result.dimension, | ||
| score: result.score, | ||
| weight: result.weight, | ||
| }); |
There was a problem hiding this comment.
Validate per-dimension weight consistency before aggregation.
On Line 341, the first encountered weight for a dimension is reused, and later conflicting weights are silently ignored. That can produce incorrect GEQI weighted scores in the PR comment without surfacing data issues.
Suggested fix
for (const result of weightedResults) {
const existing = dimensionsById.get(result.dimension);
if (existing) {
+ if (existing.weight !== result.weight) {
+ throw new Error(
+ `Inconsistent weight for dimension "${result.dimension}": `
+ + `${existing.weight} vs ${result.weight}.`,
+ );
+ }
existing.count += 1;
existing.errorCount += result.error ? 1 : 0;
existing.score += result.score;
continue;
}Also applies to: 345-363
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In @.github/actions/ui-judge-comment/comment.mjs around lines 327 - 342, When
aggregating into dimensionsById, validate that subsequent result.weight values
match the first stored weight for that dimension (check existing.weight vs
result.weight inside the existing branch handling for result.dimension); if they
differ, do not silently ignore—record the inconsistency by (for example) adding
a weightInconsistent flag and a weightsSeen array or incrementing a mismatch
counter on the existing dimension object and emit a warning/log entry so the
downstream GEQI weighted score calculation can detect and surface bad input data
instead of using a silently wrong weight.
Summary
Test Plan
Summary by CodeRabbit
New Features
Documentation