
feat: responsive viewport comparison in calibration (#182)#183

Merged
let-sunny merged 8 commits into main from feat/182-responsive-visual-compare
Mar 29, 2026

Conversation

@let-sunny
Owner

@let-sunny let-sunny commented Mar 29, 2026

Summary

Adds responsive viewport comparison to the calibration pipeline. Scores for responsive-critical rules are now validated against pixel-level measurements instead of AI opinion.

Problem

The existing pipeline compared only once, at the design's original size. The ablation experiment showed:

  • Same viewport: bad-structure 98% vs good 94% (indistinguishable)
  • Different viewport: bad-structure 63% vs good 95% (-32%p gap)

Responsive-critical rules such as fixed-size-in-auto-layout and missing-size-constraint were being judged validated/overscored from AI opinion alone, with no measurement.

Changes

1. visual-compare CLI — add --figma-screenshot option

  • Point at a local screenshot file directly (skips the Figma API fetch)
  • Combined with --figma-url: the URL is used only for parsing; the screenshot comes from the local file
  • Standalone --figma-url usage is unchanged (backward compatible)
```bash
# Existing (unchanged)
npx canicode visual-compare output.html --figma-url "..." --output ./run

# New: expanded-viewport comparison
npx canicode visual-compare output.html \
  --figma-url "..." \
  --figma-screenshot fixtures/desktop-home-page/screenshot-1920.png \
  --width 1920 \
  --output ./run/responsive
```

2. visual-compare core — figmaScreenshotPath option

  • Add a figmaScreenshotPath field to VisualCompareOptions
  • When provided, the file is copied as figma.png and the API fetch is skipped
  • When absent, existing behavior applies (URL fetch or cache)
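A minimal sketch of this branch, assuming a simplified VisualCompareOptions shape and a hypothetical resolveGroundTruth helper (the real implementation lives in src/core/engine/visual-compare.ts):

```typescript
import { copyFileSync, existsSync } from "node:fs";
import { join } from "node:path";

// Illustrative shape only -- fields beyond figmaScreenshotPath are assumptions.
interface VisualCompareOptions {
  figmaUrl: string;
  figmaScreenshotPath?: string;
  outputDir: string;
}

// Resolve the ground-truth screenshot: prefer a local file, else fall back
// to the existing URL-fetch path (represented here by a placeholder return).
function resolveGroundTruth(opts: VisualCompareOptions): string {
  const target = join(opts.outputDir, "figma.png");
  if (opts.figmaScreenshotPath) {
    if (!existsSync(opts.figmaScreenshotPath)) {
      throw new Error(`screenshot not found: ${opts.figmaScreenshotPath}`);
    }
    copyFileSync(opts.figmaScreenshotPath, target); // skip the API fetch
    return target;
  }
  // existing behavior: fetch from the Figma API or reuse the cache
  return target;
}
```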

3. Converter prompt — 2-pass comparison

  • Glob screenshot-*.png in the fixture → sort numerically → smallest = original, largest = expanded
  • Original comparison (existing) → similarity
  • Expanded comparison (new) → responsiveSimilarity
  • responsiveDelta = similarity - responsiveSimilarity
  • If only one screenshot exists, the responsive comparison is skipped
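The screenshot-selection step above can be sketched as follows (a hypothetical pickViewports helper; the converter actually performs this inside its prompt, not in library code):

```typescript
// Pick original vs. expanded viewport from fixture screenshot filenames,
// e.g. ["screenshot-768.png", "screenshot-1920.png"].
function pickViewports(files: string[]): { original: string; expanded: string | null } {
  const widthOf = (f: string): number => {
    const m = f.match(/screenshot-(\d+)\.png$/);
    return m ? Number(m[1]) : NaN;
  };
  const sorted = files
    .filter((f) => !Number.isNaN(widthOf(f)))
    .sort((a, b) => widthOf(a) - widthOf(b)); // numeric sort, not lexicographic
  if (sorted.length === 0) throw new Error("no screenshots matched");
  return {
    original: sorted[0], // smallest width = original
    // largest width = expanded; null means "skip the responsive comparison"
    expanded: sorted.length > 1 ? sorted[sorted.length - 1] : null,
  };
}
```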

4. conversion.json — new responsive fields

```json
{
  "similarity": 87,
  "responsiveSimilarity": 72,
  "responsiveDelta": 15,
  "responsiveViewport": 1920,
  ...
}
```

5. Evaluation — difficulty decided by responsiveDelta

  • For responsive-critical rules only, difficulty is judged from the measured responsiveDelta instead of AI opinion
  • Thresholds (based on Experiment 04):

| responsiveDelta | Difficulty | Rationale |
| --- | --- | --- |
| ≤5%p | easy | Barely breaks on viewport change |
| 6-15%p | moderate | Noticeable breakage |
| 16-30%p | hard | Severe breakage |
| >30%p | failed | Completely broken (Exp.04: -32%p) |

  • The existing mismatch's type and reasoning are overwritten with the measured value; the original AI opinion is still kept on record
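The threshold table maps onto a small helper; the PR names it responsiveDeltaToDifficulty, and a later commit clamps negative deltas to 0, so a sketch under those assumptions:

```typescript
type Difficulty = "easy" | "moderate" | "hard" | "failed";

// Map a measured responsiveDelta (percentage points) to a difficulty bucket.
function responsiveDeltaToDifficulty(delta: number): Difficulty {
  const d = Math.max(0, delta); // clamp negative deltas (treat as easy)
  if (d <= 5) return "easy";      // barely breaks on viewport change
  if (d <= 15) return "moderate"; // noticeable breakage
  if (d <= 30) return "hard";     // severe breakage
  return "failed";                // completely broken (Exp.04: -32%p gap)
}
```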

6. Orchestrator — forward responsiveDelta

  • Extract responsiveDelta from conversion.json → pass into the evaluation input
  • When null, no responsive override is applied (backward compatible)
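The extraction can be sketched as follows (a hypothetical extractResponsiveDelta helper, assuming conversion.json is parsed into an untyped record):

```typescript
// Pull responsiveDelta out of conversion.json; anything non-numeric becomes
// null, which downstream means "no responsive override" (backward compatible).
function extractResponsiveDelta(conversionJson: Record<string, unknown>): number | null {
  const raw = conversionJson["responsiveDelta"];
  return typeof raw === "number" && Number.isFinite(raw) ? raw : null;
}
```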

Architecture

Converter
  ├── visual-compare (original viewport) → similarity
  └── visual-compare --figma-screenshot (expanded) → responsiveSimilarity
                                                      ↓
  responsiveDelta = similarity - responsiveSimilarity
                                                      ↓
Evaluation
  ├── non-responsive rules: AI opinion (existing)
  └── responsive-critical rules: responsiveDelta (measured) ← NEW

Test plan

  • pnpm lint — pass
  • pnpm test:run — 646 tests pass
  • Verified responsive comparison behavior by running calibrate-loop

Closes #182

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Responsive viewport visual comparison when multiple fixture screenshots exist; records responsiveSimilarity, responsiveDelta and responsiveViewport
    • Use a local Figma screenshot via CLI (--figma-screenshot) and automatic export-scale detection for local screenshots
    • Capture HTML/code metrics (bytes, lines, CSS class/variable counts) in conversion output
  • Behavior / Evaluation

    • Rule evaluation can be influenced by responsiveDelta to adjust difficulty and validation
  • Documentation

    • Updated converter and calibration loop docs to describe new outputs and metrics
  • Tests

    • Added tests for responsive-driven evaluation overrides

Add --figma-screenshot option to visual-compare CLI for local screenshot
comparison without API fetch. Converter now runs 2-pass comparison:
original viewport + expanded viewport from fixture screenshots.

responsiveDelta feeds into evaluation to validate responsive-critical
rule scores with measured pixel data instead of AI opinion.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@coderabbitai
Contributor

coderabbitai bot commented Mar 29, 2026

Caution

Review failed

Pull request was closed or merged during review

📝 Walkthrough

Adds an optional responsive-viewport comparison and HTML/CSS code-metric collection to the converter; orchestrator forwards a numeric responsiveDelta to the evaluation agent, which can selectively reclassify responsive-critical rules based on that delta. Visual-compare and its CLI accept local Figma screenshot inputs and infer export scale.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Converter docs: .claude/agents/calibration/converter.md | Documented responsive comparison step; added responsiveSimilarity, responsiveDelta, responsiveViewport, and code metrics (htmlBytes, htmlLines, cssClassCount, cssVariableCount) to conversion.json. |
| Evaluation contract: src/agents/contracts/evaluation-agent.ts | Added optional `responsiveDelta?: number \| null` input field. |
| Evaluation logic & tests: src/agents/evaluation-agent.ts, src/agents/evaluation-agent.test.ts | Added responsiveDeltaToDifficulty helper and logic to override classifications for rules in RULE_ID_CATEGORY with category "responsive-critical" when responsiveDelta is provided; updated reasoning/validated set handling and added tests covering overrides and edge cases. |
| Orchestrator wiring: src/agents/orchestrator.ts | Extracts numeric responsiveDelta from conversionJson["responsiveDelta"] (or null) and passes it into runEvaluationAgent. |
| CLI visual-compare: src/cli/commands/visual-compare.ts | Added --figma-screenshot option; allowed either --figma-url or --figma-screenshot; made the token requirement conditional; passes the screenshot path through to visualCompare. |
| Visual-compare helpers: src/core/engine/visual-compare-helpers.ts | Added exported KNOWN_1X_WIDTHS, inferExportScale, countCssClasses, countCssVariables, and computeCodeMetrics. |
| Visual-compare core: src/core/engine/visual-compare.ts | Added VisualCompareOptions.figmaScreenshotPath; supports validating/copying a local PNG as ground truth and inferring exportScale from PNG width when using a local screenshot. |
| Ablation helpers re-export: src/agents/ablation/helpers.ts | Replaced local 1x-width heuristic and CSS-metric implementations with imports/re-exports from visual-compare-helpers. |
| Calibration loop docs & logging: .claude/commands/calibrate-loop.md | Persist converter token/tool/duration metrics into conversion.json and include token counts in converter activity logs (activity.jsonl). |

Sequence Diagrams

```mermaid
sequenceDiagram
    participant Converter
    participant Orchestrator
    participant EvaluationAgent
    participant RuleClassifier
    Converter->>Orchestrator: emit conversionJson (includes responsiveDelta, code metrics)
    Orchestrator->>EvaluationAgent: runEvaluationAgent(input + responsiveDelta)
    EvaluationAgent->>EvaluationAgent: if responsiveDelta != null\ncompute responsiveDifficulty
    EvaluationAgent->>RuleClassifier: classifyFlaggedRule(score, responsiveDifficulty)
    RuleClassifier-->>EvaluationAgent: updated type & actualDifficulty
    EvaluationAgent->>EvaluationAgent: update reasoning & validatedRuleSet
    EvaluationAgent-->>Orchestrator: return updated mismatches & report
```
```mermaid
sequenceDiagram
    participant User
    participant CLI
    participant VisualCompare
    participant FileSystem
    User->>CLI: visual-compare --figma-screenshot <path>
    CLI->>VisualCompare: visualCompare({figmaScreenshotPath: path, ...})
    VisualCompare->>FileSystem: stat/validate path
    FileSystem-->>VisualCompare: file metadata (width, exists)
    VisualCompare->>VisualCompare: inferExportScale(png.width)
    VisualCompare->>FileSystem: copy PNG -> outputDir/figma.png
    VisualCompare->>VisualCompare: run comparison with inferred scale
    VisualCompare-->>CLI: comparison results (similarity, responsiveSimilarity, metrics)
```

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Poem

🐰 I hopped from small to wide this morn,
I measured deltas where views were torn.
Pixels whispered, classes counted true,
Rules were nudged by what I knew.
Hooray — the rabbit logged the view!

🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title 'feat: responsive viewport comparison in calibration (#182)' clearly and specifically describes the main feature addition: implementing responsive viewport comparison in the calibration pipeline. |
| Linked Issues check | ✅ Passed | The PR implements all core requirements: responsive viewport comparison with dual-screenshot processing, responsiveDelta measurement, evaluation override logic for responsive-critical rules with specified thresholds, backward compatibility, and code metrics collection. |
| Out of Scope Changes check | ✅ Passed | All changes directly support responsive viewport comparison: CLI option for local screenshots, converter two-pass logic with responsiveDelta, evaluation rule override using thresholds, orchestrator integration, visual-compare core extensions, and refactoring of CSS metric helpers for code reuse. |



let-sunny and others added 2 commits March 29, 2026 23:13
- Delete rule from validatedRuleSet when override changes type away from validated
- Clamp negative responsiveDelta to 0 (treat as easy)
- Add 4 tests: override to underscored, keep validated, skip non-responsive, negative delta

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Reuse ablation helpers' KNOWN_1X_WIDTHS convention: 1920/768 = @1x,
others = @2x. When --figma-screenshot is used, inferExportScale()
reads PNG width and sets the correct scale automatically.

Previously defaulted to @2x for all local screenshots, causing
dimension mismatch for 1920px/768px fixture screenshots.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
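The scale-detection convention described in this commit can be sketched as follows; KNOWN_1X_WIDTHS and inferExportScale are the exported names listed in the review summary, while the exact signature here is an assumption:

```typescript
// Figma save-fixture exports at @2x by default; 1920/768 condition
// screenshots are @1x. Detect scale from the PNG's pixel width.
const KNOWN_1X_WIDTHS = [1920, 768];

function inferExportScale(pngWidth: number): 1 | 2 {
  return KNOWN_1X_WIDTHS.includes(pngWidth) ? 1 : 2;
}
```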
@let-sunny let-sunny marked this pull request as ready for review March 29, 2026 14:22
@let-sunny let-sunny marked this pull request as draft March 29, 2026 14:25
- Add computeCodeMetrics() to visual-compare-helpers.ts
  (htmlBytes, htmlLines, cssClassCount, cssVariableCount)
- Converter records metrics in conversion.json after generating output.html
- De-duplicate: ablation helpers.ts re-exports from core instead of own copy
- Metrics proxy for token cost — subagents can't measure tokens directly

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
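A sketch of the metric counters under stated assumptions — the names countCssClasses/countCssVariables come from the review summary, but the regexes are illustrative (a later review comment suggests the pattern `--[a-zA-Z0-9_-]+:` for custom properties):

```typescript
// Pull the contents of all <style> blocks out of an HTML string.
function styleContents(html: string): string {
  return [...html.matchAll(/<style[^>]*>([\s\S]*?)<\/style>/g)]
    .map((m) => m[1])
    .join("\n");
}

// Count unique CSS class selectors in <style> blocks.
function countCssClasses(html: string): number {
  const matches = [...styleContents(html).matchAll(/\.([a-zA-Z_][\w-]*)/g)];
  return new Set(matches.map((m) => m[1])).size;
}

// Count unique CSS custom properties defined in <style> blocks.
function countCssVariables(html: string): number {
  const matches = [...styleContents(html).matchAll(/--[a-zA-Z0-9_-]+(?=:)/g)];
  return new Set(matches.map((m) => m[0])).size;
}
```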
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.claude/agents/calibration/converter.md:
- Around line 63-77: The ordered list item labelled "5. **Responsive
comparison**" has incorrect/inconsistent numbering; update the list numbering
to continue the sequence used earlier (e.g., ensure it follows the prior item
number) and adjust the markdown so there is a blank line before and after the
fenced code block to satisfy MD031; locate the "Responsive comparison" section
and the fenced code block that starts with "```bash" and correct the list
numeral and surrounding blank lines accordingly.

In `@src/agents/evaluation-agent.ts`:
- Around line 172-193: The code uses RULE_ID_CATEGORY[mismatch.ruleId as RuleId]
without guarding for an unknown key; change the block in evaluate/adjustment so
you first ensure mismatch.ruleId is present and that
RULE_ID_CATEGORY[mismatch.ruleId] is !== undefined before comparing to
"responsive-critical" (remove the unsafe `as RuleId` cast), e.g. compute const
category = RULE_ID_CATEGORY[mismatch.ruleId]; if (!mismatch.ruleId || category
=== undefined) continue; then proceed with classifyFlaggedRule, setting
mismatch.type, actualDifficulty, reasoning and updating validatedRuleSet as
before.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 5eb89a93-5ef4-4fde-a67b-6fea4343f405

📥 Commits

Reviewing files that changed from the base of the PR and between 87e7d64 and 2289bf8.

📒 Files selected for processing (8)
  • .claude/agents/calibration/converter.md
  • src/agents/contracts/evaluation-agent.ts
  • src/agents/evaluation-agent.test.ts
  • src/agents/evaluation-agent.ts
  • src/agents/orchestrator.ts
  • src/cli/commands/visual-compare.ts
  • src/core/engine/visual-compare-helpers.ts
  • src/core/engine/visual-compare.ts

let-sunny and others added 2 commits March 29, 2026 23:29
Orchestrator records converterTokens, converterToolUses, converterDurationMs
from subagent usage metadata. Direct measurement — no API proxy needed.

Code metrics (htmlBytes, cssClassCount etc.) remain for output quality,
token usage now measured from subagent execution.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Fix ordered list numbering in converter.md (MD031/MD029)
- Guard unknown ruleId before RULE_ID_CATEGORY access (noUncheckedIndexedAccess)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@let-sunny let-sunny marked this pull request as ready for review March 29, 2026 14:33
Contributor

@coderabbitai coderabbitai bot left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/agents/ablation/helpers.ts (1)

154-156: 🧹 Nitpick | 🔵 Trivial

Duplicate KNOWN_1X_WIDTHS constant — import from shared helpers instead.

This constant duplicates KNOWN_1X_WIDTHS from src/core/engine/visual-compare-helpers.ts (line 19). Since you're already importing from that module, consider importing this constant as well to maintain a single source of truth.

♻️ Proposed fix

```diff
-import { compareScreenshots } from "../../core/engine/visual-compare-helpers.js";
+import { compareScreenshots, KNOWN_1X_WIDTHS } from "../../core/engine/visual-compare-helpers.js";
```

Then remove lines 155-156:

```diff
   // Figma save-fixture exports at `@2x` by default. 1920/768 condition screenshots are `@1x`.
   // Detect: if width matches a known `@1x` size (1920, 768), use scale 1. Otherwise `@2x`.
-  const KNOWN_1X_WIDTHS = [1920, 768];
   const exportScale = KNOWN_1X_WIDTHS.includes(figmaWidth) ? 1 : 2;
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/agents/ablation/helpers.ts` around lines 154 - 156, Remove the duplicate
KNOWN_1X_WIDTHS constant in src/agents/ablation/helpers.ts and import the shared
KNOWN_1X_WIDTHS from src/core/engine/visual-compare-helpers.ts instead; update
the code that computes exportScale (the line using
KNOWN_1X_WIDTHS.includes(figmaWidth) ? 1 : 2) to reference the imported constant
and delete the local declaration so the module uses the single source of truth.
.claude/agents/calibration/converter.md (1)

81-90: ⚠️ Potential issue | 🟡 Minor

Duplicate step numbering: two steps labeled "6".

Lines 81 and 90 are both numbered "6". The second should be "7" to maintain sequential ordering.

📝 Suggested fix
-6. **MANDATORY — Rule Impact Assessment**: For EVERY rule ID in `nodeIssueSummaries[].flaggedRuleIds`, assess its actual impact on conversion.
+7. **MANDATORY — Rule Impact Assessment**: For EVERY rule ID in `nodeIssueSummaries[].flaggedRuleIds`, assess its actual impact on conversion.

Also update the subsequent steps:

  • Line 94: "7." → "8."
  • Line 99: "8." → "9."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.claude/agents/calibration/converter.md around lines 81 - 90, Fix the
duplicate step numbering in the calibration converter docs by renumbering the
second "6." (the Rule Impact Assessment step that references
nodeIssueSummaries[].flaggedRuleIds and ruleImpactAssessment) to "7.", then
increment the following steps accordingly (change the current "7." to "8." and
"8." to "9.") so the sequence is consecutive.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In @.claude/agents/calibration/converter.md:
- Around line 81-90: Fix the duplicate step numbering in the calibration
converter docs by renumbering the second "6." (the Rule Impact Assessment step
that references nodeIssueSummaries[].flaggedRuleIds and ruleImpactAssessment) to
"7.", then increment the following steps accordingly (change the current "7." to
"8." and "8." to "9.") so the sequence is consecutive.

In `@src/agents/ablation/helpers.ts`:
- Around line 154-156: Remove the duplicate KNOWN_1X_WIDTHS constant in
src/agents/ablation/helpers.ts and import the shared KNOWN_1X_WIDTHS from
src/core/engine/visual-compare-helpers.ts instead; update the code that computes
exportScale (the line using KNOWN_1X_WIDTHS.includes(figmaWidth) ? 1 : 2) to
reference the imported constant and delete the local declaration so the module
uses the single source of truth.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: fae58546-060d-4004-a18a-5a100762f686

📥 Commits

Reviewing files that changed from the base of the PR and between 2289bf8 and 8b3934b.

📒 Files selected for processing (5)
  • .claude/agents/calibration/converter.md
  • .claude/commands/calibrate-loop.md
  • src/agents/ablation/helpers.ts
  • src/agents/evaluation-agent.ts
  • src/core/engine/visual-compare-helpers.ts

- ablation helpers.ts: import inferExportScale() from core instead of local constant
- converter.md: fix duplicate step 6 → sequential 7/8/9

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

♻️ Duplicate comments (1)
.claude/agents/calibration/converter.md (1)

63-63: ⚠️ Potential issue | 🟡 Minor

List numbering still incorrect after previous fix attempt.

Static analysis continues to flag this line as incorrectly numbered (expected 6, actual 5). The past review comment indicated this was addressed, but the issue persists.

Root cause: Line 56 duplicates step number 3, causing all subsequent steps to be off by one. While line 56 is not part of this PR's changes, the new responsive comparison step should be numbered correctly as step 6 to reflect its actual position in the sequence.

🔢 Numbering fix

```diff
-5. **Responsive comparison** (if expanded screenshot exists):
+6. **Responsive comparison** (if expanded screenshot exists):
```

Also fix subsequent steps: 6→7, 7→8, 8→9, 9→10, or address the root cause by fixing the duplicate step 3 at line 56.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.claude/agents/calibration/converter.md at line 63, The numbered list for
"Responsive comparison" is currently labeled 5 but should be 6; update the list
item text "5. **Responsive comparison** (if expanded screenshot exists):" to "6.
**Responsive comparison** (if expanded screenshot exists):" and then renumber
the following steps (change 6→7, 7→8, 8→9, 9→10) OR remove the duplicate "3."
entry at the earlier step (the duplicate on line 56) so the entire sequence
shifts correctly; ensure the label text for the item containing "Responsive
comparison" and all subsequent step labels are consistent with the corrected
sequence.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.claude/agents/calibration/converter.md:
- Around line 79-80: The instructions currently say "set both to null" when only
1 screenshot exists but the schema defines three responsive fields; update the
converter text to explicitly set responsiveSimilarity, responsiveDelta, and
responsiveViewport to null when fewer than two screenshots exist (i.e., skip
responsive comparison and assign null to all three fields).
- Line 98: The description for cssVariableCount uses the ambiguous pattern
`--*:`; update it to an explicit pattern or example so readers know what to
count — e.g., change `--*:` to a concrete regex `--[a-zA-Z0-9_-]+:` (or show an
example like `--primary-color:`) and mention that cssVariableCount should count
unique occurrences of custom properties defined in a <style> block using that
pattern.
- Around line 65-77: The doc is missing explicit steps to pick the largest
screenshot, extract its width, and read the similarity result; update the
instructions to: list and sort screenshot-*.png into an array (e.g.,
SCREENSHOTS), set LARGEST to the last element of that sorted list, extract the
numeric width from LARGEST into LARGEST_WIDTH (e.g., by regex matching the
screenshot-<number>.png pattern), pass "$LARGEST" and "$LARGEST_WIDTH" to the
existing npx canicode visual-compare invocation, and after the run read
responsiveSimilarity from the comparison output (e.g., parse
$RUN_DIR/responsive/comparison.json and read the similarity field into
RESPONSIVE_SIMILARITY). Include these exact symbol names (SCREENSHOTS, LARGEST,
LARGEST_WIDTH, RESPONSIVE_SIMILARITY) and reference the screenshot-*.png pattern
and $RUN_DIR/responsive/comparison.json so reviewers can locate the changes.
- Around line 94-99: The documented code-metrics fields (htmlBytes, htmlLines,
cssClassCount, cssVariableCount, responsiveViewport, responsiveSimilarity) in
step 8 and the conversion.json schema are unused downstream (downstream agents
only read ruleImpactAssessment, uncoveredStruggles, responsiveDelta); either
remove these six fields from the converter.md step 8 and the conversion.json
schema so the converter agent stops collecting them, or update the evaluation
pipeline/downstream extraction to consume them (i.e., add mappings from
htmlBytes/htmlLines/cssClassCount/cssVariableCount/responsiveViewport/responsiveSimilarity
into the downstream outputs or ruleImpactAssessment/responsiveDelta), and ensure
conversion.json, step 8 text, and any schema references (conversion.json,
ruleImpactAssessment, uncoveredStruggles, responsiveDelta) are kept consistent.

---

Duplicate comments:
In @.claude/agents/calibration/converter.md:
- Line 63: The numbered list for "Responsive comparison" is currently labeled 5
but should be 6; update the list item text "5. **Responsive comparison** (if
expanded screenshot exists):" to "6. **Responsive comparison** (if expanded
screenshot exists):" and then renumber the following steps (change 6→7, 7→8,
8→9, 9→10) OR remove the duplicate "3." entry at the earlier step (the duplicate
on line 56) so the entire sequence shifts correctly; ensure the label text for
the item containing "Responsive comparison" and all subsequent step labels are
consistent with the corrected sequence.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 186753e8-3a2c-4501-9542-a9c1bbb41336

📥 Commits

Reviewing files that changed from the base of the PR and between 8b3934b and 3f9548c.

📒 Files selected for processing (2)
  • .claude/agents/calibration/converter.md
  • src/agents/ablation/helpers.ts

Comment on lines +94 to +99
8. **Code metrics**: After writing `output.html`, record these in conversion.json:
- `htmlBytes`: file size in bytes
- `htmlLines`: line count
- `cssClassCount`: unique CSS class selectors in `<style>` block
- `cssVariableCount`: unique CSS custom properties (`--*:`) in `<style>` block
9. Note any difficulties NOT covered by existing rules as `uncoveredStruggles`
Contributor


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify that code metrics and responsive fields (except responsiveDelta) are not consumed

rg -n --type=ts -C3 'htmlBytes|htmlLines|cssClassCount|cssVariableCount|responsiveSimilarity|responsiveViewport' \
  -g '!*.test.ts' \
  -g '!*.md'

Repository: let-sunny/canicode



🏁 Script executed:

# Check orchestrator.ts at the referenced lines to see what fields are extracted
find . -name "orchestrator.ts" -type f | head -5



🏁 Script executed:

# Also search for calibration converter implementation and how it's used
rg -l "calibration.*converter" --type=ts -g '!*.test.ts'



🏁 Script executed:

# Check what the converter agent actually does with these metrics
rg -n "conversion\.json" --type=ts -C2 -g '!*.test.ts' | head -50



🏁 Script executed:

# Check orchestrator.ts at the referenced lines
sed -n '300,330p' src/agents/orchestrator.ts



🏁 Script executed:

# Check calibrate-debate.ts to see what fields it extracts from conversion.json
sed -n '28,80p' src/cli/commands/internal/calibrate-debate.ts



🏁 Script executed:

# Also search for where these metrics (htmlBytes, htmlLines, etc.) are used after being read
rg -n "ruleImpactAssessment|uncoveredStruggles|conversion" src/cli/commands/internal/calibrate-debate.ts -A3 -B3



🏁 Script executed:

# Read the converter.md file mentioned in the review
cat .claude/agents/calibration/converter.md | head -150



🏁 Script executed:

# Search for actual converter agent implementation or execution
rg -l "converter" src/agents --type=ts -g '!*.test.ts' | head -10



🏁 Script executed:

# Look for where conversion.json is written to understand the schema
rg -n "conversion.json" src --type=ts -B5 -A5 | grep -A20 "writeFileSync.*conversion"



🏁 Script executed:

# Verify responsiveSimilarity and responsiveViewport are extracted
rg -n "responsiveSimilarity|responsiveViewport" src/agents/orchestrator.ts -B2 -A2



🏁 Script executed:

# Double-check evaluation-agent contract for these fields
grep -A30 "responsiveSimilarity\|responsiveViewport" src/agents/contracts/evaluation-agent.ts



🏁 Script executed:

# Check for all responsive-related field usage in orchestrator
rg -n "responsive" src/agents/orchestrator.ts -i



🏁 Script executed:

# Check if responsiveViewport appears anywhere in codebase
rg -n "responsiveViewport" src --type=ts



🏁 Script executed:

# Check the full conversion.json schema comment in converter.md to see all fields
sed -n '114,155p' .claude/agents/calibration/converter.md



Remove or use code metrics fields documented in step 8.

The instructions direct the converter agent to collect htmlBytes, htmlLines, cssClassCount, cssVariableCount, responsiveViewport, and responsiveSimilarity (lines 94-99 and the conversion.json schema). However, downstream agents extract only ruleImpactAssessment, uncoveredStruggles, and responsiveDelta from conversion.json. The six metrics above are never consumed and create unnecessary effort for the agent.

Either remove these fields from the documentation or integrate them into the evaluation pipeline if they serve a future purpose.

🧰 Tools
🪛 markdownlint-cli2 (0.22.0)

[warning] 94-94: Ordered list item prefix
Expected: 9; Actual: 8; Style: 1/2/3

(MD029, ol-prefix)


[warning] 99-99: Ordered list item prefix
Expected: 10; Actual: 9; Style: 1/2/3

(MD029, ol-prefix)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.claude/agents/calibration/converter.md around lines 94 - 99, The documented
code-metrics fields (htmlBytes, htmlLines, cssClassCount, cssVariableCount,
responsiveViewport, responsiveSimilarity) in step 8 and the conversion.json
schema are unused downstream (downstream agents only read ruleImpactAssessment,
uncoveredStruggles, responsiveDelta); either remove these six fields from the
converter.md step 8 and the conversion.json schema so the converter agent stops
collecting them, or update the evaluation pipeline/downstream extraction to
consume them (i.e., add mappings from
htmlBytes/htmlLines/cssClassCount/cssVariableCount/responsiveViewport/responsiveSimilarity
into the downstream outputs or ruleImpactAssessment/responsiveDelta), and ensure
conversion.json, step 8 text, and any schema references (conversion.json,
ruleImpactAssessment, uncoveredStruggles, responsiveDelta) are kept consistent.

…s, css pattern

- Fix step numbering: 1-10 sequential (was duplicate 3, duplicate 6)
- Concrete bash: array sort + grep for screenshot width extraction
- "set both to null" → explicit 3 fields (responsiveSimilarity, responsiveDelta, responsiveViewport)
- Code metrics annotated as "for analysis/reporting — not consumed by evaluation"
- CSS variable pattern: --*: → example (--primary-color:, --spacing-md:)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>


Development

Successfully merging this pull request may close these issues.

feat: add responsive viewport comparison to calibration pipeline

1 participant