
fix: calibration pipeline (retina visual-compare, evidence, convergence)#28

Merged
let-sunny merged 12 commits into main from
fix/calibration-pipeline-issue-14
Mar 24, 2026

Conversation


@let-sunny let-sunny commented Mar 24, 2026

Summary

Addresses issue #14: calibration pipeline friction around @2x screenshots, evidence pruning, and fixture-done convergence when proposals keep getting rejected.

Changes

  • visual-compare: Figma Images API uses scale 2 (aligned with save-fixture); infer logical CSS viewport and set Playwright deviceScaleFactor so code.png pixel dimensions match figma.png. CLI no longer defaults to 1440×900. New --figma-scale. MCP tool accepts optional figmaExportScale. Cache filenames include scale.
  • Evidence: extractAppliedRuleIds trims rule IDs and normalizes decision casing; pruneCalibrationEvidence trims when matching; appendCalibrationEvidence replaces prior row for the same (ruleId, fixture).
  • Convergence: isConverged(runDir, { lenient: true }) when there are no applied/revised decisions but rejects remain; fixture-done --lenient-convergence. Documented in calibrate-night command.
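
The viewport inference in the first bullet can be sketched as follows. This is a minimal illustration under assumptions: `inferLogicalViewport` is a hypothetical helper name, not necessarily what `src/core/engine/visual-compare.ts` actually exports.

```typescript
// Hypothetical sketch of the viewport inference described above: given the
// physical pixel size of the Figma PNG export and the export scale (default
// 2), derive the logical CSS viewport and the Playwright deviceScaleFactor
// so code.png comes out at the same pixel dimensions as figma.png.
interface LogicalViewport {
  width: number;
  height: number;
}

function inferLogicalViewport(
  pngWidth: number,
  pngHeight: number,
  figmaExportScale: number = 2,
): { viewport: LogicalViewport; deviceScaleFactor: number } {
  // Logical CSS pixels = physical pixels / export scale.
  const viewport = {
    width: Math.round(pngWidth / figmaExportScale),
    height: Math.round(pngHeight / figmaExportScale),
  };
  // Rendering at this DPR makes the code screenshot match the Figma export
  // pixel-for-pixel (e.g. a @2x export renders at deviceScaleFactor 2).
  return { viewport, deviceScaleFactor: figmaExportScale };
}
```

For example, a 2880×1800 @2x export yields a 1440×900 logical viewport at deviceScaleFactor 2, which is why the CLI no longer needs a hardcoded 1440×900 default.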

Not in this PR

  • Automatic decay/removal of repeatedly rejected evidence (needs a separate design).
  • Optional: tie export scale to fixture data.json frame bounds for stronger inference.

Test plan

  • pnpm lint
  • pnpm test:run

Made with Cursor

Summary by CodeRabbit

Release Notes

  • New Features

    • Added --lenient-convergence flag to fixture-done command for flexible convergence checking when rejections occur without applications
    • Added --figma-scale option to visual-compare command for configurable Figma export scaling
    • Design tree output now includes token usage metrics
    • Calibration reports display token consumption statistics alongside grades
  • Documentation

    • Updated guidance on convergence modes and visual comparison behavior
    • Added analysis validity feedback reference documentation

…vergence

- visual-compare: default Figma export scale 2, logical viewport + deviceScaleFactor
  so code.png matches @2x figma.png; CLI omits 1440×900 override; --figma-scale
- Figma image cache keys include scale; MCP visual-compare accepts figmaExportScale
- extractAppliedRuleIds / pruneCalibrationEvidence: trim + case-insensitive decisions
- appendCalibrationEvidence: replace same (ruleId, fixture) on re-run
- fixture-done --lenient-convergence + isConverged({ lenient }) for reject-only stalls
- Document lenient flag in calibrate-night command

Closes #14
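
The "replace same (ruleId, fixture) on re-run" behavior in the commit above can be sketched like this. The shapes and the helper name `appendWithDedupe` are assumptions for illustration; the real `appendCalibrationEvidence` lives in `src/agents/evidence-collector.ts` and may differ.

```typescript
// Sketch of composite-key deduplication for calibration evidence: trim the
// key parts so whitespace differences don't split buckets, and use a Map so
// the last row written for a (ruleId, fixture) pair wins.
interface EvidenceRow {
  ruleId: string;
  fixture: string;
  decision: string;
}

function appendWithDedupe(
  existing: EvidenceRow[],
  incoming: EvidenceRow[],
): EvidenceRow[] {
  const byKey = new Map<string, EvidenceRow>();
  for (const row of [...existing, ...incoming]) {
    // Normalize before keying and store the normalized row, so trimmed
    // values are what gets persisted.
    const ruleId = row.ruleId.trim();
    const fixture = row.fixture.trim();
    byKey.set(`${ruleId}\u0000${fixture}`, { ...row, ruleId, fixture });
  }
  return [...byKey.values()];
}
```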

coderabbitai bot commented Mar 24, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: a710e783-9b0a-4429-8515-6ee487246508

📥 Commits

Reviewing files that changed from the base of the PR and between fdfd3e9 and aec2e6a.

📒 Files selected for processing (4)
  • CLAUDE.md
  • src/agents/run-directory.test.ts
  • src/agents/run-directory.ts
  • src/mcp/server.ts

📝 Walkthrough

Walkthrough

The PR adds lenient convergence mode for fixture completion, normalizes whitespace in calibration evidence handling, introduces configurable Figma export scaling to the visual-compare pipeline with logical viewport inference, adds design-tree metrics tracking (tokens and bytes), and updates supporting documentation and test coverage across multiple systems.

Changes

  • Convergence & Evidence Handling (src/agents/run-directory.ts, src/agents/run-directory.test.ts, src/agents/evidence-collector.ts, src/agents/evidence-collector.test.ts):
    Added ConvergenceOptions interface with optional lenient flag; updated isConverged to accept options and return true in lenient mode when zero applied/revised decisions exist (regardless of rejections). Enhanced evidence collection with whitespace trimming and composite-key deduplication for calibration entries; added test coverage for dedup logic and whitespace-robust rule matching.
  • CLI Updates (src/cli/index.ts, src/cli/docs.ts):
    Added --lenient-convergence boolean option to fixture-done command; updated design-tree to use generateDesignTreeWithStats and report token/byte metrics; refactored visual-compare to accept --figma-scale, remove hardcoded viewport defaults, and validate numeric parameters. Updated documentation comments for visual-compare workflow and viewport inference logic.
  • Visual Comparison Pipeline (src/core/engine/visual-compare.ts, src/core/engine/visual-compare.test.ts, src/mcp/server.ts):
    Introduced figmaExportScale option (default 2) for Figma screenshot fetching; made viewport properties optional and added logic to infer logical CSS viewport from PNG dimensions and scale. Updated renderCodeScreenshot signature to accept logicalViewport and deviceScaleFactor; modified test helpers to include scale in cache filenames. Added figmaExportScale parameter to MCP tool schema (integers 1–4).
  • Design Tree Metrics (src/core/engine/design-tree.ts, src/core/engine/design-tree.test.ts, src/agents/report-generator.ts):
    Added DesignTreeResult interface and new generateDesignTreeWithStats function computing byte length and token estimates; refactored generateDesignTree to delegate to the stats variant. Extended CalibrationReportData with optional tokenMetrics field; updated overview rendering to conditionally display formatted token counts.
  • Documentation & Calibration Guidance (.claude/commands/calibrate-night.md, .claude/commands/add-rule.md, CLAUDE.md, docs/fixtures/ANALYSIS-VALIDITY-FEEDBACK.md):
    Clarified the default convergence condition in debate.json and documented the --lenient-convergence option. Renamed the validation step to include token efficiency measurement alongside visual comparison. Added a "Target Environment" section defining scope (large component-heavy pages). Introduced a comprehensive fixture validity feedback document with categorized analysis patterns and remedies.
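
The lenient convergence rule summarized above can be illustrated with a small sketch. The decision shape and the strict rule shown here are assumptions; the real `isConverged` in `src/agents/run-directory.ts` validates debate.json more defensively.

```typescript
// Sketch of lenient vs strict convergence: lenient mode treats a run with
// zero applied/revised decisions as converged (a reject-only stall should
// finish instead of looping), while the strict (assumed) rule requires at
// least one applied/revised decision and no rejections.
type Decision = { decision: string };

function isConvergedSketch(
  decisions: Decision[],
  options: { lenient?: boolean } = {},
): boolean {
  // Normalize casing/whitespace, mirroring the PR's decision normalization.
  const norm = (d: Decision) => d.decision.trim().toLowerCase();
  const applied = decisions.filter((d) =>
    ["applied", "revised"].includes(norm(d)),
  ).length;
  const rejected = decisions.filter((d) => norm(d) === "rejected").length;

  if (options.lenient && applied === 0) {
    return true; // nothing was applied this round: converged in lenient mode
  }
  return applied > 0 && rejected === 0; // assumed strict rule
}
```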

Sequence Diagram(s)

sequenceDiagram
    participant Client as CLI/MCP
    participant FigmaAPI as Figma API
    participant PNG as PNG Processor
    participant Viewport as Viewport Calculator
    participant Browser as Playwright Browser
    participant Screenshot as Screenshot Engine

    Client->>FigmaAPI: Fetch screenshot (scale=${figmaExportScale})
    FigmaAPI-->>PNG: Return PNG @${scale}x
    PNG->>PNG: Extract physical dimensions
    Viewport->>Viewport: Compute logical CSS viewport<br/>(PNG dims ÷ scale)
    Viewport->>Viewport: Infer deviceScaleFactor<br/>(physical dims ÷ logical dims)
    Client->>Browser: Create context (logicalViewport,<br/>deviceScaleFactor)
    Browser->>Screenshot: Render code screenshot
    Screenshot-->>Browser: Pixel data
    Browser-->>Client: Captured image (physical dims)
    Client->>Client: Compare images

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~30 minutes


Poem

🐰 In viewports where pixels align just right,
Figma scales and screenshots take flight!
Convergence grows lenient, evidence trim'd,
Design trees now count tokens, their edges well-brimm'd!
With logic and metrics, we hop to the light!

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
  • Description Check (✅ Passed): Check skipped - CodeRabbit’s high-level summary is enabled.
  • Title check (✅ Passed): The title 'fix: calibration pipeline (retina visual-compare, evidence, convergence)' clearly and concisely summarizes the main changes across multiple interrelated systems in the calibration pipeline, directly reflecting the PR's core objectives.
  • Docstring Coverage (✅ Passed): Docstring coverage is 84.00%, which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


- Guard extractAppliedRuleIds and isConverged when arbitrator.decisions is not an array
- Use trimmed fixture in calibration evidence dedupe key
- Add tests for malformed debate.json
- Add docs/RESEARCH-BRANCH-CALIBRATION-RULE-DISCOVERY.md (calibration & rule discovery)
- Align visual-compare.test.ts comment with padPng behavior

Made-with: Cursor
@let-sunny
Owner Author

Latest push (7bc939c)

  • Research report: docs/RESEARCH-BRANCH-CALIBRATION-RULE-DISCOVERY.md — system map, retina/visual-compare analysis, evidence & convergence semantics, risks, and a prioritized improvement roadmap (P0–P4).
  • Hardening: extractAppliedRuleIds / isConverged treat non-array arbitrator.decisions safely; appendCalibrationEvidence dedupe keys now trim fixture; tests for malformed debate.json.
  • Chore: Align visual-compare.test.ts header comment with padPng (no behavior change).

let-sunny and others added 3 commits March 25, 2026 01:29
- Dedupe calibration evidence within a single append (Map, last wins)
- Treat non-object decision rows as inert in extractAppliedRuleIds and isConverged
- Doc: fix overscored/underscored typo; strict convergence note; Appendix A self-review

Made-with: Cursor
- SCALE_ROUNDING_TOLERANCE (0.08): broader for @2x/@3x detection
- UNITY_SCALE_TOLERANCE (0.02): tighter for 1x false-positive prevention
- parseDebateResult: validates shape, debug logs on malformed input

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
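
The two tolerance constants in the commit above read as a snap-to-known-scale heuristic, roughly like this. The function name and the candidate scales are assumptions for illustration; only the constants and their intent come from the commit.

```typescript
// Broader tolerance for detecting @2x/@3x (accepts roughly 1.92-2.08 as 2x),
// tighter tolerance around 1 to avoid misreading a slightly-off ratio as 1x.
const SCALE_ROUNDING_TOLERANCE = 0.08;
const UNITY_SCALE_TOLERANCE = 0.02;

// Snap a measured physical/logical pixel ratio to a known device scale,
// or return null when nothing matches confidently.
function snapScale(ratio: number): number | null {
  if (Math.abs(ratio - 1) <= UNITY_SCALE_TOLERANCE) return 1;
  for (const s of [2, 3]) {
    if (Math.abs(ratio - s) <= SCALE_ROUNDING_TOLERANCE) return s;
  }
  return null;
}
```

The asymmetry is the point of the commit: a ratio of 1.05 is rejected rather than rounded to 1x, while 2.05 still counts as @2x.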
@let-sunny let-sunny linked an issue Mar 24, 2026 that may be closed by this pull request
@let-sunny let-sunny marked this pull request as ready for review March 24, 2026 22:36
@let-sunny let-sunny marked this pull request as draft March 24, 2026 22:40

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 7

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/agents/evidence-collector.ts`:
- Around line 89-99: The dedupe currently trims only the key but reuses original
entries, causing untrimmed ruleId/fixture to be persisted; update the dedup
logic in incomingByKey/mergedIncoming so that when you compute k in the loop
over entries you also create and store a normalized CalibrationEvidenceEntry
(with ruleId and fixture trimmed) into incomingByKey (instead of the raw e), so
mergedIncoming contains trimmed values and withoutDupes/mergedIncoming push
persists the normalized ruleId/fixture (affecting incomingByKey, mergedIncoming,
keys, withoutDupes and referencing loadCalibrationEvidence()).

In `@src/agents/run-directory.ts`:
- Around line 193-206: The parseDebateResult function currently casts nested
fields from debate.json directly, allowing malformed critic/arbitrator shapes to
pass through; replace the ad-hoc casting with a Zod schema (e.g., create
DebateResultSchema in contracts/) that describes the expected DebateResult
shape, use DebateResultSchema.parse or safeParse inside parseDebateResult to
validate the parsed raw object, return null on validation failure and use
z.infer<typeof DebateResultSchema> to drive the DebateResult type so downstream
code receives a sound, validated shape.

In `@src/cli/index.ts`:
- Around line 935-958: The CLI currently validates only --figma-scale; add
validation for options.width and options.height so zero, negative, non-numeric
or non-finite values are rejected at the CLI boundary (mirror the figmaScale
check: coerce to Number, ensure Number.isFinite(value) and value >= 1) before
calling visualCompare; implement this by creating/using a Zod schema in the
contracts directory for the viewport overrides (or extend the existing CLI input
schema) to validate width and height and then pass the validated values into
visualCompare (refer to options.width, options.height, visualCompare, and the
contracts Zod schemas).

In `@src/core/engine/visual-compare.test.ts`:
- Around line 23-24: The test is duplicating helper logic (padPng,
compareScreenshots and viewport/device-scale math) instead of using the real
implementation, so extract the shared cache/scale helpers into a small internal
module and import them in tests: create an internal helpers module that exports
padPng, compareScreenshots and inferDeviceScaleFactor (or the cache/scale
helpers currently in visual-compare.ts), update visual-compare.ts to import and
use those helpers, then update visual-compare.test.ts to remove the hand-copied
slice and import the real helpers and add tests that exercise the
inferDeviceScaleFactor path and the cache/scale behavior to ensure device-scale
math is validated against the actual implementation.

In `@src/core/engine/visual-compare.ts`:
- Around line 150-154: Wrap the browser/context/page usage in a try/finally so
the Chromium process is always closed: after creating browser, call
browser.newContext(...) and context.newPage() inside a try block where you
perform goto/waitForTimeout/screenshot, and in the finally block await
context.close() (or context?.close()) and await browser.close() to guarantee
cleanup even on errors; update references to browser, context, and page in the
function (e.g., the code around browser.newContext, context.newPage, and any
subsequent waits/screenshots) so they are declared in outer scope and closed in
the finally.
- Around line 261-268: The code currently reuses figma.png from
figmaScreenshotPath without validating that it matches the current fileKey,
nodeId or exportScale (exportScale / options.figmaExportScale), causing stale or
wrong-scale images to be reused; update the logic in the visual-compare flow
(around exportScale, figmaScreenshotPath, and fetchFigmaScreenshot) to include
exportScale (and/or fileKey/nodeId) in the cache identity or verify metadata
before short-circuiting: either compute a cache filename that embeds
exportScale/fileKey/nodeId or check stored metadata for those values and call
fetchFigmaScreenshot(fileKey, nodeId, options.figmaToken, figmaScreenshotPath,
exportScale) when they differ so the image is always correct for the current
run.

In `@src/mcp/server.ts`:
- Around line 419-427: Update the inline "visual-compare" docs string in
src/mcp/server.ts so it matches the current contract: include the new
figmaExportScale option and its default/behavior, mention that logical viewport
and device pixel ratio are inferred from the Figma PNG export scale (default 2×)
so code.png matches figma.png pixels, and ensure the Playwright + Chromium
requirement and Figma API token fallback are documented; specifically edit the
docstring that precedes the zod schema (the paragraph referencing
Playwright/Chromium and the subsequent z.string()/z.number() entries) to
describe figmaExportScale, its default (2), valid range, and how it affects
DPR/viewport.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: fb191c0e-41e3-4d98-ae9b-335d38350be2

📥 Commits

Reviewing files that changed from the base of the PR and between b77b433 and 465eecc.

📒 Files selected for processing (10)
  • .claude/commands/calibrate-night.md
  • src/agents/evidence-collector.test.ts
  • src/agents/evidence-collector.ts
  • src/agents/run-directory.test.ts
  • src/agents/run-directory.ts
  • src/cli/docs.ts
  • src/cli/index.ts
  • src/core/engine/visual-compare.test.ts
  • src/core/engine/visual-compare.ts
  • src/mcp/server.ts

Comment on lines +23 to 24
* logic (padPng + compareScreenshots) so the tests validate
* the algorithm rather than just the export boundary.


🛠️ Refactor suggestion | 🟠 Major

Extract the pure helpers instead of mirroring them here.

This file still tests a hand-copied slice of visual-compare.ts, so the new viewport/device-scale math can drift or regress without the suite noticing. Pull the cache/scale helpers into a small internal module and test the real implementation, especially the new inferDeviceScaleFactor path.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/core/engine/visual-compare.test.ts` around lines 23 - 24, The test is
duplicating helper logic (padPng, compareScreenshots and viewport/device-scale
math) instead of using the real implementation, so extract the shared
cache/scale helpers into a small internal module and import them in tests:
create an internal helpers module that exports padPng, compareScreenshots and
inferDeviceScaleFactor (or the cache/scale helpers currently in
visual-compare.ts), update visual-compare.ts to import and use those helpers,
then update visual-compare.test.ts to remove the hand-copied slice and import
the real helpers and add tests that exercise the inferDeviceScaleFactor path and
the cache/scale behavior to ensure device-scale math is validated against the
actual implementation.

Comment on lines +150 to +154
const context = await browser.newContext({
viewport: logicalViewport,
deviceScaleFactor,
});
const page = await context.newPage();


⚠️ Potential issue | 🟠 Major

Close Chromium in a finally block.

Any later goto, waitForTimeout, or screenshot failure skips browser.close(). In the MCP server that leaks browser processes across requests.

Suggested fix
   const { chromium } = await import("playwright");
   const browser = await chromium.launch();
-  const context = await browser.newContext({
-    viewport: logicalViewport,
-    deviceScaleFactor,
-  });
-  const page = await context.newPage();
-
-  await page.goto(`file://${resolve(codePath)}`, {
-    waitUntil: "networkidle",
-    timeout: 30_000,
-  });
-  await page.waitForTimeout(1000);
-
-  // Capture only the first child element (the design root), not the full body/viewport
-  const root = page.locator("body > *:first-child");
-  if (await root.count() > 0) {
-    await root.screenshot({ path: outputPath });
-  } else {
-    await page.screenshot({ path: outputPath });
-  }
-  await browser.close();
+  try {
+    const context = await browser.newContext({
+      viewport: logicalViewport,
+      deviceScaleFactor,
+    });
+    const page = await context.newPage();
+
+    await page.goto(`file://${resolve(codePath)}`, {
+      waitUntil: "networkidle",
+      timeout: 30_000,
+    });
+    await page.waitForTimeout(1000);
+
+    // Capture only the first child element (the design root), not the full body/viewport
+    const root = page.locator("body > *:first-child");
+    if (await root.count() > 0) {
+      await root.screenshot({ path: outputPath });
+    } else {
+      await page.screenshot({ path: outputPath });
+    }
+  } finally {
+    await browser.close();
+  }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/core/engine/visual-compare.ts` around lines 150 - 154, Wrap the
browser/context/page usage in a try/finally so the Chromium process is always
closed: after creating browser, call browser.newContext(...) and
context.newPage() inside a try block where you perform
goto/waitForTimeout/screenshot, and in the finally block await context.close()
(or context?.close()) and await browser.close() to guarantee cleanup even on
errors; update references to browser, context, and page in the function (e.g.,
the code around browser.newContext, context.newPage, and any subsequent
waits/screenshots) so they are declared in outer scope and closed in the
finally.

Comment on lines +261 to 268
const exportScale = options.figmaExportScale ?? 2;

// Step 1: Fetch Figma screenshot (skip if already cached in output dir)
if (existsSync(figmaScreenshotPath)) {
// Reuse cached figma.png — same design, no need to re-fetch
} else {
await fetchFigmaScreenshot(fileKey, nodeId, options.figmaToken, figmaScreenshotPath);
await fetchFigmaScreenshot(fileKey, nodeId, options.figmaToken, figmaScreenshotPath, exportScale);
if (!existsSync(figmaScreenshotPath)) {


⚠️ Potential issue | 🟠 Major

Don't short-circuit on an existing figma.png.

figma.png is now scale-dependent, but this branch reuses whatever is already in outputDir without checking fileKey, nodeId, or exportScale. A second run against the default temp dir can silently compare new code against the previous design or scale.

Suggested fix
-  // Step 1: Fetch Figma screenshot (skip if already cached in output dir)
-  if (existsSync(figmaScreenshotPath)) {
-    // Reuse cached figma.png — same design, no need to re-fetch
-  } else {
-    await fetchFigmaScreenshot(fileKey, nodeId, options.figmaToken, figmaScreenshotPath, exportScale);
-    if (!existsSync(figmaScreenshotPath)) {
-      throw new Error(`Figma screenshot was not created at expected path: ${figmaScreenshotPath}`);
-    }
-  }
+  // Step 1: Materialize the screenshot for this file/node/scale.
+  await fetchFigmaScreenshot(fileKey, nodeId, options.figmaToken, figmaScreenshotPath, exportScale);
+  if (!existsSync(figmaScreenshotPath)) {
+    throw new Error(`Figma screenshot was not created at expected path: ${figmaScreenshotPath}`);
+  }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/core/engine/visual-compare.ts` around lines 261 - 268, The code currently
reuses figma.png from figmaScreenshotPath without validating that it matches the
current fileKey, nodeId or exportScale (exportScale / options.figmaExportScale),
causing stale or wrong-scale images to be reused; update the logic in the
visual-compare flow (around exportScale, figmaScreenshotPath, and
fetchFigmaScreenshot) to include exportScale (and/or fileKey/nodeId) in the
cache identity or verify metadata before short-circuiting: either compute a
cache filename that embeds exportScale/fileKey/nodeId or check stored metadata
for those values and call fetchFigmaScreenshot(fileKey, nodeId,
options.figmaToken, figmaScreenshotPath, exportScale) when they differ so the
image is always correct for the current run.

let-sunny and others added 4 commits March 25, 2026 07:48
1. Token metrics:
   - generateDesignTreeWithStats() returns estimatedTokens + bytes
   - CLI design-tree command shows token estimate
   - CalibrationReportData includes tokenMetrics (tokens, bytes, per-node)

2. Target environment in CLAUDE.md:
   - Primary target: teams with designers, 300+ node pages
   - Token consumption is first-class metric
   - Component scores should not be lowered from small fixture calibration

3. add-rule A/B always includes token comparison:
   - Step 4 measures both visual similarity AND token savings
   - Token savings ratio: 1 - tokens_b / tokens_a
   - Even if visual diff is zero, token savings justify the rule

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Returns tree, estimatedTokens, bytes
- Output matches generateDesignTree
- CLI verified: 31KB fixture → ~7912 tokens

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
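
The "31KB fixture → ~7912 tokens" figure above is consistent with a simple bytes-per-token heuristic. The exact estimator used by `generateDesignTreeWithStats` is not shown in this PR; ~4 bytes per token is a common rule of thumb and is an assumption here.

```typescript
// Rough token estimate from serialized size, assuming ~4 bytes per token.
const BYTES_PER_TOKEN = 4;

function estimateTokens(bytes: number): number {
  return Math.round(bytes / BYTES_PER_TOKEN);
}
```

For a 31 KB fixture this gives 31 × 1024 / 4 = 7936, in line with the ~7912 the CLI reported.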
- Extract normalizeDecision() and countDecisions() from duplicated
  String(d.decision).trim().toLowerCase() pattern (was 3x)
- Add APPLIED_TYPES/REJECTED_TYPES constants
- Add JSDoc to all exported interfaces/functions in changed files:
  run-directory.ts, visual-compare.ts, design-tree.ts,
  report-generator.ts, docs.ts

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1. evidence-collector: normalize ruleId/fixture before storing
   (prevents bucket splitting from whitespace differences)
2. cli: validate --width/--height as positive numbers
3. visual-compare: browser.close() in finally block
   (prevents Chromium leak on goto/screenshot failure)
4. visual-compare: remove stale figma.png comment, keep simple
   cache-miss-only logic
5. mcp docs: sync visual-compare topic with new scale options

Deferred: Zod schema for debate.json (#2), test helper extraction (#4)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@let-sunny
Owner Author

Review feedback addressed (5/7)

# | Item | Status
1 | Normalize evidence ruleId/fixture before storing | ✅ Done
2 | Parse debate.json with a Zod schema | ⏭️ Split into issue #32
3 | Validate --width/--height as positive numbers | ✅ Done
4 | Extract visual-compare test helpers | ⏭️ Split into issue #33
5 | browser.close() in a finally block | ✅ Done
6 | Clean up the figma.png cache logic | ✅ Done
7 | Update the MCP visual-compare docs | ✅ Done

Items 2 and 4 are refactors beyond the scope of this PR, so they were split into separate issues.

@let-sunny let-sunny marked this pull request as ready for review March 24, 2026 22:58
Three templates matching the patterns used in this project:
- Bug: symptom + cause + fix + affected files
- Feature/Refactor: background + current + proposal + prerequisites
- Research: question + method + expected outcome + blockers

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 7

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/cli/index.ts (1)

920-967: ⚠️ Potential issue | 🟠 Major

Coerce --width and --height from strings before validation.

CAC returns <value> options as strings, but the code validates width and height without parsing them first. Number.isFinite("1440") returns false, so valid inputs like --width 1440 fail and the command exits. The figmaScale option already demonstrates the correct pattern: coerce with Number() then validate. Additionally, per coding guidelines, the entire option contract should be validated with a Zod schema instead of inline checks.

Suggested fix
-  .action(async (codePath: string, options: VisualCompareOptions & { figmaScale?: string }) => {
+  .action(async (
+    codePath: string,
+    options: Omit<VisualCompareOptions, "width" | "height"> & {
+      width?: string;
+      height?: string;
+      figmaScale?: string;
+    },
+  ) => {
       try {
@@
+      const width = options.width !== undefined ? Number(options.width) : undefined;
+      const height = options.height !== undefined ? Number(options.height) : undefined;
+
-      if (options.width !== undefined && (!Number.isFinite(options.width) || options.width <= 0)) {
+      if (width !== undefined && (!Number.isFinite(width) || width <= 0)) {
         console.error("Error: --width must be a positive number");
         process.exit(1);
       }
-      if (options.height !== undefined && (!Number.isFinite(options.height) || options.height <= 0)) {
+      if (height !== undefined && (!Number.isFinite(height) || height <= 0)) {
         console.error("Error: --height must be a positive number");
         process.exit(1);
       }
 
-      const hasViewportOverride = options.width !== undefined || options.height !== undefined;
+      const hasViewportOverride = width !== undefined || height !== undefined;
@@
-                ...(options.width !== undefined ? { width: options.width } : {}),
-                ...(options.height !== undefined ? { height: options.height } : {}),
+                ...(width !== undefined ? { width } : {}),
+                ...(height !== undefined ? { height } : {}),
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/cli/index.ts` around lines 920 - 967, The width/height validation
currently uses Number.isFinite on options.width/options.height which are strings
from CAC and therefore fails for valid numeric inputs; mirror the figmaScale
pattern by coercing options.width and options.height with Number(...) (e.g.,
const width = options.width !== undefined ? Number(options.width) : undefined)
before running the finite/positive checks, then pass those numeric values into
the visualCompare invocation (replace references to options.width/options.height
in the viewport object with the coerced width/height); optionally, for larger
refactors, replace these inline checks with a Zod schema to validate/coerce the
entire CLI options contract (validate figmaScale/width/height there and use the
parsed results in visualCompare).
♻️ Duplicate comments (2)
src/core/engine/visual-compare.ts (1)

266-275: ⚠️ Potential issue | 🟠 Major

Don't trust an existing outputDir/figma.png.

The global cache is now keyed by file/node/scale, but this branch still reuses any preexisting figma.png in the output directory. Re-running visual-compare against the default tmp dir can compare fresh code against an old reference and skew similarity/evidence. Always call fetchFigmaScreenshot(...) here — it already hits the keyed cache when safe — or validate metadata before skipping.

🖼️ Suggested fix
-  if (!existsSync(figmaScreenshotPath)) {
-    await fetchFigmaScreenshot(fileKey, nodeId, options.figmaToken, figmaScreenshotPath, exportScale);
-    if (!existsSync(figmaScreenshotPath)) {
-      throw new Error(`Figma screenshot was not created at expected path: ${figmaScreenshotPath}`);
-    }
-  }
+  await fetchFigmaScreenshot(fileKey, nodeId, options.figmaToken, figmaScreenshotPath, exportScale);
+  if (!existsSync(figmaScreenshotPath)) {
+    throw new Error(`Figma screenshot was not created at expected path: ${figmaScreenshotPath}`);
+  }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/core/engine/visual-compare.ts` around lines 266 - 275, The current branch
skips fetching the Figma screenshot if outputDir/figma.png already exists, which
can reuse stale images; change the logic in visual-compare.ts to always call
fetchFigmaScreenshot(fileKey, nodeId, options.figmaToken, figmaScreenshotPath,
exportScale) (where exportScale = options.figmaExportScale ?? 2) instead of
gating on existsSync(figmaScreenshotPath), since fetchFigmaScreenshot already
uses the cache keyed by file/node/scale; remove or replace the early existsSync
check and still keep the post-fetch existence check that throws the Error if
figmaScreenshotPath was not created.
src/agents/run-directory.ts (1)

193-209: ⚠️ Potential issue | 🟠 Major

debate.json is still not actually schema-validated.

The new guard only proves the top level is an object. critic, arbitrator, and nested decisions are still cast wholesale, so malformed JSON can escape this function as a trusted DebateResult despite the JSDoc claiming defensive validation. Please move this to a DebateResultSchema in src/contracts and infer the returned type from that schema.

Based on learnings "Validate all external inputs with Zod schemas and infer TypeScript types using z.infer<typeof Schema>" and "Define all Zod validation schemas in the contracts/ directory".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/agents/run-directory.ts` around lines 193 - 209, The parseDebateResult
function currently only checks that debate.json is an object and then casts
fields unsafely; replace this with a Zod-backed validation: create a
DebateResultSchema (using zod) in the contracts module, export its inferred type
(z.infer<typeof DebateResultSchema>) as DebateResult, then in parseDebateResult
call DebateResultSchema.parse or safeParse on the parsed JSON and return the
validated value (or null on failure), ensuring nested fields like critic,
arbitrator, and decisions are properly validated rather than cast.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.claude/commands/add-rule.md:
- Around line 96-101: The token-savings calculation using the expression `1 -
tokens_b / tokens_a` is unsafe when `tokens_a` can be zero; update the doc and
any corresponding implementation notes to first check `tokens_a == 0` (or falsy)
and handle that case explicitly—e.g., set the savings ratio to 0 or null, or
mark the comparison as invalid/skipped—before computing `1 - tokens_b /
tokens_a`; reference the variables `tokens_a` and `tokens_b` and the ratio
expression so maintainers can find and update the logic and documentation
accordingly.

In @.github/ISSUE_TEMPLATE/bug.yml:
- Around line 1-44: Insert a new checkbox field at the start of the form body
(before the existing "symptom" field) to prompt users to confirm they searched
for duplicate issues; add a field with type: checkboxes, id: duplicate_search
(or similar), attributes.label: "I searched for existing issues" and an options
array containing a single option with label "Yes, I searched" and value "yes" so
reviewers can quickly spot potential duplicates when triaging; keep the existing
symptom, cause, fix, files, and related fields unchanged.

In `@docs/fixtures/ANALYSIS-VALIDITY-FEEDBACK.md`:
- Line 58: Update the header text in the line "### E. `figma-ui3-kit` (token
category very weak, few nodes)" to use a more precise, measurable descriptor
(e.g., "limited token coverage" or "low token coverage, few nodes") instead of
the vague phrase "very weak" so the analysis is clearer and actionable.
- Line 141: In the sentence "Font and color contract in README for this fixture;
optional webfont load in generated HTML." replace the token "webfont" with "web
font" to match the project's orthography convention; update that phrase in
ANALYSIS-VALIDITY-FEEDBACK.md so it reads "...optional web font load in
generated HTML." ensuring consistency with other docs.
- Line 1: Rename the file docs/fixtures/ANALYSIS-VALIDITY-FEEDBACK.md to
kebab-case (for example docs/fixtures/analysis-validity-feedback.md) and update
all references to it (README links, tests, docs, or any import/path usages) so
they point to the new filename; ensure the git move preserves history (git mv)
or the change is committed as a rename.

In `@src/agents/report-generator.ts`:
- Around line 57-59: renderOverview conditionally renders the token metrics rows
only when data.tokenMetrics exists, but the report object constructed in the
orchestrator does not include tokenMetrics; update the orchestrator's report
construction to populate the tokenMetrics property (e.g., from the token metrics
computation or a new computeTokenMetrics function/variable) so data.tokenMetrics
is present when calling renderOverview, or alternatively change renderOverview
to derive metrics from the existing report fields; specifically, add a
tokenMetrics object to the report payload (containing designTreeTokens,
designTreeBytes, tokensPerNode) where the report is assembled so renderOverview
can emit the rows.
- Around line 19-24: The tokenMetrics property declaration currently uses an
explicit "| undefined" on top of the optional marker; remove the " | undefined"
so the declaration reads tokenMetrics?: { designTreeTokens: number;
designTreeBytes: number; tokensPerNode: number; } thereby relying on the
exactOptionalPropertyTypes behavior; update the tokenMetrics type in the
report-generator (the doc comment "Design tree token metrics") accordingly and
run type checks to ensure no other code expects an explicit undefined union.

---

Outside diff comments:
In `@src/cli/index.ts`:
- Around line 920-967: The width/height validation currently uses
Number.isFinite on options.width/options.height which are strings from CAC and
therefore fails for valid numeric inputs; mirror the figmaScale pattern by
coercing options.width and options.height with Number(...) (e.g., const width =
options.width !== undefined ? Number(options.width) : undefined) before running
the finite/positive checks, then pass those numeric values into the
visualCompare invocation (replace references to options.width/options.height in
the viewport object with the coerced width/height); optionally, for larger
refactors, replace these inline checks with a Zod schema to validate/coerce the
entire CLI options contract (validate figmaScale/width/height there and use the
parsed results in visualCompare).

---

Duplicate comments:
In `@src/agents/run-directory.ts`:
- Around line 193-209: The parseDebateResult function currently only checks that
debate.json is an object and then casts fields unsafely; replace this with a
Zod-backed validation: create a DebateResultSchema (using zod) in the contracts
module, export its inferred type (z.infer<typeof DebateResultSchema>) as
DebateResult, then in parseDebateResult call DebateResultSchema.parse or
safeParse on the parsed JSON and return the validated value (or null on
failure), ensuring nested fields like critic, arbitrator, and decisions are
properly validated rather than cast.

In `@src/core/engine/visual-compare.ts`:
- Around line 266-275: The current branch skips fetching the Figma screenshot if
outputDir/figma.png already exists, which can reuse stale images; change the
logic in visual-compare.ts to always call fetchFigmaScreenshot(fileKey, nodeId,
options.figmaToken, figmaScreenshotPath, exportScale) (where exportScale =
options.figmaExportScale ?? 2) instead of gating on
existsSync(figmaScreenshotPath), since fetchFigmaScreenshot already uses the
cache keyed by file/node/scale; remove or replace the early existsSync check and
still keep the post-fetch existence check that throws the Error if
figmaScreenshotPath was not created.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: ce88d9e9-70fd-4fe9-a63a-889efc0b2a3a

📥 Commits

Reviewing files that changed from the base of the PR and between 465eecc and bd8c90b.

📒 Files selected for processing (15)
  • .claude/commands/add-rule.md
  • .github/ISSUE_TEMPLATE/bug.yml
  • .github/ISSUE_TEMPLATE/feature.yml
  • .github/ISSUE_TEMPLATE/research.yml
  • CLAUDE.md
  • docs/fixtures/ANALYSIS-VALIDITY-FEEDBACK.md
  • src/agents/evidence-collector.ts
  • src/agents/report-generator.ts
  • src/agents/run-directory.ts
  • src/cli/docs.ts
  • src/cli/index.ts
  • src/core/engine/design-tree.test.ts
  • src/core/engine/design-tree.ts
  • src/core/engine/visual-compare.ts
  • src/mcp/server.ts

Comment on lines +96 to +101
6. **Token comparison** (always measure, even if visual diff is zero):
- Count lines/bytes of `visual-a.html` vs `visual-b.html`
- If Test B is significantly smaller → the rule reduces token consumption (important for large pages)
- Record `tokens_a` (estimated: file bytes / 4) and `tokens_b`
- Token savings ratio: `1 - tokens_b / tokens_a`

⚠️ Potential issue | 🟡 Minor

Guard the token-savings ratio against tokens_a = 0.

If the Test A output is empty or generation failed, `1 - tokens_b / tokens_a` divides by zero, producing `Infinity` or `NaN` that can break downstream evaluation data.

🛠️ Suggested doc tweak
-   - Token savings ratio: `1 - tokens_b / tokens_a`
+   - Token savings ratio:
+     - if `tokens_a > 0`: `1 - tokens_b / tokens_a`
+     - else: `0` (or `n/a`, and mark the run as generation-failed)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
6. **Token comparison** (always measure, even if visual diff is zero):
- Count lines/bytes of `visual-a.html` vs `visual-b.html`
- If Test B is significantly smaller → the rule reduces token consumption (important for large pages)
- Record `tokens_a` (estimated: file bytes / 4) and `tokens_b`
- Token savings ratio: `1 - tokens_b / tokens_a`
6. **Token comparison** (always measure, even if visual diff is zero):
- Count lines/bytes of `visual-a.html` vs `visual-b.html`
- If Test B is significantly smaller → the rule reduces token consumption (important for large pages)
- Record `tokens_a` (estimated: file bytes / 4) and `tokens_b`
- Token savings ratio:
- if `tokens_a > 0`: `1 - tokens_b / tokens_a`
- else: `0` (or `n/a`, and mark the run as generation-failed)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.claude/commands/add-rule.md around lines 96 - 101, The token-savings
calculation using the expression `1 - tokens_b / tokens_a` is unsafe when
`tokens_a` can be zero; update the doc and any corresponding implementation
notes to first check `tokens_a == 0` (or falsy) and handle that case
explicitly—e.g., set the savings ratio to 0 or null, or mark the comparison as
invalid/skipped—before computing `1 - tokens_b / tokens_a`; reference the
variables `tokens_a` and `tokens_b` and the ratio expression so maintainers can
find and update the logic and documentation accordingly.
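The guarded ratio the comment asks for can be sketched as a small helper. `tokenSavingsRatio` and `estimateTokens` are hypothetical names; returning `null` (rather than `0`) is one of the two options the comment suggests.

```typescript
// Hypothetical helpers — not names from the repository.
// Rough token estimate the doc uses: file bytes / 4.
function estimateTokens(bytes: number): number {
  return bytes / 4;
}

// Returns null when tokens_a is zero/negative so downstream evaluation
// can mark the comparison as invalid instead of recording Infinity/NaN.
function tokenSavingsRatio(tokensA: number, tokensB: number): number | null {
  if (tokensA <= 0) return null;
  return 1 - tokensB / tokensA;
}
```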

Comment on lines +1 to +44
name: Bug Report
description: Something is broken or producing wrong results
labels: ["bug"]
body:
- type: textarea
id: symptom
attributes:
label: Symptom
description: What's happening? Include error messages, wrong output, or unexpected behavior.
placeholder: "invisible-layer scores -10 (blocking) but hidden layers don't block implementation"
validations:
required: true
- type: textarea
id: cause
attributes:
label: Cause
description: Why is this happening? Root cause if known.
placeholder: "design-tree already skips visible:false nodes, so hidden layers have zero impact on code generation"
validations:
required: false
- type: textarea
id: fix
attributes:
label: Proposed Fix
description: How should it be fixed?
placeholder: "Change severity from blocking to suggestion, score from -10 to -1"
validations:
required: false
- type: textarea
id: files
attributes:
label: Affected Files
description: Which source files need changes?
placeholder: "src/core/rules/rule-config.ts, src/core/rules/ai-readability/index.ts"
validations:
required: false
- type: textarea
id: related
attributes:
label: Related Issues
description: Link related issues or PRs
placeholder: "#14, #15"
validations:
required: false

🧹 Nitpick | 🔵 Trivial

Well-structured bug report template.

The template follows GitHub's issue form schema correctly and provides a logical field progression (symptom → cause → fix → affected files → related issues). Requiring only the symptom field allows for quick bug reports while providing optional fields for more detailed analysis. The project-specific placeholders effectively demonstrate the expected format.

💡 Optional enhancement: Add duplicate issue check

Consider adding a checkbox at the beginning to encourage users to search for existing issues:

 labels: ["bug"]
 body:
+  - type: checkboxes
+    id: checklist
+    attributes:
+      label: Pre-submission checklist
+      options:
+        - label: I have searched existing issues for duplicates
+          required: true
   - type: textarea
     id: symptom

This helps reduce duplicate bug reports.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/ISSUE_TEMPLATE/bug.yml around lines 1 - 44, Insert a new checkbox
field at the start of the form body (before the existing "symptom" field) to
prompt users to confirm they searched for duplicate issues; add a field with
type: checkboxes, id: duplicate_search (or similar), attributes.label: "I
searched for existing issues" and an options array containing a single option
with label "Yes, I searched" and value "yes" so reviewers can quickly spot
potential duplicates when triaging; keep the existing symptom, cause, fix,
files, and related fields unchanged.

@@ -0,0 +1,156 @@
# Fixture analysis: validity feedback (over- vs under-estimation)

⚠️ Potential issue | 🟠 Major

Rename this file to kebab-case to satisfy repository naming rules.

docs/fixtures/ANALYSIS-VALIDITY-FEEDBACK.md violates the filename convention and should be renamed (for example: docs/fixtures/analysis-validity-feedback.md).

As per coding guidelines, "**/*: Use kebab-case for filenames (e.g., my-component.ts)".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/fixtures/ANALYSIS-VALIDITY-FEEDBACK.md` at line 1, Rename the file
docs/fixtures/ANALYSIS-VALIDITY-FEEDBACK.md to kebab-case (for example
docs/fixtures/analysis-validity-feedback.md) and update all references to it
(README links, tests, docs, or any import/path usages) so they point to the new
filename; ensure the git move preserves history (git mv) or the change is
committed as a rename.

- **Prompt:** “Map Figma node IDs to React components; names are not authoritative.”
- **Tooling:** Separate **“semantic naming score”** from **implementation difficulty** in user-facing copy.

### E. `figma-ui3-kit` (token category very weak, few nodes)

🧹 Nitpick | 🔵 Trivial

Prefer a more precise descriptor than “very weak.”

Consider replacing “very weak” with a concrete phrase (for example, “limited token coverage”) to keep the analysis language crisp and measurable.

🧰 Tools
🪛 LanguageTool

[style] ~58-~58: As an alternative to the over-used intensifier ‘very’, consider replacing this phrase.
Context: ... ### E. figma-ui3-kit (token category very weak, few nodes) - Overrated: **Overall...

(EN_WEAK_ADJECTIVE)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/fixtures/ANALYSIS-VALIDITY-FEEDBACK.md` at line 58, Update the header
text in the line "### E. `figma-ui3-kit` (token category very weak, few nodes)"
to use a more precise, measurable descriptor (e.g., "limited token coverage" or
"low token coverage, few nodes") instead of the vague phrase "very weak" so the
analysis is clearer and actionable.


- **Overrated:** **Overall B** with **token 21%** — **readiness for unguided AI** is **overrated** unless fonts/colors are specified in prompt.
- **Underrated:** **20 nodes** — absolute work is small; grade **undersells** “quick human polish.”
- **Remedies:** Font and color contract in README for this fixture; optional webfont load in generated HTML.

⚠️ Potential issue | 🟡 Minor

Use “web font” for clearer orthography and consistency.

Replace “webfont” with “web font” in this sentence for cleaner documentation style.

🧰 Tools
🪛 LanguageTool

[grammar] ~141-~141: Ensure spelling is correct
Context: ...polish.” - Remedies: Font and color contract in README for this fixture; optional we...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/fixtures/ANALYSIS-VALIDITY-FEEDBACK.md` at line 141, In the sentence
"Font and color contract in README for this fixture; optional webfont load in
generated HTML." replace the token "webfont" with "web font" to match the
project's orthography convention; update that phrase in
ANALYSIS-VALIDITY-FEEDBACK.md so it reads "...optional web font load in
generated HTML." ensuring consistency with other docs.

Comment on lines +19 to +24
/** Design tree token metrics (optional — present when design-tree stats are available) */
tokenMetrics?: {
designTreeTokens: number;
designTreeBytes: number;
tokensPerNode: number;
} | undefined;

🛠️ Refactor suggestion | 🟠 Major

Drop explicit | undefined from optional tokenMetrics.

With strict exactOptionalPropertyTypes, tokenMetrics?: ... already models absence; adding | undefined weakens the contract.

♻️ Proposed fix
-  tokenMetrics?: {
+  tokenMetrics?: {
     designTreeTokens: number;
     designTreeBytes: number;
     tokensPerNode: number;
-  } | undefined;
+  };

As per coding guidelines "exactOptionalPropertyTypes enabled - no explicit undefined assignment to optional properties".

📝 Committable suggestion


Suggested change
/** Design tree token metrics (optional — present when design-tree stats are available) */
tokenMetrics?: {
designTreeTokens: number;
designTreeBytes: number;
tokensPerNode: number;
} | undefined;
/** Design tree token metrics (optional — present when design-tree stats are available) */
tokenMetrics?: {
designTreeTokens: number;
designTreeBytes: number;
tokensPerNode: number;
};
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/agents/report-generator.ts` around lines 19 - 24, The tokenMetrics
property declaration currently uses an explicit "| undefined" on top of the
optional marker; remove the " | undefined" so the declaration reads
tokenMetrics?: { designTreeTokens: number; designTreeBytes: number;
tokensPerNode: number; } thereby relying on the exactOptionalPropertyTypes
behavior; update the tokenMetrics type in the report-generator (the doc comment
"Design tree token metrics") accordingly and run type checks to ensure no other
code expects an explicit undefined union.

Comment on lines +57 to +59
| Overall Grade | ${data.scoreReport.overall.grade} (${data.scoreReport.overall.percentage}%) |${data.tokenMetrics ? `
| Design Tree Tokens | ~${data.tokenMetrics.designTreeTokens.toLocaleString()} tokens (${Math.round(data.tokenMetrics.designTreeBytes / 1024)}KB) |
| Tokens per Node | ~${Math.round(data.tokenMetrics.tokensPerNode)} |` : ""}

⚠️ Potential issue | 🟡 Minor

Token metrics rows are currently not reachable in the shown report assembly path.

renderOverview only emits these rows when data.tokenMetrics exists, but the provided src/agents/orchestrator.ts:318-335 report object construction does not pass tokenMetrics.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/agents/report-generator.ts` around lines 57 - 59, renderOverview
conditionally renders the token metrics rows only when data.tokenMetrics exists,
but the report object constructed in the orchestrator does not include
tokenMetrics; update the orchestrator's report construction to populate the
tokenMetrics property (e.g., from the token metrics computation or a new
computeTokenMetrics function/variable) so data.tokenMetrics is present when
calling renderOverview, or alternatively change renderOverview to derive metrics
from the existing report fields; specifically, add a tokenMetrics object to the
report payload (containing designTreeTokens, designTreeBytes, tokensPerNode)
where the report is assembled so renderOverview can emit the rows.
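A minimal sketch of the `computeTokenMetrics` helper the comment proposes for the orchestrator, assuming the bytes/4 token estimate used elsewhere in this PR; the function name and signature are hypothetical.

```typescript
// Hypothetical helper so the orchestrator can populate report.tokenMetrics;
// the returned shape matches the rows renderOverview expects.
function computeTokenMetrics(
  designTreeBytes: number,
  nodeCount: number,
): { designTreeTokens: number; designTreeBytes: number; tokensPerNode: number } {
  const designTreeTokens = Math.round(designTreeBytes / 4); // bytes/4 estimate
  return {
    designTreeTokens,
    designTreeBytes,
    tokensPerNode: nodeCount > 0 ? designTreeTokens / nodeCount : 0,
  };
}
```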

CAC passes CLI option values as strings. Number.isFinite("1440")
returns false, rejecting valid inputs. Mirror the figmaScale pattern:
coerce with Number() first, then validate.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
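The coercion pattern this commit message describes can be sketched as a small helper, assuming CAC delivers option values as strings; `coerceDimension` is a hypothetical name.

```typescript
// Hypothetical helper mirroring the figmaScale pattern: coerce first
// (CAC hands values over as strings, so Number.isFinite("1440") is false),
// then validate the coerced number.
function coerceDimension(raw: string | number | undefined): number | undefined {
  if (raw === undefined) return undefined;
  const n = Number(raw);
  if (!Number.isFinite(n) || n <= 0) {
    throw new Error("value must be a positive number");
  }
  return n;
}
```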

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

♻️ Duplicate comments (1)
src/cli/index.ts (1)

920-953: 🛠️ Refactor suggestion | 🟠 Major

Use a schema for these CLI overrides.

This block is still hand-validating external input inside the action, and the declared handler shape says width/height are already numbers even though the parsed values are derived later. That keeps the runtime contract and TypeScript contract out of sync. Please move these fields into a Zod schema under contracts/ and derive the parsed type with z.infer.

As per coding guidelines "Validate all external inputs with Zod schemas and infer TypeScript types using z.infer<typeof Schema>" and "Use Zod schemas for external input validation (schemas in contracts/ directory)".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/cli/index.ts` around lines 920 - 953, Extract the CLI override validation
into a Zod schema (e.g., VisualCompareCliSchema) placed in contracts/, defining
figmaUrl (string), token (string | optional), figmaScale/exportScale (coerced
number >= 1 | optional), width and height (coerced positive numbers | optional),
then export a TypeScript type using z.infer<typeof VisualCompareCliSchema> and
use the schema to parse/validate the incoming options at the start of the
.action handler instead of the manual checks; replace the existing manual
coercion/validation for options.figmaScale, width, and height with schema
parsing that coerces strings to numbers and throws on invalid input, and update
the action handler signature/types to use the inferred type and pass the parsed
values into visualCompare (referencing options, exportScale, width, height, and
visualCompare).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/cli/index.ts`:
- Around line 935-940: The CLI currently validates options.figmaScale via the
exportScale variable but uses the wrong bounds; update the check on exportScale
(derived from options.figmaScale) to allow finite numbers between 0.01 and 4
inclusive (use Number.isFinite(exportScale) and ensure exportScale >= 0.01 &&
exportScale <= 4), and change the error message to "Error: --figma-scale must be
a number between 0.01 and 4" before calling process.exit(1).
- Around line 614-627: The call is passing an object with lenient:
options.lenientConvergence which creates a present property set to undefined and
violates exactOptionalPropertyTypes; instead build the convergence options only
when the flag is provided (e.g. const convergenceOpts =
options.lenientConvergence === undefined ? undefined : { lenient:
options.lenientConvergence }) and pass that variable to
isConverged(resolve(options.runDir), convergenceOpts) so the lenient property is
omitted when not set; update any other isConverged usages in this block
similarly.

---

Duplicate comments:
In `@src/cli/index.ts`:
- Around line 920-953: Extract the CLI override validation into a Zod schema
(e.g., VisualCompareCliSchema) placed in contracts/, defining figmaUrl (string),
token (string | optional), figmaScale/exportScale (coerced number >= 1 |
optional), width and height (coerced positive numbers | optional), then export a
TypeScript type using z.infer<typeof VisualCompareCliSchema> and use the schema
to parse/validate the incoming options at the start of the .action handler
instead of the manual checks; replace the existing manual coercion/validation
for options.figmaScale, width, and height with schema parsing that coerces
strings to numbers and throws on invalid input, and update the action handler
signature/types to use the inferred type and pass the parsed values into
visualCompare (referencing options, exportScale, width, height, and
visualCompare).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: ec4fbf19-53a3-44b6-8c65-1115506553dd

📥 Commits

Reviewing files that changed from the base of the PR and between bd8c90b and fdfd3e9.

📒 Files selected for processing (1)
  • src/cli/index.ts

Comment on lines +614 to 627
.action(
(fixturePath: string, options: {
fixturesDir?: string;
force?: boolean;
runDir?: string;
lenientConvergence?: boolean;
}) => {
if (!options.force) {
if (!options.runDir) {
console.error("Error: --run-dir required to check convergence (or use --force to skip check)");
process.exit(1);
}
if (!isConverged(resolve(options.runDir))) {
if (!isConverged(resolve(options.runDir), { lenient: options.lenientConvergence })) {
const debate = parseDebateResult(resolve(options.runDir));

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
fd 'tsconfig.*\.json$' -a | xargs -r rg -n '"exactOptionalPropertyTypes"\s*:\s*true'
fd 'run-directory\.ts$' -a | xargs -r rg -n -C2 'interface ConvergenceOptions|type ConvergenceOptions|function isConverged|const isConverged'

Repository: let-sunny/canicode

Length of output: 578


🏁 Script executed:

# Check the exact signature of isConverged to understand property constraints
cat -n src/run-directory.ts | sed -n '250,275p'

Repository: let-sunny/canicode

Length of output: 114


🏁 Script executed:

# Check the current code in src/cli/index.ts around line 614-627
cat -n src/cli/index.ts | sed -n '614,627p'

Repository: let-sunny/canicode

Length of output: 680


🏁 Script executed:

# Find the file containing ConvergenceOptions definition
rg -l 'interface ConvergenceOptions' --type ts

Repository: let-sunny/canicode

Length of output: 89


🏁 Script executed:

# Get the context around the ConvergenceOptions definition
rg -n -B2 -A8 'export interface ConvergenceOptions' --type ts

Repository: let-sunny/canicode

Length of output: 725


🏁 Script executed:

# Verify the isConverged function signature
rg -n -A2 'export function isConverged' src/agents/run-directory.ts

Repository: let-sunny/canicode

Length of output: 231


Build the convergence options object conditionally to comply with exactOptionalPropertyTypes.

The current code passes { lenient: options.lenientConvergence } which assigns undefined to the lenient property when the flag is not set. Under TypeScript's exactOptionalPropertyTypes rule (enabled in this project), optional properties must be absent rather than present with undefined values. Omit the entire options object when the flag is not provided:

Suggested fix
-      if (!isConverged(resolve(options.runDir), { lenient: options.lenientConvergence })) {
+      const convergenceOptions = options.lenientConvergence ? { lenient: true } : undefined;
+      if (!isConverged(resolve(options.runDir), convergenceOptions)) {
📝 Committable suggestion


Suggested change
.action(
(fixturePath: string, options: {
fixturesDir?: string;
force?: boolean;
runDir?: string;
lenientConvergence?: boolean;
}) => {
if (!options.force) {
if (!options.runDir) {
console.error("Error: --run-dir required to check convergence (or use --force to skip check)");
process.exit(1);
}
if (!isConverged(resolve(options.runDir))) {
if (!isConverged(resolve(options.runDir), { lenient: options.lenientConvergence })) {
const debate = parseDebateResult(resolve(options.runDir));
.action(
(fixturePath: string, options: {
fixturesDir?: string;
force?: boolean;
runDir?: string;
lenientConvergence?: boolean;
}) => {
if (!options.force) {
if (!options.runDir) {
console.error("Error: --run-dir required to check convergence (or use --force to skip check)");
process.exit(1);
}
const convergenceOptions = options.lenientConvergence ? { lenient: true } : undefined;
if (!isConverged(resolve(options.runDir), convergenceOptions)) {
const debate = parseDebateResult(resolve(options.runDir));
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/cli/index.ts` around lines 614 - 627, The call is passing an object with
lenient: options.lenientConvergence which creates a present property set to
undefined and violates exactOptionalPropertyTypes; instead build the convergence
options only when the flag is provided (e.g. const convergenceOpts =
options.lenientConvergence === undefined ? undefined : { lenient:
options.lenientConvergence }) and pass that variable to
isConverged(resolve(options.runDir), convergenceOpts) so the lenient property is
omitted when not set; update any other isConverged usages in this block
similarly.
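The conditional construction the prompt describes can be isolated into a helper, sketched here; the name `buildConvergenceOptions` is an assumption, and the `ConvergenceOptions` shape is inferred from the script output above.

```typescript
interface ConvergenceOptions {
  lenient?: boolean; // assumed shape, per the review's script output
}

// Build the options object only when the flag was provided, so the
// optional property is absent rather than present-as-undefined
// (required under exactOptionalPropertyTypes).
function buildConvergenceOptions(
  lenientConvergence?: boolean,
): ConvergenceOptions | undefined {
  return lenientConvergence === undefined
    ? undefined
    : { lenient: lenientConvergence };
}
```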

Comment on lines +935 to +940
const exportScale =
options.figmaScale !== undefined ? Number(options.figmaScale) : undefined;
if (exportScale !== undefined && (!Number.isFinite(exportScale) || exportScale < 1)) {
console.error("Error: --figma-scale must be a number >= 1");
process.exit(1);
}

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# Search for scale-related code in the codebase
rg -A 5 -B 5 "figmaScale\|figma.*scale" src/ --type ts

Repository: let-sunny/canicode

Length of output: 44


🏁 Script executed:

# Check if there's any documentation about scale in README or comments
rg -i "scale.*figma\|figma.*scale" . --type md --type ts | head -20

Repository: let-sunny/canicode

Length of output: 44


🌐 Web query:

Figma API scale parameter bounds documentation

💡 Result:

Figma APIs feature several "scale" parameters with documented bounds in official documentation. REST API (GET /v1/images/:key): - scale: Number between 0.01 and 4 (image scaling factor). Plugin API: - rescale(scale: number): scale factor must be >= 0.01 (no explicit upper bound stated). - resize(width: number, height: number): width and height must be >= 0.01. - resizeWithoutConstraints(width: number, height: number): width and height must be >= 0.01. No other scale parameters with explicit bounds found in core API docs. Constraints include "SCALE" as a type but no numeric bounds. Widget ImagePaint scalingFactor has no explicit bounds documented.

Citations:


🌐 Web query:

site:developers.figma.com scale parameter images export

💡 Result:

The "scale parameter" in the context of image exports primarily refers to Figma's Plugin API and REST API. In Figma Plugin API: - Use node.exportAsync({ format: 'PNG', constraint: { type: 'SCALE', value: 2 } }) to export at 2x resolution (value=1 is default 1x). - ExportSettingsConstraints: { type: "SCALE" | "WIDTH" | "HEIGHT", value: number }, where SCALE is proportional to layer size. In Figma REST API: - GET /v1/images/:key?ids=...:id&scale=2 for 2x scaling factor (0.01 to 4). Example code from docs: (async => { const bytes = await polygon.exportAsync({ format: 'PNG', constraint: { type: 'SCALE', value: 2 } }); const image = figma.createImage(bytes); // Use image... }) No strong evidence for other tools like Google Earth Engine or GDAL matching this exact phrase; Figma docs dominate results from official sources.

Citations:


🏁 Script executed:

```shell
# Check if the file exists and read the specific lines
ls -la src/cli/index.ts 2>/dev/null && echo "File found" || echo "File not found"
```

Repository: let-sunny/canicode

Length of output: 139


🏁 Script executed:

```shell
# List all TypeScript files in src/cli/
fd . src/cli/ -e ts -e tsx
```

Repository: let-sunny/canicode

Length of output: 94


🏁 Script executed:

```shell
# Search for exportScale more broadly
rg "exportScale" --type ts
```

Repository: let-sunny/canicode

Length of output: 1115


🏁 Script executed:

```shell
# Read lines around 935-940 from src/cli/index.ts
sed -n '930,945p' src/cli/index.ts
```

Repository: let-sunny/canicode

Length of output: 740


🏁 Script executed:

```shell
# Read the usage in visual-compare.ts to understand impact
sed -n '1,50p' src/core/engine/visual-compare.ts | head -30
```

Repository: let-sunny/canicode

Length of output: 1005


🏁 Script executed:

```shell
# Get more context on visual-compare.ts exportScale usage
rg -B 2 -A 2 "exportScale" src/core/engine/visual-compare.ts
```

Repository: let-sunny/canicode

Length of output: 1188


Enforce Figma's actual scale bounds at the CLI boundary.

The Figma REST API documents `scale` as a number between 0.01 and 4. The current check rejects valid values like 0.5 and accepts invalid ones like 5, so users can be blocked unnecessarily or fail later at the Figma call. Tighten the range to match the API bounds.

Suggested fix
```diff
-      if (exportScale !== undefined && (!Number.isFinite(exportScale) || exportScale < 1)) {
-        console.error("Error: --figma-scale must be a number >= 1");
+      if (
+        exportScale !== undefined &&
+        (!Number.isFinite(exportScale) || exportScale < 0.01 || exportScale > 4)
+      ) {
+        console.error("Error: --figma-scale must be a number between 0.01 and 4");
         process.exit(1);
       }
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/cli/index.ts` around lines 935-940: the CLI currently validates
options.figmaScale via the exportScale variable but uses the wrong bounds.
Update the check on exportScale (derived from options.figmaScale) to allow
finite numbers between 0.01 and 4 inclusive (use Number.isFinite(exportScale)
and ensure exportScale >= 0.01 && exportScale <= 4), and change the error
message to "Error: --figma-scale must be a number between 0.01 and 4" before
calling process.exit(1).
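Expressed as a plain predicate, the validation the prompt describes could look like the sketch below. The function name `isValidFigmaScale` is illustrative; the actual CLI check lives inline in `src/cli/index.ts`.

```typescript
// Accept only finite scales within Figma's documented REST bounds, 0.01-4 inclusive.
// `undefined` is allowed because --figma-scale is an optional flag.
function isValidFigmaScale(scale: number | undefined): boolean {
  if (scale === undefined) return true;
  return Number.isFinite(scale) && scale >= 0.01 && scale <= 4;
}
```

Note that `Number.isFinite` already rejects `NaN` and the infinities, so the range comparisons only need to handle ordinary numbers.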

@let-sunny let-sunny marked this pull request as draft March 24, 2026 23:23
Integrate Zod-based debate.json parsing from main (#32) while
preserving lenient convergence mode from this branch (issue #14).
Update defensive tests to match Zod validation at parse boundary.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@let-sunny let-sunny marked this pull request as ready for review March 24, 2026 23:33
@let-sunny let-sunny merged commit 7eb744e into main Mar 24, 2026
3 checks passed
@let-sunny let-sunny deleted the fix/calibration-pipeline-issue-14 branch March 24, 2026 23:38

Development

Successfully merging this pull request may close these issues.

chore: enhance the calibration pipeline

1 participant