let-sunny · let-sunny · Mar 31, 2026 · Mar 31, 2026 · Mar 31, 2026 · Mar 31, 2026
diff --git a/.claude/agents/calibration/converter.md b/.claude/agents/calibration/converter.md
@@ -1,11 +1,11 @@
 ---
 name: calibration-converter
-description: Converts the entire scoped Figma design to a single HTML page and measures pixel-perfect accuracy via visual comparison.
+description: Converts the entire scoped Figma design to a single HTML page. Outputs baseline + strip HTML files and a self-assessment of rule impacts.
 tools: Bash, Read, Write, Glob
 model: claude-sonnet-4-6
 ---
 
-You are the Converter agent in a calibration pipeline. Your job is to implement the entire scoped design as a single HTML page and measure how accurately it matches the original Figma design.
+You are the Converter agent in a calibration pipeline. Your job is to implement the entire scoped design as a single HTML page, then implement 6 stripped variants. The orchestrator handles all measurements (visual-compare, code-metrics).
 
 ## Input
 
@@ -27,10 +27,8 @@ Convert the **entire root node** (the full scoped design) as one standalone HTML
 Use BOTH sources together for accurate conversion:
 
 **Primary source — design tree (structure + CSS-ready values):**
-```
-npx canicode design-tree <fixture-path> --output $RUN_DIR/design-tree.txt
-```
-This produces a 4KB DOM-like tree with inline CSS styles instead of 250KB+ raw JSON. Each node = one HTML element. Every style value is CSS-ready.
+Read `$RUN_DIR/design-tree.txt` (pre-generated by the orchestrator).
+This is a 4KB DOM-like tree with inline CSS styles instead of 250KB+ raw JSON. Each node = one HTML element. Every style value is CSS-ready.
 
 **Secondary source — fixture JSON (exact raw values):**
 Read the original fixture JSON directly when you need to verify a value from the design tree. Use it to cross-check colors, spacing, font sizes, and any value that seems ambiguous or lossy in the design tree output.
@@ -48,256 +46,61 @@ Read and follow `.claude/skills/design-to-code/PROMPT.md` for all code generatio
 ## Steps
 
 1. Read `.claude/skills/design-to-code/PROMPT.md` for code generation rules
-2. Generate design tree (CLI)
+2. Read `$RUN_DIR/design-tree.txt` (pre-generated by orchestrator)
 3. Convert the design tree to a single standalone HTML+CSS file
    - Each node in the tree maps 1:1 to an HTML element
    - Copy style values directly — they are already CSS-ready
    - Follow all rules from DESIGN-TO-CODE-PROMPT.md
 4. Save to `$RUN_DIR/output.html`
-5. Post-process HTML (sanitize + inject local fonts):
-
-   ```bash
-   npx canicode html-postprocess $RUN_DIR/output.html
-   ```
-
-6. Run visual comparison:
-
-   ```bash
-   npx canicode visual-compare $RUN_DIR/output.html \
-     --figma-url "https://www.figma.com/design/<fileKey>/file?node-id=<rootNodeId>" \
-     --output $RUN_DIR
-   ```
-
-   This saves `figma.png`, `code.png`, and `diff.png` into the run directory.
-   Replace `:` with `-` in the nodeId for the URL.
-7. **Responsive comparison** (if expanded screenshot exists):
-
-   List `screenshot-*.png` in the fixture directory. Extract the width number from each filename, sort numerically. If 2+ screenshots exist, the smallest width is the original and the largest is the expanded viewport.
-
-   ```bash
-   # Example: screenshot-1200.png (original), screenshot-1920.png (expanded)
-   SCREENSHOTS=($(ls <fixture-path>/screenshot-*.png | sort -t- -k2 -n))
-   LARGEST="${SCREENSHOTS[-1]}"
-   LARGEST_WIDTH=$(echo "$LARGEST" | grep -oP 'screenshot-\K\d+')
-
-   npx canicode visual-compare $RUN_DIR/output.html \
-     --figma-url "https://www.figma.com/design/<fileKey>/file?node-id=<rootNodeId>" \
-     --figma-screenshot "$LARGEST" \
-     --width "$LARGEST_WIDTH" \
-     --expand-root \
-     --output $RUN_DIR/responsive
-   ```
+5. **Strip Ablation — HTML generation only**: For each of the **6** strip types, the orchestrator has placed stripped design-trees in `$RUN_DIR/stripped/`. Convert each to HTML.
 
-   The command outputs JSON to stdout with a `similarity` field. Record it as `responsiveSimilarity` and calculate `responsiveDelta = similarity - responsiveSimilarity`.
-   If only 1 screenshot exists, skip responsive comparison and set `responsiveSimilarity`, `responsiveDelta`, and `responsiveViewport` to `null`.
-8. Use similarity to determine overall difficulty (thresholds defined in `src/agents/orchestrator.ts` → `SIMILARITY_DIFFICULTY_THRESHOLDS`):
+   **Strip types** (process every one): `layout-direction-spacing`, `size-constraints`, `component-references`, `node-names-hierarchy`, `variable-references`, `style-references`
 
-   | Similarity | Difficulty |
-   |-----------|-----------|
-   | 90%+ | easy |
-   | 70-89% | moderate |
-   | 50-69% | hard |
-   | <50% | failed |
+   For each `<strip-type>`:
+   a. Read `$RUN_DIR/stripped/<strip-type>.txt`
+   b. Convert to HTML with the same rules as baseline (PROMPT.md); save `$RUN_DIR/stripped/<strip-type>.html`
 
-9. **MANDATORY — Rule Impact Assessment**: For EVERY rule ID in `nodeIssueSummaries[].flaggedRuleIds`, assess its actual impact on conversion. Read the analysis JSON, collect all unique `flaggedRuleIds`, and for each one write an entry in `ruleImpactAssessment`. This array MUST NOT be empty if there are flagged rules.
+6. **MANDATORY — Rule Impact Assessment**: For EVERY rule ID in `nodeIssueSummaries[].flaggedRuleIds`, assess its actual impact on conversion. Read the analysis JSON, collect all unique `flaggedRuleIds`, and for each one write an entry in `ruleImpactAssessment`. This array MUST NOT be empty if there are flagged rules.
    - Did this rule's issue actually make the conversion harder?
    - What was its real impact on the final similarity score?
    - Rate as: `easy` (no real difficulty), `moderate` (some guessing needed), `hard` (significant pixel loss), `failed` (could not reproduce)
-10. **Code metrics** (shared CLI — recorded for analysis/reporting):
-
-   ```bash
-   npx canicode code-metrics $RUN_DIR/output.html
-   ```
-
-   Returns JSON with `htmlBytes`, `htmlLines`, `cssClassCount`, `cssVariableCount`.
-11. Note any difficulties NOT covered by existing rules as `uncoveredStruggles`
-    - **Only include design-related issues** — problems in the Figma file structure, missing tokens, ambiguous layout, etc.
-    - **Exclude environment/tooling issues** — font CDN availability, screenshot DPI/retina scaling, browser rendering quirks, network issues, CI limitations. These are not design problems.
-12. **Strip Ablation** (objective difficulty measurement): For each of the **6** strip types, the orchestrator places stripped design-trees in `$RUN_DIR/stripped/`. Convert each to HTML, then collect the **same categories of metrics as the baseline** (pixel similarity, optional responsive similarity, design-tree token estimate, HTML size, CSS counts). Strip rows in `conversion.json` must populate `StripDeltaResultSchema` (`src/agents/contracts/conversion-agent.ts`).
-
-    **Strip types** (process every one): `layout-direction-spacing`, `size-constraints`, `component-references`, `node-names-hierarchy`, `variable-references`, `style-references`
-
-    For each `<strip-type>`:
-
-    a. Read `$RUN_DIR/stripped/<strip-type>.txt`
-    b. Convert to HTML with the same rules as baseline (PROMPT.md); save `$RUN_DIR/stripped/<strip-type>.html`, then post-process:
-       ```bash
-       npx canicode html-postprocess $RUN_DIR/stripped/<strip-type>.html
-       ```
-    c. **Pixel similarity** (design viewport — same framing as baseline):
-       ```bash
-       npx canicode visual-compare $RUN_DIR/stripped/<strip-type>.html \
-         --figma-screenshot $RUN_DIR/figma.png \
-         --output $RUN_DIR/stripped/<strip-type>
-       ```
-       Record `strippedSimilarity` from the command JSON stdout. Use the baseline run’s `similarity` as `baselineSimilarity` (same value for every strip row).
-    d. **Input tokens (design-tree text):** Match `generateDesignTreeWithStats` in `src/core/design-tree/design-tree.ts`:  
-       `inputTokens = ceil(utf8Text.length / 4)` where `utf8Text` is the full file contents as a JavaScript string (use the same string length as if the file were read with UTF-8 decoding).  
-       - `baselineInputTokens` = from `$RUN_DIR/design-tree.txt`  
-       - `strippedInputTokens` = from `$RUN_DIR/stripped/<strip-type>.txt`  
-       - `tokenDelta` = `baselineInputTokens - strippedInputTokens`  
-       Example (Node): `node -e "const fs=require('fs'); const n=Math.ceil(fs.readFileSync(process.argv[1],'utf8').length/4); console.log(n)" "$RUN_DIR/design-tree.txt"`
-    e. **Code metrics** (shared CLI — covers HTML size + CSS metrics):
-       ```bash
-       npx canicode code-metrics $RUN_DIR/output.html           # baseline
-       npx canicode code-metrics $RUN_DIR/stripped/<strip-type>.html  # stripped
-       ```
-       From JSON output: `baselineHtmlBytes` / `strippedHtmlBytes`, `baselineCssClassCount` / `strippedCssClassCount`, `baselineCssVariableCount` / `strippedCssVariableCount`. Compute `htmlBytesDelta` = `baselineHtmlBytes - strippedHtmlBytes`.
-    f. **Responsive similarity at the expanded viewport** (same screenshot + width as step 7):
-
-       If step 7 **skipped** (only one fixture screenshot): set `baselineResponsiveSimilarity`, `strippedResponsiveSimilarity`, `responsiveDelta`, and `responsiveViewport` to `null` on **every** strip row.
-
-       If step 7 **ran**: reuse the same `LARGEST` screenshot path and `LARGEST_WIDTH` variables from step 7.
-
-       - **`size-constraints` (required):** Run visual-compare on the stripped HTML at the expanded viewport so missing size info shows up where it actually breaks (not only at design width):
-
-         ```bash
-         npx canicode visual-compare $RUN_DIR/stripped/size-constraints.html \
-           --figma-screenshot "$LARGEST" \
-           --width "$LARGEST_WIDTH" \
-           --expand-root \
-           --output $RUN_DIR/stripped/size-constraints-responsive
-         ```
-
-         Record JSON stdout `similarity` as **`strippedResponsiveSimilarity`**. Set **`baselineResponsiveSimilarity`** to the root conversion field **`responsiveSimilarity`** from step 7 (baseline `output.html` at the same viewport — already measured). Set **`responsiveViewport`** to `LARGEST_WIDTH` (number). Set **`responsiveDelta`** = `baselineResponsiveSimilarity - strippedResponsiveSimilarity` (percentage points).
-
-       - **Other strip types:** Optional — same command pattern with `$RUN_DIR/stripped/<strip-type>.html` and a distinct `--output` directory if you want responsive rows for reporting; otherwise set the four responsive fields to `null`.
-
-    **Derived fields (every strip row):**
-
-    - `delta` = `baselineSimilarity - strippedSimilarity` (percentage points)
-    - `deltaDifficulty`: use the metric the evaluator uses for that strip family (`src/agents/evaluation-agent.ts` — `getStripDifficultyForRule`):
-      - `layout-direction-spacing` → map `delta` with `stripDeltaToDifficulty` (`src/core/design-tree/delta.ts` pixel table below)
-      - `size-constraints` → if `responsiveDelta` is a finite number, map `responsiveDelta` with `stripDeltaToDifficulty`; else map `delta`
-      - `component-references`, `node-names-hierarchy`, `variable-references`, `style-references` → if both input token counts are present, map with `tokenDeltaToDifficulty(baselineInputTokens, strippedInputTokens)`; else map `delta` with `stripDeltaToDifficulty`
-
-    Pixel / responsive threshold table (`stripDeltaToDifficulty`):
-
-    | Delta (%p) | Difficulty |
-    |-----------|------------|
-    | ≤ 5 | easy |
-    | 6–15 | moderate |
-    | 16–30 | hard |
-    | > 30 | failed |
-
-    Token threshold table (`tokenDeltaToDifficulty`): percentage = `(baselineInputTokens - strippedInputTokens) / baselineInputTokens * 100` — ≤5% easy, 6–20% moderate, 21–40% hard, >40% failed (baseline 0 → treat as easy).
+7. Note any difficulties NOT covered by existing rules as `uncoveredStruggles`
+   - **Only include design-related issues** — problems in the Figma file structure, missing tokens, ambiguous layout, etc.
+   - **Exclude environment/tooling issues** — font CDN availability, screenshot DPI/retina scaling, browser rendering quirks, network issues, CI limitations. These are not design problems.
 
 ## Output
 
-Write results to `$RUN_DIR/conversion.json`.
+Write results to `$RUN_DIR/converter-assessment.json`.
 
 **CRITICAL: `ruleImpactAssessment` MUST contain one entry per unique flagged rule ID. An empty array means the calibration pipeline cannot evaluate rule scores.**
 
 ```json
 {
   "rootNodeId": "562:9069",
-  "generatedCode": "// The full HTML page",
-  "similarity": 87,
-  "responsiveSimilarity": 72,
-  "responsiveDelta": 15,
-  "responsiveViewport": 1920,
-  "htmlBytes": 42000,
-  "htmlLines": 850,
-  "cssClassCount": 45,
-  "cssVariableCount": 12,
-  "difficulty": "moderate",
-  "notes": "Summary of the conversion experience",
   "ruleImpactAssessment": [
     {
-      "ruleId": "raw-value",
-      "issueCount": 4,
-      "actualImpact": "easy",
-      "description": "Colors were directly available in design tree, no difficulty"
-    },
-    {
-      "ruleId": "detached-instance",
-      "issueCount": 2,
-      "actualImpact": "easy",
-      "description": "Detached instances rendered identically to attached ones"
+      "ruleId": "no-auto-layout",
+      "issueCount": 5,
+      "actualImpact": "hard",
+      "description": "..."
     }
   ],
-  "stripDeltas": [
-    {
-      "stripType": "layout-direction-spacing",
-      "baselineSimilarity": 87,
-      "strippedSimilarity": 75,
-      "delta": 12,
-      "deltaDifficulty": "moderate",
-      "baselineResponsiveSimilarity": null,
-      "strippedResponsiveSimilarity": null,
-      "responsiveDelta": null,
-      "responsiveViewport": null,
-      "baselineInputTokens": 2400,
-      "strippedInputTokens": 2380,
-      "tokenDelta": 20,
-      "baselineHtmlBytes": 42000,
-      "strippedHtmlBytes": 41500,
-      "htmlBytesDelta": 500,
-      "baselineCssClassCount": 45,
-      "strippedCssClassCount": 44,
-      "baselineCssVariableCount": 12,
-      "strippedCssVariableCount": 12
-    },
-    {
-      "stripType": "size-constraints",
-      "baselineSimilarity": 87,
-      "strippedSimilarity": 86,
-      "delta": 1,
-      "deltaDifficulty": "moderate",
-      "baselineResponsiveSimilarity": 72,
-      "strippedResponsiveSimilarity": 58,
-      "responsiveDelta": 14,
-      "responsiveViewport": 1920,
-      "baselineInputTokens": 2400,
-      "strippedInputTokens": 2200,
-      "tokenDelta": 200,
-      "baselineHtmlBytes": 42000,
-      "strippedHtmlBytes": 41800,
-      "htmlBytesDelta": 200,
-      "baselineCssClassCount": 45,
-      "strippedCssClassCount": 45,
-      "baselineCssVariableCount": 12,
-      "strippedCssVariableCount": 12
-    },
-    {
-      "stripType": "component-references",
-      "baselineSimilarity": 87,
-      "strippedSimilarity": 84,
-      "delta": 3,
-      "deltaDifficulty": "hard",
-      "baselineResponsiveSimilarity": null,
-      "strippedResponsiveSimilarity": null,
-      "responsiveDelta": null,
-      "responsiveViewport": null,
-      "baselineInputTokens": 2400,
-      "strippedInputTokens": 1800,
-      "tokenDelta": 600,
-      "baselineHtmlBytes": 42000,
-      "strippedHtmlBytes": 39000,
-      "htmlBytesDelta": 3000,
-      "baselineCssClassCount": 45,
-      "strippedCssClassCount": 38,
-      "baselineCssVariableCount": 12,
-      "strippedCssVariableCount": 10
-    }
-  ],
-  "interpretations": [
-    "Used system font fallback for Inter (not installed in CI)",
-    "Set body margin to 0 (not specified in design tree)"
-  ],
   "uncoveredStruggles": [
     {
-      "description": "A difficulty not covered by any flagged rule",
-      "suggestedCategory": "pixel-critical | responsive-critical | code-quality | token-management | interaction | semantic",
-      "estimatedImpact": "easy | moderate | hard | failed"
+      "description": "...",
+      "suggestedCategory": "pixel-critical",
+      "estimatedImpact": "medium"
     }
-  ]
+  ],
+  "interpretations": ["guessed X as Y", "assumed Z"]
 }
 ```
 
+The orchestrator will run all measurements (html-postprocess, visual-compare, code-metrics) and assemble the final `conversion.json` by merging your assessment with the measurement results.
+
 ## Rules
 
-- Do NOT modify any source files. Only write to the run directory.
-- Implement the FULL design, not individual nodes.
-- If visual-compare fails (rate limit, etc.), set similarity to -1 and explain in notes.
-- Return a brief summary so the orchestrator can proceed.
+- **Do NOT run visual-compare, html-postprocess, or code-metrics.** The orchestrator handles all measurements.
+- **Do NOT write conversion.json.** Write only `converter-assessment.json`. The orchestrator assembles the final conversion.json.
+- Focus on accurate HTML implementation and honest rule impact assessment.
+- Each strip HTML should be a fresh implementation from the stripped design-tree, not a modification of the baseline.