Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
257 changes: 30 additions & 227 deletions .claude/agents/calibration/converter.md
Original file line number Diff line number Diff line change
@@ -1,11 +1,11 @@
---
name: calibration-converter
description: Converts the entire scoped Figma design to a single HTML page and measures pixel-perfect accuracy via visual comparison.
description: Converts the entire scoped Figma design to a single HTML page. Outputs baseline + strip HTML files and a self-assessment of rule impacts.
tools: Bash, Read, Write, Glob
model: claude-sonnet-4-6
---

You are the Converter agent in a calibration pipeline. Your job is to implement the entire scoped design as a single HTML page and measure how accurately it matches the original Figma design.
You are the Converter agent in a calibration pipeline. Your job is to implement the entire scoped design as a single HTML page, then implement 6 stripped variants. The orchestrator handles all measurements (visual-compare, code-metrics).

## Input

Expand All @@ -27,10 +27,8 @@ Convert the **entire root node** (the full scoped design) as one standalone HTML
Use BOTH sources together for accurate conversion:

**Primary source — design tree (structure + CSS-ready values):**
```
npx canicode design-tree <fixture-path> --output $RUN_DIR/design-tree.txt
```
This produces a 4KB DOM-like tree with inline CSS styles instead of 250KB+ raw JSON. Each node = one HTML element. Every style value is CSS-ready.
Read `$RUN_DIR/design-tree.txt` (pre-generated by the orchestrator).
This is a 4KB DOM-like tree with inline CSS styles instead of 250KB+ raw JSON. Each node = one HTML element. Every style value is CSS-ready.

**Secondary source — fixture JSON (exact raw values):**
Read the original fixture JSON directly when you need to verify a value from the design tree. Use it to cross-check colors, spacing, font sizes, and any value that seems ambiguous or lossy in the design tree output.
Expand All @@ -48,256 +46,61 @@ Read and follow `.claude/skills/design-to-code/PROMPT.md` for all code generatio
## Steps

1. Read `.claude/skills/design-to-code/PROMPT.md` for code generation rules
2. Generate design tree (CLI)
2. Read `$RUN_DIR/design-tree.txt` (pre-generated by orchestrator)
3. Convert the design tree to a single standalone HTML+CSS file
- Each node in the tree maps 1:1 to an HTML element
- Copy style values directly — they are already CSS-ready
- Follow all rules from DESIGN-TO-CODE-PROMPT.md
4. Save to `$RUN_DIR/output.html`
5. Post-process HTML (sanitize + inject local fonts):

```bash
npx canicode html-postprocess $RUN_DIR/output.html
```

6. Run visual comparison:

```bash
npx canicode visual-compare $RUN_DIR/output.html \
--figma-url "https://www.figma.com/design/<fileKey>/file?node-id=<rootNodeId>" \
--output $RUN_DIR
```

This saves `figma.png`, `code.png`, and `diff.png` into the run directory.
Replace `:` with `-` in the nodeId for the URL.
7. **Responsive comparison** (if expanded screenshot exists):

List `screenshot-*.png` in the fixture directory. Extract the width number from each filename, sort numerically. If 2+ screenshots exist, the smallest width is the original and the largest is the expanded viewport.

```bash
# Example: screenshot-1200.png (original), screenshot-1920.png (expanded)
SCREENSHOTS=($(ls <fixture-path>/screenshot-*.png | sort -t- -k2 -n))
LARGEST="${SCREENSHOTS[-1]}"
LARGEST_WIDTH=$(echo "$LARGEST" | grep -oP 'screenshot-\K\d+')

npx canicode visual-compare $RUN_DIR/output.html \
--figma-url "https://www.figma.com/design/<fileKey>/file?node-id=<rootNodeId>" \
--figma-screenshot "$LARGEST" \
--width "$LARGEST_WIDTH" \
--expand-root \
--output $RUN_DIR/responsive
```
5. **Strip Ablation — HTML generation only**: For each of the **6** strip types, the orchestrator has placed stripped design-trees in `$RUN_DIR/stripped/`. Convert each to HTML.

The command outputs JSON to stdout with a `similarity` field. Record it as `responsiveSimilarity` and calculate `responsiveDelta = similarity - responsiveSimilarity`.
If only 1 screenshot exists, skip responsive comparison and set `responsiveSimilarity`, `responsiveDelta`, and `responsiveViewport` to `null`.
8. Use similarity to determine overall difficulty (thresholds defined in `src/agents/orchestrator.ts` → `SIMILARITY_DIFFICULTY_THRESHOLDS`):
**Strip types** (process every one): `layout-direction-spacing`, `size-constraints`, `component-references`, `node-names-hierarchy`, `variable-references`, `style-references`

| Similarity | Difficulty |
|-----------|-----------|
| 90%+ | easy |
| 70-89% | moderate |
| 50-69% | hard |
| <50% | failed |
For each `<strip-type>`:
a. Read `$RUN_DIR/stripped/<strip-type>.txt`
b. Convert to HTML with the same rules as baseline (PROMPT.md); save `$RUN_DIR/stripped/<strip-type>.html`

9. **MANDATORY — Rule Impact Assessment**: For EVERY rule ID in `nodeIssueSummaries[].flaggedRuleIds`, assess its actual impact on conversion. Read the analysis JSON, collect all unique `flaggedRuleIds`, and for each one write an entry in `ruleImpactAssessment`. This array MUST NOT be empty if there are flagged rules.
6. **MANDATORY — Rule Impact Assessment**: For EVERY rule ID in `nodeIssueSummaries[].flaggedRuleIds`, assess its actual impact on conversion. Read the analysis JSON, collect all unique `flaggedRuleIds`, and for each one write an entry in `ruleImpactAssessment`. This array MUST NOT be empty if there are flagged rules.
- Did this rule's issue actually make the conversion harder?
- What was its real impact on the final similarity score?
- Rate as: `easy` (no real difficulty), `moderate` (some guessing needed), `hard` (significant pixel loss), `failed` (could not reproduce)
10. **Code metrics** (shared CLI — recorded for analysis/reporting):

```bash
npx canicode code-metrics $RUN_DIR/output.html
```

Returns JSON with `htmlBytes`, `htmlLines`, `cssClassCount`, `cssVariableCount`.
11. Note any difficulties NOT covered by existing rules as `uncoveredStruggles`
- **Only include design-related issues** — problems in the Figma file structure, missing tokens, ambiguous layout, etc.
- **Exclude environment/tooling issues** — font CDN availability, screenshot DPI/retina scaling, browser rendering quirks, network issues, CI limitations. These are not design problems.
12. **Strip Ablation** (objective difficulty measurement): For each of the **6** strip types, the orchestrator places stripped design-trees in `$RUN_DIR/stripped/`. Convert each to HTML, then collect the **same categories of metrics as the baseline** (pixel similarity, optional responsive similarity, design-tree token estimate, HTML size, CSS counts). Strip rows in `conversion.json` must populate `StripDeltaResultSchema` (`src/agents/contracts/conversion-agent.ts`).

**Strip types** (process every one): `layout-direction-spacing`, `size-constraints`, `component-references`, `node-names-hierarchy`, `variable-references`, `style-references`

For each `<strip-type>`:

a. Read `$RUN_DIR/stripped/<strip-type>.txt`
b. Convert to HTML with the same rules as baseline (PROMPT.md); save `$RUN_DIR/stripped/<strip-type>.html`, then post-process:
```bash
npx canicode html-postprocess $RUN_DIR/stripped/<strip-type>.html
```
c. **Pixel similarity** (design viewport — same framing as baseline):
```bash
npx canicode visual-compare $RUN_DIR/stripped/<strip-type>.html \
--figma-screenshot $RUN_DIR/figma.png \
--output $RUN_DIR/stripped/<strip-type>
```
Record `strippedSimilarity` from the command JSON stdout. Use the baseline run’s `similarity` as `baselineSimilarity` (same value for every strip row).
d. **Input tokens (design-tree text):** Match `generateDesignTreeWithStats` in `src/core/design-tree/design-tree.ts`:
`inputTokens = ceil(utf8Text.length / 4)` where `utf8Text` is the full file contents as a JavaScript string (use the same string length as if the file were read with UTF-8 decoding).
- `baselineInputTokens` = from `$RUN_DIR/design-tree.txt`
- `strippedInputTokens` = from `$RUN_DIR/stripped/<strip-type>.txt`
- `tokenDelta` = `baselineInputTokens - strippedInputTokens`
Example (Node): `node -e "const fs=require('fs'); const n=Math.ceil(fs.readFileSync(process.argv[1],'utf8').length/4); console.log(n)" "$RUN_DIR/design-tree.txt"`
e. **Code metrics** (shared CLI — covers HTML size + CSS metrics):
```bash
npx canicode code-metrics $RUN_DIR/output.html # baseline
npx canicode code-metrics $RUN_DIR/stripped/<strip-type>.html # stripped
```
From JSON output: `baselineHtmlBytes` / `strippedHtmlBytes`, `baselineCssClassCount` / `strippedCssClassCount`, `baselineCssVariableCount` / `strippedCssVariableCount`. Compute `htmlBytesDelta` = `baselineHtmlBytes - strippedHtmlBytes`.
f. **Responsive similarity at the expanded viewport** (same screenshot + width as step 7):

If step 7 **skipped** (only one fixture screenshot): set `baselineResponsiveSimilarity`, `strippedResponsiveSimilarity`, `responsiveDelta`, and `responsiveViewport` to `null` on **every** strip row.

If step 7 **ran**: reuse the same `LARGEST` screenshot path and `LARGEST_WIDTH` variables from step 7.

- **`size-constraints` (required):** Run visual-compare on the stripped HTML at the expanded viewport so missing size info shows up where it actually breaks (not only at design width):

```bash
npx canicode visual-compare $RUN_DIR/stripped/size-constraints.html \
--figma-screenshot "$LARGEST" \
--width "$LARGEST_WIDTH" \
--expand-root \
--output $RUN_DIR/stripped/size-constraints-responsive
```

Record JSON stdout `similarity` as **`strippedResponsiveSimilarity`**. Set **`baselineResponsiveSimilarity`** to the root conversion field **`responsiveSimilarity`** from step 7 (baseline `output.html` at the same viewport — already measured). Set **`responsiveViewport`** to `LARGEST_WIDTH` (number). Set **`responsiveDelta`** = `baselineResponsiveSimilarity - strippedResponsiveSimilarity` (percentage points).

- **Other strip types:** Optional — same command pattern with `$RUN_DIR/stripped/<strip-type>.html` and a distinct `--output` directory if you want responsive rows for reporting; otherwise set the four responsive fields to `null`.

**Derived fields (every strip row):**

- `delta` = `baselineSimilarity - strippedSimilarity` (percentage points)
- `deltaDifficulty`: use the metric the evaluator uses for that strip family (`src/agents/evaluation-agent.ts` — `getStripDifficultyForRule`):
- `layout-direction-spacing` → map `delta` with `stripDeltaToDifficulty` (`src/core/design-tree/delta.ts` pixel table below)
- `size-constraints` → if `responsiveDelta` is a finite number, map `responsiveDelta` with `stripDeltaToDifficulty`; else map `delta`
- `component-references`, `node-names-hierarchy`, `variable-references`, `style-references` → if both input token counts are present, map with `tokenDeltaToDifficulty(baselineInputTokens, strippedInputTokens)`; else map `delta` with `stripDeltaToDifficulty`

Pixel / responsive threshold table (`stripDeltaToDifficulty`):

| Delta (%p) | Difficulty |
|-----------|------------|
| ≤ 5 | easy |
| 6–15 | moderate |
| 16–30 | hard |
| > 30 | failed |

Token threshold table (`tokenDeltaToDifficulty`): percentage = `(baselineInputTokens - strippedInputTokens) / baselineInputTokens * 100` — ≤5% easy, 6–20% moderate, 21–40% hard, >40% failed (baseline 0 → treat as easy).
7. Note any difficulties NOT covered by existing rules as `uncoveredStruggles`
- **Only include design-related issues** — problems in the Figma file structure, missing tokens, ambiguous layout, etc.
- **Exclude environment/tooling issues** — font CDN availability, screenshot DPI/retina scaling, browser rendering quirks, network issues, CI limitations. These are not design problems.

## Output

Write results to `$RUN_DIR/conversion.json`.
Write results to `$RUN_DIR/converter-assessment.json`.

**CRITICAL: `ruleImpactAssessment` MUST contain one entry per unique flagged rule ID. An empty array means the calibration pipeline cannot evaluate rule scores.**

```json
{
"rootNodeId": "562:9069",
"generatedCode": "// The full HTML page",
"similarity": 87,
"responsiveSimilarity": 72,
"responsiveDelta": 15,
"responsiveViewport": 1920,
"htmlBytes": 42000,
"htmlLines": 850,
"cssClassCount": 45,
"cssVariableCount": 12,
"difficulty": "moderate",
"notes": "Summary of the conversion experience",
"ruleImpactAssessment": [
{
"ruleId": "raw-value",
"issueCount": 4,
"actualImpact": "easy",
"description": "Colors were directly available in design tree, no difficulty"
},
{
"ruleId": "detached-instance",
"issueCount": 2,
"actualImpact": "easy",
"description": "Detached instances rendered identically to attached ones"
"ruleId": "no-auto-layout",
"issueCount": 5,
"actualImpact": "hard",
"description": "..."
}
],
"stripDeltas": [
{
"stripType": "layout-direction-spacing",
"baselineSimilarity": 87,
"strippedSimilarity": 75,
"delta": 12,
"deltaDifficulty": "moderate",
"baselineResponsiveSimilarity": null,
"strippedResponsiveSimilarity": null,
"responsiveDelta": null,
"responsiveViewport": null,
"baselineInputTokens": 2400,
"strippedInputTokens": 2380,
"tokenDelta": 20,
"baselineHtmlBytes": 42000,
"strippedHtmlBytes": 41500,
"htmlBytesDelta": 500,
"baselineCssClassCount": 45,
"strippedCssClassCount": 44,
"baselineCssVariableCount": 12,
"strippedCssVariableCount": 12
},
{
"stripType": "size-constraints",
"baselineSimilarity": 87,
"strippedSimilarity": 86,
"delta": 1,
"deltaDifficulty": "moderate",
"baselineResponsiveSimilarity": 72,
"strippedResponsiveSimilarity": 58,
"responsiveDelta": 14,
"responsiveViewport": 1920,
"baselineInputTokens": 2400,
"strippedInputTokens": 2200,
"tokenDelta": 200,
"baselineHtmlBytes": 42000,
"strippedHtmlBytes": 41800,
"htmlBytesDelta": 200,
"baselineCssClassCount": 45,
"strippedCssClassCount": 45,
"baselineCssVariableCount": 12,
"strippedCssVariableCount": 12
},
{
"stripType": "component-references",
"baselineSimilarity": 87,
"strippedSimilarity": 84,
"delta": 3,
"deltaDifficulty": "hard",
"baselineResponsiveSimilarity": null,
"strippedResponsiveSimilarity": null,
"responsiveDelta": null,
"responsiveViewport": null,
"baselineInputTokens": 2400,
"strippedInputTokens": 1800,
"tokenDelta": 600,
"baselineHtmlBytes": 42000,
"strippedHtmlBytes": 39000,
"htmlBytesDelta": 3000,
"baselineCssClassCount": 45,
"strippedCssClassCount": 38,
"baselineCssVariableCount": 12,
"strippedCssVariableCount": 10
}
],
"interpretations": [
"Used system font fallback for Inter (not installed in CI)",
"Set body margin to 0 (not specified in design tree)"
],
"uncoveredStruggles": [
{
"description": "A difficulty not covered by any flagged rule",
"suggestedCategory": "pixel-critical | responsive-critical | code-quality | token-management | interaction | semantic",
"estimatedImpact": "easy | moderate | hard | failed"
"description": "...",
"suggestedCategory": "pixel-critical",
"estimatedImpact": "medium"
}
]
],
"interpretations": ["guessed X as Y", "assumed Z"]
}
```

The orchestrator will run all measurements (html-postprocess, visual-compare, code-metrics) and assemble the final `conversion.json` by merging your assessment with the measurement results.

## Rules

- Do NOT modify any source files. Only write to the run directory.
- Implement the FULL design, not individual nodes.
- If visual-compare fails (rate limit, etc.), set similarity to -1 and explain in notes.
- Return a brief summary so the orchestrator can proceed.
- **Do NOT run visual-compare, html-postprocess, or code-metrics.** The orchestrator handles all measurements.
- **Do NOT write conversion.json.** Write only `converter-assessment.json`. The orchestrator assembles the final conversion.json.
- Focus on accurate HTML implementation and honest rule impact assessment.
- Each strip HTML should be a fresh implementation from the stripped design-tree, not a modification of the baseline.
Loading
Loading