From 15a4d60da5f8dcf33260c49b59f7883890ebaf6d Mon Sep 17 00:00:00 2001 From: Claude Date: Tue, 31 Mar 2026 06:14:04 +0000 Subject: [PATCH 1/3] refactor: separate Converter role + rename run-phase1 (#218) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Phase 2 of pipeline consolidation: Converter role separation: - Converter now only writes HTML (baseline + 6 strips) and converter-assessment.json (ruleImpact + uncoveredStruggles) - Orchestrator handles all measurements: html-postprocess, visual-compare, code-metrics, responsive comparison, strip delta calculations - Orchestrator assembles final conversion.json from measurements + assessment Ablation rename: - run-phase1.ts → run-strip.ts (clearer name for strip experiments) - Updated CLAUDE.md references https://claude.ai/code/session_01N72z2Wbib4cLhYc3FRdSQi --- .claude/agents/calibration/converter.md | 257 ++---------------- .claude/commands/calibrate-loop.md | 143 +++++++++- CLAUDE.md | 4 +- .../ablation/{run-phase1.ts => run-strip.ts} | 4 +- 4 files changed, 163 insertions(+), 245 deletions(-) rename src/experiments/ablation/{run-phase1.ts => run-strip.ts} (99%) diff --git a/.claude/agents/calibration/converter.md b/.claude/agents/calibration/converter.md index 83fdcbb..09ffd30 100644 --- a/.claude/agents/calibration/converter.md +++ b/.claude/agents/calibration/converter.md @@ -1,11 +1,11 @@ --- name: calibration-converter -description: Converts the entire scoped Figma design to a single HTML page and measures pixel-perfect accuracy via visual comparison. +description: Converts the entire scoped Figma design to a single HTML page. Outputs baseline + strip HTML files and a self-assessment of rule impacts. tools: Bash, Read, Write, Glob model: claude-sonnet-4-6 --- -You are the Converter agent in a calibration pipeline. Your job is to implement the entire scoped design as a single HTML page and measure how accurately it matches the original Figma design. 
+You are the Converter agent in a calibration pipeline. Your job is to implement the entire scoped design as a single HTML page, then implement 6 stripped variants. The orchestrator handles all measurements (visual-compare, code-metrics). ## Input @@ -27,10 +27,8 @@ Convert the **entire root node** (the full scoped design) as one standalone HTML Use BOTH sources together for accurate conversion: **Primary source — design tree (structure + CSS-ready values):** -``` -npx canicode design-tree --output $RUN_DIR/design-tree.txt -``` -This produces a 4KB DOM-like tree with inline CSS styles instead of 250KB+ raw JSON. Each node = one HTML element. Every style value is CSS-ready. +Read `$RUN_DIR/design-tree.txt` (pre-generated by the orchestrator). +This is a 4KB DOM-like tree with inline CSS styles instead of 250KB+ raw JSON. Each node = one HTML element. Every style value is CSS-ready. **Secondary source — fixture JSON (exact raw values):** Read the original fixture JSON directly when you need to verify a value from the design tree. Use it to cross-check colors, spacing, font sizes, and any value that seems ambiguous or lossy in the design tree output. @@ -48,256 +46,61 @@ Read and follow `.claude/skills/design-to-code/PROMPT.md` for all code generatio ## Steps 1. Read `.claude/skills/design-to-code/PROMPT.md` for code generation rules -2. Generate design tree (CLI) +2. Read `$RUN_DIR/design-tree.txt` (pre-generated by orchestrator) 3. Convert the design tree to a single standalone HTML+CSS file - Each node in the tree maps 1:1 to an HTML element - Copy style values directly — they are already CSS-ready - Follow all rules from DESIGN-TO-CODE-PROMPT.md 4. Save to `$RUN_DIR/output.html` -5. Post-process HTML (sanitize + inject local fonts): - - ```bash - npx canicode html-postprocess $RUN_DIR/output.html - ``` - -6. 
Run visual comparison: - - ```bash - npx canicode visual-compare $RUN_DIR/output.html \ - --figma-url "https://www.figma.com/design//file?node-id=" \ - --output $RUN_DIR - ``` - - This saves `figma.png`, `code.png`, and `diff.png` into the run directory. - Replace `:` with `-` in the nodeId for the URL. -7. **Responsive comparison** (if expanded screenshot exists): - - List `screenshot-*.png` in the fixture directory. Extract the width number from each filename, sort numerically. If 2+ screenshots exist, the smallest width is the original and the largest is the expanded viewport. - - ```bash - # Example: screenshot-1200.png (original), screenshot-1920.png (expanded) - SCREENSHOTS=($(ls /screenshot-*.png | sort -t- -k2 -n)) - LARGEST="${SCREENSHOTS[-1]}" - LARGEST_WIDTH=$(echo "$LARGEST" | grep -oP 'screenshot-\K\d+') - - npx canicode visual-compare $RUN_DIR/output.html \ - --figma-url "https://www.figma.com/design//file?node-id=" \ - --figma-screenshot "$LARGEST" \ - --width "$LARGEST_WIDTH" \ - --expand-root \ - --output $RUN_DIR/responsive - ``` +5. **Strip Ablation — HTML generation only**: For each of the **6** strip types, the orchestrator has placed stripped design-trees in `$RUN_DIR/stripped/`. Convert each to HTML. - The command outputs JSON to stdout with a `similarity` field. Record it as `responsiveSimilarity` and calculate `responsiveDelta = similarity - responsiveSimilarity`. - If only 1 screenshot exists, skip responsive comparison and set `responsiveSimilarity`, `responsiveDelta`, and `responsiveViewport` to `null`. -8. 
Use similarity to determine overall difficulty (thresholds defined in `src/agents/orchestrator.ts` → `SIMILARITY_DIFFICULTY_THRESHOLDS`): + **Strip types** (process every one): `layout-direction-spacing`, `size-constraints`, `component-references`, `node-names-hierarchy`, `variable-references`, `style-references` - | Similarity | Difficulty | - |-----------|-----------| - | 90%+ | easy | - | 70-89% | moderate | - | 50-69% | hard | - | <50% | failed | + For each ``: + a. Read `$RUN_DIR/stripped/.txt` + b. Convert to HTML with the same rules as baseline (PROMPT.md); save `$RUN_DIR/stripped/.html` -9. **MANDATORY — Rule Impact Assessment**: For EVERY rule ID in `nodeIssueSummaries[].flaggedRuleIds`, assess its actual impact on conversion. Read the analysis JSON, collect all unique `flaggedRuleIds`, and for each one write an entry in `ruleImpactAssessment`. This array MUST NOT be empty if there are flagged rules. +6. **MANDATORY — Rule Impact Assessment**: For EVERY rule ID in `nodeIssueSummaries[].flaggedRuleIds`, assess its actual impact on conversion. Read the analysis JSON, collect all unique `flaggedRuleIds`, and for each one write an entry in `ruleImpactAssessment`. This array MUST NOT be empty if there are flagged rules. - Did this rule's issue actually make the conversion harder? - What was its real impact on the final similarity score? - Rate as: `easy` (no real difficulty), `moderate` (some guessing needed), `hard` (significant pixel loss), `failed` (could not reproduce) -10. **Code metrics** (shared CLI — recorded for analysis/reporting): - - ```bash - npx canicode code-metrics $RUN_DIR/output.html - ``` - - Returns JSON with `htmlBytes`, `htmlLines`, `cssClassCount`, `cssVariableCount`. -11. Note any difficulties NOT covered by existing rules as `uncoveredStruggles` - - **Only include design-related issues** — problems in the Figma file structure, missing tokens, ambiguous layout, etc. 
- - **Exclude environment/tooling issues** — font CDN availability, screenshot DPI/retina scaling, browser rendering quirks, network issues, CI limitations. These are not design problems. -12. **Strip Ablation** (objective difficulty measurement): For each of the **6** strip types, the orchestrator places stripped design-trees in `$RUN_DIR/stripped/`. Convert each to HTML, then collect the **same categories of metrics as the baseline** (pixel similarity, optional responsive similarity, design-tree token estimate, HTML size, CSS counts). Strip rows in `conversion.json` must populate `StripDeltaResultSchema` (`src/agents/contracts/conversion-agent.ts`). - - **Strip types** (process every one): `layout-direction-spacing`, `size-constraints`, `component-references`, `node-names-hierarchy`, `variable-references`, `style-references` - - For each ``: - - a. Read `$RUN_DIR/stripped/.txt` - b. Convert to HTML with the same rules as baseline (PROMPT.md); save `$RUN_DIR/stripped/.html`, then post-process: - ```bash - npx canicode html-postprocess $RUN_DIR/stripped/.html - ``` - c. **Pixel similarity** (design viewport — same framing as baseline): - ```bash - npx canicode visual-compare $RUN_DIR/stripped/.html \ - --figma-screenshot $RUN_DIR/figma.png \ - --output $RUN_DIR/stripped/ - ``` - Record `strippedSimilarity` from the command JSON stdout. Use the baseline run’s `similarity` as `baselineSimilarity` (same value for every strip row). - d. **Input tokens (design-tree text):** Match `generateDesignTreeWithStats` in `src/core/design-tree/design-tree.ts`: - `inputTokens = ceil(utf8Text.length / 4)` where `utf8Text` is the full file contents as a JavaScript string (use the same string length as if the file were read with UTF-8 decoding). 
- - `baselineInputTokens` = from `$RUN_DIR/design-tree.txt` - - `strippedInputTokens` = from `$RUN_DIR/stripped/.txt` - - `tokenDelta` = `baselineInputTokens - strippedInputTokens` - Example (Node): `node -e "const fs=require('fs'); const n=Math.ceil(fs.readFileSync(process.argv[1],'utf8').length/4); console.log(n)" "$RUN_DIR/design-tree.txt"` - e. **Code metrics** (shared CLI — covers HTML size + CSS metrics): - ```bash - npx canicode code-metrics $RUN_DIR/output.html # baseline - npx canicode code-metrics $RUN_DIR/stripped/.html # stripped - ``` - From JSON output: `baselineHtmlBytes` / `strippedHtmlBytes`, `baselineCssClassCount` / `strippedCssClassCount`, `baselineCssVariableCount` / `strippedCssVariableCount`. Compute `htmlBytesDelta` = `baselineHtmlBytes - strippedHtmlBytes`. - f. **Responsive similarity at the expanded viewport** (same screenshot + width as step 7): - - If step 7 **skipped** (only one fixture screenshot): set `baselineResponsiveSimilarity`, `strippedResponsiveSimilarity`, `responsiveDelta`, and `responsiveViewport` to `null` on **every** strip row. - - If step 7 **ran**: reuse the same `LARGEST` screenshot path and `LARGEST_WIDTH` variables from step 7. - - - **`size-constraints` (required):** Run visual-compare on the stripped HTML at the expanded viewport so missing size info shows up where it actually breaks (not only at design width): - - ```bash - npx canicode visual-compare $RUN_DIR/stripped/size-constraints.html \ - --figma-screenshot "$LARGEST" \ - --width "$LARGEST_WIDTH" \ - --expand-root \ - --output $RUN_DIR/stripped/size-constraints-responsive - ``` - - Record JSON stdout `similarity` as **`strippedResponsiveSimilarity`**. Set **`baselineResponsiveSimilarity`** to the root conversion field **`responsiveSimilarity`** from step 7 (baseline `output.html` at the same viewport — already measured). Set **`responsiveViewport`** to `LARGEST_WIDTH` (number). 
Set **`responsiveDelta`** = `baselineResponsiveSimilarity - strippedResponsiveSimilarity` (percentage points). - - - **Other strip types:** Optional — same command pattern with `$RUN_DIR/stripped/.html` and a distinct `--output` directory if you want responsive rows for reporting; otherwise set the four responsive fields to `null`. - - **Derived fields (every strip row):** - - - `delta` = `baselineSimilarity - strippedSimilarity` (percentage points) - - `deltaDifficulty`: use the metric the evaluator uses for that strip family (`src/agents/evaluation-agent.ts` — `getStripDifficultyForRule`): - - `layout-direction-spacing` → map `delta` with `stripDeltaToDifficulty` (`src/core/design-tree/delta.ts` pixel table below) - - `size-constraints` → if `responsiveDelta` is a finite number, map `responsiveDelta` with `stripDeltaToDifficulty`; else map `delta` - - `component-references`, `node-names-hierarchy`, `variable-references`, `style-references` → if both input token counts are present, map with `tokenDeltaToDifficulty(baselineInputTokens, strippedInputTokens)`; else map `delta` with `stripDeltaToDifficulty` - - Pixel / responsive threshold table (`stripDeltaToDifficulty`): - - | Delta (%p) | Difficulty | - |-----------|------------| - | ≤ 5 | easy | - | 6–15 | moderate | - | 16–30 | hard | - | > 30 | failed | - - Token threshold table (`tokenDeltaToDifficulty`): percentage = `(baselineInputTokens - strippedInputTokens) / baselineInputTokens * 100` — ≤5% easy, 6–20% moderate, 21–40% hard, >40% failed (baseline 0 → treat as easy). +7. Note any difficulties NOT covered by existing rules as `uncoveredStruggles` + - **Only include design-related issues** — problems in the Figma file structure, missing tokens, ambiguous layout, etc. + - **Exclude environment/tooling issues** — font CDN availability, screenshot DPI/retina scaling, browser rendering quirks, network issues, CI limitations. These are not design problems. ## Output -Write results to `$RUN_DIR/conversion.json`. 
+Write results to `$RUN_DIR/converter-assessment.json`. **CRITICAL: `ruleImpactAssessment` MUST contain one entry per unique flagged rule ID. An empty array means the calibration pipeline cannot evaluate rule scores.** ```json { "rootNodeId": "562:9069", - "generatedCode": "// The full HTML page", - "similarity": 87, - "responsiveSimilarity": 72, - "responsiveDelta": 15, - "responsiveViewport": 1920, - "htmlBytes": 42000, - "htmlLines": 850, - "cssClassCount": 45, - "cssVariableCount": 12, - "difficulty": "moderate", - "notes": "Summary of the conversion experience", "ruleImpactAssessment": [ { - "ruleId": "raw-value", - "issueCount": 4, - "actualImpact": "easy", - "description": "Colors were directly available in design tree, no difficulty" - }, - { - "ruleId": "detached-instance", - "issueCount": 2, - "actualImpact": "easy", - "description": "Detached instances rendered identically to attached ones" + "ruleId": "no-auto-layout", + "issueCount": 5, + "actualImpact": "high", + "description": "..." 
} ], - "stripDeltas": [ - { - "stripType": "layout-direction-spacing", - "baselineSimilarity": 87, - "strippedSimilarity": 75, - "delta": 12, - "deltaDifficulty": "moderate", - "baselineResponsiveSimilarity": null, - "strippedResponsiveSimilarity": null, - "responsiveDelta": null, - "responsiveViewport": null, - "baselineInputTokens": 2400, - "strippedInputTokens": 2380, - "tokenDelta": 20, - "baselineHtmlBytes": 42000, - "strippedHtmlBytes": 41500, - "htmlBytesDelta": 500, - "baselineCssClassCount": 45, - "strippedCssClassCount": 44, - "baselineCssVariableCount": 12, - "strippedCssVariableCount": 12 - }, - { - "stripType": "size-constraints", - "baselineSimilarity": 87, - "strippedSimilarity": 86, - "delta": 1, - "deltaDifficulty": "moderate", - "baselineResponsiveSimilarity": 72, - "strippedResponsiveSimilarity": 58, - "responsiveDelta": 14, - "responsiveViewport": 1920, - "baselineInputTokens": 2400, - "strippedInputTokens": 2200, - "tokenDelta": 200, - "baselineHtmlBytes": 42000, - "strippedHtmlBytes": 41800, - "htmlBytesDelta": 200, - "baselineCssClassCount": 45, - "strippedCssClassCount": 45, - "baselineCssVariableCount": 12, - "strippedCssVariableCount": 12 - }, - { - "stripType": "component-references", - "baselineSimilarity": 87, - "strippedSimilarity": 84, - "delta": 3, - "deltaDifficulty": "hard", - "baselineResponsiveSimilarity": null, - "strippedResponsiveSimilarity": null, - "responsiveDelta": null, - "responsiveViewport": null, - "baselineInputTokens": 2400, - "strippedInputTokens": 1800, - "tokenDelta": 600, - "baselineHtmlBytes": 42000, - "strippedHtmlBytes": 39000, - "htmlBytesDelta": 3000, - "baselineCssClassCount": 45, - "strippedCssClassCount": 38, - "baselineCssVariableCount": 12, - "strippedCssVariableCount": 10 - } - ], - "interpretations": [ - "Used system font fallback for Inter (not installed in CI)", - "Set body margin to 0 (not specified in design tree)" - ], "uncoveredStruggles": [ { - "description": "A difficulty not covered by any 
flagged rule",
-      "suggestedCategory": "pixel-critical | responsive-critical | code-quality | token-management | interaction | semantic",
-      "estimatedImpact": "easy | moderate | hard | failed"
+      "description": "...",
+      "suggestedCategory": "pixel-critical",
+      "estimatedImpact": "moderate"
     }
-  ]
+  ],
+  "interpretations": ["guessed X as Y", "assumed Z"]
 }
 ```
 
+The orchestrator will run all measurements (html-postprocess, visual-compare, code-metrics) and assemble the final `conversion.json` by merging your assessment with the measurement results.
+
 ## Rules
 
-- Do NOT modify any source files. Only write to the run directory.
-- Implement the FULL design, not individual nodes.
-- If visual-compare fails (rate limit, etc.), set similarity to -1 and explain in notes.
-- Return a brief summary so the orchestrator can proceed.
+- **Do NOT run visual-compare, html-postprocess, or code-metrics.** The orchestrator handles all measurements.
+- **Do NOT write conversion.json.** Write only `converter-assessment.json`. The orchestrator assembles the final conversion.json.
+- Focus on accurate HTML implementation and honest rule impact assessment.
+- Each strip HTML should be a fresh implementation from the stripped design-tree, not a modification of the baseline.

diff --git a/.claude/commands/calibrate-loop.md b/.claude/commands/calibrate-loop.md
index 38c0e24..85521af 100644
--- a/.claude/commands/calibrate-loop.md
+++ b/.claude/commands/calibrate-loop.md
@@ -50,7 +50,7 @@ If tier is `"visual-only"`, append after Converter completes:
 {"step":"Gap Analyzer","timestamp":"","result":"SKIPPED — tier=visual-only, gap analysis skipped","durationMs":0}
 ```
 
-### Step 2 — Converter (Baseline + Strip Ablation)
+### Step 2 — Converter (HTML Generation)
 
 Read the analysis JSON to extract `fileKey`. Also determine the root nodeId — if the input was a Figma URL, parse the node-id from it. If it was a fixture, use the document root id. 
@@ -91,32 +91,147 @@ Fixture directory: fileKey: Root nodeId: Run directory: -figma.png is already in the run directory (copied from fixture screenshot). visual-compare will reuse it. +design-tree.txt is already in the run directory. Stripped design-trees are pre-generated in $RUN_DIR/stripped/. -After completing the baseline conversion (steps 1-10), proceed with step 11 (Strip Ablation) -to convert each stripped design-tree and measure similarity deltas. -``` -The Converter writes `output.html`, `conversion.json`, `design-tree.txt` to $RUN_DIR and runs `visual-compare --output $RUN_DIR` which creates `figma.png` (or reuses cached), `code.png`, `diff.png`. It also writes stripped HTML files and their comparison results. +Your job: implement baseline HTML (output.html) + 6 strip HTMLs (stripped/.html), +then write converter-assessment.json with ruleImpactAssessment + uncoveredStruggles. +Do NOT run visual-compare, html-postprocess, or code-metrics — the orchestrator handles measurements. +``` After the Converter returns, **verify** these files exist in $RUN_DIR: ```bash -ls $RUN_DIR/conversion.json $RUN_DIR/output.html +ls $RUN_DIR/output.html $RUN_DIR/converter-assessment.json +``` + +If `converter-assessment.json` is missing, write it yourself from the Converter's returned summary. + +**Record token usage**: The subagent result includes `total_tokens`, `tool_uses`, `duration_ms` in usage metadata. Store these for later inclusion in conversion.json. + +Append to `$RUN_DIR/activity.jsonl`: +```json +{"step":"Converter","timestamp":"","result":"baseline + 6 strips written, tokens=","durationMs":} +``` + +### Step 2.5 — Measurements (CLI — no LLM) + +Run all measurements on the Converter's HTML outputs. This is deterministic — no subagent needed. 
+ +**Baseline measurements:** + +```bash +# Post-process HTML (sanitize + inject local fonts) +npx canicode html-postprocess $RUN_DIR/output.html + +# Visual comparison (baseline) +npx canicode visual-compare $RUN_DIR/output.html \ + --figma-screenshot $RUN_DIR/figma.png \ + --output $RUN_DIR +``` + +Record the `similarity` from visual-compare JSON stdout. + +**Responsive comparison** (if expanded screenshot exists): + +List `screenshot-*.png` in the fixture directory. Extract the width number from each filename, sort numerically. If 2+ screenshots exist, the smallest width is the original and the largest is the expanded viewport. + +```bash +# Example: screenshot-1200.png (original), screenshot-1920.png (expanded) +SCREENSHOTS=($(ls /screenshot-*.png | sort -t- -k2 -n)) +LARGEST="${SCREENSHOTS[-1]}" +LARGEST_WIDTH=$(echo "$LARGEST" | grep -oP 'screenshot-\K\d+') + +npx canicode visual-compare $RUN_DIR/output.html \ + --figma-screenshot "$LARGEST" \ + --width "$LARGEST_WIDTH" \ + --expand-root \ + --output $RUN_DIR/responsive +``` + +Record `responsiveSimilarity` from JSON stdout. If only 1 screenshot exists, set `responsiveSimilarity`, `responsiveDelta`, `responsiveViewport` to `null`. + +**Code metrics (baseline):** + +```bash +npx canicode code-metrics $RUN_DIR/output.html +``` + +Record `htmlBytes`, `htmlLines`, `cssClassCount`, `cssVariableCount` from JSON stdout. + +**Strip measurements** — for each of the 6 strip types: + +```bash +# Post-process +npx canicode html-postprocess $RUN_DIR/stripped/.html + +# Visual comparison +npx canicode visual-compare $RUN_DIR/stripped/.html \ + --figma-screenshot $RUN_DIR/figma.png \ + --output $RUN_DIR/stripped/ + +# Code metrics +npx canicode code-metrics $RUN_DIR/stripped/.html +``` + +For each strip, record `strippedSimilarity`, `strippedHtmlBytes`, `strippedCssClassCount`, `strippedCssVariableCount` from the CLI outputs. 
+ +**Input tokens** (design-tree text): `inputTokens = ceil(utf8Text.length / 4)` +- `baselineInputTokens` from `$RUN_DIR/design-tree.txt` +- `strippedInputTokens` from `$RUN_DIR/stripped/.txt` +- `tokenDelta` = `baselineInputTokens - strippedInputTokens` + +**Responsive for size-constraints strip** (if responsive comparison ran above): + +```bash +npx canicode visual-compare $RUN_DIR/stripped/size-constraints.html \ + --figma-screenshot "$LARGEST" \ + --width "$LARGEST_WIDTH" \ + --expand-root \ + --output $RUN_DIR/stripped/size-constraints-responsive ``` -If `conversion.json` is missing, write it yourself from the Converter's returned summary. +Other strip types: set responsive fields to `null`. + +**Derived fields (every strip row):** + +- `delta` = `baselineSimilarity - strippedSimilarity` (percentage points) +- `htmlBytesDelta` = `baselineHtmlBytes - strippedHtmlBytes` +- `deltaDifficulty`: use the metric the evaluator uses for that strip family (`src/agents/evaluation-agent.ts` — `getStripDifficultyForRule`): + - `layout-direction-spacing` → map `delta` with `stripDeltaToDifficulty` (≤5 easy, 6–15 moderate, 16–30 hard, >30 failed) + - `size-constraints` → if `responsiveDelta` is finite, map `responsiveDelta` with `stripDeltaToDifficulty`; else map `delta` + - `component-references`, `node-names-hierarchy`, `variable-references`, `style-references` → if both token counts present, map with `tokenDeltaToDifficulty` (≤5% easy, 6–20% moderate, 21–40% hard, >40% failed); else map `delta` with `stripDeltaToDifficulty` -**Verify strip deltas**: Read `conversion.json` and check that `stripDeltas` array is present and non-empty. If missing (Converter didn't complete strip ablation), log a warning but continue — the evaluation will fall back to Converter self-assessment. +**Difficulty from similarity:** Use `SIMILARITY_DIFFICULTY_THRESHOLDS` from `src/agents/orchestrator.ts`: 90%+ easy, 70-89% moderate, 50-69% hard, <50% failed. 
+
+**Responsive fields:** If the responsive comparison ran, compute the baseline `responsiveDelta = similarity - responsiveSimilarity` and set `responsiveViewport` to `LARGEST_WIDTH`. For the `size-constraints` strip, record the responsive visual-compare `similarity` as `strippedResponsiveSimilarity`, set `baselineResponsiveSimilarity` to the baseline `responsiveSimilarity`, and compute the strip row's `responsiveDelta = baselineResponsiveSimilarity - strippedResponsiveSimilarity`.
+ +**Assemble `conversion.json`**: Merge Converter's `converter-assessment.json` (ruleImpactAssessment, uncoveredStruggles) with all measurement results: + +```json +{ + "rootNodeId": "", + "similarity": , + "difficulty": "", + "responsiveSimilarity": , + "responsiveDelta": , + "responsiveViewport": , + "htmlBytes": , + "htmlLines": , + "cssClassCount": , + "cssVariableCount": , + "ruleImpactAssessment": , + "uncoveredStruggles": , + "stripDeltas": [], + "converterTokens": , + "converterToolUses": , + "converterDurationMs": +} +``` -**Record token usage**: The subagent result includes `total_tokens`, `tool_uses`, `duration_ms` in usage metadata. Read `conversion.json`, add these fields, and write back: -- `converterTokens`: total tokens consumed by the Converter subagent -- `converterToolUses`: number of tool calls -- `converterDurationMs`: execution time in milliseconds +Write `$RUN_DIR/conversion.json`. Append to `$RUN_DIR/activity.jsonl`: ```json -{"step":"Converter","timestamp":"","result":"similarity=% difficulty= strips=/5 tokens=","durationMs":} +{"step":"Measurements","timestamp":"","result":"similarity=% difficulty= strips=/6","durationMs":} ``` ### Step 3 — Gap Analysis diff --git a/CLAUDE.md b/CLAUDE.md index 4d1ed9f..6bf2551 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -126,10 +126,10 @@ Calibration commands are NOT exposed as CLI commands. They run exclusively insid Two scripts, shared helpers: -**`run-phase1.ts` — Strip experiments** +**`run-strip.ts` — Strip experiments** ```bash -ANTHROPIC_API_KEY=sk-... npx tsx src/experiments/ablation/run-phase1.ts +ANTHROPIC_API_KEY=sk-... npx tsx src/experiments/ablation/run-strip.ts ABLATION_FIXTURES=desktop-product-detail ABLATION_TYPES=component-references npx tsx ... 
``` diff --git a/src/experiments/ablation/run-phase1.ts b/src/experiments/ablation/run-strip.ts similarity index 99% rename from src/experiments/ablation/run-phase1.ts rename to src/experiments/ablation/run-strip.ts index bb80620..375c933 100644 --- a/src/experiments/ablation/run-phase1.ts +++ b/src/experiments/ablation/run-strip.ts @@ -1,11 +1,11 @@ /** - * Ablation Phase 1: Strip experiments. + * Ablation: Strip experiments. * * For each selected strip type × N fixtures × M runs: * Strip info from design-tree → implement via API → render → compare → record metrics * * Usage: - * ANTHROPIC_API_KEY=sk-... npx tsx src/experiments/ablation/run-phase1.ts + * ANTHROPIC_API_KEY=sk-... npx tsx src/experiments/ablation/run-strip.ts * * Environment variables: * ANTHROPIC_API_KEY, ABLATION_FIXTURES, ABLATION_TYPES, ABLATION_RUNS, ABLATION_BASELINE_ONLY From e8d77e110760e70041e0ccc15024bff251fb4252 Mon Sep 17 00:00:00 2001 From: Claude Date: Tue, 31 Mar 2026 06:37:04 +0000 Subject: [PATCH 2/3] =?UTF-8?q?fix:=20address=20review=20=E2=80=94=20fix?= =?UTF-8?q?=20actualImpact=20enum=20+=20verify=20stripped=20HTMLs?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - Fix example JSON "high" → "hard" (valid: easy/moderate/hard/failed) - Add verification for 6 stripped HTML files after Converter returns https://claude.ai/code/session_01N72z2Wbib4cLhYc3FRdSQi --- .claude/agents/calibration/converter.md | 2 +- .claude/commands/calibrate-loop.md | 8 +++++++- 2 files changed, 8 insertions(+), 2 deletions(-) diff --git a/.claude/agents/calibration/converter.md b/.claude/agents/calibration/converter.md index 09ffd30..b957108 100644 --- a/.claude/agents/calibration/converter.md +++ b/.claude/agents/calibration/converter.md @@ -81,7 +81,7 @@ Write results to `$RUN_DIR/converter-assessment.json`. { "ruleId": "no-auto-layout", "issueCount": 5, - "actualImpact": "high", + "actualImpact": "hard", "description": "..." 
} ], diff --git a/.claude/commands/calibrate-loop.md b/.claude/commands/calibrate-loop.md index 85521af..277c03f 100644 --- a/.claude/commands/calibrate-loop.md +++ b/.claude/commands/calibrate-loop.md @@ -103,9 +103,15 @@ Do NOT run visual-compare, html-postprocess, or code-metrics — the orchestrato After the Converter returns, **verify** these files exist in $RUN_DIR: ```bash ls $RUN_DIR/output.html $RUN_DIR/converter-assessment.json +ls $RUN_DIR/stripped/layout-direction-spacing.html \ + $RUN_DIR/stripped/size-constraints.html \ + $RUN_DIR/stripped/component-references.html \ + $RUN_DIR/stripped/node-names-hierarchy.html \ + $RUN_DIR/stripped/variable-references.html \ + $RUN_DIR/stripped/style-references.html ``` -If `converter-assessment.json` is missing, write it yourself from the Converter's returned summary. +If any file is missing, log a warning naming the missing files but continue. **Record token usage**: The subagent result includes `total_tokens`, `tool_uses`, `duration_ms` in usage metadata. Store these for later inclusion in conversion.json. From edd6e97f21d892d5e488c2592522b4e7056dd578 Mon Sep 17 00:00:00 2001 From: Claude Date: Tue, 31 Mar 2026 06:49:46 +0000 Subject: [PATCH 3/3] fix: update Step 3 similarity source to conversion.json https://claude.ai/code/session_01N72z2Wbib4cLhYc3FRdSQi --- .claude/commands/calibrate-loop.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.claude/commands/calibrate-loop.md b/.claude/commands/calibrate-loop.md index 277c03f..ca3d6d8 100644 --- a/.claude/commands/calibrate-loop.md +++ b/.claude/commands/calibrate-loop.md @@ -256,7 +256,7 @@ Proceed to Step 4. **If EXISTS**: spawn the `calibration-gap-analyzer` subagent. 
In the prompt include: - Screenshot paths: `$RUN_DIR/figma.png`, `$RUN_DIR/code.png`, `$RUN_DIR/diff.png` -- Similarity score from the Converter's output +- Similarity score from `$RUN_DIR/conversion.json` - Generated HTML path: `$RUN_DIR/output.html` - Fixture path and analysis JSON path: `$RUN_DIR/analysis.json` - The Converter's interpretations list
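
---

A minimal TypeScript sketch of the two difficulty mappings and the token estimate these docs reference (`stripDeltaToDifficulty`, `tokenDeltaToDifficulty`, and the `ceil(length / 4)` heuristic). This is an illustrative reconstruction from the thresholds stated in the patched markdown, not the actual source in `src/core/design-tree/delta.ts`:

```typescript
type Difficulty = "easy" | "moderate" | "hard" | "failed";

// Pixel / responsive delta thresholds (percentage points):
// <=5 easy, 6-15 moderate, 16-30 hard, >30 failed
function stripDeltaToDifficulty(delta: number): Difficulty {
  if (delta <= 5) return "easy";
  if (delta <= 15) return "moderate";
  if (delta <= 30) return "hard";
  return "failed";
}

// Token thresholds: percentage of input tokens removed by the strip.
// <=5% easy, 6-20% moderate, 21-40% hard, >40% failed (baseline 0 -> easy)
function tokenDeltaToDifficulty(baseline: number, stripped: number): Difficulty {
  if (baseline === 0) return "easy";
  const pct = ((baseline - stripped) / baseline) * 100;
  if (pct <= 5) return "easy";
  if (pct <= 20) return "moderate";
  if (pct <= 40) return "hard";
  return "failed";
}

// Design-tree token estimate used for baselineInputTokens / strippedInputTokens:
// inputTokens = ceil(utf8Text.length / 4)
function estimateInputTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

console.log(stripDeltaToDifficulty(12));          // 6-15 band -> "moderate"
console.log(tokenDeltaToDifficulty(2400, 1800));  // 25% removed -> "hard"
console.log(estimateInputTokens("a".repeat(10))); // ceil(10 / 4) = 3
```

The same three functions cover every `deltaDifficulty` branch in Step 2.5: `layout-direction-spacing` maps `delta`, `size-constraints` prefers `responsiveDelta`, and the reference-stripping families prefer the token percentage.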