Merged
22 changes: 17 additions & 5 deletions .claude/agents/calibration/arbitrator.md
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
---
name: calibration-arbitrator
description: Makes final calibration decisions by weighing Runner and Critic. Applies approved changes to rule-config.ts and commits. Use after calibration-critic completes.
tools: Read, Write, Edit, Bash
tools: Read, Edit, Bash
model: claude-sonnet-4-6
---

@@ -13,7 +13,7 @@ You receive the Runner's proposals and the Critic's reviews, and make final deci
- **Both APPROVE** → apply Runner's proposed value
- **Critic REJECT** → keep current score (no change)
- **Critic REVISE** → apply the Critic's revised value
- **New rule proposals** → append to `logs/calibration/new-rule-proposals.md` only, do NOT add to `rule-config.ts`
- **New rule proposals** → return them in the `newRuleProposals` array of your JSON output so the orchestrator can record them in `$RUN_DIR/debate.json`; do NOT add them to `rule-config.ts`
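The decision table above can be sketched as a small function. `Proposal`, `Review`, and `finalScore` are hypothetical names for illustration, not part of the pipeline:

```typescript
// Hypothetical types, illustrative only; the real pipeline schema may differ.
type CriticVerdict = "APPROVE" | "REJECT" | "REVISE";

interface Proposal { ruleId: string; current: number; proposed: number; }
interface Review { ruleId: string; verdict: CriticVerdict; revised?: number; }

// Map the Critic's verdict onto the score that lands in rule-config.ts.
function finalScore(p: Proposal, r: Review): number {
  switch (r.verdict) {
    case "APPROVE": return p.proposed;             // apply Runner's value
    case "REJECT":  return p.current;              // keep current score
    case "REVISE":  return r.revised ?? p.current; // apply Critic's revision
  }
}
```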

## After Deciding

@@ -27,16 +27,28 @@ You receive the Runner's proposals and the Critic's reviews, and make final deci

## Output

**CRITICAL: Your prompt will contain a line like `Activity log: logs/activity/2026-03-20-22-30-material3-kit.jsonl`. You MUST append your summary to that EXACT file path. Do NOT use any other path. Do NOT create `agent-activity-*.jsonl` or any other file.**
**Do NOT write to any log files. Return your decisions as JSON text so the orchestrator can save it.**

The log uses **JSON Lines format** — append exactly one JSON object on a single line:
Only `rule-config.ts` may be edited directly (for approved score changes). All log writes are the orchestrator's job.

Return this JSON structure:

```json
{"step":"Arbitrator","timestamp":"<ISO8601>","result":"applied=2 rejected=1 revised=1 newProposals=0","durationMs":<ms>,"decisions":[{"ruleId":"X","decision":"applied","before":-10,"after":-7,"reason":"Critic revised, midpoint applied"},{"ruleId":"X","decision":"rejected","reason":"Critic rejection compelling — insufficient evidence"}]}
{
"timestamp": "<ISO8601>",
"summary": "applied=2 rejected=1 revised=1 newProposals=0",
"decisions": [
{"ruleId": "X", "decision": "applied", "before": -10, "after": -7, "reason": "Critic revised, midpoint applied"},
{"ruleId": "X", "decision": "rejected", "reason": "Critic rejection compelling — insufficient evidence"}
],
"newRuleProposals": []
}
```
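A minimal sketch of that structure as TypeScript types, with a runtime guard the orchestrator could apply before saving. The type and function names are assumptions, not part of the codebase:

```typescript
interface ArbitratorDecision {
  ruleId: string;
  decision: "applied" | "rejected" | "revised";
  before?: number; // present for applied/revised decisions
  after?: number;
  reason: string;
}

interface ArbitratorOutput {
  timestamp: string; // ISO 8601
  summary: string;   // e.g. "applied=2 rejected=1 revised=1 newProposals=0"
  decisions: ArbitratorDecision[];
  newRuleProposals: unknown[];
}

// Cheap shape check before the orchestrator persists the Arbitrator's reply.
function isArbitratorOutput(v: unknown): v is ArbitratorOutput {
  const o = v as ArbitratorOutput | null;
  return typeof o?.timestamp === "string" &&
    typeof o?.summary === "string" &&
    Array.isArray(o?.decisions) &&
    Array.isArray(o?.newRuleProposals);
}
```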

## Rules

- **Do NOT write to ANY file except `src/rules/rule-config.ts`.** No log files, no `new-rule-proposals.md`, no `debate.json`, no `activity.jsonl`. The orchestrator handles ALL other file I/O.
- **Do NOT create files.** Only Edit existing `rule-config.ts` when applying approved score changes.
- Only modify `rule-config.ts` for approved score/severity changes.
- Never force-push or amend existing commits.
- If tests fail, revert everything and report which change caused the failure.
39 changes: 21 additions & 18 deletions .claude/agents/calibration/converter.md
@@ -10,11 +10,11 @@ You are the Converter agent in a calibration pipeline. Your job is to implement
## Input

You will be given:
- A path to an analysis JSON file (`logs/calibration/calibration-analysis.json`)
- A run directory path (`$RUN_DIR`) containing `analysis.json`
- The original fixture path or Figma URL
- The `fileKey` and root `nodeId` from the analysis

Read the analysis JSON to get:
Read `$RUN_DIR/analysis.json` to get:
- `fileKey`: The Figma file key
- `nodeIssueSummaries`: Issues grouped by node (used for per-rule impact assessment, not for selecting what to convert)

@@ -28,7 +28,7 @@ Use BOTH sources together for accurate conversion:

**Primary source — design tree (structure + CSS-ready values):**
```
npx canicode design-tree <fixture-path> --output /tmp/design-tree.txt
npx canicode design-tree <fixture-path> --output $RUN_DIR/design-tree.txt
```
This produces a 4KB DOM-like tree with inline CSS styles instead of 250KB+ raw JSON. Each node = one HTML element. Every style value is CSS-ready.

@@ -55,11 +55,12 @@ Read and follow `.claude/skills/design-to-code/PROMPT.md` for all code generatio
- Each node in the tree maps 1:1 to an HTML element
- Copy style values directly — they are already CSS-ready
- Follow all rules from DESIGN-TO-CODE-PROMPT.md
3. Save to `/tmp/calibration-output.html`
3. Save to `$RUN_DIR/output.html`
4. Run visual comparison:
```
npx canicode visual-compare /tmp/calibration-output.html --figma-url "https://www.figma.com/design/<fileKey>/file?node-id=<rootNodeId>"
npx canicode visual-compare $RUN_DIR/output.html --figma-url "https://www.figma.com/design/<fileKey>/file?node-id=<rootNodeId>" --output $RUN_DIR
```
This saves `figma.png`, `code.png`, and `diff.png` into the run directory.
Replace `:` with `-` in the nodeId for the URL.
5. Use similarity to determine overall difficulty:

Expand All @@ -70,14 +71,17 @@ Read and follow `.claude/skills/design-to-code/PROMPT.md` for all code generatio
| 50-70% | hard |
| <50% | failed |

6. Review each issue in `nodeIssueSummaries`:
6. **MANDATORY — Rule Impact Assessment**: For EVERY rule ID in `nodeIssueSummaries[].flaggedRuleIds`, assess its actual impact on conversion. Read the analysis JSON, collect all unique `flaggedRuleIds`, and for each one write an entry in `ruleImpactAssessment`. This array MUST NOT be empty if there are flagged rules.
- Did this rule's issue actually make the conversion harder?
- What was its real impact on the final similarity score?
7. Note any difficulties NOT covered by existing rules
- Rate as: `easy` (no real difficulty), `moderate` (some guessing needed), `hard` (significant pixel loss), `failed` (could not reproduce)
7. Note any difficulties NOT covered by existing rules as `uncoveredStruggles`
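Two details from steps 4-5 can be sketched as small helpers. Both function names are illustrative, and the `easy`/`moderate` thresholds below are assumptions (only the `hard` and `failed` bands are visible in the table above in this diff):

```typescript
// Figma node IDs use ":" (e.g. "12:34"); the URL's node-id parameter uses "-".
function nodeIdForUrl(nodeId: string): string {
  return nodeId.replace(/:/g, "-");
}

type Difficulty = "easy" | "moderate" | "hard" | "failed";

// 50-70% -> hard and <50% -> failed come from the table above;
// the upper two bands are assumed.
function difficulty(similarity: number): Difficulty {
  if (similarity >= 90) return "easy";     // assumed threshold
  if (similarity >= 70) return "moderate"; // assumed threshold
  if (similarity >= 50) return "hard";
  return "failed";
}
```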

## Output

Write results to `logs/calibration/calibration-conversion.json`:
Write results to `$RUN_DIR/conversion.json`.

**CRITICAL: `ruleImpactAssessment` MUST contain one entry per unique flagged rule ID. An empty array means the calibration pipeline cannot evaluate rule scores.**

```json
{
@@ -90,8 +94,14 @@ Write results to `logs/calibration/calibration-conversion.json`:
{
"ruleId": "raw-color",
"issueCount": 4,
"actualImpact": "easy | moderate | hard | failed",
"description": "How this rule's issues affected the overall conversion"
"actualImpact": "easy",
"description": "Colors were directly available in design tree, no difficulty"
},
{
"ruleId": "detached-instance",
"issueCount": 2,
"actualImpact": "easy",
"description": "Detached instances rendered identically to attached ones"
}
],
"interpretations": [
@@ -108,16 +118,9 @@ Write results to `logs/calibration/calibration-conversion.json`:
}
```

Also append a brief summary to the activity log file specified by the orchestrator.
The log uses **JSON Lines format** — append exactly one JSON object on a single line:

```json
{"step":"Converter","timestamp":"<ISO8601>","result":"similarity=<N>% difficulty=<level>","durationMs":<ms>}
```

## Rules

- Do NOT modify any source files. Only write to `logs/` and `/tmp/`.
- Do NOT modify any source files. Only write to the run directory.
- Implement the FULL design, not individual nodes.
- If visual-compare fails (rate limit, etc.), set similarity to -1 and explain in notes.
- Return a brief summary so the orchestrator can proceed.
17 changes: 13 additions & 4 deletions .claude/agents/calibration/critic.md
@@ -1,7 +1,7 @@
---
name: calibration-critic
description: Challenges calibration proposals from Runner. Rejects low-confidence or over-aggressive adjustments. Use after calibration-runner completes.
tools: Read, Write
tools: Read
model: claude-sonnet-4-6
---

@@ -35,16 +35,25 @@ For each proposal, output ONE of:

## Output

**CRITICAL: Your prompt will contain a line like `Append your critique to: logs/activity/2026-03-20-22-30-material3-kit.jsonl`. You MUST append your output to that EXACT file path. Do NOT use any other path. Do NOT create `agent-activity-*.jsonl` or any other file.**
**Do NOT write any files. Return your critique as JSON text so the orchestrator can save it.**

The log uses **JSON Lines format** — append exactly one JSON object on a single line:
Return this JSON structure:

```json
{"step":"Critic","timestamp":"<ISO8601>","result":"approved=1 rejected=1 revised=1","durationMs":<ms>,"reviews":[{"ruleId":"X","decision":"APPROVE","reason":"3 cases, high confidence"},{"ruleId":"X","decision":"REJECT","reason":"Rule 1 — only 1 case with low confidence"},{"ruleId":"X","decision":"REVISE","revised":-7,"reason":"Rule 2 — change too large, midpoint applied"}]}
{
"timestamp": "<ISO8601>",
"summary": "approved=1 rejected=1 revised=1",
"reviews": [
{"ruleId": "X", "decision": "APPROVE", "reason": "3 cases, high confidence"},
{"ruleId": "X", "decision": "REJECT", "reason": "Rule 1 — only 1 case with low confidence"},
{"ruleId": "X", "decision": "REVISE", "revised": -7, "reason": "Rule 2 — change too large, midpoint applied"}
]
}
```

## Rules

- **Do NOT write any files.** The orchestrator handles all file I/O.
- Do NOT modify `src/rules/rule-config.ts`.
- Be strict. When in doubt, REJECT or REVISE.
- Return your full critique so the Arbitrator can decide.
32 changes: 10 additions & 22 deletions .claude/agents/calibration/gap-analyzer.md
@@ -1,7 +1,7 @@
---
name: calibration-gap-analyzer
description: Analyzes visual diff between Figma screenshot and AI-generated code to identify specific causes of pixel differences. Accumulates gap data for rule discovery.
tools: Bash, Read, Write
tools: Bash, Read
model: claude-sonnet-4-6
---

@@ -10,14 +10,15 @@ You are the Gap Analyzer agent in a calibration pipeline. Your job is to examine
## Input

You will be given:
- Figma screenshot path (e.g., `/tmp/canicode-visual-compare/figma.png`)
- Code screenshot path (e.g., `/tmp/canicode-visual-compare/code.png`)
- Diff image path (e.g., `/tmp/canicode-visual-compare/diff.png`)
- Figma screenshot path (e.g., `$RUN_DIR/figma.png`)
- Code screenshot path (e.g., `$RUN_DIR/code.png`)
- Diff image path (e.g., `$RUN_DIR/diff.png`)
- Similarity score (e.g., 95%)
- The generated HTML code path
- The fixture path (for reference)
- The analysis JSON (nodeIssueSummaries)
- The analysis JSON (`$RUN_DIR/analysis.json`)
- The Converter's interpretations list (values that were guessed, not from data)
- A run directory path (`$RUN_DIR`)

## Steps

@@ -52,7 +53,9 @@ You will be given:

## Output

Write gap analysis to `logs/calibration/gaps/<fixture-name>-<timestamp>.json`:
**Do NOT write any files. Return the gap analysis as JSON text so the orchestrator can save it.**

Return this JSON structure:

```json
{
@@ -68,14 +71,6 @@ Write gap analysis to `logs/calibration/gaps/<fixture-name>-<timestamp>.json`:
"causedByInterpretation": false,
"actionable": true,
"suggestedRuleCategory": "layout"
},
{
"category": "typography",
"description": "System font fallback — Inter not available in Playwright",
"pixelImpact": "medium",
"coveredByRule": null,
"actionable": false,
"reason": "Rendering environment limitation"
}
],
"summary": {
@@ -88,16 +83,9 @@ Write gap analysis to `logs/calibration/gaps/<fixture-name>-<timestamp>.json`:
}
```

Also append a summary to the activity log file specified by the orchestrator.
The log uses **JSON Lines format** — append exactly one JSON object on a single line:

```json
{"step":"Gap Analyzer","timestamp":"<ISO8601>","result":"similarity=95% gaps=5 actionable=3 newRuleCandidates=2","durationMs":<ms>}
```

## Rules

- Do NOT modify any source files. Only write to `logs/`.
- **Do NOT write any files.** The orchestrator handles all file I/O.
- Be specific about pixel values — "4px off" not "slightly off".
- Distinguish actionable gaps from rendering artifacts clearly.
- This data accumulates over time — future rule discovery agents will read it.
11 changes: 5 additions & 6 deletions .claude/agents/calibration/runner.md
@@ -9,23 +9,22 @@ You are the Runner agent in a calibration pipeline. You perform analysis only

## Steps

1. Run `pnpm exec canicode calibrate-analyze $input --output logs/calibration/calibration-analysis.json`
2. Read the generated `logs/calibration/calibration-analysis.json`
1. Run `pnpm exec canicode calibrate-analyze $input --run-dir $RUN_DIR`
2. Read the generated `$RUN_DIR/analysis.json`
3. Extract the analysis summary: node count, issue count, grade, and the list of `nodeIssueSummaries`

## Output

Append your report to the activity log file specified by the orchestrator.
If no log file is specified, use `logs/activity/YYYY-MM-DD-HH-mm-<fixture-name>.jsonl`.
Append your report to `$RUN_DIR/activity.jsonl` (the run directory is provided by the orchestrator).

The log uses **JSON Lines format** — append exactly one JSON object on a single line:

```json
{"step":"Runner","timestamp":"<ISO8601>","result":"nodes=<N> issues=<N> grade=<X>","durationMs":<ms>,"fixture":"<input>","analysisOutput":"logs/calibration/calibration-analysis.json"}
{"step":"Runner","timestamp":"<ISO8601>","result":"nodes=<N> issues=<N> grade=<X>","durationMs":<ms>,"fixture":"<input>","analysisOutput":"$RUN_DIR/analysis.json"}
```
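The single-line constraint can be sketched as a tiny formatter (a hypothetical helper; the Runner itself would typically do this append via Bash):

```typescript
// Format one activity-log entry as a JSON Lines record: one object, one line,
// trailing newline. JSON.stringify escapes embedded newlines, so the record
// can never span multiple lines.
function toJsonlLine(entry: Record<string, unknown>): string {
  return JSON.stringify(entry) + "\n";
}

// The resulting string is what gets appended to $RUN_DIR/activity.jsonl.
```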

## Rules

- Do NOT modify any source files. Only write to `logs/`.
- Do NOT modify any source files. Only write to the run directory.
- Return your full report text so the orchestrator can proceed.
- If the analysis produces zero issues, return: "No issues found — calibration not needed."
7 changes: 4 additions & 3 deletions .claude/agents/rule-discovery/critic.md
@@ -1,7 +1,7 @@
---
name: rule-discovery-critic
description: Challenges whether a new rule adds real value. Decides keep, adjust, or drop based on Evaluator's data.
tools: Read, Write
tools: Read
model: claude-sonnet-4-6
---

@@ -41,8 +41,9 @@ You will receive:

## Output

Append your critique to the activity log file specified by the orchestrator.
The log uses **JSON Lines format** — append exactly one JSON object on a single line:
**Do NOT write any files. Return your decision as JSON text so the orchestrator can save it.**

Return this JSON structure:

```json
{"step":"Critic","timestamp":"<ISO8601>","result":"<KEEP|ADJUST|DROP> for rule <rule-id>","durationMs":<ms>,"ruleId":"<rule-id>","decision":"<KEEP|ADJUST|DROP>","evidenceStrength":"<strong|moderate|weak>","falsePositiveConcern":"<none|low|high>","difficultyCorrelation":"<strong|moderate|weak>","adjustments":{"score":-7,"severity":"blocking","triggerChange":"..."},"dropReason":"..."}
7 changes: 4 additions & 3 deletions .claude/agents/rule-discovery/designer.md
@@ -1,7 +1,7 @@
---
name: rule-discovery-designer
description: Proposes rule specification based on Researcher findings. Defines check logic, severity, category, and initial score.
tools: Read, Write
tools: Read
model: claude-sonnet-4-6
---

@@ -30,8 +30,9 @@ You will receive:

## Output

Append your proposal to the activity log file specified by the orchestrator.
The log uses **JSON Lines format** — append exactly one JSON object on a single line:
**Do NOT write any files. Return your proposal as JSON text so the orchestrator can save it.**

Return this JSON structure:

```json
{"step":"Designer","timestamp":"<ISO8601>","result":"proposed rule <rule-id>","durationMs":<ms>,"ruleId":"<rule-id>","category":"<category>","severity":"<severity>","initialScore":-5,"trigger":"<when does this fire>","requiresTransformerChanges":false}
7 changes: 4 additions & 3 deletions .claude/agents/rule-discovery/evaluator.md
@@ -1,7 +1,7 @@
---
name: rule-discovery-evaluator
description: Tests new rule against fixtures. Reports issue count, false positive rate, and score impact.
tools: Bash, Read, Write
tools: Bash, Read
model: claude-sonnet-4-6
---

@@ -33,8 +33,9 @@ You will receive:

## Output

Append your evaluation to the activity log file specified by the orchestrator.
The log uses **JSON Lines format** — append exactly one JSON object on a single line:
**Do NOT write any files. Return your evaluation as JSON text so the orchestrator can save it.**

Return this JSON structure:

```json
{"step":"Evaluator","timestamp":"<ISO8601>","result":"verdict=<KEEP|ADJUST|DROP> falsePositiveRate=<X>%","durationMs":<ms>,"ruleId":"<rule-id>","fixtures":[{"name":"material3-kit.json","issues":0,"nodesAffected":0,"scoreImpact":"-X%"}],"falsePositiveRate":"<X>%","verdict":"<KEEP|ADJUST|DROP>","verdictReason":"..."}
10 changes: 5 additions & 5 deletions .claude/agents/rule-discovery/implementer.md
@@ -35,12 +35,12 @@ You will receive:

## Output

Append your implementation summary to the activity log file specified by the orchestrator.
The log uses **JSON Lines format** — append exactly one JSON object on a single line:
**Do NOT write to log files.** The orchestrator handles activity logging.

```json
{"step":"Implementer","timestamp":"<ISO8601>","result":"implemented rule <rule-id> lintOk=true testsOk=true buildOk=true","durationMs":<ms>,"ruleId":"<rule-id>","filesModified":["src/core/rules/<category>/index.ts","src/core/rules/rule-config.ts","src/core/rules/index.ts"],"newTests":0,"lintOk":true,"testsOk":true,"buildOk":true}
```
Return a summary of what you did, including:
- Rule ID
- Files modified
- Whether lint, tests, and build passed

## Rules
