Merged
22 changes: 17 additions & 5 deletions .claude/agents/calibration/arbitrator.md
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
---
name: calibration-arbitrator
description: Makes final calibration decisions by weighing Runner and Critic. Applies approved changes to rule-config.ts and commits. Use after calibration-critic completes.
tools: Read, Write, Edit, Bash
tools: Read, Edit, Bash
model: claude-sonnet-4-6
---

@@ -13,7 +13,7 @@ You receive the Runner's proposals and the Critic's reviews, and make final deci
- **Both APPROVE** → apply Runner's proposed value
- **Critic REJECT** → keep current score (no change)
- **Critic REVISE** → apply the Critic's revised value
- **New rule proposals** → append to `logs/calibration/new-rule-proposals.md` only, do NOT add to `rule-config.ts`
- **New rule proposals** → return them in the `newRuleProposals` array of your JSON output so the orchestrator can record them in `$RUN_DIR/debate.json`; do NOT add them to `rule-config.ts`
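The decision table above can be sketched as a small function. `Proposal`, `Review`, and `finalScore` are hypothetical names for illustration, not part of the pipeline:

```typescript
// Hypothetical types, illustrative only; the real pipeline schema may differ.
type CriticVerdict = "APPROVE" | "REJECT" | "REVISE";

interface Proposal { ruleId: string; current: number; proposed: number; }
interface Review { ruleId: string; verdict: CriticVerdict; revised?: number; }

// Map the Critic's verdict onto the score that lands in rule-config.ts.
function finalScore(p: Proposal, r: Review): number {
  switch (r.verdict) {
    case "APPROVE": return p.proposed;             // apply Runner's value
    case "REJECT":  return p.current;              // keep current score
    case "REVISE":  return r.revised ?? p.current; // apply Critic's revision
  }
}
```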

## After Deciding

@@ -27,16 +27,28 @@ You receive the Runner's proposals and the Critic's reviews, and make final deci

## Output

**CRITICAL: Your prompt will contain a line like `Activity log: logs/activity/2026-03-20-22-30-material3-kit.jsonl`. You MUST append your summary to that EXACT file path. Do NOT use any other path. Do NOT create `agent-activity-*.jsonl` or any other file.**
**Do NOT write to any log files. Return your decisions as JSON text so the orchestrator can save it.**

The log uses **JSON Lines format** — append exactly one JSON object on a single line:
Only `rule-config.ts` may be edited directly (for approved score changes). All log writes are the orchestrator's job.

Return this JSON structure:

```json
{"step":"Arbitrator","timestamp":"<ISO8601>","result":"applied=2 rejected=1 revised=1 newProposals=0","durationMs":<ms>,"decisions":[{"ruleId":"X","decision":"applied","before":-10,"after":-7,"reason":"Critic revised, midpoint applied"},{"ruleId":"X","decision":"rejected","reason":"Critic rejection compelling — insufficient evidence"}]}
{
"timestamp": "<ISO8601>",
"summary": "applied=2 rejected=1 revised=1 newProposals=0",
"decisions": [
{"ruleId": "X", "decision": "applied", "before": -10, "after": -7, "reason": "Critic revised, midpoint applied"},
{"ruleId": "X", "decision": "rejected", "reason": "Critic rejection compelling — insufficient evidence"}
],
"newRuleProposals": []
}
```
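A minimal sketch of that structure as TypeScript types, with a runtime guard the orchestrator could apply before saving. The type and function names are assumptions, not part of the codebase:

```typescript
interface ArbitratorDecision {
  ruleId: string;
  decision: "applied" | "rejected" | "revised";
  before?: number; // present for applied/revised decisions
  after?: number;
  reason: string;
}

interface ArbitratorOutput {
  timestamp: string; // ISO 8601
  summary: string;   // e.g. "applied=2 rejected=1 revised=1 newProposals=0"
  decisions: ArbitratorDecision[];
  newRuleProposals: unknown[];
}

// Cheap shape check before the orchestrator persists the Arbitrator's reply.
function isArbitratorOutput(v: unknown): v is ArbitratorOutput {
  const o = v as ArbitratorOutput | null;
  return typeof o?.timestamp === "string" &&
    typeof o?.summary === "string" &&
    Array.isArray(o?.decisions) &&
    Array.isArray(o?.newRuleProposals);
}
```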

## Rules

- **Do NOT write to ANY file except `src/rules/rule-config.ts`.** No log files, no `new-rule-proposals.md`, no `debate.json`, no `activity.jsonl`. The orchestrator handles ALL other file I/O.
- **Do NOT create files.** Only Edit existing `rule-config.ts` when applying approved score changes.
- Only modify `rule-config.ts` for approved score/severity changes.
- Never force-push or amend existing commits.
- If tests fail, revert everything and report which change caused the failure.
39 changes: 21 additions & 18 deletions .claude/agents/calibration/converter.md
@@ -10,11 +10,11 @@ You are the Converter agent in a calibration pipeline. Your job is to implement
## Input

You will be given:
- A path to an analysis JSON file (`logs/calibration/calibration-analysis.json`)
- A run directory path (`$RUN_DIR`) containing `analysis.json`
- The original fixture path or Figma URL
- The `fileKey` and root `nodeId` from the analysis

Read the analysis JSON to get:
Read `$RUN_DIR/analysis.json` to get:
- `fileKey`: The Figma file key
- `nodeIssueSummaries`: Issues grouped by node (used for per-rule impact assessment, not for selecting what to convert)

@@ -28,7 +28,7 @@ Use BOTH sources together for accurate conversion:

**Primary source — design tree (structure + CSS-ready values):**
```
npx canicode design-tree <fixture-path> --output /tmp/design-tree.txt
npx canicode design-tree <fixture-path> --output $RUN_DIR/design-tree.txt
```
This produces a 4KB DOM-like tree with inline CSS styles instead of 250KB+ raw JSON. Each node = one HTML element. Every style value is CSS-ready.

@@ -55,11 +55,12 @@ Read and follow `.claude/skills/design-to-code/PROMPT.md` for all code generatio
- Each node in the tree maps 1:1 to an HTML element
- Copy style values directly — they are already CSS-ready
- Follow all rules from DESIGN-TO-CODE-PROMPT.md
3. Save to `/tmp/calibration-output.html`
3. Save to `$RUN_DIR/output.html`
4. Run visual comparison:
```
npx canicode visual-compare /tmp/calibration-output.html --figma-url "https://www.figma.com/design/<fileKey>/file?node-id=<rootNodeId>"
npx canicode visual-compare $RUN_DIR/output.html --figma-url "https://www.figma.com/design/<fileKey>/file?node-id=<rootNodeId>" --output $RUN_DIR
```
This saves `figma.png`, `code.png`, and `diff.png` into the run directory.
Replace `:` with `-` in the nodeId for the URL.
5. Use similarity to determine overall difficulty:

Expand All @@ -70,14 +71,17 @@ Read and follow `.claude/skills/design-to-code/PROMPT.md` for all code generatio
| 50-70% | hard |
| <50% | failed |

6. Review each issue in `nodeIssueSummaries`:
6. **MANDATORY — Rule Impact Assessment**: For EVERY rule ID in `nodeIssueSummaries[].flaggedRuleIds`, assess its actual impact on conversion. Read the analysis JSON, collect all unique `flaggedRuleIds`, and for each one write an entry in `ruleImpactAssessment`. This array MUST NOT be empty if there are flagged rules.
- Did this rule's issue actually make the conversion harder?
- What was its real impact on the final similarity score?
7. Note any difficulties NOT covered by existing rules
- Rate as: `easy` (no real difficulty), `moderate` (some guessing needed), `hard` (significant pixel loss), `failed` (could not reproduce)
7. Note any difficulties NOT covered by existing rules as `uncoveredStruggles`
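Two details from steps 4-5 can be sketched as small helpers. Both function names are illustrative, and the `easy`/`moderate` thresholds below are assumptions (only the `hard` and `failed` bands are visible in the table above in this diff):

```typescript
// Figma node IDs use ":" (e.g. "12:34"); the URL's node-id parameter uses "-".
function nodeIdForUrl(nodeId: string): string {
  return nodeId.replace(/:/g, "-");
}

type Difficulty = "easy" | "moderate" | "hard" | "failed";

// 50-70% -> hard and <50% -> failed come from the table above;
// the upper two bands are assumed.
function difficulty(similarity: number): Difficulty {
  if (similarity >= 90) return "easy";     // assumed threshold
  if (similarity >= 70) return "moderate"; // assumed threshold
  if (similarity >= 50) return "hard";
  return "failed";
}
```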

## Output

Write results to `logs/calibration/calibration-conversion.json`:
Write results to `$RUN_DIR/conversion.json`.

**CRITICAL: `ruleImpactAssessment` MUST contain one entry per unique flagged rule ID. An empty array means the calibration pipeline cannot evaluate rule scores.**

```json
{
@@ -90,8 +94,14 @@ Write results to `logs/calibration/calibration-conversion.json`:
{
"ruleId": "raw-color",
"issueCount": 4,
"actualImpact": "easy | moderate | hard | failed",
"description": "How this rule's issues affected the overall conversion"
"actualImpact": "easy",
"description": "Colors were directly available in design tree, no difficulty"
},
{
"ruleId": "detached-instance",
"issueCount": 2,
"actualImpact": "easy",
"description": "Detached instances rendered identically to attached ones"
}
],
"interpretations": [
@@ -108,16 +118,9 @@ Write results to `logs/calibration/calibration-conversion.json`:
}
```

Also append a brief summary to the activity log file specified by the orchestrator.
The log uses **JSON Lines format** — append exactly one JSON object on a single line:

```json
{"step":"Converter","timestamp":"<ISO8601>","result":"similarity=<N>% difficulty=<level>","durationMs":<ms>}
```

## Rules

- Do NOT modify any source files. Only write to `logs/` and `/tmp/`.
- Do NOT modify any source files. Only write to the run directory.
- Implement the FULL design, not individual nodes.
- If visual-compare fails (rate limit, etc.), set similarity to -1 and explain in notes.
- Return a brief summary so the orchestrator can proceed.
17 changes: 13 additions & 4 deletions .claude/agents/calibration/critic.md
@@ -1,7 +1,7 @@
---
name: calibration-critic
description: Challenges calibration proposals from Runner. Rejects low-confidence or over-aggressive adjustments. Use after calibration-runner completes.
tools: Read, Write
tools: Read
model: claude-sonnet-4-6
---

@@ -35,16 +35,25 @@ For each proposal, output ONE of:

## Output

**CRITICAL: Your prompt will contain a line like `Append your critique to: logs/activity/2026-03-20-22-30-material3-kit.jsonl`. You MUST append your output to that EXACT file path. Do NOT use any other path. Do NOT create `agent-activity-*.jsonl` or any other file.**
**Do NOT write any files. Return your critique as JSON text so the orchestrator can save it.**

The log uses **JSON Lines format** — append exactly one JSON object on a single line:
Return this JSON structure:

```json
{"step":"Critic","timestamp":"<ISO8601>","result":"approved=1 rejected=1 revised=1","durationMs":<ms>,"reviews":[{"ruleId":"X","decision":"APPROVE","reason":"3 cases, high confidence"},{"ruleId":"X","decision":"REJECT","reason":"Rule 1 — only 1 case with low confidence"},{"ruleId":"X","decision":"REVISE","revised":-7,"reason":"Rule 2 — change too large, midpoint applied"}]}
{
"timestamp": "<ISO8601>",
"summary": "approved=1 rejected=1 revised=1",
"reviews": [
{"ruleId": "X", "decision": "APPROVE", "reason": "3 cases, high confidence"},
{"ruleId": "X", "decision": "REJECT", "reason": "Rule 1 — only 1 case with low confidence"},
{"ruleId": "X", "decision": "REVISE", "revised": -7, "reason": "Rule 2 — change too large, midpoint applied"}
]
}
```

## Rules

- **Do NOT write any files.** The orchestrator handles all file I/O.
- Do NOT modify `src/rules/rule-config.ts`.
- Be strict. When in doubt, REJECT or REVISE.
- Return your full critique so the Arbitrator can decide.
32 changes: 10 additions & 22 deletions .claude/agents/calibration/gap-analyzer.md
@@ -1,7 +1,7 @@
---
name: calibration-gap-analyzer
description: Analyzes visual diff between Figma screenshot and AI-generated code to identify specific causes of pixel differences. Accumulates gap data for rule discovery.
tools: Bash, Read, Write
tools: Bash, Read
model: claude-sonnet-4-6
---

@@ -10,14 +10,15 @@ You are the Gap Analyzer agent in a calibration pipeline. Your job is to examine
## Input

You will be given:
- Figma screenshot path (e.g., `/tmp/canicode-visual-compare/figma.png`)
- Code screenshot path (e.g., `/tmp/canicode-visual-compare/code.png`)
- Diff image path (e.g., `/tmp/canicode-visual-compare/diff.png`)
- Figma screenshot path (e.g., `$RUN_DIR/figma.png`)
- Code screenshot path (e.g., `$RUN_DIR/code.png`)
- Diff image path (e.g., `$RUN_DIR/diff.png`)
- Similarity score (e.g., 95%)
- The generated HTML code path
- The fixture path (for reference)
- The analysis JSON (nodeIssueSummaries)
- The analysis JSON (`$RUN_DIR/analysis.json`)
- The Converter's interpretations list (values that were guessed, not from data)
- A run directory path (`$RUN_DIR`)

## Steps

@@ -52,7 +53,9 @@ You will be given:

## Output

Write gap analysis to `logs/calibration/gaps/<fixture-name>-<timestamp>.json`:
**Do NOT write any files. Return the gap analysis as JSON text so the orchestrator can save it.**

Return this JSON structure:

```json
{
@@ -68,14 +71,6 @@ Write gap analysis to `logs/calibration/gaps/<fixture-name>-<timestamp>.json`:
"causedByInterpretation": false,
"actionable": true,
"suggestedRuleCategory": "layout"
},
{
"category": "typography",
"description": "System font fallback — Inter not available in Playwright",
"pixelImpact": "medium",
"coveredByRule": null,
"actionable": false,
"reason": "Rendering environment limitation"
}
],
"summary": {
@@ -88,16 +83,9 @@ Write gap analysis to `logs/calibration/gaps/<fixture-name>-<timestamp>.json`:
}
```

Also append a summary to the activity log file specified by the orchestrator.
The log uses **JSON Lines format** — append exactly one JSON object on a single line:

```json
{"step":"Gap Analyzer","timestamp":"<ISO8601>","result":"similarity=95% gaps=5 actionable=3 newRuleCandidates=2","durationMs":<ms>}
```

## Rules

- Do NOT modify any source files. Only write to `logs/`.
- **Do NOT write any files.** The orchestrator handles all file I/O.
- Be specific about pixel values — "4px off" not "slightly off".
- Distinguish actionable gaps from rendering artifacts clearly.
- This data accumulates over time — future rule discovery agents will read it.
11 changes: 5 additions & 6 deletions .claude/agents/calibration/runner.md
@@ -9,23 +9,22 @@ You are the Runner agent in a calibration pipeline. You perform analysis only

## Steps

1. Run `pnpm exec canicode calibrate-analyze $input --output logs/calibration/calibration-analysis.json`
2. Read the generated `logs/calibration/calibration-analysis.json`
1. Run `pnpm exec canicode calibrate-analyze $input --run-dir $RUN_DIR`
2. Read the generated `$RUN_DIR/analysis.json`
3. Extract the analysis summary: node count, issue count, grade, and the list of `nodeIssueSummaries`

## Output

Append your report to the activity log file specified by the orchestrator.
If no log file is specified, use `logs/activity/YYYY-MM-DD-HH-mm-<fixture-name>.jsonl`.
Append your report to `$RUN_DIR/activity.jsonl` (the run directory is provided by the orchestrator).

The log uses **JSON Lines format** — append exactly one JSON object on a single line:

```json
{"step":"Runner","timestamp":"<ISO8601>","result":"nodes=<N> issues=<N> grade=<X>","durationMs":<ms>,"fixture":"<input>","analysisOutput":"logs/calibration/calibration-analysis.json"}
{"step":"Runner","timestamp":"<ISO8601>","result":"nodes=<N> issues=<N> grade=<X>","durationMs":<ms>,"fixture":"<input>","analysisOutput":"$RUN_DIR/analysis.json"}
```
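The single-line constraint can be sketched as a tiny formatter (a hypothetical helper; the Runner itself would typically do this append via Bash):

```typescript
// Format one activity-log entry as a JSON Lines record: one object, one line,
// trailing newline. JSON.stringify escapes embedded newlines, so the record
// can never span multiple lines.
function toJsonlLine(entry: Record<string, unknown>): string {
  return JSON.stringify(entry) + "\n";
}

// The resulting string is what gets appended to $RUN_DIR/activity.jsonl.
```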

## Rules

- Do NOT modify any source files. Only write to `logs/`.
- Do NOT modify any source files. Only write to the run directory.
- Return your full report text so the orchestrator can proceed.
- If the analysis produces zero issues, return: "No issues found — calibration not needed."
7 changes: 4 additions & 3 deletions .claude/agents/rule-discovery/critic.md
@@ -1,7 +1,7 @@
---
name: rule-discovery-critic
description: Challenges whether a new rule adds real value. Decides keep, adjust, or drop based on Evaluator's data.
tools: Read, Write
tools: Read
model: claude-sonnet-4-6
---

@@ -41,8 +41,9 @@ You will receive:

## Output

Append your critique to the activity log file specified by the orchestrator.
The log uses **JSON Lines format** — append exactly one JSON object on a single line:
**Do NOT write any files. Return your decision as JSON text so the orchestrator can save it.**

Return this JSON structure:

```json
{"step":"Critic","timestamp":"<ISO8601>","result":"<KEEP|ADJUST|DROP> for rule <rule-id>","durationMs":<ms>,"ruleId":"<rule-id>","decision":"<KEEP|ADJUST|DROP>","evidenceStrength":"<strong|moderate|weak>","falsePositiveConcern":"<none|low|high>","difficultyCorrelation":"<strong|moderate|weak>","adjustments":{"score":-7,"severity":"blocking","triggerChange":"..."},"dropReason":"..."}
7 changes: 4 additions & 3 deletions .claude/agents/rule-discovery/designer.md
@@ -1,7 +1,7 @@
---
name: rule-discovery-designer
description: Proposes rule specification based on Researcher findings. Defines check logic, severity, category, and initial score.
tools: Read, Write
tools: Read
model: claude-sonnet-4-6
---

@@ -30,8 +30,9 @@ You will receive:

## Output

Append your proposal to the activity log file specified by the orchestrator.
The log uses **JSON Lines format** — append exactly one JSON object on a single line:
**Do NOT write any files. Return your proposal as JSON text so the orchestrator can save it.**

Return this JSON structure:

```json
{"step":"Designer","timestamp":"<ISO8601>","result":"proposed rule <rule-id>","durationMs":<ms>,"ruleId":"<rule-id>","category":"<category>","severity":"<severity>","initialScore":-5,"trigger":"<when does this fire>","requiresTransformerChanges":false}
7 changes: 4 additions & 3 deletions .claude/agents/rule-discovery/evaluator.md
@@ -1,7 +1,7 @@
---
name: rule-discovery-evaluator
description: Tests new rule against fixtures. Reports issue count, false positive rate, and score impact.
tools: Bash, Read, Write
tools: Bash, Read
model: claude-sonnet-4-6
---

@@ -33,8 +33,9 @@ You will receive:

## Output

Append your evaluation to the activity log file specified by the orchestrator.
The log uses **JSON Lines format** — append exactly one JSON object on a single line:
**Do NOT write any files. Return your evaluation as JSON text so the orchestrator can save it.**

Return this JSON structure:

```json
{"step":"Evaluator","timestamp":"<ISO8601>","result":"verdict=<KEEP|ADJUST|DROP> falsePositiveRate=<X>%","durationMs":<ms>,"ruleId":"<rule-id>","fixtures":[{"name":"material3-kit.json","issues":0,"nodesAffected":0,"scoreImpact":"-X%"}],"falsePositiveRate":"<X>%","verdict":"<KEEP|ADJUST|DROP>","verdictReason":"..."}
10 changes: 5 additions & 5 deletions .claude/agents/rule-discovery/implementer.md
@@ -35,12 +35,12 @@ You will receive:

## Output

Append your implementation summary to the activity log file specified by the orchestrator.
The log uses **JSON Lines format** — append exactly one JSON object on a single line:
**Do NOT write to log files.** The orchestrator handles activity logging.

```json
{"step":"Implementer","timestamp":"<ISO8601>","result":"implemented rule <rule-id> lintOk=true testsOk=true buildOk=true","durationMs":<ms>,"ruleId":"<rule-id>","filesModified":["src/core/rules/<category>/index.ts","src/core/rules/rule-config.ts","src/core/rules/index.ts"],"newTests":0,"lintOk":true,"testsOk":true,"buildOk":true}
```
Return a summary of what you did, including:
- Rule ID
- Files modified
- Whether lint, tests, and build passed

## Rules
