docs: update calibration and rule discovery documentation#88
Conversation
- Update CALIBRATION.md: 4-agent → 6-agent pipeline, add Gap Analyzer (Step 3), Prune Evidence (Step 6.5), tiered approach, cross-run evidence section, fix rule-config.ts path
- Update CALIBRATION-PLAYBOOK.md: nightly command with fixture input, evidence accumulation in step table, discovery evidence pruning, remove server scripts reference
- Update SCORING.md: align calibration steps with current pipeline
- Remove duplicated pipeline structure section and outdated Next Steps

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
📝 Walkthrough

Expanded the calibration pipeline to six steps (added Gap Analyzer and Prune Evidence) and moved evidence persistence to cross-run `data/*.json` stores.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Researcher
    participant Converter
    participant Orchestrator
    participant GapAnalyzer
    participant Evaluator
    participant Arbitrator
    participant DataStore as "data/*.json"
    Researcher->>Converter: produce one-page HTML for visual-compare
    Converter->>Orchestrator: provide HTML + fixture
    Orchestrator->>GapAnalyzer: supply diff images for pixel-gap categorization
    GapAnalyzer->>DataStore: append uncovered gaps -> data/discovery-evidence.json
    Orchestrator->>Evaluator: run A/B + pixel-similarity checks
    Evaluator->>DataStore: append score/impact -> data/calibration-evidence.json
    Arbitrator->>DataStore: prune applied/covered evidence from discovery/calibration files
    Arbitrator->>Orchestrator: commit applied changes / append debate.json
```
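The evidence-accumulation flow in the diagram can be sketched as follows. This is a minimal sketch, not the repository's actual implementation: only the two file paths come from the diagram, while the record fields and the `append_evidence` helper are hypothetical.

```python
import json
from pathlib import Path

def append_evidence(store: Path, record: dict) -> None:
    """Append one evidence record to a JSON-array store, creating it if absent."""
    store.parent.mkdir(parents=True, exist_ok=True)
    records = json.loads(store.read_text()) if store.exists() else []
    records.append(record)
    store.write_text(json.dumps(records, indent=2))

# Gap Analyzer appends uncovered gaps; Evaluator appends score/impact records.
# The field names below are illustrative, not the project's schema.
append_evidence(Path("data/discovery-evidence.json"),
                {"gap": "uncovered-shadow", "fixture": "fixtures/page.json"})
append_evidence(Path("data/calibration-evidence.json"),
                {"rule": "spacing-scale", "score_delta": 2})
```

The key property the diagram relies on is that both stores are append-only across runs, with removal deferred to the Arbitrator's pruning step.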
Estimated Code Review Effort: 🎯 2 (Simple) | ⏱️ ~12 minutes

Possibly related PRs
🚥 Pre-merge checks: ✅ 3 passed | ❌ 1 failed (inconclusive)
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/CALIBRATION-PLAYBOOK.md`:
- Around line 61-63: The fenced code block containing the command
"/calibrate-night fixtures/" needs a language tag; update that Markdown block to
use triple backticks with "bash" (e.g., ```bash) so the block reads "```bash"
followed by "/calibrate-night fixtures/" and closing "```" to satisfy
markdownlint MD040.
In `@docs/CALIBRATION.md`:
- Around line 62-64: Update the table row that currently reads "A+ and above |
Full 6-step pipeline" to explicitly mention evidence pruning; replace it with
wording like "A+ and above | Full pipeline (6 agents + evidence pruning)" so
Step 6.5 (Prune Evidence) is clearly included—adjust the cell containing "Full
6-step pipeline" accordingly in the docs/CALIBRATION.md table.
- Around line 249-251: The fenced code block containing the command /add-rule
"concept" fixtures/path is missing a language tag; update that block to include
a language identifier (e.g., change ``` to ```bash) so the block becomes ```bash
followed by /add-rule "concept" fixtures/path and the closing ``` to satisfy
markdownlint MD040.
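The MD040 findings above both come down to one rule: an opening code fence must carry a language token (an "info string"). A rough sketch of such a check is below; this is not markdownlint's actual implementation, just an illustration of what the rule flags.

```python
import re

def untagged_fences(markdown: str) -> list[int]:
    """Return 1-based line numbers of opening fences that lack a language tag."""
    numbers, inside = [], False
    for i, line in enumerate(markdown.splitlines(), start=1):
        if re.match(r"^\s*```", line):
            # An opening fence with nothing after the backticks violates MD040.
            if not inside and re.match(r"^\s*```\s*$", line):
                numbers.append(i)
            inside = not inside
    return numbers

doc = "intro\n```\n/calibrate-night fixtures/\n```\n"
print(untagged_fences(doc))  # → [2]: the untagged opening fence
```

Adding `bash` after the opening backticks, as the review suggests, makes the fence pass.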
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: cf7dd34b-a28a-4dd8-ac57-570500805475
📒 Files selected for processing (3)

- docs/CALIBRATION-PLAYBOOK.md
- docs/CALIBRATION.md
- docs/SCORING.md
- Add bash language tags to fenced code blocks
- Update tier policy: always run Converter regardless of grade (was: skip visual for below B)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Actionable comments posted: 2
♻️ Duplicate comments (1)
docs/CALIBRATION.md (1)
Lines 60-63 (⚠️ Potential issue | 🟡 Minor): Clarify the full-tier pipeline scope in the table.
The table at line 62 says "Full pipeline (Converter + Gap Analysis)" but doesn't clearly indicate that all 6 steps plus evidence pruning (Step 6.5) are included. Readers might interpret this as only running Converter and Gap Analyzer, missing the Evaluator, Critic, Arbitrator, and pruning steps.
Consider revising the "Pipeline" column to be more explicit, such as "Full 6-step pipeline + evidence pruning" or "All steps (Converter through Arbitrator + pruning)."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/CALIBRATION.md` around lines 60 - 63, The "Full pipeline (Converter + Gap Analysis)" cell is ambiguous; update the Pipeline column to explicitly state that it includes all steps and evidence pruning — e.g., replace that text with "Full 6-step pipeline + evidence pruning (Converter → Evaluator → Critic → Gap Analyzer → Arbitrator + Step 6.5 pruning)" or "All steps (Converter through Arbitrator) + evidence pruning" so readers know it includes Converter, Evaluator, Critic, Gap Analyzer, Arbitrator and Step 6.5 pruning.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/CALIBRATION.md`:
- Line 3: The phrase "6-agent pipeline" is inconsistent with the pipeline
diagram which lists Subagent roles (Converter, Gap Analyzer, Critic, Arbitrator)
and marks Analysis and Evaluation as CLI; update the wording to accurately
reflect this by either changing "6-agent pipeline" to "6-step pipeline with 4
agents" or explicitly stating that the count includes two CLI steps (Analysis
and Evaluation) in addition to the four Subagent agents, and adjust any
neighboring text to match the chosen phrasing so readers aren’t confused about
what constitutes an "agent."
- Around line 248-259: Update the documentation to state a "5-agent pipeline"
instead of "6-agent pipeline" for the /add-rule flow and adjust the agent list
to: Researcher, Designer, Implementer, Evaluator, Critic; clarify that Step 4
("A/B Visual Validation") is an orchestrated process executed by the
Orchestrator rather than a separate agent, keeping the step order but removing
it from the enumerated agent count and wording around agents in CALIBRATION.md
and the /add-rule description.
---
Duplicate comments:
In `@docs/CALIBRATION.md`:
- Around line 60-63: The "Full pipeline (Converter + Gap Analysis)" cell is
ambiguous; update the Pipeline column to explicitly state that it includes all
steps and evidence pruning — e.g., replace that text with "Full 6-step pipeline
+ evidence pruning (Converter → Evaluator → Critic → Gap Analyzer → Arbitrator +
Step 6.5 pruning)" or "All steps (Converter through Arbitrator) + evidence
pruning" so readers know it includes Converter, Evaluator, Critic, Gap Analyzer,
Arbitrator and Step 6.5 pruning.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 6970c5bd-d80a-47bb-8873-391cd614839a
📒 Files selected for processing (2)

- docs/CALIBRATION-PLAYBOOK.md
- docs/CALIBRATION.md
````diff
 New rules are added through a 6-agent pipeline (`/add-rule`). See [CALIBRATION-PLAYBOOK.md](./CALIBRATION-PLAYBOOK.md) for operational details.

-```
-/add-rule "concept" fixture.json
+```bash
+/add-rule "concept" fixtures/path

-Step 1 — Researcher: explore fixture data + accumulated gap data
+Step 1 — Researcher: explore fixture data + data/discovery-evidence.json
 Step 2 — Designer: propose rule spec (ID, category, severity, score)
 Step 3 — Implementer: write rule code + tests
-Step 4 — A/B Visual Validation: implement entire design with/without the rule's data, compare similarity
+Step 4 — A/B Visual Validation: implement design with/without the rule's data, compare similarity
 Step 5 — Evaluator: measure impact, false positives, visual improvement
 Step 6 — Critic: decide KEEP / ADJUST / DROP
 ```
````
🧩 Analysis chain

🏁 Script executed:

```bash
#!/bin/bash
# Check if there's an A/B Visual Validation agent definition
fd -t f 'visual.*validation|a-?b.*test' .claude/agents/rule-discovery/
# List all agent files in rule-discovery directory to confirm count
echo "=== Rule discovery agents ==="
fd -t f '.md$' .claude/agents/rule-discovery/ --exec basename
```

Repository: let-sunny/canicode
Length of output: 155

🏁 Script executed:

```bash
sed -n '120,129p' docs/CALIBRATION-PLAYBOOK.md
```

Repository: let-sunny/canicode
Length of output: 786
Fix inaccurate "6-agent pipeline" claim—only 5 agents exist.
Line 248 claims a "6-agent pipeline," but the pipeline contains only 5 agents (Researcher, Designer, Implementer, Evaluator, Critic). Step 4, "A/B Visual Validation," is an orchestrated process run by the Orchestrator, not a separate agent. Update the description to reflect the correct agent count.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docs/CALIBRATION.md` around lines 248 - 259, Update the documentation to
state a "5-agent pipeline" instead of "6-agent pipeline" for the /add-rule flow
and adjust the agent list to: Researcher, Designer, Implementer, Evaluator,
Critic; clarify that Step 4 ("A/B Visual Validation") is an orchestrated process
executed by the Orchestrator rather than a separate agent, keeping the step
order but removing it from the enumerated agent count and wording around agents
in CALIBRATION.md and the /add-rule description.
That's correct. There are actually 5 agent files (Researcher, Designer, Implementer, Evaluator, Critic), and Step 4 A/B Visual Validation is a process performed by the Orchestrator. Changing it to "5-agent pipeline" is accurate.
Generated by Claude Code
@let-sunny, thanks for confirming! Here's the correction needed in docs/CALIBRATION.md:
```diff
-New rules are added through a 6-agent pipeline (`/add-rule`). See [CALIBRATION-PLAYBOOK.md](./CALIBRATION-PLAYBOOK.md) for operational details.
+New rules are added through a 5-agent pipeline (`/add-rule`). See [CALIBRATION-PLAYBOOK.md](./CALIBRATION-PLAYBOOK.md) for operational details.
```

````diff
 ```bash
 /add-rule "concept" fixtures/path

 Step 1 — Researcher: explore fixture data + data/discovery-evidence.json
 Step 2 — Designer: propose rule spec (ID, category, severity, score)
 Step 3 — Implementer: write rule code + tests
-Step 4 — A/B Visual Validation: implement design with/without the rule's data, compare similarity
+Step 4 — A/B Visual Validation (Orchestrator): implement design with/without the rule's data, compare similarity
 Step 5 — Evaluator: measure impact, false positives, visual improvement
 Step 6 — Critic: decide KEEP / ADJUST / DROP
 ```
````
Also worth checking `docs/CALIBRATION-PLAYBOOK.md` for the same "6-agent" wording — the AI summary mentions the Orchestrator step is described there as well, so it may need the same correction for consistency.
---
<details>
<summary>✏️ Learnings added</summary>
Learnt from: let-sunny
Repo: let-sunny/canicode PR: 88
File: docs/CALIBRATION.md:248-259
Timestamp: 2026-03-26T01:29:06.955Z
Learning: In the let-sunny/canicode repository, the /add-rule rule discovery pipeline has 5 agents (Researcher, Designer, Implementer, Evaluator, Critic). Step 4 "A/B Visual Validation" is an orchestrated process run by the Orchestrator, not a separate agent. The pipeline should always be referred to as a "5-agent pipeline," not "6-agent."
> Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure [review instructions](https://docs.coderabbit.ai/guides/review-instructions). You can manage existing learnings in the [app](https://app.coderabbit.ai/learnings).
</details>
---
<details>
<summary>🧠 Learnings used</summary>
Learnt from: CR
Repo: let-sunny/canicode PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-25T18:32:58.247Z
Learning: The core metric for analysis is visual-compare similarity (0-100%), measuring how accurately AI can reproduce a Figma design as code
---
Learnt from: CR
Repo: let-sunny/canicode PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-25T18:32:58.247Z
Learning: Component-related rule scores should NOT be lowered based on small fixture calibration; calibration fixtures should include large, complex pages
---
Learnt from: let-sunny
Repo: let-sunny/canicode PR: 59
File: src/cli/commands/internal/fixture-management.ts:121-129
Timestamp: 2026-03-25T11:36:48.861Z
Learning: In src/cli/commands/internal/fixture-management.ts (and related internal calibration commands), using console.log + exit code 0 for missing run directories or null results from parseDebateResult is intentional by design. These are internal commands consumed by subagents that check stdout text output rather than exit codes; switching to console.error + process.exit(1) would break the calibration pipeline.
</details>
<!-- This is an auto-generated reply by CodeRabbit -->
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Actionable comments posted: 1
♻️ Duplicate comments (2)
docs/CALIBRATION.md (2)
Lines 56-66 (⚠️ Potential issue | 🟡 Minor): Table still doesn't mention Step 6.5 (Prune Evidence).
Line 62 says "Full pipeline (Converter + Gap Analysis)" but doesn't explicitly reference Step 6.5 (Prune Evidence) defined at lines 50-53. Since pruning is part of the full pipeline flow, the table should clarify that the full tier includes evidence pruning.
Suggested wording clarification
```diff
-| A+ and above | Full pipeline (Converter + Gap Analysis) | High-quality designs benefit from gap analysis |
+| A+ and above | Full pipeline (Converter + Gap Analysis + Evidence Pruning) | High-quality designs benefit from gap analysis and evidence management |
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/CALIBRATION.md` around lines 56 - 66, The Tiered Approach table omits Step 6.5 (Prune Evidence); update the table text so the "Full pipeline" entry explicitly mentions Prune Evidence (Step 6.5) as part of the flow—e.g., change "Full pipeline (Converter + Gap Analysis)" to "Full pipeline (Converter + Gap Analysis + Step 6.5 Prune Evidence)" and ensure the "Always run the Converter" note remains; reference the "Tiered Approach" heading and "Step 6.5 (Prune Evidence)" when making the edit.
Lines 246-262 (⚠️ Potential issue | 🟡 Minor): Fix agent count: should be 5 agents, not 6.
Line 248 claims a "6-agent pipeline," but the previous review identified that Step 4 (A/B Visual Validation) is an orchestrated process run by the Orchestrator, not a separate agent. The actual agents are: Researcher, Designer, Implementer, Evaluator, and Critic (5 total).
Suggested fix
```diff
-New rules are added through a 6-agent pipeline (`/add-rule`). See [CALIBRATION-PLAYBOOK.md](./CALIBRATION-PLAYBOOK.md) for operational details.
+New rules are added through a 5-agent pipeline (`/add-rule`). See [CALIBRATION-PLAYBOOK.md](./CALIBRATION-PLAYBOOK.md) for operational details.
```

Additionally, consider clarifying that Step 4 is orchestrated rather than a dedicated agent.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/CALIBRATION.md` around lines 246 - 262, Update the "Rule Discovery Pipeline" section to correct the agent count from "6-agent pipeline" to "5-agent pipeline" and add a short clarifying note that "Step 4 — A/B Visual Validation" is an orchestrated process run by the Orchestrator rather than a separate agent; keep the listed steps (Researcher, Designer, Implementer, A/B Visual Validation, Evaluator, Critic) but change the header text to "5-agent pipeline" and append one sentence indicating Step 4 is orchestration not an independent agent so readers understand the count.
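Step 6.5 (Prune Evidence), which the comments above ask the table to mention, amounts to filtering applied or covered records back out of the evidence stores. A minimal sketch follows; only the store file names come from the docs, while the `applied` flag and the `prune_evidence` helper are assumptions for illustration.

```python
import json
from pathlib import Path

def prune_evidence(store: Path) -> int:
    """Drop records marked as applied/covered; return how many were removed."""
    if not store.exists():
        return 0
    records = json.loads(store.read_text())
    kept = [r for r in records if not r.get("applied", False)]
    store.write_text(json.dumps(kept, indent=2))
    return len(records) - len(kept)

# Per the sequence diagram, the Arbitrator prunes both stores
# after committing applied changes.
for name in ("data/discovery-evidence.json", "data/calibration-evidence.json"):
    prune_evidence(Path(name))
```

Keeping pruning as a distinct step is what prevents the append-only stores from growing without bound across calibration runs.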
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/CALIBRATION.md`:
- Line 242: The line stating "Gap data is also saved per run in
`logs/calibration/*/gaps.json`" is inconsistent with the updated summary; check
whether the code still writes per-run gaps.json files and if not remove that
sentence from CALIBRATION.md and replace references with the centralized
`data/discovery-evidence.json`; if the files are still produced, update the
sentence to explicitly state both storage locations (per-run
`logs/calibration/*/gaps.json` and centralized `data/discovery-evidence.json`)
and adjust the AI summary/overview text to match; search for the literal string
"gaps.json" and update the docs and summary accordingly.
---
Duplicate comments:
In `@docs/CALIBRATION.md`:
- Around line 56-66: The Tiered Approach table omits Step 6.5 (Prune Evidence);
update the table text so the "Full pipeline" entry explicitly mentions Prune
Evidence (Step 6.5) as part of the flow—e.g., change "Full pipeline (Converter +
Gap Analysis)" to "Full pipeline (Converter + Gap Analysis + Step 6.5 Prune
Evidence)" and ensure the "Always run the Converter" note remains; reference the
"Tiered Approach" heading and "Step 6.5 (Prune Evidence)" when making the edit.
- Around line 246-262: Update the "Rule Discovery Pipeline" section to correct
the agent count from "6-agent pipeline" to "5-agent pipeline" and add a short
clarifying note that "Step 4 — A/B Visual Validation" is an orchestrated process
run by the Orchestrator rather than a separate agent; keep the listed steps
(Researcher, Designer, Implementer, A/B Visual Validation, Evaluator, Critic)
but change the header text to "5-agent pipeline" and append one sentence
indicating Step 4 is orchestration not an independent agent so readers
understand the count.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 02b87ecf-67de-40f4-8daf-567981571bad
📒 Files selected for processing (1)
docs/CALIBRATION.md
Summary
- `CALIBRATION.md`: pipeline from 4-agent to 6-agent (Analysis, Converter, Gap Analyzer, Evaluation, Critic, Arbitrator), add tiered approach, cross-run evidence section, and evidence pruning step
- `CALIBRATION-PLAYBOOK.md`: accurate agent names in step table, evidence accumulation details, `/calibrate-night` fixture input, and discovery evidence pruning
- `SCORING.md`: calibration steps to match current pipeline

Closes #87
Test plan
- `/calibrate-loop` behavior
- `/add-rule` behavior

🤖 Generated with Claude Code
Summary by CodeRabbit