docs: update calibration and rule discovery documentation#88

Merged
let-sunny merged 3 commits into main from docs/calibration-discovery-docs-87
Mar 26, 2026

Conversation

@let-sunny (Owner) commented Mar 26, 2026

Summary

  • Update CALIBRATION.md pipeline from 4-agent to 6-agent (Analysis, Converter, Gap Analyzer, Evaluation, Critic, Arbitrator), add tiered approach, cross-run evidence section, and evidence pruning step
  • Update CALIBRATION-PLAYBOOK.md with accurate agent names in step table, evidence accumulation details, /calibrate-night fixture input, and discovery evidence pruning
  • Update SCORING.md calibration steps to match current pipeline
  • Remove duplicated pipeline structure section and outdated Next Steps from CALIBRATION.md

Closes #87

Test plan

  • Review rendered markdown for formatting issues
  • Verify pipeline steps match actual /calibrate-loop behavior
  • Verify rule discovery steps match actual /add-rule behavior

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Documentation
    • Restructured calibration pipeline from a 4-agent to a 6-step flow with a dedicated Gap Analyzer
    • Persisted discovery and calibration evidence to data/ files and added explicit evidence pruning after changes
    • Clarified Arbitrator responsibilities to prune discovery evidence after applying approved changes
    • Introduced pixel-level diff analysis and pixel-similarity A/B comparison; scoped converter to one HTML page
    • Simplified nightly calibration command with auto-discovery of active fixtures; clarified rule-discovery workflow and penalty semantics
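
The pixel-similarity A/B comparison mentioned in the summary is not defined in this PR. As a rough illustration of the idea only (a hypothetical helper, not the project's actual visual-compare implementation), a 0-100% pixel match score can be computed like this:

```python
def pixel_similarity(img_a: list[list[int]], img_b: list[list[int]], tolerance: int = 0) -> float:
    """Percent of same-position pixels matching within `tolerance` (0-100 scale).

    Hypothetical sketch: the real visual-compare tool may weight regions,
    use perceptual color distance, or downscale before comparing.
    """
    if len(img_a) != len(img_b) or len(img_a[0]) != len(img_b[0]):
        raise ValueError("images must share dimensions")
    total = matches = 0
    for row_a, row_b in zip(img_a, img_b):
        for pa, pb in zip(row_a, row_b):
            total += 1
            if abs(pa - pb) <= tolerance:
                matches += 1
    return 100.0 * matches / total

# 3 of 4 grayscale pixels match exactly
print(pixel_similarity([[0, 0], [0, 0]], [[0, 0], [0, 255]]))  # 75.0
```

A tolerance above zero lets near-identical renders (e.g. anti-aliasing differences) still count as matches.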

- Update CALIBRATION.md: 4-agent → 6-agent pipeline, add Gap Analyzer
  (Step 3), Prune Evidence (Step 6.5), tiered approach, cross-run
  evidence section, fix rule-config.ts path
- Update CALIBRATION-PLAYBOOK.md: nightly command with fixture input,
  evidence accumulation in step table, discovery evidence pruning,
  remove server scripts reference
- Update SCORING.md: align calibration steps with current pipeline
- Remove duplicated pipeline structure section and outdated Next Steps

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
coderabbitai bot commented Mar 26, 2026

📝 Walkthrough

Walkthrough

Expanded calibration pipeline to six steps (added Gap Analyzer and Prune Evidence), moved evidence persistence to data/discovery-evidence.json and data/calibration-evidence.json, refined nightly calibration invocation and scoring semantics, and updated rule-config path references in docs.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Calibration pipeline docs — `docs/CALIBRATION.md`, `docs/CALIBRATION-PLAYBOOK.md` | Replaced 4-agent framing with a 6-step pipeline (Converter, Gap Analyzer, Orchestrator, Evaluator, Arbitrator, Prune Evidence); describes the converter producing a single HTML page and the use of visual-compare; added gap analysis and explicit evidence pruning; shifted outputs to `data/discovery-evidence.json` and `data/calibration-evidence.json`; refined scoring semantics and the nightly calibration invocation. |
| Scoring doc — `docs/SCORING.md` | Aligned the `/calibrate-loop` description with the new workflow: one-page conversion, pixel-diff analysis, and updated comparison wording against Figma screenshots. |
| Path/reference update — `docs/CALIBRATION.md` (refs) | Updated rule configuration path references from `src/rules/rule-config.ts` to `src/core/rules/rule-config.ts`. |

Sequence Diagram(s)

sequenceDiagram
    participant Researcher
    participant Converter
    participant Orchestrator
    participant GapAnalyzer
    participant Evaluator
    participant Arbitrator
    participant DataStore as "data/*.json"

    Researcher->>Converter: produce one-page HTML for visual-compare
    Converter->>Orchestrator: provide HTML + fixture
    Orchestrator->>GapAnalyzer: supply diff images for pixel-gap categorization
    GapAnalyzer->>DataStore: append uncovered gaps -> `data/discovery-evidence.json`
    Orchestrator->>Evaluator: run A/B + pixel-similarity checks
    Evaluator->>DataStore: append score/impact -> `data/calibration-evidence.json`
    Arbitrator->>DataStore: prune applied/covered evidence from discovery/calibration files
    Arbitrator->>Orchestrator: commit applied changes / append `debate.json`

Estimated Code Review Effort

🎯 2 (Simple) | ⏱️ ~12 minutes


Poem

🐰 I hopped through diffs beneath the moonlit code,
Counting pixel gaps along the narrow road,
I nudged the evidence lines and pruned the old away,
Appended new discovery for a brighter day,
A rabbit’s happy tick—docs trimmed, pipelines stowed.

🚥 Pre-merge checks | ✅ 3 passed | ❌ 1 inconclusive

❌ Failed checks (1 inconclusive)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Linked Issues check | ❓ Inconclusive | The PR addresses most objectives from issue #87, including updating CALIBRATION.md for the 6-agent pipeline, updating CALIBRATION-PLAYBOOK.md for evidence accumulation, and verifying SCORING.md alignment. However, explicit Rule Discovery process documentation for the /add-rule pipeline appears incomplete or unclear. | Clarify whether Rule Discovery process documentation was fully added to CALIBRATION-PLAYBOOK.md or a separate document, as required by issue #87 scope item 3. |

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the main change: updating calibration and rule discovery documentation to reflect the new pipeline structure. |
| Out of Scope Changes check | ✅ Passed | All changes are directly aligned with updating calibration and discovery documentation as specified in issue #87. No out-of-scope changes detected. |



@let-sunny let-sunny marked this pull request as ready for review March 26, 2026 00:32
@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/CALIBRATION-PLAYBOOK.md`:
- Around line 61-63: The fenced code block containing the command
"/calibrate-night fixtures/" needs a language tag; update that Markdown block to
use triple backticks with "bash" (e.g., ```bash) so the block reads "```bash"
followed by "/calibrate-night fixtures/" and closing "```" to satisfy
markdownlint MD040.

In `@docs/CALIBRATION.md`:
- Around line 62-64: Update the table row that currently reads "A+ and above |
Full 6-step pipeline" to explicitly mention evidence pruning; replace it with
wording like "A+ and above | Full pipeline (6 agents + evidence pruning)" so
Step 6.5 (Prune Evidence) is clearly included—adjust the cell containing "Full
6-step pipeline" accordingly in the docs/CALIBRATION.md table.
- Around line 249-251: The fenced code block containing the command /add-rule
"concept" fixtures/path is missing a language tag; update that block to include
a language identifier (e.g., change ``` to ```bash) so the block becomes ```bash
followed by /add-rule "concept" fixtures/path and the closing ``` to satisfy
markdownlint MD040.
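
Both inline comments above flag markdownlint rule MD040 (fenced code blocks should declare a language). As a rough illustration of what the rule checks (a simplified sketch, not markdownlint's actual implementation), the detection logic amounts to:

```python
def find_untagged_fences(markdown: str) -> list[int]:
    """Return 1-based line numbers of opening ``` fences with an empty info string.

    Simplified MD040-style check: ignores indented fences and ~~~ fences.
    """
    untagged: list[int] = []
    in_fence = False
    for lineno, line in enumerate(markdown.splitlines(), start=1):
        stripped = line.strip()
        if stripped.startswith("```"):
            if not in_fence:
                if stripped == "```":  # opening fence with no language tag
                    untagged.append(lineno)
                in_fence = True
            else:
                in_fence = False  # closing fence
    return untagged

doc = '```\n/calibrate-night fixtures/\n```\n\n```bash\n/add-rule "concept" fixtures/path\n```\n'
print(find_untagged_fences(doc))  # [1] — only the bare fence is flagged
```

The fix the bot suggests is exactly what this check would stop flagging: change the bare ` ``` ` opener to ` ```bash `.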

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: cf7dd34b-a28a-4dd8-ac57-570500805475

📥 Commits

Reviewing files that changed from the base of the PR and between cc39c7b and 85d55cf.

📒 Files selected for processing (3)
  • docs/CALIBRATION-PLAYBOOK.md
  • docs/CALIBRATION.md
  • docs/SCORING.md

- Add bash language tags to fenced code blocks
- Update tier policy: always run Converter regardless of grade
  (was: skip visual for below B)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

♻️ Duplicate comments (1)
docs/CALIBRATION.md (1)

60-63: ⚠️ Potential issue | 🟡 Minor

Clarify the full-tier pipeline scope in the table.

The table at line 62 says "Full pipeline (Converter + Gap Analysis)" but doesn't clearly indicate that all 6 steps plus evidence pruning (Step 6.5) are included. Readers might interpret this as only running Converter and Gap Analyzer, missing the Evaluator, Critic, Arbitrator, and pruning steps.

Consider revising the "Pipeline" column to be more explicit, such as "Full 6-step pipeline + evidence pruning" or "All steps (Converter through Arbitrator + pruning)."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/CALIBRATION.md` around lines 60 - 63, The "Full pipeline (Converter +
Gap Analysis)" cell is ambiguous; update the Pipeline column to explicitly state
that it includes all steps and evidence pruning — e.g., replace that text with
"Full 6-step pipeline + evidence pruning (Converter → Evaluator → Critic → Gap
Analyzer → Arbitrator + Step 6.5 pruning)" or "All steps (Converter through
Arbitrator) + evidence pruning" so readers know it includes Converter,
Evaluator, Critic, Gap Analyzer, Arbitrator and Step 6.5 pruning.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/CALIBRATION.md`:
- Line 3: The phrase "6-agent pipeline" is inconsistent with the pipeline
diagram which lists Subagent roles (Converter, Gap Analyzer, Critic, Arbitrator)
and marks Analysis and Evaluation as CLI; update the wording to accurately
reflect this by either changing "6-agent pipeline" to "6-step pipeline with 4
agents" or explicitly stating that the count includes two CLI steps (Analysis
and Evaluation) in addition to the four Subagent agents, and adjust any
neighboring text to match the chosen phrasing so readers aren’t confused about
what constitutes an "agent."
- Around line 248-259: Update the documentation to state a "5-agent pipeline"
instead of "6-agent pipeline" for the /add-rule flow and adjust the agent list
to: Researcher, Designer, Implementer, Evaluator, Critic; clarify that Step 4
("A/B Visual Validation") is an orchestrated process executed by the
Orchestrator rather than a separate agent, keeping the step order but removing
it from the enumerated agent count and wording around agents in CALIBRATION.md
and the /add-rule description.

---

Duplicate comments:
In `@docs/CALIBRATION.md`:
- Around line 60-63: The "Full pipeline (Converter + Gap Analysis)" cell is
ambiguous; update the Pipeline column to explicitly state that it includes all
steps and evidence pruning — e.g., replace that text with "Full 6-step pipeline
+ evidence pruning (Converter → Evaluator → Critic → Gap Analyzer → Arbitrator +
Step 6.5 pruning)" or "All steps (Converter through Arbitrator) + evidence
pruning" so readers know it includes Converter, Evaluator, Critic, Gap Analyzer,
Arbitrator and Step 6.5 pruning.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 6970c5bd-d80a-47bb-8873-391cd614839a

📥 Commits

Reviewing files that changed from the base of the PR and between 85d55cf and 20abd69.

📒 Files selected for processing (2)
  • docs/CALIBRATION-PLAYBOOK.md
  • docs/CALIBRATION.md

Comment on lines +248 to 259

 New rules are added through a 6-agent pipeline (`/add-rule`). See [CALIBRATION-PLAYBOOK.md](./CALIBRATION-PLAYBOOK.md) for operational details.

-```
-/add-rule "concept" fixture.json
+```bash
+/add-rule "concept" fixtures/path

-Step 1 — Researcher: explore fixture data + accumulated gap data
+Step 1 — Researcher: explore fixture data + data/discovery-evidence.json
 Step 2 — Designer: propose rule spec (ID, category, severity, score)
 Step 3 — Implementer: write rule code + tests
-Step 4 — A/B Visual Validation: implement entire design with/without the rule's data, compare similarity
+Step 4 — A/B Visual Validation: implement design with/without the rule's data, compare similarity
 Step 5 — Evaluator: measure impact, false positives, visual improvement
 Step 6 — Critic: decide KEEP / ADJUST / DROP
 ```

@coderabbitai coderabbitai bot Mar 26, 2026


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check if there's an A/B Visual Validation agent definition
fd -t f 'visual.*validation|a-?b.*test' .claude/agents/rule-discovery/

# List all agent files in rule-discovery directory to confirm count
echo "=== Rule discovery agents ==="
fd -t f '.md$' .claude/agents/rule-discovery/ --exec basename

Repository: let-sunny/canicode

Length of output: 155


🏁 Script executed:

sed -n '120,129p' docs/CALIBRATION-PLAYBOOK.md

Repository: let-sunny/canicode

Length of output: 786


Fix inaccurate "6-agent pipeline" claim—only 5 agents exist.

Line 248 claims a "6-agent pipeline," but the pipeline contains only 5 agents (Researcher, Designer, Implementer, Evaluator, Critic). Step 4, "A/B Visual Validation," is an orchestrated process run by the Orchestrator, not a separate agent. Update the description to reflect the correct agent count.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/CALIBRATION.md` around lines 248 - 259, Update the documentation to
state a "5-agent pipeline" instead of "6-agent pipeline" for the /add-rule flow
and adjust the agent list to: Researcher, Designer, Implementer, Evaluator,
Critic; clarify that Step 4 ("A/B Visual Validation") is an orchestrated process
executed by the Orchestrator rather than a separate agent, keeping the step
order but removing it from the enumerated agent count and wording around agents
in CALIBRATION.md and the /add-rule description.

@let-sunny (Owner, Author) commented:
That's correct. There are actually 5 agent files (Researcher, Designer, Implementer, Evaluator, Critic), and Step 4 A/B Visual Validation is a process carried out by the Orchestrator. Changing the wording to "5-agent pipeline" is the accurate fix.


Generated by Claude Code

@coderabbitai (Contributor) replied:

@let-sunny, thanks for confirming! Here's the correction needed in docs/CALIBRATION.md:

-New rules are added through a 6-agent pipeline (`/add-rule`). See [CALIBRATION-PLAYBOOK.md](./CALIBRATION-PLAYBOOK.md) for operational details.
+New rules are added through a 5-agent pipeline (`/add-rule`). See [CALIBRATION-PLAYBOOK.md](./CALIBRATION-PLAYBOOK.md) for operational details.

 ```bash
 /add-rule "concept" fixtures/path
 
 Step 1 — Researcher: explore fixture data + data/discovery-evidence.json
 Step 2 — Designer: propose rule spec (ID, category, severity, score)
 Step 3 — Implementer: write rule code + tests
-Step 4 — A/B Visual Validation: implement design with/without the rule's data, compare similarity
+Step 4 — A/B Visual Validation (Orchestrator): implement design with/without the rule's data, compare similarity
 Step 5 — Evaluator: measure impact, false positives, visual improvement
 Step 6 — Critic: decide KEEP / ADJUST / DROP
 ```

Also worth checking `docs/CALIBRATION-PLAYBOOK.md` for the same "6-agent" wording — the AI summary mentions the Orchestrator step is described there as well, so it may need the same correction for consistency.

---

<details>
<summary>✏️ Learnings added</summary>

Learnt from: let-sunny
Repo: let-sunny/canicode PR: 88
File: docs/CALIBRATION.md:248-259
Timestamp: 2026-03-26T01:29:06.955Z
Learning: In the let-sunny/canicode repository, the /add-rule rule discovery pipeline has 5 agents (Researcher, Designer, Implementer, Evaluator, Critic). Step 4 "A/B Visual Validation" is an orchestrated process run by the Orchestrator, not a separate agent. The pipeline should always be referred to as a "5-agent pipeline," not "6-agent."


> Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure [review instructions](https://docs.coderabbit.ai/guides/review-instructions). You can manage existing learnings in the [app](https://app.coderabbit.ai/learnings).

</details>



---

<details>
<summary>🧠 Learnings used</summary>

Learnt from: CR
Repo: let-sunny/canicode PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-25T18:32:58.247Z
Learning: The core metric for analysis is visual-compare similarity (0-100%), measuring how accurately AI can reproduce a Figma design as code


---

Learnt from: CR
Repo: let-sunny/canicode PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-25T18:32:58.247Z
Learning: Component-related rule scores should NOT be lowered based on small fixture calibration; calibration fixtures should include large, complex pages


---

Learnt from: let-sunny
Repo: let-sunny/canicode PR: 59
File: src/cli/commands/internal/fixture-management.ts:121-129
Timestamp: 2026-03-25T11:36:48.861Z
Learning: In src/cli/commands/internal/fixture-management.ts (and related internal calibration commands), using console.log + exit code 0 for missing run directories or null results from parseDebateResult is intentional by design. These are internal commands consumed by subagents that check stdout text output rather than exit codes; switching to console.error + process.exit(1) would break the calibration pipeline.


</details>



<!-- This is an auto-generated reply by CodeRabbit -->

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (2)
docs/CALIBRATION.md (2)

56-66: ⚠️ Potential issue | 🟡 Minor

Table still doesn't mention Step 6.5 (Prune Evidence).

Line 62 says "Full pipeline (Converter + Gap Analysis)" but doesn't explicitly reference Step 6.5 (Prune Evidence) defined at lines 50-53. Since pruning is part of the full pipeline flow, the table should clarify that the full tier includes evidence pruning.

Suggested wording clarification
-| A+ and above | Full pipeline (Converter + Gap Analysis) | High-quality designs benefit from gap analysis |
+| A+ and above | Full pipeline (Converter + Gap Analysis + Evidence Pruning) | High-quality designs benefit from gap analysis and evidence management |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/CALIBRATION.md` around lines 56 - 66, The Tiered Approach table omits
Step 6.5 (Prune Evidence); update the table text so the "Full pipeline" entry
explicitly mentions Prune Evidence (Step 6.5) as part of the flow—e.g., change
"Full pipeline (Converter + Gap Analysis)" to "Full pipeline (Converter + Gap
Analysis + Step 6.5 Prune Evidence)" and ensure the "Always run the Converter"
note remains; reference the "Tiered Approach" heading and "Step 6.5 (Prune
Evidence)" when making the edit.

246-262: ⚠️ Potential issue | 🟡 Minor

Fix agent count—should be 5 agents, not 6.

Line 248 claims a "6-agent pipeline," but the previous review identified that Step 4 (A/B Visual Validation) is an orchestrated process run by the Orchestrator, not a separate agent. The actual agents are: Researcher, Designer, Implementer, Evaluator, and Critic (5 total).

Suggested fix
-New rules are added through a 6-agent pipeline (`/add-rule`). See [CALIBRATION-PLAYBOOK.md](./CALIBRATION-PLAYBOOK.md) for operational details.
+New rules are added through a 5-agent pipeline (`/add-rule`). See [CALIBRATION-PLAYBOOK.md](./CALIBRATION-PLAYBOOK.md) for operational details.

Additionally, consider clarifying that Step 4 is orchestrated rather than a dedicated agent.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/CALIBRATION.md` around lines 246 - 262, Update the "Rule Discovery
Pipeline" section to correct the agent count from "6-agent pipeline" to "5-agent
pipeline" and add a short clarifying note that "Step 4 — A/B Visual Validation"
is an orchestrated process run by the Orchestrator rather than a separate agent;
keep the listed steps (Researcher, Designer, Implementer, A/B Visual Validation,
Evaluator, Critic) but change the header text to "5-agent pipeline" and append
one sentence indicating Step 4 is orchestration not an independent agent so
readers understand the count.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/CALIBRATION.md`:
- Line 242: The line stating "Gap data is also saved per run in
`logs/calibration/*/gaps.json`" is inconsistent with the updated summary; check
whether the code still writes per-run gaps.json files and if not remove that
sentence from CALIBRATION.md and replace references with the centralized
`data/discovery-evidence.json`; if the files are still produced, update the
sentence to explicitly state both storage locations (per-run
`logs/calibration/*/gaps.json` and centralized `data/discovery-evidence.json`)
and adjust the AI summary/overview text to match; search for the literal string
"gaps.json" and update the docs and summary accordingly.

---

Duplicate comments:
In `@docs/CALIBRATION.md`:
- Around line 56-66: The Tiered Approach table omits Step 6.5 (Prune Evidence);
update the table text so the "Full pipeline" entry explicitly mentions Prune
Evidence (Step 6.5) as part of the flow—e.g., change "Full pipeline (Converter +
Gap Analysis)" to "Full pipeline (Converter + Gap Analysis + Step 6.5 Prune
Evidence)" and ensure the "Always run the Converter" note remains; reference the
"Tiered Approach" heading and "Step 6.5 (Prune Evidence)" when making the edit.
- Around line 246-262: Update the "Rule Discovery Pipeline" section to correct
the agent count from "6-agent pipeline" to "5-agent pipeline" and add a short
clarifying note that "Step 4 — A/B Visual Validation" is an orchestrated process
run by the Orchestrator rather than a separate agent; keep the listed steps
(Researcher, Designer, Implementer, A/B Visual Validation, Evaluator, Critic)
but change the header text to "5-agent pipeline" and append one sentence
indicating Step 4 is orchestration not an independent agent so readers
understand the count.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 02b87ecf-67de-40f4-8daf-567981571bad

📥 Commits

Reviewing files that changed from the base of the PR and between 20abd69 and 4f53ae3.

📒 Files selected for processing (1)
  • docs/CALIBRATION.md



Development

Successfully merging this pull request may close these issues.

docs: update calibration/discovery docs for new category structure + add rule discovery playbook

1 participant