refactor: separate Converter role + rename run-phase1 (#218 Phase 2)#222
Phase 2 of pipeline consolidation.

Converter role separation:
- Converter now only writes HTML (baseline + 6 strips) and `converter-assessment.json` (ruleImpact + uncoveredStruggles)
- Orchestrator handles all measurements: html-postprocess, visual-compare, code-metrics, responsive comparison, strip delta calculations
- Orchestrator assembles the final `conversion.json` from measurements + assessment

Ablation rename:
- `run-phase1.ts` → `run-strip.ts` (clearer name for strip experiments)
- Updated CLAUDE.md references

https://claude.ai/code/session_01N72z2Wbib4cLhYc3FRdSQi
📝 Walkthrough

The converter agent is re-scoped to produce only the baseline HTML, the six stripped variants, and `converter-assessment.json`; the orchestrator now runs all measurements and assembles `conversion.json`.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    actor Converter as Converter Agent
    participant FS as File System
    participant Orch as Orchestrator
    participant CLI as Measurements CLI
    participant Tools as Measurement Tools
    Converter->>FS: read $RUN_DIR/design-tree.txt
    Converter->>FS: write $RUN_DIR/output.html
    Converter->>FS: write $RUN_DIR/stripped/{6}.html
    Converter->>FS: write $RUN_DIR/converter-assessment.json
    Orch->>FS: verify presence of output.html, converter-assessment.json, stripped/*.html (warn if missing)
    Orch->>CLI: invoke Measurements (Step 2.5) with $RUN_DIR
    CLI->>Tools: run html-postprocess, visual-compare, code-metrics on baseline + strips
    Tools-->>CLI: return similarities, html-byte metrics, css/variable counts
    CLI->>Orch: return measurements and derived deltas/difficulty
    Orch->>FS: merge converter-assessment.json + measurements -> write $RUN_DIR/conversion.json
```
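The final merge step in the diagram can be sketched in shell. This is a sketch only: `measurements.json` is a hypothetical intermediate filename for the orchestrator's collected measurements, and `jq` is assumed to be available.

```shell
# Sketch: orchestrator's final assembly step, merging the Converter's assessment
# with the orchestrator-run measurements into conversion.json.
# "measurements.json" is a hypothetical name for the collected measurement output.
assemble_conversion() {
  run_dir="$1"
  # jq -s slurps both files into an array; `.[0] * .[1]` deep-merges the objects.
  jq -s '.[0] * .[1]' \
    "$run_dir/converter-assessment.json" \
    "$run_dir/measurements.json" \
    > "$run_dir/conversion.json"
}
```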
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs
🚥 Pre-merge checks: ✅ 2 passed
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.claude/agents/calibration/converter.md:
- Around line 63-67: The example JSON uses an invalid actualImpact value
("high") for ruleImpactAssessment entries; update the example so every entry in
ruleImpactAssessment (derived from nodeIssueSummaries[].flaggedRuleIds) uses one
of the allowed strings: easy, moderate, hard, or failed; ensure all occurrences
(including the examples around lines 63–67 and 82–85) replace "high" (or any
other nonconforming value) with the correct enum and confirm the example shows
how actualImpact maps to impact reasoning for each flaggedRuleId.
In @.claude/commands/calibrate-loop.md:
- Around line 105-106: The script currently only lists $RUN_DIR/output.html and
$RUN_DIR/converter-assessment.json, but step 2.5 requires six stripped HTML
files (stripped/<type>.html); update the verification to check for existence of
all six stripped files immediately after the converter returns and fail fast
with a clear error message naming any missing files. Modify the ls/check step
that references $RUN_DIR/output.html and $RUN_DIR/converter-assessment.json to
instead iterate over the expected filenames in $RUN_DIR/stripped (e.g.,
stripped/type1.html ... stripped/type6.html) or glob for
$RUN_DIR/stripped/*.html, verify count == 6, and if not, print which expected
stripped/<type>.html files are missing and exit nonzero.
- Around line 98-101: Update the Step 2 instructions in the "Your job" block so
they no longer tell the subagent to write files directly; instead instruct the
subagent to return the baseline HTML (output.html), the six stripped HTMLs
(stripped/<type>.html) and the converter-assessment.json content as returned
text/JSON payloads, and note that the orchestrator will persist those to
$RUN_DIR. Replace phrases like "implement" and "write" with "produce and return"
and explicitly reference the artifact names (output.html, stripped/<type>.html,
converter-assessment.json) and the orchestrator as the authority that will save
files to $RUN_DIR.
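The fail-fast verification requested above can be sketched as follows. This is a sketch, not the repo's actual script; the six strip filenames follow `DESIGN_TREE_INFO_TYPES` as quoted later in this thread.

```shell
# Sketch: verify all Converter artifacts exist before Step 2.5; fail fast otherwise.
check_converter_artifacts() {
  run_dir="$1"
  missing=""
  for f in output.html converter-assessment.json \
           stripped/layout-direction-spacing.html stripped/size-constraints.html \
           stripped/component-references.html stripped/node-names-hierarchy.html \
           stripped/variable-references.html stripped/style-references.html; do
    [ -f "$run_dir/$f" ] || missing="$missing $f"
  done
  if [ -n "$missing" ]; then
    # Name every missing file and signal failure so the orchestrator can abort.
    echo "ERROR: missing converter artifacts:$missing" >&2
    return 1
  fi
}
```

The orchestrator would call this right after the Converter returns and exit nonzero on failure instead of warning and continuing.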
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 89443d49-8bd8-4a92-ae08-fefa30aa747f
📒 Files selected for processing (4)
- .claude/agents/calibration/converter.md
- .claude/commands/calibrate-loop.md
- CLAUDE.md
- src/experiments/ablation/run-strip.ts
```bash
ls $RUN_DIR/output.html $RUN_DIR/converter-assessment.json
```
Verify all 6 stripped HTML outputs before running measurements.
Current verification checks only baseline + assessment file. Step 2.5 depends on six stripped/<type>.html files; validate them immediately after converter return to fail fast with clear diagnostics.
- Fix example JSON "high" → "hard" (valid: easy/moderate/hard/failed)
- Add verification for 6 stripped HTML files after Converter returns

https://claude.ai/code/session_01N72z2Wbib4cLhYc3FRdSQi
Actionable comments posted: 2
♻️ Duplicate comments (1)
.claude/commands/calibrate-loop.md (1)
103-115: ⚠️ Potential issue | 🟠 Major: Fail fast when required Converter artifacts are missing.

Continuing after a missing `output.html`, `converter-assessment.json`, or `stripped/*.html` will cause downstream Step 2.5 failures with worse diagnostics. Abort here with a non-zero exit and an explicit missing-file list.

Suggested fix

```diff
-If any file is missing, log a warning naming the missing files but continue.
+If any required file is missing, log the missing paths and stop immediately (non-zero exit).
+Do not proceed to Step 2.5 with partial Converter outputs.
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.claude/commands/calibrate-loop.md around lines 103 - 115, Change the post-Converter validation to fail fast: after the Converter returns, check for the presence of $RUN_DIR/output.html, $RUN_DIR/converter-assessment.json and all required $RUN_DIR/stripped/*.html files (layout-direction-spacing.html, size-constraints.html, component-references.html, node-names-hierarchy.html, variable-references.html, style-references.html); if any are missing, log an explicit error listing the missing filenames and exit with a non-zero status instead of merely warning and continuing so downstream steps do not run with incomplete artifacts.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.claude/commands/calibrate-loop.md:
- Around line 206-210: The token-difficulty bands in the calibration doc are
inconsistent with the implementation; update the described thresholds for
tokenDeltaToDifficulty to use <=10 (easy), <=25 (moderate), <=50 (hard), and >50
(failed) so they match src/core/design-tree/delta.ts and avoid wrong labels in
conversion.json; ensure the mapping rules under `deltaDifficulty` (for
`component-references`, `node-names-hierarchy`, `variable-references`,
`style-references`) explicitly state to use tokenDeltaToDifficulty when both
token counts are present and otherwise fall back to stripDeltaToDifficulty, and
keep the other mappings (`layout-direction-spacing`, `size-constraints`)
unchanged referencing stripDeltaToDifficulty/responsiveDelta logic as in
getStripDifficultyForRule.
- Around line 123-241: Step 3 still reads "similarity from Converter output" —
update it to read the measured similarity produced in Step 2.5 by loading
RUN_DIR/conversion.json and using its similarity/responsiveSimilarity fields;
change any code or docs that reference the Converter's similarity source (look
for references to "similarity from Converter output", variables named similarity
or converterSimilarity, and the Step 3 handler) to instead parse conversion.json
and fall back to null or the converter's value only if conversion.json is
missing. Ensure the assembled conversion.json keys (similarity,
responsiveSimilarity) are used by the orchestrator/evaluator logic (e.g., where
SIMILARITY_DIFFICULTY_THRESHOLDS is applied) so difficulty is computed from
conversion.json values.
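A minimal sketch of the Step 3 read described above, assuming a flat top-level `similarity` number in `conversion.json` (a naive sed extraction for illustration, not the repo's implementation):

```shell
# Sketch: read the measured similarity assembled in Step 2.5, falling back to
# "null" when conversion.json is missing. Assumes a flat top-level "similarity"
# number in the assembled file.
read_similarity() {
  conv="$1/conversion.json"
  if [ -f "$conv" ]; then
    sed -n 's/.*"similarity"[[:space:]]*:[[:space:]]*\([0-9.]*\).*/\1/p' "$conv" | head -n1
  else
    echo "null"
  fi
}
```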
---
Duplicate comments:
In @.claude/commands/calibrate-loop.md:
- Around line 103-115: Change the post-Converter validation to fail fast: after
the Converter returns, check for the presence of $RUN_DIR/output.html,
$RUN_DIR/converter-assessment.json and all required $RUN_DIR/stripped/*.html
files (layout-direction-spacing.html, size-constraints.html,
component-references.html, node-names-hierarchy.html, variable-references.html,
style-references.html); if any are missing, log an explicit error listing the
missing filenames and exit with a non-zero status instead of merely warning and
continuing so downstream steps do not run with incomplete artifacts.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: b58a9999-4445-4141-940e-c2cea19b90e4
📒 Files selected for processing (2)
- .claude/agents/calibration/converter.md
- .claude/commands/calibrate-loop.md
> - `deltaDifficulty`: use the metric the evaluator uses for that strip family (`src/agents/evaluation-agent.ts` — `getStripDifficultyForRule`):
>   - `layout-direction-spacing` → map `delta` with `stripDeltaToDifficulty` (≤5 easy, 6–15 moderate, 16–30 hard, >30 failed)
>   - `size-constraints` → if `responsiveDelta` is finite, map `responsiveDelta` with `stripDeltaToDifficulty`; else map `delta`
>   - `component-references`, `node-names-hierarchy`, `variable-references`, `style-references` → if both token counts present, map with `tokenDeltaToDifficulty` (≤5% easy, 6–20% moderate, 21–40% hard, >40% failed); else map `delta` with `stripDeltaToDifficulty`
Token-difficulty thresholds are inconsistent with implementation.
The documented tokenDeltaToDifficulty bands here do not match src/core/design-tree/delta.ts. Use <=10 easy, <=25 moderate, <=50 hard, >50 failed to avoid wrong difficulty labels in assembled conversion.json.
Suggested fix

```diff
- - `component-references`, `node-names-hierarchy`, `variable-references`, `style-references` → if both token counts present, map with `tokenDeltaToDifficulty` (≤5% easy, 6–20% moderate, 21–40% hard, >40% failed); else map `delta` with `stripDeltaToDifficulty`
+ - `component-references`, `node-names-hierarchy`, `variable-references`, `style-references` → if both token counts present, map with `tokenDeltaToDifficulty` (≤10% easy, 11–25% moderate, 26–50% hard, >50% failed); else map `delta` with `stripDeltaToDifficulty`
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.claude/commands/calibrate-loop.md around lines 206 - 210, The
token-difficulty bands in the calibration doc are inconsistent with the
implementation; update the described thresholds for tokenDeltaToDifficulty to
use <=10 (easy), <=25 (moderate), <=50 (hard), and >50 (failed) so they match
src/core/design-tree/delta.ts and avoid wrong labels in conversion.json; ensure
the mapping rules under `deltaDifficulty` (for `component-references`,
`node-names-hierarchy`, `variable-references`, `style-references`) explicitly
state to use tokenDeltaToDifficulty when both token counts are present and
otherwise fall back to stripDeltaToDifficulty, and keep the other mappings
(`layout-direction-spacing`, `size-constraints`) unchanged referencing
stripDeltaToDifficulty/responsiveDelta logic as in getStripDifficultyForRule.
The documented thresholds (≤5% easy, 6–20% moderate, 21–40% hard, >40% failed) are correct and match the implementation in src/core/design-tree/delta.ts:

```ts
if (deltaPercent <= 5) return "easy";
if (deltaPercent <= 20) return "moderate";
if (deltaPercent <= 40) return "hard";
return "failed";
```

Your suggested ≤10/≤25/≤50/>50 does not match the source code.
Generated by Claude Code
🧩 Analysis chain
🏁 Script executed:

```bash
#!/bin/bash
# Verify the actual tokenDeltaToDifficulty implementation in delta.ts
rg -n "tokenDeltaToDifficulty|deltaPercent" --type ts -C 5
```
Length of output: 5748
@let-sunny, you're absolutely right — I apologize for the incorrect suggestion. The implementation in src/core/design-tree/delta.ts confirms your thresholds exactly:

```ts
if (deltaPercent <= 5) return "easy";
if (deltaPercent <= 20) return "moderate";
if (deltaPercent <= 40) return "hard";
return "failed";
```

The documented bands (≤5% easy, 6–20% moderate, 21–40% hard, >40% failed) in the PR are correct. My original review comment was mistaken — the ≤10/≤25/≤50/>50 values I suggested do not reflect the actual code. No changes needed here.
🧠 Learnings used
Learnt from: let-sunny
Repo: let-sunny/canicode PR: 222
File: .claude/commands/calibrate-loop.md:98-101
Timestamp: 2026-03-31T06:37:53.918Z
Learning: In this repo, do not treat the Converter subagent’s writing of HTML files as a violation of the orchestrator “file ownership” rule. Specifically, the Converter agent at `.claude/agents/calibration/converter.md` intentionally has `Write` tool access and writes `$RUN_DIR/output.html` plus 6 stripped HTML variants directly to disk to avoid bloating subagent response payloads. The orchestrator-only file-writing constraint in `.claude/commands/calibrate-loop.md` applies only to JSON outputs (e.g., `gaps.json`, `debate.json`) where the orchestrator controls the exact file paths and formats. Therefore, when reviewing changes to this calibration flow, skip the “orchestrator must write all files” flagging for Converter-produced HTML files in `$RUN_DIR`.
Learnt from: let-sunny
Repo: let-sunny/canicode PR: 221
File: src/experiments/ablation/run-phase1.ts:151-156
Timestamp: 2026-03-31T05:09:23.284Z
Learning: In `src/experiments/ablation/run-phase1.ts` (and `run-condition.ts`), JSON payloads returned from internal CLI calls (`visual-compare`, `code-metrics`, `html-postprocess`) via `execCli` are intentionally cast without Zod validation. These CLIs and the experiment scripts are co-owned and co-evolved, so their stdout contracts are treated as trusted internal boundaries, not external inputs. Do NOT flag the absence of Zod schema validation on these `JSON.parse` / `as` casts as a deficiency.
Learnt from: CR
Repo: let-sunny/canicode PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-31T03:59:28.420Z
Learning: Applies to src/core/design-tree/*.ts : Generated HTML from design-tree must remove noise (exportSettings, pluginData, internal IDs) and pre-convert Figma properties (layoutMode→flex, fills→color, etc.)
Learnt from: CR
Repo: let-sunny/canicode PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-31T03:59:28.420Z
Learning: Applies to src/core/design-tree/strip.ts : Strip ablation must implement 6 info types from `DESIGN_TREE_INFO_TYPES` in `src/core/design-tree/strip.ts`: layout-direction-spacing, size-constraints, component-references, node-names-hierarchy, variable-references, style-references
Learnt from: let-sunny
Repo: let-sunny/canicode PR: 221
File: src/experiments/ablation/run-condition.ts:96-100
Timestamp: 2026-03-31T05:09:23.357Z
Learning: In let-sunny/canicode, experiment and ablation scripts (e.g., `src/experiments/ablation/run-condition.ts`, `src/experiments/ablation/run-phase1.ts`) that parse JSON stdout from internal CLI commands (`execCli(...)` calling `dist/cli/index.js` subcommands like `visual-compare`, `code-metrics`, `html-postprocess`) do NOT require Zod validation. The CLI output format is controlled and tested within the same repo. The "validate external inputs with Zod" guideline applies to user-supplied input and external APIs only — not intra-repo CLI stdout.
Learnt from: CR
Repo: let-sunny/canicode PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-31T03:59:28.420Z
Learning: Cross-run evidence accumulates in `data/calibration-evidence.json` with overscored/underscored rules, confidence, pro/con, and decision
Learnt from: CR
Repo: let-sunny/canicode PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-31T03:59:28.420Z
Learning: Critic must receive structured evidence including proposals from evaluation, Converter's ruleImpactAssessment, gap analysis, and prior cross-run evidence
Learnt from: let-sunny
Repo: let-sunny/canicode PR: 147
File: src/agents/ablation/run-phase1.ts:168-202
Timestamp: 2026-03-28T00:12:55.972Z
Learning: In `src/agents/ablation/run-phase1.ts`, skipping no-op fixture/type pairs from the `computeRankings` delta aggregation is intentional by design. Zero-padding skipped pairs would dilute the signal (e.g., `variable-references` strip on a fixture with no variables is semantically meaningless). Instead, `fixtureCount` (N column) is surfaced in the rankings table to indicate per-type coverage, keeping the denominator honest without polluting the signal. Do NOT flag the omission of zero-delta entries for skipped fixtures.
Learnt from: let-sunny
Repo: let-sunny/canicode PR: 91
File: .claude/commands/calibrate-loop.md:36-40
Timestamp: 2026-03-26T01:35:55.319Z
Learning: For internal Claude Code subagent command prompts under `.claude/commands/` (not user-facing docs), preserve any dual grading/threshold criteria exactly as written—e.g., lines like “A or higher (percentage >= 90)” should not be collapsed into a single label such as “A+” because the subagent relies on the explicit percentage threshold in addition to the letter grade.
Learnt from: let-sunny
Repo: let-sunny/canicode PR: 167
File: src/core/report-html/render.ts:183-199
Timestamp: 2026-03-29T07:26:36.727Z
Learning: In `let-sunny/canicode`, rule-level `totalScore` values in `src/core/report-html/render.ts` are always negative (all rules produce score deductions, never positive contributions). Therefore, the `score-red` CSS class on the rule section badge in `renderRuleSection()` is intentionally hardcoded and should NOT be flagged as inconsistent or suggested to be replaced with a dynamic severity-based class.
Learnt from: let-sunny
Repo: let-sunny/canicode PR: 59
File: src/cli/helpers.ts:27-30
Timestamp: 2026-03-25T11:36:59.928Z
Learning: In `src/cli/helpers.ts`, `pickRandomScope` intentionally uses `Math.random()` to return a nondeterministic subtree. This is by design: each `canicode analyze` run on a large fixture samples a different section to avoid bias in the calibration pipeline (documented in CLAUDE.md). Do NOT flag this as a nondeterminism/flakiness issue.
Learnt from: let-sunny
Repo: let-sunny/canicode PR: 165
File: src/core/ui-helpers.ts:32-52
Timestamp: 2026-03-29T05:46:00.767Z
Learning: In `src/core/ui-helpers.ts` (let-sunny/canicode), `severityDot` and `severityBadge` are intentionally kept as separate exported functions despite having identical implementations today. The separation is for semantic clarity and to allow independent divergence if badge styling needs different class composition in the future. Do NOT flag them as duplicates or suggest consolidation.
Learnt from: let-sunny
Repo: let-sunny/canicode PR: 93
File: src/core/rules/rule-config.ts:76-80
Timestamp: 2026-03-26T01:28:57.785Z
Learning: In the let-sunny/canicode repo, `src/core/rules/rule-config.ts` is automatically adjusted by a nightly calibration pipeline. Do NOT suggest adding inline comments to this file for calibration rationale — the change evidence is tracked in PR descriptions, commit messages, and `data/calibration-evidence.json` instead. Inline comments would create clutter as the file is frequently auto-modified.
Learnt from: let-sunny
Repo: let-sunny/canicode PR: 179
File: README.md:137-137
Timestamp: 2026-03-29T13:25:05.487Z
Learning: In the let-sunny/canicode repo, `docs/REFERENCE.md` was renamed to `docs/CUSTOMIZATION.md` (PR `#179`). All links and references throughout the codebase (README.md, package.json, src/mcp/server.ts, src/core/rules/rule-config.test.ts, scripts/sync-rule-docs.ts) now point to `docs/CUSTOMIZATION.md` as the canonical user-facing config/customization guide. Do not flag these links as incorrect.
Learnt from: CR
Repo: let-sunny/canicode PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-31T03:59:28.420Z
Learning: Applies to src/core/rules/rule-config.ts : All rule scores, severity, and thresholds are managed in `rules/rule-config.ts`
Learnt from: let-sunny
Repo: let-sunny/canicode PR: 194
File: src/core/engine/html-utils.ts:32-35
Timestamp: 2026-03-30T07:34:03.867Z
Learning: In let-sunny/canicode, the shared HTML processing utilities (`extractHtml`, `sanitizeHtml`, `injectLocalFont`, `processHtml`) live in `src/core/comparison/html-utils.ts` (created in PR `#193`). The path `src/core/engine/html-utils.ts` shown in PR `#194`'s diff is the new extracted copy; both files may coexist, but the canonical path that was in scope for security review is `src/core/comparison/html-utils.ts`. A known unquoted XSS issue in `sanitizeHtml` (unquoted `on*=` attributes and unquoted `href=javascript:` forms not stripped) is deferred to a follow-up PR.
Learnt from: let-sunny
Repo: let-sunny/canicode PR: 88
File: docs/CALIBRATION.md:248-259
Timestamp: 2026-03-26T01:29:06.955Z
Learning: In the let-sunny/canicode repository, the `/add-rule` rule discovery pipeline has 5 agents (Researcher, Designer, Implementer, Evaluator, Critic). Step 4 "A/B Visual Validation" is an orchestrated process run by the Orchestrator, not a separate agent. The pipeline should always be referred to as a "5-agent pipeline," not "6-agent."
Learnt from: let-sunny
Repo: let-sunny/canicode PR: 147
File: src/agents/ablation/run-phase1.ts:35-50
Timestamp: 2026-03-28T00:13:07.729Z
Learning: In `src/agents/ablation/run-phase1.ts` (and the broader ablation pipeline), fixture data files (`fixtures/<name>/data.json`) and their associated screenshots are **immutable by convention**: when fixture content changes, a new fixture name is used instead of updating in place. Therefore, per-fixture input hashing in `computeConfigVersion` is unnecessary and should NOT be flagged as a cache-correctness gap.
Learnt from: let-sunny
Repo: let-sunny/canicode PR: 91
File: .claude/commands/calibrate-loop.md:43-46
Timestamp: 2026-03-26T01:35:52.693Z
Learning: Do not apply markdownlint rules to files under `.claude/commands/` (e.g., `.claude/commands/calibrate-loop.md`). These are Claude Code agent prompt files consumed by Claude Code, not rendered markdown documents, so markdown-specific linting issues (such as MD031: blanks-around-fences) should not be flagged in code review.
Actionable comments posted: 1
♻️ Duplicate comments (1)
.claude/commands/calibrate-loop.md (1)
114-115: ⚠️ Potential issue | 🟠 Major: Fail fast when required Converter artifacts are missing

Lines 114-115 currently log a warning and continue, but Step 2.5 requires these files unconditionally. This turns a clear precondition failure into later CLI errors with worse diagnostics.

Suggested fix

```diff
-If any file is missing, log a warning naming the missing files but continue.
+If any required file is missing, stop immediately:
+- log an error that lists each missing path
+- append an activity entry with `result: "FAILED — missing converter artifacts"`
+- exit nonzero before Step 2.5
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.claude/commands/calibrate-loop.md around lines 114 - 115, The code currently only logs a warning and continues when required Converter artifact files are missing; change this to fail fast by detecting the missing files where the warning is emitted, constructing a clear error message that lists the missing filenames, and then aborting execution (throwing an Error or calling process.exit with a non-zero code) so Step 2.5’s precondition is enforced; update the code path that emits the "missing files" warning to raise this error and ensure any surrounding callers propagate/handle it appropriately.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.claude/commands/calibrate-loop.md:
- Around line 147-149: The current selection sorts full paths in SCREENSHOTS
using sort -t- -k2 -n which mis-sorts when parent directories contain hyphens;
update the logic that builds SCREENSHOTS/LARGEST/LARGEST_WIDTH to sort or
extract by the screenshot filename itself (not the full path) — e.g., map paths
to basenames (or extract the numeric suffix with a regex) and sort numerically
on that value, then pick the last entry as LARGEST and derive LARGEST_WIDTH by
parsing the numeric part from that basename; ensure you change the pipeline that
builds SCREENSHOTS and the extraction used for LARGEST_WIDTH so it no longer
relies on -t- -k2 on full paths.
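One way to implement the basename-based selection described above. A sketch only: the `<name>-<width>.png` screenshot naming is assumed from the review comment.

```shell
# Sketch: pick the widest screenshot by the numeric suffix in the *basename*
# (e.g. desktop-1440.png), not by sorting full paths whose parent directories
# may themselves contain hyphens.
largest_screenshot() {
  for p in "$@"; do
    b=$(basename "$p")
    # Extract the trailing numeric width from the filename itself.
    w=$(printf '%s' "$b" | sed -n 's/.*-\([0-9][0-9]*\)\.png$/\1/p')
    [ -n "$w" ] && printf '%s %s\n' "$w" "$p"
  done | sort -n | tail -n1 | cut -d' ' -f2-
}
```

Sorting on the extracted width keeps the result correct even when `$RUN_DIR` contains hyphens, which is exactly the failure mode `sort -t- -k2 -n` on full paths has.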
---
Duplicate comments:
In @.claude/commands/calibrate-loop.md:
- Around line 114-115: The code currently only logs a warning and continues when
required Converter artifact files are missing; change this to fail fast by
detecting the missing files where the warning is emitted, constructing a clear
error message that lists the missing filenames, and then aborting execution
(throwing an Error or calling process.exit with a non-zero code) so Step 2.5’s
precondition is enforced; update the code path that emits the "missing files"
warning to raise this error and ensure any surrounding callers propagate/handle
it appropriately.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: e9f7e047-37d6-4f41-b839-5bdf2e3c372d
📒 Files selected for processing (1)
.claude/commands/calibrate-loop.md
Summary

- Converter now writes only the HTML artifacts and `converter-assessment.json` (ruleImpactAssessment + uncoveredStruggles). The orchestrator handles all measurements (html-postprocess, visual-compare, code-metrics, responsive, strip deltas) and assembles `conversion.json`.
- `run-phase1.ts` → `run-strip.ts`: clearer name for strip experiments

Before → After
Why
Test plan

- `pnpm build` passes
- `pnpm test:run`: 39 files, 655 tests all pass
- `run-phase1`

Part of #218 (Phase 2: role separation)
https://claude.ai/code/session_01N72z2Wbib4cLhYc3FRdSQi
Summary by CodeRabbit

- Documentation
- Refactor