calibrate: nightly run — 7 score adjustments from 9 fixtures#93
Circular clock layout without Auto Layout required trigonometric coordinate calculations — genuinely hard to implement. Validated by 6-agent debate: Critic APPROVE, Arbitrator APPLY. Fixture: material3-52949-27916 (time picker, 453 nodes, 72% similarity) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…5→-3 Both rules overscored across 2 cross-fixture cases (material3 + simple-ds). Critic revised from proposed -2 to -3 (modest reduction, not aggressive). Arbitrator approved. 9 other proposals rejected (CLAUDE.md policy, low confidence). Fixture: simple-ds-175-9106 (236 nodes, 94% similarity) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2 cross-fixture overscored observations (material3 + simple-ds). Critic revised from proposed -2 to -3 (modest reduction). 14 other proposals rejected (session stability, CLAUDE.md policy, low confidence). Fixture: material3-56615-82356 (330 nodes, 92% similarity) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- absolute-position-in-auto-layout: -10 → -7 (keep blocking, 2 cases medium+easy)
- missing-responsive-behavior: -4 → -3 (keep risk, 2 cases easy)
- raw-font: -8 → -4 (blocking → risk, 3 cross-fixture easy cases)

14 other proposals rejected (session stability, CLAUDE.md policy, low confidence).
Fixture: material3-51954-18254 (959 nodes, 98% similarity, date range picker)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
📝 Walkthrough
Updated scoring and severity values for multiple linting rules.
Estimated code review effort: 🎯 1 (Trivial) | ⏱️ ~5 minutes
🚥 Pre-merge checks: ✅ 2 passed
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/core/rules/rule-config.ts`:
- Around line 76-80: The PR reduces the "raw-font" rule's impact (severity
changed from "blocking" to "risk" and score from -8 to -4) but lacks inline
rationale; add a concise comment next to the "raw-font" entry in rule-config.ts
that explains the evidence and reasoning (e.g., "downgraded to recoverable risk
based on 3 cross-fixture easy cases") and mention the relevant calibration
factors (SEVERITY_DENSITY_WEIGHT and SEVERITY_WEIGHT) so future maintainers can
understand the change context; ensure the comment references the "raw-font" key
and the adjusted severity/score values.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 415d5cf3-77bb-45d8-a783-adaf21ad776e
⛔ Files ignored due to path filters (2)
- `data/calibration-evidence.json` is excluded by `!data/**`
- `data/discovery-evidence.json` is excluded by `!data/**`
📒 Files selected for processing (1)
src/core/rules/rule-config.ts
  "raw-font": {
-   severity: "blocking",
-   score: -8,
+   severity: "risk",
+   score: -4,
    enabled: true,
  },
🧹 Nitpick | 🔵 Trivial
Significant downgrade: raw-font from blocking to risk with score halved.
This is the largest calibration adjustment in this PR. Per the relevant code snippets:
- SEVERITY_DENSITY_WEIGHT: blocking=3.0 → risk=2.0 (33% reduction in density impact)
- SEVERITY_WEIGHT: blocking=10 → risk=5 (50% reduction in weight)
Combined with the score change (-8 → -4), raw-font violations will have substantially less impact on overall design scores. The PR states this was based on "3 cross-fixture easy" evidence.
Consider documenting the rationale inline (e.g., a comment explaining why raw fonts are now considered recoverable risk rather than blocking) to help future maintainers understand this calibration decision.
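The percentages quoted in this comment can be checked with a few lines of arithmetic. The sketch below is illustrative only: the two weight tables reuse the values quoted above, `percentReduction` is a hypothetical helper, and because this PR does not show how the pipeline combines the factors, each one is compared on its own.

```typescript
// Values quoted in the review comment; the way the scoring pipeline combines
// them is not shown in this PR, so each factor is compared independently.
type Severity = "blocking" | "risk";

const SEVERITY_DENSITY_WEIGHT: Record<Severity, number> = { blocking: 3.0, risk: 2.0 };
const SEVERITY_WEIGHT: Record<Severity, number> = { blocking: 10, risk: 5 };

// Hypothetical helper: percentage drop in magnitude from `before` to `after`,
// rounded to the nearest whole percent.
function percentReduction(before: number, after: number): number {
  const drop = (Math.abs(before) - Math.abs(after)) / Math.abs(before);
  return Math.round(drop * 100);
}

const densityDrop = percentReduction(
  SEVERITY_DENSITY_WEIGHT.blocking,
  SEVERITY_DENSITY_WEIGHT.risk,
); // 33
const weightDrop = percentReduction(SEVERITY_WEIGHT.blocking, SEVERITY_WEIGHT.risk); // 50
const scoreDrop = percentReduction(-8, -4); // 50 (raw-font score: -8 → -4)
```

All three factors shrink at once, which is why the comment describes this as the largest adjustment in the PR.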
📝 Suggested documentation
"raw-font": {
+ // Calibrated 2026-03-26: downgraded from blocking based on 3 cross-fixture evidence
+ // Raw fonts are recoverable via style tokens; not a hard blocker for implementation
severity: "risk",
score: -4,
enabled: true,
  },
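rule-config.ts is adjusted automatically by the calibration pipeline, so adding an inline comment on every change would quickly clutter it. The rationale for each change is recorded in the PR description and commit message, and cross-run evidence also accumulates in data/calibration-evidence.json, so separate inline comments are unnecessary.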
Generated by Claude Code
@let-sunny, understood — since rule-config.ts is auto-managed by the calibration pipeline, inline comments would indeed be overwritten or accumulate noise over time. The combination of PR descriptions, commit messages, and data/calibration-evidence.json as the source of truth for calibration rationale makes sense. I'll skip inline documentation suggestions for this file in future reviews.
✏️ Learnings added
Learnt from: let-sunny
Repo: let-sunny/canicode PR: 93
File: src/core/rules/rule-config.ts:76-80
Timestamp: 2026-03-26T01:28:57.785Z
Learning: In the let-sunny/canicode repo, `src/core/rules/rule-config.ts` is automatically adjusted by a nightly calibration pipeline. Do NOT suggest adding inline comments to this file for calibration rationale — the change evidence is tracked in PR descriptions, commit messages, and `data/calibration-evidence.json` instead. Inline comments would create clutter as the file is frequently auto-modified.
Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.
🧠 Learnings used
Learnt from: CR
Repo: let-sunny/canicode PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-25T18:32:58.247Z
Learning: Applies to src/core/rules/rule-config.ts : Rule scores, severity, and thresholds must be managed in `rules/rule-config.ts`
Learnt from: CR
Repo: let-sunny/canicode PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-25T18:32:58.247Z
Learning: Rule logic and score config must be intentionally separated so scores can be tuned without modifying rule logic
Learnt from: CR
Repo: let-sunny/canicode PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-25T18:32:58.247Z
Learning: Component-related rule scores should NOT be lowered based on small fixture calibration; calibration fixtures should include large, complex pages
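The learning above about keeping rule logic and score config separate might look roughly like the sketch below. Only the `"raw-font"` entry and its `severity`, `score`, and `enabled` fields come from this PR; the `TextNode` shape, `hasRawFont`, and `scoreNode` are hypothetical names made up for illustration and are not taken from the repo.

```typescript
type Severity = "blocking" | "risk";

interface RuleConfig {
  severity: Severity;
  score: number;
  enabled: boolean;
}

// Config side: the only place a calibration run would need to touch.
const ruleConfig: Record<string, RuleConfig> = {
  "raw-font": { severity: "risk", score: -4, enabled: true },
};

// Logic side (hypothetical): a node shape and a check that know nothing about scores.
interface TextNode {
  name: string;
  textStyleId?: string;
}

function hasRawFont(node: TextNode): boolean {
  // Assumption for this sketch: a "raw" font is text with no shared style attached.
  return node.textStyleId === undefined;
}

// Hypothetical glue: looks up the penalty only after the rule logic reports a violation.
function scoreNode(
  node: TextNode,
  ruleId: string,
  violates: (n: TextNode) => boolean,
): number {
  const config = ruleConfig[ruleId];
  if (!config || !config.enabled) return 0;
  return violates(node) ? config.score : 0;
}

const penalty = scoreNode({ name: "Title" }, "raw-font", hasRawFont); // -4
```

With this split, changing `score` from -8 to -4 is a one-line config edit, and `hasRawFont` stays untouched.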
Calibrate-night results (2026-03-26)
All 9 fixtures were run through the full pipeline (Converter + Evaluation + Critic + Arbitrator).
Score Adjustments Applied (7)
- no-auto-layout
- fixed-size-in-auto-layout
- missing-size-constraint
- magic-number-spacing
- absolute-position-in-auto-layout
- missing-responsive-behavior
- raw-font

Similarity Results (all 9 fixtures)
Key Insights
Refs #89
Test plan
🤖 Generated with Claude Code
Summary by CodeRabbit