calibrate: nightly run — 7 score adjustments from 9 fixtures by let-sunny · Pull Request #93 · let-sunny/canicode

let-sunny · 2026-03-26T01:12:02Z

Calibrate-night results (2026-03-26)

Ran the full pipeline (Converter + Evaluation + Critic + Arbitrator) on all 9 fixtures.

Score Adjustments Applied (7)

Rule	Before	After	Evidence	Fixture
`no-auto-layout`	-7	-10	hard (circular clock layout)	material3-52949
`fixed-size-in-auto-layout`	-5	-3	2 cross-fixture easy	simple-ds-175-9106
`missing-size-constraint`	-5	-3	2 cross-fixture easy	simple-ds-175-9106
`magic-number-spacing`	-4	-3	2 cross-fixture easy	material3-56615-82356
`absolute-position-in-auto-layout`	-10	-7	medium+easy (2 cases)	material3-51954
`missing-responsive-behavior`	-4	-3	2 fixtures easy	material3-51954
`raw-font`	-8	-4	3 cross-fixture easy, blocking→risk	material3-51954
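The table is effectively a patch against the rule config. The sketch below applies it to a config map shaped like the `raw-font` entry in the diff (`severity`, `score`, `enabled`). The `Severity` union, the `Adjustment` type and `applyAdjustments` are hypothetical names for illustration, not canicode's code. The function refuses to apply a row whose current score does not match the "Before" column, so a stale base fails loudly instead of being silently overwritten.

```typescript
// Hypothetical sketch. RuleConfig mirrors the "raw-font" entry in the diff;
// Severity lists only the two values this PR shows (assumption).
type Severity = "blocking" | "risk";

interface RuleConfig {
  severity: Severity;
  score: number;
  enabled: boolean;
}

interface Adjustment {
  rule: string;
  before: number;
  after: number;
  severity?: Severity; // set only when the severity also changes
}

// The 7 rows of the "Score Adjustments Applied" table.
const adjustments: Adjustment[] = [
  { rule: "no-auto-layout", before: -7, after: -10 },
  { rule: "fixed-size-in-auto-layout", before: -5, after: -3 },
  { rule: "missing-size-constraint", before: -5, after: -3 },
  { rule: "magic-number-spacing", before: -4, after: -3 },
  { rule: "absolute-position-in-auto-layout", before: -10, after: -7 },
  { rule: "missing-responsive-behavior", before: -4, after: -3 },
  { rule: "raw-font", before: -8, after: -4, severity: "risk" },
];

// Returns a new map (the input is not mutated). Throws on an unknown rule or
// when the current score differs from the row's "before" value.
function applyAdjustments(
  configs: Record<string, RuleConfig>,
  adjs: Adjustment[],
): Record<string, RuleConfig> {
  const next: Record<string, RuleConfig> = { ...configs };
  for (const a of adjs) {
    const current = next[a.rule];
    if (!current) throw new Error(`unknown rule: ${a.rule}`);
    if (current.score !== a.before) {
      throw new Error(`${a.rule}: expected ${a.before}, found ${current.score}`);
    }
    next[a.rule] = {
      ...current,
      score: a.after,
      severity: a.severity ?? current.severity,
    };
  }
  return next;
}
```

Because of the "before" check, running the same adjustments twice throws on the second pass rather than shifting scores again.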

Similarity Results (all 9 fixtures)

Fixture	Nodes	Grade	Similarity	Difficulty
simple-ds-4333-9262	140	B+ (81%)	97%	easy
material3-52949-27916	453	C (67%)	72%	hard
simple-ds-562-9518	245	C (67%)	98%	easy
simple-ds-175-9106	236	C (65%)	94%	low
material3-56615-82356	330	D (60%)	92%	moderate
material3-51954-18254	959	D (64%)	98%	easy
material3-56615-45927	197	D (59%)	86%	moderate
simple-ds-175-8591	301	D (63%)	97%	easy
simple-ds-175-7790	310	D (64%)	98%	easy
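One way to summarize the similarity table is to average similarity per grade letter. The sketch below copies the rows verbatim (only fixture, grade letter and similarity); `FixtureResult` and `meanSimilarityByGrade` are illustrative names, not part of the pipeline.

```typescript
// Rows copied from the similarity table above; the percentage in parentheses
// after each grade is not used here.
interface FixtureResult {
  fixture: string;
  grade: string;
  similarity: number; // percent
}

const results: FixtureResult[] = [
  { fixture: "simple-ds-4333-9262", grade: "B+", similarity: 97 },
  { fixture: "material3-52949-27916", grade: "C", similarity: 72 },
  { fixture: "simple-ds-562-9518", grade: "C", similarity: 98 },
  { fixture: "simple-ds-175-9106", grade: "C", similarity: 94 },
  { fixture: "material3-56615-82356", grade: "D", similarity: 92 },
  { fixture: "material3-51954-18254", grade: "D", similarity: 98 },
  { fixture: "material3-56615-45927", grade: "D", similarity: 86 },
  { fixture: "simple-ds-175-8591", grade: "D", similarity: 97 },
  { fixture: "simple-ds-175-7790", grade: "D", similarity: 98 },
];

// Mean similarity for each grade letter that appears in the rows.
function meanSimilarityByGrade(rows: FixtureResult[]): Map<string, number> {
  const sums = new Map<string, { total: number; count: number }>();
  for (const r of rows) {
    const s = sums.get(r.grade) ?? { total: 0, count: 0 };
    s.total += r.similarity;
    s.count += 1;
    sums.set(r.grade, s);
  }
  const means = new Map<string, number>();
  sums.forEach((s, grade) => means.set(grade, s.total / s.count));
  return means;
}
```

For these 9 rows it yields 97 for B+, 88 for C and 94.2 for D.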

Key Insights

Below-B fixtures produced the most valuable data — removing the tier skip was the right call
Critic quality excellent: CLAUDE.md policies correctly applied, session saturation recognized
raw-font -8→-4 was the biggest change: 3 fixtures showed AI handles raw fonts easily (explicit values in design tree)
Component rules held firm: CLAUDE.md policy blocked all disable proposals for missing-component, detached-instance, variant-structure-mismatch
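The component-rule policy can be read as a guard over the Critic's proposals. This is a simplified sketch: the `Proposal` type, `COMPONENT_RULES` and `isBlockedByPolicy` are hypothetical, and the guard blocks every weakening of a component rule, whereas the CLAUDE.md learning only forbids lowering them "based on small fixture calibration".

```typescript
// Hypothetical shapes for Critic proposals (not canicode's actual types).
type Proposal =
  | { rule: string; kind: "disable" }
  | { rule: string; kind: "adjust"; before: number; after: number };

// Component rules named in the PR description.
const COMPONENT_RULES = new Set<string>([
  "missing-component",
  "detached-instance",
  "variant-structure-mismatch",
]);

// Blocks disabling a component rule or moving its penalty toward zero.
// Making a component rule stricter is still allowed.
function isBlockedByPolicy(p: Proposal): boolean {
  if (!COMPONENT_RULES.has(p.rule)) return false;
  if (p.kind === "disable") return true;
  // Narrowed to the "adjust" variant here.
  return Math.abs(p.after) < Math.abs(p.before);
}
```

Non-component rules such as `raw-font` pass through untouched, which matches the PR applying the `raw-font` reduction while rejecting the component-rule proposals.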

Refs #89

Test plan

590 tests pass
9 fixtures analyzed with full pipeline

🤖 Generated with Claude Code

Summary by CodeRabbit

Chores
- Adjusted severity levels and scoring weights for multiple validation rules to improve prioritization and consistency in system feedback.

Circular clock layout without Auto Layout required trigonometric coordinate calculations, which made it genuinely hard to implement. Validated by 6-agent debate: Critic APPROVE, Arbitrator APPLY. Fixture: material3-52949-27916 (time picker, 453 nodes, 72% similarity)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…5→-3

Both rules were overscored across 2 cross-fixture cases (material3 + simple-ds). Critic revised the proposed -2 to -3 (a modest reduction, not an aggressive one). Arbitrator approved. 9 other proposals rejected (CLAUDE.md policy, low confidence). Fixture: simple-ds-175-9106 (236 nodes, 94% similarity)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2 cross-fixture overscored observations (material3 + simple-ds). Critic revised the proposed -2 to -3 (a modest reduction). 14 other proposals rejected (session stability, CLAUDE.md policy, low confidence). Fixture: material3-56615-82356 (330 nodes, 92% similarity)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- absolute-position-in-auto-layout: -10 → -7 (keep blocking, 2 cases medium+easy)
- missing-responsive-behavior: -4 → -3 (keep risk, 2 cases easy)
- raw-font: -8 → -4 (blocking → risk, 3 cross-fixture easy cases)

14 other proposals rejected (session stability, CLAUDE.md policy, low confidence). Fixture: material3-51954-18254 (959 nodes, 98% similarity, date range picker)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

coderabbitai · 2026-03-26T01:12:15Z

📝 Walkthrough

Walkthrough

Updated scoring and severity values for multiple linting rules in RULE_CONFIGS. Adjusted penalty scores across layout and token-related rules, including changes to no-auto-layout, absolute-position-in-auto-layout, fixed-size-in-auto-layout, and others, along with a severity downgrade for raw-font rule.

Changes

Cohort / File(s)	Summary
Rule Configuration Scoring `src/core/rules/rule-config.ts`	Updated penalty scores for `no-auto-layout` (−7 to −10), `absolute-position-in-auto-layout` (−10 to −7), `fixed-size-in-auto-layout` and `missing-size-constraint` (−5 to −3), `missing-responsive-behavior` (−4 to −3), and `magic-number-spacing` (−4 to −3). Changed `raw-font` rule severity from `blocking` to `risk` with score adjustment from −8 to −4.

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~5 minutes

Poem

🐰 Whiskers twitching with delight,
Scoring tweaks align just right,
Rule severities now dance and sway,
Configuration brightens up the day! ✨

🚥 Pre-merge checks | ✅ 2

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately describes the main change: calibration adjustments to rule scoring based on a nightly test run with specific fixtures.


coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/core/rules/rule-config.ts`:
- Around lines 76-80: The PR reduces the "raw-font" rule's impact (severity
changed from "blocking" to "risk" and score from -8 to -4) but lacks inline
rationale; add a concise comment next to the "raw-font" entry in rule-config.ts
that explains the evidence and reasoning (e.g., "downgraded to recoverable risk
based on 3 cross-fixture easy cases") and mention the relevant calibration
factors (SEVERITY_DENSITY_WEIGHT and SEVERITY_WEIGHT) so future maintainers can
understand the change context; ensure the comment references the "raw-font" key
and the adjusted severity/score values.


ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 415d5cf3-77bb-45d8-a783-adaf21ad776e

📥 Commits

Reviewing files that changed from the base of the PR and between 55d92c3 and f97c1c0.

⛔ Files ignored due to path filters (2)

data/calibration-evidence.json is excluded by !data/**
data/discovery-evidence.json is excluded by !data/**

📒 Files selected for processing (1)

src/core/rules/rule-config.ts

coderabbitai · 2026-03-26T01:15:02Z

src/core/rules/rule-config.ts

```diff
   "raw-font": {
-    severity: "blocking",
-    score: -8,
+    severity: "risk",
+    score: -4,
     enabled: true,
   },
```


🧹 Nitpick | 🔵 Trivial

Significant downgrade: raw-font from blocking to risk with score halved.

This is the largest calibration adjustment in this PR. Per the relevant code snippets:

SEVERITY_DENSITY_WEIGHT: blocking=3.0 → risk=2.0 (33% reduction in density impact)

SEVERITY_WEIGHT: blocking=10 → risk=5 (50% reduction in weight)

Combined with the score change (-8 → -4), raw-font violations will have substantially less impact on overall design scores. The PR states this was based on "3 cross-fixture easy" evidence.
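The percentages above can be checked directly. The weight values are the ones quoted in this comment; how canicode actually combines them with the rule score is not stated in the PR, so the final product is purely an assumption used to show the order of magnitude if all three factors were multiplicative.

```typescript
// Weights as quoted in the review comment (only blocking/risk shown).
const SEVERITY_DENSITY_WEIGHT = { blocking: 3.0, risk: 2.0 } as const;
const SEVERITY_WEIGHT = { blocking: 10, risk: 5 } as const;

// Fractional reduction when a value moves from `before` to `after`.
function reduction(before: number, after: number): number {
  return 1 - after / before;
}

const densityCut = reduction(SEVERITY_DENSITY_WEIGHT.blocking, SEVERITY_DENSITY_WEIGHT.risk); // about 1/3
const weightCut = reduction(SEVERITY_WEIGHT.blocking, SEVERITY_WEIGHT.risk); // 0.5
const scoreCut = reduction(-8, -4); // 0.5

// ASSUMPTION: if the three factors multiplied, raw-font's per-violation
// impact would drop to (2/3) * 0.5 * 0.5, i.e. about 1/6 of its old value.
const combinedRatio = (1 - densityCut) * (1 - weightCut) * (1 - scoreCut);
```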

Consider documenting the rationale inline (e.g., a comment explaining why raw fonts are now considered recoverable risk rather than blocking) to help future maintainers understand this calibration decision.

📝 Suggested documentation

```diff
 "raw-font": {
+  // Calibrated 2026-03-26: downgraded from blocking based on 3 cross-fixture evidence
+  // Raw fonts are recoverable via style tokens; not a hard blocker for implementation
   severity: "risk",
   score: -4,
   enabled: true,
 },
```



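rule-config.ts is adjusted automatically by the calibration pipeline, so adding inline comments on every change would quickly clutter it. The rationale for each change is recorded in the PR description and commit messages, and cross-run evidence accumulates in data/calibration-evidence.json, so separate inline comments are unnecessary.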

Generated by Claude Code

@let-sunny, understood — since rule-config.ts is auto-managed by the calibration pipeline, inline comments would indeed be overwritten or accumulate noise over time. The combination of PR descriptions, commit messages, and data/calibration-evidence.json as the source of truth for calibration rationale makes sense. I'll skip inline documentation suggestions for this file in future reviews.

✏️ Learnings added

Learnt from: let-sunny Repo: let-sunny/canicode PR: 93 File: src/core/rules/rule-config.ts:76-80 Timestamp: 2026-03-26T01:28:57.785Z Learning: In the let-sunny/canicode repo, `src/core/rules/rule-config.ts` is automatically adjusted by a nightly calibration pipeline. Do NOT suggest adding inline comments to this file for calibration rationale — the change evidence is tracked in PR descriptions, commit messages, and `data/calibration-evidence.json` instead. Inline comments would create clutter as the file is frequently auto-modified.


🧠 Learnings used

Learnt from: CR Repo: let-sunny/canicode PR: 0 File: CLAUDE.md:0-0 Timestamp: 2026-03-25T18:32:58.247Z Learning: Applies to src/core/rules/rule-config.ts : Rule scores, severity, and thresholds must be managed in `rules/rule-config.ts`

Learnt from: CR Repo: let-sunny/canicode PR: 0 File: CLAUDE.md:0-0 Timestamp: 2026-03-25T18:32:58.247Z Learning: Rule logic and score config must be intentionally separated so scores can be tuned without modifying rule logic

Learnt from: CR Repo: let-sunny/canicode PR: 0 File: CLAUDE.md:0-0 Timestamp: 2026-03-25T18:32:58.247Z Learning: Component-related rule scores should NOT be lowered based on small fixture calibration; calibration fixtures should include large, complex pages
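The separation rule means calibration can edit `RULE_CONFIGS` without touching detection code. A minimal sketch of that split follows; `RULE_CONFIGS` is the name used in the walkthrough, but its simplified shape here and every other name (`DesignNode`, `Violation`, `detectRawFont`, `totalPenalty`) are hypothetical. In particular, treating "no `fontStyleId`" as a raw font is an illustrative assumption, not canicode's actual check.

```typescript
type Severity = "blocking" | "risk";
interface RuleConfig { severity: Severity; score: number; enabled: boolean; }

// Hypothetical design-tree node and violation record.
interface DesignNode { id: string; fontStyleId?: string; }
interface Violation { ruleId: string; nodeId: string; }

// Rule logic: detects violations only and knows nothing about scores.
function detectRawFont(nodes: DesignNode[]): Violation[] {
  return nodes
    .filter((n) => n.fontStyleId === undefined)
    .map((n) => ({ ruleId: "raw-font", nodeId: n.id }));
}

// Score config: tuned independently, e.g. by the calibration pipeline.
const RULE_CONFIGS: Record<string, RuleConfig> = {
  "raw-font": { severity: "risk", score: -4, enabled: true },
};

// Scoring: looks up each violation's config; disabled or unknown rules add 0.
function totalPenalty(violations: Violation[], configs: Record<string, RuleConfig>): number {
  let total = 0;
  for (const v of violations) {
    const cfg = configs[v.ruleId];
    if (cfg && cfg.enabled) total += cfg.score;
  }
  return total;
}
```

Swapping in the pre-calibration config (`score: -8`) doubles the penalty for the same violations while `detectRawFont` stays unchanged, which is the property the CLAUDE.md rule protects.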

let-sunny and others added 5 commits March 26, 2026 09:49

calibrate: accumulate evidence from 9-fixture nightly run

f97c1c0

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

let-sunny mentioned this pull request Mar 26, 2026

chore: calibrate-night results analysis: improvements needed #89

Closed

2 tasks

coderabbitai bot reviewed Mar 26, 2026


let-sunny merged commit b20cd38 into main Mar 26, 2026
3 checks passed

let-sunny deleted the calibrate/nightly-2026-03-26 branch March 26, 2026 01:59
