feat: use calibrated per-rule scores in final percentage calculation #110

Merged
let-sunny merged 1 commit into main from feat/use-calculated-score-in-scoring on Mar 26, 2026

@let-sunny (Owner) commented Mar 26, 2026

Summary

  • Replace flat severity weights with calculatedScore from rule engine in density calculation
  • Per-rule scores and depthWeight from rule-config.ts now actually influence user-facing scores
  • Calibration loop score adjustments flow through to final percentages

What was wrong

calculateScores() used SEVERITY_DENSITY_WEIGHT (blocking=3.0, risk=2.0, missing-info=1.0, suggestion=0.5) for density. This meant:

  • All blocking rules contributed 3.0 equally regardless of their calibrated score
  • no-auto-layout (score: -10) and missing-size-constraint (score: -3) were treated the same
  • depthWeight was computed into calculatedScore but never consumed by scoring
  • Adjusting scores in rule-config.ts via calibration loop had zero effect on user-facing scores
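
The problem described above can be sketched as follows. This is a minimal illustration rather than the actual contents of scoring.ts: the `Issue` shape and the `oldWeightedIssueCount` helper are hypothetical names, and only the four weight values are taken from this description.

```typescript
// Hypothetical shapes for illustration; the real types live in the rule engine.
type Severity = "blocking" | "risk" | "missing-info" | "suggestion";

interface Issue {
  ruleId: string;
  severity: Severity;
  calculatedScore: number; // computed by the rule engine, ignored by the old scoring path
}

// Flat weights quoted in this description.
const SEVERITY_DENSITY_WEIGHT: Record<Severity, number> = {
  blocking: 3.0,
  risk: 2.0,
  "missing-info": 1.0,
  suggestion: 0.5,
};

// Old behavior (sketch): only the severity bucket matters.
function oldWeightedIssueCount(issues: Issue[]): number {
  return issues.reduce(
    (sum, issue) => sum + SEVERITY_DENSITY_WEIGHT[issue.severity],
    0,
  );
}

const noAutoLayout: Issue = {
  ruleId: "no-auto-layout",
  severity: "blocking",
  calculatedScore: -10,
};
const missingSizeConstraint: Issue = {
  ruleId: "missing-size-constraint",
  severity: "blocking",
  calculatedScore: -3,
};

// Both rules land on the same weight despite very different calibrated scores.
console.log(oldWeightedIssueCount([noAutoLayout]));
console.log(oldWeightedIssueCount([missingSizeConstraint]));
```

Because `calculatedScore` never enters the sum, editing a score in rule-config.ts cannot change the result of this function.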

What this fixes

Now calculateScores() uses Math.abs(issue.calculatedScore) instead of flat severity weights:

| Rule | Before (severity weight) | After (calculatedScore) |
| --- | --- | --- |
| no-auto-layout at root (score -10, depthWeight 1.5) | 3.0 | 15 |
| no-auto-layout at leaf (score -10, depthWeight 1.0) | 3.0 | 10 |
| missing-size-constraint (score -3) | 3.0 | 3 |
| unnecessary-node (score -2) | 0.5 | 2 |
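
The "After" column can be reproduced with a small sketch. It assumes that calculatedScore is the per-rule score multiplied by depthWeight, as the root example (|-10 × 1.5| = 15) suggests. The `ScoredIssue` shape and the helper names are hypothetical, and a depthWeight of 1.0 for unnecessary-node is an assumption made only to match the table.

```typescript
// Hypothetical input shape; in the real engine calculatedScore arrives precomputed.
interface ScoredIssue {
  label: string;
  score: number; // calibrated per-rule score from rule-config.ts (negative = penalty)
  depthWeight: number; // multiplier applied by the rule engine based on node depth
}

// Assumed relationship: calculatedScore = score × depthWeight.
function calculatedScore(issue: ScoredIssue): number {
  return issue.score * issue.depthWeight;
}

// New density contribution: the magnitude of the calculated score.
function densityContribution(issue: ScoredIssue): number {
  return Math.abs(calculatedScore(issue));
}

function weightedIssueCount(issues: ScoredIssue[]): number {
  return issues.reduce((sum, issue) => sum + densityContribution(issue), 0);
}

const examples: ScoredIssue[] = [
  { label: "no-auto-layout at root", score: -10, depthWeight: 1.5 },
  { label: "no-auto-layout at leaf", score: -10, depthWeight: 1.0 },
  { label: "missing-size-constraint", score: -3, depthWeight: 1.0 },
  { label: "unnecessary-node", score: -2, depthWeight: 1.0 }, // depthWeight assumed
];

for (const issue of examples) {
  console.log(`${issue.label}: ${densityContribution(issue)}`);
}
```

Taking the absolute value keeps the contribution positive even though scores in rule-config.ts are stored as negative penalties.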

Test plan

  • 623 tests pass
  • Type check clean
  • Re-run calibration fixtures and compare before/after score distributions

Closes #104

Made with Cursor

Summary by CodeRabbit

  • Improvements
    • Scoring calculation now weights issues based on individual impact scores rather than severity classification.
    • Issue density assessment is more precise, reflecting actual impact differences across severity categories.

Replace flat severity weights (blocking=3.0, risk=2.0, etc.) with
calculatedScore from rule engine in density calculation. This makes
per-rule scores and depthWeight from rule-config.ts actually influence
the user-facing score, connecting the calibration pipeline to output.

Before: no-auto-layout (-10) and missing-size-constraint (-3) both
contributed 3.0 (same severity = same weight). depthWeight was computed
but never consumed by scoring.

After: no-auto-layout at root contributes 15 (|-10 × 1.5|), while
missing-size-constraint contributes 3 (|-3 × 1.0|). Calibration loop
score adjustments now flow through to final percentages.

Closes #104

@coderabbitai (bot, Contributor) commented Mar 26, 2026

No actionable comments were generated in the recent review. 🎉

📥 Commits

Reviewing files that changed from the base of the PR and between 624a2f5 and 6c071a2.

📒 Files selected for processing (2)
  • src/core/engine/scoring.test.ts
  • src/core/engine/scoring.ts

📝 Walkthrough

The changes replace severity-based density weighting with per-issue calculatedScore in the scoring pipeline. Previously, SEVERITY_DENSITY_WEIGHT assigned fixed weights by severity category; now density contribution derives from each issue's calculatedScore, enabling depth weights and per-rule calibration to influence final scores.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Scoring Logic Update: src/core/engine/scoring.ts | Removed SEVERITY_DENSITY_WEIGHT constant and replaced weightedIssueCount accumulation from severity-based weights to Math.abs(issue.calculatedScore), allowing per-rule calibration and depth weighting to affect density scoring. |
| Test Suite Updates: src/core/engine/scoring.test.ts | Updated density scoring tests to verify that densityScore ordering is driven by individual issue calculatedScore values rather than severity category, with new assertions on weightedIssueCount results. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 The density now dances with calibrated grace,
No more fixed severities in their rigid place—
Each issue's score tells its own weighted tale,
Depth and calibration make the metrics sail! ✨

🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the main change: replacing flat severity weights with calibrated per-rule scores in the final percentage calculation. |
| Linked Issues check | ✅ Passed | The PR successfully implements the proposed fix in issue #104: replaces flat SEVERITY_DENSITY_WEIGHT with per-issue calculatedScore from the rule engine, allowing calibrated scores and depthWeight to influence final percentages. |
| Out of Scope Changes check | ✅ Passed | All changes are directly related to the scope of issue #104 and PR objectives; modifications to scoring.ts and scoring.test.ts align with replacing severity weights with calculatedScore-based density weighting. |

@let-sunny let-sunny marked this pull request as ready for review March 26, 2026 09:53

@let-sunny (Owner, Author) commented

Before/After Score Comparison

Ran analysis on 4 fixtures comparing old (flat severity weights) vs new (calculatedScore):

Overall

| Fixture | OLD | NEW | Delta |
| --- | --- | --- | --- |
| material3-52949-27916 | C+ (71%) | D (57%) | -14 |
| simple-ds-175-7790 | C (69%) | D (52%) | -17 |
| simple-ds-175-9106 | C+ (70%) | D (52%) | -18 |
| material3-56615-82356 | D (60%) | D (52%) | -8 |

By Category (selected highlights)

| Fixture | Category | OLD | NEW | Delta |
| --- | --- | --- | --- | --- |
| material3-52949-27916 | token | 49% | 16% | -33 |
| material3-52949-27916 | component | 72% | 61% | -11 |
| simple-ds-175-7790 | component | 57% | 25% | -32 |
| simple-ds-175-7790 | behavior | 61% | 45% | -16 |
| simple-ds-175-9106 | naming | 63% | 51% | -12 |
| simple-ds-175-9106 | behavior | 68% | 48% | -20 |

Full category breakdown

material3-52949-27916
  OLD: struct:78 token:49 comp:72 name:74 behav:82
  NEW: struct:63 token:16 comp:61 name:67 behav:77

simple-ds-175-7790
  OLD: struct:73 token:79 comp:57 name:74 behav:61
  NEW: struct:57 token:65 comp:25 name:67 behav:45

simple-ds-175-9106
  OLD: struct:74 token:80 comp:64 name:63 behav:68
  NEW: struct:55 token:64 comp:43 name:51 behav:48

material3-56615-82356
  OLD: struct:5  token:82 comp:74 name:75 behav:65
  NEW: struct:5  token:77 comp:51 name:68 behav:58

Observations

  • Scores dropped across the board. This is expected: rules with large penalty scores, such as no-auto-layout at -10, now contribute 10 or more to density instead of a flat 3.0
  • Component category saw the largest drops: missing-component (score: -7) was previously weighted as risk=2.0 and now contributes 7.0
  • Token category dropped significantly for material3-52949-27916 (204 issues, with raw-font now weighted at 4 per issue for its score of -4 instead of 2.0)
  • Structure on material3-56615-82356 was already at floor (5%) — no change possible
  • Scores will need recalibration via /calibrate-loop — but now score adjustments in rule-config.ts will actually affect the output
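
The size of these drops follows from how much each rule's per-issue weight grew. The sketch below is only a rough illustration: it uses the old and new per-issue weights quoted in the observations, assumes a depthWeight of 1.0 throughout, and, purely for the sake of the arithmetic, treats all 204 token issues as raw-font. The `WeightChange` type and `multiplier` helper are hypothetical.

```typescript
// Per-issue weights quoted in the observations above (depthWeight assumed to be 1.0).
interface WeightChange {
  rule: string;
  oldWeight: number; // flat severity weight
  newWeight: number; // |calculatedScore|
}

const changes: WeightChange[] = [
  { rule: "no-auto-layout", oldWeight: 3.0, newWeight: 10 },
  { rule: "missing-component", oldWeight: 2.0, newWeight: 7 },
  { rule: "raw-font", oldWeight: 2.0, newWeight: 4 },
];

// How many times heavier a single issue became.
function multiplier(change: WeightChange): number {
  return change.newWeight / change.oldWeight;
}

for (const change of changes) {
  console.log(
    `${change.rule}: ${change.oldWeight} -> ${change.newWeight} (x${multiplier(change).toFixed(2)})`,
  );
}

// Illustrative total, assuming every one of the 204 token issues is raw-font.
const rawFontIssueCount = 204;
const oldTokenTotal = rawFontIssueCount * 2.0;
const newTokenTotal = rawFontIssueCount * 4;
console.log(`token weighted total: ${oldTokenTotal} -> ${newTokenTotal}`);
```

Under these assumptions the weighted token total doubles, which points in the same direction as the 49% to 16% drop reported for that fixture, although the mapping from weighted total to percentage is not shown in this thread.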
