feat: weight diversity scoring by rule severity#111
🚥 Pre-merge checks: ✅ 4 passed
let-sunny left a comment
Code Review

Overall, the problem definition is clear and the chosen approach is sound. A few pieces of feedback below.

1. Hardcoded TOTAL_SCORE_PER_CATEGORY → sync risk

Whenever a score in rule-config.ts changes or a rule is added or removed, this constant has to be updated by hand. The comment says "Must be updated", but in practice that is easy to forget. Consider computing it dynamically from rule-config.ts.
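A minimal sketch of that dynamic computation, assuming the RULE_CONFIGS / RULE_ID_CATEGORY shapes this PR works with (the inline rule data below is illustrative, not the real config):

```typescript
// Hypothetical stand-ins for the real rule-config.ts exports.
type Category = "structure" | "naming";

const RULE_ID_CATEGORY: Record<string, Category> = {
  "no-auto-layout": "structure",
  "unnecessary-node": "structure",
  "bad-name": "naming",
};

const RULE_CONFIGS: Record<string, { score: number; enabled: boolean }> = {
  "no-auto-layout": { score: -10, enabled: true },
  "unnecessary-node": { score: -2, enabled: true },
  "bad-name": { score: -5, enabled: true },
};

// Sum |score| per category once at module load — this stays in sync
// automatically when rules are added/removed or scores change.
const TOTAL_SCORE_PER_CATEGORY: Record<Category, number> = (() => {
  const totals: Record<Category, number> = { structure: 0, naming: 0 };
  for (const [id, config] of Object.entries(RULE_CONFIGS)) {
    if (config.enabled) {
      totals[RULE_ID_CATEGORY[id]] += Math.abs(config.score);
    }
  }
  return totals;
})();
```

With the illustrative data above, this yields structure = 12 and naming = 5; the real module would derive the totals from its own exports instead.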
2. Asymmetric score usage between density and diversity

- density: uses issue.calculatedScore (= base score × depthWeight) → reflects position (depth)
- diversity: uses issue.config.score (= base score only) → ignores position

This looks like a deliberate design choice (diversity measures "what kinds of problems exist", so depth is irrelevant), but documenting the rationale for the asymmetry in a code comment would reduce future confusion.
3. Edge case from the changed diversity semantics

Before: "how many different kinds of problems exist" (breadth)
After: "severity-weighted breadth"

Example: 3 suggestion rules trigger (scores -2, -1, -1):
- Before: 3/9 = 33% ratio → diversity 67%
- After: 4/42 = 9.5% ratio → diversity 90%

When many low-severity rules trigger, diversity can come out disproportionately high. Please confirm this is the intended behavior.
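The arithmetic behind those two ratios, sketched out (the 9-rule count and 42-point structure total are the figures quoted in this review):

```typescript
// Three low-severity suggestion rules trigger in "structure".
const triggeredScores = [-2, -1, -1];

// Old equal-count diversity: each triggered rule counts as 1/N.
const totalRuleCount = 9;
const oldRatio = triggeredScores.length / totalRuleCount; // 3/9 ≈ 0.333
const oldDiversity = Math.round((1 - oldRatio) * 100);    // 67

// New severity-weighted diversity: each rule counts as |score|/Σ|score|.
const totalScore = 42;
const weightedTriggered = triggeredScores.reduce(
  (sum, s) => sum + Math.abs(s),
  0
); // 4
const newRatio = weightedTriggered / totalScore;          // 4/42 ≈ 0.095
const newDiversity = Math.round((1 - newRatio) * 100);    // 90
```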
4. Re-evaluation condition from issue #106 not yet addressed

The body of issue #106 says:

Re-evaluate after #104 whether this change is still needed, or whether density alone provides sufficient differentiation.

Since #104 has already been merged as #110, it would be good to include in the PR description the result of re-evaluating whether density alone reflects severity differences sufficiently.
5. Tests & correctness

- The new test cases cover the key scenarios well ✓
- Cross-checked the TOTAL_SCORE_PER_CATEGORY values against rule-config.ts — all correct ✓
Review Feedback Addressed (5b4c80b)

1. ✅ Dynamic TOTAL_SCORE_PER_CATEGORY computation
2. ✅ Code comment on the density vs diversity asymmetry
3. ✅ Low-severity edge case
4. ✅ Re-evaluation result after #104
5. Tests & correctness
✅ LGTM — all five review feedback items confirmed addressed. Ready to merge.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/core/engine/scoring.test.ts`:
- Around line 89-117: Test currently uses makeIssue where opts.score populates
both issue.config.score and issue.calculatedScore, so it doesn't prove density
uses calculatedScore; change the fixture to create an issue whose config.score
and calculatedScore differ and assert density/weightedIssueCount follows
calculatedScore. Locate the tests using calculateScores, makeResult and
makeIssue (the two "uses calculatedScore..." and "differentiates rules..."
cases) and either call makeIssue with an explicit calculatedScore different from
config.score (or mutate the produced issue to set issue.calculatedScore) before
passing to makeResult, then keep the existing expectations on
byCategory.structure.densityScore and weightedIssueCount to ensure they reflect
calculatedScore rather than config.score.
In `@src/core/engine/scoring.ts`:
- Around line 69-82: TOTAL_SCORE_PER_CATEGORY is computed from the static
RULE_CONFIGS while weightedTriggered/issue.config.score come from the
preset-adjusted map returned by getConfigsWithPreset, causing mismatched
numerator/denominator; change the code to compute totals from the same preset
config map used to produce result.issues (i.e., derive totals from the config
object returned by getConfigsWithPreset instead of RULE_CONFIGS). Specifically,
when you call getConfigsWithPreset (or wherever result.issues' issue.config is
created), pass that resulting config map into the TOTAL_SCORE_PER_CATEGORY
calculation (or replace the current IIFE with a function that accepts the preset
config map) so totals are computed by iterating over that preset-adjusted config
map (use the same rule ids and RULE_ID_CATEGORY mapping) to sum
Math.abs(config.score) per category; ensure any other places noted (lines
~214-218) also use the preset config totals rather than the base RULE_CONFIGS.
🪄 Autofix (Beta)
✅ Autofix completed
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 10a9a73c-a096-450d-bd7a-c8c6829b65ec
📒 Files selected for processing (3)
- src/core/engine/scoring.test.ts
- src/core/engine/scoring.ts
- src/core/rules/rule-config.ts
Note: Autofix is a beta feature. Expect some limitations and changes as we gather feedback and continue to improve it.

Fixes applied successfully — fixed 4 file(s) based on 2 unresolved review comments.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/core/engine/scoring.test.ts (1)
32-46: 🧹 Nitpick | 🔵 Trivial — Consider adding a calculatedScore parameter to the makeIssue helper.

The tests at lines 91-95 and 110-114 manually override calculatedScore after calling makeIssue. Adding an optional parameter would be cleaner.

♻️ Suggested helper improvement

```diff
 function makeIssue(opts: {
   ruleId: string;
   category: Category;
   severity: Severity;
   score?: number;
+  calculatedScore?: number;
 }): AnalysisIssue {
   return {
     violation: makeViolation(opts.ruleId),
     rule: makeRule({ id: opts.ruleId, category: opts.category }),
     config: makeConfig(opts.severity, opts.score ?? -5),
     depth: 0,
     maxDepth: 5,
-    calculatedScore: opts.score ?? -5,
+    calculatedScore: opts.calculatedScore ?? opts.score ?? -5,
   };
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/core/engine/scoring.test.ts` around lines 32 - 46, The helper makeIssue currently sets calculatedScore from opts.score and tests override it manually; add an optional calculatedScore?: number field to the makeIssue opts and use it to set the returned object's calculatedScore (falling back to opts.score ?? -5) so tests can pass a specific calculatedScore directly; update the function signature and the returned object (references: makeIssue, makeRule, makeConfig, calculatedScore) and update callers in the tests to pass the new arg instead of mutating the result.
♻️ Duplicate comments (1)
src/core/engine/scoring.ts (1)
229-236: ⚠️ Potential issue | 🟡 Minor — Potential division by zero if a category has no enabled rules.

If totalScorePerCategory[category] is 0 (e.g., all rules in a category are disabled via preset), line 234 will produce Infinity or NaN.

🛡️ Proposed fix to guard against zero denominator

```diff
 let diversityScore = 100;
 if (catScore.issueCount > 0) {
   const ruleScores = ruleScorePerCategory.get(category)!;
   const weightedTriggered = Array.from(ruleScores.values()).reduce((sum, s) => sum + s, 0);
   const weightedTotal = totalScorePerCategory[category];
-  const diversityRatio = weightedTriggered / weightedTotal;
+  const diversityRatio = weightedTotal > 0 ? weightedTriggered / weightedTotal : 0;
   diversityScore = clamp(Math.round((1 - diversityRatio) * 100), 0, 100);
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/core/engine/scoring.ts` around lines 229 - 236, The division can produce Infinity/NaN when totalScorePerCategory[category] is zero; in the block that computes diversityScore (using catScore.issueCount, ruleScorePerCategory, totalScorePerCategory, and clamp) guard against a zero/negative weightedTotal by checking if weightedTotal <= 0 and in that case leave diversityScore at its default (100) or treat diversityRatio as 0 before computing clamp; ensure the check is added before computing diversityRatio so no division occurs.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/core/engine/scoring.test.ts`:
- Around line 179-195: The comment and thresholds rely on a hardcoded total
score "42" which can go stale; update the test in scoring.test.ts to compute the
total dynamically from the current RULE_CONFIGS (or derive it via existing
helpers) instead of embedding 42 in the comment and expectations, e.g., use
RULE_CONFIGS sum-of-absolute-scores (or call the same scoring helpers used by
calculateScores/makeIssue/makeResult) to compute the ratio explanation and keep
the assertions (expect(...diversityScore).toBeGreaterThan(90) etc.) valid
against changing rule weights; adjust the inline comment to reference the
computed total rather than the literal "42".
In `@src/core/engine/scoring.ts`:
- Around line 189-194: The fallback to RULE_CONFIGS when configs is falsy can
cause numerator/denominator mismatch if issues were generated with a preset;
update the function or surrounding JSDoc to clearly state that callers must
provide the same configs/preset used to produce issues to ensure denominator
alignment, and document this behavior next to the computeTotalScorePerCategory
usage and the configs parameter (and mention RULE_CONFIGS as the default
fallback) so callers know to always pass configs when using presets.
---
Outside diff comments:
In `@src/core/engine/scoring.test.ts`:
- Around line 32-46: The helper makeIssue currently sets calculatedScore from
opts.score and tests override it manually; add an optional calculatedScore?:
number field to the makeIssue opts and use it to set the returned object's
calculatedScore (falling back to opts.score ?? -5) so tests can pass a specific
calculatedScore directly; update the function signature and the returned object
(references: makeIssue, makeRule, makeConfig, calculatedScore) and update
callers in the tests to pass the new arg instead of mutating the result.
---
Duplicate comments:
In `@src/core/engine/scoring.ts`:
- Around line 229-236: The division can produce Infinity/NaN when
totalScorePerCategory[category] is zero; in the block that computes
diversityScore (using catScore.issueCount, ruleScorePerCategory,
totalScorePerCategory, and clamp) guard against a zero/negative weightedTotal by
checking if weightedTotal <= 0 and in that case leave diversityScore at its
default (100) or treat diversityRatio as 0 before computing clamp; ensure the
check is added before computing diversityRatio so no division occurs.
📒 Files selected for processing (4)
- src/cli/commands/analyze.ts
- src/core/engine/scoring.test.ts
- src/core/engine/scoring.ts
- src/mcp/server.ts
Replace equal-count diversity (each triggered rule = 1/N) with severity-weighted diversity (each triggered rule = |score|/totalScore).

Before: no-auto-layout (score -10) and unnecessary-node (score -2) both counted as 1/9 in structure diversity — a single concentrated blocking problem got a high diversity score.

After: no-auto-layout counts as 10/42, unnecessary-node as 2/42. Concentrated blocking issues now correctly lower the diversity score.

Closes #106
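The before/after per-rule weights described here can be sketched as follows (the 9-rule count and 42-point structure total come from this PR; everything else is illustrative):

```typescript
// Per-rule diversity weight, before and after this PR.
const structureScores: Record<string, number> = {
  "no-auto-layout": -10,  // blocking
  "unnecessary-node": -2, // suggestion
};
const RULE_COUNT = 9;   // old denominator: number of structure rules
const TOTAL_SCORE = 42; // new denominator: Σ|score| over structure rules

// Old: every triggered rule contributes equally.
const oldWeight = (_ruleId: string): number => 1 / RULE_COUNT;

// New: a rule's contribution is proportional to its severity score.
const newWeight = (ruleId: string): number =>
  Math.abs(structureScores[ruleId]) / TOTAL_SCORE;

// Old: both rules contribute 1/9 ≈ 0.111 to the diversity ratio.
// New: no-auto-layout contributes 10/42 ≈ 0.238,
//      unnecessary-node only 2/42 ≈ 0.048.
```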
1. Dynamic TOTAL_SCORE_PER_CATEGORY: Replace hardcoded values with computation from RULE_CONFIGS + RULE_ID_CATEGORY mapping, eliminating sync risk when rules are added/removed or scores change.
2. Document density vs diversity asymmetry: density uses calculatedScore (with depthWeight), diversity uses config.score (without depthWeight). Intentional — diversity measures "what types of problems", not "where".
3. Low-severity edge case: documented as intentional behavior with a test. Within structure, 1 suggestion rule (score -2) → diversity 95%; 1 blocking rule (score -10) → diversity 76%. This correctly reflects that low-severity issues are less concerning for implementation.
Fixed 4 file(s) based on 2 unresolved review comments. Co-authored-by: CodeRabbit <noreply@coderabbit.ai>
1. Division by zero guard: diversityRatio defaults to 0 when weightedTotal is 0 (all category rules disabled via preset).
2. Clarify fallback comment: document that omitting the configs param is only correct when issues were produced with the default RULE_CONFIGS.
3. Remove hardcoded "42" from the test comment to avoid stale values when calibration changes rule scores.
4. Restore trailing newlines removed by autofix in analyze.ts and server.ts.
Force-pushed 60841c1 to 8413767
Actionable comments posted: 1
♻️ Duplicate comments (1)
src/core/engine/scoring.test.ts (1)
179-194: 🧹 Nitpick | 🔵 Trivial — Make the diversity assertions calibration-proof.

These > 90 / < 80 checks still encode today's structure weights, so a calibration-only change in src/core/rules/rule-config.ts can fail the test without any regression in diversity logic. Derive the expected percentages from the current structure total instead of hard-coded thresholds. Based on learnings, src/core/rules/rule-config.ts is automatically adjusted by a nightly calibration pipeline.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/core/engine/scoring.test.ts` around lines 179 - 194, Replace the hard-coded >90 / <80 assertions with thresholds computed from the current "structure" category total in the rule configuration: import or access the structure category total from the rule-config (e.g., via a getCategoryTotal("structure") or the exported config in rule-config.ts), compute expectedPercentageLow = (lowSeverity.byCategory.structure.diversityScore / structureTotal) * 100 and expectedPercentageHigh = (highSeverity.byCategory.structure.diversityScore / structureTotal) * 100 (or otherwise compute the percentages consistently), and assert those relative values instead of fixed numeric literals so the test adapts to nightly calibration changes; update the assertions that reference calculateScores, makeResult, makeIssue, and the byCategory.structure.diversityScore accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/core/engine/scoring.ts`:
- Around line 70-84: computeTotalScorePerCategory currently uses the built-in
RULE_ID_CATEGORY lookup so any custom rules merged into configs are ignored;
update computeTotalScorePerCategory to derive the category per rule id from the
active rule definitions/category map used by calculateScores (e.g.,
RULE_DEFINITIONS[id].category or an activeRuleCategoryMap passed in) instead of
RULE_ID_CATEGORY, and include enabled custom rules' Math.abs(config.score) in
the totals so weightedTotal/weightedTriggered operate over the same rule set.
---
Duplicate comments:
In `@src/core/engine/scoring.test.ts`:
- Around line 179-194: Replace the hard-coded >90 / <80 assertions with
thresholds computed from the current "structure" category total in the rule
configuration: import or access the structure category total from the
rule-config (e.g., via a getCategoryTotal("structure") or the exported config in
rule-config.ts), compute expectedPercentageLow =
(lowSeverity.byCategory.structure.diversityScore / structureTotal) * 100 and
expectedPercentageHigh = (highSeverity.byCategory.structure.diversityScore /
structureTotal) * 100 (or otherwise compute the percentages consistently), and
assert those relative values instead of fixed numeric literals so the test
adapts to nightly calibration changes; update the assertions that reference
calculateScores, makeResult, makeIssue, and the
byCategory.structure.diversityScore accordingly.
📒 Files selected for processing (5)
- src/cli/commands/analyze.ts
- src/core/engine/scoring.test.ts
- src/core/engine/scoring.ts
- src/core/rules/rule-config.ts
- src/mcp/server.ts
```ts
function computeTotalScorePerCategory(
  configs: Record<RuleId, RuleConfig>
): Record<Category, number> {
  const totals = Object.fromEntries(
    CATEGORIES.map(c => [c, 0])
  ) as Record<Category, number>;

  for (const [id, config] of Object.entries(configs)) {
    const category = RULE_ID_CATEGORY[id as RuleId];
    if (category && config.enabled) {
      totals[category] += Math.abs(config.score);
    }
  }

  return totals;
}
```
Include custom rules in the diversity denominator.
RULE_ID_CATEGORY only covers built-in IDs, but both updated call sites now merge custom rule configs before invoking calculateScores. Any enabled custom rule is skipped here, while Line 205 still counts it once triggered, so weightedTotal and weightedTriggered stop describing the same rule set. That over-penalizes custom rules and can even leave diversity at 100 when a category only has custom rules enabled. Build the denominator from the active rule definitions/category map, not just the built-in lookup.
Also applies to: 205-205
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/core/engine/scoring.ts` around lines 70 - 84,
computeTotalScorePerCategory currently uses the built-in RULE_ID_CATEGORY lookup
so any custom rules merged into configs are ignored; update
computeTotalScorePerCategory to derive the category per rule id from the active
rule definitions/category map used by calculateScores (e.g.,
RULE_DEFINITIONS[id].category or an activeRuleCategoryMap passed in) instead of
RULE_ID_CATEGORY, and include enabled custom rules' Math.abs(config.score) in
the totals so weightedTotal/weightedTriggered operate over the same rule set.
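One possible shape for that fix, as a sketch rather than the repo's actual API: derive each rule's category from the active rule definition itself instead of the built-in RULE_ID_CATEGORY lookup, so custom rules (whose ids that lookup does not know) also land in the denominator. The RuleDef/activeRules names are assumptions.

```typescript
type Category = "structure" | "naming";

interface RuleDef { id: string; category: Category; }
interface RuleConfig { score: number; enabled: boolean; }

// Build the denominator from the active rule set (built-in + custom),
// so every rule that can appear in weightedTriggered is also counted here.
function computeTotalScorePerCategory(
  activeRules: RuleDef[],
  configs: Record<string, RuleConfig>
): Record<Category, number> {
  const totals: Record<Category, number> = { structure: 0, naming: 0 };
  for (const rule of activeRules) {
    const config = configs[rule.id];
    if (config?.enabled) {
      totals[rule.category] += Math.abs(config.score);
    }
  }
  return totals;
}
```

Because the same rule list drives both the numerator and this denominator, an enabled custom rule can no longer be counted on one side and skipped on the other.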
Summary

- Replace equal-count diversity (triggered rules / total rules) with severity-weighted diversity (Σ|score| of triggered / Σ|score| of all)
- TOTAL_SCORE_PER_CATEGORY computed dynamically from RULE_CONFIGS + RULE_ID_CATEGORY (no hardcoded sync risk)

Re-evaluation: Is density alone sufficient after #104?
After #110 landed calculatedScore-based density, density does differentiate rules by score. However, diversity still adds value in a specific scenario: density alone misses breadth.

Consider two designs with identical density (same weightedIssueCount / nodeCount): Design A's issues all come from no-auto-layout (1 rule, systematic), while Design B's issues span 4 different rules. Density scores are identical, but Design B is harder to fix — 4 different problems require 4 different solutions. Diversity captures this by penalizing breadth.
Severity-weighting makes diversity more accurate: in old equal-count diversity, triggering 1 blocking rule and 1 suggestion rule had equal impact. Now blocking rules correctly dominate the diversity penalty.
Conclusion: Diversity remains necessary as a complement to density. The change in this PR makes it more accurate by aligning diversity penalties with actual rule severity.
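To make the breadth argument concrete, a sketch with hypothetical per-design rule sets (only the diversity formula, the 42-point structure total, and no-auto-layout's -10 score come from this PR; Design B's -5/-3/-2 scores are illustrative):

```typescript
// Distinct triggered rules per design, by score.
const totalScore = 42;             // Σ|score| over structure rules
const designA = [-10];             // 1 rule, many violations (concentrated)
const designB = [-10, -5, -3, -2]; // 4 different rules (spread)

// Severity-weighted diversity: ratio of triggered |score| mass to total.
const ratio = (scores: number[]): number =>
  scores.reduce((sum, s) => sum + Math.abs(s), 0) / totalScore;

const diversityA = Math.round((1 - ratio(designA)) * 100); // 10/42 → 76
const diversityB = Math.round((1 - ratio(designB)) * 100); // 20/42 → 52
```

Even when density is equal, the design that spreads its score mass across more rules gets the lower diversity score, which is exactly the breadth penalty argued for above.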
Edge case: low-severity rules
3 suggestion rules (score -2, -1, -1) triggering in structure → weighted ratio 4/42 = 9.5% → diversity 90%.
This is intentional: low-severity issues represent minor concerns that shouldn't heavily penalize the design score.
Depends on
Closes #106
Summary by CodeRabbit
Improvements
Tests