experiment: ablation Phase 2 — generalization across 6 new fixtures#150
Merged
Strip experiments (5 types × 6 new fixtures):
- layout-direction-spacing: ΔV=+7.5% (up from +3.3% in Phase 1)
- style-references: ΔV=+3.5% (emerged as significant, was noise in P1)
- variable-references: +1.5% (borderline)
- component-references: 0.0% pixel, CSS classes -15
- node-names-hierarchy: -1.0% (no impact)

Size-constraints responsive (6 fixtures):
- Average ΔV=+15.9% across 7 valid results
- mobile-shop: ΔV=+46% (extreme: complex layout breaks without size info)
- Baseline HTML reused from cache (no redundant API calls)

run-condition.ts: baseline reuse from phase1 cache
run-responsive.ts: local-only responsive comparison utility

Total Phase 2 cost: ~$13

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
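The baseline reuse mentioned in the commit message can be pictured roughly as follows. This is a minimal sketch, not the code in run-condition.ts: `getBaselineHtml`, `BaselineCache`, and `GenerateHtml` are hypothetical names, and the cache is modeled as an in-memory map rather than whatever on-disk layout the Phase 1 cache actually uses.

```typescript
// Hypothetical sketch of "reuse baseline HTML from cache".
// None of these names come from run-condition.ts.

// Stand-in for the expensive step (an API call that produces baseline HTML).
type GenerateHtml = (fixture: string) => Promise<string>;

// Minimal cache interface; a Map<string, string> satisfies it.
interface BaselineCache {
  get(fixture: string): string | undefined;
  set(fixture: string, html: string): void;
}

async function getBaselineHtml(
  fixture: string,
  cache: BaselineCache,
  generate: GenerateHtml,
): Promise<{ html: string; fromCache: boolean }> {
  // Cache hit: skip the generator entirely, so no redundant API call is made.
  const cached = cache.get(fixture);
  if (cached !== undefined) {
    return { html: cached, fromCache: true };
  }

  // Cache miss: generate once and store the result for later runs.
  const html = await generate(fixture);
  cache.set(fixture, html);
  return { html, fromCache: false };
}
```

With this shape, a second request for the same fixture (for example `mobile-shop`) returns the stored HTML and never invokes `generate`, which is where the cost saving would come from.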
let-sunny added a commit that referenced this pull request Mar 28, 2026
…eriment data

Categories changed from intuition-based (structure/token/component/naming/behavior) to experiment-based (pixel-critical/responsive-critical/code-quality/token-management/minor). Removed 10 low-impact rules, merged 6 rules into 2 (raw-value, irregular-spacing). Scores recalibrated using Phase 1+2 ablation results (PR #149, #150).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This was referenced Mar 28, 2026
let-sunny added a commit that referenced this pull request Mar 29, 2026
#154)

* refactor: reorganize rule categories and scores based on ablation experiment data

Categories changed from intuition-based (structure/token/component/naming/behavior) to experiment-based (pixel-critical/responsive-critical/code-quality/token-management/minor). Removed 10 low-impact rules, merged 6 rules into 2 (raw-value, irregular-spacing). Scores recalibrated using Phase 1+2 ablation results (PR #149, #150).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address CodeRabbit review - missing category, score, duplication

- Add missing responsive-critical to orchestrator + report-generator tests
- Restore missing-component score to -7 (CLAUDE.md guideline)
- Fix raw-value font check: flag partial tokenization (fontFamily OR fontSize)
- Reuse CATEGORY_LABELS in getCategoryLabel (remove duplication)
- Align example config with baseline defaults

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add subType to RuleViolation and centralize rule messages

- Add optional subType field to RuleViolation for programmatic grouping (e.g., raw-value has color/font/shadow/opacity/spacing sub-types)
- Create rule-messages.ts with all message template functions
- Replace inline message strings in all 15 rules with centralized constants
- Include subType in JSON output (buildResultJson)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: remove missing-responsive-behavior (duplicate of no-auto-layout), add subType to default-name and irregular-spacing

- Remove missing-responsive-behavior rule (15→14 rules): duplicates no-auto-layout
- Add subType to default-name: frame/group/vector/shape/text/image/component/instance
- Add subType to irregular-spacing: padding/gap
- Separate vector from shape in default-name subType for granular control

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: rename group-usage → non-layout-container, add Section detection

- Rename rule: group-usage → non-layout-container (broader scope)
- Add Section detection: Sections used as layout containers are flagged
- SubTypes: group (blocking, -8) and section (same score, flagged only with children)
- Remove missing-responsive-behavior (duplicate of no-auto-layout)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add subType to missing-size-constraint (max-width/min-width/wrap/grid)

Messages pre-defined for all 4 sub-types. Currently only max-width is detected; min-width, wrap, and grid conditions to be implemented in #152.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address CodeRabbit review round 2

- Add subType regression test in buildResultJson
- Use CATEGORIES import in report-generator.test.ts (prevent taxonomy drift)
- Legacy config deprecation → #156 (separate issue)
- raw-value single-violation-per-node → by design (RuleCheckFn contract)
- SECTION detection condition kept as-is (semantic misuse = valid flag)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: add empty-string subType edge case test

Verifies that falsy subType (empty string) is omitted from JSON output, matching the conditional spread guard in buildResultJson.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: reject unknown rule IDs in config-loader

Config files with invalid rule IDs now throw with a clear error message listing the unknown IDs and all valid options. No legacy alias mapping needed (no existing users to support).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address CodeRabbit review round 3+4

- Guard indexed array access with length assertion (noUncheckedIndexedAccess)
- Strengthen unknown rule ID test to assert both error and valid IDs list

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
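Two behaviors described in the commits above are easier to follow with a small example: rejecting unknown rule IDs in the config loader, and omitting a falsy subType from JSON output via a conditional spread. The sketch below is an assumption-laden illustration, not the repository's code: `VALID_RULE_IDS` is a partial list built only from rule names mentioned in these commit messages, and `validateRuleIds` and `toViolationJson` are hypothetical stand-ins for the real config-loader and buildResultJson logic.

```typescript
// Hypothetical subset of rule IDs, taken from names that appear in the
// commit messages. The real list has 14 rules.
const VALID_RULE_IDS = [
  "raw-value",
  "irregular-spacing",
  "default-name",
  "non-layout-container",
  "missing-size-constraint",
  "no-auto-layout",
  "missing-component",
] as const;

// Throws if any configured rule ID is not in the known list. The error
// names the unknown IDs and lists every valid option.
function validateRuleIds(ruleIds: string[]): void {
  const valid = new Set<string>(VALID_RULE_IDS);
  const unknown = ruleIds.filter((id) => !valid.has(id));
  if (unknown.length > 0) {
    throw new Error(
      `Unknown rule ID(s): ${unknown.join(", ")}. ` +
        `Valid IDs: ${VALID_RULE_IDS.join(", ")}`,
    );
  }
}

// Simplified violation shape; only subType is confirmed by the commits.
interface RuleViolation {
  ruleId: string;
  message: string;
  subType?: string;
}

// Conditional spread: subType is included only when it is truthy, so both
// undefined and the empty string are left out of the JSON object.
function toViolationJson(v: RuleViolation): Record<string, string> {
  return {
    ruleId: v.ruleId,
    message: v.message,
    ...(v.subType ? { subType: v.subType } : {}),
  };
}
```

Under these assumptions, a config that still uses the old `group-usage` ID would be rejected, since that rule was renamed to `non-layout-container` and no alias mapping is kept.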
This was referenced Mar 29, 2026
Phase 2: Generalization
Goal: verify that the Phase 1 results (3 desktop fixtures) reproduce across different fixtures.
Fixtures (6)
Experiments
Cost estimate: ~$16
Hover-interaction: skipped (confirmed as "cannot implement" in Phase 1)
Wiki: Experiment 05
🤖 Generated with Claude Code