experiment: ablation Phase 1 results #149

Merged: let-sunny merged 2 commits into main from run/ablation-phase1 on Mar 28, 2026

@let-sunny let-sunny commented Mar 28, 2026

Ablation Phase 1 Results

Full experiment data for Experiment 05 on the wiki.

Key findings

| Type | ΔPixel | ΔResponsive | Code Impact | Verdict |
|---|---|---|---|---|
| layout-direction-spacing | +3.3% | +7% at 1920px | Code decreases | Critical (affects pixel accuracy) |
| size-constraints | 0% at 1200px | +7% at 1920px | Minor | Significant for responsive |
| component-references | ~0% | - | CSS classes -12, code +3KB | Code quality impact |
| style-references | ~0% | - | CSS classes -5, code +2.8KB | Minor code quality |
| variable-references | ~0% | - | Minor input savings | Minor |
| node-names-hierarchy | ~0% | - | Minor | Minor |
| hover-interaction | - | - | 0 rules without data | Cannot implement |

Included

  • Strip experiment data (5 types × 3 fixtures)
  • Condition experiment data (size-constraints, hover-interaction)
  • Baseline data (3 fixtures)
  • exportScale fix + recompare.ts utility
  • Fixture rebuild with getImageFills

Total cost: $10.55 (30 API calls)

🤖 Generated with Claude Code

let-sunny and others added 2 commits March 28, 2026 17:21
A 2400px-wide screenshot was detected as @1x (width > 1500), but it is
actually the @2x export of a 1200px layout. Use the known @1x widths
(1920, 768) instead of a threshold, and default to @2x for fixture screenshots.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
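
The detection rule described in the commit above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in this PR: `KNOWN_1X_WIDTHS`, `detectExportScale`, and `logicalWidth` are hypothetical names, and only the widths 1920 and 768 and the @2x default come from the commit message.

```typescript
// Hypothetical sketch of the exportScale rule; names are illustrative.
// Widths known to be exported at @1x (taken from the commit message).
const KNOWN_1X_WIDTHS: ReadonlySet<number> = new Set([1920, 768]);

// Returns the assumed export scale for a fixture screenshot of the given
// pixel width. Anything that is not a known @1x width is treated as @2x.
function detectExportScale(screenshotWidthPx: number): 1 | 2 {
  return KNOWN_1X_WIDTHS.has(screenshotWidthPx) ? 1 : 2;
}

// Converts a screenshot width back to its logical (CSS pixel) width.
function logicalWidth(screenshotWidthPx: number): number {
  return screenshotWidthPx / detectExportScale(screenshotWidthPx);
}
```

With this rule a 2400px screenshot resolves to @2x and a logical width of 1200px, which is the case the old `> 1500` threshold got wrong.
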
Strip experiments (5 types × 3 fixtures):
- layout-direction-spacing: ΔV=+3.3% (pixel), +7% (responsive 1920px)
- component-references: ΔV≈0, CSS classes -12, code +3KB
- style-references: ΔV≈0, CSS classes -5, code +2.8KB
- variable-references: ΔV≈0, minor input savings
- node-names-hierarchy: ΔV≈0, minor

Condition experiments:
- size-constraints @1920px: ΔV=+7% (significant for responsive)
- hover-interaction: 0 :hover rules without data (cannot implement)

Fixes applied during experiment:
- exportScale detection for @2x screenshots
- Fixture rebuild with getImageFills (no text overlay)
- recompare.ts for local re-rendering without API calls

Total cost: $10.55 (30 API calls)
Wiki: Experiment-05-ablation-phase1

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@let-sunny let-sunny marked this pull request as ready for review March 28, 2026 10:30
@let-sunny let-sunny merged commit 7396b06 into main Mar 28, 2026
2 checks passed
let-sunny added a commit that referenced this pull request Mar 28, 2026
…eriment data

Categories changed from intuition-based (structure/token/component/naming/behavior)
to experiment-based (pixel-critical/responsive-critical/code-quality/token-management/minor).
Removed 10 low-impact rules, merged 6 rules into 2 (raw-value, irregular-spacing).
Scores recalibrated using Phase 1+2 ablation results (PR #149, #150).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
let-sunny added a commit that referenced this pull request Mar 29, 2026
#154)

* refactor: reorganize rule categories and scores based on ablation experiment data

Categories changed from intuition-based (structure/token/component/naming/behavior)
to experiment-based (pixel-critical/responsive-critical/code-quality/token-management/minor).
Removed 10 low-impact rules, merged 6 rules into 2 (raw-value, irregular-spacing).
Scores recalibrated using Phase 1+2 ablation results (PR #149, #150).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address CodeRabbit review — missing category, score, duplication

- Add missing responsive-critical to orchestrator + report-generator tests
- Restore missing-component score to -7 (CLAUDE.md guideline)
- Fix raw-value font check: flag partial tokenization (fontFamily OR fontSize)
- Reuse CATEGORY_LABELS in getCategoryLabel (remove duplication)
- Align example config with baseline defaults

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
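
For illustration, the experiment-based taxonomy and the `CATEGORY_LABELS` reuse mentioned above might look something like this. The identifiers `CATEGORIES`, `CATEGORY_LABELS`, and `getCategoryLabel` appear in the commit messages, but their shapes here are assumptions, and the label strings are invented for the example.

```typescript
// Category IDs as listed in the refactor commit message.
const CATEGORIES = [
  "pixel-critical",
  "responsive-critical",
  "code-quality",
  "token-management",
  "minor",
] as const;

type Category = (typeof CATEGORIES)[number];

// Label text is illustrative only; the real labels are not shown in this thread.
const CATEGORY_LABELS: Record<Category, string> = {
  "pixel-critical": "Pixel Critical",
  "responsive-critical": "Responsive Critical",
  "code-quality": "Code Quality",
  "token-management": "Token Management",
  minor: "Minor",
};

// Single lookup path, so labels are not duplicated elsewhere.
function getCategoryLabel(category: Category): string {
  return CATEGORY_LABELS[category];
}
```

Typing the label map as `Record<Category, string>` means a category that is added to `CATEGORIES` but left out of the map fails at compile time, which is one way to catch the kind of missing-category gap the review flagged.
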

* feat: add subType to RuleViolation and centralize rule messages

- Add optional subType field to RuleViolation for programmatic grouping
  (e.g., raw-value has color/font/shadow/opacity/spacing sub-types)
- Create rule-messages.ts with all message template functions
- Replace inline message strings in all 15 rules with centralized constants
- Include subType in JSON output (buildResultJson)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: remove missing-responsive-behavior (duplicate of no-auto-layout), add subType to default-name and irregular-spacing

- Remove missing-responsive-behavior rule (15→14 rules) — duplicates no-auto-layout
- Add subType to default-name: frame/group/vector/shape/text/image/component/instance
- Add subType to irregular-spacing: padding/gap
- Separate vector from shape in default-name subType for granular control

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: rename group-usage → non-layout-container, add Section detection

- Rename rule: group-usage → non-layout-container (broader scope)
- Add Section detection: Sections used as layout containers are flagged
- SubTypes: group (blocking, -8) and section (same score, flagged only with children)
- Remove missing-responsive-behavior (duplicate of no-auto-layout)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add subType to missing-size-constraint (max-width/min-width/wrap/grid)

Messages pre-defined for all 4 sub-types. Currently only max-width is
detected — min-width, wrap, and grid conditions to be implemented in #152.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address CodeRabbit review round 2

- Add subType regression test in buildResultJson
- Use CATEGORIES import in report-generator.test.ts (prevent taxonomy drift)
- Legacy config deprecation → #156 (separate issue)
- raw-value single-violation-per-node → by design (RuleCheckFn contract)
- SECTION detection condition kept as-is (semantic misuse = valid flag)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: add empty-string subType edge case test

Verifies that falsy subType (empty string) is omitted from JSON output,
matching the conditional spread guard in buildResultJson.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
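
The conditional spread guard this test exercises can be sketched as below. The real `RuleViolation` type and `buildResultJson` are not shown in this thread, so `ViolationJson` and `violationToJson` are hypothetical, simplified stand-ins; only the optional `subType` field and the "omit when falsy" behavior come from the commit messages.

```typescript
// Simplified, assumed shape; the real RuleViolation likely has more fields.
interface RuleViolation {
  ruleId: string;
  message: string;
  subType?: string; // e.g. color/font/shadow/opacity/spacing for raw-value
}

interface ViolationJson {
  ruleId: string;
  message: string;
  subType?: string;
}

// Hypothetical stand-in for the per-violation part of buildResultJson.
// The spread adds subType only when it is truthy, so both undefined and
// the empty string are left out of the JSON output.
function violationToJson(v: RuleViolation): ViolationJson {
  return {
    ruleId: v.ruleId,
    message: v.message,
    ...(v.subType ? { subType: v.subType } : {}),
  };
}
```
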

* fix: reject unknown rule IDs in config-loader

Config files with invalid rule IDs now throw with a clear error message
listing the unknown IDs and all valid options. No legacy alias mapping
needed — no existing users to support.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
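
A minimal sketch of this kind of validation, assuming the config exposes its rule IDs as a list of strings. `VALID_RULE_IDS` and `assertKnownRuleIds` are hypothetical names, the ID list is only the subset of rules named in this thread, and the error wording is invented.

```typescript
// Subset of rule IDs mentioned in this thread; the real list has more entries.
const VALID_RULE_IDS = [
  "raw-value",
  "irregular-spacing",
  "default-name",
  "non-layout-container",
  "missing-size-constraint",
  "no-auto-layout",
  "missing-component",
] as const;

// Throws if any configured rule ID is not recognized. The error lists the
// unknown IDs and all valid options; there is no legacy alias mapping.
function assertKnownRuleIds(configRuleIds: readonly string[]): void {
  const valid = new Set<string>(VALID_RULE_IDS);
  const unknown = configRuleIds.filter((id) => !valid.has(id));
  if (unknown.length > 0) {
    throw new Error(
      `Unknown rule ID(s): ${unknown.join(", ")}. ` +
        `Valid rule IDs: ${VALID_RULE_IDS.join(", ")}`
    );
  }
}
```

Under this sketch a config that still uses the old `group-usage` ID fails immediately, and the message points at the valid IDs, including `non-layout-container`.
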

* fix: address CodeRabbit review round 3+4

- Guard indexed array access with length assertion (noUncheckedIndexedAccess)
- Strengthen unknown rule ID test to assert both error and valid IDs list

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@let-sunny let-sunny deleted the run/ablation-phase1 branch March 29, 2026 00:42