feat: ablation v2 — redesigned experiment with multi-metric measurement#147
Walkthrough
Adds an ablation experiment framework: new helpers, two runner scripts (phase1 and condition-based), documentation updates, and a refactor of design-tree strip types and strip functions to support targeted information-removal experiments, plus updated tests.
Sequence Diagram(s)
sequenceDiagram
participant Runner as Experiment Runner
participant Helpers as helpers.ts
participant API as Anthropic API
participant Renderer as Render Engine
participant Comparator as Screenshot Comparator
Runner->>Helpers: parseFixtures(), build/strip design trees
Runner->>Helpers: callApi(prompt, designTree)
Helpers->>API: POST /messages (streaming)
API-->>Helpers: streamed Message
Helpers->>Helpers: extractHtml() → sanitizeHtml() → injectLocalFont()
Helpers-->>Runner: processed HTML
Runner->>Renderer: render HTML -> PNG
Renderer-->>Runner: rendered PNG
Runner->>Helpers: renderAndCompare(renderedPNG, figmaPNG)
Helpers->>Comparator: compare images
Comparator-->>Helpers: similarity score
Helpers-->>Runner: similarity & metrics
Runner->>Runner: write result.json & artifacts
Estimated code review effort: 4 (Complex) | ~75 minutes
…ial types
Strip type changes:
- design-token-references → variable-references + style-references (variable = /* var:... */, style = /* text-style:... */)
- DesignTreeInfoType now has 6 experiment-relevant types only
- DesignTreeStripType retains all 13 for utility use
- Removed trivially obvious types from experiment list: color-values, typography, shadows-effects, overflow-text-behavior, hover-interaction-states, position-stacking, component-descriptions
Experiment types (6): layout-direction-spacing, size-constraints, component-references, node-names-hierarchy, variable-references, style-references
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
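The split above can be pictured as two independent line filters. This is a minimal sketch, assuming the annotations appear as trailing `/* var:... */` and `/* text-style:... */` comments exactly as the commit message describes; the function names and regexes are hypothetical and are not taken from design-tree-strip.ts.

```typescript
// Hypothetical patterns: an optional run of whitespace, then an annotation comment.
const VARIABLE_REF = /\s*\/\*\s*var:[^*]*\*\//g;
const STYLE_REF = /\s*\/\*\s*text-style:[^*]*\*\//g;

// Removes only variable annotations; style annotations are left untouched.
function stripVariableReferences(line: string): string {
  return line.replace(VARIABLE_REF, "");
}

// Removes only text-style annotations; variable annotations are left untouched.
function stripStyleReferences(line: string): string {
  return line.replace(STYLE_REF, "");
}

const sampleLine = "color: #fff /* var:color/primary */; font: 16px /* text-style:Body */";
const withoutVariables = stripVariableReferences(sampleLine);
const withoutStyles = stripStyleReferences(sampleLine);
```

Keeping the two filters separate is what lets an experiment remove one kind of reference while holding the other constant.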
Strip types:
- Split design-token-references → variable-references + style-references
- DesignTreeInfoType: 6 experiment types (layout, size, component, naming, variable, style)
- DesignTreeStripType: all 13 retained for utility
- Pruned trivially obvious types from experiments
run-phase1.ts rewrite:
- Cache versioning: logs/ablation/phase1/{config-version}/... (never deleted)
- run-phase1.ts excluded from config hash (parsing changes don't affect similarity)
- Multi-metric: pixel, input/output tokens, HTML bytes/lines, CSS classes, CSS variables
- ABLATION_TYPES env var for running specific types
- Removed interpretations (unreliable parsing)
- Rankings show all 5 delta metrics
rule-config.ts:
- TODO comment: rename "behavior" → "interaction"
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
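The non-pixel metrics listed above (HTML bytes/lines, CSS classes, CSS variables) can be computed from the generated HTML alone. The following is a rough sketch under stated assumptions: `computeHtmlMetrics` and `collectCaptures` are hypothetical names, "CSS classes" is assumed to mean unique names in `class="..."` attributes, and "CSS variables" is assumed to mean unique `--name:` declarations. The actual definitions in helpers.ts may differ.

```typescript
interface HtmlMetrics {
  htmlBytes: number;
  htmlLines: number;
  cssClasses: number;
  cssVariables: number;
}

// Collects the first capture group of every match of `pattern` in `text`.
function collectCaptures(text: string, pattern: RegExp): Set<string> {
  const found = new Set<string>();
  const re = new RegExp(pattern.source, "g");
  let m: RegExpExecArray | null;
  while ((m = re.exec(text)) !== null) {
    found.add(m[1] ?? m[0]);
  }
  return found;
}

function computeHtmlMetrics(html: string): HtmlMetrics {
  // Unique class names across all class attributes (assumed definition).
  const classNames = new Set<string>();
  collectCaptures(html, /class="([^"]*)"/).forEach((attr) => {
    attr.split(/\s+/).filter((c) => c.length > 0).forEach((c) => classNames.add(c));
  });
  // Unique custom-property declarations such as `--gap: 8px` (assumed definition).
  const variables = collectCaptures(html, /(--[A-Za-z0-9_-]+)\s*:/);
  return {
    htmlBytes: new TextEncoder().encode(html).length, // UTF-8 byte length
    htmlLines: html.split("\n").length,
    cssClasses: classNames.size,
    cssVariables: variables.size,
  };
}

const sampleHtml =
  '<style>:root{--gap:8px;--fg:#000}.card{padding:var(--gap)}</style>\n' +
  '<div class="card big"><p class="card">hi</p></div>';
const sampleMetrics = computeHtmlMetrics(sampleHtml);
```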
…strip run-condition.ts: separate script for experiments that change viewport or prompt instead of stripping data. - size-constraints: reuse baseline HTML → render at 1920px → compare against screenshot-1920.png (no API call needed) - hover-interaction: strip [hover]: data → new API call → compare hover CSS count with vs without data size-constraints removed from DesignTreeInfoType (ΔV=0% at same viewport). Strip experiments (5): layout, component, naming, variable, style Condition experiments (2): size at 1920px, hover with/without data Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…tion
size-constraints:
- Strip via stripDesignTree("size-constraints")
- API call for both baseline + stripped
- Remove root fixed width from generated HTML
- Render at expanded viewport (1920px desktop, 768px mobile)
- Compare both against expanded screenshot
hover-interaction:
- Strip [hover]: via stripDesignTree("hover-interaction-states")
- API call for both with + without hover data
- Extract :hover CSS rules and values for comparison
- Compare whether AI invents hover or follows data exactly
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Prevents overflow: hidden on body (blocks scrolling) and ensures consistent root element structure across runs.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
helpers.ts: shared utilities for both scripts
- API call with retry, HTML parsing, sanitization, font injection
- CSS metrics (class/variable count)
- Render + compare + crop pipeline
- Fixture validation, screenshot paths
run-phase1.ts: 677 → 296 lines (strip experiments only)
run-condition.ts: 389 → 184 lines (size + hover condition experiments)
Total: 1066 → 707 lines (34% reduction)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Document run-phase1.ts, run-condition.ts, and helpers.ts with usage examples and test flow for next session.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Skip all strip types and run baseline only. Useful for testing the rendering pipeline before committing to a full experiment.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Figma calculates all dimensions with the border-box model. Without this reset, padding adds to element width, causing layout drift.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
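To illustrate the border-box reset described above: under the CSS default (`content-box`), an element declared 100px wide with 16px of horizontal padding on each side renders 132px wide, while under `border-box` it stays 100px. A minimal sketch of injecting such a reset into generated HTML follows; `injectBoxSizingReset` is a hypothetical name, and the PR's actual mechanism may differ (it could, for example, be a prompt instruction rather than post-processing).

```typescript
const BOX_SIZING_RESET =
  "<style>*,*::before,*::after{box-sizing:border-box}</style>";

// Inserts the reset right after the opening <head> tag, or prepends it
// when the document has no <head> (assumed fallback behavior).
function injectBoxSizingReset(html: string): string {
  const headOpen = /<head(\s[^>]*)?>/i.exec(html);
  if (headOpen === null) return BOX_SIZING_RESET + html;
  const insertAt = headOpen.index + headOpen[0].length;
  return html.slice(0, insertAt) + BOX_SIZING_RESET + html.slice(insertAt);
}

const withHead = injectBoxSizingReset("<html><head><title>t</title></head><body></body></html>");
const withoutHead = injectBoxSizingReset("<div>bare fragment</div>");
```

Placing the reset first in `<head>` means any later, more specific rule in the generated stylesheet can still override it.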
logs/ is gitignored so results are lost across machines. data/ is tracked, enabling cloud execution and result sharing.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Multiple cloud agents can split work by fixture/type. Same config-version shares one output directory. Warns about fixture+type+run conflict.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Actionable comments posted: 5
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/agents/ablation/helpers.ts`:
- Around line 128-152: The retry backoff in callApi uses a fixed exponential
delay (delay = Math.pow(2, attempt + 1) * 1000) which can cause synchronized
retries; modify the delay calculation to add random jitter before awaiting so
retries are randomized (e.g., compute baseDelay using the existing Math.pow
formula then apply a jitter factor or add/subtract a small random amount), keep
MAX_RETRIES and the 429/529 conditional unchanged, and use the jittered delay in
the setTimeout await to reduce thundering-herd effects.
In `@src/agents/ablation/run-condition.ts`:
- Around line 128-131: The current hover-rule extraction using
baseHtml.match(...) and stripHtml.match(...) can return false positives from
inside comments or string literals; before applying the existing regex (the
lines that set baseHoverRules and stripHoverRules) sanitize baseHtml and
stripHtml by stripping CSS comments (/* ... */), HTML comments (<!-- ... -->)
and quoted string contents (single and double quotes) so the regex only runs
over real CSS rules; update the code that computes
baseHoverRules/stripHoverRules to operate on these sanitized copies (e.g.,
sanitizedBaseHtml/sanitizedStripHtml) while keeping the same regex to preserve
behavior.
In `@src/agents/ablation/run-phase1.ts`:
- Around line 229-231: requestedTypes is being unsafely cast from the
ABLATION_TYPES env var; instead validate each split value against the canonical
DESIGN_TREE_INFO_TYPES array/Set before using it (used later by
stripDesignTree). Parse process.env["ABLATION_TYPES"] into strings, filter them
to only include entries present in DESIGN_TREE_INFO_TYPES (or map to the exact
enum/type names), and if the resulting array is empty treat requestedTypes as
null; update the variable initialization where requestedTypes is defined to
perform this validation and filtering.
In `@src/core/engine/design-tree-strip.test.ts`:
- Around line 4-10: The test duplicates the list of DesignTreeStripType values
causing potential drift; export the canonical array from the source module
(e.g., add an exported constant ALL_STRIP_TYPES: readonly DesignTreeStripType[]
in design-tree-strip.ts) and update the test (design-tree-strip.test.ts) to
import that exported ALL_STRIP_TYPES and use it instead of the inline array;
ensure the exported symbol name matches the test import and that the type
DesignTreeStripType is used for the constant.
In `@src/core/rules/rule-config.ts`:
- Line 34: Remove the inline TODO comment ("// TODO: rename \"behavior\" →
\"interaction\" (hover states, prototype links, component state changes)") from
src/core/rules/rule-config.ts to avoid churn from the nightly calibration
pipeline, and instead create a tracked note (issue, PR description, or CLAUDE.md
entry) that documents the planned rename of the "behavior" property to
"interaction" along with the migration details so the change is tracked outside
the auto-updated config file.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 400f94ae-2a73-4749-a2bb-d0a0d70141ce
📒 Files selected for processing (8)
.claude/skills/design-to-code/PROMPT.md
CLAUDE.md
src/agents/ablation/helpers.ts
src/agents/ablation/run-condition.ts
src/agents/ablation/run-phase1.ts
src/core/engine/design-tree-strip.test.ts
src/core/engine/design-tree-strip.ts
src/core/rules/rule-config.ts
export async function callApi(client: Anthropic, prompt: string, designTree: string): Promise<Anthropic.Message> {
  const MAX_RETRIES = 3;
  for (let attempt = 0; attempt <= MAX_RETRIES; attempt++) {
    try {
      const stream = client.messages.stream({
        model: MODEL,
        max_tokens: MAX_TOKENS,
        temperature: TEMPERATURE,
        system: prompt,
        messages: [{ role: "user", content: designTree }],
      });
      return await stream.finalMessage();
    } catch (err) {
      const status = (err as { status?: number }).status;
      if ((status === 429 || status === 529) && attempt < MAX_RETRIES) {
        const delay = Math.pow(2, attempt + 1) * 1000;
        console.warn(` ⚠ ${status} error, retrying in ${delay / 1000}s (${attempt + 1}/${MAX_RETRIES})...`);
        await new Promise((r) => setTimeout(r, delay));
        continue;
      }
      throw err;
    }
  }
  throw new Error("API call failed after retries");
}
🧹 Nitpick | 🔵 Trivial
API retry logic is sound but consider jitter.
The exponential backoff (2s, 4s, 8s) for 429/529 errors is reasonable. For production-grade reliability, adding random jitter would help avoid thundering herd when multiple agents hit rate limits simultaneously.
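The 2s, 4s, 8s schedule mentioned above follows directly from the `Math.pow(2, attempt + 1) * 1000` formula in `callApi`. As a small worked sketch (the helper names `backoffDelayMs`, `backoffSchedule`, and `jitteredDelayMs` are hypothetical, and the jittered variant takes the random source as a parameter purely so it can be checked deterministically):

```typescript
// Fixed exponential delay, same formula as in callApi.
function backoffDelayMs(attempt: number): number {
  return Math.pow(2, attempt + 1) * 1000;
}

// Delays for attempts 0 .. maxRetries - 1 (the attempts that are followed by a retry).
function backoffSchedule(maxRetries: number): number[] {
  const delays: number[] = [];
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    delays.push(backoffDelayMs(attempt));
  }
  return delays;
}

// Optional variant: adds up to 1s of jitter on top of the base delay.
function jitteredDelayMs(attempt: number, random: () => number = Math.random): number {
  return backoffDelayMs(attempt) + random() * 1000;
}

const schedule = backoffSchedule(3);
```

With `MAX_RETRIES = 3`, the total worst-case wait before the final failure is 14 seconds.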
💡 Optional: Add jitter to retry delay
       if ((status === 429 || status === 529) && attempt < MAX_RETRIES) {
-        const delay = Math.pow(2, attempt + 1) * 1000;
+        const baseDelay = Math.pow(2, attempt + 1) * 1000;
+        const jitter = Math.random() * 1000; // 0-1s random jitter
+        const delay = baseDelay + jitter;
         console.warn(` ⚠ ${status} error, retrying in ${delay / 1000}s (${attempt + 1}/${MAX_RETRIES})...`);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/agents/ablation/helpers.ts` around lines 128 - 152, The retry backoff in
callApi uses a fixed exponential delay (delay = Math.pow(2, attempt + 1) * 1000)
which can cause synchronized retries; modify the delay calculation to add random
jitter before awaiting so retries are randomized (e.g., compute baseDelay using
the existing Math.pow formula then apply a jitter factor or add/subtract a small
random amount), keep MAX_RETRIES and the 429/529 conditional unchanged, and use
the jittered delay in the setTimeout await to reduce thundering-herd effects.
Won't fix — this is an internal experiment script. Per CLAUDE.md, agents never share the same fixture+type+run, so concurrent API calls to the same endpoint don't happen. Thundering herd is structurally impossible here.
Generated by Claude Code
@let-sunny, that makes sense — since CLAUDE.md enforces non-overlapping fixture+type+run assignments across agents, there's no shared rate-limit surface and jitter would add noise without benefit. Happy to drop this suggestion.
(¬_¬ )
✏️ Learnings added
Learnt from: let-sunny
Repo: let-sunny/canicode PR: 147
File: src/agents/ablation/helpers.ts:128-152
Timestamp: 2026-03-27T23:46:02.993Z
Learning: In `src/agents/ablation/helpers.ts`, the `callApi` retry logic intentionally omits jitter because CLAUDE.md enforces that parallel agents never share the same fixture+type+run assignment. Concurrent API calls to the same endpoint are structurally impossible, so thundering-herd mitigation is unnecessary. Do NOT flag the fixed exponential backoff as a deficiency.
Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.
🧠 Learnings used
Learnt from: let-sunny
Repo: let-sunny/canicode PR: 59
File: src/cli/commands/internal/fixture-management.ts:121-129
Timestamp: 2026-03-25T11:36:48.861Z
Learning: In `src/cli/commands/internal/fixture-management.ts` (and related internal calibration commands), using `console.log` + exit code `0` for missing run directories or null results from `parseDebateResult` is intentional by design. These are internal commands consumed by subagents that check stdout text output rather than exit codes; switching to `console.error` + `process.exit(1)` would break the calibration pipeline.
Learnt from: let-sunny
Repo: let-sunny/canicode PR: 59
File: src/cli/commands/prompt.ts:11-16
Timestamp: 2026-03-25T11:37:22.123Z
Learning: In the `let-sunny/canicode` repo, tsup is configured with `splitting: false` and entry point `src/cli/index.ts`, so the entire CLI bundles into a single `dist/cli/index.js`. At runtime, `import.meta.url` always resolves from `dist/cli/`, making `../../` point to the project root. Path lookups using `resolveFn(__dirname, "../../.claude/skills/...")` in `src/cli/commands/prompt.ts` are intentional and correct.
Learnt from: let-sunny
Repo: let-sunny/canicode PR: 59
File: src/cli/helpers.ts:27-30
Timestamp: 2026-03-25T11:36:59.928Z
Learning: In `src/cli/helpers.ts`, `pickRandomScope` intentionally uses `Math.random()` to return a nondeterministic subtree. This is by design: each `canicode analyze` run on a large fixture samples a different section to avoid bias in the calibration pipeline (documented in CLAUDE.md). Do NOT flag this as a nondeterminism/flakiness issue.
// Extract :hover rules
const baseHoverRules = baseHtml.match(/[^}]*:hover\s*\{[^}]*\}/g) ?? [];
const stripHoverRules = stripHtml.match(/[^}]*:hover\s*\{[^}]*\}/g) ?? [];
🧹 Nitpick | 🔵 Trivial
Hover CSS regex may capture false positives.
The regex [^}]*:hover\s*\{[^}]*\} could match :hover patterns inside CSS comments or string literals, though this is unlikely in generated HTML. For experiment purposes this is acceptable, but be aware the counts may occasionally be inflated.
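A small sketch of the behavior under discussion, using the same regex on a hand-written CSS sample (the sample, `countHoverRules`, and `stripCssComments` are illustrative assumptions, not code from run-condition.ts). It shows that a `:hover` rule inside a CSS comment is counted, and that removing comments first brings the count down:

```typescript
const HOVER_RULE = /[^}]*:hover\s*\{[^}]*\}/g;

// Counts :hover rule matches exactly as the extraction regex would.
function countHoverRules(css: string): number {
  return (css.match(HOVER_RULE) ?? []).length;
}

// Removes /* ... */ comments (non-greedy, spanning newlines).
function stripCssComments(css: string): string {
  return css.replace(/\/\*[\s\S]*?\*\//g, "");
}

const sampleCss = ".a{color:red}.a:hover{color:blue}/* .b:hover{x} */";
const rawCount = countHoverRules(sampleCss);
const cleanedCount = countHoverRules(stripCssComments(sampleCss));
```

Here the raw count is 2 and the cleaned count is 1. Whether this matters in practice depends on how often generated output contains such comments.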
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/agents/ablation/run-condition.ts` around lines 128 - 131, The current
hover-rule extraction using baseHtml.match(...) and stripHtml.match(...) can
return false positives from inside comments or string literals; before applying
the existing regex (the lines that set baseHoverRules and stripHoverRules)
sanitize baseHtml and stripHtml by stripping CSS comments (/* ... */), HTML
comments (<!-- ... -->) and quoted string contents (single and double quotes) so
the regex only runs over real CSS rules; update the code that computes
baseHoverRules/stripHoverRules to operate on these sanitized copies (e.g.,
sanitizedBaseHtml/sanitizedStripHtml) while keeping the same regex to preserve
behavior.
Won't fix — the HTML is AI-generated output from Claude API. CSS comments containing :hover patterns are practically impossible. Additionally, this is a relative comparison (with vs without hover data) using the same regex, so any theoretical false positives would cancel out on both sides.
Generated by Claude Code
@let-sunny, that's a fair point — the controlled AI-generated source and the symmetric relative comparison both effectively neutralize the concern. Acknowledged!
( ˘︹˘ )
🧠 Learnings used
Learnt from: let-sunny
Repo: let-sunny/canicode PR: 59
File: src/cli/commands/internal/fixture-management.ts:121-129
Timestamp: 2026-03-25T11:36:48.861Z
Learning: In `src/cli/commands/internal/fixture-management.ts` (and related internal calibration commands), using `console.log` + exit code `0` for missing run directories or null results from `parseDebateResult` is intentional by design. These are internal commands consumed by subagents that check stdout text output rather than exit codes; switching to `console.error` + `process.exit(1)` would break the calibration pipeline.
Learnt from: let-sunny
Repo: let-sunny/canicode PR: 93
File: src/core/rules/rule-config.ts:76-80
Timestamp: 2026-03-26T01:28:57.785Z
Learning: In the let-sunny/canicode repo, `src/core/rules/rule-config.ts` is automatically adjusted by a nightly calibration pipeline. Do NOT suggest adding inline comments to this file for calibration rationale — the change evidence is tracked in PR descriptions, commit messages, and `data/calibration-evidence.json` instead. Inline comments would create clutter as the file is frequently auto-modified.
- Add jitter to API retry backoff to avoid thundering herd
- Strip CSS/HTML comments before hover rule regex matching
- Validate ABLATION_TYPES env var against DESIGN_TREE_INFO_TYPES
- Export ALL_STRIP_TYPES from source to prevent test drift
- Remove inline TODO from auto-calibrated rule-config.ts
https://claude.ai/code/session_01TrxHjQfDtbH8QptGwNRdHB
…nges
Reverted:
- Jitter on retry: internal script runs serially, no thundering herd risk
- Hover regex comment stripping: AI-generated HTML has no CSS comments
Kept: ABLATION_TYPES validation, ALL_STRIP_TYPES export, TODO removal
https://claude.ai/code/session_01TrxHjQfDtbH8QptGwNRdHB
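The ABLATION_TYPES validation that was kept could look roughly like the sketch below. The array contents are an assumption pieced together from the five strip experiment types named in the commit messages; `parseRequestedTypes` and `isInfoType` are hypothetical names, and the raw value is passed in as a parameter (rather than read from `process.env`) so the function is easy to test.

```typescript
// Assumed canonical list; the real DESIGN_TREE_INFO_TYPES may differ.
const DESIGN_TREE_INFO_TYPES = [
  "layout-direction-spacing",
  "component-references",
  "node-names-hierarchy",
  "variable-references",
  "style-references",
] as const;

type DesignTreeInfoType = (typeof DESIGN_TREE_INFO_TYPES)[number];

// Type guard: narrows an arbitrary string to a known info type.
function isInfoType(value: string): value is DesignTreeInfoType {
  return (DESIGN_TREE_INFO_TYPES as readonly string[]).includes(value);
}

// Returns the valid requested types, or null when none were requested or none are valid.
function parseRequestedTypes(raw: string | undefined): DesignTreeInfoType[] | null {
  if (raw === undefined) return null;
  const valid = raw.split(",").map((s) => s.trim()).filter(isInfoType);
  return valid.length > 0 ? valid : null;
}

const parsed = parseRequestedTypes("variable-references, bogus,style-references");
```

Deriving the union type from the array keeps the runtime check and the compile-time type from drifting apart, which is the same motivation as exporting ALL_STRIP_TYPES for the test.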
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/agents/ablation/run-phase1.ts`:
- Around line 165-198: The current aggregation drops fixture/type pairs that
were skipped, biasing averages; update the loop that builds deltas (the for
(const r of results) block that writes into deltas and the similar logic
referenced at lines 266-272) so that when a fixture/type pair is intentionally
skipped you still account for it by pushing zero deltas into the arrays
(d.dV.push(0), d.dOT.push(0), d.dH.push(0), d.dC.push(0), d.dVa.push(0)) or
alternatively maintain an explicit per-type coverage count and use that total as
the denominator when computing avgDeltaV, avgDeltaOutputTokens,
avgDeltaHtmlBytes, avgDeltaCssClasses and avgDeltaCssVariables in the rankings
construction (the code that computes sV/sOT/sH/sC/sVa, avg(...), and fills
RankingEntry.perFixture); pick one approach and apply it consistently in both
the initial aggregation and the averaging pass so skipped no-ops contribute Δ=0
rather than being omitted.
- Around line 4-13: Update the top docblock in run-phase1.ts to match the actual
CLI contract: change the output path from logs/ablation/... to
data/ablation/..., mention the supported env vars ABLATION_BASELINE_ONLY and
that ABLATION_TYPES can specify a subset (instead of implying all five strip
types always run), and note that runs may be limited by ABLATION_RUNS; ensure
the header accurately reflects these environment variables and the actual output
directory structure used by the implementation.
- Around line 35-48: computeConfigVersion currently fingerprints only a few
engine files so cached runs can be reused incorrectly; update
computeConfigVersion to also include the prompt contents (where prompt is
defined/loaded), the helpers/schema version (e.g.,
src/agents/ablation/helpers.ts and any schema/version constants), and
per-fixture inputs (fixture JSON content, screenshot asset bytes/hashes and
prior result.json shape/version) when building the SHA256 hash so CONFIG_VERSION
changes when any of those change; also harden isCacheValid to check the cached
summary/result contains required fields and versions (not just that similarity
is numeric) by validating presence of new fingerprint fields and expected
schema/version keys before accepting cache.
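The config-version finding above comes down to which inputs feed the hash. A minimal sketch, assuming the version is a truncated SHA-256 over named input contents: `computeConfigVersionSketch` is a hypothetical stand-in for the real `computeConfigVersion`, it takes contents directly instead of reading files, and the 12-character truncation is an arbitrary choice for the example.

```typescript
import { createHash } from "node:crypto";

// Hashes every (name, content) pair in a stable order so that the result
// changes whenever any fingerprinted input changes.
function computeConfigVersionSketch(inputs: Record<string, string>): string {
  const hash = createHash("sha256");
  for (const name of Object.keys(inputs).sort()) {
    hash.update(name);
    hash.update("\0"); // separator so names and contents cannot run together
    hash.update(inputs[name] ?? "");
    hash.update("\0");
  }
  return hash.digest("hex").slice(0, 12);
}

const versionA = computeConfigVersionSketch({ "PROMPT.md": "v1", "helpers.ts": "code" });
const versionB = computeConfigVersionSketch({ "helpers.ts": "code", "PROMPT.md": "v1" });
const versionC = computeConfigVersionSketch({ "PROMPT.md": "v2", "helpers.ts": "code" });
```

Sorting the keys makes the version independent of the order in which inputs are listed, while any content change (such as an edited prompt) produces a different version and therefore a fresh cache directory.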
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: a2b1bff0-175e-40b9-ac67-23c768f38c6b
📒 Files selected for processing (3)
src/agents/ablation/run-phase1.ts
src/core/engine/design-tree-strip.test.ts
src/core/engine/design-tree-strip.ts
- Docblock: fix output path (data/ not logs/), add ABLATION_BASELINE_ONLY
- Cache: include prompt and helpers.ts in config version hash
- Rankings: add fixtureCount (N) column to show coverage per strip type
https://claude.ai/code/session_01TrxHjQfDtbH8QptGwNRdHB
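The fixtureCount (N) column can be sketched as follows. This is an illustrative aggregation only: the `StripResult` and `RankingRow` shapes and the `rankByDeltaV` name are assumptions, and only the ΔV metric is shown. Skipped fixture/type pairs are simply absent from the input, so N makes the reduced coverage visible next to the average.

```typescript
interface StripResult {
  type: string;
  fixture: string;
  deltaV: number; // similarity drop versus baseline, in percentage points
}

interface RankingRow {
  type: string;
  avgDeltaV: number;
  fixtureCount: number; // N: how many fixtures contributed to the average
}

function rankByDeltaV(results: StripResult[]): RankingRow[] {
  const byType = new Map<string, number[]>();
  for (const r of results) {
    const deltas = byType.get(r.type) ?? [];
    deltas.push(r.deltaV);
    byType.set(r.type, deltas);
  }
  const rows: RankingRow[] = [];
  byType.forEach((deltas, type) => {
    const sum = deltas.reduce((a, b) => a + b, 0);
    rows.push({ type, avgDeltaV: sum / deltas.length, fixtureCount: deltas.length });
  });
  // Largest average impact first.
  return rows.sort((a, b) => b.avgDeltaV - a.avgDeltaV);
}

const rows = rankByDeltaV([
  { type: "variable-references", fixture: "desktop", deltaV: 4 },
  { type: "variable-references", fixture: "mobile", deltaV: 2 },
  { type: "style-references", fixture: "desktop", deltaV: 5 },
]);
```

In this sample, style-references ranks first with an average of 5 but N = 1, which is the kind of thin coverage the column is meant to expose.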
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/agents/ablation/run-phase1.ts`:
- Around line 271-272: Current no-op detection uses string equality after
stripDesignTree (stripDesignTree(baselineTree, t) === baselineTree), which is
fragile; instead detect no-op by comparing the arrays before join or exposing a
change flag from stripDesignTree. Update the logic around typesToRun/skipTypes
to call stripDesignTree in a way that returns the stripped lines array (or a
boolean "changed") and then consider it a no-op if every element equals the
original lines (e.g., compare lengths and each line), or modify stripDesignTree
to return {lines, changed} and use that changed flag to populate skipTypes;
reference stripDesignTree, baselineTree, typesToRun, and skipTypes when making
the change.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 313c7ae5-ad17-4114-8026-9a0a10ce1c64
📒 Files selected for processing (1)
src/agents/ablation/run-phase1.ts
const skipTypes = new Set(typesToRun.filter((t) => stripDesignTree(baselineTree, t) === baselineTree));
if (skipTypes.size > 0) console.log(` Skipping no-op: ${[...skipTypes].join(", ")}`);
🧹 Nitpick | 🔵 Trivial
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Verify stripDesignTree returns the same reference for no-op strips
rg -nA20 "export function stripDesignTree" --type=ts
# Also check if there's any early return for no-op cases
ast-grep --pattern $'function stripDesignTree($_, $_) {
$$$
return $TREE;
$$$
}'
Repository: let-sunny/canicode
Length of output: 464
🏁 Script executed:
#!/bin/bash
# Check the context around lines 271-272 in run-phase1.ts
sed -n '260,290p' src/agents/ablation/run-phase1.ts
# Look for STRIP_FUNCTIONS to understand what the stripping functions do
rg -B5 -A10 "STRIP_FUNCTIONS\[" src/core/engine/design-tree-strip.ts | head -40
# Check if there are other patterns in the codebase for detecting no-op changes
rg "===" src/agents/ablation/run-phase1.ts
Repository: let-sunny/canicode
Length of output: 2642
🏁 Script executed:
#!/bin/bash
# Get STRIP_FUNCTIONS definitions and their implementations
rg -B2 -A30 "const STRIP_FUNCTIONS" src/core/engine/design-tree-strip.ts
# Check if there are any cases where STRIP_FUNCTIONS returns the same lines unchanged
rg -B5 -A15 "STRIP_FUNCTIONS\s*=" src/core/engine/design-tree-strip.ts
Repository: let-sunny/canicode
Length of output: 1072
No-op detection relies on fragile string equality.
The check stripDesignTree(baselineTree, t) === baselineTree attempts to detect when stripping has no effect by comparing string equality. However, even for identity strip functions like "position-stacking" and "component-descriptions" which return (lines) => lines, stripDesignTree always creates a new string via split() and join(). This comparison depends on JavaScript string interning behavior, which is implementation-dependent and fragile.
A more robust approach would be to track whether the stripped array differs from the input before calling join(), or use an explicit content-aware comparison. This makes the no-op detection clearer and less reliant on internal string interning semantics.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/agents/ablation/run-phase1.ts` around lines 271 - 272, Current no-op
detection uses string equality after stripDesignTree
(stripDesignTree(baselineTree, t) === baselineTree), which is fragile; instead
detect no-op by comparing the arrays before join or exposing a change flag from
stripDesignTree. Update the logic around typesToRun/skipTypes to call
stripDesignTree in a way that returns the stripped lines array (or a boolean
"changed") and then consider it a no-op if every element equals the original
lines (e.g., compare lengths and each line), or modify stripDesignTree to return
{lines, changed} and use that changed flag to populate skipTypes; reference
stripDesignTree, baselineTree, typesToRun, and skipTypes when making the change.
Re: "No-op detection relies on fragile string equality" (line 272)
Won't fix: the premise is incorrect. JavaScript string … Adding a …
Generated by Claude Code
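For reference, `===` on two primitive strings in JavaScript compares their contents, not object identity, so a string rebuilt via `split()` and `join()` still compares equal to the original. A minimal sketch, with a hypothetical `identityStrip` standing in for a no-op strip type (it is not the real `stripDesignTree`):

```typescript
// Mimics a no-op strip: split into lines, change nothing, join back.
function identityStrip(tree: string): string {
  const lines = tree.split("\n");
  return lines.join("\n");
}

// Mimics a strip that actually removes something (hypothetical rule).
function dropHoverLines(tree: string): string {
  return tree
    .split("\n")
    .filter((line) => !line.includes("[hover]:"))
    .join("\n");
}

const baselineTree = "Frame\n  Button\n    [hover]: background #000";
const isNoOp = identityStrip(baselineTree) === baselineTree;
const isChanged = dropHoverLines(baselineTree) !== baselineTree;
```

Both `isNoOp` and `isChanged` evaluate to `true`. Identity-based comparison would only come into play for `String` wrapper objects created with `new String(...)`, which this code path does not use.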
Context
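A redesign that incorporates the problems found during the Phase 1 experiment dry run.
The scripts were merged in the earlier PR #143, but the experiment items, metrics, and method need to be fundamentally redefined.
Experiment items
Strip experiments (run-phase1.ts: same prompt, data removed)
Condition experiments (run-condition.ts: viewport/data conditions changed)
Metrics (recorded for every item)
Implementation changes
Split design-token-references → variable-references + style-references
Skip reasons (excluded from strip)
Findings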
🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Documentation
Refactor