Skip to content

feat: add ablation Phase 1 execution script#143

Merged
let-sunny merged 30 commits intomainfrom
feat/ablation-script-130
Mar 27, 2026
Merged

feat: add ablation Phase 1 execution script#143
let-sunny merged 30 commits intomainfrom
feat/ablation-script-130

Conversation

@let-sunny
Copy link
Copy Markdown
Owner

@let-sunny let-sunny commented Mar 27, 2026

Summary

Automated script for targeted ablation experiment (#130).

What it does

  1. Loads 3 fixtures (product-detail, landing-page, ai-chat)
  2. For each: generates baseline design-tree + 10 stripped variants
  3. Calls Claude API (Sonnet, temperature=0) with PROMPT.md + design-tree
  4. Parses HTML + interpretations from response
  5. Renders HTML via Playwright → screenshot
  6. Compares vs Figma screenshot → pixel similarity
  7. Computes ΔV, ΔI, ΔT rankings

Usage

ANTHROPIC_API_KEY=sk-... npx tsx src/agents/ablation/run-phase1.ts

Output

logs/ablation/phase1/ (gitignored):

  • Per run: design-tree.txt, response.txt, output.html, code.png, figma.png, diff.png, result.json
  • Summary: summary.json with rankings sorted by avg ΔV

Features

  • Resume support: skips completed runs on restart (checks result.json)
  • Multi-metric: ΔV (primary), ΔI (tie-break), ΔT (efficiency)
  • Cost tracking: reports total tokens and estimated cost

Cost estimate

33 API calls (3 baselines + 30 ablations) × ~$0.50 = ~$16.50

Skipped types: position-stacking, component-descriptions (no-op stubs)

Test plan

  • Type check passes
  • 672 tests pass
  • Dry run with 1 fixture (manual)

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Added an Ablation Phase 1 runner that scores baseline vs variants across fixtures, saving per-run HTML, screenshots, diffs, parsed interpretations, token usage, and a ranked summary.
  • Bug Fixes

    • Ensured deterministic, single-pass filename mapping for asset downloads so recorded names match downloaded files.
  • Improvements

    • Nested vector node names now include parent qualification for clearer identifiers.
    • Rendering/layout emits more consistent alignment, absolute-position support, refined sizing rules, and improved text vertical alignment.

Automated script for targeted ablation experiment:
- 3 fixtures × (1 baseline + 10 strip types) = 33 API calls
- Calls Claude API with PROMPT.md + design-tree (baseline or stripped)
- Renders HTML via Playwright, compares vs Figma screenshot
- Records ΔV (similarity), ΔI (interpretations), ΔT (tokens)
- Resume support: skips completed runs on restart
- Outputs rankings sorted by avg ΔV

Usage:
  ANTHROPIC_API_KEY=sk-... npx tsx src/agents/ablation/run-phase1.ts

Output: logs/ablation/phase1/ (gitignored)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Mar 27, 2026

Important

Review skipped

Review was skipped due to path filters

⛔ Files ignored due to path filters (272)
  • fixtures/desktop-about/data.json is excluded by !fixtures/**
  • fixtures/desktop-about/vectors/figma-icon-2.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/desktop-about/vectors/figma-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/desktop-about/vectors/linkedin-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/desktop-about/vectors/logo-instagram-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/desktop-about/vectors/logo-youtube-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/desktop-about/vectors/mapping.json is excluded by !fixtures/**
  • fixtures/desktop-about/vectors/x-logo-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/desktop-ai-chat/data.json is excluded by !fixtures/**
  • fixtures/desktop-ai-chat/vectors/arrow-up-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/desktop-ai-chat/vectors/code-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/desktop-ai-chat/vectors/figma-icon-2.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/desktop-ai-chat/vectors/figma-icon-3.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/desktop-ai-chat/vectors/figma-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/desktop-ai-chat/vectors/image-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/desktop-ai-chat/vectors/linkedin-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/desktop-ai-chat/vectors/logo-instagram-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/desktop-ai-chat/vectors/logo-youtube-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/desktop-ai-chat/vectors/mapping.json is excluded by !fixtures/**
  • fixtures/desktop-ai-chat/vectors/menu-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/desktop-ai-chat/vectors/mic-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/desktop-ai-chat/vectors/plus-circle-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/desktop-ai-chat/vectors/search-icon-2.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/desktop-ai-chat/vectors/x-logo-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/desktop-article/data.json is excluded by !fixtures/**
  • fixtures/desktop-article/vectors/figma-icon-2.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/desktop-article/vectors/figma-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/desktop-article/vectors/linkedin-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/desktop-article/vectors/logo-instagram-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/desktop-article/vectors/logo-youtube-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/desktop-article/vectors/mapping.json is excluded by !fixtures/**
  • fixtures/desktop-article/vectors/x-logo-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/desktop-contact-us/data.json is excluded by !fixtures/**
  • fixtures/desktop-contact-us/vectors/figma-icon-2.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/desktop-contact-us/vectors/figma-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/desktop-contact-us/vectors/linkedin-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/desktop-contact-us/vectors/logo-instagram-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/desktop-contact-us/vectors/logo-youtube-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/desktop-contact-us/vectors/mapping.json is excluded by !fixtures/**
  • fixtures/desktop-contact-us/vectors/textarea-drag.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/desktop-contact-us/vectors/x-logo-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/desktop-home-page/data.json is excluded by !fixtures/**
  • fixtures/desktop-home-page/vectors/figma-icon-2.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/desktop-home-page/vectors/figma-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/desktop-home-page/vectors/linkedin-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/desktop-home-page/vectors/logo-instagram-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/desktop-home-page/vectors/logo-youtube-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/desktop-home-page/vectors/mapping.json is excluded by !fixtures/**
  • fixtures/desktop-home-page/vectors/x-logo-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/desktop-landing-page/data.json is excluded by !fixtures/**
  • fixtures/desktop-landing-page/vectors/figma-icon-2.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/desktop-landing-page/vectors/figma-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/desktop-landing-page/vectors/info-icon-2.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/desktop-landing-page/vectors/info-icon-3.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/desktop-landing-page/vectors/info-icon-4.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/desktop-landing-page/vectors/info-icon-5.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/desktop-landing-page/vectors/info-icon-6.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/desktop-landing-page/vectors/info-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/desktop-landing-page/vectors/linkedin-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/desktop-landing-page/vectors/logo-instagram-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/desktop-landing-page/vectors/logo-youtube-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/desktop-landing-page/vectors/mapping.json is excluded by !fixtures/**
  • fixtures/desktop-landing-page/vectors/x-logo-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/desktop-portfolio/data.json is excluded by !fixtures/**
  • fixtures/desktop-portfolio/vectors/figma-icon-2.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/desktop-portfolio/vectors/figma-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/desktop-portfolio/vectors/linkedin-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/desktop-portfolio/vectors/logo-instagram-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/desktop-portfolio/vectors/logo-youtube-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/desktop-portfolio/vectors/mapping.json is excluded by !fixtures/**
  • fixtures/desktop-portfolio/vectors/x-logo-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/desktop-pricing/data.json is excluded by !fixtures/**
  • fixtures/desktop-pricing/vectors/chevron-down-icon-2.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/desktop-pricing/vectors/chevron-down-icon-3.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/desktop-pricing/vectors/chevron-down-icon-4.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/desktop-pricing/vectors/chevron-down-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/desktop-pricing/vectors/chevron-up-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/desktop-pricing/vectors/figma-icon-2.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/desktop-pricing/vectors/figma-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/desktop-pricing/vectors/linkedin-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/desktop-pricing/vectors/logo-instagram-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/desktop-pricing/vectors/logo-youtube-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/desktop-pricing/vectors/mapping.json is excluded by !fixtures/**
  • fixtures/desktop-pricing/vectors/x-logo-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/desktop-shop/data.json is excluded by !fixtures/**
  • fixtures/desktop-shop/vectors/check-icon-10.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/desktop-shop/vectors/check-icon-2.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/desktop-shop/vectors/check-icon-3.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/desktop-shop/vectors/check-icon-4.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/desktop-shop/vectors/check-icon-5.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/desktop-shop/vectors/check-icon-6.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/desktop-shop/vectors/check-icon-7.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/desktop-shop/vectors/check-icon-8.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/desktop-shop/vectors/check-icon-9.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/desktop-shop/vectors/check-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/desktop-shop/vectors/figma-icon-2.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/desktop-shop/vectors/figma-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/desktop-shop/vectors/linkedin-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/desktop-shop/vectors/logo-instagram-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/desktop-shop/vectors/logo-youtube-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/desktop-shop/vectors/mapping.json is excluded by !fixtures/**
  • fixtures/desktop-shop/vectors/search-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/desktop-shop/vectors/x-icon-3.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/desktop-shop/vectors/x-icon-4.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/desktop-shop/vectors/x-icon-5.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/desktop-shop/vectors/x-logo-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/desktop-slot/data.json is excluded by !fixtures/**
  • fixtures/desktop-slot/vectors/figma-icon-2.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/desktop-slot/vectors/figma-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/desktop-slot/vectors/linkedin-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/desktop-slot/vectors/logo-instagram-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/desktop-slot/vectors/logo-youtube-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/desktop-slot/vectors/mapping.json is excluded by !fixtures/**
  • fixtures/desktop-slot/vectors/x-logo-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/desktop-waitlist/data.json is excluded by !fixtures/**
  • fixtures/desktop-waitlist/vectors/figma-icon-2.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/desktop-waitlist/vectors/figma-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/desktop-waitlist/vectors/linkedin-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/desktop-waitlist/vectors/logo-instagram-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/desktop-waitlist/vectors/logo-youtube-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/desktop-waitlist/vectors/mapping.json is excluded by !fixtures/**
  • fixtures/desktop-waitlist/vectors/x-logo-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/mobile-about/data.json is excluded by !fixtures/**
  • fixtures/mobile-about/vectors/figma-icon-2.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/mobile-about/vectors/figma-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/mobile-about/vectors/linkedin-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/mobile-about/vectors/logo-instagram-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/mobile-about/vectors/logo-youtube-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/mobile-about/vectors/mapping.json is excluded by !fixtures/**
  • fixtures/mobile-about/vectors/menu-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/mobile-about/vectors/x-logo-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/mobile-ai-chat/data.json is excluded by !fixtures/**
  • fixtures/mobile-ai-chat/vectors/arrow-up-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/mobile-ai-chat/vectors/code-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/mobile-ai-chat/vectors/figma-icon-2.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/mobile-ai-chat/vectors/figma-icon-3.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/mobile-ai-chat/vectors/figma-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/mobile-ai-chat/vectors/image-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/mobile-ai-chat/vectors/linkedin-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/mobile-ai-chat/vectors/logo-instagram-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/mobile-ai-chat/vectors/logo-youtube-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/mobile-ai-chat/vectors/mapping.json is excluded by !fixtures/**
  • fixtures/mobile-ai-chat/vectors/menu-icon-3.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/mobile-ai-chat/vectors/menu-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/mobile-ai-chat/vectors/message-circle-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/mobile-ai-chat/vectors/mic-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/mobile-ai-chat/vectors/plus-circle-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/mobile-ai-chat/vectors/search-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/mobile-ai-chat/vectors/x-logo-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/mobile-article/data.json is excluded by !fixtures/**
  • fixtures/mobile-article/vectors/figma-icon-2.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/mobile-article/vectors/figma-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/mobile-article/vectors/linkedin-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/mobile-article/vectors/logo-instagram-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/mobile-article/vectors/logo-youtube-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/mobile-article/vectors/mapping.json is excluded by !fixtures/**
  • fixtures/mobile-article/vectors/menu-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/mobile-article/vectors/x-logo-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/mobile-contact-us/data.json is excluded by !fixtures/**
  • fixtures/mobile-contact-us/vectors/figma-icon-2.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/mobile-contact-us/vectors/figma-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/mobile-contact-us/vectors/linkedin-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/mobile-contact-us/vectors/logo-instagram-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/mobile-contact-us/vectors/logo-youtube-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/mobile-contact-us/vectors/mapping.json is excluded by !fixtures/**
  • fixtures/mobile-contact-us/vectors/menu-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/mobile-contact-us/vectors/textarea-drag.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/mobile-contact-us/vectors/x-logo-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/mobile-home-page/data.json is excluded by !fixtures/**
  • fixtures/mobile-home-page/vectors/figma-icon-2.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/mobile-home-page/vectors/figma-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/mobile-home-page/vectors/linkedin-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/mobile-home-page/vectors/logo-instagram-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/mobile-home-page/vectors/logo-youtube-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/mobile-home-page/vectors/mapping.json is excluded by !fixtures/**
  • fixtures/mobile-home-page/vectors/menu-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/mobile-home-page/vectors/x-logo-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/mobile-landing-page/data.json is excluded by !fixtures/**
  • fixtures/mobile-landing-page/vectors/figma-icon-2.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/mobile-landing-page/vectors/figma-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/mobile-landing-page/vectors/info-icon-2.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/mobile-landing-page/vectors/info-icon-3.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/mobile-landing-page/vectors/info-icon-4.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/mobile-landing-page/vectors/info-icon-5.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/mobile-landing-page/vectors/info-icon-6.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/mobile-landing-page/vectors/info-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/mobile-landing-page/vectors/linkedin-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/mobile-landing-page/vectors/logo-instagram-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/mobile-landing-page/vectors/logo-youtube-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/mobile-landing-page/vectors/mapping.json is excluded by !fixtures/**
  • fixtures/mobile-landing-page/vectors/menu-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/mobile-landing-page/vectors/x-logo-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/mobile-portfolio/data.json is excluded by !fixtures/**
  • fixtures/mobile-portfolio/vectors/figma-icon-2.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/mobile-portfolio/vectors/figma-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/mobile-portfolio/vectors/linkedin-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/mobile-portfolio/vectors/logo-instagram-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/mobile-portfolio/vectors/logo-youtube-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/mobile-portfolio/vectors/mapping.json is excluded by !fixtures/**
  • fixtures/mobile-portfolio/vectors/menu-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/mobile-portfolio/vectors/x-logo-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/mobile-pricing/data.json is excluded by !fixtures/**
  • fixtures/mobile-pricing/vectors/chevron-down-icon-2.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/mobile-pricing/vectors/chevron-down-icon-3.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/mobile-pricing/vectors/chevron-down-icon-4.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/mobile-pricing/vectors/chevron-down-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/mobile-pricing/vectors/chevron-up-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/mobile-pricing/vectors/figma-icon-2.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/mobile-pricing/vectors/figma-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/mobile-pricing/vectors/linkedin-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/mobile-pricing/vectors/logo-instagram-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/mobile-pricing/vectors/logo-youtube-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/mobile-pricing/vectors/mapping.json is excluded by !fixtures/**
  • fixtures/mobile-pricing/vectors/menu-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/mobile-pricing/vectors/x-logo-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/mobile-product-detail/data.json is excluded by !fixtures/**
  • fixtures/mobile-product-detail/vectors/chevron-down-icon-2.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/mobile-product-detail/vectors/chevron-down-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/mobile-product-detail/vectors/chevron-up-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/mobile-product-detail/vectors/figma-icon-2.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/mobile-product-detail/vectors/figma-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/mobile-product-detail/vectors/heart-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/mobile-product-detail/vectors/linkedin-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/mobile-product-detail/vectors/logo-instagram-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/mobile-product-detail/vectors/logo-youtube-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/mobile-product-detail/vectors/mapping.json is excluded by !fixtures/**
  • fixtures/mobile-product-detail/vectors/menu-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/mobile-product-detail/vectors/star-icon-10.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/mobile-product-detail/vectors/star-icon-11.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/mobile-product-detail/vectors/star-icon-12.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/mobile-product-detail/vectors/star-icon-13.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/mobile-product-detail/vectors/star-icon-14.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/mobile-product-detail/vectors/star-icon-15.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/mobile-product-detail/vectors/star-icon-16.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/mobile-product-detail/vectors/star-icon-2.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/mobile-product-detail/vectors/star-icon-3.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/mobile-product-detail/vectors/star-icon-4.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/mobile-product-detail/vectors/star-icon-5.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/mobile-product-detail/vectors/star-icon-6.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/mobile-product-detail/vectors/star-icon-7.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/mobile-product-detail/vectors/star-icon-8.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/mobile-product-detail/vectors/star-icon-9.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/mobile-product-detail/vectors/x-logo-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/mobile-shop/data.json is excluded by !fixtures/**
  • fixtures/mobile-shop/vectors/check-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/mobile-shop/vectors/figma-icon-2.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/mobile-shop/vectors/figma-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/mobile-shop/vectors/linkedin-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/mobile-shop/vectors/logo-instagram-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/mobile-shop/vectors/logo-youtube-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/mobile-shop/vectors/mapping.json is excluded by !fixtures/**
  • fixtures/mobile-shop/vectors/menu-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/mobile-shop/vectors/search-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/mobile-shop/vectors/sliders-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/mobile-shop/vectors/x-logo-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/mobile-slot/data.json is excluded by !fixtures/**
  • fixtures/mobile-slot/vectors/figma-icon-2.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/mobile-slot/vectors/figma-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/mobile-slot/vectors/linkedin-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/mobile-slot/vectors/logo-instagram-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/mobile-slot/vectors/logo-youtube-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/mobile-slot/vectors/mapping.json is excluded by !fixtures/**
  • fixtures/mobile-slot/vectors/menu-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/mobile-slot/vectors/x-logo-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/mobile-waitlist/data.json is excluded by !fixtures/**
  • fixtures/mobile-waitlist/vectors/figma-icon-2.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/mobile-waitlist/vectors/figma-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/mobile-waitlist/vectors/linkedin-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/mobile-waitlist/vectors/logo-instagram-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/mobile-waitlist/vectors/logo-youtube-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/mobile-waitlist/vectors/mapping.json is excluded by !fixtures/**
  • fixtures/mobile-waitlist/vectors/menu-icon.svg is excluded by !**/*.svg, !fixtures/**

CodeRabbit blocks several paths by default. You can override this behavior by explicitly including those paths in the path filters. For example, including **/dist/** will override the default block on the dist directory, by removing the pattern from both the lists.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: ce788d8c-8b57-4e97-ad35-9263042ffc97

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Added a new Ablation Phase 1 CLI runner that validates fixtures and PROMPT, generates baseline and stripped design-trees, runs cached or fresh Anthropic calls per condition/run, parses/sanitizes model HTML and interpretations, renders and compares screenshots, computes per-type ΔV/ΔI/ΔT rankings, and writes artifacts and a summary.

Changes

Cohort / File(s) Summary
Ablation Phase 1 Runner
src/agents/ablation/run-phase1.ts
New executable CLI implementing Phase 1: fixture validation, baseline and stripped design-tree generation, no-op detection, per-condition runs with file-cache validation, Anthropic calls on cache miss, HTML/interpretation extraction and sanitization, rendering, image diff/similarity, per-run artifacts and logs/ablation/phase1/summary.json.
CLI implement command
src/cli/commands/implement.ts
Refactored asset mapping: compute and assign sanitized, de-duplicated filenames and immediately populate mapping within the same loop that determines filenames and fetches assets; removed a separate post-loop mapping pass.
CLI save-fixture command
src/cli/commands/save-fixture.ts
Same single-pass change for vector SVG downloads: compute de-duplicated filename, assign mapping[id] immediately, then download/write SVG; removed the two-pass mapping recompute.
CLI helpers
src/cli/helpers.ts
collectVectorNodes signature updated to accept optional parentName and now prefixes collected vector name values with ancestor parentName during recursion, changing collected node naming for nested vectors.
Design-tree rendering
src/core/engine/design-tree.ts
Layout/rendering logic changes: emit align-items: flex-start when counter-axis align unset, add absolute positioning support (emit position:absolute and top/left deltas; parent becomes position:relative if child absolute), only emit width:100% for horizontal FILL when layoutGrow !== 1, refine textAlignVertical flex hint use, and extend renderNode to accept optional parentBBox passed down during recursion.

Sequence Diagram

sequenceDiagram
    participant Runner as Phase1 Runner
    participant FS as File System
    participant Cache as Cache Layer
    participant API as Anthropic API
    participant Parser as Response Parser
    participant Renderer as HTML Renderer
    participant Comparator as Image Comparator
    participant Aggregator as Result Aggregator

    Runner->>FS: Load fixtures & PROMPT.md
    Runner->>Runner: Build baseline design-tree
    Runner->>Runner: Identify no-op strip types

    loop For each fixture/type/run
        Runner->>Cache: Check cache (model/temp/tokens/promptHash/config/fixtureHash)
        alt Cache Hit
            Cache-->>Runner: Return cached result + artifacts
        else Cache Miss
            Runner->>API: Send prompt + design-tree
            API-->>Runner: Response + token usage
            Runner->>Parser: Extract HTML & interpretations
            Parser-->>Runner: Parsed HTML + interpretations (or parse-fail)
            Runner->>Renderer: Render HTML -> code.png
            Renderer-->>Runner: code.png
            Runner->>FS: Copy GT screenshot -> figma.png
            Runner->>Comparator: Compare code.png vs figma.png
            Comparator-->>Runner: diff.png + similarity
            Runner->>FS: Save result.json and artifacts
        end
    end

    Runner->>Aggregator: Pair baseline vs ablated runs
    Aggregator->>Aggregator: Compute ΔV, ΔI, ΔT per type
    Aggregator-->>Runner: Rankings + summary
    Runner->>FS: Write logs/ablation/phase1/summary.json
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related issues

Possibly related PRs

Poem

🐇 I hop through fixtures with a prompt in paw,

I strip the trees and ask the model to draw,
Cache hums softly while retries take flight,
Pixels whisper differences in the night,
A rabbit tallies deltas by moonlight.

🚥 Pre-merge checks | ✅ 2
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The pull request title 'feat: add ablation Phase 1 execution script' directly and accurately reflects the main change—a new TypeScript script in src/agents/ablation/run-phase1.ts that implements automated Phase 1 execution for an ablation experiment.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/ablation-script-130

Comment @coderabbitai help to get the list of available commands and usage tips.

let-sunny and others added 10 commits March 27, 2026 23:21
High:
- Cache invalidation: store cacheKey (model + promptHash + configVersion)
  in result.json, invalidate on mismatch
- Multi-run support: ABLATION_RUNS env var (default 1 for Phase 1,
  set 3 for Phase 2), rankings average across runs

Medium:
- ΔI parsing: 3 fallback patterns for interpretations extraction,
  warnings on parse failure instead of silent 0
- Fixture configurability: ABLATION_FIXTURES env var overrides defaults

Low:
- Auto-detect no-op strips: isStripNoOp() compares before/after,
  no more hardcoded SKIP_TYPES

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Medium:
- cacheKey includes temperature + maxTokens (experiment condition changes
  now invalidate cache)
- ABLATION_RUNS validation: reject NaN, 0, negative values with error

Low:
- HTML parsing: 4 fallback patterns (```html, <!DOCTYPE, <html, // filename:)
- Cost output: separate session vs cached (newResults tracking)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Blocking:
- runIndex-paired Δ calculation: baseline[fixture:runIndex] vs
  ablation[fixture:runIndex] → removes covariance, reduces variance
- Cache validation: check all required artifacts exist (output.html,
  code.png, figma.png, diff.png) not just result.json

Medium:
- interpretationsCount = -1 on parse failure (not silent 0)
  ΔI calculation skips pairs where either side failed
- HTML parsing: pick LARGEST code block (>100 chars, contains <)
  instead of first match, avoids grabbing small CSS snippets

Low:
- ABLATION_FIXTURES: filter empty strings from comma-split

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…o config version

Medium:
- RunResult: add interpretationsParseFailed + parseWarnings fields
  for post-hoc failure analysis in summary.json
- HTML block selection: prioritize <!doctype/<html > <body > largest
  (prevents grabbing CSS/JS snippets over actual HTML)

Low:
- CONFIG_VERSION: auto-computed from sha256 of core source files
  (design-tree-strip.ts, visual-compare-helpers.ts) — no manual bump needed

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add design-tree.ts and visual-compare.ts to the hash input alongside
design-tree-strip.ts and visual-compare-helpers.ts.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- ABLATION_RUNS: regex /^\d+$/ check rejects "3abc", spaces, floats
- HTML block threshold: 100 → 50 chars for small fixture robustness
- Empty fixture list: explicit error instead of silent empty summary

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…mary

- Console: interp=PARSE_FAIL instead of interp=0 when parsing failed
- summary.json: skippedFixtures array with fixture name + reason
- Phase1Summary type: add skippedFixtures field

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Per-run context in result.json:
- screenshotPath, logicalViewport, exportScale
- htmlExtractMethod (doctype/body/largest/none)
- designTreeTokens (input size)

Summary-level metadata:
- cacheStats: { hits, misses }
- parseFailureCount

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Config version hash now includes the script itself (HTML parsing,
  interpretation extraction, ranking logic changes invalidate cache)
- cacheStats.misses → cacheStats.newCalls (accurate semantics)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@let-sunny let-sunny marked this pull request as ready for review March 27, 2026 14:38
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/agents/ablation/run-phase1.ts`:
- Line 423: The current comparator used in rankings.sort only compares avgDeltaV
and leaves ties to insertion order; update the comparator used in rankings.sort
to implement the documented tie-breakers by comparing avgDeltaV (desc), then
avgDeltaI (desc) if avgDeltaV is equal, and finally avgDeltaT (desc) if
avgDeltaI is also equal, ensuring you handle undefined/null values consistently
so tied entries are deterministically ordered by avgDeltaI then avgDeltaT.
- Line 361: The log and formatter multiply comparison.similarity by 100 even
though compareScreenshots() already returns a 0–100 percentage; remove the extra
"* 100" so values are displayed correctly (e.g., change occurrences in the
console.log at run-phase1.ts that reference comparison.similarity and any other
places in the same file such as the similar expressions around lines where
similarity is used for printing or computing ΔV—refer to the console.log at the
shown diff and the analogous usages later in the file) so that
comparison.similarity is used directly and then formatted with toFixed as
needed.
- Around line 152-153: Add proper Zod validation for cache inputs: create Zod
schemas in src/agents/contracts/ (e.g., RunResultSchema and CacheKeySchema) and
export types via z.infer; then replace direct JSON parsing/casts in isCacheValid
and the cache-loading loop (where JSON.parse(... ) as RunResult is used) to
first safeParse the file contents with RunResultSchema (and validate the
cacheKey with CacheKeySchema), reject or skip files on validation failure, and
use the validated typed result for all numeric operations (similarity,
inputTokens, outputTokens, totalTokens) so downstream math uses guaranteed
numbers rather than potentially undefined/NaN values.
- Around line 532-545: Compute the design tree before checking cache and include
an input hash of that tree plus the fixture reference screenshot in the cache
identity: move the tree derivation (tree = type === "baseline" ? baselineTree :
stripDesignTree(baselineTree, type)) to before calling isCacheValid, implement a
helper (e.g., computeInputHash(tree, fixture) or hashTreeAndScreenshot) that
hashes the serialized tree plus the fixture's ground-truth image/metadata, and
use that hash when calling isCacheValid(...) and when constructing cacheKey
passed into runSingle(...); update getResultPath / cache writing to persist this
input hash so cache validation will detect changes to fixture data, images, or
generated trees.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 167e4aa1-a2dd-448d-805d-495afa6fdbdf

📥 Commits

Reviewing files that changed from the base of the PR and between b790e44 and 61d4107.

📒 Files selected for processing (1)
  • src/agents/ablation/run-phase1.ts

Comment on lines +152 to +153
try {
const result = JSON.parse(readFileSync(resultPath, "utf-8")) as RunResult;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, let's check the file structure and find the relevant code
git ls-files | grep -E "(run-phase1|contracts)" | head -20

Repository: let-sunny/canicode

Length of output: 624


🏁 Script executed:

# Check if the file exists and get line count
wc -l src/agents/ablation/run-phase1.ts

Repository: let-sunny/canicode

Length of output: 99


🏁 Script executed:

# Read the specific lines mentioned in the review (152-153 and 536-539)
cat -n src/agents/ablation/run-phase1.ts | sed -n '140,165p'

Repository: let-sunny/canicode

Length of output: 1230


🏁 Script executed:

# Also check lines around 536-539
cat -n src/agents/ablation/run-phase1.ts | sed -n '525,550p'

Repository: let-sunny/canicode

Length of output: 1312


🏁 Script executed:

# Search for RunResult type definition
rg "RunResult" --type ts --type tsx -B 2 -A 5

Repository: let-sunny/canicode

Length of output: 89


🏁 Script executed:

# Check if there's a Zod schema for RunResult in contracts directory
fd . src/contracts --type f

Repository: let-sunny/canicode

Length of output: 162


🏁 Script executed:

# Search for RunResult type definition
rg "type RunResult|interface RunResult|RunResult =" --type ts -B 2 -A 5

Repository: let-sunny/canicode

Length of output: 1085


🏁 Script executed:

# Check imports at the beginning of run-phase1.ts
head -50 src/agents/ablation/run-phase1.ts

Repository: let-sunny/canicode

Length of output: 1902


🏁 Script executed:

# List all files in contracts directories
find src/agents/contracts src/core/contracts -type f -name "*.ts"

Repository: let-sunny/canicode

Length of output: 590


🏁 Script executed:

# Search for Zod schema definitions related to results
rg "z\.object\|z\.infer" --type ts | grep -i result

Repository: let-sunny/canicode

Length of output: 44


🏁 Script executed:

# Get the full RunResult interface definition
rg "interface RunResult" -A 20 --type ts

Repository: let-sunny/canicode

Length of output: 1322


🏁 Script executed:

# Check if there's a CacheKey interface
rg "interface CacheKey|type CacheKey" -A 10 --type ts

Repository: let-sunny/canicode

Length of output: 649


🏁 Script executed:

# See if Zod is used elsewhere in the ablation code
rg "import.*zod|from.*zod" src/agents/ablation --type ts

Repository: let-sunny/canicode

Length of output: 44


🏁 Script executed:

# Check the CLAUDE.md to see exact coding guidelines
cat CLAUDE.md | grep -A 10 "Zod"

Repository: let-sunny/canicode

Length of output: 962


🏁 Script executed:

# Check if CLAUDE.md applies to agents directory
grep -A 5 "src/agents" CLAUDE.md

Repository: let-sunny/canicode

Length of output: 44


🏁 Script executed:

# See how results are used downstream
rg "cached\.|\.similarity|\.inputTokens|\.outputTokens" src/agents/ablation/run-phase1.ts -B 2 -A 2

Repository: let-sunny/canicode

Length of output: 1420


🏁 Script executed:

# Check if any Zod schema pattern exists elsewhere in agents for reference
rg "z\.object\|ZodSchema" src/agents --type ts -l

Repository: let-sunny/canicode

Length of output: 44


Add Zod schema validation for RunResult and CacheKey before parsing cached JSON.

File reads from disk are external inputs and must be validated with Zod schemas per project guidelines. Both line 153 (in isCacheValid) and line 536 (cache loading loop) parse JSON directly and cast to RunResult without validation. A cache file with matching cacheKey but missing or malformed numeric fields (similarity, inputTokens, outputTokens, totalTokens) will pass type narrowing and feed undefined/NaN into downstream calculations:

  • Line 539: cached.similarity * 100
  • Delta calculations: baseline.similarity - r.similarity
  • Cost estimation: token summation

Define Zod schemas for RunResult and CacheKey in src/agents/contracts/ and parse with validation instead of type assertion. Infer TypeScript types from the schemas using z.infer<typeof Schema>.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/agents/ablation/run-phase1.ts` around lines 152 - 153, Add proper Zod
validation for cache inputs: create Zod schemas in src/agents/contracts/ (e.g.,
RunResultSchema and CacheKeySchema) and export types via z.infer; then replace
direct JSON parsing/casts in isCacheValid and the cache-loading loop (where
JSON.parse(... ) as RunResult is used) to first safeParse the file contents with
RunResultSchema (and validate the cacheKey with CacheKeySchema), reject or skip
files on validation failure, and use the validated typed result for all numeric
operations (similarity, inputTokens, outputTokens, totalTokens) so downstream
math uses guaranteed numbers rather than potentially undefined/NaN values.

Comment on lines +532 to +545
for (const type of conditions) {
for (let run = 0; run < runsPerCondition; run++) {
// Check cache with key + artifact validation
if (isCacheValid(fixture, type, run, cacheKey)) {
const cached = JSON.parse(readFileSync(getResultPath(fixture, type, run), "utf-8")) as RunResult;
allResults.push(cached);
cacheHits++;
console.log(` [cached] ${type} run ${run + 1} → similarity=${(cached.similarity * 100).toFixed(1)}%`);
continue;
}

console.log(` [${type}] run ${run + 1}/${runsPerCondition}`);
const tree = type === "baseline" ? baselineTree : stripDesignTree(baselineTree, type);
const result = await runSingle(client, prompt, fixture, type, tree, run, cacheKey);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Fold the current fixture inputs into cache validation.

This loop checks isCacheValid() before it derives the current baseline/stripped tree, and the cache identity never changes when fixtures/<name>/data.json, fixture images, or the reference screenshot change. After any fixture/export update, the run will silently reuse stale similarity/token metrics and skew the ablation ranking. Please compute the current tree first and include an input hash for the generated design-tree plus the ground-truth screenshot in the cache key.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/agents/ablation/run-phase1.ts` around lines 532 - 545, Compute the design
tree before checking cache and include an input hash of that tree plus the
fixture reference screenshot in the cache identity: move the tree derivation
(tree = type === "baseline" ? baselineTree : stripDesignTree(baselineTree,
type)) to before calling isCacheValid, implement a helper (e.g.,
computeInputHash(tree, fixture) or hashTreeAndScreenshot) that hashes the
serialized tree plus the fixture's ground-truth image/metadata, and use that
hash when calling isCacheValid(...) and when constructing cacheKey passed into
runSingle(...); update getResultPath / cache writing to persist this input hash
so cache validation will detect changes to fixture data, images, or generated
trees.

let-sunny and others added 3 commits March 27, 2026 23:52
…t validation

- Remove ×100 on similarity display (compareScreenshots returns 0-100)
- Ranking sort: primary=ΔV → tie-break=ΔI → ΔT
- Cache key includes fixtureHash (data.json + screenshot sha256)
  so fixture updates invalidate cached results
- isValidRunResult: validate numeric fields before using cached data
- computeFixtureHash: per-fixture input fingerprint

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
3 retries with 2s/4s/8s delays for rate limit (429) and
overloaded (529) errors. Other errors still fail immediately.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Download Inter variable font (352KB) to assets/fonts/.
Script replaces Google Fonts <link> tags with local @font-face
to eliminate CDN dependency and prevent Playwright timeout.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (2)
src/agents/ablation/run-phase1.ts (2)

154-193: ⚠️ Potential issue | 🟠 Major

Validate cached results once with a full schema, then reuse that parsed object.

isValidRunResult() only guards a subset of fields, and the cache-hit branch reparses result.json and trusts it as RunResult. A corrupted type, runIndex, or interpretationsCount can still skew rankings and summary.json. Replace the boolean check with a single Zod-backed loader that safeParses the full payload and returns the validated object for reuse. As per coding guidelines, "Use Zod schemas for external input validation (schemas in contracts/ directory)" and "Infer TypeScript types from Zod schemas using z.infer<typeof Schema>."

Also applies to: 572-573

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/agents/ablation/run-phase1.ts` around lines 154 - 193, Replace the
fragile isValidRunResult boolean guard and the reparsing in isCacheValid with a
Zod-backed loader: create/use the full RunResult Zod schema from contracts/
(infer type with z.infer<typeof RunResultSchema>), implement a function (e.g.,
loadValidatedRunResult(resultPath): RunResult | null) that reads result.json,
calls RunResultSchema.safeParse, and returns the parsed/validated object or null
on failure; then update isCacheValid to call loadValidatedRunResult once, reuse
the returned validated RunResult (instead of reparsing and using
isValidRunResult), and perform the cacheKey checks against that validated object
while still verifying REQUIRED_ARTIFACTS and existence of result.json.

145-151: ⚠️ Potential issue | 🟠 Major

Build the cache identity from the actual run inputs.

computeFixtureHash() only sees data.json and the Figma screenshot, but this run also depends on the generated tree plus fixture images/ and vectors/. Updating those inputs can still hit a stale cache and reuse old similarity/token data. Move cache-key construction inside the condition loop and hash the per-type tree together with the relevant asset bytes.

Also applies to: 271-278, 326-334, 543-545, 581-582

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/agents/ablation/run-phase1.ts` around lines 145 - 151, computeFixtureHash
currently only hashes data.json and the screenshot, causing stale cache hits;
move cache-key construction into the per-type processing loop and include the
actual run inputs: the generated per-type "tree" string and the bytes of fixture
assets under images/ and vectors/ when present. Update computeFixtureHash (and
the equivalent hashing logic at the other call sites referenced) to accept the
per-type tree and asset bytes or compute them in-loop, then combine those values
with data.json and screenshot before calling createHash("sha256") so the
returned digest uniquely reflects the exact inputs used for that run.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/agents/ablation/run-phase1.ts`:
- Around line 317-347: The HTML from parseResponse is untrusted: before
rendering it, validate/sanitize it in run-phase1.ts (e.g., in the block that
writes htmlPath) by rejecting or stripping any <script>, <iframe>, inline event
handlers (on*), and external resource URLs (http(s):) or at minimum fail the run
if such tags/attributes are present; then harden the renderer in
src/core/engine/visual-compare.ts inside the renderCodeScreenshot function by
disabling JavaScript execution on the page (use the API that sets
JavaScriptEnabled to false), enable request interception and block/abort all
network requests (allow only data: or file: if absolutely required), and ensure
no navigation/waitUntil phases rely on network activity—this will prevent
prompt-injected scripts from running or making outbound requests when rendering
the untrusted model HTML.

---

Duplicate comments:
In `@src/agents/ablation/run-phase1.ts`:
- Around line 154-193: Replace the fragile isValidRunResult boolean guard and
the reparsing in isCacheValid with a Zod-backed loader: create/use the full
RunResult Zod schema from contracts/ (infer type with z.infer<typeof
RunResultSchema>), implement a function (e.g.,
loadValidatedRunResult(resultPath): RunResult | null) that reads result.json,
calls RunResultSchema.safeParse, and returns the parsed/validated object or null
on failure; then update isCacheValid to call loadValidatedRunResult once, reuse
the returned validated RunResult (instead of reparsing and using
isValidRunResult), and perform the cacheKey checks against that validated object
while still verifying REQUIRED_ARTIFACTS and existence of result.json.
- Around line 145-151: computeFixtureHash currently only hashes data.json and
the screenshot, causing stale cache hits; move cache-key construction into the
per-type processing loop and include the actual run inputs: the generated
per-type "tree" string and the bytes of fixture assets under images/ and
vectors/ when present. Update computeFixtureHash (and the equivalent hashing
logic at the other call sites referenced) to accept the per-type tree and asset
bytes or compute them in-loop, then combine those values with data.json and
screenshot before calling createHash("sha256") so the returned digest uniquely
reflects the exact inputs used for that run.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: a0833a63-0372-4a02-8e4b-f9b0e44be230

📥 Commits

Reviewing files that changed from the base of the PR and between 61d4107 and 2533e38.

📒 Files selected for processing (1)
  • src/agents/ablation/run-phase1.ts

Comment on lines +317 to +347
const { html, interpretations, parseWarnings, interpretationsParseFailed } = parseResponse(responseText);
if (parseWarnings.length > 0) {
for (const w of parseWarnings) console.warn(` WARNING: ${w}`);
}

const htmlPath = join(runDir, "output.html");
writeFileSync(htmlPath, html);
writeFileSync(join(runDir, "interpretations.json"), JSON.stringify(interpretations, null, 2));

// Copy fixture images to run dir so HTML can reference them
const fixtureImagesDir = resolve(`fixtures/${fixture}/images`);
if (existsSync(fixtureImagesDir)) {
const runImagesDir = join(runDir, "images");
mkdirSync(runImagesDir, { recursive: true });
for (const f of readdirSync(fixtureImagesDir)) {
copyFileSync(join(fixtureImagesDir, f), join(runImagesDir, f));
}
}

// Render HTML to screenshot
console.log(` Rendering screenshot...`);
const codePngPath = join(runDir, "code.png");
const figmaScreenshotPath = getFixtureScreenshotPath(fixture);
const figmaPng = readFileSync(figmaScreenshotPath);
const { PNG } = await import("pngjs");
const figmaImage = PNG.sync.read(figmaPng);
const exportScale = 2;
const logicalW = Math.max(1, Math.round(figmaImage.width / exportScale));
const logicalH = Math.max(1, Math.round(figmaImage.height / exportScale));

await renderCodeScreenshot(htmlPath, codePngPath, { width: logicalW, height: logicalH }, exportScale);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

# First, let's check the parseResponse function to understand where html comes from
fd "run-phase1.ts" --type f

Repository: let-sunny/canicode

Length of output: 95


🏁 Script executed:

# Also search for renderCodeScreenshot to understand the rendering context
rg "renderCodeScreenshot" --type ts --type tsx -B 2 -A 5

Repository: let-sunny/canicode

Length of output: 89


🏁 Script executed:

# Check for parseResponse implementation
rg "function parseResponse|const parseResponse|export.*parseResponse" --type ts --type tsx -B 2 -A 10

Repository: let-sunny/canicode

Length of output: 89


🏁 Script executed:

# Look at the actual browser context configuration in the rendering code
rg "browser\.newContext|javaScriptEnabled|route\(" --type ts --type tsx -B 3 -A 3

Repository: let-sunny/canicode

Length of output: 89


🏁 Script executed:

# Search for parseResponse and renderCodeScreenshot without type filters
rg "parseResponse" -A 10 src/

Repository: let-sunny/canicode

Length of output: 2009


🏁 Script executed:

# Search for renderCodeScreenshot
rg "renderCodeScreenshot" -B 2 -A 10 src/

Repository: let-sunny/canicode

Length of output: 5940


🏁 Script executed:

# Look for browser context configuration
rg "javaScriptEnabled|newContext" -B 2 -A 5 src/

Repository: let-sunny/canicode

Length of output: 540


🏁 Script executed:

# Read the run-phase1.ts file to understand the full context
wc -l src/agents/ablation/run-phase1.ts

Repository: let-sunny/canicode

Length of output: 99


🏁 Script executed:

# Let me check the full renderCodeScreenshot implementation
sed -n '1,100p' src/core/engine/visual-compare.ts

Repository: let-sunny/canicode

Length of output: 3318


🏁 Script executed:

# Check lines around where page is navigated
rg "goto\|page\." src/core/engine/visual-compare.ts -A 2 -B 1

Repository: let-sunny/canicode

Length of output: 44


🏁 Script executed:

# Check if there's any HTML sanitization or validation before rendering
rg "sanitize|validate|script|<script" src/agents/ablation/run-phase1.ts -B 2 -A 2

Repository: let-sunny/canicode

Length of output: 44


🏁 Script executed:

# See the complete context around lines 317-347 in run-phase1.ts
sed -n '300,360p' src/agents/ablation/run-phase1.ts

Repository: let-sunny/canicode

Length of output: 2680


🏁 Script executed:

# Continue reading renderCodeScreenshot implementation
sed -n '95,150p' src/core/engine/visual-compare.ts

Repository: let-sunny/canicode

Length of output: 2037


🏁 Script executed:

# Get more of the renderCodeScreenshot function
rg "renderCodeScreenshot" -A 30 src/core/engine/visual-compare.ts | head -50

Repository: let-sunny/canicode

Length of output: 1565


🏁 Script executed:

# Check if page.goto or any navigation happens and what options are set
sed -n '95,200p' src/core/engine/visual-compare.ts

Repository: let-sunny/canicode

Length of output: 3864


🏁 Script executed:

# Verify if there are any other uses of renderCodeScreenshot or similar patterns
rg "newContext|javaScriptEnabled|route\(" src/ --type ts -c

Repository: let-sunny/canicode

Length of output: 97


🏁 Script executed:

# Check if there's any documentation about this code's threat model
cat README.md 2>/dev/null | head -50

Repository: let-sunny/canicode

Length of output: 2544


Disable JavaScript and block network requests before rendering untrusted model HTML.

The html output from parseResponse() comes directly from the Claude model response with no sanitization. When passed to renderCodeScreenshot(), it's executed in a Chromium context with JavaScript enabled and network access unrestricted. This allows a prompt-injected response to make arbitrary HTTP requests or probe local files during the waitUntil: "networkidle" phase.

Apply these hardening steps in src/core/engine/visual-compare.ts:

Required security fix
    const context = await browser.newContext({
      viewport: logicalViewport,
      deviceScaleFactor,
+     javaScriptEnabled: false,
    });
    const page = await context.newPage();
+   await context.route("http://**/*", (route) => route.abort());
+   await context.route("https://**/*", (route) => route.abort());

Additionally, validate the HTML before rendering in src/agents/ablation/run-phase1.ts:

+ if (/<script\b/i.test(finalHtml) || /\bon\w+=/i.test(finalHtml)) {
+   throw new Error("Refusing to render HTML with executable content");
+ }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
const { html, interpretations, parseWarnings, interpretationsParseFailed } = parseResponse(responseText);
if (parseWarnings.length > 0) {
for (const w of parseWarnings) console.warn(` WARNING: ${w}`);
}
const htmlPath = join(runDir, "output.html");
writeFileSync(htmlPath, html);
writeFileSync(join(runDir, "interpretations.json"), JSON.stringify(interpretations, null, 2));
// Copy fixture images to run dir so HTML can reference them
const fixtureImagesDir = resolve(`fixtures/${fixture}/images`);
if (existsSync(fixtureImagesDir)) {
const runImagesDir = join(runDir, "images");
mkdirSync(runImagesDir, { recursive: true });
for (const f of readdirSync(fixtureImagesDir)) {
copyFileSync(join(fixtureImagesDir, f), join(runImagesDir, f));
}
}
// Render HTML to screenshot
console.log(` Rendering screenshot...`);
const codePngPath = join(runDir, "code.png");
const figmaScreenshotPath = getFixtureScreenshotPath(fixture);
const figmaPng = readFileSync(figmaScreenshotPath);
const { PNG } = await import("pngjs");
const figmaImage = PNG.sync.read(figmaPng);
const exportScale = 2;
const logicalW = Math.max(1, Math.round(figmaImage.width / exportScale));
const logicalH = Math.max(1, Math.round(figmaImage.height / exportScale));
await renderCodeScreenshot(htmlPath, codePngPath, { width: logicalW, height: logicalH }, exportScale);
const { html, interpretations, parseWarnings, interpretationsParseFailed } = parseResponse(responseText);
if (parseWarnings.length > 0) {
for (const w of parseWarnings) console.warn(` WARNING: ${w}`);
}
const htmlPath = join(runDir, "output.html");
writeFileSync(htmlPath, html);
writeFileSync(join(runDir, "interpretations.json"), JSON.stringify(interpretations, null, 2));
// Copy fixture images to run dir so HTML can reference them
const fixtureImagesDir = resolve(`fixtures/${fixture}/images`);
if (existsSync(fixtureImagesDir)) {
const runImagesDir = join(runDir, "images");
mkdirSync(runImagesDir, { recursive: true });
for (const f of readdirSync(fixtureImagesDir)) {
copyFileSync(join(fixtureImagesDir, f), join(runImagesDir, f));
}
}
// Render HTML to screenshot
if (/<script\b/i.test(html) || /\bon\w+=/i.test(html)) {
throw new Error("Refusing to render HTML with executable content");
}
console.log(` Rendering screenshot...`);
const codePngPath = join(runDir, "code.png");
const figmaScreenshotPath = getFixtureScreenshotPath(fixture);
const figmaPng = readFileSync(figmaScreenshotPath);
const { PNG } = await import("pngjs");
const figmaImage = PNG.sync.read(figmaPng);
const exportScale = 2;
const logicalW = Math.max(1, Math.round(figmaImage.width / exportScale));
const logicalH = Math.max(1, Math.round(figmaImage.height / exportScale));
await renderCodeScreenshot(htmlPath, codePngPath, { width: logicalW, height: logicalH }, exportScale);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/agents/ablation/run-phase1.ts` around lines 317 - 347, The HTML from
parseResponse is untrusted: before rendering it, validate/sanitize it in
run-phase1.ts (e.g., in the block that writes htmlPath) by rejecting or
stripping any <script>, <iframe>, inline event handlers (on*), and external
resource URLs (http(s):) or at minimum fail the run if such tags/attributes are
present; then harden the renderer in src/core/engine/visual-compare.ts inside
the renderCodeScreenshot function by disabling JavaScript execution on the page
(use the API that sets JavaScriptEnabled to false), enable request interception
and block/abort all network requests (allow only data: or file: if absolutely
required), and ensure no navigation/waitUntil phases rely on network
activity—this will prevent prompt-injected scripts from running or making
outbound requests when rendering the untrusted model HTML.

let-sunny and others added 2 commits March 28, 2026 00:08
LLM output starts with "// filename: index.html" which is not valid
HTML. Playwright's body > *:first-child selector fails because the
root element is invisible. Strip the prefix before saving.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Untrusted model output rendered in Chromium via Playwright.
Remove script tags to prevent arbitrary JS execution during
screenshot capture.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (3)
src/agents/ablation/run-phase1.ts (3)

154-193: ⚠️ Potential issue | 🟠 Major

Replace the partial cache guard with a schema-backed parse.

isValidRunResult() only validates a few numerics and that cacheKey is “an object”. A malformed result.json can still slip bad type, fixture, runIndex, interpretationsCount, or nested cacheKey fields into the cache-hit path when it is later reloaded as RunResult. Parse cached results through one Zod schema instead of a partial type guard/cast, and infer the TypeScript type from that schema. As per coding guidelines, "Use Zod schemas for external input validation (schemas in contracts/ directory)" and "Infer TypeScript types from Zod schemas using z.infer".

#!/bin/bash
rg -n 'isValidRunResult|JSON.parse\(readFileSync\(.*result\.json|as RunResult' src/agents/ablation/run-phase1.ts || true
rg -n 'RunResultSchema|CacheKeySchema' src || true
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/agents/ablation/run-phase1.ts` around lines 154 - 193, Replace the
fragile isValidRunResult + manual JSON cast with a Zod-backed parse: import the
RunResult schema (and the nested CacheKey schema) from the contracts directory,
infer the TypeScript type with z.infer<typeof RunResultSchema>, and use
RunResultSchema.parse(parsed) inside isCacheValid to validate the contents of
result.json (instead of isValidRunResult and the cast to RunResult). Ensure you
remove the loose "cacheKey is object" check and instead compare fields against
the validated result.cacheKey typed from the schema (use the schema's
fixtureHash field), and handle parse errors by returning false in the catch
block.

145-150: ⚠️ Potential issue | 🟠 Major

Include fixture asset directories in fixtureHash.

computeFixtureHash() only hashes data.json and the reference screenshot, but getDesignTreeOptions() feeds vectors/ and images/ into generateDesignTree(), and the run later copies images/ into the render sandbox. Editing those assets leaves the cache key unchanged, so stale result.json / code.png / diff.png can be reused as valid cache hits. Hash the optional asset directories recursively as part of the fixture inputs.

Also applies to: 271-278, 357-364

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/agents/ablation/run-phase1.ts` around lines 145 - 150, computeFixtureHash
currently only includes data.json and the screenshot, missing mutable asset
directories; update computeFixtureHash to also recursively walk and include
contents of fixtures/<fixture>/vectors/ and fixtures/<fixture>/images/ (if they
exist) into the SHA256 via createHash (same hash instance used for data.json and
screenshot), and include file paths / file contents deterministically (e.g.,
sorted) so changes to any asset file update the hash; apply the same
directory-recursive hashing fix to the other duplicated implementations
referenced in the comment so that getDesignTreeOptions -> generateDesignTree
inputs (vectors/images) and the render-sandbox image copy are part of the cache
key.

334-355: ⚠️ Potential issue | 🔴 Critical

Reject executable or remote-resource HTML before the browser render.

The HTML here comes straight from the model and is only rewritten for fonts before rendering. A prompt-injected response can still carry <script>, on*= handlers, <iframe>, or model-supplied http(s) / file: URLs and turn the screenshot step into network or local-file access. Fail closed on that content before writing output.html.

Also applies to: 367-378

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/agents/ablation/run-phase1.ts` around lines 334 - 355, The HTML from
parseResponse (variables html and finalHtml) must be scanned and rejected if it
contains executable or remote-resource content before writing output.html
(htmlPath) or interpretations.json; update run-phase1.ts to parse finalHtml
(e.g., with a DOM parser like JSDOM or a sanitizer library) and fail-closed by
throwing/logging and aborting when any disallowed elements/attributes are
present—specifically detect <script>, <iframe>, <object>, <embed>, inline event
handlers (attributes matching /^on/i), and any src/href attributes containing
http:, https:, file:, or data: URLs; apply this check right after building
finalHtml and before calling writeFileSync, or alternatively sanitize by
removing those nodes/attributes and only proceed if resulting HTML contains no
remote/executable references.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/agents/ablation/run-phase1.ts`:
- Around line 449-454: The deltaT sign is inverted: currently deltas.deltaT is
computed as ablation minus baseline so higher values mean more tokens (worse)
but later logic treats larger as better; update all occurrences where deltaT is
pushed (e.g., the deltas.deltaT.push(...) lines near deltas.deltaV/deltaI and
the other two spots flagged) to compute baseline.totalTokens - r.totalTokens
instead (or alternatively change downstream sorting to sort avgDeltaT
ascending); ensure references to deltas.deltaT and avgDeltaT remain consistent
after the change so ties rank variants that save tokens higher.
- Around line 520-522: The fixtures list assigned in run-phase1.ts must be
validated and sandboxed: when computing fixtures from
process.env["ABLATION_FIXTURES"], validate each entry against a strict safe-name
pattern (e.g. allow only alphanumerics, dots, dashes, underscores) and reject or
ignore any that don't match; additionally, after resolving a fixture path into
the fixture input directory and when building run directories via getRunDir(),
canonicalize (resolve) the path and assert it is still rooted under the expected
base directories (e.g., the fixtures base and logs/ablation/phase1 base) before
using it, failing fast or skipping any fixture that escapes its base. Use the
existing fixtures variable and the code that joins with the fixtures base and
getRunDir() to add these checks so no input can traverse out of the intended
sandbox.

---

Duplicate comments:
In `@src/agents/ablation/run-phase1.ts`:
- Around line 154-193: Replace the fragile isValidRunResult + manual JSON cast
with a Zod-backed parse: import the RunResult schema (and the nested CacheKey
schema) from the contracts directory, infer the TypeScript type with
z.infer<typeof RunResultSchema>, and use RunResultSchema.parse(parsed) inside
isCacheValid to validate the contents of result.json (instead of
isValidRunResult and the cast to RunResult). Ensure you remove the loose
"cacheKey is object" check and instead compare fields against the validated
result.cacheKey typed from the schema (use the schema's fixtureHash field), and
handle parse errors by returning false in the catch block.
- Around line 145-150: computeFixtureHash currently only includes data.json and
the screenshot, missing mutable asset directories; update computeFixtureHash to
also recursively walk and include contents of fixtures/<fixture>/vectors/ and
fixtures/<fixture>/images/ (if they exist) into the SHA256 via createHash (same
hash instance used for data.json and screenshot), and include file paths / file
contents deterministically (e.g., sorted) so changes to any asset file update
the hash; apply the same directory-recursive hashing fix to the other duplicated
implementations referenced in the comment so that getDesignTreeOptions ->
generateDesignTree inputs (vectors/images) and the render-sandbox image copy are
part of the cache key.
- Around line 334-355: The HTML from parseResponse (variables html and
finalHtml) must be scanned and rejected if it contains executable or
remote-resource content before writing output.html (htmlPath) or
interpretations.json; update run-phase1.ts to parse finalHtml (e.g., with a DOM
parser like JSDOM or a sanitizer library) and fail-closed by throwing/logging
and aborting when any disallowed elements/attributes are present—specifically
detect <script>, <iframe>, <object>, <embed>, inline event handlers (attributes
matching /^on/i), and any src/href attributes containing http:, https:, file:,
or data: URLs; apply this check right after building finalHtml and before
calling writeFileSync, or alternatively sanitize by removing those
nodes/attributes and only proceed if resulting HTML contains no
remote/executable references.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 75e1572c-6fa8-46f0-b35c-8824d57217c7

📥 Commits

Reviewing files that changed from the base of the PR and between 2533e38 and 01742a1.

⛔ Files ignored due to path filters (1)
  • assets/fonts/Inter.var.woff2 is excluded by !**/*.woff2
📒 Files selected for processing (1)
  • src/agents/ablation/run-phase1.ts

Comment on lines +520 to +522
const fixtures = process.env["ABLATION_FIXTURES"]
? process.env["ABLATION_FIXTURES"].split(",").map((s) => s.trim()).filter(Boolean)
: DEFAULT_FIXTURES;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Validate fixture names before feeding them into path resolution.

ABLATION_FIXTURES is split and trimmed, but never sanitized before later being interpolated into both fixture input paths and getRunDir(). A value like ../../tmp/x escapes fixtures/ and logs/ablation/phase1/, letting this script read or write outside its intended sandbox. Restrict fixture names to a safe pattern and reject resolved paths that leave the expected base directory.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/agents/ablation/run-phase1.ts` around lines 520 - 522, The fixtures list
assigned in run-phase1.ts must be validated and sandboxed: when computing
fixtures from process.env["ABLATION_FIXTURES"], validate each entry against a
strict safe-name pattern (e.g. allow only alphanumerics, dots, dashes,
underscores) and reject or ignore any that don't match; additionally, after
resolving a fixture path into the fixture input directory and when building run
directories via getRunDir(), canonicalize (resolve) the path and assert it is
still rooted under the expected base directories (e.g., the fixtures base and
logs/ablation/phase1 base) before using it, failing fast or skipping any fixture
that escapes its base. Use the existing fixtures variable and the code that
joins with the fixtures base and getRunDir() to add these checks so no input can
traverse out of the intended sandbox.

- ΔT: baseline - ablation (positive = fewer tokens = better)
  Was ablation - baseline (more tokens ranked higher in ties)
- Fixture name validation: reject path traversal attempts
  (only lowercase alphanumeric, hyphens, underscores)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (5)
src/agents/ablation/run-phase1.ts (5)

154-165: ⚠️ Potential issue | 🟠 Major

Schema-validate cached result.json before using it.

isValidRunResult() narrows to RunResult after checking only a few numeric fields, and the cache-hit path at Line 608 bypasses even that with as RunResult. A malformed cache file can still poison rankings or token totals. Parse once with a Zod schema from the repo’s contracts/ directory and infer the TS type from that schema instead of asserting it. As per coding guidelines, "Use Zod schemas for external input validation (schemas in contracts/ directory)" and "Infer TypeScript types from Zod schemas using z.infer<typeof Schema>."

Also applies to: 177-181, 607-608

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/agents/ablation/run-phase1.ts` around lines 154 - 165, Replace the ad-hoc
isValidRunResult check and any unchecked casts to RunResult with a Zod-based
validation: import the appropriate RunResult schema from the contracts/
directory (e.g., RunResultSchema), use RunResultSchema.safeParse(...) when
reading/parsing cached result.json and in isValidRunResult to both validate and
infer the type (z.infer<typeof RunResultSchema>), and handle failures by
rejecting the cache-hit (log the error and proceed as a cache miss or throw).
Update the cache-hit path that currently uses "as RunResult" to use the schema
result (or discard on safeParse.error) so malformed cache files cannot poison
rankings or token totals.

197-199: ⚠️ Potential issue | 🟠 Major

Reject unsafe fixture names before resolving paths.

ABLATION_FIXTURES entries are trimmed but never sanitized. Values like ../../tmp/x escape both fixtures/ and logs/ablation/phase1/ through resolve(...) and getRunDir(), which turns an env var into arbitrary read/write access outside the experiment sandbox. Validate against a safe fixture-name pattern and assert the resolved paths stay under the expected base directories.

Also applies to: 523-526, 564-568

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/agents/ablation/run-phase1.ts` around lines 197 - 199, The getRunDir
function (and other places using ABLATION_FIXTURES/resolve) currently accepts
unvalidated fixture names which allows path traversal; validate fixture names
against a strict safe pattern (e.g., /^[a-z0-9-_]+$/i) before using them and
reject/throw on invalid values, and after calling resolve(OUTPUT_DIR, fixture,
...) assert that the resolved path is a descendant of the intended base
(OUTPUT_DIR or fixtures base) by comparing path.normalize/resolve prefixes so a
malicious input like "../../tmp" cannot escape the sandbox; apply the same
validation/assertion to the other occurrences mentioned (around lines 523-526
and 564-568) so all uses of ABLATION_FIXTURES are sanitized and bounded.

345-349: ⚠️ Potential issue | 🔴 Critical

<script> stripping alone does not make this HTML safe to render.

Inline on* handlers, <iframe>/<object>, and external http(s) resources still survive into output.html, and the screenshot renderer loads that HTML in a normal browser context. A prompt-injected response can still run or fetch during screenshot capture unless the remaining active content is rejected or stripped too.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/agents/ablation/run-phase1.ts` around lines 345 - 349, The current
sanitization only strips <script> and specific <link> tags but leaves inline
event handlers, frames, objects, and external resource references; update the
sanitization around finalHtml to fully neutralize active content by parsing
finalHtml into a DOM and: remove element types iframe, object, embed, frame;
strip any attributes that match /^on/i (e.g., onclick, onload) and remove
javascript: URLs in href/src attributes; rewrite or remove src/href that
reference external http(s) resources (and data: if you want to avoid remote-like
behavior) or replace them with inert placeholders; and ensure the existing
replace calls on finalHtml are replaced with this DOM-based sanitizer (target
the code around the finalHtml variable and the script/link replace logic).

145-150: ⚠️ Potential issue | 🟠 Major

Hash images/ and vectors/ into the cache key.

computeFixtureHash() only fingerprints data.json and the reference screenshot. This file also consumes fixtures/<fixture>/vectors/ when building the design tree and fixtures/<fixture>/images/ when rendering, so asset changes can still hit a stale cache and skew the rankings. Hash those directories too, or hash the generated tree plus copied assets before accepting result.json.

Also applies to: 271-279, 361-369

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/agents/ablation/run-phase1.ts` around lines 145 - 150, computeFixtureHash
currently only includes data.json and the reference screenshot so changes under
fixtures/<fixture>/images/ and fixtures/<fixture>/vectors/ can be missed; update
computeFixtureHash to also walk and incorporate the contents (and relative
paths) of the images/ and vectors/ directories into the SHA256 hash (e.g.,
recursively read each file in fixtures/${fixture}/images and
fixtures/${fixture}/vectors and feed their bytes and path names into
hash.update), and apply the same change to the other identical hashing sites in
this file; alternatively, if you already generate a design tree and copy assets
before finalizing result.json, hash the generated tree plus copied asset files
(their bytes and relative paths) and use that value as the cache key before
accepting result.json.

453-458: ⚠️ Potential issue | 🟠 Major

Flip ΔT so the efficiency tie-break prefers cheaper runs.

avgDeltaT is sorted descending, but Line 458 records ablation - baseline. In ties, a more expensive ablation outranks one that saves tokens.

🔧 Proposed fix
-    deltas.deltaT.push(r.totalTokens - baseline.totalTokens);
+    deltas.deltaT.push(baseline.totalTokens - r.totalTokens);

Also applies to: 486-490

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/agents/ablation/run-phase1.ts` around lines 453 - 458, The ΔT push
currently records r.totalTokens - baseline.totalTokens which makes positive
values mean the ablation is more expensive; change the sign so ΔT =
baseline.totalTokens - r.totalTokens by replacing
deltas.deltaT.push(r.totalTokens - baseline.totalTokens) with
deltas.deltaT.push(baseline.totalTokens - r.totalTokens) so avgDeltaT (sorted
descending) treats token savings as better; make the same flip in the other
occurrence (the block around deltas.deltaT.push at 486-490) so both places
produce positive values for cheaper runs.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@src/agents/ablation/run-phase1.ts`:
- Around line 154-165: Replace the ad-hoc isValidRunResult check and any
unchecked casts to RunResult with a Zod-based validation: import the appropriate
RunResult schema from the contracts/ directory (e.g., RunResultSchema), use
RunResultSchema.safeParse(...) when reading/parsing cached result.json and in
isValidRunResult to both validate and infer the type (z.infer<typeof
RunResultSchema>), and handle failures by rejecting the cache-hit (log the error
and proceed as a cache miss or throw). Update the cache-hit path that currently
uses "as RunResult" to use the schema result (or discard on safeParse.error) so
malformed cache files cannot poison rankings or token totals.
- Around line 197-199: The getRunDir function (and other places using
ABLATION_FIXTURES/resolve) currently accepts unvalidated fixture names which
allows path traversal; validate fixture names against a strict safe pattern
(e.g., /^[a-z0-9-_]+$/i) before using them and reject/throw on invalid values,
and after calling resolve(OUTPUT_DIR, fixture, ...) assert that the resolved
path is a descendant of the intended base (OUTPUT_DIR or fixtures base) by
comparing path.normalize/resolve prefixes so a malicious input like "../../tmp"
cannot escape the sandbox; apply the same validation/assertion to the other
occurrences mentioned (around lines 523-526 and 564-568) so all uses of
ABLATION_FIXTURES are sanitized and bounded.
- Around line 345-349: The current sanitization only strips <script> and
specific <link> tags but leaves inline event handlers, frames, objects, and
external resource references; update the sanitization around finalHtml to fully
neutralize active content by parsing finalHtml into a DOM and: remove element
types iframe, object, embed, frame; strip any attributes that match /^on/i
(e.g., onclick, onload) and remove javascript: URLs in href/src attributes;
rewrite or remove src/href that reference external http(s) resources (and data:
if you want to avoid remote-like behavior) or replace them with inert
placeholders; and ensure the existing replace calls on finalHtml are replaced
with this DOM-based sanitizer (target the code around the finalHtml variable and
the script/link replace logic).
- Around line 145-150: computeFixtureHash currently only includes data.json and
the reference screenshot so changes under fixtures/<fixture>/images/ and
fixtures/<fixture>/vectors/ can be missed; update computeFixtureHash to also
walk and incorporate the contents (and relative paths) of the images/ and
vectors/ directories into the SHA256 hash (e.g., recursively read each file in
fixtures/${fixture}/images and fixtures/${fixture}/vectors and feed their bytes
and path names into hash.update), and apply the same change to the other
identical hashing sites in this file; alternatively, if you already generate a
design tree and copy assets before finalizing result.json, hash the generated
tree plus copied asset files (their bytes and relative paths) and use that value
as the cache key before accepting result.json.
- Around line 453-458: The ΔT push currently records r.totalTokens -
baseline.totalTokens which makes positive values mean the ablation is more
expensive; change the sign so ΔT = baseline.totalTokens - r.totalTokens by
replacing deltas.deltaT.push(r.totalTokens - baseline.totalTokens) with
deltas.deltaT.push(baseline.totalTokens - r.totalTokens) so avgDeltaT (sorted
descending) treats token savings as better; make the same flip in the other
occurrence (the block around deltas.deltaT.push at 486-490) so both places
produce positive values for cheaper runs.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 5821fbe7-5272-4eee-9e6c-0eb1964162d0

📥 Commits

Reviewing files that changed from the base of the PR and between 01742a1 and 68ab026.

📒 Files selected for processing (1)
  • src/agents/ablation/run-phase1.ts

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/agents/ablation/run-phase1.ts`:
- Around line 110-123: computeConfigVersion currently builds a hash from
coreFiles but omits the Figma loader; update the coreFiles array inside
computeConfigVersion to include the resolver path for
"src/core/engine/figma-file-loader.ts" so changes to the loader invalidate the
cache—locate the computeConfigVersion function and add the figma-file-loader
entry alongside the other resolve(...) entries.
- Around line 345-349: Sanitize finalHtml more robustly by removing inline event
handler attributes and javascript: URLs in addition to stripping <script> and
Google Fonts links: update the sanitization around the finalHtml.replace calls
(the lines that currently strip <script> and <link> tags) to also remove
attributes matching /^on[a-z]+/i and href/src values starting with "javascript:"
(and consider stripping style attributes that include url("javascript:")
patterns); as defense-in-depth, also harden the Playwright rendering path — add
javaScriptEnabled: false to the browser context options used by
renderCodeScreenshot() (or the context creation in visual-compare.ts) so even if
sanitization misses something the page cannot execute JS.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 23115220-3ad3-4858-a9b3-417b05f88919

📥 Commits

Reviewing files that changed from the base of the PR and between 68ab026 and 9ae7067.

📒 Files selected for processing (1)
  • src/agents/ablation/run-phase1.ts

@let-sunny let-sunny marked this pull request as draft March 27, 2026 15:44
Critical — SVG icon mapping bug:
- save-fixture + implement: download loop skipped nodes without URL
  but mapping loop didn't, causing filename offset (icon-2.svg mapped
  to wrong node). Fixed by single-pass mapping+download.
- Vector filenames now include parent name (Star-Icon vs Figma-Icon)
  since all vectors are named "Icon" in Figma.

Experiment fixes:
- Crop code.png to match figma.png dimensions before comparison
  (AI renders taller HTML → all pixels below design height were diff)
- coreFiles: add figma-file-loader.ts to config version hash
- HTML sanitization: strip event handlers + javascript: URLs
- ΔT sign: baseline - ablation (positive = fewer tokens = better)
- Fixture name validation: reject path traversal

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@let-sunny let-sunny marked this pull request as ready for review March 27, 2026 15:54
let-sunny and others added 3 commits March 28, 2026 00:59
Figma's layoutPositioning: ABSOLUTE was ignored in design-tree output.
Now emits:
- position: absolute + top/left (relative to parent bbox) for ABSOLUTE nodes
- position: relative on parents that have ABSOLUTE children

This fixes icon buttons and dropdown overlays that were rendered
in normal flow instead of their intended absolute positions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…break)

FILL sizing + flex-grow: 1 together causes flex children to take full
width and push siblings to next line in wrap containers. flex-grow: 1
alone correctly fills remaining space without forcing 100% width.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…retch)

Figma's counterAxisAlignItems defaults to MIN (flex-start) when unset,
but CSS flex align-items defaults to stretch. This mismatch caused
children to stretch instead of aligning to start.

Also: skip textAlignVertical: TOP (CSS default, no emit needed).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
let-sunny and others added 2 commits March 28, 2026 01:28
SVGs renamed from generic icon-N.svg to parent-named (star-icon,
figma-icon, chevron-down-icon, etc.) with corrected mapping.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
All fixtures rebuilt with:
- Parent-named SVG filenames (star-icon, figma-icon, etc.)
- Corrected mapping (single-pass, no offset bug)
- interactionDestinations included
- 4 viewport screenshots (375/768/1200/1920)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (2)
src/agents/ablation/run-phase1.ts (2)

145-152: ⚠️ Potential issue | 🟠 Major

Cache keys still miss asset-driven tree changes.

computeFixtureHash() only hashes data.json and the reference screenshot, but generateDesignTree(file, getDesignTreeOptions(fixture)) also depends on fixture vectors/ and images/ content. src/core/engine/design-tree.ts:586-606 loads mapping.json from those directories, so fixture rebuilds like this PR's vector renames can still reuse stale similarity/token results because the loop checks the cache before hashing the actual per-condition tree.

Hash the current tree plus the reference screenshot, or hash every asset/mapping file that feeds getDesignTreeOptions(), and build the cache key after tree is derived for the current condition.

Also applies to: 615-617, 643-654

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/agents/ablation/run-phase1.ts` around lines 145 - 152, computeFixtureHash
currently only hashes data.json and the screenshot, missing assets that affect
generateDesignTree (vectors/, images/, mapping.json) so cache keys can be stale;
update the cache key generation to include the actual design tree used for the
condition (or include all asset/mapping files used by getDesignTreeOptions) by
computing the tree via generateDesignTree(file, getDesignTreeOptions(fixture))
first, then hashing the serialized tree plus the reference screenshot (or hashed
contents of vectors/, images/, and mapping.json) before storing/looking up cache
entries; apply the same change to the other cache-key sites in this file that
call computeFixtureHash or build per-condition cache keys so all checks use the
tree-aware hash.

346-352: ⚠️ Potential issue | 🔴 Critical

Untrusted model HTML is still rendered with active browser capabilities.

This sanitizer only removes <script>, quoted on*= handlers, and href="javascript:...". The renderer used at Line 387 still goes through renderCodeScreenshot(), and the current src/core/engine/visual-compare.ts:1-35 snippet shows a Playwright context without javaScriptEnabled: false or request blocking. A prompt-injected response can still execute or make outbound requests during page.goto(..., { waitUntil: "networkidle" }).

Harden both sides: disable JS and block network in the renderer, and reject or strip the remaining executable/resource vectors here (src=javascript:, unquoted handlers, iframes, external URLs, etc.).

Verify that the renderer still leaves JS/network enabled and that the sanitizer only handles a narrow subset of executable attributes. Expected result: src/core/engine/visual-compare.ts shows newContext without hardening and src/agents/ablation/run-phase1.ts shows no handling for src="javascript:", unquoted handlers, or remote resource URLs.

#!/bin/bash
set -euo pipefail

echo "=== src/core/engine/visual-compare.ts ==="
sed -n '1,40p' src/core/engine/visual-compare.ts

echo
echo "=== src/agents/ablation/run-phase1.ts ==="
sed -n '340,390p' src/agents/ablation/run-phase1.ts

Also applies to: 387-387

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/agents/ablation/run-phase1.ts` around lines 346 - 352, Sanitizer and
renderer are insufficient: in src/agents/ablation/run-phase1.ts the finalHtml
cleanup only strips <script>, quoted on* handlers and javascript: hrefs, and the
renderer (renderCodeScreenshot / src/core/engine/visual-compare.ts newContext)
still allows JS/network; fix by (1) hardening the Playwright context used by
renderCodeScreenshot: createContext/newContext must set javaScriptEnabled: false
and block network requests (or use route to abort non-data: requests) so pages
cannot execute JS or fetch external resources, and (2) extend the HTML sanitizer
in run-phase1.ts (the finalHtml replacements) to also remove
src="javascript:..." and src='javascript:...', unquoted on* handlers (e.g.,
onload=...), strip or neutralize <iframe> and <object> tags, and remove/relocate
external resource URLs (e.g., <link href="https://fonts..." rel=...>, <img
src="http(s)://...">) so no remote resources can be loaded; reference functions:
renderCodeScreenshot, newContext (visual-compare.ts), and the finalHtml
replacement block in run-phase1.ts when making these changes.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/agents/ablation/run-phase1.ts`:
- Around line 620-655: Wrap each fixture and each individual run iteration in
their own try/catch blocks so a single failure does not abort the entire matrix:
call loadFigmaFileFromJson and generateDesignTree inside a fixture-level
try/catch and on error create a failure RunResult (matching the shape used by
allResults/summary.json) and push it to allResults, increment failure counters,
write/update summary.json, then continue to the next fixture; likewise wrap the
per-condition/per-run logic (including isCacheValid, readFileSync,
stripDesignTree, and runSingle) in a run-level try/catch and on any error
construct and push a failure RunResult (with type, run index, fixture id, and
error message), persist summary.json, and continue to the next run. Ensure you
reference generateDesignTree, loadFigmaFileFromJson, isCacheValid,
stripDesignTree, runSingle, allResults, and the existing summary write path so
resume support can skip completed entries on subsequent runs.

In `@src/core/engine/design-tree.ts`:
- Around line 351-364: The code currently can push both "position: absolute" and
later "position: relative" for the same node, which overwrites the absolute
positioning; update the logic in the block that adds the parent-relative rule so
it only adds "position: relative" when the node itself is not absolutely
positioned (check node.layoutPositioning !== "ABSOLUTE" or that styles does not
already include a "position: absolute" entry) and leave the existing absolute
branch (where node.layoutPositioning === "ABSOLUTE") unchanged; target the code
around node.layoutPositioning, the styles array, and the children check so
absolute nodes with absolute children keep "position: absolute" and only
non-absolute parents receive "position: relative".

---

Duplicate comments:
In `@src/agents/ablation/run-phase1.ts`:
- Around line 145-152: computeFixtureHash currently only hashes data.json and
the screenshot, missing assets that affect generateDesignTree (vectors/,
images/, mapping.json) so cache keys can be stale; update the cache key
generation to include the actual design tree used for the condition (or include
all asset/mapping files used by getDesignTreeOptions) by computing the tree via
generateDesignTree(file, getDesignTreeOptions(fixture)) first, then hashing the
serialized tree plus the reference screenshot (or hashed contents of vectors/,
images/, and mapping.json) before storing/looking up cache entries; apply the
same change to the other cache-key sites in this file that call
computeFixtureHash or build per-condition cache keys so all checks use the
tree-aware hash.
- Around line 346-352: Sanitizer and renderer are insufficient: in
src/agents/ablation/run-phase1.ts the finalHtml cleanup only strips <script>,
quoted on* handlers and javascript: hrefs, and the renderer
(renderCodeScreenshot / src/core/engine/visual-compare.ts newContext) still
allows JS/network; fix by (1) hardening the Playwright context used by
renderCodeScreenshot: createContext/newContext must set javaScriptEnabled: false
and block network requests (or use route to abort non-data: requests) so pages
cannot execute JS or fetch external resources, and (2) extend the HTML sanitizer
in run-phase1.ts (the finalHtml replacements) to also remove
src="javascript:..." and src='javascript:...', unquoted on* handlers (e.g.,
onload=...), strip or neutralize <iframe> and <object> tags, and remove/relocate
external resource URLs (e.g., <link href="https://fonts..." rel=...>, <img
src="http(s)://...">) so no remote resources can be loaded; reference functions:
renderCodeScreenshot, newContext (visual-compare.ts), and the finalHtml
replacement block in run-phase1.ts when making these changes.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 785f2a6d-6cd7-432d-8543-484709bb6501

📥 Commits

Reviewing files that changed from the base of the PR and between 68ab026 and 4bc2ba2.

⛔ Files ignored due to path filters (28)
  • fixtures/desktop-product-detail/data.json is excluded by !fixtures/**
  • fixtures/desktop-product-detail/screenshot-1920.png is excluded by !**/*.png, !fixtures/**, !**/*.png
  • fixtures/desktop-product-detail/vectors/chevron-down-icon-2.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/desktop-product-detail/vectors/chevron-down-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/desktop-product-detail/vectors/chevron-up-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/desktop-product-detail/vectors/figma-icon-2.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/desktop-product-detail/vectors/figma-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/desktop-product-detail/vectors/heart-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/desktop-product-detail/vectors/linkedin-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/desktop-product-detail/vectors/logo-instagram-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/desktop-product-detail/vectors/logo-youtube-icon.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/desktop-product-detail/vectors/mapping.json is excluded by !fixtures/**
  • fixtures/desktop-product-detail/vectors/star-icon-10.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/desktop-product-detail/vectors/star-icon-11.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/desktop-product-detail/vectors/star-icon-12.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/desktop-product-detail/vectors/star-icon-13.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/desktop-product-detail/vectors/star-icon-14.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/desktop-product-detail/vectors/star-icon-15.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/desktop-product-detail/vectors/star-icon-16.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/desktop-product-detail/vectors/star-icon-17.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/desktop-product-detail/vectors/star-icon-18.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/desktop-product-detail/vectors/star-icon-4.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/desktop-product-detail/vectors/star-icon-5.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/desktop-product-detail/vectors/star-icon-6.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/desktop-product-detail/vectors/star-icon-7.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/desktop-product-detail/vectors/star-icon-8.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/desktop-product-detail/vectors/star-icon-9.svg is excluded by !**/*.svg, !fixtures/**
  • fixtures/desktop-product-detail/vectors/x-logo-icon.svg is excluded by !**/*.svg, !fixtures/**
📒 Files selected for processing (5)
  • src/agents/ablation/run-phase1.ts
  • src/cli/commands/implement.ts
  • src/cli/commands/save-fixture.ts
  • src/cli/helpers.ts
  • src/core/engine/design-tree.ts

let-sunny and others added 7 commits March 28, 2026 01:48
Previously used getNodeImages (renders entire node including children)
for image downloads. Background images with text/buttons on top would
include those children in the PNG, causing duplicate content when
HTML also renders the text.

Now uses getImageFills API to download the original fill image only.

Also:
- collectImageNodes returns imageRef for direct fill lookup
- getImageFills added to FigmaClient
- Figma default align-items: flex-start emitted explicitly
- textAlignVertical: TOP skipped (CSS default)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Don't emit position: relative if node is already position: absolute
  (absolute node with absolute children would get both, relative wins)
- Wrap per-condition loop in try/catch so one failed run doesn't abort
  the entire experiment matrix. Failures logged to skippedFixtures.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- MAX_TOKENS 16000 → 32000 (responses were truncated, HTML block
  never closed, parsing failed for all conditions)
- Parse regex: match unclosed code blocks (truncated by max_tokens)
  using (?:```|$) instead of requiring closing ```

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Anthropic SDK requires streaming for operations that may take >10 min.
With MAX_TOKENS=32000, non-streaming calls hit this limit.
Switch to client.messages.stream() + await stream.finalMessage().

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Removing all colors or all fonts guarantees pixel diff — these test
whether raw data exists, not whether the data is organized/useful.
The interesting question (tokenization helps?) is already covered
by design-token-references strip type.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Without [hover]: data, AI cannot produce hover CSS. Same logic as
color/typography — tests data existence, not AI's inference ability.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ow coverage)

Same logic as hover — AI can't infer overflow: hidden if not provided.
Plus only a few nodes have it, so ΔV would be noise.

Remaining test types (5): layout-direction-spacing, size-constraints,
component-references, node-names-hierarchy, design-token-references.
These test AI's inference/organization ability, not data existence.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant