
Debug: Debug Dangerfile PR data content #34398

Closed
Sidnioulz wants to merge 49 commits into project/sb-agentic-setup from DEBUG-DANGER


Conversation

@Sidnioulz (Member) commented Mar 30, 2026

N/A

Summary by CodeRabbit

  • New Features

    • Automated PR-review AI skill that generates a scrollable single-page HTML review.
    • End-to-end evaluation system for Storybook setup: trial runner, grading, ghost-story checks, prompts, and utilities.
    • New agent drivers for Claude and Codex to run evaluation trials.
  • Improvements

    • Ghost-stories/testing accepts custom working directories.
    • Enable native TypeScript imports with explicit .ts extensions; Node guidance bumped to 22.22.1.
  • Dependencies

    • Added Claude and OpenAI Codex SDKs and a CLI helper library.

Eval system to test how well AI agents complete Storybook setup after
`npx storybook@latest init --yes` on real-world projects.

Features:
- Multi-LLM support: Claude Code (Opus/Sonnet/Haiku), GitHub Copilot CLI
  (Claude models + GPT-5.2-codex, GPT-5.2, GPT-5.1-codex-max)
- 6 test projects covering different tech stacks: styled-components/Redux,
  Tailwind/HeadlessUI, Zustand, ECharts, GraphQL
- Structured JSON output with execution metrics (cost, duration, turns)
  and grading (build success, TypeScript errors, quality score)
- CLI with project/model/agent selection, iterations, custom prompts

Usage: npx jiti scripts/eval/eval.ts --project wikitok --model claude-sonnet-4-6

Refs: #34295

Replace CLI process spawning with proper SDKs:
- Claude: @anthropic-ai/claude-agent-sdk with query() API
- Codex: @openai/codex-sdk with thread streaming API

Benefits: structured responses, proper cost tracking, no stream-json
parsing, no CLI installation dependency, full conversation transcript.
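
With an SDK, a driver consumes a stream of structured events instead of parsing stream-json from stdout. A minimal sketch of that consumption loop, assuming a hypothetical simplified `AgentEvent` union; the real `@anthropic-ai/claude-agent-sdk` and `@openai/codex-sdk` message types differ and are not reproduced here:

```typescript
// Hypothetical, simplified event shapes; NOT the real SDK message types.
type AgentEvent =
  | { type: "assistant"; text: string }
  | { type: "usage"; inputTokens: number; outputTokens: number }
  | { type: "result"; costUsd?: number };

interface ExecutionSummary {
  turns: number;
  inputTokens: number;
  outputTokens: number;
  costUsd?: number;
  transcript: string[];
}

// Drain an async event stream and fold it into one summary object.
async function collectExecution(events: AsyncIterable<AgentEvent>): Promise<ExecutionSummary> {
  const summary: ExecutionSummary = { turns: 0, inputTokens: 0, outputTokens: 0, transcript: [] };
  for await (const event of events) {
    switch (event.type) {
      case "assistant":
        // Count each assistant message as one turn and keep it for the transcript.
        summary.turns += 1;
        summary.transcript.push(event.text);
        break;
      case "usage":
        summary.inputTokens += event.inputTokens;
        summary.outputTokens += event.outputTokens;
        break;
      case "result":
        summary.costUsd = event.costUsd;
        break;
    }
  }
  return summary;
}

// A fake stream standing in for an SDK call.
async function* fakeStream(): AsyncGenerator<AgentEvent> {
  yield { type: "assistant", text: "Reading .storybook/main.ts" };
  yield { type: "usage", inputTokens: 1200, outputTokens: 300 };
  yield { type: "assistant", text: "Added preview decorators" };
  yield { type: "usage", inputTokens: 800, outputTokens: 150 };
  yield { type: "result", costUsd: 0.05 };
}

collectExecution(fakeStream()).then((s) => console.log(s.turns, s.costUsd)); // 2 0.05
```

Because the events are already typed objects, cost and turn tracking become simple accumulation rather than line-by-line JSON parsing.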

- Pre-prepared eval-baseline branches on forked repos (kasperpeulen/*)
  eliminates storybook init during trials
- Cache system: first run clones + installs, subsequent runs copy from
  cache — agent starts immediately
- Post-init baseline commit for clean git diffs
- Richer result schema: changed files, setup patterns, ghost stories
- Ghost stories grading via STORYBOOK_COMPONENT_PATHS + Vitest
- Setup pattern detection (tailwind, redux, router, etc.)
- Better prompt: allows story creation, focuses on real components
- Smarter cleanup: only removes starter stories, not project stories

Tested on wikitok: quality 1.0, build pass, 7/7 ghost stories, $0.78
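
The cache flow above (clone and install once, then copy for every later trial) can be sketched with Node's built-in `fs.cpSync`. This is a minimal illustration: `prepareFromCache` is a hypothetical name, and the caller-supplied `populate` callback stands in for the real clone-and-install step:

```typescript
import { cpSync, existsSync, mkdirSync, mkdtempSync, readFileSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical helper: fill the cache on first use, then copy it into a fresh trial dir.
function prepareFromCache(
  cacheDir: string,
  trialDir: string,
  populate: (dir: string) => void, // stands in for git clone + dependency install
): { fromCache: boolean } {
  const fromCache = existsSync(cacheDir);
  if (!fromCache) {
    mkdirSync(cacheDir, { recursive: true });
    populate(cacheDir); // slow path: runs only once per project
  }
  // Fast path for every trial: a recursive copy, so the agent can start immediately.
  cpSync(cacheDir, trialDir, { recursive: true });
  return { fromCache };
}

// Usage: the second call skips populate() and copies from the cache.
const root = mkdtempSync(join(tmpdir(), "eval-cache-"));
let populateCalls = 0;
const populate = (dir: string) => {
  populateCalls += 1;
  writeFileSync(join(dir, "package.json"), '{"name":"wikitok"}');
};
const first = prepareFromCache(join(root, "cache"), join(root, "trial-1"), populate);
const second = prepareFromCache(join(root, "cache"), join(root, "trial-2"), populate);
console.log(first.fromCache, second.fromCache, populateCalls); // false true 1
```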

- Google Sheets integration via Apps Script webhook (set EVAL_GOOGLE_SHEETS_URL)
- Run ID (per session) and upload ID (for grouping) like MCP eval
- Environment capture (node version, git branch/commit)
- Included google-apps-script.js for setting up the spreadsheet

Prompts are now composable: --prompt setup self-heal doctor
Each name maps to prompts/{name}.md, concatenated in order.

Available prompts:
- setup: base setup prompt (default)
- self-heal: iterative fix loop using vitest --project=storybook
- doctor: run diagnostics before large config changes

Updated verification to prefer vitest over storybook build since
storybook init creates the vitest integration automatically.
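
Prompt composition as described (each name maps to `prompts/{name}.md`, concatenated in order, with `setup` as the default) can be sketched as follows. The in-memory `files` map replaces disk reads to keep the example self-contained, its prompt texts are placeholders, and the blank-line separator is an assumption rather than necessarily what the harness uses:

```typescript
// Hypothetical stand-in for the prompts/ directory.
const files: Record<string, string> = {
  "prompts/setup.md": "Finish the Storybook setup for this project.",
  "prompts/self-heal.md": "Run vitest --project=storybook and fix failures until green.",
  "prompts/doctor.md": "Run diagnostics before large config changes.",
};

// Resolve each prompt name to prompts/{name}.md and join them in CLI order.
function composePrompt(names: string[], read: (path: string) => string | undefined): string {
  const selected = names.length > 0 ? names : ["setup"]; // "setup" is the default
  return selected
    .map((name) => {
      const path = `prompts/${name}.md`;
      const body = read(path);
      if (body === undefined) throw new Error(`Unknown prompt "${name}" (expected ${path})`);
      return body.trim();
    })
    .join("\n\n");
}

// Equivalent of `--prompt setup self-heal doctor`.
const composed = composePrompt(["setup", "self-heal", "doctor"], (p) => files[p]);
```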

- Move cleanEnv to utils (was duplicated in prepare-trial and grade)
- Replace fast-glob/glob with Node 22 built-in fs.globSync
- Compact setup-patterns rules into tuple array
- Remove manual file recursion in setup-patterns and ghost-stories
- Fix save.ts bug (relative(EVAL_ROOT, "") → removed trialPath)
- Remove unused logWarn, simplify logging helpers
- Tighten prepare-trial install detection into single expression
- Delete config.ts and generate-prompt.ts — merge PROJECTS into types.ts,
  prompts into utils.ts, inline agents map into run-task.ts
- computeQualityScore takes options object instead of 4 positional params
- Quality score now includes ghost stories (40%), build (25%),
  typecheck (25%), and performance (10%)
- exec() uses tinyexec native timeout instead of manual AbortController
- Codex agent tracks token usage and estimates cost from pricing table
- Environment fields renamed to evalBranch/evalCommit for clarity
- IPC sentinel shared as exported constant between eval.ts and eval-parallel.ts
- Summary tables now show quality score column
- setup-patterns uses object array instead of positional tuples
- prepare-repos.ts uses shared exec(), static imports, consistent quotes
- google-apps-script.js modernized to const/let + arrow functions
- Remove SupportedModel type alias (was just string)
- Fix .gitignore trailing newline, prompt no longer hardcodes React+Vite
- MAX_TURNS extracted as named constant in claude agent
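
Estimating cost from a pricing table, as the Codex agent now does, reduces to tokens multiplied by a per-million-token rate. A minimal sketch: the `TokenPricing` field names and the `example-model` prices below are placeholders for illustration, not real vendor pricing or the harness's actual table:

```typescript
// Hypothetical shape; field names are illustrative.
interface TokenPricing {
  inputUsdPerMillion: number;
  outputUsdPerMillion: number;
}

// Placeholder prices for illustration only; NOT real vendor pricing.
const PRICING: Record<string, TokenPricing> = {
  "example-model": { inputUsdPerMillion: 2, outputUsdPerMillion: 8 },
};

// Returns undefined for unknown models instead of guessing a price.
function estimateCostUsd(model: string, inputTokens: number, outputTokens: number): number | undefined {
  const pricing = PRICING[model];
  if (!pricing) return undefined;
  return (inputTokens * pricing.inputUsdPerMillion + outputTokens * pricing.outputUsdPerMillion) / 1_000_000;
}

console.log(estimateCostUsd("example-model", 250_000, 50_000)); // 0.9
```

Returning `undefined` for an unknown model keeps a missing price visible in reports rather than silently recording a cost of zero.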

…rts)

Core source files use extensionless import specifiers that fail under
Node's native TypeScript loader. Read numPassedTests/numTotalTests
directly from the vitest JSON report instead.

Node's native TypeScript loader requires explicit .ts extensions.
Add them to parse-vitest-report.ts and categorize-render-errors.ts
so the eval can import parseVitestResults from core via relative path.
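
Reading the counts straight from the report, as the first of these commits does, might look like this. The sketch assumes the report was written by Vitest's `json` reporter (which emits top-level `numPassedTests` and `numTotalTests`) and applies the `Math.round(x*100)/100` rounding mentioned in a later commit; `summarizeReport` is a hypothetical name:

```typescript
// Only the two fields this sketch reads; the real report contains much more.
interface VitestJsonReport {
  numPassedTests: number;
  numTotalTests: number;
}

interface GhostStorySummary {
  total: number;
  passed: number;
  successRate: number; // 0..1, rounded to two decimals
}

function summarizeReport(json: string): GhostStorySummary {
  const report = JSON.parse(json) as VitestJsonReport;
  const total = report.numTotalTests;
  const passed = report.numPassedTests;
  // Guard against an empty run instead of producing NaN.
  const successRate = total === 0 ? 0 : Math.round((passed / total) * 100) / 100;
  return { total, passed, successRate };
}

console.log(summarizeReport('{"numPassedTests":5,"numTotalTests":7}'));
// { total: 7, passed: 5, successRate: 0.71 }
```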

… tsconfig fixes

- Separate types from runtime config (types.ts + config.ts)
- Thread Logger through entire pipeline (fixes garbled parallel output)
- Replace fragile stdout sentinel IPC with Node fork/process.send
- Run storybook build + typecheck in parallel (saves ~60-120s/trial)
- Tighten Agent interface to single params object
- Add --agent/--model/--prompt filters to eval-parallel
- Make quality score weights configurable
- Add prompt template variable support
- Enable allowImportingTsExtensions in root and scripts tsconfigs
- Fix all pre-existing TS errors in eval files
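
The quality score described earlier (ghost stories 40%, build 25%, typecheck 25%, performance 10%) with configurable weights can be sketched as a weighted sum. This is an illustration under assumptions: the `ScoreInputs` field names are hypothetical, the performance input is taken as an already-normalized 0..1 value because the commits do not say how it is derived, and the two-decimal rounding is a choice made here for readability:

```typescript
interface ScoreWeights {
  ghostStories: number;
  build: number;
  typecheck: number;
  performance: number;
}

// Percentages from the commit message: 40 / 25 / 25 / 10.
const DEFAULT_SCORE_WEIGHTS: ScoreWeights = { ghostStories: 0.4, build: 0.25, typecheck: 0.25, performance: 0.1 };

// Hypothetical inputs object (instead of positional params).
interface ScoreInputs {
  ghostStorySuccessRate: number; // 0..1
  buildPassed: boolean;
  typecheckPassed: boolean;
  performance: number; // 0..1; derivation left to the caller (assumption)
}

function computeQualityScore(inputs: ScoreInputs, weights: ScoreWeights = DEFAULT_SCORE_WEIGHTS): number {
  const raw =
    weights.ghostStories * inputs.ghostStorySuccessRate +
    weights.build * (inputs.buildPassed ? 1 : 0) +
    weights.typecheck * (inputs.typecheckPassed ? 1 : 0) +
    weights.performance * inputs.performance;
  // Round to two decimals so floating-point noise does not leak into reports.
  return Math.round(raw * 100) / 100;
}

// All ghost stories pass, build passes, typecheck fails, half performance credit:
// 0.4 + 0.25 + 0 + 0.05 = 0.7
console.log(computeQualityScore({ ghostStorySuccessRate: 1, buildPassed: true, typecheckPassed: false, performance: 0.5 }));
```

Passing a custom `weights` object as the second argument is one way to make the weights configurable without changing call sites that rely on the defaults.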
kasperpeulen and others added 12 commits March 30, 2026 15:23
…slides

- Replace slideshow format with a scrollable HTML page using file cards
- Show complete file contents for new files, diffs for modified files
- Lexend + JetBrains Mono fonts, light/dark theme, mobile-responsive
- Static server on port 3000 (no live-reload)
- Issues shown inline as smell-boxes, never block page generation
- Simplified to 5 steps: gather → read → generate → serve → iterate

…bility review

- Two layers per area: curated walkthrough (API→Tests→Impl) + collapsed full files
- Use language-typescript with data-diff attribute instead of language-diff
- Post-processing script for line-level add/remove backgrounds on top of TS highlighting
- Add readability review guidance: logical order, clear names, comments, test quality
- Order areas high-level to low-level
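
The post-processing step (highlight the code as TypeScript, then add per-line add/remove backgrounds) needs each diff line's kind recorded before the `+`/`-` markers are stripped. A minimal sketch over a unified-diff hunk body; the function and the `line-add`/`line-remove` class names are hypothetical, and file headers (`+++`/`---`) are assumed to be removed beforehand:

```typescript
type LineKind = "add" | "remove" | "context";

interface ClassifiedLine {
  kind: LineKind;
  code: string; // line with the leading diff marker removed, ready for TS highlighting
}

// Split a hunk body into lines, remembering each line's kind so backgrounds
// can be applied after the TypeScript highlighter has run.
function classifyDiffLines(hunk: string): ClassifiedLine[] {
  return hunk.split("\n").map((line): ClassifiedLine => {
    if (line.startsWith("+")) return { kind: "add", code: line.slice(1) };
    if (line.startsWith("-")) return { kind: "remove", code: line.slice(1) };
    // Context lines in unified diffs start with a single space.
    return { kind: "context", code: line.startsWith(" ") ? line.slice(1) : line };
  });
}

// Hypothetical CSS class names for the per-line backgrounds.
const CLASS_FOR: Record<LineKind, string> = { add: "line-add", remove: "line-remove", context: "" };
```

Keeping the kind separate from the code lets the highlighter see valid TypeScript, which is why `language-typescript` plus a `data-diff` attribute can replace `language-diff`.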

Principle 3 now explicitly requires showing complete interface definitions
where they're first relevant, not just type names.

Extract AgentRunConfig { agent, model, effort } and compose it as
a `run` field in TrialConfig, ExecutionResult, and TrialResult
instead of spreading via extends/inheritance.

- AgentRunConfig → AgentVariant (it's the experimental variant, not a "run config")
- Agent → AgentDriver, AgentConfig → AgentDefinition (disambiguate)
- ExecutionResult → Execution, GradingResult → Grade, QualityResult → QualityScore
- TrialResult → TrialReport, TrialPaths → TrialWorkspace
- ChangedFile → FileChange, Pricing → TokenPricing, Environment → EvalEnvironment
- GhostStoriesResult → GhostStoryGrade, GhostStoryRunResult → GhostStoryOutput
- QualityWeights → ScoreWeights, DEFAULT_QUALITY_WEIGHTS → DEFAULT_SCORE_WEIGHTS
- Field renames: run → variant, grading → grade, quality → score,
  changedFiles → fileChanges, storybookFiles → storybookChanges
- Extract AgentExecuteParams with variant: AgentVariant (reuses the model)
- Remove redundant run field from Execution (lives on TrialReport only)
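
After these renames, the variant is composed as a field on the report rather than spread into result types through `extends`. A minimal sketch of that shape; the field sets are simplified and hypothetical, and `summarizeTrial` is an illustrative helper, not part of the harness:

```typescript
// Simplified, hypothetical field sets; the harness's real types carry more data.
interface AgentVariant {
  agent: string;
  model: string;
  effort?: string;
}

interface Execution {
  durationMs: number;
  turns: number;
  costUsd?: number;
}

interface Grade {
  buildSuccess: boolean;
  typecheckSuccess: boolean;
}

// Composition: the variant is one field, not a set of spread-in properties.
interface TrialReport {
  variant: AgentVariant;
  execution: Execution; // no variant here; it lives on the report only
  grade: Grade;
}

function summarizeTrial(report: TrialReport): string {
  const { agent, model } = report.variant;
  return `${agent}/${model}: ${report.execution.turns} turns, build ${report.grade.buildSuccess ? "pass" : "fail"}`;
}

console.log(
  summarizeTrial({
    variant: { agent: "claude", model: "sonnet" },
    execution: { durationMs: 1000, turns: 12 },
    grade: { buildSuccess: true, typecheckSuccess: false },
  }),
); // claude/sonnet: 12 turns, build pass
```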

Every project needs a branch for cloning. The type now reflects
that, and the `branch!` assertion in prepareTrial is no longer needed.

…Trial, throw on ghost story errors

- Make AgentVariant a discriminated union on agent, with typed model/effort per agent
- Rename runTask→runTrial and run-task.ts→run-trial.ts for consistent domain naming
- Store full Project in TrialReport instead of just the name for reproducibility
- Replace error-object returns with GhostStoryError throws in ghost-stories.ts
- Fix successRate rounding to use Math.round(x*100)/100 consistently
- Extract scoring magic numbers into named constants
- Validate git status chars against known set instead of blind casting
- Truncate build/typecheck output at line boundaries
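
Making `AgentVariant` a discriminated union on `agent` means each branch can only carry its own agent's models and effort levels, and narrowing on `agent` recovers those types. A sketch under assumptions: the model and effort literals below are illustrative (loosely based on the models named in the first commit), not the harness's actual allowed values:

```typescript
// Hypothetical model/effort literals; the real allowed values may differ.
type ClaudeVariant = {
  agent: "claude";
  model: "opus" | "sonnet" | "haiku";
  effort?: "low" | "medium" | "high";
};
type CodexVariant = {
  agent: "codex";
  model: "gpt-5.2-codex" | "gpt-5.2" | "gpt-5.1-codex-max";
  effort?: "minimal" | "low" | "medium" | "high";
};
type AgentVariant = ClaudeVariant | CodexVariant;

// Narrowing on `agent` gives each branch its own model/effort types.
function describeVariant(variant: AgentVariant): string {
  switch (variant.agent) {
    case "claude":
      return `Claude (${variant.model}, effort ${variant.effort ?? "default"})`;
    case "codex":
      return `Codex (${variant.model}, effort ${variant.effort ?? "default"})`;
    default: {
      // Exhaustiveness check: adding a new agent without a case fails to compile.
      const unreachable: never = variant;
      return unreachable;
    }
  }
}

// A mismatched pair is now a compile-time error:
// const bad: AgentVariant = { agent: "codex", model: "haiku" };
```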
Copilot AI review requested due to automatic review settings March 30, 2026 10:23
@github-actions (bot, Contributor) commented Mar 30, 2026

Fails
🚫 PR is marked with "ci: do not merge" label.

Generated by 🚫 dangerJS against 493d332

Copilot AI (Contributor) left a comment

Pull request overview

This PR introduces a new scripts/eval harness for running automated Storybook-setup trials (including agent execution + grading) and makes supporting updates to enable native Node execution of TypeScript with explicit .ts import specifiers. It also includes a few core “ghost stories” utility updates and a temporary Dangerfile debug print.

Changes:

  • Add an eval CLI (node scripts/eval/eval.ts) with Claude/Codex agent drivers, trial orchestration, grading, prompts, and unit tests.
  • Extend “ghost stories” utilities (core + eval harness) with cwd support and naming updates.
  • Enable allowImportingTsExtensions in tsconfigs and add new script/dependencies (plus a stub storybook skill CLI command).

Reviewed changes

Copilot reviewed 31 out of 34 changed files in this pull request and generated 4 comments.

Summary per file:
  • scripts/eval/eval.ts: New eval CLI entrypoint (arg parsing, parallel trials, results output).
  • scripts/eval/config.ts: Agent/project registry + pricing/cost estimation.
  • scripts/eval/types.ts: Shared types for eval pipeline + scoring schema.
  • scripts/eval/lib/run-trial.ts: Orchestrates prepare → run agent → grade → write report artifacts.
  • scripts/eval/lib/grade.ts: Computes grade outputs + quality score (build, typecheck, ghost stories, perf).
  • scripts/eval/lib/ghost-stories.ts: Eval-side ghost stories discovery + vitest execution/report parsing.
  • scripts/eval/lib/setup-patterns.ts: Scans .storybook/ configs for setup signals (CSS, providers, aliases, etc.).
  • scripts/eval/lib/prepare-trial.ts: Clones/caches benchmark repos and installs dependencies.
  • scripts/eval/lib/package-manager.ts: Detects PM and runs installs for prepared trials.
  • scripts/eval/lib/agents/claude-code.ts: Claude agent driver via @anthropic-ai/claude-agent-sdk.
  • scripts/eval/lib/agents/codex.ts: Codex agent driver via @openai/codex-sdk.
  • scripts/eval/**.test.ts: Vitest coverage for config/type invariants and eval utilities/pipeline.
  • scripts/eval/prompts/*.md: Prompt templates used by the eval harness.
  • scripts/package.json: Adds eval script + new dependencies for the eval system.
  • scripts/tsconfig.json: Enables allowImportingTsExtensions; adjusts excludes.
  • code/tsconfig.json: Enables allowImportingTsExtensions for code/ TypeScript.
  • code/core/src/core-server/utils/ghost-stories/run-story-tests.ts: Renames export to runGhostStories and adds optional cwd.
  • code/core/src/core-server/utils/ghost-stories/get-candidates.ts: Adds cwd option for globbing candidate components.
  • code/core/src/core-server/utils/ghost-stories/parse-vitest-report.ts: Updates imports to use explicit .ts extensions.
  • code/core/src/core-server/server-channel/ghost-stories-channel.ts: Switches to runGhostStories export.
  • code/core/src/shared/utils/categorize-render-errors.ts: Updates import to explicit .ts extension.
  • code/lib/cli-storybook/src/bin/run.ts: Adds a new (currently stubbed) skill CLI command.
  • scripts/dangerfile.js: Adds debug logging of PR data (should be removed before merge).
  • yarn.lock: Lockfile updates for new eval dependencies.
  • AGENTS.md: Documents Node version and migration toward native Node TS execution.
  • .gitignore: Adds eval-related ignore entries.
  • .agents/skills/review-pr/SKILL.md: Adds an agent “skill” definition for narrative PR review output.


Comment thread scripts/eval/eval.ts
Comment thread scripts/dangerfile.js
Comment thread scripts/eval/lib/run-trial.test.ts
Comment thread scripts/eval/eval.ts
@Sidnioulz Sidnioulz added bug ci:docs Run the CI jobs for documentation checks only. labels Mar 30, 2026
@Sidnioulz Sidnioulz changed the base branch from next to tech/channel-no-functions March 30, 2026 10:40
@coderabbitai (bot, Contributor) commented Mar 30, 2026

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 1ee956ea-0d8f-48d0-830c-9458d7e4bc3c

📥 Commits

Reviewing files that changed from the base of the PR and between d48f719 and 493d332.

⛔ Files ignored due to path filters (1)
  • yarn.lock is excluded by !**/yarn.lock, !**/*.lock
📒 Files selected for processing (33)
  • .agents/skills/review-pr/SKILL.md
  • .gitignore
  • AGENTS.md
  • code/core/src/core-server/server-channel/ghost-stories-channel.ts
  • code/core/src/core-server/utils/ghost-stories/get-candidates.ts
  • code/core/src/core-server/utils/ghost-stories/parse-vitest-report.ts
  • code/core/src/core-server/utils/ghost-stories/run-story-tests.ts
  • code/core/src/shared/utils/categorize-render-errors.ts
  • code/tsconfig.json
  • foo
  • scripts/dangerfile.js
  • scripts/eval/config.ts
  • scripts/eval/eval.ts
  • scripts/eval/lib/agents/claude-code.ts
  • scripts/eval/lib/agents/codex.ts
  • scripts/eval/lib/ghost-stories.ts
  • scripts/eval/lib/grade.test.ts
  • scripts/eval/lib/grade.ts
  • scripts/eval/lib/grading-helpers.test.ts
  • scripts/eval/lib/package-manager.ts
  • scripts/eval/lib/prepare-trial.ts
  • scripts/eval/lib/run-trial.test.ts
  • scripts/eval/lib/run-trial.ts
  • scripts/eval/lib/setup-patterns.test.ts
  • scripts/eval/lib/setup-patterns.ts
  • scripts/eval/lib/utils.test.ts
  • scripts/eval/lib/utils.ts
  • scripts/eval/prompts/self-heal.md
  • scripts/eval/prompts/setup.md
  • scripts/eval/types.test.ts
  • scripts/eval/types.ts
  • scripts/package.json
  • scripts/tsconfig.json

📝 Walkthrough

Adds a new PR-review agent skill and a comprehensive evaluation harness for Storybook setup: typed configs, CLI orchestration, agent drivers, grading/ghost-story execution, tests, utilities, tsconfig/import adjustments, package changes, and a modified dangerfile and .gitignore.

Changes

Each cohort lists its files, then a summary:

  • Agent Skill
    Files: ./.agents/skills/review-pr/SKILL.md
    New agent skill that builds a scrollable single-page HTML PR review (sticky nav, PR header, per-area walkthroughs, collapsed full diffs), writes to ~/life/slideshows/pr-<number>/index.html, and includes a small static server snippet.
  • TypeScript config & imports
    Files: code/tsconfig.json, scripts/tsconfig.json, code/core/src/shared/utils/categorize-render-errors.ts, code/core/src/core-server/utils/ghost-stories/parse-vitest-report.ts
    Enabled allowImportingTsExtensions, updated local imports to explicit .ts extensions, and added a tsconfig comment about native Node .ts execution.
  • Ghost-stories utilities & channel
    Files: code/core/src/core-server/utils/ghost-stories/get-candidates.ts, code/core/src/core-server/utils/ghost-stories/run-story-tests.ts, code/core/src/core-server/server-channel/ghost-stories-channel.ts
    Added optional cwd parameters for candidate discovery and test execution, renamed run entry to runGhostStories, adjusted imports and removed unused logger import; channel now calls runGhostStories.
  • Dangerfile & gitignore
    Files: scripts/dangerfile.js, .gitignore
    checkTargetBranch() now logs PR JSON and unconditionally calls fail() for non-owner/member authors; .gitignore added scripts/eval/.cache and scripts/eval/results and fixed newline.
  • Eval types & config
    Files: scripts/eval/types.ts, scripts/eval/config.ts
    New typed contracts for agents/trials/grades/reports and agent/project/pricing configuration with cost-estimation helper and PROJECTS list.
  • Eval CLI & orchestration
    Files: scripts/eval/eval.ts, scripts/eval/lib/run-trial.ts
    New CLI eval.ts with zod-validated options and multi-trial expansion; run-trial orchestrates prepare → agent execute → grade → report write.
  • Agent drivers
    Files: scripts/eval/lib/agents/claude-code.ts, scripts/eval/lib/agents/codex.ts
    New AgentDriver implementations for Claude and Codex: streamed event consumption, logging, token/turn/cost extraction, transcript persistence, and Execution return values.
  • Ghost-stories eval (new)
    Files: scripts/eval/lib/ghost-stories.ts
    Component discovery via glob, Vitest invocation with JSON report capture, failure classification, and GhostStoryOutput summary.
  • Grading & setup-patterns
    Files: scripts/eval/lib/grade.ts, scripts/eval/lib/setup-patterns.ts
    New grading pipeline: parse changed files, run build/tsc, compute quality score, optionally run ghost stories; setup-pattern detection scans .storybook/ files with regex rules.
  • Helpers & workspace
    Files: scripts/eval/lib/utils.ts, scripts/eval/lib/package-manager.ts, scripts/eval/lib/prepare-trial.ts
    Logging/formatting/trial-id/prompt utilities, environment capture, package-manager detection and install wrapper, trial workspace clone/cache and results dir setup.
  • Tests
    Files: scripts/eval/lib/*.test.ts, scripts/eval/types.test.ts
    Extensive Vitest suites covering grading helpers, setup-pattern detection, utils, run-trial sequencing, types/config invariants, and integration-like behavior with mocks.
  • Prompts & package
    Files: scripts/eval/prompts/setup.md, scripts/eval/prompts/self-heal.md, scripts/package.json
    Added evaluation prompts (setup, self-heal) and an eval npm script plus dependencies for Anthropic/OpenAI SDKs and citty.
  • Misc artifact
    Files: foo
    New single-line file containing terminal control escape sequences (likely accidental artifact).
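
Setup-pattern detection as summarized above (regex rules scanned over `.storybook/` files, stored as an object array rather than positional tuples) could be sketched like this. The rule names echo the commit message (tailwind, redux, router), but the regexes and `detectSetupPatterns` are hypothetical illustrations, not the harness's rules:

```typescript
// Hypothetical rules: names echo the commit message, regexes are illustrative guesses.
interface SetupPatternRule {
  name: string;
  pattern: RegExp;
}

const SETUP_PATTERN_RULES: SetupPatternRule[] = [
  { name: "tailwind", pattern: /\btailwind/ },
  { name: "redux", pattern: /<Provider\b[^>]*store=|@reduxjs\/toolkit/ },
  { name: "router", pattern: /MemoryRouter|BrowserRouter|react-router/ },
];

// Return the names of every rule that matches any of the given .storybook/ file contents.
// The regexes have no `g` flag, so `test()` is stateless and safe to reuse.
function detectSetupPatterns(files: Record<string, string>, rules = SETUP_PATTERN_RULES): string[] {
  const contents = Object.values(files);
  return rules.filter((rule) => contents.some((text) => rule.pattern.test(text))).map((rule) => rule.name);
}

const example = detectSetupPatterns({
  ".storybook/preview.tsx": 'import "../src/index.css";\nimport { MemoryRouter } from "react-router-dom";',
  ".storybook/main.ts": "export default { framework: '@storybook/react-vite' };",
});
console.log(example); // [ 'router' ]
```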

Sequence Diagram(s)

sequenceDiagram
    participant CLI as Eval CLI
    participant Runner as runTrial()
    participant Prep as prepareTrial()
    participant Agent as AgentDriver (claude/codex)
    participant Grade as grade()
    participant Ghost as runGhostStories()

    CLI->>Runner: runTrial(config)
    Runner->>Prep: prepareTrial(project, trialId)
    Prep-->>Runner: TrialWorkspace (repoRoot, projectPath, resultsDir, baselineCommit)

    Runner->>Agent: execute(prompt, projectPath, variant, resultsDir)
    activate Agent
    Agent->>Agent: stream events (messages, turns, token usage)
    Agent-->>Runner: Execution (duration, cost?, turns)
    deactivate Agent

    Runner->>Grade: grade(workspace, execution.duration)
    activate Grade
    Grade->>Grade: git diff, storybook build, tsc typecheck
    alt build success
        Grade->>Ghost: runGhostStories(candidates, { cwd })
        Ghost->>Ghost: npx vitest run (storybook) -> JSON report
        Ghost-->>Grade: GhostStoryOutput (total, passed, successRate)
    end
    Grade-->>Runner: Grade + QualityScore
    deactivate Grade

    Runner->>Runner: write summary.json, prompt.md
    Runner-->>CLI: TrialReport
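
The main path of the diagram (prepare, execute, grade, write report) can be sketched as a function with injected stages, which also makes the sequencing easy to test with stubs. The stage interfaces and field names are simplified and hypothetical, and the ghost-story branch inside grading is omitted:

```typescript
// Hypothetical, simplified stage signatures mirroring the diagram.
interface Workspace { projectPath: string; resultsDir: string }
interface Execution { durationMs: number; turns: number }
interface Grade { buildSuccess: boolean; qualityScore: number }
interface TrialReport { trialId: string; execution: Execution; grade: Grade }

interface TrialStages {
  prepare(trialId: string): Promise<Workspace>;
  execute(prompt: string, workspace: Workspace): Promise<Execution>;
  grade(workspace: Workspace, durationMs: number): Promise<Grade>;
  writeReport(workspace: Workspace, report: TrialReport): Promise<void>;
}

// Each stage feeds the next; stages are injected so they can be stubbed in tests.
async function runTrial(trialId: string, prompt: string, stages: TrialStages): Promise<TrialReport> {
  const workspace = await stages.prepare(trialId);
  const execution = await stages.execute(prompt, workspace);
  const grade = await stages.grade(workspace, execution.durationMs);
  const report: TrialReport = { trialId, execution, grade };
  await stages.writeReport(workspace, report);
  return report;
}
```

Injecting the stages mirrors the kind of run-trial sequencing tests the walkthrough mentions, where each stage is mocked and only the call order and report assembly are checked.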

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes


Warning

Review ran into problems

🔥 Problems

Git: Failed to clone repository. Please run the @coderabbitai full review command to re-trigger a full review. If the issue persists, set path_filters to include or exclude specific files.



@Sidnioulz Sidnioulz changed the base branch from tech/channel-no-functions to project/sb-agentic-setup March 30, 2026 10:45

Labels

bug ci: do not merge ci:docs Run the CI jobs for documentation checks only.

Projects

None yet


3 participants