Debug: Debug Dangerfile PR data content 2 by Sidnioulz · Pull Request #34399 · storybookjs/storybook

Sidnioulz · 2026-03-30T10:47:53Z

Closes #

What I did

Checklist for Contributors

Testing

The changes in this PR are covered in the following automated tests:

- stories
- unit tests
- integration tests
- end-to-end tests

Manual testing

ribbit

Documentation

- Add or update documentation reflecting your changes
- If you are deprecating/removing a feature, make sure to update MIGRATION.MD

Checklist for Maintainers

When this PR is ready for testing, make sure to add ci:normal, ci:merged or ci:daily GH label to it to run a specific set of sandboxes. The particular set of sandboxes can be found in code/lib/cli-storybook/src/sandbox-templates.ts
Make sure this PR contains one of the labels below:
Available labels
- bug: Internal changes that fix incorrect behavior.
- maintenance: User-facing maintenance tasks.
- dependencies: Upgrading (sometimes downgrading) dependencies.
- build: Internal-facing build tooling & test updates. Will not show up in release changelog.
- cleanup: Minor cleanup style change. Will not show up in release changelog.
- documentation: Documentation only changes. Will not show up in release changelog.
- feature request: Introducing a new feature.
- BREAKING CHANGE: Changes that break compatibility in some way with current major version.
- other: Changes that don't fit in the above categories.

🦋 Canary release

This PR does not have a canary release associated. You can request a canary release of this pull request by mentioning the @storybookjs/core team here.

Core team members can create a canary release here or locally with `gh workflow run --repo storybookjs/storybook publish.yml --field pr=<PR_NUMBER>`

Summary by CodeRabbit

Release Notes

New Features
- Added evaluation framework for benchmarking Storybook setup across projects with configurable agents and models.
- Introduced skill CLI command for Storybook.
- Added ghost stories testing workflow for automated component validation.
- Added setup guidance prompts and self-healing iteration workflow.
Updates
- Updated Node.js version guidance to 22.22.1.
- Enhanced TypeScript support with native .ts file execution.
- Improved component candidate discovery and grading system.

Eval system to test how well AI agents complete Storybook setup after `npx storybook@latest init --yes` on real-world projects.

Features:
- Multi-LLM support: Claude Code (Opus/Sonnet/Haiku), GitHub Copilot CLI (Claude models + GPT-5.2-codex, GPT-5.2, GPT-5.1-codex-max)
- 6 test projects covering different tech stacks: styled-components/Redux, Tailwind/HeadlessUI, Zustand, ECharts, GraphQL
- Structured JSON output with execution metrics (cost, duration, turns) and grading (build success, TypeScript errors, quality score)
- CLI with project/model/agent selection, iterations, custom prompts

Usage: `npx jiti scripts/eval/eval.ts --project wikitok --model claude-sonnet-4-6`

Refs: #34295

Replace CLI process spawning with proper SDKs:
- Claude: @anthropic-ai/claude-agent-sdk with query() API
- Codex: @openai/codex-sdk with thread streaming API

Benefits: structured responses, proper cost tracking, no stream-json parsing, no CLI installation dependency, full conversation transcript.

- Pre-prepared eval-baseline branches on forked repos (kasperpeulen/*) eliminate storybook init during trials
- Cache system: first run clones + installs, subsequent runs copy from cache, so the agent starts immediately
- Post-init baseline commit for clean git diffs
- Richer result schema: changed files, setup patterns, ghost stories
- Ghost stories grading via STORYBOOK_COMPONENT_PATHS + Vitest
- Setup pattern detection (tailwind, redux, router, etc.)
- Better prompt: allows story creation, focuses on real components
- Smarter cleanup: only removes starter stories, not project stories

Tested on wikitok: quality 1.0, build pass, 7/7 ghost stories, $0.78
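The cache behaviour described in this commit (populate once, copy for every later trial) could be sketched roughly as follows. This is an illustration only: `prepareWorkspace`, the `populate` callback (standing in for "clone + install"), and the directory layout are hypothetical names, not the PR's actual code.

```typescript
import { cp, mkdir, mkdtemp, stat, writeFile } from 'node:fs/promises';
import { tmpdir } from 'node:os';
import { join } from 'node:path';

/** Returns true when `path` exists on disk. */
async function pathExists(path: string): Promise<boolean> {
  try {
    await stat(path);
    return true;
  } catch {
    return false;
  }
}

/**
 * Hypothetical helper: fills the cache once via `populate`, then copies the
 * cached tree into a fresh trial directory on every call.
 */
async function prepareWorkspace(
  cacheRoot: string,
  project: string,
  trialDir: string,
  populate: (dir: string) => Promise<void>
): Promise<{ cacheHit: boolean }> {
  const cached = join(cacheRoot, project);
  const cacheHit = await pathExists(cached);
  if (!cacheHit) {
    await mkdir(cached, { recursive: true });
    await populate(cached);
  }
  await cp(cached, trialDir, { recursive: true });
  return { cacheHit };
}

/** Demo against a temporary directory: the second call should hit the cache. */
async function demoPrepareWorkspace(): Promise<{ populateCalls: number; hits: boolean[] }> {
  const root = await mkdtemp(join(tmpdir(), 'eval-cache-'));
  let populateCalls = 0;
  const populate = async (dir: string): Promise<void> => {
    populateCalls += 1;
    await writeFile(join(dir, 'marker.txt'), 'installed');
  };
  const first = await prepareWorkspace(join(root, 'cache'), 'demo', join(root, 'trial-1'), populate);
  const second = await prepareWorkspace(join(root, 'cache'), 'demo', join(root, 'trial-2'), populate);
  return { populateCalls, hits: [first.cacheHit, second.cacheHit] };
}
```

Note that this sketch leaves a partially filled cache directory behind if `populate` throws; a real implementation would probably want to populate into a temporary directory and rename it on success.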

- Google Sheets integration via Apps Script webhook (set EVAL_GOOGLE_SHEETS_URL)
- Run ID (per session) and upload ID (for grouping) like MCP eval
- Environment capture (node version, git branch/commit)
- Included google-apps-script.js for setting up the spreadsheet

Prompts are now composable: `--prompt setup self-heal doctor`

Each name maps to prompts/{name}.md, concatenated in order.

Available prompts:
- setup: base setup prompt (default)
- self-heal: iterative fix loop using vitest --project=storybook
- doctor: run diagnostics before large config changes

Updated verification to prefer vitest over storybook build since storybook init creates the vitest integration automatically.
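A minimal sketch of the name-to-file composition described above, assuming each prompt lives at `{promptsDir}/{name}.md`. The function name `composePrompt` and the blank-line separator are assumptions for illustration; the commit message only states that the files are concatenated in order.

```typescript
import { mkdtemp, readFile, writeFile } from 'node:fs/promises';
import { tmpdir } from 'node:os';
import { join } from 'node:path';

/** Reads {promptsDir}/{name}.md for each name and joins them in the given order. */
async function composePrompt(promptsDir: string, names: string[]): Promise<string> {
  // Promise.all preserves input order, so the prompts stay in CLI order.
  const parts = await Promise.all(
    names.map((name) => readFile(join(promptsDir, `${name}.md`), 'utf8'))
  );
  // Separator is an assumption: one blank line between prompts.
  return parts.map((part) => part.trim()).join('\n\n');
}

/** Demo against a temporary directory standing in for prompts/. */
async function demoComposePrompt(): Promise<string> {
  const dir = await mkdtemp(join(tmpdir(), 'prompts-'));
  await writeFile(join(dir, 'setup.md'), 'Set up Storybook.\n');
  await writeFile(join(dir, 'self-heal.md'), 'Fix failing stories.\n');
  return composePrompt(dir, ['setup', 'self-heal']);
}
```

A missing prompt file makes `readFile` reject, so an unknown `--prompt` name would fail loudly rather than being skipped.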

- Move cleanEnv to utils (was duplicated in prepare-trial and grade)
- Replace fast-glob/glob with Node 22 built-in fs.globSync
- Compact setup-patterns rules into tuple array
- Remove manual file recursion in setup-patterns and ghost-stories
- Fix save.ts bug (relative(EVAL_ROOT, "") → removed trialPath)
- Remove unused logWarn, simplify logging helpers
- Tighten prepare-trial install detection into single expression

…al dirs

… imports

…ogging

…env, 1s timeout, no --project

- Delete config.ts and generate-prompt.ts: merge PROJECTS into types.ts, prompts into utils.ts, inline agents map into run-task.ts
- computeQualityScore takes options object instead of 4 positional params
- Quality score now includes ghost stories (40%), build (25%), typecheck (25%), and performance (10%)
- exec() uses tinyexec native timeout instead of manual AbortController
- Codex agent tracks token usage and estimates cost from pricing table
- Environment fields renamed to evalBranch/evalCommit for clarity
- IPC sentinel shared as exported constant between eval.ts and eval-parallel.ts
- Summary tables now show quality score column
- setup-patterns uses object array instead of positional tuples
- prepare-repos.ts uses shared exec(), static imports, consistent quotes
- google-apps-script.js modernized to const/let + arrow functions
- Remove SupportedModel type alias (was just string)
- Fix .gitignore trailing newline, prompt no longer hardcodes React+Vite
- MAX_TURNS extracted as named constant in claude agent
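To make the weighting concrete, here is a sketch of a weighted score using the percentages from this commit (40/25/25/10) and an options object. Only the weights come from the commit message. How each component is normalised (all-or-nothing typecheck, linear time budget, the 600 s default) is an assumption made for this example, and the field names are hypothetical.

```typescript
interface ScoreWeights {
  ghostStories: number;
  build: number;
  typecheck: number;
  performance: number;
}

// Weights as stated in the commit message.
const DEFAULT_SCORE_WEIGHTS: ScoreWeights = {
  ghostStories: 0.4,
  build: 0.25,
  typecheck: 0.25,
  performance: 0.1,
};

interface ScoreInput {
  ghostStorySuccessRate: number; // 0..1
  buildSuccess: boolean;
  typeErrorCount: number;
  durationSeconds: number;
  maxDurationSeconds?: number; // hypothetical time budget
  weights?: ScoreWeights;
}

const clamp01 = (value: number): number => Math.min(1, Math.max(0, value));

/** Weighted sum of the four components, rounded to two decimals. */
function computeQualityScore(input: ScoreInput): number {
  const weights = input.weights ?? DEFAULT_SCORE_WEIGHTS;
  const budget = input.maxDurationSeconds ?? 600; // assumption: 10 minute budget
  const ghost = clamp01(input.ghostStorySuccessRate);
  const build = input.buildSuccess ? 1 : 0;
  const typecheck = input.typeErrorCount === 0 ? 1 : 0; // assumption: all-or-nothing
  const performance = clamp01(1 - input.durationSeconds / budget); // assumption: linear falloff
  const raw =
    weights.ghostStories * ghost +
    weights.build * build +
    weights.typecheck * typecheck +
    weights.performance * performance;
  return Math.round(raw * 100) / 100;
}
```

Taking a single options object keeps call sites readable and lets the weights be overridden per run without changing the signature.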

…rts)

Core source files use extensionless import specifiers that fail under Node's native TypeScript loader. Read numPassedTests/numTotalTests directly from the vitest JSON report instead.
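Reading the two counters straight from the report could look like the sketch below. The field names `numPassedTests` and `numTotalTests` come from the commit message; the function name, the zero defaults for missing fields, and the two-decimal rounding are assumptions for illustration.

```typescript
interface VitestJsonCounts {
  numPassedTests: number;
  numTotalTests: number;
}

interface GhostStoryCounts {
  passed: number;
  total: number;
  successRate: number;
}

/** Extracts pass/total counts from a Vitest JSON report string. */
function countsFromVitestReport(reportJson: string): GhostStoryCounts {
  const report = JSON.parse(reportJson) as Partial<VitestJsonCounts>;
  // Assumption: treat missing or non-numeric fields as zero instead of throwing.
  const passed = typeof report.numPassedTests === 'number' ? report.numPassedTests : 0;
  const total = typeof report.numTotalTests === 'number' ? report.numTotalTests : 0;
  const successRate = total === 0 ? 0 : Math.round((passed / total) * 100) / 100;
  return { passed, total, successRate };
}
```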

Node's native TypeScript loader requires explicit .ts extensions. Add them to parse-vitest-report.ts and categorize-render-errors.ts so the eval can import parseVitestResults from core via relative path.

… tsconfig fixes

- Separate types from runtime config (types.ts + config.ts)
- Thread Logger through entire pipeline (fixes garbled parallel output)
- Replace fragile stdout sentinel IPC with Node fork/process.send
- Run storybook build + typecheck in parallel (saves ~60-120s/trial)
- Tighten Agent interface to single params object
- Add --agent/--model/--prompt filters to eval-parallel
- Make quality score weights configurable
- Add prompt template variable support
- Enable allowImportingTsExtensions in root and scripts tsconfigs
- Fix all pre-existing TS errors in eval files

…from core-server, inline into grade.ts

- Rename runStoryTests to runGhostStories in core (clearer name)
- Add cwd parameter to runGhostStories and getComponentCandidates
- Export getComponentCandidates, runGhostStories, TestRunSummary from core-server index
- Remove eval ghost-stories.ts wrapper: inline logic into grade.ts
- Remove eval ghost-stories.test.ts: core already has its own tests
- Revert speculative isCandidate/isValidCandidate export (unused)
- Remove unused logger import from get-candidates.ts

…ions

The core-server barrel index re-exports modules (build-static, etc.) that fail under native Node TS. Import ghost-stories utilities directly from their source files instead, and add .ts extensions to internal imports in the import chain.

…rop exec wrapper

- Replace fork/IPC parallel execution with direct Promise.allSettled + prefixed loggers
- Make blocking fs calls async (cpSync→cp, writeFileSync→writeFile, mkdirSync→mkdir)
- Remove Google Sheets upload, google-apps-script.js, and upload-id/run-id plumbing
- Drop custom exec wrapper: use tinyexec's x() directly at call sites
- Remove runId/uploadId from runTask signature and both CLI entry points
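The in-process replacement for fork/IPC can be pictured with a sketch like this one: each trial gets a logger that prefixes its lines, and `Promise.allSettled` keeps one failing trial from aborting the others. The names `runTrials`, `prefixedLogger`, and `TrialOutcome` are hypothetical, not the PR's identifiers.

```typescript
type Logger = (message: string) => void;

/** Wraps a sink so every line is tagged with the trial it came from. */
function prefixedLogger(prefix: string, sink: Logger): Logger {
  return (message) => sink(`[${prefix}] ${message}`);
}

interface TrialOutcome<T> {
  id: string;
  status: 'fulfilled' | 'rejected';
  value?: T;
  error?: string;
}

/** Runs all trials concurrently and reports every outcome, including failures. */
async function runTrials<T>(
  ids: string[],
  runOne: (id: string, log: Logger) => Promise<T>,
  sink: Logger = console.log
): Promise<TrialOutcome<T>[]> {
  const settled = await Promise.allSettled(
    ids.map((id) => runOne(id, prefixedLogger(id, sink)))
  );
  // allSettled returns results in input order, so index i belongs to ids[i].
  return settled.map((result, index): TrialOutcome<T> => {
    const id = ids[index] ?? 'unknown';
    return result.status === 'fulfilled'
      ? { id, status: 'fulfilled', value: result.value }
      : { id, status: 'rejected', error: String(result.reason) };
  });
}
```

Because everything runs in one process, interleaved output from concurrent trials stays readable only thanks to the prefixes; CPU-bound work would no longer be spread across cores the way forked processes were.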

- Replace plain interfaces with Zod schemas for runtime validation (types.ts)
- Merge eval.ts + eval-parallel.ts into a single CLI with comma-separated args
- Fix deep core imports to use barrel export (core-server/index.ts)
- Extract shared package-manager detection and install (lib/package-manager.ts)
- Move pricing tables and model ID mappings into config.ts
- Make setup-patterns.ts fully async with fs/promises
- Add formatTable utility with ANSI-aware column alignment
- Integrate prepare-repos.ts with shared logger and PM utilities
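"ANSI-aware column alignment" means padding by visible width rather than raw string length, so coloured cells do not push columns out of line. The sketch below shows one way to do that. It shares only the name `formatTable` with the commit, handles only SGR colour sequences, and is not the PR's implementation.

```typescript
// Matches SGR colour/style sequences such as "\u001b[32m" and "\u001b[0m".
const ANSI_PATTERN = /\u001b\[[0-9;]*m/g;

/** Length of the text as it appears on screen, ignoring colour codes. */
function visibleLength(text: string): number {
  return text.replace(ANSI_PATTERN, '').length;
}

/** Left-aligns every column to the widest visible cell in that column. */
function formatTable(rows: string[][]): string {
  const columnCount = Math.max(0, ...rows.map((row) => row.length));
  const widths = Array.from({ length: columnCount }, (_, col) =>
    Math.max(0, ...rows.map((row) => visibleLength(row[col] ?? '')))
  );
  return rows
    .map((row) =>
      row
        .map((cell, col) => cell + ' '.repeat((widths[col] ?? 0) - visibleLength(cell)))
        .join('  ')
        .trimEnd()
    )
    .join('\n');
}
```

Wide characters (for example emoji or CJK text) would still be miscounted by `.length`; handling them would need a display-width library, which this sketch deliberately leaves out.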

…slides

- Replace slideshow format with a scrollable HTML page using file cards
- Show complete file contents for new files, diffs for modified files
- Lexend + JetBrains Mono fonts, light/dark theme, mobile-responsive
- Static server on port 3000 (no live-reload)
- Issues shown inline as smell-boxes, never block page generation
- Simplified to 5 steps: gather → read → generate → serve → iterate

…bility review

- Two layers per area: curated walkthrough (API→Tests→Impl) + collapsed full files
- Use language-typescript with data-diff attribute instead of language-diff
- Post-processing script for line-level add/remove backgrounds on top of TS highlighting
- Add readability review guidance: logical order, clear names, comments, test quality
- Order areas high-level to low-level

Principle 3 now explicitly requires showing complete interface definitions where they're first relevant, not just type names.

Extract AgentRunConfig { agent, model, effort } and compose it as a `run` field in TrialConfig, ExecutionResult, and TrialResult instead of spreading via extends/inheritance.

- AgentRunConfig → AgentVariant (it's the experimental variant, not a "run config")
- Agent → AgentDriver, AgentConfig → AgentDefinition (disambiguate)
- ExecutionResult → Execution, GradingResult → Grade, QualityResult → QualityScore
- TrialResult → TrialReport, TrialPaths → TrialWorkspace
- ChangedFile → FileChange, Pricing → TokenPricing, Environment → EvalEnvironment
- GhostStoriesResult → GhostStoryGrade, GhostStoryRunResult → GhostStoryOutput
- QualityWeights → ScoreWeights, DEFAULT_QUALITY_WEIGHTS → DEFAULT_SCORE_WEIGHTS
- Field renames: run → variant, grading → grade, quality → score, changedFiles → fileChanges, storybookFiles → storybookChanges
- Extract AgentExecuteParams with variant: AgentVariant (reuses the model)
- Remove redundant run field from Execution (lives on TrialReport only)

Every project needs a branch for cloning. The type now reflects that, and the `branch!` assertion in prepareTrial is no longer needed.

…Trial, throw on ghost story errors

- Make AgentVariant a discriminated union on agent, with typed model/effort per agent
- Rename runTask→runTrial and run-task.ts→run-trial.ts for consistent domain naming
- Store full Project in TrialReport instead of just the name for reproducibility
- Replace error-object returns with GhostStoryError throws in ghost-stories.ts
- Fix successRate rounding to use Math.round(x*100)/100 consistently
- Extract scoring magic numbers into named constants
- Validate git status chars against known set instead of blind casting
- Truncate build/typecheck output at line boundaries
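For readers unfamiliar with the pattern, a discriminated union on `agent` might look like the sketch below. The model and effort literals are illustrative guesses based on names mentioned elsewhere in this PR's commit titles; the actual unions in types.ts may differ.

```typescript
// Hypothetical literals: the real model/effort sets are defined in the PR's config.
type ClaudeVariant = {
  agent: 'claude';
  model: 'sonnet-4.6' | 'opus-4.6' | 'haiku-4.5';
  effort?: 'low' | 'medium' | 'high';
};

type CodexVariant = {
  agent: 'codex';
  model: 'gpt-5.4';
  effort?: 'medium' | 'high';
};

type AgentVariant = ClaudeVariant | CodexVariant;

/** Narrowing on `agent` gives each branch its own typed model/effort. */
function describeVariant(variant: AgentVariant): string {
  const effort = variant.effort ? ` (${variant.effort})` : '';
  switch (variant.agent) {
    case 'claude':
      return `claude/${variant.model}${effort}`;
    case 'codex':
      return `codex/${variant.model}${effort}`;
    default: {
      // Compile-time exhaustiveness check: adding an agent without a case fails here.
      const unreachable: never = variant;
      throw new Error(`Unknown agent: ${JSON.stringify(unreachable)}`);
    }
  }
}
```

With this shape, `{ agent: 'codex', model: 'sonnet-4.6' }` is a type error, which is the kind of mismatch the independent agent/model axes previously allowed.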

Copilot

Pull request overview

This PR adds a new “eval” harness under scripts/eval/ to benchmark/grade Storybook setup work using AI agents (Claude + Codex), while also advancing the repo’s move toward native Node execution of .ts files (explicit .ts import extensions). It also updates core “ghost stories” utilities and introduces some debug/CI changes.

Changes:

Add a new scripts/eval/ pipeline (prepare trial → run agent → grade results) with prompts, scoring, and Vitest coverage.
Update TypeScript configs to support explicit .ts import extensions (native Node TS execution migration).
Refactor/extend core ghost-stories utilities (renames runStoryTests → runGhostStories, adds optional cwd) and add a stub CLI command (skill).

Reviewed changes

Copilot reviewed 32 out of 35 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| yarn.lock | Lockfile updates for new agent/eval dependencies (Anthropic SDK, Codex SDK, citty, transitive deps). |
| scripts/tsconfig.json | Enables `allowImportingTsExtensions`; excludes one eval artifact file from typechecking. |
| scripts/package.json | Adds `eval` script entrypoint and new dependencies for agent SDKs + `citty`. |
| scripts/eval/types.ts | Defines core types for eval trials, grading, scoring, and reporting. |
| scripts/eval/types.test.ts | Validates `AGENTS`/`PROJECTS` config invariants (defaults, mappings, uniqueness). |
| scripts/eval/prompts/setup.md | Adds the “setup” prompt used to guide agents toward stable Storybook setup. |
| scripts/eval/prompts/self-heal.md | Adds a “self-heal” loop prompt focused on iterating via `vitest --project=storybook`. |
| scripts/eval/lib/utils.ts | Implements shared utilities: logging, formatting, prompt loading, environment capture, table formatting. |
| scripts/eval/lib/utils.test.ts | Unit tests for formatting helpers, prompt loading/listing, and table alignment (incl ANSI handling). |
| scripts/eval/lib/setup-patterns.ts | Detects common Storybook setup patterns by scanning `.storybook/` files. |
| scripts/eval/lib/setup-patterns.test.ts | Tests setup-pattern detection against a temporary `.storybook/` tree. |
| scripts/eval/lib/run-trial.ts | Orchestrates a full trial (prepare → capture env → prompt → agent → grade → summary.json). |
| scripts/eval/lib/run-trial.test.ts | Mocks pipeline dependencies and verifies report assembly, sequencing, and output files. |
| scripts/eval/lib/prepare-trial.ts | Clones/caches benchmark repos and installs deps before the agent runs. |
| scripts/eval/lib/package-manager.ts | Detects package manager via lockfiles and runs installs. |
| scripts/eval/lib/grading-helpers.test.ts | Integration-style tests composing candidate discovery, setup patterns, git parsing, and scoring. |
| scripts/eval/lib/grade.ts | Implements grading: changed files, setup patterns, `storybook build`, `tsc`, ghost stories, and scoring. |
| scripts/eval/lib/grade.test.ts | Unit tests for file filtering, scoring math, TS error counting, and git name-status parsing. |
| scripts/eval/lib/ghost-stories.ts | Eval-side ghost story runner (find candidates, run vitest JSON reporter, parse counts). |
| scripts/eval/lib/agents/codex.ts | Codex agent driver using `@openai/codex-sdk`, streaming events and estimating cost. |
| scripts/eval/lib/agents/claude-code.ts | Claude agent driver using `@anthropic-ai/claude-agent-sdk` with debug logging and transcript capture. |
| scripts/eval/eval.ts | CLI entrypoint for running one or many eval trials in parallel with zod-validated args. |
| scripts/eval/config.ts | Defines agent model/effort/pricing tables and benchmark projects (eval-baseline repos). |
| scripts/dangerfile.js | Adds debug printing and an unconditional `fail()` for non-team PRs in target-branch check. |
| foo | New file containing terminal escape sequences (appears accidental). |
| code/tsconfig.json | Enables `allowImportingTsExtensions` in the main `code/` TS config. |
| code/lib/cli-storybook/src/bin/run.ts | Adds a new `skill` command (currently a stub implementation). |
| code/core/src/shared/utils/categorize-render-errors.ts | Switches relative import to explicit `.ts` extension. |
| code/core/src/core-server/utils/ghost-stories/run-story-tests.ts | Renames exported runner to `runGhostStories` and adds optional `cwd`; updates imports to `.ts`. |
| code/core/src/core-server/utils/ghost-stories/parse-vitest-report.ts | Updates imports to explicit `.ts` extensions. |
| code/core/src/core-server/utils/ghost-stories/get-candidates.ts | Adds configurable `cwd` for globbing and updates import to explicit `.ts`. |
| code/core/src/core-server/server-channel/ghost-stories-channel.ts | Updates channel to call renamed `runGhostStories`. |
| AGENTS.md | Updates Node version and documents native Node TS execution migration guidance. |
| .gitignore | Adds ignore entries for eval outputs (currently pointing under `scripts/eval/`). |
| .agents/skills/review-pr/SKILL.md | Adds a new “review-pr” agent skill document (HTML single-page PR review generator). |

Copilot · 2026-03-30T10:55:30Z

+      return ['pnpm', ['install', '--no-frozen-lockfile']];
+    case 'yarn':
+      return [
+        'yarn',
+        existsSync(join(dir, '.yarnrc.yml')) ? ['install', '--no-immutable'] : ['install'],
+      ];
+    case 'bun':
+      return ['bun', ['install']];


This eval harness clones external repos and installs dependencies; however only the npm path uses --ignore-scripts, while pnpm/yarn/bun will run lifecycle scripts by default. For safety and reproducibility, consider consistently disabling install scripts (or explicitly documenting/isolating why it’s safe to run them) across all package managers.

Suggested change

-      return ['pnpm', ['install', '--no-frozen-lockfile']];
-    case 'yarn':
-      return [
-        'yarn',
-        existsSync(join(dir, '.yarnrc.yml')) ? ['install', '--no-immutable'] : ['install'],
-      ];
-    case 'bun':
-      return ['bun', ['install']];
+      return ['pnpm', ['install', '--no-frozen-lockfile', '--ignore-scripts']];
+    case 'yarn':
+      return [
+        'yarn',
+        existsSync(join(dir, '.yarnrc.yml'))
+          ? ['install', '--no-immutable', '--ignore-scripts']
+          : ['install', '--ignore-scripts'],
+      ];
+    case 'bun':
+      return ['bun', ['install', '--ignore-scripts']];
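As a self-contained illustration of the suggestion, the install command selection could be written as a pure function, which also makes the "every package manager disables scripts" property easy to test. The flags below are copied from the suggested change and have not been verified here against every package manager version; the `npm` arguments and the `hasYarnrcYml` parameter (standing in for the `existsSync` check) are assumptions.

```typescript
type PackageManager = 'npm' | 'pnpm' | 'yarn' | 'bun';

/**
 * Returns [binary, args] for installing dependencies without lifecycle scripts.
 * `hasYarnrcYml` replaces the filesystem check so the function stays pure.
 */
function installCommand(pm: PackageManager, hasYarnrcYml = false): [string, string[]] {
  switch (pm) {
    case 'npm':
      // Assumption: the npm branch is not shown in the diff hunk above.
      return ['npm', ['install', '--ignore-scripts']];
    case 'pnpm':
      return ['pnpm', ['install', '--no-frozen-lockfile', '--ignore-scripts']];
    case 'yarn':
      return [
        'yarn',
        hasYarnrcYml
          ? ['install', '--no-immutable', '--ignore-scripts']
          : ['install', '--ignore-scripts'],
      ];
    case 'bun':
      return ['bun', ['install', '--ignore-scripts']];
  }
}
```

Keeping the decision separate from the process spawn means a unit test can assert the safety property for all package managers without touching the network.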

Copilot · 2026-03-30T10:55:30Z

+  console.log('authorAssociation', authorAssociation);
+  console.log('author', author);
+  console.log(JSON.stringify(danger.github.pr, null, 2));
+


The added debug logging prints the full PR payload (including potentially large or sensitive metadata) to Danger’s output. Please remove these console.log calls (or guard them behind an explicit debug flag) to keep CI logs clean and avoid leaking data.

Suggested change

-  console.log('authorAssociation', authorAssociation);
-  console.log('author', author);
-  console.log(JSON.stringify(danger.github.pr, null, 2));
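If the debug output is worth keeping, the "explicit debug flag" alternative mentioned in the comment might look like the following sketch. The environment variable name `DANGER_DEBUG` and the helper `createDebugLog` are hypothetical; nothing in this PR defines them.

```typescript
type Sink = (line: string) => void;

/**
 * Returns a logger that is a no-op unless the (hypothetical) DANGER_DEBUG
 * variable is set to "1" in the given environment.
 */
function createDebugLog(
  env: Record<string, string | undefined>,
  sink: Sink = console.log
): (label: string, value: unknown) => void {
  const enabled = env.DANGER_DEBUG === '1';
  return (label, value) => {
    if (!enabled) return;
    const text = typeof value === 'string' ? value : JSON.stringify(value);
    sink(`${label}: ${text}`);
  };
}
```

In a Dangerfile this would be created once with `createDebugLog(process.env)` and called with selected fields (for example the author association) rather than the whole `danger.github.pr` payload.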

Copilot · 2026-03-30T10:55:30Z

+  fail(JSON.stringify(danger.github.pr, null, 2));
+


fail(JSON.stringify(danger.github.pr, null, 2)) will make Danger fail every non-team PR unconditionally, bypassing the actual target-branch logic below. This should be removed; only fail when the base branch is invalid.
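The intended behaviour (fail only when the base branch is invalid) can be expressed as a small pure decision function, sketched below. The real rules in `checkTargetBranch` are not shown in this PR view, so `allowedBases`, `isTeamMember`, and the message text are all hypothetical stand-ins.

```typescript
interface TargetBranchInput {
  baseRef: string;
  isTeamMember: boolean;
  allowedBases: string[]; // hypothetical: the real allow-list is not shown here
}

type Verdict = { level: 'ok' } | { level: 'fail'; message: string };

/** Fails only for non-team PRs whose base branch is not in the allow-list. */
function checkTargetBranch(input: TargetBranchInput): Verdict {
  if (input.isTeamMember) return { level: 'ok' };
  if (input.allowedBases.includes(input.baseRef)) return { level: 'ok' };
  return {
    level: 'fail',
    message: `PR targets "${input.baseRef}"; expected one of: ${input.allowedBases.join(', ')}`,
  };
}
```

The Dangerfile would then call `fail(verdict.message)` only when `verdict.level === 'fail'`, so no unconditional failure path remains.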

Copilot · 2026-03-30T10:55:31Z

@@ -0,0 +1 @@
+[?2004h[?1049h[22;0;0t[1;58r(B[m[4l[?7h[39;49m[?1h=[?1h=[?25l[39;49m(B[m[H[2J[56;121H(B[0;7m[ Reading... ](B[m[56;120H(B[0;7m[ Read 1 line ](B[m[H(B[0;7m  GNU nano 8.7.1                                                                                                                foo                                                                                                                           [1;253H(B[m[57d(B[0;7m^G(B[m Help[57;19H(B[0;7m^O(B[m Write Out[37G(B[0;7m^F(B[m Where Is[55G(B[0;7m^K(B[m Cut[57;73H(B[0;7m^T(B[m Execute[57;91H(B[0;7m^C(B[m Location[109G(B[0;7mM-U(B[m Undo[57;127H(B[0;7mM-A(B[m Set Mark[145G(B[0;7mM-](B[m To Bracket    (B[0;7mM-B(B[m Previous[181G(B[0;7m◂(B[m Back[57;199H(B[0;7m^◂(B[m Prev Word[217G(B[0;7m^A(B[m Home[57;235H(B[0;7m^P(B[m Prev Line[58d(B[0;7m^X(B[m Exit[58;19H(B[0;7m^R(B[m Read File[37G(B[0;7m^\(B[m Replace[58;55H(B[0;7m^U(B[m Paste[58;73H(B[0;7m^J(B[m Justify[58;91H(B[0;7m^/(B[m Go To Line     (B[0;7mM-E(B[m Redo[58;127H(B[0;7mM-6(B[m Copy[58;145H(B[0;7m^B(B[m Where Was[163G(B[0;7mM-F(B[m Next[58;181H(B[0;7m▸(B[m Forward[58;199H(B[0;7m^▸(B[m Next Word[217G(B[0;7m^E(B[m End[58;235H(B[0;7m^N(B[m Next Line[2d^[[?2004h^[[?1049h^[[22;0;0t^[[1;58r^[(B^[[m^[[4l^[[?7h^[[39;49m^[[?1h^[=^[[?1h^[=^[[?25l^[[39;49m^[(B^[[m^[[H^[[2J^[[56;121H^[(B^[[0;7m[ Reading... ]^[(B^[[m[?12l[?25h[?25l[56;99H(B[0;7m[ line  1/2 (50%), col  1/159 (  0%), char   0/135 ( 0%) ](B[m[?12l[?25h[2d[?25l[56d[J[58d[?12l[?25h[58;1H[?1049l[23;0;0t[?1l>[?2004l


This file appears to contain raw terminal escape sequences (likely an accidentally committed editor buffer) and has no meaningful source content. It should be removed from the repository to avoid polluting diffs and tooling.

Copilot · 2026-03-30T10:55:31Z

+    "@anthropic-ai/claude-agent-sdk": "^0.2.85",
    "@fal-works/esbuild-plugin-global-externals": "^2.1.2",
    "@google-cloud/bigquery": "^6.2.1",
    "@octokit/graphql": "^5.0.6",
    "@octokit/request": "^8.4.1",
+    "@openai/codex-sdk": "^0.117.0",
    "@polka/parse": "^1.0.0-next.28",


@anthropic-ai/claude-agent-sdk declares a peer dependency on zod@^4, but this workspace currently depends on zod@^3.25.76. This will cause peer-dep warnings and can break at runtime if the SDK relies on Zod v4 APIs; either upgrade zod in scripts/ to a compatible major or use an SDK version compatible with Zod v3.

Copilot · 2026-03-30T10:55:31Z

+    "citty": "^0.2.1",
    "codecov": "^3.8.1",


citty is added as a dependency but there are no references to it in the scripts/ workspace. If it’s not used yet, please remove it to avoid unnecessary install surface area; otherwise, add the usage in this PR so the dependency is justified.

coderabbitai · 2026-03-30T11:02:34Z

Caution

Review failed

Pull request was closed or merged during review

Walkthrough

This PR introduces a comprehensive Storybook evaluation system for testing automated setup workflows. It adds PR review skill documentation, enables TypeScript's native .ts import resolution, refactors the ghost stories testing pipeline to support configurable working directories, introduces a new CLI "skill" command, and implements a complete evaluation harness with agent drivers (Claude, Codex), grading logic, and extensive test coverage.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| PR Review Skill Documentation `.agents/skills/review-pr/SKILL.md` | New skill definition providing an HTML document template for summarizing pull requests with sticky navigation, API/test/implementation walktabs, and diff styling via Highlight.js and custom post-processing. |
| TypeScript & Runtime Configuration `code/tsconfig.json`, `scripts/tsconfig.json`, `AGENTS.md` | Enable `allowImportingTsExtensions` for native Node.js TypeScript execution; update Node.js version guidance to 22.22.1 and document migration from `jiti` to direct `.ts` file imports. |
| Ghost Stories Test Refactoring `code/core/src/core-server/server-channel/ghost-stories-channel.ts`, `code/core/src/core-server/utils/ghost-stories/...` | Rename `runStoryTests` to `runGhostStories` and add optional `cwd` parameter support; update imports to use explicit `.ts` extensions; remove unused logger dependency. |
| CLI Skill Command `code/lib/cli-storybook/src/bin/run.ts` | Add new `skill` command with `--package-manager` and `--config-dir` options for executing Storybook skills (currently a stub with placeholder logging). |
| Evaluation System Type Definitions `scripts/eval/types.ts` | Define core data models: `Logger`, `AgentVariant`, `TrialConfig`, `TrialWorkspace`, `Execution`, `Grade`, `QualityScore`, `TrialReport`, and related interfaces for the evaluation pipeline. |
| Evaluation System Configuration `scripts/eval/config.ts` | Export agent configurations (`AGENTS`), project definitions (`PROJECTS`), token pricing tables, and `estimateCost()` function for computing trial execution costs across Claude and Codex models. |
| Evaluation Harness `scripts/eval/eval.ts` | Main CLI entry point orchestrating the evaluation workflow: parses arguments, derives trial configurations, executes trials concurrently via `runTrial`, aggregates results, and outputs summary tables with cost/performance metrics. |
| Agent Implementations `scripts/eval/lib/agents/claude-code.ts`, `scripts/eval/lib/agents/codex.ts` | Concrete `AgentDriver` implementations for Claude and Codex models; stream agent output, log execution details, compute costs, and write transcripts to results directories. |
| Grading & Scoring Logic `scripts/eval/lib/grade.ts`, `scripts/eval/lib/setup-patterns.ts` | Implement weighted quality scoring (build success, TypeScript errors, ghost story pass rate, duration), run `storybook build` and `tsc` checks with timeouts, detect Storybook setup patterns via regex, and conditionally run ghost story evaluation. |
| Ghost Stories Runner `scripts/eval/lib/ghost-stories.ts` | Discover component candidates via globbing, run Vitest with `STORYBOOK_COMPONENT_PATHS` env variable, parse JSON report, compute success rate, and provide standardized error reporting. |
| Trial Orchestration & Utilities `scripts/eval/lib/run-trial.ts`, `scripts/eval/lib/prepare-trial.ts`, `scripts/eval/lib/utils.ts`, `scripts/eval/lib/package-manager.ts` | Coordinate single-trial execution; prepare and cache project repositories; provide logging, formatting, prompt loading, environment capture, and package manager detection utilities. |
| Evaluation System Tests `scripts/eval/lib/...test.ts`, `scripts/eval/types.test.ts` | Add 1,200+ lines of Vitest suites validating agent configuration invariants, grading helper behavior, trial orchestration sequencing, setup pattern detection, file parsing, and utility functions. |
| Evaluation Prompts `scripts/eval/prompts/setup.md`, `scripts/eval/prompts/self-heal.md` | Markdown guides instructing agents on Storybook setup procedures and iterative self-healing workflows using Vitest integration for story validation. |
| Dependencies & Infrastructure `.gitignore`, `scripts/package.json`, `scripts/dangerfile.js`, `foo` | Add eval system cache/results directories to ignore patterns; introduce `@anthropic-ai/claude-agent-sdk`, `@openai/codex-sdk`, and `citty` dependencies; add debug logging to PR validation; include control character file (unclear purpose). |

Sequence Diagram(s)

sequenceDiagram
    participant Client as Client CLI
    participant Eval as eval.ts
    participant Trial as runTrial()
    participant Prep as prepareTrial()
    participant Grade as grade()
    participant Agent as AgentDriver
    participant Vitest as Vitest<br/>(Ghost Stories)
    participant Build as Build & TSC

    Client->>Eval: npm run eval (with args)
    Eval->>Eval: Parse arguments & derive<br/>trial configs
    Eval->>Trial: Execute trial config<br/>(concurrent)
    Trial->>Prep: Prepare workspace<br/>(clone/install)
    Prep-->>Trial: TrialWorkspace
    Trial->>Agent: execute({prompt,<br/>projectPath, ...})
    Agent->>Agent: Stream & log<br/>agent output
    Agent-->>Trial: Execution{cost,<br/>duration, turns}
    Trial->>Grade: grade(workspace,<br/>logger, duration)
    Grade->>Build: Run storybook build<br/>& tsc --noEmit
    Build-->>Grade: Outputs & errors
    Grade->>Vitest: runGhostStories<br/>(candidates)
    Vitest-->>Grade: GhostStoryGrade
    Grade-->>Trial: {grade,<br/>QualityScore}
    Trial->>Trial: Assemble TrialReport<br/>& write summary.json
    Trial-->>Eval: TrialReport
    Eval->>Eval: Aggregate results &<br/>format output table
    Eval-->>Client: Summary with cost,<br/>metrics, success rate
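The sequence in the diagram (prepare, then agent, then grade, then report) can be reduced to a short orchestration sketch with the three steps injected as functions. The type shapes below are simplified stand-ins that reuse names from the diagram; they are not the PR's actual `runTrial` signature.

```typescript
// Simplified stand-ins for the PR's types; field sets are assumptions.
interface Execution {
  cost: number;
  durationSeconds: number;
  turns: number;
}

interface Grade {
  buildSuccess: boolean;
  score: number;
}

interface TrialReport {
  workspace: string;
  execution: Execution;
  grade: Grade;
}

interface TrialSteps {
  prepare: () => Promise<string>; // returns the workspace path
  execute: (workspace: string) => Promise<Execution>;
  grade: (workspace: string, durationSeconds: number) => Promise<Grade>;
}

/** Runs the three stages strictly in order and assembles the report. */
async function runTrial(steps: TrialSteps): Promise<TrialReport> {
  const workspace = await steps.prepare();
  const execution = await steps.execute(workspace);
  // Grading needs the agent's duration, so it cannot start before execute resolves.
  const grade = await steps.grade(workspace, execution.durationSeconds);
  return { workspace, execution, grade };
}
```

Injecting the steps is what makes the sequencing testable with mocks, which matches how the walkthrough describes run-trial.test.ts.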

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

The PR spans 25+ new TypeScript files introducing a modular but substantial evaluation system. While most components are isolated and logic is relatively straightforward (no complex algorithms), understanding the orchestration flow, agent integrations, grading architecture, and ensuring type safety across the pipeline requires careful cross-file reasoning. The breadth of changes across agents, grading, utilities, and tests demands attention to architectural consistency and API contracts.

Possibly related PRs

- Core: Improve the story generation experience #33259: Closely related code-level changes to ghost-stories workflow and identical files (ghost-stories-channel.ts, get-candidates.ts, runGhostStories, parse-vitest-report.ts) across both PRs.
- Danger: Fail/warn when PR targets wrong base branch #34007: Both PRs modify scripts/dangerfile.js in the checkTargetBranch function with overlapping CI validation logic.
- CLI: Implement design feedback #32984: Both PRs add/modify code/lib/cli-storybook/src/bin/run.ts to extend CLI command support (main PR adds "skill" command).

yannbf and others added 30 commits March 24, 2026 12:43

add todo for skill command

8a8d429

Merge branch 'next' into project/sb-agentic-setup

7b578f8

Merge branch 'next' into project/sb-agentic-setup

18ca3df

Merge branch 'next' into project/sb-agentic-setup

d48f719

Remove cleanEnv from grading — only needed for installDeps

6c3e716

Remove cleanEnv entirely — .npmrc is only in the monorepo, not in tri…

e11b9bd

…al dirs

Switch from jiti to native Node TS support, add .ts extensions to all…

2be54f4

… imports

Update models: Sonnet 4.6, Opus 4.6, Haiku 4.5, GPT 5.4 Medium/High

5aabbda

Decouple agent × model × effort as three independent axes

986988a

Simplify prompt to single name, add per-agent default model

1ee462d

Split into eval.ts (single run) and eval-parallel.ts (8 runs)

06c5f9a

Add prefixed logging for parallel runs

2336c46

Spawn separate node processes in eval-parallel for multi-core CPU usage

ca03d7c

Live-stream prefixed logs from child processes, improve Codex agent l…

8629948

…ogging

Fix Codex agent logging to match actual SDK event/item types

1606025

Decouple agent and model — choose agent then model independently

47e64e3

Clean up names: claude-code→claude, claude-sonnet-4-6→sonnet-4.6

f6671a1

Infer agent from model — node eval.ts -m gpt-5.4 auto-selects codex

5701e8d

Fix parallel race condition: add prompt + random suffix to trial IDs

bdbae36

Use crypto.randomUUID for unique trial IDs

8819ae2

Fix ghost stories to match core implementation: pass paths as args + …

3caafda

…env, 1s timeout, no --project

Add tests, import ghost stories utilities from core, switch to parseArgs

4e04c66

Fix: stop importing parse-vitest-report from core (extensionless impo…

f397085

…rts) Core source files use extensionless import specifiers that fail under Node's native TypeScript loader. Read numPassedTests/numTotalTests directly from the vitest JSON report instead.

kasperpeulen and others added 23 commits March 30, 2026 15:23

Add .ts extensions to core imports used by eval harness

9fb35ca

Node's native TypeScript loader requires explicit .ts extensions. Add them to parse-vitest-report.ts and categorize-render-errors.ts so the eval can import parseVitestResults from core via relative path.

Fix ghost-stories comment to reflect inline vitest parsing approach

460fc5d

Use parseVitestResults from core for ghost stories grading

3dd2246

Update AGENTS.md and tsconfig comments for native Node TS execution

9b6085b

WIP: checkpoint current eval harness changes

cabe15a

Fix eval ghost-stories globbing lint

73d7415

Refine eval grading review fixes

98a2f74

Update review-pr skill: show full interface bodies in walkthrough

6e5fcf4

Principle 3 now explicitly requires showing complete interface definitions where they're first relevant, not just type names.

Refactor: use composition for AgentRunConfig instead of extends

87abae4

Extract AgentRunConfig { agent, model, effort } and compose it as a `run` field in TrialConfig, ExecutionResult, and TrialResult instead of spreading via extends/inheritance.

Make Project.branch required and remove non-null assertion

920e6d3

Every project needs a branch for cloning. The type now reflects that, and the `branch!` assertion in prepareTrial is no longer needed.

Fix CI: format eval files, fix effort type narrowing in Claude agent

b4bab02

DEBUG DANGERFILE

538bce8

debug

c07bbd6

foo

493d332

Copilot AI review requested due to automatic review settings March 30, 2026 10:47

Sidnioulz added the build and ci:docs labels Mar 30, 2026

Copilot started reviewing on behalf of Sidnioulz March 30, 2026 10:48

Sidnioulz closed this Mar 30, 2026

Copilot AI reviewed Mar 30, 2026

Sidnioulz wants to merge 53 commits into next from DEBUG-DANGER-2

4 participants
