
CLI: Introduce Agentic Setup workflow#34297

Merged
yannbf merged 377 commits into next from project/sb-agentic-setup on Apr 30, 2026

Conversation

@yannbf
Member

@yannbf yannbf commented Mar 24, 2026

Closes #34295

What I did

This PR adds an agentic setup workflow that lets AI agents set up Storybook for an existing project in a self-healing loop. Two pieces ship together:

1. storybook ai setup CLI command

A new subcommand under storybook that inspects the current project (framework, renderer, builder, language, existing components) and generates a project-aware markdown prompt designed to be consumed by a coding agent. The prompt instructs the agent to:

  • Analyze the codebase for component patterns, providers, and mocks
  • Configure .storybook/preview.ts with the right decorators, styles, and MSW handlers
  • Write story files with play functions for ~10 representative components
  • Run them with Vitest and iterate until they pass

The MCP addon is installed alongside so agents have a live query interface into the component library while they work.

2. Eval harness (scripts/eval/)

A benchmarking system that runs Claude Code and Codex against 7 real-world React + Vite projects, grades the output, and publishes each trial as a draft PR with a structured data.json. collect-pr-data.ts ingests those PRs into a local SQLite database for analysis.

Key pieces:

  • eval.ts — single trial runner
  • run-batch.ts — batch orchestrator with concurrency and repetition controls
  • lib/grade.ts — 4-dimensional grading (build, typecheck, story render, ghost stories)
  • sync-baselines.ts — pushes a canonical .storybook baseline to all benchmark repos
  • lib/agents/ — Claude Code and Codex driver implementations

The headline metric is normalized preview gain: the fraction of the remaining gap to a 100% story pass rate that the agent closed. See scripts/eval/README.md for full docs.
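Assuming the pass rate is the fraction of stories that render and pass, the metric described above matches the classic normalized-gain formula g = (after - before) / (1 - before). The following TypeScript sketch is illustrative only: the function name normalizedGain, the input validation, and returning null for a baseline that already passes 100% are assumptions, not the code in lib/grade.ts.

```typescript
// Hypothetical sketch of a normalized gain metric. Pass rates are fractions
// in [0, 1]; the real grading code in scripts/eval/lib/grade.ts may differ.
export function normalizedGain(baselinePassRate: number, trialPassRate: number): number | null {
  for (const rate of [baselinePassRate, trialPassRate]) {
    if (!Number.isFinite(rate) || rate < 0 || rate > 1) {
      throw new RangeError(`Pass rate must be within [0, 1], got ${rate}`);
    }
  }
  // Share of stories that still failed before the agent ran.
  const remainingGap = 1 - baselinePassRate;
  // Assumption: with nothing left to close the gain is undefined, so report null.
  if (remainingGap === 0) return null;
  // Negative when the agent broke stories that previously passed.
  return (trialPassRate - baselinePassRate) / remainingGap;
}
```

For example, a baseline pass rate of 0.5 that rises to 0.75 closes half of the remaining gap, so normalizedGain(0.5, 0.75) returns 0.5; a regression to 0.25 returns -0.5.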

Checklist for Contributors

Testing

The changes in this PR are covered in the following automated tests:

  • stories
  • unit tests
  • integration tests
  • end-to-end tests

Manual testing

Caution

This section is mandatory for all contributions. If you believe no manual test is necessary, please state so explicitly. Thanks!

Test the CLI command itself:

  1. Check out this branch and run yarn && yarn task compile
  2. Generate a sandbox: yarn task sandbox --template react-vite/default-ts --start-from auto
  3. From the sandbox directory, run npx storybook ai setup
  4. Verify the command prints a markdown prompt that mentions React, Vite, and TypeScript-specific instructions

Test the agentic flow with a real agent:

  1. Select a React + Vite project that does NOT have Storybook installed (e.g. Mealdrop on its without-storybook branch, or any other project; you can run npx @hipster/sb-utils uninstall --yes to remove Storybook from a project if you like), then open Claude Code (or Codex) at the project root
  2. Paste this prompt: Run npx storybook@next init and follow its instructions precisely.
  3. Let the agent run to completion — it should configure .storybook/preview.ts, write ~10 story files with play functions, and run Vitest to verify them
  4. Verify the MCP addon is added to the project's dev dependencies and registered in .storybook/main.ts
  5. Verify all written stories pass when running yarn vitest run --project=storybook
  6. Open Storybook and confirm the stories render correctly

Test the manual flow:

  1. Open a React + Vite project that does NOT have Storybook installed (e.g. Mealdrop on its without-storybook branch, or any other project; you can run npx @hipster/sb-utils uninstall --yes to remove Storybook from a project if you like)
  2. Manually run npx storybook@next init
  3. You should be asked whether you want to use AI (this question only appears for React + Vite projects)
  4. When you answer yes, verify that the MCP addon is added to main.ts
  5. Verify that the CLI prints the AI setup prompt
  6. Run Storybook
  7. Verify that there's a copy prompt button in the Storybook UI (onboarding checklist and guide page)
  8. Paste the prompt into an agent and let it do its job
  9. Verify that the copy prompt button is gone from the UI after the agent has done some work

Test the eval harness (optional, requires gh CLI + Claude Code/Codex installed):

This is optional; only do it if you're curious about the eval system.

  1. From repo root, list available projects: node scripts/eval/eval.ts --list-projects
  2. Run a single trial against a small project: node scripts/eval/eval.ts -p mealdrop --prompt pattern-copy-play
  3. Verify the trial produces a PR on storybook-tmp/mealdrop with a data.json artifact and grade summary
  4. Run the collector: node scripts/eval/collect-pr-data.ts --project mealdrop
  5. Open the SQLite DB at scripts/eval/.cache/eval-pr-data.sqlite and verify the trial appears in the trials table and in the story_render_summary_by_project_model_effort view
  6. You can also ask an agent something like: "Collect the recent data with this node script: scripts/eval/collect-pr-data.ts. Then write a query to find which parts of this eval script the agents spend the longest time on."

Documentation

  • Add or update documentation reflecting your changes
  • If you are deprecating/removing a feature, make sure to update
    MIGRATION.MD

Checklist for Maintainers

  • When this PR is ready for testing, make sure to add ci:normal, ci:merged or ci:daily GH label to it to run a specific set of sandboxes. The particular set of sandboxes can be found in code/lib/cli-storybook/src/sandbox-templates.ts

  • Make sure this PR contains one of the labels below:

    Available labels
    • bug: Internal changes that fix incorrect behavior.
    • maintenance: User-facing maintenance tasks.
    • dependencies: Upgrading (sometimes downgrading) dependencies.
    • build: Internal-facing build tooling & test updates. Will not show up in release changelog.
    • cleanup: Minor cleanup style change. Will not show up in release changelog.
    • documentation: Documentation only changes. Will not show up in release changelog.
    • feature request: Introducing a new feature.
    • BREAKING CHANGE: Changes that break compatibility in some way with current major version.
    • other: Changes that don't fit in the above categories.

🦋 Canary release

This pull request has been released as version 0.0.0-pr-34297-sha-61aa8ea7. Try it out in a new sandbox by running npx storybook@0.0.0-pr-34297-sha-61aa8ea7 sandbox or in an existing project with npx storybook@0.0.0-pr-34297-sha-61aa8ea7 upgrade.

More information
Published version 0.0.0-pr-34297-sha-61aa8ea7
Triggered by @yannbf
Repository storybookjs/storybook
Branch project/sb-agentic-setup
Commit 61aa8ea7
Datetime Thu Apr 30 06:13:38 UTC 2026 (1777529618)
Workflow run 25150411051

To request a new release of this pull request, mention the @storybookjs/core team.

core team members can create a new canary release here or locally with gh workflow run --repo storybookjs/storybook publish.yml --field pr=34297

Summary by CodeRabbit

  • New Features

    • AI setup CLI (generates setup markdown, can snapshot preview baselines and record pending setups) and a lightweight placeholder "skill" subcommand.
    • Evaluation harness: trial runner, agent drivers, grading, publish-to-PR flow, batch runner, story-render tooling, and result/reporting utilities.
  • Chores

    • Increased CI executor size, updated .gitignore, VS Code TypeScript format-on-save, and contributor Node.js guidance.
  • Tests

    • Added extensive tests for telemetry, reporters, AI setup flows, and the eval pipeline.

@coderabbitai
Contributor

coderabbitai Bot commented Mar 24, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.


Walkthrough

Adds a new AI setup/skill CLI and extensive evaluation tooling: CLI ai/skill entries, AI prompt registry and generators, Vitest agent telemetry reporter, create-storybook AI onboarding plumbing, and a full eval harness (agents, grading, story-render, trial runner, publishing, batch runner, and utilities), plus tests, docs, and minor config/editor changes.

Changes

Cohort / File(s) Summary
CLI — run entry & ai CLI
code/lib/cli-storybook/src/bin/run.ts, code/lib/cli-storybook/src/ai/..., code/lib/cli-storybook/src/ai/types.ts, code/lib/cli-storybook/src/ai/prompt.ts, code/lib/cli-storybook/src/ai/setup-prompts/...
Adds a placeholder skill subcommand in run.ts and a full ai setup implementation: new types, prompt registry, multiple prompt variants (pattern-copy-play, setup), markdown generation, telemetry gating, preview snapshot + cache write, and output/file plumbing.
Eval harness — core runner & orchestration
scripts/eval/eval.ts, scripts/eval/run-batch.ts, scripts/eval/lib/run-trial.ts, scripts/eval/lib/publish-trial.ts
New CLI entry and orchestrators for single and batch eval runs: trial preparation, agent selection/invocation, capture of ai-setup markdown, provisional and final data.json handling, and publishing draft PRs with deterministic labels.
Eval harness — agents, grading, rendering, utils
scripts/eval/lib/agents/*.ts, scripts/eval/lib/grade.ts, scripts/eval/lib/story-render.ts, scripts/eval/lib/utils.ts
Adds agent drivers (Claude, Codex), grading pipeline (git diff, build/typecheck, ghost-story grading, normalized quality score), story-render execution with baseline environment swapping, and shared CLI/utils (ID/timestamp/formatting, prompt registry, cost/score formatting).
Eval harness — tests & publish
scripts/eval/lib/*.test.ts, scripts/eval/lib/publish-trial.test.ts, scripts/eval/lib/run-trial.test.ts
Comprehensive Vitest tests for utils, publish-runner and run-trial flows; mocks external steps and asserts produced artifacts, PR body/labels, and run reporting.
Create-storybook / initiate & prefs
code/lib/create-storybook/src/initiate.ts, code/lib/create-storybook/src/commands/UserPreferencesCommand.ts, code/lib/create-storybook/src/commands/UserPreferencesCommand.test.ts
Agent-mode forces non-interactive acceptance and suppresses dev server; moves feature-availability flags into inputs, adds AI opt-in prompting/telemetry, sets onboarding-pending cache when onboarding selected, and updates function signatures/tests accordingly.
Vitest telemetry — plugin & reporter
code/addons/vitest/src/vitest-plugin/agent-telemetry-reporter.ts, code/addons/vitest/src/vitest-plugin/agent-telemetry-reporter.test.ts, code/addons/vitest/src/vitest-plugin/index.ts
New AgentTelemetryReporter collects story-scoped test results and emits ai-setup-self-healing-scoring telemetry; plugin injection is gated by telemetry/session detection; tests added.
Core settings & small edits
code/core/src/cli/globalSettings.ts, code/addons/a11y/src/preview.tsx
Adds optional aiSetup checklist key to global settings schema; adds comment clarifying ghost-stories accessibility skip.
Repo metadata, editor, CI
.gitignore, .vscode/settings.json, AGENTS.md, .circleci/config.yml
Adds ignore patterns for eval artifacts and .pr-review; enables TypeScript format-on-save in VS Code settings; updates contributor docs (Node 22.22.1 and TS runtime guidance); increases CircleCI executor from small to large.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant CLI as "eval CLI / runTrial"
  participant Workspace as "Trial Workspace (fs)"
  participant Agent as "Agent Driver (Claude/Codex)"
  participant Grader as "Grader (grade/story-render)"
  participant Publisher as "publishTrialBranch / GitHub"

  rect rgba(200,220,255,0.5)
    CLI->>Workspace: prepareTrialWorkspace(config)
    CLI->>Workspace: captureEnvironment()
    CLI->>Workspace: write prompt.md & setup-prompt.md
  end

  rect rgba(200,255,200,0.5)
    CLI->>Agent: execute(prompt, workspace, variant)
    Agent-->>CLI: streaming transcript & execution metrics
    Agent->>Workspace: write provisional artifacts (optional)
  end

  rect rgba(255,220,200,0.5)
    CLI->>Grader: grade(baseline, trial, ghost-stories)
    Grader-->>CLI: computed Grade & QualityScore
    CLI->>Workspace: update data.json with final results
  end

  rect rgba(220,255,220,0.5)
    CLI->>Publisher: publishTrialBranch(results, workspace)
    Publisher->>GitHub: create branch, push, open PR, add labels
    Publisher-->>CLI: {branch, labels, url}
  end

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes


Comment @coderabbitai help to get the list of available commands and usage tips.

Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (1)
code/lib/cli-storybook/src/bin/run.ts (1)

314-321: Extract and export the action handler for direct testing.

The inline anonymous handler makes targeted unit tests harder; moving it to a named exported function improves coverage and maintainability.

As per coding guidelines, "Export functions that need direct tests to enable proper unit testing coverage".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@code/lib/cli-storybook/src/bin/run.ts` around lines 314 - 321, Extract the
inline anonymous .action handler into a named, exported async function (e.g.,
export async function runStorybookSkills(options)) that contains the same steps
(logger.intro('Checking Storybook skills'), await skill(options) if applicable,
logger.outro('Done')) and preserve any telemetry/error handling semantics; then
replace the anonymous handler in .action(...) with a reference to this new
function so it can be directly imported and unit-tested. Ensure the new function
signature matches how options are passed by the CLI and re-export it from the
module so tests can import it.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@code/lib/cli-storybook/src/bin/run.ts`:
- Around line 306-307: The CLI registers the command as command('skill') which
mismatches the tracked contract expecting the plural form; update the command
registration in run.ts to use command('skills') (and adjust any associated
description or help text if needed) so the CLI name matches the
documented/expected "storybook skills" contract and avoid command/docs mismatch.
- Around line 314-321: The current action handler is a no-op that only logs
progress; replace it with the telemetry + failure flow and either invoke the
real skill or fail-fast until the implementation exists: wrap the body with
withTelemetry('skill', { cliOptions: options }, async () => {
logger.intro('Checking Storybook skills'); await skill(options);
logger.outro('Done'); }) and append
.catch(handleCommandFailure(options.logfile)); if the exported skill function is
not yet available, call withTelemetry and throw a clear Error (e.g. "storybook
skill not implemented") inside the async callback so telemetry is preserved and
the failure handler runs; reference the action callback, withTelemetry, skill,
handleCommandFailure, and logger.intro/logger.outro when making the change.

---

Nitpick comments:
In `@code/lib/cli-storybook/src/bin/run.ts`:
- Around line 314-321: Extract the inline anonymous .action handler into a
named, exported async function (e.g., export async function
runStorybookSkills(options)) that contains the same steps
(logger.intro('Checking Storybook skills'), await skill(options) if applicable,
logger.outro('Done')) and preserve any telemetry/error handling semantics; then
replace the anonymous handler in .action(...) with a reference to this new
function so it can be directly imported and unit-tested. Ensure the new function
signature matches how options are passed by the CLI and re-export it from the
module so tests can import it.
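One way to act on both run.ts comments is to move the handler into a named export whose collaborators are passed in, so a unit test can drive it without the CLI. The sketch below is hypothetical: SkillOptions, SkillDeps and the injected logger, withTelemetry and skill parameters are stand-ins for Storybook's internals, not their real signatures.

```typescript
// Hypothetical sketch: these types and injected collaborators stand in for
// Storybook's logger, withTelemetry and skill; real signatures may differ.
export interface SkillOptions {
  logfile?: string | boolean;
}

export interface SkillDeps {
  logger: { intro(message: string): void; outro(message: string): void };
  // Expected to wrap the command body and record telemetry for it.
  withTelemetry: (
    event: string,
    payload: { cliOptions: SkillOptions },
    run: () => Promise<void>
  ) => Promise<void>;
  // Left undefined while the real implementation does not exist yet.
  skill?: (options: SkillOptions) => Promise<void>;
}

// Named export so tests can call the handler directly instead of via the CLI.
export async function runStorybookSkills(options: SkillOptions, deps: SkillDeps): Promise<void> {
  await deps.withTelemetry('skill', { cliOptions: options }, async () => {
    deps.logger.intro('Checking Storybook skills');
    if (!deps.skill) {
      // Fail fast rather than reporting success for a no-op command.
      throw new Error('storybook skill not implemented');
    }
    await deps.skill(options);
    deps.logger.outro('Done');
  });
}
```

In run.ts, the .action(...) callback would then only call runStorybookSkills(options, deps) and attach .catch(handleCommandFailure(options.logfile)), as the inline comment suggests.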

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 5205f63d-da05-4d2c-9898-ddccd28f26ef

📥 Commits

Reviewing files that changed from the base of the PR and between 33afce2 and 8a8d429.

📒 Files selected for processing (1)
  • code/lib/cli-storybook/src/bin/run.ts

@yannbf yannbf changed the title CLI: Introduce Agentic Setup workflow [Project] CLI: Introduce Agentic Setup workflow Mar 27, 2026
@yannbf yannbf self-assigned this Mar 27, 2026
@yannbf yannbf added the ci:daily Run the CI jobs that normally run in the daily job. label Mar 30, 2026
@nx-cloud

nx-cloud Bot commented Mar 30, 2026

View your CI Pipeline Execution ↗ for commit 8ed4332

Command Status Duration Result
nx run-many -t compile,check,knip,test,lint,fmt... ✅ Succeeded 9m 46s View ↗

☁️ Nx Cloud last updated this comment at 2026-04-18 07:00:24 UTC

@yannbf yannbf removed the ci:daily Run the CI jobs that normally run in the daily job. label Mar 31, 2026
@Sidnioulz Sidnioulz force-pushed the project/sb-agentic-setup branch from 3b94c97 to cc92a8c on April 8, 2026 15:55
@Sidnioulz Sidnioulz changed the title [Project] CLI: Introduce Agentic Setup workflow Telemetry: Add agentic setup tracking, prompt traits, and evidence-based completion Apr 8, 2026
@storybook-app-bot

storybook-app-bot Bot commented Apr 8, 2026

Package Benchmarks

Commit: 8f1a55d, ran on 30 April 2026 at 09:09:31 UTC

The following packages have significant changes to their size or dependencies:

storybook

Before After Difference
Dependency count 50 50 0
Self size 20.55 MB 20.58 MB 🚨 +38 KB 🚨
Dependency size 16.56 MB 16.56 MB 0 B

@storybook/cli

Before After Difference
Dependency count 184 184 0
Self size 782 KB 836 KB 🚨 +54 KB 🚨
Dependency size 68.22 MB 68.26 MB 🚨 +46 KB 🚨

@storybook/codemod

Before After Difference
Dependency count 177 177 0
Self size 32 KB 32 KB 🎉 -36 B 🎉
Dependency size 66.74 MB 66.78 MB 🚨 +38 KB 🚨

create-storybook

Before After Difference
Dependency count 51 51 0
Self size 1.04 MB 1.05 MB 🚨 +7 KB 🚨
Dependency size 37.10 MB 37.14 MB 🚨 +38 KB 🚨

@Sidnioulz Sidnioulz force-pushed the project/sb-agentic-setup branch from cc92a8c to 11bc8f0 on April 9, 2026 07:49
@Sidnioulz Sidnioulz closed this Apr 9, 2026
@Sidnioulz Sidnioulz changed the title Telemetry: Add agentic setup tracking, prompt traits, and evidence-based completion [Project] CLI: Introduce Agentic Setup workflow Apr 9, 2026
@Sidnioulz Sidnioulz reopened this Apr 9, 2026
@yannbf yannbf marked this pull request as ready for review April 29, 2026 09:41
Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
code/addons/vitest/src/vitest-plugin/index.ts (1)

462-491: ⚠️ Potential issue | 🔴 Critical

The configureVitest hook must not be async in Vitest 4.x.

Vitest 4.x defines configureVitest as synchronous: (context: VitestPluginContext) => void. The hook does not support async functions or return Promises—this is not documented or intended by the plugin API. The current code uses async configureVitest(context) with await isWithinInitialSession('ai-setup'), which violates this contract.

Move the async logic outside the hook, or handle the session check synchronously within configureVitest and defer telemetry operations if necessary (e.g., using setImmediate or a separate initialization step).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@code/addons/vitest/src/vitest-plugin/index.ts` around lines 462 - 491, The
configureVitest hook is declared async but Vitest 4.x requires it be
synchronous; change configureVitest to a plain synchronous function (remove
async/await) and move any async work (the await
isWithinInitialSession('ai-setup') and any async telemetry/reporters setup) into
a separate async initializer that runs after configuration (for example call an
async init function via setImmediate/Promise.resolve().then(...) or a top-level
startTelemetry() invoked without awaiting), and inside that async initializer
use detectAgent(), await isWithinInitialSession('ai-setup'),
isTelemetryModuleEnabled(), and push the new AgentTelemetryReporter({ configDir:
finalOptions.configDir, agent }) into context.vitest.config.reporters or
otherwise mutate the config synchronously-safe; keep synchronous bits like
context.vitest.config.coverage.exclude.push('storybook-static') inside
configureVitest and reference the existing symbols configureVitest,
isWithinInitialSession, detectAgent, AgentTelemetryReporter, telemetry,
isTelemetryModuleEnabled, withinAgenticSetupSession, finalOptions.configDir, and
context.vitest.config.reporters when moving logic.
🧹 Nitpick comments (3)
code/addons/vitest/src/vitest-plugin/agent-telemetry-reporter.test.ts (2)

5-13: Mock pattern violates coding guidelines — prefer spy: true.

The mock for storybook/internal/telemetry doesn't use the spy: true option and defines implementations inline. Per coding guidelines, mocks should use spy: true and behaviors should be configured in beforeEach.

For isExampleStoryId, you're re-implementing the actual function behavior. Using spy: true would preserve the original implementation automatically.

♻️ Suggested refactor
-vi.mock('storybook/internal/telemetry', () => ({
-  telemetry: vi.fn(),
-  isExampleStoryId: vi.fn(
-    (id: string) =>
-      id.startsWith('example-button--') ||
-      id.startsWith('example-header--') ||
-      id.startsWith('example-page--')
-  ),
-}));
+vi.mock('storybook/internal/telemetry', { spy: true });

Then in beforeEach:

beforeEach(() => {
  vi.clearAllMocks();
  vi.mocked(telemetry).mockResolvedValue(undefined);
  // isExampleStoryId retains its original implementation with spy: true
  reporter = new AgentTelemetryReporter({
    configDir: '.storybook',
    agent: { name: 'claude' },
  });
});

As per coding guidelines: "Use vi.mock() with the spy: true option for all package and file mocks" and "Implement mock behaviors in beforeEach blocks".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@code/addons/vitest/src/vitest-plugin/agent-telemetry-reporter.test.ts` around
lines 5 - 13, The test currently provides a full inline mock for
'storybook/internal/telemetry' which reimplements isExampleStoryId and omits
spy: true; change vi.mock(...) to use the spy: true option so original
implementations (including isExampleStoryId) are preserved, remove the inline
implementation for isExampleStoryId, and move specific mock behaviors into the
test beforeEach: call vi.clearAllMocks(), mock telemetry (vi.mocked(telemetry))
to resolve/return the desired value, and then instantiate reporter = new
AgentTelemetryReporter(...) as before; ensure references in the file target the
telemetry export and isExampleStoryId symbol names so behavior is configured in
beforeEach rather than inside vi.mock.

72-96: Tests for onTestCaseResult lack assertions.

The tests in lines 73-79, 81-87, and 89-95 call onTestCaseResult but don't assert any expected behavior. Consider adding assertions to verify the internal state or expose a way to check collected results.

♻️ Suggested improvement
 it('should collect story test results', () => {
   const testCase = createMockTestCase({
     storyId: 'my-story--primary',
     status: 'passed',
   });
   reporter.onTestCaseResult(testCase as any);
+  // Verify by checking telemetry is called with this result in onTestRunEnd
+  // Or expose a getter for testResults length if needed for unit testing
 });

Alternatively, consider testing these behaviors through integration with onTestRunEnd assertions, which already verify the accumulated results.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@code/addons/vitest/src/vitest-plugin/agent-telemetry-reporter.test.ts` around
lines 72 - 96, The tests call reporter.onTestCaseResult but have no assertions;
update each test to assert expected behavior by either (A) inspecting reporter's
collected results (e.g., add or use a method/property like
reporter.getCollectedResults() or reporter.collectedResults) after calling
onTestCaseResult for the story and example cases, or (B) invoke
reporter.onTestRunEnd() and assert the telemetry sender was called with expected
payloads (spy/mock the sendTelemetry function) to verify accumulation; use the
existing createMockTestCase inputs (storyId/status) to assert the specific
inclusion, exclusion, or skipping of results for onTestCaseResult.
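Option (B) can be illustrated with a small, self-contained stand-in. ToyAgentTelemetryReporter, its TestResult shape and the passed/failed summary are hypothetical and do not reproduce the real AgentTelemetryReporter; only the example-story prefixes and the ai-setup-self-healing-scoring event name come from this PR discussion. Because the sender is injected, a test can assert the accumulated results through onTestRunEnd instead of reading private state.

```typescript
// Toy stand-in, NOT the real AgentTelemetryReporter: the class name, the
// TestResult shape and the summary fields are hypothetical.
type TestState = 'passed' | 'failed' | 'skipped';

interface TestResult {
  storyId: string;
  state: TestState;
}

interface ScoringSummary {
  passed: number;
  failed: number;
}

type SendTelemetry = (event: string, payload: ScoringSummary) => Promise<void>;

// Same prefixes as the inline isExampleStoryId mock quoted in the review.
function isExampleStoryId(id: string): boolean {
  return (
    id.startsWith('example-button--') ||
    id.startsWith('example-header--') ||
    id.startsWith('example-page--')
  );
}

class ToyAgentTelemetryReporter {
  private readonly results: TestResult[] = [];
  private readonly send: SendTelemetry;

  constructor(send: SendTelemetry) {
    // Injected so a test can capture the payload instead of reading private state.
    this.send = send;
  }

  onTestCaseResult(result: TestResult): void {
    // Assumption: skipped tests and Storybook's example stories are not scored.
    if (result.state === 'skipped' || isExampleStoryId(result.storyId)) return;
    this.results.push(result);
  }

  async onTestRunEnd(): Promise<void> {
    const passed = this.results.filter((r) => r.state === 'passed').length;
    await this.send('ai-setup-self-healing-scoring', {
      passed,
      failed: this.results.length - passed,
    });
  }
}
```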
code/addons/vitest/src/vitest-plugin/index.ts (1)

251-252: Closure captures withinAgenticSetupSession by reference — verify timing is correct.

The variable is initialized to false, captured in the getInitialGlobals closure, then updated in configureVitest. This works because JS closures capture by reference, and Vitest's lifecycle guarantees configureVitest completes before tests invoke browser commands.

Consider adding a brief comment explaining this dependency for future maintainers:

+ // Set in configureVitest, read by getInitialGlobals browser command
  let withinAgenticSetupSession = false;

Also applies to: 397-406

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@code/addons/vitest/src/vitest-plugin/index.ts` around lines 251 - 252, The
closure captures the module-scoped boolean withinAgenticSetupSession which is
set to false then later toggled in configureVitest and read inside
getInitialGlobals; add a short explanatory comment next to the
withinAgenticSetupSession declaration (and mirror at the other capture site
around the code at the second occurrence) stating that closures capture by
reference and that Vitest's lifecycle guarantees configureVitest runs before any
test/browser commands so the timing is intentional and must be preserved.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.vscode/settings.json:
- Around line 26-29: The settings file contains two separate "[typescript]"
blocks causing a duplicate key; merge them by consolidating the properties so a
single "[typescript]" object contains both "editor.defaultFormatter" and
"editor.formatOnSave" (preserve the existing values) and remove the redundant
block; locate the duplicate "[typescript]" entries and update the one named
"[typescript]" to include both editor.defaultFormatter and editor.formatOnSave
while deleting the other.
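Why the duplicate matters can be shown with standard JSON.parse, where the last occurrence of a repeated key wins. Assuming VS Code's settings parser resolves duplicates the same way, the first block's settings are silently lost. The formatter ID below is a placeholder, not the value in the repository's settings file.

```typescript
// Hypothetical settings content; "example.formatter" is a placeholder ID.
const duplicated = `{
  "[typescript]": { "editor.defaultFormatter": "example.formatter" },
  "[typescript]": { "editor.formatOnSave": true }
}`;

// JSON.parse keeps only the last occurrence of a repeated key, so the
// defaultFormatter entry from the first block disappears.
const parsed = JSON.parse(duplicated) as Record<string, Record<string, unknown>>;

// Merged form suggested by the review: one "[typescript]" object with both keys.
const merged: Record<string, Record<string, unknown>> = {
  '[typescript]': {
    'editor.defaultFormatter': 'example.formatter',
    'editor.formatOnSave': true,
  },
};
```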

---

Outside diff comments:
In `@code/addons/vitest/src/vitest-plugin/index.ts`:
- Around line 462-491: The configureVitest hook is declared async but Vitest 4.x
requires it be synchronous; change configureVitest to a plain synchronous
function (remove async/await) and move any async work (the await
isWithinInitialSession('ai-setup') and any async telemetry/reporters setup) into
a separate async initializer that runs after configuration (for example call an
async init function via setImmediate/Promise.resolve().then(...) or a top-level
startTelemetry() invoked without awaiting), and inside that async initializer
use detectAgent(), await isWithinInitialSession('ai-setup'),
isTelemetryModuleEnabled(), and push the new AgentTelemetryReporter({ configDir:
finalOptions.configDir, agent }) into context.vitest.config.reporters or
otherwise mutate the config synchronously-safe; keep synchronous bits like
context.vitest.config.coverage.exclude.push('storybook-static') inside
configureVitest and reference the existing symbols configureVitest,
isWithinInitialSession, detectAgent, AgentTelemetryReporter, telemetry,
isTelemetryModuleEnabled, withinAgenticSetupSession, finalOptions.configDir, and
context.vitest.config.reporters when moving logic.

---

Nitpick comments:
In `@code/addons/vitest/src/vitest-plugin/agent-telemetry-reporter.test.ts`:
- Around line 5-13: The test currently provides a full inline mock for
'storybook/internal/telemetry' which reimplements isExampleStoryId and omits
spy: true; change vi.mock(...) to use the spy: true option so original
implementations (including isExampleStoryId) are preserved, remove the inline
implementation for isExampleStoryId, and move specific mock behaviors into the
test beforeEach: call vi.clearAllMocks(), mock telemetry (vi.mocked(telemetry))
to resolve/return the desired value, and then instantiate reporter = new
AgentTelemetryReporter(...) as before; ensure references in the file target the
telemetry export and isExampleStoryId symbol names so behavior is configured in
beforeEach rather than inside vi.mock.
- Around line 72-96: The tests call reporter.onTestCaseResult but have no
assertions; update each test to assert expected behavior by either (A)
inspecting reporter's collected results (e.g., add or use a method/property like
reporter.getCollectedResults() or reporter.collectedResults) after calling
onTestCaseResult for the story and example cases, or (B) invoke
reporter.onTestRunEnd() and assert the telemetry sender was called with expected
payloads (spy/mock the sendTelemetry function) to verify accumulation; use the
existing createMockTestCase inputs (storyId/status) to assert the specific
inclusion, exclusion, or skipping of results for onTestCaseResult.

In `@code/addons/vitest/src/vitest-plugin/index.ts`:
- Around line 251-252: The closure captures the module-scoped boolean
withinAgenticSetupSession which is set to false then later toggled in
configureVitest and read inside getInitialGlobals; add a short explanatory
comment next to the withinAgenticSetupSession declaration (and mirror at the
other capture site around the code at the second occurrence) stating that
closures capture by reference and that Vitest's lifecycle guarantees
configureVitest runs before any test/browser commands so the timing is
intentional and must be preserved.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 279f1224-7ef4-4b74-aaa8-f6edeb102003

📥 Commits

Reviewing files that changed from the base of the PR and between 8a8d429 and ce0e6cb.

📒 Files selected for processing (9)
  • .circleci/config.yml
  • .gitignore
  • .vscode/settings.json
  • AGENTS.md
  • code/addons/a11y/src/preview.tsx
  • code/addons/vitest/src/vitest-plugin/agent-telemetry-reporter.test.ts
  • code/addons/vitest/src/vitest-plugin/agent-telemetry-reporter.ts
  • code/addons/vitest/src/vitest-plugin/index.ts
  • code/core/src/cli/globalSettings.ts
✅ Files skipped from review due to trivial changes (4)
  • .circleci/config.yml
  • code/addons/a11y/src/preview.tsx
  • code/core/src/cli/globalSettings.ts
  • .gitignore

Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (1)
code/lib/create-storybook/src/commands/UserPreferencesCommand.test.ts (1)

457-472: Assert forwarded behavior here, not just shape.

This test stays green even if executeUserPreferences() stops forwarding isAiSetupAvailable or isTestFeatureAvailable, because any object with selectedFeatures and newUser satisfies it. Please assert one observable branch through the wrapper instead, e.g. that Feature.TEST is omitted when isTestFeatureAvailable: false or that Feature.AI is added when AI is available.

Based on learnings: Export functions that need direct tests and test real behavior, not just syntax patterns.
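The difference between a shape assertion and a behavioral assertion can be sketched without Vitest. Everything below is hypothetical: the `Feature` members, `runPreferences` and `regressedWrapper` are illustrative stand-ins. The real `Feature` enum and `executeUserPreferences` in create-storybook take more options than are modeled here.

```typescript
// Hypothetical stand-ins for the create-storybook types.
enum Feature {
  DOCS = 'docs',
  TEST = 'test',
  AI = 'ai',
}

interface PreferenceFlags {
  isTestFeatureAvailable: boolean;
  isAiSetupAvailable: boolean;
}

interface PreferencesResult {
  selectedFeatures: Set<Feature>;
  newUser: boolean;
}

// Core logic: which features end up selected for a given set of flags.
function runPreferences(flags: PreferenceFlags): PreferencesResult {
  const selectedFeatures = new Set<Feature>([Feature.DOCS]);
  if (flags.isTestFeatureAvailable) selectedFeatures.add(Feature.TEST);
  if (flags.isAiSetupAvailable) selectedFeatures.add(Feature.AI);
  return { selectedFeatures, newUser: true };
}

// Correct wrapper: forwards the caller's flags.
function executeUserPreferences(flags: PreferenceFlags): PreferencesResult {
  return runPreferences(flags);
}

// Regressed wrapper: ignores the caller's flags entirely.
function regressedWrapper(_flags: PreferenceFlags): PreferencesResult {
  return runPreferences({ isTestFeatureAvailable: true, isAiSetupAvailable: false });
}

// Shape-only check: passes for both wrappers, so it cannot catch the regression.
const hasExpectedShape = (r: PreferencesResult): boolean =>
  r.selectedFeatures instanceof Set && typeof r.newUser === 'boolean';

// Behavioral check: observes one branch per flag and fails for the regression.
function forwardsFlags(run: (flags: PreferenceFlags) => PreferencesResult): boolean {
  const noTest = run({ isTestFeatureAvailable: false, isAiSetupAvailable: false });
  const withAi = run({ isTestFeatureAvailable: true, isAiSetupAvailable: true });
  return !noTest.selectedFeatures.has(Feature.TEST) && withAi.selectedFeatures.has(Feature.AI);
}
```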

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@code/lib/create-storybook/src/commands/UserPreferencesCommand.test.ts` around
lines 457 - 472, The test for executeUserPreferences only asserts shape; instead
update the test to assert forwarded flags affect behavior by calling
executeUserPreferences with controlled inputs for isTestFeatureAvailable and
isAiSetupAvailable (via defaultExecuteOptions or explicit options) and then
asserting that Feature.TEST is absent when isTestFeatureAvailable:false and that
Feature.AI is present when isAiSetupAvailable:true (use the exported
executeUserPreferences helper and the Feature enum to check presence/absence in
result.selectedFeatures); if helper functions being tested are not exported,
export them so tests can call them directly and verify real branching rather
than only the return shape.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@code/lib/create-storybook/src/initiate.ts`:
- Around line 116-118: The current gate disables AI in empty directories by
setting isAiSetupAvailable: isAiSetupAvailable && !isEmptyProject, which
prevents Feature.AI from being selectable for explicit agent runs; change the
condition to keep AI available when the user explicitly requested an agent run
(e.g., add an explicitAgentRun/agentFlag check) so it reads something like
isAiSetupAvailable && (!isEmptyProject || explicitAgentRun); ensure the
Feature.AI selection can be made and that downstream flags showAgentFollowUp and
showAiInstructions are set when explicitAgentRun is true.
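The proposed gate can be written as a small predicate. This is a sketch only. `explicitAgentRun` is the hypothetical flag name suggested in the comment, and the real initiate.ts may derive it from a CLI option that is not shown here.

```typescript
interface AiGateInput {
  isAiSetupAvailable: boolean;
  isEmptyProject: boolean;
  explicitAgentRun: boolean; // hypothetical: set when the user explicitly asked for an agent run
}

// AI stays selectable in an empty directory only when the agent run was explicit.
function resolveAiAvailability({
  isAiSetupAvailable,
  isEmptyProject,
  explicitAgentRun,
}: AiGateInput): boolean {
  return isAiSetupAvailable && (!isEmptyProject || explicitAgentRun);
}
```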


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 76a67a8d-8931-48d3-8319-942b0534f12d

📥 Commits

Reviewing files that changed from the base of the PR and between ce0e6cb and 62849d4.

📒 Files selected for processing (3)
  • code/lib/create-storybook/src/commands/UserPreferencesCommand.test.ts
  • code/lib/create-storybook/src/commands/UserPreferencesCommand.ts
  • code/lib/create-storybook/src/initiate.ts

Comment thread code/lib/create-storybook/src/commands/UserPreferencesCommand.ts
Comment thread code/lib/create-storybook/src/initiate.ts
Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 15

🧹 Nitpick comments (3)
scripts/eval/lib/publish-trial.test.ts (1)

102-123: Refactor tinyexec mocking to align with repo Vitest spy-mocking patterns

The test uses vi.doMock() without the spy: true option and places inline mock implementations inside test cases, which violates the repo's standardized mocking guidelines. Move mocks to the top of the file with spy: true and implement behaviors in beforeEach blocks.

Current pattern (lines 102-123, 289-297, 393-423)
vi.doMock('tinyexec', () => ({
  x: vi.fn(
    async (cmd: string, args: string[], options?: { nodeOptions?: { cwd?: string } }) => {
      // inline mock implementation
    }
  ),
}));

Per repo guidelines: Use vi.mock() with the spy: true option, place all mocks at the top before test cases, and avoid inline mock implementations within test cases.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/eval/lib/publish-trial.test.ts` around lines 102 - 123, The test is
creating an inline mock for the tinyexec module using vi.doMock with an inline
vi.fn implementation inside test bodies; refactor by replacing vi.doMock(...)
with vi.mock('tinyexec', { spy: true }) at the top of the file and move the
inline behavior into a beforeEach that sets the spy implementation for the
exported x function (e.g., use (tinyexec.x as
vi.SpyInstance).mockImplementation(...) or vi.spyOn(...) in beforeEach) so calls
array setup and the conditional return cases (gh label list, git config, gh pr
create) are defined outside individual tests and the mock is a spy per repo
guidelines.
scripts/eval/lib/run-trial.test.ts (1)

10-55: ⚡ Quick win

Bring these Vitest mocks back in line with the repo rules.

Lines 10-55 bake behavior into the vi.mock(...) factories, and lines 320-369 reconfigure mocks inside a test case. That makes the suite harder to reset cleanly and diverges from the repo’s spy: true/beforeEach pattern. It would also be safer to make the mock specifiers match the imported .ts paths, so that the intercepted module is unambiguous.

As per coding guidelines, "Use vi.mock() with the spy: true option for all package and file mocks in Vitest tests" and "Implement mock behaviors in beforeEach blocks in Vitest tests".

Also applies to: 320-369

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/eval/lib/run-trial.test.ts` around lines 10 - 55, Mocks in this test
file are baking behavior inside vi.mock factories instead of using spy: true and
configuring behaviors in beforeEach; update each vi.mock call (for
'./prepare-trial', './grade', './publish-trial', './utils',
'./agents/claude-code', './agents/codex', and 'tinyexec') to include the option
{ spy: true } and stop returning concrete mockResolvedValue data from the
factory; instead, move all mock behavior (e.g., publishTrialBranch
mockResolvedValue, captureEnvironment mockResolvedValue,
claudeAgent.execute/codexAgent.execute implementations, tinyexec.x resolved
value, and any prepareTrial/grade spies) into a beforeEach block where you call
vi.mocked(...).mockResolvedValue or .mockImplementation as needed; also ensure
the module specifiers exactly match the imported .ts paths used by the code
under test (e.g., './utils.ts', './agents/claude-code.ts') so the mocks
intercept the correct modules.
scripts/eval/lib/utils.ts (1)

33-40: ⚡ Quick win

Use the Storybook node logger here instead of raw console.log.

This helper is shared across the eval library now, so it is no longer isolated enough to justify raw console logging.

As per coding guidelines, use Storybook loggers instead of raw console.* in normal code paths.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/eval/lib/utils.ts` around lines 33 - 40, The helper createLogger
currently uses console.log in its methods (log, logStep, logSuccess, logError);
replace those calls with the Storybook node logger by importing the logger from
'@storybook/node-logger' and routing messages to the appropriate logger methods
(e.g., logger.info for regular/log and step messages, logger.success or
logger.info with a success prefix for logSuccess, and logger.error or
logger.warn for logError) while preserving the existing prefix/formatting logic
so createLogger returns the same API but using Storybook's logger instead of
console.log.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@code/lib/cli-storybook/src/ai/index.ts`:
- Around line 70-73: The support gate currently uses a conjunction so it only
rejects when both rendererPackage !== '@storybook/react' AND builderPackage !==
'@storybook/builder-vite'; change the condition in the if that checks
projectInfo.rendererPackage and projectInfo.builderPackage (in ai/index.ts) to
use OR (||) so the command exits whenever either requirement is not met (i.e.,
renderer is not '@storybook/react' OR builder is not '@storybook/builder-vite'),
keeping the existing exit/log behavior inside that if block.
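The AND/OR bug is easiest to see with a Webpack-based React project, which only fails one of the two checks. This is a minimal sketch. `ProjectPackages` models only the two fields named in the comment; the real `ProjectInfo` type has more.

```typescript
const SUPPORTED_RENDERER = '@storybook/react';
const SUPPORTED_BUILDER = '@storybook/builder-vite';

interface ProjectPackages {
  rendererPackage?: string;
  builderPackage?: string;
}

// Current behavior: rejects only when BOTH requirements are unmet.
function isRejectedBuggy(p: ProjectPackages): boolean {
  return p.rendererPackage !== SUPPORTED_RENDERER && p.builderPackage !== SUPPORTED_BUILDER;
}

// Proposed behavior: rejects when EITHER requirement is unmet.
function isRejected(p: ProjectPackages): boolean {
  return p.rendererPackage !== SUPPORTED_RENDERER || p.builderPackage !== SUPPORTED_BUILDER;
}
```

By De Morgan's law, `isRejected` is the negation of "renderer is React AND builder is Vite", which is the support condition the command actually wants.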

In `@code/lib/cli-storybook/src/ai/setup-prompts/pattern-copy-play.ts`:
- Around line 62-119: Update getPreviewConfigExample to use ProjectInfo.language
to emit JS or TS variants: when ProjectInfo.language indicates JavaScript,
produce preview.js (not preview.tsx), remove TypeScript-only constructs (no
"import type", no ": Preview" annotations, no "satisfies" or typed Story
aliases) and use plain default export (module.exports or export default with an
untyped object) and when TypeScript emit the existing TS example; apply the same
language-aware branching and removals to the other prompt generators referenced
(the blocks around lines 140-154, 184-205, 234-255, 319-354, 388-415, 445-468,
488-489, 671-679) so all shipped prompts use .js filenames and untyped syntax
for JS projects and keep typed examples for TS projects.

In `@code/lib/cli-storybook/src/ai/setup-prompts/setup.ts`:
- Around line 181-216: The steps hardcode TypeScript filenames and TS-only
syntax; make the generated filenames and example snippets language-aware using
projectInfo.language: compute an extension variable (e.g., ext =
projectInfo.language.startsWith('ts') ? 'tsx' : 'jsx') and use
`${configDir}/preview.${ext}` and `<ComponentName>.stories.${ext}` instead of
hardcoded `.tsx`/`.stories.tsx`; update getPreviewDecoratorExample(projectInfo)
(or its caller) to accept/inspect projectInfo.language and emit JS-safe examples
when language is JS (no `import type`, no `satisfies`, use plain imports/exports
and JS-compatible syntax) so the instructions and fallback examples are valid
for both JS and TS projects.
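Both prompt items boil down to branching on the project language once and deriving filenames and snippets from that branch. This sketch assumes `ProjectInfo.language` can be reduced to `'typescript' | 'javascript'`, which is not confirmed by the source. The `@storybook/react-vite` import source for `Preview` is also an assumption, since the right package depends on the Storybook version and framework.

```typescript
// Assumption: the project language reduces to one of these two values.
type Language = 'typescript' | 'javascript';

function storyExtension(language: Language): 'tsx' | 'jsx' {
  return language === 'typescript' ? 'tsx' : 'jsx';
}

function previewPath(configDir: string, language: Language): string {
  return `${configDir}/preview.${storyExtension(language)}`;
}

function previewExample(language: Language): string {
  if (language === 'typescript') {
    return [
      "import type { Preview } from '@storybook/react-vite';",
      '',
      'const preview: Preview = { decorators: [] };',
      '',
      'export default preview;',
    ].join('\n');
  }
  // JS variant: no `import type`, no type annotations, no `satisfies`.
  return ['const preview = { decorators: [] };', '', 'export default preview;'].join('\n');
}
```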

In `@scripts/eval/eval.ts`:
- Around line 134-149: The code calls inferAgent(values.model) before
argsSchema.safeParse, which lets inferAgent throw on invalid --model and bypass
CLI validation; change the flow so parsing/validation runs first without calling
inferAgent (e.g., pass through the raw values and an undefined agent into
argsSchema.safeParse), then after parsed.success compute the final agent from
parsed.data (use parsed.data.agent ?? (parsed.data.model ?
inferAgent(parsed.data.model) : 'claude')), and apply the same fix to the other
occurrence around the block at lines 246-250; refer to inferAgent,
argsSchema.safeParse, parsed, agent, and values.model to locate and update the
logic.
- Around line 253-262: In buildManualCommand, make the manual-shell command
robust by quoting and escaping promptPath inside the subshell so paths with
spaces or metacharacters don't break the command; change how promptArg is
constructed (the `"$(cat ${promptPath})"` piece) to wrap promptPath in quotes
and escape any embedded quotes (so the subshell becomes something like "$(cat
'...')" with proper escaping of single quotes) and use that updated promptArg in
the returned strings for both the claude and codex branches.
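For the quoting item, the standard POSIX technique is to wrap the path in single quotes and rewrite each embedded `'` as `'\''`. That sequence closes the quote, emits an escaped quote, and reopens the quote. The helper names below are hypothetical; only the `"$(cat ...)"` shape comes from the comment.

```typescript
// POSIX single-quote escaping: inside '...', nothing is special except ' itself.
function shellQuote(value: string): string {
  return `'${value.replace(/'/g, `'\\''`)}'`;
}

// Hypothetical helper mirroring the promptArg piece of buildManualCommand.
function buildPromptArg(promptPath: string): string {
  return `"$(cat ${shellQuote(promptPath)})"`;
}
```

The single quotes remain valid inside the double-quoted `$(...)`, because a command substitution starts a fresh quoting context.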

In `@scripts/eval/lib/grade.ts`:
- Around line 183-190: The current branch treats any non-'pass' cssCheck as an
error; change the logic in the block that reads cssCheck (variable cssCheck from
storyRenderRun.summary?.cssCheck) so that only 'fail' triggers logger.logError
and 'not-run' (or when storyRenderRun.attempted === false from
runStoryRenderPass()) is logged as a non-error (e.g., logger.logInfo or
logger.log with a neutral message), while 'pass' still calls logger.logSuccess;
reference storyRenderRun, cssCheck, runStoryRenderPass(), and
logger.logError/logSuccess to locate and update the condition handling.
- Around line 132-149: The hardcoded 'npx' invocations for Storybook and tsc
should use the project's package manager the same way ghost stories do: call
detectPackageManager(resolveInstallRoot(projectPath)) to get the package manager
(e.g., pm) and then use getScriptRunCommand(pm) as the executable passed into
x(...) instead of the literal 'npx'; keep the same argument arrays
(['storybook','build','--quiet'] and ['tsc','--noEmit']) and preserve the
existing options (throwOnError, timeout, nodeOptions/env including NODE_OPTIONS
and STORYBOOK_DISABLE_TELEMETRY). Target the x(...) calls that currently use
'npx' for Storybook build and tsc and replace the first argument with
getScriptRunCommand(pm) obtained from detectPackageManager(...).
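For the cssCheck item above, the fix is a three-way mapping rather than a pass/else split. This is a sketch. The `'pass' | 'fail' | 'not-run'` union and the `attempted` flag follow the comment's wording, but their exact types in grade.ts are assumptions.

```typescript
type CssCheck = 'pass' | 'fail' | 'not-run';
type LogLevel = 'success' | 'info' | 'error';

// Only an actual failure is an error; a skipped or missing check is neutral.
function cssCheckLogLevel(cssCheck: CssCheck | undefined, attempted: boolean): LogLevel {
  if (!attempted || cssCheck === undefined || cssCheck === 'not-run') return 'info';
  return cssCheck === 'pass' ? 'success' : 'error';
}
```

The caller would then pick `logger.logSuccess`, a neutral log call, or `logger.logError` from the returned level.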

In `@scripts/eval/lib/publish-trial.ts`:
- Around line 279-287: The prompt body written by opts.data.prompt.content can
contain triple-backtick code fences which will prematurely close the surrounding
fence; in the lines.push block that builds the details block in publish-trial.ts
(the section adding '<details>', '<summary>Full prompt</summary>' and the fenced
block), change the opening fence string from '```md' to '````md' and the closing
fence from '```' to '````' so the outer fence can contain embedded
triple-backtick examples without being closed.
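A fixed four-backtick fence handles embedded triple fences. A slightly more general variant, shown here as an alternative rather than what the comment proposes, picks a fence one longer than the longest backtick run in the content. This relies on the CommonMark rule that a closing fence must be at least as long as the opening fence. `fenceFor` and `detailsBlock` are hypothetical helpers.

```typescript
// Choose a backtick fence that no backtick run inside `content` can close.
function fenceFor(content: string): string {
  const runs = content.match(/`+/g) ?? [];
  const longest = runs.reduce((max, run) => Math.max(max, run.length), 0);
  return '`'.repeat(Math.max(3, longest + 1));
}

// Build the <details> block; the blank line after <summary> lets GitHub render markdown inside.
function detailsBlock(summary: string, content: string, info = 'md'): string[] {
  const fence = fenceFor(content);
  return ['<details>', `<summary>${summary}</summary>`, '', `${fence}${info}`, content, fence, '</details>'];
}
```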

In `@scripts/eval/lib/story-render.ts`:
- Around line 202-216: readStoryRenderSummary currently calls JSON.parse(await
readFile(...)) which will throw and abort the trial if the Vitest report is
unreadable; update readStoryRenderSummary to wrap the file read + JSON.parse
(and subsequent parseVitestResults usage) in a try/catch and return undefined on
any error so the caller treats it as a missing summary rather than a fatal
error, referencing the readStoryRenderSummary function and the
parseVitestResults/parsed.summary path when implementing the catch-and-return
behavior.
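The catch-and-return pattern can be sketched as a pure parser plus a thin reader. This assumes the report uses the Jest-compatible `numTotalTests`/`numPassedTests` fields that Vitest's JSON reporter emits. `parseSummary` is a hypothetical stand-in for the real `parseVitestResults` path.

```typescript
import { readFile } from 'node:fs/promises';

// Hypothetical subset of the summary the harness needs.
interface StoryRenderSummary {
  total: number;
  passed: number;
}

function parseSummary(raw: string): StoryRenderSummary | undefined {
  try {
    const report = JSON.parse(raw) as { numTotalTests?: unknown; numPassedTests?: unknown };
    if (typeof report.numTotalTests !== 'number' || typeof report.numPassedTests !== 'number') {
      return undefined; // valid JSON, unexpected shape
    }
    return { total: report.numTotalTests, passed: report.numPassedTests };
  } catch {
    return undefined; // malformed or truncated JSON (or a literal `null`)
  }
}

async function readStoryRenderSummary(reportPath: string): Promise<StoryRenderSummary | undefined> {
  try {
    return parseSummary(await readFile(reportPath, 'utf8'));
  } catch {
    return undefined; // missing or unreadable file: treat as "no summary"
  }
}
```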

In `@scripts/eval/lib/utils.ts`:
- Around line 43-44: The formatDuration function can produce "1m60s" because it
rounds the seconds remainder independently; change it to round the total seconds
first and then compute minutes and seconds from that integer total: inside
formatDuration, compute const total = Math.round(s), then const minutes =
Math.floor(total / 60) and const seconds = total % 60, and return
`${minutes}m${seconds}s` when minutes > 0 (or `${seconds}s` when minutes === 0)
to avoid an overflow to 60 seconds; update references to formatDuration
accordingly.
- Around line 177-186: The git probe in captureEnvironment currently runs in the
current process CWD and returns "unknown" if not executed from the repo root;
update captureEnvironment to run the x(...) git commands from the known
repository root by invoking x with a working-directory option (or temporarily
chdir) using the harness's REPO_ROOT constant so the calls to x('git',
['rev-parse', ...]) execute with cwd: REPO_ROOT (reference function
captureEnvironment and the x helper and REPO_ROOT symbol).
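For the formatDuration item, rounding once before splitting removes the `1m60s` case. This sketch assumes the input is a duration in seconds. `formatDurationBuggy` is a reconstruction of the failure mode the comment describes, not the file's exact current code.

```typescript
// Assumes `seconds` is a (possibly fractional) duration in seconds.
function formatDuration(seconds: number): string {
  const total = Math.round(seconds); // round once, before splitting
  const minutes = Math.floor(total / 60);
  const rest = total % 60;
  return minutes > 0 ? `${minutes}m${rest}s` : `${rest}s`;
}

// Failure mode: rounding the remainder separately can produce 60 seconds.
function formatDurationBuggy(seconds: number): string {
  const minutes = Math.floor(seconds / 60);
  const rest = Math.round(seconds % 60);
  return minutes > 0 ? `${minutes}m${rest}s` : `${rest}s`;
}
```

With an input of 119.6 seconds, the buggy variant yields `1m60s`, while the fixed one yields `2m0s`.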

In `@scripts/eval/README.md`:
- Around line 63-74: The fenced code blocks in the README (for example the block
beginning with "Result" showing build/stories/ghost/score output and other
similar CLI/output/code examples) are missing language identifiers which
triggers markdownlint MD040; update each fenced block to include an appropriate
language tag (e.g., "text" or "console" for command/output snippets, "bash" for
shell commands, or the actual language for code samples) so every
triple-backtick fence in the README has a language identifier.
- Around line 282-304: The README points to the wrong directory for prompt
variants; update the documentation to reflect the actual code location used in
this PR by replacing references to code/lib/cli-storybook/src/ai/prompts with
code/lib/cli-storybook/src/ai/setup-prompts, and adjust the steps to mention
creating files like setup-prompts/<name>.ts that export an
instructions(projectInfo: ProjectInfo): string and registering them in
PROMPT_BUILDERS (in prompts/index.ts or the corresponding setup-prompts index),
so contributors modify the live variant location used by the eval harness.

In `@scripts/eval/run-batch.ts`:
- Around line 477-481: Update the help text for the prompt option in the prompt
config (the object keyed by prompt in run-batch.ts) to reference the new
registry location src/ai/setup-prompts/ instead of the old
code/lib/cli-storybook/src/ai/prompts/ path; edit the description string on the
prompt config (type: 'string' as const, description: ...) to point contributors
to src/ai/setup-prompts/ as the place where prompt variants are registered.
- Around line 340-354: The function requireBatchPrompt currently validates the
prompt case-insensitively but returns the caller's casing, which later breaks
loadPrompt's exact includes() check; update requireBatchPrompt to find the
matching canonical name from listPrompts() (e.g., use available.find(name =>
name.toLowerCase() === prompt.toLowerCase())) and return that canonical
available name instead of the original trimmed input, keeping the same error
behavior when no match is found; reference function requireBatchPrompt and
listPrompts to locate the change.
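For the requireBatchPrompt item, the fix is to return the canonical name found in the registry instead of the caller's spelling. This is a sketch with a hypothetical signature: the real helper calls `listPrompts()` itself and may report errors differently. The prompt names in the usage note come from the setup-prompts files listed in this review.

```typescript
// Hypothetical signature: `available` stands in for listPrompts().
function requireBatchPrompt(prompt: string, available: readonly string[]): string {
  const wanted = prompt.trim().toLowerCase();
  const match = available.find((name) => name.toLowerCase() === wanted);
  if (!match) {
    throw new Error(`Unknown prompt "${prompt}". Available: ${available.join(', ')}`);
  }
  return match; // canonical casing, safe for a later exact includes() check
}
```

For example, `requireBatchPrompt(' Pattern-Copy-Play ', ['setup', 'pattern-copy-play'])` returns `'pattern-copy-play'`, which `loadPrompt`'s exact lookup would accept.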


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: f18da5e9-c6fb-4b59-b620-1b73f0b7b262

📥 Commits

Reviewing files that changed from the base of the PR and between 62849d4 and db8eabc.

📒 Files selected for processing (21)
  • code/lib/cli-storybook/src/ai/index.test.ts
  • code/lib/cli-storybook/src/ai/index.ts
  • code/lib/cli-storybook/src/ai/prompt.ts
  • code/lib/cli-storybook/src/ai/setup-prompts/index.ts
  • code/lib/cli-storybook/src/ai/setup-prompts/pattern-copy-play.ts
  • code/lib/cli-storybook/src/ai/setup-prompts/setup.ts
  • code/lib/cli-storybook/src/ai/types.ts
  • scripts/eval/README.md
  • scripts/eval/eval.ts
  • scripts/eval/lib/agents/claude-code.ts
  • scripts/eval/lib/agents/codex.ts
  • scripts/eval/lib/agents/config.ts
  • scripts/eval/lib/grade.ts
  • scripts/eval/lib/publish-trial.test.ts
  • scripts/eval/lib/publish-trial.ts
  • scripts/eval/lib/run-trial.test.ts
  • scripts/eval/lib/run-trial.ts
  • scripts/eval/lib/story-render.ts
  • scripts/eval/lib/utils.test.ts
  • scripts/eval/lib/utils.ts
  • scripts/eval/run-batch.ts
✅ Files skipped from review due to trivial changes (1)
  • code/lib/cli-storybook/src/ai/types.ts

Comment thread code/lib/cli-storybook/src/ai/index.ts
Comment thread code/lib/cli-storybook/src/ai/setup-prompts/pattern-copy-play.ts
Comment thread code/lib/cli-storybook/src/ai/setup-prompts/setup.ts
Comment thread scripts/eval/eval.ts
Comment on lines +134 to +149
// Resolve the discriminator: explicit --agent, inferred from --model, or default to claude.
const agent = values.agent ?? (values.model ? inferAgent(values.model) : 'claude');

const parsed = argsSchema.safeParse({
...values,
agent,
listProjects: values['list-projects'],
listModels: values['list-models'],
listPrompts: values['list-prompts'],
});

if (!parsed.success) {
for (const issue of parsed.error.issues) {
console.error(pc.red(` ${issue.path.join('.')}: ${issue.message}`));
}
process.exit(1);
Contributor

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don’t let inferAgent() bypass CLI validation.

Line 135 calls inferAgent() before safeParse(), so --model foo throws an uncaught exception instead of returning a normal validation error.

Suggested fix
-const agent = values.agent ?? (values.model ? inferAgent(values.model) : 'claude');
+const inferredAgent = values.model ? inferAgent(values.model) : undefined;
+const agent = values.agent ?? inferredAgent ?? 'claude';
...
-function inferAgent(model: string): AgentId {
+function inferAgent(model: string): AgentId | undefined {
   for (const id of AGENT_IDS) {
     if (AGENTS[id].models.some((candidate) => candidate === model)) return id;
   }
-  throw new Error(`No agent found for model: ${model}`);
+  return undefined;
 }

Also applies to: 246-250

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/eval/eval.ts` around lines 134 - 149, The code calls
inferAgent(values.model) before argsSchema.safeParse, which lets inferAgent
throw on invalid --model and bypass CLI validation; change the flow so
parsing/validation runs first without calling inferAgent (e.g., pass through the
raw values and an undefined agent into argsSchema.safeParse), then after
parsed.success compute the final agent from parsed.data (use parsed.data.agent
?? (parsed.data.model ? inferAgent(parsed.data.model) : 'claude')), and apply
the same fix to the other occurrence around the block at lines 246-250; refer to
inferAgent, argsSchema.safeParse, parsed, agent, and values.model to locate and
update the logic.
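The core of the fix is that an unknown model should turn into an error value instead of an exception. This sketch omits Zod and stands in for the schema step with a discriminated result. `AGENTS`, its model names and `resolveAgent` are hypothetical placeholders, not the registry in scripts/eval.

```typescript
// Hypothetical registry; the real AGENTS / AGENT_IDS in scripts/eval differ.
const AGENTS = {
  claude: { models: ['claude-model-a', 'claude-model-b'] },
  codex: { models: ['codex-model-a'] },
} as const;
type AgentId = keyof typeof AGENTS;
const AGENT_IDS = Object.keys(AGENTS) as AgentId[];

// Non-throwing lookup: unknown models yield undefined.
function inferAgent(model: string): AgentId | undefined {
  for (const id of AGENT_IDS) {
    if ((AGENTS[id].models as readonly string[]).includes(model)) return id;
  }
  return undefined;
}

type Resolution = { ok: true; agent: AgentId } | { ok: false; error: string };

// Explicit --agent wins, then inference from --model, then the 'claude' default.
function resolveAgent(values: { agent?: AgentId; model?: string }): Resolution {
  if (values.agent) return { ok: true, agent: values.agent };
  if (!values.model) return { ok: true, agent: 'claude' };
  const inferred = inferAgent(values.model);
  return inferred
    ? { ok: true, agent: inferred }
    : { ok: false, error: `model: unknown model "${values.model}"` };
}
```

The `error` string can then be printed in the same ` path: message` format as the existing Zod issues loop.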

Comment thread scripts/eval/eval.ts
Comment thread scripts/eval/lib/utils.ts
Comment thread scripts/eval/README.md Outdated
Comment thread scripts/eval/README.md Outdated
Comment thread scripts/eval/run-batch.ts Outdated
Comment thread scripts/eval/run-batch.ts
Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (1)
scripts/eval/eval.ts (1)

134-150: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

inferAgent() still throws before Zod validation can catch invalid models.

Line 135 calls inferAgent(values.model) before argsSchema.safeParse() on line 137. If --model foo is passed (an invalid model), inferAgent throws an uncaught exception instead of letting Zod produce a validation error with the friendly message format.

Proposed fix
-// Resolve the discriminator: explicit --agent, inferred from --model, or default to claude.
-const agent = values.agent ?? (values.model ? inferAgent(values.model) : 'claude');
+// Resolve the discriminator: explicit --agent, inferred from --model, or default to claude.
+const inferredAgent = values.model ? inferAgentSafe(values.model) : undefined;
+const agent = values.agent ?? inferredAgent ?? 'claude';
...
-function inferAgent(model: string): AgentId {
+function inferAgentSafe(model: string): AgentId | undefined {
   for (const id of AGENT_IDS) {
     if (AGENTS[id].models.some((candidate) => candidate === model)) return id;
   }
-  throw new Error(`No agent found for model: ${model}`);
+  return undefined;
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/eval/eval.ts` around lines 134 - 150, The code calls
inferAgent(values.model) before Zod validation, allowing inferAgent to throw on
invalid models; instead, stop calling inferAgent inside the object passed to
argsSchema.safeParse and move the inference until after successful parsing: pass
agent through unchanged (or leave it undefined) into argsSchema.safeParse, then
after parsed.success compute the final agent with something like agent =
parsed.data.agent ?? (parsed.data.model ? inferAgent(parsed.data.model) :
'claude'); update downstream uses to use this post-validated agent so invalid
models produce Zod validation errors rather than uncaught exceptions.
🧹 Nitpick comments (3)
scripts/eval/lib/story-render.ts (1)

99-99: 💤 Low value

Consider moving constant definition before first use for readability.

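STORY_RENDER_TIMEOUT_MS is referenced on line 99 but defined on line 158. This works not because of hoisting (a const stays in the temporal dead zone until its declaration executes), but because runStoryRenderPass only reads the constant when it is called, after the module has finished evaluating. Placing the constant before runStoryRenderPass still improves readability.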

Also applies to: 158-158
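Why the forward reference is safe can be shown in a few lines. This is a minimal sketch: the function body and the 120000 value are hypothetical, and only the two names come from the comment.

```typescript
// The function body refers to a const declared further down the file.
// That is fine as long as the function is not *called* before the
// declaration executes; calling it earlier would throw a ReferenceError.
function runStoryRenderPass(): number {
  return STORY_RENDER_TIMEOUT_MS;
}

const STORY_RENDER_TIMEOUT_MS = 120000; // hypothetical value

const timeoutMs = runStoryRenderPass(); // declaration has run, so this reads 120000
```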

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/eval/lib/story-render.ts` at line 99, Move the
STORY_RENDER_TIMEOUT_MS constant declaration so it appears before its first use
in runStoryRenderPass: locate the current reference to STORY_RENDER_TIMEOUT_MS
in the runStoryRenderPass scope (where timeoutMs is assigned) and cut/paste the
constant definition (the STORY_RENDER_TIMEOUT_MS declaration) to a position
above runStoryRenderPass to improve readability and flow.
code/lib/create-storybook/src/commands/UserPreferencesCommand.test.ts (2)

75-80: ⚡ Quick win

Make telemetry mocks async and complete to match runtime behavior.

trackAiSetupNudge is awaited in production code, but mocks are plain vi.fn() in several places. Use mockResolvedValue(undefined) and keep the same method surface everywhere to avoid false positives in async flows.

Proposed fix
     vi.mocked(TelemetryService).mockImplementation(function () {
       return {
-        trackNewUserCheck: vi.fn(),
-        trackInstallType: vi.fn(),
+        trackNewUserCheck: vi.fn().mockResolvedValue(undefined),
+        trackInstallType: vi.fn().mockResolvedValue(undefined),
+        trackAiSetupNudge: vi.fn().mockResolvedValue(undefined),
       };
     });

@@
     const mockTelemetryService = {
-      trackNewUserCheck: vi.fn(),
-      trackInstallType: vi.fn(),
-      trackAiSetupNudge: vi.fn(),
+      trackNewUserCheck: vi.fn().mockResolvedValue(undefined),
+      trackInstallType: vi.fn().mockResolvedValue(undefined),
+      trackAiSetupNudge: vi.fn().mockResolvedValue(undefined),
     };

@@
       (yesCommand as unknown as CommandWithPrivates).telemetryService = {
-        trackNewUserCheck: vi.fn(),
-        trackInstallType: vi.fn(),
-        trackAiSetupNudge: vi.fn(),
+        trackNewUserCheck: vi.fn().mockResolvedValue(undefined),
+        trackInstallType: vi.fn().mockResolvedValue(undefined),
+        trackAiSetupNudge: vi.fn().mockResolvedValue(undefined),
       };

As per coding guidelines, “Each mock implementation should return a Promise for async functions in Vitest” and “Mock all required properties and methods that the test subject uses in Vitest tests.”

Also applies to: 92-97, 336-340

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@code/lib/create-storybook/src/commands/UserPreferencesCommand.test.ts` around
lines 75 - 80, The TelemetryService test mock currently returns synchronous
vi.fn() methods but production awaits async methods (e.g., trackAiSetupNudge);
update the mock implementations used in UserPreferencesCommand.test (the
vi.mocked(TelemetryService).mockImplementation blocks) to provide the same
method surface but with async-resolving functions (use
vi.fn().mockResolvedValue(undefined) for trackAiSetupNudge and any other awaited
methods like trackNewUserCheck and trackInstallType) so tests mirror runtime
behavior and avoid false positives; apply the same change to the other mock
blocks referenced (around the other mocked implementations at the noted
locations).

126-458: 🏗️ Heavy lift

Move prompt mock behaviors from inline test bodies into beforeEach-based scenario setup.

There are many inline mockResolvedValueOnce(...) calls inside test cases. Please shift these into scoped beforeEach blocks per scenario (interactive new user, light install, AI accepted, etc.) to match repo spy-mocking rules and reduce test coupling.

As per coding guidelines, “Implement mock behaviors in beforeEach blocks in Vitest tests” and “Avoid inline mock implementations within test cases in Vitest tests.”

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@code/lib/create-storybook/src/commands/UserPreferencesCommand.test.ts` around
lines 126 - 458, Several tests use inline
vi.mocked(prompt.select).mockResolvedValueOnce(...) and
vi.mocked(prompt.confirm).mockResolvedValueOnce(...) inside test bodies; move
those mockResolvedValueOnce calls into scoped beforeEach blocks for each
scenario (e.g., the "interactive new user" scenario, the "light install"
scenario, and the "AI accepted/declined" scenario) so mocks are declared before
calling UserPreferencesCommand.execute; specifically, create nested
describe/beforeEach blocks under the existing "AI setup prompt" and "execute"
suites that set prompt.select and prompt.confirm sequences used by
command.execute (referencing prompt.select, prompt.confirm, command.execute,
defaultExecuteOptions, and Feature enums), and ensure you reset/restore mocks
between scenarios (using vi.resetAllMocks or similar) to avoid cross-test
coupling.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@code/lib/create-storybook/src/commands/UserPreferencesCommand.test.ts`:
- Around lines 41-49: The afterAll cleanup currently recreates an own isTTY
property when originalIsTTYDescriptor is falsy; instead, in the afterAll handler
for UserPreferencesCommand.test.ts detect originalIsTTYDescriptor and if falsy
remove the override by deleting process.stdout.isTTY (rather than defining it as
undefined) so inherited behavior is restored; update the afterAll block that
references originalIsTTYDescriptor and process.stdout.isTTY to call delete
process.stdout.isTTY in the else branch.

---

Duplicate comments:
In `@scripts/eval/eval.ts`:
- Around lines 134-150: The code calls inferAgent(values.model) before Zod
validation, so an invalid model makes inferAgent throw. Instead, stop
calling inferAgent inside the object passed to argsSchema.safeParse and move the
inference to after successful parsing: pass agent through unchanged (or leave
it undefined) into argsSchema.safeParse, then, after parsed.success, compute the
final agent with something like agent = parsed.data.agent ?? (parsed.data.model
? inferAgent(parsed.data.model) : 'claude'); update downstream uses to use this
post-validated agent so invalid models produce Zod validation errors rather than
uncaught exceptions.
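A minimal sketch of the suggested ordering, written without Zod so it runs on its own. safeParseArgs is a hand-rolled stand-in for argsSchema.safeParse, and MODEL_AGENTS, isKnownModel, isAgent, and this inferAgent are hypothetical, not the code in scripts/eval/eval.ts. What it shows: an unknown model comes back as a validation result, and inferAgent only ever sees models the validator has already accepted.

```typescript
type Agent = 'claude' | 'codex';

// Hypothetical model-to-agent table; the real eval.ts mapping and model names differ.
const MODEL_AGENTS: Record<string, Agent> = {
  'example-claude-model': 'claude',
  'example-codex-model': 'codex',
};

// Own-key check so inherited names like "toString" are not treated as models.
function isKnownModel(model: string): boolean {
  return Object.prototype.hasOwnProperty.call(MODEL_AGENTS, model);
}

function isAgent(value: string): value is Agent {
  return value === 'claude' || value === 'codex';
}

// Hypothetical inferAgent that, like the one in the review, throws on unknown models.
function inferAgent(model: string): Agent {
  if (!isKnownModel(model)) {
    throw new Error(`Cannot infer agent for model "${model}"`);
  }
  return MODEL_AGENTS[model];
}

interface RawArgs {
  agent?: string;
  model?: string;
}

type ParseResult =
  | { success: true; data: { agent?: Agent; model?: string } }
  | { success: false; error: string };

// Stand-in for argsSchema.safeParse: reports problems as data instead of throwing.
function safeParseArgs(raw: RawArgs): ParseResult {
  let agent: Agent | undefined;
  if (raw.agent !== undefined) {
    if (!isAgent(raw.agent)) {
      return { success: false, error: `Invalid agent "${raw.agent}"` };
    }
    agent = raw.agent;
  }
  if (raw.model !== undefined && !isKnownModel(raw.model)) {
    return { success: false, error: `Unknown model "${raw.model}"` };
  }
  return { success: true, data: { agent, model: raw.model } };
}

type Resolved = { ok: true; agent: Agent; model?: string } | { ok: false; error: string };

// Suggested order: validate first, then infer the agent from the validated model.
function resolveArgs(raw: RawArgs): Resolved {
  const parsed = safeParseArgs(raw);
  if (!parsed.success) {
    return { ok: false, error: parsed.error };
  }
  const { agent, model } = parsed.data;
  const finalAgent: Agent = agent ?? (model ? inferAgent(model) : 'claude');
  return { ok: true, agent: finalAgent, model };
}

// An unknown model now yields { ok: false, ... } instead of an uncaught exception.
console.log(resolveArgs({ model: 'not-a-real-model' }));
console.log(resolveArgs({ model: 'example-codex-model' }));
```

An explicit agent still wins over inference, and with neither flag set the sketch falls back to 'claude', matching the expression quoted in the comment.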

---

Nitpick comments:
In `@code/lib/create-storybook/src/commands/UserPreferencesCommand.test.ts`:
- Around lines 75-80: The TelemetryService test mock currently returns
synchronous vi.fn() methods but production awaits async methods (e.g.,
trackAiSetupNudge); update the mock implementations used in
UserPreferencesCommand.test (the vi.mocked(TelemetryService).mockImplementation
blocks) to provide the same method surface but with async-resolving functions
(use vi.fn().mockResolvedValue(undefined) for trackAiSetupNudge and any other
awaited methods like trackNewUserCheck and trackInstallType) so tests mirror
runtime behavior and avoid false positives; apply the same change to the other
mock blocks referenced (around the other mocked implementations at the noted
locations).

In `@scripts/eval/lib/story-render.ts`:
- Line 99: Move the STORY_RENDER_TIMEOUT_MS constant declaration above
runStoryRenderPass so it appears before its first use (where timeoutMs is
assigned inside runStoryRenderPass), to improve readability and flow.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: c91e55b9-f65f-426f-b63c-3b250e7dd09e

📥 Commits

Reviewing files that changed from the base of the PR and between db8eabc and e8b2899.

📒 Files selected for processing (8)
  • code/lib/create-storybook/src/commands/UserPreferencesCommand.test.ts
  • scripts/eval/README.md
  • scripts/eval/eval.ts
  • scripts/eval/lib/grade.ts
  • scripts/eval/lib/publish-trial.ts
  • scripts/eval/lib/story-render.ts
  • scripts/eval/lib/utils.ts
  • scripts/eval/run-batch.ts
✅ Files skipped from review due to trivial changes (2)
  • scripts/eval/run-batch.ts
  • scripts/eval/lib/grade.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • scripts/eval/lib/publish-trial.ts

Comment on lines +41 to +49
  afterAll(() => {
    if (originalIsTTYDescriptor) {
      Object.defineProperty(process.stdout, 'isTTY', originalIsTTYDescriptor);
    } else {
      Object.defineProperty(process.stdout, 'isTTY', {
        value: undefined,
        configurable: true,
      });
    }

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

cat -n code/lib/create-storybook/src/commands/UserPreferencesCommand.test.ts | head -80

Repository: storybookjs/storybook

Length of output: 3557


🏁 Script executed:

cat -n code/lib/create-storybook/src/commands/UserPreferencesCommand.test.ts | sed -n '80,150p'

Repository: storybookjs/storybook

Length of output: 2952


🏁 Script executed:

# Check if there are any other tests or processes that might be affected
cat -n code/lib/create-storybook/src/commands/UserPreferencesCommand.test.ts | tail -100

Repository: storybookjs/storybook

Length of output: 4361


Restore process.stdout.isTTY by removing the override when there was no original descriptor.

The else branch currently defines isTTY as undefined, creating an own property that shadows the inherited value. Because Object.getOwnPropertyDescriptor() only returns own properties (not inherited ones), a falsy originalIsTTYDescriptor means there was no own isTTY property to begin with, so the correct restoration is to delete the override rather than redefine it.

Proposed fix
   afterAll(() => {
     if (originalIsTTYDescriptor) {
       Object.defineProperty(process.stdout, 'isTTY', originalIsTTYDescriptor);
     } else {
-      Object.defineProperty(process.stdout, 'isTTY', {
-        value: undefined,
-        configurable: true,
-      });
+      Reflect.deleteProperty(process.stdout, 'isTTY');
     }
   });
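To see why the else branch matters, here is a minimal, self-contained sketch that models the situation with a plain object whose prototype supplies isTTY, standing in (as an assumption) for process.stdout when it has no own isTTY descriptor. It compares the current cleanup with the proposed one; streamProto, makeOverriddenStream, restoreByRedefining, and restoreByDeleting are hypothetical names for illustration only.

```typescript
// Minimal shape of a stream-like object for this demo.
interface TtyLike {
  isTTY?: boolean;
}

// Hypothetical prototype that supplies isTTY, so instances have no *own* isTTY.
const streamProto: TtyLike = { isTTY: true };

// Create an instance, snapshot its own descriptor (as the test does), then override.
function makeOverriddenStream(): { stream: TtyLike; original: PropertyDescriptor | undefined } {
  const stream: TtyLike = Object.create(streamProto);
  // Own-property lookup only: undefined here, because isTTY lives on the prototype.
  const original = Object.getOwnPropertyDescriptor(stream, 'isTTY');
  // Test setup: force a value by defining an own property.
  Object.defineProperty(stream, 'isTTY', { value: false, configurable: true });
  return { stream, original };
}

// Current cleanup: the else branch leaves an own property holding undefined.
function restoreByRedefining(stream: TtyLike, original: PropertyDescriptor | undefined): void {
  if (original) {
    Object.defineProperty(stream, 'isTTY', original);
  } else {
    Object.defineProperty(stream, 'isTTY', { value: undefined, configurable: true });
  }
}

// Proposed cleanup: the else branch removes the own property entirely.
function restoreByDeleting(stream: TtyLike, original: PropertyDescriptor | undefined): void {
  if (original) {
    Object.defineProperty(stream, 'isTTY', original);
  } else {
    Reflect.deleteProperty(stream, 'isTTY');
  }
}

const hasOwn = (obj: object, key: string): boolean =>
  Object.prototype.hasOwnProperty.call(obj, key);

const redefined = makeOverriddenStream();
restoreByRedefining(redefined.stream, redefined.original);
console.log(redefined.stream.isTTY, hasOwn(redefined.stream, 'isTTY')); // undefined true

const deleted = makeOverriddenStream();
restoreByDeleting(deleted.stream, deleted.original);
console.log(deleted.stream.isTTY, hasOwn(deleted.stream, 'isTTY')); // true false
```

The redefining variant leaves isTTY reading as undefined even though the prototype still says true, which is the shadowing described above; deleting the own property lets the inherited value show through again.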

yannbf merged commit e94c57c into next on Apr 30, 2026
299 checks passed
yannbf deleted the project/sb-agentic-setup branch on April 30, 2026, 10:22
yannbf added the needs qa label (Indicates that this needs manual QA during the upcoming minor/major release) on May 5, 2026

Labels

ci:daily (Run the CI jobs that normally run in the daily job.), cli, core, feature request, needs qa (Indicates that this needs manual QA during the upcoming minor/major release), ui


Development

Successfully merging this pull request may close these issues.

[Tracking]: SB Agentic Setup

6 participants