CLI: Introduce Agentic Setup workflow#34297
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds a new AI setup/skill CLI and extensive evaluation tooling: CLI ai/skill entries, AI prompt registry and generators, Vitest agent telemetry reporter, create-storybook AI onboarding plumbing, and a full eval harness (agents, grading, story-render, trial runner, publishing, batch runner, and utilities), plus tests, docs, and minor config/editor changes. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant CLI as "eval CLI / runTrial"
participant Workspace as "Trial Workspace (fs)"
participant Agent as "Agent Driver (Claude/Codex)"
participant Grader as "Grader (grade/story-render)"
participant Publisher as "publishTrialBranch / GitHub"
rect rgba(200,220,255,0.5)
CLI->>Workspace: prepareTrialWorkspace(config)
CLI->>Workspace: captureEnvironment()
CLI->>Workspace: write prompt.md & setup-prompt.md
end
rect rgba(200,255,200,0.5)
CLI->>Agent: execute(prompt, workspace, variant)
Agent-->>CLI: streaming transcript & execution metrics
Agent->>Workspace: write provisional artifacts (optional)
end
rect rgba(255,220,200,0.5)
CLI->>Grader: grade(baseline, trial, ghost-stories)
Grader-->>CLI: computed Grade & QualityScore
CLI->>Workspace: update data.json with final results
end
rect rgba(220,255,220,0.5)
CLI->>Publisher: publishTrialBranch(results, workspace)
Publisher->>GitHub: create branch, push, open PR, add labels
Publisher-->>CLI: {branch, labels, url}
end
Estimated code review effort🎯 5 (Critical) | ⏱️ ~120 minutes Possibly related PRs
Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
code/lib/cli-storybook/src/bin/run.ts (1)
314-321: Extract and export the action handler for direct testing.The inline anonymous handler makes targeted unit tests harder; moving it to a named exported function improves coverage and maintainability.
As per coding guidelines, "Export functions that need direct tests to enable proper unit testing coverage".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@code/lib/cli-storybook/src/bin/run.ts` around lines 314 - 321, Extract the inline anonymous .action handler into a named, exported async function (e.g., export async function runStorybookSkills(options)) that contains the same steps (logger.intro('Checking Storybook skills'), await skill(options) if applicable, logger.outro('Done')) and preserve any telemetry/error handling semantics; then replace the anonymous handler in .action(...) with a reference to this new function so it can be directly imported and unit-tested. Ensure the new function signature matches how options are passed by the CLI and re-export it from the module so tests can import it.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@code/lib/cli-storybook/src/bin/run.ts`:
- Around line 306-307: The CLI registers the command as command('skill') which
mismatches the tracked contract expecting the plural form; update the command
registration in run.ts to use command('skills') (and adjust any associated
description or help text if needed) so the CLI name matches the
documented/expected "storybook skills" contract and avoid command/docs mismatch.
- Around line 314-321: The current action handler is a no-op that only logs
progress; replace it with the telemetry + failure flow and either invoke the
real skill or fail-fast until the implementation exists: wrap the body with
withTelemetry('skill', { cliOptions: options }, async () => {
logger.intro('Checking Storybook skills'); await skill(options);
logger.outro('Done'); }) and append
.catch(handleCommandFailure(options.logfile)); if the exported skill function is
not yet available, call withTelemetry and throw a clear Error (e.g. "storybook
skill not implemented") inside the async callback so telemetry is preserved and
the failure handler runs; reference the action callback, withTelemetry, skill,
handleCommandFailure, and logger.intro/logger.outro when making the change.
---
Nitpick comments:
In `@code/lib/cli-storybook/src/bin/run.ts`:
- Around line 314-321: Extract the inline anonymous .action handler into a
named, exported async function (e.g., export async function
runStorybookSkills(options)) that contains the same steps
(logger.intro('Checking Storybook skills'), await skill(options) if applicable,
logger.outro('Done')) and preserve any telemetry/error handling semantics; then
replace the anonymous handler in .action(...) with a reference to this new
function so it can be directly imported and unit-tested. Ensure the new function
signature matches how options are passed by the CLI and re-export it from the
module so tests can import it.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 5205f63d-da05-4d2c-9898-ddccd28f26ef
📒 Files selected for processing (1)
code/lib/cli-storybook/src/bin/run.ts
|
View your CI Pipeline Execution ↗ for commit 8ed4332
☁️ Nx Cloud last updated this comment at |
…lify to once-ever gate
3b94c97 to
cc92a8c
Compare
Package BenchmarksCommit: The following packages have significant changes to their size or dependencies:
|
| Before | After | Difference | |
|---|---|---|---|
| Dependency count | 50 | 50 | 0 |
| Self size | 20.55 MB | 20.58 MB | 🚨 +38 KB 🚨 |
| Dependency size | 16.56 MB | 16.56 MB | 0 B |
| Bundle Size Analyzer | Link | Link |
@storybook/cli
| Before | After | Difference | |
|---|---|---|---|
| Dependency count | 184 | 184 | 0 |
| Self size | 782 KB | 836 KB | 🚨 +54 KB 🚨 |
| Dependency size | 68.22 MB | 68.26 MB | 🚨 +46 KB 🚨 |
| Bundle Size Analyzer | Link | Link |
@storybook/codemod
| Before | After | Difference | |
|---|---|---|---|
| Dependency count | 177 | 177 | 0 |
| Self size | 32 KB | 32 KB | 🎉 -36 B 🎉 |
| Dependency size | 66.74 MB | 66.78 MB | 🚨 +38 KB 🚨 |
| Bundle Size Analyzer | Link | Link |
create-storybook
| Before | After | Difference | |
|---|---|---|---|
| Dependency count | 51 | 51 | 0 |
| Self size | 1.04 MB | 1.05 MB | 🚨 +7 KB 🚨 |
| Dependency size | 37.10 MB | 37.14 MB | 🚨 +38 KB 🚨 |
| Bundle Size Analyzer | node | node |
cc92a8c to
11bc8f0
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
code/addons/vitest/src/vitest-plugin/index.ts (1)
462-491:⚠️ Potential issue | 🔴 CriticalThe
configureVitesthook must not be async in Vitest 4.x.Vitest 4.x defines
configureVitestas synchronous:(context: VitestPluginContext) => void. The hook does not support async functions or return Promises—this is not documented or intended by the plugin API. The current code usesasync configureVitest(context)withawait isWithinInitialSession('ai-setup'), which violates this contract.Move the async logic outside the hook, or handle the session check synchronously within
configureVitestand defer telemetry operations if necessary (e.g., usingsetImmediateor a separate initialization step).🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@code/addons/vitest/src/vitest-plugin/index.ts` around lines 462 - 491, The configureVitest hook is declared async but Vitest 4.x requires it be synchronous; change configureVitest to a plain synchronous function (remove async/await) and move any async work (the await isWithinInitialSession('ai-setup') and any async telemetry/reporters setup) into a separate async initializer that runs after configuration (for example call an async init function via setImmediate/Promise.resolve().then(...) or a top-level startTelemetry() invoked without awaiting), and inside that async initializer use detectAgent(), await isWithinInitialSession('ai-setup'), isTelemetryModuleEnabled(), and push the new AgentTelemetryReporter({ configDir: finalOptions.configDir, agent }) into context.vitest.config.reporters or otherwise mutate the config synchronously-safe; keep synchronous bits like context.vitest.config.coverage.exclude.push('storybook-static') inside configureVitest and reference the existing symbols configureVitest, isWithinInitialSession, detectAgent, AgentTelemetryReporter, telemetry, isTelemetryModuleEnabled, withinAgenticSetupSession, finalOptions.configDir, and context.vitest.config.reporters when moving logic.
🧹 Nitpick comments (3)
code/addons/vitest/src/vitest-plugin/agent-telemetry-reporter.test.ts (2)
5-13: Mock pattern violates coding guidelines — preferspy: true.The mock for
storybook/internal/telemetrydoesn't use thespy: trueoption and defines implementations inline. Per coding guidelines, mocks should usespy: trueand behaviors should be configured inbeforeEach.For
isExampleStoryId, you're re-implementing the actual function behavior. Usingspy: truewould preserve the original implementation automatically.♻️ Suggested refactor
-vi.mock('storybook/internal/telemetry', () => ({ - telemetry: vi.fn(), - isExampleStoryId: vi.fn( - (id: string) => - id.startsWith('example-button--') || - id.startsWith('example-header--') || - id.startsWith('example-page--') - ), -})); +vi.mock('storybook/internal/telemetry', { spy: true });Then in
beforeEach:beforeEach(() => { vi.clearAllMocks(); vi.mocked(telemetry).mockResolvedValue(undefined); // isExampleStoryId retains its original implementation with spy: true reporter = new AgentTelemetryReporter({ configDir: '.storybook', agent: { name: 'claude' }, }); });As per coding guidelines: "Use
vi.mock()with thespy: trueoption for all package and file mocks" and "Implement mock behaviors inbeforeEachblocks".🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@code/addons/vitest/src/vitest-plugin/agent-telemetry-reporter.test.ts` around lines 5 - 13, The test currently provides a full inline mock for 'storybook/internal/telemetry' which reimplements isExampleStoryId and omits spy: true; change vi.mock(...) to use the spy: true option so original implementations (including isExampleStoryId) are preserved, remove the inline implementation for isExampleStoryId, and move specific mock behaviors into the test beforeEach: call vi.clearAllMocks(), mock telemetry (vi.mocked(telemetry)) to resolve/return the desired value, and then instantiate reporter = new AgentTelemetryReporter(...) as before; ensure references in the file target the telemetry export and isExampleStoryId symbol names so behavior is configured in beforeEach rather than inside vi.mock.
72-96: Tests foronTestCaseResultlack assertions.The tests in lines 73-79, 81-87, and 89-95 call
onTestCaseResultbut don't assert any expected behavior. Consider adding assertions to verify the internal state or expose a way to check collected results.♻️ Suggested improvement
it('should collect story test results', () => { const testCase = createMockTestCase({ storyId: 'my-story--primary', status: 'passed', }); reporter.onTestCaseResult(testCase as any); + // Verify by checking telemetry is called with this result in onTestRunEnd + // Or expose a getter for testResults length if needed for unit testing });Alternatively, consider testing these behaviors through integration with
onTestRunEndassertions, which already verify the accumulated results.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@code/addons/vitest/src/vitest-plugin/agent-telemetry-reporter.test.ts` around lines 72 - 96, The tests call reporter.onTestCaseResult but have no assertions; update each test to assert expected behavior by either (A) inspecting reporter's collected results (e.g., add or use a method/property like reporter.getCollectedResults() or reporter.collectedResults) after calling onTestCaseResult for the story and example cases, or (B) invoke reporter.onTestRunEnd() and assert the telemetry sender was called with expected payloads (spy/mock the sendTelemetry function) to verify accumulation; use the existing createMockTestCase inputs (storyId/status) to assert the specific inclusion, exclusion, or skipping of results for onTestCaseResult.code/addons/vitest/src/vitest-plugin/index.ts (1)
251-252: Closure captureswithinAgenticSetupSessionby reference — verify timing is correct.The variable is initialized to
false, captured in thegetInitialGlobalsclosure, then updated inconfigureVitest. This works because JS closures capture by reference, and Vitest's lifecycle guaranteesconfigureVitestcompletes before tests invoke browser commands.Consider adding a brief comment explaining this dependency for future maintainers:
+ // Set in configureVitest, read by getInitialGlobals browser command let withinAgenticSetupSession = false;Also applies to: 397-406
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@code/addons/vitest/src/vitest-plugin/index.ts` around lines 251 - 252, The closure captures the module-scoped boolean withinAgenticSetupSession which is set to false then later toggled in configureVitest and read inside getInitialGlobals; add a short explanatory comment next to the withinAgenticSetupSession declaration (and mirror at the other capture site around the code at the second occurrence) stating that closures capture by reference and that Vitest's lifecycle guarantees configureVitest runs before any test/browser commands so the timing is intentional and must be preserved.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.vscode/settings.json:
- Around line 26-29: The settings file contains two separate "[typescript]"
blocks causing a duplicate key; merge them by consolidating the properties so a
single "[typescript]" object contains both "editor.defaultFormatter" and
"editor.formatOnSave" (preserve the existing values) and remove the redundant
block; locate the duplicate "[typescript]" entries and update the one named
"[typescript]" to include both editor.defaultFormatter and editor.formatOnSave
while deleting the other.
---
Outside diff comments:
In `@code/addons/vitest/src/vitest-plugin/index.ts`:
- Around line 462-491: The configureVitest hook is declared async but Vitest 4.x
requires it be synchronous; change configureVitest to a plain synchronous
function (remove async/await) and move any async work (the await
isWithinInitialSession('ai-setup') and any async telemetry/reporters setup) into
a separate async initializer that runs after configuration (for example call an
async init function via setImmediate/Promise.resolve().then(...) or a top-level
startTelemetry() invoked without awaiting), and inside that async initializer
use detectAgent(), await isWithinInitialSession('ai-setup'),
isTelemetryModuleEnabled(), and push the new AgentTelemetryReporter({ configDir:
finalOptions.configDir, agent }) into context.vitest.config.reporters or
otherwise mutate the config synchronously-safe; keep synchronous bits like
context.vitest.config.coverage.exclude.push('storybook-static') inside
configureVitest and reference the existing symbols configureVitest,
isWithinInitialSession, detectAgent, AgentTelemetryReporter, telemetry,
isTelemetryModuleEnabled, withinAgenticSetupSession, finalOptions.configDir, and
context.vitest.config.reporters when moving logic.
---
Nitpick comments:
In `@code/addons/vitest/src/vitest-plugin/agent-telemetry-reporter.test.ts`:
- Around line 5-13: The test currently provides a full inline mock for
'storybook/internal/telemetry' which reimplements isExampleStoryId and omits
spy: true; change vi.mock(...) to use the spy: true option so original
implementations (including isExampleStoryId) are preserved, remove the inline
implementation for isExampleStoryId, and move specific mock behaviors into the
test beforeEach: call vi.clearAllMocks(), mock telemetry (vi.mocked(telemetry))
to resolve/return the desired value, and then instantiate reporter = new
AgentTelemetryReporter(...) as before; ensure references in the file target the
telemetry export and isExampleStoryId symbol names so behavior is configured in
beforeEach rather than inside vi.mock.
- Around line 72-96: The tests call reporter.onTestCaseResult but have no
assertions; update each test to assert expected behavior by either (A)
inspecting reporter's collected results (e.g., add or use a method/property like
reporter.getCollectedResults() or reporter.collectedResults) after calling
onTestCaseResult for the story and example cases, or (B) invoke
reporter.onTestRunEnd() and assert the telemetry sender was called with expected
payloads (spy/mock the sendTelemetry function) to verify accumulation; use the
existing createMockTestCase inputs (storyId/status) to assert the specific
inclusion, exclusion, or skipping of results for onTestCaseResult.
In `@code/addons/vitest/src/vitest-plugin/index.ts`:
- Around line 251-252: The closure captures the module-scoped boolean
withinAgenticSetupSession which is set to false then later toggled in
configureVitest and read inside getInitialGlobals; add a short explanatory
comment next to the withinAgenticSetupSession declaration (and mirror at the
other capture site around the code at the second occurrence) stating that
closures capture by reference and that Vitest's lifecycle guarantees
configureVitest runs before any test/browser commands so the timing is
intentional and must be preserved.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 279f1224-7ef4-4b74-aaa8-f6edeb102003
📒 Files selected for processing (9)
.circleci/config.yml.gitignore.vscode/settings.jsonAGENTS.mdcode/addons/a11y/src/preview.tsxcode/addons/vitest/src/vitest-plugin/agent-telemetry-reporter.test.tscode/addons/vitest/src/vitest-plugin/agent-telemetry-reporter.tscode/addons/vitest/src/vitest-plugin/index.tscode/core/src/cli/globalSettings.ts
✅ Files skipped from review due to trivial changes (4)
- .circleci/config.yml
- code/addons/a11y/src/preview.tsx
- code/core/src/cli/globalSettings.ts
- .gitignore
…mpt-check Eval: Record hasCssCheckStory when the diff adds CssCheck
Co-authored-by: Steve Dodier-Lazaro <Sidnioulz@users.noreply.github.com>
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
code/lib/create-storybook/src/commands/UserPreferencesCommand.test.ts (1)
457-472: Assert forwarded behavior here, not just shape.This test stays green even if
executeUserPreferences()stops forwardingisAiSetupAvailableorisTestFeatureAvailable, because any object withselectedFeaturesandnewUsersatisfies it. Please assert one observable branch through the wrapper instead, e.g. thatFeature.TESTis omitted whenisTestFeatureAvailable: falseor thatFeature.AIis added when AI is available.Based on learnings: Export functions that need direct tests and test real behavior, not just syntax patterns.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@code/lib/create-storybook/src/commands/UserPreferencesCommand.test.ts` around lines 457 - 472, The test for executeUserPreferences only asserts shape; instead update the test to assert forwarded flags affect behavior by calling executeUserPreferences with controlled inputs for isTestFeatureAvailable and isAiSetupAvailable (via defaultExecuteOptions or explicit options) and then asserting that Feature.TEST is absent when isTestFeatureAvailable:false and that Feature.AI is present when isAiSetupAvailable:true (use the exported executeUserPreferences helper and the Feature enum to check presence/absence in result.selectedFeatures); if helper functions being tested are not exported, export them so tests can call them directly and verify real branching rather than only the return shape.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@code/lib/create-storybook/src/initiate.ts`:
- Around line 116-118: The current gate disables AI in empty directories by
setting isAiSetupAvailable: isAiSetupAvailable && !isEmptyProject, which
prevents Feature.AI from being selectable for explicit agent runs; change the
condition to keep AI available when the user explicitly requested an agent run
(e.g., add an explicitAgentRun/agentFlag check) so it reads something like
isAiSetupAvailable && (!isEmptyProject || explicitAgentRun); ensure the
Feature.AI selection can be made and that downstream flags showAgentFollowUp and
showAiInstructions are set when explicitAgentRun is true.
---
Nitpick comments:
In `@code/lib/create-storybook/src/commands/UserPreferencesCommand.test.ts`:
- Around line 457-472: The test for executeUserPreferences only asserts shape;
instead update the test to assert forwarded flags affect behavior by calling
executeUserPreferences with controlled inputs for isTestFeatureAvailable and
isAiSetupAvailable (via defaultExecuteOptions or explicit options) and then
asserting that Feature.TEST is absent when isTestFeatureAvailable:false and that
Feature.AI is present when isAiSetupAvailable:true (use the exported
executeUserPreferences helper and the Feature enum to check presence/absence in
result.selectedFeatures); if helper functions being tested are not exported,
export them so tests can call them directly and verify real branching rather
than only the return shape.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 76a67a8d-8931-48d3-8319-942b0534f12d
📒 Files selected for processing (3)
code/lib/create-storybook/src/commands/UserPreferencesCommand.test.tscode/lib/create-storybook/src/commands/UserPreferencesCommand.tscode/lib/create-storybook/src/initiate.ts
CLI: Source eval prompts from ai setup via EVAL_SETUP_PROMPT
There was a problem hiding this comment.
Actionable comments posted: 15
🧹 Nitpick comments (3)
scripts/eval/lib/publish-trial.test.ts (1)
102-123: Refactortinyexecmocking to align with repo Vitest spy-mocking patternsThe test uses
vi.doMock()without thespy: trueoption and places inline mock implementations inside test cases, which violates the repo's standardized mocking guidelines. Move mocks to the top of the file withspy: trueand implement behaviors inbeforeEachblocks.Current pattern (lines 102-123, 289-297, 393-423)
vi.doMock('tinyexec', () => ({ x: vi.fn( async (cmd: string, args: string[], options?: { nodeOptions?: { cwd?: string } }) => { // inline mock implementation } ), }));Per repo guidelines: Use
vi.mock()with thespy: trueoption, place all mocks at the top before test cases, and avoid inline mock implementations within test cases.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/eval/lib/publish-trial.test.ts` around lines 102 - 123, The test is creating an inline mock for the tinyexec module using vi.doMock with an inline vi.fn implementation inside test bodies; refactor by replacing vi.doMock(...) with vi.mock('tinyexec', { spy: true }) at the top of the file and move the inline behavior into a beforeEach that sets the spy implementation for the exported x function (e.g., use (tinyexec.x as vi.SpyInstance).mockImplementation(...) or vi.spyOn(...) in beforeEach) so calls array setup and the conditional return cases (gh label list, git config, gh pr create) are defined outside individual tests and the mock is a spy per repo guidelines.scripts/eval/lib/run-trial.test.ts (1)
10-55: ⚡ Quick winBring these Vitest mocks back in line with the repo rules.
Lines 10-55 bake behavior into the
vi.mock(...)factories, and Lines 320-369 reconfigure mocks inside a test case. That makes the suite harder to reset cleanly and diverges from the repo’sspy: true/beforeEachpattern. It’d also be safer to make the mock specifiers match the imported.tspaths so the intercepted module is unambiguous.As per coding guidelines, "Use
vi.mock()with thespy: trueoption for all package and file mocks in Vitest tests" and "Implement mock behaviors inbeforeEachblocks in Vitest tests".Also applies to: 320-369
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/eval/lib/run-trial.test.ts` around lines 10 - 55, Mocks in this test file are baking behavior inside vi.mock factories instead of using spy: true and configuring behaviors in beforeEach; update each vi.mock call (for './prepare-trial', './grade', './publish-trial', './utils', './agents/claude-code', './agents/codex', and 'tinyexec') to include the option { spy: true } and stop returning concrete mockResolvedValue data from the factory; instead, move all mock behavior (e.g., publishTrialBranch mockResolvedValue, captureEnvironment mockResolvedValue, claudeAgent.execute/codexAgent.execute implementations, tinyexec.x resolved value, and any prepareTrial/grade spies) into a beforeEach block where you call vi.mocked(...).mockResolvedValue or .mockImplementation as needed; also ensure the module specifiers exactly match the imported .ts paths used by the code under test (e.g., './utils.ts', './agents/claude-code.ts') so the mocks intercept the correct modules.scripts/eval/lib/utils.ts (1)
33-40: ⚡ Quick winUse the Storybook node logger here instead of raw
console.log.This helper is shared across the eval library now, so it is no longer isolated enough to justify raw console logging.
As per coding guidelines, use Storybook loggers instead of raw
console.*in normal code paths.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/eval/lib/utils.ts` around lines 33 - 40, The helper createLogger currently uses console.log in its methods (log, logStep, logSuccess, logError); replace those calls with the Storybook node logger by importing the logger from '@storybook/node-logger' and routing messages to the appropriate logger methods (e.g., logger.info for regular/log and step messages, logger.success or logger.info with a success prefix for logSuccess, and logger.error or logger.warn for logError) while preserving the existing prefix/formatting logic so createLogger returns the same API but using Storybook's logger instead of console.log.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@code/lib/cli-storybook/src/ai/index.ts`:
- Around line 70-73: The support gate currently uses a conjunction so it only
rejects when both rendererPackage !== '@storybook/react' AND builderPackage !==
'@storybook/builder-vite'; change the condition in the if that checks
projectInfo.rendererPackage and projectInfo.builderPackage (in ai/index.ts) to
use OR (||) so the command exits whenever either requirement is not met (i.e.,
renderer is not '@storybook/react' OR builder is not '@storybook/builder-vite'),
keeping the existing exit/log behavior inside that if block.
In `@code/lib/cli-storybook/src/ai/setup-prompts/pattern-copy-play.ts`:
- Around line 62-119: Update getPreviewConfigExample to use ProjectInfo.language
to emit JS or TS variants: when ProjectInfo.language indicates JavaScript,
produce preview.js (not preview.tsx), remove TypeScript-only constructs (no
"import type", no ": Preview" annotations, no "satisfies" or typed Story
aliases) and use plain default export (module.exports or export default with an
untyped object) and when TypeScript emit the existing TS example; apply the same
language-aware branching and removals to the other prompt generators referenced
(the blocks around lines 140-154, 184-205, 234-255, 319-354, 388-415, 445-468,
488-489, 671-679) so all shipped prompts use .js filenames and untyped syntax
for JS projects and keep typed examples for TS projects.
In `@code/lib/cli-storybook/src/ai/setup-prompts/setup.ts`:
- Around line 181-216: The steps hardcode TypeScript filenames and TS-only
syntax; make the generated filenames and example snippets language-aware using
projectInfo.language: compute an extension variable (e.g., ext =
projectInfo.language startsWith 'ts' ? 'tsx' : 'jsx') and use
`${configDir}/preview.${ext}` and `<ComponentName>.stories.${ext}` instead of
hardcoded `.tsx`/`.stories.tsx`; update getPreviewDecoratorExample(projectInfo)
(or its caller) to accept/inspect projectInfo.language and emit JS-safe examples
when language is JS (no `import type`, no `satisfies`, use plain imports/exports
and JS-compatible syntax) so the instructions and fallback examples are valid
for both JS and TS projects.
In `@scripts/eval/eval.ts`:
- Around line 134-149: The code calls inferAgent(values.model) before
argsSchema.safeParse, which lets inferAgent throw on invalid --model and bypass
CLI validation; change the flow so parsing/validation runs first without calling
inferAgent (e.g., pass through the raw values and an undefined agent into
argsSchema.safeParse), then after parsed.success compute the final agent from
parsed.data (use parsed.data.agent ?? (parsed.data.model ?
inferAgent(parsed.data.model) : 'claude')), and apply the same fix to the other
occurrence around the block at lines 246-250; refer to inferAgent,
argsSchema.safeParse, parsed, agent, and values.model to locate and update the
logic.
- Around line 253-262: In buildManualCommand, make the manual-shell command
robust by quoting and escaping promptPath inside the subshell so paths with
spaces or metacharacters don't break the command; change how promptArg is
constructed (the `"$(cat ${promptPath})"` piece) to wrap promptPath in quotes
and escape any embedded quotes (so the subshell becomes something like "$(cat
'...')" with proper escaping of single quotes) and use that updated promptArg in
the returned strings for both the claude and codex branches.
In `@scripts/eval/lib/grade.ts`:
- Around line 183-190: The current branch treats any non-'pass' cssCheck as an
error; change the logic in the block that reads cssCheck (variable cssCheck from
storyRenderRun.summary?.cssCheck) so that only 'fail' triggers logger.logError
and 'not-run' (or when storyRenderRun.attempted === false from
runStoryRenderPass()) is logged as a non-error (e.g., logger.logInfo or
logger.log with a neutral message), while 'pass' still calls logger.logSuccess;
reference storyRenderRun, cssCheck, runStoryRenderPass(), and
logger.logError/logSuccess to locate and update the condition handling.
- Around line 132-149: The hardcoded 'npx' invocations for Storybook and tsc
should use the project's package manager the same way ghost stories do: call
detectPackageManager(resolveInstallRoot(projectPath)) to get the package manager
(e.g., pm) and then use getScriptRunCommand(pm) as the executable passed into
x(...) instead of the literal 'npx'; keep the same argument arrays
(['storybook','build','--quiet'] and ['tsc','--noEmit']) and preserve the
existing options (throwOnError, timeout, nodeOptions/env including NODE_OPTIONS
and STORYBOOK_DISABLE_TELEMETRY). Target the x(...) calls that currently use
'npx' for Storybook build and tsc and replace the first argument with
getScriptRunCommand(pm) obtained from detectPackageManager(...).
In `@scripts/eval/lib/publish-trial.ts`:
- Around line 279-287: The prompt body written by opts.data.prompt.content can
contain triple-backtick code fences which will prematurely close the surrounding
fence; in the lines.push block that builds the details block in publish-trial.ts
(the section adding '<details>', '<summary>Full prompt</summary>' and the fenced
block), change the opening fence string from '```md' to '````md' and the closing
fence from '```' to '````' so the outer fence can contain embedded
triple-backtick examples without being closed.
In `@scripts/eval/lib/story-render.ts`:
- Around line 202-216: readStoryRenderSummary currently calls JSON.parse(await
readFile(...)) which will throw and abort the trial if the Vitest report is
unreadable; update readStoryRenderSummary to wrap the file read + JSON.parse
(and subsequent parseVitestResults usage) in a try/catch and return undefined on
any error so the caller treats it as a missing summary rather than a fatal
error, referencing the readStoryRenderSummary function and the
parseVitestResults/parsed.summary path when implementing the catch-and-return
behavior.
In `@scripts/eval/lib/utils.ts`:
- Around line 43-44: The formatDuration function can produce "1m60s" because it
rounds the seconds remainder independently; change it to round the total seconds
first and then compute minutes and seconds from that integer total: inside
formatDuration, compute const total = Math.round(s), then const minutes =
Math.floor(total / 60) and const seconds = total % 60, and return
`${minutes}m${seconds}s` when minutes > 0 (or `${seconds}s` when minutes === 0)
to avoid an overflow to 60 seconds; update references to formatDuration
accordingly.
- Around line 177-186: The git probe in captureEnvironment currently runs in the
current process CWD and returns "unknown" if not executed from the repo root;
update captureEnvironment to run the x(...) git commands from the known
repository root by invoking x with a working-directory option (or temporarily
chdir) using the harness's REPO_ROOT constant so the calls to x('git',
['rev-parse', ...]) execute with cwd: REPO_ROOT (reference function
captureEnvironment and the x helper and REPO_ROOT symbol).
In `@scripts/eval/README.md`:
- Around line 63-74: The fenced code blocks in the README (for example the block
beginning with "Result" showing build/stories/ghost/score output and other
similar CLI/output/code examples) are missing language identifiers which
triggers markdownlint MD040; update each fenced block to include an appropriate
language tag (e.g., "text" or "console" for command/output snippets, "bash" for
shell commands, or the actual language for code samples) so every
triple-backtick fence in the README has a language identifier.
- Around line 282-304: The README points to the wrong directory for prompt
variants; update the documentation to reflect the actual code location used in
this PR by replacing references to code/lib/cli-storybook/src/ai/prompts with
code/lib/cli-storybook/src/ai/setup-prompts, and adjust the steps to mention
creating files like setup-prompts/<name>.ts that export an
instructions(projectInfo: ProjectInfo): string and registering them in
PROMPT_BUILDERS (in prompts/index.ts or the corresponding setup-prompts index),
so contributors modify the live variant location used by the eval harness.
In `@scripts/eval/run-batch.ts`:
- Around line 477-481: Update the help text for the prompt option in the prompt
config (the object keyed by prompt in run-batch.ts) to reference the new
registry location src/ai/setup-prompts/ instead of the old
code/lib/cli-storybook/src/ai/prompts/ path; edit the description string on the
prompt config (type: 'string' as const, description: ...) to point contributors
to src/ai/setup-prompts/ as the place where prompt variants are registered.
- Around line 340-354: The function requireBatchPrompt currently validates the
prompt case-insensitively but returns the caller's casing, which later breaks
loadPrompt's exact includes() check; update requireBatchPrompt to find the
matching canonical name from listPrompts() (e.g., use available.find(name =>
name.toLowerCase() === prompt.toLowerCase())) and return that canonical
available name instead of the original trimmed input, keeping the same error
behavior when no match is found; reference function requireBatchPrompt and
listPrompts to locate the change.
---
Nitpick comments:
In `@scripts/eval/lib/publish-trial.test.ts`:
- Around line 102-123: The test is creating an inline mock for the tinyexec
module using vi.doMock with an inline vi.fn implementation inside test bodies;
refactor by replacing vi.doMock(...) with vi.mock('tinyexec', { spy: true }) at
the top of the file and move the inline behavior into a beforeEach that sets the
spy implementation for the exported x function (e.g., use (tinyexec.x as
vi.SpyInstance).mockImplementation(...) or vi.spyOn(...) in beforeEach) so calls
array setup and the conditional return cases (gh label list, git config, gh pr
create) are defined outside individual tests and the mock is a spy per repo
guidelines.
In `@scripts/eval/lib/run-trial.test.ts`:
- Around line 10-55: Mocks in this test file are baking behavior inside vi.mock
factories instead of using spy: true and configuring behaviors in beforeEach;
update each vi.mock call (for './prepare-trial', './grade', './publish-trial',
'./utils', './agents/claude-code', './agents/codex', and 'tinyexec') to include
the option { spy: true } and stop returning concrete mockResolvedValue data from
the factory; instead, move all mock behavior (e.g., publishTrialBranch
mockResolvedValue, captureEnvironment mockResolvedValue,
claudeAgent.execute/codexAgent.execute implementations, tinyexec.x resolved
value, and any prepareTrial/grade spies) into a beforeEach block where you call
vi.mocked(...).mockResolvedValue or .mockImplementation as needed; also ensure
the module specifiers exactly match the imported .ts paths used by the code
under test (e.g., './utils.ts', './agents/claude-code.ts') so the mocks
intercept the correct modules.
In `@scripts/eval/lib/utils.ts`:
- Around line 33-40: The helper createLogger currently uses console.log in its
methods (log, logStep, logSuccess, logError); replace those calls with the
Storybook node logger by importing the logger from '@storybook/node-logger' and
routing messages to the appropriate logger methods (e.g., logger.info for
regular/log and step messages, logger.success or logger.info with a success
prefix for logSuccess, and logger.error or logger.warn for logError) while
preserving the existing prefix/formatting logic so createLogger returns the same
API but using Storybook's logger instead of console.log.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: f18da5e9-c6fb-4b59-b620-1b73f0b7b262
📒 Files selected for processing (21)
code/lib/cli-storybook/src/ai/index.test.tscode/lib/cli-storybook/src/ai/index.tscode/lib/cli-storybook/src/ai/prompt.tscode/lib/cli-storybook/src/ai/setup-prompts/index.tscode/lib/cli-storybook/src/ai/setup-prompts/pattern-copy-play.tscode/lib/cli-storybook/src/ai/setup-prompts/setup.tscode/lib/cli-storybook/src/ai/types.tsscripts/eval/README.mdscripts/eval/eval.tsscripts/eval/lib/agents/claude-code.tsscripts/eval/lib/agents/codex.tsscripts/eval/lib/agents/config.tsscripts/eval/lib/grade.tsscripts/eval/lib/publish-trial.test.tsscripts/eval/lib/publish-trial.tsscripts/eval/lib/run-trial.test.tsscripts/eval/lib/run-trial.tsscripts/eval/lib/story-render.tsscripts/eval/lib/utils.test.tsscripts/eval/lib/utils.tsscripts/eval/run-batch.ts
✅ Files skipped from review due to trivial changes (1)
- code/lib/cli-storybook/src/ai/types.ts
| // Resolve the discriminator: explicit --agent, inferred from --model, or default to claude. | ||
| const agent = values.agent ?? (values.model ? inferAgent(values.model) : 'claude'); | ||
|
|
||
| const parsed = argsSchema.safeParse({ | ||
| ...values, | ||
| agent, | ||
| listProjects: values['list-projects'], | ||
| listModels: values['list-models'], | ||
| listPrompts: values['list-prompts'], | ||
| }); | ||
|
|
||
| if (!parsed.success) { | ||
| for (const issue of parsed.error.issues) { | ||
| console.error(pc.red(` ${issue.path.join('.')}: ${issue.message}`)); | ||
| } | ||
| process.exit(1); |
There was a problem hiding this comment.
Don’t let inferAgent() bypass CLI validation.
Line 135 calls inferAgent() before safeParse(), so --model foo throws an uncaught exception instead of returning a normal validation error.
Suggested fix
-const agent = values.agent ?? (values.model ? inferAgent(values.model) : 'claude');
+const inferredAgent = values.model ? inferAgent(values.model) : undefined;
+const agent = values.agent ?? inferredAgent ?? 'claude';
...
-function inferAgent(model: string): AgentId {
+function inferAgent(model: string): AgentId | undefined {
for (const id of AGENT_IDS) {
if (AGENTS[id].models.some((candidate) => candidate === model)) return id;
}
- throw new Error(`No agent found for model: ${model}`);
+ return undefined;
}Also applies to: 246-250
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@scripts/eval/eval.ts` around lines 134 - 149, The code calls
inferAgent(values.model) before argsSchema.safeParse, which lets inferAgent
throw on invalid --model and bypass CLI validation; change the flow so
parsing/validation runs first without calling inferAgent (e.g., pass through the
raw values and an undefined agent into argsSchema.safeParse), then after
parsed.success compute the final agent from parsed.data (use parsed.data.agent
?? (parsed.data.model ? inferAgent(parsed.data.model) : 'claude')), and apply
the same fix to the other occurrence around the block at lines 246-250; refer to
inferAgent, argsSchema.safeParse, parsed, agent, and values.model to locate and
update the logic.
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (1)
scripts/eval/eval.ts (1)
134-150:⚠️ Potential issue | 🟠 Major | ⚡ Quick win
inferAgent()still throws before Zod validation can catch invalid models.Line 135 calls
inferAgent(values.model)beforeargsSchema.safeParse()on line 137. If--model foois passed (an invalid model),inferAgentthrows an uncaught exception instead of letting Zod produce a validation error with the friendly message format.Proposed fix
-// Resolve the discriminator: explicit --agent, inferred from --model, or default to claude. -const agent = values.agent ?? (values.model ? inferAgent(values.model) : 'claude'); +// Resolve the discriminator: explicit --agent, inferred from --model, or default to claude. +const inferredAgent = values.model ? inferAgentSafe(values.model) : undefined; +const agent = values.agent ?? inferredAgent ?? 'claude'; ... -function inferAgent(model: string): AgentId { +function inferAgentSafe(model: string): AgentId | undefined { for (const id of AGENT_IDS) { if (AGENTS[id].models.some((candidate) => candidate === model)) return id; } - throw new Error(`No agent found for model: ${model}`); + return undefined; }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/eval/eval.ts` around lines 134 - 150, The code calls inferAgent(values.model) before Zod validation, allowing inferAgent to throw on invalid models; instead, stop calling inferAgent inside the object passed to argsSchema.safeParse and move the inference until after successful parsing: pass agent through unchanged (or leave it undefined) into argsSchema.safeParse, then after parsed.success compute the final agent with something like agent = parsed.data.agent ?? (parsed.data.model ? inferAgent(parsed.data.model) : 'claude'); update downstream uses to use this post-validated agent so invalid models produce Zod validation errors rather than uncaught exceptions.
🧹 Nitpick comments (3)
scripts/eval/lib/story-render.ts (1)
99-99: 💤 Low valueConsider moving constant definition before first use for readability.
STORY_RENDER_TIMEOUT_MSis referenced on line 99 but defined on line 158. While this works due to JavaScript's hoisting, placing the constant beforerunStoryRenderPassimproves code readability.Also applies to: 158-158
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/eval/lib/story-render.ts` at line 99, Move the STORY_RENDER_TIMEOUT_MS constant declaration so it appears before its first use in runStoryRenderPass: locate the current reference to STORY_RENDER_TIMEOUT_MS in the runStoryRenderPass scope (where timeoutMs is assigned) and cut/paste the constant definition (the STORY_RENDER_TIMEOUT_MS declaration) to a position above runStoryRenderPass to improve readability and flow.code/lib/create-storybook/src/commands/UserPreferencesCommand.test.ts (2)
75-80: ⚡ Quick winMake telemetry mocks async and complete to match runtime behavior.
trackAiSetupNudgeis awaited in production code, but mocks are plainvi.fn()in several places. UsemockResolvedValue(undefined)and keep the same method surface everywhere to avoid false positives in async flows.Proposed fix
vi.mocked(TelemetryService).mockImplementation(function () { return { - trackNewUserCheck: vi.fn(), - trackInstallType: vi.fn(), + trackNewUserCheck: vi.fn().mockResolvedValue(undefined), + trackInstallType: vi.fn().mockResolvedValue(undefined), + trackAiSetupNudge: vi.fn().mockResolvedValue(undefined), }; }); @@ const mockTelemetryService = { - trackNewUserCheck: vi.fn(), - trackInstallType: vi.fn(), - trackAiSetupNudge: vi.fn(), + trackNewUserCheck: vi.fn().mockResolvedValue(undefined), + trackInstallType: vi.fn().mockResolvedValue(undefined), + trackAiSetupNudge: vi.fn().mockResolvedValue(undefined), }; @@ (yesCommand as unknown as CommandWithPrivates).telemetryService = { - trackNewUserCheck: vi.fn(), - trackInstallType: vi.fn(), - trackAiSetupNudge: vi.fn(), + trackNewUserCheck: vi.fn().mockResolvedValue(undefined), + trackInstallType: vi.fn().mockResolvedValue(undefined), + trackAiSetupNudge: vi.fn().mockResolvedValue(undefined), };As per coding guidelines, “Each mock implementation should return a Promise for async functions in Vitest” and “Mock all required properties and methods that the test subject uses in Vitest tests.”
Also applies to: 92-97, 336-340
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@code/lib/create-storybook/src/commands/UserPreferencesCommand.test.ts` around lines 75 - 80, The TelemetryService test mock currently returns synchronous vi.fn() methods but production awaits async methods (e.g., trackAiSetupNudge); update the mock implementations used in UserPreferencesCommand.test (the vi.mocked(TelemetryService).mockImplementation blocks) to provide the same method surface but with async-resolving functions (use vi.fn().mockResolvedValue(undefined) for trackAiSetupNudge and any other awaited methods like trackNewUserCheck and trackInstallType) so tests mirror runtime behavior and avoid false positives; apply the same change to the other mock blocks referenced (around the other mocked implementations at the noted locations).
126-458: 🏗️ Heavy liftMove prompt mock behaviors from inline test bodies into
beforeEach-based scenario setup.There are many inline
mockResolvedValueOnce(...)calls inside test cases. Please shift these into scopedbeforeEachblocks per scenario (interactive new user,light install,AI accepted, etc.) to match repo spy-mocking rules and reduce test coupling.As per coding guidelines, “Implement mock behaviors in
beforeEachblocks in Vitest tests” and “Avoid inline mock implementations within test cases in Vitest tests.”🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@code/lib/create-storybook/src/commands/UserPreferencesCommand.test.ts` around lines 126 - 458, Several tests use inline vi.mocked(prompt.select).mockResolvedValueOnce(...) and vi.mocked(prompt.confirm).mockResolvedValueOnce(...) inside test bodies; move those mockResolvedValueOnce calls into scoped beforeEach blocks for each scenario (e.g., the "interactive new user" scenario, the "light install" scenario, and the "AI accepted/declined" scenario) so mocks are declared before calling UserPreferencesCommand.execute; specifically, create nested describe/beforeEach blocks under the existing "AI setup prompt" and "execute" suites that set prompt.select and prompt.confirm sequences used by command.execute (referencing prompt.select, prompt.confirm, command.execute, defaultExecuteOptions, and Feature enums), and ensure you reset/restore mocks between scenarios (using vi.resetAllMocks or similar) to avoid cross-test coupling.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@code/lib/create-storybook/src/commands/UserPreferencesCommand.test.ts`:
- Around line 41-49: The afterAll cleanup currently recreates an own isTTY
property when originalIsTTYDescriptor is falsy; instead, in the afterAll handler
for UserPreferencesCommand.test.ts detect originalIsTTYDescriptor and if falsy
remove the override by deleting process.stdout.isTTY (rather than defining it as
undefined) so inherited behavior is restored; update the afterAll block that
references originalIsTTYDescriptor and process.stdout.isTTY to call delete
process.stdout.isTTY in the else branch.
---
Duplicate comments:
In `@scripts/eval/eval.ts`:
- Around line 134-150: The code calls inferAgent(values.model) before Zod
validation, allowing inferAgent to throw on invalid models; instead, stop
calling inferAgent inside the object passed to argsSchema.safeParse and move the
inference until after successful parsing: pass agent through unchanged (or leave
it undefined) into argsSchema.safeParse, then after parsed.success compute the
final agent with something like agent = parsed.data.agent ?? (parsed.data.model
? inferAgent(parsed.data.model) : 'claude'); update downstream uses to use this
post-validated agent so invalid models produce Zod validation errors rather than
uncaught exceptions.
---
Nitpick comments:
In `@code/lib/create-storybook/src/commands/UserPreferencesCommand.test.ts`:
- Around line 75-80: The TelemetryService test mock currently returns
synchronous vi.fn() methods but production awaits async methods (e.g.,
trackAiSetupNudge); update the mock implementations used in
UserPreferencesCommand.test (the vi.mocked(TelemetryService).mockImplementation
blocks) to provide the same method surface but with async-resolving functions
(use vi.fn().mockResolvedValue(undefined) for trackAiSetupNudge and any other
awaited methods like trackNewUserCheck and trackInstallType) so tests mirror
runtime behavior and avoid false positives; apply the same change to the other
mock blocks referenced (around the other mocked implementations at the noted
locations).
- Around line 126-458: Several tests use inline
vi.mocked(prompt.select).mockResolvedValueOnce(...) and
vi.mocked(prompt.confirm).mockResolvedValueOnce(...) inside test bodies; move
those mockResolvedValueOnce calls into scoped beforeEach blocks for each
scenario (e.g., the "interactive new user" scenario, the "light install"
scenario, and the "AI accepted/declined" scenario) so mocks are declared before
calling UserPreferencesCommand.execute; specifically, create nested
describe/beforeEach blocks under the existing "AI setup prompt" and "execute"
suites that set prompt.select and prompt.confirm sequences used by
command.execute (referencing prompt.select, prompt.confirm, command.execute,
defaultExecuteOptions, and Feature enums), and ensure you reset/restore mocks
between scenarios (using vi.resetAllMocks or similar) to avoid cross-test
coupling.
In `@scripts/eval/lib/story-render.ts`:
- Line 99: Move the STORY_RENDER_TIMEOUT_MS constant declaration so it appears
before its first use in runStoryRenderPass: locate the current reference to
STORY_RENDER_TIMEOUT_MS in the runStoryRenderPass scope (where timeoutMs is
assigned) and cut/paste the constant definition (the STORY_RENDER_TIMEOUT_MS
declaration) to a position above runStoryRenderPass to improve readability and
flow.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: c91e55b9-f65f-426f-b63c-3b250e7dd09e
📒 Files selected for processing (8)
code/lib/create-storybook/src/commands/UserPreferencesCommand.test.tsscripts/eval/README.mdscripts/eval/eval.tsscripts/eval/lib/grade.tsscripts/eval/lib/publish-trial.tsscripts/eval/lib/story-render.tsscripts/eval/lib/utils.tsscripts/eval/run-batch.ts
✅ Files skipped from review due to trivial changes (2)
- scripts/eval/run-batch.ts
- scripts/eval/lib/grade.ts
🚧 Files skipped from review as they are similar to previous changes (1)
- scripts/eval/lib/publish-trial.ts
| afterAll(() => { | ||
| if (originalIsTTYDescriptor) { | ||
| Object.defineProperty(process.stdout, 'isTTY', originalIsTTYDescriptor); | ||
| } else { | ||
| Object.defineProperty(process.stdout, 'isTTY', { | ||
| value: undefined, | ||
| configurable: true, | ||
| }); | ||
| } |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
cat -n code/lib/create-storybook/src/commands/UserPreferencesCommand.test.ts | head -80Repository: storybookjs/storybook
Length of output: 3557
🏁 Script executed:
cat -n code/lib/create-storybook/src/commands/UserPreferencesCommand.test.ts | sed -n '80,150p'Repository: storybookjs/storybook
Length of output: 2952
🏁 Script executed:
# Check if there are any other tests or processes that might be affected
cat -n code/lib/create-storybook/src/commands/UserPreferencesCommand.test.ts | tail -100Repository: storybookjs/storybook
Length of output: 4361
Restore process.stdout.isTTY by removing the override when there was no original descriptor.
The else branch currently defines isTTY as undefined, creating an own property that shadows inherited behavior. Since Object.getOwnPropertyDescriptor() only returns own properties (not inherited ones), when originalIsTTYDescriptor is falsy, the correct restoration is to delete the property, not redefine it.
Proposed fix
afterAll(() => {
if (originalIsTTYDescriptor) {
Object.defineProperty(process.stdout, 'isTTY', originalIsTTYDescriptor);
} else {
- Object.defineProperty(process.stdout, 'isTTY', {
- value: undefined,
- configurable: true,
- });
+ Reflect.deleteProperty(process.stdout, 'isTTY');
}
});📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| afterAll(() => { | |
| if (originalIsTTYDescriptor) { | |
| Object.defineProperty(process.stdout, 'isTTY', originalIsTTYDescriptor); | |
| } else { | |
| Object.defineProperty(process.stdout, 'isTTY', { | |
| value: undefined, | |
| configurable: true, | |
| }); | |
| } | |
| afterAll(() => { | |
| if (originalIsTTYDescriptor) { | |
| Object.defineProperty(process.stdout, 'isTTY', originalIsTTYDescriptor); | |
| } else { | |
| Reflect.deleteProperty(process.stdout, 'isTTY'); | |
| } | |
| }); |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@code/lib/create-storybook/src/commands/UserPreferencesCommand.test.ts` around
lines 41 - 49, The afterAll cleanup currently recreates an own isTTY property
when originalIsTTYDescriptor is falsy; instead, in the afterAll handler for
UserPreferencesCommand.test.ts detect originalIsTTYDescriptor and if falsy
remove the override by deleting process.stdout.isTTY (rather than defining it as
undefined) so inherited behavior is restored; update the afterAll block that
references originalIsTTYDescriptor and process.stdout.isTTY to call delete
process.stdout.isTTY in the else branch.
Closes #34295
What I did
This PR adds an agentic setup workflow that lets AI agents set up Storybook for an existing project in a self-healing loop. Two pieces ship together:
1.
storybook ai setupCLI commandA new subcommand under
storybookthat inspects the current project (framework, renderer, builder, language, existing components) and generates a project-aware markdown prompt designed to be consumed by a coding agent. The prompt instructs the agent to:.storybook/preview.tswith the right decorators, styles, and MSW handlersplayfunctions for ~10 representative componentsThe MCP addon is installed alongside so agents have a live query interface into the component library while they work.
2. Eval harness (
scripts/eval/)A benchmarking system that runs Claude Code and Codex against 7 real-world React + Vite projects, grades the output, and publishes each trial as a draft PR with a structured
data.json.collect-pr-data.tsingests those PRs into a local SQLite database for analysis.Key pieces:
eval.ts— single trial runnerrun-batch.ts— batch orchestrator with concurrency and repetition controlslib/grade.ts— 4-dimensional grading (build, typecheck, story render, ghost stories)sync-baselines.ts— pushes a canonical.storybookbaseline to all benchmark reposlib/agents/— Claude Code and Codex driver implementationsThe headline metric is normalized preview gain — how much of the remaining gap to a 100% story pass rate the agent closed. See
scripts/eval/README.mdfor full docs.Checklist for Contributors
Testing
The changes in this PR are covered in the following automated tests:
Manual testing
Caution
This section is mandatory for all contributions. If you believe no manual test is necessary, please state so explicitly. Thanks!
Test the CLI command itself:
yarn && yarn task compileyarn task sandbox --template react-vite/default-ts --start-from autonpx storybook ai setupTest the agentic flow with a real agent:
without-storybookbranch or any other – you can usenpx @hipster/sb-utils uninstall --yesin a project if you like), open Claude Code (or Codex) at the project rootRun npx storybook@next init and follow its instructions precisely..storybook/preview.ts, write ~10 story files with play functions, and run Vitest to verify them.storybook/main.tsyarn vitest run --project=storybookTest the manual flow:
without-storybookbranch or any other – you can usenpx @hipster/sb-utils uninstall --yesin a project if you like)npx storybook@next initTest the eval harness (optional, requires
ghCLI + Claude Code/Codex installed):You don't have to do this, only if you're curious about the eval system.
node scripts/eval/eval.ts --list-projectsnode scripts/eval/eval.ts -p mealdrop --prompt pattern-copy-playdata.jsonartifact and grade summarynode scripts/eval/collect-pr-data.ts --project mealdropscripts/eval/.cache/eval-pr-data.sqliteand verify the trial appears in thetrialstable and in thestory_render_summary_by_project_model_effortviewDocumentation
MIGRATION.MD
Checklist for Maintainers
When this PR is ready for testing, make sure to add
ci:normal,ci:mergedorci:dailyGH label to it to run a specific set of sandboxes. The particular set of sandboxes can be found incode/lib/cli-storybook/src/sandbox-templates.tsMake sure this PR contains one of the labels below:
Available labels
bug: Internal changes that fixes incorrect behavior.maintenance: User-facing maintenance tasks.dependencies: Upgrading (sometimes downgrading) dependencies.build: Internal-facing build tooling & test updates. Will not show up in release changelog.cleanup: Minor cleanup style change. Will not show up in release changelog.documentation: Documentation only changes. Will not show up in release changelog.feature request: Introducing a new feature.BREAKING CHANGE: Changes that break compatibility in some way with current major version.other: Changes that don't fit in the above categories.🦋 Canary release
This pull request has been released as version
0.0.0-pr-34297-sha-61aa8ea7. Try it out in a new sandbox by runningnpx storybook@0.0.0-pr-34297-sha-61aa8ea7 sandboxor in an existing project withnpx storybook@0.0.0-pr-34297-sha-61aa8ea7 upgrade.More information
0.0.0-pr-34297-sha-61aa8ea7project/sb-agentic-setup61aa8ea71777529618)To request a new release of this pull request, mention the
@storybookjs/coreteam.core team members can create a new canary release here or locally with
gh workflow run --repo storybookjs/storybook publish.yml --field pr=34297Summary by CodeRabbit
New Features
Chores
Tests