Skip to content

Build: Agentic Workflow improvements#33985

Closed
valentinpalkovic wants to merge 36 commits into
nextfrom
valentin/agentic-workflow
Closed

Build: Agentic Workflow improvements#33985
valentinpalkovic wants to merge 36 commits into
nextfrom
valentin/agentic-workflow

Conversation

@valentinpalkovic
Copy link
Copy Markdown
Contributor

@valentinpalkovic valentinpalkovic commented Mar 3, 2026

Closes #33953

What I did

This PR introduces a comprehensive infrastructure for AI agents (GitHub Copilot/Claude) to automatically verify bug fixes in Storybook with proper evidence collection.

What's Added

1. Bug Verification Workflows (Claude Skills)

Five new skills in skills that guide AI agents through systematic bug fix verification:

  • /fix-bug: End-to-end orchestration: fetch issue → plan fix → implement → verify → open PR. Now requires PRs to be labeled with agent, ci:normal, and bug
  • /verification-checklist: Universal pre-PR checks for all bug types (Flow 0)
  • /renderer-bug-workflow: Verify renderer bugs with template stories and visual inspection (Flow 1)
  • /builder-bug-workflow: Verify builder bugs with frontend snapshots (Flow 2) or terminal output (Flow 3)
  • /manager-bug-workflow: Verify Manager UI bugs with E2E tests and screenshots (Flow 4)

Key Feature: Smart routing logic that determines verification requirements based on where the bug manifests to users, not just where code is changed.

2. Terminal Output Snapshot Testing

New script capture-terminal-output.ts that:

  • Captures and normalizes terminal output from Storybook builds
  • Replaces variable content (hashes, sizes, timestamps, paths) with placeholders
  • Compares against baseline snapshots
  • Catches unintended changes to user-facing error messages and warnings

3. GitHub Actions Workflow

New workflow copilot-verification.yml that:

  • Compiles packages
  • Generates required sandboxes (react-vite-default, react-webpack-18)
  • Runs terminal output verification for both Vite and Webpack5 builders
  • Fails PR if terminal output has changed unexpectedly

4. Documentation Updates

  • Updated CLAUDE.md and copilot-instructions.md with workflow references
  • Added Cursor rules for Cursor IDE users

Why This Matters

Before: AI agents would fix bugs but couldn't systematically verify the fix worked or collect evidence for reviewers.

After: AI agents can:

  • Understand complex bug reports
  • Route to appropriate verification flow based on bug manifestation
  • Generate visual evidence (screenshots, E2E tests, snapshots)
  • Catch regressions in terminal output
  • Open PRs with complete verification evidence
  • Testing
  • Terminal output baselines seeded for:
    ✅ builder-vite-build.snap.txt
    ✅ builder-webpack5-build.snap.txt

Checklist for Contributors

Testing

The changes in this PR are covered in the following automated tests:

  • stories
  • unit tests
  • integration tests
  • end-to-end tests

Manual testing

Caution

This section is mandatory for all contributions. If you believe no manual test is necessary, please state so explicitly. Thanks!

Documentation

  • Add or update documentation reflecting your changes
  • If you are deprecating/removing a feature, make sure to update
    MIGRATION.MD

Checklist for Maintainers

  • When this PR is ready for testing, make sure to add ci:normal, ci:merged or ci:daily GH label to it to run a specific set of sandboxes. The particular set of sandboxes can be found in code/lib/cli-storybook/src/sandbox-templates.ts

  • Make sure this PR contains one of the labels below:

    Available labels
    • bug: Internal changes that fixes incorrect behavior.
    • maintenance: User-facing maintenance tasks.
    • dependencies: Upgrading (sometimes downgrading) dependencies.
    • build: Internal-facing build tooling & test updates. Will not show up in release changelog.
    • cleanup: Minor cleanup style change. Will not show up in release changelog.
    • documentation: Documentation only changes. Will not show up in release changelog.
    • feature request: Introducing a new feature.
    • BREAKING CHANGE: Changes that break compatibility in some way with current major version.
    • other: Changes that don't fit in the above categories.

🦋 Canary release

This PR does not have a canary release associated. You can request a canary release of this pull request by mentioning the @storybookjs/core team here.

core team members can create a canary release here or locally with gh workflow run --repo storybookjs/storybook publish.yml --field pr=<PR_NUMBER>

Summary by CodeRabbit

  • Documentation

    • Added comprehensive end-to-end bug-fix and verification workflows (multiple verifier flows), a universal verification checklist, CLAUDE guidance, terminal-snapshot guidance; removed legacy Copilot instructions and updated ignore rules.
  • Tests

    • Added Manager UI smoke tests and new Storybook example stories to support visual and behavioral verification.
  • Chores

    • Added automated PR verification workflow, Playwright browser install step, and terminal-output capture/compare tooling with baseline management plus npm scripts to run it.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Mar 3, 2026

Fails
🚫 The "#### Manual testing" section must be filled in. Please describe how to test the changes you've made, step by step, so that reviewers can confirm your PR works as intended.

Generated by 🚫 dangerJS against 571522d

@valentinpalkovic valentinpalkovic changed the title Agentic Workflow improvements Build: Agentic Workflow improvements Mar 3, 2026
@valentinpalkovic valentinpalkovic self-assigned this Mar 3, 2026
@valentinpalkovic valentinpalkovic added build Internal-facing build tooling & test updates ci:normal labels Mar 3, 2026
@valentinpalkovic valentinpalkovic moved this to In Progress in Core Team Projects Mar 3, 2026
@nx-cloud
Copy link
Copy Markdown

nx-cloud Bot commented Mar 3, 2026

View your CI Pipeline Execution ↗ for commit 67d335f

Command Status Duration Result
nx run-many --targets compile ✅ Succeeded 34s View ↗

☁️ Nx Cloud last updated this comment at 2026-03-05 10:34:03 UTC

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Mar 3, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds Claude skill workflows and verification docs, terminal-output capture/normalization tooling with baseline snapshots, CI GitHub Actions to run terminal-output checks (including Playwright browser install), workspace/storybook stories and e2e smoke tests, and supporting script/exec changes to enable snapshot capture and comparison.

Changes

Cohort / File(s) Summary
Claude Skills & Workflows
\.claude/skills/verification-checklist/SKILL.md, \.claude/skills/renderer-bug-workflow/SKILL.md, \.claude/skills/builder-bug-workflow/SKILL.md, \.claude/skills/manager-bug-workflow/SKILL.md, \.claude/skills/fix-bug/SKILL.md, \.claude/skills/implement-and-verify-fix/SKILL.md, \.claude/skills/plan-bug-fix/SKILL.md, \.claude/skills/open-pull-request/SKILL.md
Add multiple skill docs describing Flow 0–4 verification, planning, implement-and-verify, and PR workflows with step-by-step capture/verification procedures and evidence requirements.
Terminal-output tooling & snapshots
scripts/capture-terminal-output.ts, scripts/terminal-output-snapshots/README.md, scripts/terminal-output-snapshots/*snap.txt
New CLI to run/build sandboxes, normalize volatile terminal output, create/update provisional baselines, compare output and emit diffs; includes README and two initial snapshot baselines.
Exec utility & scripts packaging
scripts/utils/exec.ts, scripts/package.json, package.json
Change exec signature to optionally capture and return output; add capture-terminal-output npm script in root and scripts packages.
CI Workflows & setup
.github/workflows/copilot-setup-steps.yml, .github/workflows/copilot-verification.yml
Add Playwright browser install step and a new Copilot verification workflow that builds packages, generates sandboxes, and runs terminal-output snapshot comparisons per builder.
Storybook stories & tests
code/renderers/react/template/stories/copilot-verification-example.stories.tsx, code/renderers/react/template/stories/visual-render-verification.stories.tsx, code/e2e-tests/manager.spec.ts
Add example visual stories and a manager UI smoke test for verification flows.
Workspace, docs & misc
CLAUDE.md, AGENTS.md, .github/copilot-instructions.md, storybook.code-workspace, .gitignore
Add CLAUDE.md and AGENTS.md, remove old .github/copilot-instructions.md, update workspace MCP config, and remove .claude from .gitignore so skills are tracked.
Snapshots & examples
scripts/terminal-output-snapshots/*
Add concrete build output snapshots for webpack5 and vite builds used as baselines.

Sequence Diagram(s)

sequenceDiagram
    participant CI as "GitHub Actions (PR)"
    participant Repo as "Repo (checkout)"
    participant Setup as "Setup & Build"
    participant Sandbox as "Sandbox Generator"
    participant Capture as "capture-terminal-output"
    participant Normalize as "Normalizer"
    participant Compare as "Snapshot Comparator"
    participant Reporter as "CI Reporter"

    CI->>Repo: checkout
    CI->>Setup: install deps + install Playwright browsers
    Setup->>Setup: compile packages
    CI->>Sandbox: generate sandboxes (react-vite, react-webpack)
    CI->>Capture: invoke per-builder capture (build/dev)
    Capture->>Sandbox: run yarn storybook / build-storybook
    Sandbox->>Capture: stream terminal output
    Capture->>Normalize: send raw output
    Normalize->>Compare: provide normalized output
    Compare->>Compare: load baseline snapshot
    alt match
        Compare->>Reporter: pass
    else mismatch
        Compare->>Reporter: fail + unified diff
    end
    Reporter->>CI: exit with status
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs


Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🧹 Nitpick comments (7)
.claude/skills/renderer-bug-workflow/SKILL.md (1)

53-57: Use a safer port cleanup command before kill -9.

At Line 55, leading with SIGKILL is unnecessarily aggressive. Prefer graceful termination first, then escalate only if needed.

Proposed command update
-  lsof -ti :6006 | xargs kill -9
+  lsof -ti :6006 | xargs kill
+  # if a process does not exit, then force kill:
+  lsof -ti :6006 | xargs kill -9
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.claude/skills/renderer-bug-workflow/SKILL.md around lines 53 - 57, Update
the "Known failure — port in use" guidance to avoid starting with SIGKILL:
replace the current one-line command that uses kill -9 with a two-step safer
sequence that first attempts a graceful SIGTERM (kill -15) against the pid(s)
found by lsof (or equivalent) and only escalates to SIGKILL (kill -9) if the
process remains; ensure the commands use xargs safely (e.g., handle empty input)
and mention retrying the dev server after successful termination; reference the
text block titled "Known failure — port in use" and the existing lsof | xargs
kill usage when making this change.
scripts/terminal-output-snapshots/builder-vite-build.snap.txt (1)

16-277: Consider reducing snapshot volatility by filtering per-asset listings.

This baseline captures a very large chunk/file inventory, which is likely to churn on unrelated changes and generate noisy failures. Keeping phase markers, warnings, and completion lines would preserve intent with better signal-to-noise.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/terminal-output-snapshots/builder-vite-build.snap.txt` around lines
16 - 277, The snapshot is too noisy because it records every per-asset listing
(lines containing "Vite storybook-static/assets/" and large entries like "Vite
storybook-static/assets/iframe-<HASH>.js"), causing churn; update the snapshot
generation or test that produces builder-vite-build.snap.txt to filter or
collapse per-asset file entries (keep only phase markers/warnings/completion
lines and aggregate asset counts or sizes) so snapshots only assert high-level
output rather than each asset path, e.g., replace detailed asset listings with a
single summary line or whitelist only important filenames (retain "Vite
storybook-static", "iframe-<HASH>.js" and top-level warnings/completion text).
storybook.code-workspace (1)

24-28: Pin chrome-devtools-mcp to a concrete version instead of @latest.

At Line 27, @latest makes agent tooling behavior non-reproducible and can introduce sudden breakage. Recent releases (e.g., v0.18.0) include behavior-affecting changes like "move emulation settings to context" that could unexpectedly alter tool functionality. Pin to a stable version like 0.18.1 and update intentionally.

Proposed change
-            "chrome-devtools-mcp@latest"
+            "chrome-devtools-mcp@0.18.1"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@storybook.code-workspace` around lines 24 - 28, Replace the floating tag
"chrome-devtools-mcp@latest" in the args array (the npx command invocation) with
a concrete version to ensure reproducible tooling, e.g.,
"chrome-devtools-mcp@0.18.1"; locate the args entry that currently reads
"chrome-devtools-mcp@latest" and update it to the chosen fixed version so future
runs use the pinned release instead of `@latest`.
.github/workflows/copilot-verification.yml (2)

64-66: Missing newline at end of file.

Add a newline at the end of the file for POSIX compliance.

Add trailing newline
           if [ $failed -eq 1 ]; then
             exit 1
           fi
+
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/copilot-verification.yml around lines 64 - 66, The file
ends without a trailing newline; add a single newline character at the end of
the file so the last line ("if [ $failed -eq 1 ]; then" / "exit 1" / "fi") is
terminated by a newline to satisfy POSIX and linters—simply ensure the final
line is followed by one newline character.

3-5: Consider adding path filters to reduce unnecessary CI runs.

This workflow runs on every PR, including those that don't affect builder code or terminal output. Consider adding path filters to run only when relevant files change.

Example path filter configuration
 on:
   pull_request:
     types: [opened, synchronize, reopened]
+    paths:
+      - 'code/builders/**'
+      - 'scripts/capture-terminal-output.ts'
+      - 'scripts/terminal-output-snapshots/**'
+      - '.github/workflows/copilot-verification.yml'
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/copilot-verification.yml around lines 3 - 5, The
pull_request trigger currently fires for all PRs (see the on: pull_request:
types: [opened, synchronize, reopened] block); add path filters (paths and/or
paths-ignore) to that pull_request trigger so the workflow only runs when
relevant files change (e.g., builder, CI, or terminal/output-related files);
update the on: pull_request block to include appropriate path patterns to limit
runs and keep the existing event types unchanged.
code/renderers/react/template/stories/visual-render-verification.stories.tsx (1)

26-29: Consider using satisfies for better type safety.

Using as Meta<...> allows type coercion, while satisfies provides type checking without widening. This is a minor stylistic preference.

Alternative using satisfies
 export default {
   component: VisualRenderVerification,
   parameters: { chromatic: { disableSnapshot: true } },
-} as Meta<typeof VisualRenderVerification>;
+} satisfies Meta<typeof VisualRenderVerification>;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@code/renderers/react/template/stories/visual-render-verification.stories.tsx`
around lines 26 - 29, The default export uses "as Meta<typeof
VisualRenderVerification>" which allows coercion—replace the type assertion with
the "satisfies" operator to retain type checking without widening: change the
export default object so it ends with " } satisfies Meta<typeof
VisualRenderVerification>;" (referencing the exported object,
VisualRenderVerification and Meta) and ensure your TS target supports
"satisfies"; no runtime behavior changes needed, only the type annotation.
scripts/capture-terminal-output.ts (1)

80-141: Diff function outputs all content as context instead of limiting to surrounding lines.

The CONTEXT_LINES constant is defined but never used to limit context output. The function currently outputs the entire content as context lines rather than showing only the surrounding context around changes (typically 3 lines before/after). This could result in very large diff outputs for files with small changes.

Additionally, line 91 filters out empty lines (filter((line) => line.length > 0)), which loses blank line information and could make the diff output misleading when comparing files with significant whitespace.

Consider limiting context output
-      const lines = text.split('\n').filter((line) => line.length > 0);
+      const lines = text.split('\n');
+      // Remove only the trailing empty string from split if text ends with newline
+      if (lines.length > 0 && lines[lines.length - 1] === '') {
+        lines.pop();
+      }

For a proper unified diff with limited context, consider using a dedicated diff library like diff (from npm) which has built-in support for unified diff format with configurable context lines.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/capture-terminal-output.ts` around lines 80 - 141, The
createUnifiedDiff currently treats all equal lines as context (and strips blank
lines) instead of limiting to CONTEXT_LINES; fix it by (1) removing the
.filter((line) => line.length > 0) so blank lines are preserved, (2) change how
contextBuffer is used: make it a sliding window that only retains the last
CONTEXT_LINES of {type:0,line} and when you detect the first non-equal change
(hunkStart set in createUnifiedDiff) flush the preceding up-to-CONTEXT_LINES
context into the output for that hunk, then collect following changed lines and
up-to-CONTEXT_LINES trailing equal lines after the change before closing the
hunk, and (3) recompute hunkStart, beforeIndex and afterIndex counts based on
the trimmed context and the actual lines emitted so @@ -start,count +start,count
reflects the trimmed context; use the existing symbols (CONTEXT_LINES,
contextBuffer, hunkStart, beforeIndex, afterIndex, diffLines) to locate where to
implement this logic.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.claude/skills/fix-bug/SKILL.md:
- Around line 418-430: The checklist contains contradictory guidance about
force-pushing: replace the two conflicting items with an explicit policy and
instructions. Update the checklist text so it reads that contributors should not
force-push for routine fixes, but when resolving merge conflicts during a rebase
(the block that currently shows the git fetch/rebase/resolve sequence) they may
rebase and push using git push --force-with-lease; add a short note requiring a
comment in the PR or approval from a maintainer before performing the
force-with-lease push to make the exception explicit.
- Around line 27-43: The fenced code blocks that begin with "Step 1: Fetch &
Understand Issue" (and other similar untyped fences) are missing language
identifiers and trigger markdownlint MD040; update each opening triple-backtick
for these blocks to include a language token (e.g., ```text) so the fences are
typed, ensuring all occurrences in SKILL.md use consistent language identifiers
and re-run linting to confirm MD040 is resolved.

In @.claude/skills/manager-bug-workflow/SKILL.md:
- Around line 44-46: The markdown code fence containing the line
"{title-prefix}-{story-title}--{export-name}" is missing a language identifier
(MD040); update the opening fence from ``` to include a language tag (for
example change it to ```text) so the block becomes a fenced code block with a
language identifier for that exact content.

In `@CLAUDE.md`:
- Around line 5-23: The fenced code block in CLAUDE.md is unlabeled (triple
backticks around the repository tree) which triggers markdownlint MD040; update
the opening fence to include a language specifier such as "text" (change ``` to
```text) for the block that contains the repository layout so the code fence is
labeled while keeping the closing ``` unchanged and preserving the content of
the tree.

In `@scripts/utils/exec.ts`:
- Around line 52-57: The loop over the command array incorrectly detects the
last subcommand by comparing subcommand text, so duplicate command strings can
trigger an early return; update the loop in the exec utility to detect the final
element by position (e.g., use an indexed for loop or track the index) and only
perform the captureOutput return when the current index === command.length - 1,
keeping references to currentChild, logger, execa, captureOutput and options
intact.

---

Nitpick comments:
In @.claude/skills/renderer-bug-workflow/SKILL.md:
- Around line 53-57: Update the "Known failure — port in use" guidance to avoid
starting with SIGKILL: replace the current one-line command that uses kill -9
with a two-step safer sequence that first attempts a graceful SIGTERM (kill -15)
against the pid(s) found by lsof (or equivalent) and only escalates to SIGKILL
(kill -9) if the process remains; ensure the commands use xargs safely (e.g.,
handle empty input) and mention retrying the dev server after successful
termination; reference the text block titled "Known failure — port in use" and
the existing lsof | xargs kill usage when making this change.

In @.github/workflows/copilot-verification.yml:
- Around line 64-66: The file ends without a trailing newline; add a single
newline character at the end of the file so the last line ("if [ $failed -eq 1
]; then" / "exit 1" / "fi") is terminated by a newline to satisfy POSIX and
linters—simply ensure the final line is followed by one newline character.
- Around line 3-5: The pull_request trigger currently fires for all PRs (see the
on: pull_request: types: [opened, synchronize, reopened] block); add path
filters (paths and/or paths-ignore) to that pull_request trigger so the workflow
only runs when relevant files change (e.g., builder, CI, or
terminal/output-related files); update the on: pull_request block to include
appropriate path patterns to limit runs and keep the existing event types
unchanged.

In
`@code/renderers/react/template/stories/visual-render-verification.stories.tsx`:
- Around line 26-29: The default export uses "as Meta<typeof
VisualRenderVerification>" which allows coercion—replace the type assertion with
the "satisfies" operator to retain type checking without widening: change the
export default object so it ends with " } satisfies Meta<typeof
VisualRenderVerification>;" (referencing the exported object,
VisualRenderVerification and Meta) and ensure your TS target supports
"satisfies"; no runtime behavior changes needed, only the type annotation.

In `@scripts/capture-terminal-output.ts`:
- Around line 80-141: The createUnifiedDiff currently treats all equal lines as
context (and strips blank lines) instead of limiting to CONTEXT_LINES; fix it by
(1) removing the .filter((line) => line.length > 0) so blank lines are
preserved, (2) change how contextBuffer is used: make it a sliding window that
only retains the last CONTEXT_LINES of {type:0,line} and when you detect the
first non-equal change (hunkStart set in createUnifiedDiff) flush the preceding
up-to-CONTEXT_LINES context into the output for that hunk, then collect
following changed lines and up-to-CONTEXT_LINES trailing equal lines after the
change before closing the hunk, and (3) recompute hunkStart, beforeIndex and
afterIndex counts based on the trimmed context and the actual lines emitted so
@@ -start,count +start,count reflects the trimmed context; use the existing
symbols (CONTEXT_LINES, contextBuffer, hunkStart, beforeIndex, afterIndex,
diffLines) to locate where to implement this logic.

In `@scripts/terminal-output-snapshots/builder-vite-build.snap.txt`:
- Around line 16-277: The snapshot is too noisy because it records every
per-asset listing (lines containing "Vite storybook-static/assets/" and large
entries like "Vite storybook-static/assets/iframe-<HASH>.js"), causing churn;
update the snapshot generation or test that produces builder-vite-build.snap.txt
to filter or collapse per-asset file entries (keep only phase
markers/warnings/completion lines and aggregate asset counts or sizes) so
snapshots only assert high-level output rather than each asset path, e.g.,
replace detailed asset listings with a single summary line or whitelist only
important filenames (retain "Vite storybook-static", "iframe-<HASH>.js" and
top-level warnings/completion text).

In `@storybook.code-workspace`:
- Around line 24-28: Replace the floating tag "chrome-devtools-mcp@latest" in
the args array (the npx command invocation) with a concrete version to ensure
reproducible tooling, e.g., "chrome-devtools-mcp@0.18.1"; locate the args entry
that currently reads "chrome-devtools-mcp@latest" and update it to the chosen
fixed version so future runs use the pinned release instead of `@latest`.

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 349399d and e4831bf.

📒 Files selected for processing (21)
  • .claude/skills/builder-bug-workflow/SKILL.md
  • .claude/skills/fix-bug/SKILL.md
  • .claude/skills/manager-bug-workflow/SKILL.md
  • .claude/skills/renderer-bug-workflow/SKILL.md
  • .claude/skills/verification-checklist/SKILL.md
  • .github/copilot-instructions.md
  • .github/workflows/copilot-setup-steps.yml
  • .github/workflows/copilot-verification.yml
  • .gitignore
  • CLAUDE.md
  • code/e2e-tests/manager.spec.ts
  • code/renderers/react/template/stories/copilot-verification-example.stories.tsx
  • code/renderers/react/template/stories/visual-render-verification.stories.tsx
  • package.json
  • scripts/capture-terminal-output.ts
  • scripts/package.json
  • scripts/terminal-output-snapshots/README.md
  • scripts/terminal-output-snapshots/builder-vite-build.snap.txt
  • scripts/terminal-output-snapshots/builder-webpack5-build.snap.txt
  • scripts/utils/exec.ts
  • storybook.code-workspace
💤 Files with no reviewable changes (1)
  • .gitignore

Comment thread .claude/skills/fix-bug/SKILL.md
Comment thread .claude/skills/fix-bug/SKILL.md Outdated
Comment on lines +418 to +430
- [ ] Address any feedback in new commits (do NOT force-push)
- [ ] Re-run verification if requested
- [ ] Respond to questions about the fix

**If merge conflicts occur**:

```bash
git fetch origin
git rebase origin/next
# Resolve conflicts
cd code && yarn test
git push --force-with-lease origin agent/fix-issue-$ARGUMENTS[0]
```
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Resolve contradictory force-push guidance.

Line 418 says “do NOT force-push,” but Line 429 instructs --force-with-lease. Please make the policy explicit so contributors know when force-push is allowed.

Suggested fix
-- [ ] Address any feedback in new commits (do NOT force-push)
+- [ ] Address any feedback in new commits (avoid force-push except when required after a local rebase conflict resolution)

 git fetch origin
 git rebase origin/next
 # Resolve conflicts
 cd code && yarn test
-git push --force-with-lease origin agent/fix-issue-$ARGUMENTS[0]
+git push --force-with-lease origin agent/fix-issue-$ARGUMENTS[0]  # allowed here after conflict-resolution rebase
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.claude/skills/fix-bug/SKILL.md around lines 418 - 430, The checklist
contains contradictory guidance about force-pushing: replace the two conflicting
items with an explicit policy and instructions. Update the checklist text so it
reads that contributors should not force-push for routine fixes, but when
resolving merge conflicts during a rebase (the block that currently shows the
git fetch/rebase/resolve sequence) they may rebase and push using git push
--force-with-lease; add a short note requiring a comment in the PR or approval
from a maintainer before performing the force-with-lease push to make the
exception explicit.

Comment on lines +44 to +46
```
{title-prefix}-{story-title}--{export-name}
```
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Add a language identifier to the fenced block (MD040).

The code fence at Line 44 is missing a language tag and will trigger markdownlint.

Proposed fix
-```
+```text
 {title-prefix}-{story-title}--{export-name}
</details>

<!-- suggestion_start -->

<details>
<summary>📝 Committable suggestion</summary>

> ‼️ **IMPORTANT**
> Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```suggestion

🧰 Tools
🪛 markdownlint-cli2 (0.21.0)

[warning] 44-44: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.claude/skills/manager-bug-workflow/SKILL.md around lines 44 - 46, The
markdown code fence containing the line
"{title-prefix}-{story-title}--{export-name}" is missing a language identifier
(MD040); update the opening fence from ``` to include a language tag (for
example change it to ```text) so the block becomes a fenced code block with a
language identifier for that exact content.

Comment thread CLAUDE.md Outdated
Comment thread scripts/utils/exec.ts
Comment on lines 52 to +57
for (const subcommand of command) {
logger.debug(`> ${subcommand}`);
currentChild = execa(subcommand, { ...defaultOptions, ...options });
await currentChild;
const result = await currentChild;
if (captureOutput && subcommand === command[command.length - 1]) {
return result.all ?? '';
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Fix last-subcommand detection to avoid early return on duplicate commands.

Line 56 compares command text instead of position. If a subcommand string repeats, this can return too early and skip remaining commands.

Suggested fix
-      for (const subcommand of command) {
+      for (const [index, subcommand] of command.entries()) {
         logger.debug(`> ${subcommand}`);
         currentChild = execa(subcommand, { ...defaultOptions, ...options });
         const result = await currentChild;
-        if (captureOutput && subcommand === command[command.length - 1]) {
+        if (captureOutput && index === command.length - 1) {
           return result.all ?? '';
         }
       }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
for (const subcommand of command) {
logger.debug(`> ${subcommand}`);
currentChild = execa(subcommand, { ...defaultOptions, ...options });
await currentChild;
const result = await currentChild;
if (captureOutput && subcommand === command[command.length - 1]) {
return result.all ?? '';
for (const [index, subcommand] of command.entries()) {
logger.debug(`> ${subcommand}`);
currentChild = execa(subcommand, { ...defaultOptions, ...options });
const result = await currentChild;
if (captureOutput && index === command.length - 1) {
return result.all ?? '';
}
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/utils/exec.ts` around lines 52 - 57, The loop over the command array
incorrectly detects the last subcommand by comparing subcommand text, so
duplicate command strings can trigger an early return; update the loop in the
exec utility to detect the final element by position (e.g., use an indexed for
loop or track the index) and only perform the captureOutput return when the
current index === command.length - 1, keeping references to currentChild,
logger, execa, captureOutput and options intact.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.claude/skills/builder-bug-workflow/SKILL.md:
- Line 3: Update the inconsistent flow numbering in the workflow description:
find all occurrences of "Flow 1", "Flow 2", and "Flow 3" in the SKILL.md
workflow text (the description line and the paragraphs that label builder
verification flows) and make them consistent so that the builder verification
flows referenced at the top (frontend output vs terminal output) match the later
flow labels (e.g., change the later "Flow 1" and "Flow 2" back to "Flow 2" and
"Flow 3" or vice versa), ensuring each flow name used in the sentence fragments
and headings is identical across the file; update any cross-references/snapshots
or sandbox requirements that mention the old numbering to the corrected numbers.

In @.claude/skills/fix-bug/SKILL.md:
- Line 30: The document has inconsistent step numbering (the overview lists
"Step 2: Determine Verification Flow" but the planning section and the later
verification-flow section use different step numbers), so pick the correct
canonical order (e.g., keep "Determine Verification Flow" as Step 2) and update
every occurrence to match: change the header and any inline references in the
overview, the planning section (currently labeled as Step 2 at Line 69), the
reintroduced verification-flow header (currently Step 4 at Line 198), and the
later reference (Line 248) so all mentions of these steps and the Table of
Contents use the same step numbers and wording (search for the literal tokens
"Step 2" and "Step 4" and the section title "Determine Verification Flow" to
ensure consistency).

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a27fc89 and 3d59ac4.

📒 Files selected for processing (3)
  • .claude/skills/builder-bug-workflow/SKILL.md
  • .claude/skills/fix-bug/SKILL.md
  • CLAUDE.md

@@ -0,0 +1,150 @@
---
name: builder-bug-workflow
description: Complete workflows for verifying builder bug fixes in code/builders/**. Includes Flow 2 (frontend output) and Flow 3 (terminal output) with sandbox requirements, hash normalization, and snapshot management.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Fix inconsistent flow numbering in this workflow doc.

Lines 3 and 8 define builder verification as Flow 2 (frontend) and Flow 3 (terminal), but Lines 17 and 79 rename them to Flow 1 and Flow 2. This can misroute verification steps; keep numbering aligned everywhere.

Proposed doc fix
-# Flow 1 — Builder Bug Verification (Frontend Output)
+# Flow 2 — Builder Bug Verification (Frontend Output)

-# Flow 2 — Builder Bug Verification (Terminal Output)
+# Flow 3 — Builder Bug Verification (Terminal Output)

Also applies to: 8-8, 17-17, 79-79

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.claude/skills/builder-bug-workflow/SKILL.md at line 3, Update the
inconsistent flow numbering in the workflow description: find all occurrences of
"Flow 1", "Flow 2", and "Flow 3" in the SKILL.md workflow text (the description
line and the paragraphs that label builder verification flows) and make them
consistent so that the builder verification flows referenced at the top
(frontend output vs terminal output) match the later flow labels (e.g., change
the later "Flow 1" and "Flow 2" back to "Flow 2" and "Flow 3" or vice versa),
ensuring each flow name used in the sentence fragments and headings is identical
across the file; update any cross-references/snapshots or sandbox requirements
that mention the old numbering to the corrected numbers.

Comment thread .claude/skills/fix-bug/SKILL.md Outdated
@storybook-app-bot
Copy link
Copy Markdown

storybook-app-bot Bot commented Mar 3, 2026

Package Benchmarks

Commit: 0f6c96a, ran on 5 March 2026 at 12:46:48 UTC

No significant changes detected, all good. 👏

@valentinpalkovic valentinpalkovic force-pushed the valentin/agentic-workflow branch from 3d59ac4 to ca406a5 Compare March 3, 2026 18:19
- Added detailed decision-making steps for baseline creation and updates in builder-bug-workflow.
- Revised fix-bug workflow to clarify steps for planning, implementing, and verifying bug fixes.
- Introduced implement-and-verify-fix skill to streamline the implementation and verification process.
- Updated manager-bug-workflow to include comprehensive evidence quality checklists.
- Created open-pull-request skill to automate PR creation with flow-specific templates and required labels.
- Established plan-bug-fix skill to facilitate thorough issue understanding and fix planning.
- Improved renderer-bug-workflow with enhanced evidence quality criteria for screenshots.
- Refined CLAUDE.md to clarify linting and formatting processes.
@valentinpalkovic valentinpalkovic force-pushed the valentin/agentic-workflow branch from ca406a5 to d962d1a Compare March 3, 2026 18:20
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (3)
.claude/skills/plan-bug-fix/SKILL.md (1)

27-37: Consider adding language identifiers to template/diagram fences.

Lines 27, 57, 76, and 129 have unlabeled fences containing workflow diagrams and templates. Adding text identifiers resolves markdownlint MD040.

Example fix for line 27
-```
+```text
 Step 1: Fetch & Understand Issue
          ↓

Also applies to: 57-62, 76-95, 129-161

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.claude/skills/plan-bug-fix/SKILL.md around lines 27 - 37, Add a language
identifier (use "text") to each unlabeled fenced code block that contains
workflow diagrams or templates in SKILL.md (e.g., the fence that begins with
"Step 1: Fetch & Understand Issue", the block covering the checkpoint "Confirm
routing", the larger template blocks around Steps 3–4, and the template starting
near "COMPLETE — Ready for implementation") so markdownlint MD040 is resolved;
update each opening triple-backtick to ```text for those four diagram/template
fences.
.claude/skills/open-pull-request/SKILL.md (1)

33-40: Add language identifier to workflow overview fence.

Line 33 has an unlabeled fence for the workflow diagram. Adding text resolves markdownlint MD040.

Suggested fix
-```
+```text
 Step 1: Push Feature Branch
          ↓
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.claude/skills/open-pull-request/SKILL.md around lines 33 - 40, The fenced
block containing the workflow diagram in SKILL.md is missing a language
identifier (causing markdownlint MD040); update the opening fence for the
diagram from ``` to ```text so the workflow overview fence is labeled, e.g.,
change the code fence that wraps the "Step 1: Push Feature Branch ... [DONE: PR
created, labeled, and ready]" diagram to use ```text.
CLAUDE.md (1)

29-32: Add language identifiers to bash code fences.

Lines 29, 36, 42, 49, 65, and 78 have unlabeled fences for bash commands. Adding bash identifiers improves syntax highlighting and resolves markdownlint MD040.

Example fix for line 29
-```
+```bash
 yarn nx run-many -t compile -c production
 yarn nx compile <package-name> -c production

Apply the same pattern to the other bash code blocks.
</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>

Verify each finding against the current code and only fix it if needed.

In @CLAUDE.md around lines 29 - 32, Add the Bash language identifier to each
unlabeled code fence containing shell commands so markdownlint MD040 is
satisfied; specifically update the fences that wrap the blocks with the commands
"yarn nx run-many -t compile -c production" / "yarn nx compile -c
production" and the other unlabeled blocks that contain similar yarn/npm shell
examples by changing the opening "" to "bash" for each such block.


</details>

</blockquote></details>

</blockquote></details>

<details>
<summary>🤖 Prompt for all review comments with AI agents</summary>

Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In @.claude/skills/open-pull-request/SKILL.md:

  • Around line 33-40: The fenced block containing the workflow diagram in
    SKILL.md is missing a language identifier (causing markdownlint MD040); update
    the opening fence for the diagram from totext so the workflow overview
    fence is labeled, e.g., change the code fence that wraps the "Step 1: Push
    Feature Branch ... [DONE: PR created, labeled, and ready]" diagram to use

In @.claude/skills/plan-bug-fix/SKILL.md:
- Around line 27-37: Add a language identifier (use "text") to each unlabeled
fenced code block that contains workflow diagrams or templates in SKILL.md
(e.g., the fence that begins with "Step 1: Fetch & Understand Issue", the block
covering the checkpoint "Confirm routing", the larger template blocks around
Steps 3–4, and the template starting near "COMPLETE — Ready for implementation")
so markdownlint MD040 is resolved; update each opening triple-backtick to
```text for those four diagram/template fences.

In `@CLAUDE.md`:
- Around line 29-32: Add the Bash language identifier to each unlabeled code
fence containing shell commands so markdownlint MD040 is satisfied; specifically
update the fences that wrap the blocks with the commands "yarn nx run-many -t
compile -c production" / "yarn nx compile <package-name> -c production" and the
other unlabeled blocks that contain similar yarn/npm shell examples by changing
the opening "```" to "```bash" for each such block.

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3d59ac4 and d962d1a.

📒 Files selected for processing (8)
  • .claude/skills/builder-bug-workflow/SKILL.md
  • .claude/skills/fix-bug/SKILL.md
  • .claude/skills/implement-and-verify-fix/SKILL.md
  • .claude/skills/manager-bug-workflow/SKILL.md
  • .claude/skills/open-pull-request/SKILL.md
  • .claude/skills/plan-bug-fix/SKILL.md
  • .claude/skills/renderer-bug-workflow/SKILL.md
  • CLAUDE.md

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
.claude/skills/fix-bug/SKILL.md (1)

29-39: ⚠️ Potential issue | 🟡 Minor

Add language identifiers to all fenced code blocks.

markdownlint MD040 is still triggered by untyped fences (e.g., Line 29 and the other listed blocks). Please annotate each opening fence (text for workflow/output examples, bash for commands).

Suggested patch
-```
+```text
 Step 1: Plan Bug Fix (understand, route, plan)
          ↓ [via /plan-bug-fix skill]
 Step 2: Implement and Verify Fix (code, test, verify)
          ↓ [MUST PASS: All tests, evidence gathered]
 Step 3: Prepare PR Description (with evidence)
          ↓
 Step 4: Open PR (via /open-pull-request skill)
          ↓ [Handles: push, PR creation, labels]
          ✅ COMPLETE — PR ready for review

@@
- +text
/plan-bug-fix [issue-number]

@@
-```
+```text
/plan-bug-fix 12345

@@
- +text
/implement-and-verify-fix [issue-number]

@@
-```
+```text
/implement-and-verify-fix 12345

@@
- +text
/open-pull-request [issue-number]

@@
-```
+```text
/open-pull-request 12345
</details>


Also applies to: 56-58, 62-64, 102-104, 108-110, 195-197, 201-203

<details>
<summary>🤖 Prompt for AI Agents</summary>

Verify each finding against the current code and only fix it if needed.

In @.claude/skills/fix-bug/SKILL.md around lines 29 - 39, Update all fenced code
blocks in .claude/skills/fix-bug/SKILL.md to include explicit language
identifiers (e.g., use text for workflow/output examples and bash for
shell commands) so markdownlint MD040 stops flagging untyped fences;
specifically add language tags to the top workflow block (the multi-line fence
around "Step 1: Plan Bug Fix..." and the individual command/example fences that
show /plan-bug-fix, /implement-and-verify-fix, /open-pull-request and
their example invocations), and apply the same change to the other untyped
fences reported (the blocks near the other listed locations) so every opening

🧹 Nitpick comments (1)
.claude/skills/fix-bug/SKILL.md (1)

176-177: Narrow staging scope to avoid accidental unrelated docs commits.

Line 176 stages the entire .claude/skills/ tree, which can unintentionally include unrelated edits. Prefer explicit file paths (or git add -p) in this workflow doc.

Suggested patch
-  git add CLAUDE.md .claude/skills/
+  # Stage only files you intentionally changed
+  git add CLAUDE.md .claude/skills/fix-bug/SKILL.md
+  # or use: git add -p
   git commit -m "Docs: Improve [skill-name] instructions based on workflow execution"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.claude/skills/fix-bug/SKILL.md around lines 176 - 177, Replace the broad
staging command "git add CLAUDE.md .claude/skills/" with a narrowly-scoped add
(e.g., "git add CLAUDE.md .claude/skills/<specific-file>.md" or use "git add -p"
to interactively stage hunks) so unrelated files under .claude/skills/ are not
included; update the workflow text that shows the commit command accordingly
(keep the git commit -m line but ensure the preceding git add shows the explicit
path or -p usage).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In @.claude/skills/fix-bug/SKILL.md:
- Around line 29-39: Update all fenced code blocks in
.claude/skills/fix-bug/SKILL.md to include explicit language identifiers (e.g.,
use ```text for workflow/output examples and ```bash for shell commands) so
markdownlint MD040 stops flagging untyped fences; specifically add language tags
to the top workflow block (the multi-line fence around "Step 1: Plan Bug Fix..."
and the individual command/example fences that show `/plan-bug-fix`,
`/implement-and-verify-fix`, `/open-pull-request` and their example
invocations), and apply the same change to the other untyped fences reported
(the blocks near the other listed locations) so every opening ``` has a language
identifier.

---

Nitpick comments:
In @.claude/skills/fix-bug/SKILL.md:
- Around line 176-177: Replace the broad staging command "git add CLAUDE.md
.claude/skills/" with a narrowly-scoped add (e.g., "git add CLAUDE.md
.claude/skills/<specific-file>.md" or use "git add -p" to interactively stage
hunks) so unrelated files under .claude/skills/ are not included; update the
workflow text that shows the commit command accordingly (keep the git commit -m
line but ensure the preceding git add shows the explicit path or -p usage).

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d962d1a and eb31841.

📒 Files selected for processing (1)
  • .claude/skills/fix-bug/SKILL.md

@valentinpalkovic valentinpalkovic force-pushed the valentin/agentic-workflow branch from b9c0c21 to f0afb5b Compare March 3, 2026 21:33
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this a GitHub Workflow instead of another job in CircleCI?

@yannbf yannbf added agent and removed agent labels Mar 5, 2026
yannbf and others added 9 commits March 5, 2026 12:39
- Replace page.goto() + waitUntilLoaded() pattern with navigateToStory()
- Add note: Controls panel argType rows need both args + control to render
- Add troubleshooting tips: error-context.md ARIA snapshot and react-aria
  dialog accessible name behaviour

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Apply labels via 'gh pr edit --add-label' after PR creation instead of
'--label' in 'gh pr create'. The --label flag can fail silently when
GitHub's label lookup is slow, leaving the PR without the 'agent' label.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…GitHub.com Copilot

Remove stale 'Copilot handles it automatically' wording from description,
overview, and summary. The skill now explicitly uses 'gh pr create' +
'gh pr edit --add-label' in Step 3, which works in both Claude Code CLI
and GitHub.com Copilot coding agent environments.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
'gh pr edit --add-label' only works for maintainers/Copilot with write
access. For non-maintainers, fall back to leaving a PR comment listing
the labels to apply. Use '|| true' so the step doesn't hard-fail.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
ci:normal is now added alongside agent+bug in Step 3, but the step is
explicitly gated by a pre-flight checklist: verification-checklist must
pass, flow-specific verification must be done, and PR evidence section
must be fully populated. ci:normal is never applied if any of those
are incomplete.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Story IDs are hard to predict due to Storybook's title calculation logic.
Add a note to manager-bug-workflow and renderer-bug-workflow explaining
that agents can query http://localhost:6006/index.json to look up the
exact story id, title, and name instead of guessing the URL pattern.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

build Internal-facing build tooling & test updates ci:normal

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

[Tracking]: Storybook Copilot Agent Verification Suite

3 participants