diff --git a/.agents/skills/verify-recipe-author/SKILL.md b/.agents/skills/verify-recipe-author/SKILL.md new file mode 100644 index 000000000000..5baff97444f9 --- /dev/null +++ b/.agents/skills/verify-recipe-author/SKILL.md @@ -0,0 +1,246 @@ +--- +name: verify-recipe-author +description: Generate the Playwright recipe spec for a PR-verify-pr-generate prompt bundle. Reads `.verify-output//prompt-bundle.json`, dispatches the OMC executor agent (model=opus), runs deny-regex, writes `.verify-recipes/pr-<#>.spec.ts` with header-comment provenance, lints with one retry, emits `.verify-output//result.json`. Trigger after `yarn verify-pr-generate`. +allowed-tools: Agent, Bash, Read, Write, Edit +--- + +# Verify Recipe Author + +Consumes a prompt bundle emitted by `yarn verify-pr-generate --pr <#>` and produces the per-PR Playwright recipe spec for human review. Authoring only — never executes the spec. + +This skill is invoked **after** `yarn verify-pr-generate --pr <#>` succeeds. The bun script does the deterministic I/O (gh fetch, triage, prompt assembly, bundle write); this skill drives the agent dispatch, deny-regex, lint, retry, and the final write to `.verify-recipes/pr-<#>.spec.ts`. + +The full design and acceptance criteria live in `/Users/valentinpalkovic/Projects/storybook/.omc/plans/pr-verify-v3-agent-generated-recipes.md` (§Lane C, §D6, §D8, §D9). Read the plan if anything below is ambiguous. + +## Inputs + +No args required. The skill discovers the most recent bundle automatically. The caller may optionally pass an explicit bundle path as the skill argument. + +1. **Auto-discover (default)**: list `/Users/valentinpalkovic/Projects/storybook/.verify-output/`, pick the directory with the lexicographically largest name (ISO timestamps sort correctly), then read `prompt-bundle.json` inside it. +2. **Explicit path**: if the user passed an absolute path to a `prompt-bundle.json`, read that file directly. + +Bundle shape (see `scripts/verify-pr-generate.ts` for the canonical emitter): + +```jsonc +{ + "version": 1, + "prNumber": 12345, + "runId": "...", + "outputSpecPath": "/abs/path/.verify-recipes/pr-12345.spec.ts", + "force": false, + "prompt": "", + "metadata": { + "agentModel": "claude-opus-4-7[1m]", + "referenceSpecs": ["..."], + "triageGlobs": ["..."], + "generatedAt": "" + } +} +``` + +The `` is the parent directory of the bundle — derive it from the bundle path, not from a field. + +## Runbook + +Follow these steps in order. Stop and emit `result.json` per §Failure Modes on any non-success outcome. + +### Step 1 — Read the bundle + +`Read` the bundle JSON. Capture `prNumber`, `runId` (from the parent dir), `outputSpecPath`, `force`, `prompt`, and `metadata`. + +### Step 2 — Pre-flight collision check (D9, TOCTOU re-guard) + +Re-check whether `bundle.outputSpecPath` already exists. The bun script enforced D9 at bundle-emit time; the skill re-checks because the user may have created the file between the two steps. + +- If the file exists and `bundle.force === false` → write `result.json` with `{ status: "collision-aborted", specPath: , attempts: 0 }` and stop. +- Otherwise proceed. + +### Step 3 — Dispatch the agent (attempt 1) + +``` +Agent({ + description: "Generate PR recipe spec", + subagent_type: "oh-my-claudecode:executor", + model: "opus", + prompt: bundle.prompt +}) +``` + +The bundle's `prompt` already contains the full authoring contract, reference specs, PR diff, and fence-marker instruction (`<<>>` … `<<>>`). + +### Step 4 — Extract spec body + +Parse the agent's reply for the text strictly between `<<>>` and `<<>>` (exclusive of the markers, trimmed). + +- If both markers are present and the body is non-empty → continue to Step 5. +- Otherwise → treat as a fence-miss. On attempt 1, jump to Step 9 with a retry message asking the agent to re-emit between fence markers and nothing else. If attempt 2 also fences-misses → write `result.json` `{ status: "agent-emitted-no-spec", attempts: 2 }` and stop. + +### Step 5 — Deny-regex (security gate, NO retry) + +Run the deny-regex pure function from `/Users/valentinpalkovic/Projects/storybook/scripts/verify/recipe-deny.ts` via Bash, executed from the repo root: + +```bash +bun -e "import('/Users/valentinpalkovic/Projects/storybook/scripts/verify/recipe-deny.ts').then(m => { m.assertNoDeniedPatterns(process.argv[1]); })" -- "$SPEC_BODY" +``` + +Pass the spec body via a temp file (avoid shell-escaping pitfalls): write the body to `.verify-output//.deny-input.txt`, then read it inside the bun one-liner. The function throws on hit. + +- On any throw → write `result.json` `{ status: "deny-regex-failed", error: , attempts: }` and stop. **Do not retry — deny hits are security blockers.** +- On exit 0 → continue. + +### Step 6 — Prepend header-comment provenance (D8) + +Build a JSON-pretty block from `bundle.metadata` and `bundle.prNumber`, then prepend it as a block comment to the spec body: + +```ts +/** + * verify-pr-generate: AUTO-GENERATED — review and commit alongside the PR + * { + * "generatedAt": "", + * "agentModel": "", + * "prNumber": , + * "referenceSpecs": [ "", ... ], + * "triageGlobs": [ "", ... ] + * } + */ + +``` + +The JSON body inside the comment uses 2-space indentation. Every line of the embedded JSON begins with ` * ` to keep the block-comment well-formed. + +### Step 7 — Write the file + +`Write` the assembled content (header + spec body) to `bundle.outputSpecPath` (absolute path from the bundle). + +### Step 8 — Pipe to `verify-pr-author` (D4-α sentinel-exit-75 contract) + +Lint, deny-regex, post-write regex checks, header-comment provenance, retry-message +authoring, and result.json emission all live in TypeScript core. The skill's job is +strictly to dispatch the agent and pipe its raw reply into the author CLI. + +#### 8a. Dispatch the agent (attempt 1) + +``` +Agent({ + description: "Generate PR recipe spec", + subagent_type: "oh-my-claudecode:executor", + model: "opus", + prompt: bundle.prompt +}) +``` + +Capture the agent's full reply as ``. + +#### 8b. Pipe to `verify-pr-author --bundle --dispatch-mode stdin` + +```bash +printf '%s' "$REPLY" | node /Users/valentinpalkovic/Projects/storybook/scripts/verify-pr-author.ts --bundle --dispatch-mode stdin +``` + +The CLI runs the deny-regex, header-comment provenance, file write, scoped lint +(`scripts/verify/lint-invocation.ts`), and post-write regex checks. Exit codes: + +- `0` — success. CLI has already written the spec and `result.json`. Skip to Step 10. +- `1` — non-retryable error (deny-regex hit, collision, IO error). CLI has written + `result.json` with the failure status. Print the failure line and stop. +- `75` — retryable lint/structural failure. The CLI has emitted a framed retry + block on stdout (see Step 9). Continue to Step 9. + +Treat **exit 75** as the sole retry sentinel. Any other non-zero exit is terminal. + +### Step 9 — Retry once (attempt 2, sentinel-exit-75) + +On exit 75 from Step 8b, parse stdout for the framed retry block: + +``` +===VERIFY_PR_AUTHOR_RETRY_BEGIN=== + +===VERIFY_PR_AUTHOR_RETRY_END=== +``` + +Extract the lines strictly between the BEGIN/END markers and assemble the retry +prompt as: + +``` + + +[RETRY] + +``` + +Dispatch the agent again with the assembled retry prompt (same `subagent_type` and +`model`). Capture the new reply as ``, then pipe it back through the CLI in +retry mode: + +```bash +printf '%s' "$REPLY2" | node /Users/valentinpalkovic/Projects/storybook/scripts/verify-pr-author.ts --bundle --dispatch-mode stdin --retry-of +``` + +The CLI enforces `RECIPE_RETRY_POLICY.maxAttempts` (currently 2) — it will not +re-emit exit 75 on the retry call. Expected exits: + +- `0` — success. Skip to Step 10. +- `1` — terminal failure (lint exhausted, regex-check exhausted, deny-regex hit). + CLI has written `result.json` with `attempts: 2` and the failure status. Print + the failure line and stop. + +### Step 10 — Emit `result.json` on success + +Write `/Users/valentinpalkovic/Projects/storybook/.verify-output//result.json`: + +```jsonc +{ + "version": 1, + "status": "spec-written", + "specPath": "", + "attempts": 1 | 2, + "lint": "clean", + "regex": { "listenerBeforeGoto": true, "attachPattern": true }, + "agentModel": "", + "generatedAt": "" +} +``` + +### Step 11 — Print actionable next-step lines + +Emit these lines (one per line) to the agent's final reply: + +``` +[verify-recipe-author] spec written: +[verify-recipe-author] result.json: +[verify-recipe-author] attempts: | lint: clean +[verify-recipe-author] Next: review the spec, then run `yarn verify-pr --recipe-spec ` +``` + +## Failure Modes + +Always write `result.json` to `.verify-output//result.json` before stopping. Schema is the same as Step 10 with the `status` field swapped. + +| Cause | `result.json.status` | Retry? | +|---|---|---| +| `outputSpecPath` exists and `force === false` | `collision-aborted` | no | +| Agent reply lacks fence markers / empty body | `agent-emitted-no-spec` | yes (1 retry) | +| `assertNoDeniedPatterns` throws | `deny-regex-failed` | **NO — security block** | +| Lint exit non-zero | `lint-failed` | yes (1 retry) | +| Post-write regex check failed (listener-before-goto OR attach pattern) | `regex-check-failed` | yes (1 retry, fed back as lint-equivalent) | +| All gates pass | `spec-written` | n/a | + +Print the failure cause + the `result.json` path on stop: + +``` +[verify-recipe-author] FAILED: — see +``` + +## Notes + +- This skill runs inside Claude Code; it uses `Agent`, `Read`, `Write`, `Bash`, and `Edit` tools. +- All paths in invocations are absolute. Lint commands `cd code` via `yarn --cwd`. +- Max attempts = `RECIPE_RETRY_POLICY.maxAttempts` (currently 2). Read the value from `scripts/verify/recipe-retry-policy.ts` — do not hardcode. +- The skill **never executes** the generated spec. The human review gate (Phase-1 lethal-trifecta breaker) is preserved. +- Deny-regex hits are not retried — they are security signals, not lint nits. +- Cap retry feedback at 5 errors (R3). +- The `runId` is the basename of the parent directory of the bundle; do not invent a new one. + +## Phase-2 follow-up + +This skill currently couples generation to a running Claude Code session via the `Agent` tool dispatch. Phase-2 CI activation will require migrating to a direct Anthropic SDK call (`@anthropic-ai/sdk`) with an `ANTHROPIC_API_KEY` env var, replacing the `Agent` dispatch with a standalone API call so the workflow at `.github/workflows/verify-pr.yml` can run unattended. Tracked as a follow-up in the plan's ADR §Follow-ups. diff --git a/.claude/skills/verify-recipe-author/SKILL.md b/.claude/skills/verify-recipe-author/SKILL.md new file mode 100644 index 000000000000..534cd1871898 --- /dev/null +++ b/.claude/skills/verify-recipe-author/SKILL.md @@ -0,0 +1 @@ +@../../../.agents/skills/verify-recipe-author/SKILL.md diff --git a/.dockerignore b/.dockerignore new file mode 100644 index 000000000000..29fd5ffe235c --- /dev/null +++ b/.dockerignore @@ -0,0 +1,19 @@ +.env +.env.* +**/.env +**/.env.* +~/.ssh/ +~/.aws/ +~/.config/gcloud/ +~/.azure/ +~/.docker/config.json +~/.kube/config +.npmrc +.pypirc +**/*-service-account.json +**/*.pem +**/*.key +~/.git-credentials +.verify-output/ +node_modules/ +.nx/ diff --git a/.github/workflows/verify-pr.yml b/.github/workflows/verify-pr.yml new file mode 100644 index 000000000000..e41cb7d47479 --- /dev/null +++ b/.github/workflows/verify-pr.yml @@ -0,0 +1,324 @@ +name: PR Verification Harness +on: + pull_request_target: + types: [labeled, synchronize] + paths-ignore: ['docs/**', '**/*.md', '.github/**'] + +# v6 single-round model: recipe is authored AND executed in the same workflow +# run. The authored spec is materialised straight into the untrusted PR-head +# workspace ($RUNNER_TEMP/pr-head/.verify-recipes/pr-<#>.spec.ts) — never +# committed. See scripts/verify/SECURITY.md for the load-bearing controls. + +jobs: + verify: + if: github.event.pull_request.draft == false && contains(github.event.pull_request.labels.*.name, 'ci:verify') + runs-on: ubuntu-22.04 + timeout-minutes: 30 + concurrency: + group: verify-${{ github.event.pull_request.number }} + cancel-in-progress: true + permissions: + pull-requests: write + issues: write + statuses: write + contents: write # needed to push screenshots to the _verify-screenshots side branch + steps: + - name: Check actor permission + uses: prince-chrismc/check-actor-permissions-action@d504e74ba31658f4cdf4fcfeb509d4c09736d88e # v3.0.2 + with: + permission: write + - name: Checkout base + uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2 + with: + # `pull_request.base.sha` is captured at PR creation / sync and lags + # behind the actual base-branch tip when the base advances independently + # of the PR head. The harness depends on always-latest trusted scripts + # (verify-pr-generate flags, deny-regex, lint config) so we pin to the + # branch ref instead — pull_request_target keeps execution on trusted + # workflow + script content either way. + ref: ${{ github.event.pull_request.base.ref }} + - name: Setup Node.js and Install Dependencies + uses: ./.github/actions/setup-node-and-install + with: + install-code-deps: true + - name: Setup Bun + uses: oven-sh/setup-bun@0c5077e51419868618aeaa5fe8019c62421857d6 # v2.2.0 + + - name: Compute paths + # Export PR_HEAD_DIR via $GITHUB_ENV so every subsequent step can + # reference $PR_HEAD_DIR (shell) or ${{ env.PR_HEAD_DIR }} (YAML). + # ${{ runner.temp }} is only valid in step contexts, not at job-env + # level — this is the canonical workaround. + run: echo "PR_HEAD_DIR=${{ runner.temp }}/pr-head" >> "$GITHUB_ENV" + + - name: Checkout PR head (untrusted execution context) + # Manual git clone instead of actions/checkout because the repo carries + # gitlinks (.external/addon-svelte-csf etc.) without a .gitmodules file, + # which breaks actions/checkout's submodule-foreach cleanup pass under + # persist-credentials: false. The manual clone never writes auth into + # pr-head/.git, preserving the same security posture as + # persist-credentials: false. + # + # Cloning into $RUNNER_TEMP (outside the base checkout tree) so that + # nx / yarn / jiti module resolution inside pr-head never walks up + # into the base checkout's node_modules — that escape was the round-3 + # / round-4 failure mode (eslint-plugin prebuild resolved + # `storybook/csf` to the base copy, which has no `dist/csf` yet). + env: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + PR_HEAD_SHA: ${{ github.event.pull_request.head.sha }} + REPO: ${{ github.repository }} + run: | + set -euo pipefail + git -c protocol.version=2 clone \ + --no-tags --no-checkout --filter=blob:none \ + "https://x-access-token:${GITHUB_TOKEN}@github.com/${REPO}.git" \ + "$PR_HEAD_DIR" + cd "$PR_HEAD_DIR" + git -c protocol.version=2 fetch --no-tags --depth=1 origin "$PR_HEAD_SHA" + git checkout --force "$PR_HEAD_SHA" + git remote set-url origin "https://github.com/${REPO}.git" + git config --local --unset-all credential.helper 2>/dev/null || true + + - name: Fetch PR diff + run: gh pr diff ${{ github.event.pull_request.number }} > /tmp/pr.diff + env: + GH_TOKEN: ${{ secrets.GITHUB_TOKEN }} + + - name: Generate bundle + # Trusted scripts (base checkout) read trusted authoring-guide + + # canonical smoke + triage rules from the base. The bundle's + # outputSpecPath is set to the ephemeral PR-head path so the Author + # step writes the recipe straight into the untrusted workspace. + run: | + yarn verify-pr-generate \ + --pr ${{ github.event.pull_request.number }} \ + --force \ + --output "$PR_HEAD_DIR/.verify-recipes/pr-${{ github.event.pull_request.number }}.spec.ts" + env: + GH_TOKEN: ${{ secrets.GITHUB_TOKEN }} + + - name: Author recipe + # ANTHROPIC_API_KEY is scoped to THIS step only. The Verify step that + # executes the authored spec has no API key. Static deny-regex + + # scoped lint inside recipe-author-core gate what lands on disk. + run: | + # shellcheck disable=SC2012 + BUNDLE=$(ls -t .verify-output/*/prompt-bundle.json | head -1) + yarn verify-pr-author --bundle "$BUNDLE" + env: + ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }} + + - name: Verify PR + working-directory: ${{ runner.temp }}/pr-head + env: + PR_NUMBER: ${{ github.event.pull_request.number }} + # Force Nx to run locally — the fork harness firetest runs without + # Nx Cloud org auth, and the Verify step does not need distributed + # cache. Upstream CI behaviour is unaffected (other steps that + # already use Nx Cloud auth continue to do so). + NX_NO_CLOUD: 'true' + NX_CLOUD_ACCESS_TOKEN: '' + run: | + set -euo pipefail + yarn install --immutable + # Install the Playwright chromium browser binary on the runner + # (the GitHub-hosted ubuntu-22.04 image does not pre-bundle + # @playwright/test browsers). + yarn playwright install --with-deps chromium + # Compile core FIRST so eslint-plugin's prebuild script (which + # imports from `storybook/csf` via jiti) can resolve + # `storybook/dist/csf/index.js`. nx run-many parallelises by + # default and does not topo-order compile targets unless an + # explicit `dependsOn: ["^compile"]` is set, which the repo + # currently lacks for the eslint-plugin chain. + yarn nx compile core + # Then compile every other NX project so the internal-ui dev + # server can resolve @storybook/react-vite + every addon + # listed in code/.storybook/main.ts. + yarn nx run-many -t compile + yarn verify-pr --recipe-spec ".verify-recipes/pr-${PR_NUMBER}.spec.ts" + + - name: Read verdict + id: verdict + if: always() + run: | + set -euo pipefail + RESULT="" + if compgen -G "$PR_HEAD_DIR/.verify-output/*/verify-result.json" >/dev/null; then + RESULT=$(ls -t "$PR_HEAD_DIR"/.verify-output/*/verify-result.json | head -1) + fi + if [ -n "$RESULT" ]; then + VERDICT=$(jq -r '.verdict' "$RESULT") + echo "verdict=$VERDICT" >> "$GITHUB_OUTPUT" + echo "verdict: $VERDICT" >> "$GITHUB_STEP_SUMMARY" + jq -r '"target=\(.template) regressionReason=\(.regressionReason // "n/a")"' "$RESULT" >> "$GITHUB_STEP_SUMMARY" + else + echo "verdict=missing" >> "$GITHUB_OUTPUT" + fi + + - name: Apply verified-by-harness label + if: steps.verdict.outputs.verdict == 'verified' + env: + GH_TOKEN: ${{ secrets.GITHUB_TOKEN }} + run: | + # Ensure the label exists on the repo (idempotent — `gh label create` + # exits non-zero if the label already exists, which we swallow). + gh label create verified-by-harness \ + --repo ${{ github.repository }} \ + --description "Verified by PR Verify Harness" \ + --color 0E8A16 \ + 2>/dev/null || true + gh pr edit ${{ github.event.pull_request.number }} \ + --repo ${{ github.repository }} \ + --add-label verified-by-harness + + - name: Push screenshots to side branch + id: screenshots + if: always() && steps.verdict.outputs.verdict != '' && steps.verdict.outputs.verdict != 'missing' + env: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + PR_NUMBER: ${{ github.event.pull_request.number }} + REPO: ${{ github.repository }} + RUN_ID: ${{ github.run_id }} + run: | + set -euo pipefail + SOURCE="$PR_HEAD_DIR/.verify-output" + if ! find "$SOURCE" -name '*.png' -print -quit | grep -q .; then + echo "no screenshots produced — skipping side-branch push" + echo "urls=[]" >> "$GITHUB_OUTPUT" + exit 0 + fi + + ASSETS_DIR="$RUNNER_TEMP/verify-assets" + AUTH_URL="https://x-access-token:${GITHUB_TOKEN}@github.com/${REPO}.git" + + # Try cloning the existing side branch; if it doesn't exist yet, + # create an orphan branch with a single empty commit. + if git -c protocol.version=2 clone --depth 1 --branch _verify-screenshots \ + "$AUTH_URL" "$ASSETS_DIR" 2>/dev/null; then + echo "cloned existing _verify-screenshots branch" + else + echo "side branch missing — creating orphan _verify-screenshots" + git -c protocol.version=2 clone --depth 1 "$AUTH_URL" "$ASSETS_DIR" + ( + cd "$ASSETS_DIR" + git -c user.email=actions@github.com -c user.name="github-actions[bot]" \ + checkout --orphan _verify-screenshots + git rm -rf . >/dev/null + echo "Verify harness screenshot store. Auto-managed; do not edit." > README.md + git add README.md + git -c user.email=actions@github.com -c user.name="github-actions[bot]" \ + commit -m "chore: init _verify-screenshots branch" + git push origin _verify-screenshots + ) + fi + + PR_DIR="$ASSETS_DIR/pr-${PR_NUMBER}/${RUN_ID}" + rm -rf "$ASSETS_DIR/pr-${PR_NUMBER}/${RUN_ID}" + mkdir -p "$PR_DIR" + + # Copy each runId//.png preserving the structure + # under the run directory. + ( + cd "$SOURCE" + find . -name '*.png' -print0 + ) | while IFS= read -r -d '' rel; do + dest="$PR_DIR/${rel#./}" + mkdir -p "$(dirname "$dest")" + cp "$SOURCE/$rel" "$dest" + done + + cd "$ASSETS_DIR" + git add "pr-${PR_NUMBER}/${RUN_ID}" + if git diff --cached --quiet; then + echo "no new screenshots to commit" + echo "urls=[]" >> "$GITHUB_OUTPUT" + exit 0 + fi + git -c user.email=actions@github.com -c user.name="github-actions[bot]" \ + commit -m "verify: PR #${PR_NUMBER} run ${RUN_ID}" + git push origin _verify-screenshots + COMMIT_SHA=$(git rev-parse HEAD) + + # Emit JSON array of { rel, url } for each PNG, where url is the + # raw.githubusercontent.com URL pinned at the commit we just pushed. + URLS=$( + cd "$ASSETS_DIR/pr-${PR_NUMBER}/${RUN_ID}" + find . -name '*.png' \ + | sed "s|^\./||" \ + | jq -R --arg base "https://raw.githubusercontent.com/${REPO}/${COMMIT_SHA}/pr-${PR_NUMBER}/${RUN_ID}" \ + '{ rel: ., url: ($base + "/" + .) }' \ + | jq -s . + ) + echo "urls<> "$GITHUB_OUTPUT" + echo "$URLS" >> "$GITHUB_OUTPUT" + echo "EOF" >> "$GITHUB_OUTPUT" + + - name: Upload artifacts + if: always() + uses: actions/upload-artifact@043fb46d1a93c77aae656e7c1c64a875d1fc6a0a # v7.0.1 + with: + name: verify-output-pr-${{ github.event.pull_request.number }}-${{ github.run_id }} + path: | + ${{ runner.temp }}/pr-head/.verify-output/ + ${{ runner.temp }}/pr-head/.verify-recipes/pr-${{ github.event.pull_request.number }}.spec.ts + .verify-output/ + if-no-files-found: warn + include-hidden-files: true + retention-days: 14 + - name: Post PR comment + if: always() + env: + SCREENSHOT_URLS: ${{ steps.screenshots.outputs.urls }} + uses: actions/github-script@3a2844b7e9c422d3c10d287c895573f7108da1b3 # v9.0.0 + with: + script: | + const fs = require('fs'); + const path = require('path'); + const url = `${context.serverUrl}/${context.repo.owner}/${context.repo.repo}/actions/runs/${context.runId}`; + + function renderScreenshots() { + let raw = process.env.SCREENSHOT_URLS; + if (!raw || raw.trim() === '' || raw.trim() === '[]') return ''; + let items; + try { + items = JSON.parse(raw); + } catch { + return ''; + } + if (!Array.isArray(items) || items.length === 0) return ''; + const blocks = items.map( + (it) => `### \`${it.rel}\`\n\n![${it.rel}](${it.url})\n`, + ); + return `\n\n## Screenshots\n\n${blocks.join('\n')}`; + } + + let body; + try { + const root = path.join(process.env.PR_HEAD_DIR || 'pr-head', '.verify-output'); + const dirs = fs.existsSync(root) + ? fs.readdirSync(root, { withFileTypes: true }) + .filter((d) => d.isDirectory()) + .map((d) => d.name) + .sort() + .reverse() + : []; + const resultPath = dirs + .map((d) => path.join(root, d, 'verify-result.json')) + .find((p) => fs.existsSync(p)); + if (!resultPath) { + body = `## Verify Harness\n\nNo verdict produced — the workflow failed before the harness ran (likely recipe-author dispatch, deny-regex, or lint). See [run log](${url}) for details.`; + } else { + const verdict = JSON.parse(fs.readFileSync(resultPath, 'utf-8')); + body = `## Verify Harness\n\nVerdict: \`${verdict.verdict}\` (target \`${verdict.template}\`)${verdict.regressionReason ? `\n\nReason: \`${verdict.regressionReason}\`` : ''}\n\nReplay: \`npx playwright show-trace\` on the trace.zip in [artifacts](${url}/artifacts).${renderScreenshots()}`; + } + } catch (err) { + body = `## Verify Harness\n\nError reading verdict: \`${err.message}\`. See [run log](${url}).`; + } + github.rest.issues.createComment({ + issue_number: context.issue.number, + owner: context.repo.owner, + repo: context.repo.repo, + body, + }); diff --git a/.gitignore b/.gitignore index c8417670f741..0843b0c36dca 100644 --- a/.gitignore +++ b/.gitignore @@ -88,5 +88,8 @@ scripts/eval/results # review-pr skill output .pr-review +# verify-pr harness output +.verify-output + # Unknown .omc diff --git a/.verify-recipes/.eslintrc.cjs b/.verify-recipes/.eslintrc.cjs new file mode 100644 index 000000000000..ba20b3258f4a --- /dev/null +++ b/.verify-recipes/.eslintrc.cjs @@ -0,0 +1,15 @@ +module.exports = { + root: true, + parser: require.resolve('@typescript-eslint/parser'), + parserOptions: { ecmaVersion: 2022, sourceType: 'module', project: false }, + env: { node: true, es2022: true }, + plugins: ['@typescript-eslint'], + extends: ['eslint:recommended', 'plugin:@typescript-eslint/recommended'], + rules: { + 'no-unused-vars': 'off', + '@typescript-eslint/no-unused-vars': [ + 'error', + { argsIgnorePattern: 'none', varsIgnorePattern: 'none' }, + ], + }, +}; diff --git a/.verify-recipes/.gitkeep b/.verify-recipes/.gitkeep new file mode 100644 index 000000000000..e69de29bb2d1 diff --git a/.verify-recipes/_recipe-authoring-guide.md b/.verify-recipes/_recipe-authoring-guide.md new file mode 100644 index 000000000000..6d35ae316686 --- /dev/null +++ b/.verify-recipes/_recipe-authoring-guide.md @@ -0,0 +1,235 @@ +# Recipe Authoring Guide (for LLM recipe-author agents) + +This file is the **authoring contract** for agent-generated Playwright recipes in `.verify-recipes/`. The `verify-recipe-author` skill includes this guide verbatim in the prompt; the runner executes the committed spec via `bun x playwright test`. + +> **Audience:** an LLM that writes a single `.spec.ts` file for one PR. The output must match the contract below exactly — no exceptions. + +--- + +## 1. Output contract + +Emit **one file** at the path specified by the skill: `.verify-recipes/pr-<#>.spec.ts`. + +Required shape: + +```ts +import { expect, test } from '@playwright/test'; + +import { RecipePage } from './_util.ts'; + +test('', async ({ page }, testInfo) => { + // ... see rules below ... +}); +``` + +Hard requirements: + +- **Imports**: ONLY `@playwright/test` and `./_util.ts`. Nothing else. No `node:*`, no `child_process`, no `fs`, no `@storybook/*`, no relative imports outside `.verify-recipes/`. +- **Exactly one `test(...)` call.** No `describe`, no `test.skip`, no `test.only`, no `beforeEach`/`afterEach`. +- **`.ts` extension on relative imports** (`./_util.ts`, not `./_util`). +- **No top-level side effects** — everything inside the `test(...)` callback. + +Output is wrapped between fenced markers `<<>>` and `<<>>` (the skill strips these and writes the body). + +--- + +## 2. Listener-before-goto rule (HARD GATE — AC-V3-3) + +`page.on('pageerror', ...)` and `page.on('console', ...)` listeners MUST be registered **before** the first `page.goto(...)` call. The skill's post-write regex check enforces this; if you call `page.goto` first, the spec is rejected. + +Canonical pattern: + +```ts +test('my recipe', async ({ page }, testInfo) => { + const pageErrors: string[] = []; + const consoleErrors: string[] = []; + + // Listeners FIRST. Always. + page.on('pageerror', (err) => { + pageErrors.push(err.stack ?? err.message ?? String(err)); + }); + page.on('console', (msg) => { + if (msg.type() === 'error') consoleErrors.push(msg.text()); + }); + + const baseURL = + process.env.STORYBOOK_URL ?? testInfo.project.use.baseURL ?? 'http://localhost:6006'; + + // Now (and only now) navigate. + await page.goto(`${baseURL}/?path=/story/example-button--primary`); + // ... +}); +``` + +Never call `page.goto` (or `page.waitForURL`, or any other navigation primitive) before the listeners are attached. + +--- + +## 3. Attach pattern (HARD GATE — AC-V3-4) + +The runner harvests `pageErrors` and `consoleErrors` from test attachments. You MUST attach both in a `finally` block (so attachments land even on assertion failure): + +```ts +try { + // ...goto + assertions... +} finally { + await testInfo.attach('pageErrors', { + body: JSON.stringify(pageErrors), + contentType: 'application/json', + }); + await testInfo.attach('consoleErrors', { + body: JSON.stringify(consoleErrors), + contentType: 'application/json', + }); +} +``` + +Attachment names are exactly `pageErrors` and `consoleErrors`. The body is JSON-stringified array of strings (already accumulated by the listeners). + +--- + +## 4. `RecipePage` API (the only helper) + +From `./_util.ts`: + +```ts +new RecipePage(page, expect).waitUntilLoaded(): Promise +new RecipePage(page, expect).previewIframe(): FrameLocator +new RecipePage(page, expect).previewRoot(): Locator +new RecipePage(page, expect).waitForStoryLoaded(): Promise +``` + +- `waitUntilLoaded()` injects a session-storage layout, disables transitions, waits for `.sb-preparing-story` / `.sb-preparing-docs` to vanish, then for the story root to be attached. +- `previewIframe()` returns `page.frameLocator('#storybook-preview-iframe')` — use for any preview-frame assertions. +- `previewRoot()` returns the visible `#storybook-root` (or `#storybook-docs`) inside the preview iframe. + +Call `waitUntilLoaded()` immediately after `page.goto(...)`. + +--- + +## 5. Selectors and locators + +Preferred (in priority order): + +1. `page.getByRole(...)` — accessibility-tree queries, most stable +2. `page.getByTestId(...)` / `data-testid` selectors +3. ID selectors (`#storybook-preview-iframe`, `#sb-errordisplay`, `#storybook-root`) +4. Class selectors that look stable (`.sb-preparing-story` etc.) + +Avoid: + +- `:nth-child(N)` chains — break on layout shifts +- Brittle class chains (`.foo .bar .baz > div`) +- Free-text matches without `i18n` context +- `setTimeout` / `page.waitForTimeout` for synchronization — use Playwright web-first assertions or `RecipePage.waitUntilLoaded()` + +--- + +## 6. Story URL routing + +- Story: `?path=/story/--` (e.g., `/?path=/story/example-button--primary`) +- Docs: `?path=/docs/--` (e.g., `/?path=/docs/example-button--docs`) +- Manager only (no story): omit the `path` param or use `?path=/` + +Kind-ids are kebab-case from the story `title` field; story-ids are kebab-case from the export name. When the PR diff names a `*.stories.tsx` file, derive the kind-id from the file path or the `title:` line in the diff. + +--- + +## 7. Frame access + +- Manager DOM (toolbar, sidebar, addon panels): use `page` directly. +- Preview iframe DOM (the story itself): use `page.frameLocator('#storybook-preview-iframe')` or `recipe.previewIframe()`. + +Example: + +```ts +const recipe = new RecipePage(page, expect); +await recipe.waitUntilLoaded(); + +const previewIframe = recipe.previewIframe(); +const button = previewIframe.getByRole('button', { name: /primary/i }); +await expect(button).toBeVisible(); +``` + +--- + +## 8. Assertions — what counts as a meaningful recipe + +A **smoke-shaped recipe** is the minimum: navigate to a story, wait until loaded, assert preview-root has children, `#sb-errordisplay` is hidden, screenshot the iframe, attach errors. See `example-smoke.spec.ts` for the canonical form. + +A **targeted recipe** goes further. Examples per change type: + +| Diff touches | Recipe should additionally | +|---|---| +| `code/addons//**` | Open the addon panel (`recipe.previewIframe()` may be irrelevant; manager queries needed); assert addon-tab present; trigger the addon's primary interaction | +| `code/core/src/manager/**` | Assert sidebar entries render; navigate between two stories; assert URL update | +| `code/core/src/manager-api/**` | Assert at least one channel-bound UI element responds (e.g., theme toggle, tab switch) | +| `code/core/src/csf-tools/**` | Open a story whose CSF the PR touches; assert it indexes (visible in sidebar tree) | +| `code/core/src/preview-api/**` | Open a story with args/decorators in scope; assert `previewRoot()` rendered without errordisplay | +| `code/frameworks//**` | Use the framework's reference template story (e.g., svelte → svelte-vite default story); confirm SSR/CSR hydration shape if applicable | +| `code/builders/**` | Assert preview-iframe loads at all (builder errors surface here); navigate to a story; confirm HMR not needed for static load | + +Pick the assertion shape that most directly observes the changed code path. Prefer 1-3 focused assertions over a long list — the runner harvests pageerrors/consoleerrors orthogonally. + +--- + +## 9. What to AVOID (skill's deny-regex enforces several of these) + +| Pattern | Why | +|---|---| +| `import ... from 'child_process'` / `require('child_process')` | Recipes never spawn subprocesses | +| `fs.unlink`, `fs.rm`, `fs.rmdir`, `fsp.unlink`, etc. | Recipes never delete files | +| `process.exit(...)` | Playwright handles test exit codes; never short-circuit | +| `eval(...)` | Never. Use `page.evaluate(...)` if you need in-browser execution | +| `import 'node:...'` (Node-only modules) | Recipes are Playwright-test files, not orchestration scripts | +| `@storybook/...` direct imports | Adds the non-erasable TS-enum chain that breaks under bun's strip-types path | +| `page.waitForTimeout(N)` | Always avoid time-based waits; use web-first assertions | +| `test.only`, `test.skip`, `describe.only` | Single test only; no skipping | +| Network calls (`fetch`, `axios`, etc.) inside the spec body | Storybook is local; no external endpoints | + +--- + +## 10. Header comment provenance (the skill prepends this) + +After you emit your spec body, the `verify-recipe-author` skill prepends a block comment with `{ generatedAt, agentModel, prNumber, referenceSpecs, triageGlobs }`. Do NOT emit this yourself — the skill owns it. + +--- + +## 11. Worked example (reference shape) + +See `.verify-recipes/example-smoke.spec.ts` for the canonical minimum. Your output should look structurally similar: listeners → goto → `waitUntilLoaded` → assertions → `finally` attach → `expect(pageErrors).toEqual([])`. + +--- + +## 12. Target selection (v6) + +Pick one of two execution targets via a single-line header comment as +the **first non-empty line** of the spec: + +```ts +// @verify-target: internal-ui +// or: +// @verify-target: sandbox:react-vite/default-ts +``` + +| Target | What the harness boots | Pick when | +|---|---|---| +| `internal-ui` (default if header absent) | `code/storybook-static/` served via `http-server`. Built once from the PR-head monorepo. | The diff touches a package that the internal Storybook UI exercises (manager, manager-api, channels, core-server, addons, csf-tools, preview-api). This is the right answer for ~all PRs. | +| `sandbox: