diff --git a/.agents/skills/verify-recipe-author/SKILL.md b/.agents/skills/verify-recipe-author/SKILL.md new file mode 100644 index 000000000000..9d99a1108743 --- /dev/null +++ b/.agents/skills/verify-recipe-author/SKILL.md @@ -0,0 +1,246 @@ +--- +name: verify-recipe-author +description: Generate the Playwright recipe spec for a PR-verify-pr-generate prompt bundle. Reads `.verify-output//prompt-bundle.json`, dispatches the OMC executor agent (model=opus), runs deny-regex, writes `.verify-recipes/pr-<#>.spec.ts` with header-comment provenance, lints with one retry, emits `.verify-output//result.json`. Trigger after `yarn verify-pr-generate`. +allowed-tools: Agent, Bash, Read, Write, Edit +--- + +# Verify Recipe Author + +Consumes a prompt bundle emitted by `yarn verify-pr-generate --pr <#>` and produces the per-PR Playwright recipe spec for human review. Authoring only — never executes the spec. + +This skill is invoked **after** `yarn verify-pr-generate --pr <#>` succeeds. The bun script does the deterministic I/O (gh fetch, triage, prompt assembly, bundle write); this skill drives the agent dispatch, deny-regex, lint, retry, and the final write to `.verify-recipes/pr-<#>.spec.ts`. + +The full design and acceptance criteria live in `/Users/valentinpalkovic/Projects/storybook/.omc/plans/pr-verify-v3-agent-generated-recipes.md` (§Lane C, §D6, §D8, §D9). Read the plan if anything below is ambiguous. + +## Inputs + +No args required. The skill discovers the most recent bundle automatically. The caller may optionally pass an explicit bundle path as the skill argument. + +1. **Auto-discover (default)**: list `/Users/valentinpalkovic/Projects/storybook/.verify-output/`, pick the directory with the lexicographically largest name (ISO timestamps sort correctly), then read `prompt-bundle.json` inside it. +2. **Explicit path**: if the user passed an absolute path to a `prompt-bundle.json`, read that file directly. + +Bundle shape (see `scripts/verify-pr-generate.ts` for the canonical emitter): + +```jsonc +{ + "version": 1, + "prNumber": 12345, + "runId": "...", + "outputSpecPath": "/abs/path/.verify-recipes/pr-12345.spec.ts", + "force": false, + "prompt": "", + "metadata": { + "agentModel": "claude-opus-4-7[1m]", + "referenceSpecs": ["..."], + "triageGlobs": ["..."], + "generatedAt": "" + } +} +``` + +The `` is the parent directory of the bundle — derive it from the bundle path, not from a field. + +## Runbook + +Follow these steps in order. Stop and emit `result.json` per §Failure Modes on any non-success outcome. + +### Step 1 — Read the bundle + +`Read` the bundle JSON. Capture `prNumber`, `runId` (from the parent dir), `outputSpecPath`, `force`, `prompt`, and `metadata`. + +### Step 2 — Pre-flight collision check (D9, TOCTOU re-guard) + +Re-check whether `bundle.outputSpecPath` already exists. The bun script enforced D9 at bundle-emit time; the skill re-checks because the user may have created the file between the two steps. + +- If the file exists and `bundle.force === false` → write `result.json` with `{ status: "collision-aborted", specPath: , attempts: 0 }` and stop. +- Otherwise proceed. + +### Step 3 — Dispatch the agent (attempt 1) + +``` +Agent({ + description: "Generate PR recipe spec", + subagent_type: "oh-my-claudecode:executor", + model: "opus", + prompt: bundle.prompt +}) +``` + +The bundle's `prompt` already contains the full authoring contract, reference specs, PR diff, and fence-marker instruction (`<<>>` … `<<>>`). + +### Step 4 — Extract spec body + +Parse the agent's reply for the text strictly between `<<>>` and `<<>>` (exclusive of the markers, trimmed). + +- If both markers are present and the body is non-empty → continue to Step 5. +- Otherwise → treat as a fence-miss. On attempt 1, jump to Step 9 with a retry message asking the agent to re-emit between fence markers and nothing else. If attempt 2 also fences-misses → write `result.json` `{ status: "agent-emitted-no-spec", attempts: 2 }` and stop. + +### Step 5 — Deny-regex (security gate, NO retry) + +Run the deny-regex pure function from `/Users/valentinpalkovic/Projects/storybook/scripts/verify/recipe-deny.ts` via Bash, executed from the repo root: + +```bash +bun -e "import('/Users/valentinpalkovic/Projects/storybook/scripts/verify/recipe-deny.ts').then(m => { m.assertNoDeniedPatterns(process.argv[1]); })" -- "$SPEC_BODY" +``` + +Pass the spec body via a temp file (avoid shell-escaping pitfalls): write the body to `.verify-output//.deny-input.txt`, then read it inside the bun one-liner. The function throws on hit. + +- On any throw → write `result.json` `{ status: "deny-regex-failed", error: , attempts: }` and stop. **Do not retry — deny hits are security blockers.** +- On exit 0 → continue. + +### Step 6 — Prepend header-comment provenance (D8) + +Build a JSON-pretty block from `bundle.metadata` and `bundle.prNumber`, then prepend it as a block comment to the spec body: + +```ts +/** + * verify-pr-generate: AUTO-GENERATED — review and commit alongside the PR + * { + * "generatedAt": "", + * "agentModel": "", + * "prNumber": , + * "referenceSpecs": [ "", ... ], + * "triageGlobs": [ "", ... ] + * } + */ + +``` + +The JSON body inside the comment uses 2-space indentation. Every line of the embedded JSON begins with ` * ` to keep the block-comment well-formed. + +### Step 7 — Write the file + +`Write` the assembled content (header + spec body) to `bundle.outputSpecPath` (absolute path from the bundle). + +### Step 8 — Pipe to `verify-pr-author` (D4-α sentinel-exit-75 contract) + +Lint, deny-regex, post-write regex checks, header-comment provenance, retry-message +authoring, and result.json emission all live in TypeScript core. The skill's job is +strictly to dispatch the agent and pipe its raw reply into the author CLI. + +#### 8a. Dispatch the agent (attempt 1) + +``` +Agent({ + description: "Generate PR recipe spec", + subagent_type: "oh-my-claudecode:executor", + model: "opus", + prompt: bundle.prompt +}) +``` + +Capture the agent's full reply as ``. + +#### 8b. Pipe to `verify-pr-author --bundle --dispatch-mode stdin` + +```bash +printf '%s' "$REPLY" | node /Users/valentinpalkovic/Projects/storybook/scripts/verify-pr-author.ts --bundle --dispatch-mode stdin +``` + +The CLI runs the deny-regex, header-comment provenance, file write, scoped lint +(`scripts/verify/lint-invocation.ts`), and post-write regex checks. Exit codes: + +- `0` — success. CLI has already written the spec and `result.json`. Skip to Step 10. +- `1` — non-retryable error (deny-regex hit, collision, IO error). CLI has written + `result.json` with the failure status. Print the failure line and stop. +- `75` — retryable lint/structural failure. The CLI has emitted a framed retry + block on stdout (see Step 9). Continue to Step 9. + +Treat **exit 75** as the sole retry sentinel. Any other non-zero exit is terminal. + +### Step 9 — Retry once (attempt 2, sentinel-exit-75) + +On exit 75 from Step 8b, parse stdout for the framed retry block: + +``` +===VERIFY_PR_AUTHOR_RETRY_BEGIN=== + +===VERIFY_PR_AUTHOR_RETRY_END=== +``` + +Extract the lines strictly between the BEGIN/END markers and assemble the retry +prompt as: + +``` + + +[RETRY] + +``` + +Dispatch the agent again with the assembled retry prompt (same `subagent_type` and +`model`). Capture the new reply as ``, then pipe it back through the CLI in +retry mode: + +```bash +printf '%s' "$REPLY2" | node /Users/valentinpalkovic/Projects/storybook/scripts/verify-pr-author.ts --bundle --dispatch-mode stdin --retry-of +``` + +The CLI enforces `RECIPE_RETRY_POLICY.maxAttempts` (currently 2) — it will not +re-emit exit 75 on the retry call. Expected exits: + +- `0` — success. Skip to Step 10. +- `1` — terminal failure (lint exhausted, regex-check exhausted, deny-regex hit). + CLI has written `result.json` with `attempts: 2` and the failure status. Print + the failure line and stop. + +### Step 10 — Emit `result.json` on success + +Write `/Users/valentinpalkovic/Projects/storybook/.verify-output//result.json`: + +```jsonc +{ + "version": 1, + "status": "spec-written", + "specPath": "", + "attempts": 1 | 2, + "lint": "clean", + "regex": { "listenerBeforeGoto": true, "attachPattern": true }, + "agentModel": "", + "generatedAt": "" +} +``` + +### Step 11 — Print actionable next-step lines + +Emit these lines (one per line) to the agent's final reply: + +``` +[verify-recipe-author] spec written: +[verify-recipe-author] result.json: +[verify-recipe-author] attempts: | lint: clean +[verify-recipe-author] Next: review the spec, then run `yarn verify-pr --recipe-spec ` +``` + +## Failure Modes + +Always write `result.json` to `.verify-output//result.json` before stopping. Schema is the same as Step 10 with the `status` field swapped. + +| Cause | `result.json.status` | Retry? | +|---|---|---| +| `outputSpecPath` exists and `force === false` | `collision-aborted` | no | +| Agent reply lacks fence markers / empty body | `agent-emitted-no-spec` | yes (1 retry) | +| `assertNoDeniedPatterns` throws | `deny-regex-failed` | **NO — security block** | +| Lint exit non-zero | `lint-failed` | yes (1 retry) | +| Post-write regex check failed (listener-before-goto OR attach pattern) | `regex-check-failed` | yes (1 retry, fed back as lint-equivalent) | +| All gates pass | `spec-written` | n/a | + +Print the failure cause + the `result.json` path on stop: + +``` +[verify-recipe-author] FAILED: — see +``` + +## Notes + +- This skill runs inside Claude Code; it uses `Agent`, `Read`, `Write`, `Bash`, and `Edit` tools. +- All paths in invocations are absolute. Lint commands `cd code` via `yarn --cwd`. +- Max attempts = `MAX_RECIPE_ATTEMPTS` (currently 2). Read the value from `scripts/verify/recipe-author-core.ts` — do not hardcode. +- The skill **never executes** the generated spec. The human review gate (Phase-1 lethal-trifecta breaker) is preserved. +- Deny-regex hits are not retried — they are security signals, not lint nits. +- Cap retry feedback at 5 errors (R3). +- The `runId` is the basename of the parent directory of the bundle; do not invent a new one. + +## Phase-2 follow-up + +This skill currently couples generation to a running Claude Code session via the `Agent` tool dispatch. Phase-2 CI activation will require migrating to a direct Anthropic SDK call (`@anthropic-ai/sdk`) with an `ANTHROPIC_API_KEY` env var, replacing the `Agent` dispatch with a standalone API call so the workflow at `.github/workflows/verify-pr.yml` can run unattended. Tracked as a follow-up in the plan's ADR §Follow-ups. diff --git a/.claude/skills/verify-recipe-author/SKILL.md b/.claude/skills/verify-recipe-author/SKILL.md new file mode 100644 index 000000000000..534cd1871898 --- /dev/null +++ b/.claude/skills/verify-recipe-author/SKILL.md @@ -0,0 +1 @@ +@../../../.agents/skills/verify-recipe-author/SKILL.md diff --git a/.dockerignore b/.dockerignore new file mode 100644 index 000000000000..29fd5ffe235c --- /dev/null +++ b/.dockerignore @@ -0,0 +1,19 @@ +.env +.env.* +**/.env +**/.env.* +~/.ssh/ +~/.aws/ +~/.config/gcloud/ +~/.azure/ +~/.docker/config.json +~/.kube/config +.npmrc +.pypirc +**/*-service-account.json +**/*.pem +**/*.key +~/.git-credentials +.verify-output/ +node_modules/ +.nx/ diff --git a/.github/actions/agentic-pr-prepare/README.md b/.github/actions/agentic-pr-prepare/README.md new file mode 100644 index 000000000000..721ce655d23f --- /dev/null +++ b/.github/actions/agentic-pr-prepare/README.md @@ -0,0 +1,142 @@ +# agentic-pr-prepare + +Universal infrastructure setup for agentic workflows running under +`pull_request_target`: actor-permission gate, base + PR-head manual clones, +toolchain install, sandbox-runtime (srt) install + sha-pin verification, +srt-settings JSON, egress smoke-test, and trusted-harness sync. + +This is **half 1 of 2** of the split `verify-pr.yml` infrastructure. The +companion is `agentic-pr-publish`. + +## Caller contract + +The composite **cannot** declare these — the caller workflow MUST: + +1. Trigger on `pull_request_target` (composite `uses: ./.github/actions/...` + resolves against the **base ref** under PRT, which is load-bearing for + trust — never lift this to a trigger that resolves against PR-head). +2. Declare a `permissions:` block. Verify-PR needs at least: + ```yaml + permissions: + pull-requests: write + issues: write + statuses: write + contents: write # side-branch screenshot push (drop if not needed) + ``` +3. Declare a `concurrency:` block. Single-PR: + ```yaml + concurrency: + group: verify-${{ github.event.pull_request.number }} + cancel-in-progress: true + ``` + With `strategy.matrix`, include the matrix dim in the key: + `verify-${{ pr-num }}-${{ matrix.target }}` (matrix-concurrency footgun). +4. Pass `srt-sha256` **inline** with every call. The composite has **no + default** — this keeps a chore-bump PR carrying the heightened + workflow-review bar instead of single-approval flipping a composite default. + +## Inputs + +| Name | Required | Default | Purpose | +|------------------------|----------|----------------------------------|----------------------------------------------------------------------------------------| +| `github-token` | yes | — | Base + PR-head manual clones. | +| `base-ref` | yes | — | `github.event.pull_request.base.ref`. | +| `base-sha` | yes | — | `github.event.pull_request.base.sha`. | +| `pr-head-sha` | yes | — | `github.event.pull_request.head.sha`. | +| `repo` | yes | — | `github.repository`. | +| `srt-version` | no | `0.0.51` | Pinned `@anthropic-ai/sandbox-runtime` version. | +| `srt-sha256` | **yes** | — (no default by design) | sha256 of the resolved `srt` shim at `srt-version`. Bump via `_srt-sha-probe.yml`. | +| `srt-allowed-domains` | no | localhost + registries + CDNs | Newline list. Caller may extend. | +| `srt-allow-write-paths`| no | `$PR_HEAD_DIR`, `$SANDBOX_TMPDIR`, `/tmp`, `$HOME/.cache`, … | Newline list; env vars expanded at composite runtime. | +| `srt-deny-read-paths` | no | `$HOME/.ssh`, `$HOME/.aws`, … | Newline list. | +| `srt-deny-write-paths` | no | `$GITHUB_WORKSPACE`, `$GITHUB_WORKSPACE/.git` | Newline list. | +| `sync-files` | no | (empty) | Newline-delimited `src:dst` pairs (paths relative). H2 path-validated. | +| `sync-trees` | no | (empty) | Newline-delimited tree paths (relative). H2 path-validated. | +| `provenance-secret` | no | (empty → per-run random) | Optional caller-supplied. M2: written to file, not `$GITHUB_ENV`. | +| `install-code-deps` | no | `true` | Pass-through to `setup-node-and-install`. | + +### Path-input safety (H2) + +`sync-files` and `sync-trees` reject `..`, leading `/`, extra `:`; resolve +realpath and assert under `$PR_HEAD_DIR`. Refuses symlink at destination +before `cp --no-dereference` / `cp -aT`. + +### srt-settings JSON emission (H3) + +allowWrite / denyRead / denyWrite / allowedDomains arrays are emitted via +`jq -R . | jq -s .` so PR-controllable strings cannot inject JSON keys. + +## Outputs + +| Name | Purpose | +|----------------------------|--------------------------------------------------------------------------------------------------| +| `pr-head-dir` | Absolute path to untrusted PR-head workspace clone. | +| `srt-settings-path` | Absolute path to `srt-settings.json`. | +| `diff-path` | Absolute path to captured `pr.diff`. | +| `provenance-secret-path` | M2: path to file (mode 0600) holding the per-run provenance secret. NOT in `$GITHUB_ENV`. | + +## Side-effects + +Writes to `$GITHUB_ENV` (so subsequent caller steps in the same job see them): + +- `PR_HEAD_DIR` — absolute path to PR-head workspace +- `SRT_SETTINGS` — absolute path to srt-settings.json +- `CLAUDE_CODE_TMPDIR` — absolute path to sandbox scratch tmpdir + +Does **NOT** write `VERIFY_PROVENANCE_SECRET` to `$GITHUB_ENV`. Trusted task +steps load it explicitly: `cat "$(provenance-secret-path)"`. + +## Worked example + +```yaml +- name: Prepare agentic environment + id: prep + uses: ./.github/actions/agentic-pr-prepare + with: + github-token: ${{ secrets.GITHUB_TOKEN }} + base-ref: ${{ github.event.pull_request.base.ref }} + base-sha: ${{ github.event.pull_request.base.sha }} + pr-head-sha: ${{ github.event.pull_request.head.sha }} + repo: ${{ github.repository }} + srt-version: '0.0.51' + srt-sha256: '36de38197ac22991c8c9edead4d6184914c8b786e040ecf27bdcf26abd166338' + sync-files: | + .verify-recipes/_util.ts:.verify-recipes/_util.ts + sync-trees: | + scripts/verify + provenance-secret: ${{ secrets.VERIFY_PROVENANCE_SECRET }} + +- name: Your task + env: + PROVENANCE_SECRET_PATH: ${{ steps.prep.outputs.provenance-secret-path }} + run: | + VERIFY_PROVENANCE_SECRET="$(cat "$PROVENANCE_SECRET_PATH")" \ + yarn your-thing +``` + +## Pre-existing architectural debt (C1 — NOT fixed by this composite) + +`verify-result.json` (the file the verdict is read from) lives at +`$PR_HEAD_DIR/.verify-out-trusted/verify-result.json` — inside srt's +`allowWrite` set. A malicious PR-added unit test running inside srt can +forge it. The split documented here does NOT make C1 worse; it stays at +its current path so the legitimate writer (`verify-pr.ts`, which itself +runs INSIDE srt) keeps working. + +**The architectural fix requires** one of: + +1. **Process-split** — orchestrator OUTSIDE srt, only Playwright + dev-server + spawns wrapped. **Attempted 2026-05-14, failed**: srt uses bubblewrap with + a fresh network namespace per invocation, so localhost IPC between + orchestrator (outside) and dispatcher (inside) breaks. Reviving requires + shared host netns (loses egress policy on dispatcher), host-network bridge + / Unix socket, or moving dispatcher outside srt (loosens trust on + PR-modified framework code). +2. **HMAC-bound verdict** — `verify-pr.ts` HMAC-signs the JSON with the + provenance secret; trusted bash verifies. Requires scrubbing the secret + from orchestrator env before spawning Playwright + auditing + `/proc//environ` reachability inside srt. + +Until that lands, the verdict is trustworthy ONLY when paired with the +side-channel signals (PR comment, telemetry, GitHub run conclusion) that an +attacker would also have to forge. Tracked as separate follow-up. diff --git a/.github/actions/agentic-pr-prepare/action.yml b/.github/actions/agentic-pr-prepare/action.yml new file mode 100644 index 000000000000..cae736d6ee69 --- /dev/null +++ b/.github/actions/agentic-pr-prepare/action.yml @@ -0,0 +1,423 @@ +name: 'Prepare agentic PR environment' +description: > + Universal trust-gate, sandbox setup, and PR-head clone for agentic + workflows running under `pull_request_target`. Carves out steps 1-8 and + 11-14 of the original verify-pr.yml so future agentic actions can reuse + this trust-boundary plumbing without copy-pasting. + + Caller contract: + - Workflow MUST use `pull_request_target` so this composite resolves + against the base ref (load-bearing for trust). + - Caller MUST declare `permissions:` and `concurrency:` blocks itself; + composites cannot. + - Caller MUST pass `srt-sha256` explicitly (kept inline so a "chore: bump + srt" PR cannot lower the social-review bar via single-approval default). + +inputs: + github-token: + description: 'GITHUB_TOKEN for base + PR-head manual clones' + required: true + base-ref: + description: 'Base branch ref (e.g. github.event.pull_request.base.ref)' + required: true + base-sha: + description: 'Base SHA (github.event.pull_request.base.sha)' + required: true + pr-head-sha: + description: 'PR head SHA (github.event.pull_request.head.sha)' + required: true + repo: + description: 'owner/name (github.repository)' + required: true + srt-version: + description: 'Pinned @anthropic-ai/sandbox-runtime version' + required: false + default: '0.0.51' + srt-sha256: + description: > + sha256 of the resolved srt shim at srt-version. H1: NO default — caller + must pass it inline in the workflow so a chore-bump PR cannot flip both + version and sha together via a single approval. + required: true + srt-allowed-domains: + description: 'Newline list of allowed network domains for srt jail' + required: false + default: | + localhost + 127.0.0.1 + registry.yarnpkg.com + registry.npmjs.org + registry.npmjs.com + objects.githubusercontent.com + playwright.azureedge.net + playwright-akamai.azureedge.net + playwright-verizon.azureedge.net + cdn.playwright.dev + srt-allow-write-paths: + description: 'Newline list of srt allowWrite paths (env vars expanded at composite runtime)' + required: false + default: | + $PR_HEAD_DIR + $SANDBOX_TMPDIR + /tmp + $HOME/.cache + $HOME/.local/share + $HOME/.storybook + srt-deny-read-paths: + description: 'Newline list of srt denyRead paths' + required: false + default: | + $HOME/.ssh + $HOME/.aws + $HOME/.docker + $HOME/.npmrc + $HOME/.gitconfig + $HOME/.config/gh + $GITHUB_WORKSPACE/.git + srt-deny-write-paths: + description: 'Newline list of srt denyWrite paths' + required: false + default: | + $GITHUB_WORKSPACE + $GITHUB_WORKSPACE/.git + sync-files: + description: > + Newline-delimited `src:dst` pairs (src relative to base checkout, + dst relative to PR_HEAD_DIR). H2: rejects `..`, leading `/`, or + extra `:`; refuses symlink at dst before `cp --no-dereference`. + required: false + default: '' + sync-trees: + description: > + Newline-delimited tree paths (relative). Each tree is copied with + `cp -aT` after refusing symlink at the dst root. H2 path-validated. + required: false + default: '' + provenance-secret: + description: > + Optional caller-supplied secret (e.g. secrets.VERIFY_PROVENANCE_SECRET). + If empty, a per-run random 32-byte hex secret is generated. + required: false + default: '' + install-code-deps: + description: 'Pass-through to setup-node-and-install' + required: false + default: 'true' + +outputs: + pr-head-dir: + description: 'Absolute path to untrusted PR-head workspace clone' + value: ${{ steps.paths.outputs.pr-head-dir }} + srt-settings-path: + description: 'Absolute path to srt-settings.json' + value: ${{ steps.srt-settings.outputs.srt-settings-path }} + diff-path: + description: 'Absolute path to captured pr.diff' + value: ${{ steps.diff.outputs.diff-path }} + provenance-secret-path: + description: > + M2: Absolute path to file holding the per-run provenance secret. Trusted + steps `cat` it explicitly; the secret is NOT written to $GITHUB_ENV so + a future caller forgetting an `env -i` allowlist cannot leak it. + value: ${{ steps.provenance.outputs.provenance-secret-path }} + +runs: + using: 'composite' + steps: + - name: Check actor permission + uses: prince-chrismc/check-actor-permissions-action@d504e74ba31658f4cdf4fcfeb509d4c09736d88e # v3.0.2 + with: + permission: write + + - name: Checkout base (manual clone — no token persistence) + # Manual git clone instead of actions/checkout because the repo carries + # gitlinks (.external/addon-svelte-csf etc.) without a .gitmodules + # entry. actions/checkout's "Removing auth" cleanup walks the submodule + # tree under persist-credentials:false and aborts with + # fatal: No url found for submodule path '.external/...' + # Manual clone never writes credentials into .git/config and never + # walks .gitmodules, so we get persist-credentials:false semantics + # without the cleanup-walk crash. + shell: bash + env: + GITHUB_TOKEN: ${{ inputs.github-token }} + BASE_REF: ${{ inputs.base-ref }} + BASE_SHA: ${{ inputs.base-sha }} + PR_HEAD_SHA: ${{ inputs.pr-head-sha }} + REPO: ${{ inputs.repo }} + run: | + set -euo pipefail + # Clone to $RUNNER_TEMP (outside workspace) then overlay contents + # onto $GITHUB_WORKSPACE via `cp -aT`. The caller's bootstrap + # sparse-checkout left `.github/` on disk; overlay overwrites it + # cleanly so the workspace ends up with the trusted base ref's + # full tree (including this composite's own action.yml). + TMP_CLONE="$RUNNER_TEMP/_base_clone" + rm -rf "$TMP_CLONE" + git -c protocol.version=2 clone \ + --no-tags --depth=1 --branch "$BASE_REF" \ + "https://x-access-token:${GITHUB_TOKEN}@github.com/${REPO}.git" \ + "$TMP_CLONE" + cp -aT "$TMP_CLONE" "$GITHUB_WORKSPACE" + rm -rf "$TMP_CLONE" + git -C "$GITHUB_WORKSPACE" -c protocol.version=2 \ + fetch --no-tags --depth=1 origin "$BASE_SHA" "$PR_HEAD_SHA" + git -C "$GITHUB_WORKSPACE" remote set-url origin "https://github.com/${REPO}.git" + git -C "$GITHUB_WORKSPACE" config --local --unset-all credential.helper 2>/dev/null || true + git -C "$GITHUB_WORKSPACE" config --local --unset-all http.https://github.com/.extraheader 2>/dev/null || true + + - name: Setup Node.js and Install Dependencies + uses: ./.github/actions/setup-node-and-install + with: + install-code-deps: ${{ inputs.install-code-deps }} + + - name: Setup Bun + uses: oven-sh/setup-bun@0c5077e51419868618aeaa5fe8019c62421857d6 # v2.2.0 + + - name: Compute paths + id: paths + shell: bash + run: | + set -euo pipefail + PR_HEAD_DIR="${RUNNER_TEMP}/pr-head" + echo "PR_HEAD_DIR=${PR_HEAD_DIR}" >> "$GITHUB_ENV" + echo "pr-head-dir=${PR_HEAD_DIR}" >> "$GITHUB_OUTPUT" + + - name: Init provenance secret + id: provenance + # M2: write secret to file (mode 0600), NOT $GITHUB_ENV. A future + # caller forgetting the `env -i` allowlist would otherwise leak it + # to untrusted task steps. Trusted steps consume it via + # `cat "$(provenance-secret-path)"`. + shell: bash + env: + INHERITED: ${{ inputs.provenance-secret }} + run: | + set -euo pipefail + if [ -n "$INHERITED" ]; then + SECRET="$INHERITED" + else + SECRET="$(openssl rand -hex 32)" + fi + echo "::add-mask::$SECRET" + SECRET_FILE="$RUNNER_TEMP/provenance-secret" + umask 077 + printf '%s' "$SECRET" > "$SECRET_FILE" + echo "provenance-secret-path=$SECRET_FILE" >> "$GITHUB_OUTPUT" + + - name: Checkout PR head (untrusted execution context) + shell: bash + env: + GITHUB_TOKEN: ${{ inputs.github-token }} + PR_HEAD_SHA: ${{ inputs.pr-head-sha }} + BASE_SHA: ${{ inputs.base-sha }} + REPO: ${{ inputs.repo }} + run: | + set -euo pipefail + git -c protocol.version=2 clone \ + --no-tags --no-checkout --filter=blob:none \ + "https://x-access-token:${GITHUB_TOKEN}@github.com/${REPO}.git" \ + "$PR_HEAD_DIR" + cd "$PR_HEAD_DIR" + git -c protocol.version=2 fetch --no-tags --depth=1 origin "$PR_HEAD_SHA" + git -c protocol.version=2 fetch --no-tags --depth=1 origin "$BASE_SHA" + git checkout --force "$PR_HEAD_SHA" + git remote set-url origin "https://github.com/${REPO}.git" + git config --local --unset-all credential.helper 2>/dev/null || true + + - name: Fetch PR diff + id: diff + shell: bash + env: + PR_HEAD_SHA: ${{ inputs.pr-head-sha }} + BASE_SHA: ${{ inputs.base-sha }} + run: | + set -euo pipefail + DIFF_PATH="$RUNNER_TEMP/pr.diff" + git -C "$PR_HEAD_DIR" diff "$BASE_SHA" "$PR_HEAD_SHA" > "$DIFF_PATH" + echo "diff-path=$DIFF_PATH" >> "$GITHUB_OUTPUT" + + - name: Sync trusted harness code into PR head + # H2: path-validate every src:dst pair from `sync-files` and every tree + # from `sync-trees`. Reject `..`, leading `/`, extra `:`; resolve + # realpath and assert under $PR_HEAD_DIR. Refuse symlink at dst + # before cp --no-dereference. M4: also rm -f .verify-recipes/pr-*.spec.ts + # belt-and-suspenders even though Author overwrites later. + shell: bash + env: + SYNC_FILES: ${{ inputs.sync-files }} + SYNC_TREES: ${{ inputs.sync-trees }} + run: | + set -euo pipefail + + assert_safe_rel() { + local rel="$1" label="$2" + case "$rel" in + /*) echo "[sync] $label: absolute path rejected: '$rel'" >&2; exit 1 ;; + *..*) echo "[sync] $label: '..' rejected: '$rel'" >&2; exit 1 ;; + '') echo "[sync] $label: empty path rejected" >&2; exit 1 ;; + esac + } + + assert_under_prhead() { + local resolved="$1" label="$2" + case "$resolved" in + "$PR_HEAD_DIR"|"$PR_HEAD_DIR"/*) : ;; + *) echo "[sync] $label: resolved path '$resolved' escapes \$PR_HEAD_DIR" >&2; exit 1 ;; + esac + } + + mkdir -p "$PR_HEAD_DIR/.verify-recipes" + rm -f "$PR_HEAD_DIR"/.verify-recipes/pr-*.spec.ts + + # sync-files: src:dst pairs + while IFS= read -r line; do + [ -z "$line" ] && continue + # Strip trailing whitespace + line="${line%"${line##*[![:space:]]}"}" + [ -z "$line" ] && continue + # Reject extra colons (more than one) + if [ "$(awk -F: '{print NF-1}' <<<"$line")" != "1" ]; then + echo "[sync] sync-files entry must contain exactly one ':' — got '$line'" >&2 + exit 1 + fi + src="${line%%:*}" + dst="${line#*:}" + assert_safe_rel "$src" "sync-files src" + assert_safe_rel "$dst" "sync-files dst" + src_abs="$GITHUB_WORKSPACE/$src" + dst_abs="$PR_HEAD_DIR/$dst" + # Assert dst resolves under PR_HEAD_DIR even if intermediate parents are symlinks. + dst_resolved="$(realpath -m "$dst_abs")" + assert_under_prhead "$dst_resolved" "sync-files dst" + if [ ! -f "$src_abs" ]; then + echo "[sync] sync-files src missing: $src_abs" >&2 + exit 1 + fi + mkdir -p "$(dirname "$dst_abs")" + if [ -L "$dst_abs" ]; then + echo "[sync] refusing to overwrite symlink at $dst_abs" >&2 + exit 1 + fi + cp --no-dereference --remove-destination "$src_abs" "$dst_abs" + done <<< "$SYNC_FILES" + + # sync-trees: tree paths + while IFS= read -r line; do + [ -z "$line" ] && continue + line="${line%"${line##*[![:space:]]}"}" + [ -z "$line" ] && continue + assert_safe_rel "$line" "sync-trees" + src_abs="$GITHUB_WORKSPACE/$line" + dst_abs="$PR_HEAD_DIR/$line" + dst_resolved="$(realpath -m "$dst_abs")" + assert_under_prhead "$dst_resolved" "sync-trees dst" + if [ ! -d "$src_abs" ]; then + echo "[sync] sync-trees src missing: $src_abs" >&2 + exit 1 + fi + mkdir -p "$(dirname "$dst_abs")" + if [ -L "$dst_abs" ]; then + echo "[sync] refusing to overwrite symlink dir at $dst_abs" >&2 + exit 1 + fi + rm -rf "$dst_abs" + cp -aT "$src_abs" "$dst_abs" + done <<< "$SYNC_TREES" + + - name: Install sandbox-runtime (Layer-2) + # H1: caller MUST pass srt-sha256 inline; no composite default. + shell: bash + env: + SRT_VERSION: ${{ inputs.srt-version }} + EXPECTED_SRT_SHA: ${{ inputs.srt-sha256 }} + run: | + set -euo pipefail + if [ -z "$EXPECTED_SRT_SHA" ]; then + echo "srt-sha256 input is required (no composite default — see SECURITY.md §pinning-sandbox-runtime)" >&2 + exit 1 + fi + sudo apt-get update -qq + sudo apt-get install -y --no-install-recommends bubblewrap socat ripgrep + sudo npm install -g --ignore-scripts "@anthropic-ai/sandbox-runtime@${SRT_VERSION}" + srt --version + actual=$(sha256sum "$(which srt)" | cut -d' ' -f1) + if [ "$actual" != "$EXPECTED_SRT_SHA" ]; then + echo "srt sha mismatch — expected=$EXPECTED_SRT_SHA actual=$actual" >&2 + exit 1 + fi + echo "[srt] sha verified: $actual" + + - name: Build sandbox settings + id: srt-settings + # H3: emit JSON arrays via jq, never via shell string interpolation. + # Allows future callers to pass PR-controlled values without enabling + # JSON injection into the srt config. + shell: bash + env: + SRT_ALLOWED_DOMAINS: ${{ inputs.srt-allowed-domains }} + SRT_ALLOW_WRITE_PATHS: ${{ inputs.srt-allow-write-paths }} + SRT_DENY_READ_PATHS: ${{ inputs.srt-deny-read-paths }} + SRT_DENY_WRITE_PATHS: ${{ inputs.srt-deny-write-paths }} + run: | + set -euo pipefail + SANDBOX_TMPDIR="$RUNNER_TEMP/sandbox-tmp" + mkdir -p "$SANDBOX_TMPDIR" + mkdir -p "$PR_HEAD_DIR/.verify-scratch" + mkdir -p "$PR_HEAD_DIR/.verify-out-trusted" + mkdir -p "$HOME/.storybook" + + # Expand $VAR refs in path lists at composite runtime, then drop + # blank lines. The newline-list-to-jq-array convention is + # `jq -R . | jq -s .` (read each line as string, then slurp to array). + expand_lines() { + local raw="$1" + # envsubst would also work but isn't installed everywhere. Use + # `eval echo` per line, then strip empties. + while IFS= read -r line; do + [ -z "$line" ] && continue + line="${line%"${line##*[![:space:]]}"}" + [ -z "$line" ] && continue + eval "printf '%s\n' \"$line\"" + done <<< "$raw" + } + + ALLOWED_DOMAINS_JSON=$(expand_lines "$SRT_ALLOWED_DOMAINS" | jq -R . | jq -s .) + ALLOW_WRITE_JSON=$(expand_lines "$SRT_ALLOW_WRITE_PATHS" | jq -R . | jq -s .) + DENY_READ_JSON=$(expand_lines "$SRT_DENY_READ_PATHS" | jq -R . | jq -s .) + DENY_WRITE_JSON=$(expand_lines "$SRT_DENY_WRITE_PATHS" | jq -R . | jq -s .) + + SRT_SETTINGS_PATH="$RUNNER_TEMP/srt-settings.json" + jq -n \ + --argjson allowedDomains "$ALLOWED_DOMAINS_JSON" \ + --argjson allowWrite "$ALLOW_WRITE_JSON" \ + --argjson denyRead "$DENY_READ_JSON" \ + --argjson denyWrite "$DENY_WRITE_JSON" \ + '{ + network: { + allowLocalBinding: true, + allowedDomains: $allowedDomains, + deniedDomains: [] + }, + filesystem: { + allowRead: [], + allowWrite: $allowWrite, + denyRead: $denyRead, + denyWrite: $denyWrite + } + }' > "$SRT_SETTINGS_PATH" + + echo "SRT_SETTINGS=$SRT_SETTINGS_PATH" >> "$GITHUB_ENV" + echo "CLAUDE_CODE_TMPDIR=$SANDBOX_TMPDIR" >> "$GITHUB_ENV" + echo "srt-settings-path=$SRT_SETTINGS_PATH" >> "$GITHUB_OUTPUT" + cat "$SRT_SETTINGS_PATH" + + - name: Smoke-test srt egress policy + shell: bash + run: | + set -uo pipefail + if srt --settings "$SRT_SETTINGS" curl --max-time 5 https://example.com >/dev/null 2>&1; then + echo "[smoke] EGRESS POLICY BROKEN — example.com reachable inside srt" >&2 + exit 1 + fi + echo "[smoke] egress correctly denied" diff --git a/.github/actions/agentic-pr-publish/README.md b/.github/actions/agentic-pr-publish/README.md new file mode 100644 index 000000000000..e0746e2e141a --- /dev/null +++ b/.github/actions/agentic-pr-publish/README.md @@ -0,0 +1,115 @@ +# agentic-pr-publish + +Universal post-task publishing for agentic workflows: read verdict, push +screenshots to a side branch, append telemetry, stage + upload artifacts. + +This is **half 2 of 2** of the split `verify-pr.yml` infrastructure. + +## Caller contract + +The composite **cannot** declare these — the caller workflow MUST: + +1. Run under `pull_request_target` (composite resolves against base ref). +2. Wrap the `uses:` step in `if: always()` if the caller wants publish to + run on prior-step failure. M1: composite-level `if:` does NOT cascade to + sub-steps, but every sub-step inside this composite that needs + prior-step-failure tolerance already carries explicit `if: always()`. +3. Thread `result-path` from the task step that wrote `verify-result.json` + (e.g. `${{ steps.verify.outputs.result-path }}`). Do not glob + PR-writable directories to find it — that's the C1 forgery surface. +4. Declare `permissions:` block at least: + ```yaml + permissions: + pull-requests: write + contents: write # side-branch screenshot push (omit if skip-screenshots) + ``` + +## Inputs + +| Name | Required | Default | Purpose | +|----------------------------|----------|--------------------------|------------------------------------------------------------------------| +| `github-token` | yes | — | Side-branch push. | +| `pr-number` | yes | — | PR number. | +| `run-id` | yes | — | `github.run_id`. | +| `repo` | yes | — | `github.repository`. | +| `result-path` | yes | — | Trusted absolute path to `verify-result.json`. | +| `pr-head-dir` | no | `env.PR_HEAD_DIR` | Inherited from prepare composite. | +| `screenshot-source-dir` | no | `/.verify-output` | Where `push-screenshots.ts` scans for PNGs. | +| `dispatch-dirs` | no | `/.verify-output\n/.verify-output` | Newline list. Passed as repeated `--dispatch-dir`. | +| `telemetry-webhook-url` | no | (empty → no-op) | Telemetry sink. | +| `telemetry-webhook-token` | no | (empty → no-op) | Telemetry auth. | +| `artifact-name-prefix` | no | `verify-output` | Final artifact name = `-pr--`. | +| `retention-days` | no | `14` | Artifact retention. | +| `skip-screenshots` | no | `false` | `true` → skip side-branch push (callers without PNG output). | +| `skip-telemetry` | no | `false` | `true` → telemetry no-op regardless of webhook secrets. | + +## Outputs + +| Name | Purpose | +|----------------------------|-----------------------------------------------------------------------------------------------| +| `verdict` | Verdict from `derive-verdict.ts` (`verified` / `regression` / `evidence-missing` / `missing`).| +| `screenshot-urls-path` | **H4**: absolute path to FILE containing screenshot URLs JSON. Read with `fs.readFileSync` in caller; do not interpolate into shell.| + +## H4: screenshot-urls indirection + +Composite output is a **file path**, not a heredoc-encoded JSON string. The +caller's `actions/github-script` step reads: + +```js +const path = process.env.SCREENSHOT_URLS_PATH; +const items = JSON.parse(fs.readFileSync(path, 'utf-8')); +``` + +Closes the heredoc-terminator-injection surface that exists if `screenshot-urls` +were a single-line composite output piped through `< + Read verdict, push screenshots to side branch, append telemetry, stage + artifacts, upload artifacts. Carves out steps 19 + 21-24 of the original + verify-pr.yml. Action-agnostic; verdict-gated label / PR comment / final + fail-gate stay in the caller because they encode action-specific shape. + + M1: every sub-step that mirrors a current `if: always()` step carries + explicit `if: always()` here — composite-level `if:` does NOT cascade. + + H4: `screenshot-urls` is exposed as a FILE PATH (composite output + `screenshot-urls-path`), not a heredoc string. Caller's github-script + reads the file with fs.readFileSync. Closes a heredoc-terminator-injection + surface against the PR comment renderer. + +inputs: + github-token: + description: 'GITHUB_TOKEN for side-branch push' + required: true + pr-number: + description: 'PR number' + required: true + run-id: + description: 'github.run_id' + required: true + repo: + description: 'owner/name (github.repository)' + required: true + result-path: + description: > + Trusted absolute path to verify-result.json (caller threads from the + task step's $GITHUB_OUTPUT). + required: true + provenance-secret-path: + description: > + C1 fix: path to file holding the per-run provenance secret. derive-verdict.ts + reads VERIFY_PROVENANCE_SECRET from env to validate the HMAC signature + on verify-result.json. If empty, HMAC gate is skipped (back-compat for + callers that don't yet pass it; treats the verdict as untrusted in the + same sense as the pre-C1 path). + required: false + default: '' + pr-head-dir: + description: 'Absolute path to PR-head workspace (defaults to env.PR_HEAD_DIR)' + required: false + default: '' + screenshot-source-dir: + description: 'Where push-screenshots.ts scans for PNGs' + required: false + default: '' + dispatch-dirs: + description: 'Newline-delimited dirs passed as repeated --dispatch-dir to append-telemetry.ts' + required: false + default: '' + telemetry-webhook-url: + description: 'Empty → telemetry no-ops' + required: false + default: '' + telemetry-webhook-token: + description: 'Empty → telemetry no-ops' + required: false + default: '' + artifact-name-prefix: + description: 'Final artifact name = -pr--' + required: false + default: 'verify-output' + retention-days: + description: 'Artifact retention' + required: false + default: '14' + skip-screenshots: + description: 'true → side-branch push skipped' + required: false + default: 'false' + skip-telemetry: + description: 'true → telemetry no-op (independent of webhook secrets)' + required: false + default: 'false' + +outputs: + verdict: + description: 'Verdict emitted by derive-verdict.ts (e.g. verified | regression | missing)' + value: ${{ steps.verdict.outputs.verdict }} + screenshot-urls-path: + description: > + H4: absolute path to a file containing the screenshot URLs JSON + (array of {rel,url} or []). Empty file or missing path means no + screenshots. Caller reads with fs.readFileSync to avoid heredoc + injection through composite output. + value: ${{ steps.screenshots.outputs.urls-path }} + +runs: + using: 'composite' + steps: + - name: Read verdict + id: verdict + if: always() + shell: bash + env: + RESULT: ${{ inputs.result-path }} + PROVENANCE_SECRET_PATH: ${{ inputs.provenance-secret-path }} + run: | + set -euo pipefail + if [ ! -f "$RESULT" ]; then + echo "verdict=missing" >> "$GITHUB_OUTPUT" + exit 0 + fi + REPORT="$(dirname "$RESULT")/playwright-report.json" + # C1 fix: thread the provenance secret into derive-verdict.ts via env + # so it can validate the HMAC signature. The secret never appears in + # the composite's $GITHUB_ENV or step output — only in this subshell. + if [ -n "$PROVENANCE_SECRET_PATH" ] && [ -f "$PROVENANCE_SECRET_PATH" ]; then + VERIFY_PROVENANCE_SECRET="$(cat "$PROVENANCE_SECRET_PATH")" \ + node "$GITHUB_WORKSPACE/scripts/verify/ci/derive-verdict.ts" \ + --result "$RESULT" \ + --report "$REPORT" \ + >> "$GITHUB_OUTPUT" + else + node "$GITHUB_WORKSPACE/scripts/verify/ci/derive-verdict.ts" \ + --result "$RESULT" \ + --report "$REPORT" \ + >> "$GITHUB_OUTPUT" + fi + + - name: Push screenshots to side branch + id: screenshots + if: always() && inputs.skip-screenshots != 'true' && steps.verdict.outputs.verdict != '' && steps.verdict.outputs.verdict != 'missing' + shell: bash + # M3: token via env-mapping, not literal interpolation. H4: write urls + # JSON to a file path exposed as composite output, not a heredoc to + # $GITHUB_OUTPUT. + env: + GITHUB_TOKEN: ${{ inputs.github-token }} + PR_NUMBER: ${{ inputs.pr-number }} + REPO: ${{ inputs.repo }} + RUN_ID: ${{ inputs.run-id }} + SRC_DIR: ${{ inputs.screenshot-source-dir }} + run: | + set -euo pipefail + SRC="${SRC_DIR:-${{ inputs.pr-head-dir != '' && inputs.pr-head-dir || env.PR_HEAD_DIR }}/.verify-output}" + URLS_FILE="$RUNNER_TEMP/screenshot-urls.json" + # Default to empty array so caller can always fs.readFileSync. + echo '[]' > "$URLS_FILE" + + # Route push-screenshots' heredoc output to a private sink, then + # extract just the urls JSON value into URLS_FILE. Avoids exposing + # the heredoc terminator across the composite-caller boundary. + SINK="$RUNNER_TEMP/screenshot-output-sink" + : > "$SINK" + node "$GITHUB_WORKSPACE/scripts/verify/ci/push-screenshots.ts" \ + --source "$SRC" \ + --pr "$PR_NUMBER" \ + --run-id "$RUN_ID" \ + --repo "$REPO" \ + --assets-dir "$RUNNER_TEMP/verify-assets" \ + --output "$SINK" + # Heredoc shape: `urls<\nEOF\n`. Extract everything + # between `urls< "$URLS_FILE" + fi + echo "urls-path=$URLS_FILE" >> "$GITHUB_OUTPUT" + + - name: Append telemetry + if: always() && inputs.skip-telemetry != 'true' && steps.verdict.outputs.verdict != '' && steps.verdict.outputs.verdict != 'missing' + shell: bash + # M3: webhook secrets via env-mapping only. No literal interpolation + # into the shell. append-telemetry.ts no-ops when env unset. + env: + PR_NUMBER: ${{ inputs.pr-number }} + RUN_ID: ${{ inputs.run-id }} + TELEMETRY_AGENTIC_VERIFICATION_WEBHOOK_URL: ${{ inputs.telemetry-webhook-url }} + TELEMETRY_AGENTIC_VERIFICATION_WEBHOOK_TOKEN: ${{ inputs.telemetry-webhook-token }} + RESULT: ${{ inputs.result-path }} + DISPATCH_DIRS: ${{ inputs.dispatch-dirs }} + PR_HEAD_DIR_IN: ${{ inputs.pr-head-dir }} + run: | + set -euo pipefail + DISPATCH_ARGS=() + if [ -n "$DISPATCH_DIRS" ]; then + while IFS= read -r d; do + [ -z "$d" ] && continue + d="${d%"${d##*[![:space:]]}"}" + [ -z "$d" ] && continue + DISPATCH_ARGS+=(--dispatch-dir "$d") + done <<< "$DISPATCH_DIRS" + else + PRHD="${PR_HEAD_DIR_IN:-$PR_HEAD_DIR}" + DISPATCH_ARGS+=(--dispatch-dir "$PRHD/.verify-output") + DISPATCH_ARGS+=(--dispatch-dir "$GITHUB_WORKSPACE/.verify-output") + fi + node "$GITHUB_WORKSPACE/scripts/verify/ci/append-telemetry.ts" \ + --result "$RESULT" \ + --pr "$PR_NUMBER" \ + --run-id "$RUN_ID" \ + "${DISPATCH_ARGS[@]}" \ + --curl-cfg "$RUNNER_TEMP/.curl-cfg" + + - name: Stage artifacts (rename dotdirs) + if: always() + shell: bash + env: + PR_NUMBER: ${{ inputs.pr-number }} + PR_HEAD_DIR_IN: ${{ inputs.pr-head-dir }} + run: | + set -euo pipefail + PRHD="${PR_HEAD_DIR_IN:-$PR_HEAD_DIR}" + STAGE="$RUNNER_TEMP/verify-artifacts" + rm -rf "$STAGE" + mkdir -p "$STAGE/runner-pr-head" "$STAGE/base" + if [ -d "$PRHD/.verify-output" ]; then + cp -a "$PRHD/.verify-output/." "$STAGE/runner-pr-head/verify-output/" + fi + if [ -d "$PRHD/.verify-recipes" ]; then + mkdir -p "$STAGE/runner-pr-head/verify-recipes" + if [ -f "$PRHD/.verify-recipes/pr-${PR_NUMBER}.spec.ts" ]; then + cp "$PRHD/.verify-recipes/pr-${PR_NUMBER}.spec.ts" \ + "$STAGE/runner-pr-head/verify-recipes/pr-${PR_NUMBER}.spec.ts" + fi + fi + if [ -d "$GITHUB_WORKSPACE/.verify-output" ]; then + cp -a "$GITHUB_WORKSPACE/.verify-output/." "$STAGE/base/verify-output/" + fi + echo "VERIFY_ARTIFACTS_DIR=$STAGE" >> "$GITHUB_ENV" + ls -la "$STAGE" || true + + - name: Upload artifacts + if: always() + uses: actions/upload-artifact@043fb46d1a93c77aae656e7c1c64a875d1fc6a0a # v7.0.1 + with: + name: ${{ inputs.artifact-name-prefix }}-pr-${{ inputs.pr-number }}-${{ inputs.run-id }} + path: ${{ env.VERIFY_ARTIFACTS_DIR }}/ + if-no-files-found: warn + retention-days: ${{ inputs.retention-days }} diff --git a/.github/actions/setup-node-and-install/action.yml b/.github/actions/setup-node-and-install/action.yml index d882d1c57068..0d700d649898 100644 --- a/.github/actions/setup-node-and-install/action.yml +++ b/.github/actions/setup-node-and-install/action.yml @@ -11,16 +11,12 @@ runs: using: 'composite' steps: - name: Setup Node.js - uses: actions/setup-node@v4 + uses: actions/setup-node@49933ea5288caeca8642d1e84afbd3f7d6820020 # v4.4.0 with: node-version-file: '.nvmrc' - - name: Update npm to latest - shell: bash - run: npm install -g npm@latest - - name: Cache dependencies - uses: actions/cache@v4 + uses: actions/cache@5a3ec84eff668545956fd18022155c47e93e2684 # v4.2.3 with: path: | ~/.yarn/berry/cache diff --git a/.github/scripts/agent-scan-check-org-membership.mjs b/.github/scripts/agent-scan-check-org-membership.mjs deleted file mode 100644 index 7a9da73799e5..000000000000 --- a/.github/scripts/agent-scan-check-org-membership.mjs +++ /dev/null @@ -1,37 +0,0 @@ -import * as core from '@actions/core'; -import * as github from '@actions/github'; - -async function main() { - const token = core.getInput('token', { required: true }); - const org = core.getInput('org', { required: true }); - const username = core.getInput('username', { required: true }); - - const octokit = github.getOctokit(token); - - let isOrgMember = false; - - try { - await octokit.rest.orgs.checkMembershipForUser({ - org, - username, - }); - - isOrgMember = true; - } catch (error) { - if (error.status === 404) { - } else if (error.status === 302 || error.status === 403) { - core.warning( - `Unable to verify org membership for ${username}; GitHub API returned ${error.status}. Falling back to scanning this fork PR.` - ); - } else { - throw error; - } - } - - core.setOutput('is-org-member', String(isOrgMember)); - core.setOutput('should-scan', String(!isOrgMember)); -} - -main().catch((error) => { - core.setFailed(error.message); -}); diff --git a/.github/workflows/agent-scan.yml b/.github/workflows/agent-scan.yml index badbb0da6a48..52891e575a1a 100644 --- a/.github/workflows/agent-scan.yml +++ b/.github/workflows/agent-scan.yml @@ -17,8 +17,10 @@ jobs: agentscan: if: | github.repository_owner == 'storybookjs' && - github.event.pull_request.head.repo.full_name != github.repository && !contains( + fromJSON('["OWNER","MEMBER","COLLABORATOR"]'), + github.event.pull_request.author_association + ) && !contains( fromJSON('["dependabot[bot]", "github-actions[bot]","storybook-bot"]'), github.event.pull_request.user.login ) @@ -29,22 +31,13 @@ jobs: - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd - name: Install script dependencies run: npm install --prefix .github/scripts - - name: Check author org membership - id: membership - env: - INPUT_TOKEN: ${{ secrets.GITHUB_TOKEN }} - INPUT_ORG: ${{ github.repository_owner }} - INPUT_USERNAME: ${{ github.event.pull_request.user.login }} - run: node .github/scripts/agent-scan-check-org-membership.mjs - name: Cache AgentScan analysis - if: steps.membership.outputs.should-scan == 'true' uses: actions/cache@668228422ae6a00e4ad889ee87cd7109ec5666a7 with: path: .agentscan-cache key: agentscan-cache-${{ github.actor }} restore-keys: agentscan-cache- - name: AgentScan - if: steps.membership.outputs.should-scan == 'true' id: agentscan uses: MatteoGabriele/agentscan-action@a584774dd15cabe6df4c6ab45fc43514a3b56b2d with: @@ -52,7 +45,7 @@ jobs: agent-scan-comment: false cache-path: .agentscan-cache - name: Label PR with classification - if: steps.membership.outputs.should-scan == 'true' && steps.agentscan.outputs.classification + if: steps.agentscan.outputs env: INPUT_TOKEN: ${{ secrets.GITHUB_TOKEN }} INPUT_CLASSIFICATION: ${{ steps.agentscan.outputs.classification }} diff --git a/.github/workflows/verify-pr.yml b/.github/workflows/verify-pr.yml new file mode 100644 index 000000000000..48b530da97bd --- /dev/null +++ b/.github/workflows/verify-pr.yml @@ -0,0 +1,531 @@ +name: PR Verification Harness +on: + pull_request_target: + types: [labeled, synchronize] + paths-ignore: ['docs/**', '**/*.md'] + +# v6 single-round model: recipe is authored AND executed in the same workflow +# run. The authored spec is materialised straight into the untrusted PR-head +# workspace ($RUNNER_TEMP/pr-head/.verify-recipes/pr-<#>.spec.ts) — never +# committed. See scripts/verify/SECURITY.md for the load-bearing controls. +# +# Infrastructure (trust gate, sandbox, artifact publish) is factored into +# two composite actions: +# .github/actions/agentic-pr-prepare/ — universal prep (steps 1-8, 11-14) +# .github/actions/agentic-pr-publish/ — universal publish (steps 19, 21-24) +# This file owns only the verify-specific TASK steps in between. +# +# Composite resolution under pull_request_target: `uses: ./.github/actions/…` +# resolves against the BASE ref, which is load-bearing for trust. Do not +# switch this workflow to a trigger that resolves composites against PR-head. + +jobs: + verify: + if: github.event.pull_request.draft == false && contains(github.event.pull_request.labels.*.name, 'ci:verify') + runs-on: ubuntu-22.04 + timeout-minutes: 30 + concurrency: + group: verify-${{ github.event.pull_request.number }} + cancel-in-progress: true + permissions: + pull-requests: write + issues: write + statuses: write + contents: write # needed to push screenshots to the _agentic-pr-assets side branch + steps: + - name: Bootstrap composite actions + # Manual sparse clone of `.github/` only so `uses: ./.github/actions/...` + # below can resolve composite action.yml from disk. + # + # Cannot use actions/checkout: its cleanup post-step walks gitlinks + # (registered in git index even when not in working tree) via + # `git submodule foreach`, which aborts with + # fatal: No url found for submodule path '.external/addon-svelte-csf' + # because the repo has 160000 tree entries (.external/, .rollout-repos/) + # without a corresponding .gitmodules entry. + # + # Manual clone with sparse-checkout = .github avoids both the + # working-tree materialization of gitlinks AND the cleanup walk. + # The composite's own base clone runs next and overlays the full + # trusted base tree onto this workspace. + env: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + BASE_REF: ${{ github.event.pull_request.base.ref }} + REPO: ${{ github.repository }} + run: | + set -euo pipefail + TMP="$RUNNER_TEMP/_bootstrap" + rm -rf "$TMP" + git -c protocol.version=2 clone \ + --no-tags --no-checkout --depth=1 --filter=blob:none \ + --branch "$BASE_REF" \ + "https://x-access-token:${GITHUB_TOKEN}@github.com/${REPO}.git" \ + "$TMP" + git -C "$TMP" sparse-checkout set --no-cone .github + git -C "$TMP" checkout "$BASE_REF" -- .github + mkdir -p "$GITHUB_WORKSPACE/.github" + cp -a "$TMP/.github/." "$GITHUB_WORKSPACE/.github/" + rm -rf "$TMP" + + - name: Prepare agentic environment + id: prep + uses: ./.github/actions/agentic-pr-prepare + with: + github-token: ${{ secrets.GITHUB_TOKEN }} + base-ref: ${{ github.event.pull_request.base.ref }} + base-sha: ${{ github.event.pull_request.base.sha }} + pr-head-sha: ${{ github.event.pull_request.head.sha }} + repo: ${{ github.repository }} + # H1: srt-sha256 inline so chore-bump PRs carry the heightened + # workflow-review bar (not lowered to "config default" review). + # Bump procedure: re-run .github/workflows/_srt-sha-probe.yml and + # paste the emitted sha here. + srt-version: '0.0.51' + srt-sha256: '36de38197ac22991c8c9edead4d6184914c8b786e040ecf27bdcf26abd166338' + sync-files: | + .verify-recipes/_util.ts:.verify-recipes/_util.ts + .verify-recipes/example-smoke.spec.ts:.verify-recipes/example-smoke.spec.ts + scripts/verify-pr.ts:scripts/verify-pr.ts + scripts/verify-pr-author.ts:scripts/verify-pr-author.ts + scripts/verify-pr-generate.ts:scripts/verify-pr-generate.ts + scripts/verify-evidence-check.ts:scripts/verify-evidence-check.ts + sync-trees: | + scripts/verify + scripts/utils + provenance-secret: ${{ secrets.VERIFY_PROVENANCE_SECRET }} + install-code-deps: 'true' + + - name: Generate bundle + env: + GH_TOKEN: ${{ secrets.GITHUB_TOKEN }} + PR_NUMBER: ${{ github.event.pull_request.number }} + PR_HEAD_SHA: ${{ github.event.pull_request.head.sha }} + BASE_SHA: ${{ github.event.pull_request.base.sha }} + run: | + yarn verify-pr-generate \ + --pr "$PR_NUMBER" \ + --force \ + --base-sha "$BASE_SHA" \ + --head-sha "$PR_HEAD_SHA" \ + --output "$PR_HEAD_DIR/.verify-recipes/pr-${PR_NUMBER}.spec.ts" + + - name: Author recipe + # M2: load provenance secret from file (not $GITHUB_ENV) and thread + # explicitly via env. recipe-author-core reads VERIFY_PROVENANCE_SECRET + # from its own process env to HMAC-sign the spec. + env: + ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }} + PROVENANCE_SECRET_PATH: ${{ steps.prep.outputs.provenance-secret-path }} + run: | + set -euo pipefail + # shellcheck disable=SC2012 + BUNDLE=$(ls -t .verify-output/*/prompt-bundle.json | head -1) + VERIFY_PROVENANCE_SECRET="$(cat "$PROVENANCE_SECRET_PATH")" \ + yarn verify-pr-author --bundle "$BUNDLE" + + - name: Verify PR + id: verify + working-directory: ${{ runner.temp }}/pr-head + env: + PR_NUMBER: ${{ github.event.pull_request.number }} + NX_NO_CLOUD: 'true' + NX_CLOUD_ACCESS_TOKEN: '' + PROVENANCE_SECRET_PATH: ${{ steps.prep.outputs.provenance-secret-path }} + run: | + set -euo pipefail + # see RUNBOOK.md §verify-pr-secret-stripping + # C3: extended unset list. M2 note: VERIFY_PROVENANCE_SECRET no + # longer lives in $GITHUB_ENV, but unset is still defense-in-depth + # in case a future caller adds it. + unset GITHUB_TOKEN GH_TOKEN ANTHROPIC_API_KEY ANTHROPIC_AUTH_TOKEN \ + ACTIONS_RUNTIME_TOKEN ACTIONS_ID_TOKEN_REQUEST_TOKEN \ + ACTIONS_ID_TOKEN_REQUEST_URL ACTIONS_RESULTS_URL ACTIONS_CACHE_URL \ + TELEMETRY_AGENTIC_VERIFICATION_WEBHOOK_URL \ + TELEMETRY_AGENTIC_VERIFICATION_WEBHOOK_TOKEN \ + VERIFY_PROVENANCE_SECRET + PROVENANCE_SECRET="$(cat "$PROVENANCE_SECRET_PATH")" + # UX1: redirect $GITHUB_ENV / $GITHUB_PATH / $GITHUB_OUTPUT to + # throwaway sinks so an untrusted install / compile step can't + # poison subsequent steps. + SINK_DIR="$RUNNER_TEMP/sink" + mkdir -p "$SINK_DIR" + SINK_GITHUB_ENV="$SINK_DIR/github_env" + SINK_GITHUB_PATH="$SINK_DIR/github_path" + SINK_GITHUB_OUTPUT="$SINK_DIR/github_output" + : > "$SINK_GITHUB_ENV" + : > "$SINK_GITHUB_PATH" + : > "$SINK_GITHUB_OUTPUT" + env -i \ + HOME="$HOME" \ + PATH="$PATH" \ + RUNNER_TEMP="$RUNNER_TEMP" \ + NX_NO_CLOUD=true \ + PR_HEAD_DIR="$PR_HEAD_DIR" \ + GITHUB_ENV="$SINK_GITHUB_ENV" \ + GITHUB_PATH="$SINK_GITHUB_PATH" \ + GITHUB_OUTPUT="$SINK_GITHUB_OUTPUT" \ + YARN_ENABLE_SCRIPTS=false \ + yarn install --immutable + env -i \ + HOME="$HOME" \ + PATH="$PATH" \ + RUNNER_TEMP="$RUNNER_TEMP" \ + NX_NO_CLOUD=true \ + PR_HEAD_DIR="$PR_HEAD_DIR" \ + GITHUB_ENV="$SINK_GITHUB_ENV" \ + GITHUB_PATH="$SINK_GITHUB_PATH" \ + GITHUB_OUTPUT="$SINK_GITHUB_OUTPUT" \ + yarn playwright install --with-deps chromium + + VERIFY_RESULT_PATH="$PR_HEAD_DIR/.verify-out-trusted/verify-result.json" + export VERIFY_RESULT_PATH + STUB_OUT_DIR="$(dirname "$VERIFY_RESULT_PATH")" + mkdir -p "$STUB_OUT_DIR" + echo "result-path=$VERIFY_RESULT_PATH" >> "$GITHUB_OUTPUT" + + write_compile_failure_stub() { + local log="$1" + node "$GITHUB_WORKSPACE/scripts/verify/ci/write-compile-failure-stub.ts" \ + --log "$log" \ + --out-dir "$STUB_OUT_DIR" \ + --template "internal-ui" + } + COMPILE_LOG="$RUNNER_TEMP/verify-compile.log" + SPEC=".verify-recipes/pr-${PR_NUMBER}.spec.ts" + TARGET="internal-ui" + TEMPLATE="" + if [ -f "$SPEC" ]; then + HEADER=$(grep -E '^// @verify-target:' "$SPEC" | head -1 || true) + CANDIDATE=$(echo "$HEADER" | sed -E 's|.*@verify-target:[[:space:]]*||;s|[[:space:]]+$||') + case "$CANDIDATE" in + sandbox:*) + TEMPLATE="${CANDIDATE#sandbox:}" + case "$TEMPLATE" in + react-vite/default-ts|react-webpack/default-ts|vue3-vite/default-ts|svelte-vite/default-ts|angular-cli/default-ts|nextjs/default-ts|nextjs-vite/default-ts) + TARGET="sandbox" + ;; + *) + echo "[verify] sandbox template '$TEMPLATE' is not allowlisted; falling back to internal-ui prep." + TEMPLATE="" + ;; + esac + ;; + esac + fi + + srt_compile() { + env -i \ + HOME="$HOME" \ + PATH="$PATH" \ + RUNNER_TEMP="$RUNNER_TEMP" \ + NX_NO_CLOUD=true \ + PR_HEAD_DIR="$PR_HEAD_DIR" \ + GITHUB_ENV="$SINK_GITHUB_ENV" \ + GITHUB_PATH="$SINK_GITHUB_PATH" \ + GITHUB_OUTPUT="$SINK_GITHUB_OUTPUT" \ + "$@" + } + if [ "$TARGET" = "sandbox" ]; then + echo "[verify] spec targets sandbox '$TEMPLATE' — running 'nx run $TEMPLATE:sandbox'" + if ! srt_compile yarn nx run "$TEMPLATE:sandbox" 2>&1 | tee "$COMPILE_LOG"; then + write_compile_failure_stub "$COMPILE_LOG" + exit 0 + fi + else + if ! srt_compile yarn nx compile core 2>&1 | tee "$COMPILE_LOG"; then + write_compile_failure_stub "$COMPILE_LOG" + exit 0 + fi + if ! srt_compile yarn nx run-many -t compile 2>&1 | tee "$COMPILE_LOG"; then + write_compile_failure_stub "$COMPILE_LOG" + exit 0 + fi + fi + + # M2: thread VERIFY_PROVENANCE_SECRET into srt env explicitly via + # `env VAR=...` so the orchestrator can HMAC-verify the spec. srt + # propagates this var into the jailed process; spec code already + # cannot reach it because the spec runs as Playwright workers, and + # the orchestrator scrubs it before spawning workers. + srt --settings "$SRT_SETTINGS" \ + env VERIFY_RESULT_PATH="$VERIFY_RESULT_PATH" \ + VERIFY_PROVENANCE_SECRET="$PROVENANCE_SECRET" \ + yarn verify-pr --recipe-spec ".verify-recipes/pr-${PR_NUMBER}.spec.ts" || \ + echo "verify-pr exited non-zero — verdict captured in verify-result.json" + + - name: Evidence check (vision) + if: always() + env: + ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }} + PR_NUMBER: ${{ github.event.pull_request.number }} + RESULT: ${{ steps.verify.outputs.result-path }} + run: | + set -euo pipefail + if [ ! -f "$RESULT" ]; then + echo "no verify-result.json at $RESULT — skipping evidence check" + exit 0 + fi + VERDICT=$(jq -r '.verdict' "$RESULT") + if [ "$VERDICT" != "verified" ]; then + echo "verdict is '$VERDICT' — skipping evidence check (retry will use Playwright error context)" + exit 0 + fi + RECIPE="$PR_HEAD_DIR/.verify-recipes/pr-${PR_NUMBER}.spec.ts" + yarn verify-evidence-check \ + --result "$RESULT" \ + --diff "$RUNNER_TEMP/pr.diff" \ + --recipe "$RECIPE" + + - name: Retry on regression or evidence missing/undetermined + if: always() + env: + ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }} + GH_TOKEN: ${{ secrets.GITHUB_TOKEN }} + PR_NUMBER: ${{ github.event.pull_request.number }} + PR_HEAD_SHA: ${{ github.event.pull_request.head.sha }} + BASE_SHA: ${{ github.event.pull_request.base.sha }} + NX_NO_CLOUD: 'true' + NX_CLOUD_ACCESS_TOKEN: '' + RESULT: ${{ steps.verify.outputs.result-path }} + PROVENANCE_SECRET_PATH: ${{ steps.prep.outputs.provenance-secret-path }} + run: | + set -euo pipefail + if [ ! -f "$RESULT" ]; then + echo "no verify-result.json at $RESULT — skipping retry" + exit 0 + fi + VERDICT=$(jq -r '.verdict' "$RESULT") + EVIDENCE=$(jq -r '.evidenceVerdict // "n/a"' "$RESULT") + RUN_DIR=$(dirname "$RESULT") + + REASON="" + case "$VERDICT" in + regression) + ERROR_CTX="" + for f in "$RUN_DIR"/*/error-context.md; do + [ -f "$f" ] || continue + SLUG=$(basename "$(dirname "$f")") + ERROR_CTX+=$'\n\n--- page snapshot at failure ('"$SLUG"$') ---\n' + ERROR_CTX+=$(head -c 8000 "$f") + IF="$(dirname "$f")/iframe-snapshot.md" + if [ -f "$IF" ]; then + ERROR_CTX+=$'\n\n--- preview iframe snapshot at failure ('"$SLUG"$') ---\n' + ERROR_CTX+=$(head -c 8000 "$IF") + fi + done + REPORT="$RUN_DIR/playwright-report.json" + ERROR_MSG="" + if [ -f "$REPORT" ]; then + ERROR_MSG=$(jq -r ' + [.. | objects | select(has("errors")) | .errors[]? | (.message // .stack // tostring)] | .[0] // "" + ' "$REPORT" 2>/dev/null || true) + fi + REASON="Playwright assertions failed. The recipe ran but the test did not pass. Use the page snapshot below as ground truth for selectors / route paths / aria roles — if you navigated to a route that does not exist, the snapshot will say so; if a locator timed out, the actual DOM is in the snapshot. Adjust selectors / navigation accordingly. Do NOT repeat the previous attempt's selectors or routes verbatim." + if [ -n "$ERROR_MSG" ]; then + REASON+=$'\n\nFirst Playwright error:\n'"$ERROR_MSG" + fi + REASON+="$ERROR_CTX" + echo "regression — retrying with error context (length=${#REASON})" + ;; + *) + if [ "$EVIDENCE" != "missing" ] && [ "$EVIDENCE" != "undetermined" ]; then + echo "verdict=$VERDICT evidence=$EVIDENCE — no retry needed" + exit 0 + fi + REASON=$(jq -r '.evidenceReasoning // ""' "$RESULT") + echo "evidence $EVIDENCE — retrying with vision reasoning" + ;; + esac + + # shellcheck disable=SC2012 + PRIOR_RUN_DIR=$(ls -dt "$PR_HEAD_DIR"/.verify-output/*/ 2>/dev/null | head -1 || true) + if ! yarn verify-pr-generate \ + --pr "$PR_NUMBER" \ + --force \ + --base-sha "$BASE_SHA" \ + --head-sha "$PR_HEAD_SHA" \ + --prior-run-dir "${PRIOR_RUN_DIR%/}" \ + --output "$PR_HEAD_DIR/.verify-recipes/pr-${PR_NUMBER}.spec.ts" \ + --retry-context "$REASON"; then + echo "retry verify-pr-generate failed — keeping original verdict" + exit 0 + fi + + # shellcheck disable=SC2012 + BUNDLE=$(ls -t .verify-output/*/prompt-bundle.json | head -1) + PROVENANCE_SECRET="$(cat "$PROVENANCE_SECRET_PATH")" + if ! VERIFY_PROVENANCE_SECRET="$PROVENANCE_SECRET" yarn verify-pr-author --bundle "$BUNDLE"; then + echo "retry verify-pr-author failed — keeping original verdict" + exit 0 + fi + + PREV_MTIME=$(stat -c '%Y' "$RESULT" 2>/dev/null || stat -f '%m' "$RESULT" 2>/dev/null || echo 0) + + ( + cd "$PR_HEAD_DIR" + env -i \ + HOME="$HOME" \ + PATH="$PATH" \ + RUNNER_TEMP="$RUNNER_TEMP" \ + NX_NO_CLOUD=true \ + PR_HEAD_DIR="$PR_HEAD_DIR" \ + VERIFY_RESULT_PATH="$RESULT" \ + VERIFY_PROVENANCE_SECRET="$PROVENANCE_SECRET" \ + srt --settings "$SRT_SETTINGS" \ + yarn verify-pr --recipe-spec ".verify-recipes/pr-${PR_NUMBER}.spec.ts" + ) || \ + echo "retry verify-pr exited non-zero — evidence-check will still run if verdict==verified" + + NEW_MTIME=$(stat -c '%Y' "$RESULT" 2>/dev/null || stat -f '%m' "$RESULT" 2>/dev/null || echo 0) + if [ "$NEW_MTIME" = "$PREV_MTIME" ]; then + echo "retry produced no new verify-result.json — keeping original verdict" + exit 0 + fi + jq '. + {evidenceRetry: true}' "$RESULT" > "$RESULT.tmp" && mv "$RESULT.tmp" "$RESULT" + + NEW_VERDICT=$(jq -r '.verdict' "$RESULT") + if [ "$NEW_VERDICT" = "verified" ]; then + yarn verify-evidence-check \ + --result "$RESULT" \ + --diff "$RUNNER_TEMP/pr.diff" \ + --recipe "$PR_HEAD_DIR/.verify-recipes/pr-${PR_NUMBER}.spec.ts" || true + fi + + FINAL_EVIDENCE=$(jq -r '.evidenceVerdict // "n/a"' "$RESULT") + echo "retry complete — final verdict=$NEW_VERDICT evidence=$FINAL_EVIDENCE" + + - name: Run PR-added unit tests + if: always() + working-directory: ${{ runner.temp }}/pr-head + env: + PR_NUMBER: ${{ github.event.pull_request.number }} + NX_NO_CLOUD: 'true' + NX_CLOUD_ACCESS_TOKEN: '' + RESULT: ${{ steps.verify.outputs.result-path }} + run: | + set -euo pipefail + unset GITHUB_TOKEN GH_TOKEN ANTHROPIC_API_KEY ANTHROPIC_AUTH_TOKEN \ + ACTIONS_RUNTIME_TOKEN ACTIONS_ID_TOKEN_REQUEST_TOKEN \ + ACTIONS_ID_TOKEN_REQUEST_URL ACTIONS_RESULTS_URL ACTIONS_CACHE_URL \ + TELEMETRY_AGENTIC_VERIFICATION_WEBHOOK_URL \ + TELEMETRY_AGENTIC_VERIFICATION_WEBHOOK_TOKEN \ + VERIFY_PROVENANCE_SECRET + if [ ! -f "$RESULT" ]; then + echo "no verify-result.json at $RESULT — skipping unit-test step" + exit 0 + fi + + TEST_FILES=() + while IFS= read -r f; do + [ -z "$f" ] && continue + [ -f "$f" ] || continue + TEST_FILES+=("$f") + done < <( + grep -E '^\+\+\+ b/code/.+\.(test|spec)\.(ts|tsx|js|jsx)$' "$RUNNER_TEMP/pr.diff" 2>/dev/null \ + | sed 's|^+++ b/||' \ + | sort -u + ) + + if [ ${#TEST_FILES[@]} -eq 0 ]; then + echo "no PR-added unit tests detected — recording n/a" + jq '. + {unitTests: {ran: false, files: [], passed: null, summary: "no PR-added test files in diff"}}' \ + "$RESULT" > "$RESULT.tmp" && mv "$RESULT.tmp" "$RESULT" + exit 0 + fi + + echo "running ${#TEST_FILES[@]} unit test file(s): ${TEST_FILES[*]}" + REPORT="$RUNNER_TEMP/unit-tests-report.json" + VITEST_LOG="$RUNNER_TEMP/vitest.log" + + set +e + env -i \ + HOME="$HOME" \ + PATH="$PATH" \ + RUNNER_TEMP="$RUNNER_TEMP" \ + NX_NO_CLOUD=true \ + PR_HEAD_DIR="$PR_HEAD_DIR" \ + srt --settings "$SRT_SETTINGS" \ + yarn vitest run --reporter=json --outputFile "$REPORT" -- "${TEST_FILES[@]}" > "$VITEST_LOG" 2>&1 + VITEST_EXIT=$? + set -e + + if [ -f "$REPORT" ]; then + PASSED=$(jq '.numFailedTests == 0 and .numFailedTestSuites == 0' "$REPORT") + SUMMARY=$(jq -r '"\(.numPassedTests) passed, \(.numFailedTests) failed across \(.numTotalTestSuites) suite(s)"' "$REPORT") + else + PASSED=false + SUMMARY="vitest exited $VITEST_EXIT without writing a JSON report (likely setup error); see Action log" + fi + DETAILS=$(tail -c 4000 "$VITEST_LOG" \ + | perl -pe 's/\x1B[@-Z\\-_]|\x1B\[[0-?]*[ -\/]*[@-~]|\x1B\][^\x07\x1B]*(?:\x07|\x1B\\)|\x1BP[^\x1B]*\x1B\\|[\x00-\x08\x0B\x0C\x0E-\x1F\x7F-\x9F]//g' \ + | jq -Rs .) + + jq \ + --argjson passed "$PASSED" \ + --arg summary "$SUMMARY" \ + --argjson files "$(printf '%s\n' "${TEST_FILES[@]}" | jq -R . | jq -s .)" \ + --argjson details "$DETAILS" \ + '. + {unitTests: {ran: true, files: $files, passed: $passed, summary: $summary, details: $details}}' \ + "$RESULT" > "$RESULT.tmp" && mv "$RESULT.tmp" "$RESULT" + echo "unit-tests verdict: $SUMMARY (passed=$PASSED, exit=$VITEST_EXIT)" + + - name: Publish agentic results + id: pub + if: always() + uses: ./.github/actions/agentic-pr-publish + with: + github-token: ${{ secrets.GITHUB_TOKEN }} + pr-number: ${{ github.event.pull_request.number }} + run-id: ${{ github.run_id }} + repo: ${{ github.repository }} + result-path: ${{ steps.verify.outputs.result-path }} + provenance-secret-path: ${{ steps.prep.outputs.provenance-secret-path }} + telemetry-webhook-url: ${{ secrets.TELEMETRY_AGENTIC_VERIFICATION_WEBHOOK_URL }} + telemetry-webhook-token: ${{ secrets.TELEMETRY_AGENTIC_VERIFICATION_WEBHOOK_TOKEN }} + + - name: Apply verified-by-harness label + if: steps.pub.outputs.verdict == 'verified' + env: + GH_TOKEN: ${{ secrets.GITHUB_TOKEN }} + PR_NUMBER: ${{ github.event.pull_request.number }} + REPO: ${{ github.repository }} + run: | + gh label create verified-by-harness \ + --repo "$REPO" \ + --description "Verified by PR Verify Harness" \ + --color 0E8A16 \ + 2>/dev/null || true + gh pr edit "$PR_NUMBER" \ + --repo "$REPO" \ + --add-label verified-by-harness + + - name: Post PR comment + # Body rendering lives in scripts/verify/ci/render-pr-comment.ts so the + # workflow stays slim and the logic is testable in isolation. H4: + # screenshot URLs are read from FILE path (not a heredoc string) to + # close the terminator-injection surface across the composite boundary. + if: always() + env: + SCREENSHOT_URLS_PATH: ${{ steps.pub.outputs.screenshot-urls-path }} + RESULT: ${{ steps.verify.outputs.result-path }} + GH_TOKEN: ${{ secrets.GITHUB_TOKEN }} + REPO: ${{ github.repository }} + PR_NUMBER: ${{ github.event.pull_request.number }} + RUN_URL: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }} + run: | + set -euo pipefail + BODY_FILE="$RUNNER_TEMP/verify-comment-body.md" + node "$GITHUB_WORKSPACE/scripts/verify/ci/render-pr-comment.ts" \ + --result "$RESULT" \ + --run-url "$RUN_URL" \ + ${SCREENSHOT_URLS_PATH:+--urls-path "$SCREENSHOT_URLS_PATH"} \ + --output "$BODY_FILE" + gh pr comment "$PR_NUMBER" --repo "$REPO" --body-file "$BODY_FILE" + + - name: Fail job if final verdict is not verified + if: always() && steps.pub.outputs.verdict != 'verified' + env: + VERDICT: ${{ steps.pub.outputs.verdict }} + run: | + echo "Final verdict: ${VERDICT:-} — failing job." + exit 1 diff --git a/.gitignore b/.gitignore index c8417670f741..0843b0c36dca 100644 --- a/.gitignore +++ b/.gitignore @@ -88,5 +88,8 @@ scripts/eval/results # review-pr skill output .pr-review +# verify-pr harness output +.verify-output + # Unknown .omc diff --git a/.verify-recipes/.eslintrc.cjs b/.verify-recipes/.eslintrc.cjs new file mode 100644 index 000000000000..cdbcffbb1d28 --- /dev/null +++ b/.verify-recipes/.eslintrc.cjs @@ -0,0 +1,131 @@ +module.exports = { + root: true, + parser: require.resolve('@typescript-eslint/parser'), + parserOptions: { ecmaVersion: 2022, sourceType: 'module', project: false }, + env: { node: true, es2022: true }, + plugins: ['@typescript-eslint', 'verify-recipes'], + extends: ['eslint:recommended', 'plugin:@typescript-eslint/recommended'], + rules: { + 'no-unused-vars': 'off', + '@typescript-eslint/no-unused-vars': [ + 'error', + { argsIgnorePattern: 'none', varsIgnorePattern: 'none' }, + ], + + // Security: forbid dynamic code execution + 'no-eval': 'error', + 'no-new-func': 'error', + 'no-implied-eval': 'error', + 'no-restricted-globals': ['error', 'eval', 'Function'], + + // Security (C6): deny every node: built-in (and its bare-form alias). + // ESLint's no-restricted-imports has no "allow-list" mode, so we + // enumerate the dangerous module names explicitly and pair it with + // a `node:*` glob to catch the prefixed forms regardless of which + // built-in shows up. The intent is an allow-list: imports allowed + // in recipes are limited to `@playwright/test`, `./_util.ts`, and + // `./_util` (and the deny-regex tripwire enforces the same). + 'no-restricted-imports': [ + 'error', + { + paths: [ + { name: 'child_process' }, + { name: 'node:child_process' }, + { name: 'fs' }, + { name: 'fs/promises' }, + { name: 'node:fs' }, + { name: 'node:fs/promises' }, + { name: 'net' }, + { name: 'node:net' }, + { name: 'dns' }, + { name: 'node:dns' }, + { name: 'http' }, + { name: 'node:http' }, + { name: 'https' }, + { name: 'node:https' }, + { name: 'module' }, + { name: 'node:module' }, + { name: 'vm' }, + { name: 'node:vm' }, + { name: 'cluster' }, + { name: 'node:cluster' }, + { name: 'worker_threads' }, + { name: 'node:worker_threads' }, + { name: 'os' }, + { name: 'node:os' }, + { name: 'path' }, + { name: 'node:path' }, + { name: 'stream' }, + { name: 'node:stream' }, + { name: 'tls' }, + { name: 'node:tls' }, + ], + patterns: [ + { + group: ['node:*'], + message: + 'node: built-ins are forbidden in recipes. Imports allowed: @playwright/test, ./_util.ts, ./_util.', + }, + ], + }, + ], + + // Security (C6): forbid runtime resolver / dynamic eval / native bindings. + // Each selector pins one obfuscation path that would otherwise sneak past + // the static import allow-list. + 'no-restricted-syntax': [ + 'error', + { + selector: "CallExpression[callee.name='require']", + message: 'Runtime require() is forbidden in recipes.', + }, + { + selector: "MemberExpression[property.name='require']", + message: + 'Member-access require (e.g. `foo.require`, `module.require`) is forbidden in recipes.', + }, + { + selector: + "MemberExpression[object.object.name='process'][object.property.name='mainModule']", + message: 'process.mainModule.* access is forbidden in recipes.', + }, + { + selector: "MemberExpression[object.name='globalThis'][computed=true]", + message: 'Computed globalThis[...] access is forbidden in recipes.', + }, + { + selector: 'ImportExpression', + message: 'Dynamic import() is forbidden in recipes.', + }, + { + selector: "Identifier[name='createRequire']", + message: 'createRequire is forbidden in recipes.', + }, + { + selector: "MemberExpression[property.name='_load']", + message: 'Module._load access is forbidden in recipes.', + }, + { + selector: "MemberExpression[object.name='process'][property.name='binding']", + message: 'process.binding() is forbidden in recipes.', + }, + { + selector: "MemberExpression[object.name='process'][property.name='dlopen']", + message: 'process.dlopen is forbidden in recipes.', + }, + { + selector: + "CallExpression[callee.object.name='process'][callee.property.name=/^(exit|kill|binding)$/]", + message: 'process.exit/kill/binding are forbidden in recipes.', + }, + { + selector: "CallExpression[callee.name='fetch']", + message: 'Global fetch is forbidden in recipes; use page.* primitives.', + }, + ], + + // Custom structural rules for Playwright recipe correctness + 'verify-recipes/listener-before-goto': 'error', + 'verify-recipes/attach-pattern': 'error', + }, +}; diff --git a/.verify-recipes/.gitkeep b/.verify-recipes/.gitkeep new file mode 100644 index 000000000000..e69de29bb2d1 diff --git a/.verify-recipes/__fixtures__/bad-node-import.ts b/.verify-recipes/__fixtures__/bad-node-import.ts new file mode 100644 index 000000000000..3bd9bb0864f0 --- /dev/null +++ b/.verify-recipes/__fixtures__/bad-node-import.ts @@ -0,0 +1,7 @@ +// Fixture: should fail ESLint with no-restricted-imports +// This file intentionally imports a forbidden node: built-in. +import { readFileSync } from 'node:fs'; + +export function dummy() { + return readFileSync('/dev/null', 'utf8'); +} diff --git a/.verify-recipes/__fixtures__/bypass-process-mainmodule.ts b/.verify-recipes/__fixtures__/bypass-process-mainmodule.ts new file mode 100644 index 000000000000..3420e26348a4 --- /dev/null +++ b/.verify-recipes/__fixtures__/bypass-process-mainmodule.ts @@ -0,0 +1,10 @@ +// Fixture (C6 bypass-attempt): should fail ESLint with no-restricted-syntax. +// Tries to reach child_process via process.mainModule.require — caught by the +// process.mainModule + member-access require selectors. +import { test } from './_util.ts'; + +test('attempt to load child_process via process.mainModule', () => { + // @ts-expect-error - process.mainModule is non-null at runtime in Node. + const cp = process.mainModule.require('child_process'); + cp.execSync('echo pwned'); +}); diff --git a/.verify-recipes/__fixtures__/goto-without-listener.ts b/.verify-recipes/__fixtures__/goto-without-listener.ts new file mode 100644 index 000000000000..5882dafeef96 --- /dev/null +++ b/.verify-recipes/__fixtures__/goto-without-listener.ts @@ -0,0 +1,8 @@ +// Fixture: should fail ESLint with verify-recipes/listener-before-goto +// This spec calls page.goto() without registering a listener first. +import { test, expect } from './_util.ts'; + +test('navigate without listener', async ({ page }) => { + await page.goto('/'); + await expect(page).toHaveURL('/'); +}); diff --git a/.verify-recipes/_recipe-authoring-guide.md b/.verify-recipes/_recipe-authoring-guide.md new file mode 100644 index 000000000000..ecc6941c536f --- /dev/null +++ b/.verify-recipes/_recipe-authoring-guide.md @@ -0,0 +1,382 @@ +# Recipe Authoring Guide (for LLM recipe-author agents) + +This file is the **authoring contract** for agent-generated Playwright recipes in `.verify-recipes/`. The `verify-recipe-author` skill includes this guide verbatim in the prompt; the runner executes the committed spec via `bun x playwright test`. + +> **Audience:** an LLM that writes a single `.spec.ts` file for one PR. The output must match the contract below exactly — no exceptions. + +--- + +## 1. Output contract + +Emit **one file** at the path specified by the skill: `.verify-recipes/pr-<#>.spec.ts`. + +Required shape: + +```ts +import { RecipePage, expect, filterPageErrors, test } from './_util.ts'; + +test('', async ({ page }, testInfo) => { + // ... see rules below ... +}); +``` + +Hard requirements: + +- **Imports**: ONLY `./_util.ts` (which re-exports `expect`, `filterPageErrors`, and a `test` extended with the harness's auto-failure-capture fixture — captures the preview iframe accessibility snapshot to `iframe-snapshot.md` so the retry loop can feed it back to the next author dispatch). Nothing else. No `node:*`, no `child_process`, no `fs`, no `@storybook/*`, no relative imports outside `.verify-recipes/`. Do not import `test` or `expect` directly from `@playwright/test`; that bypasses the failure-capture fixture. +- **Exactly one `test(...)` call.** No `describe`, no `test.skip`, no `test.only`, no `beforeEach`/`afterEach`. +- **`.ts` extension on relative imports** (`./_util.ts`, not `./_util`). +- **No top-level side effects** — everything inside the `test(...)` callback. + +Output is wrapped between fenced markers `<<>>` and `<<>>` (the skill strips these and writes the body). + +--- + +## 2. Listener-before-goto rule (HARD GATE — AC-V3-3) + +`page.on('pageerror', ...)` and `page.on('console', ...)` listeners MUST be registered **before** the first `page.goto(...)` call. The skill's post-write regex check enforces this; if you call `page.goto` first, the spec is rejected. + +Canonical pattern: + +```ts +test('my recipe', async ({ page }, testInfo) => { + const pageErrors: string[] = []; + const consoleErrors: string[] = []; + + // Listeners FIRST. Always. + page.on('pageerror', (err) => { + pageErrors.push(err.stack ?? err.message ?? String(err)); + }); + page.on('console', (msg) => { + if (msg.type() === 'error') consoleErrors.push(msg.text()); + }); + + const baseURL = + process.env.STORYBOOK_URL ?? testInfo.project.use.baseURL ?? 'http://localhost:6006'; + + // Now (and only now) navigate. + await page.goto(`${baseURL}/?path=/story/example-button--primary`); + // ... +}); +``` + +Never call `page.goto` (or `page.waitForURL`, or any other navigation primitive) before the listeners are attached. + +--- + +## 3. Attach pattern (HARD GATE — AC-V3-4) + +The runner harvests `pageErrors` and `consoleErrors` from test attachments. You MUST attach both in a `finally` block (so attachments land even on assertion failure): + +```ts +try { + // ...goto + assertions... +} finally { + await testInfo.attach('pageErrors', { + body: JSON.stringify(pageErrors), + contentType: 'application/json', + }); + await testInfo.attach('consoleErrors', { + body: JSON.stringify(consoleErrors), + contentType: 'application/json', + }); +} +``` + +Attachment names are exactly `pageErrors` and `consoleErrors`. The body is JSON-stringified array of strings (already accumulated by the listeners). + +### Filtering known low-signal pageErrors + +When you assert `pageErrors` at the end of the recipe, wrap the array in +`filterPageErrors(...)` from `./_util.ts`: + +```ts +expect(filterPageErrors(pageErrors)).toEqual([]); +``` + +`filterPageErrors` drops upstream-known noise — currently the cross-origin +`SecurityError: Failed to read the 'sessionStorage' property from 'Window'` +that `@storybook/addon-mcp` emits on every internal-ui boot when its +composed-ref auth probe touches chromatic-hosted iframes. The runner's +`computeVerdict` applies the same filter on the attachment side, so +**`filterPageErrors(pageErrors)` keeps the local assertion in sync with the +runner's verdict logic** and prevents a "regression" verdict driven entirely +by environmental noise. Never assert on the raw `pageErrors` array. + +--- + +## 4. `RecipePage` API (the only helper) + +From `./_util.ts`: + +```ts +new RecipePage(page, expect).waitUntilLoaded(): Promise +new RecipePage(page, expect).previewIframe(): FrameLocator +new RecipePage(page, expect).previewRoot(): Locator +new RecipePage(page, expect).waitForStoryLoaded(): Promise +``` + +- `waitUntilLoaded()` injects a session-storage layout, disables transitions, waits for `.sb-preparing-story` / `.sb-preparing-docs` to vanish, then for the story root to be attached. +- `previewIframe()` returns `page.frameLocator('#storybook-preview-iframe')` — use for any preview-frame assertions. +- `previewRoot()` returns the visible `#storybook-root` (or `#storybook-docs`) inside the preview iframe. + +Call `waitUntilLoaded()` immediately after `page.goto(...)`. + +--- + +## 5. Selectors and locators + +Preferred (in priority order): + +1. `page.getByRole(...)` — accessibility-tree queries, most stable +2. `page.getByTestId(...)` / `data-testid` selectors +3. ID selectors (`#storybook-preview-iframe`, `#sb-errordisplay`, `#storybook-root`) +4. Class selectors that look stable (`.sb-preparing-story` etc.) + +Avoid: + +- `:nth-child(N)` chains — break on layout shifts +- Brittle class chains (`.foo .bar .baz > div`) +- Free-text matches without `i18n` context +- `setTimeout` / `page.waitForTimeout` for synchronization — use Playwright web-first assertions or `RecipePage.waitUntilLoaded()` + +--- + +## 6. Story URL routing + +- Story: `?path=/story/--` (e.g., `/?path=/story/example-button--primary`) +- Docs: `?path=/docs/--` (e.g., `/?path=/docs/example-button--docs`) +- Manager only (no story): omit the `path` param or use `?path=/` + +**Use the routes the harness pre-computes for you.** The prompt bundle contains a "Story routes (computed deterministically by the harness)" section that lists, for each `*.stories.{ts,tsx,mdx}` file referenced by the diff (or imported by a sibling of a touched non-stories source file), the canonical title, the per-export `storyId`, and the matching `storyUrl` / `docsUrl`. These come from Storybook's own auto-title + `toId` algorithms, so they match what the indexer would emit at runtime. + +Past dispatches that hand-derived kebab-case kind-ids (`addons-controls-object--basic`, `addons-controls-basics--docs`, …) have 404'd because Storybook's auto-title pipeline mangles paths differently than a naive kebabify (leaf/dir dedupe, `index.stories.ts` collapsing, `titlePrefix` interplay, etc). Always prefer the routes the harness emits. + +If the section is absent (because the diff doesn't touch any code under `code/` or because no sibling story imports the changed module), fall back to the manager-only route `?path=/` and rely on sidebar-driven navigation. Do not invent a URL. + +--- + +## 7. Frame access + +- Manager DOM (toolbar, sidebar, addon panels): use `page` directly. +- Preview iframe DOM (the story itself): use `page.frameLocator('#storybook-preview-iframe')` or `recipe.previewIframe()`. + +Example: + +```ts +const recipe = new RecipePage(page, expect); +await recipe.waitUntilLoaded(); + +const previewIframe = recipe.previewIframe(); +const button = previewIframe.getByRole('button', { name: /primary/i }); +await expect(button).toBeVisible(); +``` + +--- + +## 8. Assertions — what counts as a meaningful recipe + +A **smoke-shaped recipe** is the minimum: navigate to a story, wait until loaded, assert preview-root has children, `#sb-errordisplay` is hidden, screenshot the iframe, attach errors. See `example-smoke.spec.ts` for the canonical form. + +A **targeted recipe** goes further. Examples per change type: + +| Diff touches | Recipe should additionally | +|---|---| +| `code/addons//**` | Open the addon panel (`recipe.previewIframe()` may be irrelevant; manager queries needed); assert addon-tab present; trigger the addon's primary interaction | +| `code/core/src/manager/**` | Assert sidebar entries render; navigate between two stories; assert URL update | +| `code/core/src/manager-api/**` | Assert at least one channel-bound UI element responds (e.g., theme toggle, tab switch) | +| `code/core/src/csf-tools/**` | Open a story whose CSF the PR touches; assert it indexes (visible in sidebar tree) | +| `code/core/src/preview-api/**` | Open a story with args/decorators in scope; assert `previewRoot()` rendered without errordisplay | +| `code/frameworks//**` | Use the framework's reference template story (e.g., svelte → svelte-vite default story); confirm SSR/CSR hydration shape if applicable | +| `code/builders/**` | Assert preview-iframe loads at all (builder errors surface here); navigate to a story; confirm HMR not needed for static load | + +Pick the assertion shape that most directly observes the changed code path. Prefer 1-3 focused assertions over a long list — the runner harvests pageerrors/consoleerrors orthogonally. + +--- + +## 8.1 Evidence requirement (HARD GATE for single-round CI) + +In single-round CI mode the assertions + screenshots ARE the evidence the harness reports as "verified". A smoke recipe that asserts unrelated story behaviour is technically passable but **does not verify the diff** — the PR comment will mislead reviewers. Treat this section as a hard authoring gate. + +Before emitting the spec, work through the following four questions explicitly: + +1. **What does this PR visibly or behaviourally change?** Read the diff carefully. Icon swap, text change, conditional render branch, focus / hover / dark-mode state, addon panel content, sidebar tree, URL params, computed style — all qualify. +2. **What UI state is required to see the change?** Common gates: + - **Conditional render** — e.g. `if (newCount === 0 && modifiedCount === 0) return null`. The element only mounts when its predicate is true. Identify the predicate's inputs and either set them via `page.evaluate(...)` against the manager-api / universal store, set localStorage / sessionStorage keys, or navigate to a route that produces the required state. + - **Feature flags** — `globalThis.FEATURES.changeDetection` and similar flags are **enabled by default** in the internal-ui Storybook. The diff itself is the only authoritative source for whether a new flag must be set. + - **Theme / dark-mode** — pass `?globals=theme:dark` in the URL or set the theme via `manager-api` once the manager mounts. + - **Focus / hover / keyboard-only states** — use `.focus()`, `.hover()`, `page.keyboard.press('Tab')`. Many a11y-related PRs only render their change in these states. + - **Specific story route** — when the diff names a specific component, navigate to the story that mounts it, not the generic `example-button--primary`. +3. **Before deciding the trigger state is unreachable, walk through every affordance listed in the next subsection.** For each one, decide whether it applies to this diff. Most "I can't do this without `fs.*`" assumptions turn out to be wrong because Storybook's own in-app machinery exposes a path: Save from Controls writes story files via csf-tools, `page.evaluate` reaches manager-api setters, URL globals flip theme/args, and so on. **Only after explicitly considering each affordance and rejecting it with a one-sentence reason** may you fall back to: render the surrounding container, assert `#sb-errordisplay` is hidden, assert `expect(filterPageErrors(pageErrors)).toEqual([])`. The bare phrase "working-tree mutation required" is **not** a valid fallback justification — Save from Controls satisfies that exact need without ever touching `fs.*`. The fallback is reserved for cases where (a) the diff is non-visual at all (pure type/logic refactor), or (b) the visible effect depends on env state outside the runner's reach. Either way, state the rejected affordances in the spec comment so a reviewer can audit the reasoning. +4. **Screenshot the region containing the changed UI**, not the whole page. Use `locator.screenshot({ path: testInfo.outputPath('.png') })` against the parent of the changed element (e.g. `.sidebar-container` for sidebar diffs, the addon-panel locator for addon panels, the docs `[role="table"]` for ArgsTable changes). Full-page or generic preview screenshots are acceptable only for layout-wide changes. The PR comment renders every screenshot you attach inline — reviewers should see the change in the image. + +### Affordances Playwright recipes have for setting up trigger state + +The deny-regex blocks `fs.*` and `child_process` inside the spec body. That does **not** mean the test cannot reach state that lives on disk — Storybook's own in-app machinery exposes plenty of paths. Before declaring a trigger state unreachable, consider: + +- **URL params for navigation, theme, args, globals, and docs vs story modes.** + - `?path=/story/--` and `?path=/docs/--` for story / docs routes. + - `?globals=theme:dark` (or whatever global the renderer exposes) to flip dark-mode and other globals without clicking the toolbar. + - `?args=name:Hello` to seed initial arg values for a story. +- **`page.evaluate(...)` against the manager-api.** Storybook exposes its manager-api on `window` once the manager mounts — useful when a feature has a public toggle / setter that recipes can call directly. Inspect the diff for an `experimental_*` or `api.*` setter the change relies on and call it from the recipe. +- **Save from Controls (csf-tools write-back) for change-detection-style features.** The Controls addon's save button is enabled by default. The recipe opens a story (e.g. `example-button--primary`), clicks the Controls tab (`getByRole('tab', { name: /controls/i })`), edits a control value (e.g. the `label` input), and clicks **`Save changes to story`** (aria-label) / **`Update story`** (visible text) — i.e. `getByRole('button', { name: /save changes to story|update story/i })`. Storybook's csf-tools writes the modified args back to the underlying `*.stories.tsx` file on the runner's PR-head workspace. The change-detection scanner reads uncommitted working-tree state, so the Save-driven edit flips the story's status to MOD and surfaces change-detection UI (e.g. `ReviewChangesButton`'s clear button) in the sidebar. The recipe never touches `fs.*` directly — Storybook does the write. +- **Keyboard / focus / hover.** `.focus()`, `.hover()`, `page.keyboard.press('Tab')`, `page.keyboard.press('Escape')`. Many a11y / interaction PRs only render their change in these states. +- **localStorage / sessionStorage / cookies.** Read or write via `page.evaluate(...)` when the change depends on persisted UI state (e.g. sidebar collapse, recently-viewed list). +- **Manager-side state via `__STORYBOOK_*` globals.** When the diff touches preview-api or manager-api code that exposes a development hook on `globalThis`, prefer `page.evaluate(() => globalThis.__STORYBOOK_*…)` over reverse-engineering a click sequence. + +Only fall back to the §8.1.3 "trigger state is genuinely unreachable" path after walking through the affordances above and confirming none apply to the diff at hand. If none apply, say so explicitly in a single-line comment in the spec body and limit the assertions to module-resolution + pageerror — the harness's evidence-check will report the gap honestly to reviewers. + +### Worked example — focus ring on a selected sidebar item + +```ts +await page.goto(`${baseURL}/?path=/story/example-button--primary`); +await new RecipePage(page, expect).waitUntilLoaded(); + +const selected = page.locator( + '[data-item-id="example-button--primary"][data-selected="true"]', +); +await selected.focus(); // trigger the focus-ring state + +await expect(selected).toHaveCSS('box-shadow', /inset.+2px/i); + +// Screenshot the sidebar region — the focus ring is visible here: +await page.locator('.sidebar-container').screenshot({ + path: testInfo.outputPath('sidebar-focus-ring.png'), +}); +``` + +### Worked example — icon swap inside a conditionally-rendered, change-detection-gated button + +`ReviewChangesButton` (and its clear button containing the icon under test) only renders when at least one story has status NEW or MOD. We use **Save from Controls** to mutate a story file on the working tree; the change-detection scanner picks the uncommitted edit up and flips the story's status, which causes `ReviewChangesButton` to mount. The recipe never calls `fs.*` directly — Storybook's csf-tools does the write. + +```ts +await page.goto(`${baseURL}/?path=/story/example-button--primary`); +const recipe = new RecipePage(page, expect); +await recipe.waitUntilLoaded(); + +// 1. Open the Controls panel and edit a control value. +const controlsTab = page.getByRole('tab', { name: /controls/i }); +await controlsTab.click(); +const labelInput = page.locator('input[name="label"], textarea[name="label"]').first(); +await labelInput.fill('Verify harness saved this'); + +// 2. Save from Controls — csf-tools writes the edit back to the story file on +// the runner's working tree. The button in this Storybook is +// aria-labelled "Save changes to story" with visible text "Update story"; +// match on either to stay robust across label drift. +const saveButton = page.getByRole('button', { name: /save changes to story|update story/i }); +await expect(saveButton).toBeVisible({ timeout: 10000 }); +await saveButton.click(); + +// 3. Change-detection now sees the story as MOD; the review toggle mounts. +// NOTE: it is rendered as an aria `switch`, NOT a `button`. Match accordingly. +const reviewToggle = page.getByRole('switch', { name: /review.+stories/i }); +await expect(reviewToggle).toBeVisible({ timeout: 15000 }); + +// 4. Activate review mode so the *clear* button (which carries the diff's icon) renders. +await reviewToggle.click(); +const clearButton = page.getByRole('button', { name: /^clear$/i }); +await expect(clearButton).toBeVisible({ timeout: 10000 }); + +// 5. Screenshot the sidebar region — the new UndoIcon is inside the clear button. +await page.locator('.sidebar-container').screenshot({ + path: testInfo.outputPath('sidebar-with-clear-button.png'), +}); +``` + +This pattern (Save from Controls → wait for status flip → screenshot the now-visible UI) is the canonical answer for any diff that touches change-detection-gated UI. The closing `expect(filterPageErrors(pageErrors)).toEqual([])` in the standard footer covers module-resolution as a free bonus. + +--- + +## 9. What to AVOID (skill's deny-regex enforces several of these) + +| Pattern | Why | +|---|---| +| `import ... from 'child_process'` / `require('child_process')` | Recipes never spawn subprocesses | +| `fs.unlink`, `fs.rm`, `fs.rmdir`, `fsp.unlink`, etc. | Recipes never delete files | +| `process.exit(...)` | Playwright handles test exit codes; never short-circuit | +| `eval(...)` | Never. Use `page.evaluate(...)` if you need in-browser execution | +| `import 'node:...'` (Node-only modules) | Recipes are Playwright-test files, not orchestration scripts | +| `@storybook/...` direct imports | Adds the non-erasable TS-enum chain that breaks under bun's strip-types path | +| `page.waitForTimeout(N)` | Always avoid time-based waits; use web-first assertions | +| `test.only`, `test.skip`, `describe.only` | Single test only; no skipping | +| Network calls (`fetch`, `axios`, etc.) inside the spec body | Storybook is local; no external endpoints | + +--- + +## 10. Header comment provenance (the skill prepends this) + +After you emit your spec body, the `verify-recipe-author` skill prepends a block comment with `{ generatedAt, agentModel, prNumber, referenceSpecs, triageGlobs }`. Do NOT emit this yourself — the skill owns it. + +--- + +## 11. Worked example (reference shape) + +See `.verify-recipes/example-smoke.spec.ts` for the canonical minimum. Your output should look structurally similar: listeners → goto → `waitUntilLoaded` → assertions → `finally` attach → `expect(filterPageErrors(pageErrors)).toEqual([])`. + +--- + +## 12. Target selection (v6) + +Pick one of two execution targets via a single-line header comment as +the **first non-empty line** of the spec: + +```ts +// @verify-target: internal-ui +// or: +// @verify-target: sandbox:react-vite/default-ts +``` + +| Target | What the harness boots | Pick when | +|---|---|---| +| `internal-ui` (default if header absent) | `code/storybook-static/` served via `http-server`. Built once from the PR-head monorepo. | The diff touches a package that the internal Storybook UI exercises (manager, manager-api, channels, core-server, addons, csf-tools, preview-api). This is the right answer for ~all PRs. | +| `sandbox: