diff --git a/.agents/skills/verify-recipe-author/SKILL.md b/.agents/skills/verify-recipe-author/SKILL.md new file mode 100644 index 000000000000..2843fc67d27d --- /dev/null +++ b/.agents/skills/verify-recipe-author/SKILL.md @@ -0,0 +1,192 @@ +--- +name: verify-recipe-author +description: Generate the Playwright recipe spec for a PR-verify-pr-generate prompt bundle. Reads `.verify-output//prompt-bundle.json`, dispatches the OMC executor agent (model=opus), and pipes the raw agent reply into `verify-pr-author` (stdin mode). The TypeScript core owns extraction, deny-regex, header-comment provenance, the file write to `.verify-recipes/pr-<#>.spec.ts`, scoped lint, the single retry, and `.verify-output//result.json`. Trigger after `yarn verify-pr-generate`. +allowed-tools: Agent, Bash, Read, Write, Edit +--- + +# Verify Recipe Author + +Consumes a prompt bundle emitted by `yarn verify-pr-generate --pr <#>` and produces the per-PR Playwright recipe spec for human review. Authoring only — never executes the spec. + +This skill is invoked **after** `yarn verify-pr-generate --pr <#>` succeeds. The bun script does the deterministic I/O (gh fetch, triage, prompt assembly, bundle write); this skill **only** dispatches the agent and pipes its raw reply into the `verify-pr-author` CLI. Extraction, deny-regex, provenance, file write, lint, the single retry, and `result.json` all live in TypeScript core — the skill never does them itself. + +> **Paths are repo-root-relative.** Every path below is written relative to +> the repository root, denoted `$REPO_ROOT`. Resolve it once at runtime with +> `REPO_ROOT="$(git rev-parse --show-toplevel)"` (works from any clone, +> worktree, or CI checkout) and substitute it wherever `$REPO_ROOT` appears. +> Never hardcode an absolute machine path — it breaks on every other +> clone/worktree/CI runner. + +The full design and acceptance criteria live in `$REPO_ROOT/.omc/plans/pr-verify-v3-agent-generated-recipes.md` (§Lane C, §D6, §D8, §D9). Read the plan if anything below is ambiguous. + +## Inputs + +No args required. The skill discovers the most recent bundle automatically. The caller may optionally pass an explicit bundle path as the skill argument. + +1. **Auto-discover (default)**: list `$REPO_ROOT/.verify-output/`, pick the directory with the lexicographically largest name (ISO timestamps sort correctly), then read `prompt-bundle.json` inside it. +2. **Explicit path**: if the user passed an absolute path to a `prompt-bundle.json`, read that file directly. + +Bundle shape (see `scripts/verify-pr-generate.ts` for the canonical emitter): + +```jsonc +{ + "version": 1, + "prNumber": 12345, + "runId": "...", + "outputSpecPath": "/abs/path/.verify-recipes/pr-12345.spec.ts", + "force": false, + "prompt": "", + "metadata": { + "agentModel": "claude-opus-4-7[1m]", + "referenceSpecs": ["..."], + "triageGlobs": ["..."], + "generatedAt": "" + } +} +``` + +The `` is the parent directory of the bundle — derive it from the bundle path, not from a field. + +## Runbook + +Follow these steps in order. Stop and emit `result.json` per §Failure Modes on any non-success outcome. + +### Step 1 — Read the bundle + +`Read` the bundle JSON. Capture `prNumber`, `runId` (from the parent dir), `outputSpecPath`, `force`, `prompt`, and `metadata`. + +### Step 2 — Pre-flight collision check (D9, TOCTOU re-guard) + +Re-check whether `bundle.outputSpecPath` already exists. The bun script enforced D9 at bundle-emit time; the skill re-checks because the user may have created the file between the two steps. + +- If the file exists and `bundle.force === false` → write `result.json` with `{ status: "collision", specPath: , attempts: 0 }` and stop. (This mirrors the CLI's own `collision` status / exit 1; the pre-flight only exists to skip a wasted agent dispatch — the CLI re-enforces D9 regardless.) +- Otherwise proceed. + +> **One owner.** After dispatch, the TypeScript core +> (`scripts/verify-pr-author.ts` → `scripts/verify/recipe-author-core.ts`) +> owns spec-body extraction, deny-regex, header-comment provenance, the +> file write, scoped lint, post-write regex checks, the single retry, and +> `result.json`. The skill does **not** extract fences, run deny-regex, or +> write the spec itself. Steps 3–5 below are the entire runbook. + +### Step 3 — Dispatch the agent (attempt 1) + +``` +Agent({ + description: "Generate PR recipe spec", + subagent_type: "oh-my-claudecode:executor", + model: "opus", + prompt: bundle.prompt +}) +``` + +The bundle's `prompt` already contains the full authoring contract, +reference specs, PR diff, and fence-marker instruction +(`<<>>` … `<<>>`). Capture the agent's full raw +reply as `$REPLY` (do not parse or edit it). + +### Step 4 — Pipe the raw reply to `verify-pr-author` (stdin mode) + +```bash +printf '%s' "$REPLY" | node "$REPO_ROOT/scripts/verify-pr-author.ts" --bundle --dispatch-mode stdin +``` + +The CLI performs extraction, deny-regex, provenance, file write, scoped +lint (`scripts/verify/lint-invocation.ts`), post-write regex checks, and +writes `result.json`. Exit codes: + +- `0` — success. CLI wrote the spec and `result.json`. Go to Step 6. +- `75` — retryable failure (lint, post-write regex, **or a first + deny-regex hit** — the CLI asks the agent to self-correct). The CLI + emitted a framed retry block on stdout. Go to Step 5. +- `1` — terminal failure (collision, extract-failed, or any gate + exhausted on the final attempt). CLI already wrote `result.json` with + the failure status. Print the failure line (Step 6) and stop. + +Exit 75 is the sole retry sentinel; any other non-zero exit is terminal. +The skill never decides retryability — the CLI does. + +### Step 5 — Retry once (on exit 75) + +Parse stdout for the framed retry block: + +``` +===VERIFY_PR_AUTHOR_RETRY_BEGIN=== + +===VERIFY_PR_AUTHOR_RETRY_END=== +``` + +Assemble the retry prompt and re-dispatch the agent (same +`subagent_type` and `model`): + +``` + + +[RETRY] + +``` + +Pipe the new raw reply back through the CLI in retry mode: + +```bash +printf '%s' "$REPLY2" | node "$REPO_ROOT/scripts/verify-pr-author.ts" --bundle --dispatch-mode stdin --retry-of +``` + +The CLI enforces `MAX_RECIPE_ATTEMPTS` (read from +`scripts/verify/recipe-author-core.ts`; currently 2) and will **not** +re-emit exit 75 on the retry call. Expected exits: + +- `0` — success. Go to Step 6. +- `1` — terminal failure (any gate exhausted on attempt 2). CLI wrote + `result.json` with `attempts: 2` and the terminal status. Print the + failure line and stop. + +### Step 6 — Print actionable next-step lines + +`result.json` is already written by the CLI — do **not** write it from +the skill. On success print: + +``` +[verify-recipe-author] spec written: +[verify-recipe-author] result.json: +[verify-recipe-author] attempts: +[verify-recipe-author] Next: review the spec, then run `yarn verify-pr --recipe-spec ` +``` + +On a terminal exit-1, print instead: + +``` +[verify-recipe-author] FAILED: — see +``` + +## Failure Modes + +`result.json` is written by the CLI, not the skill. `status` is the exact +`RecipeAuthorStatus` union from `scripts/verify/recipe-author-core.ts` — +do not invent values. On attempt 1 in stdin mode, lint / post-write-regex +/ **first deny-regex hit** all return `retry-requested` (CLI exit 75) so +the agent can self-correct; the terminal status below is what lands when +attempts are exhausted (CLI exit 1). + +| Cause | terminal `status` | Exit | Retried once first? | +|---|---|---|---| +| `outputSpecPath` exists and `force === false` | `collision` | 1 | no | +| No parseable body between fence markers | `extract-failed` | 1 | no (terminal immediately) | +| Deny-regex hit | `deny-regex-hit` | 1 | **yes** (attempt-1 → `retry-requested`/exit 75) | +| Scoped lint failed | `lint-failed` | 1 | yes (attempt-1 → `retry-requested`/exit 75) | +| Post-write regex check failed (listener-before-goto OR attach) | `regex-failed` | 1 | yes (attempt-1 → `retry-requested`/exit 75) | +| All gates pass | `spec-written` | 0 | n/a | + +## Notes + +- This skill runs inside Claude Code; it uses `Agent`, `Read`, `Write`, `Bash`, and `Edit` tools. +- Paths in invocations are repo-root-relative (`$REPO_ROOT`, resolved via `git rev-parse --show-toplevel` — see the note near the top); resolve `$REPO_ROOT` to an absolute path before invoking. Lint commands `cd code` via `yarn --cwd`. +- Max attempts = `MAX_RECIPE_ATTEMPTS` (currently 2). Read the value from `scripts/verify/recipe-author-core.ts` — do not hardcode. +- The skill **never executes** the generated spec. The human review gate (Phase-1 lethal-trifecta breaker) is preserved. +- A first deny-regex hit is retried **once** in stdin mode (the CLI emits `retry-requested` / exit 75 so the agent can self-correct, e.g. eval #36); only an exhausted deny hit is the terminal `deny-regex-hit`. The deny-regex remains a security gate — the single self-correction attempt does not weaken it (every attempt is re-checked; a persistent hit still terminates). +- Cap retry feedback at 5 errors (R3). +- The `runId` is the basename of the parent directory of the bundle; do not invent a new one. + +## Phase-2 follow-up + +This skill currently couples generation to a running Claude Code session via the `Agent` tool dispatch. Phase-2 CI activation will require migrating to a direct Anthropic SDK call (`@anthropic-ai/sdk`) with an `ANTHROPIC_API_KEY` env var, replacing the `Agent` dispatch with a standalone API call so the workflow at `.github/workflows/verify-pr.yml` can run unattended. Tracked as a follow-up in the plan's ADR §Follow-ups. diff --git a/.circleci/config.yml b/.circleci/config.yml index d2798c319911..0876d9be0686 100644 --- a/.circleci/config.yml +++ b/.circleci/config.yml @@ -17,6 +17,13 @@ parameters: default: '' description: The PR number type: string + ghIsFork: + default: 'false' + description: > + 'true' when the triggering PR head is a fork (untrusted). SECURITY: + gates save_cache so a fork pipeline cannot poison the project-global + cache that trusted merged/daily pipelines restore. + type: string workflow: default: skipped description: Which workflow to run @@ -44,7 +51,7 @@ jobs: - run: name: Generate config command: | - yarn dlx jiti ./scripts/ci/main.ts --workflow=<< pipeline.parameters.workflow >> + yarn dlx jiti ./scripts/ci/main.ts --workflow=<< pipeline.parameters.workflow >> --is-fork=<< pipeline.parameters.ghIsFork >> - continuation/continue: configuration_path: .circleci/config.generated.yml workflows: diff --git a/.claude/skills/verify-recipe-author/SKILL.md b/.claude/skills/verify-recipe-author/SKILL.md new file mode 100644 index 000000000000..534cd1871898 --- /dev/null +++ b/.claude/skills/verify-recipe-author/SKILL.md @@ -0,0 +1 @@ +@../../../.agents/skills/verify-recipe-author/SKILL.md diff --git a/.dockerignore b/.dockerignore new file mode 100644 index 000000000000..29fd5ffe235c --- /dev/null +++ b/.dockerignore @@ -0,0 +1,19 @@ +.env +.env.* +**/.env +**/.env.* +~/.ssh/ +~/.aws/ +~/.config/gcloud/ +~/.azure/ +~/.docker/config.json +~/.kube/config +.npmrc +.pypirc +**/*-service-account.json +**/*.pem +**/*.key +~/.git-credentials +.verify-output/ +node_modules/ +.nx/ diff --git a/.github/actions/agentic-pr-prepare/README.md b/.github/actions/agentic-pr-prepare/README.md new file mode 100644 index 000000000000..721ce655d23f --- /dev/null +++ b/.github/actions/agentic-pr-prepare/README.md @@ -0,0 +1,142 @@ +# agentic-pr-prepare + +Universal infrastructure setup for agentic workflows running under +`pull_request_target`: actor-permission gate, base + PR-head manual clones, +toolchain install, sandbox-runtime (srt) install + sha-pin verification, +srt-settings JSON, egress smoke-test, and trusted-harness sync. + +This is **half 1 of 2** of the split `verify-pr.yml` infrastructure. The +companion is `agentic-pr-publish`. + +## Caller contract + +The composite **cannot** declare these — the caller workflow MUST: + +1. Trigger on `pull_request_target` (composite `uses: ./.github/actions/...` + resolves against the **base ref** under PRT, which is load-bearing for + trust — never lift this to a trigger that resolves against PR-head). +2. Declare a `permissions:` block. Verify-PR needs at least: + ```yaml + permissions: + pull-requests: write + issues: write + statuses: write + contents: write # side-branch screenshot push (drop if not needed) + ``` +3. Declare a `concurrency:` block. Single-PR: + ```yaml + concurrency: + group: verify-${{ github.event.pull_request.number }} + cancel-in-progress: true + ``` + With `strategy.matrix`, include the matrix dim in the key: + `verify-${{ pr-num }}-${{ matrix.target }}` (matrix-concurrency footgun). +4. Pass `srt-sha256` **inline** with every call. The composite has **no + default** — this keeps a chore-bump PR carrying the heightened + workflow-review bar instead of single-approval flipping a composite default. + +## Inputs + +| Name | Required | Default | Purpose | +|------------------------|----------|----------------------------------|----------------------------------------------------------------------------------------| +| `github-token` | yes | — | Base + PR-head manual clones. | +| `base-ref` | yes | — | `github.event.pull_request.base.ref`. | +| `base-sha` | yes | — | `github.event.pull_request.base.sha`. | +| `pr-head-sha` | yes | — | `github.event.pull_request.head.sha`. | +| `repo` | yes | — | `github.repository`. | +| `srt-version` | no | `0.0.51` | Pinned `@anthropic-ai/sandbox-runtime` version. | +| `srt-sha256` | **yes** | — (no default by design) | sha256 of the resolved `srt` shim at `srt-version`. Bump via `_srt-sha-probe.yml`. | +| `srt-allowed-domains` | no | localhost + registries + CDNs | Newline list. Caller may extend. | +| `srt-allow-write-paths`| no | `$PR_HEAD_DIR`, `$SANDBOX_TMPDIR`, `/tmp`, `$HOME/.cache`, … | Newline list; env vars expanded at composite runtime. | +| `srt-deny-read-paths` | no | `$HOME/.ssh`, `$HOME/.aws`, … | Newline list. | +| `srt-deny-write-paths` | no | `$GITHUB_WORKSPACE`, `$GITHUB_WORKSPACE/.git` | Newline list. | +| `sync-files` | no | (empty) | Newline-delimited `src:dst` pairs (paths relative). H2 path-validated. | +| `sync-trees` | no | (empty) | Newline-delimited tree paths (relative). H2 path-validated. | +| `provenance-secret` | no | (empty → per-run random) | Optional caller-supplied. M2: written to file, not `$GITHUB_ENV`. | +| `install-code-deps` | no | `true` | Pass-through to `setup-node-and-install`. | + +### Path-input safety (H2) + +`sync-files` and `sync-trees` reject `..`, leading `/`, extra `:`; resolve +realpath and assert under `$PR_HEAD_DIR`. Refuses symlink at destination +before `cp --no-dereference` / `cp -aT`. + +### srt-settings JSON emission (H3) + +allowWrite / denyRead / denyWrite / allowedDomains arrays are emitted via +`jq -R . | jq -s .` so PR-controllable strings cannot inject JSON keys. + +## Outputs + +| Name | Purpose | +|----------------------------|--------------------------------------------------------------------------------------------------| +| `pr-head-dir` | Absolute path to untrusted PR-head workspace clone. | +| `srt-settings-path` | Absolute path to `srt-settings.json`. | +| `diff-path` | Absolute path to captured `pr.diff`. | +| `provenance-secret-path` | M2: path to file (mode 0600) holding the per-run provenance secret. NOT in `$GITHUB_ENV`. | + +## Side-effects + +Writes to `$GITHUB_ENV` (so subsequent caller steps in the same job see them): + +- `PR_HEAD_DIR` — absolute path to PR-head workspace +- `SRT_SETTINGS` — absolute path to srt-settings.json +- `CLAUDE_CODE_TMPDIR` — absolute path to sandbox scratch tmpdir + +Does **NOT** write `VERIFY_PROVENANCE_SECRET` to `$GITHUB_ENV`. Trusted task +steps load it explicitly: `cat "$(provenance-secret-path)"`. + +## Worked example + +```yaml +- name: Prepare agentic environment + id: prep + uses: ./.github/actions/agentic-pr-prepare + with: + github-token: ${{ secrets.GITHUB_TOKEN }} + base-ref: ${{ github.event.pull_request.base.ref }} + base-sha: ${{ github.event.pull_request.base.sha }} + pr-head-sha: ${{ github.event.pull_request.head.sha }} + repo: ${{ github.repository }} + srt-version: '0.0.51' + srt-sha256: '36de38197ac22991c8c9edead4d6184914c8b786e040ecf27bdcf26abd166338' + sync-files: | + .verify-recipes/_util.ts:.verify-recipes/_util.ts + sync-trees: | + scripts/verify + provenance-secret: ${{ secrets.VERIFY_PROVENANCE_SECRET }} + +- name: Your task + env: + PROVENANCE_SECRET_PATH: ${{ steps.prep.outputs.provenance-secret-path }} + run: | + VERIFY_PROVENANCE_SECRET="$(cat "$PROVENANCE_SECRET_PATH")" \ + yarn your-thing +``` + +## Pre-existing architectural debt (C1 — NOT fixed by this composite) + +`verify-result.json` (the file the verdict is read from) lives at +`$PR_HEAD_DIR/.verify-out-trusted/verify-result.json` — inside srt's +`allowWrite` set. A malicious PR-added unit test running inside srt can +forge it. The split documented here does NOT make C1 worse; it stays at +its current path so the legitimate writer (`verify-pr.ts`, which itself +runs INSIDE srt) keeps working. + +**The architectural fix requires** one of: + +1. **Process-split** — orchestrator OUTSIDE srt, only Playwright + dev-server + spawns wrapped. **Attempted 2026-05-14, failed**: srt uses bubblewrap with + a fresh network namespace per invocation, so localhost IPC between + orchestrator (outside) and dispatcher (inside) breaks. Reviving requires + shared host netns (loses egress policy on dispatcher), host-network bridge + / Unix socket, or moving dispatcher outside srt (loosens trust on + PR-modified framework code). +2. **HMAC-bound verdict** — `verify-pr.ts` HMAC-signs the JSON with the + provenance secret; trusted bash verifies. Requires scrubbing the secret + from orchestrator env before spawning Playwright + auditing + `/proc//environ` reachability inside srt. + +Until that lands, the verdict is trustworthy ONLY when paired with the +side-channel signals (PR comment, telemetry, GitHub run conclusion) that an +attacker would also have to forge. Tracked as separate follow-up. diff --git a/.github/actions/agentic-pr-prepare/action.yml b/.github/actions/agentic-pr-prepare/action.yml new file mode 100644 index 000000000000..74c5b4aa14b0 --- /dev/null +++ b/.github/actions/agentic-pr-prepare/action.yml @@ -0,0 +1,424 @@ +name: 'Prepare agentic PR environment' +description: > + Universal trust-gate, sandbox setup, and PR-head clone for agentic + workflows running under `pull_request_target`. Carves out steps 1-8 and + 11-14 of the original verify-pr.yml so future agentic actions can reuse + this trust-boundary plumbing without copy-pasting. + + Caller contract: + - Workflow MUST use `pull_request_target` so this composite resolves + against the base ref (load-bearing for trust). + - Caller MUST declare `permissions:` and `concurrency:` blocks itself; + composites cannot. + - Caller MUST pass `srt-sha256` explicitly (kept inline so a "chore: bump + srt" PR cannot lower the social-review bar via single-approval default). + +inputs: + github-token: + description: 'GITHUB_TOKEN for base + PR-head manual clones' + required: true + base-ref: + description: 'Base branch ref (e.g. github.event.pull_request.base.ref)' + required: true + base-sha: + description: 'Base SHA (github.event.pull_request.base.sha)' + required: true + pr-head-sha: + description: 'PR head SHA (github.event.pull_request.head.sha)' + required: true + repo: + description: 'owner/name (github.repository)' + required: true + srt-version: + description: 'Pinned @anthropic-ai/sandbox-runtime version' + required: false + default: '0.0.51' + srt-sha256: + description: > + sha256 of the resolved srt shim at srt-version. H1: NO default — caller + must pass it inline in the workflow so a chore-bump PR cannot flip both + version and sha together via a single approval. + required: true + srt-allowed-domains: + description: 'Newline list of allowed network domains for srt jail' + required: false + default: | + localhost + 127.0.0.1 + registry.yarnpkg.com + registry.npmjs.org + registry.npmjs.com + objects.githubusercontent.com + playwright.azureedge.net + playwright-akamai.azureedge.net + playwright-verizon.azureedge.net + cdn.playwright.dev + srt-allow-write-paths: + description: 'Newline list of srt allowWrite paths (env vars expanded at composite runtime)' + required: false + default: | + $PR_HEAD_DIR + $SANDBOX_TMPDIR + $RUNNER_TEMP/storybook-sandboxes + /tmp + $HOME/.cache + $HOME/.local/share + $HOME/.storybook + srt-deny-read-paths: + description: 'Newline list of srt denyRead paths' + required: false + default: | + $HOME/.ssh + $HOME/.aws + $HOME/.docker + $HOME/.npmrc + $HOME/.gitconfig + $HOME/.config/gh + $GITHUB_WORKSPACE/.git + srt-deny-write-paths: + description: 'Newline list of srt denyWrite paths' + required: false + default: | + $GITHUB_WORKSPACE + $GITHUB_WORKSPACE/.git + sync-files: + description: > + Newline-delimited `src:dst` pairs (src relative to base checkout, + dst relative to PR_HEAD_DIR). H2: rejects `..`, leading `/`, or + extra `:`; refuses symlink at dst before `cp --no-dereference`. + required: false + default: '' + sync-trees: + description: > + Newline-delimited tree paths (relative). Each tree is copied with + `cp -aT` after refusing symlink at the dst root. H2 path-validated. + required: false + default: '' + provenance-secret: + description: > + Optional caller-supplied secret (e.g. secrets.VERIFY_PROVENANCE_SECRET). + If empty, a per-run random 32-byte hex secret is generated. + required: false + default: '' + install-code-deps: + description: 'Pass-through to setup-node-and-install' + required: false + default: 'true' + +outputs: + pr-head-dir: + description: 'Absolute path to untrusted PR-head workspace clone' + value: ${{ steps.paths.outputs.pr-head-dir }} + srt-settings-path: + description: 'Absolute path to srt-settings.json' + value: ${{ steps.srt-settings.outputs.srt-settings-path }} + diff-path: + description: 'Absolute path to captured pr.diff' + value: ${{ steps.diff.outputs.diff-path }} + provenance-secret-path: + description: > + M2: Absolute path to file holding the per-run provenance secret. Trusted + steps `cat` it explicitly; the secret is NOT written to $GITHUB_ENV so + a future caller forgetting an `env -i` allowlist cannot leak it. + value: ${{ steps.provenance.outputs.provenance-secret-path }} + +runs: + using: 'composite' + steps: + - name: Check actor permission + uses: prince-chrismc/check-actor-permissions-action@d504e74ba31658f4cdf4fcfeb509d4c09736d88e # v3.0.2 + with: + permission: write + + - name: Checkout base (manual clone — no token persistence) + # Manual git clone instead of actions/checkout because the repo carries + # gitlinks (.external/addon-svelte-csf etc.) without a .gitmodules + # entry. actions/checkout's "Removing auth" cleanup walks the submodule + # tree under persist-credentials:false and aborts with + # fatal: No url found for submodule path '.external/...' + # Manual clone never writes credentials into .git/config and never + # walks .gitmodules, so we get persist-credentials:false semantics + # without the cleanup-walk crash. + shell: bash + env: + GITHUB_TOKEN: ${{ inputs.github-token }} + BASE_REF: ${{ inputs.base-ref }} + BASE_SHA: ${{ inputs.base-sha }} + PR_HEAD_SHA: ${{ inputs.pr-head-sha }} + REPO: ${{ inputs.repo }} + run: | + set -euo pipefail + # Clone to $RUNNER_TEMP (outside workspace) then overlay contents + # onto $GITHUB_WORKSPACE via `cp -aT`. The caller's bootstrap + # sparse-checkout left `.github/` on disk; overlay overwrites it + # cleanly so the workspace ends up with the trusted base ref's + # full tree (including this composite's own action.yml). + TMP_CLONE="$RUNNER_TEMP/_base_clone" + rm -rf "$TMP_CLONE" + git -c protocol.version=2 clone \ + --no-tags --depth=1 --branch "$BASE_REF" \ + "https://x-access-token:${GITHUB_TOKEN}@github.com/${REPO}.git" \ + "$TMP_CLONE" + cp -aT "$TMP_CLONE" "$GITHUB_WORKSPACE" + rm -rf "$TMP_CLONE" + git -C "$GITHUB_WORKSPACE" -c protocol.version=2 \ + fetch --no-tags --depth=1 origin "$BASE_SHA" "$PR_HEAD_SHA" + git -C "$GITHUB_WORKSPACE" remote set-url origin "https://github.com/${REPO}.git" + git -C "$GITHUB_WORKSPACE" config --local --unset-all credential.helper 2>/dev/null || true + git -C "$GITHUB_WORKSPACE" config --local --unset-all http.https://github.com/.extraheader 2>/dev/null || true + + - name: Setup Node.js and Install Dependencies + uses: ./.github/actions/setup-node-and-install + with: + install-code-deps: ${{ inputs.install-code-deps }} + + - name: Setup Bun + uses: oven-sh/setup-bun@0c5077e51419868618aeaa5fe8019c62421857d6 # v2.2.0 + + - name: Compute paths + id: paths + shell: bash + run: | + set -euo pipefail + PR_HEAD_DIR="${RUNNER_TEMP}/pr-head" + echo "PR_HEAD_DIR=${PR_HEAD_DIR}" >> "$GITHUB_ENV" + echo "pr-head-dir=${PR_HEAD_DIR}" >> "$GITHUB_OUTPUT" + + - name: Init provenance secret + id: provenance + # M2: write secret to file (mode 0600), NOT $GITHUB_ENV. A future + # caller forgetting the `env -i` allowlist would otherwise leak it + # to untrusted task steps. Trusted steps consume it via + # `cat "$(provenance-secret-path)"`. + shell: bash + env: + INHERITED: ${{ inputs.provenance-secret }} + run: | + set -euo pipefail + if [ -n "$INHERITED" ]; then + SECRET="$INHERITED" + else + SECRET="$(openssl rand -hex 32)" + fi + echo "::add-mask::$SECRET" + SECRET_FILE="$RUNNER_TEMP/provenance-secret" + umask 077 + printf '%s' "$SECRET" > "$SECRET_FILE" + echo "provenance-secret-path=$SECRET_FILE" >> "$GITHUB_OUTPUT" + + - name: Checkout PR head (untrusted execution context) + shell: bash + env: + GITHUB_TOKEN: ${{ inputs.github-token }} + PR_HEAD_SHA: ${{ inputs.pr-head-sha }} + BASE_SHA: ${{ inputs.base-sha }} + REPO: ${{ inputs.repo }} + run: | + set -euo pipefail + git -c protocol.version=2 clone \ + --no-tags --no-checkout --filter=blob:none \ + "https://x-access-token:${GITHUB_TOKEN}@github.com/${REPO}.git" \ + "$PR_HEAD_DIR" + cd "$PR_HEAD_DIR" + git -c protocol.version=2 fetch --no-tags --depth=1 origin "$PR_HEAD_SHA" + git -c protocol.version=2 fetch --no-tags --depth=1 origin "$BASE_SHA" + git checkout --force "$PR_HEAD_SHA" + git remote set-url origin "https://github.com/${REPO}.git" + git config --local --unset-all credential.helper 2>/dev/null || true + + - name: Fetch PR diff + id: diff + shell: bash + env: + PR_HEAD_SHA: ${{ inputs.pr-head-sha }} + BASE_SHA: ${{ inputs.base-sha }} + run: | + set -euo pipefail + DIFF_PATH="$RUNNER_TEMP/pr.diff" + git -C "$PR_HEAD_DIR" diff "$BASE_SHA" "$PR_HEAD_SHA" > "$DIFF_PATH" + echo "diff-path=$DIFF_PATH" >> "$GITHUB_OUTPUT" + + - name: Sync trusted harness code into PR head + # H2: path-validate every src:dst pair from `sync-files` and every tree + # from `sync-trees`. Reject `..`, leading `/`, extra `:`; resolve + # realpath and assert under $PR_HEAD_DIR. Refuse symlink at dst + # before cp --no-dereference. M4: also rm -f .verify-recipes/pr-*.spec.ts + # belt-and-suspenders even though Author overwrites later. + shell: bash + env: + SYNC_FILES: ${{ inputs.sync-files }} + SYNC_TREES: ${{ inputs.sync-trees }} + run: | + set -euo pipefail + + assert_safe_rel() { + local rel="$1" label="$2" + case "$rel" in + /*) echo "[sync] $label: absolute path rejected: '$rel'" >&2; exit 1 ;; + *..*) echo "[sync] $label: '..' rejected: '$rel'" >&2; exit 1 ;; + '') echo "[sync] $label: empty path rejected" >&2; exit 1 ;; + esac + } + + assert_under_prhead() { + local resolved="$1" label="$2" + case "$resolved" in + "$PR_HEAD_DIR"|"$PR_HEAD_DIR"/*) : ;; + *) echo "[sync] $label: resolved path '$resolved' escapes \$PR_HEAD_DIR" >&2; exit 1 ;; + esac + } + + mkdir -p "$PR_HEAD_DIR/.verify-recipes" + rm -f "$PR_HEAD_DIR"/.verify-recipes/pr-*.spec.ts + + # sync-files: src:dst pairs + while IFS= read -r line; do + [ -z "$line" ] && continue + # Strip trailing whitespace + line="${line%"${line##*[![:space:]]}"}" + [ -z "$line" ] && continue + # Reject extra colons (more than one) + if [ "$(awk -F: '{print NF-1}' <<<"$line")" != "1" ]; then + echo "[sync] sync-files entry must contain exactly one ':' — got '$line'" >&2 + exit 1 + fi + src="${line%%:*}" + dst="${line#*:}" + assert_safe_rel "$src" "sync-files src" + assert_safe_rel "$dst" "sync-files dst" + src_abs="$GITHUB_WORKSPACE/$src" + dst_abs="$PR_HEAD_DIR/$dst" + # Assert dst resolves under PR_HEAD_DIR even if intermediate parents are symlinks. + dst_resolved="$(realpath -m "$dst_abs")" + assert_under_prhead "$dst_resolved" "sync-files dst" + if [ ! -f "$src_abs" ]; then + echo "[sync] sync-files src missing: $src_abs" >&2 + exit 1 + fi + mkdir -p "$(dirname "$dst_abs")" + if [ -L "$dst_abs" ]; then + echo "[sync] refusing to overwrite symlink at $dst_abs" >&2 + exit 1 + fi + cp --no-dereference --remove-destination "$src_abs" "$dst_abs" + done <<< "$SYNC_FILES" + + # sync-trees: tree paths + while IFS= read -r line; do + [ -z "$line" ] && continue + line="${line%"${line##*[![:space:]]}"}" + [ -z "$line" ] && continue + assert_safe_rel "$line" "sync-trees" + src_abs="$GITHUB_WORKSPACE/$line" + dst_abs="$PR_HEAD_DIR/$line" + dst_resolved="$(realpath -m "$dst_abs")" + assert_under_prhead "$dst_resolved" "sync-trees dst" + if [ ! -d "$src_abs" ]; then + echo "[sync] sync-trees src missing: $src_abs" >&2 + exit 1 + fi + mkdir -p "$(dirname "$dst_abs")" + if [ -L "$dst_abs" ]; then + echo "[sync] refusing to overwrite symlink dir at $dst_abs" >&2 + exit 1 + fi + rm -rf "$dst_abs" + cp -aT "$src_abs" "$dst_abs" + done <<< "$SYNC_TREES" + + - name: Install sandbox-runtime (Layer-2) + # H1: caller MUST pass srt-sha256 inline; no composite default. + shell: bash + env: + SRT_VERSION: ${{ inputs.srt-version }} + EXPECTED_SRT_SHA: ${{ inputs.srt-sha256 }} + run: | + set -euo pipefail + if [ -z "$EXPECTED_SRT_SHA" ]; then + echo "srt-sha256 input is required (no composite default — see SECURITY.md §pinning-sandbox-runtime)" >&2 + exit 1 + fi + sudo apt-get update -qq + sudo apt-get install -y --no-install-recommends bubblewrap socat ripgrep + sudo npm install -g --ignore-scripts "@anthropic-ai/sandbox-runtime@${SRT_VERSION}" + srt --version + actual=$(sha256sum "$(which srt)" | cut -d' ' -f1) + if [ "$actual" != "$EXPECTED_SRT_SHA" ]; then + echo "srt sha mismatch — expected=$EXPECTED_SRT_SHA actual=$actual" >&2 + exit 1 + fi + echo "[srt] sha verified: $actual" + + - name: Build sandbox settings + id: srt-settings + # H3: emit JSON arrays via jq, never via shell string interpolation. + # Allows future callers to pass PR-controlled values without enabling + # JSON injection into the srt config. + shell: bash + env: + SRT_ALLOWED_DOMAINS: ${{ inputs.srt-allowed-domains }} + SRT_ALLOW_WRITE_PATHS: ${{ inputs.srt-allow-write-paths }} + SRT_DENY_READ_PATHS: ${{ inputs.srt-deny-read-paths }} + SRT_DENY_WRITE_PATHS: ${{ inputs.srt-deny-write-paths }} + run: | + set -euo pipefail + SANDBOX_TMPDIR="$RUNNER_TEMP/sandbox-tmp" + mkdir -p "$SANDBOX_TMPDIR" + mkdir -p "$PR_HEAD_DIR/.verify-scratch" + mkdir -p "$PR_HEAD_DIR/.verify-out-trusted" + mkdir -p "$HOME/.storybook" + + # Expand $VAR refs in path lists at composite runtime, then drop + # blank lines. The newline-list-to-jq-array convention is + # `jq -R . | jq -s .` (read each line as string, then slurp to array). + expand_lines() { + local raw="$1" + # envsubst would also work but isn't installed everywhere. Use + # `eval echo` per line, then strip empties. + while IFS= read -r line; do + [ -z "$line" ] && continue + line="${line%"${line##*[![:space:]]}"}" + [ -z "$line" ] && continue + eval "printf '%s\n' \"$line\"" + done <<< "$raw" + } + + ALLOWED_DOMAINS_JSON=$(expand_lines "$SRT_ALLOWED_DOMAINS" | jq -R . | jq -s .) + ALLOW_WRITE_JSON=$(expand_lines "$SRT_ALLOW_WRITE_PATHS" | jq -R . | jq -s .) + DENY_READ_JSON=$(expand_lines "$SRT_DENY_READ_PATHS" | jq -R . | jq -s .) + DENY_WRITE_JSON=$(expand_lines "$SRT_DENY_WRITE_PATHS" | jq -R . | jq -s .) + + SRT_SETTINGS_PATH="$RUNNER_TEMP/srt-settings.json" + jq -n \ + --argjson allowedDomains "$ALLOWED_DOMAINS_JSON" \ + --argjson allowWrite "$ALLOW_WRITE_JSON" \ + --argjson denyRead "$DENY_READ_JSON" \ + --argjson denyWrite "$DENY_WRITE_JSON" \ + '{ + network: { + allowLocalBinding: true, + allowedDomains: $allowedDomains, + deniedDomains: [] + }, + filesystem: { + allowRead: [], + allowWrite: $allowWrite, + denyRead: $denyRead, + denyWrite: $denyWrite + } + }' > "$SRT_SETTINGS_PATH" + + echo "SRT_SETTINGS=$SRT_SETTINGS_PATH" >> "$GITHUB_ENV" + echo "CLAUDE_CODE_TMPDIR=$SANDBOX_TMPDIR" >> "$GITHUB_ENV" + echo "srt-settings-path=$SRT_SETTINGS_PATH" >> "$GITHUB_OUTPUT" + cat "$SRT_SETTINGS_PATH" + + - name: Smoke-test srt egress policy + shell: bash + run: | + set -uo pipefail + if srt --settings "$SRT_SETTINGS" curl --max-time 5 https://example.com >/dev/null 2>&1; then + echo "[smoke] EGRESS POLICY BROKEN — example.com reachable inside srt" >&2 + exit 1 + fi + echo "[smoke] egress correctly denied" diff --git a/.github/actions/agentic-pr-publish/README.md b/.github/actions/agentic-pr-publish/README.md new file mode 100644 index 000000000000..e0746e2e141a --- /dev/null +++ b/.github/actions/agentic-pr-publish/README.md @@ -0,0 +1,115 @@ +# agentic-pr-publish + +Universal post-task publishing for agentic workflows: read verdict, push +screenshots to a side branch, append telemetry, stage + upload artifacts. + +This is **half 2 of 2** of the split `verify-pr.yml` infrastructure. + +## Caller contract + +The composite **cannot** declare these — the caller workflow MUST: + +1. Run under `pull_request_target` (composite resolves against base ref). +2. Wrap the `uses:` step in `if: always()` if the caller wants publish to + run on prior-step failure. M1: composite-level `if:` does NOT cascade to + sub-steps, but every sub-step inside this composite that needs + prior-step-failure tolerance already carries explicit `if: always()`. +3. Thread `result-path` from the task step that wrote `verify-result.json` + (e.g. `${{ steps.verify.outputs.result-path }}`). Do not glob + PR-writable directories to find it — that's the C1 forgery surface. +4. Declare `permissions:` block at least: + ```yaml + permissions: + pull-requests: write + contents: write # side-branch screenshot push (omit if skip-screenshots) + ``` + +## Inputs + +| Name | Required | Default | Purpose | +|----------------------------|----------|--------------------------|------------------------------------------------------------------------| +| `github-token` | yes | — | Side-branch push. | +| `pr-number` | yes | — | PR number. | +| `run-id` | yes | — | `github.run_id`. | +| `repo` | yes | — | `github.repository`. | +| `result-path` | yes | — | Trusted absolute path to `verify-result.json`. | +| `pr-head-dir` | no | `env.PR_HEAD_DIR` | Inherited from prepare composite. | +| `screenshot-source-dir` | no | `/.verify-output` | Where `push-screenshots.ts` scans for PNGs. | +| `dispatch-dirs` | no | `/.verify-output\n/.verify-output` | Newline list. Passed as repeated `--dispatch-dir`. | +| `telemetry-webhook-url` | no | (empty → no-op) | Telemetry sink. | +| `telemetry-webhook-token` | no | (empty → no-op) | Telemetry auth. | +| `artifact-name-prefix` | no | `verify-output` | Final artifact name = `-pr--`. | +| `retention-days` | no | `14` | Artifact retention. | +| `skip-screenshots` | no | `false` | `true` → skip side-branch push (callers without PNG output). | +| `skip-telemetry` | no | `false` | `true` → telemetry no-op regardless of webhook secrets. | + +## Outputs + +| Name | Purpose | +|----------------------------|-----------------------------------------------------------------------------------------------| +| `verdict` | Verdict from `derive-verdict.ts` (`verified` / `regression` / `evidence-missing` / `missing`).| +| `screenshot-urls-path` | **H4**: absolute path to FILE containing screenshot URLs JSON. Read with `fs.readFileSync` in caller; do not interpolate into shell.| + +## H4: screenshot-urls indirection + +Composite output is a **file path**, not a heredoc-encoded JSON string. The +caller's `actions/github-script` step reads: + +```js +const path = process.env.SCREENSHOT_URLS_PATH; +const items = JSON.parse(fs.readFileSync(path, 'utf-8')); +``` + +Closes the heredoc-terminator-injection surface that exists if `screenshot-urls` +were a single-line composite output piped through `< + Read verdict, push screenshots to side branch, append telemetry, stage + artifacts, upload artifacts. Carves out steps 19 + 21-24 of the original + verify-pr.yml. Action-agnostic; verdict-gated label / PR comment / final + fail-gate stay in the caller because they encode action-specific shape. + + M1: every sub-step that mirrors a current `if: always()` step carries + explicit `if: always()` here — composite-level `if:` does NOT cascade. + + H4: `screenshot-urls` is exposed as a FILE PATH (composite output + `screenshot-urls-path`), not a heredoc string. Caller's github-script + reads the file with fs.readFileSync. Closes a heredoc-terminator-injection + surface against the PR comment renderer. + +inputs: + github-token: + description: 'GITHUB_TOKEN for side-branch push' + required: true + pr-number: + description: 'PR number' + required: true + run-id: + description: 'github.run_id' + required: true + repo: + description: 'owner/name (github.repository)' + required: true + result-path: + description: > + Trusted absolute path to verify-result.json (caller threads from the + task step's $GITHUB_OUTPUT). + required: true + provenance-secret-path: + description: > + C1 fix: path to file holding the per-run provenance secret. derive-verdict.ts + reads VERIFY_PROVENANCE_SECRET from env to validate the HMAC signature + on verify-result.json. If empty, HMAC gate is skipped (back-compat for + callers that don't yet pass it; treats the verdict as untrusted in the + same sense as the pre-C1 path). + required: false + default: '' + pr-head-dir: + description: 'Absolute path to PR-head workspace (defaults to env.PR_HEAD_DIR)' + required: false + default: '' + screenshot-source-dir: + description: 'Where push-screenshots.ts scans for PNGs' + required: false + default: '' + dispatch-dirs: + description: 'Newline-delimited dirs passed as repeated --dispatch-dir to append-telemetry.ts' + required: false + default: '' + telemetry-webhook-url: + description: 'Empty → telemetry no-ops' + required: false + default: '' + telemetry-webhook-token: + description: 'Empty → telemetry no-ops' + required: false + default: '' + artifact-name-prefix: + description: 'Final artifact name = -pr--' + required: false + default: 'verify-output' + retention-days: + description: 'Artifact retention' + required: false + default: '14' + skip-screenshots: + description: 'true → side-branch push skipped' + required: false + default: 'false' + skip-telemetry: + description: 'true → telemetry no-op (independent of webhook secrets)' + required: false + default: 'false' + +outputs: + verdict: + description: 'Verdict emitted by derive-verdict.ts (e.g. verified | regression | missing)' + value: ${{ steps.verdict.outputs.verdict }} + screenshot-urls-path: + description: > + H4: absolute path to a file containing the screenshot URLs JSON + (array of {rel,url} or []). Empty file or missing path means no + screenshots. Caller reads with fs.readFileSync to avoid heredoc + injection through composite output. + value: ${{ steps.screenshots.outputs.urls-path }} + +runs: + using: 'composite' + steps: + - name: Read verdict + id: verdict + if: always() + shell: bash + env: + RESULT: ${{ inputs.result-path }} + PROVENANCE_SECRET_PATH: ${{ inputs.provenance-secret-path }} + run: | + set -euo pipefail + if [ ! -f "$RESULT" ]; then + echo "verdict=missing" >> "$GITHUB_OUTPUT" + exit 0 + fi + # Playwright writes its report into `$PR_HEAD_DIR/.verify-output//`, + # which is separate from the trusted result file at + # `$PR_HEAD_DIR/.verify-out-trusted/verify-result.json`. Using + # `dirname($RESULT)` finds nothing — derive-verdict then can't populate + # `regressionReason` from the report. Resolve via runId. + RUN_ID=$(jq -r '.runId // ""' "$RESULT" 2>/dev/null) + REPORT="${PR_HEAD_DIR:-$(dirname "$RESULT")}/.verify-output/${RUN_ID}/playwright-report.json" + if [ ! -f "$REPORT" ]; then + # Fallback for older runs colocated with verify-result.json (local-dev). + REPORT="$(dirname "$RESULT")/playwright-report.json" + fi + # C1 fix: thread the provenance secret into derive-verdict.ts via env + # so it can validate the HMAC signature. The secret never appears in + # the composite's $GITHUB_ENV or step output — only in this subshell. + if [ -n "$PROVENANCE_SECRET_PATH" ] && [ -f "$PROVENANCE_SECRET_PATH" ]; then + VERIFY_PROVENANCE_SECRET="$(cat "$PROVENANCE_SECRET_PATH")" \ + node "$GITHUB_WORKSPACE/scripts/verify/ci/derive-verdict.ts" \ + --result "$RESULT" \ + --report "$REPORT" \ + >> "$GITHUB_OUTPUT" + else + node "$GITHUB_WORKSPACE/scripts/verify/ci/derive-verdict.ts" \ + --result "$RESULT" \ + --report "$REPORT" \ + >> "$GITHUB_OUTPUT" + fi + + - name: Push screenshots to side branch + id: screenshots + if: always() && inputs.skip-screenshots != 'true' && steps.verdict.outputs.verdict != '' && steps.verdict.outputs.verdict != 'missing' + shell: bash + # M3: token via env-mapping, not literal interpolation. H4: write urls + # JSON to a file path exposed as composite output, not a heredoc to + # $GITHUB_OUTPUT. + env: + GITHUB_TOKEN: ${{ inputs.github-token }} + PR_NUMBER: ${{ inputs.pr-number }} + REPO: ${{ inputs.repo }} + RUN_ID: ${{ inputs.run-id }} + SRC_DIR: ${{ inputs.screenshot-source-dir }} + run: | + set -euo pipefail + SRC="${SRC_DIR:-${{ inputs.pr-head-dir != '' && inputs.pr-head-dir || env.PR_HEAD_DIR }}/.verify-output}" + URLS_FILE="$RUNNER_TEMP/screenshot-urls.json" + # Default to empty array so caller can always fs.readFileSync. + echo '[]' > "$URLS_FILE" + + # Route push-screenshots' heredoc output to a private sink, then + # extract just the urls JSON value into URLS_FILE. Avoids exposing + # the heredoc terminator across the composite-caller boundary. + SINK="$RUNNER_TEMP/screenshot-output-sink" + : > "$SINK" + node "$GITHUB_WORKSPACE/scripts/verify/ci/push-screenshots.ts" \ + --source "$SRC" \ + --pr "$PR_NUMBER" \ + --run-id "$RUN_ID" \ + --repo "$REPO" \ + --assets-dir "$RUNNER_TEMP/verify-assets" \ + --output "$SINK" + # Heredoc shape: `urls<\nEOF\n`. Extract everything + # between `urls< "$URLS_FILE" + fi + echo "urls-path=$URLS_FILE" >> "$GITHUB_OUTPUT" + + - name: Append telemetry + if: always() && inputs.skip-telemetry != 'true' && steps.verdict.outputs.verdict != '' && steps.verdict.outputs.verdict != 'missing' + shell: bash + # M3: webhook secrets via env-mapping only. No literal interpolation + # into the shell. append-telemetry.ts no-ops when env unset. + env: + PR_NUMBER: ${{ inputs.pr-number }} + RUN_ID: ${{ inputs.run-id }} + TELEMETRY_AGENTIC_VERIFICATION_WEBHOOK_URL: ${{ inputs.telemetry-webhook-url }} + TELEMETRY_AGENTIC_VERIFICATION_WEBHOOK_TOKEN: ${{ inputs.telemetry-webhook-token }} + RESULT: ${{ inputs.result-path }} + DISPATCH_DIRS: ${{ inputs.dispatch-dirs }} + PR_HEAD_DIR_IN: ${{ inputs.pr-head-dir }} + run: | + set -euo pipefail + DISPATCH_ARGS=() + if [ -n "$DISPATCH_DIRS" ]; then + while IFS= read -r d; do + [ -z "$d" ] && continue + d="${d%"${d##*[![:space:]]}"}" + [ -z "$d" ] && continue + DISPATCH_ARGS+=(--dispatch-dir "$d") + done <<< "$DISPATCH_DIRS" + else + PRHD="${PR_HEAD_DIR_IN:-$PR_HEAD_DIR}" + DISPATCH_ARGS+=(--dispatch-dir "$PRHD/.verify-output") + DISPATCH_ARGS+=(--dispatch-dir "$GITHUB_WORKSPACE/.verify-output") + fi + node "$GITHUB_WORKSPACE/scripts/verify/ci/append-telemetry.ts" \ + --result "$RESULT" \ + --pr "$PR_NUMBER" \ + --run-id "$RUN_ID" \ + "${DISPATCH_ARGS[@]}" \ + --curl-cfg "$RUNNER_TEMP/.curl-cfg" + + - name: Stage artifacts (rename dotdirs) + if: always() + shell: bash + env: + PR_NUMBER: ${{ inputs.pr-number }} + PR_HEAD_DIR_IN: ${{ inputs.pr-head-dir }} + run: | + set -euo pipefail + PRHD="${PR_HEAD_DIR_IN:-$PR_HEAD_DIR}" + STAGE="$RUNNER_TEMP/verify-artifacts" + rm -rf "$STAGE" + mkdir -p "$STAGE/runner-pr-head" "$STAGE/base" + if [ -d "$PRHD/.verify-output" ]; then + cp -a "$PRHD/.verify-output/." "$STAGE/runner-pr-head/verify-output/" + fi + if [ -d "$PRHD/.verify-recipes" ]; then + mkdir -p "$STAGE/runner-pr-head/verify-recipes" + if [ -f "$PRHD/.verify-recipes/pr-${PR_NUMBER}.spec.ts" ]; then + cp "$PRHD/.verify-recipes/pr-${PR_NUMBER}.spec.ts" \ + "$STAGE/runner-pr-head/verify-recipes/pr-${PR_NUMBER}.spec.ts" + fi + fi + if [ -d "$GITHUB_WORKSPACE/.verify-output" ]; then + cp -a "$GITHUB_WORKSPACE/.verify-output/." "$STAGE/base/verify-output/" + fi + echo "VERIFY_ARTIFACTS_DIR=$STAGE" >> "$GITHUB_ENV" + ls -la "$STAGE" || true + + - name: Upload artifacts + if: always() + uses: actions/upload-artifact@043fb46d1a93c77aae656e7c1c64a875d1fc6a0a # v7.0.1 + with: + name: ${{ inputs.artifact-name-prefix }}-pr-${{ inputs.pr-number }}-${{ inputs.run-id }} + path: ${{ env.VERIFY_ARTIFACTS_DIR }}/ + if-no-files-found: warn + retention-days: ${{ inputs.retention-days }} diff --git a/.github/actions/setup-node-and-install/action.yml b/.github/actions/setup-node-and-install/action.yml index d882d1c57068..2fcaa97f5fc2 100644 --- a/.github/actions/setup-node-and-install/action.yml +++ b/.github/actions/setup-node-and-install/action.yml @@ -11,16 +11,25 @@ runs: using: 'composite' steps: - name: Setup Node.js - uses: actions/setup-node@v4 + uses: actions/setup-node@49933ea5288caeca8642d1e84afbd3f7d6820020 # v4.4.0 with: node-version-file: '.nvmrc' - - name: Update npm to latest - shell: bash - run: npm install -g npm@latest - - - name: Cache dependencies - uses: actions/cache@v4 + # SECURITY (TanStack/router 2026-05-11 class): actions/cache's post-job + # SAVE uses a runner-internal token NOT gated by `permissions:` or the + # caller's `env -i`. Under `pull_request_target` the job runs untrusted + # fork code (verify-pr.yml clones the PR head and runs its `yarn install` + # into the shared ~/.yarn/berry/cache). An auto-save would write that + # poisoned store into the base-repo cache scope, where the trusted + # publish.yml (id-token: write + npm publish) would later restore it — + # the exact fork→base cache-poisoning detonation chain. + # + # Mitigation: split restore (always) from save (only when NOT triggered + # by pull_request_target). Untrusted PRT runs become cache READ-ONLY and + # can never write the shared scope. Trusted push / workflow_dispatch / + # release contexts still populate the cache normally. + - name: Restore dependency cache + uses: actions/cache/restore@5a3ec84eff668545956fd18022155c47e93e2684 # v4.2.3 with: path: | ~/.yarn/berry/cache @@ -39,3 +48,13 @@ runs: shell: bash working-directory: code run: yarn install + + # Save only from trusted trigger contexts. pull_request_target runs + # untrusted fork code → never let it persist the shared cache scope. + - name: Save dependency cache + if: github.event_name != 'pull_request_target' + uses: actions/cache/save@5a3ec84eff668545956fd18022155c47e93e2684 # v4.2.3 + with: + path: | + ~/.yarn/berry/cache + key: yarn-v1-${{ hashFiles('yarn.lock') }} diff --git a/.github/workflows/_srt-sha-probe.yml b/.github/workflows/_srt-sha-probe.yml new file mode 100644 index 000000000000..f8211e4f060a --- /dev/null +++ b/.github/workflows/_srt-sha-probe.yml @@ -0,0 +1,71 @@ +name: srt sha probe (manual) + +# Manually-triggered ONLY. Resolves the sha256 of the `srt` shim for a given +# @anthropic-ai/sandbox-runtime version and EMITS a candidate +# scripts/verify/srt.lock.json for a human to land via a normal reviewed PR. +# Never runs on PRs / pushes, and NEVER writes/commits/pushes the lock file +# itself: the supply-chain pin is load-bearing and MUST stay part of the +# reviewed diff (see scripts/verify/SECURITY.md and verify-pr.yml). This probe +# only computes the value and surfaces it (step summary + output + artifact). +# +# Bump procedure: run this probe, copy the emitted {version, sha256} block +# from the run summary (or download the srt-lock-candidate artifact) into +# scripts/verify/srt.lock.json, and open a normal PR. The composite's +# fail-closed post-install sha256 check is unchanged and still gates the run. +on: + workflow_dispatch: + inputs: + srt-version: + description: 'sandbox-runtime npm version to pin (e.g. 0.0.51)' + required: true + type: string + +permissions: + contents: read + +jobs: + probe: + runs-on: ubuntu-22.04 + timeout-minutes: 10 + steps: + - name: Checkout + uses: actions/checkout@v4 + + - name: Resolve srt sha256 and emit lock-file candidate + id: probe + env: + SRT_VERSION: ${{ inputs.srt-version }} + run: | + set -euo pipefail + case "$SRT_VERSION" in ''|*[!0-9.]*|.*|*.|*..*) echo "invalid srt version '$SRT_VERSION'" >&2; exit 1 ;; esac + sudo npm install -g --ignore-scripts "@anthropic-ai/sandbox-runtime@${SRT_VERSION}" + srt --version + SHA=$(sha256sum "$(which srt)" | cut -d' ' -f1) + case "$SHA" in [0-9a-f]*) [ "${#SHA}" -eq 64 ] || { echo "computed sha is not 64 hex chars" >&2; exit 1; } ;; *) echo "computed sha is not lowercase hex" >&2; exit 1 ;; esac + CANDIDATE="$RUNNER_TEMP/srt.lock.json" + jq -n --arg v "$SRT_VERSION" --arg s "$SHA" '{version: $v, sha256: $s}' > "$CANDIDATE" + echo "candidate-path=$CANDIDATE" >> "$GITHUB_OUTPUT" + echo "version=$SRT_VERSION" >> "$GITHUB_OUTPUT" + echo "sha256=$SHA" >> "$GITHUB_OUTPUT" + { + echo "## srt pin candidate" + echo "" + echo "Update \`scripts/verify/srt.lock.json\` via a **normal reviewed PR**" + echo "with the block below (this probe deliberately does NOT commit it —" + echo "the supply-chain pin must stay part of the reviewed diff)." + echo "" + echo '```json' + cat "$CANDIDATE" + echo "" + echo '```' + } >> "$GITHUB_STEP_SUMMARY" + echo "[srt-probe] candidate emitted (NOT committed):" + cat "$CANDIDATE" + + - name: Upload srt.lock.json candidate + uses: actions/upload-artifact@bbbca2ddaa5d8feaa63e36b76fdaad77386f024f # v7 + with: + name: srt-lock-candidate + path: ${{ steps.probe.outputs.candidate-path }} + if-no-files-found: error + retention-days: 7 diff --git a/.github/workflows/trigger-circle-ci-workflow.yml b/.github/workflows/trigger-circle-ci-workflow.yml index 7b1cedda19f1..191bf547ad9b 100644 --- a/.github/workflows/trigger-circle-ci-workflow.yml +++ b/.github/workflows/trigger-circle-ci-workflow.yml @@ -53,6 +53,12 @@ jobs: workflow: ${{ env.workflow }} ghBaseBranch: ${{ github.event.pull_request.base.ref }} ghPrNumber: ${{ github.event.pull_request.number }} + # SECURITY: 'true' only when the PR head is a fork (untrusted). On + # push events github.event.pull_request is absent → '' → treated as + # not-fork (trusted). Threaded into CircleCI as the ghIsFork pipeline + # parameter so the generated config omits save_cache on fork + # pipelines (TanStack/router 2026-05-11 cache-poisoning class). + ghIsFork: ${{ github.event.pull_request.head.repo.fork }} trigger-circle-ci-workflow: runs-on: ubuntu-latest diff --git a/.github/workflows/verify-pr.yml b/.github/workflows/verify-pr.yml new file mode 100644 index 000000000000..a0ac1da31585 --- /dev/null +++ b/.github/workflows/verify-pr.yml @@ -0,0 +1,632 @@ +name: PR Verification Harness +on: + pull_request_target: + types: [labeled, synchronize] + paths-ignore: ['docs/**', '**/*.md'] + +# v6 single-round model: recipe is authored AND executed in the same workflow +# run. The authored spec is materialised straight into the untrusted PR-head +# workspace ($RUNNER_TEMP/pr-head/.verify-recipes/pr-<#>.spec.ts) — never +# committed. See scripts/verify/SECURITY.md for the load-bearing controls. +# +# Infrastructure (trust gate, sandbox, artifact publish) is factored into +# two composite actions: +# .github/actions/agentic-pr-prepare/ — universal prep (steps 1-8, 11-14) +# .github/actions/agentic-pr-publish/ — universal publish (steps 19, 21-24) +# This file owns only the verify-specific TASK steps in between. +# +# Composite resolution under pull_request_target: `uses: ./.github/actions/…` +# resolves against the BASE ref, which is load-bearing for trust. Do not +# switch this workflow to a trigger that resolves composites against PR-head. + +jobs: + verify: + if: github.event.pull_request.draft == false && contains(github.event.pull_request.labels.*.name, 'ci:verify') + runs-on: ubuntu-22.04 + timeout-minutes: 30 + concurrency: + group: verify-${{ github.event.pull_request.number }} + cancel-in-progress: true + permissions: + pull-requests: write + issues: write + statuses: write + contents: write # needed to push screenshots to the _agentic-pr-assets side branch + steps: + - name: Bootstrap composite actions + # Manual sparse clone of `.github/` only so `uses: ./.github/actions/...` + # below can resolve composite action.yml from disk. + # + # Cannot use actions/checkout: its cleanup post-step walks gitlinks + # (registered in git index even when not in working tree) via + # `git submodule foreach`, which aborts with + # fatal: No url found for submodule path '.external/addon-svelte-csf' + # because the repo has 160000 tree entries (.external/, .rollout-repos/) + # without a corresponding .gitmodules entry. + # + # Manual clone with sparse-checkout = .github avoids both the + # working-tree materialization of gitlinks AND the cleanup walk. + # The composite's own base clone runs next and overlays the full + # trusted base tree onto this workspace. + env: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + BASE_REF: ${{ github.event.pull_request.base.ref }} + REPO: ${{ github.repository }} + run: | + set -euo pipefail + TMP="$RUNNER_TEMP/_bootstrap" + rm -rf "$TMP" + git -c protocol.version=2 clone \ + --no-tags --no-checkout --depth=1 --filter=blob:none \ + --branch "$BASE_REF" \ + "https://x-access-token:${GITHUB_TOKEN}@github.com/${REPO}.git" \ + "$TMP" + # Also fetch the trusted base copy of the srt pin lock file so the + # next step can source srt-version/srt-sha256 from a committed + # single source instead of inline literals. Same trust boundary as + # .github (BASE ref, never PR-head); the composite still performs + # the fail-closed post-install sha256 verification unchanged. + git -C "$TMP" sparse-checkout set --no-cone .github scripts/verify/srt.lock.json + git -C "$TMP" checkout "$BASE_REF" -- .github scripts/verify/srt.lock.json + mkdir -p "$GITHUB_WORKSPACE/.github" "$GITHUB_WORKSPACE/scripts/verify" + cp -a "$TMP/.github/." "$GITHUB_WORKSPACE/.github/" + cp -a "$TMP/scripts/verify/srt.lock.json" "$GITHUB_WORKSPACE/scripts/verify/srt.lock.json" + rm -rf "$TMP" + + - name: Resolve srt pin from committed lock file + id: srt-pin + # Single source of truth: scripts/verify/srt.lock.json (trusted BASE + # checkout). Replaces the previous inline srt-version/srt-sha256 + # manual-paste pair. The sha256 here is still consumed by the + # composite's fail-closed integrity check — this step only changes + # WHERE the pinned value comes from, never weakens the check. + # Bump procedure: run .github/workflows/_srt-sha-probe.yml, which + # EMITS a candidate scripts/verify/srt.lock.json (run summary + + # artifact). A human lands it via a normal reviewed PR — the probe + # never commits the pin so it stays part of the reviewed diff. + env: + LOCK_FILE: ${{ github.workspace }}/scripts/verify/srt.lock.json + run: | + set -euo pipefail + test -f "$LOCK_FILE" || { echo "srt lock file missing at $LOCK_FILE — refusing to run without a pinned srt sha (fail-closed)" >&2; exit 1; } + SRT_VERSION=$(jq -er '.version' "$LOCK_FILE") + SRT_SHA256=$(jq -er '.sha256' "$LOCK_FILE") + # Defensive: an empty/whitespace sha must hard-fail here too so the + # composite never receives a blank EXPECTED_SRT_SHA. + case "$SRT_VERSION" in ''|*[!0-9.]*|.*|*.|*..*) echo "invalid srt version '$SRT_VERSION' in lock file" >&2; exit 1 ;; esac + case "$SRT_SHA256" in + [0-9a-f]*) [ "${#SRT_SHA256}" -eq 64 ] || { echo "srt sha256 in lock file is not 64 hex chars" >&2; exit 1; } ;; + *) echo "srt sha256 in lock file is not lowercase hex" >&2; exit 1 ;; + esac + echo "version=$SRT_VERSION" >> "$GITHUB_OUTPUT" + echo "sha256=$SRT_SHA256" >> "$GITHUB_OUTPUT" + echo "[srt] pin resolved from lock file: version=$SRT_VERSION sha256=$SRT_SHA256" + + - name: Prepare agentic environment + id: prep + uses: ./.github/actions/agentic-pr-prepare + with: + github-token: ${{ secrets.GITHUB_TOKEN }} + base-ref: ${{ github.event.pull_request.base.ref }} + base-sha: ${{ github.event.pull_request.base.sha }} + pr-head-sha: ${{ github.event.pull_request.head.sha }} + repo: ${{ github.repository }} + # H1: srt-version/srt-sha256 are single-sourced from the committed + # scripts/verify/srt.lock.json (resolved one step above from the + # trusted BASE checkout), not inline literals. A chore-bump PR + # still carries the heightened workflow-review bar because the lock + # file is part of the reviewed diff; the composite still has NO + # default and still performs the fail-closed post-install sha256 + # integrity check against this value. + # Bump procedure: run .github/workflows/_srt-sha-probe.yml — it + # emits a candidate srt.lock.json (summary + artifact) that a + # human lands via a normal reviewed PR (no workflow edit). + srt-version: ${{ steps.srt-pin.outputs.version }} + srt-sha256: ${{ steps.srt-pin.outputs.sha256 }} + sync-files: | + .verify-recipes/_util.ts:.verify-recipes/_util.ts + .verify-recipes/example-smoke.spec.ts:.verify-recipes/example-smoke.spec.ts + scripts/verify-pr.ts:scripts/verify-pr.ts + scripts/verify-pr-author.ts:scripts/verify-pr-author.ts + scripts/verify-pr-generate.ts:scripts/verify-pr-generate.ts + scripts/verify-evidence-check.ts:scripts/verify-evidence-check.ts + sync-trees: | + scripts/verify + scripts/utils + provenance-secret: ${{ secrets.VERIFY_PROVENANCE_SECRET }} + install-code-deps: 'true' + + - name: Generate bundle + env: + GH_TOKEN: ${{ secrets.GITHUB_TOKEN }} + PR_NUMBER: ${{ github.event.pull_request.number }} + PR_HEAD_SHA: ${{ github.event.pull_request.head.sha }} + BASE_SHA: ${{ github.event.pull_request.base.sha }} + run: | + yarn verify-pr-generate \ + --pr "$PR_NUMBER" \ + --force \ + --base-sha "$BASE_SHA" \ + --head-sha "$PR_HEAD_SHA" \ + --output "$PR_HEAD_DIR/.verify-recipes/pr-${PR_NUMBER}.spec.ts" + + - name: Author recipe + # M2: load provenance secret from file (not $GITHUB_ENV) and thread + # explicitly via env. recipe-author-core reads VERIFY_PROVENANCE_SECRET + # from its own process env to HMAC-sign the spec. + env: + ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }} + PROVENANCE_SECRET_PATH: ${{ steps.prep.outputs.provenance-secret-path }} + run: | + set -euo pipefail + # shellcheck disable=SC2012 + BUNDLE=$(ls -t .verify-output/*/prompt-bundle.json | head -1) + VERIFY_PROVENANCE_SECRET="$(cat "$PROVENANCE_SECRET_PATH")" \ + yarn verify-pr-author --bundle "$BUNDLE" + + - name: Verify PR + id: verify + working-directory: ${{ runner.temp }}/pr-head + env: + PR_NUMBER: ${{ github.event.pull_request.number }} + NX_NO_CLOUD: 'true' + NX_CLOUD_ACCESS_TOKEN: '' + PROVENANCE_SECRET_PATH: ${{ steps.prep.outputs.provenance-secret-path }} + run: | + set -euo pipefail + # strip untrusted-PR secrets — single source, TRUSTED base checkout + # only. Assert the path is absolute + present before sourcing so a + # future relative-path edit can't resolve it under the PR-head + # working dir; fail-closed (refuse to run untrusted code) otherwise. + STRIP_SH="$GITHUB_WORKSPACE/scripts/verify/ci/strip-untrusted-secrets.sh" + case "$GITHUB_WORKSPACE" in /*) : ;; *) echo "GITHUB_WORKSPACE not absolute" >&2; exit 1 ;; esac + test -f "$STRIP_SH" || { echo "trusted strip script missing — refusing to run untrusted code" >&2; exit 1; } + source "$STRIP_SH" + PROVENANCE_SECRET="$(cat "$PROVENANCE_SECRET_PATH")" + # UX1: redirect $GITHUB_ENV / $GITHUB_PATH / $GITHUB_OUTPUT to + # throwaway sinks so an untrusted install / compile step can't + # poison subsequent steps. + SINK_DIR="$RUNNER_TEMP/sink" + mkdir -p "$SINK_DIR" + SINK_GITHUB_ENV="$SINK_DIR/github_env" + SINK_GITHUB_PATH="$SINK_DIR/github_path" + SINK_GITHUB_OUTPUT="$SINK_DIR/github_output" + : > "$SINK_GITHUB_ENV" + : > "$SINK_GITHUB_PATH" + : > "$SINK_GITHUB_OUTPUT" + # Single source for the UX1/cache-poison env jail used by every + # untrusted install/compile call in THIS step: a clean env with + # only the vars they may see, GITHUB_* redirected to throwaway + # sinks. Per-call extras are passed as leading KEY=VAL pairs: + # jailed YARN_ENABLE_SCRIPTS=false yarn install --immutable + # NOTE: the unit-test step's vitest call deliberately uses its + # own `env -i` (different step, no SINK_DIR, needs TMPDIR / + # CLAUDE_CODE_TMPDIR) and is intentionally NOT routed here. + jailed() { + env -i \ + HOME="$HOME" \ + PATH="$PATH" \ + RUNNER_TEMP="$RUNNER_TEMP" \ + NX_NO_CLOUD=true \ + PR_HEAD_DIR="$PR_HEAD_DIR" \ + GITHUB_ENV="$SINK_GITHUB_ENV" \ + GITHUB_PATH="$SINK_GITHUB_PATH" \ + GITHUB_OUTPUT="$SINK_GITHUB_OUTPUT" \ + "$@" + } + # SECURITY (fork→base cache poisoning): pin the untrusted PR + # `yarn install` to a cache dir INSIDE $PR_HEAD_DIR (srt-allowWrite, + # never the shared ~/.yarn/berry/cache that setup-node-and-install + # and the trusted publish.yml share). Even though the post-job + # save is now PRT-gated in setup-node-and-install, this guarantees + # fork code physically cannot write the shared store. + UNTRUSTED_YARN_CACHE="$PR_HEAD_DIR/.yarn-cache" + mkdir -p "$UNTRUSTED_YARN_CACHE" + jailed \ + YARN_ENABLE_SCRIPTS=false \ + YARN_CACHE_FOLDER="$UNTRUSTED_YARN_CACHE" \ + yarn install --immutable + jailed \ + YARN_CACHE_FOLDER="$UNTRUSTED_YARN_CACHE" \ + yarn playwright install --with-deps chromium + + VERIFY_RESULT_PATH="$PR_HEAD_DIR/.verify-out-trusted/verify-result.json" + export VERIFY_RESULT_PATH + STUB_OUT_DIR="$(dirname "$VERIFY_RESULT_PATH")" + mkdir -p "$STUB_OUT_DIR" + echo "result-path=$VERIFY_RESULT_PATH" >> "$GITHUB_OUTPUT" + + write_compile_failure_stub() { + local log="$1" + node "$GITHUB_WORKSPACE/scripts/verify/ci/write-compile-failure-stub.ts" \ + --log "$log" \ + --out-dir "$STUB_OUT_DIR" \ + --template "internal-ui" + } + COMPILE_LOG="$RUNNER_TEMP/verify-compile.log" + SPEC=".verify-recipes/pr-${PR_NUMBER}.spec.ts" + TARGET="internal-ui" + TEMPLATE="" + if [ -f "$SPEC" ]; then + HEADER=$(grep -E '^// @verify-target:' "$SPEC" | head -1 || true) + CANDIDATE=$(echo "$HEADER" | sed -E 's|.*@verify-target:[[:space:]]*||;s|[[:space:]]+$||') + case "$CANDIDATE" in + sandbox:*) + TEMPLATE="${CANDIDATE#sandbox:}" + case "$TEMPLATE" in + react-vite/default-ts|react-webpack/default-ts|vue3-vite/default-ts|svelte-vite/default-ts|angular-cli/default-ts|nextjs/default-ts|nextjs-vite/default-ts) + TARGET="sandbox" + ;; + *) + echo "[verify] sandbox template '$TEMPLATE' is not allowlisted; falling back to internal-ui prep." + TEMPLATE="" + ;; + esac + ;; + esac + fi + + srt_compile() { + jailed "$@" + } + if [ "$TARGET" = "sandbox" ]; then + echo "[verify] spec targets sandbox '$TEMPLATE' — running 'nx run $TEMPLATE:sandbox'" + if ! srt_compile yarn nx run "$TEMPLATE:sandbox" 2>&1 | tee "$COMPILE_LOG"; then + write_compile_failure_stub "$COMPILE_LOG" + exit 0 + fi + else + if ! srt_compile yarn nx compile core 2>&1 | tee "$COMPILE_LOG"; then + write_compile_failure_stub "$COMPILE_LOG" + exit 0 + fi + if ! srt_compile yarn nx run-many -t compile 2>&1 | tee "$COMPILE_LOG"; then + write_compile_failure_stub "$COMPILE_LOG" + exit 0 + fi + fi + + # M2: thread VERIFY_PROVENANCE_SECRET into srt env explicitly via + # `env VAR=...` so the orchestrator can HMAC-verify the spec. srt + # propagates this var into the jailed process; spec code already + # cannot reach it because the spec runs as Playwright workers, and + # the orchestrator scrubs it before spawning workers. + srt --settings "$SRT_SETTINGS" \ + env VERIFY_RESULT_PATH="$VERIFY_RESULT_PATH" \ + VERIFY_PROVENANCE_SECRET="$PROVENANCE_SECRET" \ + yarn verify-pr --recipe-spec ".verify-recipes/pr-${PR_NUMBER}.spec.ts" || \ + echo "verify-pr exited non-zero — verdict captured in verify-result.json" + + - name: Evidence check (vision) + if: always() + env: + ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }} + PR_NUMBER: ${{ github.event.pull_request.number }} + RESULT: ${{ steps.verify.outputs.result-path }} + run: | + set -euo pipefail + if [ ! -f "$RESULT" ]; then + echo "no verify-result.json at $RESULT — skipping evidence check" + exit 0 + fi + VERDICT=$(jq -r '.verdict' "$RESULT") + if [ "$VERDICT" != "verified" ]; then + echo "verdict is '$VERDICT' — skipping evidence check (retry will use Playwright error context)" + exit 0 + fi + RECIPE="$PR_HEAD_DIR/.verify-recipes/pr-${PR_NUMBER}.spec.ts" + yarn verify-evidence-check \ + --result "$RESULT" \ + --diff "$RUNNER_TEMP/pr.diff" \ + --recipe "$RECIPE" + + - name: Retry on regression or evidence missing/undetermined + if: always() + env: + ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }} + GH_TOKEN: ${{ secrets.GITHUB_TOKEN }} + PR_NUMBER: ${{ github.event.pull_request.number }} + PR_HEAD_SHA: ${{ github.event.pull_request.head.sha }} + BASE_SHA: ${{ github.event.pull_request.base.sha }} + NX_NO_CLOUD: 'true' + NX_CLOUD_ACCESS_TOKEN: '' + RESULT: ${{ steps.verify.outputs.result-path }} + PROVENANCE_SECRET_PATH: ${{ steps.prep.outputs.provenance-secret-path }} + run: | + set -euo pipefail + if [ ! -f "$RESULT" ]; then + echo "no verify-result.json at $RESULT — skipping retry" + exit 0 + fi + VERDICT=$(jq -r '.verdict' "$RESULT") + EVIDENCE=$(jq -r '.evidenceVerdict // "n/a"' "$RESULT") + # Playwright run dir lives under `$PR_HEAD_DIR/.verify-output//`, + # separate from the trusted result file. Use the result's runId to + # locate error-context.md + playwright-report.json. Local-dev (where + # both live colocated) falls through the existence check. + RUN_ID=$(jq -r '.runId // ""' "$RESULT") + RUN_DIR="$PR_HEAD_DIR/.verify-output/$RUN_ID" + [ -d "$RUN_DIR" ] || RUN_DIR=$(dirname "$RESULT") + + REASON="" + case "$VERDICT" in + regression) + ERROR_CTX="" + for f in "$RUN_DIR"/*/error-context.md; do + [ -f "$f" ] || continue + SLUG=$(basename "$(dirname "$f")") + ERROR_CTX+=$'\n\n--- page snapshot at failure ('"$SLUG"$') ---\n' + ERROR_CTX+=$(head -c 8000 "$f") + IF="$(dirname "$f")/iframe-snapshot.md" + if [ -f "$IF" ]; then + ERROR_CTX+=$'\n\n--- preview iframe snapshot at failure ('"$SLUG"$') ---\n' + ERROR_CTX+=$(head -c 8000 "$IF") + fi + done + REPORT="$RUN_DIR/playwright-report.json" + ERROR_MSG="" + if [ -f "$REPORT" ]; then + ERROR_MSG=$(jq -r ' + [.. | objects | select(has("errors")) | .errors[]? | (.message // .stack // tostring)] | .[0] // "" + ' "$REPORT" 2>/dev/null || true) + fi + REASON="Playwright assertions failed. The recipe ran but the test did not pass. Use the page snapshot below as ground truth for selectors / route paths / aria roles — if you navigated to a route that does not exist, the snapshot will say so; if a locator timed out, the actual DOM is in the snapshot. Adjust selectors / navigation accordingly. Do NOT repeat the previous attempt's selectors or routes verbatim." + if [ -n "$ERROR_MSG" ]; then + REASON+=$'\n\nFirst Playwright error:\n'"$ERROR_MSG" + fi + REASON+="$ERROR_CTX" + echo "regression — retrying with error context (length=${#REASON})" + ;; + *) + if [ "$EVIDENCE" != "missing" ] && [ "$EVIDENCE" != "undetermined" ]; then + echo "verdict=$VERDICT evidence=$EVIDENCE — no retry needed" + exit 0 + fi + REASON=$(jq -r '.evidenceReasoning // ""' "$RESULT") + echo "evidence $EVIDENCE — retrying with vision reasoning" + ;; + esac + + # shellcheck disable=SC2012 + PRIOR_RUN_DIR=$(ls -dt "$PR_HEAD_DIR"/.verify-output/*/ 2>/dev/null | head -1 || true) + if ! yarn verify-pr-generate \ + --pr "$PR_NUMBER" \ + --force \ + --base-sha "$BASE_SHA" \ + --head-sha "$PR_HEAD_SHA" \ + --prior-run-dir "${PRIOR_RUN_DIR%/}" \ + --output "$PR_HEAD_DIR/.verify-recipes/pr-${PR_NUMBER}.spec.ts" \ + --retry-context "$REASON"; then + echo "retry verify-pr-generate failed — keeping original verdict" + exit 0 + fi + + # shellcheck disable=SC2012 + BUNDLE=$(ls -t .verify-output/*/prompt-bundle.json | head -1) + PROVENANCE_SECRET="$(cat "$PROVENANCE_SECRET_PATH")" + if ! VERIFY_PROVENANCE_SECRET="$PROVENANCE_SECRET" yarn verify-pr-author --bundle "$BUNDLE"; then + echo "retry verify-pr-author failed — keeping original verdict" + exit 0 + fi + + PREV_MTIME=$(stat -c '%Y' "$RESULT" 2>/dev/null || stat -f '%m' "$RESULT" 2>/dev/null || echo 0) + + ( + cd "$PR_HEAD_DIR" + # Distinct jail by design (separate step from the main Verify + # step, so the `jailed()` helper is out of scope): no SINK_DIR + # GITHUB_* redirection here, and VERIFY_PROVENANCE_SECRET + + # VERIFY_RESULT_PATH are threaded in for the HMAC-signed verdict + # re-run. Keep this list explicit for security auditability. + env -i \ + HOME="$HOME" \ + PATH="$PATH" \ + RUNNER_TEMP="$RUNNER_TEMP" \ + NX_NO_CLOUD=true \ + PR_HEAD_DIR="$PR_HEAD_DIR" \ + VERIFY_RESULT_PATH="$RESULT" \ + VERIFY_PROVENANCE_SECRET="$PROVENANCE_SECRET" \ + srt --settings "$SRT_SETTINGS" \ + yarn verify-pr --recipe-spec ".verify-recipes/pr-${PR_NUMBER}.spec.ts" + ) || \ + echo "retry verify-pr exited non-zero — evidence-check will still run if verdict==verified" + + NEW_MTIME=$(stat -c '%Y' "$RESULT" 2>/dev/null || stat -f '%m' "$RESULT" 2>/dev/null || echo 0) + if [ "$NEW_MTIME" = "$PREV_MTIME" ]; then + echo "retry produced no new verify-result.json — keeping original verdict" + exit 0 + fi + jq '. + {evidenceRetry: true}' "$RESULT" > "$RESULT.tmp" && mv "$RESULT.tmp" "$RESULT" + # W4 CONTRACT (HMAC verdict integrity): the jq mutation above is a + # trusted in-place rewrite of the signed verify-result.json. Every + # trusted writer MUST re-sign so the `.sig` stays current — today + # `evidenceRetry` is outside SIGNED_FIELDS so the old sig still + # validates, but adding any mutated field to SIGNED_FIELDS would + # otherwise flip every retried PR to forgery-detected. Re-sign via + # the exported signResultFile from the BASE-checkout core.ts (same + # trusted-script source as write-compile-failure-stub.ts / + # derive-verdict.ts). PROVENANCE_SECRET is already loaded above in + # this trusted step. + # Rollout-ordering guard: signResultFile is a NEW base-core.ts + # export. Until this change is merged to the base ref, a PR run + # (harness taken from base) imports an old core.ts lacking it. + # Feature-detect: if absent we are mid-rollout — warn and skip + # (the `.sig` is still valid over SIGNED_FIELDS since evidenceRetry + # is excluded); a genuine signing throw still fails the step loudly. + VERIFY_PROVENANCE_SECRET="$PROVENANCE_SECRET" \ + node --experimental-strip-types --input-type=module -e ' + const m = await import("'"$GITHUB_WORKSPACE"'/scripts/verify/core.ts"); + if (typeof m.signResultFile !== "function") { + console.warn("[verify] signResultFile not in base core.ts yet (pre-rollout) — skipping re-sign; .sig still valid for unsigned fields"); + process.exit(0); + } + await m.signResultFile(process.argv[1], process.env.VERIFY_PROVENANCE_SECRET); + ' "$RESULT" + + NEW_VERDICT=$(jq -r '.verdict' "$RESULT") + if [ "$NEW_VERDICT" = "verified" ]; then + yarn verify-evidence-check \ + --result "$RESULT" \ + --diff "$RUNNER_TEMP/pr.diff" \ + --recipe "$PR_HEAD_DIR/.verify-recipes/pr-${PR_NUMBER}.spec.ts" || true + fi + + FINAL_EVIDENCE=$(jq -r '.evidenceVerdict // "n/a"' "$RESULT") + echo "retry complete — final verdict=$NEW_VERDICT evidence=$FINAL_EVIDENCE" + + - name: Run PR-added unit tests + if: always() + working-directory: ${{ runner.temp }}/pr-head + env: + PR_NUMBER: ${{ github.event.pull_request.number }} + NX_NO_CLOUD: 'true' + NX_CLOUD_ACCESS_TOKEN: '' + RESULT: ${{ steps.verify.outputs.result-path }} + run: | + set -euo pipefail + # strip untrusted-PR secrets — single source, TRUSTED base checkout + # only (see Verify PR step for rationale). Fail-closed if absent. + STRIP_SH="$GITHUB_WORKSPACE/scripts/verify/ci/strip-untrusted-secrets.sh" + case "$GITHUB_WORKSPACE" in /*) : ;; *) echo "GITHUB_WORKSPACE not absolute" >&2; exit 1 ;; esac + test -f "$STRIP_SH" || { echo "trusted strip script missing — refusing to run untrusted code" >&2; exit 1; } + source "$STRIP_SH" + if [ ! -f "$RESULT" ]; then + echo "no verify-result.json at $RESULT — skipping unit-test step" + exit 0 + fi + + TEST_FILES=() + while IFS= read -r f; do + [ -z "$f" ] && continue + [ -f "$f" ] || continue + TEST_FILES+=("$f") + done < <( + grep -E '^\+\+\+ b/code/.+\.(test|spec)\.(ts|tsx|js|jsx)$' "$RUNNER_TEMP/pr.diff" 2>/dev/null \ + | sed 's|^+++ b/||' \ + | sort -u + ) + + if [ ${#TEST_FILES[@]} -eq 0 ]; then + echo "no PR-added unit tests detected — recording n/a" + jq '. + {unitTests: {ran: false, files: [], passed: null, summary: "no PR-added test files in diff"}}' \ + "$RESULT" > "$RESULT.tmp" && mv "$RESULT.tmp" "$RESULT" + exit 0 + fi + + echo "running ${#TEST_FILES[@]} unit test file(s): ${TEST_FILES[*]}" + # vitest runs inside srt; srt allowWrite excludes $RUNNER_TEMP root + # (only $RUNNER_TEMP/sandbox-tmp is whitelisted there). Writing the + # JSON reporter output to $RUNNER_TEMP/* hits EROFS. Place under + # $PR_HEAD_DIR/.verify-output (already in allowWrite). PR-writable + # but unitTests fields are not in the HMAC-signed set so forgery + # cannot upgrade verdict; derive-verdict can only downgrade verified + # → regression from unitTests.passed=false, never upgrade. + mkdir -p "$PR_HEAD_DIR/.verify-output" + REPORT="$PR_HEAD_DIR/.verify-output/unit-tests-report.json" + VITEST_LOG="$PR_HEAD_DIR/.verify-output/vitest.log" + # `env -i` strips CLAUDE_CODE_TMPDIR. srt derives its sandbox tmp + # from that var; the main recipe run inherits it via $GITHUB_ENV + # ($SANDBOX_TMPDIR), but here env -i drops it so srt falls back to + # its hardcoded default `/tmp/claude`, which is never created → + # Yarn's mktempPromise realpaths it (`lstat '/tmp/claude'` ENOENT) + # and aborts before vitest starts → false "no JSON report" + # regression (eval #36). Pass CLAUDE_CODE_TMPDIR (+ TMPDIR) to an + # existing allowWrite dir, same reason REPORT/VITEST_LOG live under + # $PR_HEAD_DIR/.verify-output. + VITEST_TMPDIR="$PR_HEAD_DIR/.verify-output/vitest-tmp" + mkdir -p "$VITEST_TMPDIR" + + set +e + env -i \ + HOME="$HOME" \ + PATH="$PATH" \ + RUNNER_TEMP="$RUNNER_TEMP" \ + TMPDIR="$VITEST_TMPDIR" \ + CLAUDE_CODE_TMPDIR="$VITEST_TMPDIR" \ + NX_NO_CLOUD=true \ + PR_HEAD_DIR="$PR_HEAD_DIR" \ + srt --settings "$SRT_SETTINGS" \ + yarn vitest run --cache=false --reporter=json --outputFile "$REPORT" -- "${TEST_FILES[@]}" > "$VITEST_LOG" 2>&1 + VITEST_EXIT=$? + set -e + + if [ -f "$REPORT" ]; then + PASSED=$(jq '.numFailedTests == 0 and .numFailedTestSuites == 0' "$REPORT") + SUMMARY=$(jq -r '"\(.numPassedTests) passed, \(.numFailedTests) failed across \(.numTotalTestSuites) suite(s)"' "$REPORT") + else + PASSED=false + SUMMARY="vitest exited $VITEST_EXIT without writing a JSON report (likely setup error); see Action log" + fi + DETAILS=$(tail -c 4000 "$VITEST_LOG" \ + | perl -pe 's/\x1B[@-Z\\-_]|\x1B\[[0-?]*[ -\/]*[@-~]|\x1B\][^\x07\x1B]*(?:\x07|\x1B\\)|\x1BP[^\x1B]*\x1B\\|[\x00-\x08\x0B\x0C\x0E-\x1F\x7F-\x9F]//g' \ + | jq -Rs .) + + jq \ + --argjson passed "$PASSED" \ + --arg summary "$SUMMARY" \ + --argjson files "$(printf '%s\n' "${TEST_FILES[@]}" | jq -R . | jq -s .)" \ + --argjson details "$DETAILS" \ + '. + {unitTests: {ran: true, files: $files, passed: $passed, summary: $summary, details: $details}}' \ + "$RESULT" > "$RESULT.tmp" && mv "$RESULT.tmp" "$RESULT" + echo "unit-tests verdict: $SUMMARY (passed=$PASSED, exit=$VITEST_EXIT)" + + - name: Publish agentic results + id: pub + if: always() + uses: ./.github/actions/agentic-pr-publish + with: + github-token: ${{ secrets.GITHUB_TOKEN }} + pr-number: ${{ github.event.pull_request.number }} + run-id: ${{ github.run_id }} + repo: ${{ github.repository }} + result-path: ${{ steps.verify.outputs.result-path }} + provenance-secret-path: ${{ steps.prep.outputs.provenance-secret-path }} + telemetry-webhook-url: ${{ secrets.TELEMETRY_AGENTIC_VERIFICATION_WEBHOOK_URL }} + telemetry-webhook-token: ${{ secrets.TELEMETRY_AGENTIC_VERIFICATION_WEBHOOK_TOKEN }} + + - name: Apply verified-by-harness label + if: steps.pub.outputs.verdict == 'verified' + env: + GH_TOKEN: ${{ secrets.GITHUB_TOKEN }} + PR_NUMBER: ${{ github.event.pull_request.number }} + REPO: ${{ github.repository }} + run: | + gh label create verified-by-harness \ + --repo "$REPO" \ + --description "Verified by PR Verify Harness" \ + --color 0E8A16 \ + 2>/dev/null || true + gh pr edit "$PR_NUMBER" \ + --repo "$REPO" \ + --add-label verified-by-harness + + - name: Post PR comment + # Body rendering lives in scripts/verify/ci/render-pr-comment.ts so the + # workflow stays slim and the logic is testable in isolation. H4: + # screenshot URLs are read from FILE path (not a heredoc string) to + # close the terminator-injection surface across the composite boundary. + if: always() + env: + SCREENSHOT_URLS_PATH: ${{ steps.pub.outputs.screenshot-urls-path }} + RESULT: ${{ steps.verify.outputs.result-path }} + GH_TOKEN: ${{ secrets.GITHUB_TOKEN }} + REPO: ${{ github.repository }} + PR_NUMBER: ${{ github.event.pull_request.number }} + RUN_URL: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }} + run: | + set -euo pipefail + BODY_FILE="$RUNNER_TEMP/verify-comment-body.md" + node "$GITHUB_WORKSPACE/scripts/verify/ci/render-pr-comment.ts" \ + --result "$RESULT" \ + --run-url "$RUN_URL" \ + ${SCREENSHOT_URLS_PATH:+--urls-path "$SCREENSHOT_URLS_PATH"} \ + --output "$BODY_FILE" + gh pr comment "$PR_NUMBER" --repo "$REPO" --body-file "$BODY_FILE" + + - name: Fail job if final verdict is not verified + if: always() && steps.pub.outputs.verdict != 'verified' + env: + VERDICT: ${{ steps.pub.outputs.verdict }} + run: | + echo "Final verdict: ${VERDICT:-} — failing job." + exit 1 diff --git a/.gitignore b/.gitignore index c8417670f741..ea5b80e7b5bc 100644 --- a/.gitignore +++ b/.gitignore @@ -88,5 +88,9 @@ scripts/eval/results # review-pr skill output .pr-review +# verify-pr harness output +.verify-output +.verify-scratch + # Unknown .omc diff --git a/.verify-recipes/.eslintrc.cjs b/.verify-recipes/.eslintrc.cjs new file mode 100644 index 000000000000..0f84cd20f0cc --- /dev/null +++ b/.verify-recipes/.eslintrc.cjs @@ -0,0 +1,141 @@ +module.exports = { + root: true, + parser: require.resolve('@typescript-eslint/parser'), + parserOptions: { ecmaVersion: 2022, sourceType: 'module', project: false }, + env: { node: true, es2022: true }, + plugins: ['@typescript-eslint', 'verify-recipes'], + extends: ['eslint:recommended', 'plugin:@typescript-eslint/recommended'], + rules: { + 'no-unused-vars': 'off', + '@typescript-eslint/no-unused-vars': [ + 'error', + { argsIgnorePattern: 'none', varsIgnorePattern: 'none' }, + ], + + // Behavioral recipes must reach untyped runtime globals + // (`__STORYBOOK_ADDONS_MANAGER`, `__STORYBOOK_ADDONS_CHANNEL__`, the + // manager-api singleton) inside `page.evaluate()` — `as any` there is + // correct and unavoidable. `@typescript-eslint/recommended` makes + // `no-explicit-any` an error, which produced no-verdict failures on + // valid manager-api recipes (eval #31). This is a code-quality rule, + // NOT a security control — deny-regex + no-restricted-{globals, + // imports,syntax} remain the load-bearing gates. Off here. + '@typescript-eslint/no-explicit-any': 'off', + + // Security: forbid dynamic code execution + 'no-eval': 'error', + 'no-new-func': 'error', + 'no-implied-eval': 'error', + 'no-restricted-globals': ['error', 'eval', 'Function'], + + // Security (C6): deny every node: built-in (and its bare-form alias). + // ESLint's no-restricted-imports has no "allow-list" mode, so we + // enumerate the dangerous module names explicitly and pair it with + // a `node:*` glob to catch the prefixed forms regardless of which + // built-in shows up. The intent is an allow-list: imports allowed + // in recipes are limited to `@playwright/test`, `./_util.ts`, and + // `./_util` (and the deny-regex tripwire enforces the same). + 'no-restricted-imports': [ + 'error', + { + paths: [ + { name: 'child_process' }, + { name: 'node:child_process' }, + { name: 'fs' }, + { name: 'fs/promises' }, + { name: 'node:fs' }, + { name: 'node:fs/promises' }, + { name: 'net' }, + { name: 'node:net' }, + { name: 'dns' }, + { name: 'node:dns' }, + { name: 'http' }, + { name: 'node:http' }, + { name: 'https' }, + { name: 'node:https' }, + { name: 'module' }, + { name: 'node:module' }, + { name: 'vm' }, + { name: 'node:vm' }, + { name: 'cluster' }, + { name: 'node:cluster' }, + { name: 'worker_threads' }, + { name: 'node:worker_threads' }, + { name: 'os' }, + { name: 'node:os' }, + { name: 'path' }, + { name: 'node:path' }, + { name: 'stream' }, + { name: 'node:stream' }, + { name: 'tls' }, + { name: 'node:tls' }, + ], + patterns: [ + { + group: ['node:*'], + message: + 'node: built-ins are forbidden in recipes. Imports allowed: @playwright/test, ./_util.ts, ./_util.', + }, + ], + }, + ], + + // Security (C6): forbid runtime resolver / dynamic eval / native bindings. + // Each selector pins one obfuscation path that would otherwise sneak past + // the static import allow-list. + 'no-restricted-syntax': [ + 'error', + { + selector: "CallExpression[callee.name='require']", + message: 'Runtime require() is forbidden in recipes.', + }, + { + selector: "MemberExpression[property.name='require']", + message: + 'Member-access require (e.g. `foo.require`, `module.require`) is forbidden in recipes.', + }, + { + selector: + "MemberExpression[object.object.name='process'][object.property.name='mainModule']", + message: 'process.mainModule.* access is forbidden in recipes.', + }, + { + selector: "MemberExpression[object.name='globalThis'][computed=true]", + message: 'Computed globalThis[...] access is forbidden in recipes.', + }, + { + selector: 'ImportExpression', + message: 'Dynamic import() is forbidden in recipes.', + }, + { + selector: "Identifier[name='createRequire']", + message: 'createRequire is forbidden in recipes.', + }, + { + selector: "MemberExpression[property.name='_load']", + message: 'Module._load access is forbidden in recipes.', + }, + { + selector: "MemberExpression[object.name='process'][property.name='binding']", + message: 'process.binding() is forbidden in recipes.', + }, + { + selector: "MemberExpression[object.name='process'][property.name='dlopen']", + message: 'process.dlopen is forbidden in recipes.', + }, + { + selector: + "CallExpression[callee.object.name='process'][callee.property.name=/^(exit|kill|binding)$/]", + message: 'process.exit/kill/binding are forbidden in recipes.', + }, + { + selector: "CallExpression[callee.name='fetch']", + message: 'Global fetch is forbidden in recipes; use page.* primitives.', + }, + ], + + // Custom structural rules for Playwright recipe correctness + 'verify-recipes/listener-before-goto': 'error', + 'verify-recipes/attach-pattern': 'error', + }, +}; diff --git a/.verify-recipes/.gitkeep b/.verify-recipes/.gitkeep new file mode 100644 index 000000000000..e69de29bb2d1 diff --git a/.verify-recipes/__fixtures__/bad-node-import.ts b/.verify-recipes/__fixtures__/bad-node-import.ts new file mode 100644 index 000000000000..3bd9bb0864f0 --- /dev/null +++ b/.verify-recipes/__fixtures__/bad-node-import.ts @@ -0,0 +1,7 @@ +// Fixture: should fail ESLint with no-restricted-imports +// This file intentionally imports a forbidden node: built-in. +import { readFileSync } from 'node:fs'; + +export function dummy() { + return readFileSync('/dev/null', 'utf8'); +} diff --git a/.verify-recipes/__fixtures__/bypass-process-mainmodule.ts b/.verify-recipes/__fixtures__/bypass-process-mainmodule.ts new file mode 100644 index 000000000000..3420e26348a4 --- /dev/null +++ b/.verify-recipes/__fixtures__/bypass-process-mainmodule.ts @@ -0,0 +1,10 @@ +// Fixture (C6 bypass-attempt): should fail ESLint with no-restricted-syntax. +// Tries to reach child_process via process.mainModule.require — caught by the +// process.mainModule + member-access require selectors. +import { test } from './_util.ts'; + +test('attempt to load child_process via process.mainModule', () => { + // @ts-expect-error - process.mainModule is non-null at runtime in Node. + const cp = process.mainModule.require('child_process'); + cp.execSync('echo pwned'); +}); diff --git a/.verify-recipes/__fixtures__/goto-without-listener.ts b/.verify-recipes/__fixtures__/goto-without-listener.ts new file mode 100644 index 000000000000..5882dafeef96 --- /dev/null +++ b/.verify-recipes/__fixtures__/goto-without-listener.ts @@ -0,0 +1,8 @@ +// Fixture: should fail ESLint with verify-recipes/listener-before-goto +// This spec calls page.goto() without registering a listener first. +import { test, expect } from './_util.ts'; + +test('navigate without listener', async ({ page }) => { + await page.goto('/'); + await expect(page).toHaveURL('/'); +}); diff --git a/.verify-recipes/_recipe-authoring-guide.md b/.verify-recipes/_recipe-authoring-guide.md new file mode 100644 index 000000000000..0b9925288a2d --- /dev/null +++ b/.verify-recipes/_recipe-authoring-guide.md @@ -0,0 +1,681 @@ +# Recipe Authoring Guide (for LLM recipe-author agents) + +This file is the **authoring contract** for agent-generated Playwright recipes in `.verify-recipes/`. The `verify-recipe-author` skill includes this guide verbatim in the prompt; the runner executes the committed spec via `bun x playwright test`. + +> **Audience:** an LLM that writes a single `.spec.ts` file for one PR. The output must match the contract below exactly — no exceptions. + +--- + +## 1. Output contract + +Emit **one file** at the path specified by the skill: `.verify-recipes/pr-<#>.spec.ts`. + +Required shape: + +```ts +import { RecipePage, expect, filterPageErrors, test } from './_util.ts'; + +test('', async ({ page }, testInfo) => { + // ... see rules below ... +}); +``` + +Hard requirements: + +- **Imports**: ONLY `./_util.ts` (which re-exports `expect`, `filterPageErrors`, and a `test` extended with the harness's auto-failure-capture fixture — captures the preview iframe accessibility snapshot to `iframe-snapshot.md` so the retry loop can feed it back to the next author dispatch). Nothing else. No `node:*`, no `child_process`, no `fs`, no `@storybook/*`, no relative imports outside `.verify-recipes/`. Do not import `test` or `expect` directly from `@playwright/test`; that bypasses the failure-capture fixture. +- **Exactly one `test(...)` call.** No `describe`, no `test.skip`, no `test.only`, no `beforeEach`/`afterEach`. +- **`.ts` extension on relative imports** (`./_util.ts`, not `./_util`). +- **No top-level side effects** — everything inside the `test(...)` callback. + +Output is wrapped between fenced markers `<<>>` and `<<>>` (the skill strips these and writes the body). + +--- + +## 2. Listener-before-goto rule (HARD GATE — AC-V3-3) + +`page.on('pageerror', ...)` and `page.on('console', ...)` listeners MUST be registered **before** the first `page.goto(...)` call. The skill's post-write regex check enforces this; if you call `page.goto` first, the spec is rejected. + +Canonical pattern: + +```ts +test('my recipe', async ({ page }, testInfo) => { + const pageErrors: string[] = []; + const consoleErrors: string[] = []; + + // Listeners FIRST. Always. + page.on('pageerror', (err) => { + pageErrors.push(err.stack ?? err.message ?? String(err)); + }); + page.on('console', (msg) => { + if (msg.type() === 'error') consoleErrors.push(msg.text()); + }); + + const baseURL = + process.env.STORYBOOK_URL ?? testInfo.project.use.baseURL ?? 'http://localhost:6006'; + + // Now (and only now) navigate. + await page.goto(`${baseURL}/?path=/story/example-button--primary`); + // ... +}); +``` + +Never call `page.goto` (or `page.waitForURL`, or any other navigation primitive) before the listeners are attached. + +--- + +## 3. Attach pattern (HARD GATE — AC-V3-4) + +The runner harvests `pageErrors` and `consoleErrors` from test attachments. You MUST attach both in a `finally` block (so attachments land even on assertion failure): + +```ts +try { + // ...goto + assertions... +} finally { + await testInfo.attach('pageErrors', { + body: JSON.stringify(pageErrors), + contentType: 'application/json', + }); + await testInfo.attach('consoleErrors', { + body: JSON.stringify(consoleErrors), + contentType: 'application/json', + }); +} +``` + +Attachment names are exactly `pageErrors` and `consoleErrors`. The body is JSON-stringified array of strings (already accumulated by the listeners). + +### Filtering known low-signal pageErrors + +When you assert `pageErrors` at the end of the recipe, wrap the array in +`filterPageErrors(...)` from `./_util.ts`: + +```ts +expect(filterPageErrors(pageErrors)).toEqual([]); +``` + +`filterPageErrors` drops upstream-known noise — currently the cross-origin +`SecurityError: Failed to read the 'sessionStorage' property from 'Window'` +that `@storybook/addon-mcp` emits on every internal-ui boot when its +composed-ref auth probe touches chromatic-hosted iframes. The runner's +`computeVerdict` applies the same filter on the attachment side, so +**`filterPageErrors(pageErrors)` keeps the local assertion in sync with the +runner's verdict logic** and prevents a "regression" verdict driven entirely +by environmental noise. Never assert on the raw `pageErrors` array. + +### Filtering known low-signal consoleErrors (MANDATORY) + +The same applies to `consoleErrors` — wrap in `filterConsoleErrors(...)` +from `./_util.ts`: + +```ts +expect(filterConsoleErrors(consoleErrors)).toEqual([]); +``` + +The harness runs the preview inside an **`srt` egress jail** that denies +every domain not on the allowlist. internal-ui's external probes +(telemetry, composed refs, fonts, analytics) therefore **always** log +`Failed to load resource: net::ERR_INTERNET_DISCONNECTED` (and other +`net::ERR_*`) in CI — environmental, not a PR regression. +`expect(consoleErrors).toEqual([])` on the **raw** array is a guaranteed +false regression (eval #36). **Never assert the raw `consoleErrors` +array — always `filterConsoleErrors(consoleErrors)`.** Import it from +`./_util.ts` alongside `filterPageErrors`. + +--- + +## 4. `RecipePage` API (the only helper) + +From `./_util.ts`: + +```ts +new RecipePage(page, expect).waitUntilLoaded(): Promise +new RecipePage(page, expect).previewIframe(): FrameLocator +new RecipePage(page, expect).previewRoot(): Locator +new RecipePage(page, expect).waitForStoryLoaded(): Promise +new RecipePage(page, expect).scratchDir: string +new RecipePage(page, expect).writeFixture(relPath: string, contents: string): string +``` + +- `waitUntilLoaded()` injects a session-storage layout, disables transitions, waits for `.sb-preparing-story` / `.sb-preparing-docs` to vanish, then for the story root to be attached. +- `previewIframe()` returns `page.frameLocator('#storybook-preview-iframe')` — use for any preview-frame assertions. +- `previewRoot()` returns the visible `#storybook-root` (or `#storybook-docs`) inside the preview iframe. +- `scratchDir` is the absolute path of `$PR_HEAD_DIR/.verify-scratch` — the **only** sanctioned on-disk write location for recipes. +- `writeFixture(relPath, contents)` writes a file under `scratchDir` (parent dirs auto-created) and returns its absolute path. `relPath` must be relative and stay inside the scratch dir. + +Call `waitUntilLoaded()` immediately after `page.goto(...)`. + +### Writing fixtures to disk (non-visual recipes) + +Most recipes only drive the browser and never touch the filesystem. A +non-visual recipe (behavioral / pure-fn / type-only / build-config) that must +write a fixture or config to exercise a code path **must** use +`writeFixture()` / `scratchDir` — never write elsewhere. + +The harness runs recipes inside an `srt` jail. `$GITHUB_WORKSPACE` and `.git` +are **denyWrite**; `$PR_HEAD_DIR` (which contains `.verify-scratch`) is +**allowWrite**. The scratch dir is pre-created by the prepare composite and +gitignored. Do not attempt to widen the jail or write to the repo checkout — +a recipe that does will fail at runtime, not be granted access. + +```ts +const sb = new RecipePage(page, expect); +const cfgPath = sb.writeFixture('preview-head.html', ''); +// hand cfgPath to the code under test, then assert on the resulting behavior +``` + +--- + +## 5. Selectors and locators + +Preferred (in priority order): + +1. `page.getByRole(...)` — accessibility-tree queries, most stable +2. `page.getByTestId(...)` / `data-testid` selectors +3. ID selectors (`#storybook-preview-iframe`, `#sb-errordisplay`, `#storybook-root`) +4. Class selectors that look stable (`.sb-preparing-story` etc.) + +Avoid: + +- `:nth-child(N)` chains — break on layout shifts +- Brittle class chains (`.foo .bar .baz > div`) +- Free-text matches without `i18n` context +- `setTimeout` / `page.waitForTimeout` for synchronization — use Playwright web-first assertions or `RecipePage.waitUntilLoaded()` + +--- + +## 6. Story URL routing + +- Story: `?path=/story/--` (e.g., `/?path=/story/example-button--primary`) +- Docs: `?path=/docs/--` (e.g., `/?path=/docs/example-button--docs`) +- Manager only (no story): omit the `path` param or use `?path=/` + +**Use the routes the harness pre-computes for you.** The prompt bundle contains a "Story routes (computed deterministically by the harness)" section that lists, for each `*.stories.{ts,tsx,mdx}` file referenced by the diff (or imported by a sibling of a touched non-stories source file), the canonical title, the per-export `storyId`, and the matching `storyUrl` / `docsUrl`. These come from Storybook's own auto-title + `toId` algorithms, so they match what the indexer would emit at runtime. + +Past dispatches that hand-derived kebab-case kind-ids (`addons-controls-object--basic`, `addons-controls-basics--docs`, …) have 404'd because Storybook's auto-title pipeline mangles paths differently than a naive kebabify (leaf/dir dedupe, `index.stories.ts` collapsing, `titlePrefix` interplay, etc). Always prefer the routes the harness emits. + +If the section is absent (because the diff doesn't touch any code under `code/` or because no sibling story imports the changed module), fall back to the manager-only route `?path=/` and rely on sidebar-driven navigation. Do not invent a URL. + +--- + +## 7. Frame access + +- Manager DOM (toolbar, sidebar, addon panels): use `page` directly. +- Preview iframe DOM (the story itself): use `page.frameLocator('#storybook-preview-iframe')` or `recipe.previewIframe()`. + +Example: + +```ts +const recipe = new RecipePage(page, expect); +await recipe.waitUntilLoaded(); + +const previewIframe = recipe.previewIframe(); +const button = previewIframe.getByRole('button', { name: /primary/i }); +await expect(button).toBeVisible(); +``` + +--- + +## 8. Assertions — what counts as a meaningful recipe + +A **smoke-shaped recipe** is the minimum: navigate to a story, wait until loaded, assert preview-root has children, `#sb-errordisplay` is hidden, screenshot the iframe, attach errors. See `example-smoke.spec.ts` for the canonical form. + +A **targeted recipe** goes further. Examples per change type: + +| Diff touches | Recipe should additionally | +|---|---| +| `code/addons//**` | Open the addon panel (`recipe.previewIframe()` may be irrelevant; manager queries needed); assert addon-tab present; trigger the addon's primary interaction | +| `code/core/src/manager/**` | Assert sidebar entries render; navigate between two stories; assert URL update | +| `code/core/src/manager-api/**` | Assert at least one channel-bound UI element responds (e.g., theme toggle, tab switch) | +| `code/core/src/csf-tools/**` | Open a story whose CSF the PR touches; assert it indexes (visible in sidebar tree) | +| `code/core/src/preview-api/**` | Open a story with args/decorators in scope; assert `previewRoot()` rendered without errordisplay | +| `code/frameworks//**` | Use the framework's reference template story (e.g., svelte → svelte-vite default story); confirm SSR/CSR hydration shape if applicable | +| `code/builders/**` | Assert preview-iframe loads at all (builder errors surface here); navigate to a story; confirm HMR not needed for static load | + +Pick the assertion shape that most directly observes the changed code path. Prefer 1-3 focused assertions over a long list — the runner harvests pageerrors/consoleerrors orthogonally. + +--- + +## 8.1 Evidence requirement (HARD GATE for single-round CI) + +In single-round CI mode the assertions + screenshots ARE the evidence the harness reports as "verified". A smoke recipe that asserts unrelated story behaviour is technically passable but **does not verify the diff** — the PR comment will mislead reviewers. Treat this section as a hard authoring gate. + +Before emitting the spec, work through the following four questions explicitly: + +1. **What does this PR visibly or behaviourally change?** Read the diff carefully. Icon swap, text change, conditional render branch, focus / hover / dark-mode state, addon panel content, sidebar tree, URL params, computed style — all qualify. +2. **What UI state is required to see the change?** Common gates: + - **Conditional render** — e.g. `if (newCount === 0 && modifiedCount === 0) return null`. The element only mounts when its predicate is true. Identify the predicate's inputs and either set them via `page.evaluate(...)` against the manager-api / universal store, set localStorage / sessionStorage keys, or navigate to a route that produces the required state. + - **Feature flags** — `globalThis.FEATURES.changeDetection` and similar flags are **enabled by default** in the internal-ui Storybook. The diff itself is the only authoritative source for whether a new flag must be set. + - **Theme / dark-mode** — pass `?globals=theme:dark` in the URL or set the theme via `manager-api` once the manager mounts. + - **Focus / hover / keyboard-only states** — use `.focus()`, `.hover()`, `page.keyboard.press('Tab')`. Many a11y-related PRs only render their change in these states. + - **Specific story route** — when the diff names a specific component, navigate to the story that mounts it, not the generic `example-button--primary`. +3. **Before deciding the trigger state is unreachable, walk through every affordance listed in the next subsection.** For each one, decide whether it applies to this diff. Most "I can't do this without `fs.*`" assumptions turn out to be wrong because Storybook's own in-app machinery exposes a path: Save from Controls writes story files via csf-tools, `page.evaluate` reaches manager-api setters, URL globals flip theme/args, and so on. **Only after explicitly considering each affordance and rejecting it with a one-sentence reason** may you fall back to: render the surrounding container, assert `#sb-errordisplay` is hidden, assert `expect(filterPageErrors(pageErrors)).toEqual([])`. The bare phrase "working-tree mutation required" is **not** a valid fallback justification — Save from Controls satisfies that exact need without ever touching `fs.*`. The fallback is reserved for cases where (a) the diff is non-visual at all (pure type/logic refactor), or (b) the visible effect depends on env state outside the runner's reach. Either way, state the rejected affordances in the spec comment so a reviewer can audit the reasoning. +4. **Screenshot the region containing the changed UI**, not the whole page. Use `locator.screenshot({ path: testInfo.outputPath('.png') })` against the parent of the changed element (e.g. `.sidebar-container` for sidebar diffs, the addon-panel locator for addon panels, the docs `[role="table"]` for ArgsTable changes). Full-page or generic preview screenshots are acceptable only for layout-wide changes. The PR comment renders every screenshot you attach inline — reviewers should see the change in the image. + +### Affordances Playwright recipes have for setting up trigger state + +The deny-regex blocks `fs.*` and `child_process` inside the spec body. That does **not** mean the test cannot reach state that lives on disk — Storybook's own in-app machinery exposes plenty of paths. Before declaring a trigger state unreachable, consider: + +- **URL params for navigation, theme, args, globals, and docs vs story modes.** + - `?path=/story/--` and `?path=/docs/--` for story / docs routes. + - `?globals=theme:dark` (or whatever global the renderer exposes) to flip dark-mode and other globals without clicking the toolbar. + - `?args=name:Hello` to seed initial arg values for a story. +- **`page.evaluate(...)` against the manager-api.** Storybook exposes its manager-api on `window` once the manager mounts — useful when a feature has a public toggle / setter that recipes can call directly. Inspect the diff for an `experimental_*` or `api.*` setter the change relies on and call it from the recipe. +- **Save from Controls (csf-tools write-back) for change-detection-style features.** The Controls addon's save button is enabled by default. The recipe opens a story (e.g. `example-button--primary`), clicks the Controls tab (`getByRole('tab', { name: /controls/i })`), edits a control value (e.g. the `label` input), and clicks **`Save changes to story`** (aria-label) / **`Update story`** (visible text) — i.e. `getByRole('button', { name: /save changes to story|update story/i })`. Storybook's csf-tools writes the modified args back to the underlying `*.stories.tsx` file on the runner's PR-head workspace. The change-detection scanner reads uncommitted working-tree state, so the Save-driven edit flips the story's status to MOD and surfaces change-detection UI (e.g. `ReviewChangesButton`'s clear button) in the sidebar. The recipe never touches `fs.*` directly — Storybook does the write. +- **Keyboard / focus / hover.** `.focus()`, `.hover()`, `page.keyboard.press('Tab')`, `page.keyboard.press('Escape')`. Many a11y / interaction PRs only render their change in these states. +- **localStorage / sessionStorage / cookies.** Read or write via `page.evaluate(...)` when the change depends on persisted UI state (e.g. sidebar collapse, recently-viewed list). +- **Manager-side state via `__STORYBOOK_*` globals.** When the diff touches preview-api or manager-api code that exposes a development hook on `globalThis`, prefer `page.evaluate(() => globalThis.__STORYBOOK_*…)` over reverse-engineering a click sequence. + +Only fall back to the §8.1.3 "trigger state is genuinely unreachable" path after walking through the affordances above and confirming none apply to the diff at hand. If none apply, say so explicitly in a single-line comment in the spec body and limit the assertions to module-resolution + pageerror — the harness's evidence-check will report the gap honestly to reviewers. + +### Worked example — focus ring on a selected sidebar item + +```ts +await page.goto(`${baseURL}/?path=/story/example-button--primary`); +await new RecipePage(page, expect).waitUntilLoaded(); + +const selected = page.locator( + '[data-item-id="example-button--primary"][data-selected="true"]', +); +await selected.focus(); // trigger the focus-ring state + +await expect(selected).toHaveCSS('box-shadow', /inset.+2px/i); + +// Screenshot the sidebar region — the focus ring is visible here: +await page.locator('.sidebar-container').screenshot({ + path: testInfo.outputPath('sidebar-focus-ring.png'), +}); +``` + +### Worked example — icon swap inside a conditionally-rendered, change-detection-gated button + +`ReviewChangesButton` (and its clear button containing the icon under test) only renders when at least one story has status NEW or MOD. We use **Save from Controls** to mutate a story file on the working tree; the change-detection scanner picks the uncommitted edit up and flips the story's status, which causes `ReviewChangesButton` to mount. The recipe never calls `fs.*` directly — Storybook's csf-tools does the write. + +```ts +await page.goto(`${baseURL}/?path=/story/example-button--primary`); +const recipe = new RecipePage(page, expect); +await recipe.waitUntilLoaded(); + +// 1. Open the Controls panel and edit a control value. +const controlsTab = page.getByRole('tab', { name: /controls/i }); +await controlsTab.click(); +const labelInput = page.locator('input[name="label"], textarea[name="label"]').first(); +await labelInput.fill('Verify harness saved this'); + +// 2. Save from Controls — csf-tools writes the edit back to the story file on +// the runner's working tree. The button in this Storybook is +// aria-labelled "Save changes to story" with visible text "Update story"; +// match on either to stay robust across label drift. +const saveButton = page.getByRole('button', { name: /save changes to story|update story/i }); +await expect(saveButton).toBeVisible({ timeout: 10000 }); +await saveButton.click(); + +// 3. Change-detection now sees the story as MOD; the review toggle mounts. +// NOTE: it is rendered as an aria `switch`, NOT a `button`. Match accordingly. +const reviewToggle = page.getByRole('switch', { name: /review.+stories/i }); +await expect(reviewToggle).toBeVisible({ timeout: 15000 }); + +// 4. Activate review mode so the *clear* button (which carries the diff's icon) renders. +await reviewToggle.click(); +const clearButton = page.getByRole('button', { name: /^clear$/i }); +await expect(clearButton).toBeVisible({ timeout: 10000 }); + +// 5. Screenshot the sidebar region — the new UndoIcon is inside the clear button. +await page.locator('.sidebar-container').screenshot({ + path: testInfo.outputPath('sidebar-with-clear-button.png'), +}); +``` + +This pattern (Save from Controls → wait for status flip → screenshot the now-visible UI) is the canonical answer for any diff that touches change-detection-gated UI. The closing `expect(filterPageErrors(pageErrors)).toEqual([])` in the standard footer covers module-resolution as a free bonus. + +--- + +## 9. What to AVOID (skill's deny-regex enforces several of these) + +> The authoritative, machine-enforced deny list is `DENY_PATTERNS` in +> `scripts/verify/recipe-deny.ts` (regex tripwire) plus the ESLint allowlist +> in `.verify-recipes/.eslintrc.cjs` (AST boundary). The table below is +> non-exhaustive author guidance — do not treat it as the spec, and do not +> copy it elsewhere. If the two disagree, the code wins. + +| Pattern | Why | +|---|---| +| `import ... from 'child_process'` / `require('child_process')` | Recipes never spawn subprocesses | +| `fs.unlink`, `fs.rm`, `fs.rmdir`, `fsp.unlink`, etc. | Recipes never delete files | +| `process.exit(...)` | Playwright handles test exit codes; never short-circuit | +| `eval(...)` | Never. Use `page.evaluate(...)` if you need in-browser execution | +| `import 'node:...'` (Node-only modules) | Recipes are Playwright-test files, not orchestration scripts | +| `@storybook/...` direct imports | Adds the non-erasable TS-enum chain that breaks under bun's strip-types path | +| `page.waitForTimeout(N)` | Always avoid time-based waits; use web-first assertions | +| `test.only`, `test.skip`, `describe.only` | Single test only; no skipping | +| Network calls (`fetch`, `axios`, etc.) inside the spec body | Storybook is local; no external endpoints | + +--- + +## 10. Header comment provenance (the skill prepends this) + +After you emit your spec body, the `verify-recipe-author` skill prepends a block comment with `{ generatedAt, agentModel, prNumber, referenceSpecs, triageGlobs }`. Do NOT emit this yourself — the skill owns it. + +--- + +## 11. Worked example (reference shape) + +See `.verify-recipes/example-smoke.spec.ts` for the canonical minimum. Your output should look structurally similar: listeners → goto → `waitUntilLoaded` → assertions → `finally` attach → `expect(filterPageErrors(pageErrors)).toEqual([])`. + +--- + +## 12. Target selection (v6) + +Pick one of two execution targets via a single-line header comment as +the **first non-empty line** of the spec: + +```ts +// @verify-target: internal-ui +// or: +// @verify-target: sandbox:react-vite/default-ts +``` + +| Target | What the harness boots | Pick when | +|---|---|---| +| `internal-ui` (default if header absent) | `code/storybook-static/` served via `http-server`. Built once from the PR-head monorepo. | The diff touches a package that the internal Storybook UI exercises (manager, manager-api, channels, core-server, addons, csf-tools, preview-api). This is the right answer for ~all PRs. | +| `sandbox: