192 changes: 192 additions & 0 deletions .agents/skills/verify-recipe-author/SKILL.md
@@ -0,0 +1,192 @@
---
name: verify-recipe-author
description: Generate the Playwright recipe spec for a PR-verify-pr-generate prompt bundle. Reads `.verify-output/<runId>/prompt-bundle.json`, dispatches the OMC executor agent (model=opus), and pipes the raw agent reply into `verify-pr-author` (stdin mode). The TypeScript core owns extraction, deny-regex, header-comment provenance, the file write to `.verify-recipes/pr-<#>.spec.ts`, scoped lint, the single retry, and `.verify-output/<runId>/result.json`. Trigger after `yarn verify-pr-generate`.
allowed-tools: Agent, Bash, Read, Write, Edit
---

# Verify Recipe Author

Consumes a prompt bundle emitted by `yarn verify-pr-generate --pr <#>` and produces the per-PR Playwright recipe spec for human review. Authoring only — never executes the spec.

This skill is invoked **after** `yarn verify-pr-generate --pr <#>` succeeds. The bun script does the deterministic I/O (gh fetch, triage, prompt assembly, bundle write); this skill **only** dispatches the agent and pipes its raw reply into the `verify-pr-author` CLI. Extraction, deny-regex, provenance, file write, lint, the single retry, and `result.json` all live in TypeScript core — the skill never does them itself.

> **Paths are repo-root-relative.** Every path below is written relative to
> the repository root, denoted `$REPO_ROOT`. Resolve it once at runtime with
> `REPO_ROOT="$(git rev-parse --show-toplevel)"` (works from any clone,
> worktree, or CI checkout) and substitute it wherever `$REPO_ROOT` appears.
> Never hardcode an absolute machine path — it breaks on every other
> clone/worktree/CI runner.

The full design and acceptance criteria live in `$REPO_ROOT/.omc/plans/pr-verify-v3-agent-generated-recipes.md` (§Lane C, §D6, §D8, §D9). Read the plan if anything below is ambiguous.

## Inputs

No args required. The skill discovers the most recent bundle automatically. The caller may optionally pass an explicit bundle path as the skill argument.

1. **Auto-discover (default)**: list `$REPO_ROOT/.verify-output/`, pick the directory with the lexicographically largest name (ISO timestamps sort correctly), then read `prompt-bundle.json` inside it.
2. **Explicit path**: if the user passed an absolute path to a `prompt-bundle.json`, read that file directly.

Bundle shape (see `scripts/verify-pr-generate.ts` for the canonical emitter):

```jsonc
{
"version": 1,
"prNumber": 12345,
"runId": "...",
"outputSpecPath": "/abs/path/.verify-recipes/pr-12345.spec.ts",
"force": false,
"prompt": "<full assembled prompt>",
"metadata": {
"agentModel": "claude-opus-4-7[1m]",
"referenceSpecs": ["..."],
"triageGlobs": ["..."],
"generatedAt": "<ISO>"
}
}
```

The `<runId>` is the parent directory of the bundle — derive it from the bundle path, not from a field.
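The auto-discovery rule and the `runId` derivation can be sketched as follows. This is a minimal illustration, not the code in `scripts/verify-pr-generate.ts` or the CLI: the function names `pickLatestRunDir`, `runIdFromBundlePath`, and `discoverBundlePath` are hypothetical, and the sketch assumes every directory under `.verify-output/` is a run directory.

```typescript
import { readdirSync } from "node:fs";
import { basename, dirname, join } from "node:path";

// Hypothetical helper: pick the lexicographically largest name.
// ISO-timestamp run IDs sort chronologically under this ordering.
function pickLatestRunDir(names: string[]): string | undefined {
  // The default sort compares UTF-16 code units, which is lexicographic order.
  const sorted = [...names].sort();
  return sorted.length > 0 ? sorted[sorted.length - 1] : undefined;
}

// Hypothetical helper: the runId is the basename of the bundle's parent
// directory, never a value read from inside the bundle.
function runIdFromBundlePath(bundlePath: string): string {
  return basename(dirname(bundlePath));
}

// Hypothetical helper: resolve the default bundle path under the repo root.
function discoverBundlePath(repoRoot: string): string | undefined {
  const outputDir = join(repoRoot, ".verify-output");
  const runDirs = readdirSync(outputDir, { withFileTypes: true })
    .filter((entry) => entry.isDirectory())
    .map((entry) => entry.name);
  const latest = pickLatestRunDir(runDirs);
  return latest === undefined
    ? undefined
    : join(outputDir, latest, "prompt-bundle.json");
}
```

Keeping the selection logic separate from the filesystem call makes the ordering rule easy to check in isolation.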

## Runbook

Follow these steps in order and stop on any non-success outcome. The CLI writes `result.json` per §Failure Modes; the skill writes it only for the Step 2 pre-flight collision.

### Step 1 — Read the bundle

`Read` the bundle JSON. Capture `prNumber`, `runId` (from the parent dir), `outputSpecPath`, `force`, `prompt`, and `metadata`.

### Step 2 — Pre-flight collision check (D9, TOCTOU re-guard)

Re-check whether `bundle.outputSpecPath` already exists. The bun script enforced D9 at bundle-emit time; the skill re-checks because the user may have created the file between the two steps.

- If the file exists and `bundle.force === false` → write `result.json` with `{ status: "collision", specPath: <path>, attempts: 0 }` and stop. (This mirrors the CLI's own `collision` status / exit 1; the pre-flight only exists to skip a wasted agent dispatch — the CLI re-enforces D9 regardless.)
- Otherwise proceed.
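A minimal sketch of the pre-flight decision, assuming the bundle fields shown in §Inputs. The `preflightCollision` name and the injectable `exists` parameter are hypothetical and exist only to keep the example testable. The CLI remains the real enforcer of D9.

```typescript
import { existsSync } from "node:fs";

// Result of the pre-flight: either proceed to dispatch, or stop with the
// collision record that would be written to result.json.
type Preflight =
  | { proceed: true }
  | {
      proceed: false;
      result: { status: "collision"; specPath: string; attempts: 0 };
    };

// Hypothetical helper. `exists` defaults to a real filesystem check and can
// be swapped for a stub in tests.
function preflightCollision(
  bundle: { outputSpecPath: string; force: boolean },
  exists: (path: string) => boolean = existsSync,
): Preflight {
  if (exists(bundle.outputSpecPath) && !bundle.force) {
    return {
      proceed: false,
      result: {
        status: "collision",
        specPath: bundle.outputSpecPath,
        attempts: 0,
      },
    };
  }
  // File absent, or force === true: continue to the agent dispatch.
  return { proceed: true };
}
```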

> **One owner.** After dispatch, the TypeScript core
> (`scripts/verify-pr-author.ts` → `scripts/verify/recipe-author-core.ts`)
> owns spec-body extraction, deny-regex, header-comment provenance, the
> file write, scoped lint, post-write regex checks, the single retry, and
> `result.json`. The skill does **not** extract fences, run deny-regex, or
> write the spec itself. Steps 3–5 below are the whole dispatch loop; Step 6
> only prints the outcome.

### Step 3 — Dispatch the agent (attempt 1)

```
Agent({
description: "Generate PR recipe spec",
subagent_type: "oh-my-claudecode:executor",
model: "opus",
prompt: bundle.prompt
})
```

The bundle's `prompt` already contains the full authoring contract,
reference specs, PR diff, and fence-marker instruction
(`<<<SPEC_START>>>` … `<<<SPEC_END>>>`). Capture the agent's full raw
reply as `$REPLY` (do not parse or edit it).

### Step 4 — Pipe the raw reply to `verify-pr-author` (stdin mode)

```bash
printf '%s' "$REPLY" | node "$REPO_ROOT/scripts/verify-pr-author.ts" --bundle <abs-bundle-path> --dispatch-mode stdin
```

The CLI performs extraction, deny-regex, provenance, file write, scoped
lint (`scripts/verify/lint-invocation.ts`), post-write regex checks, and
writes `result.json`. Exit codes:

- `0` — success. CLI wrote the spec and `result.json`. Go to Step 6.
- `75` — retryable failure (lint, post-write regex, **or a first
deny-regex hit** — the CLI asks the agent to self-correct). The CLI
emitted a framed retry block on stdout. Go to Step 5.
- `1` — terminal failure (collision, extract-failed, or any gate
exhausted on the final attempt). CLI already wrote `result.json` with
the failure status. Print the failure line (Step 6) and stop.

Exit 75 is the sole retry sentinel; any other non-zero exit is terminal.
The skill never decides retryability — the CLI does.
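The exit-code contract above can be written as a small dispatch function. This is a sketch under assumptions: `nextStepForExit` and the `NextStep` labels are hypothetical names, and treating an unexpected exit 75 on the retry call as terminal is a choice made here, not documented CLI behavior.

```typescript
// What the skill should do after a verify-pr-author invocation.
type NextStep = "print-success" | "retry-once" | "print-failure";

// Hypothetical helper mapping a CLI exit code to the skill's next step.
function nextStepForExit(exitCode: number, isRetryCall: boolean): NextStep {
  if (exitCode === 0) return "print-success";
  // Exit 75 is the sole retry sentinel, and it is only honored on attempt 1.
  if (exitCode === 75 && !isRetryCall) return "retry-once";
  // Every other non-zero exit is terminal; the CLI already wrote result.json.
  return "print-failure";
}
```

The function only reads the exit code, so the retry decision stays with the CLI.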

### Step 5 — Retry once (on exit 75)

Parse stdout for the framed retry block:

```
===VERIFY_PR_AUTHOR_RETRY_BEGIN===
<retryMessage payload — already categorized and capped at 5 errors>
===VERIFY_PR_AUTHOR_RETRY_END===
```

Assemble the retry prompt and re-dispatch the agent (same
`subagent_type` and `model`):

```
<bundle.prompt>

[RETRY]
<retryMessage>
```
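Extracting the framed block and assembling the retry prompt might look like the sketch below. The marker strings and the `[RETRY]` layout come from this runbook; the function names `extractRetryMessage` and `buildRetryPrompt` are hypothetical, and trimming surrounding whitespace from the payload is an assumption.

```typescript
const RETRY_BEGIN = "===VERIFY_PR_AUTHOR_RETRY_BEGIN===";
const RETRY_END = "===VERIFY_PR_AUTHOR_RETRY_END===";

// Hypothetical helper: return the payload between the frame markers, or
// undefined when the CLI stdout contains no complete frame.
function extractRetryMessage(stdout: string): string | undefined {
  const begin = stdout.indexOf(RETRY_BEGIN);
  if (begin === -1) return undefined;
  const bodyStart = begin + RETRY_BEGIN.length;
  const end = stdout.indexOf(RETRY_END, bodyStart);
  if (end === -1) return undefined;
  // Assumption: leading and trailing whitespace is not significant.
  return stdout.slice(bodyStart, end).trim();
}

// Hypothetical helper: original prompt, blank line, [RETRY], then the payload.
function buildRetryPrompt(bundlePrompt: string, retryMessage: string): string {
  return `${bundlePrompt}\n\n[RETRY]\n${retryMessage}`;
}
```

The payload is passed through unchanged because the CLI has already categorized and capped it.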

Pipe the new raw reply back through the CLI in retry mode:

```bash
printf '%s' "$REPLY2" | node "$REPO_ROOT/scripts/verify-pr-author.ts" --bundle <abs-bundle-path> --dispatch-mode stdin --retry-of <runId>
```

The CLI enforces `MAX_RECIPE_ATTEMPTS` (read from
`scripts/verify/recipe-author-core.ts`; currently 2) and will **not**
re-emit exit 75 on the retry call. Expected exits:

- `0` — success. Go to Step 6.
- `1` — terminal failure (any gate exhausted on attempt 2). CLI wrote
`result.json` with `attempts: 2` and the terminal status. Print the
failure line and stop.

### Step 6 — Print actionable next-step lines

`result.json` is already written by the CLI — do **not** write it from
the skill. On success print:

```
[verify-recipe-author] spec written: <abs spec path>
[verify-recipe-author] result.json: <abs result.json path>
[verify-recipe-author] attempts: <n>
[verify-recipe-author] Next: review the spec, then run `yarn verify-pr --recipe-spec <spec path>`
```

On a terminal exit-1, print instead:

```
[verify-recipe-author] FAILED: <status> — see <abs result.json path>
```

## Failure Modes

`result.json` is written by the CLI, not the skill. `status` is the exact
`RecipeAuthorStatus` union from `scripts/verify/recipe-author-core.ts` —
do not invent values. On attempt 1 in stdin mode, lint / post-write-regex
/ **first deny-regex hit** all return `retry-requested` (CLI exit 75) so
the agent can self-correct; the terminal status below is what lands when
attempts are exhausted (CLI exit 1).

| Cause | terminal `status` | Exit | Retried once first? |
|---|---|---|---|
| `outputSpecPath` exists and `force === false` | `collision` | 1 | no |
| No parseable body between fence markers | `extract-failed` | 1 | no (terminal immediately) |
| Deny-regex hit | `deny-regex-hit` | 1 | **yes** (attempt-1 → `retry-requested`/exit 75) |
| Scoped lint failed | `lint-failed` | 1 | yes (attempt-1 → `retry-requested`/exit 75) |
| Post-write regex check failed (listener-before-goto OR attach) | `regex-failed` | 1 | yes (attempt-1 → `retry-requested`/exit 75) |
| All gates pass | `spec-written` | 0 | n/a |
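For readers who prefer code to tables, the rows above can be transcribed into a typed lookup. This is an illustrative stand-in only: the canonical union is `RecipeAuthorStatus` in `scripts/verify/recipe-author-core.ts`, and the names `TerminalStatus`, `FAILURE_MODES`, and `statusesRetriedOnce` are hypothetical.

```typescript
// Statuses copied from the table above. This local type is NOT an import of
// the real RecipeAuthorStatus union and may drift from it.
type TerminalStatus =
  | "collision"
  | "extract-failed"
  | "deny-regex-hit"
  | "lint-failed"
  | "regex-failed"
  | "spec-written";

interface FailureMode {
  exitCode: 0 | 1;
  // True when attempt 1 returns retry-requested (exit 75) before this
  // terminal status can land. The table lists spec-written as n/a; false here.
  retriedOnceFirst: boolean;
}

const FAILURE_MODES: Record<TerminalStatus, FailureMode> = {
  "collision": { exitCode: 1, retriedOnceFirst: false },
  "extract-failed": { exitCode: 1, retriedOnceFirst: false },
  "deny-regex-hit": { exitCode: 1, retriedOnceFirst: true },
  "lint-failed": { exitCode: 1, retriedOnceFirst: true },
  "regex-failed": { exitCode: 1, retriedOnceFirst: true },
  "spec-written": { exitCode: 0, retriedOnceFirst: false },
};

// Hypothetical helper: list the statuses that get one self-correction attempt.
function statusesRetriedOnce(): TerminalStatus[] {
  return (Object.keys(FAILURE_MODES) as TerminalStatus[]).filter(
    (status) => FAILURE_MODES[status].retriedOnceFirst,
  );
}
```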

## Notes

- This skill runs inside Claude Code; it uses `Agent`, `Read`, `Write`, `Bash`, and `Edit` tools.
- Paths in invocations are repo-root-relative (`$REPO_ROOT`, resolved via `git rev-parse --show-toplevel`; see the note near the top). Resolve `$REPO_ROOT` to an absolute path before invoking. Lint commands run in the `code` directory via `yarn --cwd`.
- Max attempts = `MAX_RECIPE_ATTEMPTS` (currently 2). Read the value from `scripts/verify/recipe-author-core.ts` — do not hardcode.
- The skill **never executes** the generated spec. The human review gate (Phase-1 lethal-trifecta breaker) is preserved.
- A first deny-regex hit is retried **once** in stdin mode (the CLI emits `retry-requested` / exit 75 so the agent can self-correct, e.g. eval #36); only an exhausted deny hit is the terminal `deny-regex-hit`. The deny-regex remains a security gate — the single self-correction attempt does not weaken it (every attempt is re-checked; a persistent hit still terminates).
- Cap retry feedback at 5 errors (R3).
- The `runId` is the basename of the parent directory of the bundle; do not invent a new one.

## Phase-2 follow-up

This skill currently couples generation to a running Claude Code session via the `Agent` tool dispatch. Phase-2 CI activation will require migrating to a direct Anthropic SDK call (`@anthropic-ai/sdk`) with an `ANTHROPIC_API_KEY` env var, replacing the `Agent` dispatch with a standalone API call so the workflow at `.github/workflows/verify-pr.yml` can run unattended. Tracked as a follow-up in the plan's ADR §Follow-ups.
9 changes: 8 additions & 1 deletion .circleci/config.yml
@@ -17,6 +17,13 @@ parameters:
default: ''
description: The PR number
type: string
ghIsFork:
default: 'false'
description: >
'true' when the triggering PR head is a fork (untrusted). SECURITY:
gates save_cache so a fork pipeline cannot poison the project-global
cache that trusted merged/daily pipelines restore.
type: string
workflow:
default: skipped
description: Which workflow to run
@@ -44,7 +51,7 @@ jobs:
- run:
name: Generate config
command: |
yarn dlx jiti ./scripts/ci/main.ts --workflow=<< pipeline.parameters.workflow >>
yarn dlx jiti ./scripts/ci/main.ts --workflow=<< pipeline.parameters.workflow >> --is-fork=<< pipeline.parameters.ghIsFork >>
- continuation/continue:
configuration_path: .circleci/config.generated.yml
workflows:
1 change: 1 addition & 0 deletions .claude/skills/verify-recipe-author/SKILL.md
@@ -0,0 +1 @@
@../../../.agents/skills/verify-recipe-author/SKILL.md
19 changes: 19 additions & 0 deletions .dockerignore
@@ -0,0 +1,19 @@
.env
.env.*
**/.env
**/.env.*
~/.ssh/
~/.aws/
~/.config/gcloud/
~/.azure/
~/.docker/config.json
~/.kube/config
.npmrc
.pypirc
**/*-service-account.json
**/*.pem
**/*.key
~/.git-credentials
.verify-output/
node_modules/
.nx/
142 changes: 142 additions & 0 deletions .github/actions/agentic-pr-prepare/README.md
@@ -0,0 +1,142 @@
# agentic-pr-prepare

Universal infrastructure setup for agentic workflows running under
`pull_request_target`: actor-permission gate, base + PR-head manual clones,
toolchain install, sandbox-runtime (srt) install + sha-pin verification,
srt-settings JSON, egress smoke-test, and trusted-harness sync.

This is **half 1 of 2** of the split `verify-pr.yml` infrastructure. The
companion is `agentic-pr-publish`.

## Caller contract

The composite **cannot** declare these — the caller workflow MUST:

1. Trigger on `pull_request_target` (composite `uses: ./.github/actions/...`
resolves against the **base ref** under PRT, which is load-bearing for
trust — never lift this to a trigger that resolves against PR-head).
2. Declare a `permissions:` block. Verify-PR needs at least:
```yaml
permissions:
pull-requests: write
issues: write
statuses: write
contents: write # side-branch screenshot push (drop if not needed)
```
3. Declare a `concurrency:` block. Single-PR:
```yaml
concurrency:
group: verify-${{ github.event.pull_request.number }}
cancel-in-progress: true
```
With `strategy.matrix`, include the matrix dim in the key:
`verify-${{ pr-num }}-${{ matrix.target }}` (matrix-concurrency footgun).
4. Pass `srt-sha256` **inline** with every call. The composite has **no
default** by design: a version-bump PR must then edit the caller workflow and
meet the heightened workflow-review bar, rather than flipping a composite
default on a single approval.

## Inputs

| Name | Required | Default | Purpose |
|------------------------|----------|----------------------------------|----------------------------------------------------------------------------------------|
| `github-token` | yes | — | Base + PR-head manual clones. |
| `base-ref` | yes | — | `github.event.pull_request.base.ref`. |
| `base-sha` | yes | — | `github.event.pull_request.base.sha`. |
| `pr-head-sha` | yes | — | `github.event.pull_request.head.sha`. |
| `repo` | yes | — | `github.repository`. |
| `srt-version` | no | `0.0.51` | Pinned `@anthropic-ai/sandbox-runtime` version. |
| `srt-sha256` | **yes** | — (no default by design) | sha256 of the resolved `srt` shim at `srt-version`. Bump via `_srt-sha-probe.yml`. |
| `srt-allowed-domains` | no | localhost + registries + CDNs | Newline list. Caller may extend. |
| `srt-allow-write-paths`| no | `$PR_HEAD_DIR`, `$SANDBOX_TMPDIR`, `/tmp`, `$HOME/.cache`, … | Newline list; env vars expanded at composite runtime. |
| `srt-deny-read-paths` | no | `$HOME/.ssh`, `$HOME/.aws`, … | Newline list. |
| `srt-deny-write-paths` | no | `$GITHUB_WORKSPACE`, `$GITHUB_WORKSPACE/.git` | Newline list. |
| `sync-files` | no | (empty) | Newline-delimited `src:dst` pairs (paths relative). H2 path-validated. |
| `sync-trees` | no | (empty) | Newline-delimited tree paths (relative). H2 path-validated. |
| `provenance-secret` | no | (empty → per-run random) | Optional caller-supplied. M2: written to file, not `$GITHUB_ENV`. |
| `install-code-deps` | no | `true` | Pass-through to `setup-node-and-install`. |

### Path-input safety (H2)

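`sync-files` and `sync-trees` entries are rejected if they contain `..`, a
leading `/`, or an extra `:`. Each path is then resolved with realpath and
asserted to sit under `$PR_HEAD_DIR`. The composite also refuses a symlink at
the destination before running `cp --no-dereference` / `cp -aT`.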
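The syntactic half of these checks can be sketched in TypeScript. The composite itself presumably does this in shell, so this is an illustration of the rules only. `parseSyncPair` and `assertSafeRelative` are hypothetical names, `..` is assumed to mean a `..` path segment, and the realpath and symlink checks are deliberately left out because they need a real filesystem.

```typescript
// Hypothetical helper: reject empty paths, absolute paths, and any ".."
// segment. Assumption: "rejects .." means a ".." path segment.
function assertSafeRelative(path: string): void {
  if (path.length === 0) throw new Error("empty path");
  if (path.startsWith("/")) throw new Error(`absolute path not allowed: ${path}`);
  if (path.split("/").some((segment) => segment === "..")) {
    throw new Error(`parent traversal not allowed: ${path}`);
  }
}

// Hypothetical helper: parse one `src:dst` line from `sync-files`.
function parseSyncPair(line: string): { src: string; dst: string } {
  const parts = line.split(":");
  // Exactly one ":" separator; any extra ":" is rejected.
  if (parts.length !== 2) throw new Error(`expected exactly one ':' in ${line}`);
  const [src = "", dst = ""] = parts;
  assertSafeRelative(src);
  assertSafeRelative(dst);
  return { src, dst };
}
```

Passing these checks is necessary but not sufficient: a path can be syntactically clean and still escape `$PR_HEAD_DIR` through a symlink, which is why the realpath assertion follows.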

### srt-settings JSON emission (H3)

allowWrite / denyRead / denyWrite / allowedDomains arrays are emitted via
`jq -R . | jq -s .` so PR-controllable strings cannot inject JSON keys.
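The same idea, expressed in TypeScript for illustration: each input line becomes one JSON string through a serializer, never through string interpolation, so quotes and brackets in the input are escaped rather than interpreted. `toJsonArray` is a hypothetical name, and dropping blank lines is a simplification that `jq -R` does not make.

```typescript
// Hypothetical helper: turn a newline-delimited list into a JSON array
// literal. Every line is serialized as a single string element.
function toJsonArray(newlineList: string): string {
  const items = newlineList
    .split("\n")
    // Simplification: skip blank lines (jq -R would keep them as "").
    .filter((line) => line.length > 0);
  return JSON.stringify(items);
}
```

A hostile entry such as `evil.example"], "denyRead": ["` survives as one array element with its quotes escaped, instead of closing the array and opening a new key.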

## Outputs

| Name | Purpose |
|----------------------------|--------------------------------------------------------------------------------------------------|
| `pr-head-dir` | Absolute path to untrusted PR-head workspace clone. |
| `srt-settings-path` | Absolute path to `srt-settings.json`. |
| `diff-path` | Absolute path to captured `pr.diff`. |
| `provenance-secret-path` | M2: path to file (mode 0600) holding the per-run provenance secret. NOT in `$GITHUB_ENV`. |

## Side-effects

Writes to `$GITHUB_ENV` (so subsequent caller steps in the same job see them):

- `PR_HEAD_DIR` — absolute path to PR-head workspace
- `SRT_SETTINGS` — absolute path to srt-settings.json
- `CLAUDE_CODE_TMPDIR` — absolute path to sandbox scratch tmpdir

Does **NOT** write `VERIFY_PROVENANCE_SECRET` to `$GITHUB_ENV`. Trusted task
steps load it explicitly from the file named by the `provenance-secret-path`
output, for example `cat "$PROVENANCE_SECRET_PATH"` as in the worked example.

## Worked example

```yaml
- name: Prepare agentic environment
id: prep
uses: ./.github/actions/agentic-pr-prepare
with:
github-token: ${{ secrets.GITHUB_TOKEN }}
base-ref: ${{ github.event.pull_request.base.ref }}
base-sha: ${{ github.event.pull_request.base.sha }}
pr-head-sha: ${{ github.event.pull_request.head.sha }}
repo: ${{ github.repository }}
srt-version: '0.0.51'
srt-sha256: '36de38197ac22991c8c9edead4d6184914c8b786e040ecf27bdcf26abd166338'
sync-files: |
.verify-recipes/_util.ts:.verify-recipes/_util.ts
sync-trees: |
scripts/verify
provenance-secret: ${{ secrets.VERIFY_PROVENANCE_SECRET }}

- name: Your task
env:
PROVENANCE_SECRET_PATH: ${{ steps.prep.outputs.provenance-secret-path }}
run: |
VERIFY_PROVENANCE_SECRET="$(cat "$PROVENANCE_SECRET_PATH")" \
yarn your-thing
```

## Pre-existing architectural debt (C1 — NOT fixed by this composite)

`verify-result.json` (the file the verdict is read from) lives at
`$PR_HEAD_DIR/.verify-out-trusted/verify-result.json` — inside srt's
`allowWrite` set. A malicious PR-added unit test running inside srt can
forge it. The split documented here does NOT make C1 worse; it stays at
its current path so the legitimate writer (`verify-pr.ts`, which itself
runs INSIDE srt) keeps working.

**The architectural fix requires** one of:

1. **Process-split** — orchestrator OUTSIDE srt, only Playwright + dev-server
spawns wrapped. **Attempted 2026-05-14, failed**: srt uses bubblewrap with
a fresh network namespace per invocation, so localhost IPC between
orchestrator (outside) and dispatcher (inside) breaks. Reviving requires
shared host netns (loses egress policy on dispatcher), host-network bridge
/ Unix socket, or moving dispatcher outside srt (loosens trust on
PR-modified framework code).
2. **HMAC-bound verdict** — `verify-pr.ts` HMAC-signs the JSON with the
provenance secret; trusted bash verifies. Requires scrubbing the secret
from orchestrator env before spawning Playwright + auditing
`/proc/<pid>/environ` reachability inside srt.
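Option 2 could be sketched as below. This is an assumption-laden illustration, not the planned implementation: `signVerdict` and `verifyVerdict` are hypothetical names, HMAC-SHA256 with a hex signature is one reasonable choice among several, and the sketch does nothing about the harder part named above, which is keeping the secret unreachable from code running inside srt.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical helper: sign the exact verdict bytes with the provenance secret.
function signVerdict(verdictJson: string, secret: string): string {
  return createHmac("sha256", secret).update(verdictJson).digest("hex");
}

// Hypothetical helper: recompute the HMAC on the trusted side and compare in
// constant time.
function verifyVerdict(
  verdictJson: string,
  signatureHex: string,
  secret: string,
): boolean {
  const expected = Buffer.from(signVerdict(verdictJson, secret), "hex");
  const actual = Buffer.from(signatureHex, "hex");
  // timingSafeEqual throws on a length mismatch, so check lengths first.
  return expected.length === actual.length && timingSafeEqual(expected, actual);
}
```

The signature covers the serialized string, so the verifier must check the bytes it read from disk rather than a re-serialized object.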

Until that lands, the verdict is trustworthy ONLY when paired with the
side-channel signals (PR comment, telemetry, GitHub run conclusion) that an
attacker would also have to forge. Tracked as a separate follow-up.