feat(scripts): agentic PR verification harness (v6 single-round — author + execute in one run)#34762
feat(scripts): agentic PR verification harness (v6 single-round — author + execute in one run)#34762valentinpalkovic wants to merge 13 commits into
Conversation
|
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds a PR verification harness: prompt-bundle generation, agent-driven Playwright spec authoring with deny-regex and ESLint gates (single retry), provenance-tagged committed specs, Storybook sandbox boot/sync, Playwright execution with trace capture, fixtures/docs, and a gated CI workflow that uploads artifacts and posts verdict comments. ChangesPR Verification Harness
Sequence Diagram(s)sequenceDiagram
participant Dev as Dev / CI
participant Generator as verify-pr-generate
participant Bundle as PromptBundle (.verify-output)
participant Agent as Anthropic / Agent
participant AuthorCLI as verify-pr-author (stdin/sdk)
participant Linter as ESLint (lint-invocation)
participant Writer as Spec Writer (.verify-recipes)
participant Runner as verify-pr (Playwright Runner)
Dev->>Generator: yarn verify-pr-generate (create prompt-bundle.json)
Generator->>Bundle: write prompt-bundle.json
Bundle->>Agent: dispatch prompt (agent-dispatch)
Agent->>AuthorCLI: assistant reply (stdin/sdk)
AuthorCLI->>Linter: lint candidate spec
Linter-->>AuthorCLI: lint result / violations
AuthorCLI->>Writer: write provenance-tagged `.verify-recipes/pr-<N>.spec.ts` or emit retry (exit 75)
Dev->>Runner: yarn verify-pr --recipe-spec (run committed spec)
Runner-->>Dev: verify-result.json + artifacts (trace, logs)
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Comment |
There was a problem hiding this comment.
Actionable comments posted: 12
🧹 Nitpick comments (2)
scripts/verify/boot.ts (2)
30-58: 💤 Low valueSame command injection concern applies to lsof/netstat calls.
The
lsofandnetstatfallback commands on macOS/Linux have the same pattern. Consider refactoring these to usespawnSyncwith argument arrays as well.♻️ Suggested refactor for Unix port checks
- const out = execSync(`lsof -ti :${port}`, { encoding: 'utf-8' }).trim(); + const lsof = spawnSync('lsof', ['-ti', `:${port}`], { encoding: 'utf-8' }); + const out = (lsof.stdout ?? '').trim();And for the netstat fallback:
- const out = execSync(`netstat -an | grep :${port}`, { encoding: 'utf-8' }).trim(); + const netstat = spawnSync('netstat', ['-an'], { encoding: 'utf-8' }); + const out = (netstat.stdout ?? '') + .split('\n') + .filter((line) => line.includes(`:${port}`)) + .join('\n') + .trim();🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@scripts/verify/boot.ts` around lines 30 - 58, The lsof/netstat invocations in scripts/verify/boot.ts use execSync with string interpolation (the execSync(`lsof -ti :${port}`) and execSync(`netstat -an | grep :${port}`) calls) which allows shell injection; change both to use child_process.spawnSync with argument arrays (e.g., spawnSync('lsof', ['-ti', String(port)]) and for the fallback use spawnSync('netstat', ['-an']) and pipe/filter the stdout in Node rather than using a shell grep), handle ENOENT the same way, and preserve the existing stdout/trimming/error handling logic used in the try/catch around those calls.
11-11: 💤 Low valuePrefer
spawnSyncwith args array to avoid command injection risk.While
portis typed asnumber(limiting actual injection risk), usingexecSyncwith string interpolation is flagged by static analysis and doesn't follow secure coding best practices. UsingspawnSyncwith an argument array is safer and eliminates the warning.♻️ Suggested refactor for Windows port check
- const out = execSync(`netstat -ano | findstr :${port}`, { encoding: 'utf-8' }).trim(); + const netstat = spawnSync('netstat', ['-ano'], { encoding: 'utf-8' }); + const out = (netstat.stdout ?? '') + .split('\n') + .filter((line) => line.includes(`:${port}`)) + .join('\n') + .trim();Import
spawnSyncalongsideexecSync:-import { execSync, spawn } from 'node:child_process'; +import { execSync, spawn, spawnSync } from 'node:child_process';🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@scripts/verify/boot.ts` at line 11, Replace the execSync call with spawnSync: import spawnSync from 'child_process' (or include it when importing execSync), call spawnSync('netstat', ['-ano'], { encoding: 'utf-8' }) to get stdout, then search/filter the stdout string for the port (e.g., split by newlines and find a line that includes `:${port}`) instead of using a shell pipeline to findstr; update the const out assignment to use the spawnSync stdout and handle empty/no-match cases similarly.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In @.agents/skills/verify-recipe-author/SKILL.md:
- Around line 13-20: The SKILL.md currently hardcodes the absolute machine-local
path /Users/valentinpalkovic/Projects/storybook/... for bundle discovery and
other references; change the contract to use repository-relative or configurable
paths instead: describe auto-discovery as looking under the repo/workspace root
(e.g., ./ .verify-output/) or respect an environment/argument (e.g., BUNDLE_ROOT
or a CLI arg) and update all occurrences mentioned (the auto-discover section
and the other occurrences at lines ~80-86, 137-138, 175-177, 187-202) to
reference the new repo-relative or configurable path rather than the absolute
path so the skill works from any checkout.
- Around line 51-89: The docs in SKILL.md are out-of-date and describe legacy
result shapes and retry flows that conflict with the implemented logic in
verify-pr-author.ts and recipe-author-core.ts; update the step descriptions to
match the shipped contract: ensure the pre-flight check references
bundle.outputSpecPath / bundle.force but writes the current result.json shape
used by recipe-author-core.ts (use the exact status strings and attempts framing
implemented there), change the retry/fence-miss semantics to match
verify-pr-author.ts (how many attempts, what messages trigger retries vs final
failure), and update the deny-regex invocation notes to reference the actual
script function assertNoDeniedPatterns in recipe-deny.ts and the real temp-file
path convention (.verify-output/<runId>/.deny-input.txt) and failure handling
(no retry on throw). Ensure all mentions of statuses, retry counts, and
ownership of header/lint/write behavior align with the real functions in
verify-pr-author.ts and recipe-author-core.ts rather than the legacy text.
In @.github/workflows/verify-pr.yml:
- Around line 23-26: The workflow is checking out the base commit (Checkout
base) using ref: github.event.pull_request.base.sha so the job builds the wrong
code; update the checkout step to instead fetch and check out the PR head (use
ref: github.event.pull_request.head.sha) or perform a two-step checkout/merge
(keep Checkout base, then run a step that fetches
github.event.pull_request.head.ref and merges or checks out that commit) so the
subsequent yarn verify-pr runs against the PR changes rather than the base
commit.
- Around line 19-24: Summary: The workflow uses placeholder all-zero SHAs for
two actions which will fail; replace them with real pinned commit SHAs or stable
tags. Locate the uses:
"prince-chrismc/check-actor-permissions-action@0000000000000000000000000000000000000000"
and "actions/checkout@0000000000000000000000000000000000000000" (and the
repeated occurrence at lines referenced 54-61) and swap the placeholder refs for
the actual commit SHA (preferred) or an official released tag, then re-run the
workflow to confirm the actions resolve correctly.
- Around line 31-51: The workflow assumes Bun and a local Docker image exist;
add steps to provision them before the "Generate bundle" and "Run harness in
container" steps: install Bun (e.g., use actions/setup-bun or an install script)
so the yarn verify-pr-generate command runs on a fresh runner, and add a step to
produce or pull the verify-harness:pinned-sha image (e.g., a "Build harness
image" step that runs docker build -t verify-harness:pinned-sha ... or a docker
pull) so the docker run that executes yarn verify-pr --recipe-spec can find the
image; reference the "Generate bundle" (yarn verify-pr-generate), "Author
recipe" (yarn verify-pr-author) and the docker run using
verify-harness:pinned-sha to place these new steps immediately before those
existing steps.
- Around line 15-17: The workflow's permissions block is missing the repository
contents read scope required by actions/checkout; add "contents: read" to the
existing permissions map (alongside pull-requests: write and statuses: write) so
actions/checkout can access the repo before the harness steps run and avoid the
job failing early.
In `@scripts/verify-pr-author.ts`:
- Around line 186-191: The catch block after runRecipeAuthor currently only logs
to stderr and returns; instead build a structured failure result and persist it
to result.json before returning. In the catch for runRecipeAuthor (and any
thrown by dispatchRecipeAuthor/stdin), populate the existing result variable (or
create one) with failure fields (e.g. success: false, error: msg, errorStack:
err.stack, attempt info), then write that object to result.json (using the same
JSON structure the rest of the flow expects) and flush to disk before calling
console.error and returning 1 so every run always leaves a well-formed
result.json.
In `@scripts/verify/README.md`:
- Around line 228-309: The README describes pre-v4 behavior that no longer
matches the shipped implementation; update the Limitations/Roadmap/Increment 2
text to reflect label-gated CI, direct SDK dispatch for the authoring skill, and
the current stdout retry contract `===VERIFY_PR_AUTHOR_RETRY_BEGIN===` (and
remove obsolete notes about `if: false`, Claude Code-only dispatch, stderr/JSON
retry framing, and older retry ownership). Edit the Increment 2 four-step flow
and Spec-name collision/Header-comment/Triage routing/Diff truncation paragraphs
(and the mirrored section at lines ~322-363) so commands like `yarn
verify-pr-generate`, the skill path
`.claude/skills/verify-recipe-author/SKILL.md`, the emitted
`.verify-output/<runId>/result.json`, and CI behavior reference the actual
label-gated workflow and direct Anthropic SDK dispatch used in this PR; keep the
same user-facing commands but change the descriptive prose and troubleshooting
pointers to the new stdout contract and CI gating.
In `@scripts/verify/recipe-author-core.ts`:
- Around line 114-120: The extractSpecBody function currently returns an empty
string for fences that contain only whitespace; change it to treat
whitespace-only payloads as extraction failures by checking the extracted body
with body.trim() and returning null when it's empty. Locate extractSpecBody (and
the variables SPEC_FENCE_START / SPEC_FENCE_END) and after computing body from
reply.slice(...), add a guard that returns null if body.trim() === '' and
otherwise perform the existing whitespace-adjustments and return the body.
- Around line 252-268: The current existsSync + fs.promises.rename still allows
a race where rename can overwrite bundle.outputSpecPath; change the move logic
to an atomic create-that-fails approach: attempt to create the destination using
fs.promises.copyFile(candidatePath, bundle.outputSpecPath,
fs.constants.COPYFILE_EXCL) (or fs.promises.link(candidatePath,
bundle.outputSpecPath) which also fails if the target exists), and only on
success unlink the candidatePath; on failure treat it as a collision (return the
same 'collision' RecipeAuthorResult). Update the code paths around candidatePath
and bundle.outputSpecPath to use COPYFILE_EXCL or link and handle the thrown
EEXIST to preserve the "do not clobber unless --force" guarantee.
In `@scripts/verify/symlink.ts`:
- Around line 27-42: The dangling-symlink heal code incorrectly calls
access(linkTarget) directly because readlink can return a relative path resolved
against dirname(target); change it to resolve relative targets first by
computing const resolved = path.isAbsolute(linkTarget) ? linkTarget :
path.resolve(path.dirname(target), linkTarget) and then call await
access(resolved) (keep existing lstat, readlink, access, unlink flow and ensure
path is imported/available).
---
Nitpick comments:
In `@scripts/verify/boot.ts`:
- Around line 30-58: The lsof/netstat invocations in scripts/verify/boot.ts use
execSync with string interpolation (the execSync(`lsof -ti :${port}`) and
execSync(`netstat -an | grep :${port}`) calls) which allows shell injection;
change both to use child_process.spawnSync with argument arrays (e.g.,
spawnSync('lsof', ['-ti', String(port)]) and for the fallback use
spawnSync('netstat', ['-an']) and pipe/filter the stdout in Node rather than
using a shell grep), handle ENOENT the same way, and preserve the existing
stdout/trimming/error handling logic used in the try/catch around those calls.
- Line 11: Replace the execSync call with spawnSync: import spawnSync from
'child_process' (or include it when importing execSync), call
spawnSync('netstat', ['-ano'], { encoding: 'utf-8' }) to get stdout, then
search/filter the stdout string for the port (e.g., split by newlines and find a
line that includes `:${port}`) instead of using a shell pipeline to findstr;
update the const out assignment to use the spawnSync stdout and handle
empty/no-match cases similarly.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: eb8d11de-840a-42a1-863d-ef397d7616cc
⛔ Files ignored due to path filters (1)
yarn.lockis excluded by!**/yarn.lock,!**/*.lock
📒 Files selected for processing (34)
.agents/skills/verify-recipe-author/SKILL.md.claude/skills/verify-recipe-author/SKILL.md.dockerignore.github/workflows/verify-pr.yml.gitignore.verify-recipes/.eslintrc.cjs.verify-recipes/.gitkeep.verify-recipes/_recipe-authoring-guide.md.verify-recipes/_util.ts.verify-recipes/example-smoke.spec.tspackage.jsonscripts/verify-pr-author.tsscripts/verify-pr-generate.tsscripts/verify-pr.tsscripts/verify/README.mdscripts/verify/SECURITY.mdscripts/verify/__fixtures__/stub-assistant-reply-clean.txtscripts/verify/__fixtures__/stub-assistant-reply-with-unused-var.txtscripts/verify/__fixtures__/stub-assistant-reply.txtscripts/verify/agent-dispatch.tsscripts/verify/agent-prompt.tsscripts/verify/boot.tsscripts/verify/core.tsscripts/verify/lint-invocation.tsscripts/verify/playwright.config.tsscripts/verify/recipe-author-core.tsscripts/verify/recipe-deny.tsscripts/verify/recipe-retry-policy.tsscripts/verify/recipes/triage-table.tsscripts/verify/runner.tsscripts/verify/sandbox.tsscripts/verify/symlink.tsscripts/verify/sync.tsscripts/verify/triage.ts
| try { | ||
| result = await runRecipeAuthor({ bundle, dispatch, runDir, attempt, mode }); | ||
| } catch (err) { | ||
| const msg = err instanceof Error ? err.message : String(err); | ||
| console.error(`[verify-pr-author] dispatch failed: ${msg}`); | ||
| return 1; |
There was a problem hiding this comment.
Persist dispatch failures to result.json.
If dispatchRecipeAuthor() or stdin consumption throws here, the CLI exits with only stderr output. That breaks the “always leave a structured per-run result” contract the rest of this flow relies on.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@scripts/verify-pr-author.ts` around lines 186 - 191, The catch block after
runRecipeAuthor currently only logs to stderr and returns; instead build a
structured failure result and persist it to result.json before returning. In the
catch for runRecipeAuthor (and any thrown by dispatchRecipeAuthor/stdin),
populate the existing result variable (or create one) with failure fields (e.g.
success: false, error: msg, errorStack: err.stack, attempt info), then write
that object to result.json (using the same JSON structure the rest of the flow
expects) and flush to disk before calling console.error and returning 1 so every
run always leaves a well-formed result.json.
| // Net-new dangling-symlink heal: if target exists as a symlink but points to a missing location, | ||
| // unlink it so ensureSymlink can recreate it correctly. | ||
| try { | ||
| const stat = await lstat(target); | ||
| if (stat.isSymbolicLink()) { | ||
| try { | ||
| const linkTarget = await readlink(target); | ||
| await access(linkTarget); | ||
| } catch { | ||
| await unlink(target); | ||
| console.log('[symlink] healed dangling target ' + target); | ||
| } | ||
| } | ||
| } catch (e: any) { | ||
| if (e?.code !== 'ENOENT') throw e; | ||
| } |
There was a problem hiding this comment.
Fix relative symlink target resolution.
The dangling-symlink healing logic has a path-resolution bug. When readlink(target) returns a relative path, it's relative to dirname(target), not process.cwd(). However, access(linkTarget) resolves relative paths against process.cwd(), causing incorrect validation.
Example failure: If a symlink at /repo/foo/link points to ../bar/file, readlink returns ../bar/file, but access checks {cwd}/../bar/file instead of /repo/foo/../bar/file.
🔧 Proposed fix to resolve relative symlink targets correctly
const linkTarget = await readlink(target);
- await access(linkTarget);
+ const resolvedTarget = path.isAbsolute(linkTarget)
+ ? linkTarget
+ : path.resolve(path.dirname(target), linkTarget);
+ await access(resolvedTarget);
} catch {🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@scripts/verify/symlink.ts` around lines 27 - 42, The dangling-symlink heal
code incorrectly calls access(linkTarget) directly because readlink can return a
relative path resolved against dirname(target); change it to resolve relative
targets first by computing const resolved = path.isAbsolute(linkTarget) ?
linkTarget : path.resolve(path.dirname(target), linkTarget) and then call await
access(resolved) (keep existing lstat, readlink, access, unlink flow and ensure
path is imported/available).
Activation Gates A1–A4 — ResultsApproachA4 testing run on A1 — action SHAs pinned ✅Commit A2 —
|
| Step | Result |
|---|---|
| Check actor permission | ✅ |
| Setup Node.js and Install Dependencies | ✅ |
| Setup Bun | ✅ |
| Fetch PR diff | ✅ |
| Generate bundle (gh + triage) | ✅ |
| Author recipe (Anthropic SDK call) | ✅ |
| Run harness in container | ❌ no Dockerfile in repo |
| Upload artifacts | ✅ (7992 bytes, .verify-output/...) |
| Post PR comment | ✅ (robust fallback) |
Fixes shipped to make activation work
| Commit | Fix |
|---|---|
478ba479ec2 |
Pin 4 action SHAs |
51ca9477016 |
MODEL_ID_MAP pointed at non-existent claude-opus-4-5-20250929 → use valid IDs |
e3aae89b69d |
Workflow missing yarn install + Post PR comment referenced non-existent latest/ symlink |
2a1aa10b9e9 |
Workflow missing bun setup (scripts use bun) |
a256f3a7197 |
path: .verify-output/*/ finds 0 files in upload-artifact v7 |
7b660d73e26 |
Debug step (used to confirm dir layout) |
0993b55e24e |
include-hidden-files: true (.verify-output is dot-prefixed) |
Remaining activation gap
v5-0 — Dockerfile + image build in CI. Run harness in container step references verify-harness:pinned-sha. No Dockerfile in repo. Needs:
scripts/verify/Dockerfiledefining the spec-runner sandbox (Node + Playwright + minimal deps, non-root user)- Workflow step to build the image (or pull from a registry) before the harness step
Until v5-0, the harness step always fails. Everything upstream of it now works.
🤖 Generated with Claude Code
d57333c to
046eda5
Compare
160bdcd to
ad75ba9
Compare
…posites LLM-authored single-round PR verification harness running under pull_request_target on ci:verify label. Authors a Playwright spec from the PR diff, executes against internal-ui or a sandbox template, runs vision evidence-check + PR-added unit tests, posts a verdict comment. Composite actions (.github/actions/agentic-pr-prepare + agentic-pr-publish) factor the trust-boundary plumbing. Hardenings: C1 HMAC-bound verdict, srt-sha256 caller-inline, sync-path traversal guards, jq-emitted srt settings, file-path screenshot output, provenance secret to file, env-mapped tokens. Layer-2 srt (bubblewrap) isolation wraps all PR-controlled steps. Includes eval-driven fixes: vitest --cache=false, evidence-check scans $PR_HEAD_DIR/.verify-output, playwright-report path via runId, evidence reasoning 2000-char cap, recipe-body block in PR comment, 4 authoring triage rules (addon-docs→sandbox, ActionBar hover+scroll, sidebar expand, .first()-visible), srt allowWrite includes sandbox root. Clean 64-file harness footprint atop current next (prior squash had captured stale upstream snapshots surfacing as spurious reverts).
ad75ba9 to
099b6f7
Compare
…ssing Non-visual coverage gap (5 of 9 eval regressions were aria/XSS/type-only where vision returns `undetermined`): - scripts/verify/mode.ts: parse `@verify-mode` header (visual default, behavioral, pure-fn, build-config; type-only deliberately excluded). - core.ts: `mode` on VerifyResult, added to SIGNED_FIELDS so a forged in-srt result cannot claim a non-visual mode to dodge vision. - verify-pr.ts: parse/log/stamp mode; visual+behavioral share the Playwright path; pure-fn/build-config emit an explicit `skipped` (no false verdict) until their execution harness ships. - verify-evidence-check.ts: skip vision when mode != visual (kills the useless `undetermined` on behavioral recipes). - _recipe-authoring-guide.md §12.5: HARD GATE mode-selection triage + behavioral worked example (aria/XSS → behavioral). Scratch-dir blessing (sanctioned FS-write path, no srt loosening): - _util.ts: RecipePage.scratchDir + writeFixture(relPath, contents), traversal/absolute guarded; $PR_HEAD_DIR/.verify-scratch is already in srt allowWrite. - guide §4 documents it; .gitignore += .verify-scratch. DESIGN-nonvisual-coverage.md records full design; coverage gate (Part C) and pure-fn/build-config wiring deferred. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ule import Wave finding (#36 try-pr-34649 a11yRunner): recipe-author correctly chose @verify-mode: behavioral but reached the changed module via in-browser dynamic import() + monkeypatch — the deny-regex gate rejected it at attempt 1 with no retry, producing "no verdict". - recipe-author-core.ts: on assertNoDeniedPatterns failure, build a retry message (denied pattern + §12.5 pointer) and loop, mirroring the lint failure path. Terminal `deny-regex-hit` only after MAX_RECIPE_ATTEMPTS. - _recipe-authoring-guide.md §12.5: HARD GATE — a behavioral recipe must never import()/monkeypatch/eval the changed module (deny-regex blocks it pre-run = no verdict). Drive the public UI path and assert observable effect; if no UI path exists, fall back to visual smoke + filterPageErrors rather than fabricating a module import. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…rity no-verdict cause Wave finding (#31 try-pr-34712 XSS): recipe-author correctly chose behavioral mode and drove the change via the public manager-api `setOptions` path (no module import), but needed `(window as any).__STORYBOOK_ADDONS_MANAGER` to reach the runtime singleton. `@typescript-eslint/recommended` makes `no-explicit-any` an error, so the scoped lint gate failed twice → no verdict. - .verify-recipes/.eslintrc.cjs: `@typescript-eslint/no-explicit-any: 'off'`. Code-quality rule, NOT a security control — deny-regex and no-restricted-{globals,imports,syntax} remain the load-bearing gates. `as any` for window/manager-api globals is correct and unavoidable. - _recipe-authoring-guide.md §12.5: note that `as any` for runtime globals is allowed; don't waste retries trying to type them. Verified: behavioral recipe using `(window as any).__STORYBOOK_ADDONS_MANAGER` now lints clean (exit 0). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ActionBar scope Wave findings (#28/#29/#31 stuck at regression despite passing PR unit tests — recipe-author mis-targeted the DOM): - ActionBar/Canvas rule was conflating the docs-Canvas Zoom/Show-code toolbar with the generic `ActionBar` component. Scope-tagged it to the docs-Canvas surface only. - New HARD GATE "additive-only API changes with no story/consumer" — the #1 false-regression cause. #28/#29 add `ActionItem.ariaLabel` but no story or in-diff consumer passes it, so the attribute is never in the DOM; asserting it always fails. Rule: detect additive-no-consumer, fall back to `@verify-mode: visual` smoke on the component's existing story (`components-actionbar--many-items`), never `getByRole('toolbar')` (the component renders plain <button>s) nor `.docs-story`. - New HARD GATE for `Brand` / `theme.brand.title`: the sanitized dangerouslySetInnerHTML path runs ONLY when `theme.brand.image === null`. Target the existing `manager-sidebar-heading--only-text` / `--link-and-text` stories (already `{title, image:null}`); never runtime `api.setOptions({theme})` (#31 false regression — never reaches the path). XSS-inert proof is the PR's unit test; recipe is a render/boot smoke. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… TMPDIR pinned Two distinct wave-#31/#36 root causes, both false regressions: (a) _util.ts previewRoot() filtered `#storybook-root:visible`. Stories with `parameters.layout:'fullscreen'` + the internal-ui side-by-side/stacked theme decorator wrap the story so #storybook-root has a zero-size (Playwright-"not visible") box though it rendered — locator matched nothing, waitForStoryLoaded timed out (#31 manager-sidebar-heading--*). Use `:has(> *)` instead: selects whichever container actually has children, keeps story-vs-docs disambiguation, drops the bounding-box requirement. (b) verify-pr.yml unit-test step runs `env -i … srt … yarn vitest`. `env -i` strips TMPDIR, so Yarn's run-temp realpaths a nonexistent srt path (`lstat '/tmp/claude'` ENOENT) and aborts before vitest starts → false "vitest exited without JSON report" regression (#36 a11yRunner). Pin TMPDIR to an existing allowWrite dir ($PR_HEAD_DIR/.verify-output/vitest-tmp), same rationale REPORT/LOG already live there. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…bans root-visible assert Re-run of #36/#31 showed both prior fixes missed the real cause: - #36: TMPDIR pin had zero effect — Yarn's mktempPromise still ENOENT `/tmp/claude`. Root cause: srt derives its sandbox tmp from CLAUDE_CODE_TMPDIR, NOT TMPDIR. The main recipe run inherits it via $GITHUB_ENV ($SANDBOX_TMPDIR); the unit-test step's `env -i` strips it, so srt falls back to its hardcoded `/tmp/claude` (never created). Pass CLAUDE_CODE_TMPDIR=$VITEST_TMPDIR (existing allowWrite dir) in env -i. - #31: previewRoot `:has(> *)` fix removed the _util.ts:66 timeout, but the recipe-author hand-rolled `expect('#storybook-root').toBeVisible()` which is "hidden" for `Sidebar/Heading` (layout:fullscreen + side-by-side = zero-box root). Brand triage rule now explicitly bans root-visibility asserts and prescribes a child `toBeAttached()` content assertion. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ndate it Wave #36 (after CLAUDE_CODE_TMPDIR fix let vitest run): Playwright recipe failed on `expect(consoleErrors).toEqual([])` because the srt egress jail denies every non-allowlisted domain, so internal-ui's external probes always log `Failed to load resource: net::ERR_INTERNET_DISCONNECTED` — environmental, not a PR regression. No console-error equivalent of filterPageErrors existed. - _util.ts: add `filterConsoleErrors()` dropping `net::ERR_*` (INTERNET_DISCONNECTED / NAME_NOT_RESOLVED / BLOCKED_BY_CLIENT / CONNECTION_REFUSED / FAILED) + the shared cross-origin sessionStorage SecurityError. Verified: keeps only genuine errors. - _recipe-authoring-guide.md §3: MANDATORY subsection — never assert the raw consoleErrors array; always filterConsoleErrors(consoleErrors), mirroring the filterPageErrors mandate. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…on-visual flipped Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pragmatic-Programmer Wave-1 (EPIC-1/2/4), 17 P0 defects: Crash Early / Do No Harm: - loadCostLedger: ENOENT→0, unreadable→fail-safe over-budget; atomic ledger write (tmp+rename) so concurrent reads can't see a torn file - getPricing: unknown model throws instead of silent opus fallback; telemetry warns instead of silently charging $0 - telemetry sink hiccup warns + exits 0 (no longer gates the verdict) - computeVerdict: zero-tests carries an explicit regressionReason - sync/symlink failures propagate (no stale-core false "verified"); atomic dir copy; pruneOldRuns surfaces undeletable dirs Finish What You Start: - preflightPort uses an authoritative bind probe (no fail-open) - abort: SIGTERM→timed SIGKILL, process-group kill, await child exit - abort-race listener leak fixed; runner treats AbortError as non-fatal - SDK call honours an optional AbortSignal; runResync fetch timeout Design by Contract (HMAC verdict integrity): - trusted post-processors re-sign verify-result.json after mutation via exported core.signResultFile; result+sig published atomically (sig before result); evidence-check missing-key writes undetermined - SECURITY.md §c1 cites core.RESULT_FILENAME/verifyResultPath - workflow re-sign is rollout-guarded (feature-detect signResultFile) Security-reviewed: SHIP, 0 CRITICAL/HIGH/MED. Follow-ups (LOW) tracked as Wave-1.1: atomicWrite O_EXCL, derive-verdict atomic write, SIGNED_FIELDS exclusion assertion. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Layers the prior-session Pragmatic-Programmer H1–H7 fixes on top of the committed Wave-1 P0 work. Clean H1–H7 files taken as-is; the 5 files that overlap Wave-1 were 3-way merged (base f573e99 ↔ Wave-1 ↔ H1–H7): - H1: delete dead recipe-retry-policy.ts (inlined into recipe-author-core) - H2: model-pricing.ts single source; agent-dispatch + append-telemetry now consume it. Semantic merge with Wave-1: getPricing keeps W1's fail-loud HARD throw on an unknown model (no silent opus fallback), expressed against the H2 single-source table; telemetry single-sources too but warns + records 0 on unknown (non-blocking, unchanged intent). - H3: deny-policy single-sourced; docs cite recipe-deny.ts - H4: env-jail single source (strip-untrusted-secrets.sh) — merged cleanly with Wave-1's EPIC-4 re-sign + rollout guard in verify-pr.yml - H5/H6: README skeleton + SKILL runbook corrected - H7: dead provenance verifier removed; SECURITY.md §c1 merged with Wave-1's RESULT_FILENAME/verifyResultPath citation - evidence-check: union import — keeps Wave-1 signResultFile (re-sign) AND H2 getModelPrice (vision cost single-source) Verified: 12 .ts typecheck clean, verify-pr.yml YAML valid, shellcheck clean, pricing roundtrip (opus-4-7 $5/$25, unknown→throws), zero conflict markers, no unrelated cross-branch WIP staged. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…nvariant guard
Security-review follow-ups on the PR-verify harness (all in-worktree):
- core.ts atomicWrite: unpredictable randomBytes(12) temp suffix (was
guessable pid+Date.now()) opened O_WRONLY|O_CREAT|O_EXCL|O_NOFOLLOW
0o600 so a pre-planted temp path / symlink cannot hijack, observe, or
redirect the trusted signed-result write before the atomic rename
(CWE-377/59). Exported as the one safe writer; no-retry-on-EEXIST
documented (a retry would reintroduce a name-guessing oracle).
- core.ts: module-load assertion that SIGNED_FIELDS is disjoint from
{unitTests, evidenceRetry, evidenceVerdict} — crashes early instead of
silently breaking every verified PR if a post-processor field is ever
moved into the signed set (EPIC-4.1 HMAC invariant, fail-loud).
- ci/derive-verdict.ts: both result writes (forgery-downgrade persist +
pre-re-sign write) routed through core.ts atomicWrite so a reader never
sees a torn result and the temp path can't be symlink-hijacked. Re-sign
failure after the unit-test merge now fails the step loudly
(process.exitCode=1) instead of persisting an unregenerable stale .sig.
- SECURITY.md §c1: rewrote the post-signing invariant into the two
distinct mechanisms (machine-enforced disjointness = load-bearing;
re-sign = mandatory only where a signed field changes with the secret
in scope, else defense-in-depth). Corrected the scope: only the
unit-test jq writers are deliberately unsigned (secret stripped before
untrusted vitest); evidenceRetry IS re-signed by the workflow.
Verified: tsc clean, node --experimental-strip-types --check + module
load (assertion passes, non-vacuous), atomicWrite functional (write +
atomic overwrite + zero tmp leftovers), workflow YAML still parses.
Mandatory security-reviewer pass: 0 CRIT/HIGH; MED (fail-loud re-sign)
and LOW(a) applied; LOW(b) test-pin deferred to Wave 2 (EPIC-5).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…8 single-sourcing EPIC-5 (test the verifier — was 3 test files, zero on the security/cost core): 8 new vitest suites, 181 tests, auto-globbed by the `scripts` project (no CI wiring needed; satisfies 5.11): - 5.1 recipe-deny.test.ts — all 19 DENY patterns, exact 1-based line, per-line tripwire (no comment-awareness) pinned; eval-#36 `dynamic import(` pinned (isolated + overlapping). - 5.2 agent-dispatch-cost.test.ts — budget gate boundary (computed, not hardcoded), resolveModelId round-trip, pricing digit-transpose guard. - 5.4 derive-verdict-hmac.test.ts — saboteur suite (forged/tampered/ correct/wrong-secret/non-signed vs signed field) + the deferred Wave-1.1 LOW(b) disjointness pin, made non-vacuous (poisoned-set replica proves the guard has teeth). - 5.6 triage/target-suggest, 5.7 mode/target (30-line window edge), 5.10 agent-prompt-sanitize (ANSI/NUL/fence redaction, cap boundary). EPIC-6: - 6.1 SKILL.md — de-absolutized 4 hardcoded /Users/... paths +1 stale prose note to runtime-resolved $REPO_ROOT (git rev-parse). - 6.8 srt pin — replaced manual-paste srt-version/srt-sha256 in verify-pr.yml with committed scripts/verify/srt.lock.json read fail-closed from the TRUSTED base checkout (values byte-identical: 0.0.51 / 36de…6338); load-bearing sha verification in the composite untouched. Added workflow_dispatch-only _srt-sha-probe.yml. Mandatory separate review pass (security-reviewer + code-reviewer), findings addressed and re-verified: - sec HIGH: probe no longer auto-commits/pushes the supply-chain pin — emit-only (summary + artifact + outputs), contents: read, human lands it via reviewed PR (restores the "reviewed diff" invariant). - sec LOW: strict srt version regex (rejects ./../leading-trailing dot) in both workflows. - code HIGH: derive-verdict-hmac.test.ts typed (Partial<VerifyResult>/ RecipeTest/StepStatus) — 22 tsc errors cleared, assertions unchanged. - code MED/LOW: dropped unused beforeEach import; over-claiming "ordering invariant" tests renamed to honest "disjoint rules resolve independently" (no dual-match input exists in the real globs); added target grammar negative cases. Verified: tsc clean for all 8 test files (scripts/tsconfig.json), 181/181 green, both workflows parse, bash -n OK, srt values unchanged, trust boundary + fail-closed intact (security-reviewed). Scope clean (no yarn.lock/shared-tree drift). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Introduces an agentic PR verification harness under
scripts/verify/and.github/workflows/verify-pr.yml.v6 single-round: drops the v5-0 Docker / Verdaccio / image-build-provenance scaffolding and collapses the previous two-round flow (author at base → maintainer commits at PR head → re-fire
ci:verify) into a single workflow run. The PR-head checkout is staged first; the LLM-authored Playwright recipe is materialised straight into$RUNNER_TEMP/pr-head/.verify-recipes/pr-<#>.spec.ts(ephemeral — never committed) and executed in the same job.ANTHROPIC_API_KEYstays scoped to the Author step only; the Verify step that runs the recipe has no API key and noGITHUB_TOKEN.Architecture diffs vs v5-0
enableScripts: false+ lockfile +.npmrcpurge.code/core/distbetween stages 6 and 7 across 11 v5-0 firetest rounds. v6 eliminates the multi-stage Docker boundary entirely.Architecture diffs vs the earlier v6 two-round model
.verify-recipes/pr-<#>.spec.tson the PR branch$RUNNER_TEMP/pr-head/.verify-recipes/(ephemeral, artifact-only)outputSpecPath+ actor-perm + label gate + trusted-script provenanceSee
scripts/verify/SECURITY.mdfor the full re-audit. Local-dev with theverify-recipe-authorskill (human-review path) remains supported for ambiguous PRs.Per-recipe targets
Recipes declare execution target via a single header line (scanned in the first 30 lines):
internal-ui(default)yarn storybook:ui:buildonce →yarn http-server code/storybook-static -p <port>sandbox:<template>Local CLI:
yarn verify-pr <PR#>resolves to.verify-recipes/pr-<#>.spec.ts; explicit--recipe-spec <path>still works.CI workflow shape
.github/workflows/verify-pr.yml:Check actor permission(≥ write)pull_request.base.ref) + Setup Node.js / install root deps + Setup BunCompute paths—echo "PR_HEAD_DIR=$RUNNER_TEMP/pr-head" >> $GITHUB_ENV(workaround forrunner.tempnot being a valid job-env context)git clonePR head into$PR_HEAD_DIR(clone target is outside the base checkout so nx / yarn / jiti module resolution cannot escape upward into the base node_modules)gh pr diff→/tmp/pr.diffyarn verify-pr-generate --pr <#> --force --output "$PR_HEAD_DIR/.verify-recipes/pr-<#>.spec.ts"— trusted base scripts emit the prompt bundle with the ephemeral output path baked inyarn verify-pr-author --bundle …(ANTHROPIC_API_KEY scoped here) dispatches the LLM, runs deny-regex + lint + structural pattern checks, and atomically renames the candidate onto the ephemeral path inside$PR_HEAD_DIR$PR_HEAD_DIR):Read verdict→ onverified,gh label create(idempotent) +gh pr edit --add-label verified-by-harness_verify-screenshotsorphan side branch (raw.githubusercontent.com URLs pinned at the commit pushed)$PR_HEAD_DIR/.verify-output/+ the ephemeral spec + the base-checkout author run-dir as the artefact bundleFiles dropped (vs v5-0 + two-round v6)
Plus the v5-0 additions in
.dockerignoreand therenovate.jsonDocker-pin rules.New / changed in single-round v6
scripts/verify/target.ts—// @verify-target:header parser (defaultinternal-ui).scripts/verify/internal-ui.ts—storybook:ui:build+http-serverboot.scripts/verify-pr.ts— rewritten around per-recipe target dispatch; positional<PR#>arg; droppedinContainershort-circuit,HEAD_SHAruntime assertion,imageDigestfield.scripts/verify-pr-generate.ts—--output <path>flag so CI can point the bundle at an ephemeral PR-head path; default behaviour preserved for local-dev.scripts/verify/recipe-author-core.ts— provenance header acknowledges both modes (local-dev human-reviewed; CI single-round materialised and executed without intermediate review).scripts/verify/core.ts—templatewidened from a literal tostring;computeVerdictfilters generic"Failed to load resource:"console 404s (low-signal browser-side fetch misses).scripts/verify/SECURITY.md— full rewrite for single-round: load-bearing controls, what single-round explicitly gives up, when to add stronger isolation.scripts/verify/README.mdandscripts/verify/RUNBOOK.md— full rewrite around single-round v6 architecture and failure-mode debugging..verify-recipes/_recipe-authoring-guide.md— new §12 documenting target selection.Validation — fork firetest
Validated end-to-end on a throwaway fork PR (
valentinpalkovic/storybook#12).25721180250— conclusionsuccessin 4m22sverified(targetinternal-ui)verified-by-harnesslabel applied automaticallyTest plan
nextand to a throwaway PR (#12)ci:verifylabel fires the workflowyarn install --immutablesucceeds in$RUNNER_TEMP/pr-head/yarn nx run-many -t compilesucceeds for all 42 projectsyarn verify-prboots internal-ui, runs Playwright recipe, writesverify-result.jsonverified:verified-by-harnesslabel is applied (auto-created if missing)nextafter mergeSecurity posture (single-round v6)
Load-bearing controls — re-audited in
scripts/verify/SECURITY.md:ANTHROPIC_API_KEYis mounted only on theAuthor recipestep'senv:block. The Verify step has no API key and noGITHUB_TOKEN.child_process,fs.unlink*,fs.rm*,process.exit,eval(,import 'node:*', etc. before the spec lands on disk.listener-before-gotoandfinally-attachregex assertions on every dispatched candidate.bundle.outputSpecPathis computed by the trusted base script (verify-pr-generate) and consumed verbatim byrecipe-author-core. The LLM cannot influence where its output lands.verify-pr-generate,recipe-author-core,recipe-deny,lint-invocation, the authoring-guide, and the canonical-smoke reference all read from the base checkout (not the PR head), so a malicious PR cannot replace the gates.verify-recipe-authorskill remains supported for PRs where a maintainer wants human review of the spec before execution.What single-round gives up: no human review of the executed spec, and no replay-by-default in version control (artefacts have 14-day retention). If either is unacceptable for a given PR class, fall back to the local-dev path.
If the threat model later expands to processing third-party PRs at scale with adversarial recipe authors, wrap the playwright test step in
sandbox-runtime(bubblewrap on Linux) — ~10 lines of config per Anthropic's "Securely deploying AI agents" doc. Do not reintroduce the Docker + Verdaccio stack.🤖 Generated with Claude Code
Summary by CodeRabbit
Day 3 + Day 4 follow-ups (post-v6 single-round pivot)
The squash commit
feat(verify): Day 3 + Day 4 follow-upson top of the single-round pivot adds:Verdict layering — three orthogonal signals
*.test.*files from the PR diff).regressionReasonderived fromplaywright-report.jsonwhen the recipe author doesn't populate one — reviewers see the failing test title + first error inline.Retry loop
playwright-report.json) back to the recipe author as--retry-context. Author re-emits the spec, Playwright re-runs. Single retry; final verdict gates the label.Sandbox-target CI path
// @verify-target: sandbox:<template>(e.g.sandbox:vue3-vite/default-ts). The workflow detects the header, runsnx run <template>:sandbox(NX resolves implicitDependencies, emits the sandbox atcode/sandbox/<key>), andverify-pr.tsboots Storybook against that sandbox instead of the internal-ui dev server.react-vite,react-webpack,vue3-vite,svelte-vite,angular-cli,nextjs,nextjs-vite(alldefault-ts).compiletarget when sandbox-bound —:sandboxhandles all transitive deps via the NX project graph.Layer-1 security: secret stripping
pull_request_targetruns build / sandbox / recipe code from the untrusted PR head as the runner user that holdsGITHUB_TOKEN(contents:write, pull-requests:write) andANTHROPIC_API_KEY.unset GITHUB_TOKEN GH_TOKEN ANTHROPIC_API_KEYbefore invoking any PR-head script. Trusted scripts above (verify-pr-generate, verify-pr-author) still see the keys becauseenv -uonly strips for the single command.Layer-2 security:
@anthropic-ai/sandbox-runtimejailyarn verify-pr(initial attempt + retry) insrtwith a bubblewrap-backed FS + network jail. Defence-in-depth on top of Layer-1.network.allowLocalBinding: true(Storybook dev server onlocalhost:6006);network.allowedDomains: [](no public-internet egress).filesystem.allowWrite: $RUNNER_TEMP, /tmp, $HOME/.cache, $HOME/.local/share, $HOME/.storybook.filesystem.denyRead: $HOME/{.ssh, .aws, .docker, .npmrc, .gitconfig, .config/gh}(belt-and-suspenders alongside the env stripping).CLAUDE_CODE_TMPDIR=$RUNNER_TEMP/sandbox-tmpso the sandbox's TMPDIR bind source exists on the host.Recipe-author quality
scripts/verify/derive-story-routes.tsparsescode/.storybook/main.tsvia TS AST + inlines Storybook's auto-title /toId/storyNameFromExportalgorithms. Routes injected into the prompt bundle verbatim — agents stop guessing 404 paths.ariaLabels /data-attrs upfront._util.ts: on test failure, writes the preview-iframe'sbody.ariaSnapshot()toiframe-snapshot.md. Retry step appends this alongside the manager page-snapshot.Compile-failure surfacing
nx compilefails before Playwright runs, the workflow writes a stubverify-result.jsonwithverdict=regression,regressionReason="compile failure",regressionDetails=tail -c 4000 of the log(ANSI-stripped). PR comment renders the build error in-line so reviewers see WHY without downloading artifacts.UX polish
<details>block (verdict stays one-glance, reasoning one click away)./artifactspath.Telemetry
telemetry.csvon the_verify-screenshotsside branch. Columns:run_id, pr_number, verdict, target, evidence_verdict, evidence_retry, unit_tests_ran, unit_tests_passed, duration_ms, timestamp. After 10–20 PRs the data drives v8 prioritisation (in-app role discovery, 2-retry budget, cross-package story heuristic, etc.).Validation
Firetest PRs (fork-side):
foundvue3-vite/default-tsfoundsvelte-vite/default-tsangular-cli/default-ts