[firetest] Refactor svelte docgen source file cache handling #18
valentinpalkovic wants to merge 68 commits into
Single-template (react-vite/default-ts), single-story
(example-button--primary) PR verification entry script with 6 helpers
under scripts/verify/.
Flow: compile core -> symlink code/core/dist into NX-cached sandbox ->
boot Storybook on :6006 -> Playwright capture via SbPage from
code/e2e-tests/util.ts -> emit verify-result.json + iframe-clipped
screenshot under .verify-output/<runId>/.
Helpers:
- core.ts: types, run-path math, computeVerdict, pruneOldRuns(10)
- symlink.ts: lifted EPERM/EEXIST cp fallback from
scripts/tasks/sandbox-parts.ts:43-79 + net-new dangling-symlink heal
- sandbox.ts: multi-base resolveSandboxDir (code/sandbox, sandbox,
../storybook-sandboxes, STORYBOOK_SANDBOX_ROOT override),
snapshot/restore, sanitizeResolutions
- sync.ts: yarn nx compile core (run from repoRoot) + symlink dist
- boot.ts: cross-platform port preflight, idempotent SIGINT/SIGTERM
handlers, dual wait-on iframe.html + index.html (uses
node:child_process.spawn per repo lint policy)
- capture.ts: page.on('pageerror'/'console') registered before goto,
iframe-clipped screenshot
Run via `yarn verify-pr` (uses bun for native TS exec — node
strip-types rejects transitive enums in cli/projectTypes.ts).
Verification:
- V-1 sanity: verdict=verified, ~8s wall-time (well under 90s SLO)
- V-2 regression: VERIFY_HARNESS_TEST sentinel detected at compile,
exit 1
Plan: .omc/plans/pr-verify-poc-mvp.md
Research: .omc/research/research-20260508-prverify/report.md
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
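The commit above lists `pruneOldRuns(10)` in core.ts without showing it. A minimal sketch of how such retention could be decided, assuming run IDs sort chronologically as strings; the name `selectRunsToPrune` and the ID format are hypothetical, not taken from the harness:

```typescript
// Hypothetical sketch: pick the run directories to delete so that only the
// newest `keep` runs remain. Assumes run IDs sort chronologically as plain
// strings (for example timestamp-based names); the real helper may differ.
function selectRunsToPrune(runIds: string[], keep = 10): string[] {
  if (!Number.isInteger(keep) || keep < 0) {
    throw new RangeError(`keep must be a non-negative integer, got ${keep}`);
  }
  // Copy before sorting so the caller's array is left untouched.
  const newestFirst = [...runIds].sort().reverse();
  // Everything past the first `keep` entries is old enough to prune.
  return newestFirst.slice(keep);
}
```

Keeping the selection pure (names in, names out) leaves the actual deletion to the caller, which makes the retention rule easy to test in isolation.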
…ning
Pivot from custom Chromium launch (capture.ts) to spawning `bun x playwright test` against committed specs under `.verify-recipes/`. Trace artifacts are produced by Playwright's built-in tracing API and replayable via `npx playwright show-trace`. Schema bumped to v2 with per-test results, attached pageErrors/consoleErrors, and trace paths sourced from the Playwright JSON report contract.
Adds Phase-1 security hardening: `.claude/settings.json` deny rules (local), `.dockerignore` for credential exclusion, `SECURITY.md` with phase-gated threat model and isolation matrix, and a gated `.github/workflows/verify-pr.yml` (if: false) scaffolding the Phase-2 container/proxy shape.
Recipe-local `RecipePage` (`.verify-recipes/_util.ts`) reimplements only the subset of `SbPage` needed for verify recipes: Playwright's Node worker processes cannot strip the non-erasable TS enums reached transitively from `code/e2e-tests/util.ts`.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Allow overriding the Storybook port (default 6006) so the harness can run alongside side-processes that already occupy 6006. baseURL, preflightPort, bootStorybook, and the --resync alive-check are all threaded through the resolved port. Validates that the value parses as an integer in 1..65535. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
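The port validation described above (integer in 1..65535, default 6006) can be sketched as follows. The helper name `resolvePort` and its string-or-undefined input are assumptions for illustration; the commit does not show how the harness actually reads the value:

```typescript
const DEFAULT_PORT = 6006;

// Hypothetical helper: resolve the Storybook port from an optional raw value
// (for example a CLI flag or env var), falling back to the default.
function resolvePort(raw: string | undefined): number {
  if (raw === undefined || raw === '') return DEFAULT_PORT;
  // Digits only: rejects inputs such as "6006abc", "1e3", or "-1" that
  // looser parsers like parseInt or Number would partially accept.
  if (!/^\d+$/.test(raw)) {
    throw new Error(`Invalid port "${raw}": expected an integer in 1..65535`);
  }
  const port = Number(raw);
  if (port < 1 || port > 65535) {
    throw new Error(`Invalid port "${raw}": expected an integer in 1..65535`);
  }
  return port;
}
```

Resolving the port once and passing the number to every consumer mirrors the commit's point that baseURL, preflight, boot, and the alive-check all share one resolved value.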
Adds a two-step flow on top of the v2 raw-Playwright runner:
yarn verify-pr-generate --pr <#> # emit prompt bundle
Skill: verify-recipe-author # dispatch executor, write spec
yarn verify-pr --recipe-spec ... # run committed spec
The generator script does deterministic I/O only: gh pr fetch, triage routing (19 path globs in scripts/verify/recipes/triage-table.ts mapping addon/manager/csf-tools/builder/framework/renderer changes to reference specs under code/e2e-tests/), per-file 500-line cap with 20-file total cap sorted triage-matched first, and prompt-bundle emission. The script never dispatches an agent and never writes the final spec.
The verify-recipe-author skill (.agents/skills/verify-recipe-author/SKILL.md with redirect at .claude/skills/...) consumes the bundle, dispatches the oh-my-claudecode:executor subagent (model=opus), runs a security deny-regex guard (recipe-deny.ts: child_process, fs.unlink/rm, process.exit, eval, node: imports), prepends a header-comment provenance block to the agent output, writes .verify-recipes/pr-<#>.spec.ts, lints via yarn --cwd code lint:js:cmd with one categorized retry (recipe-retry-policy.ts: maxAttempts=2, errorCategories=[listener-before-goto, attach-pattern, imports]), runs post-write regex checks for the listener-before-goto and testInfo.attach pattern invariants, and emits result.json. Spec-name collision = fail unless --force; the human-review gate from v2's SECURITY.md is preserved (the skill never executes its output).
The authoring guide at .verify-recipes/_recipe-authoring-guide.md is the agent's contract: import surface, listener-before-goto rule, attach pattern, RecipePage API, what to avoid, story URL routing, and per-change assertion shapes.
Verification: structural ACs (V3-6, V3-7, V3-9, V3-10) pass via grep against the new files; AC-V3-1 (generator exit 0 + bundle written + next-step printed) and AC-V3-5 (committed spec runs end-to-end via verify-pr, schema v2 verdict emitted with trace.zip and per-test attachments) ran clean against PR storybookjs#34737 (manager-api/modules/stories.ts); AC-V3-3/V3-4 (listener-before-goto + attach-pattern regex) and AC-V3-8 (deny-regex aborts on child_process) verified directly. Phase-1 security model unchanged: spec-review gate is the lethal-trifecta breaker; the bun script + skill make that gate easy to apply, but never substitute for it. Phase-2 CI activation will require migration to a direct Anthropic SDK call with API-key handling — tracked in the SECURITY.md / README roadmap. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
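The deny-regex guard named in this commit (child_process, fs.unlink/rm, process.exit, eval, node: imports) is not reproduced in the PR text. A rough sketch of the idea, where every pattern and the name `findDeniedPatterns` are illustrative guesses rather than the contents of recipe-deny.ts:

```typescript
interface DenyRule {
  name: string;
  pattern: RegExp;
}

// Illustrative patterns only; the real rules in recipe-deny.ts may be
// stricter or shaped differently. No `g` flag, so `test` stays stateless.
const DENY_RULES: DenyRule[] = [
  { name: 'child_process', pattern: /\bchild_process\b/ },
  { name: 'fs.unlink/rm', pattern: /\bfs\.(unlink|rm)\w*\s*\(/ },
  { name: 'process.exit', pattern: /\bprocess\.exit\s*\(/ },
  { name: 'eval', pattern: /\beval\s*\(/ },
  { name: 'node: import', pattern: /from\s+['"]node:|require\(\s*['"]node:/ },
];

// Returns the names of all rules the generated spec violates; an empty
// array means the spec may proceed to the lint and review steps.
function findDeniedPatterns(source: string): string[] {
  return DENY_RULES.filter((rule) => rule.pattern.test(source)).map((rule) => rule.name);
}
```

A regex guard like this is a coarse filter, which is consistent with the commit's statement that the human spec-review gate remains the real safeguard.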
Adds a headless authoring path (yarn verify-pr-author) that consumes the
v3 prompt bundle and dispatches Claude directly via @anthropic-ai/sdk
(single-block prompt caching on guide + canonical smoke). Skill and CI
script share scripts/verify/recipe-author-core.ts so they cannot drift.
Three lanes:
- Lane A — scripts/verify/agent-dispatch.ts (SDK + MODEL_ID_MAP +
transport retry + stub mode + DEBUG redaction), recipe-author-core.ts
(TOCTOU -> dispatch -> deny-regex -> D8 header -> lint -> regex ->
categorize + retry), verify-pr-author.ts CLI with --dispatch-mode
{sdk|stdin} and --retry-of (D4-α EX_TEMPFAIL=75 sentinel),
recipe-retry-policy.ts extension (categorizeEslintViolations +
formatRetryMessage), three stub fixtures, @anthropic-ai/sdk 0.65.0
exact-pinned.
- Lane B — .github/workflows/verify-pr.yml flipped from if:false to
label-gated (ci:verify) + !draft + actor-permission-action; Generate
bundle + Author recipe steps added on bare runner with
ANTHROPIC_API_KEY scoped to Author recipe env only; spec-runner
container keeps --network=none and never sees the key; proxy.sock
mount removed (Envoy deferred to v5). SECURITY.md Phase-2 section +
README two-paths section.
- Lane C — scripts/verify/lint-invocation.ts wrapper (eslint via
require.resolve('eslint/package.json') + bin/eslint.js, --no-eslintrc
--no-ignore --resolve-plugins-relative-to repo-root); D3-E dedicated
recipe eslintrc (parserOptions.project:false, non-typed recommended,
argsIgnorePattern:'none'); SKILL.md Step 8 rewritten for the D4-α
retry contract.
Verification (10 acceptance criteria):
- AC-V4-2/4/5/6/7a/7b/8/9/10 PASS end-to-end against the existing v3
bundle. AC-V4-1 and AC-V4-3b gated on a live ANTHROPIC_API_KEY (CI
verification mandatory; local optional). AC-V4-3a passes 9/9
buildAnthropicRequest shape checks.
- AC-V4-7a SHA-256 parity: stdin + sdk paths produce byte-identical
specs (D8 header generatedAt pinned to bundle.metadata.generatedAt).
- AC-V4-9 redaction: dispatch-request.json contains no x-api-key /
authorization / sk-ant- substrings.
- AC-V4-10 retry: stdin attempt 1 exits 75 with framed retry block +
result.partial.json; stdin --retry-of <runId> attempt 2 exits 0 with
attempts=2.
scripts/verify-pr.ts (runner) untouched (frozen this increment). Envoy
credential-injector and author_association gating deferred to v5.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
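AC-V4-9 above requires that dispatch-request.json contain no x-api-key, authorization, or sk-ant- substrings. One way such DEBUG redaction could work is sketched below; the function name `redact`, the placeholder text, and the recursive approach are assumptions, not the code in agent-dispatch.ts:

```typescript
const SENSITIVE_KEYS = new Set(['x-api-key', 'authorization']);
// Assumed key shape: the `sk-ant-` prefix followed by token characters.
const API_KEY_PATTERN = /sk-ant-[A-Za-z0-9_-]+/g;

// Recursively copies a JSON-like value, dropping sensitive header keys
// entirely and masking anything that looks like an API key inside strings.
function redact(value: unknown): unknown {
  if (typeof value === 'string') return value.replace(API_KEY_PATTERN, '[REDACTED]');
  if (Array.isArray(value)) return value.map(redact);
  if (value !== null && typeof value === 'object') {
    const out: Record<string, unknown> = {};
    for (const [key, inner] of Object.entries(value as Record<string, unknown>)) {
      // Drop the key as well as the value so the header name never appears.
      if (SENSITIVE_KEYS.has(key.toLowerCase())) continue;
      out[key] = redact(inner);
    }
    return out;
  }
  return value;
}
```

Dropping the header keys outright, instead of masking their values, is what lets a plain substring check like the one in AC-V4-9 pass.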
Resolve placeholder SHAs for the four third-party actions in .github/workflows/verify-pr.yml to commit SHAs of their latest stable releases. Required activation gate before the harness can fire in CI.
- prince-chrismc/check-actor-permissions-action: v3.0.2
- actions/checkout: v6.0.2
- actions/upload-artifact: v7.0.1
- actions/github-script: v9.0.0
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The previous map pointed every entry at claude-opus-4-5-20250929, which returns 404 from the Anthropic API. Update to current public IDs:
- claude-opus-4-7[1m] / claude-opus-4-7 → claude-opus-4-7
- claude-opus-4-6 → claude-opus-4-6
- claude-opus-4-5 → claude-opus-4-5-20251101 (correct snapshot)
Update MODEL_MAX_TOKENS keys to match. Verified live AC-V4-1 (spec written) and AC-V4-3b (cache_read_input_tokens=4358 >= 1024) against PR storybookjs#34761.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
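The alias mapping in this commit can be pictured as a plain lookup table. The entries below are copied from the commit message; the `resolveModelId` helper and its fail-closed behaviour on unknown aliases are assumptions about how such a map might be consumed:

```typescript
// Alias -> API model ID, as listed in the commit message above.
const MODEL_ID_MAP: Record<string, string> = {
  'claude-opus-4-7[1m]': 'claude-opus-4-7',
  'claude-opus-4-7': 'claude-opus-4-7',
  'claude-opus-4-6': 'claude-opus-4-6',
  'claude-opus-4-5': 'claude-opus-4-5-20251101',
};

// Hypothetical helper: reject unknown aliases up front instead of letting
// the API answer with a 404 after the request has been sent.
function resolveModelId(alias: string): string {
  // Own-property check avoids matching inherited keys such as "toString".
  if (!Object.prototype.hasOwnProperty.call(MODEL_ID_MAP, alias)) {
    throw new Error(`Unknown model alias: ${alias}`);
  }
  return MODEL_ID_MAP[alias];
}
```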
Two activation-blocking bugs surfaced by A4 label-fire test on valentinpalkovic/storybook fork:
1. Generate bundle step failed with "Couldn't find the node_modules state file" because the workflow never ran yarn install after checkout. Add the standard `./.github/actions/setup-node-and-install` composite step between Checkout and Fetch PR diff.
2. Post PR comment hard-failed with ENOENT on `.verify-output/latest/verify-result.json`. The harness writes timestamped dirs and never creates a `latest` symlink, so the path was wrong on every run, not just failures. Replace with a sort-newest-first scan of `.verify-output/*/verify-result.json` and degrade gracefully when no verdict exists (workflow failed before harness ran), so the comment always posts a useful status.
Remaining gap: `Run harness in container` step references `verify-harness:pinned-sha` which has no Dockerfile in repo and is not built anywhere in the workflow. Tracked as next activation gate.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
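The newest-first scan with graceful degradation described in item 2 might look roughly like this. It assumes the timestamped run directory names sort chronologically as strings; `pickLatestRun` and `findLatestVerdictPath` are hypothetical names, and the workflow's actual comment script is not shown in the PR text:

```typescript
import { existsSync, readdirSync } from 'node:fs';
import { join } from 'node:path';

// Pure core: given run directory names and a predicate, return the newest
// run that actually has a verdict, or null when none does.
function pickLatestRun(runDirs: string[], hasVerdict: (dir: string) => boolean): string | null {
  const newestFirst = [...runDirs].sort().reverse();
  return newestFirst.find(hasVerdict) ?? null;
}

// Filesystem wrapper: null means "no verdict exists", which the caller can
// render as a status message instead of failing with ENOENT.
function findLatestVerdictPath(outputRoot: string): string | null {
  if (!existsSync(outputRoot)) return null;
  const runDirs = readdirSync(outputRoot, { withFileTypes: true })
    .filter((entry) => entry.isDirectory())
    .map((entry) => entry.name);
  const verdictFile = (dir: string) => join(outputRoot, dir, 'verify-result.json');
  const latest = pickLatestRun(runDirs, (dir) => existsSync(verdictFile(dir)));
  return latest === null ? null : verdictFile(latest);
}
```

Returning null instead of throwing is the piece that lets the PR comment step always post something useful, even when the workflow failed before the harness ran.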
A4 label-fire test on fork run #25673185333 failed at Generate bundle with "command not found: bun". The verify-pr-generate and verify-pr yarn scripts (in package.json:40,42) invoke bun directly. The composite setup-node-and-install action provisions Node/Yarn but not Bun, so add oven-sh/setup-bun pinned to v2.2.0 between Node setup and Fetch PR diff. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Upload-artifact v7 emitted "No files were found with the provided path: .verify-output/*/" on A4 run #25673778823 despite the dir existing. The trailing-slash dir-glob isn't accepted as a file pattern in v7. Replace with the directory path, which uploads the whole tree. Add explicit `if-no-files-found: warn` so future glob drift surfaces as a warning rather than silent zero artifacts. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
A4 run #25674121554 confirmed .verify-output/ exists with the prompt bundle inside, but upload-artifact v7 silently skipped it because the default include-hidden-files: false rejects dot-prefixed paths. Set include-hidden-files: true. Drop the temporary debug step. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes the v5-0 gap where the `Run harness in container` workflow step
referenced `verify-harness:pinned-sha` with no Dockerfile in repo. The
harness can now produce verdicts in CI.
Implementation follows the ralplan-approved design at
`.omc/plans/v5-0-dockerfile.md` (4 iterations to consensus APPROVE from
Architect + Critic under DELIBERATE mode).
What lands:
- `scripts/verify/Dockerfile` — multi-stage build pinned by SHA digest
(Playwright v1.58.2-jammy base + Bun 1.3.0-slim via `COPY --from=`).
Pre-bakes node_modules + code/core/dist + react-vite/default-ts sandbox
so the runtime container can satisfy `--read-only` + `--network=none`.
Corepack is bypassed — yarn invoked directly via `node $YARN_BIN`.
Bakes `HEAD_SHA` for runtime drift detection.
- `scripts/verify/harden-build-context.sh` +
`scripts/verify/strip-lifecycle-scripts.mjs` — supply-chain hardening
that runs on the bare runner before `docker build`. Overlays trusted
`.dockerignore` / `.yarnrc.yml` / `.yarn/releases/` from base-sha,
strips lifecycle scripts from every workspace `package.json`,
normalises `packageManager`, deletes head-supplied `.npmrc`,
diff-asserts `Dockerfile` byte-identity. Walker is hardened with
symlink-skip, max-depth, 1 MB file-size cap, 60s timeout, and
prototype-chain hygiene.
- `.github/workflows/verify-pr.yml` — adds `Checkout PR head`,
`Spec precheck`, `Harden build context`, `Build harness image`
(with per-PR cache scope), `Smoke test image` (digest fail-closed),
`Run harness in container` (named container), and `Mirror tmpfs
output` (no `|| true` on the load-bearing copy).
- `.github/actions/verify-spec-precheck/action.yml` — extracted
composite action so v5-1's first-time-use UX can swap internals
without touching the workflow shape.
- `scripts/verify/core.ts` — adds `writeRegressionResult()` helper plus
optional `regressionReason` / `inContainer` / `imageDigest` /
`headSha` fields (schemaVersion unchanged).
- `scripts/verify-pr.ts` — honours `VERIFY_HARNESS_IN_CONTAINER=1` at
every sandbox-prep call site; rejects `--resync` in-container;
asserts `HEAD_SHA` via the new helper, warn-and-skip when
`VERIFY_HARNESS_EXPECTED_HEAD_SHA` is unset (laptop dev mode).
- `scripts/verify/playwright.config.ts` — chromium-only projects.
- `renovate.json` — tracks Playwright + Bun digests on weekly schedule.
- `scripts/verify/SECURITY.md` § Image-build provenance — documents
every supply-chain control plus the residual `GITHUB_TOKEN`-in-buildx
risk and its v5-1 job-split mitigation.
- `scripts/verify/RUNBOOK.md` — diagnosis playbook for failure signals.
- `scripts/verify/__tests__/` — four integration tests covering the
short-circuit, sandbox-root env, head-sha assertion, and hadolint.
Known residual risk (documented in SECURITY.md, deferred to v5-1):
`GITHUB_TOKEN` remains in the buildx daemon's process env on the
build step. The mitigation stack (lifecycle-script stripping,
`enableScripts: false`, `.npmrc` purge, corepack bypass, per-PR cache
scope, Dockerfile byte-identity) defends `yarn install` against
head-controlled code execution. v5-1 splits into prep + harness jobs
with `permissions: {}` on the harness job to eliminate this surface.
TODOs flagged in source:
- `timeout-minutes: 30` is a placeholder; AC-V5-0-2 cold-build
measurement is required before final lock-in.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
actions/checkout's submodule-foreach cleanup pass aborts with exit 128 on this repo because of orphan gitlinks under `.external/` that have no matching entries in any (missing) `.gitmodules` file. The base-sha checkout escapes this because it doesn't pass `persist-credentials: false` (the cleanup phase that runs `git submodule foreach` is gated on needing to scrub credentials). The PR-head checkout did set the flag for the v5-0 untrusted-context posture and hit the gitlinks/no-modules mismatch. Replace `actions/checkout@v6.0.2` for the PR-head step with a manual `git clone --no-tags --no-checkout --filter=blob:none` followed by a single-sha fetch and checkout. Strip the cached credential helper + rewrite the remote URL to drop the token afterwards. Net posture is equivalent to `persist-credentials: false` and `.git/` is excluded from the docker build context by `.dockerignore`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…xtraction
Build failure on fork firetest run 25684557942: playwright:v1.58.2-jammy ships Node 24.13.0 (not 22.22.1), so the conditional Node re-install always fires; but the base image is missing `xz-utils`, so `tar -xJf` on the .tar.xz tarball aborts with "xz: Cannot exec: No such file or directory". Add an apt-get install of xz-utils + ca-certificates inside the same RUN block, gated on the same version-mismatch conditional so a future Playwright base that already ships Node 22.22.1 skips the apt fetch entirely (resolves the apt-vs-probe trade-off in OQ-V5-0-E).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Build failure on fork firetest PR #3 run 25685301642: playwright base image ships `pwuser` at UID/GID 1000, so the unconditional `groupadd --gid 1000` aborts with "GID '1000' already exists". Guard the group and user creates with getent/id probes so the layer is idempotent across base-image variants that may or may not ship the user. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…_BIN
Yarn 4 treats any YARN_<KEY> env var as a config setting, so 'YARN_BIN' was being parsed as a 'bin' config key and rejected with 'Unrecognized or legacy configuration settings found: bin'. Rename the variable to HARNESS_YARN_BIN throughout the Dockerfile and matching docs.
…al target
scripts/package.json depends on eslint-plugin-local-rules via portal:
specifier. Yarn install fails ('Manifest not found') unless the portal
target directory is present in the build context. Add a COPY for the
whole scripts/eslint-plugin-local-rules/ directory in stage 2 so yarn
can resolve the portal manifest and 'yarn lint' has the rule files at
runtime.
WORKDIR /opt/verify-harness/repo runs as root, leaving the directory root-owned even though COPY --chown=1000:1000 sets file ownership. Yarn 4 runs as uid 1000 (USER 1000:1000) and fails the link step with EACCES while creating node_modules/. Add an explicit chown of the workdir before the USER switch so yarn can create node_modules and persist the link tree.
… task
scripts/utils/cli-step.ts resolves dist/bin/index.js for both cli-storybook and create-storybook at module-eval time. Stage 3 only compiled 'core', so the sandbox task failed with MODULE_NOT_FOUND. Expand the nx target list to compile core + cli-storybook + create-storybook before the sandbox bootstrap step runs.
…torybook
The nx project name in code/lib/cli-storybook/project.json is 'cli' (package name '@storybook/cli'), not 'cli-storybook'. The previous nx run-many list silently dropped the unknown target, so the cli package was never compiled and the sandbox task still failed with MODULE_NOT_FOUND for cli-storybook/dist/bin/index.js.
… index.js
code/core/dist has no top-level index.js: the bundle is split into per-entry-point subdirectories (preview-api/, manager-api/, etc.) plus the bin script at dist/bin/dispatcher.js (declared in core package.json#bin). Update the sandbox dist sanity check to verify the dispatcher bin file instead.
…overlay
'yarn task sandbox' runs run-registry -> publish, which packs and republishes packages through verdaccio. The publish step can churn code/core/dist, so by the time stage 3.d tries to cp it the directory no longer exists. Re-run 'nx compile core' immediately before the cp to guarantee the freshly-built artifact from the PR head is in place, and fail loudly with a directory listing if it is somehow still missing.
…caffolding
Eleven v5-0 firetest rounds confirmed the Dockerfile architecture is asymmetrically over-engineered: ~70% of the complexity (digest pins, harden-build-context overlay, lifecycle-script stripping, Verdaccio publish pipeline, BuildKit cache scope, smoke-test sentinel) addresses supply-chain threats that `enableScripts: false` + lockfile + `.npmrc` purge already mitigate. Runtime isolation (the threat the doc actually calls out for CI/CD) was weakly addressed (no `--cap-drop ALL` / `--network=none` / `--read-only` / `--tmpfs` in places it would matter). BuildKit also proved fragile: `code/core/dist` kept disappearing between stages 6 and 7 despite multiple recompile attempts.
v6 drops the container and accepts the same isolation profile as the existing Storybook PR CI (ephemeral GitHub Actions runner). The recipe author step keeps `ANTHROPIC_API_KEY` scope-limited to one step; the committed-spec runner remains the lethal-trifecta breaker.
If the threat model later expands to processing third-party PRs at scale with adversarial recipe authors, sandbox-runtime (bubblewrap on Linux) wraps just the playwright step in ~10 lines of config per Anthropic's "Securely deploying AI agents" doc. Do not reintroduce the full Docker + Verdaccio stack.
Deletes:
- scripts/verify/Dockerfile
- scripts/verify/harden-build-context.sh
- scripts/verify/strip-lifecycle-scripts.mjs
- scripts/verify/SECURITY.md
- scripts/verify/__tests__/dockerfile-lint.test.ts
- scripts/verify/__tests__/head-sha-assertion.test.ts
- scripts/verify/__tests__/in-container-shortcircuit.test.ts
Strips v5-0 additions from .dockerignore (preserves the pre-existing sensitive-path entries) and removes the Dockerfile-pin rules from renovate.json.
The workflow yml + verify-pr.ts still reference the container path; both get rewritten in the subsequent v6 commits.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ispatch
Rewrites scripts/verify-pr.ts around two execution targets selected per recipe via a `// @verify-target:` header comment (scanned in the first 30 lines of the spec):
internal-ui (default)
Builds code/storybook-static once via `yarn storybook:ui:build`, then serves it on the requested port via `yarn http-server`. The fast path for fixes that exercise the monorepo's own UI against the PR head's compiled packages: no sandbox bootstrap, no verdaccio publish, no docker.
sandbox:<template>
Pre-existing sandbox flow: snapshotSandbox, sanitizeResolutions, syncCorePackage (symlink code/core/dist into the sandbox), then bootStorybook. Use only when a fix is template-specific (rare).
Also:
- Adds a positional <PR#> argument so `yarn verify-pr 34762` resolves to `.verify-recipes/pr-34762.spec.ts` automatically. The explicit `--recipe-spec <path>` flag still works and takes precedence.
- Drops every `VERIFY_HARNESS_IN_CONTAINER` short-circuit, the `/opt/verify-harness/HEAD_SHA` runtime assertion, and the `VERIFY_HARNESS_EXPECTED_HEAD_SHA` env-var contract. The container paths no longer exist.
- Drops the imageDigest / inContainer / headSha fields from VerifyResult writes and from writeRegressionResult's options. The fields stayed optional in the schema for backward-compat readers but are no longer populated.
- Widens VerifyResult.template from the `'react-vite/default-ts'` literal to `string` so the field can carry `'internal-ui'` and arbitrary sandbox templates.
- Switches the root `verify-pr` script from `bun scripts/verify-pr.ts` to `node ./scripts/verify-pr.ts`. verify-pr.ts no longer imports any of the non-erasable enum chain from cli-storybook, so Node 22.22.1's native TS strip is sufficient. The Playwright runner still spawns `bun x playwright test` internally because the recipe specs live under .verify-recipes/ and load through Playwright's own worker process, not through verify-pr.ts.
- --resync now only applies to sandbox-target recipes (the internal-ui build is fast enough that --resync would add no value); the script exits with an actionable error if --resync is passed for an internal-ui recipe.
- New: scripts/verify/target.ts (header parser, default = internal-ui).
- New: scripts/verify/internal-ui.ts (storybook:ui:build + http-server boot, waitOn :port/index.html).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
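The header parsing described for scripts/verify/target.ts (scan the first 30 lines for `// @verify-target:`, default to internal-ui) could be sketched as below. The `VerifyTarget` type, the regex, and the choice to throw on an unrecognized value are assumptions; only the header syntax, the 30-line window, and the default come from the commit message:

```typescript
type VerifyTarget =
  | { kind: 'internal-ui' }
  | { kind: 'sandbox'; template: string };

const HEADER_SCAN_LINES = 30;
const TARGET_HEADER = /^\s*\/\/\s*@verify-target:\s*(\S+)\s*$/;
const SANDBOX_PREFIX = 'sandbox:';

// Reads the target from a spec's header comment. Only the first 30 lines
// are inspected; a spec without the header falls back to internal-ui.
function parseVerifyTarget(specSource: string): VerifyTarget {
  const head = specSource.split(/\r?\n/).slice(0, HEADER_SCAN_LINES);
  for (const line of head) {
    const match = TARGET_HEADER.exec(line);
    if (!match) continue;
    const value = match[1];
    if (value === 'internal-ui') return { kind: 'internal-ui' };
    if (value.startsWith(SANDBOX_PREFIX) && value.length > SANDBOX_PREFIX.length) {
      return { kind: 'sandbox', template: value.slice(SANDBOX_PREFIX.length) };
    }
    // Assumed behaviour: a malformed header is an error, not a silent default.
    throw new Error(`Unrecognized @verify-target value: ${value}`);
  }
  return { kind: 'internal-ui' };
}
```

For example, a spec starting with `// @verify-target: sandbox:vue3-vite/default-ts` would resolve to the sandbox kind with that template, while a spec with no header resolves to internal-ui.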
Replaces the v5-0 Docker pipeline (harden build context, buildx, image
build, smoke test, docker run with --network=none / --cap-drop ALL /
--read-only / tmpfs, docker cp mirror) with a single shell step that
runs the harness directly on the GitHub Actions runner.
Verify step (only when `verify-spec-precheck` reports the committed spec
exists at the PR head):
set -euo pipefail
yarn install --immutable
yarn nx run-many -t compile -p core,cli,create-storybook
yarn verify-pr --recipe-spec ".verify-recipes/pr-${PR_NUMBER}.spec.ts"
The internal-ui target (default) builds code/storybook-static once and
serves it via http-server. Sandbox targets follow the pre-existing
snapshot + sanitize + sync + boot flow. The recipe header chooses.
New: a `Read verdict` step parses `pr-head/.verify-output/*/verify-result.json`
and an `Apply verified-by-harness label` step adds the label to the PR
when the verdict is `verified`. The PR comment script renders the same
two-state not-applicable message and a verified-vs-regression block,
but drops the imageDigest reference (no longer populated by v6).
Permissions: pull-requests + issues (label add) + statuses. The
ANTHROPIC_API_KEY remains scoped to the `Author recipe` step only;
nothing downstream of that step ever sees the secret. The committed
spec under review is still the lethal-trifecta breaker — the runner
executes only what was committed and reviewed at the PR head.
Workflow surface dropped:
- Harden build context (./scripts/verify/harden-build-context.sh)
- Set up Docker Buildx
- Build harness image (docker/build-push-action + cache-to/from)
- Capture image digest
- Smoke test image
- Run harness in container
- Mirror tmpfs output to runner workspace
Net delta: -78 lines, +33 lines (-45 net) on verify-pr.yml.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Rewrites the harness documentation around the v6 local-first
architecture. The Docker / Verdaccio / image-build-provenance sections
are dropped; new sections cover the per-recipe target dispatch
(`internal-ui` vs `sandbox:<template>`) and the runner-native CI
workflow.
scripts/verify/README.md
* Architecture diagram updated to list target.ts + internal-ui.ts.
* Flag table adds the positional <PR#> sugar and clarifies that
`--resync` and `--restore-sandbox` only apply to sandbox-target
recipes.
* `verify-result.json` example uses `template: "internal-ui"`.
* Prerequisites section calls out Node 22 (native TS-strip) as the
primary runtime; Bun is needed only by the Playwright runner.
* Side-effects section narrows to the sandbox target.
* CI section documents the new yaml shape.
* Drops the "Running inside the verify-harness container" section
in its entirety.
scripts/verify/RUNBOOK.md
* Full rewrite around the two flows: local AI fix-loop +
GH Actions runner. Drops every Docker / buildx / harden-script /
smoke-sentinel signal. Adds signals specific to v6:
bootInternalUi timeout, --resync rejected, sandbox bootstrap
missing, not-applicable verdict, label-step skipped, github-script
verdict-read failure.
scripts/verify/SECURITY.md (recreated)
* Brief threat-model note (~70 lines instead of 250+). Restates the
eight load-bearing controls (committed-spec review, scoped API
key, deny-regex, lint gate, provenance header, actor permission,
label gate, repo-wide deny rules) and explains why v6 dropped the
container without weakening the trifecta breakers. Notes the
sandbox-runtime path as the next-step option if the threat model
expands.
.verify-recipes/_recipe-authoring-guide.md
* New §12 "Target selection (v6)" documenting the
`// @verify-target:` header convention. Renumbers existing §12
"Output budget" to §13.
.verify-recipes/example-smoke.spec.ts
* Adds the explicit `// @verify-target: internal-ui` header as the
canonical baseline example.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…step
The Verify PR step runs 'yarn nx run-many -t compile -p core,cli,create-storybook' inside the pr-head/ checkout. When the workflow runs on a fork without Nx Cloud org access (e.g. the v6 firetest on valentinpalkovic/storybook), nx aborts with:
NX Cloud: Workspace is unable to be authorized. Exiting run.
This Nx Cloud organization is disabled.
The Verify step doesn't need distributed cache for correctness: a clean compile against the PR head is the whole point. Force-disable Nx Cloud (NX_NO_CLOUD=true + empty access token) on the step's env block.
Upstream storybookjs/storybook CI is unaffected: other workflow steps (Generate bundle, Author recipe) that already rely on Nx Cloud auth continue to use it; only the Verify step opts out.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ternal-ui
The Verify PR step previously ran:
yarn nx run-many -t compile -p core,cli,create-storybook
This is sufficient for sandbox-target recipes (the sandbox already has
its own node_modules) but not for the internal-ui target. The
internal-ui build invokes 'yarn storybook:ui:build' in code/, which
loads code/.storybook/main.ts, which imports @storybook/react-vite
plus every addon (addon-onboarding, addon-themes, addon-docs,
addon-designs, addon-vitest, addon-a11y, addon-mcp) and transitive
renderer + builder packages. None of those are compiled by the
narrower filter, so .storybook/main.ts evaluation fails with
ERR_MODULE_NOT_FOUND.
Drop the project filter and let nx compile every project. Slower per
run but correct for the default target. Sandbox-target recipes are
unaffected — the same compile output is reused under the
syncCorePackage symlink path.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…r eslint-plugin topo
eslint-plugin's prebuild script (code/lib/eslint-plugin/scripts/...) imports from 'storybook/csf' via jiti. nx run-many parallelises 42 projects without enforcing the compile-order edge (the repo lacks an explicit dependsOn: ['^compile'] for that target), so eslint-plugin runs before core finishes and the import resolves upward through the parent base checkout's node_modules/storybook symlink → which has no dist/csf yet → ERR_MODULE_NOT_FOUND.
Compile core first explicitly, then run-many for everything else. nx caches the core build so the second pass is a no-op for that target.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Verify Harness
Verdict: Evidence (after 1 retry) (vision-check,
Vision reasoning: The diff is a pure refactoring of internal TypeScript code in the svelte-vite plugin (extracting helper functions for cache management). While the Playwright recipe asserts that the ArgsTable renders correctly with prop rows visible, the screenshots show the UI working as expected, which is consistent with the refactoring being functionally correct. However, the diff itself contains no user-visible changes: it is purely a code organization/extraction refactor with no observable UI differences from before/after.
Replay: Screenshots
Verify Harness
Verdict: Evidence (after 1 retry) (vision-check,
Vision reasoning: The diff is a pure refactor of internal build-time docgen cache helper functions (extracting repeated logic into three new helper functions). This has no user-visible UI changes; the refactoring only reorganizes how the cache is accessed/updated internally. The Playwright recipe correctly verifies that the refactored code still produces correct docgen output by checking that prop rows (label, primary) appear in the Controls panel, which they do in the screenshots, but this is a functional correctness test rather than a visual change detection test.
Replay: Screenshots
…x-target CI, Layer-1/Layer-2 security, retry on regression, telemetry
Squash of fork-side iteration on top of the single-round v6 pivot. Major changes since 00aa5c4:
## Verdict layering
- Three orthogonal signals: Playwright (recipe execution) + vision evidence-check (claude-haiku-4-5 reading the diff + spec + screenshots) + PR-added unit tests (vitest on *.test.* files from the PR diff).
- Final verdict gates on AND of Playwright + unit tests. Vision is informational (catches sr-only / invisible changes where assertions pass but screenshots can't confirm).
- regressionReason is derived from playwright-report.json when the recipe author doesn't populate one; reviewers see the failing test title + first error inline.
## Retry loop
- Retry-on-regression: feeds Playwright error context (page snapshot + iframe a11y snapshot + first error from playwright-report.json) back to the recipe author as --retry-context. Author re-emits the spec, Playwright re-runs. Single retry; final verdict gates label.
- Retry-on-evidence-undetermined: feeds vision reasoning back so the author can target the diff more precisely (e.g., tighter screenshot region).
## Sandbox-target CI path
- Recipes can set `// @verify-target: sandbox:<template>` (e.g., `sandbox:vue3-vite/default-ts`). The workflow detects the header, runs `nx run <template>:sandbox` (NX resolves implicitDependencies, emits the sandbox at code/sandbox/<key>), and verify-pr.ts boots Storybook against that sandbox instead of the internal-ui dev server.
- Allowlisted templates: react-vite, react-webpack, vue3-vite, svelte-vite, angular-cli, nextjs, nextjs-vite (all default-ts).
- Skips the global `compile` target when sandbox-bound: `:sandbox` handles all transitive deps via the NX project graph.
## Layer-1 security: secret stripping
- pull_request_target runs build / sandbox / recipe code from the untrusted PR head as the runner user that holds GITHUB_TOKEN (contents:write, pull-requests:write) and ANTHROPIC_API_KEY.
- The Verify-PR, Retry-on-regression, and Run-PR-added-unit-tests steps
  `unset GITHUB_TOKEN GH_TOKEN ANTHROPIC_API_KEY` before invoking any
  PR-head script. Trusted scripts above (verify-pr-generate,
  verify-pr-author) still see the keys because env -u (or env --unset on
  the inner command) only strips for the single command.

## Layer-2 security: @anthropic-ai/sandbox-runtime jail
- Wraps `yarn verify-pr` (initial attempt + retry) in srt with a
  bubblewrap-backed FS + network jail. Defence-in-depth on top of
  Layer-1.
- network.allowLocalBinding: true (Storybook dev server on
  localhost:6006); network.allowedDomains: [] (no public-internet
  egress).
- filesystem.allowWrite: $RUNNER_TEMP, /tmp, $HOME/.cache,
  $HOME/.local/share, $HOME/.storybook.
- filesystem.denyRead: $HOME/{.ssh, .aws, .docker, .npmrc, .gitconfig,
  .config/gh} (belt-and-suspenders alongside the env stripping).
- CLAUDE_CODE_TMPDIR=$RUNNER_TEMP/sandbox-tmp so the sandbox's TMPDIR
  bind source exists on the host.

## Recipe-author quality
- Deterministic story-route derivation:
  scripts/verify/derive-story-routes.ts parses code/.storybook/main.ts
  via TS AST + inlines Storybook's auto-title / toId /
  storyNameFromExport algorithms. Routes injected into the prompt bundle
  verbatim, so agents stop guessing 404 paths.
- Full source of touched non-stories files in the prompt bundle (capped
  250 lines per file, 4 files per PR). Agents see actual component
  props / ariaLabels / data-attrs upfront.
- Iframe a11y snapshot fixture in _util.ts: on test failure, writes the
  preview-iframe's body.ariaSnapshot() to iframe-snapshot.md. Retry step
  appends this alongside the manager page-snapshot.
- Authoring guide §8.1 expanded with evidence requirement + four-step
  evidence gate + worked examples (focus-ring, Save-from-Controls icon
  swap, sr-only label gating).
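
The Layer-1 secret stripping described above can be pictured as building a cleaned environment for the one command that runs PR-head code. The following is a minimal sketch only: `stripSecrets` and `SECRET_VARS` are hypothetical names introduced for illustration, and the actual workflow strips the variables in shell (`unset` / `env -u`) rather than in TypeScript.

```typescript
// Hypothetical helper, not the workflow's real code: returns a copy of an
// environment map with the secret variables named in the commit message removed.
const SECRET_VARS = ["GITHUB_TOKEN", "GH_TOKEN", "ANTHROPIC_API_KEY"] as const;

type Env = Record<string, string | undefined>;

export function stripSecrets(
  env: Env,
  secrets: readonly string[] = SECRET_VARS,
): Env {
  const cleaned: Env = {};
  for (const [key, value] of Object.entries(env)) {
    // Copy every variable except the ones on the secret list.
    if (!secrets.includes(key)) cleaned[key] = value;
  }
  return cleaned;
}
```

A caller would pass the result as the `env` option of `node:child_process.spawn` for the untrusted command only, which mirrors the per-command scope of `env -u`: the parent process, and any trusted script it runs, keeps the original variables.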
## Compile-failure surfacing
- When `nx compile` fails before Playwright runs, the workflow writes a
  stub verify-result.json with verdict=regression, regressionReason=
  "compile failure", regressionDetails=tail -c 4000 of the log
  (ANSI-stripped). PR comment renders the build error in-line so
  reviewers see WHY without downloading artifacts.

## UX polish
- Vision reasoning collapsed inside a <details> block (verdict stays
  one-glance, reasoning one click away).
- PR comment unitTests block renders ✅/❌ alongside Playwright + vision
  so reviewers see all three signals together.
- Artifact zip staged under non-dot dirs so reviewers can browse it
  without toggling Finder's hidden-file display.
- Replay link points at the run-summary page (where the Artifacts
  section lives) instead of the 404-emitting /artifacts path.

## Telemetry
- New "Append telemetry" workflow step writes one CSV row per run to
  telemetry.csv on the _verify-screenshots side branch. Columns: run_id,
  pr_number, verdict, target, evidence_verdict, evidence_retry,
  unit_tests_ran, unit_tests_passed, duration_ms, timestamp. After 10–20
  PRs the data drives v8 prioritisation (in-app role discovery, 2-retry
  budget, cross-package story heuristic, etc.).

## Validation
Firetest PRs (fork-side):
- #12 internal-ui smoke: verified
- #13 Save-from-Controls icon swap: verified + evidence found
- #14 ObjectControl raw JSON sr-only label: verified after retry
- #15 ArgsTable dark-mode border: regression (genuine compile-fail)
- #16 sidebar focus ring: verified, three signals positive
- #17 Vue3 page-style scoping (sandbox target): verified + found
- #18 Svelte docgen refactor (sandbox target): verified
- #21 Angular stats.json (sandbox target): verified

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
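
As an illustration of the compile-failure surfacing above, the stub result could be assembled roughly as follows. This is a sketch under stated assumptions: `buildCompileFailureStub` is a hypothetical name, the truncation counts characters rather than the bytes that `tail -c 4000` counts, and stripping ANSI sequences before truncating is an assumed ordering. Only the three field names and the "compile failure" reason come from the commit message.

```typescript
// Hypothetical sketch of the stub verify-result.json payload.
interface StubResult {
  verdict: "regression";
  regressionReason: string;
  regressionDetails: string;
}

// Matches ANSI CSI escape sequences (ESC [ parameters, intermediates, final byte),
// which covers the colour codes build tools typically emit.
const ANSI_CSI = /\u001b\[[0-?]*[ -\/]*[@-~]/g;

export function buildCompileFailureStub(log: string, maxChars = 4000): StubResult {
  // Assumption: strip colour codes first, then keep only the tail of the log.
  const plain = log.replace(ANSI_CSI, "");
  return {
    verdict: "regression",
    regressionReason: "compile failure",
    // Character-based approximation of `tail -c 4000`.
    regressionDetails: plain.slice(-maxChars),
  };
}
```

Keeping only the tail reflects that build tools usually print the actual error last, so the end of the log is the part most worth rendering inline in the PR comment.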
ad75ba9 to 099b6f7

Cross-renderer sweep firetest for upstream PR storybookjs#34644. Validates Layer-2 sandbox-runtime + recipe authoring against the target framework template. Do not merge.