try: storybookjs/storybook#34767 — UndoIcon for Review-changes clear#13

Open
valentinpalkovic wants to merge 39 commits into next from try-pr-34767

Conversation

@valentinpalkovic
Owner

Cherry-pick of upstream b779fb7 (storybookjs#34767) onto fork's next so the v6 single-round verify harness can author + run a recipe against it.

Reference: storybookjs#34767

valentinpalkovic and others added 30 commits May 8, 2026 23:36
Single-template (react-vite/default-ts), single-story
(example-button--primary) PR verification entry script with 6 helpers
under scripts/verify/.

Flow: compile core -> symlink code/core/dist into NX-cached sandbox ->
boot Storybook on :6006 -> Playwright capture via SbPage from
code/e2e-tests/util.ts -> emit verify-result.json + iframe-clipped
screenshot under .verify-output/<runId>/.

Helpers:
- core.ts: types, run-path math, computeVerdict, pruneOldRuns(10)
- symlink.ts: lifted EPERM/EEXIST cp fallback from
  scripts/tasks/sandbox-parts.ts:43-79 + net-new dangling-symlink heal
- sandbox.ts: multi-base resolveSandboxDir (code/sandbox, sandbox,
  ../storybook-sandboxes, STORYBOOK_SANDBOX_ROOT override),
  snapshot/restore, sanitizeResolutions
- sync.ts: yarn nx compile core (run from repoRoot) + symlink dist
- boot.ts: cross-platform port preflight, idempotent SIGINT/SIGTERM
  handlers, dual wait-on iframe.html + index.html (uses
  node:child_process.spawn per repo lint policy)
- capture.ts: page.on('pageerror'/'console') registered before goto,
  iframe-clipped screenshot
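
The `pruneOldRuns(10)` helper is only named in the list above, not shown. A minimal sketch of the selection logic, assuming run IDs are ISO-like timestamps that sort lexicographically (as the `.verify-output/<runId>/` screenshot paths in this thread suggest). `selectRunsToPrune` is a hypothetical name, and the real helper presumably also deletes the directories it selects:

```typescript
// Hypothetical helper: given every run directory name, return the ones to
// delete so that only the `keep` newest remain.
// Assumption: run IDs sort chronologically when compared as strings.
function selectRunsToPrune(runIds: readonly string[], keep = 10): string[] {
  const newestFirst = [...runIds].sort().reverse();
  return newestFirst.slice(keep);
}
```

Keeping the selection pure leaves the filesystem deletion as a separate, easily audited step.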

Run via `yarn verify-pr` (uses bun for native TS exec — node
strip-types rejects transitive enums in cli/projectTypes.ts).

Verification:
- V-1 sanity: verdict=verified, ~8s wall-time (well under 90s SLO)
- V-2 regression: VERIFY_HARNESS_TEST sentinel detected at compile,
  exit 1

Plan: .omc/plans/pr-verify-poc-mvp.md
Research: .omc/research/research-20260508-prverify/report.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ning

Pivot from custom Chromium launch (capture.ts) to spawning `bun x playwright test`
against committed specs under `.verify-recipes/`. Trace artifacts are produced by
Playwright's built-in tracing API and replayable via `npx playwright show-trace`.
Schema bumped to v2 with per-test results, attached pageErrors/consoleErrors, and
trace paths sourced from the Playwright JSON report contract.

Adds Phase-1 security hardening: `.claude/settings.json` deny rules (local),
`.dockerignore` for credential exclusion, `SECURITY.md` with phase-gated threat
model and isolation matrix, and a gated `.github/workflows/verify-pr.yml`
(if: false) scaffolding the Phase-2 container/proxy shape.

Recipe-local `RecipePage` (`.verify-recipes/_util.ts`) reimplements only the
subset of `SbPage` needed for verify recipes — Playwright's Node worker
processes cannot strip the non-erasable TS enums reached transitively from
`code/e2e-tests/util.ts`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Allow overriding the Storybook port (default 6006) so the harness can run
alongside side-processes that already occupy 6006. baseURL, preflightPort,
bootStorybook, and the --resync alive-check are all threaded through the
resolved port. Validates that the value parses as an integer in 1..65535.
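
The range check described above could look roughly like the following. This is a sketch, not the harness code: `parsePort` is a hypothetical name, and falling back to 6006 when no value is supplied is an assumption based on the stated default.

```typescript
// Hypothetical parser: accepts only a base-10 integer in 1..65535.
// Assumption: an absent or empty value falls back to the default port.
function parsePort(raw: string | undefined, fallback = 6006): number {
  if (raw === undefined || raw === '') return fallback;
  // Reject signs, decimals, exponents and whitespace up front.
  if (!/^\d+$/.test(raw)) {
    throw new Error(`Invalid port "${raw}": expected an integer in 1..65535`);
  }
  const port = Number(raw);
  if (port < 1 || port > 65535) {
    throw new Error(`Invalid port "${raw}": expected an integer in 1..65535`);
  }
  return port;
}
```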

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a two-step flow on top of the v2 raw-Playwright runner:

  yarn verify-pr-generate --pr <#>      # emit prompt bundle
  Skill: verify-recipe-author           # dispatch executor, write spec
  yarn verify-pr --recipe-spec ...      # run committed spec

The generator script does deterministic I/O only — gh pr fetch, triage
routing (19 path globs in scripts/verify/recipes/triage-table.ts mapping
addon/manager/csf-tools/builder/framework/renderer changes to reference
specs under code/e2e-tests/), per-file 500-line cap with 20-file total cap
sorted triage-matched first, and prompt-bundle emission. The script never
dispatches an agent and never writes the final spec.
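
To make the capping rule concrete, here is a minimal sketch of "triage-matched first, 20 files total, 500 lines per file". The `ChangedFile` shape and the `capBundleFiles` name are assumptions for illustration; the generator's real data model is not shown in this commit message.

```typescript
// Hypothetical shape of one changed file in the PR diff.
interface ChangedFile {
  path: string;
  diffLines: string[];
  triageMatched: boolean; // true when the path hit one of the triage globs
}

const MAX_LINES_PER_FILE = 500;
const MAX_FILES = 20;

function capBundleFiles(files: readonly ChangedFile[]): ChangedFile[] {
  // Array.prototype.sort is stable, so files keep their original order
  // within the matched and unmatched groups.
  const ordered = [...files].sort(
    (a, b) => Number(b.triageMatched) - Number(a.triageMatched)
  );
  return ordered.slice(0, MAX_FILES).map((file) => ({
    ...file,
    diffLines: file.diffLines.slice(0, MAX_LINES_PER_FILE),
  }));
}
```

Sorting before slicing is what guarantees a triage-matched file is never dropped in favour of an unmatched one.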

The verify-recipe-author skill (.agents/skills/verify-recipe-author/SKILL.md
with redirect at .claude/skills/...) consumes the bundle, dispatches the
oh-my-claudecode:executor subagent (model=opus), runs a security deny-regex
guard (recipe-deny.ts: child_process, fs.unlink/rm, process.exit, eval,
node: imports), prepends a header-comment provenance block to the agent
output, writes .verify-recipes/pr-<#>.spec.ts, lints via
yarn --cwd code lint:js:cmd with one categorized retry (recipe-retry-policy.ts:
maxAttempts=2, errorCategories=[listener-before-goto, attach-pattern,
imports]), runs post-write regex checks for the listener-before-goto and
testInfo.attach pattern invariants, and emits result.json.
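
A deny-regex guard of the kind described above might be sketched as follows. The patterns are illustrative guesses at the five listed categories, not the contents of recipe-deny.ts, and `findDeniedPatterns` is a hypothetical name.

```typescript
// Illustrative patterns only; the real list lives in recipe-deny.ts.
const DENY_PATTERNS: ReadonlyArray<{ name: string; pattern: RegExp }> = [
  { name: 'child_process', pattern: /\bchild_process\b/ },
  { name: 'fs.unlink/rm', pattern: /\bfs\.(unlink|rm)\w*\s*\(/ },
  { name: 'process.exit', pattern: /\bprocess\.exit\s*\(/ },
  { name: 'eval', pattern: /\beval\s*\(/ },
  { name: 'node: import', pattern: /['"]node:[\w/]+['"]/ },
];

// Returns the names of every denied category found in the generated spec.
// An empty array means the spec may proceed to the lint step.
function findDeniedPatterns(source: string): string[] {
  return DENY_PATTERNS.filter(({ pattern }) => pattern.test(source)).map(
    ({ name }) => name
  );
}
```

A textual guard like this is easy to evade with string concatenation, which is consistent with the commit's point that the human-review gate stays in place.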

Spec-name collision = fail unless --force; the human-review gate from
v2's SECURITY.md is preserved (the skill never executes its output).

The authoring guide at .verify-recipes/_recipe-authoring-guide.md is the
agent's contract: import surface, listener-before-goto rule, attach
pattern, RecipePage API, what to avoid, story URL routing, and per-change
assertion shapes.
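
The listener-before-goto rule lends itself to a simple textual check. A rough sketch, assuming the rule means that both the `pageerror` and `console` listeners must appear before the first `.goto(` call; `listenersRegisteredBeforeGoto` is a hypothetical name and the harness's actual regex is not shown here.

```typescript
// Hypothetical post-write check on the generated spec source.
// Passes when both listeners are registered before the first `.goto(` call.
function listenersRegisteredBeforeGoto(spec: string): boolean {
  const gotoIndex = spec.search(/\.goto\s*\(/);
  if (gotoIndex === -1) return true; // no navigation, nothing to guard
  const beforeGoto = spec.slice(0, gotoIndex);
  return (
    /\.on\(\s*['"]pageerror['"]/.test(beforeGoto) &&
    /\.on\(\s*['"]console['"]/.test(beforeGoto)
  );
}
```

The ordering matters because errors thrown during the initial page load are lost if the listeners attach after navigation starts.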

Verification: structural ACs (V3-6, V3-7, V3-9, V3-10) pass via grep
against the new files; AC-V3-1 (generator exit 0 + bundle written +
next-step printed) and AC-V3-5 (committed spec runs end-to-end via
verify-pr, schema v2 verdict emitted with trace.zip and per-test
attachments) ran clean against PR storybookjs#34737 (manager-api/modules/stories.ts);
AC-V3-3/V3-4 (listener-before-goto + attach-pattern regex) and AC-V3-8
(deny-regex aborts on child_process) verified directly.

Phase-1 security model unchanged: spec-review gate is the lethal-trifecta
breaker; the bun script + skill make that gate easy to apply, but never
substitute for it. Phase-2 CI activation will require migration to a
direct Anthropic SDK call with API-key handling — tracked in the
SECURITY.md / README roadmap.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a headless authoring path (yarn verify-pr-author) that consumes the
v3 prompt bundle and dispatches Claude directly via @anthropic-ai/sdk
(single-block prompt caching on guide + canonical smoke). Skill and CI
script share scripts/verify/recipe-author-core.ts so they cannot drift.

Three lanes:
- Lane A — scripts/verify/agent-dispatch.ts (SDK + MODEL_ID_MAP +
  transport retry + stub mode + DEBUG redaction), recipe-author-core.ts
  (TOCTOU -> dispatch -> deny-regex -> D8 header -> lint -> regex ->
  categorize + retry), verify-pr-author.ts CLI with --dispatch-mode
  {sdk|stdin} and --retry-of (D4-α EX_TEMPFAIL=75 sentinel),
  recipe-retry-policy.ts extension (categorizeEslintViolations +
  formatRetryMessage), three stub fixtures, @anthropic-ai/sdk 0.65.0
  exact-pinned.
- Lane B — .github/workflows/verify-pr.yml flipped from if:false to
  label-gated (ci:verify) + !draft + actor-permission-action; Generate
  bundle + Author recipe steps added on bare runner with
  ANTHROPIC_API_KEY scoped to Author recipe env only; spec-runner
  container keeps --network=none and never sees the key; proxy.sock
  mount removed (Envoy deferred to v5). SECURITY.md Phase-2 section +
  README two-paths section.
- Lane C — scripts/verify/lint-invocation.ts wrapper (eslint via
  require.resolve('eslint/package.json') + bin/eslint.js, --no-eslintrc
  --no-ignore --resolve-plugins-relative-to repo-root); D3-E dedicated
  recipe eslintrc (parserOptions.project:false, non-typed recommended,
  argsIgnorePattern:'none'); SKILL.md Step 8 rewritten for the D4-α
  retry contract.
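
The retry contract in the lanes above pairs a retry cap with the `EX_TEMPFAIL=75` sentinel. A minimal sketch of how an exit code could be chosen, assuming a cap of two attempts (the `maxAttempts=2` mentioned earlier for recipe-retry-policy.ts) and assuming exit code 1 for a final failure, which the commit message does not state. `exitCodeFor` is a hypothetical name.

```typescript
const EX_TEMPFAIL = 75; // sentinel: "retry me", per the commit message
const MAX_ATTEMPTS = 2; // assumption: mirrors maxAttempts=2

// Hypothetical mapping from attempt outcome to process exit code.
function exitCodeFor(attempt: number, passed: boolean): number {
  if (passed) return 0;
  // A failed attempt is retryable only while attempts remain.
  return attempt < MAX_ATTEMPTS ? EX_TEMPFAIL : 1;
}
```

A distinct sentinel lets the caller tell "rerun with `--retry-of`" apart from a hard failure without parsing output.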

Verification (10 acceptance criteria):
- AC-V4-2/4/5/6/7a/7b/8/9/10 PASS end-to-end against the existing v3
  bundle. AC-V4-1 and AC-V4-3b gated on a live ANTHROPIC_API_KEY (CI
  verification mandatory; local optional). AC-V4-3a passes 9/9
  buildAnthropicRequest shape checks.
- AC-V4-7a SHA-256 parity: stdin + sdk paths produce byte-identical
  specs (D8 header generatedAt pinned to bundle.metadata.generatedAt).
- AC-V4-9 redaction: dispatch-request.json contains no x-api-key /
  authorization / sk-ant- substrings.
- AC-V4-10 retry: stdin attempt 1 exits 75 with framed retry block +
  result.partial.json; stdin --retry-of <runId> attempt 2 exits 0 with
  attempts=2.
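
The redaction criterion (AC-V4-9) can be illustrated with a small sketch. The request shape and the `redactForDebug` name are assumptions; only the three forbidden substrings come from the commit message.

```typescript
const SENSITIVE_HEADERS = new Set(['x-api-key', 'authorization']);

// Hypothetical serializer for a debug dump such as dispatch-request.json.
function redactForDebug(request: {
  headers: Record<string, string>;
  body: unknown;
}): string {
  // Drop credential-bearing headers entirely (case-insensitive match).
  const headers = Object.fromEntries(
    Object.entries(request.headers).filter(
      ([name]) => !SENSITIVE_HEADERS.has(name.toLowerCase())
    )
  );
  // Second pass: scrub anything that still looks like a key in the payload.
  return JSON.stringify({ headers, body: request.body }, null, 2).replace(
    /sk-ant-[A-Za-z0-9_-]+/g,
    '[REDACTED]'
  );
}
```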

scripts/verify-pr.ts (runner) untouched (frozen this increment). Envoy
credential-injector and author_association gating deferred to v5.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Resolve placeholder SHAs for the four third-party actions in
.github/workflows/verify-pr.yml to commit SHAs of their latest stable
releases. Required activation gate before the harness can fire in CI.

- prince-chrismc/check-actor-permissions-action: v3.0.2
- actions/checkout: v6.0.2
- actions/upload-artifact: v7.0.1
- actions/github-script: v9.0.0

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The previous map pointed every entry at claude-opus-4-5-20250929, which
returns 404 from the Anthropic API. Update to current public IDs:

- claude-opus-4-7[1m] / claude-opus-4-7 → claude-opus-4-7
- claude-opus-4-6 → claude-opus-4-6
- claude-opus-4-5 → claude-opus-4-5-20251101 (correct snapshot)

Update MODEL_MAX_TOKENS keys to match. Verified live AC-V4-1 (spec
written) and AC-V4-3b (cache_read_input_tokens=4358 >= 1024) against
PR storybookjs#34761.
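
For reference, the corrected mapping can be written out as a lookup. The alias and ID strings are taken from the list above; using a `Map` and the `resolveModelId` name are assumptions about shape, not the actual contents of agent-dispatch.ts.

```typescript
// Alias -> public model ID, as listed in this commit message.
const MODEL_ID_MAP = new Map<string, string>([
  ['claude-opus-4-7[1m]', 'claude-opus-4-7'],
  ['claude-opus-4-7', 'claude-opus-4-7'],
  ['claude-opus-4-6', 'claude-opus-4-6'],
  ['claude-opus-4-5', 'claude-opus-4-5-20251101'],
]);

// Hypothetical resolver: fails fast instead of sending an unknown ID upstream.
function resolveModelId(alias: string): string {
  const id = MODEL_ID_MAP.get(alias);
  if (id === undefined) {
    throw new Error(`Unknown model alias: ${alias}`);
  }
  return id;
}
```

Failing locally on an unknown alias surfaces a stale entry before it turns into a 404 from the API.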

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two activation-blocking bugs surfaced by A4 label-fire test on
valentinpalkovic/storybook fork:

1. Generate bundle step failed with "Couldn't find the node_modules
   state file" — workflow never ran yarn install after checkout. Add
   the standard `./.github/actions/setup-node-and-install` composite
   step between Checkout and Fetch PR diff.

2. Post PR comment hard-failed with ENOENT on
   `.verify-output/latest/verify-result.json`. The harness writes
   timestamped dirs and never creates a `latest` symlink, so the path
   was wrong on every run, not just failures. Replace with a sort-
   newest-first scan of `.verify-output/*/verify-result.json` and
   degrade gracefully when no verdict exists (workflow failed before
   harness ran), so the comment always posts a useful status.
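
The newest-first scan from item 2 might be sketched like this for a Node script. It assumes run directory names sort chronologically as strings; `readLatestVerdict` is a hypothetical name, and the workflow's actual github-script body is not shown in this commit message.

```typescript
import { existsSync, readdirSync, readFileSync } from 'node:fs';
import { join } from 'node:path';

// Returns the parsed verdict from the newest run directory that has one,
// or null when no run produced a verify-result.json.
function readLatestVerdict(outputRoot: string): Record<string, unknown> | null {
  if (!existsSync(outputRoot)) return null;
  const runIds = readdirSync(outputRoot, { withFileTypes: true })
    .filter((entry) => entry.isDirectory())
    .map((entry) => entry.name)
    .sort()
    .reverse(); // newest first, assuming timestamp-like names
  for (const runId of runIds) {
    const resultPath = join(outputRoot, runId, 'verify-result.json');
    if (existsSync(resultPath)) {
      return JSON.parse(readFileSync(resultPath, 'utf8')) as Record<string, unknown>;
    }
  }
  return null;
}
```

A `null` result is the signal to post the "no verdict" status rather than failing the comment step.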

Remaining gap: `Run harness in container` step references
`verify-harness:pinned-sha` which has no Dockerfile in repo and is
not built anywhere in the workflow. Tracked as next activation gate.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
A4 label-fire test on fork run #25673185333 failed at Generate bundle
with "command not found: bun". The verify-pr-generate and verify-pr
yarn scripts (in package.json:40,42) invoke bun directly. The composite
setup-node-and-install action provisions Node/Yarn but not Bun, so add
oven-sh/setup-bun pinned to v2.2.0 between Node setup and Fetch PR
diff.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Upload-artifact v7 emitted "No files were found with the provided path:
.verify-output/*/" on A4 run #25673778823 despite the dir existing. The
trailing-slash dir-glob isn't accepted as a file pattern in v7. Replace
with the directory path, which uploads the whole tree. Add explicit
`if-no-files-found: warn` so future glob drift surfaces as a warning
rather than silent zero artifacts.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
A4 run #25674121554 confirmed .verify-output/ exists with the prompt
bundle inside, but upload-artifact v7 silently skipped it because the
default include-hidden-files: false rejects dot-prefixed paths. Set
include-hidden-files: true. Drop the temporary debug step.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes the v5-0 gap where the `Run harness in container` workflow step
referenced `verify-harness:pinned-sha` with no Dockerfile in repo. The
harness can now produce verdicts in CI.

Implementation follows the ralplan-approved design at
`.omc/plans/v5-0-dockerfile.md` (4 iterations to consensus APPROVE from
Architect + Critic under DELIBERATE mode).

What lands:

- `scripts/verify/Dockerfile` — multi-stage build pinned by SHA digest
  (Playwright v1.58.2-jammy base + Bun 1.3.0-slim via `COPY --from=`).
  Pre-bakes node_modules + code/core/dist + react-vite/default-ts sandbox
  so the runtime container can satisfy `--read-only` + `--network=none`.
  Corepack is bypassed — yarn invoked directly via `node $YARN_BIN`.
  Bakes `HEAD_SHA` for runtime drift detection.

- `scripts/verify/harden-build-context.sh` +
  `scripts/verify/strip-lifecycle-scripts.mjs` — supply-chain hardening
  that runs on the bare runner before `docker build`. Overlays trusted
  `.dockerignore` / `.yarnrc.yml` / `.yarn/releases/` from base-sha,
  strips lifecycle scripts from every workspace `package.json`,
  normalises `packageManager`, deletes head-supplied `.npmrc`,
  diff-asserts `Dockerfile` byte-identity. Walker is hardened with
  symlink-skip, max-depth, 1 MB file-size cap, 60s timeout, and
  prototype-chain hygiene.

- `.github/workflows/verify-pr.yml` — adds `Checkout PR head`,
  `Spec precheck`, `Harden build context`, `Build harness image`
  (with per-PR cache scope), `Smoke test image` (digest fail-closed),
  `Run harness in container` (named container), and `Mirror tmpfs
  output` (no `|| true` on the load-bearing copy).

- `.github/actions/verify-spec-precheck/action.yml` — extracted
  composite action so v5-1's first-time-use UX can swap internals
  without touching the workflow shape.

- `scripts/verify/core.ts` — adds `writeRegressionResult()` helper plus
  optional `regressionReason` / `inContainer` / `imageDigest` /
  `headSha` fields (schemaVersion unchanged).

- `scripts/verify-pr.ts` — honours `VERIFY_HARNESS_IN_CONTAINER=1` at
  every sandbox-prep call site; rejects `--resync` in-container;
  asserts `HEAD_SHA` via the new helper, warn-and-skip when
  `VERIFY_HARNESS_EXPECTED_HEAD_SHA` is unset (laptop dev mode).

- `scripts/verify/playwright.config.ts` — chromium-only projects.

- `renovate.json` — tracks Playwright + Bun digests on weekly schedule.

- `scripts/verify/SECURITY.md` § Image-build provenance — documents
  every supply-chain control plus the residual `GITHUB_TOKEN`-in-buildx
  risk and its v5-1 job-split mitigation.

- `scripts/verify/RUNBOOK.md` — diagnosis playbook for failure signals.

- `scripts/verify/__tests__/` — four integration tests covering the
  short-circuit, sandbox-root env, head-sha assertion, and hadolint.

Known residual risk (documented in SECURITY.md, deferred to v5-1):
`GITHUB_TOKEN` remains in the buildx daemon's process env on the
build step. The mitigation stack (lifecycle-script stripping,
`enableScripts: false`, `.npmrc` purge, corepack bypass, per-PR cache
scope, Dockerfile byte-identity) defends `yarn install` against
head-controlled code execution. v5-1 splits into prep + harness jobs
with `permissions: {}` on the harness job to eliminate this surface.

TODOs flagged in source:
- `timeout-minutes: 30` is a placeholder; AC-V5-0-2 cold-build
  measurement is required before final lock-in.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
actions/checkout's submodule-foreach cleanup pass aborts with exit 128
on this repo because of orphan gitlinks under `.external/` that have no
matching entries in any (missing) `.gitmodules` file. The base-sha
checkout escapes this because it doesn't pass `persist-credentials:
false` (the cleanup phase that runs `git submodule foreach` is gated on
needing to scrub credentials). The PR-head checkout did set the flag
for the v5-0 untrusted-context posture and hit the gitlinks/no-modules
mismatch.

Replace `actions/checkout@v6.0.2` for the PR-head step with a manual
`git clone --no-tags --no-checkout --filter=blob:none` followed by a
single-sha fetch and checkout. Strip the cached credential helper +
rewrite the remote URL to drop the token afterwards. Net posture is
equivalent to `persist-credentials: false` and `.git/` is excluded
from the docker build context by `.dockerignore`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…xtraction

Build failure on fork firetest run 25684557942: playwright:v1.58.2-jammy
ships Node 24.13.0 (not 22.22.1), so the conditional Node re-install
always fires; but the base image is missing `xz-utils`, so `tar -xJf` on
the .tar.xz tarball aborts with "xz: Cannot exec: No such file or
directory". Add an apt-get install of xz-utils + ca-certificates inside
the same RUN block, gated on the same version-mismatch conditional so a
future Playwright base that already ships Node 22.22.1 skips the apt
fetch entirely (resolves the apt-vs-probe trade-off in OQ-V5-0-E).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Build failure on fork firetest PR #3 run 25685301642: playwright base
image ships `pwuser` at UID/GID 1000, so the unconditional `groupadd
--gid 1000` aborts with "GID '1000' already exists". Guard the group
and user creates with getent/id probes so the layer is idempotent
across base-image variants that may or may not ship the user.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…_BIN

Yarn 4 treats any YARN_<KEY> env var as a config setting, so 'YARN_BIN'
was being parsed as a 'bin' config key and rejected with 'Unrecognized or
legacy configuration settings found: bin'. Rename the variable to
HARNESS_YARN_BIN throughout the Dockerfile and matching docs.
…al target

scripts/package.json depends on eslint-plugin-local-rules via portal:
specifier. Yarn install fails ('Manifest not found') unless the portal
target directory is present in the build context. Add a COPY for the
whole scripts/eslint-plugin-local-rules/ directory in stage 2 so yarn
can resolve the portal manifest and 'yarn lint' has the rule files at
runtime.
WORKDIR /opt/verify-harness/repo runs as root, leaving the directory
root-owned even though COPY --chown=1000:1000 sets file ownership.
Yarn 4 runs as uid 1000 (USER 1000:1000) and fails the link step with
EACCES while creating node_modules/. Add an explicit chown of the
workdir before the USER switch so yarn can create node_modules and
persist the link tree.
… task

scripts/utils/cli-step.ts resolves dist/bin/index.js for both
cli-storybook and create-storybook at module-eval time. Stage 3 only
compiled 'core', so the sandbox task failed with MODULE_NOT_FOUND.
Expand the nx target list to compile core + cli-storybook +
create-storybook before the sandbox bootstrap step runs.
…torybook

The nx project name in code/lib/cli-storybook/project.json is 'cli'
(package name '@storybook/cli'), not 'cli-storybook'. The previous
nx run-many list silently dropped the unknown target, so the cli
package was never compiled and the sandbox task still failed with
MODULE_NOT_FOUND for cli-storybook/dist/bin/index.js.
… index.js

code/core/dist has no top-level index.js — the bundle is split into
per-entry-point subdirectories (preview-api/, manager-api/, etc.) plus
the bin script at dist/bin/dispatcher.js (declared in core
package.json#bin). Update the sandbox dist sanity check to verify the
dispatcher bin file instead.
…overlay

'yarn task sandbox' runs run-registry -> publish, which packs and
republishes packages through verdaccio. The publish step can churn
code/core/dist, so by the time stage 3.d tries to cp it the directory
no longer exists. Re-run 'nx compile core' immediately before the cp
to guarantee the freshly-built artifact from the PR head is in place,
and fail loudly with a directory listing if it is somehow still
missing.
…caffolding

Eleven v5-0 firetest rounds confirmed the Dockerfile architecture is
asymmetrically over-engineered: ~70% of the complexity (digest pins,
harden-build-context overlay, lifecycle-script stripping, Verdaccio publish
pipeline, BuildKit cache scope, smoke-test sentinel) addresses supply-chain
threats that `enableScripts: false` + lockfile + `.npmrc` purge already
mitigate. Runtime isolation — the threat the doc actually calls out for
CI/CD — was weakly addressed (no `--cap-drop ALL` / `--network=none` /
`--read-only` / `--tmpfs` in places it would matter). BuildKit also
proved fragile: `code/core/dist` kept disappearing between stages 6 and 7
despite multiple recompile attempts.

v6 drops the container and accepts the same isolation profile as the
existing Storybook PR CI (ephemeral GitHub Actions runner). The recipe
author step keeps `ANTHROPIC_API_KEY` scope-limited to one step; the
committed-spec runner remains the lethal-trifecta breaker.

If the threat model later expands to processing third-party PRs at scale
with adversarial recipe authors, sandbox-runtime (bubblewrap on Linux)
wraps just the playwright step in ~10 lines of config per Anthropic's
"Securely deploying AI agents" doc. Do not reintroduce the full Docker +
Verdaccio stack.

Deletes:
- scripts/verify/Dockerfile
- scripts/verify/harden-build-context.sh
- scripts/verify/strip-lifecycle-scripts.mjs
- scripts/verify/SECURITY.md
- scripts/verify/__tests__/dockerfile-lint.test.ts
- scripts/verify/__tests__/head-sha-assertion.test.ts
- scripts/verify/__tests__/in-container-shortcircuit.test.ts

Strips v5-0 additions from .dockerignore (preserves the pre-existing
sensitive-path entries) and removes the Dockerfile-pin rules from
renovate.json.

The workflow yml + verify-pr.ts still reference the container path; both
get rewritten in the subsequent v6 commits.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ispatch

Rewrites scripts/verify-pr.ts around two execution targets selected per
recipe via a `// @verify-target:` header comment (scanned in the first
30 lines of the spec):

  internal-ui (default)
      Builds code/storybook-static once via `yarn storybook:ui:build`,
      then serves it on the requested port via `yarn http-server`. The
      fast path for fixes that exercise the monorepo's own UI against
      the PR head's compiled packages — no sandbox bootstrap, no
      verdaccio publish, no docker.

  sandbox:<template>
      Pre-existing sandbox flow — snapshotSandbox, sanitizeResolutions,
      syncCorePackage (symlink code/core/dist into the sandbox), then
      bootStorybook. Use only when a fix is template-specific (rare).

Also:

- Adds a positional <PR#> argument so `yarn verify-pr 34762` resolves to
  `.verify-recipes/pr-34762.spec.ts` automatically. The explicit
  `--recipe-spec <path>` flag still works and takes precedence.
- Drops every `VERIFY_HARNESS_IN_CONTAINER` short-circuit, the
  `/opt/verify-harness/HEAD_SHA` runtime assertion, and the
  `VERIFY_HARNESS_EXPECTED_HEAD_SHA` env-var contract. The container
  paths no longer exist.
- Drops the imageDigest / inContainer / headSha fields from
  VerifyResult writes and from writeRegressionResult's options. The
  fields stayed optional in the schema for backward-compat readers but
  are no longer populated.
- Widens VerifyResult.template from the
  `'react-vite/default-ts'` literal to `string` so the field can carry
  `'internal-ui'` and arbitrary sandbox templates.
- Switches the root `verify-pr` script from `bun scripts/verify-pr.ts`
  to `node ./scripts/verify-pr.ts`. verify-pr.ts no longer imports any
  of the non-erasable enum chain from cli-storybook, so Node 22.22.1's
  native TS strip is sufficient. The Playwright runner still spawns
  `bun x playwright test` internally because the recipe specs live
  under .verify-recipes/ and load through Playwright's own worker
  process, not through verify-pr.ts.
- --resync now only applies to sandbox-target recipes (the internal-ui
  build is fast enough that --resync would add no value); the script
  exits with an actionable error if --resync is passed for an
  internal-ui recipe.
- New: scripts/verify/target.ts (header parser, default = internal-ui).
- New: scripts/verify/internal-ui.ts (storybook:ui:build + http-server
  boot, waitOn :port/index.html).
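
The header convention can be illustrated with a small parser. This is a sketch under the rules stated in this commit (first 30 lines scanned, `internal-ui` as the default); the `VerifyTarget` type, the `parseVerifyTarget` name and the choice to throw on an unrecognised value are assumptions, not the contents of scripts/verify/target.ts.

```typescript
type VerifyTarget =
  | { kind: 'internal-ui' }
  | { kind: 'sandbox'; template: string };

const SANDBOX_PREFIX = 'sandbox:';

function parseVerifyTarget(specSource: string): VerifyTarget {
  // Only the first 30 lines of the spec are scanned for the header.
  const header = specSource.split('\n').slice(0, 30);
  for (const line of header) {
    const match = /^\s*\/\/\s*@verify-target:\s*(\S+)\s*$/.exec(line);
    if (!match) continue;
    const value = match[1];
    if (value === 'internal-ui') return { kind: 'internal-ui' };
    if (value.startsWith(SANDBOX_PREFIX) && value.length > SANDBOX_PREFIX.length) {
      return { kind: 'sandbox', template: value.slice(SANDBOX_PREFIX.length) };
    }
    // Assumption: an unrecognised value is an error rather than a silent default.
    throw new Error(`Unknown @verify-target value: ${value}`);
  }
  return { kind: 'internal-ui' }; // default when no header is present
}
```

For example, a spec starting with `// @verify-target: sandbox:react-vite/default-ts` would select the sandbox flow with that template.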

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replaces the v5-0 Docker pipeline (harden build context, buildx, image
build, smoke test, docker run with --network=none / --cap-drop ALL /
--read-only / tmpfs, docker cp mirror) with a single shell step that
runs the harness directly on the GitHub Actions runner.

Verify step (only when `verify-spec-precheck` reports the committed spec
exists at the PR head):

    set -euo pipefail
    yarn install --immutable
    yarn nx run-many -t compile -p core,cli,create-storybook
    yarn verify-pr --recipe-spec ".verify-recipes/pr-${PR_NUMBER}.spec.ts"

The internal-ui target (default) builds code/storybook-static once and
serves it via http-server. Sandbox targets follow the pre-existing
snapshot + sanitize + sync + boot flow. The recipe header chooses.

New: a `Read verdict` step parses `pr-head/.verify-output/*/verify-result.json`
and an `Apply verified-by-harness label` step adds the label to the PR
when the verdict is `verified`. The PR comment script renders the same
two-state not-applicable message and a verified-vs-regression block,
but drops the imageDigest reference (no longer populated by v6).

Permissions: pull-requests + issues (label add) + statuses. The
ANTHROPIC_API_KEY remains scoped to the `Author recipe` step only;
nothing downstream of that step ever sees the secret. The committed
spec under review is still the lethal-trifecta breaker — the runner
executes only what was committed and reviewed at the PR head.

Workflow surface dropped:
- Harden build context (./scripts/verify/harden-build-context.sh)
- Set up Docker Buildx
- Build harness image (docker/build-push-action + cache-to/from)
- Capture image digest
- Smoke test image
- Run harness in container
- Mirror tmpfs output to runner workspace

Net delta: -78 lines, +33 lines (-45 net) on verify-pr.yml.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Rewrites the harness documentation around the v6 local-first
architecture. The Docker / Verdaccio / image-build-provenance sections
are dropped; new sections cover the per-recipe target dispatch
(`internal-ui` vs `sandbox:<template>`) and the runner-native CI
workflow.

scripts/verify/README.md
  * Architecture diagram updated to list target.ts + internal-ui.ts.
  * Flag table adds the positional <PR#> sugar and clarifies that
    `--resync` and `--restore-sandbox` only apply to sandbox-target
    recipes.
  * `verify-result.json` example uses `template: "internal-ui"`.
  * Prerequisites section calls out Node 22 (native TS-strip) as the
    primary runtime; Bun is needed only by the Playwright runner.
  * Side-effects section narrows to the sandbox target.
  * CI section documents the new yaml shape.
  * Drops the "Running inside the verify-harness container" section
    in its entirety.

scripts/verify/RUNBOOK.md
  * Full rewrite around the two flows: local AI fix-loop +
    GH Actions runner. Drops every Docker / buildx / harden-script /
    smoke-sentinel signal. Adds signals specific to v6:
    bootInternalUi timeout, --resync rejected, sandbox bootstrap
    missing, not-applicable verdict, label-step skipped, github-script
    verdict-read failure.

scripts/verify/SECURITY.md (recreated)
  * Brief threat-model note (~70 lines instead of 250+). Restates the
    eight load-bearing controls (committed-spec review, scoped API
    key, deny-regex, lint gate, provenance header, actor permission,
    label gate, repo-wide deny rules) and explains why v6 dropped the
    container without weakening the trifecta breakers. Notes the
    sandbox-runtime path as the next-step option if the threat model
    expands.

.verify-recipes/_recipe-authoring-guide.md
  * New §12 "Target selection (v6)" documenting the
    `// @verify-target:` header convention. Renumbers existing §12
    "Output budget" to §13.

.verify-recipes/example-smoke.spec.ts
  * Adds the explicit `// @verify-target: internal-ui` header as the
    canonical baseline example.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…step

The Verify PR step runs 'yarn nx run-many -t compile -p core,cli,create-storybook'
inside the pr-head/ checkout. When the workflow runs on a fork without
Nx Cloud org access (e.g. the v6 firetest on valentinpalkovic/storybook),
nx aborts with:

  NX Cloud: Workspace is unable to be authorized. Exiting run.
  This Nx Cloud organization is disabled.

The Verify step doesn't need distributed cache for correctness — a
clean compile against the PR head is the whole point. Force-disable Nx
Cloud (NX_NO_CLOUD=true + empty access token) on the step's env block.

Upstream storybookjs/storybook CI is unaffected: other workflow steps
(Generate bundle, Author recipe) that already rely on Nx Cloud auth
continue to use it; only the Verify step opts out.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ternal-ui

The Verify PR step previously ran:

    yarn nx run-many -t compile -p core,cli,create-storybook

This is sufficient for sandbox-target recipes (the sandbox already has
its own node_modules) but not for the internal-ui target. The
internal-ui build invokes 'yarn storybook:ui:build' in code/, which
loads code/.storybook/main.ts, which imports @storybook/react-vite
plus every addon (addon-onboarding, addon-themes, addon-docs,
addon-designs, addon-vitest, addon-a11y, addon-mcp) and transitive
renderer + builder packages. None of those are compiled by the
narrower filter, so .storybook/main.ts evaluation fails with
ERR_MODULE_NOT_FOUND.

Drop the project filter and let nx compile every project. Slower per
run but correct for the default target. Sandbox-target recipes are
unaffected — the same compile output is reused under the
syncCorePackage symlink path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…r eslint-plugin topo

eslint-plugin's prebuild script (code/lib/eslint-plugin/scripts/...)
imports from 'storybook/csf' via jiti. nx run-many parallelises 42
projects without enforcing the compile-order edge (the repo lacks an
explicit dependsOn: ['^compile'] for that target), so eslint-plugin
runs before core finishes and the import resolves upward through the
parent base checkout's node_modules/storybook symlink → which has no
dist/csf yet → ERR_MODULE_NOT_FOUND.

Compile core first explicitly, then run-many for everything else. nx
caches the core build so the second pass is a no-op for that target.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@valentinpalkovic valentinpalkovic added ci:verify Trigger PR Verification Harness and removed ci:verify Trigger PR Verification Harness labels May 14, 2026
github-actions Bot pushed a commit that referenced this pull request May 14, 2026
@github-actions

Verify Harness

Verdict: verified (target internal-ui)

Evidence (vision-check, claude-haiku-4-5-20251001): missing

Vision reasoning

Recipe produced no screenshots — cannot verify visible evidence.

Replay: npx playwright show-trace on the trace.zip listed under "Artifacts" on the run summary page.

Screenshots

2026-05-14T09-42-15.647Z/pr-13-UndoIcon-renders-ins-6e856-wChangesButton-clear-button-chromium/sidebar-clear-button-undo-icon.png

2026-05-14T09-42-15.647Z/pr-13-UndoIcon-renders-ins-6e856-wChangesButton-clear-button-chromium/test-finished-1.png

@valentinpalkovic valentinpalkovic added ci:verify Trigger PR Verification Harness and removed ci:verify Trigger PR Verification Harness labels May 14, 2026
@github-actions

Verify Harness

Verdict: regression (target internal-ui)

Reason: boot failure (see regressionDetails)

Compile output (last 4KB)
Error: bootInternalUi failed: Timed out waiting for: http://localhost:6006/iframe.html
    at bootInternalUi (file:///home/runner/work/_temp/pr-head/scripts/verify/internal-ui.ts:113:11)
    at async main (file:///home/runner/work/_temp/pr-head/scripts/verify-pr.ts:253:22)

Replay: npx playwright show-trace on the trace.zip listed under "Artifacts" on the run summary page.

@valentinpalkovic added and removed the ci:verify (Trigger PR Verification Harness) label May 14, 2026
github-actions Bot pushed a commit that referenced this pull request May 14, 2026
@github-actions

Verify Harness

Verdict: verified (target internal-ui)

Evidence (vision-check, claude-haiku-4-5-20251001): missing

Vision reasoning

Recipe produced no screenshots — cannot verify visible evidence.

Replay: npx playwright show-trace on the trace.zip listed under "Artifacts" on the run summary page.

Screenshots

2026-05-14T10-08-33.796Z/pr-13-UndoIcon-renders-ins-ca285-clear-button-after-MOD-edit-chromium/sidebar-with-clear-button.png

2026-05-14T10-08-33.796Z/pr-13-UndoIcon-renders-ins-ca285-clear-button-after-MOD-edit-chromium/test-finished-1.png

@valentinpalkovic added and removed the ci:verify (Trigger PR Verification Harness) label May 14, 2026
@github-actions

Verify Harness

No verdict produced — the workflow failed before the harness ran (likely recipe-author dispatch, deny-regex, or lint). See run log for details.

@valentinpalkovic added and removed the ci:verify (Trigger PR Verification Harness) label May 14, 2026
@github-actions

Verify Harness

No verdict produced — the workflow failed before the harness ran (likely recipe-author dispatch, deny-regex, or lint). See run log for details.

@valentinpalkovic added and removed the ci:verify (Trigger PR Verification Harness) label May 14, 2026
github-actions Bot pushed a commit that referenced this pull request May 14, 2026
@github-actions

Verify Harness

Verdict: verified (target internal-ui)

Evidence (vision-check, claude-haiku-4-5-20251001): missing

Vision reasoning

Recipe produced no screenshots — cannot verify visible evidence.

Replay: npx playwright show-trace on the trace.zip listed under "Artifacts" on the run summary page.

Screenshots

2026-05-14T10-26-57.011Z/pr-13-ReviewChangesButton--f2f10-on-after-Save-from-Controls-chromium/clear-button-undoicon.png

2026-05-14T10-26-57.011Z/pr-13-ReviewChangesButton--f2f10-on-after-Save-from-Controls-chromium/sidebar-with-clear-button.png

2026-05-14T10-26-57.011Z/pr-13-ReviewChangesButton--f2f10-on-after-Save-from-Controls-chromium/test-finished-1.png

@valentinpalkovic added and removed the ci:verify (Trigger PR Verification Harness) label May 14, 2026
github-actions Bot pushed a commit that referenced this pull request May 14, 2026
@github-actions

Verify Harness

Verdict: regression (target internal-ui)

Replay: npx playwright show-trace on the trace.zip listed under "Artifacts" on the run summary page.

Screenshots

2026-05-14T10-45-21.496Z/pr-13-ReviewChangesButton--f2f10-on-after-Save-from-Controls-chromium/test-failed-1.png

@valentinpalkovic added and removed the ci:verify (Trigger PR Verification Harness) label May 14, 2026
github-actions Bot pushed a commit that referenced this pull request May 14, 2026
@github-actions

Verify Harness

Verdict: verified (target internal-ui)

Evidence (vision-check, claude-haiku-4-5-20251001): missing

Vision reasoning

Recipe produced no screenshots — cannot verify visible evidence.

Replay: npx playwright show-trace on the trace.zip listed under "Artifacts" on the run summary page.

Screenshots

2026-05-14T10-51-11.140Z/pr-13-ReviewChangesButton--f2f10-on-after-Save-from-Controls-chromium/sidebar-with-clear-button.png

2026-05-14T10-51-11.140Z/pr-13-ReviewChangesButton--f2f10-on-after-Save-from-Controls-chromium/test-finished-1.png

@github-actions

Verify Harness

Verdict: verified (target internal-ui)

Evidence (vision-check, claude-haiku-4-5-20251001): missing

Vision reasoning

Recipe produced no screenshots — cannot verify visible evidence.

Replay: npx playwright show-trace on the trace.zip listed under "Artifacts" on the run summary page.

Screenshots

2026-05-14T11-03-51.253Z/pr-13-ReviewChangesButton--0ccaa-ers-UndoIcon-not-SweepIcon--chromium/sidebar-clear-button-undo-icon.png

2026-05-14T11-03-51.253Z/pr-13-ReviewChangesButton--0ccaa-ers-UndoIcon-not-SweepIcon--chromium/test-finished-1.png

Labels

ci:verify (Trigger PR Verification Harness), verified-by-harness (Verified by PR Verify Harness)
