
try: storybookjs/storybook#34658 — sidebar focus ring for selected item#16

Open
valentinpalkovic wants to merge 2 commits into next from try-pr-34658

@valentinpalkovic
Owner

Cherry-pick of upstream 51ddf4c (storybookjs#34658) onto the fork's next branch, used as a firetest for the v6 single-round verify harness.

Reference: storybookjs#34658

@valentinpalkovic valentinpalkovic added the ci:verify Trigger PR Verification Harness label May 12, 2026
@github-actions

github-actions Bot commented May 12, 2026

Fails

🚫 PR is not labeled with one of: ["cleanup","BREAKING CHANGE","feature request","bug","documentation","maintenance","build","dependencies"]

🚫 PR is not labeled with one of: ["ci:normal","ci:merged","ci:daily","ci:docs"]

🚫 PR title must be in the format of "Area: Summary", with both Area and Summary starting with a capital letter.
Good examples:
- "Docs: Describe Canvas Doc Block"
- "Svelte: Support Svelte v4"
Bad examples:
- "add new api docs"
- "fix: Svelte 4 support"
- "Vue: improve docs"
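The title rule can be approximated with a single regular expression. This is a sketch, not dangerJS's or Storybook's actual check. It assumes the Area starts with a capital letter and contains no colon, and is followed by `: ` and a Summary that also starts with a capital letter. `isValidPrTitle` is a hypothetical helper name.

```typescript
// Approximation of the "Area: Summary" title rule (hypothetical helper,
// not the repository's actual Danger rule):
//   Area    -> starts with A-Z, contains no colon
//   Summary -> follows ": " and starts with A-Z
const TITLE_PATTERN = /^[A-Z][^:]*: [A-Z]/;

export function isValidPrTitle(title: string): boolean {
  return TITLE_PATTERN.test(title.trim());
}
```

Under this approximation, this PR's own title fails the check because of its lowercase `try:` prefix.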

Generated by 🚫 dangerJS against 8404db8

@github-actions github-actions Bot added the verified-by-harness Verified by PR Verify Harness label May 12, 2026
github-actions Bot pushed a commit that referenced this pull request May 12, 2026
@github-actions

Verify Harness

Verdict: verified (target internal-ui)

Replay: npx playwright show-trace on the trace.zip in artifacts.

Screenshots

2026-05-12T08-19-58.524Z/pr-16-sidebar-selected-ite-7ca6b-yboard-highlight-focus-ring-chromium/test-finished-1.png

@valentinpalkovic valentinpalkovic added ci:verify Trigger PR Verification Harness and removed ci:verify Trigger PR Verification Harness labels May 12, 2026
@github-actions

Verify Harness

Verdict: regression (target internal-ui)

Replay: npx playwright show-trace on the trace.zip in artifacts.

@github-actions

Verify Harness

Verdict: regression (target internal-ui)

Replay: npx playwright show-trace on the trace.zip in artifacts.

Screenshots

2026-05-12T09-50-36.241Z/pr-16-selected-sidebar-ite-5abbc-yboard-highlight-box-shadow-chromium/test-failed-1.png

github-actions Bot pushed a commit that referenced this pull request May 12, 2026
valentinpalkovic added a commit that referenced this pull request May 12, 2026
…k to author

Previously only evidence-missing/undetermined verdicts triggered retry. Regression
verdicts (Playwright assertions failed) skipped retry entirely, even though the
author could often self-correct from the error trace (wrong route, missing
trigger state, stale selector). This change extends retry to cover regression
by:

1. Wrapping the initial verify-pr step with `|| true` so the workflow continues
   on Playwright failure; the final verdict gate at the end of the job
   preserves red-CI signal based on post-retry verdict.
2. Evidence-check + Retry steps switch from `if: success()` to `if: always()`
   and gate internally on the JSON verdict.
3. Retry step extends gate to include verdict==regression. Retry-context for
   regression cases is built from each failed test's error-context.md (page
   snapshot) + first Playwright error message from playwright-report.json,
   capped at 8 KB per snapshot.
4. verify-pr-generate's --retry-context preamble is softened to cover both
   regression (selector / route correction) and evidence (trigger-state
   correction) paths.
5. Authoring guide §6: explicit guidance for non-stories diffs — locate the
   sibling *.stories.tsx and derive kind-id from its file path under the
   registered titlePrefix in code/.storybook/main.ts. Avoids agents guessing
   docs-site routes (e.g. addons-controls-basics) that don't exist in
   internal-ui.
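The gating and context-building in items 2 and 3 can be sketched roughly as follows. Only the retry conditions and the 8 KB per-snapshot cap come from the commit message. The `VerifyResult` shape, its field names, and the helpers `shouldRetry`, `capUtf8`, and `buildRetryContext` are hypothetical.

```typescript
// Hypothetical shape of verify-result.json; the real schema may differ.
type Verdict = "verified" | "regression";
type EvidenceVerdict = "found" | "missing" | "undetermined";

interface VerifyResult {
  verdict: Verdict;
  evidence?: { verdict: EvidenceVerdict };
}

interface FailedTest {
  title: string;
  errorContext: string; // contents of the test's error-context.md (page snapshot)
  firstError: string; // first error message from playwright-report.json
}

const SNAPSHOT_CAP_BYTES = 8 * 1024;

// Retry on a Playwright regression, or when the vision check could not
// confirm the change (evidence missing or undetermined).
export function shouldRetry(result: VerifyResult): boolean {
  if (result.verdict === "regression") return true;
  const ev = result.evidence?.verdict;
  return ev === "missing" || ev === "undetermined";
}

// Truncate to at most maxBytes of UTF-8 without splitting a code point.
export function capUtf8(text: string, maxBytes: number): string {
  const encoder = new TextEncoder();
  let used = 0;
  let out = "";
  for (const ch of text) {
    const size = encoder.encode(ch).length;
    if (used + size > maxBytes) break;
    used += size;
    out += ch;
  }
  return out;
}

// One section per failed test: first error, then the capped page snapshot.
export function buildRetryContext(failures: FailedTest[]): string {
  return failures
    .map(
      (f) =>
        `### ${f.title}\n\nFirst error:\n${f.firstError}\n\n` +
        `Page snapshot (max 8 KB):\n${capUtf8(f.errorContext, SNAPSHOT_CAP_BYTES)}`,
    )
    .join("\n\n");
}
```

The harness allows a single retry, so a caller would consult `shouldRetry` only after the first attempt.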

Empirical findings from the PR #14 and #16 firetests:
- #14 (Object.tsx control label diff) agent navigated to non-existent route
  addons-controls-object--basic. Correct route would be
  addons-docs-blocks-controls-object--object (or --docs autodocs page).
- #16 (HighlightStyles conditional CSS) agent's toHaveCSS regex timed out
  because HighlightStyles only mounts on keyboard-nav highlight. Author had
  no way to learn this without the Playwright trace.

Retry-on-regression gives the author one chance to self-correct using the
page snapshot at failure point.
@valentinpalkovic valentinpalkovic added ci:verify Trigger PR Verification Harness and removed ci:verify Trigger PR Verification Harness labels May 12, 2026
github-actions Bot pushed a commit that referenced this pull request May 12, 2026
@github-actions

Verify Harness

Verdict: regression (target internal-ui)

Replay: npx playwright show-trace on the trace.zip in artifacts.

Screenshots

2026-05-12T10-11-43.699Z/pr-16-selected-sidebar-ite-6b652-inset-focus-ring-box-shadow-chromium/test-failed-1.png

2026-05-12T10-10-57.740Z/pr-16-selected-sidebar-ite-558b9-inset-focus-ring-box-shadow-chromium/test-failed-1.png

valentinpalkovic added a commit that referenced this pull request May 12, 2026
After the deterministic story-route util lands, agents still regressed on
non-trivial DOM details (PR #14: assumed textarea[name="value"] when the
story passes `args: { name: "object" }` so the rendered input id derives
from "object"; PR #16: couldn't see that the TreeNode story doesn't mount
the Explorer / HighlightStyles tree). The page-snapshot retry feedback
captures manager DOM only — iframe story content is opaque.

Fix: include the source of each resolved story file in the prompt bundle
under a "Story file sources" section, capped at 160 lines per file so the
prompt stays bounded. Agents now see `meta.args`, story-level `args`,
`parameters`, and `tags` — enough to derive the rendered input id and the
mount conditions without guessing.

Story files are resolved through the same path as the routes section,
collectRelevantStoryFiles(): touched stories first, then sibling stories
that import the changed module by basename.
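Below is a rough sketch of the file selection and the 160-line cap. The first function reuses the name `collectRelevantStoryFiles()` from the commit message, but its body is an assumption, not the harness's code. The import-matching regex and the `renderStorySources` helper are also assumptions.

```typescript
import { basename, extname } from "node:path";

const MAX_LINES_PER_STORY_FILE = 160;
const isStory = (file: string) => /\.stories\.[jt]sx?$/.test(file);
const escapeRegExp = (s: string) => s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");

// Touched stories first, then untouched stories whose source imports a
// changed module by basename (e.g. `from './HighlightStyles'`).
export function collectRelevantStoryFiles(
  changedFiles: string[],
  storySources: Map<string, string>, // story path -> source text
): string[] {
  const touched = changedFiles.filter(isStory);
  const changedModules = changedFiles
    .filter((f) => !isStory(f))
    .map((f) => escapeRegExp(basename(f, extname(f))));
  const importsChanged = (src: string) =>
    changedModules.some((mod) =>
      new RegExp(`from\\s+['"][^'"]*/${mod}(?:\\.[jt]sx?)?['"]`).test(src),
    );
  const siblings = [...storySources]
    .filter(([path, src]) => !touched.includes(path) && importsChanged(src))
    .map(([path]) => path);
  return [...touched, ...siblings];
}

// Render the "Story file sources" prompt section, at most 160 lines per file.
export function renderStorySources(files: string[], read: (path: string) => string): string {
  const sections = files.map((path) => {
    const lines = read(path).split("\n");
    const kept = lines.slice(0, MAX_LINES_PER_STORY_FILE).join("\n");
    const dropped = lines.length - MAX_LINES_PER_STORY_FILE;
    const note = dropped > 0 ? `\n// [truncated ${dropped} more lines]` : "";
    return `### ${path}\n${kept}${note}`;
  });
  return `## Story file sources\n\n${sections.join("\n\n")}`;
}
```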
@valentinpalkovic valentinpalkovic added ci:verify Trigger PR Verification Harness and removed ci:verify Trigger PR Verification Harness labels May 12, 2026
github-actions Bot pushed a commit that referenced this pull request May 12, 2026
@github-actions

Verify Harness

Verdict: verified (target internal-ui)

Evidence (after 1 retry) (vision-check, claude-haiku-4-5-20251001): undetermined — The diff adds a CSS box-shadow rule for a selected and highlighted sidebar item. The Playwright recipe verifies the CSS property is applied via toHaveCSS() assertion (which passed), but the screenshots show the sidebar at normal resolution where a 2px inset box-shadow is subtle and difficult to visually confirm in a PNG. The test assertion passing indicates the change works, but the visual presence in the screenshots is not clearly observable.

Replay: npx playwright show-trace on the trace.zip in artifacts.

Screenshots

2026-05-12T10-21-09.086Z/pr-16-selected-sidebar-ite-558b9-inset-focus-ring-box-shadow-chromium/test-failed-1.png

2026-05-12T10-22-01.408Z/pr-16-selected-sidebar-ite-dca18-s-ring-from-HighlightStyles-chromium/test-finished-1.png

2026-05-12T10-22-01.408Z/pr-16-selected-sidebar-ite-dca18-s-ring-from-HighlightStyles-chromium/sidebar-selected-focus-ring.png

@valentinpalkovic valentinpalkovic removed the ci:verify Trigger PR Verification Harness label May 12, 2026
valentinpalkovic added a commit that referenced this pull request May 12, 2026
…ignal

When a PR ships its own *.test.{ts,tsx,js,jsx} alongside the diff
(e.g. PR #16 added HighlightStyles.test.tsx that exercises the new CSS
rule directly), run them via vitest after the Playwright recipe. Result
is stored under verify-result.json's new `unitTests` field:
{ran, files, passed, summary, details}. PR comment now renders the
unit-test status alongside the Playwright verdict — reviewers see both
signals in one place.

This step doesn't override the Playwright verdict (independent signals
are most useful when shown side-by-side); a verified Playwright + failed
unit test, or the inverse, is informative on its own. Future iteration
may gate the final verdict on the AND of both signals.

Detection: greps /tmp/pr.diff for '+++ b/code/.+\.(test|spec)\.(ts|tsx|js|jsx)'
file headers and includes only files that exist on disk. Skips the step
entirely (recording 'ran: false') when the diff has no test files.
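The following sketches the detection step and the `unitTests` record, assuming the diff is available as a string. `detectPrTestFiles` and the `exists` callback are hypothetical names. The regex is the one quoted above, anchored to whole `+++ b/` header lines.

```typescript
// Unified-diff new-file headers for test/spec files under code/. Deleted
// files show "+++ /dev/null" and therefore never match.
const TEST_FILE_HEADER = /^\+\+\+ b\/(code\/.+\.(?:test|spec)\.(?:ts|tsx|js|jsx))$/gm;

// Shape of the `unitTests` field in verify-result.json, per the message above.
export interface UnitTestsResult {
  ran: boolean;
  files: string[];
  passed?: boolean;
  summary?: string;
  details?: string;
}

export function detectPrTestFiles(diff: string, exists: (path: string) => boolean): string[] {
  const found = new Set<string>();
  for (const match of diff.matchAll(TEST_FILE_HEADER)) {
    if (exists(match[1])) found.add(match[1]);
  }
  return [...found];
}

// Recorded when the diff ships no test files; vitest is not invoked.
export const skippedUnitTests: UnitTestsResult = { ran: false, files: [] };
```

In CI, `exists` would typically be `existsSync` from `node:fs`.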
@valentinpalkovic valentinpalkovic added the ci:verify Trigger PR Verification Harness label May 12, 2026
github-actions Bot pushed a commit that referenced this pull request May 12, 2026
@github-actions

Verify Harness

Verdict: verified (target internal-ui)

Evidence (after 1 retry) (vision-check, claude-haiku-4-5-20251001): found — The first screenshot shows the Storybook sidebar with the 'Primary' button selected (highlighted in blue). The PR diff adds a CSS rule that applies an inset 2px box-shadow when [data-selected="true"] is set. The Playwright recipe verified that the selected sidebar item has the correct computed box-shadow style matching the inset 2px ring. The screenshot visibly captures the selected state of the sidebar item, confirming the UI change is observable.

PR-added unit tests: ✅ passed — 1 passed, 0 failed across 2 suite(s)

Files: code/core/src/manager/components/sidebar/__tests__/HighlightStyles.test.tsx

Replay: npx playwright show-trace on the trace.zip in artifacts.

Screenshots

2026-05-12T11-25-06.281Z/pr-16-selected-sidebar-ite-fd1b8-ing-box-shadow-on-highlight-chromium/test-failed-1.png

2026-05-12T11-25-53.992Z/pr-16-selected-sidebar-ite-e65f0-s-ring-from-HighlightStyles-chromium/test-finished-1.png

2026-05-12T11-25-53.992Z/pr-16-selected-sidebar-ite-e65f0-s-ring-from-HighlightStyles-chromium/sidebar-selected-focus-ring.png

@valentinpalkovic valentinpalkovic added ci:verify Trigger PR Verification Harness and removed ci:verify Trigger PR Verification Harness labels May 12, 2026
github-actions Bot pushed a commit that referenced this pull request May 12, 2026
@github-actions

Verify Harness

Verdict: verified (target internal-ui)

Evidence (after 1 retry) (vision-check, claude-haiku-4-5-20251001): undetermined — The diff adds a CSS box-shadow rule to a selected sidebar item's focus highlight. While the Playwright recipe explicitly navigates to a story, focuses the sidebar row, and attempts to screenshot the sidebar region with the focus ring, the provided screenshots do not show the sidebar with a highlighted/focused item visible. The screenshots appear to show the Storybook interface but not the specific state (focused selected sidebar item) that would display the new box-shadow effect from the diff.

PR-added unit tests: ✅ passed — 1 passed, 0 failed across 2 suite(s)

Files: code/core/src/manager/components/sidebar/__tests__/HighlightStyles.test.tsx

Replay: npx playwright show-trace on the trace.zip listed under "Artifacts" on the run summary page.

Screenshots

2026-05-12T12-35-17.862Z/pr-16-selected-sidebar-ite-bea05-w-from-HighlightStyles-diff-chromium/test-finished-1.png

2026-05-12T12-35-17.862Z/pr-16-selected-sidebar-ite-bea05-w-from-HighlightStyles-diff-chromium/sidebar-selected-focus-ring.png

2026-05-12T12-34-18.397Z/pr-16-selected-sidebar-ite-3d7c9--focus-highlight-box-shadow-chromium/test-failed-1.png

@github-actions

Verify Harness

Verdict: regression (target internal-ui)

Reason: Playwright assertion failed in: selected sidebar item shows keyboard highlight focus ring via HighlightStyles
Error: expect(received).toEqual(expected) // deep equality

- Expected  - 1
+ Received  + 5

- Array []
+ Array [
+   "SecurityError: Failed to read the 'sessionStorage' property from 'Window': Access is denied for this document.
+     at <anonymous>:12:7
+     at <anonymous>:13

PR-added unit tests: ✅ passed — 1 passed, 0 failed across 2 suite(s)

Files: code/core/src/manager/components/sidebar/__tests__/HighlightStyles.test.tsx

Replay: npx playwright show-trace on the trace.zip listed under "Artifacts" on the run summary page.

Screenshots

2026-05-13T07-43-11.874Z/pr-16-selected-sidebar-ite-beda2-yboard-highlight-focus-ring-chromium/test-failed-1.png

2026-05-13T07-44-04.021Z/pr-16-selected-sidebar-ite-1ad38-us-ring-via-HighlightStyles-chromium/test-failed-1.png

2026-05-13T07-44-04.021Z/pr-16-selected-sidebar-ite-1ad38-us-ring-via-HighlightStyles-chromium/sidebar-selected-focus-ring.png

valentinpalkovic added a commit that referenced this pull request May 13, 2026
PR #16 run 25785401577 ran cleanly through the new telemetry path
but every token/cost field landed at 0. Cause: recipe-author writes
dispatch-response.json under $GITHUB_WORKSPACE/.verify-output/<runId>/
(verify-pr-author runs from the trusted base checkout with the default
CWD), while the telemetry glob was scoped to
$PR_HEAD_DIR/.verify-output/ only. The recipe-author dispatches were
therefore invisible to the aggregator, whereas evidence-check (which writes
beside verify-result.json under $PR_HEAD_DIR) would have been
counted if a verified verdict had triggered it.

Scan both roots. The Stage-artifacts step already separates the two
trees (base/ vs runner-pr-head/), so this is a parity fix.
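The next block sketches how both roots could be scanned. `findDispatchResponses` and the injected `listDir` callback are hypothetical names. The callback is injected only so the example can be tested without touching the disk. In the workflow, the roots would be the values of `$GITHUB_WORKSPACE` and `$PR_HEAD_DIR`.

```typescript
import { readdirSync } from "node:fs";
import { join } from "node:path";

// Returns the entries of `dir`, or undefined if it is missing or not a directory.
export type ListDir = (dir: string) => string[] | undefined;

export const fsListDir: ListDir = (dir) => {
  try {
    return readdirSync(dir);
  } catch {
    return undefined;
  }
};

// Collect .verify-output/<runId>/dispatch-response.json under every root, so
// files written from the base checkout and from the PR head are both counted.
// The Set guards against both roots pointing at the same directory.
export function findDispatchResponses(roots: string[], listDir: ListDir = fsListDir): string[] {
  const found = new Set<string>();
  for (const root of roots) {
    const outDir = join(root, ".verify-output");
    for (const runId of listDir(outDir) ?? []) {
      const runDir = join(outDir, runId);
      if (listDir(runDir)?.includes("dispatch-response.json")) {
        found.add(join(runDir, "dispatch-response.json"));
      }
    }
  }
  return [...found];
}
```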

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@valentinpalkovic valentinpalkovic added ci:verify Trigger PR Verification Harness and removed ci:verify Trigger PR Verification Harness labels May 13, 2026
github-actions Bot pushed a commit that referenced this pull request May 13, 2026
@github-actions

Verify Harness

Verdict: regression (target internal-ui)

Reason: Playwright assertion failed in: selected sidebar item has inset focus highlight box-shadow
Error: expect(received).toEqual(expected) // deep equality

- Expected  - 1
+ Received  + 5

- Array []
+ Array [
+   "SecurityError: Failed to read the 'sessionStorage' property from 'Window': Access is denied for this document.
+     at <anonymous>:12:7
+     at <anonymous>:13

PR-added unit tests: ✅ passed — 1 passed, 0 failed across 2 suite(s)

Files: code/core/src/manager/components/sidebar/__tests__/HighlightStyles.test.tsx

Replay: npx playwright show-trace on the trace.zip listed under "Artifacts" on the run summary page.

Screenshots

2026-05-13T07-54-28.554Z/pr-16-selected-sidebar-ite-ecf57--focus-highlight-box-shadow-chromium/test-failed-1.png

2026-05-13T07-54-28.554Z/pr-16-selected-sidebar-ite-ecf57--focus-highlight-box-shadow-chromium/sidebar-selected-focus-ring.png

2026-05-13T07-53-32.148Z/pr-16-selected-sidebar-ite-3d7c9--focus-highlight-box-shadow-chromium/test-failed-1.png

valentinpalkovic added a commit that referenced this pull request May 13, 2026
PR #16 run 25785774485 hit a runtime jq error during cost aggregation:

  jq: error (at .verify-output/<id>/dispatch-response.json:16):
  number (182708.75) and object ({"input_tok...) cannot be divided

The exact same script and artifact files ran cleanly on
jq 1.8.1 locally and produced $0.340414. The CI runner ships jq 1.6.x
(Ubuntu 22.04 default), which evaluates the chained `//` alternative
operator combined with arithmetic differently when an intermediate field is
absent or non-numeric.

Two fixes layered together:

1. Coerce every SDK field via a new `num(x)` helper that returns 0
   for any non-number input (null / object / string), instead of
   relying on `(.x // 0)` defaulting, which can leak the original
   non-numeric value when the field exists but is the wrong type.
2. Replace the `(a // null) // (b // 0)` chain for the 5m cache write
   tokens with an explicit `if a > 0 then a else b end` ladder.

Same output as before on the real PR #16 artifacts ($0.340414, 45803
input / 1996 output / 9111 cache_read / 9111 cache_write).
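Here is the same coercion expressed in TypeScript, as a sketch. It assumes each dispatch response carries an Anthropic Messages API style `usage` object with `input_tokens`, `output_tokens`, `cache_read_input_tokens`, `cache_creation_input_tokens`, and a nested `cache_creation.ephemeral_5m_input_tokens`. The commit does not say which fields `a` and `b` refer to, so the mapping in the ladder is an assumption. `sumUsage` is a hypothetical name. Pricing is deliberately left out.

```typescript
// Anything that is not a finite number (null, undefined, object, string)
// becomes 0, mirroring the `num(x)` helper described above.
export const num = (x: unknown): number =>
  typeof x === "number" && Number.isFinite(x) ? x : 0;

// Assumed usage shape; the harness's dispatch-response.json may differ.
interface Usage {
  input_tokens?: unknown;
  output_tokens?: unknown;
  cache_read_input_tokens?: unknown;
  cache_creation_input_tokens?: unknown;
  cache_creation?: { ephemeral_5m_input_tokens?: unknown } | null;
}

export interface TokenTotals {
  input: number;
  output: number;
  cacheRead: number;
  cacheWrite: number;
}

export function sumUsage(usages: Usage[]): TokenTotals {
  const t: TokenTotals = { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 };
  for (const u of usages) {
    t.input += num(u.input_tokens);
    t.output += num(u.output_tokens);
    t.cacheRead += num(u.cache_read_input_tokens);
    // Explicit ladder instead of a chained fallback: use the 5m bucket when
    // it is a positive number, otherwise the aggregate cache-write field.
    const fiveMin = num(u.cache_creation?.ephemeral_5m_input_tokens);
    t.cacheWrite += fiveMin > 0 ? fiveMin : num(u.cache_creation_input_tokens);
  }
  return t;
}
```

Unlike `(.x // 0)`, `num` rejects a value that is present but has the wrong type. That is the failure mode behind the `number ... and object ... cannot be divided` error.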

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@valentinpalkovic valentinpalkovic added ci:verify Trigger PR Verification Harness and removed ci:verify Trigger PR Verification Harness labels May 13, 2026
github-actions Bot pushed a commit that referenced this pull request May 13, 2026
@github-actions

Verify Harness

Verdict: regression (target internal-ui)

Reason: Playwright assertion failed in: selected sidebar item shows inset focus highlight from HighlightStyles
Error: expect(received).toEqual(expected) // deep equality

- Expected  - 1
+ Received  + 5

- Array []
+ Array [
+   "SecurityError: Failed to read the 'sessionStorage' property from 'Window': Access is denied for this document.
+     at <anonymous>:12:7
+     at <anonymous>:13

PR-added unit tests: ✅ passed — 1 passed, 0 failed across 2 suite(s)

Files: code/core/src/manager/components/sidebar/__tests__/HighlightStyles.test.tsx

Replay: npx playwright show-trace on the trace.zip listed under "Artifacts" on the run summary page.

Screenshots

2026-05-13T08-11-43.885Z/pr-16-selected-sidebar-ite-01f76-hlight-from-HighlightStyles-chromium/test-failed-1.png

2026-05-13T08-11-43.885Z/pr-16-selected-sidebar-ite-01f76-hlight-from-HighlightStyles-chromium/sidebar-selected-highlight.png

2026-05-13T08-10-49.263Z/pr-16-selected-sidebar-ite-3d7c9--focus-highlight-box-shadow-chromium/test-failed-1.png

@valentinpalkovic valentinpalkovic added ci:verify Trigger PR Verification Harness and removed ci:verify Trigger PR Verification Harness labels May 13, 2026
github-actions Bot pushed a commit that referenced this pull request May 13, 2026
@github-actions

Verify Harness

Verdict: regression (target internal-ui)

Reason: Playwright assertion failed in: selected sidebar item has inset 2px box-shadow ring from HighlightStyles
Error: expect(received).toEqual(expected) // deep equality

- Expected  - 1
+ Received  + 5

- Array []
+ Array [
+   "SecurityError: Failed to read the 'sessionStorage' property from 'Window': Access is denied for this document.
+     at <anonymous>:12:7
+     at <anonymous>:13

PR-added unit tests: ✅ passed — 1 passed, 0 failed across 2 suite(s)

Files: code/core/src/manager/components/sidebar/__tests__/HighlightStyles.test.tsx

Replay: npx playwright show-trace on the trace.zip listed under "Artifacts" on the run summary page.

Screenshots

2026-05-13T08-22-20.385Z/pr-16-selected-sidebar-ite-dec16-w-ring-from-HighlightStyles-chromium/test-failed-1.png

2026-05-13T08-22-20.385Z/pr-16-selected-sidebar-ite-dec16-w-ring-from-HighlightStyles-chromium/sidebar-selected-focus-ring.png

2026-05-13T08-21-25.272Z/pr-16-selected-sidebar-ite-601a3-box-shadow-when-highlighted-chromium/test-failed-1.png

valentinpalkovic added a commit that referenced this pull request May 13, 2026
…x-target CI, Layer-1/Layer-2 security, retry on regression, telemetry

Squash of fork-side iteration on top of the single-round v6 pivot.
Major changes since 00aa5c4:

## Verdict layering

- Three orthogonal signals: Playwright (recipe execution) + vision
  evidence-check (claude-haiku-4-5 reading the diff + spec + screenshots)
  + PR-added unit tests (vitest on *.test.* files from the PR diff).
- Final verdict gates on AND of Playwright + unit tests. Vision is
  informational (catches sr-only / invisible changes where assertions
  pass but screenshots can't confirm).
- regressionReason is derived from playwright-report.json when the
  recipe author doesn't populate one — reviewers see the failing test
  title + first error inline.
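The gating rule and the fallback `regressionReason` can be sketched as below. The `Signals` shape and both function names are hypothetical. The report interfaces model only the subset of Playwright's JSON reporter output that the sketch reads (`suites`, `specs`, `ok`, `tests`, `results`, `errors`).

```typescript
type PlaywrightVerdict = "verified" | "regression";
type EvidenceVerdict = "found" | "missing" | "undetermined";

interface Signals {
  playwright: PlaywrightVerdict;
  unitTests: { ran: boolean; passed?: boolean };
  evidence?: EvidenceVerdict; // vision check: informational only
}

export function deriveFinalVerdict(s: Signals): { verdict: PlaywrightVerdict; evidence?: EvidenceVerdict } {
  // PR-added unit tests gate only when the PR actually shipped some.
  const unitTestsOk = !s.unitTests.ran || s.unitTests.passed === true;
  const verdict: PlaywrightVerdict =
    s.playwright === "verified" && unitTestsOk ? "verified" : "regression";
  // Vision evidence is surfaced but never changes the verdict.
  return { verdict, evidence: s.evidence };
}

// Assumed subset of Playwright's JSON reporter output.
interface PwResult { errors?: { message?: string }[] }
interface PwSpec { title: string; ok: boolean; tests: { results: PwResult[] }[] }
interface PwSuite { title: string; specs?: PwSpec[]; suites?: PwSuite[] }

// Depth-first search for the first failing spec and its first error message.
export function deriveRegressionReason(report: { suites?: PwSuite[] }): string | undefined {
  const visit = (suites: PwSuite[] = []): string | undefined => {
    for (const suite of suites) {
      for (const spec of suite.specs ?? []) {
        if (spec.ok) continue;
        const message = spec.tests
          .flatMap((t) => t.results)
          .flatMap((r) => r.errors ?? [])
          .find((e) => e.message)?.message;
        return `Playwright assertion failed in: ${spec.title}\n${message ?? "(no error message)"}`;
      }
      const nested = visit(suite.suites);
      if (nested) return nested;
    }
    return undefined;
  };
  return visit(report.suites);
}
```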

## Retry loop

- Retry-on-regression: feeds Playwright error context (page snapshot +
  iframe a11y snapshot + first error from playwright-report.json) back
  to the recipe author as --retry-context. Author re-emits the spec,
  Playwright re-runs. Single retry; final verdict gates label.
- Retry-on-evidence-undetermined: feeds vision reasoning back so the
  author can target the diff more precisely (e.g., tighter screenshot
  region).

## Sandbox-target CI path

- Recipes can set `// @verify-target: sandbox:<template>` (e.g.,
  `sandbox:vue3-vite/default-ts`). The workflow detects the header,
  runs `nx run <template>:sandbox` (NX resolves implicitDependencies,
  emits the sandbox at code/sandbox/<key>), and verify-pr.ts boots
  Storybook against that sandbox instead of the internal-ui dev server.
- Allowlisted templates: react-vite, react-webpack, vue3-vite,
  svelte-vite, angular-cli, nextjs, nextjs-vite (all default-ts).
- Skips the global `compile` target when sandbox-bound — `:sandbox`
  handles all transitive deps via the NX project graph.
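The following is a sketch of header detection and allowlisting. The allowlist and the `nx run <template>:sandbox` shape come from the commit message, which names template keys such as `vue3-vite/default-ts`. The exact header grammar, the error handling, and the function names are assumptions.

```typescript
// Allowlist as listed in the commit message ("all default-ts").
const ALLOWED_SANDBOX_TEMPLATES = new Set(
  ["react-vite", "react-webpack", "vue3-vite", "svelte-vite", "angular-cli", "nextjs", "nextjs-vite"].map(
    (name) => `${name}/default-ts`,
  ),
);

// A "// @verify-target: <value>" comment on its own line anywhere in the spec.
const TARGET_HEADER = /^[ \t]*\/\/[ \t]*@verify-target:[ \t]*(\S+)[ \t]*$/m;

export type VerifyTarget = { kind: "internal-ui" } | { kind: "sandbox"; template: string };

export function parseVerifyTarget(specSource: string): VerifyTarget {
  const match = TARGET_HEADER.exec(specSource);
  if (!match) return { kind: "internal-ui" }; // default target
  const value = match[1];
  if (!value.startsWith("sandbox:")) {
    throw new Error(`Unsupported @verify-target: ${value}`);
  }
  const template = value.slice("sandbox:".length);
  if (!ALLOWED_SANDBOX_TEMPLATES.has(template)) {
    throw new Error(`Sandbox template is not allowlisted: ${template}`);
  }
  return { kind: "sandbox", template };
}

// Command the workflow would run for a sandbox-bound recipe.
export const sandboxCommand = (template: string): string[] => ["nx", "run", `${template}:sandbox`];
```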

## Layer-1 security: secret stripping

- pull_request_target runs build / sandbox / recipe code from the
  untrusted PR head as the runner user that holds GITHUB_TOKEN
  (contents:write, pull-requests:write) and ANTHROPIC_API_KEY.
- The Verify-PR, Retry-on-regression, and Run-PR-added-unit-tests
  steps `unset GITHUB_TOKEN GH_TOKEN ANTHROPIC_API_KEY` before
  invoking any PR-head script. Trusted scripts above
  (verify-pr-generate, verify-pr-author) still see the keys because
  env -u (or env --unset on the inner command) only strips for the
  single command.
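In TypeScript terms, per-command stripping amounts to passing a filtered copy of the environment to the child process. This is an illustrative analogue of the workflow's shell-level `env -u`, not the harness's code. `withoutSecrets` and `runUntrusted` are hypothetical names.

```typescript
import { spawnSync } from "node:child_process";

const SECRET_VARS = ["GITHUB_TOKEN", "GH_TOKEN", "ANTHROPIC_API_KEY"];

type Env = Record<string, string | undefined>;

// Copy of `env` without the secrets. The caller's environment is not mutated,
// so trusted code in the same process keeps its keys.
export function withoutSecrets(env: Env): Env {
  const copy: Env = { ...env };
  for (const name of SECRET_VARS) delete copy[name];
  return copy;
}

// Run one PR-head command with the secrets stripped, analogous to
// `env -u GITHUB_TOKEN -u GH_TOKEN -u ANTHROPIC_API_KEY <cmd> <args...>`.
export function runUntrusted(cmd: string, args: string[]): number {
  const result = spawnSync(cmd, args, { env: withoutSecrets(process.env), stdio: "inherit" });
  return result.status ?? 1;
}
```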

## Layer-2 security: @anthropic-ai/sandbox-runtime jail

- Wraps `yarn verify-pr` (initial attempt + retry) in srt with a
  bubblewrap-backed FS + network jail. Defence-in-depth on top of
  Layer-1.
- network.allowLocalBinding: true (Storybook dev server on
  localhost:6006); network.allowedDomains: [] (no public-internet
  egress).
- filesystem.allowWrite: $RUNNER_TEMP, /tmp, $HOME/.cache,
  $HOME/.local/share, $HOME/.storybook.
- filesystem.denyRead: $HOME/{.ssh, .aws, .docker, .npmrc, .gitconfig,
  .config/gh} (belt-and-suspenders alongside the env stripping).
- CLAUDE_CODE_TMPDIR=$RUNNER_TEMP/sandbox-tmp so the sandbox's TMPDIR
  bind source exists on the host.
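The policy above can be assembled as a plain object and serialised with `JSON.stringify`, which escapes each path correctly. This serves the same goal as building the arrays with `jq -R | jq -s`, listed as H3 further down. The key names follow this commit message. Whether sandbox-runtime expects exactly this nesting is an assumption, and `buildSrtSettings` and `claudeCodeTmpDir` are hypothetical helpers.

```typescript
export interface SrtSettings {
  network: { allowLocalBinding: boolean; allowedDomains: string[] };
  filesystem: { allowWrite: string[]; denyRead: string[] };
}

export function buildSrtSettings(runnerTemp: string, home: string): SrtSettings {
  return {
    network: {
      allowLocalBinding: true, // Storybook dev server on localhost:6006
      allowedDomains: [], // no public-internet egress
    },
    filesystem: {
      allowWrite: [runnerTemp, "/tmp", `${home}/.cache`, `${home}/.local/share`, `${home}/.storybook`],
      denyRead: [".ssh", ".aws", ".docker", ".npmrc", ".gitconfig", ".config/gh"].map((p) => `${home}/${p}`),
    },
  };
}

// Value for CLAUDE_CODE_TMPDIR; the directory must exist on the host before
// the sandbox starts so the TMPDIR bind source resolves.
export const claudeCodeTmpDir = (runnerTemp: string): string => `${runnerTemp}/sandbox-tmp`;
```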

## Recipe-author quality

- Deterministic story-route derivation: scripts/verify/derive-story-routes.ts
  parses code/.storybook/main.ts via TS AST + inlines
  Storybook's auto-title / toId / storyNameFromExport algorithms.
  Routes injected into the prompt bundle verbatim — agents stop
  guessing 404 paths.
- Full source of touched non-stories files in the prompt bundle
  (capped 250 lines per file, 4 files per PR). Agents see actual
  component props / ariaLabels / data-attrs upfront.
- Iframe a11y snapshot fixture in _util.ts: on test failure, writes
  the preview-iframe's body.ariaSnapshot() to iframe-snapshot.md.
  Retry step appends this alongside the manager page-snapshot.
- Authoring guide §8.1 expanded with evidence requirement + four-step
  evidence gate + worked examples (focus-ring, Save-from-Controls
  icon swap, sr-only label gating).
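Below is a simplified sketch of the route algorithms. The functions are approximations written for illustration and are not Storybook's implementations. The real `sanitize` and start-case helpers handle more punctuation and Unicode, and auto-title has further rules (such as `index` files) that are omitted here. `toId` and `storyNameFromExport` reuse Storybook's names. `sanitize`, `autoTitle`, and `storyUrl` are hypothetical.

```typescript
// Lowercase, turn every non-alphanumeric character into "-", collapse runs
// of dashes, and trim dashes at both ends (simplified).
export function sanitize(s: string): string {
  return s
    .toLowerCase()
    .replace(/[^a-z0-9]/g, "-")
    .replace(/-+/g, "-")
    .replace(/^-+|-+$/g, "");
}

export function toId(title: string, name: string): string {
  return `${sanitize(title)}--${sanitize(name)}`;
}

// "PrimaryButton" -> "Primary Button", "with_icon" -> "With Icon".
export function storyNameFromExport(exportName: string): string {
  return exportName
    .replace(/[_-]+/g, " ")
    .replace(/([a-z0-9])([A-Z])/g, "$1 $2")
    .replace(/\b\w/g, (c) => c.toUpperCase())
    .replace(/\s+/g, " ")
    .trim();
}

// titlePrefix + path below the stories directory, minus ".stories.*",
// dropping a last segment that repeats its parent ("Button/Button" -> "Button").
export function autoTitle(titlePrefix: string, relativePath: string): string {
  const segments = relativePath.replace(/\.stories\.[jt]sx?$/, "").split("/").filter(Boolean);
  const n = segments.length;
  if (n >= 2 && segments[n - 1].toLowerCase() === segments[n - 2].toLowerCase()) segments.pop();
  return [titlePrefix, ...segments].filter(Boolean).join("/");
}

// Standard Storybook manager URL for a story id.
export const storyUrl = (base: string, id: string): string =>
  `${base.replace(/\/$/, "")}/?path=/story/${id}`;
```

For example, a title of `Addons/Docs/Blocks/Controls/Object` with story `Object` yields `addons-docs-blocks-controls-object--object`, the route PR #14 needed.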

## Compile-failure surfacing

- When `nx compile` fails before Playwright runs, the workflow writes
  a stub verify-result.json with verdict=regression, regressionReason=
  "compile failure", regressionDetails=tail -c 4000 of the log
  (ANSI-stripped). PR comment renders the build error in-line so
  reviewers see WHY without downloading artifacts.
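The next block sketches how that stub could be built, assuming the compile log is available as a string. The field values come from the commit message, and `buildCompileFailureStub` is a hypothetical name. The later `write-compile-failure-stub.ts` also signs the stub; this sketch leaves signing out.

```typescript
// CSI escape sequences such as colour codes ("\x1b[31m", "\x1b[0m").
const ANSI_CSI = /\x1b\[[0-?]*[ -\/]*[@-~]/g;

export interface CompileFailureStub {
  verdict: "regression";
  regressionReason: "compile failure";
  regressionDetails: string;
}

export function buildCompileFailureStub(log: string, tailBytes = 4000): CompileFailureStub {
  // Strip ANSI first, then keep the last `tailBytes` bytes, like `tail -c`.
  const clean = Buffer.from(log.replace(ANSI_CSI, ""), "utf8");
  const tail = clean.subarray(Math.max(0, clean.length - tailBytes)).toString("utf8");
  return { verdict: "regression", regressionReason: "compile failure", regressionDetails: tail };
}
```

Like `tail -c`, the byte cut can land inside a multi-byte character, which then decodes as a replacement character at the start of the tail.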

## UX polish

- Vision reasoning collapsed inside a <details> block (verdict stays
  one-glance, reasoning one click away).
- PR comment unitTests block renders ✅/❌ alongside Playwright +
  vision so reviewers see all three signals together.
- Artifact zip staged under non-dot dirs so reviewers can browse it
  without toggling Finder's hidden-file display.
- Replay link points at the run-summary page (where the Artifacts
  section lives) instead of the 404-emitting /artifacts path.

## Telemetry

- New "Append telemetry" workflow step writes one CSV row per run to
  telemetry.csv on the _verify-screenshots side branch. Columns:
  run_id, pr_number, verdict, target, evidence_verdict,
  evidence_retry, unit_tests_ran, unit_tests_passed, duration_ms,
  timestamp. After 10–20 PRs the data drives v8 prioritisation
  (in-app role discovery, 2-retry budget, cross-package story
  heuristic, etc.).
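The following sketch serialises one telemetry row with RFC 4180 quoting. Column order follows the list above. The value types and helper names are assumptions. A later commit in this thread replaces the CSV append with a POST to a Google Apps Script Sheet.

```typescript
const COLUMNS = [
  "run_id", "pr_number", "verdict", "target", "evidence_verdict",
  "evidence_retry", "unit_tests_ran", "unit_tests_passed", "duration_ms", "timestamp",
] as const;

type Column = (typeof COLUMNS)[number];
export type TelemetryRow = Record<Column, string | number | boolean | null>;

// Quote a field when it contains a comma, quote, CR or LF, and double any
// embedded quotes; null becomes an empty field.
export function csvField(v: string | number | boolean | null): string {
  const s = v === null ? "" : String(v);
  return /[",\r\n]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
}

export const csvHeader = (): string => COLUMNS.join(",");
export const toCsvRow = (row: TelemetryRow): string => COLUMNS.map((c) => csvField(row[c])).join(",");
```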

## Validation

Firetest PRs (fork-side):
- #12 internal-ui smoke — verified
- #13 Save-from-Controls icon swap — verified + evidence found
- #14 ObjectControl raw JSON sr-only label — verified after retry
- #15 ArgsTable dark-mode border — regression (genuine compile-fail)
- #16 sidebar focus ring — verified, three signals positive
- #17 Vue3 page-style scoping (sandbox target) — verified + found
- #18 Svelte docgen refactor (sandbox target) — verified
- #21 Angular stats.json (sandbox target) — verified

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@valentinpalkovic valentinpalkovic force-pushed the next branch 2 times, most recently from d57333c to 046eda5 Compare May 14, 2026 10:54
…posites

Adds an LLM-authored, single-round PR verification harness that runs
under `pull_request_target` when a maintainer applies `ci:verify` to a
non-draft PR. Authors a Playwright spec from the PR diff, executes it
against either the monorepo's internal Storybook UI or a sandbox
template, optionally validates evidence via Claude Haiku vision, runs
PR-added unit tests, and posts a verdict comment with screenshots.

Infrastructure is factored into reusable composite actions so future
agentic workflows can reuse the trust-boundary plumbing:

  .github/actions/agentic-pr-prepare/   — actor gate, base + PR-head
                                          manual clones, sandbox-runtime
                                          (srt) install + sha-pin, srt
                                          settings, trusted-harness sync
  .github/actions/agentic-pr-publish/   — verdict read, side-branch
                                          screenshot push, telemetry
                                          append, artifact upload

Trust-boundary hardenings (per dual-LLM security review):

- C1 HMAC-bound verdict: scripts/verify-pr.ts signs the trust-critical
  subset of verify-result.json with VERIFY_PROVENANCE_SECRET; trusted
  derive-verdict.ts downgrades 'verified' → 'regression' on signature
  mismatch (closes the in-srt forgery vector).
- H1: srt-sha256 has no default in the composite action; the caller must pass it inline.
- H2: sync-files/sync-trees inputs reject `..` / leading `/` / extra `:`;
  realpath asserts under $PR_HEAD_DIR; symlink-refuse before cp.
- H3: srt-settings.json arrays emitted via jq -R | jq -s.
- H4: screenshot URLs exposed as composite output FILE PATH (caller
  fs.readFileSync), closing heredoc-terminator injection.
- M1: every publish sub-step that needs prior-step-failure tolerance
  carries explicit `if: always()` (composite-level if: doesn't cascade).
- M2: VERIFY_PROVENANCE_SECRET written to file (mode 0600), not
  $GITHUB_ENV.
- M3: tokens passed via env mapping only, never literal interpolation.
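The next block sketches the C1 HMAC binding using Node's `crypto` module. The actual field list is not given, so the trust-critical subset shown (`verdict`, `target`, `unitTestsPassed`) and the helper names are hypothetical. The mechanism is HMAC-SHA256 over a fixed-order serialisation, compared with `timingSafeEqual`. On any mismatch, a claimed `verified` falls back to `regression`.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical trust-critical subset of verify-result.json.
interface SignedFields {
  verdict: "verified" | "regression";
  target: string;
  unitTestsPassed: boolean | null;
}

// Fixed field order so signer and verifier hash identical bytes.
function canonical(f: SignedFields): string {
  return JSON.stringify([f.verdict, f.target, f.unitTestsPassed]);
}

export function sign(f: SignedFields, secret: string): string {
  return createHmac("sha256", secret).update(canonical(f)).digest("hex");
}

export function verifyOrDowngrade(f: SignedFields, signature: string, secret: string): SignedFields {
  const expected = Buffer.from(sign(f, secret), "hex");
  const given = Buffer.from(signature, "hex");
  // timingSafeEqual throws on length mismatch, so check length first.
  const ok = given.length === expected.length && timingSafeEqual(given, expected);
  if (ok || f.verdict !== "verified") return f;
  const downgraded: SignedFields = { ...f, verdict: "regression" };
  return downgraded;
}
```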

Layer-2 isolation: every PR-controlled step (yarn install, nx compile,
nx run <tpl>:sandbox, Playwright recipe, PR-added unit tests) wraps
under @anthropic-ai/sandbox-runtime (bubblewrap mount/network namespaces).
Layer-1 controls (deny-regex, ESLint policy, enableScripts:false,
committed lockfile, scoped API keys) remain in place.

Trusted scripts (verify-pr-generate / verify-pr-author / recipe-author-core
/ recipe-deny / lint-invocation / authoring guide / canonical smoke) live
in the base checkout, so a malicious PR cannot weaken the gate.

Helper scripts under scripts/verify/ci/:
  - derive-verdict.ts    — reads verify-result.json + playwright report,
                           validates HMAC, downgrades on mismatch.
  - push-screenshots.ts  — clones _agentic-pr-assets side branch, validates
                           PNG mime + per-file (5MB) / bundle (50MB) caps,
                           commits, pushes, emits raw.githubusercontent.com
                           URLs.
  - append-telemetry.ts  — POSTs to Google Apps Script Sheet (no-op when
                           webhook secrets unset).
  - render-pr-comment.ts — renders verdict comment body, redacts token-
                           shaped substrings, supports unit-test merge.
  - write-compile-failure-stub.ts — emits signed regression stub when
                           compile aborts before orchestrator runs.
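Two of the checks above can be sketched as follows. The first is PNG validation plus the 5 MB / 50 MB caps from `push-screenshots.ts`. The second is token-shaped redaction from `render-pr-comment.ts`. The magic-byte check stands in for whatever MIME detection the harness uses. The token patterns are assumptions about what counts as "token-shaped": GitHub's `ghp_`/`gho_`/`ghu_`/`ghs_`/`ghr_`/`github_pat_` prefixes and Anthropic's `sk-ant-` prefix. All names are hypothetical.

```typescript
const PNG_SIGNATURE = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];
const MAX_FILE_BYTES = 5 * 1024 * 1024;
const MAX_BUNDLE_BYTES = 50 * 1024 * 1024;

export interface Screenshot {
  name: string;
  bytes: Uint8Array;
}

// Returns a list of problems; an empty list means the bundle may be pushed.
export function validateScreenshots(files: Screenshot[]): string[] {
  const errors: string[] = [];
  let total = 0;
  for (const f of files) {
    const isPng = f.bytes.length >= 8 && PNG_SIGNATURE.every((b, i) => f.bytes[i] === b);
    if (!isPng) errors.push(`${f.name}: not a PNG`);
    if (f.bytes.length > MAX_FILE_BYTES) errors.push(`${f.name}: exceeds 5MB`);
    total += f.bytes.length;
  }
  if (total > MAX_BUNDLE_BYTES) errors.push("bundle exceeds 50MB");
  return errors;
}

const TOKEN_SHAPES =
  /\b(?:gh[pousr]_[A-Za-z0-9]{20,}|github_pat_[A-Za-z0-9_]{20,}|sk-ant-[A-Za-z0-9_-]{20,})/g;

export const redactTokens = (text: string): string => text.replace(TOKEN_SHAPES, "[REDACTED]");
```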

Documentation: scripts/verify/SECURITY.md (threat model + lethal-trifecta
breakers), scripts/verify/RUNBOOK.md (operational details),
.github/actions/agentic-pr-*/README.md (caller contract + worked example).
@github-actions

Verify Harness

Verdict: regression (target internal-ui)

PR-added unit tests: ❌ failed — vitest exited 1 without writing a JSON report (likely setup error); see Action log

Files: code/core/src/manager/components/sidebar/__tests__/HighlightStyles.test.tsx

vitest output (last 4KB)
 /tmp/claude/eval-sync-baselines-nested-no-legacy-aunPx6/remotes/excalidraw.git
 * [new branch]      main -> main
Cloning into '/tmp/claude/eval-sync-staging-DP3nYo'...
warning: You appear to have cloned an empty repository.
done.
To /tmp/claude/eval-sync-baselines-auto-clone-EBkbTR/remotes/mealdrop.git
 * [new branch]      main -> main
Cloning into '/tmp/claude/eval-sync-staging-y9ox53'...
warning: You appear to have cloned an empty repository.
done.
To /tmp/claude/eval-sync-baselines-auto-clone-EBkbTR/remotes/wikitok.git
 * [new branch]      main -> main
To /tmp/claude/eval-sync-storybook-version-b3qZhW/remotes/mealdrop.git
 * [new branch]      main -> main
To /tmp/claude/eval-sync-storybook-version-b3qZhW/remotes/wikitok.git
 * [new branch]      main -> main
To /tmp/claude/eval-sync-storybook-version-noop-PGeiSk/remotes/mealdrop.git
 * [new branch]      main -> main
To /tmp/claude/eval-sync-baselines-target-behind-di8Yr1/remotes/mealdrop.git
 * [new branch]      main -> main
To /tmp/claude/eval-sync-baselines-target-behind-di8Yr1/remotes/edgy.git
 * [new branch]      main -> main
Cloning into '/tmp/claude/eval-sync-baselines-target-behind-di8Yr1/edgy-remote-worktree'...
done.
To /tmp/claude/eval-sync-baselines-target-behind-di8Yr1/remotes/edgy.git
   eed4107..629fbfb  main -> main
To /tmp/claude/eval-sync-storybook-version-dirty-g4tDzF/remotes/mealdrop.git
 * [new branch]      main -> main
Cloning into '/tmp/claude/eval-sync-storybook-version-staging-g8Nxcv'...
warning: You appear to have cloned an empty repository.
done.
To /tmp/claude/eval-sync-storybook-version-auto-clone-atDFTP/remotes/mealdrop.git
 * [new branch]      main -> main
To /tmp/claude/eval-sync-storybook-version-skip-push-Dtskw4/remotes/mealdrop.git
 * [new branch]      main -> main
(node:21) ExperimentalWarning: SQLite is an experimental feature and might change at any time
(Use `node --trace-warnings ...` to show where the warning was created)
To /tmp/claude/eval-sync-storybook-version-resume-push-no8hj9/remotes/mealdrop.git
 * [new branch]      main -> main
(node:21) [DEP0040] DeprecationWarning: The `punycode` module is deprecated. Please use a userland alternative instead.
(Use `node --trace-deprecation ...` to show where the warning was created)
(node:21) [DEP0040] DeprecationWarning: The `punycode` module is deprecated. Please use a userland alternative instead.
(Use `node --trace-deprecation ...` to show where the warning was created)
(node:21) [DEP0040] DeprecationWarning: The `punycode` module is deprecated. Please use a userland alternative instead.
(Use `node --trace-deprecation ...` to show where the warning was created)

⎯⎯⎯⎯⎯⎯ Unhandled Error ⎯⎯⎯⎯⎯⎯⎯
Error: EROFS: read-only file system, open '/home/runner/work/_temp/unit-tests-report.json'
 ❯ open node:internal/fs/promises:636:25
 ❯ Object.writeFile node:internal/fs/promises:1205:14
 ❯ JsonReporter.writeReport node_modules/vitest/dist/chunks/index.UpGiHP7g.js:3626:4
 ❯ JsonReporter.onTestRunEnd node_modules/vitest/dist/chunks/index.UpGiHP7g.js:3613:3
 ❯ Vitest.report node_modules/vitest/dist/chunks/cli-api.Cjt90eJu.js:13968:3
 ❯ TestRun.end node_modules/vitest/dist/chunks/cli-api.Cjt90eJu.js:12591:3
 ❯ node_modules/vitest/dist/chunks/cli-api.Cjt90eJu.js:13591:6
 ❯ node_modules/vitest/dist/chunks/cli-api.Cjt90eJu.js:13601:11
 ❯ node_modules/vitest/dist/chunks/cli-api.Cjt90eJu.js:13463:19

⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯
Serialized Error: { errno: -30, code: 'EROFS', syscall: 'open', path: '/home/runner/work/_temp/unit-tests-report.json' }




Replay: npx playwright show-trace on the trace.zip listed under "Artifacts" on the run summary page.

Screenshots

2026-05-14T13-56-28.266Z/pr-16-selected-sidebar-ite-beda2-yboard-highlight-focus-ring-chromium/test-failed-1.png

@valentinpalkovic valentinpalkovic force-pushed the next branch 5 times, most recently from 8e0a05c to 80ccd7d Compare May 15, 2026 16:56
Labels

ci:verify Trigger PR Verification Harness

2 participants