chore: verify-harness A4 label-fire test (do not merge)#34763

Closed
valentinpalkovic wants to merge 1 commit into valentin/agentic-review-harness from valentin/verify-harness-a4-firetest

Conversation

@valentinpalkovic
Contributor

@valentinpalkovic valentinpalkovic commented May 11, 2026

Purpose

Activation gate A4 for the PR Verification Harness (tracked in #34762). Tests that:

  • The verify-pr.yml workflow fires on label apply.
  • Actor-permission gate passes.
  • Generate bundle + Author recipe steps succeed against pinned action SHAs and the provisioned ANTHROPIC_API_KEY secret.
  • The committed spec runs in --network=none.
  • Verdict artifact uploads.

Do not merge

This PR exists only to trigger the workflow. It will be closed after the run completes, so the single comment-line change is never merged.

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Added automated PR verification harness with AI-assisted test generation and Playwright recipe execution against Storybook.
    • New GitHub Actions workflow enables automated verification on labeled PRs with artifacts and PR comment reporting.
  • Tests

    • Added Playwright test infrastructure with recipe authoring guidance and utility helpers.
  • Documentation

    • Added PR verification guides, security documentation, and recipe authoring specifications.
  • Chores

    • Added Docker and Git ignore configurations for secure isolated execution.
    • Added new CLI commands: verify-pr, verify-pr-generate, verify-pr-author.
    • Added Anthropic SDK dependency.

No-op comment added to trigger PR Verification Harness workflow for
activation gate A4. To be closed after the workflow run completes and
the verdict is captured.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions
Contributor

github-actions Bot commented May 11, 2026

Fails

🚫 PR is not labeled with one of: ["cleanup","BREAKING CHANGE","feature request","bug","documentation","maintenance","build","dependencies"]

🚫 PR is not labeled with one of: ["ci:normal","ci:merged","ci:daily","ci:docs"]

🚫 PR title must be in the format of "Area: Summary", with both Area and Summary starting with a capital letter.

  • Good examples: "Docs: Describe Canvas Doc Block", "Svelte: Support Svelte v4"
  • Bad examples: "add new api docs", "fix: Svelte 4 support", "Vue: improve docs"

🚫 PR description is missing the mandatory "#### Manual testing" section. Please add it so that reviewers know how to manually test your changes.

Warnings

⚠️ This PR targets valentin/agentic-review-harness. The default branch for contributions is next. Please make sure you are targeting the correct branch.

Generated by 🚫 dangerJS against 492d6fa

@valentinpalkovic valentinpalkovic added the ci:verify Trigger PR Verification Harness label May 11, 2026
@valentinpalkovic valentinpalkovic changed the base branch from next to valentin/agentic-review-harness May 11, 2026 13:18
@valentinpalkovic valentinpalkovic added ci:verify Trigger PR Verification Harness and removed ci:verify Trigger PR Verification Harness labels May 11, 2026
@valentinpalkovic
Contributor Author

Closing — A4 blocked until #34762 merges to next. The workflow file is not on the default branch, so pull_request_target cannot dispatch it. Re-run A4 after merge.

@coderabbitai
Contributor

coderabbitai Bot commented May 11, 2026

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 3b7f130f-577f-4500-becd-3d6b4e0ed562

📥 Commits

Reviewing files that changed from the base of the PR and between 0eab5b5 and 492d6fa.

⛔ Files ignored due to path filters (1)
  • yarn.lock is excluded by !**/yarn.lock, !**/*.lock
📒 Files selected for processing (35)
  • .agents/skills/verify-recipe-author/SKILL.md
  • .claude/skills/verify-recipe-author/SKILL.md
  • .dockerignore
  • .github/workflows/verify-pr.yml
  • .gitignore
  • .verify-recipes/.eslintrc.cjs
  • .verify-recipes/.gitkeep
  • .verify-recipes/_recipe-authoring-guide.md
  • .verify-recipes/_util.ts
  • .verify-recipes/example-smoke.spec.ts
  • code/core/src/manager-api/index.ts
  • package.json
  • scripts/verify-pr-author.ts
  • scripts/verify-pr-generate.ts
  • scripts/verify-pr.ts
  • scripts/verify/README.md
  • scripts/verify/SECURITY.md
  • scripts/verify/__fixtures__/stub-assistant-reply-clean.txt
  • scripts/verify/__fixtures__/stub-assistant-reply-with-unused-var.txt
  • scripts/verify/__fixtures__/stub-assistant-reply.txt
  • scripts/verify/agent-dispatch.ts
  • scripts/verify/agent-prompt.ts
  • scripts/verify/boot.ts
  • scripts/verify/core.ts
  • scripts/verify/lint-invocation.ts
  • scripts/verify/playwright.config.ts
  • scripts/verify/recipe-author-core.ts
  • scripts/verify/recipe-deny.ts
  • scripts/verify/recipe-retry-policy.ts
  • scripts/verify/recipes/triage-table.ts
  • scripts/verify/runner.ts
  • scripts/verify/sandbox.ts
  • scripts/verify/symlink.ts
  • scripts/verify/sync.ts
  • scripts/verify/triage.ts

📝 Walkthrough

This PR introduces a comprehensive, multi-stage PR verification harness for Storybook that generates Playwright recipe specs via Anthropic Claude, validates them against security and structural gates, and executes them in isolated Storybook sandboxes. It includes GitHub Actions CI integration, hardened container execution, sandbox state management, and detailed result reporting.

Changes

PR Verification Harness—Agent-Driven Recipe Generation & CI Workflow

Layer / File(s) Summary
Core Data Models & Schemas
scripts/verify/core.ts, scripts/verify/agent-prompt.ts, scripts/verify/recipe-author-core.ts, scripts/verify/recipe-retry-policy.ts, scripts/verify/recipes/triage-table.ts
Exports shared VerifyResult, RecipeTest, RunPaths, PromptInput, PromptBundle, RecipeAuthorStatus, RecipeAuthorResult, RECIPE_RETRY_POLICY, and TriageRoute types that model PR verification state, recipe authoring lifecycle, retry categories, and triage routing.
Specification, Architecture & Security Documentation
.agents/skills/verify-recipe-author/SKILL.md, .claude/skills/verify-recipe-author/SKILL.md, .verify-recipes/_recipe-authoring-guide.md, scripts/verify/README.md, scripts/verify/SECURITY.md
Defines the skill contract for recipe authoring: bundle discovery, spec extraction, deny-regex enforcement, retry protocol (exit code 75), provenance headers, and result emission. The authoring guide enforces listener-before-goto, error attachment in finally blocks, RecipePage helper usage, selector strategies, and deny-patterns. The README documents prerequisites, CLI flags, output layout, verdict semantics, and an architecture overview. The security model covers lethal-trifecta breakers, Phase 1 local controls, Phase 2 CI gating, sensitive-file exclusions, and isolation strategies.
Configuration & Dependencies
.verify-recipes/.eslintrc.cjs, scripts/verify/playwright.config.ts, .dockerignore, .gitignore, package.json
Adds a pinned TypeScript-aware ESLint config for recipe specs, a Playwright test configuration (baseURL, trace, screenshots, single worker), Docker credential/artifact exclusions, and a .verify-output ignore pattern. Also adds @anthropic-ai/sdk@0.65.0 and three npm scripts (verify-pr, verify-pr-author, verify-pr-generate).
Security Gates & Validation Rules
scripts/verify/recipe-deny.ts, scripts/verify/lint-invocation.ts, scripts/verify/recipe-retry-policy.ts
Exports labeled DENY_PATTERNS (child_process, filesystem deletion, eval, node: imports), lintRecipeSpec() spawning ESLint with pinned config and remapping exit codes for error-only failures, categorizeEslintViolations() grouping violations by rule categories (listener-before-goto, attach-pattern, imports), and formatRetryMessage() building retry prompts with truncated JSON output (8 KB cap).
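To make the deny-pattern gate and the retry-message cap more concrete, here is a minimal sketch. It is not the code from scripts/verify/recipe-deny.ts: the `DenyPattern` shape, the regexes, and the helper names `findDeniedLabels` and `truncateForRetry` are assumptions for illustration, and the cap counts characters as an approximation of the 8 KB byte limit.

```typescript
// Hypothetical shape: each deny pattern carries a human-readable label.
interface DenyPattern {
  label: string;
  pattern: RegExp;
}

// Illustrative patterns only; the real DENY_PATTERNS list may differ.
const DENY_PATTERNS_SKETCH: DenyPattern[] = [
  { label: "child_process", pattern: /\bchild_process\b/ },
  { label: "eval", pattern: /\beval\s*\(/ },
  { label: "node: import", pattern: /from\s+['"]node:[^'"]+['"]/ },
  { label: "filesystem deletion", pattern: /\b(?:rmSync|rmdirSync|unlinkSync)\s*\(/ },
];

// Returns the label of every pattern that matches the candidate spec.
function findDeniedLabels(spec: string): string[] {
  return DENY_PATTERNS_SKETCH.filter(({ pattern }) => pattern.test(spec)).map(
    ({ label }) => label,
  );
}

// Caps serialized lint output before it goes into a retry prompt.
// Counts characters, which equals bytes only for ASCII input.
function truncateForRetry(json: string, maxChars = 8 * 1024): string {
  if (json.length <= maxChars) return json;
  return json.slice(0, maxChars) + "\n[truncated]";
}
```

A spec that yields any label would be rejected before linting; an empty result lets it continue to the next gate.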
Anthropic API Integration & Request Building
scripts/verify/agent-dispatch.ts, scripts/verify/agent-prompt.ts
Exports MODEL_ID_MAP, resolveModelId(), buildAnthropicRequest() assembling message params with cached guide+smoke as ephemeral blocks, and dispatchRecipeAuthor() dispatching to Anthropic SDK with 3-attempt retry loop (429/500/502/503/504 with exponential backoff), credential enforcement via ANTHROPIC_API_KEY, stub mode support, and debug artifact writing. Prompt builder exports buildRecipeAuthorPrompt() assembling multi-section prompt with mission, guide, reference specs, PR metadata/diff, and token budget enforcement (char-count heuristic).
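The 3-attempt retry loop with exponential backoff could be sketched as follows. This is an illustration under assumptions, not the body of `dispatchRecipeAuthor()`: the helper `withRetry`, the `HttpLikeError` shape, the 1-second base delay, and the injectable `sleep` are all hypothetical. Only the attempt count and the retryable status codes come from the summary above.

```typescript
// Status codes the summary lists as retryable.
const RETRYABLE_STATUS = new Set([429, 500, 502, 503, 504]);

// Assumed error shape: an optional numeric HTTP status on the thrown value.
interface HttpLikeError {
  status?: number;
}

interface RetryOptions {
  maxAttempts?: number;
  baseDelayMs?: number;
  sleep?: (ms: number) => Promise<void>; // injectable so tests need not wait
}

async function withRetry<T>(call: () => Promise<T>, opts: RetryOptions = {}): Promise<T> {
  const maxAttempts = opts.maxAttempts ?? 3;
  const baseDelayMs = opts.baseDelayMs ?? 1000;
  const sleep =
    opts.sleep ?? ((ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)));

  for (let attempt = 1; ; attempt++) {
    try {
      return await call();
    } catch (error) {
      const status = (error as HttpLikeError | undefined)?.status;
      const retryable = status !== undefined && RETRYABLE_STATUS.has(status);
      // Give up on non-retryable errors or once the attempt budget is spent.
      if (!retryable || attempt >= maxAttempts) throw error;
      // Delay doubles each time: base, 2 * base, 4 * base, ...
      await sleep(baseDelayMs * 2 ** (attempt - 1));
    }
  }
}
```

Passing `sleep` in as an option keeps the backoff schedule observable in tests without real timers.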
PR Metadata Fetching & Prompt Bundle Generation
scripts/verify-pr-generate.ts, scripts/verify/triage.ts
Parses --pr flag, fetches PR metadata/diff via GitHub CLI, triages changed paths to reference specs using TRIAGE_ROUTES glob matching, truncates and caps diff content (per-file line cap, total file cap, byte cap), orders triage-matched files first, loads authoring guide + smoke + reference specs, builds and writes prompt-bundle.json under .verify-output/<runId>/, and prints next-step instructions.
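The ordering and capping step described here might look roughly like this. It is a minimal sketch under assumptions: `DiffFile`, `TriageRouteSketch`, and `orderAndCapDiff` are hypothetical names, a plain prefix match stands in for the glob matching that `TRIAGE_ROUTES` uses, and the byte cap is left out for brevity.

```typescript
interface DiffFile {
  path: string;
  patch: string;
}

// Hypothetical route shape; the real TriageRoute type may differ.
interface TriageRouteSketch {
  prefix: string;
  referenceSpec: string;
}

interface DiffLimits {
  maxFiles: number;
  maxLinesPerFile: number;
}

function orderAndCapDiff(
  files: DiffFile[],
  routes: TriageRouteSketch[],
  limits: DiffLimits,
): DiffFile[] {
  const isMatched = (file: DiffFile) =>
    routes.some((route) => file.path.startsWith(route.prefix));

  // Stable partition: triage-matched files first, original order otherwise kept.
  const ordered = [...files.filter(isMatched), ...files.filter((f) => !isMatched(f))];

  // Apply the total file cap, then the per-file line cap.
  return ordered.slice(0, limits.maxFiles).map((file) => {
    const lines = file.patch.split("\n");
    if (lines.length <= limits.maxLinesPerFile) return file;
    return {
      path: file.path,
      patch: lines.slice(0, limits.maxLinesPerFile).join("\n") + "\n[... truncated]",
    };
  });
}
```

Sorting matched files to the front means the file cap drops the least relevant paths first.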
Recipe Author Core & CLI
scripts/verify/recipe-author-core.ts, scripts/verify-pr-author.ts
Exports runRecipeAuthor() orchestrating spec generation via agent dispatch, extracting spec between fence markers, validating against deny-patterns, prepending provenance header, writing candidate spec, running linting and structural regex checks (listener-before-goto, attach pattern), handling collisions, and formatting retry messages from violations. CLI entry point locates latest prompt-bundle.json or uses --bundle, validates bundle version, dispatches in SDK or stdin mode, maps result statuses to exit codes (0 on success, 75 on retry-requested, 1 on failure), emits retry frames to stdout.
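Two pieces of this flow lend themselves to a short illustration: pulling the spec out of the assistant reply, and mapping the result status to an exit code. In the sketch below the marker strings, the status names, and the helpers `extractSpec` and `exitCodeFor` are assumptions; only the exit codes 0, 75, and 1 come from the summary above.

```typescript
// Hypothetical status names; the real RecipeAuthorStatus values may differ.
type AuthorStatusSketch = "success" | "retry-requested" | "failure";

// Exit codes as described in the summary: 0 success, 75 retry, 1 failure.
function exitCodeFor(status: AuthorStatusSketch): number {
  switch (status) {
    case "success":
      return 0;
    case "retry-requested":
      return 75;
    case "failure":
      return 1;
  }
}

// Hypothetical fence markers; the real markers are not shown in this PR page.
const BEGIN_MARKER = "// BEGIN-SPEC";
const END_MARKER = "// END-SPEC";

// Returns the text between the first begin marker and the next end marker,
// or null when either marker is missing.
function extractSpec(
  reply: string,
  begin: string = BEGIN_MARKER,
  end: string = END_MARKER,
): string | null {
  const start = reply.indexOf(begin);
  if (start === -1) return null;
  const bodyStart = start + begin.length;
  const stop = reply.indexOf(end, bodyStart);
  if (stop === -1) return null;
  return reply.slice(bodyStart, stop).trim();
}
```

Returning null instead of throwing lets the caller treat a malformed reply as a retry candidate rather than a hard failure.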
Sandbox Lifecycle & Storybook Startup
scripts/verify/sandbox.ts, scripts/verify/symlink.ts, scripts/verify/sync.ts, scripts/verify/boot.ts
Resolves sandbox directories with bootstrap validation, snapshots/restores package.json/yarn.lock/.yarnrc.yml, sanitizes Yarn resolutions, ensures symlinks/copies with dangling-heal and CI/Windows fallbacks, compiles core package via yarn nx compile core, preflights port availability via OS-specific commands, and boots Storybook via yarn storybook with child process forwarding and startup race against abort signal.
Recipe Execution & Report Parsing
scripts/verify/runner.ts, scripts/verify/core.ts
Spawns Playwright via bun x playwright test with environment wiring and abort controller support, streams output with line prefixes, discovers and returns trace attachment paths from JSON report, implements parsePlaywrightReport() walking nested suites/specs/tests to extract final-retry results and attachments (pageErrors, consoleErrors), and computeVerdict() classifying outcomes as verified/regression based on test status and error presence.
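As a rough illustration of the report walk, the sketch below recurses through nested suites, takes the last entry of each test's `results` array as the final retry, and derives a verdict. The interfaces model only the fields used here, based on the suites/specs/tests/results nesting of Playwright's JSON reporter. The names `collectFinalResults` and `verdictFor`, and the precomputed `errorCount` argument, are assumptions rather than the actual `parsePlaywrightReport()` or `computeVerdict()` signatures.

```typescript
// Minimal subset of the Playwright JSON report shape used by this sketch.
interface ResultSketch {
  status: string;
  attachments?: { name: string }[];
}
interface TestSketch {
  results: ResultSketch[];
}
interface SpecSketch {
  title: string;
  tests: TestSketch[];
}
interface SuiteSketch {
  specs?: SpecSketch[];
  suites?: SuiteSketch[];
}

interface FlatTest {
  title: string;
  status: string;
  attachmentNames: string[];
}

// Depth-first walk that keeps only the final retry of each test.
function collectFinalResults(suites: SuiteSketch[]): FlatTest[] {
  const out: FlatTest[] = [];
  for (const suite of suites) {
    for (const spec of suite.specs ?? []) {
      for (const test of spec.tests) {
        const final = test.results[test.results.length - 1];
        if (!final) continue; // test never ran
        out.push({
          title: spec.title,
          status: final.status,
          attachmentNames: (final.attachments ?? []).map((a) => a.name),
        });
      }
    }
    out.push(...collectFinalResults(suite.suites ?? []));
  }
  return out;
}

// Assumed rule: verified only if every test passed and no errors were collected.
function verdictFor(tests: FlatTest[], errorCount: number): "verified" | "regression" {
  const allPassed = tests.length > 0 && tests.every((t) => t.status === "passed");
  return allPassed && errorCount === 0 ? "verified" : "regression";
}
```

Treating an empty test list as a regression is a deliberate choice in this sketch, so that a recipe which ran nothing cannot produce a verified verdict.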
Main Harness Orchestration & CLI Control Flow
scripts/verify-pr.ts
Parses CLI flags (--resync, --skip-recipe, --restore-sandbox, --keep-open, --recipe-spec, --port, --help), handles control flows (restore/skip/resync/full-run), manages sandbox snapshot/restore/sanitize, installs signal handlers for graceful shutdown, preflights port, syncs and boots Storybook, runs recipe, parses report, computes verdict, writes comprehensive VerifyResult with compile/symlink/boot/recipe/total durations and trace artifacts.
GitHub Actions CI Workflow Integration
.github/workflows/verify-pr.yml
Workflow triggered on PR labeled ci:verify (non-draft, labeled/synchronize), checks base SHA, fetches PR diff, generates bundle via yarn verify-pr-generate, authors recipe via yarn verify-pr-author with scoped ANTHROPIC_API_KEY, runs harness in pinned Docker image with hardened settings (dropped caps, no-new-privileges, read-only FS, tmpfs mounts, no network), uploads artifacts (14 days), and posts verdict comment with artifact link.
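The hardened container settings listed above correspond to standard `docker run` flags. The sketch below assembles such an argument list. It is an assumption about how the flags could be combined, not a copy of the workflow step: the helper `hardenedDockerArgs`, the `/tmp` tmpfs mount point, and the image reference passed in are hypothetical.

```typescript
// Builds an argv for `docker run` with the hardening described in the summary.
// The flags are standard Docker CLI options; the exact set and mount points
// used by the real workflow are not shown on this page.
function hardenedDockerArgs(image: string, command: string[]): string[] {
  return [
    "run",
    "--rm",
    "--network=none", // no network access inside the container
    "--cap-drop=ALL", // drop all Linux capabilities
    "--security-opt=no-new-privileges", // block privilege escalation via setuid
    "--read-only", // root filesystem is read-only
    "--tmpfs",
    "/tmp", // writable scratch space (mount point is an assumption)
    image, // expected to be pinned, e.g. by digest
    ...command,
  ];
}
```

Keeping the flags in one function makes the isolation settings easy to review and to assert on in a unit test.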
Test Fixtures & Playwright Helper Utilities
.verify-recipes/_util.ts, .verify-recipes/example-smoke.spec.ts, scripts/verify/__fixtures__/stub-assistant-reply*.txt
Exports RecipePage helper wrapping Playwright Page/Expect for Storybook preview navigation (previewIframe(), previewRoot(), waitForStoryLoaded(), waitUntilLoaded()), defines canonical smoke test for example-button--primary story with error collection and assertion, and stub fixtures for testing agent reply parsing and linting edge cases (clean reply, unused variables).
Minor Placeholder & Cross-Reference
code/core/src/manager-api/index.ts
No-op touch comment added to signal test label availability.

🎯 4 (Complex) | ⏱️ ~75 minutes


Labels

ci:verify Trigger PR Verification Harness

1 participant