feat(verify): Phase-12 live-smoke verification harness #230
Open: thejustinwalsh wants to merge 9 commits into `main` from `middle-issue-208`
Commits (9, all by thejustinwalsh):

- `a03a2e9` docs(verify): plan + decisions log for Epic #208
- `2650d90` test(epic-store): file-mode live-smoke integration + shared runFileMo…
- `d3069e3` feat(cli): mm verify-file-mode — wraps the file-mode smoke with a str…
- `13ba06e` feat(cli): mm verify-file-mode --live — real-GitHub smoke against a t…
- `0428d60` docs(verify): live-smoke verification — what it proves, when to run, …
- `4a0065e` fix(cli): self-review hardening of the --live smoke + docs guard
- `23f8b22` Merge remote-tracking branch 'origin/main' into middle-issue-208
- `82c9746` docs(verify): TSDoc SmokeSectionName + normalize issue refs in decisi…
- `48a4794` docs(verify): docstring the smoke harness's git/gh/EPIC_BODY helpers
New file, hunk `@@ -0,0 +1,302 @@`:

```ts
/**
 * `mm verify-file-mode --live --repo <owner/name>` — the real-GitHub smoke: the
 * test the autonomous flow never ran (Epic #208). It drives the full file-mode
 * loop against a **live** GitHub repo — author an Epic file on a fresh branch,
 * dispatch it, satisfy any park by editing the answer block, then assert a draft
 * PR exists with the expected sub-issue checkbox flipped — and cleans up on
 * success / leaves the artifacts (printing their URLs) on failure.
 *
 * This is **not** part of `bun test`: it needs real GitHub, a real agent, real
 * tokens, and minutes of wall-clock, so it is an opt-in operator step on a
 * manual/weekly cadence (see `docs/dogfooding.md`). The orchestration
 * ({@link runLiveSmoke}) is fully unit-tested against an injected {@link LiveSmokeIO};
 * the production IO ({@link makeLiveSmokeIO}) is the GitHub/daemon/git boundary CI
 * cannot exercise — that boundary is the recorded one-shot evidence run.
 */

import { runDispatch } from "./dispatch.ts";

/** An open PR as the smoke needs to reason about it. */
export type LivePr = {
  number: number;
  isDraft: boolean;
  url: string;
};

/** Settled workflow state the smoke distinguishes: a question park vs anything terminal. */
export type SettledState = "completed" | "waiting-human" | "failed";

/**
 * The GitHub/daemon/git boundary the live smoke drives. Every method is a real
 * side-effect against the test repo; the orchestration ({@link runLiveSmoke}) is
 * pure control flow over this seam, so it is unit-tested with a fake IO and the
 * production impl ({@link makeLiveSmokeIO}) is the operator-run boundary.
 */
export type LiveSmokeIO = {
  log: (line: string) => void;
  /** Author the Epic file on a fresh branch in the test repo; returns its slug + branch URL. */
  authorEpic: () => Promise<{ slug: string; branch: string; branchUrl: string }>;
  /** Dispatch the Epic through the daemon and resolve once the row settles. */
  dispatch: (slug: string) => Promise<SettledState>;
  /** Fill in the open question's answer block on disk (the file-mode resume trigger). */
  answerQuestion: (slug: string) => Promise<void>;
  /** Wait for the daemon's file-watcher resume to drive the sub-issue checkbox to `[x]`. */
  awaitResume: (slug: string) => Promise<void>;
  /** The Epic's open draft PR, or null if none opened. */
  findEpicPr: (slug: string) => Promise<LivePr | null>;
  /** Whether sub-issue `id`'s checkbox is `[x]` on the PR head. */
  isSubIssueChecked: (slug: string, pr: LivePr, id: number) => Promise<boolean>;
  /** Tear the test branch + PR down (success only). */
  cleanup: (slug: string, branch: string, pr: LivePr | null) => Promise<void>;
};

/**
 * The live-smoke orchestration. Returns a process exit code (0 green / 1 failed).
 * On success it cleans up; on **any** failure it leaves the surviving branch/PR
 * intact and prints their URLs for operator inspection (never cleans up a
 * failure — the artifacts are the diagnosis).
 */
export async function runLiveSmoke(io: LiveSmokeIO): Promise<number> {
  io.log("authoring an Epic file on a fresh branch in the test repo…");
  const { slug, branch, branchUrl } = await io.authorEpic();
  io.log(`authored Epic '${slug}' on branch '${branch}'`);

  io.log(`dispatching '${slug}' through the daemon…`);
  const settled = await io.dispatch(slug);
  io.log(`workflow settled: ${settled}`);
  if (settled === "failed") {
    io.log(`FAIL: dispatch failed. Surviving branch: ${branchUrl}`);
    return 1;
  }

  if (settled === "waiting-human") {
    io.log("parked — filling in the answer block to satisfy the park…");
    await io.answerQuestion(slug);
    io.log("waiting for the file-watcher resume to complete the sub-issue…");
    await io.awaitResume(slug);
  }

  const pr = await io.findEpicPr(slug);
  if (!pr) {
    io.log(`FAIL: no draft PR opened on the test repo. Surviving branch: ${branchUrl}`);
    return 1;
  }
  if (!pr.isDraft) {
    io.log(`FAIL: PR #${pr.number} is not a draft. Surviving PR: ${pr.url}`);
    return 1;
  }

  const checked = await io.isSubIssueChecked(slug, pr, 1);
  if (!checked) {
    io.log(`FAIL: sub-issue #1 checkbox not flipped on PR #${pr.number}. Surviving PR: ${pr.url}`);
    return 1;
  }

  io.log(`PASS: draft PR #${pr.number} with sub-issue #1 checked — ${pr.url}`);
  await io.cleanup(slug, branch, pr);
  io.log("cleaned up the test branch + PR.");
  return 0;
}
```
```ts
/** Options for {@link runVerifyFileModeLive}. */
export type LiveOptions = {
  /** `owner/name` of the designated throwaway test repo. */
  repo?: string;
  /** Local checkout of the test repo (the daemon dispatches against it). Defaults to cwd. */
  repoPath?: string;
  /** Inject a fake IO (tests only); production builds {@link makeLiveSmokeIO}. */
  io?: LiveSmokeIO;
};

/**
 * Entry point for `mm verify-file-mode --live`. Validates `--repo`, builds the
 * production IO (unless an `io` is injected for tests), and runs the smoke.
 */
export async function runVerifyFileModeLive(opts: LiveOptions = {}): Promise<number> {
  const repo = opts.repo?.trim();
  if (!repo || !/^[^/\s]+\/[^/\s]+$/.test(repo)) {
    console.error(
      "mm verify-file-mode --live: pass --repo <owner/name> for the designated test repo",
    );
    return 1;
  }
  const io = opts.io ?? makeLiveSmokeIO({ repo, repoPath: opts.repoPath ?? process.cwd() });
  return runLiveSmoke(io);
}
```
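The `--repo` guard accepts exactly one `/` with non-empty, whitespace-free segments on both sides, after trimming. A minimal sketch of the same check, using a hypothetical `parseRepoFlag` helper that also splits the pair, makes the accepted shapes explicit; note that a full `https://github.com/...` URL is rejected, so operators must pass the bare `owner/name`.

```typescript
// Same pattern runVerifyFileModeLive applies to --repo: one slash, no
// whitespace, and at least one character in both the owner and name segments.
const OWNER_NAME = /^[^/\s]+\/[^/\s]+$/;

// Hypothetical helper (not part of the PR): validate and split in one step.
export function parseRepoFlag(raw: string | undefined): { owner: string; name: string } | null {
  const repo = raw?.trim();
  if (!repo || !OWNER_NAME.test(repo)) return null;
  const slash = repo.indexOf("/"); // exactly one slash is guaranteed by the regex
  return { owner: repo.slice(0, slash), name: repo.slice(slash + 1) };
}
```

Splitting only after the regex passes means the `indexOf` lookup can never miss.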
```ts
// ── Production IO — the GitHub/daemon/git boundary (operator-run; not CI-tested) ──

/** Render the throwaway Epic file body for a `--live` run, keyed to `slug` (one sub-issue, one question). */
const EPIC_BODY = (slug: string): string =>
  [
    "<!-- middle:epic v1 -->",
    "# feat: live-smoke verification probe",
    "",
    "## meta",
    `slug: ${slug}`,
    "adapter: claude",
    "",
    "## context",
    "Throwaway Epic authored by `mm verify-file-mode --live` to prove the",
    "file-mode dispatch loop opens a real PR end to end. Safe to delete.",
    "",
    "## acceptance criteria",
    "- [ ] a draft PR opens for this Epic",
    "",
    "## sub-issues",
    "<!-- middle:sub-issue id=1 -->",
    "- [ ] **1 — touch a probe file** Create `verify-live-probe.txt` with any content, open the draft PR, and ask the operator to confirm before finishing.",
    "<!-- /middle:sub-issue -->",
    "",
    "## conversation",
    "",
  ].join("\n");

const ANSWER_TEXT = "Confirmed — finish the sub-issue and leave the PR as a draft.";
```
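The smoke's pass condition hinges on the `[x]` box inside the `<!-- middle:sub-issue id=N -->` block on the agent branch. The production check delegates to the dispatcher's `parseEpicFile`; the sketch below is a hypothetical, deliberately narrow reader (`readSubIssueBox` is an illustrative name) that assumes only the marker layout visible in `EPIC_BODY`. It can help when eyeballing a fetched Epic file after a failed run, and it distinguishes "missing" (`null`) from "unchecked" (`false`), which `isSubIssueChecked` collapses into `false`.

```typescript
// Hypothetical stand-in for parseEpicFile: it knows only the sub-issue marker
// convention visible in EPIC_BODY, nothing else about the Epic file format.
export function readSubIssueBox(epicText: string, id: number): boolean | null {
  const lines = epicText.split("\n").map((l) => l.trim());
  const start = lines.indexOf(`<!-- middle:sub-issue id=${id} -->`);
  if (start === -1) return null; // no such sub-issue
  for (const line of lines.slice(start + 1)) {
    if (line === "<!-- /middle:sub-issue -->") break; // end of this sub-issue block
    const m = /^- \[([ xX])\]/.exec(line);
    if (m) return m[1] !== " "; // `[x]` or `[X]` counts as checked
  }
  return null; // marker present, but no checkbox line inside the block
}
```

In the live flow the input text would be the base64-decoded `.content` field from the GitHub contents API, as `isSubIssueChecked` fetches it.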
```ts
/** Run a `gh` subcommand, capturing stdout/stderr; returns `ok` instead of throwing so callers can branch on failure. */
async function gh(args: string[]): Promise<{ ok: boolean; stdout: string; stderr: string }> {
  const proc = Bun.spawn(["gh", ...args], { stdout: "pipe", stderr: "pipe", stdin: "ignore" });
  const [stdout, stderr] = await Promise.all([
    new Response(proc.stdout).text(),
    new Response(proc.stderr).text(),
  ]);
  return { ok: (await proc.exited) === 0, stdout, stderr };
}

/** Run a git subcommand in `cwd`; throws with stderr on non-zero exit. */
async function git(cwd: string, args: string[]): Promise<void> {
  const proc = Bun.spawn(["git", "-C", cwd, ...args], { stdout: "ignore", stderr: "pipe" });
  if ((await proc.exited) !== 0) {
    throw new Error(`git ${args.join(" ")}: ${(await new Response(proc.stderr).text()).trim()}`);
  }
}

/**
 * The real GitHub/daemon/git IO. Operator-run — this boundary is what `bun test`
 * cannot exercise (real repo, real agent, real tokens). The recorded one-shot run
 * against the designated test repo is the evidence; the orchestration above is
 * what CI proves.
 */
export function makeLiveSmokeIO(cfg: { repo: string; repoPath: string }): LiveSmokeIO {
  const { repo, repoPath } = cfg;
  const stamp = Date.now();
  const slug = `verify-smoke-${stamp}`;
  // The branch the daemon's worktree opens its PR from: `middle-<unit>`, where
  // unit is `issue-<epicRef>` (see worktree.ts `unitName`/`createWorktree`). The
  // smoke finds + cleans the PR by this head branch — file-mode Epics have no
  // issue number, so the gh `closes #N` finder (ghGitHub.findEpicPr) can't match.
  const agentBranch = `middle-issue-${slug}`;
  // The local seed branch the Epic file is authored on; never pushed (the daemon
  // dispatches against the local checkout, so the Epic only needs to be on disk).
  const seedBranch = `middle-smoke-${stamp}`;
  const epicRelPath = `planning/epics/${slug}.md`;
  const log = (line: string): void => console.log(`mm verify-file-mode --live: ${line}`);
  const prUrl = (n: number): string => `https://github.com/${repo}/pull/${n}`;
  const branchUrl = `https://github.com/${repo}/tree/${agentBranch}`;

  return {
    log,
    async authorEpic() {
      const { writeFileSync, mkdirSync } = await import("node:fs");
      const { join, dirname } = await import("node:path");
      const abs = join(repoPath, epicRelPath);
      mkdirSync(dirname(abs), { recursive: true });
      writeFileSync(abs, EPIC_BODY(slug));
      // Seed the Epic on a fresh local branch; the daemon's worktree branches off
      // this HEAD, so its checkout carries the Epic file. No push needed.
      await git(repoPath, ["checkout", "-b", seedBranch]);
      await git(repoPath, ["add", epicRelPath]);
      await git(repoPath, ["commit", "-m", `chore: live-smoke Epic ${slug}`]);
      return { slug, branch: seedBranch, branchUrl };
    },
    async dispatch(s) {
      // runDispatch returns 0 when the workflow completes or parks; infer which by
      // re-reading the Epic file for an open question (the file-mode park trace).
      const code = await runDispatch(repoPath, s, {});
      if (code !== 0) return "failed";
      return (await hasOpenQuestion(repoPath, s)) ? "waiting-human" : "completed";
    },
    async answerQuestion(s) {
      // The human-edit the file-watcher detects: fill the answer block on disk.
      // The daemon reads the local checkout, so no push is needed.
      await fillAnswerBlock(repoPath, s, ANSWER_TEXT);
    },
    async awaitResume(s) {
      // The daemon's file-watcher polls on its cron; poll the PR until the
      // sub-issue checkbox flips (or a generous deadline passes).
      const deadline = Date.now() + 15 * 60_000;
      while (Date.now() < deadline) {
        const pr = await this.findEpicPr(s);
        if (pr && (await this.isSubIssueChecked(s, pr, 1))) return;
        await Bun.sleep(10_000);
      }
      log(`timed out after 15m waiting for the resume to flip the sub-issue checkbox`);
    },
    async findEpicPr() {
      // Match by the agent's head branch (file-mode Epics have no issue number).
      const res = await gh([
        "pr",
        "list",
        "--repo",
        repo,
        "--head",
        agentBranch,
        "--state",
        "open",
        "--json",
        "number,isDraft",
        "--jq",
        ".[0] // empty",
      ]);
      if (!res.ok || res.stdout.trim() === "") return null;
      const pr = JSON.parse(res.stdout.trim()) as { number: number; isDraft: boolean };
      return { number: pr.number, isDraft: pr.isDraft, url: prUrl(pr.number) };
    },
    async isSubIssueChecked(_s, _pr, id) {
      // Read the Epic file at the agent branch head and parse the sub-issue's box.
      const fileRes = await gh([
        "api",
        `repos/${repo}/contents/${epicRelPath}?ref=${agentBranch}`,
        "--jq",
        ".content",
      ]);
      if (!fileRes.ok) return false;
      const text = Buffer.from(fileRes.stdout.trim(), "base64").toString("utf8");
      const { parseEpicFile } =
        await import("@middle/dispatcher/src/epic-store/epic-file/parser.ts");
      const epic = parseEpicFile(text);
      return epic.subIssues.find((sub) => sub.id === id)?.checked === true;
    },
    async cleanup(_s, _b, pr) {
      // Close the agent PR and delete its remote branch; drop the local seed branch.
      if (pr) await gh(["pr", "close", String(pr.number), "--repo", repo, "--delete-branch"]);
      await git(repoPath, ["checkout", "-"]).catch(() => {});
      await git(repoPath, ["branch", "-D", seedBranch]).catch(() => {});
    },
  };
}
```
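`awaitResume` polls the PR every 10 s for up to 15 minutes and, on timeout, only logs; `runLiveSmoke` then reports the failure through the subsequent `findEpicPr` / `isSubIssueChecked` checks. If a caller wanted the timeout as a value instead, the loop generalizes to something like the following sketch. `pollUntil` is a hypothetical helper, not part of the PR, and it uses `setTimeout` instead of `Bun.sleep` only so that it also runs outside Bun.

```typescript
// Hypothetical helper: re-run `check` every `intervalMs` until it resolves true
// or `timeoutMs` elapses. Reports the timeout to the caller as `false`.
export async function pollUntil(
  check: () => Promise<boolean>,
  opts: { timeoutMs: number; intervalMs: number },
): Promise<boolean> {
  const deadline = Date.now() + opts.timeoutMs;
  while (Date.now() < deadline) {
    if (await check()) return true;
    // Wait one interval before the next attempt.
    await new Promise<void>((resolve) => setTimeout(resolve, opts.intervalMs));
  }
  return false;
}
```

With the smoke's numbers this would be called with `{ timeoutMs: 15 * 60_000, intervalMs: 10_000 }` and a `check` that wraps `findEpicPr` followed by `isSubIssueChecked(slug, pr, 1)`.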
```ts
/** Does the Epic file carry an open question? (the file-mode park trace). */
async function hasOpenQuestion(repoPath: string, slug: string): Promise<boolean> {
  const { readEpicFile } = await import("@middle/dispatcher/src/epic-store/epic-file-io.ts");
  const { join } = await import("node:path");
  const epic = readEpicFile(join(repoPath, "planning", "epics"), slug);
  return (epic?.conversation ?? []).some((e) => e.kind === "question" && e.status === "open");
}

/** Fill the open question's answer block on disk (the human-edit the watcher detects). */
async function fillAnswerBlock(repoPath: string, slug: string, answer: string): Promise<void> {
  const { readEpicFile, writeEpicFile } =
    await import("@middle/dispatcher/src/epic-store/epic-file-io.ts");
  const { join } = await import("node:path");
  const epicsDir = join(repoPath, "planning", "epics");
  const epic = readEpicFile(epicsDir, slug);
  if (!epic) throw new Error(`no Epic file for ${slug} to answer`);
  writeEpicFile(epicsDir, slug, {
    ...epic,
    conversation: epic.conversation.map((e) =>
      e.kind === "question" && e.status === "open" ? { ...e, answer: { body: answer } } : e,
    ),
  });
}
```
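`fillAnswerBlock` only attaches `answer: { body }` and leaves `status` as `"open"`, so in memory the question still counts as open right after the edit. Whether the on-disk round trip changes that depends on the epic-file serializer, which this PR does not show; presumably the daemon resolves the question once its file-watcher sees the answer, which fits `awaitResume` watching the PR checkbox rather than re-reading the question. The pure part of the transformation can be checked in isolation. The `Entry` shape and the helper names below are assumptions trimmed to the fields the PR touches; the real types in `epic-file-io` may carry more.

```typescript
// Hypothetical, trimmed-down conversation entry (only the fields the PR reads).
type Entry = { kind: string; status?: string; answer?: { body: string } };

const isOpenQuestion = (e: Entry): boolean => e.kind === "question" && e.status === "open";

// Pure core of fillAnswerBlock: attach the answer to every open question,
// return new objects for them, and leave `status` untouched.
export function withAnswer(conversation: readonly Entry[], answer: string): Entry[] {
  return conversation.map((e) => (isOpenQuestion(e) ? { ...e, answer: { body: answer } } : e));
}

// Pure core of hasOpenQuestion.
export function hasOpenQuestionIn(conversation: readonly Entry[]): boolean {
  return conversation.some(isOpenQuestion);
}
```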
Review comment:

> Decision: the `--live` evidence run is the operator step; a headless run ships code plus deterministic tests. The Epic context states that a headless run "could not create a throwaway GitHub repo or spawn a real agent", and the live run fundamentally needs a real agent to open a real PR. So this boundary IO is operator-run (not CI-tested, by design), while the orchestration `runLiveSmoke` above is fully unit-tested. The smoke finds and cleans up the agent's PR by its head branch, `middle-issue-<slug>`, because file-mode Epics have no issue number for gh's `closes #N` finder.
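The head-branch lookup the decision describes comes down to one `gh pr list` call. A sketch, assuming one wanted to unit-test the argument list and the output parsing separately from the spawn: `prListArgs` and `parsePrList` are hypothetical names, `LivePr` is a local copy of the PR's type, and the flags mirror `findEpicPr`. With `--jq ".[0] // empty"`, jq prints nothing when the list is empty (`null // empty` yields no output), so empty stdout means "no PR yet".

```typescript
// Local copy of the PR's LivePr shape, so the sketch is self-contained.
export type LivePr = { number: number; isDraft: boolean; url: string };

// Hypothetical args builder: match the agent's head branch, not `closes #N`.
export function prListArgs(repo: string, headBranch: string): string[] {
  return [
    "pr", "list",
    "--repo", repo,
    "--head", headBranch,
    "--state", "open",
    "--json", "number,isDraft",
    "--jq", ".[0] // empty", // first match only; no output when nothing matches
  ];
}

// Hypothetical pure parser for the command's stdout.
export function parsePrList(repo: string, stdout: string): LivePr | null {
  const raw = stdout.trim();
  if (raw === "") return null; // no open PR on that head branch
  const pr = JSON.parse(raw) as { number: number; isDraft: boolean };
  return { number: pr.number, isDraft: pr.isDraft, url: `https://github.com/${repo}/pull/${pr.number}` };
}
```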