Skip to content

feat(self-audit): integration-verified requirements + freshness (Epic #143)#163

Merged
thejustinwalsh merged 12 commits into
mainfrom
middle-issue-143
May 28, 2026
Merged

feat(self-audit): integration-verified requirements + freshness (Epic #143)#163
thejustinwalsh merged 12 commits into
mainfrom
middle-issue-143

Conversation

@thejustinwalsh

@thejustinwalsh thejustinwalsh commented May 26, 2026

Copy link
Copy Markdown
Owner

Summary

Closes #143

Three coordinated self-auditing systems that fix middle's "green tests as the artifact" failure mode — middle applying its own second-pass-review instinct to its requirements, its definition of done, and its documents:

Each system is integration-verified itself — its own test exercises the real path (spawns the real mm CLI / drives the real gate / runs the real pass), the very contract the Epic introduces. See the plan comment and planning/issues/143/plan.md.

Status

Acceptance criteria

Verification evidence

#144 — Requirements auditor

  • Shared rubric (packages/core/src/integration-rubric.ts): isIntegrationCriterion requires a product-wiring signal and a real-path-test signal; "unit tests pass" fails. Unit tests: packages/core/test/integration-rubric.test.ts.
  • mm audit-issues (packages/cli/src/commands/audit-issues.ts): three modes (--body-file, --issue, backlog --label), exit 0/1 as a gate.
  • Integration-verified itselfpackages/cli/test/audit-issues-cli.test.ts spawns the real mm audit-issues CLI against a weak fixture (asserts it flags + suggests, exit 1) and a well-formed one (asserts it passes, exit 0). The dogfooded "exercise the real path" requirement, not a unit stub.
  • Second pass wired into creating-github-issues (Phase 8.5) + new verifying-requirements skill (all four skill copies in sync).
  • Standing backlog audit (packages/dispatcher/src/audit-cron.ts) labels rubric-failing open feature issues needs-design; pass-level test packages/dispatcher/test/backlog-audit.test.ts.
  • Full suite green: bun test → 745 pass; bun run typecheck, bun run lint, bun run format clean.

#145 — Integration-verified definition of done

  • verify.toml integration gate category (packages/dispatcher/src/gates/verify-config.ts): a category = "unit" | "integration" field (default unit) + integrationGates() helper; schema doc schemas/verify.v1.md updated; validation rejects bad categories. Tests in packages/dispatcher/test/gates/verify-config.test.ts.
  • PR-ready integration-evidence gate (packages/dispatcher/src/gates/pr-ready.ts): on top of the per-criterion evidence check, the gate now requires ≥1 acceptance criterion that is an integration criterion (shared @middle/core rubric) evidenced by a named test, or a human-authored (integration-exempt: <comment-url>) annotation. A unit-only feature is blocked from ready.
  • Integration-verified itselfpackages/dispatcher/test/gates/pr-ready.test.ts drives the real evaluatePrReady decision against a fixture PR lacking an integration test (asserts deny) and one evidencing it (asserts allow), plus the exemption (human → allow, bot → deny).
  • Definition of done updated in implementing-github-issues (7d + the PR-ready gate note); all four skill copies in sync.
  • Full suite green: bun test → 752 pass; typecheck/lint/format clean.

#146 — Anti-staleness reconciliation

  • reconcileStaleness pass (packages/dispatcher/src/staleness.ts): closes landed-but-open issues (a merged PR's closingIssuesReferences names them, yet they're open) with an evidence comment naming the PR, and detectSpecDrift flags spec lines describing a now-merged phase as future work (e.g. "lands in Phase 9"), filing a proposal-first housekeeping reconcile task (deduped by title). Never edits the spec, never closes without an evidence trail.
  • Recommender-sibling cron (packages/dispatcher/src/staleness-cron.ts) sweeps managed, non-paused repos hourly, reading each repo's spec from its checkout; wired into main.ts alongside the poller/recommender/audit crons (and torn down on shutdown).
  • New gateway methods (packages/dispatcher/src/github.ts): listMergedPrsClosingRefs, closeIssue, createIssue.
  • Integration-verified itselfpackages/dispatcher/test/staleness.test.ts runs the real pass against an in-memory GitHubGateway + a drifted fixture spec (landed-but-open issue + "lands in Phase 9" line) and asserts the close and the drift task both fire; staleness-cron.test.ts covers the managed-repo sweep reading a real spec file.
  • Full suite green: bun test → 761 pass; typecheck/lint/format clean.

How to run / verify

bun test                      # full suite (761 pass)
bun run typecheck             # tsc --noEmit, clean
# The dogfood demonstrations, each exercising a real path:
bun test packages/cli/test/audit-issues-cli.test.ts     # #144: real `mm audit-issues` CLI vs fixtures
bun test packages/dispatcher/test/gates/pr-ready.test.ts # #145: real PR-ready decision, with/without integration test
bun test packages/dispatcher/test/staleness.test.ts      # #146: real reconcile pass, close + drift
bun test packages/dispatcher/test/epic-143-demo.test.ts  # Epic: all three together
# Try the auditor by hand:
printf '## Acceptance criteria\n- [ ] unit tests pass\n' > /tmp/weak.md
bun packages/cli/src/index.ts audit-issues . --body-file /tmp/weak.md --title "Demo"   # exits 1, suggests a rewrite

Full suite at HEAD: bun test763 pass, 0 fail; bun run typecheck / lint / format clean. Branch is MERGEABLE against main (rebased — main hadn't moved).

How to review

Stumbling points

  • The worktree had no node_modules. A fresh git worktree had never had bun install run, so direct bun <file> runs and workspace subpath imports (@middle/dispatcher/src/db.ts) failed to resolve — though bun test resolved package roots fine, masking it. Running bun install in the worktree fixed it and is required for the real-CLI integration test (which spawns bun src/index.ts). Suggested CLAUDE.md note below.
  • The PR-ready hook trips on the literal phrase in any command. commandIsPrReady substring-matches gh pr ready anywhere in a command, so a grep "gh pr ready" over the skill docs fired the gate against this very PR. Harmless here, but it's a reminder the matcher is broad by design.

Suggested CLAUDE.md updates

  • Under Tech stack & build: note that a freshly-created git worktree needs bun install before bun run <file> / the daemon / the real-CLI tests will resolve workspace subpath imports — bun test masks this because it resolves package roots without the symlinks.

Follow-up issues

Out of scope

Decisions

See planning/issues/143/decisions.md (distilled into per-line review comments on this PR).

Summary by CodeRabbit

  • New Features

    • Added an audit command to check feature issue acceptance criteria and a backlog audit that flags failing issues.
    • Added hourly background jobs for continuous issue auditing and anti-staleness reconciliation.
    • Introduced an integration-exempt annotation escape hatch.
  • Bug Fixes & Improvements

    • PR-ready gate now requires integration evidence; unit-only criteria are blocked unless exempted.
    • Limits and resilient labeling behavior for backlog audits.
  • Documentation

    • New verification skill and updated workflows and schema to document the integration-rubric and Phase 8.5 audit.

Review Change Stack

@coderabbitai

coderabbitai Bot commented May 26, 2026

Copy link
Copy Markdown

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 1df8419c-d145-4680-ac95-5005476f81da

📥 Commits

Reviewing files that changed from the base of the PR and between b74317e and bcf0cbd.

📒 Files selected for processing (17)
  • packages/cli/src/commands/audit-issues.ts
  • packages/cli/src/index.ts
  • packages/cli/test/audit-issues-cli.test.ts
  • packages/cli/test/issue-audit.test.ts
  • packages/core/src/integration-rubric.ts
  • packages/core/test/integration-rubric.test.ts
  • packages/dispatcher/src/audit.ts
  • packages/dispatcher/src/gates/pr-ready.ts
  • packages/dispatcher/src/gates/verify-config.ts
  • packages/dispatcher/src/staleness-cron.ts
  • packages/dispatcher/src/staleness.ts
  • packages/dispatcher/test/gates/pr-ready.test.ts
  • packages/dispatcher/test/gates/verify-config.test.ts
  • packages/dispatcher/test/staleness-cron.test.ts
  • packages/dispatcher/test/staleness.test.ts
  • planning/issues/143/decisions.md
  • planning/issues/143/plan.md
✅ Files skipped from review due to trivial changes (2)
  • planning/issues/143/plan.md
  • planning/issues/143/decisions.md
🚧 Files skipped from review as they are similar to previous changes (11)
  • packages/cli/test/audit-issues-cli.test.ts
  • packages/core/test/integration-rubric.test.ts
  • packages/cli/test/issue-audit.test.ts
  • packages/dispatcher/src/audit.ts
  • packages/dispatcher/test/gates/verify-config.test.ts
  • packages/dispatcher/test/staleness.test.ts
  • packages/dispatcher/test/staleness-cron.test.ts
  • packages/dispatcher/src/staleness-cron.ts
  • packages/core/src/integration-rubric.ts
  • packages/dispatcher/src/staleness.ts
  • packages/cli/src/commands/audit-issues.ts

📝 Walkthrough

Walkthrough

Adds a shared integration-rubric in core; new mm audit-issues CLI (local/single/backlog) and backlog cron; PR-ready gate requiring evidenced integration criteria or a human-authored integration-exempt: exemption; staleness reconciliation to close landed issues and file drift tasks; GitHub gateway extensions, crons wiring, tests, and documentation updates (Phase 8.5).

Changes

Epic #143 Self-Auditing Implementation

Layer / File(s) Summary
Integration rubric and core exports
packages/core/src/integration-rubric.ts, packages/core/test/integration-rubric.test.ts, packages/core/src/index.ts
Parses first "Acceptance criteria", classifies integration criteria (wiring + real-path test), detects (integration-exempt:), and exposes auditIssueBody and types.
CLI audit command and helpers
packages/cli/src/commands/audit-issues.ts, packages/cli/src/checks/issue-audit.ts, packages/cli/src/index.ts
Adds mm audit-issues subcommand (local body, single issue, backlog), formatting, label application (needs-design), gh wrappers, and CLI option validation.
CLI audit tests
packages/cli/test/audit-issues-cli.test.ts, packages/cli/test/issue-audit.test.ts
End-to-end and unit tests for audit-issues modes, JSON output, labeling behavior, and error handling.
Backlog audit and cron
packages/dispatcher/src/audit.ts, packages/dispatcher/src/audit-cron.ts, packages/dispatcher/test/backlog-audit.test.ts
Scans open feature issues, labels failing ones up to a per-pass cap, and schedules hourly backlog audits skipping paused repos.
Staleness reconciliation and cron
packages/dispatcher/src/staleness.ts, packages/dispatcher/src/staleness-cron.ts, packages/dispatcher/test/staleness.test.ts, packages/dispatcher/test/staleness-cron.test.ts
Closes landed-but-open issues, detects spec drift (including verb-less "planned for phase N"), files deduped reconcile tasks, enforces maxPerPass, and schedules hourly cron.
PR-ready gate and verify-config
packages/dispatcher/src/gates/pr-ready.ts, packages/dispatcher/src/gates/verify-config.ts, packages/dispatcher/test/gates/*
Replaces local parse with core export, requires at least one evidenced integration criterion (or human non-bot integration-exempt:), adds gate category (unit/integration) with validation and filtering.
GitHub gateway extensions
packages/dispatcher/src/github.ts
Adds listOpenIssues, addLabel, listMergedPrsClosingRefs, closeIssue, createIssue methods and types; implements gh CLI usage and temp-file handling.
Daemon wiring
packages/dispatcher/src/main.ts
Starts audit and staleness crons on init and adds guarded shutdown teardown.
Skills documentation
.claude/, .codex/, packages/cli/src/bootstrap-assets/skills/*, packages/skills/*
Adds verifying-requirements skill and updates creating-github-issues / implementing-github-issues to require Phase 8.5 audit and tightened definition of done; documents integration-exempt escape hatch.
Planning and schema
planning/issues/143/plan.md, planning/issues/143/decisions.md, schemas/verify.v1.md
Epic plan and decisions recorded; schema docs extended to include optional gate category and example.
End-to-end integration demo tests
packages/dispatcher/test/epic-143-demo.test.ts
Integration demo asserting audit flags weak issues, PR-ready gate denies unit-only, and staleness reconciliation closes and files drift tasks.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Suggested labels

docs

@thejustinwalsh thejustinwalsh left a comment

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Decision-log rationale, distilled inline (source of truth: planning/issues/143/decisions.md). These explain the why behind the load-bearing choices in this Epic.

* names a product-wiring signal **and** a real-path-test signal. This is the
* atom both #144 and #145 build on.
*/
export function isIntegrationCriterion(text: string): boolean {

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this predicate, here. A criterion is an integration criterion iff it names a product-wiring signal and a real-path-test signal; "unit tests pass" matches neither. This lives in @middle/core (not the CLI check or the dispatcher gate) deliberately: it's the single source of truth both the filing-time auditor (#144) and the landing-time PR-ready gate (#145) consume, so the contract written when an issue is filed is byte-for-byte the one enforced when work lands. The heuristic is anchored to the spec's own worked example (mm start serves the dashboard … a smoke test boots the daemon and GETs /), which matches both signal classes.


const DEFERRED_RE = /\(deferred:\s*(\S+?)\s*\)/i;
/** The integration escape hatch — mirrors `(deferred: …)`; must be human-authored. */
const INTEGRATION_EXEMPT_RE = /\(integration-exempt:\s*(\S+?)\s*\)/i;

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exemption reuses the (deferred: …) shape on purpose. (integration-exempt: <comment-url>) is validated exactly like the existing deferral annotation — the linked comment's author must be a non-bot human. This keeps "declare the exemption explicitly, not silently" honest: an agent can't write its own escape hatch. Note the division of labour with @middle/core's detectExemption (which accepts a prose reason in an issue body): core detects the declaration; this gate validates authorization.

// takes precedence over any exemption annotation (a real test beats a waiver).
// A *deferred* integration criterion does NOT count: the integration test can't
// be punted, only evidenced or (explicitly) exempted.
const evidenced = criteria.some(

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ordering is load-bearing (self-review fix). The evidenced-integration-criterion check runs before the exemption check, so a real test always wins over a waiver — avoiding a false deny when a PR both has an integration test and carries a stray/bot-authored exemption. And a (deferred: …) integration criterion is explicitly excluded: the integration test can be evidenced or exempted, never punted. Letting deferral waive it would reopen the exact "unit-green-but-unwired" hole #145 closes.

* "ships in phase 12", "planned for Phase 3". The captured group is the phase.
* Two shapes: "<future-verb> in phase N", and the verb-less "planned for phase N".
*/
const DRIFT_RE =

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two shapes, not one (self-review fix). The first draft folded planned for into the verb alternation, which then required a trailing in phase — so "planned for phase 3" silently failed to match. The regex now matches <future-verb> in phase N and the verb-less planned for phase N. This is the concrete drift class the spec calls out: a stale "lands in Phase N" line surviving past that phase's merge.

}

const gate: Gate = { name: name.trim(), command, timeoutSeconds: resolvedTimeout };
let resolvedCategory: GateCategory = "unit";

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

category defaults to unit. An integration gate is the verify-side companion to the PR-ready integration-evidence check: it declares a gate that exercises the running product (boots/serves/invokes the real path), distinct from unit gates. integrationGates(config) lets callers recognise it. Validation rejects any other value (loud-failure contract).

@thejustinwalsh

Copy link
Copy Markdown
Owner Author

Reviewer's brief — Epic #143 (PR #163)

Three self-auditing systems that make middle apply its second-pass-review instinct to its own requirements (#144), definition of done (#145), and documents (#146). Posted on both the Epic and the PR.

How to run it

bun install                    # required in a fresh worktree (see "fragile" below)
bun test                       # full suite -> 763 pass, 0 fail
bun run typecheck && bun run lint && bun run format   # all clean
# Each system's dogfood test (real path, not a stub):
bun test packages/cli/test/audit-issues-cli.test.ts       # 144 - spawns the real `mm audit-issues` CLI
bun test packages/dispatcher/test/gates/pr-ready.test.ts  # 145 - drives the real PR-ready decision
bun test packages/dispatcher/test/staleness.test.ts       # 146 - runs the real reconcile pass
bun test packages/dispatcher/test/epic-143-demo.test.ts   # Epic - all three together
# By hand:
printf '## Acceptance criteria\n- [ ] unit tests pass\n' > /tmp/weak.md
bun packages/cli/src/index.ts audit-issues . --body-file /tmp/weak.md --title "Demo"  # exits 1 + suggests a rewrite

What to verify (and what "correct" looks like)

  • The rubric (packages/core/src/integration-rubric.ts) - isIntegrationCriterion is true only when a criterion names a product-wiring signal and a real-path-test signal; "unit tests pass" must be false. This atom is shared by both enforcement points, so agreeing with it carries most of the review.
  • PR-ready gate (packages/dispatcher/src/gates/pr-ready.ts, evaluateIntegrationEvidence) - the security-sensitive change. Confirm no bypass: a (deferred: ...) integration criterion must NOT count as evidence; the (integration-exempt: ...) escape hatch requires a non-bot comment author; an evidenced integration criterion allows even alongside a stray exemption; a unit-only PR is denied.
  • Staleness (packages/dispatcher/src/staleness.ts) - only closes issues a merged PR records as closing, always with an evidence comment, and never edits the spec (files a proposal-first task instead).
  • Skill parity - packages/skills/ is canonical; bootstrap-assets/skills/, .claude/skills/, .codex/skills/ are byte-identical mirrors (all four synced).

How to review

Start at the rubric, then the two enforcement points, then the cron wiring in main.ts (three sibling crons: recommender, backlog-audit, staleness - each guarded + torn down). Decision rationale is in per-line review comments on the PR and in planning/issues/143/decisions.md.

Fragile / extra eyes

  • The regexes are the main risk surface: WIRING_RE/REAL_PATH_TEST_RE (core), DRIFT_RE/TEST_FILE_RE/INTEGRATION_EXEMPT_RE/EVIDENCE_RE (gate). An internal review pass already hardened DRIFT_RE (verb-less "planned for phase N") and the evidence/exemption ordering.
  • Fresh-worktree node_modules: a new git worktree needs bun install before direct bun <file> runs / the daemon / the real-CLI test resolve workspace subpath imports (bun test masks this). Proposed as a CLAUDE.md note in the PR body.

Follow-up

Human does the final review + merge - the workflow stops here.

@thejustinwalsh thejustinwalsh marked this pull request as ready for review May 26, 2026 23:45
@thejustinwalsh thejustinwalsh added the ready-for-review All phases done and verified — PR ready for final human review and merge label May 26, 2026

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 12

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
packages/dispatcher/src/gates/verify-config.ts (1)

70-74: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Unknown-key error message is stale after adding category.

The guidance text still lists old keys only, which is misleading during config validation failures.

💡 Proposed fix
       throw new VerifyConfigError(
-        `${where}: unknown key "${key}" (did you mean one of name, command, timeout_seconds, phases?)`,
+        `${where}: unknown key "${key}" (did you mean one of name, command, timeout_seconds, phases, category?)`,
       );
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/dispatcher/src/gates/verify-config.ts` around lines 70 - 74, The
unknown-key error message is stale: update the throw in the loop that validates
config keys (the block referencing raw, KNOWN_GATE_KEYS, where, and throwing
VerifyConfigError) to include the current valid keys instead of the hard-coded
list; e.g., build the suggestion from KNOWN_GATE_KEYS (or list name, command,
timeout_seconds, phases, category) and include that dynamic list in the
VerifyConfigError message so the guidance stays accurate when keys change.
packages/dispatcher/src/gates/pr-ready.ts (1)

76-84: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Invalid deferral currently overrides valid evidence for the same criterion.

A criterion with evidence should still pass even if its (deferred: ...) annotation is invalid; current control flow denies it.

💡 Proposed fix
   for (const criterion of criteria) {
     const deferred = DEFERRED_RE.exec(criterion);
     if (deferred) {
       const author = await opts.resolveCommentAuthor(deferred[1]!);
       if (author && !author.isBot) continue; // stakeholder-authorized deferral
-      unmet.push(criterion);
-      continue;
+      if (namesEvidence(criterion)) continue; // evidence still satisfies the criterion
+      unmet.push(criterion);
+      continue;
     }
     if (namesEvidence(criterion)) continue; // has evidence (link/#ref or a named test file)
     unmet.push(criterion);
   }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/dispatcher/src/gates/pr-ready.ts` around lines 76 - 84, The deferral
check using DEFERRED_RE currently runs before evidence detection and causes an
invalid deferred tag to mark a criterion unmet even when
namesEvidence(criterion) would pass; re-order the logic so
namesEvidence(criterion) is evaluated before handling DEFERRED_RE, or within the
deferred branch first verify that namesEvidence is false before pushing to
unmet; specifically update the block around DEFERRED_RE,
opts.resolveCommentAuthor, namesEvidence, and unmet so valid evidence wins over
an invalid `(deferred: ...)` annotation.
🧹 Nitpick comments (3)
packages/dispatcher/test/gates/pr-ready.test.ts (1)

124-184: ⚡ Quick win

Add a regression case for “evidence + invalid deferral” on the same criterion.

Given the OR contract, that criterion should still pass. A focused test here will lock behavior and prevent future regressions.

🧪 Suggested test case
 describe("evaluatePrReady — integration evidence", () => {
+  test("evidence still satisfies when the same criterion has a bot-authored deferral", async () => {
+    const body =
+      "## Acceptance criteria\n- [ ] done (https://example.com/x) (deferred: https://github.com/o/r/issues/1#issuecomment-2)\n- [ ] `mm start` serves it; a smoke test boots the daemon and GETs `/` (packages/cli/test/daemon-entry.test.ts)";
+    const resolve: CommentAuthorResolver = async () => ({ login: "middle[bot]", isBot: true });
+    const result = await evaluatePrReady({ body, resolveCommentAuthor: resolve });
+    expect(result).toEqual({ decision: "allow" });
+  });
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/dispatcher/test/gates/pr-ready.test.ts` around lines 124 - 184, Add
a regression test in the same suite that constructs an acceptance-criteria body
containing a single integration criterion that is both evidenced (e.g., includes
a named test file like "packages/cli/test/daemon-entry.test.ts") and also has an
invalid deferral annotation (e.g., "(deferred: ...)" authored by a bot); call
evaluatePrReady with that body and a resolveCommentAuthor that returns a bot
author, and assert the result is { decision: "allow" } — reference
evaluatePrReady, CommentAuthorResolver, and resolveCommentAuthor so the test
mirrors the existing "evidenced integration criterion allows even if a stray bot
exemption is present" but uses a bad/deferred annotation to lock in the OR
semantics.
packages/dispatcher/src/staleness-cron.ts (1)

23-33: ⚡ Quick win

Add a top-level TSDoc block for StalenessCronDeps.

StalenessCronDeps is a public export and should have an explicit module-level contract comment above the type.

As per coding guidelines: Every public export in a module must carry a TSDoc/JSDoc comment.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/dispatcher/src/staleness-cron.ts` around lines 23 - 33, Add a
top-level TSDoc comment immediately above the exported type StalenessCronDeps
describing the purpose of this dependency bag (what the staleness cron needs),
and briefly document each property (db, github with its required methods list,
specPath and its default DEFAULT_SPEC_PATH, and now as an optional time
provider). Ensure the TSDoc is a module-level/public comment (/** ... */) so the
exported type carries an explicit contract for callers.
packages/dispatcher/src/staleness.ts (1)

61-83: ⚡ Quick win

Add top-level TSDoc for exported StalenessDeps and StalenessResult.

Both are public exports and should carry explicit contract comments above the type declarations.

As per coding guidelines: Every public export in a module must carry a TSDoc/JSDoc comment.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/dispatcher/src/staleness.ts` around lines 61 - 83, Add top-level
TSDoc comments above the exported types StalenessDeps and StalenessResult: for
StalenessDeps describe it as the input contract for the staleness checker
(include short descriptions for repo, github gateway methods, readSpec, specPath
and note that maxPerPass is an optional cap with default behavior), and for
StalenessResult describe it as the output of a staleness pass (briefly document
closed as issue numbers closed this pass, drift as detected SpecDrift items, and
filed as reconcile task issue numbers); ensure the comments are placed
immediately above the respective type declarations so they satisfy the
public-export documentation rule.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/cli/src/commands/audit-issues.ts`:
- Around line 14-37: Add a top-level TSDoc comment for the exported type
AuditIssuesOptions describing its purpose and public contract (e.g., "Options
for running the audit-issues command, controlling input sources, GitHub
interactions, and logging"); place it immediately above the "export type
AuditIssuesOptions" declaration and include brief notes about intended use,
defaults/behavior for key fields like issue, bodyFile, resolveSlug, fetchIssue,
listOpenIssues, addLabel, readBodyFile, log, and errlog so consumers understand
the API surface.
- Around line 156-161: Wrap the single-issue fetch in a try/catch: when
opts.issue is set, call (opts.fetchIssue ?? fetchIssueDefault)(slug, opts.issue)
inside a try block and catch any thrown errors, logging them via errlog (include
error.message or the error object) and return 1; keep the existing null-check
for the returned issue but ensure thrown parser/runtime exceptions are handled
the same way. Use the same symbols: opts.issue, fetchIssue/fetchIssueDefault,
slug, errlog, and preserve the existing control flow.
- Around line 110-112: The helper addLabelDefault silently ignores failures from
the gh call, so it should fail fast: update addLabelDefault (the function that
calls gh(["gh", "issue", "edit", String(n), "--repo", slug, "--add-label",
label])) to detect a non-zero/failed result and throw or propagate an Error when
gh does not succeed (or rethrow the underlying error) instead of returning
silently; ensure the caller receives the rejection so label application failures
are not treated as successful.

In `@packages/cli/src/index.ts`:
- Around line 165-183: Validate the options.issue value before calling
runAuditIssues: in the async CLI handler that currently forwards issue:
options.issue === undefined ? undefined : Number(options.issue), parse and
verify options.issue is a positive integer (use parseInt/Number and
Number.isInteger and > 0) and if invalid print a clear error and exit with
non-zero status instead of passing NaN into runAuditIssues; keep undefined when
the flag is omitted and only convert to Number once it passes validation.

In `@packages/cli/test/audit-issues-cli.test.ts`:
- Around line 30-41: The test is combining stderr into the returned out which
makes JSON parsing flaky; update the runCli helper (function runCli) to return
stdout and stderr separately (e.g., return { code, stdout, stderr }) or add a
flag to indicate JSON mode, and then in the JSON-mode test parse JSON from the
stdout field only (do not include stderr); also update the other tests that rely
on runCli (including the ones referenced around the second occurrence) to use
the new shape or flag when they need combined output.

In `@packages/core/src/integration-rubric.ts`:
- Around line 56-57: The REAL_PATH_TEST_RE currently uses "integration[ -]?test"
and "smoke[ -]?test" which won't match plural forms like "integration tests" or
"smoke tests"; update the pattern (symbol: REAL_PATH_TEST_RE) to accept optional
plural "s" for those tokens (e.g., use "integration[ -]?tests?" and "smoke[
-]?tests?") while preserving the rest of the alternatives, flags, and
case-insensitivity so plural phrases are recognized without breaking other
matches.
- Around line 31-33: parseAcceptanceCriteria currently toggles inSection on any
acceptance heading, allowing later headings to reopen acceptance parsing; add a
guard to enforce "first acceptance heading only": introduce a boolean (e.g.,
seenAcceptanceSection) and in parseAcceptanceCriteria when you detect a heading
(/^#{1,6}\s/), if it's an acceptance heading set inSection = true only when
seenAcceptanceSection is false and then set seenAcceptanceSection = true; for
any subsequent headings (or when leaving the section) ensure inSection remains
false so later acceptance headings are ignored—update references to inSection
and the heading-detection branch to use seenAcceptanceSection to prevent
reopening the acceptance section.

In `@packages/dispatcher/src/audit.ts`:
- Around line 19-24: Add a declaration-level TSDoc block above the exported
BacklogAuditDeps type describing the purpose of the type and each field (repo,
github, and maxFlagsPerPass), including that maxFlagsPerPass is optional and
defaults to DEFAULT_MAX_FLAGS_PER_PASS; ensure the comment follows TSDoc style
(/** ... */) and mentions this is the dependency contract for the backlog audit
routine so it documents the public export.

In `@packages/dispatcher/src/staleness-cron.ts`:
- Around line 51-56: The readSpec closure currently swallows all filesystem
errors and treats them as a missing spec; change readSpec to only convert ENOENT
(file not found) into null and rethrow any other errors so permission/I/O errors
surface. Specifically, inside the readSpec function that calls
readFileSync(join(managed.checkoutPath, specPath), "utf8"), catch the thrown
error, check error.code === "ENOENT" and return null for that case, otherwise
rethrow the error (or throw a new error preserving the original) so the
caller/runtime logs a real failure instead of silently skipping drift checks.

In `@packages/dispatcher/src/staleness.ts`:
- Around line 93-94: The cap is applied separately to closes and creates so a
pass can exceed the intended total; initialize a single budget = deps.maxPerPass
?? DEFAULT_MAX_PER_PASS and use it across all mutation decisions (both close and
create paths) instead of using cap per bucket. In staleness.ts, replace
per-bucket cap checks around the code that builds close lists and task-create
lists (references: cap, deps.maxPerPass, DEFAULT_MAX_PER_PASS, the loops that
call deps.github.listOpenIssues and the logic that increments closes/creates)
with checks that consult and decrement the shared budget; stop processing
further buckets or items once budget reaches zero. Ensure any early returns or
final reporting use the shared budget so the sum of closes+creates never exceeds
the configured maxPerPass.

In `@planning/issues/143/decisions.md`:
- Line 26: Update the inline code span that currently contains a trailing space
(`mm `) to remove the space so it reads `mm`; locate the code span in the text
"product-wiring signal
(served/mounted/invoked/reachable/wired/booted/GET/POST/`mm `/endpoint/route…)"
and replace the malformed inline code token `mm ` with `mm` to satisfy Markdown
lint rule MD038.

In `@planning/issues/143/plan.md`:
- Around line 38-39: Replace the awkward phrase "requires evidenced integration
test" in the line containing "category (schema + parser + `verify.v1.md`);
PR-ready gate requires evidenced integration test + `(integration-exempt:
<url>)` escape hatch;" with the clearer wording "requires an evidenced
integration test" (or alternatively "requires integration-test evidence") so the
sentence reads naturally and unambiguously.

---

Outside diff comments:
In `@packages/dispatcher/src/gates/pr-ready.ts`:
- Around line 76-84: The deferral check using DEFERRED_RE currently runs before
evidence detection and causes an invalid deferred tag to mark a criterion unmet
even when namesEvidence(criterion) would pass; re-order the logic so
namesEvidence(criterion) is evaluated before handling DEFERRED_RE, or within the
deferred branch first verify that namesEvidence is false before pushing to
unmet; specifically update the block around DEFERRED_RE,
opts.resolveCommentAuthor, namesEvidence, and unmet so valid evidence wins over
an invalid `(deferred: ...)` annotation.

In `@packages/dispatcher/src/gates/verify-config.ts`:
- Around line 70-74: The unknown-key error message is stale: update the throw in
the loop that validates config keys (the block referencing raw, KNOWN_GATE_KEYS,
where, and throwing VerifyConfigError) to include the current valid keys instead
of the hard-coded list; e.g., build the suggestion from KNOWN_GATE_KEYS (or list
name, command, timeout_seconds, phases, category) and include that dynamic list
in the VerifyConfigError message so the guidance stays accurate when keys
change.

---

Nitpick comments:
In `@packages/dispatcher/src/staleness-cron.ts`:
- Around line 23-33: Add a top-level TSDoc comment immediately above the
exported type StalenessCronDeps describing the purpose of this dependency bag
(what the staleness cron needs), and briefly document each property (db, github
with its required methods list, specPath and its default DEFAULT_SPEC_PATH, and
now as an optional time provider). Ensure the TSDoc is a module-level/public
comment (/** ... */) so the exported type carries an explicit contract for
callers.

In `@packages/dispatcher/src/staleness.ts`:
- Around line 61-83: Add top-level TSDoc comments above the exported types
StalenessDeps and StalenessResult: for StalenessDeps describe it as the input
contract for the staleness checker (include short descriptions for repo, github
gateway methods, readSpec, specPath and note that maxPerPass is an optional cap
with default behavior), and for StalenessResult describe it as the output of a
staleness pass (briefly document closed as issue numbers closed this pass, drift
as detected SpecDrift items, and filed as reconcile task issue numbers); ensure
the comments are placed immediately above the respective type declarations so
they satisfy the public-export documentation rule.

In `@packages/dispatcher/test/gates/pr-ready.test.ts`:
- Around line 124-184: Add a regression test in the same suite that constructs
an acceptance-criteria body containing a single integration criterion that is
both evidenced (e.g., includes a named test file like
"packages/cli/test/daemon-entry.test.ts") and also has an invalid deferral
annotation (e.g., "(deferred: ...)" authored by a bot); call evaluatePrReady
with that body and a resolveCommentAuthor that returns a bot author, and assert
the result is { decision: "allow" } — reference evaluatePrReady,
CommentAuthorResolver, and resolveCommentAuthor so the test mirrors the existing
"evidenced integration criterion allows even if a stray bot exemption is
present" but uses a bad/deferred annotation to lock in the OR semantics.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: be5a7373-d23d-4c90-8dd3-4dcf98cf9f00

📥 Commits

Reviewing files that changed from the base of the PR and between 720044c and b74317e.

📒 Files selected for processing (39)
  • .claude/skills/creating-github-issues/SKILL.md
  • .claude/skills/implementing-github-issues/SKILL.md
  • .claude/skills/verifying-requirements/SKILL.md
  • .codex/skills/creating-github-issues/SKILL.md
  • .codex/skills/implementing-github-issues/SKILL.md
  • .codex/skills/verifying-requirements/SKILL.md
  • packages/cli/src/bootstrap-assets/skills/creating-github-issues/SKILL.md
  • packages/cli/src/bootstrap-assets/skills/implementing-github-issues/SKILL.md
  • packages/cli/src/bootstrap-assets/skills/verifying-requirements/SKILL.md
  • packages/cli/src/checks/issue-audit.ts
  • packages/cli/src/commands/audit-issues.ts
  • packages/cli/src/index.ts
  • packages/cli/test/audit-issues-cli.test.ts
  • packages/cli/test/issue-audit.test.ts
  • packages/core/src/index.ts
  • packages/core/src/integration-rubric.ts
  • packages/core/test/integration-rubric.test.ts
  • packages/dispatcher/src/audit-cron.ts
  • packages/dispatcher/src/audit.ts
  • packages/dispatcher/src/gates/pr-ready.ts
  • packages/dispatcher/src/gates/verify-config.ts
  • packages/dispatcher/src/github.ts
  • packages/dispatcher/src/main.ts
  • packages/dispatcher/src/staleness-cron.ts
  • packages/dispatcher/src/staleness.ts
  • packages/dispatcher/test/backlog-audit.test.ts
  • packages/dispatcher/test/epic-143-demo.test.ts
  • packages/dispatcher/test/gates/gate-runner.test.ts
  • packages/dispatcher/test/gates/pr-ready-handler.test.ts
  • packages/dispatcher/test/gates/pr-ready.test.ts
  • packages/dispatcher/test/gates/verify-config.test.ts
  • packages/dispatcher/test/staleness-cron.test.ts
  • packages/dispatcher/test/staleness.test.ts
  • packages/skills/creating-github-issues/SKILL.md
  • packages/skills/implementing-github-issues/SKILL.md
  • packages/skills/verifying-requirements/SKILL.md
  • planning/issues/143/decisions.md
  • planning/issues/143/plan.md
  • schemas/verify.v1.md

Comment thread packages/cli/src/commands/audit-issues.ts
Comment thread packages/cli/src/commands/audit-issues.ts
Comment thread packages/cli/src/commands/audit-issues.ts
Comment thread packages/cli/src/index.ts Outdated
Comment thread packages/cli/test/audit-issues-cli.test.ts Outdated
Comment thread packages/dispatcher/src/audit.ts
Comment thread packages/dispatcher/src/staleness-cron.ts
Comment thread packages/dispatcher/src/staleness.ts Outdated
Comment thread planning/issues/143/decisions.md Outdated
Comment thread planning/issues/143/plan.md Outdated
CodeRabbit review on Epic #143 + internal clean-eyes pass. Each fix carries a test.

- pr-ready gate: an invalid (bot/unresolvable) deferral no longer overrides
  independent evidence (OR semantics); strip EVERY (deferred: …) annotation
  before the evidence check so a deferral's own URL — or a second annotation —
  can't self-satisfy the criterion.
- integration rubric: parseAcceptanceCriteria collects only the first acceptance
  section (a later acceptance heading no longer reopens it); REAL_PATH_TEST_RE
  matches plural 'integration tests' / 'smoke tests'.
- staleness: maxPerPass is one shared budget across closes + filed tasks, not a
  per-bucket cap.
- staleness-cron: readSpec maps only ENOENT to 'no spec'; other I/O errors
  surface instead of silently disabling drift detection.
- audit-issues: addLabelDefault throws on non-zero gh exit; label failures are
  isolated (logged, never logged-as-applied, never abort the sweep); single-issue
  fetch is wrapped in try/catch.
- mm audit-issues: --issue validated as a canonical positive integer (rejects
  trailing garbage like '12abc', not just non-numeric).
- verify-config: unknown-key error built from the live key set (no longer omits
  'category').
- TSDoc on AuditIssuesOptions, BacklogAuditDeps, StalenessDeps, StalenessResult,
  StalenessCronDeps; two markdown-lint fixes in planning docs.
@thejustinwalsh

Copy link
Copy Markdown
Owner Author

Review round 1 addressed — bcf0cbd

All inline comments are replied to in-thread. Covering the findings that had no inline thread to reply to:

Outside-diff range comments

  • gates/verify-config.ts (unknown-key message stale) — fixed; the suggestion is now built from the live KNOWN_GATE_KEYS set, so it can't omit category (or drift again as keys are added). Test asserts the message contains category.
  • gates/pr-ready.ts (invalid deferral overriding valid evidence) — fixed; an unauthorized deferral no longer disqualifies a criterion that also carries independent evidence (OR semantics).

Body nitpicks

  • TSDoc added for StalenessCronDeps, StalenessDeps, and StalenessResult.
  • Added the requested "evidence + bot-deferral on the same criterion → allow" regression test.

Self-review edges hardened in the same pass (same classes):

  • The deferral/evidence fix originally stripped only the first (deferred: …) annotation, so a second annotation's URL could still satisfy the evidence check — now strips every annotation (global regex) before the evidence test. Regression test added (two bot deferrals, no real evidence → deny).
  • --issue validation rejects trailing garbage (12abc), not just fully non-numeric input.
  • addLabelDefault throwing is isolated at both call sites so a label-write failure is logged (never logged-as-applied) without crashing the command or aborting the backlog sweep.

bun test (773 pass), bun run typecheck, bun run lint, bun run format all clean.

@thejustinwalsh thejustinwalsh merged commit 227fde8 into main May 28, 2026
1 check passed
@thejustinwalsh thejustinwalsh deleted the middle-issue-143 branch May 28, 2026 19:24
thejustinwalsh added a commit that referenced this pull request May 28, 2026
PR #163 (#146) added listOpenIssues/addLabel/listMergedPrsClosingRefs/
closeIssue/createIssue to GitHubGateway; this test's in-memory mock predates
them, so root tsc was red on main. Add the missing methods as unimplemented
stubs (the existing pattern for unused surface) to restore a green typecheck.
thejustinwalsh added a commit that referenced this pull request May 29, 2026
Single-pass new-work-as-base merge of origin/main after rebase kept
re-conflicting on the same hunks across multiple commits (CLAUDE.md
escape hatch).

- packages/dispatcher/src/poller-cron.ts — unified `startPoller(deps,
  opts)` signature; folded `ReconcilerHooks` into `StartPollerOptions`
  as `opts.reconcilers` (alongside `opts.checkboxRevert` and
  `opts.intervalMs`).
- packages/dispatcher/src/main.ts — unified daemon-startup: keeps the
  durable engine + `recoverEngine` + `reconcileOrphanedSignals` from
  #160, the notification-failsafe watchdog comment from #162, and adds
  the `reconcileOpenPRsForRepo` block + `reconcilers` config in the
  `startPoller` call. Dropped the now-unused `Engine` import (main
  routes through `createDurableEngine`).
- packages/core/src/index.ts — kept both export blocks: integration
  rubric from #163, `selectAdapter` from this PR.
- packages/dispatcher/test/recommender-run.test.ts — kept both describe
  blocks (adapter-enabled gate from this PR, schema-resolution from
  #157); added `enabled: true` to the schema test's adapter config so
  it passes the new gate.
- packages/dispatcher/test/gates/checkbox-revert-pass.test.ts — added
  the five new `GitHubGateway` methods to the test stub
  (`listOpenIssues`, `addLabel`, `listMergedPrsClosingRefs`,
  `closeIssue`, `createIssue`) main grew during the marathon.

Gates re-verified locally: `bun run typecheck` clean, `bun test
packages/dispatcher` 620/620 pass, `bun run lint` clean, `bun run
format` clean (no changes).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ready-for-review All phases done and verified — PR ready for final human review and merge

Projects

None yet

1 participant