feat(self-audit): integration-verified requirements + freshness (Epic #143)#163
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (17)
✅ Files skipped from review due to trivial changes (2)
🚧 Files skipped from review as they are similar to previous changes (11)
📝 WalkthroughWalkthroughAdds a shared integration-rubric in core; new ChangesEpic
Estimated code review effort 🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested labels
|
…econd-pass audit (#144)
… + flag spec drift (#146)
…f-auditing systems
thejustinwalsh
left a comment
There was a problem hiding this comment.
Decision-log rationale, distilled inline (source of truth: planning/issues/143/decisions.md). These explain the why behind the load-bearing choices in this Epic.
| * names a product-wiring signal **and** a real-path-test signal. This is the | ||
| * atom both #144 and #145 build on. | ||
| */ | ||
| export function isIntegrationCriterion(text: string): boolean { |
There was a problem hiding this comment.
Why this predicate, here. A criterion is an integration criterion iff it names a product-wiring signal and a real-path-test signal; "unit tests pass" matches neither. This lives in @middle/core (not the CLI check or the dispatcher gate) deliberately: it's the single source of truth both the filing-time auditor (#144) and the landing-time PR-ready gate (#145) consume, so the contract written when an issue is filed is byte-for-byte the one enforced when work lands. The heuristic is anchored to the spec's own worked example (mm start serves the dashboard … a smoke test boots the daemon and GETs /), which matches both signal classes.
|
|
||
| const DEFERRED_RE = /\(deferred:\s*(\S+?)\s*\)/i; | ||
| /** The integration escape hatch — mirrors `(deferred: …)`; must be human-authored. */ | ||
| const INTEGRATION_EXEMPT_RE = /\(integration-exempt:\s*(\S+?)\s*\)/i; |
There was a problem hiding this comment.
Exemption reuses the (deferred: …) shape on purpose. (integration-exempt: <comment-url>) is validated exactly like the existing deferral annotation — the linked comment's author must be a non-bot human. This keeps "declare the exemption explicitly, not silently" honest: an agent can't write its own escape hatch. Note the division of labour with @middle/core's detectExemption (which accepts a prose reason in an issue body): core detects the declaration; this gate validates authorization.
| // takes precedence over any exemption annotation (a real test beats a waiver). | ||
| // A *deferred* integration criterion does NOT count: the integration test can't | ||
| // be punted, only evidenced or (explicitly) exempted. | ||
| const evidenced = criteria.some( |
There was a problem hiding this comment.
Ordering is load-bearing (self-review fix). The evidenced-integration-criterion check runs before the exemption check, so a real test always wins over a waiver — avoiding a false deny when a PR both has an integration test and carries a stray/bot-authored exemption. And a (deferred: …) integration criterion is explicitly excluded: the integration test can be evidenced or exempted, never punted. Letting deferral waive it would reopen the exact "unit-green-but-unwired" hole #145 closes.
| * "ships in phase 12", "planned for Phase 3". The captured group is the phase. | ||
| * Two shapes: "<future-verb> in phase N", and the verb-less "planned for phase N". | ||
| */ | ||
| const DRIFT_RE = |
There was a problem hiding this comment.
Two shapes, not one (self-review fix). The first draft folded planned for into the verb alternation, which then required a trailing in phase — so "planned for phase 3" silently failed to match. The regex now matches <future-verb> in phase N and the verb-less planned for phase N. This is the concrete drift class the spec calls out: a stale "lands in Phase N" line surviving past that phase's merge.
| } | ||
|
|
||
| const gate: Gate = { name: name.trim(), command, timeoutSeconds: resolvedTimeout }; | ||
| let resolvedCategory: GateCategory = "unit"; |
There was a problem hiding this comment.
category defaults to unit. An integration gate is the verify-side companion to the PR-ready integration-evidence check: it declares a gate that exercises the running product (boots/serves/invokes the real path), distinct from unit gates. integrationGates(config) lets callers recognise it. Validation rejects any other value (loud-failure contract).
Reviewer's brief — Epic #143 (PR #163)Three self-auditing systems that make middle apply its second-pass-review instinct to its own requirements (#144), definition of done (#145), and documents (#146). Posted on both the Epic and the PR. How to run itbun install # required in a fresh worktree (see "fragile" below)
bun test # full suite -> 763 pass, 0 fail
bun run typecheck && bun run lint && bun run format # all clean
# Each system's dogfood test (real path, not a stub):
bun test packages/cli/test/audit-issues-cli.test.ts # 144 - spawns the real `mm audit-issues` CLI
bun test packages/dispatcher/test/gates/pr-ready.test.ts # 145 - drives the real PR-ready decision
bun test packages/dispatcher/test/staleness.test.ts # 146 - runs the real reconcile pass
bun test packages/dispatcher/test/epic-143-demo.test.ts # Epic - all three together
# By hand:
printf '## Acceptance criteria\n- [ ] unit tests pass\n' > /tmp/weak.md
bun packages/cli/src/index.ts audit-issues . --body-file /tmp/weak.md --title "Demo" # exits 1 + suggests a rewriteWhat to verify (and what "correct" looks like)
How to reviewStart at the rubric, then the two enforcement points, then the cron wiring in Fragile / extra eyes
Follow-up
Human does the final review + merge - the workflow stops here. |
There was a problem hiding this comment.
Actionable comments posted: 12
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
packages/dispatcher/src/gates/verify-config.ts (1)
70-74:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winUnknown-key error message is stale after adding
category.The guidance text still lists old keys only, which is misleading during config validation failures.
💡 Proposed fix
throw new VerifyConfigError( - `${where}: unknown key "${key}" (did you mean one of name, command, timeout_seconds, phases?)`, + `${where}: unknown key "${key}" (did you mean one of name, command, timeout_seconds, phases, category?)`, );🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/dispatcher/src/gates/verify-config.ts` around lines 70 - 74, The unknown-key error message is stale: update the throw in the loop that validates config keys (the block referencing raw, KNOWN_GATE_KEYS, where, and throwing VerifyConfigError) to include the current valid keys instead of the hard-coded list; e.g., build the suggestion from KNOWN_GATE_KEYS (or list name, command, timeout_seconds, phases, category) and include that dynamic list in the VerifyConfigError message so the guidance stays accurate when keys change.packages/dispatcher/src/gates/pr-ready.ts (1)
76-84:⚠️ Potential issue | 🟠 Major | ⚡ Quick winInvalid deferral currently overrides valid evidence for the same criterion.
A criterion with evidence should still pass even if its
(deferred: ...)annotation is invalid; current control flow denies it.💡 Proposed fix
for (const criterion of criteria) { const deferred = DEFERRED_RE.exec(criterion); if (deferred) { const author = await opts.resolveCommentAuthor(deferred[1]!); if (author && !author.isBot) continue; // stakeholder-authorized deferral - unmet.push(criterion); - continue; + if (namesEvidence(criterion)) continue; // evidence still satisfies the criterion + unmet.push(criterion); + continue; } if (namesEvidence(criterion)) continue; // has evidence (link/#ref or a named test file) unmet.push(criterion); }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/dispatcher/src/gates/pr-ready.ts` around lines 76 - 84, The deferral check using DEFERRED_RE currently runs before evidence detection and causes an invalid deferred tag to mark a criterion unmet even when namesEvidence(criterion) would pass; re-order the logic so namesEvidence(criterion) is evaluated before handling DEFERRED_RE, or within the deferred branch first verify that namesEvidence is false before pushing to unmet; specifically update the block around DEFERRED_RE, opts.resolveCommentAuthor, namesEvidence, and unmet so valid evidence wins over an invalid `(deferred: ...)` annotation.
🧹 Nitpick comments (3)
packages/dispatcher/test/gates/pr-ready.test.ts (1)
124-184: ⚡ Quick winAdd a regression case for “evidence + invalid deferral” on the same criterion.
Given the OR contract, that criterion should still pass. A focused test here will lock behavior and prevent future regressions.
🧪 Suggested test case
describe("evaluatePrReady — integration evidence", () => { + test("evidence still satisfies when the same criterion has a bot-authored deferral", async () => { + const body = + "## Acceptance criteria\n- [ ] done (https://example.com/x) (deferred: https://github.com/o/r/issues/1#issuecomment-2)\n- [ ] `mm start` serves it; a smoke test boots the daemon and GETs `/` (packages/cli/test/daemon-entry.test.ts)"; + const resolve: CommentAuthorResolver = async () => ({ login: "middle[bot]", isBot: true }); + const result = await evaluatePrReady({ body, resolveCommentAuthor: resolve }); + expect(result).toEqual({ decision: "allow" }); + });🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/dispatcher/test/gates/pr-ready.test.ts` around lines 124 - 184, Add a regression test in the same suite that constructs an acceptance-criteria body containing a single integration criterion that is both evidenced (e.g., includes a named test file like "packages/cli/test/daemon-entry.test.ts") and also has an invalid deferral annotation (e.g., "(deferred: ...)" authored by a bot); call evaluatePrReady with that body and a resolveCommentAuthor that returns a bot author, and assert the result is { decision: "allow" } — reference evaluatePrReady, CommentAuthorResolver, and resolveCommentAuthor so the test mirrors the existing "evidenced integration criterion allows even if a stray bot exemption is present" but uses a bad/deferred annotation to lock in the OR semantics.packages/dispatcher/src/staleness-cron.ts (1)
23-33: ⚡ Quick winAdd a top-level TSDoc block for
StalenessCronDeps.
StalenessCronDepsis a public export and should have an explicit module-level contract comment above the type.As per coding guidelines:
Every public export in a module must carry a TSDoc/JSDoc comment.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/dispatcher/src/staleness-cron.ts` around lines 23 - 33, Add a top-level TSDoc comment immediately above the exported type StalenessCronDeps describing the purpose of this dependency bag (what the staleness cron needs), and briefly document each property (db, github with its required methods list, specPath and its default DEFAULT_SPEC_PATH, and now as an optional time provider). Ensure the TSDoc is a module-level/public comment (/** ... */) so the exported type carries an explicit contract for callers.packages/dispatcher/src/staleness.ts (1)
61-83: ⚡ Quick winAdd top-level TSDoc for exported
StalenessDepsandStalenessResult.Both are public exports and should carry explicit contract comments above the type declarations.
As per coding guidelines:
Every public export in a module must carry a TSDoc/JSDoc comment.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/dispatcher/src/staleness.ts` around lines 61 - 83, Add top-level TSDoc comments above the exported types StalenessDeps and StalenessResult: for StalenessDeps describe it as the input contract for the staleness checker (include short descriptions for repo, github gateway methods, readSpec, specPath and note that maxPerPass is an optional cap with default behavior), and for StalenessResult describe it as the output of a staleness pass (briefly document closed as issue numbers closed this pass, drift as detected SpecDrift items, and filed as reconcile task issue numbers); ensure the comments are placed immediately above the respective type declarations so they satisfy the public-export documentation rule.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@packages/cli/src/commands/audit-issues.ts`:
- Around line 14-37: Add a top-level TSDoc comment for the exported type
AuditIssuesOptions describing its purpose and public contract (e.g., "Options
for running the audit-issues command, controlling input sources, GitHub
interactions, and logging"); place it immediately above the "export type
AuditIssuesOptions" declaration and include brief notes about intended use,
defaults/behavior for key fields like issue, bodyFile, resolveSlug, fetchIssue,
listOpenIssues, addLabel, readBodyFile, log, and errlog so consumers understand
the API surface.
- Around line 156-161: Wrap the single-issue fetch in a try/catch: when
opts.issue is set, call (opts.fetchIssue ?? fetchIssueDefault)(slug, opts.issue)
inside a try block and catch any thrown errors, logging them via errlog (include
error.message or the error object) and return 1; keep the existing null-check
for the returned issue but ensure thrown parser/runtime exceptions are handled
the same way. Use the same symbols: opts.issue, fetchIssue/fetchIssueDefault,
slug, errlog, and preserve the existing control flow.
- Around line 110-112: The helper addLabelDefault silently ignores failures from
the gh call, so it should fail fast: update addLabelDefault (the function that
calls gh(["gh", "issue", "edit", String(n), "--repo", slug, "--add-label",
label])) to detect a non-zero/failed result and throw or propagate an Error when
gh does not succeed (or rethrow the underlying error) instead of returning
silently; ensure the caller receives the rejection so label application failures
are not treated as successful.
In `@packages/cli/src/index.ts`:
- Around line 165-183: Validate the options.issue value before calling
runAuditIssues: in the async CLI handler that currently forwards issue:
options.issue === undefined ? undefined : Number(options.issue), parse and
verify options.issue is a positive integer (use parseInt/Number and
Number.isInteger and > 0) and if invalid print a clear error and exit with
non-zero status instead of passing NaN into runAuditIssues; keep undefined when
the flag is omitted and only convert to Number once it passes validation.
In `@packages/cli/test/audit-issues-cli.test.ts`:
- Around line 30-41: The test is combining stderr into the returned out which
makes JSON parsing flaky; update the runCli helper (function runCli) to return
stdout and stderr separately (e.g., return { code, stdout, stderr }) or add a
flag to indicate JSON mode, and then in the JSON-mode test parse JSON from the
stdout field only (do not include stderr); also update the other tests that rely
on runCli (including the ones referenced around the second occurrence) to use
the new shape or flag when they need combined output.
In `@packages/core/src/integration-rubric.ts`:
- Around line 56-57: The REAL_PATH_TEST_RE currently uses "integration[ -]?test"
and "smoke[ -]?test" which won't match plural forms like "integration tests" or
"smoke tests"; update the pattern (symbol: REAL_PATH_TEST_RE) to accept optional
plural "s" for those tokens (e.g., use "integration[ -]?tests?" and "smoke[
-]?tests?") while preserving the rest of the alternatives, flags, and
case-insensitivity so plural phrases are recognized without breaking other
matches.
- Around line 31-33: parseAcceptanceCriteria currently toggles inSection on any
acceptance heading, allowing later headings to reopen acceptance parsing; add a
guard to enforce "first acceptance heading only": introduce a boolean (e.g.,
seenAcceptanceSection) and in parseAcceptanceCriteria when you detect a heading
(/^#{1,6}\s/), if it's an acceptance heading set inSection = true only when
seenAcceptanceSection is false and then set seenAcceptanceSection = true; for
any subsequent headings (or when leaving the section) ensure inSection remains
false so later acceptance headings are ignored—update references to inSection
and the heading-detection branch to use seenAcceptanceSection to prevent
reopening the acceptance section.
In `@packages/dispatcher/src/audit.ts`:
- Around line 19-24: Add a declaration-level TSDoc block above the exported
BacklogAuditDeps type describing the purpose of the type and each field (repo,
github, and maxFlagsPerPass), including that maxFlagsPerPass is optional and
defaults to DEFAULT_MAX_FLAGS_PER_PASS; ensure the comment follows TSDoc style
(/** ... */) and mentions this is the dependency contract for the backlog audit
routine so it documents the public export.
In `@packages/dispatcher/src/staleness-cron.ts`:
- Around line 51-56: The readSpec closure currently swallows all filesystem
errors and treats them as a missing spec; change readSpec to only convert ENOENT
(file not found) into null and rethrow any other errors so permission/I/O errors
surface. Specifically, inside the readSpec function that calls
readFileSync(join(managed.checkoutPath, specPath), "utf8"), catch the thrown
error, check error.code === "ENOENT" and return null for that case, otherwise
rethrow the error (or throw a new error preserving the original) so the
caller/runtime logs a real failure instead of silently skipping drift checks.
In `@packages/dispatcher/src/staleness.ts`:
- Around line 93-94: The cap is applied separately to closes and creates so a
pass can exceed the intended total; initialize a single budget = deps.maxPerPass
?? DEFAULT_MAX_PER_PASS and use it across all mutation decisions (both close and
create paths) instead of using cap per bucket. In staleness.ts, replace
per-bucket cap checks around the code that builds close lists and task-create
lists (references: cap, deps.maxPerPass, DEFAULT_MAX_PER_PASS, the loops that
call deps.github.listOpenIssues and the logic that increments closes/creates)
with checks that consult and decrement the shared budget; stop processing
further buckets or items once budget reaches zero. Ensure any early returns or
final reporting use the shared budget so the sum of closes+creates never exceeds
the configured maxPerPass.
In `@planning/issues/143/decisions.md`:
- Line 26: Update the inline code span that currently contains a trailing space
(`mm `) to remove the space so it reads `mm`; locate the code span in the text
"product-wiring signal
(served/mounted/invoked/reachable/wired/booted/GET/POST/`mm `/endpoint/route…)"
and replace the malformed inline code token `mm ` with `mm` to satisfy Markdown
lint rule MD038.
In `@planning/issues/143/plan.md`:
- Around line 38-39: Replace the awkward phrase "requires evidenced integration
test" in the line containing "category (schema + parser + `verify.v1.md`);
PR-ready gate requires evidenced integration test + `(integration-exempt:
<url>)` escape hatch;" with the clearer wording "requires an evidenced
integration test" (or alternatively "requires integration-test evidence") so the
sentence reads naturally and unambiguously.
---
Outside diff comments:
In `@packages/dispatcher/src/gates/pr-ready.ts`:
- Around line 76-84: The deferral check using DEFERRED_RE currently runs before
evidence detection and causes an invalid deferred tag to mark a criterion unmet
even when namesEvidence(criterion) would pass; re-order the logic so
namesEvidence(criterion) is evaluated before handling DEFERRED_RE, or within the
deferred branch first verify that namesEvidence is false before pushing to
unmet; specifically update the block around DEFERRED_RE,
opts.resolveCommentAuthor, namesEvidence, and unmet so valid evidence wins over
an invalid `(deferred: ...)` annotation.
In `@packages/dispatcher/src/gates/verify-config.ts`:
- Around line 70-74: The unknown-key error message is stale: update the throw in
the loop that validates config keys (the block referencing raw, KNOWN_GATE_KEYS,
where, and throwing VerifyConfigError) to include the current valid keys instead
of the hard-coded list; e.g., build the suggestion from KNOWN_GATE_KEYS (or list
name, command, timeout_seconds, phases, category) and include that dynamic list
in the VerifyConfigError message so the guidance stays accurate when keys
change.
---
Nitpick comments:
In `@packages/dispatcher/src/staleness-cron.ts`:
- Around line 23-33: Add a top-level TSDoc comment immediately above the
exported type StalenessCronDeps describing the purpose of this dependency bag
(what the staleness cron needs), and briefly document each property (db, github
with its required methods list, specPath and its default DEFAULT_SPEC_PATH, and
now as an optional time provider). Ensure the TSDoc is a module-level/public
comment (/** ... */) so the exported type carries an explicit contract for
callers.
In `@packages/dispatcher/src/staleness.ts`:
- Around line 61-83: Add top-level TSDoc comments above the exported types
StalenessDeps and StalenessResult: for StalenessDeps describe it as the input
contract for the staleness checker (include short descriptions for repo, github
gateway methods, readSpec, specPath and note that maxPerPass is an optional cap
with default behavior), and for StalenessResult describe it as the output of a
staleness pass (briefly document closed as issue numbers closed this pass, drift
as detected SpecDrift items, and filed as reconcile task issue numbers); ensure
the comments are placed immediately above the respective type declarations so
they satisfy the public-export documentation rule.
In `@packages/dispatcher/test/gates/pr-ready.test.ts`:
- Around line 124-184: Add a regression test in the same suite that constructs
an acceptance-criteria body containing a single integration criterion that is
both evidenced (e.g., includes a named test file like
"packages/cli/test/daemon-entry.test.ts") and also has an invalid deferral
annotation (e.g., "(deferred: ...)" authored by a bot); call evaluatePrReady
with that body and a resolveCommentAuthor that returns a bot author, and assert
the result is { decision: "allow" } — reference evaluatePrReady,
CommentAuthorResolver, and resolveCommentAuthor so the test mirrors the existing
"evidenced integration criterion allows even if a stray bot exemption is
present" but uses a bad/deferred annotation to lock in the OR semantics.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: be5a7373-d23d-4c90-8dd3-4dcf98cf9f00
📒 Files selected for processing (39)
.claude/skills/creating-github-issues/SKILL.md.claude/skills/implementing-github-issues/SKILL.md.claude/skills/verifying-requirements/SKILL.md.codex/skills/creating-github-issues/SKILL.md.codex/skills/implementing-github-issues/SKILL.md.codex/skills/verifying-requirements/SKILL.mdpackages/cli/src/bootstrap-assets/skills/creating-github-issues/SKILL.mdpackages/cli/src/bootstrap-assets/skills/implementing-github-issues/SKILL.mdpackages/cli/src/bootstrap-assets/skills/verifying-requirements/SKILL.mdpackages/cli/src/checks/issue-audit.tspackages/cli/src/commands/audit-issues.tspackages/cli/src/index.tspackages/cli/test/audit-issues-cli.test.tspackages/cli/test/issue-audit.test.tspackages/core/src/index.tspackages/core/src/integration-rubric.tspackages/core/test/integration-rubric.test.tspackages/dispatcher/src/audit-cron.tspackages/dispatcher/src/audit.tspackages/dispatcher/src/gates/pr-ready.tspackages/dispatcher/src/gates/verify-config.tspackages/dispatcher/src/github.tspackages/dispatcher/src/main.tspackages/dispatcher/src/staleness-cron.tspackages/dispatcher/src/staleness.tspackages/dispatcher/test/backlog-audit.test.tspackages/dispatcher/test/epic-143-demo.test.tspackages/dispatcher/test/gates/gate-runner.test.tspackages/dispatcher/test/gates/pr-ready-handler.test.tspackages/dispatcher/test/gates/pr-ready.test.tspackages/dispatcher/test/gates/verify-config.test.tspackages/dispatcher/test/staleness-cron.test.tspackages/dispatcher/test/staleness.test.tspackages/skills/creating-github-issues/SKILL.mdpackages/skills/implementing-github-issues/SKILL.mdpackages/skills/verifying-requirements/SKILL.mdplanning/issues/143/decisions.mdplanning/issues/143/plan.mdschemas/verify.v1.md
CodeRabbit review on Epic #143 + internal clean-eyes pass. Each fix carries a test. - pr-ready gate: an invalid (bot/unresolvable) deferral no longer overrides independent evidence (OR semantics); strip EVERY (deferred: …) annotation before the evidence check so a deferral's own URL — or a second annotation — can't self-satisfy the criterion. - integration rubric: parseAcceptanceCriteria collects only the first acceptance section (a later acceptance heading no longer reopens it); REAL_PATH_TEST_RE matches plural 'integration tests' / 'smoke tests'. - staleness: maxPerPass is one shared budget across closes + filed tasks, not a per-bucket cap. - staleness-cron: readSpec maps only ENOENT to 'no spec'; other I/O errors surface instead of silently disabling drift detection. - audit-issues: addLabelDefault throws on non-zero gh exit; label failures are isolated (logged, never logged-as-applied, never abort the sweep); single-issue fetch is wrapped in try/catch. - mm audit-issues: --issue validated as a canonical positive integer (rejects trailing garbage like '12abc', not just non-numeric). - verify-config: unknown-key error built from the live key set (no longer omits 'category'). - TSDoc on AuditIssuesOptions, BacklogAuditDeps, StalenessDeps, StalenessResult, StalenessCronDeps; two markdown-lint fixes in planning docs.
Review round 1 addressed — bcf0cbdAll inline comments are replied to in-thread. Covering the findings that had no inline thread to reply to: Outside-diff range comments
Body nitpicks
Self-review edges hardened in the same pass (same classes):
|
PR #163 (#146) added listOpenIssues/addLabel/listMergedPrsClosingRefs/ closeIssue/createIssue to GitHubGateway; this test's in-memory mock predates them, so root tsc was red on main. Add the missing methods as unimplemented stubs (the existing pattern for unused surface) to restore a green typecheck.
Single-pass new-work-as-base merge of origin/main after rebase kept re-conflicting on the same hunks across multiple commits (CLAUDE.md escape hatch). - packages/dispatcher/src/poller-cron.ts — unified `startPoller(deps, opts)` signature; folded `ReconcilerHooks` into `StartPollerOptions` as `opts.reconcilers` (alongside `opts.checkboxRevert` and `opts.intervalMs`). - packages/dispatcher/src/main.ts — unified daemon-startup: keeps the durable engine + `recoverEngine` + `reconcileOrphanedSignals` from #160, the notification-failsafe watchdog comment from #162, and adds the `reconcileOpenPRsForRepo` block + `reconcilers` config in the `startPoller` call. Dropped the now-unused `Engine` import (main routes through `createDurableEngine`). - packages/core/src/index.ts — kept both export blocks: integration rubric from #163, `selectAdapter` from this PR. - packages/dispatcher/test/recommender-run.test.ts — kept both describe blocks (adapter-enabled gate from this PR, schema-resolution from #157); added `enabled: true` to the schema test's adapter config so it passes the new gate. - packages/dispatcher/test/gates/checkbox-revert-pass.test.ts — added the five new `GitHubGateway` methods to the test stub (`listOpenIssues`, `addLabel`, `listMergedPrsClosingRefs`, `closeIssue`, `createIssue`) main grew during the marathon. Gates re-verified locally: `bun run typecheck` clean, `bun test packages/dispatcher` 620/620 pass, `bun run lint` clean, `bun run format` clean (no changes).
Summary
Closes #143
Three coordinated self-auditing systems that fix middle's "green tests as the artifact" failure mode — middle applying its own second-pass-review instinct to its requirements, its definition of done, and its documents:
@middle/core, anmm audit-issuescommand, averifying-requirementsskill, acreating-github-issuessecond pass, and a standing backlog-audit cron that labels weak issuesneeds-design.verify.tomlintegration gate category and a PR-ready gate that blocks a feature whose integration criterion isn't evidenced by a named test (with a human-authored exemption escape hatch).Each system is integration-verified itself — its own test exercises the real path (spawns the real
mmCLI / drives the real gate / runs the real pass), the very contract the Epic introduces. See the plan comment andplanning/issues/143/plan.md.Status
@middle/corerubric +mm audit-issues+verifying-requirementsskill +creating-github-issuessecond pass + backlog-audit cron +needs-designlabellingverify.tomlintegration gate category + PR-ready integration-evidence gate +implementing-github-issuesDoD updateAcceptance criteria
completedwith evidence comments on this PR).packages/dispatcher/test/epic-143-demo.test.ts(drives the realauditIssueBody,evaluatePrReady, andreconcileStalenesspaths, all three assertions green).Verification evidence
#144 — Requirements auditor
packages/core/src/integration-rubric.ts):isIntegrationCriterionrequires a product-wiring signal and a real-path-test signal; "unit tests pass" fails. Unit tests:packages/core/test/integration-rubric.test.ts.mm audit-issues(packages/cli/src/commands/audit-issues.ts): three modes (--body-file,--issue, backlog--label), exit 0/1 as a gate.packages/cli/test/audit-issues-cli.test.tsspawns the realmm audit-issuesCLI against a weak fixture (asserts it flags + suggests, exit 1) and a well-formed one (asserts it passes, exit 0). The dogfooded "exercise the real path" requirement, not a unit stub.creating-github-issues(Phase 8.5) + newverifying-requirementsskill (all four skill copies in sync).packages/dispatcher/src/audit-cron.ts) labels rubric-failing open feature issuesneeds-design; pass-level testpackages/dispatcher/test/backlog-audit.test.ts.bun test→ 745 pass;bun run typecheck,bun run lint,bun run formatclean.#145 — Integration-verified definition of done
verify.tomlintegration gate category (packages/dispatcher/src/gates/verify-config.ts): acategory = "unit" | "integration"field (defaultunit) +integrationGates()helper; schema docschemas/verify.v1.mdupdated; validation rejects bad categories. Tests inpackages/dispatcher/test/gates/verify-config.test.ts.packages/dispatcher/src/gates/pr-ready.ts): on top of the per-criterion evidence check, the gate now requires ≥1 acceptance criterion that is an integration criterion (shared@middle/corerubric) evidenced by a named test, or a human-authored(integration-exempt: <comment-url>)annotation. A unit-only feature is blocked from ready.packages/dispatcher/test/gates/pr-ready.test.tsdrives the realevaluatePrReadydecision against a fixture PR lacking an integration test (asserts deny) and one evidencing it (asserts allow), plus the exemption (human → allow, bot → deny).implementing-github-issues(7d + the PR-ready gate note); all four skill copies in sync.bun test→ 752 pass; typecheck/lint/format clean.#146 — Anti-staleness reconciliation
reconcileStalenesspass (packages/dispatcher/src/staleness.ts): closes landed-but-open issues (a merged PR'sclosingIssuesReferencesnames them, yet they're open) with an evidence comment naming the PR, anddetectSpecDriftflags spec lines describing a now-merged phase as future work (e.g. "lands in Phase 9"), filing a proposal-firsthousekeepingreconcile task (deduped by title). Never edits the spec, never closes without an evidence trail.packages/dispatcher/src/staleness-cron.ts) sweeps managed, non-paused repos hourly, reading each repo's spec from its checkout; wired intomain.tsalongside the poller/recommender/audit crons (and torn down on shutdown).packages/dispatcher/src/github.ts):listMergedPrsClosingRefs,closeIssue,createIssue.packages/dispatcher/test/staleness.test.tsruns the real pass against an in-memoryGitHubGateway+ a drifted fixture spec (landed-but-open issue + "lands in Phase 9" line) and asserts the close and the drift task both fire;staleness-cron.test.tscovers the managed-repo sweep reading a real spec file.bun test→ 761 pass; typecheck/lint/format clean.How to run / verify
Full suite at HEAD:
bun test→ 763 pass, 0 fail;bun run typecheck/lint/formatclean. Branch isMERGEABLEagainstmain(rebased — main hadn't moved).How to review
packages/core/src/integration-rubric.ts.isIntegrationCriterion(wiring signal and real-path-test signal) is the atom both Requirements auditor: audit issue acceptance criteria against an integration rubric #144 and Integration-verified definition of done: verify.toml integration gate + PR-ready evidence #145 build on; if you agree with it, the two enforcement points follow.packages/cli/src/commands/audit-issues.ts(the command) +packages/dispatcher/src/audit.ts/audit-cron.ts(the cron). The dogfood testpackages/cli/test/audit-issues-cli.test.tsspawns the real CLI.packages/dispatcher/src/gates/pr-ready.tsevaluateIntegrationEvidence— the most security-sensitive change (it gates PR-ready). Verify there's no bypass: a deferred integration criterion must not count; the exemption must be human-authored. Tests inpackages/dispatcher/test/gates/pr-ready.test.ts.packages/dispatcher/src/staleness.ts— confirm it never edits the spec and never closes an issue without an evidence comment.WIRING_RE/REAL_PATH_TEST_REin core;DRIFT_RE/TEST_FILE_RE/INTEGRATION_EXEMPT_REin the gate) — false positives/negatives are the main risk surface; and the four-copy skill sync (packages/skills↔bootstrap-assets↔.claude/skills↔.codex/skills), all verified in parity.Stumbling points
node_modules. A freshgit worktreehad never hadbun installrun, so directbun <file>runs and workspace subpath imports (@middle/dispatcher/src/db.ts) failed to resolve — thoughbun testresolved package roots fine, masking it. Runningbun installin the worktree fixed it and is required for the real-CLI integration test (which spawnsbun src/index.ts). Suggested CLAUDE.md note below.commandIsPrReadysubstring-matchesgh pr readyanywhere in a command, so agrep "gh pr ready"over the skill docs fired the gate against this very PR. Harmless here, but it's a reminder the matcher is broad by design.Suggested CLAUDE.md updates
git worktreeneedsbun installbeforebun run <file>/ the daemon / the real-CLI tests will resolve workspace subpath imports —bun testmasks this because it resolves package roots without the symlinks.Follow-up issues
Out of scope
Decisions
See
planning/issues/143/decisions.md(distilled into per-line review comments on this PR).Summary by CodeRabbit
New Features
Bug Fixes & Improvements
Documentation