fix(dispatcher): multi-repo coordination — close the real holes#229
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (1)
📝 WalkthroughWalkthroughAdds deterministic cross-repo blocker resolution (gateway + engine + workflow), a normalized shared-checkout collision guard wired into init/registration/dispatch with DB uniqueness migration, and a two-phase recommender cron with bounded concurrency and per-repo timeouts plus tests. ChangesCross-repo blocker resolution
Shared checkout path collision guard
Recommender cron parallelization
Sequence Diagram(s)sequenceDiagram
participant Recommender as Recommender Workflow
participant EpicGateway as EpicGateway.getIssueState
participant GitHub as gh CLI / GitHub
participant FileEpic as File Epic
Recommender->>Recommender: reapply-dispatcher-sections
Recommender->>EpicGateway: getIssueState(repoB, "123")
EpicGateway->>FileEpic: check local epic file
FileEpic-->>EpicGateway: {state,title} or null
EpicGateway->>GitHub: gh issue view (if numeric)
GitHub-->>EpicGateway: {state,title} or null
EpicGateway-->>Recommender: IssueState | null
Recommender->>Recommender: annotate open blocker or move closed → readyToDispatch
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related issues
Possibly related PRs
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Comment |
…ime (#225) Add a deterministic resolve-blockers step to the recommender workflow that parses each BlockedItem.blocker, resolves same-repo (#n) and cross-repo (owner/repo#n) issue references against live state via the routing EpicGateway, and reclassifies the blocked item: a closed blocker unblocks it into Ready to dispatch, an open one annotates the line with the blocker title, an unresolvable one gets a (stale blocker: <ref>) suffix. Backticked non-issue blockers stay put. - New EpicGateway.getIssueState (github via gh issue view; file via the Epic file) - New pure blocker-resolution module (unit-tested) + workflow wiring - Relax the state-issue validator + schema doc for cross-repo/annotated blockers - Integration test drives the real workflow through the engine, observed via the live state body
Two repo slugs pointing at one checkout collide on git worktree add — the second mm init silently overwrote the in-memory path map. Add a collision guard: - registerManagedRepo throws RepoPathCollisionError when a DIFFERENT repo is already registered for the same checkout_path; same-slug re-register stays idempotent. New assertNoRepoPathCollision helper. - mm init runs the guard (injected checkCollision hook) after the slug resolves but before any files are written, so a rejected init scaffolds nothing; it exits non-zero naming both repos + the shared path. - The /control/dispatch route maps the collision to a 400 (naming both repos), re-throwing any other error. Tests: repo-config (a/b/c cases), control-routes 400 mapping, init e2e asserting the second repo's .middle/<slug>.toml is never written.
Verification gates — phase #225✅ All 4 verification gate(s) passed for phase #225.
format — ✅ pass (0.3s)lint — ✅ pass (0.1s)typecheck — ✅ pass (2.1s)test — ✅ pass (87.3s) |
Verification gates — phase #226✅ All 4 verification gate(s) passed for phase #226.
format — ✅ pass (0.3s)lint — ✅ pass (0.1s)typecheck — ✅ pass (2.1s)test — ✅ pass (86.5s) |
The cron awaited each repo's recommender run sequentially, so a hung run (gh blocked on auth, a long state-issue write) stalled every later repo until its timeout fired. Fire the per-repo runs concurrently instead: - Phase 1 stamps last_recommender_run for every due repo synchronously (before any await), preserving the overlapping-tick double-dispatch guard under concurrency; Phase 2 fans them out behind a bounded pool (maxConcurrentRepos, default 4), each under a hard per-repo timeout (runTimeoutMs, default 60s). - A hang/timeout/throw on one repo is isolated: its stamp rolls back (retries next tick) and the others complete unaffected. - New global [recommender] config: max_concurrent_repos, run_timeout_seconds. Integration test: 3 repos (A 100ms, B hangs 5s, C 200ms) — A+C succeed, B is marked failed-by-timeout, the pass finishes <2s (B never blocks A/C); plus a bounded-concurrency test asserting maxInFlight never exceeds the cap.
Verification gates — phase #227✅ All 4 verification gate(s) passed for phase #227.
format — ✅ pass (0.3s)lint — ✅ pass (0.1s)typecheck — ✅ pass (2.0s)test — ✅ pass (86.4s) |
…tion Self-review (internal code-review pass) findings: - resolveBlockers: an open blocker with an empty/whitespace title produced '#42 ()', which the verify step's validate() then rejected — fall back to the bare ref when the sanitized title is empty; also truncate the annotation title to 60 chars (consistent with the Epic-cell rule). - repo-config: normalize checkout paths (resolve) on store + collision-compare so trailing-slash / dot-segment spellings of one directory don't slip the #226 guard. Symlinks remain out of scope (documented). Each with a regression test.
thejustinwalsh
left a comment
There was a problem hiding this comment.
Decision-log highlights, distilled inline (rationale lives in planning/issues/211/decisions.md).
| * Returns the original `state` object unchanged when nothing reclassified (so a | ||
| * no-resolvable-blockers pass is a cheap no-op the caller can skip writing). | ||
| */ | ||
| export async function resolveBlockers( |
There was a problem hiding this comment.
#225 design. Resolution is a deterministic post-agent step, not the agent's job — the audit's finding was that no code consumes blockedItem.blocker. A closed blocker → Ready to dispatch (best-effort row the next full recommender run re-ranks); open → stays blocked, annotated with the title; unresolvable → (stale blocker: <ref>). The "needs-human if its own criteria are unmet" branch of the AC is intentionally NOT done here — judging an Epic's own acceptance-criteria readiness requires reading its body and is the recommender agent's job, not deterministic code. Pure module so it unit-tests with an injected resolver.
| * AFTER the dispatcher-owned reapply so it reads the latest body, and BEFORE | ||
| * verify so the reclassified body is what gets validated. | ||
| */ | ||
| async function resolveBlockersStep(ctx: StepContext<RecommenderInput>): Promise<void> { |
There was a problem hiding this comment.
#225 placement. Runs after reapply-dispatcher-sections (so it reads the latest body) and before verify-state-issue-parses (so the reclassified body is what gets validated). Best-effort + idempotent, mirroring the reapply step: skip on parse error, skip the listOpenEpics/gh round-trips entirely when no blocked item carries a resolvable issue reference, skip the write on a no-op.
| // optionally annotated with a resolved title or `(stale blocker: …)`), or a | ||
| // backticked / free-text non-issue blocker (exempt). Only validate the shape | ||
| // of one that's *trying* to be an issue reference. | ||
| if (REF_LIKE_BLOCKER_RE.test(item.blocker) && !BLOCKER_REF_RE.test(item.blocker)) { |
There was a problem hiding this comment.
#225 schema change. This previously rejected any #-prefixed blocker that wasn't exactly #\d+. Relaxed to accept an optional <owner>/<repo> cross-repo prefix and an optional trailing (<title>) / (stale blocker: <ref>) annotation — otherwise the resolution pass's own output would fail the verify step it feeds. schemas/state-issue.v1.md (the source of truth) is updated in the same change.
| * {@link registerManagedRepo} before it writes, and by `mm init` *before* it | ||
| * scaffolds (so a rejected init writes nothing). | ||
| */ | ||
| export function assertNoRepoPathCollision(db: Database, repo: string, checkoutPath: string): void { |
There was a problem hiding this comment.
#226 guard. One helper, two callers: registerManagedRepo calls it before its INSERT (so the daemon's rememberRepoPath rejects too) and mm init calls it via an injected hook before scaffolding (so a rejected init writes nothing). The repo != ? filter keeps a same-slug re-register idempotent — the daemon re-registers a repo's path on every dispatch and that must never self-collide. Paths are resolve()-normalized so /foo and /foo/ collide; symlinks are out of scope (documented).
| const timeoutMs = deps.runTimeoutMs ?? DEFAULT_RUN_TIMEOUT_MS; | ||
| const limit = deps.maxConcurrentRepos ?? DEFAULT_MAX_CONCURRENT_REPOS; | ||
|
|
||
| // Phase 1 — due-check + stamp, SYNCHRONOUSLY before any run fires. Stamping all |
There was a problem hiding this comment.
#227 ordering. Two phases: stamp last_recommender_run for every due repo synchronously (no intervening await), then fan out the runs concurrently. Stamping all due repos before any await is what preserves the existing "overlapping tick can't double-dispatch" invariant under concurrency. The per-repo withTimeout is what actually delivers the fix: a hung gh/state-write on one repo is abandoned (its promise orphaned but no longer blocking) so the others still run; its stamp rolls back so it retries next tick.
Reviewer's brief — Epic #211 (PR #229)Three independent multi-repo coordination fixes, one per closed sub-issue. Posted on both the Epic and the PR. How to run itbun install
bun run typecheck # clean
bun run lint # clean
bun test # 1408 pass
# The three integration tests that prove the fixes:
bun test packages/dispatcher/test/multi-repo-blockers.test.ts # #225
bun test packages/cli/test/init-collision.test.ts # #226
bun test packages/dispatcher/test/recommender-cron-parallel.test.ts # #227What to verify (per fix)
How to review itThree commits, one per sub-issue (plus a review-hardening commit Fragile bits worth extra eyes
|
There was a problem hiding this comment.
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@packages/dispatcher/src/epic-store/file-epic-gateway.ts`:
- Around line 144-145: The current branch calls gh.getIssueState when
epicFileExists(epicsDir, ref) is false which causes failures for non-numeric
slug refs; change the logic in file-epic-gateway so that when
epicFileExists(epicsDir, ref) is false you first detect whether ref is a numeric
issue id (e.g. /^\d+$/) and only call gh.getIssueState for numeric refs,
otherwise return null to mark the slug as unresolvable; update the conditional
around epicFileExists/readEpicFile to implement this numeric check and return
null for non-numeric missing refs.
In `@packages/dispatcher/src/github.ts`:
- Around line 380-382: getIssueState() currently treats any parsed.state value
other than "OPEN" as "closed"; update it to explicitly handle known uppercase gh
values: return "open" for "OPEN", "closed" for "CLOSED", and "merged" for
"MERGED" (PR refs), and return null for any other/unexpected parsed.state to
mark the status as unknown/stale; keep returning parsed.title unchanged. Locate
the logic using parsed.state and parsed.title in getIssueState() and replace the
ternary that maps everything else to "closed" with an explicit switch/if that
returns "open"/"closed"/"merged"/null as described.
In `@packages/dispatcher/src/recommender-cron.ts`:
- Around line 120-121: Guard against non-positive runTimeoutMs by ensuring
timeoutMs uses the default when deps.runTimeoutMs is 0 or negative: replace the
current assignment for timeoutMs (which reads deps.runTimeoutMs ??
DEFAULT_RUN_TIMEOUT_MS) with a check that picks deps.runTimeoutMs only if it's >
0, otherwise use DEFAULT_RUN_TIMEOUT_MS (e.g. timeoutMs = deps.runTimeoutMs &&
deps.runTimeoutMs > 0 ? deps.runTimeoutMs : DEFAULT_RUN_TIMEOUT_MS). This uses
the existing symbols timeoutMs, deps.runTimeoutMs, and DEFAULT_RUN_TIMEOUT_MS so
runs won't immediately time out when a non-positive value is provided.
In `@packages/dispatcher/src/repo-config.ts`:
- Around line 115-121: The current read-then-write flow in
assertNoRepoPathCollision and registerManagedRepo can race; add a DB-level
uniqueness guarantee by creating a partial unique index on
repo_config.checkout_path for non-null values (so nulls remain allowed) in the
migration that added checkout_path, then modify registerManagedRepo to handle
unique-constraint violations from the INSERT ... ON CONFLICT(...) DO UPDATE and
translate them into a RepoPathCollisionError: on constraint error, normalize the
attempted checkoutPath, query repo_config for the existing repo with that
normalized checkout_path, and throw new RepoPathCollisionError(existingRepo,
attemptedRepo, normalizedPath); keep assertNoRepoPathCollision for eager checks
but rely on the DB constraint + error handling as the atomic guard.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: 849cb601-6fd2-4171-80d9-2fe9a714900c
📒 Files selected for processing (29)
packages/cli/src/bootstrap/init.tspackages/cli/src/bootstrap/types.tspackages/cli/src/commands/init.tspackages/cli/src/index.tspackages/cli/test/init-collision.test.tspackages/core/src/config.tspackages/core/test/config.test.tspackages/dispatcher/src/blocker-resolution.tspackages/dispatcher/src/epic-store/file-epic-gateway.tspackages/dispatcher/src/epic-store/index.tspackages/dispatcher/src/github.tspackages/dispatcher/src/hook-server.tspackages/dispatcher/src/main.tspackages/dispatcher/src/recommender-cron.tspackages/dispatcher/src/recommender-run.tspackages/dispatcher/src/repo-config.tspackages/dispatcher/src/workflows/recommender.tspackages/dispatcher/test/blocker-resolution.test.tspackages/dispatcher/test/control-routes.test.tspackages/dispatcher/test/gates/checkbox-revert-pass.test.tspackages/dispatcher/test/multi-repo-blockers.test.tspackages/dispatcher/test/recommender-cron-parallel.test.tspackages/dispatcher/test/recommender-workflow.test.tspackages/dispatcher/test/repo-config.test.tspackages/state-issue/src/validate.tspackages/state-issue/test/validate.test.tsplanning/issues/211/decisions.mdplanning/issues/211/plan.mdschemas/state-issue.v1.md
…ates/inputs Address review round 1 on PR #229 (multi-repo coordination). Each finding resolved class-wide within its blast radius, with a test per fix: - getIssueState (github mode): map only known gh states — OPEN→open, CLOSED/MERGED→closed — and treat any other value as unresolvable (null / stale blocker) instead of default-unblocking. A MERGED PR ref is a satisfied blocker (closed); a future/unexpected state no longer silently unblocks. Extracted as the pure, tested mapGhIssueState. - getIssueState (file mode): a non-numeric slug with no Epic file returns null rather than forwarding to gh, where refToIssueNumber would throw on the non-numeric ref. Only numeric refs fall through; matches the documented 'no file → stale' contract. - recommender cron: guard non-positive / non-finite runTimeoutMs and maxConcurrentRepos (new isPositiveNumber helper). A 0/NaN timeout would expire every run immediately (all stamps roll back, nothing completes); a NaN concurrency would zero the worker pool. - repo_config checkout-path collision: add migration 011, a partial UNIQUE index on checkout_path (non-null), as the atomic backstop the eager read- check can't provide across processes. registerManagedRepo translates the constraint violation into RepoPathCollisionError. The migration de-dupes pre-existing duplicate paths first so CREATE UNIQUE INDEX can't abort startup on an older db.
A long assistant.message line in classifyStop's rate-limit test wasn't format-clean; oxfmt reflows it. No behavior change.
The db-scripts backup/restore round-trip asserts the restored db's schema version; migration 011 (checkout_path unique index) bumped it 10→11.
…ates/inputs Address review round 1 on PR #229 (multi-repo coordination). Each finding resolved class-wide within its blast radius, with a test per fix: - getIssueState (github mode): map only known gh states — OPEN→open, CLOSED/MERGED→closed — and treat any other value as unresolvable (null / stale blocker) instead of default-unblocking. A MERGED PR ref is a satisfied blocker (closed); a future/unexpected state no longer silently unblocks. Extracted as the pure, tested mapGhIssueState. - getIssueState (file mode): a non-numeric slug with no Epic file returns null rather than forwarding to gh, where refToIssueNumber would throw on the non-numeric ref. Only numeric refs fall through; matches the documented 'no file → stale' contract. - recommender cron: guard non-positive / non-finite runTimeoutMs and maxConcurrentRepos (new isPositiveNumber helper). A 0/NaN timeout would expire every run immediately (all stamps roll back, nothing completes); a NaN concurrency would zero the worker pool. - repo_config checkout-path collision: add migration 011, a partial UNIQUE index on checkout_path (non-null), as the atomic backstop the eager read- check can't provide across processes. registerManagedRepo translates the constraint violation into RepoPathCollisionError. The migration de-dupes pre-existing duplicate paths first so CREATE UNIQUE INDEX can't abort startup on an older db.
Summary
Closes #211
Closes the three real multi-repo coordination holes the audit found: cross-repo Epic blockers were ignored at runtime, two repos could silently share one checkout, and a hung recommender on one repo stalled all the others. One Epic = one branch = one PR; the three commits map one-to-one to the closed sub-issues (#225, #226, #227).
Status
BlockedItem.blockerruntime resolution (cross-repo unblock)mm initon shared checkout_path)What changed
packages/dispatcher/src/blocker-resolution.ts(new) — pure ref-parse + reclassify: same-repo#nand cross-repoowner/repo#nblockers resolved against live state.packages/dispatcher/src/workflows/recommender.ts— newresolve-blockersstep (after the dispatcher-section reapply, before verify).packages/dispatcher/src/github.ts/epic-store/*— newEpicGateway.getIssueState(github viagh issue view; file via the Epic file) + routing.packages/state-issue/src/validate.ts+schemas/state-issue.v1.md— blocker grammar relaxed for cross-repo + title/stale annotations.packages/dispatcher/src/repo-config.ts—RepoPathCollisionError+assertNoRepoPathCollision(path-normalized);registerManagedReporejects a shared checkout.packages/cli/src/{commands/init.ts,bootstrap/*}+index.ts—mm initruns the guard before scaffolding; exits non-zero on collision.packages/dispatcher/src/hook-server.ts—/control/dispatchmaps a collision to 400.packages/dispatcher/src/recommender-cron.ts+packages/core/src/config.ts— concurrent per-repo cron pass behind a bounded pool + per-repo timeout; new[recommender] max_concurrent_repos/run_timeout_seconds.Acceptance criteria
BlockedItem.blockerand reclassifies (closed → Ready, open → annotated, unresolvable →(stale blocker: <ref>)), backticked stays blocked —packages/dispatcher/test/blocker-resolution.test.tsschemas/state-issue.v1.mdpackages/dispatcher/test/multi-repo-blockers.test.tsregisterManagedReporejects a different repo at an already-registered checkout (same-slug idempotent);mm initguards before writing and exits non-zero;/control/dispatchreturns 400 —packages/dispatcher/test/repo-config.test.ts,packages/dispatcher/test/control-routes.test.tsmm init: a second init at the same path with a different slug exits non-zero and never writes the second repo's.middle/<slug>.toml—packages/cli/test/init-collision.test.tspackages/dispatcher/test/recommender-cron-parallel.test.tspackages/dispatcher/test/recommender-cron.test.ts,packages/dispatcher/test/control-routes.test.tsWhy these changes
The audit found
BlockedItem.blockerhad no runtime consumer, so a cross-repo block was permanent. Resolution is deterministic (a workflow step), not the agent's judgment, because an LLM can't reliably reach repo B and a deterministic unblock is reproducible. The collision guard is one helper shared by the registry write and an earlymm inithook so a rejected init scaffolds nothing. The cron parallelization stamps all due repos before fanning out (keeping the double-dispatch guard) and wraps each run in a hard timeout so one hang is isolated.Verification
.middle/acme-b.tomlnever written).bun test→ 1408 pass;bun run typecheckclean;bun run lintclean; PRmergeable=MERGEABLE.Internal review
Ran a clean-eyes adversarial review pass over the diff before marking ready. It found one real bug (an empty blocker title produced
#42 (), which the verify step'svalidatethen rejected) and two adjacent hardenings (untruncated annotation title; collision guard missing trailing-slash/dot-segment path variants) — all fixed with regression tests in 6ae8348, then re-reviewed clean.Decisions
Distilled into inline review comments on this PR; full rationale in
planning/issues/211/decisions.md.Stumbling points
Engine— embedded engines share a process-singleton queue manager (perpackages/dispatcher/CLAUDE.md). Fixed by registering once on a shared engine and ticking viaengine.start(mirrorsrecommender-workflow.test.ts).validate()rejected the resolution pass's own annotated output until the blocker grammar was relaxed — a reminder that the verify step validates the post-resolution body.Suggested CLAUDE.md updates
None — the bunqueue single-engine gotcha is already documented in
packages/dispatcher/CLAUDE.md.Follow-up issues
None filed. Two items are documented design boundaries, not discovery: cross-repo Epic-file references (file mode) are explicitly out of scope for #225 (a schema-v2 step), and the cron's per-repo failure surface is the log + rolled-back watermark (the cron has no state-issue writer of its own).
Out of scope
Summary by CodeRabbit
New Features
Bug Fixes
Documentation
Tests