fix(recommender): run on the daemon's engine, not a standalone second one#150
Conversation
… one
The recommender never actually ran from the daemon. dispatchRecommender stood up
its OWN HookServer (on the dispatcher port → EADDRINUSE for mm run-recommender
while the daemon was up) AND its OWN embedded engine inside the daemon (a second
in-process engine that never processed the job → 202 'enqueued' but no workflow
row, no tmux, no run). Dispatch was migrated to the daemon's engine long ago; the
recommender was left on the Phase-7 standalone path. The autonomous loop's first
link was broken end to end.
- recommender workflow: per-repo settings via an optional resolveRunSettings(repo)
resolver (schemaPath/config/repoConfig/agentTimeoutMs), so ONE registration on
the daemon's long-lived engine serves every managed repo — mirroring how the
implementation workflow resolves per-repo. Static fields stay as the fallback
for the standalone runner (backward-compatible).
- main.ts: register the recommender on the daemon's engine; runRecommenderForRepo
now engine.start("recommender", …) on that engine (reusing the daemon's
HookServer/sessionGate + dispatcherUrl), exactly like startDispatchImpl.
- mm run-recommender: now a thin client — auto-starts the daemon (like mm
dispatch) and POSTs /trigger/recommender; no more standalone engine/port clash.
- integration test: proves the daemon path RUNS the recommender on the engine and
creates the recommender workflow row (the row the dead-engine path never made).
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (1)
📝 WalkthroughWalkthroughThis PR migrates recommender execution from in-process dispatch to a daemon-based client model and registers the recommender workflow on the daemon engine with per-repo run-settings resolution and agent-timeout backstops. ChangesRecommender daemon migration with per-repo config resolution
Sequence Diagram(s)sequenceDiagram
participant CLI
participant Daemon
participant DispatcherEngine
participant RecommenderWorkflow
CLI->>Daemon: GET /health (probeHealth)
alt daemon down
CLI->>Daemon: startDaemon()
CLI->>Daemon: poll /health until ready (healthTimeoutMs)
end
CLI->>Daemon: POST /trigger/recommender (trigger with repoPath)
Daemon->>DispatcherEngine: engine.start("recommender", { repo })
DispatcherEngine->>RecommenderWorkflow: enqueue run -> runSettings(repo) -> spawn agent
RecommenderWorkflow-->>DispatcherEngine: complete/failed
Daemon-->>CLI: HTTP 202 / other status -> CLI maps to exit code
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested labels
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Warning Review ran into problems🔥 ProblemsGit: Failed to clone repository. Please run the Comment |
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@packages/dispatcher/src/workflows/recommender.ts`:
- Around line 516-522: The step timeout for "spawn-recommender-agent" uses the
static deps.agentTimeoutMs which is unset in daemon mode; change the timeout to
derive the agent timeout from the per-repo resolver first
(runSettings(repo).agentTimeoutMs) falling back to deps.agentTimeoutMs and then
to DEFAULT_AGENT_TIMEOUT_MS, and use that computed value in the timeout
expression along with launchTimeout (so the step timeout mirrors the same
per-repo agentTimeoutMs used earlier around awaitStop).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: efab13dc-bd2c-4397-b68f-3a66b01c98a1
📒 Files selected for processing (6)
packages/cli/src/commands/run-recommender.tspackages/cli/test/run-recommender.test.tspackages/dispatcher/src/main.tspackages/dispatcher/src/recommender-run.tspackages/dispatcher/src/workflows/recommender.tspackages/dispatcher/test/recommender-workflow.test.ts
…backstop covers CodeRabbit on #150: the step's registration-time timeout read the static agentTimeoutMs (unset → 15min default in daemon mode) while awaitStop read the per-repo resolveRunSettings value — a repo configured above the default would trip the generic step timeout before its own specific Stop-await. Add MAX_AGENT_TIMEOUT_MS (30min) ceiling: clamp the per-repo awaitStop to it and size the step backstop to it, so the internal timeout always fires first.
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
packages/dispatcher/test/recommender-workflow.test.ts (1)
647-649: ⚡ Quick winTighten the wiring-guard assertion.
This path throws after
prepare-shallow-worktree, so the DB row should end incompensatedand the worktree should be gone. Allowing"failed"here would still pass if compensation regressed.💡 Suggested fix
const id = await runToEnd(broken); - // The guard fails the run (and compensation rolls the worktree back) rather - // than silently producing a half-run — exactly the failure mode we're fixing. - expect(["failed", "compensated"]).toContain(getWorkflow(db, id)!.state); + expect(getWorkflow(db, id)!.state).toBe("compensated"); + expect(await listWorktrees({ repoPath, worktreeRoot })).toEqual([]);Based on learnings
launching,waiting-human,rate-limited, andcompensatedare the DB-only workflow states.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/dispatcher/test/recommender-workflow.test.ts` around lines 647 - 649, Tighten the assertion so the test requires the workflow DB state to be "compensated" (not allowing "failed") after the error thrown post-prepare-shallow-worktree, and add an assertion that the worktree was removed; update the expect that currently calls getWorkflow(db, id)!.state to assert === "compensated" and add a check (using the existing worktree lookup helper used elsewhere in this test file) that the worktree no longer exists for id.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@packages/dispatcher/src/workflows/recommender.ts`:
- Around line 409-414: The workflow repeatedly calls
runSettings(ctx.input.repo)/resolveRunSettings(repo) across build, spawn, verify
and auto-dispatch which lets live config changes mix settings mid-run; instead
call and await runSettings(ctx.input.repo) once at the start of the execution
(store it in a local const like resolvedRunSettings or runCfg) and replace all
subsequent calls (including the agentTimeout calculation and checks for
schemaPath/repoConfig/autoDispatch) to read from that single frozen object so
every step uses the same resolved settings for the entire run.
---
Nitpick comments:
In `@packages/dispatcher/test/recommender-workflow.test.ts`:
- Around line 647-649: Tighten the assertion so the test requires the workflow
DB state to be "compensated" (not allowing "failed") after the error thrown
post-prepare-shallow-worktree, and add an assertion that the worktree was
removed; update the expect that currently calls getWorkflow(db, id)!.state to
assert === "compensated" and add a check (using the existing worktree lookup
helper used elsewhere in this test file) that the worktree no longer exists for
id.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: 18040610-fa04-4062-9e61-18962b4a7925
📒 Files selected for processing (2)
packages/dispatcher/src/workflows/recommender.tspackages/dispatcher/test/recommender-workflow.test.ts
…steps CodeRabbit on #150: resolveRunSettings was re-called in build/spawn/verify/ trigger, so a live config edit mid-run could mix schemaPath/config/repoConfig/ autoDispatch/agentTimeoutMs within one execution. Resolve once in build-prompt, return it on BuildPromptResult, and read it from ctx in the later steps.
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
…backstop covers CodeRabbit on #150: the step's registration-time timeout read the static agentTimeoutMs (unset → 15min default in daemon mode) while awaitStop read the per-repo resolveRunSettings value — a repo configured above the default would trip the generic step timeout before its own specific Stop-await. Add MAX_AGENT_TIMEOUT_MS (30min) ceiling: clamp the per-repo awaitStop to it and size the step backstop to it, so the internal timeout always fires first.
…l on a failed launch Two follow-ups from #150's debugging: - mm start now runs one recommender due-check pass immediately (any overdue managed repo fires now → auto-dispatch on completion) and nudges auto-dispatch for every managed repo, instead of idling until the first cron tick (up to a full interval). A fresh restart works in seconds, not 15 minutes. - The cron stamps last_recommender_run before firing (overlap guard) but now ROLLS IT BACK on a failed launch (setLastRecommenderRun to the prior value), so a failure retries on the next tick rather than going quiet for a full interval.
The bug
The recommender never actually ran from the daemon — the autonomous loop's first link was broken end to end (found by dogfooding:
mm startchose nothing for 30+ min;mm run-recommenderfailed with "port 4120 in use"; the daemon trigger returned202 enqueuedbut created no workflow row, no tmux, no run).Root cause:
dispatchRecommenderwas the Phase-7 standalone path and never migrated to the daemon when dispatch was. It stands up its own HookServer (binds the dispatcher port →EADDRINUSEwhen the daemon is up) and its own embedded engine (a second in-process engine inside the daemon that never processes the job). Dispatch, by contrast, runs on the daemon's one long-lived engine. The asymmetry was the bug.The fix — make the recommender use the daemon, exactly like dispatch
resolveRunSettings(repo)resolver (schemaPath/config/repoConfig/agentTimeoutMs), so one registration on the daemon's long-lived engine serves every managed repo — mirroring how the implementation workflow resolves per-repo. The static fields remain as the fallback for the standalone runner (backward-compatible).main.ts: registers the recommender on the daemon's engine;runRecommenderForReponowengine.start("recommender", …)on it, reusing the daemon's HookServer/sessionGate + dispatcherUrl — exactly likestartDispatchImpl.mm run-recommender: now a thin client — auto-starts the daemon (likemm dispatch) and POSTs/trigger/recommender. No standalone engine, no port clash.Integration-verified (per #143/#144)
recommender-workflow.test.ts— proves the daemon path RUNS: registers the workflow withresolveRunSettingson a real engine, drives it to completion, and asserts arecommenderworkflow row is created (the row the dead-engine path never made) + the per-repo resolver was used + auto-dispatch fired from the resolved config. Plus a guard test for the no-settings wiring error.run-recommender.test.ts— rewritten for the thin-client contract: auto-starts the daemon when down, POSTs the trigger, relays a non-202 rejection, handles never-ready + unreachable.Verify
Follow-up still open: a healthy
mm startshould kick one recommender+auto-dispatch pass immediately rather than waiting for the first cron interval, and a failed recommender launch shouldn't burn the full interval (both noted while debugging).Summary by CodeRabbit