fix(recommender): run on the daemon's engine, not a standalone second one#150

Merged
thejustinwalsh merged 3 commits into main from fix/recommender-on-daemon-engine
May 25, 2026

Conversation

thejustinwalsh (Owner) commented May 25, 2026

The bug

The recommender never actually ran from the daemon — the autonomous loop's first link was broken end to end (found by dogfooding: mm start chose nothing for 30+ min; mm run-recommender failed with "port 4120 in use"; the daemon trigger returned 202 enqueued but created no workflow row, no tmux, no run).

Root cause: dispatchRecommender was the Phase-7 standalone path and was never migrated to the daemon when dispatch was. It stands up its own HookServer (which binds the dispatcher port → EADDRINUSE when the daemon is up) and its own embedded engine (a second in-process engine inside the daemon that never processes the job). Dispatch, by contrast, runs on the daemon's one long-lived engine. The asymmetry was the bug.

The fix — make the recommender use the daemon, exactly like dispatch

  • Recommender workflow: per-repo settings via an optional resolveRunSettings(repo) resolver (schemaPath/config/repoConfig/agentTimeoutMs), so one registration on the daemon's long-lived engine serves every managed repo — mirroring how the implementation workflow resolves per-repo. The static fields remain as the fallback for the standalone runner (backward-compatible).
  • main.ts: registers the recommender on the daemon's engine; runRecommenderForRepo now calls engine.start("recommender", …) on it, reusing the daemon's HookServer/sessionGate + dispatcherUrl, exactly like startDispatchImpl.
  • mm run-recommender: now a thin client — auto-starts the daemon (like mm dispatch) and POSTs /trigger/recommender. No standalone engine, no port clash.
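
The resolver-with-static-fallback contract described in the first bullet could look roughly like the sketch below. Only the names resolveRunSettings, RecommenderDeps, RecommenderRunSettings, schemaPath, config, repoConfig and agentTimeoutMs come from this PR; the field types, the runSettings helper signature and the error message are assumptions made for illustration, not the actual code in recommender.ts.

```typescript
// Assumed shapes: only the field names come from the PR description.
interface RecommenderRunSettings {
  schemaPath: string;
  config: Record<string, unknown>;
  repoConfig: Record<string, unknown>;
  agentTimeoutMs?: number;
}

interface RecommenderDeps {
  // Static fields: the fallback used by the standalone runner.
  schemaPath?: string;
  config?: Record<string, unknown>;
  repoConfig?: Record<string, unknown>;
  agentTimeoutMs?: number;
  // Daemon mode: one registration, settings resolved per repo at run time.
  resolveRunSettings?: (repo: string) => RecommenderRunSettings;
}

// Hypothetical helper: prefer the per-repo resolver, else the static fields.
function runSettings(deps: RecommenderDeps, repo: string): RecommenderRunSettings {
  if (deps.resolveRunSettings) return deps.resolveRunSettings(repo);
  const { schemaPath, config, repoConfig, agentTimeoutMs } = deps;
  if (schemaPath === undefined || config === undefined || repoConfig === undefined) {
    // Neither a resolver nor static settings: fail loudly as a wiring error
    // instead of silently producing a half-run.
    throw new Error(`recommender: no run settings wired for ${repo}`);
  }
  return { schemaPath, config, repoConfig, agentTimeoutMs };
}
```

With this shape, the daemon passes only resolveRunSettings, the standalone runner passes only the static fields, and passing neither trips the guard that the no-settings wiring test exercises.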

Integration-verified (per #143/#144)

  • recommender-workflow.test.ts proves the daemon path RUNS: it registers the workflow with resolveRunSettings on a real engine, drives it to completion, and asserts that a recommender workflow row is created (the row the dead-engine path never made), that the per-repo resolver was used, and that auto-dispatch fired from the resolved config. Plus a guard test for the no-settings wiring error.
  • run-recommender.test.ts — rewritten for the thin-client contract: auto-starts the daemon when down, POSTs the trigger, relays a non-202 rejection, handles never-ready + unreachable.
  • Full suite 673 pass, lint/typecheck/format clean.

Verify

```shell
bun run typecheck && bun test
# live: mm start  (the cron now fires the recommender on the daemon's engine);
#       mm run-recommender <repo>  (thin client, auto-starts the daemon)
```

Follow-up still open: a healthy mm start should kick one recommender+auto-dispatch pass immediately rather than waiting for the first cron interval, and a failed recommender launch shouldn't burn the full interval (both noted while debugging).

Summary by CodeRabbit

  • New Features
    • run-recommender now acts as a thin client to a long‑lived daemon: auto-starts, health-checks, and relays trigger requests; per-repo settings can be resolved at runtime with configurable timeouts.
  • Bug Fixes
    • Improved handling of daemon connection/readiness and clearer exit codes on dispatcher rejections/failures.
  • Tests
    • Expanded coverage for daemon client behavior, per‑repo resolution, timeouts, and failure cases.
  • API Changes
    • Minor public API surface updates: new/exported types and an exported helper surfaced.

Review Change Stack

… one

The recommender never actually ran from the daemon. dispatchRecommender stood up
its OWN HookServer (on the dispatcher port → EADDRINUSE for mm run-recommender
while the daemon was up) AND its OWN embedded engine inside the daemon (a second
in-process engine that never processed the job → 202 'enqueued' but no workflow
row, no tmux, no run). Dispatch was migrated to the daemon's engine long ago; the
recommender was left on the Phase-7 standalone path. The autonomous loop's first
link was broken end to end.

- recommender workflow: per-repo settings via an optional resolveRunSettings(repo)
  resolver (schemaPath/config/repoConfig/agentTimeoutMs), so ONE registration on
  the daemon's long-lived engine serves every managed repo — mirroring how the
  implementation workflow resolves per-repo. Static fields stay as the fallback
  for the standalone runner (backward-compatible).
- main.ts: register the recommender on the daemon's engine; runRecommenderForRepo
  now engine.start("recommender", …) on that engine (reusing the daemon's
  HookServer/sessionGate + dispatcherUrl), exactly like startDispatchImpl.
- mm run-recommender: now a thin client — auto-starts the daemon (like mm
  dispatch) and POSTs /trigger/recommender; no more standalone engine/port clash.
- integration test: proves the daemon path RUNS the recommender on the engine and
  creates the recommender workflow row (the row the dead-engine path never made).

coderabbitai Bot commented May 25, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: d836465f-f79e-4844-a316-4c28547f04c4

📥 Commits

Reviewing files that changed from the base of the PR and between 8434ad5 and 720172b.

📒 Files selected for processing (1)
  • packages/dispatcher/src/workflows/recommender.ts

📝 Walkthrough

This PR migrates recommender execution from in-process dispatch to a daemon-based client model and registers the recommender workflow on the daemon engine with per-repo run-settings resolution and agent-timeout backstops.

Changes

Recommender daemon migration with per-repo config resolution

| Layer | File(s) | Summary |
| --- | --- | --- |
| CLI daemon client interface | packages/cli/src/commands/run-recommender.ts, packages/cli/test/run-recommender.test.ts | RunRecommenderOptions gains startDaemon, probeHealth, healthTimeoutMs, and trigger injection points with default implementations for health probing and HTTP trigger. runRecommender now derives the daemon URL from dispatcherPort, auto-starts on health probe failure, polls for readiness, and returns exit code 0 only for HTTP 202; tests verify daemon lifecycle and trigger behavior under up/down/error conditions. |
| Workflow per-repo configuration contract | packages/dispatcher/src/workflows/recommender.ts, packages/dispatcher/test/recommender-workflow.test.ts | Adds exported RecommenderRunSettings and makes RecommenderDeps static fields optional while adding resolveRunSettings?: (repo: string) => RecommenderRunSettings. createRecommenderWorkflow resolves per-run settings, uses them in prompt construction, clamps agent stop-await with MAX_AGENT_TIMEOUT_MS, uses per-repo repoConfig for validation, and reads autoDispatch from per-repo config. Tests updated for daemon-path and timeout backstop. |
| Daemon engine registration and enqueue | packages/dispatcher/src/main.ts | Imports expanded for recommender wiring. runRecommenderForRepo now enqueues work via engine.start("recommender", ...) with try/catch and 500-on-enqueue-failure. Registers createRecommenderWorkflow(...) on the daemon engine and wires DB, adapter resolver, session gate, tmux/worktree helpers, state issue gateway, ghSurfaceProblem, and auto-dispatch. Documentation updated to note long-lived bunqueue engine use. |
| Export ghSurfaceProblem helper | packages/dispatcher/src/recommender-run.ts | ghSurfaceProblem is exported to allow injection into the daemon workflow. |
| Tests and timeout backstop updates | packages/dispatcher/test/recommender-workflow.test.ts, packages/cli/test/run-recommender.test.ts | Workflow tests updated to expect spawn-recommender-agent timeout sized against the per-repo ceiling (MAX_AGENT_TIMEOUT_MS) and new daemon-path tests added covering resolveRunSettings success and wiring failure cases; CLI tests cover daemon lifecycle and trigger behaviors. |
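
As a rough illustration of the "Daemon engine registration and enqueue" row, the sketch below shows an enqueue wrapped in try/catch that maps success to 202 and an enqueue failure to 500. The Engine interface, the shape of the value returned by engine.start, and the response body are assumptions for this example; the PR only states that runRecommenderForRepo calls engine.start("recommender", …) and returns 500 when the enqueue fails.

```typescript
// Hypothetical minimal engine interface; the real engine API is not shown in this PR.
interface Engine {
  start(workflow: string, input: { repo: string }): Promise<{ id: string }>;
}

// Assumed response shape for the trigger endpoint.
interface TriggerResult {
  status: 202 | 500;
  body: { ok: boolean; id?: string; error?: string };
}

async function runRecommenderForRepo(engine: Engine, repo: string): Promise<TriggerResult> {
  try {
    // Enqueue on the daemon's one long-lived engine; no second engine is created.
    const { id } = await engine.start("recommender", { repo });
    return { status: 202, body: { ok: true, id } };
  } catch (err) {
    // An enqueue failure is reported to the caller instead of a misleading 202.
    const error = err instanceof Error ? err.message : String(err);
    return { status: 500, body: { ok: false, error } };
  }
}
```

The point of the mapping is that a 202 now implies the engine accepted the job, which is exactly what the old path could not guarantee.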

Sequence Diagram(s)

```mermaid
sequenceDiagram
  participant CLI
  participant Daemon
  participant DispatcherEngine
  participant RecommenderWorkflow
  CLI->>Daemon: GET /health (probeHealth)
  alt daemon down
    CLI->>Daemon: startDaemon()
    CLI->>Daemon: poll /health until ready (healthTimeoutMs)
  end
  CLI->>Daemon: POST /trigger/recommender (trigger with repoPath)
  Daemon->>DispatcherEngine: engine.start("recommender", { repo })
  DispatcherEngine->>RecommenderWorkflow: enqueue run -> runSettings(repo) -> spawn agent
  RecommenderWorkflow-->>DispatcherEngine: complete/failed
  Daemon-->>CLI: HTTP 202 / other status -> CLI maps to exit code
```
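
The CLI half of this sequence can be sketched with the injection points named in the walkthrough (startDaemon, probeHealth, healthTimeoutMs, trigger). Their signatures, the injected sleep helper, the poll interval, the loopback URL and the non-zero exit code value are all assumptions for illustration; the walkthrough only states that exit code 0 is returned for HTTP 202 and nothing else.

```typescript
// Injection points named in the walkthrough; their exact signatures are assumed.
interface RunRecommenderOptions {
  repoPath: string;
  dispatcherPort: number;
  healthTimeoutMs: number;
  probeHealth: (daemonUrl: string) => Promise<boolean>;
  startDaemon: () => Promise<void>;
  trigger: (daemonUrl: string, repoPath: string) => Promise<number>; // resolves to the HTTP status
  sleep: (ms: number) => Promise<void>; // hypothetical, keeps the sketch free of global timers
}

const POLL_INTERVAL_MS = 250; // assumed poll interval

// Exit code 0 only for HTTP 202; using 1 for everything else is an assumption.
function exitCodeForStatus(status: number): number {
  return status === 202 ? 0 : 1;
}

async function runRecommender(opts: RunRecommenderOptions): Promise<number> {
  const daemonUrl = `http://127.0.0.1:${opts.dispatcherPort}`; // assumed loopback host
  if (!(await opts.probeHealth(daemonUrl))) {
    // Daemon is down: start it, then poll /health until ready or out of budget.
    await opts.startDaemon();
    const attempts = Math.ceil(opts.healthTimeoutMs / POLL_INTERVAL_MS);
    let ready = false;
    for (let i = 0; i < attempts && !ready; i++) {
      ready = await opts.probeHealth(daemonUrl);
      if (!ready) await opts.sleep(POLL_INTERVAL_MS);
    }
    if (!ready) return 1; // daemon never became ready
  }
  try {
    return exitCodeForStatus(await opts.trigger(daemonUrl, opts.repoPath));
  } catch {
    return 1; // daemon unreachable
  }
}
```

Because the client never constructs an engine or a HookServer, it cannot collide with the daemon's port; the worst case is a non-zero exit when the daemon cannot be reached.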

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • thejustinwalsh/middle#142: Related changes around daemon-triggered recommender dispatch and cron/managed-repo triggering that integrate with the same engine-driven recommender path.
  • thejustinwalsh/middle#123: Both PRs modify how agentTimeoutMs flows into the recommender workflow and adjust timeout/backstop behavior.
  • thejustinwalsh/middle#105: Introduced the /trigger/recommender endpoint this PR’s CLI targets; this PR extends that endpoint’s workflow with daemon-mode per-repo config resolution.

Suggested labels

ready-for-review

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 66.67% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the main change: converting recommender execution from a standalone second engine to using the daemon's long-lived engine, which is the core bug fix across all modified files. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

Warning

Review ran into problems

🔥 Problems

Git: Failed to clone repository. Please run the @coderabbitai full review command to re-trigger a full review. If the issue persists, set path_filters to include or exclude specific files.


Comment @coderabbitai help to get the list of available commands and usage tips.

@thejustinwalsh

Owner Author

@coderabbitai review

coderabbitai Bot commented May 25, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/dispatcher/src/workflows/recommender.ts`:
- Around line 516-522: The step timeout for "spawn-recommender-agent" uses the
static deps.agentTimeoutMs which is unset in daemon mode; change the timeout to
derive the agent timeout from the per-repo resolver first
(runSettings(repo).agentTimeoutMs) falling back to deps.agentTimeoutMs and then
to DEFAULT_AGENT_TIMEOUT_MS, and use that computed value in the timeout
expression along with launchTimeout (so the step timeout mirrors the same
per-repo agentTimeoutMs used earlier around awaitStop).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: efab13dc-bd2c-4397-b68f-3a66b01c98a1

📥 Commits

Reviewing files that changed from the base of the PR and between c7f2ed4 and 12c4300.

📒 Files selected for processing (6)
  • packages/cli/src/commands/run-recommender.ts
  • packages/cli/test/run-recommender.test.ts
  • packages/dispatcher/src/main.ts
  • packages/dispatcher/src/recommender-run.ts
  • packages/dispatcher/src/workflows/recommender.ts
  • packages/dispatcher/test/recommender-workflow.test.ts

Comment thread packages/dispatcher/src/workflows/recommender.ts
…backstop covers

CodeRabbit on #150: the step's registration-time timeout read the static
agentTimeoutMs (unset → 15min default in daemon mode) while awaitStop read the
per-repo resolveRunSettings value — a repo configured above the default would
trip the generic step timeout before its own specific Stop-await. Add
MAX_AGENT_TIMEOUT_MS (30min) ceiling: clamp the per-repo awaitStop to it and size
the step backstop to it, so the internal timeout always fires first.
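
The ordering this commit establishes can be sketched as two small functions. The 15-minute default and 30-minute ceiling are taken from the commit message; the function names, the LAUNCH_TIMEOUT_MS value and the exact backstop formula are assumptions made for illustration and may differ from recommender.ts.

```typescript
const DEFAULT_AGENT_TIMEOUT_MS = 15 * 60_000; // 15 min default (from the commit message)
const MAX_AGENT_TIMEOUT_MS = 30 * 60_000; // 30 min ceiling (from the commit message)
const LAUNCH_TIMEOUT_MS = 60_000; // assumed launch allowance

// Per-repo Stop-await: the repo's own value (or the default), clamped to the ceiling.
function awaitStopTimeoutMs(perRepoMs?: number): number {
  return Math.min(perRepoMs ?? DEFAULT_AGENT_TIMEOUT_MS, MAX_AGENT_TIMEOUT_MS);
}

// Registration-time step backstop: sized against the ceiling, not the static
// default, so it can never undercut a repo's own Stop-await.
function stepBackstopMs(): number {
  return LAUNCH_TIMEOUT_MS + MAX_AGENT_TIMEOUT_MS;
}
```

Under these assumptions, a repo configured for 45 minutes waits 30 minutes, one with no setting waits 15, and in both cases the specific Stop-await timeout fires before the generic step timeout.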
@thejustinwalsh

Owner Author

@coderabbitai review

coderabbitai Bot commented May 25, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
packages/dispatcher/test/recommender-workflow.test.ts (1)

647-649: ⚡ Quick win

Tighten the wiring-guard assertion.

This path throws after prepare-shallow-worktree, so the DB row should end in compensated and the worktree should be gone. Allowing "failed" here would still pass if compensation regressed.

💡 Suggested fix
```diff
     const id = await runToEnd(broken);
-    // The guard fails the run (and compensation rolls the worktree back) rather
-    // than silently producing a half-run — exactly the failure mode we're fixing.
-    expect(["failed", "compensated"]).toContain(getWorkflow(db, id)!.state);
+    expect(getWorkflow(db, id)!.state).toBe("compensated");
+    expect(await listWorktrees({ repoPath, worktreeRoot })).toEqual([]);
```

Based on learnings: launching, waiting-human, rate-limited, and compensated are the DB-only workflow states.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/dispatcher/test/recommender-workflow.test.ts` around lines 647 -
649, Tighten the assertion so the test requires the workflow DB state to be
"compensated" (not allowing "failed") after the error thrown
post-prepare-shallow-worktree, and add an assertion that the worktree was
removed; update the expect that currently calls getWorkflow(db, id)!.state to
assert === "compensated" and add a check (using the existing worktree lookup
helper used elsewhere in this test file) that the worktree no longer exists for
id.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/dispatcher/src/workflows/recommender.ts`:
- Around line 409-414: The workflow repeatedly calls
runSettings(ctx.input.repo)/resolveRunSettings(repo) across build, spawn, verify
and auto-dispatch which lets live config changes mix settings mid-run; instead
call and await runSettings(ctx.input.repo) once at the start of the execution
(store it in a local const like resolvedRunSettings or runCfg) and replace all
subsequent calls (including the agentTimeout calculation and checks for
schemaPath/repoConfig/autoDispatch) to read from that single frozen object so
every step uses the same resolved settings for the entire run.

---

Nitpick comments:
In `@packages/dispatcher/test/recommender-workflow.test.ts`:
- Around line 647-649: Tighten the assertion so the test requires the workflow
DB state to be "compensated" (not allowing "failed") after the error thrown
post-prepare-shallow-worktree, and add an assertion that the worktree was
removed; update the expect that currently calls getWorkflow(db, id)!.state to
assert === "compensated" and add a check (using the existing worktree lookup
helper used elsewhere in this test file) that the worktree no longer exists for
id.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 18040610-fa04-4062-9e61-18962b4a7925

📥 Commits

Reviewing files that changed from the base of the PR and between 12c4300 and 8434ad5.

📒 Files selected for processing (2)
  • packages/dispatcher/src/workflows/recommender.ts
  • packages/dispatcher/test/recommender-workflow.test.ts

Comment thread packages/dispatcher/src/workflows/recommender.ts
…steps

CodeRabbit on #150: resolveRunSettings was re-called in build/spawn/verify/
trigger, so a live config edit mid-run could mix schemaPath/config/repoConfig/
autoDispatch/agentTimeoutMs within one execution. Resolve once in build-prompt,
return it on BuildPromptResult, and read it from ctx in the later steps.
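
The resolve-once pattern this commit describes might look like the sketch below: the first step takes a frozen snapshot of the settings and later steps read only that snapshot. BuildPromptResult is named in the commit message; the RunSettings fields shown, the function signatures, the prompt text, and passing the result directly instead of through ctx are simplifications assumed for illustration.

```typescript
// Assumed subset of the per-repo settings.
interface RunSettings {
  schemaPath: string;
  autoDispatch: boolean;
  agentTimeoutMs: number;
}

interface BuildPromptResult {
  prompt: string;
  settings: Readonly<RunSettings>; // snapshot taken once per run
}

// build-prompt is the only step that consults the resolver.
function buildPrompt(repo: string, resolve: (repo: string) => RunSettings): BuildPromptResult {
  const settings = Object.freeze({ ...resolve(repo) });
  return { prompt: `Recommend next work for ${repo} using ${settings.schemaPath}`, settings };
}

// Later steps read the snapshot carried on the build-prompt result, so a live
// config edit mid-run cannot mix old and new values within one execution.
function shouldAutoDispatch(built: BuildPromptResult): boolean {
  return built.settings.autoDispatch;
}
```

A config edit made after build-prompt takes effect on the next run, which keeps schemaPath, autoDispatch and agentTimeoutMs mutually consistent for the run in flight.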
@thejustinwalsh

Owner Author

@coderabbitai review

coderabbitai Bot commented May 25, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@thejustinwalsh thejustinwalsh merged commit 975a064 into main May 25, 2026
1 check passed
thejustinwalsh added a commit that referenced this pull request May 25, 2026
…backstop covers

CodeRabbit on #150: the step's registration-time timeout read the static
agentTimeoutMs (unset → 15min default in daemon mode) while awaitStop read the
per-repo resolveRunSettings value — a repo configured above the default would
trip the generic step timeout before its own specific Stop-await. Add
MAX_AGENT_TIMEOUT_MS (30min) ceiling: clamp the per-repo awaitStop to it and size
the step backstop to it, so the internal timeout always fires first.
@thejustinwalsh thejustinwalsh deleted the fix/recommender-on-daemon-engine branch May 25, 2026 18:08
thejustinwalsh added a commit that referenced this pull request May 25, 2026
…l on a failed launch

Two follow-ups from #150's debugging:

- mm start now runs one recommender due-check pass immediately (any overdue
  managed repo fires now → auto-dispatch on completion) and nudges auto-dispatch
  for every managed repo, instead of idling until the first cron tick (up to a
  full interval). A fresh restart works in seconds, not 15 minutes.
- The cron stamps last_recommender_run before firing (overlap guard) but now
  ROLLS IT BACK on a failed launch (setLastRecommenderRun to the prior value), so
  a failure retries on the next tick rather than going quiet for a full interval.
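
The stamp-then-roll-back behaviour from the second bullet can be sketched as follows. setLastRecommenderRun is named in the commit message; the getter, the store interface, the fireIfDue function and its return values are hypothetical stand-ins for the real cron code and the last_recommender_run column.

```typescript
// Hypothetical stand-in for the last_recommender_run column.
interface RunStampStore {
  getLastRecommenderRun(repo: string): number | null;
  setLastRecommenderRun(repo: string, at: number | null): void;
}

async function fireIfDue(
  store: RunStampStore,
  repo: string,
  now: number,
  intervalMs: number,
  launch: (repo: string) => Promise<void>,
): Promise<"skipped" | "launched" | "rolled-back"> {
  const prior = store.getLastRecommenderRun(repo);
  if (prior !== null && now - prior < intervalMs) return "skipped";
  // Stamp before firing so an overlapping tick sees the run as already started.
  store.setLastRecommenderRun(repo, now);
  try {
    await launch(repo);
    return "launched";
  } catch {
    // Restore the prior value so the next tick retries instead of waiting a full interval.
    store.setLastRecommenderRun(repo, prior);
    return "rolled-back";
  }
}
```

Stamping first keeps the overlap guard intact, while the rollback ensures a failed launch leaves the repo exactly as overdue as it was before the tick.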