feat(orchestrator): replace /invoke-workflow text sentinel with typed invoke_workflow MCP tool #1122

Closed
mhooooo wants to merge 2 commits into coleam00:main from mhooooo:feat/invoke-workflow-mcp-tool-v2

Conversation

mhooooo commented Apr 12, 2026

Summary

The current orchestrator detects workflow dispatch via a text-based /invoke-workflow sentinel — Claude is prompted to emit a magic string, and stream/batch handlers post-parse for the pattern. This is fragile: Claude emits the sentinel inconsistently (in code blocks, with leading text, paraphrased, or not at all). Workflow dispatch is important and deserves to be deterministic.
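
To make the failure mode concrete, here is a minimal sketch of a post-parse sentinel detector. The sentinel grammar (`/invoke-workflow <name>` at the start of a line) and the name `parseSentinel` are assumptions for illustration; the actual parser in orchestrator-agent.ts may differ.

```typescript
// Hypothetical detector: expects "/invoke-workflow <name>" at the start of a line.
// The real grammar is not shown in this PR, so this pattern is an assumption.
const SENTINEL = /^\/invoke-workflow\s+([\w-]+)/m;

function parseSentinel(output: string): string | null {
  const match = SENTINEL.exec(output);
  return match?.[1] ?? null;
}

// Well-formed output on its own line is detected.
console.log(parseSentinel("Dispatching now.\n/invoke-workflow fix-issue")); // "fix-issue"
// Leading text on the same line defeats the line anchor.
console.log(parseSentinel("Sure, running /invoke-workflow fix-issue")); // null
// A paraphrase never matches at all.
console.log(parseSentinel("I'll invoke the fix-issue workflow for you.")); // null
```

Loosening the pattern to catch the second case would also start matching sentinels that Claude merely quotes or explains, which is why the parse step cannot be made fully reliable.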

This PR registers an in-process MCP server exposing invoke_workflow as a real tool. Claude calls a typed function with structured parameters (workflow_name, project_name, task_description) instead of emitting a string convention. Dispatch becomes machine-structured and deterministic.

Note: This PR depends on #1121 (stale-session auto-reset) — they share changes in orchestrator-agent.ts. Review/merge #1121 first.

Changes

packages/core/src/orchestrator/workflow-tool.ts (NEW)

  • buildWorkflowMcpServer(deps) factory using createSdkMcpServer
  • invoke_workflow tool with zod schema: workflow_name, project_name, task_description
  • Fire-and-forget dispatch (workflows take minutes to hours)
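
The shape of the tool can be sketched without the SDK. In the block below a hand-written validator stands in for the zod schema and a plain function stands in for the tool registered through createSdkMcpServer; `parseInvokeWorkflowArgs`, `makeInvokeWorkflowHandler`, the `Dispatch` type, and the non-empty-string rule are all assumptions, not the PR's code.

```typescript
// Dependency-free sketch: the PR uses zod and createSdkMcpServer; neither is imported here.
interface InvokeWorkflowArgs {
  workflow_name: string;
  project_name: string;
  task_description: string;
}

// Hypothetical callback type for the injected dispatcher.
type Dispatch = (args: InvokeWorkflowArgs) => Promise<void>;

// Stand-in for the zod schema: every field must be a non-empty string (assumed rule).
function parseInvokeWorkflowArgs(input: unknown): InvokeWorkflowArgs {
  if (typeof input !== "object" || input === null) {
    throw new Error("invoke_workflow: arguments must be an object");
  }
  const record = input as Record<string, unknown>;
  const read = (key: keyof InvokeWorkflowArgs): string => {
    const value = record[key];
    if (typeof value !== "string" || value.trim() === "") {
      throw new Error(`invoke_workflow: ${key} must be a non-empty string`);
    }
    return value;
  };
  return {
    workflow_name: read("workflow_name"),
    project_name: read("project_name"),
    task_description: read("task_description"),
  };
}

// Fire-and-forget: start the dispatch, do not await it, and confirm immediately.
function makeInvokeWorkflowHandler(dispatch: Dispatch) {
  return (input: unknown): string => {
    const args = parseInvokeWorkflowArgs(input);
    // Failures arrive after the tool has returned, so they are logged, not thrown.
    void dispatch(args).catch((err: unknown) => {
      console.error(`workflow ${args.workflow_name} failed to dispatch`, err);
    });
    return `Dispatched workflow "${args.workflow_name}" for project "${args.project_name}".`;
  };
}
```

Because the handler never awaits the dispatch promise, the conversation turn can end while the workflow is still running; the trade-off is that a late failure can only be reported out of band.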

packages/core/src/orchestrator/codebase-utils.ts (NEW)

  • findCodebaseByName extracted from orchestrator-agent.ts (DRY)
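
The commit message describes this helper as org-qualified and case-insensitive. A minimal sketch of that matching rule follows; the `Codebase` shape and the signature are assumptions, since the extracted function itself is not shown here.

```typescript
// Hypothetical record shape; the real Codebase type in Archon may carry more fields.
interface Codebase {
  id: string;
  name: string; // e.g. "coleam00/Archon" (org-qualified) or "Archon"
}

// Match case-insensitively on the full name or on the segment after the last "/".
function findCodebaseByName(codebases: Codebase[], query: string): Codebase | undefined {
  const wanted = query.trim().toLowerCase();
  if (wanted === "") return undefined;
  return codebases.find((cb) => {
    const full = cb.name.toLowerCase();
    const short = full.slice(full.lastIndexOf("/") + 1);
    return full === wanted || short === wanted;
  });
}

const registered: Codebase[] = [{ id: "1", name: "coleam00/Archon" }];
console.log(findCodebaseByName(registered, "archon")?.id); // "1"
console.log(findCodebaseByName(registered, "COLEAM00/ARCHON")?.id); // "1"
```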

packages/core/src/orchestrator/orchestrator-agent.ts

  • Stream + batch build workflowMcpServer, inject via requestOptions.mcpServers
  • Text-sentinel detection removed from post-loop parsers
  • handleWorkflowInvocationResult deleted (dead code)
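
The injection step is a merge rather than an overwrite, so caller-provided MCP servers survive. A minimal sketch under assumed types (`RequestOptions` and `withWorkflowServer` are hypothetical names; the `archon-tools` key comes from the commit message):

```typescript
// Hypothetical minimal shape of the options passed to aiClient.sendQuery.
interface RequestOptions {
  mcpServers?: Record<string, unknown>;
  [key: string]: unknown;
}

// Return a new options object with the workflow server added under "archon-tools".
function withWorkflowServer(
  callerOptions: RequestOptions | undefined,
  workflowMcpServer: unknown,
): RequestOptions {
  return {
    ...callerOptions,
    mcpServers: {
      ...callerOptions?.mcpServers,
      "archon-tools": workflowMcpServer,
    },
  };
}
```

In this sketch the orchestrator's entry wins if a caller also registers a server named `archon-tools`; whether the PR resolves that collision the same way is not stated.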

packages/core/src/orchestrator/prompt-builder.ts

  • Router description rewritten for tool interface

packages/core/src/utils/error-formatter.ts

  • /reset → `reset` in error messages (Slack intercepts /reset)

Tests

  • workflow-tool.test.ts — 8 tests: factory, dispatch, error paths, case matching, zod validation
  • Existing orchestrator + prompt-builder tests updated
  • Full suite: 2960 tests, exit 0

Scope

  • /register-project still uses text sentinel (different shape, less error-prone) — separable follow-up
  • Fire-and-forget: tool returns confirmation immediately, workflow runs async
  • error-formatter.ts change is incidental — happy to split out

Carried forward from dynamous-community/remote-coding-agent @ v0.2.12 (commit 3df00e1b). Rebased onto latest main.

Test plan

  • bun run type-check — clean
  • bun test packages/core/src/orchestrator/workflow-tool.test.ts
  • bun test packages/core/src/orchestrator/orchestrator.test.ts
  • bun test packages/core/src/orchestrator/prompt-builder.test.ts
  • bun run test — full suite exit 0
  • Manual: dispatch workflow via chat → verify Claude calls tool, workflow dispatches
  • Manual: /invoke-workflow text is treated as literal message (no longer triggers dispatch)

mhooooo added 2 commits April 12, 2026 21:31
…lack

When the Claude Code SDK rejects a resume attempt with "No conversation
found" (the SDK session ID is gone), the orchestrator now transparently
resets the session and retries the query instead of surfacing an error
that the user has to /reset manually.

Also accepts bare 'reset' without the leading slash on Slack, since Slack
intercepts /reset as its own slash command.
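
A minimal sketch of that normalization, assuming the platform is identified by the string "slack" and that the set contains only "reset" (the source confirms only that bare 'reset' is accepted; `normalizeCommand` is a hypothetical name):

```typescript
// Assumed contents: the source only confirms that bare "reset" is accepted on Slack.
const SLACK_BARE_COMMANDS = new Set(["reset"]);

// Rewrite a bare command to its slash form, but only for Slack messages.
function normalizeCommand(platform: string, message: string): string {
  const bare = message.trim().toLowerCase();
  if (platform === "slack" && SLACK_BARE_COMMANDS.has(bare)) {
    return `/${bare}`;
  }
  return message;
}

console.log(normalizeCommand("slack", "reset")); // "/reset"
console.log(normalizeCommand("telegram", "reset")); // "reset"
```

Only a message that consists of the bare command alone is rewritten, so ordinary sentences that happen to contain the word pass through untouched.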

Changes:
- claude.ts: classify stale_session as a non-retryable error class
  (checked before 'crash' — specific wins over generic); export
  STALE_SESSION_PATTERNS as the single source of truth for both the
  classifier and the orchestrator's isStaleSessionError() helper
- session-transitions.ts: new 'stale-session-cleared' transition
  (deactivates — next message creates a fresh session)
- orchestrator-agent.ts: isStaleSessionError() helper; SLACK_BARE_COMMANDS
  normalization scoped to Slack platform only; handleStreamMode and
  handleBatchMode wrap their AI query loops in runStreamQuery() /
  runBatchQuery() functions so a catch block can reset sessionForQuery
  and re-run with the fresh session ID; state reset before retry
  (allMessages/allChunks/assistantMessages/commandDetected) so partial
  content from the failed attempt never bleeds into the fresh response
- claude.test.ts: stale_session classification tests, including priority
  over 'crash' on overlapping error messages, and .cause assertions
- orchestrator.test.ts: parameterized stream/batch retry tests covering
  successful reset+retry, no-third-retry guard, null-session skip, and
  fresh session ID assertion on retry
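
The retry logic described above can be sketched as a single wrapper. This is a simplified illustration, not the PR's code: `runWithStaleSessionRetry` and the `runQuery` callback are hypothetical, one `chunks` array stands in for allMessages/allChunks/assistantMessages, and the pattern list is assumed from the quoted error text.

```typescript
// Assumed pattern list; the source confirms only the "No conversation found" message.
const STALE_SESSION_PATTERNS = ["no conversation found"];

function isStaleSessionError(err: unknown): boolean {
  const message = err instanceof Error ? err.message.toLowerCase() : "";
  return STALE_SESSION_PATTERNS.some((pattern) => message.includes(pattern));
}

// Run the query; on a stale-session error, clear the session and retry exactly once.
async function runWithStaleSessionRetry(
  sessionId: string | null,
  runQuery: (sessionId: string | null, chunks: string[]) => Promise<void>,
): Promise<string[]> {
  let chunks: string[] = [];
  try {
    await runQuery(sessionId, chunks);
  } catch (err) {
    // With no session to resume there is nothing to reset; other errors propagate.
    if (sessionId === null || !isStaleSessionError(err)) throw err;
    chunks = []; // drop partial output from the failed attempt
    await runQuery(null, chunks); // a second failure propagates: no third attempt
  }
  return chunks;
}
```

Resetting the accumulator before the second call is what keeps partial content from the failed attempt out of the fresh response.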

Ported from the dynamous/remote-coding-agent fork (commit 229217cf) with
the following intentional deltas against the new v0.3.5 base:
- Dropped defaultCodebase auto-scoping block (Patch 1 not carried —
  CONFIG-REPLACEABLE per investigation verdict)
- Slack bare-command normalization scoped to Slack platform only
  (fork shipped unscoped initially; this change came in a later
  review-findings sub-commit)
- runStreamQuery/runBatchQuery keep upstream's 4-arg aiClient.sendQuery
  signature including requestOptions (fork was on pre-v0.3.2 3-arg shape)
- Upstream deterministic command list preserved (help, status, reset,
  workflow, register-project, update-project, remove-project, commands,
  init, worktree) — fork only had 5
- No CHANGELOG / bun.lock / package.json / docs changes — those will
  be rebuilt on top of the v0.3.5 base

Upstream-PR candidate for coleam00/Archon.
… tool

The previous /invoke-workflow text-sentinel approach was unreliable —
Claude would emit the sentinel inconsistently (mid-response, inside code
blocks, with extra text, or not at all). The fallback post-loop regex
parser caught some cases but left a persistent failure mode where
workflows either didn't dispatch or dispatched at the wrong time.

Migrate to an in-process MCP server exposing invoke_workflow as a real
typed tool call. Claude now dispatches workflows by calling a function
with structured parameters, which is deterministic and reliable.

Changes:
- packages/core/src/orchestrator/workflow-tool.ts (NEW): buildWorkflowMcpServer
  factory using createSdkMcpServer. Registers invoke_workflow tool with
  zod schema for workflow_name / project_name / task_description. Tool is
  fire-and-forget — it kicks off dispatchOrchestratorWorkflow via the
  injected dispatch callback and returns immediately so the conversation
  turn can end cleanly.
- packages/core/src/orchestrator/codebase-utils.ts (NEW): findCodebaseByName
  helper — org-qualified and case-insensitive project matching, extracted
  from orchestrator-agent.ts to eliminate duplication between workflow-tool.ts
  and the register-project handler.
- packages/core/src/orchestrator/workflow-tool.test.ts (NEW): 8 tests
  covering server shape, error paths, dispatch happy path, error handling,
  case-insensitive project matching, org-qualified matching, and zod
  validation of task_description.
- orchestrator-agent.ts:
  - handleStreamMode and handleBatchMode each build a workflowMcpServer
    at entry via buildWorkflowMcpServer({ ... dispatch }), then pass it
    via requestOptions.mcpServers['archon-tools'] to aiClient.sendQuery.
    Caller-provided requestOptions are merged, not overwritten, so outer
    MCP config still works.
  - /invoke-workflow text-sentinel detection removed from both stream
    and batch post-loop command parsers. /register-project still uses
    the text sentinel since it needs inline-parseable user-visible output.
  - handleWorkflowInvocationResult function deleted (dead code after
    sentinel removal).
  - issueContext parameter renamed to _issueContext in handleStreamMode
    and handleBatchMode to document that it's unused — issue context now
    travels through the task_description field of the tool call instead.
  - Imports: buildWorkflowMcpServer and findCodebaseByName added.
- prompt-builder.ts: router description rewritten to describe the
  invoke_workflow tool interface (tool parameters) instead of the
  text-sentinel command syntax.
- orchestrator-agent.test.ts: workflow-tool module mock added.
- prompt-builder.test.ts: assertions updated to match the new
  tool-based routing instructions.
- error-formatter.ts: "Use /reset" → "Use `reset`" for the session-error
  fallback message (Slack intercepts /reset as its own slash command;
  bare 'reset' is accepted by the orchestrator after commit 1's
  SLACK_BARE_COMMANDS normalization).
- error-formatter.test.ts: test expectation updated.

Ported from dynamous/remote-coding-agent fork commit 3df00e1b with the
following intentional deltas:
- Dropped: Slack thinking indicator (⏳ emoji) — the feature was broken
  in v0.2 (emoji flashed on then off because the wrapper's await fn()
  returned immediately on a fire-and-forget handler) and not worth
  carrying forward without a fix.
- Kept: upstream's 4-arg aiClient.sendQuery signature with requestOptions;
  MCP server merged into caller-provided requestOptions rather than
  replacing them.
- Kept: upstream's longer deterministic command list (help, status,
  reset, workflow, register-project, update-project, remove-project,
  commands, init, worktree) — fork only had 5.
- Skipped: CHANGELOG, CLAUDE.md, docs/adapters/slack.md changes —
  upstream docs have diverged; docs will be rebuilt on top of v0.3.5.
- Skipped: packages/core/package.json MCP SDK dependency bump — will
  resolve naturally via bun install once the runtime is assembled.

Upstream-PR candidate for coleam00/Archon.

coderabbitai Bot commented Apr 12, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

Wirasm (Collaborator) commented Apr 20, 2026

Thanks for the thorough work here, @mhooooo — the bug you're pointing at is real (Claude does emit the /invoke-workflow sentinel inconsistently, and post-parse regex on stream output is inherently fragile). But after a close look I don't think this is the right shape for the fix, and I'd rather be direct than leave you building on top of it.

Why this is going to be closed

1. It's solving a fragility in a narrow non-primary surface.
The text sentinel is only used in handleMessage → stream/batch chat (Slack/Telegram/Web chat), where the AI decides from natural language to dispatch a workflow. The CLI path (archon workflow run <name> "prompt"), the Web UI workflow picker, and scheduled triggers all use structured args and never touch this code. Archon is CLI-first; this PR is ~979 lines to rewire a fuzzy-intent path that isn't where most invocations happen.

2. The primitive cuts against the multi-provider architecture.
createSdkMcpServer / tool() are Claude-SDK-specific. Archon's whole shape — IAgentProvider, the new community-provider registry, Pi and Codex support — is built around providers being interchangeable behind a narrow interface. Baking a Claude MCP construct into the orchestrator core chat path means Codex/Pi chat users don't get the fix, and the orchestrator grows a hard Claude coupling we've been working to minimize.

3. MCP is the wrong layer for an in-process callback.
MCP is designed for cross-process / out-of-tree tool sharing. invoke_workflow is a same-process JS function that calls dispatchOrchestratorWorkflow. The Claude SDK supports plain inline tool definitions without the MCP server wrapper — MCP adds a whole protocol round-trip for what's structurally a local function call.

4. The pattern isn't eliminated, just halved.
/register-project still uses text-parsing (you call this out). So the post-parse regex stays in the codebase, and we're carrying two dispatch mechanisms for two commands with the same shape. Worst of both worlds.

5. Tool-call response patterns ≠ text-response patterns.
With the sentinel the user sees Claude's reasoning first, then dispatch. With a tool call Claude may call silently, mid-response, or repeatedly. Recovering the "reasoning first, then dispatch" UX requires prompt-forcing — which is the same class of instruction-following you're trying to escape.

If you want to revisit the underlying problem

Three lighter paths that wouldn't have these issues:

  • Tighter sentinel prompt-engineering. The current prompt-builder.ts constraints are heavy but not exhaustive. If the real failure mode is measurable, prompt changes are a ~10-line PR.
  • Claude SDK inline tool (not MCP) behind a provider-capability gate. If structured tool-call is really the right answer, do it at the provider layer (providerCapabilities.supportsOrchestratorTools) so Codex/Pi degrade gracefully to the text path instead of silently losing the feature.
  • Accept the sentinel as the right shape for a convenience surface. Chat-AI fuzzy-intent dispatch has always been a convenience layer on top of the structured CLI/web entry points; occasional flakiness there is a smaller cost than provider coupling in the orchestrator core.
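
The second option could look roughly like the sketch below. Only the flag name `supportsOrchestratorTools` comes from this comment; the interface, the `DispatchMode` type, and `selectDispatchMode` are hypothetical and not part of Archon's provider API.

```typescript
// supportsOrchestratorTools is the flag named above; everything else is an assumption.
interface ProviderCapabilities {
  supportsOrchestratorTools: boolean;
}

type DispatchMode = "tool-call" | "text-sentinel";

// Providers that do not declare the capability fall back to the text path.
function selectDispatchMode(capabilities: Partial<ProviderCapabilities>): DispatchMode {
  return capabilities.supportsOrchestratorTools === true ? "tool-call" : "text-sentinel";
}

console.log(selectDispatchMode({ supportsOrchestratorTools: true })); // "tool-call"
console.log(selectDispatchMode({})); // "text-sentinel"
```

Defaulting to the text path when the flag is absent is what lets a provider without tool support degrade gracefully instead of silently losing workflow dispatch.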

Unrelated issues worth flagging

What we'd love to see instead

If you want to keep contributing — and we do — #1121 is the one with real returns on effort. It's a smaller, targeted bug fix whose design we've already agreed on. Getting that ported to the new architecture (see my earlier porting checklist) is the highest-leverage next step.

Closing this one. Thanks again for the care you put into it — the concern is real, just not right-sized for where it lives in the architecture.

Wirasm closed this Apr 20, 2026