fix(orchestrator): surface AI error results instead of silent drop (fixes #1076) by kagura-agent · Pull Request #1084 · coleam00/Archon

kagura-agent · 2026-04-11T12:22:54Z

Problem

When the Claude OAuth refresh token is expired (or any AI error occurs with no session_id), the chat UI shows no response at all — no error message, no indication of failure.

Root Cause

Both handleStreamMode and handleBatchMode in orchestrator-agent.ts had a condition that required msg.sessionId to be truthy to process result messages:

} else if (msg.type === 'result' && msg.sessionId) {

When OAuth fails, the result message has is_error: true but session_id: undefined, so:

The error result is silently dropped
No assistant messages are produced → allMessages.length === 0
Early return with no user-facing feedback

Fix

Split the result handler to process sessionId and isError independently
Surface error messages when AI errors produce no assistant content:
- Auth errors: AI error (authentication_error). Check your Claude credentials or use /reset.
- Other errors: AI error (<subtype>). Check server logs for details.
Enhanced logging: include errorResult in debug log for observability

Applied to both stream and batch mode paths.

Validation

5 new tests covering:
- Stream mode: auth error, non-auth error, missing subtype
- Batch mode: auth error, non-auth error
All pre-existing tests pass
bun run type-check clean across all packages
eslint + prettier via lint-staged clean

Closes #1076

Summary by CodeRabbit

New Features
- Added provider-based AI support (Claude and Codex) and new end-to-end smoke workflows to validate node types and scripts.
- New env-var management endpoints and a /health endpoint.
Bug Fixes
- Improved user-facing AI error messages with authentication-specific hints and clearer guidance for other failures.
Tests
- Added extensive tests for AI error handling in both streaming and batch modes and provider behaviors.
Documentation
- Updated docs and guides to reflect “agent provider” terminology and new provider configuration.

coderabbitai · 2026-04-11T12:23:09Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

📝 Walkthrough

Walkthrough

Refactors AI client -> provider surface (new @archon/providers), removes env-leak gate/allow_env_keys plumbing, updates orchestrator to use providers and send user-facing AI error messages when providers yield error results, adds provider implementations/tests, and introduces several E2E workflow and docs updates.

Changes

Cohort / File(s)	Summary
Provider package & types `packages/providers/*`, `packages/providers/src/types.ts`, `packages/providers/package.json`	New providers package exposing `IAgentProvider`, `MessageChunk`, factory (`getAgentProvider`), Claude/Codex provider implementations, config parsers, errors, and tests. Re-exports/types added for consumers.
Orchestrator & AI error handling `packages/core/src/orchestrator/orchestrator-agent.ts`, `packages/core/src/orchestrator/orchestrator-agent.test.ts`	Replaced getAssistantClient→getAgentProvider usage; changed request shape to SendQueryOptions/assistantConfig; process `msg.type === 'result'` regardless of sessionId, capture error metadata (`isError`/`errorSubtype`), and send explicit user-facing error messages when no assistant output is produced; added tests for stream/batch error variants.
Clients → removed legacy clients `packages/core/src/clients/*`, `packages/core/src/clients/index.ts`, `packages/core/src/clients/factory.ts`	Deleted Claude/Codex client modules and client factory; consumers migrated to new providers package and provider factory.
DB / codebase env consent removal `packages/core/src/db/*`, `packages/core/src/db/codebases.ts`, `packages/core/src/db/adapters/sqlite.ts`	Removed `allow_env_keys` column, migration step, update helper, and codebase-level allow-env-key plumbing.
Env-leak scanner & related handlers `packages/core/src/utils/env-leak-scanner.`, `packages/core/src/handlers/clone.ts`, `packages/core/src/handlers/`, `packages/server/src/`, `packages/web/src/`	Deleted env-leak scanner utilities, error types, API/UI/CLI consent surfaces (`--allow-env-keys`, PATCH codebase endpoint, UI toggle). Updated server startup to remove pre-scan; API/route/schema/client typings adjusted.
Workflows & executor refactor `packages/workflows/src/*`, `packages/workflows/src/dag-executor.ts`, `packages/workflows/src/deps.ts`	Executor rewritten to use provider types (`SendQueryOptions`, `NodeConfig`, `ProviderCapabilities`), replaced assistant-capability checks with provider capability-driven warnings, added bun `--no-env-file` for script nodes, and updated numerous tests.
Providers: Claude & Codex implementations `packages/providers/src/claude/`, `packages/providers/src/codex/`	Added full provider implementations: streaming normalization, first-event timeout, error classification, retry policies, capabilities, structured output handling, and provider-specific tests.
Tests & mocks adjusted `packages/core/src/test/...`, `packages/workflows/src/test/...`, `packages/providers/src/test/...`	Updated many tests to mock `@archon/providers.getAgentProvider` and new `mockSendQuery`/capabilities; removed or adapted env-leak related test suites; added provider factory tests.
Docs, CLI, packaging, CI, Docker `.claude/`, `packages/docs-web/`, `packages/cli/`, `Dockerfile`, `.archon/workflows/`, `homebrew/archon.rb`	Documentation updated from “client”→“provider”; removed env-leak docs; CLI flag `--allow-env-keys` removed; new workflow YAMLs and echo script added; Dockerfile and package manifests updated to include providers; Homebrew version bump.

Sequence Diagram(s)

sequenceDiagram
    participant User as User (Web UI)
    participant Orch as Orchestrator
    participant Prov as Agent Provider
    participant Plat as Platform Adapter

    User->>Orch: send message
    Orch->>Prov: getAgentProvider(...).sendQuery(requestOptions)
    Prov-->>Orch: yield events (assistant/tool/result)
    alt provider yields result isError (no assistant text)
        Orch->>Orch: capture errorResult (errorSubtype)
        Orch->>Plat: sendMessage(user-facing error guidance)
        Plat-->>User: display error message
    else provider yields normal assistant output
        Orch->>Plat: forward assistant chunks
        Plat-->>User: display assistant response
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related issues

security: zero env leaks from managed repos into provider subprocesses #1135 — Env-leak consent removal and hardening: this PR’s removal of the env-leak gate and related flags aligns with objectives to eliminate consent-based pre-scan flows.

Possibly related PRs

feat: Phase 2 — community-friendly provider registry system #1195 — Provider abstraction refactor: shares the same client→provider renaming and type migration changes.
refactor: extract providers from @archon/core into @archon/providers #1137 — Provider extraction: overlaps on moving client logic into a providers package and wiring getAgentProvider.
security: prevent target repo .env from leaking into subprocesses (#1135) #1169 — Env-leak consent removal: implements similar deletion of env-leak scanner and consent surfaces.

Suggested reviewers

Wirasm

Poem

🐰 I hopped through code at break of light,

Found silent errors now given sight.
Providers sing where clients once stood,
Env leaks gone — the garden's good.
Hop, debug, rejoice — the user sees right.

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@packages/core/src/orchestrator/orchestrator-agent.ts`:
- Around line 895-903: Replace the hardcoded "Claude credentials" text with an
assistant-specific auth hint by checking the conversation.ai_assistant_type when
building the hint used in the error handling before calling
platform.sendMessage; use errorResult.subtype and conversation.ai_assistant_type
to pick an appropriate message (e.g., "Check your <assistant> credentials or use
/reset.") and fall back to the generic "Check server logs for details." Ensure
this change is applied in the same block referencing errorResult.subtype and
platform.sendMessage and the analogous block around the other occurrence (lines
~1045-1053) so both error messages reflect the actual ai_assistant_type.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 04c8cd53-1e71-436b-8c29-8aa786fcdeb8

📥 Commits

Reviewing files that changed from the base of the PR and between 536584d and 8206afe.

📒 Files selected for processing (2)

packages/core/src/orchestrator/orchestrator-agent.test.ts
packages/core/src/orchestrator/orchestrator-agent.ts

kagura-agent · 2026-04-11T12:30:00Z

Good catch @coderabbitai — updated to use aiClient.getType() for provider-aware error messages instead of hardcoding 'Claude'. Both stream and batch mode paths now show the correct provider label (e.g. 'Claude' vs 'AI'). All tests pass.

coderabbitai · 2026-04-11T12:30:50Z

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check the current implementation of error message handling in orchestrator-agent.ts
rg -n "getType\|aiClient\|authentication_error\|AI error\|provider" --type=ts packages/core/src/orchestrator/orchestrator-agent.ts -A2 -B2

Length of output: 0

🏁 Script executed:

#!/bin/bash
# Find the orchestrator-agent.ts file
fd "orchestrator-agent.ts" --type f

# Also search broadly for getType and AI error patterns
rg -n "getType\|authentication_error\|AI error" --type=ts -A2 -B2

Length of output: 52

🏁 Script executed:

#!/bin/bash
# Read the full orchestrator-agent.ts to inspect error handling
cat packages/core/src/orchestrator/orchestrator-agent.ts

Length of output: 48052

@kagura-agent: The fix looks great! I've verified the implementation in both handlers:

Stream mode (handleStreamMode):

const providerLabel = aiClient.getType() === 'claude' ? 'Claude' : 'AI';
const hint =
  errorResult.subtype === 'authentication_error'
    ? `Check your ${providerLabel} credentials or use /reset.`
    : 'Check server logs for details.';

Batch mode (handleBatchMode):

const providerLabel = aiClient.getType() === 'claude' ? 'Claude' : 'AI';
const hint =
  errorResult.subtype === 'authentication_error'
    ? `Check your ${providerLabel} credentials or use /reset.`
    : 'Check server logs for details.';

Both paths are symmetric and correctly use aiClient.getType() to produce provider-aware labels — showing "Claude" for the Claude client and "AI" as a sensible fallback for other providers. Well done! ✅

If you found this review helpful, would you consider giving us a shout-out on X?

Thank you for using CodeRabbit!

coderabbitai

🧹 Nitpick comments (1)

packages/core/src/orchestrator/orchestrator-agent.ts (1)

906-907: Align log event names with repository structured logging format.

no_ai_response does not follow the {domain}.{action}_{state} convention required by the repo guidelines.

♻️ Suggested rename

-    getLog().debug({ conversationId, errorResult }, 'no_ai_response');
+    getLog().debug({ conversationId, errorResult }, 'orchestrator.ai_response_missing');
...
-    getLog().debug({ conversationId, errorResult }, 'no_ai_response');
+    getLog().debug({ conversationId, errorResult }, 'orchestrator.ai_response_missing');

As per coding guidelines, "Use structured logging with Pino; event naming format: {domain}.{action}_{state}; always pair _started with _completed or _failed; include context IDs and error details".

Also applies to: 1057-1058

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@packages/core/src/orchestrator/orchestrator-agent.ts` around lines 906 - 907,
Replace the non-conforming log event name 'no_ai_response' in the getLog().debug
call with a structured event following the repo format (for example
'orchestrator.ai_response_failed' or 'orchestrator.get_response_failed'), keep
the existing context variables (conversationId, errorResult) in the payload, and
ensure this failure event is paired with corresponding '_started' or
'_completed' events elsewhere (see the other getLog() calls around the same
method). Target the getLog().debug invocation that logs { conversationId,
errorResult } in orchestrator-agent.ts and update the event string to follow
{domain}.{action}_{state}.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@packages/core/src/orchestrator/orchestrator-agent.ts`:
- Around line 906-907: Replace the non-conforming log event name
'no_ai_response' in the getLog().debug call with a structured event following
the repo format (for example 'orchestrator.ai_response_failed' or
'orchestrator.get_response_failed'), keep the existing context variables
(conversationId, errorResult) in the payload, and ensure this failure event is
paired with corresponding '_started' or '_completed' events elsewhere (see the
other getLog() calls around the same method). Target the getLog().debug
invocation that logs { conversationId, errorResult } in orchestrator-agent.ts
and update the event string to follow {domain}.{action}_{state}.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: d2a58df4-9d98-4c91-aab2-404b2e45d8ec

📥 Commits

Reviewing files that changed from the base of the PR and between 8206afe and 67b6a56.

📒 Files selected for processing (1)

packages/core/src/orchestrator/orchestrator-agent.ts

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

packages/core/src/orchestrator/orchestrator-agent.ts (1)

895-904: Extract the AI error formatter into one helper.

Line 895 and Line 1046 now build the same provider label, hint, and final message string. Pulling that into a small helper will keep stream and batch behavior from drifting again on the next wording or subtype tweak.

♻️ Suggested refactor

+function formatAiErrorMessage(
+  aiClient: ReturnType<typeof getAssistantClient>,
+  errorResult: { subtype?: string }
+): string {
+  const providerLabel = aiClient.getType() === 'claude' ? 'Claude' : 'AI';
+  const hint =
+    errorResult.subtype === 'authentication_error'
+      ? `Check your ${providerLabel} credentials or use /reset.`
+      : 'Check server logs for details.';
+  return `AI error${errorResult.subtype ? ` (${errorResult.subtype})` : ''}. ${hint}`;
+}

Then reuse it in both branches:

-      const providerLabel = aiClient.getType() === 'claude' ? 'Claude' : 'AI';
-      const hint =
-        errorResult.subtype === 'authentication_error'
-          ? `Check your ${providerLabel} credentials or use /reset.`
-          : 'Check server logs for details.';
-      await platform.sendMessage(
-        conversationId,
-        `AI error${errorResult.subtype ? ` (${errorResult.subtype})` : ''}. ${hint}`
-      );
+      await platform.sendMessage(conversationId, formatAiErrorMessage(aiClient, errorResult));

Also applies to: 1046-1055

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@packages/core/src/orchestrator/orchestrator-agent.ts` around lines 895 - 904,
The code builds the same provider label, hint, and message in two places (when
handling errorResult before streaming and later at lines around the batch
branch); extract that logic into a single helper like
formatAIErrorMessage(errorResult, aiClient) which returns the full string, then
replace the repeated construction in both places (the blocks using
aiClient.getType(), errorResult.subtype, and platform.sendMessage) to call this
helper so wording and subtype handling stay consistent across stream and batch
flows.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@packages/core/src/orchestrator/orchestrator-agent.ts`:
- Around line 906-907: The log event string "no_ai_response" does not follow our
`{domain}.{action}_{state}` convention—update the two log calls that use this
literal (the debug call at getLog().debug({ conversationId, errorResult },
'no_ai_response') and the other instance around lines 1057–1058) to a
domain-prefixed event like "orchestrator.ai_response_failed" so they are
queryable with other structured logs; keep the same metadata object ({
conversationId, errorResult }) and use the existing getLog() logger (created via
createLogger('orchestrator') elsewhere in this module) when replacing the event
name.

---

Nitpick comments:
In `@packages/core/src/orchestrator/orchestrator-agent.ts`:
- Around line 895-904: The code builds the same provider label, hint, and
message in two places (when handling errorResult before streaming and later at
lines around the batch branch); extract that logic into a single helper like
formatAIErrorMessage(errorResult, aiClient) which returns the full string, then
replace the repeated construction in both places (the blocks using
aiClient.getType(), errorResult.subtype, and platform.sendMessage) to call this
helper so wording and subtype handling stay consistent across stream and batch
flows.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 9c8fa93e-7a38-49d3-9a53-b759214a0a0c

📥 Commits

Reviewing files that changed from the base of the PR and between 67b6a56 and 511c326.

📒 Files selected for processing (3)

homebrew/archon.rb
packages/core/src/orchestrator/orchestrator-agent.test.ts
packages/core/src/orchestrator/orchestrator-agent.ts

✅ Files skipped from review due to trivial changes (1)

homebrew/archon.rb

🚧 Files skipped from review as they are similar to previous changes (1)

packages/core/src/orchestrator/orchestrator-agent.test.ts

coderabbitai · 2026-04-13T11:34:38Z

+    getLog().debug({ conversationId, errorResult }, 'no_ai_response');
    return;


⚠️ Potential issue | 🟡 Minor

Rename this log event to match the repo convention.

no_ai_response does not follow the {domain}.{action}_{state} format used elsewhere in the project. Please rename both entries to something like orchestrator.ai_response_failed so they stay queryable with the rest of the structured logs.

As per coding guidelines, "Use createLogger('domain') from @archon/paths for structured Pino logging with event naming format {domain}.{action}_{state} and standard states: _started, _completed, _failed, _validated, _rejected".

Also applies to: 1057-1058

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@packages/core/src/orchestrator/orchestrator-agent.ts` around lines 906 - 907, The log event string "no_ai_response" does not follow our `{domain}.{action}_{state}` convention—update the two log calls that use this literal (the debug call at getLog().debug({ conversationId, errorResult }, 'no_ai_response') and the other instance around lines 1057–1058) to a domain-prefixed event like "orchestrator.ai_response_failed" so they are queryable with other structured logs; keep the same metadata object ({ conversationId, errorResult }) and use the existing getLog() logger (created via createLogger('orchestrator') elsewhere in this module) when replacing the event name.

…ixes coleam00#1076) When the Claude OAuth refresh token is expired, the result message has is_error=true but session_id=undefined. Both handleStreamMode and handleBatchMode required sessionId to be truthy to process result messages, causing errors to be silently dropped. Changes: - Split result handler condition to process sessionId and isError independently in both stream and batch modes - Send descriptive error message to user when AI errors produce no assistant messages (with auth-specific hint for authentication_error) - Include errorResult in debug log for observability Tests: 5 new tests covering stream/batch × auth/generic/no-subtype

Address CodeRabbit review: replace hardcoded 'Claude credentials' with provider-aware label from aiClient.getType() so error messages are accurate for non-Claude assistants.

Wirasm · 2026-04-20T06:59:35Z

@kagura-agent this PR looks similar to the already-merged #1089 (merged April 16, 2026). Please check whether your change is still needed — rebase on main and confirm.

kagura-agent · 2026-04-20T08:35:37Z

Closing — this is superseded by #1089 which was merged on April 16 with a similar fix for surfacing AI error results. Thanks @Wirasm for flagging the overlap! 🙏

coderabbitai Bot reviewed Apr 11, 2026

View reviewed changes

Comment thread packages/core/src/orchestrator/orchestrator-agent.ts

coderabbitai Bot reviewed Apr 11, 2026

View reviewed changes

kagura-agent force-pushed the fix/stream-error-silent-drop branch from 67b6a56 to 511c326 Compare April 13, 2026 11:29

coderabbitai Bot reviewed Apr 13, 2026

View reviewed changes

kagura-agent force-pushed the fix/stream-error-silent-drop branch 9 times, most recently from 0dffb6d to c5908df Compare April 18, 2026 16:13

kagura-agent added 2 commits April 19, 2026 10:13

refactor: use provider-specific label in auth error hint

0476be6

Address CodeRabbit review: replace hardcoded 'Claude credentials' with provider-aware label from aiClient.getType() so error messages are accurate for non-Claude assistants.

kagura-agent force-pushed the fix/stream-error-silent-drop branch from c5908df to 0476be6 Compare April 19, 2026 02:16

kagura-agent closed this Apr 20, 2026

		getLog().debug({ conversationId, errorResult }, 'no_ai_response');
		return;

Conversation

kagura-agent commented Apr 11, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Problem

Root Cause

Fix

Validation

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Apr 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related issues

Possibly related PRs

Suggested reviewers

Poem

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

kagura-agent commented Apr 11, 2026

Uh oh!

coderabbitai Bot commented Apr 11, 2026

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Apr 13, 2026

Choose a reason for hiding this comment

Uh oh!

Wirasm commented Apr 20, 2026

Uh oh!

kagura-agent commented Apr 20, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

kagura-agent commented Apr 11, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Apr 11, 2026 •

edited

Loading