Skip to content

fix(orchestrator): surface AI error results instead of silent drop (fixes #1076)#1084

Closed
kagura-agent wants to merge 2 commits intocoleam00:devfrom
kagura-agent:fix/stream-error-silent-drop
Closed

fix(orchestrator): surface AI error results instead of silent drop (fixes #1076)#1084
kagura-agent wants to merge 2 commits intocoleam00:devfrom
kagura-agent:fix/stream-error-silent-drop

Conversation

@kagura-agent
Copy link
Copy Markdown
Contributor

@kagura-agent kagura-agent commented Apr 11, 2026

Problem

When the Claude OAuth refresh token is expired (or any AI error occurs with no session_id), the chat UI shows no response at all — no error message, no indication of failure.

Root Cause

Both handleStreamMode and handleBatchMode in orchestrator-agent.ts had a condition that required msg.sessionId to be truthy to process result messages:

} else if (msg.type === 'result' && msg.sessionId) {

When OAuth fails, the result message has is_error: true but session_id: undefined, so:

  1. The error result is silently dropped
  2. No assistant messages are produced → allMessages.length === 0
  3. Early return with no user-facing feedback

Fix

  • Split the result handler to process sessionId and isError independently
  • Surface error messages when AI errors produce no assistant content:
    • Auth errors: AI error (authentication_error). Check your Claude credentials or use /reset.
    • Other errors: AI error (<subtype>). Check server logs for details.
  • Enhanced logging: include errorResult in debug log for observability

Applied to both stream and batch mode paths.

Validation

  • 5 new tests covering:
    • Stream mode: auth error, non-auth error, missing subtype
    • Batch mode: auth error, non-auth error
  • All pre-existing tests pass
  • bun run type-check clean across all packages
  • eslint + prettier via lint-staged clean

Closes #1076

Summary by CodeRabbit

  • New Features

    • Added provider-based AI support (Claude and Codex) and new end-to-end smoke workflows to validate node types and scripts.
    • New env-var management endpoints and a /health endpoint.
  • Bug Fixes

    • Improved user-facing AI error messages with authentication-specific hints and clearer guidance for other failures.
  • Tests

    • Added extensive tests for AI error handling in both streaming and batch modes and provider behaviors.
  • Documentation

    • Updated docs and guides to reflect “agent provider” terminology and new provider configuration.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 11, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Refactors AI client -> provider surface (new @archon/providers), removes env-leak gate/allow_env_keys plumbing, updates orchestrator to use providers and send user-facing AI error messages when providers yield error results, adds provider implementations/tests, and introduces several E2E workflow and docs updates.

Changes

Cohort / File(s) Summary
Provider package & types
packages/providers/*, packages/providers/src/types.ts, packages/providers/package.json
New providers package exposing IAgentProvider, MessageChunk, factory (getAgentProvider), Claude/Codex provider implementations, config parsers, errors, and tests. Re-exports/types added for consumers.
Orchestrator & AI error handling
packages/core/src/orchestrator/orchestrator-agent.ts, packages/core/src/orchestrator/orchestrator-agent.test.ts
Replaced getAssistantClient→getAgentProvider usage; changed request shape to SendQueryOptions/assistantConfig; process msg.type === 'result' regardless of sessionId, capture error metadata (isError/errorSubtype), and send explicit user-facing error messages when no assistant output is produced; added tests for stream/batch error variants.
Clients → removed legacy clients
packages/core/src/clients/*, packages/core/src/clients/index.ts, packages/core/src/clients/factory.ts
Deleted Claude/Codex client modules and client factory; consumers migrated to new providers package and provider factory.
DB / codebase env consent removal
packages/core/src/db/*, packages/core/src/db/codebases.ts, packages/core/src/db/adapters/sqlite.ts
Removed allow_env_keys column, migration step, update helper, and codebase-level allow-env-key plumbing.
Env-leak scanner & related handlers
packages/core/src/utils/env-leak-scanner.*, packages/core/src/handlers/clone.ts, packages/core/src/handlers/*, packages/server/src/*, packages/web/src/*
Deleted env-leak scanner utilities, error types, API/UI/CLI consent surfaces (--allow-env-keys, PATCH codebase endpoint, UI toggle). Updated server startup to remove pre-scan; API/route/schema/client typings adjusted.
Workflows & executor refactor
packages/workflows/src/*, packages/workflows/src/dag-executor.ts, packages/workflows/src/deps.ts
Executor rewritten to use provider types (SendQueryOptions, NodeConfig, ProviderCapabilities), replaced assistant-capability checks with provider capability-driven warnings, added bun --no-env-file for script nodes, and updated numerous tests.
Providers: Claude & Codex implementations
packages/providers/src/claude/*, packages/providers/src/codex/*
Added full provider implementations: streaming normalization, first-event timeout, error classification, retry policies, capabilities, structured output handling, and provider-specific tests.
Tests & mocks adjusted
packages/core/src/test/..., packages/workflows/src/test/..., packages/providers/src/test/...
Updated many tests to mock @archon/providers.getAgentProvider and new mockSendQuery/capabilities; removed or adapted env-leak related test suites; added provider factory tests.
Docs, CLI, packaging, CI, Docker
.claude/*, packages/docs-web/*, packages/cli/*, Dockerfile, .archon/workflows/*, homebrew/archon.rb
Documentation updated from “client”→“provider”; removed env-leak docs; CLI flag --allow-env-keys removed; new workflow YAMLs and echo script added; Dockerfile and package manifests updated to include providers; Homebrew version bump.

Sequence Diagram(s)

sequenceDiagram
    participant User as User (Web UI)
    participant Orch as Orchestrator
    participant Prov as Agent Provider
    participant Plat as Platform Adapter

    User->>Orch: send message
    Orch->>Prov: getAgentProvider(...).sendQuery(requestOptions)
    Prov-->>Orch: yield events (assistant/tool/result)
    alt provider yields result isError (no assistant text)
        Orch->>Orch: capture errorResult (errorSubtype)
        Orch->>Plat: sendMessage(user-facing error guidance)
        Plat-->>User: display error message
    else provider yields normal assistant output
        Orch->>Plat: forward assistant chunks
        Plat-->>User: display assistant response
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related issues

Possibly related PRs

Suggested reviewers

  • Wirasm

Poem

🐰 I hopped through code at break of light,

Found silent errors now given sight.
Providers sing where clients once stood,
Env leaks gone — the garden's good.
Hop, debug, rejoice — the user sees right.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@packages/core/src/orchestrator/orchestrator-agent.ts`:
- Around line 895-903: Replace the hardcoded "Claude credentials" text with an
assistant-specific auth hint by checking the conversation.ai_assistant_type when
building the hint used in the error handling before calling
platform.sendMessage; use errorResult.subtype and conversation.ai_assistant_type
to pick an appropriate message (e.g., "Check your <assistant> credentials or use
/reset.") and fall back to the generic "Check server logs for details." Ensure
this change is applied in the same block referencing errorResult.subtype and
platform.sendMessage and the analogous block around the other occurrence (lines
~1045-1053) so both error messages reflect the actual ai_assistant_type.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 04c8cd53-1e71-436b-8c29-8aa786fcdeb8

📥 Commits

Reviewing files that changed from the base of the PR and between 536584d and 8206afe.

📒 Files selected for processing (2)
  • packages/core/src/orchestrator/orchestrator-agent.test.ts
  • packages/core/src/orchestrator/orchestrator-agent.ts

Comment thread packages/core/src/orchestrator/orchestrator-agent.ts
@kagura-agent
Copy link
Copy Markdown
Contributor Author

Good catch @coderabbitai — updated to use aiClient.getType() for provider-aware error messages instead of hardcoding 'Claude'. Both stream and batch mode paths now show the correct provider label (e.g. 'Claude' vs 'AI'). All tests pass.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 11, 2026

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check the current implementation of error message handling in orchestrator-agent.ts
rg -n "getType\|aiClient\|authentication_error\|AI error\|provider" --type=ts packages/core/src/orchestrator/orchestrator-agent.ts -A2 -B2

Length of output: 0


🏁 Script executed:

#!/bin/bash
# Find the orchestrator-agent.ts file
fd "orchestrator-agent.ts" --type f

# Also search broadly for getType and AI error patterns
rg -n "getType\|authentication_error\|AI error" --type=ts -A2 -B2

Length of output: 52


🏁 Script executed:

#!/bin/bash
# Read the full orchestrator-agent.ts to inspect error handling
cat packages/core/src/orchestrator/orchestrator-agent.ts

Length of output: 48052


@kagura-agent: The fix looks great! I've verified the implementation in both handlers:

Stream mode (handleStreamMode):

const providerLabel = aiClient.getType() === 'claude' ? 'Claude' : 'AI';
const hint =
  errorResult.subtype === 'authentication_error'
    ? `Check your ${providerLabel} credentials or use /reset.`
    : 'Check server logs for details.';

Batch mode (handleBatchMode):

const providerLabel = aiClient.getType() === 'claude' ? 'Claude' : 'AI';
const hint =
  errorResult.subtype === 'authentication_error'
    ? `Check your ${providerLabel} credentials or use /reset.`
    : 'Check server logs for details.';

Both paths are symmetric and correctly use aiClient.getType() to produce provider-aware labels — showing "Claude" for the Claude client and "AI" as a sensible fallback for other providers. Well done! ✅


If you found this review helpful, would you consider giving us a shout-out on X?

Thank you for using CodeRabbit!

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
packages/core/src/orchestrator/orchestrator-agent.ts (1)

906-907: Align log event names with repository structured logging format.

no_ai_response does not follow the {domain}.{action}_{state} convention required by the repo guidelines.

♻️ Suggested rename
-    getLog().debug({ conversationId, errorResult }, 'no_ai_response');
+    getLog().debug({ conversationId, errorResult }, 'orchestrator.ai_response_missing');
...
-    getLog().debug({ conversationId, errorResult }, 'no_ai_response');
+    getLog().debug({ conversationId, errorResult }, 'orchestrator.ai_response_missing');

As per coding guidelines, "Use structured logging with Pino; event naming format: {domain}.{action}_{state}; always pair _started with _completed or _failed; include context IDs and error details".

Also applies to: 1057-1058

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/core/src/orchestrator/orchestrator-agent.ts` around lines 906 - 907,
Replace the non-conforming log event name 'no_ai_response' in the getLog().debug
call with a structured event following the repo format (for example
'orchestrator.ai_response_failed' or 'orchestrator.get_response_failed'), keep
the existing context variables (conversationId, errorResult) in the payload, and
ensure this failure event is paired with corresponding '_started' or
'_completed' events elsewhere (see the other getLog() calls around the same
method). Target the getLog().debug invocation that logs { conversationId,
errorResult } in orchestrator-agent.ts and update the event string to follow
{domain}.{action}_{state}.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@packages/core/src/orchestrator/orchestrator-agent.ts`:
- Around line 906-907: Replace the non-conforming log event name
'no_ai_response' in the getLog().debug call with a structured event following
the repo format (for example 'orchestrator.ai_response_failed' or
'orchestrator.get_response_failed'), keep the existing context variables
(conversationId, errorResult) in the payload, and ensure this failure event is
paired with corresponding '_started' or '_completed' events elsewhere (see the
other getLog() calls around the same method). Target the getLog().debug
invocation that logs { conversationId, errorResult } in orchestrator-agent.ts
and update the event string to follow {domain}.{action}_{state}.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: d2a58df4-9d98-4c91-aab2-404b2e45d8ec

📥 Commits

Reviewing files that changed from the base of the PR and between 8206afe and 67b6a56.

📒 Files selected for processing (1)
  • packages/core/src/orchestrator/orchestrator-agent.ts

@kagura-agent kagura-agent force-pushed the fix/stream-error-silent-drop branch from 67b6a56 to 511c326 Compare April 13, 2026 11:29
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
packages/core/src/orchestrator/orchestrator-agent.ts (1)

895-904: Extract the AI error formatter into one helper.

Line 895 and Line 1046 now build the same provider label, hint, and final message string. Pulling that into a small helper will keep stream and batch behavior from drifting again on the next wording or subtype tweak.

♻️ Suggested refactor
+function formatAiErrorMessage(
+  aiClient: ReturnType<typeof getAssistantClient>,
+  errorResult: { subtype?: string }
+): string {
+  const providerLabel = aiClient.getType() === 'claude' ? 'Claude' : 'AI';
+  const hint =
+    errorResult.subtype === 'authentication_error'
+      ? `Check your ${providerLabel} credentials or use /reset.`
+      : 'Check server logs for details.';
+  return `AI error${errorResult.subtype ? ` (${errorResult.subtype})` : ''}. ${hint}`;
+}

Then reuse it in both branches:

-      const providerLabel = aiClient.getType() === 'claude' ? 'Claude' : 'AI';
-      const hint =
-        errorResult.subtype === 'authentication_error'
-          ? `Check your ${providerLabel} credentials or use /reset.`
-          : 'Check server logs for details.';
-      await platform.sendMessage(
-        conversationId,
-        `AI error${errorResult.subtype ? ` (${errorResult.subtype})` : ''}. ${hint}`
-      );
+      await platform.sendMessage(conversationId, formatAiErrorMessage(aiClient, errorResult));

Also applies to: 1046-1055

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/core/src/orchestrator/orchestrator-agent.ts` around lines 895 - 904,
The code builds the same provider label, hint, and message in two places (when
handling errorResult before streaming and later at lines around the batch
branch); extract that logic into a single helper like
formatAIErrorMessage(errorResult, aiClient) which returns the full string, then
replace the repeated construction in both places (the blocks using
aiClient.getType(), errorResult.subtype, and platform.sendMessage) to call this
helper so wording and subtype handling stay consistent across stream and batch
flows.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@packages/core/src/orchestrator/orchestrator-agent.ts`:
- Around line 906-907: The log event string "no_ai_response" does not follow our
`{domain}.{action}_{state}` convention—update the two log calls that use this
literal (the debug call at getLog().debug({ conversationId, errorResult },
'no_ai_response') and the other instance around lines 1057–1058) to a
domain-prefixed event like "orchestrator.ai_response_failed" so they are
queryable with other structured logs; keep the same metadata object ({
conversationId, errorResult }) and use the existing getLog() logger (created via
createLogger('orchestrator') elsewhere in this module) when replacing the event
name.

---

Nitpick comments:
In `@packages/core/src/orchestrator/orchestrator-agent.ts`:
- Around line 895-904: The code builds the same provider label, hint, and
message in two places (when handling errorResult before streaming and later at
lines around the batch branch); extract that logic into a single helper like
formatAIErrorMessage(errorResult, aiClient) which returns the full string, then
replace the repeated construction in both places (the blocks using
aiClient.getType(), errorResult.subtype, and platform.sendMessage) to call this
helper so wording and subtype handling stay consistent across stream and batch
flows.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 9c8fa93e-7a38-49d3-9a53-b759214a0a0c

📥 Commits

Reviewing files that changed from the base of the PR and between 67b6a56 and 511c326.

📒 Files selected for processing (3)
  • homebrew/archon.rb
  • packages/core/src/orchestrator/orchestrator-agent.test.ts
  • packages/core/src/orchestrator/orchestrator-agent.ts
✅ Files skipped from review due to trivial changes (1)
  • homebrew/archon.rb
🚧 Files skipped from review as they are similar to previous changes (1)
  • packages/core/src/orchestrator/orchestrator-agent.test.ts

Comment on lines +906 to 907
getLog().debug({ conversationId, errorResult }, 'no_ai_response');
return;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Rename this log event to match the repo convention.

no_ai_response does not follow the {domain}.{action}_{state} format used elsewhere in the project. Please rename both entries to something like orchestrator.ai_response_failed so they stay queryable with the rest of the structured logs.

As per coding guidelines, "Use createLogger('domain') from @archon/paths for structured Pino logging with event naming format {domain}.{action}_{state} and standard states: _started, _completed, _failed, _validated, _rejected".

Also applies to: 1057-1058

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/core/src/orchestrator/orchestrator-agent.ts` around lines 906 - 907,
The log event string "no_ai_response" does not follow our
`{domain}.{action}_{state}` convention—update the two log calls that use this
literal (the debug call at getLog().debug({ conversationId, errorResult },
'no_ai_response') and the other instance around lines 1057–1058) to a
domain-prefixed event like "orchestrator.ai_response_failed" so they are
queryable with other structured logs; keep the same metadata object ({
conversationId, errorResult }) and use the existing getLog() logger (created via
createLogger('orchestrator') elsewhere in this module) when replacing the event
name.

@kagura-agent kagura-agent force-pushed the fix/stream-error-silent-drop branch 9 times, most recently from 0dffb6d to c5908df Compare April 18, 2026 16:13
…ixes coleam00#1076)

When the Claude OAuth refresh token is expired, the result message has
is_error=true but session_id=undefined. Both handleStreamMode and
handleBatchMode required sessionId to be truthy to process result
messages, causing errors to be silently dropped.

Changes:
- Split result handler condition to process sessionId and isError
  independently in both stream and batch modes
- Send descriptive error message to user when AI errors produce no
  assistant messages (with auth-specific hint for authentication_error)
- Include errorResult in debug log for observability

Tests: 5 new tests covering stream/batch × auth/generic/no-subtype
Address CodeRabbit review: replace hardcoded 'Claude credentials' with
provider-aware label from aiClient.getType() so error messages are
accurate for non-Claude assistants.
@kagura-agent kagura-agent force-pushed the fix/stream-error-silent-drop branch from c5908df to 0476be6 Compare April 19, 2026 02:16
@Wirasm
Copy link
Copy Markdown
Collaborator

Wirasm commented Apr 20, 2026

@kagura-agent this PR looks similar to the already-merged #1089 (merged April 16, 2026). Please check whether your change is still needed — rebase on main and confirm.

@kagura-agent
Copy link
Copy Markdown
Contributor Author

Closing — this is superseded by #1089 which was merged on April 16 with a similar fix for surfacing AI error results. Thanks @Wirasm for flagging the overlap! 🙏

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

fix(core): chat UI fails silently when Claude OAuth refresh token is expired

2 participants