Skip to content

fix(providers): throw ProviderNotConfiguredError instead of raw Error on null provider resolution#30716

Merged
ashleeradka merged 1 commit into
mainfrom
devin/1778790232-fix-provider-not-configured-error
May 14, 2026
Merged

fix(providers): throw ProviderNotConfiguredError instead of raw Error on null provider resolution#30716
ashleeradka merged 1 commit into
mainfrom
devin/1778790232-fix-provider-not-configured-error

Conversation

@ashleeradka
Copy link
Copy Markdown
Contributor

Closes #30714
Part of JARVIS-765

Prompt / plan

Sentry audit of 0.8.1 revealed BRAIN-EJ (166 events, 13 unique production devices) — satellite callers throwing a generic Error when resolveDefaultProvider() returns null, bypassing the existing error classification pipeline.

Changes

1. Replace raw Error with ProviderNotConfiguredError in three satellite callers

conversation-store.ts, subagent/manager.ts, and approval-generators.ts all threw new Error("...default provider 'anthropic' is not registered") when resolveDefaultProvider() returned null. This bypassed the classification pipeline in conversation-error.ts which specifically checks instanceof ProviderNotConfiguredError and provides actionable user-facing messages ("API key required for connection X"). Users saw a generic unhelpful error instead of guidance.

Now they throw ProviderNotConfiguredError with connectionName attribution, so the macOS chat banner renders the correct actionable message.

2. Early connection.status === "disabled" check in tryResolveProviderForConnectionName

A disabled connection (e.g. managed connections on BYOK installs, flipped by disableManagedConnectionsForByokHatch) was still going through auth resolution, which failed with a confusing transient-failure log. Now it short-circuits to null immediately with a debug-level log.

Root cause analysis

  1. How did the code get into this state? PR refactor(inference): drop legacy getProvider fallback (Phase 1.1 + 1.3) #30217 removed the legacy getProvider(name) fallback and made resolveDefaultProvider() return null on soft credential failures. The satellite callers were updated to handle null, but they threw raw Error instead of ProviderNotConfiguredError. The error type mismatch was not caught because the test mocks supply a working provider.

  2. What decisions led to it? The satellite callers were updated in a single refactoring PR that focused on the resolution architecture. The error classification pipeline (which checks instanceof ProviderNotConfiguredError) was in a different file and the mismatch between the thrown type and the expected type wasn't flagged.

  3. Warning signs missed? The conversation-error.ts pipeline explicitly checks for ProviderNotConfiguredError — any caller that throws a different error type for the same condition will bypass classification. This coupling is implicit and easy to miss.

  4. Prevention: When throwing errors in provider resolution paths, always use ProviderNotConfiguredError (not raw Error) so the existing classification pipeline can provide actionable messages.

  5. AGENTS.md update: Not warranted — this is a specific error-type mismatch, not a general pattern. The error-handling docs already cover the convention.

Alternatives NOT taken

  • Retry with backoff: Initial investigation considered retry, but Sentry data showed the same devices failing repeatedly over days (19 events from one device). The failures are permanent per-device (missing credentials, platform auth unavailable), not transient I/O blips. Retry would add latency before the same failure.

  • Propagating auth failure reason through the chain: Changing tryResolveProviderForConnectionName's return type from Provider | null to a result object with error codes would give more precise error messages ("Platform auth unavailable" vs "Credential not found") but touches 8+ callers. The existing ProviderNotConfiguredError + classifyConversationError pipeline already handles the distinction via getProviderRoutingSource() (managed vs user-key), so the benefit is marginal vs the blast radius.

Test plan

  • bunx tsc --noEmit — passes
  • bun run lint — passes
  • All 523 affected tests run via pre-push hook — 3 pre-existing failures (compaction benchmark, inference-no-mode-boot-e2e, provider-commit-message-generator) confirmed identical on main
  • dispatch-connection-routing.test.ts — 4/4 pass
  • inference-no-mode-boot-e2e.test.ts — all connection-resolution tests pass (the 1 failure is pre-existing on main)

Link to Devin session: https://app.devin.ai/sessions/056fc283f36e4364868105d8eeb64d8c
Requested by: @ashleeradka

…ovider resolution

Fixes VELLUM-ASSISTANT-BRAIN-EJ — the satellite callers (conversation-store,
subagent manager, approval-generators) threw a raw Error when
resolveDefaultProvider returned null. This bypassed the existing error
classification pipeline in conversation-error.ts that provides actionable
user-facing messages through ProviderNotConfiguredError.

Also adds an early status check in tryResolveProviderForConnectionName
so disabled connections short-circuit before attempting auth resolution.

Co-Authored-By: ashlee@vellum.ai <ashlee@vellum.ai>
@devin-ai-integration
Copy link
Copy Markdown
Contributor

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

  • Disable automatic comment and CI monitoring

@linear
Copy link
Copy Markdown

linear Bot commented May 14, 2026

JARVIS-765

Copy link
Copy Markdown
Contributor

@vex-assistant-bot vex-assistant-bot Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

APPROVE

Value: 13 production devices hitting BRAIN-EJ (166 events) were getting a generic, unhelpful error banner instead of the actionable "API key required for connection X" message — this restores the classification pipeline for all three satellite callers.

What this does:

  1. Replaces new Error(...) with ProviderNotConfiguredError in conversation-store.ts, subagent/manager.ts, and approval-generators.ts — the three callers that handle a null return from resolveDefaultProvider()
  2. Adds an early connection.status === "disabled" guard in tryResolveProviderForConnectionName so disabled connections short-circuit cleanly before hitting auth resolution

Code review:

ProviderNotConfiguredError constructor signature — verified: (requestedProvider: string, registeredProviders: string[], attribution?: { connectionName?, profileName? }). All three throw sites pass (resolved.provider, listProviders(), { connectionName: resolved.provider_connection }) — matches exactly. ✅

classifyConversationError checks instanceof ProviderNotConfiguredError as its first branch and extracts error.connectionName for the banner. The raw Error throws were silently falling through to the regex fallback, producing useless output. Pipeline now fires correctly. ✅

The disabled early-return in connection-resolution.ts is placed at exactly the right point — after the provider-mismatch hard check, before resolveProviderFromConnection dips into auth. The debug-level log is appropriate (not noise, not silent). Callers will surface ProviderNotConfiguredError with connectionName, which tells BYOK users to configure their own key — the correct action. ✅

Minor observation: For an intentionally disabled managed connection, the user-facing banner will say "API key required for connection X" rather than "connection X is disabled." Semantically it's slightly imprecise, but the prompted action (configure a credential) is correct for BYOK scenarios. Not a blocker — worth a follow-up if user confusion is reported.

This PR is also a perfect application of the documented daemon anti-pattern: "DO NOT throw plain Error from typed error paths — callers use instanceof discrimination; plain Error silently bypasses it." (See anti-patterns KB, cf. PR #6726 web-side equivalent.)

CI: Type Check ✅ · Lint ✅ · FlexFrame Lint ✅ · OpenAPI ✅ · Security ✅ · Tests in-progress (description confirms 523 affected tests pass locally, 3 pre-existing failures on main). Codex 👍'd. Clean. Ready to merge once Test run completes.

Vellum Constitution — Trust-seeking: silent, generic errors replaced with actionable attribution — users now know exactly which connection to fix.

@devin-ai-integration
Copy link
Copy Markdown
Contributor

Test Results

Ran adversarial unit tests against the error classification fix. All 4 tests passed.

Test Details
Test Result
ProviderNotConfiguredError passes instanceof check that raw Error fails ✅ Passed
Disabled connection short-circuits without calling resolveProviderFromConnection ✅ Passed
ProviderNotConfiguredError carries correct attribution for classification pipeline ✅ Passed
Raw Error lacks attribution fields that classification pipeline needs ✅ Passed
Source Verification
  • All 3 satellite callers throw ProviderNotConfiguredError: conversation-store.ts:236, subagent/manager.ts:196, approval-generators.ts:152
  • Zero remaining throw new Error(...not registered...) in any of the 3 files
  • connection.status === "disabled" guard confirmed in connection-resolution.ts:113-119

CI: 8 passed, 0 failed. Type Check ✅, Lint ✅, Test ✅.

Devin session

@ashleeradka ashleeradka merged commit 642fcca into main May 14, 2026
13 checks passed
@ashleeradka ashleeradka deleted the devin/1778790232-fix-provider-not-configured-error branch May 14, 2026 21:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Satellite callers throw raw Error instead of ProviderNotConfiguredError on null provider resolution

2 participants