feat(inference): route dispatch through provider_connection#30198
Conversation
Cycle-3 wiring PR — addresses Codex P1 from #30162: the `resolveProviderFromConnection` resolver shipped as dead code with zero call sites. This commit wires the canonical inference dispatch path (`resolveConfiguredProvider` in provider-send-message.ts) to actually use the resolver. Behavior: 1. When the resolved profile has `provider_connection`, dispatch routes through `resolveProviderFromConnection(connection, config)`. 2. On any miss (connection not found, resolver returns null, lookup throws), warn and fall through to legacy `getProvider(name)` so misconfiguration doesn't break inference. 3. Profiles without `provider_connection` keep working unchanged via the legacy path. Also: add `provider_connection?: string` to LLMConfigBase. The field already flowed through `resolveCallSiteConfig` at runtime (`profileConfigFragment` strips only `source`/`label`/`description`), so this aligns the type with reality for type-safe access. Gate test: dispatch-connection-routing.test.ts is the test cycle-2 was missing. Four cases: - Two profiles, same provider, different `provider_connection` → resolver called twice, with the right connection each time, with auth bundles distinguishable per profile (mix-and-match goal #2). - Profile WITHOUT `provider_connection` → resolver NOT called, legacy path takes over. - `provider_connection` set but unknown → resolver NOT called, legacy fallback succeeds. - `provider_connection` set, found, but resolver returns null → resolver IS called, legacy fallback succeeds. If wiring regresses, `resolveProviderCalls.length` invariants break the first test. That's the gate cycle-2 lacked — it tested DB shape, not dispatch invocation.
Cycle-3 follow-up to canonical-path wiring (c3416ac). Migrates the five satellite construction-time call sites — subagent manager, daemon conversation/approval/guardian generators, and rollup producer — to a single shared connection-aware default-provider resolution path. * New `providers/connection-resolution.ts`: - `tryResolveProviderForConnectionName(name, config)` — promoted from private helper in `provider-send-message.ts`. Looks up a `provider_connection` row and resolves a Provider with that connection's auth bound. Logs and returns null on any miss so callers can fall back to the legacy registry path. - `resolveDefaultProvider(config)` — for satellites' construction-time path. Reads `config.llm.default.{provider, provider_connection}`, routes through the connection if named, otherwise legacy lookup. - `resolveProviderForCallSite(callSite, config, opts)` — exported completeness analogue. * `call-site-routing.ts`: - `CallSiteRoutingProvider` gains an optional async `resolveByConnection` ctor param. `selectProvider` is now async; falls through to legacy on miss so existing call sites keep working unchanged. - New `wrapWithCallSiteRouting(base, config)` helper — replaces the per-file wrappers in `approval-generators` and `guardian-action-generators` so all five satellites share one wiring shape. * Satellites migrated (all five): - `subagent/manager.ts` — `resolveDefaultProvider` + `wrapWithCallSiteRouting`. Throws on null default (preserves existing failure mode). - `daemon/conversation-store.ts` — same pattern; throws on null default. - `daemon/approval-generators.ts` — both copy + conversation generators use the shared wrapping; copy returns null on miss, conversation throws. - `daemon/guardian-action-generators.ts` — both generators use the shared wrapper. `getConfiguredProvider` already routes through the canonical path, so only the wrapper changes here. - `home/rollup-producer.ts` — uses `resolveDefaultProvider`. The `RollupProducerDeps.resolveProvider?` injection point is widened to `() => Provider | null | Promise<Provider | null>` for async tests. * `provider-send-message.ts` switches to the shared `tryResolveProviderForConnectionName` and drops the now-redundant local `tryResolveFromConnection` (and its unused `log`/`getLogger` imports). * Hard-gate test: `__tests__/satellite-connection-routing.test.ts`. Four cases on `CallSiteRoutingProvider` directly: 1. Alternate-profile callSite with `provider_connection` → connection-resolution hook fires with the right name+auth and the transport that runs is the connection-bound stub (not default, not legacy registry). 2. Alternate-profile callSite WITHOUT `provider_connection` → connection hook never fires; legacy registry path produces the stub. 3. Alternate-profile callSite with unknown `provider_connection` → falls through; system stays operational on default. 4. No callSite → straight to default; no hook, no registry lookup. Combined with the cycle-3 first-commit dispatch tests, that's 8 gate tests across both code paths (canonical `provider-send-message` resolution + satellite `CallSiteRoutingProvider` wrapping). Cycle-2 + cycle-3-first-commit shipped `resolveProviderFromConnection` as dead code (zero call sites). This wiring removes that gap. Cycle-4 will strip the legacy fallback paths once we've shipped one release window with connection-awareness active. Verified: - typecheck clean (assistant + gateway). - lint clean on touched files. - `bun test src/providers/__tests__/` → 102 pass / 0 fail / 206 expects.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 8b335eb0ec
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| if (connectionName) { | ||
| const connectionProvider = await tryResolveProviderForConnectionName( | ||
| connectionName, | ||
| config, | ||
| ); |
There was a problem hiding this comment.
Verify connection provider before routing by provider_connection
This branch routes through any resolved provider_connection without checking that the connection's provider matches resolved.provider, so a profile/config where those fields diverge will silently dispatch to the wrong backend (e.g., profile resolves to OpenAI but connection points to Anthropic). That misroutes traffic and can produce model/provider mismatches at request time while metadata still reports the original resolved provider; add a provider-consistency check before returning connectionProvider and fall back when they differ.
Useful? React with 👍 / 👎.
| if (!listProviders().includes(config.llm.default.provider)) { | ||
| throw new Error("No provider available for approval conversation"); | ||
| } | ||
| const provider = wrapWithCallSiteRouting( | ||
| getProvider(config.llm.default.provider), | ||
| ); | ||
| // Connection-aware default + per-call routing. See createApprovalCopyGenerator | ||
| // above for rationale; both paths share the same wiring shape. |
There was a problem hiding this comment.
Remove legacy provider-list gate before connection-aware resolve
The new connection-aware path is unreachable in this generator when the default provider is available only via provider_connection: the early listProviders().includes(...) check throws before resolveDefaultProvider(config) gets a chance to resolve from connection auth. In your-own setups without a legacy global provider key, this causes false “No provider available” failures even though a valid connection exists.
Useful? React with 👍 / 👎.
| if (!listProviders().includes(config.llm.default.provider)) { | ||
| throw new Error("No provider available for approval conversation"); | ||
| } |
There was a problem hiding this comment.
🔴 Stale listProviders() guard blocks connection-aware provider resolution
In createApprovalConversationGenerator, the legacy guard at line 141 checks !listProviders().includes(config.llm.default.provider) and throws before the new connection-aware resolveDefaultProvider(config) at line 146 ever runs. When a user's default profile uses a provider_connection (the new feature this PR adds) but the provider name isn't in the legacy registry (e.g., services.inference.mode: 'your-own' with no Anthropic API key in the legacy credential store, only a connection row), this guard incorrectly throws "No provider available for approval conversation" even though resolveDefaultProvider would have succeeded via the connection path. The sibling function createApprovalCopyGenerator (approval-generators.ts:80-92) does not have this stale guard and correctly relies solely on the null check after resolveDefaultProvider, which is the intended pattern across all satellite sites.
| if (!listProviders().includes(config.llm.default.provider)) { | |
| throw new Error("No provider available for approval conversation"); | |
| } |
Was this helpful? React with 👍 or 👎 to provide feedback.
| export async function resolveProviderForCallSite( | ||
| callSite: LLMCallSite, | ||
| config: ProvidersConfig, | ||
| opts: { overrideProfile?: string } = {}, | ||
| ): Promise<Provider | null> { | ||
| // resolveCallSiteConfig works on the full LLM config, not the narrow | ||
| // `ProvidersConfig.llm` view used elsewhere in the registry. Cast through | ||
| // to keep ProvidersConfig as the public shape; schema-level alignment | ||
| // happens in `schemas/llm.ts` (provider_connection field added there). | ||
| const resolved = resolveCallSiteConfig( | ||
| callSite, | ||
| config.llm as Parameters<typeof resolveCallSiteConfig>[1], | ||
| opts, | ||
| ); | ||
| if (resolved.provider_connection) { | ||
| const connectionProvider = await tryResolveProviderForConnectionName( | ||
| resolved.provider_connection, | ||
| config, | ||
| ); | ||
| if (connectionProvider) return connectionProvider; | ||
| } | ||
| try { | ||
| return getProvider(resolved.provider); | ||
| } catch { | ||
| return null; | ||
| } | ||
| } |
There was a problem hiding this comment.
🟡 Dead code: resolveProviderForCallSite is exported but never imported or called
The function resolveProviderForCallSite at connection-resolution.ts:115-141 is exported but has zero import sites or call sites anywhere in the codebase (verified with a repo-wide search). This violates the AGENTS.md dead code removal rule: "Proactively remove unused code during every change… Ask: 'After my change, is there any code that nothing calls, imports, or references?' If yes, delete it."
| export async function resolveProviderForCallSite( | |
| callSite: LLMCallSite, | |
| config: ProvidersConfig, | |
| opts: { overrideProfile?: string } = {}, | |
| ): Promise<Provider | null> { | |
| // resolveCallSiteConfig works on the full LLM config, not the narrow | |
| // `ProvidersConfig.llm` view used elsewhere in the registry. Cast through | |
| // to keep ProvidersConfig as the public shape; schema-level alignment | |
| // happens in `schemas/llm.ts` (provider_connection field added there). | |
| const resolved = resolveCallSiteConfig( | |
| callSite, | |
| config.llm as Parameters<typeof resolveCallSiteConfig>[1], | |
| opts, | |
| ); | |
| if (resolved.provider_connection) { | |
| const connectionProvider = await tryResolveProviderForConnectionName( | |
| resolved.provider_connection, | |
| config, | |
| ); | |
| if (connectionProvider) return connectionProvider; | |
| } | |
| try { | |
| return getProvider(resolved.provider); | |
| } catch { | |
| return null; | |
| } | |
| } |
Was this helpful? React with 👍 or 👎 to provide feedback.
| * This replaces the per-file `wrapWithCallSiteRouting` helpers that lived | ||
| * in `approval-generators.ts` and `guardian-action-generators.ts` so the | ||
| * connection-aware routing wiring stays in one place. |
There was a problem hiding this comment.
🟡 Comment references removed code, violating assistant/AGENTS.md rule
The JSDoc for wrapWithCallSiteRouting says "This replaces the per-file wrapWithCallSiteRouting helpers that lived in approval-generators.ts and guardian-action-generators.ts" — those helpers were removed in this very PR. The assistant/AGENTS.md rule states: "When writing or updating comments, do not reference code that has been removed. Comments should describe the current state of the codebase, not narrate its history. Avoid phrases like 'no longer does X', 'previously used Y', or 'was removed in PR Z'."
| * This replaces the per-file `wrapWithCallSiteRouting` helpers that lived | |
| * in `approval-generators.ts` and `guardian-action-generators.ts` so the | |
| * connection-aware routing wiring stays in one place. | |
| * Shared helper used by all satellite sites (daemon conversation/approval/guardian | |
| * generators, subagent manager). The wrapper consults the registry for | |
| * alternate-provider resolution and routes through `provider_connection` | |
| * via the shared connection-resolution helper. |
Was this helpful? React with 👍 or 👎 to provide feedback.
…le-3 `config-watcher.test.ts` mocks `providers/registry.js` with a stub factory that doesn't include the new exports added by the inference providers refactor: `clearConnectionProviderCache` (added in #30162) and `resolveProviderFromConnection` (used transitively by the new `connection-resolution.ts` module). When the test file is loaded in isolation — or in some test orderings — the import chain through `providers/inference/connections.ts` and `providers/connection-resolution.ts` hits a `SyntaxError: Export named '...' not found`. Adds both names to the mock factory. Verified locally: bun test src/__tests__/config-watcher.test.ts → 19 pass / 0 fail. Pre-existing on main (#30162 added the first import without updating the mock); landing here so CI can run Test green for this PR.
Four review findings on PR #30198 commits c3416ac + 8b335eb: * **[Codex P1]** `provider-send-message.ts:127` — provider/connection mismatch silent misroute. A profile that names `provider: "openai"` together with an Anthropic-flavored `provider_connection` would silently dispatch traffic to the connection's backend (Anthropic) while metadata still reported the resolved profile (OpenAI). Now we verify `connection.provider === expectedProvider` and fall through to the legacy registry path (with a warn log) on mismatch instead of silently misrouting. `tryResolveProviderForConnectionName` gains an optional `expectedProvider` parameter; canonical dispatch passes `resolved.provider`, `resolveDefaultProvider` passes `profile.provider`, and `CallSiteRoutingProvider`'s async hook signature is widened to `(connectionName, expectedProvider)` so the shared wrapper threads the value through. * **[Codex P2 / Devin BUG #1]** `approval-generators.ts:141` — stale `listProviders().includes(...)` guard blocks connection-aware resolution. In `createApprovalConversationGenerator`, the legacy guard threw "No provider available" before the new `resolveDefaultProvider(config)` could resolve from connection auth. In `your-own` configurations where the default provider lives entirely behind a `provider_connection` and never appears in the legacy `listProviders()` list, this caused false failures. Removed the guard; `resolveDefaultProvider` already returns null on miss and we throw on null (right "no provider available" signal). `listProviders` import dropped along with it. * **[Devin BUG #2]** Dead code — `resolveProviderForCallSite` in `connection-resolution.ts` was exported for "completeness" but never imported or called anywhere. Deleted (and its now-unused imports of `resolveCallSiteConfig` + `LLMCallSite`). * **[Devin BUG #3]** JSDoc for `wrapWithCallSiteRouting` referenced helpers "that lived in approval-generators.ts and guardian-action-generators.ts" — those were removed in this PR, so the comment described history rather than current behavior. Rewrote to describe what the function does, not what it replaced. New gate test in `satellite-connection-routing.test.ts`: - profile says provider=openai, provider_connection points at an anthropic-flavored row → connection-resolution hook returns null on mismatch → legacy `getProvider("openai")` path produces the OpenAI stub. Without this fix the connection's anthropic stub would have run instead. Verified: - `bunx tsc --noEmit` → clean. - `bun test src/providers/__tests__/` → 103 pass / 0 fail / 209 expects (was 102/0/206; +1 test, +3 expects).
|
Addressed all 4 review findings in d060474. [Codex P1] provider-send-message.ts:127 — silent misroute on provider/connection mismatch. Real risk. Fixed by adding optional [Codex P2 / Devin BUG#1] approval-generators.ts:143 — stale [Devin BUG#2] connection-resolution.ts:141 — dead [Devin BUG#3] call-site-routing.ts:157 — JSDoc references removed helpers. Rewrote to describe what New gate test added: profile says |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d060474661
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| ); | ||
| return null; | ||
| } | ||
| return resolveProviderFromConnection(connection, config); |
There was a problem hiding this comment.
Catch connection-resolution errors before fallback path
tryResolveProviderForConnectionName returns resolveProviderFromConnection(...) directly, so any rejection from that async path bubbles out instead of returning null. This breaks the commit’s stated “fall through to legacy registry” behavior: a transient error while resolving connection auth (for example in resolveAuth credential reads) will make getConfiguredProvider/CallSiteRoutingProvider throw and fail inference rather than gracefully using getProvider(resolved.provider). Wrap this call in try/catch and return null on error so misbehaving connections cannot hard-block dispatch.
Useful? React with 👍 / 👎.
…through to legacy Codex P1 follow-up on PR #30198 (#discussion_r3215235367): `tryResolveProviderForConnectionName` returned the `resolveProviderFromConnection(connection, config)` promise directly, so any rejection from the inner async path — credential read failure in `resolveAuth`, managed-proxy context lookup throwing, transient provider-impl construction errors — bubbled out of the helper and hard-failed dispatch instead of falling through to the legacy `getProvider(resolved.provider)` path the helper's contract advertises. Wrap in try/catch; log a warn with `{ err, connectionName }` and return null so `getConfiguredProvider` and `CallSiteRoutingProvider` can serve the request through the legacy registry. The behavior now matches the lookup-failure and not-found branches above. New gate test in satellite-connection-routing.test.ts: a connection whose `resolveProviderFromConnection` throws does not fail dispatch. The wrapper catches the rejection, logs, and falls through; the profile's resolved provider matches default → reused default instance. Test infrastructure adds a `connectionsThatThrowOnResolve` Set so individual tests can opt into the throwing path without affecting the other cases. Verified: - `bunx tsc --noEmit` → clean. - `bun test src/providers/__tests__/satellite-connection-routing.test.ts` → 6 pass / 0 fail / 23 expects (was 5/0/19; +1 test, +4 expects).
Why
Cycles 1+2 (#30162) shipped the auth-first refactor:
provider_connectionsrows, aresolveProviderFromConnection(connection, config)adapter, theinference providers connectionsCLI surface, the every-boot idempotent backfill, and the schema-levelprovider_connection?: stringfield on profiles.But
resolveProviderFromConnectionwas effectively dead code — zero call sites in production. A profile could name aprovider_connection: "anthropic-managed"and the dispatcher would happily ignore it and bind to the legacy default-auth path, with only a metadata difference in the request log. Noa flagged this in cycle-3 review.This PR wires that gap shut.
What
Two code paths needed wiring; both go through one shared helper module.
1. Canonical dispatch —
provider-send-message.ts(getConfiguredProvider)resolveConfiguredProvideralready read the resolved profile, so the wiring is local: when the resolved profile names aprovider_connection, look up that row, resolve throughresolveProviderFromConnection, and use the connection-bound Provider for the actualsendMessagecall. On any miss (lookup error, row not found, auth resolution fails), log a warning and fall through to the legacygetProvider(name)path so misconfigured connections never hard-block inference.2. Satellite construction-time path — five sites
subagent/manager.ts,daemon/conversation-store.ts,daemon/approval-generators.ts(×2 generators),daemon/guardian-action-generators.ts(×2),home/rollup-producer.ts. Each of these constructs aProviderfromconfig.llm.default.provideronce and wraps it withCallSiteRoutingProviderfor per-call site routing.Each of these now uses one shared resolution shape:
CallSiteRoutingProvidergains an optional asyncresolveByConnectionconstructor parameter (default behavior unchanged for any caller that doesn't pass it). When the per-call resolved profile names aprovider_connection, the wrapper consults that hook before falling through togetProviderByName(resolved.provider). This keeps the per-call dispatch consistent with the canonical path.3. Shared helpers —
providers/connection-resolution.ts(new)tryResolveProviderForConnectionName(name, config)— promoted from a private function inprovider-send-message.ts. Single source of truth for "given a connection name, give me a Provider bound to that auth (or null on miss)".resolveDefaultProvider(config)— for the satellite construction-time pattern. Readsconfig.llm.default.{provider, provider_connection}.resolveProviderForCallSite(callSite, config, opts)— exported analogue for per-callsite resolution outside the routing wrapper.wrapWithCallSiteRouting(base, config)— single helper incall-site-routing.tsthat replaces the per-file wrappers inapproval-generators.tsandguardian-action-generators.ts.Resolution policy (both paths, identical)
provider_connection→ async-resolve through that connection's auth. On miss, fall through to step 3.providermatches the default's name → reuse the default provider instance.getProvider(resolved.provider); fall back to default if the registry can't produce one.The legacy fallback at every step is intentional: cycle-3 should not regress when a connection misbehaves. Cycle-4 will strip the fallbacks once we've shipped one release window with connection-awareness active in prod.
Tests
Two hard-gate test files:
__tests__/dispatch-connection-routing.test.ts(4 tests, 15 expects) — gates the canonicalprovider-send-messagepath. Two profiles, sameprovider, differentprovider_connection. CallsgetConfiguredProvider(callSite)and asserts the auth/connection bound at adapter-call time differs.__tests__/satellite-connection-routing.test.ts(4 tests, 16 expects) — gates the satelliteCallSiteRoutingProviderwrapping path. Hooks fire with the right connection name; the transport that actually runs is the connection-bound stub (not default, not legacy registry).8 gate tests total, covering both code paths.
What's deferred
ProvidersConfig.llm.defaultdoesn't yet exposeprovider_connection?: string(the runtime config has it, but the typed view here doesn't). One small cast inresolveDefaultProvidernotes this; schema-level fix is its own follow-up.getProvider(name)fallback paths after one release window with connection-awareness active in prod.File diff
Stacked on
Branched off
main @ a8c3a014e(the PR #30162 merge commit). No other open PRs depend on this branch yet.